Author Topic: Possible drive failure - should i run chkdsk ?  (Read 1594 times)

Offline wiggum77

  • Newbie
  • *
  • Posts: 14
  • Karma: +0/-0
    • View Profile
Possible drive failure - should i run chkdsk ?
« on: January 26, 2016, 03:12:30 pm »
I noticed some issues with my configuration this morning (Cruise Control, with Pooling) and after digging around, I think I may have an imminent drive failure.

Windows event viewer shows:
Quote
Log Name:      System
Source:        Ntfs
Date:          27/01/2016 8:01:10 AM
Event ID:      55
Task Category: (2)
Level:         Error
Keywords:      Classic
User:          N/A
Computer:      PC1
Description:
The file system structure on the disk is corrupt and unusable. Please run the chkdsk utility on the volume 3TB Seagate DM001.

So my question is simple... should I run chkdsk?

I might be living in the past, but I remember that in older versions of Windows, running chkdsk seemed to cause more problems than it solved. If you saw the chkdsk window, you usually knew that drive was done for.

I also have a couple of follow-up questions (apologies if they're dumb ones, it's been a while since I messed with this stuff):
- If I were to run chkdsk, should I stop the drive pool first? Or remove the drive from the pool altogether?
- Should I assume this drive is on its way out, and be looking to replace it ASAP?

Thx in advance...

Edit: I just did a Quick Validate, and it showed:
The validation process is reporting 10 new, 0 changed, 516 deleted, and 0 corrupted files!

Safe to say, I did not delete 516 files recently :)
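One way to see which files a count like "516 deleted" refers to is to compare a file listing saved while the drive was healthy against the current one. The sketch below is only an illustration in Python and is not how FlexRAID's Validate works internally; `list_files` and `diff_listings` are hypothetical helper names.

```python
from pathlib import Path


def list_files(root):
    """Return the set of file paths under root, relative to root."""
    root = Path(root)
    return {p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file()}


def diff_listings(before, after):
    """Split two listings into (new, deleted) sets of relative paths."""
    return after - before, before - after
```

Saving the output of `list_files` for each data drive after a known-good update would give something to diff against later.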

Also, I've got nightly Update & Validate jobs scheduled, and the Validate seems to have been failing regularly, on the same files every time:
Quote
[2016-01-22 04:00:00,350] INFO : Trigger verify.SilentBobRaid[verify][Every Day at: 04:00:00
[Starting = null/null/null null:null:null]
[Ending = null/null/null null:null:null]] fired job SilentBobRaid.SilentBobRaid[verify] at:  04:00:00 01/22/2016
[2016-01-22 04:00:00,354] INFO : Executing verify...
[2016-01-22 04:00:00,354] INFO : [verify] started at: Fri Jan 22 04:00:00 EST 2016
[2016-01-22 04:00:07,165] INFO : Using RAID engine Tx
[2016-01-22 04:00:07,249] INFO : Starting verify at: 0 - Number of bytes: 2938445496320
[2016-01-22 06:31:45,355] ERROR: Verify failure: bit mismatch!
[2016-01-22 06:31:45,355] ERROR: At byte: 842813931520...
[2016-01-22 06:31:45,355] ERROR: NOTE: C:\FlexRAID-Managed-Pool\class1_0\{d361e181-9fba-4123-a075-043123184efa}\<filename1_removed> was open at the time of the failure
[2016-01-22 06:31:45,355] ERROR: INFO: healthy file => C:\FlexRAID-Managed-Pool\class1_0\{d361e181-9fba-4123-a075-043123184efa}\<filename1_removed>
[2016-01-22 06:31:45,355] ERROR: INFO: healthy file => C:\FlexRAID-Managed-Pool\class1_0\{d361e181-9fba-4123-a075-043123184efa}\<filename2_removed>
[2016-01-22 06:31:45,355] ERROR: NOTE:  was open at the time of the failure
[2016-01-22 06:31:45,355] ERROR: INFO: ... (N/A)
[2016-01-22 06:31:45,355] ERROR: NOTE: C:\FlexRAID-Managed-Pool\class1_0\{a25b4719-c758-4c4b-94af-afd20e96f297}\<filename3_removed>.avi was open at the time of the failure
[2016-01-22 06:31:45,355] ERROR: INFO: healthy file => C:\FlexRAID-Managed-Pool\class1_0\{a25b4719-c758-4c4b-94af-afd20e96f297}<filename3_removed>.avi
[2016-01-22 06:31:45,355] ERROR: INFO: healthy file => C:\FlexRAID-Managed-Pool\class1_0\{a25b4719-c758-4c4b-94af-afd20e96f297}<filename4_removed>.avi
[2016-01-22 06:31:45,355] ERROR: NOTE: C:\FlexRAID-Managed-Pool\class1_0\{4ec756ae-1413-4370-bc58-4238bb9171d8}\000785.flxr was open at the time of the failure
[2016-01-22 06:31:45,355] ERROR: INFO: healthy file => C:\FlexRAID-Managed-Pool\class1_0\{4ec756ae-1413-4370-bc58-4238bb9171d8}\000785.flxr
[2016-01-22 06:31:45,355] ERROR: INFO: healthy file => C:\FlexRAID-Managed-Pool\class1_0\{4ec756ae-1413-4370-bc58-4238bb9171d8}\000784.flxr
[2016-01-22 06:31:45,355] ERROR: ***Failure details for parity: 1***
[2016-01-22 06:31:45,355] ERROR: Buffer size: 1048576 - Number of failed bytes in buffer: 1042515
[2016-01-22 06:31:45,355] ERROR: First failure at position: 842813931520 - Last failure at position: 842814980095
[2016-01-22 06:31:45,357] INFO : Verify failure: operation aborted!
[2016-01-22 06:31:45,357] INFO : [verify] completed at: Fri Jan 22 06:31:45 EST 2016
[2016-01-22 06:31:53,630] INFO : Trigger verify.SilentBobRaid[verify][Every Day at: 04:00:00
[Starting = null/null/null null:null:null]
[Ending = null/null/null null:null:null]] completed firing job SilentBobRaid.SilentBobRaid[verify] at  06:31:53 01/22/2016 with resulting trigger instruction code: DO NOTHING


Would it be safe to assume the above errors are related to the potentially failing disk?
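The numbers in the verify log can be checked with a little arithmetic. This Python sketch uses only values copied from the log above; the variable names are made up for illustration.

```python
# Values copied from the verify log above.
BUFFER_SIZE = 1_048_576            # "Buffer size"
FAILED_BYTES = 1_042_515           # "Number of failed bytes in buffer"
FIRST_FAILURE = 842_813_931_520    # "First failure at position"
LAST_FAILURE = 842_814_980_095     # "Last failure at position"
TOTAL_BYTES = 2_938_445_496_320    # "Number of bytes" to verify

# Length of the failed range, counting both end positions.
span = LAST_FAILURE - FIRST_FAILURE + 1

# Which verify buffer the failure starts in, and where inside that buffer.
block_index, offset_in_block = divmod(FIRST_FAILURE, BUFFER_SIZE)

print(f"failed span: {span} bytes")
print(f"buffer index: {block_index}, offset within buffer: {offset_in_block}")
print(f"share of buffer that mismatched: {FAILED_BYTES / BUFFER_SIZE:.1%}")
print(f"progress when verify aborted: {FIRST_FAILURE / TOTAL_BYTES:.1%}")
```

The failed range is exactly one 1,048,576-byte buffer, starting on a buffer boundary, with about 99.4% of its bytes mismatching, and the verify aborted roughly 28.7% of the way through. That looks like a whole region disagreeing with parity rather than a few flipped bits, though on its own it does not say which drive is at fault.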
« Last Edit: January 27, 2016, 04:11:04 am by wiggum77 »

Offline wiggum77

Re: Possible drive failure - should i run chkdsk ?
« Reply #1 on: January 27, 2016, 04:17:30 am »
So in preparation, I went out and bought a replacement drive today, as well as a USB HDD dock (since I'm out of SATA ports on my motherboard).

I've initialized the new drive and given it a drive letter... should I be doing a restore from parity straight to the new drive at this point?



Or should i perhaps do the following instead?
- stop the pool
- give the potential failed drive (DRU2) a drive letter and remove it from the pool
- copy everything i can from the old DRU2 to the new drive
- put the new drive back in the pool as the "new" DRU2, and then restore (to potentially get back those 500+ missing files)
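The "copy everything I can" step could be sketched in Python as below, on the assumption that the goal is to keep going past unreadable files and end up with a list of what failed. `salvage_copy` is a hypothetical helper, not part of FlexRAID or Windows.

```python
import os
import shutil


def salvage_copy(src_root, dst_root):
    """Copy every readable file from src_root to dst_root.

    Returns a list of (source path, error message) for files that failed.
    """
    failures = []
    # os.walk silently skips directories it cannot list (onerror is not set).
    for dirpath, _dirnames, filenames in os.walk(src_root):
        rel = os.path.relpath(dirpath, src_root)
        target_dir = os.path.normpath(os.path.join(dst_root, rel))
        os.makedirs(target_dir, exist_ok=True)
        for name in filenames:
            src = os.path.join(dirpath, name)
            try:
                # copy2 also tries to preserve timestamps.
                shutil.copy2(src, os.path.join(target_dir, name))
            except OSError as exc:
                # Record the failure and keep going with the next file.
                failures.append((src, str(exc)))
    return failures
```

With made-up drive letters, a call would look like `salvage_copy("L:\\", "N:\\")`. Anything in the returned list is a candidate for the parity restore afterwards.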

Again, apologies if these are dumb questions. I've read through a bunch of the documentation, and while I think I should be following the above 4 steps, I'm a bit hesitant to jump in without someone with more experience giving me the thumbs up :)

Thanks again in advance...
« Last Edit: January 27, 2016, 04:25:32 am by wiggum77 »

Offline Kikoyasi

  • Newbie
  • *
  • Posts: 2
  • Karma: +0/-0
    • View Profile
Re: Possible drive failure - should i run chkdsk ?
« Reply #2 on: January 28, 2016, 02:19:07 am »
Judging from the content shown, I don't understand much of the explanation, but it looks good.

Offline wiggum77

Re: Possible drive failure - should i run chkdsk ?
« Reply #3 on: January 28, 2016, 07:19:17 am »
So i got a different error the other day:

Code:
Log Name:      System
Source:        Disk
Date:          27/01/2016 11:42:11 PM
Event ID:      7
Task Category: None
Level:         Error
Keywords:      Classic
User:          N/A
Computer:      SILENTBOB2
Description:
The device, \Device\Harddisk4\DR4, has a bad block.

So there's a fairly good chance that drive is on its way out...

I just want to make sure I do this the right way, with minimal data loss... can someone in the know please confirm whether my proposed steps below are the best way to go? I'm just terrified I'll mess it up  :-[ :-\

- stop the pool
- give the failed drive (DRU2) a drive letter and remove it from the pool
- copy everything i can from the old DRU2 to the new drive
- put the new drive back in the pool as the "new" DRU2, and then restore (to potentially get back those 500+ missing files)


Offline Newhaven

  • Newbie
  • *
  • Posts: 40
  • Karma: +3/-0
    • View Profile
Re: Possible drive failure - should i run chkdsk ?
« Reply #4 on: January 28, 2016, 07:43:52 pm »
If you are absolutely certain that the last update to the storage pool was good, then you can do a swap/restore along the lines of the method you proposed. If you are not certain, assign the damaged drive a drive letter and run chkdsk /f /r on it. If the chkdsk is successful, immediately copy the contents to the new drive and swap it in. Then run an update, which might take a while, depending on whether the last update was accurate or screwed up.

In my opinion, the best method is to run chkdsk on the disk at bedtime and then begin copying the contents in the morning :). Should this method fail, you will still have the option of trying the restore.
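For anyone scripting the overnight run, here is a small Python sketch that builds and launches the `chkdsk /f /r` command mentioned above. `/f` fixes file system errors and `/r` locates bad sectors and recovers readable data. The helper names are made up, and the script assumes it is started from an elevated (administrator) prompt on Windows.

```python
import subprocess
import sys


def chkdsk_command(drive_letter):
    """Build the chkdsk invocation for one volume, e.g. 'L' -> chkdsk L: /f /r."""
    letter = drive_letter.rstrip(":\\").upper()
    if len(letter) != 1 or not (letter.isascii() and letter.isalpha()):
        raise ValueError(f"not a drive letter: {drive_letter!r}")
    return ["chkdsk", f"{letter}:", "/f", "/r"]


def run_chkdsk(drive_letter):
    """Run chkdsk on Windows and return its exit code."""
    if sys.platform != "win32":
        raise RuntimeError("chkdsk is only available on Windows")
    return subprocess.run(chkdsk_command(drive_letter)).returncode
```

Typing `chkdsk L: /f /r` by hand (with whatever letter the damaged drive was given) does the same thing; the wrapper only helps if the run is chained to a copy step afterwards.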

Offline wiggum77

Re: Possible drive failure - should i run chkdsk ?
« Reply #5 on: January 29, 2016, 06:50:30 am »
Thanks heaps for the reply Newhaven. I've had the chkdsk running for a while now, and it looks to have fixed a few things around the place  :D

We'll see how it goes....


Offline wiggum77

Re: Possible drive failure - should i run chkdsk ?
« Reply #6 on: February 01, 2016, 06:44:34 am »
Just to close the loop on this one... after running chkdsk, it identified a few issues with a couple of files, but for the most part the data was intact.

I was able to successfully copy all the data onto the new WD Red drive, and did a swap out. I then ran an update and a validate. The 5 files that chkdsk flagged as having problems showed up in the validate results as "corrupted". I've since deleted them (as they are not all that important).
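A copy like this can also be double-checked independently of the pool software, before the old drive is retired, by hashing both sides. The following is a minimal Python sketch under that assumption; `file_digest` and `compare_trees` are hypothetical names, and reading from a failing source drive may still raise `OSError` on bad sectors.

```python
import hashlib
from pathlib import Path


def file_digest(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def compare_trees(src_root, dst_root):
    """Return (relative path, reason) for files missing or different in dst_root."""
    src_root, dst_root = Path(src_root), Path(dst_root)
    problems = []
    for src in sorted(p for p in src_root.rglob("*") if p.is_file()):
        rel = src.relative_to(src_root)
        dst = dst_root / rel
        if not dst.is_file():
            problems.append((rel.as_posix(), "missing"))
        elif file_digest(src) != file_digest(dst):
            problems.append((rel.as_posix(), "different"))
    return problems
```

An empty result means every file on the source has a byte-identical copy on the destination. It says nothing about whether the source data itself was already damaged.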

The only thing I need to work out is whether it's possible to rename drives within a pool... for example, here's the pool before I did the swap:


So what shows as the L: drive in that screenshot has now been swapped in and become the "new X:", if you will. Since I like to have the drives named so I know which is which (when trying to troubleshoot physical issues in the case), is there any way to easily rename the drives in drive manager? Or would I have to delete the config and start over? (I'm hesitant to do the latter, since everything's working right now :P and having the drives named isn't a show-stopper.)