Author Topic: hanging on the restore pre-validation process  (Read 2497 times)

Offline johnzm

  • Newbie
  • *
  • Posts: 20
  • Karma: +0/-0
    • View Profile
hanging on the restore pre-validation process
« on: December 11, 2017, 02:23:43 pm »
Hi,

I am trying to restore a DRU that failed. I pulled the failed drive out (2 TB) and inserted a new 4 TB drive. During the restore pre-validation process, all disk activity drops to 0 and it does not move forward. According to the logs, this is happening on a folder that I deleted from the array (off a different drive) a couple of weeks ago.

I was thinking a next step could be to unplug all the drives that are part of the parity so that it cannot validate, and hope that it starts to restore.

Does anyone have any thoughts on what I should try next?
« Last Edit: December 11, 2017, 02:29:04 pm by johnzm »

Offline johnzm

Re: hanging on the restore pre-validation process
« Reply #1 on: December 11, 2017, 06:33:22 pm »
Well,

I tried unplugging the other drives and, according to the logs, it got farther, but then it failed with:
[2017-12-11 19:17:57,951] ERROR: [restore] error: IOException[Too many failed devices! Failed=22 - Number of devices supported to be restored at a time: 1]
java.io.IOException: Too many failed devices! Failed=22 - Number of devices supported to be restored at a time: 1


This has been a terrible experience. There was no reason for a file unrelated to the restore to completely halt operations. Additionally, this wasted almost 16 hours of my day.

This time I am lucky: it seems the drive fails only after many hours plugged in, and so far I'm able to pull the data off successfully.

But I do wonder why I even keep a parity drive if this is the result of my attempt to restore. Can anyone please provide any insight into why I had this issue, so that I might be able to avoid it in the future?


Offline Brahim

  • Global Moderator
  • Hero Member
  • *****
  • Posts: 8,547
  • Karma: +204/-16
    • View Profile
Re: hanging on the restore pre-validation process
« Reply #2 on: December 11, 2017, 07:54:37 pm »
Quote
Well,

I tried unplugging the other drives and, according to the logs, it got farther, but then it failed with:
[2017-12-11 19:17:57,951] ERROR: [restore] error: IOException[Too many failed devices! Failed=22 - Number of devices supported to be restored at a time: 1]
java.io.IOException: Too many failed devices! Failed=22 - Number of devices supported to be restored at a time: 1
....
Sorry, but you are being quite goofy.  ::)
When you don't understand things or are unsure, it is best not to do anything rather than doing totally silly things.

The basic premise of RAID is that all surviving disks are required to rebuild the failed one.
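
Read against that premise, the error message spells out what happened: with the other drives unplugged, 22 devices counted as failed, while the restore supports only 1 at a time. A small Python sketch (not part of FlexRAID; the function name and pattern are illustrative assumptions based only on the log line quoted above) that pulls those two numbers out of such a line:

```python
import re

# Pattern for the restore error quoted above; the group names are my own labels.
ERROR_RE = re.compile(
    r"Too many failed devices! Failed=(?P<failed>\d+) - "
    r"Number of devices supported to be restored at a time: (?P<supported>\d+)"
)

def parse_restore_error(line):
    """Return (failed, supported) for a 'too many failed devices' line, else None."""
    match = ERROR_RE.search(line)
    if match is None:
        return None
    return int(match.group("failed")), int(match.group("supported"))

line = (
    "[2017-12-11 19:17:57,951] ERROR: [restore] error: IOException[Too many failed devices! "
    "Failed=22 - Number of devices supported to be restored at a time: 1]"
)
failed, supported = parse_restore_error(line)
print(f"{failed} devices unavailable, but only {supported} can be restored at a time")
```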

Quote
...
This has been a terrible experience. There was no reason for a file unrelated to the restore to completely halt operations. Additionally, this wasted almost 16 hours of my day.
...
Those files are fully related to being able to restore the failed data. That's just how the system works. Parity is not backup. Think of it as an algorithm that's used to rebuild data but it requires the remaining data as an input to do it.

The Restore Pre-Validation will take quite a while to run if you have a lot of files and have missing files. It does (or tries to do) all sorts of smart things so that it can restore successfully despite the screw-ups.
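
To make the "algorithm that needs the remaining data as input" point concrete, here is a minimal Python sketch of single XOR parity. It is a generic illustration, not FlexRAID's actual implementation; the three tiny "disks" and the helper name are made up.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks together, position by position."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# Three made-up data disks, each holding one 4-byte block.
disks = [b"\x01\x02\x03\x04", b"\x10\x20\x30\x40", b"\xaa\xbb\xcc\xdd"]
parity = xor_blocks(disks)  # computed while all disks were healthy

# Disk 1 fails: parity plus ALL surviving disks gives the lost block back.
rebuilt = xor_blocks([parity, disks[0], disks[2]])
print(rebuilt == disks[1])  # True

# Unplug a survivor as well, and the result is no longer the lost block.
missing_survivor = xor_blocks([parity, disks[0]])
print(missing_survivor == disks[1])  # False

# Change a survivor after parity was computed (e.g. delete files): same problem.
changed = b"\x00\x00\x00\x00"
stale_inputs = xor_blocks([parity, disks[0], changed])
print(stale_inputs == disks[1])  # False
```

The last two checks roughly mirror the two situations in this thread: survivors that are unplugged, and data on a surviving disk that changed after parity was last updated.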

Offline johnzm

Re: hanging on the restore pre-validation process
« Reply #3 on: December 11, 2017, 08:31:12 pm »
So, I can still attempt a parity restore (I'd like to know that it works), but when I see the logs are 3+ hours behind the current time and see it stall in the same place multiple times, I tend to think there might be an issue.

What do you need to help me troubleshoot this, so I can learn and not have so many issues the next time?

Offline Brahim

    • View Profile
Re: hanging on the restore pre-validation process
« Reply #4 on: December 11, 2017, 09:58:58 pm »
It is really about you understanding the fundamentals. So, hit up the wiki.

1. Google "parity raid" and read up
2. Make sure you are familiar with (in case tRAID might be a better option for you): http://wiki.flexraid.com/2011/10/18/understanding-the-limitations-of-snapshot-raid/
3. http://wiki.flexraid.com/2011/03/26/testing-flexraids-recovery-abilities/
4. Back to the wiki and read as much as you can: http://wiki.flexraid.com/category/raidf/

Offline johnzm

Re: hanging on the restore pre-validation process
« Reply #5 on: December 12, 2017, 04:58:13 am »
I had read up prior to purchase, and thought I had set up the array accordingly.
It was running weekly updates per the scheduler, and I just looked and do not see an option for tRAID in my configuration.
« Last Edit: December 13, 2017, 10:41:31 am by Brahim »

Offline Brahim

    • View Profile
Re: hanging on the restore pre-validation process
« Reply #6 on: December 12, 2017, 09:43:32 am »
It helps to re-read things.
tRAID is a different product. See comparison here: http://www.flexraid.com/download-try-buy/ and here http://www.flexraid.com/2013/10/04/table-comparison-of-transparent-raid-vs-raid-over-file-system/

Also, it looks like you have 22 data disks to one parity. Is your system strong enough to handle that many drives in a single array?
Maybe split the array into two for easier maintenance?

Offline johnzm

Re: hanging on the restore pre-validation process
« Reply #7 on: December 12, 2017, 10:09:44 am »
I am running dual quad-core Xeons with 32 GB of RAM. Ideally, the number of physical disks in the array will decrease as bigger drives become more common, but that makes a drive failure even worse if there is no hope of a restore (whether through my fault or someone else's).
I am going to enable trace logging and attempt the restore once more. I will paste the end of the log here for review.

Offline Brahim

    • View Profile
Re: hanging on the restore pre-validation process
« Reply #8 on: December 12, 2017, 11:56:50 am »
It is not about CPU or RAM. It is about your power supply and system stability under heavy I/O load.
So, both your PSU and disk controllers must be robust enough for the load. Not saying they are not. Just saying keep an eye on that aspect.

Offline johnzm

Re: hanging on the restore pre-validation process
« Reply #9 on: December 12, 2017, 12:02:00 pm »
I wish the software aspect came as easily as the hardware aspect for me :)

The system draws 220 watts under load at the wall, and the power supply is 1000 watts with over 80 amps available on the 12-volt rails :)

I am running the restore now and will post the results of this run soon.

Offline Brahim

    • View Profile
Re: hanging on the restore pre-validation process
« Reply #10 on: December 12, 2017, 02:05:49 pm »
Quote
I wish the software aspect came as easily as the hardware aspect for me :)

The system draws 220 watts under load at the wall, and the power supply is 1000 watts with over 80 amps available on the 12-volt rails :)
...
With 23+ disks, expect much higher usage under peak load. But it's great that the PSU is 1 kW.
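
As a rough back-of-the-envelope check, assuming each drive pulls about 2 A on the 12 V rail while spinning up (an assumed ballpark figure, not a measurement from this system; check the drive datasheets), the worst case with all 23 disks spinning up together can be sketched in Python like this:

```python
DISKS = 23              # 22 data disks plus one parity, per the thread
SPINUP_AMPS_12V = 2.0   # ASSUMPTION: ballpark per-drive spin-up current
RAIL_AMPS_12V = 80.0    # 12 V capacity reported earlier in the thread

peak_amps = DISKS * SPINUP_AMPS_12V
peak_watts = peak_amps * 12.0
headroom_amps = RAIL_AMPS_12V - peak_amps

print(f"Drives alone at spin-up: {peak_amps:.0f} A, about {peak_watts:.0f} W")
print(f"12 V headroom left for everything else: {headroom_amps:.0f} A")
```

Under that assumption the drives alone account for far more than the 220 W seen at the wall, which is why the steady-state reading says little about peak load.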

Quote
....
I am running the restore now and will post the results of this run soon.
Hopefully, you enabled TRACE logging for more details: http://wiki.flexraid.com/2011/03/26/how-to-enable-more-verbose-logging/
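
With more detailed logging on, one way to tell a slow pre-validation from a stalled one is to check how old the newest log entry is. A minimal Python sketch, assuming every log line starts with a timestamp in the same format as the error line posted earlier in this thread; the helper name is hypothetical and the sample line is shortened:

```python
from datetime import datetime, timedelta

def log_entry_age(line, now):
    """Return how long ago a log line was written, based on its leading [timestamp]."""
    stamp = line[1:line.index("]")]  # text between the first '[' and the first ']'
    written = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S,%f")
    return now - written

last_line = "[2017-12-11 19:17:57,951] ERROR: [restore] error: ..."
now = datetime(2017, 12, 11, 22, 30, 0)  # stand-in for datetime.now()

age = log_entry_age(last_line, now)
if age > timedelta(hours=3):
    print(f"Newest log entry is {age} old; the restore may have stalled")
```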

Offline johnzm

Re: hanging on the restore pre-validation process
« Reply #11 on: December 13, 2017, 05:59:48 am »
I snapped a screenshot of the results of letting it run overnight. This is the same thing that I've seen multiple times when trying to rebuild this particular disk.


« Last Edit: December 13, 2017, 10:44:43 am by johnzm »

Offline Brahim

    • View Profile
Re: hanging on the restore pre-validation process
« Reply #12 on: December 13, 2017, 10:42:05 am »
Your screenshots are not rendering.

Offline johnzm

Re: hanging on the restore pre-validation process
« Reply #13 on: December 13, 2017, 10:46:08 am »
How about now?

Offline Brahim

    • View Profile
Re: hanging on the restore pre-validation process
« Reply #14 on: December 13, 2017, 11:27:25 am »
Fire up Resource Monitor and see what the RAID-F service process is doing.