Author Topic: Failed restore situation, need procedures...  (Read 308 times)

Online sforgez

« on: November 29, 2017, 03:04:45 pm »
So I've updated FlexRAID RAID-F to the latest release (11/21/2017), and since I was about to upgrade one of my DRU drives (2TB to 8TB), I decided to test the restore.

Here is my FlexRAID configuration:
10 DRU (2TB, 3TB, 4TB, etc..)
1 PPU (8TB)

The update job runs every morning at 3:00 AM

-I took DRU2 (2TB, the one I wanted to swap) out of the server
-Connected the new drive (8TB) to the server, activated the disk and created the partition
-Then I simply went into Drive Manager, selected DRU2 (missing at this point) and selected restore and swap out
-I chose the first option, "swap the drive first then restore the data into the new drive", and selected the letter of the new partition
This process went on for several hours then finished...

However, the restore reported corrupted files:
*See attached txt file where I copied the logs

If I understood correctly, files were modified after the last update, and that caused the corruption...
But if an update has to be run before a restore, how can you run one when a drive fails (a drive is missing, so data is missing)?

I really need a solid procedure for when a drive actually fails. Since this was a test and I still had the old drive, I ended up copying the data over manually and rebuilding the RAID parity. (I got other errors when trying to do the swap only, so I just gave up.)

I've been using FlexRAID for a long time and trusted it, but it only counts when you really need it, which is the scenario I tested (thank God I had a plan B in this case).

Any hint about what I did wrong? I actually followed the FAQ for this, which seemed simple...
Thanks in advance, I need to be able to trust FlexRAID again...
« Last Edit: November 29, 2017, 03:07:40 pm by sforgez »

Offline Brahim

« Reply #1 on: November 29, 2017, 03:47:15 pm »
The gist of it lies in understanding Snapshot RAID: http://wiki.flexraid.com/2011/10/18/understanding-the-limitations-of-snapshot-raid/

In layman's terms, you committed the one sin in Snapshot RAID.  :P

As per the wiki article, certain operations are relatively safe. Deletes are safe if you have the recycle bin feature enabled.

However, the golden rule is that you should rarely edit your data in place.
Edits are bad in Snapshot RAID, and you must make sure to sync up the RAID ASAP any time you edit your data. Your logs show a whole bunch of edited files that were not synced into the parity.

Again, any time data is edited (new files, renames, and deletes are fine, but edits are like "safety off!"), the parity is no longer able to recover part of the data. This must be fixed by running the Update task promptly.
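
To make the "safety off!" moment concrete, here is a toy sketch in Python. It assumes a single XOR parity over three tiny made-up blocks, which is only an illustration and not necessarily how FlexRAID computes its parity internally; the names `xor_blocks`, `dru1`, `ppu`, etc. are hypothetical.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# Three data drives (DRUs) holding one 8-byte block each, plus the
# parity (PPU) computed at the last Update. All values are made up.
dru1, dru2, dru3 = b"movie-01", b"photo-02", b"music-03"
ppu = xor_blocks([dru1, dru2, dru3])

# Clean case: DRU2 is lost and nothing else changed since the Update.
restored_ok = xor_blocks([dru1, dru3, ppu])

# "Safety off" case: a file on DRU1 is edited after the Update
# (no new Update is run), and then DRU2 is lost.
dru1_edited = b"movie-XX"
restored_bad = xor_blocks([dru1_edited, dru3, ppu])

print(restored_ok == dru2)    # restore matches the lost data
print(restored_bad == dru2)   # restore no longer matches
print(restored_bad[:6] == dru2[:6])  # bytes outside the edited range still match
```

Note that the damage shows up on the restored drive (DRU2 here), not on the drive that was edited, and only in the byte range that overlaps the edit.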

I suspect you have a media cataloging tool that wrote metadata information to the affected files. Again, this is fine. However, the cardinal rule is that you must promptly run the Update task to sync up the changes.
One more time, this is a strict rule that applies to Edits (and deletes when not using the recycle bin feature).
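
One rough way to check whether anything was touched since the last Update is to compare file modification times against the time the Update ran. The sketch below is not a FlexRAID feature; `files_modified_since` and the demo file names are hypothetical, and it assumes you know when the last Update finished. It will also flag brand-new files (which are safe) and will miss tools that reset the modification time after writing.

```python
import os
import tempfile
import time

def files_modified_since(root, last_update_ts):
    """Return sorted paths under root whose mtime is later than last_update_ts."""
    changed = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                if os.stat(path).st_mtime > last_update_ts:
                    changed.append(path)
            except OSError:
                # File vanished or is unreadable; skip it in this sketch.
                continue
    return sorted(changed)

# Demo on a throwaway directory standing in for a DRU.
with tempfile.TemporaryDirectory() as root:
    untouched = os.path.join(root, "untouched.mkv")
    retagged = os.path.join(root, "retagged.mp3")
    for p in (untouched, retagged):
        open(p, "w").close()
    last_update = time.time() - 3600  # pretend the Update ran an hour ago
    os.utime(untouched, (last_update - 60, last_update - 60))
    os.utime(retagged, (last_update + 60, last_update + 60))
    changed = [os.path.basename(p) for p in files_modified_since(root, last_update)]

print(changed)
```

If the list is not empty, run the Update task before relying on the parity.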

If Edits are frequent in your deployment and you don't like these "safety off!" moments, then you should look into tRAID, which provides real-time RAID and does not have this limitation.

FYI, I have entered this feature request: http://bug.flexraid.com/view.php?id=457
Effectively, when implemented, users will have the option of having the Update task be triggered automatically on unsafe operations.

Online sforgez

« Reply #2 on: November 29, 2017, 05:43:11 pm »
Aahhh right... forgot about this...
I always hesitated to move to tRAID because of the write speed limitations.
But it seems more reliable, so I'll read up on it... Because right now I don't trust the parity; I'll always be anxious that something was modified and made my parity basically useless...
Needing an update as soon as something is modified kills the worry-free part of it, and I have enough to deal with already...

Thanks for the fast reply and the explanation, I appreciate it...

Offline Brahim

« Reply #3 on: November 29, 2017, 10:25:08 pm »
If you configure a landing disk, you will have comparable write speeds in tRAID as you do in RAID-F. See part 4 of this wiki article: http://wiki.flexraid.com/2013/06/27/performance-tuning-in-transparent-raid/ as well as this wiki article: http://wiki.flexraid.com/2013/06/27/storage-acceleration/

You can even achieve faster speeds than RAID-F if the landing disk is an SSD. :)
That said, future releases of RAID-F will also have the landing disk option. So ultimately, write speeds will be comparable. Read speeds are already the same.

Snapshot RAID is best when you have a staging disk where you do all your edits, then copy the data to its final destination in the Snapshot array.
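
A minimal sketch of that staging workflow in Python, assuming the staging disk and the array are just two directory trees. The function name `publish_from_staging` and the demo file names are hypothetical. The one design choice worth noting is that it never overwrites a file already in the array, since an overwrite would count as an edit.

```python
import os
import shutil
import tempfile

def publish_from_staging(staging_root, array_root):
    """Copy files from staging into the array; skip any file that already exists."""
    copied, skipped = [], []
    for dirpath, _dirnames, filenames in os.walk(staging_root):
        rel_dir = os.path.relpath(dirpath, staging_root)
        for name in filenames:
            src = os.path.join(dirpath, name)
            dst = os.path.normpath(os.path.join(array_root, rel_dir, name))
            if os.path.exists(dst):
                # Overwriting would be an in-place edit of protected data.
                skipped.append(dst)
                continue
            os.makedirs(os.path.dirname(dst), exist_ok=True)
            shutil.copy2(src, dst)  # copy2 also preserves timestamps
            copied.append(dst)
    return sorted(copied), sorted(skipped)

# Demo with throwaway directories standing in for the two disks.
with tempfile.TemporaryDirectory() as staging, tempfile.TemporaryDirectory() as array:
    with open(os.path.join(staging, "new.mkv"), "w") as f:
        f.write("finished edit")
    with open(os.path.join(staging, "old.mkv"), "w") as f:
        f.write("changed")
    with open(os.path.join(array, "old.mkv"), "w") as f:
        f.write("original")
    copied, skipped = publish_from_staging(staging, array)
    copied_names = [os.path.basename(p) for p in copied]
    skipped_names = [os.path.basename(p) for p in skipped]
    with open(os.path.join(array, "old.mkv")) as f:
        old_content = f.read()

print(copied_names, skipped_names, old_content)
```

Anything that lands in the skipped list is a file you would be editing in the array, so handle it deliberately and run the Update task right after.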