Author Topic: Question on Maintaining Real-time Raid - what are best practices?  (Read 3850 times)

Offline bigbob

  • Newbie
  • *
  • Posts: 36
  • Karma: +0/-0
    • View Profile
How do I know if everything is working OK now? Or, to put it another way, how intelligent is the Real-time RAID now?  I know that Brahim has been loading the RT up with all kinds of new features, but I don't want to assume it's set-it-and-forget-it if it's not.

I’ve used the snapshot raid since version 1.7 and I know that in those days, we were supposed to run verify, validate, update, etc every so often.

What kind of maintenance is required with a Real-time RAID now?  Is it all auto now and we just need to configure SMS or email notification in case FlexRAID discovers an error?

Second question, does the parity update happen instantly as the name implies?  For example, if I move some new files to one of the directories on one of the DRUs in the RT raid, and that drive dies right after the move is complete, could I expect to be able to recover those new files?

Here’s my setup:
Using FlexRAID-2.0-Public-Beta12
Real-time created using expert mode (I tried the cruise control with storage pool/RT combined and I FUBAR’d that)

5 drives: 3 DRU, 1 PPU.

On the DRUs, I’m NOT using the whole drives.  I created a directory on each drive in the DRU called FlexRAID <orig drive letter>.  Then I added those directories to create my DRUs.  The PPU drive is dedicated 100% to parity use, no sub-directories used.

Offline bigbob

Re: Question on Maintaining Real-time Raid - what are best practices?
« Reply #1 on: January 15, 2012, 10:13:51 am »
Follow on questions. 
After I add directories to the DRUs, add a drive as PPU, then choose to create the Real-time RAID under command execution center, do I have to do anything else?  Does the RT Raid run from then on?  Will it automatically restart after reboot?  I cannot find a status on the RT RAID anywhere in the client UI.

Also, in the client UI (remember I'm using just Real-time RAID set up under expert mode) there are options to start the storage pool.  Would this work?  Can I just select the auto-start storage pool option, and the drive letter I want it to represent to convert my RT RAID into RT RAID Storage Pool?

I'm curious about the status of my RT Raid because copying a file to a directory in the RT RAID does not seem to trigger any further disk activity (such as the PPU updating) or a change in the log.

Thanks

Here's my log since the last reboot - I don't see anything here that RT RAID is on and functioning except the WARN notifications.

[2012-01-15 10:02:56,339] INFO : Setting core service port to 9595
[2012-01-15 10:02:56,339] INFO : Setting Http port to 8080
[2012-01-15 10:02:56,339] INFO : Https port not set. Https requests will not be serviced...
[2012-01-15 10:02:57,244] INFO : Starting FlexRAID...
[2012-01-15 10:02:57,244] INFO : FlexRAID 2.0 Public Beta12 [Snapshot 1.4 stable / Real-Time 1.0 Beta / Storage Pool 1.0 beta]
[2012-01-15 10:02:57,290] INFO : Started: January 15, 2012 10:02:57 AM CST
[2012-01-15 10:03:02,953] INFO : Initialized Scheduler Signaller of type: class org.quartz.core.SchedulerSignalerImpl
[2012-01-15 10:03:02,953] INFO : Quartz Scheduler v.1.8.4 created.
[2012-01-15 10:03:02,953] INFO : Registering Quartz shutdown hook.
[2012-01-15 10:03:02,953] INFO : Using thread monitor-based data access locking (synchronization).
[2012-01-15 10:03:03,655] INFO : Removed 0 Volatile Trigger(s).
[2012-01-15 10:03:03,655] INFO : Removed 0 Volatile Job(s).
[2012-01-15 10:03:03,655] INFO : JobStoreTX initialized.
[2012-01-15 10:03:03,655] INFO : Scheduler meta-data: Quartz Scheduler (v1.8.4) 'flexraid' with instanceId '1'
  Scheduler class: 'org.quartz.core.QuartzScheduler' - running locally.
  NOT STARTED.
  Currently in standby mode.
  Number of jobs executed: 0
  Using thread pool 'org.quartz.simpl.SimpleThreadPool' - with 3 threads.
  Using job-store 'org.quartz.impl.jdbcjobstore.JobStoreTX' - which supports persistence. and is not clustered.

[2012-01-15 10:03:03,655] INFO : Quartz scheduler 'flexraid' initialized from default resource file in Quartz package: 'quartz.properties'
[2012-01-15 10:03:03,655] INFO : Quartz scheduler version: 1.8.4
[2012-01-15 10:03:04,404] INFO : Freed 0 triggers from 'acquired' / 'blocked' state.
[2012-01-15 10:03:04,420] INFO : Recovering 0 jobs that were in-progress at the time of the last shut-down.
[2012-01-15 10:03:04,420] INFO : Recovery complete.
[2012-01-15 10:03:04,420] INFO : Removed 0 'complete' triggers.
[2012-01-15 10:03:04,420] INFO : Removed 0 stale fired job entries.
[2012-01-15 10:03:04,435] INFO : Scheduler flexraid_$_1 started.
[2012-01-15 10:03:04,704] INFO : jetty-7.x.y-SNAPSHOT
[2012-01-15 10:03:05,336] INFO : started o.e.j.s.ServletContextHandler{/,null}
[2012-01-15 10:03:05,543] INFO : Started SocketConnector@0.0.0.0:8080 STARTING
[2012-01-15 11:01:11,501] WARN : 31
[2012-01-15 11:01:58,075] WARN : 31
[2012-01-15 11:01:58,092] WARN : 31
[2012-01-15 11:02:08,281] WARN : 31
[2012-01-15 11:02:22,942] WARN : 31
[2012-01-15 11:02:22,942] WARN : 31
[2012-01-15 11:02:31,346] WARN : 30
[2012-01-15 11:02:42,523] WARN : 30
[2012-01-15 11:02:42,524] WARN : 30
[2012-01-15 11:02:49,880] WARN : 35
[2012-01-15 11:02:59,640] WARN : 35
[2012-01-15 11:02:59,641] WARN : 35
[2012-01-15 11:03:11,447] WARN : For input string: "550 (Average 506)"
[2012-01-15 11:03:17,419] WARN : 31
[2012-01-15 11:03:22,752] WARN : 31
[2012-01-15 11:03:22,752] WARN : 31
[2012-01-15 11:03:34,669] WARN : For input string: "276 (Average 292)"
[2012-01-15 11:03:53,286] WARN : 30
[2012-01-15 11:03:58,275] WARN : 30
[2012-01-15 11:03:58,275] WARN : 30
[2012-01-15 11:04:40,550] WARN : 31
« Last Edit: January 15, 2012, 11:48:05 am by bigbob »
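
The two `For input string` warnings in the log above have the shape of a Java `NumberFormatException` message, which suggests that a value such as "550 (Average 506)" was handed to an integer parser. The following is a minimal Python sketch of a more tolerant parse, assuming the leading number is the current reading and the parenthesised number is an average. The function name and the regular expression are illustrative and are not taken from FlexRAID.

```python
import re
from typing import Optional, Tuple

# Hypothetical pattern: a leading integer, optionally followed by "(Average N)".
_VALUE = re.compile(r"^\s*(\d+)(?:\s*\(Average\s+(\d+)\))?\s*$")


def parse_smart_value(text: str) -> Tuple[int, Optional[int]]:
    """Return (current, average); average is None when it is absent."""
    match = _VALUE.match(text)
    if match is None:
        raise ValueError(f"unrecognised SMART value: {text!r}")
    current = int(match.group(1))
    average = int(match.group(2)) if match.group(2) is not None else None
    return current, average


if __name__ == "__main__":
    print(parse_smart_value("550 (Average 506)"))
    print(parse_smart_value("31"))
```

Returning the average separately keeps the plain integer available to any caller that only wants the current reading.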

Offline Brahim

  • Global Moderator
  • Hero Member
  • *****
  • Posts: 8,547
  • Karma: +204/-16
    • View Profile
« Reply #2 on: January 15, 2012, 01:10:57 pm »
1. Real-Time RAID is just as it implies: real time.
It behaves exactly as if you had hardware or software RAID. That is, the parity is updated in the same transaction in which each data packet is written.

2. You've bought yourself some maintenance by using RT RAID in Expert mode. RT RAID in Expert mode is not supported and is considered experimental. The risk you run is having other programs/processes change the data outside of the pool. Because of that, you will need to run the Verify or Force-Sync-Verify task more often than Cruise Control users.

3. There is really no maintenance with RT RAID unlike Snapshot. That said, it makes sense to run the Verify or Force-Sync-Verify task once in a while (monthly?).

4. Usually, when an error occurs for whatever reason, FlexRAID might ask you to reconcile the RAID. In some cases, you might need to restore the RAID filesystem databases state from backup.

5. Please give me a screenshot of the Smart Attributes screen for one of your drives where an attribute has a value like "550 (Average 506)".
« Last Edit: January 15, 2012, 01:21:27 pm by Brahim »
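
To illustrate point 1, here is a minimal sketch of single-parity behaviour in Python, assuming plain XOR parity as in RAID 4/5. FlexRAID's actual parity engine is not described in this thread, so the class and method names below are hypothetical. Each write updates the parity in the same step, which is why a block written just before a drive failure could still be rebuilt from the surviving drives plus parity.

```python
from functools import reduce
from typing import List


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


class ToyRealTimeArray:
    """Toy model: each 'drive' holds one fixed-size block, plus one parity block."""

    def __init__(self, drive_count: int, block_size: int) -> None:
        self.drives: List[bytes] = [bytes(block_size) for _ in range(drive_count)]
        self.parity: bytes = bytes(block_size)

    def write(self, drive: int, data: bytes) -> None:
        old = self.drives[drive]
        if len(data) != len(old):
            raise ValueError("data must match the block size")
        # new_parity = old_parity XOR old_data XOR new_data, done with the write.
        self.parity = xor_bytes(self.parity, xor_bytes(old, data))
        self.drives[drive] = data

    def rebuild(self, lost: int) -> bytes:
        # XOR of the parity and every surviving drive yields the lost block.
        survivors = [d for i, d in enumerate(self.drives) if i != lost]
        return reduce(xor_bytes, survivors, self.parity)


if __name__ == "__main__":
    array = ToyRealTimeArray(drive_count=3, block_size=4)
    array.write(0, b"movi")
    array.write(1, b"pics")
    array.write(2, b"tune")
    print(array.rebuild(1))
```

The sketch also shows why writes that bypass the array are a problem: changing `drives[i]` directly would leave `parity` stale, and a later rebuild would return wrong data.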

Offline bigbob

« Reply #3 on: January 15, 2012, 02:58:44 pm »
Error logs   
SMART Error Log Version: 1
ATA Error Count: 82 (device log contains only the most recent five errors)
   CR = Command Register [HEX]
   FR = Features Register [HEX]
   SC = Sector Count Register [HEX]
   SN = Sector Number Register [HEX]
   CL = Cylinder Low Register [HEX]
   CH = Cylinder High Register [HEX]
   DH = Device/Head Register [HEX]
   DC = Device Command Register [HEX]
   ER = Error register [HEX]
   ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 82 occurred at disk power-on lifetime: 6725 hours (280 days + 5 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  84 51 08 f8 50 e7 40  Error: ICRC, ABRT at LBA = 0x00e750f8 = 15159544

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 08 00 f8 50 e7 40 00   2d+11:33:10.000  READ FPDMA QUEUED
  60 08 00 f8 50 e7 40 00   2d+11:33:09.900  READ FPDMA QUEUED
  60 40 00 70 de 95 40 00   2d+11:33:09.900  READ FPDMA QUEUED
  60 38 00 00 de 95 40 00   2d+11:33:09.900  READ FPDMA QUEUED
  61 08 00 00 b2 98 40 00   2d+11:33:09.900  WRITE FPDMA QUEUED

Error 81 occurred at disk power-on lifetime: 6725 hours (280 days + 5 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  00 51 08 f8 50 e7 40  Error:  at LBA = 0x00e750f8 = 15159544

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  60 08 00 f8 50 e7 40 00   2d+11:33:10.000  READ FPDMA QUEUED
  60 08 00 f8 50 e7 40 00   2d+11:33:09.900  READ FPDMA QUEUED
  60 40 00 70 de 95 40 00   2d+11:33:09.900  READ FPDMA QUEUED
  60 38 00 00 de 95 40 00   2d+11:33:09.900  READ FPDMA QUEUED
  61 08 00 00 b2 98 40 00   2d+11:33:09.900  WRITE FPDMA QUEUED

Error 80 occurred at disk power-on lifetime: 6725 hours (280 days + 5 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  84 51 98 68 a8 98 4c  Error: ICRC, ABRT at LBA = 0x0c98a868 = 211331176

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  61 f8 00 08 a8 98 40 00   2d+11:33:01.100  WRITE FPDMA QUEUED
  61 10 00 48 51 e7 40 00   2d+11:33:01.100  WRITE FPDMA QUEUED
  61 08 00 f8 67 2a 40 00   2d+11:33:01.100  WRITE FPDMA QUEUED
  61 f8 00 00 67 2a 40 00   2d+11:33:01.100  WRITE FPDMA QUEUED
  61 18 00 38 03 15 40 00   2d+11:33:01.100  WRITE FPDMA QUEUED

Error 79 occurred at disk power-on lifetime: 6725 hours (280 days + 5 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  84 51 99 67 3a 8d 45  Error: ICRC, ABRT at LBA = 0x058d3a67 = 93141607

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  61 c8 00 38 3a 8d 40 00   2d+11:32:57.100  WRITE FPDMA QUEUED
  61 38 00 00 3a 8d 40 00   2d+11:32:57.100  WRITE FPDMA QUEUED
  61 c8 00 38 39 8d 40 00   2d+11:32:57.100  WRITE FPDMA QUEUED
  61 38 00 00 39 8d 40 00   2d+11:32:57.100  WRITE FPDMA QUEUED
  61 c8 00 38 38 8d 40 00   2d+11:32:57.100  WRITE FPDMA QUEUED

Error 78 occurred at disk power-on lifetime: 6725 hours (280 days + 5 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  84 51 18 e8 a4 98 4c  Error: ICRC, ABRT at LBA = 0x0c98a4e8 = 211330280

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  61 f8 00 08 a4 98 40 00   2d+11:32:53.000  WRITE FPDMA QUEUED
  61 08 00 f8 5f 2a 40 00   2d+11:32:53.000  WRITE FPDMA QUEUED
  61 f8 00 00 5f 2a 40 00   2d+11:32:53.000  WRITE FPDMA QUEUED
  61 d0 00 30 ab d2 40 00   2d+11:32:53.000  WRITE FPDMA QUEUED
  61 30 00 00 ab d2 40 00   2d+11:32:53.000  WRITE FPDMA QUEUED
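
The LBA printed on each "Error:" line can be reproduced from the register dump above it. Below is a minimal Python sketch, assuming the 28-bit layout that matches these lines (low nibble of DH, then CH, CL, SN) and the standard ATA error-register bits for ICRC and ABRT. The function names are illustrative. ICRC denotes an interface CRC error, which is often associated with cabling or connector problems, though the log alone does not prove the cause.

```python
from typing import List

# ATA error-register bits (only a few are listed; enough for this log).
ERROR_BITS = {0x80: "ICRC", 0x40: "UNC", 0x10: "IDNF", 0x04: "ABRT"}


def decode_error_flags(er: int) -> List[str]:
    """Names of the set bits in the ER register, highest bit first."""
    return [name for bit, name in sorted(ERROR_BITS.items(), reverse=True) if er & bit]


def decode_lba28(sn: int, cl: int, ch: int, dh: int) -> int:
    """Combine SN, CL, CH and the low nibble of DH into a 28-bit LBA."""
    return ((dh & 0x0F) << 24) | (ch << 16) | (cl << 8) | sn


if __name__ == "__main__":
    # Error 82: ER=84, SN=f8, CL=50, CH=e7, DH=40
    print(decode_error_flags(0x84), decode_lba28(0xF8, 0x50, 0xE7, 0x40))
    # Error 80: ER=84, SN=68, CL=a8, CH=98, DH=4c
    print(decode_error_flags(0x84), decode_lba28(0x68, 0xA8, 0x98, 0x4C))
```

For Error 82 this yields the flags ICRC and ABRT and the LBA 15159544, matching the line in the log.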

Offline bigbob

« Reply #4 on: January 15, 2012, 04:19:42 pm »

2. You've bought yourself some maintenance by using RT RAID in Expert mode. RT RAID in Expert mode is not supported and is considered experimental. The risk you run is having other programs/processes change the data outside of the pool. Because of that, you will need to run the Verify or Force-Sync-Verify task more often than Cruise Control users.



Brahim,

Thanks for the reply.  Ok, obviously I'm confused.  Even after spending considerable time reading here in this forum and on the WIKI, I guess I'm completely misunderstanding real-time RAID (RTR) and what you can and can't do with the pools.  As far as RTR, I imagined it to be just like Snapshot RAID except it was always updating the parity.  Can you point me to the best / most recent information/guide on how RTR works and should be used?  I definitely chose Real-time raid and NOT storage pool when I set it up.  But since you called it a pool,  I just tried and apparently successfully started a storage pool in the "storage pool merge configuration" option page which created a new U drive.  That's what I was asking about in my first post. I guess REAL-TIME means you get a storage pool whether you want it or not?

 And ya, I've been sharing and writing to those drives outside of the pool up until now so 

What is the difference between Verify and Force-Sync-Verify?
Is there an easy way to get from my Expert RTR over to cruise control without deleting my current RTR and restarting from scratch and rebuilding the parity db?
 

Since I am now the proud owner of an unwanted storage pool (kids - this is why you should never practice unprotected RAID),  I have another question that pertains to pools.  I have 3 DRUs composed of 4 drives.  The Drives already had directories containing data on them.  Let's say

DRU1           DRU2           DRU3
Directory A    Directory B    Directory C
                              Directory D

As expected, now the storage pool has all of the directories under one drive letter:
Drive U=
Directory A
Directory B
Directory C
Directory D

Can I mess with these original Directories once they're in the pool?  If Dir A & Dir B both contain say movies, can I create a new Dir E in the storage pool and move the original Dir A&B into E?

OR, is there a way to start from scratch without being locked from your data for 24 hrs while the parity drive is built? I love your product and I thought I understood it pretty well but I have pretty much failed at this since you introduced the new "FlexRAID for Dummies".  Maybe this is just an IQ test that I am miserably failing, but I've tried using Cruise Control at least four or five times now.  Each time has failed or shortly blown up and not having my data (movies, TV, pics, music, etc) available for 24 hrs while the parity builds is very inconvenient to say the least.

So many questions....

Thanks for your time and effort in advance.

Offline Brahim

« Reply #5 on: January 15, 2012, 04:49:30 pm »
1. Never choose Expert unless you have full confidence in your expertise on the product. ;)

2. RT RAID does not support writing to the source drives. You MUST always write through the pool. This is why you always get a pool with RT RAID. That said, FlexRAID is flexible enough that writes made directly to the source drives (not recommended) can be dealt with by running the Reconcile task.
Again, RT RAID in Expert mode is not supported and better left to unique cases.

3. Think of RT RAID as a RAID 5 volume whose member drives are left in a compatible format, yet still must not be written to directly, only through the RAID volume.

4. Verify does bit-for-bit verification and will abort at the first error. Force-Sync-Verify also does bit-for-bit verification but will re-compute the parity on error.

5. There is no migration between Expert and Cruise Control configurations.

6. There is a feature request to allow access to the data while things are building. Will work on it when time permits.
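
The difference described in point 4 can be sketched as follows in Python, assuming simple XOR parity over equal-sized blocks. The function names and the in-memory block lists are hypothetical stand-ins, not FlexRAID's API: `verify` stops at the first mismatch, while `force_sync_verify` recomputes the parity for every mismatching stripe and keeps going.

```python
from functools import reduce
from typing import List, Optional

Stripe = List[bytes]  # one block per data drive


def expected_parity(stripe: Stripe) -> bytes:
    """XOR all data blocks of one stripe together."""
    return reduce(lambda a, b: bytes(x ^ y for x, y in zip(a, b)), stripe)


def verify(stripes: List[Stripe], parity: List[bytes]) -> Optional[int]:
    """Return the index of the first bad stripe, or None if all match."""
    for index, stripe in enumerate(stripes):
        if expected_parity(stripe) != parity[index]:
            return index  # abort at the first error
    return None


def force_sync_verify(stripes: List[Stripe], parity: List[bytes]) -> List[int]:
    """Check every stripe and rewrite parity where it differs; return repaired indexes."""
    repaired = []
    for index, stripe in enumerate(stripes):
        good = expected_parity(stripe)
        if good != parity[index]:
            parity[index] = good  # trust the data, re-compute the parity
            repaired.append(index)
    return repaired


if __name__ == "__main__":
    stripes = [[b"\x01", b"\x02"], [b"\x0f", b"\xf0"], [b"\xaa", b"\x55"]]
    parity = [b"\x03", b"\x00", b"\x00"]  # stripes 1 and 2 are stale
    print(verify(stripes, parity))
    print(force_sync_verify(stripes, parity))
    print(verify(stripes, parity))
```

Note that the second function treats the data as authoritative, so in this model it repairs stale parity but cannot detect data that was itself corrupted.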


Offline bigbob

« Reply #6 on: January 15, 2012, 07:22:26 pm »
1. Never choose Expert unless you have full confidence in your expertise on the product. ;)

LOL.  Thanks for the personal hand holding.  I apologize for using up your valuable time.  However, I’ve always thought that expertise is best judged in the rearview mirror and every mistake is really just a new discovery.   Furthermore, I suspect if we all followed this advice (on this point only) then you might be the only guy in this forum.   ;)

I had just read the following before I began my misinformed adventure here down expert lane and I must have totally misinterpreted it.

You can only have one configuration utilizing the storage pool in the current releases.
A Cruise Control configuration, a Storage Pool configuration in Expert mode, and a Real-Time parity configuration in Expert mode are mutually exclusive.
The only thing you can have a multiple configuration of is Snapshot RAID in Expert mode.


Either way, I'll drag myself and "your" toys back into the sandbox and behave.  Like I said, I didn't realize Real-time RAID = Storage Pool or that I was doing something unsupported and ill-advised.  I still haven’t been able to get Cruise Control to work for me while expert mode works fine every time.  I know-I’m playing with fire.

6. There is a feature request to allow access to the data while things are building. Will work on it when time permits.

This would be a REALLY, REALLY great feature as it would minimize the pain and learning curve of beta testing. 

Do you mind answering my original question about pools since it looks like a pool is in my future one way or the other?


I have 3 DRUs composed of 4 drives.  The Drives already had directories containing data on them.  Let's say

DRU1           DRU2           DRU3
Directory A    Directory B    Directory C
                              Directory D

As expected, now the storage pool has all of the directories under one drive letter:
Drive U=Directory A, Directory B, Directory C, Directory D

Can I mess with these original Directories once they're in the pool?  If Dir A & Dir B both contain say movies, can I create a new Dir E in the storage pool and move the original Dir A&B into E?
« Last Edit: January 15, 2012, 07:25:30 pm by bigbob »

Offline xbmcg

  • Newbie
  • *
  • Posts: 10
  • Karma: +0/-0
    • View Profile
« Reply #7 on: January 19, 2012, 08:48:41 am »
I don't know if it is a bug, a feature, or just a lack of knowledge on my part.

I tried to create an RT RAID based on folder DRUs / PPUs. Cruise Control does not let me do that; it shows only physical disks.

Anyway, I did it using Expert mode and found that, even though the DRUs have just a few MB in them, the PPU grows to almost 99% of the remaining drive space, some 50 GB.

In snapshot mode, the PPU is only a little larger than the fill level of a DRU.

Is this a bug or a feature of RT RAID?

Does RT RAID work at the device level (like real hardware RAID 4), or is this just a missing option?

Offline Brahim

« Reply #8 on: January 19, 2012, 01:36:18 pm »
@xbmcg
That's how it works.

Offline bigbob

« Reply #9 on: January 19, 2012, 05:26:35 pm »
Brahim,

I think my original BSOD issue was a loose SATA cable, but I can't know for sure because I haven't gotten a chance to test it again with WMC.  I never heard from anyone on how to run chkdsk on a drive in the pool.  While I was trying to figure this out myself, one of the drives disappeared completely.  I reseated all the cables and voila, the drive was back online, so I deleted the pool and started over since I didn't know if that had introduced some unknown variable.  After having completely removed FlexRAID, I ran chkdsk on every drive and every one passed.  I also defragged each drive, then started over.

I reinstalled FR 13, using CC to set up RTR.  I initialized.  That finished fine.  I changed the pool parameters to non-removable and NTFS.  I started the pool manually, which worked, as indicated by the new drive window popping up.  I set the pool to auto start and rebooted.  Now it says it needs a reconcile.

When I run reconcile, I get the following error about not enough room on the parity drive.  I also tried the Force-Sync-Verify, but it says I have to reconcile first.  Is there anything else to do other than delete the config and start over?

[2012-01-19 14:14:18,394] ERROR: [MappedParityMeta::allocate] ErrorCode=31 Message=Out of real-time parity space error for \\?\C:\FlexRAID-Managed-Pool\class1_0\{cedc162d-0feb-11e0-9dff-0050ba435163}\DRU1.db requested=15590928 available=12911472
java.lang.RuntimeException: [MappedParityMeta::allocate] ErrorCode=31 Message=Out of real-time parity space error for \\?\C:\FlexRAID-Managed-Pool\class1_0\{cedc162d-0feb-11e0-9dff-0050ba435163}\DRU1.db requested=15590928 available=12911472
   at java.lang.Void.<unknown>(Unknown Source)
   at java.lang.Void.<unknown>(Unknown Source)
   at java.lang.Void.<unknown>(Unknown Source)
   at java.lang.Void.<unknown>(Unknown Source)
Caused by: java.lang.IllegalArgumentException: [MappedParityMeta::allocate] ErrorCode=31 Message=Out of real-time parity space error for \\?\C:\FlexRAID-Managed-Pool\class1_0\{cedc162d-0feb-11e0-9dff-0050ba435163}\DRU1.db requested=15590928 available=12911472
   ... 4 more
[2012-01-19 14:25:03,495] TRACE: Executing: live
[2012-01-19 14:25:03,511] TRACE:    => {"success": true, "status": {"referenceCode": "2", "task": "force-sync-verify", "status": "STATUS_STARTED", "progress": null, "report": null, "processes": 0, "processesRequested": 0}, "commandMessages": [{"messageCode": "successTaskInitiated", "messageData": ["force-sync-verify", "2"]}], "serverMessages": null}
[2012-01-19 14:25:03,923] ERROR: Need for reconciliation detected! Please run the reconcile task...
[2012-01-19 14:25:03,927] ERROR: Need for reconciliation detected! Please run the reconcile task...
java.io.IOException: Need for reconciliation detected! Please run the reconcile task...
   at java.lang.Void.<unknown>(Unknown Source)
   at java.lang.Void.<unknown>(Unknown Source)
   at java.lang.Void.<unknown>(Unknown Source)
   at java.lang.Void.<unknown>(Unknown Source)
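
The first error in the log above states how much real-time parity space was requested and how much was available. Here is a small Python sketch for pulling those two numbers out of such a line and computing the shortfall. The regular expression is written against the line shown above, the sample string shortens the file path for readability, and the unit of the figures is not stated in the log, so none is assumed.

```python
import re
from typing import Optional, Tuple

# Matches the "ErrorCode=... requested=... available=..." fields of the error line.
_SPACE = re.compile(r"ErrorCode=(\d+).*?requested=(\d+)\s+available=(\d+)")


def parity_shortfall(line: str) -> Optional[Tuple[int, int]]:
    """Return (error_code, requested - available), or None if the line does not match."""
    match = _SPACE.search(line)
    if match is None:
        return None
    code, requested, available = (int(group) for group in match.groups())
    return code, requested - available


if __name__ == "__main__":
    # Path shortened for readability; the numeric fields are copied from the log.
    sample = (
        "ERROR: [MappedParityMeta::allocate] ErrorCode=31 "
        "Message=Out of real-time parity space error for DRU1.db "
        "requested=15590928 available=12911472"
    )
    print(parity_shortfall(sample))
```

For the logged values the shortfall is 15590928 - 12911472 = 2679456, in whatever unit FlexRAID uses there.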


Obviously, this Parity drive or perhaps the SATA card it's plugged into is suspect.  It's one of those Samsung 2TB drives that needed to be flashed (which I did as soon as I saw your post here about the issue).  It has gone offline on its own twice in the last 2 days.  When it's running, it shows zero errors under chkdsk or smartmon, so I suspect a bad cable (now replaced) or a loose connection (now stabilized).  Time will tell.

Here's how I managed to run chkdsk on the drives in the pool:
I don't know if this is sanctioned, supported or even wise but until someone "officially addresses it" this is all I have


My assumption is that if chkdsk finds any errors and makes a repair, you need to run one of the "Advance FlexRAID commands" (recreate, Force-Sync-Verify, or Reconcile) as soon as you're done and FlexRAID is running again.


Stop the pool if it's running.
Stop FlexRAID.
Right-click Computer, choose Manage, then Disk Management.
Right-click the drive that you want to run chkdsk on and choose "Change Drive Letter and Paths...".
DO NOT REMOVE THE CURRENT MOUNT POINT THAT'S THERE NOW.
Instead click Add, then choose "Assign the following drive letter", pick a drive letter in the drop-down, then click OK.
Do that for every drive you need to check.
Run chkdsk on every drive that you just added a drive letter to (right-click the drive or partition in the same console screen, choose Properties, Tools, and Check Now, with "Automatically fix file system errors" checked).
You might have to type y and press Enter in the DOS box to confirm that you want to close any open handles to the drive.

Let chkdsk run and note any errors.

When it's done, reverse the process by removing ONLY the drive letters that you just added, and ONLY ON THE DRIVES THAT YOU ORIGINALLY CHANGED.
AGAIN, DO NOT REMOVE THE CURRENT MOUNT POINT THAT'S THERE NOW OR THE DRIVE LETTER FROM A DRIVE NOT IN THE POOL.  I SUGGEST WRITING EVERYTHING DOWN AS YOU GO TO MAKE CERTAIN YOU DON'T MESS UP.
In my case, I could not remove the drive letter that I had just added without booting into safe mode first.
If you need to do this, click the Start button, type msconfig in the "Search programs and files" box, and hit Enter.
Under the second tab, called Boot, put a check mark in the Safe boot box.
(I also went into Services and changed the FlexRAID service from automatic start to manual, just in case it runs in safe mode.)
Then I rebooted, repeated the same steps, and removed only the drive letter(s), not the mount points.
Then I changed the FlexRAID service back to automatic start, went back into msconfig, and removed the Safe boot option, making sure that Normal startup was selected on the first tab, General.
Reboot and you should be good to go.

Offline Brahim

« Reply #10 on: January 19, 2012, 07:23:59 pm »
@bigbob

1. If you try to reconcile and you get an error, restore from the last backup. FlexRAID makes automatic backups, and it is recommended that users make an additional backup after copying large amounts of data.

2. It is true that you might need to reconcile after running chkdsk or similar tools if they alter the source content at any point.

3. If the source drive is not visible to a tool when the said drive is under the control of FlexRAID, it is for good reason. It likely means that the tool is attempting direct filesystem access, which is a no-no when it comes to RT RAID. If you circumvent that, do remember to reconcile.

Offline bigbob

« Reply #11 on: January 19, 2012, 08:42:06 pm »
Thanks Brahim.  Do you have an official policy, recommendation, guide etc on this?  There are certain scenarios (BSOD, power outage, stupidity, accidents, etc) where it might be a good idea to run a chkdsk.  At least I almost always do a chkdsk after some event like that.

Will your auto correction handle chores like that?  It was a pain in the butt to assign a drive letter to each drive in the pool, run chkdsk and then remove the drive letter again. 

It would be nice if you could develop an option that runs a script to do all of that and then triggers the reconcile task if chkdsk reports that it made corrections.  Perhaps you're way ahead of me on this.
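
A rough sketch of the kind of script described here, in Python. It relies on the exit codes Microsoft documents for `chkdsk` (0: no errors found, 1: errors found and fixed, 2: cleanup performed or skipped because `/f` was not given, 3: the disk could not be checked or errors could not be fixed). How a reconcile would be triggered is not covered in this thread, so that step is left as a caller-supplied, hypothetical callback, and the drive-letter juggling from the manual procedure is not automated here.

```python
import subprocess
from typing import Callable, Iterable, List


def chkdsk_exit_code(drive: str) -> int:
    """Run 'chkdsk <drive> /f' (Windows, elevated prompt) and return its exit code."""
    return subprocess.run(["chkdsk", drive, "/f"]).returncode


def needs_reconcile(exit_code: int) -> bool:
    """Conservative assumption: anything other than 'no errors found' warrants a reconcile."""
    return exit_code != 0


def check_drives(
    drives: Iterable[str],
    reconcile: Callable[[], None],  # hypothetical hook, supplied by the caller
    run: Callable[[str], int] = chkdsk_exit_code,
) -> List[str]:
    """Check each drive; call reconcile once if any drive was flagged."""
    flagged = [drive for drive in drives if needs_reconcile(run(drive))]
    if flagged:
        reconcile()
    return flagged


if __name__ == "__main__":
    # Dry run with canned exit codes instead of real chkdsk calls.
    canned = {"E:": 0, "F:": 1}
    print(check_drives(canned, reconcile=lambda: print("reconcile needed"), run=canned.get))
```

The `run` parameter exists so the decision logic can be tried without touching real disks. When chkdsk asks to close open handles, as mentioned earlier in the thread, a real run would still need someone to answer the prompt.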

Either way, you're doing a great job.

Thanks again.