Topic: "Zero Read error" 2.0 p 12 - RAID6 T2+

Fastrack

  • Jr. Member
  • Posts: 63
« on: September 29, 2011, 03:19:22 pm »
Hi,

5 DRUs (added 8 for future)
2 PPUs

The first DRU is two drive letters, D: and I:, on a single 2TB disk.
DRUs 2-5 are 2TB drives E: through H:.

This happened during the Create process. The file itself is fine: there are no read errors, and I can copy it to another drive. Any ideas? I've kicked off the Create process again, but it'll probably fail.

Forgot to mention: I paused it for about 2.5 hours after it had run for 3 hours (started at 5pm, paused at 8pm, resumed at 10:30pm), and it then ran until it died at 3:22pm the next day.

Code:
[2011-09-29 15:22:32,390] ERROR: Zero read error: E:\Backup\SageTV Thu Partial.bkf - 36248576 - 36690636
java.io.IOException: Zero read error: E:\Backup\SageTV Thu Partial.bkf - 36248576 - 36690636
at java.lang.Void.<unknown>(Unknown Source)
at java.lang.Void.<unknown>(Unknown Source)
at java.lang.Void.<unknown>(Unknown Source)
at java.lang.Void.<unknown>(Unknown Source)
[2011-09-29 15:22:32,812] ERROR: Invalid state - reading=false removing=false writing=false needsWriting=false computing=false needsComputing=true
java.io.IOException: Invalid state - reading=false removing=false writing=false needsWriting=false computing=false needsComputing=true
at java.lang.Void.<unknown>(Unknown Source)
at java.lang.Void.<unknown>(Unknown Source)
at java.lang.Void.<unknown>(Unknown Source)
at java.lang.Void.<unknown>(Unknown Source)
[2011-09-29 15:22:33,406] WARN : Operation is being aborted... re-validating last processed entries...
[2011-09-29 15:23:08,031] INFO : [create] completed at: Thu Sep 29 15:23:07 EDT 2011
[2011-09-29 15:23:08,077] WARN : Memory leak detected on native buffers! Leaks count = 6 out of 26
« Last Edit: September 29, 2011, 03:54:16 pm by Fastrack »
Ben
FlexRaid 2.1 Final (2015-11-01) | RAID6 T2 engine | Windows 7 64bit | C2D 6600 @ 2.40GHz | 4GB DDR3 | SageTV v9 | 5 x Hitachi 2TB DRU | 1 x Seagate 2TB DRU | 1 x Seagate 1TB DRU | 1 x Hitachi 2TB PPU

xliv

  • Global Moderator
  • Full Member
  • Posts: 208
« Reply #1 on: September 29, 2011, 04:46:51 pm »
Can you run a chkdsk on the E: drive?
If that runs fine, it's probably a bug.
Server HW: ASUS P8B-E/4L + PIKE 2008 SAS | Xeon E3-1230 | 8GB DDR3 ECC 1333 [Kingston - KVR1333D3E9SK2/8G] | Intel 320 - 80 GB (OS)
Server OS: Win7 64b EN
FlexRAID 2.0 latest build, Cruise Control, 7 DRUs (6x3TB,1x2TB), 1 PPU (2x2TB)

Fastrack

  • Jr. Member
  • Posts: 63
« Reply #2 on: September 29, 2011, 06:25:46 pm »
Yeah, I mean it failed after 19 hours, which is a loooonggggg wait :) I originally had this built with 2 DRUs and 2 PPUs, but I made a lot of data changes when I added the 2 additional DRUs, so I decided to just start over.

Would I have more luck with the ∞ engine?

Code:
Microsoft Windows XP [Version 5.1.2600]
(C) Copyright 1985-2001 Microsoft Corp.

C:\Documents and Settings\Administrator>chkdsk e:
The type of the file system is NTFS.
Volume label is Atlantis.

WARNING!  F parameter not specified.
Running CHKDSK in read-only mode.

CHKDSK is verifying files (stage 1 of 3)...
File verification completed.
CHKDSK is verifying indexes (stage 2 of 3)...
Index verification completed.
CHKDSK is verifying security descriptors (stage 3 of 3)...
Security descriptor verification completed.

1953512000 KB total disk space.
1197383168 KB in 36016 files.
     85568 KB in 1801 indexes.
         0 KB in bad sectors.
    107904 KB in use by the system.
     65536 KB occupied by the log file.
 755935360 KB available on disk.

     65536 bytes in each allocation unit.
  30523625 total allocation units on disk.
  11811490 allocation units available on disk.

C:\Documents and Settings\Administrator>

xliv

  • Global Moderator
  • Full Member
  • Posts: 208
« Reply #3 on: September 30, 2011, 01:24:51 am »
Sorry, I meant with /r, so that it also scans for bad sectors.

Brahim

  • Global Moderator
  • Hero Member
  • Posts: 8,547
« Reply #4 on: September 30, 2011, 09:43:53 am »
@Fastrack
Try enabling read error retry.
A "Zero read error" usually means that the controller timed out trying to read the data.
Whether the timeout was caused by the drive or by the controller itself now needs to be determined.
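A minimal sketch of what a read-error-retry option might do, written in Python purely for illustration (FlexRAID itself is a Java application, judging by the `java.io.IOException` in the log, and its actual retry logic isn't described in this thread). When a read comes back empty before the expected number of bytes has arrived, the sketch pauses and re-reads a few times, then gives up with an error shaped like the logged one. The names `read_exact` and `ZeroReadError`, the retry count, the back-off delay, and the reading of the log's two numbers as "position reached" and "expected end" are all assumptions. Retrying can only help when the empty read is transient; if the file really is shorter than expected, every retry fails the same way.

```python
import time


class ZeroReadError(IOError):
    """Hypothetical error: reads keep returning no data before the expected length."""


def read_exact(f, offset, length, max_retries=3, delay=0.5):
    """Read `length` bytes starting at `offset` from the binary file object `f`.

    Hypothetical sketch: an empty read before `length` bytes have arrived is
    retried up to `max_retries` consecutive times, pausing a little longer
    after each attempt, before raising ZeroReadError.
    """
    data = bytearray()
    retries = 0
    while len(data) < length:
        f.seek(offset + len(data))
        chunk = f.read(length - len(data))
        if chunk:
            data += chunk
            retries = 0  # progress was made, so reset the retry budget
            continue
        retries += 1
        if retries > max_retries:
            # Message mimics the log: path - position reached - expected end (assumed meaning)
            name = getattr(f, "name", "?")
            raise ZeroReadError(
                f"Zero read error: {name} - {offset + len(data)} - {offset + length}"
            )
        time.sleep(delay * retries)  # linear back-off before re-reading
    return bytes(data)
```

Resetting the counter whenever data arrives means only consecutive empty reads count toward the limit, so a slow but progressing drive is not treated as failed.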

Fastrack

  • Jr. Member
  • Posts: 63
« Reply #5 on: September 30, 2011, 01:43:22 pm »
Long story short: I had a 3ware 9500S in JBOD mode running 3 of these drives (it's a SATA I controller). I finally got another Promise TX4 300 controller off eBay, and the drives are MUCH quicker on it (it's SATA II). If it's a controller timeout, it was very likely the 3ware. That's one of the reasons I switched to FlexRAID: the 3ware would drop these drives like flies, rebuild, and a few minutes later time out again! The Seagate 2TB drives I had were even worse on the 3ware, so I returned them.

I'm now rebuilding again, with the 6 Hitachis spread across the 2 Promise controllers.

I'll try enabling Read error retry. (Found the option under Run-time Properties -> Advanced Properties :) )

Well, I ran chkdsk /r:
Code:
C:\Documents and Settings\Administrator>chkdsk e: /r
The type of the file system is NTFS.
Volume label is Atlantis (DRU2).

CHKDSK is verifying files (stage 1 of 5)...
File verification completed.
CHKDSK is verifying indexes (stage 2 of 5)...
Index verification completed.
CHKDSK is verifying security descriptors (stage 3 of 5)...
Security descriptor verification completed.
CHKDSK is verifying file data (stage 4 of 5)...
File data verification completed.
CHKDSK is verifying free space (stage 5 of 5)...
Free space verification is complete.

1953512000 KB total disk space.
1180915840 KB in 36015 files.
     85568 KB in 1801 indexes.
         0 KB in bad sectors.
    107904 KB in use by the system.
     65536 KB occupied by the log file.
 772402688 KB available on disk.

     65536 bytes in each allocation unit.
  30523625 total allocation units on disk.
  12068792 allocation units available on disk.
« Last Edit: September 30, 2011, 02:05:48 pm by Fastrack »

Fastrack

  • Jr. Member
  • Posts: 63
« Reply #6 on: October 01, 2011, 02:26:19 pm »
Well, it failed again, this time on the Saturday backup file (today). So I believe it's related to the file changing between the Data Scanner pass and the parity computation.

What are the two numbers in the error? They look like file sizes to me. Why are they different?

32729088 vs 33731293??

32729088 is the actual file size, and the backup that writes this file runs at 4:00am. Could something be going on between the Data Scanner process and the time the parity is generated? The file size would have changed, since this backup ran this morning. Likewise, the last failure happened on the Thursday backup file, at 3:30pm on Thursday.

How can I work around this without excluding these files?

Code:
[2011-09-30 15:10:33,437] INFO : Executing create...
[2011-09-30 15:10:33,437] INFO : [create] started at: Fri Sep 30 15:10:33 EDT 2011
[2011-09-30 15:10:33,468] INFO : Using RAID engine T2+
[2011-09-30 15:12:55,656] INFO : *** Starting workers... ***
[2011-09-30 15:14:59,234] INFO : Every Week at: 03:00:00
[Starting = 2011/10/02 01:00:00]
[Ending = null/null/null null:null:null]
[2011-09-30 15:14:59,359] INFO : Cron: 00 00 03 ? * 1 *
[2011-09-30 15:15:33,703] INFO : Every Day at: 00:00:00
[Starting = null/null/null null:null:null]
[Ending = null/null/null null:null:null]
[2011-09-30 15:15:33,703] INFO : Cron: 00 00 00 * * ? *
[2011-09-30 15:16:51,187] INFO : Every Week at: 01:15:00
[Starting = 2011/10/02 null:null:null]
[Ending = null/null/null null:null:null]
[2011-09-30 15:16:51,187] INFO : Cron: 00 15 01 ? * 1 *
[2011-09-30 23:19:51,588] WARN : F:\TV Shows\CityNewsat6-2227157-0.mpg no longer exist! Skipping...
[2011-09-30 23:46:04,181] WARN : E:\TV Shows\Flashpoint-S04E08-TheWarWithin-2237538-0.mpg no longer exist! Skipping...
[2011-10-01 14:30:01,634] WARN : Retrying read on zero read error for E:\Backup\SageTV Sat Partial.bkf
[2011-10-01 14:30:02,822] ERROR: Zero read error: E:\Backup\SageTV Sat Partial.bkf - 32729088 - 33731293
java.io.IOException: Zero read error: E:\Backup\SageTV Sat Partial.bkf - 32729088 - 33731293
at java.lang.Void.<unknown>(Unknown Source)
at java.lang.Void.<unknown>(Unknown Source)
at java.lang.Void.<unknown>(Unknown Source)
at java.lang.Void.<unknown>(Unknown Source)
[2011-10-01 14:30:03,072] ERROR: Invalid state - reading=false removing=false writing=false needsWriting=false computing=false needsComputing=true
java.io.IOException: Invalid state - reading=false removing=false writing=false needsWriting=false computing=false needsComputing=true
at java.lang.Void.<unknown>(Unknown Source)
at java.lang.Void.<unknown>(Unknown Source)
at java.lang.Void.<unknown>(Unknown Source)
at java.lang.Void.<unknown>(Unknown Source)
[2011-10-01 14:30:03,775] WARN : Operation is being aborted... re-validating last processed entries...
[2011-10-01 14:30:16,697] INFO : [create] completed at: Sat Oct 01 14:30:16 EDT 2011
« Last Edit: October 01, 2011, 02:36:26 pm by Fastrack »

Brahim

  • Global Moderator
  • Hero Member
  • Posts: 8,547
« Reply #7 on: October 01, 2011, 05:30:52 pm »
I added a fix to handle this type of situation for the next release.
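The thread doesn't say how that fix works. One plausible approach, sketched here in Python purely for illustration, is to record each file's size and modification time during the Data Scanner pass. The sketch then re-checks those values just before the file is read for parity and skips a changed file with a warning instead of aborting the whole Create. The log above already does something similar for deleted files ("no longer exist! Skipping..."). `ScanEntry`, `snapshot`, `changed_since_scan` and `files_to_process` are hypothetical names, not FlexRAID APIs.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class ScanEntry:
    """What a (hypothetical) scan pass remembers about one file."""
    path: str
    size: int
    mtime_ns: int


def snapshot(path):
    """Record a file's size and modification time at scan time."""
    st = os.stat(path)
    return ScanEntry(path, st.st_size, st.st_mtime_ns)


def changed_since_scan(entry):
    """Return a reason string if the file no longer matches its snapshot, else None."""
    try:
        st = os.stat(entry.path)
    except FileNotFoundError:
        return "no longer exists"
    if st.st_size != entry.size:
        return f"size changed: {entry.size} -> {st.st_size}"
    if st.st_mtime_ns != entry.mtime_ns:
        return "modified since scan"
    return None


def files_to_process(entries):
    """Yield entries that still match their snapshot; warn about and skip the rest."""
    for entry in entries:
        reason = changed_since_scan(entry)
        if reason is None:
            yield entry
        else:
            print(f"WARN : {entry.path} {reason}! Skipping...")
```

With a check like this, a backup file rewritten at 4:00am after the scan would be reported as changed and skipped rather than failing the run hours later. The trade-off is that its parity stays stale until the next update pass.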