Topic: verify sync fails shortly after starting due to "disk error"

Brahim (Global Moderator)
« Reply #15 on: April 17, 2017, 08:43:46 am »
Quote:
    i am still getting some errors in windows events when i do copy/move with tRAID.
    error 153. the IO operation at logical block address ... disk11 was retried.

    it doesn't seem serious, but it's always for the disk that i just replaced, and i replaced the cable, too. it's the only disk giving errors, but i can't figure out why it's always the same disk.

It is serious. You definitely want to resolve that or you will end up with file system corruption.

pooler1
« Reply #16 on: April 25, 2017, 01:06:01 pm »
i'm trying to identify the problem.  i just ran a regular verify, and it completed the whole thing 100%, but at the very end, there are a lot of red lines in the log...
error code= 9999999999
2 stripe block failure
first...
last...
operation aborted!
failed uor position = 6
failed uor id = 1000000020

also, since completing the verify, i have gotten no more of the 153 errors in the windows event log. before that i was getting them all the time, a couple of times a minute.

Brahim (Global Moderator)
« Reply #17 on: April 26, 2017, 04:28:24 am »
Again, try to resolve the underlying issue. The software will not resolve any hardware or driver issue you might have.
Verify & Sync should succeed consistently before you can rest easy that your data is safe.

I would delete the RAID configuration, regain control of the disks, and then run a series of disk tests as well as load tests on the system itself.

adridolf
« Reply #18 on: April 27, 2017, 06:13:37 am »
Quote:
    i am still getting some errors in windows events when i do copy/move with tRAID.
    error 153. the IO operation at logical block address ... disk11 was retried.

For which disk do you observe that?
I ask because I see the 153 regularly on a healthy system, but only after a server restart and during verify/sync, and only for block 0x2 on the pool disk.

In this special case, I would consider it unrelated. In any other case, just ignore me. ;-)

pooler1
« Reply #19 on: April 27, 2017, 08:26:18 am »
it's usually just this one disk.  but here's the latest, i think i'm ok.

i ran the verify sync again. when it started, i got the 153 errors a lot, like a couple of times a minute. a few hours later they stopped while verify sync was still running. the task then completed successfully (before, it would get aborted with a bunch of red text). and i haven't gotten any more of the 153 errors since they stopped. good! (i think)

so maybe brahim or someone else can confirm this...
my conclusion is that the disk cable that i replaced was bad, and i don't think a verify sync ever completed after i replaced the cable and disk until just now. so things should be ok now.

Brahim (Global Moderator)
« Reply #20 on: April 27, 2017, 09:25:19 am »
As a test:
1. Could you run the Verify task without the pool running? Do not stop the array - just the pool.

2. These errors are not normal. Please check your TCQ, SWO, and pool cache settings (turn them off and see if that makes any difference while the pool is running).

3. Check your disks' SMART data and see if there is an increase in error rates.
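
One way to act on step 3 is to note the raw SMART counters before and after a Verify run and compare them. Below is a minimal Python sketch of that comparison. It assumes the values have already been copied by hand into dictionaries; the helper name `smart_increases` is hypothetical, and the numbers are made-up examples, not readings from this system.

```python
def smart_increases(before, after):
    """Return {attribute: (old, new)} for counters whose raw value went up."""
    return {
        name: (before.get(name, 0), value)
        for name, value in after.items()
        if value > before.get(name, 0)
    }

# Hypothetical raw values written down before and after a Verify run.
before = {
    "Reallocated_Sector_Ct": 0,
    "Current_Pending_Sector": 0,
    "UDMA_CRC_Error_Count": 12,
}
after = {
    "Reallocated_Sector_Ct": 0,
    "Current_Pending_Sector": 0,
    "UDMA_CRC_Error_Count": 57,
}

# Print only the counters that grew between the two snapshots.
for name, (old, new) in smart_increases(before, after).items():
    print(f"{name}: {old} -> {new}")
```

As a rough guide, a growing CRC error count is commonly associated with the cable or port, while growing reallocated or pending sector counts point more toward the disk itself.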

pooler1
« Reply #21 on: April 27, 2017, 04:36:32 pm »
i'll do that and report back.  btw, what does it mean to have the pool running but not the array?  to me, it's not any different than not having either running because i can't access the data either way.  is there any practical purpose to having a pool running without the array?

adridolf
« Reply #22 on: April 28, 2017, 05:57:17 am »
The array without the pool, not the other way around.

The ARRAY just means the "transparent" disks, so a virtualized disk for each physical disk while parity is maintained.

The POOL is the single disk that shows you the content of all individual disks as one drive.

Thus, you can just start the array without the pool, so you can access the transparent, parity-protected disks individually (by assigning drive letters to them). The pool then is a distinct feature, providing you this merged view of all files.

pooler1
« Reply #23 on: July 24, 2017, 08:04:15 pm »
so i've looked more into my errors, here's some more info, please help me if it sounds like something you know about...

i was looking into whether the cables or hardware were the problem. I switched some cables around, and it didn't seem like the cables were the issue.
One thing I noticed was that my WD Black 2TB drives were the ones giving me errors regardless of the cables, even when i replaced one with a new one of the same model. so the next thing i will try is replacing those drives with a Hitachi to see if that gets rid of some of the issues.

The other set of issues i noticed was coming from the motherboard's SATA-attached drives. most of the drives are attached by SAS cables and breakouts, but 4 of them are connected directly to the motherboard's SATA ports, and they all give 153 errors. could it be a motherboard driver issue or something like that? it's a Supermicro X10SAT motherboard. the rest of the drives are attached to an M1015, and other than the WD Black drives, they don't seem to have problems.
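
To see which disks the 153 warnings cluster on, the events can be exported from Event Viewer as text and tallied per disk number. Here is a rough Python sketch of that tally. It assumes each message contains "for Disk N" after the block address, so adjust the pattern if your export is worded differently. The function name `count_retries_per_disk` is hypothetical and the sample lines are invented for illustration.

```python
import re
from collections import Counter

# Assumed wording of the event message; change this if your export differs.
DISK_RE = re.compile(
    r"logical block address\s+(\S+)\s+for\s+Disk\s+(\d+)", re.IGNORECASE
)

def count_retries_per_disk(lines):
    """Count how many retried-IO messages mention each disk number."""
    counts = Counter()
    for line in lines:
        match = DISK_RE.search(line)
        if match:
            counts[int(match.group(2))] += 1
    return counts

# Invented sample lines standing in for an exported event log.
sample = [
    "The IO operation at logical block address 0x2 for Disk 11 was retried.",
    "The IO operation at logical block address 0x1a40 for Disk 3 was retried.",
    "The IO operation at logical block address 0x2 for Disk 11 was retried.",
    "Some unrelated event text.",
]

counts = count_retries_per_disk(sample)
for disk, n in counts.most_common():
    print(f"Disk {disk}: {n} retried IOs")
```

Matching the disk numbers against Disk Management then shows whether the retries follow a drive model, a cable, or a controller port.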