Show Posts

This section allows you to view all posts made by this member. Note that you can only see posts made in areas you currently have access to.

Topics - pclausen

Pages: [1]
tRAID Bug Reports / Windows 10 bug
« on: March 01, 2015, 04:33:11 pm »
I installed the Windows 10 Technical Preview on my flexRAID server yesterday.  This is after having already run the pre-release on several of my other boxes for the last 30 days or so.

The only issue I have found is with the tRAID Storage Pool drive letter assignments.  Under Windows 8 I had V assigned to tRAID1 and W assigned to tRAID2.  After the conversion, tRAID1 came up as E and tRAID2 came up as F.  I double-checked the setting under RAID options, and they were indeed still set to V and W.

So I went into Disk Management and manually changed the drive letters to V and W.  However, after a reboot, they reverted back to E and F.  So I just gave up and changed the drive letters under Storage Pooling options to E and F.  Not a big deal, just a little irritating.

The other issue I have noticed is that booting takes a really long time: each of my 48 drives in my tRAIDs is "inspected" for about 20 seconds, so booting takes 16 minutes!  Basically the access LED comes on for about 20 seconds on each drive during boot.  Fortunately I rarely need to reboot the server.  Hopefully this particular behavior will go away in future builds as we get closer to the actual release.  I'm running LSI SAS 9220-8i adapters, so it is probably an issue with that driver.
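The 16-minute figure is consistent with the per-drive delay; a quick back-of-the-envelope check using the numbers from the post:

```python
# Back-of-the-envelope check on the boot delay described above:
# 48 drives, each "inspected" for roughly 20 seconds.
drives = 48
seconds_per_drive = 20

total_seconds = drives * seconds_per_drive
print(f"{total_seconds} s ~= {total_seconds / 60:.0f} minutes of extra boot time")
```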

General Discussion / Restore fails due to bad blocks on another drive
« on: March 01, 2015, 10:40:26 am »
I had a drive drop out (Disk 5 / 2DRU06) on one of my tRAIDs.  It turns out the drive was OK (but I only determined this after reformatting the drive and copying some test data to it), so I started a Restore operation, choosing the "Do not swap (restore into the current drive)" option.  It was chugging along until it got to 7% and then failed with the following:

Name: Restore RAID [tRAID2]
Start Date: Sun Mar 01 09:07:05 GMT-500 2015
End Date: Sun Mar 01 10:29:36 GMT-500 2015
Duration: 01:22:30
Throughput: 8.280GB/s
Total Size: 40.025TB
Error Code = -23
2049 stripe blocks failure
First byte failure at -1
Last byte failure at 155222176256
Operation aborted!
Operation aborted due to disk error!
Failed UoR position = 16
Failed UoR ID = 10042

So I checked Event Viewer and sure enough, I see a bunch of these:

The device, \Device\Harddisk15\DR15, has a bad block

Disk 15 maps to 2DRU17.

So 2DRU5 is currently in a dropped state with no data on it, and 2DRU17 has bad blocks preventing the data from being restored to 2DRU5.  I should also note that 2DRU17 contains no data yet, so I'm not sure why tRAID is accessing it during the restore process in the first place.
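For anyone trying to map these Event Viewer messages back to tRAID drive labels, a minimal sketch of the lookup: the regex just pulls the disk number out of the \Device\HarddiskN\DRN path, and the mapping table is illustrative (the real mapping comes from the tRAID web GUI, as noted above; e.g. Disk 15 -> 2DRU17).

```python
import re

# Hypothetical mapping from Windows disk numbers to tRAID DRU labels;
# fill this in from what the tRAID web GUI actually reports.
disk_to_dru = {15: "2DRU17", 5: "2DRU06"}

event = r"The device, \Device\Harddisk15\DR15, has a bad block"

# Pull the disk number out of the \Device\HarddiskN\DRN device path.
match = re.search(r"Harddisk(\d+)", event)
if match:
    disk = int(match.group(1))
    print(f"Disk {disk} maps to {disk_to_dru.get(disk, 'unknown DRU')}")
```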

Additionally, when I look at Disk 5 in Disk Management, it is still Unallocated, which I find a bit odd since the restore got to 7% before failing.

So how should I proceed at this point?


I've got what appears to be a very odd issue.  I have 2 large pool drives with media on them.

When I attempt to configure my existing collection from within CouchPotato, it is only able to see the contents of some folders.  For example, it can't see any of my Blu-ray movie folders on the W: pool drive, as seen here.  Folder permissions are also shown.  Note the folder owner; more on that later.

If I disable caching on the W: pool drive and assign a drive letter to one of the drives in that array, CP is able to pull up the movie folders just fine.

Things are even stranger on my V: pool drive.  Here some folder content is visible to CP just fine, while others are not.  Here's an example of one that is:

And here's one that is not.  Note that the owner of the folder is "Administrators".  I changed that from the previous value (which was like the first 2 examples above).  It made no difference.

So what's going on here and what permissions or other settings do I need to change in order for CP to be able to consistently see the folder contents on pool drives?

Btw, I have both pool drives set to Removable = FALSE.
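One way to narrow down which folders the pool is hiding is to walk them with the same kind of directory listing CP would do; a minimal diagnostic sketch, where the paths are placeholders for the real media folders on the V: and W: pool drives:

```python
import os

# Placeholder roots; substitute the actual media folders on the pool drives.
roots = [r"W:\Movies", r"V:\Movies"]

for root in roots:
    try:
        entries = os.listdir(root)
    except OSError as err:
        # A PermissionError here would point at the ownership/ACL problem
        # described above rather than at CouchPotato itself.
        print(f"{root}: NOT readable ({err})")
        continue
    print(f"{root}: {len(entries)} entries visible")
```

Folders that fail here with a permission error are the ones to re-check for owner and ACL differences.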

So I got the error shown in the first attachment below during the initial Create RAID Parity.

However, the disk (DRU13) did not appear as dropped in the tRAID configuration screen, even after hitting the refresh button a couple of times (2nd attachment below).

That said, when I logged into my RAID controller, it told me that the drive had indeed failed.  This was also confirmed under Windows Disk Management, where Disk 14 was missing.

So I pulled the drive and put it into my Frankenstein rig, and got a BIOS message that the S.M.A.R.T. status was bad (3rd attachment below).  Fortunately the data is still intact, and I'm in the process of copying it onto another 1TB drive (my last 1TB spare!).

So a couple of questions:

1. Why did the drive not drop?  Could it be related to the issue with the latest release not passing through S.M.A.R.T. info?  (I think my Areca controllers also have an issue with passing S.M.A.R.T., so is it likely one or the other?)

2. Should I manually fail the drive, or just delete and create the array again once I have installed the replacement DRU13 with the data restored?  I'm leaning towards the latter since the RAID never got to the point of being online.

3. Should I wait for my IBM M1015 (SAS9220-8i) controllers to arrive and swap out the Areca RAID controllers before creating the arrays?  That way S.M.A.R.T. will be known to work, if that is what caused the drive not to drop.

I'm somewhat concerned about more of these failures, as the create process is likely to take days.  This initial failure occurred about 4 hours in.  I've got 22 DRU disks packed with data.  See 4th attachment below.

So I set up a tRAID with DRUs 1-7 (4 x 4TB and 3 x 2TB) and PPUs 1-2 (both 4TB).  I also created placeholders for DRUs 8-22.

DRUs 1-4 and PPUs 1-2 are connected as passthrough off an Areca ARE-1170 RAID card.

DRUs 5-7 are connected as passthrough off an Areca ARE-1680 which in turn is connected to a LSI SAS2X36 expander backplane.

All drives appear fine in Disk Management and registered fine in the web GUI.

When I attempt to initialize the RAID and choose "Create RAID Parity", I get the following error: 500 The call failed on the server; see server log for details

And the corresponding log entry:

[2014-07-11 23:48:47,605] WARN  log(1783) - Exception while dispatching incoming RPC call Service method 'public abstract com.tchegbe.lib.gwt.client.bean.ActionStatus com.tchegbe.nzfs.ui.client.rpc.NZFSService.offlineCreateRAIDParity(long,long)' threw an unexpected exception: java.lang.RuntimeException
   at javax.servlet.http.HttpServlet.service(
   at javax.servlet.http.HttpServlet.service(
   at org.eclipse.jetty.servlet.ServletHolder.handle(
   at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(
   at com.tchegbe.lib.gwt.server.servlet.SecurityFilter.doFilter(
   at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(
   at org.eclipse.jetty.servlet.ServletHandler.doHandle(
   at org.eclipse.jetty.server.session.SessionHandler.doHandle(
   at org.eclipse.jetty.server.handler.ContextHandler.doHandle(
   at org.eclipse.jetty.servlet.ServletHandler.doScope(
   at org.eclipse.jetty.server.session.SessionHandler.doScope(
   at org.eclipse.jetty.server.handler.ContextHandler.doScope(
   at org.eclipse.jetty.server.handler.ScopedHandler.handle(
   at org.eclipse.jetty.server.handler.HandlerList.handle(
   at org.eclipse.jetty.server.handler.HandlerWrapper.handle(
   at org.eclipse.jetty.server.Server.handle(
   at org.eclipse.jetty.server.HttpConnection.handleRequest(
   at org.eclipse.jetty.server.HttpConnection$RequestHandler.content(
   at org.eclipse.jetty.http.HttpParser.parseNext(
   at org.eclipse.jetty.http.HttpParser.parseAvailable(
   at org.eclipse.jetty.server.BlockingHttpConnection.handle(
   at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(
   at org.eclipse.jetty.util.thread.QueuedThreadPool$
Caused by: java.lang.RuntimeException
   at com.tchegbe.nzfs.ui.server.rpc.NZFSServiceImpl.offlineCreateRAIDParity(
   ... 28 more
Caused by: java.lang.NullPointerException
   at com.tchegbe.nzfs.ui.server.op.NZFSDiskOperations.manageTRAIDState(
   at com.tchegbe.nzfs.ui.server.op.NZFSDiskOperations.validateTaskRun(
   at com.tchegbe.nzfs.ui.server.op.NZFSDiskOperations.offlineCreateRAIDParity(
   ... 30 more

I rebooted and tried again, same error.

I'm running the service/client that was just released today under Windows 8 x64.

I saw where others were getting this 500 error, but it was related to the storage pool and possibly caching.  I haven't gotten that far; I'm trying to stand up the very first array on this server.

General Discussion / Setting up very large system (24 x 1TB and 24 x 2TB)
« on: November 20, 2013, 04:55:34 pm »
I currently have 24 1TB drives in one RAID6 array and another 24 2TB drives in a different RAID6 array.  Each RAID is run by an Areca controller (both in the same physical server, but the 48TB RAID is in a 2nd chassis connected via a 4-lane SAS cable to the SAS expander).

I've been really close to losing the 48TB array a couple of times now: a rebuild took place, I lost a 2nd drive during the rebuild, and a 3rd started getting timeouts.  So I'm looking to move to a solution that will prevent a total loss of data in case more than 2 drives completely fail.

Does the latest version still only support 2 PPUs?  If so, does that mean I should be setting up the following (using just my existing drives for now; see the paragraph below)?

RAID1: 10 x 1TB DRU + 2 x 1TB PPU
RAID2: 10 x 1TB DRU + 2 x 1TB PPU
RAID3: 10 x 2TB DRU + 2 x 2TB PPU
RAID4: 10 x 2TB DRU + 2 x 2TB PPU
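Assuming the usual tRAID convention that only DRUs contribute usable space (PPUs hold parity), the layout above works out as follows:

```python
# Usable capacity of the proposed layout in TB (decimal); PPUs hold
# parity in tRAID, so only the DRUs count towards usable space.
arrays = {
    "RAID1": (10, 1),  # 10 x 1TB DRUs
    "RAID2": (10, 1),  # 10 x 1TB DRUs
    "RAID3": (10, 2),  # 10 x 2TB DRUs
    "RAID4": (10, 2),  # 10 x 2TB DRUs
}

usable = sum(count * size for count, size in arrays.values())
print(f"Usable capacity across the four arrays: {usable} TB")  # 60 TB
```

That is 60TB usable out of the 72TB raw across all 48 drives, with 12 drives given over to parity.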

Going forward, I plan to begin purchasing 4TB drives, and I'll need a handful to get this thing kick-started so that I can take the 24TB array offline and redeploy it as tRAID.  I currently have a little over 19TB of data on the small array, and I have 6 spare 2TB drives and 1 spare 1TB drive lying around, so that works out to 13TB.  So if I pick up 3 4TB drives, I should be able to set up a "kickstart" tRAID to offload the 19TB of data onto, right?
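The spare-capacity arithmetic in the paragraph above can be sanity-checked (drive counts taken straight from the post):

```python
# Scratch capacity available for offloading the data (TB, decimal).
data_to_offload = 19

spares = 6 * 2 + 1 * 1   # six spare 2TB drives plus one 1TB drive = 13 TB
new_drives = 3 * 4       # three new 4TB drives = 12 TB

total = spares + new_drives
print(f"{total} TB of scratch capacity vs {data_to_offload} TB to offload")
assert total >= data_to_offload
```

Note this counts raw capacity; any of these drives assigned as PPUs in the kickstart array would hold parity rather than data, so the margin is thinner than the raw total suggests.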

Any best practice advice on how to get started with setting up a large system would be appreciated.

