Author Topic: Storage Pool crashes after 5 to 10 minutes **SOLVED**  (Read 1085 times)

Offline jamohamo

Storage Pool crashes after 5 to 10 minutes **SOLVED**
« on: April 08, 2016, 08:04:58 pm »
Hi! I'm a long-time tRAID user, currently running "Enterprise Host Manager v1.0.0 2015.08.26" (the November version would not let me start the pool for some reason) on a fully patched Windows Server 2008 R2 Datacenter edition.

It had been running without issue until last night, when I went to delete a folder. There was no error, but the folder did not disappear from the Windows Explorer view. I tried to delete it again, at which point an error came up saying the item could not be found. I refreshed the view and it looked correct, so I thought it was OK. Ten minutes later the storage pool disappeared: the broker service had crashed.

I looked through the forum and found that some users had been able to "fix" this issue in previous versions by turning off pool caching. I only had metadata cached; I tried setting caching to none, but to no avail. I then moved the parent of the folder I had deleted to the Desktop, created a new folder under tRAID and copied the contents into it. The service still crashed after a while.

Here is what I get in the application log now every time I start it. It always fails within 12 minutes:

Faulting application name: NZFSB.exe, version: 1.0.0.1, time stamp: 0x5600062c
Faulting module name: NZFSB.exe, version: 1.0.0.1, time stamp: 0x5600062c
Exception code: 0x40000015
Fault offset: 0x000c7c08
Faulting process id: 0x62c
Faulting application start time: 0x01d191f697f0b829
Faulting application path: C:\tRAID\Release_1.0_2015-09-21\NZFS-TRAID-HOST\common\NZFSB.exe
Faulting module path: C:\tRAID\Release_1.0_2015-09-21\NZFS-TRAID-HOST\common\NZFSB.exe
Report Id: 84db2c6c-fdeb-11e5-993f-3cd92b682a07
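
For anyone trying to line this event up with the NZFS log: the "Faulting application start time" value looks like a Windows FILETIME (a count of 100-nanosecond ticks since 1601-01-01 UTC) written in hex. A minimal Python sketch for decoding it, assuming that interpretation is right; the function name is just illustrative, and the result is in UTC, so adjust for your own time zone before comparing with the log:

```python
from datetime import datetime, timedelta, timezone

# Windows FILETIME counts 100-nanosecond ticks since 1601-01-01 00:00:00 UTC.
FILETIME_EPOCH = datetime(1601, 1, 1, tzinfo=timezone.utc)


def filetime_to_utc(hex_value: str) -> datetime:
    """Convert a hex FILETIME string (with or without 0x) to a UTC datetime."""
    ticks = int(hex_value, 16)
    # 10 ticks per microsecond; sub-microsecond precision is dropped.
    return FILETIME_EPOCH + timedelta(microseconds=ticks // 10)


# Value copied from the event log entry above.
print(filetime_to_utc("0x01d191f697f0b829").isoformat())
```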

The FlexRAID.nzfs.log file only has this in it:

[2016-04-09 10:27:26.772571][1584]
[2016-04-09 10:27:26.772572][1584]
[2016-04-09 10:27:26.772572][1584]====================================================================================
[2016-04-09 10:27:26.772572][1584]|||||||||||||||||||||||||  NZFS Service Start v1.0.0 2015.09.21  |||||||||||||||||||||||||
[2016-04-09 10:27:26.772572][1584]====================================================================================

I've attached the .wer file too.

Last night I restarted the broker service and did not start the pool. The service was still running fine this morning and the nightly sync job had completed without problems so it definitely seems to be just the storage pool part causing trouble.

Thanks.

I also just ran a memory test with Memtest86+ 5.01 and no errors were found.
« Last Edit: April 10, 2016, 06:27:54 am by jamohamo »

Offline Brahim

Re: Storage Pool crashes after 5 to 10 minutes
« Reply #1 on: April 10, 2016, 04:12:46 am »
You need to be on the latest version, since a fix for any issue would have to be made against it for the next release.

Offline jamohamo

Re: Storage Pool crashes after 5 to 10 minutes (SOLVED)
« Reply #2 on: April 10, 2016, 06:27:12 am »
I have worked out what was causing the crash. It has something to do with the Landing Disk. Let me explain ...

Since the crash was happening consistently after 10 minutes, and I knew that my Landing Disk was set to move data every 10 minutes, I decided to deactivate my Landing Disk in the menu. The crash still happened (see Important Point below), but I was convinced there was some connection, so I disabled both NZFS services to stop tRAID from starting and rebooted.

To my surprise, my Landing Disk folder was full of files and folders going back many months, perhaps even a year or more, totalling over 300GB! I thought everything was supposed to be moved out of the Landing Disk. Anyway, I found the parent folder of the one I originally had trouble deleting. I then enabled and restarted both NZFS services about 15 seconds apart, the broker first, restarted the pool and checked. According to my Z: drive (the tRAID storage pool drive) I had a copy there too (see Important Point below), so I decided to delete the one on the Landing Disk and rebooted.

The pool came up automatically as it should and stayed up. I think there was some glitch with the folder I deleted that caused the broker to crash whenever it tried to deal with the Landing Disk. I also noticed that the Dashboard was now only showing one of my DRUs, DRU1. I decided to disable the services again, reboot and move all the other stuff out of the Landing Disk. I then enabled them, rebooted and everything has been fine! I still haven't reactivated the Landing Disk but everything is working perfectly, including the Dashboard which is showing all 5 of my DRUs.

Important Point - Don't be fooled by the "Bypass Landing Disk / Activate" option. It only stops the Landing Disk from being used as a temporary staging point for data destined for your array. Any data already in the Landing Disk is still merged into the pool view. So when I thought I had two copies of that folder, I couldn't actually be sure (and since the broker service was crashing, there may have been some inconsistency right there).

Whatever you do, do not just go deleting stuff from the Landing Disk. Move it somewhere else to clean it out (especially on an SSD with TRIM enabled - oh the pain if you ever accidentally delete something!). Then, once the pool has stayed up for whatever period of time makes you feel confident, move all that stuff back to the array with the Landing Disk still deactivated.
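
The "move, don't delete" advice above can be sketched as a small Python script. This is only an illustration, not anything tRAID provides: the function name and both example paths are hypothetical, and it assumes the NZFS services are stopped so nothing else is touching the Landing Disk folder while it runs. It moves each top-level entry into a staging folder and refuses to overwrite anything already there.

```python
import shutil
from pathlib import Path
from typing import List


def stage_landing_disk(landing: Path, staging: Path) -> List[Path]:
    """Move every top-level entry out of `landing` into `staging`.

    Nothing is deleted. Entries whose name already exists in `staging`
    are skipped and left in place for manual review.
    """
    staging.mkdir(parents=True, exist_ok=True)
    moved = []
    for entry in sorted(landing.iterdir()):
        target = staging / entry.name
        if target.exists():
            print(f"skipped (name already in staging): {entry}")
            continue
        shutil.move(str(entry), str(target))
        moved.append(target)
    return moved


# Example call (hypothetical paths, adjust to your own layout):
# stage_landing_disk(Path(r"D:\LandingDisk"), Path(r"E:\LandingDiskStaging"))
```

Keeping the staging folder on a different physical disk from the array means the data exists outside the pool until you are confident enough to copy it back.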

So that is how I got it working again. It is a non-destructive way to do it, provided you are very careful about moving the data on the Landing Disk. I wasn't, and paid the penalty. Luckily, I think I've managed to recover or recreate most of the important stuff.

The question I have is: why was there so much stuff sitting in the Landing Disk folder? Isn't it all supposed to be moved out to the array? The Landing Disk folder on my system sits on a partition that torrents download to as well, so the disk can easily fill up. The free space was below my 50GB Landing Disk reserve at the time the problem happened, and would have been at times in the past, so I wonder whether that somehow triggered the retention (or the failure to move data).
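
Whether a low reserve really causes this is only my guess, but the condition itself is easy to watch for. A rough Python sketch that checks free space on the Landing Disk partition against a reserve; the function names and the example drive letter are made up for illustration, and it treats the 50GB reserve as 50 GiB, which may not match how tRAID counts it:

```python
import shutil

GIB = 1024 ** 3


def below_reserve(free_bytes: int, reserve_gib: float) -> bool:
    """Return True when the free space is smaller than the reserve."""
    return free_bytes < reserve_gib * GIB


def landing_disk_below_reserve(path: str, reserve_gib: float = 50) -> bool:
    """Check the partition holding `path` against the reserve."""
    free = shutil.disk_usage(path).free
    return below_reserve(free, reserve_gib)


# Example call (hypothetical drive letter):
# if landing_disk_below_reserve("D:\\"):
#     print("Landing Disk partition is below its reserve")
```

Run from a scheduled task, something like this would at least show whether the partition had dipped under the reserve before the next crash.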

Are there any recommendations for the size of the Landing Disk, the space reserve and whether it should be by itself on a partition or not?

I hope this helps others with the same problem anyway.

Offline Brahim

Re: Storage Pool crashes after 5 to 10 minutes **SOLVED**
« Reply #3 on: April 11, 2016, 08:20:16 am »
Please check the release notes to see which of the bugs fixed since then is affecting you. Again, it is important to be on the latest release, as many things have been fixed.