Author Topic: How to get Notification alert "Missing Drive" in pool and prevent update tasks?  (Read 1243 times)

Offline Brahim

  • Global Moderator
  • Hero Member
  • *****
  • Posts: 8,385
  • Karma: +201/-15
SMART is used for disk health monitoring. It will notify you if issues are found on the monitored disks.

Offline TheJSN

  • Newbie
  • *
  • Posts: 14
  • Karma: +0/-0
But what parameter should I monitor to see whether the disk is online or offline? Under SMART Monitoring, does it use the standard notification email set in FlexRAID? I don't see any option for whether the drive is offline or online, only error counts, temperature, etc. The whole point of knowing a disk is dead or offline is to ensure that the update won't run and the disk can be replaced or recovered immediately. But as of right now I have to manually go to the page and check the disks in the pool to see if one is missing. Shouldn't we be able to get an alert if a disk has failed?
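For what it's worth, a presence check like the one asked about here can be done outside FlexRAID. The following is only a minimal sketch, not a FlexRAID feature: it assumes you place a marker file (the name `.drive-present` is made up for this example) in the root of every data drive, and it reports any root where that marker cannot be found. The function name and the idea of a sentinel file are assumptions for illustration.

```python
from pathlib import Path

# Hypothetical marker file, created once by hand in the root of each data drive.
SENTINEL_NAME = ".drive-present"


def missing_drives(drive_roots, sentinel=SENTINEL_NAME):
    """Return the drive roots whose sentinel file cannot be found.

    A root counts as missing if the path does not exist, is not a
    directory, or does not contain the sentinel file (for example an
    empty mount point left behind by a disk that dropped offline).
    """
    missing = []
    for root in drive_roots:
        marker = Path(root) / sentinel
        try:
            present = marker.is_file()
        except OSError:
            # An I/O error while probing the disk is treated as "missing".
            present = False
        if not present:
            missing.append(str(root))
    return missing
```

A scheduled script could call `missing_drives([...])` with the real drive roots shortly before the nightly task and send its own email if the returned list is not empty. Whether the FlexRAID scheduler can be made to wait on such a check is not something this thread establishes.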

Offline Brahim

The Update task detects if a disk has failed or not and aborts in such a case.

Read this carefully: http://wiki.flexraid.com/2013/06/27/transparent-raid-web-ui-system-preferences/
When setting up email and SMS, you have the option to choose the types of notifications.

Offline TheJSN

I followed a recommendation on this forum to run both a validate and an update as part of the scheduled maintenance of Snapshot RAID. Is it possible it's the validate step that is causing the error? I sent you a copy of my log file from my last failure, which clearly showed every file on the "missing/offline" disk marked as DELETED, followed by a successful update. Since I didn't discover the disk was offline until I noticed files missing, when I went back to do the recovery, it recovered nothing: a blank disk. It seems something in the maintenance schedule deleted every file from the parity. I assume my disk died sometime before this operation on Jan 19th at 2:00 am.

[2017-01-19 02:00:00,481] INFO : [update] started at: Thu Jan 19 02:00:00 EST 2017
[2017-01-19 02:00:26,710] INFO : Using RAID engine Tx
[2017-01-19 02:08:00,113] INFO : Total process size = 381823221760
[2017-01-19 02:08:00,113] INFO : Max data size = 3895049846784
[2017-01-19 02:08:00,113] INFO : DRU1 size=3742662222994
[2017-01-19 02:08:00,113] INFO : DRU2 size=2923890489322
[2017-01-19 02:08:00,113] INFO : DRU3 size=2978383160150
[2017-01-19 02:08:00,113] INFO : DRU4 size=2976806162556
[2017-01-19 02:08:00,113] INFO : DRU5 size=2932063522204
[2017-01-19 02:08:00,113] INFO : DRU6 size=3387872483082
[2017-01-19 02:08:00,113] INFO : DRU7 size=3868260275375
[2017-01-19 02:08:00,113] INFO : DRU8 size=3772404523091
[2017-01-19 02:08:00,113] INFO : DRU9 size=3895048906964
[2017-01-19 02:08:00,113] INFO : DRU10 size=3115721260535
[2017-01-19 02:08:00,113] INFO : DRU11 size=2722723491891
[2017-01-19 03:21:14,290] WARN : D:5:1\youtubedownload\Ming Xi on Instagram “💗💗💗#TrainLikeAnAngel #VSFashionShow.mp4 no longer exist! Skipping...
[2017-01-19 03:24:19,157] ERROR: Invalid state - Unexpected EOF on D:6:0\MP3\2008\Melissa\Nic_Chagall_-_What_You_Need__Marco_V_Remix_.mp3
java.io.IOException: Invalid state - Unexpected EOF on D:6:0\MP3\2008\Melissa\Nic_Chagall_-_What_You_Need__Marco_V_Remix_.mp3
   at com.tchegbe.lib.common.io.d.a(Unknown Source)
   at com.tchegbe.lib.common.io.d.a(Unknown Source)
   at com.tchegbe.lib.common.io.i.<unknown>(Unknown Source)
   at com.tchegbe.lib.common.io.parity.e.a(Unknown Source)
   at com.tchegbe.lib.common.io.e.f.run(Unknown Source)
   at java.lang.Thread.run(Unknown Source)
[2017-01-19 03:24:19,214] WARN : Incomplete - D:6:0\MP3\2008\Melissa\Nic_Chagall_-_What_You_Need__Marco_V_Remix_.mp3 - size:18690732 read:9191304
[2017-01-19 03:24:19,900] INFO : [update] completed at: Thu Jan 19 03:24:19 EST 2017
[2017-01-19 03:24:23,864] INFO : Sending email alert - message=[SNAP-RAID1][UPDATE] FAILURE
[2017-01-19 03:24:32,702] INFO : [SNAP-RAID1][UPDATE] FAILURE - emailed...
[2017-01-19 05:00:00,213] INFO : Executing validate...
[2017-01-19 05:00:00,214] INFO : [validate] started at: Thu Jan 19 05:00:00 EST 2017
[2017-01-19 05:00:22,356] INFO : Using RAID engine Tx
[2017-01-19 05:14:01,039] WARN : Deleted: D:6:0\HDTV1\Drama\Pearl Harbor (2001).mkv

************** Then it proceeds to delete every file on the missing disk: thousands of deletions, finishing many hours later ********

[2017-01-19 19:49:47,221] WARN : Changed: P:0:0\003097.flxr
[2017-01-19 19:49:56,542] WARN : Changed: P:0:0\003098.flxr

************** There are also a number of these WARN "Changed" parity entries in the log after all of the deletions ***********

[2017-01-20 12:34:29,683] WARN : New file found: C:\FlexRAID-Managed-Pool\class1_0\{beb8cf64-6dca-11e2-93f6-0019d17e6b63}\MP3\Tiger & Kitty Music\WaltzofThrones-v2.mp3
[2017-01-20 12:34:33,825] INFO : [validate] completed at: Fri Jan 20 12:34:33 EST 2017
[2017-01-20 12:34:35,250] INFO : Sending email alert - message=[SNAP-RAID1][VALIDATE] SUCCESS
[2017-01-20 12:34:41,327] INFO : [SNAP-RAID1][VALIDATE] SUCCESS - emailed...
[2017-01-20 23:19:38,337] INFO : Service stopping...
[2017-01-20 23:19:38,337] INFO : Stopping views...
[2017-01-20 23:19:38,891] INFO : Unmounting View...
[2017-01-20 23:19:46,676] INFO : Unmounting View complete...
[2017-01-20 23:19:47,839] INFO : Stopping scheduler...
[2017-01-20 23:19:48,019] INFO : Stopping web server...
[2017-01-20 23:19:48,222] INFO : Service stopping...
[2017-01-20 23:19:48,222] INFO : Stopping views...
[2017-01-20 23:19:48,222] INFO : Stopping scheduler...
[2017-01-20 23:19:48,222] INFO : Stopping web server...
[2017-01-20 23:19:48,353] INFO : Stopping socket server...
[2017-01-20 23:19:48,353] INFO : Stopping socket server...
[2017-01-20 23:19:48,665] INFO : Shutting down at: 1/20/17 11:19 PM
[2017-01-20 23:19:48,665] INFO : Closing all client connections...
[2017-01-20 23:19:48,665] INFO : Shutting down at: 1/20/17 11:19 PM
[2017-01-20 23:19:48,665] INFO : Closing all client connections...
[2017-01-20 23:19:48,681] INFO : Exiting all processes...
[2017-01-20 23:19:48,915] ERROR: Connection reset by peer: socket write error
java.net.SocketException: Connection reset by peer: socket write error
   at java.net.SocketOutputStream.socketWrite0(Native Method)
   at java.net.SocketOutputStream.socketWrite(Unknown Source)
   at java.net.SocketOutputStream.write(Unknown Source)
   at sun.nio.cs.StreamEncoder.writeBytes(Unknown Source)
   at sun.nio.cs.StreamEncoder.implFlushBuffer(Unknown Source)
   at sun.nio.cs.StreamEncoder.implFlush(Unknown Source)
   at sun.nio.cs.StreamEncoder.flush(Unknown Source)
   at java.io.OutputStreamWriter.flush(Unknown Source)
   at java.io.BufferedWriter.<unknown>(Unknown Source)
   at com.tchegbe.lib.b.a.b.a(Unknown Source)
   at com.tchegbe.lib.b.a.c.run(Unknown Source)
[2017-01-20 23:19:49,025] INFO : Exiting all processes...
[2017-01-20 23:19:50,244] ERROR: Connection refused: connect
java.net.ConnectException: Connection refused: connect
   at java.net.DualStackPlainSocketImpl.waitForConnect(Native Method)
   at java.net.DualStackPlainSocketImpl.socketConnect(Unknown Source)
   at java.net.AbstractPlainSocketImpl.doConnect(Unknown Source)
   at java.net.AbstractPlainSocketImpl.connectToAddress(Unknown Source)
   at java.net.AbstractPlainSocketImpl.connect(Unknown Source)
   at java.net.PlainSocketImpl.connect(Unknown Source)
   at java.net.SocksSocketImpl.connect(Unknown Source)
   at java.net.Socket.connect(Unknown Source)
   at com.tchegbe.lib.b.a.b.a(Unknown Source)
   at com.tchegbe.lib.b.a.e.a(Unknown Source)
   at com.tchegbe.lib.b.a.g.a(Unknown Source)
   at com.tchegbe.lib.common.i.d.b(Unknown Source)
   at com.tchegbe.lib.common.i.a.run(Unknown Source)
[2017-01-20 23:19:50,837] ERROR: Connection refused: connect
java.net.ConnectException: Connection refused: connect
   at java.net.DualStackPlainSocketImpl.waitForConnect(Native Method)
   at java.net.DualStackPlainSocketImpl.socketConnect(Unknown Source)
   at java.net.AbstractPlainSocketImpl.doConnect(Unknown Source)
   at java.net.AbstractPlainSocketImpl.connectToAddress(Unknown Source)
   at java.net.AbstractPlainSocketImpl.connect(Unknown Source)
   at java.net.PlainSocketImpl.connect(Unknown Source)
   at java.net.SocksSocketImpl.connect(Unknown Source)
   at java.net.Socket.connect(Unknown Source)
   at com.tchegbe.lib.b.a.b.a(Unknown Source)
   at com.tchegbe.lib.b.a.e.a(Unknown Source)
   at com.tchegbe.lib.b.a.g.a(Unknown Source)
   at com.tchegbe.lib.common.i.d.a(Unknown Source)
   at com.tchegbe.lib.common.i.a.<unknown>(Unknown Source)
   at com.tchegbe.lib.common.i.a.<unknown>(Unknown Source)
   at com.tchegbe.lib.common.i.a.<unknown>(Unknown Source)
   at com.tchegbe.lib.io.a.h.run(Unknown Source)
[2017-01-20 23:22:40,628] INFO : Setting core service port to 9595
[2017-01-20 23:22:40,659] INFO : Setting Http port to 8080
[2017-01-20 23:22:40,659] INFO : Https port not set. Https requests will not be serviced...
[2017-01-20 23:22:40,768] INFO : Starting FlexRAID...
[2017-01-20 23:22:40,784] INFO : FlexRAID 2.1 [Snapshot 1.1 / Storage Pool 2.0] [2015.11.01]
[2017-01-20 23:22:40,784] INFO : Started: January 20, 2017 11:22:40 PM EST
[2017-01-20 23:23:00,400] INFO : Initiating FlexRAID Storage Pool...
[2017-01-20 23:23:00,924] INFO : Host OS = Microsoft Windows [Version 6.3.9600]
[2017-01-20 23:23:00,924] INFO : Building View Registry...
[2017-01-20 23:23:00,991] INFO : Building Auto View Registry...
[2017-01-20 23:23:01,007] INFO : Loading View options...
[2017-01-20 23:23:01,409] INFO : Using Auto-View definition...
[2017-01-20 23:23:01,409] INFO : Building View Handler...
[2017-01-20 23:23:01,731] INFO : Mounting Storage Pool drive...
[2017-01-20 23:23:23,156] INFO : Reconnecting d$="D:" /GRANT:kevin,FULL
[2017-01-21 00:16:41,908] INFO : Executing update...
[2017-01-21 00:16:41,909] INFO : [update] started at: Sat Jan 21 00:16:41 EST 2017
[2017-01-21 00:16:54,850] INFO : Using RAID engine Tx
[2017-01-21 00:21:34,910] INFO : Total process size = 440686084096
[2017-01-21 00:21:34,910] INFO : Max data size = 3905625784320
[2017-01-21 00:21:34,910] INFO : DRU1 size=3742662222994
[2017-01-21 00:21:34,910] INFO : DRU2 size=2923890489322
[2017-01-21 00:21:34,910] INFO : DRU3 size=2978383160150
[2017-01-21 00:21:34,910] INFO : DRU4 size=2976806162556
[2017-01-21 00:21:34,910] INFO : DRU5 size=2932063522204
[2017-01-21 00:21:34,910] INFO : DRU6 size=3387872483082
[2017-01-21 00:21:34,910] INFO : DRU7 size=3905625465549
[2017-01-21 00:21:34,910] INFO : DRU8 size=3772404523091
[2017-01-21 00:21:34,910] INFO : DRU9 size=3895048906964
[2017-01-21 00:21:34,910] INFO : DRU10 size=3115721260535
[2017-01-21 00:21:34,910] INFO : DRU11 size=2722723491891
[2017-01-21 00:43:16,127] WARN : process has not exited
java.lang.IllegalThreadStateException: process has not exited
   at java.lang.ProcessImpl.<unknown>(Unknown Source)
   at java.lang.ProcessImpl.waitFor(Unknown Source)
   at com.tchegbe.a.a.c.a(Unknown Source)
   at com.tchegbe.a.a.c.<unknown>(Unknown Source)
   at com.tchegbe.a.a.c.<unknown>(Unknown Source)
   at com.tchegbe.lib.io.view.k.s.<unknown>(Unknown Source)
   at com.tchegbe.lib.io.view.k.s.<unknown>(Unknown Source)
   at com.tchegbe.lib.io.view.k.s.<unknown>(Unknown Source)
   at com.tchegbe.flexraid.main.b.A.getFlags(Unknown Source)
   at com.google.gwt.user.server.rpc.RPC.<unknown>(Unknown Source)
   at com.google.gwt.user.server.rpc.RemoteServiceServlet.processCall(Unknown Source)
   at com.google.gwt.user.server.rpc.RemoteServiceServlet.processPost(Unknown Source)
   at com.google.gwt.user.server.rpc.AbstractRemoteServiceServlet.doPost(Unknown Source)
   at javax.servlet.http.HttpServlet.service(Unknown Source)
   at javax.servlet.http.HttpServlet.service(Unknown Source)
   at org.eclipse.jetty.servlet.ServletHolder.handle(Unknown Source)
   at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(Unknown Source)
   at com.tchegbe.lib.gwt.server.servlet.SecurityFilter.doFilter(Unknown Source)
   at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(Unknown Source)
   at org.eclipse.jetty.servlet.ServletHandler.doHandle(Unknown Source)
   at org.eclipse.jetty.server.session.SessionHandler.doHandle(Unknown Source)
   at org.eclipse.jetty.server.handler.ContextHandler.doHandle(Unknown Source)
   at org.eclipse.jetty.servlet.ServletHandler.doScope(Unknown Source)
   at org.eclipse.jetty.server.session.SessionHandler.doScope(Unknown Source)
   at org.eclipse.jetty.server.handler.ContextHandler.doScope(Unknown Source)
   at org.eclipse.jetty.server.handler.ScopedHandler.handle(Unknown Source)
   at org.eclipse.jetty.server.handler.HandlerList.handle(Unknown Source)
   at org.eclipse.jetty.server.handler.HandlerWrapper.handle(Unknown Source)
   at org.eclipse.jetty.server.Server.handle(Unknown Source)
   at org.eclipse.jetty.server.HttpConnection.handleRequest(Unknown Source)
   at org.eclipse.jetty.server.HttpConnection$RequestHandler.content(Unknown Source)
   at org.eclipse.jetty.http.HttpParser.parseNext(Unknown Source)
   at org.eclipse.jetty.http.HttpParser.parseAvailable(Unknown Source)
   at org.eclipse.jetty.server.BlockingHttpConnection.handle(Unknown Source)
   at org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(Unknown Source)
   at org.eclipse.jetty.util.thread.QueuedThreadPool.<unknown>(Unknown Source)
   at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(Unknown Source)
   at java.lang.Thread.run(Unknown Source)
[2017-01-21 01:37:03,839] WARN : D:5:1\youtubedownload\Ming Xi on Instagram “💗💗💗#TrainLikeAnAngel #VSFashionShow.mp4 no longer exist! Skipping...
[2017-01-21 06:24:51,273] INFO : Successful UPDATE operation!
[2017-01-21 06:25:19,655] INFO : [update] completed at: Sat Jan 21 06:25:19 EST 2017
[2017-01-21 06:25:20,473] INFO : Sending email alert - message=[SNAP-RAID1][UPDATE] SUCCESS
[2017-01-21 06:25:30,785] INFO : [SNAP-RAID1][UPDATE] SUCCESS - emailed...
[2017-01-23 02:00:00,258] INFO : Executing update...
[2017-01-23 02:00:00,258] INFO : [update] started at: Mon Jan 23 02:00:00 EST 2017
[2017-01-23 02:00:28,908] INFO : Using RAID engine Tx
[2017-01-23 02:08:03,185] INFO : Total process size = 48191504384
[2017-01-23 02:08:03,185] INFO : Max data size = 3905625784320
[2017-01-23 02:08:03,185] INFO : DRU1 size=3742662222994
[2017-01-23 02:08:03,185] INFO : DRU2 size=2923890489322
[2017-01-23 02:08:03,185] INFO : DRU3 size=2978383160150
[2017-01-23 02:08:03,185] INFO : DRU4 size=2976806162556
[2017-01-23 02:08:03,185] INFO : DRU5 size=2932063522204
[2017-01-23 02:08:03,185] INFO : DRU6 size=3387872483082
[2017-01-23 02:08:03,185] INFO : DRU7 size=3905625465549
[2017-01-23 02:08:03,185] INFO : DRU8 size=3796419348022
[2017-01-23 02:08:03,185] INFO : DRU9 size=3895048906964
[2017-01-23 02:08:03,185] INFO : DRU10 size=3115721260535
[2017-01-23 02:08:03,185] INFO : DRU11 size=2722723491891
[2017-01-23 02:19:01,920] WARN : D:5:1\youtubedownload\Ming Xi on Instagram “💗💗💗#TrainLikeAnAngel #VSFashionShow.mp4 no longer exist! Skipping...
[2017-01-23 02:41:02,142] INFO : Successful UPDATE operation!
[2017-01-23 02:41:04,328] INFO : [update] completed at: Mon Jan 23 02:41:04 EST 2017
[2017-01-23 02:41:07,519] INFO : Sending email alert - message=[SNAP-RAID1][UPDATE] SUCCESS
« Last Edit: February 12, 2017, 09:03:55 am by TheJSN »
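A log like the one above can also be scanned after the fact for a burst of `Deleted:` warnings, which is the pattern described in this post. This is a rough sketch under a few assumptions: the log lines keep the `[timestamp] WARN : Deleted: D:n:n\...` shape shown here, the threshold of 100 is arbitrary, and the function names are invented for the example. It groups by the `D:6:0`-style prefix exactly as it appears in the log, without assuming anything about how that prefix maps to a physical disk.

```python
import re
from collections import Counter

# Matches lines such as:
# [2017-01-19 05:14:01,039] WARN : Deleted: D:6:0\HDTV1\Drama\Pearl Harbor (2001).mkv
DELETED_RE = re.compile(r"^\[[^\]]+\]\s+WARN\s*:\s+Deleted:\s+(D:\d+:\d+)\\")


def deletions_per_unit(log_lines):
    """Count 'Deleted:' warnings per path prefix (e.g. 'D:6:0')."""
    counts = Counter()
    for line in log_lines:
        match = DELETED_RE.match(line)
        if match:
            counts[match.group(1)] += 1
    return counts


def suspicious_units(log_lines, threshold=100):
    """Return the prefixes whose deletion count reaches the threshold."""
    counts = deletions_per_unit(log_lines)
    return sorted(unit for unit, n in counts.items() if n >= threshold)
```

Feeding it the lines of the log file (for example via `open(path, encoding="utf-8", errors="replace")`) gives a quick way to notice that one prefix suddenly lost thousands of files before trusting a later `SUCCESS` email.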

Offline TheJSN

Brahim, I've sent and posted my logs from the last failure that I could not recover from. Any ideas? Is it the validate process that is deleting the files? Does it do the same checks for failed drives as update does?

Offline TheJSN

Brahim, I've had the same situation again. A drive from my drive pool went missing, either just before or during the update. No errors in the logs. When I went to view the contents of the failed drive, it was blank again. It completed the update as if there were no data on the drive. Luckily I was able to remove it and boot it up, copying the data off manually, but FlexRAID has failed in this same situation. I've sent you the previous logs and received no response. Suspecting that it was the validation process, I removed it from my scheduled tasks and now only do a quick validate and update weekly.

Please help.

Offline TheCodeMan

  • Newbie
  • *
  • Posts: 25
  • Karma: +0/-0
I realize this is a very old post, but I have also seen the exact same situation, where drives go offline and the update still runs automatically. I wonder if it has to do with some drives not being monitored by S.M.A.R.T.

Brahim, is it possible that the automatic update cancellation depends on the drive's S.M.A.R.T. status?

In my case the S.M.A.R.T. functionality of FlexRAID doesn't see all my drives because of the controllers I am using. Apparently they are not compatible. My solution has been to not schedule any updates but rather to run them manually, after a full disk check.
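Part of a manual pre-update check like this could be scripted. The sketch below is one possible approach, not anything FlexRAID provides: it compares the current number of files on each drive against a baseline you recorded earlier and flags any drive that lost more than a chosen fraction of its files, which is what a drive that "looks blank" would show. The function names and the 50% default are assumptions.

```python
import os


def count_files(root):
    """Count non-directory entries under root; a missing root yields 0."""
    total = 0
    for _dirpath, _dirnames, filenames in os.walk(root):
        total += len(filenames)
    return total


def safe_to_update(baseline, current, max_drop=0.5):
    """Return (ok, problems) comparing current file counts to a baseline.

    baseline and current map a drive root to its file count. A drive is
    flagged when it lost more than max_drop (a fraction) of its files.
    """
    problems = []
    for root, before in baseline.items():
        now = current.get(root, 0)
        if before > 0 and (before - now) / before > max_drop:
            problems.append(f"{root}: {before} -> {now} files")
    return (not problems, problems)
```

In use, you would build `current` with `count_files` for each drive root, call `safe_to_update(baseline, current)`, and start the update by hand only when `ok` is true. After a successful update, `current` becomes the new baseline.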