Author Topic: Transparent RAID Performance Thread - Part 3/4 (Performance Mode)  (Read 35471 times)

Offline Brahim

  • Global Moderator
  • Hero Member
  • *****
  • Posts: 8,547
  • Karma: +204/-16
    • View Profile
Re: The new "Performance Mode" in Transparent RAID
« Reply #30 on: December 10, 2013, 09:14:51 pm »
@Devilman
How much RAM did you have on the system? And how much of it was free?
A screenshot of your RAID performance settings is also needed.

Here is the bug check analysis. It points to something else (afd.sys), but tRAID could still have set up the conditions for it, for instance by taking too much memory.
One test you could run is to reduce TCQ if you are in a low-memory situation.
Code: [Select]

Microsoft (R) Windows Debugger Version 6.12.0002.633 AMD64
Copyright (c) Microsoft Corporation. All rights reserved.


Loading Dump File [C:\Windows\Minidump\121013-16629-01.dmp]
Mini Kernel Dump File: Only registers and stack trace are available

Symbol search path is: SRV*C:\SYMBOLS*http://msdl.microsoft.com/download/symbols;C:\UserSymbols
Executable search path is:
Windows 7 Kernel Version 7601 (Service Pack 1) MP (8 procs) Free x64
Product: WinNt, suite: TerminalServer SingleUserTS
Built by: 7601.18247.amd64fre.win7sp1_gdr.130828-1532
Machine Name:
Kernel base = 0xfffff800`03249000 PsLoadedModuleList = 0xfffff800`0348c6d0
Debug session time: Tue Dec 10 14:47:53.399 2013 (UTC - 6:00)
System Uptime: 0 days 0:06:28.258
Loading Kernel Symbols
...............................................................
................................................................
........................
Loading User Symbols
Loading unloaded module list
.......
*******************************************************************************
*                                                                             *
*                        Bugcheck Analysis                                    *
*                                                                             *
*******************************************************************************

Use !analyze -v to get detailed debugging information.

BugCheck 1E, {0, 0, 0, 0}

Probably caused by : afd.sys ( afd!AfdCompletePollIrp+5e )

Followup: MachineOwner
---------

0: kd> !analyze -v
*******************************************************************************
*                                                                             *
*                        Bugcheck Analysis                                    *
*                                                                             *
*******************************************************************************

KMODE_EXCEPTION_NOT_HANDLED (1e)
This is a very common bugcheck.  Usually the exception address pinpoints
the driver/function that caused the problem.  Always note this address
as well as the link date of the driver/image that contains this address.
Arguments:
Arg1: 0000000000000000, The exception code that was not handled
Arg2: 0000000000000000, The address that the exception occurred at
Arg3: 0000000000000000, Parameter 0 of the exception
Arg4: 0000000000000000, Parameter 1 of the exception

Debugging Details:
------------------


EXCEPTION_CODE: (Win32) 0 (0) - The operation completed successfully.

FAULTING_IP:
+6464343430346439
00000000`00000000 ??              ???

EXCEPTION_PARAMETER1:  0000000000000000

EXCEPTION_PARAMETER2:  0000000000000000

ERROR_CODE: (NTSTATUS) 0 - STATUS_WAIT_0

BUGCHECK_STR:  0x1E_0

CUSTOMER_CRASH_COUNT:  1

DEFAULT_BUCKET_ID:  VISTA_DRIVER_FAULT

PROCESS_NAME:  System

CURRENT_IRQL:  2

EXCEPTION_RECORD:  fffff80000b9c168 -- (.exr 0xfffff80000b9c168)
ExceptionAddress: fffff88003ee256e (afd!AfdCompletePollIrp+0x000000000000005e)
   ExceptionCode: c0000005 (Access violation)
  ExceptionFlags: 00000000
NumberParameters: 2
   Parameter[0]: 0000000000000000
   Parameter[1]: ffffffffffffffff
Attempt to read from address ffffffffffffffff

TRAP_FRAME:  fffff80000b9c210 -- (.trap 0xfffff80000b9c210)
NOTE: The trap frame does not contain all registers.
Some register values may be zeroed or incorrect.
rax=bdcd56464313ce36 rbx=0000000000000000 rcx=eec9589bf72e953b
rdx=0000000000000000 rsi=0000000000000000 rdi=0000000000000000
rip=fffff88003ee256e rsp=fffff80000b9c3a0 rbp=0000000000000000
 r8=eec9589bf72e953c  r9=0000000001cef5e9 r10=fffff80003249000
r11=fffff80000b9c470 r12=0000000000000000 r13=0000000000000000
r14=0000000000000000 r15=0000000000000000
iopl=0         nv up ei ng nz na po nc
afd!AfdCompletePollIrp+0x5e:
fffff880`03ee256e 8b87a0000000    mov     eax,dword ptr [rdi+0A0h] ds:2980:00000000`000000a0=????????
Resetting default scope

LAST_CONTROL_TRANSFER:  from fffff800032b65be to fffff800032beb90

STACK_TEXT: 
fffff800`00b9b248 fffff800`032b65be : fffff800`00b9b410 fffff800`00000001 fffff800`00b9b9c0 fffff800`032e9a90 : nt!KeBugCheck
fffff800`00b9b250 fffff800`032e975d : fffff800`034cd380 fffff800`0340a260 fffff800`03249000 fffff800`00b9c168 : nt!KiKernelCalloutExceptionHandler+0xe
fffff800`00b9b280 fffff800`032e8535 : fffff800`0340e038 fffff800`00b9b2f8 fffff800`00b9c168 fffff800`03249000 : nt!RtlpExecuteHandlerForException+0xd
fffff800`00b9b2b0 fffff800`032f94c1 : fffff800`00b9c168 fffff800`00b9b9c0 fffff800`00000000 fb3e37b5`efb3dbf7 : nt!RtlDispatchException+0x415
fffff800`00b9b990 fffff800`032be242 : fffff800`00b9c168 00000000`00000000 fffff800`00b9c210 fffffa80`1be5ee10 : nt!KiDispatchException+0x135
fffff800`00b9c030 fffff800`032bcb4a : fffffa80`1ad7f840 00000000`00000000 fffff800`03439e80 fffffa80`193cd001 : nt!KiExceptionDispatch+0xc2
fffff800`00b9c210 fffff880`03ee256e : ffffffff`ffffffff fffffa80`1d007e02 fffffa80`193cd010 00000000`00000000 : nt!KiGeneralProtectionFault+0x10a
fffff800`00b9c3a0 fffff880`03ee264e : 00000000`00000000 00000000`00000000 fffffa80`1cbcea00 fffff800`032c422a : afd!AfdCompletePollIrp+0x5e
fffff800`00b9c400 fffff800`032c985c : fffff800`00b9c4c0 fffff800`00b9c4c0 00000006`00000000 00000000`00000055 : afd!AfdTimeoutPoll+0x2e
fffff800`00b9c430 fffff800`032c96f6 : fffffa80`1cbd0168 fffffa80`1cbd0168 00000000`00000000 00000000`00000000 : nt!KiProcessTimerDpcTable+0x6c
fffff800`00b9c4a0 fffff800`032c95de : 00000000`e76b874b fffff800`00b9cb18 00000000`00006138 fffff800`0343c988 : nt!KiProcessExpiredTimerList+0xc6
fffff800`00b9caf0 fffff800`032c93c7 : 00000000`3889ddc2 00000000`00006138 00000000`3889dd23 00000000`00000038 : nt!KiTimerExpiration+0x1be
fffff800`00b9cb90 fffff800`032b68ca : fffff800`03439e80 fffff800`03447cc0 00000000`00000001 fffff880`00000000 : nt!KiRetireDpcList+0x277
fffff800`00b9cc40 00000000`00000000 : fffff800`00b9d000 fffff800`00b97000 fffff800`00b9cc00 00000000`00000000 : nt!KiIdleLoop+0x5a


STACK_COMMAND:  kb

FOLLOWUP_IP:
afd!AfdCompletePollIrp+5e
fffff880`03ee256e 8b87a0000000    mov     eax,dword ptr [rdi+0A0h]

SYMBOL_STACK_INDEX:  7

SYMBOL_NAME:  afd!AfdCompletePollIrp+5e

FOLLOWUP_NAME:  MachineOwner

MODULE_NAME: afd

IMAGE_NAME:  afd.sys

DEBUG_FLR_IMAGE_TIMESTAMP:  52462c33

FAILURE_BUCKET_ID:  X64_0x1E_0_afd!AfdCompletePollIrp+5e

BUCKET_ID:  X64_0x1E_0_afd!AfdCompletePollIrp+5e

Followup: MachineOwner
---------


« Last Edit: December 10, 2013, 09:33:16 pm by Brahim »

Offline Devilman

  • Newbie
  • *
  • Posts: 48
  • Karma: +1/-0
    • View Profile
Re: The new "Performance Mode" in Transparent RAID
« Reply #31 on: December 11, 2013, 02:20:07 am »
My system has 32GB of memory.
Sorry for not sending the config. I was just starting to test performance mode, so I had not activated any of swo/tcq/os-caching: all false.
After the first bluescreen, the first thing I did was restart the server, then I started the array (still in performance mode) for further testing. So not much RAM was being used for other things.
After switching back to energy-saving mode, I could copy 200GB of files of different sizes to the pool without any hang. In fact, these bluescreens under performance mode are the only ones I have had.
Everything else, like TV streaming, the media server, and the SABNZB stuff, is running smoothly without any errors or bluescreens.
My System:
1x 480 Watt be quiet! Straight Power E9 CM Modular
1x Intel DQ77MK Intel Q77 So.1155 Dual Channel DDR3
1x Intel Xeon E3-1265Lv2 4x 2.50GHz So.1155
1x Adaptec 7805H 8 Port Multi-Lane PCIe 3.0 x8
4x 8192MB G.Skill Ares DDR3-1600 DIMM
2x DigitalDevices Cards
4x WD Red 4TB (3x DRU, 1x PPU)
1x Seagate Barracuda 3TB (plain)
2x 1.5TB (Samsung and some other, mirrored by Windows)

Any suggestions on what I should do to help?
Right now it's 9:15am here. In about 9 hours I'll be home, so I can open a TeamViewer session if that helps. Before then it does not make sense, since when I get the bluescreen the server goes down and does not restart by itself.
« Last Edit: December 11, 2013, 12:51:12 pm by Devilman »

Offline skoude

  • Newbie
  • *
  • Posts: 2
  • Karma: +0/-0
    • View Profile
Re: The new "Performance Mode" in Transparent RAID
« Reply #32 on: December 11, 2013, 04:50:37 am »
I would also like to test this.

I have an HP ProLiant DL 460 G8
128GB Memory
2x Xeon processor
2x Fusion-IO IODrive 2 1.2TB cards (250 000 iops per card)
2x 500GB 15K SAS Drives

Offline Devilman

Re: The new "Performance Mode" in Transparent RAID
« Reply #33 on: December 14, 2013, 02:09:02 am »
Hi,
Any public news on the beta, or are you testing in private now?
I have now run memtest86 twice and found no errors in the memory.
Still hoping for good news on the new mode.

Offline Brahim

Re: The new "Performance Mode" in Transparent RAID
« Reply #34 on: December 14, 2013, 10:19:10 am »
@Devilman
Still stress testing, but mostly in VMs. My physical testing box is down. :(
So, I am hoping for more user testing and reports.

Can you send me your subsequent minidumps? Sometimes it takes several dumps to track finicky issues.

Offline Devilman

Re: The new "Performance Mode" in Transparent RAID
« Reply #35 on: December 14, 2013, 01:12:02 pm »
Hi,
I have added 4 dumps from the same days.

2-3 more are coming; I'm "creating" them now by switching into performance mode.

Offline Devilman

Re: The new "Performance Mode" in Transparent RAID
« Reply #36 on: December 14, 2013, 01:51:20 pm »
2 more minidumps

121413-17191-01.dmp: Default settings -> Bluescreen during deletion of a 21GB file
121413-18252-01.dmp: Direct-IO=false -> Hung for 30 seconds during deletion of a 21GB file, bluescreen when starting an MKV

Offline Brahim

Re: The new "Performance Mode" in Transparent RAID
« Reply #37 on: December 14, 2013, 04:51:43 pm »
@Devilman
I have analyzed all six dumps and none of them point to tRAID.
Is it possible that tRAID is corrupting memory, which then ends up affecting other components? Sure. However, it is rare for all failures to happen everywhere else but in the active component. With tRAID being the most active component, any screw-up on its part would almost always cause a failure in one of its own operations at some point.
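For anyone who wants to repeat this kind of check on their own dumps, here is a minimal Python sketch. It assumes you have saved the WinDbg output of each dump as text, and it simply tallies the image named on each "Probably caused by" line, like the one in the analysis posted earlier. The function names `faulting_images` and `tally` are made up for illustration; this is not a FlexRAID tool and it does no analysis of its own.

```python
import re
from collections import Counter

# Matches the WinDbg summary line, for example:
# "Probably caused by : afd.sys ( afd!AfdCompletePollIrp+5e )"
CAUSE_RE = re.compile(r"^Probably caused by\s*:\s*(\S+)", re.MULTILINE)


def faulting_images(analysis_text):
    """Return every image name WinDbg blamed in one analysis output."""
    return [name.lower() for name in CAUSE_RE.findall(analysis_text)]


def tally(analyses):
    """Count how often each image is blamed across several analysis outputs."""
    counts = Counter()
    for text in analyses:
        counts.update(faulting_images(text))
    return counts


if __name__ == "__main__":
    # Excerpt copied from the bug check analysis earlier in this thread.
    sample = (
        "BugCheck 1E, {0, 0, 0, 0}\n\n"
        "Probably caused by : afd.sys ( afd!AfdCompletePollIrp+5e )\n"
    )
    for image, count in tally([sample]).most_common():
        print(image, count)
```

If the same unrelated image shows up in every dump, that image is worth a closer look. If the blamed image changes from dump to dump, that fits the "unstable under load" theory better than a bug in a single driver. Keep in mind the summary line is only the debugger's best guess.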

What I am thinking is that your system might be unstable under load. Things work in energy mode because tRAID goes easy on the system. However, it looks like once tRAID starts to rev up, your system starts to flake.

So, make sure you have enough juice in your PSU for the load. This is the number one cause of this type of issue.
Then try to update your drivers.

Offline Brahim

Re: The new "Performance Mode" in Transparent RAID
« Reply #38 on: December 14, 2013, 04:52:33 pm »
@skoude
The downloads are in the first post.

Offline Salty.dh

  • Newbie
  • *
  • Posts: 36
  • Karma: +0/-0
    • View Profile
Re: The new "Performance Mode" in Transparent RAID
« Reply #39 on: December 14, 2013, 05:39:51 pm »
I don't know if this will help or not, but I did notice that the version of Afd.sys which appears in at least a couple of those dump files is not the latest version.

I couldn't find that specific issue but there were some recent changes to Afd.sys.

http://www.aidinit.com/2013/08/afd-sys-pre-service-pack-2-revision-history-for-windows-7-sp1-windows-server-2008-r2-sp1-and-windows-small-business-server-2011-sbs-2011/

Might be worth a try, just make sure you've got backups :)

Offline Devilman

Re: The new "Performance Mode" in Transparent RAID
« Reply #40 on: December 15, 2013, 06:34:51 am »
@Brahim:
Ok, thanks.
Last question:
I did check:
- all drivers are the latest
- memtest86+ ran twice without any errors
- prime95 running continuously (system is running "hot"), no issues
- transferring many GB of files between two systems without any issues
- running the tool "HeavyLoad" http://www.jam-software.com/heavyload/?language=EN for over 6 hours now and nothing happens, just running fine (CPU 100%, 80GB HDD access, GPU 100%, over 26GB of memory reserved)

Do you have an idea how I can analyze it in more detail? Something must be accessed in performance mode that is normally not accessed.

Besides, sorry for not getting further with testing. So far I'm unfortunately out of the testing "business", since I cannot get it stable in performance mode.

@Salty.dh:
Thanks, I will try it after the last HeavyLoad run finishes (11 hours).


« Last Edit: December 15, 2013, 06:36:28 am by Devilman »

Offline Brahim

Re: The new "Performance Mode" in Transparent RAID
« Reply #41 on: December 15, 2013, 07:21:33 am »
@Devilman
Concurrent disk access is what you should be testing.
As previously stated, it could be your PSU not having enough juice for the load. Prime and others are CPU load centric, which is not the same as I/O stress testing.
FlexRAID will test your I/O as well as overall system load.

Download: http://download.openegg.org/temp/Simple-IO-Test.zip
There are two example test scripts. The first one benchmarks one drive at a time, and the other benchmarks all drives in parallel.
Do the parallel test.
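To illustrate the difference between the sequential and the parallel case, here is a rough Python sketch. It is not the content of Simple-IO-Test.zip. The function names (`exercise_dir`, `run_sequential`, `run_parallel`), the scratch file name and the 64 MB default size are all assumptions picked for the example. Each directory is meant to sit on a different physical drive.

```python
import os
import tempfile
import time
from concurrent.futures import ThreadPoolExecutor

CHUNK = 1024 * 1024  # write and read in 1 MiB chunks


def exercise_dir(directory, size_mb=64):
    """Write a scratch file into `directory`, read it back, return rough MB/s."""
    path = os.path.join(directory, "io_test.tmp")  # hypothetical scratch name
    block = os.urandom(CHUNK)
    total = 0
    try:
        start = time.perf_counter()
        with open(path, "wb") as f:
            for _ in range(size_mb):
                f.write(block)
            f.flush()
            os.fsync(f.fileno())  # push data to the device, not just the cache
        write_s = max(time.perf_counter() - start, 1e-9)

        start = time.perf_counter()
        with open(path, "rb") as f:
            while True:
                data = f.read(CHUNK)
                if not data:
                    break
                total += len(data)
        # Note: the read may be served from the OS cache, so treat it as a
        # load generator rather than an accurate read benchmark.
        read_s = max(time.perf_counter() - start, 1e-9)
    finally:
        if os.path.exists(path):
            os.remove(path)
    return {
        "dir": directory,
        "bytes_read": total,
        "write_mb_s": size_mb / write_s,
        "read_mb_s": size_mb / read_s,
    }


def run_sequential(directories, size_mb=64):
    """Exercise one directory at a time."""
    return [exercise_dir(d, size_mb) for d in directories]


def run_parallel(directories, size_mb=64):
    """Exercise all directories at once, one worker thread per directory."""
    workers = max(1, len(directories))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda d: exercise_dir(d, size_mb), directories))


if __name__ == "__main__":
    # Temp dirs keep the demo harmless; replace them with one path per drive.
    with tempfile.TemporaryDirectory() as a, tempfile.TemporaryDirectory() as b:
        for result in run_parallel([a, b], size_mb=8):
            print(result)
```

The parallel run is the one that matters here, because it keeps every drive (and the controller and PSU behind them) busy at the same time, which a CPU-centric tool does not do.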

Offline Brahim

Re: The new "Performance Mode" in Transparent RAID
« Reply #42 on: December 16, 2013, 12:31:54 am »
@Devilman
I have a new system that is having the same issue as you have reported in performance mode.
I will be debugging to find the cause. Stay tuned.

Offline Brahim

Re: The new "Performance Mode" in Transparent RAID
« Reply #43 on: December 17, 2013, 08:11:44 am »
@Devilman
The bug is now fixed. I am working on another bug before the next beta release.

On a side note, the performance mode is working well for me.
Check out these two RAID monitoring screenshots. The first one is while in energy saving mode and the other is while in performance mode.
P.S. This array involves some SSDs. One of the SSDs is actually the bottleneck.
« Last Edit: December 17, 2013, 08:14:09 am by Brahim »

Offline Brahim

Re: The new "Performance Mode" in Transparent RAID
« Reply #44 on: December 17, 2013, 08:15:03 pm »
We now have confirmation that the performance mode setting works as theorized.
Look at the attached screenshots for normal HDDs. This RAID uses slower WD 1TB Green drives.

Also, note that the numbers shown both with the SSD RAID and this slower RAID are actually lower than what Explorer shows while copying. For instance, the 29MB/s displayed below was more like 35MB+/s lowest ending speed. Likewise the 63MB/s was 75MB+/s as displayed in Explorer.

Bottom line: it works, with improvements of 100% or more (double or more) over base speed.  ;D
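As a quick check of the arithmetic, here is a small Python sketch. It assumes the 29MB/s (about 35MB/s in Explorer) figure is the energy saving mode run and the 63MB/s (about 75MB/s in Explorer) figure is the performance mode run; that pairing is read from the numbers quoted above and is not stated outright. The helper name `improvement_percent` is made up for the example.

```python
def improvement_percent(base_mb_s, new_mb_s):
    """Percentage gain of new_mb_s over base_mb_s."""
    return (new_mb_s - base_mb_s) / base_mb_s * 100.0


# Assumed pairing: first value energy saving mode, second performance mode.
monitor = improvement_percent(29, 63)   # figures shown in the RAID monitor
explorer = improvement_percent(35, 75)  # approximate figures seen in Explorer

print(f"RAID monitor: +{monitor:.0f}%")   # prints "RAID monitor: +117%"
print(f"Explorer:     +{explorer:.0f}%")  # prints "Explorer:     +114%"
```

Either way you measure it, the gain lands a bit above 100%, which is where the "double or more" comes from.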

I have one last bug affecting 32-bit OSes in performance mode that I need to resolve. After that, there will be a new beta build.
« Last Edit: December 18, 2013, 03:21:34 am by Brahim »