Aerosol
Posted December 28, 2014

I've discussed some 0xD1 debugging here, but I figured I'd also go into a different 0xD1 scenario, and show it from different angles by using NotMyFault to force a bug check.

Download NotMyfault here.

--------------------

DRIVER_IRQL_NOT_LESS_OR_EQUAL (d1)

This indicates that a kernel-mode driver attempted to access pageable memory at a process IRQL that was too high.

We're all familiar with this bug check, so let's move on to what I wanted to talk about.

Let's go ahead and do an !analyze -v:

DRIVER_IRQL_NOT_LESS_OR_EQUAL (d1)
An attempt was made to access a pageable (or completely invalid) address at an
interrupt request level (IRQL) that is too high.  This is usually
caused by drivers using improper addresses.
If kernel debugger is available get stack backtrace.
Arguments:
Arg1: fffff8a0066eb800, memory referenced
Arg2: 0000000000000002, IRQL
Arg3: 0000000000000000, value 0 = read operation, 1 = write operation
Arg4: fffff88002af7385, address which referenced memory

fffff8a0066eb800 was the memory that was referenced. It's either invalid, or it's pageable and was accessed at an IRQL that was too high.

kd> !pte fffff8a0066eb800
                                           VA fffff8a0066eb800
PXE at FFFFF6FB7DBEDF88    PPE at FFFFF6FB7DBF1400    PDE at FFFFF6FB7E280198    PTE at FFFFF6FC50033758
contains 000000007AC84863  contains 000000000367B863  contains 000000006B4C6863  contains 00003B5000000000
pfn 7ac84     ---DA--KWEV  pfn 367b      ---DA--KWEV  pfn 6b4c6     ---DA--KWEV  not valid
                                                                                  PageFile:  0
                                                                                  Offset: 3b50
                                                                                  Protect: 0

Using our handy !pte command, which shows the page table and page directory entries for an address, we can see that this is not a valid address, despite appearing to be one at first glance. Why is it not valid? Look at the PTE column above ("not valid", PageFile: 0, Offset: 3b50): this address is currently in the pagefile.

Why can't we just page it in? Because that is not how the Windows memory manager works in kernel-mode; it has rules.
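As a side note, you can sanity-check that !pte output by hand. This is only a minimal sketch in Python (not something WinDbg gives you), and it assumes the standard x64 page table entry layout, where bit 0 is the valid bit and the page frame number starts at bit 12. The names decode_entry, VALID_BIT, PFN_SHIFT and PFN_MASK are mine, made up for illustration.

```python
# Sketch: cross-check the four "contains" values from the !pte output.
# Assumption: bit 0 is the Valid (present) bit and bits 12-51 hold the
# page frame number, as in the x64 hardware page table entry format.

VALID_BIT = 0x1
PFN_SHIFT = 12
PFN_MASK = (1 << 40) - 1  # 40 bits, covering bits 12..51 of the entry


def decode_entry(value):
    """Return (is_valid, pfn); pfn is None when the entry is not valid."""
    is_valid = bool(value & VALID_BIT)
    pfn = (value >> PFN_SHIFT) & PFN_MASK if is_valid else None
    return is_valid, pfn


# Values copied from the !pte output for fffff8a0066eb800.
entries = {
    "PXE": 0x000000007AC84863,
    "PPE": 0x000000000367B863,
    "PDE": 0x000000006B4C6863,
    "PTE": 0x00003B5000000000,
}

for name, value in entries.items():
    is_valid, pfn = decode_entry(value)
    if is_valid:
        print(f"{name}: valid, pfn {pfn:x}")
    else:
        print(f"{name}: not valid")
```

The first three entries come back valid with the same pfn values that !pte printed (7ac84, 367b, 6b4c6), and only the last one comes back not valid. I'm deliberately not decoding the pagefile fields of the invalid entry here, since that layout depends on the Windows version.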
If we're at IRQL 2 (DISPATCH_LEVEL) or higher (which we are, see argument 2), we cannot page anything in, so we bug check.

Great, so we know why the system crashed. However, what caused it?

--------------------

Let's go ahead and dump the stack:

kd> k
Child-SP          RetAddr           Call Site
fffff880`032f4448 fffff800`02a912a9 nt!KeBugCheckEx
fffff880`032f4450 fffff800`02a8ff20 nt!KiBugCheckDispatch+0x69
fffff880`032f4590 fffff880`02af7385 nt!KiPageFault+0x260
fffff880`032f4720 fffff880`02af7727 myfault+0x1385
fffff880`032f4870 fffff800`02dac127 myfault+0x1727
fffff880`032f48d0 fffff800`02dac986 nt!IopXxxControlFile+0x607
fffff880`032f4a00 fffff800`02a90f93 nt!NtDeviceIoControlFile+0x56
fffff880`032f4a70 00000000`76df138a nt!KiSystemServiceCopyEnd+0x13
00000000`0023edc8 00000000`00000000 0x76df138a

So here we have our call stack. Rather than marking the calls with <---, I'll comment below, because I don't want to destroy the formatting of the stack.

We start out with something in user-mode that we don't have symbols for, which is why we see 0x76df138a as opposed to a resolved name that we can understand. How did I know we started out in user-mode? Good question! Look at the leading 7 in 76df138a: when the first digit of an address like that is 7 or lower, it's a user-mode address. On x64 the full value is 00000000`76df138a, which sits in the lower half of the address space, and that half belongs to user-mode.

The frame also stays unresolved because this is a kernel dump, which we can see towards the top of our crash dump within WinDbg:

Kernel Summary Dump File: Only kernel address space is available

With that said, we cannot see what the application was doing before it went down into kernel-mode.

So we know that some application (0x76df138a) did something and called down into kernel-mode. Everything above 0x76df138a in the stack is kernel-mode. On x64, you can tell because the values under Child-SP (fffff880`032f4a00, for example) start with fffff, which implies kernel-mode.

We can see it goes through a few functions, and then ends up in myfault.
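If you'd rather not eyeball the leading digits, the user-mode versus kernel-mode check can be written down. This is a rough Python sketch, assuming 48-bit canonical x64 addressing (lower half user, upper half kernel). The helper names parse_address and classify, and the constants, are hypothetical and not part of WinDbg.

```python
# Sketch: classify an x64 virtual address as user-mode or kernel-mode.
# Assumption: 48-bit canonical addressing, so the lower half ends at
# 0x00007FFFFFFFFFFF and the upper half starts at 0xFFFF800000000000.

USER_MAX = 0x00007FFFFFFFFFFF
KERNEL_MIN = 0xFFFF800000000000
ADDRESS_MAX = 0xFFFFFFFFFFFFFFFF


def parse_address(text):
    """Parse a WinDbg-style address such as fffff880`032f4a00."""
    return int(text.replace("`", ""), 16)


def classify(address):
    """Return 'user', 'kernel', or 'non-canonical' for a 64-bit address."""
    if 0 <= address <= USER_MAX:
        return "user"
    if KERNEL_MIN <= address <= ADDRESS_MAX:
        return "kernel"
    return "non-canonical"


# Addresses copied from the call stack.
for text in ("00000000`76df138a", "fffff880`02af7385", "fffff880`032f4a00"):
    print(text, "->", classify(parse_address(text)))
```

The return address at the bottom of the stack lands in the user half, while the myfault return address and the Child-SP values land in the kernel half, which matches how the stack was read by eye.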
Shortly afterwards, we hit a page fault (trying to page in memory from the pagefile -- big no no).

--------------------

If we take a look at the trap frame:

kd> .trap 0xfffff880032f4590
NOTE: The trap frame does not contain all registers.
Some register values may be zeroed or incorrect.
rax=0000000005000000 rbx=0000000000000000 rcx=0000000000002481
rdx=fffffa8001810000 rsi=0000000000000000 rdi=0000000000000000
rip=fffff88002af7385 rsp=fffff880032f4720 rbp=fffff880032f4b60
 r8=0000000000012408  r9=0000000000000810 r10=fffff80002a12000
r11=0000000000000002 r12=0000000000000000 r13=0000000000000000
r14=0000000000000000 r15=0000000000000000
iopl=0         nv up ei ng nz na po nc
myfault+0x1385:
fffff880`02af7385 8b03            mov     eax,dword ptr [rbx] ds:00000000`00000000=????????

The first very important thing to note is the note about the trap frame not containing all registers, and how they may be either zeroed out or incorrect. The big question is why? Well, trap frame generation on x64 versions of Windows does not save the contents of non-volatile registers.

With that said, registers such as rbx, rdi, rsi, etc. are either zeroed out or incorrect in this output. On x64, any code that runs after the trap frame is generated has to preserve the non-volatile registers itself and restore them before it returns, so saving them again in the trap frame is seen as an unnecessary step in a hot path within the kernel.

Extremely detailed article with much more info here.

Moving on to the instruction we failed on: we were reading a dword from the address stored in the rbx register:

mov eax,dword ptr [rbx]

Uh oh, rbx is zeroed out. With that said, we can't !pte the address from the register to double-check it (Arg1 of the bug check is what records the referenced address). We just need to assume that this all occurred because myfault attempted to access memory that was either paged out or invalid (which it did).

--------------------

If you wanted any extra proof, or wanted to see whether NotMyFault was behind the crash, you could dump all of the processes at the time of the crash to see if there was any correlation.
In this case, you'd use !process 0 0. The flags are important here, and as always you can check the WinDbg help file or MSDN for info.

PROCESS fffffa80040a7060
    SessionId: 1  Cid: 0654    Peb: 7fffffd4000  ParentCid: 0708
    DirBase: 670ea000  ObjectTable: fffff8a00666c330  HandleCount:  68.
    Image: NotMyfault.exe

We can see we did indeed have a NotMyFault process running at the time of the crash, so at this point we can assume that this is very likely the cause of the crash.

Hope you enjoyed reading!