0x133.1 3xW6400 PBR
- Nick Kuo
- BSOD Analysis
- 30 Mar, 2024
0: kd> .bugcheck
Bugcheck code 00000133
Arguments 00000000`00000001 00000000`00001e00 fffff807`0a91c340 00000000`00000000
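As a quick aid for reading the arguments, here is a minimal Python sketch that decodes them according to Microsoft's public description of bug check 0x133. The function name and the 15.625 ms tick length are assumptions made for illustration; the dump itself does not state the clock interval.

```python
def describe_0x133(arg1: int, arg2: int, arg3: int) -> str:
    """Summarize bug check 0x133 arguments (hypothetical helper)."""
    if arg1 == 0:
        return (f"single DPC/ISR exceeded its allotment: "
                f"ran {arg2:#x} ticks, limit {arg3:#x} ticks")
    if arg1 == 1:
        return (f"cumulative time at DISPATCH_LEVEL or above exceeded "
                f"the watchdog period of {arg2:#x} ticks")
    return f"unrecognized sub-type {arg1:#x}"

# Values copied from the .bugcheck output above.
args = (0x1, 0x1E00, 0xFFFFF8070A91C340, 0x0)
summary = describe_0x133(*args[:3])

# ASSUMPTION: one clock tick is 15.625 ms; this is not taken from the dump.
ASSUMED_TICK_SECONDS = 0.015625
watchdog_seconds = args[1] * ASSUMED_TICK_SECONDS

print(summary)
print(f"about {watchdog_seconds:.0f} s at the assumed tick length")
```

With sub-type 1 there is no single offending DPC on the stack, which is why the rest of the analysis has to look at what the other cores are doing.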
0: kd> !corelist
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Core | PRCB | ClockKeepAlive | DpcWatchdogCount(0x133.1) | DpcWatchdogPeriodTicks(0x133.1) | DpcTimeCount(0x133.0) | DpcTimeLimitTicks(0x133.0) | PacketBarrier | TargetCount | LastTick | ClockOwner | CurrentThread | NextThread | IdleThread | DebuggerSavedIRQL | DpcStack | IsrStack |
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| 0 | 0xfffff80705ed0180 | 1 | 0x1e00 | 0x1e00 | 0x0 | 0x500 | 0x0 | 0x0 | 0x40df | 1 | 0xffff958518ef8080 | 0x0 | 0xfffff8070a94c6c0 | 0xd | 0xfffff8070cb08fb0 | 0xfffff8070cb10000 |
| 1 | 0xffffa80068716180 | 1 | 0x0 | 0x1e00 | 0x0 | 0x500 | 0x0 | 0x0 | 0x22df | 0 | 0xffff95850893e040 | 0x0 | 0xffff95850893e040 | 0x0 | 0xffffb8063583ffb0 | 0xffffa80068731000 |
| 2 | 0xffffa80068a61180 | 1 | 0x0 | 0x1e00 | 0x0 | 0x500 | 0x0 | 0x0 | 0x40de | 0 | 0xffff958508944040 | 0x0 | 0xffff958508944040 | 0x0 | 0xffffb80635857fb0 | 0xffffa80068a7c000 |
| 3 | 0xffffa80068b91180 | 1 | 0x0 | 0x1e00 | 0x0 | 0x500 | 0x0 | 0x0 | 0x40de | 0 | 0xffff958508949040 | 0xffff958518ae50c0 | 0xffff958508949040 | 0x0 | 0xffffb8063586ffb0 | 0xffffa80068bac000 |
| 4 | 0xffffa80068bd7180 | 1 | 0x0 | 0x1e00 | 0x0 | 0x500 | 0x0 | 0x0 | 0x3841 | 0 | 0xffff95850894a040 | 0x0 | 0xffff95850894a040 | 0x0 | 0xffffb80635887fb0 | 0xffffa80068bf2000 |
| 5 | 0xffffa80068d11180 | 1 | 0x0 | 0x1e00 | 0x0 | 0x500 | 0x0 | 0x0 | 0x22df | 0 | 0xffff95850894b040 | 0x0 | 0xffff95850894b040 | 0x0 | 0xffffb8063589ffb0 | 0xffffa80068c76000 |
| 6 | 0xffffa80068dd1180 | 1 | 0x0 | 0x1e00 | 0x0 | 0x500 | 0x0 | 0x0 | 0x22df | 0 | 0xffff9585089b8040 | 0x0 | 0xffff9585089b8040 | 0x0 | 0xffffb806358b7fb0 | 0xffffa80068dec000 |
| 7 | 0xffffa80068e18180 | 1 | 0x0 | 0x1e00 | 0x0 | 0x500 | 0x0 | 0x0 | 0x22df | 0 | 0xffff9585089bc080 | 0x0 | 0xffff9585089bc080 | 0x0 | 0xffffb806358cffb0 | 0xffffa80068e33000 |
| 8 | 0xffffa80068f91180 | 1 | 0x0 | 0x1e00 | 0x0 | 0x500 | 0x0 | 0x0 | 0x22df | 0 | 0xffff9585089ba040 | 0x0 | 0xffff9585089ba040 | 0x0 | 0xffffb806358e7fb0 | 0xffffa80068ef7000 |
| 9 | 0xffffa80069051180 | 1 | 0x0 | 0x1e00 | 0x0 | 0x500 | 0x0 | 0x0 | 0x22df | 0 | 0xffff9585089c9040 | 0x0 | 0xffff9585089c9040 | 0x0 | 0xffffb806358fffb0 | 0xffffa8006906c000 |
| 10 | 0xffffa80069097180 | 1 | 0x0 | 0x1e00 | 0x0 | 0x500 | 0x0 | 0x0 | 0x22e3 | 0 | 0xffff9585089ca040 | 0x0 | 0xffff9585089ca040 | 0x0 | 0xffffb80635917fb0 | 0xffffa800690b2000 |
| 11 | 0xffffa800691c0180 | 1 | 0x0 | 0x1e00 | 0x0 | 0x500 | 0x0 | 0x0 | 0x22df | 0 | 0xffff9585089cb040 | 0x0 | 0xffff9585089cb040 | 0x0 | 0xffffb8063592ffb0 | 0xffffa80069176000 |
| 12 | 0xffffa800692d1180 | 1 | 0x0 | 0x1e00 | 0x0 | 0x500 | 0x0 | 0x0 | 0x22df | 0 | 0xffff9585089db040 | 0x0 | 0xffff9585089db040 | 0x0 | 0xffffb80635947fb0 | 0xffffa800691fe000 |
| 13 | 0xffffa80069391180 | 1 | 0x0 | 0x1e00 | 0x0 | 0x500 | 0x0 | 0x0 | 0x3841 | 0 | 0xffff9585089dc040 | 0x0 | 0xffff9585089dc040 | 0x0 | 0xffffb8063595ffb0 | 0xffffa800693ac000 |
| 14 | 0xffffa800693d7180 | 1 | 0x0 | 0x1e00 | 0x0 | 0x500 | 0x0 | 0x0 | 0x22df | 0 | 0xffff9585089e7040 | 0x0 | 0xffff9585089e7040 | 0x0 | 0xffffb80635977fb0 | 0xffffa800693f2000 |
| 15 | 0xffffa80069551180 | 1 | 0x0 | 0x1e00 | 0x0 | 0x500 | 0x0 | 0x0 | 0x22df | 0 | 0xffff9585089ed040 | 0x0 | 0xffff9585089ed040 | 0x0 | 0xffffb8063598ffb0 | 0xffffa800694b6000 |
| 16 | 0xffffa800695c0180 | 1 | 0x0 | 0x1e00 | 0x0 | 0x500 | 0x0 | 0x0 | 0x22df | 0 | 0xffff958508a02080 | 0x0 | 0xffff958508a02080 | 0x0 | 0xffffb806359a7fb0 | 0xffffa800695db000 |
| 17 | 0xffffa800696d1180 | 1 | 0x0 | 0x1e00 | 0x0 | 0x500 | 0x0 | 0x0 | 0x22df | 0 | 0xffff958508a0c040 | 0x0 | 0xffff958508a0c040 | 0x0 | 0xffffb806359bffb0 | 0xffffa800696ec000 |
| 18 | 0xffffa80069717180 | 1 | 0x0 | 0x1e00 | 0x0 | 0x500 | 0x0 | 0x0 | 0x22df | 0 | 0xffff958508a11040 | 0x0 | 0xffff958508a11040 | 0x0 | 0xffffb806359d7fb0 | 0xffffa80069732000 |
| 19 | 0xffffa80069891180 | 1 | 0x0 | 0x1e00 | 0x0 | 0x500 | 0x0 | 0x0 | 0x22df | 0 | 0xffff958508a18040 | 0x0 | 0xffff958508a18040 | 0x0 | 0xffffb806359effb0 | 0xffffa800697f6000 |
| 20 | 0xffffa80069951180 | 1 | 0x0 | 0x1e00 | 0x0 | 0x500 | 0x0 | 0x0 | 0x22df | 0 | 0xffff958508a1d040 | 0x0 | 0xffff958508a1d040 | 0x0 | 0xffffb80635a07fb0 | 0xffffa8006996c000 |
| 21 | 0xffffa80069998180 | 1 | 0x0 | 0x1e00 | 0x0 | 0x500 | 0x0 | 0x0 | 0x22df | 0 | 0xffff958508a24080 | 0x0 | 0xffff958508a24080 | 0x0 | 0xffffb80635a1ffb0 | 0xffffa800699b3000 |
| 22 | 0xffffa80069b11180 | 1 | 0x0 | 0x1e00 | 0x0 | 0x500 | 0x0 | 0x0 | 0x22df | 0 | 0xffff958508a2a040 | 0x0 | 0xffff958508a2a040 | 0x0 | 0xffffb80635a37fb0 | 0xffffa80069a78000 |
| 23 | 0xffffa80069b49180 | 1 | 0x0 | 0x1e00 | 0x0 | 0x500 | 0x0 | 0x0 | 0x22df | 0 | 0xffff958508a2f040 | 0x0 | 0xffff958508a2f040 | 0x0 | 0xffffb80635a4ffb0 | 0xffffa80069b64000 |
| 24 | 0xffffa80069bd0180 | 1 | 0x0 | 0x1e00 | 0x0 | 0x500 | 0x0 | 0x0 | 0x387e | 0 | 0xffff958508a36040 | 0x0 | 0xffff958508a36040 | 0x0 | 0xffffb80635a67fb0 | 0xffffa80069beb000 |
| 25 | 0xffffa80069ce8180 | 1 | 0x0 | 0x1e00 | 0x0 | 0x500 | 0x0 | 0x0 | 0x22df | 0 | 0xffff958508a3b040 | 0x0 | 0xffff958508a3b040 | 0x0 | 0xffffb80635a7ffb0 | 0xffffa80069d98000 |
| 26 | 0xffffa80069e51180 | 1 | 0x0 | 0x1e00 | 0x0 | 0x500 | 0x0 | 0x0 | 0x22df | 0 | 0xffff958508a41040 | 0x0 | 0xffff958508a41040 | 0x0 | 0xffffb80635a97fb0 | 0xffffa80069e6c000 |
| 27 | 0xffffa80069e97180 | 1 | 0x0 | 0x1e00 | 0x0 | 0x500 | 0x0 | 0x0 | 0x22df | 0 | 0xffff958508a46080 | 0x0 | 0xffff958508a46080 | 0x0 | 0xffffb80635aaffb0 | 0xffffa80069eb2000 |
| 28 | 0xffffa80069fc0180 | 1 | 0x0 | 0x1e00 | 0x0 | 0x500 | 0x0 | 0x0 | 0x22df | 0 | 0xffff958508a43040 | 0x0 | 0xffff958508a43040 | 0x0 | 0xffffb80635ac7fb0 | 0xffffa80069f76000 |
| 29 | 0xffffa8006a0a2180 | 1 | 0x0 | 0x1e00 | 0x0 | 0x500 | 0x0 | 0x0 | 0x22df | 0 | 0xffff958508a44040 | 0x0 | 0xffff958508a44040 | 0x0 | 0xffffb80635adffb0 | 0xffffa8006a0bd000 |
| 30 | 0xffffa8006a151180 | 1 | 0x0 | 0x1e00 | 0x0 | 0x500 | 0x0 | 0x0 | 0x22df | 0 | 0xffff958508a53040 | 0x0 | 0xffff958508a53040 | 0x0 | 0xffffb80635af7fb0 | 0xffffa8006a16c000 |
| 31 | 0xffffa8006a198180 | 1 | 0x0 | 0x1e00 | 0x0 | 0x500 | 0x0 | 0x0 | 0x22df | 0 | 0xffff958508a54040 | 0x0 | 0xffff958508a54040 | 0x0 | 0xffffb80635b0ffb0 | 0xffffa8006a1b3000 |
| 32 | 0xffffa8006a311180 | 1 | 0x0 | 0x1e00 | 0x0 | 0x500 | 0x0 | 0x0 | 0x22df | 0 | 0xffff9585088ef300 | 0x0 | 0xffff9585088ef300 | 0x0 | 0xffffb80635b27fb0 | 0xffffa8006a278000 |
| 33 | 0xffffa8006a349180 | 1 | 0x0 | 0x1e00 | 0x0 | 0x500 | 0x0 | 0x0 | 0x22df | 0 | 0xffff958508a6c200 | 0x0 | 0xffff958508a6c200 | 0x0 | 0xffffb80635b3ffb0 | 0xffffa8006a364000 |
| 34 | 0xffffa8006a3d0180 | 1 | 0x0 | 0x1e00 | 0x0 | 0x500 | 0x0 | 0x0 | 0x22df | 0 | 0xffff958508a68080 | 0x0 | 0xffff958508a68080 | 0x0 | 0xffffb80635b57fb0 | 0xffffa8006a3eb000 |
| 35 | 0xffffa8006a4e8180 | 1 | 0x0 | 0x1e00 | 0x0 | 0x500 | 0x0 | 0x0 | 0x22e0 | 0 | 0xffff9585088bf040 | 0x0 | 0xffff9585088bf040 | 0x0 | 0xffffb80635b6ffb0 | 0xffffa8006a598000 |
| 36 | 0xffffa8006a651180 | 1 | 0x0 | 0x1e00 | 0x0 | 0x500 | 0x0 | 0x0 | 0x22df | 0 | 0xffff958508a79080 | 0x0 | 0xffff958508a79080 | 0x0 | 0xffffb80635b87fb0 | 0xffffa8006a66c000 |
| 37 | 0xffffa8006a697180 | 1 | 0x0 | 0x1e00 | 0x0 | 0x500 | 0x0 | 0x0 | 0x22df | 0 | 0xffff958508a64040 | 0x0 | 0xffff958508a64040 | 0x0 | 0xffffb80635b9ffb0 | 0xffffa8006a6b2000 |
| 38 | 0xffffa8006a7c0180 | 1 | 0x0 | 0x1e00 | 0x0 | 0x500 | 0x0 | 0x0 | 0x22df | 0 | 0xffff958508a65040 | 0x0 | 0xffff958508a65040 | 0x0 | 0xffffb80635bb7fb0 | 0xffffa8006a776000 |
| 39 | 0xffffa8006a8a2180 | 1 | 0x0 | 0x1e00 | 0x0 | 0x500 | 0x0 | 0x0 | 0x22df | 0 | 0xffff958508a8a080 | 0x0 | 0xffff958508a8a080 | 0x0 | 0xffffb80635bcffb0 | 0xffffa8006a8bd000 |
| 40 | 0xffffa8006a951180 | 1 | 0x0 | 0x1e00 | 0x0 | 0x500 | 0x0 | 0x0 | 0x22df | 0 | 0xffff958508a66040 | 0x0 | 0xffff958508a66040 | 0x0 | 0xffffb80635be7fb0 | 0xffffa8006a96c000 |
| 41 | 0xffffa8006a998180 | 1 | 0x0 | 0x1e00 | 0x0 | 0x500 | 0x0 | 0x0 | 0x22df | 0 | 0xffff958508a76040 | 0x0 | 0xffff958508a76040 | 0x0 | 0xffffb80635bfffb0 | 0xffffa8006a9b3000 |
| 42 | 0xffffa8006ab11180 | 1 | 0x0 | 0x1e00 | 0x0 | 0x500 | 0x0 | 0x0 | 0x22df | 0 | 0xffff958508a77040 | 0x0 | 0xffff958508a77040 | 0x0 | 0xffffb80635c17fb0 | 0xffffa8006aa78000 |
| 43 | 0xffffa8006ab49180 | 1 | 0x0 | 0x1e00 | 0x0 | 0x500 | 0x0 | 0x0 | 0x22df | 0 | 0xffff958508a86040 | 0x0 | 0xffff958508a86040 | 0x0 | 0xffffb80635c2ffb0 | 0xffffa8006ab64000 |
| 44 | 0xffffa8006abd0180 | 1 | 0x0 | 0x1e00 | 0x0 | 0x500 | 0x0 | 0x0 | 0x3b2d | 0 | 0xffff958508a88040 | 0x0 | 0xffff958508a88040 | 0x0 | 0xffffb80635c47fb0 | 0xffffa8006abeb000 |
| 45 | 0xffffa8006ace8180 | 1 | 0x0 | 0x1e00 | 0x0 | 0x500 | 0x0 | 0x0 | 0x22df | 0 | 0xffff958508aac080 | 0x0 | 0xffff958508aac080 | 0x0 | 0xffffb80635c5ffb0 | 0xffffa8006ad98000 |
| 46 | 0xffffa8006ae51180 | 1 | 0x0 | 0x1e00 | 0x0 | 0x500 | 0x0 | 0x0 | 0x22df | 0 | 0xffff958508a98040 | 0x0 | 0xffff958508a98040 | 0x0 | 0xffffb80635c77fb0 | 0xffffa8006ae6c000 |
| 47 | 0xffffa8006ae97180 | 1 | 0x0 | 0x1e00 | 0x0 | 0x500 | 0x0 | 0x0 | 0x22df | 0 | 0xffff958508a99040 | 0x0 | 0xffff958508a99040 | 0x0 | 0xffffb80635c8ffb0 | 0xffffa8006aeb2000 |
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
@$corelist()
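With 48 rows, the interesting cores are easy to miss. The sketch below shows one way to filter the table in Python. The three sample rows are copied from the output above (cores 0, 1 and 3); the `CoreRow` type and its field names are made up for this illustration, and the two filters are only heuristics for spotting rows that stand out.

```python
from typing import NamedTuple

class CoreRow(NamedTuple):
    core: int
    watchdog_count: int      # DpcWatchdogCount(0x133.1)
    watchdog_period: int     # DpcWatchdogPeriodTicks(0x133.1)
    current_thread: int
    next_thread: int
    idle_thread: int

# Rows copied from the !corelist output above.
rows = [
    CoreRow(0, 0x1E00, 0x1E00, 0xFFFF958518EF8080, 0x0, 0xFFFFF8070A94C6C0),
    CoreRow(1, 0x0, 0x1E00, 0xFFFF95850893E040, 0x0, 0xFFFF95850893E040),
    CoreRow(3, 0x0, 0x1E00, 0xFFFF958508949040, 0xFFFF958518AE50C0,
            0xFFFF958508949040),
]

# Cores whose watchdog counter reached the period (the core that bugchecked).
expired = [r.core for r in rows if r.watchdog_count >= r.watchdog_period]

# Cores running their idle thread while another thread is already selected.
idle_with_pending = [
    r.core for r in rows
    if r.current_thread == r.idle_thread and r.next_thread != 0
]

print("watchdog expired on:", expired)
print("idle with NextThread set:", idle_with_pending)
```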
0: kd> k
# Child-SP RetAddr Call Site
00 fffff807`0cb0fa28 fffff807`09ed00e1 nt!KeBugCheckEx [minkernel\ntos\ke\amd64\procstat.asm @ 140]
01 (Inline Function) --------`-------- nt!KiAccumulateCumulativeDpcTicks+0xa8 [minkernel\ntos\ke\runtime.c @ 872]
02 (Inline Function) --------`-------- nt!KiAccumulateKernelTicks+0x1d9 [minkernel\ntos\ke\runtime.c @ 969]
03 fffff807`0cb0fa30 fffff807`09ecf95c nt!KeAccumulateTicks+0x231 [minkernel\ntos\ke\runtime.c @ 1139]
04 fffff807`0cb0fa90 fffff807`09eccf33 nt!KiUpdateRunTime+0xcc [minkernel\ntos\ke\runtime.c @ 1343]
05 fffff807`0cb0fc00 fffff807`09ecdb78 nt!KiUpdateTime+0x613 [minkernel\ntos\ke\runtime.c @ 1722]
06 fffff807`0cb0fea0 fffff807`09ecd429 nt!KeClockInterruptNotify+0x228 [minkernel\ntos\ke\clocktick.c @ 2763]
07 (Inline Function) --------`-------- nt!HalpTimerClockInterruptEpilogCommon+0xf [minkernel\hals\lib\timers\common\timerint.c @ 346]
08 (Inline Function) --------`-------- nt!HalpTimerClockInterruptCommon+0xf3 [minkernel\hals\lib\timers\common\timerint.c @ 295]
09 fffff807`0cb0ff40 fffff807`09f1b72c nt!HalpTimerClockInterrupt+0x109 [minkernel\hals\lib\timers\common\timerint.c @ 906]
0a fffff807`0cb0ff70 fffff807`0a03429a nt!KiCallInterruptServiceRoutine+0x9c [minkernel\ntos\ke\intrcmn.c @ 1716]
0b fffff807`0cb0ffb0 fffff807`0a034b07 nt!KiInterruptSubDispatchNoLockNoEtw+0xfa [minkernel\ntos\ke\amd64\intsup.asm @ 572]
0c ffffb806`39767060 fffff807`09f39da5 nt!KiInterruptDispatchNoLockNoEtw+0x37 [minkernel\ntos\ke\amd64\intsup.asm @ 693]
0d ffffb806`397671f0 fffff807`09f39c46 nt!KiInitiateGenericCallDpc+0xb5 [minkernel\ntos\ke\dpcobj.c @ 436]
0e ffffb806`39767240 fffff807`09efee2f nt!KiGenericCallDpcInitiatorWorker+0x66 [minkernel\ntos\ke\dpcobj.c @ 486]
0f ffffb806`39767280 fffff807`09f65d07 nt!KeGenericProcessorCallback+0x12b [minkernel\ntos\ke\miscc.c @ 1668]
10 ffffb806`39767450 fffff807`0a3ee05b nt!KeGenericCallDpc+0x27 [minkernel\ntos\ke\dpcobj.c @ 1443]
11 (Inline Function) --------`-------- nt!EtwpSynchronizeWithElevatedIrqlLogging+0xe [minkernel\ntos\etw\tracelog.c @ 6286]
12 ffffb806`39767490 fffff807`0a29a569 nt!EtwpFreeLoggerContext+0xc7 [minkernel\ntos\etw\tracesup.c @ 2562]
13 ffffb806`397674f0 fffff807`09e12667 nt!EtwpLogger+0x2a9 [minkernel\ntos\etw\tracelog.c @ 2629]
14 ffffb806`39767570 fffff807`0a0370a4 nt!PspSystemThreadStartup+0x57 [minkernel\ntos\ps\psexec.c @ 10885]
15 ffffb806`397675c0 00000000`00000000 nt!KiStartSystemThread+0x34 [minkernel\ntos\ke\amd64\threadbg.asm @ 83]

The value at rsp+50 is a countdown: each core decrements it after processing the DPC dispatched to it. Here the value was 1, meaning one core had not finished yet.
Core 3 has the DPC still outstanding.
0: kd> !dpcs 3
CPU Type KDPC Function
3: Normal : 0xffffa80068b98fe0 0xfffff80709f65f70 nt!EtwpSynchronizationDpc
3: Normal : 0xffffa80068b99c68 0xfffff80709f548d0 nt!KiEntropyDpcRoutine
3: Normal : 0xffffa80068b995d8 0xfffff80709ebab00 nt!PpmPerfAction
3: Normal : 0xffff95851728d328 0xfffff8070e8723f0 stornvme!NVMeCompletionDpcRoutine
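The countdown described above can be modeled in a few lines of Python. This is only a toy model of the bookkeeping, not kernel code: the function name is hypothetical, and exactly which cores the kernel counts is an assumption here.

```python
def outstanding_cores(all_cores, finished_cores):
    """Return (remaining_count, cores that have not run the DPC yet)."""
    pending = set(all_cores)
    remaining = len(pending)
    for core in finished_cores:
        if core in pending:
            pending.discard(core)
            remaining -= 1       # each core decrements after running its DPC
    return remaining, sorted(pending)

all_cores = range(48)                          # cores 0..47, as in !corelist
finished = [c for c in all_cores if c != 3]    # everyone except core 3
remaining, pending = outstanding_cores(all_cores, finished)
print(remaining, pending)
```

The initiator cannot continue until the count reaches zero, so a single core that never runs its DPC is enough to keep core 0 waiting at elevated IRQL.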
Check out what core 3 is doing.
0: kd> ~3
3: kd> k
# Child-SP RetAddr Call Site
00 ffffb806`358773e0 fffff807`09f2661a nt!HvlEndSystemInterrupt+0x1e [minkernel\ntos\hvl\amd64\hvls.asm @ 74]
01 ffffb806`35877400 fffff807`0a0342f4 nt!HalPerformEndOfInterrupt+0x1a [minkernel\hals\lib\interrupts\common\entry.c @ 1046]
02 ffffb806`35877430 fffff807`0a036efa nt!KiInterruptDispatch+0x44 [minkernel\ntos\ke\amd64\intsup.asm @ 613]
03 ffffb806`358775c0 00000000`00000000 nt!KiIdleLoop+0x5a [minkernel\ntos\ke\amd64\idle.asm @ 118]
Core 3 is stuck ending a system interrupt. Let’s see what this routine is doing.

This looks really simple; the one particularly interesting bit is the wrmsr instruction, which writes to a model-specific register (MSR) of the CPU.
According to the x86 manual, the MSR being written is selected by ECX. In our case, ECX = 40000070.

Normally we would look up the MSR in the processor manual and ask PAE to confirm why the core is stuck on it. But you won’t find this MSR in the x86 developer manual, because it is not implemented by the CPU; it is handled by Hyper-V. See “Virtual Interrupt Controller” on Microsoft Learn.
I’ll stop digging in this direction, as the issue is actually not related to Hyper-V at all; it is just a coincidence that all the failures happen near here.
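For reference, a small Python lookup makes the MSR identification explicit. The three names are the synthetic interrupt-controller MSRs as listed in the Hyper-V functional specification, and 0x40000000 to 0x400000FF is the range the CPU vendors leave to software; the function itself is a hypothetical helper.

```python
# Synthetic MSRs of the Hyper-V virtual interrupt controller.
HYPERV_APIC_MSRS = {
    0x40000070: "HV_X64_MSR_EOI",
    0x40000071: "HV_X64_MSR_ICR",
    0x40000072: "HV_X64_MSR_TPR",
}

def classify_msr(index: int) -> str:
    """Say where to look up an MSR index (hypothetical helper)."""
    if index in HYPERV_APIC_MSRS:
        return HYPERV_APIC_MSRS[index]
    if 0x40000000 <= index <= 0x400000FF:
        return "software-reserved range (hypervisor-defined)"
    return "architectural or model-specific; see the CPU vendor manual"

msr_name = classify_msr(0x40000070)   # ECX value seen on core 3
print(msr_name)
```

So the wrmsr on core 3 is an end-of-interrupt write, which fits the HvlEndSystemInterrupt frame at the top of its stack.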
Speculations
Let’s think a bit: what could possibly lead to this situation?
- (1) The core is physically “stuck”, i.e. it is no longer executing any instructions and is just sitting there.
- (2) There is a loop somewhere outside, and we keep entering this function.
- (3) Since we’re handling an interrupt, perhaps something keeps interrupting us?
Going through them in order:
- (1) Unlikely, since a hung core could not process any instruction:
  - KeBugCheck uses an IPI to interrupt the other cores and ask them to save their current context, so that WinDbg can show registers and stacks in the BSOD dump. We can see that information for core 3, meaning the IPI was handled there.
  - In !corelist, the suspected core 3 has no accumulated ticks and its ClockKeepAlive is 1, so it was probably still processing timer interrupts shortly before.
  - We did not hit 0x101 (CLOCK_WATCHDOG_TIMEOUT).
- (2) Easily checked: we uf every function on the stack and see no loop.
- (3) Check live and check CKCL, to find who is being interrupted and deduce who is interrupting us.
CKCL

Before the BSOD, the CKCL (Circular Kernel Context Logger) trace seems clean; we’ll see why in a moment.
Who is the interrupt?
nt!KiInterruptDispatch, the entry point for interrupts, calls nt!KiInterruptSubDispatch. Both of these routines are written in assembly; from nt!KiCallInterruptServiceRoutine onward the code is C, with the _KINTERRUPT pointer as the first parameter (RCX).

3: kd> dt nt!KiCallInterruptServiceRoutine
KiCallInterruptServiceRoutine unsigned char (
_KINTERRUPT*,
unsigned char)
nt!HalPerformEndOfInterrupt is also written in C, and its first parameter is likewise the _KINTERRUPT pointer.
3: kd> dt nt!HalPerformEndOfInterrupt
HalPerformEndOfInterrupt void (
_KINTERRUPT*)
Inspecting the routine shows that the pointer is saved in RBX.

3: kd> k
# Child-SP RetAddr Call Site
00 ffffb806`358773e0 fffff807`09f2661a nt!HvlEndSystemInterrupt+0x1e [minkernel\ntos\hvl\amd64\hvls.asm @ 74]
01 ffffb806`35877400 fffff807`0a0342f4 nt!HalPerformEndOfInterrupt+0x1a [minkernel\hals\lib\interrupts\common\entry.c @ 1046]
02 ffffb806`35877430 fffff807`0a036efa nt!KiInterruptDispatch+0x44 [minkernel\ntos\ke\amd64\intsup.asm @ 613]
03 ffffb806`358775c0 00000000`00000000 nt!KiIdleLoop+0x5a [minkernel\ntos\ke\amd64\idle.asm @ 118]
3: kd> .frame /r 0x1
01 ffffb806`35877400 fffff807`0a0342f4 nt!HalPerformEndOfInterrupt+0x1a [minkernel\hals\lib\interrupts\common\entry.c @ 1046]
rax=fffff8070a02e260 rbx=ffffa8006941c8c0 rcx=ffffa8006941c8c0
rdx=ffff958508cce000 rsi=0000000000000000 rdi=ffff958508949040
rip=fffff80709f2661a rsp=ffffb80635877400 rbp=ffffb806358774b0
r8=0000000000000002 r9=ffffb80635877278 r10=0000fffff8070a02
r11=ffff97fb80800000 r12=0000000000001ddc r13=ffff958519f64080
r14=ffff95850898a040 r15=ffffa80068ba0000
iopl=0 nv up di pl zr na po nc
cs=0010 ss=0018 ds=0000 es=0000 fs=0000 gs=0000 efl=00000046
nt!HalPerformEndOfInterrupt+0x1a:
fffff807`09f2661a 803d9da3930000 cmp byte ptr [nt!HalpInterruptDirectedEoiModeEnabled (fffff807`0a8609be)],0 ds:fffff807`0a8609be=01
3: kd> dt nt!_KINTERRUPT ffffa8006941c8c0
+0x000 Type : 0n22
+0x002 Size : 0n288
+0x008 InterruptListEntry : _LIST_ENTRY [ 0x00000000`00000000 - 0x00000000`00000000 ]
+0x018 ServiceRoutine : 0xfffff807`11531fe0 unsigned char dxgkrnl!DpiFdoLineInterruptRoutine+0
+0x020 MessageServiceRoutine : (null)
+0x028 MessageIndex : 0
+0x030 ServiceContext : 0xffff9585`23914030 Void
+0x038 SpinLock : 0
+0x040 TickCount : 0
+0x048 ActualLock : 0xffff9585`1a0ffdd0 -> 0
+0x050 DispatchAddress : 0xfffff807`0a0342b0 void nt!KiInterruptDispatch+0
+0x058 Vector : 0x51
+0x05c Irql : 0x5 ''
+0x05d SynchronizeIrql : 0x5 ''
+0x05e FloatingSave : 0 ''
+0x05f Connected : 0x1 ''
+0x060 Number : 3
+0x064 ShareVector : 0x1 ''
+0x065 EmulateActiveBoth : 0 ''
+0x066 ActiveCount : 0
+0x068 InternalState : 0n4
+0x06c Mode : 0 ( LevelSensitive )
+0x070 Polarity : 0 ( InterruptPolarityUnknown )
+0x074 ServiceCount : 0
+0x078 DispatchCount : 0
+0x080 PassiveEvent : (null)
+0x088 TrapFrame : 0xffffb806`35877430 _KTRAP_FRAME
+0x090 DisconnectData : (null)
+0x098 ServiceThread : (null)
+0x0a0 ConnectionData : 0xffff9585`1ad64d30 _INTERRUPT_CONNECTION_DATA
+0x0a8 IntTrackEntry : 0xffff9585`1abfd040 Void
+0x0b0 IsrDpcStats : _ISRDPCSTATS
+0x110 RedirectObject : (null)
+0x118 Padding : [8] ""
So the interrupt whose service we are “ending” belongs to dxgkrnl. Keep in mind that our GFX driver does not register its interrupt directly; instead it hands its ISR to dxgkrnl, which calls back into our driver.
The system contains only an AMD GPU, so this is starting to look like our problem.
We are not yet sure the interrupt actually comes from us; perhaps some other device shares our vector. To check, we run !idt.
3: kd> !idt
Dumping IDT: ffffa80068b9f000
00: fffff8070a03ef00 nt!KiDivideErrorFault
01: fffff8070a03f240 nt!KiDebugTrapOrFault Stack = 0xFFFFA80068BCF000
02: fffff8070a03f840 nt!KiNmiInterrupt Stack = 0xFFFFA80068BC8000
03: fffff8070a03fdc0 nt!KiBreakpointTrap
04: fffff8070a040100 nt!KiOverflowTrap
05: fffff8070a040440 nt!KiBoundFault
06: fffff8070a040b00 nt!KiInvalidOpcodeFault
07: fffff8070a041180 nt!KiNpxNotAvailableFault
08: fffff8070a041540 nt!KiDoubleFaultAbort Stack = 0xFFFFA80068BBA000
09: fffff8070a041880 nt!KiNpxSegmentOverrunAbort
0a: fffff8070a041bc0 nt!KiInvalidTssFault
0b: fffff8070a041f00 nt!KiSegmentNotPresentFault
0c: fffff8070a0422c0 nt!KiStackFault
0d: fffff8070a042640 nt!KiGeneralProtectionFault
0e: fffff8070a0429c0 nt!KiPageFault
10: fffff8070a043180 nt!KiFloatingErrorFault
11: fffff8070a043540 nt!KiAlignmentFault
12: fffff8070a043880 nt!KiMcheckAbort Stack = 0xFFFFA80068BC1000
13: fffff8070a044600 nt!KiXmmException
14: fffff8070a044a00 nt!KiVirtualizationException
15: fffff8070a0450c0 nt!KiControlProtectionFault
1f: fffff8070a037bd0 nt!KiApcInterrupt
20: fffff8070a039ed0 nt!KiSwInterrupt
29: fffff8070a0457c0 nt!KiRaiseSecurityCheckFailure
2c: fffff8070a045b40 nt!KiRaiseAssertion
2d: fffff8070a045ec0 nt!KiDebugServiceTrap
2e: fffff8070a046240 nt!KiSystemService
2f: fffff8070a03a680 nt!KiDpcInterrupt
30: fffff8070a038350 nt!KiHvInterrupt
31: fffff8070a0386a0 nt!KiVmbusInterrupt0
32: fffff8070a0389f0 nt!KiVmbusInterrupt1
33: fffff8070a038d40 nt!KiVmbusInterrupt2
34: fffff8070a039090 nt!KiVmbusInterrupt3
35: fffff8070a0358d8 nt!HalpInterruptDeferredErrorService (KINTERRUPT ffff95850a0110e0)
36: fffff8070a0358e0 nt!HalpInterruptDeferredErrorService (KINTERRUPT ffff95850a011200)
51: fffff8070a0359b8 dxgkrnl!DpiFdoLineInterruptRoutine (KINTERRUPT ffffa8006941c8c0)
61: fffff8070a035a38 NDIS!ndisMiniportMessageIsr (KINTERRUPT ffffa8006941c500)
70: fffff8070a035ab0 stornvme!NVMeHwMSIInterrupt (STORPORT) (KINTERRUPT ffffa8006941ca00)
82: fffff8070a035b40 NDIS!ndisMiniportMessageIsr (KINTERRUPT ffffa8006941c000)
90: fffff8070a035bb0 pci!ExpressRootPortMessageRoutine (KINTERRUPT ffffa8006941cb40)
91: fffff8070a035bb8 USBXHCI!Interrupter_WdfEvtInterruptIsr (KMDF) (KINTERRUPT ffffa8006941c140)
92: fffff8070a035bc0 NDIS!ndisMiniportMessageIsr (KINTERRUPT ffffa8006941c780)
a0: fffff8070a035c30 pci!ExpressRootPortMessageRoutine (KINTERRUPT ffffa8006941cc80)
a1: fffff8070a035c38 USBXHCI!Interrupter_WdfEvtInterruptIsr (KMDF) (KINTERRUPT ffffa8006941c280)
a2: fffff8070a035c40 NDIS!ndisMiniportMessageIsr (KINTERRUPT ffffa8006941c3c0)
b0: fffff8070a035cb0 pci!ExpressRootPortMessageRoutine (KINTERRUPT ffffa8006941cdc0)
b2: fffff8070a035cc0 NDIS!ndisMiniportMessageIsr (KINTERRUPT ffffa8006941c640)
d1: fffff8070a035db8 nt!HalpTimerClockInterrupt (KINTERRUPT ffff95850a011b00)
d2: fffff8070a035dc0 nt!HalpTimerClockIpiRoutine (KINTERRUPT ffff95850a0119e0)
d7: fffff8070a035de8 nt!HalpInterruptRebootService (KINTERRUPT ffff95850a011560)
d8: fffff8070a035df0 nt!HalpInterruptStubService (KINTERRUPT ffff95850a011440)
df: fffff8070a035e28 nt!HalpInterruptSpuriousService (KINTERRUPT ffff95850a011320)
e1: fffff8070a03ad50 nt!KiIpiInterrupt
e2: fffff8070a035e40 nt!HalpInterruptLocalErrorService (KINTERRUPT ffff95850a011680)
e3: fffff8070a035e48 nt!HalpInterruptDeferredRecoveryService (KINTERRUPT ffff95850a0118c0)
fe: fffff8070a035f20 nt!HalpPerfInterrupt (KINTERRUPT ffff95850a0117a0)
Only dxgkrnl!DpiFdoLineInterruptRoutine is registered on vector 0x51, so the interrupt can only come from our GPU.
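The same check can be scripted when the IDT is long. The Python sketch below parses a few lines copied from the !idt output above and groups the ISRs by vector; the regular expression and the function name are illustrative assumptions about this text layout, not a WinDbg API.

```python
import re

# Matches lines such as "51: <address> module!Routine ... (KINTERRUPT <ptr>)".
IDT_LINE = re.compile(
    r"^(?P<vector>[0-9a-f]{2}):\s+[0-9a-f]+\s+(?P<isr>\S+).*?"
    r"\(KINTERRUPT (?P<kint>[0-9a-f]+)\)\s*$"
)

# Lines copied from the !idt output above.
idt_text = """\
51: fffff8070a0359b8 dxgkrnl!DpiFdoLineInterruptRoutine (KINTERRUPT ffffa8006941c8c0)
61: fffff8070a035a38 NDIS!ndisMiniportMessageIsr (KINTERRUPT ffffa8006941c500)
70: fffff8070a035ab0 stornvme!NVMeHwMSIInterrupt (STORPORT) (KINTERRUPT ffffa8006941ca00)
d1: fffff8070a035db8 nt!HalpTimerClockInterrupt (KINTERRUPT ffff95850a011b00)
"""

def isrs_by_vector(text):
    """Map vector number -> list of (ISR symbol, KINTERRUPT address)."""
    table = {}
    for line in text.splitlines():
        m = IDT_LINE.match(line.strip())
        if m:
            entry = (m["isr"], int(m["kint"], 16))
            table.setdefault(int(m["vector"], 16), []).append(entry)
    return table

table = isrs_by_vector(idt_text)
print(table[0x51])
```

Vector 0x51 has a single entry, and its KINTERRUPT address is the same one found in RBX earlier.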
Live Debug
We break on AtiInterrupt in a live session, and the breakpoint keeps getting hit. This does not necessarily mean failure; it may just be a legitimate hardware interrupt for our driver.
When writing an ISR for a WDDM driver, the driver supplies it as DxgkDdiInterruptRoutine; see “DXGKDDI_INTERRUPT_ROUTINE (dispmprt.h)” in the Windows drivers documentation on Microsoft Learn.
Summary of expected logic in supplied ISR:
- Is the interrupt line based or message based?
  - Line based (MessageNumber = 0)
    - Is the interrupt from the hardware of the servicing driver?
      - Yes
        - Dismiss interrupt. Make the device shut up.
        - Service interrupt
        - Create DPCs if necessary.
        - Return TRUE.
      - No
        - Return FALSE.
  - Message based (MessageNumber != 0)
    - (Since it is message based, it must be from the owning hardware)
    - Service interrupt
    - Create DPCs if necessary.
    - Return TRUE.
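The decision tree above can be written down as a small Python model. It is a sketch of the logic only: the callables `device_is_asserting`, `dismiss` and `service` are hypothetical stand-ins for hardware access and are not part of the WDDM interface.

```python
def interrupt_routine(message_number, device_is_asserting, dismiss, service):
    """Return True if the interrupt was claimed, False otherwise."""
    if message_number == 0:              # line based, per the summary above
        if not device_is_asserting():
            return False                 # not ours: nothing to service
        dismiss()                        # make the device stop asserting
    # Message based interrupts are assumed to come from the owning hardware.
    service()                            # handle the event, queue DPCs if needed
    return True

log = []
claimed = interrupt_routine(
    0, lambda: True, lambda: log.append("dismiss"), lambda: log.append("service")
)
unclaimed = interrupt_routine(
    0, lambda: False, lambda: log.append("dismiss"), lambda: log.append("service")
)
print(claimed, unclaimed, log)
```

The important property is the `False` branch: returning without dismissing is only safe if the device really is not the source.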
Next we inspect the entry point from dxgkrnl into our driver, dxgkrnl!DpiFdoMessageInterruptRoutine. (dxgkrnl!DpiFdoLineInterruptRoutine just zeroes the MessageNumber parameter and calls dxgkrnl!DpiFdoMessageInterruptRoutine.)
3: kd> ub
dxgkrnl!DpiFdoMessageInterruptRoutine+0x42 [onecoreuap\windows\core\dxkernel\dxgkrnl\port\dpfdo.cxx @ 10462]:
fffff802`0ba53f22 488b4f40 mov rcx,qword ptr [rdi+40h]
fffff802`0ba53f26 8bd3 mov edx,ebx
fffff802`0ba53f28 488b4128 mov rax,qword ptr [rcx+28h]
fffff802`0ba53f2c 488b4930 mov rcx,qword ptr [rcx+30h]
fffff802`0ba53f30 488b80b8000000 mov rax,qword ptr [rax+0B8h] -> rax = amdkmdag!ProxyInterruptRoutine
fffff802`0ba53f37 e864634600 call fffff802`0beba2a0 -> Guard icall to ISR
fffff802`0ba53f3c 90 nop
fffff802`0ba53f3d 488d4c2420 lea rcx,[rsp+20h]
The return value after the guarded indirect call to the ISR is 0 (rax = 0), i.e. the driver did not claim the interrupt.
3: kd> r
rax=0000000000000000 rbx=0000000000000000 rcx=ffffd980e0d2cef0
rdx=ffffb88f7b2c3280 rsi=ffffd980e3d961f0 rdi=ffffb88f7d195030
rip=fffff8020ba53f42 rsp=ffffd980e0d2ced0 rbp=0000000000000001
r8=0000000000000002 r9=0000000000000000 r10=0000fffff8025526
r11=ffffd980e0d2cec0 r12=0000000000001478 r13=ffffb88f74b5d080
r14=0000000000000001 r15=0000cfd8b84c98de
iopl=0 nv up ei pl zr na po nc
cs=0010 ss=0018 ds=002b es=002b fs=0053 gs=002b efl=00000246
dxgkrnl!DpiFdoMessageInterruptRoutine+0x62:
fffff802`0ba53f42 0fb6d8 movzx ebx,al
But our driver is the only one registered on this vector. Because it returned without servicing the interrupt, the interrupt is never cleared, which leads to an interrupt storm.
This is a good point to hand over to the KMD developers to dig into why our driver did not recognize the interrupt. Likewise, if you see another vendor’s driver in the same situation, this is good evidence for requeuing the issue to them.
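Why an unclaimed level-sensitive interrupt turns into a storm can be shown with a toy Python simulation. It assumes a simplified model in which the line stays asserted until an ISR dismisses the device; the function name and the delivery cap are illustrative choices.

```python
def deliveries_until_quiet(isr_clears_device, max_deliveries=1000):
    """Count how often the interrupt is delivered before the line goes quiet."""
    line_asserted = True          # the device is requesting service
    deliveries = 0
    while line_asserted and deliveries < max_deliveries:
        deliveries += 1           # the CPU takes the interrupt and runs the ISR
        if isr_clears_device:
            line_asserted = False # the ISR dismissed the source
        # Otherwise the ISR returned FALSE, EOI is sent, and the line is
        # still asserted, so the interrupt is delivered again right away.
    return deliveries

healthy = deliveries_until_quiet(True)
storm = deliveries_until_quiet(False)
print(healthy, storm)
```

In the simulation only the cap stops the loop; on the real machine nothing does, so the core keeps re-entering the interrupt path and never gets around to running its queued DPCs.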
Dive Deeper
Recall the service routine, dxgkrnl!DpiFdoLineInterruptRoutine. “Line” means line interrupt: this is the ISR for line-based interrupts. After consulting the GFX KMD developers, we learned that our hardware should not be using this interrupt and the driver does nothing with it; all interrupts from GFX should be message based.
These interrupts are asserted over PCIE, so as an experiment we can disable the PCIE interrupt.
!pci 0x100 b d f : Show the PCIE configuration space
PCI Configuration Space (Segment:0000 Bus:10 Device:00 Function:00)
Common Header:
00: VendorID 1002 ATI Technologies
02: DeviceID 7422
04: Command 0006 MemSpaceEn BusInitiate
06: Status 0018 INTPending CapList
This status indicates that the PCIE device has an interrupt pending that has not yet been acknowledged. We can force the interrupt off by setting InterruptDis in the PCIE configuration space.
!ecb 0xB.D.F 0x05 0x04 : Write the byte 0x04 to offset 0x05. Offset 0x05 is the second (upper) byte of the Command register at 0x04, and 0x04 in that byte is the InterruptDis flag. See the PCIE spec.
With the interrupt disabled, the system shuts down correctly.
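The register values above can be decoded with a few lines of Python. The bit positions follow the PCI Command and Status register layout, and the flag names are the ones used in this write-up; the helper function is hypothetical and only covers the bits discussed here.

```python
# Only the bits mentioned in this write-up are listed.
COMMAND_BITS = {1: "MemSpaceEn", 2: "BusInitiate", 10: "InterruptDis"}
STATUS_BITS = {3: "INTPending", 4: "CapList"}

def decode(value, names):
    """Return the names of all known bits that are set in value."""
    return [name for bit, name in sorted(names.items()) if value & (1 << bit)]

command_flags = decode(0x0006, COMMAND_BITS)
status_flags = decode(0x0018, STATUS_BITS)

# Where InterruptDis lives when the register is written one byte at a time.
COMMAND_OFFSET = 0x04
INTERRUPT_DISABLE_BIT = 10
byte_offset = COMMAND_OFFSET + INTERRUPT_DISABLE_BIT // 8
byte_value = 1 << (INTERRUPT_DISABLE_BIT % 8)
new_command = 0x0006 | (1 << INTERRUPT_DISABLE_BIT)

print(command_flags, status_flags)
print(hex(byte_offset), hex(byte_value), hex(new_command))
```

This is where the 0x05 and 0x04 in the !ecb command come from: bit 10 of the 16-bit Command register is bit 2 of the byte at offset 0x05.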
Where is the interrupt coming from?
The interrupt manager in our GFX is NBIF. According to the NBIF team, the line interrupt can only come from “BACO exit done”, in which case the NBIF raises the interrupt.
From Navi3x BACO MAS:
If BACO exit is triggered by doorbell transaction, MP1 shall configure nBIF to send MSI/INTx message to host to indicate a doorbell transaction. GFX driver then gets informed on the doorbell transaction with the interrupt.
- Set BIF_MP1_INTR_CTRL.BACO_EXIT_DONE to 0x1.
The driver ISR shall support this:
When BIF_BX:BIF_RB_CNTL.RB_ENABLE is 0, driver needs to poll BIF_BX:BIF_DOORBELL_INT_CNTL.DOORBELL_INTERRUPT_STATUS. If it is 1, the driver shall write 1 to BIF_BX:BIF_DOORBELL_INT_CNTL.DOORBELL_INTERRUPT_CLEAR to clear that status to de-assert the interrupt.
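The ISR requirement quoted above can be sketched as a Python model. The register field names come from the quoted text; `FakeRegisters` is a made-up stand-in for MMIO access, and its write-1-to-clear behaviour is modeled from the description rather than from hardware documentation. This is not the actual driver code.

```python
RB_ENABLE = "BIF_BX:BIF_RB_CNTL.RB_ENABLE"
STATUS = "BIF_BX:BIF_DOORBELL_INT_CNTL.DOORBELL_INTERRUPT_STATUS"
CLEAR = "BIF_BX:BIF_DOORBELL_INT_CNTL.DOORBELL_INTERRUPT_CLEAR"

class FakeRegisters:
    """Hypothetical register file; a real driver would access hardware."""
    def __init__(self, fields):
        self.fields = dict(fields)

    def read(self, name):
        return self.fields[name]

    def write(self, name, value):
        self.fields[name] = value
        if name == CLEAR and value == 1:
            self.fields[STATUS] = 0      # modeled: writing 1 clears the status

def clear_doorbell_interrupt(regs):
    """Return True if a pending doorbell interrupt was found and cleared."""
    if regs.read(RB_ENABLE) != 0:
        return False                     # polling applies only when RB_ENABLE is 0
    if regs.read(STATUS) != 1:
        return False                     # no doorbell interrupt pending
    regs.write(CLEAR, 1)                 # de-assert the interrupt
    return True

regs = FakeRegisters({RB_ENABLE: 0, STATUS: 1, CLEAR: 0})
handled = clear_doorbell_interrupt(regs)
print(handled, regs.read(STATUS))
```

An ISR that includes this check would return TRUE for the BACO exit interrupt instead of leaving the line asserted.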