Unexporting a function from a DLL at runtime by name obfuscation

Usually, the set of functions exported by a Windows DLL is considered to be immutable. When load-time binding is used, functions appearing in the loading module’s import table are resolved against the export table of the loaded DLL. If any imported functions are missing, the loader aborts the module load attempt and unmaps the modules.

Library clients which require a more flexible reaction to the presence or absence of specific exports opt for a more dynamic binding approach, either late binding via GetProcAddress directly or through the Visual C++ delay loading facility, which allows for setting callbacks for missing modules or exports.

Recently, as I was experimenting with implementing an LSA proxy authentication package (more on that in a follow-up post), I considered the issue of properly implementing a proxy DLL for a DLL whose set of exports is only partially known to the proxy. Modern LSA APs have a callback function that provides a function dispatch table and do not rely on the module’s export table, but for pedagogic purposes let us consider a hypothetical situation in which the export table is the only lookup apparatus in use by the LSA.

In the case of an LSA authentication package, the package may provide an implementation of LsaApLogonUser, LsaApLogonUserEx or LsaApLogonUserEx2. An original LSA AP can export one or more of these functions. If we want to create a generic LSA AP proxy DLL, loaded in lieu of the original AP, we may wish our module to have a specific export only if the original module also had it. This presents a difficulty, since we may not be able to predict the specific set of exports when the proxy is compiled.

If we recognize that DLL exports are in fact a poor man’s native code reflection mechanism, we can adopt a non-traditional approach of dynamic modification of our module’s set of exported functions. Our proxy DLL shall initially (at compile time) export all functions that could potentially be exported by the original DLL, e.g. by a module definition (.DEF) file like this:

LIBRARY "lsaprxap"
EXPORTS
LsaApInitializePackage
LsaApLogonUser
LsaApLogonUserEx
LsaApLogonUserEx2

LsaApCallPackage
LsaApCallPackagePassthrough
LsaApLogonTerminated
LsaApCallPackageUntrusted
SpInitialize
SpInstanceInit
SpLsaModeInitialize
SpUserModeInitialize

Notice how we export all three LsaApLogonUser variants, even though some may not appear in the original DLL. LSA uses late binding via GetProcAddress to decide which function in the AP to call. If we export LsaApLogonUserEx2 and our original DLL does not have that function, we’ll have nothing to do when our proxy function is called (no original to forward to after our own processing). There is no telling what will happen if we return STATUS_NOT_IMPLEMENTED. Besides, LSA APs are only an illustration, and in other cases the export might not even have the option to return a failure status. Therefore, the behavior we desire is that GetProcAddress for LsaApLogonUserEx2 will fail and return NULL if the original DLL for which we are acting as a proxy does not export LsaApLogonUserEx2 itself.

The names and addresses of exported functions from a DLL appear in the PE image’s export directory. By accessing the export directory, looking up an exported function of interest and removing it we can alter the behavior of GetProcAddress for the exported function at runtime, after the module has been loaded. Note that this alteration is only useful for GetProcAddress invocations from that time forward, and callers that invoked GetProcAddress earlier and cached the result or callers that used load-time binding against our export table already obtained a function pointer to the exported function. Therefore, this technique is only useful in limited circumstances.

The export directory points to an array of pointers to null-terminated name strings, alongside a parallel array that maps each name to an entry in the function address table. In order to outright remove an export from the middle of the export table, we’d have to copy the export table aside, remove the desired functions and point the PE header to the new export table. This is a feasible but cumbersome approach. Instead, I opted for an alternative technique: obfuscating the name of the export to prevent GetProcAddress callers from resolving it by its well-known name. The function is still exported, but its name is unknown to other callers. This is probably sufficient for the vast majority of cases. As for the obfuscation itself, in this illustration we’ll merely increment the character value of the first letter in the export:

#include <windows.h>
#include <cassert>
#include <cstring>

// Set by the linker to the base address of the module.
EXTERN_C IMAGE_DOS_HEADER __ImageBase;

// Obfuscates the name of one of this module's exports in place, so that
// later GetProcAddress lookups by the original name fail.
void UnexportFunction(LPCSTR ExportName)
{
	IMAGE_DOS_HEADER* dosHeader = &__ImageBase;
	assert(dosHeader->e_magic == IMAGE_DOS_SIGNATURE);
	IMAGE_NT_HEADERS* ntHeaders = reinterpret_cast<IMAGE_NT_HEADERS*>(
		reinterpret_cast<BYTE*>(dosHeader) + dosHeader->e_lfanew);
	assert(ntHeaders->Signature == IMAGE_NT_SIGNATURE);
	IMAGE_OPTIONAL_HEADER* optionalHeader = &ntHeaders->OptionalHeader;
	IMAGE_DATA_DIRECTORY* exportDataDirectory =
		&optionalHeader->DataDirectory[IMAGE_DIRECTORY_ENTRY_EXPORT];
	IMAGE_EXPORT_DIRECTORY* exportDirectory = reinterpret_cast<IMAGE_EXPORT_DIRECTORY*>(
		reinterpret_cast<BYTE*>(dosHeader) + exportDataDirectory->VirtualAddress);

	// AddressOfNames is an array of RVAs with NumberOfNames entries
	// (NumberOfFunctions counts the address table, which may be larger).
	ULONG* addressOfNames = reinterpret_cast<ULONG*>(
		reinterpret_cast<BYTE*>(dosHeader) + exportDirectory->AddressOfNames);
	for (DWORD i = 0; i < exportDirectory->NumberOfNames; i++)
	{
		LPSTR exportFunctionName = reinterpret_cast<LPSTR>(
			reinterpret_cast<BYTE*>(dosHeader) + addressOfNames[i]);
		if (strcmp(exportFunctionName, ExportName) == 0)
		{
			// The name strings are mapped read-only; make this one writable.
			const SIZE_T nameSize = strlen(exportFunctionName) + 1;
			DWORD oldProtect = 0;
			BOOL rc = VirtualProtect(
				exportFunctionName,
				nameSize,
				PAGE_READWRITE,
				&oldProtect);
			if (rc == FALSE)
			{
				OutputDebugString(TEXT("VirtualProtect failed.\n"));
				return; // The write below would fault on a read-only page.
			}
			exportFunctionName[0]++;
			// Put the original page protection back.
			VirtualProtect(exportFunctionName, nameSize, oldProtect, &oldProtect);
			break;
		}
	}
}

The sample UnexportFunction function iterates over the current module’s export table until the function of interest is encountered. Since the export table is mapped as read-only memory, VirtualProtect must be used to allow for its modification. The string containing the name of the exported function is modified in place as a trivial obfuscation. This is sufficient to result in the “unexporting” of the symbol:

FARPROC before = GetProcAddress((HMODULE)&__ImageBase, "LsaApLogonUser");
UnexportFunction("LsaApLogonUser");
FARPROC after = GetProcAddress((HMODULE)&__ImageBase, "LsaApLogonUser");
assert(before != after);
assert(after == NULL);
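
The effect on name resolution can also be modeled without the Windows loader. The following is a minimal, portable sketch, not the real PE layout: ToyExportDirectory, Resolve, Obfuscate and MakeSampleDirectory are hypothetical names invented for this illustration, the "addresses" are made-up numbers, and the lookup is a plain linear scan chosen for brevity rather than a claim about how GetProcAddress searches.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Toy stand-in for a PE export directory (hypothetical type; it only mirrors
// the idea of a name table, a parallel ordinal table and an address table).
struct ToyExportDirectory {
    std::vector<std::string> names;           // one entry per named export
    std::vector<std::uint16_t> nameOrdinals;  // parallel to names: index into functions
    std::vector<std::uintptr_t> functions;    // made-up "addresses"
};

// Name lookup in the spirit of GetProcAddress: returns 0 when the name is absent.
std::uintptr_t Resolve(const ToyExportDirectory& dir, const std::string& name) {
    assert(dir.names.size() == dir.nameOrdinals.size());
    for (std::size_t i = 0; i < dir.names.size(); ++i) {
        if (dir.names[i] == name) {
            return dir.functions.at(dir.nameOrdinals[i]);
        }
    }
    return 0;
}

// Same trivial obfuscation as in the post: increment the first character.
// Returns false when no export with that name exists.
bool Obfuscate(ToyExportDirectory& dir, const std::string& name) {
    for (std::string& candidate : dir.names) {
        if (!candidate.empty() && candidate == name) {
            ++candidate[0];
            return true;
        }
    }
    return false;
}

// Three named exports with illustrative addresses.
ToyExportDirectory MakeSampleDirectory() {
    return ToyExportDirectory{
        {"LsaApLogonUser", "LsaApLogonUserEx", "LsaApLogonUserEx2"},
        {0, 1, 2},
        {0x1000, 0x2000, 0x3000}};
}
```

In this model, resolving "LsaApLogonUser" stops working after Obfuscate, while the function remains reachable under the altered name and the other exports are unaffected. One caveat that may be worth checking on a real image: the PE format keeps the export name table lexically sorted so that it can be binary searched, so an obfuscation that moves a name out of order could, in principle, disturb lookups of neighboring names.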

With this tool of export entry removal at our disposal, we can devise an architecture in which the proxy DLL contains all exports that may be present in the original DLL at runtime. Through “reflection” of the original DLL, the proxy shall determine which redundant exports it wishes to hide, resulting in runtime behavior consistent with that of the original DLL (again, assuming only late binding with GetProcAddress is used).
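
The selection step of that architecture amounts to a set difference. Here is a small sketch under the assumption that both export name lists have already been collected (how the original DLL's names are enumerated is left out); ExportsToHide is a hypothetical helper name, not part of any Windows API.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <set>
#include <string>
#include <vector>

// Returns the names the proxy exports but the original DLL does not,
// i.e. the candidates for hiding. Both inputs are plain name sets.
std::vector<std::string> ExportsToHide(const std::set<std::string>& proxyExports,
                                       const std::set<std::string>& originalExports) {
    std::vector<std::string> toHide;
    std::set_difference(proxyExports.begin(), proxyExports.end(),
                        originalExports.begin(), originalExports.end(),
                        std::back_inserter(toHide));
    // Sanity check: we can never hide more names than the proxy exports.
    assert(toHide.size() <= proxyExports.size());
    return toHide;
}
```

Each returned name would then be handed to a routine such as UnexportFunction early in the proxy's initialization, before any client has had a chance to call GetProcAddress on it.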

What if we are unable to construct a superset of all possible exports from the original DLL? Perhaps we wish to be future-proof as new, unknown exports are added to the original DLL. For that a variety of solutions may be considered. In particular, copying the original DLL’s export table to our own and inserting hooks as needed comes to mind. Perhaps a topic for a future post…


Debugging user-mode BootExecute native applications with kd

Debugging code executing during system startup always poses a unique challenge. One may need to debug a custom or built-in Windows service right from the start, when attaching to it after it has initialized proves insufficient or inappropriate. When developing a GINA hook or GINA stub, the need to debug the Winlogon process before the logon process is performed arises. The inability of the Visual Studio Debugger to be useful in these situations is one of the reasons people turn to Windbg.

For debugging Windows services or the Winlogon process during startup, Image File Execution Options provides a workable solution. As soon as a process of the name specified under the Image File Execution Options registry key is created, the debugger command-line specified in the Debugger value is executed in lieu of the original command-line, which is appended to the debugger command-line. The debugger started might be Visual Studio’s, if appropriate, an interactive Windbg in other cases or an NTSD remote debugging server when you will not or cannot do things like make the service process interactive.
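
To make the substitution concrete, here is a small sketch of how the effective command line is composed. IfeoKeyPath and EffectiveCommandLine are hypothetical helpers written for this illustration; the key path is relative to HKEY_LOCAL_MACHINE, and the sketch only models the string handling, not the process creation logic itself.

```cpp
#include <cassert>
#include <string>

// Registry key (relative to HKEY_LOCAL_MACHINE) that holds the options
// for a given image name, e.g. "services.exe".
std::string IfeoKeyPath(const std::string& imageName) {
    assert(!imageName.empty());
    return "SOFTWARE\\Microsoft\\Windows NT\\CurrentVersion\\"
           "Image File Execution Options\\" + imageName;
}

// Command line that ends up being launched: the Debugger value, followed by
// the original command line. Without a Debugger value nothing changes.
std::string EffectiveCommandLine(const std::string& debuggerValue,
                                 const std::string& originalCommandLine) {
    if (debuggerValue.empty()) {
        return originalCommandLine;
    }
    return debuggerValue + " " + originalCommandLine;
}
```

For example, with a Debugger value of C:\Debuggers\windbg.exe (an illustrative path) configured for spoolsv.exe, starting C:\WINDOWS\system32\spoolsv.exe would launch the debugger with that path as its argument.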

For the vast majority of startup applications, the aforementioned technique is both quite sufficient and convenient. However, there is another, perhaps esoteric, category of startup processes. These run at a very early stage of the boot process. They are the BootExecute applications.

BootExecute applications are started by the Session Manager (smss.exe) before invoking the “initial command” (Winlogon in XP) and before the various subsystems are started. As far as user-mode goes, it doesn’t get much earlier than this. Because of their early nature, a significant constraint is in place for BootExecute applications: they are native applications.

Do not confuse this usage of “native” with native code vs. .NET managed code. In this context, native means that only the Windows NT Native API, resident in ntdll.dll, is available. At this stage, the Win32 subsystem, composed of the kernel-mode win32k.sys component and the user-mode client/server runtime, CSRSS, has not yet been started by SMSS. Not even the Kernel32 library is usable by BootExecute applications.

What are these useful for? Those special tasks that must be performed before everything else has started in the system, yet remain in the domain of user-mode work. Consider these two typical examples:

  • AutoCheck, the BootExecute variant of the CHKDSK tool, used to examine the boot volume before it is locked and to fix critical file-system errors.
  • Sysinternals PageDefrag, a BootExecute utility that defragments the Paging File, registry hives and other files inaccessible to defragging by the normal Win32 Disk Defragmentation tool.

We can confirm that AutoCheck is indeed a native application by examining it with Visual C++’s DUMPBIN utility:

C:\WINDOWS\system32>dumpbin /headers autochk.exe
Microsoft (R) COFF/PE Dumper Version 9.00.21022.08
Copyright (C) Microsoft Corporation. All rights reserved.

Dump of file autochk.exe

PE signature found

File Type: EXECUTABLE IMAGE

FILE HEADER VALUES
14C machine (x86)
4 number of sections
48025203 time date stamp Sun Apr 13 21:33:39 2008
0 file pointer to symbol table
0 number of symbols
E0 size of optional header
10E characteristics
Executable
Line numbers stripped
Symbols stripped
32 bit word machine

OPTIONAL HEADER VALUES
10B magic # (PE32)
7.10 linker version
5B800 size of code
34200 size of initialized data
0 size of uninitialized data
D6B9 entry point (0100D6B9) _NtProcessStartupForGS@4
1000 base of code
5D000 base of data
1000000 image base (01000000 to 01091FFF)
1000 section alignment
200 file alignment
5.01 operating system version
5.01 image version
5.01 subsystem version
0 Win32 version
92000 size of image
400 size of headers
96A6F checksum
1 subsystem (Native)
0 DLL characteristics
40000 size of stack reserve
1000 size of stack commit
100000 size of heap reserve
1000 size of heap commit
0 loader flags
10 number of directories
... snipped ...

Notice the subsystem specified for AutoChk is the native subsystem. Notice further that the application’s entrypoint is NtProcessStartup (in its /GS compiler stack buffer overflow protection stub form).
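
The same check can be done programmatically by reading the Subsystem field straight out of the PE headers. The sketch below is a minimal, portable reader over an in-memory copy of the file; ReadLittleEndian and ReadPeSubsystem are hypothetical helper names, and the reader deliberately skips validation that a robust tool would perform (for example, checking the optional header magic).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr std::uint16_t kSubsystemNative = 1;  // IMAGE_SUBSYSTEM_NATIVE

// Reads a little-endian value of `size` bytes (at most 4) at `offset`.
// The caller is responsible for bounds checking.
std::uint32_t ReadLittleEndian(const std::vector<std::uint8_t>& image,
                               std::size_t offset, std::size_t size) {
    assert(size <= 4 && offset + size <= image.size());
    std::uint32_t value = 0;
    for (std::size_t i = 0; i < size; ++i) {
        value |= static_cast<std::uint32_t>(image[offset + i]) << (8 * i);
    }
    return value;
}

// Returns the Subsystem field of the optional header, or -1 if the buffer
// does not look like a PE image.
int ReadPeSubsystem(const std::vector<std::uint8_t>& image) {
    if (image.size() < 0x40 || ReadLittleEndian(image, 0, 2) != 0x5A4D) {
        return -1;  // no "MZ" DOS header
    }
    const std::size_t peOffset = ReadLittleEndian(image, 0x3C, 4);  // e_lfanew
    // Skip the 4-byte PE signature and the 20-byte COFF file header.
    const std::size_t optionalHeader = peOffset + 4 + 20;
    // Subsystem sits at offset 68 of the optional header (PE32 and PE32+).
    const std::size_t subsystemOffset = optionalHeader + 68;
    if (subsystemOffset + 2 > image.size()) {
        return -1;
    }
    if (ReadLittleEndian(image, peOffset, 4) != 0x00004550) {
        return -1;  // no "PE\0\0" signature
    }
    return static_cast<int>(ReadLittleEndian(image, subsystemOffset, 2));
}
```

A result equal to kSubsystemNative corresponds to the "1 subsystem (Native)" line in the DUMPBIN output above.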

As for PageDefrag, it takes advantage of the Session Manager running its BootExecute application before it has enabled use of the Paging File.

You may find reasons of your own to develop a BootExecute native application, or you may find yourself in a situation requiring debugging of an existing BootExecute application. For instance, you may wish to debug the interactions of AutoChk’s volume locking attempts with your file system filter driver.

Unfortunately, these native applications pose a special difficulty to the user-mode debugger. NTSD is a Win32 application and must be invoked only after the Win32 subsystem has been initialized. Therefore, invocation of NTSD for debugging BootExecute applications is out of the question. Indeed, it is quite likely the Image File Execution Options registry key is not even consulted for BootExecute invocations, as that would be quite pointless.

Theoretically, this problem could be addressed by the development of a native subsystem user mode debugger, in lieu of the Win32-based NTSD. Alex Ionescu, most recently contributing to the eagerly awaited 5th edition of the Windows Internals book, has discussed the specifics of the NT Native Debugging API (DbgUi, etc.) in a series of articles titled Windows Native Debugging Internals.

At the moment, however, I am unaware of any available native subsystem user mode debugger. Such a tool may or may not be available internally in Microsoft. Presumably the Windows developers would benefit from such functionality, but they might also be content with using the kernel debugger for those purposes.

Be that as it may, the rest of us must turn to the kernel debugger for resolution. The kernel debugger can be used for source-level debugging of user-mode applications, including native subsystem applications. The special difficulty with using it is getting to break in the right place at the right time. In lieu of an Image File Execution Options-style apparatus, an alternative approach is required.

When modifying the native BootExecute application in question is feasible, the simple approach of adding an invocation of ntdll’s DbgBreakPoint API to the top of the NtProcessStartup process entrypoint is probably the quickest way to get the desired effect. In the absence of a user-mode debugger, the debug break will make its way to the kernel debugger. The debugger will notice the presence of the user-mode module, load symbols and source and the usual debugger functions will be accessible. If source is not available, in many cases the image can be patched to contain either an invocation of DbgBreakPoint or just an inline INT 3, as appropriate.

Such an approach, however, may not be feasible at all times and has the significant disadvantage of making the modified native application hang when a kernel debugger is not attached to the system at boot. Ideally, we’d like to break at process startup without modifying the native application at all.

When using the user-mode debugger, “sxe ld” can break when user-mode modules are mapped by the loader, as documented in Controlling Exceptions and Events. Normally, the kernel debugger does not provide that capability. However, it turns out that it can do so, once appropriately configured.

Before booting with the kernel debugger, turn on the “Enable loading of kernel debugger symbols” Global Flag, using the GFlags utility bundled with the Debugging Tools for Windows:

C:\Program Files\Debugging Tools for Windows>gflags /r +ksl
Current Boot Registry Settings are: 00040000
ksl - Enable loading of kernel debugger symbols

Although the name and description of this Global Flag appear to have nothing to do with user-mode module load events in the kernel debugger, the flag achieves the desired effect. Once it is enabled, we can reboot with the kernel debugger attached and ask the kernel debugger to break once the desired native application is mapped:

Microsoft (R) Windows Debugger Version 6.9.0003.113 X86
Copyright (c) Microsoft Corporation. All rights reserved.

Connected to Windows XP 2600 x86 compatible target, ptr64 FALSE
Kernel Debugger connection established. (Initial Breakpoint requested)
Symbol search path is: C:\WINDOWS\Symbols;SRV*E:\SymStore*http://referencesource.microsoft.com/symbols;SRV*E:\SymStore*http://msdl.microsoft.com/download/symbols
Executable search path is:
Windows XP Kernel Version 2600 MP (1 procs) Free x86 compatible
Built by: 2600.xpsp.080413-2111
Kernel base = 0x804dc000 PsLoadedModuleList = 0x805684c0
System Uptime: not available
Break instruction exception - code 80000003 (first chance)
... snipped breakpoint warning message ...
nt!RtlpBreakWithStatusInstruction:
804e7a42 cc int 3
kd> sxe ld:autochk
kd> g
nt!DebugService2+0x10:
8050ae56 cc int 3

Setting the kernel debugger to break on the load of the AutoChk native BootExecute application resulted in our desired break. Let us consider the context of this break:

0: kd> kb
ChildEBP RetAddr Args to Child
f738d9fc 8050b2f9 f738da40 f738da10 00000003 nt!DebugService2+0x10
f738da20 805c533a f738da40 01000000 82953020 nt!DbgLoadImageSymbols+0x42
f738da70 805c51f0 82ab9c28 01000000 82953020 nt!MiLoadUserSymbols+0x169
f738dab4 8058d013 82ab9c28 01000000 f738db5c nt!MiMapViewOfImageSection+0x4b6
f738db10 80504e27 00000004 82953110 f738db5c nt!MmMapViewOfSection+0x13c
f738db6c 80590520 e165ec14 00000000 e1412398 nt!MmInitializeProcessAddressSpace+0x33d
f738dcbc 8059082f 0015f870 001f0fff 0015f7d8 nt!PspCreateProcess+0x333
f738dd10 805b54b2 0015f870 001f0fff 0015f7d8 nt!NtCreateProcessEx+0x7e
f738dd3c 804e298f 0015f870 001f0fff 0015f7d8 nt!NtCreateProcess+0x3d
f738dd3c 7c90e4f4 0015f870 001f0fff 0015f7d8 nt!KiFastCallEntry+0xfc
WARNING: Frame IP not in any known module. Following frames may be wrong.
0015f830 00000000 00000000 00000000 00000000 0x7c90e4f4
0: kd> !process 0 0
**** NT ACTIVE PROCESS DUMP ****
PROCESS 82bc9830 SessionId: none Cid: 0004 Peb: 00000000 ParentCid: 0000
DirBase: 02f40000 ObjectTable: e1002e40 HandleCount: 46.
Image: System

PROCESS 82935128 SessionId: none Cid: 0218 Peb: 7ffd9000 ParentCid: 0004
DirBase: 0899b000 ObjectTable: e13fbc68 HandleCount: 7.
Image: smss.exe

Although AutoChk has been mapped into memory, the AutoChk process is still in the process of being created. Indeed, the AutoChk process is as of yet absent from the system process list displayed by the !process debugger extension command.

However, AutoChk’s pseudo-created state does not prevent us from taking this opportunity to set up a debug breakpoint at the top of user code:

1: kd> lm m autochk
start end module name
01000000 01092000 autochk (deferred)
1: kd> bp autochk!NtProcessStartup
1: kd> bl
0 e 0100dd3d 0001 (0001) autochk!NtProcessStartup

Beware that if you perform a symbol reload with the .reload command after the module load event for autochk has fired off, you may find that it has disappeared from the debugger’s loaded module list… Just make sure you set up your breakpoint immediately after the event break.

It is easy enough to set a breakpoint at the application’s NtProcessStartup entrypoint before the EPROCESS is available, but we may wish to set early breakpoints in process context elsewhere. To that end, we may proceed from the module load event break to the return from the process creation API, until the process is listed in the system process list:

1: kd> k
ChildEBP RetAddr
f7b619fc 8050b2f9 nt!DebugService2+0x10
f7b61a20 805c533a nt!DbgLoadImageSymbols+0x42
f7b61a70 805c51f0 nt!MiLoadUserSymbols+0x169
f7b61ab4 8058d013 nt!MiMapViewOfImageSection+0x4b6
f7b61b10 80504e27 nt!MmMapViewOfSection+0x13c
f7b61b6c 80590520 nt!MmInitializeProcessAddressSpace+0x33d
f7b61cbc 8059082f nt!PspCreateProcess+0x333
f7b61d10 805b54b2 nt!NtCreateProcessEx+0x7e
f7b61d3c 804e298f nt!NtCreateProcess+0x3d
f7b61d3c 7c90e4f4 nt!KiFastCallEntry+0xfc
0015f830 00000000 ntdll!KiFastSystemCallRet
1: kd> gu; gu; gu; gu; gu; gu; gu
nt!NtCreateProcessEx+0x7e:
8059082f e87a76f5ff call nt!_SEH_epilog (804e7eae)
1: kd> !process 0 0
**** NT ACTIVE PROCESS DUMP ****
PROCESS 82bc9830 SessionId: none Cid: 0004 Peb: 00000000 ParentCid: 0000
DirBase: 02f40000 ObjectTable: e1002e40 HandleCount: 46.
Image: System

PROCESS 829b4128 SessionId: none Cid: 0228 Peb: 7ffd7000 ParentCid: 0004
DirBase: 08bbb000 ObjectTable: e1468f58 HandleCount: 8.
Image: smss.exe

PROCESS 8294a3d0 SessionId: none Cid: 0238 Peb: 7ffd6000 ParentCid: 0228
DirBase: 08d40000 ObjectTable: e13fd408 HandleCount: 0.
Image: autochk.exe

By examining our location in the call stack after the module load event fires, we can see that returning from the Process Manager’s process creation routine PspCreateProcess would require going up 7 times. With that routine’s execution completed, the EPROCESS for autochk is now listed in the system process list and its value can be used as a context parameter for breakpoint commands, etc.
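
The count of 7 follows directly from the position of PspCreateProcess in the stack listing. As a small worked example, the sketch below derives the number of "gu" steps from a list of frames ordered innermost first, as the k command prints them; GoUpCount and GoUpCommand are hypothetical helpers for illustration only.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Number of "gu" steps needed to return out of `target`, given the frames
// innermost first. Returns 0 when the frame is not on the stack.
std::size_t GoUpCount(const std::vector<std::string>& frames,
                      const std::string& target) {
    assert(!target.empty());
    for (std::size_t i = 0; i < frames.size(); ++i) {
        if (frames[i] == target) {
            return i + 1;  // one step per frame, including the target itself
        }
    }
    return 0;
}

// Builds the debugger command string, e.g. "gu; gu; gu" for a count of 3.
std::string GoUpCommand(std::size_t count) {
    std::string command;
    for (std::size_t i = 0; i < count; ++i) {
        if (!command.empty()) {
            command += "; ";
        }
        command += "gu";
    }
    return command;
}
```

Fed the frames from the listing above, starting at nt!DebugService2, the target nt!PspCreateProcess is the seventh frame, which is why seven steps land in nt!NtCreateProcessEx.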

With the breakpoint on the native entrypoint in place, we can resume system execution and have the kernel debugger land right where we want it:

1: kd> g
Breakpoint 0 hit
autochk!NtProcessStartup:
001b:0100dd3d 8bff mov edi,edi
1: kd> kb
ChildEBP RetAddr Args to Child
0006fff4 00000000 7ffde000 000000c8 0000010a autochk!NtProcessStartup
1: kd> .process
Implicit process is now 82935020
1: kd> .thread
Implicit thread is now 8293f020

From this point, convenient source debugging of the native application is also possible if it’s your own custom written application. The various features such as Locals, Watches, single stepping, etc., work as expected. Some quirks of kernel debugging of a user process should be taken into consideration (make sure breakpoints have an EPROCESS and ETHREAD context specified when appropriate to avoid venturing into other processes by accident, etc.) and the inaccessibility of some user-mode debugger extension commands may prove inconvenient.

Sure beats DbgPrints, though!

Replacing boot load drivers with the Windows Boot Debugger

Recently, I’ve been assigned to work on fixing several bugs in a Windows file system filter driver. Debugging native code has always been characterized by the tedious and cumbersome modify, compile and link, copy, run, repeat… cycle, but in the case of kernel-mode development, the overhead of that cycle is even more acute.

I’ve found that booting the target system or virtual machine every time you want to replace a driver file with an updated build and then rebooting to have the new driver loaded significantly prolongs the cycle. Therefore, I was happy to discover Windbg’s .kdfiles command.

The .kdfiles command configures the kernel debugger’s driver replacement map. Whenever the NT Memory Manager attempts to load a driver image, it consults the kernel debugger, if attached, asking it for an alternative driver image. If the debugger has one, it is transmitted over the kernel debugging connection from the host to the target, and used in lieu of the target’s local driver image.
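
Conceptually, the replacement map is a lookup from a target-side driver path to a host-side file. The sketch below models that lookup in portable C++; DriverReplacementMap and ToLowerAscii are hypothetical names, and the case-insensitive matching is an assumption made here on the grounds that Windows paths are generally compared without regard to case, not documented debugger behavior.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <map>
#include <string>

// Lowercases ASCII characters so that paths compare case-insensitively.
std::string ToLowerAscii(std::string text) {
    std::transform(text.begin(), text.end(), text.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return text;
}

// Toy model of the debugger's driver replacement map.
class DriverReplacementMap {
public:
    // Registers a host-side file to be used in place of a target-side driver.
    void Add(const std::string& targetPath, const std::string& hostPath) {
        assert(!targetPath.empty() && !hostPath.empty());
        entries_[ToLowerAscii(targetPath)] = hostPath;
    }

    // Returns the host-side replacement, or an empty string meaning
    // "no mapping, load the target's local file".
    std::string Lookup(const std::string& targetPath) const {
        const auto it = entries_.find(ToLowerAscii(targetPath));
        return it == entries_.end() ? std::string() : it->second;
    }

private:
    std::map<std::string, std::string> entries_;
};
```

A miss simply leaves the target to load its own copy of the driver, which is the behavior for every driver that has no mapping.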

Using the driver replacement map makes it easier to replace a driver with an updated version. However, in its usual form, the replacement map feature has a significant limitation – it cannot replace boot load drivers.

To understand the logic behind this restriction, one must consider the nature of boot driver loading. While demand-start drivers are started by the user-mode Service Control Manager (SCM) and system-start drivers are loaded by NTOSKRNL’s IoInitSystem function, boot drivers are, as their name suggests, required for the system to boot and are therefore loaded by osloader, a part of ntldr (this description is for pre-Vista systems).
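
The distinction can be summarized by the driver's start type, which is stored as the Start value of its service registry key. The following sketch encodes the pre-Vista description above; StartType, LoadedBy and NeedsBootDebuggerForReplacement are hypothetical names, and the returned strings are merely descriptive labels.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Values of a driver's "Start" registry setting.
enum class StartType : std::uint32_t {
    Boot = 0,
    System = 1,
    Auto = 2,
    Demand = 3,
    Disabled = 4
};

// Which component loads a driver of the given start type (pre-Vista systems).
std::string LoadedBy(StartType start) {
    switch (start) {
    case StartType::Boot:     return "OS loader (osloader in ntldr)";
    case StartType::System:   return "NT kernel (IoInitSystem)";
    case StartType::Auto:     return "Service Control Manager";
    case StartType::Demand:   return "Service Control Manager";
    case StartType::Disabled: return "not loaded";
    }
    assert(false && "unknown start type");
    return "unknown";
}

// Boot start drivers are already in memory by the time the kernel's Memory
// Manager consults the kernel debugger, so only they need the boot debugger.
bool NeedsBootDebuggerForReplacement(StartType start) {
    return start == StartType::Boot;
}
```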

By the time the NT kernel is up and its Memory Manager consults the kernel debugger and its driver replacement map, it is far too late to do anything about those drivers which have been pre-loaded by the OS loader. The initial breakpoint offered by the kernel debugger is simply too late.

Fortunately, Microsoft recognized the importance of providing a driver replacement map for boot load drivers and provides a somewhat esoteric solution in the form of the debug version of NTLDR.

The debug version of NTLDR expects the kernel debugger to attach to it during system startup. Unlike the kernel debugger, it is not configured with the boot.ini file and is always configured to a 115,200 baud connection on the COM1 serial port.

The documentation for .kdfiles points out that the Windows Driver Kit (WDK) bundles a debug version of NTLDR in the debug subdirectory. However, such a file is nowhere to be found there, probably because the WDK now contains the Vista checked kernel in its debug directory and the modern Vista boot loader is distinct from NTLDR. More on Windows Vista later, but for now let’s concentrate on Windows XP.

Failing to locate the debug NTLDR in the WDK, I turned back in time to the Windows Server 2003 SP1 IFS Kit, a variant of the Windows Server 2003 SP1 DDK for file system and file system filter developers. I was glad to find the ntldr_dbg file in its debug subdirectory.

However, my happiness quickly turned to disappointment when I replaced the original NTLDR with ntldr_dbg in a Windows XP virtual machine. The system refused to boot, claiming that NTLDR was corrupt. Since the debug directory in the IFS kit contains checked kernel binaries for Windows Server 2003 SP1, I figured that the provided version of ntldr_dbg is a match for that version, as well.

I turned to the archives, so to speak, and dusted off old MSDN Subscription CDs. I eventually turned up the rather antiquated Windows XP SP1 DDK. In there, I found another version of ntldr_dbg. I placed it as required and this time the system booted successfully.

It is unfortunate that one has to dig up the DDK of yore to locate the boot debugger. It really ought to be more accessible.

With the debug version of NTLDR in place, when you boot the system, right before the OS loader menu appears, you see the following message:
Boot Debugger Using: COM1 (Baud Rate 115200)

Once the message is displayed, NTLDR blocks waiting for a kernel debugger to connect. I start the kernel debugger the way I’d usually start it:
windbg -b -k com:pipe,port=\\.\pipe\com_1

Soon enough, however, it is evident that this is no ordinary kernel debugging session:

Microsoft (R) Windows Debugger Version 6.9.0003.113 X86
Copyright (c) Microsoft Corporation. All rights reserved.


Opened \\.\pipe\com_1
Waiting to reconnect...
BD: Boot Debugger Initialized
Connected to Windows Boot Debugger 2600 x86 compatible target, ptr64 FALSE
Kernel Debugger connection established. (Initial Breakpoint requested)
Symbol search path is: C:\WINDOWS\Symbols;SRV*E:\SymStore*http://referencesource.microsoft.com/symbols;SRV*E:\SymStore*http://msdl.microsoft.com/download/symbols
Executable search path is:
Module List address is NULL - debugger not initialized properly.
WARNING: .reload failed, module list may be incomplete
KdDebuggerData.KernBase < SystemRangeStart
Windows Boot Debugger Kernel Version 2600 UP Checked x86 compatible
Primary image base = 0x00000000 Loaded module list = 0x00000000
System Uptime: not available
Break instruction exception - code 80000003 (first chance)
0041cf70 cc int 3
kd>

Windbg has attached to the Windows Boot Debugger, a debugging environment provided by the debug version of NTLDR at a very early stage of system startup, well before the NT kernel has been loaded. Indeed, the initial breakpoint at the boot debugger occurs before an OS to start has been selected at the loader boot menu.

With the boot debugger at its initial breakpoint, we can set up the driver replacement map as desired. For instance, we can replace NTFS and NDIS with their counterparts from the checked build of Windows XP:

kd> .kdfiles -m \WINDOWS\system32\drivers\Ntfs.sys C:\Stuff\xpsp3checked\Ntfs.sys
Added mapping for '\WINDOWS\system32\drivers\Ntfs.sys'
kd> .kdfiles -m \WINDOWS\system32\drivers\Ndis.sys C:\Stuff\xpsp3checked\Ndis.sys
Added mapping for '\WINDOWS\system32\drivers\Ndis.sys'
kd> g
BD: osloader.exe base address 00400000
BD: \WINDOWS\system32\NTKRNLMP.CHK base address 80A02000
BD: \WINDOWS\system32\HALMACPI.CHK base address 80100000
BD: \WINDOWS\system32\KDCOM.DLL base address 80010000
BD: \WINDOWS\system32\BOOTVID.dll base address 80001000
BD: \WINDOWS\system32\DRIVERS\ACPI.sys base address 8014C000
BD: \WINDOWS\system32\DRIVERS\WMILIB.SYS base address 80007000
BD: \WINDOWS\system32\DRIVERS\pci.sys base address 80062000
BD: \WINDOWS\system32\DRIVERS\isapnp.sys base address 80012000
BD: \WINDOWS\system32\DRIVERS\compbatt.sys base address 80009000
BD: \WINDOWS\system32\DRIVERS\BATTC.SYS base address 8000C000
BD: \WINDOWS\system32\DRIVERS\intelide.sys base address 8001C000
BD: \WINDOWS\system32\DRIVERS\PCIIDEX.SYS base address 8017A000
BD: \WINDOWS\System32\Drivers\MountMgr.sys base address 80181000
BD: \WINDOWS\system32\DRIVERS\ftdisk.sys base address 8018C000
BD: \WINDOWS\System32\drivers\dmload.sys base address 8001E000
BD: \WINDOWS\System32\drivers\dmio.sys base address 801AB000
BD: \WINDOWS\System32\Drivers\PartMgr.sys base address 801D1000
BD: \WINDOWS\System32\Drivers\VolSnap.sys base address 801D6000
BD: \WINDOWS\system32\DRIVERS\atapi.sys base address 801E3000
BD: \WINDOWS\system32\DRIVERS\vmscsi.sys base address 80073000
BD: \WINDOWS\system32\DRIVERS\SCSIPORT.SYS base address 801FB000
BD: \WINDOWS\system32\DRIVERS\disk.sys base address 80213000
BD: \WINDOWS\system32\DRIVERS\CLASSPNP.SYS base address 8021C000
BD: \WINDOWS\system32\drivers\fltmgr.sys base address 80229000
BD: \WINDOWS\system32\DRIVERS\sr.sys base address 802A7000
BD: \WINDOWS\System32\Drivers\KSecDD.sys base address 802B9000
KD: Accessing 'C:\Stuff\xpsp3checked\Ntfs.sys' (\WINDOWS\System32\Drivers\Ntfs.sys)
File size 814K.... ....BD: Loaded remote file \WINDOWS\System32\Drivers\Ntfs.sys

BlLoadImageEx: Pulled \WINDOWS\System32\Drivers\Ntfs.sys from Kernel Debugger
BD: \WINDOWS\System32\Drivers\Ntfs.sys base address 802D0000
KD: Accessing 'C:\Stuff\xpsp3checked\Ndis.sys' (\WINDOWS\System32\Drivers\NDIS.sys)
File size 424K.... ....BD: Loaded remote file \WINDOWS\System32\Drivers\NDIS.sys

BlLoadImageEx: Pulled \WINDOWS\System32\Drivers\NDIS.sys from Kernel Debugger
BD: \WINDOWS\System32\Drivers\NDIS.sys base address 804DC000
Shutdown occurred...unloading all symbol tables.
Waiting to reconnect.

We can see that the boot debugger picked up our driver replacements and transferred them from the host to the target through the kernel debugger connection. Alas, this can be a lengthy process for an obese driver over the 115,200 baud link…
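
A back-of-the-envelope estimate shows why. The sketch below computes a lower bound on the transfer time, assuming the common 8N1 serial framing (10 bits on the wire per byte), treating the "K" in the debugger output as 1024 bytes, and ignoring any debugger protocol overhead; MinTransferSeconds is a hypothetical helper. A virtual serial port backed by a named pipe may not be throttled to the nominal rate, so real timings can differ.

```cpp
#include <cassert>
#include <cstdint>

// Lower bound, in seconds, for pushing `bytes` over a serial link at `baud`.
// Assumes 8N1 framing: 1 start bit + 8 data bits + 1 stop bit per byte.
double MinTransferSeconds(std::uint64_t bytes, std::uint32_t baud) {
    assert(baud > 0);
    const double bytesPerSecond = baud / 10.0;
    return static_cast<double>(bytes) / bytesPerSecond;
}
```

Under these assumptions the 814K checked Ntfs.sys needs at least about 72 seconds at 115,200 baud, and the 424K checked NDIS.sys about 38 seconds.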

Beyond being useful for replacing your own drivers, which is what I had in mind when I looked into this feature, the boot debugger can be used to easily go back and forth between Windows free build and checked build operating system components, as illustrated above. However, such use is not without its problems.

For one, replacing the kernel and the HAL with their checked counterparts through the driver replacement map does not work. An error citing kernel corruption results from such an attempt. The traditional way of using a checked kernel, by placing an appropriate entry in boot.ini, is still required.

When testing a file system filter driver, apart from using the checked version of the I/O Manager through the use of the checked NT kernel, it is advantageous to use checked versions of underlying file system drivers such as NTFS. The checked versions can assert when you pass on requests to them in a way which violates the file system’s locking hierarchy and which may lead to deadlocks. Replacing the NTFS driver with the driver replacement map feature worked as expected, apart from causing NDIS to bugcheck during system boot with some sort of paging error. The issue was resolved by replacing NDIS with its checked counterpart through the driver replacement map, as well.

However, for a reason I do not understand, there was no such luck when placing the checked build of the Filter Manager, which is useful for debugging file system minifilters. After transferring the checked Filter Manager, the boot loader complained that the NTFS driver was corrupt. I disabled System File Protection and replaced the free drivers with the checked drivers on disk, the traditional way, and the system booted successfully with the checked NTFS and Filter Manager. So it appears that the boot-time driver replacement map feature can be a bit flaky…

It is probably best to place checked operating system components the traditional way and only replace your own, frequently modified drivers with the boot debugger and the driver replacement map.

So much for Windows XP and the legacy NTLDR. But what about Windows Vista?

At first, the situation looked promising. In Windows Vista, the boot debugger is built-in. It can, for instance, be enabled for an existing boot entry with the Boot Configuration Database editor from an elevated command prompt:

C:\Windows\system32>bcdedit /enum

Windows Boot Manager
--------------------
identifier {bootmgr}
device partition=C:
description Windows Boot Manager
locale en-US
inherit {globalsettings}
default {current}
displayorder {current}
{5761b19a-1e8a-11dd-bcd4-000c29797dc6}
toolsdisplayorder {memdiag}
timeout 30

Windows Boot Loader
-------------------
identifier {current}
device partition=C:
path \Windows\system32\winload.exe
description Microsoft Windows Vista
locale en-US
inherit {bootloadersettings}
osdevice partition=C:
systemroot \Windows
resumeobject {694d30db-e737-11dc-814f-e01223f3682a}
nx OptIn

Windows Boot Loader
-------------------
identifier {5761b19a-1e8a-11dd-bcd4-000c29797dc6}
device partition=C:
path \Windows\system32\winload.exe
description Debugging
locale en-US
inherit {bootloadersettings}
osdevice partition=C:
systemroot \Windows
resumeobject {694d30db-e737-11dc-814f-e01223f3682a}
nx OptIn
debug Yes

C:\Windows\system32>bcdedit /bootdebug {5761b19a-1e8a-11dd-bcd4-000c29797dc6} ON

The operation completed successfully.
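
If you juggle several boot entries, it can be handy to check which of them have debugging switched on. The following Python sketch parses text shaped like the bcdedit listing above. The parsing rules (a title line followed by a dashed rule, then "name value" lines, with a bare value continuing the previous element) are inferred from this one listing and are an assumption rather than a documented format. The helper names are mine.

```python
def parse_bcd_entries(text):
    """Parse `bcdedit /enum`-style text into a list of dicts, one per entry."""
    lines = [line.strip() for line in text.splitlines()]
    entries, current, last_key = [], None, None
    for i, line in enumerate(lines):
        if not line or set(line) == {"-"}:
            continue  # skip blank lines and the dashed rule itself
        following = lines[i + 1] if i + 1 < len(lines) else ""
        if following and set(following) == {"-"}:
            # A line directly above a dashed rule is an entry title.
            current = {"title": line}
            entries.append(current)
            last_key = None
            continue
        if current is None:
            continue  # text before the first entry, e.g. the command prompt
        name, _, value = line.partition(" ")
        if not value and last_key is not None:
            # A bare value continues the previous multi-valued element.
            current[last_key] += " " + name
        else:
            current[name] = value.strip()
            last_key = name
    return entries


def entries_with(entries, element, value="Yes"):
    """Return the identifiers of entries whose element equals value."""
    return [e.get("identifier") for e in entries if e.get(element) == value]


SAMPLE = """\
Windows Boot Manager
--------------------
identifier {bootmgr}
displayorder {current}
{5761b19a-1e8a-11dd-bcd4-000c29797dc6}

Windows Boot Loader
-------------------
identifier {current}
description Microsoft Windows Vista

Windows Boot Loader
-------------------
identifier {5761b19a-1e8a-11dd-bcd4-000c29797dc6}
description Debugging
debug Yes
"""

if __name__ == "__main__":
    print(entries_with(parse_bcd_entries(SAMPLE), "debug"))
```

The same helper can be pointed at another element name, so it is not tied to the kernel debugging flag shown in the sample.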

Unlike the XP boot debugger, the Vista boot debugger is set for a specific boot loader menu entry. Once we reboot and pick the entry for which boot debugging is enabled, we can attach:

Microsoft (R) Windows Debugger Version 6.9.0003.113 X86
Copyright (c) Microsoft Corporation. All rights reserved.

Opened \\.\pipe\com_1
Waiting to reconnect...
BD: Boot Debugger Initialized
Connected to Windows Boot Debugger 6001 x86 compatible target, ptr64 FALSE
Kernel Debugger connection established. (Initial Breakpoint requested)
Symbol search path is: C:\WINDOWS\Symbols;SRV*E:\SymStore*http://referencesource.microsoft.com/symbols;SRV*E:\SymStore*http://msdl.microsoft.com/download/symbols
Executable search path is:
Windows Boot Debugger Kernel Version 6001 UP Free x86 compatible
Primary image base = 0x00584000 Loaded module list = 0x00684e78
System Uptime: not available
Break instruction exception - code 80000003 (first chance)
winload!RtlpBreakWithStatusInstruction:
005bce88 cc int 3
kd> k
ChildEBP RetAddr
00120c6c 005b0862 winload!RtlpBreakWithStatusInstruction
00120e84 005b0760 winload!vDbgPrintExWithPrefixInternal+0x100
00120e94 0058bdaf winload!DbgPrint+0x11
00120eb0 0058bf6d winload!BlBdStart+0x81
00120f48 005a2f88 winload!BlBdInitialize+0x172
00120f64 005a28c2 winload!InitializeLibrary+0x168
00120f7c 0058513a winload!BlInitializeLibrary+0x42
00120fe8 0044646a winload!OslMain+0x13a
WARNING: Frame IP not in any known module. Following frames may be wrong.
00000000 f000ff53 0x44646a
00000000 00000000 0xf000ff53

We can see that in Vista, the boot debugger’s initial break is in the new winload.exe, replacing the osloader.exe embedded in ntldr of yesteryear. At this point the boot load drivers have yet to be loaded, so it would be perfect to set the .kdfiles driver replacement map at this point.

Alas, no such luck. It turns out the boot load driver replacement map feature is MIA in Windows Vista. This is confirmed by Microsoft’s Doron Holan in a reply to a post (free registration required) in OSR’s WINDBG mailing list. It is unclear what the point is of bundling the boot debugger with the regular operating system, unlike the hard-to-find ntldr_dbg for XP, only for it to be completely useless… Anyone using the boot debugger for purposes other than boot load driver replacement is probably working for Microsoft, so why should the boot debugger be a part of the OS if it is now missing what seems to be its most important functionality?

Hopefully the boot load driver replacement map will make a comeback in the Windows 7 boot debugger…

Process Tracking in the Microsoft Network Monitor 3.2 beta

Network protocol analyzers like WireShark (formerly Ethereal) and Microsoft Network Monitor are very powerful tools for troubleshooting and analysis work. Anyone who uses them quickly gets accustomed to the convenience and can’t imagine working without them. I’ve done my fair share of analysis and debugging with these sniffers and therefore follow their development with special interest.

I was therefore intrigued the other day when I noticed the announcement of Microsoft’s release of the Network Monitor 3.2 beta. Two particular items in the feature list caught my attention. One is the addition of a network capture and frame parsing API (NmAPI) to Network Monitor 3.x. Windows versions prior to Windows Vista featured the Network Monitor 2.x capture API. However, this API was removed from Windows Vista and the new Network Monitor 3.x targeting Vista failed to provide an alternative until now.

The other, more exciting, item on the feature list was Process Tracking. The announcement post claimed the new beta could group frames under their sending or receiving process in the Conversations view, showing process name and PID. I contend that anyone with some diagnostic experience would appreciate the immense importance and power of a process-oriented view of network frames.

I immediately retrieved the beta release from Microsoft Connect and began evaluating this new feature. I started a network capture and used Internet Explorer and Ping to send frames to the network. Soon enough, I realized that while Internet Explorer was successfully identified, the new Network Monitor failed to recognize Ping’s ICMP frames and instead opted to group them in the “<Unknown>” process group.

After reviewing the release notes, I learned that the tracking feature as documented only groups TCP and UDP sessions by process and frames containing other protocols are not supported.

At this point I began considering how the Network Monitor developers would go about implementing the Process Tracking feature, and why frame-to-process association would be restricted exclusively to TCP and UDP.

Under Windows XP, Network Monitor 3 uses the legacy Network Monitor 2 driver, nmnt.sys, included with the OS, to retrieve network frames from NDIS. Under Windows Vista, a new driver, nm3.sys, is used instead. I shall discuss the new Vista driver from this point forward.

nm3.sys is an NDIS 6 filter driver. The NDIS 6.0 architecture is new to Windows Vista, which explains why Microsoft opted to develop a new, separate filter driver. nm3.sys signs up for examining network frames by registering as an NDIS filter with the NdisFRegisterFilterDriver API. The caller specifies a lengthy set of filter callback functions in the NDIS_FILTER_DRIVER_CHARACTERISTICS structure provided in the call.

Among the callback functions registered by an NDIS 6 filter, of special interest are the FilterSendNetBufferLists and FilterReceiveNetBufferLists callbacks, invoked when a NET_BUFFER_LIST is to be sent or received. These callbacks receive a list of network buffers containing the frames in question, but no process information is provided directly. To investigate whether process information can be retrieved indirectly, we need to trace nm3’s callback implementations. Unfortunately, Microsoft has not released public symbols (PDBs) to the Microsoft symbol store for the 3.2 beta, so the nm3.sys driver included with 3.1, for which symbols are available, shall be examined instead:


0:000> uf nm3!DriverEntry
nm3!DriverEntry:
00019006 8bff mov edi,edi
00019008 55 push ebp
00019009 8bec mov ebp,esp
0001900b 56 push esi
0001900c ff750c push dword ptr [ebp+0Ch]
0001900f ff7508 push dword ptr [ebp+8]
00019012 e8a19dffff call nm3!NmInitializeGlobals (00012db8)
00019017 ff7508 push dword ptr [ebp+8]
0001901a e8dd84ffff call nm3!NmRegisterFilter (000114fc)
0001901f 8bf0 mov esi,eax
00019021 85f6 test esi,esi
00019023 7517 jne nm3!DriverEntry+0x36 (0001903c)
nm3!DriverEntry+0x1f:
00019025 e8808effff call nm3!NmRegisterDevice (00011eaa)
0001902a 8bf0 mov esi,eax
0001902c 85f6 test esi,esi
0001902e 7411 je nm3!DriverEntry+0x3b (00019041)
nm3!DriverEntry+0x2a:
00019030 ff3550700100 push dword ptr [nm3!g_FilterDriverHandle (00017050)]
00019036 ff1514600100 call dword ptr [nm3!_imp__NdisFDeregisterFilterDriver (00016014)]
nm3!DriverEntry+0x36:
0001903c e8b998ffff call nm3!NmFreeDriverResources (000128fa)
nm3!DriverEntry+0x3b:
00019041 8bc6 mov eax,esi
00019043 5e pop esi
00019044 5d pop ebp
00019045 c20800 ret 8

During initialization, nm3 registers as an NDIS filter. Let’s see the specifics:

0:000> uf nm3!NmRegisterFilter
nm3!NmRegisterFilter:
000114fc 8bff mov edi,edi
000114fe 55 push ebp
000114ff 8bec mov ebp,esp
00011501 81ec80000000 sub esp,80h
00011507 53 push ebx
00011508 56 push esi
00011509 8b3540610100 mov esi,dword ptr [nm3!_imp__RtlInitUnicodeString (00016140)]
0001150f 685e5b0100 push offset nm3! ?? ::FNODOBFM::`string' (00015b5e)
00011514 8d45e8 lea eax,[ebp-18h]
00011517 50 push eax
00011518 ffd6 call esi
0001151a 681c5b0100 push offset nm3! ?? ::FNODOBFM::`string' (00015b1c)
0001151f 8d45f8 lea eax,[ebp-8]
00011522 50 push eax
00011523 ffd6 call esi
00011525 68ce5a0100 push offset nm3! ?? ::FNODOBFM::`string' (00015ace)
0001152a 8d45f0 lea eax,[ebp-10h]
0001152d 50 push eax
0001152e ffd6 call esi
00011530 6a68 push 68h
00011532 33db xor ebx,ebx
00011534 8d4580 lea eax,[ebp-80h]
00011537 53 push ebx
00011538 50 push eax
00011539 e8e6410000 call nm3!memset (00015724)
0001153e 8b45f8 mov eax,dword ptr [ebp-8]
00011541 89458c mov dword ptr [ebp-74h],eax
00011544 8b45fc mov eax,dword ptr [ebp-4]
00011547 894590 mov dword ptr [ebp-70h],eax
0001154a 8b45f0 mov eax,dword ptr [ebp-10h]
0001154d 894594 mov dword ptr [ebp-6Ch],eax
00011550 8b45f4 mov eax,dword ptr [ebp-0Ch]
00011553 894598 mov dword ptr [ebp-68h],eax
00011556 8b45e8 mov eax,dword ptr [ebp-18h]
00011559 83c40c add esp,0Ch
0001155c 89459c mov dword ptr [ebp-64h],eax
0001155f 8b45ec mov eax,dword ptr [ebp-14h]
00011562 6850700100 push offset nm3!g_FilterDriverHandle (00017050)
00011567 8945a0 mov dword ptr [ebp-60h],eax
0001156a 8b4508 mov eax,dword ptr [ebp+8]
0001156d 8d4d80 lea ecx,[ebp-80h]
00011570 51 push ecx
00011571 c740343e290100 mov dword ptr [eax+34h],offset nm3!NetmonUnload (0001293e)
00011578 ff35a0700100 push dword ptr [nm3!g_FilterDriverObject (000170a0)]
0001157e c645808b mov byte ptr [ebp-80h],8Bh
00011582 50 push eax
00011583 66c745826800 mov word ptr [ebp-7Eh],68h
00011589 c6458101 mov byte ptr [ebp-7Fh],1
0001158d c6458406 mov byte ptr [ebp-7Ch],6
00011591 885d85 mov byte ptr [ebp-7Bh],bl
00011594 c6458601 mov byte ptr [ebp-7Ah],1
00011598 885d87 mov byte ptr [ebp-79h],bl
0001159b 895d88 mov dword ptr [ebp-78h],ebx
0001159e c745ac203d0100 mov dword ptr [ebp-54h],offset nm3!NetmonFilterAttach (00013d20)
000115a5 c745b094390100 mov dword ptr [ebp-50h],offset nm3!NetmonFilterDetach (00013994)
000115ac c745b474100100 mov dword ptr [ebp-4Ch],offset nm3!NetmonFilterRestart (00011074)
000115b3 c745b834100100 mov dword ptr [ebp-48h],offset nm3!NetmonFilterPause (00011034)
000115ba c745d0ec130100 mov dword ptr [ebp-30h],offset nm3!NetmonOidRequest (000113ec)
000115c1 c745a824110100 mov dword ptr [ebp-58h],offset nm3!NetmonFilterSetModuleOptions (00011124)
000115c8 c745a406100100 mov dword ptr [ebp-5Ch],offset nm3!NetmonSetOptions (00011006)
000115cf c745c8324c0100 mov dword ptr [ebp-38h],offset nm3!NetmonReceiveNetBufferLists (00014c32)
000115d6 c745dc92440100 mov dword ptr [ebp-24h],offset nm3!NetmonDevicePnPEventNotify (00014492)
000115dd c745e0b0440100 mov dword ptr [ebp-20h],offset nm3!NetmonNetPnPEvent (000144b0)
000115e4 895dcc mov dword ptr [ebp-34h],ebx
000115e7 c745e406110100 mov dword ptr [ebp-1Ch],offset nm3!NetmonFilterStatus (00011106)
000115ee c745d42e130100 mov dword ptr [ebp-2Ch],offset nm3!NetmonOidRequestComplete (0001132e)
000115f5 895dd8 mov dword ptr [ebp-28h],ebx
000115f8 c745bcfe4d0100 mov dword ptr [ebp-44h],offset nm3!NetmonSendNetBufferLists (00014dfe)
000115ff 895dc0 mov dword ptr [ebp-40h],ebx
00011602 895dc4 mov dword ptr [ebp-3Ch],ebx
00011605 ff152c600100 call dword ptr [nm3!_imp__NdisFRegisterFilterDriver (0001602c)]
0001160b 5e pop esi
0001160c 5b pop ebx
0001160d c9 leave
0001160e c20400 ret 4
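
Reading the stores into the stack frame is easier with a map from [ebp-XXh] offsets to structure fields. The Python sketch below lays out NDIS_FILTER_DRIVER_CHARACTERISTICS for a 32-bit build, starting at [ebp-80h] as in the listing above. The field order is transcribed from my recollection of the NDIS 6.0 declaration in ndis.h and should be checked against the WDK headers. The helper names are mine. One sanity check is available: the field sizes add up to 68h, the same value the code passes to memset and stores in the header's Size member.

```python
POINTER = 4  # pointer size on x86

HANDLERS = [
    "SetOptionsHandler", "SetFilterModuleOptionsHandler", "AttachHandler",
    "DetachHandler", "RestartHandler", "PauseHandler",
    "SendNetBufferListsHandler", "SendNetBufferListsCompleteHandler",
    "CancelSendNetBufferListsHandler", "ReceiveNetBufferListsHandler",
    "ReturnNetBufferListsHandler", "OidRequestHandler",
    "OidRequestCompleteHandler", "CancelOidRequestHandler",
    "DevicePnPEventNotifyHandler", "NetPnPEventHandler", "StatusHandler",
]

# (field name, size in bytes) in assumed declaration order.
FIELDS = [
    ("Header.Type", 1), ("Header.Revision", 1), ("Header.Size", 2),
    ("MajorNdisVersion", 1), ("MinorNdisVersion", 1),
    ("MajorDriverVersion", 1), ("MinorDriverVersion", 1),
    ("Flags", 4),
    # A 32-bit UNICODE_STRING is 8 bytes: two USHORTs plus a pointer.
    ("FriendlyName", 8), ("UniqueName", 8), ("ServiceName", 8),
] + [(name, POINTER) for name in HANDLERS]

STRUCT_BASE = -0x80  # the structure starts at [ebp-80h] in NmRegisterFilter


def struct_size(fields=FIELDS):
    """Total size of the structure in bytes."""
    return sum(size for _, size in fields)


def field_at(ebp_offset, fields=FIELDS, base=STRUCT_BASE):
    """Return the name of the field covering [ebp+ebp_offset], or None."""
    start = base
    for name, size in fields:
        if start <= ebp_offset < start + size:
            return name
        start += size
    return None


if __name__ == "__main__":
    for offset in (-0x80, -0x7C, -0x44, -0x38):
        print("[ebp-%02Xh] -> %s" % (-offset, field_at(offset)))
```

With this map, the store of NetmonSendNetBufferLists to [ebp-44h] and of NetmonReceiveNetBufferLists to [ebp-38h] line up with the send and receive handler slots, and the slots written from ebx are the handlers left as NULL.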

The callback implementations are NetmonSendNetBufferLists and NetmonReceiveNetBufferLists. We can place breakpoints on these callbacks while a network capture is in progress and examine the stack. Let’s look at what things look like for an outgoing ECHO request sent by the PING command:

1: kd> kb 2000
ChildEBP RetAddr Args to Child
9db555a0 85cbc585 851b4be8 856cb458 00000000 nm3!NetmonSendNetBufferLists
9db555c0 85cbc5a8 856cb458 856cb458 00000000 ndis!ndisFilterSendNetBufferLists+0x8b
9db555d8 8c60545f 851ba808 856cb458 00000000 ndis!NdisFSendNetBufferLists+0x18
9db55654 85cbc638 851b2d60 856cb458 00000000 pacer!PcFilterSendNetBufferLists+0x233
9db55670 85d8764a 856cb458 856cb458 00000000 ndis!ndisSendNBLToFilter+0x87
9db55694 85e8a1ee 851b4750 856cb458 00000000 ndis!NdisSendNetBufferLists+0x4f
9db556dc 85e89dcc 84b412b8 00000000 00000000 tcpip!FlSendPackets+0x399
9db5571c 85e899db 85eeec68 00000000 00000000 tcpip!IppFragmentPackets+0x201
9db55754 85e8b7cb 85eeec68 9db55870 616c7049 tcpip!IppDispatchSendPacketHelper+0x252
9db557f4 85e8ac3f 00b55870 85eeec68 00000000 tcpip!IppPacketizeDatagrams+0x8fd
9db55954 85e8c75d 00000000 856cb400 85eeec68 tcpip!IppSendDatagramsCommon+0x5f9
9db55974 85e57d83 85eeec68 9db559c0 83682128 tcpip!IppSendDatagrams+0x2a
9db5599c 85e58a3a 00000000 00000000 856cb458 tcpip!IppSendControl+0xfe
9db55b20 85e58234 00000000 000003a5 85ee96a4 tcpip!Ipv4SetEchoRequestCreate+0x718
9db55b64 85df8a29 9db55b7c 00000000 85683038 tcpip!Ipv4SetAllEchoRequestParameters+0xf2
9db55ba4 8c681551 00000006 8370218c 00000000 NETIO!NsiSetAllParametersEx+0xbd
9db55bf0 8c681eb8 00000000 8568eaa0 8568ead8 nsiproxy!NsippSetAllParameters+0x1b1
9db55c14 8c681f91 83702101 00000000 856838f8 nsiproxy!NsippDispatchDeviceControl+0x88
9db55c2c 8184b1ad 85142448 83702170 83702170 nsiproxy!NsippDispatch+0x33
9db55c44 819f7f64 856838f8 83702170 837021e0 nt!IofCallDriver+0x63
9db55c64 81a02940 85142448 856838f8 0016f400 nt!IopSynchronousServiceTail+0x1d9
9db55d00 81a346cf 85142448 83702170 00000000 nt!IopXxxControlFile+0x6b7
9db55d34 8185c9aa 000000f8 00000138 00000000 nt!NtDeviceIoControlFile+0x2a
9db55d34 77159a94 000000f8 00000138 00000000 nt!KiFastCallEntry+0x12a
0016f3d4 77158444 773514b9 000000f8 00000138 ntdll!KiFastSystemCallRet
0016f3d8 773514b9 000000f8 00000138 00000000 ntdll!ZwDeviceIoControlFile+0xc
0016f41c 77351b48 00120013 0016f450 00000028 NSI!NsiIoctl+0x5d
0016f440 77351b1b 0016f450 0024dc1c 00000000 NSI!NsiSetAllParametersEx+0x23
0016f478 753591f2 00000001 00000006 753533e4 NSI!NsiSetAllParameters+0x53
0016f528 00fe24df 0023fb60 00000000 00000000 IPHLPAPI!IcmpSendEcho2Ex+0x1d5
0016fa7c 00fe2a23 00000002 008422d0 00841578 PING!main+0xacb
0016fac0 75c54911 7ffdf000 0016fb0c 7713e4b6 PING!_initterm_e+0x163
0016facc 7713e4b6 7ffdf000 770df6a4 00000000 kernel32!BaseThreadInitThunk+0xe
0016fb0c 7713e489 00fe2b5d 7ffdf000 00000000 ntdll!__RtlUserThreadStart+0x23
0016fb24 00000000 00fe2b5d 7ffdf000 00000000 ntdll!_RtlUserThreadStart+0x1b

At the top of the kernel-mode stack, we see the filter callback for outgoing network frames. The frames were dispatched to the filter by NDIS (NdisFSendNetBufferLists). User space sent the network frame by issuing an IOCTL to the network protocol stack. Notice the cut-off between the user-mode and kernel-mode stack at KiFastSystemCallRet. PING uses the IP Helper API to send the outgoing ECHO frame.

We can conclude from this stack trace that, at least in some cases, the user-space thread context active when an NDIS filter callback for outgoing frames is invoked is in fact associated with the process that sent the respective network frame. Theoretically, a filter could use IoGetCurrentProcess, extract information (name, PID, etc.) and provide it to the user-space network capture program.

However, the possibility remains that, due to network frame buffering or other reasons, the originating process will not be the one active when the filter callback is invoked, and we would be in an arbitrary thread context instead. Let’s accept that and turn to the receive path. Consider a stack trace for an invocation of the receive callback:

0: kd> kb 2000
ChildEBP RetAddr Args to Child
8069dea8 85d79aba 851b4be8 854c67b0 00000000 nm3!NetmonReceiveNetBufferLists
8069dec4 85cba54a 85687438 854c67b0 00000000 ndis!ndisMIndicateReceiveNetBufferListsInternal+0x27
8069dee0 8636f71f 85687438 854c67b0 00000000 ndis!NdisMIndicateReceiveNetBufferLists+0x20
8069df28 8636e77e 00000000 8636e6fe 00000001 E1G60I32!RxProcessReceiveInterrupts+0xdd
8069df40 85d7911c 01f0d160 00000000 00000000 E1G60I32!E1000HandleInterrupt+0x80
8069df64 85cba468 84bfd5fc 00000000 00000000 ndis!ndisMiniportDpc+0x81
8069df88 8186fab0 84bfd5fc 85687438 00000000 ndis!ndisInterruptDpc+0xc4
8069dff4 8186dfa5 9db55470 00000000 00000000 nt!KiRetireDpcList+0x147
8069dff8 9db55470 00000000 00000000 00000000 nt!KiDispatchInterrupt+0x45
WARNING: Frame IP not in any known module. Following frames may be wrong.
8186dfa5 00000000 0000001b 00c7850f bb830000 0x9db55470

The stack trace for the receive path illustrates that the receive callback is invoked directly during interrupt processing by the network interface card. The NIC’s driver DPC notifies NDIS that new network frames are available by calling NdisMIndicateReceiveNetBufferLists, which invokes the filter callback.

This makes sense. When network frames are received, they are first processed by the driver, then by the filter and only then would they be dispatched to the user-space process listening on a matching socket, etc.

With the receive path being unsuitable for deducing process association and the send path’s reliability being impaired by buffering and other behavior that can lead to arbitrary thread context, Network Monitor opts for a totally different approach for process tracking, not involving the NDIS filter driver at all.

Examining the modified filter callbacks in the 3.2 beta version of the nm3 driver proved that it did not attempt to collect any process information. Process Tracking must be implemented elsewhere. A simple examination of the Network Monitor program, netmon.exe, revealed that its import address table (IAT) references the well-known APIs GetExtendedTcpTable and GetExtendedUdpTable.

These APIs were introduced with the Windows XP Service Pack 2 release. You can see them in action by running the beloved “netstat” command with the new “-b” switch, which shows the process associated with each open socket. Using “-b -v” shows all the modules involved in a socket connection. The “-o” switch provides more concise information in the form of the relevant PID.

What the new Network Monitor does is, in effect, the moral equivalent of running “netstat -b” whenever a TCP or UDP frame is captured. The local endpoint (IP and port) is matched against the table returned by the GetExtendedXxpTable APIs.

There are several shortcomings to this technique. One is that it only works with TCP and UDP. Another is that information may not be available for very short-lived connections, since by the time the extended tables are queried, the session may already be long gone. One process could potentially send out UDP datagrams using a raw socket without establishing a port binding and be confused with another process that used a regular socket and bound itself to the same source port.
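
The matching step and its blind spots can be modeled in a few lines of Python. This is a simulation over hand-written data, not Network Monitor's actual code: the snapshot rows stand in for what GetExtendedTcpTable and GetExtendedUdpTable would return, and the names Frame, build_owner_index and owning_pid, as well as the wildcard-address fallback, are my own assumptions.

```python
from collections import namedtuple

# A captured frame reduced to the fields used for the lookup.
Frame = namedtuple("Frame", "proto local_ip local_port")


def build_owner_index(rows):
    """Index one table snapshot: (proto, local_ip, local_port) -> owning PID."""
    return {(proto, ip, port): pid for proto, ip, port, pid in rows}


def owning_pid(frame, index):
    """Return the PID owning the frame's local endpoint, or None if unknown."""
    if frame.proto not in ("TCP", "UDP"):
        return None  # e.g. ICMP: there is no port to look up
    exact = index.get((frame.proto, frame.local_ip, frame.local_port))
    if exact is not None:
        return exact
    # Assumption: a socket bound to the wildcard address owns the port
    # on every local address.
    return index.get((frame.proto, "0.0.0.0", frame.local_port))


# Hypothetical snapshot rows: (proto, local_ip, local_port, pid).
SNAPSHOT = [
    ("TCP", "10.0.0.5", 49160, 1234),
    ("UDP", "0.0.0.0", 53001, 4321),
]

if __name__ == "__main__":
    index = build_owner_index(SNAPSHOT)
    for frame in (Frame("TCP", "10.0.0.5", 49160),
                  Frame("ICMP", "10.0.0.5", 0),
                  Frame("TCP", "10.0.0.5", 49999)):
        print(frame, "->", owning_pid(frame, index) or "<Unknown>")
```

The ICMP frame and the TCP frame whose connection is absent from the snapshot both come back unknown, which mirrors the first two shortcomings above. The raw-socket case is worse, since the lookup returns a confident but wrong PID.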

In conclusion, the addition of Process Tracking to Network Monitor, while welcome, is not the holy grail of process network monitoring by any means. Since the NDIS filter architecture does not lend itself to process-oriented monitoring, a solution from that end is probably not available.

As implied by this discussion thread, the way to go here may be a Windows Filtering Platform Callout Driver on Vista or a TDI upper filter on downlevel systems. WFP provides process information internally, and an upper TDI filter can apparently rely on the user-space thread context being non-arbitrary. The question remains whether such solutions would be overkill for a network protocol analyzer.

Remote Procedure Call debugging

Recently, I discussed how one would go about finding the other end of an LPC (Local inter-Process Communication, rather than Local Procedure Call, apparently) port. LPC is used directly through the native API for some Windows components such as LSA, but is more frequently used by third parties in the form of the “ncalrpc” RPC transport. When dealing with those cases, or cases where the higher level RPC runtime is used in general (e.g., with the named pipes or TCP transports), we must turn to a whole other family of techniques.

While in the case of LPC analysis we turned to the aid of the kernel debugger, in the case of RPC we can utilize built-in instrumentation found in the Windows RPC runtime library. Since RPC debugging may come to involve a variety of distributed scenarios, rather than opting for a plain registry setting enabling instrumentation, Microsoft chose to provide control through the group policy facility.

Enabling debugging aid by the runtime is a prerequisite to any useful analysis work. Follow the instructions in the MSDN page “Enabling RPC State Information” and restart the system. Usually you’ll be able to make do with the “Server” setting.

For illustration purposes, we shall consider the HELLO RPC sample available with the Microsoft Windows SDK. The HELLO sample includes an IDL file specifying a trivial illustrative interface providing the HelloProc remote call that passes a string to the server side and the Shutdown remote call that instructs the server to shut down. Let’s run the HELLO server process.

In order to diagnose a product using RPC we must figure out the server endpoint of interest. Our primary tool will be the “dbgrpc” utility distributed with the Debugging Tools for Windows. With RPC state information enabled, we begin by enumerating RPC endpoints:

C:\Program Files\Debugging Tools for Windows>dbgrpc -e
Searching for endpoint info ...
PID CELL ID ST PROTSEQ ENDPOINT
-------------------------------------------------------------
0274 0000.0001 01 LRPC IUserProfile
0274 0000.0003 01 LRPC sclogonrpc
0274 0000.0005 01 NMP \PIPE\InitShutdown
0274 0000.0007 01 NMP \PIPE\SfcApi
0274 0000.000a 01 NMP \pipe\winlogonrpc
0274 0000.000e 01 LRPC OLEFEB89B1D900E460783A2A6ABA
02a0 0000.0001 01 LRPC ntsvcs
02a0 0000.0003 01 NMP \pipe\ntsvcs
02a0 0000.0006 01 NMP \PIPE\scerpc
02ac 0000.0001 01 NMP \PIPE\lsass
02ac 0000.0003 01 LRPC audit
02ac 0000.0005 01 LRPC securityevent
02ac 0000.0007 01 LRPC protected_storage
02ac 0000.0009 01 NMP \PIPE\protected_storage
034c 0000.0001 01 LRPC actkernel
034c 0000.0005 01 LRPC IcaApi
034c 0000.0007 01 NMP \pipe\Ctx_WinStation_API_ser
03a4 0000.0001 01 LRPC epmapper
03a4 0000.0003 01 TCP 135
03a4 0000.000a 01 NMP \pipe\epmapper
0414 0000.0001 01 LRPC dhcpcsvc
0414 0000.0003 01 LRPC wzcsvc
0414 0000.0005 01 LRPC OLEA390A47C8A6F4EA78EA712E62
0414 0000.0009 01 NMP \PIPE\atsvc
0414 0000.000e 01 LRPC AudioSrv
0414 0000.0010 01 NMP \PIPE\wkssvc
0414 0000.0011 01 NMP \pipe\keysvc
0414 0000.0012 01 LRPC keysvc
0414 0000.0014 01 LRPC SECLOGON
0414 0000.0016 01 NMP \pipe\trkwks
0414 0000.0017 01 LRPC trkwks
0414 0000.001a 01 NMP \PIPE\srvsvc
0414 0000.001d 01 LRPC srrpc
0414 0000.001f 01 LRPC senssvc
0414 0000.0021 01 NMP \PIPE\W32TIME
04ec 0000.0001 01 LRPC DNSResolver
0548 0000.0001 01 NMP \PIPE\DAV RPC SERVICE
0548 0000.0003 01 NMP \PIPE\winreg
0548 0000.0004 01 LRPC LRPC00000548.00000001
05e4 0000.0001 01 NMP \pipe\spoolss
05e4 0000.0003 01 LRPC spoolss
05e4 0000.0006 01 LRPC OLE8BC761BE0AFF4D9CA9603B53B
0684 0000.0001 01 LRPC OLE872E70B024824F8894A85E384
00ac 0000.0001 01 LRPC OLEAA4283CA4B51483E95665C439
0204 0000.0001 01 LRPC OLEDBAAFA32AEBF41AD808B50A1B
0594 0000.0001 01 LRPC OLEA0D6A971EC424B7DB839E9308
0314 0000.0001 01 LRPC hello

Endpoint enumeration gives you an idea of available RPC services in a server system. Since the HELLO server process was the last one launched, it is conveniently found at the bottom of the output.

Without repeating too much of the RPC debugging primer in the Windbg documentation, I’ll just point out the important fact that RPC state information is organized into “cells” in each process. Through the use of a simple endpoint enumeration command, we’ve already concluded that the HELLO server process is PID 0x314. Not an impressive feat for a process we just launched, but consider that this could easily be a third-party RPC server started as a service or on demand in an unknown executable.
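
On a busy server the endpoint list gets long, so it can help to search it mechanically. Below is a minimal Python sketch that parses text in the shape of the dbgrpc -e listing above and returns the PIDs serving a given endpoint name. The column interpretation is taken from the header row of that listing, and the function names are hypothetical.

```python
def parse_endpoints(text):
    """Parse `dbgrpc -e`-style rows into dicts; non-data lines are skipped."""
    rows = []
    for line in text.splitlines():
        # Endpoint names may contain spaces, so split at most four times.
        parts = line.split(None, 4)
        if len(parts) != 5 or "." not in parts[1]:
            continue  # banner, header or separator line
        pid, cell, state, protseq, endpoint = parts
        try:
            pid_value = int(pid, 16)  # PIDs are printed in hexadecimal
        except ValueError:
            continue
        rows.append({"pid": pid_value, "cell": cell, "state": state,
                     "protseq": protseq, "endpoint": endpoint.strip()})
    return rows


def pids_for_endpoint(rows, name):
    """Return the sorted PIDs whose endpoint matches name, ignoring case."""
    wanted = name.lower()
    return sorted({r["pid"] for r in rows if r["endpoint"].lower() == wanted})


SAMPLE = r"""Searching for endpoint info ...
PID CELL ID ST PROTSEQ ENDPOINT
-------------------------------------------------------------
03a4 0000.0003 01 TCP 135
0548 0000.0001 01 NMP \PIPE\DAV RPC SERVICE
0314 0000.0001 01 LRPC hello
"""

if __name__ == "__main__":
    for pid in pids_for_endpoint(parse_endpoints(SAMPLE), "hello"):
        print("hello is served by PID 0x%x" % pid)
```

Splitting on whitespace only four times keeps endpoint names such as \PIPE\DAV RPC SERVICE intact.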

Most of the time, we can associate the endpoint name with the application of interest since a descriptive string is being used. However, in other cases, we may know the server application of interest, but the endpoint name is unknown, random or auto-generated. When there’s just one endpoint, we can just find the process of interest in the dbgrpc endpoint enumeration output. In any case, we can examine the call used by the server application to the RPC runtime to determine which endpoint name is in use:


0:000> bp rpcrt4!RpcServerUseProtseqEpA
0:000> g
Breakpoint 0 hit
eax=00452000 ebx=7ffd5000 ecx=00452008 edx=00000014 esi=00d5f55c edi=7c911970
eip=77e97a0b esp=0012ff3c ebp=0012ff6c iopl=0 nv up ei pl nz na pe nc
cs=001b ss=0023 ds=0023 es=0023 fs=003b gs=0000 efl=00000206
RPCRT4!RpcServerUseProtseqEpA:
77e97a0b 8bff mov edi,edi
0:000> kb
ChildEBP RetAddr Args to Child
0012ff38 00401046 00452000 00000014 00452008 RPCRT4!RpcServerUseProtseqEpA
0012ff6c 00401e37 00000001 003330a0 00333120 hellos!main+0x46 [e:\projects\hello\hellos.c @ 21]
0012ffb8 00401d0f 0012fff0 7c816ff7 7c911970 hellos!__tmainCRTStartup+0x117 [f:\dd\vctools\crt_bld\self_x86\crt\src\crt0.c @ 266]
0012ffc0 7c816ff7 7c911970 00d5f55c 7ffd5000 hellos!mainCRTStartup+0xf [f:\dd\vctools\crt_bld\self_x86\crt\src\crt0.c @ 182]
0012fff0 00000000 00401d00 00000000 78746341 kernel32!BaseProcessStart+0x23

We note that the third argument to RpcServerUseProtseqEp specifies the server endpoint name:

0:000> da 00452008
00452008 "hello"

Note that more complex varieties of RPC servers may use alternative approaches for endpoint name selection that do not utilize the aforementioned API.

When debugging a remote call, locating the server side only at process resolution may prove to be insufficient. Fortunately, we can continue and extract thread information. Consider an endpoint list entry for a running HELLO server:

0314 0000.0001 01 LRPC hello

Let’s examine thread information for this RPC server process:

C:\Program Files\Debugging Tools for Windows>dbgrpc -t -P 314
Searching for thread info ...
PID  CELL ID   ST TID       ENDPOINT LASTTIME
---------------------------------------------
0314 0000.0002 03 000000f0 0000.0001 003ffcad

We can see that a thread associated with cell ID 2 is associated with the endpoint at cell ID 1. If this were a server process serving multiple endpoints, we’d be able to filter the threads of interest by ignoring those associated with other endpoints.
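
That filtering step can be sketched in Python as follows. The first sample row is the one from the listing above. The second row, attached to a different endpoint cell, is invented purely to show the filter at work. The function name is hypothetical, and the column order is assumed from the header row.

```python
def threads_for_endpoint(text, endpoint_cell):
    """Return the thread IDs (as ints) attached to the given endpoint cell."""
    tids = []
    for line in text.splitlines():
        parts = line.split()
        # Data rows have six columns: PID, CELL ID, ST, TID, ENDPOINT, LASTTIME.
        # The header row splits into seven tokens and is skipped.
        if len(parts) != 6 or parts[4] != endpoint_cell:
            continue
        tids.append(int(parts[3], 16))  # TIDs are printed in hexadecimal
    return tids


SAMPLE = """\
PID  CELL ID   ST TID       ENDPOINT LASTTIME
---------------------------------------------
0314 0000.0002 03 000000f0 0000.0001 003ffcad
0314 0000.0006 03 00000abc 0000.0005 003ffcad
"""

if __name__ == "__main__":
    for tid in threads_for_endpoint(SAMPLE, "0000.0001"):
        print("thread 0x%x serves endpoint cell 0000.0001" % tid)
```

The debugger identifies threads as PID.TID in hexadecimal, so the value f0 is the number to look for in the thread list.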

We can use the thread ID returned by dbgrpc to find the thread in the debugger:

C:\Program Files\Debugging Tools for Windows>cdb -p 0x314
Microsoft (R) Windows Debugger Version 6.9.0003.113 X86
Copyright (c) Microsoft Corporation. All rights reserved.
*** wait with pending attach
Symbol search path is: SRV*C:\websymbols*\\.host\Shared Folders\SymStore*http://msdl.microsoft.com/download/symbols
Executable search path is:
ModLoad: 00400000 00455000 C:\Documents and Settings\AdminUser\Desktop\hellos.exe
ModLoad: 7c900000 7c9b0000 C:\WINDOWS\system32\ntdll.dll
ModLoad: 7c800000 7c8f5000 C:\WINDOWS\system32\kernel32.dll
ModLoad: 77e70000 77f01000 C:\WINDOWS\system32\RPCRT4.dll
ModLoad: 77dd0000 77e6b000 C:\WINDOWS\system32\ADVAPI32.dll
(314.674): Break instruction exception - code 80000003 (first chance)
eax=7ffde000 ebx=00000001 ecx=00000002 edx=00000003 esi=00000004 edi=00000005
eip=7c901230 esp=0036ffcc ebp=0036fff4 iopl=0 nv up ei pl zr na pe nc
cs=001b ss=0023 ds=0023 es=0023 fs=0038 gs=0000 efl=00000246
ntdll!DbgBreakPoint:
7c901230 cc int 3
0:002> ~
0 Id: 314.534 Suspend: 1 Teb: 7ffdd000 Unfrozen
1 Id: 314.f0 Suspend: 1 Teb: 7ffdc000 Unfrozen
. 2 Id: 314.674 Suspend: 1 Teb: 7ffdb000 Unfrozen
0:002> ~1 s
eax=00350020 ebx=00000000 ecx=00144530 edx=ffffffff esi=00144878 edi=00144a80
eip=7c90eb94 esp=0055fe18 ebp=0055ff80 iopl=0 nv up ei pl zr na pe nc
cs=001b ss=0023 ds=0023 es=0023 fs=003b gs=0000 efl=00000246
ntdll!KiFastSystemCallRet:
7c90eb94 c3 ret
0:001> kb
ChildEBP RetAddr Args to Child
0055fe14 7c90e399 77e765d3 000007c8 0055ff74 ntdll!KiFastSystemCallRet
0055fe18 77e765d3 000007c8 0055ff74 00000000 ntdll!NtReplyWaitReceivePortEx+0xc
0055ff80 77e76c9f 0055ffa8 77e76ac1 00144878 RPCRT4!LRPC_ADDRESS::ReceiveLotsaCalls+0x12a
0055ff88 77e76ac1 00144878 7c90ee18 0012faf8 RPCRT4!RecvLotsaCallsWrapper+0xd
0055ffa8 77e76c87 00144218 0055ffec 7c80b6a3 RPCRT4!BaseCachedThreadRoutine+0x79
0055ffb4 7c80b6a3 00144a80 7c90ee18 0012faf8 RPCRT4!ThreadStartRoutine+0x1a
0055ffec 00000000 77e76c6d 00144a80 00000000 kernel32!BaseThreadStart+0x37
0:001>

Now, let’s add a breakpoint in the server-side implementation of the HelloProc remote call, run the HELLO client and see the context:

0:001> bp hellos!HelloProc
0:001> g
Breakpoint 0 hit
eax=004010f0 ebx=0055fd0c ecx=00000000 edx=00144c00 esi=0055f908 edi=0055f8e4
eip=004010f0 esp=0055f8e4 ebp=0055f8f8 iopl=0 nv up ei pl zr na pe nc
cs=001b ss=0023 ds=0023 es=0023 fs=003b gs=0000 efl=00000246
hellos!HelloProc:
004010f0 55 push ebp
0:001> k
ChildEBP RetAddr
0055f8e0 77e799dc hellos!HelloProc
0055f8f8 77ef321a RPCRT4!Invoke+0x30
0055fcf4 77ef36ee RPCRT4!NdrStubCall2+0x297
0055fd10 77e794a5 RPCRT4!NdrServerCall2+0x19
0055fd44 77e7940a RPCRT4!DispatchToStubInC+0x38
0055fd98 77e79336 RPCRT4!RPC_INTERFACE::DispatchToStubWorker+0x113
0055fdbc 77e7be3c RPCRT4!RPC_INTERFACE::DispatchToStub+0x84
0055fdf8 77e7bc99 RPCRT4!LRPC_SCALL::DealWithRequestMessage+0x2db
0055fe1c 77e7bbdd RPCRT4!LRPC_ADDRESS::DealWithLRPCRequest+0x16d
0055ff80 77e76c9f RPCRT4!LRPC_ADDRESS::ReceiveLotsaCalls+0x310
0055ff88 77e76ac1 RPCRT4!RecvLotsaCallsWrapper+0xd
0055ffa8 77e76c87 RPCRT4!BaseCachedThreadRoutine+0x79
0055ffb4 7c80b6a3 RPCRT4!ThreadStartRoutine+0x1a
0055ffec 00000000 kernel32!BaseThreadStart+0x37
0:001>

As expected, thread 1 is the one servicing the remote procedure call received at the endpoint. So even if we didn’t know the specific function being called on the server side, we could have followed the worker thread’s execution flow into the indirect call in NdrStubCall2 until arriving at the function of interest.

Another RPC behavior we can notice at this point is the spawning of an additional worker thread by the RPC runtime, since the current one is busy servicing the HelloProc call. While HelloProc is broken into, we note the dbgrpc thread list:

C:\Program Files\Debugging Tools for Windows>dbgrpc -t -P 0x314
Searching for thread info ...
PID CELL ID ST TID ENDPOINT LASTTIME
---------------------------------------------
0314 0000.0002 01 000000f0 0000.0001 0045c6f9
0314 0000.0003 03 00000218 0000.0001 0045c6f9

Notice how two threads are now associated with our endpoint. We can examine the new thread in the debugger:

0:001> ~
0 Id: 314.534 Suspend: 1 Teb: 7ffdd000 Unfrozen
. 1 Id: 314.f0 Suspend: 1 Teb: 7ffdc000 Unfrozen
2 Id: 314.218 Suspend: 1 Teb: 7ffdb000 Unfrozen
0:001> ~2 k
ChildEBP RetAddr
0065fe14 7c90e399 ntdll!KiFastSystemCallRet
0065fe18 77e765d3 ntdll!NtReplyWaitReceivePortEx+0xc
0065ff80 77e76c9f RPCRT4!LRPC_ADDRESS::ReceiveLotsaCalls+0x12a
0065ff88 77e76ac1 RPCRT4!RecvLotsaCallsWrapper+0xd
0065ffa8 77e76c87 RPCRT4!BaseCachedThreadRoutine+0x79
0065ffb4 7c80b6a3 RPCRT4!ThreadStartRoutine+0x1a
0065ffec 00000000 kernel32!BaseThreadStart+0x37

The stack trace is consistent with another RPC worker thread on the endpoint. It’s nice of the RPC runtime to provide these thread management services for us.

In a situation where a process has multiple RPC worker threads servicing an endpoint, it can be difficult to figure out which worker thread will pick up the call, unlike in the degenerate case discussed above. In the more complicated cases, we can utilize server call (“SCALL”) information provided by dbgrpc. With the server process at a break and the client process having performed a remote call, we enumerate the server’s calls:

C:\Program Files\Debugging Tools for Windows>dbgrpc -c -P 314
Searching for call info ...
PID CELL ID ST PNO IFSTART THRDCELL CALLFLAG CALLID LASTTIME CONN/CLN
----------------------------------------------------------------------------
0314 0000.0004 02 000 7a98c250 0000.0002 00000009 00000000 0045c6f9 05d8.00d0

This is pretty awesome. The listing shows that the SCALL has cell identifier 0.4. We can get a more verbose view of the same information by querying that cell directly:

C:\Program Files\Debugging Tools for Windows>dbgrpc -l -P 314 -L 0.4
Getting cell info ...
Call
Status: Dispatched
Procedure Number: 0
Interface UUID start (first DWORD only): 7A98C250
Call ID: 0x0 (0)
Servicing thread identifier: 0x0.2
Call Flags: cached, LRPC
Last update time (in seconds since boot):4572.921 (0x11DC.399)
Caller (PID/TID) is: 5d8.d0 (1496.208)
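
The hex columns of the brief listing can be reconciled with the decoded fields of the verbose view by hand. The sketch below uses two hypothetical helpers (they are not part of dbgrpc). It assumes, based only on the values shown in this post, that LASTTIME is a hexadecimal millisecond count since boot and that a local caller in CONN/CLN is encoded as PID.TID in hex:

```python
def decode_lasttime(hex_milliseconds):
    """Convert a LASTTIME column value to seconds since boot.

    Assumption: the column is a hexadecimal millisecond count, which is
    consistent with 0045c6f9 being reported as 4572.921 in the verbose view.
    """
    return int(hex_milliseconds, 16) / 1000.0


def decode_local_caller(conn_cln):
    """Split a CONN/CLN value such as '05d8.00d0' into decimal (PID, TID)."""
    pid_hex, tid_hex = conn_cln.split(".")
    return int(pid_hex, 16), int(tid_hex, 16)


if __name__ == "__main__":
    print(decode_lasttime("0045c6f9"))       # 4572.921
    print(decode_local_caller("05d8.00d0"))  # (1496, 208)
```

If the millisecond assumption holds, the verbose view seems to print the fractional part without zero padding: 009b0f9e decodes to 10162.078 seconds, which a later listing in this post shows as 10162.78.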

While we used endpoint enumeration and thread cell enumeration to find the server side, we can use SCALL enumeration to find our clients. Let’s see what’s going on at process 0x5d8 in thread d0:

C:\Program Files\Debugging Tools for Windows>cdb -p 0x5d8
Microsoft (R) Windows Debugger Version 6.9.0003.113 X86
Copyright (c) Microsoft Corporation. All rights reserved.
*** wait with pending attach
Symbol search path is: SRV*C:\websymbols*\\.host\Shared Folders\SymStore*http://msdl.microsoft.com/download/symbols
Executable search path is:
ModLoad: 00400000 00455000 C:\Documents and Settings\AdminUser\Desktop\helloc.exe
ModLoad: 7c900000 7c9b0000 C:\WINDOWS\system32\ntdll.dll
ModLoad: 7c800000 7c8f5000 C:\WINDOWS\system32\kernel32.dll
ModLoad: 77e70000 77f01000 C:\WINDOWS\system32\RPCRT4.dll
ModLoad: 77dd0000 77e6b000 C:\WINDOWS\system32\ADVAPI32.dll
(5d8.3a0): Break instruction exception - code 80000003 (first chance)
eax=7ffd7000 ebx=00000001 ecx=00000002 edx=00000003 esi=00000004 edi=00000005
eip=7c901230 esp=0035ffcc ebp=0035fff4 iopl=0 nv up ei pl zr na pe nc
cs=001b ss=0023 ds=0023 es=0023 fs=0038 gs=0000 efl=00000246
ntdll!DbgBreakPoint:
7c901230 cc int 3
0:001> ~
0 Id: 5d8.d0 Suspend: 1 Teb: 7ffdf000 Unfrozen
. 1 Id: 5d8.3a0 Suspend: 1 Teb: 7ffde000 Unfrozen
0:001> ~0 s
eax=77ea19bb ebx=00145618 ecx=00144a78 edx=00000000 esi=0012fb68 edi=0012fb3c
eip=7c90eb94 esp=0012fab4 ebp=0012fb00 iopl=0 nv up ei pl nz na pe nc
cs=001b ss=0023 ds=0023 es=0023 fs=003b gs=0000 efl=00000206
ntdll!KiFastSystemCallRet:
7c90eb94 c3 ret
0:000> kb
ChildEBP RetAddr Args to Child
0012fab0 7c90e3ed 77e7ca99 000007c4 00145820 ntdll!KiFastSystemCallRet
0012fab4 77e7ca99 000007c4 00145820 00145820 ntdll!ZwRequestWaitReplyPort+0xc
0012fb00 77e7a326 00145858 0012fb20 77e7a357 RPCRT4!LRPC_CCALL::SendReceive+0x228
0012fb0c 77e7a357 0012fb3c 00444290 0012ff18 RPCRT4!I_RpcSendReceive+0x24
0012fb20 77ef3675 0012fb68 00145871 08efa12c RPCRT4!NdrSendReceive+0x2b
*** WARNING: Unable to verify checksum for C:\Documents and Settings\AdminUser\Desktop\helloc.exe
0012fefc 004011b6 00444290 00444246 0012ff18 RPCRT4!NdrClientCall2+0x222
0012ff10 004010c5 00452010 058dc64f 08efa12c helloc!HelloProc+0x16
0012ff6c 004020d7 00000001 00332fe0 00333050 helloc!main+0xc5
0012ffb8 00401faf 0012fff0 7c816ff7 08efa12c helloc!__tmainCRTStartup+0x117
0012ffc0 7c816ff7 08efa12c 01c8c807 7ffd7000 helloc!mainCRTStartup+0xf
0012fff0 00000000 00401fa0 00000000 78746341 kernel32!BaseProcessStart+0x23
0:000>

We can clearly see the HelloProc client-side stub invoking NdrClientCall2 to perform the remote procedure call to our server process.

Note that the SCALL information also includes the beginning of the RPC interface GUID (the IFSTART column) and the slot in the interface being invoked (the PNO column, i.e. the procedure number; think of the RPC interface as a C++ vtable). This can be useful if we are looking for the server-side implementation of an unknown interface and multiple interfaces are being exported by the server process.
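
To make the IFSTART matching concrete, here is a minimal sketch that filters candidate interface UUIDs by their first DWORD. The helper name and the candidate table are hypothetical. The first UUID is assumed to be the HELLO sample's interface (only its first DWORD, 7A98C250, actually appears in the output above), so substitute the UUIDs from your own IDL files:

```python
import uuid

# Hypothetical candidate table: interface name -> UUID taken from its IDL file.
CANDIDATE_INTERFACES = {
    "hello": "7a98c250-6808-11cf-b73b-00aa00b677a7",      # assumed value
    "unrelated": "12345678-1234-1234-1234-123456789abc",  # made up
}


def interfaces_matching_ifstart(ifstart_hex, candidates=CANDIDATE_INTERFACES):
    """Return the names of candidates whose first DWORD equals IFSTART."""
    wanted = int(ifstart_hex, 16)
    return sorted(
        name
        for name, text in candidates.items()
        # UUID.hex is 32 hex digits; the first 8 are the leading DWORD.
        if int(uuid.UUID(text).hex[:8], 16) == wanted
    )


if __name__ == "__main__":
    print(interfaces_matching_ifstart("7a98c250"))  # ['hello']
```

Once the interface is identified, the PNO value is the index of the procedure in that interface's IDL declaration order.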

You can figure out more techniques for using dbgrpc and the Windbg RPC debugging extension by going over Windbg’s RPC debugging documentation. I felt the need for the above primer since the documentation is not exactly organized in tutorial form and can be daunting for the uninitiated.

There is another RPC debugging trick up our sleeve. I shall make an exception to my usual habit and discuss the “other” debugger, Visual Studio’s. The Visual Studio Debugger has an extremely powerful feature, unfortunately missing from Windbg, for RPC and COM debugging. Take a look at the documentation for the Native Debugging options dialog for where to turn it on. It is available as far back as Visual C++ 6.0, though you probably want to use a modern version of the Visual C++ debugger that would be able to use modern PDB symbol files (VC++ 6.0 chokes on XP SP2’s newer PDBs, etc.).

With this debugger feature enabled, you just perform a usual Step Into on the client side call during the debugging session, and instead of being led into the low-level marshaling code generated by MIDL for the interface in question, another session of the debugger is automagically attached to the server-side process and the server-side thread is broken into at the call site of the server-side function implementation (O… M… G…). Pretty neat, don’t you think? COM folks, take notice: this stuff even works with full-fledged COM objects.

Unfortunately, Microsoft had to blow it by severely crippling this amazing debugger feature in Windows Vista. As if more excuses to dislike it were required, the debugger will no longer automatically locate the server process and attach to it on that OS. You’ll have to preattach the debugger to the server process by hand and only then will the server call be broken into when appropriate. On Vista, you can use the dbgrpc techniques discussed above to figure out which server process you should attach the Visual C++ debugger to. I also noticed the lack of the wonderful auto-attach behavior in one of my debugging sessions on an XP x64 system, although this is not mentioned in the Visual Studio documentation. What a waste!

Now on to RPC debugging across machine boundaries. Obviously, the RPC runtime will not provide us with process and thread identifiers if the call has crossed a machine boundary. For exploring this scenario, we shall modify the HELLO sample to use the named pipes transport (ncacn_np) to the remote HELLO server.

With CCALL information enabled (i.e., “Full” rather than just “Server” state information) we can see where outgoing RPC calls are headed. Unfortunately, the window for observing them is narrow. On one hand, as soon as the server side responds, the call is completed and the CCALL entry is gone. On the other hand, if we set up a breakpoint on the client-side stub (e.g., helloc!HelloProc), the RPC runtime doesn’t yet know that a remote call is about to be made.

If we know which server the outgoing call is headed to, we can break the server-side and thus make the client-side call block while waiting for the server to respond. In this state, we can examine CCALL information. First let’s set up the break on the server:

C:\Program Files\Debugging Tools for Windows>cdb E:\Projects\hello\hellos.exe
Microsoft (R) Windows Debugger Version 6.9.0003.113 X86
Copyright (c) Microsoft Corporation. All rights reserved.
CommandLine: E:\Projects\hello\hellos.exe
Symbol search path is: C:\WINDOWS\Symbols;SRV*E:\SymStore*http://referencesource.microsoft.com/symbols;SRV*E:\SymStore*http://msdl.microsoft.com/download/symbols
Executable search path is:
ModLoad: 00400000 00455000 hellos.exe
ModLoad: 7c900000 7c9af000 ntdll.dll
ModLoad: 7c800000 7c8f6000 C:\WINDOWS\system32\kernel32.dll
ModLoad: 77e70000 77f02000 C:\WINDOWS\system32\RPCRT4.dll
ModLoad: 77dd0000 77e6b000 C:\WINDOWS\system32\ADVAPI32.dll
ModLoad: 77fe0000 77ff1000 C:\WINDOWS\system32\Secur32.dll
(1320.1304): Break instruction exception - code 80000003 (first chance)
eax=00241eb4 ebx=7ffdf000 ecx=00000007 edx=00000080 esi=00241f48 edi=00241eb4
eip=7c90120e esp=0012fb20 ebp=0012fc94 iopl=0 nv up ei pl nz na po nc
cs=001b ss=0023 ds=0023 es=0023 fs=003b gs=0000 efl=00000202
ntdll!DbgBreakPoint:
7c90120e cc int 3
0:000> bp hellos!HelloProc
*** WARNING: Unable to verify checksum for hellos.exe
0:000> g

Then let’s have the call block on the client side by simply running the client. Now let’s examine the client call list:

C:\Program Files\Debugging Tools for Windows>dbgrpc -a
Searching for call info ...
PID CELL ID PNO IFSTART TIDNUMBER CALLID LASTTIME PS CLTNUMBER ENDPOINT
------------------------------------------------------------------------------
0428 0000.003f 0009 4b112204 0000.0000 ffffffff 00019238 09 0000.003d LRPC000004e4
0710 0000.0001 0000 7a98c250 0000.0000 00000001 00843c7a 0f 0000.0002 \pipe\hello
C:\Program Files\Debugging Tools for Windows>dbgrpc -l -P 710 -L 0.1
Getting cell info ...
Client call info
Procedure number: 0
Interface UUID start (first DWORD only): 7A98C250
Call ID: 0x1 (1)
Calling thread identifier: 0x0.0
Call target identifier: 0x0.2
Call target endpoint: \pipe\hello
C:\Program Files\Debugging Tools for Windows>dbgrpc -l -P 710 -L 0.2
Getting cell info ...
Call target info
Protocol Sequence: NMP
Last update time (in seconds since boot):8666.234 (0x21DA.EA)
Target server is: darkstar
C:\Program Files\Debugging Tools for Windows>

Notice how the CCALL information cell is associated with a target information cell containing the name of the remote host servicing the call. If we were unsure which remote calls were being made, we could extract the actual interface calls from the CCALL information entry (alternatively, a network protocol analyzer understanding MSRPC, such as Wireshark or Microsoft Network Monitor, could be used).

Now let’s see how a call from a remote client appears on the server end. We’ll wait for our breakpoint on the server-side stub to fire. At this point we’d have a SCALL entry to consider:

0:000> g
Breakpoint 0 hit
eax=004010f0 ebx=0055fd54 ecx=00000000 edx=00145700 esi=0055f950 edi=0055f92c
eip=004010f0 esp=0055f92c ebp=0055f940 iopl=0 nv up ei pl zr na pe nc
cs=001b ss=0023 ds=0023 es=0023 fs=003b gs=0000 efl=00000246
hellos!HelloProc:
004010f0 55 push ebp
0:001> |
. 0 id: 6dc create name: hellos.exe
C:\Program Files\Debugging Tools for Windows>dbgrpc -c -P 6dc
Searching for call info ...
PID CELL ID ST PNO IFSTART THRDCELL CALLFLAG CALLID LASTTIME CONN/CLN
----------------------------------------------------------------------------
06dc 0000.0004 02 000 7a98c250 0000.0002 00000001 00000001 009b0f9e 0000.0003

Notice how instead of the traditional process and thread identifiers, we have what appears to be a cell ID as the caller. Let’s see what information cells 3 and 4 contain:

C:\Program Files\Debugging Tools for Windows>dbgrpc -l -P 6dc -L 0.4
Getting cell info ...
Call
Status: Dispatched
Procedure Number: 0
Interface UUID start (first DWORD only): 7A98C250
Call ID: 0x1 (1)
Servicing thread identifier: 0x0.2
Call Flags: cached
Last update time (in seconds since boot):10162.78 (0x27B2.4E)
Owning connection identifier: 0x0.3
C:\Program Files\Debugging Tools for Windows>dbgrpc -l -P 6dc -L 0.3
Getting cell info ...
Connection
Connection flags: Exclusive
Authentication Level: Default
Authentication Service: None
Last Transmit Fragment Size: 49 (0x1002050)
Endpoint for the connection: 0x0.1
Last send time (in seconds since boot):10162.78 (0x27B2.4E)
Last receive time (in seconds since boot):10162.78 (0x27B2.4E)
Getting endpoint info ...
Process object for caller is 0xA14

Notice that the connection cell contains the remote PID of the caller, 0xA14.

0:001> |
. 0 id: a14 attach name: E:\Projects\hello\helloc.exe
0:001>

Unfortunately, the thread identifier is missing so you’ll have to use CCALL information on the client for that. Even more tragically, dbgrpc fails to report the name of the remote caller’s machine! You know it’s PID 0xA14, you just don’t know on what machine… You’ll have to make an educated guess, perhaps with the assistance of a network protocol analyzer.

Occasionally we won’t be in a situation that allows for breaking the server-side to facilitate blocking the client-side call for CCALL information examination. In such cases, we’ll want to break the client-side right after debug information for the call has been registered, but before the call has been sent to the server for completion. The various RPC transports utilize the CCALL::SetDebugClientCallInformation function for this purpose. Let’s see what happens when we break on it, let it do the registration and examine the CCALL table:

0:000> bp rpcrt4!CCALL::SetDebugClientCallInformation
0:000> g
Breakpoint 0 hit
eax=0012faa8 ebx=00000000 ecx=001450a8 edx=00000000 esi=001450a8 edi=0012fb3c
eip=77ec44de esp=0012fa68 ebp=0012fab8 iopl=0 nv up ei pl nz na po nc
cs=001b ss=0023 ds=0023 es=0023 fs=003b gs=0000 efl=00000202
RPCRT4!CCALL::SetDebugClientCallInformation:
77ec44de 8bff mov edi,edi
0:000> k
ChildEBP RetAddr
0012fa64 77ea7b73 RPCRT4!CCALL::SetDebugClientCallInformation
0012fab8 77e808d0 RPCRT4!OSF_CCALL::FastSendReceive+0x72
0012fad4 77e80e1f RPCRT4!OSF_CCALL::SendReceiveHelper+0x58
0012fb00 77e7a326 RPCRT4!OSF_CCALL::SendReceive+0x41
0012fb0c 77e7a357 RPCRT4!I_RpcSendReceive+0x24
0012fb20 77ef3675 RPCRT4!NdrSendReceive+0x2b
*** WARNING: Unable to verify checksum for helloc.exe
0012fefc 004011b6 RPCRT4!NdrClientCall2+0x222
0012ff10 004010c5 helloc!HelloProc+0x16
0012ff6c 004020d7 helloc!main+0xc5
0012ffb8 00401faf helloc!__tmainCRTStartup+0x117
0012ffc0 7c816ff7 helloc!mainCRTStartup+0xf
0012fff0 00000000 kernel32!BaseProcessStart+0x23
0:000> gu
eax=00000000 ebx=00000000 ecx=00000002 edx=0000b10e esi=001450a8 edi=0012fb3c
eip=77ea7b73 esp=0012fa88 ebp=0012fab8 iopl=0 nv up ei pl zr na pe nc
cs=001b ss=0023 ds=0023 es=0023 fs=003b gs=0000 efl=00000246
RPCRT4!OSF_CCALL::FastSendReceive+0x72:
77ea7b73 3bc3 cmp eax,ebx
0:000>
C:\Program Files\Debugging Tools for Windows>dbgrpc -a
Searching for call info ...
PID CELL ID PNO IFSTART TIDNUMBER CALLID LASTTIME PS CLTNUMBER ENDPOINT
------------------------------------------------------------------------------
0428 0000.003f 0009 4b112204 0000.0000 ffffffff 00019238 09 0000.003d LRPC000004e4
0504 0000.0001 0000 7a98c250 0000.0000 001440c8 00b10eb8 00 0000.0002

Oops… notice how the name of the endpoint is missing from the CCALL entry at this point! With some disassembly (left as an exercise for the reader) it becomes clear that the caller copies the endpoint name into the debug information buffer right after setting up the entry:

0:000> bp rpcrt4!CCALL::SetDebugClientCallInformation
0:000> g
Breakpoint 0 hit
eax=0012faa8 ebx=00000000 ecx=001450a8 edx=00000000 esi=001450a8 edi=0012fb3c
eip=77ec44de esp=0012fa68 ebp=0012fab8 iopl=0 nv up ei pl nz na po nc
cs=001b ss=0023 ds=0023 es=0023 fs=003b gs=0000 efl=00000202
RPCRT4!CCALL::SetDebugClientCallInformation:
77ec44de 8bff mov edi,edi
0:000> gu
eax=00000000 ebx=00000000 ecx=00000002 edx=0000bb8b esi=001450a8 edi=0012fb3c
eip=77ea7b73 esp=0012fa88 ebp=0012fab8 iopl=0 nv up ei pl zr na pe nc
cs=001b ss=0023 ds=0023 es=0023 fs=003b gs=0000 efl=00000246
RPCRT4!OSF_CCALL::FastSendReceive+0x72:
77ea7b73 3bc3 cmp eax,ebx
0:000> bp rpcrt4!strncpy
0:000> g
Breakpoint 1 hit
eax=00350034 ebx=00000001 ecx=0012fa79 edx=00000000 esi=001450a8 edi=0000000c
eip=77e952a0 esp=0012fa6c ebp=0012fab8 iopl=0 nv up ei pl nz na po nc
cs=001b ss=0023 ds=0023 es=0023 fs=003b gs=0000 efl=00000202
RPCRT4!strncpy:
77e952a0 ff252813e777 jmp dword ptr [RPCRT4!_imp__strncpy (77e71328)] ds:0023:77e71328={ntdll!strncpy (7c902c80)}
0:000> t
eax=00350034 ebx=00000001 ecx=0012fa79 edx=00000000 esi=001450a8 edi=0000000c
eip=7c902c80 esp=0012fa6c ebp=0012fab8 iopl=0 nv up ei pl nz na po nc
cs=001b ss=0023 ds=0023 es=0023 fs=003b gs=0000 efl=00000202
ntdll!strncpy:
7c902c80 8b4c240c mov ecx,dword ptr [esp+0Ch] ss:0023:0012fa78=0000000c
0:000> gu
eax=00350034 ebx=00000001 ecx=00000000 edx=006f6c6c esi=001450a8 edi=0000000c
eip=77ea7be8 esp=0012fa70 ebp=0012fab8 iopl=0 nv up ei pl zr na pe nc
cs=001b ss=0023 ds=0023 es=0023 fs=003b gs=0000 efl=00000246
RPCRT4!OSF_CCALL::FastSendReceive+0xe7:
77ea7be8 83c40c add esp,0Ch
0:000>
C:\Program Files\Debugging Tools for Windows>dbgrpc -a
Searching for call info ...
PID CELL ID PNO IFSTART TIDNUMBER CALLID LASTTIME PS CLTNUMBER ENDPOINT
------------------------------------------------------------------------------
0428 0000.003f 0009 4b112204 0000.0000 ffffffff 00019238 09 0000.003d LRPC000004e4
07b4 0000.0001 0000 7a98c250 0000.0000 00000001 00bb8b2b 00 0000.0002 \pipe\hello

Ahh, that’s better. But if we examine the server name in the CCALL cell, we see it hasn’t yet been initialized. We need another round of strncpy for that. If we dig further into the transport code, we figure out that it would be better to break right before the function call dispatching the data to the server side. For instance, in the case of the named pipe transport, this would be the call to RPCRT4!OSF_CCALL::SendNextFragment from RPCRT4!OSF_CCALL::FastSendReceive. If we are using the LPC transport instead, other transport functions will be involved. To summarize: breaking into the call after CCALL information has been completely registered but before it has been sent to the server is not so easy and is highly transport-dependent. However, it is indeed quite possible if your scenario requires it.

And so the RPC debugging primer comes to a conclusion. It is a messy ordeal, yet so much cooler than stepping through yet another SOAP web service in Visual Studio, isn’t it? :-)

Bridging the gap between native functions and Active Scripting with a COM-based FFI wrapper

A few weeks ago I was following the excitement as WebKit, Safari’s browser engine, incrementally passed more and more of the Acid 3 standards test. Wondering if the Gecko (Mozilla Firefox’s rendering engine) folks were also busy with that, I followed both the Planet WebKit and Planet Mozilla feeds for a few weeks.

Sometime in April I stumbled upon this post in Planet Mozilla. It discussed recent improvements to JSctypes. It was the first time I had heard of this project. JSctypes is an XPCOM component for Mozilla that allows calling native (or “foreign”) functions from privileged JavaScript code. Both the interface and name are inspired by the Python ctypes module, included with the standard distribution since version 2.5.

If you haven’t heard of ctypes, take a minute to get acquainted. It’s a great library that allows you to call native C functions dynamically from Python code. Its interface really feels at home in a dynamic language. Most of the time, you can just call functions without specifying the number and types of the arguments they receive. DLL modules can be accessed as attributes of the ctypes loader object matching their calling convention (e.g., ctypes.windll.kernel32 for stdcall or ctypes.cdll.msvcrt for cdecl) and script functions can be passed as callbacks to the native APIs being invoked.
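
For readers who have never seen it, here is a minimal sketch of that calling style. The load_c_runtime helper is my own illustrative wrapper (not a ctypes API) so that the example also runs outside Windows; strlen is called without declaring its prototype first:

```python
import ctypes
import ctypes.util
import os


def load_c_runtime():
    """Load the platform's C runtime as a cdecl library (illustrative helper)."""
    if os.name == "nt":
        return ctypes.cdll.msvcrt
    # find_library may return None; CDLL(None) then exposes the symbols
    # already loaded into the current process, which include the C runtime.
    return ctypes.CDLL(ctypes.util.find_library("c"))


libc = load_c_runtime()

# No prototype was declared: ctypes passes the bytes object as a char*
# and, by default, treats the return value as a C int.
length = libc.strlen(b"hello, ffi")
print(length)  # 10

# On Windows, the stdcall equivalent would look like this (not executed here):
#   ctypes.windll.user32.MessageBoxW(0, "text", "caption", 1)
```

The default int return type is convenient but lossy for functions returning pointers or 64-bit values, which is why ctypes also lets you declare prototypes explicitly.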

JSctypes takes Python’s ctypes concept into Mozilla’s JavaScript implementation. Mozilla has a COM-like architecture at the base of its object model which is called XPCOM. Usually, calling native functionality from JavaScript is achieved by exposing an XPCOM component to script. However, such an approach has clear disadvantages as every conceivable native functionality needs to be wrapped on a case by case basis by a compiled XPCOM component. Now, with JSctypes, Mozilla’s JavaScript code, when privileged (obviously a native call interface is not appropriate in the context of untrusted web content), can call most native functions with relative ease and without a compiled component, aside from JSctypes itself.

A native function interface for a dynamic language needs to deal with the relatively complex task of setting up the call stack frame for an arbitrary native API, according to argument counts, types and alignment requirements deduced dynamically at script execution time. As the interface layer seeks to support a broader and broader variety of argument types (basic data types, then structures, arrays, then callback functions, etc.) the task becomes increasingly complicated and difficult.

I reviewed both JSctypes and Python’s ctypes source code in their respective source code repositories and learned that they both share a common implementation of the lowest component in such a native interface layer. It is called libffi, the Foreign Function Interface library, and seems to originate from the gcc project. Since libffi is designed to be compiled with a UNIX-style toolchain (it has AT&T syntax assembly files, for instance) and Python needs to compile with Visual C++, the author of ctypes, Thomas Heller, ported an old revision of the library to Visual C++.

Usage of libffi is pretty simple. You use the ffi_prep_cif function to initialize an ffi_cif (call interface) structure with the ABI type, return value type, argument count and argument types of the native function to be invoked. Later, and repeatedly as needed, ffi_call is used to call the actual function with a specific set of argument values, passed in as an array, and to retrieve the value returned from the native function.
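
The same describe-once, call-many pattern can be seen from the ctypes side. To be clear, the sketch below is not libffi's C API. It is a rough Python analogue in which a ctypes function prototype plays the role of the prepared call description and each invocation plays the role of ffi_call. The load_c_runtime helper is illustrative only:

```python
import ctypes
import ctypes.util
import os


def load_c_runtime():
    """Load the platform's C runtime as a cdecl library (illustrative helper)."""
    if os.name == "nt":
        return ctypes.cdll.msvcrt
    return ctypes.CDLL(ctypes.util.find_library("c"))


libc = load_c_runtime()

# Describe the prototype once: calling convention (cdecl), return type,
# then argument types. This corresponds to `int abs(int)`.
abs_prototype = ctypes.CFUNCTYPE(ctypes.c_int, ctypes.c_int)

# Bind the description to the export named "abs" in the loaded library.
native_abs = abs_prototype(("abs", libc))

# Call repeatedly with different argument values.
results = [native_abs(value) for value in (-3, 0, 7)]
print(results)  # [3, 0, 7]
```

Separating the description from the call means the comparatively expensive type analysis happens once per function rather than once per invocation.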

I thought JSctypes is really cool and it then occurred to me it should not be prohibitively difficult to implement a similar adaptation layer for Microsoft’s JScript and possibly other Active Scripting languages.

In my mind’s eye, I envisioned an in-process COM server accessible to Active Scripting clients (implements IDispatch and associated with a ProgID) providing a call interface to arbitrary native functions.

I created an ATL COM DLL and gave the coclass the ProgID “FunctionWrapper.FunctionWrapper.1”. I knew you could call JScript functions with fewer or more arguments than their definition expects and figured that pulling off the same in a native method exposed to the script would be ideal. After a short investigation I learned of the IDL vararg attribute, which accomplishes just what I had in mind. At this point, the exposed interface looks like this:

[
object,
uuid(EBA4A11F-969B-4413-9D4E-FB5CB21039FC),
dual,
nonextensible,
helpstring("IFunctionWrapper Interface"),
pointer_default(unique)
]
interface IFunctionWrapper : IDispatch {
[id(1), helpstring("method CallFunction"), vararg] HRESULT CallFunction([in] SAFEARRAY(VARIANT) args, [out, retval] VARIANT* retVal);
};

The CallFunction method of the FunctionWrapper object is callable by JScript clients with arguments of arbitrary count and type of their choosing. As a simplistic start, I had the first argument specify a string identifying the native function, in the Windbg-inspired syntax of “module!export”, e.g. “user32!MessageBoxW”. The rest of the arguments would be passed to the native function.

I proceeded to implement CFunctionWrapper::CallFunction. The steps taken by the method would be:

  1. Ensure at least the first argument (function to invoke) was given.
  2. Ensure the first argument specifies a module and an export, load the module and retrieve the address of the export.
  3. Thunk the VARIANT arguments received by the method to libffi-style argument and types arrays.
  4. Invoke ffi_prep_cif to prepare the call and call the native function with ffi_call.
  5. Thunk the return value of the function into a VARIANT usable by script.
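
The steps above can be sketched compactly in Python, with ctypes standing in for the libffi plumbing. This is an analogue of the flow and not the actual ATL implementation; the function names are hypothetical, and ctypes.CDLL assumes cdecl (on Windows, ctypes.WinDLL would match the stdcall convention used later in this post):

```python
import ctypes


def split_function_spec(spec):
    """Split 'module!export' into its parts, rejecting malformed specs."""
    module, separator, export = spec.partition("!")
    if not separator or not module or not export:
        raise ValueError("expected 'module!export', got %r" % (spec,))
    return module, export


def call_function(*args):
    # Step 1: the first argument (the function to invoke) is mandatory.
    if not args:
        raise TypeError("the first argument must name the function to call")
    # Step 2: parse the spec, load the module and resolve the export.
    module, export = split_function_spec(args[0])
    library = ctypes.CDLL(module)
    function = getattr(library, export)
    # Steps 3 to 5: ctypes converts the remaining arguments, performs the
    # call and converts the return value (a C int unless told otherwise).
    return function(*args[1:])


if __name__ == "__main__":
    print(split_function_spec("user32!MessageBoxW"))  # ('user32', 'MessageBoxW')
```

Validating the spec before touching the loader keeps a typo in the script from turning into a confusing module load failure.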

Much of the work here is concise, but stage 3 consists of relatively mundane boilerplate, translating between two varieties of dynamically typed data: Microsoft’s VARIANT and libffi’s ffi_type. I’ll illustrate with a short snippet:

// Both arrays are dynamically allocated by argument count, once, before the loop.
ffi_type** argumentTypes = ...;
void** ffiArgs = ...;

// arguments[0] names the function, so native arguments start at index 1.
for (ULONG i = 1; i < arguments.GetCount(); i++)
{
    VARIANT& arg = arguments[i];
    switch (V_VT(&arg)) {
    case VT_UI1:
        argumentTypes[i - 1] = &ffi_type_uint8;
        ffiArgs[i - 1] = &(V_UI1(&arg));
        break;
    ...
    case VT_UI4:
        argumentTypes[i - 1] = &ffi_type_uint32;
        ffiArgs[i - 1] = &(V_UI4(&arg));
        break;
    default:
        return E_INVALIDARG; // VARIANT type not supported by the thunk
    }
}

Similar work is needed for other integer and floating-point types, strings and pointers.
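
A table-driven variant of the same thunking idea is sketched below in Python, with ctypes types standing in for ffi_type descriptors. The table contents and the thunk_argument name are hypothetical choices made for illustration, not a description of the actual wrapper:

```python
import ctypes

# Script-side type -> native type. The choices mirror the post's defaults:
# unsigned 32-bit integers and UTF-16 style wide strings.
NATIVE_TYPES = {
    int: ctypes.c_uint32,
    float: ctypes.c_double,
    str: ctypes.c_wchar_p,
}


def thunk_argument(value):
    """Wrap a script value in the native type chosen for its dynamic type."""
    try:
        native_type = NATIVE_TYPES[type(value)]
    except KeyError:
        raise TypeError("unsupported argument type: %s" % type(value).__name__)
    return native_type(value)


if __name__ == "__main__":
    thunked = [thunk_argument(v) for v in (0, "text", "caption", 1)]
    print([type(t).__name__ for t in thunked])
    # ['c_uint', 'c_wchar_p', 'c_wchar_p', 'c_uint'] on typical platforms
```

Adding support for another type then becomes a one-line change to the table instead of another case in a switch statement.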

Initially, I hard-coded a return value type of unsigned 32-bit integer and the stdcall calling convention to avoid providing an interface for selecting those parameters. I registered the DLL and tested the following script with WSH:

var functionWrapper = new ActiveXObject("FunctionWrapper.FunctionWrapper");
var retVal = functionWrapper.CallFunction("user32!MessageBoxW", 0, "text", "caption", 1);
WScript.Echo(retVal);

The 1 passed as the last argument is the value of the MB_OKCANCEL flag for MessageBox. I used the W variety of the API since I implemented hardcoded UTF-16 marshalling for VT_BSTR type variants, which is the form in which strings arrive from JScript.

I was quite content when the test script not only failed to crash the WSH process, but also presented a message box and passed the API’s return value back to JScript.

At this point I considered what it would take to extend this solution beyond the basic value types. Arrays first came to mind. Such support, I imagined, would consist of copying an incoming SAFEARRAY argument into a native array and supplying the native array pointer to the native function. If “out” array argument support is desired, copying back into the SAFEARRAY would be required post-invocation, right after ffi_call.
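
The copy-in, call, copy-out sequence can be illustrated with ctypes arrays. In this sketch a Python list plays the part of the SAFEARRAY and memcpy is merely a stand-in for any native function that fills an output array; the helper names are hypothetical:

```python
import ctypes
import ctypes.util
import os


def load_c_runtime():
    """Load the platform's C runtime as a cdecl library (illustrative helper)."""
    if os.name == "nt":
        return ctypes.cdll.msvcrt
    return ctypes.CDLL(ctypes.util.find_library("c"))


libc = load_c_runtime()
libc.memcpy.argtypes = [ctypes.c_void_p, ctypes.c_void_p, ctypes.c_size_t]
libc.memcpy.restype = ctypes.c_void_p


def round_trip_through_native(values):
    """Copy a list into a native array, call native code, copy the result back."""
    int_array_type = ctypes.c_int32 * len(values)
    native_in = int_array_type(*values)  # copy in
    native_out = int_array_type()        # zero-initialised "out" buffer
    libc.memcpy(native_out, native_in, ctypes.sizeof(native_in))
    return list(native_out)              # copy back after the call


if __name__ == "__main__":
    print(round_trip_through_native([1, 2, 3]))  # [1, 2, 3]
```

The copy back must happen only after the native call returns, which matches the post-invocation step described above.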

Next in line were structs. These would be less straightforward. The problem with filling a JScript “object” (read: hash table) with a struct’s fields is that the enumeration order of its properties is not guaranteed to match the order of the fields in the struct’s data layout. Using a JScript array instead would provide ordering, although it wouldn’t be very nice looking.
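
For comparison, ctypes sidesteps the ordering problem by describing a struct as an ordered list of (name, type) pairs instead of a hash. The Sample struct below is made up for illustration, and the offsets in the comments assume the natural alignment rules of mainstream desktop ABIs:

```python
import ctypes


class Sample(ctypes.Structure):
    # The list order defines the memory layout; the names are only labels.
    _fields_ = [
        ("flag", ctypes.c_uint8),
        ("count", ctypes.c_uint32),
        ("kind", ctypes.c_uint16),
    ]


offsets = [(name, getattr(Sample, name).offset) for name, _ in Sample._fields_]
print(offsets)                # [('flag', 0), ('count', 4), ('kind', 8)]
print(ctypes.sizeof(Sample))  # 12

# Positional construction follows the declared field order.
sample = Sample(1, 2, 3)
print(sample.flag, sample.count, sample.kind)  # 1 2 3
```

A script-facing wrapper could accept the same shape, an array of name and type pairs, and still expose the fields by name afterwards.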

The final type of argument I considered, and arguably the most important, is callbacks. Many APIs take function pointers as arguments. Consider EnumWindows which invokes EnumWindowsProc on every window found. A native call interface should provide a capability to implement the callback as a JScript function and pass it as seamlessly as possible during the native invocation.

Fortunately, libffi provides built-in support for callbacks, calling them “closures” in its terminology. An ffi_cif structure is initialized to describe the prototype of the callback function, in native eyes, as if it were going to be called with ffi_call. ffi_prep_closure takes such a prototype description, a function pointer and a closure “trampoline buffer”, as I call it. The trampoline buffer, expected to be allocated in writable, executable memory (native code would later jump into its address) takes care of calling the provided function pointer. The twist is that the function pointer, instead of being called with a dynamic prototype, always receives its arguments in the form of libffi argument arrays.
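
To see what closures buy a scripting language, here is the ctypes view of the same mechanism: a Python function is wrapped in a callback type and handed to the C runtime's qsort as an ordinary function pointer. This is an analogue built on ctypes and does not show the ffi_prep_closure API itself; the helper names are illustrative:

```python
import ctypes
import ctypes.util
import os


def load_c_runtime():
    """Load the platform's C runtime as a cdecl library (illustrative helper)."""
    if os.name == "nt":
        return ctypes.cdll.msvcrt
    return ctypes.CDLL(ctypes.util.find_library("c"))


libc = load_c_runtime()

# Prototype of the callback as native code sees it:
#   int compare(const int *left, const int *right)
COMPARE = ctypes.CFUNCTYPE(
    ctypes.c_int, ctypes.POINTER(ctypes.c_int), ctypes.POINTER(ctypes.c_int)
)
libc.qsort.argtypes = [ctypes.c_void_p, ctypes.c_size_t, ctypes.c_size_t, COMPARE]
libc.qsort.restype = None


def compare_ints(left, right):
    """Script-side callback; receives pointers and returns <0, 0 or >0."""
    return (left[0] > right[0]) - (left[0] < right[0])


def native_sort(values):
    array = (ctypes.c_int * len(values))(*values)
    callback = COMPARE(compare_ints)  # must stay referenced during the call
    libc.qsort(array, len(array), ctypes.sizeof(ctypes.c_int), callback)
    return list(array)


if __name__ == "__main__":
    print(native_sort([5, 1, 4, 2]))  # [1, 2, 4, 5]
```

The wrapper object owns the generated trampoline, so its lifetime has to cover every moment at which native code might still call it.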

The native callback function wrapped by the closure trampoline buffer would presumably fill a SAFEARRAY of variants with the arguments and invoke a script function. A wrapper callback coclass could be provided to the script and allow for more elaborate stuff like out parameters and the like. An instance of the callback object would wrap a JScript function object and invoke its apply method using the IDispatch interface as calls come in through the closure. It is unclear what a generic solution that doesn’t rely on functions being objects and having the apply method would look like, so at this point this wrapper callback concept is only suitable for JScript.

Right now I only got as far as implementing the basic value types, and even that with code of such poor quality that I am not uploading it for the time being. The devil is in the details, and supporting descriptions of complex argument types would require quite a bit of work. Hopefully someday I, or perhaps an enthusiastic reader, will get around to coding and publishing a full-fledged implementation of a native call interface. Embedding such an interface in an Active Scripting host in scenarios where the hosted scripts enjoy full trust could provide endless extensibility possibilities for the script author.

Hey, cooler than P/Invoke…