A plug for a friend’s new blog

A friend of mine that goes by the ominous handle “haxorcize” has apparently been inspired by this blog’s debugging sheningans and started a blog of his own. He even shelled out a few bucks to get his own domain name. Take a look at his intriguing first post analyzing the mechanics of NetLimiter’s per-process traffic throttling using the ever trusty Windbg over here.

Advertisements

Windbg 6.9.3.113 released

A new version of the Debugging Tools for Windows appeared, quietly as usual, on Microsoft’s web site a few days ago.

Unfortunately the debugging symbols package for Windows XP SP3 is still MIA, presumably being delayed along with widespread SP3 availability on the Download Center and Windows Update. My local symbol store has grown quite obese with all the SP2 patches over the years so I’m looking forward to clean things up once that’s available.

Nothing too exciting in the RELNOTES.TXT for this release. Integrated managed debugging remains dysfunctional so the trusty 6.7.5.0 remains in place for that. Can’t even get SOS to break on an application’s Main method. The most exciting feature is enhancements to the “dt” command.

Yawn.

Deploying the Visual C++ libraries with an NSIS installer

Beginning with Visual C++ 2005 and continuing into Visual C++ 2008 and the foreseeable future, Microsoft’s various runtime libraries (CRT, ATL, MFC, etc.) are no longer installed into the system32 directory on Windows XP and later, but are rather “side-by-side assemblies” that need to be installed into the side-by-side store, “WinSxS”, in order to be available to all applications.

I’ve discussed the SxS store and the API Microsoft has documented for managing it in a previous post. Nevertheless, at the request of the NSIS maintainer, kichik, I’ll provide some guidance on the issue of runtime deployment and concrete examples to authors of NSIS-based installations. Do keep in mind that I am not adept at authoring NSIS installers and questions beyond the realm of the matter at hand are best targeted at the NSIS forum.

Unlike in the Linux world, where the C runtime library is considered an operating system component and versions of it are never installed by applications (at worst, some proprietary application is linked against an antique version of libc and requires the system administrator to install a compatibility package provided by the distribution), the CRT situation on Windows is more complicated. In the days of yore, Windows NT provided the now long defunct CRTDLL.DLL. Later, the newer variant MSVCRT.DLL shipped with Visual C++, going into the 6.0 release. However, in addition to serving as the runtime of a specific Visual C++ version, MSVCRT.DLL doubles as the “OS CRT”, the version of the C++ runtime deployed with the OS as far back as NT 4.0 and going into Windows Vista. Components included with the operating system itself, such as Notepad and Calculator, are linked against this CRT dynamically. Do not let the identical moniker fool you, the CRTs included with the various NT releases diverge significantly, sporting, for example, a brand new exception handling runtime in Windows Vista, aligned with newer Visual C++ compilers.

The existence of several MSVCRT.DLL variants and the associated versioning issues are probably what led Microsoft to adopt a policy of strongly versioned CRTs beginning with the Visual C++ .NET (2002) release. MSVCR70.DLL was the runtime required by the output of that product, and later versions would require deploying MSVCR71.DLL, MSVCR80.DLL and most recently MSVCR90.DLL. In addition to the CRT itself, there are also the various peripheral libraries that some applications may depend on, such as ATL and MFC.

I’ve discussed in the past an approach utilizing the Windows Driver Kit build environment that allows combining a modern C++ compiler with targeting the Visual C++ 6.0 / OS CRT, MSVCRT.DLL. It is for the brave who don’t mind getting their hands dirty and whose desire to target the broadly deployed runtime exceeds the fear of the plethora of potential version compatibility issues such an application configuration can cause.

For the more conservative lot, the question remains, how do I get the new C++ runtimes to my end-user’s machine? The first approach is that of utilizing static linking. It should be avoided at nearly all cost as it results in both obese executables that are unable to share the runtime’s memory pages with other running processes and is completely unservicable by Microsoft when a security update or another bug fix to all users of the runtime libraries needs to be broadly deployed.

We therefore turn our attention to approaches based on dynamic linking. First of all, the reader should review the official guidance provided by the Visual C++ team on the matter, although he or she may not like what they read. To summarize, Microsoft officially supports the following deployment methods:

  • Use an MSI-based (Windows Installer) installation procedure and utilize their MSM merge modules to include whatever runtime components you require with your application. The MSMs are black box magic that will get those runtime libraries into the “winsxs” store without asking too many questions. If you don’t like those massively complicated MSI installers and the WiX XML schemas make your head spin, that’s too bad.
  • Use the obese VCRedist.exe for the target architecture, without the benefit of picking and choosing only those runtime components that are of interest for your specific application.
  • Deploy the runtime libraries as files in your application’s directory, or “private assemblies” in SxS nomenclature, wasting the end-user’s hard disk space with multiple copies. This is not as bad as it seems, since at least SxS redirection policies can make an updated, security patched version from the “winsxs” store be loaded in place of out of date version deployed privately with the application, unlike with classic non-SxS local DLLs or with static linking.

As the popularity of NSIS as an installation apparatus shows, not everyone are willing to be strong-armed into an MSI-based installation just yet. So how do CRT deploying installers address this acute issue? I was disappointed, but not surprised, to see that VLC, DivX and various other applications with NSIS-based installers, opt for the “private assembly” approach, simplifying life for the installation author but needlessly wasting end-user disk real estate.

The now documented SxS API provides an alternative approach, presumably supported by Microsoft for deploying SxS assemblies in general (such as your own) but not specifically by the Visual C++ folks for theirs. The motivation for this lack of support is unclear, since the end result is as servicable by them as is using Windows Installer merge modules. Nevertheless, it is something that those who follow this path should be aware of.

OK, so let’s get on with it. Unlike with system32, we can’t just waltz into winsxs and drop our assembly’s files there. The directory structure is complicated, differs between XP and Vista, and in fact the ACL on the directory in Vista won’t allow anyone but TrustedInstaller (i.e., MSI) to touch it. Therefore we are required to perform the installation through the SxS API, which provides a COM-based interface for manipulating the store.

For illustration purposes, I shall use the Visual C++ 2005 (8.0) Debug CRT. Note that this is not the CRT you want to deploy to your end users, and in any case is explicitly NOT redistributable by Microsoft’s license terms. I use it for illustrative convenience since my XP virtual machine doesn’t have this assembly. We’ll use an NSIS installer script to drive the wonderful though peculiar System plug-in and get it to invoke the SxS API. Note that elaborate error handling is omitted for brevity. So here we go:


Name "NSIS SxS Test"
OutFile "nsissxs.exe"
SetPluginUnload alwaysoff
ShowInstDetails show
XPStyle on
SetCompressor /SOLID lzma
InstallDir $PROGRAMFILES\NSISSxS


!define FUSION_REFCOUNT_UNINSTALL_SUBKEY_GUID {8cedc215-ac4b-488b-93c0-a50a49cb2fb8}

Section "Uninstall"
DeleteRegKey "HKLM" "Software\Microsoft\Windows\CurrentVersion\Uninstall\nsissxs"
Delete $INSTDIR\uninst.exe
Delete $INSTDIR\dummy.txt
RMDir $INSTDIR
DetailPrint "Removing DebugCRT assembly..."
System::Call "sxs::CreateAssemblyCache(*i .r0, i 0) i.r1"
StrCmp $1 0 0 fail
System::Call "*(i 32, i 0, i 2364391957, i 1217113163, i 178634899, i 3090139977, w 'nsissxs', w '') i.s"
Pop $2
System::Call "$0->3(i 0, w 'Microsoft.VC80.DebugCRT,version=$\"8.0.50727.762$\",type=$\"win32$\",processorArchitecture=$\"x86$\",publicKeyToken=$\"1fc8b3b9a1e18e3b$\"', i r2, *i . r3) i.r1"
StrCmp $1 0 0 fail2
DetailPrint "Disposition returned is $3"
System::Call "$0->2()"
Goto end
fail:
DetailPrint "CreateAssemblyCache failed."
DetailPrint $1
Goto end
fail2:
DetailPrint "UninstallAssembly failed."
DetailPrint $1
Goto end
end:
SectionEnd

Section
SetOutPath $INSTDIR
File "dummy.txt"
WriteUninstaller $INSTDIR\uninst.exe
WriteRegStr "HKLM" "Software\Microsoft\Windows\CurrentVersion\Uninstall\nsissxs" "DisplayName" "NSIS SxS Test"
WriteRegStr "HKLM" "Software\Microsoft\Windows\CurrentVersion\Uninstall\nsissxs" "UninstallString" "$INSTDIR\uninst.exe"
InitPluginsDir
SetOutPath $PLUGINSDIR
File "msvcm80d.dll"
File "msvcp80d.dll"
File "msvcr80d.dll"
File "x86_Microsoft.VC80.DebugCRT_1fc8b3b9a1e18e3b_8.0.50727.762_x-ww_5490cd9f.cat"
File "x86_Microsoft.VC80.DebugCRT_1fc8b3b9a1e18e3b_8.0.50727.762_x-ww_5490cd9f.manifest"

DetailPrint "Installing DebugCRT assembly..."
System::Call "sxs::CreateAssemblyCache(*i .r0, i 0) i.r1"
StrCmp $1 0 0 fail
# Fill a FUSION_INSTALL_REFERENCE.
# fir.cbSize = sizeof(FUSION_INSTALL_REFERENCE) == 32
# fir.dwFlags = 0
# fir.guidScheme = FUSION_REFCOUNT_UNINSTALL_SUBKEY_GUID
# fir.szIdentifier = "nsissxs"
# fir.szNonCanonicalData = 0
System::Call "*(i 32, i 0, i 2364391957, i 1217113163, i 178634899, i 3090139977, w 'nsissxs', w '') i.s"
Pop $2
# IAssemblyCache::InstallAssembly(0, manifestPath, fir)
System::Call "$0->7(i 0, w '$PLUGINSDIR\x86_Microsoft.VC80.DebugCRT_1fc8b3b9a1e18e3b_8.0.50727.762_x-ww_5490cd9f.manifest', i r2) i.r1"
System::Free $2
StrCmp $1 0 0 fail2
System::Call "$0->2()"
Goto end
fail:
DetailPrint "CreateAssemblyCache failed."
DetailPrint $1
Goto end
fail2:
DetailPrint "InstallAssembly failed."
DetailPrint $1
Goto end
end:
SectionEnd

If you are not familiar with NSIS script syntax, now would be a good time to get acquainted. Let us review the contents of the 2nd section, which is the install section. The dummy file is a placeholder for the actual files your installer wants to deploy. Next, we set up an Uninstall entry in the registry as one usually would. Now on to the interesting part.

In order to deploy a SxS assembly, we must place its DLLs together in a temporary directory created by the installer. Note that if an assembly contains several DLLs, we cannot pick and choose only those that our application links with. The assembly is deployed, versioned and bound as a whole. We can figure out which files are part of the assembly in question by reviewing the assembly manifest, which we’ll find installed into the Manifests subdirectory of the WinSxS store on a Windows XP system. If we review the Debug CRT’s manifest, we can see <file> nodes under the <assembly> node, each referencing one of the files that must be deployed with the assembly. You can find the actual assembly files under the subdirectory with the assembly’s strong name in the WinSxS store.

In addition to the DLL files themselves, the assembly manifest and the assembly signing catalog are an integral part of the assembly. The catalog ensures the integrity of the assembly and is a welcome feature over traditional DLL deployment.

With the DLLs, assembly manifest and catalog in place, we are ready to invoke the SxS API for assembly installation. First, we call CreateAssemblyCache to retrieve the IAssemblyCache interface for managing the SxS store. Note that in the context of an NSIS installer, COM has already been initialized (for STA use) at this point, but if you are making a custom installer in another environment you may have to take care of that before reaching this point.

Assuming all goes well the next phase is setting up the FUSION_INSTALL_REFERENCE structure that will describe our assembly installation. Typically, you’ll want the reference to be associated with the registry Uninstall key for your application. Besides, other reference types do not seem to work too well and the documentation doesn’t err on the side of verbosity.

The not-so-seasoned NSIS scripter that I am, I couldn’t figure out a more legible way to specify the GUID argument to the InstallAssembly invocation so I broke down its components by hand. Counting vtable indices including the IUnknown and IAssemblyCache interfaces, InstallAssembly is at vtable slot 7. After the install reference structure is set up and the method invoked, we hope for the best.

Assuming a successful install transaction, we proceed to call the IUnknown method Release (vtable slot 2) to free the SxS cache manager and deem our install sequence completed.

We now turn our attention to the reverse sequence in the Uninstall section of the illustration installer. Being good citizens of the Windows ecosystem, we remove our reference to the shared assembly when the end-user removes our application from their system. WinSxS manages assembly reference counting and will figure out whether the assembly files should actually be removed from the disk.

We create an IAssemblyCache interface instance as before but this time call UninstallAssembly to remove our reference. This is the first method of the interface but is preceded by the IUnknown members and is thus at vtable slot 3. Following a successful invocation we can examine the returned Disposition value if it is of interest and proceed to free the instance.

Note that we remove an assembly by its full, strong name and not by path. You can figure out the assembly’s strong name from its manifest.

OK, installing the VC8 Debug CRT was easy enough. Note that other libraries (ATL, etc.) you’ll want to install may have dependencies on other assemblies, so make sure you get your install sequence in order.

Installing the Visual C++ 2005 runtime is nice and all, but somehow it just feels wrong installing obsolete software. I turned my attention to the Visual C++ 2008 libraries, encountering disappointing results.

I gave it a few shots but installing the Visual C++ 2008 Debug CRT always fails, InstallAssembly promptly returning an HRESULT containing ERROR_SXS_PROTECTION_CATALOG_NOT_VALID. A malformed catalog? In one of Microsoft’s very own assemblies? Say it ain’t so!

If you have the hots for deploying the newer runtime, you’ll have to figure out that one on your own, folks. I made sure the catalog for the Visual C++ 9.0 Debug CRT I picked up from the WinSxS store matches the same catalog file found in the MSI merge module (MSM) at C:\Program Files\Common Files\Merge Modules\Microsoft_VC90_DebugCRT_x86.msm by extracting the MSM’s files with the useful MSIX extractor. The catalog files matched and regardless, the SHA-1 hashes for the assembly files matched the catalog rejected by InstallAssembly. Mysterious.

Reviewing the Windows event log following this error didn’t help too much. The System log was now decorated with SideBySide Event ID 20, stating: “The manifest C:\WINDOWS\WinSxS\InstallTemp\160585\Manifests\x86_Microsoft.VC90.DebugCRT_1fc8b3b9a1e18e3b_9.0.21022.8_x-ww_597c3456.Manifest does not match its source catalog or the catalog is missing.” … No newsflash there.

I figured I’ll stick to VC8 support and leave the VC9 troubleshooting for later. If push comes to shove, one can always figure out how to write WiX installers. :)

Instead I opted to review whether this NSIS-based installation approach is compatible with Windows Vista. I was worried that with the restrictive ACLs on the WinSxS store in that OS, without being an MSI and running from the context of the mighty TrustedInstaller.exe process, the installation will surely fail.

I was therefore positively surprised when the test installer worked on Vista. This surprised me since I knew the elevated installer executable ran as Administrator, but there was no denying I was not supposed to be able to copy files to WinSxS:

C:\Users\User\Desktop>cacls C:\Windows\WinSxS
C:\Windows\winsxs NT SERVICE\TrustedInstaller:(OI)(CI)F
BUILTIN\Administrators:(OI)(CI)R
NT AUTHORITY\SYSTEM:(OI)(CI)R
BUILTIN\Users:(OI)(CI)R

There was no denying it. Users and Administrators alike have read-only access to the store, and only the TrustedInstaller service can actually modify it. I opted to run the installer once again, this time in Windbg, tracing the operation of the SxS API to figure out what was happening behind the scenes.


0:000> sxe ld:sxs
0:000> g
ModLoad: 75500000 7555f000 C:\Windows\system32\sxs.dll
eax=1000162a ebx=00000000 ecx=15bf8bb6 edx=00000007 esi=7ffdd000 edi=20000000
eip=76f99a94 esp=0278f620 ebp=0278f664 iopl=0 nv up ei pl zr na pe nc
cs=001b ss=0023 ds=0023 es=0023 fs=003b gs=0000 efl=00000246
ntdll!KiFastSystemCallRet:
76f99a94 c3 ret
0:002> bp sxs!CreateAssemblyCache
0:002> g
Breakpoint 0 hit
eax=00000000 ebx=002ceda8 ecx=002cedc8 edx=002ce990 esi=002ceda8 edi=00000000
eip=7554a3aa esp=0278fd54 ebp=0278fd6c iopl=0 nv up ei pl zr na pe nc
cs=001b ss=0023 ds=0023 es=0023 fs=003b gs=0000 efl=00000246
sxs!CreateAssemblyCache:
7554a3aa 8bff mov edi,edi
0:002> kb 2
*** WARNING: Unable to verify checksum for C:\Users\User\AppData\Local\Temp\nspBB71.tmp\System.dll
ChildEBP RetAddr Args to Child
0278fd50 100024b5 00291a98 00000000 1000162a sxs!CreateAssemblyCache
0278fd6c 1000168d 002ceda8 00000000 75bbc780 System+0x24b5

We know from MSDN that CreateAssemblyCache returns the IAssemblyCache pointer through the first, out parameter. We expect a well-behaved caller to pass in storage initialized to zero, and the storage to contain the newly instantiated interface after the function returns:

0:002> dps 00291a98 L1
00291a98 00000000
0:002> gu
eax=00000000 ebx=002ceda8 ecx=00000000 edx=00000008 esi=002ceda8 edi=00000000
eip=100024b5 esp=0278fd60 ebp=0278fd6c iopl=0 nv up ei pl zr na pe nc
cs=001b ss=0023 ds=0023 es=0023 fs=003b gs=0000 efl=00000246
System+0x24b5:
100024b5 a340400010 mov dword ptr [System+0x4040 (10004040)],eax ds:0023:10004040={sxs!CreateAssemblyCache (7554a3aa)}
0:002> dps 00291a98 L1
00291a98 00291c88

The first pointer value in the storage reached when following a dereferenced interface pointer is a member function vtable. We verify this and follow the vtable to examine where the implementations of the various members reside:

0:002> dps 00291c88 L1
00291c88 755036f0 sxs!CAssemblyCache::`vftable'
0:002> dps 755036f0 L10
755036f0 75549ac4 sxs!CAssemblyCache::QueryInterface
755036f4 7550de6b sxs!CAssemblyCache::AddRef
755036f8 7554a355 sxs!CAssemblyCache::Release
755036fc 75549d35 sxs!CAssemblyCache::UninstallAssembly
75503700 75549b15 sxs!CAssemblyCache::QueryAssemblyInfo
75503704 7554a219 sxs!CAssemblyCache::CreateAssemblyCacheItem
75503708 755542f1 sxs!XMLParser::SetFlags
7550370c 75549e91 sxs!CAssemblyCache::InstallAssembly
75503710 7554a4d4 sxs!CAssemblyName::QueryInterface
75503714 7550de6b sxs!CAssemblyCache::AddRef
75503718 7554ad69 sxs!CAssemblyName::Release
7550371c 7554a525 sxs!CAssemblyName::SetProperty
75503720 7554a64d sxs!CAssemblyName::GetProperty
75503724 7554a4ba sxs!CAssemblyName::Finalize
75503728 7554aaac sxs!CAssemblyName::GetDisplayName
7550372c 7554a4ad sxs!CAssemblyName::Reserved

It is clear the implementation of the InstallAssembly method is sxs!CAssemblyCache::InstallAssembly. We set up a breakpoint, proceed there and perform a high-level trace:

0:002> bp sxs!CAssemblyCache::InstallAssembly
0:002> g
Breakpoint 1 hit
eax=00000000 ebx=002d2668 ecx=002d2688 edx=002cdc10 esi=002d2668 edi=00000000
eip=75549e91 esp=0278fd4c ebp=0278fd6c iopl=0 nv up ei pl zr na pe nc
cs=001b ss=0023 ds=0023 es=0023 fs=003b gs=0000 efl=00000246
sxs!CAssemblyCache::InstallAssembly:
75549e91 8bff mov edi,edi
0:002> wt -l 2 -m sxs
Tracing sxs!CAssemblyCache::InstallAssembly to return address 100024b5
16 0 [ 0] sxs!CAssemblyCache::InstallAssembly
13 0 [ 1] sxs!CFrame::CFrame
20 13 [ 0] sxs!CAssemblyCache::InstallAssembly
3 0 [ 1] sxs!CFrame::BaseEnter
11 0 [ 2] sxs!FusionpRtlPushFrame
6 11 [ 1] sxs!CFrame::BaseEnter
54 30 [ 0] sxs!CAssemblyCache::InstallAssembly
6 0 [ 1] sxs!CFrame::ClearLastError
58 36 [ 0] sxs!CAssemblyCache::InstallAssembly
10 0 [ 1] sxs!SxspTranslateReferenceFrom
13 0 [ 2] sxs!CFrame::CFrame
14 13 [ 1] sxs!SxspTranslateReferenceFrom
6 0 [ 2] sxs!CFrame::BaseEnter
70 19 [ 1] sxs!SxspTranslateReferenceFrom
9 0 [ 2] sxs!CFnTracerWin32::~CFnTracerWin32
74 28 [ 1] sxs!SxspTranslateReferenceFrom
63 138 [ 0] sxs!CAssemblyCache::InstallAssembly
4 0 [ 1] sxs!CFrame::ClearLastError
66 142 [ 0] sxs!CAssemblyCache::InstallAssembly
16 0 [ 1] sxs!SxsInstallW
13 0 [ 2] sxs!CFrame::CFrame
20 13 [ 1] sxs!SxsInstallW
6 0 [ 2] sxs!CFrame::BaseEnter
25 19 [ 1] sxs!SxsInstallW
13 0 [ 2] sxs!CGenericStringBuffer<64,CUnicodeCharTraits>::CGenericStringBuffer<64,CUnicodeCharTraits>
28 32 [ 1] sxs!SxsInstallW
13 0 [ 2] sxs!CGenericStringBuffer<64,CUnicodeCharTraits>::CGenericStringBuffer<64,CUnicodeCharTraits>
30 45 [ 1] sxs!SxsInstallW
13 0 [ 2] sxs!CGenericStringBuffer<64,CUnicodeCharTraits>::CGenericStringBuffer<64,CUnicodeCharTraits>
68 58 [ 1] sxs!SxsInstallW
4 0 [ 2] sxs!CFrame::ClearLastError
713 62 [ 1] sxs!SxsInstallW
72 0 [ 2] sxs!SxspExpandRelativePathToFull
797 134 [ 1] sxs!SxsInstallW
4 0 [ 2] sxs!CFrame::ClearLastError
800 138 [ 1] sxs!SxsInstallW
ModLoad: 741e0000 741ea000 C:\Windows\system32\sxsstore.dll
eax=ffffffff ebx=00000000 ecx=002e6a60 edx=00000001 esi=7ffdd000 edi=20000000
eip=76f99a94 esp=0278ed48 ebp=0278ed8c iopl=0 nv up ei pl zr na pe nc
cs=001b ss=0023 ds=0023 es=0023 fs=003b gs=0000 efl=00000246
ntdll!KiFastSystemCallRet:
76f99a94 c3 ret

OK, so it looks like this implementation lets sxs!SxsInstallW do the actual work. We rerun the installer and this time perform a trace from that point:

0:000> sxe ld:sxs
0:000> g
ModLoad: 75500000 7555f000 C:\Windows\system32\sxs.dll
eax=1000162a ebx=00000000 ecx=0f78c21d edx=00000007 esi=7ffdc000 edi=20000000
eip=76f99a94 esp=026ef620 ebp=026ef664 iopl=0 nv up ei pl zr na pe nc
cs=001b ss=0023 ds=0023 es=0023 fs=003b gs=0000 efl=00000246
ntdll!KiFastSystemCallRet:
76f99a94 c3 ret
0:002> bp sxs!SxsInstallW
0:002> g
Breakpoint 0 hit
eax=026efce8 ebx=0032db90 ecx=026efd08 edx=0032dbac esi=00332a08 edi=026efd44
eip=755475ad esp=026efcd4 ebp=026efd48 iopl=0 nv up ei pl zr na pe nc
cs=001b ss=0023 ds=0023 es=0023 fs=003b gs=0000 efl=00000246
sxs!SxsInstallW:
755475ad 8bff mov edi,edi
0:002> wt -l 2 -m sxs
Tracing sxs!SxsInstallW to return address 75549f41
16 0 [ 0] sxs!SxsInstallW
13 0 [ 1] sxs!CFrame::CFrame
20 13 [ 0] sxs!SxsInstallW
3 0 [ 1] sxs!CFrame::BaseEnter
11 0 [ 2] sxs!FusionpRtlPushFrame
6 11 [ 1] sxs!CFrame::BaseEnter
25 30 [ 0] sxs!SxsInstallW
10 0 [ 1] sxs!CGenericStringBuffer<64,CUnicodeCharTraits>::CGenericStringBuffer<64,CUnicodeCharTraits>
12 0 [ 2] sxs!CGenericBaseStringBuffer::InitializeInlineBuffer
13 12 [ 1] sxs!CGenericStringBuffer<64,CUnicodeCharTraits>::CGenericStringBuffer<64,CUnicodeCharTraits>
28 55 [ 0] sxs!SxsInstallW
10 0 [ 1] sxs!CGenericStringBuffer<64,CUnicodeCharTraits>::CGenericStringBuffer<64,CUnicodeCharTraits>
12 0 [ 2] sxs!CGenericBaseStringBuffer::InitializeInlineBuffer
13 12 [ 1] sxs!CGenericStringBuffer<64,CUnicodeCharTraits>::CGenericStringBuffer<64,CUnicodeCharTraits>
30 80 [ 0] sxs!SxsInstallW
10 0 [ 1] sxs!CGenericStringBuffer<64,CUnicodeCharTraits>::CGenericStringBuffer<64,CUnicodeCharTraits>
12 0 [ 2] sxs!CGenericBaseStringBuffer::InitializeInlineBuffer
13 12 [ 1] sxs!CGenericStringBuffer<64,CUnicodeCharTraits>::CGenericStringBuffer<64,CUnicodeCharTraits>
68 105 [ 0] sxs!SxsInstallW
4 0 [ 1] sxs!CFrame::ClearLastError
713 109 [ 0] sxs!SxsInstallW
15 0 [ 1] sxs!SxspExpandRelativePathToFull
13 0 [ 2] sxs!CFrame::CFrame
19 13 [ 1] sxs!SxspExpandRelativePathToFull
6 0 [ 2] sxs!CFrame::BaseEnter
22 19 [ 1] sxs!SxspExpandRelativePathToFull
29 0 [ 2] sxs!CGenericStringBufferAccessor::Attach
23 48 [ 1] sxs!SxspExpandRelativePathToFull
4 0 [ 2] sxs!CFrame::ClearLastError
29 52 [ 1] sxs!SxspExpandRelativePathToFull
13 0 [ 2] kernel32!GetFullPathNameW
36 65 [ 1] sxs!SxspExpandRelativePathToFull
31 0 [ 2] sxs!CGenericStringBufferAccessor::Detach
37 96 [ 1] sxs!SxspExpandRelativePathToFull
4 0 [ 2] sxs!CFrame::ClearLastError
41 100 [ 1] sxs!SxspExpandRelativePathToFull
68 0 [ 2] sxs!CGenericBaseStringBuffer::Win32ResizeBuffer
46 168 [ 1] sxs!SxspExpandRelativePathToFull
29 0 [ 2] sxs!CGenericStringBufferAccessor::Attach
47 197 [ 1] sxs!SxspExpandRelativePathToFull
4 0 [ 2] sxs!CFrame::ClearLastError
52 201 [ 1] sxs!SxspExpandRelativePathToFull
13 0 [ 2] kernel32!GetFullPathNameW
64 214 [ 1] sxs!SxspExpandRelativePathToFull
9 0 [ 2] sxs!CFnTracerWin32::~CFnTracerWin32
66 223 [ 1] sxs!SxspExpandRelativePathToFull
20 0 [ 2] sxs!CGenericStringBufferAccessor::~CGenericStringBufferAccessor
72 243 [ 1] sxs!SxspExpandRelativePathToFull
797 424 [ 0] sxs!SxsInstallW
4 0 [ 1] sxs!CFrame::ClearLastError
800 428 [ 0] sxs!SxsInstallW
28 0 [ 1] sxs!SxspGetRemoteStore
5 0 [ 2] sxs!SxspEnsureUserIsAdmin
34 5 [ 1] sxs!SxspGetRemoteStore
9 0 [ 2] kernel32!LoadLibraryW
40 14 [ 1] sxs!SxspGetRemoteStore
18 0 [ 2] ShimEng!StubGetProcAddress
50 32 [ 1] sxs!SxspGetRemoteStore
24 0 [ 2] ole32!CoCreateInstance
>> No match on ret
24 0 [ 2] ole32!CoCreateInstance
5 0 [ 2] RPCRT4!NdrpGetRpcHelper
>> No match on ret
5 0 [ 2] RPCRT4!NdrpGetRpcHelper
17 0 [ 2] RPCRT4!NdrpGetIIDFromBuffer
>> No match on ret
17 0 [ 2] RPCRT4!NdrpGetIIDFromBuffer
71 0 [ 2] RPCRT4!NdrpInterfacePointerUnmarshall
>> No match on ret
71 0 [ 2] RPCRT4!NdrpInterfacePointerUnmarshall
11 0 [ 2] RPCRT4!NdrpPointerUnmarshall
>> No match on ret
11 0 [ 2] RPCRT4!NdrpPointerUnmarshall
4 0 [ 2] RPCRT4!NdrPointerUnmarshall
>> No match on ret
4 0 [ 2] RPCRT4!NdrPointerUnmarshall
19 0 [ 2] RPCRT4!NdrpPointerUnmarshall
>> No match on ret
19 0 [ 2] RPCRT4!NdrpPointerUnmarshall
4 0 [ 2] RPCRT4!NdrPointerUnmarshall
>> No match on ret
4 0 [ 2] RPCRT4!NdrPointerUnmarshall
65 0 [ 2] RPCRT4!NdrpClientUnMarshal
>> No match on ret
65 0 [ 2] RPCRT4!NdrpClientUnMarshal
16 0 [ 2] RPCRT4!NdrClientCall2
>> No match on ret
16 0 [ 2] RPCRT4!NdrClientCall2
8 0 [ 2] RPCRT4!ObjectStublessClient
>> No match on ret
8 0 [ 2] RPCRT4!ObjectStublessClient
4 0 [ 2] RPCRT4!ObjectStubless
65 0 [ 2] ole32!CRpcResolver::CreateInstance
>> No match on ret
65 0 [ 2] ole32!CRpcResolver::CreateInstance
10 0 [ 2] ole32!CClientContextActivator::CreateInstance
>> No match on ret
10 0 [ 2] ole32!CClientContextActivator::CreateInstance
8 0 [ 2] ole32!ActivationPropertiesIn::DelegateCreateInstance
>> No match on ret
8 0 [ 2] ole32!ActivationPropertiesIn::DelegateCreateInstance
53 0 [ 2] ole32!ICoCreateInstanceEx
>> No match on ret
53 0 [ 2] ole32!ICoCreateInstanceEx
21 0 [ 2] ole32!CComActivator::DoCreateInstance
>> No match on ret
21 0 [ 2] ole32!CComActivator::DoCreateInstance
2 0 [ 2] ole32!CoCreateInstanceEx
>> No match on ret
2 0 [ 2] ole32!CoCreateInstanceEx
5 0 [ 2] ole32!CoCreateInstance
62 444 [ 1] sxs!SxspGetRemoteStore
34 0 [ 2] ole32!CStdIdentity::CInternalUnk::QueryInterface
>> No match on ret
34 0 [ 2] ole32!CStdIdentity::CInternalUnk::QueryInterface
14 0 [ 2] ole32!CreateIdentityHandler
>> No match on ret
14 0 [ 2] ole32!CreateIdentityHandler
21 0 [ 2] ole32!UnmarshalInternalObjRef
>> No match on ret
21 0 [ 2] ole32!UnmarshalInternalObjRef
27 0 [ 2] ole32!OXIDEntry::UnmarshalRemUnk
>> No match on ret
27 0 [ 2] ole32!OXIDEntry::UnmarshalRemUnk
18 0 [ 2] ole32!OXIDEntry::MakeRemUnk
>> No match on ret
18 0 [ 2] ole32!OXIDEntry::MakeRemUnk
7 0 [ 2] ole32!OXIDEntry::GetRemUnk
>> No match on ret
7 0 [ 2] ole32!OXIDEntry::GetRemUnk
2 0 [ 2] ole32!CStdMarshal::GetSecureRemUnk
>> No match on ret
2 0 [ 2] ole32!CStdMarshal::GetSecureRemUnk
23 0 [ 2] ole32!CStdMarshal::Begin_RemQIAndUnmarshal1
>> No match on ret
23 0 [ 2] ole32!CStdMarshal::Begin_RemQIAndUnmarshal1
5 0 [ 2] ole32!CStdMarshal::Begin_QueryRemoteInterfaces
>> No match on ret
5 0 [ 2] ole32!CStdMarshal::Begin_QueryRemoteInterfaces
ModLoad: 741e0000 741ea000 C:\Windows\system32\sxsstore.dll

Woah. That’s verbose. There’s a lot of noise in this trace, but the SxspGetRemoteStore function draws attention and it is obvious from all the OLE32 invocations later on that COM is at work here. Examining the sxs!SxspGetRemoteStore function reveals it instantiates the COM object identified by CLSID_SxsStore (left as an exercise for the reader).

Let’s have a look at the object’s registration information. First, extract the CLSID:

0:002> x sxs!CLSID_SxsStore
7554c454 sxs!CLSID_SxsStore =
0:002> dt nt!_GUID 7554c454
ntdll!_GUID
{3c6859ce-230b-48a4-be6c-932c0c202048}
+0x000 Data1 : 0x3c6859ce
+0x004 Data2 : 0x230b
+0x006 Data3 : 0x48a4
+0x008 Data4 : [8] "???"

Now, we’ll use the command-line to see what’s special about this object’s registration:

C:\Users\User\Desktop>reg query HKCR\CLSID\{3c6859ce-230b-48a4-be6c-932c0c202048
} /s


HKEY_CLASSES_ROOT\CLSID\{3c6859ce-230b-48a4-be6c-932c0c202048}
(Default) REG_SZ Sxs Store Class
AppID REG_SZ {752073A2-23F2-4396-85F0-8FDB879ED0ED}

HKEY_CLASSES_ROOT\CLSID\{3c6859ce-230b-48a4-be6c-932c0c202048}\LocalServer32
(Default) REG_EXPAND_SZ %systemroot%\servicing\TrustedInstaller.exe
ThreadingModel REG_SZ Both

OK, that explains it. The SxS API asks an out-of-process COM server running in the context of the TrustedInstaller service to do its bidding, explaining how things work despite the restrictive ACL on the store.

Hope you enjoyed that digression, but now back to the original business at hand. The installation process works just fine on Vista, but the plot thickens when we examine the uninstallation process.

Strangely and contrary to documentation, UninstallAssembly always returns success but with a disposition value of 0 on Vista, and the assembly files remain in place in the WinSxS store no matter what. The bottom line – if you use this approach to deploy the libraries to a Vista system, you may leave behind unused assembly files after your application is uninstalled, cluttering the user’s system. Take this to heart when considering whether this approach and the avoidance of an MSI installer is appropriate for your scenario.

Both the issue of Visual C++ 9.0 assembly deployment using the SxS API and the weird referencing behavior encounter during assembly uninstallation on Vista remain, as of yet, unresolved issues. If anyone is game for figuring those out, I’d be glad to hear about it.

Associating LPC clients and servers

Local Procedure Call (LPC) is a fast, lightweight interprocess communication mechanism used extensively by Windows system components. Microsoft’s documented approach for using LPC is through the higher-level RPC (Remote Procedure Call) API and its “ncalrpc” transport, which wraps the lower-level LPC interfaces. However, many Windows components use the LPC functions from the Native API directly and not through ncalrpc.

Like other remoting architectures, LPC presents a diagnostic challenge. We locate the remoting client thread and expect its flow to contain whatever information we are looking for, only to find it sending a message to the other end, magically receiving a response and continuing its merry way. In order to get a complete view of the complex interactions in the system, the task at hand is locating the server thread and examining its flow until we have the answer we are looking for.

As a case study, we shall consider an unlucky developer that must debug a custom security support provider or authentication package (SSP or AP) that runs in the context of the Windows Local Security Authority (LSA) subsystem. The developer wishes to single-step through the creation of a logon session. This sequence starts at the Winlogon process when it calls the LsaLogonUser API to authenticate a user’s credentials. The following treatise discusses Windows XP and traditional LPC. Vista features asynchronous LPC (ALPC) and is beyond the scope of this post.

We attach the trusty Windbg to the Winlogon process and set up a breakpoint on the API. We resume execution and use the Welcome screen GUI to initiate the actual logon attempt. For convenience, this would best be done with a remote debugging setup to a virtual machine. Once execution is suspended at the API of interest, we perform a trace to get an initial idea of the flow:

0:019> bp secur32!LsaLogonUser
0:019> g
ModLoad: 74980000 74a93000 C:\WINDOWS\system32\msxml3.dll
Breakpoint 0 hit
eax=0006ec78 ebx=00eedfb8 ecx=00000000 edx=00eedfdc esi=759799c4 edi=00eedfe4
eip=77fe33e8 esp=0006ec08 ebp=0006ecb8 iopl=0 nv up ei pl zr na pe nc
cs=001b ss=0023 ds=0023 es=0023 fs=003b gs=0000 efl=00000246
Secur32!LsaLogonUser:
77fe33e8 8bff mov edi,edi
0:000> wt
Tracing Secur32!LsaLogonUser to return address 77dec60d
43 0 [ 0] Secur32!LsaLogonUser
1 0 [ 1] ntdll!NtRequestWaitReplyPort
2 0 [ 1] ntdll!ZwRequestWaitReplyPort
2 0 [ 2] ntdll!KiFastSystemCall
1 0 [ 1] ntdll!ZwRequestWaitReplyPort
64 6 [ 0] Secur32!LsaLogonUser


70 instructions were executed in 69 events (0 from other threads)

Function Name Invocations MinInst MaxInst AvgInst
Secur32!LsaLogonUser 1 64 64 64
ntdll!KiFastSystemCall 1 2 2 2
ntdll!NtRequestWaitReplyPort 1 1 1 1
ntdll!ZwRequestWaitReplyPort 2 1 2 1

1 system call was executed

Calls System Call
1 ntdll!KiFastSystemCall

eax=c000006d ebx=00eedfb8 ecx=0006eda8 edx=00100004 esi=759799c4 edi=00eedfe4
eip=77dec60d esp=0006ec44 ebp=0006ecb8 iopl=0 nv up ei ng nz na po nc
cs=001b ss=0023 ds=0023 es=0023 fs=003b gs=0000 efl=00000282
ADVAPI32!L32pLogonUser+0x2cf:
77dec60d 8bf0 mov esi,eax

The debugging developer is disappointed the API simply invokes the NtRequestWaitReplyPort native API, which is used by the client of an LPC port to send a request message and wait synchronously for the server response. The other end is probably the LSA subsystem process, lsass.exe, intuition predicts. First, let’s verify this intuitive guess.

We’ll break again at an invocation of LsaLogonUser in the Winlogon process and this time proceed to the invocation of the LPC port and examine its arguments:


:000> g
ModLoad: 74980000 74a93000 C:\WINDOWS\system32\msxml3.dll
ModLoad: 74980000 74a93000 C:\WINDOWS\system32\msxml3.dll
Breakpoint 0 hit
eax=0006ec78 ebx=00eedfb8 ecx=00000000 edx=00eedfdc esi=759799c4 edi=00eedfe4
eip=77fe33e8 esp=0006ec08 ebp=0006ecb8 iopl=0 nv up ei pl zr na pe nc
cs=001b ss=0023 ds=0023 es=0023 fs=003b gs=0000 efl=00000246
Secur32!LsaLogonUser:
77fe33e8 8bff mov edi,edi
0:000> bp ntdll!NtRequestWaitReplyPort
0:000> g
Breakpoint 1 hit
eax=0006eb5c ebx=00eedfb8 ecx=000e000d edx=00eedfdc esi=00000000 edi=0006ebac
eip=7c90e3e1 esp=0006eb44 ebp=0006ec04 iopl=0 nv up ei pl zr na pe nc
cs=001b ss=0023 ds=0023 es=0023 fs=003b gs=0000 efl=00000246
ntdll!NtRequestWaitReplyPort:
7c90e3e1 b8c8000000 mov eax,0C8h
0:000> kb 2
ChildEBP RetAddr Args to Child
0006eb40 77fe347e 000006b4 0006eb5c 0006eb5c ntdll!NtRequestWaitReplyPort
0006ec04 77dec60d 000006b4 0006ec78 00000002 Secur32!LsaLogonUser+0xa0

We see that the prototype of NtRequestWaitReplyPort places the LPC port handle as the first argument of the API. We verify this assertion:
0:000> !handle 6b4
Handle 6b4
Type Port

Great. Now what? An LPC port is, in fact, one of the many kinds of NT kernel objects. If we wish to gain access to more information about it, we must access the kernel’s knowledge of the object. This is accomplished through the use of the kernel debugger, kd. Windbg can also act as a GUI wrapper for kernel debugging sessions. As we shall soon find out, the kernel debugger is quite useful even when you have nothing to do with drivers or other kernel-mode code.

Attaching a kernel debugger is covered extensively in the literature and is left as an exercise for the reader. Suffice to say you are much better off using a virtual machine and a named pipe over using a serial or FireWire connection to a physical system, unless some exotic circumstances force you to use an actual machine (i.e., you are debugging authentication of a fingerprint reader product you are developing, etc. and need the physical device to be present in the debugged system)

Handles, except system handles, are process-local. What we want to do is associate the Winlogon port handle with the kernel LPC port object. First, we’ll find the Winlogon process and set the kernel debugger’s context to that process:

0: kd> !process 0 0 winlogon.exe
PROCESS 82bcdda0 SessionId: 0 Cid: 026c Peb: 7ffdd000 ParentCid: 0220
DirBase: 0a3e0000 ObjectTable: e14f1a48 HandleCount: 534.
Image: winlogon.exe

0: kd> .process 82bcdda0
Implicit process is now 82bcdda0
WARNING: .cache forcedecodeuser is not enabled

Good, now lets have a look at the handle, this time from the kernel debugger’s point of view. Although the command has the same name, the kernel debugger command is distinct.

0: kd> !handle 6b4
processor number 0, process 82bcdda0
PROCESS 82bcdda0 SessionId: 0 Cid: 026c Peb: 7ffdd000 ParentCid: 0220
DirBase: 0a3e0000 ObjectTable: e14f1a48 HandleCount: 534.
Image: winlogon.exe

Handle table at e1ba8000 with 534 Entries in use
06b4: Object: e1582f68 GrantedAccess: 001f0001 Entry: e169cd68
Object: e1582f68 Type: (82bed5d0) Port
ObjectHeader: e1582f50 (old version)
HandleCount: 1 PointerCount: 1

We see the LPC port kernel object is at kernel-mode address 0xe1582f68. Each variety of kernel object might have a distinct debugger extension for further examination. In the LPC port case, we shall use the !lpc port command to examine the port:

0: kd> !lpc port e1582f68

Client communication port 0xe1582f68
Handles: 1 References: 1
The LpcDataInfoChainHead queue is empty
Connected port: 0xe15765a0 Server connection port: 0xe17426c8

Server communication port 0xe15765a0
Handles: 1 References: 1
The LpcDataInfoChainHead queue is empty

Server connection port e17426c8 Name: LsaAuthenticationPort
Handles: 1 References: 52
Server process : 8290e4b8 (lsass.exe)
Queue semaphore : 82a0a3e8
Semaphore state 0 (0x0)
The message queue is empty
The LpcDataInfoChainHead queue is empty

Our suspicion that lsass.exe is the server for this LPC port is confirmed. Furthermore, we now know that the LPC port is called LsaAuthenticationPort by this client/server pair. If LSASS has just one port server thread, we’re done. Let us examine it to see if we are that lucky. A word of caution to the uninitiated: debugging LSASS is a messy deal and you can easily deadlock your system by initiating operations that interact with LSA and expect it to be active. For example, do not use a symbol server based on an SMB share, since the network session setup will block while LSA is suspended. For our purposes, a non-invasive, non-suspending session, using the NTSD debugger’s “-pvr” command line switch, would be best for the initial investigation. Once started, we use the debugger to locate stack frames with LPC servers, by looking for the NtReplyWaitReceivePort native API on the stack:

0:000> !findstack ntdll!NtReplyWaitReceivePort
Thread 004, 1 frame(s) match
* 01 008dfe88 7575ba56 ntdll!NtReplyWaitReceivePort+0xc

Thread 006, 1 frame(s) match
* 01 00acfe18 77e765d3 ntdll!NtReplyWaitReceivePortEx+0xc

Thread 007, 1 frame(s) match
* 01 00b8fe40 75738f5c ntdll!NtReplyWaitReceivePort+0xc

Thread 010, 1 frame(s) match
* 01 00e5fe18 77e765d3 ntdll!NtReplyWaitReceivePortEx+0xc

Thread 016, 1 frame(s) match
* 01 0007fe18 77e765d3 ntdll!NtReplyWaitReceivePortEx+0xc

Thread 017, 1 frame(s) match
* 01 00b0fe18 77e765d3 ntdll!NtReplyWaitReceivePortEx+0xc

Thread 019, 1 frame(s) match
* 01 00c8fe18 77e765d3 ntdll!NtReplyWaitReceivePortEx+0xc

GetContextState failed, 0x8007001F
GetContextState failed, 0x8007001F
GetContextState failed, 0x8007001F
Unable to get program counter
GetContextState failed, 0x8007001F
Unable to get current machine context, Win32 error 0n31
GetContextState failed, 0x8007001F
GetContextState failed, 0x8007001F
GetContextState failed, 0x8007001F
GetContextState failed, 0x8007001F
Unable to get current machine context, Win32 error 0n31
ERROR: !findstack: extension exception 0x8007001f.
"Unable to get thread stack"
0:000>

We can ignore the spurious errors from !findstack on the bottom of the output, which are the result of our non-invasive attach. Unfortunately for us, the plot thickens as lsass.exe has a bunch of server threads and it is as of yet unclear which one will have the server-side flow of interest. How can figure out which of the seven potential servers is the right one?

We’ll get started by extracting the server port handle from each thread’s stack. As in the case of the client function, NtReplyWaitReceivePort takes the port handle as its first argument.

0:000> ~4 kb 2
ChildEBP RetAddr Args to Child
008dfe84 7c90e384 7575ba56 000000d0 00000000 ntdll!KiFastSystemCallRet
008dfe88 7575ba56 000000d0 00000000 00000000 ntdll!NtReplyWaitReceivePort+0xc
0:000> ~6 kb 2
ChildEBP RetAddr Args to Child
00acfe14 7c90e399 77e765d3 000002e8 00acff74 ntdll!KiFastSystemCallRet
00acfe18 77e765d3 000002e8 00acff74 00000000 ntdll!NtReplyWaitReceivePortEx+0xc
0:000> ~7 kb 2
ChildEBP RetAddr Args to Child
00b8fe3c 7c90e384 75738f5c 000002cc 00b8fe6c ntdll!KiFastSystemCallRet
00b8fe40 75738f5c 000002cc 00b8fe6c 00000000 ntdll!NtReplyWaitReceivePort+0xc
0:000> ~10 kb 2
ChildEBP RetAddr Args to Child
00e5fe14 7c90e399 77e765d3 000002b4 00e5ff74 ntdll!KiFastSystemCallRet
00e5fe18 77e765d3 000002b4 00e5ff74 00000000 ntdll!NtReplyWaitReceivePortEx+0xc
0:000> ~16 kb 2
GetContextState failed, 0x8007001F
Unable to get current machine context, Win32 error 0n31
^ Illegal thread error in '~16 kb 2'
0:000> ~17 kb 2
GetContextState failed, 0x8007001F
Unable to get current machine context, Win32 error 0n31
^ Illegal thread error in '~17 kb 2'
0:000> ~19 kb 2
ChildEBP RetAddr Args to Child
00c8fe14 7c90e399 77e765d3 000002a8 00c8ff74 ntdll!KiFastSystemCallRet
00c8fe18 77e765d3 000002a8 00c8ff74 00000000 ntdll!NtReplyWaitReceivePortEx+0xc

The non-invasive session has trouble accessing the context of threads 16 and 17. If we don’t find the server handle we are looking for elsewhere, we could retry with an invasive attach. For now, let’s note the server port handles appearing in the stack traces we do have.

Knowing which thread is associated with which port handle, we turn to the kernel debugger for associating the server port object with its handle in the lsass.exe process. Recall that we have the server port object’s address from the output of the !lpc port command on the client port object. We list lsass.exe’s port handles and look for the one associated with it:

0: kd> !process 0 0 lsass.exe
PROCESS 8290e4b8 SessionId: 0 Cid: 02b0 Peb: 7ffdf000 ParentCid: 026c
DirBase: 0a337000 ObjectTable: e1730338 HandleCount: 357.
Image: lsass.exe
0: kd> !handle 0 0 8290e4b8 Port
processor number 0, process 8290e4b8
Searching for handles of type Port
PROCESS 8290e4b8 SessionId: 0 Cid: 02b0 Peb: 7ffdf000 ParentCid: 026c
DirBase: 0a337000 ObjectTable: e1730338 HandleCount: 357.
Image: lsass.exe
Handle table at e173b000 with 357 Entries in use
0018: Object: e1736c58 GrantedAccess: 001f0001 (Protected)
00c4: Object: e172a578 GrantedAccess: 001f0001
00cc: Object: e173d030 GrantedAccess: 001f0001
00d0: Object: e173ff68 GrantedAccess: 001f0001
01c8: Object: e17ba3b8 GrantedAccess: 001f0001
02a8: Object: e1754bc8 GrantedAccess: 001f0001
02b4: Object: e17427a0 GrantedAccess: 001f0001
02b8: Object: e1b6c318 GrantedAccess: 001f0001
02c8: Object: e173d240 GrantedAccess: 001f0001
02cc: Object: e17426c8 GrantedAccess: 001f0001
02e8: Object: e173f410 GrantedAccess: 001f0001
030c: Object: e1742470 GrantedAccess: 001f0001
0314: Object: e1742548 GrantedAccess: 001f0001
033c: Object: e1744e58 GrantedAccess: 001f0001
03a4: Object: e17a4e00 GrantedAccess: 001f0001
03b0: Object: e17a4cc0 GrantedAccess: 001f0001
03c0: Object: e17a9640 GrantedAccess: 001f0001
03c8: Object: e17f29f8 GrantedAccess: 001f0001
03cc: Object: e17deb48 GrantedAccess: 001f0001
03d4: Object: e17f53d0 GrantedAccess: 001f0001
03dc: Object: e17f7dd8 GrantedAccess: 001f0001
03e4: Object: e17f7c10 GrantedAccess: 001f0001
03f4: Object: e190abf0 GrantedAccess: 001f0001
03f8: Object: e17fb690 GrantedAccess: 001f0001
0404: Object: e1b43310 GrantedAccess: 001f0001
0418: Object: e19149b8 GrantedAccess: 001f0001
0440: Object: e1942030 GrantedAccess: 001f0001
0454: Object: e19184a0 GrantedAccess: 001f0001
0458: Object: e1b3a770 GrantedAccess: 001f0001
0464: Object: e1758170 GrantedAccess: 001f0001
0480: Object: e1601ab8 GrantedAccess: 001f0001
049c: Object: e1cf59d8 GrantedAccess: 001f0001
04a4: Object: e13082a8 GrantedAccess: 001f0001
04f4: Object: e1366490 GrantedAccess: 001f0001
0508: Object: e1b466c0 GrantedAccess: 001f0001
0514: Object: e1c9f450 GrantedAccess: 001f0001
0564: Object: e1b5a640 GrantedAccess: 001f0001
0580: Object: e15765a0 GrantedAccess: 001f0001
05cc: Object: e1d814e0 GrantedAccess: 001f0001

Woah, that’s quite a list. We note the handle pointing to the server port object is 0x2cc. Cross-referencing with the handle stacks we extracted from the user-mode debugger, we hit the gold and figure out thread 7 is the thread we are looking for:


0:000> ~7 k
ChildEBP RetAddr
00b8fe3c 7c90e384 ntdll!KiFastSystemCallRet
00b8fe40 75738f5c ntdll!NtReplyWaitReceivePort+0xc
00b8ff74 75738d66 LSASRV!LpcServerThread+0xaf
00b8ffb4 7c80b6a3 LSASRV!LsapThreadBase+0x91
00b8ffec 00000000 kernel32!BaseThreadStart+0x37

Yup, that looks about right. We now know we should set our breakpoint at this thread, as it is expected to return from the blocking NtReplyWaitReceivePort system call upon an incoming LPC message on LsaAuthenticationPort. At that point, we can follow execution as the LSA server DLL assigns handling the call to a worker thread or whatever else it may be up to.

Once again, kd saves the day.

Breaking when a function returns a specific value without depending on its call site

Jigar Mehta writes today about how to set a breakpoint in the debugger on a function and check its return value using conditionals, in another post in his fine blog, that can teach even a seasoned Windbg user a thing or two (personally I didn’t know of the “-psn” switch that allows easy attachment to a service process until reading about there).

However, Jigar’s approach requires to determine the call site of the interesting function and breaking after its return for examining the return value. This may be tolerable if you are only interested in what the function returns when invoked from a specific location, but for more general scenarios can be insufficient.

An alternative approach is to use the “gu” debugger command (go up to after the call site) and examine the return value register and potentially other output parameters at that state. For example:

0:000> bp kernel32!CreateFileW "gu; j @eax == -1 '.echo CreateFileW failed; gc' '.printf \"CreateFileW returned %p\", @$retreg; .echo; gc'"
0:000> g
ModLoad: 5cb70000 5cb96000 C:\WINDOWS\system32\ShimEng.dll
...
ModLoad: 74d90000 74dfb000 C:\WINDOWS\system32\USP10.dll
CreateFileW returned 000007c8
ModLoad: 74720000 7476b000 C:\WINDOWS\system32\MSCTF.dll
CreateFileW returned 00000774
CreateFileW returned 00000774
ModLoad: 755c0000 755ee000 C:\WINDOWS\system32\msctfime.ime
ModLoad: 605d0000 605d9000 C:\WINDOWS\system32\mslbui.dll

I get bonus points for using the portable @$retreg over @eax. I lose some for not figuring out how the heck to get .printf to write a newline without a trailing .echo (both \n and \r\n don’t seem to do the trick here). Of course if what you have in mind is to actually break at the breakpoint, do not append “gc” (go from conditional breakpoint) at the end of the branch of interest.

The case of the Windows Desktop Search CPU hog

My home computer is set up (using MBM 5) to sound off an alarm with the PC Speaker whenever CPU temperature exceeds 70 degrees Celsius. It is not that I overclock it, not at all, but rather because of the embarrassingly poor fan I got with the box (which I didn’t purchase personally) combined with the Pentium 4’s well-known poor thermal characteristics.

The alarm is quite audible and annoying. Fortunately, it is pretty unusual for the CPU to get that hot. I’ve noticed that it does this after more than a few minutes of running at near 100% CPU usage.

Earlier today, I could hear the annoying alarm from across the house. I sat down at the computer and quickly silenced it. Since this isn’t the first time I’ve heard the alarm, I already knew what I was looking for: a CPU hog. More often than not, the culprit is vmware-vmx.exe, VMware Workstation’s virtual machine process. If you run an MS-DOS VM without DOSIDLE or the like, the guest OS hogs the CPU and the blame is placed on vmware-vmx.exe. This can also happen with a modern guest OS like Windows in case the guest is misbehaving for some reason. However, this time, I didn’t remember leaving any VM running on the machine, and indeed, none was to be found.

I opened the Task Manager to examine the situation and after a few moments determined that cidaemon.exe was at 49% CPU usage. This actually means 100% on my system which has a hyper-threaded, single core Pentium 4.

My first reaction was that it’s probably just a random case of a CPU hog since I’ve never experienced it on my system before. Without asking too many questions, I killed the cidaemon.exe process and watched as CPU utilization plummeted to the healthy single-digits.

I got up from the computer and addressed other concerns (washed my car). It was about an hour later that I came back and heard the hectic CPU temperature alarm once again. I took another look at the process list and found out cidaemon.exe has been resurrected, and had returned to avenge its predecessor’s demise with another round of CPU hogging.

At this point I understood there is no recourse but to conduct a deeper investigation. I googled for cidaemon.exe being a CPU hog and pretty much only found the “solution” of disabling Windows Desktop Search. Yeah, well, not so fast. I fired up the debugger and attached to the CPU hogging cidaemon.exe process (there was another instance of cidaemon.exe which was being benign).

The first thing I did is get some basic information on where I found myself:

0:006> lmv m cidaemon
start end module name
01000000 01005000 cidaemon (pdb symbols) C:\WINDOWS\Symbols\exe\cidaemon.pdb
Loaded symbol image file: C:\WINDOWS\SYSTEM32\cidaemon.exe
Image path: C:\WINDOWS\SYSTEM32\cidaemon.exe
Image name: cidaemon.exe
Timestamp: Fri Aug 17 23:56:12 2001 (3B7D84EC)
CheckSum: 00007613
ImageSize: 00005000
File version: 5.1.2600.0
Product version: 5.1.2600.0
File flags: 0 (Mask 3F)
File OS: 40004 NT Win32
File type: 1.0 App
File date: 00000000.00000000
Translations: 0409.04b0
CompanyName: Microsoft Corporation
ProductName: Microsoft® Windows® Operating System
InternalName: cidaemon.exe
OriginalFilename: cidaemon.exe
ProductVersion: 5.1.2600.0
FileVersion: 5.1.2600.0 (xpclient.010817-1148)
FileDescription: Indexing Service filter daemon
LegalCopyright: © Microsoft Corporation. All rights reserved.

OK, so cidaemon.exe is the indexing service’s filter daemon. An educated guess would be that the name is actually short for “content indexing daemon.” I knew that the indexing service is a part of the core Windows XP operating system (as is apparent from the file version of cidaemon.exe above) and not an additional service installed by the separate Windows Desktop Search package I have on my system.

The next thing I did was to see how many threads I am dealing with in this demonic daemon process:

0:003> ~
0 Id: afc.af4 Suspend: 1 Teb: 7ffdf000 Unfrozen
1 Id: afc.a5c Suspend: 1 Teb: 7ffde000 Unfrozen
2 Id: afc.858 Suspend: 1 Teb: 7ffdc000 Unfrozen
. 3 Id: afc.13b4 Suspend: 1 Teb: 7ffdb000 Unfrozen
4 Id: afc.1190 Suspend: 1 Teb: 7ffda000 Unfrozen
6 Id: afc.10d8 Suspend: 1 Teb: 7ffd8000 Unfrozen

OK, looks like I’ve got quite a few. Only the CPU hogging thread is of interest, so let’s check the CPU times, let the process resume for a while, and examine the times once again:

0:003> !runaway
User Mode Time
Thread Time
6:10d8 0 days 0:00:53.515
0:af4 0 days 0:00:00.140
4:1190 0 days 0:00:00.078
3:13b4 0 days 0:00:00.000
2:858 0 days 0:00:00.000
1:a5c 0 days 0:00:00.000
0:003> g
(afc.ca8): Break instruction exception - code 80000003 (first chance)
eax=7ffdd000 ebx=00000001 ecx=00000002 edx=00000003 esi=00000004 edi=00000005
eip=7c901230 esp=0078ffcc ebp=0078fff4 iopl=0 nv up ei pl zr na pe nc
cs=001b ss=0023 ds=0023 es=0023 fs=0038 gs=0000 efl=00000246
ntdll!DbgBreakPoint:
7c901230 cc int 3
0:005> !runaway
User Mode Time
Thread Time
6:10d8 0 days 0:01:04.859
0:af4 0 days 0:00:00.140
4:1190 0 days 0:00:00.078
5:ca8 0 days 0:00:00.000
3:14b8 0 days 0:00:00.000
2:858 0 days 0:00:00.000
1:a5c 0 days 0:00:00.000

Well, no hard choice here. The other threads list the same CPU time and thread 6 is obviously the culprit. I examine the thread’s stack trace:

0:005> ~6 s
eax=00000103 ebx=00000aef ecx=00000103 edx=000043c3 esi=000043c0 edi=000000ef
eip=74da9a58 esp=00bdc2ac ebp=00bdc370 iopl=0 nv up ei ng nz ac po cy
cs=001b ss=0023 ds=0023 es=0023 fs=003b gs=0000 efl=00000293
USP10!ScriptTokenize+0x97:
74da9a58 663b4508 cmp ax,word ptr [ebp+8] ss:0023:00bdc378=0103
0:006> k
ChildEBP RetAddr
00bdc370 74da2d44 USP10!ScriptTokenize+0x97
00bdc398 74e85840 USP10!ScriptItemize+0x42
00bdc3d0 74e84a13 riched20!CUniscribe::ItemizeString+0x8a
00bdc644 74e5f269 riched20!CTxtBreaker::OnPostReplaceRange+0x200
00bdc664 74e6b436 riched20!CNotifyMgr::NotifyPostReplaceRange+0x25
00bdc6ec 74e65825 riched20!CRchTxtPtr::ReplaceRange+0x42c
00bdc750 74e65643 riched20!CTxtRange::ReplaceRange+0x127
00bdc7b8 74e66b3e riched20!CTxtRange::CheckLimitReplaceRange+0x179
00bdca04 74e5b32b riched20!CTxtRange::CleanseAndReplaceRange+0x833
00bdfa78 74e5b8b2 riched20!CLightDTEngine::ReadPlainText+0x2c7
00bdfd30 74e7ce5d riched20!CLightDTEngine::LoadFromEs+0x287
00bdfeb8 74e7e87b riched20!CTxtEdit::TxSendMessage+0x11c9
00bdff1c 00852b90 riched20!CTxtEdit::Open+0x244
WARNING: Stack unwind information not available. Following frames may be wrong.
00bdff44 00853e4e rtffilt!DllCanUnloadNow+0xaff
00bdff74 00853eb8 rtffilt!DllRegisterServer+0x1af
00bdffac 00853f11 rtffilt!DllRegisterServer+0x219
00bdffb4 7c80b683 rtffilt!DllRegisterServer+0x272
00bdffec 00000000 kernel32!BaseThreadStart+0x37

I was quite surprised and disappointed by this stack trace. As is apparent, no symbolic information is available for rtffilt.dll. Since my Windbg is configured to use the Microsoft web symbol store, the PDB should have been downloaded automatically. I pondered, perhaps a third party product has installed this filter into the indexing service? I examined who is to blame for this sorry state of affairs:

0:006> lmv m rtffilt
start end module name
00850000 0085a000 rtffilt (export symbols) C:\WINDOWS\system32\rtffilt.dll
Loaded symbol image file: C:\WINDOWS\system32\rtffilt.dll
Image path: C:\WINDOWS\system32\rtffilt.dll
Image name: rtffilt.dll
Timestamp: Tue Feb 06 01:36:06 2007 (45C7BF66)
CheckSum: 00014877
ImageSize: 0000A000
File version: 2006.0.6000.16431
Product version: 2006.0.6000.16431
File flags: 8 (Mask 3F) Private
File OS: 40004 NT Win32
File type: 2.0 Dll
File date: 00000000.00000000
Translations: 0409.04b0
CompanyName: Microsoft Corporation
ProductName: Microsoft® Windows® Operating System
InternalName: rtffilt.lib
OriginalFilename: rtffilt.lib
ProductVersion: 2006.0.6000.16431
FileVersion: 2006.0.6000.16431 (vista_gdr(wmbla).070205-1422)
FileDescription: RTF Filter
LegalCopyright: © Microsoft Corporation. All rights reserved.

Oy vey. The RTF filter is actually a Microsoft DLL. Since this is an XP machine and the DLL version is “Vista-ish”, I quickly concluded that the DLL belongs to Windows Desktop Search 3.01, which aligns the desktop search engine on XP with Vista’s built-in search engine. The only other thing with “Vista-ish” versions on my system is Windows Script Host 5.7, which similarly, aligns WSH functionality with the Vista version. I figured the RTF filter is unrelated to that product.

It appears that whenever Microsoft releases an add-on for Windows like Windows Desktop Search, releasing debugging symbols to the symbol store is not a foregone conclusion, even if the original version of the component has its symbols already published by virtue of being a part of the Windows Vista operating system. It is obvious that the absence of the symbols is not a misguided attempt at protecting intellectual property, since they are available anyway for their Vista counterparts. For some reason, not all groups inside Microsoft have uploading public symbols to the symbol store as an integral part of their release process. When mishaps like this happen, their customers are left out in the cold to diagnose the issue on their own. Sure, in this case it was on my home computer, but it could have easily been an enterprise production debugging scenario. Missing symbols are unacceptable, Microsoft. Someone should get their ass fired.

Although taken aback, I did not give up. Since I did have symbols for the upper part of the stack trace, I could still have some impression of what was going on. I figured I’d let the filter run for a while and examine the stack again:

0:006> g
(afc.fbc): Break instruction exception - code 80000003 (first chance)
eax=7ffdd000 ebx=00000001 ecx=00000002 edx=00000003 esi=00000004 edi=00000005
eip=7c901230 esp=0078ffcc ebp=0078fff4 iopl=0 nv up ei pl zr na pe nc
cs=001b ss=0023 ds=0023 es=0023 fs=0038 gs=0000 efl=00000246
ntdll!DbgBreakPoint:
7c901230 cc int 3
0:005> ~6 k
ChildEBP RetAddr
00bdc3e8 74e848d6 riched20!CBreakArray::GetBreak+0x2a
00bdc644 74e5f269 riched20!CTxtBreaker::OnPostReplaceRange+0xc3
00bdc664 74e6b436 riched20!CNotifyMgr::NotifyPostReplaceRange+0x25
00bdc6ec 74e65825 riched20!CRchTxtPtr::ReplaceRange+0x42c
00bdc750 74e65643 riched20!CTxtRange::ReplaceRange+0x127
00bdc7b8 74e66b3e riched20!CTxtRange::CheckLimitReplaceRange+0x179
00bdca04 74e5b32b riched20!CTxtRange::CleanseAndReplaceRange+0x833
00bdfa78 74e5b8b2 riched20!CLightDTEngine::ReadPlainText+0x2c7
00bdfd30 74e7ce5d riched20!CLightDTEngine::LoadFromEs+0x287
00bdfeb8 74e7e87b riched20!CTxtEdit::TxSendMessage+0x11c9
00bdff1c 00852b90 riched20!CTxtEdit::Open+0x244
WARNING: Stack unwind information not available. Following frames may be wrong.
00bdff44 00853e4e rtffilt!DllCanUnloadNow+0xaff
00bdff74 00853eb8 rtffilt!DllRegisterServer+0x1af
00bdffac 00853f11 rtffilt!DllRegisterServer+0x219
00bdffb4 7c80b683 rtffilt!DllRegisterServer+0x272
00bdffec 00000000 kernel32!BaseThreadStart+0x37

So it looks like we went from stack bottom doing some Uniscribe work up to actual work on the RTF by the implementation in the Rich Edit library. We do not appear to be stuck in an infinite loop in Uniscribe, at least. Letting the daemon run a few more times shows that some movement between Rich Edit 2.0 functions is occurring, but it is unclear what’s making it work so hard.

A corrupt RTF file is an obvious suspect as the cause of the problem. Let’s see what the filter is working on. I was hoping to use the “!handle” extension to find out about the file handle used by the cidaemon process, but disappointingly, I could not get the extension to show the file name for a given file handle. I opted for Sysinternals’ Handle tool, which I remembered had this functionality. I reviewed cidaemon.exe’s open handles:

C:\Program Files\SysInternals>handle -p cidaemon.exe
Handle v3.2
Copyright (C) 1997-2006 Mark Russinovich
Sysinternals - http://www.sysinternals.com
------------------------------------------------------------------------------
cidaemon.exe pid: 5016 NT AUTHORITY\SYSTEM
C: File (RW-) C:\WINDOWS\system32
40: File (RW-) C:\WINDOWS\WinSxS\x86_Microsoft.Windows.Common-Controls_659
5b64144ccf1df_6.0.2600.2982_x-ww_ac3f9c03
68: Section \BaseNamedObjects\c::inetpub:catalog.wci__cisharedmem
F8: Section \BaseNamedObjects\__CiPerfMonMemory
------------------------------------------------------------------------------
cidaemon.exe pid: 2812 NT AUTHORITY\SYSTEM
C: File (RW-) C:\WINDOWS\system32
40: File (RW-) C:\WINDOWS\WinSxS\x86_Microsoft.Windows.Common-Controls_659
5b64144ccf1df_6.0.2600.2982_x-ww_ac3f9c03
68: Section \BaseNamedObjects\c::system volume information:catalog.wci_
_cisharedmem
F8: Section \BaseNamedObjects\__CiPerfMonMemory
14C: File (---) C:\Program Files\Nero\Nero 7\Core\CDI\CDI_IMAG.RTF
1D4: Section \BaseNamedObjects\ShimSharedMemory
20C: File (RW-) C:\WINDOWS\WinSxS\x86_Microsoft.Windows.GdiPlus_6595b64144c
cf1df_1.0.2600.2180_x-ww_522f9f82
264: File (RW-) C:\WINDOWS\WinSxS\x86_Microsoft.Windows.Common-Controls_659
5b64144ccf1df_6.0.2600.2982_x-ww_ac3f9c03
2A0: File (RW-) C:\Program Files\Nero\Nero 7\Core\CDI\CDI_IMAG.RTF

The offending cidaemon.exe has two open handles to an RTF file and in fact they are both for the same file. This file, CDI_IMAG.RTF, is a part of the Nero Burning ROM CD/DVD software installed on my system. This must be the file causing the RTF filter’s grief.

Attempting to confirm this is the case (although hardly anything would be able to persuade me otherwise at this point), I examine the thread’s stack yet again, this time also examining arguments:

0:006> kb
ChildEBP RetAddr Args to Child
00bdc3a8 74e795bb 00012414 00000100 00000000 riched20!CTxtPtr::FindOrSkipWhiteSpaces+0xb5
00bdc3dc 74e84889 00000001 00000101 00000100 riched20!CTxtPtr::FindWhiteSpaceBound+0xcc
00bdc644 74e5f269 00012414 00000000 00000001 riched20!CTxtBreaker::OnPostReplaceRange+0x76
00bdc664 74e6b436 00bdfe50 00012414 00000000 riched20!CNotifyMgr::NotifyPostReplaceRange+0x25
00bdc6ec 74e65825 00000001 00000001 00bdd2e8 riched20!CRchTxtPtr::ReplaceRange+0x42c
00bdc750 74e65643 00000001 00bdd2e8 00000000 riched20!CTxtRange::ReplaceRange+0x127
00bdc7b8 74e66b3e 00000001 00bdd2e8 000000ee riched20!CTxtRange::CheckLimitReplaceRange+0x179
00bdca04 74e5b32b 000003c6 00bdca78 00000000 riched20!CTxtRange::CleanseAndReplaceRange+0x833
00bdfa78 74e5b8b2 00bdfe4c 00bdfee4 00000000 riched20!CLightDTEngine::ReadPlainText+0x2c7
00bdfd30 74e7ce5d 00bdfe4c 00001011 00bdfee4 riched20!CLightDTEngine::LoadFromEs+0x287
00bdfeb8 74e7e87b 00000449 00001011 00bdfee4 riched20!CTxtEdit::TxSendMessage+0x11c9
00bdff1c 00852b90 000ba5b0 00bdff34 00000130 riched20!CTxtEdit::Open+0x244
WARNING: Stack unwind information not available. Following frames may be wrong.
00bdff44 00853e4e 00bdff64 b7ee486a 7ffdd000 rtffilt!DllCanUnloadNow+0xaff
00bdff74 00853eb8 b7ee48b2 7ffdd000 7ffdf000 rtffilt!DllRegisterServer+0x1af
00bdffac 00853f11 00bdffec 7c80b683 0007e384 rtffilt!DllRegisterServer+0x219
00bdffb4 7c80b683 0007e384 7ffdd000 7ffdf000 rtffilt!DllRegisterServer+0x272
00bdffec 00000000 00853f04 0007e384 00000000 kernel32!BaseThreadStart+0x37

Neither file handle 0x14c nor file handle 0x2a0 is immediately visible in this stack trace. Presumably the Rich Edit 2.0 library is dealing with the RTF stream in memory. Once again I am bitten by the absence of the RTF filter’s symbol and FPO information. If the stack trace had not been black-holed in the middle, perhaps associating the ongoing processing with an handle would have been easier. I realize in this case only one RTF file is open in the offending process, but I was still interested in how one could associate this operation with a specific file if the filter had been written to process many RTF files in bulk, simultaneously.

While we can’t readily see the handle’s association with the current Rich Edit processing, we can at least find old, nested function calls on the stack that used it. We can search the stack for the 0x2a0 handle, for example (little endian byte order):

0:006> s @esp L1000 a0 02
00bdc994 a0 02 00 00 00 00 00 00-00 00 00 00 00 00 00 00 ................

Let’s have a look at the stack in the vicinity of the 0x2a0 value we found to determine whether this is a false positive or actual use of the handle:
0:006> dps 0x00bdc994-20
00bdc974 74e64eab riched20!CTxtRange::Update_iFormat+0x15c
00bdc978 00000001
00bdc97c 00000000
00bdc980 00000000
00bdc984 00bdfe4c
00bdc988 00000000
00bdc98c 7c90e288 ntdll!NtReadFile+0xc
00bdc990 7c801875 kernel32!ReadFile+0x16c
00bdc994 000002a0
00bdc998 00000000
00bdc99c 00000000
00bdc9a0 00000000
00bdc9a4 00bdc9cc
00bdc9a8 00bdca78
00bdc9ac 00000ffe
00bdc9b0 00000000
00bdc9b4 7c801898 kernel32!ReadFile+0x1ca
00bdc9b8 00bdfe4c
00bdc9bc 00bdfee4
00bdc9c0 00000000
00bdc9c4 00000001
00bdc9c8 00bdfee4
00bdc9cc 00000000
00bdc9d0 00000ffe
00bdc9d4 00000000
00bdc9d8 00000000
00bdc9dc 00bdc9b8
00bdc9e0 00000000
00bdc9e4 00bdff68
00bdc9e8 7c839aa8 kernel32!_except_handler3
00bdc9ec 7c809ba8 kernel32!`string'+0xc
00bdc9f0 ffffffff

We can see the remains of a call to the ReadFile API with a file handle to the suspected RTF file on the offending thread’s stack. Had this process had more threads processing RTF files, we’d be able to make the connection between the CPU hogging thread and the problematic RTF file this way. If the process was using a thread pool and each thread was being recycled to process an RTF file after another, results could have still been inconclusive, however.

For those without Nero 7 on their systems who are wondering what’s the deal with that RTF file, it is, in fact, not a rich text format file at all. Ahead made the mistake of calling a non-RTF file with the RTF file extension, persuading the Indexing Service’s RTF filter to attempt to index it. Given all the talk from Microsoft these days on their Security Development Lifecycle, Threat Modeling and all the fuzzing they claim to be doing, it is curious that relatively modern code like the search engine’s RTF filter would happily go into a CPU hogging frenzy on an RTF file that doesn’t even bother with an RTF opening tag. Some basic sanity validation is amiss here.

I looked it up out of curiosity and apparently CDI_IMAG.RTF has something to do with the now very defunct CD-i technology, which White Book classic VCDs are required to be compatible with. I was surprised there’s actually something older than those Video CDs…

For the time being, I renamed the file so its extension won’t be RTF, to prevent the RTF filter from embarrassing itself and killing my box while doing so. Hopefully Nero won’t mind as long as I stick to 21st century technology.

Another happy ending.

WinHttpRequest performance woes

As I mentioned the other day, the WinHttpRequest automation object provides access from scripting to the WinHTTP client library. As I was looking into it for retrieving relatively large files for a testing scenario (e.g., the Sun JDK’s 80MB installation file) I couldn’t help but notice a clear flaw in its interface. Access to the response body for a request is provided by the ResponseText property, which simply provides a string containing the response. There’s also a similar ResponseBody property which is an array but otherwise identical. Unfortunately, the ResponseStream property, which returns an IStream reference, is not usable from scripting, which only supports dispinterfaces.

This is a problem because receiving large responses from the server requires maintaining their entire body in memory. Clearly, using ResponseText to download a 4 GB DVD ISO isn’t a great idea. So I figured as long as one keeps to reasonable transfer sizes, not massively ballooning the process working set, one should be fine.

It was only a short while later when a friend let me know about a problem he had in his own script making server requests. He was attempting to retrieve a file a few dozens megabytes in size from a web server, but instead of the retrieval completing promptly as expected, the Windows Script Host process became a CPU hog and continuously crept up in memory usage.

As we were trying to reproduce the problem in a as simple as possible environment, we saw that simply downloading a large file seemed to work promptly and as expected. Granted, memory usage was significant due to the property issue mentioned above, but the system had plenty of RAM and this was not an issue.

After some investigation the difference in the problematic scenario was identified. The transfer of the file was provided by a server generated dynamic page, where server code would dynamically write the contents of the requested file into the response stream.

Attaching the debugger to the running WSH process and examining the stack provides a big clue as to what is going on:
0:006> ~0 k 30
ChildEBP RetAddr
0013f004 7c81248c ntdll!RtlReAllocateHeap+0xc36
0013f05c 77506778 kernel32!GlobalReAlloc+0x17a
0013f078 77506802 ole32!CMemStm::SetSize+0x37
0013f09c 4d527444 ole32!CMemStm::Write+0x76
0013f0d8 4d527ca8 WINHTTP!CHttpRequest::ReadResponse+0x12b
0013f108 4d5248d9 WINHTTP!CHttpRequest::Send+0x1b9
0013f170 6fe9fcf0 WINHTTP!CHttpRequest::Invoke+0x359
0013f1ac 6fe9fc5d jscript!IDispatchInvoke2+0xb5
0013f1e8 6fe9fd78 jscript!IDispatchInvoke+0x59
0013f25c 6fea6c3c jscript!InvokeDispatch+0x90
0013f2a0 6fe9fab8 jscript!VAR::InvokeByName+0x1c2
0013f2e0 6fe9efea jscript!VAR::InvokeDispName+0x43
0013f304 6fea6ff4 jscript!VAR::InvokeByDispID+0xfd
0013f3bc 6fea165d jscript!CScriptRuntime::Run+0x16bd
0013f3d4 6fea1793 jscript!ScrFncObj::Call+0x8d
0013f444 6fe8da72 jscript!CSession::Execute+0xa7
0013f494 6fe8beba jscript!COleScript::ExecutePendingScripts+0x147
0013f4b0 0100220a jscript!COleScript::SetScriptState+0xf1
0013f4bc 0100217d cscript!CScriptingEngine::Run+0xb
0013f4d0 01001f34 cscript!CHost::RunStandardScript+0x85
0013f708 010027fc cscript!CHost::Execute+0x1f0
0013fcac 010024de cscript!CHost::Main+0x385
0013ff50 010025e6 cscript!main+0x6d
0013ffc0 7c816fd7 cscript!_mainCRTStartup+0xc4
0013fff0 00000000 kernel32!BaseProcessStart+0x23

As the WinHttpRequest object is reading the response from the server side script, it is naively writing it to an internal OLE memory stream (presumably the product of CreateStreamOnHGlobal). As more and more data is received, the memory stream is resized to accommodate it. We can track the stream’s growth pattern with a suitable breakpoint on the SetSize method. It is documented as receiving a ULARGE_INTEGER, an unsigned 64-bit integer, specifying the stream’s desired size. Therefore, we expect the argument to be 8 bytes above the position of the stack pointer when we enter the function, on x86. A 64-bit integer is also known as a quadword. The debugger command is hence:
bp ole32!CMemStm::SetSize ".printf \"SetSize called with size %d\", qwo(@esp + 8); .echo; gc"

If we promptly resume execution and let the script run for a while we see something similar to:
SetSize called with size 10650474
SetSize called with size 10658666
SetSize called with size 10666858
SetSize called with size 10675050
SetSize called with size 10683242

And so on… some quick arithmetic indicates that the numerous SetSize calls are apart by 8,192 bytes.

Resizing the stream by 8KB at a time for a file dozens of megabytes in size is clearly not a winning strategy. As can be seen in the initial stack trace, this results in a heap reallocation frenzy. As contiguous room for block resize on the heap is exhausted, the stream resize must be satisfied by allocating a whole new block and moving all the previous stream data there. As the stream grows, the whole process slows down exponentially. Since the block in question is a big one, it resides in a “virtual block” on the heap (a block of address space provided by VirtualAlloc rather than a smaller one from the heap’s free lists) and is presumably moved whenever there is not enough contiguous address space at the virtual block’s present position to extend it.

Let us examine the heap to confirm this. We can suspend execution at ole32!CMemStm::SetSize and proceed to kernel32!GlobalReAlloc. The first parameter to the latter is the handle to the heap allocation:
Breakpoint 0 hit
eax=00000000 ebx=00000000 ecx=006f5fa4 edx=774eee94 esi=00184790 edi=00002000
eip=77506741 esp=0013f07c ebp=0013f09c iopl=0 nv up ei pl zr na pe nc
cs=001b ss=0023 ds=0023 es=0023 fs=003b gs=0000 efl=00000246
ole32!CMemStm::SetSize:
77506741 8bff mov edi,edi
0:000> bp kernel32!GlobalReAlloc
0:000> g
Breakpoint 1 hit
eax=00184d98 ebx=00000000 ecx=006f5fa4 edx=00184a14 esi=00184790 edi=006f5fa4
eip=7c8123b9 esp=0013f060 ebp=0013f078 iopl=0 nv up ei pl nz na pe nc
cs=001b ss=0023 ds=0023 es=0023 fs=003b gs=0000 efl=00000206
kernel32!GlobalReAlloc:
7c8123b9 6a24 push 24h
0:000> kb 1
ChildEBP RetAddr Args to Child
0013f05c 77506778 008e000c 006f5fa4 00002002 kernel32!GlobalReAlloc

If we proceed to examine the process default heap (at 0x00150000 in my case) and its virtual allocation list (we can ignore allocations from the linked lists below, they are for much smaller allocations) we can see the interesting block:
0:000> !heap -a 00150000
Index Address Name Debugging options enabled
1: 00150000
Segment at 00150000 to 00250000 (00035000 bytes committed)
Segment at 016D0000 to 017D0000 (00001000 bytes committed)
Flags: 00000002
ForceFlags: 00000000
Granularity: 8 bytes
Segment Reserve: 00200000
Segment Commit: 00002000
DeCommit Block Thres:00000200
DeCommit Total Thres:00002000
Total Free Size: 000001ee
Max. Allocation Size:7ffdefff
Lock Variable at: 00150608
Next TagIndex: 0000
Maximum TagIndex: 0000
Tag Entries: 00000000
PsuedoTag Entries: 00000000
Virtual Alloc List: 00150050
017d0000: 006f4000 [6f3fa4] - busy (2b) (Handle 008e000c), user flags (1)
...

If we let execution proceed for a while and examine the heap again, we see something like:
0:000> !heap -a 00150000
...
Virtual Alloc List: 00150050
02080000: 008ac000 [8abfa4] - busy (2b) (Handle 008e000c), user flags (1)
...

Notice the virtual allocation for the global heap handle has moved elsewhere in the interim. Originally at virtual address 0x017d0000, it is now at 0x02080000. Further examination shows that this happens repeatedly. In fact, examining the position of the block between calls of GlobalReAlloc shows how it alternates between a small set of virtual addresses, back and forth, as its size is modified. For a reason unclear without diving into deeper heap and virtual allocation internals, the virtual address block seems to change its location even when it can be extended to the new size at its present position, unlike what I speculated above.

Examination of the server response headers for the static page case and the dynamic case provide an insight to the difference in performance behavior. In the static case, the server, knowing in advance the size of the response, can provide the client with a Content-Length header. The header can be used by the client to allocate a block, huge as it may, in one swoop. Despite the slight memory pressure, this happens practically instantaneously.

In the dynamic case, the server cannot guess how much output the page is going to feel like providing. No Content-Length header is emitted, and the client’s naive stream memory allocation strategy comes to light and kills performance.

While WinHttpRequest’s naive reliance on COM’s memory stream allocation strategy is primarily at fault for this issue, the heap manager’s own deficiencies should not be overlooked. I believe its behavior for large block reallocations (virtual blocks) could be somewhat improved to mitigate inefficient clients.

This issue can be mitigated at the server side by ensuring a Content-Length header is always provided in responses. This can be tricky to calculate and I haven’t looked into whether lying (i.e., providing a Content-Length longer than the actual response just to get the client to preallocate memory) works correctly.

While I was looking into this issue, I was hoping to take advantage of Visual Studio 2005 Team Edition’s integrated profiler. I assumed it would be ideal for detecting and clearly illustrating an obvious CPU hog such as this case. However, after short examination, I must express my disappointment.

Visual Studio 2005’s profiler, like most other profilers, has two operation modes. Sampling and Instrumentation. Sampling means examining thread stacks at an interval and noting current execution state, basically. Instrumentation means binary patching of functions with code that notes execution flow has reached them. Instrumentation is therefore a means of “zooming in” on functions of interest, seeing as it is both more precise and has greater overhead.

I initially tried the Sampling mode. Examining its output in the impressive GUI, Performance Explorer, I noticed the sample didn’t clearly capture what was going on. I figured by instrumenting the functions I was suspicious of (the heap reallocator, etc.) a clear graphical illustration would be possible.

It was then I discovered that the profiler’s Instrumentation mode requires full debugging information. Since I am profiling native code that is not my own running under WSH, clearly all I have is public PDBs. Instrumentation refused to work in this scenario, making it useless for these kinds of cases.

This is too bad because other profilers seem content with instrumenting just exports or functions as they are indicated in public PDBs. I hope to see this as a feature in future versions of the Visual Studio profiler, since I like the slick Performance Explorer.

For those of you implementing memory stream classes and the like, keep in mind that sometimes an imperialistic memory allocation strategy is a good thing.

Think before you act, allocate before you write.