Bridging the gap between native functions and Active Scripting with a COM-based FFI wrapper

A few weeks ago I was following the excitement as WebKit, Safari’s browser engine, incrementally passed more and more of the Acid3 standards test. Wondering whether the Gecko (Mozilla Firefox’s rendering engine) folks were also busy with that, I followed both the Planet WebKit and Planet Mozilla feeds for a few weeks.

Sometime in April I stumbled upon this post in Planet Mozilla. It discussed recent improvements to JSctypes. It was the first time I had heard of this project. JSctypes is an XPCOM component for Mozilla that allows calling native (or “foreign”) functions from privileged JavaScript code. Both the interface and name are inspired by the Python ctypes module, included with the standard distribution since version 2.5.

If you haven’t heard of ctypes, take a minute to get acquainted. It’s a great library that allows you to call native C functions dynamically from Python code. Its interface really feels at home in a dynamic language. Most of the time, you can just call functions without declaring the number and types of the arguments they receive. DLL modules can be accessed as attributes of the loader object matching their calling convention (e.g., ctypes.windll.kernel32 or ctypes.cdll.msvcrt), and script functions can be passed as callbacks to the native APIs being invoked.

JSctypes brings Python’s ctypes concept to Mozilla’s JavaScript implementation. At the base of its object model, Mozilla has a COM-like architecture called XPCOM. Usually, calling native functionality from JavaScript is achieved by exposing an XPCOM component to script. However, this approach has a clear disadvantage: every conceivable piece of native functionality needs to be wrapped, case by case, by a compiled XPCOM component. Now, with JSctypes, privileged Mozilla JavaScript code (obviously a native call interface is not appropriate in the context of untrusted web content) can call most native functions with relative ease and without a compiled component, aside from JSctypes itself.

A native function interface for a dynamic language needs to deal with the relatively complex task of setting up the call stack frame for an arbitrary native API, according to argument counts, types and alignment requirements deduced dynamically at script execution time. As the interface layer seeks to support a broader and broader variety of argument types (basic data types, then structures, arrays, then callback functions, etc.) the task becomes increasingly complicated and difficult.

I reviewed both JSctypes and Python’s ctypes in their respective source code repositories and learned that they share a common implementation of the lowest component in such a native interface layer. It is called libffi, the Foreign Function Interface library, and seems to originate from the gcc project. Since libffi is designed to be compiled with a UNIX-style toolchain (it has AT&T syntax assembly files, for instance) and Python needs to compile with Visual C++, the author of ctypes, Thomas Heller, ported an old revision of the library to Visual C++.

Usage of libffi is pretty simple. You initialize an ffi_cif (call interface) structure with the ABI type, return value type, argument count and argument types of the native function to be invoked by using the ffi_prep_cif function. Later, and repeatedly as needed, ffi_call is used to call the actual function with a specific set of argument values, passed in as an array of pointers, and to retrieve the value returned from the native function.

I thought JSctypes was really cool, and it then occurred to me that it should not be prohibitively difficult to implement a similar adaptation layer for Microsoft’s JScript and possibly other Active Scripting languages.

In my mind’s eye, I envisioned an in-process COM server accessible to Active Scripting clients (implementing IDispatch and associated with a ProgID) and providing a call interface to arbitrary native functions.

I created an ATL COM DLL and gave the coclass the ProgID “FunctionWrapper.FunctionWrapper.1”. I knew you could call JScript functions with fewer or more arguments than their definitions expect, and figured that pulling off the same in a native method exposed to the script would be ideal. After a short investigation I learned of the IDL vararg attribute, which accomplishes just what I had in mind. At this point, the exposed interface looked like this:

[
object,
uuid(EBA4A11F-969B-4413-9D4E-FB5CB21039FC),
dual,
nonextensible,
helpstring("IFunctionWrapper Interface"),
pointer_default(unique)
]
interface IFunctionWrapper : IDispatch {
[id(1), helpstring("method CallFunction"), vararg] HRESULT CallFunction([in] SAFEARRAY(VARIANT) args, [out, retval] VARIANT* retVal);
};

The CallFunction method of the FunctionWrapper object is callable by JScript clients with arguments of arbitrary count and type of their choosing. As a simplistic start, I had the first argument specify a string identifying the native function, in the Windbg-inspired syntax of “module!export”, e.g. “user32!MessageBoxW”. The rest of the arguments would be passed to the native function.
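As a rough illustration of the “module!export” convention, here is a minimal sketch of how such a specifier could be split. It is written in script for brevity, while the actual wrapper would do this in native code. The helper name parseFunctionSpec and the shape of its result are assumptions made for this example.

```javascript
// Hypothetical helper: split a WinDbg-style "module!export" specifier.
function parseFunctionSpec(spec) {
    var bang = spec.indexOf("!");
    // Reject a missing separator, an empty module name or an empty export name.
    if (bang <= 0 || bang === spec.length - 1) {
        throw new Error("Expected \"module!export\", got: " + spec);
    }
    return {
        module: spec.substring(0, bang),
        exportName: spec.substring(bang + 1)
    };
}

var target = parseFunctionSpec("user32!MessageBoxW");
// target.module is "user32" and target.exportName is "MessageBoxW".
```

The native side would then hand the two parts to the module loading and export lookup steps.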

I proceeded to implement CFunctionWrapper::CallFunction. The steps taken by the method would be:

  1. Ensure at least the first argument (function to invoke) was given.
  2. Ensure the first argument specifies a module and an export, load the module and retrieve the address of the export.
  3. Thunk the VARIANT arguments received by the method to libffi-style argument and types arrays.
  4. Invoke ffi_prep_cif to prepare the call and call the native function with ffi_call.
  5. Thunk the return value of the function into a VARIANT usable by script.

Much of the work here is concise, but step 3 consists of relatively mundane boilerplate, translating between two varieties of dynamically typed data: Microsoft’s VARIANT and libffi’s ffi_type. I’ll illustrate with a short snippet:

ffi_type** argumentTypes = ...; // Dynamically allocated by argument count
void** ffiArgs = ...;           // One pointer per native argument

// Argument 0 names the function, so native arguments start at index 1.
for (ULONG i = 1; i < arguments.GetCount(); i++)
{
    VARIANT& arg = arguments[i];
    switch (V_VT(&arg)) {
    case VT_UI1:
        argumentTypes[i - 1] = &ffi_type_uint8;
        ffiArgs[i - 1] = &(V_UI1(&arg));
        break;
    ...
    case VT_UI4:
        argumentTypes[i - 1] = &ffi_type_uint32;
        ffiArgs[i - 1] = &(V_UI4(&arg));
        break;
    }
}

Similar work is needed for other integer and floating-point types, strings and pointers.

Initially, I hard-coded a return value type of unsigned 32-bit integer and the stdcall calling convention to avoid providing an interface for selecting those parameters. I registered the DLL and tested the following script with WSH:

var functionWrapper = new ActiveXObject("FunctionWrapper.FunctionWrapper");
var retVal = functionWrapper.CallFunction("user32!MessageBoxW", 0, "text", "caption", 1);
WScript.Echo(retVal);

The final argument, 1, is the value of the MB_OKCANCEL flag for MessageBox. I used the W variety of the API since I implemented hardcoded UTF-16 marshalling for VT_BSTR type variants, which is the form in which strings arrive from JScript.

I was quite content when the test script not only failed to crash the WSH process, but also presented a message box and passed the API’s return value back to JScript.

At this point I considered what it would take to extend this solution beyond the basic value types. Arrays first came to mind. Such support, I imagined, would consist of copying an incoming SAFEARRAY argument into a native array and supplying the native array pointer to the native function. If “out” array argument support is desired, copying back into the SAFEARRAY would be required post-invocation, right after ffi_call.

Next in line were structs. These would be less straightforward. The problem with filling a JScript “object” (read, hash table) with a struct’s fields is that its property ordering is not guaranteed to match the order of the fields in the struct’s data layout. Using a JScript array instead would provide ordering, although it wouldn’t be very nice looking.
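To make the array idea concrete, here is a minimal sketch of how an ordered list of fields could describe a struct and yield field offsets. The type names, the layoutStruct helper and the rule that each scalar is aligned to its own size are assumptions for illustration only. They are not taken from libffi or from the wrapper described here.

```javascript
// Sizes in bytes for a few illustrative type names (hypothetical, not libffi's).
var TYPE_SIZES = { uint8: 1, uint16: 2, uint32: 4, float64: 8 };

// Round offset up to the next multiple of alignment.
function alignUp(offset, alignment) {
    return Math.ceil(offset / alignment) * alignment;
}

// fields is an ordered array of [name, typeName] pairs, so declaration order is explicit.
function layoutStruct(fields) {
    var offset = 0, maxAlign = 1, offsets = {};
    for (var i = 0; i < fields.length; i++) {
        var name = fields[i][0];
        var size = TYPE_SIZES[fields[i][1]];
        if (!size) {
            throw new Error("Unknown type: " + fields[i][1]);
        }
        offset = alignUp(offset, size); // assumption: alignment equals the scalar's size
        offsets[name] = offset;
        offset += size;
        if (size > maxAlign) {
            maxAlign = size;
        }
    }
    // The total size is padded to the largest member alignment.
    return { offsets: offsets, size: alignUp(offset, maxAlign) };
}

var layout = layoutStruct([["flag", "uint8"], ["count", "uint32"], ["id", "uint16"]]);
// layout.offsets is { flag: 0, count: 4, id: 8 } and layout.size is 12.
```

An array of pairs is uglier than an object literal, but it keeps the field order under the caller’s control.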

The final type of argument I considered, and arguably the most important, is callbacks. Many APIs take function pointers as arguments. Consider EnumWindows, which invokes EnumWindowsProc on every window found. A native call interface should provide a capability to implement the callback as a JScript function and pass it as seamlessly as possible during the native invocation.

Fortunately, libffi provides built-in support for callbacks, calling them “closures” in its terminology. An ffi_cif structure is initialized to describe the prototype of the callback function, in native eyes, as if it were going to be called with ffi_call. ffi_prep_closure takes such a prototype description, a function pointer and a closure “trampoline buffer”, as I call it. The trampoline buffer, expected to be allocated in writable, executable memory (native code would later jump into its address), takes care of calling the provided function pointer. The twist is that the function pointer, instead of being called with a dynamic prototype, always receives its arguments in the form of libffi argument arrays.

The native callback function wrapped by the closure trampoline buffer would presumably fill a SAFEARRAY of variants with the arguments and invoke a script function. A wrapper callback coclass could be provided to the script and allow for more elaborate stuff like out parameters and the like. An instance of the callback object would wrap a JScript function object and invoke its apply method using the IDispatch interface as calls come in through the closure. It is unclear what a generic solution that doesn’t rely on functions being objects and having the apply method would look like, so at this point this wrapper callback concept is only suitable for JScript.
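The script-facing half of that idea can be sketched in plain script. Below, CallbackWrapper is a hypothetical stand-in for the proposed callback coclass, and simulateEnumWindows is a made-up driver that imitates how EnumWindows calls its callback until it returns false. No native code or libffi closure is involved.

```javascript
// Hypothetical stand-in for the proposed wrapper callback coclass.
function CallbackWrapper(scriptFunction) {
    this.scriptFunction = scriptFunction;
}

// args plays the role of the SAFEARRAY of variants filled in by the native closure handler.
CallbackWrapper.prototype.invoke = function (args) {
    return this.scriptFunction.apply(null, args);
};

// Simulated native caller (hypothetical): stops as soon as the callback returns false.
function simulateEnumWindows(callback, lParam) {
    var handles = [1, 2, 3, 4];
    for (var i = 0; i < handles.length; i++) {
        if (!callback.invoke([handles[i], lParam])) {
            return false;
        }
    }
    return true;
}

var seen = [];
var enumProc = new CallbackWrapper(function (hwnd, lParam) {
    seen.push(hwnd);
    return hwnd !== 3; // ask the enumeration to stop at handle 3
});
var completed = simulateEnumWindows(enumProc, 0);
// seen is [1, 2, 3] and completed is false.
```

In the real design, invoke would be reached through IDispatch from the native function registered with the closure.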

So far I have only implemented the basic value types, and even that with code of such poor quality that I am not uploading it for the time being. The devil is in the details, and supporting the description of complex argument types would require quite a bit of work. Hopefully someday I, or perhaps an enthusiastic reader, will get around to coding and publishing a full-fledged implementation of a native call interface. Embedding such an interface in an Active Scripting host, in scenarios where the hosted scripts enjoy full trust, could provide endless extensibility possibilities for the script author.

Hey, cooler than P/Invoke…

A JScript interactive interpreter shell for the Windows Script Host

A few weeks ago a friend of mine who was starting to code WSH scripts asked me if there was an interactive shell for it. I told him I didn’t know of one, but it got me thinking. Python users are well aware of its interactive shell, and indeed, take it for granted. The ability to execute statements immediately, one line at a time, is pretty fundamental. Yet, Windows Script Host offers no such built-in functionality.

As a new WSH script writer, I would often find myself in a cycle of Edit, Save, Run, quite similar to the Edit, Save, Compile cycle of native code, with small snippets or even one-liners. As my scripts exceeded a certain threshold of complexity, I would find myself using the script debugger. Unfortunately, since it provides a read-only view of the debugged script, when I wanted to make my changes and test them, I would have to switch to the editor window, make my changes and start the session all over again.

I recall one of my first programming experiences, around the age of 9. I was toying with my brother’s old Atari 800XL, initially in BASIC. The machine had a BASIC interpreter built into its ROM and had a measly 64K of memory. When it was turned on, a friendly “READY” banner written in white over a blue background greeted you to the BASIC interpreter. The ability to interpret statements for rapid modeling was considered so fundamental it was this, not a disk operating system, that was the core of the machine.

Fast forward back to the present. My friend’s question had me searching for a solution. I did not find an interpreter targeting WSH, but I did find a variety of JavaScript shells for the web browser, like this one. These are good candidates for brushing up on the HTML DOM, but are less useful to those using WSH. For instance, attempting to model automation controllers quickly brings you into the realm of warnings and denials from the browser’s security apparatus. In the case of the specific shell in question, its approach of having the user use Shift-Enter for multi-line entry was inconvenient, since you had to keep doing so until your code block was complete.

JScript lends itself well to implementing a self-hosted shell through the “eval” function. As I was examining the input mechanisms available to a command-line WSH script, I saw that the WScript.StdIn object was a TextStream, only supporting newline-terminated input. This meant I could not implement the same Shift-Enter based approach for multi-line input used by the browser-hosted shell mentioned above.

During my search, I also found two JavaScript shells that are not browser-based but do not target the Windows Script Host either. One was a part of SpiderMonkey, which is Mozilla’s classic JavaScript implementation (which is set to be retired and replaced by the JIT-based Tamarin, the open source version of Adobe Flash’s ActionScript VM, in future versions of Firefox). The other was a part of Rhino, an implementation of JavaScript in Java.

I examined their source to determine what their approach to multi-line input was. It appeared that both the SpiderMonkey and the Rhino shells used the underlying script language implementation’s functionality for determining whether a given string is a “compilable entity.” They would keep on reading lines until that condition was met.

Unfortunately, it did not seem like I could adopt a similar approach. Calling “eval” repeatedly until successful is problematic. Even if I were to implement the shell in native code using the Active Scripting hosting interfaces instead, it did not appear as though IActiveScript or the related interfaces provided a similar “compile testing” method.

Defeated, I opted for a simple approach where a blank line initiates multi-line input and two consecutive blank lines terminate it.
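In isolation, the blank-line rule can be sketched as a pure function over lines that have already been read. The name splitSnippets and the choice to strip the terminating blank lines are assumptions of this sketch. It illustrates the rule and is not a copy of the shell’s loop.

```javascript
// Group input lines into snippets: a non-blank line is a snippet on its own,
// while a blank line opens a block that runs until two consecutive blank lines.
function splitSnippets(lines) {
    var snippets = [];
    var i = 0;
    while (i < lines.length) {
        var line = lines[i++];
        if (line !== "") {
            snippets.push(line);
            continue;
        }
        var block = [];
        var blanks = 0;
        while (i < lines.length && blanks < 2) {
            line = lines[i++];
            blanks = (line === "") ? blanks + 1 : 0;
            block.push(line);
        }
        // Drop the blank lines that terminated the block.
        while (block.length > 0 && block[block.length - 1] === "") {
            block.pop();
        }
        snippets.push(block.join("\n"));
    }
    return snippets;
}

var snippets = splitSnippets([
    "1 + 1",
    "",
    "function f() {",
    "  return 2;",
    "}",
    "",
    "",
    "f()"
]);
// snippets holds three entries: "1 + 1", the three-line function, and "f()".
```

A single blank line inside a block is kept, so code can still be spaced out for readability.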

Here is the shell; pardon the coarse, unpolished illustration code:

function hex(n) {
    if (n >= 0) {
        return n.toString(16);
    } else {
        n += 0x100000000;
        return n.toString(16);
    }
}
var scriptText;
var previousLine;
var line;
var result;
while(true) {
    WScript.StdOut.Write("jscript> ");
    if (WScript.StdIn.AtEndOfStream) {
        WScript.Echo("Bye.");
        break;
    }
    line = WScript.StdIn.ReadLine();
    scriptText = line + "\n";
    if (line === "") {
        WScript.Echo(
            "Enter two consecutive blank lines to terminate multi-line input.");
        do {
            if (WScript.StdIn.AtEndOfStream) {
                break;
            }
            previousLine = line;
            line = WScript.StdIn.ReadLine();
            line += "\n";
            scriptText += line;
        } while(previousLine != "\n" || line != "\n");
    }
    try {
        result = eval(scriptText);
    } catch (error) {
        WScript.Echo("0x" + hex(error.number) + " " + error.name + ": " +
            error.message);
    }
    if (result != null) { // also print falsy results such as 0, false or ""
        try {
            WScript.Echo(result);
        } catch (error) {
            WScript.Echo("<<<unprintable>>>");
        }
    }
    result = null;
}

This is simple enough and is quite useful for the majority of cases. It does have its disadvantages, however. Notably, the surrounding code of the shell is leaked into the namespace accessible by the interpreted snippets. For example, typing “hex” exposes the error code conversion function. However, for my needs, I found this quite satisfactory.
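One possible way to reduce the leak, sketched here under assumptions rather than tested against WSH, is to keep the shell’s helpers inside a function and evaluate snippets through a function built with the Function constructor, whose body sees the global scope but not the shell’s locals. The names shell and evaluate are made up for this example.

```javascript
var shell = (function () {
    // Helper that stays private to the shell.
    function hex(n) {
        return (n >= 0 ? n : n + 0x100000000).toString(16);
    }

    // A function created with the Function constructor does not close over
    // the locals above, so snippets evaluated inside it cannot see "hex".
    var evaluate = new Function("code", "return eval(code);");

    return { evaluate: evaluate, hex: hex };
})();

var visible = shell.evaluate("typeof hex"); // "undefined" unless a global hex exists
var sum = shell.evaluate("1 + 2");
```

This comes with trade-offs: the snippet can still see the evaluator’s own “code” parameter, and a “var” declared in one snippet becomes a local of the evaluator, so it does not persist to the next line, while assignments to undeclared names still create globals.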

If anyone can offer an improved implementation, I’d be happy to see it in the comments.

Save this code to a file, like shell.js, and use “cscript shell.js” to start it. Multi-line input is performed as described above. Ctrl-Z can be used to quit.

A nice stunt you can pull with this is to wrap the shell in a .WSF referencing your favorite type libraries. For example, consider this shell.wsf:

<job>
    <reference object="Scripting.FileSystemObject" />
    <script language="JScript" src="shell.js" />
</job>

If you start a shell with “cscript shell.wsf”, the shell instance will have access to type library constants like “ForReading”, “ForAppending” and so forth.

Although I’m not much of a VBScript fan, I considered doing something similar for it, since it could be quite handy for testing those pesky automation objects that take SAFEARRAYs and are thus not that JScript friendly. However, VBScript’s distinction between expressions and statements (and its Eval function vs. the Execute and ExecuteGlobal statements) makes such a thing a bit more complicated. It is also not clear whether the interpreter should opt for executing statements using Execute or ExecuteGlobal, and in what cases. If anyone is up for implementing this, I’d love to see it.

Have fun.

No SxS love from the Windows Script Host?

I was automating a scenario with a WSH script the other day that required interaction with a web server. So naturally I figured I’d make use of the WinHttpRequest automation object which wraps the WinHTTP API.

Those familiar with WSH may share my great distaste for the fact that when it functions as an automation controller, the developer is expected to hard-code enumeration constants and the like (as “vars” in JScript or “Consts” in VBScript). I first encountered this ridiculous limitation when a friend showed me how he translated C# code that automated Microsoft Word to VBScript, and had to look up the various constants by hand, with Visual Basic 6’s Object Browser, which functions as a convenient type library viewer. By default, the script engine only uses the automation object’s IDispatch interface, leaving the chore of constant resolution to the caller.

So I was relieved when I found out that .wsf files, which are XML files that wrap scripts executed by WSH, support referencing a type library for the purpose of making available the constants used with a controlled automation object. I was a little disappointed to find out about their not so ideal performance characteristics, but that was not problematic in my case.

I figured I’d introduce a reference of the following form to the script:
<reference object="WinHttp.WinHttpRequest.5.1"/>

Not all was well, however. After introducing the change above, I noticed my script had stopped working on one of the systems. Invocation of the Windows Script Host failed, with WSH claiming to be unable to resolve the reference to the specified ProgID.

After looking into it I figured out the problem with that specific system was that it was running Windows Server 2003 rather than Windows XP. It seemed strange the newer Windows Server 2003 would have a regression like that. I continued investigating.

The first clue was that winhttp.dll, the DLL implementing the WinHTTP API, was MIA from Windows Server 2003’s system32 directory. Surely the API was not missing from the OS; MSDN clearly documents its presence. It was indeed there, albeit in a modified form: a native side by side assembly.

OK, so winhttp.dll is there, in an oddly named subdirectory somewhere in the winsxs store instead of system32. Still, I recalled from my previous interaction with SxS that side by side assemblies could expose COM objects to their clients. Examination of the manifest file for WinHTTP in Windows Server 2003’s winsxs store revealed that it was indeed doing so.

Microsoft documents that users of the flat WinHTTP C API under Windows Server 2003 should add winhttp.dll as a dependent assembly to the activation context of the client application, but this approach seemed inappropriate to me in the context of the WinHttpRequest automation object, since clients activate it by ProgID or GUID and do not load winhttp.dll directly. Making them aware of this relationship would be a serious breach of COM’s encapsulation.

I proceeded to write a test application in C++. It initialized COM and called CLSIDFromProgID to translate “WinHttp.WinHttpRequest.5.1” to a CLSID. If the translation succeeded, it would call CoCreateInstance on the returned CLSID and, if that worked out, QueryInterface for IDispatch and for IWinHttpRequest (defined in the Windows SDK’s httprequest.idl).

To my, I must admit, great surprise, the test application worked. The first surprising thing was that CLSIDFromProgID returned successfully, even though I specified a ProgID exposed by a SxS assembly. The ProgID was clearly absent from the HKEY_CLASSES_ROOT registry key in Windows Server 2003, in contrast to its presence there in Windows XP. Only if ole32.dll, the COM runtime, had specific knowledge of SxS and ability to perform a lookup in the winsxs store, would such a request be serviced successfully, I figured. However, no mention of this functionality could be found directly in CLSIDFromProgID’s documentation.

I was even more surprised that the CLSID returned by CLSIDFromProgID as the result of the lookup was NOT the CLSID of winhttp.dll! I couldn’t find the returned CLSID in the registry. However, when I promptly invoked CoCreateInstance, not only did the activation request succeed, I actually saw a Module Load event for winhttp.dll from the winsxs store in the debugger. I assume that the returned CLSID is part of some COM SxS integration magic.

OK, so my poor man’s automation controller implemented in C++ could obviously activate the WinHttpRequest object even in Windows Server 2003 with no knowledge of its new SxS semantics. It seemed odd that my script would fail to do the same, since I assumed similar mechanics were behind its resolution process for locating the type library.

The next thing I did was to try and run my script on Windows Vista. I figured the change to WinHTTP making it a SxS assembly introduced in Windows Server 2003 was incorporated into Microsoft’s latest OS, as well. Continuing the previous chain of surprises, the script suddenly worked.

The first difference between Windows Server 2003 and Windows Vista that I observed was that Windows Script Host was updated to version 5.7 in the new OS. My first theory was that the new WSH had corrected whatever implementation issue had prevented WSH 5.6 from locating SxS type libraries.

I looked it up and found out that only days earlier Microsoft had actually made a release of the new Windows Script Host 5.7 to down-level platforms. Untypically for Microsoft nowadays, they even made a release for Windows 2000. So now I had a chance to test my theory. I installed WSH 5.7 on the Windows Server 2003 system and reran the script. In yet another surprise, it didn’t work, the type library reference giving the same error as before. It seems my instincts are really off about all of this.

So there must be a different reason for the different behavior of Windows Server 2003 and Windows Vista. After examining the Vista system, it appeared the whole thing was a lot simpler than I had originally thought. Windows Vista was a strange hybrid of the Windows XP and Windows Server 2003 behaviors, with winhttp.dll being present both as a SxS assembly in its winsxs store and as a regular DLL in system32. Indeed, examination of HKEY_CLASSES_ROOT in the Vista registry resulted in the discovery of plain old ProgID registration for the non-SxS winhttp.dll. This is most likely the reason that the type library lookup succeeds in the Windows Vista system.

With these details at hand, I was finally able to find a discussion of this issue in a newsgroup. In that newsgroup thread, Microsoft’s Biao Wang acknowledges WSH’s lack of support for SxS type library references. The thread being an old one, the possibility of a fix being introduced in Windows Server 2003 Service Pack 1 was mentioned. However, considering the issue presented itself on the Windows Server 2003 system that had Service Pack 2 installed and that the latest WSH 5.7 still doesn’t support this down-level, it appears that the issue ended up remaining unresolved, for whatever consideration Microsoft had made on the matter.

The thread does mention a satisfactory workaround: reference the SxS type library by GUID and version instead of by object ProgID and it seems to work. I tried referencing the type library by GUID when the ProgID approach didn’t work on Windows Server 2003 originally, but that reference didn’t work either since I left out the “version” directive. Another happy ending.

Lovers of type library constant imports, rejoice!