P/Invoke Tips

Not very many C# programmers will ever need to do much with P/Invoke (Microsoft's technology for interoperation with legacy or native codebases), but for those of us who do, I've amassed a few little tips that aren't always included in the various tutorials found on the 'net.

If you're looking for a tutorial on using P/Invoke, this isn't the place to find it- there are already plenty of decent, in-depth discussions and examples on the internet just a Google search away; and they've already done a better job than I ever could. Instead, I'm hoping to make a quick cheat-sheet of some random parts of P/Invoke that I've used in making a game engine, and that might be useful to future programmers! So, without further introduction, here we go (in no particular order):

Exporting Methods

The first tip I'm going to give is something that I actually found surprisingly difficult (but not impossible) to get information about on the internet. There's a wealth of information about how to call pre-existing native methods from C# (e.g. anything in the C runtime libs or WINAPI)- in fact, there's even a whole website dedicated to it. But what if you're writing the native side too (e.g. a C++ or C library that exposes certain functions to be called from your managed code)? Here's the best way I found to expose an interface to be used from C# in your C++ code:

First, define a macro like this:

#define EXPORT extern "C" __declspec(dllexport)

Now you can use this macro to 'export' a method from C++ for use through P/Invoke. Here's an example from the game engine (slightly modified to keep things concise):

EXPORT INTEROP_BOOL RenderPassManager_ExecuteCommandList(ID3D11DeviceContext* immedContextPtr, ID3D11CommandList* commandListPtr) {
    RenderPassManager::ExecuteCommandList(immedContextPtr, commandListPtr);
    return INTEROP_BOOL_TRUE; // report success; error handling is omitted here for brevity
}

That is a function named RenderPassManager_ExecuteCommandList that returns an INTEROP_BOOL (more on that in a second) and takes two DirectX-related pointers as parameters. You can export as many methods like this as you want. This particular method can then be called from managed C# through a P/Invoke signature that looks like this:

[DllImport(NATIVE_DLL_NAME, CallingConvention = InteropUtils.DEFAULT_CALLING_CONVENTION,
    EntryPoint = "RenderPassManager_ExecuteCommandList")]
public static extern InteropBool RenderPassManager_ExecuteCommandList(
    DeviceContextHandle immediateContextHandle,
    IntPtr commandList
);

Notice how the CallingConvention is set to InteropUtils.DEFAULT_CALLING_CONVENTION. That must correspond to the calling convention the exported C++ function is compiled with. The calling convention basically specifies where and in which order the function parameters, return value, stack values and CPU registers are when calling a function, and we must match them up on both 'sides' of the code. Strictly speaking, extern "C" does not select a calling convention: it gives the function C linkage, which stops the C++ compiler from name mangling the exported symbol so that the EntryPoint string can find it. The convention itself is cdecl simply because that is MSVC's default for free functions (unless you change it with a compiler switch such as /Gz), and therefore InteropUtils.DEFAULT_CALLING_CONVENTION is set to CallingConvention.Cdecl. If you want to be explicit you can add __cdecl to the declaration as well, but use it in addition to extern "C" rather than instead of it- on its own it will not prevent name mangling, which is most likely not what you want.

Additionally, all this calling convention malarkey is purely for the benefit of 32-bit code. For 64-bit platforms, all options you try to set for the CallingConvention property are ignored and the x64 ABI is used which only has one usable calling convention (from the CLR's perspective, anyway).
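
To make the split between linkage and calling convention a bit more concrete, here's a minimal sketch of an export macro pair that spells out both. The names INTEROP_EXPORT, INTEROP_CALL and Interop_AddInts are made up for illustration (they are not from the engine), and the non-Windows branch is only there so the snippet also builds with other compilers:

```cpp
#include <cstdint>

// Hypothetical macros: linkage and calling convention are spelled out separately.
#if defined(_WIN32)
  #define INTEROP_EXPORT extern "C" __declspec(dllexport)
  #define INTEROP_CALL __cdecl   // only meaningful on 32-bit x86; accepted but ignored on x64
#else
  #define INTEROP_EXPORT extern "C"
  #define INTEROP_CALL
#endif

// extern "C" keeps the exported symbol unmangled ("Interop_AddInts"),
// so a DllImport with EntryPoint = "Interop_AddInts" can find it.
INTEROP_EXPORT std::int32_t INTEROP_CALL Interop_AddInts(std::int32_t a, std::int32_t b) {
    return a + b;
}
```

The matching C# declaration would use CallingConvention.Cdecl, exactly as in the DllImport example above.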

Concerning the rest of that method, the two parameters to the call from the C# side are basically just two pointers that will point to the relevant DirectX objects.

This is certainly the easiest way to export methods from C++/native code to C#/managed via P/Invoke that I found. Of course, there are other ways, and if you have any suggestions or improvements, please do leave a comment!

Bools and Numeric Types

So in the previous section you may have noticed my use of a return type named INTEROP_BOOL. This is because I like to return a true/false value to indicate whether or not an interop call failed. However, what is INTEROP_BOOL? Well, it's a macro (and probably ought to be a typedef, but ignore my egregious crimes for now ;)), and it looks like this:

#define INTEROP_BOOL char
#define INTEROP_BOOL_TRUE ((INTEROP_BOOL) 1)
#define INTEROP_BOOL_FALSE ((INTEROP_BOOL) 0)

So what gives? Why not just use bool?

Well, the answer is that in C++, the size of the bool type is implementation-defined, whereas the CLR mandates that the System.Boolean struct is exactly 1 byte. If you don't specify any particular marshalling, the C# bool will be marshalled to a WINAPI BOOL type, which is just a 4-byte integer.

All this marshalling on both sides presents a potential pitfall, especially one where your code may work on one machine but not on another. Ultimately, I found it easier to just avoid the issue altogether and simply pass a single byte, doing the conversion from Byte -> Boolean myself. So, on the C# side, I have the following struct defined:

[StructLayout(LayoutKind.Sequential, Pack = (int) InteropUtils.StructPacking.Safe, Size = sizeof(byte))]
public struct InteropBool : IEquatable<InteropBool> {
    public static readonly InteropBool TRUE = new InteropBool((byte) 1);
    public static readonly InteropBool FALSE = new InteropBool((byte) 0);
 
    private readonly byte value;
 
    private InteropBool(byte value) {
        this.value = value;
    }
 
    public bool Equals(InteropBool other) {
        return value == other.value;
    }
 
    public override bool Equals(object obj) {
        if (ReferenceEquals(null, obj)) {
            return false;
        }
        return obj is InteropBool && Equals((InteropBool)obj);
    }
 
    public override int GetHashCode() {
        return this ? TRUE.value : FALSE.value;
    }
 
    public override string ToString() {
        return this ? "true" : "false";
    }
 
    public static bool operator ==(InteropBool lhs, InteropBool rhs) {
        return lhs.Equals(rhs);
    }
 
    public static bool operator !=(InteropBool lhs, InteropBool rhs) {
        return !lhs.Equals(rhs);
    }
 
    public static implicit operator InteropBool(bool operand) {
        return operand ? TRUE : FALSE;
    }
 
    public static implicit operator bool(InteropBool operand) {
        return operand != FALSE;
    }
}

This, together with the corresponding INTEROP_BOOL macros defined on the C++ side, allows me to sidestep any issues involving the marshalling of bools. It's worth noting that it is semi-possible to correctly and safely marshal bools from .NET to C++ and back (for example by marking the parameter with [MarshalAs(UnmanagedType.I1)]); but this approach relies on the C++ "bool" being 1 byte, which is highly likely but is an implementation detail.
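
Since I admitted the macro ought to be a typedef, here's a rough sketch of what that could look like, with a compile-time check that the type really is one byte wide. The names InteropBoolT, kInteropTrue, kInteropFalse, ToInteropBool and FromInteropBool are illustrative only, and I've assumed std::uint8_t instead of char to avoid any signedness surprises:

```cpp
#include <cstdint>

// Hypothetical typedef-based replacement for the INTEROP_BOOL macros.
using InteropBoolT = std::uint8_t;
constexpr InteropBoolT kInteropTrue = 1;
constexpr InteropBoolT kInteropFalse = 0;

// Fails the build if the type is ever not exactly one byte wide.
static_assert(sizeof(InteropBoolT) == 1, "InteropBoolT must match the 1-byte C# struct");

// Any non-zero byte counts as true, mirroring the C# implicit conversion to bool.
constexpr bool FromInteropBool(InteropBoolT value) { return value != kInteropFalse; }
constexpr InteropBoolT ToInteropBool(bool value) { return value ? kInteropTrue : kInteropFalse; }
```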

And finally, you might (reasonably) decide that all this extra work is unnecessary, and that the default marshalling to WinAPI's BOOL type is fine. This will of course require you to use the Windows-specific headers in your code or perhaps use a type like int32_t- though this doesn't feel particularly neat to me, considering the type is meant to represent a boolean value.

When it comes to marshalling numeric types to and from P/Invoke, it may again surprise you to learn that the int type is not guaranteed to be 32 bits in C++; the standard only requires it to be at least 16 bits wide. The same "minimum rule" applies to most of the inbuilt numeric types. You may think this a pedantic quibble about theoreticals, but for example, the size of the long type varies between different OSs and architectures (32 bits on 64-bit Windows, 64 bits on most 64-bit Linux systems). Furthermore, I find it's just asking for trouble when you have to remember that long always means 'a 64-bit integer' in C#, but often just means 'a 32-bit integer' in C++. Other differences (e.g. byte/char) also exist.

Therefore, at least for P/Invoke (if not everywhere in your C++ applications!), I recommend using the explicit-width types in the <cstdint> header; such as int32_t and uint64_t (32-bit int and unsigned 64 bit int respectively). These types correspond much better to the CLR numeric types and you are less likely to create faults in your P/Invoke operations by inadvertently marshalling between mismatched numeric types.
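
As a small sketch of the difference, the checks below compile on any mainstream compiler, whereas the width reported by the (hypothetical) helper LongWidthInBits depends on where you build it:

```cpp
#include <climits>
#include <cstdint>

// The fixed-width types have the same size on every platform that provides them.
static_assert(sizeof(std::int16_t) == 2, "matches C# short");
static_assert(sizeof(std::int32_t) == 4, "matches C# int");
static_assert(sizeof(std::uint64_t) == 8, "matches C# ulong");

// The built-in types only promise minimum widths.
static_assert(sizeof(int) * CHAR_BIT >= 16, "int is at least 16 bits");
static_assert(sizeof(long) * CHAR_BIT >= 32, "long is at least 32 bits");

// Width of 'long' in bits for the current compiler and target:
// typically 32 with MSVC on Windows and 64 with GCC/Clang on 64-bit Linux.
constexpr int LongWidthInBits() { return static_cast<int>(sizeof(long) * CHAR_BIT); }
```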

Strings

Strings in C++ are kind of messy to use compared to .NET and other more modern languages: whereas in .NET you have only one string type, in C++ there is simultaneous support for different char types/encodings, as well as different ways of accessing and managing the memory allocated for strings, especially if you're working with WINAPI, which makes heavy use of C-style char-pointer string types with typedefs such as LPCWSTR, LPCSTR, etc.

Although UTF-16 is the worst of both worlds, that is what the CLR's System.String class uses internally (as well as a lot of other languages/runtimes; and most of Windows' modern API). Therefore, if we want the quickest possible marshalling of strings, we need to also use UTF-16 from the native side where possible.

Marshalling Strings from Managed to Native

First of all, when sharing any reference type from managed to native, you need to start thinking very carefully about how memory is shared or created, who owns it, and where it will be freed.
Through (informal) testing, I found that the most performant way to marshal strings is as follows:

[DllImport(NATIVE_DLL_NAME, CallingConvention = InteropUtils.DEFAULT_CALLING_CONVENTION,
    EntryPoint = "WindowFactory_SetWindowTitle")]
public static extern InteropBool WindowFactory_SetWindowTitle(
    WindowHandle windowHandle,
    [MarshalAs(InteropUtils.INTEROP_STRING_TYPE)] string windowTitle
);

This is a function to set the title of a window in our engine. InteropUtils.INTEROP_STRING_TYPE is UnmanagedType.LPWStr. On the C++ side, the corresponding exported function prototype looks like this:

EXPORT INTEROP_BOOL WindowFactory_SetWindowTitle(WindowHandle* windowHandle, INTEROP_STRING windowTitle);

And INTEROP_STRING is a macro (again, a typedef would be better in retrospect), defined as:

#define INTEROP_STRING const char16_t*

These options allow the .NET marshalling runtime to simply 'blit' the internal string pointer used by the CLR string to be used directly by the C++ code. You'll notice that my INTEROP_STRING macro defines a pointer to a const char16_t array. This is because the pointer provided through the P/Invoke mechanism is literally the address of the internal CLR unicode string, pinned on the managed heap for the duration of the unmanaged call. That means that if you modify the data, you will modify the string from the point of view of your managed program too, something that is usually impossible in .NET without unsafe code. Here's an example of the danger:

// C++
 
extern "C" __declspec(dllexport) void StringMutationExample(INTEROP_STRING myString) {
    const_cast<char16_t*>(myString)[0] = u'R'; // cast to a pointer type; u'R' is a char16_t literal (L'R' would be wchar_t)
}

// C#
 
[DllImport(NATIVE_DLL_NAME, CallingConvention = CallingConvention.Cdecl,
    EntryPoint = "StringMutationExample")]
public static extern void StringMutationExample(
    [MarshalAs(InteropUtils.INTEROP_STRING_TYPE)] string myString
);
 
public static void Test() {
    string myString = "Yolo";
    StringMutationExample(myString);
    Console.WriteLine(myString); // Prints 'Rolo' to the console!
}

This could potentially present a particularly nasty bug in your program if the given string is interned or shared at all (i.e. any non-local string, such as const or even just a public field). Furthermore, if the native code tries to modify the string by changing its length in any way, you will at best suffer a memory access violation, and at worst corrupt the state of your program in any number of ways.

At least by making the C++ type const, an explicit const_cast is required, which usually implies that the programmer writing the mutation is aware that they're doing something potentially unsafe (rather than it being an accident).
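
If the native side genuinely needs a modified version of the string, the safer route is to copy it into native-owned storage first and change the copy. Here's a minimal sketch; CopyInteropString and WithFirstCharReplaced are hypothetical helpers, and I'm assuming the incoming pointer is a null-terminated UTF-16 string, which is what LPWStr marshalling provides:

```cpp
#include <string>

// Hypothetical helper: copies the marshalled string into native-owned storage.
// A null pointer is treated as an empty string.
std::u16string CopyInteropString(const char16_t* source) {
    return source ? std::u16string(source) : std::u16string();
}

// Returns a modified copy; the caller's (managed) buffer is never written to.
std::u16string WithFirstCharReplaced(const char16_t* source, char16_t replacement) {
    std::u16string copy = CopyInteropString(source);
    if (!copy.empty()) {
        copy[0] = replacement;
    }
    return copy;
}
```

The copy costs an allocation, of course, so it's only worth doing where the native code actually needs to change the text.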

Marshalling Strings from Native to Managed

When marshalling strings back from native to managed code, the concept of memory ownership becomes even more tricky, in my opinion- especially if your application is driven from managed code. If the string is created on the C++ side, you must somehow ensure the freeing of the unmanaged memory that the string occupies, but not before it is no longer in use on the C# side. This is not always simple, so the better option is to get the P/Invoke marshaller to copy the string from the C++ allocated memory in to the managed heap, and let RAII and smart pointers on the C++ side delete the string once the copy has been made.

By default, most P/Invoke marshalling operations will exhibit this behaviour and copy strings in to string objects. You can also do this manually using, for example, Marshal.PtrToStringUni() and other associated methods; supplying the function with a pointer to the start of the unmanaged string. The function will create a new managed string object and copy the unmanaged string to it; meaning that as soon as the function returns the unmanaged memory can safely be deleted or moved.

However, another, third approach is to allocate the memory completely on the C# side first (by asking the C++ side how much memory it needs) with something like stackalloc and then using methods such as the static Marshal.PtrToStringUni to convert the data in to a managed CLR string before the stack memory goes out-of-scope. This is the general approach I used for marshalling strings out of native memory in the end, because sometimes you can't be sure that the native string is not going to be moved or reclaimed in the time between getting a pointer to it and the PtrToStringUni function completing its copy.

If you're wondering whether this is overkill, it's actually quite an easy mistake to make- returning a pointer to a local/stack copy of a string in C++ means that you're already sitting on a time-bomb by the time you return to 'managed land'. What's worse is that this mistake tends to go unpunished 99% of the time (accessing deleted memory is undefined behaviour, after all), so your application will only suddenly stop working sometimes. And that's not even considering multi-threaded scenarios! And finally, I just like the idea of the calling side (i.e. the managed side) being completely responsible for memory ownership. It's ultra-explicit, and when it comes to handling memory in this sort of code, I think it pays to be 110% sure of what you're doing. Anyway, here is an example:

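// C++
 
#include <string> // for std::char_traits
 
// lastErrorString is assumed to be a null-terminated const char16_t* owned by the native side.
// strlen/strcpy only accept char strings, so the char16_t equivalents are used instead.
 
extern "C" __declspec(dllexport) uint32_t GetErrorStrLen() {
    return static_cast<uint32_t>(std::char_traits<char16_t>::length(lastErrorString));
}
 
extern "C" __declspec(dllexport) void GetErrorStr(char16_t* const strSpace) {
    // Copies the characters plus the null terminator. Real code should also take and check a buffer size!
    std::char_traits<char16_t>::copy(strSpace, lastErrorString, std::char_traits<char16_t>::length(lastErrorString) + 1);
}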

// C#
 
[DllImport(NATIVE_DLL_NAME, CallingConvention = CallingConvention.Cdecl,
    EntryPoint = "GetErrorStrLen")]
public static extern uint GetErrorStrLen();
 
[DllImport(NATIVE_DLL_NAME, CallingConvention = CallingConvention.Cdecl,
    EntryPoint = "GetErrorStr")]
public static extern void GetErrorStr( // void, to match the native signature
    IntPtr strSpace
);
 
public static unsafe void Test() {
    int errorStrLen = (int) GetErrorStrLen();
    char* errorStr = stackalloc char[errorStrLen + 1]; // +1 for the null terminator
    GetErrorStr((IntPtr) errorStr);
    string errorString = Marshal.PtrToStringUni((IntPtr) errorStr);
}

The other nice thing about this approach is that because we used stackalloc we don't have to remember to clean up any memory - the only allocated memory will be cleaned up when the function exits (not including the string object we also new'd up, but that will be garbage collected as usual).

However, a final note of warning- this method is still not an atomic operation. Although we don't have to worry about RAII'd smart pointers removing our memory out from under us anymore, it's still possible that another thread causes the value we're trying to copy to change between our calls to GetErrorStrLen() and GetErrorStr(). If you're using multiple threads, it is up to you to make the pair of calls atomic (for example by holding a lock around both).
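
One way to shrink that window is to have a single native call that takes the buffer capacity, copies what fits, and reports the full length, all under one lock. This is only a sketch and not what the engine does: g_errorMutex, g_lastError, SetLastErrorString and CopyLastErrorString are all hypothetical names, and the dllexport part of the export macro is left off so the snippet builds anywhere:

```cpp
#include <algorithm>
#include <cstdint>
#include <mutex>
#include <string>

// Hypothetical native-side state; the engine's real storage is not shown in this post.
static std::mutex g_errorMutex;
static std::u16string g_lastError;

void SetLastErrorString(const std::u16string& message) {
    std::lock_guard<std::mutex> lock(g_errorMutex);
    g_lastError = message;
}

// Copies up to (capacity - 1) characters plus a null terminator into 'buffer' and
// returns the full length of the error string (excluding the terminator).
// The length check and the copy happen under one lock, so another thread cannot
// change the string in between. Pass nullptr / 0 to query the length only.
extern "C" std::uint32_t CopyLastErrorString(char16_t* buffer, std::uint32_t capacity) {
    std::lock_guard<std::mutex> lock(g_errorMutex);
    const std::uint32_t length = static_cast<std::uint32_t>(g_lastError.size());
    if (buffer != nullptr && capacity > 0) {
        const std::uint32_t toCopy = std::min<std::uint32_t>(length, capacity - 1);
        std::char_traits<char16_t>::copy(buffer, g_lastError.data(), toCopy);
        buffer[toCopy] = u'\0';
    }
    return length;
}
```

The managed caller can still find that the string grew between its length query and its copy, but it can now detect that (the returned length is larger than the buffer it passed) and simply retry with a bigger buffer, and it never reads past what it allocated.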

Whichever way you choose, make sure you think very carefully about where and how memory is handled from start to finish! Although the techniques I described above have worked well enough for all of my string marshalling needs, they are not at all exhaustive. If you feel like you need more advanced string marshalling options available to you, I would totally recommend the following CodeProject article: Advanced Topics in PInvoke String Marshaling. Even if you decide not to use the techniques described in that article, just understanding the concepts behind them can help you ensure you are not making mistakes with shared or invalid memory usage in your own interop code.

Suppressing Unmanaged Code Security

As you may or may not know, C# applications can be run in a secured 'sandbox', meaning that the CLR runtime will ensure an application (or AppDomain) can only execute certain functions. For a game engine, I don't find it particularly useful.

However, all P/Invoke calls must still request native code execution permissions. Normally, this is done on every call. However, the check takes a not-inconsiderable amount of time. If you're not interested in sandbox security, I would highly recommend that you apply the SuppressUnmanagedCodeSecurity attribute to all of your P/Invoke functions. You can also simply apply it to the class that contains your P/Invoke functions and it will be applied to all external methods contained within.

To see how much of a difference it really makes, I wrote a test framework that calls msvcrt's memcpy a whole bunch of times via P/Invoke. It runs once (A) with the security turned on by default, and once (B) with it suppressed. Both functions copied 1 kilobyte using memcpy 10000 times; and the whole test was run 60 times, with an extra 5 times at the start not counted to allow the JIT compiler (and initial security check) to do its thing.

The results are as follows:

>>> 60 repetitions >>> IN NANOSECONDS (1000ns = 0.001ms)
Method   Avg.         Min.         Max.         Jitter       Total
A        375,601ns    365,000ns    481,600ns    105,998ns    ! 22.536ms
B        275,173ns    270,400ns    334,000ns    58,826ns     ! 16.510ms

Put simply, with the default security options left enabled, it took 22.536ms on my machine to copy 1 kilobyte 600,000 times (60 runs of 10,000 calls each), and with unmanaged code security suppressed, it took 16.510ms. Going by the averages, that's a difference of about 100,000ns per run, so with code security enabled there's an overhead of roughly 0.1ms per 10,000 P/Invoke calls, or about 10ns per call (on my mid-high range machine). Although you may not eventually see the benefit of this micro-optimization, the way I see it it's a free win- you only have to add the attribute once and you get some free frame time in a game engine, for example. And we don't get many free wins in our industry. ;)

Custom Structs

Unfortunately, even with unmanaged code security suppressed, making thousands of P/Invoke'd calls every frame can still be too slow for your needs. One approach to solving this that we baked in to our engine is instead to make use of lots of shared memory. Having a command-buffer-like structure in unmanaged memory space can allow you to fill memory with draw instructions and parameters and then pass control over to C++, where you can then simply and rapidly iterate through the buffer, dispatching DirectX draw calls.
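
To give a flavour of the idea, here's a heavily simplified sketch of the native half of such a command buffer. DrawCommand, its fields, the opcode values and ProcessCommandBuffer are all invented for this example and are not the engine's real types:

```cpp
#include <cstdint>

// Hypothetical fixed-layout command. Every field is an explicit-width integer
// so the C# side can mirror it with a sequential, blittable struct.
struct DrawCommand {
    std::uint32_t opcode;         // 0 = no-op, 1 = draw (illustrative values only)
    std::uint32_t meshId;
    std::uint32_t instanceCount;
};
static_assert(sizeof(DrawCommand) == 12, "layout must match the managed definition");

// Walks a buffer that managed code has already filled, using one interop call
// for the whole frame. Returns the number of instances that would be drawn;
// a real engine would issue DirectX calls here instead of summing.
extern "C" std::uint32_t ProcessCommandBuffer(const DrawCommand* commands, std::uint32_t count) {
    std::uint32_t instancesDrawn = 0;
    for (std::uint32_t i = 0; i < count; ++i) {
        if (commands[i].opcode == 1) {
            instancesDrawn += commands[i].instanceCount;
        }
    }
    return instancesDrawn;
}
```

The point is that the per-call P/Invoke cost is paid once per buffer rather than once per draw instruction.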

Even if you don't need this, at some point you might want to send a custom struct through a P/Invoke call. It can be tempting to keep things simple and use only primitive types like numbers and strings, but at some point you might find yourself in need of something more... Well, structured.

So how exactly do we share structured data between managed and unmanaged code? Let's imagine we want to share a WindowSetupDesc struct that describes the layout and meta-options for creating a new Win32 window. Let's define the exported method and P/Invoke pair as follows:

// C++
 
extern "C" __declspec(dllexport) void CreateNewWindow(WindowSetupDesc windowDesc) {
    // create the window here
}

// C#
 
[DllImport(NATIVE_DLL_NAME, CallingConvention = CallingConvention.Cdecl,
    EntryPoint = "CreateNewWindow")]
public static extern void CreateNewWindow(
    WindowSetupDesc windowDesc
);

So, for the first attempt, you might be content just to write the struct in C# and C++ and pass it as you would any other parameter:

// C++
 
struct WindowSetupDesc {
    INTEROP_STRING Title;
    uint32_t Width;
    uint32_t Height;
    INTEROP_BOOL HideBorders;
    INTEROP_BOOL AllowResizing;
};

// C#
 
public struct WindowSetupDesc {
    public string Title;
    public uint Width;
    public uint Height;
    public InteropBool HideBorders;
    public InteropBool AllowResizing;
};

...But you'd be wrong. This might happen to work on your machine if you're lucky (or is that unlucky?). But C# is allowed to perform various optimisations that include the reordering of struct fields. In essence, that means that the Title field might actually start in memory after the Width and Height fields on the C# side, but be in definition order on the C++ side (C++ is not allowed to perform this optimization).

But as well as this, both compilers/languages are allowed to add extra bytes on to the end of a struct (called 'padding') for performance reasons or alignment requirements. And finally, both compilers/languages are also allowed to add padding between the different variables of the struct so that each one starts on its natural alignment boundary (which also stops small fields from straddling cache lines).
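
You can see the padding for yourself by comparing sizeof against the sum of the field sizes. This is just a sketch: NaiveDesc mirrors the naive struct above but uses plain types so that it stands alone, and kFieldBytes and PaddingBytes are names I've made up here:

```cpp
#include <cstddef>
#include <cstdint>

// Same fields as the naive WindowSetupDesc, with no packing directives applied.
struct NaiveDesc {
    const char16_t* Title;
    std::uint32_t Width;
    std::uint32_t Height;
    char HideBorders;
    char AllowResizing;
};

// Bytes the fields would need if laid end to end with no padding at all.
constexpr std::size_t kFieldBytes =
    sizeof(const char16_t*) + 2 * sizeof(std::uint32_t) + 2 * sizeof(char);

// Padding the compiler added. On a typical 64-bit build this is 24 - 18 = 6 bytes,
// all of it at the end so that arrays of the struct keep the pointer aligned.
constexpr std::size_t PaddingBytes() { return sizeof(NaiveDesc) - kFieldBytes; }
```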

Now, all of these (except for alignment requirements) are simply performance tricks on most processors (some processors don't like reading words across cache lines, but x86 et al are fine). You should know these performance implications if you're writing something like a game engine, but for most cases (like when creating a window) we don't really need it. So, we kind of need to side-step these compiler optimisations somehow.

When it comes to doing this, there are two options. You can use the various constructs given to you in C# and C++ to build your structs safely, or you can do everything manually with blobs of bytes. Technically, the only safe way to share data like this between two compilers, languages, computers, or architectures is the latter. In essence, you should define your structure in pure memory terms (e.g. "8 bytes for string pointer followed by 4 bytes for window width followed by..." etc), and then simply copy the raw binary data from source to destination (or a pointer to it); taking the fields out at the right place, manually.
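
For the curious, the manual approach looks roughly like this on the C++ side. The layout, the offset constants and the ReadU32/WriteU32 helpers are all hypothetical, and I've left the string pointer out to keep the sketch short:

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical layout, agreed on by both sides:
//   bytes 0..3  window width   (uint32)
//   bytes 4..7  window height  (uint32)
//   byte  8     hide borders   (0 or 1)
//   byte  9     allow resizing (0 or 1)
constexpr std::size_t kWidthOffset = 0;
constexpr std::size_t kHeightOffset = 4;
constexpr std::size_t kHideBordersOffset = 8;
constexpr std::size_t kAllowResizingOffset = 9;
constexpr std::size_t kBlobSize = 10;

// memcpy sidesteps unaligned-read and strict-aliasing problems.
// Assumes both sides share a byte order, which holds on the same machine.
std::uint32_t ReadU32(const std::uint8_t* blob, std::size_t offset) {
    std::uint32_t value;
    std::memcpy(&value, blob + offset, sizeof(value));
    return value;
}

void WriteU32(std::uint8_t* blob, std::size_t offset, std::uint32_t value) {
    std::memcpy(blob + offset, &value, sizeof(value));
}
```

Nothing here depends on how either compiler would have laid out a struct, which is exactly why it's the only approach that is safe everywhere.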

That being said... ;) I find that all a bit tedious. For network programming it's simply a must, but you can get C# and C++ to play nice together with a bit of hand-holding on the same machine. So, here's how:

Firstly, we need to force the compiler not to add extra padding bytes. And then, we have to tell C# not to reorder our fields. Here's an updated version of the struct on both sides (in MSVC++, consult your compiler's documentation for other compilers; but the principle is usually the same):

// C++
 
#pragma pack(push, 1)
struct WindowSetupDesc {
    INTEROP_STRING Title;
    uint32_t Width;
    uint32_t Height;
    INTEROP_BOOL HideBorders;
    INTEROP_BOOL AllowResizing;
};
#pragma pack(pop)

// C#
 
[StructLayout(LayoutKind.Sequential, Pack = 1)]
public struct WindowSetupDesc {
    [MarshalAs(UnmanagedType.LPWStr)] public string Title; // string fields default to ANSI (LPStr) marshalling otherwise
    public uint Width;
    public uint Height;
    public InteropBool HideBorders;
    public InteropBool AllowResizing;
};

On the C++ side, we are using the #pragma pack directive to force the compiler to minimize padding in our struct (the value 1 caps the alignment of every field at 1 byte- i.e. no padding). On the C# side we've added a StructLayout attribute that does two things. The Pack = 1 property does the same as the #pragma pack statement on the C++ side. And the LayoutKind.Sequential selection prevents our fields from being re-ordered.

With these things in place, we can be fairly sure that passing our struct around (either as a parameter or in shared memory) is going to be safe... But not 100% sure. For example, if any of your fields on the C++ side have special alignment requirements (e.g. an SSE-enabled vector), those requirements will always take precedence over anything you write around the struct. Plus, well, it's just hard to juggle all the different compiler + language rules on both sides of the equation sometimes. So, for that reason, I recommend to anyone that they either use the manual-byte-arrays method I mentioned above, or that they include a ton of asserts in the program code near initialization to ensure that everything is the size expected (basically, checking sizeof() everywhere), which is what I do. In C++ you can even go for static_asserts. For my next project in C# I'm also going to experiment with something like Fody or another compile-time IL rewriter to embed the assertions with a custom attribute next to the struct.
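
Here's a minimal sketch of what those static_asserts can look like on the C++ side. PackedWindowSetupDesc is a stand-alone copy of the packed struct using plain types (so the snippet compiles on its own), and kPtr is just a local shorthand:

```cpp
#include <cstddef>
#include <cstdint>

#pragma pack(push, 1)
struct PackedWindowSetupDesc {
    const char16_t* Title;
    std::uint32_t Width;
    std::uint32_t Height;
    char HideBorders;
    char AllowResizing;
};
#pragma pack(pop)

// Expected layout written out by hand; the build breaks if the compiler disagrees.
constexpr std::size_t kPtr = sizeof(const char16_t*);
static_assert(offsetof(PackedWindowSetupDesc, Title) == 0, "Title offset");
static_assert(offsetof(PackedWindowSetupDesc, Width) == kPtr, "Width offset");
static_assert(offsetof(PackedWindowSetupDesc, Height) == kPtr + 4, "Height offset");
static_assert(offsetof(PackedWindowSetupDesc, HideBorders) == kPtr + 8, "HideBorders offset");
static_assert(offsetof(PackedWindowSetupDesc, AllowResizing) == kPtr + 9, "AllowResizing offset");
static_assert(sizeof(PackedWindowSetupDesc) == kPtr + 10, "no tail padding expected");
```

On the managed side, the equivalent runtime check would compare Marshal.SizeOf for the struct against the same expected number at startup.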

Finally, it is worth noting that according to the MSDN page for StructLayout, the LayoutKind.Sequential specifier has no effect on non-blittable types when not going through the marshaller. In essence, this means that sharing memory with non-blittable types is still not safe... But non-blittable usually means "includes a GC-reference type", and sharing memory with structs that include GC'd members is usually a dumb idea anyway. In a pinch, it does mention that you can use the LayoutKind.Explicit specification instead, marking the exact byte offset of each field with FieldOffset, if you need this.

Managed Code Analysis

And finally, to end with, just a short and simple tip. :) If you turn on Managed Code Analysis for your projects, make sure you enable rule CA1400: This rule will inspect the entry points in native code according to all of your DllImport attributes, and check that they exist! Great for catching nasty typos early. :)

Well, that's it for this post folks. It's been a long and technically dense one; and I'm sure there's at least one error in there (despite my proof-reading), so if you spot any corrections or just have any criticisms or improvements, do leave a comment! You can also find me using the social media links in the sidebar. Good luck!
