The Sorry State of Convenient IPC

(Published on 2014-07-29)

The Problem

How do you implement communication between two or more processes? This is a question that has been haunting me for at least 6 years now. Of course, this question is very broad and has many possible answers, depending on your scenario. So let me get more specific by describing the problem I want to solve.

What I want is to write a daemon process that runs in the background and can be controlled from other programs or libraries. The intention is that people can easily write custom interfaces or quick scripts to control the daemon. The service that the daemon offers over this communication channel can be thought of as its primary API; in this way you can think of the daemon as a persistent programming library. This concept is similar to existing programs such as btpd, MPD, Transmission and Telepathy - I’ll get back to these later.

More specifically, the most recent project I’ve been working on that follows this pattern is Globster, a remotely controllable Direct Connect client (if you’re not familiar with Direct Connect, think of it as IRC with some additional file sharing capabilities built in). While the problem I describe is not specific to Globster, it still serves as an important use case. I see many other projects with similar IPC requirements.

The IPC mechanism should support two messaging patterns: Request/response and asynchronous notifications. The request/response pattern is what you typically get in RPC systems - the client requests something of the daemon and the daemon then replies with a response. Asynchronous notifications are useful in allowing the daemon to send asynchronous status updates to the client, such as incoming chat messages or file transfer status. Lack of support for such notifications would mean that a client needs to continuously poll for updates, which is inefficient.
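To make the two patterns concrete, here is a minimal Python sketch of how a client might tell them apart on a single connection. The message shape (a dict with an optional "id" key) and the names Demultiplexer, expect and feed are assumptions made up for this illustration; they are not taken from any existing protocol.

```python
from typing import Any, Callable, Dict

Message = Dict[str, Any]


class Demultiplexer:
    """Routes incoming messages to a pending request or to a notification handler.

    Assumption: a response carries the "id" of the request it answers,
    while a notification has no "id" at all.
    """

    def __init__(self, on_notification: Callable[[Message], None]) -> None:
        self._pending: Dict[int, Callable[[Message], None]] = {}
        self._on_notification = on_notification
        self._next_id = 0

    def expect(self, on_response: Callable[[Message], None]) -> int:
        """Register a callback for a new request and return the id to send with it."""
        self._next_id += 1
        self._pending[self._next_id] = on_response
        return self._next_id

    def feed(self, message: Message) -> None:
        """Dispatch one decoded message received from the daemon."""
        msg_id = message.get("id")
        if msg_id is None:
            self._on_notification(message)
            return
        # Unknown ids are rejected rather than silently ignored.
        callback = self._pending.pop(msg_id, None)
        if callback is None:
            raise ValueError(f"response for unknown request id {msg_id!r}")
        callback(message)
```

The point of the sketch is that notifications need no polling: the client just keeps reading and hands anything without an id to its notification handler.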

So what I’m looking for is a high-level IPC mechanism that handles this communication. Solutions are evaluated by the following criteria, in no particular order.

Easy

And with easy I refer to ease of use. As mentioned above, other people should be able to write applications and scripts to control the daemon. Not many people are willing to invest days of work just to figure out how to communicate with the daemon.

Simple

Simplicity refers to the actual protocol and the complexity of the code necessary to implement it. Complex protocols require complex code, and complex code is hard to maintain and will inevitably contain bugs. Note that simple and easy are very different things and often even conflict with each other.

Small

The IPC implementation shouldn’t be too large, and shouldn’t depend on huge libraries. If you need several megabytes worth of libraries just to send a few messages over a socket, you’re doing it wrong.

Language independent

Control the daemon with whatever programming language you’re familiar with.

Networked

A good solution should be accessible from both the local system (daemon running on the same machine as the client) and from the network (daemon and client running on different machines).

Secure

There are three parts to having a secure IPC mechanism. One part is to realize that IPC operates at a trust boundary: the daemon can’t blindly trust everything the client says and vice versa, so message validation and other mechanisms to prevent DoS or information disclosure on either part are necessary.

Then there is the matter of confidentiality. On a local system, UNIX sockets will provide all the confidentiality you can get, so that’s trivial. Networked access, on the other hand, requires some form of transport layer security.

And finally, we need some form of authentication. There should be some mechanism to prevent just about anyone from connecting to the daemon. A coarse-grained solution such as file permissions on a local UNIX socket or a password-based approach for networked access will do just fine for most purposes. Really, just keep it simple.

Fast

Although performance isn’t really a primary goal, the communication between the daemon and the clients shouldn’t be too slow or heavyweight. For my purposes, anything that supports about a hundred messages a second on average hardware will do perfectly fine. And that shouldn’t be particularly hard to achieve.

Proxy support

This isn’t really a hard requirement either, but it would be nice to allow other processes (say, plugins of the daemon, or clients connecting to the daemon) to export services over the same IPC channel as the main daemon. This is especially useful in implementing a cross-language plugin architecture. But again, not a hard requirement, because even if the IPC mechanism doesn’t directly support proxying, it’s always possible for the daemon to implement some custom APIs to achieve the same effect. This, however, requires extra work and may not be as elegant as a built-in solution.

Now let’s discuss some existing solutions…

Custom Protocol

Why use an existing IPC mechanism in the first place when all you need is UNIX/TCP sockets? This is the approach taken by btpd, MPD (protocol spec) and older versions of Transmission (see their 1.2x spec). btpd hasn’t taken the time to document its protocol format, suggesting it’s not really intended to be used as a convenient API (other than through their btcli), and Transmission has since changed to a different protocol. I’ll mainly focus on MPD here.

MPD uses a text-based request/response mechanism, where each request is a simple one-line command and a response consists of one or more lines, ending with an OK or ACK line. There’s no support for asynchronous notifications, although that could obviously have been implemented, too. Let’s grade this protocol…
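As a rough illustration of that response format, the following Python sketch collects "key: value" lines until it sees the terminating OK or ACK line. The names MPDError and parse_response are invented for this example, and the sketch deliberately ignores binary responses and command lists, so treat it as a starting point rather than a complete client.

```python
from typing import Iterable, List, Tuple


class MPDError(Exception):
    """Raised when the server ends a response with an ACK line."""


def parse_response(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Collect "key: value" pairs until the terminating OK or ACK line."""
    pairs: List[Tuple[str, str]] = []
    for raw in lines:
        line = raw.rstrip("\n")
        if line == "OK":
            return pairs
        if line.startswith("ACK "):
            # Everything after "ACK " describes the error.
            raise MPDError(line[4:])
        key, sep, value = line.partition(": ")
        if not sep:
            raise ValueError(f"malformed response line: {line!r}")
        pairs.append((key, value))
    raise ValueError("response ended without OK or ACK")
```

A list of pairs is returned instead of a dict because the same key can appear more than once in a single response.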

Easy? Not really.

Although MPD has conventions for how messages are formatted, each individual message still requires custom parsing and validation. This can be automated by designing an IDL and accompanying code generator, but writing one specific for a single project doesn’t seem like a particularly fun task.

The protocol, despite its apparent simplicity, is evidently painful enough to use that there is a special libmpdclient library to abstract away the communication with MPD, and interfaces to this library are available in many programming languages. If you have access to such an application-specific library for your language of choice, then sure, using the IPC mechanism is easy enough. But that applies to literally any IPC mechanism.

Ideally, such a library needs to be written only once for the IPC mechanism in use, and after that no additional code is needed to communicate with services/daemons using that particular IPC mechanism. Code re-use among different projects is great, yo. The application-specific approach also doesn’t scale very well when extending the services offered by the daemon: any addition to the API will require modifications to all implementations.

Simple? Definitely.

I only needed a quick glance at the MPD protocol reference and I was able to play a bit with telnet and control my MPD. Writing an implementation doesn’t seem like a complex task. Of course, this doesn’t necessarily apply to all custom protocols, but you can make it as simple or complex as you want it to be.

Small? Sure.

This obviously depends on how elaborately you design your protocol. If you have a large or complex API, the size of a generic message parser and validator can easily compensate for the custom parser and validator needed for each custom message. But for simple APIs, it’s hard to beat a custom protocol in terms of size.

Language independent? Depends.

Of course, a socket library is available to most programming languages, and in that sense any IPC mechanism built on sockets is language independent. This is, as such, more of an argument as to how convenient it is to communicate with the protocol directly rather than with a library that abstracts the protocol away. In the case of MPD, the text-based protocol seems easy enough to use directly from most languages, yet for some reason most people prefer language-specific libraries for MPD.

If you design a binary protocol or anything more complex than simple request/response message types, using your protocol directly is going to be a pain in certain languages, and people will definitely want a library specific to your daemon for their favourite programming language. Something you’ll want to avoid, I suppose.

Networked? Sure enough.

Just a switch between UNIX sockets and TCP sockets. Whether a simple solution like that is a good idea, however, depends on the next point…
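A minimal Python sketch of that switch, assuming a made-up endpoint syntax of "unix:/path" and "tcp:host:port" (the syntax and the names parse_endpoint and connect are hypothetical, and IPv6 is left out to keep it short):

```python
import socket
from typing import Tuple, Union

Address = Union[str, Tuple[str, int]]


def parse_endpoint(endpoint: str) -> Tuple[int, Address]:
    """Turn "unix:/path" or "tcp:host:port" into (address family, address)."""
    scheme, _, rest = endpoint.partition(":")
    if scheme == "unix" and rest:
        return socket.AF_UNIX, rest
    if scheme == "tcp":
        host, sep, port = rest.rpartition(":")
        if sep and host and port.isdigit():
            return socket.AF_INET, (host, int(port))
    raise ValueError(f"unsupported endpoint: {endpoint!r}")


def connect(endpoint: str) -> socket.socket:
    """Open a stream connection; the rest of the client never sees the difference."""
    family, address = parse_endpoint(endpoint)
    sock = socket.socket(family, socket.SOCK_STREAM)
    try:
        sock.connect(address)
    except OSError:
        sock.close()
        raise
    return sock
```

Both socket types are stream sockets, so everything above the connect call can stay identical.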

Secure? Ugh.

Security is hard to get right, so having an existing infrastructure that takes care of most security sensitive features will help a lot. Implementing your own protocol means that you also have to implement your own security, to some extent at least.

Writing code to parse and validate custom messages is error-prone, and a bug in this code could make both the daemon and the client vulnerable to crashes and buffer overflows. A statically-typed abstraction that handles parsing and validation would help a lot.
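To give an idea of what a generic validation layer could look like, here is a small Python sketch that checks a decoded message against a declared schema before any application code touches it. The schema format, the SEND_CHAT_SCHEMA example and the length cap are all assumptions invented for this illustration.

```python
from typing import Any, Dict, Mapping

# Hypothetical schema: field name -> required Python type.
SEND_CHAT_SCHEMA: Mapping[str, type] = {"hub": int, "message": str}

MAX_TEXT_LENGTH = 4096  # arbitrary cap to bound memory use per field


def validate(message: Mapping[str, Any], schema: Mapping[str, type]) -> Dict[str, Any]:
    """Return a checked copy of message, or raise ValueError on any mismatch."""
    if set(message) != set(schema):
        raise ValueError("unexpected or missing fields")
    checked: Dict[str, Any] = {}
    for name, expected in schema.items():
        value = message[name]
        # Exact type match: isinstance() would accept bool where int is expected.
        if type(value) is not expected:
            raise ValueError(f"field {name!r} must be {expected.__name__}")
        if isinstance(value, str) and len(value) > MAX_TEXT_LENGTH:
            raise ValueError(f"field {name!r} is too long")
        checked[name] = value
    return checked
```

Written once for the IPC mechanism, a layer like this replaces the hand-written checks that would otherwise be repeated for every message type.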

For networked communication, you’ll need some form of confidentiality. MPD does not seem to support this, so any networked access to an MPD server is vulnerable to passive observers and MITM attacks. This may be fine for a local network (presumably what it is intended to be used for), but certainly doesn’t work for exposing your MPD control interface to the wider internet. Existing protocols such as TLS or SSH can be used to create a secure channel, but these libraries tend to be large and hard to use securely. This is especially true for TLS, but at least there’s stunnel to simplify the implementation - at the cost of less convenient deployment.

In terms of authentication, you again need to implement this yourself. MPD supports authentication using a plain-text password. This is fine for a trusted network, but on an untrusted network you certainly want confidentiality to prevent a random observer from reading your password.

Fast? Sure.

Existing protocols may have put more effort into profiling and implementing various optimizations than one would typically do with a custom and quickly-hacked-together protocol, but still, it probably takes effort to design a protocol that isn’t fast enough.

Proxy support? Depends…

Really depends on how elaborate you want to be. It can be very simple if all you want is to route some messages, but it can get very complex if you want to ensure that these messages follow some format or if you want to reserve certain interfaces or namespaces to certain clients. What surprised me about the MPD protocol is that it actually has some support for proxying. But considering the ad-hoc nature of the MPD protocol, the primitiveness and simplicity of this proxy support wasn’t too surprising. Gets the job done, I suppose.

Overall, and as a rather obvious conclusion, a custom protocol really is what you make of it. In general, though, it’s a lot of work, not always easy to use, and a challenge to get the security part right.

D-Bus

D-Bus is being used in Transmission and is what I used for Globster.

At a quick glance, D-Bus looks perfect. It is high-level, has the messaging patterns I described, the protocol specification does not seem overly complex (though certainly could be simplified), it has implementations for a number of programming languages, has support for networking, proxying is part of normal operation, and it seems fast enough for most purposes. When you actually give it a closer look, however, reality isn’t as rose-colored.

D-Bus is designed for two very specific use-cases. One is to allow local applications to securely interact with system-level daemons such as HAL (now long dead) and systemd, and the other use-case is to allow communication between different applications inside one login session. As such, on a typical Linux system there are two D-Bus daemons where applications can export interfaces and where messages can be routed through. These are called the system bus and the session bus.

Easy? Almost.

The basic ideas behind D-Bus seem easy enough to use. The fact that it has type-safe messages, interface descriptions and introspection really helps in making D-Bus a convenient IPC mechanism.

The main reasons why I think D-Bus isn’t all that easy to use in practice are the lack of good introductory documentation and the crappy state of the various D-Bus implementations. There is a fairly good article providing a high-level overview of D-Bus, but there isn’t a lot of material that covers how to actually use D-Bus to interact with applications or to implement a service.

On the implementations, I have had rather bad experiences with the actual libraries. I’ve personally used the official libdbus-1, which markets itself as a “low-level” library designed to facilitate writing bindings for other languages. In practice, the functionality that it offers appears to be too high-level for writing bindings (GDBus doesn’t use it for this reason), and it is indeed missing a lot of functionality to make it convenient to use directly. I’ve also played around with Perl’s Net::DBus and was highly disappointed. Not only is the documentation rather incomplete, the actual implementation has more bugs than features. And instead of building on top of one of the many good event loops for Perl (such as AnyEvent), it chooses to implement its own event loop. The existence of several different libraries for Python doesn’t incite much confidence, either.

I was also disappointed in terms of the available tooling to help in the development, testing and debugging of services. The gdbus(1) tool is useful for monitoring messages and scripting some things, but is not all that convenient because D-Bus has too many namespaces and the terrible Java-like naming conventions make typing everything out a rather painful experience. D-Feet offers a great way to explore services, but lacks functionality for quick debugging sessions. I made an attempt to write a convenient command-line shell, but lost interest halfway. :-(

D-Bus has the potential to be an easy and convenient IPC mechanism, but the lack of any centralized organization to offer good implementations, documentation and tooling makes D-Bus a pain to use.

Simple? Not quite.

D-Bus is conceptually easy and the message protocol is alright, too. Some aspects of D-Bus, however, are rather more complex than they need to be.

I have once made an attempt to fully understand how D-Bus discovers and connects to the session bus, but I gave up halfway because there are too many special cases. To quickly summarize what I found, there’s the DBUS_SESSION_BUS_ADDRESS environment variable which could point to the (filesystem or abstract) path of a UNIX socket or a TCP address. If that variable isn’t set, D-Bus will try to connect to your X server and get the address from that. In order to avoid linking everything against X libraries, a separate dbus-launch utility is spawned instead. Then the bus address could also be obtained from a file in your $HOME/.dbus/ directory, with added complexity to still support a different session bus for each X session. I’ve no idea how exactly connection initiation to the system bus works, but my impression is that a bunch of special cases exist there, too, depending on which init system your OS happens to use.
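Whatever route the address takes to reach the client, the value itself still has to be parsed. The Python sketch below handles the address syntax as I understand it from the D-Bus specification: entries separated by semicolons, each consisting of a transport name followed by comma-separated key=value pairs with %-escaped values. The function name parse_bus_address is made up, and the discovery fallbacks described above are not covered.

```python
from typing import Dict, List, Tuple
from urllib.parse import unquote


def parse_bus_address(value: str) -> List[Tuple[str, Dict[str, str]]]:
    """Split a D-Bus address string into (transport, parameters) entries."""
    entries: List[Tuple[str, Dict[str, str]]] = []
    for part in value.split(";"):
        if not part:
            continue
        transport, sep, rest = part.partition(":")
        if not sep:
            raise ValueError(f"missing transport in {part!r}")
        params: Dict[str, str] = {}
        for pair in filter(None, rest.split(",")):
            key, eq, raw = pair.partition("=")
            if not eq:
                raise ValueError(f"malformed parameter {pair!r}")
            params[key] = unquote(raw)  # values are %-escaped
        entries.append((transport, params))
    return entries
```

A client would typically feed this the contents of DBUS_SESSION_BUS_ADDRESS and try the entries in order until one of them connects.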

As if all the options in connection initiation aren’t annoying enough, there’s also work on kdbus, a Linux kernel implementation to get better performance. Not only will kdbus use a different underlying communication mechanism, it will also switch to a completely different serialization format. If/when this becomes widespread you will have to implement and support two completely different protocols and pray that your application works with both.

On the design aspect there is, in my opinion, needless complexity with regards to naming and namespaces. First there is a global namespace for bus names, which are probably better called application names, because that’s usually what they represent. Then, there is a separate object namespace local to each bus name. Each object has methods and properties, and these are associated with an interface name, in a namespace specific to the particular object. Despite these different namespaces, the convention is to use a full and globally unique path for everything that has a name. For example, to list the IM protocols that Telepathy supports, you call the ListProtocols method in the org.freedesktop.Telepathy.ConnectionManager interface on the /org/freedesktop/Telepathy/ConnectionManager object at the org.freedesktop.Telepathy bus. Fun times indeed. I can understand the reasoning behind most of these choices, but in my opinion they found the wrong trade-off.

Another point of complexity that annoys me is the fact that an XML format is used to describe interfaces. Supporting XML as an IDL format is alright, but requiring a separate format for an introspection interface gives me the impression that the message format wasn’t powerful enough for such a simple purpose. The direct effect of this is that any application wishing to use introspection data will have to link against an XML parser, and almost all conforming XML parser implementations are as large as the D-Bus implementation itself.
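For reference, this is roughly what consuming introspection data involves. The Python sketch uses the standard library’s ElementTree parser on a hypothetical introspection document (the org.example.Daemon interface and its members are invented); the element and attribute names follow the D-Bus introspection format.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

# Hypothetical introspection document, shaped like what Introspect() returns.
EXAMPLE = """
<node>
  <interface name="org.example.Daemon">
    <method name="SendChat">
      <arg name="hub" type="i" direction="in"/>
      <arg name="message" type="s" direction="in"/>
    </method>
    <signal name="ChatReceived">
      <arg name="message" type="s"/>
    </signal>
  </interface>
</node>
"""


def list_methods(xml_text: str) -> Dict[str, List[str]]:
    """Map "interface.method" to the type codes of its input arguments."""
    methods: Dict[str, List[str]] = {}
    root = ET.fromstring(xml_text)
    for iface in root.findall("interface"):
        for method in iface.findall("method"):
            # Method arguments default to direction "in" when unspecified.
            args = [a.get("type", "") for a in method.findall("arg")
                    if a.get("direction", "in") == "in"]
            methods[f"{iface.get('name')}.{method.get('name')}"] = args
    return methods
```

The code is short only because Python ships an XML parser; in C, the same task means pulling in a separate XML library. Note also that Python’s own documentation warns that its XML modules are not hardened against maliciously constructed input.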

Small? Kind of.

libdbus-1.so.3.8.6 on my system is about 240 KiB. It doesn’t cover parsing interface descriptions or implementing a D-Bus daemon, but still covers most of what is needed to interact with services and to offer services over D-Bus. It’s not that small, but then again, libdbus-1 was not really written with small size in mind. There’s room for optimization.

Language independent? Sure.

D-Bus libraries exist for a number of programming languages.

Networked? Half-assed.

D-Bus officially supports networked connections to a D-Bus daemon. Actually using this, however, is painful. Convincing dbus-daemon(1) to accept connections on a TCP socket involves disabling all authentication (it expects UNIX credential passing, normally) and requires adding an undocumented <allow_anonymous/> tag in the configuration (I only figured this out from reading the source code).

Even when you’ve gotten that to work, there is the problem that D-Bus isn’t totally agnostic to the underlying socket protocol. D-Bus has support for passing UNIX file descriptors over the connection, and this of course doesn’t work over TCP. While this feature is optional and easily avoided, some services (I can’t find one now) use UNIX fds in order to keep track of processes that listen to a certain event. Obviously, those services can’t be accessed over the network.
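The reason file descriptor passing can’t work over TCP is that descriptors travel as ancillary data, which only UNIX domain sockets provide. A short Python sketch of the mechanism (it needs Python 3.9 or later on a UNIX-like system, and has nothing D-Bus specific in it):

```python
import os
import socket

# Requires Python 3.9+ on a UNIX-like system.
left, right = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
read_end, write_end = os.pipe()

# The descriptor travels as ancillary data next to a one-byte payload.
socket.send_fds(left, [b"x"], [write_end])
_payload, received_fds, _flags, _addr = socket.recv_fds(right, 1, 1)

# The receiver now holds its own descriptor for the same pipe.
os.write(received_fds[0], b"hello")
data = os.read(read_end, 5)

for fd in (read_end, write_end, *received_fds):
    os.close(fd)
left.close()
right.close()
```

The received descriptor refers to the same underlying pipe, which only makes sense when both processes share a kernel.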

Secure? Only locally.

D-Bus has statically typed messages that can be validated automatically, so that’s a plus.

For local authentication, there is support for standard UNIX permissions and credential passing for more fine-grained authorization. For remote authentication, I think there is support for a shared secret cookie, but I haven’t tried to use this yet.

There is, as with MPD, no support at all for confidentiality, so using networked D-Bus over an untrusted network would be a very bad idea anyway.

Fast? Mostly.

The messaging protocol is fairly lightweight, so no problems there. I do have to mention two potential performance issues, however.

The first issue is that the normal mode of operation in D-Bus is to proxy all messages through an intermediate D-Bus daemon. This involves extra context switches and message parsing passes in order to get one message from application A to application B. I believe it is officially supported to bypass this daemon and to communicate directly between two processes, but after my experience with networking I am wary of trying anything that isn’t part of how D-Bus is intended to be used. This particular performance issue is what kdbus addresses, so I suppose it won’t apply to future Linux systems.

The other issue is that a daemon that provides a service over D-Bus does not know whether there exists an application that is interested in receiving its notifications. This means that the daemon always has to spend resources to send out notification messages, even if no application is actually interested in receiving them. In practice this means that the notification mechanism is avoided for events that may occur fairly often, and an equally inefficient polling approach has to be used instead. It is possible for a service provider to keep track of interested applications, but this is not part of the D-Bus protocol and not something you would want to implement for each possible event. I’ve no idea if kdbus addresses this issue, but it would be stupid not to.
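Keeping track of interested applications can be sketched in a few lines of Python. This is an illustration of the general idea only, not of anything D-Bus offers; the Notifier class, its methods and the string client identifiers are all invented for the example.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, Set

Send = Callable[[str, dict], None]


class Notifier:
    """Tracks which clients asked for which event, so unwatched events cost nothing."""

    def __init__(self, send: Send) -> None:
        self._send = send
        self._subscribers: DefaultDict[str, Set[str]] = defaultdict(set)

    def subscribe(self, client: str, event: str) -> None:
        self._subscribers[event].add(client)

    def unsubscribe(self, client: str, event: str) -> None:
        self._subscribers[event].discard(client)

    def emit(self, event: str, build_payload: Callable[[], dict]) -> int:
        """Build and send the payload only if someone is listening."""
        clients = self._subscribers.get(event)
        if not clients:
            return 0
        payload = build_payload()
        for client in sorted(clients):
            self._send(client, payload)
        return len(clients)
```

Passing a callable instead of a ready-made payload means that even the work of gathering the data is skipped when nobody has subscribed.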

Proxy support? Yup.

It’s part of normal operation, even.

D-Bus has many faults, some of them are by design, but many are fixable. I would have contributed to improving the situation, but I get the feeling that the goals of the D-Bus maintainers are not at all aligned with mine. My impression is that the D-Bus maintainers are far too focussed on their own specific needs and care little about projects with slightly different needs. Especially with the introduction of kdbus, I consider D-Bus too complex now to consider it worth the effort to improve. Starting from scratch seems less work.

JSON/XML RPC

While I haven’t extensively used JSON-RPC or XML-RPC myself, it’s still an interesting alternative to study. Transmission uses JSON-RPC (spec) as its primary IPC mechanism, and RTorrent has support for an optional XML-RPC interface. (Why do I keep referencing torrent clients? Surely there are other interesting applications? Oh well.)

The main selling point of HTTP-based IPC is that it is accessible from browser-based applications, assuming everything has been setup correctly. This is a nice advantage, but lack of this support is not really a deal-breaker for me. Browser-based applications can still use any other IPC mechanism, as long as there are browser plugins or some form of proxy server that converts the messages of the IPC mechanism to something that is usable over HTTP. For example, both solutions exist for D-Bus, in the form of the Browser DBus Bridge and cloudeebus. Of course, such solutions typically aren’t as convenient as native HTTP support.

Since HTTP is, by design, purely request-response, JSON-RPC and XML-RPC don’t generally support asynchronous notifications. It’s possible to still get asynchronous notifications by using WebSockets (Ugh, opaque stream sockets, time to go back to our custom protocol) or by having the client implement an HTTP server itself and send its URL to the service provider (This is known as a callback in the SOAP world. I have a lot of respect for developers who can put up with that crap). As I already hinted, neither solution is simple or easy.

Let’s move on to the usual grading…

Easy? Sure.

The ubiquity of HTTP, JSON and XML on the internet means that most developers are already familiar with using them. And even if you aren’t, there are so many easy-to-use and well-documented libraries available that you’re ready to go in a matter of minutes.

Although interface description languages/formats exist for XML-RPC (and possibly for JSON-RPC, too), I get the impression these are not often used outside of the SOAP world. As a result, interacting with such a service tends to be weakly/stringly typed, which, I imagine, is not as convenient in strongly typed programming languages.

Simple? Not really.

Many people have the impression that HTTP is somehow a simple protocol. Sure, it may look simple on the wire, but in reality it is a hugely bloated and complex protocol. I strongly encourage everyone to read through RFC 2616 at least once to get an idea of its size and complexity. To make things worse, there’s a lot of recent activity to standardize on a next generation HTTP (SPDY and HTTP 2.0), but I suppose we can ignore these developments for the foreseeable future for the use case of IPC.

Of course, a lot of the functionality specified for HTTP is optional and can be ignored for the purpose of IPC, but that doesn’t mean that these options don’t exist. When implementing a client, it would be useful to know exactly which HTTP options the server supports. It would be wasteful to implement compression support if the server doesn’t support it, or keep-alive, or content negotiation, or ranged requests, or authentication, or correct handling for all response codes when the server will only ever send ‘OK’. What also commonly happens is that server implementors want to support as much as possible, to the point that you can have JSON or XML output, depending on what the client requested.

XML faces a similar problem. The format looks simple, but the specification has a bunch of features that hardly anyone uses. In contrast to HTTP, however, a correct XML parser can’t just decide to not parse <!DOCTYPE ..> stuff, so it has to implement some of this complexity.

On the upside, JSON is a really simple serialization format, and if you’re careful enough to only implement the functionality necessary for basic HTTP, a JSON-RPC implementation can be somewhat simple.
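To show how little is needed on the JSON side, here is a Python sketch of the message layer only, following the JSON-RPC 2.0 message shape. The HTTP transport is left out, the function names and the "hub.list" method are made up, and a given daemon’s actual RPC format may well differ from this.

```python
import json
from typing import Any, Optional


def encode_request(method: str, params: Any, request_id: Optional[int]) -> bytes:
    """Encode a JSON-RPC 2.0 request; a request_id of None makes it a notification."""
    message = {"jsonrpc": "2.0", "method": method, "params": params}
    if request_id is not None:
        message["id"] = request_id
    return json.dumps(message, separators=(",", ":")).encode("utf-8")


def decode_response(data: bytes, request_id: int) -> Any:
    """Return the result of a response, or raise if it is an error or mismatched."""
    message = json.loads(data.decode("utf-8"))
    if not isinstance(message, dict) or message.get("jsonrpc") != "2.0":
        raise ValueError("not a JSON-RPC 2.0 message")
    if message.get("id") != request_id:
        raise ValueError("response does not match the request id")
    if "error" in message:
        raise RuntimeError(f"request failed: {message['error']!r}")
    if "result" not in message:
        raise ValueError("response has neither result nor error")
    return message["result"]
```

A client would call encode_request("hub.list", [], 7), send the bytes, and hand the reply to decode_response together with the same id.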

Small? Not really.

What typically happens is that implementors take an existing HTTP library and build on top of that. A generic HTTP library likely implements a lot more than necessary for IPC, so that’s not going to be very small. RTorrent, for example, makes use of the not-very-small xmlrpc-c, which in turn uses libcurl (400 KiB, excluding TLS library) and either the bloated libxml2 (1.5 MiB) or libexpat (170 KiB). In any case, expect your programs to grow by a megabyte or more if you go this route.

Transmission seems rather less bloated. It uses the HTTP library that is built into libevent (totalling ~500 KiB, but libevent is also used for other networking parts), and a simple JSON parser can’t be that large either. I’m sure that if you reimplement everything from scratch for the purpose of building an API, you could get something much smaller. Then again, even if you manage to shrink the size of the server that way, you can’t expect all your users to do the same.

If HTTPS is to be supported, add ~500 KiB more. TLS isn’t the simplest protocol, either.

Language independent? Yes.

Almost every language has libraries for web stuff.

Networked? Definitely.

In fact, I’ve never seen anyone use XML/JSON RPC over UNIX sockets.

Secure? Alright.

HTTP has built-in support for authentication, but it also isn’t uncommon to use some other mechanism (based on cookies, I guess?).

Confidentiality can be achieved with HTTPS. There is the problem of verifying the certificate, since I doubt anyone is going to have certificates of their local applications signed by a certificate authority, but there’s always the option of trust-on-first use. Custom applications can also include a fingerprint of the server certificate in the URL for verification, but this won’t work for web apps.
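The fingerprint check itself is tiny. Below is a Python sketch that compares a certificate against a pinned SHA-256 fingerprint; the function names and the choice of SHA-256 with optional colon separators are assumptions for this example, and how the pin reaches the client (URL, config file, first use) is left open.

```python
import hashlib
import hmac


def fingerprint(der_certificate: bytes) -> str:
    """Hex-encoded SHA-256 digest of a DER-encoded certificate."""
    return hashlib.sha256(der_certificate).hexdigest()


def matches_pin(der_certificate: bytes, pinned_hex: str) -> bool:
    """Compare the certificate against a fingerprint from the URL or a local store."""
    expected = pinned_hex.replace(":", "").lower()
    # Constant-time comparison; not strictly needed for public data, but harmless.
    return hmac.compare_digest(fingerprint(der_certificate).encode(), expected.encode())
```

In Python, the DER bytes of the server certificate can be obtained from an established TLS connection with getpeercert(binary_form=True) on the SSL socket. For trust-on-first-use, the client would store the fingerprint from the first connection and call matches_pin on every later one.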

Fast? No.

JSON/XML RPC messages add significant overhead to the network and require more parsing than a simple custom solution or D-Bus. I wouldn’t really call it fast, but admittedly, it might still be fast enough for most purposes.

Proxy support? Sure.

HTTP has native support for proxying, and it’s always possible to proxy some URI on the main server to another server, assuming the libraries you use support that. It’s not necessarily simple to implement, however.

The lack of asynchronous notifications and the overhead and complexity of JSON/XML RPC make me stay away from it, but it certainly is a solution that many client developers will like because of its ease of use.

Other Systems

There are more alternatives out there than I have described so far. Most of those were options I dismissed early on because they’re either incomplete solutions or specific to a single framework or language. I’ll still mention a few here.

Message Queues

In the context of IPC I see that message queues such as RabbitMQ and ZeroMQ are quite popular. I can’t say I have much experience with any of these, but these MQs don’t seem to offer a solution to the problem I described in the introduction. My impression of MQs is that they offer a higher-level and more powerful alternative to TCP and UDP. That is, they route messages from one endpoint to another. The contents of the messages are still completely up to the application, so you’re still on your own in implementing an RPC mechanism on top of that. And for the purpose of building a simple RPC mechanism, I’m convinced that plain old UNIX sockets or TCP will do just fine.

Cap’n Proto

I probably should be spending a full chapter on Cap’n Proto instead of this tiny little section, but I’m simply not familiar enough with it to offer any deep insights. I can still offer my blatantly uninformed impression of it: It looks very promising, but puts, in my opinion, too much emphasis on performance and too little emphasis on ease of use. It lacks introspection and requires that clients have already obtained the schema of the service in order to interact with it. It also uses a capability system to handle authorization, which, despite being elegant and powerful, increases complexity and cognitive load (though I obviously need more experience to quantify this). It still lacks confidentiality for networked access and the number of bindings to other programming languages is limited, but these problems can be addressed.

Cap’n Proto seems like the ideal IPC mechanism for internal communication within a single (distributed) application and offers a bunch of unique features not found in other RPC systems. But it doesn’t feel quite right as an easy API for others to use.

CORBA

CORBA has been used by the GNOME project in the past, and was later abandoned in favour of D-Bus, primarily (I think) because CORBA was deemed too complex and incomplete. A system that is deemed more complex than D-Bus is an immediate red flag. The long and painful history of CORBA also makes me want to avoid it, if only because that makes it very hard to judge the quality and modernness of existing implementations.

Project Tanja

A bit over two years ago I was researching the same problem, but from a much more generic angle. The result of that was a project that I called Tanja. I described its concepts in an earlier article, and wrote an incomplete specification along with implementations in C, Go and Perl. I consider project Tanja a failure, primarily because of its genericity. It supported too many communication models and the lack of a specification as to which model was used, and the lack of any guarantee that this model was actually followed, made Tanja hard to use in practice. It was a very interesting experiment, but not something I would actually use. I learned the hard way that you sometimes have to move some complexity down into a lower abstraction layer in order to keep the complexity in check at higher layers of abstraction.

Conclusions

This must be the longest rant I’ve written so far.

In any case, there isn’t really a perfect IPC mechanism for my use case. A custom protocol involves reimplementing a lot of stuff, D-Bus is a pain, and JSON/XML RPC are bloat.

I am still undecided on what to do. I have a lot of ideas as to what a perfect IPC solution would look like, both in terms of features and in how to implement it, and I feel like I have enough experience by now to actually develop a proper solution. Unfortunately, writing a complete IPC system with the required utilities and language bindings takes a lot of time and effort. It’s not really worth it if I am the only one using it.

So here is my plea to you, dear reader: If you know of any existing solutions I’ve missed, please tell me. If you empathize with me and want a better solution to this problem, please get in touch as well! I’d love to hear about projects which face similar problems and have similar requirements.