Written by Thomas Stover in February 2006
The often overlooked Unix domain socket facility is one of the most powerful features in any modern Unix. Most socket programming books for Unix discuss the topic merely in an academic sense without ever explaining why it matters or what it is used for. Besides being the only way to utilize certain abilities of the operating system, it is an area programmers new to Linux, BSD, and other Unices definitely need to be aware of. This is not a tutorial on sockets, but rather a review of the features and benefits of one area of sockets programming.
The closest thing to a Unix domain socket would be a pipe. Unix pipes are an integral cornerstone of the OS at large. Analogous to a water pipe with water flowing in one direction, a stream of bytes flows from the write side of a pipe to the read side. Each side of a pipe is referenced by its own open file descriptor. The two sides can be in different processes or threads as long as they reside on the same local computer. Let's review the distinguishing characteristics of pipes in Unix.
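To make the water pipe analogy concrete, here is a minimal sketch, assuming a Linux or BSD system, of a parent and a child process sharing one unnamed pipe created with pipe(). The helper name pipe_demo() is hypothetical and made up for this illustration; pipe(), fork(), read(), write(), and waitpid() are the standard calls.

```c
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: a child process writes `message` into the write side
   of an unnamed pipe (fds[1]) and the parent reads it from the read side
   (fds[0]). Returns the number of bytes the parent read, or -1 on error. */
ssize_t pipe_demo(const char *message, char *out, size_t out_size)
{
    int fds[2];
    ssize_t n, total = 0;
    pid_t child;

    if (out_size == 0 || pipe(fds) != 0)
        return -1;
    child = fork();
    if (child < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (child == 0) {
        /* child: only writes, so it drops its copy of the read side */
        close(fds[0]);
        n = write(fds[1], message, strlen(message));
        close(fds[1]);
        _exit(n < 0 ? 1 : 0);
    }
    /* parent: only reads; closing its copy of the write side lets read()
       return 0 (end of file) once the child has closed its copy too */
    close(fds[1]);
    while ((size_t) total < out_size - 1 &&
           (n = read(fds[0], out + total, out_size - 1 - (size_t) total)) > 0)
        total += n;
    out[total] = '\0';
    close(fds[0]);
    waitpid(child, NULL, 0);
    return total;
}
```

Calling pipe_demo("hello through a pipe", buf, sizeof buf) should return 20 and leave the child's message in buf. If the parent forgot to close its own copy of the write side, read() would never report end of file.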
As an exception to the rule that pipes are written on one side and read on the other, Solaris pipes are full duplex. On Linux and BSD, for example, full duplex communication with pipes requires two separate pipes. In Unix, named pipes and unnamed pipes are essentially the same thing. This is not the case with Windows, which provides two very different facilities for what it calls named and anonymous pipes. Anonymous pipes are available in all versions of Windows and behave much like Unix pipes. Besides being dramatically slower, however, they differ in several ways, such as an adjustable pipe buffer size that also affects the threshold for atomic writes. Windows named pipes are roughly analogous to Unix domain sockets. They are only available on the NT-derived versions of Windows, and do not use the Windows networking socket interface, Winsock, at all. They do have the advantage of reaching across multiple computers in an NT domain.
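On systems where pipes are one way, a full duplex conversation needs one pipe per direction. The following is a rough sketch of that arrangement. The helper two_pipe_echo() and its upper-casing child are hypothetical, chosen only so the reply is visibly different from the request.

```c
#include <ctype.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: full duplex with two one-way pipes. The parent sends
   `question` down to_child; the child upper-cases what it reads and sends it
   back up to_parent. Returns the length of the reply, or -1 on error.
   Assumes short messages: the parent writes everything before it reads. */
ssize_t two_pipe_echo(const char *question, char *answer, size_t answer_size)
{
    int to_child[2], to_parent[2];
    char buf[256];
    ssize_t n, i, written, total = 0;
    pid_t child;

    if (answer_size == 0 || pipe(to_child) != 0)
        return -1;
    if (pipe(to_parent) != 0) {
        close(to_child[0]);
        close(to_child[1]);
        return -1;
    }
    child = fork();
    if (child < 0) {
        close(to_child[0]);
        close(to_child[1]);
        close(to_parent[0]);
        close(to_parent[1]);
        return -1;
    }
    if (child == 0) {
        /* child: reads from to_child, writes to to_parent */
        close(to_child[1]);
        close(to_parent[0]);
        while ((n = read(to_child[0], buf, sizeof buf)) > 0) {
            for (i = 0; i < n; i++)
                buf[i] = (char) toupper((unsigned char) buf[i]);
            if (write(to_parent[1], buf, (size_t) n) != n)
                break;
        }
        _exit(0);
    }
    /* parent: writes to to_child, reads from to_parent */
    close(to_child[0]);
    close(to_parent[1]);
    written = write(to_child[1], question, strlen(question));
    close(to_child[1]);   /* the child's read() now sees end of file */
    while (written >= 0 && (size_t) total < answer_size - 1 &&
           (n = read(to_parent[0], answer + total,
                     answer_size - 1 - (size_t) total)) > 0)
        total += n;
    answer[total] = '\0';
    close(to_parent[0]);
    waitpid(child, NULL, 0);
    return written < 0 ? -1 : total;
}
```

Because the parent writes its whole request before reading, this sketch only suits short messages. A request much larger than the pipe buffers could leave both processes blocked in write().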
A Unix domain socket exists only inside a single computer. The word domain here has nothing to do with NIS, LDAP, or Windows, and instead refers to the file system. Unix domain sockets are identified by a file name in the file system, just like a named pipe. Programs communicating over a Unix domain socket must be on the same computer, so Unix domain sockets are not really a networking concept so much as an inter-process communication (IPC) concept. This explains why most networking books ignore them. They are, however, used through the same sockets API that is used for TCP/IP, UDP/IP, and other supported network protocols. You should be asking at least two questions right now: "Why would a network program ever support Unix domain sockets as a transport?" and "Why would programs use a Unix domain socket as an IPC mechanism instead of pipes, signals, or shared memory?". Here are some quick answers.
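To illustrate the shared API, here is a small sketch in which one hypothetical helper, listen_on(), runs the same socket(), bind(), and listen() sequence for either a Unix domain address or a TCP/IP address. Only the address structure passed in differs. The helper names and the use of the IPv4 loopback address are assumptions made for the example.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/* Hypothetical helper: identical code for any address family. */
int listen_on(const struct sockaddr *addr, socklen_t addr_len)
{
    int fd = socket(addr->sa_family, SOCK_STREAM, 0);

    if (fd < 0)
        return -1;
    if (bind(fd, addr, addr_len) != 0 || listen(fd, 5) != 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Unix domain: the address is a file system path.
   Careful: this unlink()s whatever file already exists at `path`. */
int listen_unix(const char *path)
{
    struct sockaddr_un addr;

    if (strlen(path) >= sizeof(addr.sun_path))
        return -1;
    memset(&addr, 0, sizeof addr);
    addr.sun_family = AF_UNIX;
    strcpy(addr.sun_path, path);
    unlink(path);   /* remove a stale socket file, if any */
    return listen_on((struct sockaddr *) &addr, sizeof addr);
}

/* TCP/IP: the address is an IPv4 address plus a port */
int listen_tcp_loopback(unsigned short port)
{
    struct sockaddr_in addr;

    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    return listen_on((struct sockaddr *) &addr, sizeof addr);
}
```

Everything after bind() (accept(), read(), write(), close()) is equally family agnostic, which is why a network server can often add Unix domain socket support with little extra code.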
Not all of these features are available on every Unix. Worse, there are variations in the way they are interfaced. Basic operations are pretty universally supported, though. Let's move on to some examples.
Let's start with a very basic client and a forking server. A forking server spawns a new process to handle each incoming connection. After a connection is closed, its handler process exits. This type of server frequently gets a bad reputation due to its poor performance as a web server. The reason it performs poorly as a web server is that with classic HTTP, each request is typically made over its own connection. The server thus spends a disproportionate amount of time creating and destroying processes versus actually handling requests. What is not commonly understood is that for other types of protocols, which maintain a single connection during the entire time the client uses the server, a forking server is considered an acceptable design. Take OpenSSH, for example. The primary problem with this design for non-web server applications is that it is no longer as straightforward to share information between all the various handler instances. Multiplexing, multi-threaded, and all sorts of other designs are out there, but the simple fork() server is as good as it gets for illustrating examples. Think of it as the "hello world" of server designs. Take the following sources: first the client, then the server.
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>
int main(void)
{
    struct sockaddr_un address;
    int socket_fd;
    ssize_t nbytes;
    char buffer[256];
    socket_fd = socket(PF_UNIX, SOCK_STREAM, 0);
    if(socket_fd < 0)
    {
        printf("socket() failed\n");
        return 1;
    }
    /* zero the whole structure so sun_path is null terminated */
    memset(&address, 0, sizeof(struct sockaddr_un));
    address.sun_family = AF_UNIX;
    snprintf(address.sun_path, sizeof(address.sun_path), "./demo_socket");
    if(connect(socket_fd, (struct sockaddr *) &address, sizeof(struct sockaddr_un)) != 0)
    {
        printf("connect() failed\n");
        return 1;
    }
    nbytes = snprintf(buffer, sizeof(buffer), "hello from a client");
    write(socket_fd, buffer, nbytes);
    /* read at most 255 bytes, leaving room for the terminating null byte */
    nbytes = read(socket_fd, buffer, sizeof(buffer) - 1);
    if(nbytes < 0)
    {
        printf("read() failed\n");
        close(socket_fd);
        return 1;
    }
    buffer[nbytes] = 0;
    printf("MESSAGE FROM SERVER: %s\n", buffer);
    close(socket_fd);
    return 0;
}
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <sys/types.h>
#include <unistd.h>
int connection_handler(int connection_fd)
{
    ssize_t nbytes;
    char buffer[256];
    /* read at most 255 bytes, leaving room for the terminating null byte */
    nbytes = read(connection_fd, buffer, sizeof(buffer) - 1);
    if(nbytes < 0)
        nbytes = 0;
    buffer[nbytes] = 0;
    printf("MESSAGE FROM CLIENT: %s\n", buffer);
    nbytes = snprintf(buffer, sizeof(buffer), "hello from the server");
    write(connection_fd, buffer, nbytes);
    close(connection_fd);
    return 0;
}
int main(void)
{
    struct sockaddr_un address;
    int socket_fd, connection_fd;
    socklen_t address_length;
    pid_t child;
    socket_fd = socket(PF_UNIX, SOCK_STREAM, 0);
    if(socket_fd < 0)
    {
        printf("socket() failed\n");
        return 1;
    }
    /* remove a socket file left behind by a previous run */
    unlink("./demo_socket");
    /* zero the whole structure so sun_path is null terminated */
    memset(&address, 0, sizeof(struct sockaddr_un));
    address.sun_family = AF_UNIX;
    snprintf(address.sun_path, sizeof(address.sun_path), "./demo_socket");
    if(bind(socket_fd, (struct sockaddr *) &address, sizeof(struct sockaddr_un)) != 0)
    {
        printf("bind() failed\n");
        return 1;
    }
    if(listen(socket_fd, 5) != 0)
    {
        printf("listen() failed\n");
        return 1;
    }
    /* accept() overwrites address_length, so it is reset before every call */
    address_length = sizeof(struct sockaddr_un);
    while((connection_fd = accept(socket_fd,
                                  (struct sockaddr *) &address,
                                  &address_length)) > -1)
    {
        child = fork();
        if(child == 0)
        {
            /* now inside newly created connection handling process,
               which has no use for the listening socket */
            close(socket_fd);
            return connection_handler(connection_fd);
        }
        /* still inside server process */
        close(connection_fd);
        address_length = sizeof(struct sockaddr_un);
    }
    close(socket_fd);
    unlink("./demo_socket");
    return 0;
}
Armed with some basic knowledge of C, beginner-level Unix system programming, beginner-level sockets programming, how to look up man pages, and Google, you can use the above example to create a UDS client and server. To try it out, open a couple of terminal windows, run the server in one, and the client in the other. After that, try adding something like sleep(15) to the server's connection handler, before it write()s back to the client. Bring up two more terminals, one running another instance of the client and the other running top or ps -e, as well as netstat -ax (on Linux, -x selects Unix domain sockets). Experiment with that for a while. Learn anything?
At this point there are several things we could do with this technology, namely the ability to have running programs communicate with other arbitrary programs on the same computer. Depending on where in the file system our socket is created and with what permissions, this could allow programs running with different credentials, started at different times, or even in different login sessions (controlling ttys) to communicate. A common example of a program that works like this is syslogd. On many Unix systems, programs use a Unix domain socket to pass log messages to the syslog server.
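As an illustration of the syslog case, here is a sketch of a client that hands one log line to a local log daemon as a single datagram. Many Linux systems have syslogd (or systemd-journald) listening on /dev/log, but treat that path as an assumption and check your own system. The helper send_log_datagram() is hypothetical; in real programs the syslog(3) library call does this work for you.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/* Hypothetical helper: send `line` as one datagram to the Unix domain
   socket bound at `socket_path`. Returns 0 on success, -1 on error. */
int send_log_datagram(const char *socket_path, const char *line)
{
    struct sockaddr_un addr;
    size_t len = strlen(line);
    ssize_t sent;
    int fd;

    if (strlen(socket_path) >= sizeof(addr.sun_path))
        return -1;
    memset(&addr, 0, sizeof addr);
    addr.sun_family = AF_UNIX;
    strcpy(addr.sun_path, socket_path);

    fd = socket(AF_UNIX, SOCK_DGRAM, 0);
    if (fd < 0)
        return -1;
    /* no connect() needed: the destination travels with each sendto() */
    sent = sendto(fd, line, len, 0, (struct sockaddr *) &addr, sizeof addr);
    close(fd);
    return sent == (ssize_t) len ? 0 : -1;
}
```

For example, send_log_datagram("/dev/log", "<14>demo: hello") would submit a message whose <14> prefix encodes the user facility (1 * 8) plus the info severity (6) in the traditional syslog format. Whether a particular daemon accepts a bare line like this is worth verifying on your system.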
There are other ways this could be accomplished without Unix domain sockets, but UDS are pretty hard to beat, and they allow for even more abilities.
Let us imagine a database server like PostgreSQL. The server can force every client program that connects to it to authenticate itself with a user name and password. It does this so that it can enforce its internal security policies based on the account a client is connecting with. Having to authenticate with a user name and password pair every time can get old, so other authentication schemes, such as key pair authentication, are often used instead. In the case of local logins (the client is on the same machine as the server), a feature of Unix domain sockets known as credentials passing can be used. This is one area that is going to be different everywhere, so check your reference material. Let's look at how it's done in Linux. OpenBSD does it differently, so look up its documentation as well.
Linux uses a lower-level socket function, the multi-purpose getsockopt(), to grab the credentials of the process on the other side of a Unix domain socket.
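#define _GNU_SOURCE /* glibc only declares struct ucred with this defined */
#include <stdio.h>
#include <sys/socket.h>
/* check_peer() is an illustrative wrapper; call it from the connection
   handler with the fd returned by accept() */
int check_peer(int connection_fd)
{
    struct ucred credentials;
    socklen_t ucred_length = sizeof(struct ucred);
    /* fill in the ucred structure */
    if(getsockopt(connection_fd, SOL_SOCKET, SO_PEERCRED, &credentials, &ucred_length) != 0)
    {
        printf("could not obtain credentials from unix domain socket\n");
        return 1;
    }
    /* credentials.pid: the process ID of the process on the other side of the socket
       credentials.uid: the effective UID of that process
       credentials.gid: the effective primary GID of that process
       (as they were when the peer called connect()) */
    printf("peer pid %ld, uid %ld, gid %ld\n", (long) credentials.pid,
           (long) credentials.uid, (long) credentials.gid);
    /* To get supplemental groups, we will have to look them up in our account
       database, after a reverse lookup on the UID to get the account name.
       We can take this opportunity to check to see if this is a legit account.
    */
    return 0;
}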
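Here is a self-contained variant of the same getsockopt() call that can be tried without a separate client and server. A socketpair() stands in for an accepted connection, so the "peer" is simply our own process. The helpers peer_credentials() and peer_is_same_user() are hypothetical names, and the same-user rule is only an example of a policy a server might enforce.

```c
#define _GNU_SOURCE   /* exposes struct ucred in glibc; must precede all includes */
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: fill *out with the peer's pid, uid and gid.
   Returns 0 on success, -1 on error. */
int peer_credentials(int fd, struct ucred *out)
{
    socklen_t len = sizeof *out;

    if (getsockopt(fd, SOL_SOCKET, SO_PEERCRED, out, &len) != 0)
        return -1;
    return len == sizeof *out ? 0 : -1;
}

/* Hypothetical policy check: accept only peers running under the same
   effective UID as this process. Returns 1 if so, 0 otherwise. */
int peer_is_same_user(int fd)
{
    struct ucred cred;

    if (peer_credentials(fd, &cred) != 0)
        return 0;
    return cred.uid == geteuid();
}
```

With a socketpair(AF_UNIX, SOCK_STREAM, 0, sv), both ends report the pid and uid of the process that created the pair. In a real server the fd would come from accept(), and the kernel, not the client, supplies the values.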
File descriptors can be sent from one process to another by two means. One way is by inheritance; the other is by passing them through a Unix domain socket. There are three reasons I know of why one might do this. The first is that on platforms that don't have a credentials passing mechanism but do have a file descriptor passing mechanism, an authentication scheme based on demonstrating file system privileges could be used instead. The second is if one process has file system privileges that the other does not. The third is scenarios where a server hands a connection's file descriptor to another, already started, helper process of some kind. Again, this area is different from OS to OS. On Linux this is done with a socket feature known as ancillary data.
It works by one side sending some data to the other (at least 1 byte) with ancillary data attached. Normally this feature is used for odd features of various underlying network protocols, such as TCP/IP's almost pointless out-of-band data. This is accomplished with the lower-level socket function sendmsg(), which accepts both arrays of I/O vectors and control message objects as members of its struct msghdr parameter. Ancillary data, also known as control data, takes the form of a struct cmsghdr. The members of this structure can mean different things based on what type of socket it is used with. Making it even more squirrelly is that most of these structures need to be accessed through macros. Here are two example functions based on the ones available in Warren Gay's book mentioned at the end of this article. A socket's peer that reads data sent to it by send_fd() without using recv_fd() would just get a single capital F.
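The following is a rough sketch of what a Linux send_fd()/recv_fd() pair could look like. It is not the book's code. It sends the single data byte 'F' together with an SCM_RIGHTS control message, and uses the CMSG_* macros from <sys/socket.h> to build and walk the control buffer.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

/* Sketch: pass fd_to_send over the Unix domain socket `sock`.
   One real data byte ('F') must accompany the ancillary data. */
int send_fd(int sock, int fd_to_send)
{
    struct msghdr msg;
    struct iovec iov;
    struct cmsghdr *cmsg;
    char data = 'F';
    union {                       /* correctly aligned control buffer */
        char buf[CMSG_SPACE(sizeof(int))];
        struct cmsghdr align;
    } control;

    memset(&msg, 0, sizeof msg);
    memset(&control, 0, sizeof control);
    iov.iov_base = &data;
    iov.iov_len = 1;
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = control.buf;
    msg.msg_controllen = sizeof control.buf;

    cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;          /* "this message carries fds" */
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd_to_send, sizeof(int));

    return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Sketch: receive a descriptor sent by send_fd(); returns it, or -1. */
int recv_fd(int sock)
{
    struct msghdr msg;
    struct iovec iov;
    struct cmsghdr *cmsg;
    char data;
    int fd = -1;
    union {
        char buf[CMSG_SPACE(sizeof(int))];
        struct cmsghdr align;
    } control;

    memset(&msg, 0, sizeof msg);
    iov.iov_base = &data;
    iov.iov_len = 1;
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = control.buf;
    msg.msg_controllen = sizeof control.buf;

    if (recvmsg(sock, &msg, 0) != 1)
        return -1;
    for (cmsg = CMSG_FIRSTHDR(&msg); cmsg != NULL; cmsg = CMSG_NXTHDR(&msg, cmsg)) {
        if (cmsg->cmsg_level == SOL_SOCKET && cmsg->cmsg_type == SCM_RIGHTS &&
            cmsg->cmsg_len == CMSG_LEN(sizeof(int))) {
            memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
            break;
        }
    }
    return fd;
}
```

The receiving process gets a brand new descriptor number that refers to the same open file, so each side can close its own copy independently.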
Most of the time, programs that communicate over a network work with stream, or connection-oriented, technology. This is when an additional software layer, such as TCP, creates a virtual communication circuit out of the many single atomic (stateless) packets used by an underlying packet-switched network. Sometimes we instead want to simply work with individual packets, as is the case with UDP. This technology is often called datagram communication. This strategy allows for a variety of trade-offs. One is the ability to make a low-overhead, high-performance server with a single context or "main loop" that handles multiple simultaneous clients. Although Unix domain sockets are not a network protocol, they do utilize the sockets network interface, and as such also provide datagram features.
Datagram communication works best with an application that can put a complete atomic message of some sort in a single packet. This can be a problem for UDP, as various factors can limit the safe size of a packet to as little as 512 bytes. The limit for datagrams over a Unix domain socket is much higher. A complete example is beyond the scope of this article. Those interested should find a UDP example (much easier to find) and combine that with the techniques above.
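While a complete datagram server is out of scope, a tiny sketch shows the key property: on a SOCK_DGRAM Unix domain socket, each send() arrives as exactly one message, so message boundaries are preserved. It uses socketpair() to avoid naming a socket in the file system. The helpers send_message() and recv_message() are hypothetical.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical helper: send a C string as exactly one datagram. */
int send_message(int fd, const char *text)
{
    size_t len = strlen(text);
    return send(fd, text, len, 0) == (ssize_t) len ? 0 : -1;
}

/* Hypothetical helper: receive exactly one datagram as a C string.
   A datagram longer than size - 1 bytes is truncated and the rest of it
   is discarded. Returns the number of bytes stored, or -1 on error. */
ssize_t recv_message(int fd, char *buf, size_t size)
{
    ssize_t n;

    if (size == 0)
        return -1;
    n = recv(fd, buf, size - 1, 0);
    if (n < 0)
        return -1;
    buf[n] = '\0';
    return n;
}
```

After socketpair(AF_UNIX, SOCK_DGRAM, 0, sv), sending "first message" and then "second" on sv[0] yields two separate recv_message() results on sv[1], of 13 and 6 bytes. A stream socket would have been free to hand both back in a single read.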
Another Linux-specific feature is abstract names for Unix domain sockets. Abstract named sockets are identical to regular UDS except that their name does not exist in the file system. This means two things: file permissions do not apply, and they can be accessed from inside chroot() jails. The trick is to make the first byte of the address name a null byte. Look at the output from netstat -ax to see what it looks like while one of these abstract named sockets is in use. Example:
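address.sun_family = AF_UNIX;
/* the '#' is only a placeholder that is overwritten with a null byte below;
   the length must count exactly the bytes used, because for abstract names
   every byte up to address_length is part of the name (on Linux, sun_path
   directly follows the 2-byte sun_family) */
address_length = sizeof(address.sun_family) +
    sprintf(address.sun_path, "#demo_socket");
address.sun_path[0] = 0;
if(bind(socket_fd, (struct sockaddr *) &address, address_length) != 0)
{
    printf("bind() failed\n");
    return 1;
}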
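A fuller, Linux-only sketch of the same idea follows. abstract_address() builds the address with a leading null byte and computes the exact length with offsetof(). listen_abstract() and connect_abstract() (hypothetical names) use it on both ends. No file is created, so there is nothing to unlink() afterwards.

```c
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/* Build a Linux abstract address: sun_path[0] is a null byte and the name
   follows it. Returns the exact address length, or 0 if the name is too
   long. Every byte up to that length is part of the name, so no padding. */
static socklen_t abstract_address(struct sockaddr_un *addr, const char *name)
{
    size_t len = strlen(name);

    if (len + 1 > sizeof(addr->sun_path))
        return 0;
    memset(addr, 0, sizeof *addr);
    addr->sun_family = AF_UNIX;
    memcpy(addr->sun_path + 1, name, len);   /* sun_path[0] stays 0 */
    return (socklen_t) (offsetof(struct sockaddr_un, sun_path) + 1 + len);
}

/* Hypothetical helper: listen on an abstract name. */
int listen_abstract(const char *name)
{
    struct sockaddr_un addr;
    socklen_t len = abstract_address(&addr, name);
    int fd;

    if (len == 0)
        return -1;
    fd = socket(AF_UNIX, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;
    if (bind(fd, (struct sockaddr *) &addr, len) != 0 || listen(fd, 5) != 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Hypothetical helper: connect to an abstract name. */
int connect_abstract(const char *name)
{
    struct sockaddr_un addr;
    socklen_t len = abstract_address(&addr, name);
    int fd;

    if (len == 0)
        return -1;
    fd = socket(AF_UNIX, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;
    if (connect(fd, (struct sockaddr *) &addr, len) != 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

Both ends must compute the length the same way. An abstract name bound with trailing padding counted in its length is a different name from the same characters without it.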
Even if you never need to program Unix domain sockets directly, they are an important facet of understanding both the Unix security model and the inner workings of the operating system. For those that do use them, they open up a world of possibilities.
Here are some recommended books for further reading.