Dr Glendarme, or How I Learned to Stop Kerberos and Love Factotum
1. Intro
This is my talk at the 9th International Workshop On Plan 9, in Waterloo, Canada in April 2023.
You can watch it below or on YouTube.
You can download the slides or read the paper.

2. Thanks

So, I will again thank everybody, especially the organizers.
It's been 10 years since I've left academia, and I've had jobs with pressing operational needs. Publishing papers and then going to conferences is like something I do once every five years, so I'm very happy to be here.
And furthermore, I sent my paper late because it's been 10 years without a Plan 9 workshop, and I was refreshing the page every few months. And last time I refreshed, I saw that I missed the deadline, and I was quite sad. My wife said "Just send an email, and maybe some nice people will let you in." And so I did, and so I'm here now. So I'm very happy to be here.
3. Context

A bit of context. In my professional life, I've met three people who had heard of Plan 9: not worked with it, not known what it was, just heard of it. So this is the first time in my life I'm talking with people who know more than me about Plan 9.
This is not to say that I know much… If I say something that is not quite right, and you're thinking, "This is not the way 9P works," please do interrupt me, because you are probably right, and I'm probably wrong, and I want to know that. So if this could be an interactive session, I would be very happy. Do not wait until the end to ask questions. We are a very small number in the room, so it would be very workable to just talk. Just raise your hand, and I will stop talking, and yeah.
So anyway, let me give you a bit of context about the work I'm about to present here. Our biggest problem with Unix was that once you are root, you can do whatever you want.
Maybe you just wanted to switch users from Alice to Bob, but to do that, you need to switch to root, and then the machine is yours, and this is a huge problem for us.
3.1. French National Computer Crime Unit

I used to work at the French National Computer Crime Unit. The actual official logo is this one. It means something like "Digital Crime Fighting Center," and the rest is… the 9front style logo we used to do.
One of our friends spoke Russian, and he retro-engineered this acronym to mean something like "The Soviet of Whimsical Computer Scientists," or something like that, and it was quite nice.
And so what we had to do is manage a lot of forensics, digital forensics activity. So you had a few dozen investigators. And they would all be part of different investigations at the same time, filling in different roles. Some of them would be the lead investigator. Some of them would be like the experts in one precise piece of technology, for example, cryptography wallets on Windows. And they will fill different roles on different investigations, but at the same time, using sometimes the same machine, sometimes some servers, and so on.
So it was a huge mess, and we wanted to make sure that there could be no leak. What I mean by that is… It's about data privacy, but also about legal proceedings. We cannot have a piece of one case leak into another case. This would jeopardize both cases. So we have to be very careful.
And because we did not have a lot of money, we had a very small number of powerful servers, but every big computation was on those servers, so we needed to be very careful with isolation of processing.
3.2. Example Workload

And this is an example workload we have to deal with. Can you read at the back of the room? Is it big enough for you to read? Yeah? Thank you.
So you go to the suspect house, and you seize the hard drive, and you have, from this moment on, you have 24 hours to find solid evidence.
Because in 24 hours, you have to put him in front of a judge, and he will have an attorney, and you will have to provide actual evidence that this guy is a bad guy. And so the clock is running.
So what you do is you stream the hard drive to a small binary that is going to, at the same time, dump the stream onto another hard drive to make a copy, and compute the hash of the whole hard drive (I will explain later why). It also searches for some specific things, like magic numbers for JPEG images, or high-entropy regions, which are probably encrypted regions, and so on. And this is done on a single, quite powerful machine, so that we can make sure that the link between the hard drive and the CPU is full all the time. And so we don't spend a lot of time copying the hard drive.
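A minimal sketch of that single-pass idea, in Python. This is not the binary used at the unit: the chunk size, the entropy threshold and the function names are assumptions made for illustration. It reads the source once and, for each chunk, writes the copy, updates a SHA-256 digest, looks for the JPEG start-of-image bytes, and flags chunks whose byte entropy is close to 8 bits per byte.

```python
import hashlib
import math
from collections import Counter

JPEG_MAGIC = b"\xff\xd8\xff"   # JPEG start-of-image, followed by the next marker prefix
CHUNK = 4096                    # hypothetical chunk size
ENTROPY_THRESHOLD = 7.5         # bits per byte; hypothetical cut-off


def shannon_entropy(block: bytes) -> float:
    """Shannon entropy of a block in bits per byte (0.0 to 8.0)."""
    if not block:
        return 0.0
    total = len(block)
    return -sum(c / total * math.log2(c / total) for c in Counter(block).values())


def acquire(src, dst):
    """Read src once: copy it to dst, hash it, and record offsets of interest."""
    digest = hashlib.sha256()
    jpeg_offsets, high_entropy_offsets = [], []
    offset = 0
    tail = b""  # last bytes already seen, so a marker split across chunks is still found
    while True:
        chunk = src.read(CHUNK)
        if not chunk:
            break
        dst.write(chunk)        # the copy
        digest.update(chunk)    # the hash of the whole stream
        window = tail + chunk
        base = offset - len(tail)
        pos = window.find(JPEG_MAGIC)
        while pos != -1:
            jpeg_offsets.append(base + pos)
            pos = window.find(JPEG_MAGIC, pos + 1)
        if shannon_entropy(chunk) > ENTROPY_THRESHOLD:
            high_entropy_offsets.append(offset)
        tail = window[-(len(JPEG_MAGIC) - 1):]
        offset += len(chunk)
    return digest.hexdigest(), jpeg_offsets, high_entropy_offsets
```

A real carver would then parse each candidate to find where the image ends. The point here is only that the copy, the hash and the search share one read of the disk.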
Nowadays it takes a few hours to copy a full hard drive sector by sector, and it's quite long when you only have 24 hours to find the evidence you need.
Then the copy is made. The images that have been carved are given to an image analyst, but once the copy is made you can mount it and you have access to the file system, which will also give you the images, not all of them because some of them have been deleted, which you would have found by carving, but anyway.
And the image analyst can work with that, and you can extract for example geolocation data, and your geographical information system analyst can use the data, along with for example data from the cell phone, to cross-reference it and whatever.
Once you find some encryption you can go to your big cluster, but the big cluster is also a host for very confidential information, so whatever you put in cannot go out without a form, like a real-life paper form.
And so we have that kind of complex workflow, and we have multiple workflows like this running simultaneously on our IT system.
And before we came in, what we used to do was literally pass the data along from office to office on the hard drive. It was very inefficient, quite costly, and frankly not very good for the chain of custody of digital evidence.
And what is good with 9P is that you can put the host and network boundary wherever you like, and you can just code your stuff as if, like you test it locally on actual files, and then you run it. And maybe the file is on the other side of the planet, maybe it's in this hard drive, you don't care, and it works the same. And this is very beautiful, it's a beautiful abstraction that we all like. And so we worked on putting all of this workflow through file system calls.
And for that, we needed a way to make sure that identity, like who is running what, was a somewhat consistent notion. And we ran into Unix problems like UID being numbers. So if you create Alice and Bob on one machine, and then Bob and Alice on the other, the UID will be switched. And that creates a whole lot of problems.
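The Alice and Bob mix-up can be reproduced in a few lines. This is only an illustration: `assign_uids` is a hypothetical stand-in for whatever tool creates accounts, and the starting UID of 1000 is an assumption (a common default, not a rule).

```python
def assign_uids(names, first_uid=1000):
    """Hand out numeric UIDs in creation order, as an account-creation tool typically does."""
    return {name: first_uid + i for i, name in enumerate(names)}


def owner_seen_on(user_table, uid):
    """Resolve a numeric UID against one machine's user table."""
    return next(name for name, u in user_table.items() if u == uid)


machine_a = assign_uids(["alice", "bob"])   # Alice created first
machine_b = assign_uids(["bob", "alice"])   # Bob created first

# A file written by alice on machine A only carries the number...
file_uid = machine_a["alice"]
# ...and machine B resolves that number to somebody else.
print(owner_seen_on(machine_b, file_uid))
```

Plan 9 sidesteps this by using user names (strings) as identities in 9P rather than numbers.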
So you can have UID translations with NFS, but it was overly complex, and it was not good. So by that time, I had read the Plan 9 papers, and I was like, okay, they solved the problem like years ago, it should be easier to do that now. Why are we dealing with this?
4. Glendarme

4.1. Setup

So let's talk about what we did. From the round of introductions that we did, I know that you are quite familiar with this. You are familiar with Plan 9 and 9P. And again, this is the first time I talk to an audience that knows 9P. So this should not be new to you. This is what Plan 9 from User Space does right now, without our work on top of that. So basically, you have Alice. She has a machine called Atlantis, and she's running a process called foo. And this process is going to access some files. So far, so good. It would not be very interesting if these were just local files. They are located on Bob's machine. Bob's machine is called Bermuda, and the files are, in fact, here.
So the first thing that we want is that Alice on Atlantis and Alice on Bermuda are the same person. So we need a central authority to tell both machines who is who.
This is usually done with LDAP or on Windows with Active Directory, but both are hard to configure and are hard to secure. And you can tell Factotum about who is who. And so this is what we want to reproduce.
This Plan 9 from User Space tool called srv is going to create a Unix socket. So this is, I don't know if you can read it, a folder in /tmp. It's the emulation on Unix of namespaces in Plan 9. And so srv will create a socket in it, and mount is going to talk to this socket. The system calls related to files are going to be translated into 9P. And this 9P is going to be sent to this socket. And srv is going to send this across the network to the other machine.
And on the other machine, 9pserve, which is, again, a piece of code from Plan 9 from User Space, is going to listen to this 9P stream. On this picture you have only one client, but if you have multiple clients, 9pserve is going to multiplex them into one connection, and we will see how. And this stream is going to go into a 9P server. We have put a placeholder here: it's just u9fs, which is a basic file system server. As you remember from our workflow, we had multiple servers, so this is just a placeholder; any 9P server would do.
4.2. The missing piece

And so there is a missing piece. The missing piece, over here, is: how do we make sure Alice is actually Alice, and how do we launch u9fs as Alice? The color of whatever you see is tied to the owner, so Bob is in red and Alice is in blue, and you can see that u9fs is in blue. This means that from the point of view of Bermuda's kernel, this process is owned by Alice.
Why do we want that? Because we want this process to be what we call security agnostic. Thanks to 9P it's already network agnostic, so you can program your stuff without caring whether a file is on your machine or somewhere else. But we also want your piece of code not to deal with identity and access management and that kind of stuff: the kernel should do that for you.
This piece of code is actually a process owned by Alice, so we need Bermuda to be aware of who Alice is and to have a way to check Alice's identity. We didn't find that in Plan 9 from User Space, so we created vrs. The name is just a mirror of srv, there is nothing more to it.
So basically what vrs does is listen to 9pserve, and once it sees a Tauth message, a 9P authentication message, it's going to talk to an authoritative factotum. Then Alice's factotum and the authoritative factotum are going to talk to one another until this one is satisfied with all the answers, and it will give the green light to vrs. Only then is vrs going to fork and create this server, and vrs is going to drop all administrative privileges before the server is launched. So this program should not have any elevated privileges, and even if it's pwned by an attacker, it shouldn't be able to do much damage. So this was a goal.
Do you have any questions before I go further? This is perfectly clear for all of you? This is wonderful. It took me months to explain this to my interns.
4.3. Walking through what vrs does

[fumbling with the zoom for ages]
Alice sends a Tauth
At the beginning Alice is sending a Tauth message. This is all 9P. I see some undergraduates who were curious: if you have questions please ask them, because we are going to talk 9P all day, so maybe ask your questions now.
So Alice is sending a Tauth message: she wants to authenticate. 9pserve is going to multiplex all of that into one single connection for vrs, and what that entails is rewriting this part and this part. This one is the tag, which is client-chosen. The responses can come in out of order, and you need a tag to know which question each one answers.
We keep a table of tags, and we know that if Alice has an incoming tag of 0, then this is an outgoing tag of 2 for 9pserve, so we change that part for Alice. Same for the FID: we need to switch them too. Alice's 0 is actually a 4, and once it's transformed that way the message goes up to vrs. vrs sees that it's a Tauth, so vrs is like, okay, this is for me, not for the server, so I'm gonna handle that.
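The rewriting step can be sketched on the raw bytes. A 9P2000 message starts with size[4] type[1] tag[2], little-endian, and in a Tauth the afid[4] comes right after the tag. The `Mux` class below is a hypothetical simplification, not 9pserve's code: it hands out the next free number and never recycles tags or fids, which a real multiplexer has to do when replies and clunks come back.

```python
import struct

TAUTH = 102  # 9P2000 message type number for Tauth


def pack_str(s: str) -> bytes:
    """9P strings are a 2-byte length followed by UTF-8 bytes."""
    data = s.encode("utf-8")
    return struct.pack("<H", len(data)) + data


def make_tauth(tag: int, afid: int, uname: str, aname: str) -> bytes:
    """Build size[4] Tauth tag[2] afid[4] uname[s] aname[s]."""
    body = struct.pack("<BHI", TAUTH, tag, afid) + pack_str(uname) + pack_str(aname)
    return struct.pack("<I", len(body) + 4) + body  # size counts its own 4 bytes


class Mux:
    """Maps each client's private tag and fid numbers onto one shared connection."""

    def __init__(self):
        self.tags = {}  # (client, incoming tag) -> outgoing tag
        self.fids = {}  # (client, incoming fid) -> outgoing fid

    @staticmethod
    def _alloc(table, key):
        if key not in table:
            table[key] = len(table)  # next unused number; never freed in this sketch
        return table[key]

    def rewrite(self, client: str, msg: bytes) -> bytes:
        """Rewrite tag and the fid that follows it; valid for messages laid out like Tauth."""
        _size, _mtype, tag = struct.unpack_from("<IBH", msg, 0)
        (fid,) = struct.unpack_from("<I", msg, 7)
        out_tag = self._alloc(self.tags, (client, tag))
        out_fid = self._alloc(self.fids, (client, fid))
        return msg[:5] + struct.pack("<HI", out_tag, out_fid) + msg[11:]
```

Two clients that both picked tag 0 and fid 0 come out with distinct numbers, and the reverse lookup in the same tables tells the multiplexer where to send each reply.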
On Linux, if you want to authenticate a user, you have something called PAM, Pluggable Authentication Modules; you are all aware of that here. So we did things properly and went all the way through PAM: we coded a new PAM module that is able to talk to factotum. And so we're going to create a PAM context, and by default it's not authenticated. And then this PAM module is going to talk to factotum, and factotum is going to give us an AQID, and this AQID is going to go back to Alice.
And then every read and write to this handle is going to be forwarded to factotum.
And the beauty of the Plan 9 model is that vrs doesn't need to know or care about what they are saying to each other. As long as both factotums can agree on a protocol and then run that protocol, it will work.
So to prove that (it did not need proof here, but when I submitted to USENIX, it did need proof), we added Guillou-Quisquater, which is a zero-knowledge protocol. We added this protocol to factotum, and it required no change at all in vrs or in 9P, so it was quite beautiful.
And so anyway, that was a cryptographic conversation between Alice's factotum and the authoritative factotum from Bermuda. And at the end, probably, Alice's context is going to switch to success, which means that she's authenticated.
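Here is a sketch of why the relay does not need to understand the conversation. Everything in it is an assumption made for illustration: `AuthSession` stands in for the PAM context attached to an afid, and `ToyAuthority` is a deliberately simple shared-secret challenge-response. It is neither p9sk1 nor Guillou-Quisquater. The relay only forwards bytes and remembers the final verdict.

```python
import hashlib
import hmac
import os


class ToyAuthority:
    """Stand-in for the authoritative factotum: HMAC challenge-response on a shared secret."""

    def __init__(self, secret: bytes):
        self.secret = secret
        self.challenge = None

    def step(self, data: bytes):
        """Consume one client message; return (reply, done, ok)."""
        if self.challenge is None:
            self.challenge = os.urandom(16)
            return self.challenge, False, False
        expected = hmac.new(self.secret, self.challenge, hashlib.sha256).digest()
        ok = hmac.compare_digest(data, expected)
        return (b"ok" if ok else b"denied"), True, ok


class AuthSession:
    """What the relay keeps per afid: an opaque pipe plus an 'authenticated' flag."""

    def __init__(self, authority):
        self.authority = authority
        self.authenticated = False  # a fresh context starts unauthenticated

    def write(self, data: bytes) -> bytes:
        """Forward bytes written on the afid and hand back whatever the authority says."""
        reply, done, ok = self.authority.step(data)
        if done:
            self.authenticated = ok
        return reply
```

Swapping `ToyAuthority` for another protocol changes nothing in `AuthSession`, which is the property described above.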
Charlie sends a Tattach
So let's see what happens when you have an authenticated call going on. So Charlie has already authenticated, and he's attaching the file server, which means he's mounting the file server.
And once again, we do some tag and AFID translation to multiplex the stream. And then vrs sees the AFID, and it's going to check here: okay, AFID 1 for Charlie is actually authenticated via PAM. So it will strip that from the message, and then actually start the server. And the server will begin with a special AFID, NOFID (which is ~0, all bits set, in the code), which means that the server is not aware of any authentication efforts. So the server is starting unauthenticated.
The server does not need to care about authentication, but look at the color. The server is started as Charlie, because this system knows who Charlie is. He's an actual user, and the kernel can now recognize Charlie as a user.
So we can call setuid and setgid with whatever is relevant, and then fork and exec the server. And so the server is started just now. And once the server is started, it will respond with one server-specific QID, which will be given back to Charlie. Quite straightforward.
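The two halves of that step can be sketched as follows. This is a hedged illustration, not the vrs source: `strip_afid` and `launch_server_as` are hypothetical names. A Tattach is laid out as size[4] type[1] tag[2] fid[4] afid[4] uname[s] aname[s], and Plan 9's fcall.h defines NOFID as (u32int)~0. The launch helper relies on the `user`, `group` and `extra_groups` arguments of `subprocess.Popen` (Python 3.9 and later), which only work when the caller has the privilege to switch identity.

```python
import pwd
import struct
import subprocess

TATTACH = 104          # 9P2000 message type number for Tattach
NOFID = 0xFFFFFFFF     # (u32int)~0: "no authentication fid"


def strip_afid(msg: bytes, authenticated_afids: set) -> bytes:
    """Check the afid was authenticated, then replace it with NOFID for the server."""
    _size, mtype, _tag, _fid, afid = struct.unpack_from("<IBHII", msg, 0)
    if mtype != TATTACH:
        raise ValueError("not a Tattach message")
    if afid not in authenticated_afids:
        raise PermissionError("afid has not completed authentication")
    return msg[:11] + struct.pack("<I", NOFID) + msg[15:]


def launch_server_as(uname: str, argv: list) -> subprocess.Popen:
    """Start the file server as an ordinary user, talking 9P on its stdin and stdout."""
    pw = pwd.getpwnam(uname)  # the system must already know this user
    return subprocess.Popen(
        argv,
        user=pw.pw_uid,
        group=pw.pw_gid,
        extra_groups=[],      # drop supplementary groups inherited from the parent
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
    )
```

The server never sees the authentication fid, so it can be any 9P server that accepts unauthenticated attaches. The identity it runs under is what enforces access.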
Dave sends a Twrite
And when using the server, Dave is writing to a file he has previously opened. So he's sending a Twrite message. Once again, tag and AFID translation. This is given to vrs.
As you can see, there is no authentication part at all, so it is given without any further change to the server.
The only thing that we do is check whom this FID belongs to. So we have a table there.
FID 2 is for Dave. And then AFID 0 is actually good.
So once those two checks are made, we know where to put the information: vrs maintains a table of file descriptors, which correspond to the standard inputs and standard outputs of the many servers that have been launched.
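A minimal sketch of that routing table, under stated assumptions: `Router` and its two dictionaries are hypothetical names, and in-memory buffers stand in for the pipes to each server's standard input. A Twrite is laid out as size[4] type[1] tag[2] fid[4] offset[8] count[4] data[count], so the fid can be read at a fixed position and the message forwarded untouched.

```python
import io
import struct

TWRITE = 118  # 9P2000 message type number for Twrite


def make_twrite(tag: int, fid: int, offset: int, data: bytes) -> bytes:
    """Build size[4] Twrite tag[2] fid[4] offset[8] count[4] data[count]."""
    body = struct.pack("<BHIQI", TWRITE, tag, fid, offset, len(data)) + data
    return struct.pack("<I", len(body) + 4) + body


class Router:
    """Forwards a message to the server of whoever owns the fid it names."""

    def __init__(self):
        self.fid_owner = {}      # fid -> user name, filled in at attach time
        self.server_stdin = {}   # user name -> writable pipe of that user's server

    def forward(self, msg: bytes) -> str:
        _size, _mtype, _tag, fid = struct.unpack_from("<IBHI", msg, 0)
        owner = self.fid_owner.get(fid)
        if owner is None:
            raise PermissionError("unknown fid")
        self.server_stdin[owner].write(msg)  # passed along unchanged
        return owner


router = Router()
router.fid_owner[2] = "dave"
router.server_stdin["dave"] = io.BytesIO()
router.forward(make_twrite(tag=7, fid=2, offset=0, data=b"hello"))
```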
And so the server is given the message, and the server responds whatever. And this response is given almost untouched to Dave. Do you have any questions? Yeah.
[Skip:] So if I understand correctly, vrs can do the user authentication using PAM. And you're doing the mapping to the users as factotum knows them, right? So could vrs and 9P– could 9pserve, which is essentially like a single-namespace mount driver? Would that be a fair statement? 9pserve on P9– yeah, OK. So it's basically serving a single namespace. Could you combine those two, to essentially become like a 9P server?
I guess we could, and I would say we should, because when we talk about where this project is right now, I think we would, yeah, good thinking.
And thank you for the opportunity to talk about that.
Just before we go to that, oh yeah, please.
[Audience member] Does it not fit into like the GSS model like well, to use a GSS module to–
I don't know what GSS is.
[Audience member] It's basically a plug-in, it's like parallel to PAM, which is what Kerberos is using. So it lets you do any, it's basically, it lets you write modules that do back and forth a bunch, and then you can do delegation of keys and stuff.
Okay, so the short answer is I didn't know about GSS. So maybe it's very relevant, but I don't know. And also, we kind of stepped away from Kerberos because we learned about Kerberos, and we were like, this is overly complex, and we didn't want to touch Kerberos with a ten foot pole because it was–
[Another audience member] All in one model, it's like you go 100% Kerberos, or you don't go 100% Kerberos. It's like, you take it all, or you cannot take it all.
[Someone else] Unless you export it to Azure AD and then– Then you get the whole mess. And SAML, too.
The big problem for us with Kerberos was that you need to have cryptographic code in application space. And this was a big no-no for us. Some applications are binaries from an Israeli company, for example, and we don't want any of that, any cryptographic code in that.
So we need to isolate our processes.
[Audience discussion] The API, I think it's kind of like does what the Factotum does, right? It tries to be the key exchanger, but it's–is it multi-protocol, or is it just Kerberos? No, no, it's multi. It's general. It's like you just– Right, that's right. You put the plug in. That's right. It's got the log in. Right, right. Right. So it's kind of like Factotum. Yeah. But the thing is, like, you have to programmatically put stuff inside. Right, right. Which is why it makes it a little more–because your core has to be bound in. And as he said, it's got binaries, which it doesn't know the kernel inside. Okay. It doesn't want to touch them, so. Right. GSS API might mean that opening those things is probably impossible. Yeah.
4.4. Capabilities

So one thing I forgot to mention is that the AFID and so on are capabilities on the 9P.
Once you have opened the file, for example, for writing, you can give this FID to whomever. And if they send a correctly crafted 9P message with this FID in it, they will be able to write to the file, which is very nice. We want to be able to do that. But if you do that on a hostile network without encryption, you're going to get burned. So we used stunnel, which is a nice little utility with which you can use TLS on any connection. It's a tunneling tool.
It's quite good. And before we switched to 9P, we used SSH everywhere. But it was complicated. I wrote in the paper why we switched.
4.5. Bad news
Yeah, so if you want to use that today, I have some bad news. So basically, you need the PAM module. And then you need to tell NSS where the PAM module is and how to configure it. And also you need a modified version of factotum, to let the kernel know which users exist and what the UIDs are and so on.
What we did was modify factotum in such a way that when you mount it, you have basically an /etc/passwd file, or something like that, and maybe a group file as well. And then you point NSS to those mounted files. The kernel, without any further coding, will know what you're talking about.
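For readers who have not looked at that file format in a while, here is a small sketch of what such a synthetic passwd file contains. The `User` record and the example values are made up for illustration, and this is not the modified factotum's code. The field order is the one from passwd(5): name, password placeholder, UID, GID, comment, home directory, shell.

```python
from dataclasses import dataclass


@dataclass
class User:
    name: str
    uid: int
    gid: int
    gecos: str = ""
    home: str = "/"
    shell: str = "/bin/sh"


def passwd_line(u: User) -> str:
    """One passwd(5) entry; 'x' means the password is not stored in this file."""
    return f"{u.name}:x:{u.uid}:{u.gid}:{u.gecos}:{u.home}:{u.shell}"


def render_passwd(users) -> str:
    """Content a file server could present as a read-only passwd file."""
    return "".join(passwd_line(u) + "\n" for u in sorted(users, key=lambda u: u.uid))


print(render_passwd([
    User("bob", 1001, 1001, "Bob", "/home/bob"),
    User("alice", 1000, 1000, "Alice", "/home/alice"),
]), end="")
```

Because one authority generates the file, every machine that mounts it sees the same name-to-number mapping.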
So we have a modified version of factotum. And then we have vrs proper. The problem is, this modified version of factotum is basically a fork of Plan 9 from User Space from 2019. And we have not kept up with the ongoing development. So it's basically obsolete.
And also, for the life of me, I couldn't compile the PAM module on my current distro. So it's a bit problematic.
But effort is ongoing. So yeah, last time I saw the system working was when I was at the computer crime unit. It was in late 2020 on a Debian stable.
Debian is not moving very fast, to say the least. So it should work now. But yeah, so the code is online, but it's kind of a mess. Basically, it's like a stripped-down plan9port repo without even proper attribution. I'm sorry about that. I'm going to correct that soon. But yeah, it's difficult to work with. So why am I doing all this? So let me back up.
5. GNU Guix

I'll back up a bit and tell you about GNU Guix. Do you know of Guix? Please raise your hand if you have heard of this. Okay, quite a few, so that's nice. So yeah, this is the man page for Emacs. You know this joke, right? And GNU Guix is a GNU project. So those two communities, if you look quickly, may not be the best friends. Still, I want to introduce the GNU Guix project because it's quite awesome actually.
5.1. Trusting trust

So you all know about the trusting trust attack where you put a backdoor in the compiler and every time you try to compile the compiler, it will put the backdoor in the compiler and so on. And what this is, it's basically all the binary that you need in order to compile GCC. You can see that the version of GCC is quite ancient, but this is only 60 megabytes. It's a lot of binary to analyze, but it's quite low if you think about it.
[Question from IRC]. What has actually changed in Factotum?
Yeah, the only change we made is when you mount it, when you mount–
[IRC] So you said that your version was obsolete, so what's changed that's made it obsolete, I guess, is the question.
I just, if you try to compile it, if I just, I did some effort to compile it on a new version and I just copied old code to the new Factotum. I guess some patch along the two years just broke things and I didn't repair them. But it's not a big deal, yeah.
Yeah, anyway, so this is only 60 megabytes of binary code and their objective is to go down to 512 bytes only. There is a project called Stage Zero whose goal is to uncruft this binary blob and from that on bootstrap all the GNU tool chain.
So this is a crazy effort and they are actually quite close to making it. So this is very exciting to see, and in 2019 and 2020 they put up blog posts saying, okay, we've cut it down from 250 megabytes to 60 and so on. So if you are interested in that kind of stuff, I really encourage you to take a look. On the blog of the GNU Guix project, you will see that kind of stuff. And yeah, am I running out of time? What time is it? No? No? Okay. Cool, thanks. 20 minutes. Okay, perfect.
[Audience] That was in 95 and then I stopped using GCC
What year, what year was that?
[Audience] 95, and after that…
5.2. Software quality

So this is a good point. So this effort is actually good for software quality. I know that the suckless community at large and the GNU project have antinomic, antithetic goals.
So the XZ compression system needs GCC, but GCC needs sed, and sed needs XZ. So they were using, like, sed 1, GNU sed, not the actual sed, from 1993 in their project. And so they complained to the maintainer and said, okay, this is ridiculous. Please just stop giving the source only as XZ. At least give us the source as GZ or whatever. And they actually did it.
So the upstream changed the way they worked to make it more bootstrappable. So they did like an almost 30-year leap forward. So they care about that and they are able to make other people care as well. So this is a worthwhile effort and I think it should speak to you because you strive for software simplicity and so on. So this is actually a step in the right direction.
5.3. Reproducible forensics

So why am I talking to you about GNU Guix? It's because what GNU Guix wants to do is reproducible science. You said you worked in supercomputers. The guy who started the project actually works in supercomputers as well. His job is to make sure the bioengineering, bioscience stuff runs the same every time it's run and so on. So he wants to do reproducible package management and so on. So he based his work on Nix. Maybe you know about Nix. And Guix is basically Guile, the GNU Guile language, plus Nix.
And what we wanted to do was computer forensics but reproducible as well. I told you that when we copy a hard drive, we make a hash of the content of the hard drive. That is because the defense attorney can go and say, "Okay, this is not the actual hard drive for my client." And so what we do is we take out the evidence bag and say, "Okay, this is your hard drive. This is the hash of the data we used. If you hash this hard drive, you will see that it's the same hash. So bit by bit, it's exactly the same data."
What we wanted to add was that not only is this the same data, but it's also the same software. And GNU Guix is basically a Merkle DAG. DAG is directed acyclic graph: it's a graph that cannot have a cycle, because otherwise you depend on something that depends on you or whatever.
And Merkle means that (this is just a friendly name) the real name of this is actually a huge hash digest, which depends on every single bit of the source code. And so for everything that depends on this package, if you change one bit here, it will change the hash of this one, so its name will change, so the name of this one will change as well, you see what I mean?
There is a cascading change of hashes, and so when you hold a package, you hold not only the package, but all the dependencies of the package, and the build dependencies as well, and everything.
So if you change one bit anywhere, you change the package and you cannot, maybe you don't know what it does, but at least you know that you've been, something fishy is going on. So this is quite nice.
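The cascade is easy to see on a toy graph. This sketch is only an illustration of the Merkle DAG idea: the package names and sources are invented, and the way Guix actually derives its store names involves more inputs than this. Each package's hash covers its own source plus the hashes of everything it depends on.

```python
import hashlib


def merkle_hash(name, packages, cache=None):
    """Hash of a package: its source plus the hashes of all its dependencies."""
    cache = {} if cache is None else cache
    if name not in cache:
        source, deps = packages[name]
        h = hashlib.sha256()
        h.update(source)
        for dep in sorted(deps):  # fixed order so the result is deterministic
            h.update(merkle_hash(dep, packages, cache).encode("ascii"))
        cache[name] = h.hexdigest()
    return cache[name]


# Hypothetical graph: name -> (source bytes, dependency names)
packages = {
    "libc": (b"libc source", []),
    "gcc": (b"gcc source", ["libc"]),
    "carver": (b"carver source", ["gcc", "libc"]),
    "docs": (b"docs source", []),
}
before = {name: merkle_hash(name, packages) for name in packages}

# Flip one byte deep in the graph...
packages["libc"] = (b"libc sourcE", [])
after = {name: merkle_hash(name, packages) for name in packages}

# ...and every package that depends on it gets a new hash; unrelated ones do not.
changed = sorted(name for name in packages if before[name] != after[name])
print(changed)
```

Publishing the single hash of the top-level tool is therefore enough to pin down the whole stack that produced a forensic result.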
If you want to do reproducible forensics, because then you can give the judge and the defense attorney, this is the data, this is the software we used to analyze the data, this is the hashes, you can run it again, you will see that the output is exactly the same, and now we can talk about the source code of what we did, and we can see if what we did is actually conclusive evidence, but at least we can talk about the meat of the matter and not care about handling of evidence and technical bits of which version were you using. And usually when there is a trial, the analysis was done four years ago, and you don't even remember what you were doing that day, so this was important for us.
But the 9P project was put on hold because we switched our efforts to reproducible forensics, thanks to Guix. Also, I got out of there and switched jobs, and also we finally got the hardware we needed: every investigator got a very powerful computer, and we no longer needed a single forensics computer to share. So it kind of fell by the wayside, but still, good news:
6. The dam

My friends and I started a communal computing project called the Dam. It's a pubnix, a public access Unix server. It's running GNU Guix, but you can have the Plan 9 from User Space tools in it. I've not managed to put this project back into GNU Guix because it's quite complex: for example, PAM is a bit complex, and you don't have NSS per se anymore, so I need to switch things a little bit.
But you do have Plan 9 tools. For example, we play tabletop role-playing games online, and we use the Plumber. Raise your hand if you know what the Plumber is.
Yeah, so we use the Plumber to, if you type in the chat, like img: and then the path to an image, this image will open on your computer, so you can share the image of whatever you are fighting.
So it's quite, it's silly, but it's fun to play with, and we can play and play with the computers as well.
So you are most welcome, if you want to log in, just give me your SSH public key and I will add you to the… and we can chat.
And there is a chat system made, it's basically dumping text into a text file, and it has all the features of Slack, but it's only five lines of bash, so it's awesome.
So if you want to work on that kind of stuff, please tell me.
7. Integrity

And also, every time I speak in public, I finish with this, I'm a member of a charity called Go for Integrity. We are sponsoring a school in Tanzania, and for example, last week, the roof was blown off by a storm, and for only 300 bucks, they were able to make a new roof. So, with very little money, you can have a huge impact, so if you want to help, please go to the website. The website is kind of laggy. It's kind of lagging in updates, because now we are using a WhatsApp channel, and the school is directly sending updates to the members. So, anyway, please visit, and no pressure, but I always talk about that when I finish talking. [scattered applause]