Sunday, 23 March 2014

forkfs: making any directory into a throw away directory

I recently needed to work on two git branches simultaneously. My particular use case required me to make small changes to both branches, rebuild and run. These were temporary changes that I was going to blow away at the end. One approach could have been to have two clones of the repo but that would have required me to rebuild my project from scratch for the second working copy (it was boost which is large). Having run into the same predicament before, I decided to create a script to "fork" a directory.

The script, called forkfs, works as follows. Suppose you are in your bash session in directory foo:
~/foo$ ls
bar

You then execute forkfs which launches a new bash session and changes your prompt to remind you that you are forked:
~/foo$ sudo ~/bin/forkfs
(forkfs) ~/foo$ ls
bar

So far it looks like nothing has changed but if we start making changes to the contents of foo, they'll only be visible in our forked session:
(forkfs) ~/foo$ touch baz
(forkfs) ~/foo$ ls
bar baz

Open up another bash session and do an ls there:
~/foo$ ls
bar

When you exit out of your forkfs session, all your changes will be lost! You can also make multiple forks of the same directory, just not a fork of a fork. The forkfs script is available on GitHub.

Words Of Caution

Be careful where your fork begins. If you're using it for multiple git branches like I was, be sure to fork at the repository directory -- same place that houses the .git directory. Otherwise git will be confused outside of forked session.

Under The Hood

The script makes use of two technologies that are also used by Docker: aufs and mount namespaces. Aufs is a union filesystem which takes multiple source directories and combines them into a single destination directory. For example, one can mount /home/johndoe such that it's a union of /home/john and /home/doe directories. When you make changes in /home/johndoe, aufs uses a preconfigured policy to figure out which underlying directory gets the changes. One such policy allows forkfs to create Copy-On-Write functionality. When forkfs is forking ~/foo:

1. It creates an empty temporary directory, e.g. /tmp/tmp.Gb33ro1lrU
2. It mounts ~/foo (marked as readonly) + /tmp/tmp.Gb33ro1lrU (marked as read-write) into ~/foo:
mount -t aufs -o br=/tmp/tmp.Gb33ro1lrU=rw:~/foo=ro none ~/foo

Since ~/foo was marked as read only, all the writes go to the temporary directory, achieving copy-on-write.

Notice that the aufs was mounted over ~/foo. An unfortunate consequence of this is that the original ~/foo is no longer accessible. Moreover, it will not be possible to create other forks of ~/foo. This is where mount namespaces come to the rescue.

Mount namespaces allow a process to inherit all of the parent's mounts but have any further mounts not be visible by its parent. Linux actually has other namespaces of which a process can get a private copy: IPC, network, host/domain names, PIDs. Linux containers (LXC) makes use of these to provide light-weight virtualization.

unshare is a handy command to get a process running with a private namespace. forkfs uses "unshare -m bash" to get a bash running with a private mount namespace. It then executes aufs mount without having the rest of the system seeing the fork.

Future Work

If I have time, I'd like to add ability to create named forks and then be able to come back to them (like Python's virtualenv).

Acknowledgements

Special thanks goes to Vlad Didenko for helping me out with bash nuances.