Data Structure Storage Techniques: Copy-On-Write Snapshotting in C
My blog has been leaning more towards data structures and algorithms lately and I'd like to continue with that for now. The last couple posts have been about graph algorithms. I've also been working on storage and retrieval of data structures (hashtrees and graphs).
When researching different ways to store data structures to disk, I came across a technique called copy-on-write snapshotting. The basic idea is you have a data structure in virtual memory that you want to store on disk. You fork(2) a child process and write the data structure to disk. There are a few things you have to do in order to get copy-on-write snapshotting.
The first thing is to use a private, anonymous mmap for storing your data structure in virtual memory.
unsigned long mapsize; unsigned char *pmap; /* 1MB */ mapsize = (1 << 20); pmap = mmap(0, mapsize, PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
If you're on OS X you probably need to use MAP_ANON instead of MAP_ANONYMOUS.
Let's say you have a binary tree T stored in virtual memory and it's time to take a snapshot of it. When you're using private virtual memory in one process and issue a fork() call, the parent process's VM pages aren't copied over to the child process. Instead the pages are shared until one of them writes to a shared page. If one process writes to a shared page, a copy of that page is made for the process doing the updating. The changes are not visible to the other process.
Copy-on-write is useful for when you're still issuing updates to the data structure in the parent process while snapshotting a current (as of the fork call) copy of it to disk in the child process. To take a snapshot, first make a call to fork():
pid_t pid = fork();
if (pid == 0) {
/* in the child process */
take_snapshot(T);
} else {
/* in the parent process */
...
}The take_snapshot() function is where you write a snapshot of the tree to disk:
#define RWRR S_IRUSR | S_IWUSR | S_IRGRP | S_IROTH
static void take_snapshot(struct tree_s *T) {
int fd = open("snapshot.dat", O_CREAT | O_TRUNC | O_RDWR, RWRR);
/* Seek to EOF minus one byte and write a single
* byte at the end to create a valid file map */
lseek(fd, T->mapsize-1, SEEK_SET);
write(fd, "", 1);
/* create file mmap */
unsigned char *fmap = mmap(0, T->mapsize, PROT_WRITE, MAP_SHARED, fd, 0);
/* copy bytes from snapshot memory to file map */
memcpy(fmap, T->pmap, T->mapsize);
/* transfer all data to disk */
fsync(fd);
munmap(fmap, T->mapsize);
munmap(T->pmap, T->mapsize);
close(fd);
/* free the child's copy of the tree */
free(T);
/* exit the child process */
exit(EXIT_SUCCESS);
}The code above can be thought of as pseudo-code because I haven't defined `struct tree_s` above and it won't work as-is. To see an actual working example, take a look at my GitHub gist on copy-on-write snapshotting of a binary search tree.
I'm also working on Multiversion Concurrency Control so look forward to a blog post about that topic.