Last night was the Derivco Dev Night in Durban, South Africa, which consisted of 3 presentations on various development or technical topics.
While it's always good to drink beer and eat burgers with other like-minded developers, I must say the highlight of the evening for me was the talk by Daniel De Abreu on the Zettabyte File System (ZFS).
Daniel is clearly passionate about open source technologies, and his passion spills over in the form of the occasional expletive, which he tried hard to suppress as he got more carried away with explaining how amazing ZFS is.
His talk briefly touched on some 41 years of development, going back to the origins of the Unix/Linux software and the impact of the pipe mechanism pioneered by Douglas McIlroy at Bell Labs - represented as | in shell commands.
For those Linux nerds who want to know more about the pipe command, you can just type man pipe.
For those Linux newbies, you can learn more about the man command by typing man man
- Christos I know you tried this :-)
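For anyone who has never actually used a pipe, here is a minimal sketch of the idea McIlroy pioneered: the output of one command becomes the input of the next, so small single-purpose tools can be chained together. The data below is made up purely for illustration.

```shell
# Three small programs chained with |: printf emits three lines,
# sort orders them alphabetically, head keeps only the first.
printf 'cherry\napple\nbanana\n' | sort | head -n 1
# prints "apple"
```

Each tool knows nothing about the others; the shell simply connects one process's standard output to the next process's standard input.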
No reference to the Unix and Bell Labs would have been complete without showing a retro video of some of the Bell engineers like Ken Thompson and Dennis Ritchie (creator of the C programming language and where would PHP be without C). Engineering fashion has come a long way since then, although it seems the excessive facial hair is a trend thats stands the test of time.
So the obvious question is “How big is a Zettabyte?”. With 128-bit addressing, ZFS can store up to 275 billion TB per storage pool - called a zpool. The zpool basically takes all available storage drives and makes them available as a single resource. So this is really big, but not necessarily revolutionary - and we all know size is not everything.
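Creating a zpool looks something like the sketch below. This assumes a system with ZFS installed and root privileges; the pool name "tank", the dataset name, and the device paths are hypothetical placeholders, so adjust them for your own machine before running anything.

```shell
# Combine two (placeholder) disks into a single storage pool called "tank".
zpool create tank /dev/sdb /dev/sdc

# The pool now appears as one resource, regardless of how many drives back it.
zpool list

# Carve a dataset (a ZFS file system) out of the pool for development work.
zfs create tank/dev
```

Datasets share the pool's free space, so you don't have to decide partition sizes up front.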
The real magic that ZFS brings is a whole range of file system optimisations and features, some of which make your life as a developer better. Working in a virtualised development/test environment can be made much simpler with efficient snapshot management. The snapshot provides a read-only, point-in-time copy of the dataset.
ZFS uses copy-on-write, which means your data never gets overwritten: new data is written to different data blocks, and the pointers to the previous blocks are updated. This makes working with snapshots very simple. You can snapshot a dataset, which allows you to restore your system to a repeatable pre-test state without having to clone your entire virtual environment and then restore it to re-run the tests. Because copy-on-write preserves the older version of the data on disk, snapshots can be created and restored quickly, and creating a snapshot initially uses no additional disk space. It is also possible to view a comparison between two snapshots.
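That workflow can be sketched in three commands. Again this assumes a ZFS system with root access, and the dataset and snapshot names ("tank/dev", "@pretest") are hypothetical:

```shell
# Take a read-only, point-in-time copy of the dataset (near-instant,
# and initially free thanks to copy-on-write).
zfs snapshot tank/dev@pretest

# ... run your destructive tests against tank/dev ...

# Roll the dataset back to the repeatable pre-test state.
zfs rollback tank/dev@pretest

# Compare what changed between the snapshot and the live dataset.
zfs diff tank/dev@pretest tank/dev
```

Rolling back discards everything written after the snapshot, which is exactly what you want for a clean re-run of a test suite.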
Sharing snapshots with other developers is very straightforward using zfs send and zfs receive, which can be piped through ssh between virtual environments (remember McIlroy and |). Daniel talked us through a real-world scenario where he sent a copy of a file system using rsync versus using ZFS snapshots; the results showed a significant performance improvement using the ZFS snapshot. When you have development teams in different offices around the world, these optimisations are very useful.
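A sketch of what that looks like in practice - the hostname "dev2.example.com", the pool names, and the snapshot names are all hypothetical, and both machines need ZFS installed:

```shell
# Serialise a snapshot as a byte stream and reconstruct it on another
# host - McIlroy's pipe doing the heavy lifting over ssh.
zfs send tank/dev@pretest | ssh dev2.example.com zfs receive backup/dev

# For later updates, an incremental send (-i) ships only the blocks
# that changed between the two snapshots.
zfs send -i tank/dev@pretest tank/dev@posttest | ssh dev2.example.com zfs receive backup/dev
```

The incremental form is where the big win over rsync comes from: ZFS already knows exactly which blocks changed, so it never has to scan the file system to find out.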
A combination of beers, burgers and ZFS made for a good Derivco Dev Night. Clearly Dan is the Man, or in Unix terms
dan | man.