Imagine you have a data logging application that writes data to disk continuously. Since the application is not very stable, you want it to write the data in small files, so that it does not lose too much data if it crashes. This requires finding a good trade-off between file size and file system, in order to avoid wasting too much disk space on file system overhead.
An approach to measure file system overhead and to explore the design space of different file systems and file sizes quickly is as follows:
- Create a ramdisk and, inside it, create a bulk file of a given size (using dd).
- For all combinations of file size and file system:
  - Format the bulk file with the desired file system (using mkfs) and mount it.
  - Continuously write files of a fixed size to the mounted bulk file until an exception occurs, and record how many files could be written (using some script).
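The steps above can be sketched roughly as follows. This is not the original script, only a minimal illustration in Python: the function names `setup_commands` and `fill_with_files` and the file naming scheme are made up for this example, and the dd, mkfs and mount invocations are only assembled, not executed, because running them requires root. Some mkfs back ends may also need an extra "force" option before they accept a regular file, which is left out here.

```python
import os


def setup_commands(bulk_file, mount_point, fs_type, size_mib=1024):
    """Build the shell commands for one run as argv lists (not executed here)."""
    return [
        # Create a zero-filled bulk file of size_mib MiB (inside the ramdisk).
        ["dd", "if=/dev/zero", f"of={bulk_file}", "bs=1M", f"count={size_mib}"],
        # Format the bulk file with the desired file system.
        ["mkfs", "-t", fs_type, bulk_file],
        # Mount the formatted bulk file through a loop device.
        ["mount", "-o", "loop", "-t", fs_type, bulk_file, mount_point],
    ]


def fill_with_files(target_dir, file_size, max_files=None):
    """Write files of file_size bytes into target_dir until a write fails.

    Returns the number of files that were written completely.
    max_files is a hypothetical safety limit for dry runs; None means
    "keep going until the file system reports an error".
    """
    payload = b"\0" * file_size
    count = 0
    while max_files is None or count < max_files:
        path = os.path.join(target_dir, f"{count:08d}.bin")
        try:
            with open(path, "wb") as f:
                f.write(payload)
                f.flush()
                os.fsync(f.fileno())  # make sure the data really hit the file system
        except OSError:
            # Typically "no space left on device"; drop the incomplete file.
            try:
                os.remove(path)
            except OSError:
                pass
            break
        count += 1
    return count
```

For a real measurement, the commands returned by `setup_commands` would be run as root (for example with `subprocess.run`), and `fill_with_files` would then be pointed at the mount point with `max_files=None`.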
Operations on the mounted bulk file are very fast, since the bulk file resides in a ramdisk. An experiment using this approach was conducted for a bulk file of 1 GiB. The file systems considered were ntfs, exfat, vfat, ext2, ext3 and ext4. File sizes were varied from 1 byte to 2^20 bytes. A plot summarizing the relative file system overhead for different file sizes and file systems is shown below:
From this figure it can be seen that file system overhead is excessive for small file sizes. ext2, ext3 and ext4 behave almost identically in terms of overhead. The minimal overhead in this experiment is observed for vfat at a file size of 65536 bytes per file. Strangely, exfat is always outperformed by ntfs.
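The post does not spell out how the relative overhead is computed. One plausible reading, assumed here, is the fraction of the bulk file that does not end up as file payload. The helper name `relative_overhead` and the file count in the example are hypothetical and are not taken from the experiment.

```python
def relative_overhead(bulk_size, file_size, files_written):
    """Fraction of the bulk file not used for payload (assumed definition)."""
    payload = file_size * files_written
    return 1.0 - payload / bulk_size


# Made-up example: a 1 GiB bulk file that accepted 16000 files of 65536 bytes.
print(relative_overhead(2**30, 65536, 16000))  # prints 0.0234375
```

With this definition, a run in which no file could be written gives an overhead of 1.0, and a file system with no overhead at all would give 0.0.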
The scripts that were used to conduct this experiment can be downloaded here.