Before I did my PhD, my interaction with data storage on disk had mainly been the classic way. By the classic way I mean having a storage medium (SSD, HDD, USB, etc.) formatted with a file system (HFS, EXT4, APFS, NTFS, XFS, etc.). During my PhD I started to ask the question: do I actually need the file system?

The reason for this question was that I was running into a strange limitation on Ubuntu. I was literally hitting an upper bound for how many files there could be in a single directory. I hit this by exceeding the number of open file descriptors allowed by Ubuntu, which at the time was lower than, for instance, Fedora's limit. Note, it is really confusing when code works on one Linux distribution and not the other, and the error you get is related to the limits imposed on file descriptors. ANYWAYS! I had to solve this, and the first (and worst) solution I came up with was simply to implement a folder structure which essentially mapped a trie to disk. Not my smartest idea, but hey, we got a FUSE-based file system to work. However, after a while I started to consider whether I could avoid my problems by avoiding the file system altogether.

The answer to this is yes, yes you can, and it is surprisingly easy. All you need is a raw block device. What is this then? Basically, a raw block device is any block device that does not have a file system managing the data on it. To keep it simple: a disk partition that has not been formatted with a file system, an entire disk, or on *NIX a virtual block device such as a loop device (/dev/null does not count, it is a character device). You will need knowledge of pwrite and pread (or read/write or fread/fwrite); I will get into why I prefer the p versions later. These two functions can be utilised by including the unistd.h header. Oh, and you need a file descriptor to the device.

Let us start by looking at pread and pwrite (see code below). Both take the same parameters as input: a file descriptor fd, a data buffer buf (which in the case of pwrite is const to indicate it is not changed), a byte count count indicating how many bytes should be read/written, and an offset from/to where data should be read/written. A note: the file descriptor must be capable of seeking. The return value is the number of bytes read/written, which can be compared to count since both calls may transfer fewer bytes than requested. For pwrite, 0 indicates nothing was written; for pread, 0 indicates end of file (EOF). If -1 is returned, an error has occurred and we can use errno to check what error it is.

ssize_t pread(int fd, void *buf, size_t count, off_t offset);
ssize_t pwrite(int fd, const void *buf, size_t count, off_t offset);
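Since both calls are allowed to transfer fewer bytes than asked for, a common pattern is to loop until everything is done. Here is a rough sketch of that idea. The names pwrite_all, pread_all and roundtrip are mine, and treating a zero-byte write as an EIO error is a choice made for this sketch rather than something the calls mandate.

```c
#include <assert.h>    // assert
#include <errno.h>     // errno, EINTR, EIO
#include <fcntl.h>     // open and O_* flags
#include <string.h>    // memcmp
#include <sys/types.h> // ssize_t, off_t
#include <unistd.h>    // pread, pwrite, close

// Writes exactly count bytes at offset, retrying on short writes and EINTR.
// Returns 0 on success and -1 on error. (Hypothetical helper.)
int pwrite_all(int fd, const void *buf, size_t count, off_t offset)
{
    assert(buf != NULL);
    const char *p = buf;
    size_t done = 0;
    while (done < count) {
        ssize_t n = pwrite(fd, p + done, count - done, offset + (off_t)done);
        if (n == -1) {
            if (errno == EINTR) continue; // interrupted, try again
            return -1;
        }
        if (n == 0) { // no progress: give up instead of spinning forever
            errno = EIO;
            return -1;
        }
        done += (size_t)n;
    }
    return 0;
}

// Reads up to count bytes at offset, retrying on short reads and EINTR.
// Returns the bytes read (less than count only at EOF) or -1 on error.
ssize_t pread_all(int fd, void *buf, size_t count, off_t offset)
{
    assert(buf != NULL);
    char *p = buf;
    size_t done = 0;
    while (done < count) {
        ssize_t n = pread(fd, p + done, count - done, offset + (off_t)done);
        if (n == -1) {
            if (errno == EINTR) continue;
            return -1;
        }
        if (n == 0) break; // EOF
        done += (size_t)n;
    }
    return (ssize_t)done;
}

// Usage sketch: writes msg at offset 0 of path and reads it back.
// Returns 1 if the bytes match, 0 if they differ, -1 on an I/O error.
int roundtrip(const char *path, const char *msg, size_t len)
{
    char back[64];
    if (len > sizeof back) return -1;

    int fd = open(path, O_RDWR | O_CREAT, 0600);
    if (fd == -1) return -1;

    int result = -1;
    if (pwrite_all(fd, msg, len, 0) == 0 &&
        pread_all(fd, back, len, 0) == (ssize_t)len) {
        result = memcmp(msg, back, len) == 0;
    }
    close(fd);
    return result;
}
```

I would try roundtrip on a scratch file first, for instance roundtrip("/tmp/scratch.bin", "Hello world!", 12), long before handing it a device path.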

From this we can create a fairly simple example. /WARNING/: DO NOT RUN THIS EXAMPLE UNLESS YOU ADAPT IT TO A DEVICE WHERE YOU DO NOT MIND LOSING THE DATA!

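#include <fcntl.h>  // open and related flags
#include <unistd.h> // pread, pwrite, close
#include <string.h> // memcmp
#include <stdio.h>  // perror, printf

int main(void)
{
    const char *device = "/dev/sda"; // CHANGE THIS: data at offset is overwritten
    const char *data = "Hello world!";
    const size_t data_length = 12; // the terminating '\0' is not written
    const off_t offset = 0;

    int fd = open(device, O_RDWR); // Opens file descriptor in READ/WRITE mode

    if (fd == -1) {
        perror("Error opening device");
        return 1;
    }

    int status = 1; // assume failure until the round trip has completed
    ssize_t bytes_written = pwrite(fd, data, data_length, offset);

    if (bytes_written == -1) {
        perror("Error writing data");
    }
    else if ((size_t)bytes_written != data_length) {
        fprintf(stderr, "Short write: only %zd bytes written\n", bytes_written);
    }
    else {
        char data_out[data_length];
        ssize_t bytes_read = pread(fd, data_out, data_length, offset);

        if (bytes_read == -1) {
            perror("Error reading data");
        } else if ((size_t)bytes_read != data_length) {
            fprintf(stderr, "Short read: only %zd bytes read\n", bytes_read);
        } else {
            // memcmp, not strcmp: data_out holds raw bytes without a '\0'
            const int result_of_compare = memcmp(data, data_out, data_length);
            printf("It is the same data: %s\n", result_of_compare == 0 ? "True" : "False");
            status = 0;
        }
    }
    close(fd);
    return status;
}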

Based on this example you should be able to see a few issues when reading and writing without a file system. First, where is the data, or where should I put it? Let us say I run this code again, but with the data being "I love hamburgers" and data_length being 17. Then we will write over the "Hello world!" on disk. But how do we know it is there? Well, we do not and we never will, unless we keep a record of it. Secondly, how do we know how much data to read? Let us say that our write and read happen at two different points in time; how do we know how much data to read? Well, we do not and we never will, unless we keep a record of it. So we need to keep a record of both, which essentially is the beginning of a file system… But still, we can read and write without a file system.
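To make the record-keeping point concrete, here is a minimal sketch where a tiny header at offset 0 stores a marker and the payload length, and the payload follows right after it. Everything here is an assumption made for illustration: the struct layout, the RECORD_MAGIC value, and the helper names. The header is written in host byte order, so it is not portable between machines, and short reads and writes are simply treated as errors.

```c
#include <assert.h> // assert
#include <fcntl.h>  // open and O_* flags
#include <stdint.h> // uint32_t
#include <string.h> // memcmp
#include <unistd.h> // pread, pwrite, close

#define RECORD_MAGIC 0x4C415253u // arbitrary marker chosen for this sketch

// Hypothetical on-disk header: "is there a record, and how long is it?"
struct record_header {
    uint32_t magic;
    uint32_t length;
};

// Stores the header at offset 0 and the payload directly after it.
// Returns 0 on success, -1 on an error or short write.
int store_record(int fd, const void *data, uint32_t length)
{
    assert(data != NULL);
    struct record_header h = { RECORD_MAGIC, length };
    if (pwrite(fd, &h, sizeof h, 0) != (ssize_t)sizeof h) return -1;
    if (pwrite(fd, data, length, (off_t)sizeof h) != (ssize_t)length) return -1;
    return 0;
}

// Loads the payload into out. Returns its length, or -1 if there is no
// valid header, the buffer is too small, or a read fails or comes up short.
ssize_t load_record(int fd, void *out, size_t capacity)
{
    assert(out != NULL);
    struct record_header h;
    if (pread(fd, &h, sizeof h, 0) != (ssize_t)sizeof h) return -1;
    if (h.magic != RECORD_MAGIC || h.length > capacity) return -1;
    if (pread(fd, out, h.length, (off_t)sizeof h) != (ssize_t)h.length) return -1;
    return (ssize_t)h.length;
}

// Usage sketch on a scratch file: returns 1 if everything behaved as
// described, 0 if not, -1 if the file could not be opened.
int record_demo(const char *path)
{
    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd == -1) return -1;

    char out[32];
    int ok = load_record(fd, out, sizeof out) == -1          // nothing stored yet
          && store_record(fd, "Hello world!", 12) == 0
          && store_record(fd, "I love hamburgers", 17) == 0  // overwrites the first
          && load_record(fd, out, sizeof out) == 17          // length comes from the header
          && memcmp(out, "I love hamburgers", 17) == 0;
    close(fd);
    return ok;
}
```

With the header in place, the reader no longer has to guess whether data is there or how long it is. Add a second record, or names, and you are well on your way to reinventing a file system.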

Now, why do I prefer pread and pwrite over read/write and fread/fwrite? Well frankly, I just prefer the way I can provide an offset and then "magic" happens behind the scenes, doing the seek operations for me. As a bonus, pread and pwrite leave the descriptor's file offset untouched. But I know purists who prefer read and write for the opposite reason, and that is fair enough.
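For comparison, here is a small sketch of the two styles side by side on a scratch file. The helpers read_at and offset_demo are names I made up; read_at is simply lseek followed by read, which is the work pread saves you from doing by hand.

```c
#include <assert.h> // assert
#include <fcntl.h>  // open and O_* flags
#include <string.h> // memcmp
#include <unistd.h> // lseek, read, pread, pwrite, close

// The read() way: seek to the offset first, then read.
// As a side effect this moves the descriptor's file offset.
// read_at is a hypothetical helper name.
ssize_t read_at(int fd, void *buf, size_t count, off_t offset)
{
    assert(buf != NULL);
    if (lseek(fd, offset, SEEK_SET) == (off_t)-1) {
        return -1;
    }
    return read(fd, buf, count);
}

// Usage sketch: returns 1 if both styles read the same bytes and the file
// offset behaves as the comments say, 0 if not, -1 if open fails.
int offset_demo(const char *path)
{
    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
    if (fd == -1) return -1;

    char a[5], b[5];
    int ok = pwrite(fd, "Hello world!", 12, 0) == 12
          && lseek(fd, 0, SEEK_CUR) == 0   // pwrite left the offset at 0
          && pread(fd, a, 5, 6) == 5
          && lseek(fd, 0, SEEK_CUR) == 0   // and so did pread
          && read_at(fd, b, 5, 6) == 5
          && lseek(fd, 0, SEEK_CUR) == 11  // lseek + read moved it to 6 + 5
          && memcmp(a, b, 5) == 0
          && memcmp(a, "world", 5) == 0;
    close(fd);
    return ok;
}
```

The untouched offset matters most when several threads share one descriptor: with lseek plus read, another thread can move the offset between the two calls.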

Anyways now you can write to disk like a pro.

For the love of all gods and demons be careful.

./Lars