Efficient File I/O, Part 1: Some (somewhat) surprising findings on C’s fseek
I’m working a lot with files for my latest research project, ExoLabel. I’ve (mostly) finished ensuring that the algorithm itself is accurate, so lately I’ve been turning my attention to optimizing its speed. Unsurprisingly, most of the slowest operations involve file access, since accessing files is orders of magnitude slower than accessing RAM.
This led me to ask: what, specifically, is the slowest operation when writing to or reading from a file? I’m specifically looking at the following two operations, which are pretty common throughout my code (a sketch of both patterns follows the list):
- Update a random value in a file:
  - `fseek` to position
  - `fread` value
  - update value
  - `fseek` to position
  - `fwrite` value
- Read a bunch of values at random positions:
  - `fseek` to position
  - `fread` value(s)
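For concreteness, here’s a rough sketch of both patterns (error handling omitted; this is illustrative rather than the exact benchmark code):

```c
#include <stdio.h>

/* Sketch only: `offset` is a byte offset into an already-open file `f`. */

/* Update a random value in place */
static void update_value(FILE *f, long offset) {
    int value;
    fseek(f, offset, SEEK_SET);         /* seek to position */
    fread(&value, sizeof(int), 1, f);   /* read value (this advances the position) */
    value += 1;                         /* update value */
    fseek(f, offset, SEEK_SET);         /* seek back to position */
    fwrite(&value, sizeof(int), 1, f);  /* write value */
}

/* Read a single value at a random position */
static int read_value(FILE *f, long offset) {
    int value;
    fseek(f, offset, SEEK_SET);
    fread(&value, sizeof(int), 1, f);
    return value;
}
```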
Now, there are a couple of “dials” we can turn for optimization:
- sorting locations vs. not sorting
- buffered vs. unbuffered files
- absolute vs. relative positioning for `fseek`
- opening in `rb` vs. `rb+` for reads
- HDDs vs. SSDs (can’t control this, but should know how it affects results)
Finally, there is always the option of using portable C vs. POSIX/Windows-specific APIs. I’m not going to test this because we’re specifically interested in the performance of 100% portable C code. You could probably improve performance further by using `mmap` (or `MapViewOfFile` on Windows), but controlling for those scenarios adds a ton of code complexity given that my algorithm has to work on both POSIX and Windows.
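For reference, here’s roughly how the open-mode and buffering dials get set (a sketch; the filename is a placeholder):

```c
#include <stdio.h>

int main(void) {
    /* "rb" = read-only binary; "rb+" = read/write binary (the file must already exist) */
    FILE *f = fopen("bigfile.bin", "rb+");   /* placeholder filename */
    if (f == NULL) return 1;

    /* Disable stdio buffering entirely; must be called before any read/write.
       Omit this call to get the default (buffered) behavior. */
    setvbuf(f, NULL, _IONBF, 0);

    /* ...reads/writes being benchmarked go here... */

    fclose(f);
    return 0;
}
```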
Benchmarks
Let’s start with the data, then we can break it down. If you’re just interested in the conclusions, skip to the bottom.
All of these tests used the following setup:
- M1 MacBook Pro with 32GB of RAM
- SSD testing used the MacBook internal drive (unclear specifications)
- HDD testing used a LaCie Rugged Mini 2TB external hard drive connected via USB-C (5120 Mbit/sec transfer speed)
- Tests preinitialized a 4.29GB file (2^30 random integer values, nonzero to prevent sparse files)
- Each test did 100,000 reads or updates at random locations (locations were consistent across tests)
- Timing was measured using `clock()` from `time.h`, and is reported in clock ticks (timing covered only each operation itself, excluding things like file initialization or results reporting)
- Tests marked as `w/ sort` sorted the indices to analyze prior to going through the file; this makes reads more sequential rather than purely random
- Tests marked as `(b)uffered` used the standard file buffer; those marked `unbuffered (ub)` called `setvbuf(f, NULL, _IONBF, 0)` to prevent buffering
- `SEEK_SET` and `SEEK_CUR` are `whence` parameters for `fseek`, treating the position to move to as either absolute (`SEEK_SET`) or relative to the current position (`SEEK_CUR`)
- Sorting the array of indices used `qsort` and took around 7,000 clock ticks, which was negligible relative to the overall time consumed for each test
- Average times are calculated using the middle 90% of reported times (the 5%-95% range of results) to reduce the impact of huge outliers
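Roughly, the timing and trimmed-mean calculation looked like the following sketch (illustrative names, not the exact benchmark harness):

```c
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

/* qsort comparator for clock_t samples */
static int cmp_clock(const void *a, const void *b) {
    clock_t x = *(const clock_t *)a, y = *(const clock_t *)b;
    return (x > y) - (x < y);
}

/* Mean of the middle 90% of samples (drops the lowest and highest 5%) */
static double trimmed_mean(clock_t *samples, size_t n) {
    qsort(samples, n, sizeof(clock_t), cmp_clock);
    size_t lo = n / 20, hi = n - n / 20;
    double sum = 0.0;
    for (size_t i = lo; i < hi; i++) sum += (double)samples[i];
    return sum / (double)(hi - lo);
}

/* Timing one operation: only the seek + read itself is measured */
static clock_t time_one_read(FILE *f, long offset) {
    int value;
    clock_t start = clock();
    fseek(f, offset, SEEK_SET);
    fread(&value, sizeof(int), 1, f);
    (void)value;   /* the value itself isn't needed for timing */
    return clock() - start;
}
```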
Benchmark 1: Random Reads
This benchmark measured the average time per read when reading 100,000 `int` values from random locations in the large file (all times in clock ticks).
| drive | mode | sorted? | buffered? | whence | mean | median | max |
|-------|------|---------|------------|----------|--------|--------|-------|
| SSD | rb | no sort | buffered | SEEK_SET | 9.15 | 9 | 516 |
| SSD | rb | no sort | buffered | SEEK_CUR | 8.61 | 9 | 226 |
| SSD | rb | no sort | unbuffered | SEEK_SET | 8.53 | 9 | 224 |
| SSD | rb | no sort | unbuffered | SEEK_CUR | 10.55 | 10 | 244 |
| SSD | rb+ | no sort | buffered | SEEK_SET | 9.22 | 10 | 409 |
| SSD | rb+ | no sort | buffered | SEEK_CUR | 9.16 | 10 | 207 |
| SSD | rb+ | no sort | unbuffered | SEEK_SET | 8.16 | 9 | 436 |
| SSD | rb+ | no sort | unbuffered | SEEK_CUR | 8.39 | 9 | 310 |
|-------|------|---------|------------|----------|--------|--------|-------|
| SSD | rb | sorted | buffered | SEEK_SET | 8.75 | 9 | 539 |
| SSD | rb | sorted | buffered | SEEK_CUR | 8.91 | 9 | 382 |
| SSD | rb | sorted | unbuffered | SEEK_SET | 8.77 | 9 | 314 |
| SSD | rb | sorted | unbuffered | SEEK_CUR | 8.49 | 9 | 259 |
| SSD | rb+ | sorted | buffered | SEEK_SET | 9.47 | 10 | 563 |
| SSD | rb+ | sorted | buffered | SEEK_CUR | 9.13 | 10 | 383 |
| SSD | rb+ | sorted | unbuffered | SEEK_SET | 8.39 | 9 | 267 |
| SSD | rb+ | sorted | unbuffered | SEEK_CUR | 8.65 | 9 | 338 |
|-------|------|---------|------------|----------|--------|--------|-------|
| HDD | rb | no sort | buffered | SEEK_SET | 83.17 | 84 | 241 |
| HDD | rb | no sort | buffered | SEEK_CUR | 95.34 | 96 | 301 |
| HDD | rb | no sort | unbuffered | SEEK_SET | 87.15 | 88 | 260 |
| HDD | rb | no sort | unbuffered | SEEK_CUR | 87.87 | 89 | 315 |
| HDD | rb+ | no sort | buffered | SEEK_SET | 107.93 | 107 | 329 |
| HDD | rb+ | no sort | buffered | SEEK_CUR | 105.47 | 105 | 293 |
| HDD | rb+ | no sort | unbuffered | SEEK_SET | 80.47 | 81 | 269 |
| HDD | rb+ | no sort | unbuffered | SEEK_CUR | 86.16 | 87 | 295 |
|-------|------|---------|------------|----------|--------|--------|-------|
| HDD | rb | sorted | buffered | SEEK_SET | 28.58 | 31 | 541 |
| HDD | rb | sorted | buffered | SEEK_CUR | 30.47 | 33 | 329 |
| HDD | rb | sorted | unbuffered | SEEK_SET | 28.40 | 31 | 377 |
| HDD | rb | sorted | unbuffered | SEEK_CUR | 27.45 | 30 | 320 |
| HDD | rb+ | sorted | buffered | SEEK_SET | 22.17 | 21 | 446 |
| HDD | rb+ | sorted | buffered | SEEK_CUR | 22.33 | 21 | 690 |
| HDD | rb+ | sorted | unbuffered | SEEK_SET | 20.34 | 19 | 388 |
| HDD | rb+ | sorted | unbuffered | SEEK_CUR | 21.00 | 20 | 314 |
The best-performing results for SSDs and HDDs were:

| drive | mode | sorted? | buffered? | whence | mean | median | max |
|-------|------|---------|------------|----------|--------|--------|-------|
| SSD | rb+ | no sort | unbuffered | SEEK_SET | 8.16 | 9 | 436 |
| SSD | rb+ | no sort | unbuffered | SEEK_CUR | 8.39 | 9 | 310 |
| SSD | rb+ | sorted | unbuffered | SEEK_SET | 8.39 | 9 | 267 |
| HDD | rb+ | sorted | unbuffered | SEEK_SET | 20.34 | 19 | 388 |
Broadly, opening as `rb` vs. `rb+` doesn’t make a difference. However, if you know in advance that you’re only going to be reading a single `int` from each position, eliminating buffering can improve runtimes. The largest improvement by far, though, came from sorting the indices to traverse before reading them on HDDs. HDDs benefit significantly from sequential over random access because of their internal hardware (a physical head has to travel to each location), so this finding makes sense. There’s also a difference between `SEEK_SET` and `SEEK_CUR`, which I’ll come back to later.
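As a sketch, the sorted-read pattern looks something like this (in the real benchmark you’d sort offset/index pairs so results can be mapped back to their original order):

```c
#include <stdio.h>
#include <stdlib.h>

/* qsort comparator for long byte offsets */
static int cmp_long(const void *a, const void *b) {
    long x = *(const long *)a, y = *(const long *)b;
    return (x > y) - (x < y);
}

/* Visit offsets in ascending order so the drive seeks mostly forward.
   Sorts `offsets` in place; results land in `out` in sorted-offset order. */
static void read_sorted(FILE *f, long *offsets, size_t n, int *out) {
    qsort(offsets, n, sizeof(long), cmp_long);
    for (size_t i = 0; i < n; i++) {
        fseek(f, offsets[i], SEEK_SET);
        fread(&out[i], sizeof(int), 1, f);
    }
}
```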
Benchmark 2: Random Updates
This benchmark measured the average time to read 100,000 values, increment each one, and then overwrite the original value in the file (all times in clock ticks). Running on an HDD without presorting the locations to access took prohibitively long, so those results are not included.
An additional `whence` strategy is included here, which I’m calling `SEEK_BOTH`. Each update requires two `fseek` calls: we `fseek` to the position, read the value, and then have to `fseek` back to the same position to write the new value (because `fread` advances the position in the file). We can either do two absolute calls with `SEEK_SET` (this is strategy `SEEK_SET`), do two relative calls with `SEEK_CUR` (strategy `SEEK_CUR`), or do one absolute call with `SEEK_SET` followed by `fseek(f, -1*sizeof(int), SEEK_CUR)`. This last case is strategy `SEEK_BOTH`.
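Concretely, the three strategies look something like the following sketch (function names are mine; error handling omitted):

```c
#include <stdio.h>

/* Strategy SEEK_SET: two absolute seeks per update */
static void update_abs(FILE *f, long off) {
    int v;
    fseek(f, off, SEEK_SET);
    fread(&v, sizeof(int), 1, f);
    v++;
    fseek(f, off, SEEK_SET);                 /* second absolute seek */
    fwrite(&v, sizeof(int), 1, f);
}

/* Strategy SEEK_BOTH: one absolute seek, then step back relative to here */
static void update_both(FILE *f, long off) {
    int v;
    fseek(f, off, SEEK_SET);
    fread(&v, sizeof(int), 1, f);
    v++;
    fseek(f, -(long)sizeof(int), SEEK_CUR);  /* back over the value just read */
    fwrite(&v, sizeof(int), 1, f);
}

/* Strategy SEEK_CUR: two relative seeks; `cur` tracks the position externally */
static void update_rel(FILE *f, long off, long *cur) {
    int v;
    fseek(f, off - *cur, SEEK_CUR);
    fread(&v, sizeof(int), 1, f);
    v++;
    fseek(f, -(long)sizeof(int), SEEK_CUR);
    fwrite(&v, sizeof(int), 1, f);
    *cur = off + (long)sizeof(int);          /* fwrite leaves us just past the value */
}
```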
| Drive | sort? | Strategy | buff? | min | mean | median | max |
|-------|---------|-----------|-------|-------|--------|--------|-------|
| SSD | no sort | SEEK_SET | b | 3 | 95.21 | 20 | 514 |
| SSD | no sort | SEEK_BOTH | b | 3 | 92.48 | 15 | 606 |
| SSD | no sort | SEEK_CUR | b | 3 | 81.08 | 15 | 611 |
| SSD | no sort | SEEK_SET | ub | 2 | 93.62 | 16 | 589 |
| SSD | no sort | SEEK_BOTH | ub | 3 | 79.33 | 15 | 629 |
| SSD | no sort | SEEK_CUR | ub | 3 | 80.24 | 15 | 628 |
|-------|---------|-----------|-------|-------|--------|--------|-------|
| SSD | w/ sort | SEEK_SET | b | 2 | 18.60 | 14 | 1569 |
| SSD | w/ sort | SEEK_BOTH | b | 2 | 16.11 | 13 | 1311 |
| SSD | w/ sort | SEEK_CUR | b | 2 | 16.60 | 13 | 1411 |
| SSD | w/ sort | SEEK_SET | ub | 2 | 12.59 | 10 | 1308 |
| SSD | w/ sort | SEEK_BOTH | ub | 2 | 11.76 | 7 | 1423 |
| SSD | w/ sort | SEEK_CUR | ub | 2 | 9.52 | 5 | 1321 |
|-------|---------|-----------|-------|-------|--------|--------|-------|
| HDD | w/ sort | SEEK_SET | b | 1 | 33.01 | 35 | 403 |
| HDD | w/ sort | SEEK_BOTH | b | 1 | 40.26 | 37 | 400 |
| HDD | w/ sort | SEEK_CUR | b | 1 | 44.22 | 39 | 393 |
| HDD | w/ sort | SEEK_SET | ub | 1 | 34.44 | 34 | 572 |
| HDD | w/ sort | SEEK_BOTH | ub | 1 | 40.87 | 36 | 377 |
| HDD | w/ sort | SEEK_CUR | ub | 1 | 35.41 | 36 | 370 |
The best results for SSDs and HDDs are:

| Drive | sort? | Strategy | buff? | min | mean | median | max |
|-------|---------|-----------|-------|-------|--------|--------|-------|
| SSD | w/ sort | SEEK_CUR | ub | 2 | 9.52 | 5 | 1321 |
| HDD | w/ sort | SEEK_SET | b | 1 | 33.01 | 35 | 403 |
Once again, sorting the locations to visit prior to accessing them makes the largest difference. And once again there is a gap between `SEEK_SET` and `SEEK_CUR`, with `SEEK_SET` coming out ahead, at least for HDDs.
What’s going on with SEEK_SET vs SEEK_CUR?
When I started writing code, I assumed that using `SEEK_CUR` would give better performance. It takes a little more work to program, but I thought it would require the OS to move around less in the file. If we’re at position 10 and want to get to position 9, `fseek(f, -1, SEEK_CUR)` would only move one space, whereas `fseek(f, 9, SEEK_SET)` would rewind to the start and then move forward 9 positions. At least, that’s what I thought.
To accomplish this, I added a lot of complexity to my code. Assume we have a vector of positions `pos`. Traversal using just `SEEK_SET` looks like this:
```c
// traversing using only SEEK_SET
// (assumes pos holds long byte offsets; int would overflow for files larger than 2GB)
for (int i = 0; i < length_of_pos; i++) {
    fseek(f, pos[i], SEEK_SET);
    // do stuff
}
```
If we want to use only relative seeks, it instead looks like this:
```c
// traversing using only SEEK_CUR
long cur_pos = 0, to_move;   // long, not int: offsets into a >2GB file overflow int
for (int i = 0; i < length_of_pos; i++) {
    to_move = pos[i] - cur_pos;
    fseek(f, to_move, SEEK_CUR);
    // do stuff; if we read or write here, cur_pos has to be advanced by that amount too
    cur_pos += to_move;
}
```
This is a little more complicated, and it can get a lot more complex when read/writes are included.
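For example, if every visit also reads a value, the bookkeeping has to account for `fread` advancing the position. A sketch:

```c
// traversing with SEEK_CUR when each position is also read
long cur_pos = 0, to_move;
int value;
for (int i = 0; i < length_of_pos; i++) {
    to_move = pos[i] - cur_pos;
    fseek(f, to_move, SEEK_CUR);
    fread(&value, sizeof(int), 1, f);     // advances the position by sizeof(int)...
    cur_pos = pos[i] + (long)sizeof(int); // ...so the tracked position must include it
}
```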
It turns out, though, that this is a dumb thing to do…and also, my understanding of `fseek` was completely incorrect. Let’s take a look at the source code for `fseek` (sourced from here; note that I’m removing a lot of error checking to make this more digestible):
```c
/*
* Seek the given file to the given offset.
* `Whence' must be one of the three SEEK_* macros.
* NOTE: fseek is essentially a wrapper for fseeko
*/
int
fseeko(fp, offset, whence)
register FILE *fp;
off_t offset;
int whence;
{
register fpos_t (*seekfn) __P((void *, fpos_t, int));
fpos_t target, curoff;
size_t n;
struct stat st;
int havepos;
/*
* Change any SEEK_CUR to SEEK_SET, and check `whence' argument.
* After this, whence is either SEEK_SET or SEEK_END.
*/
switch (whence) {
case SEEK_CUR:
/*
* In order to seek relative to the current stream offset,
* we have to first find the current stream offset a la
* ftell (see ftell for details).
*/
if (fp->_flags & __SOFF)
curoff = fp->_offset;
else {
curoff = (*seekfn)(fp->_cookie, (fpos_t)0, SEEK_CUR);
if (curoff == -1) {
return (EOF);
}
}
if (fp->_flags & __SRD) {
curoff -= fp->_r;
if (HASUB(fp))
curoff -= fp->_ur;
} else if (fp->_flags & __SWR && fp->_p != NULL)
curoff += fp->_p - fp->_bf._base;
offset += curoff;
whence = SEEK_SET;
havepos = 1;
break;
case SEEK_SET:
case SEEK_END:
curoff = 0;
havepos = 0;
break;
default:
errno = EINVAL;
return (EOF);
}
// ...other stuff happens here
}
```
It turns out that `fseek` always ends up using `SEEK_SET` (or `SEEK_END`). When `fseek` is called, it first has to figure out the absolute position in the file to go to. When `whence` is `SEEK_CUR`, it has to do additional work to figure out where the file pointer currently is so it can convert the relative position to an absolute one, which isn’t as simple as it sounds due to potential internal buffering. In fact, when buffering is disabled on the HDD, you can see that the difference between `SEEK_CUR` and `SEEK_SET` in the second benchmark is much smaller than when buffering is enabled.
Notably, this determination of the position to seek to happens before any checks for cached content. In other words, the program first finds where you want to go, then asks if it has any cached content for the destination you’re asking for.
The difference between these two `whence` values is not that big, but if you’re trying to eke out every bit of performance, you can typically improve results slightly by always using absolute positions rather than relative ones.
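If your access pattern is naturally relative, you can still do this by tracking the absolute offset yourself. A sketch (the variable names are placeholders):

```c
/* Track the absolute byte offset yourself instead of relying on SEEK_CUR */
long abs_off = start_offset;             /* placeholder starting offset */
for (int i = 0; i < n_jumps; i++) {
    abs_off += jump_bytes[i];            /* placeholder per-step deltas */
    fseek(f, abs_off, SEEK_SET);         /* always absolute */
    /* reads/writes here advance the stream position, but that doesn't
       matter because the next seek is computed from abs_off */
}
```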
Conclusions
Broadly, here are the results I found for my benchmarks:
Unsurprising Results:
- Reading is faster than updating. If you can keep immutable values on disk and mutable values in RAM, you’ll get a big performance boost.
- SSDs are much faster than HDDs
- Batched reads/writes are much faster than lots of individual reads/writes (I didn’t show this benchmark, but it should be easy to believe; a sketch of what I mean by batching follows this list)
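By “batched” I mean something like the following sketch: grab a contiguous block of values with one `fread` instead of seeking and reading each value individually (names and the block size are illustrative):

```c
#include <stdio.h>

#define BLOCK 1024   /* number of ints per batch (illustrative) */

/* One seek + one read for a whole block of consecutive values */
static void read_batched(FILE *f, long offset, int *out) {
    fseek(f, offset, SEEK_SET);
    fread(out, sizeof(int), BLOCK, f);
}

/* The slow alternative: one seek + one read per value */
static void read_individually(FILE *f, long offset, int *out) {
    for (int i = 0; i < BLOCK; i++) {
        fseek(f, offset + (long)i * (long)sizeof(int), SEEK_SET);
        fread(&out[i], sizeof(int), 1, f);
    }
}
```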
Useful Results:
- SSDs have very good random access for reads. Regardless of parameterization, all my results in Benchmark 1 were essentially identical on SSDs.
- Doing everything you can to ensure sequential access is a huge improvement on HDDs. Sorting the indices to access means the drive is always seeking forward, and over smaller distances on average than with unsorted positions. The sort itself is typically negligible relative to the overall runtime, so whenever possible, do it.
- If you know exactly how many elements you’re going to read, using unbuffered access (or changing the buffer size) can give a slight performance increase on HDDs. The increase is so minor that it’s not usually worth doing.
- Using absolute positions for `fseek` (with `SEEK_SET`) on HDDs is at worst identical to relative positioning (`SEEK_CUR`), and at best around 20-25% faster. On SSDs, using `SEEK_CUR` can be slightly faster, but all access times are basically identical anyway.
There are other ways you could further optimize. My next steps would probably be to mark file pointers as `restrict` wherever possible and to investigate platform-specific implementations. For ExoLabel, I’m going to move toward making files read-only and storing mutable data in RAM.
Thanks for reading. If you have further comments or suggestions on optimizing file reads/writes with `stdio.h`, feel free to comment or email me; I’d love to hear your thoughts!
Check out the next part in this series here!