Advent of Code - Day 1

My boyfriend and i decided to start last years advent of code today. The first exercise can be found on their webpage.

adventofcode.

In short: Parse a text file and extract the first and last digit in a line to form a two digit number. For example:

1abc2
pqr3stu8vwx
a1b2c3d4e5f
treb7uchet

should give 12, 38, 15 and 77. A short and fun exercise.

I wanted to use this opportunity and compare the standard file input functions in Hare and C. First i’ll look at a simple unbuffered read of the whole file, second i’ll compare buffered line reads.

Unbuffered I/O

In C i used the fread() function to perform the read operation. Since i needed the amount of bytes to read for the call, i used ftell for that first.

// Get length of file in bytes
fseek(fhandle, 0, SEEK_END);
u32 len = ftell(fhandle);
// Set file pointer to beginning
rewind(fhandle);

// Read whole file
char *textfile = malloc(len+1);
fread(textfile, 1, len, fhandle);
fclose(fhandle);

Next i iterated over the textfile and extracted the first and last digits.

int first = -1;
int last  = 0;

for ( u32 i = 0; i < len; i += 1 ) {
    // Search individual lines
    if ( textfile[i] == '\n' ) {
        u32 num = (textfile[first]-48)*10 + textfile[last]-48;
        AppendVec(digits, num);

        first = -1;
        last  = 0;
    }
    if ( textfile[i] > 47 && textfile[i] < 58 ) {
        last = i;
        first = first == -1 ? i : first;
    }
}

The variables first and last save the first and last appearance of ASCII-digits in each line. The first appearance is distinguished by setting first to -1 - something the iterator i should never be - and only updating the value if first == -1. If a linebreak is encountered, the two digits are evaluated - digit 0 is value 48 in ASCII - and appending the value to an array. I wrote a custom vector struct for this purpose. Then first and last are reset to their default values.

For Hare i used io::drain for the reading and the same logic for the loop.

let textfile = io::drain(fhandle) as []u8;
let digits: []u32 = [];

let first = -1;
let last  = 0;

for ( let i = 0u32; i < len(textfile); i += 1 ) {
    if ( textfile[i] == '\n' ) {
        // No number in line
        if ( first == -1 ) {
            continue;
        };
        let num = (textfile[first]-48)*10 + textfile[last]-48;
        append(digits, num);
        first = -1;
        last  = 0;
    };
    if ( textfile[i] > 47 && textfile[i] < 58 ) {
        if ( first == -1 ) { first = i: i32; };
        last = i: i32;
    };
};

Buffered I/O

In C i used the fgets() function with a heap-buffer and the same vector type as above.

u32 buff_size = 5000000;
char *BUFF = malloc( buff_size );
u32 first = 0;
u32 last  = 0;

Vec digits = NewVec();

while ( 1 ) {
    // Check for EOF
    if ( !fgets(BUFF, buff_size, fhandle) ) {
        break;
    }

    // Check for empty lines
    if ( BUFF[0] == '\n' ) {
        continue;
    }

    // Search first digit
    for ( u32 i = 0; i < strlen(BUFF); i += 1 ) {
        if ( BUFF[i] > 47 && BUFF[i] < 58 ) {
            first = i;
            break;
        }
    }

    // Search last digit
    for ( u32 i = first; i < strlen(BUFF); i += 1 ) {
        if ( BUFF[i] > 47 && BUFF[i] < 58 ) {
            last = i;
        }
    }

    AppendVec(digits, ((BUFF[first]-48)*10 + BUFF[last]-48) );
}

free(BUFF);

The buffer size is quite large, because i tested with long automatically generated files.

For Hare i used the bufio::read_line() function as described in the documentation.

documentation.

let digits: []u32 = [];

const scan = bufio::newscanner(fhandle, types::SIZE_MAX);
for (true) {
    const line = match (bufio::read_line(&scan)!) {
    case io::EOF =>
        break;
    case let line: []u8 =>
        yield line;
    };

    let first = 0u32;
    let last  = 0u32;

    // Find first
    for ( let i=0u32; i < len(line); i += 1 ) {
        // Search for ascii digits
        if ( line[i] < 58 && line[i] > 47 ) {
            first = i;
            break;
        };
    };
    // Find last
    for ( let i=first; i < len(line); i += 1 ) {
        // Search for ascii digits;
        // This works for utf8 as well, since no 7-bit
        // codepoints appear in multibyte characters
        if ( line[i] < 58 && line[i] > 47 ) {
            // Convert ascii digits to digit values
            last = i;
        };
    };

    // Use first found digit as  
    let num = (line[first]-48)*10 + (line[last]-48);
    append(digits, num);
};

Benchmark

To benchmark the different solution, i generated a text file with 10000 lines, random line length between 20 and 20020 chars and random ASCII characters.

for ( size_t i = 0; i < 10000; i += 1 ) {
    size_t len = 20 + rand()%20000;

    char *buff = malloc(len);

    for ( size_t k = 0; k < len-1; k += 1 ) {
        buff[k] = rand()%92 + 33;
    }
    buff[len-1] = 52;

    printf("%s\n", buff);

    free(buff);
}

The resulting file is 96 MiB large and should provide a reasonable testing ground for the different solutions. The last entry is always set to 52 to ensure that at least one digit is always present.

I tested gcc with and without optimization using -O3, tcc and hare build -R. To gain better insights, i measured the execution time of the I/O operations, the extraction loop and the linux execution time. All values are given in seconds.

gcc-O3tcchare
BUFF
time7.2730.1037.242168.658
IO0.0240.0210.021168.168
Loop7.2430.0777.212-
UNBUFF
time0.4430.1910.4400.590
IO0.0440.0350.0350.093
Loop0.3920.1490.3990.492

The significantly slower operations in the buffered I/O package in Hare is quite a surprise. But the jump from gcc/tcc to gcc -O3 is also quite impressive. The unbuffered IO solutions perform better due to the better loop optimization. For me the somewhat slower IO in C and slower time all around using -O3 is also quite surprising.

I’m honestly not too sure what to make of all of this, but it was quite fun!


This site uses the wonderful Catppuccin color scheme. My RSS Feed.
Mail: mia.schambeck@mailbox.org
XMPP: mia.schambeck@mailbox.org
Bluesky: @miajara.bsky.social
Mastodon: @miasb.sueden.social