Advent of Code - Day 1

My boyfriend and i decided to start last years advent of code today. The first exercise can be found on their webpage.

In short: Parse a text file and extract the first and last digit in a line to form a two digit number. For example:

1abc2
pqr3stu8vwx
a1b2c3d4e5f
treb7uchet

should give 12, 38, 15 and 77. A short and fun exercise.

I wanted to use this opportunity and compare the standard file input functions in Hare and C. First i’ll look at a simple unbuffered read of the whole file, second i’ll compare buffered line reads.

Unbuffered I/O

In C i used the fread() function to perform the read operation. Since i needed the amount of bytes to read for the call, i used ftell for that first.

// Get length of file in bytes
fseek(fhandle, 0, SEEK_END);
u32 len = ftell(fhandle);
// Set file pointer to beginning
rewind(fhandle);

// Read whole file
char *textfile = malloc(len+1);
fread(textfile, 1, len, fhandle);
fclose(fhandle);

Next i iterated over the textfile and extracted the first and last digits.

int first = -1;
int last  = 0;

for ( u32 i = 0; i < len; i += 1 ) {
    // Search individual lines
    if ( textfile[i] == '\n' ) {
        u32 num = (textfile[first]-48)*10 + textfile[last]-48;
        AppendVec(digits, num);

        first = -1;
        last  = 0;
    }
    if ( textfile[i] > 47 && textfile[i] < 58 ) {
        last = i;
        first = first == -1 ? i : first;
    }
}

The variables first and last save the first and last appearance of ASCII-digits in each line. The first appearance is distinguished by setting first to -1 - something the iterator i should never be - and only updating the value if first == -1. If a linebreak is encountered, the two digits are evaluated - digit 0 is value 48 in ASCII - and appending the value to an array. I wrote a custom vector struct for this purpose. Then first and last are reset to their default values.

For Hare i used io::drain for the reading and the same logic for the loop.

let textfile = io::drain(fhandle) as []u8;
let digits: []u32 = [];

let first = -1;
let last  = 0;

for ( let i = 0u32; i < len(textfile); i += 1 ) {
    if ( textfile[i] == '\n' ) {
        // No number in line
        if ( first == -1 ) {
            continue;
        };
        let num = (textfile[first]-48)*10 + textfile[last]-48;
        append(digits, num);
        first = -1;
        last  = 0;
    };
    if ( textfile[i] > 47 && textfile[i] < 58 ) {
        if ( first == -1 ) { first = i: i32; };
        last = i: i32;
    };
};

Buffered I/O

In C i used the fgets() function with a heap-buffer and the same vector type as above.

u32 buff_size = 5000000;
char *BUFF = malloc( buff_size );
u32 first = 0;
u32 last  = 0;

Vec digits = NewVec();

while ( 1 ) {
    // Check for EOF
    if ( !fgets(BUFF, buff_size, fhandle) ) {
        break;
    }

    // Check for empty lines
    if ( BUFF[0] == '\n' ) {
        continue;
    }

    // Search first digit
    for ( u32 i = 0; i < strlen(BUFF); i += 1 ) {
        if ( BUFF[i] > 47 && BUFF[i] < 58 ) {
            first = i;
            break;
        }
    }

    // Search last digit
    for ( u32 i = first; i < strlen(BUFF); i += 1 ) {
        if ( BUFF[i] > 47 && BUFF[i] < 58 ) {
            last = i;
        }
    }

    AppendVec(digits, ((BUFF[first]-48)*10 + BUFF[last]-48) );
}

free(BUFF);

The buffer size is quite large, because i tested with long automatically generated files.

For Hare i used the bufio::read_line() function as described in the documentation.

documentation.

let digits: []u32 = [];

const scan = bufio::newscanner(fhandle, types::SIZE_MAX);
for (true) {
    const line = match (bufio::read_line(&scan)!) {
    case io::EOF =>
        break;
    case let line: []u8 =>
        yield line;
    };

    let first = 0u32;
    let last  = 0u32;

    // Find first
    for ( let i=0u32; i < len(line); i += 1 ) {
        // Search for ascii digits
        if ( line[i] < 58 && line[i] > 47 ) {
            first = i;
            break;
        };
    };
    // Find last
    for ( let i=first; i < len(line); i += 1 ) {
        // Search for ascii digits;
        // This works for utf8 as well, since no 7-bit
        // codepoints appear in multibyte characters
        if ( line[i] < 58 && line[i] > 47 ) {
            // Convert ascii digits to digit values
            last = i;
        };
    };

    // Use first found digit as  
    let num = (line[first]-48)*10 + (line[last]-48);
    append(digits, num);
};

Benchmark

To benchmark the different solution, i generated a text file with 10000 lines, random line length between 20 and 20020 chars and random ASCII characters.

for ( size_t i = 0; i < 10000; i += 1 ) {
    size_t len = 20 + rand()%20000;

    char *buff = malloc(len);

    for ( size_t k = 0; k < len-1; k += 1 ) {
        buff[k] = rand()%92 + 33;
    }
    buff[len-1] = 52;

    printf("%s\n", buff);

    free(buff);
}

The resulting file is 96 MiB large and should provide a reasonable testing ground for the different solutions. The last entry is always set to 52 to ensure that at least one digit is always present.

I tested gcc with and without optimization using -O3, tcc and hare build -R. To gain better insights, i measured the execution time of the I/O operations, the extraction loop and the linux execution time. All values are given in seconds.

	gcc	-O3	tcc	hare
BUFF
time	7.273	0.103	7.242	168.658
IO	0.024	0.021	0.021	168.168
Loop	7.243	0.077	7.212	-
UNBUFF
time	0.443	0.191	0.440	0.590
IO	0.044	0.035	0.035	0.093
Loop	0.392	0.149	0.399	0.492

The significantly slower operations in the buffered I/O package in Hare is quite a surprise. But the jump from gcc/tcc to gcc -O3 is also quite impressive. The unbuffered IO solutions perform better due to the better loop optimization. For me the somewhat slower IO in C and slower time all around using -O3 is also quite surprising.

I’m honestly not too sure what to make of all of this, but it was quite fun!

This site uses the wonderful Catppuccin color scheme.