Blog Posts


Efficient File I/O, Part 4: In-Place External Sorting

19 minute read


If you’ve been following all the posts in this series, you’ll know that by now I have a pretty good way to read in edges for ExoLabel, due primarily to faster I/O and an optimized external sorting function (if you haven’t, check out the first post here!). I left off in my last post by mentioning that I wanted to change my external sort to work in-place, but I didn’t really explain why.

Efficient File I/O, Part 3: Loser Trees and External Sorting

25 minute read


In my last post, I talked about ways to improve the efficiency of fwrite calls. Essentially, it boils down to prioritizing sequential reads and writes. At the end of that post, I mentioned that I needed a way to sort a file on disk. Algorithms that do this are called external sorting algorithms. While they’re not used very often today, they were incredibly important in the past. Back in the days of tape drives, computers rarely had enough RAM to load a dataset into memory for sorting, so external sorts were used instead.
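For anyone new to the idea, here is a minimal C sketch of a classic two-phase external merge sort over a binary file of ints: sort chunks that fit in memory into temporary "runs", then k-way merge the runs into the output. This is an illustration under stated assumptions, not ExoLabel’s code: the tiny `RUN_SIZE` stands in for a real memory budget, each run gets its own `tmpfile()`, and the merge finds the smallest head with a linear scan rather than the loser tree this post is about. The names `make_runs`, `merge_runs`, and `external_sort` are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical memory budget: how many ints get sorted in RAM per run. */
#define RUN_SIZE 4

static int cmp_int(const void *a, const void *b) {
    int x = *(const int *)a, y = *(const int *)b;
    return (x > y) - (x < y);
}

/* Phase 1: read RUN_SIZE values at a time, sort them in RAM, and write
   each sorted chunk ("run") to its own temporary file. */
static FILE **make_runs(FILE *in, size_t *nruns) {
    FILE **runs = NULL;
    int buf[RUN_SIZE];
    size_t n, k = 0;
    while ((n = fread(buf, sizeof(int), RUN_SIZE, in)) > 0) {
        qsort(buf, n, sizeof(int), cmp_int);
        FILE *run = tmpfile();
        FILE **grown = realloc(runs, (k + 1) * sizeof *runs);
        if (run == NULL || grown == NULL) {
            perror("make_runs");
            exit(EXIT_FAILURE);
        }
        runs = grown;
        fwrite(buf, sizeof(int), n, run);
        rewind(run);                     /* get ready to read it back */
        runs[k++] = run;
    }
    *nruns = k;
    return runs;
}

/* Phase 2: k-way merge. Keep the current head of every run, repeatedly
   emit the smallest head, and refill it from the run it came from.
   The linear scan costs O(k) per value; a loser tree cuts it to O(log k). */
static size_t merge_runs(FILE **runs, size_t k, FILE *out) {
    if (k == 0) return 0;
    int *heads = malloc(k * sizeof *heads);
    int *live = malloc(k * sizeof *live);
    if (heads == NULL || live == NULL) {
        perror("merge_runs");
        exit(EXIT_FAILURE);
    }
    for (size_t i = 0; i < k; i++)       /* prime each run's head value */
        live[i] = fread(&heads[i], sizeof(int), 1, runs[i]) == 1;
    size_t written = 0;
    int last = 0;
    for (;;) {
        size_t best = k;                 /* k means "no live run found yet" */
        for (size_t i = 0; i < k; i++)
            if (live[i] && (best == k || heads[i] < heads[best]))
                best = i;
        if (best == k) break;            /* every run is exhausted */
        assert(written == 0 || heads[best] >= last);  /* output stays sorted */
        last = heads[best];
        fwrite(&heads[best], sizeof(int), 1, out);
        written++;
        live[best] = fread(&heads[best], sizeof(int), 1, runs[best]) == 1;
    }
    free(heads);
    free(live);
    return written;
}

/* Sorts the ints in `in` (from its current position) into `out` and
   returns how many values were written. */
size_t external_sort(FILE *in, FILE *out) {
    size_t k;
    FILE **runs = make_runs(in, &k);
    size_t written = merge_runs(runs, k, out);
    for (size_t i = 0; i < k; i++)
        fclose(runs[i]);
    free(runs);
    return written;
}
```

Note that this version needs scratch disk space for every run on top of the input and output files, which is the kind of overhead an in-place variant tries to avoid.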

Efficient File I/O, Part 2: Why is fwrite so slow?

8 minute read


My last post talked about some of the things I discovered when looking into how to optimize my current research project, ExoLabel. Since then, I’ve made some big improvements in speed, and I thought it would be worth breaking them down. Building efficient external memory algorithms is a really cool process; every potential inefficiency matters a lot more than with RAM-based algorithms, so you start to understand how the computer works at a deeper level.

Efficient File I/O, Part 1: Some (somewhat) surprising findings on C’s fseek

16 minute read


I’m working a lot with files for my latest research project, ExoLabel. I’ve (mostly) finished ensuring that the algorithm itself is accurate, so lately I’ve been turning my attention to optimizing its speed. Unsurprisingly, most of the slowest operations are working with files, since accessing files is orders of magnitude slower than accessing RAM.

Converting vectors to dendrograms in R

7 minute read


I really like R’s dendrogram object, but it can be really clunky to work with. I recently implemented random forests from scratch, and the decision trees within were stored as two vectors. However, it would be really nice to be able to plot these decision trees to visualize their contents. This got me thinking–is it possible to convert vectors of values into dendrogram objects in R if we know the vectors are ordered by a known tree traversal? In other languages this would be fairly simple, but the nested list structure of dendrogram objects (and their other attributes) makes it a little trickier in R.

Automated R Testing and Coverage on GitHub (without Codecov!)

11 minute read


I’ve recently been working on adding some unit testing to Biostrings as part of work I’m doing for a grant awarded by R’s Infrastructure Steering Committee. As part of implementing unit tests, I wanted to add an automatic workflow that would evaluate the current tests on any future pull requests, so that (1) we could be sure contributed code isn’t breaking any existing functionality, and (2) we could be sure that contributed code is adequately tested.

Clustering without RAM

12 minute read


Update: This project has progressed pretty far from this initial sketch. This post describes my first pass at this problem–the final version is quite a bit different, and ended up teaching me a lot about computers and how they work with files. I’ve summarized those findings in a four-part series on Efficient File I/O, which you can check out here. Original post is below.

Writing a Random Forest from Scratch

35 minute read


I’ve recently had to implement random forests from scratch in R. This is a much longer post than I normally make, since I’m going to go through all the details of actually implementing one of these models. By “from scratch”, I mean a complete Random Forest prediction model, written in R, with no packages aside from those provided in a base installation.

Fortran, C, and R

13 minute read


First post of 2024! I wanted to take a little bit of time to talk about using C and Fortran in R. I often feel like the documentation for using C is a little tough to find, and finding out how to call Fortran from R is even harder. Is it even worth it, though? Should you be using Fortran in your R code? And if you could write C, why would you bother with Fortran? Let’s look through them step by step, using two common sorting algorithms (quicksort and mergesort) as examples.


Forth for R

3 minute read


I’ve always enjoyed learning about different programming languages. Different languages come with different specialties, paradigms, and constraints. As said in a recent talk at Strange Loop, the languages we know affect how we think about and approach problems.

Refactoring R’s dendrapply

16 minute read


As someone who specializes in comparative phylogenomics, I work a lot with phylogenetic trees. Trees are represented in R as dendrogram objects, which are essentially a series of nested lists. Each “node” of the tree is a list with multiple members (two if a binary tree, but dendrogram objects are not constrained to be binary), each of which is another dendrogram object. The leaves are special cases in that they have length 1 and an additional property leaf, which is set to TRUE.

Finishing the 6502 Emulator

1 minute read


This will be a short blog post–I’ve officially finished v1.0.0 of my 65C02 emulator. Since last time, I’ve fixed a bunch of bugs, finished implementing the 65C02 extended opcode set (including the Rockwell/WDC bit set/clear instructions and test-and-branch instructions), and written assembly scripts to test the implementation of (nearly) all the opcodes. The only ones I haven’t thoroughly checked are the Rockwell/WDC extended instructions (e.g. RMB0, SMB0, BBR0, BBS0). I’ve also updated the GUI to graphically iterate through instructions when (r)un is input, meaning you can set up an infinite loop and watch it iterate through. The iteration executes at the same speed the computer normally would (determined by the clock speed), so you can watch it step through programs at slow speeds if you’d like.

Writing a 6502 Emulator, part 2

6 minute read


Last week, I set up the beginnings of a 6502 emulator, including the core codebase. Unfortunately, a command line application that just runs 6502 assembly code is super hard to debug. The 6502 isn’t equipped with any way to print output by default (unless you hook up a 65C22 VIA, but coding that seems tricky), and reading raw bytecode isn’t the easiest thing to do. Other emulators I’ve used (e.g., Symon) include a pretty nice GUI to debug applications. I didn’t want to go as far as writing a whole application frontend, but I did think implementing some kind of updated UI would be a great addition for both users and for my personal debugging.

Writing a 6502 Emulator

12 minute read


I recently watched an awesome video from Computerphile about writing an emulator for the Atari 2600. The 2600 runs off a 6507 processor, which is basically a modified 6502. This got me thinking: how hard would it actually be to write an emulator for a 6502 computer? At this point I’ve already built a computer with one and am close to having a working Forth interpreter–so I’m pretty familiar with how the microprocessor works internally.

6502 FORTH, Part 6: 16-Bit Division

18 minute read


In my last post, I wrote an algorithm for multiplication. I figured I should at least finish wrapping up the basic arithmetic functions before I go back to writing the main Forth interpreter, so today I’m implementing division.
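The excerpt doesn’t say which division algorithm the post settles on, but a common choice on CPUs without a divide instruction is restoring shift-and-subtract (binary long) division. Here is a minimal C sketch of that technique for 16-bit unsigned operands; `div16` and its signature are hypothetical, and a 6502 version would express the same shifts and subtractions with instructions like ASL, ROL, and SBC.

```c
#include <assert.h>
#include <stdint.h>

/* Restoring division: shift dividend bits (most significant first) into a
   running remainder; whenever the remainder reaches the divisor, subtract
   it and record a 1 in the matching quotient bit. */
void div16(uint16_t dividend, uint16_t divisor,
           uint16_t *quotient, uint16_t *remainder) {
    assert(divisor != 0);                 /* division by zero is undefined */
    uint32_t rem = 0;                     /* briefly needs 17 bits after a shift */
    uint16_t quo = 0;
    for (int i = 15; i >= 0; i--) {
        rem = (rem << 1) | ((dividend >> i) & 1u);
        if (rem >= divisor) {
            rem -= divisor;
            quo |= (uint16_t)(1u << i);
        }
    }
    *quotient = quo;
    *remainder = (uint16_t)rem;
}
```

The loop always runs 16 times, one per quotient bit, which is part of why this approach maps well onto an 8-bit CPU: each step is just a shift, a compare, and an optional subtract.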

6502 FORTH, Part 5: Multiplication for Peasants

10 minute read


It’s been a while since I worked on this project, and I wanted to ease back into it by implementing something auxiliary to get me back into the flow of writing assembly. It turns out the 6502 only has instructions for addition and subtraction, meaning that if you want any higher level arithmetic operations (multiplication, division), they need to be implemented manually. Furthermore, since we’re implementing a 16-bit system, we’ll have to make sure these operations work on 16-bit numbers. This post is going to cover multiplication–I’m still working out the best way to write a division algorithm.
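"Peasant" multiplication presumably refers to the halving-and-doubling method (often called Russian peasant multiplication), which needs nothing beyond shifts, a check of the low bit, and addition. Here is a minimal C sketch of that technique for two 16-bit operands and a 32-bit product; `mul16` is a hypothetical name, and a 6502 version would build the same loop out of instructions like LSR, ASL/ROL, and ADC.

```c
#include <assert.h>
#include <stdint.h>

/* Peasant multiplication: halve one operand and double the other each
   step, adding the doubled value whenever the halved one is odd. */
uint32_t mul16(uint16_t a, uint16_t b) {
    uint32_t product = 0;
    uint32_t addend = a;   /* doubled every step, so it needs 32 bits */
    while (b != 0) {
        if (b & 1)         /* odd multiplier: keep this row */
            product += addend;
        addend <<= 1;      /* double one side... */
        b >>= 1;           /* ...and halve the other, dropping the remainder */
    }
    return product;
}
```

The product of two 16-bit values can need up to 32 bits, so on the 6502 the result would occupy four bytes rather than fitting back into a single 16-bit stack cell.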


6502 FORTH, Part 4: Basic I/O

14 minute read


I’ve been posting about creating a Forth interpreter for a 65c02, and at this point I’m pretty close to something that could be tested. However, I still need one more piece of infrastructure before I can begin writing and testing my Forth interpreter: some way to communicate with a user.

6502 FORTH, Part 3: 16 Bit Stack

9 minute read


In my previous post, I created my first Forth words: next, exit, and dolist. I was about to continue on to creating some simple arithmetic words, but then I realized my program is missing the main data structure of Forth…the internal data stack. This is a 16-bit implementation, so I’ll need a 16-bit stack. This of course is not included in the default 65c02 system, so I had to write one myself.

2022 Holiday Coding Challenge

8 minute read


One of the programming groups I’m in recently posted a challenge for the holiday season. The task is to print the following ASCII art using whatever language you like:

6502 FORTH, Part 1: Setup

5 minute read


Now that the hardware for my 65C02 computer is more or less complete, it’s time to start working on the software. I’ve been very interested in the Forth language since discovering it a couple years ago due to its relative simplicity, low size requirements, efficiency, and departure from many conventions seen in other programming languages.

64th02 Computer, Part 2

1 minute read


I finally got my shipment of resistors and soldering wicks, so it’s time to complete the hardware build for this computer! Unfortunately header pins are numerous, and soldering wicks aren’t quite as effective as I had hoped at removing solder. Fortunately I had plenty of extra parts, and PCBway’s minimum PCB shipment is five boards, so I decided to start over from scratch.

64th02 Computer, Part 1

1 minute read


I recently built an 8-bit computer on a breadboard using a 6502 microprocessor. The experience was great, but I’ve been dying to migrate this project to an actual PCB so I can play around with it without worrying about breaking it. I’d also like to remove the dependence on a Raspberry Pi Pico, since it feels like cheating to use a device significantly more powerful than a 6502 for a 6502 computer. My end goal is to build a FORTH interpreter from scratch that I can run on this machine. This is the beginning of my 6-FORTH-02 Computer, AKA the 64th02!

Starting a blog!

less than 1 minute read


I figured it would be fun to keep a record of all the projects I’m working on in a blog format, so I’ve set up the blog page on my website! You’ll be able to access this page at any time by going to!