Posts by Tags

The ExoLabel Post: Clustering Massive Networks with Limited Resources

33 minute read

Published: April 11, 2025

I’ve made a lot of posts about ExoLabel, but the project has been moving so quickly that they’ve become out of date almost as soon as I post them. I’m finally close to the end of this project, so I thought it was high time I write out the entire project (partly for my own reference, so I don’t forget).

A Story of a Sneaky Bug

10 minute read

Published: April 04, 2025

I solved a really tricky bug in ExoLabel recently and thought it would be an interesting experience to share.

Efficient File I/O, Part 4: In-Place External Sorting

19 minute read

Published: December 13, 2024

If you’ve been following all the posts in this series, you’ll know that by now I have a pretty good way to read in edges for ExoLabel, due primarily to faster I/O and an optimized external sorting function (if you haven’t, check out the first post here!). I left off in my last post by mentioning that I wanted to change my external sort to work in-place, but I didn’t really explain why.

Efficient File I/O, Part 3: Loser Trees and External Sorting

25 minute read

Published: December 09, 2024

In my last post, I talked about ways to improve the efficiency of fwrite calls. Essentially, it boils down to prioritizing sequential reads and writes. At the end of that post, I mentioned that I need a way to sort a file on disk. Algorithms that do this are called external sorting algorithms. While they’re not used very often today, they were incredibly important in the past. Back in the days of tape drives, computers rarely had enough RAM to load a dataset into memory for sorting, and so external sorts were used instead.

Efficient File I/O, Part 2: Why is `fwrite` so slow?

8 minute read

Published: December 07, 2024

My last post talked about some of the things I discovered when looking into how to optimize my current research project, ExoLabel. Since then, I’ve made some big improvements in speed, and I thought it would be worth it to break them down. Building efficient external memory algorithms is a really cool process; every potential inefficiency matters a lot more than with RAM-based algorithms, so you start to understand how the computer works at a deeper level.

Efficient File I/O, Part 1: Some (somewhat) surprising findings on C’s `fseek`

16 minute read

Published: October 10, 2024

I’m working a lot with files for my latest research project, ExoLabel. I’ve (mostly) finished ensuring that the algorithm itself is accurate, so lately I’ve been turning my attention to optimizing its speed. Unsurprisingly, most of the slowest operations are working with files, since accessing files is orders of magnitude slower than accessing RAM.

Clustering without RAM

13 minute read

Published: March 01, 2024

Update: This project has progressed pretty far from this initial sketch. This post describes my first pass at this problem–the final version is quite a bit different, and ended up teaching me a lot about computers and how they work with files. I’ve summarized those findings in a four-part series on Efficient File I/O, which you can check out here. Original post is below.

Writing a Random Forest from Scratch

35 minute read

Published: January 11, 2024

I’ve recently had to implement random forests from scratch in R. This is a much longer post than I normally make, since I’m going to go through all the details of actually implementing one of these models. By “from scratch”, I mean a complete Random Forest prediction model, written in R, with no packages aside from those provided in a base installation.

Fortran, C, and R

13 minute read

Published: January 05, 2024

First post of 2024! I wanted to take a little bit of time to talk about using C and Fortran in R. I often feel like the documentation for using C is a little tough to find, and finding out how to call Fortran from R is even harder. Is it even worth it, though? Should you be using Fortran in your R code? And if you could write C, why would you bother with Fortran? Let’s look through them step by step, using two common sorting algorithms (quicksort and mergesort) as examples.

Refactoring R’s `dendrapply`

16 minute read

Published: February 23, 2023

As someone who specializes in comparative phylogenomics, I work a lot with phylogenetic trees. Trees are represented in R as dendrogram objects, which are essentially a series of nested lists. Each “node” of the tree is a list with multiple members (two if a binary tree, but dendrogram objects are not constrained to be binary), each of which is another dendrogram object. The leaves are special cases in that they have length 1 and an additional property leaf, which is set to TRUE.

6502 FORTH, Part 6: 16-Bit Division

18 minute read

Published: January 06, 2023

In my last post, I wrote an algorithm for multiplication. I figured I should at least finish wrapping up the basic arithmetic functions before I go back to writing the main Forth interpreter, so today I’m implementing division.

6502 FORTH, Part 5: Multiplication for Peasants

10 minute read

Published: January 05, 2023

It’s been a while since I worked on this project, and I wanted to ease back into it by implementing something auxiliary to get me back into the flow of writing assembly. It turns out the 6502 only has instructions for addition and subtraction, meaning that if you want any higher-level arithmetic operations (multiplication, division), they need to be implemented manually. Furthermore, since we’re implementing a 16-bit system, we’ll have to make sure these operations work on 16-bit numbers. This post is going to cover multiplication–I’m still working out the best way to write a division algorithm.

6502 FORTH, Part 4: Basic I/O

14 minute read

Published: December 21, 2022

I’ve been posting about creating a Forth interpreter for a 65c02, and at this point I’m pretty close to something that could be tested. However, I still need one more piece of infrastructure before I can begin writing and testing my Forth interpreter: some way to communicate with a user.

6502 FORTH, Part 3: 16 Bit Stack

9 minute read

Published: December 16, 2022

In my previous post, I created my first Forth words: next, exit, and dolist. I was about to continue on to creating some simple arithmetic words, but then I realized my program is missing the main data structure of Forth…the internal data stack. This is a 16-bit implementation, so I’ll need a 16-bit stack. This of course is not included in the default 65c02 system, so I had to write one myself.

6502 FORTH, Part 2: NEXT, EXIT, and DOLIST

6 minute read

Published: December 15, 2022

In my previous post, I set up my development environment for creating a Forth interpreter from scratch. The next step is to create the foundational Forth operators next, exit, and dolist.

6502 FORTH, Part 1: Setup

5 minute read

Published: December 14, 2022

Now that the hardware for my 65C02 computer is more or less complete, it’s time to start working on the software. I’ve been very interested in the Forth language since discovering it a couple years ago due to its relative simplicity, low size requirements, efficiency, and departure from many conventions seen in other programming languages.

Writing a Random Forest from Scratch

35 minute read

Published: January 11, 2024

I’ve recently had to implement random forests from scratch in R. This is a much longer post than I normally make, since I’m going to go through all the details of actually implementing one of these models. By “from scratch”, I mean a complete Random Forest prediction model, written in R, with no packages aside from those provided in a base installation.

Fortran, C, and R

13 minute read

Published: January 05, 2024

First post of 2024! I wanted to take a little bit of time to talk about using C and Fortran in R. I often feel like the documentation for using C is a little tough to find, and finding out how to call Fortran from R is even harder. Is it even worth it, though? Should you be using Fortran in your R code? And if you could write C, why would you bother with Fortran? Let’s look through them step by step, using two common sorting algorithms (quicksort and mergesort) as examples.

The ExoLabel Post: Clustering Massive Networks with Limited Resources

33 minute read

Published: April 11, 2025

I’ve made a lot of posts about ExoLabel, but the project has been moving so quickly that they’ve become out of date almost as soon as I post them. I’m finally close to the end of this project, so I thought it was high time I write out the entire project (partly for my own reference, so I don’t forget).

Converting vectors to dendrograms in R

7 minute read

Published: August 18, 2024

I do really like R’s dendrogram objects, but they can be really clunky to work with. I recently implemented random forests from scratch, and the decision trees within were stored as two vectors. However, it would be really nice to be able to plot these decision trees to visualize their contents. This got me thinking–is it possible to convert vectors of values into dendrogram objects in R if we know the vectors are ordered in a known tree traversal? In other languages this would be fairly simple, but the nested list structure of dendrogram objects (and their other attributes) makes it a little trickier in R.

Automated R Testing and Coverage on GitHub (without Codecov!)

11 minute read

Published: August 15, 2024

I’ve recently been working on adding some unit testing to Biostrings as part of work I’m doing for a grant awarded by R’s Infrastructure Steering Committee. As part of implementing unit tests, I wanted to add an automatic workflow that would evaluate the current tests on any future pull requests, so that (1) we can be sure contributed code isn’t breaking any additional functionality and (2) we can be sure that contributed code is adequately tested.

Writing a Random Forest from Scratch

35 minute read

Published: January 11, 2024

I’ve recently had to implement random forests from scratch in R. This is a much longer post than I normally make, since I’m going to go through all the details of actually implementing one of these models. By “from scratch”, I mean a complete Random Forest prediction model, written in R, with no packages aside from those provided in a base installation.

Fortran, C, and R

13 minute read

Published: January 05, 2024

First post of 2024! I wanted to take a little bit of time to talk about using C and Fortran in R. I often feel like the documentation for using C is a little tough to find, and finding out how to call Fortran from R is even harder. Is it even worth it, though? Should you be using Fortran in your R code? And if you could write C, why would you bother with Fortran? Let’s look through them step by step, using two common sorting algorithms (quicksort and mergesort) as examples.

Forth for R

3 minute read

Published: November 27, 2023

I’ve always enjoyed learning about different programming languages. Different languages come with different specialties, paradigms, and constraints. As said in a recent talk at Strange Loop, the languages we know affect how we think about and approach problems.

`dendrapply` and How To Contribute to R

7 minute read

Published: November 10, 2023

If you’ve been following my blog posts, you know that I previously refactored R’s dendrapply function. After some initial feedback from R-devel, I was encouraged to apply for the R Project Sprint at the University of Warwick in the UK.

Refactoring R’s `dendrapply`

16 minute read

Published: February 23, 2023

As someone who specializes in comparative phylogenomics, I work a lot with phylogenetic trees. Trees are represented in R as dendrogram objects, which are essentially a series of nested lists. Each “node” of the tree is a list with multiple members (two if a binary tree, but dendrogram objects are not constrained to be binary), each of which is another dendrogram object. The leaves are special cases in that they have length 1 and an additional property leaf, which is set to TRUE.

My SWE Interview Experiences: Amazon vs. Meta vs. Google

14 minute read

Published: April 18, 2025

I just finished an interview at Google that I wasn’t initially expecting to get. I prepared for it like I had for my Meta interviews, but the experience was significantly different. I feel like people online treat FAANG/tech interviews as if they’re all the same, but these experiences could not have been more different…so I figured it would be helpful for someone out there if I documented how they compared to each other.

The ExoLabel Post: Clustering Massive Networks with Limited Resources

33 minute read

Published: April 11, 2025

I’ve made a lot of posts about ExoLabel, but the project has been moving so quickly that they’ve become out of date almost as soon as I post them. I’m finally close to the end of this project, so I thought it was high time I write out the entire project (partly for my own reference, so I don’t forget).

A Story of a Sneaky Bug

10 minute read

Published: April 04, 2025

I solved a really tricky bug in ExoLabel recently and thought it would be an interesting experience to share.

The ‘Research Scientist’ Role at Meta: What is it, exactly?

7 minute read

Published: March 19, 2025

My last post detailed my interview experience at Meta for a “Research Scientist” role. However, some people I know also interviewed for “Research Scientist” positions and had completely different experiences. It seems like the title “Research Scientist” is what’s given to employees with a PhD, and their actual responsibilities can vary widely. I did some research (no pun intended) on this and thought I’d share what the difference in the process for each position looks like, both based on my experience and what I’ve read online.

So what is it like to interview at Meta/Facebook? (Research Scientist)

18 minute read

Published: February 17, 2025

I recently accepted an offer to join Meta as a Research Scientist following graduation from my PhD program. I’ve gotten a lot of questions on what the process looks like and how I prepared, so I figured I’d put together a blog post to keep everything organized in one place.

Efficient File I/O, Part 4: In-Place External Sorting

19 minute read

Published: December 13, 2024

If you’ve been following all the posts in this series, you’ll know that by now I have a pretty good way to read in edges for ExoLabel, due primarily to faster I/O and an optimized external sorting function (if you haven’t, check out the first post here!). I left off in my last post by mentioning that I wanted to change my external sort to work in-place, but I didn’t really explain why.

Efficient File I/O, Part 3: Loser Trees and External Sorting

25 minute read

Published: December 09, 2024

In my last post, I talked about ways to improve the efficiency of fwrite calls. Essentially, it boils down to prioritizing sequential reads and writes. At the end of that post, I mentioned that I need a way to sort a file on disk. Algorithms that do this are called external sorting algorithms. While they’re not used very often today, they were incredibly important in the past. Back in the days of tape drives, computers rarely had enough RAM to load a dataset into memory for sorting, and so external sorts were used instead.

Efficient File I/O, Part 2: Why is `fwrite` so slow?

8 minute read

Published: December 07, 2024

My last post talked about some of the things I discovered when looking into how to optimize my current research project, ExoLabel. Since then, I’ve made some big improvements in speed, and I thought it would be worth it to break them down. Building efficient external memory algorithms is a really cool process; every potential inefficiency matters a lot more than with RAM-based algorithms, so you start to understand how the computer works at a deeper level.

Efficient File I/O, Part 1: Some (somewhat) surprising findings on C’s `fseek`

16 minute read

Published: October 10, 2024

I’m working a lot with files for my latest research project, ExoLabel. I’ve (mostly) finished ensuring that the algorithm itself is accurate, so lately I’ve been turning my attention to optimizing its speed. Unsurprisingly, most of the slowest operations are working with files, since accessing files is orders of magnitude slower than accessing RAM.

Converting vectors to dendrograms in R

7 minute read

Published: August 18, 2024

I do really like R’s dendrogram objects, but they can be really clunky to work with. I recently implemented random forests from scratch, and the decision trees within were stored as two vectors. However, it would be really nice to be able to plot these decision trees to visualize their contents. This got me thinking–is it possible to convert vectors of values into dendrogram objects in R if we know the vectors are ordered in a known tree traversal? In other languages this would be fairly simple, but the nested list structure of dendrogram objects (and their other attributes) makes it a little trickier in R.

Automated R Testing and Coverage on GitHub (without Codecov!)

11 minute read

Published: August 15, 2024

I’ve recently been working on adding some unit testing to Biostrings as part of work I’m doing for a grant awarded by R’s Infrastructure Steering Committee. As part of implementing unit tests, I wanted to add an automatic workflow that would evaluate the current tests on any future pull requests, so that (1) we can be sure contributed code isn’t breaking any additional functionality and (2) we can be sure that contributed code is adequately tested.

Clustering without RAM

13 minute read

Published: March 01, 2024

Update: This project has progressed pretty far from this initial sketch. This post describes my first pass at this problem–the final version is quite a bit different, and ended up teaching me a lot about computers and how they work with files. I’ve summarized those findings in a four-part series on Efficient File I/O, which you can check out here. Original post is below.

Writing a Random Forest from Scratch

35 minute read

Published: January 11, 2024

I’ve recently had to implement random forests from scratch in R. This is a much longer post than I normally make, since I’m going to go through all the details of actually implementing one of these models. By “from scratch”, I mean a complete Random Forest prediction model, written in R, with no packages aside from those provided in a base installation.

Fortran, C, and R

13 minute read

Published: January 05, 2024

First post of 2024! I wanted to take a little bit of time to talk about using C and Fortran in R. I often feel like the documentation for using C is a little tough to find, and finding out how to call Fortran from R is even harder. Is it even worth it, though? Should you be using Fortran in your R code? And if you could write C, why would you bother with Fortran? Let’s look through them step by step, using two common sorting algorithms (quicksort and mergesort) as examples.

Forth for R

3 minute read

Published: November 27, 2023

I’ve always enjoyed learning about different programming languages. Different languages come with different specialties, paradigms, and constraints. As said in a recent talk at Strange Loop, the languages we know affect how we think about and approach problems.

`dendrapply` and How To Contribute to R

7 minute read

Published: November 10, 2023

If you’ve been following my blog posts, you know that I previously refactored R’s dendrapply function. After some initial feedback from R-devel, I was encouraged to apply for the R Project Sprint at the University of Warwick in the UK.

Refactoring R’s `dendrapply`

16 minute read

Published: February 23, 2023

As someone who specializes in comparative phylogenomics, I work a lot with phylogenetic trees. Trees are represented in R as dendrogram objects, which are essentially a series of nested lists. Each “node” of the tree is a list with multiple members (two if a binary tree, but dendrogram objects are not constrained to be binary), each of which is another dendrogram object. The leaves are special cases in that they have length 1 and an additional property leaf, which is set to TRUE.

Finishing the 6502 Emulator

1 minute read

Published: February 06, 2023

This will be a short blog post–I’ve officially finished v1.0.0 of my 65C02 emulator. Since last time, I’ve fixed a bunch of bugs, finished implementing the 65C02 extended opcode set (including the Rockwell/WDC bit set/clear instructions and test-and-branch instructions), and written assembly scripts to test the implementation of (nearly) all the opcodes. The only ones I haven’t thoroughly checked are the Rockwell/WDC extended instructions (e.g. RMB0, SMB0, BBR0, BBS0). I’ve also updated the GUI to graphically iterate through instructions when (r)un is input, meaning you can set up an infinite loop and watch it iterate through. The iteration executes at the same speed the computer normally would (determined by the clock speed), so you can watch it step through programs at slow speeds if you’d like.

Writing a 6502 Emulator, part 2

6 minute read

Published: January 27, 2023

Last week, I set up the beginnings of a 6502 emulator, including the core codebase. Unfortunately, a command-line application that just runs 6502 assembly code is super hard to debug. The 6502 isn’t equipped with any way to print output by default (unless you hook up a 65C22 VIA, but coding that seems tricky), and reading raw bytecode isn’t the easiest thing to do. Other emulators I’ve used (e.g. Symon) include a pretty nice GUI to debug applications. I didn’t want to go as far as writing a whole application frontend, but I did think implementing some kind of updated UI would be a great addition for both users and for my personal debugging.

Writing a 6502 Emulator

12 minute read

Published: January 20, 2023

I recently watched an awesome video from Computerphile about writing an emulator for the Atari 2600. The 2600 runs off a 6507 processor, which is basically a modified 6502. This got me thinking: how hard would it actually be to write an emulator for a 6502 computer? At this point I’ve already built a computer with one and am close to having a working Forth interpreter–so I’m pretty familiar with how the microprocessor works internally.

6502 FORTH, Part 6: 16-Bit Division

18 minute read

Published: January 06, 2023

In my last post, I wrote an algorithm for multiplication. I figured I should at least finish wrapping up the basic arithmetic functions before I go back to writing the main Forth interpreter, so today I’m implementing division.

6502 FORTH, Part 5: Multiplication for Peasants

10 minute read

Published: January 05, 2023

It’s been a while since I worked on this project, and I wanted to ease back into it by implementing something auxiliary to get me back into the flow of writing assembly. It turns out the 6502 only has instructions for addition and subtraction, meaning that if you want any higher-level arithmetic operations (multiplication, division), they need to be implemented manually. Furthermore, since we’re implementing a 16-bit system, we’ll have to make sure these operations work on 16-bit numbers. This post is going to cover multiplication–I’m still working out the best way to write a division algorithm.

6502 FORTH, Part 4: Basic I/O

14 minute read

Published: December 21, 2022

I’ve been posting about creating a Forth interpreter for a 65c02, and at this point I’m pretty close to something that could be tested. However, I still need one more piece of infrastructure before I can begin writing and testing my Forth interpreter: some way to communicate with a user.

6502 FORTH, Part 3: 16 Bit Stack

9 minute read

Published: December 16, 2022

In my previous post, I created my first Forth words: next, exit, and dolist. I was about to continue on to creating some simple arithmetic words, but then I realized my program is missing the main data structure of Forth…the internal data stack. This is a 16-bit implementation, so I’ll need a 16-bit stack. This of course is not included in the default 65c02 system, so I had to write one myself.

2022 Holiday Coding Challenge

8 minute read

Published: December 15, 2022

One of the programming groups I’m in recently posted a challenge for the holiday season. The task is to print the following ASCII art using whatever language you like:

6502 FORTH, Part 2: NEXT, EXIT, and DOLIST

6 minute read

Published: December 15, 2022

In my previous post, I set up my development environment for creating a Forth interpreter from scratch. The next step is to create the foundational Forth operators next, exit, and dolist.

6502 FORTH, Part 1: Setup

5 minute read

Published: December 14, 2022

Now that the hardware for my 65C02 computer is more or less complete, it’s time to start working on the software. I’ve been very interested in the Forth language since discovering it a couple years ago due to its relative simplicity, low size requirements, efficiency, and departure from many conventions seen in other programming languages.

64th02 Computer, Part 2

1 minute read

Published: December 10, 2022

I finally got my shipment of resistors and soldering wicks, so it’s time to complete the hardware build for this computer! Unfortunately header pins are numerous, and soldering wicks aren’t quite as effective as I had hoped at removing solder. Fortunately I had plenty of extra parts, and PCBway’s minimum PCB shipment is five boards, so I decided to start over from scratch.

64th02 Computer, Part 1

1 minute read

Published: December 07, 2022

I recently built an 8-bit computer on a breadboard using a 6502 microprocessor. The experience was great, but I’ve been dying to migrate this project to an actual PCB so I can play around with it without worrying about breaking it. I’d also like to remove the dependence on a Raspberry Pico, since it feels like cheating to use a device significantly more powerful than a 6502 for a 6502 computer. My end goal is to build a FORTH interpreter from scratch that I can run on this machine. This is the beginning of my 6-FORTH-02 Computer, AKA the 64th02!

Starting a blog!

less than 1 minute read

Published: December 05, 2022

I figured it would be fun to keep a record of all the projects I’m working on in a blog format, so I’ve set up the blog page on my website! You’ll be able to access this page at any time by going to ahl27.com/blog!

Finishing the 6502 Emulator

1 minute read

Published: February 06, 2023

This will be a short blog post–I’ve officially finished v1.0.0 of my 65C02 emulator. Since last time, I’ve fixed a bunch of bugs, finished implementing the 65C02 extended opcode set (including the Rockwell/WDC bit set/clear instructions and test-and-branch instructions), and written assembly scripts to test the implementation of (nearly) all the opcodes. The only ones I haven’t thoroughly checked are the Rockwell/WDC extended instructions (e.g. RMB0, SMB0, BBR0, BBS0). I’ve also updated the GUI to graphically iterate through instructions when (r)un is input, meaning you can set up an infinite loop and watch it iterate through. The iteration executes at the same speed the computer normally would (determined by the clock speed), so you can watch it step through programs at slow speeds if you’d like.

Writing a 6502 Emulator, part 2

6 minute read

Published: January 27, 2023

Last week, I set up the beginnings of a 6502 emulator, including the core codebase. Unfortunately, a command-line application that just runs 6502 assembly code is super hard to debug. The 6502 isn’t equipped with any way to print output by default (unless you hook up a 65C22 VIA, but coding that seems tricky), and reading raw bytecode isn’t the easiest thing to do. Other emulators I’ve used (e.g. Symon) include a pretty nice GUI to debug applications. I didn’t want to go as far as writing a whole application frontend, but I did think implementing some kind of updated UI would be a great addition for both users and for my personal debugging.

Writing a 6502 Emulator

12 minute read

Published: January 20, 2023

I recently watched an awesome video from Computerphile about writing an emulator for the Atari 2600. The 2600 runs off a 6507 processor, which is basically a modified 6502. This got me thinking: how hard would it actually be to write an emulator for a 6502 computer? At this point I’ve already built a computer with one and am close to having a working Forth interpreter–so I’m pretty familiar with how the microprocessor works internally.

2022 Holiday Coding Challenge

8 minute read

Published: December 15, 2022

One of the programming groups I’m in recently posted a challenge for the holiday season. The task is to print the following ASCII art using whatever language you like:

Aidan Lakshman

Posts by Tags

6502

C

Forth

Fortran

R

blog posts

emulator

just-for-fun