Building a web server in assembly to give my life (a lack of) meaning
I’ve also written a more detailed writeup here: https://imtomt.github.io/ymawky/
Also happy to take UI improvements because I am not great in that area!
Live demo: https://1e4.ai Code: https://github.com/thomasj02/1e4_ai
A few things that might be interesting:
- Trained on almost a full year of Lichess blitz games, around 1B total games
- Architecture is a small (~9M parameters) transformer-based network that takes the board, recent move history, the player's rating, and remaining clock time as input. There are three separate models per rating bucket: move, clock-usage, and win probability. The clock model is what makes the bots feel humanish under time pressure rather than instant. Because the move model takes the clock as one of its inputs, it also learns to blunder under time pressure like a human might.
- Because the network is so tiny, no GPU is needed for inference - it runs easily on a local CPU
- Downside of the tiny network is that it's a bit weak as you turn up the rating past around 1700. It can spot short tactics but not long multi-move combinations.
- Initial training on a rented 8xH100 cluster, then fine-tunes on my local GPU for different rating ranges
- Inspired by Maia-2 and DeepMind's "Grandmaster-Level Chess Without Search". On a held-out Lichess blitz benchmark, it beats Maia-2 blitz on top-1 move prediction (56.7% vs 52.7%) and pretty substantially on win-probability calibration (Brier 0.176 vs 0.272). Numbers and code in https://github.com/thomasj02/1e4_ai/tree/master/experiments/...
- The data pipeline is C++ via nanobind, then training with PyTorch. Getting this right was actually the thing I spent the most time on. Pre-shuffling the dataset once and then reading the shuffled dataset sequentially at training time kept GPU utilization high. Without this, training spent a huge percentage of its time on I/O while the GPU sat idle.
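To make the pre-shuffle idea from the last bullet concrete, here is a minimal Python sketch. It is not the actual C++/nanobind pipeline: the record layout (`RECORD`, one uint32 per sample) and the function names are made up for illustration, and it shuffles in memory, which a dataset of this size would not allow (that would need an external or chunked shuffle). The point is only that the random access happens once, offline, so the training-time reader can stream the file front to back.

```python
import random
import struct
from pathlib import Path

# Hypothetical fixed-size record: a single little-endian uint32 per sample.
RECORD = struct.Struct("<I")


def preshuffle(src: Path, dst: Path, seed: int = 0) -> None:
    """Shuffle fixed-size records once, offline, and write them contiguously.

    Assumption: the whole file fits in memory. A real dataset would need
    an external (chunked) shuffle instead.
    """
    data = src.read_bytes()
    n = len(data) // RECORD.size
    order = list(range(n))
    random.Random(seed).shuffle(order)
    with dst.open("wb") as out:
        for i in order:
            out.write(data[i * RECORD.size:(i + 1) * RECORD.size])


def read_sequential(path: Path, batch_size: int):
    """Yield batches by reading the pre-shuffled file front to back."""
    with path.open("rb") as f:
        while chunk := f.read(batch_size * RECORD.size):
            yield [value for (value,) in RECORD.iter_unpack(chunk)]
```

Because the reader only ever issues large sequential reads, the disk and page cache do the work they are good at, and the order is still random from the model's point of view.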
Happy to answer questions about the rating-conditioning, the clock model, or the data pipeline.
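For anyone unfamiliar with the two metrics quoted above, here is a rough sketch of how top-1 move accuracy and a Brier score are typically computed. This assumes the simplest form of each: one predicted move per position, and a single predicted win probability scored against a 0/1 outcome. The benchmark code in the repo may handle draws or use a multi-class variant, so treat the function names and signatures here as illustrative only.

```python
def top1_accuracy(predicted_moves, played_moves):
    """Fraction of positions where the model's top move matches the human's move."""
    hits = sum(p == a for p, a in zip(predicted_moves, played_moves))
    return hits / len(played_moves)


def brier_score(win_probs, outcomes):
    """Mean squared error between predicted win probability and the 0/1 result.

    Lower is better; always predicting 0.5 scores 0.25 under this binary form.
    """
    total = sum((p - o) ** 2 for p, o in zip(win_probs, outcomes))
    return total / len(outcomes)
```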
Some cool things:
- The "Remind" list lives with your other Reminders, so adding from your phone or watch (or another Mac) just works.
- Frontmatter in the Reminder lets it continue an existing Claude session. A Stop hook captures the session id, so the next time it runs it can pick up where the last left off.
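A rough sketch of the session-continuation idea, in Python rather than the app's own code. The frontmatter layout (`---` delimiters with `key: value` lines) and the `session_id` key are assumptions made for illustration, not the app's actual format; `-p` and `--resume` are the Claude Code CLI flags for non-interactive mode and for resuming a session by id, as I understand them.

```python
def parse_frontmatter(note: str):
    """Split a note into (metadata, body).

    Assumes hypothetical '---'-delimited frontmatter with 'key: value' lines.
    """
    lines = note.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, note
    meta = {}
    for i, line in enumerate(lines[1:], start=1):
        if line.strip() == "---":
            return meta, "\n".join(lines[i + 1:]).strip()
        key, sep, value = line.partition(":")
        if sep:
            meta[key.strip()] = value.strip()
    return {}, note  # no closing delimiter: treat the whole note as the prompt


def build_command(note: str):
    """Build the argv for a run, resuming a prior session if one is recorded."""
    meta, prompt = parse_frontmatter(note)
    cmd = ["claude", "-p", prompt]
    if "session_id" in meta:  # hypothetical key name
        cmd += ["--resume", meta["session_id"]]
    return cmd
```

A Stop hook would then write the session id it receives back into the note's frontmatter, so the next run of the same Reminder takes the resume branch.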
It's free, updates itself with Sparkle, and requires macOS 15+ with `claude` on your $PATH.
Remind has become sticky in my personal workflow so I thought I'd share it here. Let me know if you have any suggestions or questions.
Have fun! This is the game where a lower score is good for your mental health.
Tested on HotpotQA public dataset:
Vector + BM25 + entity graph: BothFound@5 71.5%
Vector + BM25 only: BothFound@5 59.5%
The entity graph is the game changer for retrieving connected facts.
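To illustrate why a graph step can help on two-hop questions, here is a minimal Python sketch. It is not the actual implementation: reciprocal rank fusion is swapped in as a common way to merge vector and BM25 rankings, the entity expansion is a toy one-hop lookup, and BothFound@k is assumed to mean that both gold supporting documents appear in the top k. All function names and the `doc_entities` mapping are hypothetical.

```python
from collections import defaultdict


def rrf(rankings, k=60):
    """Reciprocal rank fusion: merge several ranked lists of doc ids into one."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] += 1.0 / (k + rank)
    # Sort by score, breaking ties by doc id so the output is deterministic.
    return sorted(scores, key=lambda d: (-scores[d], d))


def expand_with_entities(ranked, doc_entities, top_n=2):
    """Promote documents that share an entity with one of the top_n seed documents."""
    seeds = ranked[:top_n]
    seed_entities = set().union(*(doc_entities.get(d, set()) for d in seeds))
    linked = sorted(
        d for d, ents in doc_entities.items()
        if d not in seeds and ents & seed_entities
    )
    rest = [d for d in ranked if d not in seeds and d not in linked]
    return seeds + linked + rest


def both_found_at_k(ranked, gold, k=5):
    """True if every gold supporting document is within the top k results."""
    return set(gold) <= set(ranked[:k])
```

The second supporting fact in a two-hop question often shares an entity with the first but has little lexical or embedding overlap with the query itself, which is the case the expansion step is meant to catch.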
More benchmark results:
LongMemEval-S: 84.8% recallAll@5
LoCoMo-10: 59% vs Zep Cloud 28%
What is your approach to connected-fact retrieval?