in machine learning are the same.
Coding, waiting for results, interpreting them, returning back to coding. Plus, some intermediate presentations of one’s progress. But, things mostly being the same does not mean that there’s nothing to learn. Quite on the contrary! Two to three years ago, I started a daily habit of writing down lessons that I learned from my ML work. In looking back through some of the lessons from this month, I found three practical lessons that stand out:
- Keep logging simple
- Use an experimental notebook
- Keep overnight runs in mind
Keep logging simple
For years, I used Weights & Biases (W&B)* as my go-to experiment logger. In fact, I have once been in the top 5% of all active users. The stats in below figure tell me that, at that time, I’ve trained close to 25000 models, used a cumulative 5000 hours of compute, and did more than 500 hyperparameter searches. I used it for papers, for big projects like weather prediction with large datasets, and for tracking countless small-scale experiments.
And W&B really is a great tool: if you want beautiful dashboards and are collaborating** with a team, W&B shines. And, until recently, while reconstructing data from trained neural networks, I ran multiple hyperparameter sweeps and W&B’s visualization capabilities were invaluable. I could directly compare reconstructions across runs.
But I realized that for most of my research projects, W&B was overkill. I rarely revisited individual runs, and once a project was done, the logs just sat there, and I did nothing with them ever after. When I then refactored the mentioned data reconstruction project, I thus explicitly removed the W&B integration. Not because anything was wrong with it, but because it wasn’t necessary.
Now, my setup is much simpler. I just log selected metrics to CSV and text files, writing directly to disk. For hyperparameter searches, I rely on Optuna. Not even the distributed version with a central server — just local Optuna, saving study states to a pickle file. If something crashes, I reload and continue. Pragmatic and sufficient (for my use cases).
The key insight here is this: logging is not the work. It’s a support system. Spending 99% of your time deciding on what you want to log — gradients? weights? distributions? and at which frequency? — can easily distract you from the actual research. For me, simple, local logging covers all needs, with minimal setup effort.
Maintain experimental lab notebooks
In December 1939, William Shockley wrote down an idea into his lab notebook: replace vacuum tubes with semiconductors. Roughly 20 years later, Shockley and two colleagues at Bell Labs were awarded Nobel Prizes for the invention of the modern transistor.
While most of us aren’t writing Nobel-worthy entries into our notebooks, we can still learn from the principle. Granted, in machine learning, our laboraties don’t have chemicals or test tubes, as we all envision when we think about a laboratory. Instead, our labs often are our computers; the same device that I use to write these lines has trained countless models over the years. And these labs are inherently portably, especially when we are developing remotely on high-performance compute clusters. Even better, thanks to highly-skilled administrative stuff, these clusters are running 24/7 — so there’s always time to run an experiment!
But, the question is, which experiment? Here, a former colleague introduced me to the idea of mainting a lab notebook, and lately I’ve returned to it in the simplest form possible. Before starting long-running experiments, I write down:
what I’m testing, and why I’m testing it.
Then, when I come back later — usually the next morning — I can immediately see which results are ready and what I had hoped to learn. It’s simple, but it changes the workflow. Instead of just “rerun until it works,” these dedicated experiments become part of a documented feedback loop. Failures are easier to interpret. Successes are easier to replicate.
Run experiments overnight
That’s a small, but painful lessons that I (re-)learned this month.
On a Friday evening, I discovered a bug that might affect my experiment results. I patched it and reran the experiments to validate. By Saturday morning, the runs had finished — but when I inspected the results, I realized I had forgotten to include a key ablation. Which meant … another full day of waiting.
In ML, overnight time is precious. For us programmers, it’s rest. For our experiments, it’s work. If we don’t have an experiment running while we sleep, we’re effectively wasting free compute cycles.
That doesn’t mean you should run experiments just for the sake of it. But whenever there is a meaningful one to launch, starting them in the evening is the perfect time. Clusters are often under-utilized and resources are more quickly available, and — most importantly — you will have results to analyse the next morning.
A simple trick is to plan this deliberately. As Cal Newport mentions in his book “Deep Work”, good workdays start the night before. If you know tomorrow’s tasks today, you can set up the right experiments in time.
* That ain’t bashing W&B (it would have been the same with, e.g., MLFlow), but rather asking users to evaluate what their project goals are, and then spend the majority of time on pursuing that goals with utmost focus.
** Footnote: mere collaborating is in my eyes not enough to warrant using such shared dashboards. You need to gain more insights from such shared tools than the time spent setting them up.