Quickly turning tedious lab work into actionable insights
During my time as a Physics student, manually extracting and analysing experimental measurements was often an unavoidable and frustrating part of Physics labs. Reading values from instruments, writing them down, transferring them into spreadsheets, and finally plotting the results was slow, repetitive, and error-prone.
Now that I work in Generative AI, I wondered: Why not automate this with AI?
This led me to build AI-OCR, an open-source prototype that uses AI to extract numerical data from images and turn it into insightful plots. The process of extracting text or numbers from images is commonly referred to as Optical Character Recognition (OCR) – hence the name for this project.
How it works:
- Upload images of measurements (or structured PDFs like financial reports)
- Prompt the AI to extract specific values into a clean DataFrame
- Prompt the AI to generate visualisations like time series, histograms, scatter plots, etc.
By automating what used to be tedious, AI-OCR helps reduce manual work while also breaking free from vendor lock-in. In many lab and industrial environments, even digital data often lives in proprietary formats, requiring expensive and/or restrictive software for access and analysis. With AI-OCR, you can simply photograph the measurements, extract the data from the image, and analyse as well as visualise the results with a simple prompt.
While conceived with simplifying lab workflows in mind, the tool’s applications extend far beyond science. From tracking health metrics to analysing utility bills or financial statements, AI-OCR can support a wide range of everyday data tasks.
In this article, I will walk through:
- Real-world use cases for the prototype
- A breakdown of how it works under the hood
- Challenges, limits, and trade-offs encountered
- Potential approaches for further development
Practical use cases: Where AI-OCR shines
Since I no longer work in a physics lab and I unfortunately do not have one in my basement, I was not able to test AI-OCR in its originally intended environment. Instead, I discovered several everyday use cases where this prototype proved to be surprisingly helpful.
In this section, I will walk through four real-world examples in which I used AI-OCR to extract numerical data from everyday images and documents like the ones in the image below and to generate meaningful plots with minimal effort. For each of these use cases, I used OpenAI’s API with the GPT-4.1 model for both the OCR and the data visualisation (more technical details in the “Under the hood” section below).
Blood pressure tracking
In this first use case, I used AI-OCR to track my blood pressure and heart rate throughout the day. You can see a full demonstration of this use case in the following video:
🎥 https://youtu.be/pTk9RgQ5SkM
Here is how I used it in practice:
- I recorded my blood pressure roughly every 30 minutes by taking photos of the monitor’s display.
- I uploaded the images and prompted the AI to extract: systolic pressure, diastolic pressure, and heart rate.
- AI-OCR returned a pandas.DataFrame with the extracted values, timestamped using the image metadata.
- Finally, I asked the AI to plot systolic and diastolic pressure as a time series, including horizontal lines indicating standard healthy ranges, as well as the heart rate in a separate subplot (a sketch of the kind of plotting code this produces follows below).
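For illustration, the plotting code generated in this last step might look roughly like the following sketch. The column names, values, and reference thresholds are assumptions for the example, not the prototype’s actual output.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Illustrative stand-in for the DataFrame returned by the OCR step.
df = pd.DataFrame({
    "timestamp": pd.to_datetime(["2024-05-01 09:00", "2024-05-01 09:30", "2024-05-01 10:00"]),
    "systolic pressure": [138, 135, 131],
    "diastolic pressure": [89, 87, 85],
    "heart rate": [72, 70, 68],
})

fig, (ax1, ax2) = plt.subplots(2, 1, sharex=True, figsize=(8, 6))

# Blood pressure as a time series with reference lines for the healthy range.
ax1.plot(df["timestamp"], df["systolic pressure"], marker="o", label="systolic")
ax1.plot(df["timestamp"], df["diastolic pressure"], marker="o", label="diastolic")
ax1.axhline(120, linestyle="--", label="systolic reference (120 mmHg)")
ax1.axhline(80, linestyle=":", label="diastolic reference (80 mmHg)")
ax1.set_ylabel("pressure [mmHg]")
ax1.legend()

# Heart rate in a separate subplot.
ax2.plot(df["timestamp"], df["heart rate"], marker="o", color="tab:red")
ax2.set_ylabel("heart rate [bpm]")
ax2.set_xlabel("time")

fig.tight_layout()
plt.show()
```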

The result? A visual overview of my (slightly elevated) blood pressure fluctuations throughout the day, with a clear drop after lunch at 1PM. What’s particularly encouraging is that the plot does not show any obvious outliers, a good sanity check that indicates the AI extracted the values correctly from the images.
Most modern blood pressure monitors only store a limited number of readings internally. The device I used, for example, can hold up to 120 values. However, many affordable models (like mine) do not support data export. Even when they do, they often require proprietary apps, locking your health data into closed ecosystems. As you can see, this is not the case here.
Body weight tracking
In another health-related use case, I used AI-OCR to track my body weight over several weeks during a personal weight-loss effort.
Traditionally, you might weigh yourself and manually enter the result into a fitness app. Some modern scales offer synchronisation via Bluetooth, but again the data is often locked inside proprietary apps. These apps typically limit both data access and the kinds of visualisations you can generate, making it difficult to truly own or analyse your own health data.
With AI-OCR, I simply took a photo of my scale reading every morning. For someone who is not exactly a morning person, it felt far easier than fiddling with an app before my breakfast tea. Once I had a batch of images, I uploaded them and asked AI-OCR to extract the weight values and generate a time series plot of my weight.

From the resulting graph, you can see that I lost around 3 kg over roughly two months. I also asked the AI to perform a linear regression, estimating a weight-loss rate of ~0.4 kg/week. With this approach, the user has full control over the analysis: I can ask the AI to generate a trend line, estimate my weight-loss rate, or apply any custom logic I need.
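As a hedged illustration of that last step, the trend estimate boils down to a simple linear fit over the extracted values; the column names and numbers below are made up for the sketch.

```python
import numpy as np
import pandas as pd

# Illustrative stand-in for the weight values extracted by AI-OCR.
df = pd.DataFrame({
    "timestamp": pd.date_range("2024-03-01", periods=8, freq="W"),
    "weight": [83.0, 82.6, 82.1, 81.8, 81.4, 81.0, 80.5, 80.1],
})

# Fit a straight line to weight vs. elapsed days to estimate the trend.
days = (df["timestamp"] - df["timestamp"].iloc[0]).dt.days.to_numpy()
slope, intercept = np.polyfit(days, df["weight"].to_numpy(), deg=1)

print(f"Estimated rate: {slope * 7:.2f} kg/week")  # negative slope = weight loss
```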
Financial data analysis
AI-OCR is not just useful for health tracking. It can also help make sense of your personal finances. In my case, I found that the analytics provided by my brokerage app offered only basic summaries of my portfolio and often missed key insights about my investment strategy. Some numbers were even inaccurate or incomplete.
One example: after moving my portfolio to a new brokerage, I wanted to verify that my buy-in values were transferred correctly. This can be cumbersome, especially when shares are accumulated over time through savings plans or multiple partial purchases. Doing this manually would mean digging through many PDFs, copying numbers into spreadsheets, and double-checking formulas, all of which is time-consuming and error-prone.
AI-OCR automated the entire workflow. I uploaded all the PDF purchase confirmations from my previous broker and prompted the AI to extract share name, nominal value, and purchase price. In the second step, I asked it to compute the buy-in values for each share and generate a bar plot of the results. In the prompt I explained how to calculate the buy-in value:
“Buy-in value = share price × nominal value, normalized over total nominal value.”
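In code, this calculation is a nominal-weighted average of the purchase prices per share. Here is a minimal sketch with made-up share names and numbers:

```python
import pandas as pd

# Made-up purchase confirmations as extracted from the PDFs.
df = pd.DataFrame({
    "share": ["ACME", "ACME", "ACME", "GLOBEX", "GLOBEX"],
    "nominal value": [2.0, 3.0, 5.0, 10.0, 10.0],
    "purchase price": [100.0, 110.0, 120.0, 50.0, 55.0],
})

# Buy-in value per share = sum(price x nominal) / sum(nominal).
cost = (df["purchase price"] * df["nominal value"]).groupby(df["share"]).sum()
nominal = df["nominal value"].groupby(df["share"]).sum()
buy_in = cost / nominal

print(buy_in)  # nominal-weighted average purchase price per share
```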
The generated plot let me quickly spot inconsistencies in the transfer of the buy-in values. In fact, this plot allowed me to catch a few errors in the numbers from my new brokerage app.
Similarly, you can prompt AI-OCR to calculate realised gains or losses over time, based on your transaction history. This is a metric my brokerage app does not even provide.
Electricity meter readings
For the final use case, I will demonstrate how I digitised and tracked my electricity consumption using this prototype.
Like many older houses in Germany, mine still uses an analogue electricity meter, which makes daily tracking with modern (digital) technology nearly impossible. If I want to analyse consumption over a time period, I have to read the meter manually at the beginning and end of the period, and then repeat this for each interval or day. Doing this over multiple days quickly becomes tedious and error-prone.
Instead, I photographed the meter (almost) every day for a few weeks and uploaded the images to AI-OCR. With two simple prompts, the tool extracted the readings and generated a time-series plot of my cumulative electricity consumption in kWh.

The plot reveals a generally linear trend, a sign that my daily consumption was relatively steady. However, three outliers can be seen. These were not caused by my secret bitcoin mining rigs but instead resulted from misread digits during the OCR process. In three out of the 27 images, the model simply made a recognition error.
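Such misreads are easy to flag automatically. As an illustration (not part of the prototype), a simple plausibility check could exploit the fact that a cumulative meter reading never decreases and that daily usage stays below some bound; the readings and the 50 kWh/day threshold below are assumptions.

```python
import pandas as pd

# Illustrative cumulative readings in kWh; the third value simulates a misread digit.
readings = pd.Series([14210.3, 14218.9, 14128.6, 14236.1, 14244.8])

daily_increase = readings.diff()

# A cumulative meter never decreases, and daily usage should stay below an
# assumed upper bound (here 50 kWh/day). Note that a single misread value
# also makes the following difference look suspicious.
suspicious = (daily_increase < 0) | (daily_increase > 50)
print(readings[suspicious])
```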
These glitches point us to current limitations of AI-OCR, which I will explore in more detail shortly. But first, let’s have a closer look at how this prototype actually works under the hood.
Under the hood: How AI-OCR works
AI-OCR is split into two main components: a frontend and a backend. The frontend is built using Streamlit, a Python library that lets you quickly turn Python scripts into web apps with little effort. It is a popular choice for machine learning prototypes and proofs of concept, thanks to its simplicity. That said, Streamlit is not intended for production-scale applications.
This is why the main focus of this article is on the backend, which is where data extraction and visualisation take place. It is designed around two distinct processes:
- OCR (Optical Character Recognition): Recognising the numerical data from images or documents using AI.
- Data visualisation: Transforming the extracted data into insightful plots.
One of AI-OCR’s strengths is its flexibility: it is model-agnostic. You are not locked into a single Large Language Model (LLM) vendor; both commercial and open-source models can be configured and swapped depending on the use case, and each of the two processes is powered by its own configurable LLM. Besides OpenAI models such as GPT-4.1, the prototype (so far) supports quantised models in GGUF format, a binary file format that packages model weights and metadata together. These are loaded and run locally via the llama.cpp Python library.
For the OCR task, Hugging Face offers a huge variety of quantised models such as LLaVa, DeepSeek-VL, or Llama-3-vision. For the code generation in the visualisation component, models with strong coding capabilities are most effective. Due to a lack of computational resources at home (I do not have access to a powerful GPU), I have only thoroughly tested this prototype with OpenAI models via the API.
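To give a concrete idea of the local path, here is a minimal sketch of loading a quantised GGUF model with the llama-cpp-python bindings; the model file name and the parameters are illustrative assumptions, not the prototype’s actual configuration.

```python
from llama_cpp import Llama

# Load a quantised GGUF checkpoint from disk; the file name is a placeholder
# for any model downloaded from Hugging Face.
llm = Llama(
    model_path="models/gemma-2-9b-it-Q4_K_M.gguf",
    n_ctx=4096,       # context window
    n_gpu_layers=0,   # CPU-only; increase if a GPU is available
)

response = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Reply with OK if you can read this."}],
    max_tokens=16,
)
print(response["choices"][0]["message"]["content"])
```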
The OCR component: Extracting the data
To turn images into insights, the relevant data must first be recognised in the images, which is handled by the OCR component. The process begins when the user uploads images and submits a prompt describing which values should be recognised, plus optional additional context to assist the model. The output is a pandas.DataFrame containing the extracted values alongside the timestamps of the images.
The diagram below illustrates the design of the data extraction pipeline. The outer box represents the Streamlit-based frontend, while the inner section details the backend architecture, a REST API. Arrows connecting the frontend and backend represent API calls. Within the backend, each icon represents a distinct component.

At the core of the backend is the OCR Modelling object. When a prompt is submitted, this object receives it along with the selected model configuration. It loads the appropriate model and accesses the uploaded images.
One particularly instructive part of this design is the way the prompt is handled. Before the actual OCR task is performed, the prompt from the user is enhanced with the help of a Small Language Model (SLM). The SLM’s role is to identify the specific values mentioned in the user’s prompt and return them as a list. For example, in the blood pressure use case, the SLM would return:
["heart rate", "diastolic pressure", "systolic pressure"].
This information is used to automatically enhance the original user prompt. The LLM is always requested to return structured output, so the prompt is extended with the specific JSON output format, which for the blood pressure case reads: {"heart rate": "value", "diastolic pressure": "value", "systolic pressure": "value"}.
Notice that the SLM used here runs locally using llama.cpp. For the use cases discussed previously, I used Gemma-2 9B (in quantised GGUF format). This technique highlights how smaller, lightweight models can be used for efficient and automatic prompt optimisation.
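A minimal sketch of this prompt-enhancement step might look as follows; the function name, the wording of the SLM instruction, and the `slm_complete` callable are assumptions for illustration, not the prototype’s actual code.

```python
import json

def enhance_prompt(user_prompt: str, slm_complete) -> str:
    """Extend the user's OCR prompt with a JSON output schema.

    `slm_complete` is any callable mapping a prompt string to the SLM's reply,
    e.g. a thin wrapper around llama-cpp-python's create_chat_completion.
    """
    # Step 1: ask the SLM which values the user wants to extract.
    extraction_request = (
        "Return, as a JSON array of strings, the quantities the user wants to "
        f"extract in the following request:\n{user_prompt}"
    )
    value_names = json.loads(slm_complete(extraction_request))

    # Step 2: build the structured-output instruction, e.g.
    # {"heart rate": "value", "diastolic pressure": "value", "systolic pressure": "value"}
    schema = {name: "value" for name in value_names}
    return (
        f"{user_prompt}\n"
        f"Return the result strictly as JSON in this format: {json.dumps(schema)}"
    )
```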
This enhanced prompt is then sent sequentially, along with each image, to the selected LLM. The model infers the requested values from the image. The responses are then aggregated into a pandas.DataFrame, which is eventually returned to the user for viewing and downloading.
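With the OpenAI API, this extraction loop could look roughly like the sketch below. The `extract_values` helper is hypothetical, and the timestamping from image metadata mentioned above is omitted for brevity.

```python
import base64
import json

import pandas as pd
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def extract_values(image_paths, enhanced_prompt, model="gpt-4.1"):
    """Send each image together with the enhanced prompt and collect the JSON replies."""
    rows = []
    for path in image_paths:
        with open(path, "rb") as f:
            b64 = base64.b64encode(f.read()).decode()
        response = client.chat.completions.create(
            model=model,
            messages=[{
                "role": "user",
                "content": [
                    {"type": "text", "text": enhanced_prompt},
                    {"type": "image_url",
                     "image_url": {"url": f"data:image/jpeg;base64,{b64}"}},
                ],
            }],
        )
        # The enhanced prompt asks for strict JSON, so the reply can be parsed directly.
        rows.append(json.loads(response.choices[0].message.content))
    return pd.DataFrame(rows)
```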
Visualising the result
The second part of turning your images into insights is the visualisation process. Here, the numerical data extracted into the DataFrame during the OCR process is transformed into meaningful plots based on the user’s request.
The user provides a prompt describing the type of visualisation they want (e.g., time series, histogram, scatter plot). The LLM then generates Python code to create the requested plot. This code is executed on the frontend, and the resulting visualisation is displayed directly in the app.
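A rough sketch of these two steps, assuming the OpenAI client from before and Streamlit on the frontend, is shown below. The function names and prompt wording are illustrative, and in the prototype the generated code only reaches the execution step after passing the Governance Gateway described below.

```python
import matplotlib.pyplot as plt
import streamlit as st
from openai import OpenAI

client = OpenAI()

def generate_plot_code(user_prompt: str, df, model: str = "gpt-4.1") -> str:
    """Ask the LLM for Matplotlib code, passing only metadata about the DataFrame."""
    meta = f"A pandas DataFrame named `df` is available with columns {list(df.columns)}."
    response = client.chat.completions.create(
        model=model,
        messages=[{
            "role": "user",
            "content": (
                f"{meta}\nWrite Python code using matplotlib that does the following: "
                f"{user_prompt}\nReturn only the code, without explanations or code fences."
            ),
        }],
    )
    return response.choices[0].message.content

def render_plot(generated_code: str, df) -> None:
    """Execute the (already vetted) plotting code and show the figure in Streamlit."""
    namespace = {"df": df, "plt": plt}
    exec(generated_code, namespace)  # the generated code is expected to draw onto plt
    st.pyplot(plt.gcf())             # display the current Matplotlib figure in the app
```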
The diagram below once again illustrates this process in detail. The core of this particular process is the Plot Modelling object. It receives two key inputs:
- The user’s prompt describing the desired visualisation
- The pandas.DataFrame generated by the OCR process.

Before passing the prompt and metadata about the DataFrame to the LLM, the prompt first passes through a Governance Gateway. Its job is to ensure security by preventing the generation or execution of malicious code. It is implemented as an SLM; as before, I used Gemma-2 9B (in quantised GGUF format), running locally via llama.cpp.
Specifically, the Governance Gateway first uses the instructed SLM to verify that the user’s prompt contains a valid data visualisation request and does not include any harmful or suspicious instructions. Only if the prompt passes this initial check is it forwarded to the LLM to generate the Python plotting code. After the code is generated, it is sent back to the SLM for a second security review to ensure the code is safe to execute.
After passing this second security validation, the generated code is sent back to the frontend, where it is executed to generate the requested plot. This two-step governance approach helps ensure that AI-generated code runs safely and securely, while giving the user the flexibility to generate any desired data visualisation within the Matplotlib ecosystem.
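The two checks could be implemented along the following lines; the reviewer instruction is an assumed wording, `slm_complete` is the same hypothetical SLM callable as above, and the commented-out lines show how the gate would wrap the generation and execution helpers sketched earlier.

```python
def governance_check(slm_complete, text: str, stage: str) -> bool:
    """Ask the local SLM whether a prompt or a generated script is safe.

    `stage` is either "prompt" or "code"; the reviewer instruction below is an
    assumed wording, not the prototype's actual system prompt.
    """
    question = (
        "You are a security reviewer. Answer only YES or NO.\n"
        f"Is the following {stage} limited to a harmless data visualisation and "
        f"free of file, network, or operating-system access?\n\n{text}"
    )
    return slm_complete(question).strip().upper().startswith("YES")

# Two-step gate around code generation and execution,
# using the helpers sketched in the previous snippets:
# if governance_check(slm_complete, user_prompt, "prompt"):
#     code = generate_plot_code(user_prompt, df)
#     if governance_check(slm_complete, code, "code"):
#         render_plot(code, df)
```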
Challenges, limits, and trade-offs
As already touched upon in the use case section, this prototype, particularly the OCR component, has notable limitations due to the constraints of the underlying language models.
In this section, I want to explicitly demonstrate two scenarios (illustrated in the image below) where the tool currently struggles significantly, and why, in some cases, it might not be the optimal solution. Both scenarios require interpreting analogue data. Despite the increasing digitisation of lab equipment, this is still an important requirement for applying such a tool in many physics labs.

On the left is an attempt to measure the length of an object (in this case, a book) using a physical ruler. On the right is an image of my car’s analogue RPM meter. In both cases, I processed multiple images with the prototype: static images for measuring the length of the book and video frames for reading the RPM meter. Despite high-quality inputs and carefully crafted prompts, the resulting measurements were imprecise. While the extracted values always fell within the expected numeric range, they were consistently too far off for real-world applications.
While AI-OCR offers convenience, in some cases the overall cost may outweigh the benefits. In the body weight tracker, for example, each image is several megabytes, while the extracted data (a single float) is just a few bytes; image analysis with LLMs can also be expensive in terms of tokens. These trade-offs highlight the need to always align AI applications with clear business value.
Conclusion: Custom AI agents for tomorrow’s lab
In this article, we explored how to build an LLM-powered prototype that transforms measurement images into structured data and insightful plots. Users can upload images, describe the values they want recognised from the images and the type of data visualisation to be performed, and then receive both the raw values and visual interpretations.
If you have tried ChatGPT or other LLM platforms, you may have noticed that they can already do much of this, and perhaps more. Simply upload an image to the chat, describe your desired data visualisation (optionally with additional context), and the system (e.g. ChatGPT) figures out the rest. Under the hood, this likely relies on a system of AI agents working in concert.
That same type of architecture is what a future version of AI-OCR could embrace. But why bother building it if one could simply use ChatGPT instead? Because of customisation and control. Unlike ChatGPT, the AI agents in AI-OCR can be tailored to your specific needs (like that of a lab assistant), and with local models, you retain full control over your data. For instance, you very likely would prefer not to upload your personal finance documents to ChatGPT.
A possible architecture for such a system of AI agents (that ChatGPT very likely relies on as well) is illustrated in the diagram below:

At the top level, a Root Agent receives the user’s input and delegates tasks via an Agent Communication Protocol (ACP). It can choose between two auxiliary agents (see the sketch after this list):
- OCR Agent: Extracts relevant numerical data from images and interfaces with a Model Context Protocol (MCP) server that manages CSV data storage.
- Data Visualisation Agent: Connects to a separate MCP plot server capable of executing Python code. This server includes the Governance Gateway powered by an SLM, which ensures all code is safe and appropriate before execution.
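As a very rough, library-free sketch of the delegation idea only (the ACP/MCP plumbing, agent classes, and method names are all assumptions):

```python
class RootAgent:
    """Toy illustration of the delegation idea; no ACP/MCP plumbing."""

    def __init__(self, ocr_agent, datavis_agent, classify_task):
        self.ocr_agent = ocr_agent
        self.datavis_agent = datavis_agent
        self.classify_task = classify_task  # e.g. an LLM call returning "ocr" or "plot"

    def handle(self, user_request: str, payload):
        task = self.classify_task(user_request)
        if task == "ocr":
            # images in, structured data out (stored via the CSV MCP server)
            return self.ocr_agent.extract(user_request, payload)
        # DataFrame in, vetted plot code out (executed via the MCP plot server)
        return self.datavis_agent.plot(user_request, payload)
```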
Unlike ChatGPT, this setup can be fully tailored: from local LLMs for data protection to system prompt tuning of the agents for niche tasks. AI-OCR is not meant to replace ChatGPT, but rather complement it. It could evolve into an autonomous lab assistant that streamlines data extraction, plotting, and analysis in specialised environments.
Acknowledgement
If you’re curious about the future of AI-OCR or interested in exploring ideas and collaborations, feel free to connect with me on LinkedIn.
Finally, I would like to thank Oliver Sharif, Tobias Jung, Sascha Niechciol, Oisín Culhane, and Justin Mayer for their feedback and sharp proofreading. Your insights greatly improved this article.