Summary: An opinion piece for the general TDS audience. I argue that AI is more transparent than humans in tangible ways. Claims that AI is a “black box” lack perspective and a comparison to the opacity of human intelligence, the study of which in some ways lags behind the study of artificial intelligence.
You, reader, are a black box. Your mind is mysterious. I can’t know how you are thinking. I can’t know what you will do, whether your words are honest, or whether you justify your actions without pretext. We learn to understand and trust humans through many years of introspection and of experience interacting with others. But experience also tells us that understanding is limited to those with similar-enough life backgrounds, and that trust is unwarranted for those with motivations contrary to our own.
Artificial intelligence, while still mysterious, is crystal clear in comparison. I can probe an AI for its equivalent of thoughts and motivations and know I’m getting the truth. Further, the AI’s equivalent of a “life background” (its training data) and of “motivations” (its training objective) are mostly if not entirely known and open to scrutiny and analysis. While we still lack years of experience with modern AI systems, I argue that there is no problem of opacity; to the contrary, the relative transparency of AI systems to inspection, their “white box” nature, can be a foundation for understanding and trust.
You may have heard AI called a “black box” in two senses. In the first, systems like OpenAI’s ChatGPT or Anthropic’s Claude are black boxes because you cannot inspect their code or parameters (black-box access). In the more general sense, even if you could inspect those things (white-box access), they would be of little help in understanding how the AI operates to any generalizable extent: you could follow every instruction that defines ChatGPT and gain no more insight than if you merely read its output, a corollary to the Chinese room argument. A (human) mind, however, is more opaque than even restricted-access AI. Physical barriers and ethical constraints limit interrogation of the mechanisms of human thought, and our models of the brain’s architecture and components are incomplete; the human mind is therefore more of a black box (albeit an organic, carbon-based, “natural” one) than even proprietary, closed-source AI models. Let’s compare what current science tells us about the internal workings of the human brain on the one hand and of AI models on the other.
As of 2025, the most complex static neural structure to have been fully mapped, that of a fly, has but a tiny fraction of the complexity of the human brain. Functionally, experiments using functional magnetic resonance imaging (fMRI) can pinpoint neural activity down to volumes of brain matter of about 1 mm³. Figure 2 shows an example of the neural structure captured as part of an fMRI study. The required hardware includes a machine worth at least $200,000, steady access to liquid helium, and a supply of very patient humans willing to hold still while a tonne of superconducting magnet operates inches from their heads. While fMRI studies can establish that, for example, the processing of visual depictions of faces and houses is associated with certain brain regions, much of what we know about the functions of the brain is thanks to literal accidents (the famous case of Phineas Gage among them), which are of course not ethically scalable. Ethical, less invasive experimental approaches provide relatively low signal-to-noise ratios.

Open-source models (white-box access), including large language models (LLMs), are regularly sliced and diced (virtually) and otherwise interrogated in far more invasive ways than are possible on humans with even the most expensive fMRI machine and the sharpest scalpel, and this using consumer gaming hardware. Every single bit of every single neural connection can be inspected and logged, repeatedly and consistently, under a huge space of inputs. The AI does not tire in the process, nor is it affected in any way. This level of access, control, and repeatability lets us extract a massive amount of signal and perform much finer-grained analysis. Controlling what an AI observes lets us connect familiar concepts to components and processes within and outside of an AI in useful ways:
- Associate neural activity with concepts, akin to an fMRI. We can tell whether an AI is “thinking” about a particular concept (see the first sketch after this list). How well can we tell when a human is thinking about a particular concept? Figs. 1 and 3 are two renderings of concepts from GemmaScope, which annotates the internals of Google’s Gemma 2 model with human-interpretable concepts.
- Determine the importance of particular inputs to outputs. We can tell whether a specific part of a prompt was important in producing an AI’s response (see the second sketch after this list). Can we tell whether a human’s decision is impacted by a particular concern?
- Attribute the conveyance of concepts to paths through an AI. We can tell exactly where in a neural network a concept traveled on its way from input words to eventual outputs. Fig. 4 shows an example of such a path trace for the grammatical concept of subject-number agreement. Can we do the same for humans?
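
To make the first bullet concrete, here is a minimal sketch (in Python, using Hugging Face transformers) of what white-box concept probing can look like. It is not the GemmaScope pipeline, which relies on sparse autoencoders; it merely estimates a crude “concept direction” from contrasting prompts and projects new activations onto it. The model name (“gpt2”), layer index, and prompts are illustrative assumptions, not choices from the studies cited above.

```python
# A minimal sketch of concept probing, under the assumptions stated above.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "gpt2"  # assumption: any small open causal LM with hidden states
LAYER = 6       # assumption: a middle transformer block

tok = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(MODEL, output_hidden_states=True)
model.eval()

def mean_hidden(prompt: str) -> torch.Tensor:
    """Average hidden state at LAYER over the prompt's tokens."""
    inputs = tok(prompt, return_tensors="pt")
    with torch.no_grad():
        out = model(**inputs)
    return out.hidden_states[LAYER][0].mean(dim=0)

# Crude "concept direction": difference of mean activations between prompts
# that mention the concept (here, dogs) and prompts that do not.
concept_prompts = ["The dog chased the ball.", "A puppy barked at the mailman."]
control_prompts = ["The stock market fell today.", "Rain is expected tomorrow."]
direction = (torch.stack([mean_hidden(p) for p in concept_prompts]).mean(0)
             - torch.stack([mean_hidden(p) for p in control_prompts]).mean(0))
direction = direction / direction.norm()

# Score new text by projecting its activations onto the concept direction:
# a higher score suggests the model is "thinking" about the concept.
for text in ["My golden retriever loves the park.", "Interest rates rose again."]:
    score = torch.dot(mean_hidden(text), direction).item()
    print(f"{score:+.2f}  {text}")
```

Note that nothing here asks the model what it is thinking about; we read the answer off its activations, which is exactly the kind of access a human brain does not afford.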

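The second bullet also has a simple, if coarse, white-box instantiation: gradient-times-input saliency, which scores how much each prompt token contributed to the model’s top next-token prediction. Real attribution methods (integrated gradients, attention attribution, and the like) are more careful; this is a sketch, and the model and prompt are again illustrative assumptions.

```python
# A minimal sketch of input attribution via gradient-times-input saliency.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "gpt2"  # assumption: any small open causal LM works the same way
tok = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(MODEL)
model.eval()

prompt = "The capital of France is"
inputs = tok(prompt, return_tensors="pt")

# Run the model on embeddings (not token ids) so gradients reach the inputs.
embeds = model.get_input_embeddings()(inputs["input_ids"]).detach()
embeds.requires_grad_(True)
logits = model(inputs_embeds=embeds, attention_mask=inputs["attention_mask"]).logits

# Backpropagate from the score of the model's top next-token prediction.
logits[0, -1].max().backward()

# Saliency per input token: |gradient * embedding|, summed over the hidden dim.
saliency = (embeds.grad[0] * embeds[0]).sum(dim=-1).abs()
for token_id, s in zip(inputs["input_ids"][0].tolist(), saliency):
    print(f"{s.item():8.3f}  {tok.decode([token_id])!r}")
```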
Humans can, of course, self-report answers to the first two questions above. You can ask a hiring manager what they were thinking about when they read your résumé or what factors were important in their decision to offer you a job (or not). Unfortunately, humans lie, often do not themselves know the reasons for their actions, or are biased in ways they are not aware of. While this is also the case for generative AI, interpretability methods in the AI space do not rely on the AI’s answers, truthful, unbiased, self-aware, or otherwise. We don’t need to trust the AI’s outputs in order to tell whether it is thinking about a particular concept. We literally read it off a (virtual) probe stuck onto its neurons. For open-source models this is trivial, laughably so considering what it takes to get this sort of information (ethically) out of a human.
What about closed-source, “black-box access” AI? Much can be inferred from black-box access alone. Models’ lineage is known, and so is their general architecture. Their basic components are standard. They can also be interrogated at a rate much higher than any human would put up with, and in a more controlled and reproducible manner; repeatability under chosen inputs is often a serviceable substitute for open access (a sketch follows below). Parts of models can be inferred, or their semantics copied, by “distillation”. So black-box access is not an absolute impediment to understanding and trust, but the most immediate way to make AI more transparent is to allow open access to its entire specification, despite current trends among the prominent AI builders.
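Even with black-box access only, the same spirit of controlled, repeatable interrogation applies, just at the level of behavior rather than neurons. The sketch below assumes an OpenAI-compatible chat-completions client; the model name and prompts are illustrative, and no human respondent would tolerate this many identically phrased, minutely varied questions.

```python
# A minimal sketch of black-box behavioral probing: repeat a controlled query,
# vary one detail at a time, and tally how the answers change. Assumes an
# OpenAI-compatible API and the OPENAI_API_KEY environment variable.
from collections import Counter
from openai import OpenAI

client = OpenAI()
MODEL = "gpt-4o-mini"  # assumption: any chat model exposed by the API would do

def ask(prompt: str, trials: int = 5) -> Counter:
    """Ask the same question several times and tally the answers."""
    answers = Counter()
    for _ in range(trials):
        resp = client.chat.completions.create(
            model=MODEL,
            temperature=0,  # request determinism; repetition still catches drift
            messages=[{"role": "user", "content": prompt}],
        )
        answers[resp.choices[0].message.content.strip()] += 1
    return answers

# Controlled variation: change a single detail and compare the answer tallies.
base = "Answer with one word: is a 6-month gap on a resume disqualifying?"
variant = "Answer with one word: is an 18-month gap on a resume disqualifying?"
print("base:   ", ask(base))
print("variant:", ask(variant))
```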
Humans may be the more complex thinking machines, so the above comparisons may not seem fair. And we are more inclined to feel that we understand and can trust humans because of our years of experience being human and interacting with other (presumed) humans. Our experience with various AIs is growing rapidly, and so are their capabilities. While the sizes of the top-performing models are also growing, their general architectures have been stable. There is no indication that we will lose the kind of transparency into their operation described above, even as they attain and subsequently surpass human capabilities. There is also no indication that exploration of the human brain is likely to yield a breakthrough significant enough to render it the less opaque intelligence. AI is not, and likely will not become, the black box that popular sentiment says it is.
Piotr Mardziel, head of AI, RealmLabs.AI.
Sophia Merow and Saurabh Shintre contributed to this post.