The Hidden Security Risks of LLMs

rush to integrate large language models (LLMs) into customer service agents, internal copilots, and code generation helpers, there’s a blind spot emerging: security. While we focus on the continuous technological advancements and hype around AI, the underlying risks and vulnerabilities often go unaddressed. I see many companies handling a double standard when it comes to security. OnPrem IT set-ups are subjected to intense scrutiny, but the use of cloud AI services like Azure OpenAI studio, or Google Gemini are adopted quickly with the click of a button.

I know how easy it is to just build a wrapper solution around hosted LLM APIs, but is it really the right choice for enterprise use cases? If your AI agent is leaking company secrets to OpenAI or getting hijacked through a cleverly worded prompt, that’s not innovation but a breach waiting to happen. Just because we’re not directly confronted with security choices that concern the actual models when leveraging these external API’s, should not mean that we can forget that the companies behind those models made those choices for us.

In this article I want to explore the hidden risks and make the case for a more security aware path: self-hosted LLMs and appropriate risk mitigation strategies.

LLMs aren’t safe by default

Just because an LLM sounds very smart with its outputs doesn’t mean that they are inherently safe to integrate into your systems. A recent study by Yoao et al. explored the dual role of LLMs in security [1]. While LLMs open up a lot of possibilities and can sometimes even help with security practices, they also introduce new vulnerabilities and avenues for attack. Standard practices still need to evolve to be able to keep up with the new attack surfaces being created by AI powered solutions.

Let’s have a look at a couple of important security risks that need to be dealt with when working with LLMs.

Data Leakage

Data Leakage happens when sensitive information (like client data or IP) is unintentionally exposed, accessed or misused during model training or inference. With the average cost of a data breach reaching $5 million in 2025 [2], and 33% of employees regularly sharing sensitive data with AI tools [3], data leakage poses a very real risk that should be taken seriously.

Even if those third party LLM companies are promising to not train on your data, it’s hard to verify what’s logged, cached, or stored downstream. This leaves companies with little control over GDPR and HIPAA compliance.

Prompt injection

An attacker doesn’t need root access to your AI systems to do harm. A simple chat interface already provides plenty of opportunity. Prompt Injection is a method where a hacker tricks an LLM into providing unintended outputs or even executing unintended commands. OWASP notes prompt injection as the number one security risk for LLMs [4].

An example scenario:

A user employs an LLM to summarize a webpage containing hidden instructions that cause the LLM to leak chat information to an attacker.

The more agency your LLM has the bigger the vulnerability for prompt injection attacks [5].

Opaque supply chains

LLMs like GPT-4, Claude, and Gemini are closed-source. Therefore you won’t know:

What data they were trained on
When they were last updated
How vulnerable they are to zero-day exploits

Using them in production introduces a blind spot in your security.

Slopsquatting

With more LLMs being used as coding assistants a new security threat has emerged: slopsquatting. You might be familiar with the term typesquatting where hackers use common typos in code or URLs to create attacks. In slopsquatting, hackers do not rely on human typos, but on LLM hallucinations.

LLMs tend to hallucinate non-existing packages when generating code snippets, and if these snippets are used without proper checks, this provides hackers with a perfect opportunity to infect your systems with malware and the likes [6]. Often these hallucinated packages will sound very familiar to real packages, making it more difficult for a human to pick up on the error.

Proper mitigation strategies help

I know most LLMs seem very smart, but they don’t understand the difference between a normal user interaction and a cleverly disguised attack. Relying on them to self-detect attacks is like asking autocomplete to set your firewall rules. That’s why it’s so important to have proper processes and tooling in place to mitigate the risks around LLM based systems.

Mitigation strategies for a first line of defence

There are ways to reduce risk when working with LLMs:

Input/output sanitization (like regex filters). Just like it proved to be important in front-end development, it shouldn’t be forgotten in AI systems.
System prompts with strict boundaries. While system prompts are not a catch-all, they can help to set a good foundation of boundaries
Usage of AI guardrails frameworks to prevent malicious usage and enforce your usage policies. Frameworks like Guardrails AI make it straightforward to set up this type of protection [7].

In the end these mitigation strategies are only a first wall of defence. If you’re using third party hosted LLMs you’re still sending data outside your secure environment, and you’re still dependent on these LLM companies to appropriately handle security vulnerabilities.

Self-hosting your LLMs for more control

There are plenty of powerful open-source alternatives that you can run locally in your own environments, on your own terms. Recent advancements have even resulted in performant language models that can run on modest infrastructure [8]! Considering open-source models is not just about cost or customization (which arguably are nice bonusses as well). It’s about control.

Self-hosting gives you:

Full data ownership, nothing leaves your chosen environment!
Custom fine-tuning possibilities with private data, which allows for better performance for your use cases.
Strict network isolation and runtime sandboxing
Auditability. You know what model version you’re using and when it was changed.

Yes, it requires more effort: orchestration (e.g. BentoML, Ray Serve), monitoring, scaling. I’m also not saying that self-hosting is the answer for everything. However, when we’re talking about use cases handling sensitive data, the trade-off is worth it.

Treat GenAI systems as part of your attack surface

If your chatbot can make decisions, access documents, or call APIs, it’s effectively an unvetted external consultant with access to your systems. So treat it similarly from a security point of view: govern access, monitor carefully, and don’t outsource sensitive work to them. Keep the important AI systems in house, in your control.