Sept. 23, 2025

Episode 8: LLM Caching

Show Notes

In this episode, the hosts discuss the latest news and trends in AI, focusing on LLM caching, a new EU regulation on AI-generated code, the changing landscape for Stack Overflow, and a recent AI security vulnerability.

The hosts explain LLM caching as a technique to boost efficiency and cut costs for AI providers and developers. It involves saving parts of a prompt that are sent repeatedly, such as tool descriptions for a code agent or a developer's code. This means the content doesn't need to be re-tokenized each time, saving computational power. Providers offer a reduced rate for these cached tokens.

The discussion also highlights proxies like Light LLM, which can cache and reuse responses for multiple users even if their prompts aren't identical. This is achieved through semantic caching, which understands the meaning of words, allowing similar queries to receive the same cached answer.

The hosts express skepticism about the European Union's new AI Act, which mandates that any code "substantially generated or assisted by an AI system" must be clearly identified. This "AI watermarking" aims to increase transparency, but it has open-source platforms debating whether to accept AI-generated code contributions at all due to legal and compliance issues.

One host questions the regulation's practicality, seeing it as a fear-based, "proactive" measure for a problem that hasn't yet been observed. They point out the difficulty of reliably detecting and labeling AI-written code, especially as AI models improve at mimicking human styles. The hosts also note a study showing that AI coding assistants are more likely to introduce security vulnerabilities because they are trained on public code that often contains bugs and outdated security practices.

The podcast covers the decline of Stack Overflow, attributing it to the rise of generative AI tools. Traffic has dropped, and Stack Overflow has responded by partnering with OpenAI to provide its data and adding its own AI features. The hosts believe Stack Overflow's data is a valuable asset that should be monetized rather than scraped.

They conclude that Stack Overflow and similar content websites face a "generational problem." Younger users are less likely to use traditional sites, preferring integrated experiences like chatbots and AI assistants. They compare the future of the internet to a "Netflix algorithm," where AI will guide users directly to the content they need.

In their "Secure or Sus" segment, the hosts discuss a security flaw that allows a threat actor to steal a user's ChatGPT conversation through an "indirect prompt injection." The attacker uploads a malicious prompt to a public website. When a user interacts with it, ChatGPT is tricked into generating an image whose URL secretly contains the user's conversation. The image then sends the conversation to the attacker's server.

The hosts explain that this type of data exfiltration attack can be prevented with defensive measures like an LLM proxy and input/output sanitization. They note that similar vulnerabilities could exist in other AI-driven platforms and conclude that security in the age of AI requires proactive, disciplined measures rather than simply reacting to known vulnerabilities.

Episode 8: LLM Caching

Listen On

Recent Episodes