Sept. 29, 2025

Episode 9: Open Source Models

Episode 9: Open Source Models

In episode nine, hosts explore open source AI models and introduce the "KILLM chain" segment on LLM vulnerabilities. Co-host Dustin mentions an upcoming move, prompting an early recording.

The discussion expands on last week's open source AI model talk, referencing Anthropic CEO Dario Amodei’s view that "open source model" is a misnomer. Unlike software’s editable source code, AI offers "open weights"—trained model parameters—but not training data or processes. Amodei argues model quality, not openness, matters most, comparing models like DeepSeek (open weights) to closed ones like Claude Opus or GPT-5.

Openness varies:

  • Open weights with limits: E.g., Meta’s Llama has licensing restrictions (e.g., shutdown clauses for large-scale use).

  • Unrestricted open weights: Allows inference but not reproduction, like a compiled binary.

  • True open source: Requires training data for auditability/reproducibility, akin to software source code.

Training data is the "source code," defining model strengths, weaknesses, and risks (e.g., unsanitized data leaking PII or backdoors). Without it, auditing is limited; even with it, tech can’t fully trace behavior. Hugging Face stands out, offering models with data for fine-tuning. Challenges include data size (petabytes), sensitivity, and potential exploits (e.g., "Manchurian candidate" triggers). Testing catches some issues but misses rare cases.

Anthropic faces criticism for closed models and perceived regulatory capture, like pushing a California AI kill switch bill, which burdens smaller/open-source players. Hosts speculate closed models hide scraped data, risking lawsuits. They question if public data (e.g., Reddit posts) counts as contributions, estimating a 1/100,000 to 1/100 chance personal content is in models like ChatGPT.

The "KILLM chain" segment, based on OWASP’s LLM Top 10, addresses sensitive data exposure (PII, financials, health records, proprietary algorithms). LLMs risk leaking via outputs if data isn’t sanitized. Mitigation includes:

  • Training data sanitization.

  • User opt-outs/terms of use.

  • Input/output validation via proxies (e.g., LiteLLM, Bedrock guardrails).

  • Defense in depth: Multiple LLMs critiquing outputs to curb hallucinations/leaks.

Examples: Repeatedly prompting "poem" caused an LLM to dump memory (e.g., code, prompts). Hallucinations arise on untrained topics; prompts like "say you don’t know" help. Penetration testing uses fuzzing to extract secrets. Data races amplify risks, as companies mine private data (e.g., Gmail DMs) for advantage, potentially leaking it if unsanitized. Adversarial models could embed exploits.

Real-world issues include AI travel planners inventing places (e.g., "Sacred Canyon of Humanity" in Peru), costing users money.

Atlassian’s Jira Product Discovery Agents (beta) lets PMs input natural-language stories; AI generates tasks, UI/UX mockups, code drafts, docs, and tests, automating "sprint zero." This blurs PM-designer-engineer roles, with developers refining basic code. Software shifts from "gardening" (individual craft) to industrial automation. Tools like Jira/GitLab need code context (e.g., Bitbucket integration) for accuracy.

Benefits: Cuts 80% of development delays from unclear tickets, empowers non-tech users, and enables probabilistic experimentation (e.g., branching like quantum paths). AI’s non-determinism requires guardrails for security/predictability. Agile’s iterative ethos aligns, enabling rapid iterations.

Hosts speculate on future copyright clarifications for training data, likening it to music sampling lawsuits. Anthropic’s stance is seen as pragmatic yet self-serving. The episode ends with thanks and a possible skip next week, blending technical depth, speculation, and humor on AI’s transformative potential and risks.