How the Pipeline Works
When an AI cites a YouTube video, does the video support what the AI said? This pipeline downloads the transcript, extracts entities and claims, and checks whether the transcript actually backs them up.
Three Questions Per Citation
Is this video even about what the AI is talking about?
The API measures semantic relevance between the prompt and the video transcript using embedding similarity. Off-topic videos are flagged.
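Here is a minimal sketch of the relevance check. It assumes the prompt and transcript have already been turned into embedding vectors by some model, which the source does not name. Relevance is computed as plain cosine similarity. `RELEVANCE_THRESHOLD` is a hypothetical cutoff. The GPT trace treats 0.42 as on-topic, so the real cutoff is presumably at or below that value.

```python
import math

# Hypothetical cutoff: the pipeline's actual threshold is not documented.
RELEVANCE_THRESHOLD = 0.30

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two embedding vectors (1.0 = same direction)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector carries no direction, so treat it as unrelated
    return dot / (norm_a * norm_b)

def is_off_topic(prompt_vec: list[float], transcript_vec: list[float],
                 threshold: float = RELEVANCE_THRESHOLD) -> bool:
    """Flag a video whose transcript embedding is not close enough to the prompt."""
    return cosine_similarity(prompt_vec, transcript_vec) < threshold
```

Cosine similarity ignores vector length. A long transcript and a short prompt can therefore still score as close, provided they point in the same semantic direction.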
Do the specific facts actually appear in the video?
The API extracts named entities (brands, products, people) from both the AI response and the video transcript, then fuzzy-matches them. Entities from the response that have no match in the transcript are marked unsupported.
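Here is a minimal sketch of the fuzzy-matching step. The standard library's `difflib.SequenceMatcher` stands in for whatever matcher the pipeline actually uses. `MATCH_THRESHOLD` and the helper names are hypothetical. The sketch assumes entity extraction has already produced plain string lists; the trace example uses spaCy NER for this.

```python
from difflib import SequenceMatcher

# Hypothetical cutoff; the pipeline's real fuzzy-match settings are not documented.
MATCH_THRESHOLD = 0.85

def normalize(name: str) -> str:
    """Lowercase and collapse whitespace so trivial differences don't block a match."""
    return " ".join(name.lower().split())

def best_match_score(entity: str, candidates: list[str]) -> float:
    """Highest SequenceMatcher ratio between `entity` and any candidate (0.0 if none)."""
    target = normalize(entity)
    return max(
        (SequenceMatcher(None, target, normalize(c)).ratio() for c in candidates),
        default=0.0,
    )

def split_supported(response_entities: list[str], video_entities: list[str],
                    threshold: float = MATCH_THRESHOLD) -> tuple[list[str], list[str]]:
    """Partition response entities into (supported, unsupported) by best fuzzy match."""
    supported, unsupported = [], []
    for ent in response_entities:
        if best_match_score(ent, video_entities) >= threshold:
            supported.append(ent)
        else:
            unsupported.append(ent)
    return supported, unsupported
```

With these settings, "Spectra S1" still matches "Spectra S-1" (ratio about 0.95), while "Ameda" does not match "Medela" (about 0.73). The threshold sets the trade-off between tolerating transcription noise and merging distinct brands.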
What's missing between the claim and the video?
A local LLM compares the question, the AI's answer, and what the video covers, then summarizes what doesn't match.
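The three inputs the local LLM receives can be made concrete with a prompt template. This is a sketch under stated assumptions: the template wording and `build_gap_prompt` are hypothetical, and the actual model call is omitted because the source does not say which local LLM or runtime is used.

```python
# Hypothetical template; the wording the pipeline actually sends is not documented.
GAP_PROMPT_TEMPLATE = """You are checking whether a cited YouTube video supports an AI answer.

Question:
{question}

AI answer:
{answer}

What the video covers:
{video_summary}

List the claims or entities in the answer that the video does not support,
then summarize the gap in 2-3 sentences."""

def build_gap_prompt(question: str, answer: str, video_summary: str) -> str:
    """Fill the template with one citation's question, answer, and video coverage."""
    # str.format only parses braces in the template, so braces inside the
    # answer text (e.g. code snippets) are inserted safely as literal text.
    return GAP_PROMPT_TEMPLATE.format(
        question=question.strip(),
        answer=answer.strip(),
        video_summary=video_summary.strip(),
    )
```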
Pipeline Flow
CSV Input: prompts, AI responses, and YouTube URLs
Video Extraction: download transcripts and metadata from YouTube
Entity Comparison: match claims in the AI response against the video
Scoring & Labeling: relevance, hallucination risk, and citation label
Results: CSV summary, JSON details, SQLite database
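The Video Extraction stage starts from the YouTube URLs in the CSV. Normalizing each URL to a video ID makes deduplication straightforward, which matters here because the GPT run cites only 48 unique videos across 100 rows. Below is a minimal sketch. The function name and the set of URL shapes it handles are assumptions. The transcript download itself is left out because the source does not name the library it uses.

```python
import re
from typing import Optional
from urllib.parse import parse_qs, urlparse

# In practice YouTube video IDs are 11 characters of letters, digits, "-" and "_".
VIDEO_ID_RE = re.compile(r"[A-Za-z0-9_-]{11}")

def youtube_video_id(url: str) -> Optional[str]:
    """Return the video ID for common YouTube URL shapes, or None if unrecognized."""
    parsed = urlparse(url.strip())
    host = (parsed.hostname or "").lower()
    if host.startswith("www."):
        host = host[len("www."):]
    candidate = None
    if host == "youtu.be":
        # Short links: https://youtu.be/<id>
        candidate = parsed.path.lstrip("/").split("/")[0]
    elif host in ("youtube.com", "m.youtube.com"):
        if parsed.path == "/watch":
            # Standard links: https://www.youtube.com/watch?v=<id>&t=...
            candidate = parse_qs(parsed.query).get("v", [None])[0]
        elif parsed.path.startswith(("/shorts/", "/embed/", "/live/")):
            # Path-style links: /shorts/<id>, /embed/<id>, /live/<id>
            candidate = parsed.path.split("/")[2]
    if candidate and VIDEO_ID_RE.fullmatch(candidate):
        return candidate
    return None
```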
Tracing a Single Row
Follow one citation through every pipeline stage, from CSV input to final label.
GPT Example
Partially grounded: the video is on-topic but doesn't support all claims
Prompt: "Among brands in the breast pumps space, which ones are best known for suction strength and efficiency?"
"The 24 Best Portable Breast Pumps" | 45-min transcript | 23 entities extracted via spaCy NER
"You Tube" → "YouTube", markdown tables removed, 18 cleaned entities
4 found in video (Buddha, Medela, Spectra, Genie Advance) | 5 NOT found (Ameda, Hospital-Grade Power, etc.)
Relevance: 0.42 (on-topic) | Hallucination: 0.56 (some claims unsupported)
ungrounded_answer → video is relevant but doesn't support the AI's specific claims
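The cleaning step in this trace ("You Tube" → "YouTube", markdown tables removed, 23 entities down to 18) can be sketched at the entity level. This sketch assumes the table debris shows up as pipes and dash runs inside extracted entity strings. The alias map and function name are hypothetical. The real pipeline may instead strip tables from the response text before running NER.

```python
# Hypothetical alias map for entity names that NER splits or mis-spaces.
ALIASES = {"you tube": "YouTube"}

def clean_entities(entities: list[str]) -> list[str]:
    """Strip markdown-table debris, apply alias fixes, and de-duplicate in order."""
    seen: set[str] = set()
    cleaned: list[str] = []
    for raw in entities:
        # Pipes are table syntax; leading/trailing dashes and colons are separator residue.
        ent = " ".join(raw.replace("|", " ").split()).strip("-: ")
        if not ent:
            continue  # nothing left: the "entity" was pure table syntax like "|---|"
        ent = ALIASES.get(ent.lower(), ent)
        if ent.lower() not in seen:  # case-insensitive de-duplication
            seen.add(ent.lower())
            cleaned.append(ent)
    return cleaned
```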
Gemini Example
Fabricated comparison: Gemini listed 10+ platforms from a single-tool tutorial
Prompt: "What are some platforms that offer easy-to-use templates for creating professional-looking marketing materials?"
Single design tool walkthrough video | 12 entities extracted
Clean transcript, entity count stable at 12
0-2 found in video | 10+ NOT found (Vistaprint, Lucidpress, Piktochart, Snappa, etc.)
Relevance: low | Hallucination: 0.9+
ungrounded_answer → the video covers one tool, but Gemini listed 10+ platforms that never appear in it
Why Gemini citations are worse
Gemini generates multi-platform comparisons (listing 10-15 services) when the cited video discusses only 1-2 tools. That is a fabrication ratio of roughly 10:1 between the platforms listed and the platforms the video actually covers. 110 of 112 Gemini rows are ungrounded, and 49 of them have zero supported entities.
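To make these figures concrete, assume each row has already been reduced to counts of supported and unsupported entities. The function names are hypothetical. Measuring the fabrication ratio as unsupported entities per supported one, pooled across rows, is an assumption about how the 10:1 figure is computed. The GPT trace's hallucination score is consistent with the simple share used here: 5 unsupported out of 9 checked entities is about 0.56. The source does not confirm this formula.

```python
from typing import Iterable, Tuple

def hallucination_score(n_supported: int, n_unsupported: int) -> float:
    """Share of checked entities the video does not support, from 0.0 to 1.0."""
    total = n_supported + n_unsupported
    return n_unsupported / total if total else 0.0  # no entities: nothing to flag

def fabrication_ratio(n_supported: int, n_unsupported: int) -> float:
    """Unsupported entities per supported one; infinite when nothing is supported."""
    return n_unsupported / n_supported if n_supported else float("inf")

def summarize(rows: Iterable[Tuple[int, int]]) -> dict:
    """Aggregate (n_supported, n_unsupported) pairs for one model's citations."""
    rows = list(rows)
    total_supported = sum(s for s, _ in rows)
    total_unsupported = sum(u for _, u in rows)
    return {
        "rows": len(rows),
        # Rows where not a single entity from the answer appears in the video.
        "zero_supported": sum(1 for s, _ in rows if s == 0),
        # Pooled across rows (an assumption; a per-row median is another option).
        "fabrication_ratio": fabrication_ratio(total_supported, total_unsupported),
    }
```

For the GPT trace, `hallucination_score(4, 5)` returns 5/9, which rounds to the reported 0.56. Pooling lets rows with many listed platforms dominate the ratio. A per-row statistic would instead weight every citation equally.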
Production Results
GPT (promptz.csv): 92% ungrounded citations. 100 rows, 48 unique videos.
Gemini: 98% ungrounded citations. 112 rows, 0% grounded.
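Because results are written to a SQLite database, headline percentages like these can be recomputed with a single query. Below is a minimal sketch. It assumes a hypothetical `citations` table with `model`, `video_id`, and `label` columns; the source does not document its schema. The toy rows are illustrative only and are not the production data.

```python
import sqlite3

def label_summary(conn: sqlite3.Connection) -> list[tuple]:
    """Per-model row count, unique videos, and percent of rows labeled ungrounded_answer."""
    query = """
        SELECT model,
               COUNT(*) AS n_rows,
               COUNT(DISTINCT video_id) AS n_videos,
               -- In SQLite a comparison evaluates to 0 or 1, so SUM counts matches.
               ROUND(100.0 * SUM(label = 'ungrounded_answer') / COUNT(*), 1) AS pct_ungrounded
        FROM citations
        GROUP BY model
        ORDER BY model
    """
    return conn.execute(query).fetchall()

if __name__ == "__main__":
    # Toy in-memory table; column names and the "grounded" label are assumptions.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE citations (model TEXT, video_id TEXT, label TEXT)")
    conn.executemany(
        "INSERT INTO citations VALUES (?, ?, ?)",
        [
            ("gemini", "vid_a", "ungrounded_answer"),
            ("gpt", "vid_b", "ungrounded_answer"),
            ("gpt", "vid_c", "grounded"),
        ],
    )
    for row in label_summary(conn):
        print(row)
```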
Coming Soon: Computer Vision
Not yet implemented
The current pipeline only analyzes transcripts. Videos contain visual information that transcripts miss: product demos, on-screen text, brand logos, data visualizations, and UI screenshots.
Computer vision would catch entities visible on screen but never spoken. A video might show a product label on screen while the narrator discusses something else. Those visual entities are invisible to the transcript-only pipeline.