How the Pipeline Works

When an AI cites a YouTube video, does the video actually support what the AI said? This pipeline downloads the transcript, extracts entities and claims, and checks whether the evidence is there.

Three Questions Per Citation

Is this video even about what the AI is talking about?

The API measures semantic relevance between the prompt and the video transcript using embedding similarity. Off-topic videos are flagged.
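The relevance check can be illustrated with cosine similarity between two vectors. The source does not say which embedding model the API uses, so the sketch below substitutes a toy bag-of-words vector (`toy_embed`) for real embeddings. The `OFF_TOPIC_THRESHOLD` value and all function names are assumptions made for illustration.

```python
import math
import re
from collections import Counter


def toy_embed(text: str) -> Counter:
    """Stand-in for a real embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity of two sparse vectors; 0.0 if either is empty."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical cutoff; the real pipeline's threshold is not given in the source.
OFF_TOPIC_THRESHOLD = 0.2


def is_off_topic(prompt: str, transcript: str,
                 threshold: float = OFF_TOPIC_THRESHOLD) -> bool:
    """Flag a video whose transcript scores below the relevance threshold."""
    score = cosine_similarity(toy_embed(prompt), toy_embed(transcript))
    return score < threshold
```

With a real embedding model only `toy_embed` would change; the similarity and threshold logic would stay the same.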

Do the specific facts actually appear in the video?

The API extracts named entities (brands, products, people) from the AI response and the video, then fuzzy-matches them. Claims not in the transcript are marked unsupported.
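Fuzzy entity matching can be sketched with the standard library's `difflib.SequenceMatcher`. This is a minimal illustration, not the API's actual matcher: the `0.85` cutoff, the function names, and the sample entity lists are assumptions.

```python
from difflib import SequenceMatcher


def normalize(name: str) -> str:
    """Lowercase and collapse whitespace so casing does not affect matching."""
    return " ".join(name.lower().split())


def best_match(entity: str, candidates: list[str], cutoff: float = 0.85):
    """Return the closest candidate at or above the cutoff, else None."""
    target = normalize(entity)
    best, best_score = None, 0.0
    for cand in candidates:
        score = SequenceMatcher(None, target, normalize(cand)).ratio()
        if score > best_score:
            best, best_score = cand, score
    return best if best_score >= cutoff else None


def split_supported(response_entities: list[str], video_entities: list[str],
                    cutoff: float = 0.85):
    """Partition response entities into (supported, unsupported)."""
    supported, unsupported = [], []
    for entity in response_entities:
        if best_match(entity, video_entities, cutoff) is not None:
            supported.append(entity)
        else:
            unsupported.append(entity)
    return supported, unsupported


# "Spektra" imitates an ASR misspelling of "Spectra" in the transcript.
supported, unsupported = split_supported(
    ["Medela", "Spectra", "Ameda"],
    ["medela", "Spektra", "Momcozy"],
)
```

The cutoff matters: "Ameda" and "Medela" are similar enough as strings that a much lower cutoff would wrongly count "Ameda" as supported.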

What's missing between the claim and the video?

A local LLM compares the question, the AI's answer, and what the video covers, then summarizes what doesn't match.

Pipeline Flow


CSV Input

Prompts, AI responses, and YouTube URLs

Video Extraction

Download transcripts & metadata from YouTube

Entity Comparison

Match claims in the AI response against the video

Scoring & Labeling

Relevance, hallucination risk, and citation label

Results

CSV summary, JSON details, SQLite database
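The input and output ends of the flow can be sketched with the standard library alone. The column names (`prompt`, `response`, `youtube_url`), the `citations` table schema, and the function names below are hypothetical; the source names only the three output formats.

```python
import csv
import io
import json
import sqlite3

SUMMARY_FIELDS = ["youtube_url", "relevance", "hallucination", "label"]


def read_rows(csv_text: str) -> list[dict]:
    """Parse input rows; each row carries a prompt, AI response, and URL."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def save_results(conn: sqlite3.Connection, results: list[dict]) -> None:
    """Store one row per citation, with per-row details serialized as JSON."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS citations ("
        "youtube_url TEXT, relevance REAL, hallucination REAL, "
        "label TEXT, details TEXT)"
    )
    conn.executemany(
        "INSERT INTO citations VALUES (?, ?, ?, ?, ?)",
        [
            (r["youtube_url"], r["relevance"], r["hallucination"],
             r["label"], json.dumps(r["details"]))
            for r in results
        ],
    )
    conn.commit()


def summary_csv(results: list[dict]) -> str:
    """Render the CSV summary, leaving the detail payload out."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=SUMMARY_FIELDS,
                            extrasaction="ignore")
    writer.writeheader()
    writer.writerows(results)
    return out.getvalue()
```

A caller would pass `sqlite3.connect("results.db")` (or `":memory:"` for testing) as `conn`.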

Tracing a Single Row

Follow one citation through every pipeline stage, from CSV input to final label.

GPT Example

Partially grounded: the video is on-topic but doesn't support all claims

1. Input: Raw CSV row with prompt, AI response, and YouTube URL

Prompt: "Among brands in the breast pumps space, which ones are best known for suction strength and efficiency?"

2. Extract: Download video transcript and metadata

"The 24 Best Portable Breast Pumps" | 45-min transcript | 23 entities extracted via spaCy NER

3. Clean: Normalize ASR artifacts and strip markdown from the response

"You Tube" → "YouTube", markdown tables removed, 18 cleaned entities

4. Match: Compare entities across video transcript and AI response

4 found in video (Buddha, Medela, Spectra, Genie Advance) | 5 NOT found (Ameda, Hospital-Grade Power, etc.)

5. Score: Calculate relevance and hallucination risk

Relevance: 0.42 (on-topic) | Hallucination: 0.56 (some claims unsupported)

6. Label: Apply citation label based on thresholds

ungrounded_answer → video is relevant but doesn't support the AI's specific claims
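The Clean step in this trace (step 3) can be sketched with a few regular expressions. Only the "You Tube" → "YouTube" fix and the removal of markdown tables come from the trace. The other rules, the `ASR_FIXES` table, and the function names are assumptions about what such a cleaner might do.

```python
import re

# Only this pair appears in the trace; a real table would hold more entries.
ASR_FIXES = {r"\bYou Tube\b": "YouTube"}


def normalize_asr(text: str) -> str:
    """Apply known speech-recognition corrections to a transcript."""
    for pattern, replacement in ASR_FIXES.items():
        text = re.sub(pattern, replacement, text)
    return text


def strip_markdown(text: str) -> str:
    """Drop markdown table rows and remove heading, bullet, and emphasis marks."""
    cleaned = []
    for line in text.splitlines():
        if line.lstrip().startswith("|"):
            continue  # markdown table row or separator
        line = re.sub(r"^\s{0,3}#{1,6}\s+", "", line)  # heading markers
        line = re.sub(r"^\s*[-*+]\s+", "", line)        # bullet markers
        line = re.sub(r"\*\*|\*|`", "", line)           # bold, italic, code marks
        if line.strip():
            cleaned.append(line)
    return "\n".join(cleaned)
```

Cleaning before entity extraction keeps table pipes and emphasis marks from being picked up as part of entity names.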

Gemini Example

Fabricated comparison: Gemini listed 10+ platforms from a single-tool tutorial

1. Input: Raw CSV row with prompt, AI response, and YouTube URL

Prompt: "What are some platforms that offer easy-to-use templates for creating professional-looking marketing materials?"

2. Extract: Download video transcript and metadata

Single design tool walkthrough video | 12 entities extracted

3. Clean: Normalize ASR artifacts and strip markdown from the response

Clean transcript, entity count stable at 12

4. Match: Compare entities across sources

0-2 found in video | 10+ NOT found (Vistaprint, Lucidpress, Piktochart, Snappa, etc.)

5. Score: Calculate relevance and hallucination risk

Relevance: low | Hallucination: 0.9+ | Video covers one tool; Gemini listed 10+ platforms that never appear

6. Label: Apply citation label

ungrounded_answer → Gemini listed 10+ platforms from a single-tool tutorial video
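The Score and Label steps in the two traces can be sketched as follows. The source does not give the scoring formula, so the sketch assumes hallucination risk is the share of response entities not found in the video. That assumption happens to reproduce the GPT trace (5 of 9 entities not found gives 0.56). The `0.5` threshold and the `grounded` label name are hypothetical, and any role relevance plays in labeling is left out because the source does not describe it.

```python
def hallucination_risk(found: int, not_found: int) -> float:
    """Assumed formula: fraction of response entities missing from the video."""
    total = found + not_found
    return not_found / total if total else 0.0


# Hypothetical cutoff; the pipeline's real thresholds are not published here.
UNGROUNDED_THRESHOLD = 0.5


def citation_label(risk: float, threshold: float = UNGROUNDED_THRESHOLD) -> str:
    """Map a risk score to a label; "grounded" is a placeholder name."""
    return "ungrounded_answer" if risk >= threshold else "grounded"


gpt_risk = hallucination_risk(found=4, not_found=5)
print(round(gpt_risk, 2))        # 0.56
print(citation_label(gpt_risk))  # ungrounded_answer
```

Under the same assumption, a Gemini-style row with 1 entity found and 10 not found scores about 0.91, in line with the "0.9+" reported in that trace.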

Why Gemini citations are worse

Gemini generates multi-platform comparisons (listing 10-15 services) when the cited video discusses 1-2 tools. The fabrication ratio is about 10:1. 110 of 112 Gemini rows are ungrounded, with 49 having zero supported entities.

Technical Deep Dive

Production Results

GPT (promptz.csv)

92% ungrounded citations. 100 rows, 48 unique videos.

Gemini

98% ungrounded citations. 112 rows, 0% grounded.

Coming Soon: Computer Vision

Not yet implemented

The current pipeline only analyzes transcripts. Videos contain visual information that transcripts miss: product demos, on-screen text, brand logos, data visualizations, and UI screenshots.

Computer vision would catch entities visible on screen but never spoken. A video might show a product label on screen while the narrator discusses something else. Those visual entities are invisible to the transcript-only pipeline.