Lightweight, AI-first YouTube extraction CLI for transcripts, subtitles, chapters, descriptions, and structured metadata. Ready for RAG pipelines, MCP servers, and LLM coding agents.
It sidesteps YouTube Data API restrictions and feeds structured data straight into RAG databases, MCP servers, and LLM coding assistants.
Zero configuration: no Google Cloud (GCP) project setup, no YouTube OAuth verification, and no API quota limits. Just install and run.
Get transcripts, metadata, channel details, custom timestamps, and chapters in a single clean JSON payload optimized for vector storage.
No Puppeteer, Playwright, or Selenium required. Because it does not drive a headless browser, its RAM and CPU footprint stays low, so it runs comfortably inside serverless functions and containerized jobs.
A natural fit for OpenAI Agents, MCP server implementations, Codex CLI workflows, CrewAI, AutoGen, and LangGraph. Ingest whole channels or tutorials as technical context.
Extraction and processing run entirely on your machine: apart from fetching the video data from YouTube itself, no external APIs or cloud processing pipelines are involved. Full control over privacy, speed, and reliability.
Designed for modern developer workflows with clean commands, structured outputs, automation-ready pipelines, and seamless integration into CI/CD systems.
From simple metadata extraction to automated developer workflows, Vidilearn can be customized to fit your pipeline.
```bash
# Standard complete extraction (prints title, transcript, chapters, etc.)
vidilearn extract "https://youtube.com/watch?v=VIDEO_ID"

# Pretty-print the JSON output
vidilearn extract "https://youtube.com/watch?v=VIDEO_ID" --pretty

# Save the structured JSON directly to a file
vidilearn extract "https://youtube.com/watch?v=VIDEO_ID" > context.json

# Specialized extractions
vidilearn extract "https://youtube.com/watch?v=VIDEO_ID" --transcript
vidilearn extract "https://youtube.com/watch?v=VIDEO_ID" --chapters
vidilearn extract "https://youtube.com/watch?v=VIDEO_ID" --metadata
```
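Always wrap URLs in double quotes (e.g. `"https://youtube.com/..."`) so the shell passes them through untouched. An unquoted `&` truncates the URL and runs the command in the background, and `?` is treated as a glob wildcard (zsh, for example, aborts with "no matches found").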
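When Vidilearn is called from a Python script or an agent tool (CrewAI, AutoGen, LangGraph), the quoting problem can be avoided entirely by passing an argument list instead of a shell string: each list element reaches the program as exactly one argument, and no shell ever interprets `&` or `?`. This is a minimal sketch, assuming the `vidilearn` binary is on `PATH` and that a full extraction prints JSON on stdout (as the `--pretty` row of the flag reference indicates). The helper names `build_argv`, `extract`, and `ONLY_FLAGS` are hypothetical; only the flags come from this README.

```python
import json
import shlex
import subprocess
from typing import Any, Optional

# Hypothetical mapping; the flag names themselves come from the Vidilearn reference table.
ONLY_FLAGS = {"transcript": "--transcript", "chapters": "--chapters", "metadata": "--metadata"}


def build_argv(url: str, only: Optional[str] = None, pretty: bool = False) -> list[str]:
    """Build the argument list for `vidilearn extract`.

    Each element is passed to the program as a single argument, so `&` and `?`
    in the URL never reach a shell and need no quoting.
    """
    argv = ["vidilearn", "extract", url]
    if only is not None:
        argv.append(ONLY_FLAGS[only])  # raises KeyError for an unknown mode
    if pretty:
        argv.append("--pretty")
    return argv


def extract(url: str) -> Any:
    """Run a full extraction and parse stdout as JSON.

    Assumes `vidilearn` is on PATH and prints JSON by default.
    """
    result = subprocess.run(build_argv(url), capture_output=True, text=True, check=True)
    return json.loads(result.stdout)


if __name__ == "__main__":
    url = "https://youtube.com/watch?v=VIDEO_ID&t=42s"
    # Prints the equivalent, safely quoted shell command:
    # vidilearn extract 'https://youtube.com/watch?v=VIDEO_ID&t=42s' --chapters
    print(shlex.join(build_argv(url, only="chapters")))
```

`shlex.join` is only used for display here. It is handy for logging the exact command an agent ran, while `subprocess.run` itself never goes through a shell.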
```json
{
  "title": "Build AI Agents",
  "channel": "AI Academy",
  "duration": "12:45",
  "description": "Learn how to build and orchestrate multi-agent architectures...",
  "transcript": "In this video, we will design and code multi-agent LLM teams...",
  "chapters": [
    {
      "title": "Introduction",
      "timestamp": "00:00"
    },
    {
      "title": "Agent Architecture Design",
      "timestamp": "03:42"
    },
    {
      "title": "Code Implementation",
      "timestamp": "07:15"
    }
  ]
}
```
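Downstream code usually needs chapter boundaries in seconds rather than as display strings. The following sketch types the payload with `TypedDict` using the field names from the sample output above. It converts the `MM:SS` timestamps to seconds and derives each chapter's end from the next chapter's start, with the last chapter ending at `duration`. It assumes that every video returns these fields and that videos longer than an hour use `H:MM:SS`. `Extraction`, `to_seconds`, and `chapter_spans` are hypothetical names, not part of Vidilearn.

```python
from typing import TypedDict


class Chapter(TypedDict):
    title: str
    timestamp: str  # "MM:SS" in the sample; "H:MM:SS" assumed for long videos


class Extraction(TypedDict):
    # Field names copied from the sample output; not a published schema.
    title: str
    channel: str
    duration: str
    description: str
    transcript: str
    chapters: list[Chapter]


def to_seconds(stamp: str) -> int:
    """Convert "MM:SS" or "H:MM:SS" into whole seconds."""
    seconds = 0
    for part in stamp.split(":"):
        seconds = seconds * 60 + int(part)
    return seconds


def chapter_spans(data: Extraction) -> list[tuple[str, int, int]]:
    """Return (title, start_s, end_s) for each chapter, in order."""
    chapters = data.get("chapters", [])
    starts = [to_seconds(c["timestamp"]) for c in chapters]
    # Each chapter ends where the next begins; the last ends at the video's duration.
    ends = starts[1:] + [to_seconds(data["duration"])]
    return [(c["title"], s, e) for c, s, e in zip(chapters, starts, ends)]


if __name__ == "__main__":
    sample: Extraction = {
        "title": "Build AI Agents",
        "channel": "AI Academy",
        "duration": "12:45",
        "description": "",
        "transcript": "",
        "chapters": [
            {"title": "Introduction", "timestamp": "00:00"},
            {"title": "Agent Architecture Design", "timestamp": "03:42"},
            {"title": "Code Implementation", "timestamp": "07:15"},
        ],
    }
    for title, start, end in chapter_spans(sample):
        print(f"{start:>4}-{end:<4} {title}")
```

For the sample payload, the three chapters span 0 to 222, 222 to 435, and 435 to 765 seconds.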
```bash
# Feed structured data straight into a local vector database loading script
vidilearn extract "https://youtube.com/watch?v=VIDEO_ID" | node ingest-vector-store.js
```
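The same pipe also works with a Python loader. This minimal sketch reads the payload and splits the transcript into overlapping fixed-size word windows. That is a common, simple chunking strategy, and the overlap means text straddling a boundary can be retrieved from either neighbouring chunk. The script then emits one JSON record per chunk, with title, channel, and duration attached as metadata. Embedding and the actual vector-store insert are left out because they depend on your database. The file name `ingest.py`, the record layout, and the 200/40-word defaults are illustrative assumptions, not Vidilearn conventions.

```python
import json
import sys
from typing import Any, Iterator


def chunk_words(text: str, size: int = 200, overlap: int = 40) -> Iterator[str]:
    """Yield windows of `size` words; consecutive windows share `overlap` words."""
    if not 0 <= overlap < size:
        raise ValueError("need 0 <= overlap < size")
    words = text.split()
    if not words:
        return
    step = size - overlap
    # Start a new window only while it adds words the previous window did not cover.
    for start in range(0, max(len(words) - overlap, 1), step):
        yield " ".join(words[start:start + size])


def to_records(data: dict[str, Any], size: int = 200, overlap: int = 40) -> list[dict[str, Any]]:
    """Turn one extraction payload into chunk records (embeddings not included)."""
    meta = {key: data.get(key) for key in ("title", "channel", "duration")}
    title = data.get("title") or "video"
    return [
        {"id": f"{title}#{i}", "text": chunk, "metadata": meta}
        for i, chunk in enumerate(chunk_words(data.get("transcript", ""), size, overlap))
    ]


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: vidilearn extract "<url>" | python3 ingest.py - > chunks.jsonl
    #    or: python3 ingest.py context.json > chunks.jsonl
    source = sys.stdin if sys.argv[1] == "-" else open(sys.argv[1], encoding="utf-8")
    with source:
        for record in to_records(json.load(source)):
            print(json.dumps(record))
```

Word windows are the simplest option. If the payload's chapters are reliable, chunking per chapter (so each chunk carries a chapter title) is a natural refinement.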
Use it directly in an MCP (Model Context Protocol) server configuration:

```json
{
  "mcpServers": {
    "vidilearn-server": {
      "command": "npx",
      "args": ["-y", "vidilearn", "extract", "{{url}}"]
    }
  }
}
```
```bash
# Extract technical documentation from a video tutorial
vidilearn extract "https://youtube.com/watch?v=VIDEO_ID" > /tmp/context-learning.json

# Inject the output as context into Codex or Gemini commands
codex run "implement a backend auth system using the rules in context-learning.json" \
  --context=/tmp/context-learning.json
```
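Raw JSON spends part of the prompt on keys, quotes, and escape sequences. One optional pre-processing step is to flatten the payload into compact Markdown (title, channel, description, chapter list, transcript) and hand that file to the coding agent instead. This is sketched here as an assumption, not a Vidilearn feature. `to_markdown` and the section layout are hypothetical, and fields missing from the payload are simply skipped.

```python
import json
import sys
from typing import Any


def to_markdown(data: dict[str, Any]) -> str:
    """Render an extraction payload as compact Markdown for prompt context."""
    lines = [f"# {data.get('title') or 'Untitled'}"]
    if data.get("channel"):
        lines.append(f"Channel: {data['channel']}")
    if data.get("description"):
        lines += ["", "## Description", data["description"]]
    chapters = data.get("chapters") or []
    if chapters:
        lines += ["", "## Chapters"]
        lines += [f"- {c['timestamp']} {c['title']}" for c in chapters]
    if data.get("transcript"):
        lines += ["", "## Transcript", data["transcript"]]
    return "\n".join(lines) + "\n"


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python3 to_context.py /tmp/context-learning.json > /tmp/context-learning.md
    with open(sys.argv[1], encoding="utf-8") as f:
        sys.stdout.write(to_markdown(json.load(f)))
```

The resulting `/tmp/context-learning.md` can then be referenced in the prompt in place of the JSON file.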
Vidilearn is designed to keep extraction time and disk usage low. See how we rank against traditional web scrapers and API integrations.

[Benchmark charts: extraction time (seconds per extraction) and disk usage; lower is better in both.]
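To check the seconds-per-extraction figure on your own hardware and network, a small timing harness is enough. This illustrative sketch is not the methodology behind the published numbers. It runs a command several times and reports the median wall-clock time, which a single slow network round trip skews less than it would skew the mean. It assumes `vidilearn` is on `PATH`; `seconds_per_run` and `bench.py` are hypothetical names.

```python
import statistics
import subprocess
import sys
import time
from typing import Sequence


def seconds_per_run(argv: Sequence[str], runs: int = 5) -> float:
    """Median wall-clock seconds for running `argv` to completion `runs` times."""
    if runs < 1:
        raise ValueError("runs must be >= 1")
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        # Discard output; only completion time matters. Fail loudly on a non-zero exit.
        subprocess.run(argv, stdout=subprocess.DEVNULL, check=True)
        timings.append(time.perf_counter() - start)
    return statistics.median(timings)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python3 bench.py "https://youtube.com/watch?v=VIDEO_ID"
    median = seconds_per_run(["vidilearn", "extract", sys.argv[1]])
    print(f"median: {median:.2f} s per extraction")
```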
Quick reference for all available flags, arguments, and roadmap features.
| Command / Flag | Description | Default behavior |
|---|---|---|
| `vidilearn extract <url>` | Core parsing command for a YouTube video. | Extracts all available data |
| `--pretty` | Formats the JSON output with indentation for human readability. | Minified JSON string |
| `--transcript` | Extracts only the transcript text. | Extracts the full dataset |
| `--chapters` | Returns only the video chapters and their timestamps. | Extracts the full dataset |
| `--metadata` | Returns only the core metadata: title, description, channel, and duration. | Extracts the full dataset |
| `--help` | Prints usage, examples, and flag information. | - |
Vidilearn is free, open source, and maintained by Alfo Tech Industries. If this tool saves your team GCP API quotas or speeds up your AI pipelines, please consider sponsoring development.
Sponsor on GitHub