yt-innertube
Fast, cookieless YouTube audio and stream extraction. It resolves a direct,
signed stream URL in about 200ms (versus 46 to 82s for yt-dlp --get-url)
by talking to YouTube's innertube player endpoint as the ANDROID_VR client.
No cookies, no paid API, no signature cipher to decode. Runs on Deno and Node.
Built at SocialGravity to clone voices from a person's public talks. We pivoted and no longer needed the YouTube path, so the tech is open here under MIT. It ran in production; it is not a toy.

Why
The usual way to get a YouTube audio stream URL is yt-dlp --get-url, which
probes several player clients over a spawn, parse and decipher chain. When one
client stalls on YouTube throttling the whole subprocess blocks, and end to end
it routinely takes 46 to 82 seconds per video.
The ANDROID_VR (Oculus) client is special: YouTube serves it direct, signed
stream URLs with no signature cipher to decode. So a single HTTP POST to
/youtubei/v1/player returns a URL ffmpeg can read immediately. That is the core
trick here. yt-dlp is still wired in as a fallback so the path stays reliable if
YouTube changes the contract. Separately, using stream copy (-c:a copy) instead
of re-encoding cuts clip extraction from roughly 97s to about 5s.
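To make the stream-copy point concrete, here is a minimal sketch of the ffmpeg arguments such a step might use. `buildStreamCopyArgs` is a hypothetical helper, not part of this library's API, and the exact flags the library passes are not stated in this README; `-ss`, `-t`, `-vn`, `-c:a copy` and `-y` are standard ffmpeg options.

```typescript
// Hypothetical helper (not the library's API): builds ffmpeg arguments that
// copy an audio window from a stream URL without re-encoding.
function buildStreamCopyArgs(
  streamUrl: string,
  startSec: number,
  durationSec: number,
  outPath: string,
): string[] {
  return [
    "-hide_banner",
    "-ss", String(startSec),    // placed before -i: seek on the input side
    "-t", String(durationSec),  // placed before -i: limit how much input is read
    "-i", streamUrl,
    "-vn",                      // ignore any video stream
    "-c:a", "copy",             // copy audio packets as-is, no re-encode
    "-y",                       // overwrite outPath if it already exists
    outPath,
  ];
}

// Example: a 30 second window starting at 60s, written to a Matroska audio file.
const exampleArgs = buildStreamCopyArgs("https://example.invalid/stream", 60, 30, "clip.mka");
console.log(["ffmpeg", ...exampleArgs].join(" "));
```

Because the audio packets are copied rather than decoded and re-encoded, the output container has to accept the source codec, which is presumably why the CLI demo writes a .mka file.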
Install
Deno (import directly, no install):
import { downloadYouTubeAudioReliable, getStreamUrlViaInnertube }
from "https://raw.githubusercontent.com/AlvaroBalbin/yt-innertube/main/mod.ts";
Node / Bun:
npm install yt-innertube
import { downloadYouTubeAudioReliable, getStreamUrlViaInnertube } from "yt-innertube";
Requirements
- ffmpeg on PATH (for clip extraction)
- yt-dlp on PATH (used only as a fallback)
- A runtime: Deno 2.x, or Node 18+ / Bun
Quick start
Get the direct audio stream URL with one HTTP call, no subprocess:
const url = await getStreamUrlViaInnertube("dQw4w9WgXcQ", "audio");
Download a time window to a temp file (innertube fast path, yt-dlp fallback, ffmpeg stream-copy, MP3 re-encode fallback):
const res = await downloadYouTubeAudioReliable("https://youtu.be/dQw4w9WgXcQ", 60, 90);
if (res.path) console.log("audio at", res.path);
else console.log("failed:", res.errorKind); // bot_detection | geo_block | unavailable | ...
CLI demo (Deno):
deno run --allow-net --allow-run --allow-read --allow-write --allow-env \
examples/cli.ts "https://www.youtube.com/watch?v=dQw4w9WgXcQ" --start 60 --dur 30 --out clip.mka
# or just print the resolved stream URL, no download:
deno run --allow-net examples/cli.ts dQw4w9WgXcQ --url-only
Environment variables
All optional. The innertube fast path needs none of them.
- YT_PROXY (or WEBSHARE_PROXY_URL): an HTTP proxy URL, e.g. http://user:pass@host:port. Passed to yt-dlp as --proxy. A rotating residential proxy is the most reliable way past the "Sign in to confirm you're not a bot" wall when running from a datacenter IP.
- YT_DISABLE_PROXY=true: force-disable the proxy even when the URL is set.
- YOUTUBE_COOKIES_TXT: Netscape-format cookies from a logged-in browser session, used as a yt-dlp fallback for bot detection. Optional, and a treadmill (cookies expire in a few weeks); prefer a proxy.
- YTDLP_BIN / YTDLP_PREARGS: override how yt-dlp is invoked. Defaults to yt-dlp on Windows and python3 -m yt_dlp elsewhere.
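The variables above could be resolved along these lines. This is a sketch under stated assumptions, not the library's code: `resolveYtDlpConfig` and `proxyArgs` are hypothetical names, the precedence of YT_PROXY over WEBSHARE_PROXY_URL is assumed, YTDLP_PREARGS is assumed to be whitespace-separated, and overriding YTDLP_BIN alone is assumed to drop the default pre-args.

```typescript
type Env = Record<string, string | undefined>;

interface YtDlpConfig {
  command: string[];     // executable followed by any pre-args
  proxy: string | null;  // proxy URL, or null when unset or disabled
}

// Hypothetical resolver for the optional environment variables.
function resolveYtDlpConfig(env: Env, isWindows: boolean): YtDlpConfig {
  // Assumption: YT_PROXY wins when both proxy variables are set.
  const proxyUrl = env["YT_PROXY"] || env["WEBSHARE_PROXY_URL"] || null;
  // YT_DISABLE_PROXY=true switches the proxy off even if a URL is present.
  const proxy = env["YT_DISABLE_PROXY"] === "true" ? null : proxyUrl;

  // Documented defaults: yt-dlp on Windows, python3 -m yt_dlp elsewhere.
  const defaultBin = isWindows ? "yt-dlp" : "python3";
  const defaultPreArgs: string[] = isWindows ? [] : ["-m", "yt_dlp"];

  const bin = env["YTDLP_BIN"] || defaultBin;
  const rawPreArgs = env["YTDLP_PREARGS"];
  let preArgs: string[];
  if (rawPreArgs !== undefined) {
    // Assumption: pre-args are separated by whitespace.
    preArgs = rawPreArgs.split(/\s+/).filter((s) => s.length > 0);
  } else {
    // Assumption: a custom binary does not inherit the default pre-args.
    preArgs = env["YTDLP_BIN"] ? [] : defaultPreArgs;
  }
  return { command: [bin, ...preArgs], proxy };
}

// yt-dlp takes the proxy through its --proxy option.
function proxyArgs(config: YtDlpConfig): string[] {
  return config.proxy === null ? [] : ["--proxy", config.proxy];
}
```

On Node the inputs would be `process.env` and `process.platform === "win32"`; on Deno, `Deno.env.toObject()` and `Deno.build.os === "windows"`.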
API
- getStreamUrlViaInnertube(videoId, "audio" | "video") resolves a direct signed URL or null. One HTTP call, never throws.
- downloadYouTubeAudioReliable(url, startSec?, endSec?, preStreamUrl?, opts?) returns { path, videoId, errorKind?, keyframesDir? }.
- downloadMultipleClips(clips, opts?) downloads many windows in parallel.
- downloadPodcastAudio(mp3Url, startSec?, endSec?) windows a direct podcast MP3.
- extractKeyframeAtTimestamp(idOrUrl, timestampSec, outPath, signal?) pulls a single JPEG keyframe at an absolute timestamp via fast-seek.
- extractVideoId(urlOrId) returns the 11-char id, or null.
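As an illustration of the extractVideoId contract (11-char id or null), here is one way such a parser could look. `extractVideoIdSketch` is a hypothetical stand-in, not the library's implementation, and the set of URL shapes it accepts (youtu.be, /watch, /embed, /shorts, /live) is an assumption.

```typescript
// YouTube video ids are 11 characters drawn from letters, digits, "_" and "-".
const ID_PATTERN = /^[A-Za-z0-9_-]{11}$/;

// Hypothetical sketch: accepts a bare id or a few common URL shapes.
function extractVideoIdSketch(urlOrId: string): string | null {
  const input = urlOrId.trim();
  if (ID_PATTERN.test(input)) return input; // already a bare id

  let url: URL;
  try {
    url = new URL(input);
  } catch {
    return null; // neither an id nor a parseable URL
  }

  // Normalise common subdomains so one host comparison is enough.
  const host = url.hostname.replace(/^(www\.|m\.|music\.)/, "");
  let candidate: string | null = null;
  if (host === "youtu.be") {
    candidate = url.pathname.split("/")[1] ?? null;
  } else if (host === "youtube.com") {
    if (url.pathname === "/watch") {
      candidate = url.searchParams.get("v");
    } else {
      const match = url.pathname.match(/^\/(?:embed|shorts|live)\/([^/?#]+)/);
      candidate = match?.[1] ?? null;
    }
  }
  // Validate the candidate so malformed ids are rejected.
  return candidate !== null && ID_PATTERN.test(candidate) ? candidate : null;
}
```

Returning null instead of throwing mirrors the "or null" behaviour listed above and lets callers branch without a try/catch.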
How it works
- getStreamUrlViaInnertube POSTs the ANDROID_VR client context to /youtubei/v1/player and reads streamingData.adaptiveFormats, picking the best itag with a plain url (no signatureCipher).
- If innertube returns null it falls back to yt-dlp --get-url, first anonymous (the anonymous manifest tends to expose clean audio-only formats), then with cookies only if the anonymous attempt hit bot detection.
- ffmpeg reads the signed URL directly and copies the requested window without re-encoding. Stream URLs are cached per video id for 20s.
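The first step above can be sketched roughly as follows. This is not the code in src/innertube.ts: `buildPlayerRequest` and `pickBestFormat` are hypothetical names, the www.youtube.com host is an assumption, the client version is left as a caller-supplied placeholder, a real request likely needs more client fields and headers than shown, and treating the highest bitrate as "best" is also an assumption.

```typescript
// Minimal shape of the fields this sketch reads; real responses carry more.
interface AdaptiveFormat {
  itag: number;
  mimeType: string;          // e.g. 'audio/webm; codecs="opus"'
  bitrate?: number;
  url?: string;              // present when the URL is directly usable
  signatureCipher?: string;  // present when the URL would need deciphering
}

// Builds a fetch-style request for the player endpoint (host is an assumption).
function buildPlayerRequest(videoId: string, clientVersion: string) {
  return {
    url: "https://www.youtube.com/youtubei/v1/player",
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({
        videoId,
        context: { client: { clientName: "ANDROID_VR", clientVersion } },
      }),
    },
  };
}

// Keeps formats of the requested kind that carry a plain url and no
// signatureCipher, then returns the one with the highest bitrate.
function pickBestFormat(
  formats: AdaptiveFormat[],
  kind: "audio" | "video",
): AdaptiveFormat | null {
  const usable = formats.filter(
    (f) =>
      f.mimeType.startsWith(`${kind}/`) &&
      typeof f.url === "string" &&
      !f.signatureCipher,
  );
  if (usable.length === 0) return null;
  return usable.reduce((best, f) =>
    (f.bitrate ?? 0) > (best.bitrate ?? 0) ? f : best
  );
}
```

A caller would pass `url` and `init` to `fetch`, parse the JSON, and hand `streamingData.adaptiveFormats` (or an empty array when it is missing) to `pickBestFormat`; a null result is the signal to fall back to yt-dlp.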
Notes
- Stream URLs are short-lived (CDN signed, ~30 to 60s). Resolve, then use promptly.
- The ANDROID_VR client version in src/innertube.ts occasionally needs a bump when YouTube rotates. It is backward compatible, so updating is safe.
- This talks to YouTube directly. Respect YouTube's Terms of Service and the rights of content owners; you are responsible for how you use it.
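Because stream URLs are short-lived and, as described under How it works, cached per video id for 20s, a small expiring cache is one way to picture that behaviour. `TtlCache` is a hypothetical class written for this illustration, not the library's cache; the injectable clock exists only to make the expiry easy to test.

```typescript
// Hypothetical expiring cache keyed by video id.
class TtlCache<V> {
  private readonly entries = new Map<string, { value: V; expiresAt: number }>();
  private readonly ttlMs: number;
  private readonly now: () => number;

  constructor(ttlMs: number, now: () => number = () => Date.now()) {
    this.ttlMs = ttlMs;
    this.now = now;
  }

  // Returns the cached value, or undefined once the entry has expired.
  get(key: string): V | undefined {
    const hit = this.entries.get(key);
    if (hit === undefined) return undefined;
    if (this.now() >= hit.expiresAt) {
      this.entries.delete(key); // drop stale entries lazily on read
      return undefined;
    }
    return hit.value;
  }

  set(key: string, value: V): void {
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
  }
}

// 20s matches the cache window mentioned in this README.
const streamUrlCache = new TtlCache<string>(20_000);
streamUrlCache.set("dQw4w9WgXcQ", "https://example.invalid/signed-url");
```

Keeping the cache window shorter than the URL lifetime means a cached URL should still be valid when ffmpeg opens it.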
License
MIT. See LICENSE.