Podcast AI Pipeline

PodSight

Auto-transcribe, summarize, and publish Taiwan's top finance podcasts — zero manual effort

Whisper STT · Gemini AI · Fully Automated

The Problem

Too Much Content

Multiple daily podcasts covering Taiwan stocks — impossible to listen to everything

FOMO on Insights

Key market analysis and stock picks buried in hours of audio content

Manual Work

No efficient way to scan key takeaways without listening to full episodes


The Solution

PodSight automatically processes new episodes end-to-end, delivering readable summaries to your browser and Telegram

3 Podcasts
2x Daily Runs
0 Manual Steps

Podcasts We Cover

股癌 (Gooaye) · Wed & Sat

Taiwan's most popular stock market podcast. Deep analysis, market commentary.

游庭皓的財經皓角 (Yu Ting Hao) · Daily, ~9 AM

Morning market briefing with macro analysis and economic outlook.

兆華與股惑仔 (Zhaohua) · Daily, afternoon

Afternoon market recap with sector rotation and stock picks.


How It Works

6-step automated pipeline from RSS feed to your Telegram

RSS → Download → Transcribe → Summarize → Publish → Telegram

Step 1: Detect & Download

1a

Parse RSS Feed

Fetches each podcast's RSS feed and extracts episode metadata (title, date, audio URL)

1b

Detect New Episodes

Compares RSS episodes against existing summaries — only processes what's new

1c

Download Audio

Downloads MP3 files for new episodes (audio is gitignored, re-downloaded as needed)
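
Steps 1a and 1b can be sketched with the standard library alone. This is a minimal illustration, not PodSight's actual code: the sample feed, the function names, and the "first token of the title is the episode ID" rule are all assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical two-episode feed, just enough structure for the example.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Example Podcast</title>
    <item>
      <title>EP0002 Second episode</title>
      <pubDate>Sat, 07 Mar 2026 00:00:00 GMT</pubDate>
      <enclosure url="https://example.com/ep0002.mp3" type="audio/mpeg" length="0"/>
    </item>
    <item>
      <title>EP0001 First episode</title>
      <pubDate>Wed, 04 Mar 2026 00:00:00 GMT</pubDate>
      <enclosure url="https://example.com/ep0001.mp3" type="audio/mpeg" length="0"/>
    </item>
  </channel>
</rss>"""


def parse_feed(xml_text):
    """Extract title, date, and audio URL from every <item> in an RSS feed."""
    root = ET.fromstring(xml_text)
    episodes = []
    for item in root.iter("item"):
        enclosure = item.find("enclosure")
        episodes.append({
            "title": item.findtext("title", default="").strip(),
            "date": item.findtext("pubDate", default="").strip(),
            "audio_url": enclosure.get("url") if enclosure is not None else None,
        })
    return episodes


def episode_id(title):
    """Assumed ID rule for this sketch: the first whitespace-separated token."""
    parts = title.split()
    return parts[0] if parts else ""


def find_new(episodes, summarized_ids):
    """Keep only episodes whose ID has no existing summary."""
    return [ep for ep in episodes if episode_id(ep["title"]) not in summarized_ids]


episodes = parse_feed(SAMPLE_FEED)
print(find_new(episodes, summarized_ids={"EP0001"}))
```

In a real run, `summarized_ids` would come from the summary files tracked in Git, so the comparison still works in a fresh CI checkout where no audio exists.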


Step 2: Transcribe

Powered by Groq's Whisper API — fast, accurate speech-to-text for Mandarin audio

Groq for Speed

Whisper large-v3 on Groq hardware — transcribes 1-hour episodes in seconds

Full Text Output

Complete transcript saved as .txt — searchable, archivable, version-controlled in Git
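
A transcription step along these lines could look like the sketch below. It assumes the `groq` Python SDK and its `audio.transcriptions.create` call. The `transcribe` helper, the directory layout, and the `language="zh"` hint are illustrative choices, not details confirmed by the slides.

```python
from pathlib import Path


def transcribe(client, audio_path, out_dir, model="whisper-large-v3", language="zh"):
    """Send one audio file to a Whisper endpoint and save the text as <stem>.txt.

    `client` is expected to behave like groq.Groq(): it must expose
    client.audio.transcriptions.create(...) returning an object with `.text`.
    """
    audio_path = Path(audio_path)
    with audio_path.open("rb") as f:
        result = client.audio.transcriptions.create(
            file=(audio_path.name, f.read()),
            model=model,
            language=language,
        )
    out_path = Path(out_dir) / (audio_path.stem + ".txt")
    out_path.parent.mkdir(parents=True, exist_ok=True)
    out_path.write_text(result.text, encoding="utf-8")
    return out_path


# Usage with the real SDK (requires `pip install groq` and GROQ_API_KEY set):
#   from groq import Groq
#   transcribe(Groq(), "audio/EP0001.mp3", "transcripts")
```

Passing the client in as a parameter keeps the function testable without network access or an API key.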


Step 3: AI Summary

Gemini processes the full transcript and generates a structured, readable summary

Summary Output Includes

Key topics and market themes discussed
Stock mentions and analysis highlights
Macro outlook and sector commentary
Actionable takeaways and key quotes
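
One way to get the four outputs above from a single model call is to list them explicitly in the prompt. The sketch below is an assumption about how such a prompt might be assembled. `generate` stands in for whatever function wraps the Gemini call, and the prompt wording is invented for illustration.

```python
# Section names taken from the slide; the prompt text itself is hypothetical.
SECTIONS = [
    "Key topics and market themes discussed",
    "Stock mentions and analysis highlights",
    "Macro outlook and sector commentary",
    "Actionable takeaways and key quotes",
]


def build_prompt(podcast, episode, transcript):
    """Assemble a single prompt containing the instructions and the full transcript."""
    bullet_list = "\n".join(f"- {section}" for section in SECTIONS)
    return (
        f"Summarize this episode of {podcast} ({episode}).\n"
        f"Cover each of these sections under its own heading:\n{bullet_list}\n\n"
        f"Transcript:\n{transcript}"
    )


def summarize(generate, podcast, episode, transcript):
    """`generate` is any callable mapping a prompt string to the model's text reply."""
    return generate(build_prompt(podcast, episode, transcript)).strip()
```

Because the whole transcript goes into one prompt, this approach relies on the long-context behaviour the deck attributes to Gemini.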

Steps 4–5: Publish & Notify

Static Site on Vercel

Generates a clean, searchable website with all summaries organized by podcast and episode. Auto-deploys on git push.

podsight.tw

Telegram Notifications

New episodes are pushed to the @podsight Telegram channel with a preview message and direct link to the full summary.

@podsight
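
A notification like this can be sent through the Telegram Bot API's `sendMessage` method. The sketch below uses only the standard library. The message layout, the 300-character preview limit, and the function names are assumptions, not the project's actual format.

```python
import json
import urllib.parse
import urllib.request


def build_message(podcast, title, url, preview, limit=300):
    """Compose a short preview message followed by a link to the full summary."""
    snippet = preview if len(preview) <= limit else preview[:limit].rstrip() + "..."
    return f"{podcast}\n{title}\n\n{snippet}\n\n{url}"


def send_message(token, chat_id, text):
    """POST to the Bot API; chat_id may be a channel handle such as "@podsight"."""
    api = f"https://api.telegram.org/bot{token}/sendMessage"
    data = urllib.parse.urlencode({"chat_id": chat_id, "text": text}).encode()
    with urllib.request.urlopen(api, data=data, timeout=30) as resp:
        return json.load(resp)


# Usage (needs a real bot token with posting rights in the channel):
#   text = build_message("Gooaye", "EP0001", "https://example.com/ep0001", "Summary...")
#   send_message(BOT_TOKEN, "@podsight", text)
```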

Fully Automated

GitHub Actions runs the entire pipeline on a schedule — no human intervention needed

# .github/workflows/auto-pipeline.yml
on:
  schedule:
    - cron: "0 2 * * *"   # 02:00 UTC = 10 AM Taiwan
    - cron: "0 11 * * *"  # 11:00 UTC = 7 PM Taiwan

# Pipeline → Git Push → Wait for Vercel → Telegram
Runs 2x daily · Auto-detects new episodes · Duplicate prevention
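
GitHub Actions evaluates cron schedules in UTC, so the Taiwan times in the comments come from adding eight hours. A quick check, with a helper name made up for this example:

```python
from datetime import datetime, timedelta, timezone

# Taiwan is UTC+8 all year (no daylight saving time).
TAIWAN = timezone(timedelta(hours=8))


def cron_to_taiwan(minute, hour_utc):
    """Convert the minute/hour fields of a UTC cron entry to Taiwan local time."""
    utc_time = datetime(2026, 1, 1, hour_utc, minute, tzinfo=timezone.utc)
    return utc_time.astimezone(TAIWAN).strftime("%H:%M")


for expr in ("0 2 * * *", "0 11 * * *"):
    minute, hour = expr.split()[:2]
    print(expr, "->", cron_to_taiwan(int(minute), int(hour)), "Taiwan")
# 0 2 * * * -> 10:00 Taiwan
# 0 11 * * * -> 19:00 Taiwan
```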

Tech Stack

Groq + Whisper

Speech-to-text with Whisper large-v3 on Groq's fast inference

Google Gemini

Long-context AI summarization — handles full transcripts in one pass

Python Pipeline

Modular scripts for each step — easy to debug and extend

Vercel

Static site hosting — auto-deploys on every git push

GitHub Actions

CI/CD scheduling — runs pipeline 2x daily automatically

Telegram Bot

Push notifications with formatted messages and direct links


Frontend Architecture

A 3,300-line Python script generates 136+ static HTML pages — zero frameworks, zero build step

No React. No npm. Just Python.

Single script outputs self-contained HTML with embedded CSS, vanilla JS, and Lucide icons via CDN. No compilation needed.

Per-Podcast Color Themes

3 distinct color palettes injected via CSS custom properties. Each podcast gets its own visual identity automatically.
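
Injecting a palette through CSS custom properties might look roughly like this. The podcast slugs match the ones used elsewhere in the deck, but the color values, property names, and page template are placeholders invented for the sketch.

```python
import html

# Hypothetical palettes; the real color values are not given in the slides.
PALETTES = {
    "gooaye": {"accent": "#e11d48", "accent-soft": "#ffe4e6"},
    "yutinghao": {"accent": "#2563eb", "accent-soft": "#dbeafe"},
    "zhaohua": {"accent": "#059669", "accent-soft": "#d1fae5"},
}


def theme_css(podcast):
    """Render one podcast's palette as a :root block of CSS custom properties."""
    props = "".join(
        f"  --{name}: {value};\n" for name, value in PALETTES[podcast].items()
    )
    return f":root {{\n{props}}}\n"


def render_page(podcast, title, body_html):
    """Build a self-contained page; shared rules refer to var(--accent) only."""
    safe_title = html.escape(title)
    return (
        "<!DOCTYPE html>\n"
        f"<html><head><meta charset='utf-8'><title>{safe_title}</title>\n"
        f"<style>\n{theme_css(podcast)}h1 {{ color: var(--accent); }}\n</style></head>\n"
        f"<body><h1>{safe_title}</h1>{body_html}</body></html>\n"
    )


print(theme_css("gooaye"))
```

The shared stylesheet never names a concrete color, so adding a fourth podcast would only require a new palette entry.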

8-Section Content Parser

Handles TLDR, topics, stocks, humor, quotes, risks, and more — across 3 different AI summary output formats.

Cross-Podcast Stock Index

Click any stock tag to find all episodes mentioning it. Client-side regex filtering across 3 podcasts — no backend needed.
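
The site does this filtering in the browser. The same matching idea is shown here in Python for consistency with the rest of the pipeline. The episode records, field names, and boundary rule are assumptions for illustration.

```python
import re
from collections import defaultdict


def build_stock_index(episodes, tickers):
    """Map each ticker to the (podcast, episode id) pairs whose summary mentions it."""
    index = defaultdict(list)
    for ticker in tickers:
        # Reject matches embedded in a longer ASCII token, e.g. "2330" inside "23301".
        pattern = re.compile(
            rf"(?<![0-9A-Za-z]){re.escape(ticker)}(?![0-9A-Za-z])"
        )
        for ep in episodes:
            if pattern.search(ep["summary"]):
                index[ticker].append((ep["podcast"], ep["id"]))
    return dict(index)


# Hypothetical sample data.
episodes = [
    {"podcast": "gooaye", "id": "EP0001", "summary": "TSMC (2330) and NVDA rallied."},
    {"podcast": "zhaohua", "id": "EP0002", "summary": "Order 23301 shipped; NVDA flat."},
]
print(build_stock_index(episodes, ["2330", "NVDA"]))
```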


Smart Design Choices

RSS vs Summaries detection

Compares feed against existing summaries, not audio — works across fresh CI environments

Delayed Telegram push

Waits 3 min after git push for Vercel to deploy, then verifies URL before sending
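
A wait-then-verify step could be sketched as follows. The 3-minute initial wait comes from the slide. The retry count, retry interval, and function names are assumptions added for the example.

```python
import time
import urllib.request


def url_is_live(url, timeout=10):
    """Return True only if the URL answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # Covers URLError/HTTPError (e.g. a 404 before deploy finishes) and timeouts.
        return False


def wait_until_live(url, initial_wait=180, retries=5, interval=30,
                    check=url_is_live, sleep=time.sleep):
    """Wait for the deploy, then poll the summary URL before notifying."""
    sleep(initial_wait)
    for attempt in range(retries):
        if check(url):
            return True
        if attempt < retries - 1:
            sleep(interval)
    return False


# Usage: only send the Telegram message when the page actually resolves.
#   if wait_until_live("https://example.com/summary/EP0001"):
#       notify()
```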

Duplicate prevention

.telegram_published tracking file committed to Git — prevents re-sending old episodes
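
The tracking-file idea can be illustrated in a few lines. The slides do not specify the format of `.telegram_published`, so the one-ID-per-line layout and the function names below are assumptions.

```python
from pathlib import Path


def load_published(path):
    """Read already-notified episode IDs (assumed format: one ID per line)."""
    p = Path(path)
    if not p.exists():
        return set()
    lines = p.read_text(encoding="utf-8").splitlines()
    return {line.strip() for line in lines if line.strip()}


def publish_new(episode_ids, path, send):
    """Call `send` once per unseen ID and persist the ID right after each send."""
    published = load_published(path)
    sent = []
    for ep_id in episode_ids:
        if ep_id in published:
            continue
        send(ep_id)
        published.add(ep_id)
        sent.append(ep_id)
        # Rewrite after every send so a crash mid-run cannot cause re-sends.
        Path(path).write_text("\n".join(sorted(published)) + "\n", encoding="utf-8")
    return sent
```

Committing the file back to Git after a run is what lets the next CI job, which starts from a clean checkout, see what was already sent.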

Multi-format episode IDs

Handles EP#### (gooaye/zhaohua) and date-based (yutinghao) formats gracefully


CI/CD War Stories

Real production bugs from the first week of automation

"0 episodes processed" in CI

Cause: Detection counted audio files, which are gitignored, so CI always saw 0 new files to process.
Fix: Compare the RSS feed against existing summaries (git-tracked).

Users click Telegram link → 404

Cause: The Telegram push fired before the git push triggered the Vercel deployment.
Fix: Two-phase workflow: git push, wait 3 min, verify the URL is live, then send to Telegram.

15 old episodes flooded Telegram at once

Cause: The tracking file only had recent pushes, so all old summaries appeared "unpublished."
Fix: Pre-populate .telegram_published with ALL existing episode IDs.

yutinghao: "No new episodes" forever

Cause: Code assumed the EP#### format. yutinghao has no episode numbers, only dates in titles.
Fix: Extract the date prefix from the title ("2026/3/2(一)..." → "2026_3_2_").
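
The last fix can be illustrated with a single ID extractor that accepts both title styles. The regular expressions, the zero-padding of episode numbers, and the order in which the two formats are tried are assumptions. Only the date example is taken from the slide.

```python
import re

# A date at the very start of the title, e.g. "2026/3/2(一)...".
DATE_PREFIX = re.compile(r"^\s*(\d{4})/(\d{1,2})/(\d{1,2})")
# An episode number anywhere in the title, e.g. "EP512".
EP_NUMBER = re.compile(r"EP\s*(\d+)", re.IGNORECASE)


def episode_id(title):
    """Return a filesystem-friendly ID, or None if neither format matches."""
    m = DATE_PREFIX.match(title)
    if m:
        year, month, day = m.groups()
        return f"{year}_{month}_{day}_"
    m = EP_NUMBER.search(title)
    if m:
        return f"EP{int(m.group(1)):04d}"
    return None


print(episode_id("2026/3/2(一)..."))  # 2026_3_2_
print(episode_id("EP512 example"))    # EP0512
```

Returning `None` for unrecognized titles makes a format change in a feed visible, instead of silently reporting "no new episodes."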

How to Use It

Three ways to get your daily podcast summaries

Browse the Site

Visit podsight.tw for all summaries, searchable by podcast and episode

Join Telegram

Subscribe to @podsight for instant push notifications on new episodes

Read Transcripts

Full transcripts in Git — search, grep, or reference any episode's raw text

PodSight

Thank You

Questions? Ideas for new podcasts to add?
