(#2) Building NotebookLM from scratch: making consumption easier
From reading lists to playlists: Delivering AI-generated podcasts to your podcast apps
Quick Update
Built a working pipeline that transforms articles into audio content
Restructured the codebase for better maintainability
Implemented configurable workflows using YAML
Added support for S3-compatible storage
Here is a podcast version of this article, generated via gyandex.
New to this series? Check out the first post, where I explored why I'm building an open-source alternative to NotebookLM.
The Journey So Far
Remember how we talked about information overload in our last post? Well, I've made significant progress in tackling that challenge. Instead of just dreaming about an ideal content consumption tool, we now have a working prototype that does something pretty cool: it turns your reading list into a podcast feed!
What's Working Now
The system can now:
Take an article link as input
Process and transform the content
Generate audio output
Create a podcast feed that updates automatically with new content
This is particularly exciting because it takes us one step closer to flexible content consumption, a key NotebookLM feature I wanted to expand on.
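To make the flow concrete, here is a minimal sketch of the first two stages. The helper functions and the LangChain-based LLM call are my own illustration of the approach, not gyandex's actual internals:

```python
import requests  # pip install requests beautifulsoup4 langchain-google-genai
from bs4 import BeautifulSoup
from langchain_google_genai import ChatGoogleGenerativeAI


def fetch_article(url: str) -> str:
    """Stage 1: download the article and reduce it to plain text."""
    html = requests.get(url, timeout=30).text
    return BeautifulSoup(html, "html.parser").get_text(separator="\n")


def generate_script(article_text: str) -> str:
    """Stage 2: ask an LLM to rewrite the article as a podcast dialogue."""
    # Reads GOOGLE_API_KEY from the environment, matching the config below.
    llm = ChatGoogleGenerativeAI(model="gemini-1.5-pro", temperature=1.0)
    prompt = ("Rewrite this article as a conversation between two podcast "
              "hosts:\n\n" + article_text)
    return llm.invoke(prompt).content


script = generate_script(fetch_article("https://www.rubick.com/skip-level-1-on-1s/"))
```

Stages 3 and 4 (speech synthesis and publishing) are sketched later in this post.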
Technical Progress
From Prototype to Production-Ready
The initial prototype lived in a single Jupyter notebook. Now, it's evolved into a structured project with dedicated modules.
Before:
main.ipynb
After:
gyandex
├── __init__.py
├── cli
│  ├── __init__.py
│  └── genpod.py
├── llms
│  ├── __init__.py
│  ├── factory.py
│  └── factory_test.py
├── loaders
│  ├── __init__.py
│  ├── factory.py
│  └── factory_test.py
└── podgen
    ├── __init__.py
    ├── config
    │  ├── __init__.py
    │  ├── loader.py
    │  ├── loader_test.py
    │  └── schema.py
    ├── engine
    │  ├── __init__.py
    │  ├── generator.py
    │  ├── publisher.py
    │  ├── publisher_test.py
    │  ├── synthesizer.py
    │  ├── synthesizer_test.py
    │  └── workflows.py
    ├── feed
    │  ├── __init__.py
    │  ├── generator.py
    │  ├── generator_test.py
    │  ├── models.py
    │  └── models_test.py
    ├── processors
    │  ├── __init__.py
    │  └── tts.py
    └── storage
        ├── __init__.py
        ├── factory.py
        ├── factory_test.py
        ├── s3.py
        └── s3_test.py
The new layout may look a bit daunting, but that's exactly why I had to do this now: keeping everything in a single notebook was no longer sustainable.
Configurable Workflows
Configuration is now managed through YAML files, making it easier to control the entire process. The format supports environment variable references, so credentials stay out of the file itself.
Here's a simplified example:
version: "1.0"
content:
  source: "https://www.rubick.com/skip-level-1-on-1s/"
  format: "html"
llm:
  provider: "google-generative-ai"
  model: "gemini-1.5-pro"
  temperature: 1.0
  google_api_key: "${GOOGLE_API_KEY}"
tts: # This section isn't hooked up properly yet
  provider: "aws"
  default_voice: "default_host"
  voices:
    default_host:
      voice_id: "Matthew"
      speaking_rate: 1.0
      pitch: 0
    guest:
      voice_id: "Joanna"
      speaking_rate: 1.1
      pitch: 1
storage:
  provider: "s3"
  access_key: "${ACCESS_KEY_ID}" # This is valid because the configuration format accepts environment variables
  secret_key: "${SECRET_ACCESS_KEY}"
  bucket: "gyandex"
  region: "us-east-1"
  endpoint: "https://xxx.r2.cloudflarestorage.com"
  custom_domain: "pub-xxx.r2.dev"
feed:
  title: "Gyandex: Tech Reading"
  slug: "reading-list"
  description: "Technical reading list curated by Dhruv Baldawa"
  author: "Dhruv Baldawa"
  email: "test@example.com"
  language: "en"
  categories: ["Technology", "Software Development", "Programming"]
  image: "https://images.pexels.com/photos/26730962/pexels-photo-26730962.jpeg?cs=srgb&dl=pexels-helloaesthe-26730962.jpg&fm=jpg&w=640&h=960"
  website: "https://github.com/dhruvbaldawa/gyandex"
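Those ${GOOGLE_API_KEY}-style references deserve a quick note. I won't reproduce gyandex's actual loader here, but the essence is a substitution pass over the raw file before parsing. A minimal sketch, assuming PyYAML:

```python
import os
import re
import yaml  # pip install pyyaml

_ENV_VAR = re.compile(r"\$\{(\w+)\}")


def load_config(path: str) -> dict:
    """Read a YAML config, expanding ${VAR} references from the environment."""
    with open(path) as f:
        raw = f.read()

    def substitute(match: re.Match) -> str:
        value = os.environ.get(match.group(1))
        if value is None:
            # Fail loudly rather than letting a missing credential
            # silently become an empty string.
            raise KeyError(f"Environment variable {match.group(1)} is not set")
        return value

    return yaml.safe_load(_ENV_VAR.sub(substitute, raw))


config = load_config("reading-list.yaml")
```

This keeps secrets out of version control while leaving the config file fully declarative.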
I built a companion CLI (podgen) which can run the entire workflow using this configuration. I plan to create separate configuration files for different use cases, each with its own customizations; for example, I'd like a different podcast structure for technical articles vs. philosophical ones. Running the workflow is a single command:
podgen reading-list.yaml
For now, the feeds are powered by a local SQLite database. I will be iterating on my choice of data store throughout this project.
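To show what the publish-and-feed step boils down to, here's a hedged sketch: upload the audio to S3-compatible storage (boto3 happily talks to Cloudflare R2 if you override endpoint_url), then regenerate the RSS feed with the feedgen library. The actual gyandex publisher and feed generator may differ:

```python
import boto3  # pip install boto3 feedgen
from feedgen.feed import FeedGenerator

# boto3 works against any S3-compatible store (here, Cloudflare R2)
# as long as endpoint_url points at the provider instead of AWS.
# Credentials come from the environment or explicit kwargs.
s3 = boto3.client("s3", endpoint_url="https://xxx.r2.cloudflarestorage.com")
s3.upload_file("episode.mp3", "gyandex", "reading-list/episode.mp3")
audio_url = "https://pub-xxx.r2.dev/reading-list/episode.mp3"

# Rebuild the feed from scratch; in gyandex the episode list would
# come out of the SQLite database mentioned above.
fg = FeedGenerator()
fg.load_extension("podcast")  # iTunes-style tags that podcast apps expect
fg.title("Gyandex: Tech Reading")
fg.link(href="https://github.com/dhruvbaldawa/gyandex", rel="alternate")
fg.description("Technical reading list curated by Dhruv Baldawa")
fg.language("en")
fg.podcast.itunes_category("Technology")

fe = fg.add_entry()
fe.id(audio_url)
fe.title("Skip-level 1:1s")
fe.description("Audio version of the article")
fe.enclosure(audio_url, "0", "audio/mpeg")  # "0" is a placeholder byte length

fg.rss_file("feed.xml", pretty=True)
s3.upload_file("feed.xml", "gyandex", "reading-list/feed.xml")
```

Subscribing in a podcast app is then just a matter of pointing it at the public feed.xml URL.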
Learning Journey
While this update focused more on infrastructure than innovation, I discovered Meta's NotebookLlama, a toy example with similar goals. I'm looking forward to exploring their approach in future updates.
Next Steps
Content Processing Improvements
Current focus: Enhancing the quality of generated content
Exploring better approaches for handling longer articles and producing higher-quality podcasts
Configuration Enhancements
Expanding YAML configuration options
Implementing more control over the text-to-speech process, which isn't wired up yet (the tts section in the config above is currently a placeholder); a rough sketch of the direction follows below
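For the TTS piece, the idea is to map each voice's speaking_rate and pitch from the YAML onto SSML prosody attributes for AWS Polly. The rate/pitch mapping here is my assumption of how it could work, not current gyandex behavior:

```python
import boto3  # pip install boto3

polly = boto3.client("polly", region_name="us-east-1")

# Hypothetical mapping from the YAML voice settings to SSML prosody:
# speaking_rate 1.1 -> rate="110%", pitch 1 -> pitch="+1%".
ssml = (
    '<speak><prosody rate="110%" pitch="+1%">'
    "Thanks for having me on the show!"
    "</prosody></speak>"
)

response = polly.synthesize_speech(
    Text=ssml,
    TextType="ssml",
    VoiceId="Joanna",  # the "guest" voice from the config above
    OutputFormat="mp3",
)
with open("guest-sample.mp3", "wb") as f:
    f.write(response["AudioStream"].read())
```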
Building in Public
I'm sharing this journey to document my learning about Generative AI and hopefully help others who are interested in similar projects. Each update brings new insights about both the technical challenges and the practical applications of current AI capabilities.
Source code: https://github.com/dhruvbaldawa/gyandex
Pull request for this update: https://github.com/dhruvbaldawa/gyandex/pull/1