Code & Stream

Fun With TTS & Paul Graham's Essays

These are some notes from back when I was teaching myself how to use OpenAI's Text-To-Speech (TTS) models and APIs.

First I picked a few lines from one of my favorite T.S. Eliot poems and ran them through the tts-1-hd model with the fable voice at 0.9x speed:

from pathlib import Path
from openai import OpenAI

client = OpenAI(
    api_key = "sk-...",
    organization="org-..."
)

speech_file_path = Path("ts-eliot.mp3")

# Stream the audio to disk as it arrives
with client.audio.speech.with_streaming_response.create(
    model="tts-1-hd",
    voice="fable",
    input='''We shall not cease from exploration.
        And the end of all our exploring.
        Will be to arrive where we started
        And know the place for the first time.''',
    speed=0.9,
) as response:
    response.stream_to_file(speech_file_path)

The result is not fantastic; it doesn't get the cadence right. But compared to the TTS models I played with a decade ago, it is still impressive.

After that I decided to give myself a more real-world problem. I find that the further I get from a toy example and the closer to something I might actually use, the more I learn.

I decided to see if I could take all the essays on Paul Graham's website and make audio files that I could listen to on my phone while getting work done around the house and watching my (at the time) not-yet-school-age daughter. I had read many of his essays over the years, and I had both a physical and an audiobook copy of Hackers & Painters, but many of the newer essays are absent from that collection.

Scraping Paul Graham's Essay Index

So first I needed to scrape Paul Graham's site, which is thankfully old-school HTML and highly amenable to scraping. I just needed to get the articles page and parse it:

import requests

def get_html(url):
    """Fetch a page and return its HTML text."""
    response = requests.get(url, timeout=30)
    # Fail loudly on an HTTP error instead of silently returning None
    response.raise_for_status()
    return response.text

url = 'https://paulgraham.com/articles.html'

html_content = get_html(url)

print(html_content)

The raw HTML consisted of a list of anchor tags linking to individual essay pages. Next I used BeautifulSoup to extract all internal essay links into a dictionary:

from bs4 import BeautifulSoup

soup = BeautifulSoup(html_content, 'html.parser')

# Dictionary to hold the essay titles and their links
essay_links = {}

for a_tag in soup.find_all('a', href=True):
    essay_title = a_tag.get_text()
    essay_link = a_tag['href']
    # Check if the href is a valid link to an essay
    if essay_link.endswith('.html') and not essay_link.startswith('http'):
        full_link = f'https://paulgraham.com/{essay_link}'
        essay_links[essay_title] = full_link

I then trimmed the first and last entries from the dictionary, which were navigation links rather than actual essays:

mod_list = list(essay_links.items())[1:-1]

# Convert the list back to a dictionary
d = dict(mod_list)
for title, link in d.items():
    print(f"{title}: {link}")

This gave me a clean dictionary of roughly 200 essays with their titles and URLs.

Downloading and Extracting Essay Text

With all the URLs in hand, I looped through and fetched the HTML content of every single essay:

response_dict = {}

for key, value in d.items():
    response = requests.get(value)
    if response.status_code == 200:
        response_dict[key] = response.content

However, raw HTML wasn't going to work for the TTS API, which requires plain text. So I wrote a small function to strip the HTML and extract just the body text:

from bs4 import BeautifulSoup

def extract_text(html_content):
    # Decode with a more lenient error handling strategy
    soup = BeautifulSoup(
        html_content.decode('utf-8', errors='replace'), 'html.parser'
    )
    return soup.body.get_text(separator=' ', strip=True)

# Create a new dictionary with the processed text
processed_text_dict = {
    key: extract_text(value) for key, value in response_dict.items()
}

I spot-checked the output by printing just the contents of the essay "Is it Worth Being Wise?".

How Much Will This Cost?

Before I hit run on 200+ API calls and went off to do something else, I wanted a sense of what the maximum cost of the experiment might be. OpenAI's TTS pricing is based on character count:

Model	Price
TTS Standard	$0.015 / 1K characters
TTS HD	$0.030 / 1K characters

That is how the prices were advertised when I ran this experiment. The OpenAI pricing page now lists them as:

$15.00 / 1M characters (Standard)
$30.00 / 1M characters (HD)

So the costs are identical, just framed slightly differently for customers.

Then I wrote a quick loop to count characters across all essays:

# TTS	$0.015 / 1K characters
# TTS HD	$0.030 / 1K characters
count = 0

for key, value in processed_text_dict.items():
    char_count = len(value)  # value is the essay text as a single string
    print(f"character count for {key} is {char_count}")
    count += char_count

tts = (count/1000)*.015
tts_hd = (count/1000)*.03
print(count)
print(f"TTS cost:{tts}")
print(f"TTS HD cost:{tts_hd}")

Result: 3,066,238 total characters across all essays. That's ~$46 for TTS standard or ~$92 for TTS HD.

character count for Is it Worth Being Wise? is 22030
character count for Having Kids is 8211
character count for How to Lose Time and Money is 3751
character count for The Best Essay is 24420
character count for Superlinear Returns is 24932
character count for How to Do Great Work is 66844
character count for How to Get New Ideas is 793
character count for The Need to Read is 2502
character count for What You (Want to)* Want is 2775
character count for Alien Truth is 3903
character count for What I've Learned from Users is 12653
character count for Heresy is 12494
character count for Putting Ideas into Words is 6486
character count for Is There Such a Thing as Good Taste? is 6068
character count for Beyond Smart is 8229
character count for Weird Languages is 2132
character count for How to Work Hard is 18108
character count for A Project of One's Own is 14010
character count for Fierce Nerds is 7500
character count for Crazy New Ideas is 7779
character count for An NFT That Saves Lives is 1751
character count for The Real Reason to End the Death Penalty is 4686
character count for How People Get Rich Now is 14645
character count for Write Simply is 2785
character count for Donate Unrestricted is 2904
character count for What I Worked On is 74905
character count for Earnestness is 9634
character count for Billionaires Build is 19225
character count for The Airbnbs is 6063
character count for How to Think for Yourself is 20938
character count for Early Work is 14176
character count for Modeling a Wealth Tax is 2428
character count for The Four Quadrants of Conformism is 12090
character count for Orthodox Privilege is 3857
character count for Coronavirus and Credibility is 1445
character count for How to Write Usefully is 16290
character count for Being a Noob is 2142
character count for Haters is 7705
character count for The Two Kinds of Moderate is 3931
character count for Fashionable Problems is 1157
character count for The Lesson to Unlearn is 22503
character count for Novelty and Heresy is 1583
character count for The Bus Ticket Theory of Genius is 15107
character count for General and Surprising is 2592
character count for Charisma / Power is 657
character count for The Risk of Discovery is 1286
character count for How to Make Pittsburgh a Startup Hub is 14917
character count for Life is Short is 9254
character count for Economic Inequality is 20040
character count for The Refragmentation is 42366
character count for Jessica Livingston is 11204
character count for A Way to Detect Bias is 3423
character count for Write Like You Talk is 4059
character count for Default Alive or Default Dead? is 8545
character count for Why It's Safe for Founders to Be Nice is 4417
character count for Change Your Name is 4175
character count for What Microsoft Is this the Altair Basic of? is 2134
character count for The Ronco Principle is 3514
character count for What Doesn't Seem Like Work? is 2670
character count for Don't Talk to Corp Dev is 7113
character count for Let the Other 95% of Great Programmers In is 5450
character count for How to Be an Expert in a Changing World is 6331
character count for How You Know is 3712
character count for The Fatal Pinch is 9062
character count for Mean People Fail is 6623
character count for Before the Startup is 25650
character count for How to Raise Money is 60743
character count for Investor Herd Dynamics is 6495
character count for How to Convince Investors is 21020
character count for Do Things that Don't Scale is 25208
character count for Startup Investing Trends is 16711
character count for How to Get Startup Ideas is 40755
character count for The Hardware Renaissance is 2496
character count for Startup = Growth is 31253
character count for Black Swan Farming is 12137
character count for The Top of My Todo List is 1306
character count for Writing and Speaking is 6472
character count for How Y Combinator Started is 7850
character count for Defining Property is 5608
character count for Frighteningly Ambitious Startup Ideas is 21294
character count for A Word to the Resourceful is 4514
character count for Schlep Blindness is 4993
character count for Snapshot: Viaweb, June 1998 is 4891
character count for Why Startup Hubs Work is 10287
character count for The Patent Pledge is 4045
character count for Subject: Airbnb is 7462
character count for Founder Control is 4351
character count for Tablets is 3122
character count for What We Look for in Founders is 4545
character count for The New Funding Landscape is 20177
character count for Where to See Silicon Valley is 6139
character count for High Resolution Fundraising  is 4226
character count for What Happened to Yahoo  is 11893
character count for The Future of Startup Funding  is 22365
character count for The Acceleration of Addictiveness is 7475
character count for The Top Idea in Your Mind  is 6580
character count for How to Lose Time and Money  is 3751
character count for Organic Startup Ideas is 5646
character count for Apple's Mistake is 12454
character count for What Startups Are Really Like is 29329
character count for Persuade xor Discover  is 7546
character count for Post-Medium Publishing is 10367
character count for The List of N Things is 7972
character count for The Anatomy of Determination  is 9153
character count for What Kate Saw in Silicon Valley   is 4743
character count for The Trouble with the Segway is 2149
character count for Ramen Profitable is 10625
character count for Maker's Schedule, Manager's Schedule  is 6637
character count for A Local Revolution? is 7955
character count for Why Twitter is a Big Deal is 813
character count for The Founder Visa is 2357
character count for Five Founders is 4221
character count for Relentlessly Resourceful is 5719
character count for How to Be an Angel Investor is 22467
character count for Why TV Lost is 8948
character count for Can You Buy a Silicon Valley?  Maybe. is 10731
character count for What I've Learned from Hacker News is 16500
character count for Startups in 13 Sentences is 7610
character count for Keep Your Identity Small   is 5304
character count for After Credentials is 14034
character count for Could VC be a Casualty of the Recession? is 7838
character count for The High-Res Society is 9091
character count for The Other Half of "Artists Ship"   is 7645
character count for Why to Start a Startup in a Bad Economy is 6155
character count for A Fundraising Survival Guide is 27996
character count for The Pooled-Risk Company Management Company is 7400
character count for Cities and Ambition is 20307
character count for Disconnecting Distraction is 6442
character count for Lies We Tell Kids is 29384
character count for Be Good is 16801
character count for Why There Aren't More Googles is 7722
character count for Some Heroes is 15060
character count for How to Disagree is 9166
character count for You Weren't Meant to Have a Boss is 14404
character count for A New Venture Animal is 11226
character count for Trolls is 5105
character count for Six Principles for Making New Things is 6822
character count for Why to Move to a Startup Hub is 8275
character count for The Future of Web Startups is 19623
character count for How to Do Philosophy is 28112
character count for News from the Front is 12582
character count for How Not to Die is 10866
character count for Holding a Program in One's Head is 10770
character count for Stuff is 7073
character count for The Equity Equation is 6136
character count for An Alternative Theory of Unions is 3034
character count for The Hacker's Guide to Investors is 35300
character count for Two Kinds of Judgement is 4435
character count for Microsoft is Dead is 7313
character count for Why to Not Not Start a Startup is 34742
character count for Is It Worth Being Wise? is 22030
character count for Learning from Founders is 4850
character count for How Art Can Be Good is 20347
character count for The 18 Mistakes That Kill Startups is 32122
character count for A Student's Guide to Startups is 36137
character count for How to Present to Investors is 16186
character count for Copy What You Like is 5454
character count for The Island Test is 4123
character count for The Power of the Marginal is 34660
character count for Why Startups Condense in America is 27795
character count for How to Be Silicon Valley is 21190
character count for The Hardest Lessons for Startups to Learn is 27225
character count for See Randomness is 3272
character count for Are Software Patents Evil? is 27292
character count for 6,631,372 is 3898
character count for Why YC is 2091
character count for How to Do What You Love is 25907
character count for Good and Bad Procrastination is 10298
character count for Web 2.0 is 19358
character count for How to Fund a Startup is 50992
character count for The Venture Capital Squeeze is 9102
character count for Ideas for Startups is 22457
character count for What I Did this Summer is 14851
character count for Inequality and Risk is 16640
character count for After the Ladder is 3428
character count for What Business Can Learn from Open Source is 24850
character count for Hiring is Obsolete is 27348
character count for The Submarine is 13591
character count for Why Smart People Have Bad Ideas is 17936
character count for Return of the Mac is 5583
character count for Writing,  Briefly is 2762
character count for Undergraduation is 20864
character count for A Unified Theory of VC Suckage is 8155
character count for How to Start a Startup is 54656
character count for What You'll Wish You'd Known is 28319
character count for Made in USA is 10820
character count for It's Charisma, Stupid is 8887
character count for Bradley's Ghost is 3636
character count for A Version 1.0 is 24595
character count for What the Bubble Got Right is 21319
character count for The Age of the Essay is 26296
character count for The Python Paradox is 2767
character count for Great Hackers is 29749
character count for Mind the Gap is 32819
character count for How to Make Wealth is 50601
character count for The Word "Hacker" is 11609
character count for What You Can't Say is 31187
character count for Filters that Fight Back is 4864
character count for Hackers and Painters is 32120
character count for If Lisp is So Great is 2524
character count for The Hundred-Year Language is 28027
character count for Why Nerds are Unpopular is 32078
character count for Better Bayesian Filtering is 25553
character count for Design and Research is 15106
character count for A Plan for Spam is 31716
character count for Revenge of the Nerds is 33991
character count for Succinctness is Power is 17367
character count for What Languages Fix is 1382
character count for Taste for Makers is 25062
character count for Why Arc Isn't Especially Object-Oriented is 2994
character count for What Made Lisp Different is 4266
character count for The Other Road Ahead is 69135
character count for The Roots of Lisp is 2205
character count for Five Questions about Language Design is 17037
character count for Being Popular is 43487
character count for Java's Cover is 7756
character count for Beating the Averages is 25662
character count for Lisp for Web-Based Applications is 322
character count for Programming Bottom-Up is 5478
character count for This Year We Can End the Death Penalty in California is 1019
3066238
TTS cost:45.99357
TTS HD cost:91.98714
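
One detail in the listing is worth a second look: "How to Lose Time and Money" and "Is it Worth Being Wise?" each appear twice with identical character counts, under titles that seem to differ only in trailing whitespace or capitalization. If those are the same essays, they were counted (and would be billed) twice. A minimal sketch of deduplicating by a normalized title before estimating cost follows; `normalize_title`, `dedupe_by_title`, and `estimate_cost` are hypothetical helpers, not part of the original notebook, and the sample dictionary is made up for illustration.

```python
def normalize_title(title):
    """Collapse runs of whitespace and ignore case so near-identical titles match."""
    return " ".join(title.split()).casefold()

def dedupe_by_title(text_by_title):
    """Keep only the first entry seen for each normalized title."""
    seen = set()
    unique = {}
    for title, text in text_by_title.items():
        norm = normalize_title(title)
        if norm in seen:
            continue  # same essay listed again under a slightly different title
        seen.add(norm)
        unique[title] = text
    return unique

def estimate_cost(text_by_title, usd_per_1k_chars):
    """Return (total characters, estimated cost in USD)."""
    total_chars = sum(len(text) for text in text_by_title.values())
    return total_chars, total_chars / 1000 * usd_per_1k_chars

# Made-up stand-in for processed_text_dict (note the trailing space in the third key)
sample = {
    "How to Lose Time and Money": "a" * 3751,
    "Having Kids": "b" * 8211,
    "How to Lose Time and Money ": "a" * 3751,
}
unique = dedupe_by_title(sample)
chars, cost_hd = estimate_cost(unique, 0.030)  # HD price from the table above
print(len(unique), chars, round(cost_hd, 2))
```

Matching on titles is only a heuristic; comparing the essay URLs would be a stricter check.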

First Attempt: Naive Bulk Generation

My first attempt was very simplistic: loop through all essays and generate one MP3 per essay.

import time

from openai import OpenAI

client = OpenAI(
    api_key = "sk-...",
    organization = "org-..."
)

# Iterate over each item in the dictionary
for key, value in processed_text_dict.items():
    # Build a filesystem-safe name: keep letters, digits, and spaces
    filename_key = ''.join(
        e for e in key if e.isalnum() or e == ' '
    ).replace(' ', '_') + ".mp3"
    speech_file_path = f"./{filename_key}"

    print(f"Generating speech for {key}...")
    try:
        response = client.audio.speech.create(
            model="tts-1-hd",
            voice="fable",
            input=value
        )

        response.stream_to_file(speech_file_path)
        print(f"Saved to {speech_file_path}")
    except Exception as e:
        print(f"An error occurred while processing '{key}': {e}")

    # Brief pause between requests
    time.sleep(1)

print("Finished generating speech files.")

Problem: I quickly discovered that OpenAI's TTS API has a 4,096-character input limit. If you want to run TTS on anything longer, that is left as an exercise for the reader; the API doesn't provide a straightforward way to handle it. Most of Paul Graham's essays are much longer than that, so this approach would either fail or silently truncate the text. I needed a chunking strategy.
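
To get a feel for how far over the limit the essays are, here is a small sketch that computes how many requests an essay needs at 4,096 characters per request. The character counts are taken from the listing above; `chunks_needed` is a hypothetical helper introduced only for illustration.

```python
import math

MAX_INPUT_CHARS = 4096  # per-request input limit described above

def chunks_needed(char_count, max_len=MAX_INPUT_CHARS):
    """Number of requests needed to cover char_count characters."""
    return math.ceil(char_count / max_len)

# Character counts copied from the listing above
counts = {
    "How to Do Great Work": 66844,
    "What I Worked On": 74905,
    "How to Get New Ideas": 793,
}
for title, n in counts.items():
    print(f"{title}: {chunks_needed(n)} request(s)")
```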

The Solution: Chunking and Concatenation

The fix was to split each essay into 4,096-character chunks, generate audio for each chunk separately, then stitch them back together into a single MP3. For the audio concatenation I installed PyDub. PyDub also needs the ffmpeg binary available on the system PATH, which is installed through the OS package manager rather than pip:

!pip install pydub

Then I built a slightly improved pipeline. Here's the intermediate version I tested first:

from openai import OpenAI
from pydub import AudioSegment
import time

client = OpenAI(
    api_key = "sk-...",
    organization = "org-..."
)

# Function for splitting text into chunks of size max_len
def split_text(text, max_len):
    chunks = []
    for i in range(0, len(text), max_len):
        chunks.append(text[i:i+max_len])
    return chunks

# Iterate over each item in the dictionary
for key, value in processed_text_dict.items():
    filename_key = ''.join(
        e for e in key if e.isalnum() or e in [' ']
    ).replace(' ', '_')

    print(f"Generating speech for {key}...")

    # Split input text into chunks
    chunks = split_text(value, 4096)

    # Create an empty list to hold the audio chunks
    audio_chunks = []

    for i, chunk in enumerate(chunks):
        try:
            response = client.audio.speech.create(
                model="tts-1-hd",
                voice="fable",
                input=chunk
            )

            chunk_file_path = f"./{filename_key}_{i}.mp3"
            response.stream_to_file(chunk_file_path)

            audio_chunks.append(
                AudioSegment.from_file(chunk_file_path)
            )

        except Exception as e:
            print(
                f"An error occurred while processing "
                f"'{key}', chunk {i}: {e}"
            )
            continue

        time.sleep(1)

    # Concatenate all the audio chunks
    combined = sum(audio_chunks, AudioSegment.empty())
    all_file_path = f"./{filename_key}.mp3"
    combined.export(all_file_path, format="mp3")
    print(f"All chunks of '{key}' saved to {all_file_path}")

print("Finished generating speech files.")

This worked, but it dumped everything (chunks and final files) into the same directory, which was confusing. I then made a slightly updated version to clean this up.

Final Version: Organized Output

This version, still very much a toy, adds a proper directory structure: chunk files go into ./chunked/ and final concatenated MP3s go into ./final/:

import os
from openai import OpenAI
from pydub import AudioSegment
import time

# Create directories for chunked and final files (no-op if they already exist)
os.makedirs('chunked', exist_ok=True)
os.makedirs('final', exist_ok=True)

client = OpenAI(
    api_key = "sk-...",
    organization = "org-..."
)

# Function for splitting text into chunks of size max_len
def split_text(text, max_len):
    chunks = []
    for i in range(0, len(text), max_len):
        chunks.append(text[i:i+max_len])
    return chunks

# Iterate over each item in the dictionary
for key, value in processed_text_dict.items():
    filename_key = ''.join(
        e for e in key if e.isalnum() or e in [' ']
    ).replace(' ', '_')

    print(f"Generating speech for {key}...")

    # Split input text into chunks
    chunks = split_text(value, 4096)

    # Create an empty list to hold the audio chunks
    audio_chunks = []

    for i, chunk in enumerate(chunks):
        try:
            response = client.audio.speech.create(
                model="tts-1-hd",
                voice="fable",
                input=chunk
            )

            # Save to 'chunked' directory
            chunk_file_path = (
                f"./chunked/{filename_key}_{i}.mp3"
            )
            response.stream_to_file(chunk_file_path)

            # Add this audio chunk to the list
            audio_chunks.append(
                AudioSegment.from_file(chunk_file_path)
            )

        except Exception as e:
            print(
                f"An error occurred while processing "
                f"'{key}', chunk {i}: {e}"
            )
            continue

        time.sleep(1)

    # Concatenate all the audio chunks
    combined = sum(audio_chunks, AudioSegment.empty())

    # Save the final file to the 'final' directory
    all_file_path = f"./final/{filename_key}.mp3"
    combined.export(all_file_path, format="mp3")

    print(
        f"All chunks of '{key}' saved to {all_file_path}"
    )

print("Finished generating speech files.")

Success! The script processed all ~200 essays without errors. Every essay was split into chunks, converted to audio, and concatenated into a single MP3 in the ./final/ directory.

Conclusion

So for the not-so-low price of $92 I had roughly 200 Paul Graham audio files that I have since listened to on my phone. Here is one of the files as an example, from Paul Graham's essay "When To Do What You Love":

As stated before, this is still very much a toy proof of concept. There are numerous things that could be improved, like:

Smarter chunking. My split_text() function splits on a hard character count, which means it can cut mid-sentence or even mid-word. This occasionally creates an extra awkward pause or mispronunciation at chunk boundaries. A better approach would be to split on sentence or paragraph boundaries.
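
A minimal sketch of sentence-aware chunking, assuming that "sentence" can be approximated as text ending in ., !, or ? followed by whitespace. `split_on_sentences` is a hypothetical replacement for split_text(); the regex is a rough heuristic and will mis-split on abbreviations such as "e.g. this".

```python
import re

def split_on_sentences(text, max_len=4096):
    """Greedily pack whole sentences into chunks of at most max_len characters."""
    # Split after sentence-ending punctuation that is followed by whitespace
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        # A single sentence longer than max_len falls back to a hard split
        while len(sentence) > max_len:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:max_len])
            sentence = sentence[max_len:]
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= max_len:
            current = candidate
        else:
            # Adding this sentence would overflow, so close the current chunk
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks

demo = split_on_sentences("One. Two! Three? Four.", max_len=10)
print(demo)
```

Every chunk stays within max_len, so the result can be fed to the same per-chunk loop used in the pipeline.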

Text cleanup. The extracted text includes footnote markers like [ 1 ] and occasional artifacts like "Polish Translation French Translation" at the end of some essays. Pre-processing the text to strip these would make for cleaner narration.
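
A minimal sketch of that kind of pre-processing, based only on the two artifacts mentioned here. `clean_for_tts` and both patterns are assumptions for illustration; real pages would likely need more rules, and the footnote pattern also strips the markers inside the notes section itself.

```python
import re

# Footnote markers such as "[ 1 ]" or "[12]", plus any whitespace before them
FOOTNOTE_MARKER = re.compile(r"\s*\[\s*\d+\s*\]")
# One or more "<Language> Translation" labels at the very end of the text
TRAILING_TRANSLATIONS = re.compile(r"(?:\s*\b[A-Z][a-z]+ Translation\b)+\s*$")

def clean_for_tts(text):
    """Strip footnote markers and trailing translation labels, then tidy whitespace."""
    text = FOOTNOTE_MARKER.sub("", text)
    text = TRAILING_TRANSLATIONS.sub("", text)
    return re.sub(r"\s+", " ", text).strip()

raw = "Wisdom is rare [ 1 ] and useful. [2] Polish Translation French Translation"
cleaned = clean_for_tts(raw)
print(cleaned)
```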

Resumability/error handling. If the script fails halfway through (rate limit, network error, etc.), it starts over from scratch. Adding a check for already-generated files would make it idempotent.
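
A minimal sketch of the already-generated check, assuming final files are named the same way as in the pipeline above. `safe_filename` and `pending_essays` are hypothetical helpers; the demo uses a temporary directory and fake file contents rather than real audio.

```python
import tempfile
from pathlib import Path

def safe_filename(title):
    """Keep letters, digits, and spaces, then turn spaces into underscores."""
    kept = "".join(ch for ch in title if ch.isalnum() or ch == " ")
    return kept.replace(" ", "_")

def pending_essays(text_by_title, final_dir="final"):
    """Yield (title, text, target_path) for essays that have no finished MP3 yet."""
    final_dir = Path(final_dir)
    for title, text in text_by_title.items():
        target = final_dir / f"{safe_filename(title)}.mp3"
        if target.exists() and target.stat().st_size > 0:
            continue  # already generated on an earlier run
        yield title, text, target

# Demo: pretend one essay was finished on a previous run
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "Having_Kids.mp3").write_bytes(b"fake audio")
    essays = {"Having Kids": "...", "Life is Short": "..."}
    todo = [title for title, _, _ in pending_essays(essays, final_dir=tmp)]
print(todo)
```

Looping over pending_essays(processed_text_dict) instead of the full dictionary would let a rerun pick up where the last one stopped.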

Sync execution → async. Currently the code runs completely synchronously. I could swap to the async OpenAI client to issue requests concurrently and speed up execution.
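
A minimal sketch of the concurrency pattern using only asyncio. `fake_tts` is a hypothetical stand-in for a real request (the openai package ships an AsyncOpenAI client that could take its place), and the max_concurrency value is an arbitrary assumption, not a documented rate limit.

```python
import asyncio

async def fake_tts(chunk):
    """Hypothetical stand-in for an async TTS request; returns placeholder bytes."""
    await asyncio.sleep(0.01)  # simulate network latency
    return chunk.encode("utf-8")

async def synthesize_all(chunks, synthesize, max_concurrency=4):
    """Run synthesize() over all chunks concurrently, preserving chunk order."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def worker(chunk):
        # The semaphore caps how many requests are in flight at once
        async with semaphore:
            return await synthesize(chunk)

    # gather() returns results in the order the awaitables were passed in
    return await asyncio.gather(*(worker(c) for c in chunks))

chunks = ["first chunk", "second chunk", "third chunk"]
audio_parts = asyncio.run(synthesize_all(chunks, fake_tts))
print([len(part) for part in audio_parts])
```

Because the results come back in chunk order, they can be concatenated the same way as in the synchronous version.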