These are some notes from back when I was teaching myself how to use OpenAI's Text-To-Speech (TTS) models and APIs.
First I picked a few lines from one of my favorite T.S. Eliot poems and ran them through the tts-1-hd model with the fable voice at 0.9x speed:
from pathlib import Path
from openai import OpenAI
client = OpenAI(
    api_key="sk-...",
    organization="org-...",
)
speech_file_path = Path("ts-eliot.mp3")
# Use with_streaming_response so the audio is streamed straight to the file
with client.audio.speech.with_streaming_response.create(
    model="tts-1-hd",
    voice="fable",
    input='''We shall not cease from exploration.
And the end of all our exploring.
Will be to arrive where we started
And know the place for the first time.''',
    speed=0.9,
) as response:
    response.stream_to_file(speech_file_path)
The result is not fantastic: it doesn't get the cadence right. But compared to the TTS models I had played with a decade ago, it is still impressive.
After that I decided to give myself a more real-world problem. I find that the further I can get from a toy example and the closer to something I might actually use, the more I learn.
I decided to see if I could take all the essays on Paul Graham's website and make audio files that I could listen to on my phone while getting work done around the house and watching my (at the time) not-yet-school-age daughter. I had read many of his essays over the years, and I also had both a physical and an audiobook copy of Hackers & Painters, but many of the newer essays are absent from that collection.
So first I needed to scrape Paul Graham's site, which is thankfully old-school HTML and highly amenable to scraping. I just needed to get the articles page and parse it:
import requests
def get_html(url):
    response = requests.get(url)
    if response.status_code != 200:
        print('Failed to get content:', response.status_code)
    else:
        return response.text
url = 'https://paulgraham.com/articles.html'
html_content = get_html(url)
print(html_content)
The raw HTML consisted of a list of anchor tags linking to individual essay pages. Next I used BeautifulSoup to extract all internal essay links into a dictionary:
from bs4 import BeautifulSoup
soup = BeautifulSoup(html_content, 'html.parser')
# Dictionary to hold the essay titles and their links
essay_links = {}
for a_tag in soup.find_all('a', href=True):
    essay_title = a_tag.get_text()
    essay_link = a_tag['href']
    # Check if the href is a valid link to an essay
    if essay_link.endswith('.html') and not essay_link.startswith('http'):
        full_link = f'https://paulgraham.com/{essay_link}'
        essay_links[essay_title] = full_link
I then trimmed the first and last entries from the dictionary, which were navigation links rather than actual essays:
mod_list = list(essay_links.items())[1:-1]
# Convert the list back to a dictionary
d = dict(mod_list)
for title, link in d.items():
    print(f"{title}: {link}")
This gave me a clean dictionary of roughly 200 essays with their titles and URLs.
With all the URLs in hand, I looped through and fetched the HTML content of every single essay:
response_dict = {}
for key, value in d.items():
    response = requests.get(value)
    if response.status_code == 200:
        response_dict[key] = response.content
However, raw HTML wasn't going to work for the TTS API, which requires plain text. So I wrote a small function to strip all the HTML and extract just the body text:
from bs4 import BeautifulSoup
def extract_text(html_content):
    # Decode with a more lenient error handling strategy
    soup = BeautifulSoup(
        html_content.decode('utf-8', errors='replace'), 'html.parser'
    )
    return soup.body.get_text(separator=' ', strip=True)
# Create a new dictionary with the processed text
processed_text_dict = {
    key: extract_text(value) for key, value in response_dict.items()
}
I spot-checked the output by printing just the contents of the essay "Is it Worth Being Wise?"
Before I hit run on 200+ API calls and went off to do something else, I wanted to get a sense of what my max cost for my experiment might look like. OpenAI's TTS pricing is based on character count:
| Model | Price |
|---|---|
| TTS Standard | $0.015 / 1K characters |
| TTS HD | $0.030 / 1K characters |
This is how the costs were advertised back when I ran this experiment. Checking the OpenAI pricing page now, the costs are listed as:
$15.00 / 1M characters (Standard)
$30.00 / 1M characters (HD)
So the costs are identical, just framed slightly differently for customers.
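As a quick sanity check that the two price framings really are the same rate, here is a small sketch using exact decimal arithmetic. The variable names are only for illustration.

```python
from decimal import Decimal

# Prices as advertised at the time (dollars per 1K characters)
per_1k = {"standard": Decimal("0.015"), "hd": Decimal("0.030")}
# Prices as listed later (dollars per 1M characters)
per_1m = {"standard": Decimal("15.00"), "hd": Decimal("30.00")}

for tier in per_1k:
    rate_old = per_1k[tier] / 1_000      # dollars per character
    rate_new = per_1m[tier] / 1_000_000  # dollars per character
    print(f"{tier}: same per-character rate? {rate_old == rate_new}")
```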
Then I wrote a quick loop to count characters across all essays:
# TTS $0.015 / 1K characters
# TTS HD $0.030 / 1K characters
count = 0
for key, value in processed_text_dict.items():
    # Each value is a single string, so its length is the character count
    char_count = len(value)
    print(f"character count for {key} is {char_count}")
    count += char_count
tts = (count / 1000) * 0.015
tts_hd = (count / 1000) * 0.03
print(count)
print(f"TTS cost:{tts}")
print(f"TTS HD cost:{tts_hd}")
Result: 3,066,238 total characters across all essays. That's ~$46 for TTS Standard or ~$92 for TTS HD.
character count for Is it Worth Being Wise? is 22030 character count for Having Kids is 8211 character count for How to Lose Time and Money is 3751 character count for The Best Essay is 24420 character count for Superlinear Returns is 24932 character count for How to Do Great Work is 66844 character count for How to Get New Ideas is 793 character count for The Need to Read is 2502 character count for What You (Want to)* Want is 2775 character count for Alien Truth is 3903 character count for What I've Learned from Users is 12653 character count for Heresy is 12494 character count for Putting Ideas into Words is 6486 character count for Is There Such a Thing as Good Taste? is 6068 character count for Beyond Smart is 8229 character count for Weird Languages is 2132 character count for How to Work Hard is 18108 character count for A Project of One's Own is 14010 character count for Fierce Nerds is 7500 character count for Crazy New Ideas is 7779 character count for An NFT That Saves Lives is 1751 character count for The Real Reason to End the Death Penalty is 4686 character count for How People Get Rich Now is 14645 character count for Write Simply is 2785 character count for Donate Unrestricted is 2904 character count for What I Worked On is 74905 character count for Earnestness is 9634 character count for Billionaires Build is 19225 character count for The Airbnbs is 6063 character count for How to Think for Yourself is 20938 character count for Early Work is 14176 character count for Modeling a Wealth Tax is 2428 character count for The Four Quadrants of Conformism is 12090 character count for Orthodox Privilege is 3857 character count for Coronavirus and Credibility is 1445 character count for How to Write Usefully is 16290 character count for Being a Noob is 2142 character count for Haters is 7705 character count for The Two Kinds of Moderate is 3931 character count for Fashionable Problems is 1157 character count for The Lesson to Unlearn is 22503 character 
count for Novelty and Heresy is 1583 character count for The Bus Ticket Theory of Genius is 15107 character count for General and Surprising is 2592 character count for Charisma / Power is 657 character count for The Risk of Discovery is 1286 character count for How to Make Pittsburgh a Startup Hub is 14917 character count for Life is Short is 9254 character count for Economic Inequality is 20040 character count for The Refragmentation is 42366 character count for Jessica Livingston is 11204 character count for A Way to Detect Bias is 3423 character count for Write Like You Talk is 4059 character count for Default Alive or Default Dead? is 8545 character count for Why It's Safe for Founders to Be Nice is 4417 character count for Change Your Name is 4175 character count for What Microsoft Is this the Altair Basic of? is 2134 character count for The Ronco Principle is 3514 character count for What Doesn't Seem Like Work? is 2670 character count for Don't Talk to Corp Dev is 7113 character count for Let the Other 95% of Great Programmers In is 5450 character count for How to Be an Expert in a Changing World is 6331 character count for How You Know is 3712 character count for The Fatal Pinch is 9062 character count for Mean People Fail is 6623 character count for Before the Startup is 25650 character count for How to Raise Money is 60743 character count for Investor Herd Dynamics is 6495 character count for How to Convince Investors is 21020 character count for Do Things that Don't Scale is 25208 character count for Startup Investing Trends is 16711 character count for How to Get Startup Ideas is 40755 character count for The Hardware Renaissance is 2496 character count for Startup = Growth is 31253 character count for Black Swan Farming is 12137 character count for The Top of My Todo List is 1306 character count for Writing and Speaking is 6472 character count for How Y Combinator Started is 7850 character count for Defining Property is 5608 character count for 
Frighteningly Ambitious Startup Ideas is 21294 character count for A Word to the Resourceful is 4514 character count for Schlep Blindness is 4993 character count for Snapshot: Viaweb, June 1998 is 4891 character count for Why Startup Hubs Work is 10287 character count for The Patent Pledge is 4045 character count for Subject: Airbnb is 7462 character count for Founder Control is 4351 character count for Tablets is 3122 character count for What We Look for in Founders is 4545 character count for The New Funding Landscape is 20177 character count for Where to See Silicon Valley is 6139 character count for High Resolution Fundraising is 4226 character count for What Happened to Yahoo is 11893 character count for The Future of Startup Funding is 22365 character count for The Acceleration of Addictiveness is 7475 character count for The Top Idea in Your Mind is 6580 character count for How to Lose Time and Money is 3751 character count for Organic Startup Ideas is 5646 character count for Apple's Mistake is 12454 character count for What Startups Are Really Like is 29329 character count for Persuade xor Discover is 7546 character count for Post-Medium Publishing is 10367 character count for The List of N Things is 7972 character count for The Anatomy of Determination is 9153 character count for What Kate Saw in Silicon Valley is 4743 character count for The Trouble with the Segway is 2149 character count for Ramen Profitable is 10625 character count for Maker's Schedule, Manager's Schedule is 6637 character count for A Local Revolution? is 7955 character count for Why Twitter is a Big Deal is 813 character count for The Founder Visa is 2357 character count for Five Founders is 4221 character count for Relentlessly Resourceful is 5719 character count for How to Be an Angel Investor is 22467 character count for Why TV Lost is 8948 character count for Can You Buy a Silicon Valley? Maybe. 
is 10731 character count for What I've Learned from Hacker News is 16500 character count for Startups in 13 Sentences is 7610 character count for Keep Your Identity Small is 5304 character count for After Credentials is 14034 character count for Could VC be a Casualty of the Recession? is 7838 character count for The High-Res Society is 9091 character count for The Other Half of "Artists Ship" is 7645 character count for Why to Start a Startup in a Bad Economy is 6155 character count for A Fundraising Survival Guide is 27996 character count for The Pooled-Risk Company Management Company is 7400 character count for Cities and Ambition is 20307 character count for Disconnecting Distraction is 6442 character count for Lies We Tell Kids is 29384 character count for Be Good is 16801 character count for Why There Aren't More Googles is 7722 character count for Some Heroes is 15060 character count for How to Disagree is 9166 character count for You Weren't Meant to Have a Boss is 14404 character count for A New Venture Animal is 11226 character count for Trolls is 5105 character count for Six Principles for Making New Things is 6822 character count for Why to Move to a Startup Hub is 8275 character count for The Future of Web Startups is 19623 character count for How to Do Philosophy is 28112 character count for News from the Front is 12582 character count for How Not to Die is 10866 character count for Holding a Program in One's Head is 10770 character count for Stuff is 7073 character count for The Equity Equation is 6136 character count for An Alternative Theory of Unions is 3034 character count for The Hacker's Guide to Investors is 35300 character count for Two Kinds of Judgement is 4435 character count for Microsoft is Dead is 7313 character count for Why to Not Not Start a Startup is 34742 character count for Is It Worth Being Wise? 
is 22030 character count for Learning from Founders is 4850 character count for How Art Can Be Good is 20347 character count for The 18 Mistakes That Kill Startups is 32122 character count for A Student's Guide to Startups is 36137 character count for How to Present to Investors is 16186 character count for Copy What You Like is 5454 character count for The Island Test is 4123 character count for The Power of the Marginal is 34660 character count for Why Startups Condense in America is 27795 character count for How to Be Silicon Valley is 21190 character count for The Hardest Lessons for Startups to Learn is 27225 character count for See Randomness is 3272 character count for Are Software Patents Evil? is 27292 character count for 6,631,372 is 3898 character count for Why YC is 2091 character count for How to Do What You Love is 25907 character count for Good and Bad Procrastination is 10298 character count for Web 2.0 is 19358 character count for How to Fund a Startup is 50992 character count for The Venture Capital Squeeze is 9102 character count for Ideas for Startups is 22457 character count for What I Did this Summer is 14851 character count for Inequality and Risk is 16640 character count for After the Ladder is 3428 character count for What Business Can Learn from Open Source is 24850 character count for Hiring is Obsolete is 27348 character count for The Submarine is 13591 character count for Why Smart People Have Bad Ideas is 17936 character count for Return of the Mac is 5583 character count for Writing, Briefly is 2762 character count for Undergraduation is 20864 character count for A Unified Theory of VC Suckage is 8155 character count for How to Start a Startup is 54656 character count for What You'll Wish You'd Known is 28319 character count for Made in USA is 10820 character count for It's Charisma, Stupid is 8887 character count for Bradley's Ghost is 3636 character count for A Version 1.0 is 24595 character count for What the Bubble Got Right is 
21319 character count for The Age of the Essay is 26296 character count for The Python Paradox is 2767 character count for Great Hackers is 29749 character count for Mind the Gap is 32819 character count for How to Make Wealth is 50601 character count for The Word "Hacker" is 11609 character count for What You Can't Say is 31187 character count for Filters that Fight Back is 4864 character count for Hackers and Painters is 32120 character count for If Lisp is So Great is 2524 character count for The Hundred-Year Language is 28027 character count for Why Nerds are Unpopular is 32078 character count for Better Bayesian Filtering is 25553 character count for Design and Research is 15106 character count for A Plan for Spam is 31716 character count for Revenge of the Nerds is 33991 character count for Succinctness is Power is 17367 character count for What Languages Fix is 1382 character count for Taste for Makers is 25062 character count for Why Arc Isn't Especially Object-Oriented is 2994 character count for What Made Lisp Different is 4266 character count for The Other Road Ahead is 69135 character count for The Roots of Lisp is 2205 character count for Five Questions about Language Design is 17037 character count for Being Popular is 43487 character count for Java's Cover is 7756 character count for Beating the Averages is 25662 character count for Lisp for Web-Based Applications is 322 character count for Programming Bottom-Up is 5478 character count for This Year We Can End the Death Penalty in California is 1019 3066238 TTS cost:45.99357 TTS HD cost:91.98714
My first attempt was very simplistic: loop through all the essays and generate one MP3 per essay:
import time
from openai import OpenAI
client = OpenAI(
    api_key="sk-...",
    organization="org-...",
)
# Iterate over each item in the dictionary
for key, value in processed_text_dict.items():
    # Keep only letters, digits and spaces, then turn spaces into underscores
    filename_key = ''.join(
        e for e in key if e.isalnum() or e in [' ']
    ).replace(' ', '_') + ".mp3"
    speech_file_path = f"./{filename_key}"
    print(f"Generating speech for {key}...")
    try:
        response = client.audio.speech.create(
            model="tts-1-hd",
            voice="fable",
            input=value
        )
        response.stream_to_file(speech_file_path)
        print(f"Saved to {speech_file_path}")
    except Exception as e:
        print(f"An error occurred while processing '{key}': {e}")
    # Brief pause between requests
    time.sleep(1)
print("Finished generating speech files.")
Problem: I quickly discovered that OpenAI's TTS API has a 4,096-character input limit.
Most of Paul Graham's essays are much longer than that, so this approach would either fail or silently truncate the text. The API doesn't offer a straightforward way to handle longer inputs, so doing TTS on anything of useful length is left as an exercise for the reader. I needed a chunking strategy.
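To get a feel for how many API requests the limit implies, here is a rough sketch that counts the chunks an essay would need at 4,096 characters per request. The character counts are copied from the cost-estimate output above; the helper name chunks_needed is hypothetical, not part of any library.

```python
import math

TTS_INPUT_LIMIT = 4096  # per-request character limit described above

def chunks_needed(char_count, limit=TTS_INPUT_LIMIT):
    """Number of requests needed to cover char_count characters."""
    return math.ceil(char_count / limit)

# Character counts copied from the cost-estimate output
sample_counts = {
    "How to Get New Ideas": 793,
    "Having Kids": 8211,
    "How to Do Great Work": 66844,
    "What I Worked On": 74905,
}
for title, n_chars in sample_counts.items():
    print(f"{title}: {n_chars} characters -> {chunks_needed(n_chars)} chunk(s)")
```

Only the shortest of these fits in a single request; the longest needs 19.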
The fix was to split each essay into 4,096-character chunks, generate audio for each chunk separately, then stitch them back together into a single MP3. For the audio concatenation I installed PyDub and ffmpeg:
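!pip install pydub
# pydub also needs the ffmpeg binary on the PATH. As far as I can tell, the PyPI package named ffmpeg does not provide it, so install ffmpeg through your OS package manager (for example apt or brew) if it is not already present.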
Then I built a slightly improved pipeline. Here's the intermediate version I tested first:
from openai import OpenAI
from pydub import AudioSegment
import time
client = OpenAI(
    api_key="sk-...",
    organization="org-...",
)
# Function for splitting text into chunks of size max_len
def split_text(text, max_len):
    chunks = []
    for i in range(0, len(text), max_len):
        chunks.append(text[i:i+max_len])
    return chunks
# Iterate over each item in the dictionary
for key, value in processed_text_dict.items():
    filename_key = ''.join(
        e for e in key if e.isalnum() or e in [' ']
    ).replace(' ', '_')
    print(f"Generating speech for {key}...")
    # Split input text into chunks
    chunks = split_text(value, 4096)
    # Create an empty list to hold the audio chunks
    audio_chunks = []
    for i, chunk in enumerate(chunks):
        try:
            response = client.audio.speech.create(
                model="tts-1-hd",
                voice="fable",
                input=chunk
            )
            chunk_file_path = f"./{filename_key}_{i}.mp3"
            response.stream_to_file(chunk_file_path)
            audio_chunks.append(
                AudioSegment.from_file(chunk_file_path)
            )
        except Exception as e:
            print(
                f"An error occurred while processing "
                f"'{key}', chunk {i}: {e}"
            )
            continue
        time.sleep(1)
    # Concatenate all the audio chunks
    combined = sum(audio_chunks, AudioSegment.empty())
    all_file_path = f"./{filename_key}.mp3"
    combined.export(all_file_path, format="mp3")
    print(f"All chunks of '{key}' saved to {all_file_path}")
print("Finished generating speech files.")
This worked, but it dumped everything (chunks and final files) into the same directory, which was confusing. I then made a slightly updated version to clean this up.
This version, still very much a toy, adds a proper directory structure: chunk files go into ./chunked/ and final concatenated MP3s go into ./final/:
import os
from openai import OpenAI
from pydub import AudioSegment
import time
# Create directories for chunked and final files
os.makedirs('chunked', exist_ok=True)
os.makedirs('final', exist_ok=True)
client = OpenAI(
    api_key="sk-...",
    organization="org-...",
)
# Function for splitting text into chunks of size max_len
def split_text(text, max_len):
    chunks = []
    for i in range(0, len(text), max_len):
        chunks.append(text[i:i+max_len])
    return chunks
# Iterate over each item in the dictionary
for key, value in processed_text_dict.items():
    filename_key = ''.join(
        e for e in key if e.isalnum() or e in [' ']
    ).replace(' ', '_')
    print(f"Generating speech for {key}...")
    # Split input text into chunks
    chunks = split_text(value, 4096)
    # Create an empty list to hold the audio chunks
    audio_chunks = []
    for i, chunk in enumerate(chunks):
        try:
            response = client.audio.speech.create(
                model="tts-1-hd",
                voice="fable",
                input=chunk
            )
            # Save to 'chunked' directory
            chunk_file_path = (
                f"./chunked/{filename_key}_{i}.mp3"
            )
            response.stream_to_file(chunk_file_path)
            # Add this audio chunk to the list
            audio_chunks.append(
                AudioSegment.from_file(chunk_file_path)
            )
        except Exception as e:
            print(
                f"An error occurred while processing "
                f"'{key}', chunk {i}: {e}"
            )
            continue
        time.sleep(1)
    # Concatenate all the audio chunks
    combined = sum(audio_chunks, AudioSegment.empty())
    # Save the final file to the 'final' directory
    all_file_path = f"./final/{filename_key}.mp3"
    combined.export(all_file_path, format="mp3")
    print(
        f"All chunks of '{key}' saved to {all_file_path}"
    )
print("Finished generating speech files.")
Success! The script processed all ~200 essays without errors. Every essay was split into chunks, converted to audio, and concatenated into a single MP3 in the ./final/ directory.
So for the not-so-low price of $92 I had 200 Paul Graham audio files that I have since listened to on my phone. Here is one of the files as an example, from Paul Graham's essay "When To Do What You Love":
As stated before, this is still very much a toy proof of concept. There are numerous things that could be improved, such as:
Smarter chunking. My split_text() function splits on a hard character count, which means it can cut mid-sentence or even mid-word. This occasionally creates an awkward extra pause or a mispronunciation at chunk boundaries. A better approach would be to split on sentence or paragraph boundaries.
Text cleanup. The extracted text includes footnote markers like [ 1 ] and occasional artifacts like "Polish Translation French Translation" at the end of some essays. Pre-processing the text to strip these would make for cleaner narration.
Resumability/error handling. If the script fails halfway through (rate limit, network error, etc.), a rerun starts over from scratch. Adding a check for already-generated files would make it idempotent.
Sync execution → async. Currently the code runs completely synchronously. I could swap to the async OpenAI client to speed up execution.
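The first three ideas can be sketched without touching the API at all. What follows is a minimal sketch under stated assumptions, not a tested replacement for the pipeline: clean_text, split_on_sentences and missing_chunks are hypothetical helper names, the sentence detection is a naive regex (it will mis-split on abbreviations), and the cleanup only handles numeric footnote markers, not the translation links.

```python
import re
from pathlib import Path

def clean_text(text):
    """Strip numeric footnote markers such as '[ 1 ]' and collapse whitespace."""
    text = re.sub(r"\s*\[\s*\d+\s*\]", "", text)
    return re.sub(r"\s+", " ", text).strip()

def split_on_sentences(text, max_len=4096):
    """Greedily pack whole sentences into chunks of at most max_len characters.

    A single sentence longer than max_len falls back to a hard split.
    """
    # Naive sentence boundary: whitespace that follows '.', '!' or '?'
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        if not sentence:
            continue
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= max_len:
            current = candidate
            continue
        if current:
            chunks.append(current)
        # Hard-split an oversized sentence; the remainder stays open
        while len(sentence) > max_len:
            chunks.append(sentence[:max_len])
            sentence = sentence[max_len:]
        current = sentence
    if current:
        chunks.append(current)
    return chunks

def missing_chunks(chunk_dir, filename_key, n_chunks):
    """Indices of chunk files that are absent or empty, so a rerun can skip the rest."""
    chunk_dir = Path(chunk_dir)
    todo = []
    for i in range(n_chunks):
        path = chunk_dir / f"{filename_key}_{i}.mp3"
        if not path.exists() or path.stat().st_size == 0:
            todo.append(i)
    return todo
```

In the loop above, this would mean calling split_on_sentences(clean_text(value)) in place of split_text(value, 4096), and only requesting audio for the indices returned by missing_chunks('chunked', filename_key, len(chunks)). Already-saved chunks would then need to be loaded from disk before concatenation.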