One of the new releases at OpenAI's DevDay was their first Text-to-Speech (TTS) model.
As AI continues to proliferate, converting written content into audio can help publishers improve their user experience. Whether it's for accessibility, convenience, or simply because many people would rather listen than read, TTS models are rapidly gaining popularity.
In this guide, we'll walk through a simple example of how to turn any blog post into audio using OpenAI's text-to-speech model, including:
- Scraping the text from a web page or blog post
- Converting text to speech
- Combining multiple audio files
Step 1: Setting Up the Environment
To get started, we'll be working in a Colab notebook, so let's install the following libraries:
- beautifulsoup4 for web scraping
- openai for the TTS model
- pydub for audio processing
!pip install beautifulsoup4 openai pydub
Next, we'll add our OpenAI API key to the Colab Secrets manager and set it as an environment variable:
import os
from google.colab import userdata
# Retrieving the API key from Secret Manager
OPENAI_API_KEY = userdata.get('OPENAI_API_KEY')
# Setting the API key as an environment variable
os.environ["OPENAI_API_KEY"] = OPENAI_API_KEY
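The OpenAI Python client reads the key from the OPENAI_API_KEY environment variable by default, so it can help to confirm the variable is actually set before making any requests, especially when running outside Colab where google.colab.userdata is not available. Below is a minimal sketch of such a check; the helper name load_api_key is hypothetical and not part of any library.

```python
import os

def load_api_key(env=os.environ, name="OPENAI_API_KEY"):
    """Return the API key stored under `name` in `env`, or raise if it is missing."""
    # Treat a missing or whitespace-only value as "not set"
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(
            f"{name} is not set; add it to Colab Secrets or export it in your shell."
        )
    return key

# Example usage (raises RuntimeError if the variable is absent):
# api_key = load_api_key()
```

Failing here gives a clearer error than discovering the missing key on the first API call.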
Step 2: Scraping Website Content
Next, let's use requests to fetch the page and BeautifulSoup to parse the HTML.
For this example, I'll be scraping my own posts. We want to target the specific content within the page, so we'll set a target div class. In my case, the content is found within the class f-article-content js-article-content, although this will change depending on the site.
- Fetching Webpage: Use requests to retrieve the HTML of the specified URL.
- Parsing HTML: Use BeautifulSoup to parse the HTML content.
- Extracting Specific Content: Locate and extract text from a specific div element.
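import requests
from bs4 import BeautifulSoup
def scrape_website_content(url):
    # Fetch the page and fail early on HTTP errors (4xx/5xx)
    response = requests.get(url, timeout=30)
    response.raise_for_status()
    soup = BeautifulSoup(response.content, 'html.parser')
    # Update to the div class of your target content
    main_content = soup.find('div', class_='f-article-content js-article-content')
    if main_content is None:
        raise ValueError("Target content div not found; check the class name")
    # Separate text nodes with spaces so adjacent elements don't run together
    return main_content.get_text(separator=' ', strip=True)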
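Scraped text often carries stray newlines, tabs, and repeated spaces from the page's HTML layout. Whether that affects the generated audio is not something the original covers, but normalizing it also makes the transcript easier to read and edit later. Here is a minimal sketch using only the standard library; clean_scraped_text is a hypothetical helper name.

```python
import re

def clean_scraped_text(text):
    """Collapse every run of whitespace into a single space and trim the ends."""
    return re.sub(r"\s+", " ", text).strip()

# Example usage on a messy string
print(clean_scraped_text("  Hello\n\n  world \t!"))
```

You could apply this to the string returned by scrape_website_content before passing it on to the TTS step.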
Step 3: Converting Text to Speech
Next, let's write a function to convert text to speech. Since OpenAI's TTS model has a maximum input limit of 4096 characters, we'll break the text into chunks and convert each chunk into speech in separate audio files.
- Initialize TTS Client: Creates an instance of OpenAI's TTS client.
- Chunking Text: Divides the text into chunks to adhere to character limits.
- Converting to Speech: Converts each text chunk into an audio file and saves it.
from openai import OpenAI
from pathlib import Path
def text_to_speech(text, voice_type="alloy"):
    client = OpenAI()
    max_length = 4096
    chunks = [text[i:i+max_length] for i in range(0, len(text), max_length)]
    audio_files = []
    for index, chunk in enumerate(chunks):
        speech_file_path = Path(f"speech_part_{index+1}.mp3")
        response = client.audio.speech.create(
            model="tts-1",
            voice=voice_type,
            input=chunk
        )
        response.stream_to_file(speech_file_path)
        audio_files.append(speech_file_path)
    return audio_files
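One thing to note about the fixed-width slicing above is that it can end a chunk in the middle of a word or sentence, which may produce an audible break where two audio files meet. The sketch below is one possible alternative, not part of the original tutorial: it packs whole sentences into each chunk and only falls back to hard slicing when a single sentence exceeds the limit. The function name split_into_chunks and the simple punctuation-based sentence rule are assumptions made for illustration.

```python
import re

def split_into_chunks(text, max_length=4096):
    """Split text into chunks of at most max_length characters,
    preferring to break at sentence boundaries."""
    # Naive sentence split: break on whitespace that follows ., ! or ?
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        # A single sentence longer than the limit is sliced at fixed width
        while len(sentence) > max_length:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:max_length])
            sentence = sentence[max_length:]
        if not sentence:
            continue
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= max_length:
            current = candidate
        else:
            # Adding this sentence would overflow, so start a new chunk
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks

# Example usage with a tiny limit to make the behavior visible
print(split_into_chunks("One. Two! Three? Four.", max_length=10))
```

Inside text_to_speech, the list comprehension that builds chunks could be swapped for a call to this helper, with the rest of the function left unchanged.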
Step 4: Combining Audio Files
Now that we have individual audio files, we can use pydub to combine them into a single audio file.
- Create an Empty Audio Segment: Initializes an empty audio segment to concatenate audio files.
- Append Audio Files: Iterates through the list of audio files, appending each to the combined segment.
- Export Combined Audio: Saves the concatenated audio as a single MP3 file.
from pydub import AudioSegment
def combine_audio_files(audio_files, output_file="combined_speech.mp3"):
    combined = AudioSegment.empty()
    for file in audio_files:
        audio = AudioSegment.from_mp3(file)
        combined += audio
    combined.export(output_file, format="mp3")
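The list returned by text_to_speech is already in playback order. If you instead collect the parts from disk later (for example with a glob over speech_part_*.mp3), plain alphabetical sorting places speech_part_10.mp3 before speech_part_2.mp3. Here is a minimal sketch of sorting by the numeric part index; sort_audio_parts is a hypothetical helper, and it assumes the speech_part_<n>.mp3 naming used above.

```python
import re
from pathlib import Path

def sort_audio_parts(paths):
    """Order files named like speech_part_<n>.mp3 by their numeric part index."""
    def part_number(path):
        # Capture the digits that sit directly before the file extension
        match = re.search(r"(\d+)(?=\.\w+$)", Path(path).name)
        return int(match.group(1)) if match else 0
    return sorted(paths, key=part_number)

# Example usage with names in arbitrary order
print(sort_audio_parts(["speech_part_10.mp3", "speech_part_2.mp3", "speech_part_1.mp3"]))
```

The sorted list can then be passed straight to combine_audio_files.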
Optional: Editing the Transcript
Finally, to refine the content or make it more audio-friendly (like replacing code snippets with descriptive text), you can save the scraped content to a file, manually edit it, and then use the edited version for TTS.
- Save Original Content: Writes the scraped text to a file for manual editing.
- Manual Editing: The user edits the text file as needed to refine the content and re-uploads it as edited_transcript.txt.
- Read Edited Content: Loads the edited text for TTS conversion.
# Saving the scraped content (url must be set to the address of the post to convert)
file_path = "original_transcript.txt"
with open(file_path, "w", encoding="utf-8") as file:
    file.write(scrape_website_content(url))
# After editing, read the edited file
edited_file_path = "edited_transcript.txt"
with open(edited_file_path, "r", encoding="utf-8") as file:
    edited_transcript = file.read()
# Convert the edited transcript to speech
audio_files = text_to_speech(edited_transcript)
combine_audio_files(audio_files)
Summary: Voice with OpenAI's Text-to-Speech
In this guide, we saw how simple it is to use OpenAI's text-to-speech model and add audio to any website.
Whether creating audiobooks, enhancing accessibility, or re-purposing written content, this new model opens up a wide range of possibilities for publishers and content creators.