Add Voice to Any Website Using OpenAI's Text-to-Speech (TTS)

One of the new releases at OpenAI's DevDay was their first Text-to-Speech (TTS) model.

As AI continues to proliferate, the ability to convert written content into audio can be quite useful for publishers looking to enhance their user experience. Whether it's for accessibility, convenience, or simply because many people would rather listen than read, text-to-speech (TTS) models are rapidly gaining popularity.

In this guide, let's walk through a simple example of how to turn any blog post into audio using OpenAI's text-to-speech model, including:

Scraping the text from a web page or blog post
Converting the text to speech
Combining multiple audio files

💡

Access the full code for this tutorial.

Step 1: Setting Up the Environment

To get started, we'll be working in a Colab notebook, so let's install the following libraries:

beautifulsoup4 for web scraping
openai for the TTS model
pydub for audio processing (it relies on ffmpeg, which Colab already includes, for MP3 support)

!pip install beautifulsoup4 openai pydub

Next, we'll add our OpenAI API key to the Colab Secrets manager and set it as an environment variable:

import os
from google.colab import userdata

# Retrieving the API key from Colab Secrets
OPENAI_API_KEY = userdata.get('OPENAI_API_KEY')

# Setting the API key as an environment variable
os.environ["OPENAI_API_KEY"] = OPENAI_API_KEY
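The snippet above only runs inside Colab, since google.colab is not available elsewhere. If you'd like the same notebook code to work locally too, here's a minimal sketch of a fallback. The helper name load_openai_api_key is hypothetical (it's not part of any library), and it assumes that outside Colab you have already exported OPENAI_API_KEY in your shell.

```python
import os


def load_openai_api_key(env_var="OPENAI_API_KEY"):
    """Return the API key from Colab Secrets when available, else from the environment.

    Hypothetical helper for illustration; not part of the openai or Colab libraries.
    """
    try:
        # Only importable inside a Colab runtime.
        from google.colab import userdata
        key = userdata.get(env_var)
    except ImportError:
        # Outside Colab: fall back to a variable exported in the shell.
        key = os.environ.get(env_var)

    if not key:
        raise RuntimeError(f"{env_var} is not set")

    # The OpenAI client reads the key from this environment variable by default.
    os.environ[env_var] = key
    return key
```

Calling load_openai_api_key() once at the top of the notebook would then replace the two Colab-specific lines above.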

Step 2: Scraping Website Content

Next, let's use requests to fetch the page and BeautifulSoup to parse the HTML.

For this example, I'll be scraping my own posts. We only want the article body rather than the whole page, so we'll target a specific div class. In my case the content is found within the class f-article-content js-article-content, although this will differ from site to site.

Fetching Webpage: Use requests to retrieve the HTML of the specified URL.
Parsing HTML: Use BeautifulSoup to parse the HTML content.
Extracting Specific Content: Locate and extract text from a specific div element.

import requests
from bs4 import BeautifulSoup

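def scrape_website_content(url):
    response = requests.get(url, timeout=30)
    response.raise_for_status()  # fail fast on HTTP errors
    soup = BeautifulSoup(response.content, 'html.parser')
    # Update the class to match the div that wraps the target content
    main_content = soup.find('div', class_='f-article-content js-article-content')
    if main_content is None:
        raise ValueError(f"No matching content div found at {url}")
    # separator=' ' keeps text from adjacent elements from running together
    return main_content.get_text(separator=' ', strip=True)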

Step 3: Converting Text to Speech

Next, let's write a function to convert text to speech. Since OpenAI's TTS model has a maximum input limit of 4096 characters, we'll break the text into chunks and convert each chunk into a separate audio file.

Initialize TTS Client: Creates an instance of OpenAI's TTS client.
Chunking Text: Divides the text into chunks to adhere to character limits.
Converting to Speech: Converts each text chunk into an audio file and saves it.

from openai import OpenAI
from pathlib import Path

def text_to_speech(text, voice_type="alloy"):
    client = OpenAI()
    max_length = 4096
    chunks = [text[i:i+max_length] for i in range(0, len(text), max_length)]
    audio_files = []

    for index, chunk in enumerate(chunks):
        speech_file_path = Path(f"speech_part_{index+1}.mp3")
        response = client.audio.speech.create(
            model="tts-1",
            voice=voice_type,
            input=chunk
        )
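        # Newer openai SDK releases flag stream_to_file as deprecated on this
        # response type; write_to_file saves the full audio body to disk.
        response.write_to_file(speech_file_path)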
        audio_files.append(speech_file_path)

    return audio_files
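One thing to watch with the fixed-width slicing above: a chunk boundary can land in the middle of a word or sentence, which tends to produce an audible glitch where two audio files meet. Here's a rough sketch of a sentence-aware alternative. The function name chunk_text is my own, and the sentence detection is a simple regex assumption (split after ., ! or ? followed by whitespace), so it won't handle every abbreviation or edge case.

```python
import re


def chunk_text(text, max_length=4096):
    """Split text into chunks of at most max_length characters,
    breaking at sentence boundaries where possible."""
    # Naive sentence split: after ., ! or ? followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""

    for sentence in sentences:
        # A single sentence longer than the limit has to be hard-split.
        while len(sentence) > max_length:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:max_length])
            sentence = sentence[max_length:]
        if not sentence:
            continue

        # Add the sentence to the current chunk if it still fits.
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= max_length:
            current = candidate
        else:
            chunks.append(current)
            current = sentence

    if current:
        chunks.append(current)
    return chunks
```

To try it, you could swap the list comprehension in text_to_speech for chunks = chunk_text(text, max_length) and leave the rest of the function unchanged.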

Step 4: Combining Audio Files

Now that we have individual audio files, we can use pydub to combine them into a single audio file.

Create an Empty Audio Segment: Initializes an empty audio segment to concatenate audio files.
Append Audio Files: Iterates through the list of audio files, appending each to the combined segment.
Export Combined Audio: Saves the concatenated audio as a single MP3 file.

from pydub import AudioSegment

def combine_audio_files(audio_files, output_file="combined_speech.mp3"):
    combined = AudioSegment.empty()
    for file in audio_files:
        audio = AudioSegment.from_mp3(file)
        combined += audio
    combined.export(output_file, format="mp3")
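Note that combine_audio_files joins the files in whatever order the list provides. The list returned by text_to_speech is already in order, but if you later rebuild it from disk (for example with glob), a plain alphabetical sort would put speech_part_10.mp3 before speech_part_2.mp3. A small sketch of a numeric sort, assuming the speech_part_<n>.mp3 naming used above (sort_audio_parts is a hypothetical helper name):

```python
import re
from pathlib import Path


def sort_audio_parts(paths):
    """Order speech_part_<n>.mp3 files by their numeric index, not alphabetically."""

    def part_number(path):
        # Pull the digits that sit directly before the .mp3 extension.
        match = re.search(r"(\d+)\.mp3$", Path(path).name)
        return int(match.group(1)) if match else 0

    return sorted(paths, key=part_number)
```

For example, combine_audio_files(sort_audio_parts(Path(".").glob("speech_part_*.mp3"))) would stitch the parts together in the order they were generated.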

Optional: Editing the Transcript

Finally, to refine the content or make it more audio-friendly (like replacing code snippets with descriptive text), you can save the scraped content to a file, manually edit it, and then use the edited version for TTS.

Save Original Content: Writes the scraped text to a file for manual editing.
Manual Editing: The user edits the text file as needed to refine the content and re-uploads it as edited_transcript.txt.
Read Edited Content: Loads the edited text for TTS conversion.

# Saving the scraped content
url = "https://example.com/blog-post"  # placeholder: replace with the page to scrape
file_path = "original_transcript.txt"
with open(file_path, "w", encoding="utf-8") as file:
    file.write(scrape_website_content(url))

# After editing, read the edited file
edited_file_path = "edited_transcript.txt"
with open(edited_file_path, "r", encoding="utf-8") as file:
    edited_transcript = file.read()

# Convert the edited transcript to speech
audio_files = text_to_speech(edited_transcript)
combine_audio_files(audio_files)
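Some of the cleanup can be done mechanically before you open the file, which leaves less to fix by hand. The sketch below is only an illustration of that idea: clean_transcript is a hypothetical helper, and the rules it applies (dropping bare URLs, collapsing whitespace) are assumptions about what usually sounds awkward when read aloud, not a complete list.

```python
import re


def clean_transcript(text):
    """Apply light, mechanical cleanup to scraped text before manual editing."""
    text = re.sub(r"https?://\S+", "", text)  # drop bare URLs, which read poorly aloud
    text = re.sub(r"[ \t]+", " ", text)       # collapse runs of spaces and tabs
    text = re.sub(r" ?\n ?", "\n", text)      # trim spaces around line breaks
    text = re.sub(r"\n{3,}", "\n\n", text)    # keep at most one blank line in a row
    return text.strip()
```

You could run the scraped text through clean_transcript before writing original_transcript.txt, then handle the judgment calls (like describing code snippets in words) during the manual pass.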

Summary: Voice with OpenAI's Text-to-Speech

In this guide, we saw how simple it is to use OpenAI's text-to-speech model and add audio to any website.

Whether you're creating audiobooks, enhancing accessibility, or repurposing written content, this new model opens up a wide range of possibilities for publishers and content creators.