MLQ Academy: Build a YouTube Video Assistant Using Whisper & GPT-3
In this video tutorial, we'll walk through a Colab notebook on how to use Whisper and GPT-3 to build a YouTube assistant that can transcribe videos and allow users to ask questions about them.
In particular, we'll use Whisper to transcribe the video, and then prepare the data by splitting the transcript into smaller subsections that contain relevant context for users' questions. As OpenAI highlights:
Whisper is an automatic speech recognition (ASR) system trained on 680,000 hours of multilingual and multitask supervised data collected from the web.
Whisper approaches human level robustness and accuracy on English speech recognition.
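The transcribe-and-split step can be sketched as follows. The model size (`"base"`), the input filename, and the `split_transcript` helper with its word-window sizes are illustrative assumptions, not taken from the notebook; the Whisper call is shown commented since it requires the `openai-whisper` package and a media file.

```python
# Assumed Whisper usage (requires: pip install openai-whisper):
# import whisper
# model = whisper.load_model("base")            # "base" is an illustrative model size
# transcript = model.transcribe("video.mp4")["text"]

def split_transcript(text: str, max_words: int = 50, overlap: int = 10) -> list[str]:
    """Split a transcript into overlapping word windows so each chunk
    keeps some surrounding context for question answering.
    The window and overlap sizes here are arbitrary example values."""
    words = text.split()
    chunks = []
    step = max_words - overlap
    for start in range(0, len(words), step):
        chunk = " ".join(words[start:start + max_words])
        if chunk:
            chunks.append(chunk)
        if start + max_words >= len(words):
            break
    return chunks
```

Overlapping the windows is a common retrieval trick: it reduces the chance that the answer to a question is cut in half at a chunk boundary.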
We then use the OpenAI Embeddings API to get the document and query embeddings, and the Completions API to answer questions and write summaries of the video. As OpenAI highlights:
OpenAI’s text embeddings measure the relatedness of text strings. The distance between two vectors measures their relatedness. Small distances suggest high relatedness and large distances suggest low relatedness.
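A minimal sketch of that distance-based retrieval, using plain Python in place of API calls. In the notebook the vectors would come from the Embeddings API (e.g. `openai.Embedding.create(...)`) and the answer from the Completions API; the `most_related` helper and the toy vectors below are illustrative assumptions.

```python
from math import sqrt

def cosine_distance(a: list[float], b: list[float]) -> float:
    """Cosine distance between two embedding vectors:
    near 0 means high relatedness, larger means low relatedness."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(x * x for x in b))
    return 1.0 - dot / (norm_a * norm_b)

def most_related(query_emb: list[float], chunk_embs: list[list[float]]) -> int:
    """Index of the transcript chunk whose embedding is closest to the query."""
    return min(range(len(chunk_embs)),
               key=lambda i: cosine_distance(query_emb, chunk_embs[i]))

# The retrieved chunk would then be placed into a prompt for the
# Completions API, along the lines of (hypothetical call, not run here):
# openai.Completion.create(model="text-davinci-003",
#                          prompt=f"Context: {chunk}\n\nQuestion: {question}\nAnswer:")
```

With real embeddings, the same two functions let you rank every transcript chunk against the user's question and feed only the closest ones to the model.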
In the next video, we'll use these functions to create a simple web app using Streamlit that allows users to upload their own videos and ask questions about them.