All tutorials

Use LangChain, Deepgram, and Mistral 7B to Build a Youtube Video Summarization App

22 min

In the vast ocean of content that is YouTube, the ability to quickly and accurately summarize video content is not just a luxury, but a necessity for many users and businesses.

Whether you're a content creator wanting to provide concise summaries for your audience, a researcher looking to extract key information, or a business aiming to analyze video content at scale, having the right tools and techniques is crucial.

This guide delves deep into the world of YouTube video summarization, harnessing the power of cutting-edge technologies including Deepgram for superior audio transcription, Langchain for harvesting the power of the LLM, and Mistral 7B, a state-of-the-art and open-source LLM.

Together, these technologies form a formidable trio, enabling us to extract, process, and summarize video content with unparalleled accuracy and efficiency.

In this tutorial, you'll construct a fully functional Streamlit application from the ground up. Streamlit lets you turn simple data scripts into web applications without traditional front-end tools. This application will be capable of downloading audio from any YouTube video, transcribing it using Deepgram, and then summarizing the content with the assistance of Mistral 7B, all streamlined through the capabilities of Langchain.

You can deploy and preview the YouTube Summarization application from this guide using the Deploy to Koyeb button below:

Deploy to Koyeb

Note: Remember to use a larger instance for faster processing times and to replace the value of the DEEPGRAM_API_KEY environment variable with your own information (as described in the section on integrating Deepgram).


To successfully develop this application, you will need the following:

  • Python installed on your machine, here we use version 3.11
  • A Koyeb account to deploy the application
  • A Deepgram account for using their API. Deepgram offers new accounts $200 credit without providing credit card details.
  • A GitHub account to store your application code and trigger deployments


  1. Install and setup Streamlit
  2. Download audio with yt-dip
  3. Transcribe audio with Deepgram
  4. Summarize transcript with Langchain and Mistral 7B
  5. Combine everything together in the Streamlit app
  6. Deploy to Koyeb

Install and Setup Streamlit

First, start by creating a new project. You should use venv to keep your Python dependencies organized in a virtual environment.

Create a new project/folder locally on your computer with:

# Create and move to the new folder
mkdir YouTubeSummarizer
cd YouTubeSummarizer

# Create a virtual environment
python -m venv venv

# Active the virtual environment (Windows)

# Active the virtual environment (Linux and MacOS)
source ./venv/bin/activate

To install Streamlit, you just need to run the pip command:

pip install streamlit

A Streamlit application can consist of more than one page, but in this case, you will create a single-page application.

Streamlit allows you to design the page by adding different components to that page. The components can be text, like headings and sub-headings, but also objects like input widgets and download buttons.

To start your Streamlit application, create a new file called, with the following initial contents:

import streamlit as st

# Set page title
st.set_page_config(page_title="YouTube Video Summarization", page_icon="📜", layout="wide")

# Set title
st.title("YouTube Video Summarization", anchor=False)
st.header("Summarize YouTube videos with AI", anchor=False)

This code snippet uses the Streamlit library to set up the web application interface:

  • The code starts by importing the Streamlit library, which is aliased as st for easier reference throughout the code.
  • Next, the st.set_page_config function is called to configure the page settings. The page title is set to "YouTube Video Summarization", a scroll emoji is set as the page icon, and the layout is set to "wide" which provides a wider layout compared to the default.
  • Following the page configuration, the st.title and st.header functions are called to set the main title and header of the page respectively. Additionally, the anchor parameter is specified as False in both functions, indicating that there won't be HTML anchor links associated with the title and header.

This setup lays down the basic structure and design of the web page for the YouTube Video Summarization application, creating a user-friendly interface.

You can run the application with:

streamlit run

You will add more elements to this code when you build the final application in the last section, but for now you have the initial basic Streamlit application:

Your initial Streamlit application

Download audio with yt-dlp

To download audio from YouTube videos, you'll utilize the widely used yt-dlp library, which can be installed using the pip command as follows:

pip install yt-dlp

Now, let's proceed to craft the download logic to retrieve the audio file from a YouTube video.

For better organization and to keep the logic separate, you'll place this logic in a new file. Go ahead and create a file named

from yt_dlp import YoutubeDL

def download_audio_from_url(url):
    videoinfo = YoutubeDL().extract_info(url=url, download=False)
    length = videoinfo['duration']
    filename = f"./audio/youtube/{videoinfo['id']}.mp3"
    options = {
        'format': 'bestaudio/best',
        'keepvideo': False,
        'outtmpl': filename,
    with YoutubeDL(options) as ydl:[videoinfo['webpage_url']])
    return filename, length

# Testing by running this file
if __name__ == "__main__":
    url = ""
    filename, length = download_audio_from_url(url)
    print(f"Audio file: {filename} with length {length} seconds")

Here’s a breakdown of what each section of the code does:

  • The YoutubeDL class is imported from the yt_dlp library. It is used to interact with and download video/audio from YouTube.
  • An instance of YoutubeDL is created and its extract_info method is called with the url argument to gather information about the video without downloading it (download=False). The information is stored in the videoinfo variable.
  • The filename variable constructs a file path where the audio will be saved, using the video's unique id and an .mp3 extension.
  • An options dictionary is created to specify the download options:
    • 'format': 'bestaudio/best' specifies to download the best audio quality available.
    • 'keepvideo': False indicates that the video portion should not be kept after downloading.
    • 'outtmpl': filename specifies the output template for the filename.
  • A new YoutubeDL instance is created with the options dictionary as the argument, and within a with block, its download method is called with the video URL to download the audio.

This function encapsulates the process of downloading audio from a YouTube video, preparing the necessary file path and download options, and utilizing the yt_dlp library to perform the audio download.

You can test the audio download with the sample URL provided in the code or you can use your own. Run the file with:


A similar output to this will be shown:

[youtube] Extracting URL:
[youtube] q_eMJiOPZMU: Downloading webpage
[download] Destination: audio\youtube\q_eMJiOPZMU.mp3
[download] 100% of    3.09MiB in 00:00:00 at 16.97MiB/s
Audio file: ./audio/youtube/q_eMJiOPZMU.mp3 with length 242 seconds

You can check the audio file at the path mentioned in the output.

Transcribe audio with Deepgram

To summarize the video, a transcript is required. In the realm of audio or video recordings, a transcript is a document encompassing the textual representation of all spoken content, serving as a medium to access and read the recording's content in text form.

For transcribing, you'll use the remarkable Deepgram library. You can register for free and receive a $200 credit without providing credit card details.

Post registration, you'll gain access to the Dashboard where you can generate the necessary API key:

Deepgram Dashboard - Creating an API key

You can define your API key with these options (you can give it a different name if you would like):

Deepgram API key settings

Then just click on the 'Create Key' button to create your API key and you will see the newly generated key:

Your Deepgram API key

Make sure to copy this key to a safe place, like a text file. You will need it later on.

To keep your API key safe and not exposed in the code, create a .env file where you will keep any necessary secrets:

DEEPGRAM_API_KEY=<your key here>

To use the keys from the .envfile, you will use the python-decouple library, which can be installed with:

pip install python-decouple

You should also install the Deepgram SDK:

pip install deepgram-sdk

Now you have all the necessary configuration to write the transcribe functionality.

Create a new file called where you will place this logic:

from decouple import config
from deepgram import Deepgram


def transcribe_audio(filename):
    dg_client = Deepgram(DEEPGRAM_API_KEY)
    with open(filename, 'rb') as audio:
        source = {'buffer': audio, 'mimetype': 'audio/mp3'}
        response = dg_client.transcription.sync_prerecorded(source,
        transcript = response['results']['channels'][0]['alternatives'][0]['transcript']
        return transcript

Here's a breakdown of each part of the code:

  • The function transcribe_audio is defined with a parameter filename, which is expected to be the path to the audio file to be transcribed.
  • An instance of the Deepgram client is created named dg_client, using DEEPGRAM_API_KEY as the authentication key, which comes from the .env file.
  • The with open(filename, 'rb') as audio: line is using a context manager to open the specified audio file in binary read mode ('rb'). This ensures that the file is automatically closed after use.
  • A dictionary named source is created to hold the audio buffer and its mime type. The buffer key holds the audio file object, and the mimetype key specifies that the audio file is in MP3 format.
  • A transcription request is made to Deepgram using the sync_prerecorded method of the dg_client object. The method is passed several arguments:
    • source: The source dictionary containing the audio buffer and mime type.
    • model: Specifies the model to be used for transcription, in this case, nova-2-ea.
    • smart_format: When set to True, this option enables smart formatting of the transcript.
  • The transcript text is extracted from the response dictionary by navigating through its nested structure: response['results']['channels'][0]['alternatives'][0]['transcript'].

This function encapsulates the process of transcribing an audio file using the Deepgram API, extracting the transcript text from the response, and returning it.

Summarize transcript with Langchain and Mistral 7B

So far you have downloaded an audio file from YouTube and created a transcript. Now is the time to create a summary of that transcript of the orginal video, so that it captures the most important points.

For this, you will use the Langchain library, a powerful toolkit for orchestrating language models. At the core of this orchestration is the Map-Reduce pattern, powered by Langchain's MapReduceDocumentsChain, to process the transcript in a systematic and scalable manner.

It loads the transcript and then segments it into manageable chunks, which are individually summarized (mapped) before being consolidated into a final summary (reduced).

You will also leverage the prowess of Mistral 7B, a state-of-the-art language model, orchestrated through a series of chains and templates provided by Langchain.

  • In the use of the Langchain library, "chains" refer to a sequence of processing steps where the output from one step is used as the input for the next, facilitating complex text processing tasks like summarization through an orchestrated flow.
  • On the other hand, "templates" are predefined textual frameworks called "PromptTemplates" which structure the prompts given to language models, incorporating placeholders and specific instructions that mold the model's outputs towards a targeted outcome, ensuring consistency and direction in the tasks being executed.

The final result is a well-organized, consolidated summary of the main points from the transcript, achieved with a high degree of efficiency and accuracy.

Start by installing the Langchain library, transformers, and ctransformers, as usual with a pip command:

pip install langchain ctransformers transformers

The transformers library is an open-source, state-of-the-art machine-learning library developed by Hugging Face. It provides a vast collection of pre-trained models specialized in various NLP tasks, including text classification, summarization, translation, and question-answering.

Now you can create a new file called that will contain the summarization logic:

import os
import time
from langchain.chains import MapReduceDocumentsChain, LLMChain, ReduceDocumentsChain, StuffDocumentsChain
from langchain.llms import CTransformers
from langchain.prompts import PromptTemplate
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.document_loaders import (TextLoader)

# (... code continues in following snippets)

First, you start with the imports for the necessary chains, prompt template, text splitter, and also load the transcript text.

To run Mistral 7B, you will use the CTransformers library from Langchain itself.

Next, you expand the file to load the transcript and LLM (Mistral 7B):

# (... previous code ...)

def summarize_transcript(filename):
    # Load transcript
    loader = TextLoader(filename)
    docs = loader.load()

    # Load LLM
    config = {'max_new_tokens': 4096, 'temperature': 0.7, 'context_length': 4096}
    llm = CTransformers(model="TheBloke/Mistral-7B-Instruct-v0.1-GGUF",

Here is the breakdown of the code:

  • A function named summarize_transcript is defined with filename as its argument, which is intended to hold the path to the transcript file.
  • An instance of TextLoader is created and the load method is called to load the contents of the transcript file into a variable named docs.
  • A language model instance (CTransformers) is initialized using a specific model and configuration, which is stored in a variable named llm.
  • The config variable holds the LLM configuration:
    • max_new_tokens: This specifies the maximum number of tokens that the language model is allowed to generate in a single invocation. In this case, the model should not generate more than 4096 tokens.
    • temperature: This controls the randomness of the output generated by the language model. A temperature closer to 0 makes the model more deterministic, favoring more likely outcomes. A higher temperature leads to more diversity but can also result in less coherent outputs. A value of 0.7 suggests a balance between randomness and determinism.
    • context_length: This indicates the number of tokens from the input that the model should consider when generating new content or making predictions. In this case, the model takes into account up to 4096 tokens of the provided input data to understand the context before generating any output.

For the LLM you will use a version of Mistral 7B from TheBloke, which is optimized to run on the CPU. When the code is run, the model will be automatically downloaded.

The next step is to build the prompt templates that will generate the prompts that run at the different stages of the chain:

# (... previous function code ...)

    # Map template and chain
    map_template = """<s>[INST] The following is a part of a transcript:
    Based on this, please identify the main points.
    Answer:  [/INST] </s>"""
    map_prompt = PromptTemplate.from_template(map_template)
    map_chain = LLMChain(llm=llm, prompt=map_prompt)

    # Reduce template and chain
    reduce_template = """<s>[INST] The following is set of summaries from the transcript:
    Take these and distill it into a final, consolidated summary of the main points.
    Construct it as a well organized summary of the main points and should be between 3 and 5 paragraphs.
    Answer:  [/INST] </s>"""
    reduce_prompt = PromptTemplate.from_template(reduce_template)
    reduce_chain = LLMChain(llm=llm, prompt=reduce_prompt)

This portion of the code is focused on setting up the templates and chains for the Map and Reduce phases of the summarization process using the Langchain library, here’s a breakdown.

For the map_template and chain:

  • A string map_template is defined to hold the template for mapping. The template is structured to instruct a language model to identify the main points from a portion of the transcript represented by {docs}.
  • PromptTemplate.from_template(map_template) is then called to convert the string template into a PromptTemplate object, which is stored in map_prompt.
  • An instance of LLMChain is created with the llm object (representing the language model) and map_prompt as arguments, and is stored in map_chain. This chain will be used to process individual chunks of the transcript and identify the main points from each chunk.

For the reduce_template and chain:

  • A string reduce_template is defined to hold the template for reducing. The template is structured to instruct a language model to consolidate a set of summaries (represented by {doc_summaries}) into a final, organized summary of main points that should span between 3 and 5 paragraphs.
  • PromptTemplate.from_template(reduce_template) is called to convert the string template into a PromptTemplate object, which is stored in reduce_prompt.
  • An instance of LLMChain is created with the llm object and reduce_prompt as arguments, and is stored in reduce_chain. This chain will be used to process the set of summaries generated from the map phase and consolidate them into a final summary.

With the different templates and individual chain components prepared, the next step is to start building the chain itself:

# (... previous function code ...)

    # Takes a list of documents, combines them into a single string, and passes this to an LLMChain
    combine_documents_chain = StuffDocumentsChain(
        llm_chain=reduce_chain, document_variable_name="doc_summaries"
    # Combines and iteratively reduces the mapped documents
    reduce_documents_chain = ReduceDocumentsChain(
        # This is final chain that is called.
        # If documents exceed context for `StuffDocumentsChain`
        # The maximum number of tokens to group documents into.
    # Combining documents by mapping a chain over them, then combining results
    map_reduce_chain = MapReduceDocumentsChain(
        # Map chain
        # Reduce chain
        # The variable name in the llm_chain to put the documents in
        # Return the results of the map steps in the output

This code block is dedicated to setting up various chains that orchestrate the process of summarizing documents. These chains are designed to work together, with each serving a specific role in the summarization pipeline, here's a detailed breakdown.

Combining documents chain:

  • StuffDocumentsChain: This chain is initiated with llm_chain set to reduce_chain and document_variable_name set to "doc_summaries". Its purpose is to take a list of documents, combine them into a single string, and pass this combined string to the reduce_chain. This is stored in the variable combine_documents_chain.

Reducing documents chain:

  • ReduceDocumentsChain: This chain is initiated with three arguments:
    • combine_documents_chain: Refers to the previously defined combine_documents_chain.
    • collapse_documents_chain: Also refers to combine_documents_chain, indicating that if the documents exceed the context length, they should be passed to the combine_documents_chain to be collapsed or combined further.
    • token_max: Set to 4000, this argument specifies the maximum number of tokens to group documents into before passing them to the combine_documents_chain.
    • This chain, stored in the variable reduce_documents_chain, is designed to manage the iterative process of reducing or summarizing the documents further.

The map-reduce chain:

  • MapReduceDocumentsChain: This is a more complex chain that orchestrates the map-reduce process for summarizing documents. It's initiated with several arguments:
    • llm_chain: Refers to the previously defined map_chain, which is used for mapping or summarizing individual chunks of documents.
    • reduce_documents_chain: Refers to the reduce_documents_chain, which is used for reducing or summarizing the mapped documents further.
    • document_variable_name: Set to "docs", this argument specifies the variable name in the llm_chain to put the documents in.
    • return_intermediate_steps: Set to True, this argument specifies that the results of the map steps should be returned in the output.
    • This chain, stored in the variable map_reduce_chain, orchestrates the overall map-reduce process of summarizing documents by first mapping a chain over them to summarize individual chunks, and then reducing or consolidating these summaries further.

The final step in the file and the summarization function is to actually run the complete chain:

# (... previous function code ...)

    # Split documents into chunks
    text_splitter = RecursiveCharacterTextSplitter(
        chunk_size=4000, chunk_overlap=0
    split_docs = text_splitter.split_documents(docs)

    # Run the chain
    start_time = time.time()
    result = map_reduce_chain.__call__(split_docs, return_only_outputs=True)
    print(f"Time taken: {time.time() - start_time} seconds")
    return result['output_text']

This section of the code is focused on preparing the documents for processing, and executing the map-reduce summarization chain, here's a breakdown:

Splitting documents:

  • An instance of RecursiveCharacterTextSplitter is created, named text_splitter, with a chunk_size of 4000 and chunk_overlap of 0. This object is designed to split documents into smaller chunks based on character count.
  • The split_documents method of text_splitter is then called with the docs variable (which contains the loaded transcript) as the argument. This splits the transcript into smaller chunks, and the result is stored in the split_docs variable.

Summarization Chain:

  • The __call__ method of map_reduce_chain is then executed with split_docs and return_only_outputs=True as arguments. This triggers the map-reduce summarization process on the split documents. The return_only_outputs=True argument indicates that only the final outputs of the chain should be returned.

Measuring LLM execution time:

  • The time.time() method is called at the end, and the difference between this time and start_time is calculated to determine the time taken to run the summarization chain.

This completes the file. It may seem quite complex, but the basic principle is to prepare the prompts, prepare the map-reduce chains, and execute the final chain, splitting documents when necessary.

Combine them all together in the Streamlit app

With all the necessary components built for the different stages of the flow, you can now focus on creating the user interface to receive a YouTube URL, process it and also display info for each stage of the process, and finally show the resulting summary.

The final step in building this summarization application is add the necessary widgets to finish the Streamlit application that you started in the first step.

You can complete the file:


# (... previous import ..)

from download import download_audio_from_url
from summarize import summarize_transcript
from transcribe import transcribe_audio

# (... code continuation ...)

# Input URL
url = st.text_input("Enter YouTube URL", value="")

# Download audio
if url:
    with st.status("Processing...", state="running", expanded=True) as status:
        st.write("Downloading audio file from YouTube...")
        audio_file, length = download_audio_from_url(url)
        st.write("Transcribing audio file...")
        transcript = transcribe_audio(audio_file)
        st.write("Summarizing transcript...")
        with open("transcript.txt", "w") as f:
        summary = summarize_transcript("transcript.txt")
        status.update(label="Finished", state="complete")

    # Play Audio
    st.divider(), format='audio/mp3')

    # Show Summary
    st.subheader("Summary:", anchor=False)

This script is constructed to handle user input, download audio from a specified YouTube URL, transcribe the audio, summarize the transcript, and display the result using the Streamlit framework. Here’s a detailed breakdown:

Input URL:

  • A visual divider is created using st.divider().
  • The st.text_input function creates a text input box for the user to enter a YouTube URL. The text "Enter YouTube URL" is displayed as a placeholder.

Processing URL:

  • Within the if statement, a status indicator is initiated using st.status with the message "Processing...", and the following steps are encapsulated within this status indicator:
    • The download_audio_from_url function is called with the URL as an argument, and the returned audio file path and length are stored in audio_file and length, respectively.
    • Then a call to the transcribe_audio function with audio_file as the argument, storing the returned transcript in transcript.
    • The transcript is written to a file named "transcript.txt", and then the summarize_transcript function is called with "transcript.txt" as the argument, storing the returned summary in summary.
    • The status indicator is updated to "Finished" using status.update.


  • The function is used to create an audio player widget that allows the user to play the downloaded audio file, specifying the format as 'audio/mp3'.
  • The summary is displayed below the subheader using st.write(summary).

This concludes your YouTube summarization application, you can now run with:

streamlit run

You should see something similar to this:

Deploy to Koyeb

Now that you have the application running locally, depending on your CPU it might take some seconds to run the summarization. Deploy it on Koyeb and take advantage of the better processing power offered by high-performance microVMs.

Create a repository on your GitHub account, called YouTubeSummarizer.

Then create a .gitignore file in your local directory to exclude some folders and files from being pushed to the repository:

# PyCharm files
# Audio folder
# Python virtual environment
# Environment variables
# Transcripts text file

Run the following commands in your terminal to commit and push your code to the repository:

echo "# YouTubeSummarizer" >>
git init
git add .
git commit -m "first commit"
git branch -M main
git remote add origin [Your GitHub repository URL]
git push -u origin main

You should now have all your local code in your remote repository. Now it is time to deploy the application.

Within the Koyeb control panel, while on the Overview tab, click Create Web Service. On the App deployment page to begin:

  1. Select GitHub as your deployment method.
  2. Choose the repository that you created earlier. For example, YouTubeSummarizer.
  3. In the Builder section, select Buildpack. Click the Override toggle associated with the Run command and enter streamlit run in the field.
  4. In the Environment variables section, click the Add variable button to add your Deepgram API key named DEEPGRAM_API_KEY.
  5. In the Instance selection, click XLarge. This will provide you instance a good balance between performance and cost.
  6. In the Exposed ports section, set the port to 8501, the port Streamlit uses to serve your application.
  7. Choose a name for your App and Service. Keep in mind it will be used to create the URL for your application.
  8. Finally, click the Deploy button.

Your application will start to deploy. Please note that the first time that you run the application it will take longer because it will download the Mistral 7B model. Subsequent runs will be much faster.

After the deployment process is complete, you can access your app by clicking with the application URL.


By following this guide, you have created a sophisticated tool for summarizing YouTube video content. You've seen firsthand how potent tools like Deepgram, Langchain, and Mistral 7B can be used together to create a Streamlit application that not only simplifies but also amplifies the value of video content.

Deploying this application to Koyeb enables you to harness the power of high-performance microVMs, ensuring that your application runs smoothly in the regions where your users are located.

Now that you have a working application, you can think of ways to expand it, whether for enhancing personal productivity, advancing research, or providing cutting-edge business solutions.


Welcome to Koyeb

Koyeb is a developer-friendly serverless platform to deploy any apps globally.

  • Start for free, pay as you grow
  • Deploy your first app in no time
Start for free
The fastest way to deploy applications globally.