Use LlamaIndex, Streamlit, and OpenAI to Query Unstructured Data


LlamaIndex is a data framework that makes it simple to build production-ready applications from your data using LLMs. Specifically, LlamaIndex specializes in context augmentation, a technique of providing custom data as context for queries to generalized LLMs. This allows you to inject your specific contextual information without the trouble and expense of fine-tuning a dedicated model.
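To build intuition for what context augmentation means, here is a toy sketch of the idea with no framework at all: retrieve the stored passage most relevant to a question and prepend it to the prompt sent to the LLM. The passages and the crude word-overlap retriever below are hypothetical illustrations, not LlamaIndex's actual implementation (which uses vector embeddings):

```python
# Toy context augmentation: retrieve the most relevant stored passage
# and prepend it to the prompt before querying the LLM.
passages = [
    "Della sells her hair to buy Jim a watch chain.",
    "Jim sells his watch to buy Della combs for her hair.",
]

def retrieve(question: str) -> str:
    # crude relevance score: count words shared with the question
    q_words = set(question.lower().split())
    return max(passages, key=lambda p: len(q_words & set(p.lower().split())))

def build_prompt(question: str) -> str:
    # inject the retrieved passage as context for the generalized LLM
    context = retrieve(question)
    return f"Context: {context}\n\nQuestion: {question}\nAnswer:"

prompt = build_prompt("Why does Della sell her hair")
```

LlamaIndex automates this retrieve-and-augment loop over your own documents, replacing the word-overlap scoring with semantic vector search.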

In this guide, we will demonstrate how to build an application with LlamaIndex and Streamlit, a Python framework for building and serving data-based applications, and deploy it to Koyeb. The example web app allows users to ask questions about custom data. In our example, this custom text will be the story "The Gift of the Magi" by O. Henry.

You can deploy and preview the example application from this guide with our LlamaIndex One-Click app or by clicking the Deploy to Koyeb button below:

Deploy to Koyeb

Be sure to set the OPENAI_API_KEY environment variable during configuration. You can consult the repository on GitHub to find out more about the example application that this guide uses.


Requirements

To successfully follow and complete this guide, you need:

  • Python 3.11 installed on your local computer.
  • A GitHub account to host your LlamaIndex application.
  • A Koyeb account to deploy and run the preview environments for each pull request.
  • An OpenAI API key so that our application can send queries to OpenAI.


Steps

To complete this guide and deploy a LlamaIndex application, you'll need to follow these steps:

  1. Set up the project directory
  2. Install project dependencies and fetch custom data
  3. Create the LlamaIndex application
  4. Test the application
  5. Create a Dockerfile
  6. Deploy to Koyeb

Set up the project directory

To get started, create and then move into a project directory that will hold the application and assets we will be creating:

mkdir example-llamaindex
cd example-llamaindex

Next, create and activate a new Python virtual environment for the project. This will isolate our project's dependencies from system packages to avoid conflicts and offer better reproducibility:

python -m venv venv
source venv/bin/activate

Your virtual environment should now be activated.

Install project dependencies and fetch custom data

Now that we are working within a virtual environment, we can begin to install the packages our application will use and set up the project directory.

First, install the LlamaIndex and Streamlit packages so that we can use them to build the application. We can also take this opportunity to make sure that the local copy of pip is up-to-date:

pip install --upgrade pip llama-index streamlit

After installing the dependencies, record them in a requirements.txt file so that we can install the correct versions for this project at a later time:

pip freeze > requirements.txt

Next, we will download the story that we will be using as context for our LLM prompts. We can download a PDF copy of "The Gift of the Magi" by O. Henry from TSS Publishing, a platform for short fiction that hosts free short stories.

Create a data directory to hold the contextual data for our application and then download a copy of the story by typing:

mkdir data
curl -L -o data/gift_of_the_magi.pdf

You should now have a PDF file that we can load into our application and attach as context to our LLM queries.

Create the LlamaIndex application

We have everything in place to start writing our LlamaIndex application. Create an app.py file in your project directory and paste in the following content:

import os.path
import streamlit as st

from llama_index.core import (
    VectorStoreIndex,
    SimpleDirectoryReader,
    StorageContext,
    load_index_from_storage,
)

# check if storage already exists
PERSIST_DIR = "./storage"
if not os.path.exists(PERSIST_DIR):
    # load the documents and create the index
    documents = SimpleDirectoryReader("data").load_data()
    index = VectorStoreIndex.from_documents(documents)
    # store it for later
    index.storage_context.persist(persist_dir=PERSIST_DIR)
else:
    # load the existing index
    storage_context = StorageContext.from_defaults(persist_dir=PERSIST_DIR)
    index = load_index_from_storage(storage_context)

query_engine = index.as_query_engine()

# Define a simple Streamlit app
st.title('Ask Llama about "The Gift of the Magi"')
query = st.text_input("What would you like to ask? (source: data/gift_of_the_magi.pdf)", "What happens in the story?")

# If the 'Submit' button is clicked
if st.button("Submit"):
    if not query.strip():
        st.error("Please provide the search query.")
    else:
        try:
            # send the query along with the indexed context to the LLM
            response = query_engine.query(query)
            st.success(response)
        except Exception as e:
            st.error(f"An error occurred: {e}")

Let's go over what the application is doing.

It begins by importing the basic packages and modules it will use to create the application. This includes Streamlit (aliased as st) as well as functionality from LlamaIndex for loading data and indexes from directories, building vector indexes, and attaching contexts.

Next, we set up some semi-persistent storage for the index files. This logic helps us avoid creating an index from our data document every time we run the application by storing index information in a storage directory the first time it is evaluated. The application can load the index data from the storage directory directly on subsequent runs to increase performance.
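The same build-once, load-later pattern can be sketched generically. The snippet below is a simplified stand-in, with a hypothetical JSON file playing the role of LlamaIndex's persisted index store and a cheap function standing in for the expensive PDF-indexing step:

```python
import json
import os
import os.path

PERSIST_DIR = "./storage_demo"  # same role as the app's storage directory

def build_index() -> dict:
    # stand-in for the expensive step: reading the PDF and embedding it
    return {"chunks": ["chunk-1", "chunk-2"]}

def load_or_build_index() -> dict:
    index_file = os.path.join(PERSIST_DIR, "index.json")
    if not os.path.exists(index_file):
        # first run: build the index and persist it for later
        index = build_index()
        os.makedirs(PERSIST_DIR, exist_ok=True)
        with open(index_file, "w") as f:
            json.dump(index, f)
    else:
        # subsequent runs: load the persisted copy instead of rebuilding
        with open(index_file) as f:
            index = json.load(f)
    return index
```

In the real application, `index.storage_context.persist()` and `load_index_from_storage()` play these two roles, so only the first run pays the indexing cost.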

After the index is created or loaded, we create a query engine based on it and begin constructing the application frontend with Streamlit. We add an input field that will be translated into our query and then display the results upon submission.

Test the application

We can test that the application works as expected on our local machine.

First, set and export the OPENAI_API_KEY environment variable using your OpenAI API key as the value:

export OPENAI_API_KEY=<YOUR_OPENAI_API_KEY>
Next, run the application by typing:

streamlit run app.py

This will start the application server. Navigate to 127.0.0.1:8501 (Streamlit's default port) in your web browser to view the page prompting for your questions about "The Gift of the Magi". You can submit the default query or ask any other questions you have about the story.

When you are finished, press CTRL-C to stop the server.

Create a Dockerfile

During the deployment process, Koyeb will build our project from a Dockerfile. This gives us more control over the version of Python, the build process, and the runtime environment. The next step is to create this Dockerfile describing how to build and run the project.

Before we begin, create a basic .dockerignore file. This will define files and artifacts that we don't want to copy over into the Docker image. In this case, we want to avoid copying the venv/ and storage/ directories since the image will install dependencies and manage cached index files at runtime:

# .dockerignore
venv/
storage/

Next, create a Dockerfile with the following contents:

# Dockerfile
FROM python:3.11-slim

WORKDIR /app
COPY . .
RUN pip install --requirement requirements.txt && pip cache purge

ARG PORT
EXPOSE ${PORT:-8000}

CMD streamlit run app.py --server.port ${PORT:-8000}

The image uses the 3.11-slim tag of the official Python Docker image as its starting point. It defines a directory called /app as the working directory and copies all of the project files inside. Afterwards, it installs the dependencies from the requirements.txt file.

The PORT environment variable is also defined as a build variable. This allows us to set it at build time to adjust the port that the image will listen on. We use this value in the EXPOSE instruction and again in the main streamlit command we run with the CMD instruction. Both values will use port 8000 as a fallback if PORT is not defined explicitly.
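The fallback mechanism is standard shell parameter expansion, which the shell-form CMD evaluates when the container starts. A quick demonstration of the expansion on its own:

```shell
# ${PORT:-8000} expands to $PORT when it is set, and to 8000 otherwise
unset PORT
echo "${PORT:-8000}"

PORT=9000
echo "${PORT:-8000}"
```

At build time you could then pass something like docker build --build-arg PORT=9000 . (a hypothetical invocation) to expose a different port in the resulting image.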

Publish the repository to GitHub

The application is almost ready to deploy. We just need to commit the changes to Git and push the repository to GitHub.

In the project directory, initialize a new Git repository by running the following command:

git init

Next, download a generic .gitignore file for Python projects from GitHub:

curl -L -o .gitignore https://raw.githubusercontent.com/github/gitignore/main/Python.gitignore

Add the storage/ directory to the .gitignore file so that Git ignores the cached vector index files:

echo "storage/" >> .gitignore

You can also optionally add the Python runtime version to a runtime.txt file if you want to try to build the repository from the Python buildpack instead of the Dockerfile:

echo "python-3.11.8" > runtime.txt

Next, add the project files to the staging area and commit them. If you don't have an existing GitHub repository to push the code to, create a new one and run the following commands to commit and push changes to your GitHub repository:

git add :/
git commit -m "Initial commit"
git branch -M main
git remote add origin git@github.com:<YOUR_GITHUB_USERNAME>/<YOUR_REPOSITORY_NAME>.git
git push -u origin main

Note: Make sure to replace <YOUR_GITHUB_USERNAME>/<YOUR_REPOSITORY_NAME> with your GitHub username and repository name.

Deploy to Koyeb

Once the repository is pushed to GitHub, you can deploy the LlamaIndex application to Koyeb. Any changes in the deployed branch of your codebase will automatically trigger a redeploy on Koyeb, ensuring that your application is always up-to-date.

To get started, open the Koyeb control panel and complete the following steps:

  1. On the Overview tab, click Create Web Service.
  2. Select GitHub as the deployment method.
  3. Choose the repository containing your application code. Alternatively, you can enter our public LlamaIndex example repository into the Public GitHub repository field at the bottom of the page.
  4. In the Builder section, choose Dockerfile.
  5. Choose an Instance of size micro or larger.
  6. Expand the Environment variables section and click Add variable to configure a new environment variable. Create a variable called OPENAI_API_KEY. Select the Secret type and choose Create secret in the value. In the form that appears, create a new secret containing your OpenAI API key.
  7. Choose a name for your App and Service, for example example-llamaindex, and click Deploy.

Koyeb will clone the GitHub repository and use the Dockerfile file to build a new container image for the project. Once the build is complete, a container will be started from the image to run your application.

Once the deployment is healthy, visit your Koyeb Service's subdomain (you can find this on your Service's detail page). It will have the following format:

https://<YOUR_APP_NAME>-<YOUR_KOYEB_ORG>.koyeb.app
You should see your LlamaIndex application's prompt, allowing you to ask questions about the story and get responses from the OpenAI API.


Conclusion

In this guide, we discussed how to build and deploy an LLM-based web app to Koyeb using LlamaIndex and Streamlit. The application loads a story from a PDF on disk and sends this as additional context when submitting user-supplied queries. This allows you to customize the focus of the query without having to fine-tune a model for the purpose.

This tutorial demonstrates a very simple implementation of these technologies. To learn more about how LlamaIndex can help you use LLMs to answer questions about your own data, take a look at the LlamaIndex documentation.

