Automate video transcription with Koyeb Serverless Engine

December 01, 2020

Édouard Bonlieu


This guide shows how to deploy a video transcription service. We will use the Google Cloud Video Intelligence API to convert the speech in videos to text (speech-to-text) and the Koyeb Serverless Engine to handle your video files and orchestrate the processing.

Once you have completed this tutorial, you will be able to upload your videos through the Koyeb S3-compatible API. Each upload triggers a function that generates a speech transcription file for the new video.

You can then use the speech transcription file to:

  • Index the results in a database and make your videos searchable
  • Automatically generate subtitles for your videos
  • Moderate videos based on their content

And many other use-cases.

In this guide, we use a Koyeb Managed Store to store our videos and the generated video speech transcription file. You can also connect your own Cloud Storage Provider to integrate with your existing infrastructure and data with minimal effort.


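To follow this tutorial, you need:

  • A Koyeb account and the Koyeb CLI installed, to create the Store, Secret, and Stack
  • A Google Cloud Platform account with the Video Intelligence API enabled and a Service Account key
  • S3cmd installed, to upload videos and retrieve the transcription files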


To build a video transcription service using Koyeb and the GCP Video Intelligence API, there are four steps:

  1. Create a Koyeb Store to Upload videos & Retrieve the video speech transcription file
  2. Create a Koyeb Secret to store your Google Cloud Service Account configuration
  3. Create a Stack and deploy the video transcription function
  4. Upload a video and retrieve the video transcription file

Create a Koyeb Store to Upload videos & Retrieve the video speech transcription file

The first step is to create a Koyeb Store to hold our videos and the generated transcription files. Koyeb Stores provide an S3-compatible API, allowing you to manage your data programmatically using any S3-compatible SDK or tool.

To create a new Koyeb Managed Store with the CLI, in your terminal, type:

koyeb create store -f new-store.yaml

Where the content of new-store.yaml is:

name: my-store-01
type: koyeb

You now have a Koyeb Store up and running. You can interact with your Store using any S3-compatible SDK or tool.


The next step is to configure S3cmd to interact with the Store, so we can upload videos and retrieve video audio transcription files from there.

In the Koyeb Control Panel, click API in the left side menu and click New in the S3 credentials section. A modal appears to create a new S3 credential. Enter the name and a description (optional) to identify and remember what this credential is used for.

Click the Submit button. Save the generated access_key and secret_key in a secure place. Once the modal is closed, you will not be able to see them again.

Create an S3cmd configuration file named ~/.s3cfg-gcp in your home directory and replace the REPLACE_ME values with the credentials you previously generated.

[default]
access_key = REPLACE_ME
secret_key = REPLACE_ME
bucket_location = US
check_ssl_certificate = True
check_ssl_hostname = True
default_mime_type = binary/octet-stream
delay_updates = False
delete_after = False
delete_after_fetch = False
delete_removed = False
dry_run = False
enable_multipart = True
encoding = UTF-8
encrypt = False
follow_symlinks = False
force = False
get_continue = False
guess_mime_type = True
host_base =
host_bucket = %(bucket)
human_readable_sizes = False
invalidate_default_index_on_cf = False
invalidate_default_index_root_on_cf = True
invalidate_on_cf = False
limit = -1
limitrate = 0
list_md5 = False
long_listing = False
max_delete = -1
multipart_chunk_size_mb = 15
multipart_max_chunks = 10000
preserve_attrs = True
progress_meter = True
put_continue = False
recursive = False
recv_chunk = 65536
reduced_redundancy = False
requester_pays = False
restore_days = 1
restore_priority = Standard
send_chunk = 65536
server_side_encryption = False
signature_v2 = False
signurl_use_https = False
skip_existing = False
socket_timeout = 300
stats = False
stop_on_error = False
throttle_max = 100
urlencoding_mode = normal
use_https = True
use_mime_magic = True
verbosity = WARNING

To check that the configuration is working, in the terminal type:

s3cmd -c ~/.s3cfg-gcp ls
2020-10-28 09:15  s3://my-store-01

You should see the Store you previously created.

Create a Koyeb Secret to store your Google Cloud Service Account configuration

Create a Koyeb Secret to securely store your GCP Service Account configuration. Koyeb Secrets allow you to access API credentials, tokens, etc. securely in your configuration and functions without having to expose them.

Create a secret.yaml file and replace the value with your GCP Service Account configuration.

name: gcp-sa-vi
value: |
  {...}

Then create the secret by running:

koyeb create secrets -f secret.yaml

Create a Stack and deploy the video transcription function

Our Store is configured and ready to use. The next step is to deploy our processing function to perform the video speech transcription. We will use a Koyeb Catalog app for the processing, as it lets you perform this operation without writing a single line of code.

In the terminal, start by creating a new Stack. Stacks are processing environments containing code and containers.

koyeb create stack -n video-transcription

With our Stack created, we can configure and deploy the video speech transcription app. Create a file named video-transcription.yaml containing our function configuration:

functions:
  - name: gcp-video-intelligence
    use: gcp-video-intelligence@1.0.1
    with:
      STORE: my-store-01 # The store to watch to trigger the function and save the GCP Video Intelligence result. This parameter is required.
      GCP_KEY: gcp-sa-vi # The name of the secret in which the GCP service account is stored. This parameter is required.
      VIDEO_INTELLIGENCE_FEATURE: SPEECH_TRANSCRIPTION

Deploy the function by running:

koyeb create revision video-transcription -f video-transcription.yaml

This deploys the function into our Stack. Now, each time a video is uploaded to the Store my-store-01, the function will be triggered and a video speech transcription file will be generated.

Upload a video and retrieve the video transcription file

With our processing Stack ready, we can now check that everything works and that a video speech transcription file is generated for each uploaded video.

To upload a video using S3cmd, in the terminal type:

s3cmd -c ~/.s3cfg-gcp put /path/to/video.mp4 s3://my-store-01

Now, if you run koyeb logs stack-events video-transcription, you will see the event that triggers your function. The function uses this event to retrieve the video file and perform the speech transcription. You can follow the function execution by running: koyeb logs functions video-transcription gcp-video-intelligence.

Once the execution is done, you can retrieve the speech transcription file by running:

s3cmd -c ~/.s3cfg-gcp get s3://my-store-01/gcp-video-intelligence-SPEECH_TRANSCRIPTION-[...].json
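
The name of the generated file ends with a suffix that is not known in advance, so it can help to list the Store and filter the result. The sketch below is one possible approach in Python and is not part of the Koyeb tooling. It assumes you pass it the text printed by s3cmd ls s3://my-store-01, that each object line contains an s3:// URL, and that transcription files keep the gcp-video-intelligence-SPEECH_TRANSCRIPTION- prefix and .json extension shown above. The function name and the sample listing are hypothetical.

```python
from typing import List

# Prefix taken from the file name shown in this guide; adjust it if your
# function is configured with a different VIDEO_INTELLIGENCE_FEATURE.
PREFIX = "gcp-video-intelligence-SPEECH_TRANSCRIPTION-"


def transcription_objects(listing: str, prefix: str = PREFIX) -> List[str]:
    """Return the s3:// URLs of transcription files found in `s3cmd ls` output."""
    urls = []
    for line in listing.splitlines():
        start = line.find("s3://")
        if start == -1:
            continue  # not an object line
        url = line[start:].strip()
        name = url.rsplit("/", 1)[-1]  # object name without the bucket part
        if name.startswith(prefix) and name.endswith(".json"):
            urls.append(url)
    return urls


if __name__ == "__main__":
    # Hypothetical listing, for illustration only.
    sample = (
        "2020-10-28 09:20   1048576  s3://my-store-01/video.mp4\n"
        "2020-10-28 09:21      2048  "
        "s3://my-store-01/gcp-video-intelligence-SPEECH_TRANSCRIPTION-example.json\n"
    )
    for url in transcription_objects(sample):
        print(url)
```

Each URL returned can then be passed to s3cmd get as in the command above.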

This file contains the result of the processing function with the detected text in the video:

"results": [
  {
    "alternatives": [
      {
        "transcript": "Hey, I'm John...",
        "confidence": 0.7477226853370667,
        "words": [
          {
            "startTime": { "nanos": 500000000 },
            "endTime": { "nanos": 700000000 },
            "word": "Hey,"
          },
          {
            "startTime": { "nanos": 700000000 },
            "endTime": { "nanos": 900000000 },
            "word": "I'm"
          },
          ...

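As an illustration of the subtitle use-case mentioned at the start of this guide, the word timings in this file can be grouped into SubRip (SRT) cues. The following Python sketch is a minimal example under stated assumptions, not the method used by the Koyeb Catalog app. It assumes the file is a JSON object with a top-level "results" key, and that each startTime and endTime holds an optional "seconds" field alongside "nanos" (only "nanos" appears in the excerpt above). The function names and the cue size of seven words are arbitrary choices made for this example.

```python
import json
from typing import Dict, List


def to_millis(offset: Dict) -> int:
    """Convert a {"seconds": ..., "nanos": ...} offset to milliseconds.

    Assumption: a missing field means zero, as in the excerpt above.
    """
    return int(offset.get("seconds", 0)) * 1000 + int(offset.get("nanos", 0)) // 1_000_000


def srt_timestamp(millis: int) -> str:
    """Format milliseconds as an SRT timestamp (HH:MM:SS,mmm)."""
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def words_to_srt(words: List[Dict], words_per_cue: int = 7) -> str:
    """Group consecutive words into numbered SRT cues."""
    cues = []
    for i in range(0, len(words), words_per_cue):
        chunk = words[i:i + words_per_cue]
        start = srt_timestamp(to_millis(chunk[0]["startTime"]))
        end = srt_timestamp(to_millis(chunk[-1]["endTime"]))
        text = " ".join(w["word"] for w in chunk)
        cues.append(f"{len(cues) + 1}\n{start} --> {end}\n{text}\n")
    return "\n".join(cues)


if __name__ == "__main__":
    # Sample built from the two words visible in the excerpt above.
    sample = json.loads("""
    {"results": [{"alternatives": [{"words": [
        {"startTime": {"nanos": 500000000}, "endTime": {"nanos": 700000000}, "word": "Hey,"},
        {"startTime": {"nanos": 700000000}, "endTime": {"nanos": 900000000}, "word": "I'm"}
    ]}]}]}
    """)
    for result in sample["results"]:
        # Use the first alternative of each result.
        print(words_to_srt(result["alternatives"][0]["words"]))
```

To use it on a real file, load the downloaded JSON with json.load instead of the inline sample. Splitting cues on a fixed word count keeps the sketch short; splitting on pauses or punctuation would likely read better.
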

In this guide, we saw how to deploy a video transcription service using the Google Cloud Video Intelligence API and the Koyeb Serverless Engine. We used S3cmd to upload videos and retrieve transcription files, but you can also use any S3-compatible SDK or tool.

The catalog integrations code used in this guide is available on GitHub.

If you would like to read more Koyeb tutorials, check out our tutorials collection. Have an idea for a tutorial you'd like us to cover? Let us know by joining the conversation over on the Koyeb community platform!
