Streamlining distributed systems: claim-check pattern with Redpanda

Learn how to implement this practical pattern to streamline your distributed system

10 min read · Apr 24, 2024

The claim-check pattern is an architectural pattern used in distributed systems for efficiently handling large payloads. This pattern allows large messages to be processed without overwhelming or slowing down the messaging platform or the client.

The claim-check pattern works well in various scenarios, such as object storage solutions, file upload systems, and the gaming industry. It’s particularly useful when you need to reduce latency in processing large amounts of data or enhance security. If you’re concerned about trusting outside parties with your data, you can use the claim-check key to hide the sensitive portions of it.

With this tutorial, you’ll learn about the claim-check pattern and walk through a demo to practice creating more efficient and streamlined event-driven architectures for handling substantial data payloads.

Let’s get started.

What is the claim-check pattern?

The claim-check pattern is used in a variety of scenarios, particularly in systems where a large amount of data needs to be processed or where a message exceeds the size limit supported by the chosen message bus technology. In a broader sense, this pattern works as follows:

The sender splits a large message into two parts: a claim check and a payload.
The claim-check key, which acts like a reference or pointer to the payload, is sent to the messaging platform.
The payload, which contains the actual data, is stored using an external service.

Animation of how the claim-check pattern works. (Source: Dunith Danushka)
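The steps above can be sketched in a few lines of Python. This is a minimal illustration rather than a production design: the in-memory dictionary payload_store stands in for an external storage service, and the names ClaimCheckMessage, check_in, and redeem are hypothetical helpers introduced only for this example.

```python
import uuid
from dataclasses import dataclass


@dataclass
class ClaimCheckMessage:
    """The small message that travels through the messaging platform."""
    claim_check: str  # reference (pointer) to the stored payload
    metadata: dict    # small, descriptive fields only


# Stand-in for an external storage service (assumption for this sketch).
payload_store: dict[str, bytes] = {}


def check_in(payload: bytes, metadata: dict) -> ClaimCheckMessage:
    """Store the payload externally and return the message to publish."""
    claim_check = str(uuid.uuid4())
    payload_store[claim_check] = payload
    return ClaimCheckMessage(claim_check=claim_check, metadata=metadata)


def redeem(message: ClaimCheckMessage) -> bytes:
    """On the receiving side, use the claim check to fetch the payload."""
    return payload_store[message.claim_check]


large_payload = b"\x00" * 5_000_000
message = check_in(large_payload, {"title": "demo"})
print(message.claim_check, len(redeem(message)))
```

Only the ClaimCheckMessage would pass through the messaging platform; the five-megabyte payload never does.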

Implementations of the claim-check pattern can use object storage solutions like Amazon S3, Azure Blob Storage, or Google Cloud Storage to store the binary data, keeping only the claim-check key in the message broker. That's more cost-efficient than scaling up the broker and paying for more throughput units.

This technique is particularly useful in message and event-driven systems where applications communicate via asynchronous components (such as message queues or topics) and when the message being communicated involves large binary data that might be used rarely or not at all by subscribing services. These messaging systems are typically designed for fast processing and handling of small messages, which often means that there is a limit to the message size.
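A sender can decide per message whether the claim check is needed at all. The sketch below assumes a 1 MB limit purely for illustration; the real limit depends on how your cluster is configured, and delivery_mode is a hypothetical helper name.

```python
# Assumed limit for illustration only; check the maximum message size
# configured on your own cluster.
MAX_INLINE_BYTES = 1_000_000


def delivery_mode(payload: bytes, limit: int = MAX_INLINE_BYTES) -> str:
    """Return "inline" for small payloads and "claim-check" for large ones."""
    return "inline" if len(payload) <= limit else "claim-check"


print(delivery_mode(b"x" * 512))        # small payload
print(delivery_mode(b"x" * 5_000_000))  # large payload
```

Payloads at or below the limit can travel in the message itself, while anything larger is stored externally and referenced by its claim check.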

Applying the claim-check pattern for these use cases optimizes the message flow by reducing latency. It also reduces costs since storage is usually cheaper than the cost incurred for memory (RAM) and processing units.

Redpanda is an ideal platform for implementing the claim-check pattern due to its compatibility with the Kafka API and ability to manage large volumes of payloads. By storing large messages in an external service and sending only a reference to these messages through the event stream, developers can use Redpanda to build fast, scalable, and reliable event-driven systems that efficiently handle large data payloads.

Implementing the claim-check pattern with Redpanda

Imagine you’re working on a web application that allows users to upload sizable video files. The app then performs several operations on each video, such as transcoding it to a different format, extracting metadata, generating thumbnails, and so on.

These operations are time-consuming and resource-intensive and can’t be performed immediately upon file upload due to the size of the video file and the load on the server.

Here’s how you can use the claim-check pattern with Redpanda to tackle this scenario:

Handle file upload: When the user uploads the video file, the web application stores the file in a storage system. For demo purposes, you’ll use the local file system, but in production apps, you can use a distributed object storage solution like a cloud storage service. The app then obtains a unique identifier (the “claim check”) for the stored file.
Send claim check: The application sends a message to a Redpanda topic. This message doesn’t contain the video file itself, but rather the claim check and metadata, such as the title, duration of the video, resolution, and so on. You’ll publish the claim check as a message key in the topic for later retrieval of the video file.
Process message: A consumer application subscribes to the Redpanda topic and reads the messages as they arrive. For each message, it reads the claim check and the accompanying metadata, then uses them to retrieve the video file from storage.
Perform operations: The consumer application performs the necessary operations on the video file, such as transcoding. It can do this asynchronously without slowing down the web application or the message bus.
Store the converted video: Once the conversion operation is complete, the processed file gets stored in a destination storage directory.
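To make the "Send claim check" step concrete, here is a rough sketch of what a single record could look like on the wire: the claim check as the message key and the metadata as a JSON value. The metadata values are made up for illustration, and build_record is a hypothetical helper, not part of any library.

```python
import json
import uuid


def build_record(metadata: dict) -> tuple[bytes, bytes]:
    """Return (key, value) bytes: the claim check and the JSON metadata."""
    claim_check = str(uuid.uuid4())
    key = claim_check.encode("utf-8")
    value = json.dumps(metadata).encode("utf-8")
    return key, value


# Hypothetical metadata for an uploaded video.
metadata = {
    "file_name": "holiday.mp4",
    "title": "Holiday",
    "duration": "12.50 seconds",
    "video_resolution": "1920 x 1080",
}
key, value = build_record(metadata)
print(len(key), "key bytes,", len(value), "value bytes")
```

The record stays at a few hundred bytes regardless of how large the video file is, which is the point of the pattern.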

Using the claim-check pattern with Redpanda allows the web application to handle large file uploads efficiently and reliably without overwhelming the message bus or the server.

The above scenario is captured in the following diagram:

Prerequisites

For this tutorial, you’ll need the following:

Docker Desktop (4.24 or later)
Running Redpanda and Redpanda Console instances on Docker (Redpanda 23.2.14 or later)
Python version 3.9 or later
An IDE of your choice

The code for this entire tutorial is available in this GitHub repository. Now, let’s dig into the steps.

1. Verify the state of Redpanda and Redpanda Console

Open a terminal and run the command docker ps to verify if Redpanda and the Redpanda Console application are running:

CONTAINER ID   IMAGE                                                COMMAND                   CREATED          STATUS          PORTS   NAMES
b2a7f63e6d35   docker.redpanda.com/redpandadata/console:v2.3.1      "/bin/sh -c 'echo \"$…"   21 seconds ago   Up 16 seconds   0.0.0.0:8080->8080/tcp   redpanda-console
bb23cc15261b   docker.redpanda.com/redpandadata/redpanda:v23.2.14   "/entrypoint.sh redp…"    21 seconds ago   Up 17 seconds   8081-8082/tcp, 0.0.0.0:18081-18082->18081-18082/tcp, 9092/tcp, 0.0.0.0:19092->19092/tcp, 0.0.0.0:19644->9644/tcp   redpanda-0
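If you'd also like to confirm from Python that the broker port is reachable, a plain TCP check is one option. This sketch assumes the Kafka API is exposed on localhost:19092, as in the port mapping shown above; broker_reachable is a hypothetical helper, and a successful connection only shows that something is listening on the port, not that the cluster is healthy.

```python
import socket


def broker_reachable(host: str = "localhost", port: int = 19092,
                     timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution errors.
        return False


print("Redpanda port reachable:", broker_reachable())
```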

2. Create a topic on Redpanda

Create a topic named video_metadata in your Redpanda cluster to receive the events from a demo producer application that you'll set up later.

To do that, open a browser to access the Redpanda Console application at http://localhost:8080/:

Here, you can get an overview and examine the health status of the Redpanda cluster. To create the topics in the cluster, click the Topics option in the side menu and click Create Topic:

Fill in the topic name in the on-screen prompt and click Create:
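As an alternative to the Console UI, the topic could also be created from Python through the Kafka-compatible admin API. The following is a sketch under a few assumptions: kafka-python is installed (it appears in requirements.txt in the next step), the broker listens on localhost:19092, and a single partition with a replication factor of 1 is enough for a one-node demo cluster. The function name create_topic is introduced here for illustration.

```python
def create_topic(name: str = "video_metadata",
                 bootstrap_servers: str = "localhost:19092") -> bool:
    """Create the topic; return False if it already exists."""
    # Imported inside the function so this module still loads on a
    # machine where kafka-python is not installed yet.
    from kafka.admin import KafkaAdminClient, NewTopic
    from kafka.errors import TopicAlreadyExistsError

    admin = KafkaAdminClient(bootstrap_servers=bootstrap_servers)
    try:
        # One partition and one replica: assumed values for a local demo.
        admin.create_topics(
            [NewTopic(name=name, num_partitions=1, replication_factor=1)]
        )
        return True
    except TopicAlreadyExistsError:
        return False
    finally:
        admin.close()
```

Calling create_topic() with the defaults would create video_metadata on the local cluster, or return False if you already created it in the Console.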

3. Create the producer app

Now that your topic is ready, go ahead and create a project directory named claim-check-pattern-with-redpanda on your machine. You'll develop a Python app to handle the claim-check key and metadata generation based on a video file and store the video file in a file system.

This producer app will send the claim check (as a message key) and the metadata of the video file to the Redpanda topic video_metadata.

Create a requirements.txt file in the project directory and paste in the content below:

kafka-python==2.0.2
av==11.0.0

The kafka-python library allows your Python app to interact with the Redpanda cluster, while the av library will operate on the video file to extract the metadata and then later transform the file's video format. You can prepare the virtual environment required for this Python app by running the following commands in a command line terminal (keep this terminal open for later use). The activation command shown is for Windows; on macOS or Linux, run source venv/bin/activate instead:

python -m venv venv
venv\Scripts\activate
pip install -r requirements.txt

Next, create a Python package named producer_app using your preferred IDE. Inside that package, create a directory named output where the producer app will store the video file after extracting its metadata and generating a claim check. Then, create a Python module called main.py and paste in the code below:

import json
import os.path
import shutil
import uuid

import av
import av.datasets
from kafka import KafkaProducer

# Define the Redpanda topic and Kafka producer
topic = "video_metadata"
producer = KafkaProducer(bootstrap_servers='localhost:19092',
                         key_serializer=str.encode,
                         value_serializer=lambda v: json.dumps(v).encode('utf-8'))

# Input video file for the demo is based on av.datasets
source_file = av.datasets.curated("pexels/time-lapse-video-of-night-sky-857195.mp4")

def get_video_metadata(file_path_in: str) -> dict:
    video_metadata = {}
    # Open the video file; the context manager closes it when done
    with av.open(file_path_in) as container:
        # Compose the metadata
        video_metadata["file_name"] = os.path.basename(container.name)
        video_metadata["title"] = container.metadata.get("title")
        # container.duration is expressed in units of 1/av.time_base seconds
        video_metadata["duration"] = f"{container.duration / av.time_base:.2f} seconds"

        for stream in container.streams:
            if stream.type == 'video':
                video_metadata["video_resolution"] = f"{stream.width} x {stream.height}"
                video_metadata["video_codec"] = stream.codec_context.codec.name

    return video_metadata

def store_file(file_path_in: str, file_path_out: str):
    # Copy the file
    shutil.copy(file_path_in, file_path_out)

metadata = get_video_metadata(source_file)

# Producer app's output file path
# This is the path where the original video file will be stored by the producer after the extraction of metadata
output_file = "producer_app/output/" + metadata.get("file_name")

# Store the video file
store_file(source_file, output_file)

# Generate message key that serves as claim-check key
claim_check_key = str(uuid.uuid4())

# Send the video_metadata to the Redpanda topic
future = producer.send(topic, key=claim_check_key, value=metadata)
# Block until a single message is sent (or time out in 15 seconds)
result = future.get(timeout=15)

print("Message sent, partition: ", result.partition, ", offset: ", result.offset)

The av library ships with convenient sample video data sets for developers to experiment with, and this producer app uses one of them for demo purposes.
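In the producer above, the video is stored under its original file name and the consumer finds it through the file_name field in the metadata. One possible variation, sketched here as an assumption rather than as the tutorial's approach, is to store the file under the claim-check key itself so that the key alone is enough to locate the payload. The helpers store_under_claim_check and locate are hypothetical names.

```python
import shutil
import tempfile
import uuid
from pathlib import Path


def store_under_claim_check(source: Path, storage_dir: Path) -> str:
    """Copy source into storage_dir as <claim_check><suffix>; return the key."""
    claim_check = str(uuid.uuid4())
    storage_dir.mkdir(parents=True, exist_ok=True)
    shutil.copy(source, storage_dir / f"{claim_check}{source.suffix}")
    return claim_check


def locate(claim_check: str, storage_dir: Path) -> Path:
    """Find the stored file for a claim check, whatever its extension."""
    matches = list(storage_dir.glob(f"{claim_check}.*"))
    if len(matches) != 1:
        raise FileNotFoundError(f"no unique file for claim check {claim_check}")
    return matches[0]


# Small demo with a throwaway file standing in for a video.
work_dir = Path(tempfile.mkdtemp())
sample = work_dir / "sample.mp4"
sample.write_bytes(b"not really a video")
key = store_under_claim_check(sample, work_dir / "storage")
print(key, "->", locate(key, work_dir / "storage").name)
```

With this layout, two uploads that share a file name no longer overwrite each other in storage.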

Run this producer application after preparing your consumer application in the next step.

4. Create the consumer app

The consumer app will listen to the Redpanda topic video_metadata and read each message, which carries the claim-check key (the message key, in this case) and the metadata of the video file. It will then use the file name in the metadata to locate the video in storage and convert the video format from MP4 to MKV. The final converted video will be stored in the local file system directory that serves as storage.

From the project directory you’ve been working in, create a Python package named consumer_app. Inside that package, create a directory named output where the consumer app will store the converted video file. Then, create a Python module called main.py and paste in the code below:

import json
import os

import av
from kafka import KafkaConsumer

# Define the Redpanda topic and Kafka consumer
topic = "video_metadata"
consumer = KafkaConsumer(topic,
                         bootstrap_servers='localhost:19092',
                         value_deserializer=lambda v: json.loads(v.decode('utf-8')))
output_file_base_path = "consumer_app/output/"

def get_producer_output_directory() -> str:
    # Get the directory of the current script
    current_file_directory = os.path.dirname(os.path.abspath(__file__))

    # Relative path to your video file
    source_file_base_path = os.path.join(current_file_directory, "../producer_app/output/")
    return source_file_base_path

# Define a function to convert an MP4 video file to MKV format
def convert_mp4_to_mkv(mp4_file_in: str):
    # Print a message indicating the start of the conversion process
    print("Converting the video " + mp4_file_in + " from mp4 to mkv format")

    # Open the input video file
    input_ = av.open(mp4_file_in)
    # Open the output video file in write mode
    output = av.open(output_file_base_path + "remuxed.mkv", "w")

    # Get the first video stream from the input file
    in_stream = input_.streams.video[0]
    # Add a new stream to the output file, using the input stream as a template
    out_stream = output.add_stream(template=in_stream)

    # Loop over all packets in the input stream
    for packet in input_.demux(in_stream):

        # If the packet's decoding timestamp (DTS) is None, skip this packet
        if packet.dts is None:
            continue

        # Set the packet's stream to the output stream
        packet.stream = out_stream

        # Multiplex ("mux") the packet to the output stream
        output.mux(packet)

    # Close the input file
    input_.close()
    # Close the output file
    output.close()

# Consume the message from the topic
for message in consumer:
    # Get the metadata information (such as file_name) from the message
    file_name = message.value.get("file_name")
    # Construct the MP4 file path
    mp4_file = get_producer_output_directory() + file_name
    # Convert the MP4 video format file to MKV format and save it in the output_file_base_path location
    convert_mp4_to_mkv(mp4_file)
    print("Converted the mp4 video: " + file_name + " to mkv format!")
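
The consumer above trusts the file_name it receives and always writes to remuxed.mkv. As an optional hardening step, the paths could be derived with a small helper like the one below. This is a sketch, not part of the tutorial's code: resolve_paths is a hypothetical function, and the directory names simply mirror the ones used in this demo.

```python
import os


def resolve_paths(file_name: str, source_dir: str, output_dir: str) -> tuple[str, str]:
    """Return (input_path, output_path) for a file name taken from a message.

    Rejects names that contain directory components so a message cannot
    point the consumer outside the storage directory.
    """
    safe_name = os.path.basename(file_name)
    if not safe_name or safe_name != file_name or safe_name in (".", ".."):
        raise ValueError(f"unexpected file name in message: {file_name!r}")
    stem, _ = os.path.splitext(safe_name)
    # Derive the output name from the input so files don't overwrite each other.
    return (os.path.join(source_dir, safe_name),
            os.path.join(output_dir, stem + ".mkv"))


print(resolve_paths("clip.mp4", "producer_app/output", "consumer_app/output"))
```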

With this, both your producer and consumer apps are ready for execution. Let’s run them.

5. Run the apps

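Open a terminal, activate the Python virtual environment, and start the consumer app first to start listening to the incoming messages from the topic (the commands in this step use Windows path separators; on macOS or Linux, use consumer_app/main.py and producer_app/main.py):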

python consumer_app\main.py

Keep this terminal running. Then, open another terminal and start the producer app to start sending messages with claim checks and storing the MP4 video file:

python producer_app\main.py

You should see the following output from the producer:

Message sent, partition:  0 , offset:  0

The above output indicates that a message is sent to the topic. You can use the Redpanda Console UI to view this message. Go to the Topics screen that you accessed before and click the video_metadata topic to view its contents produced by the producer app:

Notice that the key represents the claim-check key (a unique identifier) for your video file. By double-clicking the data row or message that’s displayed as a result in the topic, you can view the complete metadata value that gets generated by your producer app:

Viewing the video metadata within Redpanda Console

Open the output directory of producer_app, and you should see a video file stored in this directory now:

Next, switch to the terminal where your consumer app was running. You should see an output similar to the following:

Converting the video /producer_app/output/time-lapse-video-of-night-sky-857195.mp4 from mp4 to mkv format
Converted the mp4 video: time-lapse-video-of-night-sky-857195.mp4 to mkv format!

This indicates that your consumer app has successfully received the message from the video_metadata topic and processed it.

Open the output directory of consumer_app, and you should see the converted video file stored in this directory now:
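For a quick sanity check of the result, you could verify that the output starts with the EBML signature that Matroska (MKV) files begin with. This sketch only inspects the first four bytes, so it does not prove the file is a fully valid video; looks_like_matroska is a hypothetical helper, and the path matches the file name used by the consumer in this demo.

```python
import os

# Matroska files start with the EBML header element ID 0x1A45DFA3.
EBML_MAGIC = bytes.fromhex("1A45DFA3")


def looks_like_matroska(path: str) -> bool:
    """Return True if the file begins with the EBML signature."""
    with open(path, "rb") as f:
        return f.read(4) == EBML_MAGIC


output_path = "consumer_app/output/remuxed.mkv"
if os.path.exists(output_path):
    print(output_path, "looks like MKV:", looks_like_matroska(output_path))
```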

Wrapping up

By understanding and implementing the claim-check pattern with Redpanda, you can now create more efficient and streamlined event-driven architectures. This not only enhances the performance of your applications but also ensures reliable and scalable data handling.

As larger amounts of data continue to be generated and processed, strategies like the claim-check pattern will become increasingly crucial in the world of distributed systems.

To keep exploring Redpanda, why not try their new Serverless option for free? If you have questions or want to chat with the team, join the Redpanda Community on Slack.