Announcing the general availability of Redpanda 23.2

Say hello to more power, scale, cost-efficiency, operational simplicity, and Kafka ecosystem support

Redpanda Data
Aug 2, 2023

Authors: Doug Flora, Matt Schumpert

In our recent Series C funding announcement, we noted our commitment to evolving Redpanda beyond the scope of legacy streaming data platforms, with a focus on empowering engineers to build the next generation of data applications and pipelines.

We’re happy to now release Redpanda 23.2, bringing the next stage of our evolution with even more power, scale, cost-efficiency, operational simplicity, and Kafka ecosystem support. The new capabilities detailed in this blog are generally available for both the self-hosted Redpanda Platform and fully-managed Redpanda Cloud deployments.

Let’s take a closer look at what’s packed into this new release.

Power, robustness, and scale: infinite storage, balancer improvements, higher throughput for transactions

Our goal with Redpanda is to offer a streaming data platform with no tradeoffs between performance and reliability when running at GBps+ scale. The 23.2 release continues to move the needle in that direction.

Last year, we introduced Redpanda Tiered Storage — with S3-compatible cloud object storage as the default storage tier — to reduce data retention costs by up to 10x and provide unified access to both historical and real-time data using the same API. Today, we’re enhancing Redpanda Tiered Storage to be even more seamless, automated, and efficient when scaling for large workloads in the cloud. These updates have the added benefit of helping to lower your cloud storage bill!

Here’s what changed:

  • Infinite topic retention: when using Redpanda Tiered Storage, the partition manifest becomes a compressed binary structure that’s seamlessly offloaded to cloud storage and fetched from cloud storage when needed. This change means you can scale to a virtually infinite number of segments per partition, keeping any amount of historical topic data accessible to clients (provided you have proper cluster sizing to support your read workloads).
  • Fine-grained caching: Redpanda recalls data from cloud storage in smaller chunks to support more concurrent consumers of historical data efficiently, with less local storage required to host the cache. (Even more storage efficiency gains!)
  • Automatic disk space management: Redpanda intelligently purges data from brokers’ local storage whenever needed and moves it to cloud storage to avoid filling disks. This further optimizes the use of all available local storage — for topic data, the tiered storage cache, and a reserve — so Redpanda is using 100% of the local disk that it safely can. The result: more data is available at low latency without the administrative overhead of configuring local retention explicitly.

Figure: Redpanda cloud-first storage architecture
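
To make the fine-grained caching idea concrete, here is a minimal sketch of how a chunked read could decide what to download. It is a toy model, not Redpanda's implementation: the `chunks_for_range` helper is hypothetical, and the segment and chunk sizes are assumed values picked only for illustration.

```python
def chunks_for_range(start, end, chunk_size):
    """Return the chunk indices that cover bytes [start, end) of a segment."""
    if start < 0 or end <= start or chunk_size <= 0:
        raise ValueError("expected 0 <= start < end and chunk_size > 0")
    first = start // chunk_size
    last = (end - 1) // chunk_size
    return list(range(first, last + 1))


MiB = 1024 * 1024
SEGMENT_SIZE = 128 * MiB  # assumed segment size, for illustration only
CHUNK_SIZE = 16 * MiB     # assumed chunk size, for illustration only

# A consumer of historical data reads 4 MiB starting 40 MiB into a segment.
needed = chunks_for_range(40 * MiB, 44 * MiB, CHUNK_SIZE)
bytes_fetched = len(needed) * CHUNK_SIZE

print("chunks to download:", needed)
print(bytes_fetched // MiB, "MiB fetched instead of", SEGMENT_SIZE // MiB, "MiB")
```

Under these assumptions, only the chunks that overlap the requested range occupy cache space, which is why more concurrent readers of historical data can fit in the same amount of local storage.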

We’ve also added a number of improvements to make Redpanda more robust and powerful by preventing hotspots, improving efficiency, and optimizing performance for high-throughput workloads. These improvements include:

  • Topic-aware leadership balancing: Redpanda now continuously distributes leaders of a topic across your cluster, spreading your topic workload evenly to prevent topic hotspots. This delivers more consistent performance and throughput when you’re writing data, particularly for large deployments.
  • Improved data balancing: Redpanda now avoids data hotspots by allocating partitions to nodes randomly rather than allocating them grouped by topic, ensuring that all replicas (not just leaders) of a topic get distributed more evenly across the cluster. Redpanda also checks for free disk space and the health of the target node before moving partitions to ensure a successful move. With the latest improvements, Redpanda can schedule partition moves faster by scheduling processes asynchronously, which vastly improves the efficiency and throughput of large moves.
  • Higher throughput for transactions: Redpanda now supports higher throughput when using the Kafka Transactions API, and can achieve up to a 50% throughput boost on the same hardware. The transaction state is now sharded into a topic with multiple partitions to provide more transaction throughput per broker. This behavior is enabled automatically only for new clusters created with v23.2 or later.
  • Faster restarts on aged clusters: Redpanda has significantly improved startup times and shortened maintenance windows for nodes in long-running clusters, because it now loads the controller log from a snapshot on startup.

Save up to 50% on cloud networking costs with follower fetching

In June, we announced follower fetching as an upcoming feature. Today we’re happy to share that it’s generally available for Redpanda self-hosted and Redpanda Cloud users.

Follower fetching enables your Redpanda consumers to fetch records from the closest physical replica of a topic partition, regardless of whether it’s a leader or a follower, to reduce network traffic for cross-AZ and cross-rack deployments. Ultimately, this can save you up to 50% on your infrastructure costs for read-heavy workloads in certain public clouds (check out the beta announcement blog to see a real-world example of cost savings on AWS).

In addition to cost savings, with follower fetching you can also reduce read latency for consumers and support additional read throughput via more efficient use of network bandwidth by all nodes in the cluster.

Figure: Follower fetching in a Redpanda topic
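
The replica choice behind follower fetching can be sketched in a few lines. This is a simplified model under stated assumptions, not the broker's actual selection logic: the `Replica` type and `pick_read_replica` function are hypothetical, and the rack labels stand in for whatever location a consumer advertises (Kafka clients typically do this with the `client.rack` setting).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Replica:
    broker_id: int
    rack: str
    is_leader: bool


def pick_read_replica(replicas, client_rack):
    """Prefer a replica in the consumer's rack; otherwise fall back to the leader."""
    leader = next(r for r in replicas if r.is_leader)
    if client_rack is None:
        return leader
    same_rack = [r for r in replicas if r.rack == client_rack]
    if not same_rack or leader in same_rack:
        return leader
    # Several followers may share the rack; pick deterministically.
    return min(same_rack, key=lambda r: r.broker_id)


# One partition replicated across three availability zones (hypothetical names).
replicas = [
    Replica(broker_id=0, rack="az-a", is_leader=True),
    Replica(broker_id=1, rack="az-b", is_leader=False),
    Replica(broker_id=2, rack="az-c", is_leader=False),
]

chosen = pick_read_replica(replicas, client_rack="az-b")
print("consumer in az-b reads from broker", chosen.broker_id)
```

A consumer in `az-b` reads from the follower in its own zone, so the fetch traffic never crosses an AZ boundary. A consumer with no rack set, or in a zone without a replica, still reads from the leader.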

Leveled-up DevEx and DevOps: rpk, Redpanda Console, K8s, Ansible, Terraform

Redpanda 23.2 includes a revamp of rpk—the command line experience—to help accelerate complex streaming data projects and improve cloud integration, including cluster profiles, Tiered Storage insights, and autocompletion.

  • Cluster profiles: users can now manage multiple Redpanda clusters from rpk, our command line interface (CLI) utility, helping to standardize and accelerate work happening across different clusters. Users can create multiple profiles for multiple clusters and swap between them with rpk profile use.
  • Tiered Storage insights: you can now get more insights on your data in Redpanda Tiered Storage with rpk topic describe-storage, including detailed information on local and cloud storage usage, and time of last write to object storage.
  • Autocompletion: you can now type the first few letters of an rpk command and press the Tab key to complete it automatically, just like with a search engine or word processor.

Figure: The new and improved rpk, simplifying admin across multiple environments
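
If you script cluster administration, the profile workflow above might be wrapped like this. Treat it as a rough sketch: the `rpk_command` and `run_if_available` helpers are hypothetical, the profile name `staging` and topic name `orders` are made up, and `rpk profile create` is assumed to be the companion to the `rpk profile use` command mentioned above. Check `rpk profile --help` for the exact options in your version.

```python
import shutil
import subprocess


def rpk_command(*args):
    """Build an rpk argument list, e.g. rpk_command("profile", "use", "staging")."""
    return ["rpk", *args]


def run_if_available(argv):
    """Run the command only if rpk is on PATH; return its stdout, else None."""
    if shutil.which(argv[0]) is None:
        return None
    result = subprocess.run(argv, capture_output=True, text=True, check=False)
    return result.stdout


# Hypothetical profile and topic names, for illustration only.
workflow = [
    rpk_command("profile", "create", "staging"),
    rpk_command("profile", "use", "staging"),
    rpk_command("topic", "describe-storage", "orders"),
]

if __name__ == "__main__":
    # Print the commands rather than executing them; pass each one to
    # run_if_available() to run it against a real cluster.
    for argv in workflow:
        print(" ".join(argv))
```

Keeping each command as an argument list avoids shell quoting problems and makes it easy to replay the same steps against a different profile.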

For a full breakdown of the new rpk functionality, check out the Redpanda documentation.

Redpanda Console, our powerful GUI for managing and troubleshooting applications, is now easier to integrate with your identity and access management (IAM) and encryption infrastructure, thanks to the following updates:

  • Azure AD and Keycloak SSO support: Users can now sign into Redpanda Console using their existing Azure AD or Keycloak identities and login credentials, adding to our growing list of Single Sign-On (SSO) integrations.
  • TLS Termination: You can now secure Redpanda Console using Transport Layer Security (TLS), either by letting Redpanda Console handle TLS termination or by offloading termination to an upstream component such as a reverse proxy or a cloud HTTPS load balancer.

We’re also making it easier than ever for Redpanda developers and administrators to deploy and manage Redpanda in cloud-native environments, with our improved Kubernetes (K8s) operator, and new Ansible and Terraform capabilities:

  • Redpanda OperatorV2: A major overhaul to our K8s deployment tooling greatly improves simplicity, transparency, extensibility, and automation. You can now use our Helm chart for simple, declarative cluster operations on K8s, and you can bring in additional complex orchestration and automation with our new-and-improved Redpanda OperatorV2. The new operator uses FluxCD Helm controllers to manage Redpanda installation, so both Helm and operator users have the same experience and support. We welcome community contributions to the open-source operator to make it even better!
  • “Five minutes to production” with Ansible and Terraform updates: Redpanda now supports rapid installation using Ansible and Terraform, including air-gapped installation via HTTP proxy or offline packaging and installation of binaries (see our Redpanda Deployment Automation Kit). Terraform Multi-AZ support is also expanded, including support for proxies and inter-security group rules.

Kafka ecosystem support: Schema ID validation, deleteRecords API

Redpanda functions as a drop-in replacement for Apache Kafka®, and to that end, we’re dedicated to ensuring API compatibility with the Kafka ecosystem. In addition to follower fetching, we’ve also added support for the Kafka deleteRecords API.

If you’re not familiar with deleteRecords, it’s exactly what it sounds like: it lets you delete records from a topic, from the beginning of a partition up to a specific offset. This is helpful if you need to free up disk space (for example, if your producers are pushing more data than anticipated), or if you want to trim the data that’s available to clients based on a specific business event (represented by the offset).
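
The semantics are easy to see in a toy model of a single partition. The `PartitionLog` class below is hypothetical and purely in-memory; it only illustrates how the start of the log moves forward, assuming the usual Kafka behavior that records with offsets below the requested offset are removed.

```python
class PartitionLog:
    """In-memory stand-in for one partition; not a client for a real cluster."""

    def __init__(self):
        self.start_offset = 0  # first offset that is still readable
        self.records = []      # records[i] has offset start_offset + i

    @property
    def end_offset(self):
        return self.start_offset + len(self.records)

    def append(self, value):
        offset = self.end_offset
        self.records.append(value)
        return offset

    def delete_records(self, before_offset):
        """Drop every record with offset < before_offset; return the new start offset."""
        if before_offset > self.end_offset:
            raise ValueError("offset is beyond the end of the log")
        if before_offset > self.start_offset:
            del self.records[: before_offset - self.start_offset]
            self.start_offset = before_offset
        return self.start_offset

    def read(self, offset):
        if not self.start_offset <= offset < self.end_offset:
            raise IndexError(f"offset {offset} is not available")
        return self.records[offset - self.start_offset]


log = PartitionLog()
for i in range(5):
    log.append(f"event-{i}")

new_start = log.delete_records(3)  # trim everything before offset 3
print("log now starts at offset", new_start, "with", len(log.records), "records")
```

Offsets are never reused or renumbered: after the trim, offset 3 still refers to the same record, and reads below the new start offset fail.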

And last but certainly not least: we’re bringing schema ID validation to Redpanda so developers can easily validate messages against known data types registered in the Redpanda schema registry. Schema validation uses policies that define exactly where to expect certain data types within messages sent to topics. This means you get value out of real-time data more quickly, with less manual filtering work, and it ensures that only valid data reaches downstream consumers.

Plus, Redpanda schema validation supports the Confluent Serdes wire format so it’s now super simple to migrate existing applications that use Confluent Serdes libraries and validation over to Redpanda. No code changes required.
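
For context, the Confluent wire format prefixes each serialized message with a one-byte magic value of 0 followed by a four-byte big-endian schema ID. The sketch below shows how that header can be written and checked. It is an illustration only: the function names are hypothetical, and the `registered_ids` set is a stand-in for a real schema registry lookup, not how Redpanda performs validation internally.

```python
import struct

MAGIC_BYTE = 0
HEADER = struct.Struct(">bI")  # 1-byte magic + 4-byte big-endian schema ID


def encode(schema_id, payload):
    """Prefix an already-serialized payload with the wire-format header."""
    return HEADER.pack(MAGIC_BYTE, schema_id) + payload


def decode(message):
    """Split a framed message into (schema_id, payload)."""
    if len(message) < HEADER.size:
        raise ValueError("message is too short to contain a schema ID header")
    magic, schema_id = HEADER.unpack_from(message)
    if magic != MAGIC_BYTE:
        raise ValueError(f"unexpected magic byte: {magic}")
    return schema_id, message[HEADER.size:]


def has_known_schema_id(message, registered_ids):
    """Return True only if the header is well formed and the ID is registered."""
    try:
        schema_id, _ = decode(message)
    except ValueError:
        return False
    return schema_id in registered_ids


registered_ids = {7, 42}  # hypothetical IDs standing in for a registry
framed = encode(42, b'{"order_id": 1}')

print(has_known_schema_id(framed, registered_ids))
print(has_known_schema_id(b"plain bytes with no header", registered_ids))
```

Because the schema ID travels in the first five bytes of every message, a check like this needs no changes to producers that already use Confluent Serdes libraries.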

Other improvements to Redpanda’s Kafka ecosystem compatibility include a richer built-in Protobuf type system and enhanced support for Avro formats.

That’s a wrap!

Phew, talk about a well-rounded release. We trust that all our work over the last several months will help accelerate your streaming data projects and ensure that Redpanda is your real-time engine of choice, regardless of workload type or size.

All of these summer updates are available today, so request a 30-day trial of Redpanda Enterprise or Cloud, or grab the free Community Edition from our Redpanda GitHub repo.

By the way, did you know that Redpanda is proven to help you reduce your streaming data costs by up to 6x? If you have questions about how much we can reduce your cloud bill, join the Redpanda Community on Slack, or get in touch with one of our experts.
