
Introduction

In a world where technology constantly pushes boundaries, we find ourselves at the forefront of progress. High-end CPUs, lightning-fast I/O operations, memory that defies limits, and robust infrastructures like DPDK, VPP, and eBPF have ushered in a new era. It’s a fusion of power and potential that seems boundless.

Our CTO, Mr. Vivek Gupta, presented on scaling DPDK applications, and the challenges faced while doing so, at the DPDK Summit hosted by the Linux Foundation in Dublin, Ireland.

Why are we here?

If you are a developer, building VNFs and networking applications such as NGFWs or Load Balancers, you may have encountered some challenges or limitations that stopped you from reaching your goals. You may have struggled to migrate your existing applications, which were originally built as standard Linux user-space apps, to a DPDK and VPP-enabled environment. You may have also faced difficulties in extending the benefits of DPDK and VPP to appliances such as edge devices, which often have limited hardware resources.

But did you succeed in achieving your desired results? Did you face any bottlenecks or limitations that prevented you from reaching the full potential of your user space networking applications?

If that sounds familiar, then you’re in the right place. In this post, we will share some of the problems that we faced while building various solutions using DPDK and how we solved them. We will also share the ideas we presented to the community for additions to DPDK that would better support different kinds of user space networking applications.

Let’s get started.

Challenge #1: High-throughput, I/O-intensive TCP/UDP apps

One of the most common types of user space networking application is the kind that performs high-throughput, I/O-intensive TCP/UDP operations. Examples include Secure Sockets Layer virtual private networks (SSL VPNs), proxies, data-driven machine learning solutions, and real-time observability apps.

These applications usually follow one of these models:

Event-Driven Architecture:

– Utilizes centralized event processing within a single context.
– Relies on blocking calls for socket read/write actions.
– Achieves scaling by deploying multiple process instances.

Thread-Per-Connection Approach:

– Spawns individual threads for each network connection.
– Employs blocking calls for specific network operations within threads.
– Assigns each thread the entire lifecycle of its designated connection.

Worker Thread Model:

– Designates specialized threads for managing network I/O tasks.
– Offloads the actual computation to separate worker threads.
– Commonly executes synchronous calls to complete allocated tasks.
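
To make the contrast with the DPDK model concrete, here is a minimal sketch of the thread-per-connection approach in Python. It is an illustration, not code from any of our applications: the function names are hypothetical, and `socket.socketpair()` stands in for an accepted TCP connection so the example runs without a network.

```python
import socket
import threading


def handle_connection(conn: socket.socket) -> None:
    """Own one connection for its whole lifetime, using blocking calls."""
    with conn:
        while True:
            data = conn.recv(4096)      # blocks until data arrives or the peer closes
            if not data:                # empty read: the peer closed the connection
                break
            conn.sendall(data.upper())  # blocks until the whole reply is handed to the kernel


def serve_one(conn: socket.socket) -> threading.Thread:
    """Spawn a dedicated thread for a newly accepted connection."""
    thread = threading.Thread(target=handle_connection, args=(conn,), daemon=True)
    thread.start()
    return thread


def demo() -> bytes:
    # Assumption for the demo: a socket pair plays the role of client and server ends.
    client, server_side = socket.socketpair()
    worker = serve_one(server_side)
    with client:
        client.sendall(b"hello")
        reply = client.recv(4096)
    worker.join(timeout=2)
    return reply
```

Every blocking `recv` or `sendall` here parks a thread, which is harmless with a kernel stack but clashes with a poll-mode design where one core is expected to keep draining packets.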


However, these models face compatibility issues with DPDK-based user space TCP stack architectures, which favor a thread-per-core or process-per-core model in which each core handles both the TCP stack and the application logic. Check the following diagram for a better visual representation.

This means that if you want to migrate your existing TCP/UDP applications to a DPDK environment, you will have to make significant code changes to support asynchronous behavior wherever the slow path needs packet I/O. You will also have to split your application into two parts, a slow path and a fast path, and assign separate cores to the packet path and to the worker tasks.
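
The split can be pictured with a small single-process simulation. This is a sketch under stated assumptions, not DPDK code: the class, the burst size, and the policy rule are all hypothetical, and plain Python deques stand in for the queues between cores (in a DPDK application the handoff would typically use a lockless ring such as `rte_ring`).

```python
from collections import deque
from typing import Deque, Dict, List, Tuple

FlowKey = Tuple[str, str, int, int]   # (src_ip, dst_ip, src_port, dst_port)
Packet = Tuple[FlowKey, bytes]

BURST = 32  # packets handled per polling iteration (illustrative value)


def slow_policy_lookup(key: FlowKey) -> str:
    """Stand-in for an expensive policy decision (hypothetical rule)."""
    return "drop" if key[3] == 23 else "forward"


class SplitPipeline:
    def __init__(self) -> None:
        self.flow_table: Dict[FlowKey, str] = {}
        self.to_slow_path: Deque[Packet] = deque()
        self.forwarded: List[Packet] = []
        self.dropped = 0

    def fast_path_poll(self, rx: Deque[Packet]) -> None:
        """Packet-core work: never waits, only flow-table hits are finished here."""
        for _ in range(min(BURST, len(rx))):
            pkt = rx.popleft()
            verdict = self.flow_table.get(pkt[0])
            if verdict is None:
                self.to_slow_path.append(pkt)  # miss: hand off instead of waiting
            elif verdict == "forward":
                self.forwarded.append(pkt)
            else:
                self.dropped += 1

    def slow_path_poll(self, rx: Deque[Packet]) -> None:
        """Worker-core work: resolve misses, install verdicts, re-inject packets."""
        while self.to_slow_path:
            pkt = self.to_slow_path.popleft()
            if pkt[0] not in self.flow_table:
                self.flow_table[pkt[0]] = slow_policy_lookup(pkt[0])
            rx.append(pkt)


def demo() -> Tuple[int, int, int]:
    web = ("10.0.0.1", "10.0.0.2", 40000, 443)
    telnet = ("10.0.0.1", "10.0.0.2", 40001, 23)
    rx: Deque[Packet] = deque([(web, b"a"), (telnet, b"b"), (web, b"c")])
    pipeline = SplitPipeline()
    pipeline.fast_path_poll(rx)  # every packet misses the empty flow table
    pipeline.slow_path_poll(rx)  # verdicts installed, packets re-injected
    pipeline.fast_path_poll(rx)  # the same packets now hit the table
    return len(pipeline.forwarded), pipeline.dropped, len(pipeline.flow_table)
```

The extra queue, the re-injection step, and the shared flow table are exactly the kind of added machinery that the original application logic never needed.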

This solution may work, but it is far from ideal. It introduces additional complexity and overhead in your application logic, and limits your flexibility in handling different types of connections across different types of interfaces. For example, if your application requires heavy-duty processing such as policy lookups or deep packet inspection, you may end up facing issues such as high latency, frequent packet drops, and multiple retransmissions. Check the following visual representation for a better understanding.

Challenge #2: SSL VPN-based applications

Another common type of user space networking application is the kind that implements an SSL VPN. These applications rely heavily on the OpenSSL library for cryptographic functions, such as encryption and decryption.

DPDK plays a dual role in encryption and decryption. First, it speeds up encryption when the work is offloaded to specialized hardware. Second, it accelerates encryption when the work is done in software. To achieve this, DPDK can work with OpenSSL, a widely used cryptographic toolkit, to streamline the encryption process. Check out the following diagram.

But there’s a catch. OpenSSL, although widely used, isn’t perfect: its use of locks can slow processing down, much like a traffic jam on a highway. Also, VPP, which has its own features for secure connections, still relies on OpenSSL or a similar library to get the job done.
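
The structural difference between a shared, locked context and per-thread state can be sketched as follows. This is not OpenSSL code and says nothing about OpenSSL internals: the class names and key are hypothetical, and an HMAC from the Python standard library stands in for the real cryptographic work.

```python
import hashlib
import hmac
import threading
from typing import List

KEY = b"demo-key"  # hypothetical key, for illustration only


class SharedContext:
    """One context for all threads: every operation takes the same lock."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self.lock_acquisitions = 0

    def protect(self, payload: bytes) -> bytes:
        with self._lock:  # all threads serialize here
            self.lock_acquisitions += 1
            return hmac.new(KEY, payload, hashlib.sha256).digest()


class PerThreadContext:
    """Each thread lazily builds its own state, so the hot path takes no lock."""

    def __init__(self) -> None:
        self._local = threading.local()

    def protect(self, payload: bytes) -> bytes:
        base = getattr(self._local, "base", None)
        if base is None:  # first call on this thread: create its private keyed state
            base = self._local.base = hmac.new(KEY, digestmod=hashlib.sha256)
        mac = base.copy()
        mac.update(payload)
        return mac.digest()


def run(ctx, n_threads: int = 4, per_thread: int = 50) -> List[List[bytes]]:
    """Have several threads protect their own packets through the given context."""
    results: List[List[bytes]] = [[] for _ in range(n_threads)]

    def work(i: int) -> None:
        results[i] = [ctx.protect(b"pkt-%d-%d" % (i, j)) for j in range(per_thread)]

    threads = [threading.Thread(target=work, args=(i,)) for i in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

Both contexts produce identical output; the difference is that the shared one funnels every packet through a single lock, which is the pattern that becomes a bottleneck at high packet rates.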

We faced this challenge when we tried to develop a VPN broker application that connects remote users to data center applications. Our application required each packet to exit one tunnel and be switched to another tunnel. Our existing VPN broker application followed the thread-per-connection model, processed requests synchronously, and employed extensive locking throughout the application, including in SSL operations.

We realized that this model was unsuitable due to the high per-packet overhead. We decided to use the VPP host stack with the VCL library, which allowed us to migrate our VPN application to DPDK and VCL.

This solution worked, but it was not ideal either. The best solution would have been to embed the TCP stack and SSL processing in a single process with multiple threads, and to allow sockets to be reused across threads.

Challenge #3: Machine learning applications

Machine learning applications are gaining ground in the world of networking. Here’s what they do:

– They search for specific patterns or signatures in data packets or blocks of information.
– They pick out these patterns to use as input for the machine learning engine.
– They make sure the data is in the right order to accurately find these different patterns.
– They use machine learning techniques to process the extracted data and give you the results you’re after.
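
The steps above can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the signature list, the function names, and the fixed rule that stands in for a trained model. The sketch also ignores overlapping or duplicate segments.

```python
from typing import Iterable, List, Tuple

SIGNATURES = [b"GET ", b"POST ", b"\x16\x03"]  # hypothetical patterns to look for


def reassemble(segments: Iterable[Tuple[int, bytes]]) -> bytes:
    """Put (sequence_number, payload) segments back in order before scanning."""
    return b"".join(payload for _, payload in sorted(segments, key=lambda s: s[0]))


def extract_features(stream: bytes) -> List[int]:
    """Count occurrences of each signature; the counts form the feature vector."""
    return [stream.count(sig) for sig in SIGNATURES]


def classify(features: List[int]) -> str:
    """Stand-in for the ML engine: a fixed rule, not a trained model."""
    return "http" if features[0] + features[1] > 0 else "other"


def demo() -> Tuple[List[int], str, int]:
    # Segments arrive out of order, and the "GET " signature spans two of them.
    segments = [(2, b"T /index"), (1, b"GE"), (3, b".html")]
    features = extract_features(reassemble(segments))
    # Scanning each segment on its own misses the pattern entirely.
    per_segment_hits = sum(payload.count(b"GET ") for _, payload in segments)
    return features, classify(features), per_segment_hits
```

The demo shows why ordering matters: the signature is only visible once the segments are reassembled, so the reordering cost is paid before any pattern matching or inference can start.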

These applications are not necessarily user-space applications, but they still face challenges in achieving line rate performance. The algorithms are relatively slow and complex, and require high computational power.

Challenge #4: Kubernetes-based DPDK applications

On the cloud, DPDK applications running on pods have specific requirements, such as:

– Ability to attach to the physical interfaces for high throughput, using SR-IOV.
– Ability to assign the CPUs to the pods based on the type of pod.
– Ability to assign the interfaces to the pods in a dynamic fashion.
– Ability to attach the pod and the interfaces to the load-balancer target groups.
– Ability to unbind the interfaces when the pod goes down.

We worked on multiple DPDK/VPP-based applications on the Kubernetes platform. We realized that a service is needed to manage the pods, interfaces, and connectivity, and to ensure that the system stays up and running. We developed our own service to manage these tasks, but there may be better and more standardized alternatives.
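
As a rough picture of what such a service has to track, here is a toy in-memory model in Python. It is not our service and does not call the Kubernetes API: the class, method names, and resource names are hypothetical, and a real implementation would react to pod lifecycle events and program the actual devices and load balancer.

```python
from typing import Dict, List, Set


class InterfaceManager:
    """Toy model of a service that binds interfaces and CPUs to pods."""

    def __init__(self, interfaces: List[str], cpus: List[int]) -> None:
        self.free_interfaces: List[str] = list(interfaces)
        self.free_cpus: List[int] = list(cpus)
        self.bindings: Dict[str, Dict[str, list]] = {}
        self.target_group: Set[str] = set()

    def pod_up(self, pod: str, n_interfaces: int, n_cpus: int) -> Dict[str, list]:
        """Assign interfaces and CPUs to a new pod and register it with the load balancer."""
        if pod in self.bindings:
            raise ValueError(pod + " is already bound")
        if n_interfaces > len(self.free_interfaces) or n_cpus > len(self.free_cpus):
            raise RuntimeError("not enough free resources for " + pod)
        binding = {
            "interfaces": [self.free_interfaces.pop(0) for _ in range(n_interfaces)],
            "cpus": [self.free_cpus.pop(0) for _ in range(n_cpus)],
        }
        self.bindings[pod] = binding
        self.target_group.add(pod)
        return binding

    def pod_down(self, pod: str) -> None:
        """Unbind everything the pod held and remove it from the target group."""
        binding = self.bindings.pop(pod)
        self.free_interfaces.extend(binding["interfaces"])
        self.free_cpus.extend(binding["cpus"])
        self.target_group.discard(pod)


def demo() -> InterfaceManager:
    manager = InterfaceManager(["vf0", "vf1", "vf2"], [2, 3, 4, 5])
    manager.pod_up("fw-0", n_interfaces=2, n_cpus=2)
    manager.pod_up("lb-0", n_interfaces=1, n_cpus=1)
    manager.pod_down("fw-0")  # its interfaces and CPUs return to the free pools
    return manager
```

Even this toy version has to keep three pieces of state consistent (free pools, per-pod bindings, and target-group membership), which is why a standardized component for the job would be welcome.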

We propose adding support to DPDK for various applications to improve scalability and efficiency. Our key requirements include:

Enhancing user space TCP-stacks: We want to enable the use of sockets across multiple threads and facilitate synchronous operations.

Session and flow tables with aging and export support: Many applications deal with encrypted traffic, where sessions are identified by inner packet information. The current rte_flow API is inadequate for this purpose. We need session tables with customizable keys, and extensions or lookaside tables to address specific requirements. Additionally, we require support for aging and session export.
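
To show what we mean by customizable keys, aging, and export, here is a minimal sketch in Python. It is not a proposed DPDK API: the class, its methods, and the packet field names are hypothetical, and time is passed in explicitly so the behavior is easy to follow.

```python
from typing import Any, Callable, Dict, Hashable, List, Tuple

Packet = Dict[str, Any]
Session = Dict[str, Any]


class SessionTable:
    def __init__(
        self,
        key_fn: Callable[[Packet], Hashable],  # caller decides which fields form the key
        idle_timeout: float,                   # seconds without traffic before aging out
        export: Callable[[Session], None],     # called once for each aged-out session
    ) -> None:
        self.key_fn = key_fn
        self.idle_timeout = idle_timeout
        self.export = export
        self.sessions: Dict[Hashable, Session] = {}

    def update(self, packet: Packet, now: float) -> Session:
        """Create or refresh the session that this packet belongs to."""
        key = self.key_fn(packet)
        session = self.sessions.get(key)
        if session is None:
            session = {"key": key, "packets": 0, "bytes": 0, "first_seen": now}
            self.sessions[key] = session
        session["packets"] += 1
        session["bytes"] += packet["length"]
        session["last_seen"] = now
        return session

    def age(self, now: float) -> int:
        """Remove idle sessions, exporting each one; return how many were removed."""
        expired = [k for k, s in self.sessions.items()
                   if now - s["last_seen"] >= self.idle_timeout]
        for key in expired:
            self.export(self.sessions.pop(key))
        return len(expired)


def demo() -> Tuple[int, List[Session], int]:
    exported: List[Session] = []
    table = SessionTable(
        # Key on inner (decrypted) header fields rather than the outer tunnel headers.
        key_fn=lambda p: (p["inner_src"], p["inner_dst"], p["inner_dport"]),
        idle_timeout=10.0,
        export=exported.append,
    )
    a = {"inner_src": "10.1.0.5", "inner_dst": "10.2.0.9", "inner_dport": 443, "length": 100}
    b = {"inner_src": "10.1.0.6", "inner_dst": "10.2.0.9", "inner_dport": 53, "length": 60}
    table.update(a, now=0.0)
    table.update(a, now=1.0)
    table.update(b, now=8.0)
    aged = table.age(now=11.0)  # only the first session has been idle for 10 seconds
    return aged, exported, len(table.sessions)
```

Because the key function is supplied by the application, the same table can track sessions by inner addresses, tunnel identifiers, or any other fields the application can extract.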

OpenSSL equivalent for DPDK: We need a cryptographic library that can seamlessly integrate with DPDK and VPP, offering lock-free operations.

Traffic management: We seek a method to handle various types of connections across different interfaces, taking into account their volume and distribution.

If you agree with us, and if you are interested in collaborating with us on these topics, please let us know. Get in touch with our army of industry leaders and thought experts today. Let’s build something great together!
