Blog: Solving the Challenge of High Scale, Stateful Network Services

Written By Michael Zagalsky, VP Engineering, InsidePacket

A Tale of Two Platforms

We’re all familiar with the dichotomy of the networking world.

There are two competing camps, caught up in a bitter struggle.

The first camp consists of the proponents of virtualization and disaggregation. It’s all about flexibility. Hardware must be commoditized, software must be containerized, deployment must be streamlined, and network functions must be flexibly chained.

The second camp evangelizes acceleration and consolidation. It’s all about performance. Hardware must be purpose-built, software must be optimized, data path must be fine-tuned, and network functions must be closely integrated.

So, which camp got it right? At InsidePacket, we believe they are both right in a way. And in a way, they are both wrong. In life, as always, the truth lies somewhere in the middle.

The Primordial Sin

Let’s get to the bottom of it and examine the fundamental qualities of each platform.

Dedicated hardware excels at two major metrics: latency and throughput. Additionally, traffic processing performance is deterministic; you get your clocks-per-packet-per-stage budget, and whatever function is embedded in the pipeline gets done – there are no surprises.

The downside? Well, there are a few. Silicon vendors have limited die real estate that must be split between processing logic and memory. As a result, memory resources are limited. The deterministic pipeline also means that each stage must have guaranteed access to the relevant memories for each processed packet, so CPU accesses contend with pipeline accesses. ASIC designers must carefully craft memory access interfaces, and the resulting compromises limit the platform’s flexibility and its ability to support stateful processing at line rate.

These are inherent problems, not limited to a specific vendor. The new breed of programmable ASICs extends the boundaries of application flexibility and memory allocation, but it struggles when confronted with the performance requirements of L4-L7 applications such as NGFW, Threat Protection, CG-NAT, and inline Load Balancing. The performance envelope is severely unbalanced: throughput is in the terabits per second, but the connections-per-second (CPS) rate ranges from tens of thousands to a few hundred thousand at most, and the concurrent connections (CCN) limit is on the order of tens of millions at best, even with the right programmable switch ASIC.

It is a mirror image for server-based platforms. Memory is practically unlimited – at least compared to ASICs. Memory access is uncontested – the CPU is king. Flexibility… limited only by your creativity and imagination.

Downsides? Limited throughput, high latency, sharp performance degradation under load, poor integration between networking functions where each added function steals precious CPU cycles from the rest.

The scales of the performance envelope tilt in the opposite direction: CPS in the millions, CCN in the hundreds of millions, but throughput on the order of tens of Gbps with high sensitivity to spikes.

Of course, we can always throw more VMs at the problem, add a few dozen, hundred, or thousand more servers, and we’re saved – right? Not really. Not if we are taking the new security requirements at the Edge seriously. The obvious issue is cost – how many extra dollars are you willing to spend just to be able to claim your network is “all-virtual”? There is also the overhead of increased management complexity, unless you choose to wave it away – after all, we are engineers, we like hard problems! But what about all the added latency, power consumption, and physical space? At some point, the scale-out approach reaches a breaking point.
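To make the scaling argument concrete, here is a small back-of-the-envelope sketch. The capacity figures are illustrative assumptions that only mirror the orders of magnitude quoted above (they are not measurements of any product), and the names `Envelope` and `units_needed` are made up for this example. It shows which metric forces the unit count on each platform.

```python
import math
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Envelope:
    """Capacity of one unit, or a target load. All figures are assumptions."""
    throughput_gbps: float
    cps: float  # new connections per second
    ccn: float  # concurrent connections


# Rough orders of magnitude taken from the text, not vendor data.
SWITCH_ASIC = Envelope(throughput_gbps=3_000, cps=300_000, ccn=30_000_000)
SERVER = Envelope(throughput_gbps=40, cps=3_000_000, ccn=300_000_000)


def units_needed(target: Envelope, unit: Envelope) -> Tuple[int, str]:
    """Return how many units cover the target and which metric dictates it."""
    ratios = {
        "throughput": target.throughput_gbps / unit.throughput_gbps,
        "cps": target.cps / unit.cps,
        "ccn": target.ccn / unit.ccn,
    }
    bottleneck = max(ratios, key=ratios.get)
    return math.ceil(ratios[bottleneck]), bottleneck


# A hypothetical Edge site: 2 Tbps, 2M CPS, 150M concurrent connections.
target = Envelope(throughput_gbps=2_000, cps=2_000_000, ccn=150_000_000)
print(units_needed(target, SERVER))       # (50, 'throughput')
print(units_needed(target, SWITCH_ASIC))  # (7, 'cps')
```

Under these assumed numbers, servers alone are bound by throughput and switch ASICs alone are bound by connection rate, which is exactly the imbalance described above.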

The Path to Salvation

If only we could somehow marry the two – get the best of both worlds, bring each platform to its comfort zone where it can truly shine… Well, there is a way – with a Hybrid Platform. The idea of separating duties and sharing load between CPU and ASIC components is, of course, not new. What is unique about the InsidePacket approach is using proven best-in-class solutions as elementary building blocks that can be easily scaled independently of each other. Combined, they form a flexible Accelerated Data Plane, controlled via a high-level open API.

InsidePacket Hybrid is composed of two types of elements: a server that hosts a flexible virtualized control plane (which can be moved to the cloud) and a flexible data plane, and a Carrier Grade programmable switch that provides a high-throughput data plane. On the server side, the choice between a standard NIC and a SmartNIC can be made based on application requirements, where a SmartNIC can further improve performance by offloading certain workloads such as SSL encryption/decryption. You get an all-in-one package of application flexibility, multi-Tbps throughput, low latency, guaranteed performance, CPS in the millions, and CCN in the hundreds of millions – magic!

So, is there a catch? Well, yes. Let’s address the elephant (flow) in the room: how do we get the right flows to be processed by the right platform? How do we identify the fraction of heavy-hitter (a.k.a. elephant) flows carrying the lion’s share of the traffic volume? These are the flows we want to fully offload to the programmable switch pipeline.
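As a reference point for what "heavy hitter" means here, the sketch below picks the smallest set of largest flows that together carry a chosen share of the bytes. It assumes exact per-flow byte counts are already available, which is precisely the expensive part in practice; the function name, the flow labels, and the 80% share are illustrative assumptions.

```python
def heavy_hitters(flow_bytes: dict, share: float = 0.8) -> list:
    """Return the largest flows whose combined bytes reach `share` of the total."""
    total = sum(flow_bytes.values())
    picked, covered = [], 0
    # Walk flows from largest to smallest until the share is covered.
    for flow, size in sorted(flow_bytes.items(), key=lambda kv: kv[1], reverse=True):
        if covered >= share * total:
            break
        picked.append(flow)
        covered += size
    return picked


# Six flows, two of which carry 85% of the bytes.
flows = {"a": 700, "b": 150, "c": 50, "d": 40, "e": 30, "f": 30}
print(heavy_hitters(flows))  # ['a', 'b']
```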

The Elephant in the Room

For our Hybrid Platform elements to work in harmony, we must quickly and efficiently identify the chosen ones – the heavy hitter flows that must cross the sea to the promised land of unlimited throughput on the switch pipeline. This is not an easy task, and there are numerous publications on this topic suggesting which fields to track, which parameters to measure, what kind of DPI to perform, etc. The problem is that these are computationally intensive tasks when performed on each traffic flow, exhausting the resources needed to do the actual work the server is tasked with.

There are also some algorithms designed for a programmable pipeline implementation – these typically aim at detecting some number of heavy hitters out of tens of thousands of flows – a drop in the ocean compared to the capacity required by real-life stateful applications.
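For readers unfamiliar with this family of algorithms, one well-known example of a fixed-memory counting structure is the count-min sketch. The version below is a generic textbook illustration written in software, not the specific algorithm of any publication alluded to above and not InsidePacket's method; the width, depth, and hash choice are arbitrary assumptions.

```python
import hashlib


class CountMinSketch:
    """Fixed-size approximate counter; estimates never fall below the true count."""

    def __init__(self, width: int = 1024, depth: int = 4):
        self.width = width
        self.depth = depth
        self.rows = [[0] * width for _ in range(depth)]

    def _index(self, row: int, key: str) -> int:
        # One independent hash per row, derived by salting BLAKE2b with the row id.
        digest = hashlib.blake2b(
            key.encode(), digest_size=8, salt=row.to_bytes(2, "big")
        ).digest()
        return int.from_bytes(digest, "big") % self.width

    def add(self, key: str, count: int = 1) -> None:
        for row in range(self.depth):
            self.rows[row][self._index(row, key)] += count

    def estimate(self, key: str) -> int:
        # Collisions only inflate counters, so the minimum is the tightest bound.
        return min(self.rows[row][self._index(row, key)] for row in range(self.depth))


sketch = CountMinSketch()
sketch.add("elephant", 1000)
for i in range(50):
    sketch.add(f"mouse-{i}", 1)
print(sketch.estimate("elephant") >= 1000)  # True
```

The memory footprint is fixed up front, which is what makes such structures attractive for a pipeline, and also what limits how many flows they can track accurately.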

InsidePacket’s approach is to use switch-assisted detection algorithms. This approach provides a holistic solution where all traffic is inspected, and the focus is dynamically steered to the proper portions of the traffic.

The switch can easily do the heavy lifting of looking at all the traffic, albeit not at per-flow resolution. It can provide instant and accurate aggregate insights on “traffic brackets” – partitions of the overall traffic flowing through the Hybrid Platform. The server then picks up where the switch left off. But now it is not searching in the dark – it can use the aggregate insights provided by the switch as the first level of approximation and run a deeper analysis just on the relevant traffic brackets.
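The two-stage idea can be sketched as a toy simulation. Everything here is an assumption made for illustration: brackets are modeled as hash buckets, the bucket count and threshold are arbitrary, and both stages read the same in-memory packet list, whereas in a real deployment the switch would keep the counters and the server would see only the traffic steered to it.

```python
import zlib
from collections import Counter

NUM_BRACKETS = 8  # assumed number of traffic brackets


def bracket_of(flow_id: str) -> int:
    """Map a flow to a bracket; a plain hash bucket in this sketch."""
    return zlib.crc32(flow_id.encode()) % NUM_BRACKETS


def switch_stage(packets) -> Counter:
    """Aggregate bytes per bracket. No per-flow state is kept."""
    totals = Counter()
    for flow_id, size in packets:
        totals[bracket_of(flow_id)] += size
    return totals


def server_stage(packets, hot_brackets, threshold: int) -> set:
    """Count per flow, but only for packets that fall in a hot bracket."""
    per_flow = Counter()
    for flow_id, size in packets:
        if bracket_of(flow_id) in hot_brackets:
            per_flow[flow_id] += size
    return {flow for flow, total in per_flow.items() if total >= threshold}


def detect(packets, top_brackets: int = 2, threshold: int = 10_000) -> set:
    totals = switch_stage(packets)
    hot = {bracket for bracket, _ in totals.most_common(top_brackets)}
    return server_stage(packets, hot, threshold)


# One elephant (50 x 1500 bytes) among 100 single-packet mice.
packets = [("elephant", 1500)] * 50 + [(f"mouse-{i}", 100) for i in range(100)]
print(detect(packets))  # {'elephant'}
```

The server-side per-flow table only ever holds flows from the hot brackets, which is where the savings come from.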

In addition to accurately detecting the heavy hitters, the consolidated insights are fed back into the traffic distribution logic defining the traffic brackets – thus creating a dynamic positive feedback loop.
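One way to picture such a feedback loop is a partition that gets finer wherever traffic concentrates. The sketch below is a guess at the general shape only: it models brackets as half-open ranges of a hash space and splits any bracket that carries more than half of the bytes. The hash space size, the split rule, and all names are assumptions, not a description of the actual product logic.

```python
import zlib

HASH_SPACE = 1 << 16  # assumed size of the hash space being partitioned


def flow_hash(flow_id: str) -> int:
    return zlib.crc32(flow_id.encode()) % HASH_SPACE


def measure(brackets, packets) -> dict:
    """Bytes seen per bracket, where a bracket is a half-open range (lo, hi)."""
    counts = {bracket: 0 for bracket in brackets}
    for flow_id, size in packets:
        h = flow_hash(flow_id)
        for lo, hi in brackets:
            if lo <= h < hi:
                counts[(lo, hi)] += size
                break
    return counts


def refine(brackets, counts, hot_share: float = 0.5) -> list:
    """Split every bracket that carries more than `hot_share` of all bytes."""
    total = sum(counts.values())
    refined = []
    for lo, hi in brackets:
        if total and counts.get((lo, hi), 0) > hot_share * total and hi - lo > 1:
            mid = (lo + hi) // 2
            refined += [(lo, mid), (mid, hi)]
        else:
            refined.append((lo, hi))
    return refined


packets = [("elephant", 9000), ("mouse", 100)]
brackets = [(0, HASH_SPACE)]
for _ in range(3):  # three rounds of measure-then-refine
    brackets = refine(brackets, measure(brackets, packets))
print(len(brackets))  # 4
```

Each round halves only the bracket that holds the dominant flow, so resolution improves where it matters while the rest of the partition stays coarse.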

The value of efficient heavy-hitter flow detection extends far beyond workload distribution between the switch and the server, to domains such as Traffic Engineering, Analytics, and more.

Two Is Better Than One

A Hybrid Platform is an extremely powerful concept when properly designed. From our experience at InsidePacket, we learned that making the server and the switch work in tandem, so that each component handles the part of the problem that best matches its abilities, provides multiple benefits and generally outperforms a single-platform approach.

This can be applied to a variety of performance-optimization challenges. We already mentioned the throughput/CPS imbalance problem and the heavy-hitter detection problem. Other great use cases, to name a few, are offloading L2/L3 networking and tunneling workloads to the switch, having the switch manage traffic distribution between multiple CPUs/cores, and implementing DDoS mitigation and policing at line rate in the switch, coupled with millions of dynamically managed ACL rules that protect not only the downstream services but also the server resources. The list goes on, covering some non-trivial use cases such as assisting and accelerating content inspection and malware detection.
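Of the use cases above, traffic distribution between cores is the easiest to illustrate. For stateful services, both directions of a connection should land on the same core, so a common trick is to hash a direction-independent form of the 5-tuple. The sketch below is a generic software illustration of that idea with a made-up function name; how a given switch pipeline would implement it is not something the text specifies.

```python
import zlib


def core_for(src_ip: str, src_port: int, dst_ip: str, dst_port: int,
             proto: int, num_cores: int) -> int:
    """Pick a core so that both directions of a connection map to the same one."""
    # Sort the two endpoints so (A -> B) and (B -> A) produce the same key.
    a, b = sorted([(src_ip, src_port), (dst_ip, dst_port)])
    key = f"{a[0]}:{a[1]}|{b[0]}:{b[1]}|{proto}".encode()
    return zlib.crc32(key) % num_cores


forward = core_for("10.0.0.1", 40000, "192.0.2.7", 443, 6, num_cores=16)
reverse = core_for("192.0.2.7", 443, "10.0.0.1", 40000, 6, num_cores=16)
print(forward == reverse)  # True
```

Keeping a connection pinned to one core means its state never has to be shared or locked across cores.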

The Call for an Open Ecosystem

Building a highly scalable, low-latency, and function-rich Edge is a real challenge. To overcome this challenge, we need to use the best-in-class technologies available today: scalable programmable-pipeline switches, efficient data-path implementation in the server, SmartNIC offload, advanced algorithms complemented by models built using Machine Learning techniques, and more.

But these are just tools. To spur a real disruption in the industry, we need more than that – we need a vibrant and enthusiastic ecosystem where industry visionaries from various domains of expertise can work together to promote collaborative solutions based on these tools.

This is why at InsidePacket we endorse open architectures and frameworks such as Distributed Disaggregated Chassis (DDC), WhiteBox switches, SONiC NOS, DPDK, and VPP.

To promote such collaboration, we created an open API. This API enables applications to utilize the engines we have built on top of an Open Accelerated Data Plane platform. We then ported several open-source DDoS, Firewall, and Load Balancer projects to run on top of these engines and APIs, showcasing multiple high-performance network applications running concurrently.

These first steps offer a layered approach. The bottom layer consists of a programmable switch pipeline built for concurrency and scale, a state-of-the-art inspect-once server data path, and optimized flow tracking and distribution logic based on accurate heavy-hitter flow detection. All the complexity of this layer is abstracted away, and its functionality is exposed via a comprehensive open API, which serves as the middle layer. These two layers effectively provide an accessible Dataplane-as-a-Service (DaaS) infrastructure, enabling networking-services developers to focus on the advanced business logic of their applications at the top layer.

We are inviting you to innovate and experiment together and see how InsidePacket DaaS infrastructure can serve your application needs to deliver accelerated services wherever protecting the growing network traffic is becoming challenging. After all – if the switch and the server can collaborate in perfect harmony to achieve more, this is a clear example for us, the people who build them – no matter what camp we are coming from!

To learn more, contact us at

About InsidePacket

InsidePacket is a disruptive Software Start-Up in the Network Security market.

We leverage recent groundbreaking developments in the fields of Network Function Disaggregation, switch ASIC programmability and SDN to build innovative security solutions unmatched in scale and performance for Data Centers and Service Providers.

InsidePacket was founded by a team of World-Class Networking and Security Experts and is supported by industry thought leaders and strategic Tier One VCs.

If you are passionate about Software development and have a healthy understanding of Networking and/or Security Fundamentals, we have the proper challenge for you!

Join us at InsidePacket and become part of the CORE TEAM at the cutting edge of this revolution!
