Networking in Hyperscale Environments
Hyperscale Networks Lead the Revolution
Many technologies, new solutions and innovations start in small deployments and slowly mature and expand into larger deployments. Networking, by contrast, has a long history of large companies pushing the industry to create next-generation high-speed solutions and new innovations.
Some of the largest networks in the world now belong to Web giants such as Google, Facebook, Amazon and Microsoft. If these hyperscale players used traditional networking models, the capital expenses and operational requirements would crush them. Many of the technologies that fall into the category of software-defined networking (SDN)* have been proven at scale in the environments of companies that have higher growth rates and lower tolerance for outages than any enterprise.
The Opportunity of SDN
While it is debatable which technologies are truly SDN and which are simply “SDN-washing,” there is consensus that new solutions should be designed for scalability and should therefore favor automation and reduce or eliminate manual processes. The agility and speed of deployment of the network need to catch up with other parts of IT in the era of cloud and virtualization.
In an interview at the Open Networking Summit 2014, Microsoft’s Rajeev Nagar said, “there are 10s of thousands of network changes on a daily basis [in the Azure cloud]…we could not do this in a rigid, inflexible and manual manner as we used to do in the past. We live, eat and breathe the notion of a software-defined data center.”
Google has also been leveraging SDN for several years and recently unveiled its networking stack, code-named Andromeda. Google says, “Delivering the highest level of performance, availability, and security requires orchestrating across virtual machines, hypervisors, operating systems, network interface cards, top of rack switches, fabric switches, border routers, and even our network peering edge.”
Few companies have networking budgets that surpass $1B. As with other cloud offerings, however, users can either consume services from these suppliers or leverage SDN solutions that are being productized for the enterprise.
The SDN Toolbox
Two years ago, OpenFlow was the primary technology being discussed in any conversation about SDN. OpenFlow helped kick off the discussion of moving from a world of programming individual switches to a centralized provisioning model. This architectural discussion includes separating the control plane from the data plane, which in turn requires an SDN controller. While the hyperscale players create their own architectures, the networking vendors have rushed to become the SDN controller of choice. The proliferation of SDN controller options (see SDN Central’s extensive list of products that have been announced and shipping) has left users cautious about the lack of a clear leading product and concerned about lock-in. Users are interested in the creation of a networking software ecosystem, which requires an SDN controller that supports that ecosystem, preferably with open APIs. Many open source SDN projects are looking to tackle this challenge. Projects such as OpenDaylight (see interview with ODL Technical Chairperson David Meyer), which has the goal of creating an open source controller with open northbound (application) and southbound (physical) APIs, support OpenFlow today and are meant to be extensible to support many options in the future.
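The split between control plane and data plane can be illustrated with a toy model. The sketch below is not OpenFlow or any real controller's API; the class names (Controller, Switch, FlowRule), the switch names and the addresses are all hypothetical. It only shows the idea that switches forward from a table they did not compute, while one central component programs all of them.

```python
from dataclasses import dataclass, field


@dataclass
class FlowRule:
    """A simplified match/action rule (illustrative, not the OpenFlow wire format)."""
    match_dst: str  # destination address to match
    out_port: int   # port to forward matching packets to


@dataclass
class Switch:
    """Data plane only: forwards from a table that something else computed."""
    name: str
    flow_table: dict = field(default_factory=dict)

    def install(self, rule: FlowRule) -> None:
        self.flow_table[rule.match_dst] = rule.out_port

    def forward(self, dst: str):
        # Return the output port, or None on a table miss.
        return self.flow_table.get(dst)


class Controller:
    """Control plane: holds the network-wide view and programs every switch."""

    def __init__(self):
        self.switches = {}

    def register(self, switch: Switch) -> None:
        self.switches[switch.name] = switch

    def provision(self, dst: str, ports_by_switch: dict) -> None:
        # One central call replaces configuring each switch by hand.
        for name, port in ports_by_switch.items():
            self.switches[name].install(FlowRule(dst, port))


controller = Controller()
leaf1, leaf2 = Switch("leaf1"), Switch("leaf2")
controller.register(leaf1)
controller.register(leaf2)
controller.provision("10.0.0.5", {"leaf1": 3, "leaf2": 7})
print(leaf1.forward("10.0.0.5"), leaf2.forward("10.0.0.5"), leaf1.forward("10.0.0.9"))
```

In this model an application would talk to the controller (the northbound side), and the controller would talk to the switches (the southbound side); a real controller would of course also handle topology discovery, failures and table misses.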
Server virtualization has transformed data centers, creating a significant impact on the flow of network traffic. The volume of traffic between servers (East-West bandwidth) is now often greater than the server-to-core (North-South) traffic. Adding to the complexity is that packets must move from virtual environments to physical and then back to virtual. Overlay networks are a class of tools that tackle this problem through encapsulation and tunneling so that traffic between virtualized servers can move between nodes with less concern about the configuration or protocols in use for the physical environment.
The applicability of overlays to hyperscale deployments is seen with Amazon VPC, Google Advanced Routing and as part of the overall networking solution in Microsoft Azure. The generally available tools are VXLAN (launched by VMware and a broad ecosystem), NVGRE (Microsoft and ecosystem; Microsoft stresses that this is just a tool, not a strategy) and STT (supported by Nicira, which was acquired by VMware).
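To make the encapsulation idea concrete, here is a minimal sketch of the VXLAN header itself. The 8-byte layout (a flags byte with the VNI-valid bit, a 24-bit VXLAN Network Identifier and reserved bits) and UDP port 4789 follow the published VXLAN specification (RFC 7348); the function names are illustrative, and the outer Ethernet, IP and UDP headers that a real tunnel endpoint would add are left out.

```python
import struct

VXLAN_UDP_PORT = 4789        # IANA-assigned destination port for VXLAN
VXLAN_FLAG_VNI_VALID = 0x08  # "I" flag: the VNI field is valid


def vxlan_header(vni: int) -> bytes:
    """Build the 8-byte VXLAN header for a 24-bit network identifier."""
    if not 0 <= vni < 2 ** 24:
        raise ValueError("VNI must fit in 24 bits")
    # Word 1: flags in the top byte, then 24 reserved bits.
    # Word 2: VNI in the top 24 bits, then 8 reserved bits.
    return struct.pack("!II", VXLAN_FLAG_VNI_VALID << 24, vni << 8)


def encapsulate(vni: int, inner_frame: bytes) -> bytes:
    """Prefix an inner Ethernet frame with a VXLAN header (outer headers omitted)."""
    return vxlan_header(vni) + inner_frame


def decapsulate(packet: bytes) -> tuple:
    """Return (vni, inner_frame) from a VXLAN payload."""
    flags_word, vni_word = struct.unpack("!II", packet[:8])
    if not (flags_word >> 24) & VXLAN_FLAG_VNI_VALID:
        raise ValueError("VNI flag not set")
    return vni_word >> 8, packet[8:]


packet = encapsulate(5000, b"inner-ethernet-frame")
vni, frame = decapsulate(packet)
print(vni, frame)  # 5000 b'inner-ethernet-frame'
```

The 24-bit identifier is what lets an overlay carry roughly 16 million isolated segments over the same physical network, compared with the 4,094 usable IDs of a traditional VLAN tag.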
Especially in cloud, but also in enterprise data centers, agility is critical. Overlays give the network a speed of deployment and a mobility that match what is possible with virtual machines. This is done by creating tunnels between virtual switches, which sit inside the virtualization layer. The physical layer does not have to learn the virtual end-nodes, a requirement that has been a limit to the scalability of networks. Today end-nodes must all be VMs; solutions such as VXLAN Tunnel End Points (VTEP) and Cisco’s Application Centric Infrastructure (ACI) are looking to allow physical and virtual to work better together.
While overlay networks do not require a controller box, they do require support in the hypervisor, NIC/adapter and switch. Adapters now support offload/acceleration for overlay networks, such as Emulex’s VNeX technology, and switches can include translation between different overlays (such as VXLAN to NVGRE). As virtual machines move between physical devices, overlay networks allow the networking settings to move with the VMs, enabling the speed and agility that is required in cloud and enterprise today. Overlays also enable multi-tenancy by allowing multiple tenants to be managed in a single environment without having to combine domains.
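The scalability point can be shown with simple arithmetic. The sketch below uses a deliberately simplified model and made-up numbers (1,000 hosts with 40 VMs each): without an overlay the physical fabric may have to learn every VM address, while with an overlay it only sees one tunnel endpoint per host. The function name is hypothetical.

```python
def physical_mac_entries(hosts: int, vms_per_host: int, overlay: bool) -> int:
    """Addresses the physical fabric may need to learn (simplified model)."""
    if overlay:
        # Only the tunnel endpoint on each host is visible to the fabric.
        return hosts
    # Every VM address plus each host's own address is visible.
    return hosts * vms_per_host + hosts


print(physical_mac_entries(1000, 40, overlay=False))  # 41000
print(physical_mac_entries(1000, 40, overlay=True))   # 1000
```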
White Box Switches and Switch Operating Systems
Hyperscale companies do not buy traditional hardware. Since they buy at higher volume and have specific needs (more homogeneous environments and well-defined data center designs compared to the diversity of the enterprise), the biggest companies buy white box switches rather than name-brand solutions. White box switches have the same ASICs that are packaged by many of the name-brand solutions but do not have the value-add operating system, management and other services that companies like Arista, HP, Juniper and others add on top of the merchant silicon. While the physical boxes are significantly cheaper, companies considering a move to white box solutions must weigh those savings against the burden of developing and managing the environment themselves. This model makes the most sense for environments that must manage a high-growth network that adds dozens of new switches a year. It would not make sense for companies with relatively stable networks that are upgraded on a 5-10 year refresh cycle.
Google has built a full networking stack. Companies like Cumulus Networks are offering an operating system for switches that is independent of the hardware. Cumulus Linux can be run on white box switches and on hardware from other vendors, including Dell. While the white box and switch OS movements are already accepted in hyperscale deployments, these offerings have organizational, skill set and purchasing challenges to overcome in the enterprise.
Modernization of Applications Accelerates Networking Changes
Server virtualization is not the only workload that is impacting network design in cloud and enterprise environments. Modern applications built for mobile and analytics are driving high-performance and low-latency requirements that strain traditional architectures. Some of the fastest growing parts of the big data market are Hadoop and NoSQL (such as MongoDB), and their networking requirements are much closer to what is required for HPC environments (scale-out with high bandwidth and predictable latency between nodes).
Server virtualization increased the utilization of resources and changed the traffic patterns. While the focus of Platform-as-a-Service (PaaS) is the creation of modern applications, a secondary effect is that networking will have to keep up with even higher utilization of individual devices that require even more communication between nodes. The take-away is that businesses that treat their internal IT as strategic must have scale, agility and extensibility as design principles, since the workloads and usage will continue to strain the network.
Footnotes: *In this article, the term SDN is used as an umbrella to include a broad spectrum of options. For the purpose of the discussion of hyperscale environments, NFV, which today is primarily used in telecom environments, is out of scope; see Sorting out SDN, NFV, Network Virtualization and the New Networking.