VXLAN: What is VXLAN


In this post I would like to talk about VXLAN, what it is and what you can do with VXLAN. In later posts I will show some use cases for VXLAN.

What is VXLAN

VXLAN was originally developed by VMware together with other vendors such as Cisco and Arista, and is documented in RFC 7348. The abbreviation stands for Virtual eXtensible Local Area Network, and the protocol provides a MAC-in-UDP encapsulation (often loosely called MAC-in-IP). The main benefit is that L2 connectivity can be extended across L3 boundaries.

In current data center architectures, VXLAN is mainly used as an overlay technology. The idea is to build a static underlay, the hardware layer, which consists of the network hardware interconnected in a so-called Clos architecture. This means that every switch in the lower tier is connected to every switch in the upper tier. This ensures high availability, high bandwidth and a predictable latency, because every connection that leaves an access switch crosses the same number of hops. To understand the concept, you can find an example below:

Clos-Architecture

To keep the underlay as static as possible, you would define the links between access and core as L3 links. Doing so, you can avoid Spanning Tree. This also helps to balance the load among the links, as you would use Equal Cost MultiPath (ECMP). To interconnect servers which are connected to different access switches but belong to the same L2 network, one could use VXLAN and create a VXLAN tunnel between those two switches. VXLAN uses the L3 underlay to create the connection. The underlay only sees UDP packets, which brings up another big point for VXLAN: you do not need any special network hardware in the devices which merely forward the traffic. Only the devices which encapsulate traffic into a VXLAN tunnel need VXLAN capabilities; they are called VTEPs (VXLAN Tunnel Endpoints).
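To illustrate the Equal Cost MultiPath idea, here is a minimal sketch of how an access switch might pick one of its uplinks per flow. Real switches do this in hardware with vendor-specific hash functions; the function name `pick_uplink`, the CRC32 hash, the addresses and the uplink names are assumptions made only for this example.

```python
import zlib


def pick_uplink(src_ip, dst_ip, src_port, dst_port, uplinks, protocol=17):
    """Choose one uplink per flow by hashing the outer 5-tuple (17 = UDP)."""
    key = f"{src_ip}|{dst_ip}|{protocol}|{dst_port}|{src_port}".encode()
    # Same flow -> same hash -> same uplink, so packets of a flow stay in order.
    return uplinks[zlib.crc32(key) % len(uplinks)]


uplinks = ["core-1", "core-2", "core-3", "core-4"]  # hypothetical core switches

# Two VXLAN flows between the same pair of tunnel endpoints differ only in
# their UDP source port, which is enough to land them on different uplinks.
print(pick_uplink("10.0.0.1", "10.0.0.2", 49152, 4789, uplinks))
print(pick_uplink("10.0.0.1", "10.0.0.2", 49153, 4789, uplinks))
```

Because the underlay only sees the outer UDP packet, the varying UDP source port is what gives the hash enough variety to spread tunnel traffic over all links.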

VXLAN-capable devices also include some virtual switches. Using those virtual switches, it is possible to create VXLAN tunnels without the need for special network equipment or changes to your existing infrastructure. This works fine until you need to talk to devices which are not able to speak VXLAN. This becomes relevant if you need to pass the traffic through a firewall or route between different VXLAN tunnels, and obviously also if you would like to send traffic to destinations, like clients, which do not understand VXLAN.

How does VXLAN work

VXLAN is a MAC-in-IP encapsulation: Ethernet frames are encapsulated into UDP packets. The VXLAN header itself is only 8 bytes long. Together with the outer Ethernet (14 bytes), IPv4 (20 bytes) and UDP (8 bytes) headers, the total overhead is 50 bytes per packet, so the underlay MTU has to be large enough to carry it. The header has the following fields:

  • Flags (8 bit): all flags must be 0 except the I flag, which must be set to 1 to mark the network ID as valid
  • VXLAN Segment ID (24 bit): defines the VXLAN network identifier (VNI). I always think of it as a VLAN, because the behavior is the same: clients which belong to different VNIs cannot talk to each other directly. You can have more than 16 million VNIs, which is much more than the 4094 usable VLANs of the 12-bit VLAN ID.
  • Reserved fields (24 and 8 bit): must be set to 0 and could bring more features in the future

This leads to the following packet schema:

VXLAN-Packet
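The 8-byte header layout described above can be sketched in a few lines of Python. This is only an illustration of the field positions, not a full VXLAN implementation; the function names and the example VNI 5000 are chosen freely for this sketch.

```python
import struct

VXLAN_FLAG_I = 0x08  # the "I" bit inside the 8-bit flags field


def pack_vxlan_header(vni: int) -> bytes:
    """Build the 8-byte VXLAN header for a given VNI."""
    if not 0 <= vni < 2**24:
        raise ValueError("VNI must fit into 24 bits")
    # Word 1: flags (8 bit) + reserved (24 bit)
    # Word 2: VNI (24 bit) + reserved (8 bit)
    return struct.pack("!II", VXLAN_FLAG_I << 24, vni << 8)


def unpack_vxlan_header(header: bytes) -> int:
    """Return the VNI, or raise if the I flag is not set."""
    word1, word2 = struct.unpack("!II", header[:8])
    if not (word1 >> 24) & VXLAN_FLAG_I:
        raise ValueError("I flag not set, VNI is not valid")
    return word2 >> 8


header = pack_vxlan_header(5000)
print(header.hex())                 # 0800000000138800
print(unpack_vxlan_header(header))  # 5000
```

The reserved bits are simply left at zero, which matches the requirement from the list above.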

Knowing the packet schema, I will now explain how the whole process works. Let's assume we have the following infrastructure:

VXLAN-Infrastructure

Let's assume further that VM1 and VM4 belong to the same network and need to talk to each other. If VM1 needs to send a packet to VM4, it sends the frame directly to VM4's MAC address. When the frame arrives at VSwitch A, the switch knows that traffic from VM1 belongs to VNI 1 and prepends the VXLAN header to the frame. The switch then encapsulates the result into a UDP packet. The destination port is 4789 by default, the port assigned by IANA for VXLAN, and the source port is selected by the VSwitch dynamically, typically from a hash of the inner frame headers, which helps to share the load among the links to the core. The UDP checksum can be either zero or the correct value; both options are valid, and RFC 7348 recommends sending zero. The source IP of this packet is the IP of VSwitch A, and the destination IP is the IP of VSwitch B. Finally, an outer L2 header is added, which has the MAC address of Switch A as destination and the MAC address of VSwitch A as source. This should look like this (Switch A is the default gateway for VSwitch A):

VXLAN-Packet-Example
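The encapsulation step at VSwitch A can be sketched roughly as follows. The sketch only builds the UDP and VXLAN headers in front of the inner frame and leaves the outer IP and Ethernet headers to the normal network stack. The CRC32 hash over the inner Ethernet header is just one possible way to pick the source port, and the MAC addresses and function names are made up for this example.

```python
import struct
import zlib

VXLAN_PORT = 4789  # IANA-assigned VXLAN destination port


def entropy_source_port(inner_frame: bytes) -> int:
    """Derive a UDP source port from the inner Ethernet header (first 14 bytes)."""
    flow_hash = zlib.crc32(inner_frame[:14])
    return 49152 + flow_hash % 16384  # stays within 49152-65535


def vxlan_udp_payload(vni: int, inner_frame: bytes) -> bytes:
    """Return UDP header + VXLAN header + inner frame."""
    vxlan_header = struct.pack("!II", 0x08 << 24, vni << 8)  # I flag set, VNI
    length = 8 + len(vxlan_header) + len(inner_frame)  # UDP length includes its own header
    udp_header = struct.pack(
        "!HHHH", entropy_source_port(inner_frame), VXLAN_PORT, length, 0  # checksum 0
    )
    return udp_header + vxlan_header + inner_frame


# Inner frame from VM1 to VM4: destination MAC, source MAC, EtherType, payload
inner = bytes.fromhex("005056000004" "005056000001" "0800") + b"payload"
packet = vxlan_udp_payload(1, inner)
src_port, dst_port, length, checksum = struct.unpack("!HHHH", packet[:8])
print(dst_port, length == len(packet), 49152 <= src_port <= 65535)  # 4789 True True
```

Frames of the same conversation always get the same source port, so they take the same path through the core, while other conversations are likely to end up on other links.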

The packet is routed through the core, and when it arrives at VSwitch B, the VXLAN header information is used to deliver the original frame to VM4. No rocket science, but this was a really easy example. If you need to send traffic to a physical host, a hardware switch acting as VTEP (VXLAN Tunnel Endpoint) would do the same as VSwitch B did in our example. Now, let's talk about different traffic types and how they get forwarded in the tunnel.

Unicast Traffic and VXLAN

Unicast traffic is the simplest case. When a VTEP needs to encapsulate a frame, it looks up the destination MAC address before forwarding the frame. MAC addresses are stored in a table together with the IP address of the VTEP to which the host with this MAC address is connected. When a VTEP receives an encapsulated unicast packet, it checks the VNI and whether the inner destination MAC address belongs to a locally connected host. Before forwarding the frame to the destination, it stores the inner source MAC address together with the remote VTEP IP in its VXLAN MAC address table. If the source VTEP cannot find a mapping in its VXLAN MAC address table and therefore does not know to which VTEP the packet should be sent, the frame is flooded like a broadcast, which means that all known VTEPs which have the same VNI will receive it. The same happens with the ARP request that usually precedes the first unicast packet. To realize this in an L3 environment, multicast is used. To avoid this kind of traffic, some vendors use a control plane protocol to exchange the MAC addresses between the VTEPs. I will explain this in a later post.
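The learning and lookup behavior described above can be sketched as a small table. This is a simplified model, not how a particular switch implements it; the class name, the multicast group 239.1.1.1 and the addresses are assumptions for the example.

```python
class VxlanMacTable:
    """Maps (VNI, MAC) to the IP of the VTEP behind which the host lives."""

    def __init__(self, flood_groups):
        self.entries = {}                 # (vni, mac) -> remote VTEP IP
        self.flood_groups = flood_groups  # vni -> underlay multicast group

    def learn(self, vni, inner_src_mac, remote_vtep_ip):
        """Called on decapsulation: remember where the sender sits."""
        self.entries[(vni, inner_src_mac.lower())] = remote_vtep_ip

    def outer_destination(self, vni, inner_dst_mac):
        """Return the outer destination IP for a frame that is to be encapsulated."""
        known = self.entries.get((vni, inner_dst_mac.lower()))
        # Unknown destination: fall back to the multicast group of the VNI.
        return known if known is not None else self.flood_groups[vni]


table = VxlanMacTable(flood_groups={1: "239.1.1.1"})
print(table.outer_destination(1, "00:50:56:00:00:04"))  # 239.1.1.1 (not learned yet)
table.learn(1, "00:50:56:00:00:04", "10.0.0.2")
print(table.outer_destination(1, "00:50:56:00:00:04"))  # 10.0.0.2
```

Keying the table by VNI and MAC address together keeps the segments separated, so the same MAC address may appear in two different VNIs without conflict.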

Multicast and Broadcast Traffic and VXLAN

Multicast and broadcast traffic needs to be replicated so that it reaches all VTEPs which are configured for a specific VNI. To make this as efficient as possible, the underlying infrastructure should be multicast enabled. If so, all VTEPs belonging to the same VNI can join a specific multicast group for this VNI. If multicast or broadcast traffic needs to be sent to all members of this VNI, it can be sent once to that multicast group, and the underlay takes care of the replication.

I will cover more VXLAN topics in a future post, including some use cases and how it works on HPE switches. If you found this post interesting, please give me a like. If you have questions, please leave a comment below this post.
