Giving packets a label
One of the things I like about OpenBSD is that it has so many ways to communicate with other systems. Today I explore building an MPLS network using OpenBSD. When I was trying to make this setup work for myself, I couldn't find any good documents on how to set things up. So after piecing things together from different scraps that were 10 years old, I figured I'd write this article to help someone else.
MPLS (Multiprotocol Label Switching) is a protocol used by the largest providers to segment their WANs and provide "managed" services to customers. MPLS works by encapsulating packets in an MPLS frame and labeling that frame. (Usually there are multiple labels on the frame, as we'll see.) The frame is moved from host to host, with each host pushing, popping or swapping a label. I will not go into all the intricacies of MPLS here, but it's important to understand that the outer label changes at every hop the packet takes.
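To make the idea of a label on a frame a little more concrete, here is a minimal sketch that decodes one 32-bit MPLS label stack entry using the field layout from RFC 3032. The function name decode_mpls_entry and the two sample values are made up for illustration; they simply encode labels 17 and 300, the ones that show up in the packet capture at the end of this article.

```sh
# Decode one 32-bit MPLS label stack entry (field layout per RFC 3032):
#   bits 31-12 label, bits 11-9 traffic class (exp),
#   bit 8 bottom-of-stack flag, bits 7-0 TTL
# decode_mpls_entry is a hypothetical helper, not an OpenBSD tool.
decode_mpls_entry() {
	entry=$(( $1 ))
	label=$(( (entry >> 12) & 0xFFFFF ))
	tc=$(( (entry >> 9) & 0x7 ))
	s=$(( (entry >> 8) & 0x1 ))
	ttl=$(( entry & 0xFF ))
	echo "label=$label tc=$tc s=$s ttl=$ttl"
}

# Illustrative two-entry stack: outer label 17, inner label 300
decode_mpls_entry 0x000110FE   # prints: label=17 tc=0 s=0 ttl=254
decode_mpls_entry 0x0012C1FE   # prints: label=300 tc=0 s=1 ttl=254
```

The s bit is what tells a router it has reached the last label in the stack, which is how several labels can sit on one frame.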
First some terminology
- P router - Provider router that sits in the center of the provider network. Only P and PE routers attach to P routers.
- PE router - Provider Edge router that attaches to both P routers and CE routers. Encapsulation and decapsulation of the customer frames happen on the PE routers. Multiple customers are generally connected to a PE router.
- CE router - Customer Edge router that attaches to PE routers. This is the customer's WAN router. It has no knowledge of the underlying MPLS network.
Making MPLS work requires layering multiple routing protocols. Each layer and each protocol has a purpose. I'll explain each layer as we get to it.
Diagram
Here is the diagram we'll be recreating. Since this is being built in a hypervisor, the management interfaces on the routers are only there to help connect from the hypervisor to the guest OS. They do not play any role in the MPLS network and I've left them off of the diagram.
                           (mpe3)
                           (lo3)
                           (lo0)
(lo10) CE1 (vio1) <=> (vio2) PE1 (vio1) <-> (vio1)
                                                  P (lo0)
(lo10) CE2 (vio1) <=> (vio2) PE2 (vio1) <-> (vio2)
                           (lo0)
                           (lo3)
                           (mpe3)
P (lo0) 192.0.2.3/32
P (vio1) 100.64.1.2/24
P (vio2) 100.64.2.1/24
PE1 (lo0) 192.0.2.1/32
PE1 (lo3) 10.3.3.1/32 rtable 3
PE1 (vio1) 100.64.1.1/24
PE1 (vio2) 192.168.2.1/24
PE1 (mpe3) 10.3.3.1/32 rtable 3
PE2 (lo0) 192.0.2.2/32
PE2 (lo3) 10.3.3.2/32 rtable 3
PE2 (vio1) 100.64.2.2/24
PE2 (vio2) 192.168.3.1/24
PE2 (mpe3) 10.3.3.2/32 rtable 3
CE1 (lo10) 3.3.3.1/32
CE1 (vio1) 192.168.2.10/24
CE2 (lo10) 3.3.3.2/32
CE2 (vio1) 192.168.3.10/24
Underlay
The first layer is the underlay. This is where the P and PE routers operate.
P router
The P router connects the two PE routers. On the vio1 and vio2 interfaces, we add the mpls keyword to tell the OS that the interface will participate in MPLS.
inet 192.0.2.3/32
inet 100.64.1.2/24
mpls
inet 100.64.2.1/24
mpls
/bin/sh /etc/netstart lo0 vio1 vio2
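For readers who want to script this step, here is a rough sketch that writes the three interface stanzas above into per-interface files. It assumes they belong in hostname.lo0, hostname.vio1 and hostname.vio2, which is the usual hostname.if(5) convention and matches the interfaces passed to netstart. CONF_DIR is a made-up variable that defaults to a scratch directory so the sketch can be tried without touching /etc.

```sh
# Write the P router's hostname.if(5) files.
# CONF_DIR is a hypothetical knob: it defaults to a scratch directory,
# set it to /etc (as root) on the real router.
CONF_DIR=${CONF_DIR:-./p-router-etc}
mkdir -p "$CONF_DIR"

# Loopback that the routing daemons use as the router-id address
printf 'inet 192.0.2.3/32\n' > "$CONF_DIR/hostname.lo0"

# Links towards PE1 and PE2, both with MPLS enabled
printf 'inet 100.64.1.2/24\nmpls\n' > "$CONF_DIR/hostname.vio1"
printf 'inet 100.64.2.1/24\nmpls\n' > "$CONF_DIR/hostname.vio2"

# Keep the files unreadable by other users
chmod 640 "$CONF_DIR"/hostname.*
```

With CONF_DIR=/etc, the netstart command above would then pick the files up.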
Next we make sure that the P router will forward packets.
net.inet.ip.forwarding=1
net.inet.ip.mforwarding=1
net.inet6.ip6.forwarding=1
net.inet6.ip6.mforwarding=1
for i in `cat /etc/sysctl.conf`; do sysctl $i; done
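The loop above works for the four settings shown. If your sysctl.conf also contains comments or blank lines, word splitting would hand fragments of them to sysctl. A slightly more defensive sketch is below; sysctl_settings is a made-up helper name, and it only prints the effective settings so you can review them before applying anything.

```sh
# Print the effective "name=value" lines of a sysctl.conf-style file:
# comments are stripped, surrounding whitespace is trimmed and empty
# lines are dropped. sysctl_settings is a hypothetical helper.
sysctl_settings() {
	sed -e 's/#.*$//' \
	    -e 's/^[[:space:]]*//' \
	    -e 's/[[:space:]]*$//' \
	    -e '/^$/d' "$1"
}

# To apply the settings on the router (as root):
#   sysctl_settings /etc/sysctl.conf | while read -r kv; do sysctl "$kv"; done
```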
Next we need to enable a routing protocol. This is critical because it lets the loopback interfaces of the P and PE routers reach each other, and ldpd(8) uses that reachability to exchange labels. Static routes would work, but you should never rely on them here, so we'll use a routing protocol. Fortunately, OpenBSD provides several to choose from: ripd(8), ospfd(8) and eigrpd(8). Because eigrpd(8) doesn't get much love, we'll use it here. The setup is very simple: we tell eigrpd its router-id, tell it to update the FIB with any routes it learns, and then tell it which interfaces to listen on.
router-id 192.0.2.3
fib-update yes
address-family ipv4 {
autonomous-system 1 {
default-metric 100000 10 255 1 1500
redistribute connected
interface vio1
interface vio2
interface lo0
}
}
chmod 600 /etc/eigrpd.conf
rcctl enable eigrpd
rcctl start eigrpd
Lastly we'll configure ldpd(8). This daemon exchanges label information with the PE routers (and with other P routers, if we had any) and updates the LIB, or label information base, which is the MPLS equivalent of the FIB for IP. Configuring ldpd(8) is very similar to configuring eigrpd(8): we set the router-id, tell it to update the FIB and tell it which interfaces to communicate on. The only extra step is telling it which neighbors to communicate with.
router-id 192.0.2.3
fib-update yes
transport-preference ipv4
address-family ipv4 {
interface vio1
interface vio2
}
neighbor 192.0.2.1 { }
neighbor 192.0.2.2 { }
chmod 600 /etc/ldpd.conf
rcctl enable ldpd
rcctl start ldpd
PE routers
The PE routers are going to be very similar to the P router when configuring the underlay.
PE1
inet 192.0.2.1/32
inet 100.64.1.1/24
mpls
router-id 192.0.2.1
fib-update yes
address-family ipv4 {
autonomous-system 1 {
default-metric 100000 10 255 1 1500
redistribute connected
interface vio1
interface lo0
}
}
router-id 192.0.2.1
fib-update yes
transport-preference ipv4
address-family ipv4 {
interface vio1
}
neighbor 192.0.2.3 { }
PE2
Similar configuration to PE1.
inet 192.0.2.2/32
inet 100.64.2.2/24
mpls
router-id 192.0.2.2
fib-update yes
address-family ipv4 {
autonomous-system 1 {
default-metric 100000 10 255 1 1500
redistribute connected
interface vio1
interface lo0
}
}
router-id 192.0.2.2
fib-update yes
transport-preference ipv4
address-family ipv4 {
interface vio1
}
neighbor 192.0.2.3 { }
Both PE1 and PE2
Now enable the configuration and daemons on PE1 and PE2.
/bin/sh /etc/netstart lo0 vio1
net.inet.ip.forwarding=1
net.inet.ip.mforwarding=1
net.inet6.ip6.forwarding=1
net.inet6.ip6.mforwarding=1
for i in `cat /etc/sysctl.conf`; do sysctl $i; done
chmod 600 /etc/eigrpd.conf
rcctl enable eigrpd
rcctl start eigrpd
chmod 600 /etc/ldpd.conf
rcctl enable ldpd
rcctl start ldpd
Validate connectivity
First thing is to ping over the locally connected interfaces.
pe1# ping 100.64.1.2
PING 100.64.1.2 (100.64.1.2): 56 data bytes
64 bytes from 100.64.1.2: icmp_seq=0 ttl=255 time=18.009 ms
64 bytes from 100.64.1.2: icmp_seq=1 ttl=255 time=0.488 ms
^C
--- 100.64.1.2 ping statistics ---
2 packets transmitted, 2 packets received, 0.0% packet loss
round-trip min/avg/max/std-dev = 0.488/9.249/18.009/8.761 ms
Next is to ping from loopback to loopback.
pe1# ping -I 192.0.2.1 192.0.2.3
PING 192.0.2.3 (192.0.2.3): 56 data bytes
64 bytes from 192.0.2.3: icmp_seq=0 ttl=255 time=1.525 ms
64 bytes from 192.0.2.3: icmp_seq=1 ttl=255 time=0.550 ms
^C
--- 192.0.2.3 ping statistics ---
2 packets transmitted, 2 packets received, 0.0% packet loss
round-trip min/avg/max/std-dev = 0.550/1.037/1.525/0.488 ms
Now let's check our underlay routing protocols.
pe1# eigrpctl show neig
AF AS Address Iface Holdtime Uptime
ipv4 1 100.64.1.2 vio1 15 1d08h59m
pe1# eigrpctl show interfaces
AF AS Interface Address Linkstate Uptime nc
ipv4 1 vio1 100.64.1.1/24 active 1d09h00m 1
ipv4 1 lo0 192.0.2.1/32 unknown 1d09h00m 0
Pay attention to the routes marked with the *D EX flags. These are routes learned from other EIGRP neighbors.
pe1# eigrpctl show fib
flags: * = valid, D = EIGRP, C = Connected, S = Static
Flags Prio Destination Nexthop
*C 4 100.64.1.0/24 100.64.1.1
*D EX 28 100.64.2.0/24 100.64.1.2
*C 0 127.0.0.0/8 link#0
* 1 192.0.2.1/32 192.0.2.1
* EX 28 192.0.2.2/32 100.64.1.2
*D EX 28 192.0.2.3/32 100.64.1.2
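If the FIB grows beyond a handful of entries, a small filter can pull out just the EIGRP-learned routes. This is only a sketch: eigrp_routes is a made-up helper, and it assumes the column layout shown in the output above (flags first, destination and nexthop last).

```sh
# List destination and nexthop for routes whose flags start with "*D"
# (valid and learned via EIGRP). eigrp_routes is a hypothetical helper
# that assumes the "eigrpctl show fib" layout shown above.
eigrp_routes() {
	awk '$1 ~ /^[*]D/ { print $(NF-1), "via", $NF }'
}

# Sample taken from the output above; on a router you would run:
#   eigrpctl show fib | eigrp_routes
eigrp_routes <<'EOF'
Flags Prio Destination Nexthop
*C 4 100.64.1.0/24 100.64.1.1
*D EX 28 100.64.2.0/24 100.64.1.2
*D EX 28 192.0.2.3/32 100.64.1.2
EOF
# prints:
#   100.64.2.0/24 via 100.64.1.2
#   192.0.2.3/32 via 100.64.1.2
```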
Now let's check out ldpd. Notice that the labels have been automatically assigned and that the IP of the peers shows up in the list.
pe1# ldpctl show interfaces
AF Interface State Linkstate Uptime Hello Timers ac
ipv4 vio1 ACTIVE active 1d09h03m 5/15 1
pe1# ldpctl show neig
AF ID State Remote Address Uptime
ipv4 192.0.2.3 OPERATIONAL 192.0.2.3 1d09h03m
pe1# ldpctl show lib
AF Destination Nexthop Local Label Remote Label In Use
ipv4 100.64.1.0/24 192.0.2.3 imp-null imp-null no
ipv4 100.64.2.0/24 192.0.2.3 20 imp-null yes
ipv4 192.0.2.1/32 192.0.2.3 imp-null 20 no
ipv4 192.0.2.2/32 192.0.2.3 23 22 yes
ipv4 192.0.2.3/32 192.0.2.3 21 imp-null yes
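To see at a glance which bindings are actually being used for forwarding, the LIB output can be filtered on its last column. The sketch below assumes the column order shown above; lib_in_use is a made-up helper name.

```sh
# Show only the label bindings marked "yes" in the In Use column.
# lib_in_use is a hypothetical helper that assumes the
# "ldpctl show lib" column order shown above.
lib_in_use() {
	awk '$NF == "yes" { print $2, "local=" $4, "remote=" $5 }'
}

# Sample taken from the output above; on a router you would run:
#   ldpctl show lib | lib_in_use
lib_in_use <<'EOF'
AF Destination Nexthop Local Label Remote Label In Use
ipv4 100.64.1.0/24 192.0.2.3 imp-null imp-null no
ipv4 192.0.2.2/32 192.0.2.3 23 22 yes
ipv4 192.0.2.3/32 192.0.2.3 21 imp-null yes
EOF
# prints:
#   192.0.2.2/32 local=23 remote=22
#   192.0.2.3/32 local=21 remote=imp-null
```

A remote label of imp-null (implicit null) means the neighbor asked for the label to be popped before the packet reaches it, so no outer label is sent on that last hop.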
Overlay
PE's and CE's
Now that the provider network is set up, we'll turn our attention to the PE and CE communication. On the PE, the CE connections and routing take place in their own routing domain and routing table. For this example we'll use rdomain 3. As before, we'll first configure the interfaces.
PE1 Configuration
rdomain 3
inet 10.3.3.1/32
rdomain 3
inet 192.168.2.1/24
Next comes a new kind of interface, mpe3. It provides a forwarding path from the rdomain it's in to the MPLS layer in rdomain 0.
rdomain 3
mplslabel 300
inet 10.3.3.1/32
up
The astute among you will notice that the IP address on mpe3 is the same as on lo3. This is fine. While I cannot speak authoritatively, the address seems to be needed only to 'enable' the interface to participate in IP routing.
Finally we enable bgpd(8). We need two instances of bgpd(8). The first runs in rdomain 0 and is responsible for learning the routes in rdomain 3, then importing and tagging them. The second runs in rdomain 3 and communicates with the CE.
AS 65000
router-id 192.0.2.1
fib-update yes
vpn "rdom3" on mpe3 {
rd 65000:2
import-target rt 65000:2
export-target rt 65000:2
network inet connected
network inet static
network inet priority 36
}
group "PEs" {
remote-as 65000
announce IPv4 unicast
announce IPv4 vpn
local-address 192.0.2.1
neighbor 192.0.2.2
}
allow from any
allow to any
match from any community GRACEFUL_SHUTDOWN set { localpref 0 }
AS 65000
router-id 10.3.3.1
fib-priority 36
network 10.3.3.1/32
network 0.0.0.0/0
# downstream customers
group "downstream" {
neighbor 192.168.2.10 {
local-address 192.168.2.1
remote-as 65001
descr "ce1"
}
}
allow from any
allow to any
match from any community GRACEFUL_SHUTDOWN set { localpref 0 }
PE2 Configuration
The PE2 configuration is similar to PE1.
rdomain 3
inet 10.3.3.2/32
rdomain 3
inet 192.168.3.1/24
rdomain 3
mplslabel 300
inet 10.3.3.2/32
up
AS 65000
router-id 192.0.2.2
fib-update yes
vpn "rdom3" on mpe3 {
rd 65000:2
import-target rt 65000:2
export-target rt 65000:2
network inet connected
network inet static
network inet priority 36
}
group "PEs" {
remote-as 65000
announce IPv4 unicast
announce IPv4 vpn
local-address 192.0.2.2
neighbor 192.0.2.1
}
allow from any
allow to any
match from any community GRACEFUL_SHUTDOWN set { localpref 0 }
AS 65000
router-id 10.3.3.2
fib-priority 36
network 10.3.3.2/32
network 0.0.0.0/0
# downstream customers
group "downstream" {
neighbor 192.168.3.10 {
local-address 192.168.3.1
remote-as 65002
descr "ce2"
}
}
allow from any
allow to any
match from any community GRACEFUL_SHUTDOWN set { localpref 0 }
PE1 and PE2 Enablement
Now we enable the interfaces and start the bgpd(8) daemons.
/bin/sh /etc/netstart lo3 vio2 mpe3
chmod 600 /etc/bgpd.conf
rcctl enable bgpd
rcctl start bgpd
For bgpd(8) in rdomain 3 we will need to "create" a new rc script. Then we can set the arguments and start it.
ln -s /etc/rc.d/bgpd /etc/rc.d/bgpd_rdom3
chmod 600 /etc/bgpd_rdom3.conf
rcctl enable bgpd_rdom3
rcctl set bgpd_rdom3 rtable 3
rcctl set bgpd_rdom3 flags -f/etc/bgpd_rdom3.conf
rcctl start bgpd_rdom3
CE1 Configuration
The CE configuration is quite simple. We will just configure the interfaces and set up routing. To simulate host networks behind the CE, an lo10 interface will be created.
inet 192.168.2.10/24
inet 3.3.3.1/32
net.inet.ip.forwarding=1
net.inet6.ip6.forwarding=1
for i in `cat /etc/sysctl.conf`; do sysctl $i; done
#
AS 65001
router-id 192.168.2.10
network inet connected
# upstream providers
group "upstreams" {
neighbor 192.168.2.1 {
remote-as 65000
descr "pe1"
}
}
## rules section
allow from any
allow to any
CE2 Configuration
inet 192.168.3.10/24
inet 3.3.3.2/32
net.inet.ip.forwarding=1
net.inet6.ip6.forwarding=1
for i in `cat /etc/sysctl.conf`; do sysctl $i; done
#
AS 65002
router-id 192.168.3.10
network inet connected
# upstream providers
group "upstreams" {
neighbor 192.168.3.1 {
remote-as 65000
descr "pe2"
}
}
## rules section
allow from any
allow to any
CE1 and CE2 Enablement
/bin/sh /etc/netstart vio1 lo10
chmod 600 /etc/bgpd.conf
rcctl enable bgpd
rcctl start bgpd
Now let's make sure the configurations are working. To start with, on PE1 and PE2 the bgpd instances in rdomain 0 should have established a session with each other.
pe1# bgpctl show sum
Neighbor AS MsgRcvd MsgSent OutQ Up/Down State/PrfRcvd
192.0.2.2 65000 46 46 0 00:04:08 1
pe1# bgpctl show rib
flags: * = Valid, > = Selected, I = via IBGP, A = Announced,
S = Stale, E = Error
origin validation state: N = not-found, V = valid, ! = invalid
origin: i = IGP, e = EGP, ? = Incomplete
flags ovs destination gateway lpref med aspath origin
AI*> N rd 65000:2 3.3.3.1/32 rd 0:0 0.0.0.0 100 0 i
I*> N rd 65000:2 3.3.3.2/32 192.0.2.2 100 0 i
AI*> N rd 65000:2 192.168.2.0/24 rd 0:0 0.0.0.0 100 0 i
I*> N rd 65000:2 192.168.3.0/24 192.0.2.2 100 0 i
Notice that the RIB shows 3.3.3.1/32 with an rd of 0:0 and a nexthop of 0.0.0.0, meaning it's local, while 3.3.3.2/32 has a nexthop of 192.0.2.2, which will take the packet over MPLS.
If we ping from CE1 to CE2 and run tcpdump(8) on the P router's vio1 interface, we can see the MPLS labels on the packets.
ce1# ping -I 3.3.3.1 3.3.3.2
PING 3.3.3.2 (3.3.3.2): 56 data bytes
64 bytes from 3.3.3.2: icmp_seq=0 ttl=252 time=5.764 ms
64 bytes from 3.3.3.2: icmp_seq=1 ttl=252 time=1.704 ms
^C
--- 3.3.3.2 ping statistics ---
2 packets transmitted, 2 packets received, 0.0% packet loss
round-trip min/avg/max/std-dev = 1.704/3.734/5.764/2.030 ms
p# tcpdump -i vio1
tcpdump: listening on vio1, link-type EN10MB
13:30:21.901256 MPLS(label 17, exp 0, ttl 254) MPLS(label 300, exp 0, ttl 254) 3.3.3.1 > 3.3.3.2: icmp: echo request
13:30:21.902665 MPLS(label 300, exp 0, ttl 253) 3.3.3.2 > 3.3.3.1: icmp: echo reply
13:30:22.899795 MPLS(label 17, exp 0, ttl 254) MPLS(label 300, exp 0, ttl 254) 3.3.3.1 > 3.3.3.2: icmp: echo request
13:30:22.900622 MPLS(label 300, exp 0, ttl 253) 3.3.3.2 > 3.3.3.1: icmp: echo reply
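When a capture gets busy it can help to reduce each line to just its label stack. The sketch below does that with awk; label_stack is a made-up helper name, and it assumes the "MPLS(label N, ...)" text format seen in the capture above.

```sh
# Print the MPLS label stack of each captured packet, outermost first.
# label_stack is a hypothetical helper that assumes tcpdump's
# "MPLS(label N, exp X, ttl Y)" text format shown above.
label_stack() {
	awk '{
		stack = ""
		line = $0
		# Walk the line, collecting the number after every "label "
		while (match(line, /label [0-9]+/)) {
			stack = stack (stack == "" ? "" : " ") substr(line, RSTART + 6, RLENGTH - 6)
			line = substr(line, RSTART + RLENGTH)
		}
		if (stack != "") print stack
	}'
}

# Sample taken from the capture above; on the P router you would run:
#   tcpdump -l -i vio1 | label_stack
label_stack <<'EOF'
13:30:21.901256 MPLS(label 17, exp 0, ttl 254) MPLS(label 300, exp 0, ttl 254) 3.3.3.1 > 3.3.3.2: icmp: echo request
13:30:21.902665 MPLS(label 300, exp 0, ttl 253) 3.3.3.2 > 3.3.3.1: icmp: echo reply
EOF
# prints:
#   17 300
#   300
```

The request carries two labels (an outer transport label and the inner label 300 configured on mpe3), while the reply towards PE1 carries only label 300. That is consistent with the imp-null bindings in the LIB output earlier: the P router pops the outer label before the last hop.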
Troubleshooting
If there is a problem, check the /var/log/daemon log file for hints as to what might be failing. If all else fails, start from the beginning: check the underlay first, then move to the overlay.
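Since /var/log/daemon collects messages from many daemons, a small filter can narrow it down to the three used in this article. This is a sketch only: mpls_daemon_log is a made-up helper name, and it assumes the usual syslog line format where the daemon name is followed by a pid in brackets or by a colon.

```sh
# Show only log lines from the routing daemons used in this setup.
# mpls_daemon_log is a hypothetical helper; the log path defaults to
# /var/log/daemon and can be overridden with the first argument.
mpls_daemon_log() {
	grep -E '(eigrpd|ldpd|bgpd)(\[|:)' "${1:-/var/log/daemon}"
}

# On a router, look at the most recent messages with:
#   mpls_daemon_log | tail -n 20
```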
Wrapping up
If you stuck with this article to the end, thank you. Let me know if I've made any mistakes or if something can be improved.