Giving packets a label

One of the things I like about OpenBSD is that it has so many ways to communicate with other systems.  Today I explore building an MPLS network using OpenBSD.  When I was trying to make this setup work for myself, I couldn't find any good documents on how to set things up, so after piecing things together from scraps that were 10 years old, I figured I'd write this article up to help someone else.

MPLS (Multi-Protocol Label Switching) is a protocol used by the largest providers to segment their WANs and provide "managed" services to customers.  MPLS works by encapsulating packets into an MPLS frame and labeling that frame.  (Usually multiple labels are on the frame, as we'll see.)  The frame is moved from host to host, with labels being pushed, popped, or swapped along the way.  I will not go into all the intricacies of MPLS here, but it's important to understand that the outer label will change at every hop the packet takes.
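
To make that concrete, here is a rough sketch of the life of a packet crossing the network we're about to build.  The label values are only illustrative; the tcpdump(8) capture at the end of this article shows real ones.

CE1 -> PE1   ip packet                   plain IP from the customer
PE1 -> P     [outer][vpn] ip packet      PE1 pushes the vpn and outer labels
P   -> PE2   [vpn] ip packet             P swaps or pops the outer label
PE2 -> CE2   ip packet                   PE2 pops the vpn label, forwards plain IP
Life of a labeled packet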

First some terminology

  • P router - Provider router that sits in the center of the provider network.  Only P and PE routers attach to P routers.
  • PE router - Provider Edge router that attaches to both P routers and CE routers.  Encapsulation/decapsulation of the customer frames happens on the PE routers.  Multiple customers are generally connected to a PE router.
  • CE router - Customer Edge router that attaches to PE routers.  This is the customer's WAN router.  It has no knowledge of the underlying MPLS network.

In order to make MPLS work, multiple routing protocols are layered on top of one another.  Each layer and each protocol has a purpose.  I'll explain each layer as we get to it.
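
Roughly, the layers stack up like this:

underlay   eigrpd(8)  IP reachability between the P and PE loopbacks
labels     ldpd(8)    label exchange between the P and PE routers
overlay    bgpd(8)    MP-BGP carrying the customer VPN routes between the PEs
edge       bgpd(8)    plain eBGP between each PE and its CE, inside rdomain 3
The protocol layers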

Diagram

Here is the diagram we'll be recreating.  Since this is being built in a hypervisor, the management interfaces on the routers are only there to help connect from the hypervisor to the guest OS.  They do not play any role in the MPLS network, so I've left them off of the diagram.

                            (mpe3)
                            (lo3)
                            (lo0)
(lo10) CE1 (vio1) <=> (vio2) PE1 (vio1) <-> (vio1)
                                              P (lo0)
(lo10) CE2 (vio1) <=> (vio2) PE2 (vio1) <-> (vio2)
                            (lo0)
                            (lo3)
                            (mpe3)

P   (lo0)  192.0.2.3/32
P   (vio1) 100.64.1.2/24
P   (vio2) 100.64.2.1/24
PE1 (lo0)  192.0.2.1/32
PE1 (lo3)  10.3.3.1/32     rtable 3
PE1 (vio1) 100.64.1.1/24
PE1 (vio2) 192.168.2.1/24
PE1 (mpe3) 10.3.3.1/32     rtable 3
PE2 (lo0)  192.0.2.2/32
PE2 (lo3)  10.3.3.2/32     rtable 3
PE2 (vio1) 100.64.2.2/24
PE2 (vio2) 192.168.3.1/24
PE2 (mpe3) 10.3.3.2/32     rtable 3
CE1 (lo10) 3.3.3.1/32
CE1 (vio1) 192.168.2.10/24
CE2 (lo10) 3.3.3.2/32
CE2 (vio1) 192.168.3.10/24

Underlay

The first layer is the underlay.  This is where the P and PE routers operate.

P router

The P router connects both of the PE routers.  On the vio1 and vio2 interfaces, we add the additional mpls keyword to indicate to the OS that the interface will participate in MPLS.

inet 192.0.2.3/32
hostname.lo0
inet 100.64.1.2/24
mpls
hostname.vio1
inet 100.64.2.1/24
mpls
hostname.vio2
/bin/sh /etc/netstart lo0 vio1 vio2
Enable the interfaces

Next we make sure that the P router will forward packets.

net.inet.ip.forwarding=1
net.inet.ip.mforwarding=1
net.inet6.ip6.forwarding=1
net.inet6.ip6.mforwarding=1
sysctl.conf
for i in `cat /etc/sysctl.conf`; do sysctl $i; done
Enable packet forwarding

Next we need to enable a routing protocol.  This is critical to allow the loopback interfaces of each P / PE router to communicate with each other; that communication is what ldpd(8) uses to exchange labels.  While static routes would work (don't ever do that), we'll use a routing protocol.  Fortunately, OpenBSD provides several to choose from: ripd(8), ospfd(8), eigrpd(8).  And because eigrpd(8) doesn't get much love, we'll use that here.  The setup is very simple.  We first set eigrpd's router-id, tell it to update the fib with any routes it learns, and then tell it which interfaces to listen on.

router-id 192.0.2.3
fib-update yes
address-family ipv4 {
        autonomous-system 1 {
                default-metric 100000 10 255 1 1500
                redistribute connected
                interface vio1
                interface vio2
                interface lo0
        }
}
eigrpd.conf
chmod 600 /etc/eigrpd.conf
rcctl enable eigrpd
rcctl start eigrpd
Enable and start eigrpd

Lastly we'll configure ldpd(8).  This routing daemon exchanges label information with the PE routers (and other P routers, if we had more) and updates the lib, or label information base, which is the MPLS equivalent of the fib for IP.  Configuring ldpd(8) is very similar to eigrpd(8): we set the router-id, tell it to update the fib, and tell it which interfaces to communicate on.  The only extra step is that we also tell it which neighbors it will communicate with.

router-id 192.0.2.3
fib-update yes
transport-preference ipv4
address-family ipv4 {
        interface vio1
        interface vio2
}
neighbor 192.0.2.1 { }
neighbor 192.0.2.2 { }
ldpd.conf
chmod 600 /etc/ldpd.conf
rcctl enable ldpd
rcctl start ldpd
Enable and start ldpd

PE routers

The PE routers are going to be very similar to the P router when configuring the underlay.

PE1

inet 192.0.2.1/32
hostname.lo0
inet 100.64.1.1/24
mpls
hostname.vio1
router-id 192.0.2.1
fib-update yes
address-family ipv4 {
        autonomous-system 1 {
                default-metric 100000 10 255 1 1500
                redistribute connected
                interface vio1
                interface lo0
        }
}
eigrpd.conf
router-id 192.0.2.1
fib-update yes
transport-preference ipv4
address-family ipv4 {
        interface vio1
}
neighbor 192.0.2.3 { }
ldpd.conf

PE2

Similar configuration to PE1.

inet 192.0.2.2/32
hostname.lo0
inet 100.64.2.2/24
mpls
hostname.vio1
router-id 192.0.2.2
fib-update yes
address-family ipv4 {
        autonomous-system 1 {
                default-metric 100000 10 255 1 1500
                redistribute connected
                interface vio1
                interface lo0
        }
}
eigrpd.conf
router-id 192.0.2.2
fib-update yes
transport-preference ipv4
address-family ipv4 {
        interface vio1
}
neighbor 192.0.2.3 { }
ldpd.conf

Both PE1 and PE2

Now enable the config and start the daemons on PE1 and PE2.

/bin/sh /etc/netstart lo0 vio1
Enable the interfaces
net.inet.ip.forwarding=1
net.inet.ip.mforwarding=1
net.inet6.ip6.forwarding=1
net.inet6.ip6.mforwarding=1
sysctl.conf
for i in `cat /etc/sysctl.conf`; do sysctl $i; done
Enable packet forwarding
chmod 600 /etc/eigrpd.conf
rcctl enable eigrpd
rcctl start eigrpd
Enable and start eigrpd
chmod 600 /etc/ldpd.conf
rcctl enable ldpd
rcctl start ldpd
Enable and start ldpd

Validate connectivity

First thing is to ping over the locally connected interfaces.

pe1# ping 100.64.1.2
PING 100.64.1.2 (100.64.1.2): 56 data bytes
64 bytes from 100.64.1.2: icmp_seq=0 ttl=255 time=18.009 ms
64 bytes from 100.64.1.2: icmp_seq=1 ttl=255 time=0.488 ms
^C
--- 100.64.1.2 ping statistics ---
2 packets transmitted, 2 packets received, 0.0% packet loss
round-trip min/avg/max/std-dev = 0.488/9.249/18.009/8.761 ms
ping from PE1 to P

Next is to ping from loopback to loopback.

pe1# ping -I 192.0.2.1 192.0.2.3
PING 192.0.2.3 (192.0.2.3): 56 data bytes
64 bytes from 192.0.2.3: icmp_seq=0 ttl=255 time=1.525 ms
64 bytes from 192.0.2.3: icmp_seq=1 ttl=255 time=0.550 ms
^C
--- 192.0.2.3 ping statistics ---
2 packets transmitted, 2 packets received, 0.0% packet loss
round-trip min/avg/max/std-dev = 0.550/1.037/1.525/0.488 ms
ping from PE1 lo0 to P lo0

Now let's check our underlay routing protocols.

pe1# eigrpctl show neig       
AF   AS    Address            Iface       Holdtime     Uptime
ipv4 1     100.64.1.2         vio1        15         1d08h59m
eigrpctl show neighbor
pe1# eigrpctl show interfaces 
AF   AS    Interface   Address            Linkstate  Uptime    nc
ipv4 1     vio1        100.64.1.1/24      active     1d09h00m   1
ipv4 1     lo0         192.0.2.1/32       unknown    1d09h00m   0
eigrpctl show interfaces

Pay attention to the routes marked with the *D EX flags.  These are routes learned from other EIGRP neighbors.

pe1# eigrpctl show fib        
flags: * = valid, D = EIGRP, C = Connected, S = Static
Flags  Prio Destination          Nexthop          
*C        4 100.64.1.0/24        100.64.1.1
*D EX    28 100.64.2.0/24        100.64.1.2
*C        0 127.0.0.0/8          link#0
*         1 192.0.2.1/32         192.0.2.1
*  EX    28 192.0.2.2/32         100.64.1.2
*D EX    28 192.0.2.3/32         100.64.1.2
eigrpctl show fib

Now let's check on ldpd.  Notice that labels have been automatically assigned and that the IPs of the peers show up in the neighbor list.

pe1# ldpctl show interfaces
AF   Interface   State  Linkstate  Uptime   Hello Timers  ac
ipv4 vio1        ACTIVE active     1d09h03m 5/15           1
ldpctl show interfaces
pe1# ldpctl show neig       
AF   ID              State       Remote Address    Uptime
ipv4 192.0.2.3       OPERATIONAL 192.0.2.3       1d09h03m
ldpctl show neighbor
pe1# ldpctl show lib  
AF   Destination          Nexthop         Local Label Remote Label  In Use
ipv4 100.64.1.0/24        192.0.2.3       imp-null    imp-null          no
ipv4 100.64.2.0/24        192.0.2.3       20          imp-null         yes
ipv4 192.0.2.1/32         192.0.2.3       imp-null    20                no
ipv4 192.0.2.2/32         192.0.2.3       23          22               yes
ipv4 192.0.2.3/32         192.0.2.3       21          imp-null         yes
ldpctl show lib

Overlay

PE's and CE's

Now that the provider network is set up, we'll turn our attention to the PE and CE communication.  On the PE, the CE connections and routing take place in their own routing domain / routing table.  For this example we'll use rdomain 3.  As before, we'll first configure the interfaces.

PE1 Configuration

rdomain 3
inet 10.3.3.1/32
hostname.lo3
rdomain 3
inet 192.168.2.1/24
hostname.vio2

Here is a new interface, mpe(4).  It provides a forwarding path from the rdomain it's in to the MPLS layer in rdomain 0.

rdomain 3
mplslabel 300
inet 10.3.3.1/32
up
hostname.mpe3

The astute among you will notice that the IP address on mpe3 is the same as on lo3.  This is fine.  While I cannot speak authoritatively, the address appears to be needed just to 'enable' the interface to participate in IP routing.
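
If you want to sanity-check the interface, ifconfig(8) will show the label and rdomain it was configured with (output omitted here; this is just where I'd look):

ifconfig mpe3
Inspect mpe3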

Finally we enable bgpd(8).  We will need two instances of bgpd(8): the first runs in rdomain 0 and is responsible for learning the routes in rdomain 3, importing them, and tagging them.  The second runs in rdomain 3 and communicates with the CE.

AS 65000
router-id 192.0.2.1
fib-update yes
vpn "rdom3" on mpe3 {
        rd 65000:2
        import-target rt 65000:2
        export-target rt 65000:2
        network inet connected
        network inet static
        network inet priority 36
}
group "PEs" {
        remote-as 65000
        announce IPv4 unicast
        announce IPv4 vpn
        local-address 192.0.2.1
        neighbor 192.0.2.2
}
allow from any
allow to any
match from any community GRACEFUL_SHUTDOWN set { localpref 0 }
bgpd.conf
AS 65000
router-id 10.3.3.1
fib-priority 36
network 10.3.3.1/32
network 0.0.0.0/0
# upstream providers
group "downstream" {
  neighbor 192.168.2.10 {
    local-address 192.168.2.1
    remote-as 65001
    descr "ce1"
  }
}
allow from any
allow to any
match from any community GRACEFUL_SHUTDOWN set { localpref 0 }
bgpd_rdom3.conf

PE2 Configuration

PE2's configuration is similar to PE1's.

rdomain 3
inet 10.3.3.2/32
hostname.lo3
rdomain 3
inet 192.168.3.1/24
hostname.vio2
rdomain 3
mplslabel 300
inet 10.3.3.2/32
up
hostname.mpe3
AS 65000
router-id 192.0.2.2
fib-update yes
vpn "rdom3" on mpe3 {
        rd 65000:2
        import-target rt 65000:2
        export-target rt 65000:2
        network inet connected
        network inet static
        network inet priority 36
}
group "PEs" {
        remote-as 65000
        announce IPv4 unicast
        announce IPv4 vpn
        local-address 192.0.2.2
        neighbor 192.0.2.1
}
allow from any
allow to any
match from any community GRACEFUL_SHUTDOWN set { localpref 0 }
bgpd.conf
AS 65000
router-id 10.3.3.2
fib-priority 36
network 10.3.3.2/32
network 0.0.0.0/0
# upstream providers
group "downstream" {
  neighbor 192.168.3.10 {
    local-address 192.168.3.1
    remote-as 65002
    descr "ce2"
  }
}
allow from any
allow to any
match from any community GRACEFUL_SHUTDOWN set { localpref 0 }
bgpd_rdom3.conf

PE1 and PE2 Enablement

Now we enable the interfaces and start the bgpd(8) daemons.

/bin/sh /etc/netstart lo3 vio2 mpe3
Enable the interfaces
chmod 600 /etc/bgpd.conf
rcctl enable bgpd
rcctl start bgpd
Enable bgpd in rdomain 0

For bgpd(8) in rdomain 3 we will need to "create" a new rc script.  Then we can set the arguments and start it.

ln -s /etc/rc.d/bgpd /etc/rc.d/bgpd_rdom3
chmod 600 /etc/bgpd_rdom3.conf
rcctl enable bgpd_rdom3
rcctl set bgpd_rdom3 rtable 3
rcctl set bgpd_rdom3 flags -f/etc/bgpd_rdom3.conf
rcctl start bgpd_rdom3
Enable bgpd in rdomain 3
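
To talk to this second instance with bgpctl(8), point it at the right control socket.  On recent OpenBSD the default socket path is suffixed with the routing domain the daemon runs in, so something like this should work:

bgpctl -s /var/run/bgpd.sock.3 show summary
Check bgpd in rdomain 3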

CE1 Configuration

The CE configuration is quite simple.  We will just configure the interfaces and set up routing.  To simulate host networks behind the CE, an lo10 interface is created.

inet 192.168.2.10/24
hostname.vio1
inet 3.3.3.1/32
hostname.lo10
net.inet.ip.forwarding=1
net.inet6.ip6.forwarding=1
sysctl.conf
for i in `cat /etc/sysctl.conf`; do sysctl $i; done
Enable packet forwarding
#
AS 65001
router-id 192.168.2.10
network inet connected
# upstream providers
group "upstreams" {
  neighbor 192.168.2.1 {
    remote-as 65000
    descr "pe1"
  }
}
## rules section
allow from any
allow to any
bgpd.conf

CE2 Configuration

inet 192.168.3.10/24
hostname.vio1
inet 3.3.3.2/32
hostname.lo10
net.inet.ip.forwarding=1
net.inet6.ip6.forwarding=1
sysctl.conf
for i in `cat /etc/sysctl.conf`; do sysctl $i; done
Enable packet forwarding
#
AS 65002
router-id 192.168.3.10
network inet connected
# upstream providers
group "upstreams" {
  neighbor 192.168.3.1 {
    remote-as 65000
    descr "pe2"
  }
}
## rules section
allow from any
allow to any
bgpd.conf

CE1 and CE2 Enablement

/bin/sh /etc/netstart vio1 lo10
Enable interfaces
chmod 600 /etc/bgpd.conf
rcctl enable bgpd
rcctl start bgpd
Enable and start bgpd

Now let's make sure the configurations are working.  To start, the bgpd(8) servers in rdomain 0 on PE1 and PE2 should have formed a peering session.

pe1# bgpctl show sum
Neighbor                   AS    MsgRcvd    MsgSent  OutQ Up/Down  State/PrfRcvd
192.0.2.2               65000         46         46     0 00:04:08      1
bgpctl show sum
pe1# bgpctl show rib
flags: * = Valid, > = Selected, I = via IBGP, A = Announced,
       S = Stale, E = Error
origin validation state: N = not-found, V = valid, ! = invalid
origin: i = IGP, e = EGP, ? = Incomplete

flags ovs destination          gateway          lpref   med aspath origin
AI*>    N rd 65000:2 3.3.3.1/32 rd 0:0 0.0.0.0    100     0 i
I*>     N rd 65000:2 3.3.3.2/32 192.0.2.2         100     0 i
AI*>    N rd 65000:2 192.168.2.0/24 rd 0:0 0.0.0.0    100     0 i
I*>     N rd 65000:2 192.168.3.0/24 192.0.2.2         100     0 i
bgpctl show rib

Notice that the rib shows 3.3.3.1/32 with an rd of 0:0 and a nexthop of 0.0.0.0, meaning it's local, while 3.3.3.2/32 has a nexthop of 192.0.2.2, which will take the packet over MPLS.
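
You can also confirm that the VPN routes landed in the customer routing table on the PE.  This simply dumps rtable 3, where 3.3.3.2/32 should now appear on PE1:

route -T3 -n show
Check rtable 3 on PE1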

If we ping from CE1 to CE2 and run tcpdump(8) on the P router's vio1, we can see the MPLS labels on the packets.

ce1# ping -I 3.3.3.1 3.3.3.2
PING 3.3.3.2 (3.3.3.2): 56 data bytes
64 bytes from 3.3.3.2: icmp_seq=0 ttl=252 time=5.764 ms
64 bytes from 3.3.3.2: icmp_seq=1 ttl=252 time=1.704 ms
^C
--- 3.3.3.2 ping statistics ---
2 packets transmitted, 2 packets received, 0.0% packet loss
round-trip min/avg/max/std-dev = 1.704/3.734/5.764/2.030 ms
ping from CE1 to CE2
p# tcpdump -i vio1 
tcpdump: listening on vio1, link-type EN10MB
13:30:21.901256 MPLS(label 17, exp 0, ttl 254) MPLS(label 300, exp 0, ttl 254) 3.3.3.1 > 3.3.3.2: icmp: echo request
13:30:21.902665 MPLS(label 300, exp 0, ttl 253) 3.3.3.2 > 3.3.3.1: icmp: echo reply
13:30:22.899795 MPLS(label 17, exp 0, ttl 254) MPLS(label 300, exp 0, ttl 254) 3.3.3.1 > 3.3.3.2: icmp: echo request
13:30:22.900622 MPLS(label 300, exp 0, ttl 253) 3.3.3.2 > 3.3.3.1: icmp: echo reply
tcpdump of ping from CE1 to CE2

Troubleshooting

If there is a problem, check the /var/log/daemon log file for hints as to what might be failing.  If all else fails, start from the beginning: check the underlay, then move to the overlay.
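
In practice that means walking back through the same tools we used above, in dependency order:

tail -f /var/log/daemon
eigrpctl show fib
ldpctl show lib
bgpctl show rib
Troubleshooting starting points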

Wrapping up

If you've stuck with this article to the end, thank you.  Let me know if I've made any mistakes or if something can be improved.