I started this activity thinking that it would be a quick afternoon and I would have things working. How wrong I was. I'm writing this so that I can remember what I did to make this work. Hopefully it helps someone else.
Now, one thing I am doing here that probably led to most of my issues is using rdomain(4)s for the private side of the connections. So let's start from the beginning... it's a good place to start. Step 1 is getting vm1 and vm2 talking over straight GRE. Step 2 encapsulates the GRE in IPSec. Why would someone do this? Why not just use straight IPSec? Enter the need for rdomains. I want to logically separate functions/customers/etc. from one another, and rdomains let me do that. But once traffic goes over a WAN/Internet link, I need some way to tag it with its rdomain so it stays separate. GRE allows this tagging through the vnetid (the GRE key). So my tunnels keep stuff separate and IPSec keeps things secure.
Step 1: GRE
Enable the sysctl values that allow forwarding and GRE by adding them to /etc/sysctl.conf.
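The exact sysctl.conf lines aren't shown here, so this is a minimal sketch, assuming only IPv4 forwarding and GRE need to be turned on. net.inet.ip.forwarding and net.inet.gre.allow are standard OpenBSD sysctls; GRE is rejected unless gre.allow is set. The comments sit on their own lines so that a loop which skips lines starting with "#" handles the file cleanly.

```
# /etc/sysctl.conf
# Let the host forward IPv4 packets between interfaces.
net.inet.ip.forwarding=1
# Accept GRE packets (off by default).
net.inet.gre.allow=1
```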
After sysctl.conf is updated, apply the settings. Here is a quick way to do that without restarting:
sed -e 's/#.*//' -e '/^[[:space:]]*$/d' /etc/sysctl.conf | while read -r c; do sysctl "$c"; done
Now let's configure the GRE interface(s) in /etc/hostname.gre0.
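The hostname.gre0 files themselves aren't shown, and vm1 and vm2 mirror each other, so here is a sketch of a small helper that prints the file for either side. gre_hostname is a hypothetical name, not an OpenBSD tool. The values come from the ifconfig output further down (rdomain 10, vnetid 11, a /24 with 192.168.2.2 as the far end); vm2 using the same rdomain and vnetid, and the `dest` keyword for the point-to-point peer, are assumptions, so check hostname.if(5) on your release.

```shell
# gre_hostname: hypothetical helper that prints an /etc/hostname.gre0 body.
# Usage: gre_hostname <rdomain> <vnetid> <local-public> <remote-public> \
#                     <local-private> <remote-private>
gre_hostname() {
	if [ $# -ne 6 ]; then
		echo "usage: gre_hostname rdomain vnetid lpub rpub lpriv rpriv" >&2
		return 1
	fi
	# rdomain comes first: changing it flushes addresses, and the private
	# address below must end up in the private routing domain.
	printf 'rdomain %s\n' "$1"
	# vnetid sets the GRE key that tags this tunnel's traffic.
	printf 'vnetid %s\n' "$2"
	# Tunnel endpoints (public side, left in the default tunnel rdomain 0).
	printf 'tunnel %s %s\n' "$3" "$4"
	# Private side: local address plus the point-to-point peer (assumed syntax).
	printf 'inet %s 255.255.255.0\n' "$5"
	printf 'dest %s\n' "$6"
	printf 'up\n'
}

# vm1: gre_hostname 10 11 100.64.3.1 100.64.3.2 192.168.2.1 192.168.2.2 > /etc/hostname.gre0
# vm2: gre_hostname 10 11 100.64.3.2 100.64.3.1 192.168.2.2 192.168.2.1 > /etc/hostname.gre0
```

Generating both files from one function keeps the rdomain and vnetid identical on both ends, which is what keeps the traffic tagged consistently.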
It would be awesome if the tunnel endpoints could be given as an interface name instead of that interface's IP address, but until I can write better code... I'll have to dream.
That should be it. To bring the interfaces up without rebooting, use the /etc/netstart script. This also confirms that the hostname.if file will be processed correctly at boot.
vm1# /bin/sh /etc/netstart gre0 # or whatever interface you want.
A quick check of the interfaces should show something similar:
vm1# ifconfig vio0
vio0: flags=8b43<UP,BROADCAST,RUNNING,PROMISC,ALLMULTI,SIMPLEX,MULTICAST> mtu 1500
	lladdr fe:e1:bb:d1:e8:ec
	index 1 priority 0 llprio 3
	media: Ethernet autoselect
	status: active
	inet 100.64.3.1 netmask 0xffffff00 broadcast 100.64.3.255
vm1# ifconfig gre0
gre0: flags=8051<UP,POINTOPOINT,RUNNING,MULTICAST> rdomain 10 mtu 1476
	index 4 priority 0 llprio 6
	encap: vnetid 11 txprio payload rxprio packet
	groups: gre
	tunnel: inet 100.64.3.1 -> 100.64.3.2 ttl 64 nodf ecn rdomain 0
	inet 192.168.2.1 --> 192.168.2.2 netmask 0xffffff00
Now let's test.
vm1# ping 192.168.2.2
PING 192.168.2.2 (192.168.2.2): 56 data bytes
ping: sendmsg: No route to host
ping: wrote 192.168.2.2 64 chars, ret=-1
ping: sendmsg: No route to host
ping: wrote 192.168.2.2 64 chars, ret=-1
ping: sendmsg: No route to host
ping: wrote 192.168.2.2 64 chars, ret=-1
^C
--- 192.168.2.2 ping statistics ---
3 packets transmitted, 0 packets received, 100.0% packet loss
Why did it fail? Well, as stated above, I wanted to make my life hard, err... I wanted to have the private side in a different rdomain. Because of this, I need to run the ping from within the correct rdomain, like this:
vm1# route -T 10 exec ping 192.168.2.2
PING 192.168.2.2 (192.168.2.2): 56 data bytes
64 bytes from 192.168.2.2: icmp_seq=0 ttl=255 time=20.050 ms
64 bytes from 192.168.2.2: icmp_seq=1 ttl=255 time=208.008 ms
64 bytes from 192.168.2.2: icmp_seq=2 ttl=255 time=1.804 ms
^C
--- 192.168.2.2 ping statistics ---
3 packets transmitted, 3 packets received, 0.0% packet loss
round-trip min/avg/max/std-dev = 1.804/76.621/208.008/93.203 ms
Now that looks better!
Step 2: IKED/IPSec
Here begins what must be the ultimate test of frustration: enabling iked. We could configure the IPSec SAs by hand, but that is not realistic... (then again, after configuring iked, manual IPSec may start to look good).
To start, we need to copy each system's public key to the other system. The key is located in /etc/iked/local.pub.
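One way to do the copy, as a sketch: iked looks for a peer's key under /etc/iked/pubkeys/<id-type>/<id>, which matches the pubkeys/fqdn/vm2 path in the error log at the end of this post. The helper below (install_peer_key is a hypothetical name) assumes the peer's local.pub has already been fetched, for example with scp, and that the IDs are the plain hostnames vm1 and vm2 as shown in the ikectl output.

```shell
# install_peer_key: hypothetical helper, not part of iked(8).
# Installs a peer's public key where iked looks up FQDN IDs:
#   <iked-dir>/pubkeys/fqdn/<peer-id>
# Usage: install_peer_key <peer-id> <peer-local.pub> [iked-dir]
install_peer_key() {
	if [ $# -lt 2 ]; then
		echo "usage: install_peer_key peer-id pubkey [iked-dir]" >&2
		return 1
	fi
	ipk_dir="${3:-/etc/iked}/pubkeys/fqdn"
	# Create the directory tree if this is the first peer key.
	mkdir -p "$ipk_dir" || return 1
	# The file name must match the peer's ID exactly (e.g. "vm2").
	cp "$2" "$ipk_dir/$1" || return 1
	# Public keys are not secret, so world-readable is fine.
	chmod 0644 "$ipk_dir/$1"
}

# Example on vm1 (assumes ssh access from vm1 to vm2):
#   scp vm2:/etc/iked/local.pub /tmp/vm2.pub
#   install_peer_key vm2 /tmp/vm2.pub
# and the mirror image on vm2 with vm1's key.
```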
Now let's configure /etc/iked.conf. For no particular reason at all we will make vm1 the initiator by specifying active. Make sure the file has permissions of 0600 so that iked doesn't complain. We also set the mode to transport instead of the default, tunnel. Since the private networks in the rdomain are already encapsulated in GRE, the only thing IPSec sees is GRE traffic between the two public addresses, so there is no need to tunnel it a second time.
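The policy itself can be read back from the iked -nv output below. As a sketch, this hypothetical iked_policy helper prints that one-line policy for either side: vm1 is active, and vm2 is assumed to be passive (iked's default) with the addresses swapped. Everything not listed (ikesa/childsa proposals, lifetime, public-key "signature" auth) is left at iked's defaults, which is what the -nv output appears to show.

```shell
# iked_policy: hypothetical helper that prints a transport-mode GRE policy.
# Usage: iked_policy <active|passive> <local-public> <peer-public>
iked_policy() {
	if [ $# -ne 3 ]; then
		echo "usage: iked_policy active|passive local peer" >&2
		return 1
	fi
	case "$1" in
	active|passive) ;;
	*)	echo "iked_policy: mode must be active or passive" >&2
		return 1 ;;
	esac
	# Protect only GRE between the two public addresses; transport mode
	# suffices because GRE already carries the rdomain traffic.
	printf 'ikev2 "policy1" %s transport esp proto gre from %s to %s peer %s\n' \
	    "$1" "$2" "$3" "$3"
}

# vm1 (initiator): iked_policy active 100.64.3.1 100.64.3.2 > /etc/iked.conf
# vm2 (responder): iked_policy passive 100.64.3.2 100.64.3.1 > /etc/iked.conf
# then on both:    chmod 0600 /etc/iked.conf
```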
To check that the configuration file is correct, use:
vm1# iked -nvf /etc/iked.conf
ikev2 "policy1" active transport esp proto gre inet from 100.64.3.1 to 100.64.3.2 local any peer 100.64.3.2 ikesa enc aes-128-gcm,aes-256-gcm prf hmac-sha2-256,hmac-sha2-384,hmac-sha2-512,hmac-sha1 group curve25519,ecp521,ecp384,ecp256,modp4096,modp3072,modp2048,modp1536,modp1024 ikesa enc aes-256,aes-192,aes-128,3des prf hmac-sha2-256,hmac-sha2-384,hmac-sha2-512,hmac-sha1 auth hmac-sha2-256,hmac-sha2-384,hmac-sha2-512,hmac-sha1 group curve25519,ecp521,ecp384,ecp256,modp4096,modp3072,modp2048,modp1536,modp1024 childsa enc aes-128-gcm,aes-256-gcm esn,noesn childsa enc aes-256,aes-192,aes-128 auth hmac-sha2-256,hmac-sha2-384,hmac-sha2-512,hmac-sha1 esn,noesn lifetime 10800 bytes 536870912 signature
configuration OK
If all is well you'll see "configuration OK".
Now we can start iked with the -dv flags (-d keeps it in the foreground, -v makes it verbose) to see the debug information; this will show us if something fails. tmux(1) makes it easy to run this command in one pane and keep a shell to work in another. Once iked is running on both systems, we can check that the daemons are communicating using the ikectl command:
vm1# ikectl show sa
iked_sas: 0xdd25d63a7d0 rspi 0x19dbcab84fda1041 ispi 0x9c7b6e03c5247505 100.64.3.1:500->100.64.3.2:500<FQDN/vm2> ESTABLISHED r udpecap nexti 0x0 pol 0xdd205f67000
sa_childsas: 0xdd210366100 ESP 0x8b2cd3f1 in 100.64.3.2:500 -> 100.64.3.1:500 (LA) B=0x0 P=0xdd20d731600 @0xdd25d63a7d0
sa_childsas: 0xdd20d731600 ESP 0x40b143ea out 100.64.3.1:500 -> 100.64.3.2:500 (L) B=0x0 P=0xdd210366100 @0xdd25d63a7d0
sa_flows: 0xdd24e75bc00 ESP out 100.64.3.1/32 -> 100.64.3.2/32 @-1 (L) @0xdd25d63a7d0
sa_flows: 0xdd1fa4b1c00 ESP in 100.64.3.2/32 -> 100.64.3.1/32 @-1 (L) @0xdd25d63a7d0
iked_activesas: 0xdd20d731600 ESP 0x40b143ea out 100.64.3.1:500 -> 100.64.3.2:500 (L) B=0x0 P=0xdd210366100 @0xdd25d63a7d0
iked_activesas: 0xdd210366100 ESP 0x8b2cd3f1 in 100.64.3.2:500 -> 100.64.3.1:500 (LA) B=0x0 P=0xdd20d731600 @0xdd25d63a7d0
iked_flows: 0xdd1fa4b1c00 ESP in 100.64.3.2/32 -> 100.64.3.1/32 @-1 (L) @0xdd25d63a7d0
iked_flows: 0xdd24e75bc00 ESP out 100.64.3.1/32 -> 100.64.3.2/32 @-1 (L) @0xdd25d63a7d0
If you see similar output, you can now use the ipsecctl command to see the flows and IPSec SAs:
vm1# ipsecctl -sall
FLOWS:
flow esp in proto gre from 100.64.3.2 to 100.64.3.1 peer 100.64.3.2 srcid FQDN/vm1 dstid FQDN/vm2 type require
flow esp out proto gre from 100.64.3.1 to 100.64.3.2 peer 100.64.3.2 srcid FQDN/vm1 dstid FQDN/vm2 type require
SAD:
esp transport from 100.64.3.1 to 100.64.3.2 spi 0x40b143ea enc aes-128-gcm
esp transport from 100.64.3.2 to 100.64.3.1 spi 0x8b2cd3f1 enc aes-128-gcm
If things are not working, check the iked pane for errors. I have listed some of the errors I saw at the end of this post.
Last, we need to enable the enc(4) interface, enc0.
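The hostname.enc0 contents aren't shown either. A minimal sketch, assuming all the interface needs is to be brought up: enc0 exists by default and is left in rdomain 0, so no rdomain line is added.

```
up
```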
Run the netstart command as before, this time for enc0.
Some may ask why we don't put this interface into rdomain 10 like the GRE interfaces. This took me longer than I would like to admit to figure out, so let me try to explain. The traffic first hits the gre0 interface in rdomain 10 and is encapsulated, and the encapsulated packet is placed in rdomain 0 (the default domain, and the tunnel's rdomain in the ifconfig output above). From there IPSec intercepts the packet and encrypts it before it is sent out the physical interface. Since the encryption happens in rdomain 0, enc0 belongs in rdomain 0 as well.
Let's test again. On vm2 we will run tcpdump on the vio0 interface.
vm2# tcpdump -i vio0 -nvv proto 50 or proto 47
This shows whether the traffic on the wire is ESP (IP protocol 50) or plain GRE (IP protocol 47). From vm1 we run the ping again (and we won't forget to do it from the correct rdomain, right? ;)
vm1# route -T 10 exec ping 192.168.2.2
PING 192.168.2.2 (192.168.2.2): 56 data bytes
64 bytes from 192.168.2.2: icmp_seq=0 ttl=255 time=0.760 ms
64 bytes from 192.168.2.2: icmp_seq=1 ttl=255 time=1.535 ms
^C
--- 192.168.2.2 ping statistics ---
2 packets transmitted, 2 packets received, 0.0% packet loss
round-trip min/avg/max/std-dev = 0.760/1.148/1.535/0.388 ms
and the tcpdump output should look similar to the following. Only ESP packets appear, which means the GRE traffic is now encrypted:
vm2# tcpdump -i vio0 -nvv proto 50 or proto 47
tcpdump: listening on vio0, link-type EN10MB
22:28:29.832670 100.64.3.1 > 100.64.3.2: esp spi 0x40b143ea seq 2 len 128 (ttl 64, id 16207, len 148)
22:28:29.832850 100.64.3.2 > 100.64.3.1: esp spi 0x8b2cd3f1 seq 2 len 128 (ttl 64, id 27150, len 148)
22:28:31.849452 100.64.3.1 > 100.64.3.2: esp spi 0x40b143ea seq 3 len 128 (ttl 64, id 14054, len 148)
22:28:31.849804 100.64.3.2 > 100.64.3.1: esp spi 0x8b2cd3f1 seq 3 len 128 (ttl 64, id 52498, len 148)
With everything working, we can stop the iked debug sessions and start iked as a proper service on both systems:
rcctl enable iked
rcctl start iked
Below are some errors you are likely to see if the keys are not distributed correctly:
ca_getreq: no valid local certificate found for FQDN/vm1
ca_validate_pubkey: could not open public key pubkeys/fqdn/vm2
ikev2_dispatch_cert: peer certificate is invalid
ikev2_send_auth_failed: authentication failed for FQDN/vm2
sa_free: authentication failed notification from peer