diff --git a/docs/user-guide/host-settings.md b/docs/user-guide/host-settings.md
index 5a7895da..50ec99c0 100644
--- a/docs/user-guide/host-settings.md
+++ b/docs/user-guide/host-settings.md
@@ -88,3 +88,149 @@ kubectl fabric vpc attach --vpc-subnet vpc-2/default --connection server-1--leaf
 
 [bonding]: https://www.kernel.org/doc/html/latest/networking/bonding.html
 
+## HostBGP container
+
+If using [HostBGP subnets](vpcs.md#hostbgp-subnets), BGP should be running on the host server with
+an appropriate configuration applied. To facilitate these steps, Hedgehog provides a
+Docker container which automatically starts [FRR](https://docs.frrouting.org/en/latest/) with
+a valid configuration to join the Fabric.
+
+As a first step, download the Docker image from our registry:
+```bash
+docker pull ghcr.io/githedgehog/host-bgp
+```
+
+The container should then be run with host networking (so that FRR can communicate with the leaves
+using the host's interfaces) and in privileged mode. Additionally, the container takes a few input parameters:
+
+- an optional ASN to use for BGP; if not specified, the container defaults to ASN 64999
+- one or more VPCs with their related parameters, in the format
+  `<vpc-name>:v=<vlan-id>:i=<interface>[:i=<interface>...]:a=<vip>[:a=<vip>...]`, where:
+    - `v=<vlan-id>` is the VLAN ID to be used for the VPC; use 0 for untagged.
+    - `i=<interface>` is an interface to be used to establish a BGP unnumbered session with a
+      Fabric leaf; if a VLAN ID was specified, a corresponding VLAN interface will be created using
+      the provided interface as the master device.
+    - `a=<vip>` is the Virtual IP (or VIP) to be advertised to the leaves; it should have
+      a prefix length of /32 and be part of the subnet the host is attaching to.
+
+As an example, the command might look something like this:
+```bash
+docker run --network=host --privileged --rm --detach --name hostbgp ghcr.io/githedgehog/host-bgp vpc-01:v=1001:i=enp2s1:i=enp2s2:a=10.100.34.5/32
+```
+!!! note
+    With the above command, any output produced by the container will not be visible from the terminal
+    where it was started. Verify that the container is running correctly with `docker ps`, or examine
+    the logs of the container with `docker logs hostbgp` to investigate a failure.
+
+With the above command:
+
+- VLAN interfaces `enp2s1.1001` and `enp2s2.1001` would be created, if they do not already exist
+- BGP unnumbered sessions would be established on those same interfaces, using the default ASN 64999
+- the address `10.100.34.5/32` would be configured on the loopback of the host server and advertised to the leaves
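+
+Before moving on, it can be useful to sanity-check the result directly on the host. The snippet below is a
+minimal example based on the reference command above (the interface and VIP values are the ones used in that
+example); `show bgp summary` is a standard FRR command and is not specific to this container:
+```bash
+# VLAN interface created by the container on top of enp2s1 (-d shows VLAN details)
+ip -d link show enp2s1.1001
+# The VIP should now be configured on the loopback
+ip -4 addr show dev lo
+# State of the BGP unnumbered sessions towards the leaves
+docker exec -it hostbgp vtysh -c "show bgp summary"
+```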
+
+To further modify the configuration or to troubleshoot the state of the system, an
+expert user can invoke the FRR CLI using the following command:
+```bash
+docker exec -it hostbgp vtysh
+```
+
+To stop the container, run the following command:
+```bash
+docker stop -t 1 hostbgp
+```
+
+Note that stopping the Docker container does not currently remove the VIPs from the loopback, nor
+does it delete the VLAN interfaces. If needed, these should be removed manually; for example,
+using iproute2 and the reference command above, one could run:
+```bash
+sudo ip address delete dev lo 10.100.34.5/32
+sudo ip link delete dev enp2s1.1001
+sudo ip link delete dev enp2s2.1001
+```
+
+### Example: multi-VPC multi-homed server
+
+Let's assume that `server-03` is attached to both `leaf-01` and `leaf-02` with unbundled connections
+`server-03--unbundled--leaf-01` and `server-03--unbundled--leaf-02`, and that we want it to be part
+of two separate VPCs using HostBGP. We can create the VPCs and attachments from the control node
+using the Fabric `kubectl` plugin, for example:
+
+```bash
+core@control-1 ~ $ kubectl fabric vpc create --name=vpc-01 --subnet=10.0.1.0/24 --vlan=1001 --host-bgp=true
+10:04:09 INF VPC created name=vpc-01
+core@control-1 ~ $ kubectl fabric vpc create --name=vpc-02 --subnet=10.0.2.0/24 --vlan=1002 --host-bgp=true
+10:04:24 INF VPC created name=vpc-02
+core@control-1 ~ $ kubectl fabric vpc attach --name=s3-v1-l1 --conn=server-03--unbundled--leaf-01 --subnet=vpc-01/default
+10:05:59 INF VPCAttachment created name=s3-v1-l1
+core@control-1 ~ $ kubectl fabric vpc attach --name=s3-v1-l2 --conn=server-03--unbundled--leaf-02 --subnet=vpc-01/default
+10:06:08 INF VPCAttachment created name=s3-v1-l2
+core@control-1 ~ $ kubectl fabric vpc attach --name=s3-v2-l1 --conn=server-03--unbundled--leaf-01 --subnet=vpc-02/default
+10:06:24 INF VPCAttachment created name=s3-v2-l1
+core@control-1 ~ $ kubectl fabric vpc attach --name=s3-v2-l2 --conn=server-03--unbundled--leaf-02 --subnet=vpc-02/default
+10:06:33 INF VPCAttachment created name=s3-v2-l2
+```
+
+Then we can configure `server-03` using the provided container:
+
+```bash
+docker run --network=host --privileged --rm --detach --name hostbgp ghcr.io/githedgehog/host-bgp vpc-01:v=1001:i=enp2s1:i=enp2s2:a=10.0.1.3/32 vpc-02:v=1002:i=enp2s1:i=enp2s2:a=10.0.2.3/32
+```
+
+This will generate the following FRR configuration:
+```
+!
+ip prefix-list vpc-01 seq 5 permit 10.0.1.3/32
+ip prefix-list vpc-02 seq 5 permit 10.0.2.3/32
+!
+route-map vpc-01 permit 10
+ match ip address prefix-list vpc-01
+exit
+!
+route-map vpc-02 permit 10
+ match ip address prefix-list vpc-02
+exit
+!
+interface lo
+ ip address 10.0.1.3/32
+ ip address 10.0.2.3/32
+exit
+!
+router bgp 64999
+ no bgp ebgp-requires-policy
+ bgp bestpath as-path multipath-relax
+ timers bgp 3 9
+ neighbor enp2s1.1001 interface remote-as external
+ neighbor enp2s1.1002 interface remote-as external
+ neighbor enp2s2.1001 interface remote-as external
+ neighbor enp2s2.1002 interface remote-as external
+ !
+ address-family ipv4 unicast
+  network 10.0.1.3/32
+  network 10.0.2.3/32
+  neighbor enp2s1.1001 route-map vpc-01 out
+  neighbor enp2s1.1002 route-map vpc-02 out
+  neighbor enp2s2.1001 route-map vpc-01 out
+  neighbor enp2s2.1002 route-map vpc-02 out
+  maximum-paths 4
+ exit-address-family
+exit
+!
+```
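+
+To double-check what the container actually applied on `server-03`, the running FRR configuration and the
+prefixes being advertised can be inspected with standard FRR show commands (they are not specific to this image):
+```bash
+# Dump the generated configuration; it should match the listing above
+docker exec -it hostbgp vtysh -c "show running-config"
+# Show the BGP table, including the two advertised VIPs
+docker exec -it hostbgp vtysh -c "show bgp ipv4 unicast"
+```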
+
+We can then verify on either of the leaves attached to `server-03` that the VIPs are only
+learned in the VPC they belong to:
+```
+leaf-01# show ip route vrf VrfVvpc-01
+Codes: K - kernel route, C - connected, S - static, B - BGP, O - OSPF, A - attached-host
+       > - selected route, * - FIB route, q - queued route, r - rejected route, b - backup
+  Destination        Gateway                                       Dist/Metric   Last Update
+--------------------------------------------------------------------------------------------------------------------------------
+  B>*  10.0.1.3/32   via fe80::e20:12ff:fefe:401 Ethernet1.1001    20/0          00:09:43 ago
+leaf-01# show ip route vrf VrfVvpc-02
+Codes: K - kernel route, C - connected, S - static, B - BGP, O - OSPF, A - attached-host
+       > - selected route, * - FIB route, q - queued route, r - rejected route, b - backup
+  Destination        Gateway                                       Dist/Metric   Last Update
+--------------------------------------------------------------------------------------------------------------------------------
+  B>*  10.0.2.3/32   via fe80::e20:12ff:fefe:401 Ethernet1.1002    20/0          00:09:47 ago
+leaf-01#
+```
diff --git a/docs/user-guide/vpcs.md b/docs/user-guide/vpcs.md
index aae79a07..14e31da7 100644
--- a/docs/user-guide/vpcs.md
+++ b/docs/user-guide/vpcs.md
@@ -56,6 +56,11 @@ spec:
       subnet: 10.10.100.0/24
       vlan: 1100
 
+    bgp-on-host: # Another subnet with hosts peering with leaves via BGP
+      subnet: 10.10.50.0/25
+      hostBGP: true
+      vlan: 1050
+
   permit: # Defines which subnets of the current VPC can communicate to each other, applied on top of subnets "isolated" flag (doesn't affect VPC peering)
     - [subnet-1, subnet-2, subnet-3] # 1, 2 and 3 subnets can communicate to each other
     - [subnet-4, subnet-5] # Possible to define multiple lists
@@ -108,6 +113,24 @@ packet:
   Fabric and will be in `VrfV<vpc-name>` format, for example `VrfVvpc-1` for a VPC named `vpc-1` in the Fabric API.
 * _CircuitID_ (suboption 1) identifies the VLAN which, together with the VRF (VPC) name, maps to a specific VPC subnet.
 
+### HostBGP subnets
+
+At times, it is useful to have BGP running directly on the host and peering with the Fabric: one such case is
+supporting active-active multi-homed servers, or simply providing redundancy when techniques such
+as MCLAG or ESLAG are not available, for example due to hardware limitations.
+
+Consider this scenario: `server-1` is connected to two different Fabric switches `sw-1` and `sw-2`, and attached to
+`vpc-1/subnet-1` on both of them. This subnet is configured as `hostBGP`; the switches will be configured to peer with
+`server-1` using unnumbered BGP (IPv4 unicast address family), only importing /32 prefixes within the VPC subnet and
+exporting routes learned from other VPC peers. Similarly, BGP is running on `server-1`, unnumbered BGP sessions are
+established with each leaf, and one or more Virtual IPs (VIPs) in the VPC subnet are advertised. With this setup, the
+host is part of the VPC and can be reached via one of the advertised VIPs from either link to the Fabric.
+
+It is important to keep in mind that Hedgehog Fabric does not directly operate the host servers attached to it;
+running subnets in HostBGP mode requires running a routing suite on the host and configuring it accordingly. To
+facilitate this process, however, we do provide a container image which can autogenerate a valid configuration,
+given some input parameters. For more details, see [the related section in the Host Settings page](host-settings.md#hostbgp-container).
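+
+As a quick example of the control-node side, a VPC whose `default` subnet uses HostBGP can be created with the
+Fabric `kubectl` plugin by passing the same `--host-bgp` flag used in the Host Settings example; the values below
+mirror the `bgp-on-host` subnet shown earlier on this page (adjust the name, subnet, and VLAN to your environment):
+```bash
+kubectl fabric vpc create --name=vpc-1 --subnet=10.10.50.0/25 --vlan=1050 --host-bgp=true
+```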
+
 ## VPCAttachment
 
 A VPCAttachment represents a specific VPC subnet assignment to the `Connection` object which means a binding between an
@@ -282,5 +305,3 @@ user@server ~$ ip route
 10.10.0.1/24 via 10.10.0.1 dev enp2s1.1000 proto dhcp src 10.10.0.4 metric 1024 # Route for VPC subnet gateway
 10.10.0.1 dev enp2s1.1000 proto dhcp scope link src 10.10.0.4 metric 1024
 ```
-
-