Contents

Bridge Netfilter: Unexpected Consequences

Docker loads bridge netfilter kernel module which creates certain side effects on the system. This blog explores the unexpected consequences and provides a solution for it.

The problem

My journey into the bridge netfilter rabbit hole started when I was trying to create a network topology on my system using namespaces. A simplified version that showcases the core problem is as shown below:

/posts/linux/networking/bridge-netfilter-unexpected-consequences/network-topology.png
Test Network Topology

In this setup, there are two network namespaces connected via a linux bridge. The goal is to ping between them.

Two namespaces with bridge script
#!/bin/bash

# Create two network namespaces
ip netns add ns1
ip netns add ns2

# Create a bridge
ip link add br0 type bridge
ip link set dev br0 up

# Create two veth pairs
ip link add veth0a type veth peer name veth0b
ip link add veth1a type veth peer name veth1b

# Move one end of each veth pair into the namespaces
ip link set veth0a netns ns1
ip link set veth1a netns ns2

# Configure the interfaces in the namespaces
ip netns exec ns1 ip addr add 192.168.30.1/24 dev veth0a
ip netns exec ns1 ip link set dev veth0a up
ip netns exec ns1 ip link set dev lo up

ip netns exec ns2 ip addr add 192.168.30.2/24 dev veth1a
ip netns exec ns2 ip link set dev veth1a up
ip netns exec ns2 ip link set dev lo up

# Connect the other ends of the veth pairs to the bridge
ip link set veth0b master br0
ip link set veth1b master br0

# Enable the bridge interface
ip link set dev br0 up

echo "Network namespaces and bridge are set up."

The expectation here is that as both the namespaces are in the same subnet and connected to by a bridge, they should be able to ping each other. But surprisingly, the ping does not work.

ARP resolution

When checked, the arp resolution was successful on the namespaces showing that the Layer 2 is working well.

# arp -n
Address                  HWtype  HWaddress           Flags Mask            Iface
192.168.30.2             ether   86:3a:ab:26:5c:97   C                     veth0a

IPTable rules

As this is pure Layer 2 communication, the effects of iptable rules were not considered. After exhausting all the troubleshooting options, I googled for this problem and found a very interesting concept around br-netfilter here

It turns out, if docker is installed on a system it enables br-netfilter kernel module and in addition it changes the default FORWARD chain policy to DROP.

INPUT     ACCEPT
FORWARD   DROP
OUTPUT    ACCEPT

We can check if br-netfilter module is loaded using the below command:

# lsmod | grep -E 'veth|bridge'
veth                   32768  0
bridge                307200  1 br_netfilter
stp                    16384  1 bridge
llc                    16384  2 bridge,stp

Loading the br-netfilter module will result in iptable rules being applied on bridged ethernet frames and will result in packet drops, which will be totally unexpected by the unaware.

Solution for the impatient

If you are looking for a quick solution to the problem, you need to add the below rule to your iptables:

Allow forward traffic to/from the bridge
sudo iptables -I FORWARD -i <bridge-name> -j LOG --log-prefix "ACCEPTED: " # remove this line if you don't want logging
sudo iptables -I FORWARD -i <bridge-name> -j ACCEPT

Understanding br-netfilters

As per the br-netfilter website, the module enables the following functionality:

  • {Ip, Ip6, Arp}tables can filter bridged IPv4/IPv6/ARP packets, even when encapsulated in an 802.1Q VLAN or PPPoE header This enables the functionality of a stateful transparent firewall.
  • All filtering, logging and NAT features of the 3 tools can therefore be used on bridged frames.
  • Combined with ebtables, the bridge-nf code therefore makes Linux a very powerful transparent firewall.
  • This enables, f.e., the creation of a transparent masquerading machine (i.e. all local hosts think they are directly connected to the Internet).
  • Letting {ip, ip6, arp}tables see bridged traffic can be disabled or enabled using the appropriate proc entries, located in /proc/sys/net/bridge/:
    • bridge-nf-call-arptables
    • bridge-nf-call-iptables
    • bridge-nf-call-ip6tables
  • Also, letting the aforementioned firewall tools see bridged 802.1Q VLAN and PPPoE encapsulated packets can be disabled or enabled with a proc entry in the same directory:
    • bridge-nf-filter-vlan-tagged
  • bridge-nf-filter-pppoe-tagged These proc entries are just regular files. Writing ‘1’ to the file (echo 1 > file) enables the specific functionality, while > writing a ‘0’ to the file disables it.

An excerpt from here says:

If you have mod probe enabled and iptables Forward chain dropping packets:

iptables -I FORWARD -j DROP
modprobe br_netfilter

then by default the bridge will drop all IPv4 traffic received at layer 2 through iptables. For IPv4 frames, br_netfilter will temporarily “upgrade” Ethernet frames at layer 2 into IPv4 packets and feed them to iptables, still in the bridge path. Packets dropped by iptables won’t go out of the bridge. Those not dropped are fed back as Ethernet frames into the bridge to continue bridge processing.

Chain traversal for bridged IP packets

The following picture shows the chain traversal done by bridged ethernet frames/packets.

/posts/linux/networking/bridge-netfilter-unexpected-consequences/chain-traversal.png
Bridged frame/packet chain traversal

Here, the chains in green belong to netfilter which is used by IPTables. The chains in blue belong to bride-netfilter used by etables/nftables.

As can be seen, a bridged frame will hit 3 of the iptable chains:

  • nat PREROUTING
  • filter FORWARD
  • nat POSTROUTING

Due to this reason, the iptable rules in these 3 chains apply to the bridged traffic and in our case filter FORWARD chain rules will drop the frames between ns1 and ns2.

Unexpected scenario

When the br-nf module is enabled in the kernel, a frame/packet can go through the above 3 mentioned iptable chains in two possible ways:

  1. The frame is bridged.
  2. The packet got routed.

So, the complexity comes in filter FORWARD chain where it needs to distinguish between these two flows.

Strange behavior starts to happen when this distinction is not provided. The below example picked up from etables.netfilter.org provides insights into the issue. I will explain it with my interpretation.

Lets say we have the below setup:

/posts/linux/networking/bridge-netfilter-unexpected-consequences/strange-scenario-part1.png
Strage behaviour topology

The bridge/router is connected to the internet via port eth0 and has the below rule set:

iptables -t nat -A POSTROUTING -s 172.16.1.0/24 -j MASQUERADE

The requirement here is that traffic between 172.16.1.2 and 172.16.1.4 is bridged, while the rest is routed, using masquerading.

/posts/linux/networking/bridge-netfilter-unexpected-consequences/strange-scenario-part2.png
Traffic flow requirement

Now at the filter FORWARD chain level, some distinction has to be made to make it understand which traffic is for routing and which is for bridging.

Lets say, we don’t add any rules and try to ping between 172.16.1.2 and 172.16.1.4. The bridge will receive the ping request and will send it to the other end after masquerading the src ip address. So when the packet is received by 172.16.1.4 the src ip will be 172.16.1.1 instead of 172.16.1.2. When 172.16.1.4 sends the response back to the bridge, masquerading will kick in again and change the ip destination of this response from 172.16.1.1 to 172.16.1.4. The ping will work and the end result will look as if everything is working fine but internally bridged frames are getting masqueraded.

To avoid this situation, we need to make filter FORWARD chain aware of the two possible packet flows, bridged vs routing. This can be done by adding the below rule (which is generally unnecessary if only iptables is being used) to the iptables.

iptables -A FORWARD -s 172.16.1.0/24 -d 172.16.1.0/24 -j ACCEPT
Beware

The etables.netfilter.org sets iptable rule on the POSTROUTING chain which I feel is a typo. I have corrected it to the FORWARD chain above.

iptables -t nat -A POSTROUTING -s 172.16.1.0/24 -d 172.16.1.0/24 -j ACCEPT

Lets complete the discussion with a high level overview diagram of packet flow in linux networking.

High level network packet flow

featured-image.png
Traffic flow

The Link Layer at the bottom (blue) is where bridging happens and the green chains in this layer indicate the iptable chains that get called in the bridging flow due to the br-nf module.