I recently ran into an issue with newer versions of XenServer that had me really, really confused. I moved a VM that acts as a router from a XenServer 5.0 machine to a XenServer 5.6 machine. The VM runs both OSPF and a DHCP server on top of VLAN interfaces provided by XenServer, and, oddly, both of these services stopped working after the move.
The network configuration was pretty much identical on the old and new XenServer machines. Each had two NICs, and each NIC had a native VLAN of ‘1’ (the internal network) plus a pile of tagged VLANs on top for the networks I actually care about. Running ‘tcpdump’ in the XenServer 5.6 dom0 showed both the DHCP and OSPF packets, but running ‘tcpdump’ inside the VM showed nothing at all. It seemed *odd*!
After some digging, I ran across a forum post at Citrix that described the same issue. The root cause appears to be that the Linux bridging implementation in newer kernels eats 802.1Q-tagged packets sent to broadcast addresses. DHCP packets, for example, are addressed to the MAC address ‘FF:FF:FF:FF:FF:FF’. In my case, the VMs were on tagged interfaces on top of ‘eth1’, which XenServer had placed into a bridge called ‘xenbr1’. No VMs were using the raw eth1 interface (i.e., none had a virtual interface attached directly to xenbr1). So I simply ran ‘ifconfig xenbr1 down ; brctl delbr xenbr1’ to get rid of that bridge, and everything started working.
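The following minimal Python sketch makes the "tagged broadcast" part concrete. It is my own illustration, not anything from XenServer or the Citrix thread. It decodes the start of a raw Ethernet frame and reports whether the frame carries an 802.1Q tag, which VLAN it belongs to, and whether it is addressed to the all-ones broadcast MAC.

The tag layout comes from the 802.1Q standard: a TPID of 0x8100, followed by a 16-bit TCI holding a 3-bit priority, a 1-bit DEI, and a 12-bit VLAN ID. The function and field names are hypothetical. The code also reports the group (multicast) bit, because OSPF hellos go to a multicast MAC rather than the broadcast address.

```python
import struct
from typing import NamedTuple, Optional

BROADCAST_MAC = b"\xff" * 6
TPID_8021Q = 0x8100  # EtherType value that marks an 802.1Q VLAN tag


class TaggedFrameInfo(NamedTuple):
    vlan_id: int          # 12-bit VLAN identifier
    priority: int         # 3-bit priority code point (PCP)
    inner_ethertype: int  # EtherType of the payload after the tag
    is_broadcast: bool    # destination is FF:FF:FF:FF:FF:FF
    is_group: bool        # I/G bit set: broadcast or multicast destination


def parse_8021q(frame: bytes) -> Optional[TaggedFrameInfo]:
    """Return VLAN details if `frame` starts with an 802.1Q-tagged Ethernet
    header, or None if it is untagged or too short to hold the tag."""
    # dst MAC (6) + src MAC (6) + TPID (2) + TCI (2) + inner EtherType (2)
    if len(frame) < 18:
        return None
    dst = frame[0:6]
    tpid, tci, inner_type = struct.unpack("!HHH", frame[12:18])
    if tpid != TPID_8021Q:
        return None
    return TaggedFrameInfo(
        vlan_id=tci & 0x0FFF,       # low 12 bits of the TCI
        priority=tci >> 13,         # top 3 bits of the TCI
        inner_ethertype=inner_type,
        is_broadcast=(dst == BROADCAST_MAC),
        is_group=bool(dst[0] & 0x01),  # least significant bit of first octet
    )


if __name__ == "__main__":
    # Made-up example: an IPv4 broadcast (like a DHCP request) on VLAN 42.
    src = bytes.fromhex("00163e000001")
    frame = BROADCAST_MAC + src + struct.pack("!HHH", TPID_8021Q, 42, 0x0800) + bytes(28)
    print(parse_8021q(frame))
```

Feeding it the raw frame bytes from a dom0 capture would flag the tagged broadcast DHCP requests. Pcap file parsing is left out so the sketch has no dependencies.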
This obviously isn’t a long-term solution, just a ‘works-for-now’ hack. I still need to find a way to disable the native VLAN bridge. I also want to look into using Open vSwitch for the networking, which is supposed to resolve this issue. If you’ve run into this problem, I’d love to hear how you got around it!