Proxmox 5.x “e1000 driver hang” fix

Problem:

I started seeing Proxmox servers going up and down, with intermittent network response (SSH). The only change we had made was upgrading Proxmox from 5.2 to the latest 5.x release (5.4) and rebooting the server. This happens up to several times a day, seemingly at random!

After investigating the logs, these lines stood out to me:

Mar 18 09:43:36 node4 kernel: [41239.339034] e1000e 0000:00:19.0 eno1: Detected Hardware Unit Hang:
Mar 18 09:43:39 node4 kernel: [41242.405980] e1000e 0000:00:19.0 eno1: Reset adapter unexpectedly
Mar 18 09:43:39 node4 kernel: [41242.406283] vmbr0: port 1(eno1) entered disabled state
Mar 18 09:43:43 node4 kernel: [41246.191838] e1000e: eno1 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: None
Mar 18 09:43:43 node4 kernel: [41246.191874] vmbr0: port 1(eno1) entered blocking state
Mar 18 09:43:43 node4 kernel: [41246.191876] vmbr0: port 1(eno1) entered forwarding state

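The issue seems to be that, after the kernel/driver update, the interface repeatedly goes into a hang state; the driver then resets the adapter, which takes the link (and the vmbr0 bridge port) down for a few seconds each time.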
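To get a feel for how often the hang occurs, the kernel log can be searched for the “Detected Hardware Unit Hang” line. The sketch below assumes syslog-style lines beginning with “Mon DD HH:MM:SS”, as shown above, for example in /var/log/kern.log if rsyslog is writing kernel messages there; passing - reads from stdin instead (for example journalctl -k output). The script name count-hangs.sh is just an example.

```shell
#!/bin/sh
# count-hangs.sh (example name): count e1000e "Detected Hardware Unit Hang"
# events per day. Assumes syslog-style lines beginning "Mon DD HH:MM:SS".
count_hangs_per_day() {
    log_file="$1"   # a log file path, or "-" to read from stdin
    grep 'e1000e.*Detected Hardware Unit Hang' "$log_file" \
        | awk '{ print $1, $2 }' \
        | sort | uniq -c
}

# Only run when given an argument, e.g. sh count-hangs.sh /var/log/kern.log
if [ "$#" -gt 0 ]; then
    count_hangs_per_day "$1"
fi
```

Each output line is a count followed by the date. Running it again a few days after applying the fix below is a simple way to confirm the hangs have stopped.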

Solution:

The solution I have found to work across multiple servers is to use ethtool to switch off several offload features, which prevents this from happening.

Disable the following:

  • GSO (generic-segmentation-offload)
  • GRO (generic-receive-offload)
  • TSO (tcp-segmentation-offload)
  • TX (tx-checksumming)
  • RX (rx-checksumming)
ethtool -K eno1 gso off gro off tso off tx off rx off

(Replace eno1, the interface from the logs above, with your own interface name.)

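and also disable PCIe Active State Power Management (ASPM) power saving:

pcie_aspm=off

This is a kernel boot parameter, not a shell command, so it has to go on the kernel command line: on a server that boots with GRUB, add it to GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub, run update-grub, and reboot.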
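Because pcie_aspm=off is a kernel command-line parameter, scripting it across several GRUB-booted nodes means editing /etc/default/grub. A minimal sketch is below. It assumes GRUB_CMDLINE_LINUX_DEFAULT is a single double-quoted line (the Debian default, e.g. "quiet"), uses GNU sed's in-place editing as found on Debian/Proxmox, and leaves the file alone if the parameter is already present. The script name add-kernel-param.sh is hypothetical.

```shell
#!/bin/sh
# add-kernel-param.sh (example name): append a kernel parameter to
# GRUB_CMDLINE_LINUX_DEFAULT unless it is already there.
# Assumes the setting is one double-quoted line, e.g. GRUB_CMDLINE_LINUX_DEFAULT="quiet",
# and that the parameter has no "/" or regex metacharacters (true for pcie_aspm=off).
add_kernel_param() {
    grub_file="$1"
    param="$2"
    # Already present as a whole word inside the quotes? Then do nothing.
    if grep -q "^GRUB_CMDLINE_LINUX_DEFAULT=.*[\" ]$param[\" ]" "$grub_file"; then
        return 0
    fi
    # Insert the parameter just before the closing quote (GNU sed in-place edit).
    sed -i "s/^\(GRUB_CMDLINE_LINUX_DEFAULT=\".*\)\"\$/\1 $param\"/" "$grub_file"
}

# Only run when given both arguments, e.g.
#   sh add-kernel-param.sh /etc/default/grub pcie_aspm=off
if [ "$#" -eq 2 ]; then
    add_kernel_param "$1" "$2"
fi
```

After running it, run update-grub and reboot; cat /proc/cmdline should then include pcie_aspm=off. A node that does not boot via GRUB keeps its kernel command line elsewhere, so this sketch does not apply there.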

Create a script to automate this for us

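The ethtool settings do not survive a reboot, and they only need to be applied to the WAN-facing interfaces (not loopback, etc.), every time one of them comes up. We can use a script in /etc/network/if-up.d/ to do the job: ifupdown runs every executable script in that directory whenever it brings an interface up, passing the interface name in $IFACE.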
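To work out which interface name the script should match, one option is to check which NICs are actually bound to the e1000e driver, since that is the driver logging the hangs. The sketch below reads the /sys/class/net/<iface>/device/driver symlinks; virtual interfaces such as lo, vmbr0 and tap devices have no device link and are skipped. The script name list-nics.sh and the optional sysfs-root argument (used only for testing) are hypothetical.

```shell
#!/bin/sh
# list-nics.sh (example name): print interfaces bound to a given kernel driver
# by resolving /sys/class/net/<iface>/device/driver.
list_ifaces_by_driver() {
    driver="${1:-e1000e}"
    sysfs="${2:-/sys/class/net}"   # overridable so the logic can be tested
    for dev in "$sysfs"/*; do
        # Virtual interfaces (lo, bridges, taps) have no device/driver link.
        [ -e "$dev/device/driver" ] || continue
        name=$(basename "$(readlink -f "$dev/device/driver")")
        if [ "$name" = "$driver" ]; then
            basename "$dev"
        fi
    done
}

# Only run when given a driver name, e.g. sh list-nics.sh e1000e
if [ "$#" -gt 0 ]; then
    list_ifaces_by_driver "$@"
fi
```

On the node from the logs above, sh list-nics.sh e1000e would be expected to print eno1.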

Now run the following command to create a new script file:

nano /etc/network/if-up.d/hangfix-ifup && chmod +x /etc/network/if-up.d/hangfix-ifup

Insert this into the file:

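#!/bin/sh -e
# Run by ifupdown from /etc/network/if-up.d/ each time an interface comes up;
# the interface name is passed in $IFACE.
if [ "$IFACE" = "YOUR-INTERFACE-NAME-HERE" ]; then
    # Disable the offload features that trigger the e1000e hang
    /sbin/ethtool -K "$IFACE" gso off gro off tso off tx off rx off
    # pcie_aspm=off is a kernel boot parameter and cannot be set from this script
fi
exit 0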
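To check the hook without rebooting, it can be run by hand with IFACE set (for example IFACE=eno1 /etc/network/if-up.d/hangfix-ifup) and the result inspected with ethtool -k (lowercase k lists the current settings). The sketch below filters that listing down to the five features the hook turns off and prints any that are still on; empty output means they are all off. It assumes ethtool's usual “feature-name: on|off” output, and the script name check-offloads.sh is hypothetical.

```shell
#!/bin/sh
# check-offloads.sh (example name): read `ethtool -k <iface>` output on stdin
# and print the names of the hook's target features that are still "on"
# (including "on [fixed]", which the driver does not allow to be changed).
offloads_still_on() {
    grep -E '^(rx-checksumming|tx-checksumming|tcp-segmentation-offload|generic-segmentation-offload|generic-receive-offload): on' \
        | cut -d: -f1
}

# Only run when given an interface name, e.g. sh check-offloads.sh eno1
if [ "$#" -gt 0 ]; then
    /sbin/ethtool -k "$1" | offloads_still_on
fi
```

The mapping used here is gso to generic-segmentation-offload, gro to generic-receive-offload, tso to tcp-segmentation-offload, tx to tx-checksumming and rx to rx-checksumming, matching the list earlier in this post.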
