TX Hang

sunshout 2013. 10. 4. 13:47
* What does "Tx Unit Hang" mean?
We check at a set interval to make sure transmits are occurring. If no transmit 
has occurred during that interval and we have transmits pending, we set a bit 
saying we are concerned. This bit is cleared if we receive a pause frame. If 
the same conditions are true at the next check and the bit is still set, we 
report a Tx hang.
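
The two-step check described above can be sketched as a small state machine. This is only a model of the prose, not the driver's code: the class name, field names, and counters are hypothetical, and clearing the bit whenever progress is seen is an assumption the quoted answer does not state explicitly.

```python
from dataclasses import dataclass


@dataclass
class TxHangDetector:
    """Two-strike Tx hang check, modelled on the description only."""

    last_tx_done: int = 0   # completed-transmit count seen at the previous check
    armed: bool = False     # the "we are concerned" bit

    def on_pause_frame(self) -> None:
        # A received pause frame explains the stall, so the bit is cleared.
        self.armed = False

    def check(self, tx_done: int, tx_pending: int) -> bool:
        """Run one periodic check; return True when a hang is declared."""
        stalled = tx_done == self.last_tx_done and tx_pending > 0
        self.last_tx_done = tx_done
        if not stalled:
            # Assumption: any progress (or an empty queue) clears the bit.
            self.armed = False
            return False
        if self.armed:
            # Second consecutive stalled check with the bit still set.
            return True
        self.armed = True   # first stalled check: only record the concern
        return False


detector = TxHangDetector()
detector.check(tx_done=10, tx_pending=2)          # progress seen, nothing armed
first = detector.check(tx_done=10, tx_pending=2)  # stalled once: bit is set
second = detector.check(tx_done=10, tx_pending=2) # stalled twice: hang
print(first, second)
```

A single stalled interval never reports a hang on its own, which is why a pause frame arriving between two checks is enough to postpone the report.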

* What is the state of the driver after this kind of message?
After entering this state the driver should schedule a reset, the idea being 
that the reset will restart transmits. It sounds like you are not seeing this.



http://serverfault.com/questions/193114/linux-e1000e-intel-networking-driver-problems-galore-where-do-i-start


This can also happen when a 10G NIC negotiates down to 1G mode.

     ethtool -A ethX autoneg off rx off tx off

NOTE: For 82598 backplane cards entering 1 gig mode, flow control default 
behavior is changed to off.  Flow control in 1 gig mode on these devices can 
lead to Tx hangs. 
http://downloadmirror.intel.com/14687/eng/readme.txt
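
For applying the flow-control workaround to several interfaces, a minimal Python wrapper around the ethtool command above might look like this. The function names are hypothetical, and the default is a dry run that only returns the command, because actually changing pause parameters needs root and a real interface.

```python
import subprocess
from typing import List


def pause_off_argv(iface: str) -> List[str]:
    # Same arguments as the ethtool -A command shown above.
    return ["ethtool", "-A", iface, "autoneg", "off", "rx", "off", "tx", "off"]


def pause_show_argv(iface: str) -> List[str]:
    # "ethtool -a" prints the current pause parameters of the interface.
    return ["ethtool", "-a", iface]


def disable_pause(iface: str, dry_run: bool = True) -> List[str]:
    """Build the command and run it only when dry_run is False."""
    argv = pause_off_argv(iface)
    if not dry_run:
        subprocess.run(argv, check=True)  # raises CalledProcessError on failure
    return argv


print(" ".join(disable_pause("eth0")))
```

Running `ethtool -a ethX` afterwards is a simple way to confirm that the setting took effect.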

[ 1762.997753] igb 0000:01:00.0: Detected Tx Unit Hang
[ 1762.997753]   Tx Queue             <0>
[ 1762.997753]   TDH                  <f4>
[ 1762.997753]   TDT                  <f4>
[ 1762.997753]   next_to_use          <f6>
[ 1762.997753]   next_to_clean        <f4>
[ 1762.997753] buffer_info[next_to_clean]
[ 1762.997753]   time_stamp           <23981>
[ 1762.997753]   next_to_watch        <c08edf50>
[ 1762.997753]   jiffies              <239ec>
[ 1762.997753]   desc.status          <150200>

[ 1764.997784] igb 0000:01:00.0: Detected Tx Unit Hang
[ 1764.997784]   Tx Queue             <0>
[ 1764.997784]   TDH                  <f4>
[ 1764.997784]   TDT                  <f4>
[ 1764.997784]   next_to_use          <f6>
[ 1764.997784]   next_to_clean        <f4>
[ 1764.997784] buffer_info[next_to_clean]
[ 1764.997784]   time_stamp           <23981>
[ 1764.997784]   next_to_watch        <c08edf50>
[ 1764.997784]   jiffies              <23ab4>
[ 1764.997784]   desc.status          <150200>

[ 1766.997761] igb 0000:01:00.0: Detected Tx Unit Hang
[ 1766.997761]   Tx Queue             <0>
[ 1766.997761]   TDH                  <f4>
[ 1766.997761]   TDT                  <f4>
[ 1766.997761]   next_to_use          <f6>
[ 1766.997761]   next_to_clean        <f4>
[ 1766.997761] buffer_info[next_to_clean]
[ 1766.997761]   time_stamp           <23981>
[ 1766.997761]   next_to_watch        <c08edf50>
[ 1766.997761]   jiffies              <23b7c>
[ 1766.997761]   desc.status          <150200>

[ 1768.997849] igb 0000:01:00.0: Detected Tx Unit Hang
[ 1768.997849]   Tx Queue             <0>
[ 1768.997849]   TDH                  <f4>
[ 1768.997849]   TDT                  <f4>
[ 1768.997849]   next_to_use          <f6>
[ 1768.997849]   next_to_clean        <f4>
[ 1768.997849] buffer_info[next_to_clean]
[ 1768.997849]   time_stamp           <23981>
[ 1768.997849]   next_to_watch        <c08edf50>
[ 1768.997849]   jiffies              <23c44>
[ 1768.997849]   desc.status          <150200>

[ 1770.997818] igb 0000:01:00.0: Detected Tx Unit Hang
[ 1770.997818]   Tx Queue             <0>
[ 1770.997818]   TDH                  <f4>
[ 1770.997818]   TDT                  <f4>
[ 1770.997818]   next_to_use          <f6>
[ 1770.997818]   next_to_clean        <f4>
[ 1770.997818] buffer_info[next_to_clean]
[ 1770.997818]   time_stamp           <23981>
[ 1770.997818]   next_to_watch        <c08edf50>
[ 1770.997818]   jiffies              <23d0c>
[ 1770.997818]   desc.status          <150200>
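
The values in these dumps are hexadecimal, so the arithmetic is easier to follow with a small parser. The sketch below reads the first dump, reports how many descriptors are waiting to be cleaned, and reports how long the oldest buffer has been outstanding. It assumes that time_stamp is the jiffies value recorded when the buffer was queued and that the ring indices have not wrapped. The function names are hypothetical.

```python
import re

SAMPLE = """\
[ 1762.997753] igb 0000:01:00.0: Detected Tx Unit Hang
[ 1762.997753]   Tx Queue             <0>
[ 1762.997753]   TDH                  <f4>
[ 1762.997753]   TDT                  <f4>
[ 1762.997753]   next_to_use          <f6>
[ 1762.997753]   next_to_clean        <f4>
[ 1762.997753] buffer_info[next_to_clean]
[ 1762.997753]   time_stamp           <23981>
[ 1762.997753]   next_to_watch        <c08edf50>
[ 1762.997753]   jiffies              <239ec>
[ 1762.997753]   desc.status          <150200>
"""

# Matches "name <hexvalue>" after the bracketed kernel timestamp.
FIELD = re.compile(r"\]\s+([A-Za-z_. ]+?)\s+<([0-9a-f]+)>")


def parse_dump(text):
    """Return the dump fields as a dict of name -> integer value."""
    fields = {}
    for line in text.splitlines():
        match = FIELD.search(line)
        if match:
            fields[match.group(1)] = int(match.group(2), 16)
    return fields


def estimate_hz(t0, j0, t1, j1):
    """Estimate the tick rate from two (log time, jiffies) pairs."""
    return (j1 - j0) / (t1 - t0)


dump = parse_dump(SAMPLE)
pending = dump["next_to_use"] - dump["next_to_clean"]
stalled_jiffies = dump["jiffies"] - dump["time_stamp"]
# Log times and jiffies taken from the first two dumps above.
hz = round(estimate_hz(1762.997753, 0x239EC, 1764.997784, 0x23AB4))

print("descriptors pending:", pending)
print("oldest buffer age (jiffies):", stalled_jiffies)
print("estimated HZ:", hz)
print("TDH == TDT:", dump["TDH"] == dump["TDT"])
```

In every dump TDH, TDT, next_to_use, next_to_clean, and time_stamp are unchanged while jiffies advances by 0xc8 (200) every two seconds, so the queue is making no progress between reports.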

[ 1771.041677] ------------[ cut here ]------------
[ 1771.046311] WARNING: at net/sched/sch_generic.c:255 dev_watchdog+0x258/0x278)
[ 1771.053611] NETDEV WATCHDOG: eth0 (igb): transmit queue 0 timed out
[ 1771.059943] Modules linked in:
[ 1771.063070] [<80015d28>] (unwind_backtrace+0x0/0xf8) from [<8001e1ec>] (warn)
[ 1771.072536] [<8001e1ec>] (warn_slowpath_common+0x4c/0x6c) from [<8001e2a0>] )
[ 1771.082170] [<8001e2a0>] (warn_slowpath_fmt+0x30/0x40) from [<80437b2c>] (de)
[ 1771.091288] [<80437b2c>] (dev_watchdog+0x258/0x278) from [<8002a52c>] (call_)
[ 1771.100752] [<8002a52c>] (call_timer_fn.isra.31+0x24/0x84) from [<8002a6fc>])
[ 1771.110641] [<8002a6fc>] (run_timer_softirq+0x170/0x1f0) from [<80024e98>] ()
[ 1771.119846] [<80024e98>] (__do_softirq+0xe0/0x1b8) from [<80024fb0>] (run_ks)
[ 1771.128521] [<80024fb0>] (run_ksoftirqd+0x40/0x5c) from [<800405e4>] (smpboo)
[ 1771.137657] [<800405e4>] (smpboot_thread_fn+0xf0/0x178) from [<800394b4>] (k)
[ 1771.146216] [<800394b4>] (kthread+0xa4/0xb0) from [<8000ec98>] (ret_from_for)
[ 1771.154388] ---[ end trace 4107bd53718c6753 ]---