Commit bf368e4e70cd4e0f880923c44e95a4273d725ab4

Authored by Eric Dumazet
Committed by David S. Miller
1 parent 37b607c5ac

net: Avoid extra wakeups of threads blocked in wait_for_packet()

In 2.6.25 we added UDP mem accounting.

This unfortunately added a penalty when a frame is transmitted, since
at TX completion time we have to call sock_wfree() to perform the
necessary memory accounting. This calls sock_def_write_space() and
ultimately the scheduler if any thread is waiting on the socket.
Threads waiting for an incoming frame were scheduled, then had to sleep
again as the event was meaningless to them.

(All threads waiting on a socket use the same sk_sleep anchor)

This adds a lot of extra wakeups and increases latencies, as noted
by Christoph Lameter, and slows down the softirq handler.

Reference : http://marc.info/?l=linux-netdev&m=124060437012283&w=2

Fortunately, Davide Libenzi recently added the concept of keyed wakeups
to the kernel, and in particular to sockets (see commit
37e5540b3c9d838eb20f2ca8ea2eb8072271e403
"epoll keyed wakeups: make sockets use keyed wakeups")

Davide's goal was to optimize epoll, but this new wakeup infrastructure
can help non-epoll users as well, if they care to set up an appropriate
handler.

This patch introduces a new DEFINE_WAIT_FUNC() helper and uses it
in wait_for_packet(), so that only relevant events can wake up a thread
blocked in this function.
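
The filtering idea can be sketched in userspace as follows. This is only a
simplified model, not kernel code: struct waiter, default_wake(),
receiver_wake() and wake_up_key() are hypothetical stand-ins for
wait_queue_t, autoremove_wake_function(), receiver_wake_function() and the
wait queue walk, and the model ignores exclusive-waiter handling and the
scheduler entirely. POLLOUT is assumed here as the key of a write-space
event.

```c
#include <assert.h>
#include <poll.h>	/* POLLIN, POLLOUT, POLLERR */
#include <stddef.h>

/* Hypothetical userspace stand-in for the kernel's wait_queue_t. */
struct waiter;
typedef int (*wake_fn)(struct waiter *w, unsigned long key);

struct waiter {
	wake_fn func;	/* callback consulted on every wakeup */
	int woken;	/* set once the waiter has been woken */
};

/* Models autoremove_wake_function(): always wakes the waiter. */
static int default_wake(struct waiter *w, unsigned long key)
{
	(void)key;
	w->woken = 1;
	return 1;
}

/* Models receiver_wake_function(): skip keyed events that are
 * neither input nor error; a zero key (unkeyed wakeup) always wakes. */
static int receiver_wake(struct waiter *w, unsigned long key)
{
	if (key && !(key & (POLLIN | POLLERR)))
		return 0;
	return default_wake(w, key);
}

/* Models the queue walk: call each callback, count actual wakeups. */
static int wake_up_key(struct waiter *q, size_t n, unsigned long key)
{
	int woken = 0;

	for (size_t i = 0; i < n; i++)
		woken += q[i].func(&q[i], key);
	return woken;
}

static void example(void)
{
	struct waiter rx = { receiver_wake, 0 };

	/* TX completion signals write space: the receiver stays asleep. */
	assert(wake_up_key(&rx, 1, POLLOUT) == 0 && !rx.woken);

	/* An incoming frame signals POLLIN: the receiver is woken. */
	assert(wake_up_key(&rx, 1, POLLIN) == 1 && rx.woken);
}
```

In this model a waiter registered with default_wake() is woken by every
event, which corresponds to the behaviour before this patch.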

The trace of function calls from the bnx2 TX completion path,
bnx2_poll_work(), is:
__kfree_skb()
 skb_release_head_state()
  sock_wfree()
   sock_def_write_space()
    __wake_up_sync_key()
     __wake_up_common()
      receiver_wake_function() : stops here since the thread is waiting for input


Reported-by: Christoph Lameter <cl@linux.com>
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Showing 2 changed files with 17 additions and 3 deletions

include/linux/wait.h
@@ -440,12 +440,14 @@
 int autoremove_wake_function(wait_queue_t *wait, unsigned mode, int sync, void *key);
 int wake_bit_function(wait_queue_t *wait, unsigned mode, int sync, void *key);
 
-#define DEFINE_WAIT(name)						\
+#define DEFINE_WAIT_FUNC(name, function)				\
 	wait_queue_t name = {						\
 		.private	= current,				\
-		.func		= autoremove_wake_function,		\
+		.func		= function,				\
 		.task_list	= LIST_HEAD_INIT((name).task_list),	\
 	}
+
+#define DEFINE_WAIT(name) DEFINE_WAIT_FUNC(name, autoremove_wake_function)
 
 #define DEFINE_WAIT_BIT(name, word, bit)				\
 	struct wait_bit_queue name = {					\
@@ -64,13 +64,25 @@
 	return sk->sk_type == SOCK_SEQPACKET || sk->sk_type == SOCK_STREAM;
 }
 
+static int receiver_wake_function(wait_queue_t *wait, unsigned mode, int sync,
+				  void *key)
+{
+	unsigned long bits = (unsigned long)key;
+
+	/*
+	 * Avoid a wakeup if event not interesting for us
+	 */
+	if (bits && !(bits & (POLLIN | POLLERR)))
+		return 0;
+	return autoremove_wake_function(wait, mode, sync, key);
+}
 /*
  * Wait for a packet..
  */
 static int wait_for_packet(struct sock *sk, int *err, long *timeo_p)
 {
 	int error;
-	DEFINE_WAIT(wait);
+	DEFINE_WAIT_FUNC(wait, receiver_wake_function);
 
 	prepare_to_wait_exclusive(sk->sk_sleep, &wait, TASK_INTERRUPTIBLE);
 
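
The macro change in include/linux/wait.h generalizes DEFINE_WAIT() by
turning the callback into a parameter while keeping the old name as a thin
wrapper. A minimal userspace mock of that pattern might look like the
following. It assumes a hypothetical struct wait_entry in place of
wait_queue_t, drops the task_list member, and uses NULL where the kernel
stores current; default_wake() and never_wake() are illustrative callbacks,
not kernel functions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, reduced stand-in for the kernel's wait_queue_t. */
struct wait_entry;
typedef int (*wake_fn)(struct wait_entry *w, void *key);

struct wait_entry {
	void *private;	/* the kernel stores the waiting task here */
	wake_fn func;	/* callback invoked when the queue is woken */
};

/* Illustrative callbacks: one always wakes, one never does. */
static int default_wake(struct wait_entry *w, void *key)
{
	(void)w;
	(void)key;
	return 1;
}

static int never_wake(struct wait_entry *w, void *key)
{
	(void)w;
	(void)key;
	return 0;
}

/* Userspace mock of the generalized helper: the callback is a parameter. */
#define DEFINE_WAIT_FUNC(name, function)	\
	struct wait_entry name = {		\
		.private = NULL,		\
		.func = function,		\
	}

/* The old helper becomes a wrapper, so existing callers are unchanged. */
#define DEFINE_WAIT(name) DEFINE_WAIT_FUNC(name, default_wake)

static void example(void)
{
	DEFINE_WAIT(plain);			/* default callback */
	DEFINE_WAIT_FUNC(filtered, never_wake);	/* custom callback */

	assert(plain.func(&plain, NULL) == 1);
	assert(filtered.func(&filtered, NULL) == 0);
}
```

Keeping DEFINE_WAIT() as a wrapper means callers that do not need
filtering require no source changes.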