23 Mar, 2008

6 commits

  • As reported by Johannes Berg:

    I started getting this warning with recent kernels:

    [ 773.908927] ------------[ cut here ]------------
    [ 773.908954] Badness at net/core/dev.c:2204
    ...

    If we loop more than once in gem_poll(), we'll
    use more than the real budget in our gem_rx()
    calls, thus eventually trigger the caller's
    assertions in net_rx_action().

    Subtract "work_done" from "budget" for the second
    arg to gem_rx() to fix the bug.

    Signed-off-by: David S. Miller

    David S. Miller
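
The budget accounting described in this commit can be sketched in plain userspace C. This is only an illustration under stated assumptions: `fake_rx` and `poll_loop` are hypothetical stand-ins, not the sungem driver's functions, and the packet counts are made up. Only the idea of passing "budget" minus "work_done" to the RX routine comes from the commit message.

```c
#include <assert.h>

/* Hypothetical stand-in for gem_rx(): process at most `limit` of the
 * `pending` packets and return how many were processed. */
static int fake_rx(int pending, int limit)
{
    return pending < limit ? pending : limit;
}

/* Hypothetical poll loop that may run several passes, with `per_pass`
 * packets waiting on each pass.
 * subtract_done == 0: every pass gets the full budget (the bug).
 * subtract_done != 0: every pass gets only what is left (the fix). */
static int poll_loop(int per_pass, int passes, int budget, int subtract_done)
{
    int work_done = 0;

    for (int i = 0; i < passes && work_done < budget; i++) {
        int limit = subtract_done ? budget - work_done : budget;

        work_done += fake_rx(per_pass, limit);
    }
    return work_done;
}

void budget_example(void)
{
    /* Two passes of 40 packets overrun a budget of 64 ... */
    assert(poll_loop(40, 3, 64, 0) == 80);
    /* ... unless the second pass is limited to the remaining 24. */
    assert(poll_loop(40, 3, 64, 1) == 64);
}
```

With the remaining budget passed down, the total work reported to the caller cannot exceed the budget it handed in, which is the condition the assertion in net_rx_action() checks.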
     
  • On 10GBaseT boards setting the type to TP will cause the driver to try
    to configure 1GBaseT.
    Since there are currently no boards that support setting of the port
    type, disable this for now.

    Signed-off-by: Eliezer Tamir
    Signed-off-by: David S. Miller

    Eliezer Tamir
     
  • The variable cb is initialized but never used otherwise.

    The semantic patch that makes this change is as follows:
    (http://www.emn.fr/x-info/coccinelle/)

    //
    @@
    type T;
    identifier i;
    constant C;
    @@

    (
    extern T i;
    |
    - T i;

    )
    //

    Signed-off-by: Julia Lawall
    Signed-off-by: David S. Miller

    Julia Lawall
     
  • The variable hlen is initialized but never used otherwise.

    The semantic patch that makes this change is as follows:
    (http://www.emn.fr/x-info/coccinelle/)

    //
    @@
    type T;
    identifier i;
    constant C;
    @@

    (
    extern T i;
    |
    - T i;

    )
    //

    Signed-off-by: Julia Lawall
    Signed-off-by: David S. Miller

    Julia Lawall
     
  • This gets rid of a warning caused by the test in rcu_assign_pointer.
    I tried to fix rcu_assign_pointer, but that devolved into a long set
    of discussions about doing it right that came to no real solution.
    Since the test in rcu_assign_pointer for constant NULL would never
    succeed in fib_trie, just open code instead.

    Signed-off-by: Stephen Hemminger
    Acked-by: Paul E. McKenney
    Signed-off-by: David S. Miller

    Stephen Hemminger
     
  • While testing the virtio-net driver on KVM with TSO I noticed
    that TSO performance with a 1500 MTU is significantly worse
    compared to the performance of non-TSO with a 16436 MTU. The
    packet dump shows that most of the packets sent are smaller
    than a page.

    Looking at the code this actually is quite obvious as it always
    stops extending the packet if it's the first packet yet to be
    sent and if it's larger than the MSS. Since each extension is
    bound by the page size, this means that (given a 1500 MTU) we're
    very unlikely to construct packets greater than a page, provided
    that the receiver and the path are fast enough so that packets can
    always be sent immediately.

    The fix is also quite obvious. The push calls inside the loop
    are just an optimisation so that we don't end up doing all the
    sending at the end of the loop. Therefore there is no specific
    reason why it has to do so at MSS boundaries. For TSO, the
    most natural extension of this optimisation is to do the pushing
    once the skb exceeds the TSO size goal.

    This is what the patch does and testing with KVM shows that the
    TSO performance with a 1500 MTU easily surpasses that of a 16436
    MTU and indeed the packet sizes sent are generally larger than
    16436.

    I don't see any obvious downsides for slower peers or connections,
    but it would be prudent to test this extensively to ensure that
    those cases don't regress.

    Signed-off-by: Herbert Xu
    Signed-off-by: David S. Miller

    Herbert Xu
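
The effect of moving the push threshold can be illustrated with a toy model. This is a rough sketch, not the tcp_sendmsg() code: `fill_packet` is a hypothetical helper, and the 4096-byte page, the 1448-byte MSS and the 65536-byte size goal are assumed values chosen for the example.

```c
#include <assert.h>

/* Grow one packet in page-sized steps until it reaches `push_at` bytes
 * or the available data runs out; return the resulting packet length. */
static unsigned int fill_packet(unsigned int avail, unsigned int page,
                                unsigned int push_at)
{
    unsigned int len = 0;

    while (avail > 0 && len < push_at) {
        unsigned int chunk = avail < page ? avail : page;

        len += chunk;
        avail -= chunk;
    }
    return len;
}

void tso_push_example(void)
{
    /* Pushing at the MSS: the first page already exceeds it, so the
     * packet never grows beyond one page. */
    assert(fill_packet(200000, 4096, 1448) == 4096);
    /* Pushing at an assumed TSO size goal: the packet keeps growing. */
    assert(fill_packet(200000, 4096, 65536) == 65536);
}
```

In this model the packet size is set entirely by where the push happens, which matches the observation that a 1500 MTU kept packets at about a page until the threshold was raised.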
     

22 Mar, 2008

3 commits

  • This is a narrow pedantry :) but the dlci_ioctl_hook check and call
    should not be parted with the mutex lock.

    Signed-off-by: Pavel Emelyanov
    Signed-off-by: David S. Miller

    Pavel Emelyanov
     
  • Introduced by 270637abff0cdf848b910b9f96ad342e1da61c66
    ("[SCTP]: Fix a race between module load and protosw access")

    Reported by Gabriel C:

    In file included from net/sctp/sm_statetable.c:50:
    include/net/sctp/sctp.h: In function 'sctp_v6_pf_init':
    include/net/sctp/sctp.h:392: warning: 'return' with a value, in function returning void
    In file included from net/sctp/sm_statefuns.c:62:
    include/net/sctp/sctp.h: In function 'sctp_v6_pf_init':
    include/net/sctp/sctp.h:392: warning: 'return' with a value, in function returning void
    ...

    Signed-off-by: David S. Miller

    David S. Miller
     
  • Been seeing occasional panics in my testing of 2.6.25-rc in ip_defrag.
    Offending line in ip_defrag is here:

    net = skb->dev->nd_net

    where dev is NULL. Bisected the problem down to commit
    ac18e7509e7df327e30d6e073a787d922eaf211d ([NETNS][FRAGS]: Make the
    inet_frag_queue lookup work in namespaces).

    Below patch (idea from Patrick McHardy) fixes the problem for me.

    Signed-off-by: Phil Oester
    Acked-by: Patrick McHardy
    Signed-off-by: David S. Miller

    Phil Oester
     

21 Mar, 2008

12 commits

  • [ 10.536424] =======================================================
    [ 10.536424] [ INFO: possible circular locking dependency detected ]
    [ 10.536424] 2.6.25-rc3-devel #3
    [ 10.536424] -------------------------------------------------------
    [ 10.536424] swapper/0 is trying to acquire lock:
    [ 10.536424] (&dev->queue_lock){-+..}, at: []
    dev_queue_xmit+0x175/0x2f3
    [ 10.536424]
    [ 10.536424] but task is already holding lock:
    [ 10.536424] (&p->tcfc_lock){-+..}, at: [] tcf_mirred+0x20/0x178
    [act_mirred]
    [ 10.536424]
    [ 10.536424] which lock already depends on the new lock.

    lockdep warns of locking order while using ifb with sch_ingress and
    act_mirred: ingress_lock, tcfc_lock, queue_lock (usually queue_lock
    is at the beginning). This patch is only to tell lockdep that ifb is
    a different device (e.g. from eth) and has its own pair of queue
    locks. (This warning is a false-positive in common scenario of using
    ifb; yet there are possible situations, when this order could be
    dangerous; lockdep should warn in such a case.) (With suggestions by
    David S. Miller)

    Reported-and-tested-by: Denys Fedoryshchenko
    Signed-off-by: Jarek Poplawski
    Acked-by: Jamal Hadi Salim
    Signed-off-by: David S. Miller

    Jarek Poplawski
     
  • Based on notice from "Colin".

    Signed-off-by: YOSHIFUJI Hideaki
    Signed-off-by: David S. Miller

    YOSHIFUJI Hideaki
     
  • When selecting a new window, tcp_select_window() tries not to shrink
    the offered window by using the maximum of the remaining offered window
    size and the newly calculated window size. The newly calculated window
    size is always a multiple of the window scaling factor, the remaining
    window size however might not be since it depends on rcv_wup/rcv_nxt.
    This means we're effectively shrinking the window when scaling it down.

    The dump below shows the problem (scaling factor 2^7):

    - Window size of 557 (71296) is advertised, up to 3111907257:

    IP 172.2.2.3.33000 > 172.2.2.2.33000: . ack 3111835961 win 557

    - New window size of 514 (65792) is advertised, up to 3111907217, 40 bytes
    below the last end:

    IP 172.2.2.3.33000 > 172.2.2.2.33000: . 3113575668:3113577116(1448) ack 3111841425 win 514

    The number 40 results from downscaling the remaining window:

    3111907257 - 3111841425 = 65832
    65832 / 2^7 = 514
    65832 % 2^7 = 40

    If the sender uses up the entire window before it is shrunk, this can have
    chaotic effects on the connection. When sending ACKs, tcp_acceptable_seq()
    will notice that the window has been shrunk since tcp_wnd_end() is before
    tp->snd_nxt, which makes it choose tcp_wnd_end() as sequence number.
    This will fail the receiver's checks in tcp_sequence() however since it
    is before its tp->rcv_wup, making it respond with a dupack.

    If both sides are in this condition, this leads to a constant flood of
    ACKs until the connection times out.

    Make sure the window is never shrunk by aligning the remaining window to
    the window scaling factor.

    Signed-off-by: Patrick McHardy
    Signed-off-by: David S. Miller

    Patrick McHardy
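
The arithmetic in this commit can be replayed with the numbers from the dump. The sketch below is a userspace illustration: `align_up`, `wire_window` and `window_end` are hypothetical helper names, and rounding the remaining window up to the scaling factor is the approach the message describes, not a copy of the tcp_select_window() code.

```c
#include <assert.h>
#include <stdint.h>

/* Round x up to a multiple of a, where a is a power of two. */
static uint32_t align_up(uint32_t x, uint32_t a)
{
    return (x + a - 1) & ~(a - 1);
}

/* Value carried in the TCP window field: the byte count scaled down. */
static uint32_t wire_window(uint32_t bytes, unsigned int wscale)
{
    return bytes >> wscale;
}

/* Right edge of the window as the peer reconstructs it. */
static uint32_t window_end(uint32_t ack, uint32_t wire, unsigned int wscale)
{
    return ack + (wire << wscale);
}

void window_example(void)
{
    const unsigned int wscale = 7;           /* scaling factor 2^7 */
    const uint32_t old_end = 3111907257u;    /* previously advertised edge */
    const uint32_t ack = 3111841425u;
    const uint32_t remaining = old_end - ack;

    assert(remaining == 65832u);

    /* Without alignment: 40 bytes are lost when scaling down. */
    assert(wire_window(remaining, wscale) == 514u);
    assert(window_end(ack, 514u, wscale) == old_end - 40u);

    /* With alignment: the advertised edge never moves backwards. */
    assert(align_up(remaining, 1u << wscale) == 65920u);
    assert(wire_window(65920u, wscale) == 515u);
    assert(window_end(ack, 515u, wscale) >= old_end);
}
```

Rounding up advertises at most one scaling unit (here 128 bytes) more than the remaining window, in exchange for never shrinking it.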
     
  • zap_completion_queue() retrieves skbs from completion_queue where they have
    zero skb->users counter. The counter must be non-zero before
    dev_kfree_skb_any() is called, so it is now incremented first.

    Reported-and-tested-by: Andrew Morton
    Signed-off-by: Jarek Poplawski
    Signed-off-by: Andrew Morton
    Signed-off-by: David S. Miller

    Jarek Poplawski
     
  • In br_fdb_cleanup() next_timer and this_timer are in jiffies, so they
    should be compared using the time_after() macro.

    Signed-off-by: Fabio Checconi
    Signed-off-by: Stephen Hemminger
    Signed-off-by: David S. Miller

    Fabio Checconi
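
Why a plain comparison of jiffies values is unsafe can be shown with a wraparound example. This is a userspace sketch: `after` is a hypothetical helper that uses the same signed-difference idea as the kernel's time_after() macro, and the counter values are invented for the illustration.

```c
#include <assert.h>
#include <limits.h>

/* True if tick count a is later than b, even across a counter wrap.
 * The unsigned difference is reinterpreted as signed, so this holds as
 * long as the two values are less than half the counter range apart. */
static int after(unsigned long a, unsigned long b)
{
    return (long)(b - a) < 0;
}

void jiffies_example(void)
{
    unsigned long this_timer = ULONG_MAX - 5;   /* just before the wrap */
    unsigned long next_timer = this_timer + 10; /* 10 ticks later, wrapped */

    assert(next_timer == 4);
    /* A plain comparison claims the later timer is earlier. */
    assert(!(next_timer > this_timer));
    /* The signed-difference comparison orders them correctly. */
    assert(after(next_timer, this_timer));
}
```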
     
  • Sparc MAC address support should be protected consistently
    with CONFIG_SPARC, but there was a stray CONFIG_SPARC64
    case.

    Bump driver version and release date.

    Reported by Andrew Morton.

    Signed-off-by: David S. Miller

    David S. Miller
     
  • Signed-off-by: Pavel Machek
    Signed-off-by: David S. Miller

    Pavel Machek
     
  • From: Pavel Emelyanov

    This patch is based on the one from Thomas.

    The kauditd_thread() calls the netlink_unicast() and passes
    the audit_pid to it. The audit_pid, in turn, is received from
    the user space and the tool (I've checked the audit v1.6.9)
    uses getpid() to pass one in the kernel. Besides, this tool
    doesn't bind the netlink socket to this id, but simply creates
    it allowing the kernel to auto-bind one.

    That's the preamble.

    The problem is that netlink_autobind() _does_not_ guarantee
    that the socket will be auto-bound to the current pid. Instead
    it uses the current pid as a hint to start looking for a free
    id. So, in case of conflict, the audit messages can be sent
    to a wrong socket. This can happen (it's unlikely, but can be)
    in case some task opens more than one netlink socket and then
    the audit one starts - in this case the audit's pid can be busy
    and its socket will be bound to another id.

    The proposal is to introduce an audit_nlk_pid in audit subsys,
    that will point to the netlink socket to send packets to. It
    will most often be equal to audit_pid. The socket id can be
    got from the skb's netlink CB right in the audit_receive_msg.
    The audit_nlk_pid reset to 0 is not required, since all the
    decisions are taken based on audit_pid value only.

    Later, if the audit tools will bind the socket themselves, the
    kernel will have to provide a way to setup the audit_nlk_pid
    as well.

    A good side effect of this patch is that audit_pid can later
    be converted to struct pid, as it is no longer safe to use
    pid_t-s in the presence of pid namespaces. But audit code still
    uses the tgid from task_struct in the audit_signal_info and in
    the audit_filter_syscall.

    Signed-off-by: Thomas Graf
    Signed-off-by: Pavel Emelyanov
    Acked-by: Eric Paris
    Signed-off-by: David S. Miller

    Pavel Emelyanov
     
  • commit e9720ac ([NET]: Make /proc/net a symlink on /proc/self/net (v3))
    broke ganglia and probably other applications that read /proc/net/dev.

    This is due to the change of permissions of /proc/net that was
    introduced in that commit.

    Before: dr-xr-xr-x 5 root root 0 Mar 19 11:30 /proc/net
    After: dr-xr--r-- 5 root root 0 Mar 19 11:29 /proc/self/net

    This patch restores the permissions to the old value which makes
    ganglia happy again.

    Pavel Emelyanov says:

    This also broke the postfix, as it was reported in bug #10286
    and described in detail by Benjamin.

    Signed-off-by: Andre Noll
    Acked-by: Pavel Emelyanov
    Signed-off-by: David S. Miller

    Andre Noll
     
  • There is a race in SCTP between the loading of the module
    and the access by the socket layer to the protocol functions.
    In particular, a list of addresses that SCTP maintains is
    not initialized prior to the registration with the protosw.
    Thus it is possible for a user application to gain access
    to SCTP functions before everything has been initialized.
    The problem shows up as odd crashes during connection
    initialization when we try to access the SCTP address list.

    The solution is to refactor how we do registration and
    initialize the lists prior to registering with the protosw.
    Care must be taken since the address list initialization
    depends on some other pieces of SCTP initialization. Also
    the clean-up in case of failure now also needs to be refactored.

    Signed-off-by: Vlad Yasevich
    Acked-by: Sridhar Samudrala
    Signed-off-by: David S. Miller

    Vlad Yasevich
     
  • If a rule using ipt_recent is created with a hit count greater than
    ip_pkt_list_tot, the rule will never match as it cannot keep track
    of enough timestamps. This patch makes ipt_recent refuse to create such
    rules.

    With ip_pkt_list_tot's default value of 20, the following can be used
    to reproduce the problem.

    nc -u -l 0.0.0.0 1234 &
    for i in `seq 1 100`; do echo $i | nc -w 1 -u 127.0.0.1 1234; done

    This limits it to 20 packets:
    iptables -A OUTPUT -p udp --dport 1234 -m recent --set --name test \
    --rsource
    iptables -A OUTPUT -p udp --dport 1234 -m recent --update --seconds \
    60 --hitcount 20 --name test --rsource -j DROP

    While this is unlimited:
    iptables -A OUTPUT -p udp --dport 1234 -m recent --set --name test \
    --rsource
    iptables -A OUTPUT -p udp --dport 1234 -m recent --update --seconds \
    60 --hitcount 21 --name test --rsource -j DROP

    With the patch the second rule-set will throw an EINVAL.

    Reported-by: Sean Kennedy
    Signed-off-by: Daniel Hokka Zakrisson
    Signed-off-by: Patrick McHardy
    Signed-off-by: David S. Miller

    Daniel Hokka Zakrisson
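
The validation this commit adds amounts to a bounds check at rule-creation time. The sketch below assumes a hypothetical helper, `check_hit_count`, and a plain global for the limit; the default of 20 and the EINVAL result are taken from the commit message, everything else is illustrative rather than the ipt_recent source.

```c
#include <assert.h>
#include <errno.h>

/* Number of timestamps remembered per address (default from the commit). */
static unsigned int ip_pkt_list_tot = 20;

/* Hypothetical rule check: a hit count above the number of stored
 * timestamps can never be reached, so refuse to create such a rule. */
static int check_hit_count(unsigned int hit_count)
{
    if (hit_count > ip_pkt_list_tot)
        return -EINVAL;
    return 0;
}

void hitcount_example(void)
{
    assert(check_hit_count(20) == 0);        /* --hitcount 20 is accepted */
    assert(check_hit_count(21) == -EINVAL);  /* --hitcount 21 is refused  */
}
```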
     
  • logical-bitwise & confusion

    Signed-off-by: Roel Kluin
    Signed-off-by: Patrick McHardy
    Signed-off-by: David S. Miller

    Roel Kluin
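
The commit message does not say which expression was fixed, so the following is only a generic illustration of this class of bug. The flag names and both helpers are hypothetical.

```c
#include <assert.h>

#define FLAG_A 0x1u
#define FLAG_B 0x2u

/* Buggy: logical AND only asks "are both operands non-zero?", so any
 * non-zero flags value appears to have FLAG_B set. */
static int has_flag_b_buggy(unsigned int flags)
{
    return flags && FLAG_B;
}

/* Intended: bitwise AND tests the specific bit. */
static int has_flag_b(unsigned int flags)
{
    return (flags & FLAG_B) != 0;
}

void flag_example(void)
{
    assert(has_flag_b_buggy(FLAG_A) == 1);  /* wrong: FLAG_B is not set */
    assert(has_flag_b(FLAG_A) == 0);
    assert(has_flag_b(FLAG_A | FLAG_B) == 1);
}
```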
     

19 Mar, 2008

1 commit


18 Mar, 2008

13 commits


17 Mar, 2008

5 commits