21 Sep, 2007

3 commits

  • We upgraded the kernel of an NFS server from 2.6.17.11 to 2.6.22.6. Since
    then we get the message

    lockd: too many open TCP sockets, consider increasing the number of nfsd threads
    lockd: last TCP connect from ^\\236^\É^D

    These random characters in the second line are caused by a bug in
    svc_tcp_accept.

    (Note: there are two previous __svc_print_addr(sin, buf, sizeof(buf))
    calls in this function, either of which would initialize buf correctly;
    but both are inside "if"'s and are not necessarily executed. This is
    less obvious in the second case, which is inside a dprintk(), which is a
    macro which expands to an if statement.)

    Signed-off-by: Wolfgang Walter
    Signed-off-by: J. Bruce Fields
    Signed-off-by: Linus Torvalds

    Wolfgang Walter
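
The pattern behind this fix can be sketched in user-space C. This is a minimal illustration, not the kernel code: `debug_enabled`, `format_addr` and `warn_last_connect` are hypothetical names, and the address format is an assumption. The point is that the buffer is filled in unconditionally just before it is printed, instead of relying on an earlier conditional having run.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for a debug switch; not a kernel symbol. */
static int debug_enabled = 0;

/* Hypothetical helper: format an IPv4 address and port into buf. */
static void format_addr(const unsigned char ip[4], unsigned port,
                        char *buf, size_t len)
{
    snprintf(buf, len, "%u.%u.%u.%u, port=%u",
             ip[0], ip[1], ip[2], ip[3], port);
}

/*
 * Build the warning text. The buffer is formatted unconditionally right
 * before use, so the result no longer depends on the debug branch.
 */
static void warn_last_connect(const unsigned char ip[4], unsigned port,
                              char *out, size_t outlen)
{
    char buf[64];

    assert(out != NULL && outlen > 0);

    if (debug_enabled) {
        /* In the buggy shape, branches like this were the only
         * places where buf was ever filled in. */
        format_addr(ip, port, buf, sizeof(buf));
        fprintf(stderr, "debug: connect from %s\n", buf);
    }

    format_addr(ip, port, buf, sizeof(buf)); /* always initialize */
    snprintf(out, outlen, "last TCP connect from %s", buf);
}
```

Calling `warn_last_connect` with `debug_enabled` set to 0 still yields a readable address, which is exactly the case that printed random bytes before the fix.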
     
  • Acked-by: Patrick McHardy
    Signed-off-by: David S. Miller

    Alexey Kuznetsov
     
  • The following patch fixes the handling of netlink packets containing
    multiple messages.

    As exposed during the netfilter workshop, nfnetlink_log was overwriting the
    message type of the last message (setting it to MSG_DONE) in a multipart
    packet. As a consequence, libnfnetlink ignored the last message in the
    packet.

    The following patch adds a supplementary message (with type MSG_DONE) at
    the end of the netlink skb.

    Signed-off-by: Eric Leblond
    Signed-off-by: Patrick McHardy
    Signed-off-by: David S. Miller

    Eric Leblond
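
The difference between overwriting the last message and appending a terminator can be shown with a toy model. This is only a sketch under stated assumptions: `toy_msg`, `toy_batch`, the type values and all function names are hypothetical and do not correspond to the real netlink structures.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical message types, chosen only for this illustration. */
enum { TOY_MSG_LOG = 100, TOY_MSG_DONE = 3 };

#define TOY_BATCH_MAX 8

struct toy_msg {
    int type;
    int payload;
};

struct toy_batch {
    struct toy_msg msgs[TOY_BATCH_MAX];
    size_t count;
};

/* Buggy shape: the last real message is turned into the terminator. */
static void finish_by_overwrite(struct toy_batch *b)
{
    assert(b->count > 0);
    b->msgs[b->count - 1].type = TOY_MSG_DONE;
}

/* Fixed shape: a separate terminator message is appended. */
static int finish_by_append(struct toy_batch *b)
{
    if (b->count >= TOY_BATCH_MAX)
        return -1; /* no room left for the terminator */
    b->msgs[b->count].type = TOY_MSG_DONE;
    b->msgs[b->count].payload = 0;
    b->count++;
    return 0;
}

/* Receiver side: count the messages seen before the terminator. */
static size_t count_delivered(const struct toy_batch *b)
{
    size_t i, n = 0;

    for (i = 0; i < b->count; i++) {
        if (b->msgs[i].type == TOY_MSG_DONE)
            break;
        n++;
    }
    return n;
}
```

With three log messages queued, the overwrite variant delivers only two of them to a receiver that stops at the terminator, while the append variant delivers all three.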
     

17 Sep, 2007

9 commits

  • * 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6:
    [VLAN]: Fix net_device leak.
    [PPP] generic: Fix receive path data clobbering & non-linear handling
    [PPP] generic: Call skb_cow_head before scribbling over skb
    [NET] skbuff: Add skb_cow_head
    [BRIDGE]: Kill clone argument to br_flood_*
    [PPP] pppoe: Fill in header directly in __pppoe_xmit
    [PPP] pppoe: Fix data clobbering in __pppoe_xmit and return value
    [PPP] pppoe: Fix skb_unshare_check call position
    [SCTP]: Convert bind_addr_list locking to RCU
    [SCTP]: Add RCU synchronization around sctp_localaddr_list
    [PKT_SCHED]: sch_cbq.c: Shut up uninitialized variable warning
    [PKTGEN]: srcmac fix
    [IPV6]: Fix source address selection.
    [IPV4]: Just increment OutDatagrams once per a datagram.
    [IPV6]: Just increment OutDatagrams once per a datagram.
    [IPV6]: Fix unbalanced socket reference with MSG_CONFIRM.
    [NET_SCHED] protect action config/dump from irqs
    [NET]: Fix two issues wrt. SO_BINDTODEVICE.

    Linus Torvalds
     
  • In "[VLAN]: Move device registation to seperate function" (commit
    e89fe42cd03c8fd3686df82d8390a235717a66de), a pile of code got moved
    to register_vlan_dev(), including grabbing a reference to underlying
    device. However, original dev_hold() had been left behind, so we
    leak a reference to net_device now...

    Signed-off-by: Al Viro
    Signed-off-by: David S. Miller

    Al Viro
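
A reference leak of this kind can be reproduced with a toy counter. The sketch below assumes nothing about the real VLAN code: `toy_dev` and every function name are hypothetical, and the counter is a plain integer rather than a kernel reference count.

```c
#include <assert.h>

/* Hypothetical device with a plain integer reference count. */
struct toy_dev {
    int refcnt;
};

static void toy_hold(struct toy_dev *d)
{
    d->refcnt++;
}

static void toy_put(struct toy_dev *d)
{
    assert(d->refcnt > 0);
    d->refcnt--;
}

/* After the refactoring, registration takes the reference itself. */
static void toy_register_vlan(struct toy_dev *real)
{
    toy_hold(real);
}

/* Teardown drops exactly one reference. */
static void toy_unregister_vlan(struct toy_dev *real)
{
    toy_put(real);
}

/* Leaky caller: the original hold was left behind. */
static void create_vlan_leaky(struct toy_dev *real)
{
    toy_hold(real);          /* stale hold, never balanced */
    toy_register_vlan(real);
}

/* Fixed caller: only the registration path holds the device. */
static void create_vlan_fixed(struct toy_dev *real)
{
    toy_register_vlan(real);
}
```

After one create and one unregister, the leaky variant leaves the count one higher than it started, so the underlying device can never be released.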
     
  • This patch adds an optimised version of skb_cow that avoids the copy if
    the header can be modified even if the rest of the payload is cloned.

    This can be used in encapsulating paths where we only need to modify the
    header. As it is, this can be used in PPPOE and bridging.

    Signed-off-by: Herbert Xu
    Signed-off-by: David S. Miller

    Herbert Xu
     
  • The clone argument is only used by one caller and that caller can clone
    the packet itself. This patch moves the clone call into the caller and
    kills the clone argument.

    Signed-off-by: Herbert Xu
    Signed-off-by: David S. Miller

    Herbert Xu
     
  • Since the sctp_sockaddr_entry is now RCU enabled as part of
    the patch to synchronize sctp_localaddr_list, it makes sense to
    change all handling of these entries to RCU. This includes the
    sctp_bind_addrs structure and its list of bound addresses.

    This list is currently protected by an external rw_lock and that
    looks like overkill. There are only 2 writers to the list:
    bind()/bindx() calls, and BH processing of ASCONF-ACK chunks.
    These are already serialized via the socket lock, so they will
    not step on each other. These are also relatively rare, so we
    should be good with RCU.

    The readers are varied and they are easily converted to RCU.

    Signed-off-by: Vlad Yasevich
    Acked-by: Paul E. McKenney
    Acked-by: Sridhar Samdurala
    Signed-off-by: David S. Miller

    Vlad Yasevich
     
  • sctp_localaddr_list is modified dynamically via NETDEV_UP
    and NETDEV_DOWN events, but there is no synchronization
    between the writer (event handler) and readers. As a result,
    the readers can access an entry that has been freed and
    crash the system.

    Signed-off-by: Vlad Yasevich
    Acked-by: Paul E. McKenney
    Acked-by: Sridhar Samdurala
    Signed-off-by: David S. Miller

    Vlad Yasevich
     
  • net/sched/sch_cbq.c: In function 'cbq_enqueue':
    net/sched/sch_cbq.c:383: warning: 'ret' may be used uninitialized in this function

    has been verified to be a bogus case. So let's shut it up.

    Signed-off-by: Satyam Sharma
    Acked-by: Patrick McHardy
    Signed-off-by: Andrew Morton
    Signed-off-by: David S. Miller

    Satyam Sharma
     
  • From: Adit Ranadive

    Signed-off-by: Andrew Morton
    Signed-off-by: David S. Miller

    Adit Ranadive
     
  • The commit 95c385 broke proper source address selection for cases in which
    there is an address which is marked 'deprecated'. The commit mistakenly
    changed ifa->flags to ifa_result->flags (probably copy/paste error from a
    few lines above) in the 'Rule 3' address selection code.

    The patch restores the previous RFC-compliant behavior.

    Signed-off-by: Jiri Kosina
    Signed-off-by: David S. Miller

    Jiri Kosina
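
The kind of mix-up described here is easy to see in isolation. The following is a minimal sketch of a "prefer non-deprecated" comparison only, assuming a hypothetical flag value and hypothetical names (`toy_addr`, `rule3_prefers`, `select_source`); the real selection code applies several other rules as well.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical flag bit marking an address as deprecated. */
#define TOY_F_DEPRECATED 0x20u

struct toy_addr {
    const char *name;
    unsigned flags;
};

/*
 * Return nonzero when the candidate should replace the current best:
 * a non-deprecated candidate beats a deprecated best.
 */
static int rule3_prefers(const struct toy_addr *best,
                         const struct toy_addr *cand)
{
    int best_ok = !(best->flags & TOY_F_DEPRECATED);
    /* The bug class described above: testing best->flags here too,
     * so the candidate's own state is never examined. */
    int cand_ok = !(cand->flags & TOY_F_DEPRECATED);

    return cand_ok > best_ok;
}

/* Walk the list and keep the address preferred by the rule above. */
static const struct toy_addr *select_source(const struct toy_addr *addrs,
                                            size_t n)
{
    const struct toy_addr *best;
    size_t i;

    assert(addrs != NULL && n > 0);
    best = &addrs[0];
    for (i = 1; i < n; i++) {
        if (rule3_prefers(best, &addrs[i]))
            best = &addrs[i];
    }
    return best;
}
```

Given one deprecated and one preferred address, `select_source` returns the preferred one regardless of list order.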
     

15 Sep, 2007

6 commits

  • Signed-off-by: YOSHIFUJI Hideaki
    Signed-off-by: David S. Miller

    YOSHIFUJI Hideaki
     
  • Signed-off-by: YOSHIFUJI Hideaki
    Signed-off-by: David S. Miller

    YOSHIFUJI Hideaki
     
  • Signed-off-by: YOSHIFUJI Hideaki
    Signed-off-by: David S. Miller

    YOSHIFUJI Hideaki
     
  • (with no apologies to C Heston)

    On Mon, 2007-10-09 at 21:00 +0800, Herbert Xu wrote:
    On Sun, Sep 02, 2007 at 01:11:29PM +0000, Christian Kujau wrote:
    > >
    > > after upgrading to 2.6.23-rc5 (and applying davem's fix [0]), lockdep
    > > was quite noisy when I tried to shape my external (wireless) interface:
    > >
    > > [ 6400.534545] FahCore_78.exe/3552 just changed the state of lock:
    > > [ 6400.534713] (&dev->ingress_lock){-+..}, at: []
    > > netif_receive_skb+0x2d5/0x3c0
    > > [ 6400.534941] but this lock took another, soft-read-irq-unsafe lock in the
    > > past:
    > > [ 6400.535145] (police_lock){-.--}
    >
    > This is a genuine dead-lock. The police lock can be taken
    > for reading with softirqs on. If a second CPU tries to take
    > the police lock for writing, while holding the ingress lock,
    > then a softirq on the first CPU can dead-lock when it tries
    > to get the ingress lock.

    Signed-off-by: Jamal Hadi Salim
    Acked-by: Herbert Xu
    Signed-off-by: David S. Miller

    Jamal Hadi Salim
     
  • 1) Comments suggest that setting optlen to zero will unbind
    the socket from whatever device it might be attached to. This
    hasn't been the case since at least 2.2.x because the first thing
    this function does is return -EINVAL if 'optlen' is less than
    sizeof(int).

    This check also means that passing in a two byte string doesn't
    work so well. It's almost as if this code was testing with "eth?"
    patterned strings and nothing else :-)

    Fix this by breaking the logic of this facility out into a
    separate function which validates optlen more appropriately.

    The optlen==0 and small string cases now work properly.

    2) We should reset the cached route of the socket after we have made
    the device binding changes, not before.

    Reported by Ben Greear.

    Signed-off-by: David S. Miller

    David S. Miller
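
Point 1 can be sketched as a stand-alone validation routine. This is an illustration under assumptions, not the kernel function: `toy_bindtodevice`, `toy_ifindex`, the interface table and the `-1` error value are all hypothetical, and `TOY_IFNAMSIZ` merely mirrors the usual interface name limit.

```c
#include <assert.h>
#include <string.h>

#define TOY_IFNAMSIZ 16 /* assumed name buffer size for this sketch */

/* Hypothetical lookup: interface index for a name, or 0 if unknown. */
static int toy_ifindex(const char *name)
{
    if (strcmp(name, "lo") == 0)
        return 1;
    if (strcmp(name, "eth0") == 0)
        return 2;
    return 0;
}

/*
 * Validate optlen on its own terms instead of demanding sizeof(int):
 * zero length (or an empty name) unbinds, short names are accepted,
 * overlong input is truncated to fit the name buffer.
 * Returns 0 on success and stores the index (0 means unbound).
 */
static int toy_bindtodevice(const char *optval, int optlen, int *bound)
{
    char devname[TOY_IFNAMSIZ];
    int index = 0;

    assert(bound != NULL);

    if (optlen < 0)
        return -1;
    if (optlen > TOY_IFNAMSIZ - 1)
        optlen = TOY_IFNAMSIZ - 1;

    memset(devname, 0, sizeof(devname));
    if (optlen > 0)
        memcpy(devname, optval, (size_t)optlen);

    if (devname[0] != '\0') {
        index = toy_ifindex(devname);
        if (index == 0)
            return -1; /* no such device */
    }

    *bound = index;
    return 0;
}
```

A two-byte name such as "lo" now binds, and a zero-length option unbinds, which are the two cases the old `optlen < sizeof(int)` check rejected.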
     
  • Commit aaf68cfbf2241d24d46583423f6bff5c47e088b3 added a bias
    to sk_inuse, so this test for an unused socket now fails. So no
    sockets get closed because they are old (they might get closed
    if the client closed them).

    This bug has existed since 2.6.21-rc1.

    Thanks to Wolfgang Walter for finding and reporting the bug.

    Cc: Wolfgang Walter
    Signed-off-by: Neil Brown
    Signed-off-by: J. Bruce Fields
    Signed-off-by: Linus Torvalds

    Neil Brown
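
Why a bias breaks an "equals zero" idle test can be shown with a toy counter. The sketch assumes a bias of one, which the message above does not state, and uses hypothetical names (`toy_sock`, `is_idle_buggy`, `is_idle_fixed`).

```c
#include <assert.h>

/* Assumed bias: the count sits at 1 for as long as the socket exists. */
#define TOY_BIAS 1

struct toy_sock {
    int inuse; /* TOY_BIAS plus the number of active users */
};

/* Old test: can never be true once the bias is in place. */
static int is_idle_buggy(const struct toy_sock *s)
{
    return s->inuse == 0;
}

/* Adjusted test: idle means only the bias is left. */
static int is_idle_fixed(const struct toy_sock *s)
{
    assert(s->inuse >= TOY_BIAS);
    return s->inuse == TOY_BIAS;
}
```

With the old test an idle socket never looks idle, so an age-based sweep that depends on it closes nothing.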
     

12 Sep, 2007

1 commit


11 Sep, 2007

9 commits

  • netlink_run_queue() doesn't handle multiple processes processing the
    queue concurrently. Serialize queue processing in inet_diag to fix
    an oops in netlink_rcv_skb caused by netlink_run_queue passing a
    NULL for the skb.

    BUG: unable to handle kernel NULL pointer dereference at virtual address 00000054
    [349587.500454] printing eip:
    [349587.500457] c03318ae
    [349587.500459] *pde = 00000000
    [349587.500464] Oops: 0000 [#1]
    [349587.500466] PREEMPT SMP
    [349587.500474] Modules linked in: w83627hf hwmon_vid i2c_isa
    [349587.500483] CPU: 0
    [349587.500485] EIP: 0060:[] Not tainted VLI
    [349587.500487] EFLAGS: 00010246 (2.6.22.3 #1)
    [349587.500499] EIP is at netlink_rcv_skb+0xa/0x7e
    [349587.500506] eax: 00000000 ebx: 00000000 ecx: c148d2a0 edx: c0398819
    [349587.500510] esi: 00000000 edi: c0398819 ebp: c7a21c8c esp: c7a21c80
    [349587.500517] ds: 007b es: 007b fs: 00d8 gs: 0033 ss: 0068
    [349587.500521] Process oidentd (pid: 17943, ti=c7a20000 task=cee231c0 task.ti=c7a20000)
    [349587.500527] Stack: 00000000 c7a21cac f7c8ba78 c7a21ca4 c0331962 c0398819 f7c8ba00 0000004c
    [349587.500542] f736f000 c7a21cb4 c03988e3 00000001 f7c8ba00 c7a21cc4 c03312a5 0000004c
    [349587.500558] f7c8ba00 c7a21cd4 c0330681 f7c8ba00 e4695280 c7a21d00 c03307c6 7fffffff
    [349587.500578] Call Trace:
    [349587.500581] [] show_trace_log_lvl+0x1c/0x33
    [349587.500591] [] show_stack_log_lvl+0x8d/0xaa
    [349587.500595] [] show_registers+0x1cb/0x321
    [349587.500604] [] die+0x112/0x1e1
    [349587.500607] [] do_page_fault+0x229/0x565
    [349587.500618] [] error_code+0x72/0x78
    [349587.500625] [] netlink_run_queue+0x40/0x76
    [349587.500632] [] inet_diag_rcv+0x1f/0x2c
    [349587.500639] [] netlink_data_ready+0x57/0x59
    [349587.500643] [] netlink_sendskb+0x24/0x45
    [349587.500651] [] netlink_unicast+0x100/0x116
    [349587.500656] [] netlink_sendmsg+0x1c2/0x280
    [349587.500664] [] sock_sendmsg+0xba/0xd5
    [349587.500671] [] sys_sendmsg+0x17b/0x1e8
    [349587.500676] [] sys_socketcall+0x230/0x24d
    [349587.500684] [] syscall_call+0x7/0xb
    [349587.500691] =======================
    [349587.500693] Code: f0 ff 4e 18 0f 94 c0 84 c0 0f 84 66 ff ff ff 89 f0 e8 86 e2 fc ff e9 5a ff ff ff f0 ff 40 10 eb be 55 89 e5 57 89 d7 56 89 c6 53 50 54 83 fa 10 72 55 8b 9e 9c 00 00 00 31 c9 8b 03 83 f8 0f

    Reported by Athanasius

    Signed-off-by: Patrick McHardy
    Signed-off-by: David S. Miller

    Patrick McHardy
     
  • Some of skbs in sk->write_queue do not have skb->dst because
    we do not fill skb->dst when we allocate new skb in append_data().

    BTW, I think we may not need to (or we should not) increment some stats
    when using corking; if 100 sendmsg() (with MSG_MORE) result in 2 packets,
    how many should we increment?

    If 100, we should set skb->dst for every queued skb.

    If 1 (or 2 (*)), we increment the stats for the first queued skb and
    we should just skip incrementing OutDiscards for the rest of the queued skbs,
    and we should also implement these semantics in other places;
    e.g., we should increment other stats just once, not 100 times.

    *: depends on the place we are discarding the datagram.

    I guess we should just increment by 1 (or 2).

    Signed-off-by: YOSHIFUJI Hideaki
    Signed-off-by: David S. Miller

    YOSHIFUJI Hideaki
     
  • So I've had a deadlock reported to me. I've found that the sequence of
    events goes like this:

    1) process A (modprobe) runs to remove ip_tables.ko

    2) process B (iptables-restore) runs and calls setsockopt on a netfilter socket,
    increasing the ip_tables socket_ops use count

    3) process A acquires a file lock on the file ip_tables.ko, calls remove_module
    in the kernel, which in turn executes the ip_tables module cleanup routine,
    which calls nf_unregister_sockopt

    4) nf_unregister_sockopt, seeing that the use count is non-zero, puts the
    calling process into uninterruptible sleep, expecting the process using the
    socket option code to wake it up when it exits the kernel

    5) the user of the socket option code (process B), in do_ipt_get_ctl, calls
    ipt_find_table_lock, which in this case calls request_module to load
    ip_tables_nat.ko

    6) request_module forks a copy of modprobe (process C) to load the module and
    blocks until modprobe exits.

    7) Process C, forked by request_module, processes the dependencies of
    ip_tables_nat.ko, of which ip_tables.ko is one.

    8) Process C attempts to lock the requested module and all its dependencies; it
    blocks when it attempts to lock ip_tables.ko (which was previously locked in
    step 3)

    There's not really any great permanent solution to this that I can see, but I've
    developed a two-part solution that corrects the problem

    Part 1) Modifies the nf_sockopt registration code so that, instead of using a
    use counter internal to the nf_sockopt_ops structure, we instead use a pointer
    to the registering module's owner to do module reference counting when nf_sockopt
    calls a module's set/get routine. This prevents the deadlock by preventing step 4
    from happening.

    Part 2) Enhances the modprobe utility so that by default it performs non-blocking
    remove operations (the same way rmmod does), and adds an option to explicitly
    request blocking operation. So if you select blocking operation in modprobe you
    can still cause the above deadlock, but only if you explicitly try (and since
    root can do any old stupid thing it would like.... :) ).

    Signed-off-by: Neil Horman
    Signed-off-by: Patrick McHardy
    Signed-off-by: David S. Miller

    Neil Horman
     
  • Since we're now using a generic tuple decoding function in ICMP
    connection tracking, ipv4_get_l4proto() might get called with a
    fragmented packet from within an ICMP error. Remove the error
    message we used to print when this happens.

    Signed-off-by: Patrick McHardy
    Signed-off-by: David S. Miller

    Patrick McHardy
     
  • David S. Miller
     
  • From: Denis V. Lunev

    addrconf_dad_failure calls addrconf_dad_stop, which takes the referenced address
    and drops the count. So the in6_ifa_put performed at out: is extra. This
    results in the message "Freeing alive inet6 address" and unreleased dst entries.

    Signed-off-by: Denis V. Lunev
    Signed-off-by: Alexey Dobriyan
    Signed-off-by: David S. Miller

    Denis V. Lunev
     
  • Not all are listed, same as the IPV4 devinet bug.

    Signed-off-by: Patrick McHardy
    Signed-off-by: David S. Miller

    Patrick McHardy
     
  • Bug: http://bugzilla.kernel.org/show_bug.cgi?id=8876

    Not all IPs are shown by the "ip addr show" command when the number of IPs
    assigned to an interface is more than 60-80 (in fact it depends on the presence
    of broadcast/label etc. on each address).

    Steps to reproduce:
    It's terribly simple to reproduce:

    # for i in $(seq 1 100); do ip ad add 10.0.$i.1/24 dev eth10 ; done
    # ip addr show

    this will _not_ show all IPs.
    Looks like the problem is in netlink/ipv4 message processing.

    This is the fix from the bug submitter; it looks correct.

    Signed-off-by: Stephen Hemminger
    Signed-off-by: David S. Miller

    Stephen Hemminger
     
  • When msg_iovlen is zero we shouldn't try to dereference
    msg_iov. Right now the only thing that tries to do so
    is skb_copy_and_csum_datagram_iovec. Since the total
    length should also be zero if msg_iovlen is zero, it's
    sufficient to check the total length there and simply
    return if it's zero.

    Signed-off-by: Herbert Xu
    Signed-off-by: David S. Miller

    Herbert Xu
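
The early return described above can be sketched in user space. This is an illustrative model only: `toy_iovec` and `toy_copy_and_sum` are hypothetical, and the "checksum" is a plain byte sum rather than the real Internet checksum.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical scatter buffer descriptor for this sketch. */
struct toy_iovec {
    unsigned char *base;
    size_t len;
};

/*
 * Copy len bytes from src into the iov array while accumulating a
 * simple byte sum. Returns 0 on success, -1 if the buffers are too
 * small. A zero total length returns before iov is ever touched, so
 * iov may be NULL when iovlen is zero.
 */
static int toy_copy_and_sum(const unsigned char *src, size_t len,
                            struct toy_iovec *iov, size_t iovlen,
                            unsigned *sum)
{
    size_t i = 0;

    assert(sum != NULL);
    *sum = 0;

    if (len == 0)
        return 0; /* nothing to copy: do not dereference iov */

    while (len > 0) {
        size_t n, k;

        if (i >= iovlen)
            return -1;
        n = iov[i].len < len ? iov[i].len : len;
        for (k = 0; k < n; k++) {
            iov[i].base[k] = src[k];
            *sum += src[k];
        }
        src += n;
        len -= n;
        i++;
    }
    return 0;
}
```

Checking the total length is enough here because, as the message notes, the length is expected to be zero whenever there are no iovec entries.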
     

09 Sep, 2007

3 commits


01 Sep, 2007

1 commit


31 Aug, 2007

8 commits