03 Sep, 2012

1 commit

  • Pull networking fixes from David Miller:

    1) NLA_PUT* --> nla_put_* conversion got one case wrong in
    nfnetlink_log, fix from Patrick McHardy.

    2) Missed error return check in ipw2100 driver, from Julia Lawall.

    3) PMTU updates in ipv4 were setting the expiry time incorrectly, fix
    from Eric Dumazet.

    4) SFC driver erroneously reversed src and dst when reporting filters
    via ethtool.

    5) Memory leak in CAN protocol and wrong setting of IRQF_SHARED in
    sja1000 can platform driver, from Alexey Khoroshilov and Sven
    Schmitt.

    6) Fix multicast traffic scaling regression in ipv4_dst_destroy, only
    take the lock when we really need to. From Eric Dumazet.

    7) Fix non-root process spoofing in netlink, from Pablo Neira Ayuso.

    8) CWND reduction in TCP is done incorrectly during non-SACK recovery,
    fix from Yuchung Cheng.

    9) Revert netpoll change, and fix what was actually a driver specific
    problem. From Amerigo Wang. This should cure bootup hangs with
    netconsole some people reported.

    10) Fix xen-netfront invoking __skb_fill_page_desc() with a NULL page
    pointer. From Ian Campbell.

    11) SIP NAT fix for expectation creation, from Pablo Neira Ayuso.

    12) __ip_rt_update_pmtu() needs RCU locking, from Eric Dumazet.

    13) Fix usbnet deadlock on resume, can't use GFP_KERNEL in this
    situation. From Oliver Neukum.

    14) The davinci ethernet driver triggers an OOPS on removal because it
    frees an MDIO object before unregistering it. Fix from Bin Liu.

    * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (41 commits)
    net: qmi_wwan: add several new Gobi devices
    fddi: 64 bit bug in smt_add_para()
    net: ethernet: fix kernel OOPS when remove davinci_mdio module
    net/xfrm/xfrm_state.c: fix error return code
    net: ipv6: fix error return code
    net: qmi_wwan: new device: Foxconn/Novatel E396
    usbnet: fix deadlock in resume
    cs89x0 : packet reception not working
    netfilter: nf_conntrack: fix racy timer handling with reliable events
    bnx2x: Correct the ndo_poll_controller call
    bnx2x: Move netif_napi_add to the open call
    ipv4: must use rcu protection while calling fib_lookup
    bnx2x: fix 57840_MF pci id
    net: ipv4: ipmr_expire_timer causes crash when removing net namespace
    e1000e: DoS while TSO enabled caused by link partner with small MSS
    l2tp: avoid to use synchronize_rcu in tunnel free function
    gianfar: fix default tx vlan offload feature flag
    netfilter: nf_nat_sip: fix incorrect handling of EBUSY for RTCP expectation
    xen-netfront: use __pskb_pull_tail to ensure linear area is big enough on RX
    netfilter: nfnetlink_log: fix error return code in init path
    ...

    Linus Torvalds
     

01 Sep, 2012

3 commits

  • Initialize return variable before exiting on an error path.

    A simplified version of the semantic match that finds this problem is as
    follows: (http://coccinelle.lip6.fr/)

    // <smpl>
    (
    if@p1 (\(ret < 0\|ret != 0\))
     { ... return ret; }
    |
    ret@p1 = 0
    )
    ... when != ret = e1
        when != &ret
    *if(...)
    {
      ... when != ret = e2
          when forall
     return ret;
    }

    // </smpl>
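
To make the flagged pattern concrete, here is a minimal userspace sketch of the kind of error path the semantic match looks for. It is not taken from any of the patched files; `alloc_resource`, `setup_buggy` and `setup_fixed` are hypothetical names, and `-ENOMEM` is only an illustrative choice of error code.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical helper: returns NULL when 'fail' is non-zero. */
static void *alloc_resource(int fail)
{
    static int dummy;
    return fail ? NULL : &dummy;
}

/* Buggy shape: ret is still 0 when the allocation fails. */
static int setup_buggy(int fail)
{
    int ret = 0;
    void *res = alloc_resource(fail);

    if (!res)
        goto out;       /* error path, but ret was never set */
    /* ... use res ... */
out:
    return ret;         /* reports success even though setup failed */
}

/* Fixed shape: initialize ret before leaving on the error path. */
static int setup_fixed(int fail)
{
    int ret = 0;
    void *res = alloc_resource(fail);

    if (!res) {
        ret = -ENOMEM;
        goto out;
    }
    /* ... use res ... */
out:
    return ret;
}
```

With a failing allocation, `setup_buggy(1)` returns 0 while `setup_fixed(1)` returns a negative error code, which is the difference the patches in this series restore.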

    Signed-off-by: Julia Lawall
    Signed-off-by: David S. Miller

    Julia Lawall
     
  • Initialize return variable before exiting on an error path.

    The initial initialization of the return variable is also dropped, because
    that value is never used.

    A simplified version of the semantic match that finds this problem is as
    follows: (http://coccinelle.lip6.fr/)

    // <smpl>
    (
    if@p1 (\(ret < 0\|ret != 0\))
     { ... return ret; }
    |
    ret@p1 = 0
    )
    ... when != ret = e1
        when != &ret
    *if(...)
    {
      ... when != ret = e2
          when forall
     return ret;
    }

    // </smpl>

    Signed-off-by: Julia Lawall
    Signed-off-by: David S. Miller

    Julia Lawall
     
  • David S. Miller
     

31 Aug, 2012

5 commits

  • Existing code assumes that del_timer returns true for alive conntrack
    entries. However, this is not true if reliable events are enabled.
    In that case, del_timer may return true for entries that were
    just inserted in the dying list. Note that packets / ctnetlink may
    hold references to conntrack entries that were just inserted into
    that list.

    This patch fixes the issue by adding an independent timer for
    event delivery. This increases the size of the ecache extension.
    Still we can revisit this later and use variable size extensions
    to allocate this area on demand.

    Tested-by: Oliver Smith
    Signed-off-by: Pablo Neira Ayuso

    Pablo Neira Ayuso
     
  • The following lockdep splat was reported by Pavel Roskin:

    [ 1570.586223] ===============================
    [ 1570.586225] [ INFO: suspicious RCU usage. ]
    [ 1570.586228] 3.6.0-rc3-wl-main #98 Not tainted
    [ 1570.586229] -------------------------------
    [ 1570.586231] /home/proski/src/linux/net/ipv4/route.c:645 suspicious rcu_dereference_check() usage!
    [ 1570.586233]
    [ 1570.586233] other info that might help us debug this:
    [ 1570.586233]
    [ 1570.586236]
    [ 1570.586236] rcu_scheduler_active = 1, debug_locks = 0
    [ 1570.586238] 2 locks held by Chrome_IOThread/4467:
    [ 1570.586240] #0: (slock-AF_INET){+.-...}, at: [] release_sock+0x2c/0xa0
    [ 1570.586253] #1: (fnhe_lock){+.-...}, at: [] update_or_create_fnhe+0x2c/0x270
    [ 1570.586260]
    [ 1570.586260] stack backtrace:
    [ 1570.586263] Pid: 4467, comm: Chrome_IOThread Not tainted 3.6.0-rc3-wl-main #98
    [ 1570.586265] Call Trace:
    [ 1570.586271] [] lockdep_rcu_suspicious+0xfd/0x130
    [ 1570.586275] [] update_or_create_fnhe+0x15c/0x270
    [ 1570.586278] [] __ip_rt_update_pmtu+0x73/0xb0
    [ 1570.586282] [] ip_rt_update_pmtu+0x29/0x90
    [ 1570.586285] [] inet_csk_update_pmtu+0x2c/0x80
    [ 1570.586290] [] tcp_v4_mtu_reduced+0x2e/0xc0
    [ 1570.586293] [] tcp_release_cb+0xa4/0xb0
    [ 1570.586296] [] release_sock+0x55/0xa0
    [ 1570.586300] [] tcp_sendmsg+0x4af/0xf50
    [ 1570.586305] [] inet_sendmsg+0x120/0x230
    [ 1570.586308] [] ? inet_sk_rebuild_header+0x40/0x40
    [ 1570.586312] [] ? sock_update_classid+0xbd/0x3b0
    [ 1570.586315] [] ? sock_update_classid+0x130/0x3b0
    [ 1570.586320] [] do_sock_write+0xc5/0xe0
    [ 1570.586323] [] sock_aio_write+0x53/0x80
    [ 1570.586328] [] do_sync_write+0xa3/0xe0
    [ 1570.586332] [] vfs_write+0x165/0x180
    [ 1570.586335] [] sys_write+0x45/0x90
    [ 1570.586340] [] system_call_fastpath+0x16/0x1b

    Signed-off-by: Eric Dumazet
    Reported-by: Pavel Roskin
    Signed-off-by: David S. Miller

    Eric Dumazet
     
  • When tearing down a net namespace, ipv4 mr_table structures are freed
    without first deactivating their timers. This can result in a crash in
    run_timer_softirq.
    This patch mimics the corresponding behaviour in ipv6.
    Locking and synchronization seem to be adequate.
    We are about to kfree mrt, so existing code should already make sure that
    no other references to mrt are pending or can be created by incoming traffic.
    The functions invoked here do not cause new references to mrt or other
    race conditions to be created.
    Invoking del_timer_sync guarantees that ipmr_expire_timer is inactive.
    Both ipmr_expire_process (whose completion we may have to wait for in
    del_timer_sync) and mroute_clean_tables internally use mfc_unres_lock
    or other synchronization when needed, and they both only modify mrt.

    Tested in Linux 3.4.8.

    Signed-off-by: Francesco Ruggeri
    Signed-off-by: David S. Miller

    Francesco Ruggeri
     
  • Avoid using synchronize_rcu in l2tp_tunnel_free because the context may
    be atomic.

    Signed-off-by: Dmitry Kozlov
    Signed-off-by: David S. Miller

    xeb@mail.ru
     
  • We're hitting a bug while trying to reinsert an already existing
    expectation:

    kernel BUG at kernel/timer.c:895!
    invalid opcode: 0000 [#1] SMP
    [...]
    Call Trace:

    [] nf_ct_expect_related_report+0x4a0/0x57a [nf_conntrack]
    [] ? in4_pton+0x72/0x131
    [] ip_nat_sdp_media+0xeb/0x185 [nf_nat_sip]
    [] set_expected_rtp_rtcp+0x32d/0x39b [nf_conntrack_sip]
    [] process_sdp+0x30c/0x3ec [nf_conntrack_sip]
    [] ? irq_exit+0x9a/0x9c
    [] ? ip_nat_sdp_media+0x185/0x185 [nf_nat_sip]

    We have to remove the RTP expectation if the RTCP expectation hits EBUSY
    since we keep trying with other ports until we succeed.
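
The retry logic described above can be sketched in userspace as follows. This is a simplified model, not the nf_nat_sip code: `port_taken`, `expect_port`, `unexpect_port` and `reserve_rtp_rtcp` are hypothetical stand-ins for the expectation table and its helpers.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define MAX_PORT 65535

/* Hypothetical stand-in for the expectation table: true means taken. */
static bool port_taken[MAX_PORT + 1];

static int expect_port(unsigned int port)
{
    if (port_taken[port])
        return -EBUSY;
    port_taken[port] = true;
    return 0;
}

static void unexpect_port(unsigned int port)
{
    port_taken[port] = false;
}

/*
 * Try (port, port + 1) pairs starting at 'start', stepping by two.
 * Returns the RTP port that was reserved, or 0 if no pair was free.
 */
static unsigned int reserve_rtp_rtcp(unsigned int start)
{
    unsigned int port;

    for (port = start; port != 0 && port < MAX_PORT; port += 2) {
        if (expect_port(port) == -EBUSY)
            continue;
        if (expect_port(port + 1) == 0)
            return port;
        /* RTCP port busy: drop the RTP reservation before retrying. */
        unexpect_port(port);
    }
    return 0;
}
```

Without the `unexpect_port()` call, the RTP entry from the failed attempt would stay registered while the loop moves on to the next pair, which is the stale state the fix removes.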

    Reported-by: Rafal Fitt
    Signed-off-by: Pablo Neira Ayuso

    Pablo Neira Ayuso
     

30 Aug, 2012

4 commits

  • Initialize return variable before exiting on an error path.

    A simplified version of the semantic match that finds this problem is as
    follows: (http://coccinelle.lip6.fr/)

    // <smpl>
    (
    if@p1 (\(ret < 0\|ret != 0\))
     { ... return ret; }
    |
    ret@p1 = 0
    )
    ... when != ret = e1
        when != &ret
    *if(...)
    {
      ... when != ret = e2
          when forall
     return ret;
    }

    // </smpl>

    Signed-off-by: Julia Lawall
    Signed-off-by: Pablo Neira Ayuso

    Julia Lawall
     
  • Initialize return variable before exiting on an error path.

    A simplified version of the semantic match that finds this problem is as
    follows: (http://coccinelle.lip6.fr/)

    // <smpl>
    (
    if@p1 (\(ret < 0\|ret != 0\))
     { ... return ret; }
    |
    ret@p1 = 0
    )
    ... when != ret = e1
        when != &ret
    *if(...)
    {
      ... when != ret = e2
          when forall
     return ret;
    }

    // </smpl>

    Signed-off-by: Julia Lawall
    Signed-off-by: Pablo Neira Ayuso

    Julia Lawall
     
  • Initialize return variable before exiting on an error path.

    A simplified version of the semantic match that finds this problem is as
    follows: (http://coccinelle.lip6.fr/)

    // <smpl>
    (
    if@p1 (\(ret < 0\|ret != 0\))
     { ... return ret; }
    |
    ret@p1 = 0
    )
    ... when != ret = e1
        when != &ret
    *if(...)
    {
      ... when != ret = e2
          when forall
     return ret;
    }

    // </smpl>

    Signed-off-by: Julia Lawall
    Acked-by: Simon Horman
    Signed-off-by: Pablo Neira Ayuso

    Julia Lawall
     
  • Against -net.

    In the patch "netpoll: re-enable irq in poll_napi()", I tried to
    fix the following warning:

    [100718.051041] ------------[ cut here ]------------
    [100718.051048] WARNING: at kernel/softirq.c:159 local_bh_enable_ip+0x7d/0xb0()
    (Not tainted)
    [100718.051049] Hardware name: ProLiant BL460c G7
    ...
    [100718.051068] Call Trace:
    [100718.051073] [] ? warn_slowpath_common+0x87/0xc0
    [100718.051075] [] ? warn_slowpath_null+0x1a/0x20
    [100718.051077] [] ? local_bh_enable_ip+0x7d/0xb0
    [100718.051080] [] ? _spin_unlock_bh+0x1b/0x20
    [100718.051085] [] ? be_process_mcc+0x74/0x230 [be2net]
    [100718.051088] [] ? be_poll_tx_mcc+0x16c/0x290 [be2net]
    [100718.051090] [] ? netpoll_poll_dev+0xd6/0x490
    [100718.051095] [] ? bond_poll_controller+0x75/0x80 [bonding]
    [100718.051097] [] ? netpoll_poll_dev+0x45/0x490
    [100718.051100] [] ? ksize+0x19/0x80
    [100718.051102] [] ? netpoll_send_skb_on_dev+0x157/0x240

    by reenabling IRQ before calling ->poll, but it seems more
    problems are introduced after that patch:

    http://ozlabs.org/~akpm/stuff/IMG_20120824_122054.jpg
    http://marc.info/?l=linux-netdev&m=134563282530588&w=2

    So it is safe to fix the be2net driver code directly.

    This patch reverts the offending commit and fixes be_poll() by
    not disabling BH there. This is okay because be_poll()
    can be called either by poll_napi(), which already disables
    IRQ, or by net_rx_action(), which already disables BH.

    Reported-by: Andrew Morton
    Reported-by: Sylvain Munaut
    Cc: Sylvain Munaut
    Cc: Andrew Morton
    Cc: David Miller
    Cc: Sathya Perla
    Cc: Subbu Seetharaman
    Cc: Ajit Khaparde
    Signed-off-by: Cong Wang
    Tested-by: Sylvain Munaut
    Signed-off-by: David S. Miller

    Amerigo Wang
     

26 Aug, 2012

1 commit

  • Pull nfsd bugfixes from J. Bruce Fields:
    "Particular thanks to Michael Tokarev, Malahal Naineni, and Jamie
    Heilman for their testing and debugging help."

    * 'for-3.6' of git://linux-nfs.org/~bfields/linux:
    svcrpc: fix svc_xprt_enqueue/svc_recv busy-looping
    svcrpc: sends on closed socket should stop immediately
    svcrpc: fix BUG() in svc_tcp_clear_pages
    nfsd4: fix security flavor of NFSv4.0 callback

    Linus Torvalds
     

25 Aug, 2012

3 commits

  • John W. Linville says:

    ====================
    This batch of fixes is intended for 3.6...

    Johannes Berg gives us a pair of iwlwifi fixes. One corrects some
    improperly defined ifdefs that lead to crashes and BUG_ONs. The other
    prevents attempts to read SRAM for devices that aren't actually started.

    Julia Lawall provides an ipw2100 fix to properly set the return code
    from a function call before testing it! :-)

    Thomas Huehn corrects the improper use of a constant related to a power
    setting in ath5k.

    Thomas Pedersen offers a mac80211 fix to properly handle destination
    addresses of unicast frames passing through a mesh gate.

    Vladimir Zapolskiy provides a brcmsmac fix to properly mark the
    interface state when the device goes down.
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     
  • The cwnd reduction in fast recovery is based on the number of packets
    newly delivered per ACK. For non-sack connections every DUPACK
    signifies a packet has been delivered, but the sender mistakenly
    skips counting them for cwnd reduction.

    The fix is to compute newly_acked_sacked after DUPACKs are accounted
    in sacked_out for non-sack connections.
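
The ordering issue can be illustrated with a small model. This is a sketch under assumptions, not the tcp_input.c code: the struct, the helper names, and the exact formula (change in packets outstanding plus change in `sacked_out`) are simplifications chosen to show why the count must be taken after DUPACK accounting.

```c
#include <assert.h>

struct sketch_tp {
    int packets_out;   /* segments sent and not yet cumulatively ACKed */
    int sacked_out;    /* for non-SACK flows: DUPACKs counted so far */
};

/* Packets newly delivered relative to a snapshot taken before the ACK. */
static int newly_delivered(const struct sketch_tp *tp,
                           int prior_packets, int prior_sacked)
{
    return (prior_packets - tp->packets_out) +
           (tp->sacked_out - prior_sacked);
}

/* Non-SACK flows emulate SACK by counting each DUPACK. */
static void account_dupack(struct sketch_tp *tp)
{
    tp->sacked_out++;
}
```

If `newly_delivered()` is evaluated before `account_dupack()` runs for a DUPACK, it yields 0; evaluated afterwards, it yields 1, so the cwnd reduction sees the delivery.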

    Signed-off-by: Yuchung Cheng
    Acked-by: Nandita Dukkipati
    Acked-by: Neal Cardwell
    Signed-off-by: David S. Miller

    Yuchung Cheng
     
  • Non-root user-space processes can send Netlink messages to other
    processes that are well-known for being subscribed to Netlink
    asynchronous notifications. This allows an illegitimate non-root
    process to send forged messages to Netlink subscribers.

    The userspace process usually verifies the legitimate origin in
    two ways:

    a) Socket credentials. If UID != 0, then the message comes from
    some illegitimate process and the message needs to be dropped.

    b) Netlink portID. In general, portID == 0 means that the message
    originates from the kernel, so any message with a different portID
    is discarded.

    However, ctnetlink sets the portID in event messages that have
    been triggered by some user-space process, e.g. the conntrack utility.
    So other processes subscribed to ctnetlink events, e.g. conntrackd,
    know that the event was triggered by some user-space action.

    Neither of the two ways to discard illegitimate messages coming
    from non-root processes can help for ctnetlink.

    This patch adds capability validation in case that dst_pid is set
    in netlink_sendmsg(). This approach is aggressive since existing
    applications using any Netlink bus to deliver messages between
    two user-space processes will break. Note that the exception is
    NETLINK_USERSOCK, since it is reserved for netlink-to-netlink
    userspace communication.

    Still, if anyone wants their Netlink bus to allow netlink-to-netlink
    userspace communication, they can set NL_NONROOT_SEND. However, by default,
    I don't think it makes sense to allow NETLINK_ROUTE to be used for
    communication between two processes that exchange arbitrary information
    unrelated to link/neighbouring/routing. They should be using
    NETLINK_USERSOCK for that instead.
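
The policy described above can be sketched as a small decision function. This is a userspace model, not the af_netlink.c code: `sketch_bus`, `SKETCH_NONROOT_SEND`, `sketch_capable` and `sketch_sendmsg` are hypothetical names, and returning `-EPERM` is an assumption about how the denial is reported.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical flag playing the role NL_NONROOT_SEND has in the text. */
#define SKETCH_NONROOT_SEND 0x2

struct sketch_bus {
    unsigned int nonroot_flags;   /* per-protocol policy */
};

/* Allow the send if the bus opts in or the sender is privileged. */
static bool sketch_capable(const struct sketch_bus *bus, bool privileged)
{
    return (bus->nonroot_flags & SKETCH_NONROOT_SEND) || privileged;
}

/* A message addressed to a specific port ID must pass the check. */
static int sketch_sendmsg(const struct sketch_bus *bus, bool privileged,
                          unsigned int dst_pid)
{
    if (dst_pid != 0 && !sketch_capable(bus, privileged))
        return -EPERM;
    return 0;
}
```

In this model an unprivileged sender can still address the kernel (`dst_pid == 0`) on any bus, and can address another process only on a bus that sets the non-root send flag, as the text says NETLINK_USERSOCK does.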

    Signed-off-by: Pablo Neira Ayuso
    Signed-off-by: David S. Miller

    Pablo Neira Ayuso
     

24 Aug, 2012

3 commits


23 Aug, 2012

3 commits

  • John W. Linville
     
  • Sylvain Munaut reported the following:

    - TCP connection get "stuck" with data in send queue when doing
    "large" transfers ( like typing 'ps ax' on a ssh connection )
    - Only happens on path where the PMTU is lower than the MTU of
    the interface
    - Is not present right after boot, it only appears 10-20min after
    boot or so. (and that's inside the _same_ TCP connection, it works
    fine at first and then in the same ssh session, it'll get stuck)
    - Definitely seems related to fragments somehow since I see a router
    sending ICMP message saying fragmentation is needed.
    - Exact same setup works fine with kernel 3.5.1

    The problem happens when the 10 minute (ip_rt_mtu_expires) expiration
    period is over.

    ip_rt_update_pmtu() calls dst_set_expires() to rearm a new expiration,
    but dst_set_expires() does nothing because dst.expires is already set.

    It seems we want to set the expires field to a new value, regardless
    of the prior one.
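
The difference between the two behaviours can be sketched as follows. This is a userspace model that assumes the existing helper only arms the expiry when none is set or when the new one is earlier, which would explain the behaviour described. The names are hypothetical and jiffies wraparound is ignored.

```c
#include <assert.h>

struct sketch_dst {
    unsigned long expires;   /* 0 means "no expiry armed" */
};

/* Only arms the expiry if none is set or the new one is earlier. */
static void set_expires_if_earlier(struct sketch_dst *dst,
                                   unsigned long now, unsigned long timeout)
{
    unsigned long expires = now + timeout;

    if (expires == 0)
        expires = 1;
    if (dst->expires == 0 || expires < dst->expires)
        dst->expires = expires;
}

/* Always re-arms, regardless of the prior value. */
static void set_expires_always(struct sketch_dst *dst,
                               unsigned long now, unsigned long timeout)
{
    unsigned long expires = now + timeout;

    dst->expires = expires ? expires : 1;
}
```

With an entry whose expiry (600) is already in the past at time 700, the first helper leaves the stale value in place, while the second moves it to 1300.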

    With help from Julian Anastasov.

    Reported-by: Sylvain Munaut
    Signed-off-by: Eric Dumazet
    CC: Julian Anastasov
    Tested-by: Sylvain Munaut
    Signed-off-by: David S. Miller

    Eric Dumazet
     
  • Pull ceph fixes from Sage Weil:
    "Jim's fix closes a narrow race introduced with the msgr changes. One
    fix resolves problems with debugfs initialization that Yan found when
    multiple client instances are created (e.g., two clusters mounted, or
    rbd + cephfs), another one fixes problems with mounting a nonexistent
    server subdirectory, and the last one fixes a divide by zero error
    from unsanitized ioctl input that Dan Carpenter found."

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client:
    ceph: avoid divide by zero in __validate_layout()
    libceph: avoid truncation due to racing banners
    ceph: tolerate (and warn on) extraneous dentry from mds
    libceph: delay debugfs initialization until we learn global_id

    Linus Torvalds
     

22 Aug, 2012

6 commits

  • The destination address of unicast frames forwarded through a mesh gate
    was being replaced with the broadcast address. Instead leave the
    original destination address as the mesh DA. If the nexthop address is
    not in the mpath table it will be resolved. If that fails, the frame
    will be forwarded to known mesh gates.

    Reported-by: Cedric Voncken
    Signed-off-by: Thomas Pedersen
    Signed-off-by: Johannes Berg

    Thomas Pedersen
     
  • Because the Ceph client messenger uses a non-blocking connect, it is
    possible for the sending of the client banner to race with the
    arrival of the banner sent by the peer.

    When ceph_sock_state_change() notices the connect has completed, it
    schedules work to process the socket via con_work(). During this
    time the peer is writing its banner, and arrival of the peer banner
    races with con_work().

    If con_work() calls try_read() before the peer banner arrives, there
    is nothing for it to do, after which con_work() calls try_write() to
    send the client's banner. In this case Ceph's protocol negotiation
    can complete successfully.

    The server-side messenger immediately sends its banner and addresses
    after accepting a connect request, *before* actually attempting to
    read or verify the banner from the client. As a result, it is
    possible for the banner from the server to arrive before con_work()
    calls try_read(). If that happens, try_read() will read the banner
    and prepare protocol negotiation info via prepare_write_connect().
    prepare_write_connect() calls con_out_kvec_reset(), which discards
    the as-yet-unsent client banner. Next, con_work() calls
    try_write(), which sends the protocol negotiation info rather than
    the banner that the peer is expecting.

    The result is that the peer sees an invalid banner, and the client
    reports "negotiation failed".

    Fix this by moving con_out_kvec_reset() out of
    prepare_write_connect() to its callers at all locations except the
    one where the banner might still need to be sent.

    [elder@inktak.com: added note about server-side behavior]

    Signed-off-by: Jim Schutt
    Reviewed-by: Alex Elder

    Jim Schutt
     
  • Pablo Neira Ayuso discovered that avahi and
    potentially NetworkManager accept spoofed Netlink messages because of a
    kernel bug. The kernel passes all-zero SCM_CREDENTIALS ancillary data
    to the receiver if the sender did not provide such data, instead of not
    including any such data at all or including the correct data from the
    peer (as it is the case with AF_UNIX).

    This bug was introduced in commit 16e572626961
    (af_unix: dont send SCM_CREDENTIALS by default)

    This patch forces passing credentials for netlink, as
    before the regression.

    Another fix would be to not add SCM_CREDENTIALS in
    netlink messages if not provided by the sender, but it
    might break some programs.

    With help from Florian Weimer & Petr Matousek

    This issue is designated as CVE-2012-3520

    Signed-off-by: Eric Dumazet
    Cc: Petr Matousek
    Cc: Florian Weimer
    Cc: Pablo Neira Ayuso
    Signed-off-by: David S. Miller

    Eric Dumazet
     
  • Christian Casteyde reported a kmemcheck 32-bit read from uninitialized
    memory in __ip_select_ident().

    It turns out that __ip_make_skb() called ip_select_ident() before
    properly initializing iph->daddr.

    This is a bug uncovered by commit 1d861aa4b3fb (inet: Minimize use of
    cached route inetpeer.)

    Addresses https://bugzilla.kernel.org/show_bug.cgi?id=46131

    Reported-by: Christian Casteyde
    Signed-off-by: Eric Dumazet
    Cc: Stephen Hemminger
    Signed-off-by: David S. Miller

    Eric Dumazet
     
  • Since 0e734419923bd ("ipv4: Use inet_csk_route_child_sock() in DCCP and
    TCP."), inet_csk_route_child_sock() is called instead of
    inet_csk_route_req().

    However, after creating the child-sock in tcp/dccp_v4_syn_recv_sock(),
    ireq->opt is set to NULL, before calling inet_csk_route_child_sock().
    Thus, inside inet_csk_route_child_sock() opt is always NULL and the
    SRR-options are not respected anymore.
    Packets sent by the server won't have the correct destination-IP.

    This patch fixes it by accessing newinet->inet_opt instead of ireq->opt
    inside inet_csk_route_child_sock().

    Reported-by: Luca Boccassi
    Signed-off-by: Christoph Paasch
    Signed-off-by: David S. Miller

    Christoph Paasch
     
  • Commit 6f458dfb40 (tcp: improve latencies of timer triggered events)
    added a bug leading to the following trace:

    [ 2866.131281] IPv4: Attempt to release TCP socket in state 1 ffff880019ec0000
    [ 2866.131726]
    [ 2866.132188] =========================
    [ 2866.132281] [ BUG: held lock freed! ]
    [ 2866.132281] 3.6.0-rc1+ #622 Not tainted
    [ 2866.132281] -------------------------
    [ 2866.132281] kworker/0:1/652 is freeing memory ffff880019ec0000-ffff880019ec0a1f, with a lock still held there!
    [ 2866.132281] (sk_lock-AF_INET-RPC){+.+...}, at: [] tcp_sendmsg+0x29/0xcc6
    [ 2866.132281] 4 locks held by kworker/0:1/652:
    [ 2866.132281] #0: (rpciod){.+.+.+}, at: [] process_one_work+0x1de/0x47f
    [ 2866.132281] #1: ((&task->u.tk_work)){+.+.+.}, at: [] process_one_work+0x1de/0x47f
    [ 2866.132281] #2: (sk_lock-AF_INET-RPC){+.+...}, at: [] tcp_sendmsg+0x29/0xcc6
    [ 2866.132281] #3: (&icsk->icsk_retransmit_timer){+.-...}, at: [] run_timer_softirq+0x1ad/0x35f
    [ 2866.132281]
    [ 2866.132281] stack backtrace:
    [ 2866.132281] Pid: 652, comm: kworker/0:1 Not tainted 3.6.0-rc1+ #622
    [ 2866.132281] Call Trace:
    [ 2866.132281] [] debug_check_no_locks_freed+0x112/0x159
    [ 2866.132281] [] ? __sk_free+0xfd/0x114
    [ 2866.132281] [] kmem_cache_free+0x6b/0x13a
    [ 2866.132281] [] __sk_free+0xfd/0x114
    [ 2866.132281] [] sk_free+0x1c/0x1e
    [ 2866.132281] [] tcp_write_timer+0x51/0x56
    [ 2866.132281] [] run_timer_softirq+0x218/0x35f
    [ 2866.132281] [] ? run_timer_softirq+0x1ad/0x35f
    [ 2866.132281] [] ? rb_commit+0x58/0x85
    [ 2866.132281] [] ? tcp_write_timer_handler+0x148/0x148
    [ 2866.132281] [] __do_softirq+0xcb/0x1f9
    [ 2866.132281] [] ? _raw_spin_unlock+0x29/0x2e
    [ 2866.132281] [] call_softirq+0x1c/0x30
    [ 2866.132281] [] do_softirq+0x4a/0xa6
    [ 2866.132281] [] irq_exit+0x51/0xad
    [ 2866.132281] [] do_IRQ+0x9d/0xb4
    [ 2866.132281] [] common_interrupt+0x6f/0x6f
    [ 2866.132281] [] ? sched_clock_cpu+0x58/0xd1
    [ 2866.132281] [] ? _raw_spin_unlock_irqrestore+0x4c/0x56
    [ 2866.132281] [] mod_timer+0x178/0x1a9
    [ 2866.132281] [] sk_reset_timer+0x19/0x26
    [ 2866.132281] [] tcp_rearm_rto+0x99/0xa4
    [ 2866.132281] [] tcp_event_new_data_sent+0x6e/0x70
    [ 2866.132281] [] tcp_write_xmit+0x7de/0x8e4
    [ 2866.132281] [] ? __alloc_skb+0xa0/0x1a1
    [ 2866.132281] [] __tcp_push_pending_frames+0x2e/0x8a
    [ 2866.132281] [] tcp_sendmsg+0xb32/0xcc6
    [ 2866.132281] [] inet_sendmsg+0xaa/0xd5
    [ 2866.132281] [] ? inet_autobind+0x5f/0x5f
    [ 2866.132281] [] ? trace_clock_local+0x9/0xb
    [ 2866.132281] [] sock_sendmsg+0xa3/0xc4
    [ 2866.132281] [] ? rb_reserve_next_event+0x26f/0x2d5
    [ 2866.132281] [] ? native_sched_clock+0x29/0x6f
    [ 2866.132281] [] ? sched_clock+0x9/0xd
    [ 2866.132281] [] ? trace_clock_local+0x9/0xb
    [ 2866.132281] [] kernel_sendmsg+0x37/0x43
    [ 2866.132281] [] xs_send_kvec+0x77/0x80
    [ 2866.132281] [] xs_sendpages+0x6f/0x1a0
    [ 2866.132281] [] ? try_to_del_timer_sync+0x55/0x61
    [ 2866.132281] [] xs_tcp_send_request+0x55/0xf1
    [ 2866.132281] [] xprt_transmit+0x89/0x1db
    [ 2866.132281] [] ? call_connect+0x3c/0x3c
    [ 2866.132281] [] call_transmit+0x1c5/0x20e
    [ 2866.132281] [] __rpc_execute+0x6f/0x225
    [ 2866.132281] [] ? call_connect+0x3c/0x3c
    [ 2866.132281] [] rpc_async_schedule+0x28/0x34
    [ 2866.132281] [] process_one_work+0x24d/0x47f
    [ 2866.132281] [] ? process_one_work+0x1de/0x47f
    [ 2866.132281] [] ? __rpc_execute+0x225/0x225
    [ 2866.132281] [] worker_thread+0x236/0x317
    [ 2866.132281] [] ? process_scheduled_works+0x2f/0x2f
    [ 2866.132281] [] kthread+0x9a/0xa2
    [ 2866.132281] [] kernel_thread_helper+0x4/0x10
    [ 2866.132281] [] ? retint_restore_args+0x13/0x13
    [ 2866.132281] [] ? __init_kthread_worker+0x5a/0x5a
    [ 2866.132281] [] ? gs_change+0x13/0x13
    [ 2866.308506] IPv4: Attempt to release TCP socket in state 1 ffff880019ec0000
    [ 2866.309689] =============================================================================
    [ 2866.310254] BUG TCP (Not tainted): Object already free
    [ 2866.310254] -----------------------------------------------------------------------------
    [ 2866.310254]

    The bug comes from the fact that the timer set in sk_reset_timer() can run
    before we actually do the sock_hold(). The socket refcount reaches zero and
    we free the socket too soon.

    The timer handler is not allowed to reduce the socket refcnt if the socket
    is owned by the user, or we need to change the sk_reset_timer() implementation.

    We should take a reference on the socket in case the TCP_WRITE_TIMER_DEFERRED
    or TCP_DELACK_TIMER_DEFERRED bit is set in tsq_flags

    Also fix a typo in tcp_delack_timer(), where TCP_WRITE_TIMER_DEFERRED
    was used instead of TCP_DELACK_TIMER_DEFERRED.

    For consistency, use same socket refcount change for TCP_MTU_REDUCED_DEFERRED,
    even if not fired from a timer.

    Reported-by: Fengguang Wu
    Tested-by: Fengguang Wu
    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     

21 Aug, 2012

4 commits

  • The rpc server tries to ensure that there will be room to send a reply
    before it receives a request.

    It does this by tracking, in xpt_reserved, an upper bound on the total
    size of the replies that it has already committed to for the socket.

    Currently it is adding in the estimate for a new reply *before* it
    checks whether there is space available. If it finds that there is not
    space, it then subtracts the estimate back out.

    This may lead the subsequent svc_xprt_enqueue to decide that there is
    space after all.

    The result is a svc_recv() that will repeatedly return -EAGAIN, causing
    server threads to loop without doing any actual work.
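
The busy loop can be reproduced in a simplified model. This is a sketch, not the sunrpc code: `sketch_xprt`, `has_wspace`, `recv_reserve_then_check` and `recv_check_then_reserve` are hypothetical names, and a single `estimate` stands in for the per-reply size bound.

```c
#include <assert.h>
#include <stdbool.h>

struct sketch_xprt {
    long wspace;     /* free send-buffer space */
    long reserved;   /* upper bound on replies already committed to */
};

/* Enqueue-side test: is there room for one more reply? */
static bool has_wspace(const struct sketch_xprt *x, long estimate)
{
    return x->wspace >= x->reserved + estimate;
}

/* Problematic order: reserve first, check afterwards, undo on failure. */
static bool recv_reserve_then_check(struct sketch_xprt *x, long estimate)
{
    x->reserved += estimate;
    if (!has_wspace(x, estimate)) {
        x->reserved -= estimate;   /* the caller would see -EAGAIN */
        return false;
    }
    return true;
}

/* Check first, and only reserve once the check has passed. */
static bool recv_check_then_reserve(struct sketch_xprt *x, long estimate)
{
    if (!has_wspace(x, estimate))
        return false;
    x->reserved += estimate;
    return true;
}
```

With 150 bytes of space and a 100-byte estimate, `has_wspace()` says yes, `recv_reserve_then_check()` says no and restores the old state, and `has_wspace()` then says yes again, so the two sides can disagree indefinitely. Checking before reserving makes both sides use the same state.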

    Cc: stable@vger.kernel.org
    Reported-by: Michael Tokarev
    Tested-by: Michael Tokarev
    Signed-off-by: J. Bruce Fields

    J. Bruce Fields
     
  • svc_tcp_sendto sets XPT_CLOSE if we fail to transmit the entire reply.
    However, the XPT_CLOSE won't be acted on immediately. Meanwhile other
    threads could send further replies before the socket is really shut
    down. This can manifest as data corruption: for example, if a truncated
    read reply is followed by another rpc reply, that second reply will look
    to the client like further read data.

    Symptoms were data corruption preceded by svc_tcp_sendto logging
    something like

    kernel: rpc-srv/tcp: nfsd: sent only 963696 when sending 1048708 bytes - shutting down socket

    Cc: stable@vger.kernel.org
    Reported-by: Malahal Naineni
    Tested-by: Malahal Naineni
    Signed-off-by: J. Bruce Fields

    J. Bruce Fields
     
  • Examination of svc_tcp_clear_pages shows that it assumes sk_tcplen is
    consistent with sk_pages[] (in particular, sk_pages[n] can't be NULL if
    sk_tcplen would lead us to expect n pages of data).

    svc_tcp_restore_pages zeroes out sk_pages[] while leaving sk_tcplen.
    This is OK, since both functions are serialized by XPT_BUSY. However,
    that means the inconsistency must be repaired before dropping XPT_BUSY.

    Therefore we should be ensuring that svc_tcp_save_pages repairs the
    problem before exiting svc_tcp_recv_record on error.

    Symptoms were a BUG() in svc_tcp_clear_pages.

    Cc: stable@vger.kernel.org
    Signed-off-by: J. Bruce Fields

    J. Bruce Fields
     
  • The debugfs directory includes the cluster fsid and our unique global_id.
    We need to delay the initialization of the debug entry until we have
    learned both the fsid and our global_id from the monitor or else the
    second client can't create its debugfs entry and will fail (and multiple
    client instances aren't properly reflected in debugfs).

    Reported by: Yan, Zheng
    Signed-off-by: Sage Weil
    Reviewed-by: Yehuda Sadeh

    Sage Weil
     

20 Aug, 2012

7 commits

  • Commit 1db20a52 (nfnetlink_log: Stop using NLA_PUT*().) incorrectly
    converted a NLA_PUT_BE16 macro to nla_put_be32() in nfnetlink_log:

    - NLA_PUT_BE16(inst->skb, NFULA_HWTYPE, htons(skb->dev->type));
    + if (nla_put_be32(inst->skb, NFULA_HWTYPE, htons(skb->dev->type))
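
Why the width matters can be shown with a simplified attribute model. This is a sketch, not the netlink attribute API: `sketch_attr`, `put_be16` and `put_be32` are hypothetical, and only the payload length and bytes are modelled.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>

/* Hypothetical, simplified attribute: just a length and a payload. */
struct sketch_attr {
    size_t len;
    unsigned char data[4];
};

/* Store a 16-bit value that is already in network byte order. */
static void put_be16(struct sketch_attr *a, uint16_t be_value)
{
    a->len = sizeof(be_value);
    memcpy(a->data, &be_value, sizeof(be_value));
}

/* Store a 32-bit value that is already in network byte order. */
static void put_be32(struct sketch_attr *a, uint32_t be_value)
{
    a->len = sizeof(be_value);
    memcpy(a->data, &be_value, sizeof(be_value));
}
```

Passing `htons(type)` to `put_be32()` compiles without complaint because the 16-bit value is silently widened, but the attribute then carries four bytes instead of the two a reader of a 16-bit field expects, and where the meaningful bytes land depends on host endianness.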

    Signed-off-by: Patrick McHardy
    Signed-off-by: Pablo Neira Ayuso

    Patrick McHardy
     
  • This commit removes the sk_rx_dst_set calls from
    tcp_create_openreq_child(), because at that point the icsk_af_ops
    field of ipv6_mapped TCP sockets has not been set to its proper final
    value.

    Instead, to make sure we get the right sk_rx_dst_set variant
    appropriate for the address family of the new connection, we have
    tcp_v{4,6}_syn_recv_sock() directly call the appropriate function
    shortly after the call to tcp_create_openreq_child() returns.

    This also moves inet6_sk_rx_dst_set() to avoid a forward declaration
    with the new approach.

    Signed-off-by: Neal Cardwell
    Reported-by: Artem Savkov
    Cc: Eric Dumazet
    Acked-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Neal Cardwell
     
  • Fix kernel-doc warning:

    Warning(net/core/dev.c:5745): No description found for parameter 'dev'

    Signed-off-by: Randy Dunlap
    Cc: "David S. Miller"
    Cc: netdev@vger.kernel.org
    Signed-off-by: David S. Miller

    Randy Dunlap
     
  • Commit 97bab73f (inet: Hide route peer accesses behind helpers.) introduced
    a bug in xfrm6_policy_destroy(). The xfrm_dst's _rt6i_peer member is not
    initialized, causing a false positive result from inetpeer_ptr_is_peer(),
    which in turn causes a NULL pointer dereference in inet_putpeer().

    Pid: 314, comm: kworker/0:1 Not tainted 3.6.0-rc1+ #17 To Be Filled By O.E.M. To Be Filled By O.E.M./P4S800D-X
    EIP: 0060:[] EFLAGS: 00010246 CPU: 0
    EIP is at inet_putpeer+0xe/0x16
    EAX: 00000000 EBX: f3481700 ECX: 00000000 EDX: 000dd641
    ESI: f3481700 EDI: c05e949c EBP: f551def4 ESP: f551def4
    DS: 007b ES: 007b FS: 0000 GS: 00e0 SS: 0068
    CR0: 8005003b CR2: 00000070 CR3: 3243d000 CR4: 00000750
    DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
    DR6: ffff0ff0 DR7: 00000400
    f551df04 c0423de1 00000000 f3481700 f551df18 c038d5f7 f254b9f8 f551df28
    f34f85d8 f551df20 c03ef48d f551df3c c0396870 f30697e8 f24e1738 c05e98f4
    f5509540 c05cd2b4 f551df7c c0142d2b c043feb5 f5509540 00000000 c05cd2e8
    [] xfrm6_dst_destroy+0x42/0xdb
    [] dst_destroy+0x1d/0xa4
    [] xfrm_bundle_flo_delete+0x2b/0x36
    [] flow_cache_gc_task+0x85/0x9f
    [] process_one_work+0x122/0x441
    [] ? apic_timer_interrupt+0x31/0x38
    [] ? flow_cache_new_hashrnd+0x2b/0x2b
    [] worker_thread+0x113/0x3cc

    Fix by adding a init_dst() callback to struct xfrm_policy_afinfo to
    properly initialize the dst's peer pointer.

    Signed-off-by: Patrick McHardy
    Signed-off-by: David S. Miller

    Patrick McHardy
     
  • In net/caif/chnl_net.c::chnl_recv_cb() we call skb_header_pointer()
    which may return NULL, but we do not check for a NULL pointer before
    dereferencing it.
    This patch adds such a NULL check, properly frees allocated memory
    and returns an error (-EINVAL) on failure, which is much better than crashing.
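
The shape of such a check can be sketched in userspace. `header_pointer` below is a hypothetical stand-in for an accessor that may return NULL on a short packet, and reading the IP version nibble is only an illustrative use of the returned pointer.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical stand-in for a header accessor: copies 'len' bytes at
 * 'offset' into 'buf' and returns it, or NULL if the packet is too short.
 */
static const void *header_pointer(const unsigned char *pkt, size_t pkt_len,
                                  size_t offset, size_t len, void *buf)
{
    if (offset > pkt_len || len > pkt_len - offset)
        return NULL;
    memcpy(buf, pkt + offset, len);
    return buf;
}

/* Returns the IP version nibble, or -EINVAL for a truncated packet. */
static int ip_version(const unsigned char *pkt, size_t pkt_len)
{
    unsigned char buf;
    const unsigned char *first = header_pointer(pkt, pkt_len, 0, 1, &buf);

    if (!first)
        return -EINVAL;   /* check before dereferencing */
    return *first >> 4;
}
```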

    Signed-off-by: Jesper Juhl
    Acked-by: Sjur Brændeland
    Signed-off-by: David S. Miller

    Jesper Juhl
     
  • Pablo Neira Ayuso says:

    ====================
    The following five patches contain fixes for 3.6-rc, they are:

    * Two fixes for message parsing in the SIP conntrack helper, from
    Patrick McHardy.

    * One fix for the SIP helper introduced in the user-space cthelper
    infrastructure, from Patrick McHardy.

    * fix missing appropriate locking while modifying one conntrack entry
    from the nfqueue integration code, from myself.

    * fix possible access to uninitialized timer in the nf_conntrack
    expectation infrastructure, from myself.
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     
  • If a packet is emitted on one socket in one group of fanout sockets,
    it is transmitted again. It is thus read again on one of the sockets
    of the fanout group. This results in a loop for software which
    generates packets when receiving one.
    This retransmission is not the intended behavior: a fanout group
    must behave like a single socket. The packet should not be
    transmitted on a socket if it originates from a socket belonging
    to the same fanout group.

    This patch fixes the issue by changing the transmission check to
    take fanout group info into account.
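
The group-aware check can be sketched as follows. This is a userspace model, not the af_packet code: `sketch_fanout`, `sketch_sock` and `skip_loopback` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct sketch_fanout {
    int id;
};

struct sketch_sock {
    struct sketch_fanout *fanout;   /* NULL if not in a fanout group */
};

/* Should a packet sent from 'origin' be skipped for 'receiver'? */
static bool skip_loopback(const struct sketch_sock *receiver,
                          const struct sketch_sock *origin)
{
    if (!origin)
        return false;
    if (receiver == origin)
        return true;
    /* Treat a whole fanout group like a single socket. */
    return receiver->fanout && receiver->fanout == origin->fanout;
}
```

Comparing only `receiver == origin` would let a sibling socket in the same group see the packet, which is the loop described above.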

    Reported-by: Aleksandr Kotov
    Signed-off-by: Eric Leblond
    Signed-off-by: David S. Miller

    Eric Leblond