20 Oct, 2011

1 commit


19 Oct, 2011

4 commits

  • The transparent socket option setting was not copied to the time wait
    socket when an inet socket was being replaced by a time wait socket. This
    broke the --transparent option of the socket match and may have caused
    FIN packets belonging to sockets in FIN_WAIT2 or TIME_WAIT state to be
    dropped by the packet filter.

    Signed-off-by: KOVACS Krisztian
    Signed-off-by: David S. Miller

    KOVACS Krisztian
     
  • The Bluetooth stack has internal connection handlers for all of the various
    Bluetooth protocols, and unfortunately, they are currently lacking the LSM
    hooks found in the core network stack's connection handlers. I say
    unfortunately, because this can cause problems for users who have an
    LSM enabled and are using certain Bluetooth devices. See one problem
    report below:

    * http://bugzilla.redhat.com/show_bug.cgi?id=741703

    In order to keep things simple at this point in time, this patch fixes the
    problem by cloning the parent socket's LSM attributes to the newly created
    child socket. If we decide we need a more elaborate LSM marking mechanism
    for Bluetooth (I somewhat doubt this) we can always revisit this decision
    in the future.

    Reported-by: James M. Cape
    Signed-off-by: Paul Moore
    Acked-by: James Morris
    Signed-off-by: David S. Miller

    Paul Moore
     
  • l2tp_xmit_skb() can leak one skb if skb_cow_head() returns an error.

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
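
The leak pattern named in this commit can be illustrated outside the kernel. What follows is a minimal user-space sketch, not the l2tp code itself: `struct buf`, `make_room()`, `xmit()` and the `live_bufs` counter are hypothetical stand-ins for an skb, skb_cow_head() and l2tp_xmit_skb(). It only shows that a function owning a buffer has to release it on the error path as well.

```c
#include <stdlib.h>

/* Hypothetical stand-in for an skb; live_bufs counts unreleased buffers. */
struct buf { size_t headroom; };
static int live_bufs;

static struct buf *buf_alloc(size_t headroom)
{
    struct buf *b = malloc(sizeof(*b));

    if (b) {
        b->headroom = headroom;
        live_bufs++;
    }
    return b;
}

static void buf_free(struct buf *b)
{
    if (b) {
        free(b);
        live_bufs--;
    }
}

/* Stand-in for skb_cow_head(): fails when there is not enough headroom. */
static int make_room(const struct buf *b, size_t need)
{
    return b->headroom >= need ? 0 : -1;
}

/* Stand-in transmit routine: it owns b and must free it on every path. */
static int xmit(struct buf *b, size_t hdr_len)
{
    if (make_room(b, hdr_len)) {
        buf_free(b);    /* omitting this call is what leaks the buffer */
        return -1;
    }
    /* ... a header would be pushed and the buffer sent here ... */
    buf_free(b);
    return 0;
}
```

In this model, `live_bufs` returns to zero after `xmit()` whether or not `make_room()` succeeds.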
     
  • Need to clean up bridge device timers and ports when a bridge
    device is being removed via netlink.

    This fixes the problem observed when doing:
        ip link add br0 type bridge
        ip link set dev eth1 master br0
        ip link set br0 up
        ip link del br0

    which would cause br0 to hang in unregister_netdev because
    of a leftover reference count.

    Reported-by: Sridhar Samudrala
    Signed-off-by: Stephen Hemminger
    Acked-by: Sridhar Samudrala
    Signed-off-by: David S. Miller

    stephen hemminger
     

18 Oct, 2011

4 commits

  • David S. Miller
     
  • x25_find_listener does not check that the amount of call user data given
    in the skb is big enough in per-socket comparisons, hence buffer
    overreads may occur. Fix this by adding a check.

    Signed-off-by: Matthew Daley
    Cc: Eric Dumazet
    Cc: Andrew Hendry
    Cc: stable
    Acked-by: Andrew Hendry
    Signed-off-by: David S. Miller

    Matthew Daley
     
  • There are multiple locations in the X.25 packet layer where a skb is
    assumed to be of at least a certain size and that all its data is
    currently available at skb->data. These assumptions are not checked,
    hence buffer overreads may occur. Use pskb_may_pull to check these
    minimal size assumptions and ensure that data is available at skb->data
    when necessary, as well as use skb_copy_bits where needed.

    Signed-off-by: Matthew Daley
    Cc: Eric Dumazet
    Cc: Andrew Hendry
    Cc: stable
    Acked-by: Andrew Hendry
    Signed-off-by: David S. Miller

    Matthew Daley
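
The idea of checking a minimal size before touching packet data can be sketched in user space. This is only an analogue under stated assumptions: `struct pkt`, `may_read()`, `frame_type()` and the constant `MIN_HDR_LEN` are hypothetical names, and the sketch models only the length check, not the way pskb_may_pull also makes the data available at skb->data.

```c
#include <stddef.h>

#define MIN_HDR_LEN 3   /* assumed minimal header size for this sketch */

struct pkt {
    const unsigned char *data;
    size_t len;         /* bytes actually available at data */
};

/* Analogue of a pskb_may_pull()-style check: are len bytes readable? */
static int may_read(const struct pkt *p, size_t len)
{
    return p->len >= len;
}

/* Returns the third header byte, or -1 if the packet is too short. */
static int frame_type(const struct pkt *p)
{
    if (!may_read(p, MIN_HDR_LEN))
        return -1;
    return p->data[2];
}
```

A caller that gets -1 back would drop the packet instead of reading past the end of the buffer.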
     
  • X.25 call user data is being copied in its entirety from incoming messages
    without consideration to the size of the destination buffers, leading to
    possible buffer overflows. Validate incoming call user data lengths before
    these copies are performed.

    It appears this issue was noticed some time ago, however nothing seemed to
    come of it: see http://www.spinics.net/lists/linux-x25/msg00043.html and
    commit 8db09f26f912f7c90c764806e804b558da520d4f.

    Signed-off-by: Matthew Daley
    Acked-by: Eric Dumazet
    Tested-by: Andrew Hendry
    Cc: stable
    Signed-off-by: David S. Miller

    Matthew Daley
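
Validating a length before copying into a fixed-size buffer might look like the sketch below. It is a user-space illustration, not the X.25 code: `struct calluserdata`, `copy_cud()` and the capacity `CUD_MAX` are assumptions made for the example.

```c
#include <stddef.h>
#include <string.h>

#define CUD_MAX 128   /* assumed capacity of the destination buffer */

struct calluserdata {
    size_t cudlength;
    unsigned char cuddata[CUD_MAX];
};

/* Copy len bytes of incoming call user data, rejecting oversized input. */
static int copy_cud(struct calluserdata *dst, const unsigned char *src,
                    size_t len)
{
    if (len > sizeof(dst->cuddata))
        return -1;      /* validate before the copy, never after it */
    memcpy(dst->cuddata, src, len);
    dst->cudlength = len;
    return 0;
}
```

On rejection the destination is left untouched, so a previously stored value stays valid.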
     

13 Oct, 2011

1 commit

  • __ip_vs_mutex is used by both the netns shutdown code and the startup
    code, and both implicitly use the sk_lock-AF_INET mutex.

    cleanup CPU-1           startup CPU-2
    ip_vs_dst_event()       ip_vs_genl_set_cmd()
    sk_lock-AF_INET         __ip_vs_mutex
                            sk_lock-AF_INET
    __ip_vs_mutex
    * DEAD LOCK *

    A new mutex called sync_mutex is added to the ip_vs netns struct.

    Comments from Julian and Simon added.
    This patch has been running for more than 3 months now and it seems to work.

    Ver. 3
    IP_VS_SO_GET_DAEMON in do_ip_vs_get_ctl protected by sync_mutex
    instead of __ip_vs_mutex as suggested by Julian.

    Signed-off-by: Hans Schillstrom
    Acked-by: Julian Anastasov
    Signed-off-by: Simon Horman
    Signed-off-by: Pablo Neira Ayuso

    Hans Schillstrom
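
The lock-order inversion described in this commit can be modelled without threads. The sketch below is a toy order checker, not kernel code or lockdep: `record_order()`, `reset_orders()` and the `lock_id` values are hypothetical, and it only detects a direct AB-BA inversion between two locks. It assumes that, after the patch, the startup path takes sync_mutex where it previously took __ip_vs_mutex.

```c
#include <stdbool.h>

enum lock_id { SK_LOCK, IP_VS_MUTEX, SYNC_MUTEX, NR_LOCKS };

/* order[a][b] is true once some path has taken b while holding a. */
static bool order[NR_LOCKS][NR_LOCKS];

/* Record "b acquired while holding a"; false means an AB-BA inversion. */
static bool record_order(enum lock_id a, enum lock_id b)
{
    if (order[b][a])
        return false;   /* opposite order already seen: possible deadlock */
    order[a][b] = true;
    return true;
}

/* Forget all recorded orders so a new scenario can be modelled. */
static void reset_orders(void)
{
    for (int i = 0; i < NR_LOCKS; i++)
        for (int j = 0; j < NR_LOCKS; j++)
            order[i][j] = false;
}
```

Recording SK_LOCK then IP_VS_MUTEX for the cleanup path, followed by IP_VS_MUTEX then SK_LOCK for the startup path, reports an inversion. Recording SYNC_MUTEX then SK_LOCK for the startup path does not.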
     

11 Oct, 2011

1 commit


07 Oct, 2011

1 commit

  • This resolves a regression seen by some users of bridging.
    Some users use the bridge like a dummy device.
    They expect to be able to put an IPv6 address on the device
    with no ports attached. Although there are better ways of doing
    this, there is no reason to not allow it.

    Note: the bridge still will reflect the state of ports in the
    bridge if there are any added.

    Signed-off-by: Stephen Hemminger
    Signed-off-by: David S. Miller

    stephen hemminger
     

06 Oct, 2011

1 commit


05 Oct, 2011

3 commits

  • lost_skb_hint is used by tcp_mark_head_lost() to mark the first unhandled skb.
    lost_cnt_hint is the number of packets or sacked packets before the lost_skb_hint.
    When shifting a skb that is before the lost_skb_hint, if tcp_is_fack() is true,
    the skb has already been counted in the lost_cnt_hint; if tcp_is_fack() is false,
    tcp_sacktag_one() will increase the lost_cnt_hint. So tcp_shifted_skb() does not
    need to adjust the lost_cnt_hint by itself. When shifting a skb that is equal to
    lost_skb_hint, the shifted packets will not be counted by tcp_mark_head_lost().
    So tcp_shifted_skb() should adjust the lost_cnt_hint even if tcp_is_fack(tp) is true.

    Signed-off-by: Zheng Yan
    Signed-off-by: David S. Miller

    Yan, Zheng
     
  • tcp_v4_clear_md5_list() assumes that multiple tcp md5sig peers
    only hold one reference to md5sig_pool, but tcp_v4_md5_do_add()
    increases the use count of md5sig_pool for each peer. This patch
    makes tcp_v4_md5_do_add() increase the use count only for the first
    tcp md5sig peer.

    Signed-off-by: Zheng Yan
    Signed-off-by: David S. Miller

    Yan, Zheng
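
The reference-count pairing described in this commit can be sketched as a small model. It is not the TCP MD5 code: `pool_users`, `struct md5_list`, `md5_add_peer()` and `md5_clear_list()` are hypothetical names standing in for the pool use count, the per-socket peer list, the add function and the clear function.

```c
/* Hypothetical model: pool_users counts references to a shared pool. */
static int pool_users;

struct md5_list {
    int entries;    /* number of peers configured on one socket */
};

/* Add a peer; only the first peer on a socket takes a pool reference. */
static void md5_add_peer(struct md5_list *l)
{
    if (l->entries == 0)
        pool_users++;
    l->entries++;
}

/* Clear all peers; drops the single reference taken by the first add. */
static void md5_clear_list(struct md5_list *l)
{
    if (l->entries > 0) {
        l->entries = 0;
        pool_users--;
    }
}
```

With one reference taken per list rather than per peer, the single drop in the clear function balances the count.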
     
  • * git://github.com/davem330/net:
    pch_gbe: Fixed the issue on which a network freezes
    pch_gbe: Fixed the issue on which PC was frozen when link was downed.
    make PACKET_STATISTICS getsockopt report consistently between ring and non-ring
    net: xen-netback: correctly restart Tx after a VM restore/migrate
    bonding: properly stop queuing work when requested
    can bcm: fix incomplete tx_setup fix
    RDSRDMA: Fix cleanup of rds_iw_mr_pool
    net: Documentation: Fix type of variables
    ibmveth: Fix oops on request_irq failure
    ipv6: nullify ipv6_ac_list and ipv6_fl_list when creating new socket
    cxgb4: Fix EEH on IBM P7IOC
    can bcm: fix tx_setup off-by-one errors
    MAINTAINERS: tehuti: Alexander Indenbaum's address bounces
    dp83640: reduce driver noise
    ptp: fix L2 event message recognition

    Linus Torvalds
     

04 Oct, 2011

1 commit

  • This is a minor change.

    Up until kernel 2.6.32, getsockopt(fd, SOL_PACKET, PACKET_STATISTICS,
    ...) would return total and dropped packets since its last invocation. The
    introduction of socket queue overflow reporting [1] changed drop
    rate calculation in the normal packet socket path, but not when using a
    packet ring. As a result, the getsockopt now returns different statistics
    depending on the reception method used. With a ring, it still returns the
    count since the last call, as counts are incremented in tpacket_rcv and
    reset in getsockopt. Without a ring, it returns 0 if no drops occurred
    since the last getsockopt and the total drops over the lifespan of
    the socket otherwise. The culprit is this line in packet_rcv, executed
    on a drop:

    drop_n_acct:
    po->stats.tp_drops = atomic_inc_return(&sk->sk_drops);

    As it shows, the new drop number is taken from the socket drop counter,
    which is not reset at getsockopt. I put together a small example
    that demonstrates the issue [2]. It runs for 10 seconds and overflows
    the queue/ring on every odd second. The reported drop rates are:
    ring: 16, 0, 16, 0, 16, ...
    non-ring: 0, 15, 0, 30, 0, 46, 0, 60, 0, 74.

    Note how the even non-ring counts monotonically increase. Because the
    getsockopt adds tp_drops to tp_packets, total counts are similarly
    reported cumulatively. Long story short, reinstating the original code, as
    the below patch does, fixes the issue at the cost of additional per-packet
    cycles. Another solution that does not introduce per-packet overhead
    is to keep the current data path, record the value of sk_drops at
    getsockopt() at call N in a new field in struct packet_sock and subtract
    that when reporting at call N+1. I'll be happy to code that instead;
    it's just messier.

    [1] http://patchwork.ozlabs.org/patch/35665/
    [2] http://kernel.googlecode.com/files/test-packetsock-getstatistics.c

    Signed-off-by: Willem de Bruijn
    Signed-off-by: David S. Miller

    Willem de Bruijn
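
The alternative mentioned at the end of this message (remember sk_drops at one getsockopt call and subtract it at the next) could be sketched roughly as follows. This is a user-space model, not the patch that was applied: `struct sock_model`, the field `drops_reported`, `on_drop()` and `get_drops()` are hypothetical.

```c
struct sock_model {
    unsigned int sk_drops;        /* cumulative drop counter, never reset */
    unsigned int drops_reported;  /* hypothetical field: sk_drops at last query */
};

/* Data path: a drop only bumps the cumulative counter. */
static void on_drop(struct sock_model *s)
{
    s->sk_drops++;
}

/* Query path: report drops since the previous call, then take a snapshot. */
static unsigned int get_drops(struct sock_model *s)
{
    unsigned int delta = s->sk_drops - s->drops_reported;

    s->drops_reported = s->sk_drops;
    return delta;
}
```

The per-packet path stays as cheap as before, and each query returns a per-interval count, which is the behaviour the message describes for the ring case.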
     

03 Oct, 2011

1 commit


30 Sep, 2011

3 commits

  • * 'for-linus' of git://github.com/NewDreamNetwork/ceph-client:
    libceph: fix pg_temp mapping update
    libceph: fix pg_temp mapping calculation
    libceph: fix linger request requeuing
    libceph: fix parse options memory leak
    libceph: initialize ack_stamp to avoid unnecessary connection reset

    Linus Torvalds
     
  • The commit aabdcb0b553b9c9547b1a506b34d55a764745870 ("can bcm: fix tx_setup
    off-by-one errors") fixed only a part of the original problem reported by
    Andre Naujoks. It turned out that the original code needed to be re-ordered
    to reduce complexity and to finally fix the reported frame counting issues.

    Signed-off-by: Oliver Hartkopp
    Signed-off-by: David S. Miller

    Oliver Hartkopp
     
  • In the rds_iw_mr_pool struct the free_pinned field keeps track of
    memory pinned by free MRs. While this field is incremented properly
    upon allocation, it is never decremented upon unmapping. This would
    cause the rds_rdma module to crash the kernel upon unloading, by
    triggering the BUG_ON in the rds_iw_destroy_mr_pool function.

    This change keeps track of the MRs that become unpinned, so that
    free_pinned can be decremented appropriately.

    Signed-off-by: Jonathan Lallinger
    Signed-off-by: Steve Wise
    Signed-off-by: David S. Miller

    Jonathan Lallinger
     

29 Sep, 2011

4 commits

  • ipv6_ac_list and ipv6_fl_list from the listening socket are inadvertently
    shared with the new socket created for a connection.

    Signed-off-by: Zheng Yan
    Signed-off-by: David S. Miller

    Yan, Zheng
     
  • This patch fixes two off-by-one errors that canceled each other out.
    Checking for the same condition two times in bcm_tx_timeout_tsklet() reduced
    the count of frames to be sent by one. This did not show up the first time
    tx_setup is invoked as an additional frame is sent due to TX_ANNOUNCE.
    Invoking a second tx_setup on the same item led to a reduced (by 1) number of
    sent frames.

    Reported-by: Andre Naujoks
    Signed-off-by: Oliver Hartkopp
    Signed-off-by: David S. Miller

    Oliver Hartkopp
     
  • The incremental map updates have a record for each pg_temp mapping that is
    to be added/updated (len > 0) or removed (len == 0). The old code was
    written as if the updates were a complete enumeration; that was just wrong.
    Update the code to remove 0-length entries and drop the rbtree traversal.

    This avoids misdirected (and hung) requests that manifest as server
    errors like

    [WRN] client4104 10.0.1.219:0/275025290 misdirected client4104.1:129 0.1 to osd0 not [1,0] in e11/11

    Signed-off-by: Sage Weil

    Sage Weil
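
The incremental-update rule described in this commit (len > 0 adds or updates, len == 0 removes, everything else is left alone) can be sketched with a small fixed-size table. This is an illustration only: the real code uses an rbtree, and `struct pg_temp_map`, `pg_find()`, `pg_apply()` and `MAX_PGS` are hypothetical.

```c
#include <stddef.h>

#define MAX_PGS 16   /* arbitrary capacity for this sketch */

struct pg_temp_entry {
    int used;
    unsigned int pgid;
    size_t len;
};

struct pg_temp_map {
    struct pg_temp_entry e[MAX_PGS];
};

/* Look up the entry for pgid, or NULL if there is none. */
static struct pg_temp_entry *pg_find(struct pg_temp_map *m, unsigned int pgid)
{
    for (int i = 0; i < MAX_PGS; i++)
        if (m->e[i].used && m->e[i].pgid == pgid)
            return &m->e[i];
    return NULL;
}

/* Apply one incremental record: len > 0 adds or updates, len == 0 removes. */
static int pg_apply(struct pg_temp_map *m, unsigned int pgid, size_t len)
{
    struct pg_temp_entry *e = pg_find(m, pgid);

    if (len == 0) {
        if (e)
            e->used = 0;
        return 0;
    }
    if (!e) {
        for (int i = 0; i < MAX_PGS && !e; i++)
            if (!m->e[i].used)
                e = &m->e[i];
        if (!e)
            return -1;  /* table full */
        e->used = 1;
        e->pgid = pgid;
    }
    e->len = len;
    return 0;
}
```

Entries that an update does not mention are never touched, which is the difference from treating the update as a complete enumeration.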
     
  • We need to apply the modulo pg_num calculation before looking up a pgid in
    the pg_temp mapping rbtree. This fixes pg_temp mappings, and fixes
    (some) misdirected requests that result in messages like

    [WRN] client4104 10.0.1.219:0/275025290 misdirected client4104.1:129 0.1 to osd0 not [1,0] in e11/11

    on the server and make the client block without getting a reply (at
    least until the pg_temp mapping goes away, but that can take a long, long
    time).

    Reorder calc_pg_raw() a bit to make more sense.

    Signed-off-by: Sage Weil

    Sage Weil
     

28 Sep, 2011

7 commits


23 Sep, 2011

1 commit

  • This corrects a critical bug in the GW feature. The bug caused all unicast
    packets destined to a GW to be sent as broadcast. It is present even if
    the sender GW feature is configured as OFF. It's an urgent bug fix and should
    be committed as soon as possible.

    This was a regression introduced by 43676ab590c3f8686fd047d34c3e33803eef71f0

    Signed-off-by: Antonio Quartulli
    Signed-off-by: Marek Lindner

    Antonio Quartulli
     

22 Sep, 2011

3 commits

  • An incorrect variable was used in validating the akm_suites array from
    NL80211_ATTR_AKM_SUITES. In addition, there was no explicit
    validation of the array length (we only have room for
    NL80211_MAX_NR_AKM_SUITES).

    This can result in a buffer write overflow for stack variables with
    arbitrary data from user space. The nl80211 commands using the affected
    functionality require GENL_ADMIN_PERM, so this is only exposed to admin
    users.

    Cc: stable@kernel.org
    Signed-off-by: Jouni Malinen
    Signed-off-by: John W. Linville

    Jouni Malinen
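
An explicit array-length check of the kind this commit describes might look like the sketch below. It is a user-space illustration, not the nl80211 parser: `struct crypto_settings`, `parse_akm_suites()` and the value given to `MAX_AKM_SUITES` (a stand-in for NL80211_MAX_NR_AKM_SUITES) are assumptions.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_AKM_SUITES 2   /* stand-in for NL80211_MAX_NR_AKM_SUITES; value assumed */

struct crypto_settings {
    int n_akm_suites;
    uint32_t akm_suites[MAX_AKM_SUITES];
};

/* Parse an attribute payload of attr_len bytes into the fixed-size array. */
static int parse_akm_suites(struct crypto_settings *s, const void *attr,
                            size_t attr_len)
{
    size_t n = attr_len / sizeof(uint32_t);

    if (n > MAX_AKM_SUITES)
        return -1;      /* explicit length check before any write */
    memcpy(s->akm_suites, attr, n * sizeof(uint32_t));
    s->n_akm_suites = (int)n;
    return 0;
}
```

The count is derived from the attribute actually being copied and is compared against the array capacity before the copy.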
     
  • When asynchronous crypto algorithms are used, there might be many
    packets that passed the xfrm replay check, but the replay advance
    function has not yet been called for these packets. So the replay check
    function would accept a replay of all of these packets. Also the
    system might crash if there are more packets in async processing
    than the size of the anti replay window, because the replay advance
    function would try to update the replay window beyond the bounds.

    This patch adds a second replay check after resuming from the async
    processing to fix these issues.

    Signed-off-by: Steffen Klassert
    Acked-by: Herbert Xu
    Signed-off-by: David S. Miller

    Steffen Klassert
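
Why a second check after asynchronous processing helps can be shown with a simplified sliding-window model. This is not the xfrm implementation: `struct replay_state`, `replay_check()`, `replay_advance()`, `replay_complete()` and the 32-entry `WINDOW` are hypothetical, and the model ignores sequence number wrap-around.

```c
#include <stdint.h>

#define WINDOW 32

struct replay_state {
    uint32_t top;     /* highest sequence number accepted so far */
    uint32_t bitmap;  /* bit i set: sequence number top - i was seen */
};

/* Returns 0 if seq is acceptable, -1 if it is a replay or too old. */
static int replay_check(const struct replay_state *r, uint32_t seq)
{
    uint32_t diff;

    if (seq == 0)
        return -1;
    if (seq > r->top)
        return 0;
    diff = r->top - seq;
    if (diff >= WINDOW)
        return -1;
    return (r->bitmap & (1u << diff)) ? -1 : 0;
}

/* Marks seq as seen; the caller must have checked it first. */
static void replay_advance(struct replay_state *r, uint32_t seq)
{
    if (seq > r->top) {
        uint32_t shift = seq - r->top;

        r->bitmap = shift < WINDOW ? (r->bitmap << shift) | 1u : 1u;
        r->top = seq;
    } else {
        r->bitmap |= 1u << (r->top - seq);
    }
}

/* Completion step after crypto: check again, then advance. */
static int replay_complete(struct replay_state *r, uint32_t seq)
{
    if (replay_check(r, seq))
        return -1;
    replay_advance(r, seq);
    return 0;
}
```

Two copies of the same sequence number can both pass the first check while crypto is in flight. Only the first one passes the re-check in `replay_complete()`, and the advance step never runs on a value the window cannot hold.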
     
  • Adding a new fib rule can trigger a BUG_ON.
    The shell commands to reproduce it are:
    ip rule add pref 38
    ip rule add pref 38
    ip rule add to 192.168.3.0/24 goto 38
    ip rule del pref 38
    ip rule add to 192.168.3.0/24 goto 38
    ip rule add pref 38

    After these commands the BUG_ON triggers.
    Delete the BUG_ON and use (ctarget == NULL) to identify whether the rule is unresolved.

    Signed-off-by: Gao feng
    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Gao feng
     

21 Sep, 2011

1 commit

  • When the call to snmp6_alloc_dev fails, the snmp6-related memory
    is freed by snmp6_alloc_dev. Calling in6_dev_finish_destroy
    will then free this memory a second time.

    The double free leads to undefined behavior.

    Signed-off-by: Roy Li
    Acked-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Roy Li
     

20 Sep, 2011

2 commits


19 Sep, 2011

1 commit

  • D-SACK is allowed to reside below snd_una. But the corresponding check
    in tcp_is_sackblock_valid() is the exact opposite. It looks like a typo.

    Signed-off-by: Zheng Yan
    Acked-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Zheng Yan