30 Nov, 2010

2 commits

  • * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (27 commits)
    af_unix: limit recursion level
    pch_gbe driver: The wrong of initializer entry
    pch_gbe dreiver: chang author
    ucc_geth: fix ucc halt problem in half duplex mode
    inet: Fix __inet_inherit_port() to correctly increment bsockets and num_owners
    ehea: Add some info messages and fix an issue
    hso: fix disable_net
    NET: wan/x25_asy, move lapb_unregister to x25_asy_close_tty
    cxgb4vf: fix setting unicast/multicast addresses ...
    net, ppp: Report correct error code if unit allocation failed
    DECnet: don't leak uninitialized stack byte
    au1000_eth: fix invalid address accessing the MAC enable register
    dccp: fix error in updating the GAR
    tcp: restrict net.ipv4.tcp_adv_win_scale (#20312)
    netns: Don't leak others' openreq-s in proc
    Net: ceph: Makefile: Remove unnessary code
    vhost/net: fix rcu check usage
    econet: fix CVE-2010-3848
    econet: fix CVE-2010-3850
    econet: disallow NULL remote addr for sendmsg(), fixes CVE-2010-3849
    ...

    Linus Torvalds
     
  • It's easy to eat all kernel memory and trigger the NMI watchdog using an
    exploit program that queues unix sockets on top of others.

    lkml ref : http://lkml.org/lkml/2010/11/25/8

    This mechanism is used by applications, so one choice we have is to add a
    recursion limit.

    Other limits might be needed as well (if we queue other types of files),
    since the passfd mechanism is currently limited by socket receive queue
    sizes only.

    Add a recursion_level to unix sockets, allowing up to 4 levels.

    Each time we send a unix socket through the sendfd mechanism, we copy its
    recursion level (plus one) to the receiver. This recursion level is cleared
    when the socket receive queue is emptied.

    Reported-by: Марк Коренберг
    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
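
The recursion-level bookkeeping described in the af_unix commit above can be sketched in user-space C. This is only an illustration under stated assumptions: the struct, the function names, and the choice to keep the larger of the two levels on the receiver are hypothetical and not taken from the patch. Only the limit of 4 and the clear-on-empty rule come from the commit message.

```c
#include <stdbool.h>

#define MAX_RECURSION_LEVEL 4   /* limit named in the commit message */

/* Hypothetical stand-in for per-socket state; not the kernel's struct. */
struct toy_unix_sock {
    unsigned int recursion_level;
    unsigned int queued;        /* messages waiting in the receive queue */
};

/* Pass 'passed' to 'receiver'; refuse once the chain would be too deep. */
bool toy_send_fd(const struct toy_unix_sock *passed,
                 struct toy_unix_sock *receiver)
{
    unsigned int level = passed->recursion_level + 1;

    if (level > MAX_RECURSION_LEVEL)
        return false;                       /* would exceed the limit */
    if (level > receiver->recursion_level)  /* assumption: keep the max */
        receiver->recursion_level = level;
    receiver->queued++;
    return true;
}

/* Emptying the receive queue clears the recursion level. */
void toy_drain(struct toy_unix_sock *sk)
{
    sk->queued = 0;
    sk->recursion_level = 0;
}
```

With a chain of sockets each passed to the next, the fifth hop is refused until the sending socket's queue has been drained.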
     

29 Nov, 2010

4 commits

  • inet sockets corresponding to passive connections are added to the bind hash
    using __inet_inherit_port(). These sockets are later removed from the bind
    hash using __inet_put_port(). These two functions are not exactly symmetrical.
    __inet_put_port() decrements hashinfo->bsockets and tb->num_owners, whereas
    __inet_inherit_port() does not increment them. This results in both of these
    going to negative values.

    This patch fixes this by calling inet_bind_hash() from __inet_inherit_port(),
    which does the right thing.

    'bsockets' and 'num_owners' were introduced by commit a9d8f9110d7e953c
    (inet: Allowing more than 64k connections and heavily optimize bind(0))

    Signed-off-by: Nagendra Singh Tomar
    Acked-by: Eric Dumazet
    Acked-by: Evgeniy Polyakov
    Signed-off-by: David S. Miller

    Nagendra Tomar
     
  • A single uninitialized padding byte is leaked to userspace.

    Signed-off-by: Dan Rosenberg
    CC: stable
    Signed-off-by: David S. Miller

    Dan Rosenberg
     
  • This fixes a bug in updating the Greatest Acknowledgment number Received (GAR):
    the current implementation does not track the greatest received value -
    lower values in the range AWL..AWH (RFC 4340, 7.5.1) erase higher ones.

    Signed-off-by: Gerrit Renker
    Signed-off-by: David S. Miller

    Gerrit Renker
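
The GAR rule in the DCCP commit above (only move forward, never let a lower value erase a higher one) can be sketched as follows. This is a minimal illustration, assuming 48-bit circular sequence numbers as in RFC 4340; the helper names are hypothetical and the code is not taken from the patch.

```c
#include <stdint.h>

#define SEQ48_MASK ((UINT64_C(1) << 48) - 1)

/* True if a is "after" b in circular 48-bit sequence space. */
static int seq48_after(uint64_t a, uint64_t b)
{
    uint64_t d = (a - b) & SEQ48_MASK;

    return d != 0 && d < (UINT64_C(1) << 47);
}

/* Return the new GAR: advance only when ackno is greater than the old one. */
uint64_t toy_update_gar(uint64_t gar, uint64_t ackno)
{
    return seq48_after(ackno, gar) ? ackno : gar;
}
```

A lower acknowledgment number that still falls in the valid window leaves the stored value untouched, while a higher one (including one that wrapped around) replaces it.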
     
  • tcp_win_from_space() does the following:

    if (sysctl_tcp_adv_win_scale <= 0)
        return space >> (-sysctl_tcp_adv_win_scale);
    else
        return space - (space >> sysctl_tcp_adv_win_scale);

    "space" is int.

    As per C99 6.5.7 (3), shifting an int by 32 or more bits is
    undefined behaviour.

    Indeed, if sysctl_tcp_adv_win_scale is exactly 32,
    space >> 32 equals space and the function returns 0.

    Which means we busyloop in tcp_fixup_rcvbuf().

    Restrict net.ipv4.tcp_adv_win_scale to [-31, 31].

    Fix https://bugzilla.kernel.org/show_bug.cgi?id=20312

    Steps to reproduce:

    echo 32 >/proc/sys/net/ipv4/tcp_adv_win_scale
    wget www.kernel.org
    [softlockup]

    Signed-off-by: Alexey Dobriyan
    Signed-off-by: David S. Miller

    Alexey Dobriyan
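
The effect of restricting the scale to [-31, 31] can be sketched in plain C. The window computation mirrors the snippet quoted in the commit message; clamp_adv_win_scale() is a hypothetical helper added for illustration only (the commit restricts the sysctl's accepted range and does not necessarily clamp).

```c
#define ADV_WIN_SCALE_MIN (-31)
#define ADV_WIN_SCALE_MAX 31

/* Hypothetical helper: force a requested scale into the safe range. */
int clamp_adv_win_scale(int scale)
{
    if (scale < ADV_WIN_SCALE_MIN)
        return ADV_WIN_SCALE_MIN;
    if (scale > ADV_WIN_SCALE_MAX)
        return ADV_WIN_SCALE_MAX;
    return scale;
}

/* Same arithmetic as quoted above; safe only when |scale| <= 31. */
int win_from_space(int space, int scale)
{
    if (scale <= 0)
        return space >> (-scale);
    else
        return space - (space >> scale);
}
```

With the scale limited to 31, the shift count stays below the width of a 32-bit int, so a positive space can no longer yield a zero window through an out-of-range shift.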
     

28 Nov, 2010

2 commits


27 Nov, 2010

1 commit

  • * 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6:
    NFS: Ensure we return the dirent->d_type when it is known
    NFS: Correct the array bound calculation in nfs_readdir_add_to_array
    NFS: Don't ignore errors from nfs_do_filldir()
    NFS: Fix the error handling in "uncached_readdir()"
    NFS: Fix a page leak in uncached_readdir()
    NFS: Fix a page leak in nfs_do_filldir()
    NFS: Assume eof if the server returns no readdir records
    NFS: Buffer overflow in ->decode_dirent() should not be fatal
    Pure nfs client performance using odirect.
    SUNRPC: Fix an infinite loop in call_refresh/call_refreshresult

    Linus Torvalds
     

25 Nov, 2010

5 commits

  • Don't declare a variable-sized array of iovecs on the stack, since this
    could cause a stack overflow if msg->msg_iovlen is large. Instead, coalesce
    the user-supplied data into a new buffer and use a single iovec for it.

    Signed-off-by: Phil Blundell
    Signed-off-by: David S. Miller

    Phil Blundell
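
The coalescing approach described in the commit above can be sketched in user-space C. This is an illustration only: coalesce_iov() is a hypothetical name, and malloc() stands in for whatever allocator the kernel code uses.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/uio.h>

/*
 * Copy every segment of iov[0..iovcnt) into one heap buffer.
 * Returns the buffer (caller frees) and stores its length in *out_len;
 * returns NULL on allocation failure or if the total size would overflow.
 */
void *coalesce_iov(const struct iovec *iov, size_t iovcnt, size_t *out_len)
{
    size_t total = 0, off = 0, i;
    char *buf;

    for (i = 0; i < iovcnt; i++) {
        if (iov[i].iov_len > SIZE_MAX - total)
            return NULL;                /* sum would wrap around */
        total += iov[i].iov_len;
    }

    buf = malloc(total ? total : 1);    /* avoid a zero-size request */
    if (!buf)
        return NULL;

    for (i = 0; i < iovcnt; i++) {
        if (iov[i].iov_len) {
            memcpy(buf + off, iov[i].iov_base, iov[i].iov_len);
            off += iov[i].iov_len;
        }
    }

    *out_len = total;
    return buf;
}
```

The caller can then describe the whole payload with a single iovec pointing at the returned buffer, so no array sized by user input ever lives on the stack.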
     
  • Add missing check for capable(CAP_NET_ADMIN) in SIOCSIFADDR operation.

    Signed-off-by: Phil Blundell
    Signed-off-by: David S. Miller

    Phil Blundell
     
  • Later parts of econet_sendmsg() rely on saddr != NULL, so return early
    with EINVAL if NULL was passed; otherwise an oops may occur.

    Signed-off-by: Phil Blundell
    Signed-off-by: David S. Miller

    Phil Blundell
     
  • Use TCP_MIN_MSS instead of constant 64.

    Reported-by: Min Zhang
    Signed-off-by: David S. Miller

    David S. Miller
     
  • Vegard Nossum found a unix socket OOM was possible, posting an exploit
    program.

    My analysis is that we can eat all LOWMEM memory before unix_gc() is
    called from unix_release_sock(). Moreover, the thread blocked in
    unix_gc() can consume a huge amount of time performing cleanup because
    of the huge working set.

    One way to handle this is to have a sensible limit on unix_tot_inflight,
    tested from wait_for_unix_gc() and to force a call to unix_gc() if this
    limit is hit.

    This solves the OOM and also reduces overall latencies, and should not
    slow down normal workloads.

    Reported-by: Vegard Nossum
    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     

24 Nov, 2010

1 commit

  • * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6:
    of/phylib: Use device tree properties to initialize Marvell PHYs.
    phylib: Add support for Marvell 88E1149R devices.
    phylib: Use common page register definition for Marvell PHYs.
    qlge: Fix incorrect usage of module parameters and netdev msg level
    ipv6: fix missing in6_ifa_put in addrconf
    SuperH IrDA: correct Baud rate error correction
    atl1c: Fix hardware type check for enabling OTP CLK
    net: allow GFP_HIGHMEM in __vmalloc()
    bonding: change list contact to netdev@vger.kernel.org
    e1000: fix screaming IRQ

    Linus Torvalds
     

23 Nov, 2010

1 commit


22 Nov, 2010

2 commits

  • Fix ref count bug introduced by

    commit 2de795707294972f6c34bae9de713e502c431296
    Author: Lorenzo Colitti
    Date: Wed Oct 27 18:16:49 2010 +0000

    ipv6: addrconf: don't remove address state on ifdown if the address
    is being kept

    Fix logic so that addrconf_ifdown() decrements the inet6_ifaddr
    refcnt correctly with in6_ifa_put().

    Reported-by: Stephen Hemminger
    Signed-off-by: John Fastabend
    Acked-by: Eric Dumazet
    Signed-off-by: David S. Miller

    John Fastabend
     
  • We forgot to use __GFP_HIGHMEM in several __vmalloc() calls.

    In ceph, add the missing flag.

    In fib_trie.c, xfrm_hash.c and request_sock.c, using vzalloc() is
    cleaner and allows using HIGHMEM pages as well.

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     

20 Nov, 2010

3 commits

  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client:
    ceph: fix readdir EOVERFLOW on 32-bit archs
    ceph: fix frag offset for non-leftmost frags
    ceph: fix dangling pointer
    ceph: explicitly specify page alignment in network messages
    ceph: make page alignment explicit in osd interface
    ceph: fix comment, remove extraneous args
    ceph: fix update of ctime from MDS
    ceph: fix version check on racing inode updates
    ceph: fix uid/gid on resent mds requests
    ceph: fix rdcache_gen usage and invalidate
    ceph: re-request max_size if cap auth changes
    ceph: only let auth caps update max_size
    ceph: fix open for write on clustered mds
    ceph: fix bad pointer dereference in ceph_fill_trace
    ceph: fix small seq message skipping
    Revert "ceph: update issue_seq on cap grant"

    Linus Torvalds
     
  • * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (31 commits)
    net: fix kernel-doc for sk_filter_rcu_release
    be2net: Fix to avoid firmware update when interface is not open.
    netfilter: fix IP_VS dependencies
    net: irda: irttp: sync error paths of data- and udata-requests
    ipv6: Expose reachable and retrans timer values as msecs
    ipv6: Expose IFLA_PROTINFO timer values in msecs instead of jiffies
    3c59x: fix build failure on !CONFIG_PCI
    ipg.c: remove id [SUNDANCE, 0x1021]
    net: caif: spi: fix potential NULL dereference
    ath9k_htc: Avoid setting QoS control for non-QoS frames
    net: zero kobject in rx_queue_release
    net: Fix duplicate volatile warning.
    MAINTAINERS: Add stmmac maintainer
    bonding: fix a race in IGMP handling
    cfg80211: fix can_beacon_sec_chan, reenable HT40
    gianfar: fix signedness issue
    net: bnx2x: fix error value sign
    8139cp: fix checksum broken
    r8169: fix checksum broken
    rds: Integer overflow in RDS cmsg handling
    ...

    Linus Torvalds
     
  • Fix kernel-doc warning for sk_filter_rcu_release():

    Warning(net/core/filter.c:586): missing initial short description on line:
    * sk_filter_rcu_release: Release a socket filter by rcu_head

    Signed-off-by: Randy Dunlap
    Cc: "David S. Miller"
    Cc: netdev@vger.kernel.org
    Signed-off-by: David S. Miller

    Randy Dunlap
     

19 Nov, 2010

7 commits

  • When NF_CONNTRACK is enabled, IP_VS uses conntrack symbols.
    Therefore IP_VS can't be linked statically when conntrack
    is built modular.

    Reported-by: Justin P. Mattock
    Tested-by: Justin P. Mattock
    Signed-off-by: Patrick McHardy
    Signed-off-by: David S. Miller

    Patrick McHardy
     
  • irttp_data_request() returns meaningful error codes, while irttp_udata_request()
    just returns -1 in similar situations. Sync the two and the loglevels of the
    accompanying output.

    Signed-off-by: Wolfram Sang
    Cc: Samuel Ortiz
    Cc: David Miller
    Signed-off-by: David S. Miller

    Wolfram Sang
     
  • Expose reachable and retrans timer values in msecs instead of jiffies.
    Both timer values are already exposed as msecs in the neighbour table
    netlink interface.

    The creation timestamp format with increased precision is kept but
    cleaned up.

    Signed-off-by: Thomas Graf
    Cc: YOSHIFUJI Hideaki
    Signed-off-by: David S. Miller

    Thomas Graf
     
  • David S. Miller
     
  • IFLA_PROTINFO exposes timer related per device settings in jiffies.
    Change it to expose these values in msecs like the sysctl interface
    does.

    I did not find any users of IFLA_PROTINFO which rely on any of these
    values and even if there are, they are likely already broken because
    there is no way for them to reliably convert such a value to another
    time format.

    Signed-off-by: Thomas Graf
    Cc: YOSHIFUJI Hideaki
    Signed-off-by: David S. Miller

    Thomas Graf
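
The argument in the IFLA_PROTINFO commit above, that a raw jiffies value cannot be converted reliably without knowing the tick rate, can be illustrated with a small sketch. The function name and the explicit hz parameter are assumptions made for this example; they are not the kernel's jiffies_to_msecs().

```c
/*
 * Convert a tick count to milliseconds for a given tick rate (ticks per
 * second). The same jiffies value maps to different durations depending
 * on hz, which a userspace reader of the raw value does not know.
 */
unsigned long toy_jiffies_to_msecs(unsigned long jiffies, unsigned int hz)
{
    return (unsigned long)((unsigned long long)jiffies * 1000ULL / hz);
}
```

For example, 250 ticks is one second on a 250 Hz kernel but only a quarter of a second at 1000 Hz, which is why exporting milliseconds removes the ambiguity.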
     
  • netif_set_real_num_rx_queues() can decrement and increment
    the number of rx queues. For example ixgbe does this as
    features and offloads are toggled. Presumably this could
    also happen across down/up on most devices if the available
    resources changed (cpu offlined).

    The kobject needs to be zero'd in this case so that the
    state is not preserved across kobject_put()/kobject_init_and_add().

    This resolves the following error report.

    ixgbe 0000:03:00.0: eth2: NIC Link is Up 10 Gbps, Flow Control: RX/TX
    kobject (ffff880324b83210): tried to init an initialized object, something is seriously wrong.
    Pid: 1972, comm: lldpad Not tainted 2.6.37-rc18021qaz+ #169
    Call Trace:
    [] kobject_init+0x3a/0x83
    [] kobject_init_and_add+0x23/0x57
    [] ? mark_lock+0x21/0x267
    [] net_rx_queue_update_kobjects+0x63/0xc6
    [] netif_set_real_num_rx_queues+0x5f/0x78
    [] ixgbe_set_num_queues+0x1c6/0x1ca [ixgbe]
    [] ixgbe_init_interrupt_scheme+0x1e/0x79c [ixgbe]
    [] ixgbe_dcbnl_set_state+0x167/0x189 [ixgbe]

    Signed-off-by: John Fastabend
    Acked-by: Eric Dumazet
    Signed-off-by: David S. Miller

    John Fastabend
     
  • This follows wireless-testing 9236d838c920e90708570d9bbd7bb82d30a38130
    ("cfg80211: fix extension channel checks to initiate communication") and
    fixes accidental case fall-through. Without this fix, HT40 is entirely
    blocked.

    Signed-off-by: Mark Mentovai
    Cc: stable@kernel.org
    Acked-by: Luis R. Rodriguez

    Mark Mentovai
     

18 Nov, 2010

2 commits

  • In rds_cmsg_rdma_args(), the user-provided args->nr_local value is
    restricted to less than UINT_MAX. This seems to need a tighter upper
    bound, since the calculation of total iov_size can overflow, resulting
    in a small sock_kmalloc() allocation. This would probably just result
    in walking off the heap and crashing when calling rds_rdma_pages() with
    a high count value. If it somehow doesn't crash here, then memory
    corruption could occur soon after.

    Signed-off-by: Dan Rosenberg
    Signed-off-by: David S. Miller

    Dan Rosenberg
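
The overflow described in the RDS commit above arises when a user-controlled element count is multiplied by an element size. A generic guard against that pattern might look like the sketch below. It is a hypothetical helper for illustration and not the bound the patch actually applies to args->nr_local.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Compute count * elem_size into *out, reporting overflow instead of
 * silently wrapping to a small value.
 */
bool checked_array_size(size_t count, size_t elem_size, size_t *out)
{
    if (elem_size != 0 && count > SIZE_MAX / elem_size)
        return false;           /* product would not fit in size_t */
    *out = count * elem_size;
    return true;
}
```

An allocation sized from the checked result either covers every element or is never attempted, so later code cannot walk past a buffer that is too small.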
     
  • The big kernel lock has been removed from all these files at some point,
    leaving only the #include.

    Remove this too as a cleanup.

    Signed-off-by: Arnd Bergmann
    Signed-off-by: Linus Torvalds

    Arnd Bergmann
     

17 Nov, 2010

4 commits

  • When operating in a mode that initiates communication and using
    HT40, we should fail if we cannot use both the primary and secondary
    channels to initiate communication. Our current HT40 allowmap
    only covers the STA mode of operation; for beaconing modes we need
    a check on the fly, as the mode of operation is dynamic and
    there are flags other than disable which we should read
    to check if we can initiate communication.

    Do not allow initiating communication if our secondary HT40
    channel is either disabled, has a passive scan flag, has a
    no-ibss flag, or is a radar channel. Userspace now has similar
    checks, but this is also needed in-kernel.

    Reported-by: Jouni Malinen
    Cc: stable@kernel.org
    Signed-off-by: Luis R. Rodriguez
    Signed-off-by: John W. Linville

    Luis R. Rodriguez
     
  • otherwise xfrm_lookup will fail to find correct policy

    Signed-off-by: Ulrich Weber
    Signed-off-by: David S. Miller

    Ulrich Weber
     
  • Sending zero-byte packets is not necessarily an error (AF_INET accepts it,
    too), so just apply a shortcut. This was discovered because some software
    did not work under WINE. See

    http://bugs.winehq.org/show_bug.cgi?id=19397#c86
    http://thread.gmane.org/gmane.linux.irda.general/1643

    for very detailed debugging information and a testcase. Kudos to Wolfgang for
    those!

    Reported-by: Wolfgang Schwotzer
    Signed-off-by: Wolfram Sang
    Tested-by: Mike Evans
    Signed-off-by: David S. Miller

    Wolfram Sang
     
  • Hi,

    We can simplify net/sunrpc/stats.c::rpc_alloc_iostats() a bit by getting
    rid of the unneeded local variable 'new'.

    Please CC me on replies.

    Signed-off-by: Jesper Juhl
    Signed-off-by: Trond Myklebust

    Jesper Juhl
     

13 Nov, 2010

6 commits

  • * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (66 commits)
    can-bcm: fix minor heap overflow
    gianfar: Do not call device_set_wakeup_enable() under a spinlock
    ipv6: Warn users if maximum number of routes is reached.
    docs: Add neigh/gc_thresh3 and route/max_size documentation.
    axnet_cs: fix resume problem for some Ax88790 chip
    ipv6: addrconf: don't remove address state on ifdown if the address is being kept
    tcp: Don't change unlocked socket state in tcp_v4_err().
    x25: Prevent crashing when parsing bad X.25 facilities
    cxgb4vf: add call to Firmware to reset VF State.
    cxgb4vf: Fail open if link_start() fails.
    cxgb4vf: flesh out PCI Device ID Table ...
    cxgb4vf: fix some errors in Gather List to skb conversion
    cxgb4vf: fix bug in Generic Receive Offload
    cxgb4vf: don't implement trivial (and incorrect) ndo_select_queue()
    ixgbe: Look inside vlan when determining offload protocol.
    bnx2x: Look inside vlan when determining checksum proto.
    vlan: Add function to retrieve EtherType from vlan packets.
    virtio-net: init link state correctly
    ucc_geth: Fix deadlock
    ucc_geth: Do not bring the whole IF down when TX failure.
    ...

    Linus Torvalds
     
  • On 64-bit platforms the ASCII representation of a pointer may be up to 17
    bytes long. This patch increases the length of the buffer accordingly.

    http://marc.info/?l=linux-netdev&m=128872251418192&w=2

    Reported-by: Dan Rosenberg
    Signed-off-by: Oliver Hartkopp
    CC: Linus Torvalds
    Signed-off-by: David S. Miller

    Oliver Hartkopp
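
The 17-byte figure in the can-bcm commit above corresponds to 16 hex digits for a 64-bit pointer plus the terminating NUL. A sketch of sizing such a buffer from the pointer width follows. The macro and function names are hypothetical, and the fixed-width hex format is an assumption for this example rather than the format the kernel code uses.

```c
#include <stdint.h>
#include <stdio.h>

/* Two hex digits per pointer byte, plus one byte for the terminating NUL. */
#define PTR_HEX_BUFLEN (2 * sizeof(void *) + 1)

/*
 * Write p as zero-padded hex into buf. Returns the number of characters
 * needed (excluding the NUL), as snprintf() does.
 */
int format_ptr_hex(const void *p, char *buf, size_t buflen)
{
    return snprintf(buf, buflen, "%0*jx",
                    (int)(2 * sizeof(void *)), (uintmax_t)(uintptr_t)p);
}
```

Deriving the buffer length from sizeof(void *) keeps it correct on both 32-bit and 64-bit builds, and snprintf() bounds the write even if the estimate were wrong.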
     
  • This gives users at least some clue as to what the problem
    might be and how to go about fixing it.

    Signed-off-by: Ben Greear
    Signed-off-by: David S. Miller

    Ben Greear
     
  • Currently, addrconf_ifdown does not delete statically configured IPv6
    addresses when the interface is brought down. The intent is that when
    the interface comes back up the address will be usable again. However,
    this doesn't actually work, because the system stops listening on the
    corresponding solicited-node multicast address, so the address cannot
    respond to neighbor solicitations and thus receive traffic. Also, the
    code notifies the rest of the system that the address is being deleted
    (e.g., RTM_DELADDR), even though it is not. Fix it so that none of this
    state is updated if the address is being kept on the interface.

    Tested: Added a statically configured IPv6 address to an interface,
    started ping, brought link down, brought link up again. When link came
    up ping kept on going and "ip -6 maddr" showed that the host was still
    subscribed to there

    Signed-off-by: Lorenzo Colitti
    Signed-off-by: David S. Miller

    Lorenzo Colitti
     
  • Alexey Kuznetsov noticed a regression introduced by
    commit f1ecd5d9e7366609d640ff4040304ea197fbc618
    ("Revert Backoff [v3]: Revert RTO on ICMP destination unreachable")

    The RTO and timer modification code added to tcp_v4_err()
    doesn't check sock_owned_by_user(), which if true means we
    don't have exclusive access to the socket and therefore cannot
    modify its critical state.

    Just skip this new code block if sock_owned_by_user() is true
    and eliminate the now superfluous sock_owned_by_user() code
    block contained within.

    Reported-by: Alexey Kuznetsov
    Signed-off-by: David S. Miller
    CC: Damian Lukowski
    Acked-by: Eric Dumazet

    David S. Miller
     
  • Now with improved comma support.

    On parsing malformed X.25 facilities, decrementing the remaining length
    may cause it to underflow. Since the length is an unsigned integer,
    this will result in the loop continuing until the kernel crashes.

    This patch adds checks to ensure decrementing the remaining length does
    not cause it to wrap around.

    Signed-off-by: Dan Rosenberg
    Signed-off-by: David S. Miller

    Dan Rosenberg
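
The underflow described in the X.25 commit above comes from subtracting a facility's size from an unsigned remaining length without first checking that enough bytes are left. A minimal sketch of the guarded loop is shown below. The function is hypothetical and simplified: it assumes the usual X.25 facility classes selected by the top two bits of the code byte (one, two, or three parameter bytes, or a length-prefixed block) and is not the kernel's parser.

```c
/*
 * Walk a facilities block of 'len' bytes. Returns the number of
 * facilities found, or -1 if the block is malformed.
 */
int toy_parse_facilities(const unsigned char *p, unsigned int len)
{
    int count = 0;

    while (len > 0) {
        unsigned int need;

        switch (*p & 0xC0) {
        case 0x00: need = 2; break;      /* code + 1 parameter byte  */
        case 0x40: need = 3; break;      /* code + 2 parameter bytes */
        case 0x80: need = 4; break;      /* code + 3 parameter bytes */
        default:                         /* code + length + payload  */
            if (len < 2)
                return -1;
            need = 2u + p[1];
            break;
        }

        if (len < need)                  /* check before subtracting */
            return -1;

        p += need;
        len -= need;                     /* cannot wrap after the check */
        count++;
    }
    return count;
}
```

Because the comparison happens before the subtraction, a truncated facility ends parsing with an error instead of turning the remaining length into a huge unsigned value.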