22 Sep, 2011

1 commit

  • Conflicts:
    MAINTAINERS
    drivers/net/Kconfig
    drivers/net/ethernet/broadcom/bnx2x/bnx2x_link.c
    drivers/net/ethernet/broadcom/tg3.c
    drivers/net/wireless/iwlwifi/iwl-pci.c
    drivers/net/wireless/iwlwifi/iwl-trans-tx-pcie.c
    drivers/net/wireless/rt2x00/rt2800usb.c
    drivers/net/wireless/wl12xx/main.c

    David S. Miller
     

25 Aug, 2011

1 commit

  • Dereferencing a user pointer directly from kernel-space without going
    through the copy_from_user family of functions is a bad idea. Two
    such usages can be found in the sendmsg code path called from sendmmsg,
    added by

    commit c71d8ebe7a4496fb7231151cb70a6baa0cb56f9a upstream.
    commit 5b47b8038f183b44d2d8ff1c7d11a5c1be706b34 in the 3.0-stable tree.

    Usages are performed through memcmp() and memcpy() directly. Fix those
    by using the already copied msg_sys structure instead of the __user *msg
    structure. Note that msg_sys can be set to NULL by verify_compat_iovec()
    or verify_iovec(), which requires additional NULL pointer checks.

    Signed-off-by: Mathieu Desnoyers
    Signed-off-by: David Goulet
    CC: Tetsuo Handa
    CC: Anton Blanchard
    CC: David S. Miller
    CC: stable
    Signed-off-by: David S. Miller

    Mathieu Desnoyers
     

08 Aug, 2011

1 commit


05 Aug, 2011

3 commits

  • The sendmmsg() introduced by commit 228e548e "net: Add sendmmsg socket system
    call" is capable of sending to multiple different destination addresses.

    SMACK uses the destination address to check sendmsg() permission.
    However, security_socket_sendmsg() is called only once, even if multiple
    different destination addresses are passed to sendmmsg().

    Therefore, we need to call security_socket_sendmsg() for each destination
    address rather than only the first destination address.

    Since calling security_socket_sendmsg() every time is a waste of time when
    only a single destination address is passed to sendmmsg(), skip the call
    unless the destination address of the current datagram differs from that
    of the previous datagram.

    Signed-off-by: Tetsuo Handa
    Acked-by: Anton Blanchard
    Cc: stable [3.0+]
    Signed-off-by: David S. Miller

    Tetsuo Handa
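
    The "check only when the destination changes" rule described above can be
    sketched in plain userspace C. This is an illustration under stated
    assumptions, not the kernel code: struct dest, security_check(),
    same_dest() and check_batch() are hypothetical names, and the hook only
    counts how often it runs instead of consulting a policy.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for one datagram's destination address. */
struct dest {
    const void *addr;   /* may be NULL when len is 0 */
    size_t len;
};

/* Hypothetical hook: counts invocations instead of checking policy. */
static int checks_run;

static int security_check(const struct dest *d)
{
    (void)d;
    checks_run++;
    return 0;           /* 0 means "permitted" in this sketch */
}

/* Returns nonzero when two destinations are byte-for-byte identical. */
static int same_dest(const struct dest *a, const struct dest *b)
{
    if (a->len != b->len)
        return 0;
    if (a->len == 0)
        return 1;
    return memcmp(a->addr, b->addr, a->len) == 0;
}

/* Check the first datagram, then only those whose destination differs
 * from the previous one. Returns how many datagrams passed, stopping
 * at the first refusal. */
static int check_batch(const struct dest *msgs, int n)
{
    int i;

    for (i = 0; i < n; i++) {
        if (i > 0 && same_dest(&msgs[i - 1], &msgs[i]))
            continue;   /* same destination as before: skip the hook */
        if (security_check(&msgs[i]) != 0)
            break;
    }
    return i;
}
```

    With four datagrams addressed A, A, B, B the hook runs twice; with a
    single destination it runs once regardless of batch size.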
     
  • To limit the amount of time we can spend in sendmmsg, cap the
    number of elements to UIO_MAXIOV (currently 1024).

    For error handling an application using sendmmsg needs to retry at
    the first unsent message, so capping is simpler and requires less
    application logic than returning EINVAL.

    Signed-off-by: Anton Blanchard
    Cc: stable [3.0+]
    Signed-off-by: David S. Miller

    Anton Blanchard
     
  • sendmmsg uses an error return strategy similar to that of recvmmsg, but
    it turns out to be a confusing way to communicate errors.

    The current code stores the error code away and returns it on the next
    sendmmsg call. This means a call with completely valid arguments could
    get an error from a previous call.

    Change things so we only return an error if no datagrams could be sent.
    If fewer than the requested number of messages were sent, the application
    must retry starting at the first failed one, and if the problem is
    persistent the error will be returned.

    This matches the behaviour of other syscalls like read/write - it
    is not an error if fewer than the requested number of elements are sent.

    Signed-off-by: Anton Blanchard
    Cc: stable [3.0+]
    Signed-off-by: David S. Miller

    Anton Blanchard
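
    The application-side logic implied by the two commits above (a capped
    batch size, and an error only when nothing was sent) amounts to a resume
    loop. The sketch below models it without touching a socket: batch_send_fn
    is a hypothetical callback that follows the described return convention,
    BATCH_CAP mirrors the UIO_MAXIOV value quoted above, and mock_send() is a
    test double, not a real API.

```c
#include <stddef.h>

#define BATCH_CAP 1024  /* mirrors the UIO_MAXIOV value quoted above */

/* Hypothetical sender: tries to send `count` messages starting at index
 * `first`; returns how many were sent (possibly fewer than asked), or -1
 * if none could be sent. */
typedef int (*batch_send_fn)(void *ctx, size_t first, size_t count);

/* Send `total` messages, resuming at the first unsent one after every
 * short return. Returns the number sent; stops at the first hard error. */
static size_t send_all(batch_send_fn fn, void *ctx, size_t total)
{
    size_t done = 0;

    while (done < total) {
        size_t want = total - done;
        int n;

        if (want > BATCH_CAP)
            want = BATCH_CAP;   /* larger requests would be capped anyway */
        n = fn(ctx, done, want);
        if (n <= 0)
            break;              /* nothing was sent: report what we have */
        done += (size_t)n;
    }
    return done;
}

/* Mock sender for illustration: accepts at most 3 messages per call and
 * fails outright once `limit` messages have gone out. */
struct mock {
    size_t limit;
    size_t sent;
    int calls;
};

static int mock_send(void *ctx, size_t first, size_t count)
{
    struct mock *m = ctx;
    size_t n = count < 3 ? count : 3;

    (void)first;
    m->calls++;
    if (m->sent >= m->limit)
        return -1;
    if (n > m->limit - m->sent)
        n = m->limit - m->sent;
    m->sent += n;
    return (int)n;
}
```

    A short return is simply the starting point for the next call, so the
    caller needs no special case for capping; a persistent failure surfaces
    as -1 on the call that sends nothing.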
     

02 Aug, 2011

1 commit

  • When assigning a NULL value to an RCU protected pointer, no barrier
    is needed. rcu_assign_pointer() used to handle that case, but will soon
    change to not handle it specially.

    Convert all rcu_assign_pointer() calls that assign a NULL value.

    // <smpl>
    @@ expression P; @@

    - rcu_assign_pointer(P, NULL)
    + RCU_INIT_POINTER(P, NULL)

    // </smpl>

    Signed-off-by: Stephen Hemminger
    Acked-by: Paul E. McKenney
    Signed-off-by: David S. Miller

    Stephen Hemminger
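
    Why a NULL store needs no barrier can be shown with a userspace analogy
    built on C11 atomics. This is not the kernel implementation of either
    macro: publish(), unpublish(), read_cfg() and struct config are
    hypothetical names, and an acquire load stands in for the kernel's
    reader-side primitive.

```c
#include <stdatomic.h>
#include <stddef.h>

struct config {
    int value;
};

/* Shared pointer that readers may load concurrently. */
static _Atomic(struct config *) current_cfg;

/* Publish an initialised object. Release ordering makes the earlier
 * writes to *c visible to any reader that observes the new pointer. */
static void publish(struct config *c)
{
    atomic_store_explicit(&current_cfg, c, memory_order_release);
}

/* Clear the pointer. A reader that sees NULL dereferences nothing, so
 * there are no earlier writes that must be ordered before this store. */
static void unpublish(void)
{
    atomic_store_explicit(&current_cfg, (struct config *)NULL,
                          memory_order_relaxed);
}

/* Reader side: load the pointer, then use the object if non-NULL. */
static struct config *read_cfg(void)
{
    return atomic_load_explicit(&current_cfg, memory_order_acquire);
}
```

    The ordering cost is paid only on the path that hands readers an object
    to dereference, which is the distinction the conversion relies on.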
     

28 Jul, 2011

2 commits

  • * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (32 commits)
    tg3: Remove 5719 jumbo frames and TSO blocks
    tg3: Break larger frags into 4k chunks for 5719
    tg3: Add tx BD budgeting code
    tg3: Consolidate code that calls tg3_tx_set_bd()
    tg3: Add partial fragment unmapping code
    tg3: Generalize tg3_skb_error_unmap()
    tg3: Remove short DMA check for 1st fragment
    tg3: Simplify tx bd assignments
    tg3: Reintroduce tg3_tx_ring_info
    ASIX: Use only 11 bits of header for data size
    ASIX: Simplify condition in rx_fixup()
    Fix cdc-phonet build
    bonding: reduce noise during init
    bonding: fix string comparison errors
    net: Audit drivers to identify those needing IFF_TX_SKB_SHARING cleared
    net: add IFF_SKB_TX_SHARED flag to priv_flags
    net: sock_sendmsg_nosec() is static
    forcedeth: fix vlans
    gianfar: fix bug caused by 87c288c6e9aa31720b72e2bc2d665e24e1653c3e
    gro: Only reset frag0 when skb can be pulled
    ...

    Linus Torvalds
     
  • Signed-off-by: Eric Dumazet
    CC: Anton Blanchard
    Signed-off-by: David S. Miller

    Eric Dumazet
     

27 Jul, 2011

1 commit


21 May, 2011

1 commit

  • * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6: (1446 commits)
    macvlan: fix panic if lowerdev in a bond
    tg3: Add braces around 5906 workaround.
    tg3: Fix NETIF_F_LOOPBACK error
    macvlan: remove one synchronize_rcu() call
    networking: NET_CLS_ROUTE4 depends on INET
    irda: Fix error propagation in ircomm_lmp_connect_response()
    irda: Kill set but unused variable 'bytes' in irlan_check_command_param()
    irda: Kill set but unused variable 'clen' in ircomm_connect_indication()
    rxrpc: Fix set but unused variable 'usage' in rxrpc_get_transport()
    be2net: Kill set but unused variable 'req' in lancer_fw_download()
    irda: Kill set but unused vars 'saddr' and 'daddr' in irlan_provider_connect_indication()
    atl1c: atl1c_resume() is only used when CONFIG_PM_SLEEP is defined.
    rxrpc: Fix set but unused variable 'usage' in rxrpc_get_peer().
    rxrpc: Kill set but unused variable 'local' in rxrpc_UDP_error_handler()
    rxrpc: Kill set but unused variable 'sp' in rxrpc_process_connection()
    rxrpc: Kill set but unused variable 'sp' in rxrpc_rotate_tx_window()
    pkt_sched: Kill set but unused variable 'protocol' in tc_classify()
    isdn: capi: Use pr_debug() instead of ifdefs.
    tg3: Update version to 3.119
    tg3: Apply rx_discards fix to 5719/5720
    ...

    Fix up trivial conflicts in arch/x86/Kconfig and net/mac80211/agg-tx.c
    as per Davem.

    Linus Torvalds
     

18 May, 2011

2 commits


08 May, 2011

1 commit


06 May, 2011

1 commit

  • This patch adds a multiple message send syscall and is the send
    version of the existing recvmmsg syscall. This is heavily
    based on the patch by Arnaldo that added recvmmsg.

    I wrote a microbenchmark to test the performance gains of using
    this new syscall:

    http://ozlabs.org/~anton/junkcode/sendmmsg_test.c

    The test was run on a ppc64 box with a 10 Gbit network card. The
    benchmark can send both UDP and RAW ethernet packets.

    64B UDP

    batch   pkts/sec
    1       804570
    2       872800 (+ 8 %)
    4       916556 (+14 %)
    8       939712 (+17 %)
    16      952688 (+18 %)
    32      956448 (+19 %)
    64      964800 (+20 %)

    64B raw socket

    batch   pkts/sec
    1       1201449
    2       1350028 (+12 %)
    4       1461416 (+22 %)
    8       1513080 (+26 %)
    16      1541216 (+28 %)
    32      1553440 (+29 %)
    64      1557888 (+30 %)

    We see a 20% improvement in throughput on UDP send and 30%
    on raw socket send.

    [ Add sparc syscall entries. -DaveM ]

    Signed-off-by: Anton Blanchard
    Signed-off-by: David S. Miller

    Anton Blanchard
     

12 Apr, 2011

2 commits


31 Mar, 2011

1 commit


19 Mar, 2011

1 commit

  • This structure was accidentally defined such that its layout can
    differ between 32-bit and 64-bit processes. Add compat structure
    definitions and an ioctl wrapper function.

    Signed-off-by: Ben Hutchings
    Acked-by: Alexander Duyck
    Cc: stable@kernel.org [2.6.30+]
    Signed-off-by: David S. Miller

    Ben Hutchings
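
    A compat definition plus wrapper of the kind described above usually
    looks like the sketch below. The structure here is hypothetical (the log
    does not name the affected one), and the cause shown, members whose size
    depends on the ABI, is only one common reason for a layout mismatch; it
    is not claimed to be the cause in this commit.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical native structure: `unsigned long` and the pointer are
 * 4 bytes in a 32-bit process and 8 bytes in a 64-bit one, so the two
 * ABIs disagree about its layout. */
struct req {
    uint32_t cmd;
    unsigned long count;
    void *buf;
};

/* Compat definition: fixed-width members only, matching what a 32-bit
 * process passes. The pointer travels as a 32-bit value. */
struct compat_req {
    uint32_t cmd;
    uint32_t count;
    uint32_t buf;
};

/* Wrapper step: widen each field into the native structure before the
 * regular handler sees it. */
static void req_from_compat(struct req *dst, const struct compat_req *src)
{
    memset(dst, 0, sizeof(*dst));
    dst->cmd = src->cmd;
    dst->count = src->count;
    dst->buf = (void *)(uintptr_t)src->buf;
}
```

    The compat structure has the same size on every ABI, so the wrapper can
    copy it field by field regardless of how the native one is laid out.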
     

24 Feb, 2011

1 commit


23 Feb, 2011

1 commit


13 Jan, 2011

1 commit


08 Jan, 2011

1 commit

  • …t/npiggin/linux-npiggin

    * 'vfs-scale-working' of git://git.kernel.org/pub/scm/linux/kernel/git/npiggin/linux-npiggin: (57 commits)
    fs: scale mntget/mntput
    fs: rename vfsmount counter helpers
    fs: implement faster dentry memcmp
    fs: prefetch inode data in dcache lookup
    fs: improve scalability of pseudo filesystems
    fs: dcache per-inode inode alias locking
    fs: dcache per-bucket dcache hash locking
    bit_spinlock: add required includes
    kernel: add bl_list
    xfs: provide simple rcu-walk ACL implementation
    btrfs: provide simple rcu-walk ACL implementation
    ext2,3,4: provide simple rcu-walk ACL implementation
    fs: provide simple rcu-walk generic_check_acl implementation
    fs: provide rcu-walk aware permission i_ops
    fs: rcu-walk aware d_revalidate method
    fs: cache optimise dentry and inode for rcu-walk
    fs: dcache reduce branches in lookup path
    fs: dcache remove d_mounted
    fs: fs_struct use seqlock
    fs: rcu-walk for path lookup
    ...

    Linus Torvalds
     

07 Jan, 2011

5 commits

  • The problem that this patch aims to fix is vfsmount refcounting scalability.
    We need to take a reference on the vfsmount for every successful path lookup,
    and these lookups often go to the same mount point.

    The fundamental difficulty is that a "simple" reference count can never be made
    scalable, because any time a reference is dropped, we must check whether that
    was the last reference. To do that requires communication with all other CPUs
    that may have taken a reference count.

    We can make refcounts more scalable in a couple of ways, involving keeping
    distributed counters, and checking for the global-zero condition less
    frequently.

    - check the global sum once every interval (this will delay zero detection
    for some interval, so it's probably a showstopper for vfsmounts).

    - keep a local count and only take the global sum when local reaches 0 (this
    is difficult for vfsmounts, because we can't hold preempt off for the life of
    a reference, so a counter would need to be per-thread or tied strongly to a
    particular CPU which requires more locking).

    - keep a local difference of increments and decrements, which allows us to sum
    the total difference and hence find the refcount when summing all CPUs. Then,
    keep a single integer "long" refcount for slow and long lasting references,
    and only take the global sum of local counters when the long refcount is 0.

    This last scheme is what I implemented here. Attached mounts and process root
    and working directory references are "long" references, and everything else is
    a short reference.

    This allows scalable vfsmount references during path walking over mounted
    subtrees and unattached (lazy umounted) mounts with processes still running
    in them.

    This results in one fewer atomic op in the fastpath: mntget is now just a
    per-CPU inc, rather than an atomic inc; and mntput just requires a spinlock
    and non-atomic decrement in the common case. However, the code is otherwise
    bigger and heavier, so single-threaded performance is basically a wash.

    Signed-off-by: Nick Piggin

    Nick Piggin
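
    The third scheme above (per-CPU differences plus one "long" count, with
    the global sum taken only when the long count is 0) can be sketched as
    follows. This is a single-threaded model under stated assumptions: the
    names are hypothetical, NR_CPUS is fixed at 4, and the locking and
    per-CPU machinery a real implementation needs are left out.

```c
#define NR_CPUS 4   /* hypothetical fixed CPU count for the sketch */

struct mnt_ref {
    int longrefs;        /* slow, long-lasting references */
    int delta[NR_CPUS];  /* per-CPU increments minus decrements */
};

/* Fast path: touch only the local CPU's counter. */
static void ref_get(struct mnt_ref *m, int cpu)
{
    m->delta[cpu]++;
}

/* Sum of all per-CPU deltas; individual entries may be negative. */
static int ref_sum(const struct mnt_ref *m)
{
    int sum = 0;

    for (int i = 0; i < NR_CPUS; i++)
        sum += m->delta[i];
    return sum;
}

/* Drop a short reference. Returns 1 when the object is now unused.
 * The global sum is taken only when no long reference remains. */
static int ref_put(struct mnt_ref *m, int cpu)
{
    m->delta[cpu]--;
    if (m->longrefs > 0)
        return 0;        /* still pinned: skip the expensive sum */
    return ref_sum(m) == 0;
}

static void ref_long_get(struct mnt_ref *m)
{
    m->longrefs++;
}

/* Drop a long reference; the sum is needed only if it was the last. */
static int ref_long_put(struct mnt_ref *m)
{
    m->longrefs--;
    return m->longrefs == 0 && ref_sum(m) == 0;
}
```

    A get on one CPU and the matching put on another leave +1 and -1 in two
    slots; only the total is meaningful, which is why the zero check has to
    visit every CPU and is worth avoiding while a long reference exists.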
     
  • Regardless of how much we try to scale dcache, there is likely always
    going to be some fundamental contention when adding or removing children
    under the same parent. Pseudo filesystems do not seem to need connected
    dentries because by definition they are disconnected.

    Signed-off-by: Nick Piggin

    Nick Piggin
     
  • Reduce some branches and memory accesses in dcache lookup by adding dentry
    flags to indicate common d_ops are set, rather than having to check them.
    This saves a pointer memory access (dentry->d_op) in common path lookup
    situations, and saves another pointer load and branch in cases where we
    have d_op but not the particular operation.

    Patched with:

    git grep -E '[.>]([[:space:]])*d_op([[:space:]])*=' | xargs sed -e 's/\([^\t ]*\)->d_op = \(.*\);/d_set_d_op(\1, \2);/' -e 's/\([^\t ]*\)\.d_op = \(.*\);/d_set_d_op(\&\1, \2);/' -i

    Signed-off-by: Nick Piggin

    Nick Piggin
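
    The flag-caching idea, and the reason every direct d_op assignment had
    to be rewritten into a setter call, can be sketched with a trimmed-down
    model. All names below (struct ops, struct entry, OP_HASH, OP_COMPARE,
    entry_set_ops, entry_hash) are hypothetical and only mirror the shape of
    the change described above.

```c
#include <stddef.h>

/* Hypothetical, trimmed-down operation table. */
struct ops {
    int (*hash)(const char *name);
    int (*compare)(const char *a, const char *b);
};

#define OP_HASH    0x1u
#define OP_COMPARE 0x2u

struct entry {
    unsigned int flags;      /* caches which operations are present */
    const struct ops *op;
};

/* Install the table and record which operations it provides. Every
 * assignment must go through here so the flags never go stale. */
static void entry_set_ops(struct entry *e, const struct ops *op)
{
    e->op = op;
    e->flags &= ~(OP_HASH | OP_COMPARE);
    if (!op)
        return;
    if (op->hash)
        e->flags |= OP_HASH;
    if (op->compare)
        e->flags |= OP_COMPARE;
}

/* Lookup path: test a flag already at hand instead of loading e->op
 * and then e->op->hash just to find out whether to call it. */
static int entry_hash(const struct entry *e, const char *name)
{
    if (e->flags & OP_HASH)
        return e->op->hash(name);
    return 0;                /* placeholder default for the sketch */
}

/* Example operation used to exercise the sketch. */
static int first_char_hash(const char *name)
{
    return name[0];
}
```

    When the flag is clear, the lookup path never touches the operation
    table at all, which is where the saved loads and branches come from.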
     
  • Pseudo filesystems that don't put inodes on an RCU list or make them
    reachable by rcu-walk dentries do not need to RCU free their inodes.

    Signed-off-by: Nick Piggin

    Nick Piggin
     
  • RCU free the struct inode. This will allow:

    - Subsequent store-free path walking patch. The inode must be consulted for
    permissions when walking, so an RCU inode reference is a must.
    - sb_inode_list_lock to be moved inside i_lock because sb list walkers who want
    to take i_lock no longer need to take sb_inode_list_lock to walk the list in
    the first place. This will simplify and optimize locking.
    - Could remove some nested trylock loops in dcache code
    - Could potentially simplify things a bit in VM land. Do not need to take the
    page lock to follow page->mapping.

    The downside of this is the performance cost of using RCU. In a simple
    creat/unlink microbenchmark, performance drops by about 10% due to inability to
    reuse cache-hot slab objects. As iterations increase and RCU freeing starts
    kicking over, this increases to about 20%.

    In cases where inode lifetimes are longer (ie. many inodes may be allocated
    during the average life span of a single inode), a lot of this cache reuse is
    not applicable, so the regression caused by this patch is smaller.

    The cache-hot regression could largely be avoided by using SLAB_DESTROY_BY_RCU,
    however this adds some complexity to list walking and store-free path walking,
    so I prefer to implement this at a later date, if it is shown to be a win in
    real situations. I haven't found a regression in any non-micro benchmark so I
    doubt it will be a problem.

    Signed-off-by: Nick Piggin

    Nick Piggin
     

18 Dec, 2010

1 commit


11 Dec, 2010

1 commit


13 Nov, 2010

1 commit


31 Oct, 2010

2 commits

  • * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6:
    isdn: mISDN: socket: fix information leak to userland
    netdev: can: Change mail address of Hans J. Koch
    pcnet_cs: add new_id
    net: Truncate recvfrom and sendto length to INT_MAX.
    RDS: Let rds_message_alloc_sgs() return NULL
    RDS: Copy rds_iovecs into kernel memory instead of rereading from userspace
    RDS: Clean up error handling in rds_cmsg_rdma_args
    RDS: Return -EINVAL if rds_rdma_pages returns an error
    net: fix rds_iovec page count overflow
    can: pch_can: fix section mismatch warning by using a whitelisted name
    can: pch_can: fix sparse warning
    netxen_nic: Fix the tx queue manipulation bug in netxen_nic_probe
    ip_gre: fix fallback tunnel setup
    vmxnet: trivial annotation of protocol constant
    vmxnet3: remove unnecessary byteswapping in BAR writing macros
    ipv6/udp: report SndbufErrors and RcvbufErrors
    phy/marvell: rename 88ec048 to 88e1318s and fix mscr1 addr

    Linus Torvalds
     
  • Signed-off-by: Linus Torvalds
    Signed-off-by: David S. Miller

    Linus Torvalds
     

29 Oct, 2010

1 commit


27 Oct, 2010

2 commits

  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: (52 commits)
    split invalidate_inodes()
    fs: skip I_FREEING inodes in writeback_sb_inodes
    fs: fold invalidate_list into invalidate_inodes
    fs: do not drop inode_lock in dispose_list
    fs: inode split IO and LRU lists
    fs: switch bdev inode bdi's correctly
    fs: fix buffer invalidation in invalidate_list
    fsnotify: use dget_parent
    smbfs: use dget_parent
    exportfs: use dget_parent
    fs: use RCU read side protection in d_validate
    fs: clean up dentry lru modification
    fs: split __shrink_dcache_sb
    fs: improve DCACHE_REFERENCED usage
    fs: use percpu counter for nr_dentry and nr_dentry_unused
    fs: simplify __d_free
    fs: take dcache_lock inside __d_path
    fs: do not assign default i_ino in new_inode
    fs: introduce a per-cpu last_ino allocator
    new helper: ihold()
    ...

    Linus Torvalds
     
  • * 'for-2.6.37' of git://linux-nfs.org/~bfields/linux: (99 commits)
    svcrpc: svc_tcp_sendto XPT_DEAD check is redundant
    svcrpc: no need for XPT_DEAD check in svc_xprt_enqueue
    svcrpc: assume svc_delete_xprt() called only once
    svcrpc: never clear XPT_BUSY on dead xprt
    nfsd4: fix connection allocation in sequence()
    nfsd4: only require krb5 principal for NFSv4.0 callbacks
    nfsd4: move minorversion to client
    nfsd4: delay session removal till free_client
    nfsd4: separate callback change and callback probe
    nfsd4: callback program number is per-session
    nfsd4: track backchannel connections
    nfsd4: confirm only on succesful create_session
    nfsd4: make backchannel sequence number per-session
    nfsd4: use client pointer to backchannel session
    nfsd4: move callback setup into session init code
    nfsd4: don't cache seq_misordered replies
    SUNRPC: Properly initialize sock_xprt.srcaddr in all cases
    SUNRPC: Use conventional switch statement when reclassifying sockets
    sunrpc/xprtrdma: clean up workqueue usage
    sunrpc: Turn list_for_each-s into the ..._entry-s
    ...

    Fix up trivial conflicts (two different deprecation notices added in
    separate branches) in Documentation/feature-removal-schedule.txt

    Linus Torvalds
     

26 Oct, 2010

2 commits

  • Instead of always assigning an increasing inode number in new_inode,
    move the call to assign it into those callers that actually need it.
    For now the set of callers that need it is estimated conservatively;
    that is, the call is added to all filesystems that do not assign an
    i_ino by themselves. For a few more filesystems we can avoid assigning
    any inode number given that they aren't user visible, and for others
    it could be done lazily when an inode number is actually needed,
    but that's left for later patches.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Dave Chinner
    Signed-off-by: Al Viro

    Christoph Hellwig
     
  • Clones an existing reference to inode; caller must already hold one.

    Signed-off-by: Al Viro

    Al Viro
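
    The contract stated above (clone a reference the caller already holds)
    can be illustrated with a minimal model. struct obj and obj_hold() are
    hypothetical, and a plain int replaces the atomic counter a real
    implementation would need.

```c
#include <assert.h>

/* Hypothetical refcounted object. */
struct obj {
    int count;
};

/* Take an additional reference. The caller must already hold one, so a
 * count of zero here indicates a bug in the caller, not a valid state. */
static void obj_hold(struct obj *o)
{
    assert(o->count > 0);
    o->count++;
}
```

    Because the precondition rules out a zero count, the helper never has to
    revive an object that is already being torn down.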
     

24 Oct, 2010

1 commit

  • * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6: (1699 commits)
    bnx2/bnx2x: Unsupported Ethtool operations should return -EINVAL.
    vlan: Calling vlan_hwaccel_do_receive() is always valid.
    tproxy: use the interface primary IP address as a default value for --on-ip
    tproxy: added IPv6 support to the socket match
    cxgb3: function namespace cleanup
    tproxy: added IPv6 support to the TPROXY target
    tproxy: added IPv6 socket lookup function to nf_tproxy_core
    be2net: Changes to use only priority codes allowed by f/w
    tproxy: allow non-local binds of IPv6 sockets if IP_TRANSPARENT is enabled
    tproxy: added tproxy sockopt interface in the IPV6 layer
    tproxy: added udp6_lib_lookup function
    tproxy: added const specifiers to udp lookup functions
    tproxy: split off ipv6 defragmentation to a separate module
    l2tp: small cleanup
    nf_nat: restrict ICMP translation for embedded header
    can: mcp251x: fix generation of error frames
    can: mcp251x: fix endless loop in interrupt handler if CANINTF_MERRF is set
    can-raw: add msg_flags to distinguish local traffic
    9p: client code cleanup
    rds: make local functions/variables static
    ...

    Fix up conflicts in net/core/dev.c, drivers/net/pcmcia/smc91c92_cs.c and
    drivers/net/wireless/ath/ath9k/debug.c as per David

    Linus Torvalds
     

21 Oct, 2010

1 commit