17 Dec, 2009

5 commits


02 Dec, 2009

1 commit


12 Nov, 2009

2 commits

  • We have two implementations of the compat_ioctl handling for ATM, the
    one that we have had for ages in fs/compat_ioctl.c and the one added to
    net/atm/ioctl.c by David Woodhouse. Unfortunately, both versions are
    incomplete, and in practice we use a very confusing combination of the
    two.

    For ioctl numbers that have the same identifier on 32 and 64 bit systems,
    we go directly through the compat_ioctl socket operation, for those that
    differ, we do a conversion in fs/compat_ioctl.c.

    This patch moves both variants into the vcc_compat_ioctl() function,
    while preserving the current behaviour. It also kills off the COMPATIBLE_IOCTL
    definitions that we never use here.
    Doing it this way is clearly not a good solution, but I hope it is a
    step in the right direction, so that someone is able to clean up this
    mess for real.

    Signed-off-by: Arnd Bergmann
    Cc: Eric Dumazet
    Cc: David Woodhouse
    Signed-off-by: David S. Miller

    Arnd Bergmann
     
  • Handling for SIOCSHWTSTAMP is broken on architectures
    with a split user/kernel address space like s390,
    because it passes a real user pointer while using
    set_fs(KERNEL_DS).
    A similar problem might arise the next time somebody
    adds code to dev_ifsioc.

    Split up dev_ifsioc into three separate functions for
    SIOCSHWTSTAMP, SIOC*IFMAP and all other numbers so
    we can get rid of set_fs in all potentially affected
    cases.

    Signed-off-by: Arnd Bergmann
    Cc: Patrick Ohly
    Cc: David S. Miller
    Signed-off-by: David S. Miller

    Arnd Bergmann
     

09 Nov, 2009

2 commits

  • This adds compat_ioctl support for SIOCWANDEV, which has
    always been missing.

    The definition of struct compat_ifreq was missing an
    ifru_settings field that is needed to support SIOCWANDEV,
    so add that and clean up the whitespace damage in the
    struct definition.

    Signed-off-by: Arnd Bergmann
    Signed-off-by: David S. Miller

    Arnd Bergmann
     
  • SIOCGMIIPHY and SIOCGMIIREG return data through ifreq,
    so it needs to be converted on the way out as well.

    SIOCGIFPFLAGS is unused, but has the same problem in theory.

    Signed-off-by: Arnd Bergmann
    Signed-off-by: David S. Miller

    Arnd Bergmann
     

07 Nov, 2009

3 commits

  • The MII ioctls and SIOCSIFNAME need to go through ifsioc conversion,
    which they never did so far. Some others are not implemented in the
    native path, so we can just return -EINVAL directly.

    Add IFSLAVE ioctls to the EINVAL list and move it to the end to
    optimize the code path for the common case.

    Signed-off-by: Arnd Bergmann
    Signed-off-by: David S. Miller

    Arnd Bergmann
     
  • This removes the original socket compat_ioctl code
    from fs/compat_ioctl.c and converts the code from the copy
    in net/socket.c into a single function. We add a few cycles
    of runtime to compat_sock_ioctl() with the long switch()
    statement, but gain some cycles in return by simplifying
    the call chain to get there.

    Due to better inlining, we save 1.5kb of object size in the
    process, and enable further savings:

    before:
       text    data     bss     dec     hex filename
      13540   18008    2080   33628    835c obj/fs/compat_ioctl.o
      14565     636      40   15241    3b89 obj/net/socket.o

    after:
       text    data     bss     dec     hex filename
       8916   15176    2080   26172    663c obj/fs/compat_ioctl.o
      20725     636      40   21401    5399 obj/net/socket.o

    Signed-off-by: Arnd Bergmann
    Signed-off-by: David S. Miller

    Arnd Bergmann
     
  • This makes an identical copy of the socket compat_ioctl code
    from fs/compat_ioctl.c to net/socket.c, as a preparation
    for moving the functionality in a way that can be easily
    reviewed.

    The code is hidden inside of #if 0 and gets activated in the
    patch that will make it work.

    Signed-off-by: Arnd Bergmann
    Signed-off-by: David S. Miller

    Arnd Bergmann
     

06 Nov, 2009

1 commit

  • The generic __sock_create function has a kern argument which allows the
    security system to make decisions based on whether a socket is being created by
    the kernel or by userspace. This patch passes that flag to the
    net_proto_family specific create function, so it can do the same thing.

    Signed-off-by: Eric Paris
    Acked-by: Arnaldo Carvalho de Melo
    Signed-off-by: David S. Miller

    Eric Paris
     

13 Oct, 2009

2 commits

  • recvmmsg receives multiple messages in one call, reducing the number of
    syscalls and net stack entry/exit operations.

    Next patches will introduce mechanisms where protocols that want to
    optimize this operation will provide an unlocked_recvmsg operation.

    This takes into account comments made by:

    . Paul Moore: sock_recvmsg is called only for the first datagram,
    sock_recvmsg_nosec is used for the rest.

    . Caitlin Bestler: recvmmsg now has a struct timespec timeout, that
    works in the same fashion as the ppoll one.

    If the underlying protocol returns a datagram with MSG_OOB set, this
    will make recvmmsg return right away with as many datagrams (+ the OOB
    one) as it has received so far.

    . Rémi Denis-Courmont & Steven Whitehouse: If we receive N < vlen
    datagrams and then recvmsg returns an error, recvmmsg will return
    the successfully received datagrams, store the error and return it
    in the next call.

    This paves the way for a subsequent optimization, sk_prot->unlocked_recvmsg,
    where we will be able to acquire the lock only at batch start and end, not at
    every underlying recvmsg call.

    Signed-off-by: Arnaldo Carvalho de Melo
    Signed-off-by: David S. Miller

    Arnaldo Carvalho de Melo
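
A minimal user-space sketch of the batching described above, assuming a C library that exposes the recvmmsg() wrapper under _GNU_SOURCE. The names recv_batch, struct batch, BATCH and BUFSZ are hypothetical and not part of the patch. MSG_DONTWAIT is used so the call returns whatever is already queued instead of blocking.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <time.h>

#define BATCH 8   /* hypothetical batch size */
#define BUFSZ 64  /* hypothetical per-datagram buffer size */

struct batch {
    char data[BATCH][BUFSZ];
    unsigned int len[BATCH];
};

/*
 * Receive up to vlen datagrams with a single recvmmsg() call.
 * Returns the number of datagrams received, or -1 with errno set.
 */
static int recv_batch(int fd, struct batch *b, unsigned int vlen)
{
    struct mmsghdr msgs[BATCH];
    struct iovec iov[BATCH];
    struct timespec timeout = { .tv_sec = 1, .tv_nsec = 0 };
    unsigned int i;
    int n;

    assert(vlen <= BATCH);
    memset(msgs, 0, sizeof(msgs));
    for (i = 0; i < vlen; i++) {
        iov[i].iov_base = b->data[i];
        iov[i].iov_len = BUFSZ;
        msgs[i].msg_hdr.msg_iov = &iov[i];
        msgs[i].msg_hdr.msg_iovlen = 1;
    }

    /* MSG_DONTWAIT: take what is queued rather than block for vlen messages */
    n = recvmmsg(fd, msgs, vlen, MSG_DONTWAIT, &timeout);

    /* msg_len holds the byte count of each received datagram */
    for (i = 0; n > 0 && i < (unsigned int)n; i++)
        b->len[i] = msgs[i].msg_len;
    return n;
}
```

A caller would pass a datagram socket and then walk b->data[i] / b->len[i] for i below the return value; one syscall replaces up to BATCH separate recvmsg() calls.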
     
  • Create a new socket level option to report number of queue overflows

    Recently I augmented the AF_PACKET protocol to report the number of frames lost
    on the socket receive queue between any two enqueued frames. This value was
    exported via a SOL_PACKET level cmsg. After I completed that work it was
    requested that this feature be generalized so that any datagram oriented socket
    could make use of this option. As such I've created this patch. It creates a
    new SOL_SOCKET level option called SO_RXQ_OVFL, which when enabled exports a
    SOL_SOCKET level cmsg that reports the number of times the sk_receive_queue
    overflowed between any two given frames. It also augments the AF_PACKET
    protocol to take advantage of this new feature (as it previously did not touch
    sk->sk_drops, which this patch uses to record the overflow count). Tested
    successfully by me.

    Notes:

    1) Unlike my previous patch, this patch simply records the sk_drops value, which
    is not a number of drops between packets, but rather a total number of drops.
    Deltas must be computed in user space.

    2) While this patch currently works with datagram oriented protocols, it will
    also be accepted by non-datagram oriented protocols. I'm not sure if that's
    agreeable to everyone, but my argument in favor of doing so is that, for those
    protocols which aren't applicable to this option, sk_drops will always be zero,
    and reporting no drops on a receive queue that isn't used for those
    non-participating protocols seems reasonable to me. This also saves us having
    to code in a per-protocol opt in mechanism.

    3) This applies cleanly to net-next assuming that commit
    977750076d98c7ff6cbda51858bb5a5894a9d9ab (my af packet cmsg patch) is reverted

    Signed-off-by: Neil Horman
    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Neil Horman
     

10 Oct, 2009

1 commit


08 Oct, 2009

1 commit

  • Refactor wext to
    * split out iwpriv handling
    * split out iwspy handling
    * split out procfs support
    * allow cfg80211 to have wireless extensions compat code
    w/o CONFIG_WIRELESS_EXT

    After this, drivers need to
    - select WIRELESS_EXT - for wext support
    - select WEXT_PRIV - for iwpriv support
    - select WEXT_SPY - for iwspy support

    except cfg80211 -- which gets new hooks in wext-core.c
    and can then get wext handlers without CONFIG_WIRELESS_EXT.

    Wireless extensions procfs support is auto-selected
    based on PROC_FS and anything that requires the wext core
    (i.e. WIRELESS_EXT or CFG80211_WEXT).

    Signed-off-by: Johannes Berg
    Signed-off-by: John W. Linville

    Johannes Berg
     

07 Oct, 2009

1 commit

  • An incoming datagram must bring a *lot* of cache lines into the cpu cache,
    in particular : (other parts omitted (hash chains, ip route cache...))

    On 32bit arches :

    offsetof(struct sock, sk_rcvbuf)        = 0x30  (read)
    offsetof(struct sock, sk_lock)          = 0x34  (rw)

    offsetof(struct sock, sk_sleep)         = 0x50  (read)
    offsetof(struct sock, sk_rmem_alloc)    = 0x64  (rw)
    offsetof(struct sock, sk_receive_queue) = 0x74  (rw)

    offsetof(struct sock, sk_forward_alloc) = 0x98  (rw)

    offsetof(struct sock, sk_callback_lock) = 0xcc  (rw)
    offsetof(struct sock, sk_drops)         = 0xd8  (read if we add dropcount support, rw if frame dropped)
    offsetof(struct sock, sk_filter)        = 0xf8  (read)

    offsetof(struct sock, sk_socket)        = 0x138 (read)

    offsetof(struct sock, sk_data_ready)    = 0x15c (read)

    We can avoid sk->sk_socket and socket->fasync_list referencing on sockets
    with no fasync() structures. (socket->fasync_list ptr is probably already in cache
    because it shares a cache line with socket->wait, ie location pointed by sk->sk_sleep)

    This avoids one cache line load per incoming packet for common cases (no fasync())

    We can leave (or even move in a future patch) sk->sk_socket in a cold location

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     

01 Oct, 2009

1 commit

  • This provides safety against negative optlen at the type
    level instead of depending upon (sometimes non-trivial)
    checks against this sprinkled all over the place, in
    each and every implementation.

    Based upon work done by Arjan van de Ven and feedback
    from Linus Torvalds.

    Signed-off-by: David S. Miller

    David S. Miller
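
To illustrate why the type matters, here is a small stand-alone sketch (not the kernel code). The function names and OPT_MAX are hypothetical. With a signed length, an upper-bound clamp alone lets a negative value through, and it turns into a huge size once converted for a copy; with an unsigned length the same clamp is sufficient.

```c
#include <assert.h>
#include <stddef.h>

#define OPT_MAX 16  /* hypothetical size of the kernel-side option buffer */

/* Signed length: a negative value slips past the upper-bound clamp. */
static size_t copy_len_signed(int optlen)
{
    if (optlen > OPT_MAX)
        optlen = OPT_MAX;
    return (size_t)optlen;  /* -1 converts to a huge size_t value */
}

/* Unsigned length: the same clamp now covers every possible input. */
static size_t copy_len_unsigned(unsigned int optlen)
{
    if (optlen > OPT_MAX)
        optlen = OPT_MAX;
    return optlen;
}

/* Small self-check showing the difference for the same bit pattern. */
static void optlen_demo(void)
{
    assert(copy_len_signed(-1) > OPT_MAX);                   /* unsafe */
    assert(copy_len_unsigned((unsigned int)-1) == OPT_MAX);  /* clamped */
}
```

The signed variant would need an additional explicit `optlen < 0` check in every implementation, which is the scattered checking the commit message refers to.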
     

29 Sep, 2009

1 commit

  • The sys_socketcall() function has a very clever system for the copy
    size of its arguments. Unfortunately, gcc cannot deal with this in
    terms of proving that the copy_from_user() is then always in bounds.
    This is the last (well 9th of this series, but last in the kernel) such
    case around.

    With this patch, we can turn on the code that requires copy bounds to be
    provably right for the whole kernel, and detect the introduction of new
    security accidents of this type early on.

    Signed-off-by: Arjan van de Ven
    Signed-off-by: David S. Miller

    Arjan van de Ven
     

23 Sep, 2009

1 commit

  • Move various magic-number definitions into magic.h.

    Signed-off-by: Nick Black
    Acked-by: Pekka Enberg
    Cc: Al Viro
    Cc: "David S. Miller"
    Cc: Casey Schaufler
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nick Black
     

22 Sep, 2009

1 commit


15 Sep, 2009

1 commit


14 Aug, 2009

1 commit

  • kernel_sendpage() does the proper default case handling for when the
    socket doesn't have a native sendpage implementation.

    Now, arguably this might be something that we could instead solve by
    just specifying that all protocols should do it themselves at the
    protocol level, but we really only care about the common protocols.
    Does anybody really care about sendpage on something like Appletalk? Not
    likely.

    Acked-by: David S. Miller
    Acked-by: Julien TINNES
    Acked-by: Tavis Ormandy
    Cc: stable@kernel.org
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     

07 Apr, 2009

1 commit

  • * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6:
    b44: Use kernel DMA addresses for the kernel DMA API
    forcedeth: Fix resume from hibernation regression.
    xfrm: fix fragmentation on inter family tunnels
    ibm_newemac: Fix dangerous struct assumption
    gigaset: documentation update
    gigaset: in file ops, check for device disconnect before anything else
    bas_gigaset: use tasklet_hi_schedule for timing critical tasklets
    net/802/fddi.c: add MODULE_LICENSE
    smsc911x: remove unused #include
    axnet_cs: fix phy_id detection for bogus Asix chip.
    bnx2: Use request_firmware()
    b44: Fix sizes passed to b44_sync_dma_desc_for_{device,cpu}()
    socket: use percpu_add() while updating sockets_in_use
    virtio_net: Set the mac config only when VIRITO_NET_F_MAC
    myri_sbus: use request_firmware
    e1000: fix loss of multicast packets
    vxge: should include tcp.h

    Conflict in firmware/WHENCE (SCSI vs net firmware)

    Linus Torvalds
     

05 Apr, 2009

1 commit

  • sock_alloc() currently uses the following code to update sockets_in_use

    get_cpu_var(sockets_in_use)++;
    put_cpu_var(sockets_in_use);

    This translates to :

    c0436274: b8 01 00 00 00 mov $0x1,%eax
    c0436279: e8 42 40 df ff call c022a2c0
    c043627e: bb 20 4f 6a c0 mov $0xc06a4f20,%ebx
    c0436283: e8 18 ca f0 ff call c0342ca0
    c0436288: 03 1c 85 60 4a 65 c0 add -0x3f9ab5a0(,%eax,4),%ebx
    c043628f: ff 03 incl (%ebx)
    c0436291: b8 01 00 00 00 mov $0x1,%eax
    c0436296: e8 75 3f df ff call c022a210
    c043629b: 89 e0 mov %esp,%eax
    c043629d: 25 00 e0 ff ff and $0xffffe000,%eax
    c04362a2: f6 40 08 08 testb $0x8,0x8(%eax)
    c04362a6: 75 07 jne c04362af
    c04362a8: 8d 46 d8 lea -0x28(%esi),%eax
    c04362ab: 5b pop %ebx
    c04362ac: 5e pop %esi
    c04362ad: c9 leave
    c04362ae: c3 ret
    c04362af: e8 cc 5d 09 00 call c04cc080
    c04362b4: 8d 74 26 00 lea 0x0(%esi,%eiz,1),%esi
    c04362b8: eb ee jmp c04362a8

    While percpu_add(sockets_in_use, 1) translates to a single instruction :

    c0436275: 64 83 05 20 5f 6a c0 addl $0x1,%fs:0xc06a5f20

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     

28 Mar, 2009

3 commits

  • The socket_post_accept() hook is not currently used by any in-tree modules
    and its existence continues to cause problems by confusing people about
    what can be safely accomplished using this hook. If a legitimate need for
    this hook arises in the future it can always be reintroduced.

    Signed-off-by: Paul Moore
    Signed-off-by: James Morris

    Paul Moore
     
  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: (37 commits)
    fs: avoid I_NEW inodes
    Merge code for single and multiple-instance mounts
    Remove get_init_pts_sb()
    Move common mknod_ptmx() calls into caller
    Parse mount options just once and copy them to super block
    Unroll essentials of do_remount_sb() into devpts
    vfs: simple_set_mnt() should return void
    fs: move bdev code out of buffer.c
    constify dentry_operations: rest
    constify dentry_operations: configfs
    constify dentry_operations: sysfs
    constify dentry_operations: JFS
    constify dentry_operations: OCFS2
    constify dentry_operations: GFS2
    constify dentry_operations: FAT
    constify dentry_operations: FUSE
    constify dentry_operations: procfs
    constify dentry_operations: ecryptfs
    constify dentry_operations: CIFS
    constify dentry_operations: AFS
    ...

    Linus Torvalds
     
  • Signed-off-by: Al Viro

    Al Viro
     

27 Mar, 2009

1 commit


16 Mar, 2009

1 commit

  • Removing the BKL from FASYNC handling ran into the challenge of keeping the
    setting of the FASYNC bit in filp->f_flags atomic with regard to calls to
    the underlying fasync() function. Andi Kleen suggested moving the handling
    of that bit into fasync(); this patch does exactly that. As a result, we
    have a couple of internal API changes: fasync() must now manage the FASYNC
    bit, and it will be called without the BKL held.

    As it happens, every fasync() implementation in the kernel with one
    exception calls fasync_helper(). So, if we make fasync_helper() set the
    FASYNC bit, we can avoid making any changes to the other fasync()
    functions - as long as those functions, themselves, have proper locking.
    Most fasync() implementations do nothing but call fasync_helper() - which
    has its own lock - so they are easily verified as correct. The BKL had
    already been pushed down into the rest.

    The networking code has its own version of fasync_helper(), so that code
    has been augmented with explicit FASYNC bit handling.

    Cc: Al Viro
    Cc: David Miller
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Jonathan Corbet

    Jonathan Corbet
     

16 Feb, 2009

1 commit

  • The overlap with the old SO_TIMESTAMP[NS] options is handled so
    that time stamping in software (net_enable_timestamp()) is
    enabled when SO_TIMESTAMP[NS] and/or SO_TIMESTAMPING_RX_SOFTWARE
    is set. It's disabled if all of these are off.

    Signed-off-by: Patrick Ohly
    Signed-off-by: David S. Miller

    Patrick Ohly
     

14 Jan, 2009

3 commits


05 Jan, 2009

2 commits


29 Dec, 2008

1 commit

  • * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6: (1429 commits)
    net: Allow dependancies of FDDI & Tokenring to be modular.
    igb: Fix build warning when DCA is disabled.
    net: Fix warning fallout from recent NAPI interface changes.
    gro: Fix potential use after free
    sfc: If AN is enabled, always read speed/duplex from the AN advertising bits
    sfc: When disabling the NIC, close the device rather than unregistering it
    sfc: SFT9001: Add cable diagnostics
    sfc: Add support for multiple PHY self-tests
    sfc: Merge top-level functions for self-tests
    sfc: Clean up PHY mode management in loopback self-test
    sfc: Fix unreliable link detection in some loopback modes
    sfc: Generate unique names for per-NIC workqueues
    802.3ad: use standard ethhdr instead of ad_header
    802.3ad: generalize out mac address initializer
    802.3ad: initialize ports LACPDU from const initializer
    802.3ad: remove typedef around ad_system
    802.3ad: turn ports is_individual into a bool
    802.3ad: turn ports is_enabled into a bool
    802.3ad: make ntt bool
    ixgbe: Fix set_ringparam in ixgbe to use the same memory pools.
    ...

    Fixed trivial IPv4/6 address printing conflicts in fs/cifs/connect.c due
    to the conversion to %pI (in this networking merge) and the addition of
    doing IPv6 addresses (from the earlier merge of CIFS).

    Linus Torvalds
     

25 Dec, 2008

1 commit