13 Jan, 2011

1 commit


08 Jan, 2011

1 commit

  • …t/npiggin/linux-npiggin

    * 'vfs-scale-working' of git://git.kernel.org/pub/scm/linux/kernel/git/npiggin/linux-npiggin: (57 commits)
    fs: scale mntget/mntput
    fs: rename vfsmount counter helpers
    fs: implement faster dentry memcmp
    fs: prefetch inode data in dcache lookup
    fs: improve scalability of pseudo filesystems
    fs: dcache per-inode inode alias locking
    fs: dcache per-bucket dcache hash locking
    bit_spinlock: add required includes
    kernel: add bl_list
    xfs: provide simple rcu-walk ACL implementation
    btrfs: provide simple rcu-walk ACL implementation
    ext2,3,4: provide simple rcu-walk ACL implementation
    fs: provide simple rcu-walk generic_check_acl implementation
    fs: provide rcu-walk aware permission i_ops
    fs: rcu-walk aware d_revalidate method
    fs: cache optimise dentry and inode for rcu-walk
    fs: dcache reduce branches in lookup path
    fs: dcache remove d_mounted
    fs: fs_struct use seqlock
    fs: rcu-walk for path lookup
    ...

    Linus Torvalds
     

07 Jan, 2011

5 commits

  • The problem that this patch aims to fix is vfsmount refcounting scalability.
    We need to take a reference on the vfsmount for every successful path lookup,
    and those lookups often go to the same mount point.

    The fundamental difficulty is that a "simple" reference count can never be made
    scalable, because any time a reference is dropped, we must check whether that
    was the last reference. To do that requires communication with all other CPUs
    that may have taken a reference count.

    We can make refcounts more scalable in a couple of ways, involving keeping
    distributed counters, and checking for the global-zero condition less
    frequently.

    - check the global sum once every interval (this will delay zero detection
    for some interval, so it's probably a showstopper for vfsmounts).

    - keep a local count and only take the global sum when the local count reaches 0 (this
    is difficult for vfsmounts, because we can't hold preempt off for the life of
    a reference, so a counter would need to be per-thread or tied strongly to a
    particular CPU which requires more locking).

    - keep a local difference of increments and decrements, which allows us to sum
    the total difference and hence find the refcount when summing all CPUs. Then,
    keep a single integer "long" refcount for slow and long lasting references,
    and only take the global sum of local counters when the long refcount is 0.

    This last scheme is what I implemented here. Attached mounts and process root
    and working directory references are "long" references, and everything else is
    a short reference.

    This allows scalable vfsmount references during path walking over mounted
    subtrees and unattached (lazy umounted) mounts with processes still running
    in them.

    This results in one fewer atomic op in the fastpath: mntget is now just a
    per-CPU inc, rather than an atomic inc; and mntput just requires a spinlock
    and non-atomic decrement in the common case. However code is otherwise bigger
    and heavier, so single threaded performance is basically a wash.
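
The third scheme described above can be sketched in plain C. This is a minimal single-threaded model, not the kernel implementation: the names `mnt_sketch`, `sketch_get`, `sketch_put` and `sketch_is_unused` and the fixed `NR_CPUS` value are assumptions made for illustration, and real code would use per-CPU operations plus a lock around the summing step.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS 4 /* hypothetical fixed CPU count for this sketch */

struct mnt_sketch {
    int short_delta[NR_CPUS]; /* per-CPU increments minus decrements */
    int long_refs;            /* slow, long-lasting references */
};

/* Fast path: take a short reference by bumping only this CPU's counter. */
static void sketch_get(struct mnt_sketch *m, int cpu)
{
    assert(cpu >= 0 && cpu < NR_CPUS);
    m->short_delta[cpu]++;
}

/* Drop a short reference, possibly on a different CPU than the one that
 * took it, so an individual per-CPU counter may go negative. */
static void sketch_put(struct mnt_sketch *m, int cpu)
{
    assert(cpu >= 0 && cpu < NR_CPUS);
    m->short_delta[cpu]--;
}

/* Slow path: the true short refcount is the sum of all per-CPU deltas. */
static int sketch_sum(const struct mnt_sketch *m)
{
    int sum = 0;
    for (int cpu = 0; cpu < NR_CPUS; cpu++)
        sum += m->short_delta[cpu];
    return sum;
}

/* The global sum is only needed once no long references remain. */
static bool sketch_is_unused(const struct mnt_sketch *m)
{
    if (m->long_refs > 0)
        return false; /* cheap check, no cross-CPU summing */
    return sketch_sum(m) == 0;
}
```

While `long_refs` is non-zero, a put never needs to look at other CPUs' counters, which is what keeps the common case cheap.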

    Signed-off-by: Nick Piggin

    Nick Piggin
     
  • Regardless of how much we possibly try to scale dcache, there is likely
    always going to be some fundamental contention when adding or removing children
    under the same parent. Pseudo filesystems do not seem to need connected
    dentries because by definition they are disconnected.

    Signed-off-by: Nick Piggin

    Nick Piggin
     
  • Reduce some branches and memory accesses in dcache lookup by adding dentry
    flags to indicate common d_ops are set, rather than having to check them.
    This saves a pointer memory access (dentry->d_op) in common path lookup
    situations, and saves another pointer load and branch in cases where we
    have d_op but not the particular operation.

    Patched with:

    git grep -E '[.>]([[:space:]])*d_op([[:space:]])*=' | xargs sed -e 's/\([^\t ]*\)->d_op = \(.*\);/d_set_d_op(\1, \2);/' -e 's/\([^\t ]*\)\.d_op = \(.*\);/d_set_d_op(\&\1, \2);/' -i
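
The idea of caching "which operations are set" in the dentry flags can be sketched as follows. This is a userspace model under stated assumptions: the `sk_` types, the flag values and the helper names are hypothetical, and only the general shape (a setter that records flags so the lookup path can test a flag instead of loading `d_op`) follows the description above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flag bits; the kernel's actual flag names and values differ. */
#define SK_OP_HASH       0x1u
#define SK_OP_COMPARE    0x2u
#define SK_OP_REVALIDATE 0x4u

struct sk_dentry;

struct sk_dentry_ops {
    int (*d_hash)(const struct sk_dentry *);
    int (*d_compare)(const struct sk_dentry *, const struct sk_dentry *);
    int (*d_revalidate)(struct sk_dentry *);
};

struct sk_dentry {
    unsigned int d_flags;
    const struct sk_dentry_ops *d_op;
};

/* Example operation used to populate an ops table. */
static int sk_always_valid(struct sk_dentry *d)
{
    (void)d;
    return 1;
}

/* Set d_op once and mirror the presence of each operation into d_flags. */
static void sk_set_d_op(struct sk_dentry *d, const struct sk_dentry_ops *op)
{
    assert(d->d_op == NULL); /* ops are expected to be set only once */
    d->d_op = op;
    if (!op)
        return;
    if (op->d_hash)
        d->d_flags |= SK_OP_HASH;
    if (op->d_compare)
        d->d_flags |= SK_OP_COMPARE;
    if (op->d_revalidate)
        d->d_flags |= SK_OP_REVALIDATE;
}

/* Lookup path: one flag test, no d_op load and no second pointer load. */
static int sk_needs_revalidate(const struct sk_dentry *d)
{
    return (d->d_flags & SK_OP_REVALIDATE) != 0;
}
```

Because every assignment goes through the setter, the flags cannot drift out of sync with `d_op`, which is why the tree-wide sed conversion above is needed.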

    Signed-off-by: Nick Piggin

    Nick Piggin
     
  • Pseudo filesystems that don't put inodes on an RCU list, and whose inodes
    are not reachable by rcu-walk dentries, do not need to RCU free their inodes.

    Signed-off-by: Nick Piggin

    Nick Piggin
     
  • RCU free the struct inode. This will allow:

    - Subsequent store-free path walking patch. The inode must be consulted for
    permissions when walking, so an RCU inode reference is a must.
    - sb_inode_list_lock to be moved inside i_lock because sb list walkers who want
    to take i_lock no longer need to take sb_inode_list_lock to walk the list in
    the first place. This will simplify and optimize locking.
    - Could remove some nested trylock loops in dcache code
    - Could potentially simplify things a bit in VM land. Do not need to take the
    page lock to follow page->mapping.

    The downside of this is the performance cost of using RCU. In a simple
    creat/unlink microbenchmark, performance drops by about 10% due to inability to
    reuse cache-hot slab objects. As iterations increase and RCU freeing starts
    kicking over, this increases to about 20%.

    In cases where inode lifetimes are longer (ie. many inodes may be allocated
    during the average life span of a single inode), a lot of this cache reuse is
    not applicable, so the regression caused by this patch is smaller.

    The cache-hot regression could largely be avoided by using SLAB_DESTROY_BY_RCU,
    however this adds some complexity to list walking and store-free path walking,
    so I prefer to implement this at a later date, if it is shown to be a win in
    real situations. I haven't found a regression in any non-micro benchmark so I
    doubt it will be a problem.

    Signed-off-by: Nick Piggin

    Nick Piggin
     

18 Dec, 2010

1 commit


11 Dec, 2010

1 commit


13 Nov, 2010

1 commit


31 Oct, 2010

2 commits

  • * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6:
    isdn: mISDN: socket: fix information leak to userland
    netdev: can: Change mail address of Hans J. Koch
    pcnet_cs: add new_id
    net: Truncate recvfrom and sendto length to INT_MAX.
    RDS: Let rds_message_alloc_sgs() return NULL
    RDS: Copy rds_iovecs into kernel memory instead of rereading from userspace
    RDS: Clean up error handling in rds_cmsg_rdma_args
    RDS: Return -EINVAL if rds_rdma_pages returns an error
    net: fix rds_iovec page count overflow
    can: pch_can: fix section mismatch warning by using a whitelisted name
    can: pch_can: fix sparse warning
    netxen_nic: Fix the tx queue manipulation bug in netxen_nic_probe
    ip_gre: fix fallback tunnel setup
    vmxnet: trivial annotation of protocol constant
    vmxnet3: remove unnecessary byteswapping in BAR writing macros
    ipv6/udp: report SndbufErrors and RcvbufErrors
    phy/marvell: rename 88ec048 to 88e1318s and fix mscr1 addr

    Linus Torvalds
     
  • Signed-off-by: Linus Torvalds
    Signed-off-by: David S. Miller

    Linus Torvalds
     

29 Oct, 2010

1 commit


27 Oct, 2010

2 commits

  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: (52 commits)
    split invalidate_inodes()
    fs: skip I_FREEING inodes in writeback_sb_inodes
    fs: fold invalidate_list into invalidate_inodes
    fs: do not drop inode_lock in dispose_list
    fs: inode split IO and LRU lists
    fs: switch bdev inode bdi's correctly
    fs: fix buffer invalidation in invalidate_list
    fsnotify: use dget_parent
    smbfs: use dget_parent
    exportfs: use dget_parent
    fs: use RCU read side protection in d_validate
    fs: clean up dentry lru modification
    fs: split __shrink_dcache_sb
    fs: improve DCACHE_REFERENCED usage
    fs: use percpu counter for nr_dentry and nr_dentry_unused
    fs: simplify __d_free
    fs: take dcache_lock inside __d_path
    fs: do not assign default i_ino in new_inode
    fs: introduce a per-cpu last_ino allocator
    new helper: ihold()
    ...

    Linus Torvalds
     
  • * 'for-2.6.37' of git://linux-nfs.org/~bfields/linux: (99 commits)
    svcrpc: svc_tcp_sendto XPT_DEAD check is redundant
    svcrpc: no need for XPT_DEAD check in svc_xprt_enqueue
    svcrpc: assume svc_delete_xprt() called only once
    svcrpc: never clear XPT_BUSY on dead xprt
    nfsd4: fix connection allocation in sequence()
    nfsd4: only require krb5 principal for NFSv4.0 callbacks
    nfsd4: move minorversion to client
    nfsd4: delay session removal till free_client
    nfsd4: separate callback change and callback probe
    nfsd4: callback program number is per-session
    nfsd4: track backchannel connections
    nfsd4: confirm only on succesful create_session
    nfsd4: make backchannel sequence number per-session
    nfsd4: use client pointer to backchannel session
    nfsd4: move callback setup into session init code
    nfsd4: don't cache seq_misordered replies
    SUNRPC: Properly initialize sock_xprt.srcaddr in all cases
    SUNRPC: Use conventional switch statement when reclassifying sockets
    sunrpc/xprtrdma: clean up workqueue usage
    sunrpc: Turn list_for_each-s into the ..._entry-s
    ...

    Fix up trivial conflicts (two different deprecation notices added in
    separate branches) in Documentation/feature-removal-schedule.txt

    Linus Torvalds
     

26 Oct, 2010

2 commits

  • Instead of always assigning an increasing inode number in new_inode
    move the call to assign it into those callers that actually need it.
    For now the set of callers that need it is estimated conservatively, that is
    the call is added to all filesystems that do not assign an i_ino
    by themselves. For a few more filesystems we can avoid assigning
    any inode number given that they aren't user visible, and for others
    it could be done lazily when an inode number is actually needed,
    but that's left for later patches.
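
A small userspace model of the change may help. It assumes hypothetical `sk_` names throughout; the counter helper is loosely modelled on the kernel's inode-number allocator but is a plain global here rather than a per-CPU one.

```c
#include <assert.h>

struct sk_inode {
    unsigned long i_ino;
};

/* Single global counter; a stand-in for the real allocator. */
static unsigned long sk_last_ino;

static unsigned long sk_get_next_ino(void)
{
    return ++sk_last_ino;
}

/* After the change: allocation no longer hands out an inode number. */
static void sk_new_inode(struct sk_inode *inode)
{
    inode->i_ino = 0;
}

/* A filesystem with no numbering of its own asks for one explicitly. */
static void sk_pseudo_fs_init_inode(struct sk_inode *inode)
{
    sk_new_inode(inode);
    inode->i_ino = sk_get_next_ino();
}

/* A filesystem with on-disk inode numbers never touches the counter. */
static void sk_disk_fs_init_inode(struct sk_inode *inode, unsigned long disk_ino)
{
    assert(disk_ino != 0);
    sk_new_inode(inode);
    inode->i_ino = disk_ino;
}
```

In this model the shared counter advances only for callers that actually need a generated number.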

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Dave Chinner
    Signed-off-by: Al Viro

    Christoph Hellwig
     
  • Clones an existing reference to inode; caller must already hold one.
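
This description appears to match the `ihold()` helper named in the merge listing above. A minimal sketch of such a "clone an existing reference" helper, using C11 atomics and a hypothetical `sk_inode_ref` type in place of the kernel's inode, might look like this:

```c
#include <assert.h>
#include <stdatomic.h>

struct sk_inode_ref {
    atomic_int i_count; /* number of references currently held */
};

/* Take an additional reference. The caller must already hold one, so the
 * count seen before the increment can never legitimately be zero. */
static void sk_ihold(struct sk_inode_ref *inode)
{
    int old = atomic_fetch_add(&inode->i_count, 1);
    assert(old >= 1);
}
```

The assertion documents the contract: this helper is not a lookup primitive and must not be used to revive an object whose count has already dropped to zero.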

    Signed-off-by: Al Viro

    Al Viro
     

24 Oct, 2010

1 commit

  • * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6: (1699 commits)
    bnx2/bnx2x: Unsupported Ethtool operations should return -EINVAL.
    vlan: Calling vlan_hwaccel_do_receive() is always valid.
    tproxy: use the interface primary IP address as a default value for --on-ip
    tproxy: added IPv6 support to the socket match
    cxgb3: function namespace cleanup
    tproxy: added IPv6 support to the TPROXY target
    tproxy: added IPv6 socket lookup function to nf_tproxy_core
    be2net: Changes to use only priority codes allowed by f/w
    tproxy: allow non-local binds of IPv6 sockets if IP_TRANSPARENT is enabled
    tproxy: added tproxy sockopt interface in the IPV6 layer
    tproxy: added udp6_lib_lookup function
    tproxy: added const specifiers to udp lookup functions
    tproxy: split off ipv6 defragmentation to a separate module
    l2tp: small cleanup
    nf_nat: restrict ICMP translation for embedded header
    can: mcp251x: fix generation of error frames
    can: mcp251x: fix endless loop in interrupt handler if CANINTF_MERRF is set
    can-raw: add msg_flags to distinguish local traffic
    9p: client code cleanup
    rds: make local functions/variables static
    ...

    Fix up conflicts in net/core/dev.c, drivers/net/pcmcia/smc91c92_cs.c and
    drivers/net/wireless/ath/ath9k/debug.c as per David

    Linus Torvalds
     

21 Oct, 2010

1 commit


15 Oct, 2010

1 commit

  • All file_operations should get a .llseek operation so we can make
    nonseekable_open the default for future file operations without a
    .llseek pointer.

    The three cases that we can automatically detect are no_llseek, seq_lseek
    and default_llseek. For cases where we can automatically prove that
    the file offset is always ignored, we use noop_llseek, which maintains
    the current behavior of not returning an error from a seek.

    New drivers should normally not use noop_llseek but instead use no_llseek
    and call nonseekable_open at open time. Existing drivers can be converted
    to do the same when the maintainer knows for certain that no user code
    relies on calling seek on the device file.
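
The behavioural difference between the three variants can be modelled in userspace. This is a simplified sketch, assuming a hypothetical `sk_file` type with only a position field; the `sk_default_llseek` model handles just SEEK_SET and SEEK_CUR and does no locking.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h> /* SEEK_SET, SEEK_CUR */

struct sk_file {
    long long f_pos;
};

/* Models no_llseek: seeking is an error on this file. */
static long long sk_no_llseek(struct sk_file *f, long long off, int whence)
{
    (void)f; (void)off; (void)whence;
    return -ESPIPE;
}

/* Models noop_llseek: succeed, but leave the position untouched. */
static long long sk_noop_llseek(struct sk_file *f, long long off, int whence)
{
    (void)off; (void)whence;
    assert(f != NULL);
    return f->f_pos;
}

/* Models default_llseek: actually move the file position. */
static long long sk_default_llseek(struct sk_file *f, long long off, int whence)
{
    long long pos;

    assert(f != NULL);
    switch (whence) {
    case SEEK_SET:
        pos = off;
        break;
    case SEEK_CUR:
        pos = f->f_pos + off;
        break;
    default:
        return -EINVAL; /* other whence values are omitted in this sketch */
    }
    if (pos < 0)
        return -EINVAL;
    f->f_pos = pos;
    return pos;
}
```

The noop variant preserves the old behaviour for drivers that ignore the offset: user code that calls seek keeps working, while the no_llseek variant would start returning an error to it.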

    The generated code is often incorrectly indented and right now contains
    comments that clarify for each added line why a specific variant was
    chosen. In the version that gets submitted upstream, the comments will
    be gone and I will manually fix the indentation, because there does not
    seem to be a way to do that using coccinelle.

    Some amount of new code is currently sitting in linux-next that should get
    the same modifications, which I will do at the end of the merge window.

    Many thanks to Julia Lawall for helping me learn to write a semantic
    patch that does all this.

    ===== begin semantic patch =====
    // This adds an llseek= method to all file operations,
    // as a preparation for making no_llseek the default.
    //
    // The rules are
    // - use no_llseek explicitly if we do nonseekable_open
    // - use seq_lseek for sequential files
    // - use default_llseek if we know we access f_pos
    // - use noop_llseek if we know we don't access f_pos,
    // but we still want to allow users to call lseek
    //
    @ open1 exists @
    identifier nested_open;
    @@
    nested_open(...)
    {

    }

    @ open exists@
    identifier open_f;
    identifier i, f;
    identifier open1.nested_open;
    @@
    int open_f(struct inode *i, struct file *f)
    {

    }

    @ read disable optional_qualifier exists @
    identifier read_f;
    identifier f, p, s, off;
    type ssize_t, size_t, loff_t;
    expression E;
    identifier func;
    @@
    ssize_t read_f(struct file *f, char *p, size_t s, loff_t *off)
    {

    }

    @ read_no_fpos disable optional_qualifier exists @
    identifier read_f;
    identifier f, p, s, off;
    type ssize_t, size_t, loff_t;
    @@
    ssize_t read_f(struct file *f, char *p, size_t s, loff_t *off)
    {
    ... when != off
    }

    @ write @
    identifier write_f;
    identifier f, p, s, off;
    type ssize_t, size_t, loff_t;
    expression E;
    identifier func;
    @@
    ssize_t write_f(struct file *f, const char *p, size_t s, loff_t *off)
    {

    }

    @ write_no_fpos @
    identifier write_f;
    identifier f, p, s, off;
    type ssize_t, size_t, loff_t;
    @@
    ssize_t write_f(struct file *f, const char *p, size_t s, loff_t *off)
    {
    ... when != off
    }

    @ fops0 @
    identifier fops;
    @@
    struct file_operations fops = {
    ...
    };

    @ has_llseek depends on fops0 @
    identifier fops0.fops;
    identifier llseek_f;
    @@
    struct file_operations fops = {
    ...
    .llseek = llseek_f,
    ...
    };

    @ has_read depends on fops0 @
    identifier fops0.fops;
    identifier read_f;
    @@
    struct file_operations fops = {
    ...
    .read = read_f,
    ...
    };

    @ has_write depends on fops0 @
    identifier fops0.fops;
    identifier write_f;
    @@
    struct file_operations fops = {
    ...
    .write = write_f,
    ...
    };

    @ has_open depends on fops0 @
    identifier fops0.fops;
    identifier open_f;
    @@
    struct file_operations fops = {
    ...
    .open = open_f,
    ...
    };

    // use no_llseek if we call nonseekable_open
    ////////////////////////////////////////////
    @ nonseekable1 depends on !has_llseek && has_open @
    identifier fops0.fops;
    identifier nso ~= "nonseekable_open";
    @@
    struct file_operations fops = {
    ... .open = nso, ...
    +.llseek = no_llseek, /* nonseekable */
    };

    @ nonseekable2 depends on !has_llseek @
    identifier fops0.fops;
    identifier open.open_f;
    @@
    struct file_operations fops = {
    ... .open = open_f, ...
    +.llseek = no_llseek, /* open uses nonseekable */
    };

    // use seq_lseek for sequential files
    /////////////////////////////////////
    @ seq depends on !has_llseek @
    identifier fops0.fops;
    identifier sr ~= "seq_read";
    @@
    struct file_operations fops = {
    ... .read = sr, ...
    +.llseek = seq_lseek, /* we have seq_read */
    };

    // use default_llseek if there is a readdir
    ///////////////////////////////////////////
    @ fops1 depends on !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
    identifier fops0.fops;
    identifier readdir_e;
    @@
    // any other fop is used that changes pos
    struct file_operations fops = {
    ... .readdir = readdir_e, ...
    +.llseek = default_llseek, /* readdir is present */
    };

    // use default_llseek if at least one of read/write touches f_pos
    /////////////////////////////////////////////////////////////////
    @ fops2 depends on !fops1 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
    identifier fops0.fops;
    identifier read.read_f;
    @@
    // read fops use offset
    struct file_operations fops = {
    ... .read = read_f, ...
    +.llseek = default_llseek, /* read accesses f_pos */
    };

    @ fops3 depends on !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
    identifier fops0.fops;
    identifier write.write_f;
    @@
    // write fops use offset
    struct file_operations fops = {
    ... .write = write_f, ...
    + .llseek = default_llseek, /* write accesses f_pos */
    };

    // Use noop_llseek if neither read nor write accesses f_pos
    ///////////////////////////////////////////////////////////

    @ fops4 depends on !fops1 && !fops2 && !fops3 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
    identifier fops0.fops;
    identifier read_no_fpos.read_f;
    identifier write_no_fpos.write_f;
    @@
    // neither read nor write fops use offset
    struct file_operations fops = {
    ...
    .write = write_f,
    .read = read_f,
    ...
    +.llseek = noop_llseek, /* read and write both use no f_pos */
    };

    @ depends on has_write && !has_read && !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
    identifier fops0.fops;
    identifier write_no_fpos.write_f;
    @@
    struct file_operations fops = {
    ... .write = write_f, ...
    +.llseek = noop_llseek, /* write uses no f_pos */
    };

    @ depends on has_read && !has_write && !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
    identifier fops0.fops;
    identifier read_no_fpos.read_f;
    @@
    struct file_operations fops = {
    ... .read = read_f, ...
    +.llseek = noop_llseek, /* read uses no f_pos */
    };

    @ depends on !has_read && !has_write && !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
    identifier fops0.fops;
    @@
    struct file_operations fops = {
    ...
    +.llseek = noop_llseek, /* no read or write fn */
    };
    ===== End semantic patch =====

    Signed-off-by: Arnd Bergmann
    Cc: Julia Lawall
    Cc: Christoph Hellwig

    Arnd Bergmann
     

02 Oct, 2010

1 commit


09 Sep, 2010

1 commit

  • Casts from __kernel to __user pointers require __force markup, so add it. Also,
    sock_getsockopt() and sock_setsockopt() take their @optval and/or @optlen
    arguments as user pointers but were being passed kernel pointers; use new
    variables 'uoptval' and/or 'uoptlen' to fix this. These changes remove the
    following warnings from sparse:

    net/socket.c:1922:46: warning: cast adds address space to expression ()
    net/socket.c:3061:61: warning: incorrect type in argument 4 (different address spaces)
    net/socket.c:3061:61: expected char [noderef] *optval
    net/socket.c:3061:61: got char *optval
    net/socket.c:3061:69: warning: incorrect type in argument 5 (different address spaces)
    net/socket.c:3061:69: expected int [noderef] *optlen
    net/socket.c:3061:69: got int *optlen
    net/socket.c:3063:67: warning: incorrect type in argument 4 (different address spaces)
    net/socket.c:3063:67: expected char [noderef] *optval
    net/socket.c:3063:67: got char *optval
    net/socket.c:3064:45: warning: incorrect type in argument 5 (different address spaces)
    net/socket.c:3064:45: expected int [noderef] *optlen
    net/socket.c:3064:45: got int *optlen
    net/socket.c:3078:61: warning: incorrect type in argument 4 (different address spaces)
    net/socket.c:3078:61: expected char [noderef] *optval
    net/socket.c:3078:61: got char *optval
    net/socket.c:3080:67: warning: incorrect type in argument 4 (different address spaces)
    net/socket.c:3080:67: expected char [noderef] *optval
    net/socket.c:3080:67: got char *optval
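
The pattern behind the fix can be sketched in userspace. The macros `sk_user` and `sk_force` are hypothetical stand-ins for the kernel's `__user` and `__force` annotations: they expand to sparse attributes when `__CHECKER__` is defined and to nothing in a normal build. The function names are also assumptions for illustration.

```c
#include <assert.h>
#include <string.h>

#ifdef __CHECKER__
# define sk_user  __attribute__((noderef, address_space(1)))
# define sk_force __attribute__((force))
#else
# define sk_user
# define sk_force
#endif

/* Stand-in for a function whose arguments are declared as user pointers. */
static int sk_getsockopt(char sk_user *optval, int sk_user *optlen)
{
    int val = 1;

    assert(optval && optlen);
    /* Real kernel code would use copy_to_user(); this sketch strips the
     * address space with a forced cast and writes directly. */
    memcpy((void sk_force *)optval, &val, sizeof(val));
    *(int sk_force *)optlen = (int)sizeof(val);
    return 0;
}

/* In-kernel caller holding kernel pointers: convert them once, explicitly,
 * into separately named user-pointer variables before the call. */
static int sk_kernel_getsockopt(char *optval, int *optlen)
{
    char sk_user *uoptval = (char sk_force sk_user *)optval;
    int sk_user *uoptlen = (int sk_force sk_user *)optlen;

    return sk_getsockopt(uoptval, uoptlen);
}
```

Keeping the forced casts in one place, with distinctly named variables, makes the address-space change visible to both the reader and the checker instead of hiding it at each call site.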

    Signed-off-by: Namhyung Kim
    Signed-off-by: David S. Miller

    Namhyung Kim
     

19 Aug, 2010

1 commit

  • This patch removes the abstraction introduced by the union skb_shared_tx in
    the shared skb data.

    The access of the different union elements at several places led to some
    confusion about accessing the shared tx_flags e.g. in skb_orphan_try().
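
As an illustration of the simpler layout, the sketch below replaces a union of bitfields with a single flags byte and explicit bit masks. The `sk_` names and flag constants are hypothetical and only model the idea; they are not the kernel's definitions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical flag bits stored in one byte of the shared data. */
enum {
    SK_TX_HW_TSTAMP   = 1 << 0, /* hardware time stamp requested */
    SK_TX_SW_TSTAMP   = 1 << 1, /* software time stamp requested */
    SK_TX_IN_PROGRESS = 1 << 2, /* device driver is stamping the packet */
};

struct sk_shared_info {
    uint8_t tx_flags;
};

static void sk_request_hw_tstamp(struct sk_shared_info *shinfo)
{
    shinfo->tx_flags |= SK_TX_HW_TSTAMP;
}

/* One mask test covers both kinds of request. */
static int sk_wants_tstamp(const struct sk_shared_info *shinfo)
{
    return (shinfo->tx_flags & (SK_TX_HW_TSTAMP | SK_TX_SW_TSTAMP)) != 0;
}

static void sk_mark_in_progress(struct sk_shared_info *shinfo)
{
    assert(shinfo->tx_flags & SK_TX_HW_TSTAMP);
    shinfo->tx_flags |= SK_TX_IN_PROGRESS;
}
```

With a single integer there is only one way to read or write the state, so code cannot mix per-bit and whole-byte views of the same storage.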

    http://marc.info/?l=linux-netdev&m=128084897415886&w=2

    Signed-off-by: Oliver Hartkopp
    Signed-off-by: David S. Miller

    Oliver Hartkopp
     

19 Jul, 2010

2 commits

  • This patch adds a new networking option to allow hardware time stamps
    from PHY devices. When enabled, likely candidates among incoming and
    outgoing network packets are offered to the PHY driver for possible
    time stamping. When accepted by the PHY driver, incoming packets are
    deferred for later delivery by the driver.

    The patch also adds phylib driver methods for the SIOCSHWTSTAMP ioctl
    and callbacks for transmit and receive time stamping. Drivers may
    optionally implement these functions.

    Signed-off-by: Richard Cochran
    Signed-off-by: David S. Miller

    Richard Cochran
     
  • MAX_SOCK_ADDR is no longer used because commit 230b1839 "net: Use standard
    structures for generic socket address structures." replaced
    "char address[MAX_SOCK_ADDR];" with "struct sockaddr_storage address;".

    Signed-off-by: Tetsuo Handa
    Signed-off-by: David S. Miller
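
To illustrate why a hand-sized char array is unnecessary once `struct sockaddr_storage` is used, here is a small POSIX userspace sketch. The helper name `sk_store_addr` is an assumption for illustration; the kernel's own copy routine differs.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <netinet/in.h>

/* Copy an address of any family into generic storage.
 * Returns 0 on success, -1 if the address does not fit. */
static int sk_store_addr(struct sockaddr_storage *dst, const void *src, size_t len)
{
    assert(dst && src);
    if (len > sizeof(*dst))
        return -1;
    memset(dst, 0, sizeof(*dst));
    memcpy(dst, src, len);
    return 0;
}
```

The standard type is defined to be large enough, and suitably aligned, for every supported address family, so no separate maximum-size constant has to be maintained.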

    Tetsuo Handa
     

04 Jun, 2010

1 commit

  • From: Eric Dumazet
    Date: Thu, 3 Jun 2010 04:29:41 +0000
    Subject: [PATCH 2/3] net: net/socket.c and net/compat.c cleanups

    Cleanup patch to match modern coding style.

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller
    ---
    net/compat.c | 47 ++++++++---------
    net/socket.c | 165 ++++++++++++++++++++++++++++------------------------------
    2 files changed, 102 insertions(+), 110 deletions(-)

    diff --git a/net/compat.c b/net/compat.c
    index 1cf7590..63d260e 100644
    --- a/net/compat.c
    +++ b/net/compat.c
    @@ -81,7 +81,7 @@ int verify_compat_iovec(struct msghdr *kern_msg, struct iovec *kern_iov,
    int tot_len;

    if (kern_msg->msg_namelen) {
    - if (mode==VERIFY_READ) {
    + if (mode == VERIFY_READ) {
    int err = move_addr_to_kernel(kern_msg->msg_name,
    kern_msg->msg_namelen,
    kern_address);
    @@ -354,7 +354,7 @@ static int do_set_attach_filter(struct socket *sock, int level, int optname,
    static int do_set_sock_timeout(struct socket *sock, int level,
    int optname, char __user *optval, unsigned int optlen)
    {
    - struct compat_timeval __user *up = (struct compat_timeval __user *) optval;
    + struct compat_timeval __user *up = (struct compat_timeval __user *)optval;
    struct timeval ktime;
    mm_segment_t old_fs;
    int err;
    @@ -367,7 +367,7 @@ static int do_set_sock_timeout(struct socket *sock, int level,
    return -EFAULT;
    old_fs = get_fs();
    set_fs(KERNEL_DS);
    - err = sock_setsockopt(sock, level, optname, (char *) &ktime, sizeof(ktime));
    + err = sock_setsockopt(sock, level, optname, (char *)&ktime, sizeof(ktime));
    set_fs(old_fs);

    return err;
    @@ -389,11 +389,10 @@ asmlinkage long compat_sys_setsockopt(int fd, int level, int optname,
    char __user *optval, unsigned int optlen)
    {
    int err;
    - struct socket *sock;
    + struct socket *sock = sockfd_lookup(fd, &err);

    - if ((sock = sockfd_lookup(fd, &err))!=NULL)
    - {
    - err = security_socket_setsockopt(sock,level,optname);
    + if (sock) {
    + err = security_socket_setsockopt(sock, level, optname);
    if (err) {
    sockfd_put(sock);
    return err;
    @@ -453,7 +452,7 @@ static int compat_sock_getsockopt(struct socket *sock, int level, int optname,
    int compat_sock_get_timestamp(struct sock *sk, struct timeval __user *userstamp)
    {
    struct compat_timeval __user *ctv =
    - (struct compat_timeval __user*) userstamp;
    + (struct compat_timeval __user *) userstamp;
    int err = -ENOENT;
    struct timeval tv;

    @@ -477,7 +476,7 @@ EXPORT_SYMBOL(compat_sock_get_timestamp);
    int compat_sock_get_timestampns(struct sock *sk, struct timespec __user *userstamp)
    {
    struct compat_timespec __user *ctv =
    - (struct compat_timespec __user*) userstamp;
    + (struct compat_timespec __user *) userstamp;
    int err = -ENOENT;
    struct timespec ts;

    @@ -502,12 +501,10 @@ asmlinkage long compat_sys_getsockopt(int fd, int level, int optname,
    char __user *optval, int __user *optlen)
    {
    int err;
    - struct socket *sock;
    + struct socket *sock = sockfd_lookup(fd, &err);

    - if ((sock = sockfd_lookup(fd, &err))!=NULL)
    - {
    - err = security_socket_getsockopt(sock, level,
    - optname);
    + if (sock) {
    + err = security_socket_getsockopt(sock, level, optname);
    if (err) {
    sockfd_put(sock);
    return err;
    @@ -557,7 +554,7 @@ struct compat_group_filter {

    int compat_mc_setsockopt(struct sock *sock, int level, int optname,
    char __user *optval, unsigned int optlen,
    - int (*setsockopt)(struct sock *,int,int,char __user *,unsigned int))
    + int (*setsockopt)(struct sock *, int, int, char __user *, unsigned int))
    {
    char __user *koptval = optval;
    int koptlen = optlen;
    @@ -640,12 +637,11 @@ int compat_mc_setsockopt(struct sock *sock, int level, int optname,
    }
    return setsockopt(sock, level, optname, koptval, koptlen);
    }
    -
    EXPORT_SYMBOL(compat_mc_setsockopt);

    int compat_mc_getsockopt(struct sock *sock, int level, int optname,
    char __user *optval, int __user *optlen,
    - int (*getsockopt)(struct sock *,int,int,char __user *,int __user *))
    + int (*getsockopt)(struct sock *, int, int, char __user *, int __user *))
    {
    struct compat_group_filter __user *gf32 = (void *)optval;
    struct group_filter __user *kgf;
    @@ -681,7 +677,7 @@ int compat_mc_getsockopt(struct sock *sock, int level, int optname,
    __put_user(interface, &kgf->gf_interface) ||
    __put_user(fmode, &kgf->gf_fmode) ||
    __put_user(numsrc, &kgf->gf_numsrc) ||
    - copy_in_user(&kgf->gf_group,&gf32->gf_group,sizeof(kgf->gf_group)))
    + copy_in_user(&kgf->gf_group, &gf32->gf_group, sizeof(kgf->gf_group)))
    return -EFAULT;

    err = getsockopt(sock, level, optname, (char __user *)kgf, koptlen);
    @@ -714,21 +710,22 @@ int compat_mc_getsockopt(struct sock *sock, int level, int optname,
    copylen = numsrc * sizeof(gf32->gf_slist[0]);
    if (copylen > klen)
    copylen = klen;
    - if (copy_in_user(gf32->gf_slist, kgf->gf_slist, copylen))
    + if (copy_in_user(gf32->gf_slist, kgf->gf_slist, copylen))
    return -EFAULT;
    }
    return err;
    }
    -
    EXPORT_SYMBOL(compat_mc_getsockopt);

    /* Argument list sizes for compat_sys_socketcall */
    #define AL(x) ((x) * sizeof(u32))
    -static unsigned char nas[20]={AL(0),AL(3),AL(3),AL(3),AL(2),AL(3),
    - AL(3),AL(3),AL(4),AL(4),AL(4),AL(6),
    - AL(6),AL(2),AL(5),AL(5),AL(3),AL(3),
    - AL(4),AL(5)};
    +static unsigned char nas[20] = {
    + AL(0), AL(3), AL(3), AL(3), AL(2), AL(3),
    + AL(3), AL(3), AL(4), AL(4), AL(4), AL(6),
    + AL(6), AL(2), AL(5), AL(5), AL(3), AL(3),
    + AL(4), AL(5)
    +};
    #undef AL

    asmlinkage long compat_sys_sendmsg(int fd, struct compat_msghdr __user *msg, unsigned flags)
    @@ -827,7 +824,7 @@ asmlinkage long compat_sys_socketcall(int call, u32 __user *args)
    compat_ptr(a[4]), compat_ptr(a[5]));
    break;
    case SYS_SHUTDOWN:
    - ret = sys_shutdown(a0,a1);
    + ret = sys_shutdown(a0, a1);
    break;
    case SYS_SETSOCKOPT:
    ret = compat_sys_setsockopt(a0, a1, a[2],
    diff --git a/net/socket.c b/net/socket.c
    index 367d547..b63c051 100644
    --- a/net/socket.c
    +++ b/net/socket.c
    @@ -124,7 +124,7 @@ static int sock_fasync(int fd, struct file *filp, int on);
    static ssize_t sock_sendpage(struct file *file, struct page *page,
    int offset, size_t size, loff_t *ppos, int more);
    static ssize_t sock_splice_read(struct file *file, loff_t *ppos,
    - struct pipe_inode_info *pipe, size_t len,
    + struct pipe_inode_info *pipe, size_t len,
    unsigned int flags);

    /*
    @@ -162,7 +162,7 @@ static const struct net_proto_family *net_families[NPROTO] __read_mostly;
    * Statistics counters of the socket lists
    */

    -static DEFINE_PER_CPU(int, sockets_in_use) = 0;
    +static DEFINE_PER_CPU(int, sockets_in_use);

    /*
    * Support routines.
    @@ -309,9 +309,9 @@ static int init_inodecache(void)
    }

    static const struct super_operations sockfs_ops = {
    - .alloc_inode = sock_alloc_inode,
    - .destroy_inode =sock_destroy_inode,
    - .statfs = simple_statfs,
    + .alloc_inode = sock_alloc_inode,
    + .destroy_inode = sock_destroy_inode,
    + .statfs = simple_statfs,
    };

    static int sockfs_get_sb(struct file_system_type *fs_type,
    @@ -411,6 +411,7 @@ int sock_map_fd(struct socket *sock, int flags)

    return fd;
    }
    +EXPORT_SYMBOL(sock_map_fd);

    static struct socket *sock_from_file(struct file *file, int *err)
    {
    @@ -422,7 +423,7 @@ static struct socket *sock_from_file(struct file *file, int *err)
    }

    /**
    - * sockfd_lookup - Go from a file number to its socket slot
    + * sockfd_lookup - Go from a file number to its socket slot
    * @fd: file handle
    * @err: pointer to an error code return
    *
    @@ -450,6 +451,7 @@ struct socket *sockfd_lookup(int fd, int *err)
    fput(file);
    return sock;
    }
    +EXPORT_SYMBOL(sockfd_lookup);

    static struct socket *sockfd_lookup_light(int fd, int *err, int *fput_needed)
    {
    @@ -540,6 +542,7 @@ void sock_release(struct socket *sock)
    }
    sock->file = NULL;
    }
    +EXPORT_SYMBOL(sock_release);

    int sock_tx_timestamp(struct msghdr *msg, struct sock *sk,
    union skb_shared_tx *shtx)
    @@ -586,6 +589,7 @@ int sock_sendmsg(struct socket *sock, struct msghdr *msg, size_t size)
    ret = wait_on_sync_kiocb(&iocb);
    return ret;
    }
    +EXPORT_SYMBOL(sock_sendmsg);

    int kernel_sendmsg(struct socket *sock, struct msghdr *msg,
    struct kvec *vec, size_t num, size_t size)
    @@ -604,6 +608,7 @@ int kernel_sendmsg(struct socket *sock, struct msghdr *msg,
    set_fs(oldfs);
    return result;
    }
    +EXPORT_SYMBOL(kernel_sendmsg);

    static int ktime2ts(ktime_t kt, struct timespec *ts)
    {
    @@ -664,7 +669,6 @@ void __sock_recv_timestamp(struct msghdr *msg, struct sock *sk,
    put_cmsg(msg, SOL_SOCKET,
    SCM_TIMESTAMPING, sizeof(ts), &ts);
    }
    -
    EXPORT_SYMBOL_GPL(__sock_recv_timestamp);

    inline void sock_recv_drops(struct msghdr *msg, struct sock *sk, struct sk_buff *skb)
    @@ -720,6 +724,7 @@ int sock_recvmsg(struct socket *sock, struct msghdr *msg,
    ret = wait_on_sync_kiocb(&iocb);
    return ret;
    }
    +EXPORT_SYMBOL(sock_recvmsg);

    static int sock_recvmsg_nosec(struct socket *sock, struct msghdr *msg,
    size_t size, int flags)
    @@ -752,6 +757,7 @@ int kernel_recvmsg(struct socket *sock, struct msghdr *msg,
    set_fs(oldfs);
    return result;
    }
    +EXPORT_SYMBOL(kernel_recvmsg);

    static void sock_aio_dtor(struct kiocb *iocb)
    {
    @@ -774,7 +780,7 @@ static ssize_t sock_sendpage(struct file *file, struct page *page,
    }

    static ssize_t sock_splice_read(struct file *file, loff_t *ppos,
    - struct pipe_inode_info *pipe, size_t len,
    + struct pipe_inode_info *pipe, size_t len,
    unsigned int flags)
    {
    struct socket *sock = file->private_data;
    @@ -887,7 +893,7 @@ static ssize_t sock_aio_write(struct kiocb *iocb, const struct iovec *iov,
    */

    static DEFINE_MUTEX(br_ioctl_mutex);
    -static int (*br_ioctl_hook) (struct net *, unsigned int cmd, void __user *arg) = NULL;
    +static int (*br_ioctl_hook) (struct net *, unsigned int cmd, void __user *arg);

    void brioctl_set(int (*hook) (struct net *, unsigned int, void __user *))
    {
    @@ -895,7 +901,6 @@ void brioctl_set(int (*hook) (struct net *, unsigned int, void __user *))
    br_ioctl_hook = hook;
    mutex_unlock(&br_ioctl_mutex);
    }
    -
    EXPORT_SYMBOL(brioctl_set);

    static DEFINE_MUTEX(vlan_ioctl_mutex);
    @@ -907,7 +912,6 @@ void vlan_ioctl_set(int (*hook) (struct net *, void __user *))
    vlan_ioctl_hook = hook;
    mutex_unlock(&vlan_ioctl_mutex);
    }
    -
    EXPORT_SYMBOL(vlan_ioctl_set);

    static DEFINE_MUTEX(dlci_ioctl_mutex);
    @@ -919,7 +923,6 @@ void dlci_ioctl_set(int (*hook) (unsigned int, void __user *))
    dlci_ioctl_hook = hook;
    mutex_unlock(&dlci_ioctl_mutex);
    }
    -
    EXPORT_SYMBOL(dlci_ioctl_set);

    static long sock_do_ioctl(struct net *net, struct socket *sock,
    @@ -1047,6 +1050,7 @@ out_release:
    sock = NULL;
    goto out;
    }
    +EXPORT_SYMBOL(sock_create_lite);

    /* No kernel lock held - perfect */
    static unsigned int sock_poll(struct file *file, poll_table *wait)
    @@ -1147,6 +1151,7 @@ call_kill:
    rcu_read_unlock();
    return 0;
    }
    +EXPORT_SYMBOL(sock_wake_async);

    static int __sock_create(struct net *net, int family, int type, int protocol,
    struct socket **res, int kern)
    @@ -1265,11 +1270,13 @@ int sock_create(int family, int type, int protocol, struct socket **res)
    {
    return __sock_create(current->nsproxy->net_ns, family, type, protocol, res, 0);
    }
    +EXPORT_SYMBOL(sock_create);

    int sock_create_kern(int family, int type, int protocol, struct socket **res)
    {
    return __sock_create(&init_net, family, type, protocol, res, 1);
    }
    +EXPORT_SYMBOL(sock_create_kern);

    SYSCALL_DEFINE3(socket, int, family, int, type, int, protocol)
    {
    @@ -1474,7 +1481,8 @@ SYSCALL_DEFINE4(accept4, int, fd, struct sockaddr __user *, upeer_sockaddr,
    goto out;

    err = -ENFILE;
    - if (!(newsock = sock_alloc()))
    + newsock = sock_alloc();
    + if (!newsock)
    goto out_put;

    newsock->type = sock->type;
    @@ -1861,8 +1869,7 @@ SYSCALL_DEFINE3(sendmsg, int, fd, struct msghdr __user *, msg, unsigned, flags)
    if (MSG_CMSG_COMPAT & flags) {
    if (get_compat_msghdr(&msg_sys, msg_compat))
    return -EFAULT;
    - }
    - else if (copy_from_user(&msg_sys, msg, sizeof(struct msghdr)))
    + } else if (copy_from_user(&msg_sys, msg, sizeof(struct msghdr)))
    return -EFAULT;

    sock = sockfd_lookup_light(fd, &err, &fput_needed);
    @@ -1964,8 +1971,7 @@ static int __sys_recvmsg(struct socket *sock, struct msghdr __user *msg,
    if (MSG_CMSG_COMPAT & flags) {
    if (get_compat_msghdr(msg_sys, msg_compat))
    return -EFAULT;
    - }
    - else if (copy_from_user(msg_sys, msg, sizeof(struct msghdr)))
    + } else if (copy_from_user(msg_sys, msg, sizeof(struct msghdr)))
    return -EFAULT;

    err = -EMSGSIZE;
    @@ -2191,10 +2197,10 @@ SYSCALL_DEFINE5(recvmmsg, int, fd, struct mmsghdr __user *, mmsg,
    /* Argument list sizes for sys_socketcall */
    #define AL(x) ((x) * sizeof(unsigned long))
    static const unsigned char nargs[20] = {
    - AL(0),AL(3),AL(3),AL(3),AL(2),AL(3),
    - AL(3),AL(3),AL(4),AL(4),AL(4),AL(6),
    - AL(6),AL(2),AL(5),AL(5),AL(3),AL(3),
    - AL(4),AL(5)
    + AL(0), AL(3), AL(3), AL(3), AL(2), AL(3),
    + AL(3), AL(3), AL(4), AL(4), AL(4), AL(6),
    + AL(6), AL(2), AL(5), AL(5), AL(3), AL(3),
    + AL(4), AL(5)
    };

    #undef AL
    @@ -2340,6 +2346,7 @@ int sock_register(const struct net_proto_family *ops)
    printk(KERN_INFO "NET: Registered protocol family %d\n", ops->family);
    return err;
    }
    +EXPORT_SYMBOL(sock_register);

    /**
    * sock_unregister - remove a protocol handler
    @@ -2366,6 +2373,7 @@ void sock_unregister(int family)

    printk(KERN_INFO "NET: Unregistered protocol family %d\n", family);
    }
    +EXPORT_SYMBOL(sock_unregister);

    static int __init sock_init(void)
    {
    @@ -2490,13 +2498,13 @@ static int dev_ifconf(struct net *net, struct compat_ifconf __user *uifc32)
    ifc.ifc_req = NULL;
    uifc = compat_alloc_user_space(sizeof(struct ifconf));
    } else {
    - size_t len =((ifc32.ifc_len / sizeof (struct compat_ifreq)) + 1) *
    - sizeof (struct ifreq);
    + size_t len = ((ifc32.ifc_len / sizeof(struct compat_ifreq)) + 1) *
    + sizeof(struct ifreq);
    uifc = compat_alloc_user_space(sizeof(struct ifconf) + len);
    ifc.ifc_len = len;
    ifr = ifc.ifc_req = (void __user *)(uifc + 1);
    ifr32 = compat_ptr(ifc32.ifcbuf);
    - for (i = 0; i < ifc32.ifc_len; i += sizeof (struct compat_ifreq)) {
    + for (i = 0; i < ifc32.ifc_len; i += sizeof(struct compat_ifreq)) {
    if (copy_in_user(ifr, ifr32, sizeof(struct compat_ifreq)))
    return -EFAULT;
    ifr++;
    @@ -2516,9 +2524,9 @@ static int dev_ifconf(struct net *net, struct compat_ifconf __user *uifc32)
    ifr = ifc.ifc_req;
    ifr32 = compat_ptr(ifc32.ifcbuf);
    for (i = 0, j = 0;
    - i + sizeof (struct compat_ifreq) < ifc.ifc_len;
    - i += sizeof (struct compat_ifreq), j += sizeof (struct ifreq)) {
    - if (copy_in_user(ifr32, ifr, sizeof (struct compat_ifreq)))
    + i + sizeof(struct compat_ifreq) < ifc.ifc_len;
    + i += sizeof(struct compat_ifreq), j += sizeof(struct ifreq)) {
    + if (copy_in_user(ifr32, ifr, sizeof(struct compat_ifreq)))
    return -EFAULT;
    ifr32++;
    ifr++;
    @@ -2567,7 +2575,7 @@ static int compat_siocwandev(struct net *net, struct compat_ifreq __user *uifr32
    compat_uptr_t uptr32;
    struct ifreq __user *uifr;

    - uifr = compat_alloc_user_space(sizeof (*uifr));
    + uifr = compat_alloc_user_space(sizeof(*uifr));
    if (copy_in_user(uifr, uifr32, sizeof(struct compat_ifreq)))
    return -EFAULT;

    @@ -2601,9 +2609,9 @@ static int bond_ioctl(struct net *net, unsigned int cmd,
    return -EFAULT;

    old_fs = get_fs();
    - set_fs (KERNEL_DS);
    + set_fs(KERNEL_DS);
    err = dev_ioctl(net, cmd, &kifr);
    - set_fs (old_fs);
    + set_fs(old_fs);

    return err;
    case SIOCBONDSLAVEINFOQUERY:
    @@ -2710,9 +2718,9 @@ static int compat_sioc_ifmap(struct net *net, unsigned int cmd,
    return -EFAULT;

    old_fs = get_fs();
    - set_fs (KERNEL_DS);
    + set_fs(KERNEL_DS);
    err = dev_ioctl(net, cmd, (void __user *)&ifr);
    - set_fs (old_fs);
    + set_fs(old_fs);

    if (cmd == SIOCGIFMAP && !err) {
    err = copy_to_user(uifr32, &ifr, sizeof(ifr.ifr_name));
    @@ -2734,7 +2742,7 @@ static int compat_siocshwtstamp(struct net *net, struct compat_ifreq __user *uif
    compat_uptr_t uptr32;
    struct ifreq __user *uifr;

    - uifr = compat_alloc_user_space(sizeof (*uifr));
    + uifr = compat_alloc_user_space(sizeof(*uifr));
    if (copy_in_user(uifr, uifr32, sizeof(struct compat_ifreq)))
    return -EFAULT;

    @@ -2750,20 +2758,20 @@ static int compat_siocshwtstamp(struct net *net, struct compat_ifreq __user *uif
    }

    struct rtentry32 {
    - u32 rt_pad1;
    + u32 rt_pad1;
    struct sockaddr rt_dst; /* target address */
    struct sockaddr rt_gateway; /* gateway addr (RTF_GATEWAY) */
    struct sockaddr rt_genmask; /* target network mask (IP) */
    - unsigned short rt_flags;
    - short rt_pad2;
    - u32 rt_pad3;
    - unsigned char rt_tos;
    - unsigned char rt_class;
    - short rt_pad4;
    - short rt_metric; /* +1 for binary compatibility! */
    + unsigned short rt_flags;
    + short rt_pad2;
    + u32 rt_pad3;
    + unsigned char rt_tos;
    + unsigned char rt_class;
    + short rt_pad4;
    + short rt_metric; /* +1 for binary compatibility! */
    /* char * */ u32 rt_dev; /* forcing the device at add */
    - u32 rt_mtu; /* per route MTU/Window */
    - u32 rt_window; /* Window clamping */
    + u32 rt_mtu; /* per route MTU/Window */
    + u32 rt_window; /* Window clamping */
    unsigned short rt_irtt; /* Initial RTT */
    };

    @@ -2793,29 +2801,29 @@ static int routing_ioctl(struct net *net, struct socket *sock,

    if (sock && sock->sk && sock->sk->sk_family == AF_INET6) { /* ipv6 */
    struct in6_rtmsg32 __user *ur6 = argp;
    - ret = copy_from_user (&r6.rtmsg_dst, &(ur6->rtmsg_dst),
    + ret = copy_from_user(&r6.rtmsg_dst, &(ur6->rtmsg_dst),
    3 * sizeof(struct in6_addr));
    - ret |= __get_user (r6.rtmsg_type, &(ur6->rtmsg_type));
    - ret |= __get_user (r6.rtmsg_dst_len, &(ur6->rtmsg_dst_len));
    - ret |= __get_user (r6.rtmsg_src_len, &(ur6->rtmsg_src_len));
    - ret |= __get_user (r6.rtmsg_metric, &(ur6->rtmsg_metric));
    - ret |= __get_user (r6.rtmsg_info, &(ur6->rtmsg_info));
    - ret |= __get_user (r6.rtmsg_flags, &(ur6->rtmsg_flags));
    - ret |= __get_user (r6.rtmsg_ifindex, &(ur6->rtmsg_ifindex));
    + ret |= __get_user(r6.rtmsg_type, &(ur6->rtmsg_type));
    + ret |= __get_user(r6.rtmsg_dst_len, &(ur6->rtmsg_dst_len));
    + ret |= __get_user(r6.rtmsg_src_len, &(ur6->rtmsg_src_len));
    + ret |= __get_user(r6.rtmsg_metric, &(ur6->rtmsg_metric));
    + ret |= __get_user(r6.rtmsg_info, &(ur6->rtmsg_info));
    + ret |= __get_user(r6.rtmsg_flags, &(ur6->rtmsg_flags));
    + ret |= __get_user(r6.rtmsg_ifindex, &(ur6->rtmsg_ifindex));

    r = (void *) &r6;
    } else { /* ipv4 */
    struct rtentry32 __user *ur4 = argp;
    - ret = copy_from_user (&r4.rt_dst, &(ur4->rt_dst),
    + ret = copy_from_user(&r4.rt_dst, &(ur4->rt_dst),
    3 * sizeof(struct sockaddr));
    - ret |= __get_user (r4.rt_flags, &(ur4->rt_flags));
    - ret |= __get_user (r4.rt_metric, &(ur4->rt_metric));
    - ret |= __get_user (r4.rt_mtu, &(ur4->rt_mtu));
    - ret |= __get_user (r4.rt_window, &(ur4->rt_window));
    - ret |= __get_user (r4.rt_irtt, &(ur4->rt_irtt));
    - ret |= __get_user (rtdev, &(ur4->rt_dev));
    + ret |= __get_user(r4.rt_flags, &(ur4->rt_flags));
    + ret |= __get_user(r4.rt_metric, &(ur4->rt_metric));
    + ret |= __get_user(r4.rt_mtu, &(ur4->rt_mtu));
    + ret |= __get_user(r4.rt_window, &(ur4->rt_window));
    + ret |= __get_user(r4.rt_irtt, &(ur4->rt_irtt));
    + ret |= __get_user(rtdev, &(ur4->rt_dev));
    if (rtdev) {
    - ret |= copy_from_user (devname, compat_ptr(rtdev), 15);
    + ret |= copy_from_user(devname, compat_ptr(rtdev), 15);
    r4.rt_dev = devname; devname[15] = 0;
    } else
    r4.rt_dev = NULL;
    @@ -2828,9 +2836,9 @@ static int routing_ioctl(struct net *net, struct socket *sock,
    goto out;
    }

    - set_fs (KERNEL_DS);
    + set_fs(KERNEL_DS);
    ret = sock_do_ioctl(net, sock, cmd, (unsigned long) r);
    - set_fs (old_fs);
    + set_fs(old_fs);

    out:
    return ret;
    @@ -2993,11 +3001,13 @@ int kernel_bind(struct socket *sock, struct sockaddr *addr, int addrlen)
    {
    return sock->ops->bind(sock, addr, addrlen);
    }
    +EXPORT_SYMBOL(kernel_bind);

    int kernel_listen(struct socket *sock, int backlog)
    {
    return sock->ops->listen(sock, backlog);
    }
    +EXPORT_SYMBOL(kernel_listen);

    int kernel_accept(struct socket *sock, struct socket **newsock, int flags)
    {
    @@ -3022,24 +3032,28 @@ int kernel_accept(struct socket *sock, struct socket **newsock, int flags)
    done:
    return err;
    }
    +EXPORT_SYMBOL(kernel_accept);

    int kernel_connect(struct socket *sock, struct sockaddr *addr, int addrlen,
    int flags)
    {
    return sock->ops->connect(sock, addr, addrlen, flags);
    }
    +EXPORT_SYMBOL(kernel_connect);

    int kernel_getsockname(struct socket *sock, struct sockaddr *addr,
    int *addrlen)
    {
    return sock->ops->getname(sock, addr, addrlen, 0);
    }
    +EXPORT_SYMBOL(kernel_getsockname);

    int kernel_getpeername(struct socket *sock, struct sockaddr *addr,
    int *addrlen)
    {
    return sock->ops->getname(sock, addr, addrlen, 1);
    }
    +EXPORT_SYMBOL(kernel_getpeername);

    int kernel_getsockopt(struct socket *sock, int level, int optname,
    char *optval, int *optlen)
    @@ -3056,6 +3070,7 @@ int kernel_getsockopt(struct socket *sock, int level, int optname,
    set_fs(oldfs);
    return err;
    }
    +EXPORT_SYMBOL(kernel_getsockopt);

    int kernel_setsockopt(struct socket *sock, int level, int optname,
    char *optval, unsigned int optlen)
    @@ -3072,6 +3087,7 @@ int kernel_setsockopt(struct socket *sock, int level, int optname,
    set_fs(oldfs);
    return err;
    }
    +EXPORT_SYMBOL(kernel_setsockopt);

    int kernel_sendpage(struct socket *sock, struct page *page, int offset,
    size_t size, int flags)
    @@ -3083,6 +3099,7 @@ int kernel_sendpage(struct socket *sock, struct page *page, int offset,

    return sock_no_sendpage(sock, page, offset, size, flags);
    }
    +EXPORT_SYMBOL(kernel_sendpage);

    int kernel_sock_ioctl(struct socket *sock, int cmd, unsigned long arg)
    {
    @@ -3095,33 +3112,11 @@ int kernel_sock_ioctl(struct socket *sock, int cmd, unsigned long arg)

    return err;
    }
    +EXPORT_SYMBOL(kernel_sock_ioctl);

    int kernel_sock_shutdown(struct socket *sock, enum sock_shutdown_cmd how)
    {
    return sock->ops->shutdown(sock, how);
    }
    -
    -EXPORT_SYMBOL(sock_create);
    -EXPORT_SYMBOL(sock_create_kern);
    -EXPORT_SYMBOL(sock_create_lite);
    -EXPORT_SYMBOL(sock_map_fd);
    -EXPORT_SYMBOL(sock_recvmsg);
    -EXPORT_SYMBOL(sock_register);
    -EXPORT_SYMBOL(sock_release);
    -EXPORT_SYMBOL(sock_sendmsg);
    -EXPORT_SYMBOL(sock_unregister);
    -EXPORT_SYMBOL(sock_wake_async);
    -EXPORT_SYMBOL(sockfd_lookup);
    -EXPORT_SYMBOL(kernel_sendmsg);
    -EXPORT_SYMBOL(kernel_recvmsg);
    -EXPORT_SYMBOL(kernel_bind);
    -EXPORT_SYMBOL(kernel_listen);
    -EXPORT_SYMBOL(kernel_accept);
    -EXPORT_SYMBOL(kernel_connect);
    -EXPORT_SYMBOL(kernel_getsockname);
    -EXPORT_SYMBOL(kernel_getpeername);
    -EXPORT_SYMBOL(kernel_getsockopt);
    -EXPORT_SYMBOL(kernel_setsockopt);
    -EXPORT_SYMBOL(kernel_sendpage);
    -EXPORT_SYMBOL(kernel_sock_ioctl);
    EXPORT_SYMBOL(kernel_sock_shutdown);
    +
    --
    1.7.0.4

    Eric Dumazet
     

24 May, 2010

1 commit

  • Up until now cls_cgroup has relied on fetching the classid out of
    the currently executing thread. This runs into trouble when packet
    processing is delayed, in which case it may execute in another
    thread's context.

    Furthermore, even when a packet is not delayed we may fail to
    classify it if soft IRQs have been disabled, because this scenario
    is indistinguishable from one where a packet unrelated to the
    current thread is processed by a real soft IRQ.

    In fact, the current semantics is inherently broken, as a single
    skb may be constructed out of the writes of two different tasks.
    A different manifestation of this problem is when the TCP stack
    transmits in response to an incoming ACK. This is currently
    unclassified.

    As we already have a concept of packet ownership for accounting
    purposes in the skb->sk pointer, this is a natural place to store
    the classid in a persistent manner.

    This patch adds the cls_cgroup classid in struct sock, filling up
    an existing hole on 64-bit :)

    The value is set at socket creation time. So all sockets created
    via socket(2) automatically gain the ID of the thread creating them.
    Whenever another process touches the socket by either reading or
    writing to it, we will change the socket classid to that of the
    process if it has a valid (non-zero) classid.

    For sockets created on inbound connections through accept(2), we
    inherit the classid of the original listening socket through
    sk_clone, possibly preceding the actual accept(2) call.

    In order to minimise risks, I have not made this the authoritative
    classid. For now it is only used as a backup when we execute
    with soft IRQs disabled. Once we're completely happy with its
    semantics we can use it as the sole classid.

    Footnote: I have rearranged the error path on cls_cgroup module
    creation. If we didn't do this, then there is a window where
    someone could create a tc rule using cls_cgroup before the cgroup
    subsystem has been registered.

    Signed-off-by: Herbert Xu
    Signed-off-by: David S. Miller
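
The classid rules described in this message can be modelled in plain userspace C. This is only a sketch of the stated semantics, not the kernel code: `struct task_model`, `struct sock_model` and the three helper functions are hypothetical names invented for illustration, and only the field name `sk_classid` is borrowed from the description of the new member in struct sock.

```c
#include <stdint.h>

/* Hypothetical stand-ins for a task and a socket; not kernel types. */
struct task_model { uint32_t classid; };     /* 0 means "no valid classid" */
struct sock_model { uint32_t sk_classid; };

/* socket(2): the new socket takes the classid of the creating task. */
static void sock_model_init(struct sock_model *sk,
                            const struct task_model *creator)
{
    sk->sk_classid = creator->classid;
}

/* Read or write by a task: adopt its classid only if it is non-zero. */
static void sock_model_touch(struct sock_model *sk,
                             const struct task_model *t)
{
    if (t->classid != 0)
        sk->sk_classid = t->classid;
}

/* Inbound connection: the child inherits the listening socket's classid. */
static void sock_model_clone(struct sock_model *child,
                             const struct sock_model *listener)
{
    child->sk_classid = listener->sk_classid;
}
```

In this model a task without a classid never clears the value already stored on the socket, which is the "valid (non-zero)" condition from the message.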

    Herbert Xu
     

18 May, 2010

1 commit


02 May, 2010

1 commit

  • sk_callback_lock rwlock actually protects sk->sk_sleep pointer, so we
    need two atomic operations (and associated dirtying) per incoming
    packet.

    RCU conversion is pretty much needed:

    1) Add a new structure, called "struct socket_wq" to hold all fields
    that will need rcu_read_lock() protection (currently: a
    wait_queue_head_t and a struct fasync_struct pointer).

    [Future patch will add a list anchor for wakeup coalescing]

    2) Attach one such structure to each "struct socket" created in
    sock_alloc_inode().

    3) Respect RCU grace period when freeing a "struct socket_wq"

    4) Replace the sk_sleep pointer in "struct sock" with sk_wq, a pointer
    to "struct socket_wq"

    5) Change sk_sleep() function to use new sk->sk_wq instead of
    sk->sk_sleep

    6) Change sk_has_sleeper() to wq_has_sleeper() that must be used inside
    a rcu_read_lock() section.

    7) Change all sk_has_sleeper() callers to:
    - Use rcu_read_lock() instead of read_lock(&sk->sk_callback_lock)
    - Use wq_has_sleeper() to wake up tasks if there are any sleepers.
    - Use rcu_read_unlock() instead of read_unlock(&sk->sk_callback_lock)

    8) sock_wake_async() is modified to use rcu protection as well.

    9) Exceptions:
    macvtap, drivers/net/tun.c, af_unix use integrated "struct socket_wq"
    instead of dynamically allocated ones. They don't need RCU freeing.

    Some cleanups or followups are probably needed, (possible
    sk_callback_lock conversion to a spinlock for example...).

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     

01 May, 2010

1 commit


22 Apr, 2010

1 commit

  • kill_fasync() uses a central rwlock, candidate for RCU conversion, to
    avoid cache line ping pongs on SMP.

    fasync_remove_entry() and fasync_add_entry() can disable IRQs for a
    short section instead of during the whole list scan.

    Use a spinlock per fasync_struct to synchronize kill_fasync_rcu() and
    fasync_{remove|add}_entry(). This spinlock is IRQ safe, so sock_fasync()
    doesn't need its own implementation and can use fasync_helper(), to
    reduce code size and complexity.

    We can remove __kill_fasync() direct use in net/socket.c, and rename it
    to kill_fasync_rcu().

    Signed-off-by: Eric Dumazet
    Cc: Paul E. McKenney
    Cc: Lai Jiangshan
    Signed-off-by: David S. Miller

    Eric Dumazet
     

12 Apr, 2010

1 commit


07 Apr, 2010

2 commits


30 Mar, 2010

1 commit

  • …it slab.h inclusion from percpu.h

    percpu.h is included by sched.h and module.h and thus ends up being
    included when building most .c files. percpu.h includes slab.h which
    in turn includes gfp.h making everything defined by the two files
    universally available and complicating inclusion dependencies.

    percpu.h -> slab.h dependency is about to be removed. Prepare for
    this change by updating users of gfp and slab facilities to include
    those headers directly instead of assuming availability. As this
    conversion needs to touch a large number of source files, the
    following script is used as the basis of conversion.

    http://userweb.kernel.org/~tj/misc/slabh-sweep.py

    The script does the following.

    * Scan files for gfp and slab usages and update includes such that
    only the necessary includes are there. ie. if only gfp is used,
    gfp.h, if slab is used, slab.h.

    * When the script inserts a new include, it looks at the include
    blocks and tries to put the new include such that its order conforms
    to its surrounding. It's put in the include block which contains
    core kernel includes, in the same order that the rest are ordered -
    alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
    doesn't seem to be any matching order.

    * If the script can't find a place to put a new include (mostly
    because the file doesn't have fitting include block), it prints out
    an error message indicating which .h file needs to be added to the
    file.

    The conversion was done in the following steps.

    1. The initial automatic conversion of all .c files updated slightly
    over 4000 files, deleting around 700 includes and adding ~480 gfp.h
    and ~3000 slab.h inclusions. The script emitted errors for ~400
    files.

    2. Each error was manually checked. Some didn't need the inclusion,
    some needed manual addition while adding it to implementation .h or
    embedding .c file was more appropriate for others. This step added
    inclusions to around 150 files.

    3. The script was run again and the output was compared to the edits
    from #2 to make sure no file was left behind.

    4. Several build tests were done and a couple of problems were fixed.
    e.g. lib/decompress_*.c used malloc/free() wrappers around slab
    APIs requiring slab.h to be added manually.

    5. The script was run on all .h files but without automatically
    editing them as sprinkling gfp.h and slab.h inclusions around .h
    files could easily lead to inclusion dependency hell. Most gfp.h
    inclusion directives were ignored as stuff from gfp.h was usually
    widely available and often used in preprocessor macros. Each
    slab.h inclusion directive was examined and added manually as
    necessary.

    6. percpu.h was updated not to include slab.h.

    7. Build tests were done on the following configurations and failures
    were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my
    distributed build env didn't work with gcov compiles) and a few
    more options had to be turned off depending on archs to make things
    build (like ipr on powerpc/64 which failed due to missing writeq).

    * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
    * powerpc and powerpc64 SMP allmodconfig
    * sparc and sparc64 SMP allmodconfig
    * ia64 SMP allmodconfig
    * s390 SMP allmodconfig
    * alpha SMP allmodconfig
    * um on x86_64 SMP allmodconfig

    8. percpu.h modifications were reverted so that it could be applied as
    a separate patch and serve as bisection point.

    Given the fact that I had only a couple of failures from tests on step
    6, I'm fairly confident about the coverage of this conversion patch.
    If there is a breakage, it's likely to be something in one of the arch
    headers which should be easily discoverable on most builds of
    the specific arch.

    Signed-off-by: Tejun Heo <tj@kernel.org>
    Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
    Cc: Ingo Molnar <mingo@redhat.com>
    Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
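
The header-selection rule stated above (gfp.h if only gfp is used, slab.h if slab is used) can be sketched in a few lines of C. This is not the slabh-sweep.py script, which is written in Python and does far more; `needed_header()` is a hypothetical helper, and the identifier lists it matches are illustrative assumptions rather than the ones the real script uses.

```c
#include <stddef.h>
#include <string.h>

/*
 * Return the header a source text appears to need: "linux/slab.h" if any
 * slab call is present, otherwise "linux/gfp.h" if only gfp flags are
 * present, otherwise NULL. Plain substring matching, for illustration only.
 */
static const char *needed_header(const char *src)
{
    static const char *const slab_ids[] = { "kmalloc(", "kzalloc(", "kfree(" };
    static const char *const gfp_ids[] = { "GFP_KERNEL", "GFP_ATOMIC" };
    size_t i;

    for (i = 0; i < sizeof(slab_ids) / sizeof(slab_ids[0]); i++)
        if (strstr(src, slab_ids[i]))
            return "linux/slab.h";
    for (i = 0; i < sizeof(gfp_ids) / sizeof(gfp_ids[0]); i++)
        if (strstr(src, gfp_ids[i]))
            return "linux/gfp.h";
    return NULL;
}
```

Checking slab usage first is what makes a file that uses both end up with slab.h only, in line with the "only the necessary includes" goal.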

    Tejun Heo
     

27 Mar, 2010

1 commit

  • Add new flag MSG_WAITFORONE for the recvmmsg() syscall.
    When this flag is specified for a blocking socket, recvmmsg()
    will only block until at least 1 packet is available. The
    default behavior is to block until all vlen packets are
    available. This flag has no effect on non-blocking sockets
    or when used in combination with MSG_DONTWAIT.

    Signed-off-by: Brandon L Black
    Acked-by: Ulrich Drepper
    Acked-by: Eric Dumazet
    Acked-by: Arnaldo Carvalho de Melo
    Signed-off-by: David S. Miller
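
A minimal userspace sketch of how the flag might be used follows. It assumes a Linux system with glibc, and for self-containment it uses an AF_UNIX datagram socketpair rather than a network socket; `waitforone_demo()`, `VLEN` and `BUFSZ` are names made up for this example. The receiver asks for up to VLEN messages while fewer are queued, so with MSG_WAITFORONE the call is expected to return what is available instead of blocking for the rest.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

#define VLEN  4    /* messages requested per recvmmsg() call */
#define BUFSZ 64   /* per-message buffer size */

/*
 * Queue nsent datagrams on a blocking socket, then call recvmmsg() once
 * with MSG_WAITFORONE. Returns the number of messages received, or -1.
 */
static int waitforone_demo(int nsent)
{
    struct mmsghdr msgs[VLEN];
    struct iovec iov[VLEN];
    char bufs[VLEN][BUFSZ];
    int sv[2], i, n = -1;

    /* With nothing queued the blocking call would never return. */
    assert(nsent >= 1 && nsent <= VLEN);

    if (socketpair(AF_UNIX, SOCK_DGRAM, 0, sv) < 0)
        return -1;

    for (i = 0; i < nsent; i++)
        if (send(sv[0], "ping", 4, 0) != 4)
            goto out;

    /* One iovec and one buffer per message slot. */
    memset(msgs, 0, sizeof(msgs));
    for (i = 0; i < VLEN; i++) {
        iov[i].iov_base = bufs[i];
        iov[i].iov_len = BUFSZ;
        msgs[i].msg_hdr.msg_iov = &iov[i];
        msgs[i].msg_hdr.msg_iovlen = 1;
    }

    /* Block only until at least one message has arrived. */
    n = recvmmsg(sv[1], msgs, VLEN, MSG_WAITFORONE, NULL);
out:
    close(sv[0]);
    close(sv[1]);
    return n;
}
```

After a successful call, `msgs[i].msg_len` holds the length of each received message. Passing 0 instead of MSG_WAITFORONE in the same situation would leave the caller blocked until all VLEN messages had arrived.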

    Brandon L Black
     

17 Dec, 2009

4 commits