23 Jun, 2006

1 commit

  • Extend the get_sb() filesystem operation to take an extra argument that
    permits the VFS to pass in the target vfsmount that defines the mountpoint.

    The filesystem is then required to manually set the superblock and root dentry
    pointers. For most filesystems, this should be done with simple_set_mnt()
    which will set the superblock pointer and then set the root dentry to the
    superblock's s_root (as per the old default behaviour).

    The get_sb() op now returns an integer as there's now no need to return the
    superblock pointer.

    This patch permits a superblock to be implicitly shared amongst several mount
    points, such as can be done with NFS to avoid potential inode aliasing. In
    such a case, simple_set_mnt() would not be called, and instead the mnt_root
    and mnt_sb would be set directly.

    The patch also makes the following changes:

    (*) the get_sb_*() convenience functions in the core kernel now take a vfsmount
    pointer argument and return an integer, so most filesystems have to change
    very little.

    (*) If one of the convenience functions is not used, then get_sb() should
    normally call simple_set_mnt() to instantiate the vfsmount. This will
    always return 0, and so can be tail-called from get_sb().

    (*) generic_shutdown_super() now calls shrink_dcache_sb() to clean up the
    dcache upon superblock destruction rather than shrink_dcache_anon().

    This is required because the superblock may now have multiple trees that
    aren't actually bound to s_root, but that still need to be cleaned up. The
    currently called functions assume that the whole tree is rooted at s_root,
    and that anonymous dentries are not the roots of trees, which results in
    dentries being left unculled.

    However, with the way NFS superblock sharing is currently set to be
    implemented, these assumptions are violated: the root of the filesystem is
    simply a dummy dentry and inode (the real inode for '/' may well be
    inaccessible), and all the vfsmounts are rooted on anonymous[*] dentries
    with child trees.

    [*] Anonymous until discovered from another tree.

    (*) The documentation has been adjusted, including the additional bit of
    changing ext2_* into foo_* in the documentation.
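
The new calling convention described above can be sketched in userspace C. This is only a model: the struct definitions are cut-down stand-ins for the kernel types, foo_get_sb() and foo_get_sb_shared() are hypothetical filesystem functions, and the real get_sb() takes more arguments than shown here.

```c
#include <stddef.h>

/* Userspace stand-ins for the kernel types; only the fields used below. */
struct dentry { int d_count; };
struct super_block { struct dentry *s_root; };
struct vfsmount { struct super_block *mnt_sb; struct dentry *mnt_root; };

/* Model of dget(): take a reference on a dentry and hand it back. */
static struct dentry *dget(struct dentry *d)
{
    if (d)
        d->d_count++;
    return d;
}

/* Model of simple_set_mnt() as the message describes it: set the
 * superblock pointer, root the mount at s_root, and always return 0. */
static int simple_set_mnt(struct vfsmount *mnt, struct super_block *sb)
{
    mnt->mnt_sb = sb;
    mnt->mnt_root = dget(sb->s_root);
    return 0;
}

/* Hypothetical common case: get_sb() returns an int and fills in mnt. */
static int foo_get_sb(struct super_block *sb, struct vfsmount *mnt)
{
    /* ... superblock lookup or setup would happen here ... */
    return simple_set_mnt(mnt, sb);    /* tail call, always 0 */
}

/* Hypothetical shared-superblock case: simple_set_mnt() is skipped and
 * the mount is rooted on a dentry other than s_root. */
static int foo_get_sb_shared(struct super_block *sb, struct dentry *subroot,
                             struct vfsmount *mnt)
{
    mnt->mnt_sb = sb;
    mnt->mnt_root = dget(subroot);
    return 0;
}
```

In this model two mounts can point at the same super_block while having different mnt_root dentries, which is the sharing arrangement the message attributes to NFS.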

    [akpm@osdl.org: convert ipath_fs, do other stuff]
    Signed-off-by: David Howells
    Acked-by: Al Viro
    Cc: Nathan Scott
    Cc: Roland Dreier
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Howells
     

01 May, 2006

1 commit

  • On Thursday 23 March 2006 09:08, John D. Ramsdell wrote:
    > I noticed that a socketcall(bind) and socketcall(connect) event contain a
    > record of type=SOCKADDR, but I cannot see one for a system call event
    > associated with socketcall(accept). Recording the sockaddr of an accepted
    > socket is important for cross platform information flow analysis.

    Thanks for pointing this out. The following patch should address this.

    Signed-off-by: Steve Grubb
    Signed-off-by: Al Viro

    Steve Grubb
     

20 Apr, 2006

1 commit


11 Apr, 2006

3 commits

  • * 'splice' of git://brick.kernel.dk/data/git/linux-2.6-block:
    [PATCH] vfs: add splice_write and splice_read to documentation
    [PATCH] Remove sys_ prefix of new syscalls from __NR_sys_*
    [PATCH] splice: warning fix
    [PATCH] another round of fs/pipe.c cleanups
    [PATCH] splice: comment styles
    [PATCH] splice: add Ingo as addition copyright holder
    [PATCH] splice: unlikely() optimizations
    [PATCH] splice: speedups and optimizations
    [PATCH] pipe.c/fifo.c code cleanups
    [PATCH] get rid of the PIPE_*() macros
    [PATCH] splice: speedup __generic_file_splice_read
    [PATCH] splice: add direct fd <-> fd splicing support
    [PATCH] splice: add optional input and output offsets
    [PATCH] introduce a "kernel-internal pipe object" abstraction
    [PATCH] splice: be smarter about calling do_page_cache_readahead()
    [PATCH] splice: optimize the splice buffer mapping
    [PATCH] splice: cleanup __generic_file_splice_read()
    [PATCH] splice: only call wake_up_interruptible() when we really have to
    [PATCH] splice: potential !page dereference
    [PATCH] splice: mark the io page as accessed

    Linus Torvalds
     
  • for_each_cpu() actually iterates across all possible CPUs. We've had mistakes
    in the past where people were using for_each_cpu() where they should have been
    iterating across only online or present CPUs. This is inefficient and
    possibly buggy.

    We're renaming for_each_cpu() to for_each_possible_cpu() to avoid this in the
    future.

    This patch replaces for_each_cpu with for_each_possible_cpu under /net.
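
The difference between the two iteration sets can be shown with a small userspace model. The masks, NR_CPUS value, and for_each_cpu_in() helper below are assumptions made for illustration; the kernel's cpumask implementation is different.

```c
#include <stdint.h>

#define NR_CPUS 8    /* assumed small limit for the model */

/* Hypothetical masks: CPUs 0-3 could ever exist, only 0 and 2 are online. */
static const uint32_t cpu_possible_mask = 0x0F;
static const uint32_t cpu_online_mask   = 0x05;

/* Visit every CPU whose bit is set in mask. The for/if pairing is kept
 * simple for the sketch; it should not be followed by a dangling else. */
#define for_each_cpu_in(cpu, mask)                 \
    for ((cpu) = 0; (cpu) < NR_CPUS; (cpu)++)      \
        if ((mask) & (1u << (cpu)))

#define for_each_possible_cpu(cpu) for_each_cpu_in(cpu, cpu_possible_mask)
#define for_each_online_cpu(cpu)   for_each_cpu_in(cpu, cpu_online_mask)

/* Count how many CPUs each iterator visits. */
static int count_possible(void)
{
    int cpu, n = 0;

    for_each_possible_cpu(cpu)
        n++;
    return n;
}

static int count_online(void)
{
    int cpu, n = 0;

    for_each_online_cpu(cpu)
        n++;
    return n;
}
```

With these masks the possible iterator visits four CPUs and the online iterator visits two, so code that meant "online" but iterated "possible" would touch two CPUs that are not running.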

    Signed-off-by: KAMEZAWA Hiroyuki
    Acked-by: "David S. Miller"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KAMEZAWA Hiroyuki
     
  • From: Andrew Morton

    net/socket.c:148: warning: initialization from incompatible pointer type

    extern declarations in .c files! Bad boy.

    Cc: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Jens Axboe

    Andrew Morton
     

02 Apr, 2006

1 commit


01 Apr, 2006

1 commit

  • This regression was added by commit:
    39d8c1b6fbaeb8d6adec4a8c08365cc9eaca6ae4
    ("Do not lose accepted socket when -ENFILE/-EMFILE.")

    This is based upon a patch from Andi Kleen.

    Thanks to Adrian Bridgett for narrowing down a good test case, and to
    Andi Kleen and Andrew Morton for eyeballing this code.

    Signed-off-by: David S. Miller

    David S. Miller
     

31 Mar, 2006

1 commit

  • This adds support for the sys_splice system call. Using a pipe as a
    transport, it can connect to files or sockets (latter as output only).

    From the splice.c comments:

    "splice": joining two ropes together by interweaving their strands.

    This is the "extended pipe" functionality, where a pipe is used as
    an arbitrary in-memory buffer. Think of a pipe as a small kernel
    buffer that you can use to transfer data from one end to the other.

    The traditional unix read/write is extended with a "splice()" operation
    that transfers data buffers to or from a pipe buffer.

    Named by Larry McVoy, original implementation from Linus, extended by
    Jens to support splicing to files and to fix bugs in the initial
    implementation.
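
From userspace, the "pipe as an in-memory buffer" idea can be sketched as below. This assumes a Linux system whose C library exposes the splice() wrapper under _GNU_SOURCE; splice_copy() is a hypothetical helper name, and the sketch moves data file to pipe to file only.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Move up to len bytes from in_fd to out_fd, using a pipe as the
 * intermediate kernel buffer. Returns bytes moved, or -1 on error. */
static ssize_t splice_copy(int in_fd, int out_fd, size_t len)
{
    int p[2];
    ssize_t total = 0;

    if (pipe(p) < 0)
        return -1;

    while (len > 0) {
        /* Input side: file -> pipe. */
        ssize_t n = splice(in_fd, NULL, p[1], NULL, len, SPLICE_F_MOVE);

        if (n < 0) {
            total = -1;
            break;
        }
        if (n == 0)
            break;                      /* end of input */
        len -= (size_t)n;

        /* Output side: pipe -> file, until the pipe is drained. */
        while (n > 0) {
            ssize_t m = splice(p[0], NULL, out_fd, NULL, (size_t)n,
                               SPLICE_F_MOVE);

            if (m <= 0) {
                total = -1;
                goto out;
            }
            n -= m;
            total += m;
        }
    }
out:
    close(p[0]);
    close(p[1]);
    return total;
}
```

The data never passes through a userspace buffer: each call only asks the kernel to move it between a descriptor and the pipe.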

    Signed-off-by: Jens Axboe
    Signed-off-by: Linus Torvalds

    Jens Axboe
     

29 Mar, 2006

1 commit

  • This is a conversion to make the various file_operations structs in fs/
    const. Basically a regexp job, with a few manual fixups.

    The goal is both to increase correctness (it is harder to accidentally
    write to shared data structures) and to reduce false sharing of cachelines
    with things that get dirty in .data (while .rodata is nicely read only and
    thus cache clean).
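
The effect of the const qualifier can be illustrated with a cut-down model. struct demo_file_operations and the demo_* functions are hypothetical stand-ins, not the kernel's definitions, and where the table ends up in memory depends on the toolchain.

```c
#include <stddef.h>

/* Hypothetical, cut-down stand-in for struct file_operations. */
struct demo_file_operations {
    long (*read)(char *buf, size_t len);
};

/* Fill the buffer with a fixed byte and report how much was "read". */
static long demo_read(char *buf, size_t len)
{
    size_t i;

    for (i = 0; i < len; i++)
        buf[i] = 'x';
    return (long)len;
}

/* Declared const, so the compiler may place the table in read-only data
 * and rejects any direct assignment to its members. */
static const struct demo_file_operations demo_fops = {
    .read = demo_read,
};

/* Callers hold a pointer-to-const and can only call through it. */
static long call_read(const struct demo_file_operations *fops,
                      char *buf, size_t len)
{
    /* fops->read = NULL;  would be rejected at compile time. */
    return fops->read ? fops->read(buf, len) : -1;
}
```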

    Signed-off-by: Arjan van de Ven
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Arjan van de Ven
     

24 Mar, 2006

2 commits

  • Rewrap the overly long source code lines resulting from the previous
    patch's addition of the slab cache flag SLAB_MEM_SPREAD. This patch
    contains only formatting changes, and no functional change.

    Signed-off-by: Paul Jackson
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Paul Jackson
     
  • Mark file system inode and similar slab caches subject to SLAB_MEM_SPREAD
    memory spreading.

    If a slab cache is marked SLAB_MEM_SPREAD, then anytime that a task that's
    in a cpuset with the 'memory_spread_slab' option enabled goes to allocate
    from such a slab cache, the allocations are spread evenly over all the
    memory nodes (task->mems_allowed) allowed to that task, instead of favoring
    allocation on the node local to the current cpu.
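
The node-selection rule above can be modelled in userspace as follows. The round-robin "rotor" is an assumption chosen to make "spread evenly" concrete, and struct task_model and pick_node() are hypothetical names; this is not the kernel's allocator code.

```c
#define MAX_NODES 8    /* assumed limit for the model */

/* Hypothetical per-task state relevant to the decision. */
struct task_model {
    unsigned int mems_allowed;  /* bitmask of nodes the task may use */
    int spread_slab;            /* cpuset 'memory_spread_slab' enabled? */
    int rotor;                  /* last node handed out when spreading */
    int local_node;             /* node local to the current cpu */
};

/* Return the next allowed node after 'after', wrapping around,
 * or -1 if the mask is empty. */
static int next_allowed_node(unsigned int allowed, int after)
{
    int i;

    for (i = 1; i <= MAX_NODES; i++) {
        int node = (after + i) % MAX_NODES;

        if (allowed & (1u << node))
            return node;
    }
    return -1;
}

/* Spread only when both the cache is marked and the task's cpuset asks
 * for it; otherwise favor the local node. */
static int pick_node(struct task_model *t, int cache_mem_spread)
{
    if (cache_mem_spread && t->spread_slab) {
        int node = next_allowed_node(t->mems_allowed, t->rotor);

        if (node >= 0) {
            t->rotor = node;
            return node;
        }
    }
    return t->local_node;
}
```

For a task allowed nodes 0, 1 and 3 with the rotor starting at 0, successive spread allocations in this model go to nodes 1, 3, 0, while an unmarked cache keeps returning the local node.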

    The following inode and similar caches are marked SLAB_MEM_SPREAD:

    file                            cache
    ====                            =====
    fs/adfs/super.c                 adfs_inode_cache
    fs/affs/super.c                 affs_inode_cache
    fs/befs/linuxvfs.c              befs_inode_cache
    fs/bfs/inode.c                  bfs_inode_cache
    fs/block_dev.c                  bdev_cache
    fs/cifs/cifsfs.c                cifs_inode_cache
    fs/coda/inode.c                 coda_inode_cache
    fs/dquot.c                      dquot
    fs/efs/super.c                  efs_inode_cache
    fs/ext2/super.c                 ext2_inode_cache
    fs/ext2/xattr.c (fs/mbcache.c)  ext2_xattr
    fs/ext3/super.c                 ext3_inode_cache
    fs/ext3/xattr.c (fs/mbcache.c)  ext3_xattr
    fs/fat/cache.c                  fat_cache
    fs/fat/inode.c                  fat_inode_cache
    fs/freevxfs/vxfs_super.c        vxfs_inode
    fs/hpfs/super.c                 hpfs_inode_cache
    fs/isofs/inode.c                isofs_inode_cache
    fs/jffs/inode-v23.c             jffs_fm
    fs/jffs2/super.c                jffs2_i
    fs/jfs/super.c                  jfs_ip
    fs/minix/inode.c                minix_inode_cache
    fs/ncpfs/inode.c                ncp_inode_cache
    fs/nfs/direct.c                 nfs_direct_cache
    fs/nfs/inode.c                  nfs_inode_cache
    fs/ntfs/super.c                 ntfs_big_inode_cache_name
    fs/ntfs/super.c                 ntfs_inode_cache
    fs/ocfs2/dlm/dlmfs.c            dlmfs_inode_cache
    fs/ocfs2/super.c                ocfs2_inode_cache
    fs/proc/inode.c                 proc_inode_cache
    fs/qnx4/inode.c                 qnx4_inode_cache
    fs/reiserfs/super.c             reiser_inode_cache
    fs/romfs/inode.c                romfs_inode_cache
    fs/smbfs/inode.c                smb_inode_cache
    fs/sysv/inode.c                 sysv_inode_cache
    fs/udf/super.c                  udf_inode_cache
    fs/ufs/super.c                  ufs_inode_cache
    net/socket.c                    sock_inode_cache
    net/sunrpc/rpc_pipe.c           rpc_inode_cache

    The choice of which slab caches to so mark was quite simple. I marked
    those already marked SLAB_RECLAIM_ACCOUNT, except for fs/xfs, dentry_cache,
    inode_cache, and buffer_head, which were marked in a previous patch. Even
    though SLAB_RECLAIM_ACCOUNT is for a different purpose, it marks the same
    potentially large file system i/o related slab caches as we need for memory
    spreading.

    Given that the rule now becomes "wherever you would have used a
    SLAB_RECLAIM_ACCOUNT slab cache flag before (usually the inode cache), use
    the SLAB_MEM_SPREAD flag too", this should be easy enough to maintain.
    Future file system writers will just copy one of the existing file system
    slab cache setups and tend to get it right without thinking.

    Signed-off-by: Paul Jackson
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Paul Jackson
     

22 Mar, 2006

1 commit


21 Mar, 2006

3 commits

  • Semaphore to mutex conversion.

    The conversion was generated via scripts, and the result was validated
    automatically via a script as well.

    Signed-off-by: Arjan van de Ven
    Signed-off-by: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: David S. Miller

    Arjan van de Ven
     
  • Here's an updated copy of the patch to use fget_light in net/socket.c.
    Rerunning the tests shows a drop of ~80Mbit/s on average, which looks
    bad until you see the drop in cpu usage from ~89% to ~82%. That will
    get fixed in another patch...

    Before: max 8113.70, min 8026.32, avg 8072.34
    87380 16384 16384 10.01 8045.55 87.11 87.11 1.774 1.774
    87380 16384 16384 10.01 8065.14 90.86 90.86 1.846 1.846
    87380 16384 16384 10.00 8077.76 89.85 89.85 1.822 1.822
    87380 16384 16384 10.00 8026.32 89.80 89.80 1.833 1.833
    87380 16384 16384 10.01 8108.59 89.81 89.81 1.815 1.815
    87380 16384 16384 10.01 8034.53 89.01 89.01 1.815 1.815
    87380 16384 16384 10.00 8113.70 90.45 90.45 1.827 1.827
    87380 16384 16384 10.00 8111.37 89.90 89.90 1.816 1.816
    87380 16384 16384 10.01 8077.75 87.96 87.96 1.784 1.784
    87380 16384 16384 10.00 8062.70 90.25 90.25 1.834 1.834

    After: max 8035.81, min 7963.69, avg 7998.14
    87380 16384 16384 10.01 8000.93 82.11 82.11 1.682 1.682
    87380 16384 16384 10.01 8016.17 83.67 83.67 1.710 1.710
    87380 16384 16384 10.01 7963.69 83.47 83.47 1.717 1.717
    87380 16384 16384 10.01 8014.35 81.71 81.71 1.671 1.671
    87380 16384 16384 10.00 7967.68 83.41 83.41 1.715 1.715
    87380 16384 16384 10.00 7995.22 81.00 81.00 1.660 1.660
    87380 16384 16384 10.00 8002.61 83.90 83.90 1.718 1.718
    87380 16384 16384 10.00 8035.81 81.71 81.71 1.666 1.666
    87380 16384 16384 10.01 8005.36 82.56 82.56 1.690 1.690
    87380 16384 16384 10.00 7979.61 82.50 82.50 1.694 1.694

    Signed-off-by: Benjamin LaHaise
    Signed-off-by: David S. Miller

    Benjamin LaHaise
     
  • Try to allocate the struct file and an unused file
    descriptor before we try to pull a newly accepted
    socket out of the protocol layer.

    Based upon a patch by Prassana Meda.

    Signed-off-by: David S. Miller

    David S. Miller
     

07 Feb, 2006

1 commit


06 Feb, 2006

1 commit

  • percpu_data blindly allocates bootmem memory to store NR_CPUS instances of
    cpudata, instead of allocating memory only for possible cpus.

    As a preparation for changing that, we need to convert various 0 -> NR_CPUS
    loops to use for_each_cpu().

    (The above only applies to users of asm-generic/percpu.h. powerpc has gone it
    alone and is presently only allocating memory for present CPUs, so it's
    currently corrupting memory).

    Signed-off-by: Eric Dumazet
    Cc: "David S. Miller"
    Cc: James Bottomley
    Acked-by: Ingo Molnar
    Cc: Jens Axboe
    Cc: Anton Blanchard
    Acked-by: William Irwin
    Cc: Andi Kleen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Eric Dumazet
     

31 Jan, 2006

1 commit

  • This patch contains the following changes:
    - add a CONFIG_WIRELESS_EXT select'ed by NET_RADIO for conditional
      code
    - remove the now no longer required #ifdef CONFIG_NET_RADIO from some
      #include's

    Based on a patch by Jean Tourrilhes.

    Signed-off-by: Adrian Bunk
    Signed-off-by: John W. Linville

    Adrian Bunk
     

12 Jan, 2006

1 commit


04 Jan, 2006

4 commits


28 Sep, 2005

1 commit

  • I have been experimenting with loadable protocol modules, and ran into
    several issues with module reference counting.

    The first issue was that __module_get failed at the BUG_ON check at
    the top of the routine (checking that my module reference count was
    not zero) when I created the first socket. When sk_alloc() is called,
    my module reference count was still 0. When I looked at why sctp
    didn't have this problem, I discovered that sctp creates a control
    socket during module init (when the module ref count is not 0), which
    keeps the reference count non-zero. This section has been updated to
    address the point Stephen raised about checking the return value of
    try_module_get().

    The next problem arose when my socket init routine returned an error.
    This resulted in my module reference count being decremented below 0.
    My socket ops->release routine was also being called. The issue here
    is that sock_release() calls the ops->release routine and decrements
    the ref count if sock->ops is not NULL. Since the socket probably
    didn't get correctly initialized, this should not be done, so we will
    set sock->ops to NULL because we will not call try_module_get().

    While searching for another bug, I also noticed that sys_accept() has
    a possibility of doing a module_put() when it did not do an
    __module_get so I re-ordered the call to security_socket_accept().
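
The invariant behind the second fix (release drops a module reference only when sock->ops is set, so sock->ops must be NULL whenever no reference was taken) can be modelled in userspace. All of the *_model types and functions below are hypothetical, and try_module_get()/module_put() here are simplified stand-ins for the kernel functions of the same names.

```c
#include <stddef.h>

/* Simplified module: a reference count and a "still loaded" flag. */
struct module_model { int refcnt; int live; };

/* Take a reference unless the module is going away. */
static int try_module_get(struct module_model *m)
{
    if (!m->live)
        return 0;
    m->refcnt++;
    return 1;
}

static void module_put(struct module_model *m)
{
    m->refcnt--;
}

struct proto_ops_model { struct module_model *owner; };
struct socket_model { const struct proto_ops_model *ops; };

/* init_err models the protocol's socket init routine failing. */
static int sock_create_model(struct socket_model *sock,
                             const struct proto_ops_model *ops, int init_err)
{
    sock->ops = ops;            /* the init routine installs its ops ... */
    if (init_err) {             /* ... and may then fail */
        sock->ops = NULL;       /* no reference taken: release must not put */
        return init_err;
    }
    if (!try_module_get(sock->ops->owner)) {
        sock->ops = NULL;       /* same rule when the get itself fails */
        return -1;
    }
    return 0;
}

/* Drop the module reference only if one was taken at create time. */
static void sock_release_model(struct socket_model *sock)
{
    if (sock->ops) {
        struct module_model *owner = sock->ops->owner;

        sock->ops = NULL;
        module_put(owner);
    }
}
```

In this model a failed create followed by a release leaves the count unchanged, instead of decrementing it below zero as the message describes.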

    Signed-off-by: Frank Filz
    Signed-off-by: David S. Miller

    Frank Filz
     

27 Sep, 2005

1 commit


17 Sep, 2005

1 commit


08 Sep, 2005

1 commit

  • When we copy 32bit ->msg_control contents to kernel, we walk the same
    userland data twice without sanity checks on the second pass.

    Second version of this patch: the original broke with 64-bit arches
    running 32-bit-compat-mode executables doing sendmsg() syscalls with
    unaligned CMSG data areas.

    Another thing is that we use kmalloc() to allocate and sock_kfree_s()
    to free afterwards; less serious, but also needs fixing.

    Signed-off-by: Al Viro
    Signed-off-by: David Woodhouse
    Signed-off-by: Chris Wright
    Signed-off-by: Linus Torvalds

    Al Viro
     

07 Sep, 2005

1 commit


30 Aug, 2005

3 commits

  • This patch puts mostly read only data in the right section
    (read_mostly), to help sharing of these data between CPUS without
    memory ping pongs.

    On one of my production machines, tcp_statistics was sitting in a
    heavily modified cache line, so *every* SNMP update had to force a
    reload.

    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     
  • Of this type, mostly:

    CHECK net/ipv6/netfilter.c
    net/ipv6/netfilter.c:96:12: warning: symbol 'ipv6_netfilter_init' was not declared. Should it be static?
    net/ipv6/netfilter.c:101:6: warning: symbol 'ipv6_netfilter_fini' was not declared. Should it be static?

    Signed-off-by: Arnaldo Carvalho de Melo
    Signed-off-by: David S. Miller

    Arnaldo Carvalho de Melo
     
  • Please consider the patch below which makes use of file->private_data to
    store the pointer to the socket, which avoids touching several unused
    cachelines in the dentry and inode in sockfd_lookup.

    Signed-off-by: Benjamin LaHaise
    Signed-off-by: David S. Miller

    Benjamin LaHaise
     

23 Jun, 2005

1 commit


02 Jun, 2005

1 commit


17 May, 2005

1 commit


06 May, 2005

1 commit


17 Apr, 2005

1 commit

  • Initial git repository build. I'm not bothering with the full history,
    even though we have it. We can create a separate "historical" git
    archive of that later if we want to, and in the meantime it's about
    3.2GB when imported into git - space that would just make the early
    git days unnecessarily complicated, when we don't have a lot of good
    infrastructure for it.

    Let it rip!

    Linus Torvalds