27 Sep, 2006

1 commit

  • This eliminates the i_blksize field from struct inode. Filesystems that want
    to provide a per-inode st_blksize can do so by providing their own getattr
    routine instead of using the generic_fillattr() function.

    Note that some filesystems were providing pretty much random (and incorrect)
    values for i_blksize.

    [bunk@stusta.de: cleanup]
    [akpm@osdl.org: generic_fillattr() fix]
    Signed-off-by: "Theodore Ts'o"
    Signed-off-by: Adrian Bunk
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Theodore Ts'o
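    A minimal sketch of such a getattr (the foo_* names and FOO_PREFERRED_IO_SIZE
    are hypothetical, and the prototype is the 2.6.18-era one):

        static int foo_getattr(struct vfsmount *mnt, struct dentry *dentry,
                               struct kstat *stat)
        {
                struct inode *inode = dentry->d_inode;

                /* Fill in the common fields first... */
                generic_fillattr(inode, stat);
                /* ...then report whatever block size suits this inode. */
                stat->blksize = FOO_PREFERRED_IO_SIZE;
                return 0;
        }

        static struct inode_operations foo_inode_operations = {
                .getattr = foo_getattr,
                /* ... */
        };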
     

23 Jun, 2006

1 commit

  • Extend the get_sb() filesystem operation to take an extra argument that
    permits the VFS to pass in the target vfsmount that defines the mountpoint.

    The filesystem is then required to manually set the superblock and root dentry
    pointers. For most filesystems, this should be done with simple_set_mnt()
    which will set the superblock pointer and then set the root dentry to the
    superblock's s_root (as per the old default behaviour).

    The get_sb() op now returns an integer as there's now no need to return the
    superblock pointer.

    This patch permits a superblock to be implicitly shared amongst several mount
    points, such as can be done with NFS to avoid potential inode aliasing. In
    such a case, simple_set_mnt() would not be called, and instead the mnt_root
    and mnt_sb would be set directly.

    The patch also makes the following changes:

    (*) The get_sb_*() convenience functions in the core kernel now take a vfsmount
    pointer argument and return an integer, so most filesystems have to change
    very little.

    (*) If one of the convenience functions is not used, then get_sb() should
    normally call simple_set_mnt() to instantiate the vfsmount. This will
    always return 0, and so can be tail-called from get_sb().

    (*) generic_shutdown_super() now calls shrink_dcache_sb() to clean up the
    dcache upon superblock destruction rather than shrink_dcache_anon().

    This is required because the superblock may now have multiple trees that
    aren't actually bound to s_root, but that still need to be cleaned up. The
    currently called functions assume that the whole tree is rooted at s_root,
    and that anonymous dentries are not the roots of trees, which results in
    dentries being left unculled.

    However, with the way NFS superblock sharing is currently set to be
    implemented, these assumptions are violated: the root of the filesystem is
    simply a dummy dentry and inode (the real inode for '/' may well be
    inaccessible), and all the vfsmounts are rooted on anonymous[*] dentries
    with child trees.

    [*] Anonymous until discovered from another tree.

    (*) The documentation has been adjusted accordingly, including changing the
    ext2_* examples there to foo_*.

    [akpm@osdl.org: convert ipath_fs, do other stuff]
    Signed-off-by: David Howells
    Acked-by: Al Viro
    Cc: Nathan Scott
    Cc: Roland Dreier
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Howells
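    A minimal sketch of the two shapes a new-style get_sb() can take (all foo_*
    names are hypothetical):

        /* Common case: a core helper allocates the superblock and
         * instantiates the vfsmount, so get_sb() can just tail-call it. */
        static int foo_get_sb(struct file_system_type *fs_type, int flags,
                              const char *dev_name, void *data,
                              struct vfsmount *mnt)
        {
                return get_sb_nodev(fs_type, flags, data, foo_fill_super, mnt);
        }

        /* Manual case: build the superblock by hand, then publish it with
         * simple_set_mnt(), which sets mnt_sb and mnt_root and returns 0. */
        static int foo_get_sb_manual(struct file_system_type *fs_type, int flags,
                                     const char *dev_name, void *data,
                                     struct vfsmount *mnt)
        {
                struct super_block *sb = foo_create_super(fs_type, flags, data);

                if (IS_ERR(sb))
                        return PTR_ERR(sb);
                return simple_set_mnt(mnt, sb);
        }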
     

02 May, 2006

4 commits

  • Apply the same rules as the anon pipe pages, only allow stealing
    if no one else is using the page.

    Signed-off-by: Jens Axboe

    Jens Axboe
     
  • The pipe ->map() method uses kmap() to virtually map the pages, which
    is both slow and has known scalability issues on SMP. This patch enables
    atomic copying of pipe pages by pre-faulting data and using kmap_atomic()
    instead (a sketch of this pattern follows the last commit in this list).

    lmbench bw_pipe and lat_pipe measurements agree this is a Good Thing. Here
    are results from a UP machine with highmem (1.5GiB of RAM), running first a
    UP kernel, then an SMP kernel, and finally the patched SMP kernel.

    Vanilla-UP:
    Pipe bandwidth: 1622.28 MB/sec
    Pipe bandwidth: 1610.59 MB/sec
    Pipe bandwidth: 1608.30 MB/sec
    Pipe latency: 7.3275 microseconds
    Pipe latency: 7.2995 microseconds
    Pipe latency: 7.3097 microseconds

    Vanilla-SMP:
    Pipe bandwidth: 1382.19 MB/sec
    Pipe bandwidth: 1317.27 MB/sec
    Pipe bandwidth: 1355.61 MB/sec
    Pipe latency: 9.6402 microseconds
    Pipe latency: 9.6696 microseconds
    Pipe latency: 9.6153 microseconds

    Patched-SMP:
    Pipe bandwidth: 1578.70 MB/sec
    Pipe bandwidth: 1579.95 MB/sec
    Pipe bandwidth: 1578.63 MB/sec
    Pipe latency: 9.1654 microseconds
    Pipe latency: 9.2266 microseconds
    Pipe latency: 9.1527 microseconds

    Signed-off-by: Jens Axboe

    Jens Axboe
     
  • The ->map() function is really expensive on highmem machines right now,
    since it has to use the slower kmap() instead of kmap_atomic(). Splice
    rarely needs to access the virtual address of a page, so it's a waste
    of time doing it.

    Introduce ->pin() to take over the responsibility of making sure the
    page data is valid. ->map() is then reduced to just kmap(). That way we
    can also share most of the pipe buffer ops between pipe.c and splice.c.

    Signed-off-by: Jens Axboe

    Jens Axboe
     
  • Found by Oleg Nesterov, fixed by me.

    - Only allow full pages to go to the page cache.
    - Check page != buf->page instead of using PIPE_BUF_FLAG_STOLEN.
    - Remember to clear 'stolen' if add_to_page_cache() fails.

    And as a cleanup on that:

    - Make the bottom fall-through logic a little less convoluted. Also make
    the steal path hold an extra reference to the page, so we don't have
    to differentiate between stolen and non-stolen at the end.

    Signed-off-by: Jens Axboe

    Jens Axboe
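    A minimal sketch of the prefault-plus-kmap_atomic pattern from the second
    commit above (the helper name is made up; the two-argument
    kmap_atomic()/KM_USER0 interface is the 2.6.17-era one):

        static int pipe_copy_from_user_atomic(struct page *page, unsigned int offset,
                                              const char __user *src, unsigned int len)
        {
                unsigned long left;
                char *dst;

                /* Touch the user pages while we are still allowed to sleep... */
                fault_in_pages_readable(src, len);

                /* ...then map atomically; the copy below must not fault. */
                dst = kmap_atomic(page, KM_USER0);
                left = __copy_from_user_inatomic(dst + offset, src, len);
                kunmap_atomic(dst, KM_USER0);

                /* A fault despite the prefault means the caller has to retry
                 * with the slower, sleeping kmap() path. */
                return left ? -EFAULT : 0;
        }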
     

10 Apr, 2006

1 commit

  • Separate out the 'internal pipe object' abstraction, and make it
    usable by splice. This cleans up and fixes several aspects of the
    internal splice APIs and the pipe code:

    - pipes: the allocation and freeing of pipe_inode_info is now more symmetric
    and more streamlined with existing kernel practices.

    - splice: small micro-optimization: less pointer dereferencing in splice
    methods

    Signed-off-by: Ingo Molnar

    Update XFS for the ->splice_read/->splice_write changes.

    Signed-off-by: Jens Axboe

    Ingo Molnar
     

03 Apr, 2006

2 commits

  • Originally from Nick Piggin, just adapted to the newer branch.

    You can't check PageLRU without holding zone->lru_lock. The page
    release code can get away with it only because the page refcount is 0 at
    that point. Also, you can't reliably remove pages from the LRU unless
    the refcount is 0. Ever.

    Signed-off-by: Nick Piggin
    Signed-off-by: Jens Axboe

    Jens Axboe
     
  • By cleaning up the writeback logic (killing write_one_page() and the manual
    set_page_dirty()), we can get rid of ->stolen inside the pipe_buffer and
    just keep it local in pipe_to_file().

    This also adds dirty page balancing logic and O_SYNC handling.

    Signed-off-by: Jens Axboe

    Jens Axboe
     

31 Mar, 2006

2 commits

  • This enables the caller to migrate pages from one address space page
    cache to another. In buzz word marketing, you can do zero-copy file
    copies!

    Signed-off-by: Jens Axboe
    Signed-off-by: Linus Torvalds

    Jens Axboe
     
  • This adds support for the sys_splice system call. Using a pipe as a
    transport, it can connect to files or sockets (the latter as output only).

    From the splice.c comments:

    "splice": joining two ropes together by interweaving their strands.

    This is the "extended pipe" functionality, where a pipe is used as
    an arbitrary in-memory buffer. Think of a pipe as a small kernel
    buffer that you can use to transfer data from one end to the other.

    The traditional unix read/write is extended with a "splice()" operation
    that transfers data buffers to or from a pipe buffer.

    Named by Larry McVoy, original implementation from Linus, extended by
    Jens to support splicing to files and fixing the initial implementation
    bugs.

    Signed-off-by: Jens Axboe
    Signed-off-by: Linus Torvalds

    Jens Axboe
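    A small userspace sketch of the "extended pipe" usage described above:
    copy a file to stdout through a pipe with splice(2). It assumes a libc
    that exposes the splice() wrapper, and stdout redirected to a regular
    file or socket:

        #define _GNU_SOURCE
        #include <fcntl.h>
        #include <stdio.h>
        #include <unistd.h>

        int main(int argc, char **argv)
        {
                int pfd[2], in;
                ssize_t n;

                if (argc < 2 || (in = open(argv[1], O_RDONLY)) < 0 || pipe(pfd) < 0) {
                        perror("setup");
                        return 1;
                }

                for (;;) {
                        /* file -> pipe */
                        n = splice(in, NULL, pfd[1], NULL, 65536, SPLICE_F_MOVE);
                        if (n <= 0)
                                break;
                        /* pipe -> stdout, draining everything just queued */
                        while (n > 0) {
                                ssize_t out = splice(pfd[0], NULL, STDOUT_FILENO,
                                                     NULL, n, SPLICE_F_MOVE);
                                if (out <= 0)
                                        return 1;
                                n -= out;
                        }
                }
                return n < 0 ? 1 : 0;
        }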
     

29 Mar, 2006

1 commit

  • This is a conversion to make the various file_operations structs in fs/
    const. Basically a regexp job, with a few manual fixups.

    The goal is both to increase correctness (it is harder to accidentally write
    to shared data structures) and to reduce false sharing of cachelines with
    things that get dirty in .data (while .rodata is nicely read-only and thus
    cache-clean).

    Signed-off-by: Arjan van de Ven
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Arjan van de Ven
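    The shape of the change, on a hypothetical "foo" filesystem (the foo_*
    handler functions are assumed to exist elsewhere):

        /* const puts the ops table in .rodata, so it stays cache-clean. */
        static const struct file_operations foo_file_operations = {
                .owner   = THIS_MODULE,
                .llseek  = generic_file_llseek,
                .read    = generic_file_read,
                .write   = generic_file_write,
                .open    = foo_open,
                .release = foo_release,
        };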
     

27 Mar, 2006

1 commit

  • I discovered while hunting with oprofile on an SMP platform that dentry
    lookups were slowed down because d_hash_mask, d_hash_shift and
    dentry_hashtable were in a cache line that contained inodes_stat. So each
    time inodes_stat is changed by a CPU, other CPUs have to refill their
    cache line.

    This patch moves some variables to the __read_mostly section, in order to
    avoid false sharing. RCU dentry lookups can go full speed.

    Signed-off-by: Eric Dumazet
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Eric Dumazet
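    The pattern is simply an annotation on the declaration, which places the
    variable in a dedicated read-mostly data section away from frequently
    dirtied data:

        static unsigned int d_hash_mask __read_mostly;
        static unsigned int d_hash_shift __read_mostly;
        static struct hlist_head *dentry_hashtable __read_mostly;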
     

11 Jan, 2006

1 commit

  • To allow various options to work per-mount instead of per-sb we need a
    struct vfsmount when updating ctime and mtime. This preparation patch
    replaces the inode_update_time routine with a file_update_time routine so
    we can easily get at the vfsmount (and the struct file makes more sense in
    this context anyway). Also get rid of the unused second argument - we
    always want to update the ctime when calling this routine.

    Signed-off-by: Christoph Hellwig
    Cc: Al Viro
    Cc: Anton Altaparmakov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Hellwig
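    A minimal sketch of the new calling convention in a hypothetical write
    path (do_sync_write() is the era's generic helper; foo_write is made up):

        static ssize_t foo_write(struct file *file, const char __user *buf,
                                 size_t count, loff_t *ppos)
        {
                /* was: inode_update_time(file->f_dentry->d_inode, 1) */
                file_update_time(file);
                return do_sync_write(file, buf, count, ppos);
        }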
     

11 Sep, 2005

1 commit

  • This patch implements a task state bit (TASK_NONINTERACTIVE), which can be
    used by blocking points to mark the task's wait as "non-interactive". This
    does not mean the task will be considered a CPU-hog - the wait will simply
    not have an effect on the waiting task's priority - positive or negative
    alike. Right now only pipe_wait() will make use of it, because it's a
    common source of not-so-interactive waits (kernel compilation jobs, etc.).

    Signed-off-by: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ingo Molnar
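    A minimal sketch of how a blocking point opts in (a generic wait loop
    paraphrasing what pipe_wait() does, not the actual pipe code):

        static void noninteractive_wait(wait_queue_head_t *wq)
        {
                DEFINE_WAIT(wait);

                /* Interruptible sleep that neither boosts nor penalises the
                 * task's interactivity estimate. */
                prepare_to_wait(wq, &wait,
                                TASK_INTERRUPTIBLE | TASK_NONINTERACTIVE);
                schedule();
                finish_wait(wq, &wait);
        }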
     

17 Apr, 2005

1 commit

  • Initial git repository build. I'm not bothering with the full history,
    even though we have it. We can create a separate "historical" git
    archive of that later if we want to, and in the meantime it's about
    3.2GB when imported into git - space that would just make the early
    git days unnecessarily complicated, when we don't have a lot of good
    infrastructure for it.

    Let it rip!

    Linus Torvalds