09 May, 2009

1 commit


03 Apr, 2009

1 commit

  • Impact: cleanup

    We want to remove percpu.h from rcupdate.h (for upcoming kmemtrace
    changes), but this is not possible currently without breaking the
    build because fs.h has an implicit include file dependency: it
    uses PAGE_SIZE but does not include asm/page.h which defines it.

    This problem gets masked in practice because most fs.h using sites
    use rcupreempt.h (and other headers) which includes percpu.h which
    brings in asm/page.h indirectly.

    We cannot add asm/page.h to asm/fs.h because page.h is not an
    exported header.

    Move simple_transaction_set() to the other simple-transaction
    file helpers in fs/libfs.c.

    This removes the include file hell and also reduces
    kernel size a bit.

    Acked-by: Al Viro
    Cc: Alexey Dobriyan
    Cc: Pekka Enberg
    Cc: Eduard - Gabriel Munteanu
    Cc: paulmck@linux.vnet.ibm.com
    Signed-off-by: Ingo Molnar

    Ingo Molnar
     

28 Mar, 2009

2 commits

  • simple_set_mnt() is defined as returning 'int' but always returns 0.
    Callers assume simple_set_mnt() never fails and don't properly clean up if
    it were to _ever_ fail. For instance, get_sb_single() and get_sb_nodev()
    should:

    up_write(&sb->s_umount);
    deactivate_super(sb);

    if simple_set_mnt() fails.

    Since simple_set_mnt() never fails, it would be cleaner if it did not
    return anything.

    [akpm@linux-foundation.org: fix build]
    Signed-off-by: Sukadev Bhattiprolu
    Acked-by: Serge Hallyn
    Cc: Al Viro
    Cc: Christoph Hellwig
    Signed-off-by: Andrew Morton
    Signed-off-by: Al Viro

    Sukadev Bhattiprolu
     
  • Signed-off-by: Al Viro

    Al Viro
     

06 Jan, 2009

2 commits

  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6:
    inotify: fix type errors in interfaces
    fix breakage in reiserfs_new_inode()
    fix the treatment of jfs special inodes
    vfs: remove duplicate code in get_fs_type()
    add a vfs_fsync helper
    sys_execve and sys_uselib do not call into fsnotify
    zero i_uid/i_gid on inode allocation
    inode->i_op is never NULL
    ntfs: don't NULL i_op
    isofs check for NULL ->i_op in root directory is dead code
    affs: do not zero ->i_op
    kill suid bit only for regular files
    vfs: lseek(fd, 0, SEEK_CUR) race condition

    Linus Torvalds
     
  • ... and don't bother in callers. Don't bother with zeroing i_blocks,
    while we are at it - it's already been zeroed.

    i_mode is not worth the effort; it has no common default value.

    Signed-off-by: Al Viro

    Al Viro
     

05 Jan, 2009

1 commit

  • With the write_begin/write_end aops, page_symlink was broken because it
    could no longer pass a GFP_NOFS type mask into the point where the
    allocations happened. They are done in write_begin, which would always
    assume that the filesystem can be entered from reclaim. This bug could
    cause filesystem deadlocks.

    The funny thing with having a gfp_t mask there is that it doesn't really
    allow the caller to arbitrarily tinker with the context in which it can be
    called. It couldn't ever be GFP_ATOMIC, for example, because it needs to
    take the page lock. The only thing any callers care about is __GFP_FS
    anyway, so turn that into a single flag.

    Add a new flag for write_begin, AOP_FLAG_NOFS. Filesystems can now act on
    this flag in their write_begin function. Change __grab_cache_page to
    accept a nofs argument as well, to honour that flag (while we're there,
    change the name to grab_cache_page_write_begin which is more instructive
    and does away with random leading underscores).

    This is really a more flexible way to go in the end anyway -- if a
    filesystem happens to want any extra allocations aside from the pagecache
    ones in its write_begin function, it may now use GFP_KERNEL (rather than
    GFP_NOFS) for common case allocations (eg. ocfs2_alloc_write_ctxt, for a
    random example).

    [kosaki.motohiro@jp.fujitsu.com: fix ubifs]
    [kosaki.motohiro@jp.fujitsu.com: fix fuse]
    Signed-off-by: Nick Piggin
    Reviewed-by: KOSAKI Motohiro
    Cc: [2.6.28.x]
    Signed-off-by: KOSAKI Motohiro
    Signed-off-by: Andrew Morton
    [ Cleaned up the calling convention: just pass in the AOP flags
    untouched to the grab_cache_page_write_begin() function. That
    just simplifies everybody, and may even allow future expansion of the
    logic. - Linus ]
    Signed-off-by: Linus Torvalds

    Nick Piggin
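
The flag handling described in the entry above can be sketched in userspace as follows. This is only a model: the `MODEL_*` constants and `model_write_begin_gfp()` are hypothetical names with illustrative values, not the kernel's definitions. It shows the single idea from the commit message, namely that a NOFS write_begin flag clears the "may enter the filesystem" bit from the allocation mask while leaving the rest of the mask alone.

```c
/* Illustrative bit values only; the real kernel constants may differ. */
#define MODEL_AOP_FLAG_NOFS 0x0004u
#define MODEL_GFP_IO        0x40u
#define MODEL_GFP_FS        0x80u
/* Simplified stand-in for GFP_KERNEL: allocation may do I/O and enter the fs. */
#define MODEL_GFP_KERNEL    (MODEL_GFP_IO | MODEL_GFP_FS)

/*
 * Return the allocation mask a write_begin implementation would use:
 * when the caller passes the NOFS flag, drop the FS bit so that reclaim
 * cannot re-enter the filesystem; otherwise keep the mask unchanged.
 */
static unsigned model_write_begin_gfp(unsigned mapping_gfp, unsigned aop_flags)
{
    if (aop_flags & MODEL_AOP_FLAG_NOFS)
        mapping_gfp &= ~MODEL_GFP_FS;
    return mapping_gfp;
}
```

A caller in the position of page_symlink would pass the NOFS flag; ordinary buffered writes would pass no flag and keep the full mask.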
     

31 Oct, 2008

1 commit

  • Nothing uses prepare_write or commit_write. Remove them from the tree
    completely.

    [akpm@linux-foundation.org: schedule simple_prepare_write() for unexporting]
    Signed-off-by: Nick Piggin
    Cc: Christoph Hellwig
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nick Piggin
     

23 Oct, 2008

1 commit

  • The calling conventions of d_alloc_anon are rather unfortunate for all
    users, and its name is not very descriptive either.

    Add d_obtain_alias as a new exported helper that also drops the inode
    reference in the failure case, and that allows NULL pointers and inodes
    to be passed through to allow for tail-calls in the export operations.

    Incidentally this helper already existed as a private function in
    libfs.c as exportfs_d_alloc so kill that one and switch the callers
    to d_obtain_alias.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Al Viro

    Christoph Hellwig
     

31 Jul, 2008

1 commit

  • This commit:

    commit ba52de123d454b57369f291348266d86f4b35070
    Author: Theodore Ts'o
    Date: Wed Sep 27 01:50:49 2006 -0700

    [PATCH] inode-diet: Eliminate i_blksize from the inode structure

    caused the block size used by pseudo-filesystems to decrease from
    PAGE_SIZE to 1024 leading to a doubling of the number of context switches
    during a kernbench run.

    Signed-off-by: Alex Nixon
    Cc: Andi Kleen
    Cc: Jeremy Fitzhardinge
    Cc: Peter Zijlstra
    Cc: Ingo Molnar
    Cc: Ian Campbell
    Cc: "Theodore Ts'o"
    Cc: Alexander Viro
    Cc: Hugh Dickins
    Cc: Jens Axboe
    Cc: [2.6.25.x, 2.6.26.x]
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alex Nixon
     

05 Jul, 2008

1 commit


07 Jun, 2008

1 commit

  • This patch introduces memory_read_from_buffer().

    The only difference between memory_read_from_buffer() and
    simple_read_from_buffer() is which address space the function copies to.

    simple_read_from_buffer copies to user space memory.
    memory_read_from_buffer copies to normal memory.

    Signed-off-by: Akinobu Mita
    Cc: Al Viro
    Cc: Doug Warzecha
    Cc: Zhang Rui
    Cc: Matt Domsch
    Cc: Abhay Salunke
    Cc: Greg Kroah-Hartman
    Cc: Markus Rechberger
    Cc: Kay Sievers
    Cc: Bob Moore
    Cc: Thomas Renninger
    Cc: Len Brown
    Cc: Benjamin Herrenschmidt
    Cc: "Antonino A. Daplas"
    Cc: Krzysztof Helt
    Cc: Geert Uytterhoeven
    Cc: Martin Schwidefsky
    Cc: Heiko Carstens
    Cc: Peter Oberparleiter
    Cc: Michael Holzheu
    Cc: Brian King
    Cc: James E.J. Bottomley
    Cc: Andrew Vasquez
    Cc: Seokmann Ju
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Akinobu Mita
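
As a rough illustration of the semantics described in the entry above, here is a userspace model of a buffer read that copies into normal memory. The name `model_memory_read_from_buffer()` and the use of `long long` for the position are assumptions made for this sketch; the kernel helper operates on kernel types and this is not its source.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>  /* ssize_t */

/*
 * Copy up to count bytes from from[0..available) into to, starting at
 * offset *ppos, and advance *ppos by the number of bytes copied.
 * Returns the number of bytes copied, 0 at or past the end of the
 * buffer, or -EINVAL for a negative position.
 */
static ssize_t model_memory_read_from_buffer(void *to, size_t count,
                                             long long *ppos,
                                             const void *from,
                                             size_t available)
{
    long long pos = *ppos;

    if (pos < 0)
        return -EINVAL;
    if ((unsigned long long)pos >= available)
        return 0;
    if (count > available - (size_t)pos)
        count = available - (size_t)pos;  /* clamp to what is left */
    memcpy(to, (const char *)from + pos, count);
    *ppos = pos + (long long)count;
    return (ssize_t)count;
}
```

The user-space variant would differ only in the copy step, which would use a copy-to-user primitive instead of `memcpy()`.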
     

09 Feb, 2008

3 commits

  • simple_attr_close implements ->release so it should be named accordingly.

    Signed-off-by: Christoph Hellwig
    Cc: Arnd Bergmann
    Cc: Greg KH
    Cc: Al Viro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Hellwig
     
  • Use mutex_lock_interruptible in simple_attr_read/write.

    Signed-off-by: Christoph Hellwig
    Cc: Arnd Bergmann
    Cc: Greg KH
    Cc: Al Viro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Hellwig
     
  • Sometimes simple attributes might need to return an error, e.g. for
    acquiring a mutex interruptibly. In fact we have that situation in
    spufs already, which is the original user of the simple attributes. This
    patch merges the temporarily forked attributes in spufs back into the
    main ones and allows errors to be returned.

    [akpm@linux-foundation.org: build fix]
    Signed-off-by: Christoph Hellwig
    Cc: Arnd Bergmann
    Cc: Greg KH
    Cc: Al Viro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Hellwig
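
The idea of a simple attribute that can fail, as described in the last entry above, can be sketched like this. It is a userspace model under stated assumptions: `model_attr_get_t`, `model_attr_read()` and the two sample getters are hypothetical names, and the fixed `"%llu\n"` format is a simplification. The point is only that the getter returns a status and delivers the value through a pointer, so the read path can propagate an error instead of formatting a bogus value.

```c
#include <errno.h>
#include <stdint.h>
#include <stdio.h>

/* Getter returns 0 on success or a negative errno; value goes out via *val. */
typedef int (*model_attr_get_t)(void *data, uint64_t *val);

/* Sample getter that always succeeds: data points at the value itself. */
static int model_get_ok(void *data, uint64_t *val)
{
    *val = *(uint64_t *)data;
    return 0;
}

/* Sample getter that fails, as if an interruptible lock had been interrupted. */
static int model_get_interrupted(void *data, uint64_t *val)
{
    (void)data;
    (void)val;
    return -EINTR;
}

/* Format the attribute into buf, or pass the getter's error straight back. */
static long model_attr_read(model_attr_get_t get, void *data,
                            char *buf, size_t size)
{
    uint64_t val;
    int ret = get(data, &val);

    if (ret)
        return ret;
    return snprintf(buf, size, "%llu\n", (unsigned long long)val);
}
```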
     

06 Feb, 2008

1 commit

  • Simplify page cache zeroing of segments of pages through 3 functions

    zero_user_segments(page, start1, end1, start2, end2)

    Zeros two segments of the page. It takes the position where to
    start and end the zeroing which avoids length calculations and
    makes code clearer.

    zero_user_segment(page, start, end)

    Same for a single segment.

    zero_user(page, start, length)

    Length variant for the case where we know the length.

    We remove the zero_user_page macro. Issues:

    1. It's a macro. Inline functions are preferable.

    2. The KM_USER0 macro is only defined for HIGHMEM.

    Having to treat this special case everywhere makes the
    code needlessly complex. The parameter for zeroing is always
    KM_USER0 except in one single case that we open code.

    Avoiding KM_USER0 means that a lot of code no longer has to deal
    with the special casing for HIGHMEM. Dealing with
    kmap is only necessary for HIGHMEM configurations. In those
    configurations we use KM_USER0 like we do for a series of other
    functions defined in highmem.h.

    Since KM_USER0 depends on HIGHMEM, the existing zero_user_page
    could not be an inline function and had to be a macro. The zero_user_*
    functions introduced here can be inline because that constant is not
    used when these functions are called.

    Also extract the flushing of the caches to be outside of the kmap.

    [akpm@linux-foundation.org: fix nfs and ntfs build]
    [akpm@linux-foundation.org: fix ntfs build some more]
    Signed-off-by: Christoph Lameter
    Cc: Steven French
    Cc: Michael Halcrow
    Cc: Steven Whitehouse
    Cc: Trond Myklebust
    Cc: "J. Bruce Fields"
    Cc: Anton Altaparmakov
    Cc: Mark Fasheh
    Cc: David Chinner
    Cc: Michael Halcrow
    Cc: Steven French
    Cc: Steven Whitehouse
    Cc: Trond Myklebust
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
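
The three helpers described in the entry above can be modelled in userspace on a plain byte array. This is a sketch, not the kernel code: the `model_` names and `MODEL_PAGE_SIZE` are assumptions, and the kmap and cache-flush steps that the real helpers need are deliberately left out. It shows how the start/end form avoids length calculations at the call site and how the other two variants reduce to it.

```c
#include <assert.h>
#include <string.h>

#define MODEL_PAGE_SIZE 4096u  /* assumed page size for the model */

/* Zero the byte ranges [start1, end1) and [start2, end2) of a page. */
static void model_zero_user_segments(unsigned char *page,
                                     unsigned start1, unsigned end1,
                                     unsigned start2, unsigned end2)
{
    assert(end1 <= MODEL_PAGE_SIZE && end2 <= MODEL_PAGE_SIZE);
    if (end1 > start1)
        memset(page + start1, 0, end1 - start1);
    if (end2 > start2)
        memset(page + start2, 0, end2 - start2);
}

/* Single-segment variant: the second segment is empty. */
static void model_zero_user_segment(unsigned char *page,
                                    unsigned start, unsigned end)
{
    model_zero_user_segments(page, start, end, 0, 0);
}

/* Length variant for callers that already know the length. */
static void model_zero_user(unsigned char *page,
                            unsigned start, unsigned length)
{
    model_zero_user_segments(page, start, start + length, 0, 0);
}
```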
     

22 Oct, 2007

1 commit

  • Add the guts for the new filesystem API to exportfs.

    There's now a fh_to_dentry method that returns a dentry for the object looked
    for given a filehandle fragment, and a fh_to_parent operation that returns the
    dentry for the encoded parent directory in case the file handle contains it.

    There are default implementations for these methods that only take a callback
    for an nfs-enhanced iget variant and implement the rest of the semantics.

    Signed-off-by: Christoph Hellwig
    Cc: Neil Brown
    Cc: "J. Bruce Fields"
    Cc: Dave Kleikamp
    Cc: Anton Altaparmakov
    Cc: David Chinner
    Cc: Timothy Shimmin
    Cc: OGAWA Hirofumi
    Cc: Hugh Dickins
    Cc: Chris Mason
    Cc: Jeff Mahoney
    Cc: "Vladimir V. Saveliev"
    Cc: Steven Whitehouse
    Cc: Mark Fasheh
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Hellwig
     

17 Oct, 2007

2 commits

  • simple_commit_write() can now become static.

    Signed-off-by: Adrian Bunk
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Adrian Bunk
     
  • These are intended to replace prepare_write and commit_write with more
    flexible alternatives that are also able to avoid the buffered write
    deadlock problems efficiently (which prepare_write is unable to do).

    [mark.fasheh@oracle.com: API design contributions, code review and fixes]
    [akpm@linux-foundation.org: various fixes]
    [dmonakhov@sw.ru: new aop block_write_begin fix]
    Signed-off-by: Nick Piggin
    Signed-off-by: Mark Fasheh
    Signed-off-by: Dmitriy Monakhov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nick Piggin
     

09 May, 2007

2 commits


05 Mar, 2007

1 commit


21 Feb, 2007

1 commit

  • simple_prepare_write leaks uninitialised kernel data. This happens because
    it leaves an uninitialised "hole" over the part of the page that the
    write is expected to go to. This is fine, but it then marks the page
    uptodate, which means a concurrent read can come in and copy the
    uninitialised memory into userspace before it is written to.

    Fix it by simply marking it uptodate in simple_commit_write instead, after
    the hole has been filled in. This could theoretically break an fs that
    uses simple_prepare_write and not simple_commit_write, and that relies on
    the incorrect simple_prepare_write behaviour. Luckily, none of those
    exists in the tree.

    Signed-off-by: Nick Piggin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nick Piggin
     

13 Feb, 2007

2 commits

  • This patch is inspired by Arjan's "Patch series to mark struct
    file_operations and struct inode_operations const".

    Compile tested with gcc & sparse.

    Signed-off-by: Josef 'Jeff' Sipek
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Josef 'Jeff' Sipek
     
  • Many struct inode_operations in the kernel can be "const". Marking them const
    moves these to the .rodata section, which avoids false sharing with potential
    dirty data. In addition it'll catch accidental writes at compile time to
    these shared resources.

    Signed-off-by: Arjan van de Ven
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Arjan van de Ven
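
The effect of marking an operations table const, as the two entries above describe, can be illustrated with a small standalone sketch. The struct, the callback and the table name below are hypothetical and much smaller than any real kernel operations structure; the sketch only shows the pattern of a read-only table of function pointers.

```c
/* Hypothetical, cut-down operations table with a single callback. */
struct model_inode_operations {
    int (*getattr)(int ino);
};

static int model_getattr(int ino)
{
    return ino * 2;  /* arbitrary behaviour, just to have something to call */
}

/*
 * Declaring the table const lets the compiler place it in read-only
 * data and reject accidental writes at compile time.
 */
static const struct model_inode_operations model_simple_iops = {
    .getattr = model_getattr,
};

/* model_simple_iops.getattr = 0;  would fail to compile: object is const */
```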
     

09 Dec, 2006

1 commit

  • This patch changes struct file to use struct path instead of having
    independent pointers to struct dentry and struct vfsmount, and converts all
    users of f_{dentry,vfsmnt} in fs/ to use f_path.{dentry,mnt}.

    Additionally, it adds two #define's to make the transition easier for users of
    the f_dentry and f_vfsmnt.

    Signed-off-by: Josef "Jeff" Sipek
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Josef "Jeff" Sipek
     

01 Oct, 2006

2 commits

  • This is mostly included for parity with dec_nlink(), where we will have some
    more hooks. This one should stay pretty darn straightforward for now.

    Signed-off-by: Dave Hansen
    Acked-by: Christoph Hellwig
    Cc: Al Viro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dave Hansen
     
  • When a filesystem decrements i_nlink to zero, it means that a write must be
    performed in order to drop the inode from the filesystem.

    We're shortly going to have to keep filesystems from being remounted r/o
    between the time of this i_nlink decrement and the time that write occurs.

    So, add a little helper function to do the decrements. We'll tie into it in a
    bit to note when i_nlink hits zero.

    Signed-off-by: Dave Hansen
    Acked-by: Christoph Hellwig
    Cc: Al Viro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dave Hansen
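
A minimal userspace sketch of the helper idea from the two entries above: all link-count changes go through small functions, so a hook can later be attached in one place. The struct, the function names and the `hit_zero` counter are hypothetical; in particular, the counter stands in for whatever the kernel later does when the count reaches zero and is not taken from the source.

```c
/* Hypothetical stand-in for an inode, holding only what the model needs. */
struct model_inode {
    unsigned int i_nlink;
    unsigned int hit_zero;  /* assumed hook: counts transitions to zero */
};

/* Increment the link count; kept as a helper for parity with the decrement. */
static void model_inc_nlink(struct model_inode *inode)
{
    inode->i_nlink++;
}

/* Decrement the link count and note when it reaches zero. */
static void model_drop_nlink(struct model_inode *inode)
{
    inode->i_nlink--;
    if (inode->i_nlink == 0)
        inode->hit_zero++;
}
```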
     

30 Sep, 2006

1 commit

  • Remove the unnecessary PageUptodate check from simple_readpage. The only
    two callers for ->readpage that don't have explicit PageUptodate check are
    read_cache_pages and page_cache_read which operate on newly allocated pages
    which don't have the flag set.

    [akpm: use the allegedly-faster clear_page(), too]
    Signed-off-by: Pekka Enberg
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pekka J Enberg
     

27 Sep, 2006

2 commits

  • This eliminates the i_blksize field from struct inode. Filesystems that want
    to provide a per-inode st_blksize can do so by providing their own getattr
    routine instead of using the generic_fillattr() function.

    Note that some filesystems were providing pretty much random (and incorrect)
    values for i_blksize.

    [bunk@stusta.de: cleanup]
    [akpm@osdl.org: generic_fillattr() fix]
    Signed-off-by: "Theodore Ts'o"
    Signed-off-by: Adrian Bunk
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Theodore Ts'o
     
  • The following patches reduce the size of the VFS inode structure by 28 bytes
    on a UP x86. (It would be more on an x86_64 system). This is a 10% reduction
    in the inode size on a UP kernel that is configured in a production mode
    (i.e., with no spinlock or other debugging functions enabled; if you want to
    save memory taken up by in-core inodes, the first thing you should do is
    disable the debugging options; they are responsible for a huge amount of bloat
    in the VFS inode structure).

    This patch:

    The filesystem or device-specific pointer in the inode is inside a union,
    which is pretty pointless given that all 30+ users of this field have been
    using the void pointer. Get rid of the union and rename it to i_private, with
    a comment to explain who is allowed to use the void pointer. This is just a
    cleanup, but it allows us to reuse the union 'u' for something where
    the union will actually be used.

    [judith@osdl.org: powerpc build fix]
    Signed-off-by: "Theodore Ts'o"
    Signed-off-by: Judith Lebzelter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Theodore Ts'o
     

27 Jun, 2006

1 commit

  • This patch converts the combination of list_del(A) and list_add(A, B) to
    list_move(A, B).

    Cc: Greg Kroah-Hartman
    Cc: Ram Pai
    Signed-off-by: Akinobu Mita
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Akinobu Mita
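
The equivalence stated in the entry above can be shown with a minimal circular doubly linked list written for userspace. The `model_` types and functions are assumptions that imitate the shape of the kernel list API; they are not copied from it.

```c
/* Minimal circular doubly linked list node; a list head is just a node. */
struct model_list {
    struct model_list *prev, *next;
};

static void model_list_init(struct model_list *head)
{
    head->prev = head->next = head;
}

/* Insert entry right after head. */
static void model_list_add(struct model_list *entry, struct model_list *head)
{
    entry->next = head->next;
    entry->prev = head;
    head->next->prev = entry;
    head->next = entry;
}

/* Unlink entry from whatever list it is on. */
static void model_list_del(struct model_list *entry)
{
    entry->prev->next = entry->next;
    entry->next->prev = entry->prev;
}

/* model_list_move(A, B) is exactly model_list_del(A) then model_list_add(A, B). */
static void model_list_move(struct model_list *entry, struct model_list *head)
{
    model_list_del(entry);
    model_list_add(entry, head);
}
```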
     

25 Jun, 2006

1 commit


23 Jun, 2006

2 commits

  • Give the statfs superblock operation a dentry pointer rather than a superblock
    pointer.

    This complements the get_sb() patch. That reduced the significance of
    sb->s_root, allowing NFS to place a fake root there. However, NFS does
    require a dentry to use as a target for the statfs operation. This permits
    the root in the vfsmount to be used instead.

    linux/mount.h has been added where necessary to make allyesconfig build
    successfully.

    Interest has also been expressed for use with the FUSE and XFS filesystems.

    Signed-off-by: David Howells
    Acked-by: Al Viro
    Cc: Nathan Scott
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Howells
     
  • Extend the get_sb() filesystem operation to take an extra argument that
    permits the VFS to pass in the target vfsmount that defines the mountpoint.

    The filesystem is then required to manually set the superblock and root dentry
    pointers. For most filesystems, this should be done with simple_set_mnt()
    which will set the superblock pointer and then set the root dentry to the
    superblock's s_root (as per the old default behaviour).

    The get_sb() op now returns an integer as there's now no need to return the
    superblock pointer.

    This patch permits a superblock to be implicitly shared amongst several mount
    points, such as can be done with NFS to avoid potential inode aliasing. In
    such a case, simple_set_mnt() would not be called, and instead the mnt_root
    and mnt_sb would be set directly.

    The patch also makes the following changes:

    (*) the get_sb_*() convenience functions in the core kernel now take a vfsmount
    pointer argument and return an integer, so most filesystems have to change
    very little.

    (*) If one of the convenience functions is not used, then get_sb() should
    normally call simple_set_mnt() to instantiate the vfsmount. This will
    always return 0, and so can be tail-called from get_sb().

    (*) generic_shutdown_super() now calls shrink_dcache_sb() to clean up the
    dcache upon superblock destruction rather than shrink_dcache_anon().

    This is required because the superblock may now have multiple trees that
    aren't actually bound to s_root, but that still need to be cleaned up. The
    currently called functions assume that the whole tree is rooted at s_root,
    and that anonymous dentries are not the roots of trees which results in
    dentries being left unculled.

    However, with the way NFS superblock sharing is currently set to be
    implemented, these assumptions are violated: the root of the filesystem is
    simply a dummy dentry and inode (the real inode for '/' may well be
    inaccessible), and all the vfsmounts are rooted on anonymous[*] dentries
    with child trees.

    [*] Anonymous until discovered from another tree.

    (*) The documentation has been adjusted, including the additional bit of
    changing ext2_* into foo_* in the documentation.

    [akpm@osdl.org: convert ipath_fs, do other stuff]
    Signed-off-by: David Howells
    Acked-by: Al Viro
    Cc: Nathan Scott
    Cc: Roland Dreier
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Howells
     

09 Jun, 2006

1 commit


29 Mar, 2006

1 commit

  • This is a conversion to make the various file_operations structs in fs/
    const. Basically a regexp job, with a few manual fixups.

    The goal is both to increase correctness (harder to accidentally write to
    shared datastructures) and to reduce the false sharing of cachelines with
    things that get dirty in .data (while .rodata is nicely read only and thus
    cache clean).

    Signed-off-by: Arjan van de Ven
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Arjan van de Ven
     

23 Mar, 2006

1 commit

  • Semaphore to mutex conversion.

    The conversion was generated via scripts, and the result was validated
    automatically via a script as well.

    Signed-off-by: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ingo Molnar
     

04 Feb, 2006

1 commit


10 Jan, 2006

1 commit