25 Jun, 2009

1 commit

  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: (23 commits)
    switch xfs to generic acl caching helpers
    helpers for acl caching + switch to those
    switch shmem to inode->i_acl
    switch reiserfs to inode->i_acl
    switch reiserfs to usual conventions for caching ACLs
    reiserfs: minimal fix for ACL caching
    switch nilfs2 to inode->i_acl
    switch btrfs to inode->i_acl
    switch jffs2 to inode->i_acl
    switch jfs to inode->i_acl
    switch ext4 to inode->i_acl
    switch ext3 to inode->i_acl
    switch ext2 to inode->i_acl
    add caching of ACLs in struct inode
    fs: Add new pre-allocation ioctls to vfs for compatibility with legacy xfs ioctls
    cleanup __writeback_single_inode
    ... and the same for vfsmount id/mount group id
    Make allocation of anon devices cheaper
    update Documentation/filesystems/Locking
    devpts: remove module-related code
    ...

    Linus Torvalds
     

24 Jun, 2009

1 commit


23 Jun, 2009

1 commit

  • Some filesystems need to set lockdep map for i_mutex differently for
    different directories. For example OCFS2 has system directories (for
    orphan inode tracking and for gathering all system files like journal
    or quota files into a single place) which have different locking
    rules than standard directories. For a filesystem, setting the
    lockdep map is naturally done when the inode is read, but we have to
    modify unlock_new_inode() not to overwrite the lockdep map the filesystem
    has set.

    Acked-by: peterz@infradead.org
    CC: mingo@redhat.com
    Signed-off-by: Jan Kara
    Signed-off-by: Joel Becker

    Jan Kara
     

13 Jun, 2009

1 commit


12 Jun, 2009

3 commits

  • This patch speeds up lmbench lat_mmap test by about another 2% after the
    first patch.

    Before:
    avg = 462.286
    std = 5.46106

    After:
    avg = 453.12
    std = 9.58257

    (50 runs of each, stddev gives a reasonable confidence)

    It does this by introducing mnt_clone_write, which avoids some heavyweight
    operations of mnt_want_write if called on a vfsmount which we know already
    has a write count; and mnt_want_write_file, which can call mnt_clone_write
    if the file is open for write.

    After these two patches, mnt_want_write and mnt_drop_write go from 7% on
    the profile down to 1.3% (including mnt_clone_write).

    [AV: mnt_want_write_file() should take file alone and derive mnt from it;
    not only all callers have that form, but that's the only mnt about which
    we know that it's already held for write if file is opened for write]

    Cc: Dave Hansen
    Signed-off-by: Nick Piggin
    Signed-off-by: Al Viro

    npiggin@suse.de
     
  • When an fs is unmounted with an fsnotify mark entry attached to one of its
    inodes we need to destroy that mark entry and we also (like inotify) send
    an unmount event.

    Signed-off-by: Eric Paris
    Acked-by: Al Viro
    Cc: Christoph Hellwig

    Eric Paris
     
  • This patch creates a way for fsnotify groups to attach marks to inodes.
    These marks have little meaning to the generic fsnotify infrastructure
    and thus their meaning should be interpreted by the group that attached
    them to the inode's list.

    dnotify and inotify will make use of these markings to indicate which
    inodes are of interest to their respective groups. But this implementation
    has the useful property that in the future other listeners could actually
    use the marks for the exact opposite reason, aka to indicate which inodes
    it had NO interest in.

    Signed-off-by: Eric Paris
    Acked-by: Al Viro
    Cc: Christoph Hellwig

    Eric Paris
     

07 Jun, 2009

1 commit

  • CONFIG_IMA=y inode activity leaks iint_cache and radix_tree_node objects
    until the system runs out of memory. Nowhere is calling ima_inode_free()
    a.k.a. ima_iint_delete(). Fix that by calling it from destroy_inode().

    Signed-off-by: Hugh Dickins
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     

06 Jun, 2009

1 commit

  • OK, that's probably the easiest way to do that, as much as I don't like it...
    Since iget() et al. will not accept I_FREEING (will wait to go away
    and restart), and since we'd better have serialization between new/free
    on fs data structures anyway, we can afford simply skipping I_FREEING
    et al. in insert_inode_locked().

    We do that from new_inode, so it won't race with free_inode in any interesting
    ways and it won't race with iget (of any origin; nfsd or in case of fs
    corruption a lookup) since both still will wait for I_LOCK.

    Reviewed-by: "Theodore Ts'o"
    Acked-by: Jan Kara
    Tested-by: David Watson
    Signed-off-by: Al Viro

    Al Viro
     

09 May, 2009

1 commit


15 Apr, 2009

1 commit

  • There are lots of sequences like this, especially in splice code:

        if (pipe->inode)
            mutex_lock(&pipe->inode->i_mutex);
        /* do something */
        if (pipe->inode)
            mutex_unlock(&pipe->inode->i_mutex);

    so introduce helpers which do the conditional locking and unlocking.
    Also replace the inode_double_lock() call with a pipe_double_lock()
    helper to avoid spreading the use of this functionality beyond the
    pipe code.

    This patch is just a cleanup, and should cause no behavioral changes.

    Signed-off-by: Miklos Szeredi
    Signed-off-by: Jens Axboe

    Miklos Szeredi
     
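The conditional locking described in the commit above can be sketched in user space. This is only an illustration, not the kernel code: `struct inode_sketch`, `struct pipe_sketch` and the `*_sketch` functions are hypothetical stand-ins, a pthread mutex replaces `i_mutex`, and the address-based lock ordering in the double-lock helper is an assumption made here to avoid ABBA deadlocks.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the kernel structures (not the real layouts). */
struct inode_sketch {
    pthread_mutex_t i_mutex;
};

struct pipe_sketch {
    struct inode_sketch *inode;   /* may be NULL, as in the commit message */
};

/* Take the inode mutex only if the pipe has an inode attached. */
void pipe_lock_sketch(struct pipe_sketch *pipe)
{
    if (pipe->inode)
        pthread_mutex_lock(&pipe->inode->i_mutex);
}

/* Mirror of pipe_lock_sketch(): unlock only if there is an inode. */
void pipe_unlock_sketch(struct pipe_sketch *pipe)
{
    if (pipe->inode)
        pthread_mutex_unlock(&pipe->inode->i_mutex);
}

/*
 * Lock two distinct pipes. Ordering by address is an assumption of this
 * sketch: every caller then takes the two locks in the same order.
 */
void pipe_double_lock_sketch(struct pipe_sketch *a, struct pipe_sketch *b)
{
    assert(a != b);
    if ((uintptr_t)a < (uintptr_t)b) {
        pipe_lock_sketch(a);
        pipe_lock_sketch(b);
    } else {
        pipe_lock_sketch(b);
        pipe_lock_sketch(a);
    }
}
```

With helpers of this shape, call sites shrink to a lock call, the work, and an unlock call, and the NULL check lives in one place.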

28 Mar, 2009

3 commits

  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: (37 commits)
    fs: avoid I_NEW inodes
    Merge code for single and multiple-instance mounts
    Remove get_init_pts_sb()
    Move common mknod_ptmx() calls into caller
    Parse mount options just once and copy them to super block
    Unroll essentials of do_remount_sb() into devpts
    vfs: simple_set_mnt() should return void
    fs: move bdev code out of buffer.c
    constify dentry_operations: rest
    constify dentry_operations: configfs
    constify dentry_operations: sysfs
    constify dentry_operations: JFS
    constify dentry_operations: OCFS2
    constify dentry_operations: GFS2
    constify dentry_operations: FAT
    constify dentry_operations: FUSE
    constify dentry_operations: procfs
    constify dentry_operations: ecryptfs
    constify dentry_operations: CIFS
    constify dentry_operations: AFS
    ...

    Linus Torvalds
     
  • * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-quota-2.6: (27 commits)
    ext2: Zero our b_size in ext2_quota_read()
    trivial: fix typos/grammar errors in fs/Kconfig
    quota: Coding style fixes
    quota: Remove superfluous inlines
    quota: Remove uppercase aliases for quota functions.
    nfsd: Use lowercase names of quota functions
    jfs: Use lowercase names of quota functions
    udf: Use lowercase names of quota functions
    ufs: Use lowercase names of quota functions
    reiserfs: Use lowercase names of quota functions
    ext4: Use lowercase names of quota functions
    ext3: Use lowercase names of quota functions
    ext2: Use lowercase names of quota functions
    ramfs: Remove quota call
    vfs: Use lowercase names of quota functions
    quota: Remove dqbuf_t and other cleanups
    quota: Remove NODQUOT macro
    quota: Make global quota locks cacheline aligned
    quota: Move quota files into separate directory
    ext4: quota reservation for delayed allocation
    ...

    Linus Torvalds
     
  • To be on the safe side, it should be less fragile to exclude I_NEW inodes
    from inode list scans by default (unless there is an important reason to
    have them).

    Normally they will get excluded (eg. by zero refcount or writecount etc),
    however it is a bit fragile for list walkers to know exactly what parts of
    the inode state are set up and valid to test when in I_NEW. So along these
    lines, move I_NEW checks upward as well (sometimes taking I_FREEING etc
    checks with them too -- this shouldn't be a problem should it?)

    Signed-off-by: Nick Piggin
    Acked-by: Jan Kara
    Cc: Al Viro
    Signed-off-by: Andrew Morton
    Signed-off-by: Al Viro

    Nick Piggin
     

27 Mar, 2009

2 commits

  • …s/security-testing-2.6

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/security-testing-2.6: (71 commits)
    SELinux: inode_doinit_with_dentry drop no dentry printk
    SELinux: new permission between tty audit and audit socket
    SELinux: open perm for sock files
    smack: fixes for unlabeled host support
    keys: make procfiles per-user-namespace
    keys: skip keys from another user namespace
    keys: consider user namespace in key_permission
    keys: distinguish per-uid keys in different namespaces
    integrity: ima iint radix_tree_lookup locking fix
    TOMOYO: Do not call tomoyo_realpath_init unless registered.
    integrity: ima scatterlist bug fix
    smack: fix lots of kernel-doc notation
    TOMOYO: Don't create securityfs entries unless registered.
    TOMOYO: Fix exception policy read failure.
    SELinux: convert the avc cache hash list to an hlist
    SELinux: code readability with avc_cache
    SELinux: remove unused av.decided field
    SELinux: more careful use of avd in avc_has_perm_noaudit
    SELinux: remove the unused ae.used
    SELinux: check seqno when updating an avc_node
    ...

    Linus Torvalds
     
  • Allow atime to be updated once per day even with relatime. This lets
    utilities like tmpreaper (which delete files based on last access time)
    continue working, making relatime a plausible default for distributions.

    Signed-off-by: Matthew Garrett
    Reviewed-by: Matthew Wilcox
    Acked-by: Valerie Aurora Henson
    Acked-by: Alan Cox
    Acked-by: Ingo Molnar
    Signed-off-by: Linus Torvalds

    Matthew Garrett
     
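The once-per-day rule in the relatime commit above can be sketched as a small predicate. This is a user-space illustration under stated assumptions, not the kernel function: `struct times_sketch` and `relatime_need_update_sketch()` are hypothetical names, timestamps are plain `time_t` seconds, and the mtime/ctime comparisons reflect relatime's usual semantics rather than anything stated in the commit message itself.

```c
#include <stdbool.h>
#include <time.h>

/* One day, in seconds: the maximum age of atime tolerated by this sketch. */
#define RELATIME_MAX_AGE_SECONDS (24 * 60 * 60)

/* Hypothetical container for the three inode timestamps. */
struct times_sketch {
    time_t atime;
    time_t mtime;
    time_t ctime;
};

/* Return true if an access at time 'now' should update atime. */
bool relatime_need_update_sketch(const struct times_sketch *t, time_t now)
{
    /* Usual relatime rule: the file changed since it was last read. */
    if (t->mtime >= t->atime)
        return true;
    if (t->ctime >= t->atime)
        return true;

    /* Rule from the commit: atime is at least a day old. */
    if (now - t->atime >= RELATIME_MAX_AGE_SECONDS)
        return true;

    return false;
}
```

A tool that deletes files by last access time then sees an atime that is never more than roughly a day stale for files that are still being read.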

26 Mar, 2009

1 commit


24 Mar, 2009

1 commit


13 Mar, 2009

1 commit

  • There was a report of a data corruption
    http://lkml.org/lkml/2008/11/14/121. There is a script included to
    reproduce the problem.

    During testing, I encountered a number of strange things with ext3, so I
    tried ext2 to attempt to reduce complexity of the problem. I found that
    fsstress would quickly hang in wait_on_inode, waiting for I_LOCK to be
    cleared, even though instrumentation showed that unlock_new_inode had
    already been called for that inode. This points to memory scribble, or
    a synchronisation problem.

    i_state of I_NEW inodes is not protected by inode_lock because other
    processes are not supposed to touch them until I_LOCK (and I_NEW) is
    cleared. Adding WARN_ON(inode->i_state & I_NEW) to sites where we modify
    i_state revealed that generic_sync_sb_inodes is picking up new inodes from
    the inode lists and passing them to __writeback_single_inode without
    waiting for I_NEW. Subsequently modifying i_state causes corruption. In
    my case it would look like this:

    CPU0                            CPU1
    unlock_new_inode()              __sync_single_inode()
     reg <- inode->i_state
     reg -> reg & ~(I_LOCK|I_NEW)   reg <- inode->i_state
     reg -> inode->i_state          reg -> reg | I_SYNC
                                    reg -> inode->i_state

    Non-atomic RMW on CPU1 overwrites CPU0 store and sets I_LOCK|I_NEW again.

    The fix is to skip over I_NEW inodes rather than wait for them:
    inodes concurrently being created are not subject to data integrity
    operations, and should not significantly contribute to dirty memory
    either.

    After this change, I'm unable to reproduce any of the added warnings or
    hangs after ~1hour of running. Previously, the new warnings would start
    immediately and hang would happen in under 5 minutes.

    I'm also testing on ext3 now, and so far no problems there either. I
    don't know whether this fixes the problem reported above, but it fixes a
    real problem for me.

    Cc: "Jorge Boncompte [DTI2]"
    Reported-by: Adrian Hunter
    Cc: Jan Kara
    Cc:
    Signed-off-by: Nick Piggin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nick Piggin
     
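The lost update described in the commit above can be replayed deterministically in user space by modelling each CPU's register as a local variable. This is a sketch only: the flag values and the `*_SKETCH` and function names are hypothetical, not the kernel's definitions.

```c
/* Hypothetical flag values; the real kernel constants differ. */
#define I_NEW_SKETCH  0x1u
#define I_LOCK_SKETCH 0x2u
#define I_SYNC_SKETCH 0x4u

/*
 * Replay the interleaving from the commit message. Each "CPU" does a
 * non-atomic read-modify-write of the same i_state word.
 */
unsigned int replay_lost_update(void)
{
    unsigned int i_state = I_LOCK_SKETCH | I_NEW_SKETCH;
    unsigned int cpu0_reg, cpu1_reg;

    cpu0_reg = i_state;                           /* CPU0 loads i_state      */
    cpu0_reg &= ~(I_LOCK_SKETCH | I_NEW_SKETCH);  /* CPU0 clears both flags  */
    cpu1_reg = i_state;                           /* CPU1 loads stale value  */
    i_state = cpu0_reg;                           /* CPU0 stores: flags gone */
    cpu1_reg |= I_SYNC_SKETCH;                    /* CPU1 sets I_SYNC        */
    i_state = cpu1_reg;                           /* CPU1 store resurrects   */
                                                  /* I_LOCK and I_NEW        */
    return i_state;
}

/* The fix in spirit: the writeback path leaves I_NEW inodes alone. */
int writeback_should_skip(unsigned int i_state)
{
    return (i_state & I_NEW_SKETCH) != 0;
}
```

After the replay, both flags that CPU0 cleared are set again, which matches the hang in wait_on_inode described above. Skipping I_NEW inodes means the second writer never touches the word while the first one owns it.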

06 Feb, 2009

2 commits

  • Conflicts:
    fs/namei.c

    Manually merged per:

    diff --cc fs/namei.c
    index 734f2b5,bbc15c2..0000000
    --- a/fs/namei.c
    +++ b/fs/namei.c
    @@@ -860,9 -848,8 +849,10 @@@ static int __link_path_walk(const char
    nd->flags |= LOOKUP_CONTINUE;
    err = exec_permission_lite(inode);
    if (err == -EAGAIN)
    - err = vfs_permission(nd, MAY_EXEC);
    + err = inode_permission(nd->path.dentry->d_inode,
    + MAY_EXEC);
    + if (!err)
    + err = ima_path_check(&nd->path, MAY_EXEC);
    if (err)
    break;

    @@@ -1525,14 -1506,9 +1509,14 @@@ int may_open(struct path *path, int acc
    flag &= ~O_TRUNC;
    }

    - error = vfs_permission(nd, acc_mode);
    + error = inode_permission(inode, acc_mode);
    if (error)
    return error;
    +
    - error = ima_path_check(&nd->path,
    ++ error = ima_path_check(path,
    + acc_mode & (MAY_READ | MAY_WRITE | MAY_EXEC));
    + if (error)
    + return error;
    /*
    * An append-only file must be opened in append mode for writing.
    */

    Signed-off-by: James Morris

    James Morris
     
  • This patch replaces the generic integrity hooks, for which IMA registered
    itself, with IMA integrity hooks in the appropriate places directly
    in the fs directory.

    Signed-off-by: Mimi Zohar
    Acked-by: Serge Hallyn
    Signed-off-by: James Morris

    Mimi Zohar
     

10 Jan, 2009

1 commit


08 Jan, 2009

1 commit


07 Jan, 2009

2 commits

  • Fix kernel-doc notation:

    Warning(linux-2.6.28-git3//fs/inode.c:120): No description found for parameter 'sb'
    Warning(linux-2.6.28-git3//fs/inode.c:120): No description found for parameter 'inode'
    Warning(linux-2.6.28-git3//fs/inode.c:588): No description found for parameter 'sb'
    Warning(linux-2.6.28-git3//fs/inode.c:588): No description found for parameter 'inode'

    Signed-off-by: Randy Dunlap
    Cc: Al Viro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Randy Dunlap
     
  • GFP_HIGHUSER_PAGECACHE is just an alias for GFP_HIGHUSER_MOVABLE, making
    that harder to track down: remove it, and its out-of-work brothers
    GFP_NOFS_PAGECACHE and GFP_USER_PAGECACHE.

    Since we're making that improvement to hotremove_migrate_alloc(), I think
    we can now also remove one of the "o"s from its comment.

    Signed-off-by: Hugh Dickins
    Acked-by: Mel Gorman
    Cc: Nick Piggin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     

06 Jan, 2009

1 commit


01 Jan, 2009

1 commit

  • new helpers - insert_inode_locked() and insert_inode_locked4().
    Hash new inode, making sure that there's no such inode in icache
    already. If there is and it does not end up unhashed (as would
    happen if we have nfsd trying to resolve a bogus fhandle), fail.
    Otherwise insert our inode into hash and succeed.

    In either case have i_state set to new+locked; cleanup ends up
    being simpler with such calling conventions.

    Signed-off-by: Al Viro

    Al Viro
     

10 Nov, 2008

1 commit


30 Oct, 2008

3 commits

  • To make sure we free the security data, inodes need to be freed using
    the proper VFS helper (which we also need to export for this). We mark
    these inodes bad so we can skip the flush path for them.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Lachlan McIlroy
    Signed-off-by: David Chinner

    Christoph Hellwig
     
  • To allow XFS to combine the XFS and linux inodes into a single
    structure, we need to drive inode lookup from the XFS inode cache,
    not the generic inode cache. This means that we need to initialise a
    struct inode from a context outside alloc_inode() as it is no longer
    used by XFS.

    After inode allocation and initialisation, we need to add the inode
    to the superblock list, the in-use list, hash it and do some
    accounting. This all needs to be done with the inode_lock held and
    there are already several places in fs/inode.c that do this list
    manipulation. Factor out the common code, add a locking wrapper and
    export the function so it can be called from XFS.

    Signed-off-by: Dave Chinner
    Signed-off-by: Christoph Hellwig
    Signed-off-by: Lachlan McIlroy

    David Chinner
     
  • To allow XFS to combine the XFS and linux inodes into a single
    structure, we need to drive inode lookup from the XFS inode cache,
    not the generic inode cache. This means that we need to initialise a
    struct inode from a context outside alloc_inode() as it is no longer
    used by XFS.

    Factor and export the struct inode initialisation code from
    alloc_inode() to inode_init_always() as a counterpart to
    inode_init_once(). i.e. we have to call this init function for each
    inode instantiation (always), as opposed to inode_init_once() which is
    only called on slab object instantiation (once).

    Signed-off-by: Dave Chinner
    Signed-off-by: Christoph Hellwig
    Signed-off-by: Lachlan McIlroy

    David Chinner
     
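The "once" versus "always" split described in the commit above can be illustrated with a toy object cache. Everything here is a hypothetical user-space stand-in: a one-slot free list plays the role of the slab cache, and `toy_init_once()` / `toy_init_always()` only mimic the division of labour between inode_init_once() and inode_init_always().

```c
#include <stdlib.h>

/* Hypothetical object; the counters exist only to make the calls visible. */
struct toy_inode {
    int once_calls;       /* constructor runs (per backing object)    */
    int always_calls;     /* per-instantiation initialisations        */
    unsigned long i_ino;  /* per-use state, reset on every allocation */
};

/* One-slot free list standing in for the slab cache. */
static struct toy_inode *cached;

/* Runs once when the backing memory is first created. */
void toy_init_once(struct toy_inode *in)
{
    in->once_calls++;
}

/* Runs on every instantiation, including reuse of a cached object. */
void toy_init_always(struct toy_inode *in, unsigned long ino)
{
    in->always_calls++;
    in->i_ino = ino;
}

struct toy_inode *toy_alloc(unsigned long ino)
{
    struct toy_inode *in = cached;

    if (in) {
        cached = NULL;                /* reuse: constructor is NOT rerun */
    } else {
        in = calloc(1, sizeof(*in));
        if (!in)
            return NULL;
        toy_init_once(in);            /* fresh object: construct it once */
    }
    toy_init_always(in, ino);
    return in;
}

void toy_free(struct toy_inode *in)
{
    if (!cached)
        cached = in;                  /* keep the constructed object around */
    else
        free(in);
}
```

Allocating, freeing and allocating again returns the same object with `once_calls` still at 1 and `always_calls` at 2, which is the distinction the commit message draws.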

15 Aug, 2008

1 commit

  • write_cache_pages() uses i_mapping->writeback_index to pick up where it
    left off the last time a given inode was found by pdflush or
    balance_dirty_pages (or anyone else who sets wbc->range_cyclic)

    alloc_inode() should set it to a sane value so that writeback doesn't
    start in the middle of a file. It is somewhat difficult to notice the bug
    since write_cache_pages will loop around to the start of the file and the
    elevator helps hide the resulting seeks.

    For whatever reason, Btrfs hits this often. Unpatched, untarring 30
    copies of the linux kernel in series runs at 47MB/s on a single sata
    drive. With this fix, it jumps to 62MB/s.

    Signed-off-by: Chris Mason
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Chris Mason
     

27 Jul, 2008

2 commits

  • Kmem cache passed to constructor is only needed for constructors that are
    themselves multiplexers. Nobody uses this "feature", nor does anybody use
    the passed kmem cache in a non-trivial way, so pass only a pointer to the object.

    Non-trivial places are:
    arch/powerpc/mm/init_64.c
    arch/powerpc/mm/hugetlbpage.c

    This is flag day, yes.

    Signed-off-by: Alexey Dobriyan
    Acked-by: Pekka Enberg
    Acked-by: Christoph Lameter
    Cc: Jon Tollefson
    Cc: Nick Piggin
    Cc: Matt Mackall
    [akpm@linux-foundation.org: fix arch/powerpc/mm/hugetlbpage.c]
    [akpm@linux-foundation.org: fix mm/slab.c]
    [akpm@linux-foundation.org: fix ubifs]
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
     
  • mapping->tree_lock has no read lockers. Convert the lock from an rwlock
    to a spinlock.

    Signed-off-by: Nick Piggin
    Cc: Benjamin Herrenschmidt
    Cc: Paul Mackerras
    Cc: Hugh Dickins
    Cc: "Paul E. McKenney"
    Reviewed-by: Peter Zijlstra
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nick Piggin
     

07 May, 2008

2 commits

  • Commit 33dcdac2df54e66c447ae03f58c95c7251aa5649 ("kill ->put_inode")
    removed the final use of i_op->put_inode, but left the now totally
    unused "op" variable in iput().

    Get rid of it.

    Signed-off-by: Linus Torvalds

    Linus Torvalds
     
  • And with that last patch to affs killing the last put_inode instance we
    can finally, after many years of transition kill this racy and awkward
    interface.

    (It's kinda funny that even the description in
    Documentation/filesystems/vfs.txt was entirely wrong..)

    Also remove a very misleading comment above the definition of
    struct super_operations.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Al Viro

    Christoph Hellwig
     

29 Apr, 2008

1 commit


19 Apr, 2008

2 commits


08 Feb, 2008

1 commit

  • Remove the old iget() call and the read_inode() superblock operation it uses
    as these are really obsolete, and the use of read_inode() does not produce
    proper error handling (no distinction between ENOMEM and EIO when marking an
    inode bad).

    Furthermore, this removes the temptation to use iget() to find an inode by
    number in a filesystem from code outside that filesystem.

    iget_locked() should be used instead. A new function is added in an earlier
    patch (iget_failed) that is to be called to mark an inode as bad, unlock it
    and release it should the get routine fail. Mark iget() and read_inode() as
    being obsolete and remove references to them from the documentation.

    Typically a filesystem will be modified such that the read_inode function
    becomes an internal iget function, for example the following:

        void thingyfs_read_inode(struct inode *inode)
        {
            ...
        }

    would be changed into something like:

        struct inode *thingyfs_iget(struct super_block *sb, unsigned long ino)
        {
            struct inode *inode;
            int ret;

            inode = iget_locked(sb, ino);
            if (!inode)
                return ERR_PTR(-ENOMEM);
            if (!(inode->i_state & I_NEW))
                return inode;

            ...
            unlock_new_inode(inode);
            return inode;
        error:
            iget_failed(inode);
            return ERR_PTR(ret);
        }

    and then thingyfs_iget() would be called rather than iget(), for example:

        ret = -EINVAL;
        inode = iget(sb, ino);
        if (!inode || is_bad_inode(inode))
            goto error;

    becomes:

        inode = thingyfs_iget(sb, ino);
        if (IS_ERR(inode)) {
            ret = PTR_ERR(inode);
            goto error;
        }

    Note that is_bad_inode() does not need to be called. The error returned by
    thingyfs_iget() should render it unnecessary.

    Signed-off-by: David Howells
    Acked-by: Christoph Hellwig
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Howells
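
The calling convention in the commit above relies on encoding an errno value in the returned pointer. A minimal user-space re-creation of that idiom is sketched below. The `*_sketch` names, `MAX_ERRNO_SKETCH` and `struct toy_inode` are hypothetical, and the sketch assumes the top 4095 addresses are never valid object pointers; it is not the kernel's err.h.

```c
#include <errno.h>
#include <stdint.h>

/* Assumption: errno values fit in 1..4095, as in the kernel's convention. */
#define MAX_ERRNO_SKETCH 4095

/* Encode a negative errno value as a pointer. */
void *err_ptr_sketch(long error)
{
    return (void *)(intptr_t)error;
}

/* Recover the negative errno value from an error pointer. */
long ptr_err_sketch(const void *ptr)
{
    return (long)(intptr_t)ptr;
}

/* A pointer is an error if it falls in the last MAX_ERRNO_SKETCH addresses. */
int is_err_sketch(const void *ptr)
{
    return (uintptr_t)ptr >= (uintptr_t)(-MAX_ERRNO_SKETCH);
}

/* Hypothetical inode and lookup, standing in for thingyfs_iget(). */
struct toy_inode {
    unsigned long i_ino;
};

struct toy_inode *thingyfs_iget_sketch(unsigned long ino)
{
    static struct toy_inode the_inode;

    if (ino == 0)                         /* made-up failure condition */
        return err_ptr_sketch(-ENOMEM);
    the_inode.i_ino = ino;
    return &the_inode;
}
```

Because the error travels in the return value, the caller can tell -ENOMEM from -EIO, which is the distinction that read_inode() could not express.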