28 Mar, 2009

3 commits

  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: (37 commits)
    fs: avoid I_NEW inodes
    Merge code for single and multiple-instance mounts
    Remove get_init_pts_sb()
    Move common mknod_ptmx() calls into caller
    Parse mount options just once and copy them to super block
    Unroll essentials of do_remount_sb() into devpts
    vfs: simple_set_mnt() should return void
    fs: move bdev code out of buffer.c
    constify dentry_operations: rest
    constify dentry_operations: configfs
    constify dentry_operations: sysfs
    constify dentry_operations: JFS
    constify dentry_operations: OCFS2
    constify dentry_operations: GFS2
    constify dentry_operations: FAT
    constify dentry_operations: FUSE
    constify dentry_operations: procfs
    constify dentry_operations: ecryptfs
    constify dentry_operations: CIFS
    constify dentry_operations: AFS
    ...

    Linus Torvalds
     
  • * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-quota-2.6: (27 commits)
    ext2: Zero our b_size in ext2_quota_read()
    trivial: fix typos/grammar errors in fs/Kconfig
    quota: Coding style fixes
    quota: Remove superfluous inlines
    quota: Remove uppercase aliases for quota functions.
    nfsd: Use lowercase names of quota functions
    jfs: Use lowercase names of quota functions
    udf: Use lowercase names of quota functions
    ufs: Use lowercase names of quota functions
    reiserfs: Use lowercase names of quota functions
    ext4: Use lowercase names of quota functions
    ext3: Use lowercase names of quota functions
    ext2: Use lowercase names of quota functions
    ramfs: Remove quota call
    vfs: Use lowercase names of quota functions
    quota: Remove dqbuf_t and other cleanups
    quota: Remove NODQUOT macro
    quota: Make global quota locks cacheline aligned
    quota: Move quota files into separate directory
    ext4: quota reservation for delayed allocation
    ...

    Linus Torvalds
     
  • To be on the safe side, it should be less fragile to exclude I_NEW inodes
    from inode list scans by default (unless there is an important reason to
    have them).

    Normally they will get excluded (eg. by zero refcount or writecount etc),
    however it is a bit fragile for list walkers to know exactly what parts of
    the inode state is set up and valid to test when in I_NEW. So along these
    lines, move I_NEW checks upward as well (sometimes taking I_FREEING etc
    checks with them too -- this shouldn't be a problem should it?)

    Signed-off-by: Nick Piggin
    Acked-by: Jan Kara
    Cc: Al Viro
    Signed-off-by: Andrew Morton
    Signed-off-by: Al Viro

    Nick Piggin
     

27 Mar, 2009

2 commits

  • …s/security-testing-2.6

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/security-testing-2.6: (71 commits)
    SELinux: inode_doinit_with_dentry drop no dentry printk
    SELinux: new permission between tty audit and audit socket
    SELinux: open perm for sock files
    smack: fixes for unlabeled host support
    keys: make procfiles per-user-namespace
    keys: skip keys from another user namespace
    keys: consider user namespace in key_permission
    keys: distinguish per-uid keys in different namespaces
    integrity: ima iint radix_tree_lookup locking fix
    TOMOYO: Do not call tomoyo_realpath_init unless registered.
    integrity: ima scatterlist bug fix
    smack: fix lots of kernel-doc notation
    TOMOYO: Don't create securityfs entries unless registered.
    TOMOYO: Fix exception policy read failure.
    SELinux: convert the avc cache hash list to an hlist
    SELinux: code readability with avc_cache
    SELinux: remove unused av.decided field
    SELinux: more careful use of avd in avc_has_perm_noaudit
    SELinux: remove the unused ae.used
    SELinux: check seqno when updating an avc_node
    ...

    Linus Torvalds
     
  • Allow atime to be updated once per day even with relatime. This lets
    utilities like tmpreaper (which delete files based on last access time)
    continue working, making relatime a plausible default for distributions.

    Signed-off-by: Matthew Garrett
    Reviewed-by: Matthew Wilcox
    Acked-by: Valerie Aurora Henson
    Acked-by: Alan Cox
    Acked-by: Ingo Molnar
    Signed-off-by: Linus Torvalds

    Matthew Garrett
     

26 Mar, 2009

1 commit


24 Mar, 2009

1 commit


13 Mar, 2009

1 commit

  • There was a report of a data corruption
    http://lkml.org/lkml/2008/11/14/121. There is a script included to
    reproduce the problem.

    During testing, I encountered a number of strange things with ext3, so I
    tried ext2 to attempt to reduce complexity of the problem. I found that
    fsstress would quickly hang in wait_on_inode, waiting for I_LOCK to be
    cleared, even though instrumentation showed that unlock_new_inode had
    already been called for that inode. This points to memory scribble, or
    synchronisation problme.

    i_state of I_NEW inodes is not protected by inode_lock because other
    processes are not supposed to touch them until I_LOCK (and I_NEW) is
    cleared. Adding WARN_ON(inode->i_state & I_NEW) to sites where we modify
    i_state revealed that generic_sync_sb_inodes is picking up new inodes from
    the inode lists and passing them to __writeback_single_inode without
    waiting for I_NEW. Subsequently modifying i_state causes corruption. In
    my case it would look like this:

    CPU0 CPU1
    unlock_new_inode() __sync_single_inode()
    reg i_state
    reg -> reg & ~(I_LOCK|I_NEW) reg i_state
    reg -> inode->i_state reg -> reg | I_SYNC
    reg -> inode->i_state

    Non-atomic RMW on CPU1 overwrites CPU0 store and sets I_LOCK|I_NEW again.

    Fix for this is rather than wait for I_NEW inodes, just skip over them:
    inodes concurrently being created are not subject to data integrity
    operations, and should not significantly contribute to dirty memory
    either.

    After this change, I'm unable to reproduce any of the added warnings or
    hangs after ~1hour of running. Previously, the new warnings would start
    immediately and hang would happen in under 5 minutes.

    I'm also testing on ext3 now, and so far no problems there either. I
    don't know whether this fixes the problem reported above, but it fixes a
    real problem for me.

    Cc: "Jorge Boncompte [DTI2]"
    Reported-by: Adrian Hunter
    Cc: Jan Kara
    Cc:
    Signed-off-by: Nick Piggin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nick Piggin
     

06 Feb, 2009

2 commits

  • Conflicts:
    fs/namei.c

    Manually merged per:

    diff --cc fs/namei.c
    index 734f2b5,bbc15c2..0000000
    --- a/fs/namei.c
    +++ b/fs/namei.c
    @@@ -860,9 -848,8 +849,10 @@@ static int __link_path_walk(const char
    nd->flags |= LOOKUP_CONTINUE;
    err = exec_permission_lite(inode);
    if (err == -EAGAIN)
    - err = vfs_permission(nd, MAY_EXEC);
    + err = inode_permission(nd->path.dentry->d_inode,
    + MAY_EXEC);
    + if (!err)
    + err = ima_path_check(&nd->path, MAY_EXEC);
    if (err)
    break;

    @@@ -1525,14 -1506,9 +1509,14 @@@ int may_open(struct path *path, int acc
    flag &= ~O_TRUNC;
    }

    - error = vfs_permission(nd, acc_mode);
    + error = inode_permission(inode, acc_mode);
    if (error)
    return error;
    +
    - error = ima_path_check(&nd->path,
    ++ error = ima_path_check(path,
    + acc_mode & (MAY_READ | MAY_WRITE | MAY_EXEC));
    + if (error)
    + return error;
    /*
    * An append-only file must be opened in append mode for writing.
    */

    Signed-off-by: James Morris

    James Morris
     
  • This patch replaces the generic integrity hooks, for which IMA registered
    itself, with IMA integrity hooks in the appropriate places directly
    in the fs directory.

    Signed-off-by: Mimi Zohar
    Acked-by: Serge Hallyn
    Signed-off-by: James Morris

    Mimi Zohar
     

10 Jan, 2009

1 commit


08 Jan, 2009

1 commit


07 Jan, 2009

2 commits

  • Fix kernel-doc notation:

    Warning(linux-2.6.28-git3//fs/inode.c:120): No description found for parameter 'sb'
    Warning(linux-2.6.28-git3//fs/inode.c:120): No description found for parameter 'inode'
    Warning(linux-2.6.28-git3//fs/inode.c:588): No description found for parameter 'sb'
    Warning(linux-2.6.28-git3//fs/inode.c:588): No description found for parameter 'inode'

    Signed-off-by: Randy Dunlap
    Cc: Al Viro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Randy Dunlap
     
  • GFP_HIGHUSER_PAGECACHE is just an alias for GFP_HIGHUSER_MOVABLE, making
    that harder to track down: remove it, and its out-of-work brothers
    GFP_NOFS_PAGECACHE and GFP_USER_PAGECACHE.

    Since we're making that improvement to hotremove_migrate_alloc(), I think
    we can now also remove one of the "o"s from its comment.

    Signed-off-by: Hugh Dickins
    Acked-by: Mel Gorman
    Cc: Nick Piggin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     

06 Jan, 2009

1 commit


01 Jan, 2009

1 commit

  • new helpers - insert_inode_locked() and insert_inode_locked4().
    Hash new inode, making sure that there's no such inode in icache
    already. If there is and it does not end up unhashed (as would
    happen if we have nfsd trying to resolve a bogus fhandle), fail.
    Otherwise insert our inode into hash and succeed.

    In either case have i_state set to new+locked; cleanup ends up
    being simpler with such calling conventions.

    Signed-off-by: Al Viro

    Al Viro
     

10 Nov, 2008

1 commit


30 Oct, 2008

3 commits

  • To make sure we free the security data inodes need to be freed using
    the proper VFS helper (which we also need to export for this). We mark
    these inodes bad so we can skip the flush path for them.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Lachlan McIlroy
    Signed-off-by: David Chinner

    Christoph Hellwig
     
  • To allow XFS to combine the XFS and linux inodes into a single
    structure, we need to drive inode lookup from the XFS inode cache,
    not the generic inode cache. This means that we need initialise a
    struct inode from a context outside alloc_inode() as it is no longer
    used by XFS.

    After inode allocation and initialisation, we need to add the inode
    to the superblock list, the in-use list, hash it and do some
    accounting. This all needs to be done with the inode_lock held and
    there are already several places in fs/inode.c that do this list
    manipulation. Factor out the common code, add a locking wrapper and
    export the function so ti can be called from XFS.

    Signed-off-by: Dave Chinner
    Signed-off-by: Christoph Hellwig
    Signed-off-by: Lachlan McIlroy

    David Chinner
     
  • To allow XFS to combine the XFS and linux inodes into a single
    structure, we need to drive inode lookup from the XFS inode cache,
    not the generic inode cache. This means that we need initialise a
    struct inode from a context outside alloc_inode() as it is no longer
    used by XFS.

    Factor and export the struct inode initialisation code from
    alloc_inode() to inode_init_always() as a counterpart to
    inode_init_once(). i.e. we have to call this init function for each
    inode instantiation (always), as opposed inode_init_once() which is
    only called on slab object instantiation (once).

    Signed-off-by: Dave Chinner
    Signed-off-by: Christoph Hellwig
    Signed-off-by: Lachlan McIlroy

    David Chinner
     

15 Aug, 2008

1 commit

  • write_cache_pages() uses i_mapping->writeback_index to pick up where it
    left off the last time a given inode was found by pdflush or
    balance_dirty_pages (or anyone else who sets wbc->range_cyclic)

    alloc_inode() should set it to a sane value so that writeback doesn't
    start in the middle of a file. It is somewhat difficult to notice the bug
    since write_cache_pages will loop around to the start of the file and the
    elevator helps hide the resulting seeks.

    For whatever reason, Btrfs hits this often. Unpatched, untarring 30
    copies of the linux kernel in series runs at 47MB/s on a single sata
    drive. With this fix, it jumps to 62MB/s.

    Signed-off-by: Chris Mason
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Chris Mason
     

27 Jul, 2008

2 commits

  • Kmem cache passed to constructor is only needed for constructors that are
    themselves multiplexeres. Nobody uses this "feature", nor does anybody uses
    passed kmem cache in non-trivial way, so pass only pointer to object.

    Non-trivial places are:
    arch/powerpc/mm/init_64.c
    arch/powerpc/mm/hugetlbpage.c

    This is flag day, yes.

    Signed-off-by: Alexey Dobriyan
    Acked-by: Pekka Enberg
    Acked-by: Christoph Lameter
    Cc: Jon Tollefson
    Cc: Nick Piggin
    Cc: Matt Mackall
    [akpm@linux-foundation.org: fix arch/powerpc/mm/hugetlbpage.c]
    [akpm@linux-foundation.org: fix mm/slab.c]
    [akpm@linux-foundation.org: fix ubifs]
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
     
  • mapping->tree_lock has no read lockers. convert the lock from an rwlock
    to a spinlock.

    Signed-off-by: Nick Piggin
    Cc: Benjamin Herrenschmidt
    Cc: Paul Mackerras
    Cc: Hugh Dickins
    Cc: "Paul E. McKenney"
    Reviewed-by: Peter Zijlstra
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nick Piggin
     

07 May, 2008

2 commits

  • Commit 33dcdac2df54e66c447ae03f58c95c7251aa5649 ("kill ->put_inode")
    removed the final use of i_op->put_inode, but left the now totally
    unused "op" variable in iput().

    Get rid of it.

    Signed-off-by: Linus Torvalds

    Linus Torvalds
     
  • And with that last patch to affs killing the last put_inode instance we
    can finally, after many years of transition kill this racy and awkward
    interface.

    (It's kinda funny that even the description in
    Documentation/filesystems/vfs.txt was entirely wrong..)

    Also remove a very misleading comment above the defintion of
    struct super_operations.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Al Viro

    Christoph Hellwig
     

29 Apr, 2008

1 commit


19 Apr, 2008

2 commits


08 Feb, 2008

1 commit

  • Remove the old iget() call and the read_inode() superblock operation it uses
    as these are really obsolete, and the use of read_inode() does not produce
    proper error handling (no distinction between ENOMEM and EIO when marking an
    inode bad).

    Furthermore, this removes the temptation to use iget() to find an inode by
    number in a filesystem from code outside that filesystem.

    iget_locked() should be used instead. A new function is added in an earlier
    patch (iget_failed) that is to be called to mark an inode as bad, unlock it
    and release it should the get routine fail. Mark iget() and read_inode() as
    being obsolete and remove references to them from the documentation.

    Typically a filesystem will be modified such that the read_inode function
    becomes an internal iget function, for example the following:

    void thingyfs_read_inode(struct inode *inode)
    {
    ...
    }

    would be changed into something like:

    struct inode *thingyfs_iget(struct super_block *sp, unsigned long ino)
    {
    struct inode *inode;
    int ret;

    inode = iget_locked(sb, ino);
    if (!inode)
    return ERR_PTR(-ENOMEM);
    if (!(inode->i_state & I_NEW))
    return inode;

    ...
    unlock_new_inode(inode);
    return inode;
    error:
    iget_failed(inode);
    return ERR_PTR(ret);
    }

    and then thingyfs_iget() would be called rather than iget(), for example:

    ret = -EINVAL;
    inode = iget(sb, ino);
    if (!inode || is_bad_inode(inode))
    goto error;

    becomes:

    inode = thingyfs_iget(sb, ino);
    if (IS_ERR(inode)) {
    ret = PTR_ERR(inode);
    goto error;
    }

    Note that is_bad_inode() does not need to be called. The error returned by
    thingyfs_iget() should render it unnecessary.

    Signed-off-by: David Howells
    Acked-by: Christoph Hellwig
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Howells
     

29 Jan, 2008

2 commits

  • This patch adds 64-bit inode version support to ext4. The lower 32 bits
    are stored in the osd1.linux1.l_i_version field while the high 32 bits
    are stored in the i_version_hi field newly created in the ext4_inode.
    This field is incremented in case the ext4_inode is large enough. A
    i_version mount option has been added to enable the feature.

    Signed-off-by: Mingming Cao
    Signed-off-by: Andreas Dilger
    Signed-off-by: Kalpak Shah
    Signed-off-by: Aneesh Kumar K.V
    Signed-off-by: Jean Noel Cordenner

    Jean Noel Cordenner
     
  • The i_version field of the inode is changed to be a 64-bit counter that
    is set on every inode creation and that is incremented every time the
    inode data is modified (similarly to the "ctime" time-stamp).
    The aim is to fulfill a NFSv4 requirement for rfc3530.
    This first part concerns the vfs, it converts the 32-bit i_version in
    the generic inode to a 64-bit, a flag is added in the super block in
    order to check if the feature is enabled and the i_version is
    incremented in the vfs.

    Signed-off-by: Mingming Cao
    Signed-off-by: Jean Noel Cordenner
    Signed-off-by: Kalpak Shah

    Jean Noel Cordenner
     

17 Oct, 2007

4 commits

  • I_LOCK was used for several unrelated purposes, which caused deadlock
    situations in certain filesystems as a side effect. One of the purposes
    now uses the new I_SYNC bit.

    Also document the various bits and change their order from historical to
    logical.

    [bunk@stusta.de: make fs/inode.c:wake_up_inode() static]
    Signed-off-by: Joern Engel
    Cc: Dave Kleikamp
    Cc: David Chinner
    Cc: Anton Altaparmakov
    Cc: Al Viro
    Cc: Christoph Hellwig
    Signed-off-by: Adrian Bunk
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Joern Engel
     
  • Since the mempages parameter is actually not used, they should be removed.

    Now there is only files_init use the mempages parameter,

    files_init(mempages);

    but I don't think the adaptation to mempages in files_init is really
    useful; and if files_init also changed to the prototype void (*func)(void),
    the wrapper vfs_caches_init would also not need the mempages parameter.

    Signed-off-by: Denis Cheng
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Denis Cheng
     
  • Slab constructors currently have a flags parameter that is never used. And
    the order of the arguments is opposite to other slab functions. The object
    pointer is placed before the kmem_cache pointer.

    Convert

    ctor(void *object, struct kmem_cache *s, unsigned long flags)

    to

    ctor(struct kmem_cache *s, void *object)

    throughout the kernel

    [akpm@linux-foundation.org: coupla fixes]
    Signed-off-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     
  • A slight oversight tripped lockdep debugging code, each lockdep
    class should have but a single init site.

    Rearange the code to make this true.

    Signed-off-by: Peter Zijlstra
    Signed-off-by: Ingo Molnar
    Signed-off-by: Linus Torvalds

    Peter Zijlstra
     

15 Oct, 2007

1 commit


14 Oct, 2007

1 commit

  • On Mon, 2007-09-24 at 22:13 -0400, Steven Rostedt wrote:
    > The circular lock seems to be this:
    >
    > #1:
    >
    > sys_mmap2: down_write(&mm->mmap_sem);
    > nfs_revalidate_mapping: mutex_lock(&inode->i_mutex);
    >
    >
    > #0:
    >
    > vfs_readdir: mutex_lock(&inode->i_mutex);
    > - during the readdir (filldir64), we take a user fault (missing page?)
    > and call do_page_fault -
    > do_page_fault: down_read(&mm->mmap_sem);
    >
    >
    > So it does indeed look like a circular locking. Now the question is, "is
    > this a bug?". Looking like the inode of #1 must be a file or something
    > else that you can mmap and the inode of #0 seems it must be a directory.
    > I would say "no".
    >
    > Now if you can readdir on a file or mmap a directory, then this could be
    > an issue.
    >
    > Otherwise, I'd love to see someone teach lockdep about this issue! ;-)

    Make a distinction between file and dir usage of i_mutex.
    The inode should be complete and unused at unlock_new_inode(), re-init
    i_mutex depending on its type.

    Signed-off-by: Peter Zijlstra

    Peter Zijlstra
     

20 Jul, 2007

1 commit

  • Slab destructors were no longer supported after Christoph's
    c59def9f222d44bb7e2f0a559f2906191a0862d7 change. They've been
    BUGs for both slab and slub, and slob never supported them
    either.

    This rips out support for the dtor pointer from kmem_cache_create()
    completely and fixes up every single callsite in the kernel (there were
    about 224, not including the slab allocator definitions themselves,
    or the documentation references).

    Signed-off-by: Paul Mundt

    Paul Mundt
     

18 Jul, 2007

2 commits

  • I can never remember what the function to register to receive VM pressure
    is called. I have to trace down from __alloc_pages() to find it.

    It's called "set_shrinker()", and it needs Your Help.

    1) Don't hide struct shrinker. It contains no magic.
    2) Don't allocate "struct shrinker". It's not helpful.
    3) Call them "register_shrinker" and "unregister_shrinker".
    4) Call the function "shrink" not "shrinker".
    5) Reduce the 17 lines of waffly comments to 13, but document it properly.

    Signed-off-by: Rusty Russell
    Cc: David Chinner
    Cc: Trond Myklebust
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Rusty Russell
     
  • It is often known at allocation time whether a page may be migrated or not.
    This patch adds a flag called __GFP_MOVABLE and a new mask called
    GFP_HIGH_MOVABLE. Allocations using the __GFP_MOVABLE can be either migrated
    using the page migration mechanism or reclaimed by syncing with backing
    storage and discarding.

    An API function very similar to alloc_zeroed_user_highpage() is added for
    __GFP_MOVABLE allocations called alloc_zeroed_user_highpage_movable(). The
    flags used by alloc_zeroed_user_highpage() are not changed because it would
    change the semantics of an existing API. After this patch is applied there
    are no in-kernel users of alloc_zeroed_user_highpage() so it probably should
    be marked deprecated if this patch is merged.

    Note that this patch includes a minor cleanup to the use of __GFP_ZERO in
    shmem.c to keep all flag modifications to inode->mapping in the
    shmem_dir_alloc() helper function. This clean-up suggestion is courtesy of
    Hugh Dickens.

    Additional credit goes to Christoph Lameter and Linus Torvalds for shaping the
    concept. Credit to Hugh Dickens for catching issues with shmem swap vector
    and ramfs allocations.

    [akpm@linux-foundation.org: build fix]
    [hugh@veritas.com: __GFP_ZERO cleanup]
    Signed-off-by: Mel Gorman
    Cc: Andy Whitcroft
    Cc: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mel Gorman