08 Mar, 2011

1 commit


02 Mar, 2011

1 commit

  • vfs_rename_other() does not lock renamed inode with i_mutex. Thus changing
    i_nlink in a non-atomic manner (which happens in ext2_rename()) can corrupt
    it as reported and analyzed by Josh.

    In fact, there is no good reason to mess with i_nlink of the moved file.
    We did it presumably to simulate linking into the new directory and unlinking
    from an old one. But the practical effect of this is disputable because fsck
    can possibly treat file as being properly linked into both directories without
    writing any error which is confusing. So we just stop increment-decrement
    games with i_nlink which also fixes the corruption.

    CC: stable@kernel.org
    CC: Al Viro
    Signed-off-by: Josh Hunt
    Signed-off-by: Jan Kara

    Josh Hunt
     

02 Feb, 2011

1 commit

  • SELinux would like to implement a new labeling behavior of newly created
    inodes. We currently label new inodes based on the parent and the creating
    process. This new behavior would also take into account the name of the
    new object when deciding the new label. This is not the (supposed) full path,
    just the last component of the path.

    This is very useful because creating /etc/shadow is different than creating
    /etc/passwd but the kernel hooks are unable to differentiate these
    operations. We currently require that userspace realize it is doing some
    difficult operation like that and than userspace jumps through SELinux hoops
    to get things set up correctly. This patch does not implement new
    behavior, that is obviously contained in a seperate SELinux patch, but it
    does pass the needed name down to the correct LSM hook. If no such name
    exists it is fine to pass NULL.

    Signed-off-by: Eric Paris

    Eric Paris
     

12 Jan, 2011

2 commits

  • * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (44 commits)
    ext4: fix trimming starting with block 0 with small blocksize
    ext4: revert buggy trim overflow patch
    ext4: don't pass entire map to check_eofblocks_fl
    ext4: fix memory leak in ext4_free_branches
    ext4: remove ext4_mb_return_to_preallocation()
    ext4: flush the i_completed_io_list during ext4_truncate
    ext4: add error checking to calls to ext4_handle_dirty_metadata()
    ext4: fix trimming of a single group
    ext4: fix uninitialized variable in ext4_register_li_request
    ext4: dynamically allocate the jbd2_inode in ext4_inode_info as necessary
    ext4: drop i_state_flags on architectures with 64-bit longs
    ext4: reorder ext4_inode_info structure elements to remove unneeded padding
    ext4: drop ec_type from the ext4_ext_cache structure
    ext4: use ext4_lblk_t instead of sector_t for logical blocks
    ext4: replace i_delalloc_reserved_flag with EXT4_STATE_DELALLOC_RESERVED
    ext4: fix 32bit overflow in ext4_ext_find_goal()
    ext4: add more error checks to ext4_mkdir()
    ext4: ext4_ext_migrate should use NULL not 0
    ext4: Use ext4_error_file() to print the pathname to the corrupted inode
    ext4: use IS_ERR() to check for errors in ext4_error_file
    ...

    Linus Torvalds
     
  • * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs-2.6:
    ext2: Resolve 'dereferencing pointer to incomplete type' when enabling EXT2_XATTR_DEBUG
    ext3: Remove redundant unlikely()
    ext2: Remove redundant unlikely()
    ext3: speed up file creates by optimizing rec_len functions
    ext2: speed up file creates by optimizing rec_len functions
    ext3: Add more journal error check
    ext3: Add journal error check in resize.c
    quota: Use %pV and __attribute__((format (printf in __quota_error and fix fallout
    ext3: Add FITRIM handling
    ext3: Add batched discard support for ext3
    ext3: Add journal error check into ext3_rename()
    ext3: Use search_dirblock() in ext3_dx_find_entry()
    ext3: Avoid uninitialized memory references with a corrupted htree directory
    ext3: Return error code from generic_check_addressable
    ext3: Add journal error check into ext3_delete_entry()
    ext3: Add error check in ext3_mkdir()
    fs/ext3/super.c: Use printf extension %pV
    fs/ext2/super.c: Use printf extension %pV
    ext3: don't update sb journal_devnum when RO dev

    Linus Torvalds
     

11 Jan, 2011

4 commits


07 Jan, 2011

3 commits

  • This simple implementation just checks for no ACLs on the inode, and
    if so, then the rcu-walk may proceed, otherwise fail it.

    Signed-off-by: Nick Piggin

    Nick Piggin
     
  • Signed-off-by: Nick Piggin

    Nick Piggin
     
  • RCU free the struct inode. This will allow:

    - Subsequent store-free path walking patch. The inode must be consulted for
    permissions when walking, so an RCU inode reference is a must.
    - sb_inode_list_lock to be moved inside i_lock because sb list walkers who want
    to take i_lock no longer need to take sb_inode_list_lock to walk the list in
    the first place. This will simplify and optimize locking.
    - Could remove some nested trylock loops in dcache code
    - Could potentially simplify things a bit in VM land. Do not need to take the
    page lock to follow page->mapping.

    The downsides of this is the performance cost of using RCU. In a simple
    creat/unlink microbenchmark, performance drops by about 10% due to inability to
    reuse cache-hot slab objects. As iterations increase and RCU freeing starts
    kicking over, this increases to about 20%.

    In cases where inode lifetimes are longer (ie. many inodes may be allocated
    during the average life span of a single inode), a lot of this cache reuse is
    not applicable, so the regression caused by this patch is smaller.

    The cache-hot regression could largely be avoided by using SLAB_DESTROY_BY_RCU,
    however this adds some complexity to list walking and store-free path walking,
    so I prefer to implement this at a later date, if it is shown to be a win in
    real situations. I haven't found a regression in any non-micro benchmark so I
    doubt it will be a problem.

    Signed-off-by: Nick Piggin

    Nick Piggin
     

06 Jan, 2011

1 commit


29 Oct, 2010

1 commit


28 Oct, 2010

3 commits

  • * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs-2.6: (24 commits)
    quota: Fix possible oops in __dquot_initialize()
    ext3: Update kernel-doc comments
    jbd/2: fixed typos
    ext2: fixed typo.
    ext3: Fix debug messages in ext3_group_extend()
    jbd: Convert atomic_inc() to get_bh()
    ext3: Remove misplaced BUFFER_TRACE() in ext3_truncate()
    jbd: Fix debug message in do_get_write_access()
    jbd: Check return value of __getblk()
    ext3: Use DIV_ROUND_UP() on group desc block counting
    ext3: Return proper error code on ext3_fill_super()
    ext3: Remove unnecessary casts on bh->b_data
    ext3: Cleanup ext3_setup_super()
    quota: Fix issuing of warnings from dquot_transfer
    quota: fix dquot_disable vs dquot_transfer race v2
    jbd: Convert bitops to buffer fns
    ext3/jbd: Avoid WARN() messages when failing to write the superblock
    jbd: Use offset_in_page() instead of manual calculation
    jbd: Remove unnecessary goto statement
    jbd: Use printk_ratelimited() in journal_alloc_journal_head()
    ...

    Linus Torvalds
     
  • "excpet"

    Signed-off-by: Andrea Gelmini
    Signed-off-by: Jan Kara

    Andrea Gelmini
     
  • @handle doesn't exist in ext2. Remove it.
    Also, fit comment header into kernel-doc format.

    Signed-off-by: Namhyung Kim
    Signed-off-by: Jan Kara

    Namhyung Kim
     

26 Oct, 2010

3 commits


25 Oct, 2010

1 commit

  • * 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (39 commits)
    Update broken web addresses in arch directory.
    Update broken web addresses in the kernel.
    Revert "drivers/usb: Remove unnecessary return's from void functions" for musb gadget
    Revert "Fix typo: configuation => configuration" partially
    ida: document IDA_BITMAP_LONGS calculation
    ext2: fix a typo on comment in ext2/inode.c
    drivers/scsi: Remove unnecessary casts of private_data
    drivers/s390: Remove unnecessary casts of private_data
    net/sunrpc/rpc_pipe.c: Remove unnecessary casts of private_data
    drivers/infiniband: Remove unnecessary casts of private_data
    drivers/gpu/drm: Remove unnecessary casts of private_data
    kernel/pm_qos_params.c: Remove unnecessary casts of private_data
    fs/ecryptfs: Remove unnecessary casts of private_data
    fs/seq_file.c: Remove unnecessary casts of private_data
    arm: uengine.c: remove C99 comments
    arm: scoop.c: remove C99 comments
    Fix typo configue => configure in comments
    Fix typo: configuation => configuration
    Fix typo interrest[ing|ed] => interest[ing|ed]
    Fix various typos of valid in comments
    ...

    Fix up trivial conflicts in:
    drivers/char/ipmi/ipmi_si_intf.c
    drivers/usb/gadget/rndis.c
    net/irda/irnet/irnet_ppp.c

    Linus Torvalds
     

05 Oct, 2010

2 commits

  • The BKL is still used in ext2_put_super(), ext2_fill_super(), ext2_sync_fs()
    ext2_remount() and ext2_write_inode(). From these calls ext2_put_super(),
    ext2_fill_super() and ext2_remount() are protected against each other by
    the struct super_block s_umount rw semaphore. The call in ext2_write_inode()
    could only protect the modification of the ext2_sb_info through
    ext2_update_dynamic_rev() against concurrent ext2_sync_fs() or ext2_remount().
    ext2_fill_super() and ext2_put_super() can be left out because you need a
    valid filesystem reference in all three cases, which you do not have when
    you are one of these functions.

    If the BKL is only protecting the modification of the ext2_sb_info it can
    safely be removed since this is protected by the struct ext2_sb_info s_lock.

    Signed-off-by: Jan Blunck
    Cc: Jan Kara
    Signed-off-by: Arnd Bergmann

    Jan Blunck
     
  • This patch is a preparation necessary to remove the BKL from do_new_mount().
    It explicitly adds calls to lock_kernel()/unlock_kernel() around
    get_sb/fill_super operations for filesystems that still uses the BKL.

    I've read through all the code formerly covered by the BKL inside
    do_kern_mount() and have satisfied myself that it doesn't need the BKL
    any more.

    do_kern_mount() is already called without the BKL when mounting the rootfs
    and in nfsctl. do_kern_mount() calls vfs_kern_mount(), which is called
    from various places without BKL: simple_pin_fs(), nfs_do_clone_mount()
    through nfs_follow_mountpoint(), afs_mntpt_do_automount() through
    afs_mntpt_follow_link(). Both later functions are actually the filesystems
    follow_link inode operation. vfs_kern_mount() is calling the specified
    get_sb function and lets the filesystem do its job by calling the given
    fill_super function.

    Therefore I think it is safe to push down the BKL from the VFS to the
    low-level filesystems get_sb/fill_super operation.

    [arnd: do not add the BKL to those file systems that already
    don't use it elsewhere]

    Signed-off-by: Jan Blunck
    Signed-off-by: Arnd Bergmann
    Cc: Matthew Wilcox
    Cc: Christoph Hellwig

    Jan Blunck
     

23 Sep, 2010

1 commit


10 Aug, 2010

12 commits

  • The mbcache code was written to support a variable number of indexes,
    but all the existing users use exactly one index. Simplify to code to
    support only that case.

    There are also no users of the cache entry free operation, and none of
    the users keep extra data in cache entries. Remove those features as
    well.

    Signed-off-by: Andreas Gruenbacher
    Signed-off-by: Al Viro

    Andreas Gruenbacher
     
  • Signed-off-by: Al Viro

    Al Viro
     
  • ... it's beyond fs-writeback reach already - writeback won't
    be started at that point.

    Signed-off-by: Al Viro

    Al Viro
     
  • Signed-off-by: Al Viro

    Al Viro
     
  • brute-force conversion

    Signed-off-by: Al Viro

    Al Viro
     
  • Make sure we check the truncate constraints early on in ->setattr by adding
    those checks to inode_change_ok. Also clean up and document inode_change_ok
    to make this obvious.

    As a fallout we don't have to call inode_newsize_ok from simple_setsize and
    simplify it down to a truncate_setsize which doesn't return an error. This
    simplifies a lot of setattr implementations and means we use truncate_setsize
    almost everywhere. Get rid of fat_setsize now that it's trivial and mark
    ext2_setsize static to make the calling convention obvious.

    Keep the inode_newsize_ok in vmtruncate for now as all callers need an
    audit for its removal anyway.

    Note: setattr code in ecryptfs doesn't call inode_change_ok at all and
    needs a deeper audit, but that is left for later.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Al Viro

    Christoph Hellwig
     
  • Despite its name it's now a generic implementation of ->setattr, but
    rather a helper to copy attributes from a struct iattr to the inode.
    Rename it to setattr_copy to reflect this fact.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Al Viro

    Christoph Hellwig
     
  • Move the call to vmtruncate to get rid of accessive blocks to the callers
    in preparation of the new truncate sequence and rename the non-truncating
    version to block_write_begin.

    While we're at it also remove several unused arguments to block_write_begin.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Al Viro

    Christoph Hellwig
     
  • Split up the block_write_begin implementation - __block_write_begin is a new
    trivial wrapper for block_prepare_write that always takes an already
    allocated page and can be either called from block_write_begin or filesystem
    code that already has a page allocated. Remove the handling of already
    allocated pages from block_write_begin after switching all callers that
    do it to __block_write_begin.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Al Viro

    Christoph Hellwig
     
  • For filesystem that implement directories in pagecache we call
    block_write_begin with an already allocated page for this code, while the
    normal regular file write path uses the default block_write_begin behaviour.

    Get rid of the __foofs_write_begin helper and opencode the normal write_begin
    call in foofs_write_begin, while adding a new foofs_prepare_chunk helper for
    the directory code. The added benefit is that foofs_prepare_chunk has
    a much saner calling convention.

    Note that the interruptible flag passed into block_write_begin is always
    ignored if we already pass in a page (see next patch for details), and
    we never were doing truncations of exessive blocks for this case either so we
    can switch directly to block_write_begin_newtrunc.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Al Viro

    Christoph Hellwig
     
  • Move the call to vmtruncate to get rid of accessive blocks to the only
    remaining caller and rename the non-truncating version to nobh_write_begin.

    Get rid of the superflous file argument to it while we're at it.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Al Viro

    Christoph Hellwig
     
  • Move the call to vmtruncate to get rid of accessive blocks to the callers
    in prepearation of the new truncate calling sequence. This was only done
    for DIO_LOCKING filesystems, so the __blockdev_direct_IO_newtrunc variant
    was not needed anyway. Get rid of blockdev_direct_IO_no_locking and
    its _newtrunc variant while at it as just opencoding the two additional
    paramters is shorted than the name suffix.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Al Viro

    Christoph Hellwig
     

25 Jun, 2010

1 commit


05 Jun, 2010

1 commit

  • mtime and ctime should be changed only if the file size has actually
    changed. Patches changing ext2 and tmpfs from vmtruncate to new truncate
    sequence has caused regressions where they always update timestamps.

    There is some strange cases in POSIX where truncate(2) must not update
    times unless the size has acutally changed, see 6e656be89.

    This area is all still rather buggy in different ways in a lot of
    filesystems and needs a cleanup and audit (ideally the vfs will provide
    a simple attribute or call to direct all filesystems exactly which
    attributes to change). But coming up with the best solution will take a
    while and is not appropriate for rc anyway.

    So fix recent regression for now.

    Signed-off-by: Nick Piggin
    Signed-off-by: Al Viro

    Nick Piggin
     

31 May, 2010

1 commit

  • * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs-2.6:
    quota: Convert quota statistics to generic percpu_counter
    ext3 uses rb_node = NULL; to zero rb_root.
    quota: Fixup dquot_transfer
    reiserfs: Fix resuming of quotas on remount read-write
    pohmelfs: Remove dead quota code
    ufs: Remove dead quota code
    udf: Remove dead quota code
    quota: rename default quotactl methods to dquot_
    quota: explicitly set ->dq_op and ->s_qcop
    quota: drop remount argument to ->quota_on and ->quota_off
    quota: move unmount handling into the filesystem
    quota: kill the vfs_dq_off and vfs_dq_quota_on_remount wrappers
    quota: move remount handling into the filesystem
    ocfs2: Fix use after free on remount read-only

    Fix up conflicts in fs/ext4/super.c and fs/ufs/file.c

    Linus Torvalds
     

28 May, 2010

1 commit