20 Jul, 2011

3 commits


27 May, 2011

3 commits

  • Tell the filesystem if we just updated timestamp (I_DIRTY_SYNC) or
    anything else, so that the filesystem can track internally if it
    needs to push out a transaction for fdatasync or not.

    This is just the prototype change with no user for it yet. I plan
    to push large XFS changes for the next merge window, and getting
    this trivial infrastructure in this window would help a lot to avoid
    tree interdependencies.

    Also remove incorrect comments that ->dirty_inode can't block. That
    was changed a long time ago, and many implementations rely on it.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Al Viro

    Christoph Hellwig
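
A minimal userspace sketch of the idea described above, assuming a filesystem that tracks two separate "needs a commit" states. The names `sketch_inode`, `sketch_dirty_inode`, and the `SKETCH_I_DIRTY_*` values are hypothetical and only mirror the intent of the message; this is not the kernel prototype.

```c
#include <stdbool.h>

/* Hypothetical flag values for this sketch only. */
#define SKETCH_I_DIRTY_SYNC     0x1u  /* only timestamps changed */
#define SKETCH_I_DIRTY_DATASYNC 0x2u  /* something fdatasync cares about changed */

/* Hypothetical per-inode state a filesystem might keep internally. */
struct sketch_inode {
    bool needs_sync_commit;      /* fsync must push a transaction */
    bool needs_datasync_commit;  /* fdatasync must push a transaction */
};

/* Called whenever the inode is dirtied; flags say what kind of change it was. */
static void sketch_dirty_inode(struct sketch_inode *inode, unsigned flags)
{
    inode->needs_sync_commit = true;
    /* Anything beyond a timestamp-only update matters to fdatasync too. */
    if (flags & ~SKETCH_I_DIRTY_SYNC)
        inode->needs_datasync_commit = true;
}
```

With the flags passed in, a timestamp-only update leaves `needs_datasync_commit` clear, so an fdatasync on that inode could skip the transaction.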
     
  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/djm/tmem:
    xen: cleancache shim to Xen Transcendent Memory
    ocfs2: add cleancache support
    ext4: add cleancache support
    btrfs: add cleancache support
    ext3: add cleancache support
    mm/fs: add hooks to support cleancache
    mm: cleancache core ops functions and config
    fs: add field to superblock to support cleancache
    mm/fs: cleancache documentation

    Fix up trivial conflict in fs/btrfs/extent_io.c due to includes

    Linus Torvalds
     
  • This fifth patch of eight in this cleancache series "opts-in"
    cleancache for ext3. Filesystems must explicitly enable
    cleancache by calling cleancache_init_fs anytime an instance
    of the filesystem is mounted. For ext3, all other cleancache
    hooks are in the VFS layer including the matching cleancache_flush_fs
    hook which must be called on unmount.

    Details and a FAQ can be found in Documentation/vm/cleancache.txt

    [v6-v8: no changes]
    [v5: jeremy@goop.org: simplify init hook and any future fs init changes]
    Signed-off-by: Dan Magenheimer
    Reviewed-by: Jeremy Fitzhardinge
    Reviewed-by: Konrad Rzeszutek Wilk
    Acked-by: Andreas Dilger
    Cc: Ted Ts'o
    Cc: Andrew Morton
    Cc: Al Viro
    Cc: Matthew Wilcox
    Cc: Nick Piggin
    Cc: Mel Gorman
    Cc: Rik Van Riel
    Cc: Jan Beulich
    Cc: Chris Mason
    Cc: Mark Fasheh
    Cc: Joel Becker
    Cc: Nitin Gupta

    Dan Magenheimer
     

17 May, 2011

1 commit

  • When make_indexed_dir() fails (e.g. because of ENOSPC) after it has allocated
    a block for the index tree root, we did not properly mark all changed buffers
    dirty. This led to only some of these buffers being written out and thus
    effectively corrupting the directory.

    Fix the issue by marking all changed data dirty even in the failure case.

    CC: stable@kernel.org
    Signed-off-by: Jan Kara

    Jan Kara
     

30 Apr, 2011

1 commit

  • ext3_symlink() cannot call __page_symlink() with a transaction open.
    __page_symlink() calls ext3_write_begin(), which takes the page lock. The page
    lock ranks above transaction start, so lock ordering is violated. In addition,
    ext3_write_begin() waits for a transaction commit when we run out of space,
    which never happens if we hold the transaction open.

    Fix the problem by stopping the transaction before calling __page_symlink()
    (we have to be careful and put the inode on the orphan list so that it gets
    deleted in case of a crash) and starting another one after __page_symlink()
    returns, to add the symlink to the directory.

    Signed-off-by: Jan Kara

    Jan Kara
     

08 Apr, 2011

1 commit


31 Mar, 2011

1 commit


25 Mar, 2011

1 commit

  • * 'for-2.6.39/core' of git://git.kernel.dk/linux-2.6-block: (65 commits)
    Documentation/iostats.txt: bit-size reference etc.
    cfq-iosched: removing unnecessary think time checking
    cfq-iosched: Don't clear queue stats when preempt.
    blk-throttle: Reset group slice when limits are changed
    blk-cgroup: Only give unaccounted_time under debug
    cfq-iosched: Don't set active queue in preempt
    block: fix non-atomic access to genhd inflight structures
    block: attempt to merge with existing requests on plug flush
    block: NULL dereference on error path in __blkdev_get()
    cfq-iosched: Don't update group weights when on service tree
    fs: assign sb->s_bdi to default_backing_dev_info if the bdi is going away
    block: Require subsystems to explicitly allocate bio_set integrity mempool
    jbd2: finish conversion from WRITE_SYNC_PLUG to WRITE_SYNC and explicit plugging
    jbd: finish conversion from WRITE_SYNC_PLUG to WRITE_SYNC and explicit plugging
    fs: make fsync_buffers_list() plug
    mm: make generic_writepages() use plugging
    blk-cgroup: Add unaccounted time to timeslice_used.
    block: fixup plugging stubs for !CONFIG_BLOCK
    block: remove obsolete comments for blkdev_issue_zeroout.
    blktrace: Use rq->cmd_flags directly in blk_add_trace_rq.
    ...

    Fix up conflicts in fs/{aio.c,super.c}

    Linus Torvalds
     

24 Mar, 2011

2 commits


18 Mar, 2011

1 commit

  • * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs-2.6:
    ext3: Always set dx_node's fake_dirent explicitly.
    ext3: Fix an overflow in ext3_trim_fs.
    jbd: Remove one to many n's in a word.
    ext3: skip orphan cleanup on rocompat fs
    ext2: Fix link count corruption under heavy link+rename load
    ext3: speed up group trim with the right free block count.
    ext3: Adjust trim start with first_data_block.
    quota: return -ENOMEM when memory allocation fails

    Linus Torvalds
     

17 Mar, 2011

1 commit

  • …s/security-testing-2.6

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/security-testing-2.6: (33 commits)
    AppArmor: kill unused macros in lsm.c
    AppArmor: cleanup generated files correctly
    KEYS: Add an iovec version of KEYCTL_INSTANTIATE
    KEYS: Add a new keyctl op to reject a key with a specified error code
    KEYS: Add a key type op to permit the key description to be vetted
    KEYS: Add an RCU payload dereference macro
    AppArmor: Cleanup make file to remove cruft and make it easier to read
    SELinux: implement the new sb_remount LSM hook
    LSM: Pass -o remount options to the LSM
    SELinux: Compute SID for the newly created socket
    SELinux: Socket retains creator role and MLS attribute
    SELinux: Auto-generate security_is_socket_class
    TOMOYO: Fix memory leak upon file open.
    Revert "selinux: simplify ioctl checking"
    selinux: drop unused packet flow permissions
    selinux: Fix packet forwarding checks on postrouting
    selinux: Fix wrong checks for selinux_policycap_netpeer
    selinux: Fix check for xfrm selinux context algorithm
    ima: remove unnecessary call to ima_must_measure
    IMA: remove IMA imbalance checking
    ...

    Linus Torvalds
     

15 Mar, 2011

2 commits


10 Mar, 2011

1 commit

  • Code has been converted over to the new explicit on-stack plugging,
    and delay users have been converted to use the new API for that.
    So let's kill off the old plugging along with aops->sync_page().

    Signed-off-by: Jens Axboe

    Jens Axboe
     

08 Mar, 2011

2 commits


04 Mar, 2011

1 commit

  • In a bs=4096 volume, suppose we call FITRIM with
    fstrim_range(start = 102400, len = 134144000, minlen = 10240).
    The current code reads:

        if (len >= EXT3_BLOCKS_PER_GROUP(sb))
                len -= (EXT3_BLOCKS_PER_GROUP(sb) - first_block);
        else
                last_block = first_block + len;

    So if len < EXT3_BLOCKS_PER_GROUP while first_block + len >
    EXT3_BLOCKS_PER_GROUP, last_block will be set to an out-of-range value
    which exceeds EXT3_BLOCKS_PER_GROUP.

    This patch fixes it and adjusts len and last_block accordingly.

    Cc: Lukas Czerner
    Cc: Jan Kara
    Signed-off-by: Tao Ma
    Signed-off-by: Jan Kara

    Tao Ma
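
The clamping problem described above can be sketched in plain C. This is not the patch itself: `clamp_to_group` and `struct trim_span` are hypothetical names, the last block is treated as inclusive, and the sketch assumes `first_block < blocks_per_group` and `len > 0`.

```c
#include <stdint.h>

/* Hypothetical result type: the part of a trim request that falls in one group. */
struct trim_span {
    uint32_t first_block;  /* first block to trim inside the group */
    uint32_t last_block;   /* last block to trim inside the group (inclusive) */
    uint64_t remaining;    /* blocks left over for the following groups */
};

/* Clamp a request of len blocks starting at first_block to the group boundary. */
static struct trim_span clamp_to_group(uint32_t first_block, uint64_t len,
                                       uint32_t blocks_per_group)
{
    struct trim_span s;
    /* Number of blocks between first_block and the end of the group. */
    uint64_t room = (uint64_t)blocks_per_group - first_block;

    s.first_block = first_block;
    if (len >= room) {
        /* Request runs past the group: stop at the last block of the group. */
        s.last_block = blocks_per_group - 1;
        s.remaining = len - room;
    } else {
        /* Request ends inside the group. */
        s.last_block = first_block + (uint32_t)len - 1;
        s.remaining = 0;
    }
    return s;
}
```

The comparison is against the room left after `first_block` rather than against the full group size, which is the case the original condition missed.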
     

01 Mar, 2011

1 commit

  • Orphan cleanup is currently executed even if the file system has some
    number of unknown RO_COMPAT features. The cleanup deletes inodes and frees
    blocks, which could be very bad for some RO_COMPAT features.

    This patch skips the orphan cleanup if the file system contains read-only
    compatible features not known by this ext3 implementation, since those
    would prevent the fs from being mounted (or remounted) read-write.

    Signed-off-by: Amir Goldstein
    Signed-off-by: Jan Kara

    Amir Goldstein
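
The check described above amounts to a mask test on the feature bits. The following is a sketch under assumptions: the `SKETCH_*` constants and `may_run_orphan_cleanup` are illustrative names, and the bit values are chosen for the example rather than taken from the ext3 headers.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative read-only compatible feature bits (assumed values). */
#define SKETCH_RO_COMPAT_SPARSE_SUPER 0x0001u
#define SKETCH_RO_COMPAT_LARGE_FILE   0x0002u
#define SKETCH_RO_COMPAT_BTREE_DIR    0x0004u

/* Features this hypothetical implementation understands. */
#define SKETCH_RO_COMPAT_SUPP (SKETCH_RO_COMPAT_SPARSE_SUPER | \
                               SKETCH_RO_COMPAT_LARGE_FILE   | \
                               SKETCH_RO_COMPAT_BTREE_DIR)

/* Orphan cleanup modifies the fs, so allow it only if every bit is known. */
static bool may_run_orphan_cleanup(uint32_t ro_compat_features)
{
    return (ro_compat_features & ~SKETCH_RO_COMPAT_SUPP) == 0;
}
```

Any set bit outside the supported mask makes the function return false, so the cleanup would be skipped for that mount.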
     

24 Feb, 2011

2 commits

  • When we trim some free blocks in an ext3 group, we should
    calculate the free blocks properly and check whether there are
    enough free blocks left for us to trim. The current solution only
    subtracts a free extent from the count if it is large enough to
    trim, which is wrong.

    Let us see a small example:
    a group has 1.5M free, made up of five 300K extents,
    and minblocks is 1M. With the current solution, we have to iterate
    over the whole group since these 300K extents are never subtracted
    from 1.5M. But actually we should exit after we find the first 2
    free extents, since the remaining 3 chunks only sum up to 900K once
    we subtract the first 600K, even though those could not be trimmed.

    Cc: Jan Kara
    Cc: Lukas Czerner
    Signed-off-by: Tao Ma
    Signed-off-by: Jan Kara

    Tao Ma
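
The example above can be replayed with a small loop. This is a sketch, not the ext3 code: `scan_free_extents` is a hypothetical helper, extents are given as plain sizes, and `free_total` is assumed to be at least the sum of the extents passed in.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Walk the free extents of one group and "trim" those that are at least
 * minblocks long. Every visited extent is subtracted from free_total,
 * whether or not it was trimmed, so the loop can stop as soon as the
 * remaining free space can no longer contain a trimmable extent.
 * Returns the number of extents visited; *trimmed receives the trimmed total.
 */
static size_t scan_free_extents(const uint32_t *extents, size_t n,
                                uint32_t free_total, uint32_t minblocks,
                                uint32_t *trimmed)
{
    size_t visited = 0;

    *trimmed = 0;
    while (visited < n && free_total >= minblocks) {
        uint32_t ext = extents[visited++];

        if (ext >= minblocks)
            *trimmed += ext;   /* large enough to be worth discarding */
        free_total -= ext;     /* always account for the extent */
    }
    return visited;
}
```

With five extents of 300 units, 1500 units free, and a minimum of 1000, the loop stops after two extents because only 900 units remain.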
     
  • As agreed in the e-mail thread [1], first_data_block should be
    added to the trim start. This patch does so and removes
    the check for start < first_data_block.

    [1] http://www.spinics.net/lists/linux-ext4/msg22737.html

    Cc: Jan Kara
    Cc: Lukas Czerner
    Signed-off-by: Tao Ma
    Signed-off-by: Jan Kara

    Tao Ma
     

02 Feb, 2011

1 commit

  • SELinux would like to implement a new labeling behavior of newly created
    inodes. We currently label new inodes based on the parent and the creating
    process. This new behavior would also take into account the name of the
    new object when deciding the new label. This is not the (supposed) full path,
    just the last component of the path.

    This is very useful because creating /etc/shadow is different from creating
    /etc/passwd, but the kernel hooks are unable to differentiate these
    operations. We currently require that userspace realize it is doing some
    difficult operation like that, and then userspace jumps through SELinux hoops
    to get things set up correctly. This patch does not implement the new
    behavior, which is contained in a separate SELinux patch, but it
    does pass the needed name down to the correct LSM hook. If no such name
    exists it is fine to pass NULL.

    Signed-off-by: Eric Paris

    Eric Paris
     

21 Jan, 2011

1 commit


14 Jan, 2011

1 commit

  • * 'for-2.6.38/core' of git://git.kernel.dk/linux-2.6-block: (43 commits)
    block: ensure that completion error gets properly traced
    blktrace: add missing probe argument to block_bio_complete
    block cfq: don't use atomic_t for cfq_group
    block cfq: don't use atomic_t for cfq_queue
    block: trace event block fix unassigned field
    block: add internal hd part table references
    block: fix accounting bug on cross partition merges
    kref: add kref_test_and_get
    bio-integrity: mark kintegrityd_wq highpri and CPU intensive
    block: make kblockd_workqueue smarter
    Revert "sd: implement sd_check_events()"
    block: Clean up exit_io_context() source code.
    Fix compile warnings due to missing removal of a 'ret' variable
    fs/block: type signature of major_to_index(int) to major_to_index(unsigned)
    block: convert !IS_ERR(p) && p to !IS_ERR_NOR_NULL(p)
    cfq-iosched: don't check cfqg in choose_service_tree()
    fs/splice: Pull buf->ops->confirm() from splice_from_pipe actors
    cdrom: export cdrom_check_events()
    sd: implement sd_check_events()
    sr: implement sr_check_events()
    ...

    Linus Torvalds
     

13 Jan, 2011

1 commit

  • As Al Viro pointed out, path resolution during Q_QUOTAON calls to quotactl
    is prone to deadlocks. We hold the s_umount semaphore for reading during the
    path resolution, and resolution itself may need to acquire the semaphore
    for writing when e.g. an autofs mountpoint is passed.

    Solve the problem by performing the resolution before we get hold of the
    superblock (and thus the s_umount semaphore). The whole thing is complicated
    by the fact that some filesystems (OCFS2) ignore the path argument. So to
    distinguish between filesystems which want the path and those which do not,
    we introduce a new .quota_on_meta callback which does not get the path. OCFS2
    then uses this callback instead of the old .quota_on.

    CC: Al Viro
    CC: Christoph Hellwig
    CC: Ted Ts'o
    CC: Joel Becker
    Signed-off-by: Jan Kara

    Jan Kara
     

12 Jan, 2011

2 commits

  • * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (44 commits)
    ext4: fix trimming starting with block 0 with small blocksize
    ext4: revert buggy trim overflow patch
    ext4: don't pass entire map to check_eofblocks_fl
    ext4: fix memory leak in ext4_free_branches
    ext4: remove ext4_mb_return_to_preallocation()
    ext4: flush the i_completed_io_list during ext4_truncate
    ext4: add error checking to calls to ext4_handle_dirty_metadata()
    ext4: fix trimming of a single group
    ext4: fix uninitialized variable in ext4_register_li_request
    ext4: dynamically allocate the jbd2_inode in ext4_inode_info as necessary
    ext4: drop i_state_flags on architectures with 64-bit longs
    ext4: reorder ext4_inode_info structure elements to remove unneeded padding
    ext4: drop ec_type from the ext4_ext_cache structure
    ext4: use ext4_lblk_t instead of sector_t for logical blocks
    ext4: replace i_delalloc_reserved_flag with EXT4_STATE_DELALLOC_RESERVED
    ext4: fix 32bit overflow in ext4_ext_find_goal()
    ext4: add more error checks to ext4_mkdir()
    ext4: ext4_ext_migrate should use NULL not 0
    ext4: Use ext4_error_file() to print the pathname to the corrupted inode
    ext4: use IS_ERR() to check for errors in ext4_error_file
    ...

    Linus Torvalds
     
  • * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs-2.6:
    ext2: Resolve 'dereferencing pointer to incomplete type' when enabling EXT2_XATTR_DEBUG
    ext3: Remove redundant unlikely()
    ext2: Remove redundant unlikely()
    ext3: speed up file creates by optimizing rec_len functions
    ext2: speed up file creates by optimizing rec_len functions
    ext3: Add more journal error check
    ext3: Add journal error check in resize.c
    quota: Use %pV and __attribute__((format (printf in __quota_error and fix fallout
    ext3: Add FITRIM handling
    ext3: Add batched discard support for ext3
    ext3: Add journal error check into ext3_rename()
    ext3: Use search_dirblock() in ext3_dx_find_entry()
    ext3: Avoid uninitialized memory references with a corrupted htree directory
    ext3: Return error code from generic_check_addressable
    ext3: Add journal error check into ext3_delete_entry()
    ext3: Add error check in ext3_mkdir()
    fs/ext3/super.c: Use printf extension %pV
    fs/ext2/super.c: Use printf extension %pV
    ext3: don't update sb journal_devnum when RO dev

    Linus Torvalds
     

11 Jan, 2011

7 commits

  • IS_ERR() already implies unlikely(), so it can be omitted here.

    Signed-off-by: Tobias Klauser
    Signed-off-by: Jan Kara

    Tobias Klauser
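
To illustrate why wrapping the check in another unlikely() is redundant, here is a userspace sketch of an error-pointer test that already carries the branch hint. The `sketch_*` names are hypothetical, and `__builtin_expect` assumes GCC or Clang; this imitates the pattern rather than reproducing the kernel headers.

```c
#include <stdint.h>

#define SKETCH_MAX_ERRNO 4095

/* Branch hint: tell the compiler the condition is expected to be false. */
#define sketch_unlikely(x) __builtin_expect(!!(x), 0)

/* Encode a negative errno value as a pointer in the top page of the address space. */
static inline void *sketch_err_ptr(long err)
{
    return (void *)(intptr_t)err;
}

/* The hint lives inside the check, so callers need not add their own. */
static inline int sketch_is_err(const void *ptr)
{
    return sketch_unlikely((uintptr_t)ptr >= (uintptr_t)-SKETCH_MAX_ERRNO);
}
```

A caller can then write `if (sketch_is_err(p))` directly; adding a second hint around the call would not change the generated branch layout.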
     
  • The addition of 64k block capability in the rec_len_from_disk
    and rec_len_to_disk functions added a bit of math overhead which
    slows down file create workloads needlessly when the architecture
    cannot even support 64k blocks, thanks to page size limits.

    Similar changes already exist in the ext4 codebase.

    The directory entry checking can also be optimized a bit
    by sprinkling in some unlikely() conditions to move the
    error handling out of line.

    bonnie++ sequential file creates on a 512MB ramdisk speeds up
    from about 77,000/s to about 82,000/s, about a 6% improvement.

    Signed-off-by: Eric Sandeen
    Signed-off-by: Jan Kara

    Eric Sandeen
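
The optimization described above can be sketched by compiling the 64k special case only when the page size could allow such blocks. This is a userspace approximation: `SKETCH_PAGE_SIZE`, `SKETCH_MAX_REC_LEN`, and the function names are assumptions, and the on-disk value is taken to be already converted to host byte order.

```c
#include <stdint.h>

/* Largest value a 16-bit on-disk rec_len field can hold. */
#define SKETCH_MAX_REC_LEN ((1u << 16) - 1)

/* Stand-in for the architecture's page size; 4096 unless overridden. */
#ifndef SKETCH_PAGE_SIZE
#define SKETCH_PAGE_SIZE 4096u
#endif

/* Decode an on-disk record length into a byte count. */
static inline unsigned sketch_rec_len_from_disk(uint16_t dlen)
{
    unsigned len = dlen;

#if SKETCH_PAGE_SIZE >= 65536
    /* Only 64k blocks need the escape value that stands for 65536. */
    if (len == SKETCH_MAX_REC_LEN)
        return 1u << 16;
#endif
    return len;
}

/* Encode a byte count into the 16-bit on-disk field. */
static inline uint16_t sketch_rec_len_to_disk(unsigned len)
{
#if SKETCH_PAGE_SIZE >= 65536
    if (len == (1u << 16))
        return (uint16_t)SKETCH_MAX_REC_LEN;
#endif
    return (uint16_t)len;
}
```

On a build with 4k pages both helpers reduce to a plain conversion, so the extra comparison disappears from the file-create path.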
     
  • Check return value of ext3_journal_get_write_access() and
    ext3_journal_dirty_metadata().

    Signed-off-by: Namhyung Kim
    Signed-off-by: Jan Kara

    Namhyung Kim
     
  • Check return value of ext3_journal_get_write_access() and
    ext3_journal_dirty_metadata().

    Signed-off-by: Namhyung Kim
    Signed-off-by: Jan Kara

    Namhyung Kim
     
  • The ioctl takes an fstrim_range structure (defined in include/linux/fs.h) as an
    argument specifying a range of the filesystem to trim and the minimum size of a
    contiguous extent to trim. After the FITRIM is done, the number of bytes
    passed from the filesystem down the block stack to the device for potential
    discard is stored in fstrim_range.len. This number is a maximum discard amount
    from the storage device's perspective, because FITRIM called repeatedly will
    keep sending the same sectors for discard. fstrim_range.len will report the
    same potential discard bytes each time, but only sectors which had been written
    to between the discards would actually be discarded by the storage device.
    Further, the kernel block layer reserves the right to adjust the discard ranges
    to fit raid stripe geometry, non-trim capable devices in a LVM setup, etc.
    These reductions would not be reflected in fstrim_range.len.

    Thus fstrim_range.len can give the user better insight on how much storage
    space has potentially been released for wear-leveling, but it should be only
    one of the criteria that userspace tools take into account when trying to
    optimize calls to FITRIM.

    Thanks to Greg Freemyer for a better commit message.

    Signed-off-by: Lukas Czerner
    Signed-off-by: Jan Kara

    Lukas Czerner
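
From userspace, the interface described above might be driven roughly as follows. This is a sketch assuming a Linux system whose `<linux/fs.h>` provides `FITRIM` and `struct fstrim_range`; the helper names `whole_fs_range` and `trim_mountpoint` are hypothetical, and actually issuing the ioctl normally requires privileges and a filesystem that supports it.

```c
#include <errno.h>
#include <fcntl.h>
#include <linux/fs.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>

/* Build a request covering the whole filesystem with the given minimum extent size. */
static struct fstrim_range whole_fs_range(uint64_t minlen)
{
    struct fstrim_range r;

    memset(&r, 0, sizeof(r));
    r.start = 0;
    r.len = UINT64_MAX;   /* "to the end of the filesystem" */
    r.minlen = minlen;    /* ignore free extents smaller than this many bytes */
    return r;
}

/*
 * Issue FITRIM on the filesystem mounted at mountpoint.
 * Returns 0 and stores the reported byte count in *trimmed on success,
 * or -1 with errno set on failure.
 */
static int trim_mountpoint(const char *mountpoint, uint64_t minlen,
                           uint64_t *trimmed)
{
    struct fstrim_range r = whole_fs_range(minlen);
    int fd = open(mountpoint, O_RDONLY);
    int ret, saved_errno;

    if (fd < 0)
        return -1;
    ret = ioctl(fd, FITRIM, &r);
    saved_errno = errno;
    if (ret == 0)
        *trimmed = r.len;  /* bytes passed down for potential discard */
    close(fd);
    errno = saved_errno;   /* keep the ioctl's errno, not close()'s */
    return ret;
}
```

As the message notes, the value read back from `len` is an upper bound on what the device discarded, so a caller should treat it as a hint rather than an exact figure.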
     
  • Walk through allocation groups and trim all free extents. It can be
    invoked through the FITRIM ioctl on the file system. The main idea is to
    provide a way to trim the whole file system if needed, since some SSDs
    may suffer from performance loss after the whole device was filled (it
    does not mean that the fs is full!).

    It searches for free extents in the allocation groups specified by the byte
    range start -> start+len. When a free extent is within this range, its blocks
    are marked as used and then trimmed. Afterwards these blocks are marked as
    free in the per-group bitmap.

    [JK: Fixed up error handling and trimming of a single group]

    Signed-off-by: Lukas Czerner
    Reviewed-by: Jan Kara
    Reviewed-by: Dmitry Monakhov
    Signed-off-by: Jan Kara

    Lukas Czerner
     
  • Signed-off-by: Wang Sheng-Hui
    Signed-off-by: "Theodore Ts'o"

    Wang Sheng-Hui
     

07 Jan, 2011

3 commits

  • This simple implementation just checks that there are no ACLs on the inode.
    If so, the rcu-walk may proceed; otherwise it fails.

    Signed-off-by: Nick Piggin

    Nick Piggin
     
  • Signed-off-by: Nick Piggin

    Nick Piggin
     
  • RCU free the struct inode. This will allow:

    - Subsequent store-free path walking patch. The inode must be consulted for
      permissions when walking, so an RCU inode reference is a must.
    - sb_inode_list_lock to be moved inside i_lock because sb list walkers who want
      to take i_lock no longer need to take sb_inode_list_lock to walk the list in
      the first place. This will simplify and optimize locking.
    - Could remove some nested trylock loops in dcache code
    - Could potentially simplify things a bit in VM land. Do not need to take the
      page lock to follow page->mapping.

    The downside of this is the performance cost of using RCU. In a simple
    creat/unlink microbenchmark, performance drops by about 10% due to inability to
    reuse cache-hot slab objects. As iterations increase and RCU freeing starts
    kicking over, this increases to about 20%.

    In cases where inode lifetimes are longer (i.e. many inodes may be allocated
    during the average life span of a single inode), a lot of this cache reuse is
    not applicable, so the regression caused by this patch is smaller.

    The cache-hot regression could largely be avoided by using SLAB_DESTROY_BY_RCU,
    however this adds some complexity to list walking and store-free path walking,
    so I prefer to implement this at a later date, if it is shown to be a win in
    real situations. I haven't found a regression in any non-micro benchmark so I
    doubt it will be a problem.

    Signed-off-by: Nick Piggin

    Nick Piggin