23 Mar, 2011

1 commit

  • Instead of always creating a huge (268K) deflate_workspace with the
    maximum compression parameters (windowBits=15, memLevel=8), allow the
    caller to obtain a smaller workspace by specifying smaller parameter
    values.

    For example, when capturing oops and panic reports to a medium with
    limited capacity, such as NVRAM, compression may be the only way to
    capture the whole report. In this case, a small workspace (24K works
    fine) is a win, whether you allocate the workspace when you need it (i.e.,
    during an oops or panic) or at boot time.

    I've verified that this patch works with all accepted values of windowBits
    (positive and negative), memLevel, and compression level.

    Signed-off-by: Jim Keniston
    Cc: Herbert Xu
    Cc: David Miller
    Cc: Chris Mason
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jim Keniston
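
    To give a feel for why smaller parameters shrink the workspace so
    much, the sketch below uses the memory estimate documented in zlib's
    zconf.h, (1 << (windowBits + 2)) + (1 << (memLevel + 9)). It is not the
    kernel's own sizing function, it ignores the fixed-size state
    structure, and the helper name deflate_mem_estimate is hypothetical.

```c
#include <stddef.h>

/*
 * Rough estimate of deflate's memory needs, following the formula
 * documented in zlib's zconf.h. The fixed-size state struct is not
 * counted, so a real workspace is somewhat larger than this value.
 * A negative windowBits (raw deflate) is sized by its magnitude.
 */
static size_t deflate_mem_estimate(int window_bits, int mem_level)
{
    if (window_bits < 0)
        window_bits = -window_bits;

    /* window buffers + hash/pending buffers */
    return ((size_t)1 << (window_bits + 2)) +
           ((size_t)1 << (mem_level + 9));
}
```

    Under this estimate the maximum parameters (15, 8) need 256K before
    the state structure is added, while a pair such as (12, 4) needs 24K.
    That pair is only an illustration; the commit message does not say
    which values were used to reach 24K.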
     

19 Mar, 2011

1 commit

  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (47 commits)
    doc: CONFIG_UNEVICTABLE_LRU doesn't exist anymore
    Update cpuset info & webiste for cgroups
    dcdbas: force SMI to happen when expected
    arch/arm/Kconfig: remove one to many l's in the word.
    asm-generic/user.h: Fix spelling in comment
    drm: fix printk typo 'sracth'
    Remove one to many n's in a word
    Documentation/filesystems/romfs.txt: fixing link to genromfs
    drivers:scsi Change printk typo initate -> initiate
    serial, pch uart: Remove duplicate inclusion of linux/pci.h header
    fs/eventpoll.c: fix spelling
    mm: Fix out-of-date comments which refers non-existent functions
    drm: Fix printk typo 'failled'
    coh901318.c: Change initate to initiate.
    mbox-db5500.c Change initate to initiate.
    edac: correct i82975x error-info reported
    edac: correct i82975x mci initialisation
    edac: correct commented info
    fs: update comments to point correct document
    target: remove duplicate include of target/target_core_device.h from drivers/target/target_core_hba.c
    ...

    Trivial conflict in fs/eventpoll.c (spelling vs addition)

    Linus Torvalds
     

17 Mar, 2011

1 commit

  • …s/security-testing-2.6

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/security-testing-2.6: (33 commits)
    AppArmor: kill unused macros in lsm.c
    AppArmor: cleanup generated files correctly
    KEYS: Add an iovec version of KEYCTL_INSTANTIATE
    KEYS: Add a new keyctl op to reject a key with a specified error code
    KEYS: Add a key type op to permit the key description to be vetted
    KEYS: Add an RCU payload dereference macro
    AppArmor: Cleanup make file to remove cruft and make it easier to read
    SELinux: implement the new sb_remount LSM hook
    LSM: Pass -o remount options to the LSM
    SELinux: Compute SID for the newly created socket
    SELinux: Socket retains creator role and MLS attribute
    SELinux: Auto-generate security_is_socket_class
    TOMOYO: Fix memory leak upon file open.
    Revert "selinux: simplify ioctl checking"
    selinux: drop unused packet flow permissions
    selinux: Fix packet forwarding checks on postrouting
    selinux: Fix wrong checks for selinux_policycap_netpeer
    selinux: Fix check for xfrm selinux context algorithm
    ima: remove unnecessary call to ima_must_measure
    IMA: remove IMA imbalance checking
    ...

    Linus Torvalds
     

16 Mar, 2011

2 commits

  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: (57 commits)
    tidy the trailing symlinks traversal up
    Turn resolution of trailing symlinks iterative everywhere
    simplify link_path_walk() tail
    Make trailing symlink resolution in path_lookupat() iterative
    update nd->inode in __do_follow_link() instead of after do_follow_link()
    pull handling of one pathname component into a helper
    fs: allow AT_EMPTY_PATH in linkat(), limit that to CAP_DAC_READ_SEARCH
    Allow passing O_PATH descriptors via SCM_RIGHTS datagrams
    readlinkat(), fchownat() and fstatat() with empty relative pathnames
    Allow O_PATH for symlinks
    New kind of open files - "location only".
    ext4: Copy fs UUID to superblock
    ext3: Copy fs UUID to superblock.
    vfs: Export file system uuid via /proc/<pid>/mountinfo
    unistd.h: Add new syscalls numbers to asm-generic
    x86: Add new syscalls for x86_64
    x86: Add new syscalls for x86_32
    fs: Remove i_nlink check from file system link callback
    fs: Don't allow to create hardlink for deleted file
    vfs: Add open by file handle support
    ...

    Linus Torvalds
     
  • James Morris
     

15 Mar, 2011

1 commit


14 Mar, 2011

2 commits


12 Mar, 2011

1 commit

  • Josef had changed shrink_delalloc to exit after three shrink
    attempts, which wasn't quite enough because new writers could
    race in and steal free space.

    But it also fixed deadlocks and stalls as we tried to recover
    delalloc reservations. The code was tweaked to loop 1024
    times, and would reset the counter any time a small amount
    of progress was made. This was too drastic, and with a
    lot of writers we can end up stuck in shrink_delalloc forever.

    The shrink_delalloc loop is fairly complex because the caller is looping
    too, and the caller will go ahead and force a transaction commit to make
    sure we reclaim space.

    This reworks things to exit shrink_delalloc when we've forced some
    writeback and the delalloc reservations have gone down. This means
    the writeback has not just started but has also finished at
    least some of the metadata changes required to reclaim delalloc
    space.

    If we've got this wrong, we're returning ENOSPC too early, which
    is a big improvement over the current behavior of hanging the machine.

    Test 224 in xfstests hammers on this nicely, and with 1000 writers
    trying to fill a 1GB drive we get our first ENOSPC at 93% full. The
    other writers are able to continue until we get 100%.

    This is a worst case test for btrfs because the 1000 writers are doing
    small IO, and the small FS size means we don't have a lot of room
    for metadata chunks.

    Signed-off-by: Chris Mason

    Chris Mason
     

11 Mar, 2011

2 commits

  • btrfs_link() will insert 3 items(inode ref, dir name item and dir index item)
    into the b+ tree and update 2 items(its inode, and parent's inode) in the b+
    tree. So we should reserve space for these 5 items, not 3 items.

    Reported-by: Tsutomu Itoh
    Signed-off-by: Miao Xie
    Signed-off-by: Chris Mason

    Miao Xie
     
  • The btrfs DIO code leaks dip structs when dip->csums allocation
    fails; bio->bi_end_io isn't set at the point where the free_ordered
    branch is consequently taken, thus bio_endio doesn't call the function
    which would free it in the normal case. Fix.

    Signed-off-by: Daniel J Blueman
    Acked-by: Miao Xie
    Signed-off-by: Chris Mason

    Daniel J Blueman
     

09 Mar, 2011

1 commit

  • The btrfs fiemap code was incorrectly returning duplicate or overlapping
    extents in some cases. cp was blindly trusting this result and we would
    end up with a destination file that was bigger than the original because
    some bytes were copied twice.

    The fix here adjusts our offsets to make sure we're always moving
    forward in the fiemap results.

    Signed-off-by: Chris Mason

    Chris Mason
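
    The "always moving forward" rule can be pictured with a small
    sketch. This is not the btrfs code; struct extent and
    trim_to_forward are hypothetical names, and the only assumption is
    that the caller remembers where the previously reported extent ended.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified extent record: a byte range in the file. */
struct extent {
    uint64_t start;
    uint64_t len;
};

/*
 * Trim *e so that it begins at or after last_end, the end offset of
 * the extent reported previously. Returns false when the extent lies
 * entirely behind last_end and should not be reported at all.
 */
static bool trim_to_forward(struct extent *e, uint64_t last_end)
{
    uint64_t end = e->start + e->len;

    if (end <= last_end)
        return false;           /* pure duplicate, skip it */

    if (e->start < last_end) {  /* partial overlap, keep only the tail */
        e->start = last_end;
        e->len = end - last_end;
    }
    return true;
}
```

    With a rule like this, a consumer such as cp never sees the same
    byte range twice, so it cannot copy those bytes twice.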
     

08 Mar, 2011

2 commits


07 Mar, 2011

1 commit

  • Commit 914ee295af418e936ec20a08c1663eaabe4cd07a fixed deadlocks in
    btrfs_file_write where we would catch page faults on pages we had
    locked.

    But, there were a few problems:

    1) The x86-32 iov_iter_copy_from_user_atomic code always fails to copy
    data when the amount to copy is more than 4K and the offset to start
    copying from is not page aligned. The result was btrfs_file_write
    looping forever retrying the iov_iter_copy_from_user_atomic call.

    We deal with this by changing btrfs_file_write to drop down to single
    page copies when iov_iter_copy_from_user_atomic starts returning failure.

    2) The btrfs_file_write code was leaking delalloc reservations when
    iov_iter_copy_from_user_atomic returned zero. The looping above would
    result in the entire filesystem running out of delalloc reservations and
    constantly trying to flush things to disk.

    3) btrfs_file_write will lock down page cache pages, make sure
    any writeback is finished, do the copy_from_user and then release them.
    Before the loop runs we check the first and last pages in the write to
    see if they are only being partially modified. If the start or end of
    the write isn't aligned, we make sure the corresponding pages are
    up to date so that we don't introduce garbage into the file.

    With the copy_from_user changes, we're allowing the VM to reclaim the
    pages after a partial update from copy_from_user, but we're not
    making sure the page cache page is up to date when we loop around to
    resume the write.

    We deal with this by pushing the up to date checks down into the page
    prep code. This fits better with how the rest of file_write works.

    Signed-off-by: Chris Mason
    Reported-by: Mitch Harder
    cc: stable@kernel.org

    Chris Mason
     

01 Mar, 2011

1 commit


26 Feb, 2011

1 commit

  • * git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable:
    Btrfs: fix fiemap bugs with delalloc
    Btrfs: set FMODE_EXCL in btrfs_device->mode
    Btrfs: make btrfs_rm_device() fail gracefully
    Btrfs: Avoid accessing unmapped kernel address
    Btrfs: Fix BTRFS_IOC_SUBVOL_SETFLAGS ioctl
    Btrfs: allow balance to explicitly allocate chunks as it relocates
    Btrfs: put ENOSPC debugging under a mount option

    Linus Torvalds
     

24 Feb, 2011

1 commit

  • The Btrfs fiemap code wasn't properly returning delalloc extents,
    so applications that trust fiemap to decide if there are holes in the
    file see holes instead of delalloc.

    This reworks the btrfs fiemap code, adding a get_extent helper that
    searches for delalloc ranges and also adding a helper for extent_fiemap
    that skips past holes in the file.

    Signed-off-by: Chris Mason

    Chris Mason
     

17 Feb, 2011

6 commits

  • This fixes a bug introduced in d4d77629, where the device added online
    (and therefore initialized via btrfs_init_new_device()) would be left
    with the positive bdev->bd_holders after unmount. Since d4d77629 we no
    longer OR FMODE_EXCL explicitly on blkdev_put(), set it in
    btrfs_device->mode.

    Signed-off-by: Ilya Dryomov
    Signed-off-by: Chris Mason

    Ilya Dryomov
     
  • If the shrinking done as part of online device removal fails, add that
    device back to the allocation list and increment the rw_devices counter.
    This fixes two bugs:

    1) we could have a perfectly good device out of alloc list for no good
    reason;

    2) in a btrfs filesystem consisting of two devices, failure in btrfs_rm_device()
    could lead to a situation where it was impossible to remove any of the
    devices because of the "unable to remove the only writeable device"
    error.

    Signed-off-by: Ilya Dryomov
    Signed-off-by: Chris Mason

    Ilya Dryomov
     
  • When decompressing a chunk of data, we'll copy the data out to
    a working buffer if the data is stored in more than one page,
    otherwise we'll use the mapped page directly to avoid memory
    copy.

    In the latter case, we'll end up accessing the kernel address
    after we've unmapped the page in a corner case.

    Reported-by: Juan Francisco Cantero Hurtado
    Signed-off-by: Li Zefan
    Signed-off-by: Chris Mason

    Li Zefan
     
  • - Check user-specified flags correctly
    - Check the inode ownership
    - Search root item in root tree but not fs tree

    Reported-by: Dan Rosenberg
    Signed-off-by: Li Zefan
    Signed-off-by: Chris Mason

    Li Zefan
     
  • Btrfs device shrinking and balancing ends up reallocating all the blocks
    in order to allow COW to move them to new destinations. It is somewhat
    awkward in terms of ENOSPC because most of the enospc code is built
    around the idea that some operation on a reference counted tree triggers
    allocations in the non-reference counted trees.

    This commit changes the balancing code to deal with enospc by trying to
    allocate a new chunk. If that allocation succeeds, we go ahead and
    retry whatever failed due to enospc.

    Signed-off-by: Chris Mason

    Chris Mason
     
  • ENOSPC in btrfs is getting to the point where the extra debugging isn't
    required. I've put it under mount -o enospc_debug just in case someone
    is having difficult problems.

    Signed-off-by: Chris Mason

    Chris Mason
     

16 Feb, 2011

1 commit


15 Feb, 2011

6 commits

  • Add checks on the return value of alloc_extent_map() in several places.
    In addition, alloc_extent_map() returns only a valid address or NULL,
    so the IS_ERR() checks are unnecessary and are removed.

    Signed-off-by: Tsutomu Itoh
    Signed-off-by: Chris Mason

    Tsutomu Itoh
     
  • Memory allocated by calling kstrdup() should be freed.

    Signed-off-by: Ilya Dryomov
    Signed-off-by: Chris Mason

    Ilya Dryomov
     
  • Commit bf5fc093c5b625e4259203f1cee7ca73488a5620 refactored
    btrfs_ioctl_space_info() and introduced several security issues.

    space_args.space_slots is an unsigned 64-bit type controlled by a
    possibly unprivileged caller. The comparison as a signed int type
    allows providing values that are treated as negative and cause the
    subsequent allocation size calculation to wrap, or be truncated to 0.
    By providing a size that's truncated to 0, kmalloc() will return
    ZERO_SIZE_PTR. It's also possible to provide a value smaller than the
    slot count. The subsequent loop ignores the allocation size when
    copying data in, resulting in a heap overflow or write to ZERO_SIZE_PTR.

    The fix changes the slot count type and comparison typecast to u64,
    which prevents truncation or signedness errors, and also ensures that we
    don't copy more data than we've allocated in the subsequent loop. Note
    that zero-size allocations are no longer possible since there is already
    an explicit check for space_args.space_slots being 0 and truncation of
    this value is no longer an issue.

    Signed-off-by: Dan Rosenberg
    Signed-off-by: Josef Bacik
    Reviewed-by: Josef Bacik
    Signed-off-by: Chris Mason

    Dan Rosenberg
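
    The truncation described above is easy to reproduce outside the
    kernel. The sketch below is not the btrfs ioctl code: struct
    space_entry is a hypothetical 24-byte stand-in for one slot, and both
    helper names are made up for illustration.

```c
#include <stdint.h>

/* Hypothetical stand-in for one slot: three 64-bit fields, 24 bytes. */
struct space_entry {
    uint64_t flags;
    uint64_t total_bytes;
    uint64_t used_bytes;
};

/*
 * Buggy pattern: the allocation size is computed in 64 bits and then
 * narrowed to 32 bits, so a large caller-supplied slot count can
 * produce a tiny (even zero) allocation size.
 */
static uint32_t truncated_alloc_size(uint64_t requested_slots)
{
    return (uint32_t)(requested_slots * sizeof(struct space_entry));
}

/*
 * Safer pattern: keep the count in 64 bits and clamp it to the number
 * of slots that actually exist before sizing or filling the buffer.
 */
static uint64_t clamp_slots(uint64_t requested_slots,
                            uint64_t available_slots)
{
    return requested_slots < available_slots ? requested_slots
                                             : available_slots;
}
```

    For example, a request for 1 << 29 slots gives 24 * 2^29 = 3 * 2^32
    bytes, which narrows to 0 in 32 bits, while the clamped count stays
    bounded by what is available.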
     
  • Mark the cloned backref_node as checked in clone_backref_node()

    Signed-off-by: Yan, Zheng
    Signed-off-by: Chris Mason

    Yan, Zheng
     
  • Btrfs tracks uptodate state in an rbtree as well as in the
    page bits. This is supposed to enable us to use block sizes other than
    the page size, but there are a few parts still missing before that
    completely works.

    But, our readpage routine trusts this additional range based tracking
    of uptodateness, much in the same way the buffer head up to date bits
    are trusted for the other filesystems.

    The problem is that sometimes we need to allocate memory in order to
    split records in the rbtree, even when we are just clearing bits. This
    can be difficult when our clearing function is called with GFP_ATOMIC, which
    can happen in the releasepage path.

    So, what happens today looks like this:

    releasepage called with GFP_ATOMIC
    btrfs_releasepage calls clear_extent_bit
    clear_extent_bit fails to allocate ram, leaving the up to date bit set
    btrfs_releasepage returns success

    The end result is the page being gone, but btrfs thinking the range is
    up to date. Later on if someone tries to read that same page, the
    btrfs readpage code will return immediately thinking the page is already
    up to date.

    This commit fixes things to fail the releasepage when we can't clear the
    extent state bits. It covers both data pages and metadata tree blocks.

    Signed-off-by: Chris Mason

    Chris Mason
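
    The change in behavior can be modeled in a few lines. This is a toy
    model, not the btrfs implementation: struct range_state,
    try_clear_uptodate and release_page_sketch are hypothetical, and the
    can_alloc flag simply simulates whether the memory needed to split an
    rbtree record is available.

```c
#include <stdbool.h>

/* Toy model of the range-based up-to-date tracking for one page. */
struct range_state {
    bool uptodate;
};

/*
 * Clearing the bit may need an allocation (to split a record). When
 * that allocation is not possible, the bit stays set and the caller
 * is told about the failure.
 */
static bool try_clear_uptodate(struct range_state *s, bool can_alloc)
{
    if (!can_alloc)
        return false;
    s->uptodate = false;
    return true;
}

/*
 * Release succeeds only if the tracking state was really cleared, so
 * a page is never dropped while its range still claims to be current.
 */
static bool release_page_sketch(struct range_state *s, bool can_alloc)
{
    return try_clear_uptodate(s, can_alloc);
}
```

    The old flow corresponds to returning success unconditionally, which
    is what let a released page keep a stale up-to-date range.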
     
  • There is a race where btrfs_releasepage can drop the
    page->private contents just as alloc_extent_buffer is setting
    up pages for metadata. Because of how the Btrfs page flags work,
    this results in us skipping the crc on the page during IO.

    This patch solves the race by waiting until after the extent buffer
    is inserted into the radix tree before it sets page private.

    Signed-off-by: Chris Mason

    Chris Mason
     

08 Feb, 2011

2 commits

  • * git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable: (33 commits)
    Btrfs: Fix page count calculation
    btrfs: Drop __exit attribute on btrfs_exit_compress
    btrfs: cleanup error handling in btrfs_unlink_inode()
    Btrfs: exclude super blocks when we read in block groups
    Btrfs: make sure search_bitmap finds something in remove_from_bitmap
    btrfs: fix return value check of btrfs_start_transaction()
    btrfs: checking NULL or not in some functions
    Btrfs: avoid uninit variable warnings in ordered-data.c
    Btrfs: catch errors from btrfs_sync_log
    Btrfs: make shrink_delalloc a little friendlier
    Btrfs: handle no memory properly in prepare_pages
    Btrfs: do error checking in btrfs_del_csums
    Btrfs: use the global block reserve if we cannot reserve space
    Btrfs: do not release more reserved bytes to the global_block_rsv than we need
    Btrfs: fix check_path_shared so it returns the right value
    btrfs: check return value of btrfs_start_ioctl_transaction() properly
    btrfs: fix return value check of btrfs_join_transaction()
    fs/btrfs/inode.c: Add missing IS_ERR test
    btrfs: fix missing break in switch phrase
    btrfs: fix several uncheck memory allocations
    ...

    Linus Torvalds
     
  • Take the offset of the start position into account when calculating the page count.

    Signed-off-by: Yan, Zheng
    Signed-off-by: Chris Mason

    Yan, Zheng
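
    A minimal sketch of the arithmetic, assuming a 4096-byte page and a
    hypothetical helper name pages_spanned; the actual btrfs expression
    may differ in form.

```c
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096u  /* assumed page size for this sketch */

/*
 * Number of pages touched by a write of len bytes starting at file
 * position pos. The offset of pos within its page must be added
 * before rounding up, otherwise an unaligned write that crosses a
 * page boundary is undercounted.
 */
static size_t pages_spanned(size_t pos, size_t len)
{
    size_t offset = pos % SKETCH_PAGE_SIZE;

    if (len == 0)
        return 0;
    return (offset + len + SKETCH_PAGE_SIZE - 1) / SKETCH_PAGE_SIZE;
}
```

    For instance, a 2-byte write at position 4095 touches two pages,
    whereas rounding up the length alone would report one.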
     

06 Feb, 2011

4 commits

  • As this function is called in some error paths while not
    removing the module, the __exit attribute prevents the kernel
    image from linking when btrfs is compiled in statically.

    Signed-off-by: Alexey Charkov
    Signed-off-by: Chris Mason

    Alexey Charkov
     
  • When btrfs_alloc_path() fails, btrfs_free_path() need not be called.
    Therefore, change the error branch so that it jumps past that call.

    Signed-off-by: Tsutomu Itoh
    Signed-off-by: Chris Mason

    Tsutomu Itoh
     
  • This has been resulting in a BUG_ON(ret) after btrfs_reserve_extent in
    btrfs_cow_file_range. The reason is we don't actually calculate the bytes_super
    for a block group until we go to cache it, which means that the space_info can
    hand out reservations for space that it doesn't actually have, and we can run
    out of data space. This is also a problem if you are using space caching since
    we don't ever calculate bytes_super for the block groups. So instead, every time
    we read a block group, call exclude_super_stripes, which calculates the
    bytes_super for the block group so it can be left out of the space_info. Then
    whenever caching completes we just call free_excluded_extents so that the super
    excluded extents are freed up. Also if we are unmounting and we hit any block
    groups that haven't been cached we still need to call free_excluded_extents to
    make sure things are cleaned up properly. Thanks,

    Reported-by: Arne Jansen
    Signed-off-by: Josef Bacik
    Signed-off-by: Chris Mason

    Josef Bacik
     
  • When we're cleaning up the tree log we need to be able to remove free space from
    the block group. The problem is if that free space spans bitmaps we would not
    find the space since we're looking for too many bytes. So make sure the amount
    of bytes we search for is limited to either the number of bytes we want, or the
    number of bytes left in the bitmap. This was tested by a user who was hitting
    the BUG() after search_bitmap. With this patch he can now mount his fs.
    Thanks,

    Signed-off-by: Josef Bacik
    Signed-off-by: Chris Mason

    Josef Bacik
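
    The limit described above amounts to a clamp. In this sketch the
    function name and parameters are hypothetical, and offset is assumed
    to lie inside the bitmap's range.

```c
#include <stdint.h>

/*
 * How many bytes to look for in one bitmap: either the amount still
 * wanted, or whatever remains between offset and the end of this
 * bitmap, whichever is smaller. Assumes
 * bitmap_start <= offset < bitmap_start + bitmap_bytes.
 */
static uint64_t bitmap_search_len(uint64_t bitmap_start,
                                  uint64_t bitmap_bytes,
                                  uint64_t offset,
                                  uint64_t wanted)
{
    uint64_t left = bitmap_start + bitmap_bytes - offset;

    return wanted < left ? wanted : left;
}
```

    A request that spans two bitmaps is then satisfied in pieces: the
    first search covers only the tail of the current bitmap, and the
    remainder is looked up in the next one.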
     

02 Feb, 2011

1 commit

  • SELinux would like to implement a new labeling behavior of newly created
    inodes. We currently label new inodes based on the parent and the creating
    process. This new behavior would also take into account the name of the
    new object when deciding the new label. This is not the (supposed) full path,
    just the last component of the path.

    This is very useful because creating /etc/shadow is different from creating
    /etc/passwd, but the kernel hooks are unable to differentiate these
    operations. We currently require that userspace realize it is doing some
    difficult operation like that, and then userspace jumps through SELinux hoops
    to get things set up correctly. This patch does not implement new
    behavior, that is obviously contained in a separate SELinux patch, but it
    does pass the needed name down to the correct LSM hook. If no such name
    exists it is fine to pass NULL.

    Signed-off-by: Eric Paris

    Eric Paris
     

01 Feb, 2011

2 commits