15 Jun, 2013

8 commits

  • Pull VFS fixes from Al Viro:
    "Several fixes + obvious cleanup (you've missed a couple of open-coded
    can_lookup() back then)"

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
    snd_pcm_link(): fix a leak...
    use can_lookup() instead of direct checks of ->i_op->lookup
    move exit_task_namespaces() outside of exit_notify()
    fput: task_work_add() can fail if the caller has passed exit_task_work()
    ncpfs: fix rmdir returns Device or resource busy

    Linus Torvalds
     
  • Pull xfs fixes from Ben Myers:
    - Remove noisy warnings about experimental support which spam the logs
    - Add padding to align directory and attr structures correctly
    - Set block number on child buffer on a root btree split
    - Disable verifiers during log recovery for non-CRC filesystems

    * tag 'for-linus-v3.10-rc6' of git://oss.sgi.com/xfs/xfs:
    xfs: don't shutdown log recovery on validation errors
    xfs: ensure btree root split sets blkno correctly
    xfs: fix implicit padding in directory and attr CRC formats
    xfs: don't emit v5 superblock warnings on write

    Linus Torvalds
     
  • a couple of places got missed back when Linus has introduced that one...

    Signed-off-by: Al Viro

    Al Viro
     
  • fput() assumes that it can't be called after exit_task_work() but
    this is not true, for example free_ipc_ns()->shm_destroy() can do
    this. In this case fput() silently leaks the file.

    Change it to fall back to delayed_fput_work if task_work_add() fails.
    The patch looks complicated but it is not; it changes the code from

        if (PF_KTHREAD) {
            schedule_work(...);
            return;
        }
        task_work_add(...)

    to

        if (!PF_KTHREAD) {
            if (!task_work_add(...))
                return;
            /* fallback */
        }
        schedule_work(...);

    As for shm_destroy() in particular, we could make another fix but I
    think this change makes sense anyway. There could be another similar
    user, it is not safe to assume that task_work_add() can't fail.

    Reported-by: Andrey Vagin
    Signed-off-by: Oleg Nesterov
    Signed-off-by: Al Viro

    Oleg Nesterov
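
    The control flow described in this commit can be modelled in plain
    userspace C. This is only a sketch: struct task, model_task_work_add(),
    model_schedule_work() and model_fput_final() are hypothetical stand-ins
    invented for illustration, not the kernel's real definitions.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the bits of task state that matter here. */
struct task {
	bool is_kthread;       /* models the PF_KTHREAD flag */
	bool task_work_open;   /* false once exit_task_work() has run */
	int  queued_task_work; /* work items accepted by the task */
};

/* Models how many times delayed_fput_work was scheduled. */
int delayed_work_scheduled;

/* Returns 0 on success, nonzero if the task can no longer accept work. */
int model_task_work_add(struct task *t)
{
	if (!t->task_work_open)
		return -1;
	t->queued_task_work++;
	return 0;
}

void model_schedule_work(void)
{
	delayed_work_scheduled++;
}

/* New flow: try task work first, fall back to the workqueue on failure. */
void model_fput_final(struct task *t)
{
	if (!t->is_kthread) {
		if (!model_task_work_add(t))
			return;
		/* fallback: the task has already passed exit_task_work() */
	}
	model_schedule_work();
}
```

    In this model a task that has passed exit_task_work() no longer loses
    the final put: the request ends up on the delayed work path instead.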
     
  • Unfortunately, we cannot guarantee that items logged multiple times
    and replayed by log recovery do not take objects back in time. When
    they are taken back in time, they go into an intermediate state which
    is corrupt, and hence verification that occurs on this intermediate
    state causes log recovery to abort with a corruption shutdown.

    Instead of causing a shutdown and unmountable filesystem, don't
    verify post-recovery items before they are written to disk. This is
    less than optimal, but there is no way to detect this issue for
    non-CRC filesystems. If log recovery successfully completes, this
    will be undone and the object will be consistent by subsequent
    transactions that are replayed, so in most cases we don't need to
    take drastic action.

    For CRC enabled filesystems, leave the verifiers in place - we need
    to call them to recalculate the CRCs on the objects anyway. This
    recovery problem can be solved for such filesystems - we have a LSN
    stamped in all metadata at writeback time that we can use to determine
    whether the item should be replayed or not. This is a separate piece
    of work, so is not addressed by this patch.

    Signed-off-by: Dave Chinner
    Reviewed-by: Ben Myers
    Signed-off-by: Ben Myers

    (cherry picked from commit 9222a9cf86c0d64ffbedf567412b55da18763aa3)

    Dave Chinner
     
  • For CRC enabled filesystems, the BMBT is rooted in an inode, so it
    passes through a different code path on root splits than the
    freespace and inode btrees. This is much less traversed by xfstests
    than the other trees. When testing on a 1k block size filesystem,
    I've been seeing ASSERT failures in generic/234 like:

    XFS: Assertion failed: cur->bc_btnum != XFS_BTNUM_BMAP || cur->bc_private.b.allocated == 0, file: fs/xfs/xfs_btree.c, line: 317

    which are generally preceded by a lblock check failure. I noticed
    this in the bmbt stats:

    $ pminfo -f xfs.btree.block_map

    xfs.btree.block_map.lookup
    value 39135

    xfs.btree.block_map.compare
    value 268432

    xfs.btree.block_map.insrec
    value 15786

    xfs.btree.block_map.delrec
    value 13884

    xfs.btree.block_map.newroot
    value 2

    xfs.btree.block_map.killroot
    value 0
    .....

    Very little coverage of root splits and merges. Indeed, on a 4k
    filesystem, block_map.newroot and block_map.killroot are both zero.
    i.e. the code is not exercised at all, and it's the only generic
    btree infrastructure operation that is not exercised by a default run
    of xfstests.

    Turns out that on a 1k filesystem, generic/234 accounts for one of
    those two root splits, and that is somewhat of a smoking gun. In
    fact, it's the same problem we saw in the directory/attr code where
    headers are memcpy()d from one block to another without updating the
    self describing metadata.

    Simple fix - when copying the header out of the root block, make
    sure the block number is updated correctly.

    Signed-off-by: Dave Chinner
    Reviewed-by: Ben Myers
    Signed-off-by: Ben Myers

    (cherry picked from commit ade1335afef556df6538eb02e8c0dc91fbd9cc37)

    Dave Chinner
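
    The class of bug described here, a header copied with memcpy() that
    still carries the source block's address, can be shown with a small
    userspace model. The struct layout and the function names below are
    assumptions made for illustration; they are not the XFS definitions.

```c
#include <stdint.h>
#include <string.h>

/* Simplified, hypothetical self-describing block header. */
struct blk_hdr {
	uint32_t magic;
	uint16_t level;
	uint16_t numrecs;
	uint64_t blkno;   /* the block's own address, checked by verifiers */
};

/* Copy the root header into a new child block located at child_blkno. */
void copy_root_to_child(struct blk_hdr *child, const struct blk_hdr *root,
			uint64_t child_blkno)
{
	memcpy(child, root, sizeof(*child));
	/* Without this update the child still claims the root's address. */
	child->blkno = child_blkno;
}

/* In this model a verifier only checks that the block describes itself. */
int hdr_verify(const struct blk_hdr *hdr, uint64_t read_from)
{
	return hdr->blkno == read_from;
}
```

    A child block written by copy_root_to_child() passes hdr_verify() at
    its own address; a plain memcpy() alone would not.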
     
  • Michael L. Semon has been testing CRC patches on a 32 bit system and
    been seeing assert failures in the directory code from xfs/080.
    Thanks to Michael's heroic efforts with printk debugging, we found
    that the problem was that the last free space being left in the
    directory structure was too small to fit an unused tag structure and
    it was being corrupted and attempting to log a region out of bounds.
    Hence the assert failure looked something like:

    .....
    #5 calling xfs_dir2_data_log_unused() 36 32
    #1 4092 4095 4096
    #2 8182 8183 4096
    XFS: Assertion failed: first < BBTOB(bp->b_length), file: fs/xfs/xfs_trans_buf.c, line: 568

    Where #1 showed the first region of the dup being logged (i.e. the
    last 4 bytes of a directory buffer) and #2 shows the corrupt values
    being calculated from the length of the dup entry which overflowed
    the size of the buffer.

    It turns out that the problem was not in the logging code, nor in
    the freespace handling code. It is an initial condition bug that
    only shows up on 32 bit systems. When a new buffer is initialised,
    here's the freespace that is set up:

    [ 172.316249] calling xfs_dir2_leaf_addname() from xfs_dir_createname()
    [ 172.316346] #9 calling xfs_dir2_data_log_unused()
    [ 172.316351] #1 calling xfs_trans_log_buf() 60 63 4096
    [ 172.316353] #2 calling xfs_trans_log_buf() 4094 4095 4096

    Note the offset of the first region being logged? It's 60 bytes into
    the buffer. Once I saw that, I pretty much knew that the bug was
    going to be caused by this.

    Essentially, all directory entries are rounded to 8 bytes in length,
    and all entries start with an 8 byte alignment. This means that we
    can decode inplace as variables are naturally aligned. With the
    directory data supposedly starting on an 8 byte boundary, and all
    entries padded to 8 bytes, the minimum freespace in a directory
    block is supposed to be 8 bytes, which is large enough to fit an
    unused data entry structure (6 bytes in size). The fact we only have
    4 bytes of free space indicates a directory data block alignment
    problem.

    And what do you know - there's an implicit hole in the directory
    data block header for the CRC format, which means the header is 60
    bytes on 32 bit intel systems and 64 bytes on 64 bit systems. Needs
    padding. And while looking at the structures, I found the same
    problem in the attr leaf header. Fix them both.

    Note that this only affects 32 bit systems with CRCs enabled.
    Everything else is just fine. Note that CRC enabled filesystems created
    before this fix on such systems will not be readable with this fix
    applied.

    Reported-by: Michael L. Semon
    Debugged-by: Michael L. Semon
    Signed-off-by: Dave Chinner
    Reviewed-by: Ben Myers
    Signed-off-by: Ben Myers

    (cherry picked from commit 8a1fd2950e1fe267e11fc8c85dcaa6b023b51b60)

    Dave Chinner
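
    The 60 versus 64 byte difference can be reproduced on a 64 bit host
    with GCC or Clang by giving the 64 bit fields 4 byte alignment, which
    is how 32 bit x86 lays them out. The structures below are a simplified
    model written for this illustration; the field names and the helper
    tail_slack() are assumptions, not the real XFS on-disk definitions.

```c
#include <stdint.h>

/* A 64-bit integer with 4-byte alignment, modelling the 32-bit x86 ABI. */
typedef uint64_t u64_align4 __attribute__((aligned(4)));

/* Hypothetical common header as a 32-bit system would lay it out. */
struct blk_hdr32 {
	uint32_t   magic;
	uint32_t   crc;
	u64_align4 blkno;
	u64_align4 lsn;
	uint8_t    uuid[16];
	u64_align4 owner;
};

/* The same fields with natural 8-byte alignment (64-bit systems). */
struct blk_hdr64 {
	uint32_t magic;
	uint32_t crc;
	uint64_t blkno;
	uint64_t lsn;
	uint8_t  uuid[16];
	uint64_t owner;
};

struct free_ent {
	uint16_t offset;
	uint16_t length;
};

/* Without explicit padding the two ABIs disagree on the total size. */
struct data_hdr32 { struct blk_hdr32 hdr; struct free_ent best_free[3]; };
struct data_hdr64 { struct blk_hdr64 hdr; struct free_ent best_free[3]; };

/* With an explicit pad field both layouts come out the same. */
struct data_hdr32_fixed {
	struct blk_hdr32 hdr;
	struct free_ent  best_free[3];
	uint32_t         pad;
};
struct data_hdr64_fixed {
	struct blk_hdr64 hdr;
	struct free_ent  best_free[3];
	uint32_t         pad;
};

/* Bytes left over at the end of a block when entries are 8-byte units. */
unsigned tail_slack(unsigned block_size, unsigned hdr_size)
{
	return (block_size - hdr_size) % 8;
}
```

    In this model a 60 byte header in a 4096 byte block leaves a 4 byte
    tail that cannot hold a 6 byte unused entry, which matches the
    symptom above; with the padded header the tail is zero.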
     
  • We write the superblock every 30s or so which results in the
    verifier being called. Right now that results in this output
    every 30s:

    XFS (vda): Version 5 superblock detected. This kernel has EXPERIMENTAL support enabled!
    Use of these features in this kernel is at your own risk!

    And spamming the logs.

    We don't need to check for whether we support v5 superblocks or
    whether there are feature bits we don't support set as these are
    only relevant when we first mount the filesystem. i.e. on superblock
    read. Hence for the write verification we can just skip all the
    checks (and hence verbose output) altogether.

    Signed-off-by: Dave Chinner
    Reviewed-by: Brian Foster
    Signed-off-by: Ben Myers

    (cherry picked from commit 34510185abeaa5be9b178a41c0a03d30aec3db7e)

    Dave Chinner
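
    One way to picture the change is a single validation routine with a
    flag that is set on read and cleared on write. The sketch below is a
    userspace model under that assumption; struct sb_model, the
    check_version parameter and the warning counter are illustrative
    names, not the actual XFS code.

```c
#include <stdbool.h>
#include <stdint.h>

#define SB_MODEL_MAGIC 0x58465342u /* the bytes "XFSB" */

/* Hypothetical, heavily reduced superblock. */
struct sb_model {
	uint32_t magic;
	uint16_t version;          /* 5 means a v5 superblock */
	uint32_t unknown_features; /* feature bits this kernel does not know */
};

/* Counts how often the EXPERIMENTAL warning would have been printed. */
int warnings_emitted;

/* check_version is true on read (mount time) and false on write. */
int sb_model_verify(const struct sb_model *sb, bool check_version)
{
	if (sb->magic != SB_MODEL_MAGIC)
		return -1;
	if (!check_version)
		return 0; /* write path: skip the mount-time-only checks */
	if (sb->version == 5)
		warnings_emitted++;
	if (sb->unknown_features)
		return -1;
	return 0;
}
```

    With this split, the periodic superblock write still checks the
    basics but no longer repeats the warning in the log.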
     

14 Jun, 2013

1 commit


13 Jun, 2013

6 commits

  • Merge misc fixes from Andrew Morton:
    "Bunch of fixes and one little addition to math64.h"

    * emailed patches from Andrew Morton: (27 commits)
    include/linux/math64.h: add div64_ul()
    mm: memcontrol: fix lockless reclaim hierarchy iterator
    frontswap: fix incorrect zeroing and allocation size for frontswap_map
    kernel/audit_tree.c:audit_add_tree_rule(): protect `rule' from kill_rules()
    mm: migration: add migrate_entry_wait_huge()
    ocfs2: add missing lockres put in dlm_mig_lockres_handler
    mm/page_alloc.c: fix watermark check in __zone_watermark_ok()
    drivers/misc/sgi-gru/grufile.c: fix info leak in gru_get_config_info()
    aio: fix io_destroy() regression by using call_rcu()
    rtc-at91rm9200: use shadow IMR on at91sam9x5
    rtc-at91rm9200: add shadow interrupt mask
    rtc-at91rm9200: refactor interrupt-register handling
    rtc-at91rm9200: add configuration support
    rtc-at91rm9200: add match-table compile guard
    fs/ocfs2/namei.c: remove unecessary ERROR when removing non-empty directory
    swap: avoid read_swap_cache_async() race to deadlock while waiting on discard I/O completion
    drivers/rtc/rtc-twl.c: fix missing device_init_wakeup() when booted with device tree
    cciss: fix broken mutex usage in ioctl
    audit: wait_for_auditd() should use TASK_UNINTERRUPTIBLE
    drivers/rtc/rtc-cmos.c: fix accidentally enabling rtc channel
    ...

    Linus Torvalds
     
  • dlm_mig_lockres_handler() is missing a dlm_lockres_put() on an error path.

    Signed-off-by: joyce
    Reviewed-by: shencanquan
    Cc: Mark Fasheh
    Cc: Joel Becker
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Xue jiufei
     
  • There was a regression introduced by 36f5588905c1 ("aio: refcounting
    cleanup"), reported by Jens Axboe - the refcounting cleanup switched to
    using RCU in the shutdown path, but the synchronize_rcu() was done in
    the context of the io_destroy() syscall greatly increasing the time it
    could block.

    This patch switches it to call_rcu() and makes shutdown asynchronous
    (more asynchronous than it was originally; before the refcount changes
    io_destroy() would still wait on pending kiocbs).

    Note that there's a global quota on the max outstanding kiocbs, and that
    quota must be manipulated synchronously; otherwise io_setup() could
    return -EAGAIN when there isn't quota available, and userspace won't
    have any way of waiting until shutdown of the old kioctxs has finished
    (besides busy looping).

    So we release our quota before kioctx shutdown has finished, which
    should be fine since the quota never corresponded to anything real
    anyways.

    Signed-off-by: Kent Overstreet
    Cc: Zach Brown
    Cc: Felipe Balbi
    Cc: Greg Kroah-Hartman
    Cc: Mark Fasheh
    Cc: Joel Becker
    Cc: Rusty Russell
    Reported-by: Jens Axboe
    Tested-by: Jens Axboe
    Cc: Asai Thambi S P
    Cc: Selvan Mani
    Cc: Sam Bradshaw
    Cc: Jeff Moyer
    Cc: Al Viro
    Signed-off-by: Benjamin LaHaise
    Tested-by: Benjamin LaHaise
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kent Overstreet
     
  • While removing a non-empty directory, the kernel dumps a message:

    (rmdir,21743,1):ocfs2_unlink:953 ERROR: status = -39

    Suppress the error message from being printed in the dmesg so users
    don't panic.

    Signed-off-by: Goldwyn Rodrigues
    Cc: Mark Fasheh
    Cc: Joel Becker
    Acked-by: Sunil Mushran
    Reviewed-by: Jie Liu
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Goldwyn Rodrigues
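
    On Linux, status -39 is -ENOTEMPTY, the expected result of rmdir on a
    non-empty directory. A minimal sketch of the filtering idea follows;
    model_log_errno() and model_unlink_finish() are hypothetical names
    used only for this illustration.

```c
#include <errno.h>
#include <stdio.h>

/* Counts messages that would have reached the kernel log. */
int errors_logged;

void model_log_errno(int status)
{
	fprintf(stderr, "ERROR: status = %d\n", status);
	errors_logged++;
}

/* Log unexpected failures only; ENOTEMPTY is a normal outcome of rmdir. */
int model_unlink_finish(int status)
{
	if (status && status != -ENOTEMPTY)
		model_log_errno(status);
	return status;
}
```

    The caller still receives the error code; only the log message is
    suppressed.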
     
  • If an error occurs, for example an EIO in __ocfs2_prepare_orphan_dir,
    ocfs2_prep_new_orphaned_file will release the inode_ac, then when the
    caller of ocfs2_prep_new_orphaned_file gets a 0 return, it will refer to
    a NULL ocfs2_alloc_context struct in the following functions. A kernel
    panic happens.

    Signed-off-by: "Xiaowei.Hu"
    Reviewed-by: shencanquan
    Acked-by: Sunil Mushran
    Cc: Joe Jin
    Cc: Mark Fasheh
    Cc: Joel Becker
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Xiaowei.Hu
     
  • The dmesg_restrict sysctl currently covers the syslog method for accessing
    dmesg, however /dev/kmsg isn't covered by the same protections. Most
    people haven't noticed because util-linux dmesg(1) defaults to using the
    syslog method for access in older versions. Newer versions of util-linux
    dmesg(1) default to reading directly from /dev/kmsg.

    To fix /dev/kmsg, let's compare the existing interfaces and what they
    allow:

    - /proc/kmsg allows:
      - open (SYSLOG_ACTION_OPEN) if CAP_SYSLOG since it uses a destructive
        single-reader interface (SYSLOG_ACTION_READ).
      - everything, after an open.

    - syslog syscall allows:
      - anything, if CAP_SYSLOG.
      - SYSLOG_ACTION_READ_ALL and SYSLOG_ACTION_SIZE_BUFFER, if
        dmesg_restrict==0.
      - nothing else (EPERM).

    The use-cases were:
    - dmesg(1) needs to do non-destructive SYSLOG_ACTION_READ_ALLs.
    - sysklog(1) needs to open /proc/kmsg, drop privs, and still issue the
      destructive SYSLOG_ACTION_READs.

    AIUI, dmesg(1) is moving to /dev/kmsg, and systemd-journald doesn't
    clear the ring buffer.

    Based on the comments in devkmsg_llseek, it sounds like actions besides
    reading aren't going to be supported by /dev/kmsg (i.e.
    SYSLOG_ACTION_CLEAR), so we have a strict subset of the non-destructive
    syslog syscall actions.

    To this end, move the check as Josh had done, but also rename the
    constants to reflect their new uses (SYSLOG_FROM_CALL becomes
    SYSLOG_FROM_READER, and SYSLOG_FROM_FILE becomes SYSLOG_FROM_PROC).
    SYSLOG_FROM_READER allows non-destructive actions, and SYSLOG_FROM_PROC
    allows destructive actions after a capabilities-constrained
    SYSLOG_ACTION_OPEN check.

    - /dev/kmsg allows:
      - open if CAP_SYSLOG or dmesg_restrict==0
      - reading/polling, after open

    Addresses https://bugzilla.redhat.com/show_bug.cgi?id=903192

    [akpm@linux-foundation.org: use pr_warn_once()]
    Signed-off-by: Kees Cook
    Reported-by: Christian Kujau
    Tested-by: Josh Boyer
    Cc: Kay Sievers
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kees Cook
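
    The permission rules listed in this commit can be written down as one
    small decision function. The following is a userspace model of those
    rules only; the enum values, struct caller and both function names are
    made up for the sketch and do not mirror the kernel's identifiers.

```c
#include <errno.h>
#include <stdbool.h>

enum action { ACT_OPEN, ACT_READ, ACT_READ_ALL, ACT_CLEAR, ACT_SIZE_BUFFER };
enum source { FROM_READER, FROM_PROC };

/* Hypothetical caller credentials. */
struct caller {
	bool cap_syslog;
};

int model_check_syslog_permissions(enum action act, enum source from,
				   bool dmesg_restrict,
				   const struct caller *c)
{
	/* /proc/kmsg: only the open is checked, later actions are allowed. */
	if (from == FROM_PROC && act != ACT_OPEN)
		return 0;
	/* A privileged caller may do anything. */
	if (c->cap_syslog)
		return 0;
	/* Unprivileged: non-destructive reads, and only if unrestricted. */
	if (!dmesg_restrict && from == FROM_READER &&
	    (act == ACT_READ_ALL || act == ACT_SIZE_BUFFER))
		return 0;
	return -EPERM;
}

/* In this model, opening /dev/kmsg counts as a non-destructive read-all. */
int model_devkmsg_open(bool dmesg_restrict, const struct caller *c)
{
	return model_check_syslog_permissions(ACT_READ_ALL, FROM_READER,
					      dmesg_restrict, c);
}
```

    Treating the /dev/kmsg open as a read-all request gives it the same
    "CAP_SYSLOG or dmesg_restrict==0" rule as the syslog syscall's
    non-destructive actions.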
     

12 Jun, 2013

1 commit

  • Pull ceph fixes from Sage Weil:
    "There is a pair of fixes for double-frees in the recent bundle for
    3.10, a couple of fixes for long-standing bugs (sleep while atomic and
    an endianness fix), and a locking fix that can be triggered when osds
    are going down"

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client:
    rbd: fix cleanup in rbd_add()
    rbd: don't destroy ceph_opts in rbd_add()
    ceph: ceph_pagelist_append might sleep while atomic
    ceph: add cpu_to_le32() calls when encoding a reconnect capability
    libceph: must hold mutex for reset_changed_osds()

    Linus Torvalds
     

09 Jun, 2013

7 commits

  • This patch fixes warnings due to missing lock on write error path.

    WARNING: at fs/hpfs/hpfs_fn.h:353 hpfs_truncate+0x75/0x80 [hpfs]()
    Hardware name: empty
    Pid: 26563, comm: dd Tainted: P O 3.9.4 #12
    Call Trace:
    hpfs_truncate+0x75/0x80 [hpfs]
    hpfs_write_begin+0x84/0x90 [hpfs]
    _hpfs_bmap+0x10/0x10 [hpfs]
    generic_file_buffered_write+0x121/0x2c0
    __generic_file_aio_write+0x1c7/0x3f0
    generic_file_aio_write+0x7c/0x100
    do_sync_write+0x98/0xd0
    hpfs_file_write+0xd/0x50 [hpfs]
    vfs_write+0xa2/0x160
    sys_write+0x51/0xa0
    page_fault+0x22/0x30
    system_call_fastpath+0x1a/0x1f

    Signed-off-by: Mikulas Patocka
    Cc: stable@kernel.org # 2.6.39+
    Signed-off-by: Linus Torvalds

    Mikulas Patocka
     
  • Pull timer fixes from Thomas Gleixner:

    - Trivial: unused variable removal

    - Posix-timers: Add the clock ID to the new proc interface to make it
    useful. The interface is new and should be functional when we reach
    the final 3.10 release.

    - Cure a false positive warning in the tick code introduced by the
    overhaul in 3.10

    - Fix for a persistent clock detection regression introduced in this
    cycle

    * 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    timekeeping: Correct run-time detection of persistent_clock.
    ntp: Remove unused variable flags in __hardpps
    posix-timers: Show clock ID in proc file
    tick: Cure broadcast false positive pending bit warning

    Linus Torvalds
     
  • Dave reported a panic because the extent_root->commit_root was NULL in the
    caching kthread. That is because we just unset it in free_root_pointers, which
    is not the correct thing to do; we have to either wait for the caching kthread
    to complete or hold the extent_commit_sem lock so we know the thread has exited.
    This patch makes the kthreads all stop first and then we do our cleanup. This
    should fix the race. Thanks,

    Reported-by: David Sterba
    Signed-off-by: Josef Bacik

    Josef Bacik
     
  • Commit be283b2e674a09457d4563729015adb637ce7cc1
    (Btrfs: use helper to cleanup tree roots) introduced the following bug:

    BUG: unable to handle kernel NULL pointer dereference at 0000000000000034
    IP: [] extent_buffer_get+0x4/0xa [btrfs]
    [...]
    Pid: 2463, comm: btrfs-cache-1 Tainted: G O 3.9.0+ #4 innotek GmbH VirtualBox/VirtualBox
    RIP: 0010:[] [] extent_buffer_get+0x4/0xa [btrfs]
    Process btrfs-cache-1 (pid: 2463, threadinfo ffff880112d60000, task ffff880117679730)
    [...]
    Call Trace:
    [] btrfs_search_slot+0x104/0x64d [btrfs]
    [] btrfs_next_old_leaf+0xa7/0x334 [btrfs]
    [] btrfs_next_leaf+0x10/0x12 [btrfs]
    [] caching_thread+0x1a3/0x2e0 [btrfs]
    [] worker_loop+0x14b/0x48e [btrfs]
    [] ? btrfs_queue_worker+0x25c/0x25c [btrfs]
    [] kthread+0x8d/0x95
    [] ? kthread_freezable_should_stop+0x43/0x43
    [] ret_from_fork+0x7c/0xb0
    [] ? kthread_freezable_should_stop+0x43/0x43
    RIP [] extent_buffer_get+0x4/0xa [btrfs]

    We've free'ed commit_root before actually getting to free block groups where
    caching thread needs valid extent_root->commit_root.

    Signed-off-by: Liu Bo
    Signed-off-by: Josef Bacik
    Signed-off-by: Chris Mason

    Liu Bo
     
  • Dave reported a NULL pointer deref. This is caused because he thought he'd be
    smart and add sanity checks to the extent_io bit operations, but he didn't
    expect a tree to have a NULL mapping. To fix this we just need to init the
    relocation's processed_blocks with the btree_inode->i_mapping. Thanks,

    Reported-by: David Sterba
    Signed-off-by: Josef Bacik
    Signed-off-by: Chris Mason

    Josef Bacik
     
  • There is a path where btrfs_drop_inode() is called while its inode's root
    is NULL: in btrfs_new_inode(), when btrfs_set_inode_index() fails,
    iput() is called. We should handle this case before taking a look at
    root->root_item.

    Signed-off-by: Naohiro Aota
    Reviewed-by: Miao Xie
    Signed-off-by: Josef Bacik
    Signed-off-by: Chris Mason

    Naohiro Aota
     
  • We get a use after free if we had a transaction to cleanup since there could be
    delayed inodes which refer to their respective fs_root. Thanks

    Reported-by: David Sterba
    Signed-off-by: Josef Bacik
    Signed-off-by: Chris Mason

    Josef Bacik
     

08 Jun, 2013

3 commits

  • …/git/tyhicks/ecryptfs

    Pull ecryptfs fixes from Tyler Hicks:
    - Fixes how eCryptfs handles msync to sync both the upper and lower
    file
    - A couple of MAINTAINERS updates

    * tag 'ecryptfs-3.10-rc5-msync' of git://git.kernel.org/pub/scm/linux/kernel/git/tyhicks/ecryptfs:
    eCryptfs: Check return of filemap_write_and_wait during fsync
    Update eCryptFS maintainers
    ecryptfs: fixed msync to flush data

    Linus Torvalds
     
  • Pull CIFS fix from Steve French:
    "Fix one byte buffer overrun with prefixpaths on cifs mounts which can
    cause a problem with mount depending on the string length"

    * 'for-3.10' of git://git.samba.org/sfrench/cifs-2.6:
    cifs: fix off-by-one bug in build_unc_path_to_root

    Linus Torvalds
     
  • 1d2ef5901483004d74947bbf78d5146c24038fe7 caused a regression in ncpfs such that
    directories could no longer be removed. This was because ncp_rmdir checked
    to see if a dentry could be unhashed before allowing it to be removed. Since
    1d2ef5901483004d74947bbf78d5146c24038fe7 introduced a change that incremented
    dentry->d_count, causing it to always be greater than 1, the unhash would
    always fail, and so the error path in ncp_rmdir was always taken. Removing
    this error path is safe as unhashing is still accomplished by calls to dput
    from vfs_rmdir.

    Signed-off-by: Dave Chiluk
    Signed-off-by: Petr Vandrovec
    Signed-off-by: Al Viro

    Dave Chiluk
     

07 Jun, 2013

1 commit

  • Pull more xfs updates from Ben Myers:
    "Here are several fixes for filesystems with CRC support turned on:
    fixes for quota, remote attributes, and recovery. There is also some
    feature work related to CRCs: the implementation of CRCs for the inode
    unlinked lists, disabling noattr2/attr2 options when appropriate, and
    bumping the maximum number of ACLs.

    I would have preferred to defer this last category of items to 3.11.
    This would require setting a feature bit for the on-disk changes, so
    there is some pressure to get these in 3.10. I believe this
    represents the end of the CRC related queue.

    - Rework of dquot CRCs
    - Fix for remote attribute invalidation of a leaf
    - Fix ordering of transaction replay in recovery
    - Implement CRCs for inode unlinked list
    - Disable noattr2/attr2 mount options when CRCs are enabled
    - Bump the limitation of ACL entries for v5 superblocks"

    * tag 'for-linus-v3.10-rc5' of git://oss.sgi.com/xfs/xfs:
    xfs: increase number of ACL entries for V5 superblocks
    xfs: disable noattr2/attr2 mount options for CRC enabled filesystems
    xfs: inode unlinked list needs to recalculate the inode CRC
    xfs: fix log recovery transaction item reordering
    xfs: fix remote attribute invalidation for a leaf
    xfs: rework dquot CRCs

    Linus Torvalds
     

06 Jun, 2013

6 commits

  • The limit of 25 ACL entries is arbitrary, but baked into the on-disk
    format. For version 5 superblocks, increase it to the maximum number
    of ACLs that can fit into a single xattr.

    Signed-off-by: Dave Chinner
    Reviewed-by: Brian Foster
    Reviewed-by: Mark Tinguely
    Signed-off-by: Ben Myers

    (cherry picked from commit 5c87d4bc1a86bd6e6754ac3d6e111d776ddcfe57)

    Dave Chinner
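
    The "maximum number that fits into a single xattr" is a simple
    division. The sketch below shows the arithmetic under stated
    assumptions: a 64 KiB limit on the xattr value, a 4 byte count header
    and 12 byte entries. These sizes and all names are assumptions of
    this model, not values taken from the commit.

```c
#include <stddef.h>
#include <stdint.h>

#define MODEL_XATTR_SIZE_MAX 65536u /* assumed limit on one xattr value */

/* Hypothetical on-disk ACL layout: a count followed by fixed entries. */
struct model_acl_hdr {
	uint32_t count;
};

struct model_acl_entry {
	uint32_t tag;
	uint32_t id;
	uint16_t perm;
	uint16_t pad;
};

/* Old formats keep the fixed limit of 25; v5 uses whatever fits. */
size_t model_acl_max_entries(int sb_version)
{
	if (sb_version < 5)
		return 25;
	return (MODEL_XATTR_SIZE_MAX - sizeof(struct model_acl_hdr)) /
	       sizeof(struct model_acl_entry);
}
```

    Under these assumptions the v5 limit comes out at (65536 - 4) / 12 =
    5461 entries, while older superblocks keep the limit of 25.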
     
  • attr2 format is always enabled for v5 superblock filesystems, so the
    mount options to enable or disable it need to cause mount errors.

    Signed-off-by: Dave Chinner
    Reviewed-by: Brian Foster
    Signed-off-by: Ben Myers

    (cherry picked from commit d3eaace84e40bf946129e516dcbd617173c1cf14)

    Dave Chinner
     
  • The inode unlinked list manipulations operate directly on the inode
    buffer, and so bypass the inode CRC calculation mechanisms. Hence an
    inode on the unlinked list has an invalid CRC. Fix this by
    recalculating the CRC whenever we modify an unlinked list pointer in
    an inode, including during log recovery. This is trivial to do and
    results in unlinked list operations always leaving a consistent
    inode in the buffer.

    Signed-off-by: Dave Chinner
    Reviewed-by: Mark Tinguely
    Signed-off-by: Ben Myers

    (cherry picked from commit 0a32c26e720a8b38971d0685976f4a7d63f9e2ef)

    Dave Chinner
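
    The rule "recalculate the CRC whenever the pointer is modified" can
    be demonstrated in userspace. This is a sketch only: the inode layout
    is invented, the checksum is a plain bitwise CRC32C computed with the
    crc field zeroed, and none of the names come from XFS.

```c
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC32C (Castagnoli polynomial, reflected form). */
uint32_t model_crc32c(const uint8_t *p, size_t len)
{
	uint32_t crc = 0xFFFFFFFFu;

	while (len--) {
		crc ^= *p++;
		for (int k = 0; k < 8; k++)
			crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
	}
	return ~crc;
}

/* Hypothetical on-disk inode: eight 32-bit words, no padding. */
struct model_dinode {
	uint32_t next_unlinked;
	uint32_t payload[6];
	uint32_t crc;
};

/* Checksum of the inode computed with its crc field set to zero. */
uint32_t model_dinode_cksum(const struct model_dinode *d)
{
	struct model_dinode tmp = *d;

	tmp.crc = 0;
	return model_crc32c((const uint8_t *)&tmp, sizeof(tmp));
}

int model_dinode_verify(const struct model_dinode *d)
{
	return model_dinode_cksum(d) == d->crc;
}

/* Modify the unlinked list pointer and keep the inode self-consistent. */
void model_set_next_unlinked(struct model_dinode *d, uint32_t next)
{
	d->next_unlinked = next;
	d->crc = model_dinode_cksum(d); /* the step that was being bypassed */
}
```

    Writing next_unlinked directly leaves a stale crc behind, while going
    through model_set_next_unlinked() always leaves an inode that passes
    model_dinode_verify().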
     
  • There are several constraints that inode allocation and unlink
    logging impose on log recovery. These all stem from the fact that
    inode alloc/unlink are logged in buffers, but all other inode
    changes are logged in inode items. Hence there are ordering
    constraints that recovery must follow to ensure the correct result
    occurs.

    As it turns out, this ordering has been working more by chance
    than good management. The existing code moves all buffers except
    cancelled buffers to the head of the list, and everything else to
    the tail of the list. The problem with this is that it interleaves
    inode items with the buffer cancellation items, and hence whether
    the inode item in a cancelled buffer gets replayed is essentially
    left to chance.

    Further, this ordering causes problems for log recovery when inode
    CRCs are enabled. It typically replays the inode unlink buffer long before
    it replays the inode core changes, and so the CRC recorded in an
    unlink buffer is going to be invalid and hence any attempt to
    validate the inode in the buffer is going to fail. Hence we really
    need to enforce the ordering that the inode alloc/unlink code has
    expected log recovery to have since inode chunk de-allocation was
    introduced back in 2003...

    Signed-off-by: Dave Chinner
    Reviewed-by: Mark Tinguely
    Signed-off-by: Ben Myers

    (cherry picked from commit a775ad778073d55744ed6709ccede36310638911)

    Dave Chinner
     
  • When invalidating an attribute leaf block, there might be
    remote attributes that it points to. With the recent rework of the
    remote attribute format, we have to make sure we calculate the
    length of the attribute correctly. We aren't doing that in
    xfs_attr3_leaf_inactive(), so fix it.

    Signed-off-by: Dave Chinner
    Reviewed-by: Brian Foster
    Reviewed-by: Mark Tinguely
    Signed-off-by: Ben Myers

    (cherry picked from commit 59913f14dfe8eb772ff93eb442947451b4416329)

    Dave Chinner
     
  • Calculating dquot CRCs when the backing buffer is written back just
    doesn't work reliably. There are several places which manipulate
    dquots directly in the buffers, and they don't calculate CRCs
    appropriately, nor do they always set the buffer up to calculate
    CRCs appropriately.

    Firstly, if we log a dquot buffer (e.g. during allocation) it gets
    logged without valid CRC, and so on recovery we end up with a dquot
    that is not valid.

    Secondly, if we recover/repair a dquot, we don't have a verifier
    attached to the buffer and hence CRCs are not calculated on the way
    down to disk.

    Thirdly, calculating the CRC after we've changed the contents means
    that if we re-read the dquot from the buffer, we cannot verify the
    contents of the dquot are valid, as the CRC is invalid.

    So, to avoid all the dquot CRC errors that are being detected by the
    read verifier, change to using the same model as for inodes. That
    is, dquot CRCs are calculated and written to the backing buffer at
    the time the dquot is flushed to the backing buffer. If we modify
    the dquot directly in the backing buffer, calculate the CRC
    immediately after the modification is complete. Hence the dquot in
    the on-disk buffer should always have a valid CRC.

    Signed-off-by: Dave Chinner
    Reviewed-by: Brian Foster
    Reviewed-by: Ben Myers
    Signed-off-by: Ben Myers

    (cherry picked from commit 6fcdc59de28817d1fbf1bd58cc01f4f3fac858fb)

    Dave Chinner
     

05 Jun, 2013

3 commits

  • Error out of ecryptfs_fsync() if filemap_write_and_wait() fails.

    Signed-off-by: Tyler Hicks
    Cc: Paul Taysom
    Cc: Olof Johansson
    Cc: stable@vger.kernel.org # v3.6+

    Tyler Hicks
     
  • Pull gfs2 fixes from Steven Whitehouse:
    "There are four patches this time.

    The first fixes a problem where the wrong descriptor type was being
    written into the log for journaled data blocks.

    The second fixes a race relating to the deallocation of allocator
    data.

    The third provides a fallback if kmalloc is unable to satisfy a
    request to allocate a directory hash table.

    The fourth fixes the iopen glock caching so that inodes are deleted in
    a more timely manner after rmdir/unlink"

    * git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-3.0-fixes:
    GFS2: Don't cache iopen glocks
    GFS2: Fall back to vmalloc if kmalloc fails for dir hash tables
    GFS2: Increase i_writecount during gfs2_setattr_size
    GFS2: Set log descriptor type for jdata blocks

    Linus Torvalds
     
  • Pull fuse fixes from Miklos Szeredi:
    "One patch fixes an Oops introduced in 3.9 with the readdirplus
    feature. The rest are fixes for async-dio in 3.10"

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse:
    fuse: fix alignment in short read optimization for async_dio
    fuse: return -EIOCBQUEUED from fuse_direct_IO() for all async requests
    fuse: fix readdirplus Oops in fuse_dentry_revalidate
    fuse: update inode size and invalidate attributes on fallocate
    fuse: truncate pagecache range on hole punch
    fuse: allocate for_background dio requests based on io->async state

    Linus Torvalds
     

04 Jun, 2013

1 commit


03 Jun, 2013

3 commits

  • This patch makes GFS2 immediately reclaim/delete all iopen glocks
    as soon as they're dequeued. This allows deleters to get an
    EXclusive lock on iopen so files are deleted properly instead of
    being set as unlinked.

    Signed-off-by: Bob Peterson
    Signed-off-by: Steven Whitehouse

    Bob Peterson
     
  • This version has one more correction: the vmalloc calls are replaced
    by __vmalloc calls to preserve the GFP_NOFS flag.

    When GFS2's directory management code allocates buffers for a
    directory hash table, if it can't get the memory it needs, it
    currently gives a bad return code. Rather than giving an error,
    this patch allows it to use virtually contiguous (vmalloc) memory
    rather than physically contiguous (kmalloc) memory for the hash
    table. This should make it possible for
    directories to function properly, even when kernel memory becomes
    very fragmented.

    Signed-off-by: Bob Peterson
    Signed-off-by: Steven Whitehouse

    Bob Peterson
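
    The allocate-then-fall-back pattern described here looks roughly like
    the following userspace model. model_kmalloc() and model_vmalloc()
    are hypothetical stand-ins built on malloc(); the contiguity limit is
    an invented knob that lets the first allocator fail for large sizes.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Largest request the "contiguous" allocator can satisfy in this model. */
size_t model_contig_limit = 4096;

struct hash_table {
	void *ptr;
	bool  from_vmalloc; /* remembers which allocator supplied ptr */
};

/* Stand-in for kmalloc(): fails once the request is too large. */
void *model_kmalloc(size_t size)
{
	return size <= model_contig_limit ? malloc(size) : NULL;
}

/* Stand-in for __vmalloc(): does not need contiguous pages. */
void *model_vmalloc(size_t size)
{
	return malloc(size);
}

/* Try the cheap allocator first and fall back instead of failing. */
int model_alloc_hash_table(struct hash_table *ht, size_t size)
{
	ht->from_vmalloc = false;
	ht->ptr = model_kmalloc(size);
	if (!ht->ptr) {
		ht->ptr = model_vmalloc(size);
		ht->from_vmalloc = ht->ptr != NULL;
	}
	return ht->ptr ? 0 : -1;
}

/* A real kernel user must free with the matching routine. */
void model_free_hash_table(struct hash_table *ht)
{
	free(ht->ptr);
	ht->ptr = NULL;
	ht->from_vmalloc = false;
}
```

    The caller only sees an error when both allocators fail, so a
    fragmented contiguous pool alone no longer breaks directory access
    in this model.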
     
  • This patch calls get_write_access in a few functions. This
    merely increases inode->i_writecount for the duration of the function.
    That will ensure that any file closes won't delete the inode's
    multi-block reservation while the function is running.

    Signed-off-by: Bob Peterson
    Signed-off-by: Steven Whitehouse

    Bob Peterson
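
    The effect of holding write access for the duration of a function
    can be sketched with a counter. Everything below is a simplified
    userspace model: struct model_inode, model_release() and
    model_setattr_size() are invented for illustration, and the rule that
    a close drops the reservation only when the count is zero is an
    assumption of the model.

```c
#include <errno.h>
#include <stdbool.h>

struct model_inode {
	int  i_writecount;    /* number of active writers */
	bool has_reservation; /* multi-block reservation still attached */
};

int model_get_write_access(struct model_inode *ip)
{
	if (ip->i_writecount < 0)
		return -ETXTBSY;
	ip->i_writecount++;
	return 0;
}

void model_put_write_access(struct model_inode *ip)
{
	ip->i_writecount--;
}

/* File close: the reservation goes away only when nobody is writing. */
void model_release(struct model_inode *ip)
{
	if (ip->i_writecount == 0)
		ip->has_reservation = false;
}

/*
 * Resize-style operation. racing_close stands for a file close that
 * happens while the operation is still running; it may be NULL.
 */
int model_setattr_size(struct model_inode *ip,
		       void (*racing_close)(struct model_inode *))
{
	bool still_reserved;
	int error = model_get_write_access(ip);

	if (error)
		return error;
	if (racing_close)
		racing_close(ip);
	still_reserved = ip->has_reservation;
	model_put_write_access(ip);
	return still_reserved ? 0 : -EINVAL;
}
```

    Because the count is raised before the racing close runs, the
    reservation survives until the operation has finished with it.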