17 Aug, 2012

2 commits

  • Following a report of a crash during an automount expire, I found that
    the locking in fs/autofs4/expire.c:get_next_positive_subdir() was wrong.
    Not only is the locking wrong but the function is more complex than it
    needs to be.

    The function is meant to calculate (and dget) the next entry in the list
    of directories contained in the root of an autofs mount point (an autofs
    indirect mount to be precise). The main problem was that the d_lock of
    the owner of the list was not being taken when walking the list, which
    led to list corruption under load. The only other lock that needs to
    be taken is against the next dentry candidate so it can be checked for
    usability.

    Signed-off-by: Ian Kent
    Signed-off-by: Linus Torvalds

    Ian Kent
     
  • Pull fuse updates from Miklos Szeredi.

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse:
    fuse: verify all ioctl retry iov elements
    fuse: add missing INIT flag descriptions
    fuse: add missing INIT flags
    fuse: update attributes on aio_read
    fuse: invalidate inode mapping if mtime changes
    fuse: add FUSE_AUTO_INVAL_DATA init flag

    Linus Torvalds
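The locking pattern described in the autofs4 get_next_positive_subdir() fix at the top of this list can be sketched in user space. The fix holds the list owner's lock for the whole walk and takes each candidate's own lock only long enough to check it and take a reference. This is a minimal sketch under stated assumptions: pthread mutexes stand in for d_lock (the kernel uses spinlocks), a plain counter stands in for the dentry reference count, and struct node, node_init() and next_positive_child() are hypothetical names, not kernel APIs.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for a dentry; not the kernel's struct dentry. */
struct node {
    pthread_mutex_t lock;   /* plays the role of d_lock */
    int positive;           /* nonzero if the entry has an inode */
    int refcount;           /* plays the role of the dentry reference count */
    struct node *next;      /* sibling link in the parent's child list */
    struct node *children;  /* head of this node's child list */
};

static void node_init(struct node *n, int positive)
{
    pthread_mutex_init(&n->lock, NULL);
    n->positive = positive;
    n->refcount = 1;
    n->next = NULL;
    n->children = NULL;
}

/*
 * Return the first usable child after prev (or the first usable child when
 * prev is NULL) with a reference held, and drop the caller's reference on
 * prev. The parent's lock protects the whole list walk; each candidate's
 * own lock is held only while it is checked and referenced.
 */
static struct node *next_positive_child(struct node *parent, struct node *prev)
{
    struct node *cand, *found = NULL;

    assert(parent != NULL);
    pthread_mutex_lock(&parent->lock);
    for (cand = prev ? prev->next : parent->children; cand; cand = cand->next) {
        pthread_mutex_lock(&cand->lock);
        /* Skip entries that are already going away or are negative. */
        if (cand->refcount > 0 && cand->positive) {
            cand->refcount++;           /* analogous to dget() */
            found = cand;
        }
        pthread_mutex_unlock(&cand->lock);
        if (found)
            break;
    }
    pthread_mutex_unlock(&parent->lock);

    if (prev) {                         /* analogous to dput(prev) */
        pthread_mutex_lock(&prev->lock);
        prev->refcount--;
        pthread_mutex_unlock(&prev->lock);
    }
    return found;
}
```

The lock order is always parent first, then child, and the reference on prev is dropped only after the parent's lock has been released. With children a (positive), b (negative) and c (positive), successive calls return a, then c, then NULL.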
     

13 Aug, 2012

1 commit


09 Aug, 2012

1 commit

  • We got a recursive lock in mksubvol because the caller already held
    a lock. I think we got into this due to a merge error. Commit a874a63
    removed the mnt_want_write call from btrfs_mksubvol and added a
    replacement call to mnt_want_write_file in btrfs_ioctl_snap_create_transid.
    Commit e7848683 however tried to move all calls to mnt_want_write above
    i_mutex. So somewhere while merging this, it got mixed up. The
    solution is to remove the mnt_want_write call completely from
    mksubvol.

    Reported-by: David Sterba
    Signed-off-by: Alexander Block
    Signed-off-by: Chris Mason

    Alexander Block
     

07 Aug, 2012

1 commit

  • Commit 7572777eef78ebdee1ecb7c258c0ef94d35bad16 attempted to verify that
    the total iovec from the client doesn't overflow iov_length() but it
    only checked the first element. The iovec could still overflow by
    starting with a small element. The obvious fix is to check all the
    elements.

    The overflow case doesn't look dangerous to the kernel as the copy is
    limited by the length after the overflow. This fix restores the
    intention of returning an error instead of successfully copying less
    than the iovec represented.

    I found this by code inspection. I built it but don't have a test case.
    I'm cc:ing stable because the initial commit did as well.

    Signed-off-by: Zach Brown
    Signed-off-by: Miklos Szeredi
    CC: [2.6.37+]

    Zach Brown
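The difference between checking only the first iovec element and checking all of them, as in the fuse ioctl fix above, can be shown with a small C sketch. It assumes a caller-supplied byte limit max (a hypothetical parameter standing in for the request size limit) and returns -ENOMEM purely for illustration; total_len(), verify_iov_first_only() and verify_iov() are hypothetical helpers, not the fuse code itself.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/uio.h>

/* Plain size_t sum of all lengths; wraps silently on overflow. */
static size_t total_len(const struct iovec *iov, size_t count)
{
    size_t sum = 0;
    for (size_t i = 0; i < count; i++)
        sum += iov[i].iov_len;
    return sum;
}

/* Pre-fix behavior per the commit message: only the first element is checked. */
static int verify_iov_first_only(const struct iovec *iov, size_t count, size_t max)
{
    if (count > 0 && iov[0].iov_len > max)
        return -ENOMEM;
    return 0;
}

/* Fixed behavior: every element is charged against the remaining budget,
 * so the running total never exceeds max and therefore cannot wrap. */
static int verify_iov(const struct iovec *iov, size_t count, size_t max)
{
    assert(iov != NULL || count == 0);
    for (size_t i = 0; i < count; i++) {
        if (iov[i].iov_len > max)
            return -ENOMEM;
        max -= iov[i].iov_len;
    }
    return 0;
}
```

With lengths {16, SIZE_MAX}, the first-only check passes because 16 is under the limit, while total_len() wraps around to 15; verify_iov() rejects the vector at the second element.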
     

04 Aug, 2012

14 commits

  • This one ought to be __mnt_drop_write(), to match __mnt_want_write()
    in the beginning...

    Signed-off-by: Al Viro

    Al Viro
     
  • The pdflush thread is long gone, so this patch removes references to pdflush
    from UBIFS comments.

    Signed-off-by: Artem Bityutskiy
    Signed-off-by: Al Viro

    Artem Bityutskiy
     
  • The pdflush thread is long gone, so this patch removes references to pdflush
    from gfs comments.

    Cc: Steven Whitehouse
    Signed-off-by: Artem Bityutskiy
    Signed-off-by: Al Viro

    Artem Bityutskiy
     
  • The '->write_super' superblock method is gone, and this patch removes all the
    references to 'write_super' from nilfs2.

    Cc: KONISHI Ryusuke
    Signed-off-by: Artem Bityutskiy
    Signed-off-by: Al Viro

    Artem Bityutskiy
     
  • The '->write_super' superblock method is gone, and this patch removes all the
    references to 'write_super' from hfs.

    Signed-off-by: Artem Bityutskiy
    Signed-off-by: Al Viro

    Artem Bityutskiy
     
  • The pdflush thread is long gone, so this patch removes references to pdflush
    from vfs comments.

    Signed-off-by: Artem Bityutskiy
    Signed-off-by: Al Viro

    Artem Bityutskiy
     
  • The '->write_super' superblock method is gone, and this patch removes all the
    references to 'write_super' from jbd and jbd2.

    Cc: Andrew Morton
    Cc: Jan Kara
    Cc: "Theodore Ts'o"
    Signed-off-by: Artem Bityutskiy
    Signed-off-by: Al Viro

    Artem Bityutskiy
     
  • The pdflush thread is long gone, so this patch removes references to pdflush
    from btrfs comments.

    Cc: Chris Mason
    Cc: linux-btrfs@vger.kernel.org
    Signed-off-by: Artem Bityutskiy
    Signed-off-by: Al Viro

    Artem Bityutskiy
     
  • The '->write_super' superblock method is gone, and this patch removes all the
    references to 'write_super' from btrfs.

    Cc: Chris Mason
    Cc: linux-btrfs@vger.kernel.org
    Signed-off-by: Artem Bityutskiy
    Signed-off-by: Al Viro

    Artem Bityutskiy
     
  • The pdflush thread is long gone, so this patch removes references to pdflush
    from ext4 comments.

    Cc: "Theodore Ts'o"
    Cc: Andreas Dilger
    Signed-off-by: Artem Bityutskiy
    Signed-off-by: Al Viro

    Artem Bityutskiy
     
  • The '->write_super' superblock method is gone, and this patch removes all the
    references to 'write_super' from ext4.

    Cc: "Theodore Ts'o"
    Cc: Andreas Dilger
    Signed-off-by: Artem Bityutskiy
    Signed-off-by: Al Viro

    Artem Bityutskiy
     
  • The '->write_super' superblock method is gone, and this patch removes all the
    references to 'write_super' from ext3.

    Cc: Jan Kara
    Cc: Andrew Morton
    Cc: Andreas Dilger
    Signed-off-by: Artem Bityutskiy
    Signed-off-by: Al Viro

    Artem Bityutskiy
     
  • Finally we can kill the 'sync_supers' kernel thread along with the
    '->write_super()' superblock operation because all the users are gone.
    Now every file system is supposed to manage its own superblock and
    its dirty state.

    The nice thing about killing this thread is that it improves power management.
    Indeed, 'sync_supers' is a source of periodic system wake-ups - it woke up
    every 5 seconds no matter what - even if there were no dirty superblocks and
    even if there were no file-systems using this service (e.g., btrfs and
    journalled ext4 do not need it). So it was wasting power most of the time. And
    because the thread was in the core of the kernel, all systems had to have it.
    So I am quite happy to make it go away.

    Interestingly, this thread is a left-over from the pdflush kernel thread which
    was a self-forking kernel thread responsible for all the write-back in old
    Linux kernels. It was turned into per-block-device BDI threads, and
    'sync_supers' was a left-over. Thus, R.I.P., pdflush as well.

    Signed-off-by: Artem Bityutskiy
    Signed-off-by: Al Viro

    Artem Bityutskiy
     
  • Pull exofs update from Boaz Harrosh:
    "They are all mostly fixes, except the most important patch by Artem
    Bityutskiy which removes the use of s_dirt. After this patch s_dirt
    can be completely removed from the tree."

    * 'for-linus' of git://git.open-osd.org/linux-open-osd:
    ore: Fix out-of-bounds access in _ios_obj()
    exofs: Use proper max_IO calculations from ore
    exofs: Fix __r4w_get_page when offset is beyond i_size
    exofs: stop using s_dirt
    exofs: readpage_strip: Add a BUG_ON to check for PageLocked(page)

    Linus Torvalds
     

03 Aug, 2012

4 commits

  • Pull two ceph fixes from Sage Weil:
    "The first patch fixes up the old crufty open intent code to use the
    atomic_open stuff properly, and the second fixes a possible null deref
    and memory leak with the crypto keys."

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client:
    libceph: fix crypto key null deref, memory leak
    ceph: simplify+fix atomic_open

    Linus Torvalds
     
  • Pull ecryptfs fixes from Tyler Hicks:
    - Fixes a bug when the lower filesystem mount options include 'acl',
    but the eCryptfs mount options do not
    - Cleanups in the messaging code
    - Better handling of empty files in the lower filesystem to improve
    usability. Failed file creations are now cleaned up and empty lower
    files are converted into eCryptfs during open().
    - The write-through cache changes are being reverted due to bugs that
    are not easy to fix. Stability outweighs the performance
    enhancements here.
    - Improvement to the mount code to catch unsupported ciphers specified
    in the mount options

    * tag 'ecryptfs-3.6-rc1-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tyhicks/ecryptfs:
    eCryptfs: check for eCryptfs cipher support at mount
    eCryptfs: Revert to a writethrough cache model
    eCryptfs: Initialize empty lower files when opening them
    eCryptfs: Unlink lower inode when ecryptfs_create() fails
    eCryptfs: Make all miscdev functions use daemon ptr in file private_data
    eCryptfs: Remove unused messaging declarations and function
    eCryptfs: Copy up POSIX ACL and read-only flags from lower mount

    Linus Torvalds
     
  • Pull CIFS update from Steve French:
    "Adds SMB2 rmdir/mkdir capability to the SMB2/SMB2.1 support in cifs.

    I am holding up a few more days on merging the remainder of the
    SMB2/SMB2.1 enablement although it is nearing review completion, in
    order to address some review comments from Jeff Layton on a few of the
    subsequent SMB2 patches, and also to debug an unrelated cifs problem
    that Pavel discovered."

    * 'for-next' of git://git.samba.org/sfrench/cifs-2.6:
    CIFS: Add SMB2 support for rmdir
    CIFS: Move rmdir code to ops struct
    CIFS: Add SMB2 support for mkdir operation
    CIFS: Separate protocol specific part from mkdir
    CIFS: Simplify cifs_mkdir call

    Linus Torvalds
     
  • The initial ->atomic_open op was carried over from the old intent code,
    which was incomplete and didn't really work. Replace it with a fresh
    method. In particular:

    * always attempt to do an atomic open+lookup, both for the create case
    and for lookups of existing files.
    * fix symlink handling by returning 1 to the VFS so that we can follow
    the link to its destination. This fixes a longstanding ceph bug (#2392).

    Signed-off-by: Sage Weil

    Sage Weil
     

02 Aug, 2012

7 commits

  • _ios_obj() is accessed by group index, not by device_table index.

    The oc->comps array holds only one group_full of devices at a time;
    it is not like ore_comp_dev(), which is indexed by a global
    device_table index.

    This did not BUG until now because exofs only uses a single
    COMP for all devices. But with other FSs like PanFS this is
    not true.

    This bug was only in the write path; all other users were
    using it correctly.

    [This is a bug since 3.2 Kernel]
    CC: Stable Tree

    Signed-off-by: Boaz Harrosh

    Boaz Harrosh
     
  • exofs_max_io_pages should just use the ORE's calculated
    layout->max_io_length and avoid unnecessary BUGs. The calculations
    made here were also a layering violation.

    Signed-off-by: Boaz Harrosh

    Boaz Harrosh
     
  • It is very common for the end of the file to be unaligned on
    stripe size. But since we know the offset is beyond the file's end,
    the XOR should be performed with all zeros.

    Old code used to just read zeros out of the OSD devices, which is a great
    waste. But what scares me more about this situation is that we now have
    pages attached to the file's mapping that are beyond i_size. I don't
    like the kind of bugs this calls for.

    Fix both problems by returning a global ZERO_PAGE if the offset is beyond
    i_size.

    Signed-off-by: Boaz Harrosh

    Boaz Harrosh
     
  • Exofs has the '->write_super()' handler and makes some use of the '->s_dirt'
    superblock flag, but it really needs neither of them because it never sets
    's_dirt' to one which means the VFS never calls its '->write_super()' handler.
    Thus, remove both.

    Note, I am trying to remove both 's_dirt' and 'write_super()' from VFS
    altogether once all users are gone.

    Signed-off-by: Artem Bityutskiy
    Signed-off-by: Boaz Harrosh

    Artem Bityutskiy
     
  • readpage_strip can be called from several code paths all of which
    require that the page be locked before any operations are carried
    out.

    Since we export the exofs_readpage callback to the VFS, add a
    BUG_ON to check for PageLocked(page) to make sure that this
    understanding is never compromised.

    Signed-off-by: Kautuk Consul
    Signed-off-by: Boaz Harrosh

    Kautuk Consul
     
  • Pull second vfs pile from Al Viro:
    "The stuff in there: fsfreeze deadlock fixes by Jan (essentially, the
    deadlock reproduced by xfstests 068), symlink and hardlink restriction
    patches, plus assorted cleanups and fixes.

    Note that another fsfreeze deadlock (emergency thaw one) is *not*
    dealt with - the series by Fernando conflicts a lot with Jan's, breaks
    userland ABI (FIFREEZE semantics gets changed) and trades the deadlock
    for massive vfsmount leak; this is going to be handled next cycle.
    There probably will be another pull request, but that stuff won't be
    in it."

    Fix up trivial conflicts due to unrelated changes next to each other in
    drivers/{staging/gdm72xx/usb_boot.c, usb/gadget/storage_common.c}

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (54 commits)
    delousing target_core_file a bit
    Documentation: Correct s_umount state for freeze_fs/unfreeze_fs
    fs: Remove old freezing mechanism
    ext2: Implement freezing
    btrfs: Convert to new freezing mechanism
    nilfs2: Convert to new freezing mechanism
    ntfs: Convert to new freezing mechanism
    fuse: Convert to new freezing mechanism
    gfs2: Convert to new freezing mechanism
    ocfs2: Convert to new freezing mechanism
    xfs: Convert to new freezing code
    ext4: Convert to new freezing mechanism
    fs: Protect write paths by sb_start_write - sb_end_write
    fs: Skip atime update on frozen filesystem
    fs: Add freezing handling to mnt_want_write() / mnt_drop_write()
    fs: Improve filesystem freezing handling
    switch the protection of percpu_counter list to spinlock
    nfsd: Push mnt_want_write() outside of i_mutex
    btrfs: Push mnt_want_write() outside of i_mutex
    fat: Push mnt_want_write() outside of i_mutex
    ...

    Linus Torvalds
     
  • In commit 3b6e2723f32d ("locks: prevent side-effects of
    locks_release_private before file_lock is initialized") we removed the
    last user of lm_release_private without removing the field itself.

    Signed-off-by: J. Bruce Fields
    Signed-off-by: Linus Torvalds

    J. Bruce Fields
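The exofs __r4w_get_page change above relies on XOR with zeros being the identity, so data beyond i_size never has to be read to compute parity. A minimal sketch under stated assumptions: a tiny hypothetical stripe-unit size UNIT, an in-memory array standing in for the OSD devices, and a static zero buffer standing in for ZERO_PAGE; read_unit(), get_unit() and compute_parity() are hypothetical helpers, not exofs or ORE functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define UNIT 8  /* hypothetical stripe-unit size in bytes, kept tiny */

/* One shared buffer of zeros (stand-in for the kernel's ZERO_PAGE). */
static const uint8_t zero_unit[UNIT];

/* Counts simulated device reads so the saving is visible. */
static int device_reads;

/* Hypothetical backing store: unit i holds file bytes [i*UNIT, (i+1)*UNIT). */
static const uint8_t *read_unit(const uint8_t *store, size_t i)
{
    device_reads++;
    return store + i * UNIT;
}

/* Units starting at or beyond i_size are known to be all zeros,
 * so the shared zero buffer is returned and no read is issued. */
static const uint8_t *get_unit(const uint8_t *store, size_t i, size_t i_size)
{
    if (i * UNIT >= i_size)
        return zero_unit;
    return read_unit(store, i);
}

/* XOR parity over n data units of one stripe. */
static void compute_parity(const uint8_t *store, size_t n, size_t i_size,
                           uint8_t parity[UNIT])
{
    assert(parity != NULL);
    memset(parity, 0, UNIT);
    for (size_t i = 0; i < n; i++) {
        const uint8_t *u = get_unit(store, i, i_size);
        for (size_t b = 0; b < UNIT; b++)
            parity[b] ^= u[b];
    }
}
```

For a four-unit stripe with i_size covering the first two units, only two reads are issued and the parity equals the XOR of those two units, exactly as if zeros had been read for the rest.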
     

01 Aug, 2012

10 commits

  • Merge Andrew's second set of patches:
    - MM
    - a few random fixes
    - a couple of RTC leftovers

    * emailed patches from Andrew Morton: (120 commits)
    rtc/rtc-88pm80x: remove unneed devm_kfree
    rtc/rtc-88pm80x: assign ret only when rtc_register_driver fails
    mm: hugetlbfs: close race during teardown of hugetlbfs shared page tables
    tmpfs: distribute interleave better across nodes
    mm: remove redundant initialization
    mm: warn if pg_data_t isn't initialized with zero
    mips: zero out pg_data_t when it's allocated
    memcg: gix memory accounting scalability in shrink_page_list
    mm/sparse: remove index_init_lock
    mm/sparse: more checks on mem_section number
    mm/sparse: optimize sparse_index_alloc
    memcg: add mem_cgroup_from_css() helper
    memcg: further prevent OOM with too many dirty pages
    memcg: prevent OOM with too many dirty pages
    mm: mmu_notifier: fix freed page still mapped in secondary MMU
    mm: memcg: only check anon swapin page charges for swap cache
    mm: memcg: only check swap cache pages for repeated charging
    mm: memcg: split swapin charge function into private and public part
    mm: memcg: remove needless !mm fixup to init_mm when charging
    mm: memcg: remove unneeded shmem charge type
    ...

    Linus Torvalds
     
  • Pull second wave of NFS client updates from Trond Myklebust:

    - Patches from Bryan to allow splitting of the NFSv2/v3/v4 code into
    separate modules.

    - Fix Oopses in the NFSv4 idmapper

    - Fix a deadlock whereby rpciod tries to allocate a new socket and ends
    up recursing into the NFS code due to memory reclaim.

    - Increase the number of permitted callback connections.

    * tag 'nfs-for-3.6-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
    nfs: explicitly reject LOCK_MAND flock() requests
    nfs: increase number of permitted callback connections.
    SUNRPC: return negative value in case rpcbind client creation error
    NFS: Convert v4 into a module
    NFS: Convert v3 into a module
    NFS: Convert v2 into a module
    NFS: Keep module parameters in the generic NFS client
    NFS: Split out remaining NFS v4 inode functions
    NFS: Pass super operations and xattr handlers in the nfs_subversion
    NFS: Only initialize the ACL client in the v3 case
    NFS: Create a try_mount rpc op
    NFS: Remove the NFS v4 xdev mount function
    NFS: Add version registering framework
    NFS: Fix a number of bugs in the idmapper
    nfs: skip commit in releasepage if we're freeing memory for fs-related reasons
    sunrpc: clarify comments on rpc_make_runnable
    pnfsblock: bail out partial page IO

    Linus Torvalds
     
  • GFP_NOFS is _more_ permissive than GFP_NOIO in that it will initiate IO,
    just not of any filesystem data.

    The problem is that previously NOFS was correct because that avoids
    recursion into the NFS code. With swap-over-NFS, it is no longer correct
    as swap IO can lead to this recursion.

    Signed-off-by: Peter Zijlstra
    Signed-off-by: Mel Gorman
    Acked-by: Rik van Riel
    Cc: Christoph Hellwig
    Cc: David S. Miller
    Cc: Eric B Munson
    Cc: Eric Paris
    Cc: James Morris
    Cc: Mel Gorman
    Cc: Mike Christie
    Cc: Neil Brown
    Cc: Sebastian Andrzej Siewior
    Cc: Trond Myklebust
    Cc: Xiaotian Feng
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mel Gorman
     
  • Implement the new swapfile a_ops for NFS and hook up ->direct_IO. This
    will set the NFS socket to SOCK_MEMALLOC and run socket reconnect under
    PF_MEMALLOC as well as reset SOCK_MEMALLOC before engaging the protocol
    ->connect() method.

    PF_MEMALLOC should allow the allocation of struct socket and related
    objects and the early (re)setting of SOCK_MEMALLOC should allow us to
    receive the packets required for the TCP connection buildup.

    [jlayton@redhat.com: Restore PF_MEMALLOC task flags in all cases]
    [dfeng@redhat.com: Fix handling of multiple swap files]
    [a.p.zijlstra@chello.nl: Original patch]
    Signed-off-by: Mel Gorman
    Acked-by: Rik van Riel
    Cc: Christoph Hellwig
    Cc: David S. Miller
    Cc: Eric B Munson
    Cc: Eric Paris
    Cc: James Morris
    Cc: Mel Gorman
    Cc: Mike Christie
    Cc: Neil Brown
    Cc: Sebastian Andrzej Siewior
    Cc: Trond Myklebust
    Cc: Xiaotian Feng
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mel Gorman
     
  • The VM does not like PG_private set on PG_swapcache pages. As suggested
    by Trond in http://lkml.org/lkml/2006/8/25/348, this patch disables NFS
    data cache revalidation on swap files, as it does not make sense to have
    other clients change the file while it is being used as swap. This avoids
    setting PG_private on swap pages, since there ought to be no further races
    with invalidate_inode_pages2() to deal with.

    Since we cannot set PG_private we cannot use page->private which is
    already used by PG_swapcache pages to store the nfs_page. Thus augment
    the new nfs_page_find_request logic.

    Signed-off-by: Peter Zijlstra
    Signed-off-by: Mel Gorman
    Acked-by: Rik van Riel
    Cc: Christoph Hellwig
    Cc: David S. Miller
    Cc: Eric B Munson
    Cc: Eric Paris
    Cc: James Morris
    Cc: Mel Gorman
    Cc: Mike Christie
    Cc: Neil Brown
    Cc: Sebastian Andrzej Siewior
    Cc: Trond Myklebust
    Cc: Xiaotian Feng
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mel Gorman
     
  • Replace all relevant occurrences of page->index and page->mapping in the
    NFS client with the new page_file_index() and page_file_mapping()
    functions.

    Signed-off-by: Peter Zijlstra
    Signed-off-by: Mel Gorman
    Acked-by: Rik van Riel
    Cc: Christoph Hellwig
    Cc: David S. Miller
    Cc: Eric B Munson
    Cc: Eric Paris
    Cc: James Morris
    Cc: Mel Gorman
    Cc: Mike Christie
    Cc: Neil Brown
    Cc: Sebastian Andrzej Siewior
    Cc: Trond Myklebust
    Cc: Xiaotian Feng
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mel Gorman
     
  • 09f363c7 ("vmscan: fix shrinker callback bug in fs/super.c") fixed a
    shrinker callback which was returning -1 when nr_to_scan is zero, which
    caused excessive slab scanning. But 635697c6 ("vmscan: fix initial
    shrinker size handling") fixed the problem again, so we can freely return
    -1 even when nr_to_scan is zero. So let's revert 09f363c7, because the
    comment added in 09f363c7 made an unnecessary rule.

    Signed-off-by: Minchan Kim
    Cc: Al Viro
    Cc: Mikulas Patocka
    Cc: Konstantin Khlebnikov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Minchan Kim
     
  • Use a mmu_gather instead of a temporary linked list for accumulating pages
    when we unmap a hugepage range.

    Signed-off-by: Aneesh Kumar K.V
    Reviewed-by: KAMEZAWA Hiroyuki
    Cc: David Rientjes
    Cc: Hillf Danton
    Cc: Michal Hocko
    Cc: KOSAKI Motohiro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Aneesh Kumar K.V
     
  • Since per-BDI flusher threads were introduced in 2.6.32, the pdflush
    mechanism is no longer used. But the old interface exported through
    /proc/sys/vm/nr_pdflush_threads still exists and is obviously useless.

    For backward compatibility, print a warning and return 2 to notify
    users that the interface has been removed.

    Signed-off-by: Wanpeng Li
    Cc: Wu Fengguang
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Wanpeng Li
     
  • Pull nfsd changes from J. Bruce Fields:
    "This has been an unusually quiet cycle--mostly bugfixes and cleanup.
    The one large piece is Stanislav's work to containerize the server's
    grace period--but that in itself is just one more step in a
    not-yet-complete project to allow fully containerized nfs service.

    There are a number of outstanding delegation, container, v4 state, and
    gss patches that aren't quite ready yet; 3.7 may be wilder."

    * 'nfsd-next' of git://linux-nfs.org/~bfields/linux: (35 commits)
    NFSd: make boot_time variable per network namespace
    NFSd: make grace end flag per network namespace
    Lockd: move grace period management from lockd() to per-net functions
    LockD: pass actual network namespace to grace period management functions
    LockD: manage grace list per network namespace
    SUNRPC: service request network namespace helper introduced
    NFSd: make nfsd4_manager allocated per network namespace context.
    LockD: make lockd manager allocated per network namespace
    LockD: manage grace period per network namespace
    Lockd: add more debug to host shutdown functions
    Lockd: host complaining function introduced
    LockD: manage used host count per networks namespace
    LockD: manage garbage collection timeout per networks namespace
    LockD: make garbage collector network namespace aware.
    LockD: mark host per network namespace on garbage collect
    nfsd4: fix missing fault_inject.h include
    locks: move lease-specific code out of locks_delete_lock
    locks: prevent side-effects of locks_release_private before file_lock is initialized
    NFSd: set nfsd_serv to NULL after service destruction
    NFSd: introduce nfsd_destroy() helper
    ...

    Linus Torvalds
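The GFP_NOFS versus GFP_NOIO commit above hinges on how the allocation masks nest. The sketch below models that with arbitrary, hypothetical bit values (F_WAIT, F_IO, F_FS); only the composition is meant to mirror kernels of this era, where GFP_NOIO is __GFP_WAIT, GFP_NOFS adds __GFP_IO, and GFP_KERNEL adds __GFP_FS on top. The swap rule is a simplification of reclaim, in which writing out a swap-cache page is allowed when the I/O bit is set even if the filesystem bit is not.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag bits; values are arbitrary, only the nesting matters. */
#define F_WAIT 0x1u  /* allocation may sleep */
#define F_IO   0x2u  /* reclaim may start I/O, including swap writeback */
#define F_FS   0x4u  /* reclaim may call into filesystem code */

/* Composition modeled on GFP_NOIO, GFP_NOFS and GFP_KERNEL. */
#define MASK_NOIO   (F_WAIT)
#define MASK_NOFS   (F_WAIT | F_IO)
#define MASK_KERNEL (F_WAIT | F_IO | F_FS)

/* Each mask grants a subset of the next one. */
static_assert((MASK_NOIO & ~MASK_NOFS) == 0, "NOIO is stricter than NOFS");
static_assert((MASK_NOFS & ~MASK_KERNEL) == 0, "NOFS is stricter than KERNEL");

/* True if mask a grants nothing that mask b does not. */
static bool at_most_as_permissive(unsigned a, unsigned b)
{
    return (a & ~b) == 0;
}

/* Simplified reclaim rule: a swap-cache page may be written out when the
 * I/O or filesystem bit is set. With swap over NFS, that writeback runs
 * through the NFS client, so only a mask with neither bit keeps reclaim
 * from recursing into it. */
static bool reclaim_may_write_swap(unsigned mask)
{
    return (mask & (F_IO | F_FS)) != 0;
}
```

Under this model, an allocation made with the NOFS mask from inside the NFS or RPC code can still trigger swap writeback to an NFS-backed swap file, which is the recursion the commit avoids by switching to the stricter NOIO mask.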