03 Aug, 2012

4 commits

  • Pull two ceph fixes from Sage Weil:
    "The first patch fixes up the old crufty open intent code to use the
    atomic_open stuff properly, and the second fixes a possible null deref
    and memory leak with the crypto keys."

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client:
    libceph: fix crypto key null deref, memory leak
    ceph: simplify+fix atomic_open

    Linus Torvalds
     
  • Pull ecryptfs fixes from Tyler Hicks:
    - Fixes a bug when the lower filesystem mount options include 'acl',
    but the eCryptfs mount options do not
    - Cleanups in the messaging code
    - Better handling of empty files in the lower filesystem to improve
    usability. Failed file creations are now cleaned up and empty lower
    files are converted into eCryptfs during open().
    - The write-through cache changes are being reverted due to bugs that
    are not easy to fix. Stability outweighs the performance
    enhancements here.
    - Improvement to the mount code to catch unsupported ciphers specified
    in the mount options

    * tag 'ecryptfs-3.6-rc1-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tyhicks/ecryptfs:
    eCryptfs: check for eCryptfs cipher support at mount
    eCryptfs: Revert to a writethrough cache model
    eCryptfs: Initialize empty lower files when opening them
    eCryptfs: Unlink lower inode when ecryptfs_create() fails
    eCryptfs: Make all miscdev functions use daemon ptr in file private_data
    eCryptfs: Remove unused messaging declarations and function
    eCryptfs: Copy up POSIX ACL and read-only flags from lower mount

    Linus Torvalds
     
  • Pull CIFS update from Steve French:
    "Adds SMB2 rmdir/mkdir capability to the SMB2/SMB2.1 support in cifs.

    I am holding up a few more days on merging the remainder of the
    SMB2/SMB2.1 enablement although it is nearing review completion, in
    order to address some review comments from Jeff Layton on a few of the
    subsequent SMB2 patches, and also to debug an unrelated cifs problem
    that Pavel discovered."

    * 'for-next' of git://git.samba.org/sfrench/cifs-2.6:
    CIFS: Add SMB2 support for rmdir
    CIFS: Move rmdir code to ops struct
    CIFS: Add SMB2 support for mkdir operation
    CIFS: Separate protocol specific part from mkdir
    CIFS: Simplify cifs_mkdir call

    Linus Torvalds
     
  • The initial ->atomic_open op was carried over from the old intent code,
    which was incomplete and didn't really work. Replace it with a fresh
    method. In particular:

    * always attempt to do an atomic open+lookup, both for the create case
    and for lookups of existing files.
    * fix symlink handling by returning 1 to the VFS so that we can follow
    the link to its destination. This fixes a longstanding ceph bug (#2392).

    Signed-off-by: Sage Weil

    Sage Weil
     

02 Aug, 2012

2 commits

  • Pull second vfs pile from Al Viro:
    "The stuff in there: fsfreeze deadlock fixes by Jan (essentially, the
    deadlock reproduced by xfstests 068), symlink and hardlink restriction
    patches, plus assorted cleanups and fixes.

    Note that another fsfreeze deadlock (emergency thaw one) is *not*
    dealt with - the series by Fernando conflicts a lot with Jan's, breaks
    userland ABI (FIFREEZE semantics gets changed) and trades the deadlock
    for massive vfsmount leak; this is going to be handled next cycle.
    There probably will be another pull request, but that stuff won't be
    in it."

    Fix up trivial conflicts due to unrelated changes next to each other in
    drivers/{staging/gdm72xx/usb_boot.c, usb/gadget/storage_common.c}

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (54 commits)
    delousing target_core_file a bit
    Documentation: Correct s_umount state for freeze_fs/unfreeze_fs
    fs: Remove old freezing mechanism
    ext2: Implement freezing
    btrfs: Convert to new freezing mechanism
    nilfs2: Convert to new freezing mechanism
    ntfs: Convert to new freezing mechanism
    fuse: Convert to new freezing mechanism
    gfs2: Convert to new freezing mechanism
    ocfs2: Convert to new freezing mechanism
    xfs: Convert to new freezing code
    ext4: Convert to new freezing mechanism
    fs: Protect write paths by sb_start_write - sb_end_write
    fs: Skip atime update on frozen filesystem
    fs: Add freezing handling to mnt_want_write() / mnt_drop_write()
    fs: Improve filesystem freezing handling
    switch the protection of percpu_counter list to spinlock
    nfsd: Push mnt_want_write() outside of i_mutex
    btrfs: Push mnt_want_write() outside of i_mutex
    fat: Push mnt_want_write() outside of i_mutex
    ...

    Linus Torvalds
     
  • In commit 3b6e2723f32d ("locks: prevent side-effects of
    locks_release_private before file_lock is initialized") we removed the
    last user of lm_release_private without removing the field itself.

    Signed-off-by: J. Bruce Fields
    Signed-off-by: Linus Torvalds

    J. Bruce Fields
     

01 Aug, 2012

13 commits

  • Merge Andrew's second set of patches:
    - MM
    - a few random fixes
    - a couple of RTC leftovers

    * emailed patches from Andrew Morton: (120 commits)
    rtc/rtc-88pm80x: remove unneed devm_kfree
    rtc/rtc-88pm80x: assign ret only when rtc_register_driver fails
    mm: hugetlbfs: close race during teardown of hugetlbfs shared page tables
    tmpfs: distribute interleave better across nodes
    mm: remove redundant initialization
    mm: warn if pg_data_t isn't initialized with zero
    mips: zero out pg_data_t when it's allocated
    memcg: gix memory accounting scalability in shrink_page_list
    mm/sparse: remove index_init_lock
    mm/sparse: more checks on mem_section number
    mm/sparse: optimize sparse_index_alloc
    memcg: add mem_cgroup_from_css() helper
    memcg: further prevent OOM with too many dirty pages
    memcg: prevent OOM with too many dirty pages
    mm: mmu_notifier: fix freed page still mapped in secondary MMU
    mm: memcg: only check anon swapin page charges for swap cache
    mm: memcg: only check swap cache pages for repeated charging
    mm: memcg: split swapin charge function into private and public part
    mm: memcg: remove needless !mm fixup to init_mm when charging
    mm: memcg: remove unneeded shmem charge type
    ...

    Linus Torvalds
     
  • Pull second wave of NFS client updates from Trond Myklebust:

    - Patches from Bryan to allow splitting of the NFSv2/v3/v4 code into
    separate modules.

    - Fix Oopses in the NFSv4 idmapper

    - Fix a deadlock whereby rpciod tries to allocate a new socket and ends
    up recursing into the NFS code due to memory reclaim.

    - Increase the number of permitted callback connections.

    * tag 'nfs-for-3.6-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
    nfs: explicitly reject LOCK_MAND flock() requests
    nfs: increase number of permitted callback connections.
    SUNRPC: return negative value in case rpcbind client creation error
    NFS: Convert v4 into a module
    NFS: Convert v3 into a module
    NFS: Convert v2 into a module
    NFS: Keep module parameters in the generic NFS client
    NFS: Split out remaining NFS v4 inode functions
    NFS: Pass super operations and xattr handlers in the nfs_subversion
    NFS: Only initialize the ACL client in the v3 case
    NFS: Create a try_mount rpc op
    NFS: Remove the NFS v4 xdev mount function
    NFS: Add version registering framework
    NFS: Fix a number of bugs in the idmapper
    nfs: skip commit in releasepage if we're freeing memory for fs-related reasons
    sunrpc: clarify comments on rpc_make_runnable
    pnfsblock: bail out partial page IO

    Linus Torvalds
     
  • GFP_NOFS is _more_ permissive than GFP_NOIO in that it will initiate IO,
    just not of any filesystem data.

    Previously NOFS was correct because it avoids recursion into the NFS
    code. With swap-over-NFS it is no longer correct, as swap IO can lead
    to this recursion.

    Signed-off-by: Peter Zijlstra
    Signed-off-by: Mel Gorman
    Acked-by: Rik van Riel
    Cc: Christoph Hellwig
    Cc: David S. Miller
    Cc: Eric B Munson
    Cc: Eric Paris
    Cc: James Morris
    Cc: Mel Gorman
    Cc: Mike Christie
    Cc: Neil Brown
    Cc: Sebastian Andrzej Siewior
    Cc: Trond Myklebust
    Cc: Xiaotian Feng
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mel Gorman
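    Sketch for the GFP_NOFS/GFP_NOIO entry above, assuming the 2012-era
    layering in which GFP_NOIO only permits sleeping, GFP_NOFS additionally
    permits starting I/O, and GFP_KERNEL additionally permits filesystem
    callbacks. The DEMO_* bit values and helper names are hypothetical and
    illustrative, not the kernel's definitions.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag bits; the values are illustrative, not the kernel's. */
#define DEMO_GFP_WAIT 0x1u /* allocation may sleep */
#define DEMO_GFP_IO   0x2u /* reclaim may start physical I/O, e.g. swap-out */
#define DEMO_GFP_FS   0x4u /* reclaim may call into filesystem code */

/* Each mask adds one permission to the previous one. */
#define DEMO_GFP_NOIO   (DEMO_GFP_WAIT)
#define DEMO_GFP_NOFS   (DEMO_GFP_WAIT | DEMO_GFP_IO)
#define DEMO_GFP_KERNEL (DEMO_GFP_WAIT | DEMO_GFP_IO | DEMO_GFP_FS)

/* Can reclaim triggered by this allocation issue I/O (including swap-out)? */
static bool demo_may_start_io(unsigned int gfp)
{
    return (gfp & DEMO_GFP_IO) != 0;
}

/*
 * With swap over NFS, any swap-out I/O re-enters the NFS client, so an
 * allocation made from inside the NFS/sunrpc stack is only recursion-safe
 * if it forbids I/O altogether.
 */
static bool demo_safe_under_swap_over_nfs(unsigned int gfp)
{
    assert(gfp & DEMO_GFP_WAIT); /* the sketch only models sleeping masks */
    return !demo_may_start_io(gfp);
}
```

    Under this model GFP_NOFS passes demo_may_start_io() while GFP_NOIO does
    not, which is why NOFS stops being sufficient once swap I/O itself goes
    over NFS.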
     
  • Implement the new swapfile a_ops for NFS and hook up ->direct_IO. This
    will set the NFS socket to SOCK_MEMALLOC and run socket reconnect under
    PF_MEMALLOC as well as reset SOCK_MEMALLOC before engaging the protocol
    ->connect() method.

    PF_MEMALLOC should allow the allocation of struct socket and related
    objects and the early (re)setting of SOCK_MEMALLOC should allow us to
    receive the packets required for the TCP connection buildup.

    [jlayton@redhat.com: Restore PF_MEMALLOC task flags in all cases]
    [dfeng@redhat.com: Fix handling of multiple swap files]
    [a.p.zijlstra@chello.nl: Original patch]
    Signed-off-by: Mel Gorman
    Acked-by: Rik van Riel
    Cc: Christoph Hellwig
    Cc: David S. Miller
    Cc: Eric B Munson
    Cc: Eric Paris
    Cc: James Morris
    Cc: Mel Gorman
    Cc: Mike Christie
    Cc: Neil Brown
    Cc: Sebastian Andrzej Siewior
    Cc: Trond Myklebust
    Cc: Xiaotian Feng
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mel Gorman
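    Sketch for the "Restore PF_MEMALLOC task flags in all cases" note above:
    save the task flags, set PF_MEMALLOC for the critical section, then
    restore only that bit to its previous value on every return path, so a
    caller that already ran under PF_MEMALLOC keeps it. This assumes a plain
    unsigned flags word; DEMO_PF_MEMALLOC, demo_restore_flags() and the other
    names are hypothetical.

```c
#include <assert.h>

/* Hypothetical bit value; the real PF_MEMALLOC value is not assumed. */
#define DEMO_PF_MEMALLOC 0x800u

/* Take the bits in mask from saved, and every other bit from cur. */
static unsigned int demo_restore_flags(unsigned int cur, unsigned int saved,
                                       unsigned int mask)
{
    return (cur & ~mask) | (saved & mask);
}

/* Stand-in for a reconnect attempt that fails; it must run with the bit set. */
static int demo_connect_fails(unsigned int *flags)
{
    assert(*flags & DEMO_PF_MEMALLOC);
    return -1;
}

/* Run fn with DEMO_PF_MEMALLOC set, restoring that bit however fn returns. */
static int demo_with_memalloc(unsigned int *flags, int (*fn)(unsigned int *))
{
    unsigned int saved = *flags;
    int ret;

    *flags |= DEMO_PF_MEMALLOC;
    ret = fn(flags);
    /* Do not blindly clear the bit: the caller may already have had it. */
    *flags = demo_restore_flags(*flags, saved, DEMO_PF_MEMALLOC);
    return ret;
}
```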
     
  • The VM does not like PG_private set on PG_swapcache pages. As suggested
    by Trond in http://lkml.org/lkml/2006/8/25/348, this patch disables NFS
    data cache revalidation on swap files, as it does not make sense to have
    other clients change the file while it is being used as swap. This avoids
    setting PG_private on swap pages, since there ought to be no further races
    with invalidate_inode_pages2() to deal with.

    Since we cannot set PG_private, we cannot use page->private (which
    PG_swapcache pages already use) to store the nfs_page. Thus augment
    the new nfs_page_find_request logic.

    Signed-off-by: Peter Zijlstra
    Signed-off-by: Mel Gorman
    Acked-by: Rik van Riel
    Cc: Christoph Hellwig
    Cc: David S. Miller
    Cc: Eric B Munson
    Cc: Eric Paris
    Cc: James Morris
    Cc: Mel Gorman
    Cc: Mike Christie
    Cc: Neil Brown
    Cc: Sebastian Andrzej Siewior
    Cc: Trond Myklebust
    Cc: Xiaotian Feng
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mel Gorman
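    Sketch for the swap-cache lookup entry above, assuming a simplified page
    model: ordinary pages reach their request through page->private, while
    swap-cache pages (which cannot use page->private here) fall back to
    searching a list of outstanding requests by file index. The structures
    and demo_find_request() are hypothetical; the real
    nfs_page_find_request() differs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct nfs_page and struct page. */
struct demo_req {
    unsigned long index;   /* file page index the request covers */
    struct demo_req *next; /* next outstanding request for the inode */
};

struct demo_page {
    bool swapcache;               /* page is in the swap cache */
    unsigned long index;          /* file index (cf. page_file_index()) */
    struct demo_req *private_req; /* only meaningful for non-swapcache pages */
};

static struct demo_req *demo_find_request(const struct demo_page *page,
                                          struct demo_req *outstanding)
{
    if (!page->swapcache)
        return page->private_req;

    /* page->private belongs to the swap cache, so search by index instead. */
    for (struct demo_req *r = outstanding; r != NULL; r = r->next)
        if (r->index == page->index)
            return r;
    return NULL;
}
```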
     
  • Replace all relevant occurrences of page->index and page->mapping in the
    NFS client with the new page_file_index() and page_file_mapping()
    functions.

    Signed-off-by: Peter Zijlstra
    Signed-off-by: Mel Gorman
    Acked-by: Rik van Riel
    Cc: Christoph Hellwig
    Cc: David S. Miller
    Cc: Eric B Munson
    Cc: Eric Paris
    Cc: James Morris
    Cc: Mel Gorman
    Cc: Mike Christie
    Cc: Neil Brown
    Cc: Sebastian Andrzej Siewior
    Cc: Trond Myklebust
    Cc: Xiaotian Feng
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mel Gorman
     
  • 09f363c7 ("vmscan: fix shrinker callback bug in fs/super.c") fixed a
    shrinker callback which was returning -1 when nr_to_scan is zero, which
    caused excessive slab scanning. But 635697c6 ("vmscan: fix initial
    shrinker size handling") fixed the problem again, so we can freely return
    -1 even when nr_to_scan is zero. So let's revert 09f363c7, because the
    comment it added imposes an unnecessary rule.

    Signed-off-by: Minchan Kim
    Cc: Al Viro
    Cc: Mikulas Patocka
    Cc: Konstantin Khlebnikov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Minchan Kim
     
  • Use an mmu_gather instead of a temporary linked list for accumulating pages
    when we unmap a hugepage range.

    Signed-off-by: Aneesh Kumar K.V
    Reviewed-by: KAMEZAWA Hiroyuki
    Cc: David Rientjes
    Cc: Hillf Danton
    Cc: Michal Hocko
    Cc: KOSAKI Motohiro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Aneesh Kumar K.V
     
  • Since per-BDI flusher threads were introduced in 2.6, the pdflush
    mechanism is no longer used. But the old interface exported through
    /proc/sys/vm/nr_pdflush_threads still exists and is obviously useless.

    For backward compatibility, print a warning and return 2 to notify
    users that the interface has been removed.

    Signed-off-by: Wanpeng Li
    Cc: Wu Fengguang
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Wanpeng Li
     
  • Pull nfsd changes from J. Bruce Fields:
    "This has been an unusually quiet cycle--mostly bugfixes and cleanup.
    The one large piece is Stanislav's work to containerize the server's
    grace period--but that in itself is just one more step in a
    not-yet-complete project to allow fully containerized nfs service.

    There are a number of outstanding delegation, container, v4 state, and
    gss patches that aren't quite ready yet; 3.7 may be wilder."

    * 'nfsd-next' of git://linux-nfs.org/~bfields/linux: (35 commits)
    NFSd: make boot_time variable per network namespace
    NFSd: make grace end flag per network namespace
    Lockd: move grace period management from lockd() to per-net functions
    LockD: pass actual network namespace to grace period management functions
    LockD: manage grace list per network namespace
    SUNRPC: service request network namespace helper introduced
    NFSd: make nfsd4_manager allocated per network namespace context.
    LockD: make lockd manager allocated per network namespace
    LockD: manage grace period per network namespace
    Lockd: add more debug to host shutdown functions
    Lockd: host complaining function introduced
    LockD: manage used host count per networks namespace
    LockD: manage garbage collection timeout per networks namespace
    LockD: make garbage collector network namespace aware.
    LockD: mark host per network namespace on garbage collect
    nfsd4: fix missing fault_inject.h include
    locks: move lease-specific code out of locks_delete_lock
    locks: prevent side-effects of locks_release_private before file_lock is initialized
    NFSd: set nfsd_serv to NULL after service destruction
    NFSd: introduce nfsd_destroy() helper
    ...

    Linus Torvalds
     
  • Pull Ceph changes from Sage Weil:
    "Lots of stuff this time around:

    - lots of cleanup and refactoring in the libceph messenger code, and
    many hard to hit races and bugs closed as a result.
    - lots of cleanup and refactoring in the rbd code from Alex Elder,
    mostly in preparation for the layering functionality that will be
    coming in 3.7.
    - some misc rbd cleanups from Josh Durgin that are finally going
    upstream
    - support for CRUSH tunables (used by newer clusters to improve the
    data placement)
    - some cleanup in our use of d_parent that Al brought up a while back
    - a random collection of fixes across the tree

    There is another patch coming that fixes up our ->atomic_open()
    behavior, but I'm going to hammer on it a bit more before sending it."

    Fix up conflicts due to commits that were already committed earlier in
    drivers/block/rbd.c, net/ceph/{messenger.c, osd_client.c}

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: (132 commits)
    rbd: create rbd_refresh_helper()
    rbd: return obj version in __rbd_refresh_header()
    rbd: fixes in rbd_header_from_disk()
    rbd: always pass ops array to rbd_req_sync_op()
    rbd: pass null version pointer in add_snap()
    rbd: make rbd_create_rw_ops() return a pointer
    rbd: have __rbd_add_snap_dev() return a pointer
    libceph: recheck con state after allocating incoming message
    libceph: change ceph_con_in_msg_alloc convention to be less weird
    libceph: avoid dropping con mutex before fault
    libceph: verify state after retaking con lock after dispatch
    libceph: revoke mon_client messages on session restart
    libceph: fix handling of immediate socket connect failure
    ceph: update MAINTAINERS file
    libceph: be less chatty about stray replies
    libceph: clear all flags on con_close
    libceph: clean up con flags
    libceph: replace connection state bits with states
    libceph: drop unnecessary CLOSED check in socket state change callback
    libceph: close socket directly from ceph_con_close()
    ...

    Linus Torvalds
     
  • We have no mechanism to emulate LOCK_MAND locks on NFSv4, so explicitly
    return -EINVAL if someone requests it.

    Signed-off-by: Jeff Layton
    Signed-off-by: Trond Myklebust

    Jeff Layton
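    Sketch for the LOCK_MAND entry above, assuming the requested lock type
    arrives as a bitmask in which a LOCK_MAND bit may be set.
    DEMO_LOCK_MAND and demo_nfs4_check_flock() are hypothetical names, and
    the value 32 is assumed to mirror Linux's LOCK_MAND.

```c
#include <assert.h>
#include <errno.h>

#define DEMO_LOCK_MAND 32u /* assumed to mirror Linux's LOCK_MAND */

/* Reject mandatory-style flock() requests that NFSv4 cannot emulate. */
static int demo_nfs4_check_flock(unsigned int fl_type)
{
    if (fl_type & DEMO_LOCK_MAND)
        return -EINVAL;
    return 0; /* ordinary requests continue down the normal lock path */
}
```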
     
  • By default a sunrpc service is limited to (N+3)*20 connections
    where N is the number of threads. This is 80 when N==1.
    If this number is exceeded a warning is printed suggesting that
    the number of threads be increased. However with services which
    run a single thread, this is impossible.

    For such services there is a ->sv_maxconn setting that can be
    used to forcibly increase the limit, and silence the message.
    This is used by lockd.

    The nfs client uses a sunrpc service to handle callbacks and
    it too is single-threaded, so to avoid the useless messages,
    and to allow a reasonable number of concurrent connections,
    we need to set ->sv_maxconn. 1024 seems like a good number.

    Signed-off-by: NeilBrown
    Signed-off-by: Trond Myklebust

    NeilBrown
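    The connection-limit arithmetic from the entry above, as a minimal sketch
    assuming that an unset (zero) ->sv_maxconn means "use the default
    formula"; demo_conn_limit() is a hypothetical helper, not the sunrpc code.

```c
#include <assert.h>

/* Connection limit for a sunrpc-style service with nrthreads threads. */
static unsigned int demo_conn_limit(unsigned int nrthreads,
                                    unsigned int sv_maxconn)
{
    if (sv_maxconn != 0)
        return sv_maxconn;       /* explicit override, e.g. lockd */
    return (nrthreads + 3) * 20; /* default: (N + 3) * 20 */
}
```

    For the single-threaded callback service, demo_conn_limit(1, 0) gives 80,
    while an override of 1024 raises the limit without adding threads.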
     

31 Jul, 2012

21 commits

  • Now that all users are converted, we can remove functions, variables, and
    constants defined by the old freezing mechanism.

    BugLink: https://bugs.launchpad.net/bugs/897421
    Tested-by: Kamal Mostafa
    Tested-by: Peter M. Petrakis
    Tested-by: Dann Frazier
    Tested-by: Massimo Morana
    Signed-off-by: Jan Kara
    Signed-off-by: Al Viro

    Jan Kara
     
  • The only missing piece to make freezing work reliably with ext2 is to
    stop iput() of unlinked inode from deleting the inode on frozen filesystem.
    So add a necessary protection to ext2_evict_inode().

    We also provide appropriate ->freeze_fs and ->unfreeze_fs functions.

    Signed-off-by: Jan Kara
    Signed-off-by: Al Viro

    Jan Kara
     
  • We convert btrfs_file_aio_write() to use the new freeze check. We also add proper
    freeze protection to btrfs_page_mkwrite(). We also add freeze protection to
    the transaction mechanism to avoid starting transactions on a frozen filesystem.
    At minimum this is necessary to stop iput() of an unlinked file from changing a
    frozen filesystem during truncation.

    Checks in cleaner_kthread() and transaction_kthread() can be safely removed
    since btrfs_freeze() will lock the mutexes and thus block the threads (and they
    shouldn't have anything to do anyway).

    CC: linux-btrfs@vger.kernel.org
    CC: Chris Mason
    Signed-off-by: Jan Kara
    Signed-off-by: Al Viro

    Jan Kara
     
  • We change nilfs_page_mkwrite() to provide proper freeze protection for
    writeable page faults (we must wait for frozen filesystem even if the
    page is fully mapped).

    We remove all vfs_check_frozen() checks since they are now handled by
    the generic code.

    CC: linux-nilfs@vger.kernel.org
    CC: KONISHI Ryusuke
    Signed-off-by: Jan Kara
    Signed-off-by: Al Viro

    Jan Kara
     
  • Move check in ntfs_file_aio_write_nolock() to ntfs_file_aio_write() and
    use new freeze protection.

    CC: linux-ntfs-dev@lists.sourceforge.net
    CC: Anton Altaparmakov
    Signed-off-by: Jan Kara
    Signed-off-by: Al Viro

    Jan Kara
     
  • Convert the check in fuse_file_aio_write() to use the new freeze protection.

    CC: fuse-devel@lists.sourceforge.net
    CC: Miklos Szeredi
    Signed-off-by: Jan Kara
    Signed-off-by: Al Viro

    Jan Kara
     
  • We update gfs2_page_mkwrite() to use new freeze protection and the transaction
    code to use freeze protection while the transaction is running. That is needed
    to stop iput() of unlinked file from modifying the filesystem. The rest is
    handled by the generic code.

    CC: cluster-devel@redhat.com
    CC: Steven Whitehouse
    Acked-by: Steven Whitehouse
    Signed-off-by: Jan Kara
    Signed-off-by: Al Viro

    Jan Kara
     
  • Protect ocfs2_page_mkwrite() and ocfs2_file_aio_write() using the new freeze
    protection. We also protect several ioctl entry points which were missing the
    protection. Finally, we add freeze protection to the journaling mechanism so
    that iput() of unlinked inode cannot modify a frozen filesystem.

    CC: Mark Fasheh
    CC: Joel Becker
    CC: ocfs2-devel@oss.oracle.com
    Acked-by: Joel Becker
    Signed-off-by: Jan Kara
    Signed-off-by: Al Viro

    Jan Kara
     
  • Generic code now blocks all writers from standard write paths. So we add
    blocking of all writers coming from ioctl (as a bonus, this also protects
    ioctls against a racing read-only remount) and convert xfs_file_aio_write()
    to a non-racy freeze protection. We also keep freeze protection on transaction
    start to block internal filesystem writes such as removal of preallocated
    blocks.

    CC: Ben Myers
    CC: Alex Elder
    CC: xfs@oss.sgi.com
    Signed-off-by: Jan Kara
    Signed-off-by: Al Viro

    Jan Kara
     
  • We remove most of the frozen checks since the upper layer takes care of blocking all
    writes. We have to handle protection in ext4_page_mkwrite() in a special way
    because we cannot use generic block_page_mkwrite(). Also we add a freeze
    protection to ext4_evict_inode() so that iput() of unlinked inode cannot modify
    a frozen filesystem (we cannot easily instrument ext4_journal_start() /
    ext4_journal_stop() with freeze protection because we are missing the
    superblock pointer in ext4_journal_stop() in nojournal mode).

    CC: linux-ext4@vger.kernel.org
    CC: "Theodore Ts'o"
    BugLink: https://bugs.launchpad.net/bugs/897421
    Tested-by: Kamal Mostafa
    Tested-by: Peter M. Petrakis
    Tested-by: Dann Frazier
    Tested-by: Massimo Morana
    Acked-by: "Theodore Ts'o"
    Signed-off-by: Jan Kara
    Signed-off-by: Al Viro

    Jan Kara
     
  • There are several entry points which dirty pages in a filesystem. mmap
    (handled by block_page_mkwrite()), buffered write (handled by
    __generic_file_aio_write()), splice write (generic_file_splice_write),
    truncate, and fallocate (these can dirty the last partial page, which is handled inside
    each filesystem separately). Protect these places with sb_start_write() and
    sb_end_write().

    ->page_mkwrite() calls are particularly complex since they are called with
    mmap_sem held and thus we cannot use standard sb_start_write() due to lock
    ordering constraints. We solve the problem by using a special freeze protection
    sb_start_pagefault() which ranks below mmap_sem.

    BugLink: https://bugs.launchpad.net/bugs/897421
    Tested-by: Kamal Mostafa
    Tested-by: Peter M. Petrakis
    Tested-by: Dann Frazier
    Tested-by: Massimo Morana
    Signed-off-by: Jan Kara
    Signed-off-by: Al Viro

    Jan Kara
     
  • Blocking reads of a frozen filesystem because of an atime update would be
    unexpected, and handling such blocking would make locking more complex than
    it already is. So skip the atime update when the filesystem is frozen, as we
    already do when the filesystem is remounted read-only.

    BugLink: https://bugs.launchpad.net/bugs/897421
    Tested-by: Kamal Mostafa
    Tested-by: Peter M. Petrakis
    Tested-by: Dann Frazier
    Tested-by: Massimo Morana
    Signed-off-by: Jan Kara
    Signed-off-by: Al Viro

    Jan Kara
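    Sketch for the atime entry above, assuming a superblock model with a
    read-only flag and a frozen flag; struct demo_sb and
    demo_should_update_atime() are hypothetical. A frozen filesystem is
    treated like a read-only one: the reader skips the update instead of
    blocking.

```c
#include <assert.h>
#include <stdbool.h>

struct demo_sb {
    bool read_only; /* mounted or remounted read-only */
    bool frozen;    /* freezing in progress or complete */
};

static bool demo_should_update_atime(const struct demo_sb *sb)
{
    if (sb->read_only)
        return false;
    if (sb->frozen)
        return false; /* skip rather than block the reader */
    return true;
}
```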
     
  • Most places where we want freeze protection coincide with the places where
    we also have remount-ro protection. So make mnt_want_write() and
    mnt_drop_write() (and their _file variants) prevent freezing as well.
    For the few cases that are really interested only in remount-ro protection,
    provide new function variants.

    BugLink: https://bugs.launchpad.net/bugs/897421
    Tested-by: Kamal Mostafa
    Tested-by: Peter M. Petrakis
    Tested-by: Dann Frazier
    Tested-by: Massimo Morana
    Signed-off-by: Jan Kara
    Signed-off-by: Al Viro

    Jan Kara
     
  • vfs_check_frozen() tests are racy since the filesystem can be frozen just after
    the test is performed. Thus in write paths we can end up marking some pages or
    inodes dirty even though the file system is already frozen. This creates
    problems with the flusher thread hanging on a frozen filesystem.

    Another problem is that exclusion between ->page_mkwrite() and filesystem
    freezing has been handled by setting the page dirty and then verifying s_frozen.
    This guaranteed that either the freezing code sees the faulted page, writes it,
    and writeprotects it again, or we see s_frozen set and bail out of the page fault.
    This protects against a page being marked writeable while filesystem
    freezing is running, but has the unpleasant side effect of leaving dirty (although
    unmodified and writeprotected) pages on a frozen filesystem, resulting in
    problems with the flusher thread similar to the first problem.

    This patch aims at providing exclusion between write paths and filesystem
    freezing. We implement a writer-freeze read-write semaphore in the superblock.
    Actually, there are three such semaphores because of lock ranking reasons: one
    for page fault handlers (->page_mkwrite), one for all other writers, and one for
    internal filesystem purposes (used e.g. to track running transactions). Write
    paths which should block freezing (e.g. directory operations, ->aio_write(),
    ->page_mkwrite) hold the reader side of the semaphore. Code freezing the
    filesystem takes the writer side.

    However, we don't want to bounce the semaphores' cachelines between CPUs on
    every write. So we implement the reader side of the semaphore as a per-cpu
    counter, and the writer side using the s_writers.frozen superblock field.

    [AV: microoptimize sb_start_write(); we want it fast in normal case]

    BugLink: https://bugs.launchpad.net/bugs/897421
    Tested-by: Kamal Mostafa
    Tested-by: Peter M. Petrakis
    Tested-by: Dann Frazier
    Tested-by: Massimo Morana
    Signed-off-by: Jan Kara
    Signed-off-by: Al Viro

    Jan Kara
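    A single-threaded model of the freezing scheme described in the entry
    above, as a sketch only: each freeze level has a reader count split into
    per-CPU slots, and the freezer raises the frozen level, after which it may
    proceed only once that level's slots sum to zero. The level order
    (writers, page faults, internal filesystem writers) follows the
    description; all names, the fixed CPU count, and returning false instead
    of sleeping are assumptions, and the memory barriers a concurrent version
    needs are left out.

```c
#include <assert.h>
#include <stdbool.h>

#define DEMO_NR_CPUS 4

/* Freeze levels, blocked in this order as freezing progresses. */
enum {
    DEMO_UNFROZEN = 0,
    DEMO_FREEZE_WRITE = 1,     /* ordinary writers (cf. sb_start_write) */
    DEMO_FREEZE_PAGEFAULT = 2, /* ->page_mkwrite (cf. sb_start_pagefault) */
    DEMO_FREEZE_FS = 3,        /* internal fs writers, e.g. transactions */
};

struct demo_sb_writers {
    int frozen;                               /* writer side */
    long count[DEMO_FREEZE_FS][DEMO_NR_CPUS]; /* reader side, per level */
};

/* Reader side: enter unless freezing has reached this level. */
static bool demo_sb_start(struct demo_sb_writers *w, int level, int cpu)
{
    if (w->frozen >= level)
        return false;           /* the real code would sleep until thawed */
    w->count[level - 1][cpu]++; /* touches only this CPU's slot */
    return true;
}

/* May run on a different CPU than the matching demo_sb_start(). */
static void demo_sb_end(struct demo_sb_writers *w, int level, int cpu)
{
    w->count[level - 1][cpu]--;
}

/* Only the freezer pays for summing across all CPUs. */
static long demo_sb_sum(const struct demo_sb_writers *w, int level)
{
    long sum = 0;
    for (int cpu = 0; cpu < DEMO_NR_CPUS; cpu++)
        sum += w->count[level - 1][cpu];
    return sum;
}

/* Block the next level; true if no holders of that level remain. */
static bool demo_freeze_next_level(struct demo_sb_writers *w)
{
    assert(w->frozen < DEMO_FREEZE_FS);
    w->frozen++;
    return demo_sb_sum(w, w->frozen) == 0;
}
```

    The cacheline argument shows up in demo_sb_start(): a writer touches only
    its own CPU's slot, and only the rare freezer walks all of them. A slot
    can go negative when a writer finishes on another CPU; only the sum is
    meaningful.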
     
  • Pull writeback updates from Wu Fengguang:
    "Use time based periods to age the writeback proportions, which can
    adapt equally well to fast/slow devices."

    Fix up trivial conflict in comment in fs/sync.c

    * tag 'writeback-proportions' of git://git.kernel.org/pub/scm/linux/kernel/git/wfg/linux:
    writeback: Fix some comment errors
    block: Convert BDI proportion calculations to flexible proportions
    lib: Fix possible deadlock in flexible proportion code
    lib: Proportions with flexible period

    Linus Torvalds
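    A rough sketch of time-based aging for the writeback proportions pull
    above, not the kernel's lib/flex_proportions.c: a global period counter
    advances on a timer, each new period halves the global event count, and
    each device lazily halves its own count once for every period it missed.
    The names and the exact halving rule are assumptions of the sketch.

```c
#include <assert.h>

struct demo_prop_global {
    unsigned long period;      /* advanced by a timer, not by event counts */
    unsigned long long events; /* aged total over all devices */
};

struct demo_prop_local {
    unsigned long period;      /* last period this device was aged to */
    unsigned long long events;
};

/* Timer tick: start a new period and halve the global history. */
static void demo_new_period(struct demo_prop_global *g)
{
    g->events >>= 1;
    g->period++;
}

/* Lazily apply the halvings a device missed while it was idle. */
static void demo_age_local(const struct demo_prop_global *g,
                           struct demo_prop_local *l)
{
    unsigned long missed = g->period - l->period;

    l->events = missed >= 64 ? 0 : l->events >> missed;
    l->period = g->period;
}

/* Record one completed writeback event for a device. */
static void demo_event(struct demo_prop_global *g, struct demo_prop_local *l)
{
    demo_age_local(g, l);
    l->events++;
    g->events++;
}

/* The device's share of recent events, as num/den. */
static void demo_fraction(const struct demo_prop_global *g,
                          struct demo_prop_local *l,
                          unsigned long long *num, unsigned long long *den)
{
    demo_age_local(g, l);
    *num = l->events;
    *den = g->events ? g->events : 1;
}
```

    Because periods are tied to time rather than to a number of completed
    events, a slow device's history fades at the same wall-clock rate as a
    fast one's.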
     
  • Pull NFS client updates from Trond Myklebust:
    "Features include:
    - More preparatory patches for modularising NFSv2/v3/v4. Split out
    the various NFSv2/v3/v4-specific code into separate files
    - More preparation for the NFSv4 migration code
    - Ensure that OPEN(O_CREATE) observes the pNFS mds threshold
    parameters
    - pNFS fast failover when the data servers are down
    - Various cleanups and debugging patches"

    * tag 'nfs-for-3.6-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (67 commits)
    nfs: fix fl_type tests in NFSv4 code
    NFS: fix pnfs regression with directio writes
    NFS: fix pnfs regression with directio reads
    sunrpc: clnt: Add missing braces
    nfs: fix stub return type warnings
    NFS: exit_nfs_v4() shouldn't be an __exit function
    SUNRPC: Add a missing spin_unlock to gss_mech_list_pseudoflavors
    NFS: Split out NFS v4 client functions
    NFS: Split out the NFS v4 filesystem types
    NFS: Create a single nfs_clone_super() function
    NFS: Split out NFS v4 server creating code
    NFS: Initialize the NFS v4 client from init_nfs_v4()
    NFS: Move the v4 getroot code to nfs4getroot.c
    NFS: Split out NFS v4 file operations
    NFS: Initialize v4 sysctls from nfs_init_v4()
    NFS: Create an init_nfs_v4() function
    NFS: Split out NFS v4 inode operations
    NFS: Split out NFS v3 inode operations
    NFS: Split out NFS v2 inode operations
    NFS: Clean up nfs4_proc_setclientid() and friends
    ...

    Linus Torvalds
     
  • There are two structures in which a count of snapshots is
    maintained:

        struct ceph_snap_context {
            ...
            u32 num_snaps;
            ...
        };

    and

        struct ceph_snap_realm {
            ...
            u32 num_prior_parent_snaps; /* had prior to parent_since */
            ...
            u32 num_snaps;
            ...
        };

    These fields never take on negative values (e.g., to hold special
    meaning), and so are really inherently unsigned. Furthermore they
    take their value from over-the-wire or on-disk formatted 32-bit
    values.

    So change their definition to have type u32, and change some spots
    elsewhere in the code to account for this change.

    Signed-off-by: Alex Elder
    Reviewed-by: Josh Durgin

    Alex Elder
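    Sketch for the u32 snapshot-count entry above, assuming the count arrives
    as a 32-bit little-endian field (the byte order is an assumption here):
    decoding into uint32_t can never yield a negative count, so a single
    upper-bound check suffices before sizing an array of 64-bit snapshot ids.
    demo_decode_u32_le() and demo_snap_count_ok() are hypothetical helpers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decode a 32-bit little-endian value from a wire/disk buffer. */
static uint32_t demo_decode_u32_le(const unsigned char *p)
{
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/*
 * With an unsigned count there is no "< 0" case to forget: one bound
 * check decides whether num_snaps 64-bit ids fit in max_bytes.
 */
static int demo_snap_count_ok(uint32_t num_snaps, size_t max_bytes)
{
    return num_snaps <= max_bytes / sizeof(uint64_t);
}
```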
     
  • We re-run the loop, but we don't reset the attrs pointer back to NULL.

    Signed-off-by: Alan Cox
    Reviewed-by: Alex Elder

    Alan Cox
     
  • When we detect a mds session reset, close the old ceph_connection before
    reopening it. This ensures we clean up the old socket properly and keep
    the ceph_connection state correct.

    Signed-off-by: Sage Weil
    Reviewed-by: Alex Elder
    Reviewed-by: Yehuda Sadeh

    Sage Weil
     
  • Merge Andrew's first set of patches:
    "Non-MM patches:

    - lots of misc bits

    - tree-wide have_clk() cleanups

    - quite a lot of printk tweaks. I draw your attention to "printk:
    convert the format for KERN_ to a 2 byte pattern" which
    looks a bit scary. But afaict it's solid.

    - backlight updates

    - lib/ feature work (notably the addition and use of memweight())

    - checkpatch updates

    - rtc updates

    - nilfs updates

    - fatfs updates (partial, still waiting for acks)

    - kdump, proc, fork, IPC, sysctl, taskstats, pps, etc

    - new fault-injection feature work"

    * Merge emailed patches from Andrew Morton: (128 commits)
    drivers/misc/lkdtm.c: fix missing allocation failure check
    lib/scatterlist: do not re-write gfp_flags in __sg_alloc_table()
    fault-injection: add tool to run command with failslab or fail_page_alloc
    fault-injection: add selftests for cpu and memory hotplug
    powerpc: pSeries reconfig notifier error injection module
    memory: memory notifier error injection module
    PM: PM notifier error injection module
    cpu: rewrite cpu-notifier-error-inject module
    fault-injection: notifier error injection
    c/r: fcntl: add F_GETOWNER_UIDS option
    resource: make sure requested range is included in the root range
    include/linux/aio.h: cpp->C conversions
    fs: cachefiles: add support for large files in filesystem caching
    pps: return PTR_ERR on error in device_create
    taskstats: check nla_reserve() return
    sysctl: suppress kmemleak messages
    ipc: use Kconfig options for __ARCH_WANT_[COMPAT_]IPC_PARSE_VERSION
    ipc: compat: use signed size_t types for msgsnd and msgrcv
    ipc: allow compat IPC version field parsing if !ARCH_WANT_OLD_COMPAT_IPC
    ipc: add COMPAT_SHMLBA support
    ...

    Linus Torvalds
     
  • When we restore file descriptors we would like them to look exactly as
    they were at dumping time.

    With the help of fcntl this is almost possible; the missing piece is the
    file owner's UIDs.

    To be able to read their values, the F_GETOWNER_UIDS option is introduced.

    This option is valid only if CONFIG_CHECKPOINT_RESTORE is turned on;
    otherwise it returns -EINVAL.

    Signed-off-by: Cyrill Gorcunov
    Acked-by: "Eric W. Biederman"
    Cc: "Serge E. Hallyn"
    Cc: Oleg Nesterov
    Cc: Pavel Emelyanov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Cyrill Gorcunov