10 Jan, 2015

3 commits

  • Pull two nfsd bugfixes from Bruce Fields.

    * 'for-3.19' of git://linux-nfs.org/~bfields/linux:
    rpc: fix xdr_truncate_encode to handle buffer ending on page boundary
    nfsd: fix fi_delegees leak when fi_had_conflict returns true

    Linus Torvalds
     
  • Pull two Ceph fixes from Sage Weil:
    "These are both pretty trivial: a sparse warning fix and size_t printk
    thing"

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client:
    libceph: fix sparse endianness warnings
    ceph: use %zu for len in ceph_fill_inline_data()

    Linus Torvalds
     
  • Pull btrfs fixes from Chris Mason:
    "None of these are huge, but my commit does fix a regression from 3.18
    that could cause lost files during log replay.

    This also adds Dave Sterba to the list of Btrfs maintainers. It
    doesn't mean we're doing things differently, but Dave has really been
    helping with the maintainer workload for years"

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
    Btrfs: don't delay inode ref updates during log replay
    Btrfs: correctly get tree level in tree_backref_for_extent
    Btrfs: call inode_dec_link_count() on mkdir error path
    Btrfs: abort transaction if we don't find the block group
    Btrfs, scrub: uninitialized variable in scrub_extent_for_parity()
    Btrfs: add more maintainers

    Linus Torvalds
     

09 Jan, 2015

4 commits

  • Fix clashing values for O_PATH and FMODE_NONOTIFY on sparc. The
    clashing O_PATH value was added in commit 5229645bdc35 ("vfs: add
    nonconflicting values for O_PATH") but this can't be changed as it is
    user-visible.

    FMODE_NONOTIFY is only used internally in the kernel, but it is in the
    same numbering space as the other O_* flags, as indicated by the comment
    at the top of include/uapi/asm-generic/fcntl.h (and its use in
    fs/notify/fanotify/fanotify_user.c). So renumber it to avoid the clash.

    All of this has happened before (commit 12ed2e36c98a: "fanotify:
    FMODE_NONOTIFY and __O_SYNC in sparc conflict"), and all of this will
    happen again -- so update the uniqueness check in fcntl_init() to
    include __FMODE_NONOTIFY.

    Signed-off-by: David Drysdale
    Acked-by: David S. Miller
    Acked-by: Jan Kara
    Cc: Heinrich Schuchardt
    Cc: Alexander Viro
    Cc: Arnd Bergmann
    Cc: Stephen Rothwell
    Cc: Eric Paris
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Drysdale
     
  • In ocfs2_link(), the parent directory inode passed to function
    ocfs2_lookup_ino_from_name() is wrong. Parameter dir is the parent of
    new_dentry not old_dentry. We should get old_dir from old_dentry and
    lookup old_dentry in old_dir in case another node remove the old dentry.

    With this change, hard linking works again, when paths are relative with
    at least one subdirectory. This is how the problem was reproducable:

    # mkdir a
    # mkdir b
    # touch a/test
    # ln a/test b/test
    ln: failed to create hard link `b/test' => `a/test': No such file or directory

    However when creating links in the same dir, it worked well.

    Now the link gets created.

    Fixes: 0e048316ff57 ("ocfs2: check existence of old dentry in ocfs2_link()")
    Signed-off-by: joyce.xue
    Reported-by: Szabo Aron - UBIT
    Cc: Mark Fasheh
    Cc: Joel Becker
    Tested-by: Aron Szabo
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Xue jiufei
     
  • In dlm_process_recovery_data, only when dlm_new_lock failed the ret will
    be set to -ENOMEM. And in this case, newlock is definitely NULL. So
    test newlock is meaningless, remove it.

    Signed-off-by: Joseph Qi
    Reviewed-by: Alex Chen
    Reviewed-by: Mark Fasheh
    Cc: Joel Becker
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Joseph Qi
     
  • len is size_t, should be printed with %zu.

    Signed-off-by: Ilya Dryomov

    Ilya Dryomov
     

08 Jan, 2015

1 commit

  • Currently, nfs4_set_delegation takes a reference to an existing
    delegation and then checks to see if there is a conflict. If there is
    one, then it doesn't release that reference.

    Change the code to take the reference after the check and only if there
    is no conflict.

    Signed-off-by: Jeff Layton
    Cc: stable@vger.kernel.org
    Signed-off-by: J. Bruce Fields

    Jeff Layton
     

07 Jan, 2015

1 commit


03 Jan, 2015

7 commits

  • Signed-off-by: Jakub Wilk
    Signed-off-by: Theodore Ts'o

    Jakub Wilk
     
  • This reverts commit 14516bb7bb6ffbd49f35389f9ece3b2045ba5815.

    This was causing regression test failures with generic/285 with an ext3
    filesystem using CONFIG_EXT4_USE_FOR_EXT23.

    Signed-off-by: Theodore Ts'o

    Theodore Ts'o
     
  • Commit 1d52c78afbb (Btrfs: try not to ENOSPC on log replay) added a
    check to skip delayed inode updates during log replay because it
    confuses the enospc code. But the delayed processing will end up
    ignoring delayed refs from log replay because the inode itself wasn't
    put through the delayed code.

    This can end up triggering a warning at commit time:

    WARNING: CPU: 2 PID: 778 at fs/btrfs/delayed-inode.c:1410 btrfs_assert_delayed_root_empty+0x32/0x34()

    Which is repeated for each commit because we never process the delayed
    inode ref update.

    The fix used here is to change btrfs_delayed_delete_inode_ref to return
    an error if we're currently in log replay. The caller will do the ref
    deletion immediately and everything will work properly.

    Signed-off-by: Chris Mason
    cc: stable@vger.kernel.org # v3.18 and any stable series that picked 1d52c78afbbf80b58299e076a159617d6b42fe3c

    Chris Mason
     
  • If we are using skinny metadata, the block's tree level is in the offset
    of the key and not in a btrfs_tree_block_info structure following the
    extent item (it doesn't exist). Therefore fix it.

    Besides returning the correct level in the tree, this also prevents reading
    past the leaf's end in the case where the extent item is the last item in
    the leaf (eb) and it has only 1 inline reference - this is because
    sizeof(struct btrfs_tree_block_info) is greater than
    sizeof(struct btrfs_extent_inline_ref).

    Got it while running a scrub which produced the following warning:

    BTRFS: checksum error at logical 42123264 on dev /dev/sde, sector 15840: metadata node (level 24) in tree 5

    Signed-off-by: Filipe Manana
    Reviewed-by: Satoru Takeuchi
    Signed-off-by: Chris Mason

    Filipe Manana
     
  • In btrfs_mkdir(), if it fails to create dir, we should
    clean up existed items, setting inode's link properly
    to make sure it could be cleaned up properly.

    Signed-off-by: Wang Shilong
    Signed-off-by: Chris Mason

    Wang Shilong
     
  • We shouldn't BUG_ON() if there is corruption. I hit this while testing my block
    group patch and the abort worked properly. Thanks,

    Signed-off-by: Josef Bacik
    Signed-off-by: Chris Mason

    Josef Bacik
     
  • The only way that "ret" is set is when we call scrub_pages_for_parity()
    so the skip to "if (ret) " test doesn't make sense and causes a static
    checker warning.

    Signed-off-by: Dan Carpenter
    Signed-off-by: Chris Mason

    Dan Carpenter
     

30 Dec, 2014

2 commits


27 Dec, 2014

1 commit

  • Prevent BUG or corrupted file systems after the following:

    mkfs.ext4 /dev/vdc 100M
    mount -t ext4 -o sb=40961 /dev/vdc /vdc
    resize2fs /dev/vdc

    We previously prevented online resizing using the old resize ioctl.
    Move the code to ext4_resize_begin(), so the check applies for all of
    the resize ioctl's.

    Reported-by: Maxim Malkov
    Signed-off-by: Theodore Ts'o

    Theodore Ts'o
     

23 Dec, 2014

1 commit

  • In spite of different file type,
    if file is same name and same inode number, old inode cache is used.
    This causes that you can not cd directory, can not cat SymbolicLink.
    So this patch is that if file type is different, return error.

    Reproducible sample :
    1. create file 'a' at cifs client.
    2. repeat rm and mkdir 'a' 4 times at server, then direcotry 'a' having same inode number is created.
    (Repeat 4 times, then same inode number is recycled.)
    (When server is under RHEL 6.6, 1 time is O.K. Always same inode number is recycled.)
    3. ls -li at client, then you can not cd directory, can not remove directory.

    SymbolicLink has same problem.

    Bug link:
    https://bugzilla.kernel.org/show_bug.cgi?id=90011

    Signed-off-by: Nakajima Akira
    Acked-by: Jeff Layton
    Signed-off-by: Steve French

    Nakajima Akira
     

22 Dec, 2014

2 commits

  • Replace repeated dereferences like dir->i_sb by storing superblock
    pointer in a variable and using that.

    Signed-off-by: Jan Kara

    Jan Kara
     
  • Check that length specified in a component of a symlink fits in the
    input buffer we are reading. Also properly ignore component length for
    component types that do not use it. Otherwise we read memory after end
    of buffer for corrupted udf image.

    Reported-by: Carl Henrik Lunde
    CC: stable@vger.kernel.org
    Signed-off-by: Jan Kara

    Jan Kara
     

20 Dec, 2014

4 commits

  • Pull vfs pile #3 from Al Viro:
    "Assorted fixes and patches from the last cycle"

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
    [regression] chunk lost from bd9b51
    vfs: make mounts and mountstats honor root dir like mountinfo does
    vfs: cleanup show_mountinfo
    init: fix read-write root mount
    unfuck binfmt_misc.c (broken by commit e6084d4)
    vm_area_operations: kill ->migrate()
    new helper: iter_is_iovec()
    move_extent_per_page(): get rid of unused w_flags
    lustre: get rid of playing with ->fs
    btrfs: filp_open() returns ERR_PTR() on failure, not NULL...

    Linus Torvalds
     
  • …/git/tyhicks/ecryptfs

    Pull eCryptfs fixes from Tyler Hicks:
    "Fixes for filename decryption and encrypted view plus a cleanup

    - The filename decryption routines were, at times, writing a zero
    byte one character past the end of the filename buffer

    - The encrypted view feature attempted, and failed, to roll its own
    form of enforcing a read-only mount instead of letting the VFS
    enforce it"

    * tag 'ecryptfs-3.19-rc1-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tyhicks/ecryptfs:
    eCryptfs: Remove buggy and unnecessary write in file name decode routine
    eCryptfs: Remove unnecessary casts when parsing packet lengths
    eCryptfs: Force RO mount when encrypted view is enabled

    Linus Torvalds
     
  • Pull more btrfs updates from Chris Mason:
    "This is part two of our merge window patches.

    These are all from Filipe, and fix some really hard to find races that
    can cause corruptions. Most of them involved block group removal
    (balance) or discard"

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
    Btrfs: remove non-sense btrfs_error_discard_extent() function
    Btrfs: fix fs corruption on transaction abort if device supports discard
    Btrfs: always clear a block group node when removing it from the tree
    Btrfs: ensure deletion from pinned_chunks list is protected

    Linus Torvalds
     
  • Pull irq core fix from Thomas Gleixner:
    "A single fix plugging a long standing race between proc/stat and
    proc/interrupts access and freeing of interrupt descriptors"

    * 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    genirq: Prevent proc race against freeing of irq descriptors

    Linus Torvalds
     

19 Dec, 2014

11 commits

  • Symlink reading code does not check whether the resulting path fits into
    the page provided by the generic code. This isn't as easy as just
    checking the symlink size because of various encoding conversions we
    perform on path. So we have to check whether there is still enough space
    in the buffer on the fly.

    CC: stable@vger.kernel.org
    Reported-by: Carl Henrik Lunde
    Signed-off-by: Jan Kara

    Jan Kara
     
  • UDF specification allows arbitrarily large symlinks. However we support
    only symlinks at most one block large. Check the length of the symlink
    so that we don't access memory beyond end of the symlink block.

    CC: stable@vger.kernel.org
    Reported-by: Carl Henrik Lunde
    Signed-off-by: Jan Kara

    Jan Kara
     
  • Verify that inode size is sane when loading inode with data stored in
    ICB. Otherwise we may get confused later when working with the inode and
    inode size is too big.

    CC: stable@vger.kernel.org
    Reported-by: Carl Henrik Lunde
    Signed-off-by: Jan Kara

    Jan Kara
     
  • We didn't check length of rock ridge ER records before printing them.
    Thus corrupted isofs image can cause us to access and print some memory
    behind the buffer with obvious consequences.

    Reported-and-tested-by: Carl Henrik Lunde
    CC: stable@vger.kernel.org
    Signed-off-by: Jan Kara

    Jan Kara
     
  • Merge misc patches from Andrew Morton:
    "A few stragglers"

    * emailed patches from Andrew Morton :
    tools/testing/selftests/Makefile: alphasort the TARGETS list
    mm/zsmalloc: adjust order of functions
    ocfs2: fix journal commit deadlock
    ocfs2/dlm: fix race between dispatched_work and dlm_lockres_grab_inflight_worker
    ocfs2: reflink: fix slow unlink for refcounted file
    mm/memory.c:do_shared_fault(): add comment
    .mailmap: Santosh Shilimkar has moved
    .mailmap: update akpm@osdl.org
    lib/show_mem.c: add cma reserved information
    fs/proc/meminfo.c: include cma info in proc/meminfo
    mm: cma: split cma-reserved in dmesg log
    hfsplus: fix longname handling
    mm/mempolicy.c: remove unnecessary is_valid_nodemask()

    Linus Torvalds
     
  • For buffer write, page lock will be got in write_begin and released in
    write_end, in ocfs2_write_end_nolock(), before it unlock the page in
    ocfs2_free_write_ctxt(), it calls ocfs2_run_deallocs(), this will ask
    for the read lock of journal->j_trans_barrier. Holding page lock and
    ask for journal->j_trans_barrier breaks the locking order.

    This will cause a deadlock with journal commit threads, ocfs2cmt will
    get write lock of journal->j_trans_barrier first, then it wakes up
    kjournald2 to do the commit work, at last it waits until done. To
    commit journal, kjournald2 needs flushing data first, it needs get the
    cache page lock.

    Since some ocfs2 cluster locks are holding by write process, this
    deadlock may hung the whole cluster.

    unlock pages before ocfs2_run_deallocs() can fix the locking order, also
    put unlock before ocfs2_commit_trans() to make page lock is unlocked
    before j_trans_barrier to preserve unlocking order.

    Signed-off-by: Junxiao Bi
    Reviewed-by: Wengang Wang
    Cc:
    Reviewed-by: Mark Fasheh
    Cc: Joel Becker
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Junxiao Bi
     
  • Commit ac4fef4d23ed ("ocfs2/dlm: do not purge lockres that is queued for
    assert master") may have the following possible race case:

    dlm_dispatch_assert_master dlm_wq
    ========================================================================
    queue_work(dlm->quedlm_worker,
    &dlm->dispatched_work);
    dispatch work,
    dlm_lockres_drop_inflight_worker
    *BUG_ON(res->inflight_assert_workers == 0)*
    dlm_lockres_grab_inflight_worker
    inflight_assert_workers++

    So ensure inflight_assert_workers to be increased first.

    Signed-off-by: Joseph Qi
    Signed-off-by: Xue jiufei
    Cc: Joel Becker
    Reviewed-by: Mark Fasheh
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Joseph Qi
     
  • When running ocfs2 test suite multiple nodes reflink stress test, for a
    4 nodes cluster, every unlink() for refcounted file needs about 700s.

    The slow unlink is caused by the contention of refcount tree lock since
    all nodes are unlink files using the same refcount tree. When the
    unlinking file have many extents(over 1600 in our test), most of the
    extents has refcounted flag set. In ocfs2_commit_truncate(), it will
    execute the following call trace for every extents. This means it needs
    get and released refcount tree lock about 1600 times. And when several
    nodes are do this at the same time, the performance will be very low.

    ocfs2_remove_btree_range()
    -- ocfs2_lock_refcount_tree()
    ---- ocfs2_refcount_lock()
    ------ __ocfs2_cluster_lock()

    ocfs2_refcount_lock() is costly, move it to ocfs2_commit_truncate() to
    do lock/unlock once can improve a lot performance.

    Signed-off-by: Junxiao Bi
    Cc: Wengang
    Reviewed-by: Mark Fasheh
    Cc: Joel Becker
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Junxiao Bi
     
  • This patch include CMA info (CMATotal, CMAFree) in /proc/meminfo.
    Currently, in a CMA enabled system, if somebody wants to know the total
    CMA size declared, there is no way to tell, other than the dmesg or
    /var/log/messages logs.

    With this patch we are showing the CMA info as part of meminfo, so that it
    can be determined at any point of time. This will be populated only when
    CMA is enabled.

    Below is the sample output from a ARM based device with RAM:512MB and CMA:16MB.

    MemTotal: 471172 kB
    MemFree: 111712 kB
    MemAvailable: 271172 kB
    .
    .
    .
    CmaTotal: 16384 kB
    CmaFree: 6144 kB

    This patch also fix below checkpatch errors that were found during these changes.

    ERROR: space required after that ',' (ctx:ExV)
    199: FILE: fs/proc/meminfo.c:199:
    + ,atomic_long_read(&num_poisoned_pages) << (PAGE_SHIFT - 10)
    ^

    ERROR: space required after that ',' (ctx:ExV)
    202: FILE: fs/proc/meminfo.c:202:
    + ,K(global_page_state(NR_ANON_TRANSPARENT_HUGEPAGES) *
    ^

    ERROR: space required after that ',' (ctx:ExV)
    206: FILE: fs/proc/meminfo.c:206:
    + ,K(totalcma_pages)
    ^

    total: 3 errors, 0 warnings, 2 checks, 236 lines checked

    Signed-off-by: Pintu Kumar
    Signed-off-by: Vishnu Pratap Singh
    Acked-by: Michal Nazarewicz
    Cc: Rafael Aquini
    Cc: Jerome Marchand
    Cc: Marek Szyprowski
    Cc: Joonsoo Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pintu Kumar
     
  • Longname is not correctly handled by hfsplus driver. If an attempt to
    create a longname(>255) file/directory is made, it succeeds by creating a
    file/directory with HFSPLUS_MAX_STRLEN and incorrect catalog key. Thus
    leaving the volume in an inconsistent state. This patch fixes this issue.

    Although lookup is always called first to create a negative entry, so just
    doing a check in lookup would probably fix this issue. I choose to
    propagate error to other iops as well.

    Please NOTE: I have factored out hfsplus_cat_build_key_with_cnid from
    hfsplus_cat_build_key, to avoid unncessary branching.

    Thanks a lot.

    TEST:
    ------
    dir="TEST_DIR"
    cdir=`pwd`
    name255="_123456789_123456789_123456789_123456789_123456789_123456789\
    _123456789_123456789_123456789_123456789_123456789_123456789_123456789\
    _123456789_123456789_123456789_123456789_123456789_123456789_123456789\
    _123456789_123456789_123456789_123456789_123456789_1234"
    name256="${name255}5"

    mkdir $dir
    cd $dir
    touch $name255
    rm -f $name255
    touch $name256
    ls -la
    cd $cdir
    rm -rf $dir

    RESULT:
    -------
    [sougata@ultrabook tmp]$ cdir=`pwd`
    [sougata@ultrabook tmp]$
    name255="_123456789_123456789_123456789_123456789_123456789_123456789\
    > _123456789_123456789_123456789_123456789_123456789_123456789_123456789\
    > _123456789_123456789_123456789_123456789_123456789_123456789_123456789\
    > _123456789_123456789_123456789_123456789_123456789_1234"
    [sougata@ultrabook tmp]$ name256="${name255}5"
    [sougata@ultrabook tmp]$
    [sougata@ultrabook tmp]$ mkdir $dir
    [sougata@ultrabook tmp]$ cd $dir
    [sougata@ultrabook TEST_DIR]$ touch $name255
    [sougata@ultrabook TEST_DIR]$ rm -f $name255
    [sougata@ultrabook TEST_DIR]$ touch $name256
    [sougata@ultrabook TEST_DIR]$ ls -la
    ls: cannot access
    _123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_1234:
    No such file or directory
    total 0
    drwxrwxr-x 1 sougata sougata 3 Feb 20 19:56 .
    drwxrwxrwx 1 root root 6 Feb 20 19:56 ..
    -????????? ? ? ? ? ?
    _123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_123456789_1234
    [sougata@ultrabook TEST_DIR]$ cd $cdir
    [sougata@ultrabook tmp]$ rm -rf $dir
    rm: cannot remove `TEST_DIR': Directory not empty

    -ENAMETOOLONG returned from hfsplus_asc2uni was not propaged to iops.
    This allowed hfsplus to create files/directories with HFSPLUS_MAX_STRLEN
    and incorrect keys, leaving the FS in an inconsistent state. This patch
    fixes this issue.

    Signed-off-by: Sougata Santra
    Reviewed-by: Christoph Hellwig
    Cc: Vyacheslav Dubeyko
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Sougata Santra
     
  • While reviewing the code of umount_tree I realized that when we append
    to a preexisting unmounted list we do not change pprev of the former
    first item in the list.

    Which means later in namespace_unlock hlist_del_init(&mnt->mnt_hash) on
    the former first item of the list will stomp unmounted.first leaving
    it set to some random mount point which we are likely to free soon.

    This isn't likely to hit, but if it does I don't know how anyone could
    track it down.

    [ This happened because we don't have all the same operations for
    hlist's as we do for normal doubly-linked lists. In particular,
    list_splice() is easy on our standard doubly-linked lists, while
    hlist_splice() doesn't exist and needs both start/end entries of the
    hlist. And commit 38129a13e6e7 incorrectly open-coded that missing
    hlist_splice().

    We should think about making these kinds of "mindless" conversions
    easier to get right by adding the missing hlist helpers - Linus ]

    Fixes: 38129a13e6e71f666e0468e99fdd932a687b4d7e switch mnt_hash to hlist
    Cc: stable@vger.kernel.org
    Signed-off-by: "Eric W. Biederman"
    Signed-off-by: Linus Torvalds

    Eric W. Biederman
     

18 Dec, 2014

3 commits

  • Neither Sage nor I noticed that Zheng Yan had mistakenly committed
    fs/ceph/super.h.rej as part of commit 31c542a199d7 ("ceph: add inline
    data to pagecache").

    Remove it.

    Requested-by: Yan, Zheng
    Cc: Sage Weil
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     
  • Pull ceph updates from Sage Weil:
    "The big item here is support for inline data for CephFS and for
    message signatures from Zheng. There are also several bug fixes,
    including interrupted flock request handling, 0-length xattrs, mksnap,
    cached readdir results, and a message version compat field. Finally
    there are several cleanups from Ilya, Dan, and Markus.

    Note that there is another series coming soon that fixes some bugs in
    the RBD 'lingering' requests, but it isn't quite ready yet"

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: (27 commits)
    ceph: fix setting empty extended attribute
    ceph: fix mksnap crash
    ceph: do_sync is never initialized
    libceph: fixup includes in pagelist.h
    ceph: support inline data feature
    ceph: flush inline version
    ceph: convert inline data to normal data before data write
    ceph: sync read inline data
    ceph: fetch inline data when getting Fcr cap refs
    ceph: use getattr request to fetch inline data
    ceph: add inline data to pagecache
    ceph: parse inline data in MClientReply and MClientCaps
    libceph: specify position of extent operation
    libceph: add CREATE osd operation support
    libceph: add SETXATTR/CMPXATTR osd operations support
    rbd: don't treat CEPH_OSD_OP_DELETE as extent op
    ceph: remove unused stringification macros
    libceph: require cephx message signature by default
    ceph: introduce global empty snap context
    ceph: message versioning fixes
    ...

    Linus Torvalds
     
  • Pull user namespace related fixes from Eric Biederman:
    "As these are bug fixes almost all of thes changes are marked for
    backporting to stable.

    The first change (implicitly adding MNT_NODEV on remount) addresses a
    regression that was created when security issues with unprivileged
    remount were closed. I go on to update the remount test to make it
    easy to detect if this issue reoccurs.

    Then there are a handful of mount and umount related fixes.

    Then half of the changes deal with the a recently discovered design
    bug in the permission checks of gid_map. Unix since the beginning has
    allowed setting group permissions on files to less than the user and
    other permissions (aka ---rwx---rwx). As the unix permission checks
    stop as soon as a group matches, and setgroups allows setting groups
    that can not later be dropped, results in a situtation where it is
    possible to legitimately use a group to assign fewer privileges to a
    process. Which means dropping a group can increase a processes
    privileges.

    The fix I have adopted is that gid_map is now no longer writable
    without privilege unless the new file /proc/self/setgroups has been
    set to permanently disable setgroups.

    The bulk of user namespace using applications even the applications
    using applications using user namespaces without privilege remain
    unaffected by this change. Unfortunately this ix breaks a couple user
    space applications, that were relying on the problematic behavior (one
    of which was tools/selftests/mount/unprivileged-remount-test.c).

    To hopefully prevent needing a regression fix on top of my security
    fix I rounded folks who work with the container implementations mostly
    like to be affected and encouraged them to test the changes.

    > So far nothing broke on my libvirt-lxc test bed. :-)
    > Tested with openSUSE 13.2 and libvirt 1.2.9.
    > Tested-by: Richard Weinberger

    > Tested on Fedora20 with libvirt 1.2.11, works fine.
    > Tested-by: Chen Hanxiao

    > Ok, thanks - yes, unprivileged lxc is working fine with your kernels.
    > Just to be sure I was testing the right thing I also tested using
    > my unprivileged nsexec testcases, and they failed on setgroup/setgid
    > as now expected, and succeeded there without your patches.
    > Tested-by: Serge Hallyn

    > I tested this with Sandstorm. It breaks as is and it works if I add
    > the setgroups thing.
    > Tested-by: Andy Lutomirski # breaks things as designed :("

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
    userns: Unbreak the unprivileged remount tests
    userns; Correct the comment in map_write
    userns: Allow setting gid_maps without privilege when setgroups is disabled
    userns: Add a knob to disable setgroups on a per user namespace basis
    userns: Rename id_map_mutex to userns_state_mutex
    userns: Only allow the creator of the userns unprivileged mappings
    userns: Check euid no fsuid when establishing an unprivileged uid mapping
    userns: Don't allow unprivileged creation of gid mappings
    userns: Don't allow setgroups until a gid mapping has been setablished
    userns: Document what the invariant required for safe unprivileged mappings.
    groups: Consolidate the setgroups permission checks
    mnt: Clear mnt_expire during pivot_root
    mnt: Carefully set CL_UNPRIVILEGED in clone_mnt
    mnt: Move the clear of MNT_LOCKED from copy_tree to it's callers.
    umount: Do not allow unmounting rootfs.
    umount: Disallow unprivileged mount force
    mnt: Update unprivileged remount test
    mnt: Implicitly add MNT_NODEV on remount when it was implicitly added by mount

    Linus Torvalds