18 May, 2015

5 commits

  • …nux-stable into ti-linux-3.14.y

    This is the 3.14.43 stable release

    * tag 'v3.14.43' of http://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable: (52 commits)
    Linux 3.14.43
    kvm: arm64: vgic: fix hyp panic with 64k pages on juno platform
    arm64: kvm: use inner-shareable barriers for inner-shareable maintenance
    KVM: ARM: vgic: Fix the overlap check action about setting the GICD & GICC base address.
    KVM: arm/arm64: vgic: fix GICD_ICFGR register accesses
    ARM: KVM: trap VM system registers until MMU and caches are ON
    ARM: KVM: add world-switch for AMAIR{0,1}
    ARM: KVM: introduce per-vcpu HYP Configuration Register
    ARM: KVM: fix ordering of 64bit coprocessor accesses
    ARM: KVM: fix handling of trapped 64bit coprocessor accesses
    ARM: KVM: force cache clean on page fault when caches are off
    arm64: KVM: flush VM pages before letting the guest enable caches
    ARM: KVM: introduce kvm_p*d_addr_end
    arm64: KVM: trap VM system registers until MMU and caches are ON
    arm64: KVM: allows discrimination of AArch32 sysreg access
    arm64: KVM: force cache clean on page fault when caches are off
    deal with deadlock in d_walk()
    ACPICA: Utilities: Cleanup to remove useless ACPI_PRINTF/FORMAT_xxx helpers.
    ACPICA: Utilities: Cleanup to convert physical address printing formats.
    ACPICA: Utilities: Cleanup to enforce ACPI_PHYSADDR_TO_PTR()/ACPI_PTR_TO_PHYSADDR().
    ...

    Signed-off-by: Texas Instruments Auto Merger <lcpd_integration@list.ti.com>

    Texas Instruments Auto Merger
     
  • commit ca5358ef75fc69fee5322a38a340f5739d997c10 upstream.

    ... by not hitting rename_retry for reasons other than rename having
    happened. In other words, do _not_ restart when finding that
    between unlocking the child and locking the parent the former got
    into __dentry_kill(). Skip the killed siblings instead...

    Signed-off-by: Al Viro
    Cc: Ben Hutchings
    [hujianyang: Backported to 3.14 refer to the work of Ben Hutchings in 3.2:
    - Adjust context to make __dentry_kill() apply to d_kill()]
    Signed-off-by: hujianyang
    Signed-off-by: Greg Kroah-Hartman

    Al Viro
     
  • commit 7e96c1b0e0f495c5a7450dc4aa7c9a24ba4305bd upstream.

    This fixes a dumb bug in fs_fully_visible that allows proc or sys to
    be mounted if there is a bind mount of part of /proc/ or /sys/ visible.

    Reported-by: Eric Windisch
    Signed-off-by: "Eric W. Biederman"
    Signed-off-by: Greg Kroah-Hartman

    Eric W. Biederman
     
  • commit d8fd150fe3935e1692bf57c66691e17409ebb9c1 upstream.

    The range check for b-tree level parameter in nilfs_btree_root_broken()
    is wrong; it accepts the case of "level == NILFS_BTREE_LEVEL_MAX" even
    though the level is limited to values in the range of 0 to
    (NILFS_BTREE_LEVEL_MAX - 1).

    Since the level parameter is read from storage device and used to index
    nilfs_btree_path array whose element count is NILFS_BTREE_LEVEL_MAX, it
    can cause memory overrun during btree operations if the boundary value
    is set to the level parameter on device.

    This fixes the broken sanity check and adds a comment to clarify that
    the upper bound NILFS_BTREE_LEVEL_MAX is exclusive.
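
    A minimal sketch of the corrected bound, not the actual nilfs2 code:
    LEVEL_MAX, path[] and both functions are hypothetical stand-ins for
    NILFS_BTREE_LEVEL_MAX, the nilfs_btree_path array and the sanity check,
    and the lower bound of 0 follows the range stated above. An array with
    LEVEL_MAX elements is indexed by 0 .. LEVEL_MAX - 1, so the comparison
    has to reject level == LEVEL_MAX.

```c
#include <stdbool.h>

/* Hypothetical stand-in for NILFS_BTREE_LEVEL_MAX; the value is illustrative. */
#define LEVEL_MAX 14

/* One path slot per level, so valid indices are 0 .. LEVEL_MAX - 1. */
struct path_slot { int index; };
static struct path_slot path[LEVEL_MAX];

/* Returns true only if level can safely index path[]; the upper bound is exclusive. */
bool level_is_valid(int level)
{
	return level >= 0 && level < LEVEL_MAX;
}

/* Returns 0 on success, -1 if the level read from the device is out of range. */
int set_path_index(int level, int index)
{
	if (!level_is_valid(level))
		return -1;
	path[level].index = index;
	return 0;
}
```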

    Signed-off-by: Ryusuke Konishi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    Signed-off-by: Greg Kroah-Hartman

    Ryusuke Konishi
     
  • commit b1432a2a35565f538586774a03bf277c27fc267d upstream.

    There is a race window in dlm_get_lock_resource(), which may return a
    lock resource which has been purged. This will cause the process to
    hang forever in dlmlock() as the ast msg can't be handled due to its
    lock resource not existing.

    dlm_get_lock_resource {
        ...
        spin_lock(&dlm->spinlock);
        tmpres = __dlm_lookup_lockres_full(dlm, lockid, namelen, hash);
        if (tmpres) {
            spin_unlock(&dlm->spinlock);
            >>>>>>>> race window, dlm_run_purge_list() may run and purge
                     the lock resource
            spin_lock(&tmpres->spinlock);
            ...
            spin_unlock(&tmpres->spinlock);
        }
    }

    Signed-off-by: Junxiao Bi
    Cc: Joseph Qi
    Cc: Mark Fasheh
    Cc: Joel Becker
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    Signed-off-by: Greg Kroah-Hartman

    Junxiao Bi
     

14 May, 2015

1 commit

  • …nux-stable into ti-linux-3.14.y

    This is the 3.14.42 stable release

    * tag 'v3.14.42' of http://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable: (26 commits)
    Linux 3.14.42
    ARC: signal handling robustify
    UBI: fix soft lockup in ubi_check_volume()
    compal-laptop: Fix leaking hwmon device
    Drivers: hv: vmbus: Don't wait after requesting offers
    staging: panel: fix lcd type
    usb: gadget: printer: enqueue printer's response for setup request
    usb: host: ehci: use new USB_RESUME_TIMEOUT
    usb: host: oxu210hp: use new USB_RESUME_TIMEOUT
    usb: musb: use new USB_RESUME_TIMEOUT
    drm/radeon: add SI DPM quirk for Sapphire R9 270 Dual-X 2G GDDR5
    3w-sas: fix command completion race
    3w-9xxx: fix command completion race
    3w-xxxx: fix command completion race
    ext4: fix data corruption caused by unwritten and delayed extents
    rbd: end I/O the entire obj_request on error
    tty/serial: at91: maxburst was missing for dma transfers
    ASoC: dapm: Enable autodisable on SOC_DAPM_SINGLE_TLV_AUTODISABLE
    serial: of-serial: Remove device_type = "serial" registration
    ALSA: hda - Add mute-LED mode control to Thinkpad
    ...

    Conflicts:
    drivers/usb/musb/musb_core.c

    Signed-off-by: Dan Murphy <dmurphy@ti.com>

    Dan Murphy
     

13 May, 2015

1 commit

  • commit d2dc317d564a46dfc683978a2e5a4f91434e9711 upstream.

    Currently it is possible to lose a whole file system block's worth of
    data when we hit a specific interaction between unwritten and delayed
    extents in the extent status tree.

    The problem is that when we insert a delayed extent into the extent status
    tree, the only way to get rid of it is to write out the delayed buffer.
    However, there is a limitation in the extent status tree implementation:
    when inserting an unwritten extent, should there be even a single
    delayed block, the whole unwritten extent is marked as delayed.

    At this point, there is no way to get rid of the delayed extents,
    because there are no delayed buffers to write out. So when we write
    into said unwritten extent we will convert it to written, but it still
    remains delayed.

    When we try to write into that block later, ext4_da_map_blocks() will set
    the buffer new and delayed and map it to an invalid block, which causes
    the rest of the block to be zeroed, losing already written data.

    For now we can fix this by simply not allowing delayed status to be set
    on a written extent in the extent status tree. Also add WARN_ON() to make
    sure that we notice if this happens in the future.
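
    A minimal sketch of that rule, not the actual patch: the ES_* bits and
    status_for_insert() are hypothetical names, and the real ext4 extent
    status flags are laid out differently. The delayed bit is only added
    when the extent being inserted is not written.

```c
#include <stdbool.h>

/* Hypothetical status bits; the real ext4 extent status flags differ. */
#define ES_WRITTEN   (1u << 0)
#define ES_UNWRITTEN (1u << 1)
#define ES_DELAYED   (1u << 2)

/*
 * Returns the status to store for a new extent. overlaps_delayed says
 * whether the range contains at least one delayed block. A written
 * extent never picks up the delayed bit.
 */
unsigned int status_for_insert(unsigned int status, bool overlaps_delayed)
{
	if (!(status & ES_WRITTEN) && overlaps_delayed)
		status |= ES_DELAYED;
	return status;
}
```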

    This problem can be easily reproduced by running the following xfs_io.

    xfs_io -f -c "pwrite -S 0xaa 4096 2048" \
    -c "falloc 0 131072" \
    -c "pwrite -S 0xbb 65536 2048" \
    -c "fsync" /mnt/test/fff

    echo 3 > /proc/sys/vm/drop_caches
    xfs_io -c "pwrite -S 0xdd 67584 2048" /mnt/test/fff

    This can theoretically also be reproduced at random by running fsx,
    but that is not very reliable. On machines with a bigger page size
    (like ppc) it can be seen more often (especially with xfstest generic/127).

    Signed-off-by: Lukas Czerner
    Signed-off-by: Theodore Ts'o
    Signed-off-by: Greg Kroah-Hartman

    Lukas Czerner
     

07 May, 2015

10 commits

  • …nux-stable into ti-linux-3.14.y

    This is the 3.14.41 stable release

    * tag 'v3.14.41' of http://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable: (93 commits)
    Linux 3.14.41
    nosave: consolidate __nosave_{begin,end} in <asm/sections.h>
    fs: take i_mutex during prepare_binprm for set[ug]id executables
    driver core: bus: Goto appropriate labels on failure in bus_add_device
    memstick: mspro_block: add missing curly braces
    C6x: time: Ensure consistency in __init
    crypto: omap-aes - Fix support for unequal lengths
    wl18xx: show rx_frames_per_rates as an array as it really is
    lib: memzero_explicit: use barrier instead of OPTIMIZER_HIDE_VAR
    e1000: add dummy allocator to fix race condition between mtu change and netpoll
    ksoftirqd: Enable IRQs and call cond_resched() before poking RCU
    RCU pathwalk breakage when running into a symlink overmounting something
    drm/i915: cope with large i2c transfers
    drm/radeon: fix doublescan modes (v2)
    i2c: core: Export bus recovery functions
    IB/mlx4: Fix WQE LSO segment calculation
    IB/core: don't disallow registering region starting at 0x0
    IB/core: disallow registering 0-sized memory region
    stk1160: Make sure current buffer is released
    mvsas: fix panic on expander attached SATA devices
    ...

    Signed-off-by: Texas Instruments Auto Merger <lcpd_integration@list.ti.com>

    Texas Instruments Auto Merger
     
  • commit 8b01fc86b9f425899f8a3a8fc1c47d73c2c20543 upstream.

    This prevents a race between chown() and execve(), where chowning a
    setuid-user binary to root would momentarily make the binary setuid
    root.

    This patch was mostly written by Linus Torvalds.

    Signed-off-by: Jann Horn
    Signed-off-by: Linus Torvalds
    Signed-off-by: Charles Williams
    Signed-off-by: Greg Kroah-Hartman

    Jann Horn
     
  • commit 3cab989afd8d8d1bc3d99fef0e7ed87c31e7b647 upstream.

    Calling unlazy_walk() in walk_component() and do_last() when we find
    a symlink that needs to be followed doesn't acquire a reference to vfsmount.
    That's fine when the symlink is on the same vfsmount as the parent directory
    (which is almost always the case), but it's not always true - one _can_
    manage to bind a symlink on top of something. And in such cases we end up
    with excessive mntput().

    Signed-off-by: Al Viro
    Signed-off-by: Greg Kroah-Hartman

    Al Viro
     
  • commit e12fb97222fc41e8442896934f76d39ef99b590a upstream.

    Previously commit 14ece1028b3ed53ffec1b1213ffc6acaf79ad77c added
    support for syncing the parent directory of newly created inodes to
    make sure that the inode is not lost after a power failure in
    no-journal mode.

    However this does not work in the majority of cases, namely:
    - if the directory has inline data
    - if the directory is already indexed
    - if the directory already has at least one block and:
      - the new entry fits into it
      - or we've successfully converted it to indexed

    So in those cases we might lose the inode entirely even after fsync in
    the no-journal mode. This also includes ext2 default mode obviously.

    I've noticed this while running xfstest generic/321 and even though the
    test should fail (we need to run fsck after a crash in no-journal mode)
    I could not find newly created entries even if they were fsynced
    before.

    Fix this by adjusting the ext4_add_entry() successful exit paths to set
    the inode EXT4_STATE_NEWENTRY so that fsync has the chance to fsync the
    parent directory as well.

    Signed-off-by: Lukas Czerner
    Signed-off-by: Theodore Ts'o
    Reviewed-by: Jan Kara
    Cc: Frank Mayhar
    Signed-off-by: Greg Kroah-Hartman

    Lukas Czerner
     
  • commit a87938b2e246b81b4fb713edb371a9fa3c5c3c86 upstream.

    With CONFIG_ARCH_BINFMT_ELF_RANDOMIZE_PIE enabled, and a normal top-down
    address allocation strategy, load_elf_binary() will attempt to map a PIE
    binary into an address range immediately below mm->mmap_base.

    Unfortunately, load_elf_binary() does not take account of the need to
    allocate sufficient space for the entire binary which means that, while
    the first PT_LOAD segment is mapped below mm->mmap_base, the subsequent
    PT_LOAD segment(s) end up being mapped above mm->mmap_base into the area
    that is supposed to be the "gap" between the stack and the binary.

    Since the size of the "gap" on x86_64 is only guaranteed to be 128MB this
    means that binaries with large data segments > 128MB can end up mapping
    part of their data segment over their stack resulting in corruption of the
    stack (and the data segment once the binary starts to run).

    Any PIE binary with a data segment > 128MB is vulnerable to this although
    address randomization means that the actual gap between the stack and the
    end of the binary is normally greater than 128MB. The larger the data
    segment of the binary the higher the probability of failure.

    Fix this by calculating the total size of the binary in the same way as
    load_elf_interp().
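
    A minimal sketch of such a size calculation, under stated assumptions:
    struct load_segment and total_span() are hypothetical simplifications,
    the page size is assumed to be 4096, and the segments are assumed to be
    sorted by ascending address. The size to reserve is the span from the
    page-aligned start of the first PT_LOAD segment to the end of the last
    one, so every segment lands inside one reserved range.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u  /* assumed page size */

/* Hypothetical, simplified view of a PT_LOAD program header. */
struct load_segment {
	uint64_t vaddr;  /* virtual address of the segment */
	uint64_t memsz;  /* size of the segment in memory */
};

/*
 * Span from the page-aligned start of the first segment to the end of the
 * last one. Segments are assumed to be sorted by ascending vaddr.
 * Returns 0 if there are no segments.
 */
uint64_t total_span(const struct load_segment *seg, size_t count)
{
	if (count == 0)
		return 0;

	uint64_t start = seg[0].vaddr & ~(uint64_t)(SKETCH_PAGE_SIZE - 1);
	uint64_t end = seg[count - 1].vaddr + seg[count - 1].memsz;

	return end - start;
}
```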

    Signed-off-by: Michael Davidson
    Cc: Alexander Viro
    Cc: Jiri Kosina
    Cc: Kees Cook
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    Signed-off-by: Greg Kroah-Hartman

    Michael Davidson
     
  • commit c1b8940b42bb6487b10f2267a96b486276ce9ff7 upstream.

    We have observed a BUG() crash in fs/attr.c:notify_change(). The crash
    occurs during an rsync into a filesystem that is exported via NFS.

    1.) fs/attr.c:notify_change() modifies the caller's version of attr.
    2.) 6de0ec00ba8d ("VFS: make notify_change pass ATTR_KILL_S*ID to
    setattr operations") introduced a BUG() restriction such that "no
    function will ever call notify_change() with both ATTR_MODE and
    ATTR_KILL_S*ID set". Under some circumstances though, it will have
    assisted in setting the caller's version of attr to this very
    combination.
    3.) 27ac0ffeac80 ("locks: break delegations on any attribute
    modification") introduced code to handle breaking
    delegations. This can result in notify_change() being re-called. attr
    _must_ be explicitly reset to avoid triggering the BUG() established
    in #2.
    4.) The path that triggers this is via fs/open.c:chmod_common().
    The combination of attr flags set here and in the first call to
    notify_change() along with a later failed break_deleg_wait()
    results in notify_change() being called again via retry_deleg
    without resetting attr.

    Solution is to move retry_deleg in chmod_common() a bit further up to
    ensure attr is completely reset.

    There are other places where this seemingly could occur, such as
    fs/utimes.c:utimes_common(), but the attr flags are not initially
    set in such a way to trigger this.

    Fixes: 27ac0ffeac80 ("locks: break delegations on any attribute modification")
    Reported-by: Eric Meddaugh
    Tested-by: Eric Meddaugh
    Signed-off-by: Andrew Elble
    Signed-off-by: Al Viro
    Signed-off-by: Greg Kroah-Hartman

    Andrew Elble
     
  • commit 113e8283869b9855c8b999796aadd506bbac155f upstream.

    If we pass a length of 0 to the extent_same ioctl, we end up locking an
    extent range with a start offset greater than its end offset (if the
    destination file's offset is greater than zero). This results in a warning
    from extent_io.c:insert_state through the following call chain:

    btrfs_extent_same()
      btrfs_double_lock()
        lock_extent_range()
          lock_extent(inode->io_tree, offset, offset + len - 1)
            lock_extent_bits()
              __set_extent_bit()
                insert_state()
                  --> WARN_ON(end < start)

    This leads to an infinite loop when evicting the inode. This is the same
    problem that my previous patch titled
    "Btrfs: fix inode eviction infinite loop after cloning into it" addressed
    but for the extent_same ioctl instead of the clone ioctl.

    Signed-off-by: Filipe Manana
    Reviewed-by: Omar Sandoval
    Signed-off-by: Chris Mason
    Signed-off-by: Greg Kroah-Hartman

    Filipe Manana
     
  • commit ccccf3d67294714af2d72a6fd6fd7d73b01c9329 upstream.

    If we attempt to clone a 0 length region into a file we can end up
    inserting a range in the inode's extent_io tree with a start offset
    that is greater than the end offset, which immediately triggers the
    following warning:

    [ 3914.619057] WARNING: CPU: 17 PID: 4199 at fs/btrfs/extent_io.c:435 insert_state+0x4b/0x10b [btrfs]()
    [ 3914.620886] BTRFS: end < start 4095 4096
    (...)
    [ 3914.638093] Call Trace:
    [ 3914.638636] [] dump_stack+0x4c/0x65
    [ 3914.639620] [] warn_slowpath_common+0xa1/0xbb
    [ 3914.640789] [] ? insert_state+0x4b/0x10b [btrfs]
    [ 3914.642041] [] warn_slowpath_fmt+0x46/0x48
    [ 3914.643236] [] insert_state+0x4b/0x10b [btrfs]
    [ 3914.644441] [] __set_extent_bit+0x107/0x3f4 [btrfs]
    [ 3914.645711] [] lock_extent_bits+0x65/0x1bf [btrfs]
    [ 3914.646914] [] ? _raw_spin_unlock+0x28/0x33
    [ 3914.648058] [] ? test_range_bit+0xcc/0xde [btrfs]
    [ 3914.650105] [] lock_extent+0x13/0x15 [btrfs]
    [ 3914.651361] [] lock_extent_range+0x3d/0xcd [btrfs]
    [ 3914.652761] [] btrfs_ioctl_clone+0x278/0x388 [btrfs]
    [ 3914.654128] [] ? might_fault+0x58/0xb5
    [ 3914.655320] [] btrfs_ioctl+0xb51/0x2195 [btrfs]
    (...)
    [ 3914.669271] ---[ end trace 14843d3e2e622fc1 ]---

    This later makes the inode eviction handler enter an infinite loop that
    keeps dumping the following warning over and over:

    [ 3915.117629] WARNING: CPU: 22 PID: 4228 at fs/btrfs/extent_io.c:435 insert_state+0x4b/0x10b [btrfs]()
    [ 3915.119913] BTRFS: end < start 4095 4096
    (...)
    [ 3915.137394] Call Trace:
    [ 3915.137913] [] dump_stack+0x4c/0x65
    [ 3915.139154] [] warn_slowpath_common+0xa1/0xbb
    [ 3915.140316] [] ? insert_state+0x4b/0x10b [btrfs]
    [ 3915.141505] [] warn_slowpath_fmt+0x46/0x48
    [ 3915.142709] [] insert_state+0x4b/0x10b [btrfs]
    [ 3915.143849] [] __set_extent_bit+0x107/0x3f4 [btrfs]
    [ 3915.145120] [] ? btrfs_kill_super+0x17/0x23 [btrfs]
    [ 3915.146352] [] ? deactivate_locked_super+0x3b/0x50
    [ 3915.147565] [] lock_extent_bits+0x65/0x1bf [btrfs]
    [ 3915.148785] [] ? _raw_write_unlock+0x28/0x33
    [ 3915.149931] [] btrfs_evict_inode+0x196/0x482 [btrfs]
    [ 3915.151154] [] evict+0xa0/0x148
    [ 3915.152094] [] dispose_list+0x39/0x43
    [ 3915.153081] [] evict_inodes+0xdc/0xeb
    [ 3915.154062] [] generic_shutdown_super+0x49/0xef
    [ 3915.155193] [] kill_anon_super+0x13/0x1e
    [ 3915.156274] [] btrfs_kill_super+0x17/0x23 [btrfs]
    (...)
    [ 3915.167404] ---[ end trace 14843d3e2e622fc2 ]---

    So just bail out of the clone ioctl if the length of the region to clone
    is zero, without locking any extent range, in order to prevent this issue
    (same behaviour as a pwrite with a 0 length for example).
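
    The arithmetic behind the warning can be sketched as follows; struct
    byte_range and range_to_lock() are hypothetical names, not btrfs code.
    An inclusive range is [offset, offset + len - 1], so offset 4096 with
    len 0 gives start 4096 and end 4095, which matches the "end < start
    4095 4096" message above. Checking for a zero length first avoids
    computing such a range at all.

```c
#include <stdbool.h>
#include <stdint.h>

struct byte_range {
	uint64_t start;
	uint64_t end;    /* inclusive */
	bool valid;      /* false when there is nothing to lock */
};

/*
 * Returns the inclusive range [offset, offset + len - 1], or an invalid
 * range when len is zero, since offset + 0 - 1 would lie below offset.
 */
struct byte_range range_to_lock(uint64_t offset, uint64_t len)
{
	struct byte_range r = { 0, 0, false };

	if (len == 0)
		return r;

	r.start = offset;
	r.end = offset + len - 1;
	r.valid = true;
	return r;
}
```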

    This is trivial to reproduce. For example, the steps for the test I just
    made for fstests:

    mkfs.btrfs -f SCRATCH_DEV
    mount SCRATCH_DEV $SCRATCH_MNT

    touch $SCRATCH_MNT/foo
    touch $SCRATCH_MNT/bar

    $CLONER_PROG -s 0 -d 4096 -l 0 $SCRATCH_MNT/foo $SCRATCH_MNT/bar
    umount $SCRATCH_MNT

    A test case for fstests follows soon.

    Signed-off-by: Filipe Manana
    Reviewed-by: Omar Sandoval
    Signed-off-by: Chris Mason
    Signed-off-by: Greg Kroah-Hartman

    Filipe Manana
     
  • commit 3c3b04d10ff1811a27f86684ccd2f5ba6983211d upstream.

    Due to insufficient check in btrfs_is_valid_xattr, this unexpectedly
    works:

    $ touch file
    $ setfattr -n user. -v 1 file
    $ getfattr -d file
    user.="1"

    ie. the missing attribute name after the namespace.
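
    A minimal sketch of a stricter check, assuming a hypothetical helper
    has_name_after_prefix() rather than the real btrfs_is_valid_xattr():
    a name is accepted only if something follows the namespace prefix.

```c
#include <stdbool.h>
#include <string.h>

/*
 * Returns true if name starts with prefix and has at least one more
 * character after it, e.g. "user.foo" but not "user.".
 */
bool has_name_after_prefix(const char *name, const char *prefix)
{
	size_t plen = strlen(prefix);

	/* name[plen] is only read once the first plen bytes are known to match. */
	return strncmp(name, prefix, plen) == 0 && name[plen] != '\0';
}
```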

    Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=94291
    Reported-by: William Douglas
    Signed-off-by: David Sterba
    Signed-off-by: Chris Mason
    Signed-off-by: Greg Kroah-Hartman

    David Sterba
     
  • commit dcc82f4783ad91d4ab654f89f37ae9291cdc846a upstream.

    While committing a transaction we free the log roots before we write the
    new super block. Freeing the log roots implies marking the disk location
    of every node/leaf (metadata extent) as pinned before the new super block
    is written. This is to prevent the disk location of log metadata extents
    from being reused before the new super block is written, otherwise we
    would have a corrupted log tree if before the new super block is written
    a crash/reboot happens and the location of any log tree metadata extent
    ended up being reused and rewritten.

    Even though we pinned the log tree's metadata extents, we were issuing a
    discard against them if the fs was mounted with the -o discard option,
    resulting in corruption of the log tree if a crash/reboot happened before
    writing the new super block - the next time the fs was mounted, during
    the log replay process we would find nodes/leafs of the log btree with
    a content full of zeroes, causing the process to fail and require the
    use of the tool btrfs-zero-log to wipe out the log tree (and all data
    previously fsynced becoming lost forever).

    Fix this by not doing a discard when pinning an extent. The discard will
    be done later when it's safe (after the new super block is committed) at
    extent-tree.c:btrfs_finish_extent_commit().

    Fixes: e688b7252f78 (Btrfs: fix extent pinning bugs in the tree log)
    Signed-off-by: Filipe Manana
    Signed-off-by: Chris Mason
    Signed-off-by: Greg Kroah-Hartman

    Filipe Manana
     

29 Apr, 2015

4 commits

  • …nux-stable into ti-linux-3.14.y

    This is the 3.14.40 stable release

    * tag 'v3.14.40' of http://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable: (44 commits)
    Linux 3.14.40
    arc: mm: Fix build failure
    proc/pagemap: walk page tables under pte lock
    mm: softdirty: unmapped addresses between VMAs are clean
    sb_edac: avoid INTERNAL ERROR message in EDAC with unspecified channel
    x86: mm: move mmap_sem unlock from mm_fault_error() to caller
    ARM: 8109/1: mm: Modify pte_write and pmd_write logic for LPAE
    ARM: 8108/1: mm: Introduce {pte,pmd}_isset and {pte,pmd}_isclear
    vm: make stack guard page errors return VM_FAULT_SIGSEGV rather than SIGBUS
    vm: add VM_FAULT_SIGSEGV handling support
    sched: declare pid_alive as inline
    move d_rcu from overlapping d_child to overlapping d_alias
    KVM: x86: SYSENTER emulation is broken
    netfilter: conntrack: disable generic tracking for known protocols
    mm: hwpoison: drop lru_add_drain_all() in __soft_offline_page()
    Bluetooth: Add USB device 04ca:3010 as Atheros AR3012
    Bluetooth: ath3k: Add support of MCI 13d3:3408 bt device
    Bluetooth: Add support for Acer [0489:e078]
    Add a new PID/VID 0227/0930 for AR3012.
    Bluetooth: Add support for Broadcom device of Asus Z97-DELUXE motherboard
    ...

    Signed-off-by: Texas Instruments Auto Merger <lcpd_integration@list.ti.com>

    Texas Instruments Auto Merger
     
  • commit 05fbf357d94152171bc50f8a369390f1f16efd89 upstream.

    Lockless access to pte in pagemap_pte_range() might race with page
    migration and trigger BUG_ON(!PageLocked()) in migration_entry_to_page():

    CPU A (pagemap)                 CPU B (migration)
                                    lock_page()
                                    try_to_unmap(page, TTU_MIGRATION...)
                                        make_migration_entry()
                                            set_pte_at()

    pte_to_pagemap_entry()
                                    remove_migration_ptes()
                                    unlock_page()
        if(is_migration_entry())
            migration_entry_to_page()
                BUG_ON(!PageLocked(page))

    Also lockless read might be non-atomic if pte is larger than wordsize.
    Other pte walkers (smaps, numa_maps, clear_refs) already lock ptes.

    Fixes: 052fb0d635df ("proc: report file/anon bit in /proc/pid/pagemap")
    Signed-off-by: Konstantin Khlebnikov
    Reported-by: Andrey Ryabinin
    Reviewed-by: Cyrill Gorcunov
    Acked-by: Naoya Horiguchi
    Acked-by: Kirill A. Shutemov
    Cc: [3.5+]
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    Signed-off-by: Greg Kroah-Hartman

    Konstantin Khlebnikov
     
  • commit 81d0fa623c5b8dbd5279d9713094b0f9b0a00fb4 upstream.

    If a /proc/pid/pagemap read spans a [VMA, an unmapped region, then a
    VM_SOFTDIRTY VMA], the virtual pages in the unmapped region are reported
    as softdirty. Here's a program to demonstrate the bug:

    #include <assert.h>
    #include <fcntl.h>
    #include <stdint.h>
    #include <sys/mman.h>
    #include <unistd.h>

    int main() {
        const uint64_t PAGEMAP_SOFTDIRTY = 1ul << 55;
        uint64_t pme[3];
        int fd = open("/proc/self/pagemap", O_RDONLY);
        char *m = mmap(NULL, 3 * getpagesize(), PROT_READ,
                       MAP_ANONYMOUS | MAP_SHARED, -1, 0);
        munmap(m + getpagesize(), getpagesize());
        pread(fd, pme, 24, (unsigned long) m / getpagesize() * 8);
        assert(pme[0] & PAGEMAP_SOFTDIRTY);    /* passes */
        assert(!(pme[1] & PAGEMAP_SOFTDIRTY)); /* fails */
        assert(pme[2] & PAGEMAP_SOFTDIRTY);    /* passes */
        return 0;
    }

    (Note that all pages in new VMAs are softdirty until cleared).

    Tested:
    Used the program given above. I'm going to include this code in
    a selftest in the future.

    [n-horiguchi@ah.jp.nec.com: prevent pagemap_pte_range() from overrunning]
    Signed-off-by: Peter Feiner
    Cc: "Kirill A. Shutemov"
    Cc: Cyrill Gorcunov
    Cc: Pavel Emelyanov
    Cc: Jamie Liu
    Cc: Hugh Dickins
    Cc: Naoya Horiguchi
    Signed-off-by: Naoya Horiguchi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    Signed-off-by: Greg Kroah-Hartman

    Peter Feiner
     
  • commit 946e51f2bf37f1656916eb75bd0742ba33983c28 upstream.

    move d_rcu from overlapping d_child to overlapping d_alias

    Signed-off-by: Al Viro
    Cc: Ben Hutchings
    [hujianyang: Backported to 3.14 refer to the work of Ben Hutchings in 3.2:
    - Apply name changes in all the different places we use d_alias and d_child
    - Move the WARN_ON() in __d_free() to d_free() as we don't have dentry_free()]
    Signed-off-by: hujianyang
    Signed-off-by: Greg Kroah-Hartman

    Al Viro
     

19 Apr, 2015

6 commits

  • …nux-stable into ti-linux-3.14.y

    This is the 3.14.39 stable release

    * tag 'v3.14.39' of http://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable: (44 commits)
    Linux 3.14.39
    IB/mlx4: Saturate RoCE port PMA counters in case of overflow
    net: llc: use correct size for sysctl timeout entries
    net: rds: use correct size for max unacked packets and bytes
    media: s5p-mfc: fix mmap support for 64bit arch
    sh_veu: v4l2_dev wasn't set
    iscsi target: fix oops when adding reject pdu
    ioctx_alloc(): fix vma (and file) leak on failure
    ocfs2: _really_ sync the right range
    be2iscsi: Fix kernel panic when device initialization fails
    cifs: fix use-after-free bug in find_writable_file
    cifs: smb2_clone_range() - exit on unhandled error
    n_tty: Fix read buffer overwrite when no newline
    tty: serial: fsl_lpuart: clear receive flag on FIFO flush
    usb: xhci: apply XHCI_AVOID_BEI quirk to all Intel xHCI controllers
    usb: xhci: handle Config Error Change (CEC) in xhci driver
    cpuidle: ACPI: do not overwrite name and description of C0
    cpuidle: remove state_count field from struct cpuidle_device
    can: flexcan: Deferred on Regulator return EPROBE_DEFER
    x86/reboot: Add ASRock Q1900DC-ITX mainboard reboot quirk
    ...

    Signed-off-by: Texas Instruments Auto Merger <lcpd_integration@list.ti.com>

    Texas Instruments Auto Merger
     
  • commit deeb8525f9bcea60f5e86521880c1161de7a5829 upstream.

    If we fail past the aio_setup_ring(), we need to destroy the
    mapping. We don't need to care about anybody having found ctx,
    or added requests to it, since the last failure exit is exactly
    the failure to make ctx visible to lookups.

    Reproducer (based on one by Joe Mario):

    #include <libaio.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>

    void count(char *p)
    {
        char s[80];
        printf("%s: ", p);
        fflush(stdout);
        sprintf(s, "/bin/cat /proc/%d/maps|/bin/fgrep -c '/[aio] (deleted)'", getpid());
        system(s);
    }

    int main()
    {
        io_context_t *ctx;
        int created, limit, i, destroyed;
        FILE *f;

        count("before");
        if ((f = fopen("/proc/sys/fs/aio-max-nr", "r")) == NULL)
            perror("opening aio-max-nr");
        else if (fscanf(f, "%d", &limit) != 1)
            fprintf(stderr, "can't parse aio-max-nr\n");
        else if ((ctx = calloc(limit, sizeof(io_context_t))) == NULL)
            perror("allocating aio_context_t array");
        else {
            for (i = 0, created = 0; i < limit; i++) {
                if (io_setup(1000, ctx + created) == 0)
                    created++;
            }
            for (i = 0, destroyed = 0; i < created; i++)
                if (io_destroy(ctx[i]) == 0)
                    destroyed++;
            printf("created %d, failed %d, destroyed %d\n",
                   created, limit - created, destroyed);
            count("after");
        }
    }

    Found-by: Joe Mario
    Signed-off-by: Al Viro
    Signed-off-by: Greg Kroah-Hartman

    Al Viro
     
  • commit 64b4e2526d1cf6e6a4db6213d6e2b6e6ab59479a upstream.

    "ocfs2 syncs the wrong range" had been broken; prior to it the
    code was doing the wrong thing in case of O_APPEND, all right,
    but _after_ it we were syncing the wrong range in 100% cases.
    *ppos, aka iocb->ki_pos is incremented prior to that point,
    so we are always doing sync on the area _after_ the one we'd
    written to.
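
    A minimal sketch of the range arithmetic, not the ocfs2 code itself:
    struct sync_range and written_range() are hypothetical, and written is
    assumed to be greater than zero. Because the position has already been
    advanced by the time of the sync, the data just written lies behind it.

```c
#include <stdint.h>

struct sync_range {
	int64_t start;
	int64_t end;   /* inclusive */
};

/*
 * pos_after is the file position after the write, i.e. already advanced
 * by written bytes. The range to sync ends one byte before it.
 */
struct sync_range written_range(int64_t pos_after, int64_t written)
{
	struct sync_range r = { pos_after - written, pos_after - 1 };

	return r;
}
```

    For example, a 50 byte write that leaves the position at 150 covers
    bytes 100 to 149, while syncing from 150 onward would only touch the
    area after the written data.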

    Spotted by Joseph Qi back in January;
    unfortunately, I'd missed his mail back then ;-/

    Signed-off-by: Al Viro
    Signed-off-by: Greg Kroah-Hartman

    Al Viro
     
  • commit e1e9bda22d7ddf88515e8fe401887e313922823e upstream.

    Under intermittent network outages, find_writable_file() is susceptible
    to the following race condition, which results in a use-after-free in
    the cifs_writepages code-path:

    Thread 1                                Thread 2
    ========                                ========

    inv_file = NULL
    refind = 0
    spin_lock(&cifs_file_list_lock)

    // invalidHandle found on openFileList

    inv_file = open_file
    // inv_file->count currently 1

    cifsFileInfo_get(inv_file)
    // inv_file->count = 2

    spin_unlock(&cifs_file_list_lock);

    cifs_reopen_file()                      cifs_close()
    // fails (rc != 0)                      ->cifsFileInfo_put()
                                            spin_lock(&cifs_file_list_lock)
                                            // inv_file->count = 1
                                            spin_unlock(&cifs_file_list_lock)

    spin_lock(&cifs_file_list_lock);
    list_move_tail(&inv_file->flist,
    &cifs_inode->openFileList);
    spin_unlock(&cifs_file_list_lock);

    cifsFileInfo_put(inv_file);
    ->spin_lock(&cifs_file_list_lock)

    // inv_file->count = 0
    list_del(&cifs_file->flist);
    // cleanup!!
    kfree(cifs_file);

    spin_unlock(&cifs_file_list_lock);

    spin_lock(&cifs_file_list_lock);
    ++refind;
    // refind = 1
    goto refind_writable;

    At this point we loop back through with an invalid inv_file pointer
    and a refind value of 1. On second pass, inv_file is not overwritten on
    openFileList traversal, and is subsequently dereferenced.

    Signed-off-by: David Disseldorp
    Reviewed-by: Jeff Layton
    Signed-off-by: Steve French
    Signed-off-by: Greg Kroah-Hartman

    David Disseldorp
     
  • commit 2477bc58d49edb1c0baf59df7dc093dce682af2b upstream.

    While attempting to clone a file on a samba server, we receive a
    STATUS_INVALID_DEVICE_REQUEST. This is mapped to -EOPNOTSUPP which
    isn't handled in smb2_clone_range(). We end up looping in the while loop
    making the same call to the samba server over and over again.

    The proposed fix is to exit and return the error value when an
    unhandled error is encountered.
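
    A minimal sketch of that loop shape, not the real smb2_clone_range():
    copy_chunk_fn, clone_range() and the two stub servers are hypothetical,
    and treating -EINVAL as "chunk too large, retry smaller" is an
    assumption made only for this sketch. Any error the loop does not know
    how to handle ends it and is returned to the caller.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical request function: returns 0 on success or a negative errno. */
typedef int (*copy_chunk_fn)(size_t offset, size_t len);

/*
 * Copies total bytes in pieces of at most chunk bytes. -EINVAL is retried
 * with half the chunk size; every other error stops the loop.
 */
int clone_range(copy_chunk_fn copy_chunk, size_t total, size_t chunk)
{
	size_t offset = 0;

	if (chunk == 0)
		return -EINVAL;

	while (offset < total) {
		size_t len = total - offset < chunk ? total - offset : chunk;
		int rc = copy_chunk(offset, len);

		if (rc == 0)
			offset += len;      /* progress */
		else if (rc == -EINVAL && chunk > 1)
			chunk /= 2;         /* handled error: retry smaller */
		else
			return rc;          /* unhandled error: stop looping */
	}
	return 0;
}

/* Stand-in for a server that rejects every request as unsupported. */
static int always_unsupported(size_t offset, size_t len)
{
	(void)offset;
	(void)len;
	return -EOPNOTSUPP;
}

/* Stand-in for a server that accepts every request. */
static int always_ok(size_t offset, size_t len)
{
	(void)offset;
	(void)len;
	return 0;
}
```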

    Signed-off-by: Sachin Prabhu
    Signed-off-by: Steve French
    Signed-off-by: Steve French
    Signed-off-by: Greg Kroah-Hartman

    Sachin Prabhu
     
  • commit 9c4f61f01d269815bb7c37be3ede59c5587747c6 upstream.

    We can search and add the orphan item in one go,
    btrfs_insert_orphan_item will find out if the item already exists.

    Signed-off-by: David Sterba
    Cc: Chris Mason
    Cc: Roman Mamedov
    Signed-off-by: Greg Kroah-Hartman

    David Sterba
     

13 Apr, 2015

2 commits

  • …nux-stable into ti-linux-3.14.y

    This is the 3.14.38 stable release

    * tag 'v3.14.38' of http://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable: (39 commits)
    Linux 3.14.38
    mfd: kempld-core: Fix callback return value check
    net: ethernet: pcnet32: Setup the SRAM and NOUFLO on Am79C97{3, 5}
    powerpc/mpc85xx: Add ranges to etsec2 nodes
    powerpc/pseries: Little endian fixes for post mobility device tree update
    arm64: Use the reserved TTBR0 if context switching to the init_mm
    powerpc/book3s: Fix the MCE code to use CONFIG_KVM_BOOK3S_64_HANDLER
    hfsplus: fix B-tree corruption after insertion at position 0
    spi: trigger trace event for message-done before mesg->complete
    dm io: deal with wandering queue limits when handling REQ_DISCARD and REQ_WRITE_SAME
    dm: hold suspend_lock while suspending device during device deletion
    dmaengine: dw: append MODULE_ALIAS for platform driver
    vt6655: RFbSetPower fix missing rate RATE_12M
    staging: vt6656: vnt_rf_setpower: fix missing rate RATE_12M
    perf: Fix irq_work 'tail' recursion
    of/irq: Fix of_irq_parse_one() returned error codes
    phy: Find the right match in devm_phy_destroy()
    Revert "iwlwifi: mvm: fix failure path when power_update fails in add_interface"
    mac80211: drop unencrypted frames in mesh fwding
    mac80211: disable u-APSD queues by default
    ...

    Signed-off-by: Texas Instruments Auto Merger <lcpd_integration@list.ti.com>

    Texas Instruments Auto Merger
     
  • commit 98cf21c61a7f5419d82f847c4d77bf6e96a76f5f upstream.

    Fix B-tree corruption when a new record is inserted at position 0 in the
    node in hfs_brec_insert(). In this case hfs_brec_update_parent() is
    called to update the parent index node (if it exists) and it is passed
    an hfs_find_data with a search_key containing the newly inserted key
    instead of the key to be updated. This results in an inconsistent index
    node.
    The bug reproduces on my machine after an extents overflow record for
    the catalog file (CNID=4) is inserted into the extents overflow B-tree.
    Because of a low (reserved) value of CNID=4, it has to become the first
    record in the first leaf node.

    The resulting first leaf node is correct:

    ----------------------------------------------------
    | key0.CNID=4 | key1.CNID=123 | key2.CNID=456, ... |
    ----------------------------------------------------

    But the parent index key0 still contains the previous key CNID=123:

    -----------------------
    | key0.CNID=123 | ... |
    -----------------------

    A change in hfs_brec_insert() makes hfs_brec_update_parent() work
    correctly by preventing it from getting fd->record=-1 value from
    __hfs_brec_find().
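
The invariant at stake (a parent index entry must equal the first key of its child) can be shown with a toy model. This is a sketch with flat arrays, not the HFS+ node layout: `struct leaf`, `struct index_node`, `index_find` and `leaf_insert` are hypothetical, and node splitting is left out. The point it illustrates is that the parent has to be searched with the old first key, because the new key is not in the parent yet.

```c
#include <assert.h>
#include <string.h>

#define LEAF_CAP 8

struct leaf       { unsigned keys[LEAF_CAP]; int n; };  /* sorted ascending */
struct index_node { unsigned keys[LEAF_CAP]; int n; };  /* keys[i] = first key of child i */

/* Returns the slot holding key in the index node, or -1 if absent. */
static int index_find(const struct index_node *p, unsigned key)
{
    int i;

    for (i = 0; i < p->n; i++)
        if (p->keys[i] == key)
            return i;
    return -1;
}

/* Inserts key into the leaf. If it lands at position 0, the parent entry
 * that pointed at the old first key is rewritten. Returns 0 or -1. */
int leaf_insert(struct leaf *l, struct index_node *parent, unsigned key)
{
    int pos = 0;
    unsigned old_first;

    assert(l->n >= 0 && l->n <= LEAF_CAP);
    if (l->n == LEAF_CAP)
        return -1;                      /* no room; splitting is out of scope */

    old_first = l->n > 0 ? l->keys[0] : key;
    while (pos < l->n && l->keys[pos] < key)
        pos++;

    memmove(&l->keys[pos + 1], &l->keys[pos],
            (size_t)(l->n - pos) * sizeof l->keys[0]);
    l->keys[pos] = key;
    l->n++;

    if (pos == 0 && parent != NULL && l->n > 1) {
        /* Look up the OLD first key; searching for the new one would miss. */
        int slot = index_find(parent, old_first);

        if (slot < 0)
            return -1;
        parent->keys[slot] = key;
    }
    return 0;
}
```

Using the values from the diagrams above, inserting 4 into a leaf holding 123 and 456 leaves the parent entry at 4 instead of the stale 123.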

    Along the way, I removed duplicate code with unification of the if
    condition. The resulting code is equivalent to the original code
    because node is never 0.

    Also hfs_brec_update_parent() will now return an error after getting a
    negative fd->record value. However, the return value of
    hfs_brec_update_parent() is not checked anywhere in the file and I'm
    leaving it unchanged by this patch. brec.c lacks error checking after
    some other calls too, but this issue is of less importance than the one
    being fixed by this patch.

    Signed-off-by: Sergei Antonov
    Cc: Joe Perches
    Reviewed-by: Vyacheslav Dubeyko
    Acked-by: Hin-Tak Leung
    Cc: Anton Altaparmakov
    Cc: Al Viro
    Cc: Christoph Hellwig
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    Signed-off-by: Greg Kroah-Hartman

    Sergei Antonov
     

02 Apr, 2015

1 commit

  • …nux-stable into ti-linux-3.14.y

    This is the 3.14.37 stable release

    * tag 'v3.14.37' of http://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable: (175 commits)
    Linux 3.14.37
    target: Allow Write Exclusive non-reservation holders to READ
    target: Allow AllRegistrants to re-RESERVE existing reservation
    target: Avoid dropping AllRegistrants reservation during unregister
    target: Fix R_HOLDER bit usage for AllRegistrants
    target/pscsi: Fix NULL pointer dereference in get_device_type
    iscsi-target: Avoid early conn_logout_comp for iser connections
    target: Fix virtual LUN=0 target_configure_device failure OOPs
    target: Fix reference leak in target_get_sess_cmd() error path
    ARM: dts: DRA7x: Fix the bypass clock source for dpll_iva and others
    ARM: at91: pm: fix at91rm9200 standby
    arm64: Honor __GFP_ZERO in dma allocations
    netfilter: xt_socket: fix a stack corruption bug
    netfilter: nft_compat: fix module refcount underflow
    ipvs: rerouting to local clients is not needed anymore
    ipvs: add missing ip_vs_pe_put in sync code
    x86/vdso: Fix the build on GCC5
    x86/fpu: Drop_fpu() should not assume that tsk equals current
    x86/fpu: Avoid math_state_restore() without used_math() in __restore_xstate_sig()
    crypto: aesni - fix memory usage in GCM decryption
    ...

    Signed-off-by: Texas Instruments Auto Merger <lcpd_integration@list.ti.com>

    Texas Instruments Auto Merger
     

26 Mar, 2015

4 commits

  • commit ab676b7d6fbf4b294bf198fb27ade5b0e865c7ce upstream.

    As pointed out by a recent post[1] on exploiting DRAM physical
    imperfections, /proc/PID/pagemap exposes sensitive information which can
    be used to mount attacks.

    This prevents anybody without CAP_SYS_ADMIN from reading the pagemap.

    [1] http://googleprojectzero.blogspot.com/2015/03/exploiting-dram-rowhammer-bug-to-gain.html

    [ Eventually we might want to do anything more finegrained, but for now
    this is the simple model. - Linus ]

    Signed-off-by: Kirill A. Shutemov
    Acked-by: Konstantin Khlebnikov
    Acked-by: Andy Lutomirski
    Cc: Pavel Emelyanov
    Cc: Andrew Morton
    Cc: Mark Seaborn
    Signed-off-by: Linus Torvalds
    Signed-off-by: Greg Kroah-Hartman

    Kirill A. Shutemov
     
  • commit 283ee1482f349d6c0c09dfb725db5880afc56813 upstream.

    According to a report from Yuxuan Shui, nilfs2 in kernel 3.19 got stuck
    during recovery at mount time. The code path that caused the deadlock was
    as follows:

    nilfs_fill_super()
      load_nilfs()
        nilfs_salvage_orphan_logs()
          * Do roll-forwarding, attach segment constructor for recovery,
            and kick it.

    nilfs_segctor_thread()
      nilfs_segctor_thread_construct()
        * A lock is held with nilfs_transaction_lock()
        nilfs_segctor_do_construct()
          nilfs_segctor_drop_written_files()
            iput()
              iput_final()
                write_inode_now()
                  writeback_single_inode()
                    __writeback_single_inode()
                      do_writepages()
                        nilfs_writepage()
                          nilfs_construct_dsync_segment()
                            nilfs_transaction_lock() --> deadlock

    This can happen if commit 7ef3ff2fea8b ("nilfs2: fix deadlock of segment
    constructor over I_SYNC flag") is applied and roll-forward recovery is
    performed at mount time. Roll-forward recovery can happen if a datasync
    write is done and the file system crashes immediately after that. For
    instance, we can reproduce the issue with the following steps:

    < nilfs2 is mounted on /nilfs (device: /dev/sdb1) >
    # dd if=/dev/zero of=/nilfs/test bs=4k count=1 && sync
    # dd if=/dev/zero of=/nilfs/test conv=notrunc oflag=dsync bs=4k count=1 && reboot -nfh
    < the system will immediately reboot >
    # mount -t nilfs2 /dev/sdb1 /nilfs

    The deadlock occurs because iput() can run the segment constructor
    through writeback_single_inode() if the MS_ACTIVE flag is not set on
    sb->s_flags. The above commit changed the segment constructor so that it
    calls iput() asynchronously for inodes with i_nlink == 0, but that change
    was imperfect.

    This fixes the other deadlock by deferring iput() in the segment
    constructor even when the mount is not finished, that is, when the
    MS_ACTIVE flag is not set.
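
The general idea of deferring a release until the lock is dropped can be sketched in user-space C. This is a toy model and not the nilfs2 implementation: `struct txn`, `release_object` and `construct` are hypothetical, the lock is a plain flag whose misuse trips an assertion, and the deferred releases run right after unlock rather than from a separate context.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_DEFERRED 16

struct txn {
    int locked;                   /* models a non-recursive lock */
    int deferred[MAX_DEFERRED];   /* ids of objects awaiting release */
    size_t ndeferred;
    int released;                 /* number of completed releases */
};

static void txn_lock(struct txn *t)   { assert(!t->locked); t->locked = 1; }
static void txn_unlock(struct txn *t) { assert(t->locked);  t->locked = 0; }

/* The final release needs the lock itself, as the writeback path does in
 * the trace above. */
static void release_object(struct txn *t, int id)
{
    (void)id;
    txn_lock(t);                  /* would trip the assertion if still held */
    t->released++;
    txn_unlock(t);
}

/* Construct step: under the lock it only queues releases; they run once
 * the lock has been dropped. Returns 0, or -1 if the queue would overflow. */
int construct(struct txn *t, const int *ids, size_t n)
{
    size_t i;

    if (n > MAX_DEFERRED - t->ndeferred)
        return -1;

    txn_lock(t);
    for (i = 0; i < n; i++)
        t->deferred[t->ndeferred++] = ids[i];
    txn_unlock(t);

    for (i = 0; i < t->ndeferred; i++)
        release_object(t, t->deferred[i]);
    t->ndeferred = 0;
    return 0;
}
```

Calling `release_object()` from inside the locked region of `construct()` would hit the assertion in `txn_lock()`, which is the toy equivalent of the self-deadlock in the trace.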

    Signed-off-by: Ryusuke Konishi
    Reported-by: Yuxuan Shui
    Tested-by: Ryusuke Konishi
    Cc: Al Viro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    Signed-off-by: Greg Kroah-Hartman

    Ryusuke Konishi
     
  • commit 0d2783626a53d4c922f82d51fa675cb5d13f0d36 upstream.

    fuse_try_move_page() is not prepared for replacing pages that have already
    been read.

    Reported-by: Al Viro
    Signed-off-by: Miklos Szeredi
    Signed-off-by: Greg Kroah-Hartman

    Miklos Szeredi
     
  • commit aa991b3b267e24f578bac7b09cc57579b660304b upstream.

    Regular pipe buffers' ->steal method (generic_pipe_buf_steal()) doesn't set
    PG_uptodate.

    Don't warn on this condition, just set the uptodate flag.

    Signed-off-by: Miklos Szeredi
    Signed-off-by: Greg Kroah-Hartman

    Miklos Szeredi
     

18 Mar, 2015

6 commits

  • commit 7c0af9ffb7bb4e5355470fa60b3eb711ddf226fa upstream.

    put_rpccred() can sleep.

    Fixes: 8f649c3762547 ("NFSv4: Fix the locking in nfs_inode_reclaim_delegation()")
    Signed-off-by: Trond Myklebust
    Signed-off-by: Greg Kroah-Hartman

    Trond Myklebust
     
  • commit 957ed60b53b519064a54988c4e31e0087e47d091 upstream.

    Each inode of nilfs2 stores a root node of a b-tree, and it turned out to
    have a memory overrun issue:

    Each b-tree node of nilfs2 stores a set of key-value pairs and the number
    of them (in "bn_nchildren" member of nilfs_btree_node struct), as well as
    a few other "bn_*" members.

    Since the value of "bn_nchildren" is used for operations on the key-values
    within the b-tree node, it can cause a memory access overrun if
    "bn_nchildren" is incorrectly set to a large number.

    For instance, nilfs_btree_node_lookup() determines the range of its
    binary search with it, and a too-large "bn_nchildren" leads
    nilfs_btree_node_get_key() in that function to overrun.

    For intermediate b-tree nodes, this is prevented by a sanity check
    performed when each node is read from a drive; however, no sanity check
    has been done for root nodes stored in inodes.

    This patch fixes the issue by adding the missing sanity check for b-tree
    root nodes, so that it is called when on-memory inodes are read from
    ifile, the inode metadata file.
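
A minimal sketch of this kind of check, assuming a fixed-capacity root node: `struct btree_root`, `ROOT_NCHILDREN_MAX`, `root_broken` and `root_lookup` are hypothetical names and do not reflect the nilfs2 on-disk layout. The count is validated against the capacity before it is used to bound the binary search.

```c
#include <assert.h>
#include <stdint.h>

#define ROOT_NCHILDREN_MAX 6      /* hypothetical capacity of an in-inode root */

struct btree_root {
    uint16_t nchildren;                  /* number of valid keys, read from disk */
    uint64_t keys[ROOT_NCHILDREN_MAX];   /* sorted ascending */
};

/* Returns 1 if the stored count exceeds the node's capacity, 0 otherwise. */
int root_broken(const struct btree_root *r)
{
    return r->nchildren > ROOT_NCHILDREN_MAX;
}

/* Binary search over the valid keys. Returns the index of key, -1 if the
 * key is absent, or -2 if the node fails the sanity check. */
int root_lookup(const struct btree_root *r, uint64_t key)
{
    int low = 0, high;

    assert(r);
    if (root_broken(r))
        return -2;                /* never index keys[] with an untrusted count */

    high = (int)r->nchildren - 1;
    while (low <= high) {
        int mid = low + (high - low) / 2;

        if (r->keys[mid] == key)
            return mid;
        if (r->keys[mid] < key)
            low = mid + 1;
        else
            high = mid - 1;
    }
    return -1;
}
```

Without the `root_broken()` call, a corrupted count such as 1000 would let `mid` index far past the end of `keys[]`.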

    Signed-off-by: Ryusuke Konishi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    Signed-off-by: Greg Kroah-Hartman

    Ryusuke Konishi
     
  • commit 7e0e953bb0cf649f93277ac8fb67ecbb7f7b04a9 upstream.

    use_pde()/unuse_pde() in ->follow_link()/->put_link() resp.

    Signed-off-by: Al Viro
    Signed-off-by: Greg Kroah-Hartman

    Al Viro
     
  • commit 0db59e59299f0b67450c5db21f7f316c8fb04e84 upstream.

    As it is, we have debugfs_remove() racing with symlink traversals.
    Supply ->evict_inode() and do freeing there - inode will remain
    pinned until we are done with the symlink body.

    And rip the idiocy with checking if dentry is positive right after
    we'd verified debugfs_positive(), which is a stronger check...

    Signed-off-by: Al Viro
    Signed-off-by: Greg Kroah-Hartman

    Al Viro
     
  • commit 0a280962dc6e117e0e4baa668453f753579265d9 upstream.

    X-Coverup: just ask spender
    Signed-off-by: Al Viro
    Signed-off-by: Greg Kroah-Hartman

    Al Viro
     
  • commit dd9ef135e3542ffc621c4eb7f0091870ec7a1504 upstream.

    Improper arithmetic when calculating the address of the extended ref
    could lead to an out-of-bounds memory read and a kernel panic.
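
One common way such arithmetic goes wrong in C is adding a byte offset to a pointer that has already been cast to a struct type, which scales the offset by the struct size. The sketch below assumes that shape of bug for illustration only; `struct ext_ref`, `read_ref` and `scaled_advance` are hypothetical and are not the btrfs definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical record stored at some byte offset inside an item buffer. */
struct ext_ref {
    uint64_t parent;
    uint64_t index;
    uint16_t name_len;
};

/* Copies the record found byte_off bytes into item. The offset is added to
 * a byte pointer, so it is counted in bytes, and it is bounds-checked
 * first. Returns 0 on success, -1 if the record would not fit. */
int read_ref(const unsigned char *item, size_t item_size, size_t byte_off,
             struct ext_ref *out)
{
    assert(item && out);
    if (byte_off > item_size || item_size - byte_off < sizeof *out)
        return -1;
    memcpy(out, item + byte_off, sizeof *out);
    return 0;
}

/* Bytes that the mistaken form `(struct ext_ref *)item + byte_off` would
 * advance. Computed arithmetically so no out-of-bounds pointer is formed. */
size_t scaled_advance(size_t byte_off)
{
    return byte_off * sizeof(struct ext_ref);
}
```

For an offset of 8, the intended address is 8 bytes into the buffer, while the cast-then-add form lands `8 * sizeof(struct ext_ref)` bytes in, which can easily fall outside the item.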

    Signed-off-by: Quentin Casasnovas
    Reviewed-by: David Sterba
    Signed-off-by: Chris Mason
    Signed-off-by: Greg Kroah-Hartman

    Quentin Casasnovas