07 Feb, 2014

7 commits

  • …nux-stable into ti-linux-3.12.y

    This is the 3.12.10 stable release

    * tag 'v3.12.10' of http://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable: (133 commits)
    Linux 3.12.10
    x86, cpu, amd: Add workaround for family 16h, erratum 793
    powerpc: Make sure "cache" directory is removed when offlining cpu
    powerpc: Fix the setup of CPU-to-Node mappings during CPU online
    btrfs: restrict snapshotting to own subvolumes
    Btrfs: handle EAGAIN case properly in btrfs_drop_snapshot()
    target/iscsi: Fix network portal creation race
    iscsi-target: Pre-allocate more tags to avoid ack starvation
    virtio-scsi: Fix hotcpu_notifier use-after-free with virtscsi_freeze
    SCSI: bfa: Chinook quad port 16G FC HBA claim issue
    usb: core: get config and string descriptors for unauthorized devices
    hpfs: remember free space
    ALSA: hda/hdmi - allow PIN_OUT to be dynamically enabled
    ALSA: hda - hdmi: introduce patch_nvhdmi()
    ALSA: hda - Don't set indep_hp flag for old AD codecs
    KVM: PPC: e500: Fix bad address type in deliver_tlb_misss()
    KVM: PPC: Book3S HV: use xics_wake_cpu only when defined
    parisc: fix cache-flushing
    alpha: fix broken network checksum
    inet_diag: fix inet_diag_dump_icsk() timewait socket state logic
    ...

    Signed-off-by: Dan Murphy <DMurphy@ti.com>

    Dan Murphy
     
  • commit d024206133ce21936b3d5780359afc00247655b7 upstream.

    Currently, any user can snapshot any subvolume if the path is accessible and
    thus indirectly create and keep files he does not own under his directories.
    This is not possible with traditional directories.

    In security context, a user can snapshot root filesystem and pin any
    potentially buggy binaries, even if the updates are applied.

    All the snapshots are visible to the administrator, so it's possible to
    verify if there are suspicious snapshots.

    Another more practical problem is that any user can pin the space used
    by eg. root and cause ENOSPC.

    Original report:
    https://bugs.launchpad.net/ubuntu/+source/apparmor/+bug/484786

    Signed-off-by: David Sterba
    Signed-off-by: Josef Bacik
    Signed-off-by: Chris Mason
    Signed-off-by: Greg Kroah-Hartman

    David Sterba
     
  • commit 90515e7f5d7d24cbb2a4038a3f1b5cfa2921aa17 upstream.

    We may return early in btrfs_drop_snapshot(); we shouldn't
    call btrfs_std_err() in that case. Fix it.

    Signed-off-by: Wang Shilong
    Signed-off-by: Josef Bacik
    Signed-off-by: Chris Mason
    Signed-off-by: Greg Kroah-Hartman

    Wang Shilong
     
  • commit 2cbe5c76fc5e38e9af4b709593146e4b8272b69e upstream.

    Previously, hpfs scanned all bitmaps each time the user asked for free
    space using statfs. This patch changes it so that hpfs scans the
    bitmaps only once, remembers the free space and on next invocation of
    statfs it returns the value instantly.

    New versions of wine are hammering on the statfs syscall very heavily,
    making some games unplayable when they're stored on hpfs, with load
    times in minutes.

    This should be backported to the stable kernels because it fixes a
    user-visible problem (excessive level load times in wine).

    Signed-off-by: Mikulas Patocka
    Signed-off-by: Linus Torvalds
    Signed-off-by: Greg Kroah-Hartman

    Mikulas Patocka
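
The "scan once, remember the result" idea from the hpfs commit above can be sketched in user-space C. This is a minimal model, not the hpfs code: `struct fs_info`, `scan_bitmap`, `fs_free_space`, the `FREE_UNKNOWN` sentinel and the "set bit means free" convention are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FREE_UNKNOWN UINT32_MAX  /* hypothetical sentinel: not counted yet */

struct fs_info {
    const uint8_t *bitmap;  /* assumption: a set bit marks a free sector */
    size_t nbytes;          /* size of the bitmap in bytes */
    uint32_t n_free;        /* cached free count, or FREE_UNKNOWN */
    unsigned scans;         /* number of full scans performed so far */
};

/* Expensive path: walk every bit of the bitmap. */
uint32_t scan_bitmap(struct fs_info *fs)
{
    uint32_t n = 0;

    fs->scans++;
    for (size_t i = 0; i < fs->nbytes; i++)
        for (int b = 0; b < 8; b++)
            n += (fs->bitmap[i] >> b) & 1u;
    return n;
}

/* statfs-like query: scan only on the first call, then return the cache. */
uint32_t fs_free_space(struct fs_info *fs)
{
    assert(fs != NULL && fs->bitmap != NULL);
    if (fs->n_free == FREE_UNKNOWN)
        fs->n_free = scan_bitmap(fs);
    return fs->n_free;
}
```

Calling `fs_free_space()` repeatedly on the same `struct fs_info` scans once. Any code that allocates or frees sectors would also have to adjust or invalidate `n_free`, which this sketch leaves out.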
     
  • commit 260a459d2e39761fbd39803497205ce1690bc7b1 upstream.

    A bug was introduced with the is_mounted helper function in
    commit f7a99c5b7c8bd3d3f533c8b38274e33f3da9096e
    Author: Al Viro
    Date: Sat Jun 9 00:59:08 2012 -0400

    get rid of ->mnt_longterm

    it's enough to set ->mnt_ns of internal vfsmounts to something
    distinct from all struct mnt_namespace out there; then we can
    just use the check for ->mnt_ns != NULL in the fast path of
    mntput_no_expire()

    Signed-off-by: Al Viro

    The intent was to test if the real_mount(vfsmount)->mnt_ns was
    NULL_OR_ERR but the code is actually testing real_mount(vfsmount)
    and always returning true.

    The result is d_absolute_path returning paths it should be hiding.

    Signed-off-by: "Eric W. Biederman"
    Signed-off-by: Al Viro
    Signed-off-by: Greg Kroah-Hartman

    Eric W. Biederman
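
The is_mounted mistake described above, testing the mount pointer instead of its ->mnt_ns field, can be illustrated with simplified user-space stand-ins. The struct layouts, `is_err_or_null()` and `NS_INTERNAL` below are assumptions modelled loosely on the kernel's error-pointer convention; they are not the kernel definitions.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ERRNO 4095

/* User-space imitation of an "error pointer or NULL" test. */
bool is_err_or_null(const void *p)
{
    return p == NULL || (uintptr_t)p >= (uintptr_t)-MAX_ERRNO;
}

struct mnt_namespace { int id; };
struct mount { struct mnt_namespace *mnt_ns; };

/* Hypothetical marker for internal mounts: an error-pointer value. */
#define NS_INTERNAL ((struct mnt_namespace *)(uintptr_t)-EINVAL)

/* Buggy shape: tests the mount itself, which is always a valid pointer. */
bool is_mounted_buggy(const struct mount *m)
{
    return !is_err_or_null(m);
}

/* Intended shape: tests the namespace the mount is attached to. */
bool is_mounted_fixed(const struct mount *m)
{
    assert(m != NULL);
    return !is_err_or_null(m->mnt_ns);
}
```

With a detached mount (`mnt_ns == NULL`) or an internal one (`mnt_ns == NS_INTERNAL`), the buggy variant still reports "mounted", while the fixed variant does not.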
     
  • commit a8323da0366d3398eda62741d2ac1130c8a172ed upstream.

    In commit 232d2d60aa5469bb097f55728f65146bd49c1d25
    Author: Waiman Long
    Date: Mon Sep 9 12:18:13 2013 -0400

    dcache: Translating dentry into pathname without taking rename_lock

    The __dentry_path locking was changed and the variable error was
    intended to be moved outside of the loop. Unfortunately the inner
    declaration of error was not removed, resulting in a version of
    __dentry_path that will never return an error.

    Remove the problematic inner declaration of error and allow
    __dentry_path to return errors once again.

    Cc: Waiman Long
    Signed-off-by: "Eric W. Biederman"
    Signed-off-by: Al Viro
    Signed-off-by: Greg Kroah-Hartman

    Eric W. Biederman
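
The shadowing problem described in the __dentry_path commit above is easy to reproduce in plain C. The sketch below is not the dcache code; `prepend_step`, `walk_buggy` and `walk_fixed` are hypothetical names that only mirror the shape of the bug: an inner declaration of `error` hides the outer one, so the function always returns 0.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for one step of a path walk: consumes `need`
 * bytes of buffer space, or fails if not enough space is left. */
int prepend_step(size_t *space, size_t need)
{
    assert(space != NULL);
    if (*space < need)
        return -ENAMETOOLONG;
    *space -= need;
    return 0;
}

/* Buggy shape: the inner declaration shadows the outer `error`, so the
 * value returned at the end is the outer variable, which is still 0. */
int walk_buggy(size_t space, const size_t *needs, size_t n)
{
    int error = 0;

    for (size_t i = 0; i < n; i++) {
        int error = prepend_step(&space, needs[i]); /* shadows outer */
        if (error)
            break;
    }
    return error;
}

/* Fixed shape: the loop assigns to the single outer `error`. */
int walk_fixed(size_t space, const size_t *needs, size_t n)
{
    int error = 0;

    for (size_t i = 0; i < n; i++) {
        error = prepend_step(&space, needs[i]);
        if (error)
            break;
    }
    return error;
}
```

Compiling with a shadow warning enabled (for example GCC's `-Wshadow`) flags the inner declaration in `walk_buggy`.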
     
  • commit 09c455aaa8f47a94d5bafaa23d58365768210507 upstream.

    A missing cast means that when we are truncating a file which is less
    than 60 bytes, we don't clear the correct area of memory, and in fact
    we can end up truncating the next inode in the inode table, or worse
    yet, some other kernel data structure.

    Addresses-Coverity-Id: #751987

    Signed-off-by: "Theodore Ts'o"
    Signed-off-by: Greg Kroah-Hartman

    Theodore Ts'o
     

26 Jan, 2014

6 commits

  • …ux-stable into ti-linux-3.12.y

    This is the 3.12.9 stable release

    * tag 'v3.12.9' of http://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable: (28 commits)
    Linux 3.12.9
    ARM: 7938/1: OMAP4/highbank: Flush L2 cache before disabling
    drm/i915: Don't grab crtc mutexes in intel_modeset_gem_init()
    ARM: 7934/1: DT/kernel: fix arch_match_cpu_phys_id to avoid erroneous match
    serial: amba-pl011: use port lock to guard control register access
    mm: Make {,set}page_address() static inline if WANT_PAGE_VIRTUAL
    md/raid5: Fix possible confusion when multiple write errors occur.
    md/raid10: fix two bugs in handling of known-bad-blocks.
    md/raid10: fix bug when raid10 recovery fails to recover a block.
    md: fix problem when adding device to read-only array with bitmap.
    drm/i915: fix DDI PLLs HW state readout code
    nilfs2: fix segctor bug that causes file system corruption
    mm: fix crash when using XFS on loopback
    crash_dump: fix compilation error (on MIPS at least)
    ftrace/x86: Load ftrace_ops in parameter not the variable holding it
    thp: fix copy_page_rep GPF by testing is_huge_zero_pmd once only
    SELinux: Fix possible NULL pointer dereference in selinux_inode_permission()
    writeback: Fix data corruption on NFS
    hwmon: (coretemp) Fix truncated name of alarm attributes
    i2c: Re-instate body of i2c_parent_is_i2c_adapter()
    ...

    Signed-off-by: Dan Murphy <DMurphy@ti.com>

    Dan Murphy
     
  • commit 70f2fe3a26248724d8a5019681a869abdaf3e89a upstream.

    There is a bug in the function nilfs_segctor_collect, which results in
    active data being written to a segment that is marked as clean. It is
    possible that this segment is selected for a later segment
    construction, whereby the old data is overwritten.

    The problem shows itself with the following kernel log message:

    nilfs_sufile_do_cancel_free: segment 6533 must be clean

    Usually a few hours later the file system gets corrupted:

    NILFS: bad btree node (blocknr=8748107): level = 0, flags = 0x0, nchildren = 0
    NILFS error (device sdc1): nilfs_bmap_last_key: broken bmap (inode number=114660)

    The issue can be reproduced with a file system that is nearly full and
    with the cleaner running, while some IO intensive task is running,
    although it is quite hard to reproduce.

    This is what happens:

    1. The cleaner starts the segment construction
    2. nilfs_segctor_collect is called
    3. sc_stage is on NILFS_ST_SUFILE and segments are freed
    4. sc_stage is on NILFS_ST_DAT and the current segment is full
    5. nilfs_segctor_extend_segments is called, which
       allocates a new segment
    6. The new segment is one of the segments freed in step 3
    7. nilfs_sufile_cancel_freev is called and produces an error message
    8. Loop around and the collection starts again
    9. sc_stage is on NILFS_ST_SUFILE and segments are freed
       including the newly allocated segment, which will contain active
       data and can be allocated at a later time
    10. A few hours later another segment construction allocates the
        segment and causes file system corruption

    This can be prevented by simply reordering the statements. If
    nilfs_sufile_cancel_freev is called before nilfs_segctor_extend_segments
    the freed segments are marked as dirty and cannot be allocated any more.

    Signed-off-by: Andreas Rohner
    Reviewed-by: Ryusuke Konishi
    Tested-by: Andreas Rohner
    Signed-off-by: Ryusuke Konishi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    Signed-off-by: Greg Kroah-Hartman

    Andreas Rohner
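
The effect of the statement ordering in the nilfs2 commit above can be modelled with a toy segment table. This is only a sketch under stated assumptions: `struct sufile`, `seg_alloc` and `cancel_free` are hypothetical simplifications, not the nilfs2 functions, and the allocator simply takes the first clean segment.

```c
#include <assert.h>

#define NSEGS 4

enum seg_state { SEG_CLEAN, SEG_DIRTY };

struct sufile { enum seg_state state[NSEGS]; };

/* Allocate the first clean segment and mark it dirty; -1 if none is left. */
int seg_alloc(struct sufile *su)
{
    for (int i = 0; i < NSEGS; i++) {
        if (su->state[i] == SEG_CLEAN) {
            su->state[i] = SEG_DIRTY;
            return i;
        }
    }
    return -1;
}

/* Undo an earlier free: each listed segment is expected to be clean and
 * is marked dirty again. Returns how many were unexpectedly not clean. */
int cancel_free(struct sufile *su, const int *segs, int n)
{
    int errors = 0;

    for (int i = 0; i < n; i++) {
        assert(segs[i] >= 0 && segs[i] < NSEGS);
        if (su->state[segs[i]] != SEG_CLEAN)
            errors++;            /* "segment ... must be clean" */
        else
            su->state[segs[i]] = SEG_DIRTY;
    }
    return errors;
}
```

If `seg_alloc()` runs before `cancel_free()`, the allocator can hand out a segment from the freed list and the cancel step then reports it as not clean. With `cancel_free()` first, the freed segments are dirty again before any allocation happens, so the allocator has to pick a different segment.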
     
  • commit f9b0e058cbd04ada76b13afffa7e1df830543c24 upstream.

    Commit 4f8ad655dbc8 "writeback: Refactor writeback_single_inode()" added
    a condition to skip clean inode. However this is wrong in WB_SYNC_ALL
    mode because there we also want to wait for outstanding writeback on
    possibly clean inode. This was causing occasional data corruption issues
    on NFS because it uses sync_inode() to make sure all outstanding writes
    are flushed to the server before truncating the inode and, with
    sync_inode() returning prematurely, the file was sometimes extended back
    by an outstanding write after it was truncated.

    So modify the test to also check for pages under writeback in
    WB_SYNC_ALL mode.

    Fixes: 4f8ad655dbc82cf05d2edc11e66b78a42d38bf93
    Reported-and-tested-by: Dan Duval
    Signed-off-by: Jan Kara
    Signed-off-by: Greg Kroah-Hartman

    Jan Kara
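
The changed skip condition from the writeback commit above can be written as a small predicate. The sketch uses hypothetical stand-ins (`struct inode_model`, `enum sync_mode`) rather than the kernel's inode and writeback_control structures, and models "pages under writeback" as a single flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-ins for WB_SYNC_NONE / WB_SYNC_ALL. */
enum sync_mode { SYNC_NONE, SYNC_ALL };

struct inode_model {
    bool dirty;                  /* inode has dirty state */
    bool pages_under_writeback;  /* writes already in flight */
};

/* Old test: any clean inode is skipped, regardless of sync mode. */
bool may_skip_old(const struct inode_model *in, enum sync_mode mode)
{
    (void)mode;
    assert(in != NULL);
    return !in->dirty;
}

/* New test: in SYNC_ALL mode a clean inode is skipped only if nothing
 * is still under writeback, so the caller waits for in-flight pages. */
bool may_skip_new(const struct inode_model *in, enum sync_mode mode)
{
    assert(in != NULL);
    return !in->dirty &&
           (mode != SYNC_ALL || !in->pages_under_writeback);
}
```

For a clean inode with writes still in flight, the old predicate skips in every mode, while the new one refuses to skip in `SYNC_ALL` mode.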
     
  • commit 41301ae78a99ead04ea42672a1ab72c6f44cc81d upstream.

    Gao feng reported that commit
    e51db73532955dc5eaba4235e62b74b460709d5b
    userns: Better restrictions on when proc and sysfs can be mounted
    caused a regression on mounting a new instance of proc in a mount
    namespace created with user namespace privileges, when binfmt_misc
    is mounted on /proc/sys/fs/binfmt_misc.

    This is an unintended regression caused by the absolutely bogus empty
    directory check in fs_fully_visible. The check fs_fully_visible replaced
    didn't even bother to attempt to verify proc was fully visible and
    hiding proc files with any kind of mount is rare. So for now fix
    the userspace regression by allowing directory with nlink == 1
    as /proc/sys/fs/binfmt_misc has.

    I will have a better patch but it is not stable material, or
    last minute kernel material. So it will have to wait.

    Acked-by: Serge Hallyn
    Acked-by: Gao feng
    Tested-by: Gao feng
    Signed-off-by: "Eric W. Biederman"
    Signed-off-by: Greg Kroah-Hartman

    Eric W. Biederman
     
  • commit f48cfddc6729ef133933062320039808bafa6f45 upstream.

    Aditya Kali (adityakali@google.com) wrote:
    > Commit bf056bfa80596a5d14b26b17276a56a0dcb080e5:
    > "proc: Fix the namespace inode permission checks." converted
    > the namespace files into symlinks. The same commit changed
    > the way namespace bind mounts appear in /proc/mounts:
    > $ mount --bind /proc/self/ns/ipc /mnt/ipc
    > Originally:
    > $ cat /proc/mounts | grep ipc
    > proc /mnt/ipc proc rw,nosuid,nodev,noexec 0 0
    >
    > After commit bf056bfa80596a5d14b26b17276a56a0dcb080e5:
    > $ cat /proc/mounts | grep ipc
    > proc ipc:[4026531839] proc rw,nosuid,nodev,noexec 0 0
    >
    > This breaks userspace which expects the 2nd field in
    > /proc/mounts to be a valid path.

    The symlinks /proc/<pid>/ns/{ipc,mnt,net,pid,user,uts} point to
    dentries allocated with d_alloc_pseudo that we can mount, and
    that have interesting names printed out with d_dname.

    When these files are bind mounted /proc/mounts is not currently
    displaying the mount point correctly because d_dname is called instead
    of just displaying the path where the file is mounted.

    Solve this by adding an explicit check to distinguish mounted pseudo
    inodes and unmounted pseudo inodes. Unmounted pseudo inodes always
    use mount of their filesystem as the mnt_root in their path making
    these two cases easy to distinguish.

    Acked-by: Serge Hallyn
    Reported-by: Aditya Kali
    Signed-off-by: "Eric W. Biederman"
    Signed-off-by: Greg Kroah-Hartman

    Eric W. Biederman
     
  • commit 62e96cf81988101fe9e086b2877307b6adda5197 upstream.

    This patch calls get_write_access in function gfs2_setattr_chown,
    which merely increases inode->i_writecount for the duration of the
    function. That will ensure that any file closes won't delete the
    inode's multi-block reservation while the function is running.
    It also ensures that a multi-block reservation exists when needed
    for quota change operations during the chown.

    Signed-off-by: Bob Peterson
    Signed-off-by: Steven Whitehouse
    Signed-off-by: Greg Kroah-Hartman

    Bob Peterson
     

18 Jan, 2014

3 commits

  • …x-stable into ti-linux-3.12.y

    This is the 3.12.7 stable release

    * tag 'v3.12.7' of git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable: (154 commits)
    Linux 3.12.7
    sh: add EXPORT_SYMBOL(min_low_pfn) and EXPORT_SYMBOL(max_low_pfn) to sh_ksyms_32.c
    ext4: fix bigalloc regression
    ACPIPHP / radeon / nouveau: Fix VGA switcheroo problem related to hotplug
    nouveau_acpi: convert acpi_get_handle() to acpi_has_method()
    aio/migratepages: make aio migrate pages sane
    aio: clean up and fix aio_setup_ring page mapping
    clocksource: dw_apb_timer_of: Fix support for dts binding "snps,dw-apb-timer"
    clocksource: dw_apb_timer_of: Fix read_sched_clock
    selinux: process labeled IPsec TCP SYN-ACK packets properly in selinux_ip_postroute()
    selinux: look for IPsec labels on both inbound and outbound packets
    sh: always link in helper functions extracted from libgcc
    gpio: msm: Fix irq mask/unmask by writing bits instead of numbers
    gpio: twl4030: Fix regression for twl gpio LED output
    sh-pfc: Fix PINMUX_GPIO macro
    jbd2: don't BUG but return ENOSPC if a handle runs out of space
    s390/3270: fix allocation of tty3270_screen structure
    ARM: sun7i: dt: Fix interrupt trigger types
    memcg: fix memcg_size() calculation
    GFS2: Fix incorrect invalidation for DIO/buffered I/O
    ...

    Conflicts:
    arch/arm/mach-omap2/omap_hwmod_7xx_data.c
    drivers/usb/musb/musb_core.c

    Signed-off-by: Dan Murphy <dmurphy@ti.com>

    Dan Murphy
     
  • …x-stable into ti-linux-3.12.y

    This is the 3.12.6 stable release

    * tag 'v3.12.6' of git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable: (120 commits)
    Linux 3.12.6
    ARM: OMAP2+: hwmod: Fix SOFTRESET logic
    drm/i915/vlv: fix up broken precision in vlv_crtc_clock_get
    drm/i915/vlv: add VLV specific clock_get function v3
    i915/vlv: untangle integrated clock source handling v4
    Btrfs: fix lockdep error in async commit
    Btrfs: fix a crash when running balance and defrag concurrently
    Btrfs: do not run snapshot-aware defragment on error
    Btrfs: take ordered root lock when removing ordered operations inode
    Btrfs: stop using vfs_read in send
    Btrfs: fix incorrect inode acl reset
    Btrfs: fix hole check in log_one_extent
    Btrfs: fix memory leak of chunks' extent map
    Btrfs: reset intwrite on transaction abort
    Btrfs: do a full search everytime in btrfs_search_old_slot
    Revert "net: update consumers of MSG_MORE to recognize MSG_SENDPAGE_NOTLAST"
    Input: elantech - add support for newer (August 2013) devices
    NFSv4 wait on recovery for async session errors
    sc1200_wdt: Fix oops
    staging: comedi: ssv_dnp: use comedi_dio_update_state()
    ...

    Conflicts:
    arch/arm/mach-omap2/omap_hwmod.c
    drivers/usb/musb/musb_cppi41.c

    Signed-off-by: Dan Murphy <dmurphy@ti.com>

    Dan Murphy
     
  • …x-stable into ti-linux-3.12.y

    This is the 3.12.5 stable release

    * tag 'v3.12.5' of git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable: (64 commits)
    Linux 3.12.5
    crypto: scatterwalk - Use sg_chain_ptr on chain entries
    drivers/char/i8k.c: add Dell XPLS L421X
    USB: cdc-acm: Added support for the Lenovo RD02-D400 USB Modem
    USB: spcp8x5: correct handling of CS5 setting
    USB: mos7840: correct handling of CS5 setting
    USB: ftdi_sio: fixed handling of unsupported CSIZE setting
    USB: pl2303: fixed handling of CS5 setting
    n_tty: Fix missing newline echo
    mei: add 9 series PCH mei device ids
    mei: me: add Lynx Point Wellsburg work station device id
    Input: mousedev - allow disabling even without CONFIG_EXPERT
    Input: allow deselecting serio drivers even without CONFIG_EXPERT
    tg3: avoid double-freeing of rx data memory
    iwlwifi: dvm: don't override mac80211's queue setting
    SCSI: Disable WRITE SAME for RAID and virtual host adapter drivers
    x86-64, build: Always pass in -mno-sse
    net: update consumers of MSG_MORE to recognize MSG_SENDPAGE_NOTLAST
    irq: Enable all irqs unconditionally in irq_resume
    Update of blkg_stat and blkg_rwstat may happen in bh context. While u64_stats_fetch_retry is only preempt_disable on 32bit UP system. This is not enough to avoid preemption by bh and may read strange 64 bit value.
    ...

    Signed-off-by: Dan Murphy <dmurphy@ti.com>

    Dan Murphy
     

10 Jan, 2014

24 commits

  • commit d0abafac8c9162f39c4f6b2f8141b772a09b3770 upstream.

    Commit f5a44db5d2 introduced a regression on filesystems created with
    the bigalloc feature (cluster size > blocksize). It causes xfstests
    generic/006 and /013 to fail with an unexpected JBD2 failure and
    transaction abort that leaves the test file system in a read only state.
    Other xfstests run on bigalloc file systems are likely to fail as well.

    The cause is the accidental use of a cluster mask where a cluster
    offset was needed in ext4_ext_map_blocks().

    Signed-off-by: Eric Whitney
    Cc: Theodore Ts'o
    Signed-off-by: Greg Kroah-Hartman

    Eric Whitney
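
The difference between a cluster mask and a cluster offset, which the bigalloc commit above says were confused, can be shown with two one-line helpers. The function names and `map_block` are hypothetical; the sketch assumes the cluster ratio is a power of two.

```c
#include <assert.h>
#include <stdint.h>

/* First logical block of the cluster containing lblk (the "mask" form). */
uint32_t lblk_cmask(uint32_t lblk, uint32_t ratio)
{
    assert(ratio != 0 && (ratio & (ratio - 1)) == 0); /* power of two */
    return lblk & ~(ratio - 1);
}

/* Position of lblk inside its cluster (the "offset" form). */
uint32_t lblk_coff(uint32_t lblk, uint32_t ratio)
{
    assert(ratio != 0 && (ratio & (ratio - 1)) == 0);
    return lblk & (ratio - 1);
}

/* Physical block for lblk when its cluster starts at cluster_pblk.
 * Using lblk_cmask() here instead would add the wrong quantity. */
uint64_t map_block(uint64_t cluster_pblk, uint32_t lblk, uint32_t ratio)
{
    return cluster_pblk + lblk_coff(lblk, ratio);
}
```

With a ratio of 16 blocks per cluster, logical block 37 lies in the cluster that starts at block 32, at offset 5 within it.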
     
  • commit 8e321fefb0e60bae4e2a28d20fc4fa30758d27c6 upstream.

    The arbitrary restriction on page counts offered by the core
    migrate_page_move_mapping() code results in rather suspicious looking
    fiddling with page reference counts in the aio_migratepage() operation.
    To fix this, make migrate_page_move_mapping() take an extra_count parameter
    that allows aio to tell the code about its own reference count on the page
    being migrated.

    While cleaning up aio_migratepage(), make it validate that the old page
    being passed in is actually what aio_migratepage() expects to prevent
    misbehaviour in the case of races.

    Signed-off-by: Benjamin LaHaise
    Signed-off-by: Greg Kroah-Hartman

    Benjamin LaHaise
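
The role of the extra_count parameter described in the aio migration commit above can be sketched as a reference-count check. The accounting below is a simplified assumption (one base reference, one more for a file mapping, one more for private data); `can_migrate` is a hypothetical helper, not migrate_page_move_mapping() itself.

```c
#include <assert.h>
#include <stdbool.h>

/* Migration proceeds only if the page's reference count is exactly what
 * the known holders account for; extra_count lets a caller such as aio
 * declare references of its own. */
bool can_migrate(int page_count, bool has_mapping, bool has_private,
                 int extra_count)
{
    int expected = 1 + extra_count;

    assert(extra_count >= 0);
    if (has_mapping)
        expected += 1 + (has_private ? 1 : 0);
    return page_count == expected;
}
```

In this model a mapped page on which the caller holds one additional reference has a count of 3, which passes only when the caller reports `extra_count` as 1.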
     
  • commit 3dc9acb67600393249a795934ccdfc291a200e6b upstream.

    Since commit 36bc08cc01709 ("fs/aio: Add support to aio ring pages
    migration") the aio ring setup code has used a special per-ring backing
    inode for the page allocations, rather than just using random anonymous
    pages.

    However, rather than remembering the pages as it allocated them, it
    would allocate the pages, insert them into the file mapping (dirty, so
    that they couldn't be free'd), and then forget about them. And then to
    look them up again, it would mmap the mapping, and then use
    "get_user_pages()" to get back an array of the pages we just created.

    Now, not only is that incredibly inefficient, it also leaked all the
    pages if the mmap failed (which could happen due to excessive number of
    mappings, for example).

    So clean it all up, making it much more straightforward. Also remove
    some left-overs of the previous (broken) mm_populate() usage that was
    removed in commit d6c355c7dabc ("aio: fix race in ring buffer page
    lookup introduced by page migration support") but left the pointless and
    now misleading MAP_POPULATE flag around.

    Tested-and-acked-by: Benjamin LaHaise
    Signed-off-by: Linus Torvalds
    Signed-off-by: Greg Kroah-Hartman

    Linus Torvalds
     
  • commit f6c07cad081ba222d63623d913aafba5586c1d2c upstream.

    If a handle runs out of space, we currently stop the kernel with a BUG
    in jbd2_journal_dirty_metadata(). This makes it hard to figure out
    what might be going on. So return an error of ENOSPC, so we can let
    the file system layer figure out what is going on, to make it more
    likely we can get useful debugging information. This should make it
    easier to debug problems such as the one which was reported by:

    https://bugzilla.kernel.org/show_bug.cgi?id=44731

    The only two callers of this function are ext4_handle_dirty_metadata()
    and ocfs2_journal_dirty(). The ocfs2 function will trigger a
    BUG_ON(), which means there will be no change in behavior. The ext4
    function will call ext4_error_inode() which will print the useful
    debugging information and then handle the situation using ext4's error
    handling mechanisms (i.e., which might mean halting the kernel or
    remounting the file system read-only).

    Also, since both file systems already call WARN_ON(), drop the WARN_ON
    from jbd2_journal_dirty_metadata() to avoid two stack traces from
    being displayed.

    Signed-off-by: "Theodore Ts'o"
    Cc: ocfs2-devel@oss.oracle.com
    Acked-by: Joel Becker
    Signed-off-by: Greg Kroah-Hartman

    Theodore Ts'o
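
The change from stopping the kernel to returning ENOSPC, as described in the jbd2 commit above, follows a common pattern: the low-level function reports the error and the caller decides how to react. The sketch below uses hypothetical types (`struct handle`, `struct fs_state`) and a made-up caller policy (log the error, go read-only).

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

struct handle { int buffer_credits; };

/* Low-level step: consume one credit, or report that none are left
 * instead of aborting. */
int dirty_metadata(struct handle *h)
{
    assert(h != NULL);
    if (h->buffer_credits <= 0)
        return -ENOSPC;
    h->buffer_credits--;
    return 0;
}

struct fs_state { bool read_only; int errors_logged; };

/* Hypothetical file system layer: records the error and applies its own
 * error policy, here remounting read-only. */
int fs_dirty_metadata(struct fs_state *fs, struct handle *h)
{
    int err = dirty_metadata(h);

    if (err) {
        fs->errors_logged++;
        fs->read_only = true;
    }
    return err;
}
```

The caller now sees the failure together with its own context, which is what makes the useful debugging output possible.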
     
  • commit dfd11184d894cd0a92397b25cac18831a1a6a5bc upstream.

    In patch 209806aba9d540dde3db0a5ce72307f85f33468f we allowed
    local deferred locks to be granted against a cached exclusive
    lock. That opened up a corner case which this patch now
    fixes.

    The solution to the problem is to check whether we have cached
    pages each time we do direct I/O and if so to unmap, flush
    and invalidate those pages. Since the glock state machine
    normally does that for us, mostly the code will be a no-op.

    Signed-off-by: Steven Whitehouse
    Signed-off-by: Greg Kroah-Hartman

    Steven Whitehouse
     
  • commit 502be2a32f09f388e4ff34ef2e3ebcabbbb261da upstream.

    This patch fixes a slab memory leak that sometimes can occur
    for files with a very short lifespan. The problem occurs when
    a dinode is deleted before it has gotten to the journal properly.
    In the leak scenario, the bd object is pinned for journal
    commitment (queued to the metadata buffers queue: sd_log_le_buf)
    but is subsequently unpinned and dequeued before it finds its way
    to the ail or the revoke queue. In this rare circumstance, the bd
    object needs to be freed from slab memory, or it is forgotten.
    We have to be very careful how we do it, though, because
    multiple processes can call gfs2_remove_from_journal. In order to
    avoid double-frees, only the process that does the unpinning is
    allowed to free the bd.

    Signed-off-by: Bob Peterson
    Signed-off-by: Steven Whitehouse
    Signed-off-by: Greg Kroah-Hartman

    Bob Peterson
     
  • commit 9290a9a7c0bcf5400e8dbfbf9707fa68ea3fb338 upstream.

    Function gfs2_remove_from_ail drops the reference on the bh via
    brelse. This patch fixes a race condition whereby bh is dereferenced
    after the brelse when setting bd->bd_blkno = bh->b_blocknr;
    Under certain rare circumstances, bh might be gone or reused,
    and bd->bd_blkno is set to whatever that memory happens to be,
    which is often 0. Later, in gfs2_trans_add_unrevoke, that bd fails
    the test "bd->bd_blkno >= blkno" which causes it to never be freed.
    The end result is that the bd is never freed from the bufdata cache,
    which results in this error:
    slab error in kmem_cache_destroy(): cache `gfs2_bufdata': Can't free all objects

    Signed-off-by: Bob Peterson
    Signed-off-by: Steven Whitehouse
    Signed-off-by: Greg Kroah-Hartman

    Bob Peterson
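
The ordering problem in the GFS2 commit above, reading a field after the reference has been dropped, can be modelled without real frees. In this sketch `buf_put` poisons the block number when the last reference goes away, standing in for memory being reused; all names are hypothetical and the actual fix may differ in detail.

```c
#include <assert.h>
#include <stdint.h>

struct buf { int refcount; uint64_t blocknr; };
struct bufdata { struct buf *bh; uint64_t blkno; };

/* Drop a reference. When the last one goes, the contents are no longer
 * trustworthy; the sketch models that by zeroing the field. */
void buf_put(struct buf *b)
{
    assert(b->refcount > 0);
    if (--b->refcount == 0)
        b->blocknr = 0;
}

/* Buggy order: release first, then read the block number. */
void detach_buggy(struct bufdata *bd)
{
    buf_put(bd->bh);
    bd->blkno = bd->bh->blocknr;
}

/* Fixed order: copy what is needed while the reference is still held. */
void detach_fixed(struct bufdata *bd)
{
    bd->blkno = bd->bh->blocknr;
    buf_put(bd->bh);
}
```

In the buggy order the recorded block number ends up as whatever the released buffer now contains, which in this model is 0, matching the symptom the commit message describes.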
     
  • commit dfe5b9ad83a63180f358b27d1018649a27b394a9 upstream.

    This is a GFS2 version of Tejun's patch:
    4f331f01b9c43bf001d3ffee578a97a1e0633eac
    vfs: don't hold s_umount over close_bdev_exclusive() call

    In this case it's blkdev_put itself that is the issue and this
    patch uses the same solution of dropping and retaking s_umount.

    Reported-by: Tejun Heo
    Reported-by: Al Viro
    Signed-off-by: Steven Whitehouse
    Signed-off-by: Greg Kroah-Hartman

    Steven Whitehouse
     
  • commit df4e7ac0bb70abc97fbfd9ef09671fc084b3f9db upstream.

    ext2_quota_write() doesn't properly setup bh it passes to
    ext2_get_block() and thus we hit assertion BUG_ON(maxblocks == 0) in
    ext2_get_blocks() (or we could actually ask for mapping arbitrary number
    of blocks depending on whatever value was on stack).

    Fix ext2_quota_write() to properly fill in number of blocks to map.

    Reviewed-by: "Theodore Ts'o"
    Reviewed-by: Christoph Hellwig
    Reported-by: Christoph Hellwig
    Signed-off-by: Jan Kara
    Signed-off-by: Greg Kroah-Hartman

    Jan Kara
     
  • commit f1e3268126a35b9d3cb8bf67487fcc6cd13991d8 upstream.

    Set FILE_CREATED on O_CREAT|O_EXCL.

    cifs code didn't change during commit 116cc0225381415b96551f725455d067f63a76a0

    Kernel bugzilla 66251

    Signed-off-by: Shirish Pargaonkar
    Acked-by: Jeff Layton
    Signed-off-by: Steve French
    Signed-off-by: Greg Kroah-Hartman

    Shirish Pargaonkar
     
  • commit 750b8de6c4277d7034061e1da50663aa1b0479e4 upstream.

    When we obtain tcon from cifs_sb, we use cifs_sb_tlink() to first obtain
    tlink which also grabs a reference to it. We do not drop this reference
    to tlink once we are done with the call.

    The patch fixes this issue by instead passing tcon as a parameter and
    avoids having to obtain a reference to the tlink. A lookup for the tcon
    is already made in the calling functions and this way we avoid having to
    re-run the lookup. This is also consistent with the argument list for
    other similar calls for M-F symlinks.

    We should also return ENOSYS when we do not find a protocol-specific
    function to look up the MF Symlink data.

    Signed-off-by: Sachin Prabhu
    Reviewed-by: Jeff Layton
    Signed-off-by: Steve French
    Signed-off-by: Greg Kroah-Hartman

    Sachin Prabhu
     
  • commit 56f91aad69444d650237295f68c195b74d888d95 upstream.

    If the length of data to be read in readpage() is exactly
    PAGE_CACHE_SIZE, the original code does not flush d-cache
    for data consistency after finishing reading. This patch fixes
    this.

    Signed-off-by: Li Wang
    Signed-off-by: Sage Weil
    Signed-off-by: Greg Kroah-Hartman

    Li Wang
     
  • commit 8f9ff189205a6817aee5a1f996f876541f86e07c upstream.

    When using FITRIM ioctl on a file system without journal it will
    only trim the block group once, no matter how many times you invoke
    FITRIM ioctl and how many blocks you release from the block group.

    It is because we only clear EXT4_GROUP_INFO_WAS_TRIMMED_BIT in journal
    callback. Fix this by clearing the bit in no journal mode as well.

    Signed-off-by: Lukas Czerner
    Signed-off-by: "Theodore Ts'o"
    Reported-by: Jorge Fábregas
    Signed-off-by: Greg Kroah-Hartman

    Lukas Czerner
     
  • commit f5a44db5d2d677dfbf12deee461f85e9ec633961 upstream.

    The missing casts can cause the high 32 bits of the 64-bit physical
    block numbers to be lost. Set up new macros which allow us to make sure
    the right thing happens, even if at some point we end up supporting
    larger logical block numbers.

    Thanks to Emese Revfy and the PaX security team for reporting this
    issue.

    Reported-by: PaX Team
    Reported-by: Emese Revfy
    Signed-off-by: "Theodore Ts'o"
    Signed-off-by: Greg Kroah-Hartman

    Theodore Ts'o
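
Why a missing cast loses the upper part of a physical block number, as the commit above describes, follows from C's integer conversion rules. The sketch assumes a 32-bit `unsigned int` cluster ratio and a 64-bit block number; the helper names are hypothetical and are not the macros the commit introduces.

```c
#include <assert.h>
#include <stdint.h>

/* Without a cast, ~(ratio - 1) is computed in 32 bits and then
 * zero-extended, so the AND clears bits 32..63 of pblk. */
uint64_t pblk_cmask_uncast(uint64_t pblk, uint32_t ratio)
{
    assert(ratio != 0 && (ratio & (ratio - 1)) == 0); /* power of two */
    return pblk & ~(ratio - 1);
}

/* Widening before the complement keeps the upper mask bits set. */
uint64_t pblk_cmask_cast(uint64_t pblk, uint32_t ratio)
{
    assert(ratio != 0 && (ratio & (ratio - 1)) == 0);
    return pblk & ~((uint64_t)ratio - 1);
}
```

For a block number just above 4 GiB worth of blocks, such as 0x100000025 with a ratio of 16, the uncast version yields 0x20 while the cast version yields 0x100000020.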
     
  • commit 34cf865d54813aab3497838132fb1bbd293f4054 upstream.

    Akira-san has been reporting rare deadlocks of his machine when running
    xfstests test 269 on ext4 filesystem. The problem turned out to be in
    ext4_da_reserve_metadata() and ext4_da_reserve_space() which called
    ext4_should_retry_alloc() while holding i_data_sem. Since
    ext4_should_retry_alloc() can force a transaction commit, this is a
    lock ordering violation and leads to deadlocks.

    Fix the problem by just removing the retry loops. These functions should
    just report ENOSPC to the caller (e.g. ext4_da_write_begin()) and that
    function must take care of retrying after dropping all necessary locks.

    Reported-and-tested-by: Akira Fujita
    Reviewed-by: Zheng Liu
    Signed-off-by: Jan Kara
    Signed-off-by: "Theodore Ts'o"
    Signed-off-by: Greg Kroah-Hartman

    Jan Kara
     
  • commit 30fac0f75da24dd5bb43c9e911d2039a984ac815 upstream.

    When the filesystem doesn't support extents (like in ext2/3
    compatibility modes), there is no need to reserve any clusters. Space
    estimates for writing are exact, hole punching doesn't need new
    metadata, and there are no unwritten extents to convert.

    This fixes a problem where a filesystem that still has some free space
    when accessed with a native ext2/3 driver suddenly reports ENOSPC when
    accessed with the ext4 driver.

    Reported-by: Geert Uytterhoeven
    Tested-by: Geert Uytterhoeven
    Reviewed-by: Lukas Czerner
    Signed-off-by: Jan Kara
    Signed-off-by: "Theodore Ts'o"
    Signed-off-by: Greg Kroah-Hartman

    Jan Kara
     
  • commit 9105bb149bbbc555d2e11ba5166dfe7a24eae09e upstream.

    That thing should be del_timer_sync(); consider what happens
    if ext4_put_super() call of del_timer() happens to come just as it's
    getting run on another CPU. Since that timer reschedules itself
    to run next day, you are pretty much guaranteed that you'll end up
    with kfree'd scheduled timer, with usual fun consequences. AFAICS,
    that's -stable fodder all way back to 2010... [the second del_timer_sync()
    is almost certainly not needed, but it doesn't hurt either]

    Signed-off-by: Al Viro
    Signed-off-by: "Theodore Ts'o"
    Signed-off-by: Greg Kroah-Hartman

    Al Viro
     
  • commit 5946d089379a35dda0e531710b48fca05446a196 upstream.

    A corrupted ext4 may have out of order leaf extents, i.e.

    extent: lblk 0--1023, len 1024, pblk 9217, flags: LEAF UNINIT
    extent: lblk 1000--2047, len 1024, pblk 10241, flags: LEAF UNINIT
    ^^^^ overlap with previous extent

    Reading such extent could hit BUG_ON() in ext4_es_cache_extent().

    BUG_ON(end < lblk);

    The problem is that __read_extent_tree_block() tries to cache holes as
    well but assumes 'lblk' is greater than 'prev' and passes underflowed
    length to ext4_es_cache_extent(). Fix it by checking for overlapping
    extents in ext4_valid_extent_entries().

    I hit this when fuzz testing ext4, and am able to reproduce it by
    modifying the on-disk extent by hand.

    Also add the check for (ee_block + len - 1) in ext4_valid_extent() to
    make sure the value does not overflow.

    Ran xfstests on patched ext4 and saw no regressions.

    Cc: Lukáš Czerner
    Signed-off-by: Eryu Guan
    Signed-off-by: "Theodore Ts'o"
    Signed-off-by: Greg Kroah-Hartman

    Eryu Guan
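
The two checks described in the commit above, rejecting overlapping leaf extents and rejecting an end block that wraps around, can be sketched over a plain array. `struct extent`, `extent_valid` and `extents_valid` are hypothetical simplifications of the on-disk format and of the ext4 validation helpers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct extent { uint32_t lblk; uint32_t len; };

/* A single extent is valid if it is non-empty and its last block,
 * lblk + len - 1, does not wrap around 32 bits. */
bool extent_valid(const struct extent *e)
{
    uint32_t last;

    if (e->len == 0)
        return false;
    last = e->lblk + e->len - 1;
    return last >= e->lblk;
}

/* A leaf is valid if every extent is valid and each one starts after
 * the previous one ends, so no "hole length" can underflow. */
bool extents_valid(const struct extent *ex, size_t n)
{
    uint32_t prev_last = 0;

    assert(ex != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (!extent_valid(&ex[i]))
            return false;
        if (i > 0 && ex[i].lblk <= prev_last)
            return false;            /* overlaps the previous extent */
        prev_last = ex[i].lblk + ex[i].len - 1;
    }
    return true;
}
```

The pair from the commit message, an extent starting at block 0 with length 1024 followed by one starting at block 1000, is rejected because the second extent starts before the first one ends.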
     
  • commit 4e8d2139802ce4f41936a687f06c560b12115247 upstream.

    ext4_mb_put_pa should hold pa->pa_lock before accessing pa->pa_count.
    While ext4_mb_use_preallocated checks pa->pa_deleted first and then
    increments pa->pa_count later, ext4_mb_put_pa decrements pa->pa_count
    before holding pa->pa_lock and then sets pa->pa_deleted.

    * Free sequence
    ext4_mb_put_pa (1): atomic_dec_and_test pa->pa_count
    ext4_mb_put_pa (2): lock pa->pa_lock
    ext4_mb_put_pa (3): check pa->pa_deleted
    ext4_mb_put_pa (4): set pa->pa_deleted=1
    ext4_mb_put_pa (5): unlock pa->pa_lock
    ext4_mb_put_pa (6): remove pa from a list
    ext4_mb_pa_callback: free pa

    * Use sequence
    ext4_mb_use_preallocated (1): iterate over preallocation
    ext4_mb_use_preallocated (2): lock pa->pa_lock
    ext4_mb_use_preallocated (3): check pa->pa_deleted
    ext4_mb_use_preallocated (4): increase pa->pa_count
    ext4_mb_use_preallocated (5): unlock pa->pa_lock
    ext4_mb_release_context: access pa

    * Use-after-free sequence
    [initial status] <pa_deleted = 0, pa_count = 1>
    ext4_mb_use_preallocated (1): iterate over preallocation
    ext4_mb_use_preallocated (2): lock pa->pa_lock
    ext4_mb_use_preallocated (3): check pa->pa_deleted
    ext4_mb_put_pa (1): atomic_dec_and_test pa->pa_count
    [pa_count decremented] <pa_deleted = 0, pa_count = 0>
    ext4_mb_use_preallocated (4): increase pa->pa_count
    [pa_count incremented] <pa_deleted = 0, pa_count = 1>
    ext4_mb_use_preallocated (5): unlock pa->pa_lock
    ext4_mb_put_pa (2): lock pa->pa_lock
    ext4_mb_put_pa (3): check pa->pa_deleted
    ext4_mb_put_pa (4): set pa->pa_deleted=1
    [race condition!] <pa_deleted = 1, pa_count = 1>
    ext4_mb_put_pa (5): unlock pa->pa_lock
    ext4_mb_put_pa (6): remove pa from a list
    ext4_mb_pa_callback: free pa
    ext4_mb_release_context: access pa
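
    The ordering that the first paragraph asks for (take the lock before
    touching the count) can be sketched in userspace C11. This is a
    simplified model, not the kernel code: struct pa, pa_lock(),
    pa_unlock(), pa_try_use() and pa_put() are hypothetical stand-ins, and
    an atomic_flag spinlock stands in for pa->pa_lock.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for a preallocation descriptor. */
struct pa {
    atomic_flag lock; /* stands in for pa->pa_lock */
    int count;        /* stands in for pa->pa_count, guarded by lock */
    bool deleted;     /* stands in for pa->pa_deleted, guarded by lock */
};

void pa_lock(struct pa *pa)
{
    while (atomic_flag_test_and_set_explicit(&pa->lock, memory_order_acquire))
        ; /* spin until the flag is released */
}

void pa_unlock(struct pa *pa)
{
    atomic_flag_clear_explicit(&pa->lock, memory_order_release);
}

/* Use side: take a reference only while the pa is not marked deleted. */
bool pa_try_use(struct pa *pa)
{
    bool ok;

    pa_lock(pa);
    ok = !pa->deleted;
    if (ok)
        pa->count++;
    pa_unlock(pa);
    return ok;
}

/* Put side: the decrement and the deleted check happen under the same
 * lock, so a concurrent pa_try_use() cannot slip in between them.
 * Returns true when the caller dropped the last reference and must
 * unlink and free the pa. */
bool pa_put(struct pa *pa)
{
    bool last;

    pa_lock(pa);
    assert(pa->count > 0);
    last = (--pa->count == 0) && !pa->deleted;
    if (last)
        pa->deleted = true;
    pa_unlock(pa);
    return last;
}
```

    In this model the count can only reach zero while the lock is held, so
    a user either takes its reference before the final put or sees
    deleted set and backs off.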

    AddressSanitizer has detected use-after-free in ext4_mb_new_blocks
    Bug report: http://goo.gl/rG1On3

    Signed-off-by: Junho Ryu
    Signed-off-by: "Theodore Ts'o"
    Signed-off-by: Greg Kroah-Hartman

    Junho Ryu
     
  • commit ae1495b12df1897d4f42842a7aa7276d920f6290 upstream.

    While it's true that errors can only happen if there is a bug in
    jbd2_journal_dirty_metadata(), if a bug does happen, we need to halt
    the kernel or remount the file system read-only in order to avoid
    further data loss. The ext4_journal_abort_handle() function doesn't
    do any of this, and while this call (since it doesn't adjust
    refcounts) will likely result in the file system eventually
    deadlocking since the current transaction will never be able to close,
    it's much cleaner to let ext4's error handling system deal with
    this situation.

    There's a separate bug here, which is that if certain jbd2 errors
    occur and the file system is mounted errors=continue, the file
    system will probably eventually grind to a halt as described
    above. But things have been this way for a long time, and usually when
    we have these sorts of errors it's pretty much a disaster --- and
    that's why the jbd2 layer aggressively retries memory allocations,
    which is the most likely cause of these jbd2 errors.

    Signed-off-by: "Theodore Ts'o"
    Reviewed-by: Jan Kara
    Signed-off-by: Greg Kroah-Hartman

    Theodore Ts'o
     
  • commit 718cc6f88cbfc4fbd39609f28c4c86883945f90d upstream.

    xfs_quota(8) hangs when trying to turn group/project quota off
    before the user quota is off. This can be reproduced 100% of the time by:
    # mount -ouquota,gquota /dev/sda7 /xfs
    # mkdir /xfs/test
    # xfs_quota -xc 'off -g' /xfs <-- hangs up
    # echo w > /proc/sysrq-trigger
    # dmesg

    SysRq : Show Blocked State
    task PC stack pid father
    xfs_quota D 0000000000000000 0 27574 2551 0x00000000
    [snip]
    Call Trace:
    [] schedule+0xad/0xc0
    [] schedule_timeout+0x35e/0x3c0
    [] ? mark_held_locks+0x176/0x1c0
    [] ? call_timer_fn+0x2c0/0x2c0
    [] ? xfs_qm_shrink_count+0x30/0x30 [xfs]
    [] schedule_timeout_uninterruptible+0x26/0x30
    [] xfs_qm_dquot_walk+0x235/0x260 [xfs]
    [] ? xfs_perag_get+0x1d8/0x2d0 [xfs]
    [] ? xfs_perag_get+0x5/0x2d0 [xfs]
    [] ? xfs_inode_ag_iterator+0xae/0xf0 [xfs]
    [] ? xfs_trans_free_dqinfo+0x50/0x50 [xfs]
    [] ? xfs_inode_ag_iterator+0xcf/0xf0 [xfs]
    [] xfs_qm_dqpurge_all+0x66/0xb0 [xfs]
    [] xfs_qm_scall_quotaoff+0x20a/0x5f0 [xfs]
    [] xfs_fs_set_xstate+0x136/0x180 [xfs]
    [] do_quotactl+0x53a/0x6b0
    [] ? iput+0x5b/0x90
    [] SyS_quotactl+0x167/0x1d0
    [] ? trace_hardirqs_on_thunk+0x3a/0x3f
    [] system_call_fastpath+0x16/0x1b

    It's fine if we turn user quota off first and then turn off the other
    kinds of quota if they are enabled, since the group/project dquot
    refcount drops to zero once the user quota is off. Otherwise,
    the refcount of those dquots is non-zero because the user dquots might
    refer to them as hint(s). Hence, the above operation causes an infinite
    loop in xfs_qm_dquot_walk() while trying to purge the dquot cache.

    This problem has been around since Linux 3.4; it was introduced by:
    [ b84a3a9675 xfs: remove the per-filesystem list of dquots ]

    Originally, we released the group dquot pointers that the user
    dquots may be carrying around as a hint via xfs_qm_detach_gdquots().
    However, with the above change, no such work is done before
    purging the group/project dquot cache.

    In order to solve this problem, this patch introduces a special routine,
    xfs_qm_dqpurge_hints(), which releases the group/project dquot
    pointers that the user dquots may be carrying around as a hint, and then
    proceeds to purge the user dquot cache if requested.

    (cherry picked from commit df8052e7dae00bde6f21b40b6e3e1099770f3afc)

    Signed-off-by: Jie Liu
    Reviewed-by: Dave Chinner
    Signed-off-by: Ben Myers
    Signed-off-by: Greg Kroah-Hartman

    Jie Liu
     
  • commit 1881686f842065d2f92ec9c6424830ffc17d23b0 upstream.

    e34ecee2ae791df674dfb466ce40692ca6218e43 reworked the percpu reference
    counting to correct a bug trinity found. Unfortunately, the change led
    to kioctxes being leaked because there was no final reference count to
    put. Add that reference count back in to fix things.

    Signed-off-by: Benjamin LaHaise
    Signed-off-by: Greg Kroah-Hartman

    Benjamin LaHaise
     
  • commit ff638b7df5a9264024a6448bdfde2b2bf5d1994a upstream.

    ceph_osdc_readpages() returns the number of bytes read. Currently,
    the code only allocates full-zero pages into fscache; this patch
    fixes that.

    Signed-off-by: Li Wang
    Reviewed-by: Milosz Tanski
    Reviewed-by: Sage Weil
    Signed-off-by: Greg Kroah-Hartman

    Li Wang
     
  • commit fc55d2c9448b34218ca58733a6f51fbede09575b upstream.

    We also need to wake up 'safe' waiters if an error occurs or the request
    is aborted. Otherwise sync(2)/fsync(2) may hang forever.

    Signed-off-by: Yan, Zheng
    Signed-off-by: Sage Weil
    Signed-off-by: Greg Kroah-Hartman

    Yan, Zheng