07 Aug, 2015

16 commits

  • Recently I addressed a few hwpoison race problems and the patches were
    merged in v4.2-rc1. That made progress, but unfortunately some problems
    still remain because my testing had limited coverage. So I'm trying to
    fix or avoid them in this series.

    One point I expect to discuss is that patch 4/5 changes the page flag
    set that is checked at free time. In the current behavior, __PG_HWPOISON
    is not supposed to be set when the page is freed. I think there is no
    strong reason for this behavior, and it causes a problem that is hard
    to fix on the error handler side alone (because __PG_HWPOISON could be
    set at an arbitrary time). So I suggest changing it.

    With this patchset, hwpoison stress testing in official mce-test
    testsuite (which previously failed) passes.

    This patch (of 5):

    In the "just unpoisoned" path, we do put_page and then unlock_page, which
    is the wrong order and causes a "freeing locked page" bug. So let's fix it.

    Signed-off-by: Naoya Horiguchi
    Cc: Andi Kleen
    Cc: Dean Nelson
    Cc: Tony Luck
    Cc: "Kirill A. Shutemov"
    Cc: Hugh Dickins
    Acked-by: David Rientjes
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Naoya Horiguchi
     
  • The shm implementation internally uses shmem or hugetlbfs inodes for shm
    segments. As these inodes are never directly exposed to userspace and
    only accessed through the shm operations which are already hooked by
    security modules, mark the inodes with the S_PRIVATE flag so that inode
    security initialization and permission checking is skipped.

    This was motivated by the following lockdep warning:

    ======================================================
    [ INFO: possible circular locking dependency detected ]
    4.2.0-0.rc3.git0.1.fc24.x86_64+debug #1 Tainted: G W
    -------------------------------------------------------
    httpd/1597 is trying to acquire lock:
    (&ids->rwsem){+++++.}, at: shm_close+0x34/0x130
    but task is already holding lock:
    (&mm->mmap_sem){++++++}, at: SyS_shmdt+0x4b/0x180
    which lock already depends on the new lock.
    the existing dependency chain (in reverse order) is:
    -> #3 (&mm->mmap_sem){++++++}:
    lock_acquire+0xc7/0x270
    __might_fault+0x7a/0xa0
    filldir+0x9e/0x130
    xfs_dir2_block_getdents.isra.12+0x198/0x1c0 [xfs]
    xfs_readdir+0x1b4/0x330 [xfs]
    xfs_file_readdir+0x2b/0x30 [xfs]
    iterate_dir+0x97/0x130
    SyS_getdents+0x91/0x120
    entry_SYSCALL_64_fastpath+0x12/0x76
    -> #2 (&xfs_dir_ilock_class){++++.+}:
    lock_acquire+0xc7/0x270
    down_read_nested+0x57/0xa0
    xfs_ilock+0x167/0x350 [xfs]
    xfs_ilock_attr_map_shared+0x38/0x50 [xfs]
    xfs_attr_get+0xbd/0x190 [xfs]
    xfs_xattr_get+0x3d/0x70 [xfs]
    generic_getxattr+0x4f/0x70
    inode_doinit_with_dentry+0x162/0x670
    sb_finish_set_opts+0xd9/0x230
    selinux_set_mnt_opts+0x35c/0x660
    superblock_doinit+0x77/0xf0
    delayed_superblock_init+0x10/0x20
    iterate_supers+0xb3/0x110
    selinux_complete_init+0x2f/0x40
    security_load_policy+0x103/0x600
    sel_write_load+0xc1/0x750
    __vfs_write+0x37/0x100
    vfs_write+0xa9/0x1a0
    SyS_write+0x58/0xd0
    entry_SYSCALL_64_fastpath+0x12/0x76
    ...

    Signed-off-by: Stephen Smalley
    Reported-by: Morten Stevens
    Acked-by: Hugh Dickins
    Acked-by: Paul Moore
    Cc: Manfred Spraul
    Cc: Davidlohr Bueso
    Cc: Prarit Bhargava
    Cc: Eric Paris
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Stephen Smalley
     
  • Commit 92923ca3aace ("mm: meminit: only set page reserved in the
    memblock region") broke memory hotplug which expects the memmap for
    newly added sections to be reserved until onlined by
    online_pages_range(). This patch marks hotplugged pages as reserved
    when adding new zones.

    Signed-off-by: Mel Gorman
    Reported-by: David Vrabel
    Tested-by: David Vrabel
    Cc: Nathan Zimmer
    Cc: Robin Holt
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mel Gorman
     
  • When using a large volume, for example a 9T volume with 2T already used,
    frequent creation of small files with O_DIRECT when the IO is not
    cluster aligned may clear sectors in the wrong place. This will cause
    filesystem corruption.

    This is because p_cpos is a u32. When calculating the corresponding
    sector it should be converted to u64 first, otherwise it may overflow.

    Signed-off-by: Joseph Qi
    Cc: Mark Fasheh
    Cc: Joel Becker
    Cc: [4.0+]
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Joseph Qi
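
    The overflow described above can be reproduced in user space. The
    following is a minimal sketch, not the ocfs2 patch itself: the function
    names and the shift of 11 used in the note below (1 MiB clusters with
    512-byte sectors) are assumptions chosen for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Buggy variant: the shift is evaluated in 32 bits, so the high bits of
 * the sector number are lost before the result is widened. */
static uint64_t cluster_to_sector_truncated(uint32_t cpos, unsigned int shift)
{
    assert(shift < 32);
    return (uint32_t)(cpos << shift);
}

/* Fixed variant: widen the cluster offset to 64 bits first, then shift. */
static uint64_t cluster_to_sector(uint32_t cpos, unsigned int shift)
{
    assert(shift < 32);
    return (uint64_t)cpos << shift;
}
```

    With a shift of 11, cluster 0x300000 (3 TiB into the volume) maps to
    sector 0x180000000 in the fixed variant, while the truncated variant
    yields 0x80000000, which points at an unrelated part of the disk.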
     
  • The s-Par visornic driver, currently in staging, processes a queue that
    is serviced by an s-Par service partition. We can get a message that
    something has happened with the service partition. When that happens, we
    must not access the channel until we get a message that the service
    partition is back again.

    The visornic driver has a thread for processing the channel. When we get
    the message, we need to be able to park the thread and then resume it
    when the problem clears.

    We can do this with kthread_park and kthread_unpark, but they are not
    exported from the kernel. This patch exports the needed functions.

    Signed-off-by: David Kershner
    Acked-by: Ingo Molnar
    Acked-by: Neil Horman
    Acked-by: Thomas Gleixner
    Cc: Richard Weinberger
    Cc: Tejun Heo
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Kershner
     
  • fsnotify_clear_marks_by_group_flags() can race with
    fsnotify_destroy_marks() so that when fsnotify_destroy_mark_locked()
    drops mark_mutex, a mark from the list iterated by
    fsnotify_clear_marks_by_group_flags() can be freed and thus the next
    entry pointer we have cached may become stale and we dereference free
    memory.

    Fix the problem by first moving the marks to be freed to a special
    private list and then always freeing the first entry in that list. This
    method is safe even when entries from the list can disappear once we
    drop the lock.

    Signed-off-by: Jan Kara
    Reported-by: Ashish Sangwan
    Reviewed-by: Ashish Sangwan
    Cc: Lino Sanfilippo
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jan Kara
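
    The two-step approach described above can be sketched in user space.
    This is an illustration only, assuming a hypothetical singly linked
    `struct mark`; the kernel uses its own list primitives and locking,
    which are not modelled here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a notification mark. */
struct mark {
    struct mark *next;
    unsigned int flags;
};

/* Test helper: push a new mark onto the front of a list. */
static void push_mark(struct mark **head, unsigned int flags)
{
    struct mark *m = malloc(sizeof(*m));
    assert(m != NULL);
    m->flags = flags;
    m->next = *head;
    *head = m;
}

/* Step 1: unlink every mark matching `flags` from the shared list and
 * collect it on a private list that nobody else can iterate. */
static struct mark *detach_matching(struct mark **head, unsigned int flags)
{
    struct mark *to_free = NULL;
    struct mark **link = head;

    while (*link) {
        struct mark *m = *link;
        if (m->flags & flags) {
            *link = m->next;   /* unlink from the shared list */
            m->next = to_free; /* push onto the private list */
            to_free = m;
        } else {
            link = &m->next;
        }
    }
    return to_free;
}

/* Step 2: always take the first entry of the list. No "next" pointer is
 * cached across the point where the kernel would drop its lock. */
static size_t free_all(struct mark **list)
{
    size_t freed = 0;

    while (*list) {
        struct mark *m = *list;
        *list = m->next;
        free(m);
        freed++;
    }
    return freed;
}
```

    Because the loop in step 2 re-reads the list head on every iteration,
    it stays correct even if another path removes entries in between.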
     
  • Using a 64 bit constant generates "warning: integer constant is too
    large for 'long' type" on 32 bit platforms. Instead use ~0ul and
    BITS_PER_LONG.

    Detected by Andrew Morton on ARMD.

    Signed-off-by: Sowmini Varadhan
    Cc: Benjamin Herrenschmidt
    Cc: David S. Miller
    Cc: Guenter Roeck
    Cc: Rasmus Villemoes
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Sowmini Varadhan
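
    A minimal sketch of the idiom named above, building a mask from ~0ul
    and the width of a long instead of a 64-bit constant. The helper name
    and the BITS_PER_LONG_SKETCH macro are user-space stand-ins chosen for
    this example; this is not the code from the patch.

```c
#include <assert.h>
#include <limits.h>

/* User-space stand-in for the kernel's BITS_PER_LONG. */
#define BITS_PER_LONG_SKETCH (sizeof(unsigned long) * CHAR_BIT)

/* Mask with the low `nbits` bits set, for 1 <= nbits <= bits in a long.
 * Starting from ~0UL keeps the constant within 'unsigned long' on both
 * 32-bit and 64-bit platforms. */
static unsigned long low_bits_mask(unsigned int nbits)
{
    assert(nbits >= 1 && nbits <= BITS_PER_LONG_SKETCH);
    return ~0UL >> (BITS_PER_LONG_SKETCH - nbits);
}
```

    The same source then compiles without the "integer constant is too
    large" warning regardless of the platform word size.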
     
  • This patch fixes creation of new kmem-caches after enabling
    sanity_checks for existing mergeable kmem-caches at runtime: before this
    patch, creation fails because the unique name in sysfs is already taken
    by an existing kmem-cache.

    Unlike other debug options this doesn't change object layout and could
    be enabled and disabled at any time.

    Signed-off-by: Konstantin Khlebnikov
    Acked-by: Christoph Lameter
    Cc: Pekka Enberg
    Acked-by: David Rientjes
    Cc: Joonsoo Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Konstantin Khlebnikov
     
  • This function may copy the si_addr_lsb field to user mode when it hasn't
    been initialized, which can leak kernel stack data to user mode.

    Just checking the value of si_code is insufficient because the same
    si_code value is shared between multiple signals. This is solved by
    checking the value of si_signo in addition to si_code.

    Signed-off-by: Amanieu d'Antras
    Cc: Oleg Nesterov
    Cc: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Amanieu d'Antras
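
    The check described above can be sketched as follows. The si_code
    values are defined locally for illustration (on Linux, BUS_MCEERR_AR
    and FPE_FLTOVF both have the value 4), and the helper name is
    hypothetical rather than taken from the patch.

```c
#include <assert.h>
#include <signal.h>
#include <stdbool.h>

/* Local copies of Linux si_code values, so the sketch does not depend on
 * libc feature macros. Note that two different signals share the value 4. */
#define SKETCH_BUS_MCEERR_AR 4
#define SKETCH_BUS_MCEERR_AO 5
#define SKETCH_FPE_FLTOVF    4

/* si_addr_lsb is only meaningful for the SIGBUS machine-check codes, so
 * both the signal number and the code have to match. */
static bool has_addr_lsb(int si_signo, int si_code)
{
    assert(si_signo > 0);
    if (si_signo != SIGBUS)
        return false;
    return si_code == SKETCH_BUS_MCEERR_AR ||
           si_code == SKETCH_BUS_MCEERR_AO;
}
```

    Testing si_code alone would also match SIGFPE with FPE_FLTOVF, in
    which case si_addr_lsb was never written and would be copied out
    uninitialized.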
     
  • This function may copy the si_addr_lsb, si_lower and si_upper fields to
    user mode when they haven't been initialized, which can leak kernel
    stack data to user mode.

    Just checking the value of si_code is insufficient because the same
    si_code value is shared between multiple signals. This is solved by
    checking the value of si_signo in addition to si_code.

    Signed-off-by: Amanieu d'Antras
    Cc: Oleg Nesterov
    Cc: Ingo Molnar
    Cc: Russell King
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Amanieu d'Antras
     
  • This function can leak kernel stack data when the user siginfo_t has a
    positive si_code value. The top 16 bits of si_code describe which fields
    in the siginfo_t union are active, but they are treated inconsistently
    between copy_siginfo_from_user32, copy_siginfo_to_user32 and
    copy_siginfo_to_user.

    copy_siginfo_from_user32 is called from rt_sigqueueinfo and
    rt_tgsigqueueinfo in which the user has full control over the top 16 bits
    of si_code.

    This fixes the following information leaks:
    x86: 8 bytes leaked when sending a signal from a 32-bit process to
    itself. This leak grows to 16 bytes if the process uses x32.
    (si_code = __SI_CHLD)
    x86: 100 bytes leaked when sending a signal from a 32-bit process to
    a 64-bit process. (si_code = -1)
    sparc: 4 bytes leaked when sending a signal from a 32-bit process to a
    64-bit process. (si_code = any)

    parisc and s390 have similar bugs, but they are not vulnerable because
    rt_[tg]sigqueueinfo have checks that prevent sending a positive si_code
    to a different process. These bugs are also fixed for consistency.

    Signed-off-by: Amanieu d'Antras
    Cc: Oleg Nesterov
    Cc: Ingo Molnar
    Cc: Russell King
    Cc: Ralf Baechle
    Cc: Benjamin Herrenschmidt
    Cc: Chris Metcalf
    Cc: Paul Mackerras
    Cc: Michael Ellerman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Amanieu d'Antras
     
  • The "BUG_ON(list_empty(&osb->blocked_lock_list))" in
    ocfs2_downconvert_thread_do_work can be triggered in the following case:

    ocfs2dc first saves osb->blocked_lock_count to the local variable
    `processed', and then processes the dentry lockres. During the dentry
    put, it calls iput and then deletes the rw, inode and open lockres from
    the blocked list in ocfs2_mark_lockres_freeing. As a result, the
    variable `processed' no longer reflects the number of blocked lockres
    to be processed, which triggers the BUG.

    Signed-off-by: Joseph Qi
    Cc: Mark Fasheh
    Cc: Joel Becker
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Joseph Qi
     
  • Dave Hansen reported the following:

    My laptop has been behaving strangely with 4.2-rc2. Once I log
    in to my X session, I start getting all kinds of strange errors
    from applications and see this in my dmesg:

    VFS: file-max limit 8192 reached

    The problem is that the file-max is calculated before memory is fully
    initialised and miscalculates how much memory the kernel is using. This
    patch recalculates file-max after deferred memory initialisation. Note
    that using memory hotplug infrastructure would not have avoided this
    problem as the value is not recalculated after memory hot-add.

    4.1: files_stat.max_files = 6582781
    4.2-rc2: files_stat.max_files = 8192
    4.2-rc2 patched: files_stat.max_files = 6562467

    Small differences with the patch applied and 4.1 but not enough to matter.

    Signed-off-by: Mel Gorman
    Reported-by: Dave Hansen
    Cc: Nicolai Stange
    Cc: Dave Hansen
    Cc: Alex Ng
    Cc: Fengguang Wu
    Cc: Peter Zijlstra (Intel)
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mel Gorman
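
    To see why an early calculation ends up at 8192, here is a simplified
    model of the sizing rule. It assumes the limit is about 10% of usable
    memory at roughly 1 KiB per open file, with a floor of 8192 that
    matches the limit in the report; the function and macro names are
    chosen for this sketch.

```c
#include <assert.h>

/* Floor assumed from the "file-max limit 8192 reached" message. */
#define NR_FILE_FLOOR 8192UL

/* Estimate the open-file limit from the number of usable pages. */
static unsigned long max_files_estimate(unsigned long usable_pages,
                                        unsigned long page_size)
{
    unsigned long n;

    assert(page_size >= 1024);
    n = (usable_pages * (page_size / 1024)) / 10;
    return n > NR_FILE_FLOOR ? n : NR_FILE_FLOOR;
}
```

    If only a small part of memory has been initialised when this runs,
    the estimate falls below the floor and the limit stays at 8192 until
    it is recalculated.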
     
  • Commit 0e1cc95b4cc7 ("mm: meminit: finish initialisation of struct pages
    before basic setup") introduced a rwsem to signal completion of the
    initialization workers.

    Lockdep complains about possible recursive locking:
    =============================================
    [ INFO: possible recursive locking detected ]
    4.1.0-12802-g1dc51b8 #3 Not tainted
    ---------------------------------------------
    swapper/0/1 is trying to acquire lock:
    (pgdat_init_rwsem){++++.+},
    at: [] page_alloc_init_late+0xc7/0xe6

    but task is already holding lock:
    (pgdat_init_rwsem){++++.+},
    at: [] page_alloc_init_late+0x3e/0xe6

    Replace the rwsem by a completion together with an atomic
    "outstanding work counter".

    [peterz@infradead.org: Barrier removal on the grounds of being pointless]
    [mgorman@suse.de: Applied review feedback]
    Signed-off-by: Nicolai Stange
    Signed-off-by: Mel Gorman
    Acked-by: Peter Zijlstra (Intel)
    Cc: Dave Hansen
    Cc: Alex Ng
    Cc: Fengguang Wu
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nicolai Stange
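
    The "completion plus outstanding work counter" pattern can be sketched
    with C11 atomics. This is a user-space illustration with hypothetical
    names: a boolean flag stands in for the kernel completion, and no
    actual waiting or threading is modelled.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

struct init_tracker {
    atomic_int outstanding; /* workers that have not finished yet */
    atomic_bool done;       /* stand-in for a kernel completion */
};

static void tracker_start(struct init_tracker *t, int nr_workers)
{
    assert(nr_workers > 0);
    atomic_init(&t->outstanding, nr_workers);
    atomic_init(&t->done, false);
}

/* Called by each worker when it finishes. atomic_fetch_sub returns the
 * previous value, so exactly one caller observes 1 and signals. */
static void tracker_worker_done(struct init_tracker *t)
{
    if (atomic_fetch_sub(&t->outstanding, 1) == 1)
        atomic_store(&t->done, true);
}

static bool tracker_is_done(struct init_tracker *t)
{
    return atomic_load(&t->done);
}
```

    Unlike a rwsem taken once per worker, nothing here is acquired
    recursively by a single task, so there is no nesting for lockdep to
    complain about.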
     
  • early_pfn_to_nid() historically was inherently not SMP safe but only
    used during boot which is inherently single threaded or during hotplug
    which is protected by a giant mutex.

    With deferred memory initialisation, a thread-safe version was
    introduced and early_pfn_to_nid would trigger a BUG_ON if used
    unsafely. Memory hotplug hit that check. This patch introduces a lock
    in early_pfn_to_nid to make it safe to use during hotplug.

    Signed-off-by: Mel Gorman
    Reported-by: Alex Ng
    Tested-by: Alex Ng
    Acked-by: Peter Zijlstra (Intel)
    Cc: Nicolai Stange
    Cc: Dave Hansen
    Cc: Fengguang Wu
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mel Gorman
     
  • A while back, the message queue implementation in the kernel was
    improved to use btrees to speed up retrieval of messages, in commit
    d6629859b36d ("ipc/mqueue: improve performance of send/recv").

    That patch introducing the improved kernel handling of message queues
    (using btrees) has, as a by-product, changed the meaning of the QSIZE
    field in the pseudo-file created for the queue. Before, this field
    reflected the size of the user-data in the queue. Since, it also takes
    kernel data structures into account. For example, if 13 bytes of user
    data are in the queue, on my machine the file reports a size of 61
    bytes.

    There was some discussion on this topic before (for example
    https://lkml.org/lkml/2014/10/1/115). Commenting in an lkml thread, Michael
    Kerrisk gave the following background
    (https://lkml.org/lkml/2015/6/16/74):

    The pseudofiles in the mqueue filesystem (usually mounted at
    /dev/mqueue) expose fields with metadata describing a message
    queue. One of these fields, QSIZE, as originally implemented,
    showed the total number of bytes of user data in all messages in
    the message queue, and this feature was documented from the
    beginning in the mq_overview(7) page. In 3.5, some other (useful)
    work happened to break the user-space API in a couple of places,
    including the value exposed via QSIZE, which now includes a measure
    of kernel overhead bytes for the queue, a figure that renders QSIZE
    useless for its original purpose, since there's no way to deduce
    the number of overhead bytes consumed by the implementation.
    (The other user-space breakage was subsequently fixed.)

    This patch removes the accounting of kernel data structures in the
    queue. Reporting the size of these data-structures in the QSIZE field
    was a breaking change (see Michael's comment above). Without the QSIZE
    field reporting the total size of user-data in the queue, there is no
    way to deduce this number.

    It should be noted that the resource limit RLIMIT_MSGQUEUE is counted
    against the worst-case size of the queue (in both the old and the new
    implementation). Therefore, the kernel overhead accounting in QSIZE is
    not necessary to help the user understand the limitations RLIMIT imposes
    on the processes.

    Signed-off-by: Marcus Gelderie
    Acked-by: Doug Ledford
    Acked-by: Michael Kerrisk
    Acked-by: Davidlohr Bueso
    Cc: David Howells
    Cc: Alexander Viro
    Cc: John Duffy
    Cc: Arto Bendiken
    Cc: Manfred Spraul
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Marcus Gelderie
     

05 Aug, 2015

6 commits

  • Pull KVM fixes from Paolo Bonzini:
    "Just two very small & simple patches"

    * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
    KVM: MTRR: Use default type for non-MTRR-covered gfn before WARN_ON
    KVM: s390: Fix hang VCPU hang/loop regression

    Linus Torvalds
     
  • The patch was munged on commit to re-order these tests resulting in
    excessive warnings when trying to do device assignment. Return to
    original ordering: https://lkml.org/lkml/2015/7/15/769

    Fixes: 3e5d2fdceda1 ("KVM: MTRR: simplify kvm_mtrr_get_guest_memory_type")
    Signed-off-by: Alex Williamson
    Reviewed-by: Xiao Guangrong
    Signed-off-by: Paolo Bonzini

    Alex Williamson
     
  • Pull md fixes from Neil Brown:
    "Three more fixes for md in 4.2

    Mostly corner-case stuff.

    One of these patches is for a CVE: CVE-2015-5697

    I'm not convinced it is serious (data leak from CAP_SYS_ADMIN ioctl)
    but as people seem to want to back-port it, I've included a minimal
    version here. The remainder of that patch from Benjamin is
    code-cleanup and will arrive in the 4.3 merge window"

    * tag 'md/4.2-rc5-fixes' of git://neil.brown.name/md:
    md/raid5: don't let shrink_slab shrink too far.
    md: use kzalloc() when bitmap is disabled
    md/raid1: extend spinlock to protect raid1_end_read_request against inconsistencies

    Linus Torvalds
     
  • Pull nfsd fixes from Bruce Fields.

    * 'for-4.2' of git://linux-nfs.org/~bfields/linux:
    nfsd: do nfs4_check_fh in nfs4_check_file instead of nfs4_check_olstateid
    nfsd: Fix a file leak on nfsd4_layout_setlease failure
    nfsd: Drop BUG_ON and ignore SECLABEL on absent filesystem

    Linus Torvalds
     
  • Nikolay has reported a hang when a memcg reclaim got stuck with the
    following backtrace:

    PID: 18308 TASK: ffff883d7c9b0a30 CPU: 1 COMMAND: "rsync"
    #0 __schedule at ffffffff815ab152
    #1 schedule at ffffffff815ab76e
    #2 schedule_timeout at ffffffff815ae5e5
    #3 io_schedule_timeout at ffffffff815aad6a
    #4 bit_wait_io at ffffffff815abfc6
    #5 __wait_on_bit at ffffffff815abda5
    #6 wait_on_page_bit at ffffffff8111fd4f
    #7 shrink_page_list at ffffffff81135445
    #8 shrink_inactive_list at ffffffff81135845
    #9 shrink_lruvec at ffffffff81135ead
    #10 shrink_zone at ffffffff811360c3
    #11 shrink_zones at ffffffff81136eff
    #12 do_try_to_free_pages at ffffffff8113712f
    #13 try_to_free_mem_cgroup_pages at ffffffff811372be
    #14 try_charge at ffffffff81189423
    #15 mem_cgroup_try_charge at ffffffff8118c6f5
    #16 __add_to_page_cache_locked at ffffffff8112137d
    #17 add_to_page_cache_lru at ffffffff81121618
    #18 pagecache_get_page at ffffffff8112170b
    #19 grow_dev_page at ffffffff811c8297
    #20 __getblk_slow at ffffffff811c91d6
    #21 __getblk_gfp at ffffffff811c92c1
    #22 ext4_ext_grow_indepth at ffffffff8124565c
    #23 ext4_ext_create_new_leaf at ffffffff81246ca8
    #24 ext4_ext_insert_extent at ffffffff81246f09
    #25 ext4_ext_map_blocks at ffffffff8124a848
    #26 ext4_map_blocks at ffffffff8121a5b7
    #27 mpage_map_one_extent at ffffffff8121b1fa
    #28 mpage_map_and_submit_extent at ffffffff8121f07b
    #29 ext4_writepages at ffffffff8121f6d5
    #30 do_writepages at ffffffff8112c490
    #31 __filemap_fdatawrite_range at ffffffff81120199
    #32 filemap_flush at ffffffff8112041c
    #33 ext4_alloc_da_blocks at ffffffff81219da1
    #34 ext4_rename at ffffffff81229b91
    #35 ext4_rename2 at ffffffff81229e32
    #36 vfs_rename at ffffffff811a08a5
    #37 SYSC_renameat2 at ffffffff811a3ffc
    #38 sys_renameat2 at ffffffff811a408e
    #39 sys_rename at ffffffff8119e51e
    #40 system_call_fastpath at ffffffff815afa89

    Dave Chinner has properly pointed out that this is a deadlock in the
    reclaim code because ext4 doesn't submit pages which are marked by
    PG_writeback right away.

    The heuristic was introduced by commit e62e384e9da8 ("memcg: prevent OOM
    with too many dirty pages") and it was applied only when may_enter_fs
    was specified. The code has been changed by c3b94f44fcb0 ("memcg:
    further prevent OOM with too many dirty pages") which has removed the
    __GFP_FS restriction with a reasoning that we do not get into the fs
    code. But this is not sufficient apparently because the fs doesn't
    necessarily submit pages marked PG_writeback for IO right away.

    ext4_bio_write_page calls io_submit_add_bh but that doesn't necessarily
    submit the bio. Instead it tries to map more pages into the bio and
    mpage_map_one_extent might trigger memcg charge which might end up
    waiting on a page which is marked PG_writeback but hasn't been submitted
    yet so we would end up waiting for something that never finishes.

    Fix this issue by replacing __GFP_IO by may_enter_fs check (for case 2)
    before we go to wait on the writeback. The page fault path, which is
    the only path that triggers memcg oom killer since 3.12, shouldn't
    require GFP_NOFS and so we shouldn't reintroduce the premature OOM
    killer issue which was originally addressed by the heuristic.

    According to David Chinner, xfs has been doing a similar thing since
    2.6.15, so ext4 is not the only affected filesystem. Moreover he notes:

    : For example: IO completion might require unwritten extent conversion
    : which executes filesystem transactions and GFP_NOFS allocations. The
    : writeback flag on the pages can not be cleared until unwritten
    : extent conversion completes. Hence memory reclaim cannot wait on
    : page writeback to complete in GFP_NOFS context because it is not
    : safe to do so, memcg reclaim or otherwise.

    Cc: stable@vger.kernel.org # 3.9+
    [tytso@mit.edu: corrected the control flow]
    Fixes: c3b94f44fcb0 ("memcg: further prevent OOM with too many dirty pages")
    Reported-by: Nikolay Borisov
    Signed-off-by: Michal Hocko
    Signed-off-by: Hugh Dickins
    Signed-off-by: Linus Torvalds

    Michal Hocko
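
    The decision described above can be modelled as a small pure function.
    This is a simplified sketch, not the reclaim code: the enum, the
    function name and the exact set of inputs are assumptions made to
    illustrate how the may_enter_fs check gates the wait.

```c
#include <assert.h>
#include <stdbool.h>

enum writeback_action {
    WB_MARK_AND_SKIP, /* "case 2": flag the page for reclaim and move on */
    WB_WAIT,          /* otherwise: wait for writeback to finish */
};

/* Decide what reclaim does with a page that is under writeback. Waiting
 * is only allowed when the caller may enter the filesystem, because the
 * filesystem may not have submitted the page for IO yet. */
static enum writeback_action writeback_action_for(bool global_reclaim,
                                                  bool page_marked_reclaim,
                                                  bool may_enter_fs)
{
    if (global_reclaim || !page_marked_reclaim || !may_enter_fs)
        return WB_MARK_AND_SKIP;
    return WB_WAIT;
}
```

    In this model, a memcg reclaim that runs from a GFP_NOFS context never
    reaches the wait, which is the situation that deadlocked in the
    backtrace.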
     
  • Pull PCI fix from Bjorn Helgaas:
    "This is a trivial fix for a change that broke user program compilation
    (QEMU in this case)"

    * tag 'pci-v4.2-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci:
    PCI: Restore PCI_MSIX_FLAGS_BIRMASK definition

    Linus Torvalds
     

04 Aug, 2015

13 commits

  • Pull drm mst fixes from Daniel Vetter:
    "Special pull request for mst fixes since most of the patches touch
    code outside of i915 proper. DRM parts have also been reviewed by
    Thierry (nvidia) since Dave's enjoying vacations"

    * tag 'topic/mst-fixes-2015-08-04' of git://anongit.freedesktop.org/drm-intel:
    drm/atomic-helpers: Make encoder picking more robust
    drm/dp-mst: Remove debug WARN_ON
    drm/i915: Fixup dp mst encoder selection
    drm/atomic-helper: Add an atomice best_encoder callback

    Linus Torvalds
     
  • Pull xen bug fixes from David Vrabel:

    - don't lose interrupts when offlining CPUs

    - fix gntdev oops during unmap

    - drop the balloon lock occasionally to allow domain create/destroy

    * tag 'for-linus-4.2-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
    xen/events/fifo: Handle linked events when closing a port
    xen: release lock occasionally during ballooning
    xen/gntdevt: Fix race condition in gntdev_release()

    Linus Torvalds
     
  • An event channel bound to a CPU that was offlined may still be linked
    on that CPU's queue. If this event channel is closed and reused,
    subsequent events will be lost because the event channel is never
    unlinked and thus cannot be linked onto the correct queue.

    When a channel is closed and the event is still linked into a queue,
    ensure that it is unlinked before completing.

    If the CPU to which the event channel bound is online, spin until the
    event is handled by that CPU. If that CPU is offline, it can't handle
    the event, so clear the event queue during the close, dropping the
    events.

    This fixes the missing interrupts (and subsequent disk stalls etc.)
    when offlining a CPU.

    Signed-off-by: Ross Lagerwall
    Cc:
    Signed-off-by: David Vrabel

    Ross Lagerwall
     
  • Pull kbuild fixes from Michal Marek:
    "Two fixes for kbuild:

    - The new ARCH_{CPP,A,C}FLAGS variables are reset before including
    the arch Makefile

    - Fix calling make modules_install twice when module compression is
    enabled"

    * 'rc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild:
    Makefile: Force gzip and xz on module install
    kbuild: Do not pick up ARCH_{CPP,A,C}FLAGS from the environment

    Linus Torvalds
     
  • We've had a few issues with atomic where subtle bugs in the encoder
    picking logic lead to accidental self-stealing of the encoder,
    resulting in a NULL connector_state->crtc in update_connector_routing
    and a subsequent NULL pointer dereference.

    Linus applied some duct-tape for an mst regression in

    commit 27667f4744fc5a0f3e50910e78740bac5670d18b
    Author: Linus Torvalds
    Date: Wed Jul 29 22:18:16 2015 -0700

    i915: temporary fix for DP MST docking station NULL pointer dereference

    But that was incomplete (the code will still oops when debugging is
    enabled) and mangled the state even further. So instead WARN and bail
    out as the more future-proof option.

    Cc: Theodore Ts'o
    Cc: Linus Torvalds
    Reviewed-by: Thierry Reding
    Reviewed-by: Ander Conselvan de Oliveira
    Signed-off-by: Daniel Vetter

    Daniel Vetter
     
  • Apparently been in there since forever and fairly easy to hit when
    hotplugging really fast. I can do that since my mst hub has a manual
    button to flick the hpd line for reprobing. The resulting WARNING spam
    isn't pretty.

    Cc: Dave Airlie
    Cc: stable@vger.kernel.org
    Reviewed-by: Thierry Reding
    Reviewed-by: Ander Conselvan de Oliveira
    Signed-off-by: Daniel Vetter

    Daniel Vetter
     
  • In

    commit 8c7b5ccb729870e606321b3703e2c2e698c49a95
    Author: Ander Conselvan de Oliveira
    Date: Tue Apr 21 17:13:19 2015 +0300

    drm/i915: Use atomic helpers for computing changed flags

    we've switched over to the atomic version to compute the
    crtc->encoder->connector routing from the i915 variant. That one
    relies upon the ->best_encoder callback, but the i915-private version
    relied upon intel_find_encoder. Which didn't matter except for dp mst,
    where the encoder depends upon the selected crtc.

    Fix this functional bug by implementing a correct atomic-state based
    encoder selector for dp mst.

    Note that we can't get rid of the legacy best_encoder callback since
    the fbdev emulation uses that still. That means it's incorrect there
    still, but that's been the case ever since i915 dp mst support was
    merged so not a regression. Best to fix that by converting fbdev over
    to atomic too.

    Cc: Chris Wilson
    Cc: Linus Torvalds
    Cc: Theodore Ts'o
    Reviewed-by: Ander Conselvan de Oliveira
    Signed-off-by: Daniel Vetter

    Daniel Vetter
     
  • With legacy helpers all the routing was already set up when calling
    best_encoder and so could be inspected. But with atomic it's staged,
    hence we need a new atomic compliant callback for drivers which need
    to inspect the requested state and can't just decide the best encoder
    statically.

    This is needed to fix up i915 dp mst where we need to pick the right
    encoder depending upon the requested CRTC for the connector.

    v2: Don't forget to amend the kerneldoc

    Cc: Chris Wilson
    Cc: Linus Torvalds
    Cc: Theodore Ts'o
    Acked-by: Thierry Reding
    Reviewed-by: Ander Conselvan de Oliveira
    Signed-off-by: Daniel Vetter

    Daniel Vetter
     
  • Pull i2c fixes from Wolfram Sang:
    "A refcounting bugfix for the i2c-core, bugfixes for the generic bus
    recovery algorithm and for its omap-user, making binary file
    attributes for EEPROMs behave POSIX compliant, and a small typo fix
    while we are here"

    * 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
    i2c: fix leaked device refcount on of_find_i2c_* error path
    i2c: Fix typo in i2c-bfin-twi.c
    i2c: omap: fix bus recovery setup
    i2c: core: only use set_scl for bus recovery after calling prepare_recovery
    misc: eeprom: at24: clean up at24_bin_write()
    i2c: slave eeprom: clean up sysfs bin attribute read()/write()

    Linus Torvalds
     
  • Pull Ceph fixes from Sage Weil:
    "There are two critical regression fixes for CephFS from Zheng, and an
    RBD completion fix for layered images from Ilya"

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client:
    rbd: fix copyup completion race
    ceph: always re-send cap flushes when MDS recovers
    ceph: fix ceph_encode_locks_to_buffer()

    Linus Torvalds
     
  • Pull security layer fix from James Morris:
    "Yama initialization fix"

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security:
    Adding YAMA hooks also when YAMA is not stacked.

    Linus Torvalds
     
  • Pull crypto fixes from Herbert Xu:
    "This fixes the following issues:

    - a bogus BUG_ON in ixp4xx that can be triggered by a dst buffer that
    is an SG list.

    - the error handling in hwrngd may cause a crash in case of an error.

    - fix a race condition in qat registration when multiple devices are
    present"

    * git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
    hwrng: core - correct error check of kthread_run call
    crypto: ixp4xx - Remove bogus BUG_ON on scattered dst buffer
    crypto: qat - Fix invalid synchronization between register/unregister sym algs

    Linus Torvalds
     
  • Pull module fix from Rusty Russell:
    "Single overzealous locking assertion fix"

    * tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux:
    module: weaken locking assertion for oops path.

    Linus Torvalds
     

03 Aug, 2015

5 commits

  • Without this patch YAMA will not work at all if it is chosen
    as the primary LSM instead of being "stacked".

    Signed-off-by: Salvatore Mesoraca
    Acked-by: Kees Cook
    Signed-off-by: James Morris

    Salvatore Mesoraca
     
  • I have a report of drop_one_stripe() called from
    raid5_cache_scan() apparently finding ->max_nr_stripes == 0.

    This should not be allowed.

    So add a test to keep max_nr_stripes above min_nr_stripes.

    Also use a 'mask' rather than a 'mod' in drop_one_stripe
    to ensure 'hash' is valid even if max_nr_stripes does reach zero.

    Fixes: edbe83ab4c27 ("md/raid5: allow the stripe_cache to grow and shrink.")
    Cc: stable@vger.kernel.org (4.1 - please release with 2d5b569b665)
    Reported-by: Tomas Papan
    Signed-off-by: NeilBrown

    NeilBrown
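
    The difference between 'mod' and 'mask' only shows up for negative
    inputs. A minimal sketch, assuming 8 hash locks (a power of two) and
    hypothetical helper names:

```c
#include <assert.h>

#define NR_HASH_LOCKS   8 /* assumed value; must be a power of two */
#define HASH_LOCKS_MASK (NR_HASH_LOCKS - 1)

static_assert((NR_HASH_LOCKS & (NR_HASH_LOCKS - 1)) == 0,
              "masking only works for a power-of-two table size");

/* In C, the remainder takes the sign of the dividend, so this can
 * return a negative, invalid index. */
static int hash_by_mod(int max_nr_stripes)
{
    return (max_nr_stripes - 1) % NR_HASH_LOCKS;
}

/* The mask always yields a value in [0, NR_HASH_LOCKS - 1]. */
static int hash_by_mask(int max_nr_stripes)
{
    return (max_nr_stripes - 1) & HASH_LOCKS_MASK;
}
```

    For max_nr_stripes == 0 the 'mod' variant produces -1, whereas the
    'mask' variant produces 7, which is still a valid index.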
     
  • In drivers/md/md.c get_bitmap_file() uses kmalloc() for creating a
    mdu_bitmap_file_t called "file".

        file = kmalloc(sizeof(*file), GFP_NOIO);
        if (!file)
                return -ENOMEM;

    This structure is copied to user space at the end of the function.

        if (err == 0 &&
            copy_to_user(arg, file, sizeof(*file)))
                err = -EFAULT;

    But if the bitmap is disabled, only the first byte of "file" is
    initialized with zero, so it's possible to read some bytes (up to 4095)
    of kernel space memory from user space. This is an information leak.

        /* bitmap disabled, zero the first byte and copy out */
        if (!mddev->bitmap_info.file)
                file->pathname[0] = '\0';

    Signed-off-by: Benjamin Randazzo
    Signed-off-by: NeilBrown

    Benjamin Randazzo
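
    A user-space analogue of the fix, with calloc() standing in for
    kzalloc(). The struct below is a hypothetical stand-in that assumes a
    4096-byte pathname buffer, in line with the "up to 4095" bytes
    mentioned above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for the structure copied to user space. */
struct bitmap_file_sketch {
    char pathname[4096];
};

/* Allocate the reply zero-filled, so that bytes the code never writes
 * cannot carry stale heap contents to the caller. */
static struct bitmap_file_sketch *alloc_reply(void)
{
    return calloc(1, sizeof(struct bitmap_file_sketch));
}

/* Helper: returns 1 if every byte in the buffer is zero. */
static int is_all_zero(const void *buf, size_t len)
{
    const unsigned char *p = buf;

    for (size_t i = 0; i < len; i++)
        if (p[i] != 0)
            return 0;
    return 1;
}
```

    With a zeroing allocator, the "bitmap disabled" path no longer depends
    on each field being cleared by hand before the copy.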
     
  • raid1_end_read_request() assumes that the In_sync bits are consistent
    with the ->degraded count.
    raid1_spare_active() updates the In_sync bit before the ->degraded count
    and so exposes an inconsistency, as does error().
    So extend the spinlock in raid1_spare_active() and error() to hide those
    inconsistencies.

    This should probably be part of
    Commit: 34cab6f42003 ("md/raid1: fix test for 'was read error from
    last working device'.")
    as it addresses the same issue. It fixes the same bug and should go
    to -stable for same reasons.

    Fixes: 76073054c95b ("md/raid1: clean up read_balance.")
    Cc: stable@vger.kernel.org (v3.0+)
    Signed-off-by: NeilBrown

    NeilBrown
     
  • Linus Torvalds