26 Jan, 2017

2 commits

  • commit fff5d99225107f5f13fe4a9805adc2a1c4b5fb00 upstream.

    On architectures like arm64, swiotlb is tied intimately to the core
    architecture DMA support. In addition, ZONE_DMA cannot be disabled.

    To aid debugging and catch devices not supporting DMA to memory outside
    the 32-bit address space, add a kernel command line option
    "swiotlb=noforce", which disables the use of bounce buffers.
    If specified, trying to map memory that cannot be used with DMA will
    fail, and a rate-limited warning will be printed.

    Note that io_tlb_nslabs is set to 1, which is the minimal supported
    value.

    Signed-off-by: Geert Uytterhoeven
    Signed-off-by: Konrad Rzeszutek Wilk
    Signed-off-by: Greg Kroah-Hartman

    Geert Uytterhoeven
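
The mapping decision described in this commit can be sketched as a small userspace model. This is only an illustration in Python, not kernel code: the enum members mirror the kernel's SWIOTLB_NORMAL, SWIOTLB_FORCE and SWIOTLB_NO_FORCE values, while `map_single`, its parameters, and its return strings are hypothetical names chosen for this example.

```python
from enum import Enum


class SwiotlbForce(Enum):
    NORMAL = 0    # bounce only when the device cannot reach the buffer
    FORCE = 1     # always bounce ("swiotlb=force")
    NO_FORCE = 2  # never bounce ("swiotlb=noforce")


def map_single(phys_addr, size, dma_mask, mode):
    """Return 'direct', 'bounce', or None when the mapping fails.

    Hypothetical helper: models the policy only, not the real DMA API.
    """
    reachable = phys_addr + size - 1 <= dma_mask
    if mode is SwiotlbForce.FORCE:
        return "bounce"
    if reachable:
        return "direct"
    if mode is SwiotlbForce.NO_FORCE:
        # No bounce buffers: the mapping fails (the kernel would also
        # print a rate-limited warning at this point).
        return None
    return "bounce"
```

With a 32-bit mask, a buffer above 4 GB is bounced in the normal mode and fails to map under `NO_FORCE`, which is what makes the option useful for catching devices that cannot address that memory.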
     
  • commit ae7871be189cb41184f1e05742b4a99e2c59774d upstream.

    Convert the flag swiotlb_force from an int to an enum, to prepare for
    the advent of more possible values.

    Suggested-by: Konrad Rzeszutek Wilk
    Signed-off-by: Geert Uytterhoeven
    Signed-off-by: Konrad Rzeszutek Wilk
    Signed-off-by: Greg Kroah-Hartman

    Geert Uytterhoeven
     

20 Jan, 2017

1 commit

  • commit b9dc6f65bc5e232d1c05fe34b5daadc7e8bbf1fb upstream.

    The logic in pipe_advance() used to release all buffers past the new
    position failed in cases when the number of buffers to release was equal
    to pipe->buffers. If that happened, none of them had been released,
    leaving pipe full. Worse, it was trivial to trigger and we end up with
    pipe full of uninitialized pages. IOW, it's an infoleak.

    Reported-by: "Alan J. Wylie"
    Tested-by: "Alan J. Wylie"
    Signed-off-by: Al Viro
    Signed-off-by: Greg Kroah-Hartman

    Al Viro
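
A minimal sketch of how a release loop over a ring can skip everything when the number of slots to release equals the ring size. This is an illustrative Python model rather than the kernel code: `release_range_buggy`, `release_range_fixed`, and the list-based ring are hypothetical, and it assumes the loop is bounded by comparing wrapped indices.

```python
def release_range_buggy(ring, start, count):
    """Release `count` slots from `start`, bounded by comparing wrapped indices."""
    size = len(ring)
    end = (start + count) % size   # wraps back to `start` when count == size
    idx = start
    released = 0
    while idx != end:              # zero iterations if the range covers the ring
        ring[idx] = None
        idx = (idx + 1) % size
        released += 1
    return released


def release_range_fixed(ring, start, count):
    """Release `count` slots from `start`, bounded by an explicit counter."""
    size = len(ring)
    idx = start
    for _ in range(count):
        ring[idx] = None
        idx = (idx + 1) % size
    return count
```

In the buggy variant an empty range and a full range look identical, so a request to release every slot releases none and the ring stays full.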
     

10 Dec, 2016

1 commit

  • This reverts commit 53855d10f4567a0577360b6448d52a863929775b.

    It shouldn't have come in yet - it depends on the changes in linux-next
    that will come in during the next merge window. As Matthew Wilcox says,
    the test suite is broken with the current state without the revert.

    Requested-by: Matthew Wilcox
    Cc: Andrew Morton
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     

08 Dec, 2016

2 commits

  • Patch "lib/radix-tree: Convert to hotplug state machine" breaks the test
    suite as it adds a call to cpuhp_setup_state_nocalls() which is not
    currently emulated in the test suite. Add it, and delete the emulation
    of the old CPU hotplug mechanism.

    Link: http://lkml.kernel.org/r/1480369871-5271-36-git-send-email-mawilcox@linuxonhyperv.com
    Signed-off-by: Matthew Wilcox
    Tested-by: Kirill A. Shutemov
    Cc: Konstantin Khlebnikov
    Cc: Ross Zwisler
    Cc: Matthew Wilcox
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Matthew Wilcox
     
  • Pull locking fixes from Ingo Molnar:
    "Two rtmutex race fixes (which miraculously never triggered, that we
    know of), plus two lockdep printk formatting regression fixes"

    * 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    lockdep: Fix report formatting
    locking/rtmutex: Use READ_ONCE() in rt_mutex_owner()
    locking/rtmutex: Prevent dequeue vs. unlock race
    locking/selftest: Fix output since KERN_CONT changes

    Linus Torvalds
     

01 Dec, 2016

2 commits

  • Gcc revision 241896 implements use-after-scope detection. Will be
    available in gcc 7. Support it in KASAN.

    Gcc emits 2 new callbacks to poison/unpoison large stack objects when
    they go in/out of scope. Implement the callbacks and add a test.

    [dvyukov@google.com: v3]
    Link: http://lkml.kernel.org/r/1479998292-144502-1-git-send-email-dvyukov@google.com
    Link: http://lkml.kernel.org/r/1479226045-145148-1-git-send-email-dvyukov@google.com
    Signed-off-by: Dmitry Vyukov
    Acked-by: Andrey Ryabinin
    Cc: Alexander Potapenko
    Cc: [4.0+]
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dmitry Vyukov
     
  • Drivers, or other modules, that use a mixture of objects (especially
    objects embedded within other objects) would like to take advantage of
    the debugobjects facilities to help catch misuse. Currently, the
    debugobjects interface is only available to builtin drivers and requires
    a set of EXPORT_SYMBOL_GPL for use by modules.

    I am using the debugobjects in i915.ko to try and catch some invalid
    operations on embedded objects. The problem currently only presents
    itself across module unload so forcing i915 to be builtin is not an
    option.

    Link: http://lkml.kernel.org/r/20161122143039.6433-1-chris@chris-wilson.co.uk
    Signed-off-by: Chris Wilson
    Cc: "Du, Changbin"
    Cc: Thomas Gleixner
    Cc: Christian Borntraeger
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Chris Wilson
     

25 Nov, 2016

2 commits

  • Since the KERN_CONT changes the locking-selftest output is messed up, eg:

    ----------------------------------------------------------------------------
    | spin |wlock |rlock |mutex | wsem | rsem |
    --------------------------------------------------------------------------
    A-A deadlock:
    ok |
    ok |
    ok |
    ok |
    ok |
    ok |

    Use pr_cont() to get it looking normal again:

    ----------------------------------------------------------------------------
    | spin |wlock |rlock |mutex | wsem | rsem |
    --------------------------------------------------------------------------
    A-A deadlock: ok | ok | ok | ok | ok | ok |

    Reported-by: Christian Kujau
    Signed-off-by: Michael Ellerman
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: linuxppc-dev@ozlabs.org
    Link: http://lkml.kernel.org/r/1480027528-934-1-git-send-email-mpe@ellerman.id.au
    Signed-off-by: Ingo Molnar

    Michael Ellerman
     
  • This fixes CVE-2016-8650.

    If mpi_powm() is given a zero exponent, it wants to immediately return
    either 1 or 0, depending on the modulus. However, if the result was
    initialised with zero limb space, no limb space is allocated and a
    NULL-pointer exception ensues.

    Fix this by allocating a minimal amount of limb space for the result in
    the 0-exponent case when the result is 1, and by not touching the limb
    space when the result is 0.

    This affects the use of RSA keys and X.509 certificates that carry them.

    BUG: unable to handle kernel NULL pointer dereference at (null)
    IP: [] mpi_powm+0x32/0x7e6
    PGD 0
    Oops: 0002 [#1] SMP
    Modules linked in:
    CPU: 3 PID: 3014 Comm: keyctl Not tainted 4.9.0-rc6-fscache+ #278
    Hardware name: ASUS All Series/H97-PLUS, BIOS 2306 10/09/2014
    task: ffff8804011944c0 task.stack: ffff880401294000
    RIP: 0010:[] [] mpi_powm+0x32/0x7e6
    RSP: 0018:ffff880401297ad8 EFLAGS: 00010212
    RAX: 0000000000000000 RBX: ffff88040868bec0 RCX: ffff88040868bba0
    RDX: ffff88040868b260 RSI: ffff88040868bec0 RDI: ffff88040868bee0
    RBP: ffff880401297ba8 R08: 0000000000000000 R09: 0000000000000000
    R10: 0000000000000047 R11: ffffffff8183b210 R12: 0000000000000000
    R13: ffff8804087c7600 R14: 000000000000001f R15: ffff880401297c50
    FS: 00007f7a7918c700(0000) GS:ffff88041fb80000(0000) knlGS:0000000000000000
    CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 0000000000000000 CR3: 0000000401250000 CR4: 00000000001406e0
    Stack:
    ffff88040868bec0 0000000000000020 ffff880401297b00 ffffffff81376cd4
    0000000000000100 ffff880401297b10 ffffffff81376d12 ffff880401297b30
    ffffffff81376f37 0000000000000100 0000000000000000 ffff880401297ba8
    Call Trace:
    [] ? __sg_page_iter_next+0x43/0x66
    [] ? sg_miter_get_next_page+0x1b/0x5d
    [] ? sg_miter_next+0x17/0xbd
    [] ? mpi_read_raw_from_sgl+0xf2/0x146
    [] rsa_verify+0x9d/0xee
    [] ? pkcs1pad_sg_set_buf+0x2e/0xbb
    [] pkcs1pad_verify+0xc0/0xe1
    [] public_key_verify_signature+0x1b0/0x228
    [] x509_check_for_self_signed+0xa1/0xc4
    [] x509_cert_parse+0x167/0x1a1
    [] x509_key_preparse+0x21/0x1a1
    [] asymmetric_key_preparse+0x34/0x61
    [] key_create_or_update+0x145/0x399
    [] SyS_add_key+0x154/0x19e
    [] do_syscall_64+0x80/0x191
    [] entry_SYSCALL64_slow_path+0x25/0x25
    Code: 56 41 55 41 54 53 48 81 ec a8 00 00 00 44 8b 71 04 8b 42 04 4c 8b 67 18 45 85 f6 89 45 80 0f 84 b4 06 00 00 85 c0 75 2f 41 ff ce c7 04 24 01 00 00 00 b0 01 75 0b 48 8b 41 18 48 83 38 01 0f
    RIP [] mpi_powm+0x32/0x7e6
    RSP
    CR2: 0000000000000000
    ---[ end trace d82015255d4a5d8d ]---

    Basically, this is a backport of a libgcrypt patch:

    http://git.gnupg.org/cgi-bin/gitweb.cgi?p=libgcrypt.git;a=patch;h=6e1adb05d290aeeb1c230c763970695f4a538526

    Fixes: cdec9cb5167a ("crypto: GnuPG based MPI lib - source files (part 1)")
    Signed-off-by: Andrey Ryabinin
    Signed-off-by: David Howells
    cc: Dmitry Kasatkin
    cc: linux-ima-devel@lists.sourceforge.net
    cc: stable@vger.kernel.org
    Signed-off-by: James Morris

    Andrey Ryabinin
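
The zero-exponent case can be illustrated with a toy model. This is a Python sketch under stated assumptions, not the kernel's MPI code: the `Mpi` class, its `resize` method, and `powm_zero_exponent` are hypothetical stand-ins for a limb array with separate "allocated" and "in use" counts.

```python
class Mpi:
    """Toy multi-precision integer: a limb list plus a count of limbs in use."""

    def __init__(self, alloced=0):
        self.limbs = [0] * alloced  # allocated limb space (may be empty)
        self.nlimbs = 0             # limbs in use; 0 encodes the value zero

    def resize(self, nlimbs):
        """Grow the limb space so that at least `nlimbs` limbs exist."""
        if len(self.limbs) < nlimbs:
            self.limbs.extend([0] * (nlimbs - len(self.limbs)))


def powm_zero_exponent(res, modulus):
    """Store base**0 mod modulus in `res` (modulus is a positive int)."""
    value = 1 % modulus            # 0 when modulus == 1, otherwise 1
    if value:
        res.resize(1)              # make sure limb 0 exists before writing it
        res.limbs[0] = 1
        res.nlimbs = 1
    else:
        res.nlimbs = 0             # result is zero: leave limb space untouched
    return res
```

Without the `resize(1)` call, writing `res.limbs[0]` on a result created with no limb space raises an IndexError in this model, which plays the role of the NULL-pointer dereference in the report above.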
     

22 Nov, 2016

1 commit

  • Pull sparc fixes from David Miller:

    1) With modern networking cards we can run out of 32-bit DMA space, so
    support 64-bit DMA addressing when possible on sparc64. From Dave
    Tushar.

    2) Some signal frame validation checks are inverted on sparc32, fix
    from Andreas Larsson.

    3) Lockdep tables can get too large in some circumstances on sparc64,
    add a way to adjust the size a bit. From Babu Moger.

    4) Fix NUMA node probing on some sun4v systems, from Thomas Tai.

    * git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc:
    sparc: drop duplicate header scatterlist.h
    lockdep: Limit static allocations if PROVE_LOCKING_SMALL is defined
    config: Adding the new config parameter CONFIG_PROVE_LOCKING_SMALL for sparc
    sunbmac: Fix compiler warning
    sunqe: Fix compiler warnings
    sparc64: Enable 64-bit DMA
    sparc64: Enable sun4v dma ops to use IOMMU v2 APIs
    sparc64: Bind PCIe devices to use IOMMU v2 service
    sparc64: Initialize iommu_map_table and iommu_pool
    sparc64: Add ATU (new IOMMU) support
    sparc64: Add FORCE_MAX_ZONEORDER and default to 13
    sparc64: fix compile warning section mismatch in find_node()
    sparc32: Fix inverted invalid_frame_pointer checks on sigreturns
    sparc64: Fix find_node warning if numa node cannot be found

    Linus Torvalds
     

19 Nov, 2016

1 commit

  • This new config parameter limits the space used for "Lock debugging:
    prove locking correctness" by about 4MB. The current sparc systems have
    the limitation of 32MB size for kernel size including .text, .data and
    .bss sections. With the PROVE_LOCKING feature, the kernel size could grow
    beyond this limit and cause system boot-up issues. With this option, the
    kernel limits the size of the entries of lock_chains, stack_trace etc.,
    so that the kernel fits within the required size limit. This is not
    visible to the user and is only used for sparc.

    Signed-off-by: Babu Moger
    Acked-by: Sam Ravnborg
    Signed-off-by: David S. Miller

    Babu Moger
     

17 Nov, 2016

1 commit

  • iov_iter_advance() needs to decrement iter->count by the number of
    bytes we'd moved beyond. Normal flavours do that, but ITER_PIPE
    doesn't, and an ITER_PIPE generic_file_read_iter() for O_DIRECT files
    ends up with a bogus fallback to page cache read, resulting in incorrect
    values for file offset and bytes read.

    Signed-off-by: Abhi Das
    Signed-off-by: Al Viro

    Abhi Das
     

12 Nov, 2016

1 commit

  • Some drivers would like to record stacktraces in order to aid leak
    tracing. As stackdepot already provides a facility for only storing the
    unique traces, thereby reducing the memory required, export that
    functionality for use by drivers.

    The code was originally created for KASAN and moved under lib in commit
    cd11016e5f521 ("mm, kasan: stackdepot implementation. Enable stackdepot
    for SLAB") so that it could be shared with mm/. In turn, we want to
    share it now with drivers.

    Link: http://lkml.kernel.org/r/20161108133209.22704-1-chris@chris-wilson.co.uk
    Signed-off-by: Chris Wilson
    Cc: Andrey Ryabinin
    Cc: Alexander Potapenko
    Cc: Dmitry Vyukov
    Cc: Joonsoo Kim
    Cc: "Kirill A. Shutemov"
    Cc: Daniel Vetter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Chris Wilson
     

30 Oct, 2016

1 commit

  • Pull networking fixes from David Miller:
    "Lots of fixes, mostly drivers as is usually the case.

    1) Don't treat zero DMA address as invalid in vmxnet3, from Alexey
    Khoroshilov.

    2) Fix element timeouts in netfilter's nft_dynset, from Anders K.
    Pedersen.

    3) Don't put aead_req crypto struct on the stack in mac80211, from
    Ard Biesheuvel.

    4) Several uninitialized variable warning fixes from Arnd Bergmann.

    5) Fix memory leak in cxgb4, from Colin Ian King.

    6) Fix bpf handling of VLAN header push/pop, from Daniel Borkmann.

    7) Several VRF semantic fixes from David Ahern.

    8) Set skb->protocol properly in ip6_tnl_xmit(), from Eli Cooper.

    9) Socket needs to be locked in udp_disconnect(), from Eric Dumazet.

    10) Div-by-zero on 32-bit fix in mlx4 driver, from Eugenia Emantayev.

    11) Fix stale link state during failover in NCSI driver, from Gavin
    Shan.

    12) Fix netdev lower adjacency list traversal, from Ido Schimmel.

    13) Provide proper handle when emitting notifications of filter
    deletes, from Jamal Hadi Salim.

    14) Memory leaks and big-endian issues in rtl8xxxu, from Jes Sorensen.

    15) Fix DESYNC_FACTOR handling in ipv6, from Jiri Bohac.

    16) Several routing offload fixes in mlxsw driver, from Jiri Pirko.

    17) Fix broadcast sync problem in TIPC, from Jon Paul Maloy.

    18) Validate chunk len before using it in SCTP, from Marcelo Ricardo
    Leitner.

    19) Revert a netns locking change that causes regressions, from Paul
    Moore.

    20) Add recursion limit to GRO handling, from Sabrina Dubroca.

    21) GFP_KERNEL in irq context fix in ibmvnic, from Thomas Falcon.

    22) Avoid accessing stale vxlan/geneve socket in data path, from
    Pravin Shelar"

    * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (189 commits)
    geneve: avoid using stale geneve socket.
    vxlan: avoid using stale vxlan socket.
    qede: Fix out-of-bound fastpath memory access
    net: phy: dp83848: add dp83822 PHY support
    enic: fix rq disable
    tipc: fix broadcast link synchronization problem
    ibmvnic: Fix missing brackets in init_sub_crq_irqs
    ibmvnic: Fix releasing of sub-CRQ IRQs in interrupt context
    Revert "ibmvnic: Fix releasing of sub-CRQ IRQs in interrupt context"
    arch/powerpc: Update parameters for csum_tcpudp_magic & csum_tcpudp_nofold
    net/mlx4_en: Save slave ethtool stats command
    net/mlx4_en: Fix potential deadlock in port statistics flow
    net/mlx4: Fix firmware command timeout during interrupt test
    net/mlx4_core: Do not access comm channel if it has not yet been initialized
    net/mlx4_en: Fix panic during reboot
    net/mlx4_en: Process all completions in RX rings after port goes up
    net/mlx4_en: Resolve dividing by zero in 32-bit system
    net/mlx4_core: Change the default value of enable_qos
    net/mlx4_core: Avoid setting ports to auto when only one port type is supported
    net/mlx4_core: Fix the resource-type enum in res tracker to conform to FW spec
    ...

    Linus Torvalds
     

28 Oct, 2016

3 commits

  • gen_pool_alloc_algo() iterates over the chunks of a pool trying to find
    a contiguous block of memory that satisfies the allocation request.

    The shortcut

    if (size > atomic_read(&chunk->avail))
    continue;

    makes the loop skip over chunks that do not have enough bytes left to
    fulfill the request. There are two situations, though, where an
    allocation might still fail:

    (1) The available memory is not contiguous, i.e. the request cannot
    be fulfilled due to external fragmentation.

    (2) A race condition. Another thread runs the same code concurrently
    and is quicker to grab the available memory.

    In those situations, the loop calls pool->algo() to search the entire
    chunk, and pool->algo() returns some value that is >= end_bit to
    indicate that the search failed. This return value is then assigned to
    start_bit. The variables start_bit and end_bit describe the range that
    should be searched, and this range should be reset for every chunk that
    is searched. Today, the code fails to reset start_bit to 0. As a
    result, prefixes of subsequent chunks are ignored. Memory allocations
    might fail even though there is plenty of room left in these prefixes of
    those other chunks.

    Fixes: 7f184275aa30 ("lib, Make gen_pool memory allocator lockless")
    Link: http://lkml.kernel.org/r/1477420604-28918-1-git-send-email-danielmentz@google.com
    Signed-off-by: Daniel Mentz
    Reviewed-by: Mathieu Desnoyers
    Acked-by: Will Deacon
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Daniel Mentz
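
The effect of the stale start_bit can be reproduced with a small model. This is a hedged Python sketch, not the genalloc implementation: chunks are plain lists of 0/1 bits, and `first_fit`, `pool_alloc`, and the `reset_start` switch are hypothetical names used only to contrast the two behaviours.

```python
def first_fit(bitmap, start, nbits):
    """Return the first index >= start of a run of `nbits` free bits.

    Returns len(bitmap) (a value >= end_bit) when no such run exists.
    """
    end = len(bitmap)
    i = start
    while i + nbits <= end:
        if not any(bitmap[i:i + nbits]):
            return i
        i += 1
    return end


def pool_alloc(chunks, nbits, reset_start=True):
    """Allocate `nbits` from the first chunk that fits.

    Returns (chunk index, start bit), or None if every chunk fails.
    """
    start_bit = 0
    for n, bitmap in enumerate(chunks):
        end_bit = len(bitmap)
        if reset_start:
            start_bit = 0                  # the fix: search each chunk from bit 0
        start_bit = first_fit(bitmap, start_bit, nbits)
        if start_bit >= end_bit:
            continue                       # search failed; stale value survives if not reset
        for b in range(start_bit, start_bit + nbits):
            bitmap[b] = 1                  # mark the run as allocated
        return n, start_bit
    return None
```

With a fragmented first chunk and an empty second chunk, the variant that does not reset the start position fails even though the second chunk is entirely free.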
     
  • KASAN uses stackdepot to memorize stacks for all kmalloc/kfree calls.
    Current stackdepot capacity is 16MB (1024 top level entries x 4 pages on
    second level). Size of each stack is (num_frames + 3) * sizeof(long).
    Which gives us ~84K stacks. This capacity was chosen empirically and it
    is enough to run kernel normally.

    However, when lots of configs are enabled and a fuzzer tries to maximize
    code coverage, it easily hits the limit within tens of minutes. I've
    tested for a long time with the number of top level entries bumped 4x
    (4096). And I think I've seen overflow only once. But I don't have all
    configs enabled and code coverage has not reached maximum yet. So bump
    it 8x to 8192.

    Since we have a two-level table, memory cost of this is very moderate --
    currently the top-level table is 8KB, with this patch it is 64KB, which
    is negligible under KASAN.

    Here is some approx math.

    128MB allows us to memorize ~670K stacks (assuming stack is ~200b).
    I've grepped kernel for kmalloc|kfree|kmem_cache_alloc|kmem_cache_free|
    kzalloc|kstrdup|kstrndup|kmemdup and it gives ~60K matches. Most of
    alloc/free call sites are reachable with only one stack. But some
    utility functions can have large fanout. Assuming average fanout is 5x,
    total number of alloc/free stacks is ~300K.

    Link: http://lkml.kernel.org/r/1476458416-122131-1-git-send-email-dvyukov@google.com
    Signed-off-by: Dmitry Vyukov
    Cc: Andrey Ryabinin
    Cc: Alexander Potapenko
    Cc: Joonsoo Kim
    Cc: Baozeng Ding
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dmitry Vyukov
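
The approximate math in this commit message can be checked with a few lines of Python. The constants below are assumptions made for the sketch: 4 KB pages, an 8-byte `long` (64-bit kernel), and the ~200-byte average stack quoted in the message; the function names are hypothetical.

```python
PAGE_SIZE = 4096          # assumption: 4 KB pages
WORD = 8                  # assumption: sizeof(long) on a 64-bit kernel
PAGES_PER_ENTRY = 4       # "4 pages on second level"
AVG_STACK_BYTES = 200     # "assuming stack is ~200b"


def depot_capacity(top_level_entries):
    """Return (total depot bytes, approximate number of stacks that fit)."""
    total = top_level_entries * PAGES_PER_ENTRY * PAGE_SIZE
    return total, total // AVG_STACK_BYTES


def top_table_bytes(top_level_entries):
    """Size of the top-level pointer table in bytes."""
    return top_level_entries * WORD


old_bytes, old_stacks = depot_capacity(1024)   # 16 MB, roughly 84K stacks
new_bytes, new_stacks = depot_capacity(8192)   # 128 MB, roughly 670K stacks
estimated_needed = 60_000 * 5                  # ~60K call sites times 5x fanout
```

Under these assumptions the estimated ~300K alloc/free stacks fit within the new capacity with room to spare, while the top-level table grows from 8 KB to 64 KB.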
     
  • When building with the latent_entropy plugin, set the default
    CONFIG_FRAME_WARN to 2048, since some __init functions have many basic
    blocks that, when instrumented by the latent_entropy plugin, grow beyond
    1024 byte stack size on 32-bit builds.

    Link: http://lkml.kernel.org/r/20161018211216.GA39687@beast
    Signed-off-by: Kees Cook
    Reported-by: kbuild test robot
    Cc: Emese Revfy
    Cc: Ingo Molnar
    Cc: Michal Marek
    Cc: "Paul E. McKenney"
    Cc: Dan Williams
    Cc: Andrey Ryabinin
    Cc: Josh Poimboeuf
    Cc: Tejun Heo
    Cc: Nikolay Aleksandrov
    Cc: Dmitry Vyukov
    Cc: Shuah Khan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kees Cook
     

21 Oct, 2016

1 commit

  • After commit 636c2628086e ("net: skbuff: Remove errornous length
    validation in skb_vlan_pop()") mentioned test case stopped working,
    throwing a -12 (ENOMEM) return code. The issue however is not due to
    636c2628086e, but rather due to a buggy test case that got uncovered
    from the change in behaviour in 636c2628086e.

    The data_size of that test case for the skb was set to 1. In the
    bpf_fill_ld_abs_vlan_push_pop() handler bpf insns are generated that
    loop with: reading skb data, pushing 68 tags, reading skb data,
    popping 68 tags, reading skb data, etc, in order to force a skb
    expansion and thus trigger that JITs recache skb->data. Problem is
    that initial data_size is too small.

    Before 636c2628086e, the test silently bailed out due to the
    skb->len < VLAN_ETH_HLEN check, returning 0; now it throws an
    error from a failing skb_ensure_writable(). Set at least a minimum of
    ETH_HLEN as the initial length so that on the first push of data, the
    equivalent pop will succeed.

    Fixes: 4d9c5c53ac99 ("test_bpf: add bpf_skb_vlan_push/pop() tests")
    Signed-off-by: Daniel Borkmann
    Acked-by: Alexei Starovoitov
    Signed-off-by: David S. Miller

    Daniel Borkmann
     

16 Oct, 2016

1 commit

  • Pull gcc plugins update from Kees Cook:
    "This adds a new gcc plugin named "latent_entropy". It is designed to
    extract as much uncertainty as possible from a running system at boot
    time, hoping to capitalize on any possible variation in
    CPU operation (due to runtime data differences, hardware differences,
    SMP ordering, thermal timing variation, cache behavior, etc).

    At the very least, this plugin is a much more comprehensive example
    for how to manipulate kernel code using the gcc plugin internals"

    * tag 'gcc-plugins-v4.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
    latent_entropy: Mark functions with __latent_entropy
    gcc-plugins: Add latent_entropy plugin

    Linus Torvalds
     

15 Oct, 2016

4 commits

  • Pull more misc uaccess and vfs updates from Al Viro:
    "The rest of the stuff from -next (more uaccess work) + assorted fixes"

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
    score: traps: Add missing include file to fix build error
    fs/super.c: don't fool lockdep in freeze_super() and thaw_super() paths
    fs/super.c: fix race between freeze_super() and thaw_super()
    overlayfs: Fix setting IOP_XATTR flag
    iov_iter: kernel-doc import_iovec() and rw_copy_check_uvector()
    blackfin: no access_ok() for __copy_{to,from}_user()
    arm64: don't zero in __copy_from_user{,_inatomic}
    arm: don't zero in __copy_from_user_inatomic()/__copy_from_user()
    arc: don't leak bits of kernel stack into coredump
    alpha: get rid of tail-zeroing in __copy_user()

    Linus Torvalds
     
  • Both import_iovec() and rw_copy_check_uvector() take an array
    (typically small and on-stack) which is used to hold an iovec array copy
    from userspace. This is to avoid an expensive memory allocation in the
    fast path (i.e. few iovec elements).

    The caller may have to check whether these functions actually used
    the provided buffer or allocated a new one -- but this differs between
    the two. Let's just add a kernel doc to clarify what the semantics are
    for each function.

    Signed-off-by: Vegard Nossum
    Signed-off-by: Al Viro

    Vegard Nossum
     
  • …/kernel/git/shuah/linux-kselftest

    Pull kselftest updates from Shuah Khan:
    "This update consists of:

    - Fixes and improvements to existing tests

    - Moving code from Documentation to selftests, samples, and tools:

    * Moves dnotify_test, prctl, ptp, vDSO, ia64, watchdog, and
    networking tests from Documentation to selftests.

    * Moves mic/mpssd, misc-devices/mei, timers, watchdog, auxdisplay,
    and blackfin examples from Documentation to samples.

    * Moves accounting, laptops/dslm, and pcmcia/crc32hash tools from
    Documentation to tools.

    * Deletes BUILD_DOCSRC and its dependencies"

    * tag 'linux-kselftest-4.9-rc1-update' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest: (21 commits)
    selftests/futex: Check ANSI terminal color support
    Doc: update 00-INDEX files to reflect the runnable code move
    samples: move blackfin gptimers-example from Documentation
    tools: move pcmcia crc32hash tool from Documentation
    tools: move laptops dslm tool from Documentation
    tools: move accounting tool from Documentation
    samples: move auxdisplay example code from Documentation
    samples: move watchdog example code from Documentation
    samples: move timers example code from Documentation
    samples: move misc-devices/mei example code from Documentation
    samples: move mic/mpssd example code from Documentation
    selftests: Move networking/timestamping from Documentation
    selftests: move watchdog tests from Documentation/watchdog
    selftests: move ia64 tests from Documentation/ia64
    selftests: move vDSO tests from Documentation/vDSO
    selftests: move ptp tests from Documentation/ptp
    selftests: move prctl tests from Documentation/prctl
    selftests: move dnotify_test from Documentation/filesystems
    selftests/timers: Add missing error code assignment before test
    selftests/zram: replace ZRAM_LZ4_COMPRESS
    ...

    Linus Torvalds
     
  • Pull percpu updates from Tejun Heo:

    - Nick improved generic implementations of percpu operations which
    modify the variable and return so that they calculate the physical
    address only once.

    - percpu_ref percpu atomic mode switching improvements. The
    patchset was originally posted about a year ago but fell through the
    cracks.

    - misc non-critical fixes.

    * 'for-4.9' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu:
    mm/percpu.c: fix potential memory leakage for pcpu_embed_first_chunk()
    mm/percpu.c: correct max_distance calculation for pcpu_embed_first_chunk()
    percpu: eliminate two sparse warnings
    percpu: improve generic percpu modify-return implementation
    percpu-refcount: init ->confirm_switch member properly
    percpu_ref: allow operation mode switching operations to be called concurrently
    percpu_ref: restructure operation mode switching
    percpu_ref: unify staggered atomic switching wait behavior
    percpu_ref: reorganize __percpu_ref_switch_to_atomic() and relocate percpu_ref_switch_to_atomic()
    percpu_ref: remove unnecessary RCU grace period for staggered atomic switching confirmation

    Linus Torvalds
     

12 Oct, 2016

5 commits

  • There's no point in collecting coverage from lib/stackdepot.c, as it is
    not a function of syscall inputs. Disabling kcov instrumentation for that
    file will reduce the coverage noise level.

    Link: http://lkml.kernel.org/r/1474640972-104131-1-git-send-email-glider@google.com
    Signed-off-by: Alexander Potapenko
    Acked-by: Dmitry Vyukov
    Cc: Kostya Serebryany
    Cc: Andrey Konovalov
    Cc: syzkaller
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexander Potapenko
     
  • Today there are platforms with many CPUs (up to 4K). Trying to boot only
    part of the CPUs may result in a string that is too long.

    For example, let's take the NPS platform that is part of arch/arc. This
    platform has an SMP system with 256 cores, each with 16 HW threads (SMT
    machine), where each HW thread appears as a CPU to the kernel. In this
    example there is a total of 4K CPUs. When one tries to boot only part of
    the HW threads from each core, the string representing the map may be
    long. For example, if for the sake of performance we decided to boot only
    the first half of the HW threads of each core, the map will look like:
    0-7,16-23,32-39,...,4080-4087

    This patch introduces a new syntax to accommodate such a use case. I
    added an optional postfix to a range of CPUs which will choose, according
    to a given modulo, the desired range of remainders, i.e.:

    :used_size/group_size

    For example, above map can be described in new syntax like this:
    0-4095:8/16

    Note that this patch is backward compatible with current syntax.

    [akpm@linux-foundation.org: rework documentation]
    Link: http://lkml.kernel.org/r/1473579629-4283-1-git-send-email-noamca@mellanox.com
    Signed-off-by: Noam Camus
    Cc: David Decotigny
    Cc: Ben Hutchings
    Cc: David S. Miller
    Cc: Pan Xinhui
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Noam Camus
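
The range syntax described in this commit can be sketched with a small parser. This is a Python illustration of the semantics as stated in the message, not the kernel's bitmap parsing code; `parse_cpulist` and its validation rules are assumptions made for the example.

```python
def parse_cpulist(spec):
    """Expand a CPU list such as "0-7,16-23" or "0-31:8/16" into a sorted list."""
    cpus = set()
    for part in spec.split(","):
        rng, _, group = part.partition(":")
        first, _, last = rng.partition("-")
        a = int(first)
        b = int(last) if last else a
        if group:
            used, _, size = group.partition("/")
            used_size, group_size = int(used), int(size)
        else:
            used_size = group_size = b - a + 1   # plain range: use everything
        if a > b or group_size <= 0 or not 0 < used_size <= group_size:
            raise ValueError(f"invalid CPU range: {part!r}")
        for base in range(a, b + 1, group_size):
            # Take the first `used_size` CPUs of each group, clipped to the range.
            cpus.update(range(base, min(base + used_size, b + 1)))
    return sorted(cpus)
```

For the example in the message, `parse_cpulist("0-4095:8/16")` yields the first 8 CPUs of every group of 16, ending at CPU 4087, and lists without the postfix parse as before.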
     
  • Set "overflow" bit upon encountering it instead of postponing to the end
    of the conversion. Somehow gcc unwedges itself and generates better code:

    $ ./scripts/bloat-o-meter ../vmlinux-000 ../obj/vmlinux
    _parse_integer 177 139 -38

    Inspired by patch from Zhaoxiu Zeng.

    Link: http://lkml.kernel.org/r/20160826221920.GA1909@p183.telecom.by
    Signed-off-by: Alexey Dobriyan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
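
The idea of flagging overflow at the moment it is detected can be sketched as follows. This is a Python model under stated assumptions, not the kernel function: the flag name mirrors the kernel's KSTRTOX_OVERFLOW, while `parse_integer`, its tuple return value, and the explicit 64-bit masking are choices made for this illustration.

```python
KSTRTOX_OVERFLOW = 1 << 31        # flag bit OR-ed into the consumed-character count
ULLONG_MAX = (1 << 64) - 1


def parse_integer(s, base):
    """Parse the leading digits of `s` in `base`.

    Returns (characters consumed, possibly OR-ed with KSTRTOX_OVERFLOW, value).
    """
    res = 0
    rv = 0
    for ch in s:
        if not (ch.isascii() and ch.isalnum()):
            break
        val = int(ch, 36)                      # 0-9 and a-z/A-Z as digit values
        if val >= base:
            break
        if res > (ULLONG_MAX - val) // base:
            rv |= KSTRTOX_OVERFLOW             # set the flag as soon as overflow is seen
        res = (res * base + val) & ULLONG_MAX  # wrap like an unsigned 64-bit value
        rv += 1
    return rv, res
```

The caller masks off the flag bit to recover the number of characters consumed, so there is no separate overflow variable to carry to the end of the loop.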
     
  • The strncpy_from_user() accessor is effectively a copy_from_user()
    specialised to copy strings, terminating early at a NUL byte if possible.
    In other respects it is identical, and can be used to copy an arbitrarily
    large buffer from userspace into the kernel. Conceptually, it exposes a
    similar attack surface.

    As with copy_from_user(), we check the destination range when the kernel
    is built with KASAN, but unlike copy_from_user() we do not check the
    destination buffer when using HARDENED_USERCOPY. As strncpy_from_user()
    calls get_user() in a loop, we must call check_object_size() explicitly.

    This patch adds this instrumentation to strncpy_from_user(), per the same
    rationale as with the regular copy_from_user(). In the absence of
    hardened usercopy this will have no impact as the instrumentation expands
    to an empty static inline function.

    Link: http://lkml.kernel.org/r/1472221903-31181-1-git-send-email-mark.rutland@arm.com
    Signed-off-by: Mark Rutland
    Cc: Kees Cook
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mark Rutland
     
  • it actually worked only when requested area ended on the page boundary...

    Reported-by: Marco Grassi
    Signed-off-by: Al Viro
    Signed-off-by: Linus Torvalds

    Al Viro
     

11 Oct, 2016

2 commits

  • The __latent_entropy gcc attribute can be used only on functions and
    variables. If it is on a function then the plugin will instrument it for
    gathering control-flow entropy. If the attribute is on a variable then
    the plugin will initialize it with random contents. The variable must
    be an integer, an integer array type or a structure with integer fields.

    These specific functions have been selected because they are init
    functions (to help gather boot-time entropy), are called at unpredictable
    times, or they have variable loops, each of which provide some level of
    latent entropy.

    Signed-off-by: Emese Revfy
    [kees: expanded commit message]
    Signed-off-by: Kees Cook

    Emese Revfy
     
  • Pull misc vfs updates from Al Viro:
    "Assorted misc bits and pieces.

    There are several single-topic branches left after this (rename2
    series from Miklos, current_time series from Deepa Dinamani, xattr
    series from Andreas, uaccess stuff from me) and I'd prefer to
    send those separately"

    * 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (39 commits)
    proc: switch auxv to use of __mem_open()
    hpfs: support FIEMAP
    cifs: get rid of unused arguments of CIFSSMBWrite()
    posix_acl: uapi header split
    posix_acl: xattr representation cleanups
    fs/aio.c: eliminate redundant loads in put_aio_ring_file
    fs/internal.h: add const to ns_dentry_operations declaration
    compat: remove compat_printk()
    fs/buffer.c: make __getblk_slow() static
    proc: unsigned file descriptors
    fs/file: more unsigned file descriptors
    fs: compat: remove redundant check of nr_segs
    cachefiles: Fix attempt to read i_blocks after deleting file [ver #2]
    cifs: don't use memcpy() to copy struct iov_iter
    get rid of separate multipage fault-in primitives
    fs: Avoid premature clearing of capabilities
    fs: Give dentry to inode_change_ok() instead of inode
    fuse: Propagate dentry down to inode_change_ok()
    ceph: Propagate dentry down to inode_change_ok()
    xfs: Propagate dentry down to inode_change_ok()
    ...

    Linus Torvalds
     

10 Oct, 2016

1 commit

  • Move blackfin gptimers-example to samples and remove it from Documentation
    Makefile. Update samples Kconfig and Makefile to build gptimers-example.

    blackfin is the last CONFIG_BUILD_DOCSRC target in Documentation/Makefile.
    Hence this patch also includes changes to remove CONFIG_BUILD_DOCSRC from
    Makefile and lib/Kconfig.debug and updates VIDEO_PCI_SKELETON dependency
    on BUILD_DOCSRC.

    Documentation/Makefile is not deleted to avoid breaking make htmldocs and
    make distclean.

    Acked-by: Michal Marek
    Acked-by: Jonathan Corbet
    Reviewed-by: Kees Cook
    Reported-by: Valentin Rothberg
    Reported-by: Paul Gortmaker
    Signed-off-by: Shuah Khan

    Shuah Khan
     

08 Oct, 2016

8 commits

  • Merge updates from Andrew Morton:

    - fsnotify updates

    - ocfs2 updates

    - all of MM

    * emailed patches from Andrew Morton: (127 commits)
    console: don't prefer first registered if DT specifies stdout-path
    cred: simpler, 1D supplementary groups
    CREDITS: update Pavel's information, add GPG key, remove snail mail address
    mailmap: add Johan Hovold
    .gitattributes: set git diff driver for C source code files
    uprobes: remove function declarations from arch/{mips,s390}
    spelling.txt: "modeled" is spelt correctly
    nmi_backtrace: generate one-line reports for idle cpus
    arch/tile: adopt the new nmi_backtrace framework
    nmi_backtrace: do a local dump_stack() instead of a self-NMI
    nmi_backtrace: add more trigger_*_cpu_backtrace() methods
    min/max: remove sparse warnings when they're nested
    Documentation/filesystems/proc.txt: add more description for maps/smaps
    mm, proc: fix region lost in /proc/self/smaps
    proc: fix timerslack_ns CAP_SYS_NICE check when adjusting self
    proc: add LSM hook checks to /proc//timerslack_ns
    proc: relax /proc//timerslack_ns capability requirements
    meminfo: break apart a very long seq_printf with #ifdefs
    seq/proc: modify seq_put_decimal_[u]ll to take a const char *, not char
    proc: faster /proc/*/status
    ...

    Linus Torvalds
     
  • When doing an nmi backtrace of many cores, most of which are idle, the
    output is a little overwhelming and very uninformative. Suppress
    messages for cpus that are idling when they are interrupted and just
    emit one line, "NMI backtrace for N skipped: idling at pc 0xNNN".

    We do this by grouping all the cpuidle code together into a new
    .cpuidle.text section, and then checking the address of the interrupted
    PC to see if it lies within that section.
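
    The address check described above can be sketched as a half-open range
    test. This is a userspace illustration, not the kernel code: struct
    text_section, pc_in_section() and format_backtrace_header() are
    hypothetical names, and the section bounds are supplied by the caller
    rather than by the linker.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical description of a text section as a half-open range [start, end). */
struct text_section {
    uintptr_t start;
    uintptr_t end;
};

/* True if the interrupted PC lies inside the given section. */
bool pc_in_section(const struct text_section *s, uintptr_t pc)
{
    return pc >= s->start && pc < s->end;
}

/*
 * Writes either the one-line "skipped" report (PC inside the idle section)
 * or a plain header that a full backtrace would follow.
 * Returns the value of snprintf().
 */
int format_backtrace_header(char *buf, size_t len,
                            const struct text_section *idle,
                            int cpu, uintptr_t pc)
{
    if (pc_in_section(idle, pc))
        return snprintf(buf, len,
                        "NMI backtrace for %d skipped: idling at pc 0x%lx",
                        cpu, (unsigned long)pc);

    return snprintf(buf, len, "NMI backtrace for %d", cpu);
}
```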

    This commit suitably tags x86 and tile idle routines, and only adds in
    the minimal framework for other architectures.

    Link: http://lkml.kernel.org/r/1472487169-14923-5-git-send-email-cmetcalf@mellanox.com
    Signed-off-by: Chris Metcalf
    Acked-by: Peter Zijlstra (Intel)
    Tested-by: Peter Zijlstra (Intel)
    Tested-by: Daniel Thompson [arm]
    Tested-by: Petr Mladek
    Cc: Aaron Tomlin
    Cc: Peter Zijlstra (Intel)
    Cc: "Rafael J. Wysocki"
    Cc: Russell King
    Cc: Thomas Gleixner
    Cc: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Chris Metcalf
     
  • Currently on arm there is code that checks whether it should call
    dump_stack() explicitly, to avoid trying to raise an NMI when the
    current context is not preemptible by the backtrace IPI. Similarly, the
    forthcoming arch/tile support uses an IPI mechanism that does not
    support generating an NMI to self.

    Accordingly, move the code that guards this case into the generic
    mechanism, and invoke it unconditionally whenever we want a backtrace of
    the current cpu. It seems plausible that in all cases, dump_stack()
    will generate better information than generating a stack from the NMI
    handler. The register state will be missing, but that state is likely
    not particularly helpful in any case.

    Or, if we think it is helpful, we should be capturing and emitting the
    current register state in all cases when regs == NULL is passed to
    nmi_cpu_backtrace().

    Link: http://lkml.kernel.org/r/1472487169-14923-3-git-send-email-cmetcalf@mellanox.com
    Signed-off-by: Chris Metcalf
    Tested-by: Daniel Thompson [arm]
    Reviewed-by: Petr Mladek
    Acked-by: Aaron Tomlin
    Cc: "Rafael J. Wysocki"
    Cc: Russell King
    Cc: Thomas Gleixner
    Cc: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Chris Metcalf
     
  • Patch series "improvements to the nmi_backtrace code" v9.

    This patch series modifies the trigger_xxx_backtrace() NMI-based remote
    backtracing code to make it more flexible, and makes a few small
    improvements along the way.

    The motivation comes from the task isolation code, where there are
    scenarios where we want to be able to diagnose a case where some cpu is
    about to interrupt a task-isolated cpu. It can be helpful to see both
    where the interrupting cpu is, and also an approximation of where the
    cpu that is being interrupted is. The nmi_backtrace framework allows us
    to discover the stack of the interrupted cpu.

    I've tested that the change works as desired on tile, and build-tested
    x86, arm, mips, and sparc64. For x86 I confirmed that the generic
    cpuidle stuff as well as the architecture-specific routines are in the
    new cpuidle section. For arm, mips, and sparc I just build-tested it
    and made sure the generic cpuidle routines were in the new cpuidle
    section, but I didn't attempt to figure out what the platform-specific
    idle routines might be. That might be more usefully done by someone
    with platform experience in follow-up patches.

    This patch (of 4):

    Currently you can only request a backtrace of either all cpus, or all
    cpus but yourself. It can also be helpful to request a remote backtrace
    of a single cpu, and since we want that, the logical extension is to
    support a cpumask as the underlying primitive.

    This change modifies the existing lib/nmi_backtrace.c code to take a
    cpumask as its basic primitive, and modifies the linux/nmi.h code to use
    the new "cpumask" method instead.
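
    As a rough illustration of why a mask is the natural primitive, the
    sketch below expresses "all", "all but self" and "single cpu" as thin
    wrappers around one mask-based function. It is a userspace model under
    stated assumptions: demo_cpumask is a hypothetical 64-bit bitmap, not
    the kernel's struct cpumask, and the functions only compute which cpus
    would be targeted. Reporting the caller's own cpu through a separate
    dump_local flag is a modelling choice of this sketch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a cpumask: bit N set means cpu N is selected. */
typedef uint64_t demo_cpumask;

demo_cpumask demo_mask_of(int cpu)
{
    return (demo_cpumask)1 << cpu;
}

/*
 * The basic primitive: given the requested mask and the calling cpu, report
 * whether the caller should dump its own stack locally and return the set
 * of remote cpus that would be asked for a backtrace.
 */
demo_cpumask backtrace_cpumask(demo_cpumask mask, int self, bool *dump_local)
{
    *dump_local = (mask & demo_mask_of(self)) != 0;
    return mask & ~demo_mask_of(self);
}

/* "All cpus" is the primitive applied to the online mask. */
demo_cpumask backtrace_all(demo_cpumask online, int self, bool *dump_local)
{
    return backtrace_cpumask(online, self, dump_local);
}

/* "All but self" clears the caller's bit before calling the primitive. */
demo_cpumask backtrace_allbutself(demo_cpumask online, int self,
                                  bool *dump_local)
{
    return backtrace_cpumask(online & ~demo_mask_of(self), self, dump_local);
}

/* A single cpu is a one-bit mask, restricted to cpus that are online. */
demo_cpumask backtrace_single(demo_cpumask online, int cpu, int self,
                              bool *dump_local)
{
    return backtrace_cpumask(online & demo_mask_of(cpu), self, dump_local);
}
```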

    The existing clients of nmi_backtrace (arm and x86) are converted to
    using the new cpumask approach in this change.

    The other users of the backtracing API (sparc64 and mips) are converted
    to use the cpumask approach rather than the all/allbutself approach.
    The mips code ignored the "include_self" boolean but with this change it
    will now also dump a local backtrace if requested.

    Link: http://lkml.kernel.org/r/1472487169-14923-2-git-send-email-cmetcalf@mellanox.com
    Signed-off-by: Chris Metcalf
    Tested-by: Daniel Thompson [arm]
    Reviewed-by: Aaron Tomlin
    Reviewed-by: Petr Mladek
    Cc: "Rafael J. Wysocki"
    Cc: Russell King
    Cc: Thomas Gleixner
    Cc: Ingo Molnar
    Cc: Ralf Baechle
    Cc: David Miller
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Chris Metcalf
     
  • This came to light when implementing native 64-bit atomics for ARCv2.

    The atomic64 self-test code uses CONFIG_ARCH_HAS_ATOMIC64_DEC_IF_POSITIVE
    to check whether atomic64_dec_if_positive() is available. It seems it
    was needed when not every arch defined it. However, as of the current
    code, the Kconfig option seems needless:

    - for CONFIG_GENERIC_ATOMIC64 it is auto-enabled in lib/Kconfig and a
    generic definition of the API is present in lib/atomic64.c
    - arches with native 64-bit atomics select it in arch/*/Kconfig and
    define the API in their headers

    So I see no point in keeping the Kconfig option.
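
    For readers unfamiliar with the operation being tested, here is a
    minimal userspace sketch of "decrement if positive" semantics using C11
    atomics. It is not the kernel implementation: demo_dec_if_positive() is
    a hypothetical name, and the sketch assumes the usual contract that the
    counter is only written when the decremented value is non-negative,
    while the decremented value is returned either way.

```c
#include <stdatomic.h>
#include <stdint.h>

/*
 * Decrement *v by one only if the result would be >= 0.
 * Returns the old value minus one, whether or not the store happened.
 * (The INT64_MIN overflow corner case is ignored in this sketch.)
 */
int64_t demo_dec_if_positive(_Atomic int64_t *v)
{
    int64_t old = atomic_load(v);

    for (;;) {
        int64_t dec = old - 1;

        if (dec < 0)
            return dec;     /* would go negative: leave *v untouched */

        /* On failure, old is reloaded with the current value and we retry. */
        if (atomic_compare_exchange_weak(v, &old, dec))
            return dec;
    }
}
```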

    Compile tested for:
    - blackfin (CONFIG_GENERIC_ATOMIC64)
    - x86 (!CONFIG_GENERIC_ATOMIC64)
    - ia64

    Link: http://lkml.kernel.org/r/1473703083-8625-3-git-send-email-vgupta@synopsys.com
    Signed-off-by: Vineet Gupta
    Cc: Richard Henderson
    Cc: Ivan Kokshaysky
    Cc: Matt Turner
    Cc: Russell King
    Cc: Catalin Marinas
    Cc: Will Deacon
    Cc: Ralf Baechle
    Cc: "James E.J. Bottomley"
    Cc: Helge Deller
    Cc: Benjamin Herrenschmidt
    Cc: Paul Mackerras
    Cc: Michael Ellerman
    Cc: Martin Schwidefsky
    Cc: Heiko Carstens
    Cc: "David S. Miller"
    Cc: Chris Metcalf
    Cc: Thomas Gleixner
    Cc: Ingo Molnar
    Cc: "H. Peter Anvin"
    Cc: Vineet Gupta
    Cc: Zhaoxiu Zeng
    Cc: Linus Walleij
    Cc: Alexander Potapenko
    Cc: Andrey Ryabinin
    Cc: Herbert Xu
    Cc: Ming Lin
    Cc: Arnd Bergmann
    Cc: Geert Uytterhoeven
    Cc: Peter Zijlstra
    Cc: Borislav Petkov
    Cc: Andi Kleen
    Cc: Boqun Feng
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vineet Gupta
     
  • Pull VFS splice updates from Al Viro:
    "There's a bunch of branches this cycle, both mine and from other folks
    and I'd rather send pull requests separately.

    This one is the conversion of ->splice_read() to ITER_PIPE iov_iter
    (and introduction of such). Gets rid of a lot of code in fs/splice.c
    and elsewhere; there will be followups, but these are for the next
    cycle... Some pipe/splice-related cleanups from Miklos in the same
    branch as well"

    * 'work.splice_read' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
    pipe: fix comment in pipe_buf_operations
    pipe: add pipe_buf_steal() helper
    pipe: add pipe_buf_confirm() helper
    pipe: add pipe_buf_release() helper
    pipe: add pipe_buf_get() helper
    relay: simplify relay_file_read()
    switch default_file_splice_read() to use of pipe-backed iov_iter
    switch generic_file_splice_read() to use of ->read_iter()
    new iov_iter flavour: pipe-backed
    fuse_dev_splice_read(): switch to add_to_pipe()
    skb_splice_bits(): get rid of callback
    new helper: add_to_pipe()
    splice: lift pipe_lock out of splice_to_pipe()
    splice: switch get_iovec_page_array() to iov_iter
    splice_to_pipe(): don't open-code wakeup_pipe_readers()
    consistent treatment of EFAULT on O_DIRECT read/write

    Linus Torvalds
     
  • Pull block layer updates from Jens Axboe:
    "This is the main pull request for block layer changes in 4.9.

    As mentioned at the last merge window, I've changed things up and now
    do just one branch for core block layer changes, and driver changes.
    This avoids dependencies between the two branches. Outside of this
    main pull request, there are two topical branches coming as well.

    This pull request contains:

    - A set of fixes, and a conversion to blk-mq, of nbd. From Josef.

    - Set of fixes and updates for lightnvm from Matias, Simon, and Arnd.
    Followup dependency fix from Geert.

    - General fixes from Bart, Baoyou, Guoqing, and Linus W.

    - CFQ async write starvation fix from Glauber.

    - Add support for delayed kick of the requeue list, from Mike.

    - Pull out the scalable bitmap code from blk-mq-tag.c and make it
    generally available under the name of sbitmap. Only blk-mq-tag uses
    it for now, but the blk-mq scheduling bits will use it as well.
    From Omar.

    - bdev thaw error propagation from Pierre.

    - Improve the blk polling statistics, and allow the user to clear
    them. From Stephen.

    - Set of minor cleanups from Christoph in block/blk-mq.

    - Set of cleanups and optimizations from me for block/blk-mq.

    - Various nvme/nvmet/nvmeof fixes from the various folks"

    * 'for-4.9/block' of git://git.kernel.dk/linux-block: (54 commits)
    fs/block_dev.c: return the right error in thaw_bdev()
    nvme: Pass pointers, not dma addresses, to nvme_get/set_features()
    nvme/scsi: Remove power management support
    nvmet: Make dsm number of ranges zero based
    nvmet: Use direct IO for writes
    admin-cmd: Added smart-log command support.
    nvme-fabrics: Add host_traddr options field to host infrastructure
    nvme-fabrics: revise host transport option descriptions
    nvme-fabrics: rework nvmf_get_address() for variable options
    nbd: use BLK_MQ_F_BLOCKING
    blkcg: Annotate blkg_hint correctly
    cfq: fix starvation of asynchronous writes
    blk-mq: add flag for drivers wanting blocking ->queue_rq()
    blk-mq: remove non-blocking pass in blk_mq_map_request
    blk-mq: get rid of manual run of queue with __blk_mq_run_hw_queue()
    block: export bio_free_pages to other modules
    lightnvm: propagate device_add() error code
    lightnvm: expose device geometry through sysfs
    lightnvm: control life of nvm_dev in driver
    blk-mq: register device instead of disk
    ...

    Linus Torvalds
     
  • Pull trivial updates from Jiri Kosina:
    "The usual rocket science from the trivial tree"

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial:
    tracing/syscalls: fix multiline in error message text
    lib/Kconfig.debug: fix DEBUG_SECTION_MISMATCH description
    doc: vfs: fix fadvise() sycall name
    x86/entry: spell EBX register correctly in documentation
    securityfs: fix securityfs_create_dir comment
    irq: Fix typo in tracepoint.xml

    Linus Torvalds