26 Mar, 2016

4 commits

  • Signed-off-by: Alexander Potapenko
    Acked-by: Andrey Ryabinin
    Cc: Christoph Lameter
    Cc: Pekka Enberg
    Cc: David Rientjes
    Cc: Joonsoo Kim
    Cc: Andrey Konovalov
    Cc: Dmitry Vyukov
    Cc: Steven Rostedt
    Cc: Konstantin Serebryany
    Cc: Dmitry Chernenkov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexander Potapenko
     
  • Implement the stack depot and provide CONFIG_STACKDEPOT. Stack depot
    will allow KASAN to store allocation/deallocation stack traces for memory
    chunks. The stack traces are stored in a hash table and referenced by
    handles which reside in the kasan_alloc_meta and kasan_free_meta
    structures in the allocated memory chunks.

    IRQ stack traces are cut below the IRQ entry point to avoid unnecessary
    duplication.

    Right now stackdepot support is only enabled in the SLAB allocator. Once
    KASAN features in SLAB are on par with those in SLUB we can switch SLUB
    to stackdepot as well, thus removing the dependency on SLUB stack
    bookkeeping, which wastes a lot of memory.

    This patch is based on the "mm: kasan: stack depots" patch originally
    prepared by Dmitry Chernenkov.

    Joonsoo has said that he plans to reuse the stackdepot code for the
    mm/page_owner.c debugging facility.

    [akpm@linux-foundation.org: s/depot_stack_handle/depot_stack_handle_t]
    [aryabinin@virtuozzo.com: comment style fixes]
    Signed-off-by: Alexander Potapenko
    Signed-off-by: Andrey Ryabinin
    Cc: Christoph Lameter
    Cc: Pekka Enberg
    Cc: David Rientjes
    Cc: Joonsoo Kim
    Cc: Andrey Konovalov
    Cc: Dmitry Vyukov
    Cc: Steven Rostedt
    Cc: Konstantin Serebryany
    Cc: Dmitry Chernenkov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexander Potapenko
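
The hash-table-plus-handle idea described in the stack depot commit above can be sketched in user space roughly as follows. This is a minimal illustration under stated assumptions, not the kernel's implementation: the names (`depot_save`, `depot_fetch`, `depot_handle_t`), the fixed-size pool, the table sizes, and the FNV-1a hash are all hypothetical choices made for the example. IRQ trimming is not modelled.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DEPOT_BUCKETS    64u   /* hypothetical hash table size */
#define DEPOT_MAX_STACKS 128u  /* hypothetical pool capacity */
#define DEPOT_MAX_FRAMES 16u   /* hypothetical maximum trace depth */

typedef uint32_t depot_handle_t;  /* 0 means "no stack stored" */

struct depot_record {
    uint32_t hash;
    uint32_t nr_frames;
    uintptr_t frames[DEPOT_MAX_FRAMES];
    depot_handle_t next;  /* next record in the same bucket, 0 if none */
};

static struct depot_record depot_pool[DEPOT_MAX_STACKS];
static uint32_t depot_used;  /* records handed out so far */
static depot_handle_t depot_buckets[DEPOT_BUCKETS];

/* FNV-1a over the raw bytes of the trace (an assumption, any hash works). */
static uint32_t depot_hash(const uintptr_t *frames, uint32_t n)
{
    const unsigned char *p = (const unsigned char *)frames;
    size_t len = (size_t)n * sizeof(*frames);
    uint32_t h = 2166136261u;

    for (size_t i = 0; i < len; i++) {
        h ^= p[i];
        h *= 16777619u;
    }
    return h;
}

/* Store a trace once and return a small handle; identical traces share it. */
static depot_handle_t depot_save(const uintptr_t *frames, uint32_t n)
{
    if (n == 0 || n > DEPOT_MAX_FRAMES)
        return 0;

    uint32_t hash = depot_hash(frames, n);
    uint32_t bucket = hash % DEPOT_BUCKETS;

    /* Look for an existing identical trace in this bucket's chain. */
    for (depot_handle_t h = depot_buckets[bucket]; h; h = depot_pool[h - 1].next) {
        const struct depot_record *rec = &depot_pool[h - 1];

        if (rec->hash == hash && rec->nr_frames == n &&
            memcmp(rec->frames, frames, n * sizeof(*frames)) == 0)
            return h;
    }

    if (depot_used == DEPOT_MAX_STACKS)
        return 0;  /* pool exhausted */

    struct depot_record *rec = &depot_pool[depot_used++];
    rec->hash = hash;
    rec->nr_frames = n;
    memcpy(rec->frames, frames, n * sizeof(*frames));
    rec->next = depot_buckets[bucket];
    depot_buckets[bucket] = depot_used;  /* handle is pool index + 1 */
    return depot_used;
}

/* Resolve a handle back to its frames; returns the frame count or 0. */
static uint32_t depot_fetch(depot_handle_t handle, const uintptr_t **frames)
{
    if (handle == 0 || handle > depot_used)
        return 0;
    *frames = depot_pool[handle - 1].frames;
    return depot_pool[handle - 1].nr_frames;
}
```

In this sketch, per-object metadata only needs to hold a 4-byte handle instead of a full trace, which is the memory saving the commit message is after.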
     
  • Add KASAN hooks to SLAB allocator.

    This patch is based on the "mm: kasan: unified support for SLUB and SLAB
    allocators" patch originally prepared by Dmitry Chernenkov.

    Signed-off-by: Alexander Potapenko
    Cc: Christoph Lameter
    Cc: Pekka Enberg
    Cc: David Rientjes
    Cc: Joonsoo Kim
    Cc: Andrey Konovalov
    Cc: Dmitry Vyukov
    Cc: Andrey Ryabinin
    Cc: Steven Rostedt
    Cc: Konstantin Serebryany
    Cc: Dmitry Chernenkov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexander Potapenko
     
  • This patchset implements SLAB support for KASAN

    Unlike SLUB, SLAB doesn't store allocation/deallocation stacks for heap
    objects, therefore we reimplement this feature in mm/kasan/stackdepot.c.
    The intention is to ultimately switch SLUB to use this implementation as
    well, which will save a lot of memory (right now SLUB bloats each object
    by 256 bytes to store the allocation/deallocation stacks).

    Also, neither SLUB nor SLAB delays the reuse of freed memory chunks, which
    is necessary for better detection of use-after-free errors. We
    introduce memory quarantine (mm/kasan/quarantine.c), which allows
    delayed reuse of deallocated memory.

    This patch (of 7):

    Rename kmalloc_large_oob_right() to kmalloc_pagealloc_oob_right(), as
    the test only checks the page allocator functionality. Also reimplement
    kmalloc_large_oob_right() so that the test allocates a large enough
    chunk of memory that still does not trigger the page allocator fallback.

    Signed-off-by: Alexander Potapenko
    Cc: Christoph Lameter
    Cc: Pekka Enberg
    Cc: David Rientjes
    Cc: Joonsoo Kim
    Cc: Andrey Konovalov
    Cc: Dmitry Vyukov
    Cc: Andrey Ryabinin
    Cc: Steven Rostedt
    Cc: Konstantin Serebryany
    Cc: Dmitry Chernenkov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexander Potapenko
     

23 Mar, 2016

3 commits

  • On parisc and metag the stack grows upwards, so for those we need to
    scan the stack downwards in order to calculate how much stack a process
    has used.

    Tested on a 64bit parisc kernel.

    Signed-off-by: Helge Deller

    Helge Deller
     
  • -fsanitize=* options make GCC less smart than usual and increase the
    number of 'maybe-uninitialized' false positives. So this patch does two things:

    * Add -Wno-maybe-uninitialized to CFLAGS_UBSAN which will disable all
    such warnings for instrumented files.

    * Remove CONFIG_UBSAN_SANITIZE_ALL from all[yes|mod]config builds. So
    the all[yes|mod]config build goes without -fsanitize=* and still with
    -Wmaybe-uninitialized.

    Signed-off-by: Andrey Ryabinin
    Reported-by: Fengguang Wu
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrey Ryabinin
     
  • kcov provides code coverage collection for coverage-guided fuzzing
    (randomized testing). Coverage-guided fuzzing is a testing technique
    that uses coverage feedback to determine new interesting inputs to a
    system. A notable user-space example is AFL
    (http://lcamtuf.coredump.cx/afl/). However, this technique is not
    widely used for kernel testing due to missing compiler and kernel
    support.

    kcov does not aim to collect as much coverage as possible. It aims to
    collect more or less stable coverage that is a function of syscall inputs.
    To achieve this goal it does not collect coverage in soft/hard
    interrupts, and instrumentation of some inherently non-deterministic or
    non-interesting parts of the kernel is disabled (e.g. scheduler, locking).

    Currently there is a single coverage collection mode (tracing), but the
    API anticipates additional collection modes. Initially I also
    implemented a second mode which exposes coverage in a fixed-size hash
    table of counters (what Quentin used in his original patch). I've
    dropped the second mode for simplicity.

    This patch adds the necessary support on the kernel side. The complementary
    compiler support was added in gcc revision 231296.

    We've used this support to build the syzkaller system call fuzzer, which has
    found 90 kernel bugs in just 2 months:

    https://github.com/google/syzkaller/wiki/Found-Bugs

    We've also found 30+ bugs in our internal systems with syzkaller.
    Another (yet unexplored) direction where kcov coverage would greatly
    help is more traditional "blob mutation". For example, mounting a
    random blob as a filesystem, or receiving a random blob over the wire.

    Why not gcov? A typical fuzzing loop looks as follows: (1) reset
    coverage, (2) execute a bit of code, (3) collect coverage, repeat. A
    typical coverage can be just a dozen basic blocks (e.g. an invalid
    input). In such a context gcov becomes prohibitively expensive, as the
    reset/collect coverage steps depend on the total number of basic
    blocks/edges in the program (in the case of the kernel it is about 2M).
    The cost of kcov depends only on the number of executed basic
    blocks/edges. On top of that, the kernel requires per-thread coverage
    because there are always background threads and unrelated processes that
    also produce coverage. With inlined gcov instrumentation per-thread
    coverage is not possible.

    kcov exposes kernel PCs and control flow to user-space which is
    insecure. But debugfs should not be mapped as user accessible.

    Based on a patch by Quentin Casasnovas.

    [akpm@linux-foundation.org: make task_struct.kcov_mode have type `enum kcov_mode']
    [akpm@linux-foundation.org: unbreak allmodconfig]
    [akpm@linux-foundation.org: follow x86 Makefile layout standards]
    Signed-off-by: Dmitry Vyukov
    Reviewed-by: Kees Cook
    Cc: syzkaller
    Cc: Vegard Nossum
    Cc: Catalin Marinas
    Cc: Tavis Ormandy
    Cc: Will Deacon
    Cc: Quentin Casasnovas
    Cc: Kostya Serebryany
    Cc: Eric Dumazet
    Cc: Alexander Potapenko
    Cc: Kees Cook
    Cc: Bjorn Helgaas
    Cc: Sasha Levin
    Cc: David Drysdale
    Cc: Ard Biesheuvel
    Cc: Andrey Ryabinin
    Cc: Kirill A. Shutemov
    Cc: Jiri Slaby
    Cc: Ingo Molnar
    Cc: Thomas Gleixner
    Cc: "H. Peter Anvin"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dmitry Vyukov
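
The cost argument in the "Why not gcov" paragraph of the kcov commit above can be illustrated with a small user-space model. This is a sketch under stated assumptions and not the kernel code: the names (`counters_reset`, `pc_trace`, `trace_record`, `trace_collect`), the buffer size, and the count-in-word-0 layout are hypothetical choices for the example. The 2M figure is the one quoted in the commit message.

```c
#include <stddef.h>
#include <string.h>

#define TOTAL_BLOCKS 2000000u  /* rough kernel-wide block count quoted above */
#define TRACE_WORDS  1024u     /* hypothetical per-thread buffer size */

/* gcov-like model: one counter per basic block in the whole program. */
static unsigned char block_counters[TOTAL_BLOCKS];

/* Reset must touch every counter; returns the number of entries written. */
static size_t counters_reset(void)
{
    memset(block_counters, 0, sizeof(block_counters));
    return TOTAL_BLOCKS;
}

/* Trace model: word 0 holds the number of PCs recorded after it. */
struct pc_trace {
    unsigned long words[TRACE_WORDS];
};

/* Reset touches a single word regardless of program size. */
static size_t trace_reset(struct pc_trace *t)
{
    t->words[0] = 0;
    return 1;
}

/* Called once per executed basic block; drops PCs when the buffer is full. */
static void trace_record(struct pc_trace *t, unsigned long pc)
{
    unsigned long n = t->words[0];

    if (n + 1 < TRACE_WORDS) {
        t->words[n + 1] = pc;
        t->words[0] = n + 1;
    }
}

/* Collect copies only the recorded PCs; returns how many were copied. */
static size_t trace_collect(const struct pc_trace *t, unsigned long *out,
                            size_t cap)
{
    size_t n = t->words[0] < cap ? t->words[0] : cap;

    memcpy(out, &t->words[1], n * sizeof(*out));
    return n;
}
```

In this model a fuzzing iteration that executes a dozen blocks pays for a dozen writes and a dozen reads with the trace buffer, whereas the counter array pays for all two million entries on every reset. Giving each thread its own `struct pc_trace` also models the per-thread requirement.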
     

21 Mar, 2016

2 commits

  • Pull 'objtool' stack frame validation from Ingo Molnar:
    "This tree adds a new kernel build-time object file validation feature
    (CONFIG_STACK_VALIDATION=y): kernel stack frame correctness validation.
    It was written by and is maintained by Josh Poimboeuf.

    The motivation: there's a category of hard to find kernel bugs, most
    of them in assembly code (but also occasionally in C code), that
    degrades the quality of kernel stack dumps/backtraces. These bugs are
    hard to detect at the source code level. Such bugs result in
    incorrect/incomplete backtraces most of the time - but can also in some
    rare cases result in crashes or other undefined behavior.

    The build time correctness checking is done via the new 'objtool'
    user-space utility that was written for this purpose and which is
    hosted in the kernel repository in tools/objtool/. The tool's (very
    simple) UI and source code design is shaped after Git and perf and
    shares quite a bit of infrastructure with tools/perf (which tooling
    infrastructure sharing effort got merged via perf and is already
    upstream). Objtool follows the well-known kernel coding style.

    Objtool does not try to check .c or .S files, it instead analyzes the
    resulting .o generated machine code from first principles: it decodes
    the instruction stream and interprets it. (Right now objtool supports
    the x86-64 architecture.)

    From tools/objtool/Documentation/stack-validation.txt:

    "The kernel CONFIG_STACK_VALIDATION option enables a host tool named
    objtool which runs at compile time. It has a "check" subcommand
    which analyzes every .o file and ensures the validity of its stack
    metadata. It enforces a set of rules on asm code and C inline
    assembly code so that stack traces can be reliable.

    Currently it only checks frame pointer usage, but there are plans to
    add CFI validation for C files and CFI generation for asm files.

    For each function, it recursively follows all possible code paths
    and validates the correct frame pointer state at each instruction.

    It also follows code paths involving special sections, like
    .altinstructions, __jump_table, and __ex_table, which can add
    alternative execution paths to a given instruction (or set of
    instructions). Similarly, it knows how to follow switch statements,
    for which gcc sometimes uses jump tables."

    When this new kernel option is enabled (it's disabled by default), the
    tool, if it finds any suspicious assembly code pattern, outputs
    warnings in compiler warning format:

    warning: objtool: rtlwifi_rate_mapping()+0x2e7: frame pointer state mismatch
    warning: objtool: cik_tiling_mode_table_init()+0x6ce: call without frame pointer save/setup
    warning: objtool:__schedule()+0x3c0: duplicate frame pointer save
    warning: objtool:__schedule()+0x3fd: sibling call from callable instruction with changed frame pointer

    ... so that scripts that pick up compiler warnings will notice them.
    All known warnings triggered by the tool are fixed by the tree, most
    of the commits in fact prepare the kernel to be warning-free. Most of
    them are bugfixes or cleanups that stand on their own, but there are
    also some annotations of 'special' stack frames for justified cases
    such as entries to JIT-ed code (BPF) or really special boot time code.

    There are two other long-term motivations behind this tool as well:

    - To improve the quality and reliability of kernel stack frames, so
    that they can be used for optimized live patching.

    - To create independent infrastructure to check the correctness of
    CFI stack frames at build time. CFI debuginfo is notoriously
    unreliable and we cannot use it in the kernel as-is without extra
    checking done both on the kernel side and on the build side.

    The quality of kernel stack frames matters to debuggability as well,
    so IMO we can merge this without having to consider the live patching
    or CFI debuginfo angle"

    * 'core-objtool-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (52 commits)
    objtool: Only print one warning per function
    objtool: Add several performance improvements
    tools: Copy hashtable.h into tools directory
    objtool: Fix false positive warnings for functions with multiple switch statements
    objtool: Rename some variables and functions
    objtool: Remove superflous INIT_LIST_HEAD
    objtool: Add helper macros for traversing instructions
    objtool: Fix false positive warnings related to sibling calls
    objtool: Compile with debugging symbols
    objtool: Detect infinite recursion
    objtool: Prevent infinite recursion in noreturn detection
    objtool: Detect and warn if libelf is missing and don't break the build
    tools: Support relative directory path for 'O='
    objtool: Support CROSS_COMPILE
    x86/asm/decoder: Use explicitly signed chars
    objtool: Enable stack metadata validation on 64-bit x86
    objtool: Add CONFIG_STACK_VALIDATION option
    objtool: Add tool to perform compile-time stack metadata validation
    x86/kprobes: Mark kretprobe_trampoline() stack frame as non-standard
    sched: Always inline context_switch()
    ...

    Linus Torvalds
     
  • Pull virtio/vhost updates from Michael Tsirkin:
    "New features, performance improvements, cleanups:

    - basic polling support for vhost
    - rework virtio to optionally use DMA API, fixing it on Xen
    - balloon stats gained a new entry
    - using the new napi_alloc_skb speeds up virtio net
    - virtio blk stats can now be read while another VCPU is busy
    inflating or deflating the balloon

    plus misc cleanups in various places"

    * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
    virtio_net: replace netdev_alloc_skb_ip_align() with napi_alloc_skb()
    vhost_net: basic polling support
    vhost: introduce vhost_vq_avail_empty()
    vhost: introduce vhost_has_work()
    virtio_balloon: Allow to resize and update the balloon stats in parallel
    virtio_balloon: Use a workqueue instead of "vballoon" kthread
    virtio/s390: size of SET_IND payload
    virtio/s390: use dev_to_virtio
    vhost: rename vhost_init_used()
    vhost: rename cross-endian helpers
    virtio_blk: VIRTIO_BLK_F_WCE->VIRTIO_BLK_F_FLUSH
    vring: Use the DMA API on Xen
    virtio_pci: Use the DMA API if enabled
    virtio_mmio: Use the DMA API if enabled
    virtio: Add improved queue allocation API
    virtio_ring: Support DMA APIs
    vring: Introduce vring_use_dma_api()
    s390/dma: Allow per device dma ops
    alpha/dma: use common noop dma ops
    dma: Provide simple noop dma ops

    Linus Torvalds
     

20 Mar, 2016

1 commit

  • Pull networking updates from David Miller:
    "Highlights:

    1) Support more Realtek wireless chips, from Jes Sorenson.

    2) New BPF types for per-cpu hash and array maps, from Alexei
    Starovoitov.

    3) Make several TCP sysctls per-namespace, from Nikolay Borisov.

    4) Allow the use of SO_REUSEPORT in order to do per-thread processing
    of incoming TCP/UDP connections. The muxing can be done using a
    BPF program which hashes the incoming packet. From Craig Gallek.

    5) Add a multiplexer for TCP streams, to provide a message-based
    interface. BPF programs can be used to determine the message
    boundaries. From Tom Herbert.

    6) Add 802.1AE MACSEC support, from Sabrina Dubroca.

    7) Avoid factorial complexity when taking down an inetdev interface
    with lots of configured addresses. We were doing things like
    traversing the entire address list for each address removed, and
    flushing the entire netfilter conntrack table for every address as
    well.

    8) Add and use SKB bulk free infrastructure, from Jesper Brouer.

    9) Allow offloading u32 classifiers to hardware, and implement for
    ixgbe, from John Fastabend.

    10) Allow configuring IRQ coalescing parameters on a per-queue basis,
    from Kan Liang.

    11) Extend ethtool so that larger link mode masks can be supported.
    From David Decotigny.

    12) Introduce devlink, which can be used to configure port link types
    (ethernet vs Infiniband, etc.), port splitting, and switch device
    level attributes as a whole. From Jiri Pirko.

    13) Hardware offload support for flower classifiers, from Amir Vadai.

    14) Add "Local Checksum Offload". Basically, for a tunneled packet
    the checksum of the outer header is 'constant' (because with the
    checksum field filled into the inner protocol header, the payload
    of the outer frame checksums to 'zero'), and we can take advantage
    of that in various ways. From Edward Cree"

    * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1548 commits)
    bonding: fix bond_get_stats()
    net: bcmgenet: fix dma api length mismatch
    net/mlx4_core: Fix backward compatibility on VFs
    phy: mdio-thunder: Fix some Kconfig typos
    lan78xx: add ndo_get_stats64
    lan78xx: handle statistics counter rollover
    RDS: TCP: Remove unused constant
    RDS: TCP: Add sysctl tunables for sndbuf/rcvbuf on rds-tcp socket
    net: smc911x: convert pxa dma to dmaengine
    team: remove duplicate set of flag IFF_MULTICAST
    bonding: remove duplicate set of flag IFF_MULTICAST
    net: fix a comment typo
    ethernet: micrel: fix some error codes
    ip_tunnels, bpf: define IP_TUNNEL_OPTS_MAX and use it
    bpf, dst: add and use dst_tclassid helper
    bpf: make skb->tc_classid also readable
    net: mvneta: bm: clarify dependencies
    cls_bpf: reset class and reuse major in da
    ldmvsw: Checkpatch sunvnet.c and sunvnet_common.c
    ldmvsw: Add ldmvsw.c driver code
    ...

    Linus Torvalds
     

19 Mar, 2016

1 commit

  • Merge second patch-bomb from Andrew Morton:

    - a couple of hotfixes

    - the rest of MM

    - a new timer slack control in procfs

    - a couple of procfs fixes

    - a few misc things

    - some printk tweaks

    - lib/ updates, notably to radix-tree.

    - add my and Nick Piggin's old userspace radix-tree test harness to
    tools/testing/radix-tree/. Matthew said it was a godsend during the
    radix-tree work he did.

    - a few code-size improvements, switching to __always_inline where gcc
    screwed up.

    - partially implement character sets in sscanf

    * emailed patches from Andrew Morton : (118 commits)
    sscanf: implement basic character sets
    lib/bug.c: use common WARN helper
    param: convert some "on"/"off" users to strtobool
    lib: add "on"/"off" support to kstrtobool
    lib: update single-char callers of strtobool()
    lib: move strtobool() to kstrtobool()
    include/linux/unaligned: force inlining of byteswap operations
    include/uapi/linux/byteorder, swab: force inlining of some byteswap operations
    include/asm-generic/atomic-long.h: force inlining of some atomic_long operations
    usb: common: convert to use match_string() helper
    ide: hpt366: convert to use match_string() helper
    ata: hpt366: convert to use match_string() helper
    power: ab8500: convert to use match_string() helper
    power: charger_manager: convert to use match_string() helper
    drm/edid: convert to use match_string() helper
    pinctrl: convert to use match_string() helper
    device property: convert to use match_string() helper
    lib/string: introduce match_string() helper
    radix-tree tests: add test for radix_tree_iter_next
    radix-tree tests: add regression3 test
    ...

    Linus Torvalds
     

18 Mar, 2016

16 commits

  • Pull trivial tree updates from Jiri Kosina.

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial:
    drivers/rtc: broken link fix
    drm/i915 Fix typos in i915_gem_fence.c
    Docs: fix missing word in REPORTING-BUGS
    lib+mm: fix few spelling mistakes
    MAINTAINERS: add git URL for APM driver
    treewide: Fix typo in printk

    Linus Torvalds
     
  • Pull arm64 updates from Catalin Marinas:
    "Here are the main arm64 updates for 4.6. There are some relatively
    intrusive changes to support KASLR, the reworking of the kernel
    virtual memory layout and initial page table creation.

    Summary:

    - Initial page table creation reworked to avoid breaking large block
    mappings (huge pages) into smaller ones. The ARM architecture
    requires break-before-make in such cases to avoid TLB conflicts but
    that's not always possible on live page tables

    - Kernel virtual memory layout: the kernel image is no longer linked
    to the bottom of the linear mapping (PAGE_OFFSET) but at the bottom
    of the vmalloc space, allowing the kernel to be loaded (nearly)
    anywhere in physical RAM

    - Kernel ASLR: position independent kernel Image and modules being
    randomly mapped in the vmalloc space with the randomness
    provided by UEFI (efi_get_random_bytes() patches merged via the
    arm64 tree, acked by Matt Fleming)

    - Implement relative exception tables for arm64, required by KASLR
    (initial code for ARCH_HAS_RELATIVE_EXTABLE added to lib/extable.c
    but the actual x86 conversion is deferred to 4.7 because of the merge
    dependencies)

    - Support for the User Access Override feature of ARMv8.2: this
    allows uaccess functions (get_user etc.) to be implemented using
    LDTR/STTR instructions. Such instructions, when run by the kernel,
    perform unprivileged accesses adding an extra level of protection.
    The set_fs() macro is used to "upgrade" such instructions to
    privileged accesses via the UAO bit

    - Half-precision floating point support (part of ARMv8.2)

    - Optimisations for CPUs with or without a hardware prefetcher (using
    run-time code patching)

    - copy_page performance improvement to deal with 128 bytes at a time

    - Sanity checks on the CPU capabilities (via CPUID) to prevent
    incompatible secondary CPUs from being brought up (e.g. weird
    big.LITTLE configurations)

    - valid_user_regs() reworked for better sanity check of the
    sigcontext information (restored pstate information)

    - ACPI parking protocol implementation

    - CONFIG_DEBUG_RODATA enabled by default

    - VDSO code marked as read-only

    - DEBUG_PAGEALLOC support

    - ARCH_HAS_UBSAN_SANITIZE_ALL enabled

    - Erratum workaround for Cavium ThunderX SoC

    - set_pte_at() fix for PROT_NONE mappings

    - Code clean-ups"

    * tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (99 commits)
    arm64: kasan: Fix zero shadow mapping overriding kernel image shadow
    arm64: kasan: Use actual memory node when populating the kernel image shadow
    arm64: Update PTE_RDONLY in set_pte_at() for PROT_NONE permission
    arm64: Fix misspellings in comments.
    arm64: efi: add missing frame pointer assignment
    arm64: make mrs_s prefixing implicit in read_cpuid
    arm64: enable CONFIG_DEBUG_RODATA by default
    arm64: Rework valid_user_regs
    arm64: mm: check at build time that PAGE_OFFSET divides the VA space evenly
    arm64: KVM: Move kvm_call_hyp back to its original localtion
    arm64: mm: treat memstart_addr as a signed quantity
    arm64: mm: list kernel sections in order
    arm64: lse: deal with clobbered IP registers after branch via PLT
    arm64: mm: dump: Use VA_START directly instead of private LOWEST_ADDR
    arm64: kconfig: add submenu for 8.2 architectural features
    arm64: kernel: acpi: fix ioremap in ACPI parking protocol cpu_postboot
    arm64: Add support for Half precision floating point
    arm64: Remove fixmap include fragility
    arm64: Add workaround for Cavium erratum 27456
    arm64: mm: Mark .rodata as RO
    ...

    Linus Torvalds
     
  • Implement basic character sets for the '%[' conversion specifier.

    The '%[' conversion specifier matches a nonempty sequence of characters
    from the specified set of accepted (or with '^', rejected) characters
    between the brackets. The substring matched is to be made up of
    characters in (or not in) the set. This is useful for matching
    substrings that are delimited by something other than spaces.

    This implementation differs from its glibc counterpart in the following ways:
    (1) No support for character ranges (e.g., 'a-z' or '0-9')
    (2) The hyphen '-' is not a special character
    (3) The closing bracket ']' cannot be matched
    (4) No support (yet) for discarding matching input ('%*[')

    The bitmap code is largely based upon sample code which was provided by
    Rasmus.

    The motivation for adding character set support to sscanf originally
    stemmed from the kernel livepatching project. An ongoing patchset
    utilizes new livepatch Elf symbol and section names to store important
    metadata livepatch needs to properly apply its patches. Such metadata
    is stored in these section and symbol names as substrings delimited by
    periods '.' and commas ','. For example, a livepatch symbol name might
    look like this:

    .klp.sym.vmlinux.printk,0

    However, sscanf currently can only extract "substrings" delimited by
    whitespace using the "%s" specifier. Thus for the above symbol name,
    one cannot use sscanf() to extract substrings "vmlinux" or
    "printk", for example. A number of discussions on the livepatch
    mailing list dealing with string parsing code for extracting these '.'
    and ',' delimited substrings eventually led to the conclusion that such
    code would be completely unnecessary if the kernel sscanf() supported
    character sets. Thus only a single sscanf() call would be necessary to
    extract these substrings. In addition, such an addition to sscanf()
    could benefit other areas of the kernel that might have a similar need
    in the future.

    [akpm@linux-foundation.org: 80-col tweaks]
    Signed-off-by: Jessica Yu
    Signed-off-by: Rasmus Villemoes
    Cc: Andy Shevchenko
    Cc: Kees Cook
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jessica Yu
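
The single-call parsing described in the sscanf commit above can be tried in user space with the C library's own sscanf(). The sketch below is an illustration only: the helper name `parse_klp_sym` and the buffer sizes are hypothetical, and the format string is an assumption about how such a name might be split, not the format used by the livepatch patchset. It sticks to the subset listed in the commit message (plain negated sets, no ranges, no '%*[') and gives every '%[' an explicit field width so the destination buffers cannot overflow.

```c
#include <stdio.h>

#define OBJNAME_LEN 56   /* hypothetical buffer size, including the NUL */
#define SYMNAME_LEN 128  /* hypothetical buffer size, including the NUL */

/*
 * Split a name of the form ".klp.sym.<objname>.<symname>,<pos>".
 * objname must hold OBJNAME_LEN bytes and symname SYMNAME_LEN bytes.
 * Returns 0 on success, -1 if the name does not match the pattern.
 */
static int parse_klp_sym(const char *name, char *objname, char *symname,
                         unsigned long *pos)
{
    /* %55[^.] reads up to 55 non-'.' chars; %127[^,] up to 127 non-',' chars. */
    int n = sscanf(name, ".klp.sym.%55[^.].%127[^,],%lu",
                   objname, symname, pos);

    return n == 3 ? 0 : -1;
}
```

For the example name `.klp.sym.vmlinux.printk,0` this fills `objname` with "vmlinux", `symname` with "printk", and `pos` with 0.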
     
  • The traceoff_on_warning option doesn't have any effect on s390, powerpc,
    arm64, parisc, and sh because there are two different types of WARN
    implementations:

    1) The above mentioned architectures treat WARN() as a special case of a
    BUG() exception. They handle warnings in report_bug() in lib/bug.c.

    2) All other architectures just call warn_slowpath_*() directly. Their
    warnings are handled in warn_slowpath_common() in kernel/panic.c.

    Support traceoff_on_warning on all architectures and prevent any future
    divergence by using a single common function to emit the warning.

    Also remove the '()' from '%pS()', because the parentheses look funky:

    [ 45.607629] WARNING: at /root/warn_mod/warn_mod.c:17 .init_dummy+0x20/0x40 [warn_mod]()

    Reported-by: Chunyu Hu
    Signed-off-by: Josh Poimboeuf
    Acked-by: Heiko Carstens
    Tested-by: Prarit Bhargava
    Acked-by: Prarit Bhargava
    Acked-by: Steven Rostedt
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Josh Poimboeuf
     
  • Add support for "on" and "off" when converting to boolean.

    Signed-off-by: Kees Cook
    Cc: Amitkumar Karwar
    Cc: Andy Shevchenko
    Cc: Daniel Borkmann
    Cc: Heiko Carstens
    Cc: Joe Perches
    Cc: Kalle Valo
    Cc: Martin Schwidefsky
    Cc: Michael Ellerman
    Cc: Nishant Sarmukadam
    Cc: Rasmus Villemoes
    Cc: Steve French
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kees Cook
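
A boolean parser that accepts "on" and "off", as the commit above describes, might look like the following user-space sketch. The function name `parse_bool` is hypothetical, and the set of other accepted spellings (a leading 'y', 'n', '1', or '0') is an assumption made for the example, not something stated in the commit message.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Convert a textual boolean to bool.
 * Assumed inputs: anything starting with y/Y/1 is true, n/N/0 is false,
 * "on" is true and "off" is false (case-insensitive, first two chars only).
 * Returns 0 on success and -EINVAL for anything else.
 */
static int parse_bool(const char *s, bool *res)
{
    if (!s)
        return -EINVAL;

    switch (s[0]) {
    case 'y': case 'Y': case '1':
        *res = true;
        return 0;
    case 'n': case 'N': case '0':
        *res = false;
        return 0;
    case 'o': case 'O':
        /* "o" alone is ambiguous, so the second character decides. */
        switch (s[1]) {
        case 'n': case 'N':
            *res = true;
            return 0;
        case 'f': case 'F':
            *res = false;
            return 0;
        default:
            break;
        }
        break;
    default:
        break;
    }
    return -EINVAL;
}
```

Reading `s[1]` is safe here because `s[0]` is known to be 'o' or 'O', so the string has at least a terminating NUL at index 1.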
     
  • Create the kstrtobool_from_user() helper and move strtobool() logic into
    the new kstrtobool() (matching all the other kstrto* functions).
    Provides an inline wrapper for existing strtobool() callers.

    Signed-off-by: Kees Cook
    Cc: Joe Perches
    Cc: Andy Shevchenko
    Cc: Rasmus Villemoes
    Cc: Daniel Borkmann
    Cc: Amitkumar Karwar
    Cc: Nishant Sarmukadam
    Cc: Kalle Valo
    Cc: Steve French
    Cc: Michael Ellerman
    Cc: Heiko Carstens
    Cc: Martin Schwidefsky
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kees Cook
     
  • Occasionally we have to search for an occurrence of a string in an array
    of strings. Make a simple helper for that purpose.

    Signed-off-by: Andy Shevchenko
    Cc: "David S. Miller"
    Cc: Bartlomiej Zolnierkiewicz
    Cc: David Airlie
    Cc: David Woodhouse
    Cc: Dmitry Eremin-Solenikov
    Cc: Greg Kroah-Hartman
    Cc: Heikki Krogerus
    Cc: Linus Walleij
    Cc: Mika Westerberg
    Cc: Rafael J. Wysocki
    Cc: Sebastian Reichel
    Cc: Tejun Heo
    Cc: Rasmus Villemoes
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andy Shevchenko
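
The kind of helper the commit above describes, searching an array of strings for one string, can be sketched in user space as follows. The name `find_string`, the early stop at a NULL entry, and the -EINVAL return value are assumptions made for this example and are not taken from the commit message.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/*
 * Return the index of the first entry in array[0..n-1] equal to string,
 * or -EINVAL if there is none. A NULL entry ends the search early, so the
 * helper also works on NULL-terminated arrays shorter than n.
 */
static int find_string(const char *const *array, size_t n, const char *string)
{
    for (size_t i = 0; i < n; i++) {
        if (!array[i])
            break;
        if (strcmp(array[i], string) == 0)
            return (int)i;
    }
    return -EINVAL;
}
```

A typical caller would map a textual setting to an enum value by using the returned index directly, after checking that it is not negative.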
     
  • This is debug code which is #if 0'd out.

    Signed-off-by: Matthew Wilcox
    Cc: Johannes Weiner
    Cc: Matthew Wilcox
    Cc: "Kirill A. Shutemov"
    Cc: Ross Zwisler
    Cc: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Matthew Wilcox
     
  • With huge pages, it is convenient to have the radix tree be able to
    return an entry that covers multiple indices. Previous attempts to deal
    with the problem have involved inserting N duplicate entries, which is a
    waste of memory and leads to problems trying to handle aliased tags, or
    probing the tree multiple times to find alternative entries which might
    cover the requested index.

    This approach inserts one canonical entry into the tree for a given
    range of indices, and may also insert other entries in order to ensure
    that lookups find the canonical entry.

    This solution only tolerates inserting powers of two that are greater
    than the fanout of the tree. If we wish to expand the radix tree's
    abilities to support large-ish pages that are less than the fanout at the
    penultimate level of the tree, then we would need to add one more step
    in lookup to ensure that any sibling nodes in the final level of the
    tree are dereferenced and we return the canonical entry that they
    reference.

    Signed-off-by: Matthew Wilcox
    Cc: Johannes Weiner
    Cc: Matthew Wilcox
    Cc: "Kirill A. Shutemov"
    Cc: Ross Zwisler
    Cc: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Matthew Wilcox
     
  • When we introduce entries that can cover multiple indices, we will need
    to stop in __radix_tree_create based on the shift, not the height.
    Split out for ease of bisect.

    Signed-off-by: Matthew Wilcox
    Cc: Johannes Weiner
    Cc: Matthew Wilcox
    Cc: "Kirill A. Shutemov"
    Cc: Ross Zwisler
    Cc: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Matthew Wilcox
     
  • Set the 'indirect_ptr' bit on all the pointers to internal nodes, not
    just on the root node. This enables the following patches to support
    multi-order entries in the radix tree. This patch is split out for ease
    of bisection.

    Signed-off-by: Matthew Wilcox
    Cc: Johannes Weiner
    Cc: Matthew Wilcox
    Cc: "Kirill A. Shutemov"
    Cc: Ross Zwisler
    Cc: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Matthew Wilcox
     
  • Christian Borntraeger reported that panic_on_warn doesn't have any
    effect on s390.

    The panic_on_warn feature was introduced with 9e3961a09798 ("kernel: add
    panic_on_warn"). However, it only handled the case where
    WANT_WARN_ON_SLOWPATH is defined. This in turn is only the case for
    architectures which do not have their own __WARN_TAINT defined.

    Other architectures which do have __WARN_TAINT defined call report_bug()
    for warnings within lib/bug.c which does not call panic() in case
    panic_on_warn is set.

    Let's simply enable the panic_on_warn feature by adding the same code
    that was added to warn_slowpath_common() in panic.c.

    This enables panic_on_warn also for arm64, parisc, powerpc, s390 and sh.

    Signed-off-by: Heiko Carstens
    Reported-by: Christian Borntraeger
    Tested-by: Christian Borntraeger
    Acked-by: Prarit Bhargava
    Cc: Catalin Marinas
    Cc: Will Deacon
    Cc: "James E.J. Bottomley"
    Cc: Helge Deller
    Cc: Benjamin Herrenschmidt
    Cc: Paul Mackerras
    Tested-by: Michael Ellerman (powerpc)
    Cc: Martin Schwidefsky
    Cc: Heiko Carstens
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Heiko Carstens
     
  • Allocation of radix_tree_node objects can be easily triggered from
    userspace, so we should account them to the memory cgroup. Besides, we
    need them accounted in order to make the shadow node shrinker per memcg
    (see mm/workingset.c).

    A tricky thing about accounting radix_tree_node objects is that they are
    mostly allocated through radix_tree_preload(), so we can't just set
    SLAB_ACCOUNT for radix_tree_node_cachep - that would likely result in a
    lot of unrelated cgroups using objects from each other's caches.

    One way to overcome this would be making radix tree preloads per memcg,
    but that would probably look cumbersome and overcomplicated.

    Instead, we make radix_tree_node_alloc() first try to allocate from the
    cache with __GFP_ACCOUNT, whether or not the caller has preloaded, and
    fall back to the per cpu preloads only if that allocation fails. This
    should make most allocations accounted.
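
    The "accounted first, preload as fallback" order can be modelled in
    userspace roughly as follows. This is only a sketch: struct node,
    struct preload_pool, alloc_accounted() and node_alloc() are hypothetical
    stand-ins, malloc() replaces the slab cache, and the integer "budget" is an
    invented way to make the accounted allocation fail on demand.

```c
#include <stddef.h>
#include <stdlib.h>

#define PRELOAD_MAX 4

struct node {
    int accounted; /* 1 if charged to the (simulated) cgroup */
};

/* Hypothetical stand-in for a per-cpu pool of preallocated nodes. */
struct preload_pool {
    struct node *nodes[PRELOAD_MAX];
    int nr;
};

/* Fill the pool with up to 'count' unaccounted nodes. Returns 0 on success. */
static int preload_fill(struct preload_pool *pool, int count)
{
    while (pool->nr < count && pool->nr < PRELOAD_MAX) {
        struct node *n = malloc(sizeof(*n));

        if (!n)
            return -1;
        n->accounted = 0;
        pool->nodes[pool->nr++] = n;
    }
    return 0;
}

/* Simulated accounted allocation: fails once the budget is used up. */
static struct node *alloc_accounted(int *budget)
{
    struct node *n;

    if (*budget <= 0)
        return NULL;
    n = malloc(sizeof(*n));
    if (n) {
        n->accounted = 1;
        (*budget)--;
    }
    return n;
}

/*
 * Try the accounted allocation first, regardless of whether the pool has
 * been preloaded; only take a preloaded node if that attempt fails.
 */
static struct node *node_alloc(int *budget, struct preload_pool *pool)
{
    struct node *n = alloc_accounted(budget);

    if (n)
        return n;
    if (pool->nr > 0)
        return pool->nodes[--pool->nr];
    return NULL;
}
```

    With a budget of one and one preloaded node, the first call returns an
    accounted node, the second returns the unaccounted preloaded node, and
    the third returns NULL.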

    Signed-off-by: Vladimir Davydov
    Acked-by: Johannes Weiner
    Cc: Michal Hocko
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vladimir Davydov
     
  • Pull char/misc updates from Greg KH:
    "Here is the big char/misc driver update for 4.6-rc1.

    The majority of the patches here is hwtracing and some new mic
    drivers, but there's a lot of other driver updates as well. Full
    details in the shortlog.

    All have been in linux-next for a while with no reported issues"

    * tag 'char-misc-4.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (238 commits)
    goldfish: Fix build error of missing ioremap on UM
    nvmem: mediatek: Fix later provider initialization
    nvmem: imx-ocotp: Fix return value of imx_ocotp_read
    nvmem: Fix dependencies for !HAS_IOMEM archs
    char: genrtc: replace blacklist with whitelist
    drivers/hwtracing: make coresight-etm-perf.c explicitly non-modular
    drivers: char: mem: fix IS_ERROR_VALUE usage
    char: xillybus: Fix internal data structure initialization
    pch_phub: return -ENODATA if ROM can't be mapped
    Drivers: hv: vmbus: Support kexec on ws2012 r2 and above
    Drivers: hv: vmbus: Support handling messages on multiple CPUs
    Drivers: hv: utils: Remove util transport handler from list if registration fails
    Drivers: hv: util: Pass the channel information during the init call
    Drivers: hv: vmbus: avoid unneeded compiler optimizations in vmbus_wait_for_unload()
    Drivers: hv: vmbus: remove code duplication in message handling
    Drivers: hv: vmbus: avoid wait_for_completion() on crash
    Drivers: hv: vmbus: don't loose HVMSG_TIMER_EXPIRED messages
    misc: at24: replace memory_accessor with nvmem_device_read
    eeprom: 93xx46: extend driver to plug into the NVMEM framework
    eeprom: at25: extend driver to plug into the NVMEM framework
    ...

    Linus Torvalds
     
  • Pull driver core updates from Greg KH:
    "Just a few patches this time around for the 4.6-rc1 merge window.
    Largest is a new firmware driver, but there are some other updates to
    the driver core in here as well, the shortlog has the details.

    All have been in linux-next for a while with no reported issues"

    * tag 'driver-core-4.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
    Revert "driver-core: platform: probe of-devices only using list of compatibles"
    firmware: qemu config needs I/O ports
    firmware: qemu_fw_cfg.c: fix typo FW_CFG_DATA_OFF
    driver-core: platform: probe of-devices only using list of compatibles
    driver-core: platform: fix typo in documentation for multi-driver helper
    component: remove impossible condition
    drivers: dma-coherent: simplify dma_init_coherent_memory return value
    devicetree: update documentation for fw_cfg ARM bindings
    firmware: create directory hierarchy for sysfs fw_cfg entries
    firmware: introduce sysfs driver for QEMU's fw_cfg device
    kobject: export kset_find_obj() for module use
    driver core: bus: use to_subsys_private and to_device_private_bus
    driver core: bus: use list_for_each_entry*
    debugfs: Add stub function for debugfs_create_automount().
    kernfs: make kernfs_walk_ns() use kernfs_pr_cont_buf[]

    Linus Torvalds
     
  • Pull crypto update from Herbert Xu:
    "Here is the crypto update for 4.6:

    API:
    - Convert remaining crypto_hash users to shash or ahash, also convert
    blkcipher/ablkcipher users to skcipher.
    - Remove crypto_hash interface.
    - Remove crypto_pcomp interface.
    - Add crypto engine for async cipher drivers.
    - Add akcipher documentation.
    - Add skcipher documentation.

    Algorithms:
    - Rename crypto/crc32 to avoid name clash with lib/crc32.
    - Fix bug in keywrap where we zero the wrong pointer.

    Drivers:
    - Support T5/M5, T7/M7 SPARC CPUs in n2 hwrng driver.
    - Add PIC32 hwrng driver.
    - Support BCM6368 in bcm63xx hwrng driver.
    - Pack structs for 32-bit compat users in qat.
    - Use crypto engine in omap-aes.
    - Add support for sama5d2x SoCs in atmel-sha.
    - Make atmel-sha available again.
    - Make sahara hashing available again.
    - Make ccp hashing available again.
    - Make sha1-mb available again.
    - Add support for multiple devices in ccp.
    - Improve DMA performance in caam.
    - Add hashing support to rockchip"

    * 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (116 commits)
    crypto: qat - remove redundant arbiter configuration
    crypto: ux500 - fix checks of error code returned by devm_ioremap_resource()
    crypto: atmel - fix checks of error code returned by devm_ioremap_resource()
    crypto: qat - Change the definition of icp_qat_uof_regtype
    hwrng: exynos - use __maybe_unused to hide pm functions
    crypto: ccp - Add abstraction for device-specific calls
    crypto: ccp - CCP versioning support
    crypto: ccp - Support for multiple CCPs
    crypto: ccp - Remove check for x86 family and model
    crypto: ccp - memset request context to zero during import
    lib/mpi: use "static inline" instead of "extern inline"
    lib/mpi: avoid assembler warning
    hwrng: bcm63xx - fix non device tree compatibility
    crypto: testmgr - allow rfc3686 aes-ctr variants in fips mode.
    crypto: qat - The AE id should be less than the maximal AE number
    lib/mpi: Endianness fix
    crypto: rockchip - add hash support for crypto engine in rk3288
    crypto: xts - fix compile errors
    crypto: doc - add skcipher API documentation
    crypto: doc - update AEAD AD handling
    ...

    Linus Torvalds
     

17 Mar, 2016

1 commit

  • Merge first patch-bomb from Andrew Morton:

    - some misc things

    - ocfs2 updates

    - about half of MM

    - checkpatch updates

    - autofs4 update

    * emailed patches from Andrew Morton: (120 commits)
    autofs4: fix string.h include in auto_dev-ioctl.h
    autofs4: use pr_xxx() macros directly for logging
    autofs4: change log print macros to not insert newline
    autofs4: make autofs log prints consistent
    autofs4: fix some white space errors
    autofs4: fix invalid ioctl return in autofs4_root_ioctl_unlocked()
    autofs4: fix coding style line length in autofs4_wait()
    autofs4: fix coding style problem in autofs4_get_set_timeout()
    autofs4: coding style fixes
    autofs: show pipe inode in mount options
    kallsyms: add support for relative offsets in kallsyms address table
    kallsyms: don't overload absolute symbol type for percpu symbols
    x86: kallsyms: disable absolute percpu symbols on !SMP
    checkpatch: fix another left brace warning
    checkpatch: improve UNSPECIFIED_INT test for bare signed/unsigned uses
    checkpatch: warn on bare unsigned or signed declarations without int
    checkpatch: exclude asm volatile from complex macro check
    mm: memcontrol: drop unnecessary lru locking from mem_cgroup_migrate()
    mm: migrate: consolidate mem_cgroup_migrate() calls
    mm/compaction: speed up pageblock_pfn_to_page() when zone is contiguous
    ...

    Linus Torvalds
     

16 Mar, 2016

3 commits

  • In mm we use several kinds of flags bitfields that are sometimes printed
    for debugging purposes, or exported to userspace via sysfs. To make
    them easier to interpret independently of kernel version and config, we
    also want to dump the symbolic flag names. So far this has been done
    with repeated calls to pr_cont(), which is unreliable on SMP, and not
    usable for e.g. sysfs export.

    To get a more reliable and universal solution, this patch extends
    printk() format string for pointers to handle the page flags (%pGp),
    gfp_flags (%pGg) and vma flags (%pGv). Existing users of
    dump_flag_names() are converted and simplified.

    It would be possible to pass flags by value instead of pointer, but the
    %p format string for pointers already has extensions for various kernel
    structures, so it's a good fit, and the extra indirection in a
    non-critical path is negligible.
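
    The benefit of formatting the names in one go, rather than through
    repeated continuation prints, can be illustrated with a small userspace
    sketch. Everything here is an assumption made for illustration: the flag
    table, the format_flags() helper and the "name|name|0x..." output layout
    are invented and do not reproduce the kernel's %pG output. The flags are
    passed by pointer only to mirror the calling convention described above.

```c
#include <stddef.h>
#include <stdio.h>

struct flag_name {
    unsigned long mask;
    const char *name;
};

/* Hypothetical flag table; real kernel flag sets differ. */
static const struct flag_name demo_flag_names[] = {
    { 1UL << 0, "locked" },
    { 1UL << 1, "dirty" },
    { 1UL << 2, "lru" },
};

/*
 * Write the symbolic names of all set bits into buf, separated by '|'.
 * Bits without a name are appended as one hex value. The whole string is
 * built in a single buffer, so it can be emitted atomically or copied out.
 */
static char *format_flags(char *buf, size_t size, const unsigned long *flags)
{
    unsigned long rest = *flags;
    size_t len = 0;
    size_t i;

    if (size == 0)
        return buf;
    buf[0] = '\0';

    for (i = 0; i < sizeof(demo_flag_names) / sizeof(demo_flag_names[0]); i++) {
        if (!(rest & demo_flag_names[i].mask))
            continue;
        rest &= ~demo_flag_names[i].mask;
        if (len < size)
            len += (size_t)snprintf(buf + len, size - len, "%s%s",
                                    len ? "|" : "", demo_flag_names[i].name);
    }
    /* Any remaining unnamed bits are shown numerically. */
    if (rest && len < size)
        snprintf(buf + len, size - len, "%s%#lx", len ? "|" : "", rest);
    return buf;
}
```

    A caller would declare a buffer, for example char buf[64], and print
    the result of format_flags(buf, sizeof(buf), &flags) with a single
    output call.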

    [linux@rasmusvillemoes.dk: lots of good implementation suggestions]
    Signed-off-by: Vlastimil Babka
    Acked-by: Michal Hocko
    Cc: Steven Rostedt
    Cc: Peter Zijlstra
    Cc: Arnaldo Carvalho de Melo
    Cc: Ingo Molnar
    Cc: Rasmus Villemoes
    Cc: Joonsoo Kim
    Cc: Minchan Kim
    Cc: Sasha Levin
    Cc: "Kirill A. Shutemov"
    Cc: Mel Gorman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vlastimil Babka
     
  • Pull cpu hotplug updates from Thomas Gleixner:
    "This is the first part of the ongoing cpu hotplug rework:

    - Initial implementation of the state machine

    - Runs all online and prepare down callbacks on the plugged cpu and
    not on some random processor

    - Replaces busy loop waiting with completions

    - Adds tracepoints so the states can be followed"

    More detailed commentary on this work from an earlier email:
    "What's wrong with the current cpu hotplug infrastructure?

    - Asymmetry

    The hotplug notifier mechanism is asymmetric versus the bringup and
    teardown. This is mostly caused by the notifier mechanism.

    - Largely undocumented dependencies

    While some notifiers use explicitly defined notifier priorities,
    we have quite some notifiers which use numerical priorities to
    express dependencies without any documentation why.

    - Control processor driven

    Most of the bringup/teardown of a cpu is driven by a control
    processor. While it is understandable that preparatory steps,
    like idle thread creation, memory allocation for and initialization
    of essential facilities, need to be done before a cpu can boot,
    there is no reason why everything else must run on a control
    processor. Before this patch series, bringup looks like this:

    Control CPU                     Booting CPU

    do preparatory steps
    kick cpu into life

                                    do low level init

    sync with booting cpu           sync with control cpu

    bring the rest up

    - All or nothing approach

    There is no way to do partial bringups. That's something which is
    really desired because we waste, e.g. at boot, a substantial amount of
    time just busy waiting for the cpu to come to life. That's stupid
    as we could very well do preparatory steps and the initial IPI for
    other cpus and then go back and do the necessary low level
    synchronization with the freshly booted cpu.

    - Minimal debuggability

    Due to the notifier based design, it's impossible to switch between
    two stages of the bringup/teardown back and forth in order to test
    the correctness. So in many hotplug notifiers the cancel
    mechanisms are either nonexistent or completely untested.

    - Notifier [un]registering is tedious

    To [un]register notifiers we need to protect against hotplug at
    every callsite. There is no mechanism that bringup/teardown
    callbacks are issued on the online cpus, so every caller needs to
    do it itself. That also includes error rollback.

    What's the new design?

    The base of the new design is a symmetric state machine, where both
    the control processor and the booting/dying cpu execute a well
    defined set of states. Each state is symmetric in the end, except
    for some well defined exceptions, and the bringup/teardown can be
    stopped and reversed at almost all states.

    So the bringup of a cpu will look like this in the future:

    Control CPU                     Booting CPU

    do preparatory steps
    kick cpu into life

                                    do low level init

    sync with booting cpu           sync with control cpu

                                    bring itself up

    The synchronization step does not require the control cpu to wait.
    That mechanism can be done asynchronously via a worker or some
    other mechanism.

    The teardown can be made very similar, so that the dying cpu cleans
    up and brings itself down. Cleanups which need to be done after
    the cpu is gone, can be scheduled asynchronously as well.

    There is a long way to this, as we need to refactor the notion when a
    cpu is available. Today we set the cpu online right after it comes
    out of the low level bringup, which is not really correct.

    The proper mechanism is to set it to available, i.e. cpu local
    threads, like softirqd, hotplug thread etc. can be scheduled on that
    cpu, and once it finished all booting steps, it's set to online, so
    general workloads can be scheduled on it. The reverse happens on
    teardown. First thing to do is to forbid scheduling of general
    workloads, then teardown all the per cpu resources and finally shut it
    off completely.

    This patch series implements the basic infrastructure for this at the
    core level. This includes the following:

    - Basic state machine implementation with well defined states, so
    ordering and prioritization can be expressed.

    - Interfaces to [un]register state callbacks

    This invokes the bringup/teardown callback on all online cpus with
    the proper protection in place and [un]installs the callbacks in
    the state machine array.

    For callbacks which have no particular ordering requirement we have
    a dynamic state space, so that drivers don't have to register an
    explicit hotplug state.

    If a callback fails, the code automatically does a rollback to the
    previous state.

    - Sysfs interface to drive the state machine to a particular step.

    This is only partially functional today. Full functionality and
    therefore testability will be achieved once we have converted all
    existing hotplug notifiers over to the new scheme.

    - Run all CPU_ONLINE/DOWN_PREPARE notifiers on the booting/dying
    processor:

    Control CPU                     Booting CPU

    do preparatory steps
    kick cpu into life

                                    do low level init

    sync with booting cpu           sync with control cpu
    wait for boot
                                    bring itself up

                                    Signal completion to control cpu

    In a previous step of this work we've done a full tree mechanical
    conversion of all hotplug notifiers to the new scheme. The balance
    is a net removal of about 4000 lines of code.

    This is not included in this series, as we decided to take a
    different approach. Instead of mechanically converting everything
    over, we will do a proper overhaul of the usage sites one by one so
    they nicely fit into the symmetric callback scheme.

    I decided to do that after I looked at the ugliness of some of the
    converted sites and figured out that their hotplug mechanism is
    completely buggered anyway. So there is no point to do a
    mechanical conversion first as we need to go through the usage
    sites one by one again in order to achieve a fully symmetric and
    testable behaviour"
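
    The idea of a symmetric state machine with automatic rollback, as
    described in the message above, can be sketched in plain C. This is a
    simplified userspace model and not the kernel implementation: struct
    hp_step, hp_set_state() and the demo callbacks are hypothetical names, and
    the "state" is simply the number of steps that have completed.

```c
#include <stddef.h>

struct hp_step {
    const char *name;
    int (*startup)(void);   /* run when moving up; returns 0 on success */
    void (*teardown)(void); /* run when moving down                      */
};

/*
 * Move *state towards 'target' one step at a time. If a startup callback
 * fails, undo the steps taken during this call (in reverse order) and
 * return the error, leaving *state where it started.
 */
static int hp_set_state(const struct hp_step *steps, int nr_steps,
                        int *state, int target)
{
    int start = *state;

    if (target < 0 || target > nr_steps)
        return -1;

    while (*state < target) {
        int ret = steps[*state].startup ? steps[*state].startup() : 0;

        if (ret) {
            /* Roll back to the state we started from. */
            while (*state > start) {
                (*state)--;
                if (steps[*state].teardown)
                    steps[*state].teardown();
            }
            return ret;
        }
        (*state)++;
    }

    while (*state > target) {
        (*state)--;
        if (steps[*state].teardown)
            steps[*state].teardown();
    }
    return 0;
}

/* Demo callbacks: count how many steps are currently "up". */
static int demo_depth;
static int demo_up(void) { demo_depth++; return 0; }
static void demo_down(void) { demo_depth--; }
static int demo_fail(void) { return -5; }

static const struct hp_step demo_steps[] = {
    { "prepare", demo_up,   demo_down },
    { "bringup", demo_up,   demo_down },
    { "online",  demo_fail, NULL      },
};
```

    Asking for state 3 with demo_steps runs "prepare" and "bringup", fails
    in "online", and then tears the first two steps down again, so the caller
    sees the error and the original state. Driving the target up and down
    through intermediate states is what makes each step testable on its own.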

    * 'smp-hotplug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (23 commits)
    cpu/hotplug: Document states better
    cpu/hotplug: Fix smpboot thread ordering
    cpu/hotplug: Remove redundant state check
    cpu/hotplug: Plug death reporting race
    rcu: Make CPU_DYING_IDLE an explicit call
    cpu/hotplug: Make wait for dead cpu completion based
    cpu/hotplug: Let upcoming cpu bring itself fully up
    arch/hotplug: Call into idle with a proper state
    cpu/hotplug: Move online calls to hotplugged cpu
    cpu/hotplug: Create hotplug threads
    cpu/hotplug: Split out the state walk into functions
    cpu/hotplug: Unpark smpboot threads from the state machine
    cpu/hotplug: Move scheduler cpu_online notifier to hotplug core
    cpu/hotplug: Implement setup/removal interface
    cpu/hotplug: Make target state writeable
    cpu/hotplug: Add sysfs state interface
    cpu/hotplug: Hand in target state to _cpu_up/down
    cpu/hotplug: Convert the hotplugged cpu work to a state machine
    cpu/hotplug: Convert to a state machine for the control processor
    cpu/hotplug: Add tracepoints
    ...

    Linus Torvalds
     
  • Pull x86 asm updates from Ingo Molnar:
    "This is another big update. Main changes are:

    - lots of x86 system call (and other traps/exceptions) entry code
    enhancements. In particular the complex parts of the 64-bit entry
    code have been migrated to C code as well, and a number of dusty
    corners have been refreshed. (Andy Lutomirski)

    - vDSO special mapping robustification and general cleanups (Andy
    Lutomirski)

    - cpufeature refactoring, cleanups and speedups (Borislav Petkov)

    - lots of other changes ..."

    * 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (64 commits)
    x86/cpufeature: Enable new AVX-512 features
    x86/entry/traps: Show unhandled signal for i386 in do_trap()
    x86/entry: Call enter_from_user_mode() with IRQs off
    x86/entry/32: Change INT80 to be an interrupt gate
    x86/entry: Improve system call entry comments
    x86/entry: Remove TIF_SINGLESTEP entry work
    x86/entry/32: Add and check a stack canary for the SYSENTER stack
    x86/entry/32: Simplify and fix up the SYSENTER stack #DB/NMI fixup
    x86/entry: Only allocate space for tss_struct::SYSENTER_stack if needed
    x86/entry: Vastly simplify SYSENTER TF (single-step) handling
    x86/entry/traps: Clear DR6 early in do_debug() and improve the comment
    x86/entry/traps: Clear TIF_BLOCKSTEP on all debug exceptions
    x86/entry/32: Restore FLAGS on SYSEXIT
    x86/entry/32: Filter NT and speed up AC filtering in SYSENTER
    x86/entry/compat: In SYSENTER, sink AC clearing below the existing FLAGS test
    selftests/x86: In syscall_nt, test NT|TF as well
    x86/asm-offsets: Remove PARAVIRT_enabled
    x86/entry/32: Introduce and use X86_BUG_ESPFIX instead of paravirt_enabled
    uprobes: __create_xol_area() must nullify xol_mapping.fault
    x86/cpufeature: Create a new synthetic cpu capability for machine check recovery
    ...

    Linus Torvalds
     

15 Mar, 2016

1 commit

  • Pull perf updates from Ingo Molnar:
    "Main kernel side changes:

    - Big reorganization of the x86 perf support code. The old code grew
    organically deep inside arch/x86/kernel/cpu/perf* and its naming
    became somewhat messy.

    The new location is under arch/x86/events/, using the following
    cleaner hierarchy of source code files:

    perf/x86: Move perf_event.c .................. => x86/events/core.c
    perf/x86: Move perf_event_amd.c .............. => x86/events/amd/core.c
    perf/x86: Move perf_event_amd_ibs.c .......... => x86/events/amd/ibs.c
    perf/x86: Move perf_event_amd_iommu.[ch] ..... => x86/events/amd/iommu.[ch]
    perf/x86: Move perf_event_amd_uncore.c ....... => x86/events/amd/uncore.c
    perf/x86: Move perf_event_intel_bts.c ........ => x86/events/intel/bts.c
    perf/x86: Move perf_event_intel.c ............ => x86/events/intel/core.c
    perf/x86: Move perf_event_intel_cqm.c ........ => x86/events/intel/cqm.c
    perf/x86: Move perf_event_intel_cstate.c ..... => x86/events/intel/cstate.c
    perf/x86: Move perf_event_intel_ds.c ......... => x86/events/intel/ds.c
    perf/x86: Move perf_event_intel_lbr.c ........ => x86/events/intel/lbr.c
    perf/x86: Move perf_event_intel_pt.[ch] ...... => x86/events/intel/pt.[ch]
    perf/x86: Move perf_event_intel_rapl.c ....... => x86/events/intel/rapl.c
    perf/x86: Move perf_event_intel_uncore.[ch] .. => x86/events/intel/uncore.[ch]
    perf/x86: Move perf_event_intel_uncore_nhmex.c => x86/events/intel/uncore_nhmex.c
    perf/x86: Move perf_event_intel_uncore_snb.c => x86/events/intel/uncore_snb.c
    perf/x86: Move perf_event_intel_uncore_snbep.c => x86/events/intel/uncore_snbep.c
    perf/x86: Move perf_event_knc.c .............. => x86/events/intel/knc.c
    perf/x86: Move perf_event_p4.c ............... => x86/events/intel/p4.c
    perf/x86: Move perf_event_p6.c ............... => x86/events/intel/p6.c
    perf/x86: Move perf_event_msr.c .............. => x86/events/msr.c

    (Borislav Petkov)

    - Update various x86 PMU constraint and hw support details (Stephane
    Eranian)

    - Optimize kprobes for BPF execution (Martin KaFai Lau)

    - Rewrite, refactor and fix the Intel uncore PMU driver code (Thomas
    Gleixner)

    - Rewrite, refactor and fix the Intel RAPL PMU code (Thomas Gleixner)

    - Various fixes and smaller cleanups.

    There are lots of perf tooling updates as well. A few highlights:

    perf report/top:

    - Hierarchy histogram mode for 'perf top' and 'perf report',
    showing multiple levels, one per --sort entry: (Namhyung Kim)

    On a mostly idle system:

    # perf top --hierarchy -s comm,dso

    Then expand some levels and use 'P' to take a snapshot:

    # cat perf.hist.0
    -  92.32%  perf
          58.20%  perf
          22.29%  libc-2.22.so
           5.97%  [kernel]
           4.18%  libelf-0.165.so
           1.69%  [unknown]
    -   4.71%  qemu-system-x86
           3.10%  [kernel]
           1.60%  qemu-system-x86_64 (deleted)
    +   2.97%  swapper
    #

    - Add 'L' hotkey to dynamically set the percent threshold for
    histogram entries and callchains, i.e. dynamically do what the
    --percent-limit command line option to 'top' and 'report' does.
    (Namhyung Kim)

    perf mem:

    - Allow specifying events via -e in 'perf mem record', also listing
    what events can be specified via 'perf mem record -e list' (Jiri
    Olsa)

    perf record:

    - Add 'perf record' --all-user/--all-kernel options, so that one
    can tell that all the events in the command line should be
    restricted to the user or kernel levels (Jiri Olsa), i.e.:

    perf record -e cycles:u,instructions:u

    is equivalent to:

    perf record --all-user -e cycles,instructions

    - Make 'perf record' collect CPU cache info in the perf.data file header:

    $ perf record usleep 1
    [ perf record: Woken up 1 times to write data ]
    [ perf record: Captured and wrote 0.017 MB perf.data (7 samples) ]
    $ perf report --header-only -I | tail -10 | head -8
    # CPU cache info:
    # L1 Data 32K [0-1]
    # L1 Instruction 32K [0-1]
    # L1 Data 32K [2-3]
    # L1 Instruction 32K [2-3]
    # L2 Unified 256K [0-1]
    # L2 Unified 256K [2-3]
    # L3 Unified 4096K [0-3]

    Will be used in 'perf c2c' and eventually in 'perf diff' to
    allow, for instance, running the same workload on multiple
    machines and then having 'diff' show the hardware differences.
    (Jiri Olsa)

    - Improved support for Java, using the JVMTI agent library to do
    jitdumps that then will be inserted in synthesized
    PERF_RECORD_MMAP2 events via 'perf inject' pointed to synthesized
    ELF files stored in ~/.debug and keyed with build-ids, to allow
    symbol resolution and even annotation with source line info, see
    the changeset comments to see how to use it (Stephane Eranian)

    perf script/trace:

    - Decode data_src values (e.g. perf.data files generated by 'perf
    mem record') in 'perf script': (Jiri Olsa)

    # perf script
    perf 693 [1] 4.088652: 1 cpu/mem-loads,ldlat=30/P: ffff88007d0b0f40 68100142 L1 hit|SNP None|TLB L1 or L2 hit|LCK No
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    - Improve support for 'data_src', 'weight' and 'addr' fields in
    'perf script' (Jiri Olsa)

    - Handle empty print fmts in 'perf script -s' i.e. when running
    python or perl scripts (Taeung Song)

    perf stat:

    - 'perf stat' now shows shadow metrics (insn per cycle, etc.) in
    interval mode too. E.g.:

    # perf stat -I 1000 -e instructions,cycles sleep 1
    # time counts unit events
    1.000215928 519,620 instructions # 0.69 insn per cycle
    1.000215928 752,003 cycles

    - Port 'perf kvm stat' to PowerPC (Hemant Kumar)

    - Implement CSV metrics output in 'perf stat' (Andi Kleen)

    perf BPF support:

    - Support converting data from bpf events in 'perf data' (Wang Nan)

    - Print bpf-output events in 'perf script': (Wang Nan).

    # perf record -e bpf-output/no-inherit,name=evt/ -e ./test_bpf_output_3.c/map:channel.event=evt/ usleep 1000
    # perf script
    usleep 4882 21384.532523: evt: ffffffff810e97d1 sys_nanosleep ([kernel.kallsyms])
    BPF output: 0000: 52 61 69 73 65 20 61 20 Raise a
    0008: 42 50 46 20 65 76 65 6e BPF even
    0010: 74 21 00 00 t!..
    BPF string: "Raise a BPF event!"
    #

    - Add API to set values of map entries in a BPF object, be it
    individual map slots or ranges (Wang Nan)

    - Introduce support for the 'bpf-output' event (Wang Nan)

    - Add glue to read perf events in a BPF program (Wang Nan)

    - Improve support for bpf-output events in 'perf trace' (Wang Nan)

    ... and tons of other changes as well - see the shortlog and git log
    for details!"

    * 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (342 commits)
    perf stat: Add --metric-only support for -A
    perf stat: Implement --metric-only mode
    perf stat: Document CSV format in manpage
    perf hists browser: Check sort keys before hot key actions
    perf hists browser: Allow thread filtering for comm sort key
    perf tools: Add sort__has_comm variable
    perf tools: Recalc total periods using top-level entries in hierarchy
    perf tools: Remove nr_sort_keys field
    perf hists browser: Cleanup hist_browser__fprintf_hierarchy_entry()
    perf tools: Remove hist_entry->fmt field
    perf tools: Fix command line filters in hierarchy mode
    perf tools: Add more sort entry check functions
    perf tools: Fix hist_entry__filter() for hierarchy
    perf jitdump: Build only on supported archs
    tools lib traceevent: Add '~' operation within arg_num_eval()
    perf tools: Omit unnecessary cast in perf_pmu__parse_scale
    perf tools: Pass perf_hpp_list all the way through setup_sort_list
    perf tools: Fix perf script python database export crash
    perf jitdump: DWARF is also needed
    perf bench mem: Prepare the x86-64 build for upstream memcpy_mcsafe() changes
    ...

    Linus Torvalds
     

14 Mar, 2016

1 commit

  • This patch updates all instances of csum_tcpudp_magic and
    csum_tcpudp_nofold to reflect the types that are usually used as the source
    inputs. For example the protocol field is populated based on nexthdr which
    is actually an unsigned 8 bit value. The length is usually populated based
    on skb->len which is an unsigned integer.

    This addresses an issue in which the IPv6 function csum_ipv6_magic was
    generating a checksum using the full 32b of skb->len while
    csum_tcpudp_magic was only using the lower 16 bits. As a result we could
    run into issues when attempting to adjust the checksum as there was no
    protocol agnostic way to update it.

    With this change the value is still truncated as many architectures use
    "(len + proto) << 8", however this truncation only occurs for values
    greater than 16776960 in length and as such is unlikely to occur as we stop
    the inner headers at ~64K in size.

    I did have to make a few minor changes in the arm, mn10300, nios2, and
    score versions of the function in order to support these changes as they
    were either using things such as an OR to combine the protocol and length,
    or were using ntohs to convert the length which would have truncated the
    value.

    I also updated a few spots in terms of whitespace and type differences for
    the addresses. Most of this was just to make sure all of the definitions
    were in sync going forward.
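
    The 16776960 threshold mentioned above follows from 32-bit arithmetic:
    16776960 is 2^24 - 256, so adding any 8-bit protocol value keeps the sum
    below 2^24, and shifting left by 8 still fits in 32 bits. The sketch below
    checks this by comparing the 32-bit expression with the same expression in
    64 bits. The helper names are hypothetical, and the code only models the
    quoted "(len + proto) << 8" expression, not any architecture's checksum
    routine.

```c
#include <stdint.h>

/* 2^24 - 256: largest length that is safe for every 8-bit protocol value. */
#define DEMO_MAX_SAFE_LEN 16776960u

/* The quoted "(len + proto) << 8" form, evaluated in 32 bits. */
static uint32_t len_proto_32(uint32_t len, uint8_t proto)
{
    return (len + proto) << 8;
}

/* The same expression evaluated in 64 bits, so no high bits are lost. */
static uint64_t len_proto_64(uint32_t len, uint8_t proto)
{
    return ((uint64_t)len + proto) << 8;
}

/* Returns 1 if the 32-bit form drops high bits for this input, else 0. */
static int is_truncated(uint32_t len, uint8_t proto)
{
    return (uint64_t)len_proto_32(len, proto) != len_proto_64(len, proto);
}
```

    At len = 16776960 and proto = 255 the sum is 0xffffff and the shifted
    value 0xffffff00 still fits. One byte more, with the same protocol value,
    gives 0x1000000, which overflows 32 bits once shifted. For smaller
    protocol numbers the first truncated length is correspondingly higher.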

    Signed-off-by: Alexander Duyck
    Signed-off-by: David S. Miller

    Alexander Duyck
     

10 Mar, 2016

2 commits

  • Signed-off-by: Ingo Molnar

    Ingo Molnar
     
  • Given we have uninitialized list_heads being passed to list_add() it
    will always be the case that those uninitialized values randomly trigger
    the poison value. Especially since a list_add() operation will seed the
    stack with the poison value for later stack allocations to trip over.

    For example, see these two false positive reports:

    list_add attempted on force-poisoned entry
    WARNING: at lib/list_debug.c:34
    [..]
    NIP [c00000000043c390] __list_add+0xb0/0x150
    LR [c00000000043c38c] __list_add+0xac/0x150
    Call Trace:
    __list_add+0xac/0x150 (unreliable)
    __down+0x4c/0xf8
    down+0x68/0x70
    xfs_buf_lock+0x4c/0x150 [xfs]

    list_add attempted on force-poisoned entry(0000000000000500),
    new->next == d0000000059ecdb0, new->prev == 0000000000000500
    WARNING: at lib/list_debug.c:33
    [..]
    NIP [c00000000042db78] __list_add+0xa8/0x140
    LR [c00000000042db74] __list_add+0xa4/0x140
    Call Trace:
    __list_add+0xa4/0x140 (unreliable)
    rwsem_down_read_failed+0x6c/0x1a0
    down_read+0x58/0x60
    xfs_log_commit_cil+0x7c/0x600 [xfs]

    Fixes: commit 5c2c2587b132 ("mm, dax, pmem: introduce {get|put}_dev_pagemap() for dax-gup")
    Signed-off-by: Dan Williams
    Reported-by: Eryu Guan
    Tested-by: Eryu Guan
    Cc: Ross Zwisler
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dan Williams
     

02 Mar, 2016

3 commits

  • We are going to require dma_ops for several common drivers, even for
    systems that do have an identity mapping. Let's provide some minimal
    no-op dma_ops that can be used for that purpose.

    Signed-off-by: Christian Borntraeger
    Reviewed-by: Joerg Roedel
    Signed-off-by: Andy Lutomirski
    Signed-off-by: Michael S. Tsirkin

    Christian Borntraeger
     
  • We want the fixes in here, and others are sending us pull requests based
    on this kernel tree.

    Signed-off-by: Greg Kroah-Hartman

    Greg Kroah-Hartman
     
  • Make it possible to write a target state to the per cpu state file, so we can
    switch between states.

    Signed-off-by: Thomas Gleixner
    Cc: linux-arch@vger.kernel.org
    Cc: Rik van Riel
    Cc: Rafael Wysocki
    Cc: "Srivatsa S. Bhat"
    Cc: Peter Zijlstra
    Cc: Arjan van de Ven
    Cc: Sebastian Siewior
    Cc: Rusty Russell
    Cc: Steven Rostedt
    Cc: Oleg Nesterov
    Cc: Tejun Heo
    Cc: Andrew Morton
    Cc: Paul McKenney
    Cc: Linus Torvalds
    Cc: Paul Turner
    Link: http://lkml.kernel.org/r/20160226182341.022814799@linutronix.de
    Signed-off-by: Thomas Gleixner

    Thomas Gleixner
     

29 Feb, 2016

2 commits