16 Mar, 2016

1 commit

  • In mm we use several kinds of flags bitfields that are sometimes printed
    for debugging purposes, or exported to userspace via sysfs. To make
    them easier to interpret independently of kernel version and config, we
    also want to dump the symbolic flag names. So far this has been done
    with repeated calls to pr_cont(), which is unreliable on SMP, and not
    usable for e.g. sysfs export.

    To get a more reliable and universal solution, this patch extends
    printk() format string for pointers to handle the page flags (%pGp),
    gfp_flags (%pGg) and vma flags (%pGv). Existing users of
    dump_flag_names() are converted and simplified.

    It would be possible to pass flags by value instead of pointer, but the
    %p format string for pointers already has extensions for various kernel
    structures, so it's a good fit, and the extra indirection in a
    non-critical path is negligible.
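
The idea of turning a flags bitfield into symbolic names in one pass can be sketched in userspace C. This is only an illustration: the table `demo_flag_names`, its three flag names and the function `format_flag_names()` are hypothetical and are not the kernel's page, gfp or vma flag tables, nor the code behind %pGp, %pGg or %pGv. The point it shows is that the whole string is built in one buffer, so it can be emitted with a single print call instead of a chain of pr_cont() calls.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical table entry: a bit mask and its symbolic name. */
struct flag_name {
    unsigned long mask;
    const char *name;
};

/* Illustrative flags only; not the kernel's real flag definitions. */
static const struct flag_name demo_flag_names[] = {
    { 1UL << 0, "locked" },
    { 1UL << 1, "dirty" },
    { 1UL << 2, "uptodate" },
};

/*
 * Write the names of all known bits set in 'flags' into 'buf', separated
 * by '|'. Bits without a name are appended once, in hex.
 * Returns the number of characters produced; a value >= size means the
 * output was truncated.
 */
static size_t format_flag_names(char *buf, size_t size, unsigned long flags)
{
    size_t len = 0;
    size_t i;
    int n;

    if (size == 0)
        return 0;
    buf[0] = '\0';

    for (i = 0; i < sizeof(demo_flag_names) / sizeof(demo_flag_names[0]); i++) {
        if (!(flags & demo_flag_names[i].mask) || len >= size)
            continue;
        n = snprintf(buf + len, size - len, "%s%s",
                     len ? "|" : "", demo_flag_names[i].name);
        len += (size_t)n;
        flags &= ~demo_flag_names[i].mask;   /* this bit has been named */
    }

    /* Whatever is left has no symbolic name; show it as a hex mask. */
    if (flags && len < size) {
        n = snprintf(buf + len, size - len, "%s%#lx", len ? "|" : "", flags);
        len += (size_t)n;
    }
    return len;
}
```

Because the result is an ordinary string, the same helper could serve both a debug print and a sysfs-style export in this sketch.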

    [linux@rasmusvillemoes.dk: lots of good implementation suggestions]
    Signed-off-by: Vlastimil Babka
    Acked-by: Michal Hocko
    Cc: Steven Rostedt
    Cc: Peter Zijlstra
    Cc: Arnaldo Carvalho de Melo
    Cc: Ingo Molnar
    Cc: Rasmus Villemoes
    Cc: Joonsoo Kim
    Cc: Minchan Kim
    Cc: Sasha Levin
    Cc: "Kirill A. Shutemov"
    Cc: Mel Gorman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vlastimil Babka
     

15 Mar, 2016

1 commit

  • Pull perf updates from Ingo Molnar:
    "Main kernel side changes:

    - Big reorganization of the x86 perf support code. The old code grew
    organically deep inside arch/x86/kernel/cpu/perf* and its naming
    became somewhat messy.

    The new location is under arch/x86/events/, using the following
    cleaner hierarchy of source code files:

    perf/x86: Move perf_event.c .................. => x86/events/core.c
    perf/x86: Move perf_event_amd.c .............. => x86/events/amd/core.c
    perf/x86: Move perf_event_amd_ibs.c .......... => x86/events/amd/ibs.c
    perf/x86: Move perf_event_amd_iommu.[ch] ..... => x86/events/amd/iommu.[ch]
    perf/x86: Move perf_event_amd_uncore.c ....... => x86/events/amd/uncore.c
    perf/x86: Move perf_event_intel_bts.c ........ => x86/events/intel/bts.c
    perf/x86: Move perf_event_intel.c ............ => x86/events/intel/core.c
    perf/x86: Move perf_event_intel_cqm.c ........ => x86/events/intel/cqm.c
    perf/x86: Move perf_event_intel_cstate.c ..... => x86/events/intel/cstate.c
    perf/x86: Move perf_event_intel_ds.c ......... => x86/events/intel/ds.c
    perf/x86: Move perf_event_intel_lbr.c ........ => x86/events/intel/lbr.c
    perf/x86: Move perf_event_intel_pt.[ch] ...... => x86/events/intel/pt.[ch]
    perf/x86: Move perf_event_intel_rapl.c ....... => x86/events/intel/rapl.c
    perf/x86: Move perf_event_intel_uncore.[ch] .. => x86/events/intel/uncore.[ch]
    perf/x86: Move perf_event_intel_uncore_nhmex.c => x86/events/intel/uncore_nmhex.c
    perf/x86: Move perf_event_intel_uncore_snb.c => x86/events/intel/uncore_snb.c
    perf/x86: Move perf_event_intel_uncore_snbep.c => x86/events/intel/uncore_snbep.c
    perf/x86: Move perf_event_knc.c .............. => x86/events/intel/knc.c
    perf/x86: Move perf_event_p4.c ............... => x86/events/intel/p4.c
    perf/x86: Move perf_event_p6.c ............... => x86/events/intel/p6.c
    perf/x86: Move perf_event_msr.c .............. => x86/events/msr.c

    (Borislav Petkov)

    - Update various x86 PMU constraint and hw support details (Stephane
    Eranian)

    - Optimize kprobes for BPF execution (Martin KaFai Lau)

    - Rewrite, refactor and fix the Intel uncore PMU driver code (Thomas
    Gleixner)

    - Rewrite, refactor and fix the Intel RAPL PMU code (Thomas Gleixner)

    - Various fixes and smaller cleanups.

    There are lots of perf tooling updates as well. A few highlights:

    perf report/top:

    - Hierarchy histogram mode for 'perf top' and 'perf report',
    showing multiple levels, one per --sort entry: (Namhyung Kim)

    On a mostly idle system:

    # perf top --hierarchy -s comm,dso

    Then expand some levels and use 'P' to take a snapshot:

    # cat perf.hist.0
    - 92.32% perf
    58.20% perf
    22.29% libc-2.22.so
    5.97% [kernel]
    4.18% libelf-0.165.so
    1.69% [unknown]
    - 4.71% qemu-system-x86
    3.10% [kernel]
    1.60% qemu-system-x86_64 (deleted)
    + 2.97% swapper
    #

    - Add 'L' hotkey to dynamically set the percent threshold for
    histogram entries and callchains, i.e. dynamically do what the
    --percent-limit command line option to 'top' and 'report' does.
    (Namhyung Kim)

    perf mem:

    - Allow specifying events via -e in 'perf mem record', also listing
    what events can be specified via 'perf mem record -e list' (Jiri
    Olsa)

    perf record:

    - Add 'perf record' --all-user/--all-kernel options, so that one
    can tell that all the events in the command line should be
    restricted to the user or kernel levels (Jiri Olsa), i.e.:

    perf record -e cycles:u,instructions:u

    is equivalent to:

    perf record --all-user -e cycles,instructions

    - Make 'perf record' collect CPU cache info in the perf.data file header:

    $ perf record usleep 1
    [ perf record: Woken up 1 times to write data ]
    [ perf record: Captured and wrote 0.017 MB perf.data (7 samples) ]
    $ perf report --header-only -I | tail -10 | head -8
    # CPU cache info:
    # L1 Data 32K [0-1]
    # L1 Instruction 32K [0-1]
    # L1 Data 32K [2-3]
    # L1 Instruction 32K [2-3]
    # L2 Unified 256K [0-1]
    # L2 Unified 256K [2-3]
    # L3 Unified 4096K [0-3]

    This will be used in 'perf c2c' and eventually in 'perf diff', for
    instance to run the same workload on multiple machines and then have
    'diff' show the hardware differences.
    (Jiri Olsa)

    - Improved support for Java, using the JVMTI agent library to do
    jitdumps that then will be inserted in synthesized
    PERF_RECORD_MMAP2 events via 'perf inject' pointed to synthesized
    ELF files stored in ~/.debug and keyed with build-ids, to allow
    symbol resolution and even annotation with source line info, see
    the changeset comments to see how to use it (Stephane Eranian)

    perf script/trace:

    - Decode data_src values (e.g. perf.data files generated by 'perf
    mem record') in 'perf script': (Jiri Olsa)

    # perf script
    perf 693 [1] 4.088652: 1 cpu/mem-loads,ldlat=30/P: ffff88007d0b0f40 68100142 L1 hit|SNP None|TLB L1 or L2 hit|LCK No
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    - Improve support to 'data_src', 'weight' and 'addr' fields in
    'perf script' (Jiri Olsa)

    - Handle empty print fmts in 'perf script -s' i.e. when running
    python or perl scripts (Taeung Song)

    perf stat:

    - 'perf stat' now shows shadow metrics (insn per cycle, etc) in
    interval mode too. E.g:

    # perf stat -I 1000 -e instructions,cycles sleep 1
    # time counts unit events
    1.000215928 519,620 instructions # 0.69 insn per cycle
    1.000215928 752,003 cycles

    - Port 'perf kvm stat' to PowerPC (Hemant Kumar)

    - Implement CSV metrics output in 'perf stat' (Andi Kleen)

    perf BPF support:

    - Support converting data from bpf events in 'perf data' (Wang Nan)

    - Print bpf-output events in 'perf script': (Wang Nan).

    # perf record -e bpf-output/no-inherit,name=evt/ -e ./test_bpf_output_3.c/map:channel.event=evt/ usleep 1000
    # perf script
    usleep 4882 21384.532523: evt: ffffffff810e97d1 sys_nanosleep ([kernel.kallsyms])
    BPF output: 0000: 52 61 69 73 65 20 61 20 Raise a
    0008: 42 50 46 20 65 76 65 6e BPF even
    0010: 74 21 00 00 t!..
    BPF string: "Raise a BPF event!"
    #

    - Add API to set values of map entries in a BPF object, be it
    individual map slots or ranges (Wang Nan)

    - Introduce support for the 'bpf-output' event (Wang Nan)

    - Add glue to read perf events in a BPF program (Wang Nan)

    - Improve support for bpf-output events in 'perf trace' (Wang Nan)

    ... and tons of other changes as well - see the shortlog and git log
    for details!"

    * 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (342 commits)
    perf stat: Add --metric-only support for -A
    perf stat: Implement --metric-only mode
    perf stat: Document CSV format in manpage
    perf hists browser: Check sort keys before hot key actions
    perf hists browser: Allow thread filtering for comm sort key
    perf tools: Add sort__has_comm variable
    perf tools: Recalc total periods using top-level entries in hierarchy
    perf tools: Remove nr_sort_keys field
    perf hists browser: Cleanup hist_browser__fprintf_hierarchy_entry()
    perf tools: Remove hist_entry->fmt field
    perf tools: Fix command line filters in hierarchy mode
    perf tools: Add more sort entry check functions
    perf tools: Fix hist_entry__filter() for hierarchy
    perf jitdump: Build only on supported archs
    tools lib traceevent: Add '~' operation within arg_num_eval()
    perf tools: Omit unnecessary cast in perf_pmu__parse_scale
    perf tools: Pass perf_hpp_list all the way through setup_sort_list
    perf tools: Fix perf script python database export crash
    perf jitdump: DWARF is also needed
    perf bench mem: Prepare the x86-64 build for upstream memcpy_mcsafe() changes
    ...

    Linus Torvalds
     

10 Mar, 2016

2 commits

  • Signed-off-by: Ingo Molnar

    Ingo Molnar
     
  • Given we have uninitialized list_heads being passed to list_add() it
    will always be the case that those uninitialized values randomly trigger
    the poison value. Especially since a list_add() operation will seed the
    stack with the poison value for later stack allocations to trip over.

    For example, see these two false positive reports:

    list_add attempted on force-poisoned entry
    WARNING: at lib/list_debug.c:34
    [..]
    NIP [c00000000043c390] __list_add+0xb0/0x150
    LR [c00000000043c38c] __list_add+0xac/0x150
    Call Trace:
    __list_add+0xac/0x150 (unreliable)
    __down+0x4c/0xf8
    down+0x68/0x70
    xfs_buf_lock+0x4c/0x150 [xfs]

    list_add attempted on force-poisoned entry(0000000000000500),
    new->next == d0000000059ecdb0, new->prev == 0000000000000500
    WARNING: at lib/list_debug.c:33
    [..]
    NIP [c00000000042db78] __list_add+0xa8/0x140
    LR [c00000000042db74] __list_add+0xa4/0x140
    Call Trace:
    __list_add+0xa4/0x140 (unreliable)
    rwsem_down_read_failed+0x6c/0x1a0
    down_read+0x58/0x60
    xfs_log_commit_cil+0x7c/0x600 [xfs]

    Fixes: commit 5c2c2587b132 ("mm, dax, pmem: introduce {get|put}_dev_pagemap() for dax-gup")
    Signed-off-by: Dan Williams
    Reported-by: Eryu Guan
    Tested-by: Eryu Guan
    Cc: Ross Zwisler
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dan Williams
     

29 Feb, 2016

2 commits

  • Signed-off-by: Ingo Molnar

    Ingo Molnar
     
  • Almost every cpumask function is exported, just not the one I need to make the
    Intel uncore driver modular.

    Signed-off-by: Thomas Gleixner
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Andi Kleen
    Cc: Arnaldo Carvalho de Melo
    Cc: Borislav Petkov
    Cc: David S. Miller
    Cc: Harish Chegondi
    Cc: Jacob Pan
    Cc: Jiri Olsa
    Cc: Kan Liang
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Rusty Russell
    Cc: Stephane Eranian
    Cc: Vince Weaver
    Cc: linux-kernel@vger.kernel.org
    Link: http://lkml.kernel.org/r/20160222221011.878299859@linutronix.de
    Signed-off-by: Ingo Molnar

    Thomas Gleixner
     

16 Feb, 2016

2 commits

  • The comparisons should be >= since 0x800 and 0x80 require an additional bit
    to store.

    For the 3 byte case, the existing shift would drop off 2 more bits than
    intended.

    For the 2 byte case, there should be 5 bits in byte 1, and 6 bits in
    byte 2.
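
The boundaries described above follow from the standard UTF-8 layout, and a minimal userspace sketch makes them concrete. The function name `demo_ucs2_char_to_utf8()` is hypothetical and this is not the kernel's implementation; it only shows why the comparisons are >= 0x80 and >= 0x800, and how the payload bits split as 5+6 for two bytes and 4+6+6 for three bytes.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Encode one UCS-2 code unit as UTF-8 into 'out', which must have room
 * for at least 3 bytes. Returns the number of bytes written.
 */
static size_t demo_ucs2_char_to_utf8(uint16_t c, unsigned char *out)
{
    if (c >= 0x800) {
        /* 3 bytes: 4 payload bits, then 6, then 6 (16 bits total). */
        out[0] = (unsigned char)(0xe0 | (c >> 12));
        out[1] = (unsigned char)(0x80 | ((c >> 6) & 0x3f));
        out[2] = (unsigned char)(0x80 | (c & 0x3f));
        return 3;
    }
    if (c >= 0x80) {
        /* 2 bytes: 5 payload bits in byte 1, 6 in byte 2 (11 bits total). */
        out[0] = (unsigned char)(0xc0 | (c >> 6));
        out[1] = (unsigned char)(0x80 | (c & 0x3f));
        return 2;
    }
    /* 1 byte: plain 7-bit ASCII. */
    out[0] = (unsigned char)c;
    return 1;
}
```

0x7f is the largest value that fits in one byte and 0x7ff the largest that fits in two, so 0x80 and 0x800 are the first values that need the extra byte.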

    Signed-off-by: Jason Andryuk
    Reviewed-by: Laszlo Ersek
    Cc: Peter Jones
    Cc: Matthew Garrett
    Cc: "Lee, Chun-Yi"
    Signed-off-by: Matt Fleming

    Jason Andryuk
     
  • Pull EFI fixes from Matt Fleming:

    * Prevent accidental deletion of EFI variables through efivarfs that
    may brick machines. We use a whitelist of known-safe variables to
    allow things like installing distributions to work out of the box, and
    instead restrict vendor-specific variable deletion by making
    non-whitelist variables immutable (Peter Jones)

    Signed-off-by: Ingo Molnar

    Ingo Molnar
     

15 Feb, 2016

1 commit


13 Feb, 2016

1 commit


12 Feb, 2016

2 commits

  • The kptr_restrict flag, when set to 1, only prints the kernel address
    when the user has CAP_SYSLOG. When it is set to 2, the kernel address
    is always printed as zero. When set to 1, this needs to check whether
    or not we're in IRQ.

    However, when set to 2, this check is unnecessary, and produces
    confusing results in dmesg. Thus, only make sure we're not in IRQ when
    mode 1 is used, but not mode 2.
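
The decision described above can be modelled as a small pure function. This is a sketch under assumptions, not the kernel's code: the enum, the function name `demo_kptr_decide()` and the boolean parameters are hypothetical, and the treatment of mode 0 (print the address as-is) is an assumption that the text above does not spell out.

```c
#include <stdbool.h>

/* Hypothetical outcomes for printing a restricted kernel pointer. */
enum demo_kptr_result {
    DEMO_KPTR_SHOW,   /* print the real address */
    DEMO_KPTR_ZERO,   /* print the address as zero */
    DEMO_KPTR_ERROR   /* cannot decide in this context */
};

static enum demo_kptr_result demo_kptr_decide(int kptr_restrict, bool in_irq,
                                              bool has_cap_syslog)
{
    switch (kptr_restrict) {
    case 0:
        /* Assumption: no restriction at all in mode 0. */
        return DEMO_KPTR_SHOW;
    case 1:
        /* Only mode 1 depends on the caller, so only it cares about IRQ. */
        if (in_irq)
            return DEMO_KPTR_ERROR;
        return has_cap_syslog ? DEMO_KPTR_SHOW : DEMO_KPTR_ZERO;
    default:
        /* Mode 2: always zero, regardless of context or capabilities. */
        return DEMO_KPTR_ZERO;
    }
}
```

In this model the IRQ check sits inside the mode 1 branch, which is the change the message describes: mode 2 never reaches it.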

    [akpm@linux-foundation.org: coding-style fixes]
    Signed-off-by: Jason A. Donenfeld
    Cc: Rasmus Villemoes
    Cc: Kees Cook
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jason A. Donenfeld
     
  • When enabling UBSAN_SANITIZE_ALL, the kernel image size gets increased
    significantly (~3x). So, it sounds better to have some note in Kconfig.

    And, fixed a typo.

    Signed-off-by: Yang Shi
    Acked-by: Andrey Ryabinin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Yang Shi
     

11 Feb, 2016

1 commit

  • Pull workqueue fixes from Tejun Heo:
    "Workqueue fixes for v4.5-rc3.

    - Remove a spurious triggering of flush dependency warning.

    - Officially break local execution guarantee of unbound work items
    and add a debug feature to flush out usages which depend on it.

    - Work around CPU -> NODE mapping becoming invalid on CPU offline.

    The branch is young but pushing out early as stable kernels are being
    affected"

    * 'for-4.5-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq:
    workqueue: handle NUMA_NO_NODE for unbound pool_workqueue lookup
    workqueue: implement "workqueue.debug_force_rr_cpu" debug feature
    workqueue: schedule WORK_CPU_UNBOUND work on wq_unbound_cpumask CPUs
    Revert "workqueue: make sure delayed work run in local cpu"
    workqueue: skip flush dependency checks for legacy workqueues

    Linus Torvalds
     

10 Feb, 2016

2 commits

  • This adds ucs2_utf8size(), which tells us how big our ucs2 string is in
    bytes, and ucs2_as_utf8(), which translates from ucs2 to utf8.
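
A size computation of this kind can be sketched in a few lines of userspace C. The name `demo_ucs2_utf8_size()` is hypothetical and the real helper's signature is not shown above; the sketch assumes a NUL-terminated UCS-2 string and the standard UTF-8 thresholds of 0x80 and 0x800.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Return how many bytes a NUL-terminated UCS-2 string needs once encoded
 * as UTF-8, not counting a terminating NUL.
 */
static size_t demo_ucs2_utf8_size(const uint16_t *s)
{
    size_t bytes = 0;

    while (*s) {
        uint16_t c = *s++;

        if (c >= 0x800)
            bytes += 3;     /* needs three UTF-8 bytes */
        else if (c >= 0x80)
            bytes += 2;     /* needs two UTF-8 bytes */
        else
            bytes += 1;     /* plain ASCII */
    }
    return bytes;
}
```

A caller would typically use the result to size the destination buffer before translating the string.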

    Signed-off-by: Peter Jones
    Tested-by: Lee, Chun-Yi
    Acked-by: Matthew Garrett
    Signed-off-by: Matt Fleming

    Peter Jones
     
  • Workqueue used to guarantee local execution for work items queued
    without explicit target CPU. The guarantee is gone now which can
    break some usages in subtle ways. To flush out those cases, this
    patch implements a debug feature which forces round-robin CPU
    selection for all such work items.

    The debug feature defaults to off and can be enabled with a kernel
    parameter. The default can be flipped with a debug config option.
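
The effect of such a debug switch can be illustrated with a toy CPU selector. Everything here is hypothetical: `DEMO_NR_CPUS`, `demo_force_rr_cpu` and `demo_select_cpu()` are stand-ins invented for this sketch, and the real selection logic may consider CPU masks and per-CPU state that are ignored here.

```c
#include <stdbool.h>

#define DEMO_NR_CPUS 4              /* hypothetical CPU count */

static bool demo_force_rr_cpu;      /* stands in for the debug switch; off by default */
static int demo_last_cpu = -1;      /* last CPU handed out in round-robin mode */

/*
 * Pick a CPU for a work item queued without an explicit target.
 * With the switch off, the local CPU is used. With it on, CPUs are
 * handed out in turn so that code relying on local execution breaks
 * quickly and visibly.
 */
static int demo_select_cpu(int local_cpu)
{
    if (!demo_force_rr_cpu)
        return local_cpu;

    demo_last_cpu = (demo_last_cpu + 1) % DEMO_NR_CPUS;
    return demo_last_cpu;
}
```

With the switch on, a caller on CPU 2 would see its work items spread over CPUs 0, 1, 2, 3 in turn, which is what exposes hidden assumptions about locality.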

    If you hit this commit during bisection, please refer to 041bd12e272c
    ("Revert "workqueue: make sure delayed work run in local cpu"") for
    more information and ping me.

    Signed-off-by: Tejun Heo

    Tejun Heo
     

09 Feb, 2016

3 commits


08 Feb, 2016

1 commit

  • The starting node for a klist iteration is often passed in from
    somewhere way above the klist infrastructure, meaning there's no
    guarantee the node is still on the list. We've seen this in SCSI where
    we use bus_find_device() to iterate through a list of devices. In the
    face of heavy hotplug activity, the last device returned by
    bus_find_device() can be removed before the next call. This leads to

    Dec 3 13:22:02 localhost kernel: WARNING: CPU: 2 PID: 28073 at include/linux/kref.h:47 klist_iter_init_node+0x3d/0x50()
    Dec 3 13:22:02 localhost kernel: Modules linked in: scsi_debug x86_pkg_temp_thermal kvm_intel kvm irqbypass crc32c_intel joydev iTCO_wdt dcdbas ipmi_devintf acpi_power_meter iTCO_vendor_support ipmi_si imsghandler pcspkr wmi acpi_cpufreq tpm_tis tpm shpchp lpc_ich mfd_core nfsd nfs_acl lockd grace sunrpc tg3 ptp pps_core
    Dec 3 13:22:02 localhost kernel: CPU: 2 PID: 28073 Comm: cat Not tainted 4.4.0-rc1+ #2
    Dec 3 13:22:02 localhost kernel: Hardware name: Dell Inc. PowerEdge R320/08VT7V, BIOS 2.0.22 11/19/2013
    Dec 3 13:22:02 localhost kernel: ffffffff81a20e77 ffff880613acfd18 ffffffff81321eef 0000000000000000
    Dec 3 13:22:02 localhost kernel: ffff880613acfd50 ffffffff8107ca52 ffff88061176b198 0000000000000000
    Dec 3 13:22:02 localhost kernel: ffffffff814542b0 ffff880610cfb100 ffff88061176b198 ffff880613acfd60
    Dec 3 13:22:02 localhost kernel: Call Trace:
    Dec 3 13:22:02 localhost kernel: [] dump_stack+0x44/0x55
    Dec 3 13:22:02 localhost kernel: [] warn_slowpath_common+0x82/0xc0
    Dec 3 13:22:02 localhost kernel: [] ? proc_scsi_show+0x20/0x20
    Dec 3 13:22:02 localhost kernel: [] warn_slowpath_null+0x1a/0x20
    Dec 3 13:22:02 localhost kernel: [] klist_iter_init_node+0x3d/0x50
    Dec 3 13:22:02 localhost kernel: [] bus_find_device+0x51/0xb0
    Dec 3 13:22:02 localhost kernel: [] scsi_seq_next+0x2d/0x40
    [...]

    And an eventual crash. It can actually occur in any hotplug system
    which has a device finder and a starting device.

    We can fix this globally by making sure the starting node for
    klist_iter_init_node() is actually a member of the list before using it
    (and by starting from the beginning if it isn't).
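
The fallback behaviour can be sketched on a plain singly linked list. This is not the klist code: `struct demo_node`, `struct demo_list` and both functions are hypothetical, and a naive linear membership scan stands in for whatever check the patch actually performs.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical minimal list types for illustration. */
struct demo_node {
    struct demo_node *next;
};

struct demo_list {
    struct demo_node *head;
};

/* Naive membership test: walk the list looking for 'node'. */
static bool demo_list_contains(const struct demo_list *list,
                               const struct demo_node *node)
{
    const struct demo_node *cur;

    for (cur = list->head; cur; cur = cur->next)
        if (cur == node)
            return true;
    return false;
}

/*
 * Choose where an iteration begins. A caller-supplied start node is
 * trusted only if it is still on the list; otherwise the iteration
 * starts from the beginning.
 */
static struct demo_node *demo_iter_start(struct demo_list *list,
                                         struct demo_node *start)
{
    if (start && demo_list_contains(list, start))
        return start;
    return list->head;
}
```

A stale start node then costs a restart from the head instead of a walk through memory that is no longer part of the list.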

    Reported-by: Ewan D. Milne
    Tested-by: Ewan D. Milne
    Cc: stable@vger.kernel.org
    Signed-off-by: James Bottomley
    Signed-off-by: Greg Kroah-Hartman

    James Bottomley
     

06 Feb, 2016

1 commit

  • Some servers experienced fatal deadlocks because of a combination of
    bugs, leading to multiple cpus calling dump_stack().

    The checksumming bug was fixed in commit 34ae6a1aa054 ("ipv6: update
    skb->csum when CE mark is propagated").

    The second problem is faulty locking in dump_stack():

    CPU1 runs in process context and calls dump_stack(), grabs dump_lock.

    CPU2 receives a TCP packet under softirq, grabs socket spinlock, and
    calls dump_stack() from netdev_rx_csum_fault().

    dump_stack() spins on atomic_cmpxchg(&dump_lock, -1, 2), since
    dump_lock is owned by CPU1

    While dumping its stack, CPU1 is interrupted by a softirq, and happens
    to process a packet for the TCP socket locked by CPU2.

    CPU1 spins forever in spin_lock() : deadlock

    Stack trace on CPU1 looked like :

    NMI backtrace for cpu 1
    RIP: _raw_spin_lock+0x25/0x30
    ...
    Call Trace:

    tcp_v6_rcv+0x243/0x620
    ip6_input_finish+0x11f/0x330
    ip6_input+0x38/0x40
    ip6_rcv_finish+0x3c/0x90
    ipv6_rcv+0x2a9/0x500
    process_backlog+0x461/0xaa0
    net_rx_action+0x147/0x430
    __do_softirq+0x167/0x2d0
    call_softirq+0x1c/0x30
    do_softirq+0x3f/0x80
    irq_exit+0x6e/0xc0
    smp_call_function_single_interrupt+0x35/0x40
    call_function_single_interrupt+0x6a/0x70

    printk+0x4d/0x4f
    printk_address+0x31/0x33
    print_trace_address+0x33/0x3c
    print_context_stack+0x7f/0x119
    dump_trace+0x26b/0x28e
    show_trace_log_lvl+0x4f/0x5c
    show_stack_log_lvl+0x104/0x113
    show_stack+0x42/0x44
    dump_stack+0x46/0x58
    netdev_rx_csum_fault+0x38/0x3c
    __skb_checksum_complete_head+0x6e/0x80
    __skb_checksum_complete+0x11/0x20
    tcp_rcv_established+0x2bd5/0x2fd0
    tcp_v6_do_rcv+0x13c/0x620
    sk_backlog_rcv+0x15/0x30
    release_sock+0xd2/0x150
    tcp_recvmsg+0x1c1/0xfc0
    inet_recvmsg+0x7d/0x90
    sock_recvmsg+0xaf/0xe0
    ___sys_recvmsg+0x111/0x3b0
    SyS_recvmsg+0x5c/0xb0
    system_call_fastpath+0x16/0x1b

    Fixes: b58d977432c8 ("dump_stack: serialize the output from dump_stack()")
    Signed-off-by: Eric Dumazet
    Cc: Alex Thorlton
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Eric Dumazet
     

04 Feb, 2016

2 commits

  • If the indirect_ptr bit is set on a slot, that indicates we need to redo
    the lookup. Introduce a new function radix_tree_iter_retry() which
    forces the loop to retry the lookup by setting 'slot' to NULL and
    turning the iterator back to point at the problematic entry.

    This is a pretty rare problem to hit at the moment; the lookup has to
    race with a grow of the radix tree from a height of 0. The consequences
    of hitting this race are that gang lookup could return a pointer to a
    radix_tree_node instead of a pointer to whatever the user had inserted
    in the tree.

    Fixes: cebbd29e1c2f ("radix-tree: rewrite gang lookup using iterator")
    Signed-off-by: Matthew Wilcox
    Cc: Hugh Dickins
    Cc: Ohad Ben-Cohen
    Cc: Konstantin Khlebnikov
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Matthew Wilcox
     
  • Recently added commit 564b026fbd0d ("string_helpers: fix precision loss
    for some inputs") fixed precision issues for string_get_size() and broke
    tests.

    Fix and improve them: test both STRING_UNITS_2 and STRING_UNITS_10 at a
    time, better failure reporting, test small and huge values.

    Fixes: 564b026fbd0d28e9 ("string_helpers: fix precision loss for some inputs")
    Signed-off-by: Vitaly Kuznetsov
    Cc: Andy Shevchenko
    Cc: Rasmus Villemoes
    Cc: James Bottomley
    Cc: "James E.J. Bottomley"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vitaly Kuznetsov
     

27 Jan, 2016

1 commit

  • On my bigger s390 systems I always get "Out of memory.
    ODEBUG disabled". Since the number of objects is needed at
    compile time, we can not change the size dynamically before
    the caches etc are available. Doubling the size seems to
    do the trick. Since it is init data it will be freed anyway,
    this should be ok.

    Signed-off-by: Christian Borntraeger
    Link: http://lkml.kernel.org/r/1453905478-13409-1-git-send-email-borntraeger@de.ibm.com
    Signed-off-by: Thomas Gleixner

    Christian Borntraeger
     

24 Jan, 2016

1 commit

  • Pull rdma updates from Doug Ledford:
    "Initial roundup of 4.5 merge window patches

    - Remove usage of ib_query_device and instead store attributes in
    ib_device struct

    - Move iopoll out of block and into lib, rename to irqpoll, and use
    in several places in the rdma stack as our new completion queue
    polling library mechanism. Update the other block drivers that
    already used iopoll to use the new mechanism too.

    - Replace the per-entry GID table locks with a single GID table lock

    - IPoIB multicast cleanup

    - Cleanups to the IB MR facility

    - Add support for 64bit extended IB counters

    - Fix for netlink oops while parsing RDMA nl messages

    - RoCEv2 support for the core IB code

    - mlx4 RoCEv2 support

    - mlx5 RoCEv2 support

    - Cross Channel support for mlx5

    - Timestamp support for mlx5

    - Atomic support for mlx5

    - Raw QP support for mlx5

    - MAINTAINERS update for mlx4/mlx5

    - Misc ocrdma, qib, nes, usNIC, cxgb3, cxgb4, mlx4, mlx5 updates

    - Add support for remote invalidate to the iSER driver (pushed
    through the RDMA tree due to dependencies, acknowledged by nab)

    - Update to NFSoRDMA (pushed through the RDMA tree due to
    dependencies, acknowledged by Bruce)"

    * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: (169 commits)
    IB/mlx5: Unify CQ create flags check
    IB/mlx5: Expose Raw Packet QP to user space consumers
    {IB, net}/mlx5: Move the modify QP operation table to mlx5_ib
    IB/mlx5: Support setting Ethernet priority for Raw Packet QPs
    IB/mlx5: Add Raw Packet QP query functionality
    IB/mlx5: Add create and destroy functionality for Raw Packet QP
    IB/mlx5: Refactor mlx5_ib_qp to accommodate other QP types
    IB/mlx5: Allocate a Transport Domain for each ucontext
    net/mlx5_core: Warn on unsupported events of QP/RQ/SQ
    net/mlx5_core: Add RQ and SQ event handling
    net/mlx5_core: Export transport objects
    IB/mlx5: Expose CQE version to user-space
    IB/mlx5: Add CQE version 1 support to user QPs and SRQs
    IB/mlx5: Fix data validation in mlx5_ib_alloc_ucontext
    IB/sa: Fix netlink local service GFP crash
    IB/srpt: Remove redundant wc array
    IB/qib: Improve ipoib UD performance
    IB/mlx4: Advertise RoCE v2 support
    IB/mlx4: Create and use another QP1 for RoCEv2
    IB/mlx4: Enable send of RoCE QP1 packets with IP/UDP headers
    ...

    Linus Torvalds
     

23 Jan, 2016

2 commits

  • Pull crypto fixes from Herbert Xu:
    "This fixes the following issues:

    API:
    - A large number of bug fixes for the af_alg interface, credit goes
    to Dmitry Vyukov for discovering and reporting these issues.

    Algorithms:
    - sw842 needs to select crc32.
    - The soft dependency on crc32c is now in the correct spot.

    Drivers:
    - The atmel AES driver needs HAS_DMA.
    - The atmel AES driver was missing a break statement, fortunately
    it's only a debug function.
    - A number of bug fixes for the Intel qat driver"

    * 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (24 commits)
    crypto: algif_skcipher - sendmsg SG marking is off by one
    crypto: crc32c - Fix crc32c soft dependency
    crypto: algif_skcipher - Load TX SG list after waiting
    crypto: atmel-aes - Add missing break to atmel_aes_reg_name
    crypto: algif_skcipher - Fix race condition in skcipher_check_key
    crypto: algif_hash - Fix race condition in hash_check_key
    crypto: CRYPTO_DEV_ATMEL_AES should depend on HAS_DMA
    lib: sw842: select crc32
    crypto: af_alg - Forbid bind(2) when nokey child sockets are present
    crypto: algif_skcipher - Remove custom release parent function
    crypto: algif_hash - Remove custom release parent function
    crypto: af_alg - Allow af_af_alg_release_parent to be called on nokey path
    crypto: qat - update init_esram for C3xxx dev type
    crypto: qat - fix timeout issues
    crypto: qat - remove to call get_sram_bar_id for qat_c3xxx
    crypto: algif_skcipher - Add key check exception for cipher_null
    crypto: skcipher - Add crypto_skcipher_has_setkey
    crypto: algif_hash - Require setkey before accept(2)
    crypto: hash - Add crypto_ahash_has_setkey
    crypto: algif_skcipher - Add nokey compatibility path
    ...

    Linus Torvalds
     
  • Merge misc fixes from Andrew Morton:
    "Six fixes"

    * emailed patches from Andrew Morton:
    ocfs2: NFS hangs in __ocfs2_cluster_lock due to race with ocfs2_unblock_lock
    reiserfs: fix dereference of ERR_PTR
    ratelimit: fix bug in time interval by resetting right begin time
    mm: fix kernel crash in khugepaged thread
    mm: fix mlock accouting
    thp: change pmd_trans_huge_lock() interface to return ptl

    Linus Torvalds
     

22 Jan, 2016

4 commits

  • Pull block driver updates from Jens Axboe:
    "This is the block driver pull request for 4.5, with the exception of
    NVMe, which is in a separate branch and will be posted after this one.

    This pull request contains:

    - A set of bcache stability fixes, which have been acked by Kent.
    These have been used and tested for more than a year by the
    community, so it's about time that they got in.

    - A set of drbd updates from the drbd team (Andreas, Lars, Philipp)
    and Markus Elfring, Oleg Drokin.

    - A set of fixes for xen blkback/front from the usual suspects, (Bob,
    Konrad) as well as community based fixes from Kiri, Julien, and
    Peng.

    - A 2038 time fix for sx8 from Shraddha, with a fix from me.

    - A small mtip32xx cleanup from Zhu Yanjun.

    - A null_blk division fix from Arnd"

    * 'for-4.5/drivers' of git://git.kernel.dk/linux-block: (71 commits)
    null_blk: use sector_div instead of do_div
    mtip32xx: restrict variables visible in current code module
    xen/blkfront: Fix crash if backend doesn't follow the right states.
    xen/blkback: Fix two memory leaks.
    xen/blkback: make st_ statistics per ring
    xen/blkfront: Handle non-indirect grant with 64KB pages
    xen-blkfront: Introduce blkif_ring_get_request
    xen-blkback: clear PF_NOFREEZE for xen_blkif_schedule()
    xen/blkback: Free resources if connect_ring failed.
    xen/blocks: Return -EXX instead of -1
    xen/blkback: make pool of persistent grants and free pages per-queue
    xen/blkback: get the number of hardware queues/rings from blkfront
    xen/blkback: pseudo support for multi hardware queues/rings
    xen/blkback: separate ring information out of struct xen_blkif
    xen/blkfront: correct setting for xen_blkif_max_ring_order
    xen/blkfront: make persistent grants pool per-queue
    xen/blkfront: Remove duplicate setting of ->xbdev.
    xen/blkfront: Cleanup of comments, fix unaligned variables, and syntax errors.
    xen/blkfront: negotiate number of queues/rings to be used with backend
    xen/blkfront: split per device io_lock
    ...

    Linus Torvalds
     
  • rs->begin in ratelimit is set in two cases.
    1) when rs->begin was not initialized
    2) when rs->interval was passed

    For case #2, current ratelimit sets the begin to 0. This incurs
    improper suppression. The begin value will be set in the next ratelimit
    call by 1). Then the time interval check will be always false, and
    rs->printed will not be initialized. Although enough time passed,
    ratelimit may return 0 if rs->printed is not less than rs->burst. To
    reset interval properly, begin should be jiffies rather than 0.
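
The difference between resetting begin to 0 and resetting it to the current time can be reproduced with a small userspace model. This is a sketch, not the kernel's ratelimit code: `struct demo_ratelimit` and `demo_ratelimit()` are hypothetical, the time is passed in as a plain counter instead of jiffies, counter wraparound is ignored, and the `begin_from_now` flag exists only to compare the two behaviours side by side.

```c
#include <stdbool.h>

/* Hypothetical state: at most 'burst' prints per 'interval' ticks. */
struct demo_ratelimit {
    unsigned long interval;
    int burst;
    int printed;
    unsigned long begin;    /* 0 means the window has not started */
};

/*
 * Returns 1 if the caller may print at time 'now' (must be non-zero),
 * 0 if the message is suppressed.
 * begin_from_now == true models the fixed behaviour (begin = now),
 * begin_from_now == false models the old behaviour (begin = 0).
 */
static int demo_ratelimit(struct demo_ratelimit *rs, unsigned long now,
                          bool begin_from_now)
{
    /* Case 1: the window was never started, or was reset to 0. */
    if (!rs->begin)
        rs->begin = now;

    /* Case 2: the interval has passed, so open a new window. */
    if (now > rs->begin + rs->interval) {
        rs->begin = begin_from_now ? now : 0;
        rs->printed = 0;
    }

    if (rs->burst > rs->printed) {
        rs->printed++;
        return 1;
    }
    return 0;
}
```

With interval 10 and burst 1, calls at times 100, 200 and 300 all print in the fixed model. In the old model the call at 300 is suppressed: the call at 200 left begin at 0, so the call at 300 merely restarts the window and finds printed still equal to burst.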

    For example, consider the code below:

    static DEFINE_RATELIMIT_STATE(mylimit, 1, 1);
    for (i = 1; i
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jaewon Kim
     
  • Expose an interface to allow users to mark several accesses together as
    being user space accesses, allowing batching of the surrounding user
    space access markers (SMAP on x86, PAN on arm64, domain register
    switching on arm).

    This is currently only used for the user string length and copying
    functions, where the SMAP overhead on x86 drowned the actual user
    accesses (only noticeable on newer microarchitectures that support SMAP
    in the first place, of course).

    * user access batching branch:
    Use the new batched user accesses in generic user string handling
    Add 'unsafe' user access functions for batched accesses
    x86: reorganize SMAP handling in user space accesses

    Linus Torvalds
     
  • Merge third patch-bomb from Andrew Morton:
    "I'm pretty much done for -rc1 now:

    - the rest of MM, basically

    - lib/ updates

    - checkpatch, epoll, hfs, fatfs, ptrace, coredump, exit

    - cpu_mask simplifications

    - kexec, rapidio, MAINTAINERS etc, etc.

    - more dma-mapping cleanups/simplifications from hch"

    * emailed patches from Andrew Morton: (109 commits)
    MAINTAINERS: add/fix git URLs for various subsystems
    mm: memcontrol: add "sock" to cgroup2 memory.stat
    mm: memcontrol: basic memory statistics in cgroup2 memory controller
    mm: memcontrol: do not uncharge old page in page cache replacement
    Documentation: cgroup: add memory.swap.{current,max} description
    mm: free swap cache aggressively if memcg swap is full
    mm: vmscan: do not scan anon pages if memcg swap limit is hit
    swap.h: move memcg related stuff to the end of the file
    mm: memcontrol: replace mem_cgroup_lruvec_online with mem_cgroup_online
    mm: vmscan: pass memcg to get_scan_count()
    mm: memcontrol: charge swap to cgroup2
    mm: memcontrol: clean up alloc, online, offline, free functions
    mm: memcontrol: flatten struct cg_proto
    mm: memcontrol: rein in the CONFIG space madness
    net: drop tcp_memcontrol.c
    mm: memcontrol: introduce CONFIG_MEMCG_LEGACY_KMEM
    mm: memcontrol: allow to disable kmem accounting for cgroup2
    mm: memcontrol: account "kmem" consumers in cgroup2 memory controller
    mm: memcontrol: move kmem accounting code to CONFIG_MEMCG
    mm: memcontrol: separate kmem code from legacy tcp accounting code
    ...

    Linus Torvalds
     

21 Jan, 2016

10 commits

  • Pull asm-generic updates from Arnd Bergmann:
    "The asm-generic tree this time contains one series from Nicolas Pitre
    that makes the optimized do_div() implementation from the ARM
    architecture available to all architectures.

    This also adds stricter type checking for callers of do_div, which has
    uncovered a number of bugs in existing code, and fixes up the ones we
    have found"

    * tag 'asm-generic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic:
    ARM: asm/div64.h: adjust to generic codde
    __div64_32(): make it overridable at compile time
    __div64_const32(): abstract out the actual 128-bit cross product code
    do_div(): generic optimization for constant divisor on 32-bit machines
    div64.h: optimize do_div() for power-of-two constant divisors
    mtd/sm_ftl.c: fix wrong do_div() usage
    drm/mgag200/mgag200_mode.c: fix wrong do_div() usage
    hid-sensor-hub.c: fix wrong do_div() usage
    ti/fapll: fix wrong do_div() usage
    ti/clkt_dpll: fix wrong do_div() usage
    tegra/clk-divider: fix wrong do_div() usage
    imx/clk-pllv2: fix wrong do_div() usage
    imx/clk-pllv1: fix wrong do_div() usage
    nouveau/nvkm/subdev/clk/gk20a.c: fix wrong do_div() usage
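
    For readers unfamiliar with do_div(): it divides a 64-bit dividend in
    place and returns the remainder, which is why its callers are easy to
    get wrong. The userspace sketch below imitates that contract under
    assumptions: sketch_do_div() is a hypothetical name, it takes a
    pointer where the kernel macro takes an lvalue, and it tests for a
    power of two at run time, whereas the series targets constant
    divisors known at compile time.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical userspace stand-in for do_div(): divides *n in place
 * and returns the remainder.
 */
uint32_t sketch_do_div(uint64_t *n, uint32_t base)
{
	uint32_t rem;

	assert(base != 0);	/* dividing by zero is the caller's bug */

	if ((base & (base - 1)) == 0) {
		/* Power of two: a shift and a mask replace the division. */
		unsigned int shift = 0;

		while ((1u << shift) != base)
			shift++;
		rem = (uint32_t)(*n & (base - 1));
		*n >>= shift;
	} else {
		rem = (uint32_t)(*n % base);
		*n /= base;
	}
	return rem;
}
```

    A caller must pass a 64-bit unsigned dividend and use the return
    value only as the remainder; the quotient is left in the dividend.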

    Linus Torvalds
     
  • Larry Finger reports:
    "My PowerBook G4 Aluminum with a 32-bit PPC processor fails to boot for
    the 4.4-git series".

    This is likely due to X still needing /dev/mem access on this platform.

    CONFIG_IO_STRICT_DEVMEM is not yet safe to turn on when
    CONFIG_STRICT_DEVMEM=y.

    Remove the default so that old configurations do not change behavior.

    Fixes: 90a545e98126 ("restrict /dev/mem to idle io memory ranges")
    Reported-by: Larry Finger
    Tested-by: Larry Finger
    Link: http://marc.info/?l=linux-kernel&m=145332012023825&w=2
    Acked-by: Kees Cook
    Cc: Arnd Bergmann
    Cc: Ingo Molnar
    Cc: Russell King
    Cc: Andrew Morton
    Cc: Greg Kroah-Hartman
    Signed-off-by: Dan Williams
    Signed-off-by: Linus Torvalds

    Dan Williams
     
  • UBSAN uses compile-time instrumentation to catch undefined behavior
    (UB). The compiler inserts code that performs certain kinds of checks
    before operations that could cause UB. If a check fails (i.e. UB is
    detected), a __ubsan_handle_* function is called to print an error
    message.

    So most of the work is done by the compiler. This patch just implements
    the UBSAN handlers that print the errors.
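
    The check-then-operate pattern described above can be pictured with a
    small userspace model. Everything named model_* below is hypothetical:
    the handler only counts reports, and the early return is a choice made
    for this sketch, not a statement about what the kernel handlers do.

```c
#include <stdint.h>

/* Hypothetical counter standing in for the printed error report. */
unsigned int model_ub_reports;

/* Hypothetical handler, playing the role of a __ubsan_handle_* function. */
void model_handle_shift_out_of_bounds(void)
{
	model_ub_reports++;
}

/* Roughly what an instrumented "x << n" looks like for a 32-bit x. */
uint32_t model_checked_shl(uint32_t x, int n)
{
	/* Inserted check: the shift count must be within the type width. */
	if (n < 0 || n >= 32) {
		model_handle_shift_out_of_bounds();
		return 0;	/* sketch only: avoid executing the bad shift */
	}
	return x << n;
}
```

    The split of work is visible here: the check is generated around the
    operation, and only the handler has to be written by hand.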

    GCC has had this capability since 4.9.x [1] (see the -fsanitize=undefined
    option and its suboptions).
    However, GCC 5.x has more checkers implemented [2].
    Article [3] has a bit more detail about UBSAN in GCC.

    [1] - https://gcc.gnu.org/onlinedocs/gcc-4.9.0/gcc/Debugging-Options.html
    [2] - https://gcc.gnu.org/onlinedocs/gcc/Debugging-Options.html
    [3] - http://developerblog.redhat.com/2014/10/16/gcc-undefined-behavior-sanitizer-ubsan/

    Issues which UBSAN has found thus far are:

    Found bugs:

    * out-of-bounds access - 97840cb67ff5 ("netfilter: nfnetlink: fix
    insufficient validation in nfnetlink_bind")

    undefined shifts:

    * d48458d4a768 ("jbd2: use a better hash function for the revoke
    table")

    * 10632008b9e1 ("clockevents: Prevent shift out of bounds")

    * 'x << -1' shift in ext4 -
    http://lkml.kernel.org/r/

    * undefined rol32(0) -
    http://lkml.kernel.org/r/

    * undefined dirty_ratelimit calculation -
    http://lkml.kernel.org/r/

    * undefined rounddown_pow_of_two(0) -
    http://lkml.kernel.org/r/

    * [WONTFIX] undefined shift in __bpf_prog_run -
    http://lkml.kernel.org/r/

    WONTFIX here because it should be fixed in bpf program, not in kernel.

    signed overflows:

    * 32a8df4e0b33f ("sched: Fix odd values in effective_load()
    calculations")

    * mul overflow in ntp -
    http://lkml.kernel.org/r/

    * incorrect conversion into rtc_time in rtc_time64_to_tm() -
    http://lkml.kernel.org/r/

    * unvalidated timespec in io_getevents() -
    http://lkml.kernel.org/r/

    * [NOTABUG] signed overflow in ktime_add_safe() -
    http://lkml.kernel.org/r/

    [akpm@linux-foundation.org: fix unused local warning]
    [akpm@linux-foundation.org: fix __int128 build woes]
    Signed-off-by: Andrey Ryabinin
    Cc: Peter Zijlstra
    Cc: Sasha Levin
    Cc: Randy Dunlap
    Cc: Rasmus Villemoes
    Cc: Jonathan Corbet
    Cc: Michal Marek
    Cc: Thomas Gleixner
    Cc: Ingo Molnar
    Cc: "H. Peter Anvin"
    Cc: Yury Gribov
    Cc: Dmitry Vyukov
    Cc: Konstantin Khlebnikov
    Cc: Kostya Serebryany
    Cc: Johannes Berg
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrey Ryabinin
     
  • The clz table (__clz_tab) in lib/clz_tab.c is also provided as part of
    libgcc.a, and many architectures link against libgcc. To allow the
    linker to avoid a multiple-definition link failure, clz_tab.o has to be
    in lib/lib.a rather than lib/builtin.o. The specific issue is that
    libgcc.a comes before lib/builtin.o on vmlinux.o's link command line, so
    its _clz.o is pulled to satisfy __clz_tab, and then when the remainder
    of lib/builtin.o is pulled in to satisfy all the other dependencies, the
    __clz_tab symbols conflict. By putting clz_tab.o in lib.a, the linker
    can simply avoid pulling it into vmlinux.o when this situation arises.

    The definitions of __clz_tab are the same in libgcc.a and in the kernel;
    arguably we could also simply rename the kernel version, but it's
    unlikely the libgcc version will ever change to become incompatible, so
    just using it seems reasonably safe.

    Signed-off-by: Chris Metcalf
    Acked-by: David S. Miller
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Chris Metcalf
     
  • Like other tests do, print the gathered statistics after the test
    module has finished. Return from the module based on the result.

    Signed-off-by: Andy Shevchenko
    Acked-by: Rasmus Villemoes
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andy Shevchenko
     
  • Currently only one combination is tested for overflow, i.e. rowsize =
    16, groupsize = 1, len = 1. Do various tests to go through all possible
    branches.

    Signed-off-by: Andy Shevchenko
    Cc: Rasmus Villemoes
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andy Shevchenko
     
  • After processing by hex_dump_to_buffer(), check that all the parts are
    as expected.

    Part 1. The actual expected hex dump, with or without the ASCII part.

    Part 2. Check whether the buffer is dirty beyond what is needed.

    Part 3. The return code should be as expected.

    This is done by comparing the return code and by using memcmp() against
    the test buffer. We fill the buffer with FILL_CHAR ('#') characters, so
    we expect the tail of the buffer to be left untouched. The terminating
    NUL is also checked by memcmp().
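
    The three checks can be sketched in userspace C. This is a minimal
    illustration under assumptions: model_hex_byte() is a hypothetical
    formatter standing in for the kernel-only hex_dump_to_buffer(), and
    BUF_SIZE is chosen arbitrarily; only the FILL_CHAR idea comes from the
    text above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define FILL_CHAR	'#'
#define BUF_SIZE	32

/* Hypothetical formatter under test: two hex digits plus a NUL. */
int model_hex_byte(char *buf, size_t size, unsigned char v)
{
	static const char digits[] = "0123456789abcdef";

	if (size >= 3) {
		buf[0] = digits[v >> 4];
		buf[1] = digits[v & 0x0f];
		buf[2] = '\0';
	}
	return 2;	/* length a large enough buffer would hold */
}

/* Returns 0 when all three parts of the check pass, -1 otherwise. */
int model_check_formatter(unsigned char value, const char *expected)
{
	char real[BUF_SIZE], test[BUF_SIZE];
	size_t len = strlen(expected);
	int ret;

	assert(len < BUF_SIZE);

	/* Pre-fill both buffers so untouched bytes stay recognisable. */
	memset(real, FILL_CHAR, sizeof(real));
	memset(test, FILL_CHAR, sizeof(test));

	ret = model_hex_byte(real, sizeof(real), value);

	/* Reference image: expected text, its NUL, then the fill tail. */
	memcpy(test, expected, len);
	test[len] = '\0';

	/* Part 3: the return code. */
	if (ret != (int)len)
		return -1;
	/* Parts 1 and 2: one memcmp() covers text, NUL and tail. */
	return memcmp(real, test, sizeof(real)) ? -1 : 0;
}
```

    Comparing the whole buffer in one call means a stray write anywhere
    past the terminating NUL fails the check just like wrong output does.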

    Signed-off-by: Andy Shevchenko
    Cc: Rasmus Villemoes
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andy Shevchenko
     
  • It is better to use memcmp() against the entire buffer to check that
    nothing has happened to the data in the tail.

    Signed-off-by: Andy Shevchenko
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andy Shevchenko
     
  • The magic numbers for the length are replaced by what they actually
    mean, such as the end of the buffer with and without the ASCII part.

    We don't touch the remaining magic constants; they will be removed in
    the following commits.

    Signed-off-by: Andy Shevchenko
    Cc: Rasmus Villemoes
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andy Shevchenko
     
  • When testing for overflow, iterate the buffer length over the range
    0 .. BUF_SIZE.

    Signed-off-by: Andy Shevchenko
    Cc: Rasmus Villemoes
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andy Shevchenko