26 Dec, 2016

1 commit

  • Add a new page flag, PageWaiters, to indicate the page waitqueue has
    tasks waiting. This can be tested rather than testing waitqueue_active
    which requires another cacheline load.

    This bit is always set when the page has tasks on page_waitqueue(page),
    and is set and cleared under the waitqueue lock. It may be set when
    there are no tasks on the waitqueue, which will cause a harmless extra
    wakeup check that will clear the bit.

    The generic bit-waitqueue infrastructure is no longer used for pages.
    Instead, waitqueues are used directly with a custom key type. The
    generic code was not flexible enough to have PageWaiters manipulation
    under the waitqueue lock (which simplifies concurrency).

    This improves the performance of page lock intensive microbenchmarks by
    2-3%.

    Putting two bits in the same word opens the opportunity to remove the
    memory barrier between clearing the lock bit and testing the waiters
    bit, after some work on the arch primitives (e.g., ensuring memory
    operand widths match and cover both bits).

    Signed-off-by: Nicholas Piggin
    Cc: Dave Hansen
    Cc: Bob Peterson
    Cc: Steven Whitehouse
    Cc: Andrew Lutomirski
    Cc: Andreas Gruenbacher
    Cc: Peter Zijlstra
    Cc: Mel Gorman
    Signed-off-by: Linus Torvalds

    Nicholas Piggin
     

25 Dec, 2016

1 commit


18 Dec, 2016

2 commits

  • Pull networking fixes and cleanups from David Miller:

    1) Revert bogus nla_ok() change, from Alexey Dobriyan.

    2) Various bpf validator fixes from Daniel Borkmann.

    3) Add some necessary SET_NETDEV_DEV() calls to hisi_femac and hip04
    drivers, from Dongpo Li.

    4) Several ethtool ksettings conversions from Philippe Reynes.

    5) Fix bugs in inet port management wrt. soreuseport, from Tom Herbert.

    6) XDP support for virtio_net, from John Fastabend.

    7) Fix NAT handling within a vrf, from David Ahern.

    8) Endianness fixes in dpaa_eth driver, from Claudiu Manoil.

    * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (63 commits)
    net: mv643xx_eth: fix build failure
    isdn: Constify some function parameters
    mlxsw: spectrum: Mark split ports as such
    cgroup: Fix CGROUP_BPF config
    qed: fix old-style function definition
    net: ipv6: check route protocol when deleting routes
    r6040: move spinlock in r6040_close as SOFTIRQ-unsafe lock order detected
    irda: w83977af_ir: cleanup an indent issue
    net: sfc: use new api ethtool_{get|set}_link_ksettings
    net: davicom: dm9000: use new api ethtool_{get|set}_link_ksettings
    net: cirrus: ep93xx: use new api ethtool_{get|set}_link_ksettings
    net: chelsio: cxgb3: use new api ethtool_{get|set}_link_ksettings
    net: chelsio: cxgb2: use new api ethtool_{get|set}_link_ksettings
    bpf: fix mark_reg_unknown_value for spilled regs on map value marking
    bpf: fix overflow in prog accounting
    bpf: dynamically allocate digest scratch buffer
    gtp: Fix initialization of Flags octet in GTPv1 header
    gtp: gtp_check_src_ms_ipv4() always return success
    net/x25: use designated initializers
    isdn: use designated initializers
    ...

    Linus Torvalds
     
  • CGROUP_BPF depended on SOCK_CGROUP_DATA which can't be manually
    enabled, making it rather challenging to turn CGROUP_BPF on.

    Signed-off-by: Andy Lutomirski
    Acked-by: Alexei Starovoitov
    Signed-off-by: David S. Miller

    Andy Lutomirski
     

15 Dec, 2016

1 commit

  • Pull modules updates from Jessica Yu:
    "Summary of modules changes for the 4.10 merge window:

    - The rodata= cmdline parameter has been extended to additionally
    apply to module mappings

    - Fix a hard to hit race between module loader error/clean up
    handling and ftrace registration

    - Some code cleanups, notably panic.c and modules code use a unified
    taint_flags table now. This is much cleaner than duplicating the
    taint flag code in modules.c"

    * tag 'modules-for-v4.10' of git://git.kernel.org/pub/scm/linux/kernel/git/jeyu/linux:
    module: fix DEBUG_SET_MODULE_RONX typo
    module: extend 'rodata=off' boot cmdline parameter to module mappings
    module: Fix a comment above strong_try_module_get()
    module: When modifying a module's text ignore modules which are going away too
    module: Ensure a module's state is set accordingly during module coming cleanup code
    module: remove trailing whitespace
    taint/module: Clean up global and module taint flags handling
    modpost: free allocated memory

    Linus Torvalds
     

14 Dec, 2016

1 commit

  • Pull workqueue updates from Tejun Heo:
    "Mostly patches to initialize workqueue subsystem earlier and get rid
    of keventd_up().

    The patches were headed for the last merge cycle but got delayed due
    to a bug found at the last minute, which is fixed now.

    Also, to help debugging, destroy_workqueue() is more chatty now on a
    sanity check failure."

    * 'for-4.10' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq:
    workqueue: move wq_numa_init() to workqueue_init()
    workqueue: remove keventd_up()
    debugobj, workqueue: remove keventd_up() usage
    slab, workqueue: remove keventd_up() usage
    power, workqueue: remove keventd_up() usage
    tty, workqueue: remove keventd_up() usage
    mce, workqueue: remove keventd_up() usage
    workqueue: make workqueue available early during boot
    workqueue: dump workqueue state on sanity check failures in destroy_workqueue()

    Linus Torvalds
     

13 Dec, 2016

5 commits

  • Pull documentation update from Jonathan Corbet:
    "These are the documentation changes for 4.10.

    It's another busy cycle for the docs tree, as the sphinx conversion
    continues. Highlights include:

    - Further work on PDF output, which remains a bit of a pain but
    should be more solid now.

    - Five more DocBook template files converted to Sphinx. Only 27 to
    go... Lots of plain-text files have also been converted and
    integrated.

    - Images in binary formats have been replaced with more
    source-friendly versions.

    - Various bits of organizational work, including the renaming of
    various files discussed at the kernel summit.

    - New documentation for the device_link mechanism.

    ... and, of course, lots of typo fixes and small updates"

    * tag 'docs-4.10' of git://git.lwn.net/linux: (193 commits)
    dma-buf: Extract dma-buf.rst
    Update Documentation/00-INDEX
    docs: 00-INDEX: document directories/files with no docs
    docs: 00-INDEX: remove non-existing entries
    docs: 00-INDEX: add missing entries for documentation files/dirs
    docs: 00-INDEX: consolidate process/ and admin-guide/ description
    scripts: add a script to check if Documentation/00-INDEX is sane
    Docs: change sh -> awk in REPORTING-BUGS
    Documentation/core-api/device_link: Add initial documentation
    core-api: remove an unexpected unident
    ppc/idle: Add documentation for powersave=off
    Doc: Correct typo, "Introdution" => "Introduction"
    Documentation/atomic_ops.txt: convert to ReST markup
    Documentation/local_ops.txt: convert to ReST markup
    Documentation/assoc_array.txt: convert to ReST markup
    docs-rst: parse-headers.pl: cleanup the documentation
    docs-rst: fix media cleandocs target
    docs-rst: media/Makefile: reorganize the rules
    docs-rst: media: build SVG from graphviz files
    docs-rst: replace bayer.png by a SVG image
    ...

    Linus Torvalds
     
  • Merge updates from Andrew Morton:

    - various misc bits

    - most of MM (quite a lot of MM material is awaiting the merge of
    linux-next dependencies)

    - kasan

    - printk updates

    - procfs updates

    - MAINTAINERS

    - /lib updates

    - checkpatch updates

    * emailed patches from Andrew Morton : (123 commits)
    init: reduce rootwait polling interval time to 5ms
    binfmt_elf: use vmalloc() for allocation of vma_filesz
    checkpatch: don't emit unified-diff error for rename-only patches
    checkpatch: don't check c99 types like uint8_t under tools
    checkpatch: avoid multiple line dereferences
    checkpatch: don't check .pl files, improve absolute path commit log test
    scripts/checkpatch.pl: fix spelling
    checkpatch: don't try to get maintained status when --no-tree is given
    lib/ida: document locking requirements a bit better
    lib/rbtree.c: fix typo in comment of ____rb_erase_color
    lib/Kconfig.debug: make CONFIG_STRICT_DEVMEM depend on CONFIG_DEVMEM
    MAINTAINERS: add drm and drm/i915 irc channels
    MAINTAINERS: add "C:" for URI for chat where developers hang out
    MAINTAINERS: add drm and drm/i915 bug filing info
    MAINTAINERS: add "B:" for URI where to file bugs
    get_maintainer: look for arbitrary letter prefixes in sections
    printk: add Kconfig option to set default console loglevel
    printk/sound: handle more message headers
    printk/btrfs: handle more message headers
    printk/kdb: handle more message headers
    ...

    Linus Torvalds
     
  • Pull timer updates from Thomas Gleixner:
    "The time/timekeeping/timer folks deliver with this update:

    - Fix a reintroduced signed/unsigned issue and clean up the whole
    signed/unsigned mess in the timekeeping core so this won't happen
    accidentally again.

    - Add a new trace clock based on boot time

    - Prevent injection of random sleep times when PM tracing abuses the
    RTC for storage

    - Make posix timers configurable for real tiny systems

    - Add tracepoints for the alarm timer subsystem so timer based
    suspend wakeups can be instrumented

    - The usual pile of fixes and updates to core and drivers"

    * 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (23 commits)
    timekeeping: Use mul_u64_u32_shr() instead of open coding it
    timekeeping: Get rid of pointless typecasts
    timekeeping: Make the conversion call chain consistently unsigned
    timekeeping: Force unsigned clocksource to nanoseconds conversion
    alarmtimer: Add tracepoints for alarm timers
    trace: Update documentation for mono, mono_raw and boot clock
    trace: Add an option for boot clock as trace clock
    timekeeping: Add a fast and NMI safe boot clock
    timekeeping/clocksource_cyc2ns: Document intended range limitation
    timekeeping: Ignore the bogus sleep time if pm_trace is enabled
    selftests/timers: Fix spelling mistake "Asyncrhonous" -> "Asynchronous"
    clocksource/drivers/bcm2835_timer: Unmap region obtained by of_iomap
    clocksource/drivers/arm_arch_timer: Map frame with of_io_request_and_map()
    arm64: dts: rockchip: Arch counter doesn't tick in system suspend
    clocksource/drivers/arm_arch_timer: Don't assume clock runs in suspend
    posix-timers: Make them configurable
    posix_cpu_timers: Move the add_device_randomness() call to a proper place
    timer: Move sys_alarm from timer.c to itimer.c
    ptp_clock: Allow for it to be optional
    Kconfig: Regenerate *.c_shipped files after previous changes
    ...

    Linus Torvalds
     
    For several devices, the rootwait time is sensitive because it directly
    affects boot time. The polling interval of rootwait is currently
    100ms. To avoid unnecessary waiting, reduce the polling interval to
    5 ms.

    [akpm@linux-foundation.org: remove used-once #define]
    Link: http://lkml.kernel.org/r/20161207060743.1728-1-js07.lee@samsung.com
    Signed-off-by: Jungseung Lee
    Cc: Al Viro
    Cc: Christoph Hellwig
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jungseung Lee
     
  • Pull x86 idle updates from Ingo Molnar:
    "There were two bigger changes in this development cycle:

    - remove idle notifiers:

    32 files changed, 74 insertions(+), 803 deletions(-)

    These notifiers were of questionable value and the main use case,
    the i7300 driver, was essentially unmaintained and can be removed,
    plus modern power management concepts don't need the callback - so
    use this golden opportunity and get rid of this opaque and fragile
    callback from a latency sensitive code path.

    (Len Brown, Thomas Gleixner)

    - improve the AMD Erratum 400 workaround that used high overhead MSR
    polling in the idle loop (Borislav Petkov, Thomas Gleixner)"

    * 'x86-idle-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    x86: Remove empty idle.h header
    x86/amd: Simplify AMD E400 aware idle routine
    x86/amd: Check for the C1E bug post ACPI subsystem init
    x86/bugs: Separate AMD E400 erratum and C1E bug
    x86/cpufeature: Provide helper to set bugs bits
    x86/idle: Remove enter_idle(), exit_idle()
    x86: Remove x86_test_and_clear_bit_percpu()
    x86/idle: Remove is_idle flag
    x86/idle: Remove idle_notifier
    i7300_idle: Remove this driver

    Linus Torvalds
     

10 Dec, 2016

1 commit

  • AMD CPUs affected by the E400 erratum suffer from the issue that the
    local APIC timer stops when the CPU goes into C1E. Unfortunately there
    is no way to detect the affected CPUs on early boot. It's only possible
    to determine the range of possibly affected CPUs from the family/model
    range.

    The actual decision whether to enter C1E and thus cause the bug is done
    by the firmware and we need to detect that case late, after ACPI has
    been initialized.

    The current solution is to check in the idle routine whether the CPU is
    affected by reading the MSR_K8_INT_PENDING_MSG MSR and checking for the
    K8_INTP_C1E_ACTIVE_MASK bits. If one of the bits is set then the CPU is
    affected and the system is switched into forced broadcast mode.

    This is inefficient: on non-affected CPUs, every entry to idle does
    the extra RDMSR.

    After doing some research it turns out that the bits are visible on the
    boot CPU right after the ACPI subsystem is initialized in the early
    boot process. So instead of polling for the bits in the idle loop, add
    a detection function after acpi_subsystem_init() and check for the MSR
    bits. If set, then the X86_BUG_AMD_APIC_C1E is set on the boot CPU and
    the TSC is marked unstable when X86_FEATURE_NONSTOP_TSC is not set as it
    will stop in C1E state as well.

    The switch to broadcast mode cannot be done at this point because the
    boot CPU still uses HPET as a clockevent device and the local APIC timer
    is not yet calibrated and installed. The switch to broadcast mode on the
    affected CPUs needs to be done when the local APIC timer is actually set
    up.

    This allows the amd_e400_idle() function to be cleaned up in the next step.

    Signed-off-by: Thomas Gleixner
    Signed-off-by: Borislav Petkov
    Cc: Jiri Olsa
    Link: http://lkml.kernel.org/r/20161209182912.2726-4-bp@alien8.de
    Signed-off-by: Thomas Gleixner

    Thomas Gleixner
     

04 Dec, 2016

1 commit

  • Couple conflicts resolved here:

    1) In the MACB driver, a bug fix to properly initialize the
    RX tail pointer overlapped with some changes
    to support variable sized rings.

    2) In XGBE we had a "CONFIG_PM" --> "CONFIG_PM_SLEEP" fix
    overlapping with a reorganization of the driver to support
    ACPI, OF, as well as PCI variants of the chip.

    3) In 'net' we had several probe error path bug fixes to the
    stmmac driver, meanwhile a lot of this code was cleaned up
    and reorganized in 'net-next'.

    4) The cls_flower classifier obtained a helper function in
    'net-next' called __fl_delete() and this overlapped with
    Daniel Borkmann's bug fix to use RCU for object destruction
    in 'net'. It also overlapped with Jiri's change to guard
    the rhashtable_remove_fast() call with a check against
    tc_skip_sw().

    5) In mlx4, a revert bug fix in 'net' overlapped with some
    unrelated changes in 'net-next'.

    6) In geneve, a stale header pointer after pskb_expand_head()
    bug fix in 'net' overlapped with a large reorganization of
    the same code in 'net-next'. Since the 'net-next' code no
    longer had the bug in question, there was nothing to do
    other than to simply take the 'net-next' hunks.

    Signed-off-by: David S. Miller

    David S. Miller
     

30 Nov, 2016

1 commit

  • This enables CONFIG_MODVERSIONS again, but allows for missing symbol CRC
    information in order to work around the issue that newer binutils
    versions seem to occasionally drop the CRC on the floor. binutils 2.26
    seems to work fine, while binutils 2.27 seems to break MODVERSIONS of
    symbols that have been defined in assembler files.

    [ We've had random missing CRCs before - it may be an old problem that
    is just now reliably triggered with the weak asm symbols and a new
    version of binutils ]

    Some day I really do want to remove MODVERSIONS entirely. Sadly, today
    does not appear to be that day: Debian people apparently do want the
    option to enable MODVERSIONS to make it easier to have external modules
    across kernel versions, and this seems to be a fairly minimal fix for
    the annoying problem.

    Cc: Ben Hutchings
    Acked-by: Michal Marek
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     

29 Nov, 2016

1 commit

  • The newly added 'rodata_enabled' global variable is protected by
    the wrong #ifdef, leading to a link error when CONFIG_DEBUG_SET_MODULE_RONX
    is turned on:

    kernel/module.o: In function `disable_ro_nx':
    module.c:(.text.unlikely.disable_ro_nx+0x88): undefined reference to `rodata_enabled'
    kernel/module.o: In function `module_disable_ro':
    module.c:(.text.module_disable_ro+0x8c): undefined reference to `rodata_enabled'
    kernel/module.o: In function `module_enable_ro':
    module.c:(.text.module_enable_ro+0xb0): undefined reference to `rodata_enabled'

    CONFIG_SET_MODULE_RONX does not exist, so use the correct one instead.

    Fixes: 39290b389ea2 ("module: extend 'rodata=off' boot cmdline parameter to module mappings")
    Signed-off-by: Arnd Bergmann
    Signed-off-by: Jessica Yu

    Arnd Bergmann
     

28 Nov, 2016

1 commit

  • The current "rodata=off" parameter disables read-only kernel mappings
    under CONFIG_DEBUG_RODATA:
    commit d2aa1acad22f ("mm/init: Add 'rodata=off' boot cmdline parameter
    to disable read-only kernel mappings")

    This patch is a logical extension to module mappings, i.e., read-only
    mappings at module loading can be disabled even if
    CONFIG_DEBUG_SET_MODULE_RONX is enabled (mainly for debug use). Please
    note, however, that it only affects RO/RW permissions, keeping NX set.

    This is the first step toward making CONFIG_DEBUG_SET_MODULE_RONX
    mandatory (always-on) in the future, as was done for CONFIG_DEBUG_RODATA
    on x86 and arm64.

    Suggested-by: and Acked-by: Mark Rutland
    Signed-off-by: AKASHI Takahiro
    Reviewed-by: Kees Cook
    Acked-by: Rusty Russell
    Link: http://lkml.kernel.org/r/20161114061505.15238-1-takahiro.akashi@linaro.org
    Signed-off-by: Jessica Yu

    AKASHI Takahiro
     

27 Nov, 2016

1 commit


26 Nov, 2016

2 commits

  • CONFIG_MODVERSIONS has been broken for pretty much the whole 4.9 series,
    and quite frankly, nobody has cared very deeply. We absolutely know how
    to fix it, and it's not _complicated_, but it's not exactly pretty
    either.

    This oneliner fixes it without the ugliness, and allows for further
    future cleanups.

    "We've secretly replaced their regular MODVERSIONS with nothing at
    all, let's see if they notice"

    Signed-off-by: Linus Torvalds

    Linus Torvalds
     
    This patch adds two sets of eBPF program pointers to struct cgroup.
    One set holds the programs pinned directly to a cgroup, and the other
    holds the programs that are effective for it.

    To illustrate the logic behind that, assume the following example
    cgroup hierarchy.

    A - B - C
          \ D - E

    If only B has a program attached, it will be effective for B, C, D
    and E. If D then attaches a program itself, that will be effective for
    both D and E, and the program in B will only affect B and C. Only one
    program of a given type is effective for a cgroup.

    Attaching and detaching programs will be done through the bpf(2)
    syscall. For now, ingress and egress inet socket filtering are the
    only supported use-cases.

    Signed-off-by: Daniel Mack
    Acked-by: Alexei Starovoitov
    Signed-off-by: David S. Miller

    Daniel Mack
     

25 Nov, 2016

1 commit


16 Nov, 2016

1 commit

    Some embedded systems have no use for POSIX timers. This removes about
    25KB from the kernel binary size when they are configured out.

    Corresponding syscalls are routed to a stub logging the attempt to
    use those syscalls, which should be enough of a clue if they were
    disabled without proper consideration. They are: timer_create,
    timer_gettime, timer_getoverrun, timer_settime, timer_delete,
    clock_adjtime, setitimer, getitimer, alarm.

    The clock_settime, clock_gettime, clock_getres and clock_nanosleep
    syscalls are replaced by simple wrappers compatible with CLOCK_REALTIME,
    CLOCK_MONOTONIC and CLOCK_BOOTTIME only, which should cover the vast
    majority of use cases with very little code.

    Signed-off-by: Nicolas Pitre
    Acked-by: Richard Cochran
    Acked-by: Thomas Gleixner
    Acked-by: John Stultz
    Reviewed-by: Josh Triplett
    Cc: Paul Bolle
    Cc: linux-kbuild@vger.kernel.org
    Cc: netdev@vger.kernel.org
    Cc: Michal Marek
    Cc: Edward Cree
    Link: http://lkml.kernel.org/r/1478841010-28605-7-git-send-email-nicolas.pitre@linaro.org
    Signed-off-by: Thomas Gleixner

    Nicolas Pitre
     

24 Oct, 2016

1 commit


20 Oct, 2016

1 commit


16 Oct, 2016

1 commit

  • Pull gcc plugins update from Kees Cook:
    "This adds a new gcc plugin named "latent_entropy". It is designed to
    extract as much uncertainty as possible from a running system at boot
    time, hoping to capitalize on any possible variation in
    CPU operation (due to runtime data differences, hardware differences,
    SMP ordering, thermal timing variation, cache behavior, etc).

    At the very least, this plugin is a much more comprehensive example
    for how to manipulate kernel code using the gcc plugin internals"

    * tag 'gcc-plugins-v4.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
    latent_entropy: Mark functions with __latent_entropy
    gcc-plugins: Add latent_entropy plugin

    Linus Torvalds
     

15 Oct, 2016

1 commit

  • Pull kbuild updates from Michal Marek:

    - EXPORT_SYMBOL for asm source by Al Viro.

    This does bring a regression, because genksyms no longer generates
    checksums for these symbols (CONFIG_MODVERSIONS). Nick Piggin is
    working on a patch to fix this.

    Plus, we are talking about functions like strcpy(), which rarely
    change prototypes.

    - Fixes for PPC fallout of the above by Stephen Rothwell and Nick
    Piggin

    - fixdep speedup by Alexey Dobriyan.

    - preparatory work by Nick Piggin to allow architectures to build with
    -ffunction-sections, -fdata-sections and --gc-sections

    - CONFIG_THIN_ARCHIVES support by Stephen Rothwell

    - fix for filenames with colons in the initramfs source by me.

    * 'kbuild' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild: (22 commits)
    initramfs: Escape colons in depfile
    ppc: there is no clear_pages to export
    powerpc/64: whitelist unresolved modversions CRCs
    kbuild: -ffunction-sections fix for archs with conflicting sections
    kbuild: add arch specific post-link Makefile
    kbuild: allow archs to select link dead code/data elimination
    kbuild: allow architectures to use thin archives instead of ld -r
    kbuild: Regenerate genksyms lexer
    kbuild: genksyms fix for typeof handling
    fixdep: faster CONFIG_ search
    ia64: move exports to definitions
    sparc32: debride memcpy.S a bit
    [sparc] unify 32bit and 64bit string.h
    sparc: move exports to definitions
    ppc: move exports to definitions
    arm: move exports to definitions
    s390: move exports to definitions
    m68k: move exports to definitions
    alpha: move exports to actual definitions
    x86: move exports to actual definitions
    ...

    Linus Torvalds
     

12 Oct, 2016

1 commit

    Relay avoids calling wake_up_interruptible() to wake readers/consumers
    waiting for new data from the context of the process that produced the
    data. This is apparently done to prevent a possible deadlock in case the
    scheduler itself is generating data for the relay after acquiring
    rq->lock.

    The following patch used a timer (to be scheduled at next jiffy), for
    delegating the wakeup to another context.
    commit 7c9cb38302e78d24e37f7d8a2ea7eed4ae5f2fa7
    Author: Tom Zanussi
    Date: Wed May 9 02:34:01 2007 -0700

    relay: use plain timer instead of delayed work

    relay doesn't need to use schedule_delayed_work() for waking readers
    when a simple timer will do.

    Scheduling a plain timer, at the next jiffy boundary, to do the wakeup
    causes a significant wakeup latency for the userspace client, which makes
    relay less suitable for the high-frequency low-payload use cases where the
    data gets generated at a very high rate, like multiple sub buffers getting
    filled within a millisecond. Moreover the timer is re-scheduled on every
    newly produced sub buffer so the timer keeps getting pushed out if sub
    buffers are filled in a very quick succession (less than a jiffy gap
    between filling of 2 sub buffers). As a result relay runs out of sub
    buffers to store the new data.

    By using irq_work, the wakeup of a userspace client blocked in the poll
    call is done as early as possible (through a self IPI or the next timer
    tick), enabling it to always consume the data in time. Also this makes
    relay consistent with printk & ring buffers (trace), as they too use
    irq_work for deferred wake up of readers.

    [arnd@arndb.de: select CONFIG_IRQ_WORK]
    Link: http://lkml.kernel.org/r/20160912154035.3222156-1-arnd@arndb.de
    [akpm@linux-foundation.org: coding-style fixes]
    Link: http://lkml.kernel.org/r/1472906487-1559-1-git-send-email-akash.goel@intel.com
    Signed-off-by: Peter Zijlstra
    Signed-off-by: Akash Goel
    Cc: Tom Zanussi
    Cc: Chris Wilson
    Cc: Tvrtko Ursulin
    Signed-off-by: Arnd Bergmann
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Peter Zijlstra
     

11 Oct, 2016

1 commit

  • This adds a new gcc plugin named "latent_entropy". It is designed to
    extract as much uncertainty as possible from a running system at boot
    time, hoping to capitalize on any possible variation in CPU operation
    (due to runtime data differences, hardware differences, SMP ordering,
    thermal timing variation, cache behavior, etc).

    At the very least, this plugin is a much more comprehensive example for
    how to manipulate kernel code using the gcc plugin internals.

    The need for very-early boot entropy tends to be very architecture or
    system design specific, so this plugin is more suited for those sorts
    of special cases. The existing kernel RNG already attempts to extract
    entropy from reliable runtime variation, but this plugin takes the idea to
    a logical extreme by permuting a global variable based on any variation
    in code execution (e.g. a different value (and permutation function)
    is used to permute the global based on loop count, case statement,
    if/then/else branching, etc).

    To do this, the plugin starts by inserting a local variable in every
    marked function. The plugin then adds logic so that the value of this
    variable is modified by randomly chosen operations (add, xor and rol) and
    random values (gcc generates separate static values for each location at
    compile time and also injects the stack pointer at runtime). The resulting
    value depends on the control flow path (e.g., loops and branches taken).

    Before the function returns, the plugin mixes this local variable into
    the latent_entropy global variable. The value of this global variable
    is added to the kernel entropy pool in do_one_initcall() and _do_fork(),
    though it does not credit any bytes of entropy to the pool; the contents
    of the global are just used to mix the pool.

    Additionally, the plugin can pre-initialize arrays with build-time
    random contents, so that two different kernel builds running on identical
    hardware will not have the same starting values.

    Signed-off-by: Emese Revfy
    [kees: expanded commit message and code comments]
    Signed-off-by: Kees Cook

    Emese Revfy
     

08 Oct, 2016

1 commit

  • Pull parisc updates from Helge Deller:
    "Changes include:

    - Fix boot of 32bit SMP kernel (initial kernel mapping was too small)

    - Added hardened usercopy checks

    - Drop bootmem and switch to memblock and NO_BOOTMEM implementation

    - Drop the BROKEN_RODATA config option (and thus remove the relevant
    code from the generic headers and files because parisc was the last
    architecture which used this config option)

    - Improve segfault reporting by printing human readable error strings

    - Various smaller changes, e.g. dwarf debug support for assembly
    code, update comments regarding copy_user_page_asm, switch to
    kmalloc_array()"

    * 'parisc-4.9-1' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
    parisc: Increase KERNEL_INITIAL_SIZE for 32-bit SMP kernels
    parisc: Drop bootmem and switch to memblock
    parisc: Add hardened usercopy feature
    parisc: Add cfi_startproc and cfi_endproc to assembly code
    parisc: Move hpmc stack into page aligned bss section
    parisc: Fix self-detected CPU stall warnings on Mako machines
    parisc: Report trap type as human readable string
    parisc: Update comment regarding implementation of copy_user_page_asm
    parisc: Use kmalloc_array() in add_system_map_addresses()
    parisc: Check return value of smp_boot_one_cpu()
    parisc: Drop BROKEN_RODATA config option

    Linus Torvalds
     

21 Sep, 2016

1 commit


18 Sep, 2016

1 commit

  • Workqueue is currently initialized in an early init call; however,
    there are cases where early boot code has to be split and reordered to
    come after workqueue initialization or the same code path which makes
    use of workqueues is used both before workqueue initialization and
    after. The latter cases have to gate workqueue usages with
    keventd_up() tests, which is nasty and easy to get wrong.

    Workqueue usages have become widespread and it'd be a lot more
    convenient if workqueues could be used very early in boot. This patch
    splits workqueue initialization into two steps: workqueue_init_early(),
    which sets up the basic data structures so that workqueues can be created
    and work items queued, and workqueue_init(), which actually brings
    workqueues online and starts executing queued work items. The former
    step can be done very early during boot once memory allocation,
    cpumasks and idr are initialized. The latter right after kthreads
    become available.

    This allows work item queueing and canceling from very early boot
    which is what most of these use cases want.

    * As system_wq being initialized no longer indicates that workqueue is
    fully online, update keventd_up() to test wq_online instead.
    The follow-up patches will get rid of all its usages and the
    function itself.

    * Flushing doesn't make sense before workqueue is fully initialized.
    The flush functions trigger WARN and return immediately before fully
    online.

    * Work items are never in-flight before fully online. Canceling can
    always succeed by skipping the flush step.

    * Some code paths can no longer assume that they are called with IRQs
    enabled, as IRQs are disabled during early boot. Use irqsave/restore
    operations instead.

    v2: Watchdog init, which requires timers to be running, moved from
    workqueue_init_early() to workqueue_init().
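    The two-step bring-up can be pictured with a small userspace model,
    assuming a fixed-size queue and, for simplicity, synchronous execution
    once online. Every model_* name is hypothetical and none of this is the
    kernel's workqueue code; it only mirrors the rules listed above:
    queueing and canceling work before the second step, flushing warning
    and returning early, and queued items running only once workqueues are
    brought online.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical model of two-phase workqueue bring-up; not kernel code. */
#define MODEL_MAX_PENDING 16

typedef void (*work_fn)(void *arg);

struct work {
	work_fn fn;
	void *arg;
	bool pending;
};

static struct work *pending_q[MODEL_MAX_PENDING];
static int n_pending;
static bool wq_ready;	/* after model_init_early(): queueing allowed */
static bool wq_online;	/* after model_init(): queued items execute */

/* Execute everything queued (stands in for worker threads). */
static void run_pending(void)
{
	for (int i = 0; i < n_pending; i++) {
		struct work *w = pending_q[i];

		if (!w)
			continue;	/* slot emptied by a cancel */
		w->pending = false;
		w->fn(w->arg);
	}
	n_pending = 0;
}

/* Step 1: data structures only; usable once allocation works. */
void model_init_early(void)
{
	n_pending = 0;
	wq_ready = true;
}

/* Step 2: once "kthreads" exist, go online and drain the backlog. */
void model_init(void)
{
	wq_online = true;
	run_pending();
}

bool model_queue_work(struct work *w)
{
	if (!wq_ready || w->pending || n_pending == MODEL_MAX_PENDING)
		return false;
	w->pending = true;
	pending_q[n_pending++] = w;
	if (wq_online)
		run_pending();	/* simplification: run synchronously once online */
	return true;
}

/* Nothing is in flight before online, so cancel only dequeues. */
bool model_cancel_work(struct work *w)
{
	if (!w->pending)
		return false;
	for (int i = 0; i < n_pending; i++)
		if (pending_q[i] == w)
			pending_q[i] = NULL;
	w->pending = false;
	return true;
}

/* Flushing before online makes no sense: warn and return. */
bool model_flush(void)
{
	if (!wq_online) {
		fprintf(stderr, "WARN: flush before workqueue is online\n");
		return false;
	}
	run_pending();
	return true;
}

/* Example work function used in the usage note. */
int demo_runs;
void demo_fn(void *arg)
{
	(void)arg;
	demo_runs++;
}
```

    For example, after model_init_early(), queueing two items, canceling
    one and calling model_flush() runs nothing (the flush just warns);
    model_init() then runs exactly the surviving item.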

    Signed-off-by: Tejun Heo
    Suggested-by: Linus Torvalds
    Link: http://lkml.kernel.org/r/CA+55aFx0vPuMuxn00rBSM192n-Du5uxy+4AvKa0SBSOVJeuCGg@mail.gmail.com

    Tejun Heo
     

16 Sep, 2016

1 commit

  • There are a few places in the kernel that access stack memory
    belonging to a different task. Before we can start freeing task
    stacks before the task_struct is freed, we need a way for those code
    paths to pin the stack.
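    Pinning a stack that may be freed early is essentially a reference
    count that can only be taken while it is still nonzero. A minimal
    sketch with C11 atomics follows; model_task and the model_* helpers are
    hypothetical (in mainline the corresponding helpers are
    try_get_task_stack() and put_task_stack()), and the model ignores the
    races a real implementation must handle around clearing the pointer.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical model: a task whose stack may be freed before the task. */
struct model_task {
	atomic_int stack_refcount;	/* 1 while the task itself holds it */
	void *stack;
};

bool model_task_init(struct model_task *t, size_t size)
{
	t->stack = malloc(size);
	atomic_init(&t->stack_refcount, t->stack ? 1 : 0);
	return t->stack != NULL;
}

/* Pin: take a reference only if the stack is still alive (count > 0). */
void *model_try_get_stack(struct model_task *t)
{
	int old = atomic_load(&t->stack_refcount);

	while (old > 0) {
		/* On failure, old is reloaded with the current value. */
		if (atomic_compare_exchange_weak(&t->stack_refcount, &old, old + 1))
			return t->stack;
	}
	return NULL;
}

/* Unpin: the last reference frees the stack. */
void model_put_stack(struct model_task *t)
{
	if (atomic_fetch_sub(&t->stack_refcount, 1) == 1) {
		free(t->stack);
		t->stack = NULL;
	}
}
```

    A code path inspecting another task calls model_try_get_stack(), uses
    the stack only if it got a non-NULL pointer, and then drops its pin;
    the stack is released by whichever side puts the last reference.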

    Signed-off-by: Andy Lutomirski
    Cc: Borislav Petkov
    Cc: Brian Gerst
    Cc: Denys Vlasenko
    Cc: H. Peter Anvin
    Cc: Jann Horn
    Cc: Josh Poimboeuf
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Link: http://lkml.kernel.org/r/17a434f50ad3d77000104f21666575e10a9c1fbd.1474003868.git.luto@kernel.org
    Signed-off-by: Ingo Molnar

    Andy Lutomirski
     

15 Sep, 2016

1 commit

  • If an arch opts in by setting CONFIG_THREAD_INFO_IN_TASK_STRUCT,
    then thread_info is defined as a single 'u32 flags' and is the first
    entry of task_struct. thread_info::task is removed (it serves no
    purpose if thread_info is embedded in task_struct), and
    thread_info::cpu gets its own slot in task_struct.

    This is heavily based on a patch written by Linus.
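    The layout described above can be sketched in userspace with
    simplified stand-ins for the two structures (only the fields mentioned
    above; the model_* accessor names are hypothetical). Because
    thread_info is the first member, a pointer to it is also a pointer to
    the task, which is why thread_info::task becomes redundant.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified, hypothetical stand-ins for the kernel structures. */
struct thread_info {
	uint32_t flags;			/* the only remaining field */
};

struct task_struct {
	struct thread_info thread_info;	/* must be the first member */
	unsigned int cpu;		/* moved out of thread_info */
	/* ... other task fields ... */
};

_Static_assert(offsetof(struct task_struct, thread_info) == 0,
	       "thread_info must be the first member of task_struct");

/* Embedded, so no back-pointer is needed to go from task to info... */
static inline struct thread_info *model_task_thread_info(struct task_struct *t)
{
	return &t->thread_info;
}

/* ...and, at offset 0, none is needed to go back from info to task. */
static inline struct task_struct *model_ti_to_task(struct thread_info *ti)
{
	return (struct task_struct *)((char *)ti -
		offsetof(struct task_struct, thread_info));
}
```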

    Originally-from: Linus Torvalds
    Signed-off-by: Andy Lutomirski
    Cc: Borislav Petkov
    Cc: Brian Gerst
    Cc: Denys Vlasenko
    Cc: H. Peter Anvin
    Cc: Jann Horn
    Cc: Josh Poimboeuf
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Link: http://lkml.kernel.org/r/a0898196f0476195ca02713691a5037a14f2aac5.1473801993.git.luto@kernel.org
    Signed-off-by: Ingo Molnar

    Andy Lutomirski
     

09 Sep, 2016

1 commit

  • Introduce LD_DEAD_CODE_DATA_ELIMINATION option for architectures to
    select to build with -ffunction-sections, -fdata-sections, and link
    with --gc-sections. It requires some work (documented) to ensure all
    unreferenced entrypoints are live, and requires toolchain and build
    verification, so it is made a per-arch option for now.
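    What the option asks of the toolchain can be seen on a tiny file. This
    is an illustrative sketch assuming GCC and GNU ld; the file and symbol
    names are made up. With -ffunction-sections and -fdata-sections each
    function and object lands in its own section, so --gc-sections can drop
    whatever nothing references. Code reached only through a section table
    rather than a direct reference gives the linker nothing to follow,
    which is the kind of entry point that has to be kept live by hand; in a
    GNU ld linker script such sections are usually protected with KEEP().

```c
#include <assert.h>

/*
 * Hypothetical demo.c. Build with, e.g.:
 *   gcc -O2 -ffunction-sections -fdata-sections demo.c \
 *       -Wl,--gc-sections -Wl,--print-gc-sections -o demo
 * --print-gc-sections makes GNU ld list the sections it removed.
 */

int live_table[4] = { 1, 2, 3, 4 };	/* .data.live_table: referenced, kept */
int dead_table[1024] = { 1 };		/* .data.dead_table: unreferenced */

/* .text.live_sum: called from main, so the linker keeps it. */
int live_sum(void)
{
	int s = 0;

	for (int i = 0; i < 4; i++)
		s += live_table[i];
	return s;
}

/* .text.dead_helper: never called, so --gc-sections may discard it. */
int dead_helper(void)
{
	return dead_table[0];
}
```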

    On a random powerpc64le build, this yields a significant size saving.
    It boots and runs fine, but there is a lot I haven't tested as yet, so
    these savings may be reduced if there are bugs in the link.

          text      data       bss       dec  filename
      11169741   1180744   1923176  14273661  vmlinux
      10445269   1004127   1919707  13369103  vmlinux.dce

    ~700K of text and ~170K of data removed: 6% of the total kernel image
    size.
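    The rounded figures follow from the size(1) table above. A small check,
    with the numbers copied from that table (byte counts; dec is the total
    of text, data and bss):

```c
#include <assert.h>

/* Figures copied from the size(1) output above, in bytes. */
struct size_row {
	long text, data, bss, dec;
};

static const struct size_row before = { 11169741, 1180744, 1923176, 14273661 };
static const struct size_row after  = { 10445269, 1004127, 1919707, 13369103 };

long text_saved(void) { return before.text - after.text; }	/* 724472 */
long data_saved(void) { return before.data - after.data; }	/* 176617 */

/* Share of the whole image (the dec column) that was removed, in percent. */
double pct_saved(void)
{
	return 100.0 * (double)(before.dec - after.dec) / (double)before.dec;
}
```

    That gives 724472 bytes of text, 176617 bytes of data and about 6.3%
    of the total image.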

    Signed-off-by: Nicholas Piggin
    Signed-off-by: Michal Marek

    Nicholas Piggin
     

09 Aug, 2016

1 commit

  • Pull usercopy protection from Kees Cook:
    "This implements HARDENED_USERCOPY verification of copy_to_user and
    copy_from_user bounds checking for most architectures on SLAB and
    SLUB"

    * tag 'usercopy-v4.8' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
    mm: SLUB hardened usercopy support
    mm: SLAB hardened usercopy support
    s390/uaccess: Enable hardened usercopy
    sparc/uaccess: Enable hardened usercopy
    powerpc/uaccess: Enable hardened usercopy
    ia64/uaccess: Enable hardened usercopy
    arm64/uaccess: Enable hardened usercopy
    ARM: uaccess: Enable hardened usercopy
    x86/uaccess: Enable hardened usercopy
    mm: Hardened usercopy
    mm: Implement stack frame object validation
    mm: Add is_migrate_cma_page
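    The core idea of the bounds checking is that a copy of n bytes starting
    at ptr must lie entirely inside the object the allocator says ptr
    belongs to. A minimal sketch of that test, assuming the object's start
    and size are already known (obj_extent and copy_within_object are
    hypothetical names, not the hardened usercopy code):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the allocator's view of one heap object. */
struct obj_extent {
	uintptr_t start;
	size_t size;
};

/*
 * Allow a copy of n bytes at ptr only if [ptr, ptr + n) lies entirely
 * inside the object. Written to avoid overflow in ptr + n.
 */
bool copy_within_object(const struct obj_extent *obj, const void *ptr, size_t n)
{
	uintptr_t p = (uintptr_t)ptr;

	if (n == 0)
		return true;
	if (p < obj->start || p - obj->start >= obj->size)
		return false;			/* pointer outside the object */
	return n <= obj->size - (p - obj->start);	/* must not run past the end */
}
```

    Comparing against the remaining length instead of computing ptr + n
    keeps a huge, attacker-supplied n from wrapping around and passing the
    check.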

    Linus Torvalds
     

03 Aug, 2016

6 commits

  • It doesn't trim only the symbols that are totally unused in-tree; it
    trims any symbol not used by the in-tree modules actually built. If
    you've done a 'make localmodconfig' and only build a hundred or so
    modules, it's pretty likely that your out-of-tree module will come up
    lacking something...

    Hopefully this will save the next guy from a Homer Simpson "D'oh!"
    moment.
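    The rule described above can be modelled as: an export survives only
    if some module that is actually built in this configuration imports
    it. A sketch with hypothetical module and symbol names, not the
    kbuild implementation:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical description of one in-tree module's imports. */
struct module_deps {
	const char *name;
	bool built;			/* selected in this .config? */
	const char *const *uses;	/* NULL-terminated imported symbols */
};

/* An export is kept only if a *built* module imports it. */
bool export_kept(const char *sym, const struct module_deps *mods, size_t nmods)
{
	for (size_t i = 0; i < nmods; i++) {
		if (!mods[i].built)
			continue;	/* in-tree but not built: does not count */
		for (const char *const *u = mods[i].uses; *u; u++)
			if (strcmp(*u, sym) == 0)
				return true;
	}
	return false;
}
```

    An export used only by an unbuilt in-tree module is trimmed, so an
    out-of-tree module importing it will fail to load against that kernel.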

    Link: http://lkml.kernel.org/r/10177.1469787292@turing-police.cc.vt.edu
    Signed-off-by: Valdis Kletnieks
    Cc: Michal Marek
    Cc: Rusty Russell
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Valdis Kletnieks
     
  • Preparing patches with an allmodconfig kernel compiled and committing
    into a local tree has an unfortunate consequence: the kernel version
    changes (as it should), leading to recompiling and relinking of several
    files even if they weren't touched (or interesting at all). This, and
    "git-whatever" figuring out the current version, slow down compilation
    for no good reason.

    But let's face it, "allmodconfig" kernels don't care about kernel
    version; they are simply compile-check guinea pigs.

    Make LOCALVERSION_AUTO depend on !COMPILE_TEST, so it doesn't sneak into
    allmodconfig .config.

    Link: http://lkml.kernel.org/r/20160707214954.GC31678@p183.telecom.by
    Signed-off-by: Alexey Dobriyan
    Cc: Michal Marek
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
     
  • sprint_symbol_no_offset() returns the string "function_name
    [module_name]", where [module_name] is not printed for built-in kernel
    functions. This means that the blacklisting code will fail when
    comparing module function names with the extended string.

    This patch adds the functionality to block a module's module_init()
    function by finding the space in the string and truncating the
    comparison to that length.
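    The truncated comparison can be sketched as follows; symbol_matches()
    is a hypothetical helper, not the kernel function. Besides stopping at
    the first space, it also requires the lengths to match. That check is
    an addition in this sketch: strncmp() over the truncated length alone
    would let a blacklist entry such as "foo_init_x" match "foo_init [foo]".

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical helper: does a symbol string such as "foo_init [foo]"
 * (module function) or "bar_init" (built-in) name the blacklisted
 * function? Only the part before the first space is compared.
 */
bool symbol_matches(const char *blacklisted, const char *sym)
{
	const char *space = strchr(sym, ' ');
	size_t len = space ? (size_t)(space - sym) : strlen(sym);

	/* Equal length plus equal prefix means an exact name match. */
	return strlen(blacklisted) == len && strncmp(blacklisted, sym, len) == 0;
}
```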

    Link: http://lkml.kernel.org/r/1466124387-20446-1-git-send-email-prarit@redhat.com
    Signed-off-by: Prarit Bhargava
    Cc: Thomas Gleixner
    Cc: Yang Shi
    Cc: Prarit Bhargava
    Cc: Ingo Molnar
    Cc: Mel Gorman
    Cc: Rasmus Villemoes
    Cc: Kees Cook
    Cc: Yaowei Bai
    Cc: Andrey Ryabinin
    Cc: Rusty Russell
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Prarit Bhargava
     
  • There was only one use of __initdata_refok and __exit_refok.

    __init_refok was used 46 times against 82 for __ref.

    Those definitions have been obsolete since commit 312b1485fb50
    ("Introduce new section reference annotations tags: __ref, __refdata,
    __refconst").

    This patch removes the following compatibility definitions and replaces
    them treewide.

    /* compatibility defines */
    #define __init_refok __ref
    #define __initdata_refok __refdata
    #define __exit_refok __ref

    I can also provide separate patches if necessary (one patch per tree,
    then check back in a month or two to remove the old definitions).
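    The treewide replacement amounts to a whole-identifier substitution
    using the mapping in the compatibility defines above. A sketch of that
    substitution (rewrite_refok() is a hypothetical helper, not the tool
    used for the patch); matching whole identifiers keeps names that merely
    contain an old macro name untouched.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Old compatibility names and their replacements, from the defines above. */
static const struct {
	const char *old, *repl;
} refok_map[] = {
	{ "__init_refok",     "__ref" },
	{ "__initdata_refok", "__refdata" },
	{ "__exit_refok",     "__ref" },
};

static int is_ident(int c)
{
	return isalnum(c) || c == '_';
}

/*
 * Hypothetical helper: copy src to dst, replacing whole identifiers only.
 * Every replacement is shorter than the name it replaces, so a dst of
 * strlen(src) + 1 bytes is always large enough.
 */
void rewrite_refok(const char *src, char *dst)
{
	while (*src) {
		if (!is_ident((unsigned char)*src)) {
			*dst++ = *src++;
			continue;
		}
		size_t len = 0;
		while (is_ident((unsigned char)src[len]))
			len++;
		const char *out = NULL;
		for (size_t i = 0; i < sizeof refok_map / sizeof refok_map[0]; i++)
			if (strlen(refok_map[i].old) == len &&
			    strncmp(src, refok_map[i].old, len) == 0)
				out = refok_map[i].repl;
		size_t olen = out ? strlen(out) : len;
		memcpy(dst, out ? out : src, olen);	/* replacement or original */
		dst += olen;
		src += len;
	}
	*dst = '\0';
}
```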

    [akpm@linux-foundation.org: coding-style fixes]
    Link: http://lkml.kernel.org/r/1466796271-3043-1-git-send-email-fabf@skynet.be
    Signed-off-by: Fabian Frederick
    Cc: Ingo Molnar
    Cc: Sam Ravnborg
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Fabian Frederick
     
  • UML is a bit special since it has neither iomem nor DMA. That means a
    lot of drivers will not build if they miss a dependency on HAS_IOMEM.
    s390 used to have the same issues, but since it gained PCI support,
    UML is the only stranger.

    We are tired of patching dozens of new drivers after every merge window
    just to un-break allmod/yesconfig UML builds. One could argue that a
    decent driver has to know what it depends on, and therefore a missing
    HAS_IOMEM dependency is a clear driver bug. But the dependency is not
    obvious, and not everyone does UML builds with COMPILE_TEST enabled
    when developing a device driver.

    One possible solution to make these builds succeed on UML would be
    providing stub functions for ioremap() and friends which fail at
    runtime. Another is simply disabling COMPILE_TEST for UML. Since that
    is the least hassle and does not force us to fake iomem support, let's
    do the latter.
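    To see why the rejected option amounts to faking iomem support,
    consider a stub that lets drivers compile but makes every mapping fail.
    This is a hypothetical sketch (stub_ioremap() and demo_probe() are
    made-up names, not UML or driver code): the build succeeds, yet any
    driver relying on MMIO can only ever fail at probe time.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/*
 * Hypothetical stub for an arch without iomem: it satisfies the compiler
 * and linker, but the failure is deferred to runtime.
 */
static inline void *stub_ioremap(unsigned long phys_addr, size_t size)
{
	(void)phys_addr;
	(void)size;
	return NULL;	/* no MMIO on this arch: always fails */
}

/* A typical driver probe path: builds fine, can never succeed. */
int demo_probe(unsigned long mmio_base, size_t mmio_len)
{
	void *regs = stub_ioremap(mmio_base, mmio_len);

	if (!regs)
		return -ENOMEM;
	/* ... register accesses through regs would go here ... */
	return 0;
}
```

    Disabling COMPILE_TEST instead keeps such drivers out of UML builds
    entirely, so nothing has to pretend that MMIO exists.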

    Link: http://lkml.kernel.org/r/1466152995-28367-1-git-send-email-richard@nod.at
    Signed-off-by: Richard Weinberger
    Acked-by: Arnd Bergmann
    Cc: Al Viro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Richard Weinberger
     
  • The cgroup document path has changed to "cgroup-v1"; update it.

    Link: http://lkml.kernel.org/r/1470148443-6509-1-git-send-email-iamyooon@gmail.com
    Signed-off-by: seokhoon.yoon
    Cc: Joonsoo Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    seokhoon.yoon