02 Mar, 2016

1 commit

  • Make it possible to write a target state to the per cpu state file, so we can
    switch between states.

    Signed-off-by: Thomas Gleixner
    Cc: linux-arch@vger.kernel.org
    Cc: Rik van Riel
    Cc: Rafael Wysocki
    Cc: "Srivatsa S. Bhat"
    Cc: Peter Zijlstra
    Cc: Arjan van de Ven
    Cc: Sebastian Siewior
    Cc: Rusty Russell
    Cc: Steven Rostedt
    Cc: Oleg Nesterov
    Cc: Tejun Heo
    Cc: Andrew Morton
    Cc: Paul McKenney
    Cc: Linus Torvalds
    Cc: Paul Turner
    Link: http://lkml.kernel.org/r/20160226182341.022814799@linutronix.de
    Signed-off-by: Thomas Gleixner

    Thomas Gleixner
     

10 Feb, 2016

1 commit

  • Workqueue used to guarantee local execution for work items queued
    without explicit target CPU. The guarantee is gone now which can
    break some usages in subtle ways. To flush out those cases, this
    patch implements a debug feature which forces round-robin CPU
    selection for all such work items.

    The debug feature defaults to off and can be enabled with a kernel
    parameter. The default can be flipped with a debug config option.
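
    The selection logic described above can be sketched in userspace C.
    This is only an illustration under stated assumptions: NR_CPUS,
    force_round_robin, next_rr_cpu and pick_cpu_for_work() are hypothetical
    names invented for the sketch, and the default path is simplified to
    "use the local CPU".

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS 4 /* hypothetical CPU count for this sketch */

/* Hypothetical stand-in for the debug switch; off by default. */
static bool force_round_robin = false;

/* Next CPU to hand out while round-robin selection is forced. */
static int next_rr_cpu = 0;

/*
 * Choose a CPU for a work item queued without an explicit target.
 * In this sketch the local CPU is used by default; with the debug
 * switch on, CPUs are handed out in turn so that hidden locality
 * assumptions in callers show up quickly.
 */
int pick_cpu_for_work(int local_cpu)
{
    int cpu;

    assert(local_cpu >= 0 && local_cpu < NR_CPUS);

    if (!force_round_robin)
        return local_cpu;

    cpu = next_rr_cpu;
    next_rr_cpu = (next_rr_cpu + 1) % NR_CPUS;
    return cpu;
}
```

    With the switch on, repeated calls from the same CPU cycle through all
    CPUs, which is what exposes code that silently relied on local
    execution.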

    If you hit this commit during bisection, please refer to 041bd12e272c
    ("Revert "workqueue: make sure delayed work run in local cpu"") for
    more information and ping me.

    Signed-off-by: Tejun Heo

    Tejun Heo
     

22 Jan, 2016

1 commit

  • Merge third patch-bomb from Andrew Morton:
    "I'm pretty much done for -rc1 now:

    - the rest of MM, basically

    - lib/ updates

    - checkpatch, epoll, hfs, fatfs, ptrace, coredump, exit

    - cpu_mask simplifications

    - kexec, rapidio, MAINTAINERS etc, etc.

    - more dma-mapping cleanups/simplifications from hch"

    * emailed patches from Andrew Morton: (109 commits)
    MAINTAINERS: add/fix git URLs for various subsystems
    mm: memcontrol: add "sock" to cgroup2 memory.stat
    mm: memcontrol: basic memory statistics in cgroup2 memory controller
    mm: memcontrol: do not uncharge old page in page cache replacement
    Documentation: cgroup: add memory.swap.{current,max} description
    mm: free swap cache aggressively if memcg swap is full
    mm: vmscan: do not scan anon pages if memcg swap limit is hit
    swap.h: move memcg related stuff to the end of the file
    mm: memcontrol: replace mem_cgroup_lruvec_online with mem_cgroup_online
    mm: vmscan: pass memcg to get_scan_count()
    mm: memcontrol: charge swap to cgroup2
    mm: memcontrol: clean up alloc, online, offline, free functions
    mm: memcontrol: flatten struct cg_proto
    mm: memcontrol: rein in the CONFIG space madness
    net: drop tcp_memcontrol.c
    mm: memcontrol: introduce CONFIG_MEMCG_LEGACY_KMEM
    mm: memcontrol: allow to disable kmem accounting for cgroup2
    mm: memcontrol: account "kmem" consumers in cgroup2 memory controller
    mm: memcontrol: move kmem accounting code to CONFIG_MEMCG
    mm: memcontrol: separate kmem code from legacy tcp accounting code
    ...

    Linus Torvalds
     

21 Jan, 2016

2 commits

  • Larry Finger reports:
    "My PowerBook G4 Aluminum with a 32-bit PPC processor fails to boot for
    the 4.4-git series".

    This is likely due to X still needing /dev/mem access on this platform.

    CONFIG_IO_STRICT_DEVMEM is not yet safe to turn on when
    CONFIG_STRICT_DEVMEM=y.

    Remove the default so that old configurations do not change behavior.

    Fixes: 90a545e98126 ("restrict /dev/mem to idle io memory ranges")
    Reported-by: Larry Finger
    Tested-by: Larry Finger
    Link: http://marc.info/?l=linux-kernel&m=145332012023825&w=2
    Acked-by: Kees Cook
    Cc: Arnd Bergmann
    Cc: Ingo Molnar
    Cc: Russell King
    Cc: Andrew Morton
    Cc: Greg Kroah-Hartman
    Signed-off-by: Dan Williams
    Signed-off-by: Linus Torvalds

    Dan Williams
     
  • UBSAN uses compile-time instrumentation to catch undefined behavior
    (UB). The compiler inserts code that performs certain kinds of checks
    before operations that could cause UB. If a check fails (i.e. UB is
    detected), a __ubsan_handle_* function is called to print an error
    message.

    So most of the work is done by the compiler. This patch just implements
    the ubsan handlers that print the errors.
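
    As a rough illustration of the check-then-handler pattern, here is a
    hand-written userspace equivalent of an instrumented shift. It is a
    sketch, not the compiler's actual output: report_bad_shift() is a
    hypothetical stand-in for a __ubsan_handle_* routine, and
    checked_shl() and ub_reports are names made up for the example.

```c
#include <limits.h>
#include <stdio.h>

/* Number of reports so far, kept only so the sketch can be checked. */
static int ub_reports = 0;

/* Hypothetical stand-in for a __ubsan_handle_* routine: it only reports. */
static void report_bad_shift(unsigned int value, int shift)
{
    ub_reports++;
    fprintf(stderr, "UB: shift of %u by %d places is out of bounds\n",
            value, shift);
}

/*
 * Hand-written equivalent of an instrumented `value << shift`:
 * the check runs first, and the handler is called if it fails.
 */
unsigned int checked_shl(unsigned int value, int shift)
{
    const int width = (int)(sizeof(value) * CHAR_BIT);

    if (shift < 0 || shift >= width) {
        report_bad_shift(value, shift);
        return 0; /* sketch only: skip the shift so the demo stays defined */
    }
    return value << shift;
}
```

    Real instrumentation is emitted by the compiler when building with
    -fsanitize=undefined; the point here is only that the check precedes
    the operation and the handler does nothing but report.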

    GCC has had this capability since 4.9.x [1] (see the -fsanitize=undefined
    option and its suboptions).
    However, GCC 5.x has more checkers implemented [2].
    Article [3] has a bit more detail about UBSAN in GCC.

    [1] - https://gcc.gnu.org/onlinedocs/gcc-4.9.0/gcc/Debugging-Options.html
    [2] - https://gcc.gnu.org/onlinedocs/gcc/Debugging-Options.html
    [3] - http://developerblog.redhat.com/2014/10/16/gcc-undefined-behavior-sanitizer-ubsan/

    Issues which UBSAN has found thus far are:

    Found bugs:

    * out-of-bounds access - 97840cb67ff5 ("netfilter: nfnetlink: fix
    insufficient validation in nfnetlink_bind")

    undefined shifts:

    * d48458d4a768 ("jbd2: use a better hash function for the revoke
    table")

    * 10632008b9e1 ("clockevents: Prevent shift out of bounds")

    * 'x << -1' shift in ext4 -
    http://lkml.kernel.org/r/

    * undefined rol32(0) -
    http://lkml.kernel.org/r/

    * undefined dirty_ratelimit calculation -
    http://lkml.kernel.org/r/

    * undefined rounddown_pow_of_two(0) -
    http://lkml.kernel.org/r/

    * [WONTFIX] undefined shift in __bpf_prog_run -
    http://lkml.kernel.org/r/

    WONTFIX here because it should be fixed in bpf program, not in kernel.

    signed overflows:

    * 32a8df4e0b33f ("sched: Fix odd values in effective_load()
    calculations")

    * mul overflow in ntp -
    http://lkml.kernel.org/r/

    * incorrect conversion into rtc_time in rtc_time64_to_tm() -
    http://lkml.kernel.org/r/

    * unvalidated timespec in io_getevents() -
    http://lkml.kernel.org/r/

    * [NOTABUG] signed overflow in ktime_add_safe() -
    http://lkml.kernel.org/r/

    [akpm@linux-foundation.org: fix unused local warning]
    [akpm@linux-foundation.org: fix __int128 build woes]
    Signed-off-by: Andrey Ryabinin
    Cc: Peter Zijlstra
    Cc: Sasha Levin
    Cc: Randy Dunlap
    Cc: Rasmus Villemoes
    Cc: Jonathan Corbet
    Cc: Michal Marek
    Cc: Thomas Gleixner
    Cc: Ingo Molnar
    Cc: "H. Peter Anvin"
    Cc: Yury Gribov
    Cc: Dmitry Vyukov
    Cc: Konstantin Khlebnikov
    Cc: Kostya Serebryany
    Cc: Johannes Berg
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrey Ryabinin
     

17 Jan, 2016

1 commit

  • As illustrated by commit a3afe70b83fd ("[S390] latencytop s390
    support."), HAVE_LATENCYTOP_SUPPORT is defined by an architecture to
    advertise an implementation of save_stack_trace_tsk.

    However, as of 9212ddb5eada ("stacktrace: provide save_stack_trace_tsk()
    weak alias") a dummy implementation is provided if STACKTRACE=y. Given
    that LATENCYTOP already depends on STACKTRACE_SUPPORT and selects
    STACKTRACE, we can remove HAVE_LATENCYTOP_SUPPORT altogether.

    Signed-off-by: Will Deacon
    Acked-by: Heiko Carstens
    Cc: Vineet Gupta
    Cc: Russell King
    Cc: James Hogan
    Cc: Michal Simek
    Cc: Helge Deller
    Acked-by: Michael Ellerman
    Cc: "David S. Miller"
    Cc: Guan Xuetao
    Cc: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Will Deacon
     

16 Jan, 2016

1 commit

  • This patch adds a third argument to macros which create function
    definitions for page flags. This argument defines how page-flags
    helpers behave on compound pages.

    For now we define four policies:

    - PF_ANY: the helper function operates on the page it gets, regardless
    of whether it's non-compound, head or tail.

    - PF_HEAD: the helper function operates on the head page of the
    compound page if it gets a tail page.

    - PF_NO_TAIL: only head and non-compound pages are acceptable for this
    helper function.

    - PF_NO_COMPOUND: only non-compound pages are acceptable for this
    helper function.

    For now we use policy PF_ANY for all helpers, which matches current
    behaviour.

    We do not enforce the policy for TESTPAGEFLAG, because we have flags
    checked for random pages all over the kernel. A notable exception to
    this is PageTransHuge(), which triggers VM_BUG_ON() for a tail page.
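
    The four policies can be modelled in a few lines of userspace C. This
    is a sketch under assumptions: the struct page below is a simplified,
    hypothetical stand-in (not the kernel's), pf_resolve() is an invented
    helper, and a rejected page is reported by returning NULL rather than
    by a kernel assertion.

```c
#include <assert.h>
#include <stddef.h>

enum pf_policy { PF_ANY, PF_HEAD, PF_NO_TAIL, PF_NO_COMPOUND };

/* Simplified, hypothetical page: not the kernel's struct page. */
struct page {
    struct page *head;   /* NULL: non-compound; self: head; other: tail */
    unsigned long flags;
};

static int page_is_compound(const struct page *p)
{
    return p->head != NULL;
}

static int page_is_tail(const struct page *p)
{
    return p->head != NULL && p->head != p;
}

/*
 * Return the page a flag helper should operate on under the given
 * policy, or NULL if the policy does not accept this kind of page.
 */
struct page *pf_resolve(struct page *page, enum pf_policy policy)
{
    assert(page != NULL);

    switch (policy) {
    case PF_ANY:
        return page;
    case PF_HEAD:
        return page_is_tail(page) ? page->head : page;
    case PF_NO_TAIL:
        return page_is_tail(page) ? NULL : page;
    case PF_NO_COMPOUND:
        return page_is_compound(page) ? NULL : page;
    }
    return NULL;
}
```

    Using PF_ANY everywhere, as this patch does, makes pf_resolve() the
    identity function, which is why behaviour does not change yet.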

    Signed-off-by: Kirill A. Shutemov
    Cc: Andrea Arcangeli
    Cc: Hugh Dickins
    Cc: Dave Hansen
    Cc: Mel Gorman
    Cc: Rik van Riel
    Cc: Vlastimil Babka
    Cc: Christoph Lameter
    Cc: Naoya Horiguchi
    Cc: Steve Capper
    Cc: "Aneesh Kumar K.V"
    Cc: Johannes Weiner
    Cc: Michal Hocko
    Cc: Jerome Marchand
    Cc: Jérôme Glisse
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kirill A. Shutemov
     

14 Jan, 2016

1 commit

  • Pull libnvdimm updates from Dan Williams:
    "The bulk of this has appeared in -next and independently received a
    build success notification from the kbuild robot. The 'for-4.5/block-
    dax' topic branch was rebased over the weekend to drop the "block
    device end-of-life" rework that Al would like to see re-implemented
    with a notifier, and to address bug reports against the badblocks
    integration.

    There is pending feedback against "libnvdimm: Add a poison list and
    export badblocks" received last week. Linda identified some localized
    fixups that we will handle incrementally.

    Summary:

    - Media error handling: The 'badblocks' implementation that
    originated in md-raid is up-levelled to a generic capability of a
    block device. This initial implementation is limited to being
    consulted in the pmem block-i/o path. Later, 'badblocks' will be
    consulted when creating dax mappings.

    - Raw block device dax: For virtualization and other cases that want
    large contiguous mappings of persistent memory, add the capability
    to dax-mmap a block device directly.

    - Increased /dev/mem restrictions: Add an option to treat all
    io-memory as IORESOURCE_EXCLUSIVE, i.e. disable /dev/mem access
    while a driver is actively using an address range. This behavior
    is controlled via the new CONFIG_IO_STRICT_DEVMEM option and can be
    overridden by the existing "iomem=relaxed" kernel command line
    option.

    - Miscellaneous fixes include a 'pfn'-device huge page alignment fix,
    block device shutdown crash fix, and other small libnvdimm fixes"

    * tag 'libnvdimm-for-4.5' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: (32 commits)
    block: kill disk_{check|set|clear|alloc}_badblocks
    libnvdimm, pmem: nvdimm_read_bytes() badblocks support
    pmem, dax: disable dax in the presence of bad blocks
    pmem: fail io-requests to known bad blocks
    libnvdimm: convert to statically allocated badblocks
    libnvdimm: don't fail init for full badblocks list
    block, badblocks: introduce devm_init_badblocks
    block: clarify badblocks lifetime
    badblocks: rename badblocks_free to badblocks_exit
    libnvdimm, pmem: move definition of nvdimm_namespace_add_poison to nd.h
    libnvdimm: Add a poison list and export badblocks
    nfit_test: Enable DSMs for all test NFITs
    md: convert to use the generic badblocks code
    block: Add badblock management for gendisks
    badblocks: Add core badblock management code
    block: fix del_gendisk() vs blkdev_ioctl crash
    block: enable dax for raw block devices
    block: introduce bdev_file_inode()
    restrict /dev/mem to idle io memory ranges
    arch: consolidate CONFIG_STRICT_DEVM in lib/Kconfig.debug
    ...

    Linus Torvalds
     

13 Jan, 2016

1 commit

  • Pull networking updates from David Miller:

    1) Support busy polling generically, for all NAPI drivers. From Eric
    Dumazet.

    2) Add byte/packet counter support to nft_ct, from Florian Westphal.

    3) Add RSS/XPS support to mvneta driver, from Gregory Clement.

    4) Implement IPV6_HDRINCL socket option for raw sockets, from Hannes
    Frederic Sowa.

    5) Add support for T6 adapter to cxgb4 driver, from Hariprasad Shenai.

    6) Add support for VLAN device bridging to mlxsw switch driver, from
    Ido Schimmel.

    7) Add driver for Netronome NFP4000/NFP6000, from Jakub Kicinski.

    8) Provide hwmon interface to mlxsw switch driver, from Jiri Pirko.

    9) Reorganize wireless drivers into per-vendor directories just like we
    do for ethernet drivers. From Kalle Valo.

    10) Provide a way for administrators to "destroy" connected sockets via the
    SOCK_DESTROY socket netlink diag operation. From Lorenzo Colitti.

    11) Add support to add/remove multicast routes via netlink, from Nikolay
    Aleksandrov.

    12) Make TCP keepalive settings per-namespace, from Nikolay Borisov.

    13) Add forwarding and packet duplication facilities to nf_tables, from
    Pablo Neira Ayuso.

    14) Dead route support in MPLS, from Roopa Prabhu.

    15) TSO support for thunderx chips, from Sunil Goutham.

    16) Add driver for IBM's System i/p VNIC protocol, from Thomas Falcon.

    17) Rationalize, consolidate, and more completely document the checksum
    offloading facilities in the networking stack. From Tom Herbert.

    18) Support aborting an ongoing scan in mac80211/cfg80211, from
    Vidyullatha Kanchanapally.

    19) Use per-bucket spinlock for bpf hash facility, from Tom Leiming.

    * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1375 commits)
    net: bnxt: always return values from _bnxt_get_max_rings
    net: bpf: reject invalid shifts
    phonet: properly unshare skbs in phonet_rcv()
    dwc_eth_qos: Fix dma address for multi-fragment skbs
    phy: remove an unneeded condition
    mdio: remove an unneed condition
    mdio_bus: NULL dereference on allocation error
    net: Fix typo in netdev_intersect_features
    net: freescale: mac-fec: Fix build error from phy_device API change
    net: freescale: ucc_geth: Fix build error from phy_device API change
    bonding: Prevent IPv6 link local address on enslaved devices
    IB/mlx5: Add flow steering support
    net/mlx5_core: Export flow steering API
    net/mlx5_core: Make ipv4/ipv6 location more clear
    net/mlx5_core: Enable flow steering support for the IB driver
    net/mlx5_core: Initialize namespaces only when supported by device
    net/mlx5_core: Set priority attributes
    net/mlx5_core: Connect flow tables
    net/mlx5_core: Introduce modify flow table command
    net/mlx5_core: Managing root flow table
    ...

    Linus Torvalds
     

12 Jan, 2016

1 commit

  • Pull MMC updates from Ulf Hansson:
    "MMC core:
    - Optimize boot time by detecting cards simultaneously
    - Make runtime resume default behavior for MMC/SD
    - Enable MMC/SD/SDIO devices to suspend/resume asynchronously
    - Allow more than 8 partitions per card
    - Introduce MMC_CAP2_NO_SDIO to prevent unsupported SDIO commands
    - Support the standard DT wakeup-source property
    - Fix driver strength switching for HS200 and HS400
    - Fix switch command timeout
    - Fix invalid vdd in voltage switch power cycle for SDIO

    MMC host:
    - sdhci: Restore behavior when setting VDD via external regulator
    - sdhci: A couple of changes/fixes related to the dma support
    - sdhci-tegra: Add Tegra210 support
    - sdhci-tegra: Support for UHS-I cards including tuning support
    - sdhci-of-at91: Add PM support
    - sh_mmcif: Rework dma channel handling
    - mvsdio: Delete platform data code path"

    * tag 'mmc-v4.5' of git://git.linaro.org/people/ulf.hansson/mmc: (52 commits)
    mmc: dw_mmc: remove the unused quirks
    mmc: sdhci-pci: use to_pci_dev()
    mmc: cb710: use to_platform_device()
    mmc: tegra: use correct accessor for misc ctrl register
    mmc: tegra: enable UHS-I modes
    mmc: tegra: implement UHS tuning
    mmc: tegra: disable SPI_MODE_CLKEN
    mmc: tegra: implement module external clock change
    mmc: sdhci: restore behavior when setting VDD via external regulator
    mmc: It is not an error for the card to be removed while suspended
    mmc: block: Allow more than 8 partitions per card
    mmc: core: Optimize boot time by detecting cards simultaneously
    mmc: dw_mmc: use resource_size_t to store physical address
    mmc: core: fix __mmc_switch timeout caused by preempt
    mmc: usdhi6rol0: handle NULL data in timeout
    mmc: of_mmc_spi: Add IRQF_ONESHOT to interrupt flags
    mmc: mediatek: change some dev_err to dev_dbg
    mmc: enable MMC/SD/SDIO device to suspend/resume asynchronously
    mmc: sdhci: Fix sdhci_runtime_pm_bus_on/off()
    mmc: sdhci: 64-bit DMA actually has 4-byte alignment
    ...

    Linus Torvalds
     

09 Jan, 2016

2 commits

  • This effectively promotes IORESOURCE_BUSY to IORESOURCE_EXCLUSIVE
    semantics by default. If userspace really believes it is safe to access
    the memory region it can also perform the extra step of disabling an
    active driver. This protects device address ranges with read side
    effects and otherwise directs userspace to use the driver.

    Persistent memory presents a large "mistake surface" to /dev/mem as now
    accidental writes can corrupt a filesystem.

    In general if a device driver is busily using a memory region it already
    informs other parts of the kernel to not touch it via
    request_mem_region(). /dev/mem should honor the same safety restriction
    by default. Debugging a device driver from userspace becomes more
    difficult with this enabled. Any application using /dev/mem or mmap of
    sysfs pci resources will now need to perform the extra step of either:

    1/ Disabling the driver, for example:

    echo > /dev/bus//drivers//unbind

    2/ Rebooting with "iomem=relaxed" on the command line

    3/ Recompiling with CONFIG_IO_STRICT_DEVMEM=n

    Traditional users of /dev/mem like dosemu are unaffected because the
    first 1MB of memory is not subject to the IO_STRICT_DEVMEM restriction.
    Legacy X configurations use /dev/mem to talk to graphics hardware, but
    that functionality has since moved to kernel graphics drivers.
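
    The access rule described above can be summarised as a small decision
    function. This is a simplified userspace model, not kernel code: it
    covers only the io-memory decision, and struct busy_range,
    devmem_io_access_ok() and LOW_MEM_LIMIT are hypothetical names chosen
    for the sketch. The "strict" argument stands for the option being
    enabled and not overridden by "iomem=relaxed".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define LOW_MEM_LIMIT 0x100000ULL /* first 1MB is exempt, per the text above */

/* Address range claimed by a driver, bounds inclusive (hypothetical type). */
struct busy_range {
    uint64_t start;
    uint64_t end;
};

/*
 * Decide whether a /dev/mem style access to an io-memory address is
 * allowed. With strict checking off, busy ranges are not consulted.
 */
bool devmem_io_access_ok(uint64_t addr, const struct busy_range *busy,
                         size_t nr_busy, bool strict)
{
    size_t i;

    assert(busy != NULL || nr_busy == 0);

    if (!strict)
        return true;
    if (addr < LOW_MEM_LIMIT)
        return true;

    for (i = 0; i < nr_busy; i++) {
        if (addr >= busy[i].start && addr <= busy[i].end)
            return false; /* a driver is actively using this range */
    }
    return true;
}
```

    Unbinding the driver corresponds to removing its entry from the busy
    list, after which the same address is accessible again.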

    Cc: Arnd Bergmann
    Cc: Russell King
    Cc: Andrew Morton
    Cc: Greg Kroah-Hartman
    Acked-by: Kees Cook
    Acked-by: Ingo Molnar
    Signed-off-by: Dan Williams

    Dan Williams
     
  • Let all the archs that implement devmem_is_allowed() opt-in to a common
    definition of CONFIG_STRICT_DEVMEM in lib/Kconfig.debug.

    Cc: Kees Cook
    Cc: Russell King
    Cc: Will Deacon
    Cc: Benjamin Herrenschmidt
    Cc: Martin Schwidefsky
    Cc: Thomas Gleixner
    Cc: "H. Peter Anvin"
    Cc: Andrew Morton
    Cc: Greg Kroah-Hartman
    Cc: "David S. Miller"
    Acked-by: Catalin Marinas
    Acked-by: Heiko Carstens
    [heiko: drop 'default y' for s390]
    Acked-by: Ingo Molnar
    Suggested-by: Arnd Bergmann
    Signed-off-by: Dan Williams

    Dan Williams
     

22 Dec, 2015

1 commit


09 Dec, 2015

1 commit

  • Workqueue stalls can happen from a variety of usage bugs such as
    missing WQ_MEM_RECLAIM flag or concurrency managed work item
    indefinitely staying RUNNING. These stalls can be extremely difficult
    to hunt down because the usual warning mechanisms can't detect
    workqueue stalls and the internal state is pretty opaque.

    To alleviate the situation, this patch implements a workqueue lockup
    detector. It periodically monitors all worker_pools and, if any pool
    fails to make forward progress for longer than the threshold
    duration, triggers a warning and dumps the workqueue state as follows.

    BUG: workqueue lockup - pool cpus=0 node=0 flags=0x0 nice=0 stuck for 31s!
    Showing busy workqueues and worker pools:
    workqueue events: flags=0x0
    pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=17/256
    pending: monkey_wrench_fn, e1000_watchdog, cache_reap, vmstat_shepherd, release_one_tty, release_one_tty, release_one_tty, release_one_tty, release_one_tty, release_one_tty, release_one_tty, release_one_tty, release_one_tty, release_one_tty, release_one_tty, release_one_tty, cgroup_release_agent
    workqueue events_power_efficient: flags=0x80
    pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=2/256
    pending: check_lifetime, neigh_periodic_work
    workqueue cgroup_pidlist_destroy: flags=0x0
    pwq 0: cpus=0 node=0 flags=0x0 nice=0 active=1/1
    pending: cgroup_pidlist_destroy_work_fn
    ...

    The detection mechanism is controlled through the kernel parameter
    workqueue.watchdog_thresh and can be updated at runtime through the
    sysfs module parameter file.
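
    The core of such a detector is a periodic scan that compares each
    pool's last forward progress against the threshold. The following is a
    minimal userspace sketch of that scan. struct pool_state and
    count_stalled_pools() are hypothetical names, times are plain seconds,
    and both "a zero threshold disables detection" and "pools without
    pending work are skipped" are assumptions of the sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-pool bookkeeping for this sketch. */
struct pool_state {
    unsigned long last_progress; /* time of last forward progress, seconds */
    int nr_pending;              /* queued work items still waiting */
};

/*
 * Count pools that have pending work but made no forward progress for
 * longer than thresh seconds. Assumes now >= last_progress for every
 * pool.
 */
size_t count_stalled_pools(const struct pool_state *pools, size_t nr_pools,
                           unsigned long now, unsigned long thresh)
{
    size_t i, stalled = 0;

    assert(pools != NULL || nr_pools == 0);

    if (thresh == 0)
        return 0; /* assumption: a zero threshold turns detection off */

    for (i = 0; i < nr_pools; i++) {
        if (pools[i].nr_pending == 0)
            continue; /* a pool with nothing queued cannot be stuck */
        if (now - pools[i].last_progress > thresh)
            stalled++;
    }
    return stalled;
}
```

    With a threshold of 30, a pool whose last progress was 31 seconds ago
    and which still has pending items is counted, matching the "stuck for
    31s" line in the sample output above.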

    v2: Decoupled from softlockup control knobs.

    Signed-off-by: Tejun Heo
    Acked-by: Don Zickus
    Cc: Ulrich Obergfell
    Cc: Michal Hocko
    Cc: Chris Mason
    Cc: Andrew Morton

    Tejun Heo
     

02 Dec, 2015

1 commit

  • This module allows errors to be injected into some of netdevice's notifier
    events. All network drivers use these notifiers to signal various events
    and to check if they are allowed, e.g. PRECHANGEMTU and CHANGEMTU
    afterwards. Until recently I had to run failure tests by injecting
    a custom module, but now this infrastructure makes it trivial to test
    these failure paths. Some of the recent bugs I fixed were found using
    this module.
    Here's an example:
    $ cd /sys/kernel/debug/notifier-error-inject/netdev
    $ echo -22 > actions/NETDEV_CHANGEMTU/error
    $ ip link set eth0 mtu 1024
    RTNETLINK answers: Invalid argument
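
    What the injected error does to the caller can be modelled in userspace
    C. This is a sketch, not the module's code: injected_error[], notify()
    and set_mtu() are hypothetical names, and the rollback on a failed
    CHANGEMTU notification is an assumption of the model. The value -22 in
    the shell example is -EINVAL on Linux, hence "Invalid argument".

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

enum netdev_event { EV_PRECHANGEMTU, EV_CHANGEMTU, EV_COUNT };

/* Injected result per event: 0 means "no error", else a negative errno. */
static int injected_error[EV_COUNT];

/* Stand-in for running the notifier chain: returns the injected error. */
static int notify(enum netdev_event event)
{
    return injected_error[event];
}

/*
 * Change an MTU the way a notifier-guarded operation would: ask first,
 * apply, notify afterwards, and undo the change if that fails.
 */
int set_mtu(int *mtu, int new_mtu)
{
    int old_mtu, err;

    assert(mtu != NULL);

    err = notify(EV_PRECHANGEMTU);
    if (err)
        return err;

    old_mtu = *mtu;
    *mtu = new_mtu;

    err = notify(EV_CHANGEMTU);
    if (err) {
        *mtu = old_mtu; /* failure path under test: roll back */
        return err;
    }
    return 0;
}
```

    Setting injected_error[EV_CHANGEMTU] = -EINVAL before calling set_mtu()
    exercises the rollback branch, which is the kind of rarely taken path
    this module is meant to reach.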

    CC: Akinobu Mita
    CC: "David S. Miller"
    CC: netdev
    Signed-off-by: Nikolay Aleksandrov
    Signed-off-by: David S. Miller

    Nikolay Aleksandrov
     

10 Nov, 2015

1 commit


07 Nov, 2015

1 commit

  • This adds a simple module for testing the kernel's printf facilities.
    Previously, some %p extensions have caused a wrong return value in case
    the entire output didn't fit and/or been unusable in kasprintf(). This
    should help catch such issues. Also, it should help ensure that changes
    to the formatting algorithms don't break anything.

    I'm not sure if we have a struct dentry or struct file lying around at
    boot time or if we can fake one, but most %p extensions should be
    testable, as should the ordinary number and string formatting.

    The nature of vararg functions means we can't use a more conventional
    table-driven approach.

    For now, this is mostly a skeleton; contributions are very
    welcome. Some tests are/will be slightly annoying to write, since the
    expected output depends on stuff like CONFIG_*, sizeof(long), runtime
    values etc.
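
    The kind of check involved can be shown with the C library's
    snprintf(), which shares the "return the length that would have been
    written" convention. This is a userspace sketch, not the test module:
    check_int_format() is a hypothetical helper, and it is fixed to a
    single int argument precisely because varargs cannot be driven from a
    generic table.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Format one int with fmt and compare against expect, once with a
 * large buffer and once with a deliberately small one.
 * Returns 1 if every check passes, 0 otherwise.
 */
int check_int_format(const char *fmt, int arg, const char *expect)
{
    char buf[64];
    size_t want, small;
    int ret;

    assert(fmt != NULL && expect != NULL);

    want = strlen(expect);
    if (want >= sizeof(buf))
        return 0; /* expectation too long for this sketch */

    /* Large buffer: both the text and the return value must match. */
    ret = snprintf(buf, sizeof(buf), fmt, arg);
    if (ret != (int)want || strcmp(buf, expect) != 0)
        return 0;

    /*
     * Small buffer: the return value must still be the full length,
     * and the output must be a NUL-terminated prefix of expect.
     */
    small = want / 2 + 1;
    memset(buf, 'x', sizeof(buf));
    ret = snprintf(buf, small, fmt, arg);
    if (ret != (int)want)
        return 0;
    if (buf[small - 1] != '\0' || strncmp(buf, expect, small - 1) != 0)
        return 0;

    return 1;
}
```

    The truncated-buffer half is the interesting part: a formatter that
    returns the number of bytes actually stored, instead of the full
    length, passes the first check and fails the second.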

    Signed-off-by: Rasmus Villemoes
    Reviewed-by: Kees Cook
    Cc: Andy Shevchenko
    Cc: Martin Kletzander
    Cc: Rasmus Villemoes
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Rasmus Villemoes
     

23 Oct, 2015

1 commit

  • When the kernel is compiled with KASAN=y, GCC adds redzones for each
    variable on stack. This enlarges function's stack frame and causes:

    'warning: the frame size of X bytes is larger than Y bytes'

    The worst case I've seen so far is the following:

    ../net/wireless/nl80211.c: In function `nl80211_send_wiphy':
    ../net/wireless/nl80211.c:1731:1: warning: the frame size of 5448 bytes is larger than 2048 bytes [-Wframe-larger-than=]

    That kind of warning becomes useless with KASAN=y. It doesn't
    necessarily indicate that there is some problem in the code, thus we
    should turn it off.

    (The KASAN=y stack size is increased from 16k to 32k for this reason)

    Signed-off-by: Andrey Ryabinin
    Reported-by: Fengguang Wu
    Acked-by: Abylay Ospan
    Cc: Andi Kleen
    Cc: Ingo Molnar
    Cc: Mauro Carvalho Chehab
    Cc: Kozlov Sergey
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrey Ryabinin
     

06 Oct, 2015

1 commit

  • The section mismatch warning can be easy to miss during the kernel build
    process. Allow it to be marked as fatal to be easily caught and prevent
    bugs from slipping in.

    Setting CONFIG_SECTION_MISMATCH_WARN_ONLY=y causes these warnings to be
    non-fatal, since there are a number of section mismatches when using
    allmodconfig on some architectures, and we do not want to break these
    builds by default.

    Signed-off-by: Nicolas Boichat
    Change-Id: Ic346706e3297c9f0d790e3552aa94e5cff9897a6
    Signed-off-by: Rusty Russell

    Nicolas Boichat
     

04 Sep, 2015

1 commit

  • Pull locking and atomic updates from Ingo Molnar:
    "Main changes in this cycle are:

    - Extend atomic primitives with coherent logic op primitives
    (atomic_{or,and,xor}()) and deprecate the old partial APIs
    (atomic_{set,clear}_mask())

    The old ops were incoherent with incompatible signatures across
    architectures and with incomplete support. Now every architecture
    supports the primitives consistently (by Peter Zijlstra)

    - Generic support for 'relaxed atomics':

    - _acquire/release/relaxed() flavours of xchg(), cmpxchg() and {add,sub}_return()
    - atomic_read_acquire()
    - atomic_set_release()

    This came out of porting qrwlock code to arm64 (by Will Deacon)

    - Clean up the fragile static_key APIs that were causing repeat bugs,
    by introducing a new one:

    DEFINE_STATIC_KEY_TRUE(name);
    DEFINE_STATIC_KEY_FALSE(name);

    which define a key of different types with an initial true/false
    value.

    Then allow:

    static_branch_likely()
    static_branch_unlikely()

    to take a key of either type and emit the right instruction for the
    case. To be able to know the 'type' of the static key we encode it
    in the jump entry (by Peter Zijlstra)

    - Static key self-tests (by Jason Baron)

    - qrwlock optimizations (by Waiman Long)

    - small futex enhancements (by Davidlohr Bueso)

    - ... and misc other changes"
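
    For readers unfamiliar with the two atomics items above, the same
    ideas exist in standard C11 and can be tried in userspace. The sketch
    below uses <stdatomic.h>, not the kernel's atomic_*() API; state_bits
    and the state_*() helpers are hypothetical names. It shows logic ops
    with relaxed ordering and an acquire load paired with a release store.

```c
#include <stdatomic.h>

/* Shared bit field; zero-initialized because it has static storage. */
static atomic_uint state_bits;

/* Logic ops with relaxed ordering: atomic, but no ordering guarantee. */
void state_set(unsigned int mask)
{
    atomic_fetch_or_explicit(&state_bits, mask, memory_order_relaxed);
}

void state_clear(unsigned int mask)
{
    atomic_fetch_and_explicit(&state_bits, ~mask, memory_order_relaxed);
}

void state_toggle(unsigned int mask)
{
    atomic_fetch_xor_explicit(&state_bits, mask, memory_order_relaxed);
}

/* Release store: earlier writes are visible to a matching acquire load. */
void state_publish(unsigned int value)
{
    atomic_store_explicit(&state_bits, value, memory_order_release);
}

/* Acquire load: pairs with state_publish() on another thread. */
unsigned int state_read_acquire(void)
{
    return atomic_load_explicit(&state_bits, memory_order_acquire);
}
```

    Having or/and/xor as one consistent family, rather than separate
    set-mask and clear-mask helpers with differing signatures, is the
    coherence the first item refers to.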

    * 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (63 commits)
    jump_label/x86: Work around asm build bug on older/backported GCCs
    locking, ARM, atomics: Define our SMP atomics in terms of _relaxed() operations
    locking, include/llist: Use linux/atomic.h instead of asm/cmpxchg.h
    locking/qrwlock: Make use of _{acquire|release|relaxed}() atomics
    locking/qrwlock: Implement queue_write_unlock() using smp_store_release()
    locking/lockref: Remove homebrew cmpxchg64_relaxed() macro definition
    locking, asm-generic: Add _{relaxed|acquire|release}() variants for 'atomic_long_t'
    locking, asm-generic: Rework atomic-long.h to avoid bulk code duplication
    locking/atomics: Add _{acquire|release|relaxed}() variants of some atomic operations
    locking, compiler.h: Cast away attributes in the WRITE_ONCE() magic
    locking/static_keys: Make verify_keys() static
    jump label, locking/static_keys: Update docs
    locking/static_keys: Provide a selftest
    jump_label: Provide a self-test
    s390/uaccess, locking/static_keys: employ static_branch_likely()
    x86, tsc, locking/static_keys: Employ static_branch_likely()
    locking/static_keys: Add selftest
    locking/static_keys: Add a new static_key interface
    locking/static_keys: Rework update logic
    locking/static_keys: Add static_key_{en,dis}able() helpers
    ...

    Linus Torvalds
     

04 Aug, 2015

1 commit


03 Aug, 2015

2 commits

  • The 'jump label' self-test is in reality testing static keys - rename things
    accordingly.

    Also prettify the code in various places while at it.

    Acked-by: Peter Zijlstra (Intel)
    Cc: Andrew Morton
    Cc: Jason Baron
    Cc: Linus Torvalds
    Cc: Paul E. McKenney
    Cc: Peter Zijlstra
    Cc: Shuah Khan
    Cc: Thomas Gleixner
    Cc: benh@kernel.crashing.org
    Cc: bp@alien8.de
    Cc: davem@davemloft.net
    Cc: ddaney@caviumnetworks.com
    Cc: heiko.carstens@de.ibm.com
    Cc: linux-kernel@vger.kernel.org
    Cc: liuj97@gmail.com
    Cc: luto@amacapital.net
    Cc: michael@ellerman.id.au
    Cc: rabin@rab.in
    Cc: ralf@linux-mips.org
    Cc: rostedt@goodmis.org
    Cc: vbabka@suse.cz
    Cc: will.deacon@arm.com
    Link: http://lkml.kernel.org/r/0c091ecebd78a879ed8a71835d205a691a75ab4e.1438227999.git.jbaron@akamai.com
    Signed-off-by: Ingo Molnar

    Ingo Molnar
     
  • Signed-off-by: Jason Baron
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Andrew Morton
    Cc: Linus Torvalds
    Cc: Paul E. McKenney
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: benh@kernel.crashing.org
    Cc: bp@alien8.de
    Cc: davem@davemloft.net
    Cc: ddaney@caviumnetworks.com
    Cc: heiko.carstens@de.ibm.com
    Cc: linux-kernel@vger.kernel.org
    Cc: liuj97@gmail.com
    Cc: luto@amacapital.net
    Cc: michael@ellerman.id.au
    Cc: rabin@rab.in
    Cc: ralf@linux-mips.org
    Cc: rostedt@goodmis.org
    Cc: shuahkh@osg.samsung.com
    Cc: vbabka@suse.cz
    Cc: will.deacon@arm.com
    Link: http://lkml.kernel.org/r/0c091ecebd78a879ed8a71835d205a691a75ab4e.1438227999.git.jbaron@akamai.com
    Signed-off-by: Ingo Molnar

    Jason Baron
     

23 Jul, 2015

1 commit


20 Jul, 2015

2 commits

  • No one uses this anymore, and this is not the first time the
    idea of replacing it with a (now possible) userspace side has
    come up. Lock stealing logic was removed long ago, when the lock
    was granted to the highest prio.

    Signed-off-by: Davidlohr Bueso
    Cc: Darren Hart
    Cc: Steven Rostedt
    Cc: Mike Galbraith
    Cc: Paul E. McKenney
    Cc: Sebastian Andrzej Siewior
    Cc: Davidlohr Bueso
    Cc: Peter Zijlstra
    Link: http://lkml.kernel.org/r/1435782588-4177-2-git-send-email-dave@stgolabs.net
    Signed-off-by: Thomas Gleixner

    Davidlohr Bueso
     
  • Although futexes are well known for being a royal pita,
    we really have very little debugging capabilities - except
    for relying on tglx's eye half the time.

    By simply making use of the existing fault-injection machinery,
    we can improve this situation, allowing generating artificial
    uaddress faults and deadlock scenarios. Of course, when this is
    disabled in production systems, the overhead for failure checks
    is practically zero -- so this is very cheap at the same time.
    It would be nice, as future work, to enhance trinity to make use of
    this.

    There is a special tunable 'ignore-private', which can filter
    out private futexes. Given the tsk->make_it_fail filter and
    this option, pi futexes can be narrowed down pretty closely.
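
    The filtering idea can be sketched as a small decision function. This
    is a deterministic userspace model, not the kernel's fault-injection
    code: struct futex_fail_cfg and should_fail_futex() are hypothetical
    names, and "fail every Nth eligible call" is an assumption standing in
    for the real machinery's tunables.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical configuration for this sketch. */
struct futex_fail_cfg {
    bool enabled;          /* master switch; off means no overhead path */
    bool ignore_private;   /* models the 'ignore-private' tunable */
    unsigned int interval; /* fail every Nth eligible call */
    unsigned int seen;     /* eligible calls counted so far */
};

/*
 * Decide whether this futex operation should fail artificially.
 * Private futexes are never failed when ignore_private is set.
 */
bool should_fail_futex(struct futex_fail_cfg *cfg, bool is_private)
{
    assert(cfg != NULL);

    if (!cfg->enabled || cfg->interval == 0)
        return false;
    if (cfg->ignore_private && is_private)
        return false;

    cfg->seen++;
    return cfg->seen % cfg->interval == 0;
}
```

    Filtered-out calls are not counted, so the injected failures land only
    on the futexes that remain of interest.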

    Signed-off-by: Davidlohr Bueso
    Cc: Peter Zijlstra
    Cc: Darren Hart
    Cc: Davidlohr Bueso
    Link: http://lkml.kernel.org/r/1435645562-975-3-git-send-email-dave@stgolabs.net
    Signed-off-by: Thomas Gleixner

    Davidlohr Bueso
     

18 Jul, 2015

1 commit

  • The CONFIG_RCU_CPU_STALL_INFO has been default-y for a couple of
    releases with no complaints, so it is time to eliminate this Kconfig
    option entirely, so that the long-form RCU CPU stall warnings cannot
    be disabled. This commit does just that.

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     

04 Jul, 2015

1 commit

  • Both CONFIG_SCHEDSTATS=y and CONFIG_TASK_DELAY_ACCT=y track task
    sched_info, which results in ugly #if clauses.

    Simplify the code by introducing a synthetic CONFIG_SCHED_INFO
    switch, selected by both.
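
    The effect on the C side can be illustrated with a small standalone
    file. This is a sketch: the Kconfig "select" is emulated here with the
    preprocessor, and struct task, sched_info_enabled() and
    account_run_delay() are hypothetical names. The compound condition
    appears once, and everything else tests the single shared switch.

```c
/* Pretend configuration for this sketch: only delay accounting is on. */
#define CONFIG_TASK_DELAY_ACCT 1

/* Emulates the Kconfig "select": either option enables the shared switch. */
#if defined(CONFIG_SCHEDSTATS) || defined(CONFIG_TASK_DELAY_ACCT)
#define CONFIG_SCHED_INFO 1
#endif

/* Hypothetical task structure for illustration. */
struct task {
    int pid;
#ifdef CONFIG_SCHED_INFO
    unsigned long long run_delay; /* time spent waiting to run */
#endif
};

/* Users of sched_info now need only one simple test. */
int sched_info_enabled(void)
{
#ifdef CONFIG_SCHED_INFO
    return 1;
#else
    return 0;
#endif
}

/* Accumulate waiting time; compiles to nothing useful when disabled. */
void account_run_delay(struct task *t, unsigned long long delta)
{
#ifdef CONFIG_SCHED_INFO
    t->run_delay += delta;
#else
    (void)t;
    (void)delta;
#endif
}
```

    Without the shared switch, each of the #ifdef lines above would have to
    repeat the two-option condition.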

    Signed-off-by: Naveen N. Rao
    Cc: Balbir Singh
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Srikar Dronamraju
    Cc: Thomas Gleixner
    Cc: a.p.zijlstra@chello.nl
    Cc: ricklind@us.ibm.com
    Link: http://lkml.kernel.org/r/8d19eef800811a94b0f91bcbeb27430a884d7433.1435255405.git.naveen.n.rao@linux.vnet.ibm.com
    Signed-off-by: Ingo Molnar

    Naveen N. Rao
     

28 May, 2015

3 commits

  • This commit applies some warning-omission micro-optimizations to RCU's
    various extended-quiescent-state functions, which are on the kernel/user
    hotpath for CONFIG_NO_HZ_FULL=y.

    Reported-by: Rik van Riel
    Reported-by: Mike Galbraith
    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • Currently, Kconfig will ask the user whether TASKS_RCU should be set.
    This is silly because Kconfig already has all the information that it
    needs to set this parameter. This commit therefore directly drives
    the value of TASKS_RCU via "select" statements. Which means that
    as subsystems require TASKS_RCU, those subsystems will need to add
    "select" statements of their own.

    Reported-by: Ingo Molnar
    Signed-off-by: Paul E. McKenney
    Cc: Steven Rostedt
    Reviewed-by: Pranith Kumar

    Paul E. McKenney
     
  • Grace-period scans of the rcu_node combining tree normally
    proceed quite quickly, so that it is very difficult to reproduce
    races against them. This commit therefore allows grace-period
    pre-initialization and cleanup to be artificially slowed down,
    increasing race-reproduction probability. A pair of pairs of new
    Kconfig parameters are provided, RCU_TORTURE_TEST_SLOW_PREINIT to
    enable the slowing down of propagating CPU-hotplug changes up the
    combining tree along with RCU_TORTURE_TEST_SLOW_PREINIT_DELAY to
    specify the delay in jiffies, and RCU_TORTURE_TEST_SLOW_CLEANUP
    to enable the slowing down of the end-of-grace-period cleanup scan
    along with RCU_TORTURE_TEST_SLOW_CLEANUP_DELAY to specify the delay
    in jiffies. Boot-time parameters named rcutree.gp_preinit_delay and
    rcutree.gp_cleanup_delay allow these delays to be specified at boot time.

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     

07 May, 2015

1 commit


18 Apr, 2015

1 commit


15 Apr, 2015

5 commits

  • In a misguided attempt to avoid an #ifdef, the use of the
    gp_init_delay module parameter was conditioned on the corresponding
    RCU_TORTURE_TEST_SLOW_INIT Kconfig variable, using IS_ENABLED() at
    the point of use in the code. This meant that the compiler always saw
    the delay, which meant that RCU_TORTURE_TEST_SLOW_INIT_DELAY had to be
    unconditionally defined. This in turn caused "make oldconfig" to ask
    pointless questions about the value of RCU_TORTURE_TEST_SLOW_INIT_DELAY
    in cases where it was not even used.

    This commit avoids these pointless questions by defining gp_init_delay
    under #ifdef. In one branch, gp_init_delay is initialized to
    RCU_TORTURE_TEST_SLOW_INIT_DELAY and is also a module parameter (thus
    allowing boot-time modification), and in the other branch gp_init_delay
    is a const variable initialized by default to zero.

    This approach also simplifies the code at the delay point by eliminating
    the IS_ENABLED() check. Because gp_init_delay is constant zero in the no-delay
    case intended for production use, the "gp_init_delay > 0" check causes
    the delay to become dead code, as desired in this case. In addition,
    this commit replaces magic constant "10" with the preprocessor variable
    PER_RCU_NODE_PERIOD, which controls the number of grace periods that
    are allowed to elapse at full speed before a delay is inserted.
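
    The dead-code effect described above can be illustrated with a minimal
    standalone sketch. It is not the kernel code: SLOW_INIT and
    SLOW_INIT_DELAY are hypothetical stand-ins for the Kconfig symbols,
    init_delay_for() is an invented helper, and the value 10 for
    PER_RCU_NODE_PERIOD is assumed from the "magic constant" mentioned in
    the text.

```c
/*
 * Hypothetical stand-ins: SLOW_INIT plays the role of
 * RCU_TORTURE_TEST_SLOW_INIT, SLOW_INIT_DELAY the role of
 * RCU_TORTURE_TEST_SLOW_INIT_DELAY.
 */
#ifdef SLOW_INIT
#define SLOW_INIT_DELAY 3
/* Debug branch: a writable variable (a module parameter in the kernel). */
static int gp_init_delay = SLOW_INIT_DELAY;
#else
/* Production branch: constant zero, so the delay path is dead code. */
static const int gp_init_delay = 0;
#endif

/* Assumed value: grace periods allowed at full speed before a delay. */
#define PER_RCU_NODE_PERIOD 10

/* Invented helper: delay in jiffies to apply for a grace-period number. */
static int init_delay_for(unsigned long gpnum)
{
	/* With gp_init_delay constant zero the compiler can drop this branch. */
	if (gp_init_delay > 0 && (gpnum % PER_RCU_NODE_PERIOD) == 0)
		return gp_init_delay;
	return 0;
}
```

    Built without -DSLOW_INIT, init_delay_for() always returns zero and no
    delay constant has to be defined at all, which is the property that
    lets "make oldconfig" stop asking about it.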

    Reported-by: Linus Torvalds
    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • Merge first patchbomb from Andrew Morton:

    - arch/sh updates

    - ocfs2 updates

    - kernel/watchdog feature

    - about half of mm/

    * emailed patches from Andrew Morton : (122 commits)
    Documentation: update arch list in the 'memtest' entry
    Kconfig: memtest: update number of test patterns up to 17
    arm: add support for memtest
    arm64: add support for memtest
    memtest: use phys_addr_t for physical addresses
    mm: move memtest under mm
    mm, hugetlb: abort __get_user_pages if current has been oom killed
    mm, mempool: do not allow atomic resizing
    memcg: print cgroup information when system panics due to panic_on_oom
    mm: numa: remove migrate_ratelimited
    mm: fold arch_randomize_brk into ARCH_HAS_ELF_RANDOMIZE
    mm: split ET_DYN ASLR from mmap ASLR
    s390: redefine randomize_et_dyn for ELF_ET_DYN_BASE
    mm: expose arch_mmap_rnd when available
    s390: standardize mmap_rnd() usage
    powerpc: standardize mmap_rnd() usage
    mips: extract logic for mmap_rnd()
    arm64: standardize mmap_rnd() usage
    x86: standardize mmap_rnd() usage
    arm: factor out mmap ASLR into mmap_rnd
    ...

    Linus Torvalds
     
  • Additional test patterns for memtest were introduced in commit
    63823126c221 ("x86: memtest: add additional (regular) test patterns"),
    but it looks like Kconfig was not updated at that time.

    Update Kconfig entry with the actual number of maximum test patterns.

    Signed-off-by: Vladimir Murzin
    Cc: "H. Peter Anvin"
    Cc: Catalin Marinas
    Cc: Ingo Molnar
    Cc: Mark Rutland
    Cc: Russell King
    Cc: Thomas Gleixner
    Cc: Will Deacon
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vladimir Murzin
     
  • Memtest is a simple feature which fills the memory with a given set of
    patterns and validates the memory contents; if bad memory regions are
    detected, it reserves them via the memblock API. Since the memblock API is
    widely used by other architectures, this feature can be enabled outside of
    the x86 world.
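
    The fill-and-verify idea can be sketched in userspace C as follows.
    This is only an illustration under stated assumptions: pattern_test()
    and the bad_range_fn callback are hypothetical names, and the callback
    merely stands in for reserving a bad range through memblock.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical callback standing in for "reserve this bad range". */
typedef void (*bad_range_fn)(size_t first_word, size_t nwords, void *ctx);

/*
 * Fill nwords of memory with one pattern, read it back, and report each
 * contiguous run of mismatching words. Returns the number of bad words.
 */
static size_t pattern_test(volatile uint64_t *mem, size_t nwords,
			   uint64_t pattern, bad_range_fn report, void *ctx)
{
	size_t i, bad_total = 0, run_start = 0, run_len = 0;

	/* Pass 1: write the pattern everywhere. */
	for (i = 0; i < nwords; i++)
		mem[i] = pattern;

	/* Pass 2: verify and coalesce adjacent bad words into ranges. */
	for (i = 0; i < nwords; i++) {
		if (mem[i] != pattern) {
			if (run_len == 0)
				run_start = i;
			run_len++;
			bad_total++;
		} else if (run_len) {
			if (report)
				report(run_start, run_len, ctx);
			run_len = 0;
		}
	}
	/* Flush a bad run that extends to the end of the buffer. */
	if (run_len && report)
		report(run_start, run_len, ctx);

	return bad_total;
}
```

    Nothing in this loop depends on the architecture, which matches the
    argument for moving the core code out of x86.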

    This patch set promotes memtest to live under the generic mm umbrella and
    enables the memtest feature for arm/arm64.

    It was reported that this patch set was useful for tracking down an issue
    with some errant DMA on an arm64 platform.

    This patch (of 6):

    There is nothing platform dependent in the core memtest code, so other
    platforms might benefit from this feature too.

    [linux@roeck-us.net: MEMTEST depends on MEMBLOCK]
    Signed-off-by: Vladimir Murzin
    Acked-by: Will Deacon
    Tested-by: Mark Rutland
    Cc: Ingo Molnar
    Cc: Thomas Gleixner
    Cc: "H. Peter Anvin"
    Cc: Catalin Marinas
    Cc: Russell King
    Cc: Paul Bolle
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vladimir Murzin
     
  • Pull RCU changes from Ingo Molnar:
    "The main changes in this cycle were:

    - changes permitting use of call_rcu() and friends very early in
    boot, for example, before rcu_init() is invoked.

    - add in-kernel API to enable and disable expediting of normal RCU
    grace periods.

    - improve RCU's handling of (hotplug-) outgoing CPUs.

    - NO_HZ_FULL_SYSIDLE fixes.

    - tiny-RCU updates to make it more tiny.

    - documentation updates.

    - miscellaneous fixes"

    * 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (58 commits)
    cpu: Provide smpboot_thread_init() on !CONFIG_SMP kernels as well
    cpu: Defer smpboot kthread unparking until CPU known to scheduler
    rcu: Associate quiescent-state reports with grace period
    rcu: Yet another fix for preemption and CPU hotplug
    rcu: Add diagnostics to grace-period cleanup
    rcutorture: Default to grace-period-initialization delays
    rcu: Handle outgoing CPUs on exit from idle loop
    cpu: Make CPU-offline idle-loop transition point more precise
    rcu: Eliminate ->onoff_mutex from rcu_node structure
    rcu: Process offlining and onlining only at grace-period start
    rcu: Move rcu_report_unblock_qs_rnp() to common code
    rcu: Rework preemptible expedited bitmask handling
    rcu: Remove event tracing from rcu_cpu_notify(), used by offline CPUs
    rcutorture: Enable slow grace-period initializations
    rcu: Provide diagnostic option to slow down grace-period initialization
    rcu: Detect stalls caused by failure to propagate up rcu_node tree
    rcu: Eliminate empty HOTPLUG_CPU ifdef
    rcu: Simplify sync_rcu_preempt_exp_init()
    rcu: Put all orphan-callback-related code under same comment
    rcu: Consolidate offline-CPU callback initialization
    ...

    Linus Torvalds
     

20 Mar, 2015

1 commit


13 Mar, 2015

1 commit

  • Recently there have been requests for better sanity
    checking in the time code, so that it's more clear
    when something is going wrong, since timekeeping issues
    could manifest in a large number of strange ways in
    various subsystems.

    Thus, this patch adds some extra infrastructure to
    add a check to update_wall_time() to print two new
    warnings:

    1) if we see the call delayed beyond the 'max_cycles'
    overflow point,

    2) or if we see the call delayed beyond the clocksource's
    'max_idle_ns' value, which is currently 50% of the
    overflow point.

    This extra infrastructure is conditional on
    a new CONFIG_DEBUG_TIMEKEEPING option, also
    added in this patch - default off.
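
    The two checks listed above can be sketched as a small standalone
    function. This is a hedged illustration, not the kernel
    implementation: check_offset() and the tk_check enum are hypothetical
    names, and the idle threshold is expressed in cycles as half of
    max_cycles, following the "50% of the overflow point" statement.

```c
#include <stdint.h>

/* Hypothetical result codes for the two warnings described in the text. */
enum tk_check {
	TK_OK = 0,
	TK_WARN_MAX_IDLE,	/* delayed beyond the max-idle threshold */
	TK_WARN_OVERFLOW	/* delayed beyond the max_cycles overflow point */
};

/*
 * offset:     cycles elapsed since the last update_wall_time() call
 * max_cycles: overflow point of the clocksource, in cycles
 */
static enum tk_check check_offset(uint64_t offset, uint64_t max_cycles)
{
	/* Assumption: the idle limit is 50% of the overflow point. */
	uint64_t max_idle_cycles = max_cycles / 2;

	if (offset > max_cycles)
		return TK_WARN_OVERFLOW;
	if (offset > max_idle_cycles)
		return TK_WARN_MAX_IDLE;
	return TK_OK;
}
```

    Checking the overflow case first means a single late call yields only
    the more severe of the two warnings.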

    Tested this a bit by halting qemu for specified
    lengths of time to trigger the warnings.

    Signed-off-by: John Stultz
    Cc: Dave Jones
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Prarit Bhargava
    Cc: Richard Cochran
    Cc: Stephen Boyd
    Cc: Thomas Gleixner
    Link: http://lkml.kernel.org/r/1426133800-29329-5-git-send-email-john.stultz@linaro.org
    [ Improved the changelog and the messages a bit. ]
    Signed-off-by: Ingo Molnar

    John Stultz