13 Mar, 2019

1 commit

  • Add panic() calls if memblock_alloc() returns NULL.

    The panic() format duplicates the one used by memblock itself. To
    keep the panic() calls from growing long parameter lists, replace
    open-coded allocation size calculations with a local variable.
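
    The pattern described above can be sketched in userspace as follows.
    This is only an illustration: early_alloc(), early_panic(), struct
    irq_slot and DEMO_ALIGN are hypothetical stand-ins, and the alignment
    is reported in the message but not enforced by the stand-in allocator.

```c
#include <stdio.h>
#include <stdlib.h>

#define DEMO_ALIGN 64	/* hypothetical alignment, only used in the message */

/* Hypothetical stand-in for memblock_alloc(): zeroed memory or NULL. */
static void *early_alloc(size_t size, size_t align)
{
	(void)align;	/* alignment is not modelled in this sketch */
	return calloc(1, size);
}

/* Hypothetical stand-in for panic(): report the failure and stop. */
static void early_panic(const char *func, size_t size, size_t align)
{
	fprintf(stderr, "%s: Failed to allocate %zu bytes align=0x%zx\n",
		func, size, align);
	abort();
}

/* Hypothetical table entry, for illustration only. */
struct irq_slot {
	unsigned long handler;
	unsigned long data;
};

static struct irq_slot *alloc_irq_slots(size_t nr)
{
	/* Compute the size once so the failure report stays short. */
	size_t size = nr * sizeof(struct irq_slot);
	struct irq_slot *slots = early_alloc(size, DEMO_ALIGN);

	if (!slots)
		early_panic(__func__, size, DEMO_ALIGN);
	return slots;
}
```

    Keeping the size in a local variable means the same value is passed to
    the allocator and to the failure message, so the two cannot drift apart.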

    Link: http://lkml.kernel.org/r/1548057848-15136-18-git-send-email-rppt@linux.ibm.com
    Signed-off-by: Mike Rapoport
    Cc: Catalin Marinas
    Cc: Christophe Leroy
    Cc: Christoph Hellwig
    Cc: "David S. Miller"
    Cc: Dennis Zhou
    Cc: Geert Uytterhoeven
    Cc: Greentime Hu
    Cc: Greg Kroah-Hartman
    Cc: Guan Xuetao
    Cc: Guo Ren
    Cc: Guo Ren [c-sky]
    Cc: Heiko Carstens
    Cc: Juergen Gross [Xen]
    Cc: Mark Salter
    Cc: Matt Turner
    Cc: Max Filippov
    Cc: Michael Ellerman
    Cc: Michal Simek
    Cc: Paul Burton
    Cc: Petr Mladek
    Cc: Richard Weinberger
    Cc: Rich Felker
    Cc: Rob Herring
    Cc: Russell King
    Cc: Stafford Horne
    Cc: Tony Luck
    Cc: Vineet Gupta
    Cc: Yoshinori Sato
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mike Rapoport
     

11 Mar, 2019

2 commits

  • Pull Kbuild updates from Masahiro Yamada:

    - do not generate unneeded top-level built-in.a

    - let git ignore O= directory entirely

    - optimize scripts/kallsyms slightly

    - exclude DWARF info from *.s regardless of config options

    - fix GCC toolchain search path for Clang to prepare ld.lld support

    - do not generate modules.order when CONFIG_MODULES is disabled

    - simplify single target rules and remove VPATH for external module
    build

    - allow adding optional flags to dpkg-buildpackage when building
    deb-pkg

    - move some compiler option tests from Makefile to Kconfig

    - various Makefile cleanups

    * tag 'kbuild-v5.1' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: (40 commits)
    kbuild: remove scripts/basic/% build target
    kbuild: use -Werror=implicit-... instead of -Werror-implicit-...
    kbuild: clean up scripts/gcc-version.sh
    kbuild: remove cc-version macro
    kbuild: update comment block of scripts/clang-version.sh
    kbuild: remove commented-out INITRD_COMPRESS
    kbuild: move -gsplit-dwarf, -gdwarf-4 option tests to Kconfig
    kbuild: [bin]deb-pkg: add DPKG_FLAGS variable
    kbuild: move ".config not found!" message from Kconfig to Makefile
    kbuild: invoke syncconfig if include/config/auto.conf.cmd is missing
    kbuild: simplify single target rules
    kbuild: remove empty rules for makefiles
    kbuild: make -r/-R effective in top Makefile for old Make versions
    kbuild: move tools_silent to a more relevant place
    kbuild: compute false-positive -Wmaybe-uninitialized cases in Kconfig
    kbuild: refactor cc-cross-prefix implementation
    kbuild: hardcode genksyms path and remove GENKSYMS variable
    scripts/gdb: refactor rules for symlink creation
    kbuild: create symlink to vmlinux-gdb.py in scripts_gdb target
    scripts/gdb: do not descend into scripts/gdb from scripts
    ...

    Linus Torvalds
     
  • Pull timer fix from Thomas Gleixner:
    "A single fix to prevent an unmet dependencies warning in Kconfig"

    * 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    time: Make VIRT_CPU_ACCOUNTING_GEN depend on GENERIC_CLOCKEVENTS

    Linus Torvalds
     

09 Mar, 2019

1 commit

  • Pull io_uring IO interface from Jens Axboe:
    "Second attempt at adding the io_uring interface.

    Since the first one, we've added basic unit testing of the three
    system calls, that resides in liburing like the other unit tests that
    we have so far. It'll take a while to get full coverage of it, but
    we're working towards it. I've also added two basic test programs to
    tools/io_uring. One uses the raw interface and has support for all the
    various features that io_uring supports outside of standard IO, like
    fixed files, fixed IO buffers, and polled IO. The other uses the
    liburing API, and is a simplified version of cp(1).

    This adds support for a new IO interface, io_uring.

    io_uring allows an application to communicate with the kernel through
    two rings, the submission queue (SQ) and completion queue (CQ) ring.
    This allows for very efficient handling of IOs, see the v5 posting for
    some basic numbers:

    https://lore.kernel.org/linux-block/20190116175003.17880-1-axboe@kernel.dk/

    Outside of just efficiency, the interface is also flexible and
    extendable, and allows for future use cases like the upcoming NVMe
    key-value store API, networked IO, and so on. It also supports async
    buffered IO, something that we've always failed to support in the
    kernel.

    Outside of basic IO features, it supports async polled IO as well.
    This particular feature has already been tested at Facebook months ago
    for flash storage boxes, with 25-33% improvements. It makes polled IO
    actually useful for real world use cases, where even basic flash sees
    a nice win in terms of efficiency, latency, and performance. These
    boxes were IOPS bound before, now they are not.

    This series adds three new system calls. One for setting up an
    io_uring instance (io_uring_setup(2)), one for submitting/completing
    IO (io_uring_enter(2)), and one for aux functions like registering
    file sets, buffers, etc (io_uring_register(2)). Through the help of
    Arnd, I've coordinated the syscall numbers so merge on that front
    should be painless.

    Jon did a writeup of the interface a while back, which (except for
    minor details that have been tweaked) is still accurate. Find that
    here:

    https://lwn.net/Articles/776703/

    Huge thanks to Al Viro for helping to get the reference cycle code
    correct, and to Jann Horn for his extensive reviews focused on both
    security and bugs in general.

    There's a userspace library that provides basic functionality for
    applications that don't need or want to care about how to fiddle with
    the rings directly. It has helpers to allow applications to easily set
    up an io_uring instance, and submit/complete IO through it without
    knowing about the intricacies of the rings. It also includes man pages
    (thanks to Jeff Moyer), and will continue to grow helper functions
    and features as time progresses. Find it here:

    git://git.kernel.dk/liburing

    Fio has full support for the raw interface, both in the form of an IO
    engine (io_uring), but also with a small test application (t/io_uring)
    that can exercise and benchmark the interface"

    * tag 'io_uring-2019-03-06' of git://git.kernel.dk/linux-block:
    io_uring: add a few test tools
    io_uring: allow workqueue item to handle multiple buffered requests
    io_uring: add support for IORING_OP_POLL
    io_uring: add io_kiocb ref count
    io_uring: add submission polling
    io_uring: add file set registration
    net: split out functions related to registering inflight socket files
    io_uring: add support for pre-mapped user IO buffers
    block: implement bio helper to add iter bvec pages to bio
    io_uring: batch io_kiocb allocation
    io_uring: use fget/fput_many() for file references
    fs: add fget_many() and fput_many()
    io_uring: support for IO polling
    io_uring: add fsync support
    Add io_uring IO interface

    Linus Torvalds
     

08 Mar, 2019

3 commits

  • Merge more updates from Andrew Morton:

    - some of the rest of MM

    - various misc things

    - dynamic-debug updates

    - checkpatch

    - some epoll speedups

    - autofs

    - rapidio

    - lib/, lib/lzo/ updates

    * emailed patches from Andrew Morton: (83 commits)
    samples/mic/mpssd/mpssd.h: remove duplicate header
    kernel/fork.c: remove duplicated include
    include/linux/relay.h: fix percpu annotation in struct rchan
    arch/nios2/mm/fault.c: remove duplicate include
    unicore32: stop printing the virtual memory layout
    MAINTAINERS: fix GTA02 entry and mark as orphan
    mm: create the new vm_fault_t type
    arm, s390, unicore32: remove oneliner wrappers for memblock_alloc()
    arch: simplify several early memory allocations
    openrisc: simplify pte_alloc_one_kernel()
    sh: prefer memblock APIs returning virtual address
    microblaze: prefer memblock API returning virtual address
    powerpc: prefer memblock APIs returning virtual address
    lib/lzo: separate lzo-rle from lzo
    lib/lzo: implement run-length encoding
    lib/lzo: fast 8-byte copy on arm64
    lib/lzo: 64-bit CTZ on arm64
    lib/lzo: tidy-up ifdefs
    ipc/sem.c: replace kvmalloc/memset with kvzalloc and use struct_size
    ipc: annotate implicit fall through
    ...

    Linus Torvalds
     
  • Use distinct error messages when archive decompression failed.

    Link: http://lkml.kernel.org/r/20190212075635.7373-1-david.engraf@sysgo.com
    Signed-off-by: David Engraf
    Reviewed-by: Andrew Morton
    Tested-by: Andy Shevchenko
    Cc: Dominik Brodowski
    Cc: Greg Kroah-Hartman
    Cc: Philippe Ombredanne
    Cc: Arnd Bergmann
    Cc: Luc Van Oostenryck
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Engraf
     
  • Pull audit updates from Paul Moore:
    "A lucky 13 audit patches for v5.1.

    Despite the rather large diffstat, most of the changes are from two
    bug fix patches that move code from one Kconfig option to another.

    Beyond that bit of churn, the remaining changes are largely cleanups
    and bug-fixes as we slowly march towards container auditing. It isn't
    all boring though, we do have a couple of new things: file
    capabilities v3 support, and expanded support for filtering on
    filesystems to solve problems with remote filesystems.

    All changes pass the audit-testsuite. Please merge for v5.1"

    * tag 'audit-pr-20190305' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit:
    audit: mark expected switch fall-through
    audit: hide auditsc_get_stamp and audit_serial prototypes
    audit: join tty records to their syscall
    audit: remove audit_context when CONFIG_ AUDIT and not AUDITSYSCALL
    audit: remove unused actx param from audit_rule_match
    audit: ignore fcaps on umount
    audit: clean up AUDITSYSCALL prototypes and stubs
    audit: more filter PATH records keyed on filesystem magic
    audit: add support for fcaps v3
    audit: move loginuid and sessionid from CONFIG_AUDITSYSCALL to CONFIG_AUDIT
    audit: add syscall information to CONFIG_CHANGE records
    audit: hand taken context to audit_kill_trees for syscall logging
    audit: give a clue what CONFIG_CHANGE op was involved

    Linus Torvalds
     

07 Mar, 2019

3 commits

  • Moving the CONTEXT_TRACKING Kconfig option into kernel/time/Kconfig added
    an implicit dependency on the surrounding GENERIC_CLOCKEVENTS option, but
    this is not always enabled when it is possible to select
    VIRT_CPU_ACCOUNTING_GEN:

    WARNING: unmet direct dependencies detected for CONTEXT_TRACKING
    Depends on [n]: GENERIC_CLOCKEVENTS [=n]
    Selected by [y]:
    - VIRT_CPU_ACCOUNTING_GEN [=y] && <choice> && HAVE_CONTEXT_TRACKING [=y] && HAVE_VIRT_CPU_ACCOUNTING_GEN [=y]

    Platforms without GENERIC_CLOCKEVENTS are rare enough that this corner
    case can simply be ignored. Make it a dependency for
    VIRT_CPU_ACCOUNTING_GEN to simplify the configuration.

    Fixes: a4cffdad7314 ("time: Move CONTEXT_TRACKING to kernel/time/Kconfig")
    Signed-off-by: Arnd Bergmann
    Signed-off-by: Thomas Gleixner
    Cc: "Paul E . McKenney"
    Cc: Frederic Weisbecker
    Link: https://lkml.kernel.org/r/20190304200202.1163250-1-arnd@arndb.de

    Arnd Bergmann
     
  • Merge misc updates from Andrew Morton:

    - a few misc things

    - ocfs2 updates

    - most of MM

    * emailed patches from Andrew Morton: (159 commits)
    tools/testing/selftests/proc/proc-self-syscall.c: remove duplicate include
    proc: more robust bulk read test
    proc: test /proc/*/maps, smaps, smaps_rollup, statm
    proc: use seq_puts() everywhere
    proc: read kernel cpu stat pointer once
    proc: remove unused argument in proc_pid_lookup()
    fs/proc/thread_self.c: code cleanup for proc_setup_thread_self()
    fs/proc/self.c: code cleanup for proc_setup_self()
    proc: return exit code 4 for skipped tests
    mm,mremap: bail out earlier in mremap_to under map pressure
    mm/sparse: fix a bad comparison
    mm/memory.c: do_fault: avoid usage of stale vm_area_struct
    writeback: fix inode cgroup switching comment
    mm/huge_memory.c: fix "orig_pud" set but not used
    mm/hotplug: fix an imbalance with DEBUG_PAGEALLOC
    mm/memcontrol.c: fix bad line in comment
    mm/cma.c: cma_declare_contiguous: correct err handling
    mm/page_ext.c: fix an imbalance with kmemleak
    mm/compaction: pass pgdat to too_many_isolated() instead of zone
    mm: remove zone_lru_lock() function, access ->lru_lock directly
    ...

    Linus Torvalds
     
  • Pull scheduler updates from Ingo Molnar:
    "The main changes in this cycle were:

    - refcount conversions

    - Solve the rq->leaf_cfs_rq_list can of worms for real.

    - improve power-aware scheduling

    - add sysctl knob for Energy Aware Scheduling

    - documentation updates

    - misc other changes"

    * 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (34 commits)
    kthread: Do not use TIMER_IRQSAFE
    kthread: Convert worker lock to raw spinlock
    sched/fair: Use non-atomic cpumask_{set,clear}_cpu()
    sched/fair: Remove unused 'sd' parameter from select_idle_smt()
    sched/wait: Use freezable_schedule() when possible
    sched/fair: Prune, fix and simplify the nohz_balancer_kick() comment block
    sched/fair: Explain LLC nohz kick condition
    sched/fair: Simplify nohz_balancer_kick()
    sched/topology: Fix percpu data types in struct sd_data & struct s_data
    sched/fair: Simplify post_init_entity_util_avg() by calling it with a task_struct pointer argument
    sched/fair: Fix O(nr_cgroups) in the load balancing path
    sched/fair: Optimize update_blocked_averages()
    sched/fair: Fix insertion in rq->leaf_cfs_rq_list
    sched/fair: Add tmp_alone_branch assertion
    sched/core: Use READ_ONCE()/WRITE_ONCE() in move_queued_task()/task_rq_lock()
    sched/debug: Initialize sd_sysctl_cpus if !CONFIG_CPUMASK_OFFSTACK
    sched/pelt: Skip updating util_est when utilization is higher than CPU's capacity
    sched/fair: Update scale invariance of PELT
    sched/fair: Move the rq_of() helper function
    sched/core: Convert task_struct.stack_refcount to refcount_t
    ...

    Linus Torvalds
     

06 Mar, 2019

1 commit

  • Patch series "Replace all open encodings for NUMA_NO_NODE", v3.

    All these places for replacement were found by running the following
    grep patterns on the entire kernel code. Please let me know if this
    might have missed some instances. This might also have replaced some
    false positives. I will appreciate suggestions, inputs and review.

    1. git grep "nid == -1"
    2. git grep "node == -1"
    3. git grep "nid = -1"
    4. git grep "node = -1"

    This patch (of 2):

    At present there are multiple places where an invalid node number is
    encoded as -1. Even though this is implicitly understood, it is always
    better to use a macro. Replace these open encodings of an invalid node
    number with the global macro NUMA_NO_NODE. This helps remove
    NUMA-related assumptions like 'invalid node' from various places,
    redirecting them to a common definition.
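
    A minimal sketch of the kind of replacement this describes. The macro
    value matches the kernel's definition in include/linux/numa.h; struct
    demo_dev and both helpers are hypothetical and exist only to show the
    comparison in context.

```c
#include <stdbool.h>

/* Mirrors the kernel definition; repeated here so the sketch builds alone. */
#define NUMA_NO_NODE (-1)

/* Hypothetical device record, for illustration only. */
struct demo_dev {
	int nid;	/* preferred NUMA node, or NUMA_NO_NODE */
};

/* Before the change this would have read: dev->nid != -1 */
static bool demo_dev_has_node(const struct demo_dev *dev)
{
	return dev->nid != NUMA_NO_NODE;
}

/* Pick the node to allocate on, falling back when none is set. */
static int demo_dev_pick_node(const struct demo_dev *dev, int fallback_nid)
{
	return demo_dev_has_node(dev) ? dev->nid : fallback_nid;
}
```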

    Link: http://lkml.kernel.org/r/1545127933-10711-2-git-send-email-anshuman.khandual@arm.com
    Signed-off-by: Anshuman Khandual
    Reviewed-by: David Hildenbrand
    Acked-by: Jeff Kirsher [ixgbe]
    Acked-by: Jens Axboe [mtip32xx]
    Acked-by: Vinod Koul [dmaengine.c]
    Acked-by: Michael Ellerman [powerpc]
    Acked-by: Doug Ledford [drivers/infiniband]
    Cc: Joseph Qi
    Cc: Hans Verkuil
    Cc: Stephen Rothwell
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Anshuman Khandual
     

04 Mar, 2019

1 commit


28 Feb, 2019

1 commit

  • The submission queue (SQ) and completion queue (CQ) rings are shared
    between the application and the kernel. This eliminates the need to
    copy data back and forth to submit and complete IO.

    IO submissions use the io_uring_sqe data structure, and completions
    are generated in the form of io_uring_cqe data structures. The SQ
    ring is an index into the io_uring_sqe array, which makes it possible
    to submit a batch of IOs without them being contiguous in the ring.
    The CQ ring is always contiguous, as completion events are inherently
    unordered, and hence any io_uring_cqe entry can point back to an
    arbitrary submission.
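
    The indirection described above can be modelled in a few lines. This
    is a single-threaded sketch, not the kernel's layout: struct demo_sqe,
    struct demo_sq and the helper names are hypothetical, and the memory
    barriers a real shared ring needs are left out.

```c
#include <stddef.h>
#include <stdint.h>

#define RING_ENTRIES 8u		/* power of two, so masking wraps the index */
#define RING_MASK (RING_ENTRIES - 1)

/* Hypothetical, heavily reduced stand-in for struct io_uring_sqe. */
struct demo_sqe {
	uint64_t user_data;
};

struct demo_sq {
	uint32_t head;			/* advanced by the consumer (kernel) */
	uint32_t tail;			/* advanced by the producer (application) */
	uint32_t array[RING_ENTRIES];	/* ring of indices into sqes[] */
	struct demo_sqe sqes[RING_ENTRIES];
};

/* Producer: publish sqes[slot]; returns 0 on success, -1 if the ring is full. */
static int demo_sq_submit(struct demo_sq *sq, uint32_t slot)
{
	if (sq->tail - sq->head == RING_ENTRIES)
		return -1;
	sq->array[sq->tail & RING_MASK] = slot;
	sq->tail++;
	return 0;
}

/* Consumer: fetch the next published entry, or NULL if the ring is empty. */
static struct demo_sqe *demo_sq_consume(struct demo_sq *sq)
{
	if (sq->head == sq->tail)
		return NULL;
	return &sq->sqes[sq->array[sq->head++ & RING_MASK]];
}
```

    Because the ring holds indices rather than the entries themselves, the
    producer can fill sqes[] slots in any order and publish them as a batch.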

    Two new system calls are added for this:

    io_uring_setup(entries, params)
        Sets up an io_uring instance for doing async IO. On success,
        returns a file descriptor that the application can mmap to
        gain access to the SQ ring, CQ ring, and io_uring_sqes.

    io_uring_enter(fd, to_submit, min_complete, flags, sigset, sigsetsize)
        Initiates IO against the rings mapped to this fd, or waits for
        them to complete, or both. The behavior is controlled by the
        parameters passed in. If 'to_submit' is non-zero, then we'll
        try and submit new IO. If IORING_ENTER_GETEVENTS is set, the
        kernel will wait for 'min_complete' events, if they aren't
        already available. It's valid to set IORING_ENTER_GETEVENTS
        and 'min_complete' == 0 at the same time, this allows the
        kernel to return already completed events without waiting
        for them. This is useful only for polling, as for IRQ
        driven IO, the application can just check the CQ ring
        without entering the kernel.

    With this setup, it's possible to do async IO with a single system
    call. Future developments will enable polled IO with this interface,
    and polled submission as well. The latter will enable an application
    to do IO without doing ANY system calls at all.

    For IRQ driven IO, an application only needs to enter the kernel for
    completions if it wants to wait for them to occur.
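
    As a rough model of that last point: reaping completions is a matter
    of comparing the CQ head and tail in shared memory. The structures
    and helpers below are hypothetical and single-threaded; a real
    application would also need the appropriate memory barriers, and the
    model does not handle CQ overflow.

```c
#include <stddef.h>
#include <stdint.h>

#define CQ_ENTRIES 8u
#define CQ_MASK (CQ_ENTRIES - 1)

/* Hypothetical reduced stand-in for struct io_uring_cqe. */
struct demo_cqe {
	uint64_t user_data;	/* copied from the submission it completes */
	int32_t res;		/* result of the operation */
};

struct demo_cq {
	uint32_t head;		/* advanced by the application */
	uint32_t tail;		/* advanced by the kernel */
	struct demo_cqe cqes[CQ_ENTRIES];
};

/* Kernel side of the model: post one completion. */
static void demo_cq_post(struct demo_cq *cq, uint64_t user_data, int32_t res)
{
	struct demo_cqe *cqe = &cq->cqes[cq->tail & CQ_MASK];

	cqe->user_data = user_data;
	cqe->res = res;
	cq->tail++;
}

/* Application side: reap up to max completions without any system call. */
static unsigned demo_cq_reap(struct demo_cq *cq, struct demo_cqe *out,
			     unsigned max)
{
	unsigned n = 0;

	while (n < max && cq->head != cq->tail)
		out[n++] = cq->cqes[cq->head++ & CQ_MASK];
	return n;
}
```

    Only when demo_cq_reap() returns 0 and the application wants to block
    would it need to enter the kernel and wait.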

    Each io_uring is backed by a workqueue, to support buffered async IO
    as well. We will only punt to an async context if the command would
    need to wait for IO on the device side. Any data that can be accessed
    directly in the page cache is done inline. This avoids the slowness
    issue of usual threadpools, since cached data is accessed as quickly
    as a sync interface.

    Sample application: http://git.kernel.dk/cgit/fio/plain/t/io_uring.c

    Reviewed-by: Hannes Reinecke
    Signed-off-by: Jens Axboe

    Jens Axboe
     

27 Feb, 2019

1 commit

  • Since -Wmaybe-uninitialized was introduced by GCC 4.7, we have patched
    various false positives:

    - commit e74fc973b6e5 ("Turn off -Wmaybe-uninitialized when building
    with -Os") turned off this option for -Os.

    - commit 815eb71e7149 ("Kbuild: disable 'maybe-uninitialized' warning
    for CONFIG_PROFILE_ALL_BRANCHES") turned off this option for
    CONFIG_PROFILE_ALL_BRANCHES

    - commit a76bcf557ef4 ("Kbuild: enable -Wmaybe-uninitialized warning
    for "make W=1"") turned off this option for GCC < 4.9
    Arnd provided more explanation in https://lkml.org/lkml/2017/3/14/903

    I think it looks better to shift this logic from the Makefile to Kconfig.

    Link: https://github.com/ClangBuiltLinux/linux/issues/350
    Signed-off-by: Masahiro Yamada
    Reviewed-by: Nathan Chancellor
    Tested-by: Nick Desaulniers

    Masahiro Yamada
     

22 Feb, 2019

1 commit

  • Revert ff1522bb7d9845 ("initramfs: cleanup incomplete rootfs").

    Andy reports

    : This breaks my setup where I have U-boot provided more size of initramfs
    : than needed. This allows a bit of flexibility to increase or decrease
    : initramfs compressed image without taking care of bootloader. The proper
    : solution is to do this if we sure that we didn't get enough memory,
    : otherwise I can't consider the error fatal to clean up rootfs.

    Fixes: ff1522bb7d9845 ("initramfs: cleanup incomplete rootfs")
    Reported-by: Andy Shevchenko
    Tested-by: Andy Shevchenko
    Cc: David Engraf
    Cc: Dominik Brodowski
    Cc: Greg Kroah-Hartman
    Cc: Philippe Ombredanne
    Cc: Arnd Bergmann
    Cc: Luc Van Oostenryck
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
     

13 Feb, 2019

1 commit

  • This reverts commit fe53ca54270a ("mm: use early_pfn_to_nid in
    page_ext_init").

    When booting a system with "page_owner=on",

    start_kernel
      page_ext_init
        invoke_init_callbacks
          init_section_page_ext
            init_page_owner
              init_early_allocated_pages
                init_zones_in_node
                  init_pages_in_zone
                    lookup_page_ext
                      page_to_nid

    The issue here is that page_to_nid() will not work since some page flags
    have no node information until later in page_alloc_init_late() due to
    DEFERRED_STRUCT_PAGE_INIT. Hence, it could trigger an out-of-bounds
    access with an invalid nid.

    UBSAN: Undefined behaviour in ./include/linux/mm.h:1104:50
    index 7 is out of range for type 'zone [5]'
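
    One way to read the report above: an index of 7 is what a 3-bit
    field yields when every bit in it is set, while the array has only 5
    elements. The sketch below shows that arithmetic. The field position
    and width are assumptions chosen for illustration, not the kernel's
    actual page flags layout, and the all-ones word merely stands in for
    flags that were never initialized.

```c
#include <stdint.h>

/* Assumed layout for illustration: a 3-bit field at bit 59 of 64-bit flags. */
#define DEMO_ZONES_WIDTH 3
#define DEMO_ZONES_SHIFT 59
#define DEMO_ZONES_MASK ((UINT64_C(1) << DEMO_ZONES_WIDTH) - 1)
#define DEMO_MAX_NR_ZONES 5	/* matches 'zone [5]' in the report */

/* Extract the index field from a flags word. */
static unsigned demo_zonenum(uint64_t flags)
{
	return (unsigned)((flags >> DEMO_ZONES_SHIFT) & DEMO_ZONES_MASK);
}

/* An index is usable only if it is below the number of array elements. */
static int demo_zone_index_valid(uint64_t flags)
{
	return demo_zonenum(flags) < DEMO_MAX_NR_ZONES;
}
```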

    Also, the kernel will panic since flags were poisoned earlier with:

    CONFIG_DEBUG_VM_PGFLAGS=y
    CONFIG_NODE_NOT_IN_PAGE_FLAGS=n

    start_kernel
      setup_arch
        pagetable_init
          paging_init
            sparse_init
              sparse_init_nid
                memblock_alloc_try_nid_raw

    init_pages_in_zone() does not handle this well and ends up calling
    page_to_nid().

    page:ffffea0004200000 is uninitialized and poisoned
    raw: ffffffffffffffff ffffffffffffffff ffffffffffffffff ffffffffffffffff
    raw: ffffffffffffffff ffffffffffffffff ffffffffffffffff ffffffffffffffff
    page dumped because: VM_BUG_ON_PAGE(PagePoisoned(p))
    page_owner info is not active (free page?)
    kernel BUG at include/linux/mm.h:990!
    RIP: 0010:init_page_owner+0x486/0x520

    This means that assumptions behind commit fe53ca54270a ("mm: use
    early_pfn_to_nid in page_ext_init") are incomplete. Therefore, revert
    the commit for now. A proper way to move the page_owner initialization
    earlier is to hook into memmap initialization.

    Link: http://lkml.kernel.org/r/20190115202812.75820-1-cai@lca.pw
    Signed-off-by: Qian Cai
    Acked-by: Michal Hocko
    Cc: Pasha Tatashin
    Cc: Mel Gorman
    Cc: Yang Shi
    Cc: Joonsoo Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Qian Cai
     

11 Feb, 2019

1 commit


04 Feb, 2019

3 commits

  • atomic_t variables are currently used to implement reference
    counters with the following properties:

    - counter is initialized to 1 using atomic_set()
    - a resource is freed upon counter reaching zero
    - once counter reaches zero, its further
    increments aren't allowed
    - counter schema uses basic atomic operations
    (set, inc, inc_not_zero, dec_and_test, etc.)

    Such atomic variables should be converted to a newly provided
    refcount_t type and API that prevents accidental counter overflows
    and underflows. This is important since overflows and underflows
    can lead to a use-after-free situation and be exploitable.

    The variable task_struct.stack_refcount is used as a pure reference counter.
    Convert it to refcount_t and fix up the operations.

    ** Important note for maintainers:

    Some functions from refcount_t API defined in lib/refcount.c
    have different memory ordering guarantees than their atomic
    counterparts.

    The full comparison can be seen in
    https://lkml.org/lkml/2017/11/15/57 and it is hopefully soon
    in state to be merged to the documentation tree.

    Normally the differences should not matter since refcount_t provides
    enough guarantees to satisfy the refcounting use cases, but in
    some rare cases it might matter.

    Please double check that you don't have some undocumented
    memory guarantees for this variable usage.

    For the task_struct.stack_refcount it might make a difference
    in the following places:

    - try_get_task_stack(): increment in refcount_inc_not_zero() only
    guarantees control dependency on success vs. fully ordered
    atomic counterpart
    - put_task_stack(): decrement in refcount_dec_and_test() only
    provides RELEASE ordering and control dependency on success
    vs. fully ordered atomic counterpart
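
    The counter semantics listed above can be modelled with C11 atomics.
    This is a userspace sketch, not the kernel's refcount_t: struct
    demo_ref and its helpers are hypothetical, saturation is shown only on
    increment, and the release/acquire pairing is the usual C11 idiom
    rather than the exact ordering the kernel documents.

```c
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace model of a reference counter. */
struct demo_ref {
	atomic_uint count;
};

static void demo_ref_init(struct demo_ref *r, unsigned n)
{
	atomic_init(&r->count, n);
}

/* Take a reference unless the object is already dead (count == 0). */
static bool demo_ref_inc_not_zero(struct demo_ref *r)
{
	unsigned old = atomic_load_explicit(&r->count, memory_order_relaxed);

	for (;;) {
		if (old == 0)
			return false;	/* already released */
		if (old == UINT_MAX)
			return true;	/* saturated: stay pinned, never wrap */
		/* On failure, old is reloaded with the current value. */
		if (atomic_compare_exchange_weak_explicit(&r->count, &old,
				old + 1, memory_order_relaxed,
				memory_order_relaxed))
			return true;
	}
}

/* Drop a reference; true means the caller must free the object. */
static bool demo_ref_dec_and_test(struct demo_ref *r)
{
	if (atomic_fetch_sub_explicit(&r->count, 1, memory_order_release) != 1)
		return false;
	/* Order the free after earlier accesses made by other holders. */
	atomic_thread_fence(memory_order_acquire);
	return true;
}
```

    Note that the increment is relaxed on success, which is weaker than a
    fully ordered atomic operation. Code that silently relied on the
    stronger ordering is what the note to maintainers asks to double check.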

    Suggested-by: Kees Cook
    Signed-off-by: Elena Reshetova
    Signed-off-by: Peter Zijlstra (Intel)
    Reviewed-by: David Windsor
    Reviewed-by: Hans Liljestrand
    Reviewed-by: Andrea Parri
    Cc: Linus Torvalds
    Cc: Mike Galbraith
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: akpm@linux-foundation.org
    Cc: viro@zeniv.linux.org.uk
    Link: https://lkml.kernel.org/r/1547814450-18902-6-git-send-email-elena.reshetova@intel.com
    Signed-off-by: Ingo Molnar

    Elena Reshetova
     
  • atomic_t variables are currently used to implement reference
    counters with the following properties:

    - counter is initialized to 1 using atomic_set()
    - a resource is freed upon counter reaching zero
    - once counter reaches zero, its further
    increments aren't allowed
    - counter schema uses basic atomic operations
    (set, inc, inc_not_zero, dec_and_test, etc.)

    Such atomic variables should be converted to a newly provided
    refcount_t type and API that prevents accidental counter overflows
    and underflows. This is important since overflows and underflows
    can lead to a use-after-free situation and be exploitable.

    The variable task_struct.usage is used as a pure reference counter.
    Convert it to refcount_t and fix up the operations.

    ** Important note for maintainers:

    Some functions from refcount_t API defined in lib/refcount.c
    have different memory ordering guarantees than their atomic
    counterparts.

    The full comparison can be seen in
    https://lkml.org/lkml/2017/11/15/57 and it is hopefully soon
    in state to be merged to the documentation tree.

    Normally the differences should not matter since refcount_t provides
    enough guarantees to satisfy the refcounting use cases, but in
    some rare cases it might matter.

    Please double check that you don't have some undocumented
    memory guarantees for this variable usage.

    For the task_struct.usage it might make a difference
    in the following places:

    - put_task_struct(): decrement in refcount_dec_and_test() only
    provides RELEASE ordering and control dependency on success
    vs. fully ordered atomic counterpart

    Suggested-by: Kees Cook
    Signed-off-by: Elena Reshetova
    Signed-off-by: Peter Zijlstra (Intel)
    Reviewed-by: David Windsor
    Reviewed-by: Hans Liljestrand
    Reviewed-by: Andrea Parri
    Cc: Linus Torvalds
    Cc: Mike Galbraith
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: akpm@linux-foundation.org
    Cc: viro@zeniv.linux.org.uk
    Link: https://lkml.kernel.org/r/1547814450-18902-5-git-send-email-elena.reshetova@intel.com
    Signed-off-by: Ingo Molnar

    Elena Reshetova
     
  • atomic_t variables are currently used to implement reference
    counters with the following properties:

    - counter is initialized to 1 using atomic_set()
    - a resource is freed upon counter reaching zero
    - once counter reaches zero, its further
    increments aren't allowed
    - counter schema uses basic atomic operations
    (set, inc, inc_not_zero, dec_and_test, etc.)

    Such atomic variables should be converted to a newly provided
    refcount_t type and API that prevents accidental counter overflows
    and underflows. This is important since overflows and underflows
    can lead to a use-after-free situation and be exploitable.

    The variable signal_struct.sigcnt is used as a pure reference counter.
    Convert it to refcount_t and fix up the operations.

    ** Important note for maintainers:

    Some functions from refcount_t API defined in lib/refcount.c
    have different memory ordering guarantees than their atomic
    counterparts.

    The full comparison can be seen in
    https://lkml.org/lkml/2017/11/15/57 and it is hopefully soon
    in state to be merged to the documentation tree.

    Normally the differences should not matter since refcount_t provides
    enough guarantees to satisfy the refcounting use cases, but in
    some rare cases it might matter.

    Please double check that you don't have some undocumented
    memory guarantees for this variable usage.

    For the signal_struct.sigcnt it might make a difference
    in the following places:

    - put_signal_struct(): decrement in refcount_dec_and_test() only
    provides RELEASE ordering and control dependency on success
    vs. fully ordered atomic counterpart

    Suggested-by: Kees Cook
    Signed-off-by: Elena Reshetova
    Signed-off-by: Peter Zijlstra (Intel)
    Reviewed-by: David Windsor
    Reviewed-by: Hans Liljestrand
    Reviewed-by: Andrea Parri
    Reviewed-by: Oleg Nesterov
    Cc: Linus Torvalds
    Cc: Mike Galbraith
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: akpm@linux-foundation.org
    Cc: viro@zeniv.linux.org.uk
    Link: https://lkml.kernel.org/r/1547814450-18902-3-git-send-email-elena.reshetova@intel.com
    Signed-off-by: Ingo Molnar

    Elena Reshetova
     

02 Feb, 2019

2 commits

  • The current help text caused some confusion in online forums about
    whether or not to default-enable or default-disable psi in vendor
    kernels. This is because it doesn't communicate the reason we
    made this setting configurable in the first place: that the overhead is
    non-zero in an artificial scheduler stress test.

    Since this isn't representative of real workloads, and the effect was
    not measurable in scheduler-heavy real world applications such as the
    webservers and memcache installations at Facebook, it's fair to point
    out that this is a pretty cautious option to select.

    Link: http://lkml.kernel.org/r/20190129233617.16767-1-hannes@cmpxchg.org
    Signed-off-by: Johannes Weiner
    Reviewed-by: Andrew Morton
    Cc: Mel Gorman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Johannes Weiner
     
  • Link: http://lkml.kernel.org/r/20190129150813.15785-1-j.neuschaefer@gmx.net
    Signed-off-by: Jonathan Neuschäfer
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jonathan Neuschäfer
     

26 Jan, 2019

1 commit

  • loginuid and sessionid (and audit_log_session_info) should be part of
    CONFIG_AUDIT scope and not CONFIG_AUDITSYSCALL since they are used in
    CONFIG_CHANGE, ANOM_LINK, FEATURE_CHANGE (and INTEGRITY_RULE), none of
    which are otherwise dependent on AUDITSYSCALL.

    Please see github issue
    https://github.com/linux-audit/audit-kernel/issues/104

    Signed-off-by: Richard Guy Briggs
    [PM: tweaked subject line for better grep'ing]
    Signed-off-by: Paul Moore

    Richard Guy Briggs
     

14 Jan, 2019

1 commit

  • When building using GCC 4.7 or older, -ffunction-sections & the -pg flag
    used by ftrace are incompatible. This causes warnings or build failures
    (where -Werror applies) such as the following:

    arch/mips/generic/init.c:
    error: -ffunction-sections disabled; it makes profiling impossible

    This used to be taken into account by the ordering of calls to cc-option
    from within the top-level Makefile, which was introduced by commit
    90ad4052e85c ("kbuild: avoid conflict between -ffunction-sections and
    -pg on gcc-4.7"). Unfortunately this was broken when the
    CONFIG_LD_DEAD_CODE_DATA_ELIMINATION cc-option check was moved to
    Kconfig in commit e85d1d65cd8a ("kbuild: test dead code/data elimination
    support in Kconfig"), because the flags used by this check no longer
    include -pg.

    Fix this by not allowing CONFIG_LD_DEAD_CODE_DATA_ELIMINATION to be
    enabled at the same time as ftrace/CONFIG_FUNCTION_TRACER when building
    using GCC 4.7 or older.
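
    The restriction described above can be sketched as a small predicate.
    This is only an illustration in C, not the Kconfig change itself:
    dce_allowed() and gcc_version() are hypothetical names, and the
    major * 10000 + minor * 100 + patchlevel encoding is assumed here so
    that "GCC 4.7 or older" becomes "version below 4.8.0".

```c
#include <stdbool.h>

/* Assumed encoding for this sketch: 4.7.4 -> 40704, 4.8.0 -> 40800. */
static int gcc_version(int major, int minor, int patch)
{
	return major * 10000 + minor * 100 + patch;
}

/*
 * Hypothetical helper: returns true if dead code/data elimination may be
 * enabled for the given configuration.
 */
static bool dce_allowed(bool function_tracer, bool cc_is_gcc, int version)
{
	/* -ffunction-sections and -pg conflict on GCC 4.7 or older. */
	bool old_gcc = cc_is_gcc && version < gcc_version(4, 8, 0);

	/* Only the combination of ftrace and an old GCC is rejected. */
	return !(function_tracer && old_gcc);
}
```

    With this rule, ftrace plus GCC 4.7.x is rejected, while either a newer
    GCC or a build without the function tracer is still allowed.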

    Signed-off-by: Paul Burton
    Fixes: e85d1d65cd8a ("kbuild: test dead code/data elimination support in Kconfig")
    Reported-by: Geert Uytterhoeven
    Cc: Nicholas Piggin
    Cc: stable@vger.kernel.org # v4.19+
    Signed-off-by: Masahiro Yamada

    Paul Burton
     

06 Jan, 2019

2 commits

  • Currently, CONFIG_JUMP_LABEL just means "I _want_ to use jump label".

    The jump label is controlled by HAVE_JUMP_LABEL, which is defined
    like this:

    #if defined(CC_HAVE_ASM_GOTO) && defined(CONFIG_JUMP_LABEL)
    # define HAVE_JUMP_LABEL
    #endif

    We can improve this by testing 'asm goto' support in Kconfig and then
    making JUMP_LABEL depend on CC_HAS_ASM_GOTO.

    The ugly #ifdef HAVE_JUMP_LABEL will go away, and CONFIG_JUMP_LABEL will
    match the real kernel capability.
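
    A minimal sketch of the difference, assuming both conditions hold. The
    two #define lines are stand-ins for this example only (in a real build
    they come from the compiler check and the .config), and the two function
    names are hypothetical.

```c
/* Stand-ins for this sketch; not defined in source files in a real build. */
#define CC_HAVE_ASM_GOTO 1
#define CONFIG_JUMP_LABEL 1

/* Old scheme, as quoted in the message: a derived macro combines both. */
#if defined(CC_HAVE_ASM_GOTO) && defined(CONFIG_JUMP_LABEL)
# define HAVE_JUMP_LABEL
#endif

/* Code written against the old scheme tests the derived macro. */
static int old_scheme_uses_jump_label(void)
{
#ifdef HAVE_JUMP_LABEL
	return 1;
#else
	return 0;
#endif
}

/*
 * Code written against the new scheme tests the config symbol directly,
 * because Kconfig would only allow it to be set when the compiler
 * supports 'asm goto'.
 */
static int new_scheme_uses_jump_label(void)
{
#ifdef CONFIG_JUMP_LABEL
	return 1;
#else
	return 0;
#endif
}
```

    Both functions agree whenever the compiler check and the config option
    agree; the new scheme simply removes the case where they could differ.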

    Signed-off-by: Masahiro Yamada
    Acked-by: Michael Ellerman (powerpc)
    Tested-by: Sedat Dilek

    Masahiro Yamada
     
  • Pull vfs mount API prep from Al Viro:
    "Mount API prereqs.

    Mostly that's LSM mount options cleanups. There are several minor
    fixes in there, but nothing earth-shattering (leaks on failure exits,
    mostly)"

    * 'mount.part1' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (27 commits)
    mount_fs: suppress MAC on MS_SUBMOUNT as well as MS_KERNMOUNT
    smack: rewrite smack_sb_eat_lsm_opts()
    smack: get rid of match_token()
    smack: take the guts of smack_parse_opts_str() into a new helper
    LSM: new method: ->sb_add_mnt_opt()
    selinux: rewrite selinux_sb_eat_lsm_opts()
    selinux: regularize Opt_... names a bit
    selinux: switch away from match_token()
    selinux: new helper - selinux_add_opt()
    LSM: bury struct security_mnt_opts
    smack: switch to private smack_mnt_opts
    selinux: switch to private struct selinux_mnt_opts
    LSM: hide struct security_mnt_opts from any generic code
    selinux: kill selinux_sb_get_mnt_opts()
    LSM: turn sb_eat_lsm_opts() into a method
    nfs_remount(): don't leak, don't ignore LSM options quietly
    btrfs: sanitize security_mnt_opts use
    selinux; don't open-code a loop in sb_finish_set_opts()
    LSM: split ->sb_set_mnt_opts() out of ->sb_kern_mount()
    new helper: security_sb_eat_lsm_opts()
    ...

    Linus Torvalds
     

05 Jan, 2019

3 commits

  • Unpacking an external initrd may fail, e.g. when there is not enough
    memory. This leaves an incomplete rootfs, because some files might
    already have been extracted. Fix this by cleaning the rootfs so that the
    kernel does not use an incomplete rootfs.

    Link: http://lkml.kernel.org/r/20181030151805.5519-1-david.engraf@sysgo.com
    Signed-off-by: David Engraf
    Cc: Dominik Brodowski
    Cc: Greg Kroah-Hartman
    Cc: Philippe Ombredanne
    Cc: Arnd Bergmann
    Cc: Luc Van Oostenryck
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Engraf
     
  • We get warnings when building the kernel with W=1:

    kernel/fork.c:167:13: warning: no previous prototype for `arch_release_thread_stack' [-Wmissing-prototypes]
    kernel/fork.c:779:13: warning: no previous prototype for `fork_init' [-Wmissing-prototypes]

    Add the missing declaration to the header file to fix this.

    Also, remove arch_release_thread_stack() completely because no arch
    seems to implement it since bb9d81264 (arch: remove tile port).

    Link: http://lkml.kernel.org/r/1542170087-23645-1-git-send-email-wang.yi59@zte.com.cn
    Signed-off-by: Yi Wang
    Acked-by: Michal Hocko
    Acked-by: Mike Rapoport
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Yi Wang
     
  • Initcall names should not be changed.

    Link: http://lkml.kernel.org/r/20181124091829.GD10969@avx2
    Signed-off-by: Alexey Dobriyan
    Reviewed-by: Andrew Morton
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
     

29 Dec, 2018

4 commits

  • Pull Devicetree updates from Rob Herring:
    "The biggest highlight here is the start of using json-schema for DT
    bindings. Being able to validate bindings has been discussed for years
    with little progress.

    - Initial support for DT bindings using json-schema language. This is
    the start of converting DT bindings from free-form text to a
    structured format.

    - Reworking of initrd address initialization. This moves to using the
    phys address instead of virt addr in the DT parsing code. This
    rework was motivated by CONFIG_DEV_BLK_INITRD causing unnecessary
    rebuilding of lots of files.

    - Fix stale phandle entries in phandle cache

    - DT overlay validation improvements. This exposed several memory
    leak bugs which have been fixed.

    - Use node name and device_type helper functions in DT code

    - Last remaining conversions to using %pOFn printk specifier instead
    of device_node.name directly

    - Create new common RTC binding doc and move all trivial RTC devices
    out of trivial-devices.txt.

    - New bindings for Freescale MAG3110 magnetometer, Cadence Sierra
    PHY, and Xen shared memory

    - Update dtc to upstream version v1.4.7-57-gf267e674d145"

    * tag 'devicetree-for-4.21' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux: (68 commits)
    of: __of_detach_node() - remove node from phandle cache
    of: of_node_get()/of_node_put() nodes held in phandle cache
    gpio-omap.txt: add reg and interrupts properties
    dt-bindings: mrvl,intc: fix a trivial typo
    dt-bindings: iio: magnetometer: add dt-bindings for freescale mag3110
    dt-bindings: Convert trivial-devices.txt to json-schema
    dt-bindings: arm: mrvl: amend Browstone compatible string
    dt-bindings: arm: Convert Tegra board/soc bindings to json-schema
    dt-bindings: arm: Convert ZTE board/soc bindings to json-schema
    dt-bindings: arm: Add missing Xilinx boards
    dt-bindings: arm: Convert Xilinx board/soc bindings to json-schema
    dt-bindings: arm: Convert VIA board/soc bindings to json-schema
    dt-bindings: arm: Convert ST STi board/soc bindings to json-schema
    dt-bindings: arm: Convert SPEAr board/soc bindings to json-schema
    dt-bindings: arm: Convert CSR SiRF board/soc bindings to json-schema
    dt-bindings: arm: Convert QCom board/soc bindings to json-schema
    dt-bindings: arm: Convert TI nspire board/soc bindings to json-schema
    dt-bindings: arm: Convert TI davinci board/soc bindings to json-schema
    dt-bindings: arm: Convert Calxeda board/soc bindings to json-schema
    dt-bindings: arm: Convert Altera board/soc bindings to json-schema
    ...

    Linus Torvalds
     
  • Merge misc updates from Andrew Morton:

    - large KASAN update to use arm's "software tag-based mode"

    - a few misc things

    - sh updates

    - ocfs2 updates

    - just about all of MM

    * emailed patches from Andrew Morton: (167 commits)
    kernel/fork.c: mark 'stack_vm_area' with __maybe_unused
    memcg, oom: notify on oom killer invocation from the charge path
    mm, swap: fix swapoff with KSM pages
    include/linux/gfp.h: fix typo
    mm/hmm: fix memremap.h, move dev_page_fault_t callback to hmm
    hugetlbfs: Use i_mmap_rwsem to fix page fault/truncate race
    hugetlbfs: use i_mmap_rwsem for more pmd sharing synchronization
    memory_hotplug: add missing newlines to debugging output
    mm: remove __hugepage_set_anon_rmap()
    include/linux/vmstat.h: remove unused page state adjustment macro
    mm/page_alloc.c: allow error injection
    mm: migrate: drop unused argument of migrate_page_move_mapping()
    blkdev: avoid migration stalls for blkdev pages
    mm: migrate: provide buffer_migrate_page_norefs()
    mm: migrate: move migrate_page_lock_buffers()
    mm: migrate: lock buffers before migrate_page_move_mapping()
    mm: migration: factor out code to compute expected number of page references
    mm, page_alloc: enable pcpu_drain with zone capability
    kmemleak: add config to select auto scan
    mm/page_alloc.c: don't call kasan_free_pages() at deferred mem init
    ...

    Linus Torvalds
     
  • Pull block updates from Jens Axboe:
    "This is the main pull request for block/storage for 4.21.

    Larger than usual, it was a busy round with lots of goodies queued up.
    Most notable is the removal of the old IO stack, which has been a long
    time coming. No new features for a while, everything coming in this
    week has all been fixes for things that were previously merged.

    This contains:

    - Use atomic counters instead of semaphores for mtip32xx (Arnd)

    - Cleanup of the mtip32xx request setup (Christoph)

    - Fix for circular locking dependency in loop (Jan, Tetsuo)

    - bcache (Coly, Guoju, Shenghui)
    * Optimizations for writeback caching
    * Various fixes and improvements

    - nvme (Chaitanya, Christoph, Sagi, Jay, me, Keith)
    * host and target support for NVMe over TCP
    * Error log page support
    * Support for separate read/write/poll queues
    * Much improved polling
    * discard OOM fallback
    * Tracepoint improvements

    - lightnvm (Hans, Hua, Igor, Matias, Javier)
    * Igor added packed metadata to pblk. Now drives without metadata
    per LBA can be used as well.
    * Fix from Geert on uninitialized value on chunk metadata reads.
    * Fixes from Hans and Javier to pblk recovery and write path.
    * Fix from Hua Su to fix a race condition in the pblk recovery
    code.
    * Scan optimization added to pblk recovery from Zhoujie.
    * Small geometry cleanup from me.

    - Conversion of the last few drivers that used the legacy path to
    blk-mq (me)

    - Removal of legacy IO path in SCSI (me, Christoph)

    - Removal of legacy IO stack and schedulers (me)

    - Support for much better polling, now without interrupts at all.
    blk-mq adds support for multiple queue maps, which enables us to
    have a map per type. This in turn enables nvme to have separate
    completion queues for polling, which can then be interrupt-less.
    Also means we're ready for async polled IO, which is hopefully
    coming in the next release.

    - Killing of (now) unused block exports (Christoph)

    - Unification of the blk-rq-qos and blk-wbt wait handling (Josef)

    - Support for zoned testing with null_blk (Masato)

    - sx8 conversion to per-host tag sets (Christoph)

    - IO priority improvements (Damien)

    - mq-deadline zoned fix (Damien)

    - Ref count blkcg series (Dennis)

    - Lots of blk-mq improvements and speedups (me)

    - sbitmap scalability improvements (me)

    - Make core inflight IO accounting per-cpu (Mikulas)

    - Export timeout setting in sysfs (Weiping)

    - Cleanup the direct issue path (Jianchao)

    - Export blk-wbt internals in block debugfs for easier debugging
    (Ming)

    - Lots of other fixes and improvements"

    * tag 'for-4.21/block-20181221' of git://git.kernel.dk/linux-block: (364 commits)
    kyber: use sbitmap add_wait_queue/list_del wait helpers
    sbitmap: add helpers for add/del wait queue handling
    block: save irq state in blkg_lookup_create()
    dm: don't reuse bio for flushes
    nvme-pci: trace SQ status on completions
    nvme-rdma: implement polling queue map
    nvme-fabrics: allow user to pass in nr_poll_queues
    nvme-fabrics: allow nvmf_connect_io_queue to poll
    nvme-core: optionally poll sync commands
    block: make request_to_qc_t public
    nvme-tcp: fix spelling mistake "attepmpt" -> "attempt"
    nvme-tcp: fix endianess annotations
    nvmet-tcp: fix endianess annotations
    nvme-pci: refactor nvme_poll_irqdisable to make sparse happy
    nvme-pci: only set nr_maps to 2 if poll queues are supported
    nvmet: use a macro for default error location
    nvmet: fix comparison of a u16 with -1
    blk-mq: enable IO poll if .nr_queues of type poll > 0
    blk-mq: change blk_mq_queue_busy() to blk_mq_queue_inflight()
    blk-mq: skip zero-queue maps in blk_mq_map_swqueue
    ...

    Linus Torvalds
     
  • The current value of the early boot static pool size, 1024, is not big
    enough for systems with a large number of CPUs when timer and/or
    workqueue objects are selected. As a result, systems with 60+ CPUs and
    both timer and workqueue objects enabled could trigger "ODEBUG: Out of
    memory. ODEBUG disabled".

    Some debug objects are allocated during early boot. Enabling options
    such as timer or workqueue objects may significantly increase the size
    required on systems with a large number of CPUs. For example:

    CONFIG_DEBUG_OBJECTS_TIMERS:
    No. CPUs x 2 (worker pool) objects:
    start_kernel
    workqueue_init_early
    init_worker_pool
    init_timer_key
    debug_object_init

    plus No. CPUs objects (CONFIG_HIGH_RES_TIMERS):
    sched_init
    hrtick_rq_init
    hrtimer_init

    CONFIG_DEBUG_OBJECTS_WORK:
    No. CPUs objects:
    vmalloc_init
    __init_work

    plus No. CPUs x 6 (workqueue) objects:
    workqueue_init_early
    alloc_workqueue
    __alloc_workqueue_key
    alloc_and_link_pwqs
    init_pwq

    Also, plus No. CPUs objects:
    perf_event_init
    __init_srcu_struct
    init_srcu_struct_fields
    init_srcu_struct_nodes
    __init_work
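
    The per-CPU counts listed above can be tallied as follows. This is only
    a worked sum over the call chains named in this message, with
    hypothetical identifiers; since the list is introduced with "For
    example", other early allocations presumably consume pool entries as
    well.

```c
/* Early-boot debug objects per CPU, one entry per call chain listed above. */
enum {
	TIMER_WORKER_POOL = 2,	/* init_worker_pool -> init_timer_key */
	TIMER_HRTICK      = 1,	/* hrtick_rq_init -> hrtimer_init */
	WORK_VMALLOC      = 1,	/* vmalloc_init -> __init_work */
	WORK_WORKQUEUE    = 6,	/* alloc_and_link_pwqs -> init_pwq */
	WORK_SRCU         = 1,	/* init_srcu_struct_nodes -> __init_work */
};

/* Static pool size quoted in the message. */
#define STATIC_POOL_SIZE 1024

/* Objects consumed by the listed call chains for a given CPU count. */
static unsigned int listed_objects(unsigned int ncpus)
{
	return ncpus * (TIMER_WORKER_POOL + TIMER_HRTICK + WORK_VMALLOC +
			WORK_WORKQUEUE + WORK_SRCU);
}

/* Pool entries left for everything else; negative means already exhausted. */
static long pool_headroom(unsigned int ncpus)
{
	return (long)STATIC_POOL_SIZE - (long)listed_objects(ncpus);
}
```

    The listed chains add up to 11 objects per CPU, so 64 CPUs account for
    704 of the 1024 entries, and 96 CPUs would need 1056 on their own.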

    However, none of these things are actually used or required before
    debug_objects_mem_init() is invoked, so just move the call right before
    vmalloc_init().

    According to tglx, "the reason why the call is at this place in
    start_kernel() is historical. It's because back in the days when
    debugobjects were added the memory allocator was enabled way later than
    today."

    Link: http://lkml.kernel.org/r/20181126102407.1836-1-cai@gmx.us
    Signed-off-by: Qian Cai
    Suggested-by: Thomas Gleixner
    Cc: Waiman Long
    Cc: Yang Shi
    Cc: Arnd Bergmann
    Cc: Catalin Marinas
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Qian Cai
     

28 Dec, 2018

1 commit

  • Pull audit updates from Paul Moore:
    "In the finest of holiday traditions, I have a number of gifts to
    share today. While most of them are re-gifts from others, unlike the
    typical re-gift, these are things you will want in and around your
    tree; I promise.

    This pull request is perhaps a bit larger than our typical PR, but
    most of it comes from Jan's rework of audit's fanotify code; a very
    welcome improvement. We ran this through our normal regression tests,
    as well as some newly created stress tests and everything looks good.

    Richard added a few patches, mostly cleaning up a few things and
    shortening some of the audit records that we send to userspace; a
    change the userspace folks are quite happy about.

    Finally YueHaibing and I kick in a few patches to simplify things a
    bit and make the code less prone to errors.

    Lastly, I want to say thanks one more time to everyone who has
    contributed patches, testing, and code reviews for the audit subsystem
    over the past year. The project is what it is due to your help and
    contributions - thank you"

    * tag 'audit-pr-20181224' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit: (22 commits)
    audit: remove duplicated include from audit.c
    audit: shorten PATH cap values when zero
    audit: use current whenever possible
    audit: minimize our use of audit_log_format()
    audit: remove WATCH and TREE config options
    audit: use session_info helper
    audit: localize audit_log_session_info prototype
    audit: Use 'mark' name for fsnotify_mark variables
    audit: Replace chunk attached to mark instead of replacing mark
    audit: Simplify locking around untag_chunk()
    audit: Drop all unused chunk nodes during deletion
    audit: Guarantee forward progress of chunk untagging
    audit: Allocate fsnotify mark independently of chunk
    audit: Provide helper for dropping mark's chunk reference
    audit: Remove pointless check in insert_hash()
    audit: Factor out chunk replacement code
    audit: Make hash table insertion safe against concurrent lookups
    audit: Embed key into chunk
    audit: Fix possible tagging failures
    audit: Fix possible spurious -ENOSPC error
    ...

    Linus Torvalds
     

27 Dec, 2018

2 commits

  • Pull EFI updates from Ingo Molnar:
    "The main changes in this cycle were:

    - Allocate the E820 buffer before doing the
    GetMemoryMap/ExitBootServices dance so we don't run out of space

    - Clear EFI boot services mappings when freeing the memory

    - Harden efivars against callers that invoke it on non-EFI boots

    - Reduce the number of memblock reservations resulting from extensive
    use of the new efi_mem_reserve_persistent() API

    - Other assorted fixes and cleanups"

    * 'efi-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    x86/efi: Don't unmap EFI boot services code/data regions for EFI_OLD_MEMMAP and EFI_MIXED_MODE
    efi: Reduce the amount of memblock reservations for persistent allocations
    efi: Permit multiple entries in persistent memreserve data structure
    efi/libstub: Disable some warnings for x86{,_64}
    x86/efi: Move efi_<reserve/free>_boot_services() to arch/x86
    x86/efi: Unmap EFI boot services code/data regions from efi_pgd
    x86/mm/pageattr: Introduce helper function to unmap EFI boot services
    efi/fdt: Simplify the get_fdt() flow
    efi/fdt: Indentation fix
    firmware/efi: Add NULL pointer checks in efivars API functions

    Linus Torvalds
     
  • Pull RCU updates from Ingo Molnar:
    "The biggest RCU changes in this cycle were:

    - Convert RCU's BUG_ON() and similar calls to WARN_ON() and similar.

    - Replace calls of RCU-bh and RCU-sched update-side functions to
    their vanilla RCU counterparts. This series is a step towards
    complete removal of the RCU-bh and RCU-sched update-side functions.

    ( Note that some of these conversions are going upstream via their
    respective maintainers. )

    - Documentation updates, including a number of flavor-consolidation
    updates from Joel Fernandes.

    - Miscellaneous fixes.

    - Automate generation of the initrd filesystem used for rcutorture
    testing.

    - Convert spin_is_locked() assertions to instead use lockdep.

    ( Note that some of these conversions are going upstream via their
    respective maintainers. )

    - SRCU updates, especially including a fix from Dennis Krein for a
    bag-on-head-class bug.

    - RCU torture-test updates"

    * 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (112 commits)
    rcutorture: Don't do busted forward-progress testing
    rcutorture: Use 100ms buckets for forward-progress callback histograms
    rcutorture: Recover from OOM during forward-progress tests
    rcutorture: Print forward-progress test age upon failure
    rcutorture: Print time since GP end upon forward-progress failure
    rcutorture: Print histogram of CB invocation at OOM time
    rcutorture: Print GP age upon forward-progress failure
    rcu: Print per-CPU callback counts for forward-progress failures
    rcu: Account for nocb-CPU callback counts in RCU CPU stall warnings
    rcutorture: Dump grace-period diagnostics upon forward-progress OOM
    rcutorture: Prepare for asynchronous access to rcu_fwd_startat
    torture: Remove unnecessary "ret" variables
    rcutorture: Affinity forward-progress test to avoid housekeeping CPUs
    rcutorture: Break up too-long rcu_torture_fwd_prog() function
    rcutorture: Remove cbflood facility
    torture: Bring any extra CPUs online during kernel startup
    rcutorture: Add call_rcu() flooding forward-progress tests
    rcutorture/formal: Replace synchronize_sched() with synchronize_rcu()
    tools/kernel.h: Replace synchronize_sched() with synchronize_rcu()
    net/decnet: Replace rcu_barrier_bh() with rcu_barrier()
    ...

    Linus Torvalds
     

21 Dec, 2018

1 commit


15 Dec, 2018

1 commit

  • The kernel commandline parameter named in CONFIG_PSI_DEFAULT_DISABLED
    help text contradicts the documentation in kernel-parameters.txt, and
    the code. Fix that.

    Link: http://lkml.kernel.org/r/20181203213416.GA12627@cmpxchg.org
    Fixes: e0c274472d ("psi: make disabling/enabling easier for vendor kernels")
    Signed-off-by: Baruch Siach
    Acked-by: Johannes Weiner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Baruch Siach
     

05 Dec, 2018

1 commit

  • Pull in v4.20-rc5, solving a conflict we'll otherwise get in aio.c and
    also getting the merge fix that went into mainline, which users are
    hitting when testing for-4.21/block and/or for-next.

    * tag 'v4.20-rc5': (664 commits)
    Linux 4.20-rc5
    PCI: Fix incorrect value returned from pcie_get_speed_cap()
    MAINTAINERS: Update linux-mips mailing list address
    ocfs2: fix potential use after free
    mm/khugepaged: fix the xas_create_range() error path
    mm/khugepaged: collapse_shmem() do not crash on Compound
    mm/khugepaged: collapse_shmem() without freezing new_page
    mm/khugepaged: minor reorderings in collapse_shmem()
    mm/khugepaged: collapse_shmem() remember to clear holes
    mm/khugepaged: fix crashes due to misaccounted holes
    mm/khugepaged: collapse_shmem() stop if punched or truncated
    mm/huge_memory: fix lockdep complaint on 32-bit i_size_read()
    mm/huge_memory: splitting set mapping+index before unfreeze
    mm/huge_memory: rename freeze_page() to unmap_page()
    initramfs: clean old path before creating a hardlink
    kernel/kcov.c: mark funcs in __sanitizer_cov_trace_pc() as notrace
    psi: make disabling/enabling easier for vendor kernels
    proc: fixup map_files test on arm
    debugobjects: avoid recursive calls with kmemleak
    userfaultfd: shmem: UFFDIO_COPY: set the page dirty if VM_WRITE is not set
    ...

    Jens Axboe
     

04 Dec, 2018

1 commit

  • …k/linux-rcu into core/rcu

    Pull RCU changes from Paul E. McKenney:

    - Convert RCU's BUG_ON() and similar calls to WARN_ON() and similar.

    - Replace calls of RCU-bh and RCU-sched update-side functions
    to their vanilla RCU counterparts. This series is a step
    towards complete removal of the RCU-bh and RCU-sched update-side
    functions.

    ( Note that some of these conversions are going upstream via their
    respective maintainers. )

    - Documentation updates, including a number of flavor-consolidation
    updates from Joel Fernandes.

    - Miscellaneous fixes.

    - Automate generation of the initrd filesystem used for
    rcutorture testing.

    - Convert spin_is_locked() assertions to instead use lockdep.

    ( Note that some of these conversions are going upstream via their
    respective maintainers. )

    - SRCU updates, especially including a fix from Dennis Krein
    for a bag-on-head-class bug.

    - RCU torture-test updates.

    Signed-off-by: Ingo Molnar <mingo@kernel.org>

    Ingo Molnar