06 May, 2019

1 commit

  • Poking-mm initialization might require duplicating the PGD at an early
    stage. Initialize the PGD cache earlier to prevent boot failures.

    Reported-by: kernel test robot
    Signed-off-by: Nadav Amit
    Cc: Andy Lutomirski
    Cc: Borislav Petkov
    Cc: Dave Hansen
    Cc: H. Peter Anvin
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Rick Edgecombe
    Cc: Rik van Riel
    Cc: Stephen Rothwell
    Cc: Thomas Gleixner
    Fixes: 4fc19708b165 ("x86/alternatives: Initialize temporary mm for patching")
    Link: http://lkml.kernel.org/r/20190505011124.39692-1-namit@vmware.com
    Signed-off-by: Ingo Molnar

    Nadav Amit
     

30 Apr, 2019

1 commit

  • To prevent improper use of the PTEs that are used for text patching, the
    next patches will use a temporary mm struct. Initialize it by copying
    the init mm.

    The address that will be used for patching is taken from the lower
    address range that is usually used for task memory. Doing so avoids the
    need to frequently synchronize the temporary mm (e.g., when BPF programs
    are installed), since different PGDs are used for task memory.

    Finally, randomize the address of the PTEs to harden against exploits
    that use these PTEs.
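
    A minimal sketch of the resulting initialization (condensed from the
    patch's poking_init() in arch/x86/kernel/alternative.c; treat it as
    illustrative rather than a literal excerpt):

    void __init poking_init(void)
    {
            /* The temporary mm is a copy of init_mm. */
            poking_mm = copy_init_mm();
            BUG_ON(!poking_mm);

            /*
             * Randomize the poking address, keeping the PTEs used for
             * patching within the lower (task) address range.
             */
            poking_addr = TASK_UNMAPPED_BASE;
            if (IS_ENABLED(CONFIG_RANDOMIZE_BASE))
                    poking_addr += (kaslr_get_random_long("Poking") & PAGE_MASK) %
                            (TASK_SIZE - TASK_UNMAPPED_BASE - 3 * PAGE_SIZE);
    }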

    Suggested-by: Andy Lutomirski
    Tested-by: Masami Hiramatsu
    Signed-off-by: Nadav Amit
    Signed-off-by: Rick Edgecombe
    Signed-off-by: Peter Zijlstra (Intel)
    Reviewed-by: Masami Hiramatsu
    Cc: Borislav Petkov
    Cc: Dave Hansen
    Cc: H. Peter Anvin
    Cc: Kees Cook
    Cc: Linus Torvalds
    Cc: Rik van Riel
    Cc: Thomas Gleixner
    Cc: akpm@linux-foundation.org
    Cc: ard.biesheuvel@linaro.org
    Cc: deneen.t.dock@intel.com
    Cc: kernel-hardening@lists.openwall.com
    Cc: kristen@linux.intel.com
    Cc: linux_dti@icloud.com
    Cc: will.deacon@arm.com
    Link: https://lkml.kernel.org/r/20190426232303.28381-8-nadav.amit@gmail.com
    Signed-off-by: Ingo Molnar

    Nadav Amit
     

20 Apr, 2019

1 commit

  • When a module option, or core kernel argument, toggles a static-key it
    requires jump labels to be initialized early. While x86, PowerPC, and
    ARM64 arrange for jump_label_init() to be called before parse_args(),
    ARM does not.

    Kernel command line: rdinit=/sbin/init page_alloc.shuffle=1 panic=-1 console=ttyAMA0,115200 page_alloc.shuffle=1
    ------------[ cut here ]------------
    WARNING: CPU: 0 PID: 0 at ./include/linux/jump_label.h:303
    page_alloc_shuffle+0x12c/0x1ac
    static_key_enable(): static key 'page_alloc_shuffle_key+0x0/0x4' used
    before call to jump_label_init()
    Modules linked in:
    CPU: 0 PID: 0 Comm: swapper Not tainted
    5.1.0-rc4-next-20190410-00003-g3367c36ce744 #1
    Hardware name: ARM Integrator/CP (Device Tree)
    [] (unwind_backtrace) from [] (show_stack+0x10/0x18)
    [] (show_stack) from [] (dump_stack+0x18/0x24)
    [] (dump_stack) from [] (__warn+0xe0/0x108)
    [] (__warn) from [] (warn_slowpath_fmt+0x44/0x6c)
    [] (warn_slowpath_fmt) from []
    (page_alloc_shuffle+0x12c/0x1ac)
    [] (page_alloc_shuffle) from [] (shuffle_store+0x28/0x48)
    [] (shuffle_store) from [] (parse_args+0x1f4/0x350)
    [] (parse_args) from [] (start_kernel+0x1c0/0x488)

    Move the fallback call to jump_label_init() to occur before
    parse_args().

    The redundant calls to jump_label_init() in other archs are left intact
    in case they have static key toggling use cases that are even earlier
    than option parsing.
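
    Condensed, the resulting ordering in start_kernel() looks roughly like
    this (simplified; not a literal excerpt):

    pr_notice("Kernel command line: %s\n", boot_command_line);
    /* parameters may set static keys */
    jump_label_init();
    parse_early_param();
    after_dashes = parse_args("Booting kernel",
                              static_command_line, __start___param,
                              __stop___param - __start___param,
                              -1, -1, NULL, &unknown_bootoption);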

    Link: http://lkml.kernel.org/r/155544804466.1032396.13418949511615676665.stgit@dwillia2-desk3.amr.corp.intel.com
    Signed-off-by: Dan Williams
    Reported-by: Guenter Roeck
    Reviewed-by: Kees Cook
    Cc: Mathieu Desnoyers
    Cc: Thomas Gleixner
    Cc: Mike Rapoport
    Cc: Russell King
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dan Williams
     

13 Mar, 2019

1 commit

  • Add panic() calls if memblock_alloc() returns NULL.

    The panic() format duplicates the one used by memblock itself, and to
    avoid an explosion of long parameter lists, open-coded allocation size
    calculations are replaced with a local variable.
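
    The resulting call-site pattern looks roughly like this (the variable
    names here are hypothetical):

    size_t size = sizeof(*table) * nr_entries;  /* local variable keeps the
                                                   panic() argument list short */

    table = memblock_alloc(size, SMP_CACHE_BYTES);
    if (!table)
            panic("%s: Failed to allocate %zu bytes\n", __func__, size);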

    Link: http://lkml.kernel.org/r/1548057848-15136-18-git-send-email-rppt@linux.ibm.com
    Signed-off-by: Mike Rapoport
    Cc: Catalin Marinas
    Cc: Christophe Leroy
    Cc: Christoph Hellwig
    Cc: "David S. Miller"
    Cc: Dennis Zhou
    Cc: Geert Uytterhoeven
    Cc: Greentime Hu
    Cc: Greg Kroah-Hartman
    Cc: Guan Xuetao
    Cc: Guo Ren
    Cc: Guo Ren [c-sky]
    Cc: Heiko Carstens
    Cc: Juergen Gross [Xen]
    Cc: Mark Salter
    Cc: Matt Turner
    Cc: Max Filippov
    Cc: Michael Ellerman
    Cc: Michal Simek
    Cc: Paul Burton
    Cc: Petr Mladek
    Cc: Richard Weinberger
    Cc: Rich Felker
    Cc: Rob Herring
    Cc: Rob Herring
    Cc: Russell King
    Cc: Stafford Horne
    Cc: Tony Luck
    Cc: Vineet Gupta
    Cc: Yoshinori Sato
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mike Rapoport
     

13 Feb, 2019

1 commit

  • This reverts commit fe53ca54270a ("mm: use early_pfn_to_nid in
    page_ext_init").

    When booting a system with "page_owner=on",

    start_kernel
    page_ext_init
    invoke_init_callbacks
    init_section_page_ext
    init_page_owner
    init_early_allocated_pages
    init_zones_in_node
    init_pages_in_zone
    lookup_page_ext
    page_to_nid

    The issue here is that page_to_nid() will not work since some page flags
    have no node information until later in page_alloc_init_late() due to
    DEFERRED_STRUCT_PAGE_INIT. Hence, it could trigger an out-of-bounds
    access with an invalid nid.

    UBSAN: Undefined behaviour in ./include/linux/mm.h:1104:50
    index 7 is out of range for type 'zone [5]'

    Also, kernel will panic since flags were poisoned earlier with,

    CONFIG_DEBUG_VM_PGFLAGS=y
    CONFIG_NODE_NOT_IN_PAGE_FLAGS=n

    start_kernel
    setup_arch
    pagetable_init
    paging_init
    sparse_init
    sparse_init_nid
    memblock_alloc_try_nid_raw

    The poisoned flags are not handled well in init_pages_in_zone(), which
    ends up calling page_to_nid().

    page:ffffea0004200000 is uninitialized and poisoned
    raw: ffffffffffffffff ffffffffffffffff ffffffffffffffff ffffffffffffffff
    raw: ffffffffffffffff ffffffffffffffff ffffffffffffffff ffffffffffffffff
    page dumped because: VM_BUG_ON_PAGE(PagePoisoned(p))
    page_owner info is not active (free page?)
    kernel BUG at include/linux/mm.h:990!
    RIP: 0010:init_page_owner+0x486/0x520

    This means that assumptions behind commit fe53ca54270a ("mm: use
    early_pfn_to_nid in page_ext_init") are incomplete. Therefore, revert
    the commit for now. A proper way to move the page_owner initialization
    to sooner is to hook into memmap initialization.

    Link: http://lkml.kernel.org/r/20190115202812.75820-1-cai@lca.pw
    Signed-off-by: Qian Cai
    Acked-by: Michal Hocko
    Cc: Pasha Tatashin
    Cc: Mel Gorman
    Cc: Yang Shi
    Cc: Joonsoo Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Qian Cai
     

05 Jan, 2019

2 commits

  • We get a warning when building the kernel with W=1:

    kernel/fork.c:167:13: warning: no previous prototype for `arch_release_thread_stack' [-Wmissing-prototypes]
    kernel/fork.c:779:13: warning: no previous prototype for `fork_init' [-Wmissing-prototypes]

    Add the missing declaration in a header file to fix this.

    Also, remove arch_release_thread_stack() completely because no arch
    seems to implement it since bb9d81264 (arch: remove tile port).
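
    The declaration side amounts to a one-liner in a shared header (sketch;
    the header path shown is an assumption, not a quote from the patch):

    /* include/linux/sched/task.h (assumed location) */
    extern void fork_init(void);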

    Link: http://lkml.kernel.org/r/1542170087-23645-1-git-send-email-wang.yi59@zte.com.cn
    Signed-off-by: Yi Wang
    Acked-by: Michal Hocko
    Acked-by: Mike Rapoport
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Yi Wang
     
  • Initcall names should not be changed.

    Link: http://lkml.kernel.org/r/20181124091829.GD10969@avx2
    Signed-off-by: Alexey Dobriyan
    Reviewed-by: Andrew Morton
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
     

29 Dec, 2018

3 commits

  • Merge misc updates from Andrew Morton:

    - large KASAN update to use arm's "software tag-based mode"

    - a few misc things

    - sh updates

    - ocfs2 updates

    - just about all of MM

    * emailed patches from Andrew Morton : (167 commits)
    kernel/fork.c: mark 'stack_vm_area' with __maybe_unused
    memcg, oom: notify on oom killer invocation from the charge path
    mm, swap: fix swapoff with KSM pages
    include/linux/gfp.h: fix typo
    mm/hmm: fix memremap.h, move dev_page_fault_t callback to hmm
    hugetlbfs: Use i_mmap_rwsem to fix page fault/truncate race
    hugetlbfs: use i_mmap_rwsem for more pmd sharing synchronization
    memory_hotplug: add missing newlines to debugging output
    mm: remove __hugepage_set_anon_rmap()
    include/linux/vmstat.h: remove unused page state adjustment macro
    mm/page_alloc.c: allow error injection
    mm: migrate: drop unused argument of migrate_page_move_mapping()
    blkdev: avoid migration stalls for blkdev pages
    mm: migrate: provide buffer_migrate_page_norefs()
    mm: migrate: move migrate_page_lock_buffers()
    mm: migrate: lock buffers before migrate_page_move_mapping()
    mm: migration: factor out code to compute expected number of page references
    mm, page_alloc: enable pcpu_drain with zone capability
    kmemleak: add config to select auto scan
    mm/page_alloc.c: don't call kasan_free_pages() at deferred mem init
    ...

    Linus Torvalds
     
  • Pull block updates from Jens Axboe:
    "This is the main pull request for block/storage for 4.21.

    Larger than usual, it was a busy round with lots of goodies queued up.
    Most notable is the removal of the old IO stack, which has been a long
    time coming. No new features for a while, everything coming in this
    week has all been fixes for things that were previously merged.

    This contains:

    - Use atomic counters instead of semaphores for mtip32xx (Arnd)

    - Cleanup of the mtip32xx request setup (Christoph)

    - Fix for circular locking dependency in loop (Jan, Tetsuo)

    - bcache (Coly, Guoju, Shenghui)
    * Optimizations for writeback caching
    * Various fixes and improvements

    - nvme (Chaitanya, Christoph, Sagi, Jay, me, Keith)
    * host and target support for NVMe over TCP
    * Error log page support
    * Support for separate read/write/poll queues
    * Much improved polling
    * discard OOM fallback
    * Tracepoint improvements

    - lightnvm (Hans, Hua, Igor, Matias, Javier)
    * Igor added packed metadata to pblk. Now drives without metadata
    per LBA can be used as well.
    * Fix from Geert on uninitialized value on chunk metadata reads.
    * Fixes from Hans and Javier to pblk recovery and write path.
    * Fix from Hua Su to fix a race condition in the pblk recovery
    code.
    * Scan optimization added to pblk recovery from Zhoujie.
    * Small geometry cleanup from me.

    - Conversion of the last few drivers that used the legacy path to
    blk-mq (me)

    - Removal of legacy IO path in SCSI (me, Christoph)

    - Removal of legacy IO stack and schedulers (me)

    - Support for much better polling, now without interrupts at all.
    blk-mq adds support for multiple queue maps, which enables us to
    have a map per type. This in turn enables nvme to have separate
    completion queues for polling, which can then be interrupt-less.
    Also means we're ready for async polled IO, which is hopefully
    coming in the next release.

    - Killing of (now) unused block exports (Christoph)

    - Unification of the blk-rq-qos and blk-wbt wait handling (Josef)

    - Support for zoned testing with null_blk (Masato)

    - sx8 conversion to per-host tag sets (Christoph)

    - IO priority improvements (Damien)

    - mq-deadline zoned fix (Damien)

    - Ref count blkcg series (Dennis)

    - Lots of blk-mq improvements and speedups (me)

    - sbitmap scalability improvements (me)

    - Make core inflight IO accounting per-cpu (Mikulas)

    - Export timeout setting in sysfs (Weiping)

    - Cleanup the direct issue path (Jianchao)

    - Export blk-wbt internals in block debugfs for easier debugging
    (Ming)

    - Lots of other fixes and improvements"

    * tag 'for-4.21/block-20181221' of git://git.kernel.dk/linux-block: (364 commits)
    kyber: use sbitmap add_wait_queue/list_del wait helpers
    sbitmap: add helpers for add/del wait queue handling
    block: save irq state in blkg_lookup_create()
    dm: don't reuse bio for flushes
    nvme-pci: trace SQ status on completions
    nvme-rdma: implement polling queue map
    nvme-fabrics: allow user to pass in nr_poll_queues
    nvme-fabrics: allow nvmf_connect_io_queue to poll
    nvme-core: optionally poll sync commands
    block: make request_to_qc_t public
    nvme-tcp: fix spelling mistake "attepmpt" -> "attempt"
    nvme-tcp: fix endianess annotations
    nvmet-tcp: fix endianess annotations
    nvme-pci: refactor nvme_poll_irqdisable to make sparse happy
    nvme-pci: only set nr_maps to 2 if poll queues are supported
    nvmet: use a macro for default error location
    nvmet: fix comparison of a u16 with -1
    blk-mq: enable IO poll if .nr_queues of type poll > 0
    blk-mq: change blk_mq_queue_busy() to blk_mq_queue_inflight()
    blk-mq: skip zero-queue maps in blk_mq_map_swqueue
    ...

    Linus Torvalds
     
  • The current value of the early boot static pool size, 1024, is not big
    enough for systems with a large number of CPUs when timer and/or
    workqueue objects are selected. As a result, systems with 60+ CPUs and
    both timer and workqueue objects enabled could trigger "ODEBUG: Out of
    memory. ODEBUG disabled".

    Some debug objects are allocated during early boot. Enabling options
    like timer or workqueue objects may significantly increase the size
    required with a large number of CPUs. For example,

    CONFIG_DEBUG_OBJECTS_TIMERS:
    No. CPUs x 2 (worker pool) objects:
    start_kernel
    workqueue_init_early
    init_worker_pool
    init_timer_key
    debug_object_init

    plus No. CPUs objects (CONFIG_HIGH_RES_TIMERS):
    sched_init
    hrtick_rq_init
    hrtimer_init

    CONFIG_DEBUG_OBJECTS_WORK:
    No. CPUs objects:
    vmalloc_init
    __init_work

    plus No. CPUs x 6 (workqueue) objects:
    workqueue_init_early
    alloc_workqueue
    __alloc_workqueue_key
    alloc_and_link_pwqs
    init_pwq

    Also, plus No. CPUs objects:
    perf_event_init
    __init_srcu_struct
    init_srcu_struct_fields
    init_srcu_struct_nodes
    __init_work

    However, none of these objects are actually used or required before
    debug_objects_mem_init() is invoked, so just move the call to right
    before vmalloc_init().
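
    In code terms the move is small; a simplified sketch of the resulting
    ordering (the surrounding mm_init() context is an assumption based on
    the description above):

    static void __init mm_init(void)
    {
            ...
            kmem_cache_init();
            pgtable_init();
            /*
             * Moved here: the slab allocator is now up, and vmalloc_init()
             * below allocates the first work debug objects.
             */
            debug_objects_mem_init();
            vmalloc_init();
            ...
    }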

    According to tglx, "the reason why the call is at this place in
    start_kernel() is historical. It's because back in the days when
    debugobjects were added the memory allocator was enabled way later than
    today."

    Link: http://lkml.kernel.org/r/20181126102407.1836-1-cai@gmx.us
    Signed-off-by: Qian Cai
    Suggested-by: Thomas Gleixner
    Cc: Waiman Long
    Cc: Yang Shi
    Cc: Arnd Bergmann
    Cc: Catalin Marinas
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Qian Cai
     

27 Dec, 2018

1 commit

  • Pull EFI updates from Ingo Molnar:
    "The main changes in this cycle were:

    - Allocate the E820 buffer before doing the
    GetMemoryMap/ExitBootServices dance so we don't run out of space

    - Clear EFI boot services mappings when freeing the memory

    - Harden efivars against callers that invoke it on non-EFI boots

    - Reduce the number of memblock reservations resulting from extensive
    use of the new efi_mem_reserve_persistent() API

    - Other assorted fixes and cleanups"

    * 'efi-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    x86/efi: Don't unmap EFI boot services code/data regions for EFI_OLD_MEMMAP and EFI_MIXED_MODE
    efi: Reduce the amount of memblock reservations for persistent allocations
    efi: Permit multiple entries in persistent memreserve data structure
    efi/libstub: Disable some warnings for x86{,_64}
    x86/efi: Move efi_<reserve/free>_boot_services() to arch/x86
    x86/efi: Unmap EFI boot services code/data regions from efi_pgd
    x86/mm/pageattr: Introduce helper function to unmap EFI boot services
    efi/fdt: Simplify the get_fdt() flow
    efi/fdt: Indentation fix
    firmware/efi: Add NULL pointer checks in efivars API functions

    Linus Torvalds
     

30 Nov, 2018

1 commit

  • efi_<reserve/free>_boot_services() are x86-specific quirks and as such
    should be in asm/efi.h, so move them from linux/efi.h. Also, call
    efi_free_boot_services() from __efi_enter_virtual_mode(), as it is an
    x86-specific call and ideally shouldn't be part of init/main.c.

    Signed-off-by: Sai Praneeth Prakhya
    Signed-off-by: Ard Biesheuvel
    Acked-by: Thomas Gleixner
    Cc: Andy Lutomirski
    Cc: Arend van Spriel
    Cc: Bhupesh Sharma
    Cc: Borislav Petkov
    Cc: Dave Hansen
    Cc: Eric Snowberg
    Cc: Hans de Goede
    Cc: Joe Perches
    Cc: Jon Hunter
    Cc: Julien Thierry
    Cc: Linus Torvalds
    Cc: Marc Zyngier
    Cc: Matt Fleming
    Cc: Nathan Chancellor
    Cc: Peter Zijlstra
    Cc: Sedat Dilek
    Cc: YiFei Zhu
    Cc: linux-efi@vger.kernel.org
    Link: http://lkml.kernel.org/r/20181129171230.18699-7-ard.biesheuvel@linaro.org
    Signed-off-by: Ingo Molnar

    Sai Praneeth Prakhya
     

08 Nov, 2018

1 commit

  • This removes a bunch of core and elevator related code. On the core
    front, we remove anything related to queue running, draining,
    initialization, plugging, and congestions. We also kill anything
    related to request allocation, merging, retrieval, and completion.

    Remove any checking for single queue IO schedulers, as they no
    longer exist. This means we can also delete a bunch of code related
    to request issue, adding, completion, etc - and all the SQ related
    ops and helpers.

    Also kill the load_default_modules(), as all that did was provide
    for a way to load the default single queue elevator.

    Tested-by: Ming Lei
    Reviewed-by: Omar Sandoval
    Signed-off-by: Jens Axboe

    Jens Axboe
     

31 Oct, 2018

4 commits

  • When memblock allocation APIs are called with align = 0, the alignment
    is implicitly set to SMP_CACHE_BYTES.

    Implicit alignment is done deep in the memblock allocator and it can
    come as a surprise. Not that such an alignment would be wrong even
    when used incorrectly, but it is better to be explicit for the sake of
    clarity and the principle of least surprise.

    Replace all such uses of memblock APIs with the 'align' parameter
    explicitly set to SMP_CACHE_BYTES and stop implicit alignment assignment
    in the memblock internal allocation functions.

    For the case when memblock APIs are used via helper functions, e.g. like
    iommu_arena_new_node() in Alpha, the helper functions were detected with
    Coccinelle's help and then manually examined and updated where
    appropriate.

    The direct memblock APIs users were updated using the semantic patch below:

    @@
    expression size, min_addr, max_addr, nid;
    @@
    (
    |
    - memblock_alloc_try_nid_raw(size, 0, min_addr, max_addr, nid)
    + memblock_alloc_try_nid_raw(size, SMP_CACHE_BYTES, min_addr, max_addr, nid)
    |
    - memblock_alloc_try_nid_nopanic(size, 0, min_addr, max_addr, nid)
    + memblock_alloc_try_nid_nopanic(size, SMP_CACHE_BYTES, min_addr, max_addr, nid)
    |
    - memblock_alloc_try_nid(size, 0, min_addr, max_addr, nid)
    + memblock_alloc_try_nid(size, SMP_CACHE_BYTES, min_addr, max_addr, nid)
    |
    - memblock_alloc(size, 0)
    + memblock_alloc(size, SMP_CACHE_BYTES)
    |
    - memblock_alloc_raw(size, 0)
    + memblock_alloc_raw(size, SMP_CACHE_BYTES)
    |
    - memblock_alloc_from(size, 0, min_addr)
    + memblock_alloc_from(size, SMP_CACHE_BYTES, min_addr)
    |
    - memblock_alloc_nopanic(size, 0)
    + memblock_alloc_nopanic(size, SMP_CACHE_BYTES)
    |
    - memblock_alloc_low(size, 0)
    + memblock_alloc_low(size, SMP_CACHE_BYTES)
    |
    - memblock_alloc_low_nopanic(size, 0)
    + memblock_alloc_low_nopanic(size, SMP_CACHE_BYTES)
    |
    - memblock_alloc_from_nopanic(size, 0, min_addr)
    + memblock_alloc_from_nopanic(size, SMP_CACHE_BYTES, min_addr)
    |
    - memblock_alloc_node(size, 0, nid)
    + memblock_alloc_node(size, SMP_CACHE_BYTES, nid)
    )

    [mhocko@suse.com: changelog update]
    [akpm@linux-foundation.org: coding-style fixes]
    [rppt@linux.ibm.com: fix missed uses of implicit alignment]
    Link: http://lkml.kernel.org/r/20181016133656.GA10925@rapoport-lnx
    Link: http://lkml.kernel.org/r/1538687224-17535-1-git-send-email-rppt@linux.vnet.ibm.com
    Signed-off-by: Mike Rapoport
    Suggested-by: Michal Hocko
    Acked-by: Paul Burton [MIPS]
    Acked-by: Michael Ellerman [powerpc]
    Acked-by: Michal Hocko
    Cc: Catalin Marinas
    Cc: Chris Zankel
    Cc: Geert Uytterhoeven
    Cc: Guan Xuetao
    Cc: Ingo Molnar
    Cc: Matt Turner
    Cc: Michal Simek
    Cc: Richard Weinberger
    Cc: Russell King
    Cc: Thomas Gleixner
    Cc: Tony Luck
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mike Rapoport
     
  • Move remaining definitions and declarations from include/linux/bootmem.h
    into include/linux/memblock.h and remove the redundant header.

    The includes were replaced with the semantic patch below and then
    semi-automated removal of duplicated '#include <linux/memblock.h>':

    @@
    @@
    - #include <linux/bootmem.h>
    + #include <linux/memblock.h>

    [sfr@canb.auug.org.au: dma-direct: fix up for the removal of linux/bootmem.h]
    Link: http://lkml.kernel.org/r/20181002185342.133d1680@canb.auug.org.au
    [sfr@canb.auug.org.au: powerpc: fix up for removal of linux/bootmem.h]
    Link: http://lkml.kernel.org/r/20181005161406.73ef8727@canb.auug.org.au
    [sfr@canb.auug.org.au: x86/kaslr, ACPI/NUMA: fix for linux/bootmem.h removal]
    Link: http://lkml.kernel.org/r/20181008190341.5e396491@canb.auug.org.au
    Link: http://lkml.kernel.org/r/1536927045-23536-30-git-send-email-rppt@linux.vnet.ibm.com
    Signed-off-by: Mike Rapoport
    Signed-off-by: Stephen Rothwell
    Acked-by: Michal Hocko
    Cc: Catalin Marinas
    Cc: Chris Zankel
    Cc: "David S. Miller"
    Cc: Geert Uytterhoeven
    Cc: Greentime Hu
    Cc: Greg Kroah-Hartman
    Cc: Guan Xuetao
    Cc: Ingo Molnar
    Cc: "James E.J. Bottomley"
    Cc: Jonas Bonn
    Cc: Jonathan Corbet
    Cc: Ley Foon Tan
    Cc: Mark Salter
    Cc: Martin Schwidefsky
    Cc: Matt Turner
    Cc: Michael Ellerman
    Cc: Michal Simek
    Cc: Palmer Dabbelt
    Cc: Paul Burton
    Cc: Richard Kuo
    Cc: Richard Weinberger
    Cc: Rich Felker
    Cc: Russell King
    Cc: Serge Semin
    Cc: Thomas Gleixner
    Cc: Tony Luck
    Cc: Vineet Gupta
    Cc: Yoshinori Sato
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mike Rapoport
     
  • alloc_bootmem(size) is a shortcut for allocating SMP_CACHE_BYTES-aligned
    memory. When the align parameter of memblock_alloc() is 0, the alignment
    is implicitly set to SMP_CACHE_BYTES, and thus alloc_bootmem(size) and
    memblock_alloc(size, 0) are equivalent.

    The conversion is done using the following semantic patch:

    @@
    expression size;
    @@
    - alloc_bootmem(size)
    + memblock_alloc(size, 0)

    Link: http://lkml.kernel.org/r/1536927045-23536-22-git-send-email-rppt@linux.vnet.ibm.com
    Signed-off-by: Mike Rapoport
    Cc: Catalin Marinas
    Cc: Chris Zankel
    Cc: "David S. Miller"
    Cc: Geert Uytterhoeven
    Cc: Greentime Hu
    Cc: Greg Kroah-Hartman
    Cc: Guan Xuetao
    Cc: Ingo Molnar
    Cc: "James E.J. Bottomley"
    Cc: Jonas Bonn
    Cc: Jonathan Corbet
    Cc: Ley Foon Tan
    Cc: Mark Salter
    Cc: Martin Schwidefsky
    Cc: Matt Turner
    Cc: Michael Ellerman
    Cc: Michal Hocko
    Cc: Michal Simek
    Cc: Palmer Dabbelt
    Cc: Paul Burton
    Cc: Richard Kuo
    Cc: Richard Weinberger
    Cc: Rich Felker
    Cc: Russell King
    Cc: Serge Semin
    Cc: Thomas Gleixner
    Cc: Tony Luck
    Cc: Vineet Gupta
    Cc: Yoshinori Sato
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mike Rapoport
     
  • The conversion is done using

    sed -i 's@memblock_virt_alloc@memblock_alloc@g' \
    $(git grep -l memblock_virt_alloc)

    Link: http://lkml.kernel.org/r/1536927045-23536-8-git-send-email-rppt@linux.vnet.ibm.com
    Signed-off-by: Mike Rapoport
    Cc: Catalin Marinas
    Cc: Chris Zankel
    Cc: "David S. Miller"
    Cc: Geert Uytterhoeven
    Cc: Greentime Hu
    Cc: Greg Kroah-Hartman
    Cc: Guan Xuetao
    Cc: Ingo Molnar
    Cc: "James E.J. Bottomley"
    Cc: Jonas Bonn
    Cc: Jonathan Corbet
    Cc: Ley Foon Tan
    Cc: Mark Salter
    Cc: Martin Schwidefsky
    Cc: Matt Turner
    Cc: Michael Ellerman
    Cc: Michal Hocko
    Cc: Michal Simek
    Cc: Palmer Dabbelt
    Cc: Paul Burton
    Cc: Richard Kuo
    Cc: Richard Weinberger
    Cc: Rich Felker
    Cc: Russell King
    Cc: Serge Semin
    Cc: Thomas Gleixner
    Cc: Tony Luck
    Cc: Vineet Gupta
    Cc: Yoshinori Sato
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mike Rapoport
     

23 Oct, 2018

1 commit

  • Pull locking and misc x86 updates from Ingo Molnar:
    "Lots of changes in this cycle - in part because locking/core attracted
    a number of related x86 low level work which was easier to handle in a
    single tree:

    - Linux Kernel Memory Consistency Model updates (Alan Stern, Paul E.
    McKenney, Andrea Parri)

    - lockdep scalability improvements and micro-optimizations (Waiman
    Long)

    - rwsem improvements (Waiman Long)

    - spinlock micro-optimization (Matthew Wilcox)

    - qspinlocks: Provide a liveness guarantee (more fairness) on x86.
    (Peter Zijlstra)

    - Add support for relative references in jump tables on arm64, x86
    and s390 to optimize jump labels (Ard Biesheuvel, Heiko Carstens)

    - Be a lot less permissive on weird (kernel address) uaccess faults
    on x86: BUG() when uaccess helpers fault on kernel addresses (Jann
    Horn)

    - macrofy x86 asm statements to un-confuse the GCC inliner. (Nadav
    Amit)

    - ... and a handful of other smaller changes as well"

    * 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (57 commits)
    locking/lockdep: Make global debug_locks* variables read-mostly
    locking/lockdep: Fix debug_locks off performance problem
    locking/pvqspinlock: Extend node size when pvqspinlock is configured
    locking/qspinlock_stat: Count instances of nested lock slowpaths
    locking/qspinlock, x86: Provide liveness guarantee
    x86/asm: 'Simplify' GEN_*_RMWcc() macros
    locking/qspinlock: Rework some comments
    locking/qspinlock: Re-order code
    locking/lockdep: Remove duplicated 'lock_class_ops' percpu array
    x86/defconfig: Enable CONFIG_USB_XHCI_HCD=y
    futex: Replace spin_is_locked() with lockdep
    locking/lockdep: Make class->ops a percpu counter and move it under CONFIG_DEBUG_LOCKDEP=y
    x86/jump-labels: Macrofy inline assembly code to work around GCC inlining bugs
    x86/cpufeature: Macrofy inline assembly code to work around GCC inlining bugs
    x86/extable: Macrofy inline assembly code to work around GCC inlining bugs
    x86/paravirt: Work around GCC inlining bugs when compiling paravirt ops
    x86/bug: Macrofy the BUG table section handling, to work around GCC inlining bugs
    x86/alternatives: Macrofy lock prefixes to work around GCC inlining bugs
    x86/refcount: Work around GCC inlining bug
    x86/objtool: Use asm macros to work around GCC inlining bugs
    ...

    Linus Torvalds
     

09 Oct, 2018

1 commit

  • With CONFIG_VMAP_STACK=y the kernel stack of all tasks should be
    allocated in the vmalloc space. The initial stack used for all
    the early init code is in the init_thread_union. To be able to
    switch from this early stack to a properly allocated stack
    from vmalloc the architecture needs a switch-over point.

    Introduce the arch_call_rest_init() function with a weak definition
    in init/main.c whose only purpose is to call rest_init() from the
    end of start_kernel(). The architecture override can then do the
    necessary magic to switch to the new vmalloc'ed stack.
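
    The weak default is trivial (simplified from the description above):

    void __init __weak arch_call_rest_init(void)
    {
            rest_init();
    }

    start_kernel() then ends with arch_call_rest_init() instead of calling
    rest_init() directly, giving architectures a hook to switch stacks.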

    Signed-off-by: Martin Schwidefsky

    Martin Schwidefsky
     

27 Sep, 2018

1 commit

  • Jump table entries are mostly read-only, with the exception of the
    init and module loader code that defuses entries that point into init
    code when the code being referred to is freed.

    For robustness, it would be better to move these entries into the
    ro_after_init section, but clearing the 'code' member of each jump
    table entry referring to init code at module load time races with the
    module_enable_ro() call that remaps the ro_after_init section read
    only, so we'd like to do it earlier.

    So given that whether such an entry refers to init code can be decided
    much earlier, we can pull this check forward. Since we may still need
    the code entry at this point, let's switch to setting a low bit in the
    'key' member just like we do to annotate the default state of a jump
    table entry.
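
    A sketch of the annotation (the accessor names follow the jump_entry
    helpers introduced around this series; treat them as illustrative):

    static inline void jump_entry_set_init(struct jump_entry *entry)
    {
            entry->key |= 2;        /* low bit 1: target is in __init code */
    }

    static inline bool jump_entry_is_init(const struct jump_entry *entry)
    {
            return entry->key & 2;
    }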

    Signed-off-by: Ard Biesheuvel
    Signed-off-by: Thomas Gleixner
    Reviewed-by: Kees Cook
    Acked-by: Peter Zijlstra (Intel)
    Cc: linux-arm-kernel@lists.infradead.org
    Cc: linux-s390@vger.kernel.org
    Cc: Arnd Bergmann
    Cc: Heiko Carstens
    Cc: Will Deacon
    Cc: Catalin Marinas
    Cc: Steven Rostedt
    Cc: Martin Schwidefsky
    Cc: Jessica Yu
    Link: https://lkml.kernel.org/r/20180919065144.25010-8-ard.biesheuvel@linaro.org

    Ard Biesheuvel
     

23 Aug, 2018

2 commits

  • Add a log message to `run_init_process()`.

    This log message serves two purposes.

    1. If the init process is not specified on the Linux kernel command
    line, the user sees what file was chosen.

    2. The time stamp shows exactly when the Linux kernel handed over
    control to the init process.
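
    The message itself is a one-liner of roughly this shape (illustrative
    sketch of run_init_process() in init/main.c):

    static int run_init_process(const char *init_filename)
    {
            argv_init[0] = init_filename;
            pr_info("Run %s as init process\n", init_filename);
            ...
    }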

    Link: http://lkml.kernel.org/r/b1fc97fa-4aa9-1904-ddb5-859e78995c41@molgen.mpg.de
    Signed-off-by: Paul Menzel
    Reviewed-by: Andrew Morton
    Cc: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Paul Menzel
     
  • Allow the initcall tables to be emitted using relative references that
    are only half the size on 64-bit architectures and don't require fixups
    at runtime on relocatable kernels.
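
    On architectures that select CONFIG_HAVE_ARCH_PREL32_RELOCATIONS, the
    crux is to emit each initcall entry as a 32-bit place-relative offset
    rather than an absolute pointer (condensed sketch; details may differ
    from the final patch):

    #ifdef CONFIG_HAVE_ARCH_PREL32_RELOCATIONS
    #define ___define_initcall(fn, id, __sec)                   \
            asm(".section   \"" #__sec ".init\", \"a\"  \n"     \
                "__initcall_" #fn #id ":                \n"     \
                ".long      " #fn " - .                 \n"     \
                ".previous                              \n");
    #else
    #define ___define_initcall(fn, id, __sec)                   \
            static initcall_t __initcall_##fn##id __used        \
                    __attribute__((__section__(#__sec ".init"))) = fn;
    #endif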

    Link: http://lkml.kernel.org/r/20180704083651.24360-5-ard.biesheuvel@linaro.org
    Acked-by: James Morris
    Acked-by: Sergey Senozhatsky
    Acked-by: Petr Mladek
    Acked-by: Michael Ellerman
    Acked-by: Ingo Molnar
    Signed-off-by: Ard Biesheuvel
    Cc: Arnd Bergmann
    Cc: Benjamin Herrenschmidt
    Cc: Bjorn Helgaas
    Cc: Catalin Marinas
    Cc: James Morris
    Cc: Jessica Yu
    Cc: Josh Poimboeuf
    Cc: Kees Cook
    Cc: Nicolas Pitre
    Cc: Paul Mackerras
    Cc: Russell King
    Cc: "Serge E. Hallyn"
    Cc: Steven Rostedt
    Cc: Thomas Garnier
    Cc: Thomas Gleixner
    Cc: Will Deacon
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ard Biesheuvel
     

21 Aug, 2018

1 commit

  • Pull tracing updates from Steven Rostedt:

    - Restructure of lockdep and latency tracers

    This is the biggest change. Joel Fernandes restructured the hooks
    from irqs and preemption disabling and enabling. He got rid of a lot
    of the preprocessor #ifdef mess that they caused.

    He turned both lockdep and the latency tracers to use trace events
    inserted in the preempt/irqs disabling paths. But unfortunately,
    these started to cause issues in corner cases. Thus, parts of the
    code was reverted back to where lockdep and the latency tracers just
    get called directly (without using the trace events). But because the
    original change cleaned up the code very nicely we kept that, as well
    as the trace events for preempt and irqs disabling, but they are
    limited to not being called in NMIs.

    - Have trace events use SRCU for "rcu idle" calls. This was required
    for the preempt/irqs off trace events. But it also had to not allow
    them to be called in NMI context. Waiting till Paul makes an NMI safe
    SRCU API.

    - New notrace SRCU API to allow trace events to use SRCU.

    - Addition of mcount-nop option support

    - SPDX headers replacing GPL templates.

    - Various other fixes and clean ups.

    - Some fixes are marked for stable, but were not fully tested before
    the merge window opened.

    * tag 'trace-v4.19' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (44 commits)
    tracing: Fix SPDX format headers to use C++ style comments
    tracing: Add SPDX License format tags to tracing files
    tracing: Add SPDX License format to bpf_trace.c
    blktrace: Add SPDX License format header
    s390/ftrace: Add -mfentry and -mnop-mcount support
    tracing: Add -mcount-nop option support
    tracing: Avoid calling cc-option -mrecord-mcount for every Makefile
    tracing: Handle CC_FLAGS_FTRACE more accurately
    Uprobe: Additional argument arch_uprobe to uprobe_write_opcode()
    Uprobes: Simplify uprobe_register() body
    tracepoints: Free early tracepoints after RCU is initialized
    uprobes: Use synchronize_rcu() not synchronize_sched()
    tracing: Fix synchronizing to event changes with tracepoint_synchronize_unregister()
    ftrace: Remove unused pointer ftrace_swapper_pid
    tracing: More reverting of "tracing: Centralize preemptirq tracepoints and unify their usage"
    tracing/irqsoff: Handle preempt_count for different configs
    tracing: Partial revert of "tracing: Centralize preemptirq tracepoints and unify their usage"
    tracing: irqsoff: Account for additional preempt_disable
    trace: Use rcu_dereference_raw for hooks from trace-event subsystem
    tracing/kprobes: Fix within_notrace_func() to check only notrace functions
    ...

    Linus Torvalds
     

14 Aug, 2018

2 commits

  • Pull x86 timer updates from Thomas Gleixner:
    "Early TSC based time stamping to allow better boot time analysis.

    This comes with a general cleanup of the TSC calibration code which
    grew warts and duct taping over the years and removes 250 lines of
    code. Initiated and mostly implemented by Pavel with help from various
    folks"

    * 'x86-timers-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (37 commits)
    x86/kvmclock: Mark kvm_get_preset_lpj() as __init
    x86/tsc: Consolidate init code
    sched/clock: Disable interrupts when calling generic_sched_clock_init()
    timekeeping: Prevent false warning when persistent clock is not available
    sched/clock: Close a hole in sched_clock_init()
    x86/tsc: Make use of tsc_calibrate_cpu_early()
    x86/tsc: Split native_calibrate_cpu() into early and late parts
    sched/clock: Use static key for sched_clock_running
    sched/clock: Enable sched clock early
    sched/clock: Move sched clock initialization and merge with generic clock
    x86/tsc: Use TSC as sched clock early
    x86/tsc: Initialize cyc2ns when tsc frequency is determined
    x86/tsc: Calibrate tsc only once
    ARM/time: Remove read_boot_clock64()
    s390/time: Remove read_boot_clock64()
    timekeeping: Default boot time offset to local_clock()
    timekeeping: Replace read_boot_clock64() with read_persistent_wall_and_boot_offset()
    s390/time: Add read_persistent_wall_and_boot_offset()
    x86/xen/time: Output xen sched_clock time from 0
    x86/xen/time: Initialize pv xen time in init_hypervisor_platform()
    ...

    Linus Torvalds
     
  • Pull x86 PTI updates from Thomas Gleixner:
    "The Speck brigade sadly provides yet another large set of patches
    destroying the performance which we carefully built and preserved

    - PTI support for 32bit PAE. The missing counter part to the 64bit
    PTI code implemented by Joerg.

    - A set of fixes for the Global Bit mechanics for non PCID CPUs which
    were setting the Global Bit too widely and therefore possibly
    exposing interesting memory needlessly.

    - Protection against userspace-userspace SpectreRSB

    - Support for the upcoming Enhanced IBRS mode, which is preferred
    over IBRS. Unfortunately we don't know the performance impact of
    this, but it's expected to be less horrible than the IBRS
    hammering.

    - Cleanups and simplifications"

    * 'x86/pti' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (60 commits)
    x86/mm/pti: Move user W+X check into pti_finalize()
    x86/relocs: Add __end_rodata_aligned to S_REL
    x86/mm/pti: Clone kernel-image on PTE level for 32 bit
    x86/mm/pti: Don't clear permissions in pti_clone_pmd()
    x86/mm/pti: Fix 32 bit PCID check
    x86/mm/init: Remove freed kernel image areas from alias mapping
    x86/mm/init: Add helper for freeing kernel image pages
    x86/mm/init: Pass unconverted symbol addresses to free_init_pages()
    mm: Allow non-direct-map arguments to free_reserved_area()
    x86/mm/pti: Clear Global bit more aggressively
    x86/speculation: Support Enhanced IBRS on future CPUs
    x86/speculation: Protect against userspace-userspace spectreRSB
    x86/kexec: Allocate 8k PGDs for PTI
    Revert "perf/core: Make sure the ring-buffer is mapped in all page-tables"
    x86/mm: Remove in_nmi() warning from vmalloc_fault()
    x86/entry/32: Check for VM86 mode in slow-path check
    perf/core: Make sure the ring-buffer is mapped in all page-tables
    x86/pti: Check the return value of pti_user_pagetable_walk_pmd()
    x86/pti: Check the return value of pti_user_pagetable_walk_p4d()
    x86/entry/32: Add debug code to check entry/exit CR3
    ...

    Linus Torvalds
     

13 Aug, 2018

1 commit

  • This is purely a preparatory patch for upcoming changes during the 4.19
    merge window.

    We have a function called "boot_cpu_state_init()" that isn't really
    about the bootup cpu state: that is done much earlier by the similarly
    named "boot_cpu_init()" (note lack of "state" in name).

    This function initializes some hotplug CPU state, and needs to run after
    the percpu data has been properly initialized. It even has a comment to
    that effect.

    Except it _doesn't_ actually run after the percpu data has been properly
    initialized. On x86 it happens to do that, but on at least arm and
    arm64, the percpu base pointers are initialized by the arch-specific
    'smp_prepare_boot_cpu()' hook, which ran _after_ boot_cpu_state_init().

    This had some unexpected results, and in particular we have a patch
    pending for the merge window that did the obvious cleanup of using
    'this_cpu_write()' in the cpu hotplug init code:

    - per_cpu_ptr(&cpuhp_state, smp_processor_id())->state = CPUHP_ONLINE;
    + this_cpu_write(cpuhp_state.state, CPUHP_ONLINE);

    which is obviously the right thing to do. Except because of the
    ordering issue, it actually failed miserably and unexpectedly on arm64.

    So this just fixes the ordering, and changes the name of the function to
    be 'boot_cpu_hotplug_init()' to make it obvious that it's about cpu
    hotplug state, because the core CPU state was supposed to have already
    been done earlier.

    Marked for stable, since the (not yet merged) patch that will show this
    problem is marked for stable.

    Reported-by: Vlastimil Babka
    Reported-by: Mian Yousaf Kaukab
    Suggested-by: Catalin Marinas
    Acked-by: Thomas Gleixner
    Cc: Will Deacon
    Cc: stable@kernel.org
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     

11 Aug, 2018

1 commit

  • Joel Fernandes created a nice patch that cleaned up the duplicate hooks used
    by lockdep and irqsoff latency tracer. It made both use tracepoints. But it
    caused lockdep to trigger several false positives. We have not figured out
    why yet, but removing lockdep from using the trace event hooks and just
    calling its helper functions directly (like it used to) makes the
    problem go away.

    This is a partial revert of the clean up patch c3bc8fd637a9 ("tracing:
    Centralize preemptirq tracepoints and unify their usage") that adds direct
    calls for lockdep, but also keeps most of the clean up done to get rid of
    the horrible preprocessor if statements.

    Link: http://lkml.kernel.org/r/20180806155058.5ee875f4@gandalf.local.home

    Cc: Peter Zijlstra
    Reviewed-by: Joel Fernandes (Google)
    Fixes: c3bc8fd637a9 ("tracing: Centralize preemptirq tracepoints and unify their usage")
    Signed-off-by: Steven Rostedt (VMware)

    Steven Rostedt (VMware)
     

31 Jul, 2018

1 commit

  • This patch detaches the preemptirq tracepoints from the tracers and
    keeps it separate.

    Advantages:
    * Lockdep and irqsoff event can now run in parallel since they no longer
    have their own calls.

    * This unifies the use case of adding hooks to an irqsoff and irqson
    event, and a preemptoff and preempton event.
    3 users of the events exist:
    - Lockdep
    - irqsoff and preemptoff tracers
    - irqs and preempt trace events

    The unification cleans up several ifdefs and makes the code in preempt
    tracer and irqsoff tracers simpler. It gets rid of all the horrific
    ifdeferry around PROVE_LOCKING and makes configuration of the different
    users of the tracepoints more easy and understandable. It also gets rid
    of the time_* function calls from the lockdep hooks used to call into
    the preemptirq tracer which is not needed anymore. The negative delta in
    lines of code in this patch is quite large too.

    In the patch we introduce a new CONFIG option PREEMPTIRQ_TRACEPOINTS
    as a single point for registering probes onto the tracepoints. With
    this,
    the web of config options for preempt/irq toggle tracepoints and its
    users becomes:

    PREEMPT_TRACER   PREEMPTIRQ_EVENTS   IRQSOFF_TRACER   PROVE_LOCKING
          |                |       \            |               |
          \   (selects)    /        \           \   (selects)   /
           TRACE_PREEMPT_TOGGLE ----> TRACE_IRQFLAGS
                        \                    /
                         \   (depends on)  /
                          PREEMPTIRQ_TRACEPOINTS

    Other than the performance tests mentioned in the previous patch, I also
    ran the locking API test suite. I verified that all tests cases are
    passing.

    I also injected issues by not registering lockdep probes onto the
    tracepoints and I see failures to confirm that the probes are indeed
    working.

    This series + lockdep probes not registered (just to inject errors):
    [ 0.000000] hard-irqs-on + irq-safe-A/21: ok | ok | ok |
    [ 0.000000] soft-irqs-on + irq-safe-A/21: ok | ok | ok |
    [ 0.000000] sirq-safe-A => hirqs-on/12:FAILED|FAILED| ok |
    [ 0.000000] sirq-safe-A => hirqs-on/21:FAILED|FAILED| ok |
    [ 0.000000] hard-safe-A + irqs-on/12:FAILED|FAILED| ok |
    [ 0.000000] soft-safe-A + irqs-on/12:FAILED|FAILED| ok |
    [ 0.000000] hard-safe-A + irqs-on/21:FAILED|FAILED| ok |
    [ 0.000000] soft-safe-A + irqs-on/21:FAILED|FAILED| ok |
    [ 0.000000] hard-safe-A + unsafe-B #1/123: ok | ok | ok |
    [ 0.000000] soft-safe-A + unsafe-B #1/123: ok | ok | ok |

    With this series + lockdep probes registered, all locking tests pass:

    [ 0.000000] hard-irqs-on + irq-safe-A/21: ok | ok | ok |
    [ 0.000000] soft-irqs-on + irq-safe-A/21: ok | ok | ok |
    [ 0.000000] sirq-safe-A => hirqs-on/12: ok | ok | ok |
    [ 0.000000] sirq-safe-A => hirqs-on/21: ok | ok | ok |
    [ 0.000000] hard-safe-A + irqs-on/12: ok | ok | ok |
    [ 0.000000] soft-safe-A + irqs-on/12: ok | ok | ok |
    [ 0.000000] hard-safe-A + irqs-on/21: ok | ok | ok |
    [ 0.000000] soft-safe-A + irqs-on/21: ok | ok | ok |
    [ 0.000000] hard-safe-A + unsafe-B #1/123: ok | ok | ok |
    [ 0.000000] soft-safe-A + unsafe-B #1/123: ok | ok | ok |

    Link: http://lkml.kernel.org/r/20180730222423.196630-4-joel@joelfernandes.org

    Acked-by: Peter Zijlstra (Intel)
    Reviewed-by: Namhyung Kim
    Signed-off-by: Joel Fernandes (Google)
    Signed-off-by: Steven Rostedt (VMware)

    Joel Fernandes (Google)
     

20 Jul, 2018

3 commits

  • Introduce a new function to finalize the kernel-image mappings in the
    userspace page-table after all ro/nx protections have been applied to
    the kernel mappings.

    Also move the call to pti_clone_kernel_text() to that function so that it
    will run on 32 bit kernels too.
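
    The hook itself is small (simplified sketch based on the description
    above):

    void pti_finalize(void)
    {
            /*
             * (Re)clone everything that maps parts of the kernel image;
             * this runs after the ro/nx protections are in place.
             */
            pti_clone_entry_text();
            pti_clone_kernel_text();
    }

    start_kernel() then invokes pti_finalize() once the kernel mappings are
    final, right after mark_readonly().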

    Signed-off-by: Joerg Roedel
    Signed-off-by: Thomas Gleixner
    Tested-by: Pavel Machek
    Cc: "H . Peter Anvin"
    Cc: linux-mm@kvack.org
    Cc: Linus Torvalds
    Cc: Andy Lutomirski
    Cc: Dave Hansen
    Cc: Josh Poimboeuf
    Cc: Juergen Gross
    Cc: Peter Zijlstra
    Cc: Borislav Petkov
    Cc: Jiri Kosina
    Cc: Boris Ostrovsky
    Cc: Brian Gerst
    Cc: David Laight
    Cc: Denys Vlasenko
    Cc: Eduardo Valentin
    Cc: Greg KH
    Cc: Will Deacon
    Cc: aliguori@amazon.com
    Cc: daniel.gruss@iaik.tugraz.at
    Cc: hughd@google.com
    Cc: keescook@google.com
    Cc: Andrea Arcangeli
    Cc: Waiman Long
    Cc: "David H . Gutteridge"
    Cc: joro@8bytes.org
    Link: https://lkml.kernel.org/r/1531906876-13451-30-git-send-email-joro@8bytes.org

    Joerg Roedel
     
  • Allow sched_clock() to be used before sched_clock_init() is called. This
    provides a way to get early boot timestamps on machines with unstable
    clocks.

    Signed-off-by: Pavel Tatashin
    Signed-off-by: Thomas Gleixner
    Cc: steven.sistare@oracle.com
    Cc: daniel.m.jordan@oracle.com
    Cc: linux@armlinux.org.uk
    Cc: schwidefsky@de.ibm.com
    Cc: heiko.carstens@de.ibm.com
    Cc: john.stultz@linaro.org
    Cc: sboyd@codeaurora.org
    Cc: hpa@zytor.com
    Cc: douly.fnst@cn.fujitsu.com
    Cc: peterz@infradead.org
    Cc: prarit@redhat.com
    Cc: feng.tang@intel.com
    Cc: pmladek@suse.com
    Cc: gnomes@lxorguk.ukuu.org.uk
    Cc: linux-s390@vger.kernel.org
    Cc: boris.ostrovsky@oracle.com
    Cc: jgross@suse.com
    Cc: pbonzini@redhat.com
    Link: https://lkml.kernel.org/r/20180719205545.16512-24-pasha.tatashin@oracle.com

    Pavel Tatashin
     
  • sched_clock_postinit() initializes a generic clock on systems where no
    other clock is provided. This function may be called only after
    timekeeping_init().

    Rename sched_clock_postinit() to generic_sched_clock_init() and call it
    from sched_clock_init(). Move the call to sched_clock_init() until
    after time_init().

    Suggested-by: Peter Zijlstra
    Signed-off-by: Pavel Tatashin
    Signed-off-by: Thomas Gleixner
    Cc: steven.sistare@oracle.com
    Cc: daniel.m.jordan@oracle.com
    Cc: linux@armlinux.org.uk
    Cc: schwidefsky@de.ibm.com
    Cc: heiko.carstens@de.ibm.com
    Cc: john.stultz@linaro.org
    Cc: sboyd@codeaurora.org
    Cc: hpa@zytor.com
    Cc: douly.fnst@cn.fujitsu.com
    Cc: prarit@redhat.com
    Cc: feng.tang@intel.com
    Cc: pmladek@suse.com
    Cc: gnomes@lxorguk.ukuu.org.uk
    Cc: linux-s390@vger.kernel.org
    Cc: boris.ostrovsky@oracle.com
    Cc: jgross@suse.com
    Cc: pbonzini@redhat.com
    Link: https://lkml.kernel.org/r/20180719205545.16512-23-pasha.tatashin@oracle.com

    Pavel Tatashin
     

26 May, 2018

1 commit

  • In commit c7753208a94c ("x86, swiotlb: Add memory encryption support") a
    call to the function `mem_encrypt_init' was added. Include the prototype
    defined in the header <linux/mem_encrypt.h> to prevent a warning
    reported during compilation with W=1:

    init/main.c:494:20: warning: no previous prototype for `mem_encrypt_init' [-Wmissing-prototypes]
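
    The fix is just the include (as described above):

    #include <linux/mem_encrypt.h>  /* declares mem_encrypt_init() */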

    Link: http://lkml.kernel.org/r/20180522195533.31415-1-malat@debian.org
    Signed-off-by: Mathieu Malaterre
    Reviewed-by: Andrew Morton
    Acked-by: Steven Rostedt (VMware)
    Cc: Tom Lendacky
    Cc: Ingo Molnar
    Cc: Thomas Gleixner
    Cc: Kees Cook
    Cc: Laura Abbott
    Cc: Dominik Brodowski
    Cc: Gargi Sharma
    Cc: Josh Poimboeuf
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mathieu Malaterre
     

12 May, 2018

1 commit

  • load_module() creates W+X mappings via __vmalloc_node_range() (from
    layout_and_allocate()->move_module()->module_alloc()) by using
    PAGE_KERNEL_EXEC. These mappings are later cleaned up via
    "call_rcu_sched(&freeinit->rcu, do_free_init)" from do_init_module().

    This is a problem because call_rcu_sched() queues work, which can be run
    after debug_checkwx() is run, resulting in a race condition. If hit,
    the race results in a nasty splat about insecure W+X mappings, which
    results in a poor user experience as these are not the mappings that
    debug_checkwx() is intended to catch.

    This issue is observed on multiple arm64 platforms, and has been
    artificially triggered on an x86 platform.

    Address the race by flushing the queued work before running the
    arch-defined mark_rodata_ro() which then calls debug_checkwx().
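
    Condensed, the fix in init/main.c's mark_readonly() looks like this
    (simplified):

    static void mark_readonly(void)
    {
            if (rodata_enabled) {
                    /*
                     * load_module() results in W+X mappings, which are
                     * cleaned up with call_rcu_sched(); flush that queued
                     * work before mark_rodata_ro() runs debug_checkwx().
                     */
                    rcu_barrier_sched();
                    mark_rodata_ro();
                    rodata_test();
            } else
                    pr_info("Kernel memory protection disabled.\n");
    }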

    Link: http://lkml.kernel.org/r/1525103946-29526-1-git-send-email-jhugo@codeaurora.org
    Fixes: e1a58320a38d ("x86/mm: Warn on W^X mappings")
    Signed-off-by: Jeffrey Hugo
    Reported-by: Timur Tabi
    Reported-by: Jan Glauber
    Acked-by: Kees Cook
    Acked-by: Ingo Molnar
    Acked-by: Will Deacon
    Acked-by: Laura Abbott
    Cc: Mark Rutland
    Cc: Ard Biesheuvel
    Cc: Catalin Marinas
    Cc: Stephen Smalley
    Cc: Thomas Gleixner
    Cc: Peter Zijlstra
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jeffrey Hugo
     

12 Apr, 2018

2 commits

  • For fine-grained debugging and usercopy protection.

    Link: http://lkml.kernel.org/r/20180310085027.GA17121@avx2
    Signed-off-by: Alexey Dobriyan
    Reviewed-by: Andrew Morton
    Cc: Al Viro
    Cc: Glauber Costa
    Cc: Vladimir Davydov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
     
  • So "struct uts_namespace" can enjoy fine-grained SLAB debugging and
    usercopy protection.

    I'd prefer shorter name "utsns" but there is "user_namespace" already.

    Link: http://lkml.kernel.org/r/20180228215158.GA23146@avx2
    Signed-off-by: Alexey Dobriyan
    Reviewed-by: Andrew Morton
    Cc: "Eric W. Biederman"
    Cc: Serge Hallyn
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
     

11 Apr, 2018

1 commit

  • Pull tracing updates from Steven Rostedt:
    "New features:

    - Tom Zanussi's extended histogram work.

    This adds the synthetic events to have histograms from multiple
    event data. Adds triggers "onmatch" and "onmax" to call the
    synthetic events. Several updates to the histogram code came from
    this.

    - Allow way to nest ring buffer calls in the same context

    - Allow absolute time stamps in ring buffer

    - Rewrite of filter code parsing based on Al Viro's suggestions

    - Setting of trace_clock to global if TSC is unstable (on boot)

    - Better OOM handling when allocating large ring buffers

    - Added initcall tracepoints (consolidated initcall_debug code with
    them)

    And other various fixes and clean ups"

    * tag 'trace-v4.17' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (68 commits)
    init: Have initcall_debug still work without CONFIG_TRACEPOINTS
    init, tracing: Have printk come through the trace events for initcall_debug
    init, tracing: instrument security and console initcall trace events
    init, tracing: Add initcall trace events
    tracing: Add rcu dereference annotation for test func that touches filter->prog
    tracing: Add rcu dereference annotation for filter->prog
    tracing: Fixup logic inversion on setting trace_global_clock defaults
    tracing: Hide global trace clock from lockdep
    ring-buffer: Add set/clear_current_oom_origin() during allocations
    ring-buffer: Check if memory is available before allocation
    lockdep: Add print_irqtrace_events() to __warn
    vsprintf: Do not preprocess non-dereferenced pointers for bprintf (%px and %pK)
    tracing: Uninitialized variable in create_tracing_map_fields()
    tracing: Make sure variable string fields are NULL-terminated
    tracing: Add action comparisons when testing matching hist triggers
    tracing: Don't add flag strings when displaying variable references
    tracing: Fix display of hist trigger expressions containing timestamps
    ftrace: Drop a VLA in module_exists()
    tracing: Mention trace_clock=global when warning about unstable clocks
    tracing: Default to using trace_global_clock if sched_clock is unstable
    ...

    Linus Torvalds
     

06 Apr, 2018

1 commit

  • With trace events placed before and after the initcall function calls,
    instead of having a separate routine for printing out the initcalls when
    initcall_debug is specified on the kernel command line, have the code
    register callbacks on the tracepoints where the initcall trace events
    are.

    This removes the need for a separate function to do the initcall debug
    printing, as the tracepoint callbacks can handle the printk. It also
    covers initcalls that are not invoked via do_one_initcall(), such as
    the console and security initcalls.
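
    Mechanically, when initcall_debug is set, probes are registered on the
    new tracepoints and do the printing, roughly like this (condensed;
    details are illustrative):

    static ktime_t initcall_calltime;

    static void __init trace_initcall_start_cb(void *data, initcall_t fn)
    {
            ktime_t *calltime = data;

            printk(KERN_DEBUG "calling  %pF @ %i\n", fn, task_pid_nr(current));
            *calltime = ktime_get();
    }

    ...
    if (initcall_debug)
            register_trace_initcall_start(trace_initcall_start_cb,
                                          &initcall_calltime);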

    Signed-off-by: Steven Rostedt (VMware)

    Steven Rostedt (VMware)