06 Mar, 2019

1 commit

  • We're (finally) phasing out a.out support for good. As Borislav Petkov
    points out, we've supported ELF binaries for about 25 years by now, and
    coredumping in particular has bitrotted over the years.

    None of the tool chains even support generating a.out binaries any more,
    and the plan is to deprecate a.out support entirely for the kernel. But
    I want to start with just removing the core dumping code, because I can
    still imagine that somebody actually might want to support a.out as a
    simpler binary format.

    Particularly if you generate some random binaries on the fly, ELF is a
    much more complicated format (admittedly ELF also does have a lot of
    toolchain support, mitigating that complexity a lot and you really
    should have moved over in the last 25 years).

    So it's at least somewhat possible that somebody out there has some
    workflow that still involves generating and running a.out executables.

    In contrast, it's very unlikely that anybody depends on debugging any
    legacy a.out core files. But regardless, I want this phase-out to be
    done in two steps, so that we can resurrect a.out support (if needed)
    without having to resurrect the core file dumping that is almost
    certainly not needed.

    Jann Horn pointed to the file that my first trivial
    cut at this had missed.

    And Alan Cox points out that the a.out binary loader _could_ be done in
    user space if somebody wants to, but we might keep just the loader in
    the kernel if somebody really wants it, since the loader isn't that big
    and has no really odd special cases like the core dumping does.

    Acked-by: Borislav Petkov
    Cc: Alan Cox
    Cc: Jann Horn
    Cc: Richard Weinberger
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     

05 Jan, 2019

1 commit

  • Patch series "Add support for fast mremap".

    This series speeds up the mremap(2) syscall by copying page tables at
    the PMD level even for non-THP systems. There is concern that the extra
    'address' argument that mremap passes to pte_alloc may do something
    subtle architecture related in the future that may make the scheme not
    work. We also found that there is no point in passing the 'address' to
    pte_alloc since it's unused. This patch therefore removes this argument
    tree-wide, resulting in a nice negative diff as well, while ensuring
    along the way that the enabled architectures do not do anything funky
    with the 'address' argument that would go unnoticed by the optimization.
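    The idea behind copying page tables "at the PMD level" can be shown with a toy, userspace-only model. In this sketch (all type and function names are hypothetical, and the 512-entry sizes assume x86-64 with 4 KiB pages), the slow path moves every PTE individually, while the fast path re-points a PMD entry at the existing PTE page in a single store. The real mm/mremap.c code also needs PMD-aligned ranges, page-table locks and a TLB flush, none of which is modelled here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define TOY_PTRS_PER_PTE 512 /* entries per PTE page (x86-64, 4 KiB pages) */
#define TOY_PTRS_PER_PMD 512 /* entries per PMD page */

/* Hypothetical stand-ins for one PTE page and one PMD page. */
struct toy_pte_page { uint64_t pte[TOY_PTRS_PER_PTE]; };
struct toy_pmd_page { struct toy_pte_page *entry[TOY_PTRS_PER_PMD]; };

/* Slow path: move a populated PMD slot PTE by PTE. */
static int toy_move_ptes(struct toy_pmd_page *pmd, size_t from, size_t to)
{
    assert(from < TOY_PTRS_PER_PMD && to < TOY_PTRS_PER_PMD);
    struct toy_pte_page *src = pmd->entry[from];

    if (!src || from == to)
        return -1;
    if (!pmd->entry[to]) {
        pmd->entry[to] = calloc(1, sizeof(struct toy_pte_page));
        if (!pmd->entry[to])
            return -1;
    }
    for (size_t i = 0; i < TOY_PTRS_PER_PTE; i++) {
        pmd->entry[to]->pte[i] = src->pte[i];
        src->pte[i] = 0; /* old mapping is torn down entry by entry */
    }
    return 0;
}

/*
 * Fast path: hand the whole PTE page to the new slot in one store.
 * Returns 1 on success, 0 if the caller must fall back to the slow path.
 */
static int toy_move_pmd(struct toy_pmd_page *pmd, size_t from, size_t to)
{
    assert(from < TOY_PTRS_PER_PMD && to < TOY_PTRS_PER_PMD);
    if (!pmd->entry[from] || pmd->entry[to])
        return 0;
    pmd->entry[to] = pmd->entry[from];
    pmd->entry[from] = NULL;
    return 1;
}
```

    The fast path does constant work per 2 MiB slot instead of 512 copies, which is where the mremap speedup comes from in this model.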

    Build and boot tested on x86-64. Build tested on arm64. The config
    enablement patch for arm64 will be posted in the future after more
    testing.

    The changes were obtained by applying the following Coccinelle script
    (thanks, Julia, for answering all Coccinelle questions!).
    The following fix-ups were done manually:
    * Removal of address argument from pte_fragment_alloc
    * Removal of pte_alloc_one_fast definitions from m68k and microblaze.

    // Options: --include-headers --no-includes

    virtual patch

    @pte_alloc_func_def depends on patch exists@
    identifier E2;
    identifier fn =~ "^(__pte_alloc|pte_alloc_one|pte_alloc|__pte_alloc_kernel|pte_alloc_one_kernel)$";
    type T2;
    @@

    fn(...
    - , T2 E2
    )
    { ... }

    @pte_alloc_func_proto_noarg depends on patch exists@
    type T1, T2, T3, T4;
    identifier fn =~ "^(__pte_alloc|pte_alloc_one|pte_alloc|__pte_alloc_kernel|pte_alloc_one_kernel)$";
    @@

    (
    - T3 fn(T1, T2);
    + T3 fn(T1);
    |
    - T3 fn(T1, T2, T4);
    + T3 fn(T1, T2);
    )

    @pte_alloc_func_proto depends on patch exists@
    identifier E1, E2, E4;
    type T1, T2, T3, T4;
    identifier fn =~ "^(__pte_alloc|pte_alloc_one|pte_alloc|__pte_alloc_kernel|pte_alloc_one_kernel)$";
    @@

    (
    - T3 fn(T1 E1, T2 E2);
    + T3 fn(T1 E1);
    |
    - T3 fn(T1 E1, T2 E2, T4 E4);
    + T3 fn(T1 E1, T2 E2);
    )

    @pte_alloc_func_call depends on patch exists@
    expression E2;
    identifier fn =~ "^(__pte_alloc|pte_alloc_one|pte_alloc|__pte_alloc_kernel|pte_alloc_one_kernel)$";
    @@

    fn(...
    -, E2
    )

    @pte_alloc_macro depends on patch exists@
    identifier fn =~ "^(__pte_alloc|pte_alloc_one|pte_alloc|__pte_alloc_kernel|pte_alloc_one_kernel)$";
    identifier a, b, c;
    expression e;
    position p;
    @@

    (
    - #define fn(a, b, c) e
    + #define fn(a, b) e
    |
    - #define fn(a, b) e
    + #define fn(a) e
    )

    Link: http://lkml.kernel.org/r/20181108181201.88826-2-joelaf@google.com
    Signed-off-by: Joel Fernandes (Google)
    Suggested-by: Kirill A. Shutemov
    Acked-by: Kirill A. Shutemov
    Cc: Michal Hocko
    Cc: Julia Lawall
    Cc: Kirill A. Shutemov
    Cc: William Kucharski
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Joel Fernandes (Google)
     

28 Dec, 2018

4 commits

  • reenable_fd has been a NOP since the introduction of the EPOLL
    based interrupt controller.
    reenable_channel() is no longer needed as the flow control is
    now handled via the write IRQs on the channel.

    Signed-off-by: Anton Ivanov
    Signed-off-by: Richard Weinberger

    Anton Ivanov
     
  • This commit removes redundant generic-y defines in
    arch/um/include/asm/Kbuild.

    It is redundant to define generic-y when arch-specific implementation
    exists in arch/$(ARCH)/include/asm/*.h

    Remove the following generic-y:

    hardirq.h
    io.h

    Signed-off-by: Masahiro Yamada
    Signed-off-by: Richard Weinberger

    Masahiro Yamada
     
  • Changing protection is a very high cost operation in UML
    because in addition to an extra syscall it also interrupts
    mmap merge sequences generated by the tlb.

    While the condition is not particularly common it is worth
    avoiding.

    Signed-off-by: Anton Ivanov
    Signed-off-by: Richard Weinberger

    Anton Ivanov
     
  • Support for DISCARD and WRITE_ZEROES in the ubd driver using
    fallocate.

    DISCARD is enabled by default and can be disabled using a new
    UBD command line flag.

    If the underlying filesystem on which the UBD image is stored does not
    support DISCARD, support for both DISCARD and WRITE_ZEROES
    is turned off.
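    On a Linux host the two request types map naturally onto fallocate(2) modes. The userspace sketch below assumes FALLOC_FL_PUNCH_HOLE for DISCARD and FALLOC_FL_ZERO_RANGE for WRITE_ZEROES, both combined with FALLOC_FL_KEEP_SIZE so the image file keeps its length. The struct and helper names are hypothetical and this is not the ubd driver's code; it only shows how an EOPNOTSUPP result could switch both features off, as described above.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdbool.h>
#include <sys/types.h>

/* Hypothetical per-image state: both features start enabled. */
struct toy_ubd {
    int fd;
    bool discard_enabled;
    bool write_zeroes_enabled;
};

/* Run one fallocate() mode; on EOPNOTSUPP disable both features. */
static int toy_ubd_falloc(struct toy_ubd *ubd, int mode, off_t off, off_t len)
{
    assert(off >= 0 && len > 0);
    if (fallocate(ubd->fd, mode | FALLOC_FL_KEEP_SIZE, off, len) == 0)
        return 0;

    int err = errno;
    if (err == EOPNOTSUPP) {
        /* The backing filesystem cannot do it: turn both off. */
        ubd->discard_enabled = false;
        ubd->write_zeroes_enabled = false;
    }
    return -err;
}

/* DISCARD: deallocate the range; later reads return zeroes. */
static int toy_ubd_discard(struct toy_ubd *ubd, off_t off, off_t len)
{
    if (!ubd->discard_enabled)
        return -EOPNOTSUPP;
    return toy_ubd_falloc(ubd, FALLOC_FL_PUNCH_HOLE, off, len);
}

/* WRITE_ZEROES: make the range read back as zeroes. */
static int toy_ubd_write_zeroes(struct toy_ubd *ubd, off_t off, off_t len)
{
    if (!ubd->write_zeroes_enabled)
        return -EOPNOTSUPP;
    return toy_ubd_falloc(ubd, FALLOC_FL_ZERO_RANGE, off, len);
}
```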

    Signed-off-by: Anton Ivanov
    Signed-off-by: Richard Weinberger

    Anton Ivanov
     

01 Nov, 2018

1 commit

  • Pull UML updates from Richard Weinberger:

    - removal of old and dead code

    - a bug fix for our tty driver

    - other minor cleanups across the code base

    * 'for-linus-4.20-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml:
    um: Make line/tty semantics use true write IRQ
    um: trap: fix spelling mistake, EACCESS -> EACCES
    um: Don't hardcode path as it is architecture dependent
    um: NULL check before kfree is not needed
    um: remove unused AIO code
    um: Give start_idle_thread() a return code
    um: Remove update_debugregs()
    um: Drop own definition of PTRACE_SYSEMU/_SINGLESTEP

    Linus Torvalds
     

30 Oct, 2018

1 commit


11 Oct, 2018

1 commit

  • Since the struct lsm_info table is not an initcall, we can just move it
    into INIT_DATA like all the other tables.
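    Tables like this are arrays that the linker assembles from entries placed in a dedicated section; the linker script only decides where that section lands (INIT_DATA is discarded after boot). A userspace analogue, assuming GCC or Clang with a GNU-compatible linker, which synthesizes __start_<sec>/__stop_<sec> symbols for sections whose names are valid C identifiers. The kernel defines its own boundary symbols in the linker script instead, and the struct, macro and section names here are hypothetical, not the real struct lsm_info.

```c
#include <stddef.h>

/* Hypothetical table entry. */
struct toy_lsm_info {
    const char *name;
    int order;
};

/*
 * Put each entry in the "toy_lsm_info" section. "used" keeps the compiler
 * from discarding it; an explicit alignment stops the compiler from
 * over-aligning entries, so the section stays a plain array.
 */
#define TOY_DEFINE_LSM(var, lsm_name, lsm_order)                 \
    static const struct toy_lsm_info var                         \
        __attribute__((used, section("toy_lsm_info"),            \
                       aligned(sizeof(void *)))) = { lsm_name, lsm_order }

TOY_DEFINE_LSM(toy_capability, "capability", 0);
TOY_DEFINE_LSM(toy_selinux, "selinux", 1);

/* Boundary symbols provided by the linker. */
extern const struct toy_lsm_info __start_toy_lsm_info[];
extern const struct toy_lsm_info __stop_toy_lsm_info[];

/* Walk the table like an ordinary array (entry order is not guaranteed). */
static size_t toy_count_lsms(void)
{
    size_t n = 0;
    for (const struct toy_lsm_info *p = __start_toy_lsm_info;
         p < __stop_toy_lsm_info; p++)
        n++;
    return n;
}
```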

    Signed-off-by: Kees Cook
    Reviewed-by: Casey Schaufler
    Reviewed-by: John Johansen
    Reviewed-by: James Morris
    Signed-off-by: James Morris

    Kees Cook
     

16 Jun, 2018

1 commit


11 Jun, 2018

1 commit

  • __uml_initcall() is not used and .uml.initcall.init section is empty:

    $ grep -r '__uml_initcall('
    arch/um/include/shared/init.h:#define __uml_initcall(fn) \
    $ readelf -s ../umobj/linux | grep __uml_initcall
    23214: 00000000603b75d8 0 NOTYPE GLOBAL DEFAULT 32 __uml_initcall_start
    25337: 00000000603b75d8 0 NOTYPE GLOBAL DEFAULT 32 __uml_initcall_end

    So it is unnecessary.

    Signed-off-by: Alexander Pateenok
    Signed-off-by: Richard Weinberger

    Alexander Pateenok
     

19 Apr, 2018

1 commit

  • We have a couple of files that try to include asm/compat.h on
    architectures where this is available. Those should generally use the
    higher-level linux/compat.h file, but that in turn fails to include
    asm/compat.h when CONFIG_COMPAT is disabled, unless we can provide
    that header on all architectures.

    This adds the asm/compat.h for all remaining architectures to
    simplify the dependencies.

    Architectures that are getting removed in linux-4.17 are not changed
    here, to avoid needless conflicts with the removal patches. Those
    architectures are broken by this patch, but we have already shown
    that they have no users.

    Signed-off-by: Arnd Bergmann

    Arnd Bergmann
     

20 Feb, 2018

3 commits

  • 1. Provides infrastructure for vector IO using recvmmsg/sendmmsg.
    1.1. Multi-message read.
    1.2. Multi-message write.
    1.3. Optimized queue support for multi-packet enqueue/dequeue.
    1.4. BQL/DQL support.
    2. Implements several transports as well as support for direct
    wiring of PWEs to the NIC. Allows direct connection of VMs
    to the host, other VMs and network devices with no switch in use.
    2.1. Raw socket: >4 times higher PPS and 10 times higher TCP RX
    than the existing pcap-based transport (>4 Gbit).
    2.2. New tap transport using socket RX and tap xmit. Similar
    performance improvements (>4 Gbit).
    2.3. GRE transport - direct wiring to GRE PWE
    2.4. L2TPv3 transport - direct wiring to L2TPv3 PWE
    3. Tuning, performance and offload related setting support via ethtool.
    4. Initial BPF support - used in tap/raw to avoid software looping
    5. Scatter Gather support.
    6. VNET and checksum offload support for raw socket transport.
    7. TSO/GSO support where applicable or available
    8. Migrates all error messages to netdevice_*() and rate limits
    them where needed.
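    The core of points 1.1 and 1.2 is that one recvmmsg(2) or sendmmsg(2) call moves a whole batch of frames. A minimal userspace sketch of that batching, assuming a datagram socket and hypothetical batch and buffer sizes (this is not the UML vector driver code):

```c
#define _GNU_SOURCE
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>

#define TOY_BATCH 4 /* hypothetical batch size */
#define TOY_MTU 64  /* hypothetical per-message buffer size */

/* Send n datagrams with a single sendmmsg() call. */
static int toy_send_batch(int fd, char bufs[][TOY_MTU], const size_t *lens,
                          unsigned int n)
{
    struct mmsghdr msgs[TOY_BATCH];
    struct iovec iov[TOY_BATCH];

    assert(n <= TOY_BATCH);
    memset(msgs, 0, sizeof(msgs));
    for (unsigned int i = 0; i < n; i++) {
        iov[i].iov_base = bufs[i];
        iov[i].iov_len = lens[i];
        msgs[i].msg_hdr.msg_iov = &iov[i];
        msgs[i].msg_hdr.msg_iovlen = 1;
    }
    return sendmmsg(fd, msgs, n, 0);
}

/* Drain up to n queued datagrams with one recvmmsg() call. */
static int toy_recv_batch(int fd, char bufs[][TOY_MTU], size_t *lens,
                          unsigned int n)
{
    struct mmsghdr msgs[TOY_BATCH];
    struct iovec iov[TOY_BATCH];

    assert(n <= TOY_BATCH);
    memset(msgs, 0, sizeof(msgs));
    for (unsigned int i = 0; i < n; i++) {
        iov[i].iov_base = bufs[i];
        iov[i].iov_len = TOY_MTU;
        msgs[i].msg_hdr.msg_iov = &iov[i];
        msgs[i].msg_hdr.msg_iovlen = 1;
    }
    /* MSG_DONTWAIT: take what is already queued instead of blocking. */
    int got = recvmmsg(fd, msgs, n, MSG_DONTWAIT, NULL);
    for (int i = 0; i < got; i++)
        lens[i] = msgs[i].msg_len; /* bytes received for message i */
    return got;
}
```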

    Signed-off-by: Anton Ivanov
    Signed-off-by: Richard Weinberger

    Anton Ivanov
     
  • 1. Removes the need to walk the IRQ/Device list to determine
    who triggered the IRQ.
    2. Improves scalability (up to several times performance
    improvement for cases with 10s of devices).
    3. Improves UML baseline IO performance for one disk + one NIC
    use case by up to 10%.
    4. Introduces write poll triggered IRQs.
    5. Prerequisite for introducing the high-performance mmsg family
    of functions in network IO.
    6. Fixes RNG shutdown, which was leaking a file descriptor.
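    Point 1 relies on a standard epoll property: every registered descriptor carries a user-supplied epoll_data, so a ready event can point straight at its owning device and no IRQ list has to be searched. A minimal userspace sketch, assuming a hypothetical toy_irq_dev structure and pipes standing in for device descriptors; registering with EPOLLOUT instead of EPOLLIN would correspond loosely to the write-triggered IRQs of point 4.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/epoll.h>

/* Hypothetical device record. */
struct toy_irq_dev {
    const char *name;
    int fd;
    int fired; /* how many times this "IRQ" was delivered */
};

/* Register a device, storing its pointer in the epoll event itself. */
static int toy_irq_register(int epfd, struct toy_irq_dev *dev,
                            unsigned int events)
{
    struct epoll_event ev = { .events = events, .data.ptr = dev };
    return epoll_ctl(epfd, EPOLL_CTL_ADD, dev->fd, &ev);
}

/* Dispatch ready events: each event already names its device. */
static int toy_irq_dispatch(int epfd, int timeout_ms)
{
    struct epoll_event evs[8];
    int n = epoll_wait(epfd, evs, 8, timeout_ms);

    for (int i = 0; i < n; i++) {
        struct toy_irq_dev *dev = evs[i].data.ptr;
        assert(dev != NULL);
        dev->fired++;
    }
    return n;
}
```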

    Signed-off-by: Anton Ivanov
    Signed-off-by: Richard Weinberger

    Anton Ivanov
     
  • If CONFIG_MODVERSIONS=y:

    WARNING: EXPORT symbol "__memcpy" [vmlinux] version generation failed, symbol will not be versioned.
    WARNING: EXPORT symbol "memcpy" [vmlinux] version generation failed, symbol will not be versioned.

    Add , including the generic version, so that
    genksyms knows the types of these symbols and can generate CRCs for
    them.

    Signed-off-by: Geert Uytterhoeven
    Signed-off-by: Richard Weinberger

    Geert Uytterhoeven
     

02 Feb, 2018

1 commit

  • Pull clk updates from Stephen Boyd:
    "The core framework has a handful of patches this time around, mostly
    due to the clk rate protection support added by Jerome Brunet.

    This feature will allow consumers to lock in a certain rate on the
    output of a clk so that things like audio playback don't hear pops
    when the clk frequency changes due to shared parent clks changing
    rates. Currently the clk API doesn't guarantee the rate of a clk stays
    at the rate you request after clk_set_rate() is called, so this new
    API will allow drivers to express that requirement.

    Beyond this, the core got some debugfs pretty printing patches and a
    couple minor non-critical fixes.

    Looking outside of the core framework diff we have some new driver
    additions and the removal of a legacy TI clk driver. Both of these hit
    high in the dirstat. Also, the removal of the asm-generic/clkdev.h
    file causes small one-liners in all the architecture Kbuild files.

    Overall, the driver diff seems to be the normal stuff that comes all
    the time to fix little problems here and there and to support new
    hardware.

    Summary:

    Core:
    - Clk rate protection
    - Symbolic clk flags in debugfs output
    - Clk registration enabled clks while doing bookkeeping updates

    New Drivers:
    - Spreadtrum SC9860
    - HiSilicon hi3660 stub
    - Qualcomm A53 PLL, SPMI clkdiv, and MSM8916 APCS
    - Amlogic Meson-AXG
    - ASPEED BMC

    Removed Drivers:
    - TI OMAP 3xxx legacy clk (non-DT) support
    - asm*/clkdev.h got removed (not really a driver)

    Updates:
    - Renesas FDP1-0 module clock on R-Car M3-W
    - Renesas LVDS module clock on R-Car V3M
    - Misc fixes to pr_err() prints
    - Qualcomm MSM8916 audio fixes
    - Qualcomm IPQ8074 rounded out support for more peripherals
    - Qualcomm Alpha PLL variants
    - Divider code was using container_of() on bad pointers
    - Allwinner DE2 clks on H3
    - Amlogic minor data fixes and dropping of CLK_IGNORE_UNUSED
    - Mediatek clk driver compile test support
    - AT91 PMC clk suspend/resume restoration support
    - PLL issues fixed on si5351
    - Broadcom IProc PLL calculation updates
    - DVFS support for Armada mvebu CPU clks
    - Allwinner fixed post-divider support
    - TI clkctrl fixes and support for newer SoCs"

    * tag 'clk-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux: (125 commits)
    clk: aspeed: Handle inverse polarity of USB port 1 clock gate
    clk: aspeed: Fix return value check in aspeed_cc_init()
    clk: aspeed: Add reset controller
    clk: aspeed: Register gated clocks
    clk: aspeed: Add platform driver and register PLLs
    clk: aspeed: Register core clocks
    clk: Add clock driver for ASPEED BMC SoCs
    clk: mediatek: adjust dependency of reset.c to avoid unexpectedly being built
    clk: fix reentrancy of clk_enable() on UP systems
    clk: meson-axg: fix potential NULL dereference in axg_clkc_probe()
    clk: Simplify debugfs registration
    clk: Fix debugfs_create_*() usage
    clk: Show symbolic clock flags in debugfs
    clk: renesas: r8a7796: Add FDP clock
    clk: Move __clk_{get,put}() into private clk.h API
    clk: sunxi: Use CLK_IS_CRITICAL flag for critical clks
    clk: Improve flags doc for of_clk_detect_critical()
    arch: Remove clkdev.h asm-generic from Kbuild
    clk: sunxi-ng: a83t: Add M divider to TCON1 clock
    clk: Prepare to remove asm-generic/clkdev.h
    ...
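    The clk rate protection described in the pull message is exposed to drivers through calls such as clk_rate_exclusive_get() and clk_rate_exclusive_put(). The self-contained toy below does not use those APIs; it only models the rule under a simplifying assumption: while another consumer holds protection, a rate change is refused with -EBUSY, and the protecting consumer itself may still change the rate. All names are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical shared clock: one rate and a count of protectors. */
struct toy_clk_core {
    unsigned long rate;
    unsigned int protect_count;
};

/* Hypothetical per-consumer handle onto the shared clock. */
struct toy_clk {
    struct toy_clk_core *core;
    bool exclusive; /* does this consumer hold rate protection? */
};

static void toy_clk_rate_exclusive_get(struct toy_clk *clk)
{
    if (!clk->exclusive) {
        clk->exclusive = true;
        clk->core->protect_count++;
    }
}

static void toy_clk_rate_exclusive_put(struct toy_clk *clk)
{
    if (clk->exclusive) {
        assert(clk->core->protect_count > 0);
        clk->exclusive = false;
        clk->core->protect_count--;
    }
}

/* Refuse the change if anyone other than the caller protects the rate. */
static int toy_clk_set_rate(struct toy_clk *clk, unsigned long rate)
{
    unsigned int others =
        clk->core->protect_count - (clk->exclusive ? 1u : 0u);

    if (others > 0)
        return -EBUSY;
    clk->core->rate = rate;
    return 0;
}
```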

    Linus Torvalds
     

10 Jan, 2018

1 commit

  • Construct the init thread stack in the linker script rather than doing it
    by means of a union so that ia64's init_task.c can be removed.

    The following symbols are then made available from INIT_TASK_DATA() linker
    script macro:

    init_thread_union
    init_stack

    INIT_TASK_DATA() also expands the region to THREAD_SIZE to accommodate the
    size of the init stack. init_thread_union is given its own section so that
    it can be placed into the stack space in the right order. I'm assuming
    that the ia64 ordering is correct and that the task_struct is first and the
    thread_info second.

    Signed-off-by: David Howells
    Tested-by: Tony Luck
    Tested-by: Will Deacon (arm64)
    Tested-by: Palmer Dabbelt
    Acked-by: Thomas Gleixner

    David Howells
     

04 Jan, 2018

1 commit


24 Dec, 2017

1 commit

  • Pull x86 PTI preparatory patches from Thomas Gleixner:
    "Today's Advent calendar window contains twenty-four easy-to-digest
    patches. The original plan was to have twenty-three matching the date,
    but a late fixup made that moot.

    - Move the cpu_entry_area mapping out of the fixmap into a separate
    address space. That's necessary because the fixmap becomes too big
    with NR_CPUS=8192, and this has already caused subtle and
    hard-to-diagnose failures.

    The topmost patch is fresh from today and cures a brain slip of
    that tall grumpy German greybeard, who ignored the intricacies of
    32-bit wraparounds.

    - Limit the number of CPUs on 32-bit to 64. That's insanely big already,
    but at least it's small enough to prevent address space issues with
    the cpu_entry_area map, which have been observed and debugged with
    the fixmap code.

    - A few TLB flush fixes in various places plus documentation of which
    TLB functions should be used for what.

    - Rename the SYSENTER stack to CPU_ENTRY_AREA stack as it is used for
    more than sysenter now and keeping the name makes backtraces
    confusing.

    - Prevent LDT inheritance on exec() by moving it to arch_dup_mmap(),
    which is only invoked on fork().

    - Make vsyscall more robust.

    - A few fixes and cleanups of the debug_pagetables code. Check
    PAGE_PRESENT instead of checking the PTE for 0 and a cleanup of the
    C89 initialization of the address hint array, which was already out
    of sync with the index enums.

    - Move the ESPFIX init to a different place to prepare for PTI.

    - Several code moves with no functional change to make PTI
    integration simpler and header files less convoluted.

    - Documentation fixes and clarifications"

    * 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (24 commits)
    x86/cpu_entry_area: Prevent wraparound in setup_cpu_entry_area_ptes() on 32bit
    init: Invoke init_espfix_bsp() from mm_init()
    x86/cpu_entry_area: Move it out of the fixmap
    x86/cpu_entry_area: Move it to a separate unit
    x86/mm: Create asm/invpcid.h
    x86/mm: Put MMU to hardware ASID translation in one place
    x86/mm: Remove hard-coded ASID limit checks
    x86/mm: Move the CR3 construction functions to tlbflush.h
    x86/mm: Add comments to clarify which TLB-flush functions are supposed to flush what
    x86/mm: Remove superfluous barriers
    x86/mm: Use __flush_tlb_one() for kernel memory
    x86/microcode: Dont abuse the TLB-flush interface
    x86/uv: Use the right TLB-flush API
    x86/entry: Rename SYSENTER_stack to CPU_ENTRY_AREA_entry_stack
    x86/doc: Remove obvious weirdnesses from the x86 MM layout documentation
    x86/mm/64: Improve the memory map documentation
    x86/ldt: Prevent LDT inheritance on exec
    x86/ldt: Rework locking
    arch, mm: Allow arch_dup_mmap() to fail
    x86/vsyscall/64: Warn and fail vsyscall emulation in NATIVE mode
    ...

    Linus Torvalds
     

23 Dec, 2017

1 commit

  • In order to sanitize the LDT initialization on x86, arch_dup_mmap() must be
    allowed to fail. Fix up all instances.
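    In practice this turns a void hook into one that returns an int the fork path must check. A hedged sketch with hypothetical toy types and names (the kernel's hook takes struct mm_struct pointers, and the generic default simply returns 0):

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical address space with one piece of arch-private state. */
struct toy_addr_space {
    void *ldt; /* e.g. a per-process descriptor table */
    size_t ldt_size;
};

/* Generic default: nothing to copy, so it always succeeds. */
static int toy_generic_dup_mmap(struct toy_addr_space *oldmm,
                                struct toy_addr_space *mm)
{
    (void)oldmm;
    (void)mm;
    return 0;
}

/* Arch hook that allocates memory and therefore must be able to fail. */
static int toy_arch_dup_mmap(struct toy_addr_space *oldmm,
                             struct toy_addr_space *mm)
{
    if (!oldmm->ldt)
        return 0;
    mm->ldt = malloc(oldmm->ldt_size);
    if (!mm->ldt)
        return -ENOMEM;
    memcpy(mm->ldt, oldmm->ldt, oldmm->ldt_size);
    mm->ldt_size = oldmm->ldt_size;
    return 0;
}

/* Fork-path caller: the hook's error is propagated, not ignored. */
static int toy_dup_mm(struct toy_addr_space *oldmm, struct toy_addr_space *mm,
                      int (*dup_hook)(struct toy_addr_space *,
                                      struct toy_addr_space *))
{
    memset(mm, 0, sizeof(*mm));
    return dup_hook(oldmm, mm);
}
```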

    Signed-off-by: Thomas Gleixner
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Andy Lutomirski
    Cc: Andy Lutomirsky
    Cc: Boris Ostrovsky
    Cc: Borislav Petkov
    Cc: Brian Gerst
    Cc: Dave Hansen
    Cc: David Laight
    Cc: Denys Vlasenko
    Cc: Eduardo Valentin
    Cc: Greg KH
    Cc: H. Peter Anvin
    Cc: Josh Poimboeuf
    Cc: Juergen Gross
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Will Deacon
    Cc: aliguori@amazon.com
    Cc: dan.j.williams@intel.com
    Cc: hughd@google.com
    Cc: keescook@google.com
    Cc: kirill.shutemov@linux.intel.com
    Cc: linux-mm@kvack.org
    Signed-off-by: Ingo Molnar

    Thomas Gleixner
     

17 Dec, 2017

1 commit

  • [ Note, this is a Git cherry-pick of the following commit:

    a23f06f06dbe ("bpf: fix build issues on um due to mising bpf_perf_event.h")

    ... for easier x86 PTI code testing and back-porting. ]

    Since c895f6f703ad ("bpf: correct broken uapi for
    BPF_PROG_TYPE_PERF_EVENT program type") um (uml) won't build
    on i386 or x86_64:

    [...]
    CC init/main.o
    In file included from ../include/linux/perf_event.h:18:0,
    from ../include/linux/trace_events.h:10,
    from ../include/trace/syscall.h:7,
    from ../include/linux/syscalls.h:82,
    from ../init/main.c:20:
    ../include/uapi/linux/bpf_perf_event.h:11:32: fatal error:
    asm/bpf_perf_event.h: No such file or directory #include

    [...]

    Let's add the missing bpf_perf_event.h to the um arch as well. This seems
    to be the only one still missing.

    Fixes: c895f6f703ad ("bpf: correct broken uapi for BPF_PROG_TYPE_PERF_EVENT program type")
    Reported-by: Randy Dunlap
    Suggested-by: Richard Weinberger
    Signed-off-by: Daniel Borkmann
    Tested-by: Randy Dunlap
    Cc: Hendrik Brueckner
    Cc: Richard Weinberger
    Acked-by: Alexei Starovoitov
    Acked-by: Richard Weinberger
    Signed-off-by: Alexei Starovoitov
    Signed-off-by: Ingo Molnar

    Daniel Borkmann
     

13 Dec, 2017

1 commit

  • Since c895f6f703ad ("bpf: correct broken uapi for
    BPF_PROG_TYPE_PERF_EVENT program type") um (uml) won't build
    on i386 or x86_64:

    [...]
    CC init/main.o
    In file included from ../include/linux/perf_event.h:18:0,
    from ../include/linux/trace_events.h:10,
    from ../include/trace/syscall.h:7,
    from ../include/linux/syscalls.h:82,
    from ../init/main.c:20:
    ../include/uapi/linux/bpf_perf_event.h:11:32: fatal error:
    asm/bpf_perf_event.h: No such file or directory #include

    [...]

    Let's add the missing bpf_perf_event.h to the um arch as well. This seems
    to be the only one still missing.

    Fixes: c895f6f703ad ("bpf: correct broken uapi for BPF_PROG_TYPE_PERF_EVENT program type")
    Reported-by: Randy Dunlap
    Suggested-by: Richard Weinberger
    Signed-off-by: Daniel Borkmann
    Tested-by: Randy Dunlap
    Cc: Hendrik Brueckner
    Cc: Richard Weinberger
    Acked-by: Alexei Starovoitov
    Acked-by: Richard Weinberger
    Signed-off-by: Alexei Starovoitov

    Daniel Borkmann
     

07 Nov, 2017

1 commit


02 Nov, 2017

1 commit

  • Many source files in the tree are missing licensing information, which
    makes it harder for compliance tools to determine the correct license.

    By default all files without license information are under the default
    license of the kernel, which is GPL version 2.

    Update the files which contain no license information with the 'GPL-2.0'
    SPDX license identifier. The SPDX identifier is a legally binding
    shorthand, which can be used instead of the full boilerplate text.
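    Because the tag is a fixed one-line string, a compliance tool can pick it up mechanically. A hedged sketch of extracting the identifier from a file's first line, as in `// SPDX-License-Identifier: GPL-2.0` or `/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */`. The function name is hypothetical, and the sketch only looks at the first line (files that must start with a `#!` interpreter line carry the tag on the second line, which it ignores).

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Copy the SPDX license expression found on the first line of `text`
 * into `out`. Returns true if a tag was found and fit into `out`.
 */
static bool toy_spdx_first_line(const char *text, char *out, size_t out_size)
{
    static const char tag[] = "SPDX-License-Identifier:";
    const char *eol = strchr(text, '\n');
    const char *end = eol ? eol : text + strlen(text);
    const char *p = strstr(text, tag);

    if (!p || p >= end) /* no tag on the first line */
        return false;
    p += sizeof(tag) - 1;
    while (p < end && *p == ' ')
        p++;

    /* Stop at a closing C comment, if any, then trim trailing blanks. */
    for (const char *q = p; q + 1 < end; q++) {
        if (q[0] == '*' && q[1] == '/') {
            end = q;
            break;
        }
    }
    while (end > p && (end[-1] == ' ' || end[-1] == '\t' || end[-1] == '\r'))
        end--;

    size_t len = (size_t)(end - p);
    if (len == 0 || len + 1 > out_size)
        return false;
    memcpy(out, p, len);
    out[len] = '\0';
    return true;
}
```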

    This patch is based on work done by Thomas Gleixner and Kate Stewart and
    Philippe Ombredanne.

    How this work was done:

    Patches were generated and checked against linux-4.14-rc6 for a subset of
    the use cases:
    - file had no licensing information in it,
    - file was a */uapi/* one with no licensing information in it,
    - file was a */uapi/* one with existing licensing information.

    Further patches will be generated in subsequent months to fix up cases
    where non-standard license headers were used, and references to license
    had to be inferred by heuristics based on keywords.

    The analysis to determine which SPDX License Identifier should be applied
    to a file was done in a spreadsheet of side-by-side results of the
    output of two independent scanners (ScanCode & Windriver) producing SPDX
    tag:value files created by Philippe Ombredanne. Philippe prepared the
    base worksheet and did an initial spot review of a few thousand files.

    The 4.13 kernel was the starting point of the analysis with 60,537 files
    assessed. Kate Stewart did a file-by-file comparison of the scanner
    results in the spreadsheet to determine which SPDX license identifier(s)
    should be applied to the file. She confirmed any determination that was not
    immediately clear with lawyers working with the Linux Foundation.

    The criteria used to select files for SPDX license identifier tagging were:
    - Files considered eligible had to be source code files.
    - Make and config files were included as candidates if they contained >5
    lines of source
    - File already had some variant of a license header in it (even if
    Reviewed-by: Philippe Ombredanne
    Reviewed-by: Thomas Gleixner
    Signed-off-by: Greg Kroah-Hartman

    Greg Kroah-Hartman
     

24 Oct, 2017

1 commit

  • linux/compiler.h is included indirectly by linux/types.h via
    uapi/linux/types.h -> uapi/linux/posix_types.h -> linux/stddef.h
    -> uapi/linux/stddef.h and is needed to provide a proper definition of
    offsetof.

    Unfortunately, compiler.h requires a definition of
    smp_read_barrier_depends() for defining lockless_dereference() and soon
    for defining READ_ONCE(), which means that all
    users of READ_ONCE() will need to include asm/barrier.h to avoid splats
    such as:

    In file included from include/uapi/linux/stddef.h:1:0,
    from include/linux/stddef.h:4,
    from arch/h8300/kernel/asm-offsets.c:11:
    include/linux/list.h: In function 'list_empty':
    >> include/linux/compiler.h:343:2: error: implicit declaration of function 'smp_read_barrier_depends' [-Werror=implicit-function-declaration]
    smp_read_barrier_depends(); /* Enforce dependency ordering from x */ \
    ^

    A better alternative is to include asm/barrier.h in linux/compiler.h,
    but this requires a type definition for "bool" on some architectures
    (e.g. x86), which is defined later by linux/types.h. Type "bool" is also
    used directly in linux/compiler.h, so the whole thing is pretty fragile.

    This patch splits compiler.h in two: compiler_types.h contains type
    annotations, definitions and the compiler-specific parts, whereas
    compiler.h #includes compiler_types.h and additionally defines macros
    such as {READ,WRITE,ACCESS}_ONCE().

    uapi/linux/stddef.h and linux/linkage.h are then moved over to include
    linux/compiler_types.h, which fixes the build for h8 and blackfin.
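    The shape of the split can be sketched in one file, with comments marking which half each definition would belong to. This is a heavily simplified, hypothetical illustration: the kernel's real READ_ONCE()/WRITE_ONCE() handle more cases (and, per the text above, were about to depend on smp_read_barrier_depends()), whereas these scalar-only versions just force a single access through a volatile lvalue using the GCC/Clang __typeof__ extension.

```c
/* ---- "compiler_types.h" half: annotations only, no barriers needed ---- */
#define toy_always_inline inline __attribute__((always_inline))

/* ---- "compiler.h" half: pulls in the types half, adds access macros ---- */
#define TOY_READ_ONCE(x)       (*(const volatile __typeof__(x) *)&(x))
#define TOY_WRITE_ONCE(x, val) (*(volatile __typeof__(x) *)&(x) = (val))

/* A lockless flag, accessed only through the simplified macros. */
static int toy_flag;

static toy_always_inline void toy_set_flag(int v)
{
    TOY_WRITE_ONCE(toy_flag, v);
}

static toy_always_inline int toy_get_flag(void)
{
    return TOY_READ_ONCE(toy_flag);
}
```

    Only the second half needs anything beyond plain types, which is why headers like uapi/linux/stddef.h can include the first half without pulling in barrier definitions.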

    Signed-off-by: Will Deacon
    Cc: Linus Torvalds
    Cc: Paul E. McKenney
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Link: http://lkml.kernel.org/r/1508840570-22169-2-git-send-email-will.deacon@arm.com
    Signed-off-by: Ingo Molnar

    Will Deacon
     

23 Sep, 2017

1 commit


17 Sep, 2017

1 commit

  • Pull UML updates from Richard Weinberger:

    - minor improvements

    - fixes for Debian's new gcc defaults (pie enabled by default)

    - fixes for XSTATE/XSAVE to make UML work again on modern systems

    * 'for-linus-4.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml:
    um: return negative in tuntap_open_tramp()
    um: remove a stray tab
    um: Use relative modversions with LD_SCRIPT_DYN
    um: link vmlinux with -no-pie
    um: Fix CONFIG_GCOV for modules.
    Fix minor typos and grammar in UML start_up help
    um: defconfig: Cleanup from old Kconfig options
    um: Fix FP register size for XSTATE/XSAVE

    Linus Torvalds
     

14 Sep, 2017

1 commit


05 Sep, 2017

1 commit

  • Pull x86 asm updates from Ingo Molnar:

    - Introduce the ORC unwinder, which can be enabled via
    CONFIG_ORC_UNWINDER=y.

    The ORC unwinder is a lightweight, Linux kernel specific debuginfo
    implementation, which aims to be DWARF done right for unwinding.
    Objtool is used to generate the ORC unwinder tables during build, so
    the data format is flexible and kernel internal: there's no
    dependency on debuginfo created by an external toolchain.

    The ORC unwinder is almost two orders of magnitude faster than the
    (out of tree) DWARF unwinder - which is important for perf call graph
    profiling. It is also significantly simpler and is coded defensively:
    there has not been a single ORC related kernel crash so far, even
    with early versions. (knock on wood!)

    But the main advantage is that enabling the ORC unwinder allows
    CONFIG_FRAME_POINTER to be turned off - which speeds up the kernel
    measurably:

    With frame pointers disabled, GCC does not have to add frame pointer
    instrumentation code to every function in the kernel. The kernel's
    .text size decreases by about 3.2%, resulting in better cache
    utilization and fewer instructions executed, resulting in a broad
    kernel-wide speedup. Average speedup of system calls should be
    roughly in the 1-3% range - measurements by Mel Gorman [1] have shown
    a speedup of 5-10% for some function execution intense workloads.

    The main cost of the unwinder is that the unwinder data has to be
    stored in RAM: the memory cost is 2-4MB of RAM, depending on kernel
    config - which is a modest cost on modern x86 systems.

    Given how young the ORC unwinder code is it's not enabled by default
    - but given the performance advantages the plan is to eventually make
    it the default unwinder on x86.

    See Documentation/x86/orc-unwinder.txt for more details.

    - Remove lguest support: its intended role was that of a temporary
    proof of concept for virtualization, plus its removal will enable the
    reduction (removal) of the paravirt API as well, so Rusty agreed to
    its removal. (Juergen Gross)

    - Clean up and fix FSGS related functionality (Andy Lutomirski)

    - Clean up IO access APIs (Andy Shevchenko)

    - Enhance the symbol namespace (Jiri Slaby)

    * 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (47 commits)
    objtool: Handle GCC stack pointer adjustment bug
    x86/entry/64: Use ENTRY() instead of ALIGN+GLOBAL for stub32_clone()
    x86/fpu/math-emu: Add ENDPROC to functions
    x86/boot/64: Extract efi_pe_entry() from startup_64()
    x86/boot/32: Extract efi_pe_entry() from startup_32()
    x86/lguest: Remove lguest support
    x86/paravirt/xen: Remove xen_patch()
    objtool: Fix objtool fallthrough detection with function padding
    x86/xen/64: Fix the reported SS and CS in SYSCALL
    objtool: Track DRAP separately from callee-saved registers
    objtool: Fix validate_branch() return codes
    x86: Clarify/fix no-op barriers for text_poke_bp()
    x86/switch_to/64: Rewrite FS/GS switching yet again to fix AMD CPUs
    selftests/x86/fsgsbase: Test selectors 1, 2, and 3
    x86/fsgsbase/64: Report FSBASE and GSBASE correctly in core dumps
    x86/fsgsbase/64: Fully initialize FS and GS state in start_thread_common
    x86/asm: Fix UNWIND_HINT_REGS macro for older binutils
    x86/asm/32: Fix regs_get_register() on segment registers
    x86/xen/64: Rearrange the SYSCALL entries
    x86/asm/32: Remove a bunch of '& 0xffff' from pt_regs segment reads
    ...

    Linus Torvalds
     

11 Aug, 2017

2 commits

  • Nadav reported that parallel MADV_DONTNEED calls on the same range have a
    stale TLB problem; Mel fixed it [1] and found the same problem with MADV_FREE [2].

    Quote from Mel Gorman:
    "The race in question is CPU 0 running madv_free and updating some PTEs
    while CPU 1 is also running madv_free and looking at the same PTEs.
    CPU 1 may have writable TLB entries for a page but fail the pte_dirty
    check (because CPU 0 has updated it already) and potentially fail to
    flush.

    Hence, when madv_free on CPU 1 returns, there are still potentially
    writable TLB entries and the underlying PTE is still present so that a
    subsequent write does not necessarily propagate the dirty bit to the
    underlying PTE any more. Reclaim at some unknown time in the future
    may then see that the PTE is still clean and discard the page even
    though a write has happened in the meantime. I think this is possible
    but I could have missed some protection in madv_free that prevents it
    happening."

    This patch aims to solve both problems at once and also prepares for
    another problem involving KSM, MADV_FREE and soft-dirty [3].

    The TLB batch API (tlb_[gather|finish]_mmu) uses [inc|dec]_tlb_flush_pending
    and mm_tlb_flush_pending so that, when tlb_finish_mmu is called, we can
    detect that parallel threads are running. In that case, we forcefully
    flush the TLB to prevent users from accessing memory via a stale TLB
    entry, even though this thread failed to gather the page table entry.
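    The pending-count logic can be modelled in a few lines of C11. This is a simplified, hypothetical sketch rather than the kernel code: each gather bumps a per-mm counter, and at finish time a count above one means another thread is batching on the same mm, so the finish forces a flush even if it cleared nothing itself. The "flush" is just a counter increment, and all names are illustrative.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the relevant part of struct mm_struct. */
struct toy_mm {
    atomic_int tlb_flush_pending; /* batches currently in flight */
    atomic_int flushes;           /* counts simulated TLB flushes */
};

/* Hypothetical per-batch state, loosely like struct mmu_gather. */
struct toy_gather {
    struct toy_mm *mm;
    bool gathered_ptes; /* did this batch clear any PTEs itself? */
};

static void toy_tlb_gather_mmu(struct toy_gather *tlb, struct toy_mm *mm)
{
    tlb->mm = mm;
    tlb->gathered_ptes = false;
    atomic_fetch_add(&mm->tlb_flush_pending, 1);
}

static void toy_tlb_finish_mmu(struct toy_gather *tlb)
{
    /* More than our own batch pending: another thread is racing on mm. */
    bool nested = atomic_load(&tlb->mm->tlb_flush_pending) > 1;

    if (tlb->gathered_ptes || nested)
        atomic_fetch_add(&tlb->mm->flushes, 1); /* simulated flush */
    atomic_fetch_sub(&tlb->mm->tlb_flush_pending, 1);
}
```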

    I confirmed that this patch works with the test program Nadav provided [4],
    so this patch supersedes "mm: Always flush VMA ranges affected by
    zap_page_range v2" in the current mmotm.

    NOTE:

    This patch modifies the arch-specific TLB gathering interface (x86, ia64,
    s390, sh, um). Most architectures seem straightforward, but s390 needs
    care because tlb_flush_mmu works only if mm->context.flush_mm is set to
    non-zero, which happens only when a PTE entry is really cleared by
    ptep_get_and_clear and friends. In this race, however, the thread never
    changes any PTE entries, yet it still needs to flush to prevent memory
    access through a stale TLB entry.

    [1] http://lkml.kernel.org/r/20170725101230.5v7gvnjmcnkzzql3@techsingularity.net
    [2] http://lkml.kernel.org/r/20170725100722.2dxnmgypmwnrfawp@suse.de
    [3] http://lkml.kernel.org/r/BD3A0EBE-ECF4-41D4-87FA-C755EA9AB6BD@gmail.com
    [4] https://patchwork.kernel.org/patch/9861621/

    [minchan@kernel.org: decrease tlb flush pending count in tlb_finish_mmu]
    Link: http://lkml.kernel.org/r/20170808080821.GA31730@bbox
    Link: http://lkml.kernel.org/r/20170802000818.4760-7-namit@vmware.com
    Signed-off-by: Minchan Kim
    Signed-off-by: Nadav Amit
    Reported-by: Nadav Amit
    Reported-by: Mel Gorman
    Acked-by: Mel Gorman
    Cc: Ingo Molnar
    Cc: Russell King
    Cc: Tony Luck
    Cc: Martin Schwidefsky
    Cc: "David S. Miller"
    Cc: Heiko Carstens
    Cc: Yoshinori Sato
    Cc: Jeff Dike
    Cc: Andrea Arcangeli
    Cc: Andy Lutomirski
    Cc: Hugh Dickins
    Cc: Mel Gorman
    Cc: Nadav Amit
    Cc: Rik van Riel
    Cc: Sergey Senozhatsky
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Minchan Kim
     
  • This is a preparatory patch for solving race problems caused by TLB
    batching. To that end, we will increase/decrease the TLB flush pending
    count of mm_struct whenever tlb_[gather|finish]_mmu is called.

    To keep that change simple, this patch first separates out the
    architecture-specific part, renaming it to arch_tlb_[gather|finish]_mmu,
    so that the generic part just calls it.

    It shouldn't change any behavior.

    Link: http://lkml.kernel.org/r/20170802000818.4760-5-namit@vmware.com
    Signed-off-by: Minchan Kim
    Signed-off-by: Nadav Amit
    Acked-by: Mel Gorman
    Cc: Ingo Molnar
    Cc: Russell King
    Cc: Tony Luck
    Cc: Martin Schwidefsky
    Cc: "David S. Miller"
    Cc: Heiko Carstens
    Cc: Yoshinori Sato
    Cc: Jeff Dike
    Cc: Andrea Arcangeli
    Cc: Andy Lutomirski
    Cc: Hugh Dickins
    Cc: Mel Gorman
    Cc: Nadav Amit
    Cc: Rik van Riel
    Cc: Sergey Senozhatsky
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Minchan Kim
     

26 Jul, 2017

1 commit

  • Add the new ORC unwinder which is enabled by CONFIG_ORC_UNWINDER=y.
    It plugs into the existing x86 unwinder framework.

    It relies on objtool to generate the needed .orc_unwind and
    .orc_unwind_ip sections.

    For more details on why ORC is used instead of DWARF, see
    Documentation/x86/orc-unwinder.txt - but the short version is
    that it's a simplified, fundamentally more robust debuginfo
    data structure, which also allows up to two orders of magnitude
    faster lookups than the DWARF unwinder - which matters to
    profiling workloads like perf.

    Thanks to Andy Lutomirski for the performance improvement ideas:
    splitting the ORC unwind table into two parallel arrays and creating a
    fast lookup table to search a subset of the unwind table.

    Signed-off-by: Josh Poimboeuf
    Cc: Andy Lutomirski
    Cc: Borislav Petkov
    Cc: Brian Gerst
    Cc: Denys Vlasenko
    Cc: H. Peter Anvin
    Cc: Jiri Slaby
    Cc: Linus Torvalds
    Cc: Mike Galbraith
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: live-patching@vger.kernel.org
    Link: http://lkml.kernel.org/r/0a6cbfb40f8da99b7a45a1a8302dc6aef16ec812.1500938583.git.jpoimboe@redhat.com
    [ Extended the changelog. ]
    Signed-off-by: Ingo Molnar

    Josh Poimboeuf
     

16 Jul, 2017

1 commit

  • Pull UML updates from Richard Weinberger:
    "Mostly fixes for UML:

    - First round of fixes for PTRACE_GETREGSET/SETREGSET

    - A printf vs printk cleanup

    - Minor improvements"

    * 'for-linus-4.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml:
    um: Correctly check for PTRACE_GETRESET/SETREGSET
    um: v2: Use generic NOTES macro
    um: Add kerneldoc for userspace_tramp() and start_userspace()
    um: Add kerneldoc for segv_handler
    um: stub-data.h: remove superfluous include
    um: userspace - be more verbose in ptrace set regs error
    um: add dummy ioremap and iounmap functions
    um: Allow building and running on older hosts
    um: Avoid longjmp/setjmp symbol clashes with libpthread.a
    um: console: Ignore console= option
    um: Use os_warn to print out pre-boot warning/error messages
    um: Add os_warn() for pre-boot warning/error messages
    um: Use os_info for the messages on normal path
    um: Add os_info() for pre-boot information messages
    um: Use printk instead of printf in make_uml_dir

    Linus Torvalds
     

11 Jul, 2017

1 commit


07 Jul, 2017

1 commit


06 Jul, 2017

3 commits


29 Jun, 2017

1 commit

  • The only user of thread_saved_pc() in non-arch-specific code was removed
    in commit 8243d5597793 ("sched/core: Remove pointless printout in
    sched_show_task()"). Remove the implementations as well.

    Some architectures use thread_saved_pc() in their arch-specific code.
    Leave their thread_saved_pc() intact.

    Signed-off-by: Tobias Klauser
    Acked-by: Geert Uytterhoeven
    Cc: Ingo Molnar
    Signed-off-by: Linus Torvalds

    Tobias Klauser
     

02 May, 2017

1 commit

  • Pull x86 mm updates from Ingo Molnar:
    "The main x86 MM changes in this cycle were:

    - continued native kernel PCID support preparation patches to the TLB
    flushing code (Andy Lutomirski)

    - various fixes related to 32-bit compat syscalls returning addresses
    above 4GB in applications launched from 64-bit binaries, motivated
    by C/R frameworks such as Virtuozzo. (Dmitry Safonov)

    - continued Intel 5-level paging enablement: in particular the
    conversion of x86 GUP to the generic GUP code. (Kirill A. Shutemov)

    - x86/mpx ABI corner case fixes/enhancements (Joerg Roedel)

    - ... plus misc updates, fixes and cleanups"

    * 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (62 commits)
    mm, zone_device: Replace {get, put}_zone_device_page() with a single reference to fix pmem crash
    x86/mm: Fix flush_tlb_page() on Xen
    x86/mm: Make flush_tlb_mm_range() more predictable
    x86/mm: Remove flush_tlb() and flush_tlb_current_task()
    x86/vm86/32: Switch to flush_tlb_mm_range() in mark_screen_rdonly()
    x86/mm/64: Fix crash in remove_pagetable()
    Revert "x86/mm/gup: Switch GUP to the generic get_user_page_fast() implementation"
    x86/boot/e820: Remove a redundant self assignment
    x86/mm: Fix dump pagetables for 4 levels of page tables
    x86/mpx, selftests: Only check bounds-vs-shadow when we keep shadow
    x86/mpx: Correctly report do_mpx_bt_fault() failures to user-space
    Revert "x86/mm/numa: Remove numa_nodemask_from_meminfo()"
    x86/espfix: Add support for 5-level paging
    x86/kasan: Extend KASAN to support 5-level paging
    x86/mm: Add basic defines/helpers for CONFIG_X86_5LEVEL=y
    x86/paravirt: Add 5-level support to the paravirt code
    x86/mm: Define virtual memory map for 5-level paging
    x86/asm: Remove __VIRTUAL_MASK_SHIFT==47 assert
    x86/boot: Detect 5-level paging support
    x86/mm/numa: Remove numa_nodemask_from_meminfo()
    ...

    Linus Torvalds