21 May, 2019

2 commits

  • Add SPDX license identifiers to all Make/Kconfig files which:

    - Have no license information of any form

    These files fall under the project license, GPL v2 only. The resulting SPDX
    license identifier is:

    GPL-2.0-only

    Signed-off-by: Thomas Gleixner
    Signed-off-by: Greg Kroah-Hartman

    Thomas Gleixner
     
  • Add SPDX license identifiers to all files which:

    - Have no license information of any form

    - Have EXPORT_.*_SYMBOL_GPL inside which was used in the
    initial scan/conversion to ignore the file

    These files fall under the project license, GPL v2 only. The resulting SPDX
    license identifier is:

    GPL-2.0-only

    Signed-off-by: Thomas Gleixner
    Signed-off-by: Greg Kroah-Hartman

    Thomas Gleixner
     

19 May, 2019

1 commit

  • Since commit 54c7a8916a88 ("initramfs: free initrd memory if opening
    /initrd.image fails"), the kernel has unconditionally attempted to free
    the initrd even if it doesn't exist.

    In the non-existent case this causes a boot-time splat if
    CONFIG_DEBUG_VIRTUAL is enabled due to a call to virt_to_phys() with a
    NULL address.

    Instead we should check that the initrd actually exists and only attempt
    to free it if it does.
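
    A minimal userspace sketch of the check described above. The names
    initrd_start_addr, initrd_end_addr, release_initrd_range() and
    free_initrd_if_present() are hypothetical stand-ins, not the kernel's
    actual symbols; the point is only that the free path is skipped when no
    initrd was provided.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the initrd bounds; 0 means "no initrd". */
static unsigned long initrd_start_addr;
static unsigned long initrd_end_addr;
static int free_calls; /* counts how often the free routine ran */

/* Stand-in for the real freeing routine, which must never see address 0. */
static void release_initrd_range(unsigned long start, unsigned long end)
{
	(void)start;
	(void)end;
	free_calls++;
}

/* Free the initrd only when one was actually provided. */
static bool free_initrd_if_present(void)
{
	if (!initrd_start_addr)
		return false; /* nothing to free, so skip the call entirely */

	release_initrd_range(initrd_start_addr, initrd_end_addr);
	initrd_start_addr = 0;
	initrd_end_addr = 0;
	return true;
}
```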

    Link: http://lkml.kernel.org/r/20190516143125.48948-1-steven.price@arm.com
    Fixes: 54c7a8916a88 ("initramfs: free initrd memory if opening /initrd.image fails")
    Signed-off-by: Steven Price
    Reported-by: Mark Rutland
    Tested-by: Mark Rutland
    Reviewed-by: Mike Rapoport
    Cc: Christoph Hellwig
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Steven Price
     

15 May, 2019

10 commits

  • Patch series "mm: Randomize free memory", v10.

    This patch (of 3):

    Randomization of the page allocator improves the average utilization of
    a direct-mapped memory-side-cache. Memory side caching is a platform
    capability that Linux has been previously exposed to in HPC
    (high-performance computing) environments on specialty platforms. In
    that instance it was a smaller pool of high-bandwidth-memory relative to
    higher-capacity / lower-bandwidth DRAM. Now, this capability is going
    to be found on general purpose server platforms where DRAM is a cache in
    front of higher latency persistent memory [1].

    Robert offered an explanation of the state of the art of Linux
    interactions with memory-side-caches [2], and I copy it here:

    It's been a problem in the HPC space:
    http://www.nersc.gov/research-and-development/knl-cache-mode-performance-coe/

    A kernel module called zonesort is available to try to help:
    https://software.intel.com/en-us/articles/xeon-phi-software

    and this abandoned patch series proposed that for the kernel:
    https://lkml.kernel.org/r/20170823100205.17311-1-lukasz.daniluk@intel.com

    Dan's patch series doesn't attempt to ensure buffers won't conflict, but
    it does reduce the chance that the buffers will. This will make performance
    more consistent, albeit slower than "optimal" (which is near impossible
    to attain in a general-purpose kernel). That's better than forcing
    users to deploy remedies like:
    "To eliminate this gradual degradation, we have added a Stream
    measurement to the Node Health Check that follows each job;
    nodes are rebooted whenever their measured memory bandwidth
    falls below 300 GB/s."

    A replacement for zonesort was merged upstream in commit cc9aec03e58f
    ("x86/numa_emulation: Introduce uniform split capability"). With this
    numa_emulation capability, memory can be split into cache sized
    ("near-memory" sized) numa nodes. A bind operation to such a node, and
    disabling workloads on other nodes, enables full cache performance.
    However, once the workload exceeds the cache size then cache conflicts
    are unavoidable. While HPC environments might be able to tolerate
    time-scheduling of cache sized workloads, for general purpose server
    platforms, the oversubscribed cache case will be the common case.

    The worst case scenario is that a server system owner benchmarks a
    workload at boot with an un-contended cache only to see that performance
    degrade over time, even below the average cache performance due to
    excessive conflicts. Randomization clips the peaks and fills in the
    valleys of cache utilization to yield steady average performance.

    Here are some performance impact details of the patches:

    1/ An Intel internal synthetic memory bandwidth measurement tool saw a
    3X speedup in a contrived case that tries to force cache conflicts.
    The contrived case used the numa_emulation capability to force an
    instance of the benchmark to be run in two of the near-memory sized
    numa nodes. If both instances were placed on the same emulated node they
    would fit and cause zero conflicts. On separate emulated nodes without
    randomization, however, they underutilized the cache and conflicted
    unnecessarily due to the in-order allocation per node.

    2/ A well known Java server application benchmark was run with a heap
    size that exceeded cache size by 3X. The cache conflict rate was 8%
    for the first run and degraded to 21% after page allocator aging. With
    randomization enabled the rate levelled out at 11%.

    3/ A MongoDB workload did not observe a measurable difference in
    cache-conflict rates, but the overall throughput dropped by 7% with
    randomization in one case.

    4/ Mel Gorman ran his suite of performance workloads with randomization
    enabled on platforms without a memory-side-cache and saw a mix of some
    improvements and some losses [3].

    While there is potentially significant improvement for applications that
    depend on low latency access across a wide working-set, the performance
    may be negligible to negative for other workloads. For this reason the
    shuffle capability defaults to off unless a direct-mapped
    memory-side-cache is detected. Even then, the page_alloc.shuffle=0
    parameter can be specified to disable the randomization on those systems.

    Outside of memory-side-cache utilization concerns there is a potential
    security benefit from randomization. Some data exfiltration and
    return-oriented-programming attacks rely on the ability to infer the
    location of sensitive data objects. The kernel page allocator, especially
    early in system boot, has predictable first-in-first-out behavior for
    physical pages. Pages are freed in physical address order when first
    onlined.

    Quoting Kees:
    "While we already have a base-address randomization
    (CONFIG_RANDOMIZE_MEMORY), attacks against the same hardware and
    memory layouts would certainly be using the predictability of
    allocation ordering (i.e. for attacks where the base address isn't
    important: only the relative positions between allocated memory).
    This is common in lots of heap-style attacks. They try to gain
    control over ordering by spraying allocations, etc.

    I'd really like to see this because it gives us something similar
    to CONFIG_SLAB_FREELIST_RANDOM but for the page allocator."

    While SLAB_FREELIST_RANDOM reduces the predictability of some local slab
    caches, it leaves the vast bulk of memory to be allocated in a predictable
    order. However, it should be noted that the concrete security benefits are
    hard to quantify, and no known CVE is mitigated by this randomization.

    Introduce shuffle_free_memory(), and its helper shuffle_zone(), to perform
    a Fisher-Yates shuffle of the page allocator 'free_area' lists when they
    are initially populated with free memory at boot and at hotplug time. Do
    this based on either the presence of a page_alloc.shuffle=Y command line
    parameter, or autodetection of a memory-side-cache (to be added in a
    follow-on patch).
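
    For readers unfamiliar with the technique, a Fisher-Yates shuffle can be
    sketched in plain C as below. This is an illustration only, not the code
    of shuffle_zone(): it shuffles an array of block numbers instead of the
    'free_area' lists, and the xorshift32 generator, the seed parameter and
    the name shuffle_blocks() are assumptions made to keep the example
    self-contained.

```c
#include <stddef.h>
#include <stdint.h>

/* Small xorshift32 generator for the sketch; NOT the kernel's RNG. */
static uint32_t xorshift32(uint32_t *state)
{
	uint32_t x = *state;

	x ^= x << 13;
	x ^= x >> 17;
	x ^= x << 5;
	*state = x;
	return x;
}

/*
 * Fisher-Yates: walk from the last slot down, swapping each slot with a
 * randomly chosen slot at or before it.
 */
static void shuffle_blocks(unsigned long *pfn, size_t n, uint32_t seed)
{
	uint32_t state = seed ? seed : 1; /* xorshift must not start at 0 */

	for (size_t i = n; i > 1; i--) {
		/* 0 <= j < i; the slight modulo bias is accepted here */
		size_t j = xorshift32(&state) % i;
		unsigned long tmp = pfn[i - 1];

		pfn[i - 1] = pfn[j];
		pfn[j] = tmp;
	}
}
```

    Every element stays in the array exactly once; only the order changes.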

    The shuffling is done in terms of CONFIG_SHUFFLE_PAGE_ORDER sized free
    pages where the default CONFIG_SHUFFLE_PAGE_ORDER is MAX_ORDER-1, i.e. 10
    (4MB). This trades off randomization granularity for time spent shuffling.
    MAX_ORDER-1 was chosen to be minimally invasive to the page allocator
    while still showing memory-side cache behavior improvements, and the
    expectation that the security implications of finer granularity
    randomization are mitigated by CONFIG_SLAB_FREELIST_RANDOM. The
    performance impact of the shuffling appears to be in the noise compared to
    other memory initialization work.
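
    The 4MB figure quoted for order 10 follows from the buddy-order
    arithmetic, assuming 4KiB base pages (the page size is an assumption
    here, it is not stated in the message). The helper name order_to_bytes()
    is made up for this sketch.

```c
#include <stddef.h>

/* Bytes covered by one free block of the given buddy order. */
static size_t order_to_bytes(unsigned int order, size_t page_size)
{
	return page_size << order; /* 2^order pages of page_size bytes each */
}
```

    With order 10 and 4096-byte pages this gives 1024 * 4096 bytes, i.e. 4MB
    per shuffled block.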

    This initial randomization can be undone over time so a follow-on patch is
    introduced to inject entropy on page free decisions. It is reasonable to
    ask if the page free entropy is sufficient, but it is not enough due to
    the in-order initial freeing of pages. At the start of that process
    putting page1 in front or behind page0 still keeps them close together,
    page2 is still near page1 and has a high chance of being adjacent. As
    more pages are added ordering diversity improves, but there is still high
    page locality for the low address pages and this leads to no significant
    impact to the cache conflict rate.
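
    The locality argument can be seen in a toy model. The sketch below is
    not the follow-on patch; struct free_list, free_list_init() and
    free_page_random() are hypothetical names, and the caller supplies the
    "random" bit so the behavior is easy to follow.

```c
#include <stdbool.h>
#include <stddef.h>

#define LIST_CAP 64

/* Toy free list: a fixed array used as a double-ended queue. */
struct free_list {
	unsigned long slot[2 * LIST_CAP];
	size_t head; /* index of the first element */
	size_t tail; /* one past the last element */
};

static void free_list_init(struct free_list *fl)
{
	fl->head = fl->tail = LIST_CAP; /* start in the middle, empty */
}

/* Place a freed page at the front or the back depending on one random bit. */
static bool free_page_random(struct free_list *fl, unsigned long pfn,
			     bool to_front)
{
	if (to_front) {
		if (fl->head == 0)
			return false; /* no room at the front */
		fl->slot[--fl->head] = pfn;
	} else {
		if (fl->tail == 2 * LIST_CAP)
			return false; /* no room at the back */
		fl->slot[fl->tail++] = pfn;
	}
	return true;
}
```

    Freeing pages 0, 1, 2 in order with bits back, front, back yields the
    list 1, 0, 2: whichever bits are drawn, early pages remain neighbors,
    which is why the initial shuffle is still needed.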

    [1]: https://itpeernetwork.intel.com/intel-optane-dc-persistent-memory-operating-modes/
    [2]: https://lkml.kernel.org/r/AT5PR8401MB1169D656C8B5E121752FC0F8AB120@AT5PR8401MB1169.NAMPRD84.PROD.OUTLOOK.COM
    [3]: https://lkml.org/lkml/2018/10/12/309

    [dan.j.williams@intel.com: fix shuffle enable]
    Link: http://lkml.kernel.org/r/154943713038.3858443.4125180191382062871.stgit@dwillia2-desk3.amr.corp.intel.com
    [cai@lca.pw: fix SHUFFLE_PAGE_ALLOCATOR help texts]
    Link: http://lkml.kernel.org/r/20190425201300.75650-1-cai@lca.pw
    Link: http://lkml.kernel.org/r/154899811738.3165233.12325692939590944259.stgit@dwillia2-desk3.amr.corp.intel.com
    Signed-off-by: Dan Williams
    Signed-off-by: Qian Cai
    Reviewed-by: Kees Cook
    Acked-by: Michal Hocko
    Cc: Dave Hansen
    Cc: Keith Busch
    Cc: Robert Elliott
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dan Williams
     
  • Various architectures including x86 poison the freed init memory. Do the
    same in the generic free_initmem implementation and switch the sparc32
    architecture, which is identical to the generic code, over to it now.

    Link: http://lkml.kernel.org/r/1550515285-17446-4-git-send-email-rppt@linux.ibm.com
    Signed-off-by: Mike Rapoport
    Reviewed-by: Andrew Morton
    Cc: Christoph Hellwig
    Cc: Palmer Dabbelt
    Cc: Richard Kuo
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mike Rapoport
     
  • Patch series "provide a generic free_initmem implementation", v2.

    Many architectures implement free_initmem() in exactly the same or a very
    similar way: they wrap the call to free_initmem_default() with a sometimes
    different 'poison' parameter.

    These patches switch those architectures to use a generic implementation
    that does free_initmem_default(POISON_FREE_INITMEM).

    This was inspired by Christoph's patches for free_initrd_mem [1] and I
    shamelessly copied changelog entries from his patches :)

    [1] https://lore.kernel.org/lkml/20190213174621.29297-1-hch@lst.de/

    This patch (of 2):

    For most architectures free_initmem is just a wrapper for the same
    free_initmem_default(-1) call. Provide that as a generic implementation
    marked __weak.
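
    A rough sketch of the weak-default pattern, written for GCC or Clang in
    userspace. The body of free_initmem_default() and the last_poison
    variable are stand-ins added so the effect can be observed; in the
    kernel, __weak expands to the same weak attribute used here.

```c
/* Records the poison value last passed down, so the sketch is observable. */
static int last_poison;

/* Stand-in for free_initmem_default(); the real one frees the init sections. */
static void free_initmem_default(int poison)
{
	last_poison = poison;
}

/*
 * Generic fallback. A weak definition is replaced at link time by any
 * non-weak free_initmem() that an architecture chooses to provide.
 */
__attribute__((weak)) void free_initmem(void)
{
	free_initmem_default(-1);
}
```

    An architecture with special needs keeps its own non-weak free_initmem();
    all others drop theirs and pick up this default.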

    Link: http://lkml.kernel.org/r/1550515285-17446-2-git-send-email-rppt@linux.ibm.com
    Signed-off-by: Mike Rapoport
    Reviewed-by: Andrew Morton
    Cc: Christoph Hellwig
    Cc: Palmer Dabbelt
    Cc: Richard Kuo
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mike Rapoport
     
  • Various architectures including x86 poison the freed initrd memory. Do
    the same in the generic free_initrd_mem implementation and switch a few
    more architectures that are identical to the generic code over to it now.

    Link: http://lkml.kernel.org/r/20190213174621.29297-9-hch@lst.de
    Signed-off-by: Christoph Hellwig
    Acked-by: Mike Rapoport
    Cc: Catalin Marinas [arm64]
    Cc: Geert Uytterhoeven [m68k]
    Cc: Steven Price
    Cc: Alexander Viro
    Cc: Guan Xuetao
    Cc: Russell King
    Cc: Will Deacon
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Hellwig
     
  • For most architectures free_initrd_mem just expands to the same
    free_reserved_area call. Provide that as a generic implementation marked
    __weak.

    Link: http://lkml.kernel.org/r/20190213174621.29297-8-hch@lst.de
    Signed-off-by: Christoph Hellwig
    Acked-by: Geert Uytterhoeven [m68k]
    Acked-by: Mike Rapoport
    Cc: Catalin Marinas [arm64]
    Cc: Steven Price
    Cc: Alexander Viro
    Cc: Guan Xuetao
    Cc: Russell King
    Cc: Will Deacon
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Hellwig
     
  • No need to handle the freeing disable in arch code when we already have a
    core hook (and a different name for the option) for it.

    Link: http://lkml.kernel.org/r/20190213174621.29297-7-hch@lst.de
    Signed-off-by: Christoph Hellwig
    Acked-by: Catalin Marinas [arm64]
    Acked-by: Mike Rapoport
    Cc: Geert Uytterhoeven [m68k]
    Cc: Steven Price
    Cc: Alexander Viro
    Cc: Guan Xuetao
    Cc: Russell King
    Cc: Will Deacon
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Hellwig
     
  • The code for kernels that support ramdisks or not is mostly the same.
    Unify it by using an IS_ENABLED for the info message, and moving the error
    message into a stub for populate_initrd_image.

    [cai@lca.pw: fix a compilation error]
    Link: http://lkml.kernel.org/r/20190328014806.36375-1-cai@lca.pw
    Link: http://lkml.kernel.org/r/20190213174621.29297-6-hch@lst.de
    Signed-off-by: Christoph Hellwig
    Signed-off-by: Qian Cai
    Acked-by: Mike Rapoport
    Cc: Catalin Marinas [arm64]
    Cc: Geert Uytterhoeven [m68k]
    Cc: Steven Price
    Cc: Alexander Viro
    Cc: Guan Xuetao
    Cc: Russell King
    Cc: Will Deacon
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Hellwig
     
  • This will allow for cleaner code sharing in the caller.

    Link: http://lkml.kernel.org/r/20190213174621.29297-5-hch@lst.de
    Signed-off-by: Christoph Hellwig
    Acked-by: Mike Rapoport
    Cc: Catalin Marinas [arm64]
    Cc: Geert Uytterhoeven [m68k]
    Cc: Steven Price
    Cc: Alexander Viro
    Cc: Guan Xuetao
    Cc: Russell King
    Cc: Will Deacon
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Hellwig
     
  • Factor the kexec logic into a separate helper, and then inline the rest of
    free_initrd into the only caller.

    Link: http://lkml.kernel.org/r/20190213174621.29297-4-hch@lst.de
    Signed-off-by: Christoph Hellwig
    Acked-by: Mike Rapoport
    Cc: Catalin Marinas [arm64]
    Cc: Geert Uytterhoeven [m68k]
    Cc: Steven Price
    Cc: Alexander Viro
    Cc: Guan Xuetao
    Cc: Russell King
    Cc: Will Deacon
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Hellwig
     
  • Patch series "initramfs tidyups".

    I've spent some time chasing down behavior in initramfs and found
    plenty of opportunity to improve the code. A first stab on that is
    contained in this series.

    This patch (of 7):

    We free the initrd memory for all successful or error cases except for the
    case where opening /initrd.image fails, which looks like an oversight.

    Steven said:

    : This also changes the behaviour when CONFIG_INITRAMFS_FORCE is enabled
    : - specifically it means that the initrd is freed (previously it was
    : ignored and never freed). But that seems like reasonable behaviour and
    : the previous behaviour looks like another oversight.

    Link: http://lkml.kernel.org/r/20190213174621.29297-3-hch@lst.de
    Signed-off-by: Christoph Hellwig
    Reviewed-by: Steven Price
    Acked-by: Mike Rapoport
    Cc: Catalin Marinas [arm64]
    Cc: Geert Uytterhoeven [m68k]
    Cc: Alexander Viro
    Cc: Russell King
    Cc: Will Deacon
    Cc: Guan Xuetao
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Hellwig
     

08 May, 2019

4 commits

  • Pull randomness updates from Ted Ts'o:

    - initialize the random driver earlier

    - fix CRNG initialization when we trust the CPU's RNG on NUMA systems

    - other miscellaneous cleanups and fixes.

    * tag 'random_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/random:
    random: add a spinlock_t to struct batched_entropy
    random: document get_random_int() family
    random: fix CRNG initialization when random.trust_cpu=1
    random: move rand_initialize() earlier
    random: only read from /dev/random after its pool has received 128 bits
    drivers/char/random.c: make primary_crng static
    drivers/char/random.c: remove unused stuct poolinfo::poolbits
    drivers/char/random.c: constify poolinfo_table

    Linus Torvalds
     
  • Pull driver core/kobject updates from Greg KH:
    "Here is the "big" set of driver core patches for 5.2-rc1

    There are a number of ACPI patches in here as well, as Rafael said
    they should go through this tree due to the driver core changes they
    required. They have all been acked by the ACPI developers.

    There are also a number of small subsystem-specific changes in here,
    due to some changes to the kobject core code. Those too have all been
    acked by the various subsystem maintainers.

    As for content, it's pretty boring outside of the ACPI changes:
    - spdx cleanups
    - kobject documentation updates
    - default attribute groups for kobjects
    - other minor kobject/driver core fixes

    All have been in linux-next for a while with no reported issues"

    * tag 'driver-core-5.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (47 commits)
    kobject: clean up the kobject add documentation a bit more
    kobject: Fix kernel-doc comment first line
    kobject: Remove docstring reference to kset
    firmware_loader: Fix a typo ("syfs" -> "sysfs")
    kobject: fix dereference before null check on kobj
    Revert "driver core: platform: Fix the usage of platform device name(pdev->name)"
    init/config: Do not select BUILD_BIN2C for IKCONFIG
    Provide in-kernel headers to make extending kernel easier
    kobject: Improve doc clarity kobject_init_and_add()
    kobject: Improve docs for kobject_add/del
    driver core: platform: Fix the usage of platform device name(pdev->name)
    livepatch: Replace klp_ktype_patch's default_attrs with groups
    cpufreq: schedutil: Replace default_attrs field with groups
    padata: Replace padata_attr_type default_attrs field with groups
    irqdesc: Replace irq_kobj_type's default_attrs field with groups
    net-sysfs: Replace ktype default_attrs field with groups
    block: Replace all ktype default_attrs with groups
    samples/kobject: Replace foo_ktype's default_attrs field with groups
    kobject: Add support for default attribute groups to kobj_type
    driver core: Postpone DMA tear-down until after devres release for probe failure
    ...

    Linus Torvalds
     
  • Pull pidfd updates from Christian Brauner:
    "This patchset makes it possible to retrieve pidfds at process creation
    time by introducing the new flag CLONE_PIDFD to the clone() system
    call. Linus originally suggested to implement this as a new flag to
    clone() instead of making it a separate system call.

    After a thorough review from Oleg CLONE_PIDFD returns pidfds in the
    parent_tidptr argument. This means we can give back the associated pid
    and the pidfd at the same time. Access to process metadata information
    thus becomes rather trivial.

    As has been agreed, CLONE_PIDFD creates file descriptors based on
    anonymous inodes similar to the new mount api. They are made
    unconditional by this patchset as they are now needed by core kernel
    code (vfs, pidfd) even more than they already were before (timerfd,
    signalfd, io_uring, epoll etc.). The core patchset is rather small.
    The bulky looking changelist is caused by David's very simple changes
    to Kconfig to make anon inodes unconditional.

    A pidfd comes with additional information in fdinfo if the kernel
    supports procfs. The fdinfo file contains the pid of the process in
    the callers pid namespace in the same format as the procfs status
    file, i.e. "Pid:\t%d".

    To remove worries about missing metadata access this patchset comes
    with a sample/test program that illustrates how a combination of
    CLONE_PIDFD and pidfd_send_signal() can be used to gain race-free
    access to process metadata through /proc/.

    Further work based on this patchset has been done by Joel. His work
    makes pidfds pollable. It finished too late for this merge window. I
    would prefer to have it sitting in linux-next for a while and send it
    for inclusion during the 5.3 merge window"
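
    As an illustration of the fdinfo format mentioned above, the following
    sketch scans fdinfo-style text for the "Pid:\t%d" line. It works on a
    string already read into memory; the function name parse_fdinfo_pid()
    is an assumption of this example.

```c
#include <stdio.h>
#include <string.h>

/*
 * Scan fdinfo-style text for a "Pid:\t<number>" line.
 * Returns 0 and stores the pid on success, -1 if no such line is found.
 */
static int parse_fdinfo_pid(const char *text, int *pid)
{
	const char *line = text;

	while (line && *line) {
		/* Matches only when the current line starts with "Pid:" */
		if (sscanf(line, "Pid:\t%d", pid) == 1)
			return 0;
		line = strchr(line, '\n');
		if (line)
			line++; /* step past the newline to the next line */
	}
	return -1;
}
```

    A caller would typically read /proc/self/fdinfo/<fd> for the pidfd into
    a buffer and pass that buffer to this function.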

    * tag 'pidfd-v5.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux:
    samples: show race-free pidfd metadata access
    signal: support CLONE_PIDFD with pidfd_send_signal
    clone: add CLONE_PIDFD
    Make anon_inodes unconditional

    Linus Torvalds
     
  • Pull printk updates from Petr Mladek:

    - Allow state reset of printk_once() calls.

    - Prevent crashes when dereferencing invalid pointers in vsprintf().
    Only the first byte is checked for simplicity.

    - Make vsprintf warnings consistent and inlined.

    - Treewide conversion of obsolete %pf, %pF to %ps, %pS printf
    modifiers.

    - Some clean up of vsprintf and test_printf code.

    * tag 'printk-for-5.2' of git://git.kernel.org/pub/scm/linux/kernel/git/pmladek/printk:
    lib/vsprintf: Make function pointer_string static
    vsprintf: Limit the length of inlined error messages
    vsprintf: Avoid confusion between invalid address and value
    vsprintf: Prevent crash when dereferencing invalid pointers
    vsprintf: Consolidate handling of unknown pointer specifiers
    vsprintf: Factor out %pO handler as kobject_string()
    vsprintf: Factor out %pV handler as va_format()
    vsprintf: Factor out %p[iI] handler as ip_addr_string()
    vsprintf: Do not check address of well-known strings
    vsprintf: Consistent %pK handling for kptr_restrict == 0
    vsprintf: Shuffle restricted_pointer()
    printk: Tie printk_once / printk_deferred_once into .data.once for reset
    treewide: Switch printk users from %pf and %pF to %ps and %pS, respectively
    lib/test_printf: Switch to bitmap_zalloc()

    Linus Torvalds
     

06 May, 2019

1 commit

  • Poking-mm initialization might require duplicating the PGD at an early
    stage. Initialize the PGD cache earlier to prevent boot failures.

    Reported-by: kernel test robot
    Signed-off-by: Nadav Amit
    Cc: Andy Lutomirski
    Cc: Borislav Petkov
    Cc: Dave Hansen
    Cc: H. Peter Anvin
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Rick Edgecombe
    Cc: Rik van Riel
    Cc: Stephen Rothwell
    Cc: Thomas Gleixner
    Fixes: 4fc19708b165 ("x86/alternatives: Initialize temporary mm for patching")
    Link: http://lkml.kernel.org/r/20190505011124.39692-1-namit@vmware.com
    Signed-off-by: Ingo Molnar

    Nadav Amit
     

30 Apr, 2019

1 commit

  • To prevent improper use of the PTEs that are used for text patching, the
    next patches will use a temporary mm struct. Initialize it by copying
    the init mm.

    The address that will be used for patching is taken from the lower area
    that is usually used for the task memory. Doing so prevents the need to
    frequently synchronize the temporary-mm (e.g., when BPF programs are
    installed), since different PGDs are used for the task memory.

    Finally, randomize the address of the PTEs to harden against exploits
    that use these PTEs.

    Suggested-by: Andy Lutomirski
    Tested-by: Masami Hiramatsu
    Signed-off-by: Nadav Amit
    Signed-off-by: Rick Edgecombe
    Signed-off-by: Peter Zijlstra (Intel)
    Reviewed-by: Masami Hiramatsu
    Cc: Borislav Petkov
    Cc: Dave Hansen
    Cc: H. Peter Anvin
    Cc: Kees Cook
    Cc: Linus Torvalds
    Cc: Rik van Riel
    Cc: Thomas Gleixner
    Cc: akpm@linux-foundation.org
    Cc: ard.biesheuvel@linaro.org
    Cc: deneen.t.dock@intel.com
    Cc: kernel-hardening@lists.openwall.com
    Cc: kristen@linux.intel.com
    Cc: linux_dti@icloud.com
    Cc: will.deacon@arm.com
    Link: https://lkml.kernel.org/r/20190426232303.28381-8-nadav.amit@gmail.com
    Signed-off-by: Ingo Molnar

    Nadav Amit
     

29 Apr, 2019

2 commits

  • Since commit 13610aa908dc ("kernel/configs: use .incbin directive to
    embed config_data.gz"), IKCONFIG no longer uses BUILD_BIN2C so prevent
    it from being selected in Kconfig.

    Reviewed-by: Masahiro Yamada
    Signed-off-by: Joel Fernandes (Google)
    Signed-off-by: Greg Kroah-Hartman

    Joel Fernandes (Google)
     
  • Introduce in-kernel headers which are made available as an archive
    through proc (/proc/kheaders.tar.xz file). This archive makes it
    possible to run eBPF and other tracing programs that need to extend the
    kernel for tracing purposes without any dependency on the file system
    having headers.

    A github PR is sent for the corresponding BCC patch at:
    https://github.com/iovisor/bcc/pull/2312

    On Android and embedded systems, it is common to switch kernels but not
    have kernel headers available on the file system. Further once a
    different kernel is booted, any headers stored on the file system will
    no longer be useful. This is an issue even well known to distros.
    By storing the headers as a compressed archive within the kernel, we can
    avoid these issues that have been a hindrance for a long time.

    The best way to use this feature is by building it in. Several users
    have a need for this: when they switch debug kernels, they do not want to
    update the filesystem or worry about where to store the headers on
    it. However, the feature is also buildable as a module in case the user
    desires it not being part of the kernel image. This makes it possible to
    load and unload the headers from memory on demand. A tracing program can
    load the module, do its operations, and then unload the module to save
    kernel memory. The total memory needed is 3.3MB.

    By having the archive available at a fixed location independent of
    filesystem dependencies and conventions, all debugging tools can
    directly refer to the fixed location for the archive, without concerning
    themselves with where the headers are on a typical filesystem. This
    significantly simplifies tooling that needs kernel headers.

    The code to read the headers is based on /proc/config.gz code and uses
    the same technique to embed the headers.

    Other approaches were discussed such as having an in-memory mountable
    filesystem, but that has drawbacks such as requiring an in-kernel xz
    decompressor which we don't have today, and requiring usage of 42 MB of
    kernel memory to host the decompressed headers at anytime. Also this
    approach is simpler than such approaches.

    Reviewed-by: Masahiro Yamada
    Signed-off-by: Joel Fernandes (Google)
    Signed-off-by: Greg Kroah-Hartman

    Joel Fernandes (Google)
     

20 Apr, 2019

2 commits

  • Right now rand_initialize() is run as an early_initcall(), but it only
    depends on timekeeping_init() (for mixing ktime_get_real() into the
    pools). However, the call to boot_init_stack_canary() for stack canary
    initialization runs earlier, which triggers a warning at boot:

    random: get_random_bytes called from start_kernel+0x357/0x548 with crng_init=0

    Instead, this moves rand_initialize() to after timekeeping_init(), and moves
    canary initialization here as well.

    Note that this warning may still remain for machines that do not have
    UEFI RNG support (which initializes the RNG pools during setup_arch()),
    or for x86 machines without RDRAND (or booting without "random.trust_cpu=on"
    or CONFIG_RANDOM_TRUST_CPU=y).

    Signed-off-by: Kees Cook
    Signed-off-by: Theodore Ts'o

    Kees Cook
     
  • When a module option, or core kernel argument, toggles a static-key it
    requires jump labels to be initialized early. While x86, PowerPC, and
    ARM64 arrange for jump_label_init() to be called before parse_args(),
    ARM does not.

    Kernel command line: rdinit=/sbin/init page_alloc.shuffle=1 panic=-1 console=ttyAMA0,115200 page_alloc.shuffle=1
    ------------[ cut here ]------------
    WARNING: CPU: 0 PID: 0 at ./include/linux/jump_label.h:303
    page_alloc_shuffle+0x12c/0x1ac
    static_key_enable(): static key 'page_alloc_shuffle_key+0x0/0x4' used
    before call to jump_label_init()
    Modules linked in:
    CPU: 0 PID: 0 Comm: swapper Not tainted
    5.1.0-rc4-next-20190410-00003-g3367c36ce744 #1
    Hardware name: ARM Integrator/CP (Device Tree)
    [] (unwind_backtrace) from [] (show_stack+0x10/0x18)
    [] (show_stack) from [] (dump_stack+0x18/0x24)
    [] (dump_stack) from [] (__warn+0xe0/0x108)
    [] (__warn) from [] (warn_slowpath_fmt+0x44/0x6c)
    [] (warn_slowpath_fmt) from []
    (page_alloc_shuffle+0x12c/0x1ac)
    [] (page_alloc_shuffle) from [] (shuffle_store+0x28/0x48)
    [] (shuffle_store) from [] (parse_args+0x1f4/0x350)
    [] (parse_args) from [] (start_kernel+0x1c0/0x488)

    Move the fallback call to jump_label_init() to occur before
    parse_args().

    The redundant calls to jump_label_init() in other archs are left intact
    in case they have static key toggling use cases that are even earlier
    than option parsing.
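
    The ordering problem can be modeled in a few lines. This is a toy
    userspace model, not kernel code: the functions below reuse the kernel
    names for readability but are simplified stand-ins, and the warning is
    represented by a counter rather than a WARN splat.

```c
#include <stdbool.h>

static bool jump_labels_ready; /* set once jump_label_init() has run */
static int early_use_warnings; /* counts "used before init" events */
static bool shuffle_key;       /* toy stand-in for a static key */

static void jump_label_init(void)
{
	jump_labels_ready = true;
}

/* Enabling a key before initialization is flagged, like the warning above. */
static void static_key_enable(bool *key)
{
	if (!jump_labels_ready)
		early_use_warnings++;
	*key = true;
}

/* Toy option parser: a shuffle option toggles the key while parsing. */
static void parse_args(bool shuffle_requested)
{
	if (shuffle_requested)
		static_key_enable(&shuffle_key);
}

/* Boot order after the fix: initialize jump labels, then parse options. */
static void boot_fixed_order(bool shuffle_requested)
{
	jump_label_init();
	parse_args(shuffle_requested);
}
```

    Calling parse_args() first bumps the warning counter; going through
    boot_fixed_order() does not.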

    Link: http://lkml.kernel.org/r/155544804466.1032396.13418949511615676665.stgit@dwillia2-desk3.amr.corp.intel.com
    Signed-off-by: Dan Williams
    Reported-by: Guenter Roeck
    Reviewed-by: Kees Cook
    Cc: Mathieu Desnoyers
    Cc: Thomas Gleixner
    Cc: Mike Rapoport
    Cc: Russell King
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dan Williams
     

19 Apr, 2019

1 commit

  • Make the anon_inodes facility unconditional so that it can be used by core
    VFS code and pidfd code.

    Signed-off-by: David Howells
    Signed-off-by: Al Viro
    [christian@brauner.io: adapt commit message to mention pidfds]
    Signed-off-by: Christian Brauner

    David Howells
     

09 Apr, 2019

1 commit

  • %pF and %pf are functionally equivalent to %pS and %ps conversion
    specifiers. The former are deprecated, therefore switch the current users
    to use the preferred variant.

    The changes have been produced by the following command:

    git grep -l '%p[fF]' | grep -v '^\(tools\|Documentation\)/' | \
    while read i; do perl -i -pe 's/%pf/%ps/g; s/%pF/%pS/g;' $i; done

    And verifying the result.

    Link: http://lkml.kernel.org/r/20190325193229.23390-1-sakari.ailus@linux.intel.com
    Cc: Andy Shevchenko
    Cc: linux-arm-kernel@lists.infradead.org
    Cc: sparclinux@vger.kernel.org
    Cc: linux-um@lists.infradead.org
    Cc: xen-devel@lists.xenproject.org
    Cc: linux-acpi@vger.kernel.org
    Cc: linux-pm@vger.kernel.org
    Cc: drbd-dev@lists.linbit.com
    Cc: linux-block@vger.kernel.org
    Cc: linux-mmc@vger.kernel.org
    Cc: linux-nvdimm@lists.01.org
    Cc: linux-pci@vger.kernel.org
    Cc: linux-scsi@vger.kernel.org
    Cc: linux-btrfs@vger.kernel.org
    Cc: linux-f2fs-devel@lists.sourceforge.net
    Cc: linux-mm@kvack.org
    Cc: ceph-devel@vger.kernel.org
    Cc: netdev@vger.kernel.org
    Signed-off-by: Sakari Ailus
    Acked-by: David Sterba (for btrfs)
    Acked-by: Mike Rapoport (for mm/memblock.c)
    Acked-by: Bjorn Helgaas (for drivers/pci)
    Acked-by: Rafael J. Wysocki
    Signed-off-by: Petr Mladek

    Sakari Ailus
     

13 Mar, 2019

1 commit

  • Add panic() calls if memblock_alloc() returns NULL.

    The panic() format duplicates the one used by memblock itself. To keep
    the parameter lists from growing too long, replace open-coded
    allocation size calculations with a local variable.

    Link: http://lkml.kernel.org/r/1548057848-15136-18-git-send-email-rppt@linux.ibm.com
    Signed-off-by: Mike Rapoport
    Cc: Catalin Marinas
    Cc: Christophe Leroy
    Cc: Christoph Hellwig
    Cc: "David S. Miller"
    Cc: Dennis Zhou
    Cc: Geert Uytterhoeven
    Cc: Greentime Hu
    Cc: Greg Kroah-Hartman
    Cc: Guan Xuetao
    Cc: Guo Ren
    Cc: Guo Ren [c-sky]
    Cc: Heiko Carstens
    Cc: Juergen Gross [Xen]
    Cc: Mark Salter
    Cc: Matt Turner
    Cc: Max Filippov
    Cc: Michael Ellerman
    Cc: Michal Simek
    Cc: Paul Burton
    Cc: Petr Mladek
    Cc: Richard Weinberger
    Cc: Rich Felker
    Cc: Rob Herring
    Cc: Russell King
    Cc: Stafford Horne
    Cc: Tony Luck
    Cc: Vineet Gupta
    Cc: Yoshinori Sato
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mike Rapoport
     

11 Mar, 2019

2 commits

  • Pull Kbuild updates from Masahiro Yamada:

    - do not generate unneeded top-level built-in.a

    - let git ignore O= directory entirely

    - optimize scripts/kallsyms slightly

    - exclude DWARF info from *.s regardless of config options

    - fix GCC toolchain search path for Clang to prepare ld.lld support

    - do not generate modules.order when CONFIG_MODULES is disabled

    - simplify single target rules and remove VPATH for external module
    build

    - allow adding optional flags to dpkg-buildpackage when building
    deb-pkg

    - move some compiler option tests from Makefile to Kconfig

    - various Makefile cleanups

    * tag 'kbuild-v5.1' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: (40 commits)
    kbuild: remove scripts/basic/% build target
    kbuild: use -Werror=implicit-... instead of -Werror-implicit-...
    kbuild: clean up scripts/gcc-version.sh
    kbuild: remove cc-version macro
    kbuild: update comment block of scripts/clang-version.sh
    kbuild: remove commented-out INITRD_COMPRESS
    kbuild: move -gsplit-dwarf, -gdwarf-4 option tests to Kconfig
    kbuild: [bin]deb-pkg: add DPKG_FLAGS variable
    kbuild: move ".config not found!" message from Kconfig to Makefile
    kbuild: invoke syncconfig if include/config/auto.conf.cmd is missing
    kbuild: simplify single target rules
    kbuild: remove empty rules for makefiles
    kbuild: make -r/-R effective in top Makefile for old Make versions
    kbuild: move tools_silent to a more relevant place
    kbuild: compute false-positive -Wmaybe-uninitialized cases in Kconfig
    kbuild: refactor cc-cross-prefix implementation
    kbuild: hardcode genksyms path and remove GENKSYMS variable
    scripts/gdb: refactor rules for symlink creation
    kbuild: create symlink to vmlinux-gdb.py in scripts_gdb target
    scripts/gdb: do not descend into scripts/gdb from scripts
    ...

    Linus Torvalds
     
  • Pull timer fix from Thomas Gleixner:
    "A single fix to prevent an unmet dependencies warning in Kconfig"

    * 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    time: Make VIRT_CPU_ACCOUNTING_GEN depend on GENERIC_CLOCKEVENTS

    Linus Torvalds
     

09 Mar, 2019

1 commit

  • Pull io_uring IO interface from Jens Axboe:
    "Second attempt at adding the io_uring interface.

    Since the first one, we've added basic unit testing of the three
    system calls, that resides in liburing like the other unit tests that
    we have so far. It'll take a while to get full coverage of it, but
    we're working towards it. I've also added two basic test programs to
    tools/io_uring. One uses the raw interface and has support for all the
    various features that io_uring supports outside of standard IO, like
    fixed files, fixed IO buffers, and polled IO. The other uses the
    liburing API, and is a simplified version of cp(1).

    This adds support for a new IO interface, io_uring.

    io_uring allows an application to communicate with the kernel through
    two rings, the submission queue (SQ) and completion queue (CQ) ring.
    This allows for very efficient handling of IOs, see the v5 posting for
    some basic numbers:

    https://lore.kernel.org/linux-block/20190116175003.17880-1-axboe@kernel.dk/

    Outside of just efficiency, the interface is also flexible and
    extendable, and allows for future use cases like the upcoming NVMe
    key-value store API, networked IO, and so on. It also supports async
    buffered IO, something that we've always failed to support in the
    kernel.

    Outside of basic IO features, it supports async polled IO as well.
    This particular feature has already been tested at Facebook months ago
    for flash storage boxes, with 25-33% improvements. It makes polled IO
    actually useful for real world use cases, where even basic flash sees
    a nice win in terms of efficiency, latency, and performance. These
    boxes were IOPS bound before, now they are not.

    This series adds three new system calls. One for setting up an
    io_uring instance (io_uring_setup(2)), one for submitting/completing
    IO (io_uring_enter(2)), and one for aux functions like registering
    file sets, buffers, etc (io_uring_register(2)). Through the help of
    Arnd, I've coordinated the syscall numbers so merge on that front
    should be painless.

    Jon did a writeup of the interface a while back, which (except for
    minor details that have been tweaked) is still accurate. Find that
    here:

    https://lwn.net/Articles/776703/

    Huge thanks to Al Viro for helping get the reference cycle code
    correct, and to Jann Horn for his extensive reviews focused on both
    security and bugs in general.

    There's a userspace library that provides basic functionality for
    applications that don't need or want to care about how to fiddle with
    the rings directly. It has helpers to allow applications to easily set
    up an io_uring instance, and submit/complete IO through it without
    knowing about the intricacies of the rings. It also includes man pages
    (thanks to Jeff Moyer), and will continue to grow support helper
    functions and features as time progresses. Find it here:

    git://git.kernel.dk/liburing

    Fio has full support for the raw interface, both in the form of an IO
    engine (io_uring), but also with a small test application (t/io_uring)
    that can exercise and benchmark the interface"

    * tag 'io_uring-2019-03-06' of git://git.kernel.dk/linux-block:
    io_uring: add a few test tools
    io_uring: allow workqueue item to handle multiple buffered requests
    io_uring: add support for IORING_OP_POLL
    io_uring: add io_kiocb ref count
    io_uring: add submission polling
    io_uring: add file set registration
    net: split out functions related to registering inflight socket files
    io_uring: add support for pre-mapped user IO buffers
    block: implement bio helper to add iter bvec pages to bio
    io_uring: batch io_kiocb allocation
    io_uring: use fget/fput_many() for file references
    fs: add fget_many() and fput_many()
    io_uring: support for IO polling
    io_uring: add fsync support
    Add io_uring IO interface

    Linus Torvalds
     

08 Mar, 2019

3 commits

  • Merge more updates from Andrew Morton:

    - some of the rest of MM

    - various misc things

    - dynamic-debug updates

    - checkpatch

    - some epoll speedups

    - autofs

    - rapidio

    - lib/, lib/lzo/ updates

    * emailed patches from Andrew Morton: (83 commits)
    samples/mic/mpssd/mpssd.h: remove duplicate header
    kernel/fork.c: remove duplicated include
    include/linux/relay.h: fix percpu annotation in struct rchan
    arch/nios2/mm/fault.c: remove duplicate include
    unicore32: stop printing the virtual memory layout
    MAINTAINERS: fix GTA02 entry and mark as orphan
    mm: create the new vm_fault_t type
    arm, s390, unicore32: remove oneliner wrappers for memblock_alloc()
    arch: simplify several early memory allocations
    openrisc: simplify pte_alloc_one_kernel()
    sh: prefer memblock APIs returning virtual address
    microblaze: prefer memblock API returning virtual address
    powerpc: prefer memblock APIs returning virtual address
    lib/lzo: separate lzo-rle from lzo
    lib/lzo: implement run-length encoding
    lib/lzo: fast 8-byte copy on arm64
    lib/lzo: 64-bit CTZ on arm64
    lib/lzo: tidy-up ifdefs
    ipc/sem.c: replace kvmalloc/memset with kvzalloc and use struct_size
    ipc: annotate implicit fall through
    ...

    Linus Torvalds
     
  • Use distinct error messages when archive decompression fails.

    Link: http://lkml.kernel.org/r/20190212075635.7373-1-david.engraf@sysgo.com
    Signed-off-by: David Engraf
    Reviewed-by: Andrew Morton
    Tested-by: Andy Shevchenko
    Cc: Dominik Brodowski
    Cc: Greg Kroah-Hartman
    Cc: Philippe Ombredanne
    Cc: Arnd Bergmann
    Cc: Luc Van Oostenryck
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Engraf
     
  • Pull audit updates from Paul Moore:
    "A lucky 13 audit patches for v5.1.

    Despite the rather large diffstat, most of the changes are from two
    bug fix patches that move code from one Kconfig option to another.

    Beyond that bit of churn, the remaining changes are largely cleanups
    and bug-fixes as we slowly march towards container auditing. It isn't
    all boring though, we do have a couple of new things: file
    capabilities v3 support, and expanded support for filtering on
    filesystems to solve problems with remote filesystems.

    All changes pass the audit-testsuite. Please merge for v5.1"

    * tag 'audit-pr-20190305' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit:
    audit: mark expected switch fall-through
    audit: hide auditsc_get_stamp and audit_serial prototypes
    audit: join tty records to their syscall
    audit: remove audit_context when CONFIG_ AUDIT and not AUDITSYSCALL
    audit: remove unused actx param from audit_rule_match
    audit: ignore fcaps on umount
    audit: clean up AUDITSYSCALL prototypes and stubs
    audit: more filter PATH records keyed on filesystem magic
    audit: add support for fcaps v3
    audit: move loginuid and sessionid from CONFIG_AUDITSYSCALL to CONFIG_AUDIT
    audit: add syscall information to CONFIG_CHANGE records
    audit: hand taken context to audit_kill_trees for syscall logging
    audit: give a clue what CONFIG_CHANGE op was involved

    Linus Torvalds
     

07 Mar, 2019

3 commits

  • Moving the CONTEXT_TRACKING Kconfig option into kernel/time/Kconfig added
    an implicit dependency on the surrounding GENERIC_CLOCKEVENTS option, but
    this is not always enabled when it is possible to select
    VIRT_CPU_ACCOUNTING_GEN:

    WARNING: unmet direct dependencies detected for CONTEXT_TRACKING
    Depends on [n]: GENERIC_CLOCKEVENTS [=n]
    Selected by [y]:
    - VIRT_CPU_ACCOUNTING_GEN [=y] && <choice> && HAVE_CONTEXT_TRACKING [=y] && HAVE_VIRT_CPU_ACCOUNTING_GEN [=y]

    Platforms without GENERIC_CLOCKEVENTS are rare enough that this corner
    case can simply be ignored. Make it a dependency for
    VIRT_CPU_ACCOUNTING_GEN to simplify the configuration.

    Fixes: a4cffdad7314 ("time: Move CONTEXT_TRACKING to kernel/time/Kconfig")
    Signed-off-by: Arnd Bergmann
    Signed-off-by: Thomas Gleixner
    Cc: "Paul E . McKenney"
    Cc: Frederic Weisbecker
    Link: https://lkml.kernel.org/r/20190304200202.1163250-1-arnd@arndb.de

    Arnd Bergmann
     
  • Merge misc updates from Andrew Morton:

    - a few misc things

    - ocfs2 updates

    - most of MM

    * emailed patches from Andrew Morton: (159 commits)
    tools/testing/selftests/proc/proc-self-syscall.c: remove duplicate include
    proc: more robust bulk read test
    proc: test /proc/*/maps, smaps, smaps_rollup, statm
    proc: use seq_puts() everywhere
    proc: read kernel cpu stat pointer once
    proc: remove unused argument in proc_pid_lookup()
    fs/proc/thread_self.c: code cleanup for proc_setup_thread_self()
    fs/proc/self.c: code cleanup for proc_setup_self()
    proc: return exit code 4 for skipped tests
    mm,mremap: bail out earlier in mremap_to under map pressure
    mm/sparse: fix a bad comparison
    mm/memory.c: do_fault: avoid usage of stale vm_area_struct
    writeback: fix inode cgroup switching comment
    mm/huge_memory.c: fix "orig_pud" set but not used
    mm/hotplug: fix an imbalance with DEBUG_PAGEALLOC
    mm/memcontrol.c: fix bad line in comment
    mm/cma.c: cma_declare_contiguous: correct err handling
    mm/page_ext.c: fix an imbalance with kmemleak
    mm/compaction: pass pgdat to too_many_isolated() instead of zone
    mm: remove zone_lru_lock() function, access ->lru_lock directly
    ...

    Linus Torvalds
     
  • Pull scheduler updates from Ingo Molnar:
    "The main changes in this cycle were:

    - refcount conversions

    - Solve the rq->leaf_cfs_rq_list can of worms for real.

    - improve power-aware scheduling

    - add sysctl knob for Energy Aware Scheduling

    - documentation updates

    - misc other changes"

    * 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (34 commits)
    kthread: Do not use TIMER_IRQSAFE
    kthread: Convert worker lock to raw spinlock
    sched/fair: Use non-atomic cpumask_{set,clear}_cpu()
    sched/fair: Remove unused 'sd' parameter from select_idle_smt()
    sched/wait: Use freezable_schedule() when possible
    sched/fair: Prune, fix and simplify the nohz_balancer_kick() comment block
    sched/fair: Explain LLC nohz kick condition
    sched/fair: Simplify nohz_balancer_kick()
    sched/topology: Fix percpu data types in struct sd_data & struct s_data
    sched/fair: Simplify post_init_entity_util_avg() by calling it with a task_struct pointer argument
    sched/fair: Fix O(nr_cgroups) in the load balancing path
    sched/fair: Optimize update_blocked_averages()
    sched/fair: Fix insertion in rq->leaf_cfs_rq_list
    sched/fair: Add tmp_alone_branch assertion
    sched/core: Use READ_ONCE()/WRITE_ONCE() in move_queued_task()/task_rq_lock()
    sched/debug: Initialize sd_sysctl_cpus if !CONFIG_CPUMASK_OFFSTACK
    sched/pelt: Skip updating util_est when utilization is higher than CPU's capacity
    sched/fair: Update scale invariance of PELT
    sched/fair: Move the rq_of() helper function
    sched/core: Convert task_struct.stack_refcount to refcount_t
    ...

    Linus Torvalds
     

06 Mar, 2019

1 commit

  • Patch series "Replace all open encodings for NUMA_NO_NODE", v3.

    All these places for replacement were found by running the following
    grep patterns on the entire kernel code. Please let me know if this
    might have missed some instances. This might also have replaced some
    false positives. I will appreciate suggestions, inputs and review.

    1. git grep "nid == -1"
    2. git grep "node == -1"
    3. git grep "nid = -1"
    4. git grep "node = -1"

    This patch (of 2):

    At present there are multiple places where an invalid node number is
    encoded as -1. Even though this is implicitly understood, it is better
    to use a macro. Replace these open encodings of an invalid node number
    with the global macro NUMA_NO_NODE. This helps remove NUMA-related
    assumptions like 'invalid node' from various places, redirecting them
    to a common definition.
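
    As a rough illustration of the search-and-replace described above, the
    four grep patterns can be approximated with one regular expression. This
    is a sketch only: the helper names find_open_coded and use_macro are
    hypothetical, and the trailing word boundary is an added assumption to
    avoid matching values such as -10.

```python
import re

# Approximates: "nid == -1", "node == -1", "nid = -1", "node = -1".
OPEN_CODED = re.compile(r"(nid|node) (==|=) -1\b")


def find_open_coded(source: str):
    """Return (line_number, stripped_line) for lines using a literal -1 node."""
    return [
        (number, line.strip())
        for number, line in enumerate(source.splitlines(), start=1)
        if OPEN_CODED.search(line)
    ]


def use_macro(line: str) -> str:
    """Replace the literal -1 with NUMA_NO_NODE in a matching expression."""
    return OPEN_CODED.sub(r"\1 \2 NUMA_NO_NODE", line)
```

    As the series description notes, a textual match like this can produce
    false positives, so each hit still needs review.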

    Link: http://lkml.kernel.org/r/1545127933-10711-2-git-send-email-anshuman.khandual@arm.com
    Signed-off-by: Anshuman Khandual
    Reviewed-by: David Hildenbrand
    Acked-by: Jeff Kirsher [ixgbe]
    Acked-by: Jens Axboe [mtip32xx]
    Acked-by: Vinod Koul [dmaengine.c]
    Acked-by: Michael Ellerman [powerpc]
    Acked-by: Doug Ledford [drivers/infiniband]
    Cc: Joseph Qi
    Cc: Hans Verkuil
    Cc: Stephen Rothwell
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Anshuman Khandual
     

04 Mar, 2019

1 commit


28 Feb, 2019

1 commit

  • The submission queue (SQ) and completion queue (CQ) rings are shared
    between the application and the kernel. This eliminates the need to
    copy data back and forth to submit and complete IO.

    IO submissions use the io_uring_sqe data structure, and completions
    are generated in the form of io_uring_cqe data structures. The SQ
    ring is an index into the io_uring_sqe array, which makes it possible
    to submit a batch of IOs without them being contiguous in the ring.
    The CQ ring is always contiguous, as completion events are inherently
    unordered, and hence any io_uring_cqe entry can point back to an
    arbitrary submission.
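
    The indirection described above can be illustrated with a toy model. This
    is not the kernel ABI or the liburing API: the class ToyRing and its
    methods are hypothetical, there is no shared memory or real IO, and each
    request is "completed" immediately and in submission order, which the
    real interface does not guarantee.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Sqe:
    """Stand-in for io_uring_sqe: one submission entry."""
    opcode: str
    user_data: int


@dataclass
class Cqe:
    """Stand-in for io_uring_cqe: one completion entry."""
    user_data: int
    res: int


class ToyRing:
    def __init__(self, entries: int):
        self.sqes = [None] * entries  # sqe array; slots may be filled in any order
        self.sq = deque()             # SQ ring: holds indices into self.sqes
        self.cq = deque()             # CQ ring: completions stored back to back

    def prepare(self, slot: int, sqe: Sqe) -> None:
        """Fill an arbitrary slot of the sqe array."""
        self.sqes[slot] = sqe

    def queue(self, slot: int) -> None:
        """Publish a slot index on the SQ ring."""
        self.sq.append(slot)

    def enter(self, to_submit: int) -> int:
        """Consume up to to_submit indices and post a completion for each."""
        submitted = 0
        while self.sq and submitted < to_submit:
            sqe = self.sqes[self.sq.popleft()]
            # user_data ties the completion back to its submission.
            self.cq.append(Cqe(user_data=sqe.user_data, res=0))
            submitted += 1
        return submitted
```

    Here a batch can be queued from slots 3 and 0 even though they are not
    adjacent in the sqe array, and each completion carries user_data so the
    application can match it to the request that produced it.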

    Two new system calls are added for this:

    io_uring_setup(entries, params)
        Sets up an io_uring instance for doing async IO. On success,
        returns a file descriptor that the application can mmap to
        gain access to the SQ ring, CQ ring, and io_uring_sqes.

    io_uring_enter(fd, to_submit, min_complete, flags, sigset, sigsetsize)
        Initiates IO against the rings mapped to this fd, or waits for
        them to complete, or both. The behavior is controlled by the
        parameters passed in. If 'to_submit' is non-zero, then we'll
        try and submit new IO. If IORING_ENTER_GETEVENTS is set, the
        kernel will wait for 'min_complete' events, if they aren't
        already available. It's valid to set IORING_ENTER_GETEVENTS
        and 'min_complete' == 0 at the same time, this allows the
        kernel to return already completed events without waiting
        for them. This is useful only for polling, as for IRQ
        driven IO, the application can just check the CQ ring
        without entering the kernel.

    With this setup, it's possible to do async IO with a single system
    call. Future developments will enable polled IO with this interface,
    and polled submission as well. The latter will enable an application
    to do IO without doing ANY system calls at all.

    For IRQ driven IO, an application only needs to enter the kernel for
    completions if it wants to wait for them to occur.

    Each io_uring is backed by a workqueue, to support buffered async IO
    as well. We will only punt to an async context if the command would
    need to wait for IO on the device side. Any data that can be accessed
    directly in the page cache is done inline. This avoids the slowness
    issue of usual threadpools, since cached data is accessed as quickly
    as a sync interface.

    Sample application: http://git.kernel.dk/cgit/fio/plain/t/io_uring.c

    Reviewed-by: Hannes Reinecke
    Signed-off-by: Jens Axboe

    Jens Axboe
     

27 Feb, 2019

1 commit

  • Since -Wmaybe-uninitialized was introduced by GCC 4.7, we have patched
    various false positives:

    - commit e74fc973b6e5 ("Turn off -Wmaybe-uninitialized when building
    with -Os") turned off this option for -Os.

    - commit 815eb71e7149 ("Kbuild: disable 'maybe-uninitialized' warning
    for CONFIG_PROFILE_ALL_BRANCHES") turned off this option for
    CONFIG_PROFILE_ALL_BRANCHES

    - commit a76bcf557ef4 ("Kbuild: enable -Wmaybe-uninitialized warning
    for "make W=1"") turned off this option for GCC < 4.9
    Arnd provided more explanation in https://lkml.org/lkml/2017/3/14/903

    I think this looks better by shifting the logic from Makefile to Kconfig.

    Link: https://github.com/ClangBuiltLinux/linux/issues/350
    Signed-off-by: Masahiro Yamada
    Reviewed-by: Nathan Chancellor
    Tested-by: Nick Desaulniers

    Masahiro Yamada
     

22 Feb, 2019

1 commit

  • Revert ff1522bb7d9845 ("initramfs: cleanup incomplete rootfs").

    Andy reports

    : This breaks my setup where I have U-boot provided more size of initramfs
    : than needed. This allows a bit of flexibility to increase or decrease
    : initramfs compressed image without taking care of bootloader. The proper
    : solution is to do this if we sure that we didn't get enough memory,
    : otherwise I can't consider the error fatal to clean up rootfs.

    Fixes: ff1522bb7d9845 ("initramfs: cleanup incomplete rootfs")
    Reported-by: Andy Shevchenko
    Tested-by: Andy Shevchenko
    Cc: David Engraf
    Cc: Dominik Brodowski
    Cc: Greg Kroah-Hartman
    Cc: Philippe Ombredanne
    Cc: Arnd Bergmann
    Cc: Luc Van Oostenryck
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton