28 Nov, 2017

1 commit


18 Nov, 2017

5 commits

  • Clean up the EXPERT menu (yet again).

    Move FHANDLE and CHECKPOINT_RESTORE into the primary EXPERT menu since
    they already depend on EXPERT.

    Move BPF_SYSCALL and USERFAULTFD out of the EXPERT Kconfig symbols menu
    list since they do not depend on EXPERT and were breaking the continuity
    of that menu list.

    Move all of the KALLSYMS Kconfig symbols to the end of the EXPERT menu.
    This separates the kernel services from the build options.

    This patch depends on [PATCH] pci: move PCI_QUIRKS to the PCI bus menu
    (https://lkml.org/lkml/2017/11/2/907).

    Link: http://lkml.kernel.org/r/72e4465a-a5ff-cb3c-1a90-11aa4861b161@infradead.org
    Signed-off-by: Randy Dunlap
    Acked-by: Daniel Borkmann [BPF]
    Cc: Andrea Arcangeli
    Cc: Alexei Starovoitov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Randy Dunlap
     
  • The cpio format uses a 32-bit number to encode file timestamps, which
    breaks initramfs support in 2038. This reinterprets the timestamp as
    unsigned, which gives us another 68 years and avoids breakage until 2106.

    Link: http://lkml.kernel.org/r/20171019095536.801199-1-arnd@arndb.de
    Signed-off-by: Arnd Bergmann
    Cc: Al Viro
    Cc: Deepa Dinamani
    Cc: Arnd Bergmann
    Cc: Daniel Thompson
    Cc: Lokesh Vutla
    Cc: Stafford Horne
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Arnd Bergmann
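
    The effect of the reinterpretation described above can be illustrated
    with a small userspace sketch. It is not the kernel code: the function
    names are hypothetical, and it only assumes that the timestamp arrives
    as an 8-character hexadecimal field, as in the "newc" cpio header.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Parse one 8-character hex field (hypothetical helper, not kernel code). */
uint32_t parse_hex8(const char *field)
{
    char buf[9];

    memcpy(buf, field, 8);
    buf[8] = '\0';
    return (uint32_t)strtoul(buf, NULL, 16);
}

/* Signed 32-bit reading: values from 0x80000000 on become negative (year 2038). */
int64_t mtime_signed(const char *field)
{
    uint32_t v = parse_hex8(field);

    return v >= 0x80000000u ? (int64_t)v - 4294967296LL : (int64_t)v;
}

/* Unsigned 32-bit reading widened to 64 bits: usable up to 0xFFFFFFFF (year 2106). */
int64_t mtime_unsigned(const char *field)
{
    return (int64_t)parse_hex8(field);
}
```

    With the signed reading, the field "80000000" yields a negative time;
    with the unsigned reading it is simply the second after 0x7FFFFFFF.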
     
  • pidhash is no longer required, as all the information can be looked up
    from the idr tree. nr_hashed represented the number of pids that had been
    hashed. Since nr_hashed and PIDNS_HASH_ADDING are no longer relevant,
    they have been renamed to pid_allocated and PIDNS_ADDING respectively.

    [gs051095@gmail.com: v6]
    Link: http://lkml.kernel.org/r/1507760379-21662-3-git-send-email-gs051095@gmail.com
    Link: http://lkml.kernel.org/r/1507583624-22146-3-git-send-email-gs051095@gmail.com
    Signed-off-by: Gargi Sharma
    Reviewed-by: Rik van Riel
    Tested-by: Tony Luck [ia64]
    Cc: Julia Lawall
    Cc: Ingo Molnar
    Cc: Pavel Tatashin
    Cc: Kirill Tkhai
    Cc: Oleg Nesterov
    Cc: Eric W. Biederman
    Cc: Christoph Hellwig
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Gargi Sharma
     
  • Patch series "Replacing PID bitmap implementation with IDR API", v4.

    This series replaces kernel bitmap implementation of PID allocation with
    IDR API. These patches are written to simplify the kernel by replacing
    custom code with calls to generic code.

    The following are the stats for pid and pid_namespace object files
    before and after the replacement. There is a noteworthy change between
    the IDR and bitmap implementation.

    Before
       text    data     bss     dec     hex filename
       8447    3894      64   12405    3075 kernel/pid.o
    After
       text    data     bss     dec     hex filename
       3397     304       0    3701     e75 kernel/pid.o

    Before
       text    data     bss     dec     hex filename
       5692    1842     192    7726    1e2e kernel/pid_namespace.o
    After
       text    data     bss     dec     hex filename
       2854     216      16    3086     c0e kernel/pid_namespace.o

    The following are the stats for ps, pstree and calling readdir on /proc
    for 10,000 processes.

    ps:
              With IDR API    With bitmap
    real      0m1.479s        0m2.319s
    user      0m0.070s        0m0.060s
    sys       0m0.289s        0m0.516s

    pstree:
              With IDR API    With bitmap
    real      0m1.024s        0m1.794s
    user      0m0.348s        0m0.612s
    sys       0m0.184s        0m0.264s

    proc:
              With IDR API    With bitmap
    real      0m0.059s        0m0.074s
    user      0m0.000s        0m0.004s
    sys       0m0.016s        0m0.016s

    This patch (of 2):

    Replace the current bitmap implementation for Process ID allocation.
    Functions that are no longer required, for example, free_pidmap(),
    alloc_pidmap(), etc. are removed. The rest of the functions are
    modified to use the IDR API. The change was made to make the PID
    allocation less complex by replacing custom code with calls to generic
    API.

    [gs051095@gmail.com: v6]
    Link: http://lkml.kernel.org/r/1507760379-21662-2-git-send-email-gs051095@gmail.com
    [avagin@openvz.org: restore the old behaviour of the ns_last_pid sysctl]
    Link: http://lkml.kernel.org/r/20171106183144.16368-1-avagin@openvz.org
    Link: http://lkml.kernel.org/r/1507583624-22146-2-git-send-email-gs051095@gmail.com
    Signed-off-by: Gargi Sharma
    Reviewed-by: Rik van Riel
    Acked-by: Oleg Nesterov
    Cc: Julia Lawall
    Cc: Ingo Molnar
    Cc: Pavel Tatashin
    Cc: Kirill Tkhai
    Cc: Eric W. Biederman
    Cc: Christoph Hellwig
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Gargi Sharma
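
    The object-size figures quoted in this commit message can be
    cross-checked with a short sketch. It assumes the usual layout of
    size(1) output, where "dec" is text + data + bss and "hex" is the same
    total in hexadecimal; the struct and function names are hypothetical.

```c
/* One row of size(1) output (hypothetical helper type). */
struct size_row {
    unsigned text;
    unsigned data;
    unsigned bss;
};

/* The "dec" column: the sum of the three section sizes. */
unsigned size_total(struct size_row r)
{
    return r.text + r.data + r.bss;
}

/* Relative reduction from "before" to "after", in percent. */
double reduction_percent(unsigned before, unsigned after)
{
    return 100.0 * (double)(before - after) / (double)before;
}
```

    Applied to the kernel/pid.o rows, the totals come out as 12405 (0x3075)
    and 3701 (0xe75), a reduction of roughly 70%; for
    kernel/pid_namespace.o the same calculation gives roughly 60%.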
     
  • init/version.c has nothing to do with modules, so remove the
    <linux/module.h> include.

    Instead, include <linux/export.h> for EXPORT_SYMBOL_GPL.

    This cuts off a lot of unnecessary header parsing.

    Link: http://lkml.kernel.org/r/1505920984-8523-1-git-send-email-yamada.masahiro@socionext.com
    Signed-off-by: Masahiro Yamada
    Cc: Paul Gortmaker
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Masahiro Yamada
     

16 Nov, 2017

6 commits

  • Merge updates from Andrew Morton:

    - a few misc bits

    - ocfs2 updates

    - almost all of MM

    * emailed patches from Andrew Morton: (131 commits)
    memory hotplug: fix comments when adding section
    mm: make alloc_node_mem_map a void call if we don't have CONFIG_FLAT_NODE_MEM_MAP
    mm: simplify nodemask printing
    mm,oom_reaper: remove pointless kthread_run() error check
    mm/page_ext.c: check if page_ext is not prepared
    writeback: remove unused function parameter
    mm: do not rely on preempt_count in print_vma_addr
    mm, sparse: do not swamp log with huge vmemmap allocation failures
    mm/hmm: remove redundant variable align_end
    mm/list_lru.c: mark expected switch fall-through
    mm/shmem.c: mark expected switch fall-through
    mm/page_alloc.c: broken deferred calculation
    mm: don't warn about allocations which stall for too long
    fs: fuse: account fuse_inode slab memory as reclaimable
    mm, page_alloc: fix potential false positive in __zone_watermark_ok
    mm: mlock: remove lru_add_drain_all()
    mm, sysctl: make NUMA stats configurable
    shmem: convert shmem_init_inodecache() to void
    Unify migrate_pages and move_pages access checks
    mm, pagevec: rename pagevec drained field
    ...

    Linus Torvalds
     
  • Convert all allocations that used a NOTRACK flag to stop using it.

    Link: http://lkml.kernel.org/r/20171007030159.22241-3-alexander.levin@verizon.com
    Signed-off-by: Sasha Levin
    Cc: Alexander Potapenko
    Cc: Eric W. Biederman
    Cc: Michal Hocko
    Cc: Pekka Enberg
    Cc: Steven Rostedt
    Cc: Tim Hansen
    Cc: Vegard Nossum
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Levin, Alexander (Sasha Levin)
     
  • Patch series "kmemcheck: kill kmemcheck", v2.

    As discussed at LSF/MM, kill kmemcheck.

    KASan is a replacement that is able to work without the limitation of
    kmemcheck (single CPU, slow). KASan is already upstream.

    We are also not aware of any users of kmemcheck (or users who don't
    consider KASan as a suitable replacement).

    The only objection was that since KASAN wasn't supported by all GCC
    versions provided by distros at that time we should hold off for 2
    years, and try again.

    Now that 2 years have passed, and all distros provide gcc that supports
    KASAN, kill kmemcheck again for the very same reasons.

    This patch (of 4):

    Remove kmemcheck annotations, and calls to kmemcheck from the kernel.

    [alexander.levin@verizon.com: correctly remove kmemcheck call from dma_map_sg_attrs]
    Link: http://lkml.kernel.org/r/20171012192151.26531-1-alexander.levin@verizon.com
    Link: http://lkml.kernel.org/r/20171007030159.22241-2-alexander.levin@verizon.com
    Signed-off-by: Sasha Levin
    Cc: Alexander Potapenko
    Cc: Eric W. Biederman
    Cc: Michal Hocko
    Cc: Pekka Enberg
    Cc: Steven Rostedt
    Cc: Tim Hansen
    Cc: Vegard Nossum
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Levin, Alexander (Sasha Levin)
     
  • According to discussion with Christoph
    (https://marc.info/?l=linux-kernel&m=150695909709711&w=2), it sounds like
    it is pointless to keep CONFIG_SLABINFO around.

    This patch removes the CONFIG_SLABINFO config option, but /proc/slabinfo
    is still available.

    [yang.s@alibaba-inc.com: v11]
    Link: http://lkml.kernel.org/r/1507656303-103845-3-git-send-email-yang.s@alibaba-inc.com
    Link: http://lkml.kernel.org/r/1507152550-46205-3-git-send-email-yang.s@alibaba-inc.com
    Signed-off-by: Yang Shi
    Acked-by: David Rientjes
    Cc: Christoph Lameter
    Cc: Joonsoo Kim
    Cc: Michal Hocko
    Cc: Pekka Enberg
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Yang Shi
     
  • Pull PCI updates from Bjorn Helgaas:

    - detach driver before tearing down procfs/sysfs (Alex Williamson)

    - disable PCIe services during shutdown (Sinan Kaya)

    - fix ASPM oops on systems with no Root Ports (Ard Biesheuvel)

    - fix ASPM LTR_L1.2_THRESHOLD programming (Bjorn Helgaas)

    - fix ASPM Common_Mode_Restore_Time computation (Bjorn Helgaas)

    - fix portdrv MSI/MSI-X vector allocation (Dongdong Liu, Bjorn
    Helgaas)

    - report non-fatal AER errors only to the affected endpoint (Gabriele
    Paoloni)

    - distribute bus numbers, MMIO, and I/O space among hotplug bridges to
    allow more devices to be hot-added (Mika Westerberg)

    - fix pciehp races during initialization and surprise link down (Mika
    Westerberg)

    - handle surprise-removed devices in PME handling (Qiang)

    - support resizable BARs for large graphics devices (Christian König)

    - expose SR-IOV offset, stride, and VF device ID via sysfs (Filippo
    Sironi)

    - create SR-IOV virtfn/physfn sysfs links before attaching driver
    (Stuart Hayes)

    - fix SR-IOV "ARI Capable Hierarchy" restore issue (Tony Nguyen)

    - enforce Kconfig IOV/REALLOC dependency (Sascha El-Sharkawy)

    - avoid slot reset if bridge itself is broken (Jan Glauber)

    - clean up pci_reset_function() path (Jan H. Schönherr)

    - make pci_map_rom() fail if the option ROM is invalid (Changbin Du)

    - convert timers to timer_setup() (Kees Cook)

    - move PCI_QUIRKS to PCI bus Kconfig menu (Randy Dunlap)

    - constify pci_dev_type and intel_mid_pci_ops (Bhumika Goyal)

    - remove unnecessary pci_dev, pci_bus, resource, pcibios_set_master()
    declarations (Bjorn Helgaas)

    - fix endpoint framework overflows and BUG()s (Dan Carpenter)

    - fix endpoint framework issues (Kishon Vijay Abraham I)

    - avoid broken Cavium CN8xxx bus reset behavior (David Daney)

    - extend Cavium ACS capability quirks (Vadim Lomovtsev)

    - support Synopsys DesignWare RC in ECAM mode (Ard Biesheuvel)

    - turn off dra7xx clocks cleanly on shutdown (Keerthy)

    - fix Faraday probe error path (Wei Yongjun)

    - support HiSilicon STB SoC PCIe host controller (Jianguo Sun)

    - fix Hyper-V interrupt affinity issue (Dexuan Cui)

    - remove useless ACPI warning for Hyper-V pass-through devices (Vitaly
    Kuznetsov)

    - support multiple MSI on iProc (Sandor Bodo-Merle)

    - support Layerscape LS1012a and LS1046a PCIe host controllers (Hou
    Zhiqiang)

    - fix Layerscape default error response (Minghuan Lian)

    - support MSI on Tango host controller (Marc Gonzalez)

    - support Tegra186 PCIe host controller (Manikanta Maddireddy)

    - use generic accessors on Tegra when possible (Thierry Reding)

    - support V3 Semiconductor PCI host controller (Linus Walleij)

    * tag 'pci-v4.15-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (85 commits)
    PCI/ASPM: Add L1 Substates definitions
    PCI/ASPM: Reformat ASPM register definitions
    PCI/ASPM: Use correct capability pointer to program LTR_L1.2_THRESHOLD
    PCI/ASPM: Account for downstream device's Port Common_Mode_Restore_Time
    PCI: xgene: Rename xgene_pcie_probe_bridge() to xgene_pcie_probe()
    PCI: xilinx: Rename xilinx_pcie_link_is_up() to xilinx_pcie_link_up()
    PCI: altera: Rename altera_pcie_link_is_up() to altera_pcie_link_up()
    PCI: Fix kernel-doc build warning
    PCI: Fail pci_map_rom() if the option ROM is invalid
    PCI: Move pci_map_rom() error path
    PCI: Move PCI_QUIRKS to the PCI bus menu
    alpha/PCI: Make pdev_save_srm_config() static
    PCI: Remove unused declarations
    PCI: Remove redundant pci_dev, pci_bus, resource declarations
    PCI: Remove redundant pcibios_set_master() declarations
    PCI/PME: Handle invalid data when reading Root Status
    PCI: hv: Use effective affinity mask
    PCI: pciehp: Do not clear Presence Detect Changed during initialization
    PCI: pciehp: Fix race condition handling surprise link down
    PCI: Distribute available resources to hotplug-capable bridges
    ...

    Linus Torvalds
     
  • Pull trivial tree updates from Jiri Kosina:
    "The usual rocket-science from trivial tree for 4.15"

    * 'for-linus' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/jikos/trivial:
    MAINTAINERS: relinquish kconfig
    MAINTAINERS: Update my email address
    treewide: Fix typos in Kconfig
    kfifo: Fix comments
    init/Kconfig: Fix module signing document location
    misc: ibmasm: Return error on error path
    HID: logitech-hidpp: fix mistake in printk, "feeback" -> "feedback"
    MAINTAINERS: Correct path to uDraw PS3 driver
    tracing: Fix doc mistakes in trace sample
    tracing: Kconfig text fixes for CONFIG_HWLAT_TRACER
    MIPS: Alchemy: Remove reverted CONFIG_NETLINK_MMAP from db1xxx_defconfig
    mm/huge_memory.c: fixup grammar in comment
    lib/xz: Add fall-through comments to a switch statement

    Linus Torvalds
     

14 Nov, 2017

1 commit

  • Pull x86 APIC updates from Thomas Gleixner:
    "This update provides a major overhaul of the APIC initialization and
    vector allocation code:

    - Unification of the APIC and interrupt mode setup which was
    scattered all over the place and was hard to follow. This also
    disentangles the timer setup from the APIC initialization which
    brings a clear separation of functionality.

    Great detective work from Dou Liyang!

    - Refactoring of the x86 vector allocation mechanism. The existing
    code was based on nested loops and rather convoluted APIC callbacks
    which had a horrible worst case behaviour and tried to serve all
    different use cases in one go. This led to quite odd hacks when
    supporting the new managed interrupt facility for multiqueue devices
    and made it more or less impossible to deal with the vector space
    exhaustion which was a major roadblock for server hibernation.

    Aside from that the code dealing with cpu hotplug and the system
    vectors was disconnected from the actual vector management and
    allocation code, which made it hard to follow and maintain.

    Utilizing the new bitmap matrix allocator core mechanism, the new
    allocator and management code consolidates the handling of system
    vectors, legacy vectors, cpu hotplug mechanisms and the actual
    allocation which needs to be aware of system and legacy vectors and
    hotplug constraints into a single consistent entity.

    This has one visible change: The support for multi CPU targets of
    interrupts, which is only available on a certain subset of
    CPUs/APIC variants has been removed in favour of single interrupt
    targets. A proper analysis of the multi CPU target feature revealed
    that there is no real advantage as the vast majority of interrupts
    end up on the CPU with the lowest APIC id in the set of target CPUs
    anyway. That change was agreed on by the relevant folks and allowed
    us to simplify the implementation significantly and to replace rather
    fragile constructs like the vector cleanup IPI with straight
    forward and solid code.

    Furthermore this made it possible to cleanly separate the allocation
    details for legacy, normal and managed interrupts:

    * Legacy interrupts are no longer wasting 16 vectors
    unconditionally

    * Managed interrupts now have a guaranteed vector reservation, but
    the actual vector assignment happens when the interrupt is
    requested. It's guaranteed not to fail.

    * Normal interrupts no longer allocate vectors unconditionally
    when the interrupt is set up (IO/APIC init or MSI(X) enable).
    The mechanism has been switched to a best effort reservation
    mode. The actual allocation happens when the interrupt is
    requested. Contrary to managed interrupts the request can fail
    due to vector space exhaustion, but drivers must handle a failure
    of request_irq() anyway. When the interrupt is freed, the vector
    is handed back as well.

    This solves a long standing problem with large unconditional
    vector allocations for a certain class of enterprise devices
    which prevented server hibernation due to vector space
    exhaustion when the unused allocated vectors had to be migrated
    to CPU0 while unplugging all non boot CPUs.

    The code has been equipped with trace points and detailed debugfs
    information to aid analysis of the vector space"

    * 'x86-apic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (60 commits)
    x86/vector/msi: Select CONFIG_GENERIC_IRQ_RESERVATION_MODE
    PCI/MSI: Set MSI_FLAG_MUST_REACTIVATE in core code
    genirq: Add config option for reservation mode
    x86/vector: Use correct per cpu variable in free_moved_vector()
    x86/apic/vector: Ignore set_affinity call for inactive interrupts
    x86/apic: Fix spelling mistake: "symmectic" -> "symmetric"
    x86/apic: Use dead_cpu instead of current CPU when cleaning up
    ACPI/init: Invoke early ACPI initialization earlier
    x86/vector: Respect affinity mask in irq descriptor
    x86/irq: Simplify hotplug vector accounting
    x86/vector: Switch IOAPIC to global reservation mode
    x86/vector/msi: Switch to global reservation mode
    x86/vector: Handle managed interrupts proper
    x86/io_apic: Reevaluate vector configuration on activate()
    iommu/amd: Reevaluate vector configuration on activate()
    iommu/vt-d: Reevaluate vector configuration on activate()
    x86/apic/msi: Force reactivation of interrupts at startup time
    x86/vector: Untangle internal state from irq_cfg
    x86/vector: Compile SMP only code conditionally
    x86/apic: Remove unused callbacks
    ...

    Linus Torvalds
     

08 Nov, 2017

2 commits


07 Nov, 2017

1 commit


03 Nov, 2017

1 commit

  • …el/git/gregkh/driver-core

    Pull initial SPDX identifiers from Greg KH:
    "License cleanup: add SPDX license identifiers to some files

    Many source files in the tree are missing licensing information, which
    makes it harder for compliance tools to determine the correct license.

    By default all files without license information are under the default
    license of the kernel, which is GPL version 2.

    Update the files which contain no license information with the
    'GPL-2.0' SPDX license identifier. The SPDX identifier is a legally
    binding shorthand, which can be used instead of the full boiler plate
    text.

    This patch is based on work done by Thomas Gleixner and Kate Stewart
    and Philippe Ombredanne.

    How this work was done:

    Patches were generated and checked against linux-4.14-rc6 for a subset
    of the use cases:

    - file had no licensing information in it,

    - file was a */uapi/* one with no licensing information in it,

    - file was a */uapi/* one with existing licensing information.

    Further patches will be generated in subsequent months to fix up cases
    where non-standard license headers were used, and references to
    license had to be inferred by heuristics based on keywords.

    The analysis to determine which SPDX License Identifier to be applied
    to a file was done in a spreadsheet of side by side results from
    the output of two independent scanners (ScanCode & Windriver)
    producing SPDX tag:value files created by Philippe Ombredanne.
    Philippe prepared the base worksheet, and did an initial spot review
    of a few thousand files.

    The 4.13 kernel was the starting point of the analysis with 60,537
    files assessed. Kate Stewart did a file by file comparison of the
    scanner results in the spreadsheet to determine which SPDX license
    identifier(s) to be applied to the file. She confirmed any
    determination that was not immediately clear with lawyers working with
    the Linux Foundation.

    Criteria used to select files for SPDX license identifier tagging were:

    - Files considered eligible had to be source code files.

    - Make and config files were included as candidates if they contained
    >5 lines of source

    - File already had some variant of a license header in it (even if <5
    lines).

    All documentation files were explicitly excluded.

    The following heuristics were used to determine which SPDX license
    identifiers to apply.

    - when both scanners couldn't find any license traces, file was
    considered to have no license information in it, and the top level
    COPYING file license applied.

    For non */uapi/* files that summary was:

    SPDX license identifier                              # files
    ---------------------------------------------------|-------
    GPL-2.0                                                11139

    and resulted in the first patch in this series.

    If that file was under a */uapi/* path, it was tagged "GPL-2.0 WITH
    Linux-syscall-note"; otherwise it was tagged "GPL-2.0". The results
    were:

    SPDX license identifier                              # files
    ---------------------------------------------------|-------
    GPL-2.0 WITH Linux-syscall-note                          930

    and resulted in the second patch in this series.

    - if a file had some form of licensing information in it, and was one
    of the */uapi/* ones, it was denoted with the Linux-syscall-note if
    any GPL family license was found in the file or had no licensing in
    it (per prior point). Results summary:

    SPDX license identifier                              # files
    ---------------------------------------------------|------
    GPL-2.0 WITH Linux-syscall-note                          270
    GPL-2.0+ WITH Linux-syscall-note                         169
    ((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause)       21
    ((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause)       17
    LGPL-2.1+ WITH Linux-syscall-note                         15
    GPL-1.0+ WITH Linux-syscall-note                          14
    ((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause)       5
    LGPL-2.0+ WITH Linux-syscall-note                          4
    LGPL-2.1 WITH Linux-syscall-note                           3
    ((GPL-2.0 WITH Linux-syscall-note) OR MIT)                 3
    ((GPL-2.0 WITH Linux-syscall-note) AND MIT)                1

    and that resulted in the third patch in this series.

    - when the two scanners agreed on the detected license(s), that
    became the concluded license(s).

    - when there was disagreement between the two scanners (one detected
    a license but the other didn't, or they both detected different
    licenses) a manual inspection of the file occurred.

    - In most cases a manual inspection of the information in the file
    resulted in a clear resolution of the license that should apply
    (and which scanner probably needed to revisit its heuristics).

    - When it was not immediately clear, the license identifier was
    confirmed with lawyers working with the Linux Foundation.

    - If there was any question as to the appropriate license identifier,
    the file was flagged for further research and to be revisited later
    in time.

    In total, over 70 hours of logged manual review was done on the
    spreadsheet to determine the SPDX license identifiers to apply to the
    source files by Kate, Philippe, Thomas and, in some cases,
    confirmation by lawyers working with the Linux Foundation.

    Kate also obtained a third independent scan of the 4.13 code base from
    FOSSology, and compared selected files where the other two scanners
    disagreed against that SPDX file, to see if there were new insights.
    The Windriver scanner is based on an older version of FOSSology in
    part, so they are related.

    Thomas did random spot checks in about 500 files from the spreadsheets
    for the uapi headers and agreed with SPDX license identifier in the
    files he inspected. For the non-uapi files Thomas did random spot
    checks in about 15000 files.

    In the initial set of patches against 4.14-rc6, 3 files were found to have
    copy/paste license identifier errors, and have been fixed to reflect
    the correct identifier.

    Additionally Philippe spent 10 hours this week doing a detailed manual
    inspection and review of the 12,461 patched files from the initial
    patch version early this week with:

    - a full scancode scan run, collecting the matched texts, detected
    license ids and scores

    - reviewing anything where there was a license detected (about 500+
    files) to ensure that the applied SPDX license was correct

    - reviewing anything where there was no detection but the patch
    license was not GPL-2.0 WITH Linux-syscall-note to ensure that the
    applied SPDX license was correct

    This produced a worksheet with 20 files needing minor correction. This
    worksheet was then exported into 3 different .csv files for the
    different types of files to be modified.

    These .csv files were then reviewed by Greg. Thomas wrote a script to
    parse the csv files and add the proper SPDX tag to the file, in the
    format that the file expected. This script was further refined by Greg
    based on the output to detect more types of files automatically and to
    distinguish between header and source .c files (which need different
    comment types.) Finally Greg ran the script using the .csv files to
    generate the patches.

    Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org>
    Reviewed-by: Philippe Ombredanne <pombredanne@nexb.com>
    Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>"

    * tag 'spdx_identifiers-4.14-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
    License cleanup: add SPDX license identifier to uapi header files with a license
    License cleanup: add SPDX license identifier to uapi header files with no license
    License cleanup: add SPDX GPL-2.0 license identifier to files with no license

    Linus Torvalds
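
    The message above notes that header and source .c files need different
    comment types for the SPDX tag. A minimal sketch of that distinction
    follows. It is not the script used for the patches: the function names
    are hypothetical, and it assumes the kernel convention of a "//" comment
    in .c files and a "/* */" comment in .h files.

```c
#include <stdio.h>
#include <string.h>

/* Return nonzero if path ends with suffix. */
int has_suffix(const char *path, const char *suffix)
{
    size_t lp = strlen(path);
    size_t ls = strlen(suffix);

    return lp >= ls && strcmp(path + lp - ls, suffix) == 0;
}

/*
 * Format the SPDX tag line for a file into out.
 * Returns the snprintf() result, or -1 for a file type this
 * sketch does not handle.
 */
int spdx_line(const char *path, const char *license, char *out, size_t outlen)
{
    if (has_suffix(path, ".c"))
        return snprintf(out, outlen, "// SPDX-License-Identifier: %s", license);
    if (has_suffix(path, ".h"))
        return snprintf(out, outlen, "/* SPDX-License-Identifier: %s */", license);
    return -1;
}
```

    For example, spdx_line("init/main.c", "GPL-2.0", buf, sizeof buf) fills
    buf with a "//" comment, while a path ending in ".h" gets the "/* */"
    form. Other file types (Makefiles, scripts, assembly) are deliberately
    left out of this sketch.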
     

02 Nov, 2017

1 commit

  • Many source files in the tree are missing licensing information, which
    makes it harder for compliance tools to determine the correct license.

    By default all files without license information are under the default
    license of the kernel, which is GPL version 2.

    Update the files which contain no license information with the 'GPL-2.0'
    SPDX license identifier. The SPDX identifier is a legally binding
    shorthand, which can be used instead of the full boiler plate text.

    This patch is based on work done by Thomas Gleixner and Kate Stewart and
    Philippe Ombredanne.

    How this work was done:

    Patches were generated and checked against linux-4.14-rc6 for a subset of
    the use cases:
    - file had no licensing information in it,
    - file was a */uapi/* one with no licensing information in it,
    - file was a */uapi/* one with existing licensing information.

    Further patches will be generated in subsequent months to fix up cases
    where non-standard license headers were used, and references to license
    had to be inferred by heuristics based on keywords.

    The analysis to determine which SPDX License Identifier to be applied to
    a file was done in a spreadsheet of side by side results from the
    output of two independent scanners (ScanCode & Windriver) producing SPDX
    tag:value files created by Philippe Ombredanne. Philippe prepared the
    base worksheet, and did an initial spot review of a few thousand files.

    The 4.13 kernel was the starting point of the analysis with 60,537 files
    assessed. Kate Stewart did a file by file comparison of the scanner
    results in the spreadsheet to determine which SPDX license identifier(s)
    to be applied to the file. She confirmed any determination that was not
    immediately clear with lawyers working with the Linux Foundation.

    Criteria used to select files for SPDX license identifier tagging were:
    - Files considered eligible had to be source code files.
    - Make and config files were included as candidates if they contained >5
    lines of source
    - File already had some variant of a license header in it (even if <5
    lines).

    Reviewed-by: Philippe Ombredanne
    Reviewed-by: Thomas Gleixner
    Signed-off-by: Greg Kroah-Hartman

    Greg Kroah-Hartman
     

27 Oct, 2017

3 commits

  • We want to centralize the isolation management, done by the housekeeping
    subsystem. Therefore we need to handle the nohz_full= parameter from
    there.

    Since nohz_full= so far has involved unbound timers, watchdog, RCU
    and tilegx NAPI isolation, we keep that default behaviour.

    nohz_full= will be deprecated in the future. We want to control
    the isolation features from the isolcpus= parameter.

    Signed-off-by: Frederic Weisbecker
    Acked-by: Thomas Gleixner
    Cc: Chris Metcalf
    Cc: Christoph Lameter
    Cc: Linus Torvalds
    Cc: Luiz Capitulino
    Cc: Mike Galbraith
    Cc: Paul E. McKenney
    Cc: Peter Zijlstra
    Cc: Rik van Riel
    Cc: Wanpeng Li
    Link: http://lkml.kernel.org/r/1509072159-31808-10-git-send-email-frederic@kernel.org
    Signed-off-by: Ingo Molnar

    Frederic Weisbecker
     
  • Split the housekeeping config from CONFIG_NO_HZ_FULL. This way we finally
    separate the isolation code from NOHZ.

    A dependency on CONFIG_NO_HZ_FULL remains for now, though, because the
    housekeeping code still deals with NOHZ internals.

    Signed-off-by: Frederic Weisbecker
    Acked-by: Thomas Gleixner
    Cc: Chris Metcalf
    Cc: Christoph Lameter
    Cc: Linus Torvalds
    Cc: Luiz Capitulino
    Cc: Mike Galbraith
    Cc: Paul E. McKenney
    Cc: Peter Zijlstra
    Cc: Rik van Riel
    Cc: Wanpeng Li
    Link: http://lkml.kernel.org/r/1509072159-31808-8-git-send-email-frederic@kernel.org
    Signed-off-by: Ingo Molnar

    Frederic Weisbecker
     
  • The housekeeping code is currently tied to the NOHZ code. As we are
    planning to make housekeeping independent from it, start with moving
    the relevant code to its own file.

    Signed-off-by: Frederic Weisbecker
    Acked-by: Thomas Gleixner
    Acked-by: Paul E. McKenney
    Cc: Chris Metcalf
    Cc: Christoph Lameter
    Cc: Linus Torvalds
    Cc: Luiz Capitulino
    Cc: Mike Galbraith
    Cc: Peter Zijlstra
    Cc: Rik van Riel
    Cc: Wanpeng Li
    Link: http://lkml.kernel.org/r/1509072159-31808-2-git-send-email-frederic@kernel.org
    Signed-off-by: Ingo Molnar

    Frederic Weisbecker
     

12 Oct, 2017

1 commit


07 Oct, 2017

1 commit

  • The choice containing the CC_OPTIMIZE_FOR_PERFORMANCE symbol
    accidentally added a "CONFIG_" prefix when trying to make it the
    default, selecting an undefined symbol as the default.

    The mistake is harmless here: Since the default symbol is not visible,
    the choice falls back on using the visible symbol as the default
    instead, which is CC_OPTIMIZE_FOR_PERFORMANCE, as intended.

    A patch that makes Kconfig print a warning in this case has been
    submitted separately:
    http://www.spinics.net/lists/linux-kbuild/msg15566.html

    Signed-off-by: Ulf Magnusson
    Acked-by: Arnd Bergmann
    Signed-off-by: Masahiro Yamada

    Ulf Magnusson
     

27 Sep, 2017

1 commit

  • acpi_early_init() unmaps the temporary ACPI Table mappings which are used
    in the early startup code and prepares for permanent table mappings.

    Before the consolidation of the x86 APIC setup code the invocation of
    acpi_early_init() happened before the interrupt remapping unit was
    initialized. With the rework the remapping unit initialization moved in
    front of acpi_early_init() which causes an ACPI warning when the ACPI root
    tables get reallocated afterwards.

    Invoke acpi_early_init() before late_time_init() which is before the access
    to the DMAR tables happens.

    Fixes: 935356cecda8 ("x86/apic: Initialize interrupt mode after timer init")
    Reported-by: Xiaolong Ye
    Signed-off-by: Dou Liyang
    Cc: Tony Luck
    Cc: linux-ia64@vger.kernel.org
    Cc: bhe@redhat.com
    Cc: Fenghua Yu
    Cc: Michael Ellerman
    Cc: "Rafael J. Wysocki"
    Cc: Will Deacon
    Cc: linux-acpi@vger.kernel.org
    Cc: bp@alien8.de
    Cc: Lv"
    Cc: yinghai@kernel.org
    Cc: linux-arm-kernel@lists.infradead.org
    Link: https://lkml.kernel.org/r/1505294274-441-1-git-send-email-douly.fnst@cn.fujitsu.com
    Signed-off-by: Thomas Gleixner

    Dou Liyang
     

15 Sep, 2017

2 commits

  • Pull mount flag updates from Al Viro:
    "Another chunk of fmount preparations from dhowells; only trivial
    conflicts for that part. It separates MS_... bits (very grotty
    mount(2) ABI) from the struct super_block ->s_flags (kernel-internal,
    only a small subset of MS_... stuff).

    This does *not* convert the filesystems to new constants; only the
    infrastructure is done here. The next step in that series is where the
    conflicts would be; that's the conversion of filesystems. It's purely
    mechanical and it's better done after the merge, so if you could run
    something like

    list=$(for i in MS_RDONLY MS_NOSUID MS_NODEV MS_NOEXEC MS_SYNCHRONOUS MS_MANDLOCK MS_DIRSYNC MS_NOATIME MS_NODIRATIME MS_SILENT MS_POSIXACL MS_KERNMOUNT MS_I_VERSION MS_LAZYTIME; do git grep -l $i fs drivers/staging/lustre drivers/mtd ipc mm include/linux; done|sort|uniq|grep -v '^fs/namespace.c$')

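    sed -i -e 's/\<MS_RDONLY\>/SB_RDONLY/g' \
    -e 's/\<MS_NOSUID\>/SB_NOSUID/g' \
    -e 's/\<MS_NODEV\>/SB_NODEV/g' \
    -e 's/\<MS_NOEXEC\>/SB_NOEXEC/g' \
    -e 's/\<MS_SYNCHRONOUS\>/SB_SYNCHRONOUS/g' \
    -e 's/\<MS_MANDLOCK\>/SB_MANDLOCK/g' \
    -e 's/\<MS_DIRSYNC\>/SB_DIRSYNC/g' \
    -e 's/\<MS_NOATIME\>/SB_NOATIME/g' \
    -e 's/\<MS_NODIRATIME\>/SB_NODIRATIME/g' \
    -e 's/\<MS_SILENT\>/SB_SILENT/g' \
    -e 's/\<MS_POSIXACL\>/SB_POSIXACL/g' \
    -e 's/\<MS_KERNMOUNT\>/SB_KERNMOUNT/g' \
    -e 's/\<MS_I_VERSION\>/SB_I_VERSION/g' \
    -e 's/\<MS_LAZYTIME\>/SB_LAZYTIME/g' \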
    $list

    and commit it with something along the lines of 'convert filesystems
    away from use of MS_... constants' as commit message, it would save
    quite a bit of headache next cycle"

    * 'work.mount' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
    VFS: Differentiate mount flags (MS_*) from internal superblock flags
    VFS: Convert sb->s_flags & MS_RDONLY to sb_rdonly(sb)
    vfs: Add sb_rdonly(sb) to query the MS_RDONLY flag on s_flags

    Linus Torvalds
     
  • Pull ipc compat cleanup and 64-bit time_t from Al Viro:
    "IPC copyin/copyout sanitizing, including 64bit time_t work from Deepa
    Dinamani"

    * 'work.ipc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
    utimes: Make utimes y2038 safe
    ipc: shm: Make shmid_kernel timestamps y2038 safe
    ipc: sem: Make sem_array timestamps y2038 safe
    ipc: msg: Make msg_queue timestamps y2038 safe
    ipc: mqueue: Replace timespec with timespec64
    ipc: Make sys_semtimedop() y2038 safe
    get rid of SYSVIPC_COMPAT on ia64
    semtimedop(): move compat to native
    shmat(2): move compat to native
    msgrcv(2), msgsnd(2): move compat to native
    ipc(2): move compat to native
    ipc: make use of compat ipc_perm helpers
    semctl(): move compat to native
    semctl(): separate all layout-dependent copyin/copyout
    msgctl(): move compat to native
    msgctl(): split the actual work from copyin/copyout
    ipc: move compat shmctl to native
    shmctl: split the work from copyin/copyout

    Linus Torvalds
     

09 Sep, 2017

2 commits

  • Feed the boot command-line to the /dev/random entropy pool

    Existing Android bootloaders usually pass data which may not be known by
    an external attacker on the kernel command-line. It may also be the
    case on other embedded systems. Sample command-line from a Google Pixel
    running CopperheadOS....

    console=ttyHSL0,115200,n8 androidboot.console=ttyHSL0
    androidboot.hardware=sailfish user_debug=31 ehci-hcd.park=3
    lpm_levels.sleep_disabled=1 cma=32M@0-0xffffffff buildvariant=user
    veritykeyid=id:dfcb9db0089e5b3b4090a592415c28e1cb4545ab
    androidboot.bootdevice=624000.ufshc androidboot.verifiedbootstate=yellow
    androidboot.veritymode=enforcing androidboot.keymaster=1
    androidboot.serialno=FA6CE0305299 androidboot.baseband=msm
    mdss_mdp.panel=1:dsi:0:qcom,mdss_dsi_samsung_ea8064tg_1080p_cmd:1:none:cfg:single_dsi
    androidboot.slot_suffix=_b fpsimd.fpsimd_settings=0
    app_setting.use_app_setting=0 kernelflag=0x00000000 debugflag=0x00000000
    androidboot.hardware.revision=PVT radioflag=0x00000000
    radioflagex1=0x00000000 radioflagex2=0x00000000 cpumask=0x00000000
    androidboot.hardware.ddr=4096MB,Hynix,LPDDR4 androidboot.ddrinfo=00000006
    androidboot.ddrsize=4GB androidboot.hardware.color=GRA00
    androidboot.hardware.ufs=32GB,Samsung androidboot.msm.hw_ver_id=268824801
    androidboot.qf.st=2 androidboot.cid=11111111 androidboot.mid=G-2PW4100
    androidboot.bootloader=8996-012001-1704121145
    androidboot.oem_unlock_support=1 androidboot.fp_src=1
    androidboot.htc.hrdump=detected androidboot.ramdump.opt=mem@2g:2g,mem@4g:2g
    androidboot.bootreason=reboot androidboot.ramdump_enable=0 ro
    root=/dev/dm-0 dm="system none ro,0 1 android-verity /dev/sda34"
    rootwait skip_initramfs init=/init androidboot.wificountrycode=US
    androidboot.boottime=1BLL:85,1BLE:669,2BLL:0,2BLE:1777,SW:6,KL:8136

    Among other things, it contains a value unique to the device
    (androidboot.serialno=FA6CE0305299), unique to the OS builds for the
    device variant (veritykeyid=id:dfcb9db0089e5b3b4090a592415c28e1cb4545ab)
    and timings from the bootloader stages in milliseconds
    (androidboot.boottime=1BLL:85,1BLE:669,2BLL:0,2BLE:1777,SW:6,KL:8136).

    [tytso@mit.edu: changelog tweak]
    [labbott@redhat.com: line-wrapped command line]
    Link: http://lkml.kernel.org/r/20170816231458.2299-3-labbott@redhat.com
    Signed-off-by: Daniel Micay
    Signed-off-by: Laura Abbott
    Acked-by: Kees Cook
    Cc: "Theodore Ts'o"
    Cc: Laura Abbott
    Cc: Nick Kralevich
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Daniel Micay
     
  • Patch series "Command line randomness", v3.

    A series to add the kernel command line as a source of randomness.

    This patch (of 2):

    Stack canary initialization involves getting a random number. Getting this
    random number may involve accessing caches or other architecture-specific
    features which are not available until after the architecture is set up.
    Move the stack canary initialization later to accommodate this.

    Link: http://lkml.kernel.org/r/20170816231458.2299-2-labbott@redhat.com
    Signed-off-by: Laura Abbott
    Signed-off-by: Laura Abbott
    Acked-by: Kees Cook
    Cc: "Theodore Ts'o"
    Cc: Daniel Micay
    Cc: Nick Kralevich
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Laura Abbott
     

07 Sep, 2017

3 commits

  • Pull percpu updates from Tejun Heo:
    "A lot of changes for percpu this time around. percpu inherited the
    same area allocator from the original pre-virtual-address-mapped
    implementation. This was from the time when percpu allocator wasn't
    used all that much and the implementation was focused on simplicity,
    with the unfortunate computational complexity of O(number of areas
    allocated from the chunk) per alloc / free.

    With the increase in percpu usage, we're hitting cases where the lack
    of scalability is hurting. The most prominent one right now is bpf
    percpu map creation / destruction which may allocate and free a lot of
    entries consecutively and it's likely that the problem will become
    more prominent in the future.

    To address the issue, Dennis replaced the area allocator with a hinted
    bitmap allocator which is more consistent. While the new allocator
    does perform a bit worse in some cases, it outperforms the old
    allocator way more than an order of magnitude in other more common
    scenarios while staying mostly flat in CPU overhead and completely
    flat in memory consumption"

    * 'for-4.14' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu: (27 commits)
    percpu: update header to contain bitmap allocator explanation.
    percpu: update pcpu_find_block_fit to use an iterator
    percpu: use metadata blocks to update the chunk contig hint
    percpu: update free path to take advantage of contig hints
    percpu: update alloc path to only scan if contig hints are broken
    percpu: keep track of the best offset for contig hints
    percpu: skip chunks if the alloc does not fit in the contig hint
    percpu: add first_bit to keep track of the first free in the bitmap
    percpu: introduce bitmap metadata blocks
    percpu: replace area map allocator with bitmap
    percpu: generalize bitmap (un)populated iterators
    percpu: increase minimum percpu allocation size and align first regions
    percpu: introduce nr_empty_pop_pages to help empty page accounting
    percpu: change the number of pages marked in the first_chunk pop bitmap
    percpu: combine percpu address checks
    percpu: modify base_addr to be region specific
    percpu: setup_first_chunk rename schunk/dchunk to chunk
    percpu: end chunk area maps page aligned for the populated bitmap
    percpu: unify allocation of schunk and dchunk
    percpu: setup_first_chunk remove dyn_size and consolidate logic
    ...

    Linus Torvalds
     
  • build_all_zonelists gets a zone parameter to initialize zone's pagesets.
    There is only a single user which gives a non-NULL zone parameter and
    that one doesn't really need the rest of the build_all_zonelists (see
    commit 6dcd73d7011b ("memory-hotplug: allocate zone's pcp before
    onlining pages")).

    Therefore remove setup_zone_pageset from build_all_zonelists and call it
    from its only user directly. This will also remove a pointless zonelists
    rebuild, which is always good.

    Link: http://lkml.kernel.org/r/20170721143915.14161-5-mhocko@kernel.org
    Signed-off-by: Michal Hocko
    Acked-by: Vlastimil Babka
    Cc: Johannes Weiner
    Cc: Joonsoo Kim
    Cc: Mel Gorman
    Cc: Shaohua Li
    Cc: Toshi Kani
    Cc: Wen Congyang
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michal Hocko
     
  • This SLUB free list pointer obfuscation code is modified from Brad
    Spengler/PaX Team's code in the last public patch of grsecurity/PaX
    based on my understanding of the code. Changes or omissions from the
    original code are mine and don't reflect the original grsecurity/PaX
    code.

    This adds a per-cache random value to SLUB caches that is XORed with
    their freelist pointer address and value. This adds nearly zero
    overhead and frustrates the very common heap overflow exploitation
    method of overwriting freelist pointers.
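
    The XOR scheme described above can be sketched in user-space C. This
    is only an illustration under stated assumptions: cache_sketch and
    freelist_xor are hypothetical names, and the real SLUB structures
    carry far more state than the single per-cache secret shown here.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a SLUB cache; only the secret matters here. */
struct cache_sketch {
    uintptr_t random;   /* per-cache random value, picked at cache creation */
};

/*
 * Encode or decode a freelist pointer. The stored value mixes the pointer
 * with the per-cache secret and with the address the pointer is stored at.
 * XOR is its own inverse, so one function serves both directions as long
 * as the same ptr_addr is supplied.
 */
static uintptr_t freelist_xor(const struct cache_sketch *c,
                              uintptr_t ptr, uintptr_t ptr_addr)
{
    return ptr ^ c->random ^ ptr_addr;
}
```

    An attacker who overwrites the stored value without knowing the
    secret gets an unpredictable pointer back after decoding, which is
    the property the commit relies on.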

    A recent example of the attack is written up here:

    http://cyseclabs.com/blog/cve-2016-6187-heap-off-by-one-exploit

    and there is a section dedicated to the technique in the book "A Guide to
    Kernel Exploitation: Attacking the Core".

    This is based on patches by Daniel Micay, and refactored to minimize the
    use of #ifdef.

    With 200-count cycles of "hackbench -g 20 -l 1000" I saw the following
    run times:

    before:
    mean 10.11882499999999999995
    variance .03320378329145728642
    stdev .18221905304181911048

    after:
    mean 10.12654000000000000014
    variance .04700556623115577889
    stdev .21680767106160192064

    The difference gets lost in the noise, but if the above is to be taken
    literally, using CONFIG_FREELIST_HARDENED is 0.07% slower.
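
    The percentage follows from the two means alone. A small sketch of
    that arithmetic (slowdown_percent is a hypothetical helper name, not
    part of hackbench or the kernel):

```c
#include <assert.h>

/* Relative slowdown of `after` versus `before`, expressed in percent. */
static double slowdown_percent(double before, double after)
{
    return (after - before) / before * 100.0;
}
```

    With the means quoted above, slowdown_percent(10.118825, 10.12654)
    comes out at roughly 0.076, which the message truncates to 0.07%.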

    Link: http://lkml.kernel.org/r/20170802180609.GA66807@beast
    Signed-off-by: Kees Cook
    Suggested-by: Daniel Micay
    Cc: Rik van Riel
    Cc: Tycho Andersen
    Cc: Alexander Popov
    Cc: Christoph Lameter
    Cc: Pekka Enberg
    Cc: David Rientjes
    Cc: Joonsoo Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kees Cook
     

05 Sep, 2017

3 commits

  • Pull x86 mm changes from Ingo Molnar:
    "PCID support, 5-level paging support, Secure Memory Encryption support

    The main changes in this cycle are support for three new, complex
    hardware features of x86 CPUs:

    - Add 5-level paging support, which is a new hardware feature on
    upcoming Intel CPUs allowing up to 128 PB of virtual address space
    and 4 PB of physical RAM space - a 512-fold increase over the old
    limits. (Supercomputers of the future forecasting hurricanes on an
    ever warming planet can certainly make good use of more RAM.)

    Many of the necessary changes went upstream in previous cycles,
    v4.14 is the first kernel that can enable 5-level paging.

    This feature is activated via CONFIG_X86_5LEVEL=y - disabled by
    default.

    (By Kirill A. Shutemov)

    - Add 'encrypted memory' support, which is a new hardware feature on
    upcoming AMD CPUs ('Secure Memory Encryption', SME) allowing system
    RAM to be encrypted and decrypted (mostly) transparently by the
    CPU, with a little help from the kernel to transition to/from
    encrypted RAM. Such RAM should be more secure against various
    attacks like RAM access via the memory bus and should make the
    radio signature of memory bus traffic harder to intercept (and
    decrypt) as well.

    This feature is activated via CONFIG_AMD_MEM_ENCRYPT=y - disabled
    by default.

    (By Tom Lendacky)

    - Enable PCID optimized TLB flushing on newer Intel CPUs: PCID is a
    hardware feature that attaches an address space tag to TLB entries
    and thus allows TLB flushing to be skipped in many cases, even if we
    switch mm's.

    (By Andy Lutomirski)

    All three of these features were in the works for a long time, and
    it's a coincidence of the three independent development paths that they
    are all enabled in v4.14 at once"
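
    The 512-fold figure for the virtual address space can be checked
    with a few lines of arithmetic. The sketch below assumes 4 KiB pages
    (a 12-bit page offset) and 512-entry page tables (9 address bits per
    level); va_bits and va_growth are illustrative names only, and the
    sketch covers only the virtual side of the limits.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT      12  /* assumed: 4 KiB pages */
#define BITS_PER_LEVEL   9  /* assumed: 512 entries per page-table page */

/* Number of virtual address bits covered by `levels` paging levels. */
static unsigned int va_bits(unsigned int levels)
{
    return PAGE_SHIFT + BITS_PER_LEVEL * levels;
}

/* Growth factor of the virtual address space between two level counts. */
static uint64_t va_growth(unsigned int from_levels, unsigned int to_levels)
{
    return UINT64_C(1) << (va_bits(to_levels) - va_bits(from_levels));
}
```

    Four levels give 48 bits, five levels give 57 bits (2^57 bytes is
    128 PB in the message's units), and va_growth(4, 5) is 512.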

    * 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (65 commits)
    x86/mm: Enable RCU based page table freeing (CONFIG_HAVE_RCU_TABLE_FREE=y)
    x86/mm: Use pr_cont() in dump_pagetable()
    x86/mm: Fix SME encryption stack ptr handling
    kvm/x86: Avoid clearing the C-bit in rsvd_bits()
    x86/CPU: Align CR3 defines
    x86/mm, mm/hwpoison: Clear PRESENT bit for kernel 1:1 mappings of poison pages
    acpi, x86/mm: Remove encryption mask from ACPI page protection type
    x86/mm, kexec: Fix memory corruption with SME on successive kexecs
    x86/mm/pkeys: Fix typo in Documentation/x86/protection-keys.txt
    x86/mm/dump_pagetables: Speed up page tables dump for CONFIG_KASAN=y
    x86/mm: Implement PCID based optimization: try to preserve old TLB entries using PCID
    x86: Enable 5-level paging support via CONFIG_X86_5LEVEL=y
    x86/mm: Allow userspace have mappings above 47-bit
    x86/mm: Prepare to expose larger address space to userspace
    x86/mpx: Do not allow MPX if we have mappings above 47-bit
    x86/mm: Rename tasksize_32bit/64bit to task_size_32bit/64bit()
    x86/xen: Redefine XEN_ELFNOTE_INIT_P2M using PUD_SIZE * PTRS_PER_PUD
    x86/mm/dump_pagetables: Fix printout of p4d level
    x86/mm/dump_pagetables: Generalize address normalization
    x86/boot: Fix memremap() related build failure
    ...

    Linus Torvalds
     
  • Pull locking updates from Ingo Molnar:

    - Add 'cross-release' support to lockdep, which allows APIs like
    completions, where it's not the 'owner' who releases the lock, to be
    tracked. It's all activated automatically under
    CONFIG_PROVE_LOCKING=y.

    - Clean up (restructure) the x86 atomics op implementation to be more
    readable, in preparation for KASAN annotations. (Dmitry Vyukov)

    - Fix static keys (Paolo Bonzini)

    - Add killable versions of down_read() et al (Kirill Tkhai)

    - Rework and fix jump_label locking (Marc Zyngier, Paolo Bonzini)

    - Rework (and fix) tlb_flush_pending() barriers (Peter Zijlstra)

    - Remove smp_mb__before_spinlock() and convert its usages, introduce
    smp_mb__after_spinlock() (Peter Zijlstra)

    * 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (56 commits)
    locking/lockdep/selftests: Fix mixed read-write ABBA tests
    sched/completion: Avoid unnecessary stack allocation for COMPLETION_INITIALIZER_ONSTACK()
    acpi/nfit: Fix COMPLETION_INITIALIZER_ONSTACK() abuse
    locking/pvqspinlock: Relax cmpxchg's to improve performance on some architectures
    smp: Avoid using two cache lines for struct call_single_data
    locking/lockdep: Untangle xhlock history save/restore from task independence
    locking/refcounts, x86/asm: Disable CONFIG_ARCH_HAS_REFCOUNT for the time being
    futex: Remove duplicated code and fix undefined behaviour
    Documentation/locking/atomic: Finish the document...
    locking/lockdep: Fix workqueue crossrelease annotation
    workqueue/lockdep: 'Fix' flush_work() annotation
    locking/lockdep/selftests: Add mixed read-write ABBA tests
    mm, locking/barriers: Clarify tlb_flush_pending() barriers
    locking/lockdep: Make CONFIG_LOCKDEP_CROSSRELEASE and CONFIG_LOCKDEP_COMPLETIONS truly non-interactive
    locking/lockdep: Explicitly initialize wq_barrier::done::map
    locking/lockdep: Rename CONFIG_LOCKDEP_COMPLETE to CONFIG_LOCKDEP_COMPLETIONS
    locking/lockdep: Reword title of LOCKDEP_CROSSRELEASE config
    locking/lockdep: Make CONFIG_LOCKDEP_CROSSRELEASE part of CONFIG_PROVE_LOCKING
    locking/refcounts, x86/asm: Implement fast refcount overflow protection
    locking/lockdep: Fix the rollback and overwrite detection logic in crossrelease
    ...

    Linus Torvalds
     
  • Pull scheduler updates from Ingo Molnar:
    "The main changes in this cycle were:

    - fix affine wakeups (Peter Zijlstra)

    - improve CPU onlining (and general bootup) scalability on systems
    with a ridiculous number (thousands) of CPUs (Peter Zijlstra)

    - sched/numa updates (Rik van Riel)

    - sched/deadline updates (Byungchul Park)

    - sched/cpufreq enhancements and related cleanups (Viresh Kumar)

    - sched/debug enhancements (Xie XiuQi)

    - various fixes"

    * 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (27 commits)
    sched/debug: Optimize sched_domain sysctl generation
    sched/topology: Avoid pointless rebuild
    sched/topology, cpuset: Avoid spurious/wrong domain rebuilds
    sched/topology: Improve comments
    sched/topology: Fix memory leak in __sdt_alloc()
    sched/completion: Document that reinit_completion() must be called after complete_all()
    sched/autogroup: Fix error reporting printk text in autogroup_create()
    sched/fair: Fix wake_affine() for !NUMA_BALANCING
    sched/debug: Intruduce task_state_to_char() helper function
    sched/debug: Show task state in /proc/sched_debug
    sched/debug: Use task_pid_nr_ns in /proc/$pid/sched
    sched/core: Remove unnecessary initialization init_idle_bootup_task()
    sched/deadline: Change return value of cpudl_find()
    sched/deadline: Make find_later_rq() choose a closer CPU in topology
    sched/numa: Scale scan period with tasks in group and shared/private
    sched/numa: Slow down scan rate if shared faults dominate
    sched/pelt: Fix false running accounting
    sched: Mark pick_next_task_dl() and build_sched_domain() as static
    sched/cpupri: Don't re-initialize 'struct cpupri'
    sched/deadline: Don't re-initialize 'struct cpudl'
    ...

    Linus Torvalds
     

04 Sep, 2017

1 commit

  • struct timespec is not y2038 safe on 32 bit machines.
    Replace timespec with y2038 safe struct timespec64.

    Note that the patch only changes the internals without
    modifying the syscall interfaces. This will be part
    of a separate series.
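
    As a rough illustration of the limit being removed, the sketch below
    checks whether a seconds value still fits a signed 32-bit field. The
    struct and function names are hypothetical and are not the kernel's
    definitions of timespec or timespec64.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical illustration types, not the kernel's definitions. */
struct ts32_sketch { int32_t tv_sec; long tv_nsec; };
struct ts64_sketch { int64_t tv_sec; long tv_nsec; };

/*
 * A signed 32-bit seconds counter ends at INT32_MAX, which corresponds to
 * 2038-01-19 03:14:07 UTC. Anything later needs the 64-bit field.
 */
static int fits_in_32bit(int64_t seconds_since_epoch)
{
    return seconds_since_epoch >= INT32_MIN &&
           seconds_since_epoch <= INT32_MAX;
}
```

    The first second that no longer fits is 2147483648, which a 64-bit
    tv_sec stores without truncation.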

    Signed-off-by: Deepa Dinamani
    Reviewed-by: Arnd Bergmann
    Signed-off-by: Al Viro

    Deepa Dinamani
     

14 Aug, 2017

1 commit

  • The allocated debug objects are either on the free list or in the
    hashed bucket lists. So they won't get lost. However if both debug
    objects and kmemleak are enabled and kmemleak scanning is done
    while some of the debug objects are transitioning from one list to
    the others, false negative reporting of memory leaks may happen for
    those objects. For example,

    [38687.275678] kmemleak: 12 new suspected memory leaks (see
    /sys/kernel/debug/kmemleak)
    unreferenced object 0xffff92e98aabeb68 (size 40):
    comm "ksmtuned", pid 4344, jiffies 4298403600 (age 906.430s)
    hex dump (first 32 bytes):
    00 00 00 00 00 00 00 00 d0 bc db 92 e9 92 ff ff ................
    01 00 00 00 00 00 00 00 38 36 8a 61 e9 92 ff ff ........86.a....
    backtrace:
    [] kmemleak_alloc+0x4a/0xa0
    [] kmem_cache_alloc+0xe9/0x320
    [] __debug_object_init+0x3e6/0x400
    [] debug_object_activate+0x131/0x210
    [] __call_rcu+0x3f/0x400
    [] call_rcu_sched+0x1d/0x20
    [] put_object+0x2c/0x40
    [] __delete_object+0x3c/0x50
    [] delete_object_full+0x1d/0x20
    [] kmemleak_free+0x32/0x80
    [] kmem_cache_free+0x77/0x350
    [] unlink_anon_vmas+0x82/0x1e0
    [] free_pgtables+0xa1/0x110
    [] exit_mmap+0xc1/0x170
    [] mmput+0x80/0x150
    [] do_exit+0x2a9/0xd20

    The references in the debug objects may also hide a real memory leak.

    As there is no point in having kmemleak track debug object
    allocations, kmemleak checking is now disabled for debug objects.

    Signed-off-by: Waiman Long
    Signed-off-by: Thomas Gleixner
    Cc: Andrew Morton
    Link: http://lkml.kernel.org/r/1502718733-8527-1-git-send-email-longman@redhat.com

    Waiman Long
     

10 Aug, 2017

1 commit

  • init_idle_bootup_task() is called in rest_init() to switch
    the scheduling class of the boot thread to the idle class.

    The function only sets:

    idle->sched_class = &idle_sched_class;

    which has been set in init_idle() called by sched_init():

    /*
    * The idle tasks have their own, simple scheduling class:
    */
    idle->sched_class = &idle_sched_class;

    We've already set the boot thread to idle class in
    start_kernel()->sched_init()->init_idle()
    so it's unnecessary to set it again in
    start_kernel()->rest_init()->init_idle_bootup_task()

    Signed-off-by: Cheng Jian
    Signed-off-by: Xie XiuQi
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Link: http://lkml.kernel.org/r/1501838377-109720-1-git-send-email-cj.chengjian@huawei.com
    Signed-off-by: Ingo Molnar

    Cheng Jian
     

01 Aug, 2017

1 commit

  • This makes it possible to preserve basic futex support and compile out the
    PI support when RT mutexes are not available.

    Signed-off-by: Nicolas Pitre
    Signed-off-by: Thomas Gleixner
    Cc: Peter Zijlstra
    Cc: Darren Hart
    Link: http://lkml.kernel.org/r/alpine.LFD.2.20.1708010024190.5981@knanqh.ubzr

    Nicolas Pitre
     

27 Jul, 2017

1 commit

  • The percpu memory allocator is experiencing scalability issues when
    allocating and freeing large numbers of counters as in BPF.
    Additionally, there is a corner case where iteration is triggered over
    all chunks if the contig_hint is the right size, but wrong alignment.

    This patch replaces the area map allocator with a basic bitmap allocator
    implementation. Each subsequent patch will introduce new features and
    replace full scanning functions with faster non-scanning options when
    possible.

    Implementation:
    This patchset removes the area map allocator in favor of a bitmap
    allocator backed by metadata blocks. The primary goal is to provide
    consistency in performance and memory footprint with a focus on small
    allocations (< 64 bytes). The bitmap removes the heavy memmove from the
    freeing critical path and provides a consistent memory footprint. The
    metadata blocks provide a bound on the amount of scanning required by
    maintaining a set of hints.
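
    To make the basic idea concrete, here is a deliberately simplified
    first-fit bitmap allocator in user-space C. It is a sketch only: the
    unit size, chunk size, and all names (chunk_sketch, chunk_alloc,
    chunk_free) are assumptions, and it omits the metadata blocks and
    hints that bound scanning in the real patchset.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define CHUNK_UNITS 1024    /* assumed chunk size, in allocation units */

struct chunk_sketch {
    uint8_t used[CHUNK_UNITS / 8];  /* one bit per unit; 1 = allocated */
};

static int unit_used(const struct chunk_sketch *c, int i)
{
    return (c->used[i / 8] >> (i % 8)) & 1;
}

/* Mark n units starting at `start` as allocated (val=1) or free (val=0). */
static void set_range(struct chunk_sketch *c, int start, int n, int val)
{
    for (int i = start; i < start + n; i++) {
        if (val)
            c->used[i / 8] |= (uint8_t)(1u << (i % 8));
        else
            c->used[i / 8] &= (uint8_t)~(1u << (i % 8));
    }
}

/* First-fit scan: returns the first unit of a free run of n, or -1. */
static int chunk_alloc(struct chunk_sketch *c, int n)
{
    int run = 0;

    if (n <= 0 || n > CHUNK_UNITS)
        return -1;
    for (int i = 0; i < CHUNK_UNITS; i++) {
        run = unit_used(c, i) ? 0 : run + 1;
        if (run == n) {
            int start = i - n + 1;
            set_range(c, start, n, 1);
            return start;
        }
    }
    return -1;
}

/* Freeing only clears bits; no entries are shifted as in an area map. */
static void chunk_free(struct chunk_sketch *c, int start, int n)
{
    set_range(c, start, n, 0);
}
```

    The free path touches only the bits of the released range, which is
    why its cost does not depend on how many other allocations the chunk
    holds. The full scan in chunk_alloc is the part the later patches in
    the series replace with hint-based lookups.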

    In an effort to make freeing fast, the metadata is updated on the free
    path if the new free area makes a page free, a block free, or spans
    across blocks. This causes the chunk's contig hint to potentially be
    smaller than what it could allocate by up to the smaller of a page or a
    block. If the chunk's contig hint is contained within a block, a check
    occurs and the hint is kept accurate. Metadata is always kept accurate
    on allocation, so there will not be a situation where a chunk has a
    later contig hint than available.

    Evaluation:
    I have primarily done testing against a simple workload of allocation of
    1 million objects (2^20) of varying size. Deallocation was done in
    order, alternating, and in reverse. These numbers were collected after
    rebasing on top of a80099a152. I present the worst-case numbers here:

    Area Map Allocator:

    Object Size | Alloc Time (ms) | Free Time (ms)
    ----------------------------------------------
    4B | 310 | 4770
    16B | 557 | 1325
    64B | 436 | 273
    256B | 776 | 131
    1024B | 3280 | 122

    Bitmap Allocator:

    Object Size | Alloc Time (ms) | Free Time (ms)
    ----------------------------------------------
    4B | 490 | 70
    16B | 515 | 75
    64B | 610 | 80
    256B | 950 | 100
    1024B | 3520 | 200

    This data demonstrates the inability of the area map allocator to
    handle less than ideal situations. In the best case of reverse
    deallocation, the area map allocator was able to perform within range
    of the bitmap allocator. In the worst case situation, freeing took
    nearly 5 seconds for 1 million 4-byte objects. The bitmap allocator
    dramatically improves the consistency of the free path. The small
    allocations performed nearly identically regardless of the freeing
    pattern.

    While it does add to the allocation latency, the allocation scenario
    here is optimal for the area map allocator. The area map allocator runs
    into trouble when it is allocating in chunks where the latter half is
    full. It is difficult to replicate this, so I present a variant where
    the pages are second half filled. Freeing was done sequentially. Below
    are the numbers for this scenario:

    Area Map Allocator:

    Object Size | Alloc Time (ms) | Free Time (ms)
    ----------------------------------------------
    4B | 4118 | 4892
    16B | 1651 | 1163
    64B | 598 | 285
    256B | 771 | 158
    1024B | 3034 | 160

    Bitmap Allocator:

    Object Size | Alloc Time (ms) | Free Time (ms)
    ----------------------------------------------
    4B | 481 | 67
    16B | 506 | 69
    64B | 636 | 75
    256B | 892 | 90
    1024B | 3262 | 147

    The data shows a parabolic curve of performance for the area map
    allocator. This is due to the memmove operation being the dominant cost
    with the lower object sizes as more objects are packed in a chunk and at
    higher object sizes, the traversal of the chunk slots is the dominating
    cost. The bitmap allocator suffers this problem as well. The above data
    shows the inability to scale for the allocation path with the area map
    allocator and that the bitmap allocator demonstrates consistent
    performance in general.

    The second problem of additional scanning can result in the area map
    allocator completing in 52 minutes when trying to allocate 1 million
    4-byte objects with 8-byte alignment. The same workload takes
    approximately 16 seconds to complete for the bitmap allocator.

    V2:
    Fixed a bug in pcpu_alloc_first_chunk where end_offset was setting the
    bitmap using bytes instead of bits.

    Added a comment to pcpu_cnt_pop_pages to explain bitmap_weight.

    Signed-off-by: Dennis Zhou
    Reviewed-by: Josef Bacik
    Signed-off-by: Tejun Heo

    Dennis Zhou (Facebook)
     

18 Jul, 2017

1 commit

  • Since DMA addresses will effectively look like 48-bit addresses when the
    memory encryption mask is set, SWIOTLB is needed if the DMA mask of the
    device performing the DMA does not support 48 bits. SWIOTLB will be
    initialized to create decrypted bounce buffers for use by these devices.
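
    The condition described above amounts to comparing a device's DMA
    mask against a 48-bit mask. A minimal sketch, assuming the check is a
    plain mask comparison: dma_bit_mask mirrors the arithmetic of the
    kernel's DMA_BIT_MASK() macro, while needs_bounce_buffer is a
    hypothetical helper and not the function the patch adds.

```c
#include <assert.h>
#include <stdint.h>

/* Mask with the low `bits` bits set, as DMA_BIT_MASK() computes it. */
static uint64_t dma_bit_mask(unsigned int bits)
{
    return bits >= 64 ? ~UINT64_C(0) : (UINT64_C(1) << bits) - 1;
}

/*
 * Hypothetical helper: with the encryption mask set, DMA addresses look
 * like 48-bit addresses, so a device whose mask covers fewer bits cannot
 * reach them directly and has to go through bounce buffers.
 */
static int needs_bounce_buffer(uint64_t device_dma_mask)
{
    return device_dma_mask < dma_bit_mask(48);
}
```

    A 32-bit-only device would need bouncing under this check, while a
    device with a 48-bit or 64-bit mask would not.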

    Signed-off-by: Tom Lendacky
    Reviewed-by: Thomas Gleixner
    Cc: Alexander Potapenko
    Cc: Andrey Ryabinin
    Cc: Andy Lutomirski
    Cc: Arnd Bergmann
    Cc: Borislav Petkov
    Cc: Brijesh Singh
    Cc: Dave Young
    Cc: Dmitry Vyukov
    Cc: Jonathan Corbet
    Cc: Konrad Rzeszutek Wilk
    Cc: Larry Woodman
    Cc: Linus Torvalds
    Cc: Matt Fleming
    Cc: Michael S. Tsirkin
    Cc: Paolo Bonzini
    Cc: Peter Zijlstra
    Cc: Radim Krčmář
    Cc: Rik van Riel
    Cc: Toshimitsu Kani
    Cc: kasan-dev@googlegroups.com
    Cc: kvm@vger.kernel.org
    Cc: linux-arch@vger.kernel.org
    Cc: linux-doc@vger.kernel.org
    Cc: linux-efi@vger.kernel.org
    Cc: linux-mm@kvack.org
    Link: http://lkml.kernel.org/r/aa2d29b78ae7d508db8881e46a3215231b9327a7.1500319216.git.thomas.lendacky@amd.com
    Signed-off-by: Ingo Molnar

    Tom Lendacky