30 May, 2018

2 commits

  • [ Upstream commit 69c907022a7d9325cdc5c9dd064571e445df9a47 ]

    At the point of sysfs callback, the call to gup is
    done without mmap_sem (or any lock for that matter).
    This is racy. As such, use the get_user_pages_fast()
    alternative and safely avoid taking the lock, if possible.

    Signed-off-by: Davidlohr Bueso
    Signed-off-by: Tony Luck
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Davidlohr Bueso
     
  • [ Upstream commit 173a3efd3edb2ef6ef07471397c5f542a360e9c1 ]

    Looking at functions with large stack frames across all architectures
    led me to discover that BUG() suffers from the same problem as
    fortify_panic(), which I've added a workaround for already.

    In short, variables that go out of scope by calling a noreturn function
    or __builtin_unreachable() keep using stack space in functions
    afterwards.

    A workaround that was identified is to insert an empty assembler
    statement just before calling the function that doesn't return. I'm
    adding a macro "barrier_before_unreachable()" to document this, and
    insert calls to that in all instances of BUG() that currently suffer
    from this problem.
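
The workaround described above can be sketched in userspace as follows. This is a minimal illustration assuming GCC or Clang; `MY_BUG()` and `prio_class_to_index()` are hypothetical names invented for the example, and only the empty-asm form of `barrier_before_unreachable()` reflects what the commit message describes.

```c
/* Empty assembler statement: emits no instructions, but sits between
 * the live variables and the call that never returns. */
#define barrier_before_unreachable() __asm__ __volatile__("")

/* Hypothetical userspace stand-in for a BUG()-like macro. */
#define MY_BUG() \
	do { \
		barrier_before_unreachable(); \
		__builtin_trap(); \
	} while (0)

/* Maps a priority class to an index; any other value is treated as a
 * bug. Because __builtin_trap() does not return, the compiler sees no
 * path that falls off the end of this non-void function. */
int prio_class_to_index(int prio_class)
{
	switch (prio_class) {
	case 1:
		return 0;
	case 2:
		return 1;
	case 3:
		return 2;
	}
	MY_BUG();
}
```

Calling `prio_class_to_index()` with a value outside 1..3 terminates the process via the trap rather than returning an undefined value.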

    The files that saw the largest change from this had these frame sizes
    before, and much less with my patch:

    fs/ext4/inode.c:82:1: warning: the frame size of 1672 bytes is larger than 800 bytes [-Wframe-larger-than=]
    fs/ext4/namei.c:434:1: warning: the frame size of 904 bytes is larger than 800 bytes [-Wframe-larger-than=]
    fs/ext4/super.c:2279:1: warning: the frame size of 1160 bytes is larger than 800 bytes [-Wframe-larger-than=]
    fs/ext4/xattr.c:146:1: warning: the frame size of 1168 bytes is larger than 800 bytes [-Wframe-larger-than=]
    fs/f2fs/inode.c:152:1: warning: the frame size of 1424 bytes is larger than 800 bytes [-Wframe-larger-than=]
    net/netfilter/ipvs/ip_vs_core.c:1195:1: warning: the frame size of 1068 bytes is larger than 800 bytes [-Wframe-larger-than=]
    net/netfilter/ipvs/ip_vs_core.c:395:1: warning: the frame size of 1084 bytes is larger than 800 bytes [-Wframe-larger-than=]
    net/netfilter/ipvs/ip_vs_ftp.c:298:1: warning: the frame size of 928 bytes is larger than 800 bytes [-Wframe-larger-than=]
    net/netfilter/ipvs/ip_vs_ftp.c:418:1: warning: the frame size of 908 bytes is larger than 800 bytes [-Wframe-larger-than=]
    net/netfilter/ipvs/ip_vs_lblcr.c:718:1: warning: the frame size of 960 bytes is larger than 800 bytes [-Wframe-larger-than=]
    drivers/net/xen-netback/netback.c:1500:1: warning: the frame size of 1088 bytes is larger than 800 bytes [-Wframe-larger-than=]

    In case of ARC and CRIS, it turns out that the BUG() implementation
    actually does return (or at least the compiler thinks it does),
    resulting in lots of warnings about uninitialized variable use and
    leaving noreturn functions, such as:

    block/cfq-iosched.c: In function 'cfq_async_queue_prio':
    block/cfq-iosched.c:3804:1: error: control reaches end of non-void function [-Werror=return-type]
    include/linux/dmaengine.h: In function 'dma_maxpq':
    include/linux/dmaengine.h:1123:1: error: control reaches end of non-void function [-Werror=return-type]

    This makes them call __builtin_trap() instead, which should normally
    dump the stack and kill the current process, like some of the other
    architectures already do.

    I tried adding barrier_before_unreachable() to panic() and
    fortify_panic() as well, but that had very little effect, so I'm not
    submitting that patch.

    Vineet said:

    : For ARC, it is double win.
    :
    : 1. Fixes 3 -Wreturn-type warnings
    :
    : | ../net/core/ethtool.c:311:1: warning: control reaches end of non-void function
    : [-Wreturn-type]
    : | ../kernel/sched/core.c:3246:1: warning: control reaches end of non-void function
    : [-Wreturn-type]
    : | ../include/linux/sunrpc/svc_xprt.h:180:1: warning: control reaches end of
    : non-void function [-Wreturn-type]
    :
    : 2. bloat-o-meter reports code size improvements as gcc elides the
    : generated code for stack return.

    Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82365
    Link: http://lkml.kernel.org/r/20171219114112.939391-1-arnd@arndb.de
    Signed-off-by: Arnd Bergmann
    Acked-by: Vineet Gupta [arch/arc]
    Tested-by: Vineet Gupta [arch/arc]
    Cc: Mikael Starvik
    Cc: Jesper Nilsson
    Cc: Tony Luck
    Cc: Fenghua Yu
    Cc: Geert Uytterhoeven
    Cc: "David S. Miller"
    Cc: Christopher Li
    Cc: Thomas Gleixner
    Cc: Peter Zijlstra
    Cc: Kees Cook
    Cc: Ingo Molnar
    Cc: Josh Poimboeuf
    Cc: Will Deacon
    Cc: "Steven Rostedt (VMware)"
    Cc: Mark Rutland
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Arnd Bergmann
     

03 Mar, 2018

1 commit

  • [ Upstream commit 7729bebc619307a0233c86f8585a4bf3eadc7ce4 ]

    Remove the extra parenthesis.

    This bug was introduced by:

    e2339a4caa5e: ("ia64: Convert vtime to use nsec units directly")

    Signed-off-by: Valentin Ilie
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: fenghua.yu@intel.com
    Cc: linux-ia64@vger.kernel.org
    Cc: tony.luck@intel.com
    Link: http://lkml.kernel.org/r/1515193979-24873-1-git-send-email-valentin.ilie@gmail.com
    Signed-off-by: Ingo Molnar
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Valentin Ilie
     

06 Nov, 2017

1 commit

  • Pull x86 fixes from Ingo Molnar:
    "Two fixes:

    - A PCID related revert that fixes power management and performance
    regressions.

    - The module loader robustization and sanity check commit is rather
    fresh, but it looked like a good idea to apply because of the
    hidden data corruption problem such invalid modules could cause"

    * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    x86/module: Detect and skip invalid relocations
    Revert "x86/mm: Stop calling leave_mm() in idle code"

    Linus Torvalds
     

04 Nov, 2017

1 commit

  • This reverts commit 43858b4f25cf0adc5c2ca9cf5ce5fdf2532941e5.

    The reason I removed the leave_mm() calls in question is because the
    heuristic wasn't needed after that patch. With the original version
    of my PCID series, we never flushed a "lazy cpu" (i.e. a CPU running
    a kernel thread) due to a flush on the loaded mm.

    Unfortunately, that caused architectural issues, so now I've
    reinstated these flushes on non-PCID systems in:

    commit b956575bed91 ("x86/mm: Flush more aggressively in lazy TLB mode").

    That, in turn, gives us a power management and occasionally
    performance regression as compared to old kernels: a process that
    goes into a deep idle state on a given CPU and gets its mm flushed
    due to activity on a different CPU will wake the idle CPU.

    Reinstate the old ugly heuristic: a CPU that goes into ACPI C3 or an
    intel_idle state that is likely to cause a TLB flush gets its mm
    switched to init_mm before going idle.

    FWIW, this heuristic is lousy. Whether we should change CR3 before
    idle isn't a good hint except insofar as the performance hit is a bit
    lower if the TLB is getting flushed by the idle code anyway. What we
    really want to know is whether we anticipate being idle long enough
    that the mm is likely to be flushed before we wake up. This is more a
    matter of the expected latency than the idle state that gets chosen.
    This heuristic also completely fails on systems that don't know
    whether the TLB will be flushed (e.g. AMD systems?). OTOH it may be a
    bit obsolete anyway -- PCID systems don't presently benefit from this
    heuristic at all.

    We also shouldn't do this callback from the innermost bit of the idle
    code due to the RCU nastiness it causes. All the information needed is
    available before rcu_idle_enter() needs to happen.

    Signed-off-by: Andy Lutomirski
    Cc: Borislav Petkov
    Cc: Brian Gerst
    Cc: Denys Vlasenko
    Cc: H. Peter Anvin
    Cc: Josh Poimboeuf
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Fixes: 43858b4f25cf "x86/mm: Stop calling leave_mm() in idle code"
    Link: http://lkml.kernel.org/r/c513bbd4e653747213e05bc7062de000bf0202a5.1509793738.git.luto@kernel.org
    Signed-off-by: Ingo Molnar

    Andy Lutomirski
     

02 Nov, 2017

3 commits

  • Many user space API headers have licensing information, which is either
    incomplete, badly formatted or just a shorthand for referring to the
    license under which the file is supposed to be. This makes it hard for
    compliance tools to determine the correct license.

    Update these files with an SPDX license identifier. The identifier was
    chosen based on the license information in the file.

    GPL/LGPL licensed headers get the matching GPL/LGPL SPDX license
    identifier with the added 'WITH Linux-syscall-note' exception, which is
    the officially assigned exception identifier for the kernel syscall
    exception:

    NOTE! This copyright does *not* cover user programs that use kernel
    services by normal system calls - this is merely considered normal use
    of the kernel, and does *not* fall under the heading of "derived work".

    This exception makes it possible to include GPL headers into non GPL
    code, without confusing license compliance tools.

    Headers which have either explicit dual licensing or are just licensed
    under a non GPL license are updated with the corresponding SPDX
    identifier and the GPLv2 with syscall exception identifier. The format
    is:
    ((GPL-2.0 WITH Linux-syscall-note) OR SPDX-ID-OF-OTHER-LICENSE)
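
As an illustration of the format above, a dual-licensed user space API header might start like this. It is a sketch only: the header name, the structure, and the choice of BSD-3-Clause as the other license are hypothetical and not taken from any file touched by this patch.

```c
/* SPDX-License-Identifier: ((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause) */
#ifndef EXAMPLE_UAPI_H
#define EXAMPLE_UAPI_H

/* Hypothetical uapi structure, present only so that the header has
 * some content following the license identifier line. */
struct example_request {
	unsigned int flags;
	unsigned int reserved;
};

#endif /* EXAMPLE_UAPI_H */
```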

    SPDX license identifiers are a legally binding shorthand, which can be
    used instead of the full boiler plate text. The update does not remove
    existing license information as this has to be done on a case by case
    basis and the copyright holders might have to be consulted. This will
    happen in a separate step.

    This patch is based on work done by Thomas Gleixner and Kate Stewart and
    Philippe Ombredanne. See the previous patch in this series for the
    methodology of how this patch was researched.

    Reviewed-by: Kate Stewart
    Reviewed-by: Philippe Ombredanne
    Reviewed-by: Thomas Gleixner
    Signed-off-by: Greg Kroah-Hartman

    Greg Kroah-Hartman
     
  • Many user space API headers are missing licensing information, which
    makes it hard for compliance tools to determine the correct license.

    By default, files without license information are under the default
    license of the kernel, which is GPLv2. Marking them GPLv2 would exclude
    them from being included in non-GPLv2 code, which is obviously not
    intended. The user space API headers fall under the syscall exception,
    which is in the kernel's COPYING file:

    NOTE! This copyright does *not* cover user programs that use kernel
    services by normal system calls - this is merely considered normal use
    of the kernel, and does *not* fall under the heading of "derived work".

    otherwise syscall usage would not be possible.

    Update the files which contain no license information with an SPDX
    license identifier. The chosen identifier is 'GPL-2.0 WITH
    Linux-syscall-note' which is the officially assigned identifier for the
    Linux syscall exception. SPDX license identifiers are a legally binding
    shorthand, which can be used instead of the full boiler plate text.

    This patch is based on work done by Thomas Gleixner and Kate Stewart and
    Philippe Ombredanne. See the previous patch in this series for the
    methodology of how this patch was researched.

    Reviewed-by: Kate Stewart
    Reviewed-by: Philippe Ombredanne
    Reviewed-by: Thomas Gleixner
    Signed-off-by: Greg Kroah-Hartman

    Greg Kroah-Hartman
     
  • Many source files in the tree are missing licensing information, which
    makes it harder for compliance tools to determine the correct license.

    By default all files without license information are under the default
    license of the kernel, which is GPL version 2.

    Update the files which contain no license information with the 'GPL-2.0'
    SPDX license identifier. The SPDX identifier is a legally binding
    shorthand, which can be used instead of the full boiler plate text.

    This patch is based on work done by Thomas Gleixner and Kate Stewart and
    Philippe Ombredanne.

    How this work was done:

    Patches were generated and checked against linux-4.14-rc6 for a subset of
    the use cases:
    - file had no licensing information in it,
    - file was a */uapi/* one with no licensing information in it,
    - file was a */uapi/* one with existing licensing information.

    Further patches will be generated in subsequent months to fix up cases
    where non-standard license headers were used, and references to license
    had to be inferred by heuristics based on keywords.

    The analysis to determine which SPDX License Identifier to be applied to
    a file was done in a spreadsheet of side by side results from the
    output of two independent scanners (ScanCode & Windriver) producing SPDX
    tag:value files created by Philippe Ombredanne. Philippe prepared the
    base worksheet, and did an initial spot review of a few 1000 files.

    The 4.13 kernel was the starting point of the analysis with 60,537 files
    assessed. Kate Stewart did a file by file comparison of the scanner
    results in the spreadsheet to determine which SPDX license identifier(s)
    to be applied to the file. She confirmed any determination that was not
    immediately clear with lawyers working with the Linux Foundation.

    Criteria used to select files for SPDX license identifier tagging were:
    - Files considered eligible had to be source code files.
    - Make and config files were included as candidates if they contained >5
    lines of source
    - File already had some variant of a license header in it (even if
    Reviewed-by: Philippe Ombredanne
    Reviewed-by: Thomas Gleixner
    Signed-off-by: Greg Kroah-Hartman

    Greg Kroah-Hartman
     

15 Sep, 2017

1 commit

  • Pull ipc compat cleanup and 64-bit time_t from Al Viro:
    "IPC copyin/copyout sanitizing, including 64bit time_t work from Deepa
    Dinamani"

    * 'work.ipc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
    utimes: Make utimes y2038 safe
    ipc: shm: Make shmid_kernel timestamps y2038 safe
    ipc: sem: Make sem_array timestamps y2038 safe
    ipc: msg: Make msg_queue timestamps y2038 safe
    ipc: mqueue: Replace timespec with timespec64
    ipc: Make sys_semtimedop() y2038 safe
    get rid of SYSVIPC_COMPAT on ia64
    semtimedop(): move compat to native
    shmat(2): move compat to native
    msgrcv(2), msgsnd(2): move compat to native
    ipc(2): move compat to native
    ipc: make use of compat ipc_perm helpers
    semctl(): move compat to native
    semctl(): separate all layout-dependent copyin/copyout
    msgctl(): move compat to native
    msgctl(): split the actual work from copyin/copyout
    ipc: move compat shmctl to native
    shmctl: split the work from copyin/copyout

    Linus Torvalds
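
The "y2038 safe" changes listed above are about timestamps whose seconds field is a signed 32-bit integer. The sketch below shows the limit and a 64-bit alternative. The names `fits_in_32bit_seconds()` and `struct example_timespec64` are hypothetical; the struct is only modelled loosely on the idea of replacing timespec with timespec64.

```c
#include <stdint.h>

/* Returns nonzero if a seconds-since-epoch value can be stored in a
 * signed 32-bit field. The largest such value, INT32_MAX, corresponds
 * to 2038-01-19 03:14:07 UTC. */
int fits_in_32bit_seconds(int64_t seconds)
{
	return seconds >= INT32_MIN && seconds <= INT32_MAX;
}

/* Hypothetical timestamp with a 64-bit seconds field, so values past
 * the 32-bit limit remain representable. */
struct example_timespec64 {
	int64_t tv_sec;
	long tv_nsec;
};

/* Advances a timestamp by a whole number of seconds. */
struct example_timespec64
example_ts64_add_seconds(struct example_timespec64 ts, int64_t seconds)
{
	ts.tv_sec += seconds;
	return ts;
}
```

One second past `INT32_MAX` no longer fits in 32 bits, while the 64-bit field holds it without wrapping.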
     

12 Sep, 2017

1 commit

  • Pull namespace updates from Eric Biederman:
    "Life has been busy and I have not gotten half as much done this round
    as I would have liked. I delayed it so that a minor conflict
    resolution with the mips tree could spend a little time in linux-next
    before I sent this pull request.

    This includes two long delayed user namespace changes from Kirill
    Tkhai. It also includes a very useful change from Serge Hallyn that
    allows the security capability attribute to be used inside of user
    namespaces. The practical effect of this is people can now untar
    tarballs and install rpms in user namespaces. It had been suggested to
    generalize this and encode some of the namespace information in the
    xattr name. Upon close inspection that makes the things that should
    be hard easy and the things that should be easy more expensive.

    Then there is my bugfix/cleanup for signal injection that removes the
    magic encoding of the siginfo union member from the kernel internal
    si_code. The mips folks reported that the case where I had used
    FPE_FIXME is impossible, so I have removed FPE_FIXME from mips, while
    at the same time including a return statement in that case to keep gcc
    from complaining about uninitialized variables.

    I almost finished the work to make copy_siginfo_to_user a trivial
    copy to user. The code is available at:

    git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace.git neuter-copy_siginfo_to_user-v3

    But I did not have time/energy to get the code posted and reviewed
    before the merge window opened.

    I was able to see that the security excuse for just copying fields
    that we know are initialized doesn't work in practice: there are buggy
    initializations that don't initialize the proper fields in siginfo. So
    we still sometimes copy uninitialized data to userspace"

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
    Introduce v3 namespaced file capabilities
    mips/signal: In force_fcr31_sig return in the impossible case
    signal: Remove kernel interal si_code magic
    fcntl: Don't use ambiguous SIG_POLL si_codes
    prctl: Allow local CAP_SYS_ADMIN changing exe_file
    security: Use user_namespace::level to avoid redundant iterations in cap_capable()
    userns,pidns: Verify the userns for new pid namespaces
    signal/testing: Don't look for __SI_FAULT in userspace
    signal/mips: Document a conflict with SI_USER with SIGFPE
    signal/sparc: Document a conflict with SI_USER with SIGFPE
    signal/ia64: Document a conflict with SI_USER with SIGFPE
    signal/alpha: Document a conflict with SI_USER for SIGTRAP

    Linus Torvalds
     

09 Sep, 2017

1 commit

  • Pull PCI updates from Bjorn Helgaas:

    - add enhanced Downstream Port Containment support, which prints more
    details about Root Port Programmed I/O errors (Dongdong Liu)

    - add Layerscape ls1088a and ls2088a support (Hou Zhiqiang)

    - add MediaTek MT2712 and MT7622 support (Ryder Lee)

    - add MediaTek MT2712 and MT7622 MSI support (Honghui Zhang)

    - add Qualcomm IPQ8074 support (Varadarajan Narayanan)

    - add R-Car r8a7743/5 device tree support (Biju Das)

    - add Rockchip per-lane PHY support for better power management (Shawn
    Lin)

    - fix IRQ mapping for hot-added devices by replacing the
    pci_fixup_irqs() boot-time design with a host bridge hook called at
    probe-time (Lorenzo Pieralisi, Matthew Minter)

    - fix race when enabling two devices that results in upstream bridge
    not being enabled correctly (Srinath Mannam)

    - fix pciehp power fault infinite loop (Keith Busch)

    - fix SHPC bridge MSI hotplug events by enabling bus mastering
    (Aleksandr Bezzubikov)

    - fix a VFIO issue by correcting PCIe capability sizes (Alex
    Williamson)

    - fix an INTD issue on Xilinx and possibly other drivers by unifying
    INTx IRQ domain support (Paul Burton)

    - avoid IOMMU stalls by marking AMD Stoney GPU ATS as broken (Joerg
    Roedel)

    - allow APM X-Gene device assignment to guests by adding an ACS quirk
    (Feng Kan)

    - fix driver crashes by disabling Extended Tags on Broadcom HT2100
    (Extended Tags support is required for PCIe Receivers but not
    Requesters, and we now enable them by default when Requesters support
    them) (Sinan Kaya)

    - fix MSIs for devices that use phantom RIDs for DMA by assuming MSIs
    use the real Requester ID (not a phantom RID) (Robin Murphy)

    - prevent assignment of Intel VMD children to guests (which may be
    supported eventually, but isn't yet) by not associating an IOMMU with
    them (Jon Derrick)

    - fix Intel VMD suspend/resume by releasing IRQs on suspend (Scott
    Bauer)

    - fix a Function-Level Reset issue with Intel 750 NVMe by waiting
    longer (up to 60sec instead of 1sec) for device to become ready
    (Sinan Kaya)

    - fix a Function-Level Reset issue on iProc Stingray by working around
    hardware defects in the CRS implementation (Oza Pawandeep)

    - fix an issue with Intel NVMe P3700 after an iProc reset by adding a
    delay during shutdown (Oza Pawandeep)

    - fix a Microsoft Hyper-V lockdep issue by polling instead of blocking
    in compose_msi_msg() (Stephen Hemminger)

    - fix a wireless LAN driver timeout by clearing DesignWare MSI
    interrupt status after it is handled, not before (Faiz Abbas)

    - fix DesignWare ATU enable checking (Jisheng Zhang)

    - reduce Layerscape dependencies on the bootloader by doing more
    initialization in the driver (Hou Zhiqiang)

    - improve Intel VMD performance by allowing allocation of more IRQ
    vectors than present CPUs (Keith Busch)

    - improve endpoint framework support for initial DMA mask, different
    BAR sizes, configurable page sizes, MSI, test driver, etc (Kishon
    Vijay Abraham I, Stan Drozd)

    - rework CRS support to add periodic messages while we poll during
    enumeration and after Function-Level Reset and prepare for possible
    other uses of CRS (Sinan Kaya)

    - clean up Root Port AER handling by removing unnecessary code and
    moving error handler methods to struct pcie_port_service_driver
    (Christoph Hellwig)

    - clean up error handling paths in various drivers (Bjorn Andersson,
    Fabio Estevam, Gustavo A. R. Silva, Harunobu Kurokawa, Jeffy Chen,
    Lorenzo Pieralisi, Sergei Shtylyov)

    - clean up SR-IOV resource handling by disabling VF decoding before
    updating the corresponding resource structs (Gavin Shan)

    - clean up DesignWare-based drivers by unifying quirks to update Class
    Code and Interrupt Pin and related handling of write-protected
    registers (Hou Zhiqiang)

    - clean up by adding empty generic pcibios_align_resource() and
    pcibios_fixup_bus() and removing empty arch-specific implementations
    (Palmer Dabbelt)

    - request exclusive reset control for several drivers to allow cleanup
    elsewhere (Philipp Zabel)

    - constify various structures (Arvind Yadav, Bhumika Goyal)

    - convert from full_name() to %pOF (Rob Herring)

    - remove unused variables from iProc, HiSi, Altera, Keystone (Shawn
    Lin)

    * tag 'pci-v4.14-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (170 commits)
    PCI: xgene: Clean up whitespace
    PCI: xgene: Define XGENE_PCI_EXP_CAP and use generic PCI_EXP_RTCTL offset
    PCI: xgene: Fix platform_get_irq() error handling
    PCI: xilinx-nwl: Fix platform_get_irq() error handling
    PCI: rockchip: Fix platform_get_irq() error handling
    PCI: altera: Fix platform_get_irq() error handling
    PCI: spear13xx: Fix platform_get_irq() error handling
    PCI: artpec6: Fix platform_get_irq() error handling
    PCI: armada8k: Fix platform_get_irq() error handling
    PCI: dra7xx: Fix platform_get_irq() error handling
    PCI: exynos: Fix platform_get_irq() error handling
    PCI: iproc: Clean up whitespace
    PCI: iproc: Rename PCI_EXP_CAP to IPROC_PCI_EXP_CAP
    PCI: iproc: Add 500ms delay during device shutdown
    PCI: Fix typos and whitespace errors
    PCI: Remove unused "res" variable from pci_resource_io()
    PCI: Correct kernel-doc of pci_vpd_srdt_size(), pci_vpd_srdt_tag()
    PCI/AER: Reformat AER register definitions
    iommu/vt-d: Prevent VMD child devices from being remapping targets
    x86/PCI: Use is_vmd() rather than relying on the domain number
    ...

    Linus Torvalds
     

07 Sep, 2017

1 commit

  • Pull networking updates from David Miller:

    1) Support ipv6 checksum offload in sunvnet driver, from Shannon
    Nelson.

    2) Move to RB-tree instead of custom AVL code in inetpeer, from Eric
    Dumazet.

    3) Allow generic XDP to work on virtual devices, from John Fastabend.

    4) Add bpf device maps and XDP_REDIRECT, which can be used to build
    arbitrary switching frameworks using XDP. From John Fastabend.

    5) Remove UFO offloads from the tree, gave us little other than bugs.

    6) Remove the IPSEC flow cache, from Florian Westphal.

    7) Support ipv6 route offload in mlxsw driver.

    8) Support VF representors in bnxt_en, from Sathya Perla.

    9) Add support for forward error correction modes to ethtool, from
    Vidya Sagar Ravipati.

    10) Add time filter for packet scheduler action dumping, from Jamal Hadi
    Salim.

    11) Extend the zerocopy sendmsg() used by virtio and tap to regular
    sockets via MSG_ZEROCOPY. From Willem de Bruijn.

    12) Significantly rework value tracking in the BPF verifier, from Edward
    Cree.

    13) Add new jump instructions to eBPF, from Daniel Borkmann.

    14) Rework rtnetlink plumbing so that operations can be run without
    taking the RTNL semaphore. From Florian Westphal.

    15) Support XDP in tap driver, from Jason Wang.

    16) Add 32-bit eBPF JIT for ARM, from Shubham Bansal.

    17) Add Huawei hinic ethernet driver.

    18) Allow reporting of MD5 keys in TCP inet_diag dumps, from Ivan
    Delalande.

    * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1780 commits)
    i40e: point wb_desc at the nvm_wb_desc during i40e_read_nvm_aq
    i40e: avoid NVM acquire deadlock during NVM update
    drivers: net: xgene: Remove return statement from void function
    drivers: net: xgene: Configure tx/rx delay for ACPI
    drivers: net: xgene: Read tx/rx delay for ACPI
    rocker: fix kcalloc parameter order
    rds: Fix non-atomic operation on shared flag variable
    net: sched: don't use GFP_KERNEL under spin lock
    vhost_net: correctly check tx avail during rx busy polling
    net: mdio-mux: add mdio_mux parameter to mdio_mux_init()
    rxrpc: Make service connection lookup always check for retry
    net: stmmac: Delete dead code for MDIO registration
    gianfar: Fix Tx flow control deactivation
    cxgb4: Ignore MPS_TX_INT_CAUSE[Bubble] for T6
    cxgb4: Fix pause frame count in t4_get_port_stats
    cxgb4: fix memory leak
    tun: rename generic_xdp to skb_xdp
    tun: reserve extra headroom only when XDP is set
    net: dsa: bcm_sf2: Configure IMP port TC2QOS mapping
    net: dsa: bcm_sf2: Advertise number of egress queues
    ...

    Linus Torvalds
     

06 Sep, 2017

1 commit

  • Pull ACPI updates from Rafael Wysocki:
    "These include a usual ACPICA code update (this time to upstream
    revision 20170728), a fix for a boot crash on some systems with
    Thunderbolt devices connected at boot time, a rework of the handling
    of PCI bridges when setting up device wakeup, new support for Apple
    device properties, support for DMA configurations reported via ACPI on
    ARM64, APEI-related updates, ACPI EC driver updates and assorted minor
    modifications in several places.

    Specifics:

    - Update the ACPICA code in the kernel to upstream revision 20170728
    including:
    * Alias operator handling update (Bob Moore).
    * Deferred resolution of reference package elements (Bob Moore).
    * Support for the _DMA method in walk resources (Bob Moore).
    * Tables handling update and support for deferred table
    verification (Lv Zheng).
    * Update of SMMU models for IORT (Robin Murphy).
    * Compiler and disassembler updates (Alex James, Erik Schmauss,
    Ganapatrao Kulkarni, James Morse).
    * Tools updates (Erik Schmauss, Lv Zheng).
    * Assorted minor fixes and cleanups (Bob Moore, Kees Cook, Lv
    Zheng, Shao Ming).

    - Rework the initialization of non-wakeup GPEs with method handlers
    in order to address a boot crash on some systems with Thunderbolt
    devices connected at boot time where we miss an early hotplug event
    due to a delay in GPE enabling (Rafael Wysocki).

    - Rework the handling of PCI bridges when setting up ACPI-based
    device wakeup in order to avoid disabling wakeup for bridges
    prematurely (Rafael Wysocki).

    - Consolidate Apple DMI checks throughout the tree, add support for
    Apple device properties to the device properties framework and use
    these properties for the handling of I2C and SPI devices on Apple
    systems (Lukas Wunner).

    - Add support for _DMA to the ACPI-based device properties lookup
    code and make it possible to use the information from there to
    configure DMA regions on ARM64 systems (Lorenzo Pieralisi).

    - Fix several issues in the APEI code, add support for exporting the
    BERT error region over sysfs and update APEI MAINTAINERS entry with
    reviewers information (Borislav Petkov, Dongjiu Geng, Loc Ho, Punit
    Agrawal, Tony Luck, Yazen Ghannam).

    - Fix a potential initialization ordering issue in the ACPI EC driver
    and clean it up somewhat (Lv Zheng).

    - Update the ACPI SPCR driver to extend the existing XGENE 8250
    workaround in it to a new platform (m400) and to work around an
    Xgene UART clock issue (Graeme Gregory).

    - Add a new utility function to the ACPI core to support using ACPI
    OEM ID / OEM Table ID / Revision for system identification in
    blacklisting or similar and switch over the existing code already
    using this information to this new interface (Toshi Kani).

    - Fix an xpower PMIC issue related to GPADC reads that always return
    0 without extra pin manipulations (Hans de Goede).

    - Add statements to print debug messages in a couple of places in the
    ACPI core for easier diagnostics (Rafael Wysocki).

    - Clean up the ACPI processor driver slightly (Colin Ian King, Hanjun
    Guo).

    - Clean up the ACPI x86 boot code somewhat (Andy Shevchenko).

    - Add a quirk for Dell OptiPlex 9020M to the ACPI backlight driver
    (Alex Hung).

    - Assorted fixes, cleanups and updates related to ACPI (Amitoj Kaur
    Chawla, Bhumika Goyal, Frank Rowand, Jean Delvare, Punit Agrawal,
    Ronald Tschalär, Sumeet Pawnikar)"

    * tag 'acpi-4.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (75 commits)
    ACPI / APEI: Suppress message if HEST not present
    intel_pstate: convert to use acpi_match_platform_list()
    ACPI / blacklist: add acpi_match_platform_list()
    ACPI, APEI, EINJ: Subtract any matching Register Region from Trigger resources
    ACPI: make device_attribute const
    ACPI / sysfs: Extend ACPI sysfs to provide access to boot error region
    ACPI: APEI: fix the wrong iteration of generic error status block
    ACPI / processor: make function acpi_processor_check_duplicates() static
    ACPI / EC: Clean up EC GPE mask flag
    ACPI: EC: Fix possible issues related to EC initialization order
    ACPI / PM: Add debug statements to acpi_pm_notify_handler()
    ACPI: Add debug statements to acpi_global_event_handler()
    ACPI / scan: Enable GPEs before scanning the namespace
    ACPICA: Make it possible to enable runtime GPEs earlier
    ACPICA: Dispatch active GPEs at init time
    ACPI: SPCR: work around clock issue on xgene UART
    ACPI: SPCR: extend XGENE 8250 workaround to m400
    ACPI / LPSS: Don't abort ACPI scan on missing mem resource
    mailbox: pcc: Drop uninformative output during boot
    ACPI/IORT: Add IORT named component memory address limits
    ...

    Linus Torvalds
     

05 Sep, 2017

2 commits

  • Pull x86 mm changes from Ingo Molnar:
    "PCID support, 5-level paging support, Secure Memory Encryption support

    The main changes in this cycle are support for three new, complex
    hardware features of x86 CPUs:

    - Add 5-level paging support, which is a new hardware feature on
    upcoming Intel CPUs allowing up to 128 PB of virtual address space
    and 4 PB of physical RAM space - a 512-fold increase over the old
    limits. (Supercomputers of the future forecasting hurricanes on an
    ever warming planet can certainly make good use of more RAM.)

    Many of the necessary changes went upstream in previous cycles,
    v4.14 is the first kernel that can enable 5-level paging.

    This feature is activated via CONFIG_X86_5LEVEL=y - disabled by
    default.

    (By Kirill A. Shutemov)

    - Add 'encrypted memory' support, which is a new hardware feature on
    upcoming AMD CPUs ('Secure Memory Encryption', SME) allowing system
    RAM to be encrypted and decrypted (mostly) transparently by the
    CPU, with a little help from the kernel to transition to/from
    encrypted RAM. Such RAM should be more secure against various
    attacks like RAM access via the memory bus and should make the
    radio signature of memory bus traffic harder to intercept (and
    decrypt) as well.

    This feature is activated via CONFIG_AMD_MEM_ENCRYPT=y - disabled
    by default.

    (By Tom Lendacky)

    - Enable PCID optimized TLB flushing on newer Intel CPUs: PCID is a
    hardware feature that attaches an address space tag to TLB entries
    and thus allows TLB flushing to be skipped in many cases, even if we
    switch mm's.

    (By Andy Lutomirski)

    All three of these features were in the works for a long time, and
    it's a coincidence of the three independent development paths that they
    are all enabled in v4.14 at once"

    * 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (65 commits)
    x86/mm: Enable RCU based page table freeing (CONFIG_HAVE_RCU_TABLE_FREE=y)
    x86/mm: Use pr_cont() in dump_pagetable()
    x86/mm: Fix SME encryption stack ptr handling
    kvm/x86: Avoid clearing the C-bit in rsvd_bits()
    x86/CPU: Align CR3 defines
    x86/mm, mm/hwpoison: Clear PRESENT bit for kernel 1:1 mappings of poison pages
    acpi, x86/mm: Remove encryption mask from ACPI page protection type
    x86/mm, kexec: Fix memory corruption with SME on successive kexecs
    x86/mm/pkeys: Fix typo in Documentation/x86/protection-keys.txt
    x86/mm/dump_pagetables: Speed up page tables dump for CONFIG_KASAN=y
    x86/mm: Implement PCID based optimization: try to preserve old TLB entries using PCID
    x86: Enable 5-level paging support via CONFIG_X86_5LEVEL=y
    x86/mm: Allow userspace have mappings above 47-bit
    x86/mm: Prepare to expose larger address space to userspace
    x86/mpx: Do not allow MPX if we have mappings above 47-bit
    x86/mm: Rename tasksize_32bit/64bit to task_size_32bit/64bit()
    x86/xen: Redefine XEN_ELFNOTE_INIT_P2M using PUD_SIZE * PTRS_PER_PUD
    x86/mm/dump_pagetables: Fix printout of p4d level
    x86/mm/dump_pagetables: Generalize address normalization
    x86/boot: Fix memremap() related build failure
    ...

    Linus Torvalds
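
    The address-space figures quoted in the pull message above can be checked
    with a few lines of arithmetic. This is a minimal sketch, assuming the usual
    x86-64 layout of 4 KiB pages (12 offset bits) and 9 index bits per
    page-table level; the function names are hypothetical and not kernel APIs.

```c
#include <assert.h>
#include <stdint.h>

/* Virtual address width for a given number of page-table levels,
 * assuming 9 index bits per level and a 12-bit page offset. */
unsigned int va_bits(unsigned int levels)
{
	return levels * 9 + 12;
}

/* Size of the resulting virtual address space, in TiB (2^40 bytes). */
uint64_t va_space_tib(unsigned int levels)
{
	unsigned int bits = va_bits(levels);

	/* Keep the shift count in range for a 64-bit value. */
	assert(bits >= 40 && bits - 40 < 64);
	return (uint64_t)1 << (bits - 40);
}
```

    Under these assumptions, 4 levels give 48 bits (256 TiB) and 5 levels give
    57 bits (128 PiB), which is where the 512-fold factor for virtual address
    space comes from.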
     
  • Pull locking updates from Ingo Molnar:

    - Add 'cross-release' support to lockdep, which allows APIs like
    completions, where it's not the 'owner' who releases the lock, to be
    tracked. It's all activated automatically under
    CONFIG_PROVE_LOCKING=y.

    - Clean up (restructure) the x86 atomics op implementation to be more
    readable, in preparation of KASAN annotations. (Dmitry Vyukov)

    - Fix static keys (Paolo Bonzini)

    - Add killable versions of down_read() et al (Kirill Tkhai)

    - Rework and fix jump_label locking (Marc Zyngier, Paolo Bonzini)

    - Rework (and fix) tlb_flush_pending() barriers (Peter Zijlstra)

    - Remove smp_mb__before_spinlock() and convert its usages, introduce
    smp_mb__after_spinlock() (Peter Zijlstra)

    * 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (56 commits)
    locking/lockdep/selftests: Fix mixed read-write ABBA tests
    sched/completion: Avoid unnecessary stack allocation for COMPLETION_INITIALIZER_ONSTACK()
    acpi/nfit: Fix COMPLETION_INITIALIZER_ONSTACK() abuse
    locking/pvqspinlock: Relax cmpxchg's to improve performance on some architectures
    smp: Avoid using two cache lines for struct call_single_data
    locking/lockdep: Untangle xhlock history save/restore from task independence
    locking/refcounts, x86/asm: Disable CONFIG_ARCH_HAS_REFCOUNT for the time being
    futex: Remove duplicated code and fix undefined behaviour
    Documentation/locking/atomic: Finish the document...
    locking/lockdep: Fix workqueue crossrelease annotation
    workqueue/lockdep: 'Fix' flush_work() annotation
    locking/lockdep/selftests: Add mixed read-write ABBA tests
    mm, locking/barriers: Clarify tlb_flush_pending() barriers
    locking/lockdep: Make CONFIG_LOCKDEP_CROSSRELEASE and CONFIG_LOCKDEP_COMPLETIONS truly non-interactive
    locking/lockdep: Explicitly initialize wq_barrier::done::map
    locking/lockdep: Rename CONFIG_LOCKDEP_COMPLETE to CONFIG_LOCKDEP_COMPLETIONS
    locking/lockdep: Reword title of LOCKDEP_CROSSRELEASE config
    locking/lockdep: Make CONFIG_LOCKDEP_CROSSRELEASE part of CONFIG_PROVE_LOCKING
    locking/refcounts, x86/asm: Implement fast refcount overflow protection
    locking/lockdep: Fix the rollback and overwrite detection logic in crossrelease
    ...

    Linus Torvalds
     

04 Sep, 2017

1 commit

  • * acpi-x86:
    ACPI / boot: Add number of legacy IRQs to debug output
    ACPI / boot: Correct address space of __acpi_map_table()
    ACPI / boot: Don't define unused variables

    * acpi-soc:
    ACPI / LPSS: Don't abort ACPI scan on missing mem resource

    * acpi-pmic:
    ACPI / PMIC: xpower: Do pinswitch magic when reading GPADC

    * acpi-apple:
    spi: Use Apple device properties in absence of ACPI resources
    ACPI / scan: Recognize Apple SPI and I2C slaves
    ACPI / property: Support Apple _DSM properties
    ACPI / property: Don't evaluate objects for devices w/o handle
    treewide: Consolidate Apple DMI checks

    Rafael J. Wysocki
     

26 Aug, 2017

2 commits

  • Conflicts:
    arch/x86/kernel/head64.c
    arch/x86/mm/mmap.c

    Signed-off-by: Ingo Molnar

    Ingo Molnar
     
  • There is code duplicated over all architecture's headers for
    futex_atomic_op_inuser. Namely op decoding, access_ok check for uaddr,
    and comparison of the result.

    Remove this duplication and leave up to the arches only the needed
    assembly which is now in arch_futex_atomic_op_inuser.

    This effectively distributes Will Deacon's arm64 fix for undefined
    behaviour reported by UBSAN to all architectures. The fix was done in
    commit 5f16a046f8e1 (arm64: futex: Fix undefined behaviour with
    FUTEX_OP_OPARG_SHIFT usage). Look there for an example dump.

    And as suggested by Thomas, check for negative oparg too, because it was
    also reported to cause undefined behaviour.

    Note that s390 removed access_ok check in d12a29703 ("s390/uaccess:
    remove pointless access_ok() checks") as access_ok there returns true.
    We introduce it back to the helper for the sake of simplicity (it gets
    optimized away anyway).

    Signed-off-by: Jiri Slaby
    Signed-off-by: Thomas Gleixner
    Acked-by: Russell King
    Acked-by: Michael Ellerman (powerpc)
    Acked-by: Heiko Carstens [s390]
    Acked-by: Chris Metcalf [for tile]
    Reviewed-by: Darren Hart (VMware)
    Reviewed-by: Will Deacon [core/arm64]
    Cc: linux-mips@linux-mips.org
    Cc: Rich Felker
    Cc: linux-ia64@vger.kernel.org
    Cc: linux-sh@vger.kernel.org
    Cc: peterz@infradead.org
    Cc: Benjamin Herrenschmidt
    Cc: Max Filippov
    Cc: Paul Mackerras
    Cc: sparclinux@vger.kernel.org
    Cc: Jonas Bonn
    Cc: linux-s390@vger.kernel.org
    Cc: linux-arch@vger.kernel.org
    Cc: Yoshinori Sato
    Cc: linux-hexagon@vger.kernel.org
    Cc: Helge Deller
    Cc: "James E.J. Bottomley"
    Cc: Catalin Marinas
    Cc: Matt Turner
    Cc: linux-snps-arc@lists.infradead.org
    Cc: Fenghua Yu
    Cc: Arnd Bergmann
    Cc: linux-xtensa@linux-xtensa.org
    Cc: Stefan Kristiansson
    Cc: openrisc@lists.librecores.org
    Cc: Ivan Kokshaysky
    Cc: Stafford Horne
    Cc: linux-arm-kernel@lists.infradead.org
    Cc: Richard Henderson
    Cc: Chris Zankel
    Cc: Michal Simek
    Cc: Tony Luck
    Cc: linux-parisc@vger.kernel.org
    Cc: Vineet Gupta
    Cc: Ralf Baechle
    Cc: Richard Kuo
    Cc: linux-alpha@vger.kernel.org
    Cc: Martin Schwidefsky
    Cc: linuxppc-dev@lists.ozlabs.org
    Cc: "David S. Miller"
    Link: http://lkml.kernel.org/r/20170824073105.3901-1-jslaby@suse.cz

    Jiri Slaby
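
    The op decoding that the commit above moves into generic code can be
    sketched in user space as follows. This is an illustration only: the struct,
    the function names and the SKETCH_ constant are hypothetical, the bit layout
    is the futex op encoding as I understand it, and the upper bound of 31 on
    the shift count is an assumption about how the undefined shift is avoided.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Flag bit in the op field meaning "use 1 << oparg"; defined locally
 * so the sketch builds without kernel headers. */
#define SKETCH_FUTEX_OP_OPARG_SHIFT 8

struct futex_op_fields {
	int op;     /* operation, low 3 bits of the op field */
	int cmp;    /* comparison selector */
	int oparg;  /* operation argument, after the optional shift */
	int cmparg; /* comparison argument */
};

/* Sign-extend a 12-bit field without shifting negative values. */
static int sign_extend12(uint32_t v)
{
	v &= 0xfff;
	return (v & 0x800) ? (int)v - 0x1000 : (int)v;
}

/* Decode encoded_op once, in common code. Returns 0 or -EINVAL. */
int decode_futex_op(uint32_t encoded_op, struct futex_op_fields *out)
{
	assert(out);
	out->op = (encoded_op >> 28) & 7;
	out->cmp = (encoded_op >> 24) & 15;
	out->oparg = sign_extend12(encoded_op >> 12);
	out->cmparg = sign_extend12(encoded_op);

	if (encoded_op & ((uint32_t)SKETCH_FUTEX_OP_OPARG_SHIFT << 28)) {
		/* Shifting by a negative or too-large count is undefined. */
		if (out->oparg < 0 || out->oparg > 31)
			return -EINVAL;
		out->oparg = (int)(1u << out->oparg);
	}
	return 0;
}
```

    With the decoding in one place, each architecture only has to supply the
    atomic read-modify-write itself, and the range check protects all of them.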
     

21 Aug, 2017

1 commit


17 Aug, 2017

1 commit

  • There is no agreed-upon definition of spin_unlock_wait()'s semantics,
    and it appears that all callers could do just as well with a lock/unlock
    pair. This commit therefore removes the underlying arch-specific
    arch_spin_unlock_wait() for all architectures providing them.

    Signed-off-by: Paul E. McKenney
    Cc:
    Cc: Peter Zijlstra
    Cc: Alan Stern
    Cc: Andrea Parri
    Cc: Linus Torvalds
    Acked-by: Will Deacon
    Acked-by: Boqun Feng

    Paul E. McKenney
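
    The "lock/unlock pair" that the commit above suggests in place of
    spin_unlock_wait() can be illustrated with a toy user-space spinlock. This
    is a minimal sketch built on C11 atomics; the toy_ names are hypothetical
    and the lock is a plain test-and-set loop, not the kernel implementation.

```c
#include <assert.h>
#include <stdatomic.h>

/* Toy test-and-set spinlock standing in for a kernel spinlock. */
struct toy_spinlock {
	atomic_flag held;
};

void toy_spin_lock(struct toy_spinlock *l)
{
	while (atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire))
		; /* spin until the current holder releases */
}

void toy_spin_unlock(struct toy_spinlock *l)
{
	atomic_flag_clear_explicit(&l->held, memory_order_release);
}

/* Wait for any current holder by taking and immediately dropping the
 * lock, which has well-defined acquire/release ordering. */
void toy_wait_for_unlock(struct toy_spinlock *l)
{
	assert(l);
	toy_spin_lock(l);
	toy_spin_unlock(l);
}
```

    The pair costs a little more than a bare wait, but its ordering guarantees
    are the ordinary ones for a critical section, so callers do not depend on
    semantics that were never agreed upon.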
     

16 Aug, 2017

1 commit


11 Aug, 2017

2 commits

  • Nadav reported that parallel MADV_DONTNEED on the same range has a stale
    TLB problem; Mel fixed it[1] and found the same problem with MADV_FREE[2].

    Quote from Mel Gorman:
    "The race in question is CPU 0 running madv_free and updating some PTEs
    while CPU 1 is also running madv_free and looking at the same PTEs.
    CPU 1 may have writable TLB entries for a page but fail the pte_dirty
    check (because CPU 0 has updated it already) and potentially fail to
    flush.

    Hence, when madv_free on CPU 1 returns, there are still potentially
    writable TLB entries and the underlying PTE is still present so that a
    subsequent write does not necessarily propagate the dirty bit to the
    underlying PTE any more. Reclaim at some unknown time at the future
    may then see that the PTE is still clean and discard the page even
    though a write has happened in the meantime. I think this is possible
    but I could have missed some protection in madv_free that prevents it
    happening."

    This patch aims to solve both problems at once and also prepares for the
    other problem involving KSM, MADV_FREE and soft-dirty[3].

    The TLB batch API (tlb_[gather|finish]_mmu) uses [inc|dec]_tlb_flush_pending
    and mmu_tlb_flush_pending so that, when tlb_finish_mmu is called, we can
    detect that parallel threads are in flight. In that case, forcibly flush
    the TLB to prevent the user from accessing memory via a stale TLB entry,
    even if this thread failed to gather any page table entries.

    I confirmed this patch works with [4] test program Nadav gave so this
    patch supersedes "mm: Always flush VMA ranges affected by zap_page_range
    v2" in current mmotm.

    NOTE:

    This patch modifies the arch-specific TLB gathering interface (x86, ia64,
    s390, sh, um). Most architectures seem straightforward, but s390 needs
    care because tlb_flush_mmu works only if mm->context.flush_mm is set to
    non-zero, which happens only when a pte entry really is cleared by
    ptep_get_and_clear and friends. In this case, however, the pte entries are
    never changed, yet a flush is still needed to prevent memory access
    through a stale TLB entry.

    [1] http://lkml.kernel.org/r/20170725101230.5v7gvnjmcnkzzql3@techsingularity.net
    [2] http://lkml.kernel.org/r/20170725100722.2dxnmgypmwnrfawp@suse.de
    [3] http://lkml.kernel.org/r/BD3A0EBE-ECF4-41D4-87FA-C755EA9AB6BD@gmail.com
    [4] https://patchwork.kernel.org/patch/9861621/

    [minchan@kernel.org: decrease tlb flush pending count in tlb_finish_mmu]
    Link: http://lkml.kernel.org/r/20170808080821.GA31730@bbox
    Link: http://lkml.kernel.org/r/20170802000818.4760-7-namit@vmware.com
    Signed-off-by: Minchan Kim
    Signed-off-by: Nadav Amit
    Reported-by: Nadav Amit
    Reported-by: Mel Gorman
    Acked-by: Mel Gorman
    Cc: Ingo Molnar
    Cc: Russell King
    Cc: Tony Luck
    Cc: Martin Schwidefsky
    Cc: "David S. Miller"
    Cc: Heiko Carstens
    Cc: Yoshinori Sato
    Cc: Jeff Dike
    Cc: Andrea Arcangeli
    Cc: Andy Lutomirski
    Cc: Hugh Dickins
    Cc: Mel Gorman
    Cc: Nadav Amit
    Cc: Rik van Riel
    Cc: Sergey Senozhatsky
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Minchan Kim
     
  • This patch is a preparatory patch for solving race problems caused by
    TLB batch. For that, we will increase/decrease TLB flush pending count
    of mm_struct whenever tlb_[gather|finish]_mmu is called.

    To keep that change simple, this patch first separates the architecture
    specific part, renames it to arch_tlb_[gather|finish]_mmu, and has the
    generic part just call it.

    It shouldn't change any behavior.

    Link: http://lkml.kernel.org/r/20170802000818.4760-5-namit@vmware.com
    Signed-off-by: Minchan Kim
    Signed-off-by: Nadav Amit
    Acked-by: Mel Gorman
    Cc: Ingo Molnar
    Cc: Russell King
    Cc: Tony Luck
    Cc: Martin Schwidefsky
    Cc: "David S. Miller"
    Cc: Heiko Carstens
    Cc: Yoshinori Sato
    Cc: Jeff Dike
    Cc: Andrea Arcangeli
    Cc: Andy Lutomirski
    Cc: Hugh Dickins
    Cc: Mel Gorman
    Cc: Nadav Amit
    Cc: Rik van Riel
    Cc: Sergey Senozhatsky
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Minchan Kim
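
    The pending-count idea behind the two TLB batching commits above can be
    modelled in a few lines. This is a user-space sketch under stated
    assumptions: the toy_ names are hypothetical, the counter lives in a
    stand-in for mm_struct, and the actual flush is left as a comment.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Toy stand-in for the relevant part of mm_struct. */
struct toy_mm {
	atomic_int tlb_flush_pending; /* number of batches in flight */
};

/* Start of a batch: announce that a flush is pending on this mm. */
void toy_tlb_gather(struct toy_mm *mm)
{
	atomic_fetch_add(&mm->tlb_flush_pending, 1);
}

/* End of a batch. Returns true when another batch was in flight, in
 * which case the caller must flush even if it gathered nothing. */
bool toy_tlb_finish(struct toy_mm *mm)
{
	bool force = atomic_load(&mm->tlb_flush_pending) > 1;

	/* The arch-specific finish would run here, flushing if force is set. */

	int prev = atomic_fetch_sub(&mm->tlb_flush_pending, 1);
	assert(prev > 0); /* every finish must pair with a gather */
	return force;
}
```

    A thread that finds the PTEs already updated by a racing thread gathers
    nothing, but the count above 1 tells it that stale writable TLB entries may
    still exist, so it flushes anyway.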
     

04 Aug, 2017

1 commit

  • The send call ignores unknown flags. Legacy applications may already
    unwittingly pass MSG_ZEROCOPY. Continue to ignore this flag unless a
    socket opts in to zerocopy.

    Introduce socket option SO_ZEROCOPY to enable MSG_ZEROCOPY processing.
    Processes can also query this socket option to detect kernel support
    for the feature. Older kernels will return ENOPROTOOPT.

    Signed-off-by: Willem de Bruijn
    Signed-off-by: David S. Miller

    Willem de Bruijn
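
    From user space, the opt-in and the feature probe described above might
    look like the following. This is a hedged sketch: the zc_ helpers and the
    enum are hypothetical names, and the fallback value 60 for SO_ZEROCOPY is
    an assumption that holds for the generic socket option numbering only.

```c
#include <assert.h>
#include <errno.h>
#include <sys/socket.h>

#ifndef SO_ZEROCOPY
#define SO_ZEROCOPY 60 /* assumption: generic value, some architectures differ */
#endif

enum zc_support { ZC_SUPPORTED, ZC_UNSUPPORTED, ZC_ERROR };

/* Interpret the result of a getsockopt(SO_ZEROCOPY) probe. */
enum zc_support zc_classify(int rc, int err)
{
	if (rc == 0)
		return ZC_SUPPORTED;
	/* Older kernels do not know the option at all. */
	return err == ENOPROTOOPT ? ZC_UNSUPPORTED : ZC_ERROR;
}

/* Query the option on an existing socket to detect kernel support. */
enum zc_support zc_probe(int fd)
{
	int val = 0;
	socklen_t len = sizeof(val);
	int rc = getsockopt(fd, SOL_SOCKET, SO_ZEROCOPY, &val, &len);

	return zc_classify(rc, errno);
}

/* Opt the socket in. Without this, MSG_ZEROCOPY passed to send()
 * keeps being ignored. Returns 0, or -1 with errno set. */
int zc_enable(int fd)
{
	int one = 1;

	assert(fd >= 0); /* passing an invalid descriptor is a caller bug */
	return setsockopt(fd, SOL_SOCKET, SO_ZEROCOPY, &one, sizeof(one));
}
```

    Keeping the classification separate from the system call makes the
    ENOPROTOOPT case easy to handle explicitly, for example by falling back to
    ordinary copying sends.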
     

03 Aug, 2017

1 commit

  • Multiple architectures define this as a trivial function, and I'm adding
    another one as part of the RISC-V port. Add a __weak version of
    pcibios_align_resource() and delete the now-obsolete ones in a handful of
    ports.

    The only functional change should be that a handful of ports used to export
    pcibios_fixup_bus(). Only some architectures export this, so I just
    dropped it.

    Signed-off-by: Palmer Dabbelt
    Signed-off-by: Bjorn Helgaas

    Palmer Dabbelt
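
    The weak-default pattern used by the commit above can be shown in user
    space. This is a minimal sketch, assuming GCC or Clang for
    __attribute__((weak)); the toy_ types and function are hypothetical, and
    the choice of returning the resource start unchanged is an assumption about
    what a "trivial" implementation does.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t toy_resource_size_t;

struct toy_resource {
	toy_resource_size_t start;
	toy_resource_size_t end;
};

/* Generic default: impose no extra alignment and keep the start as is.
 * A port that needs more supplies a non-weak definition of the same
 * symbol in another file, and the linker picks that one instead. */
__attribute__((weak))
toy_resource_size_t toy_align_resource(void *data,
				       const struct toy_resource *res,
				       toy_resource_size_t size,
				       toy_resource_size_t align)
{
	(void)data;
	(void)size;
	(void)align;
	assert(res);
	return res->start;
}
```

    Ports that were carrying an identical trivial copy can then simply delete
    it, since the weak definition covers them.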
     

25 Jul, 2017

2 commits

  • Sparse complains about wrong address space used in __acpi_map_table()
    and in __acpi_unmap_table().

    arch/x86/kernel/acpi/boot.c:127:29: warning: incorrect type in return expression (different address spaces)
    arch/x86/kernel/acpi/boot.c:127:29: expected char *
    arch/x86/kernel/acpi/boot.c:127:29: got void [noderef] *
    arch/x86/kernel/acpi/boot.c:135:23: warning: incorrect type in argument 1 (different address spaces)
    arch/x86/kernel/acpi/boot.c:135:23: expected void [noderef] *addr
    arch/x86/kernel/acpi/boot.c:135:23: got char *map

    Correct the address space so that it matches the types of the returned
    value and the passed parameter.

    Reviewed-by: Hanjun Guo
    Signed-off-by: Andy Shevchenko
    Signed-off-by: Rafael J. Wysocki

    Andy Shevchenko
     
  • struct siginfo is a union and the kernel since 2.4 has been hiding a union
    tag in the high 16bits of si_code using the values:
    __SI_KILL
    __SI_TIMER
    __SI_POLL
    __SI_FAULT
    __SI_CHLD
    __SI_RT
    __SI_MESGQ
    __SI_SYS

    While this looks plausible on the surface, in practice this situation has
    not worked well.

    - Injected positive signals are not copied to user space properly
    unless they have these magic high bits set.

    - Injected positive signals are not reported properly by signalfd
    unless they have these magic high bits set.

    - These kernel internal values leaked to userspace via ptrace_peek_siginfo

    - It was possible to inject these kernel internal values and cause the
    kernel to misbehave.

    - Kernel developers got confused and expected these kernel internal values
    in userspace in kernel self tests.

    - Kernel developers got confused and set si_code to __SI_FAULT which
    is SI_USER in userspace which causes userspace to think an ordinary user
    sent the signal and that it was not kernel generated.

    - The values make it impossible to reorganize the code to transform
    siginfo_copy_to_user into a plain copy_to_user, as si_code must
    be massaged before being passed to userspace.

    So remove these kernel internal si codes and make the kernel code simpler
    and more maintainable.

    To replace these kernel internal magic si_codes introduce the helper
    function siginfo_layout, that takes a signal number and an si_code and
    computes which union member of siginfo is being used. Have
    siginfo_layout return an enumeration so that gcc will have enough
    information to warn if a switch statement does not handle all of union
    members.

    A couple of architectures have a messed up ABI that defines signal
    specific duplications of SI_USER which causes more special cases in
    siginfo_layout than I would like. The good news is only problem
    architectures pay the cost.

    Update all of the code that used the previous magic __SI_ values to
    use the new SIL_ values and to call siginfo_layout to get those
    values. Except where not all of the cases are handled, remove the
    defaults in the switch statements so that if a new case is missed in
    the future the lack will show up at compile time.

    Modify the code that copies siginfo si_code to userspace to just copy
    the value and not cast si_code to a short first. The high bits are no
    longer used to hold a magic union member.

    Fixup the siginfo header files to stop including the __SI_ values in
    their constants and for the headers that were missing it to properly
    update the number of si_codes for each signal type.

    The fixes to the copy_siginfo_from_user32 implementations have the
    interesting property that several of them previously should never have
    worked, as the __SI_ values they depended upon were kernel internal.
    With that dependency gone those implementations should work much
    better.

    The idea of not passing the __SI_ values out to userspace and then
    not reinserting them has been tested with criu and criu worked without
    changes.

    Ref: 2.4.0-test1
    Signed-off-by: "Eric W. Biederman"

    Eric W. Biederman
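
    The approach described above, deriving the union member from the signal
    number and si_code and returning an enumeration, can be sketched as follows.
    This is a heavily simplified illustration: the toy_ names, the reduced set
    of layouts and the locally defined si_code values are all assumptions made
    for the example, not the kernel's definitions.

```c
#include <assert.h>
#include <signal.h>

/* si_code values at or below zero mean "not generated by the kernel".
 * Defined locally (assumption: Linux values) to keep the sketch portable. */
#define TOY_SI_USER  0
#define TOY_SI_QUEUE (-1)
#define TOY_SI_TIMER (-2)

enum toy_siginfo_layout {
	TOY_SIL_KILL,
	TOY_SIL_TIMER,
	TOY_SIL_FAULT,
	TOY_SIL_RT,
};

/* Work out which union member is in use from (sig, si_code) alone,
 * with no tag hidden in the high bits of si_code. */
enum toy_siginfo_layout toy_siginfo_layout(int sig, int si_code)
{
	assert(sig > 0);
	if (si_code <= TOY_SI_USER) {
		if (si_code == TOY_SI_TIMER)
			return TOY_SIL_TIMER;
		if (si_code == TOY_SI_QUEUE)
			return TOY_SIL_RT;
		return TOY_SIL_KILL;
	}
	/* Positive si_code: kernel generated, so the signal number decides. */
	if (sig == SIGSEGV || sig == SIGILL || sig == SIGFPE)
		return TOY_SIL_FAULT;
	return TOY_SIL_KILL;
}

/* No default label: with -Wswitch the compiler reports any layout
 * that is added to the enum later but not handled here. */
const char *toy_layout_member(enum toy_siginfo_layout layout)
{
	switch (layout) {
	case TOY_SIL_KILL:  return "_kill";
	case TOY_SIL_TIMER: return "_timer";
	case TOY_SIL_FAULT: return "_sigfault";
	case TOY_SIL_RT:    return "_rt";
	}
	return "unknown"; /* not reached for valid enum values */
}
```

    Note how SIGSEGV with si_code 0 maps to the kill layout in this sketch: a
    si_code of 0 reads as SI_USER, which is exactly why signal specific codes
    must not reuse that value.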
     

20 Jul, 2017

1 commit

  • Setting si_code to __SI_FAULT results in userspace seeing
    an si_code of 0. This is the same si_code as SI_USER. POSIX
    and common sense require that SI_USER not be a signal specific
    si_code. As such this use of 0 for the si_code is a pretty
    horribly broken ABI.

    Given that ia64 is on its last legs I don't know that it is worth
    fixing this, but it is worth documenting what is going on so that
    no one decides to copy this bad decision.

    This was introduced in 2.3.51 so this mess has had a long time for
    people to be able to start depending on it.

    Cc: Tony Luck
    Cc: Fenghua Yu
    Cc: linux-ia64@vger.kernel.org
    Signed-off-by: "Eric W. Biederman"

    Eric W. Biederman
     

18 Jul, 2017

2 commits

  • The efi_mem_type() function currently returns a 0, which maps to
    EFI_RESERVED_TYPE, if the function is unable to find a memmap entry for
    the supplied physical address. Returning EFI_RESERVED_TYPE implies that
    a memmap entry exists, when it doesn't. Instead of returning 0, change
    the function to return a negative error value when no memmap entry is
    found.

    Signed-off-by: Tom Lendacky
    Reviewed-by: Thomas Gleixner
    Reviewed-by: Matt Fleming
    Reviewed-by: Borislav Petkov
    Cc: Alexander Potapenko
    Cc: Andrey Ryabinin
    Cc: Andy Lutomirski
    Cc: Arnd Bergmann
    Cc: Borislav Petkov
    Cc: Brijesh Singh
    Cc: Dave Young
    Cc: Dmitry Vyukov
    Cc: Jonathan Corbet
    Cc: Konrad Rzeszutek Wilk
    Cc: Larry Woodman
    Cc: Linus Torvalds
    Cc: Michael S. Tsirkin
    Cc: Paolo Bonzini
    Cc: Peter Zijlstra
    Cc: Radim Krčmář
    Cc: Rik van Riel
    Cc: Toshimitsu Kani
    Cc: kasan-dev@googlegroups.com
    Cc: kvm@vger.kernel.org
    Cc: linux-arch@vger.kernel.org
    Cc: linux-doc@vger.kernel.org
    Cc: linux-efi@vger.kernel.org
    Cc: linux-mm@kvack.org
    Link: http://lkml.kernel.org/r/7fbf40a9dc414d5da849e1ddcd7f7c1285e4e181.1500319216.git.thomas.lendacky@amd.com
    Signed-off-by: Ingo Molnar

    Tom Lendacky
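
    The reason for returning a negative value rather than 0 can be seen in a
    small model of the lookup. This is a sketch only: the toy_ names and the
    descriptor layout are hypothetical, and -EINVAL is an assumed choice of
    error code, since the message above only says "a negative error value".

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_EFI_RESERVED_TYPE       0 /* a valid type, so 0 cannot mean "not found" */
#define TOY_EFI_CONVENTIONAL_MEMORY 7

struct toy_memdesc {
	uint64_t phys_addr; /* start of the region */
	uint64_t num_bytes; /* length of the region */
	int type;
};

/* Return the type of the entry covering phys_addr, or -EINVAL if no
 * entry covers it. Valid types are never negative. */
int toy_mem_type(const struct toy_memdesc *map, size_t n, uint64_t phys_addr)
{
	assert(map || n == 0);
	for (size_t i = 0; i < n; i++) {
		const struct toy_memdesc *md = &map[i];

		if (phys_addr >= md->phys_addr &&
		    phys_addr - md->phys_addr < md->num_bytes)
			return md->type;
	}
	return -EINVAL;
}
```

    A caller can now tell "this address is in a reserved region" (0) apart from
    "the memory map has no entry for this address" (negative).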
     
  • The SME patches we are about to apply add some E820 logic, so merge in
    pending E820 code changes first, to have a single code base.

    Signed-off-by: Ingo Molnar

    Ingo Molnar
     

16 Jul, 2017

2 commits


14 Jul, 2017

1 commit

  • Pull more Kbuild updates from Masahiro Yamada:

    - Move generic-y of exported headers to uapi/asm/Kbuild for complete
    de-coupling of UAPI

    - Clean up scripts/Makefile.headersinst

    - Fix host programs for 32 bit machine with XFS file system

    * tag 'kbuild-v4.13-2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: (29 commits)
    kbuild: Enable Large File Support for hostprogs
    kbuild: remove wrapper files handling from Makefile.headersinst
    kbuild: split exported generic header creation into uapi-asm-generic
    kbuild: do not include old-kbuild-file from Makefile.headersinst
    xtensa: move generic-y of exported headers to uapi/asm/Kbuild
    unicore32: move generic-y of exported headers to uapi/asm/Kbuild
    tile: move generic-y of exported headers to uapi/asm/Kbuild
    sparc: move generic-y of exported headers to uapi/asm/Kbuild
    sh: move generic-y of exported headers to uapi/asm/Kbuild
    parisc: move generic-y of exported headers to uapi/asm/Kbuild
    openrisc: move generic-y of exported headers to uapi/asm/Kbuild
    nios2: move generic-y of exported headers to uapi/asm/Kbuild
    nios2: remove unneeded arch/nios2/include/(generated/)asm/signal.h
    microblaze: move generic-y of exported headers to uapi/asm/Kbuild
    metag: move generic-y of exported headers to uapi/asm/Kbuild
    m68k: move generic-y of exported headers to uapi/asm/Kbuild
    m32r: move generic-y of exported headers to uapi/asm/Kbuild
    ia64: remove redundant generic-y += kvm_para.h from asm/Kbuild
    hexagon: move generic-y of exported headers to uapi/asm/Kbuild
    h8300: move generic-y of exported headers to uapi/asm/Kbuild
    ...

    Linus Torvalds
     

13 Jul, 2017

3 commits

  • Make the use of inline consistent with the rest of the kernel.

    Link: http://lkml.kernel.org/r/f42b2202bd0d4e7ccf79ce5348bb255a035e67bb.1499284835.git.joe@perches.com
    Signed-off-by: Joe Perches
    Cc: Tony Luck
    Cc: Fenghua Yu
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Joe Perches
     
  • Make the use of inline consistent with the rest of the kernel.

    Link: http://lkml.kernel.org/r/d47074493af80ce12590340294bc49618165c30d.1499284835.git.joe@perches.com
    Signed-off-by: Joe Perches
    Cc: Tony Luck
    Cc: Fenghua Yu
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Joe Perches
     
  • As Eric said,
    "what we need to do is move the variable vmcoreinfo_note out of the
    kernel's .bss section. And modify the code to regenerate and keep this
    information in something like the control page.

    Definitely something like this needs a page all to itself, and ideally
    far away from any other kernel data structures. I clearly was not
    watching closely the data someone decided to keep this silly thing in
    the kernel's .bss section."

    This patch allocates extra pages for these vmcoreinfo_XXX variables. One
    advantage is that it improves the safety of vmcoreinfo, because
    vmcoreinfo is now kept far away from other kernel data structures.

    Link: http://lkml.kernel.org/r/1493281021-20737-1-git-send-email-xlpang@redhat.com
    Signed-off-by: Xunlei Pang
    Tested-by: Michael Holzheu
    Reviewed-by: Juergen Gross
    Suggested-by: Eric Biederman
    Cc: Benjamin Herrenschmidt
    Cc: Dave Young
    Cc: Hari Bathini
    Cc: Mahesh Salgaonkar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Xunlei Pang
     

10 Jul, 2017

1 commit


08 Jul, 2017

1 commit

  • …/masahiroy/linux-kbuild

    Pull Kbuild thin archives updates from Masahiro Yamada:
    "Thin archives migration by Nicholas Piggin.

    THIN_ARCHIVES has been available for a while as an optional feature
    only for PowerPC architecture, but we do not need two different
    intermediate-artifact schemes.

    Using thin archives instead of conventional incremental linking has
    various advantages:

    - save disk space for builds

    - speed-up building a little

    - fix some link issues (for example, allyesconfig on ARM) due to more
    flexibility for the final linking

    - work better with dead code elimination we are planning

    As discussed before, this migration has been done unconditionally so
    that any problems caused by this will show up with "git bisect".

    Testing with 0-day and linux-next did show problems on some
    architectures, but they were trivial and are all fixed now"

    * tag 'kbuild-thinar-v4.13' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
    tile: remove unneeded extra-y in Makefile
    kbuild: thin archives make default for all archs
    x86/um: thin archives build fix
    tile: thin archives fix linking
    ia64: thin archives fix linking
    sh: thin archives fix linking
    kbuild: handle libs-y archives separately from built-in.o archives
    kbuild: thin archives use P option to ar
    kbuild: thin archives final link close --whole-archives option
    ia64: remove unneeded extra-y in Makefile.gate
    tile: fix dependency and .*.cmd inclusion for incremental build
    sparc64: Use indirect calls in hamming weight stubs

    Linus Torvalds
     

07 Jul, 2017

2 commits

  • Merge misc updates from Andrew Morton:

    - a few hotfixes

    - various misc updates

    - ocfs2 updates

    - most of MM

    * emailed patches from Andrew Morton : (108 commits)
    mm, memory_hotplug: move movable_node to the hotplug proper
    mm, memory_hotplug: drop CONFIG_MOVABLE_NODE
    mm, memory_hotplug: drop artificial restriction on online/offline
    mm: memcontrol: account slab stats per lruvec
    mm: memcontrol: per-lruvec stats infrastructure
    mm: memcontrol: use generic mod_memcg_page_state for kmem pages
    mm: memcontrol: use the node-native slab memory counters
    mm: vmstat: move slab statistics from zone to node counters
    mm/zswap.c: delete an error message for a failed memory allocation in zswap_dstmem_prepare()
    mm/zswap.c: improve a size determination in zswap_frontswap_init()
    mm/zswap.c: delete an error message for a failed memory allocation in zswap_pool_create()
    mm/swapfile.c: sort swap entries before free
    mm/oom_kill: count global and memory cgroup oom kills
    mm: per-cgroup memory reclaim stats
    mm: kmemleak: treat vm_struct as alternative reference to vmalloc'ed objects
    mm: kmemleak: factor object reference updating out of scan_block()
    mm: kmemleak: slightly reduce the size of some structures on 64-bit architectures
    mm, mempolicy: don't check cpuset seqlock where it doesn't matter
    mm, cpuset: always use seqlock when changing task's nodemask
    mm, mempolicy: simplify rebinding mempolicies when updating cpusets
    ...

    Linus Torvalds
     
  • Pull user access str* updates from Al Viro:
    "uaccess str...() dead code removal"

    * 'uaccess.strlen' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
    s390 keyboard.c: don't open-code strndup_user()
    mips: get rid of unused __strnlen_user()
    get rid of unused __strncpy_from_user() instances
    kill strlen_user()

    Linus Torvalds