12 Sep, 2017

1 commit

  • Pull namespace updates from Eric Biederman:
    "Life has been busy and I have not gotten half as much done this round
    as I would have liked. I delayed it so that a minor conflict
    resolution with the mips tree could spend a little time in linux-next
    before I sent this pull request.

    This includes two long delayed user namespace changes from Kirill
    Tkhai. It also includes a very useful change from Serge Hallyn that
    allows the security capability attribute to be used inside of user
    namespaces. The practical effect of this is people can now untar
    tarballs and install rpms in user namespaces. It had been suggested to
    generalize this and encode some of the namespace information
    information in the xattr name. Upon close inspection that makes the
    things that should be hard easy and the things that should be easy
    more expensive.

    Then there is my bugfix/cleanup for signal injection that removes the
    magic encoding of the siginfo union member from the kernel internal
    si_code. The mips folks reported the case where I had used FPE_FIXME
    me is impossible so I have remove FPE_FIXME from mips, while at the
    same time including a return statement in that case to keep gcc from
    complaining about unitialized variables.

    I almost finished the work to get make copy_siginfo_to_user a trivial
    copy to user. The code is available at:

    git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace.git neuter-copy_siginfo_to_user-v3

    But I did not have time/energy to get the code posted and reviewed
    before the merge window opened.

    I was able to see that the security excuse for just copying fields
    that we know are initialized doesn't work in practice there are buggy
    initializations that don't initialize the proper fields in siginfo. So
    we still sometimes copy unitialized data to userspace"

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
    Introduce v3 namespaced file capabilities
    mips/signal: In force_fcr31_sig return in the impossible case
    signal: Remove kernel interal si_code magic
    fcntl: Don't use ambiguous SIG_POLL si_codes
    prctl: Allow local CAP_SYS_ADMIN changing exe_file
    security: Use user_namespace::level to avoid redundant iterations in cap_capable()
    userns,pidns: Verify the userns for new pid namespaces
    signal/testing: Don't look for __SI_FAULT in userspace
    signal/mips: Document a conflict with SI_USER with SIGFPE
    signal/sparc: Document a conflict with SI_USER with SIGFPE
    signal/ia64: Document a conflict with SI_USER with SIGFPE
    signal/alpha: Document a conflict with SI_USER for SIGTRAP

    Linus Torvalds
     

10 Sep, 2017

1 commit

  • Merge more updates from Andrew Morton:

    - most of the rest of MM

    - a small number of misc things

    - lib/ updates

    - checkpatch

    - autofs updates

    - ipc/ updates

    * emailed patches from Andrew Morton : (126 commits)
    ipc: optimize semget/shmget/msgget for lots of keys
    ipc/sem: play nicer with large nsops allocations
    ipc/sem: drop sem_checkid helper
    ipc: convert kern_ipc_perm.refcount from atomic_t to refcount_t
    ipc: convert sem_undo_list.refcnt from atomic_t to refcount_t
    ipc: convert ipc_namespace.count from atomic_t to refcount_t
    kcov: support compat processes
    sh: defconfig: cleanup from old Kconfig options
    mn10300: defconfig: cleanup from old Kconfig options
    m32r: defconfig: cleanup from old Kconfig options
    drivers/pps: use surrounding "if PPS" to remove numerous dependency checks
    drivers/pps: aesthetic tweaks to PPS-related content
    cpumask: make cpumask_next() out-of-line
    kmod: move #ifdef CONFIG_MODULES wrapper to Makefile
    kmod: split off umh headers into its own file
    MAINTAINERS: clarify kmod is just a kernel module loader
    kmod: split out umh code into its own file
    test_kmod: flip INT checks to be consistent
    test_kmod: remove paranoid UINT_MAX check on uint range processing
    vfat: deduplicate hex2bin()
    ...

    Linus Torvalds
     

09 Sep, 2017

2 commits

  • Alpha already had an optimised fill-memory-with-16-bit-quantity
    assembler routine called memsetw(). It has a slightly different calling
    convention from memset16() in that it takes a byte count, not a count of
    words. That's the same convention used by ARM's __memset routines, so
    rename Alpha's routine to match and add a memset16() wrapper around it.
    Then convert Alpha's scr_memsetw() to call memset16() instead of
    memsetw().

    Link: http://lkml.kernel.org/r/20170720184539.31609-6-willy@infradead.org
    Signed-off-by: Matthew Wilcox
    Cc: Richard Henderson
    Cc: Ivan Kokshaysky
    Cc: Matt Turner
    Cc: "H. Peter Anvin"
    Cc: "James E.J. Bottomley"
    Cc: "Martin K. Petersen"
    Cc: David Miller
    Cc: Ingo Molnar
    Cc: Michael Ellerman
    Cc: Minchan Kim
    Cc: Ralf Baechle
    Cc: Russell King
    Cc: Sam Ravnborg
    Cc: Sergey Senozhatsky
    Cc: Thomas Gleixner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Matthew Wilcox
     
  • Pull PCI updates from Bjorn Helgaas:

    - add enhanced Downstream Port Containment support, which prints more
    details about Root Port Programmed I/O errors (Dongdong Liu)

    - add Layerscape ls1088a and ls2088a support (Hou Zhiqiang)

    - add MediaTek MT2712 and MT7622 support (Ryder Lee)

    - add MediaTek MT2712 and MT7622 MSI support (Honghui Zhang)

    - add Qualcom IPQ8074 support (Varadarajan Narayanan)

    - add R-Car r8a7743/5 device tree support (Biju Das)

    - add Rockchip per-lane PHY support for better power management (Shawn
    Lin)

    - fix IRQ mapping for hot-added devices by replacing the
    pci_fixup_irqs() boot-time design with a host bridge hook called at
    probe-time (Lorenzo Pieralisi, Matthew Minter)

    - fix race when enabling two devices that results in upstream bridge
    not being enabled correctly (Srinath Mannam)

    - fix pciehp power fault infinite loop (Keith Busch)

    - fix SHPC bridge MSI hotplug events by enabling bus mastering
    (Aleksandr Bezzubikov)

    - fix a VFIO issue by correcting PCIe capability sizes (Alex
    Williamson)

    - fix an INTD issue on Xilinx and possibly other drivers by unifying
    INTx IRQ domain support (Paul Burton)

    - avoid IOMMU stalls by marking AMD Stoney GPU ATS as broken (Joerg
    Roedel)

    - allow APM X-Gene device assignment to guests by adding an ACS quirk
    (Feng Kan)

    - fix driver crashes by disabling Extended Tags on Broadcom HT2100
    (Extended Tags support is required for PCIe Receivers but not
    Requesters, and we now enable them by default when Requesters support
    them) (Sinan Kaya)

    - fix MSIs for devices that use phantom RIDs for DMA by assuming MSIs
    use the real Requester ID (not a phantom RID) (Robin Murphy)

    - prevent assignment of Intel VMD children to guests (which may be
    supported eventually, but isn't yet) by not associating an IOMMU with
    them (Jon Derrick)

    - fix Intel VMD suspend/resume by releasing IRQs on suspend (Scott
    Bauer)

    - fix a Function-Level Reset issue with Intel 750 NVMe by waiting
    longer (up to 60sec instead of 1sec) for device to become ready
    (Sinan Kaya)

    - fix a Function-Level Reset issue on iProc Stingray by working around
    hardware defects in the CRS implementation (Oza Pawandeep)

    - fix an issue with Intel NVMe P3700 after an iProc reset by adding a
    delay during shutdown (Oza Pawandeep)

    - fix a Microsoft Hyper-V lockdep issue by polling instead of blocking
    in compose_msi_msg() (Stephen Hemminger)

    - fix a wireless LAN driver timeout by clearing DesignWare MSI
    interrupt status after it is handled, not before (Faiz Abbas)

    - fix DesignWare ATU enable checking (Jisheng Zhang)

    - reduce Layerscape dependencies on the bootloader by doing more
    initialization in the driver (Hou Zhiqiang)

    - improve Intel VMD performance allowing allocation of more IRQ vectors
    than present CPUs (Keith Busch)

    - improve endpoint framework support for initial DMA mask, different
    BAR sizes, configurable page sizes, MSI, test driver, etc (Kishon
    Vijay Abraham I, Stan Drozd)

    - rework CRS support to add periodic messages while we poll during
    enumeration and after Function-Level Reset and prepare for possible
    other uses of CRS (Sinan Kaya)

    - clean up Root Port AER handling by removing unnecessary code and
    moving error handler methods to struct pcie_port_service_driver
    (Christoph Hellwig)

    - clean up error handling paths in various drivers (Bjorn Andersson,
    Fabio Estevam, Gustavo A. R. Silva, Harunobu Kurokawa, Jeffy Chen,
    Lorenzo Pieralisi, Sergei Shtylyov)

    - clean up SR-IOV resource handling by disabling VF decoding before
    updating the corresponding resource structs (Gavin Shan)

    - clean up DesignWare-based drivers by unifying quirks to update Class
    Code and Interrupt Pin and related handling of write-protected
    registers (Hou Zhiqiang)

    - clean up by adding empty generic pcibios_align_resource() and
    pcibios_fixup_bus() and removing empty arch-specific implementations
    (Palmer Dabbelt)

    - request exclusive reset control for several drivers to allow cleanup
    elsewhere (Philipp Zabel)

    - constify various structures (Arvind Yadav, Bhumika Goyal)

    - convert from full_name() to %pOF (Rob Herring)

    - remove unused variables from iProc, HiSi, Altera, Keystone (Shawn
    Lin)

    * tag 'pci-v4.14-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (170 commits)
    PCI: xgene: Clean up whitespace
    PCI: xgene: Define XGENE_PCI_EXP_CAP and use generic PCI_EXP_RTCTL offset
    PCI: xgene: Fix platform_get_irq() error handling
    PCI: xilinx-nwl: Fix platform_get_irq() error handling
    PCI: rockchip: Fix platform_get_irq() error handling
    PCI: altera: Fix platform_get_irq() error handling
    PCI: spear13xx: Fix platform_get_irq() error handling
    PCI: artpec6: Fix platform_get_irq() error handling
    PCI: armada8k: Fix platform_get_irq() error handling
    PCI: dra7xx: Fix platform_get_irq() error handling
    PCI: exynos: Fix platform_get_irq() error handling
    PCI: iproc: Clean up whitespace
    PCI: iproc: Rename PCI_EXP_CAP to IPROC_PCI_EXP_CAP
    PCI: iproc: Add 500ms delay during device shutdown
    PCI: Fix typos and whitespace errors
    PCI: Remove unused "res" variable from pci_resource_io()
    PCI: Correct kernel-doc of pci_vpd_srdt_size(), pci_vpd_srdt_tag()
    PCI/AER: Reformat AER register definitions
    iommu/vt-d: Prevent VMD child devices from being remapping targets
    x86/PCI: Use is_vmd() rather than relying on the domain number
    ...

    Linus Torvalds
     

07 Sep, 2017

4 commits

  • Merge updates from Andrew Morton:

    - various misc bits

    - DAX updates

    - OCFS2

    - most of MM

    * emailed patches from Andrew Morton : (119 commits)
    mm,fork: introduce MADV_WIPEONFORK
    x86,mpx: make mpx depend on x86-64 to free up VMA flag
    mm: add /proc/pid/smaps_rollup
    mm: hugetlb: clear target sub-page last when clearing huge page
    mm: oom: let oom_reap_task and exit_mmap run concurrently
    swap: choose swap device according to numa node
    mm: replace TIF_MEMDIE checks by tsk_is_oom_victim
    mm, oom: do not rely on TIF_MEMDIE for memory reserves access
    z3fold: use per-cpu unbuddied lists
    mm, swap: don't use VMA based swap readahead if HDD is used as swap
    mm, swap: add sysfs interface for VMA based swap readahead
    mm, swap: VMA based swap readahead
    mm, swap: fix swap readahead marking
    mm, swap: add swap readahead hit statistics
    mm/vmalloc.c: don't reinvent the wheel but use existing llist API
    mm/vmstat.c: fix wrong comment
    selftests/memfd: add memfd_create hugetlbfs selftest
    mm/shmem: add hugetlbfs support to memfd_create()
    mm, devm_memremap_pages: use multi-order radix for ZONE_DEVICE lookups
    mm/vmalloc.c: halve the number of comparisons performed in pcpu_get_vm_areas()
    ...

    Linus Torvalds
     
  • Introduce MADV_WIPEONFORK semantics, which result in a VMA being empty
    in the child process after fork. This differs from MADV_DONTFORK in one
    important way.

    If a child process accesses memory that was MADV_WIPEONFORK, it will get
    zeroes. The address ranges are still valid, they are just empty.

    If a child process accesses memory that was MADV_DONTFORK, it will get a
    segmentation fault, since those address ranges are no longer valid in
    the child after fork.

    Since MADV_DONTFORK also seems to be used to allow very large programs
    to fork in systems with strict memory overcommit restrictions, changing
    the semantics of MADV_DONTFORK might break existing programs.

    MADV_WIPEONFORK only works on private, anonymous VMAs.

    The use case is libraries that store or cache information, and want to
    know that they need to regenerate it in the child process after fork.

    Examples of this would be:
    - systemd/pulseaudio API checks (fail after fork) (replacing a getpid
    check, which is too slow without a PID cache)
    - PKCS#11 API reinitialization check (mandated by specification)
    - glibc's upcoming PRNG (reseed after fork)
    - OpenSSL PRNG (reseed after fork)

    The security benefits of a forking server having a re-inialized PRNG in
    every child process are pretty obvious. However, due to libraries
    having all kinds of internal state, and programs getting compiled with
    many different versions of each library, it is unreasonable to expect
    calling programs to re-initialize everything manually after fork.

    A further complication is the proliferation of clone flags, programs
    bypassing glibc's functions to call clone directly, and programs calling
    unshare, causing the glibc pthread_atfork hook to not get called.

    It would be better to have the kernel take care of this automatically.

    The patch also adds MADV_KEEPONFORK, to undo the effects of a prior
    MADV_WIPEONFORK.

    This is similar to the OpenBSD minherit syscall with MAP_INHERIT_ZERO:

    https://man.openbsd.org/minherit.2

    [akpm@linux-foundation.org: numerically order arch/parisc/include/uapi/asm/mman.h #defines]
    Link: http://lkml.kernel.org/r/20170811212829.29186-3-riel@redhat.com
    Signed-off-by: Rik van Riel
    Reported-by: Florian Weimer
    Reported-by: Colm MacCártaigh
    Reviewed-by: Mike Kravetz
    Cc: "H. Peter Anvin"
    Cc: "Kirill A. Shutemov"
    Cc: Andy Lutomirski
    Cc: Dave Hansen
    Cc: Ingo Molnar
    Cc: Helge Deller
    Cc: Kees Cook
    Cc: Matthew Wilcox
    Cc: Thomas Gleixner
    Cc: Will Drewry
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Rik van Riel
     
  • A non-default huge page size can be encoded in the flags argument of the
    mmap system call. The definitions for these encodings are in arch
    specific header files. However, all architectures use the same values.

    Consolidate all the definitions in the primary user header file
    (uapi/linux/mman.h). Include definitions for all known huge page sizes.
    Use the generic encoding definitions in hugetlb_encode.h as the basis
    for these definitions.

    Link: http://lkml.kernel.org/r/1501527386-10736-3-git-send-email-mike.kravetz@oracle.com
    Signed-off-by: Mike Kravetz
    Acked-by: Michal Hocko
    Cc: Andi Kleen
    Cc: Andrea Arcangeli
    Cc: Aneesh Kumar K.V
    Cc: Anshuman Khandual
    Cc: Arnd Bergmann
    Cc: Davidlohr Bueso
    Cc: Matthew Wilcox
    Cc: Michael Kerrisk
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mike Kravetz
     
  • Pull networking updates from David Miller:

    1) Support ipv6 checksum offload in sunvnet driver, from Shannon
    Nelson.

    2) Move to RB-tree instead of custom AVL code in inetpeer, from Eric
    Dumazet.

    3) Allow generic XDP to work on virtual devices, from John Fastabend.

    4) Add bpf device maps and XDP_REDIRECT, which can be used to build
    arbitrary switching frameworks using XDP. From John Fastabend.

    5) Remove UFO offloads from the tree, gave us little other than bugs.

    6) Remove the IPSEC flow cache, from Florian Westphal.

    7) Support ipv6 route offload in mlxsw driver.

    8) Support VF representors in bnxt_en, from Sathya Perla.

    9) Add support for forward error correction modes to ethtool, from
    Vidya Sagar Ravipati.

    10) Add time filter for packet scheduler action dumping, from Jamal Hadi
    Salim.

    11) Extend the zerocopy sendmsg() used by virtio and tap to regular
    sockets via MSG_ZEROCOPY. From Willem de Bruijn.

    12) Significantly rework value tracking in the BPF verifier, from Edward
    Cree.

    13) Add new jump instructions to eBPF, from Daniel Borkmann.

    14) Rework rtnetlink plumbing so that operations can be run without
    taking the RTNL semaphore. From Florian Westphal.

    15) Support XDP in tap driver, from Jason Wang.

    16) Add 32-bit eBPF JIT for ARM, from Shubham Bansal.

    17) Add Huawei hinic ethernet driver.

    18) Allow to report MD5 keys in TCP inet_diag dumps, from Ivan
    Delalande.

    * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1780 commits)
    i40e: point wb_desc at the nvm_wb_desc during i40e_read_nvm_aq
    i40e: avoid NVM acquire deadlock during NVM update
    drivers: net: xgene: Remove return statement from void function
    drivers: net: xgene: Configure tx/rx delay for ACPI
    drivers: net: xgene: Read tx/rx delay for ACPI
    rocker: fix kcalloc parameter order
    rds: Fix non-atomic operation on shared flag variable
    net: sched: don't use GFP_KERNEL under spin lock
    vhost_net: correctly check tx avail during rx busy polling
    net: mdio-mux: add mdio_mux parameter to mdio_mux_init()
    rxrpc: Make service connection lookup always check for retry
    net: stmmac: Delete dead code for MDIO registration
    gianfar: Fix Tx flow control deactivation
    cxgb4: Ignore MPS_TX_INT_CAUSE[Bubble] for T6
    cxgb4: Fix pause frame count in t4_get_port_stats
    cxgb4: fix memory leak
    tun: rename generic_xdp to skb_xdp
    tun: reserve extra headroom only when XDP is set
    net: dsa: bcm_sf2: Configure IMP port TC2QOS mapping
    net: dsa: bcm_sf2: Advertise number of egress queues
    ...

    Linus Torvalds
     

06 Sep, 2017

1 commit

  • Pull alpha updates from Matt Turner:
    "This contains some small clean up patches I've neglected, and some
    build improvements from Ben Hutchings"

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mattst88/alpha:
    alpha: math-emu: Fix modular build
    alpha: Restore symbol versions for symbols exported from assembly
    alpha: defconfig: Cleanup from old Kconfig options
    alpha: use kobj_to_dev()
    alpha: squash lines for immediate return
    alpha: kernel: Use vma_pages()
    alpha: silence a buffer overflow warning
    alpha: marvel: make use of raw_spinlock variants
    alpha: cleanup: remove __NR_sys_epoll_*, leave __NR_epoll_*
    alpha: use generic fb.h

    Linus Torvalds
     

05 Sep, 2017

11 commits

  • Commit 00fc0e0dda62 ("alpha: move exports to actual definitions") also
    removed the exports of the math emulator hooks, which are defined in C
    code. In case anyone cares about the option of CONFIG_MATHEMU=m, add
    exports next to those definitions. Also add a MODULE_LICENSE.

    Fixes: 00fc0e0dda62 ("alpha: move exports to actual definitions")
    Signed-off-by: Ben Hutchings
    Signed-off-by: Matt Turner

    Ben Hutchings
     
  • Add so that genksyms knows the types of
    these symbols and can generate CRCs for them.

    Fixes: 00fc0e0dda62 ("alpha: move exports to actual definitions")
    Signed-off-by: Ben Hutchings
    Signed-off-by: Matt Turner

    Ben Hutchings
     
  • Remove old, dead Kconfig options (in order appearing in this commit):
    - IP_NF_QUEUE: commit 3dd6664fac7e ("netfilter: remove unused "config
    IP_NF_QUEUE"");
    - AUTOFS_FS: commit 561c5cf9236a ("staging: Remove autofs3");

    Signed-off-by: Krzysztof Kozlowski
    Signed-off-by: Matt Turner

    Krzysztof Kozlowski
     
  • Use kobj_to_dev() instead of open-coding it.

    Signed-off-by: Geliang Tang
    Signed-off-by: Matt Turner

    Geliang Tang
     
  • Remove unneeded variables and assignments.

    While we are here, fix the coding style of SMC37c669_read_config():
    - replace whitespaces at the start of lines with tabs
    - remove unneeded whitespaces around parentheses

    Signed-off-by: Masahiro Yamada
    Signed-off-by: Matt Turner

    Masahiro Yamada
     
  • Replace explicit computation of vma page count by a call to
    vma_pages()

    Signed-off-by: Shyam Saini
    Signed-off-by: Matt Turner

    Shyam Saini
     
  • We check that "member" is in bounds for the first line, but we also use
    it on the next line without checking which is a mistake.

    Signed-off-by: Dan Carpenter
    Signed-off-by: Matt Turner

    Dan Carpenter
     
  • The alpha/marvel code currently implements an irq_chip for handling
    interrupts; due to how irq_chip handling is done, it's necessary for the
    irq_chip methods to be invoked from hardirq context, even on a a
    real-time kernel. Because the spinlock_t type becomes a "sleeping"
    spinlock w/ RT kernels, it is not suitable to be used with irq_chips.

    A quick audit of the operations under the lock reveal that they do only
    minimal, bounded work, and are therefore safe to do under a raw spinlock.

    Signed-off-by: Julia Cartwright
    Signed-off-by: Matt Turner

    Julia Cartwright
     
  • __NR_sys_epoll_create and friends are alpha-specific
    while __NR_epoll_create is a generic name for other
    arches.

    Cc: Richard Henderson
    Cc: Ivan Kokshaysky
    Cc: linux-alpha@vger.kernel.org
    Cc: linux-kernel@vger.kernel.org
    Signed-off-by: Sergei Trofimovich
    Signed-off-by: Matt Turner

    Sergei Trofimovich
     
  • The arch uses a verbatim copy of the asm-generic version and does not
    add any own implemntations to the header, so use asm-generic/fb.h
    instead of duplicating code.

    Signed-off-by: Tobias Klauser
    Signed-off-by: Matt Turner

    Tobias Klauser
     
  • Pull locking updates from Ingo Molnar:

    - Add 'cross-release' support to lockdep, which allows APIs like
    completions, where it's not the 'owner' who releases the lock, to be
    tracked. It's all activated automatically under
    CONFIG_PROVE_LOCKING=y.

    - Clean up (restructure) the x86 atomics op implementation to be more
    readable, in preparation of KASAN annotations. (Dmitry Vyukov)

    - Fix static keys (Paolo Bonzini)

    - Add killable versions of down_read() et al (Kirill Tkhai)

    - Rework and fix jump_label locking (Marc Zyngier, Paolo Bonzini)

    - Rework (and fix) tlb_flush_pending() barriers (Peter Zijlstra)

    - Remove smp_mb__before_spinlock() and convert its usages, introduce
    smp_mb__after_spinlock() (Peter Zijlstra)

    * 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (56 commits)
    locking/lockdep/selftests: Fix mixed read-write ABBA tests
    sched/completion: Avoid unnecessary stack allocation for COMPLETION_INITIALIZER_ONSTACK()
    acpi/nfit: Fix COMPLETION_INITIALIZER_ONSTACK() abuse
    locking/pvqspinlock: Relax cmpxchg's to improve performance on some architectures
    smp: Avoid using two cache lines for struct call_single_data
    locking/lockdep: Untangle xhlock history save/restore from task independence
    locking/refcounts, x86/asm: Disable CONFIG_ARCH_HAS_REFCOUNT for the time being
    futex: Remove duplicated code and fix undefined behaviour
    Documentation/locking/atomic: Finish the document...
    locking/lockdep: Fix workqueue crossrelease annotation
    workqueue/lockdep: 'Fix' flush_work() annotation
    locking/lockdep/selftests: Add mixed read-write ABBA tests
    mm, locking/barriers: Clarify tlb_flush_pending() barriers
    locking/lockdep: Make CONFIG_LOCKDEP_CROSSRELEASE and CONFIG_LOCKDEP_COMPLETIONS truly non-interactive
    locking/lockdep: Explicitly initialize wq_barrier::done::map
    locking/lockdep: Rename CONFIG_LOCKDEP_COMPLETE to CONFIG_LOCKDEP_COMPLETIONS
    locking/lockdep: Reword title of LOCKDEP_CROSSRELEASE config
    locking/lockdep: Make CONFIG_LOCKDEP_CROSSRELEASE part of CONFIG_PROVE_LOCKING
    locking/refcounts, x86/asm: Implement fast refcount overflow protection
    locking/lockdep: Fix the rollback and overwrite detection logic in crossrelease
    ...

    Linus Torvalds
     

04 Sep, 2017

2 commits

  • Pull RCU updates from Ingo Molnad:
    "The main RCU related changes in this cycle were:

    - Removal of spin_unlock_wait()
    - SRCU updates
    - RCU torture-test updates
    - RCU Documentation updates
    - Extend the sys_membarrier() ABI with the MEMBARRIER_CMD_PRIVATE_EXPEDITED variant
    - Miscellaneous RCU fixes
    - CPU-hotplug fixes"

    * 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (63 commits)
    arch: Remove spin_unlock_wait() arch-specific definitions
    locking: Remove spin_unlock_wait() generic definitions
    drivers/ata: Replace spin_unlock_wait() with lock/unlock pair
    ipc: Replace spin_unlock_wait() with lock/unlock pair
    exit: Replace spin_unlock_wait() with lock/unlock pair
    completion: Replace spin_unlock_wait() with lock/unlock pair
    doc: Set down RCU's scheduling-clock-interrupt needs
    doc: No longer allowed to use rcu_dereference on non-pointers
    doc: Add RCU files to docbook-generation files
    doc: Update memory-barriers.txt for read-to-write dependencies
    doc: Update RCU documentation
    membarrier: Provide expedited private command
    rcu: Remove exports from rcu_idle_exit() and rcu_idle_enter()
    rcu: Add warning to rcu_idle_enter() for irqs enabled
    rcu: Make rcu_idle_enter() rely on callers disabling irqs
    rcu: Add assertions verifying blocked-tasks list
    rcu/tracing: Set disable_rcu_irq_enter on rcu_eqs_exit()
    rcu: Add TPS() protection for _rcu_barrier_trace strings
    rcu: Use idle versions of swait to make idle-hack clear
    swait: Add idle variants which don't contribute to load average
    ...

    Linus Torvalds
     
  • Conflicts:
    mm/page_alloc.c

    Signed-off-by: Ingo Molnar

    Ingo Molnar
     

02 Sep, 2017

1 commit


30 Aug, 2017

8 commits


26 Aug, 2017

1 commit

  • There is code duplicated over all architecture's headers for
    futex_atomic_op_inuser. Namely op decoding, access_ok check for uaddr,
    and comparison of the result.

    Remove this duplication and leave up to the arches only the needed
    assembly which is now in arch_futex_atomic_op_inuser.

    This effectively distributes the Will Deacon's arm64 fix for undefined
    behaviour reported by UBSAN to all architectures. The fix was done in
    commit 5f16a046f8e1 (arm64: futex: Fix undefined behaviour with
    FUTEX_OP_OPARG_SHIFT usage). Look there for an example dump.

    And as suggested by Thomas, check for negative oparg too, because it was
    also reported to cause undefined behaviour report.

    Note that s390 removed access_ok check in d12a29703 ("s390/uaccess:
    remove pointless access_ok() checks") as access_ok there returns true.
    We introduce it back to the helper for the sake of simplicity (it gets
    optimized away anyway).

    Signed-off-by: Jiri Slaby
    Signed-off-by: Thomas Gleixner
    Acked-by: Russell King
    Acked-by: Michael Ellerman (powerpc)
    Acked-by: Heiko Carstens [s390]
    Acked-by: Chris Metcalf [for tile]
    Reviewed-by: Darren Hart (VMware)
    Reviewed-by: Will Deacon [core/arm64]
    Cc: linux-mips@linux-mips.org
    Cc: Rich Felker
    Cc: linux-ia64@vger.kernel.org
    Cc: linux-sh@vger.kernel.org
    Cc: peterz@infradead.org
    Cc: Benjamin Herrenschmidt
    Cc: Max Filippov
    Cc: Paul Mackerras
    Cc: sparclinux@vger.kernel.org
    Cc: Jonas Bonn
    Cc: linux-s390@vger.kernel.org
    Cc: linux-arch@vger.kernel.org
    Cc: Yoshinori Sato
    Cc: linux-hexagon@vger.kernel.org
    Cc: Helge Deller
    Cc: "James E.J. Bottomley"
    Cc: Catalin Marinas
    Cc: Matt Turner
    Cc: linux-snps-arc@lists.infradead.org
    Cc: Fenghua Yu
    Cc: Arnd Bergmann
    Cc: linux-xtensa@linux-xtensa.org
    Cc: Stefan Kristiansson
    Cc: openrisc@lists.librecores.org
    Cc: Ivan Kokshaysky
    Cc: Stafford Horne
    Cc: linux-arm-kernel@lists.infradead.org
    Cc: Richard Henderson
    Cc: Chris Zankel
    Cc: Michal Simek
    Cc: Tony Luck
    Cc: linux-parisc@vger.kernel.org
    Cc: Vineet Gupta
    Cc: Ralf Baechle
    Cc: Richard Kuo
    Cc: linux-alpha@vger.kernel.org
    Cc: Martin Schwidefsky
    Cc: linuxppc-dev@lists.ozlabs.org
    Cc: "David S. Miller"
    Link: http://lkml.kernel.org/r/20170824073105.3901-1-jslaby@suse.cz

    Jiri Slaby
     

17 Aug, 2017

1 commit

  • There is no agreed-upon definition of spin_unlock_wait()'s semantics,
    and it appears that all callers could do just as well with a lock/unlock
    pair. This commit therefore removes the underlying arch-specific
    arch_spin_unlock_wait() for all architectures providing them.

    Signed-off-by: Paul E. McKenney
    Cc:
    Cc: Peter Zijlstra
    Cc: Alan Stern
    Cc: Andrea Parri
    Cc: Linus Torvalds
    Acked-by: Will Deacon
    Acked-by: Boqun Feng

    Paul E. McKenney
     

04 Aug, 2017

2 commits

  • The send call ignores unknown flags. Legacy applications may already
    unwittingly pass MSG_ZEROCOPY. Continue to ignore this flag unless a
    socket opts in to zerocopy.

    Introduce socket option SO_ZEROCOPY to enable MSG_ZEROCOPY processing.
    Processes can also query this socket option to detect kernel support
    for the feature. Older kernels will return ENOPROTOOPT.

    Signed-off-by: Willem de Bruijn
    Signed-off-by: David S. Miller

    Willem de Bruijn
     
  • The pci_fixup_irqs() function allocates IRQs for all PCI devices present in
    a system; those PCI devices possibly belong to different PCI bus trees (and
    possibly rooted at different host bridges) and may well be enabled (ie
    probed and bound to a driver) by the time pci_fixup_irqs() is called when
    probing a given host bridge driver.

    Furthermore, current kernel code relying on pci_fixup_irqs() to assign
    legacy PCI IRQs to devices does not work at all for hotplugged devices in
    that the code carrying out the IRQ fixup is called at host bridge driver
    probe time, which just cannot take into account devices hotplugged after
    the system has booted.

    The introduction of map/swizzle function hooks in struct pci_host_bridge
    allows us to define per-bridge map/swizzle functions that can be used at
    device probe time in PCI core code to allocate IRQs for a given device
    (through pci_assign_irq()).

    Convert PCI host bridge initialization code to the
    pci_scan_root_bus_bridge() API (that allows to pass a struct
    pci_host_bridge with initialized map/swizzle pointers) and remove the
    pci_fixup_irqs() call from arch code.

    Signed-off-by: Lorenzo Pieralisi
    Signed-off-by: Bjorn Helgaas
    Cc: Richard Henderson
    Cc: Ivan Kokshaysky

    Lorenzo Pieralisi
     

25 Jul, 2017

1 commit

  • struct siginfo is a union and the kernel since 2.4 has been hiding a union
    tag in the high 16bits of si_code using the values:
    __SI_KILL
    __SI_TIMER
    __SI_POLL
    __SI_FAULT
    __SI_CHLD
    __SI_RT
    __SI_MESGQ
    __SI_SYS

    While this looks plausible on the surface, in practice this situation has
    not worked well.

    - Injected positive signals are not copied to user space properly
    unless they have these magic high bits set.

    - Injected positive signals are not reported properly by signalfd
    unless they have these magic high bits set.

    - These kernel internal values leaked to userspace via ptrace_peek_siginfo

    - It was possible to inject these kernel internal values and cause the
    the kernel to misbehave.

    - Kernel developers got confused and expected these kernel internal values
    in userspace in kernel self tests.

    - Kernel developers got confused and set si_code to __SI_FAULT which
    is SI_USER in userspace which causes userspace to think an ordinary user
    sent the signal and that it was not kernel generated.

    - The values make it impossible to reorganize the code to transform
    siginfo_copy_to_user into a plain copy_to_user. As si_code must
    be massaged before being passed to userspace.

    So remove these kernel internal si codes and make the kernel code simpler
    and more maintainable.

    To replace these kernel internal magic si_codes introduce the helper
    function siginfo_layout, that takes a signal number and an si_code and
    computes which union member of siginfo is being used. Have
    siginfo_layout return an enumeration so that gcc will have enough
    information to warn if a switch statement does not handle all of union
    members.

    A couple of architectures have a messed up ABI that defines signal
    specific duplications of SI_USER which causes more special cases in
    siginfo_layout than I would like. The good news is only problem
    architectures pay the cost.

    Update all of the code that used the previous magic __SI_ values to
    use the new SIL_ values and to call siginfo_layout to get those
    values. Escept where not all of the cases are handled remove the
    defaults in the switch statements so that if a new case is missed in
    the future the lack will show up at compile time.

    Modify the code that copies siginfo si_code to userspace to just copy
    the value and not cast si_code to a short first. The high bits are no
    longer used to hold a magic union member.

    Fixup the siginfo header files to stop including the __SI_ values in
    their constants and for the headers that were missing it to properly
    update the number of si_codes for each signal type.

    The fixes to copy_siginfo_from_user32 implementations has the
    interesting property that several of them perviously should never have
    worked as the __SI_ values they depended up where kernel internal.
    With that dependency gone those implementations should work much
    better.

    The idea of not passing the __SI_ values out to userspace and then
    not reinserting them has been tested with criu and criu worked without
    changes.

    Ref: 2.4.0-test1
    Signed-off-by: "Eric W. Biederman"

    Eric W. Biederman
     

20 Jul, 2017

1 commit

  • Setting si_code to __SI_FAULT results in a userspace seeing
    an si_code of 0. This is the same si_code as SI_USER. Posix
    and common sense requires that SI_USER not be a signal specific
    si_code. As such this use of 0 for the si_code is a pretty
    horribly broken ABI.

    Given that alpha is on it's last legs I don't know that it is worth
    fixing this, but it is worth documenting what is going on so that
    no one decides to copy this bad decision.

    This was introduced during the 2.5 development cycle so this
    mess has had a long time for people to be able to depend upon it.

    v2: Added FPE_FIXME for alpha as Helge Deller pointed out
    with his alternate patch one of the cases is SIGFPE not SIGTRAP.

    Cc: Helge Deller
    Cc: Richard Henderson
    Cc: Ivan Kokshaysky
    Cc: Matt Turner
    Cc: linux-alpha@vger.kernel.org
    Acked-by: Richard Henderson
    History Tree: https://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git
    Ref: 0a635c7a84cf ("Fill in siginfo_t.")
    Signed-off-by: "Eric W. Biederman"

    Eric W. Biederman
     

17 Jul, 2017

1 commit

  • This ioctl does nothing to justify an _IOC_READ or _IOC_WRITE flag
    because it doesn't copy anything from/to userspace to access the
    argument.

    Fixes: 54ebbfb16034 ("tty: add TIOCGPTPEER ioctl")
    Signed-off-by: Gleb Fotengauer-Malinovskiy
    Acked-by: Aleksa Sarai
    Acked-by: Arnd Bergmann
    Signed-off-by: Greg Kroah-Hartman

    Gleb Fotengauer-Malinovskiy
     

07 Jul, 2017

2 commits