19 Dec, 2014

1 commit

  • Pull KVM update from Paolo Bonzini:
    "3.19 changes for KVM:

    - spring cleaning: removed support for IA64, and for hardware-
    assisted virtualization on the PPC970

    - ARM, PPC, s390 all had only small fixes

    For x86:
    - small performance improvements (though only on weird guests)
    - usual round of hardware-compliancy fixes from Nadav
    - APICv fixes
    - XSAVES support for hosts and guests. XSAVES hosts were broken
    because the (non-KVM) XSAVES patches inadvertently changed the KVM
    userspace ABI whenever XSAVES was enabled; hence, this part is
    going to stable. Guest support is just a matter of exposing the
    feature and CPUID leaves support"

    * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (179 commits)
    KVM: move APIC types to arch/x86/
    KVM: PPC: Book3S: Enable in-kernel XICS emulation by default
    KVM: PPC: Book3S HV: Improve H_CONFER implementation
    KVM: PPC: Book3S HV: Fix endianness of instruction obtained from HEIR register
    KVM: PPC: Book3S HV: Remove code for PPC970 processors
    KVM: PPC: Book3S HV: Tracepoints for KVM HV guest interactions
    KVM: PPC: Book3S HV: Simplify locking around stolen time calculations
    arch: powerpc: kvm: book3s_paired_singles.c: Remove unused function
    arch: powerpc: kvm: book3s_pr.c: Remove unused function
    arch: powerpc: kvm: book3s.c: Remove some unused functions
    arch: powerpc: kvm: book3s_32_mmu.c: Remove unused function
    KVM: PPC: Book3S HV: Check wait conditions before sleeping in kvmppc_vcore_blocked
    KVM: PPC: Book3S HV: ptes are big endian
    KVM: PPC: Book3S HV: Fix inaccuracies in ICP emulation for H_IPI
    KVM: PPC: Book3S HV: Fix KSM memory corruption
    KVM: PPC: Book3S HV: Fix an issue where guest is paused on receiving HMI
    KVM: PPC: Book3S HV: Fix computation of tlbie operand
    KVM: PPC: Book3S HV: Add missing HPTE unlock
    KVM: PPC: BookE: Improve irq inject tracepoint
    arm/arm64: KVM: Require in-kernel vgic for the arch timers
    ...

    Linus Torvalds
     

18 Dec, 2014

1 commit

  • Pull user namespace related fixes from Eric Biederman:
    "As these are bug fixes almost all of thes changes are marked for
    backporting to stable.

    The first change (implicitly adding MNT_NODEV on remount) addresses a
    regression that was created when security issues with unprivileged
    remount were closed. I go on to update the remount test to make it
    easy to detect if this issue reoccurs.

    Then there are a handful of mount and umount related fixes.

    Then half of the changes deal with the a recently discovered design
    bug in the permission checks of gid_map. Unix since the beginning has
    allowed setting group permissions on files to less than the user and
    other permissions (aka ---rwx---rwx). As the unix permission checks
    stop as soon as a group matches, and setgroups allows setting groups
    that can not later be dropped, results in a situtation where it is
    possible to legitimately use a group to assign fewer privileges to a
    process. Which means dropping a group can increase a processes
    privileges.

    The fix I have adopted is that gid_map is now no longer writable
    without privilege unless the new file /proc/self/setgroups has been
    set to permanently disable setgroups.

    The bulk of user namespace using applications even the applications
    using applications using user namespaces without privilege remain
    unaffected by this change. Unfortunately this ix breaks a couple user
    space applications, that were relying on the problematic behavior (one
    of which was tools/selftests/mount/unprivileged-remount-test.c).

    To hopefully prevent needing a regression fix on top of my security
    fix I rounded folks who work with the container implementations mostly
    like to be affected and encouraged them to test the changes.

    > So far nothing broke on my libvirt-lxc test bed. :-)
    > Tested with openSUSE 13.2 and libvirt 1.2.9.
    > Tested-by: Richard Weinberger

    > Tested on Fedora20 with libvirt 1.2.11, works fine.
    > Tested-by: Chen Hanxiao

    > Ok, thanks - yes, unprivileged lxc is working fine with your kernels.
    > Just to be sure I was testing the right thing I also tested using
    > my unprivileged nsexec testcases, and they failed on setgroup/setgid
    > as now expected, and succeeded there without your patches.
    > Tested-by: Serge Hallyn

    > I tested this with Sandstorm. It breaks as is and it works if I add
    > the setgroups thing.
    > Tested-by: Andy Lutomirski # breaks things as designed :("

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
    userns: Unbreak the unprivileged remount tests
    userns; Correct the comment in map_write
    userns: Allow setting gid_maps without privilege when setgroups is disabled
    userns: Add a knob to disable setgroups on a per user namespace basis
    userns: Rename id_map_mutex to userns_state_mutex
    userns: Only allow the creator of the userns unprivileged mappings
    userns: Check euid no fsuid when establishing an unprivileged uid mapping
    userns: Don't allow unprivileged creation of gid mappings
    userns: Don't allow setgroups until a gid mapping has been setablished
    userns: Document what the invariant required for safe unprivileged mappings.
    groups: Consolidate the setgroups permission checks
    mnt: Clear mnt_expire during pivot_root
    mnt: Carefully set CL_UNPRIVILEGED in clone_mnt
    mnt: Move the clear of MNT_LOCKED from copy_tree to it's callers.
    umount: Do not allow unmounting rootfs.
    umount: Disallow unprivileged mount force
    mnt: Update unprivileged remount test
    mnt: Implicitly add MNT_NODEV on remount when it was implicitly added by mount

    Linus Torvalds
     

15 Dec, 2014

1 commit

  • Pull driver core update from Greg KH:
    "Here's the set of driver core patches for 3.19-rc1.

    They are dominated by the removal of the .owner field in platform
    drivers. They touch a lot of files, but they are "simple" changes,
    just removing a line in a structure.

    Other than that, a few minor driver core and debugfs changes. There
    are some ath9k patches coming in through this tree that have been
    acked by the wireless maintainers as they relied on the debugfs
    changes.

    Everything has been in linux-next for a while"

    * tag 'driver-core-3.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (324 commits)
    Revert "ath: ath9k: use debugfs_create_devm_seqfile() helper for seq_file entries"
    fs: debugfs: add forward declaration for struct device type
    firmware class: Deletion of an unnecessary check before the function call "vunmap"
    firmware loader: fix hung task warning dump
    devcoredump: provide a one-way disable function
    device: Add dev__once variants
    ath: ath9k: use debugfs_create_devm_seqfile() helper for seq_file entries
    ath: use seq_file api for ath9k debugfs files
    debugfs: add helper function to create device related seq_file
    drivers/base: cacheinfo: remove noisy error boot message
    Revert "core: platform: add warning if driver has no owner"
    drivers: base: support cpu cache information interface to userspace via sysfs
    drivers: base: add cpu_device_create to support per-cpu devices
    topology: replace custom attribute macros with standard DEVICE_ATTR*
    cpumask: factor out show_cpumap into separate helper function
    driver core: Fix unbalanced device reference in drivers_probe
    driver core: fix race with userland in device_add()
    sysfs/kernfs: make read requests on pre-alloc files use the buffer.
    sysfs/kernfs: allow attributes to request write buffer be pre-allocated.
    fs: sysfs: return EGBIG on write if offset is larger than file size
    ...

    Linus Torvalds
     

14 Dec, 2014

4 commits

  • Pull crypto update from Herbert Xu:
    - The crypto API is now documented :)
    - Disallow arbitrary module loading through crypto API.
    - Allow get request with empty driver name through crypto_user.
    - Allow speed testing of arbitrary hash functions.
    - Add caam support for ctr(aes), gcm(aes) and their derivatives.
    - nx now supports concurrent hashing properly.
    - Add sahara support for SHA1/256.
    - Add ARM64 version of CRC32.
    - Misc fixes.

    * git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (77 commits)
    crypto: tcrypt - Allow speed testing of arbitrary hash functions
    crypto: af_alg - add user space interface for AEAD
    crypto: qat - fix problem with coalescing enable logic
    crypto: sahara - add support for SHA1/256
    crypto: sahara - replace tasklets with kthread
    crypto: sahara - add support for i.MX53
    crypto: sahara - fix spinlock initialization
    crypto: arm - replace memset by memzero_explicit
    crypto: powerpc - replace memset by memzero_explicit
    crypto: sha - replace memset by memzero_explicit
    crypto: sparc - replace memset by memzero_explicit
    crypto: algif_skcipher - initialize upon init request
    crypto: algif_skcipher - removed unneeded code
    crypto: algif_skcipher - Fixed blocking recvmsg
    crypto: drbg - use memzero_explicit() for clearing sensitive data
    crypto: drbg - use MODULE_ALIAS_CRYPTO
    crypto: include crypto- module prefix in template
    crypto: user - add MODULE_ALIAS
    crypto: sha-mb - remove a bogus NULL check
    crytpo: qat - Fix 64 bytes requests
    ...

    Linus Torvalds
     
  • Merge second patchbomb from Andrew Morton:
    - the rest of MM
    - misc fs fixes
    - add execveat() syscall
    - new ratelimit feature for fault-injection
    - decompressor updates
    - ipc/ updates
    - fallocate feature creep
    - fsnotify cleanups
    - a few other misc things

    * emailed patches from Andrew Morton : (99 commits)
    cgroups: Documentation: fix trivial typos and wrong paragraph numberings
    parisc: percpu: update comments referring to __get_cpu_var
    percpu: update local_ops.txt to reflect this_cpu operations
    percpu: remove __get_cpu_var and __raw_get_cpu_var macros
    fsnotify: remove destroy_list from fsnotify_mark
    fsnotify: unify inode and mount marks handling
    fallocate: create FAN_MODIFY and IN_MODIFY events
    mm/cma: make kmemleak ignore CMA regions
    slub: fix cpuset check in get_any_partial
    slab: fix cpuset check in fallback_alloc
    shmdt: use i_size_read() instead of ->i_size
    ipc/shm.c: fix overly aggressive shmdt() when calls span multiple segments
    ipc/msg: increase MSGMNI, remove scaling
    ipc/sem.c: increase SEMMSL, SEMMNI, SEMOPM
    ipc/sem.c: change memory barrier in sem_lock() to smp_rmb()
    lib/decompress.c: consistency of compress formats for kernel image
    decompress_bunzip2: off by one in get_next_block()
    usr/Kconfig: make initrd compression algorithm selection not expert
    fault-inject: add ratelimit option
    ratelimit: add initialization macro
    ...

    Linus Torvalds
     
  • Following the suggestions from Andrew Morton and Stephen Rothwell,
    Dont expand the ARCH list in kernel/gcov/Kconfig. Instead,
    define a ARCH_HAS_GCOV_PROFILE_ALL bool which architectures
    can enable.

    set ARCH_HAS_GCOV_PROFILE_ALL on Architectures where it was
    previously allowed + ARM64 which I tested.

    Signed-off-by: Riku Voipio
    Cc: Peter Oberparleiter
    Cc: Stephen Rothwell
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Riku Voipio
     
  • Now, we have prepared to avoid using debug-pagealloc in boottime. So
    introduce new kernel-parameter to disable debug-pagealloc in boottime, and
    makes related functions to be disabled in this case.

    Only non-intuitive part is change of guard page functions. Because guard
    page is effective only if debug-pagealloc is enabled, turning off
    according to debug-pagealloc is reasonable thing to do.

    Signed-off-by: Joonsoo Kim
    Cc: Mel Gorman
    Cc: Johannes Weiner
    Cc: Minchan Kim
    Cc: Dave Hansen
    Cc: Michal Nazarewicz
    Cc: Jungsoo Son
    Cc: Ingo Molnar
    Cc: Joonsoo Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Joonsoo Kim
     

13 Dec, 2014

1 commit

  • Pull another networking update from David Miller:
    "Small follow-up to the main merge pull from the other day:

    1) Alexander Duyck's DMA memory barrier patch set.

    2) cxgb4 driver fixes from Karen Xie.

    3) Add missing export of fixed_phy_register() to modules, from Mark
    Salter.

    4) DSA bug fixes from Florian Fainelli"

    * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (24 commits)
    net/macb: add TX multiqueue support for gem
    linux/interrupt.h: remove the definition of unused tasklet_hi_enable
    jme: replace calls to redundant function
    net: ethernet: davicom: Allow to select DM9000 for nios2
    net: ethernet: smsc: Allow to select SMC91X for nios2
    cxgb4: Add support for QSA modules
    libcxgbi: fix freeing skb prematurely
    cxgb4i: use set_wr_txq() to set tx queues
    cxgb4i: handle non-pdu-aligned rx data
    cxgb4i: additional types of negative advice
    cxgb4/cxgb4i: set the max. pdu length in firmware
    cxgb4i: fix credit check for tx_data_wr
    cxgb4i: fix tx immediate data credit check
    net: phy: export fixed_phy_register()
    fib_trie: Fix trie balancing issue if new node pushes down existing node
    vlan: Add ability to always enable TSO/UFO
    r8169:update rtl8168g pcie ephy parameter
    net: dsa: bcm_sf2: force link for all fixed PHY devices
    fm10k/igb/ixgbe: Use dma_rmb on Rx descriptor reads
    r8169: Use dma_rmb() and dma_wmb() for DescOwn checks
    ...

    Linus Torvalds
     

12 Dec, 2014

4 commits

  • There are a number of situations where the mandatory barriers rmb() and
    wmb() are used to order memory/memory operations in the device drivers
    and those barriers are much heavier than they actually need to be. For
    example in the case of PowerPC wmb() calls the heavy-weight sync
    instruction when for coherent memory operations all that is really needed
    is an lsync or eieio instruction.

    This commit adds a coherent only version of the mandatory memory barriers
    rmb() and wmb(). In most cases this should result in the barrier being the
    same as the SMP barriers for the SMP case, however in some cases we use a
    barrier that is somewhere in between rmb() and smp_rmb(). For example on
    ARM the rmb barriers break down as follows:

    Barrier Call Explanation
    --------- -------- ----------------------------------
    rmb() dsb() Data synchronization barrier - system
    dma_rmb() dmb(osh) data memory barrier - outer sharable
    smp_rmb() dmb(ish) data memory barrier - inner sharable

    These new barriers are not as safe as the standard rmb() and wmb().
    Specifically they do not guarantee ordering between coherent and incoherent
    memories. The primary use case for these would be to enforce ordering of
    reads and writes when accessing coherent memory that is shared between the
    CPU and a device.

    It may also be noted that there is no dma_mb(). Most architectures don't
    provide a good mechanism for performing a coherent only full barrier without
    resorting to the same mechanism used in mb(). As such there isn't much to
    be gained in trying to define such a function.

    Cc: Frederic Weisbecker
    Cc: Mathieu Desnoyers
    Cc: Michael Ellerman
    Cc: Michael Neuling
    Cc: Russell King
    Cc: Geert Uytterhoeven
    Cc: Heiko Carstens
    Cc: Linus Torvalds
    Cc: Martin Schwidefsky
    Cc: Tony Luck
    Cc: Oleg Nesterov
    Cc: "Paul E. McKenney"
    Cc: Peter Zijlstra
    Cc: Ingo Molnar
    Cc: David Miller
    Acked-by: Benjamin Herrenschmidt
    Acked-by: Will Deacon
    Signed-off-by: Alexander Duyck
    Signed-off-by: David S. Miller

    Alexander Duyck
     
  • This patch is meant to cleanup the handling of read_barrier_depends and
    smp_read_barrier_depends. In multiple spots in the kernel headers
    read_barrier_depends is defined as "do {} while (0)", however we then go
    into the SMP vs non-SMP sections and have the SMP version reference
    read_barrier_depends, and the non-SMP define it as yet another empty
    do/while.

    With this commit I went through and cleaned out the duplicate definitions
    and reduced the number of definitions down to 2 per header. In addition I
    moved the 50 line comments for the macro from the x86 and mips headers that
    defined it as an empty do/while to those that were actually defining the
    macro, alpha and blackfin.

    Signed-off-by: Alexander Duyck
    Signed-off-by: David S. Miller

    Alexander Duyck
     
  • Pull s390 updates from Martin Schwidefsky:
    "The most notable change for this pull request is the ftrace rework
    from Heiko. It brings a small performance improvement and the ground
    work to support a new gcc option to replace the mcount blocks with a
    single nop.

    Two new s390 specific system calls are added to emulate user space
    mmio for PCI, an artifact of the how PCI memory is accessed.

    Two patches for the memory management with changes to common code.
    For KVM mm_forbids_zeropage is added which disables the empty zero
    page for an mm that is used by a KVM process. And an optimization,
    pmdp_get_and_clear_full is added analog to ptep_get_and_clear_full.

    Some micro optimization for the cmpxchg and the spinlock code.

    And as usual bug fixes and cleanups"

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (46 commits)
    s390/cputime: fix 31-bit compile
    s390/scm_block: make the number of reqs per HW req configurable
    s390/scm_block: handle multiple requests in one HW request
    s390/scm_block: allocate aidaw pages only when necessary
    s390/scm_block: use mempool to manage aidaw requests
    s390/eadm: change timeout value
    s390/mm: fix memory leak of ptlock in pmd_free_tlb
    s390: use local symbol names in entry[64].S
    s390/ptrace: always include vector registers in core files
    s390/simd: clear vector register pointer on fork/clone
    s390: translate cputime magic constants to macros
    s390/idle: convert open coded idle time seqcount
    s390/idle: add missing irq off lockdep annotation
    s390/debug: avoid function call for debug_sprintf_*
    s390/kprobes: fix instruction copy for out of line execution
    s390: remove diag 44 calls from cpu_relax()
    s390/dasd: retry partition detection
    s390/dasd: fix list corruption for sleep_on requests
    s390/dasd: fix infinite term I/O loop
    s390/dasd: remove unused code
    ...

    Linus Torvalds
     
  • Pull networking updates from David Miller:

    1) New offloading infrastructure and example 'rocker' driver for
    offloading of switching and routing to hardware.

    This work was done by a large group of dedicated individuals, not
    limited to: Scott Feldman, Jiri Pirko, Thomas Graf, John Fastabend,
    Jamal Hadi Salim, Andy Gospodarek, Florian Fainelli, Roopa Prabhu

    2) Start making the networking operate on IOV iterators instead of
    modifying iov objects in-situ during transfers. Thanks to Al Viro
    and Herbert Xu.

    3) A set of new netlink interfaces for the TIPC stack, from Richard
    Alpe.

    4) Remove unnecessary looping during ipv6 routing lookups, from Martin
    KaFai Lau.

    5) Add PAUSE frame generation support to gianfar driver, from Matei
    Pavaluca.

    6) Allow for larger reordering levels in TCP, which are easily
    achievable in the real world right now, from Eric Dumazet.

    7) Add a variable of napi_schedule that doesn't need to disable cpu
    interrupts, from Eric Dumazet.

    8) Use a doubly linked list to optimize neigh_parms_release(), from
    Nicolas Dichtel.

    9) Various enhancements to the kernel BPF verifier, and allow eBPF
    programs to actually be attached to sockets. From Alexei
    Starovoitov.

    10) Support TSO/LSO in sunvnet driver, from David L Stevens.

    11) Allow controlling ECN usage via routing metrics, from Florian
    Westphal.

    12) Remote checksum offload, from Tom Herbert.

    13) Add split-header receive, BQL, and xmit_more support to amd-xgbe
    driver, from Thomas Lendacky.

    14) Add MPLS support to openvswitch, from Simon Horman.

    15) Support wildcard tunnel endpoints in ipv6 tunnels, from Steffen
    Klassert.

    16) Do gro flushes on a per-device basis using a timer, from Eric
    Dumazet. This tries to resolve the conflicting goals between the
    desired handling of bulk vs. RPC-like traffic.

    17) Allow userspace to ask for the CPU upon what a packet was
    received/steered, via SO_INCOMING_CPU. From Eric Dumazet.

    18) Limit GSO packets to half the current congestion window, from Eric
    Dumazet.

    19) Add a generic helper so that all drivers set their RSS keys in a
    consistent way, from Eric Dumazet.

    20) Add xmit_more support to enic driver, from Govindarajulu
    Varadarajan.

    21) Add VLAN packet scheduler action, from Jiri Pirko.

    22) Support configurable RSS hash functions via ethtool, from Eyal
    Perry.

    * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1820 commits)
    Fix race condition between vxlan_sock_add and vxlan_sock_release
    net/macb: fix compilation warning for print_hex_dump() called with skb->mac_header
    net/mlx4: Add support for A0 steering
    net/mlx4: Refactor QUERY_PORT
    net/mlx4_core: Add explicit error message when rule doesn't meet configuration
    net/mlx4: Add A0 hybrid steering
    net/mlx4: Add mlx4_bitmap zone allocator
    net/mlx4: Add a check if there are too many reserved QPs
    net/mlx4: Change QP allocation scheme
    net/mlx4_core: Use tasklet for user-space CQ completion events
    net/mlx4_core: Mask out host side virtualization features for guests
    net/mlx4_en: Set csum level for encapsulated packets
    be2net: Export tunnel offloads only when a VxLAN tunnel is created
    gianfar: Fix dma check map error when DMA_API_DEBUG is enabled
    cxgb4/csiostor: Don't use MASTER_MUST for fw_hello call
    net: fec: only enable mdio interrupt before phy device link up
    net: fec: clear all interrupt events to support i.MX6SX
    net: fec: reset fep link status in suspend function
    net: sock: fix access via invalid file descriptor
    net: introduce helper macro for_each_cmsghdr
    ...

    Linus Torvalds
     

11 Dec, 2014

5 commits

  • Pull VFS changes from Al Viro:
    "First pile out of several (there _definitely_ will be more). Stuff in
    this one:

    - unification of d_splice_alias()/d_materialize_unique()

    - iov_iter rewrite

    - killing a bunch of ->f_path.dentry users (and f_dentry macro).

    Getting that completed will make life much simpler for
    unionmount/overlayfs, since then we'll be able to limit the places
    sensitive to file _dentry_ to reasonably few. Which allows to have
    file_inode(file) pointing to inode in a covered layer, with dentry
    pointing to (negative) dentry in union one.

    Still not complete, but much closer now.

    - crapectomy in lustre (dead code removal, mostly)

    - "let's make seq_printf return nothing" preparations

    - assorted cleanups and fixes

    There _definitely_ will be more piles"

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (63 commits)
    copy_from_iter_nocache()
    new helper: iov_iter_kvec()
    csum_and_copy_..._iter()
    iov_iter.c: handle ITER_KVEC directly
    iov_iter.c: convert copy_to_iter() to iterate_and_advance
    iov_iter.c: convert copy_from_iter() to iterate_and_advance
    iov_iter.c: get rid of bvec_copy_page_{to,from}_iter()
    iov_iter.c: convert iov_iter_zero() to iterate_and_advance
    iov_iter.c: convert iov_iter_get_pages_alloc() to iterate_all_kinds
    iov_iter.c: convert iov_iter_get_pages() to iterate_all_kinds
    iov_iter.c: convert iov_iter_npages() to iterate_all_kinds
    iov_iter.c: iterate_and_advance
    iov_iter.c: macros for iterating over iov_iter
    kill f_dentry macro
    dcache: fix kmemcheck warning in switch_names
    new helper: audit_file()
    nfsd_vfs_write(): use file_inode()
    ncpfs: use file_inode()
    kill f_dentry uses
    lockd: get rid of ->f_path.dentry->d_sb
    ...

    Linus Torvalds
     
  • Conflicts:
    drivers/net/ethernet/amd/xgbe/xgbe-desc.c
    drivers/net/ethernet/renesas/sh_eth.c

    Overlapping changes in both conflict cases.

    Signed-off-by: David S. Miller

    David S. Miller
     
  • As there are now no remaining users of arch_fast_hash(), lets kill
    it entirely.

    This basically reverts commit 71ae8aac3e19 ("lib: introduce arch
    optimized hash library") and follow-up work, that is f.e., commit
    237217546d44 ("lib: hash: follow-up fixups for arch hash"),
    commit e3fec2f74f7f ("lib: Add missing arch generic-y entries for
    asm-generic/hash.h") and last but not least commit 6a02652df511
    ("perf tools: Fix include for non x86 architectures").

    Cc: Francesco Fusco
    Cc: Thomas Graf
    Cc: Arnaldo Carvalho de Melo
    Signed-off-by: Daniel Borkmann
    Signed-off-by: David S. Miller

    Daniel Borkmann
     
  • Pull x86 MPX support from Thomas Gleixner:
    "This enables support for x86 MPX.

    MPX is a new debug feature for bound checking in user space. It
    requires kernel support to handle the bound tables and decode the
    bound violating instruction in the trap handler"

    * 'x86-mpx-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    asm-generic: Remove asm-generic arch_bprm_mm_init()
    mm: Make arch_unmap()/bprm_mm_init() available to all architectures
    x86: Cleanly separate use of asm-generic/mm_hooks.h
    x86 mpx: Change return type of get_reg_offset()
    fs: Do not include mpx.h in exec.c
    x86, mpx: Add documentation on Intel MPX
    x86, mpx: Cleanup unused bound tables
    x86, mpx: On-demand kernel allocation of bounds tables
    x86, mpx: Decode MPX instruction to get bound violation information
    x86, mpx: Add MPX-specific mmap interface
    x86, mpx: Introduce VM_MPX to indicate that a VMA is MPX specific
    x86, mpx: Add MPX to disabled features
    ia64: Sync struct siginfo with general version
    mips: Sync struct siginfo with general version
    mpx: Extend siginfo structure to include bound violation information
    x86, mpx: Rename cfg_reg_u and status_reg
    x86: mpx: Give bndX registers actual names
    x86: Remove arbitrary instruction size limit in instruction decoder

    Linus Torvalds
     
  • Pull irq domain updates from Thomas Gleixner:
    "The real interesting irq updates:

    - Support for hierarchical irq domains:

    For complex interrupt routing scenarios where more than one
    interrupt related chip is involved we had no proper representation
    in the generic interrupt infrastructure so far. That made people
    implement rather ugly constructs in their nested irq chip
    implementations. The main offenders are x86 and arm/gic.

    To distangle that mess we have now hierarchical irqdomains which
    seperate the various interrupt chips and connect them via the
    hierarchical domains. That keeps the domain specific details
    internal to the particular hierarchy level and removes the
    criss/cross referencing of chip internals. The resulting hierarchy
    for a complex x86 system will look like this:

    vector mapped: 74
    msi-0 mapped: 2
    dmar-ir-1 mapped: 69
    ioapic-1 mapped: 4
    ioapic-0 mapped: 20
    pci-msi-2 mapped: 45
    dmar-ir-0 mapped: 3
    ioapic-2 mapped: 1
    pci-msi-1 mapped: 2
    htirq mapped: 0

    Neither ioapic nor pci-msi know about the dmar interrupt remapping
    between themself and the vector domain. If interrupt remapping is
    disabled ioapic and pci-msi become direct childs of the vector
    domain.

    In hindsight we should have done that years ago, but in hindsight
    we always know better :)

    - Support for generic MSI interrupt domain handling

    We have more and more non PCI related MSI interrupts, so providing
    a generic infrastructure for this is better than having all
    affected architectures implementing their own private hacks.

    - Support for PCI-MSI interrupt domain handling, based on the generic
    MSI support.

    This part carries the pci/msi branch from Bjorn Helgaas pci tree to
    avoid a massive conflict. The PCI/MSI parts are acked by Bjorn.

    I have two more branches on top of this. The full conversion of x86
    to hierarchical domains and a partial conversion of arm/gic"

    * 'irq-irqdomain-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (41 commits)
    genirq: Move irq_chip_write_msi_msg() helper to core
    PCI/MSI: Allow an msi_controller to be associated to an irq domain
    PCI/MSI: Provide mechanism to alloc/free MSI/MSIX interrupt from irqdomain
    PCI/MSI: Enhance core to support hierarchy irqdomain
    PCI/MSI: Move cached entry functions to irq core
    genirq: Provide default callbacks for msi_domain_ops
    genirq: Introduce msi_domain_alloc/free_irqs()
    asm-generic: Add msi.h
    genirq: Add generic msi irq domain support
    genirq: Introduce callback irq_chip.irq_write_msi_msg
    genirq: Work around __irq_set_handler vs stacked domains ordering issues
    irqdomain: Introduce helper function irq_domain_add_hierarchy()
    irqdomain: Implement a method to automatically call parent domains alloc/free
    genirq: Introduce helper irq_domain_set_info() to reduce duplicated code
    genirq: Split out flow handler typedefs into seperate header file
    genirq: Add IRQ_SET_MASK_OK_DONE to support stacked irqchip
    genirq: Introduce irq_chip.irq_compose_msi_msg() to support stacked irqchip
    genirq: Add more helper functions to support stacked irq_chip
    genirq: Introduce helper functions to support stacked irq_chip
    irqdomain: Do irq_find_mapping and set_type for hierarchy irqdomain in case OF
    ...

    Linus Torvalds
     

10 Dec, 2014

1 commit

  • Pull asm-generic asm/io.h rewrite from Arnd Bergmann:
    "While there normally is no reason to have a pull request for
    asm-generic but have all changes get merged through whichever tree
    needs them, I do have a series for 3.19.

    There are two sets of patches that change significant portions of
    asm/io.h, and this branch contains both in order to resolve the
    conflicts:

    - Will Deacon has done a set of patches to ensure that all
    architectures define {read,write}{b,w,l,q}_relaxed() functions or
    get them by including asm-generic/io.h.

    These functions are commonly used on ARM specific drivers to avoid
    expensive L2 cache synchronization implied by the normal
    {read,write}{b,w,l,q}, but we need to define them on all
    architectures in order to share the drivers across architectures
    and to enable CONFIG_COMPILE_TEST configurations for them

    - Thierry Reding has done an unrelated set of patches that extends
    the asm-generic/io.h file to the degree necessary to make it useful
    on ARM64 and potentially other architectures"

    * tag 'asm-generic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic: (29 commits)
    ARM64: use GENERIC_PCI_IOMAP
    sparc: io: remove duplicate relaxed accessors on sparc32
    ARM: sa11x0: Use void __iomem * in MMIO accessors
    arm64: Use include/asm-generic/io.h
    ARM: Use include/asm-generic/io.h
    asm-generic/io.h: Implement generic {read,write}s*()
    asm-generic/io.h: Reconcile I/O accessor overrides
    /dev/mem: Use more consistent data types
    Change xlate_dev_{kmem,mem}_ptr() prototypes
    ARM: ixp4xx: Properly override I/O accessors
    ARM: ixp4xx: Fix build with IXP4XX_INDIRECT_PCI
    ARM: ebsa110: Properly override I/O accessors
    ARC: Remove redundant PCI_IOBASE declaration
    documentation: memory-barriers: clarify relaxed io accessor semantics
    x86: io: implement dummy relaxed accessor macros for writes
    tile: io: implement dummy relaxed accessor macros for writes
    sparc: io: implement dummy relaxed accessor macros for writes
    powerpc: io: implement dummy relaxed accessor macros for writes
    parisc: io: implement dummy relaxed accessor macros for writes
    mn10300: io: implement dummy relaxed accessor macros for writes
    ...

    Linus Torvalds
     

09 Dec, 2014

1 commit


08 Dec, 2014

9 commits

  • git commit 8461b63ca01d125a245a0d0fb4821ea0656e5053
    "s390: translate cputime magic constants to macros"
    introduce a built error for 31-bit:

    kernel/built-in.o: In function `posix_cpu_timer_set':
    posix-cpu-timers.c:(.text+0x2a8cc): undefined reference to `__udivdi3'

    The original code is actually broken for 31-bit and has been
    corrected by the above commit by forcing the compiler to use
    64-bit arithmetic through the CPUTIME_PER_USEC define.

    To fix the compile error replace the 64-bit division with
    a call to __div().

    Signed-off-by: Martin Schwidefsky

    Martin Schwidefsky
     
  • The pmd_free_tlb function fails to call pgtable_pmd_page_dtor.
    Without the call the ptlock for the pmd tables will not be freed.
    Add the missing call.

    Signed-off-by: Martin Schwidefsky

    Martin Schwidefsky
     
  • To improve the output of the perf tool hide most of the symbols
    from entry[64].S by using the '.L' prefix.

    Signed-off-by: Martin Schwidefsky

    Martin Schwidefsky
     
  • On machines with support for vector registers the signal frame includes
    an area for the vector registers and the ptrace regset interface allow
    read and write. This is true even if the task never used any vector
    instruction. Only elf core dumps do not include the vector registers,
    to make things consistent always include the vector register note in
    core dumps create on a machine with vector register support.

    Signed-off-by: Martin Schwidefsky

    Martin Schwidefsky
     
  • The copy_thread function fails to reset the p->thread.vxrs pointer.
    This causes the child to use the same vector register save area,
    causing both data corruptions and multiple frees of the memory for
    the save area after the tasks sharing the save area terminate.

    Signed-off-by: Martin Schwidefsky

    Martin Schwidefsky
     
  • Make the code more self-explanatory by naming magic constants.

    Cc: Benjamin Herrenschmidt
    Cc: Heiko Carstens
    Cc: Ingo Molnar
    Cc: Martin Schwidefsky
    Cc: Oleg Nesterov
    Cc: Paul Mackerras
    Cc: Peter Zijlstra
    Cc: Rik van Riel
    Cc: Thomas Gleixner
    Cc: Tony Luck
    Cc: Wu Fengguang
    Signed-off-by: Frederic Weisbecker
    Signed-off-by: Heiko Carstens
    Signed-off-by: Martin Schwidefsky

    Frederic Weisbecker
     
  • s390 uses open coded seqcount to synchronize idle time accounting.
    Lets consolidate it with the standard API.

    Cc: Benjamin Herrenschmidt
    Cc: Heiko Carstens
    Cc: Ingo Molnar
    Cc: Martin Schwidefsky
    Cc: Oleg Nesterov
    Cc: Paul Mackerras
    Cc: Peter Zijlstra
    Cc: Rik van Riel
    Cc: Thomas Gleixner
    Cc: Tony Luck
    Cc: Wu Fengguang
    Signed-off-by: Frederic Weisbecker
    Signed-off-by: Heiko Carstens
    Signed-off-by: Martin Schwidefsky

    Frederic Weisbecker
     
  • psw_idle() returns with interrupts disabled, so we should add the
    missing annotation.

    Signed-off-by: Heiko Carstens
    Signed-off-by: Martin Schwidefsky

    Heiko Carstens
     
  • debug_sprintf_event/exception are called even for debug events
    with a disabling debug level. All other functions already do
    the check in a wrapper function. Lets do the same here.
    Due to the var_args the compiler rejects to make this function
    inline. So let's wrap this via a macro.
    This patch saves around 80 ns on my z196 for a KVM round trip (we
    have two debug statements for entry and exit) when KVM is build as
    a module.
    The savings for built-in drivers is smaller as we then avoid the
    PLT overhead for a function call.

    Signed-off-by: Christian Borntraeger
    Reviewed-by: Michael Holzheu
    Reviewed-by: David Hildenbrand
    Signed-off-by: Martin Schwidefsky

    Christian Borntraeger
     

06 Dec, 2014

2 commits

  • introduce new setsockopt() command:

    setsockopt(sock, SOL_SOCKET, SO_ATTACH_BPF, &prog_fd, sizeof(prog_fd))

    where prog_fd was received from syscall bpf(BPF_PROG_LOAD, attr, ...)
    and attr->prog_type == BPF_PROG_TYPE_SOCKET_FILTER

    setsockopt() calls bpf_prog_get() which increments refcnt of the program,
    so it doesn't get unloaded while socket is using the program.

    The same eBPF program can be attached to multiple sockets.

    User task exit automatically closes socket which calls sk_filter_uncharge()
    which decrements refcnt of eBPF program

    Signed-off-by: Alexei Starovoitov
    Signed-off-by: David S. Miller

    Alexei Starovoitov
     
  • Today there are 3 instances of setgroups and due to an oversight their
    permission checking has diverged. Add a common function so that
    they may all share the same permission checking code.

    This corrects the current oversight in the current permission checks
    and adds a helper to avoid this in the future.

    A user namespace security fix will update this new helper, shortly.

    Cc: stable@vger.kernel.org
    Signed-off-by: "Eric W. Biederman"

    Eric W. Biederman
     

04 Dec, 2014

3 commits

  • Instead of returning a possibly random or'ed together value, let's
    always return -EFAULT if rc is set.

    Signed-off-by: Jens Freimann
    Reviewed-by: David Hildenbrand
    Acked-by: Cornelia Huck
    Signed-off-by: Christian Borntraeger

    Jens Freimann
     
  • Currently we use a mixture of atomic/non-atomic bitops
    and the local_int spin lock to protect the pending_irqs bitmap
    and interrupt payload data.

    We need to use atomic bitops for the pending_irqs bitmap everywhere
    and in addition acquire the local_int lock where interrupt data needs
    to be protected.

    Signed-off-by: Jens Freimann
    Reviewed-by: David Hildenbrand
    Signed-off-by: Christian Borntraeger

    Jens Freimann
     
  • The cpu address of a source cpu (responsible for an external irq) is only to
    be stored if bit 6 of the ext irq code is set.

    If bit 6 is not set, it is to be zeroed out.

    The special external irq code used for virtio and pfault uses the cpu addr as a
    parameter field. As bit 6 is set, this implementation is correct.

    Reviewed-by: Thomas Huth
    Signed-off-by: David Hildenbrand
    Signed-off-by: Christian Borntraeger

    David Hildenbrand
     

01 Dec, 2014

2 commits

  • When we generate the instruction for out of line execution the length
    of the to be copied instruction was evaluated from a not initialized
    memory location.
    Therefore we ended up with a random (2, 4 or 6) number of bytes being
    copied instead of taking the real instruction length into account.
    This works surprisingly well most of the time, but still not always.

    Reported-by: Ursula Braun
    Signed-off-by: Heiko Carstens
    Signed-off-by: Martin Schwidefsky

    Heiko Carstens
     
  • Commit eb7e7d76 "s390: Replace __get_cpu_var uses" broke machine check
    handling.

    We copy machine check information from per-cpu to a stack variable for
    local processing. Next we should zap the per-cpu variable, not the
    stack variable.

    Signed-off-by: Sebastian Ott
    Reviewed-by: Heiko Carstens
    Acked-by: Christoph Lameter
    Signed-off-by: Martin Schwidefsky

    Sebastian Ott
     

28 Nov, 2014

5 commits

  • Allow to specify CR14, logout area, external damage code
    and failed storage address.

    Since more then one machine check can be indicated to the guest at
    a time we need to combine all indication bits with already pending
    requests.

    Signed-off-by: Jens Freimann
    Reviewed-by: Cornelia Huck
    Reviewed-by: David Hildenbrand
    Signed-off-by: Christian Borntraeger

    Jens Freimann
     
  • This patch adapts handling of local interrupts to be more compliant with
    the z/Architecture Principles of Operation and introduces a data
    structure
    which allows more efficient handling of interrupts.

    * get rid of li->active flag, use bitmap instead
    * Keep interrupts in a bitmap instead of a list
    * Deliver interrupts in the order of their priority as defined in the
    PoP
    * Use a second bitmap for sigp emergency requests, as a CPU can have
    one request pending from every other CPU in the system.

    Signed-off-by: Jens Freimann
    Reviewed-by: Cornelia Huck
    Reviewed-by: David Hildenbrand
    Signed-off-by: Christian Borntraeger

    Jens Freimann
     
  • Adds a bitmap to the vcpu structure which is used to keep track
    of local pending interrupts. Also add enum with all interrupt
    types sorted in order of priority (highest to lowest)

    Signed-off-by: Jens Freimann
    Reviewed-by: Thomas Huth
    Reviewed-by: Cornelia Huck
    Reviewed-by: David Hildenbrand
    Signed-off-by: Christian Borntraeger

    Jens Freimann
     
  • Move delivery code for cpu-local interrupt from the huge do_deliver_interrupt()
    to smaller functions which handle one type of interrupt.

    Signed-off-by: Jens Freimann
    Reviewed-by: David Hildenbrand
    Reviewed-by: Cornelia Huck
    Signed-off-by: Christian Borntraeger

    Jens Freimann
     
  • Get rid of open coded value for virtio and pfault completion interrupts.

    Signed-off-by: Jens Freimann
    Reviewed-by: David Hildenbrand
    Signed-off-by: Christian Borntraeger

    Jens Freimann