17 Dec, 2014

1 commit

  • Pull ARM SoC/iommu configuration update from Arnd Bergmann:
    "The iommu-config branch contains work from Will Deacon, quoting his
    description:

    This series adds automatic IOMMU and DMA-mapping configuration for
    OF-based DMA masters described using the generic IOMMU devicetree
    bindings. Although there is plenty of future work around splitting up
    iommu_ops, adding default IOMMU domains and sorting out automatic IOMMU
    group creation for the platform_bus, this is already useful enough for
    people to port over their IOMMU drivers and start using the new probing
    infrastructure (indeed, Marek has patches queued for the Exynos IOMMU).

    The branch touches core ARM and IOMMU driver files, and the respective
    maintainers (Russell King and Joerg Roedel) agreed to have the
    contents merged through the arm-soc tree.

    The final version was ready just before the merge window, so we ended
    up delaying it a bit longer than the rest, but we don't expect to see
    regressions because this is just additional infrastructure that will
    get used in drivers starting in 3.20 but is unused so far"

    * tag 'iommu-config-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc:
    iommu: store DT-probed IOMMU data privately
    arm: dma-mapping: plumb our iommu mapping ops into arch_setup_dma_ops
    arm: call iommu_init before of_platform_populate
    dma-mapping: detect and configure IOMMU in of_dma_configure
    iommu: fix initialization without 'add_device' callback
    iommu: provide helper function to configure an IOMMU for an of master
    iommu: add new iommu_ops callback for adding an OF device
    dma-mapping: replace set_arch_dma_coherent_ops with arch_setup_dma_ops
    iommu: provide early initialisation hook for IOMMU drivers

    Linus Torvalds
     

13 Dec, 2014

1 commit

  • Pull another networking update from David Miller:
    "Small follow-up to the main merge pull from the other day:

    1) Alexander Duyck's DMA memory barrier patch set.

    2) cxgb4 driver fixes from Karen Xie.

    3) Add missing export of fixed_phy_register() to modules, from Mark
    Salter.

    4) DSA bug fixes from Florian Fainelli"

    * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (24 commits)
    net/macb: add TX multiqueue support for gem
    linux/interrupt.h: remove the definition of unused tasklet_hi_enable
    jme: replace calls to redundant function
    net: ethernet: davicom: Allow to select DM9000 for nios2
    net: ethernet: smsc: Allow to select SMC91X for nios2
    cxgb4: Add support for QSA modules
    libcxgbi: fix freeing skb prematurely
    cxgb4i: use set_wr_txq() to set tx queues
    cxgb4i: handle non-pdu-aligned rx data
    cxgb4i: additional types of negative advice
    cxgb4/cxgb4i: set the max. pdu length in firmware
    cxgb4i: fix credit check for tx_data_wr
    cxgb4i: fix tx immediate data credit check
    net: phy: export fixed_phy_register()
    fib_trie: Fix trie balancing issue if new node pushes down existing node
    vlan: Add ability to always enable TSO/UFO
    r8169:update rtl8168g pcie ephy parameter
    net: dsa: bcm_sf2: force link for all fixed PHY devices
    fm10k/igb/ixgbe: Use dma_rmb on Rx descriptor reads
    r8169: Use dma_rmb() and dma_wmb() for DescOwn checks
    ...

    Linus Torvalds
     

12 Dec, 2014

3 commits

  • There are a number of situations where the mandatory barriers rmb() and
    wmb() are used to order memory/memory operations in the device drivers
    and those barriers are much heavier than they actually need to be. For
    example in the case of PowerPC wmb() calls the heavy-weight sync
    instruction when for coherent memory operations all that is really needed
    is an lwsync or eieio instruction.

    This commit adds a coherent only version of the mandatory memory barriers
    rmb() and wmb(). In most cases this should result in the barrier being the
    same as the SMP barriers for the SMP case, however in some cases we use a
    barrier that is somewhere in between rmb() and smp_rmb(). For example on
    ARM the rmb barriers break down as follows:

    Barrier    Call      Explanation
    ---------  --------  -------------------------------------
    rmb()      dsb()     Data synchronization barrier - system
    dma_rmb()  dmb(osh)  data memory barrier - outer sharable
    smp_rmb()  dmb(ish)  data memory barrier - inner sharable

    These new barriers are not as safe as the standard rmb() and wmb().
    Specifically they do not guarantee ordering between coherent and incoherent
    memories. The primary use case for these would be to enforce ordering of
    reads and writes when accessing coherent memory that is shared between the
    CPU and a device.

    It may also be noted that there is no dma_mb(). Most architectures don't
    provide a good mechanism for performing a coherent only full barrier without
    resorting to the same mechanism used in mb(). As such there isn't much to
    be gained in trying to define such a function.
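
    The primary use case described above, ordering accesses to a coherent
    descriptor shared between the CPU and a device, can be sketched in
    userspace as follows. This is only an illustration under stated
    assumptions: struct rx_desc, DESC_OWN and both helper functions are
    hypothetical, and the dma_rmb()/dma_wmb() macros here are C11 fences
    standing in for the architecture-specific kernel barriers.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace stand-ins (assumption): the kernel's dma_rmb()/dma_wmb() are
 * architecture-specific; C11 fences only model the ordering intent here. */
#define dma_rmb() atomic_thread_fence(memory_order_acquire)
#define dma_wmb() atomic_thread_fence(memory_order_release)

#define DESC_OWN 0x80000000u	/* hypothetical "owned by device" flag */

struct rx_desc {		/* hypothetical descriptor in coherent memory */
	volatile uint32_t status;
	volatile uint32_t length;
};

/* Returns the packet length, or -1 if the device still owns the descriptor. */
static int rx_desc_poll(struct rx_desc *desc)
{
	assert(desc != NULL);

	if (desc->status & DESC_OWN)
		return -1;

	/* Read the remaining fields only after the ownership check. */
	dma_rmb();

	return (int)desc->length;
}

/* Hands a descriptor back to the device once its fields are written. */
static void rx_desc_give_to_device(struct rx_desc *desc)
{
	assert(desc != NULL);

	desc->length = 0;

	/* Order the field writes before the ownership flag flips. */
	dma_wmb();

	desc->status = DESC_OWN;
}
```

    Both accesses on each side of a barrier target the same coherent
    memory, which is the case the lighter barriers are meant for. An MMIO
    doorbell write after the descriptor update would still need wmb().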

    Cc: Frederic Weisbecker
    Cc: Mathieu Desnoyers
    Cc: Michael Ellerman
    Cc: Michael Neuling
    Cc: Russell King
    Cc: Geert Uytterhoeven
    Cc: Heiko Carstens
    Cc: Linus Torvalds
    Cc: Martin Schwidefsky
    Cc: Tony Luck
    Cc: Oleg Nesterov
    Cc: "Paul E. McKenney"
    Cc: Peter Zijlstra
    Cc: Ingo Molnar
    Cc: David Miller
    Acked-by: Benjamin Herrenschmidt
    Acked-by: Will Deacon
    Signed-off-by: Alexander Duyck
    Signed-off-by: David S. Miller

    Alexander Duyck
     
  • Pull s390 updates from Martin Schwidefsky:
    "The most notable change for this pull request is the ftrace rework
    from Heiko. It brings a small performance improvement and the ground
    work to support a new gcc option to replace the mcount blocks with a
    single nop.

    Two new s390 specific system calls are added to emulate user space
    mmio for PCI, an artifact of how PCI memory is accessed.

    Two patches for the memory management with changes to common code.
    For KVM mm_forbids_zeropage is added which disables the empty zero
    page for an mm that is used by a KVM process. And an optimization,
    pmdp_get_and_clear_full is added, analogous to ptep_get_and_clear_full.

    Some micro optimization for the cmpxchg and the spinlock code.

    And as usual bug fixes and cleanups"

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (46 commits)
    s390/cputime: fix 31-bit compile
    s390/scm_block: make the number of reqs per HW req configurable
    s390/scm_block: handle multiple requests in one HW request
    s390/scm_block: allocate aidaw pages only when necessary
    s390/scm_block: use mempool to manage aidaw requests
    s390/eadm: change timeout value
    s390/mm: fix memory leak of ptlock in pmd_free_tlb
    s390: use local symbol names in entry[64].S
    s390/ptrace: always include vector registers in core files
    s390/simd: clear vector register pointer on fork/clone
    s390: translate cputime magic constants to macros
    s390/idle: convert open coded idle time seqcount
    s390/idle: add missing irq off lockdep annotation
    s390/debug: avoid function call for debug_sprintf_*
    s390/kprobes: fix instruction copy for out of line execution
    s390: remove diag 44 calls from cpu_relax()
    s390/dasd: retry partition detection
    s390/dasd: fix list corruption for sleep_on requests
    s390/dasd: fix infinite term I/O loop
    s390/dasd: remove unused code
    ...

    Linus Torvalds
     
  • Pull networking updates from David Miller:

    1) New offloading infrastructure and example 'rocker' driver for
    offloading of switching and routing to hardware.

    This work was done by a large group of dedicated individuals, not
    limited to: Scott Feldman, Jiri Pirko, Thomas Graf, John Fastabend,
    Jamal Hadi Salim, Andy Gospodarek, Florian Fainelli, Roopa Prabhu

    2) Start making the networking operate on IOV iterators instead of
    modifying iov objects in-situ during transfers. Thanks to Al Viro
    and Herbert Xu.

    3) A set of new netlink interfaces for the TIPC stack, from Richard
    Alpe.

    4) Remove unnecessary looping during ipv6 routing lookups, from Martin
    KaFai Lau.

    5) Add PAUSE frame generation support to gianfar driver, from Matei
    Pavaluca.

    6) Allow for larger reordering levels in TCP, which are easily
    achievable in the real world right now, from Eric Dumazet.

    7) Add a variant of napi_schedule that doesn't need to disable cpu
    interrupts, from Eric Dumazet.

    8) Use a doubly linked list to optimize neigh_parms_release(), from
    Nicolas Dichtel.

    9) Various enhancements to the kernel BPF verifier, and allow eBPF
    programs to actually be attached to sockets. From Alexei
    Starovoitov.

    10) Support TSO/LSO in sunvnet driver, from David L Stevens.

    11) Allow controlling ECN usage via routing metrics, from Florian
    Westphal.

    12) Remote checksum offload, from Tom Herbert.

    13) Add split-header receive, BQL, and xmit_more support to amd-xgbe
    driver, from Thomas Lendacky.

    14) Add MPLS support to openvswitch, from Simon Horman.

    15) Support wildcard tunnel endpoints in ipv6 tunnels, from Steffen
    Klassert.

    16) Do gro flushes on a per-device basis using a timer, from Eric
    Dumazet. This tries to resolve the conflicting goals between the
    desired handling of bulk vs. RPC-like traffic.

    17) Allow userspace to ask which CPU a packet was
    received/steered on, via SO_INCOMING_CPU. From Eric Dumazet.

    18) Limit GSO packets to half the current congestion window, from Eric
    Dumazet.

    19) Add a generic helper so that all drivers set their RSS keys in a
    consistent way, from Eric Dumazet.

    20) Add xmit_more support to enic driver, from Govindarajulu
    Varadarajan.

    21) Add VLAN packet scheduler action, from Jiri Pirko.

    22) Support configurable RSS hash functions via ethtool, from Eyal
    Perry.

    * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1820 commits)
    Fix race condition between vxlan_sock_add and vxlan_sock_release
    net/macb: fix compilation warning for print_hex_dump() called with skb->mac_header
    net/mlx4: Add support for A0 steering
    net/mlx4: Refactor QUERY_PORT
    net/mlx4_core: Add explicit error message when rule doesn't meet configuration
    net/mlx4: Add A0 hybrid steering
    net/mlx4: Add mlx4_bitmap zone allocator
    net/mlx4: Add a check if there are too many reserved QPs
    net/mlx4: Change QP allocation scheme
    net/mlx4_core: Use tasklet for user-space CQ completion events
    net/mlx4_core: Mask out host side virtualization features for guests
    net/mlx4_en: Set csum level for encapsulated packets
    be2net: Export tunnel offloads only when a VxLAN tunnel is created
    gianfar: Fix dma check map error when DMA_API_DEBUG is enabled
    cxgb4/csiostor: Don't use MASTER_MUST for fw_hello call
    net: fec: only enable mdio interrupt before phy device link up
    net: fec: clear all interrupt events to support i.MX6SX
    net: fec: reset fep link status in suspend function
    net: sock: fix access via invalid file descriptor
    net: introduce helper macro for_each_cmsghdr
    ...

    Linus Torvalds
     

11 Dec, 2014

3 commits

  • As there are now no remaining users of arch_fast_hash(), let's kill
    it entirely.

    This basically reverts commit 71ae8aac3e19 ("lib: introduce arch
    optimized hash library") and follow-up work, that is f.e., commit
    237217546d44 ("lib: hash: follow-up fixups for arch hash"),
    commit e3fec2f74f7f ("lib: Add missing arch generic-y entries for
    asm-generic/hash.h") and last but not least commit 6a02652df511
    ("perf tools: Fix include for non x86 architectures").

    Cc: Francesco Fusco
    Cc: Thomas Graf
    Cc: Arnaldo Carvalho de Melo
    Signed-off-by: Daniel Borkmann
    Signed-off-by: David S. Miller

    Daniel Borkmann
     
  • Pull x86 MPX support from Thomas Gleixner:
    "This enables support for x86 MPX.

    MPX is a new debug feature for bound checking in user space. It
    requires kernel support to handle the bound tables and decode the
    bound violating instruction in the trap handler"

    * 'x86-mpx-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    asm-generic: Remove asm-generic arch_bprm_mm_init()
    mm: Make arch_unmap()/bprm_mm_init() available to all architectures
    x86: Cleanly separate use of asm-generic/mm_hooks.h
    x86 mpx: Change return type of get_reg_offset()
    fs: Do not include mpx.h in exec.c
    x86, mpx: Add documentation on Intel MPX
    x86, mpx: Cleanup unused bound tables
    x86, mpx: On-demand kernel allocation of bounds tables
    x86, mpx: Decode MPX instruction to get bound violation information
    x86, mpx: Add MPX-specific mmap interface
    x86, mpx: Introduce VM_MPX to indicate that a VMA is MPX specific
    x86, mpx: Add MPX to disabled features
    ia64: Sync struct siginfo with general version
    mips: Sync struct siginfo with general version
    mpx: Extend siginfo structure to include bound violation information
    x86, mpx: Rename cfg_reg_u and status_reg
    x86: mpx: Give bndX registers actual names
    x86: Remove arbitrary instruction size limit in instruction decoder

    Linus Torvalds
     
  • Pull irq domain updates from Thomas Gleixner:
    "The real interesting irq updates:

    - Support for hierarchical irq domains:

    For complex interrupt routing scenarios where more than one
    interrupt related chip is involved we had no proper representation
    in the generic interrupt infrastructure so far. That made people
    implement rather ugly constructs in their nested irq chip
    implementations. The main offenders are x86 and arm/gic.

    To disentangle that mess we have now hierarchical irqdomains which
    separate the various interrupt chips and connect them via the
    hierarchical domains. That keeps the domain specific details
    internal to the particular hierarchy level and removes the
    criss/cross referencing of chip internals. The resulting hierarchy
    for a complex x86 system will look like this:

    vector              mapped: 74
      msi-0             mapped: 2
      dmar-ir-1         mapped: 69
        ioapic-1        mapped: 4
        ioapic-0        mapped: 20
        pci-msi-2       mapped: 45
      dmar-ir-0         mapped: 3
        ioapic-2        mapped: 1
        pci-msi-1       mapped: 2
      htirq             mapped: 0

    Neither ioapic nor pci-msi know about the dmar interrupt remapping
    between themselves and the vector domain. If interrupt remapping is
    disabled ioapic and pci-msi become direct children of the vector
    domain.

    In hindsight we should have done that years ago, but in hindsight
    we always know better :)

    - Support for generic MSI interrupt domain handling

    We have more and more non PCI related MSI interrupts, so providing
    a generic infrastructure for this is better than having all
    affected architectures implementing their own private hacks.

    - Support for PCI-MSI interrupt domain handling, based on the generic
    MSI support.

    This part carries the pci/msi branch from Bjorn Helgaas pci tree to
    avoid a massive conflict. The PCI/MSI parts are acked by Bjorn.

    I have two more branches on top of this. The full conversion of x86
    to hierarchical domains and a partial conversion of arm/gic"
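
    The parent/child relationship described in this pull message can be
    modelled with a small userspace sketch. struct irq_domain_mock and
    both helpers are hypothetical and are not the kernel's struct
    irq_domain API; the sketch only shows that a child domain needs to
    know nothing but its immediate parent.

```c
#include <assert.h>
#include <stddef.h>

struct irq_domain_mock {			/* hypothetical model */
	const char *name;
	const struct irq_domain_mock *parent;	/* NULL for the root domain */
};

/* Counts the levels between a domain and the root of its hierarchy. */
static int irq_domain_depth(const struct irq_domain_mock *d)
{
	int depth = 0;

	assert(d != NULL);
	while (d->parent) {
		d = d->parent;
		depth++;
	}
	return depth;
}

/* Walks the parent pointers up to the root domain. */
static const struct irq_domain_mock *
irq_domain_root(const struct irq_domain_mock *d)
{
	assert(d != NULL);
	while (d->parent)
		d = d->parent;
	return d;
}
```

    Re-parenting ioapic directly under vector, as happens when interrupt
    remapping is disabled, changes only one pointer in this model. The
    ioapic level itself is untouched.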

    * 'irq-irqdomain-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (41 commits)
    genirq: Move irq_chip_write_msi_msg() helper to core
    PCI/MSI: Allow an msi_controller to be associated to an irq domain
    PCI/MSI: Provide mechanism to alloc/free MSI/MSIX interrupt from irqdomain
    PCI/MSI: Enhance core to support hierarchy irqdomain
    PCI/MSI: Move cached entry functions to irq core
    genirq: Provide default callbacks for msi_domain_ops
    genirq: Introduce msi_domain_alloc/free_irqs()
    asm-generic: Add msi.h
    genirq: Add generic msi irq domain support
    genirq: Introduce callback irq_chip.irq_write_msi_msg
    genirq: Work around __irq_set_handler vs stacked domains ordering issues
    irqdomain: Introduce helper function irq_domain_add_hierarchy()
    irqdomain: Implement a method to automatically call parent domains alloc/free
    genirq: Introduce helper irq_domain_set_info() to reduce duplicated code
    genirq: Split out flow handler typedefs into seperate header file
    genirq: Add IRQ_SET_MASK_OK_DONE to support stacked irqchip
    genirq: Introduce irq_chip.irq_compose_msi_msg() to support stacked irqchip
    genirq: Add more helper functions to support stacked irq_chip
    genirq: Introduce helper functions to support stacked irq_chip
    irqdomain: Do irq_find_mapping and set_type for hierarchy irqdomain in case OF
    ...

    Linus Torvalds
     

10 Dec, 2014

3 commits

  • Pull scheduler updates from Ingo Molnar:
    "The main changes in this cycle are:

    - 'Nested Sleep Debugging', activated when CONFIG_DEBUG_ATOMIC_SLEEP=y.

    This instruments might_sleep() checks to catch places that nest
    blocking primitives - such as mutex usage in a wait loop. Such
    bugs can result in hard to debug races/hangs.

    Another category of invalid nesting that this facility will detect
    is the calling of blocking functions from within schedule() ->
    sched_submit_work() -> blk_schedule_flush_plug().

    There's some potential for false positives (if secondary blocking
    primitives themselves are not ready yet for this facility), but the
    kernel will warn once about such bugs per bootup, so the warning
    isn't much of a nuisance.

    This feature comes with a number of fixes, for problems uncovered
    with it, so no messages are expected normally.

    - Another round of sched/numa optimizations and refinements, for
    CONFIG_NUMA_BALANCING=y.

    - Another round of sched/dl fixes and refinements.

    Plus various smaller fixes and cleanups"

    * 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (54 commits)
    sched: Add missing rcu protection to wake_up_all_idle_cpus
    sched/deadline: Introduce start_hrtick_dl() for !CONFIG_SCHED_HRTICK
    sched/numa: Init numa balancing fields of init_task
    sched/deadline: Remove unnecessary definitions in cpudeadline.h
    sched/cpupri: Remove unnecessary definitions in cpupri.h
    sched/deadline: Fix rq->dl.pushable_tasks bug in push_dl_task()
    sched/fair: Fix stale overloaded status in the busiest group finding logic
    sched: Move p->nr_cpus_allowed check to select_task_rq()
    sched/completion: Document when to use wait_for_completion_io_*()
    sched: Update comments about CLONE_NEWUTS and CLONE_NEWIPC
    sched/fair: Kill task_struct::numa_entry and numa_group::task_list
    sched: Refactor task_struct to use numa_faults instead of numa_* pointers
    sched/deadline: Don't check CONFIG_SMP in switched_from_dl()
    sched/deadline: Reschedule from switched_from_dl() after a successful pull
    sched/deadline: Push task away if the deadline is equal to curr during wakeup
    sched/deadline: Add deadline rq status print
    sched/deadline: Fix artificial overrun introduced by yield_task_dl()
    sched/rt: Clean up check_preempt_equal_prio()
    sched/core: Use dl_bw_of() under rcu_read_lock_sched()
    sched: Check if we got a shallowest_idle_cpu before searching for least_loaded_cpu
    ...

    Linus Torvalds
     
  • Pull asm-generic asm/io.h rewrite from Arnd Bergmann:
    "While there normally is no reason to have a pull request for
    asm-generic but have all changes get merged through whichever tree
    needs them, I do have a series for 3.19.

    There are two sets of patches that change significant portions of
    asm/io.h, and this branch contains both in order to resolve the
    conflicts:

    - Will Deacon has done a set of patches to ensure that all
    architectures define {read,write}{b,w,l,q}_relaxed() functions or
    get them by including asm-generic/io.h.

    These functions are commonly used on ARM specific drivers to avoid
    expensive L2 cache synchronization implied by the normal
    {read,write}{b,w,l,q}, but we need to define them on all
    architectures in order to share the drivers across architectures
    and to enable CONFIG_COMPILE_TEST configurations for them

    - Thierry Reding has done an unrelated set of patches that extends
    the asm-generic/io.h file to the degree necessary to make it useful
    on ARM64 and potentially other architectures"
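
    The fallback idea behind the relaxed accessors can be sketched in
    userspace like this. readl() and writel() are mocked as plain
    volatile accesses (an assumption; the real accessors are
    architecture-specific), and the #ifndef blocks show how an
    architecture without a cheaper variant could simply reuse the ordered
    accessor under the relaxed name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Mock of a fully ordered 32-bit MMIO read (assumption: volatile load). */
static inline uint32_t readl(const volatile void *addr)
{
	assert(addr != NULL);
	return *(const volatile uint32_t *)addr;
}

/* Mock of a fully ordered 32-bit MMIO write (assumption: volatile store). */
static inline void writel(uint32_t value, volatile void *addr)
{
	assert(addr != NULL);
	*(volatile uint32_t *)addr = value;
}

/*
 * An architecture with a cheaper variant would define these names before
 * this point. Everyone else gets the ordered accessor as a safe default.
 */
#ifndef readl_relaxed
#define readl_relaxed readl
#endif

#ifndef writel_relaxed
#define writel_relaxed writel
#endif
```

    With such a default in place, a driver written against
    readl_relaxed()/writel_relaxed() builds on every architecture, which
    is what allows CONFIG_COMPILE_TEST coverage.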

    * tag 'asm-generic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic: (29 commits)
    ARM64: use GENERIC_PCI_IOMAP
    sparc: io: remove duplicate relaxed accessors on sparc32
    ARM: sa11x0: Use void __iomem * in MMIO accessors
    arm64: Use include/asm-generic/io.h
    ARM: Use include/asm-generic/io.h
    asm-generic/io.h: Implement generic {read,write}s*()
    asm-generic/io.h: Reconcile I/O accessor overrides
    /dev/mem: Use more consistent data types
    Change xlate_dev_{kmem,mem}_ptr() prototypes
    ARM: ixp4xx: Properly override I/O accessors
    ARM: ixp4xx: Fix build with IXP4XX_INDIRECT_PCI
    ARM: ebsa110: Properly override I/O accessors
    ARC: Remove redundant PCI_IOBASE declaration
    documentation: memory-barriers: clarify relaxed io accessor semantics
    x86: io: implement dummy relaxed accessor macros for writes
    tile: io: implement dummy relaxed accessor macros for writes
    sparc: io: implement dummy relaxed accessor macros for writes
    powerpc: io: implement dummy relaxed accessor macros for writes
    parisc: io: implement dummy relaxed accessor macros for writes
    mn10300: io: implement dummy relaxed accessor macros for writes
    ...

    Linus Torvalds
     
  • Pull arm64 updates from Will Deacon:
    "Here's the usual mixed bag of arm64 updates, also including some
    related EFI changes (Acked by Matt) and the MMU gather range cleanup
    (Acked by you).

    Changes include:
    - support for alternative instruction patching from Andre
    - seccomp from Akashi
    - some AArch32 instruction emulation, required by the Android folks
    - optimisations for exception entry/exit code, cmpxchg, pcpu atomics
    - mmu_gather range calculations moved into core code
    - EFI updates from Ard, including long-awaited SMBIOS support
    - /proc/cpuinfo fixes to align with the format used by arch/arm/
    - a few non-critical fixes across the architecture"

    * tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (70 commits)
    arm64: remove the unnecessary arm64_swiotlb_init()
    arm64: add module support for alternatives fixups
    arm64: perf: Prevent wraparound during overflow
    arm64/include/asm: Fixed a warning about 'struct pt_regs'
    arm64: Provide a namespace to NCAPS
    arm64: bpf: lift restriction on last instruction
    arm64: Implement support for read-mostly sections
    arm64: compat: align cacheflush syscall with arch/arm
    arm64: add seccomp support
    arm64: add SIGSYS siginfo for compat task
    arm64: add seccomp syscall for compat task
    asm-generic: add generic seccomp.h for secure computing mode 1
    arm64: ptrace: allow tracer to skip a system call
    arm64: ptrace: add NT_ARM_SYSTEM_CALL regset
    arm64: Move some head.text functions to executable section
    arm64: jump labels: NOP out NOP -> NOP replacement
    arm64: add support to dump the kernel page tables
    arm64: Add FIX_HOLE to permanent fixed addresses
    arm64: alternatives: fix pr_fmt string for consistency
    arm64: vmlinux.lds.S: don't discard .exit.* sections at link-time
    ...

    Linus Torvalds
     

08 Dec, 2014

1 commit


02 Dec, 2014

1 commit

  • IOMMU drivers must be initialised before any of their upstream devices,
    otherwise the relevant iommu_ops won't be configured for the bus in
    question. To solve this, a number of IOMMU drivers use initcalls to
    initialise the driver before anything has a chance to be probed.

    Whilst this solves the immediate problem, it leaves the job of probing
    the IOMMU completely separate from the iommu_ops to configure the IOMMU,
    which are called on a per-bus basis and require the driver to figure out
    exactly which instance of the IOMMU is being requested. In particular,
    the add_device callback simply passes a struct device to the driver,
    which then has to parse firmware tables or probe buses to identify the
    relevant IOMMU instance.

    This patch takes the first step in addressing this problem by adding an
    early initialisation pass for IOMMU drivers, giving them the ability to
    store some per-instance data in their iommu_ops structure and store that
    in their of_node. This can later be used when parsing OF masters to
    identify the IOMMU instance in question.
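
    A userspace mock of the idea: an early pass records which iommu_ops
    instance belongs to which device-tree node, and a later lookup finds
    the instance from the node that a master refers to. The struct
    layouts, the fixed-size table and the iommu_registry_*() names are
    all hypothetical and chosen for illustration only.

```c
#include <assert.h>
#include <stddef.h>

struct device_node { const char *full_name; };	/* mock OF node */
struct iommu_ops {				/* mock ops with instance data */
	const char *name;
	void *priv;
};

#define MAX_IOMMUS 8

static struct {
	const struct device_node *np;
	struct iommu_ops *ops;
} iommu_table[MAX_IOMMUS];
static size_t iommu_count;

/* Early init pass: remember which ops belong to which IOMMU node. */
static int iommu_registry_set(const struct device_node *np,
			      struct iommu_ops *ops)
{
	assert(np != NULL && ops != NULL);

	if (iommu_count == MAX_IOMMUS)
		return -1;

	iommu_table[iommu_count].np = np;
	iommu_table[iommu_count].ops = ops;
	iommu_count++;
	return 0;
}

/* Later, while parsing an OF master: find the instance for a node. */
static struct iommu_ops *iommu_registry_get(const struct device_node *np)
{
	for (size_t i = 0; i < iommu_count; i++)
		if (iommu_table[i].np == np)
			return iommu_table[i].ops;
	return NULL;	/* no IOMMU registered for this node */
}
```

    The lookup is keyed on the node pointer, so the driver no longer has
    to work out from a bare struct device which IOMMU instance is meant.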

    Acked-by: Arnd Bergmann
    Acked-by: Joerg Roedel
    Acked-by: Marek Szyprowski
    Tested-by: Robin Murphy
    Signed-off-by: Will Deacon

    Will Deacon
     

28 Nov, 2014

1 commit


23 Nov, 2014

2 commits

  • To support MSI irq domains we want a generic data structure for
    allocation, but we need the option to provide an architecture specific
    version of it. So instead of playing #ifdef games in linux/msi.h we
    add a generic header file and let architectures decide whether to
    include it or to provide their own implementation and provide the
    required typedef.

    I know that typedefs are not really nice, but in this case there are no
    forward declarations required and it's the simplest solution.
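
    The header arrangement can be pictured with the single-file sketch
    below. ARCH_HAS_OWN_MSI_ALLOC_INFO, the field lists and msi_prepare()
    are hypothetical; the point is only that core code names the typedef
    and so compiles against either definition.

```c
#include <assert.h>
#include <stddef.h>

#ifdef ARCH_HAS_OWN_MSI_ALLOC_INFO
/* What an architecture-specific header might provide (hypothetical). */
typedef struct arch_msi_alloc_info {
	unsigned long hwirq;
	unsigned int arch_flags;
} msi_alloc_info_t;
#else
/* Generic fallback for architectures with no special needs. */
struct msi_desc;
typedef struct msi_alloc_info {
	struct msi_desc *desc;
	unsigned long hwirq;
} msi_alloc_info_t;
#endif

/* Core code only ever names the typedef, never the struct tag. */
static unsigned long msi_prepare(msi_alloc_info_t *info, unsigned long hwirq)
{
	assert(info != NULL);
	info->hwirq = hwirq;
	return info->hwirq;
}
```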

    Signed-off-by: Thomas Gleixner
    Acked-by: Arnd Bergmann
    Cc: Jiang Liu
    Cc: Tony Luck
    Cc: linux-arm-kernel@lists.infradead.org
    Cc: Bjorn Helgaas
    Cc: Grant Likely
    Cc: Marc Zyngier
    Cc: Yijing Wang
    Cc: Yingjoe Chen
    Cc: Borislav Petkov
    Cc: Matthias Brugger
    Cc: Alexander Gordeev

    Thomas Gleixner
     
  • This is a follow-on to commit 62e88b1c00de 'mm: Make
    arch_unmap()/bprm_mm_init() available to all architectures'

    I removed the asm-generic version of arch_unmap() in that patch,
    but missed arch_bprm_mm_init(). So this broke the build for
    architectures using asm-generic/mmu_context.h that actually have
    an MMU.

    Fixes: 62e88b1c00de 'mm: Make arch_unmap()/bprm_mm_init() available to all architectures'
    Signed-off-by: Dave Hansen
    Cc: Dave Hansen
    Cc: linux-arch@vger.kernel.org
    Cc: x86@kernel.org
    Link: http://lkml.kernel.org/r/20141122163711.0F037EE6@viggo.jf.intel.com
    Signed-off-by: Thomas Gleixner

    Dave Hansen
     

19 Nov, 2014

1 commit

  • The x86 MPX patch set calls arch_unmap() and arch_bprm_mm_init()
    from fs/exec.c, so we need at least a stub for them in all
    architectures. They are only called under an #ifdef for
    CONFIG_MMU=y, so we can at least restrict this to architectures
    with MMU support.

    blackfin/c6x have no MMU support, so do not call arch_unmap().
    They also do not include mm_hooks.h or mmu_context.h at all and
    do not need to be touched.

    s390, um and unicore32 do not use asm-generic/mm_hooks.h, so got
    their own arch_unmap() versions. (I also moved um's
    arch_dup_mmap() to be closer to the other mm_hooks.h functions).

    xtensa only includes mm_hooks when MMU=y, which should be fine
    since arch_unmap() is called only from MMU=y code.

    For the rest, we use the stub copies of these functions in
    asm-generic/mm_hooks.h.

    I cross compiled defconfigs for cris (to check NOMMU) and s390
    to make sure that this works. I also checked a 64-bit build
    of UML and all my normal x86 builds including PARAVIRT on and
    off.

    Signed-off-by: Dave Hansen
    Cc: Dave Hansen
    Cc: linux-arch@vger.kernel.org
    Cc: x86@kernel.org
    Link: http://lkml.kernel.org/r/20141118182350.8B4AA2C2@viggo.jf.intel.com
    Signed-off-by: Thomas Gleixner

    Dave Hansen
     

18 Nov, 2014

2 commits

  • The previous patch allocates bounds tables on-demand. As noted in
    an earlier description, these can add up to *HUGE* amounts of
    memory. This has caused OOMs in practice when running tests.

    This patch adds support for freeing bounds tables when they are no
    longer in use.

    There are two types of mappings in play when unmapping tables:
    1. The mapping with the actual data, which userspace is
    munmap()ing or brk()ing away, etc...
    2. The mapping for the bounds table *backing* the data
    (is tagged with VM_MPX, see the patch "add MPX specific
    mmap interface").

    If userspace uses the prctl() introduced earlier in this patchset
    to enable the management of bounds tables in kernel, when it
    unmaps the first type of mapping with the actual data, the kernel
    needs to free the mapping for the bounds table backing the data.
    This patch hooks in at the very end of do_unmap() to do so.
    We look at the addresses being unmapped and find the bounds
    directory entries and tables which cover those addresses. If
    an entire table is unused, we clear associated directory entry
    and free the table.
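
    The address arithmetic behind "find the bounds directory entries and
    tables which cover those addresses" can be sketched as below. This is
    a simplification under loud assumptions: BT_COVERAGE_SHIFT assumes
    that one bounds table covers 1 MiB of address space, the helper name
    is hypothetical, and a table is treated as freeable only when the
    unmapped range spans its whole coverage. The patch additionally has
    to consider other mappings that still use a table.

```c
#include <assert.h>
#include <stdint.h>

#define BT_COVERAGE_SHIFT 20	/* assumption: one table covers 1 MiB */
#define BT_COVERAGE (UINT64_C(1) << BT_COVERAGE_SHIFT)

/*
 * Computes the half-open range [*first, *last) of directory indexes whose
 * tables are covered entirely by the unmapped range [start, end), and
 * returns how many such tables there are. Addresses close enough to
 * UINT64_MAX to overflow the rounding are out of scope for this sketch.
 */
static uint64_t fully_covered_tables(uint64_t start, uint64_t end,
				     uint64_t *first, uint64_t *last)
{
	assert(start <= end);

	/* Round start up and end down to table-coverage boundaries. */
	*first = (start + BT_COVERAGE - 1) >> BT_COVERAGE_SHIFT;
	*last = end >> BT_COVERAGE_SHIFT;

	if (*last < *first)
		*last = *first;	/* range does not span a whole table */

	return *last - *first;
}
```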

    Once we unmap the bounds table, we would have a bounds directory
    entry pointing at empty address space. That address space might
    now be allocated for some other (random) use, and the MPX
    hardware might now try to walk it as if it were a bounds table.
    That would be bad. So any unmapping of an entire bounds table
    has to be accompanied by a corresponding write to the bounds
    directory entry to invalidate it. That write to the bounds
    directory can fault, which causes the following problem:

    Since we are doing the freeing from munmap() (and other paths
    like it), we hold mmap_sem for write. If we fault, the page
    fault handler will attempt to acquire mmap_sem for read and
    we will deadlock. To avoid the deadlock, we pagefault_disable()
    when touching the bounds directory entry and use a
    get_user_pages() to resolve the fault.

    The unmapping of bounds tables happens under vm_munmap(). We
    also (indirectly) call vm_munmap() to _do_ the unmapping of the
    bounds tables. We avoid unbounded recursion by disallowing
    freeing of bounds tables *for* bounds tables. This would not
    occur normally, so should not have any practical impact. Being
    strict about it here helps ensure that we do not have an
    exploitable stack overflow.

    Based-on-patch-by: Qiaowei Ren
    Signed-off-by: Dave Hansen
    Cc: linux-mm@kvack.org
    Cc: linux-mips@linux-mips.org
    Cc: Dave Hansen
    Link: http://lkml.kernel.org/r/20141114151831.E4531C4A@viggo.jf.intel.com
    Signed-off-by: Thomas Gleixner

    Dave Hansen
     
  • This is really the meat of the MPX patch set. If there is one patch to
    review in the entire series, this is the one. There is a new ABI here
    and this kernel code also interacts with userspace memory in a
    relatively unusual manner. (small FAQ below).

    Long Description:

    This patch adds two prctl() commands to enable or disable the
    management of bounds tables in kernel, including on-demand kernel
    allocation (See the patch "on-demand kernel allocation of bounds tables")
    and cleanup (See the patch "cleanup unused bound tables"). Applications
    do not strictly need the kernel to manage bounds tables and we expect
    some applications to use MPX without taking advantage of this kernel
    support. This means the kernel can not simply infer whether an application
    needs bounds table management from the MPX registers. The prctl() is an
    explicit signal from userspace.

    PR_MPX_ENABLE_MANAGEMENT is meant to be a signal from userspace to
    require kernel's help in managing bounds tables.

    PR_MPX_DISABLE_MANAGEMENT is the opposite, meaning that userspace doesn't
    want kernel's help any more. With PR_MPX_DISABLE_MANAGEMENT, the kernel
    won't allocate and free bounds tables even if the CPU supports MPX.

    PR_MPX_ENABLE_MANAGEMENT will fetch the base address of the bounds
    directory out of a userspace register (bndcfgu) and then cache it into
    a new field (->bd_addr) in the 'mm_struct'. PR_MPX_DISABLE_MANAGEMENT
    will set "bd_addr" to an invalid address. Using this scheme, we can
    use "bd_addr" to determine whether the management of bounds tables in
    kernel is enabled.
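
    The caching scheme described above can be modelled in a few lines of
    userspace C. This is only a sketch: struct mm_model, BD_ADDR_INVALID
    and the mpx_model_*() helpers are hypothetical names invented for
    illustration, and the sentinel value is an assumption, not the
    kernel's definition.

```c
#include <stdbool.h>
#include <stdint.h>

/* Sentinel meaning "kernel management is off" (hypothetical name and value). */
#define BD_ADDR_INVALID UINTPTR_MAX

/* Toy stand-in for the single mm_struct field the text describes. */
struct mm_model {
	uintptr_t bd_addr;
};

/* Model of PR_MPX_ENABLE_MANAGEMENT: cache the bounds directory base. */
static void mpx_model_enable(struct mm_model *mm, uintptr_t bndcfgu_base)
{
	mm->bd_addr = bndcfgu_base;
}

/* Model of PR_MPX_DISABLE_MANAGEMENT: store the invalid address. */
static void mpx_model_disable(struct mm_model *mm)
{
	mm->bd_addr = BD_ADDR_INVALID;
}

/* Management is enabled exactly when a valid address is cached. */
static bool mpx_model_managed(const struct mm_model *mm)
{
	return mm->bd_addr != BD_ADDR_INVALID;
}
```

    The point of the design is that one field answers two questions:
    where the bounds directory lives, and whether the kernel should be
    managing tables at all.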

    Also, the only way to access that bndcfgu register is via an xsaves,
    which can be expensive. Caching "bd_addr" like this also helps reduce
    the cost of those xsaves when doing table cleanup at munmap() time.
    Unfortunately, we can not apply this optimization to #BR fault time
    because we need an xsave to get the value of BNDSTATUS.

    ==== Why does the hardware even have these Bounds Tables? ====

    MPX only has 4 hardware registers for storing bounds information.
    If MPX-enabled code needs more than these 4 registers, it needs to
    spill them somewhere. It has two special instructions for this
    which allow the bounds to be moved between the bounds registers
    and some new "bounds tables".

    #BR exceptions are conceptually similar to page faults. The MPX
    hardware raises them both on bounds violations and when the tables
    are not present. This patch handles those #BR exceptions for
    not-present tables by carving the space out of the normal process's
    address space (essentially calling the new mmap() interface introduced
    earlier in this patch set) and then pointing the bounds directory
    over to it.

    The tables *need* to be accessed and controlled by userspace because
    the instructions for moving bounds in and out of them are extremely
    frequent. They potentially happen every time a register pointing to
    memory is dereferenced. Any direct kernel involvement (like a syscall)
    to access the tables would obviously destroy performance.

    ==== Why not do this in userspace? ====

    This patch is obviously doing this allocation in the kernel.
    However, MPX does not strictly *require* anything in the kernel.
    It can theoretically be done completely from userspace. Here are
    a few ways this *could* be done. I don't think any of them are
    practical in the real-world, but here they are.

    Q: Can virtual space simply be reserved for the bounds tables so
    that we never have to allocate them?
    A: As noted earlier, these tables are *HUGE*. An X-GB virtual
    area needs 4*X GB of virtual space, plus 2GB for the bounds
    directory. If we were to preallocate them for the 128TB of
    user virtual address space, we would need to reserve 512TB+2GB,
    which is larger than the entire virtual address space today.
    This means they can not be reserved ahead of time. Also, a
    single process's pre-populated bounds directory consumes 2GB
    of virtual *AND* physical memory. IOW, it's completely
    infeasible to prepopulate bounds directories.
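
    The sizing argument in the answer above can be checked with a short
    calculation. The helper names are hypothetical, and the 256TB total
    is an assumption (a 48-bit virtual address space) added here for
    comparison; only the 4x factor, the 2GB directory and the 128TB user
    space come from the text.

```c
#include <stdint.h>

#define GiB (1ULL << 30)
#define TiB (1ULL << 40)

/* Assumption for illustration: a 48-bit virtual address space. */
#define TOTAL_VA_BYTES (256 * TiB)

/* Bounds tables need 4 bytes of virtual space per byte they cover. */
static uint64_t bounds_tables_bytes(uint64_t covered_bytes)
{
	return 4 * covered_bytes;
}

/* Tables plus the fixed 2GB bounds directory. */
static uint64_t mpx_reservation_bytes(uint64_t covered_bytes)
{
	return bounds_tables_bytes(covered_bytes) + 2 * GiB;
}
```

    For 128TB of user address space, mpx_reservation_bytes(128 * TiB)
    evaluates to 512TB + 2GB, which exceeds TOTAL_VA_BYTES under the
    assumption above.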

    Q: Can we preallocate bounds table space at the same time memory
    is allocated which might contain pointers that might eventually
    need bounds tables?
    A: This would work if we could hook the site of each and every
    memory allocation syscall. This can be done for small,
    constrained applications. But, it isn't practical at a larger
    scale since a given app has no way of controlling how all the
    parts of the app might allocate memory (think libraries). The
    kernel is really the only place to intercept these calls.

    Q: Could a bounds fault be handed to userspace and the tables
    allocated there in a signal handler instead of in the kernel?
    A: (thanks to tglx) mmap() is not on the list of safe async
    handler functions and even if mmap() would work it still
    requires locking or nasty tricks to keep track of the
    allocation state there.

    Having ruled out all of the userspace-only approaches for managing
    bounds tables that we could think of, we create them on demand in
    the kernel.

    Based-on-patch-by: Qiaowei Ren
    Signed-off-by: Dave Hansen
    Cc: linux-mm@kvack.org
    Cc: linux-mips@linux-mips.org
    Cc: Dave Hansen
    Link: http://lkml.kernel.org/r/20141114151829.AD4310DE@viggo.jf.intel.com
    Signed-off-by: Thomas Gleixner

    Dave Hansen
     

17 Nov, 2014

1 commit

  • On architectures with hardware broadcasting of TLB invalidation messages,
    it makes sense to reduce the range of the mmu_gather structure when
    unmapping page ranges based on the dirty address information passed to
    tlb_remove_tlb_entry.

    arm64 already does this by directly manipulating the start/end fields
    of the gather structure, but this confuses the generic code which
    does not expect these fields to change and can end up calculating
    invalid, negative ranges when forcing a flush in zap_pte_range.

    This patch moves the minimal range calculation out of the arm64 code
    and into the generic implementation, simplifying zap_pte_range in the
    process (which no longer needs to care about start/end, since they will
    point to the appropriate ranges already). With the range being tracked
    by core code, the need_flush flag is dropped in favour of checking that
    the end of the range has actually been set.
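
    As a rough illustration of the range tracking described above, the
    sketch below keeps the smallest start and largest end seen so far and
    treats "end has been set" as the flush condition. The struct, the
    helper names and the constants are hypothetical stand-ins, not the
    kernel's mmu_gather code.

```c
#include <stdbool.h>
#include <stdint.h>

#define MODEL_PAGE_SIZE  4096ULL
#define MODEL_ADDR_LIMIT (1ULL << 47)	/* above any address we track */

/* Toy stand-in for the start/end fields of the gather structure. */
struct gather_model {
	uint64_t start;
	uint64_t end;
};

/* Empty range: start above every address, end at zero. */
static void gather_reset(struct gather_model *tlb)
{
	tlb->start = MODEL_ADDR_LIMIT;
	tlb->end = 0;
}

/* Widen the range just enough to cover the page at 'addr'. */
static void gather_track(struct gather_model *tlb, uint64_t addr)
{
	if (addr < tlb->start)
		tlb->start = addr;
	if (addr + MODEL_PAGE_SIZE > tlb->end)
		tlb->end = addr + MODEL_PAGE_SIZE;
}

/* No separate need_flush flag: a non-zero end means work is pending. */
static bool gather_needs_flush(const struct gather_model *tlb)
{
	return tlb->end != 0;
}
```

    Because the reset state has start above end, a caller that only
    checks the end field can never compute a negative range.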

    Cc: Benjamin Herrenschmidt
    Cc: Peter Zijlstra
    Cc: Russell King - ARM Linux
    Cc: Michal Simek
    Acked-by: Linus Torvalds
    Signed-off-by: Will Deacon

    Will Deacon
     

15 Nov, 2014

1 commit

  • This reverts commit e5a2c899957659cd1a9f789bc462f9c0b35f5150.

    Commit e5a2c899 introduced an alternative_call, arch_fast_hash2,
    that selects between __jhash2 and __intel_crc4_2_hash based on the
    X86_FEATURE_XMM4_2.

    Unfortunately, the alternative_call system does not appear to be
    suitable for use with C functions, as register usage is not handled
    properly for the called functions. The __jhash2 function in particular
    clobbers registers that are not preserved when called via
    alternative_call, resulting in a panic for direct callers of
    arch_fast_hash2 on older CPUs lacking sse4_2. It is possible that
    __intel_crc4_2_hash works merely by chance because it uses fewer
    registers.

    This commit was suggested as the source of the problem by Jesse
    Gross.

    Signed-off-by: Jay Vosburgh
    Signed-off-by: David S. Miller

    Jay Vosburgh
     

12 Nov, 2014

1 commit

  • * 'io' of git://git.kernel.org/pub/scm/linux/kernel/git/will/linux:
    documentation: memory-barriers: clarify relaxed io accessor semantics
    x86: io: implement dummy relaxed accessor macros for writes
    tile: io: implement dummy relaxed accessor macros for writes
    sparc: io: implement dummy relaxed accessor macros for writes
    powerpc: io: implement dummy relaxed accessor macros for writes
    parisc: io: implement dummy relaxed accessor macros for writes
    mn10300: io: implement dummy relaxed accessor macros for writes
    m68k: io: implement dummy relaxed accessor macros for writes
    m32r: io: implement dummy relaxed accessor macros for writes
    ia64: io: implement dummy relaxed accessor macros for writes
    cris: io: implement dummy relaxed accessor macros for writes
    frv: io: implement dummy relaxed accessor macros for writes
    xtensa: io: remove dummy relaxed accessor macros for reads
    s390: io: remove dummy relaxed accessor macros for reads
    microblaze: io: remove dummy relaxed accessor macros
    asm-generic: io: implement relaxed accessor macros as conditional wrappers

    Conflicts:
    include/asm-generic/io.h

    Signed-off-by: Arnd Bergmann

    Arnd Bergmann
     

10 Nov, 2014

2 commits

  • Currently driver writers need to use io{read,write}{8,16,32}_rep() when
    accessing FIFO registers portably. This is bad for two reasons: it is
    inconsistent with how other registers are accessed using the standard
    {read,write}{b,w,l}() functions, which can lead to confusion. On some
    architectures the io{read,write}*() functions also need to perform some
    extra checks to determine whether an address is memory-mapped or refers
    to I/O space. Drivers which can be expected to never use I/O can safely
    use the {read,write}s{b,w,l,q}(), just like they use their non-string
    variants and there's no need for these extra checks.

    This patch implements generic versions of readsb(), readsw(), readsl(),
    readsq(), writesb(), writesw(), writesl() and writesq(). Variants of
    these string functions for I/O accesses (ins*() and outs*() as well as
    ioread*_rep() and iowrite*_rep()) are now implemented in terms of the
    new functions.
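
    A string accessor of this kind reads the same register address
    repeatedly and stores each value into consecutive buffer slots. The
    following userspace model shows the shape; the model_*() names are
    hypothetical, and an ordinary volatile variable stands in for the
    FIFO register.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in for a raw 32-bit MMIO read (real code uses the arch accessor). */
static uint32_t model_raw_readl(const volatile void *addr)
{
	return *(const volatile uint32_t *)addr;
}

/* Read 'count' 32-bit words from one fixed address into 'buffer'. */
static void model_readsl(const volatile void *addr, void *buffer, size_t count)
{
	uint32_t *buf = buffer;

	while (count--)
		*buf++ = model_raw_readl(addr);
}

/* The I/O-capable variant expressed in terms of the string accessor. */
static void model_ioread32_rep(const volatile void *addr, void *buffer,
			       size_t count)
{
	model_readsl(addr, buffer, count);
}
```

    Note that the source address never advances; only the destination
    pointer does, which is what distinguishes a FIFO read from a memcpy.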

    Going forward, {read,write}{,s}{b,w,l,q}() should be used consistently
    by drivers for devices that will only ever be memory-mapped and hence
    don't need to access I/O space, whereas io{read,write}{8,16,32}_rep()
    should be used by drivers for devices that can be either memory-mapped
    or I/O-mapped.

    Signed-off-by: Thierry Reding

    Thierry Reding
     
  • Overriding I/O accessors and helpers is currently very inconsistent.
    This commit introduces a homogeneous way to override functions by
    checking for the existence of a macro with the same name as the function.
    Architectures can provide their own implementations and communicate this
    to the generic header by defining the appropriate macro. Doing this will
    also help prevent the implementations from being subsequently
    overridden.
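
    The override convention can be sketched in a single file. The
    demo_readl() and demo_readw() names are hypothetical; the first half
    plays the role of an architecture header and the second half the
    role of the generic header.

```c
#include <stdint.h>

/* "Architecture" part: provide an implementation and announce it. */
static inline uint32_t demo_readl(const volatile void *addr)
{
	(void)addr;
	return 0xA5A5A5A5u;	/* recognisable marker for the arch version */
}
#define demo_readl demo_readl

/* "Generic header" part: only fill in what is still missing. */
#ifndef demo_readl
#define demo_readl demo_readl
static inline uint32_t demo_readl(const volatile void *addr)
{
	return *(const volatile uint32_t *)addr;
}
#endif

#ifndef demo_readw
#define demo_readw demo_readw
static inline uint16_t demo_readw(const volatile void *addr)
{
	return *(const volatile uint16_t *)addr;
}
#endif
```

    Here demo_readl() resolves to the "architecture" version and
    demo_readw() to the generic one. Defining the macro inside the
    generic block as well means a later header that tests for it will
    also leave the function alone.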

    While at it, also turn a lot of macros into static inline functions for
    better type checking and to provide a canonical signature for overriding
    architectures to copy. Also reorder functions by logical groups.

    Signed-off-by: Thierry Reding

    Thierry Reding
     

06 Nov, 2014

1 commit

  • By default the arch_fast_hash hashing function pointers are initialized
    to jhash(2). If during boot-up a CPU with SSE4.2 is detected they get
    updated to the CRC32 ones. This dispatching scheme incurs a function
    pointer lookup and indirect call for every hashing operation.

    rhashtable as a user of arch_fast_hash e.g. stores pointers to hashing
    functions in its structure, too, causing two indirect branches per
    hashing operation.

    Using alternative_call we can get away with one of those indirect branches.
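
    The double indirection described above looks roughly like the sketch
    below. All demo_*() names are hypothetical, and the two hash bodies
    are trivial placeholders rather than jhash or CRC32; only the call
    structure is meant to match the description.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t (*hash_fn)(const void *data, size_t len, uint32_t seed);

/* Placeholder for the portable default (NOT the real jhash). */
static uint32_t demo_generic_hash(const void *data, size_t len, uint32_t seed)
{
	const unsigned char *p = data;
	uint32_t h = seed;

	while (len--)
		h = h * 31u + *p++;
	return h;
}

/* Placeholder for the accelerated variant (NOT a real CRC32). */
static uint32_t demo_accel_hash(const void *data, size_t len, uint32_t seed)
{
	const unsigned char *p = data;
	uint32_t h = seed;

	while (len--)
		h = (h ^ *p++) * 16777619u;
	return h;
}

/* Dispatch pointer: starts at the default, may be switched at "boot". */
static hash_fn demo_fast_hash = demo_generic_hash;

static void demo_hash_init(bool cpu_has_accel)
{
	if (cpu_has_accel)
		demo_fast_hash = demo_accel_hash;
}

/* Second indirect call: through the dispatch pointer. */
static uint32_t demo_arch_fast_hash(const void *data, size_t len, uint32_t seed)
{
	return demo_fast_hash(data, len, seed);
}

/* A user that stores its own hash pointer, as rhashtable is said to do. */
struct demo_table {
	hash_fn hashfn;
};

/* First indirect call: through the table's stored pointer. */
static uint32_t demo_table_hash(const struct demo_table *t, const void *key,
				size_t len)
{
	return t->hashfn(key, len, 0);
}
```

    Patching the call site at boot, as this commit proposes, would
    remove the lookup through demo_fast_hash and leave only the table's
    own pointer.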

    Acked-by: Daniel Borkmann
    Cc: Thomas Graf
    Signed-off-by: Hannes Frederic Sowa
    Signed-off-by: David S. Miller

    Hannes Frederic Sowa
     

28 Oct, 2014

1 commit

  • task_preempt_count() is pointless if preemption counter is per-cpu,
    currently this is x86 only. It is only valid if the task is not
    running, and even in this case the only info it can provide is the
    state of PREEMPT_ACTIVE bit.

    Change its single caller to check p->on_rq instead, this should be
    the same if p->state != TASK_RUNNING, and kill this helper.

    Signed-off-by: Oleg Nesterov
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Steven Rostedt
    Cc: Kirill Tkhai
    Cc: Alexander Graf
    Cc: Andrew Morton
    Cc: Arnd Bergmann
    Cc: Christoph Lameter
    Cc: Linus Torvalds
    Cc: linux-arch@vger.kernel.org
    Link: http://lkml.kernel.org/r/20141008183348.GC17495@redhat.com
    Signed-off-by: Ingo Molnar

    Oleg Nesterov
     

27 Oct, 2014

1 commit


21 Oct, 2014

1 commit

  • {read,write}{b,w,l,q}_relaxed are implemented by some architectures in
    order to permit memory-mapped I/O accesses with weaker barrier semantics
    than the non-relaxed variants.

    This patch adds wrappers to asm-generic so that drivers can rely on the
    relaxed accessors being available, even if they don't always provide
    weaker ordering guarantees. Since some architectures both include
    asm-generic/io.h and define some relaxed accessors, the definitions here
    are conditional for the time being.
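
    A conditional wrapper of this kind can be as small as the sketch
    below, which falls back to the ordered accessor whenever no relaxed
    variant has been defined. The sketch_*() names are hypothetical and
    plain volatile accesses stand in for real MMIO.

```c
#include <stdint.h>

/* Ordered accessors that every architecture is assumed to provide. */
static inline uint32_t sketch_readl(const volatile void *addr)
{
	return *(const volatile uint32_t *)addr;
}

static inline void sketch_writel(uint32_t value, volatile void *addr)
{
	*(volatile uint32_t *)addr = value;
}

/*
 * An architecture with genuinely weaker accessors would define these
 * macros before this point; otherwise the ordered versions are used.
 */
#ifndef sketch_readl_relaxed
#define sketch_readl_relaxed sketch_readl
#endif

#ifndef sketch_writel_relaxed
#define sketch_writel_relaxed sketch_writel
#endif
```

    Drivers can then call the relaxed names unconditionally; on an
    architecture without weaker variants they simply get the stronger
    ordering.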

    Cc: Arnd Bergmann
    Signed-off-by: Will Deacon

    Will Deacon
     

20 Oct, 2014

1 commit

  • Pull audit updates from Eric Paris:
    "So this change across a whole bunch of arches really solves one basic
    problem. We want to audit when seccomp is killing a process. seccomp
    hooks in before the audit syscall entry code. audit_syscall_entry
    took as an argument the arch of the given syscall. Since the arch is
    part of what makes a syscall number meaningful it's an important part
    of the record, but it isn't available when seccomp shoots the
    syscall...

    For most arches we have a better way to get the arch (syscall_get_arch()).
    So the solution was twofold: Implement syscall_get_arch() everywhere
    there is audit which didn't have it. Use syscall_get_arch() in the
    seccomp audit code. Having syscall_get_arch() everywhere meant it was
    a useless flag on the stack and we could get rid of it for the typical
    syscall entry.

    The other changes inside the audit system aren't grand, fixed some
    records that had invalid spaces. Better locking around the task comm
    field. Removing some dead functions and structs. Make some things
    static. Really minor stuff"

    * git://git.infradead.org/users/eparis/audit: (31 commits)
    audit: rename audit_log_remove_rule to disambiguate for trees
    audit: cull redundancy in audit_rule_change
    audit: WARN if audit_rule_change called illegally
    audit: put rule existence check in canonical order
    next: openrisc: Fix build
    audit: get comm using lock to avoid race in string printing
    audit: remove open_arg() function that is never used
    audit: correct AUDIT_GET_FEATURE return message type
    audit: set nlmsg_len for multicast messages.
    audit: use union for audit_field values since they are mutually exclusive
    audit: invalid op= values for rules
    audit: use atomic_t to simplify audit_serial()
    kernel/audit.c: use ARRAY_SIZE instead of sizeof/sizeof[0]
    audit: reduce scope of audit_log_fcaps
    audit: reduce scope of audit_net_id
    audit: arm64: Remove the audit arch argument to audit_syscall_entry
    arm64: audit: Add audit hook in syscall_trace_enter/exit()
    audit: x86: drop arch from __audit_syscall_entry() interface
    sparc: implement is_32bit_task
    sparc: properly conditionalize use of TIF_32BIT
    ...

    Linus Torvalds
     

19 Oct, 2014

1 commit

  • Pull MIPS updates from Ralf Baechle:
    "This is the MIPS pull request for the next kernel:

    - Zubair's patch series adds CMA support for MIPS. Doing so it also
    touches ARM64 and x86.
    - remove the last instance of IRQF_DISABLED from arch/mips
    - updates to two of the MIPS defconfig files.
    - cleanup of how cache coherency bits are handled on MIPS and
    implement support for write-combining.
    - platform upgrades for Alchemy
    - move MIPS DTS files to arch/mips/boot/dts/"

    * 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus: (24 commits)
    MIPS: ralink: remove deprecated IRQF_DISABLED
    MIPS: pgtable.h: Implement the pgprot_writecombine function for MIPS
    MIPS: cpu-probe: Set the write-combine CCA value on per core basis
    MIPS: pgtable-bits: Define the CCA bit for WC writes on Ingenic cores
    MIPS: pgtable-bits: Move the CCA bits out of the core's ifdef blocks
    MIPS: DMA: Add cma support
    x86: use generic dma-contiguous.h
    arm64: use generic dma-contiguous.h
    asm-generic: Add dma-contiguous.h
    MIPS: BPF: Add new emit_long_instr macro
    MIPS: ralink: Move device-trees to arch/mips/boot/dts/
    MIPS: Netlogic: Move device-trees to arch/mips/boot/dts/
    MIPS: sead3: Move device-trees to arch/mips/boot/dts/
    MIPS: Lantiq: Move device-trees to arch/mips/boot/dts/
    MIPS: Octeon: Move device-trees to arch/mips/boot/dts/
    MIPS: Add support for building device-tree binaries
    MIPS: Create common infrastructure for building built-in device-trees
    MIPS: SEAD3: Enable DEVTMPFS
    MIPS: SEAD3: Regenerate defconfigs
    MIPS: Alchemy: DB1300: Add touch penirq support
    ...

    Linus Torvalds
     

15 Oct, 2014

1 commit

  • Pull clock tree updates from Mike Turquette:
    "The clk tree changes for 3.18 are dominated by clock drivers. Mostly
    fixes and enhancements to existing drivers as well as new drivers.
    This tag contains a bit more arch code than I usually take due to some
    OMAP2+ changes. Additionally it contains the restart notifier
    handlers which are merged as a dependency into several trees.

    The PXA changes are the only messy part. Due to having a stable tree
    I had to revert one patch and follow up with one more fix near the tip
    of this tag. Some dead code is introduced but it will soon become
    live code after 3.18-rc1 is released as the rest of the PXA family is
    converted over to the common clock framework.

    Another trend in this tag is that multiple vendors have started to
    push the complexity of changing their CPU frequency into the clock
    driver, whereas this used to be done in CPUfreq drivers.

    Changes to the clk core include a generic gpio-clock type and a
    clk_set_phase() function added to the top-level clk.h api. Due to
    some confusion on the fbdev mailing list the kernel boot parameters
    documentation was updated to further explain the clk_ignore_unused
    parameter, which is often required by users of the simplefb driver.

    Finally some fixes to the locking around the clock debugfs stuff were
    done to prevent deadlocks when interacting with other subsystems."

    * tag 'clk-for-linus-3.18' of git://git.linaro.org/people/mike.turquette/linux: (99 commits)
    clk: pxa clocks build system fix
    Revert "arm: pxa: Transition pxa27x to clk framework"
    clk: samsung: register restart handlers for s3c2412 and s3c2443
    clk: rockchip: add restart handler
    clk: rockchip: rk3288: i2s_frac adds flag to set parent's rate
    doc/kernel-parameters.txt: clarify clk_ignore_unused
    arm: pxa: Transition pxa27x to clk framework
    dts: add devicetree bindings for pxa27x clocks
    clk: add pxa27x clock drivers
    arm: pxa: add clock pll selection bits
    clk: dts: document pxa clock binding
    clk: add pxa clocks infrastructure
    clk: gpio-gate: Ensure gpiod_ APIs are prototyped
    clk: ti: dra7-atl-clock: Mark the device as pm_runtime_irq_safe
    clk: ti: LLVMLinux: Move __init outside of type definition
    clk: ti: consider the fact that of_clk_get() might return an error
    clk: ti: dra7-atl-clock: fix a memory leak
    clk: ti: change clock init to use generic of_clk_init
    clk: hix5hd2: add I2C clocks
    clk: hix5hd2: add watchdog0 clocks
    ...

    Linus Torvalds
     

14 Oct, 2014

1 commit

  • For VMAs that don't want write notifications, PTEs created for read faults
    have their write bit set. If the read fault happens after VM_SOFTDIRTY is
    cleared, then the PTE's softdirty bit will remain clear after subsequent
    writes.

    Here's a simple code snippet to demonstrate the bug:

    char* m = mmap(NULL, getpagesize(), PROT_READ | PROT_WRITE,
    MAP_ANONYMOUS | MAP_SHARED, -1, 0);
    system("echo 4 > /proc/$PPID/clear_refs"); /* clear VM_SOFTDIRTY */
    assert(*m == '\0'); /* new PTE allows write access */
    assert(!soft_dirty(m));
    *m = 'x'; /* should dirty the page */
    assert(soft_dirty(m)); /* fails */

    With this patch, write notifications are enabled when VM_SOFTDIRTY is
    cleared. Furthermore, to avoid unnecessary faults, write notifications
    are disabled when VM_SOFTDIRTY is set.
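
    The soft_dirty() helper used in the demonstration snippet is not
    defined in this message. One possible userspace implementation, as a
    sketch, reads the page's entry from /proc/self/pagemap and tests
    bit 55, which the kernel's pagemap documentation describes as the
    soft-dirty bit. The function names and the three-way return value
    are choices made here, not part of the patch.

```c
#include <fcntl.h>
#include <stdbool.h>
#include <stdint.h>
#include <unistd.h>

/* Bit 55 of a pagemap entry reports the soft-dirty state of the PTE. */
static bool pagemap_entry_soft_dirty(uint64_t entry)
{
	return (entry >> 55) & 1;
}

/* Returns 1 if soft-dirty, 0 if clean, -1 if pagemap could not be read. */
static int soft_dirty(const void *addr)
{
	uint64_t entry;
	long page_size = sysconf(_SC_PAGESIZE);
	off_t offset;
	ssize_t n;
	int fd;

	if (page_size <= 0)
		return -1;

	/* One 64-bit entry per virtual page, indexed by page number. */
	offset = (off_t)((uintptr_t)addr / (uintptr_t)page_size) *
		 (off_t)sizeof(entry);

	fd = open("/proc/self/pagemap", O_RDONLY);
	if (fd < 0)
		return -1;

	n = pread(fd, &entry, sizeof(entry), offset);
	close(fd);
	if (n != (ssize_t)sizeof(entry))
		return -1;

	return pagemap_entry_soft_dirty(entry) ? 1 : 0;
}
```

    Because this version returns -1 on error, a test should compare the
    result against 1 or 0 explicitly rather than treat it as a boolean.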

    As a side effect of enabling and disabling write notifications with
    care, this patch fixes a bug in mprotect where vm_page_prot bits set by
    drivers were zapped on mprotect. An analogous bug was fixed in mmap by
    commit c9d0bf241451 ("mm: uncached vma support with writenotify").

    Signed-off-by: Peter Feiner
    Reported-by: Peter Feiner
    Suggested-by: Kirill A. Shutemov
    Cc: Cyrill Gorcunov
    Cc: Pavel Emelyanov
    Cc: Jamie Liu
    Cc: Hugh Dickins
    Cc: Naoya Horiguchi
    Cc: Bjorn Helgaas
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Peter Feiner
     

13 Oct, 2014

2 commits

  • Pull scheduler updates from Ingo Molnar:
    "The main changes in this cycle were:

    - Optimized support for Intel "Cluster-on-Die" (CoD) topologies (Dave
    Hansen)

    - Various sched/idle refinements for better idle handling (Nicolas
    Pitre, Daniel Lezcano, Chuansheng Liu, Vincent Guittot)

    - sched/numa updates and optimizations (Rik van Riel)

    - sysbench speedup (Vincent Guittot)

    - capacity calculation cleanups/refactoring (Vincent Guittot)

    - Various cleanups to thread group iteration (Oleg Nesterov)

    - Double-rq-lock removal optimization and various refactorings
    (Kirill Tkhai)

    - various sched/deadline fixes

    ... and lots of other changes"

    * 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (72 commits)
    sched/dl: Use dl_bw_of() under rcu_read_lock_sched()
    sched/fair: Delete resched_cpu() from idle_balance()
    sched, time: Fix build error with 64 bit cputime_t on 32 bit systems
    sched: Improve sysbench performance by fixing spurious active migration
    sched/x86: Fix up typo in topology detection
    x86, sched: Add new topology for multi-NUMA-node CPUs
    sched/rt: Use resched_curr() in task_tick_rt()
    sched: Use rq->rd in sched_setaffinity() under RCU read lock
    sched: cleanup: Rename 'out_unlock' to 'out_free_new_mask'
    sched: Use dl_bw_of() under RCU read lock
    sched/fair: Remove duplicate code from can_migrate_task()
    sched, mips, ia64: Remove __ARCH_WANT_UNLOCKED_CTXSW
    sched: print_rq(): Don't use tasklist_lock
    sched: normalize_rt_tasks(): Don't use _irqsave for tasklist_lock, use task_rq_lock()
    sched: Fix the task-group check in tg_has_rt_tasks()
    sched/fair: Leverage the idle state info when choosing the "idlest" cpu
    sched: Let the scheduler see CPU idle states
    sched/deadline: Fix inter- exclusive cpusets migrations
    sched/deadline: Clear dl_entity params when setscheduling to different class
    sched/numa: Kill the wrong/dead TASK_DEAD check in task_numa_fault()
    ...

    Linus Torvalds
     
  • Pull arch atomic cleanups from Ingo Molnar:
    "This is a series kept separate from the main locking tree, which
    cleans up and improves various details in the atomics type handling:

    - Remove the unused atomic_or_long() method

    - Consolidate and compress atomic ops implementations between
    architectures, to reduce linecount and to make it easier to add new
    ops.

    - Rewrite generic atomic support to only require cmpxchg() from an
    architecture - generate all other methods from that"
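
    To illustrate the last point in the list above, an add-and-return
    operation can be generated from a compare-and-exchange primitive with
    a retry loop. This sketch uses C11 atomics in userspace; the
    sketch_*() names and the macro are hypothetical and are not the
    kernel's implementation.

```c
#include <stdatomic.h>

/* The one primitive assumed from the "architecture": returns the old value. */
static int sketch_cmpxchg(atomic_int *v, int old, int new_val)
{
	/* On failure, 'old' is overwritten with the value actually seen. */
	atomic_compare_exchange_strong(v, &old, new_val);
	return old;
}

/* Generate an <op>_return operation from cmpxchg alone. */
#define SKETCH_ATOMIC_OP_RETURN(name, c_op)				\
static int sketch_atomic_##name##_return(int i, atomic_int *v)		\
{									\
	int cur = atomic_load(v), old;					\
									\
	/* Retry until nobody changed *v between load and exchange. */	\
	while ((old = sketch_cmpxchg(v, cur, cur c_op i)) != cur)	\
		cur = old;						\
	return cur c_op i;						\
}

SKETCH_ATOMIC_OP_RETURN(add, +)
SKETCH_ATOMIC_OP_RETURN(sub, -)
```

    Adding another operation is then a one-line macro invocation, which
    is the kind of line-count reduction the pull request describes.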

    * 'locking-arch-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (23 commits)
    locking,arch: Use ACCESS_ONCE() instead of cast to volatile in atomic_read()
    locking, mips: Fix atomics
    locking, sparc64: Fix atomics
    locking,arch: Rewrite generic atomic support
    locking,arch,xtensa: Fold atomic_ops
    locking,arch,sparc: Fold atomic_ops
    locking,arch,sh: Fold atomic_ops
    locking,arch,powerpc: Fold atomic_ops
    locking,arch,parisc: Fold atomic_ops
    locking,arch,mn10300: Fold atomic_ops
    locking,arch,mips: Fold atomic_ops
    locking,arch,metag: Fold atomic_ops
    locking,arch,m68k: Fold atomic_ops
    locking,arch,m32r: Fold atomic_ops
    locking,arch,ia64: Fold atomic_ops
    locking,arch,hexagon: Fold atomic_ops
    locking,arch,cris: Fold atomic_ops
    locking,arch,avr32: Fold atomic_ops
    locking,arch,arm64: Fold atomic_ops
    locking,arch,arm: Fold atomic_ops
    ...

    Linus Torvalds
     

11 Oct, 2014

1 commit

  • Pull dma-mapping update from Marek Szyprowski:
    "Provide the dma write coherent api (available previously on ARM
    architecture) for all other architectures, which use dma_ops-based dma
    mapping implementation.

    This lets one use the same code in the device drivers regardless of
    the selected architecture"

    * 'for-v3.18' of git://git.linaro.org/people/mszyprowski/linux-dma-mapping:
    dma-mapping: Provide write-combine allocations
    s390: Implement dma_{alloc,free}_attrs()

    Linus Torvalds
     

10 Oct, 2014

5 commits

  • Merge patch-bomb from Andrew Morton:
    - part of OCFS2 (review is laggy again)
    - procfs
    - slab
    - all of MM
    - zram, zbud
    - various other random things: arch, filesystems.

    * emailed patches from Andrew Morton: (164 commits)
    nosave: consolidate __nosave_{begin,end} in
    include/linux/screen_info.h: remove unused ORIG_* macros
    kernel/sys.c: compat sysinfo syscall: fix undefined behavior
    kernel/sys.c: whitespace fixes
    acct: eliminate compile warning
    kernel/async.c: switch to pr_foo()
    include/linux/blkdev.h: use NULL instead of zero
    include/linux/kernel.h: deduplicate code implementing clamp* macros
    include/linux/kernel.h: rewrite min3, max3 and clamp using min and max
    alpha: use Kbuild logic to include
    frv: remove deprecated IRQF_DISABLED
    frv: remove unused cpuinfo_frv and friends to fix future build error
    zbud: avoid accessing last unused freelist
    zsmalloc: simplify init_zspage free obj linking
    mm/zsmalloc.c: correct comment for fullness group computation
    zram: use notify_free to account all free notifications
    zram: report maximum used memory
    zram: zram memory size limitation
    zsmalloc: change return value unit of zs_get_total_size_bytes
    zsmalloc: move pages_allocated to zs_pool
    ...

    Linus Torvalds
     
  • The different architectures used their own (and different) declarations:

    extern __visible const void __nosave_begin, __nosave_end;
    extern const void __nosave_begin, __nosave_end;
    extern long __nosave_begin, __nosave_end;

    Consolidate them using the first variant in .

    Signed-off-by: Geert Uytterhoeven
    Cc: Russell King
    Cc: Ralf Baechle
    Cc: Benjamin Herrenschmidt
    Cc: Martin Schwidefsky
    Cc: "David S. Miller"
    Cc: Guan Xuetao
    Cc: Thomas Gleixner
    Cc: Ingo Molnar
    Cc: "H. Peter Anvin"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Geert Uytterhoeven
     
  • For architectures without coherent DMA, memory for DMA may need to be
    remapped with coherent attributes. Factor out the remapping code from
    arm and put it in a common location to reduce code duplication.

    As part of this, the arm APIs are now migrated away from
    ioremap_page_range to the common APIs which use map_vm_area for remapping.
    This should be an equivalent change and using map_vm_area is more correct
    as ioremap_page_range is intended to bring in io addresses into the cpu
    space and not regular kernel managed memory.

    Signed-off-by: Laura Abbott
    Reviewed-by: Catalin Marinas
    Cc: Arnd Bergmann
    Cc: David Riley
    Cc: Olof Johansson
    Cc: Ritesh Harjain
    Cc: Russell King
    Cc: Thierry Reding
    Cc: Will Deacon
    Cc: James Hogan
    Cc: Laura Abbott
    Cc: Mitchel Humpherys
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Laura Abbott
     
  • ARCH_USES_NUMA_PROT_NONE was defined for architectures that implemented
    _PAGE_NUMA using _PROT_NONE. This saved using an additional PTE bit and
    relied on the fact that PROT_NONE vmas were skipped by the NUMA hinting
    fault scanner. This was found to be conceptually confusing with a lot of
    implicit assumptions and it was asked that an alternative be found.

    Commit c46a7c81 "x86: define _PAGE_NUMA by reusing software bits on the
    PMD and PTE levels" redefined _PAGE_NUMA on x86 to be one of the swap PTE
    bits and shrunk the maximum possible swap size but it did not go far
    enough. There are no architectures that reuse _PROT_NONE as _PROT_NUMA
    but the relics still exist.

    This patch removes ARCH_USES_NUMA_PROT_NONE and removes some unnecessary
    duplication in powerpc vs the generic implementation by defining the types
    the core NUMA helpers expected to exist from x86 with their ppc64
    equivalent. This necessitated that a PTE bit mask be created that
    identified the bits that distinguish present from NUMA pte entries but it
    is expected this will only differ between arches based on _PAGE_PROTNONE.
    The naming for the generic helpers was taken from x86 originally but ppc64
    has types that are equivalent for the purposes of the helper so they are
    mapped instead of duplicating code.

    Signed-off-by: Mel Gorman
    Cc: Hugh Dickins
    Cc: "Kirill A. Shutemov"
    Cc: Rik van Riel
    Cc: Johannes Weiner
    Cc: Cyrill Gorcunov
    Reviewed-by: Aneesh Kumar K.V
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mel Gorman
     
  • Pull PCI updates from Bjorn Helgaas:
    "The interesting things here are:

    - Turn on Config Request Retry Status Software Visibility. This
    caused hangs last time, but we included a fix this time.
    - Rework PCI device configuration to use _HPP/_HPX more aggressively
    - Allow PCI devices to be put into D3cold during system suspend
    - Add arm64 PCI support
    - Add APM X-Gene host bridge driver
    - Add TI Keystone host bridge driver
    - Add Xilinx AXI host bridge driver

    More detailed summary:

    Enumeration
    - Check Vendor ID only for Config Request Retry Status (Rajat Jain)
    - Enable Config Request Retry Status when supported (Rajat Jain)
    - Add generic domain handling (Catalin Marinas)
    - Generate uppercase hex for modalias interface class (Ricardo Ribalda Delgado)

    Resource management
    - Add missing MEM_64 mask in pci_assign_unassigned_bridge_resources() (Yinghai Lu)
    - Increase IBM ipr SAS Crocodile BARs to at least system page size (Douglas Lehr)

    PCI device hotplug
    - Prevent NULL dereference during pciehp probe (Andreas Noever)
    - Move _HPP & _HPX handling into core (Bjorn Helgaas)
    - Apply _HPP to PCIe devices as well as PCI (Bjorn Helgaas)
    - Apply _HPP/_HPX to display devices (Bjorn Helgaas)
    - Preserve SERR & PARITY settings when applying _HPP/_HPX (Bjorn Helgaas)
    - Preserve MPS and MRRS settings when applying _HPP/_HPX (Bjorn Helgaas)
    - Apply _HPP/_HPX to all devices, not just hot-added ones (Bjorn Helgaas)
    - Fix wait time in pciehp timeout message (Yinghai Lu)
    - Add more pciehp Slot Control debug output (Yinghai Lu)
    - Stop disabling pciehp notifications during init (Yinghai Lu)

    MSI
    - Remove arch_msi_check_device() (Alexander Gordeev)
    - Rename pci_msi_check_device() to pci_msi_supported() (Alexander Gordeev)
    - Move D0 check into pci_msi_check_device() (Alexander Gordeev)
    - Remove unused kobject from struct msi_desc (Yijing Wang)
    - Remove "pos" from the struct msi_desc msi_attrib (Yijing Wang)
    - Add "msi_bus" sysfs MSI/MSI-X control for endpoints (Yijing Wang)
    - Use __get_cached_msi_msg() instead of get_cached_msi_msg() (Yijing Wang)
    - Use __read_msi_msg() instead of read_msi_msg() (Yijing Wang)
    - Use __write_msi_msg() instead of write_msi_msg() (Yijing Wang)

    Power management
    - Drop unused runtime PM support code for PCIe ports (Rafael J. Wysocki)
    - Allow PCI devices to be put into D3cold during system suspend (Rafael J. Wysocki)

    AER
    - Add additional AER error strings (Gong Chen)
    - Make standalone includable (Thierry Reding)

    Virtualization
    - Add ACS quirk for Solarflare SFC9120 & SFC9140 (Alex Williamson)
    - Add ACS quirk for Intel 10G NICs (Alex Williamson)
    - Add ACS quirk for AMD A88X southbridge (Marti Raudsepp)
    - Remove unused pci_find_upstream_pcie_bridge(), pci_get_dma_source() (Alex Williamson)
    - Add device flag helpers (Ethan Zhao)
    - Assume all Mellanox devices have broken INTx masking (Gavin Shan)

    Generic host bridge driver
    - Fix ioport_map() for !CONFIG_GENERIC_IOMAP (Liviu Dudau)
    - Add pci_register_io_range() and pci_pio_to_address() (Liviu Dudau)
    - Define PCI_IOBASE as the base of virtual PCI IO space (Liviu Dudau)
    - Fix the conversion of IO ranges into IO resources (Liviu Dudau)
    - Add pci_get_new_domain_nr() and of_get_pci_domain_nr() (Liviu Dudau)
    - Add support for parsing PCI host bridge resources from DT (Liviu Dudau)
    - Add pci_remap_iospace() to map bus I/O resources (Liviu Dudau)
    - Add arm64 architectural support for PCI (Liviu Dudau)

    APM X-Gene
    - Add APM X-Gene PCIe driver (Tanmay Inamdar)
    - Add arm64 DT APM X-Gene PCIe device tree nodes (Tanmay Inamdar)

    Freescale i.MX6
    - Probe in module_init(), not fs_initcall() (Lucas Stach)
    - Delay enabling reference clock for SS until it stabilizes (Tim Harvey)

    Marvell MVEBU
    - Fix uninitialized variable in mvebu_get_tgt_attr() (Thomas Petazzoni)

    NVIDIA Tegra
    - Make sure the PCIe PLL is really reset (Eric Yuen)
    - Add error path tegra_msi_teardown_irq() cleanup (Jisheng Zhang)
    - Fix extended configuration space mapping (Peter Daifuku)
    - Implement resource hierarchy (Thierry Reding)
    - Clear CLKREQ# enable on port disable (Thierry Reding)
    - Add Tegra124 support (Thierry Reding)

    ST Microelectronics SPEAr13xx
    - Pass config resource through reg property (Pratyush Anand)

    Synopsys DesignWare
    - Use NULL instead of false (Fabio Estevam)
    - Parse bus-range property from devicetree (Lucas Stach)
    - Use pci_create_root_bus() instead of pci_scan_root_bus() (Lucas Stach)
    - Remove pci_assign_unassigned_resources() (Lucas Stach)
    - Check private_data validity in single place (Lucas Stach)
    - Set up and clear exactly one MSI at a time (Lucas Stach)
    - Remove open-coded bitmap operations (Lucas Stach)
    - Fix configuration base address when using 'reg' (Minghuan Lian)
    - Fix IO resource end address calculation (Minghuan Lian)
    - Rename get_msi_data() to get_msi_addr() (Minghuan Lian)
    - Add get_msi_data() to pcie_host_ops (Minghuan Lian)
    - Add support for v3.65 hardware (Murali Karicheri)
    - Fold struct pcie_port_info into struct pcie_port (Pratyush Anand)

    TI Keystone
    - Add TI Keystone PCIe driver (Murali Karicheri)
    - Limit MRRS for all downstream devices (Murali Karicheri)
    - Assume controller is already in RC mode (Murali Karicheri)
    - Set device ID based on SoC to support multiple ports (Murali Karicheri)

    Xilinx AXI
    - Add Xilinx AXI PCIe driver (Srikanth Thokala)
    - Fix xilinx_pcie_assign_msi() return value test (Dan Carpenter)

    Miscellaneous
    - Clean up whitespace (Quentin Lambert)
    - Remove assignments from "if" conditions (Quentin Lambert)
    - Move PCI_VENDOR_ID_VMWARE to pci_ids.h (Francesco Ruggeri)
    - x86: Mark DMI tables as initialization data (Mathias Krause)
    - x86: Move __init annotation to the correct place (Mathias Krause)
    - x86: Mark constants of pci_mmcfg_nvidia_mcp55() as __initconst (Mathias Krause)
    - x86: Constify pci_mmcfg_probes[] array (Mathias Krause)
    - x86: Mark PCI BIOS initialization code as such (Mathias Krause)
    - Parenthesize PCI_DEVID and PCI_VPD_LRDT_ID parameters (Megan Kamiya)
    - Remove unnecessary variable in pci_add_dynid() (Tobias Klauser)"

    * tag 'pci-v3.18-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (109 commits)
    arm64: dts: Add APM X-Gene PCIe device tree nodes
    PCI: Add ACS quirk for AMD A88X southbridge devices
    PCI: xgene: Add APM X-Gene PCIe driver
    PCI: designware: Remove open-coded bitmap operations
    PCI/MSI: Remove unnecessary temporary variable
    PCI/MSI: Use __write_msi_msg() instead of write_msi_msg()
    MSI/powerpc: Use __read_msi_msg() instead of read_msi_msg()
    PCI/MSI: Use __get_cached_msi_msg() instead of get_cached_msi_msg()
    PCI/MSI: Add "msi_bus" sysfs MSI/MSI-X control for endpoints
    PCI/MSI: Remove "pos" from the struct msi_desc msi_attrib
    PCI/MSI: Remove unused kobject from struct msi_desc
    PCI/MSI: Rename pci_msi_check_device() to pci_msi_supported()
    PCI/MSI: Move D0 check into pci_msi_check_device()
    PCI/MSI: Remove arch_msi_check_device()
    irqchip: armada-370-xp: Remove arch_msi_check_device()
    PCI/MSI/PPC: Remove arch_msi_check_device()
    arm64: Add architectural support for PCI
    PCI: Add pci_remap_iospace() to map bus I/O resources
    of/pci: Add support for parsing PCI host bridge resources from DT
    of/pci: Add pci_get_new_domain_nr() and of_get_pci_domain_nr()
    ...

    Conflicts:
    arch/arm64/boot/dts/apm-storm.dtsi

    Linus Torvalds