19 Nov, 2016

8 commits

  • Drop duplicate header scatterlist.h from iommu_common.h.

    Signed-off-by: Geliang Tang
    Signed-off-by: David S. Miller

    Geliang Tang
     
  • This new config parameter reduces the space used for "Lock debugging:
    prove locking correctness" by about 4MB. Current sparc systems limit
    the kernel to 32MB, including the .text, .data and .bss sections. With
    the PROVE_LOCKING feature, the kernel could grow beyond this limit and
    cause boot-up issues. With this option, the kernel limits the number of
    entries in lock_chains, stack_trace etc., so that the kernel fits in
    the required size limit. This option is not visible to the user and is
    only used for sparc.

    Signed-off-by: Babu Moger
    Acked-by: Sam Ravnborg
    Signed-off-by: David S. Miller

    Babu Moger
     
  • ATU 64bit addressing allows PCIe devices with 64bit DMA capabilities
    to use ATU for 64bit DMA.

    Signed-off-by: Tushar Dave
    Reviewed-by: chris hyser
    Acked-by: Sowmini Varadhan
    Signed-off-by: David S. Miller

    Tushar Dave
     
  • Add Hypervisor IOMMU v2 APIs pci_iotsb_map(), pci_iotsb_demap() and
    enable sun4v dma ops to use IOMMU v2 API for all PCIe devices with
    64bit DMA mask.

    Signed-off-by: Tushar Dave
    Reviewed-by: chris hyser
    Acked-by: Sowmini Varadhan
    Signed-off-by: David S. Miller

    Tushar Dave
     
  • In order to use Hypervisor (HV) IOMMU v2 API for map/demap, each PCIe
    device has to be bound to IOTSB using HV API pci_iotsb_bind().

    Signed-off-by: Tushar Dave
    Reviewed-by: chris hyser
    Acked-by: Sowmini Varadhan
    Signed-off-by: David S. Miller

    Tushar Dave
     
  • Like legacy IOMMU, use common iommu_map_table and iommu_pool for ATU.
    This change initializes iommu_map_table and iommu_pool for ATU.

    Signed-off-by: Tushar Dave
    Reviewed-by: chris hyser
    Reviewed-by: Sowmini Varadhan
    Signed-off-by: David S. Miller

    Tushar Dave
     
  • ATU (Address Translation Unit) is a new IOMMU in SPARC supported with
    Hypervisor IOMMU v2 APIs.

    The current SPARC IOMMU supports only 32bit address ranges and one TSB
    per PCIe root complex, which limits DVMA space to 2GB per root
    complex. This limit has become a scalability bottleneck now that
    a typical 10G/40G NIC can consume 300MB-500MB of DVMA space per
    instance. When DVMA resources are exhausted, devices become unusable
    because the driver can't allocate DVMA.

    ATU removes this bottleneck by allowing the guest OS to create an IOTSB
    of size 32G (or more) with the 64bit address ranges available in ATU HW.
    32G is more than enough DVMA space to be shared by all PCIe devices
    under a root complex, in contrast to the 2G space provided by the
    legacy IOMMU.

    ATU allows PCIe devices to use 64bit DMA addressing. Devices
    which choose to use 32bit DMA mask will continue to work with the
    existing legacy IOMMU.
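
The split between the two IOMMUs described above can be sketched as a simple decision on the device's DMA mask. This is only an illustration: the names pick_iommu, IOMMU_LEGACY, IOMMU_ATU and DMA_MASK_32BIT are hypothetical, and the fallback to the legacy IOMMU when no ATU is present is an assumption rather than something the commit message states.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical constant: the widest mask a 32bit-only device can set. */
#define DMA_MASK_32BIT 0xffffffffULL

/* Hypothetical labels for the two translation paths. */
enum iommu_path { IOMMU_LEGACY, IOMMU_ATU };

/*
 * Pick the translation path for one device.
 * Devices with a mask wider than 32 bits go through ATU when it exists;
 * everything else stays on the legacy IOMMU (fallback is an assumption).
 */
static enum iommu_path pick_iommu(uint64_t dma_mask, bool atu_present)
{
    if (atu_present && dma_mask > DMA_MASK_32BIT)
        return IOMMU_ATU;
    return IOMMU_LEGACY;
}
```

For example, pick_iommu(~0ULL, true) selects the ATU path, while a device that sets a 32bit mask keeps using the legacy IOMMU.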

    Signed-off-by: Tushar Dave
    Reviewed-by: chris hyser
    Acked-by: Sowmini Varadhan
    Signed-off-by: David S. Miller

    Tushar Dave
     
  • This change allows ATU (the new IOMMU) in SPARC systems to request a
    large (32M) contiguous memory region during boot for creating the
    IOTSB backing store.
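
The 32M backing store and the 32G of DVMA space mentioned elsewhere in this log are plausibly related by simple arithmetic. The sketch below assumes that each IOTSB entry is 8 bytes and maps one 8K IO page; neither value is stated in the commit messages, and the names IOTSB_ENTRY_BYTES, IO_PAGE_BYTES and dvma_bytes_for_table are hypothetical.

```c
#include <stdint.h>

/* Assumed values, not taken from the commit message. */
#define IOTSB_ENTRY_BYTES 8ULL          /* assumed size of one IOTSB entry */
#define IO_PAGE_BYTES     (8ULL << 10)  /* assumed IO page size: 8K */

/* Bytes of DVMA space that a backing store of the given size can map. */
static uint64_t dvma_bytes_for_table(uint64_t table_bytes)
{
    uint64_t entries = table_bytes / IOTSB_ENTRY_BYTES;

    return entries * IO_PAGE_BYTES;
}
```

Under these assumptions a 32M table holds 4M entries, and 4M entries of 8K each cover 32G.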

    Signed-off-by: Dave Kleikamp
    Signed-off-by: Tushar Dave
    Signed-off-by: David S. Miller

    Dave Kleikamp
     

15 Nov, 2016

1 commit

  • A compile warning was introduced by the commit that fixed find_node().
    This patch fixes the warning by moving find_node() into the __init
    section. find_node() is only used by memblock_nid_range(), which in
    turn is only used by the __init function add_node_ranges(), so
    find_node() and memblock_nid_range() should also be in the __init
    section.

    Signed-off-by: Thomas Tai
    Signed-off-by: David S. Miller

    Thomas Tai
     

11 Nov, 2016

2 commits

  • Signed-off-by: Andreas Larsson
    Signed-off-by: David S. Miller

    Andreas Larsson
     
  • When booting up LDOM, find_node() warns that a physical address
    doesn't match a NUMA node.

    WARNING: CPU: 0 PID: 0 at arch/sparc/mm/init_64.c:835
    find_node+0xf4/0x120 find_node: A physical address doesn't
    match a NUMA node rule. Some physical memory will be
    owned by node 0.Modules linked in:

    CPU: 0 PID: 0 Comm: swapper Not tainted 4.9.0-rc3 #4
    Call Trace:
    [0000000000468ba0] __warn+0xc0/0xe0
    [0000000000468c74] warn_slowpath_fmt+0x34/0x60
    [00000000004592f4] find_node+0xf4/0x120
    [0000000000dd0774] add_node_ranges+0x38/0xe4
    [0000000000dd0b1c] numa_parse_mdesc+0x268/0x2e4
    [0000000000dd0e9c] bootmem_init+0xb8/0x160
    [0000000000dd174c] paging_init+0x808/0x8fc
    [0000000000dcb0d0] setup_arch+0x2c8/0x2f0
    [0000000000dc68a0] start_kernel+0x48/0x424
    [0000000000dcb374] start_early_boot+0x27c/0x28c
    [0000000000a32c08] tlb_fixup_done+0x4c/0x64
    [0000000000027f08] 0x27f08

    This happens because Linux uses an internal structure, node_masks[],
    to keep only the node with the best memory latency. However, an LDOM
    mdesc can contain a single latency-group with multiple memory latency
    nodes.

    If the address doesn't match the best latency node within
    node_masks[], find_node() should check for an alternative via mdesc.
    The warning message should only be printed if the address matches
    neither node_masks[] nor mdesc. To minimize the cost of searching
    mdesc every time, the last matched mask and index are stored in a
    variable.
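
The lookup order described above (fast table first, then the remembered match, then the slow search) can be sketched as follows. This is a simplified user-space model, not the kernel code: struct node_rule, struct rule_cache, scan() and find_node_sketch() are hypothetical names, and the "alt" table stands in for whatever a search of the machine description would yield.

```c
#include <stddef.h>
#include <stdint.h>

/* An address belongs to `node` when (addr & mask) == match. */
struct node_rule {
    uint64_t mask;
    uint64_t match;
    int node;
};

/* Remembers the last rule found by the slow search. */
struct rule_cache {
    int valid;
    struct node_rule rule;
};

/* Linear scan of a rule table; returns the first matching rule or NULL. */
static const struct node_rule *scan(const struct node_rule *rules, size_t n,
                                    uint64_t addr)
{
    for (size_t i = 0; i < n; i++)
        if ((addr & rules[i].mask) == rules[i].match)
            return &rules[i];
    return NULL;
}

/*
 * Returns the node for addr, or -1 when nothing matches (the point at
 * which a caller would print the warning). slow_scans counts how often
 * the expensive search had to run.
 */
static int find_node_sketch(uint64_t addr,
                            const struct node_rule *best, size_t n_best,
                            const struct node_rule *alt, size_t n_alt,
                            struct rule_cache *cache, unsigned *slow_scans)
{
    /* 1. Fast path: the best-latency table. */
    const struct node_rule *r = scan(best, n_best, addr);
    if (r)
        return r->node;

    /* 2. Reuse the last slow-search result if it still applies. */
    if (cache->valid && (addr & cache->rule.mask) == cache->rule.match)
        return cache->rule.node;

    /* 3. Slow path: search the alternatives and remember a hit. */
    (*slow_scans)++;
    r = scan(alt, n_alt, addr);
    if (!r)
        return -1;
    cache->valid = 1;
    cache->rule = *r;
    return r->node;
}
```

Consecutive addresses that fall in the same alternative range trigger the slow search only once; later lookups are answered from the cached rule.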

    Signed-off-by: Thomas Tai
    Reviewed-by: Chris Hyser
    Reviewed-by: Liam Merwick
    Signed-off-by: David S. Miller

    Thomas Tai
     

28 Oct, 2016

1 commit

  • When the vmalloc area gets fragmented, and because the firmware
    mapping area sits between where modules live and the vmalloc area, we
    can sometimes receive requests for enormous kernel TLB range flushes.

    When this happens the cpu just spins flushing billions of pages and
    this triggers the NMI watchdog and other problems.

    We took care of this on the TSB side by doing a linear scan of the
    table once we pass a certain threshold.

    Do something similar for the TLB flush; however, we are limited by
    the TLB flush facilities provided by the different chip variants.

    First of all we use a (mostly arbitrary) cut-off of 256K which is
    about 32 pages. This can be tuned in the future.

    The huge range code path for each chip works as follows:

    1) On spitfire we flush all non-locked TLB entries using diagnostic
    accesses.

    2) On cheetah we use the "flush all" TLB flush.

    3) On sun4v/hypervisor we do a TLB context flush on context 0, which
    unlike previous chips does not remove "permanent" or locked
    entries.

    We could probably do something better on spitfire, such as limiting
    the flush to kernel TLB entries or even doing range comparisons.
    However that probably isn't worth it since those chips are old and
    the TLB only had 64 entries.
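
The decision described above can be sketched in C as a threshold check followed by a per-chip choice. This is an illustration only: the real code paths are chip-specific low-level routines, the enum and function names here are hypothetical, the 8K page size is an assumption used to relate 256K to 32 pages, and whether a range of exactly 256K counts as "huge" is also an assumption.

```c
#include <stdint.h>

/* The 256K cut-off comes from the text; the 8K page size is an assumption. */
#define FLUSH_CUTOFF_BYTES (256ULL << 10)
#define PAGE_BYTES         (8ULL << 10)

enum cpu_kind { CPU_SPITFIRE, CPU_CHEETAH, CPU_SUN4V };

enum flush_plan {
    FLUSH_PAGE_BY_PAGE,   /* normal path: flush each page in the range */
    FLUSH_NONLOCKED_DIAG, /* spitfire: drop all non-locked entries */
    FLUSH_ALL,            /* cheetah: "flush all" operation */
    FLUSH_CONTEXT_0       /* sun4v: hypervisor context flush on context 0 */
};

/* Choose how to flush the kernel TLB range [start, end). */
static enum flush_plan plan_kernel_tlb_flush(enum cpu_kind cpu,
                                             uint64_t start, uint64_t end)
{
    /* Small ranges keep the existing page-by-page behaviour. */
    if (end - start <= FLUSH_CUTOFF_BYTES)
        return FLUSH_PAGE_BY_PAGE;

    /* Huge ranges use whatever bulk flush the chip offers. */
    switch (cpu) {
    case CPU_SPITFIRE:
        return FLUSH_NONLOCKED_DIAG;
    case CPU_CHEETAH:
        return FLUSH_ALL;
    default:
        return FLUSH_CONTEXT_0;
    }
}
```

With these values FLUSH_CUTOFF_BYTES / PAGE_BYTES is 32, matching the "about 32 pages" in the message.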

    Reported-by: James Clarke
    Tested-by: James Clarke
    Signed-off-by: David S. Miller

    David S. Miller
     

27 Oct, 2016

2 commits


26 Oct, 2016

3 commits


25 Oct, 2016

14 commits


19 Oct, 2016

3 commits

  • Merge the gup_flags cleanups from Lorenzo Stoakes:
    "This patch series adjusts functions in the get_user_pages* family such
    that desired FOLL_* flags are passed as an argument rather than
    implied by flags.

    The purpose of this change is to make the use of FOLL_FORCE explicit
    so it is easier to grep for and clearer to callers that this flag is
    being used. The use of FOLL_FORCE is an issue as it overrides missing
    VM_READ/VM_WRITE flags for the VMA whose pages we are reading
    from/writing to, which can result in surprising behaviour.

    The patch series came out of the discussion around commit 38e088546522
    ("mm: check VMA flags to avoid invalid PROT_NONE NUMA balancing"),
    which addressed a BUG_ON() being triggered when a page was faulted in
    with PROT_NONE set but having been overridden by FOLL_FORCE.
    do_numa_page() was run on the assumption the page _must_ be one marked
    for NUMA node migration as an actual PROT_NONE page would have been
    dealt with prior to this code path, however FOLL_FORCE introduced a
    situation where this assumption did not hold.

    See

    https://marc.info/?l=linux-mm&m=147585445805166

    for the patch proposal"

    Additionally, there's a fix for an ancient bug related to FOLL_FORCE and
    FOLL_WRITE by me.

    [ This branch was rebased recently to add a few more acked-by's and
    reviewed-by's ]

    * gup_flag-cleanups:
    mm: replace access_process_vm() write parameter with gup_flags
    mm: replace access_remote_vm() write parameter with gup_flags
    mm: replace __access_remote_vm() write parameter with gup_flags
    mm: replace get_user_pages_remote() write/force parameters with gup_flags
    mm: replace get_user_pages() write/force parameters with gup_flags
    mm: replace get_vaddr_frames() write/force parameters with gup_flags
    mm: replace get_user_pages_locked() write/force parameters with gup_flags
    mm: replace get_user_pages_unlocked() write/force parameters with gup_flags
    mm: remove write/force parameters from __get_user_pages_unlocked()
    mm: remove write/force parameters from __get_user_pages_locked()
    mm: remove gup_flags FOLL_WRITE games from __get_user_pages()

    Linus Torvalds
     
  • This removes the 'write' argument from access_process_vm() and replaces
    it with 'gup_flags' as use of this function previously silently implied
    FOLL_FORCE, whereas after this patch callers explicitly pass this flag.

    We make this explicit as use of FOLL_FORCE can result in surprising
    behaviour (and hence bugs) within the mm subsystem.

    Signed-off-by: Lorenzo Stoakes
    Acked-by: Jesper Nilsson
    Acked-by: Michal Hocko
    Acked-by: Michael Ellerman
    Signed-off-by: Linus Torvalds

    Lorenzo Stoakes
     
  • This removes the 'write' and 'force' use from get_user_pages_unlocked()
    and replaces them with 'gup_flags' to make the use of FOLL_FORCE
    explicit in callers as use of this flag can result in surprising
    behaviour (and hence bugs) within the mm subsystem.
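
The conversion a caller has to make can be sketched as below. This is a stand-alone illustration rather than kernel code: the flag values are defined locally as stand-ins, and legacy_to_gup_flags() is a hypothetical helper that shows how the old 'write' and 'force' arguments map onto a single flags word.

```c
/* Local stand-ins for the kernel's FOLL_* flags (illustrative values). */
#define FOLL_WRITE 0x01u
#define FOLL_FORCE 0x10u

/*
 * Hypothetical helper: translate the old boolean arguments into the
 * gup_flags word that callers now pass explicitly.
 */
static unsigned int legacy_to_gup_flags(int write, int force)
{
    unsigned int gup_flags = 0;

    if (write)
        gup_flags |= FOLL_WRITE;
    if (force)
        gup_flags |= FOLL_FORCE;
    return gup_flags;
}
```

A caller that used to pass write=1, force=1 now spells out FOLL_WRITE | FOLL_FORCE at the call site, which is what makes the use of FOLL_FORCE easy to grep for.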

    Signed-off-by: Lorenzo Stoakes
    Reviewed-by: Jan Kara
    Acked-by: Michal Hocko
    Signed-off-by: Linus Torvalds

    Lorenzo Stoakes
     

15 Oct, 2016

1 commit

  • Pull kbuild updates from Michal Marek:

    - EXPORT_SYMBOL for asm source by Al Viro.

    This does bring a regression, because genksyms no longer generates
    checksums for these symbols (CONFIG_MODVERSIONS). Nick Piggin is
    working on a patch to fix this.

    Plus, we are talking about functions like strcpy(), which rarely
    change prototypes.

    - Fixes for PPC fallout of the above by Stephen Rothwell and Nick
    Piggin

    - fixdep speedup by Alexey Dobriyan.

    - preparatory work by Nick Piggin to allow architectures to build with
    -ffunction-sections, -fdata-sections and --gc-sections

    - CONFIG_THIN_ARCHIVES support by Stephen Rothwell

    - fix for filenames with colons in the initramfs source by me.

    * 'kbuild' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild: (22 commits)
    initramfs: Escape colons in depfile
    ppc: there is no clear_pages to export
    powerpc/64: whitelist unresolved modversions CRCs
    kbuild: -ffunction-sections fix for archs with conflicting sections
    kbuild: add arch specific post-link Makefile
    kbuild: allow archs to select link dead code/data elimination
    kbuild: allow architectures to use thin archives instead of ld -r
    kbuild: Regenerate genksyms lexer
    kbuild: genksyms fix for typeof handling
    fixdep: faster CONFIG_ search
    ia64: move exports to definitions
    sparc32: debride memcpy.S a bit
    [sparc] unify 32bit and 64bit string.h
    sparc: move exports to definitions
    ppc: move exports to definitions
    arm: move exports to definitions
    s390: move exports to definitions
    m68k: move exports to definitions
    alpha: move exports to actual definitions
    x86: move exports to actual definitions
    ...

    Linus Torvalds
     

12 Oct, 2016

1 commit

  • Pull uaccess.h prepwork from Al Viro:
    "Preparations to tree-wide switch to use of linux/uaccess.h (which,
    obviously, will allow to start unifying stuff for real). The last step
    there, ie

    PATT='^[[:blank:]]*#[[:blank:]]*include[[:blank:]]*<asm/uaccess.h>'
    sed -i -e "s!$PATT!#include <linux/uaccess.h>!" \
    `git grep -l "$PATT"|grep -v ^include/linux/uaccess.h`

    is not taken here - I would prefer to do it once just before or just
    after -rc1. However, everything should be ready for it"

    * 'work.uaccess2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
    remove a stray reference to asm/uaccess.h in docs
    sparc64: separate extable_64.h, switch elf_64.h to it
    score: separate extable.h, switch module.h to it
    mips: separate extable.h, switch module.h to it
    x86: separate extable.h, switch sections.h to it
    remove stray include of asm/uaccess.h from cacheflush.h
    mn10300: remove a bogus processor.h->uaccess.h include
    xtensa: split uaccess.h into C and asm sides
    bonding: quit messing with IOCTL
    kill __kernel_ds_p off
    mn10300: finish verify_area() off
    frv: move HAVE_ARCH_UNMAPPED_AREA to pgtable.h
    exceptions: detritus removal

    Linus Torvalds
     

08 Oct, 2016

3 commits

  • When doing an nmi backtrace of many cores, most of which are idle, the
    output is a little overwhelming and very uninformative. Suppress
    messages for cpus that are idling when they are interrupted and just
    emit one line, "NMI backtrace for N skipped: idling at pc 0xNNN".

    We do this by grouping all the cpuidle code together into a new
    .cpuidle.text section, and then checking the address of the interrupted
    PC to see if it lies within that section.

    This commit suitably tags x86 and tile idle routines, and only adds in
    the minimal framework for other architectures.
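
The section check amounts to a half-open range comparison on the interrupted PC. The sketch below is a minimal user-space version; the function name pc_in_cpuidle_text() is hypothetical, and the start/end parameters stand in for the boundaries of the .cpuidle.text section, which a linker script would normally provide.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Returns true when pc lies inside [section_start, section_end),
 * i.e. the CPU was interrupted while running idle code and its
 * backtrace can be replaced by a one-line "idling" message.
 */
static bool pc_in_cpuidle_text(uintptr_t pc, uintptr_t section_start,
                               uintptr_t section_end)
{
    return pc >= section_start && pc < section_end;
}
```

Because all idle routines are grouped into one section, a single comparison covers every idle entry point without needing a list of function addresses.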

    Link: http://lkml.kernel.org/r/1472487169-14923-5-git-send-email-cmetcalf@mellanox.com
    Signed-off-by: Chris Metcalf
    Acked-by: Peter Zijlstra (Intel)
    Tested-by: Peter Zijlstra (Intel)
    Tested-by: Daniel Thompson [arm]
    Tested-by: Petr Mladek
    Cc: Aaron Tomlin
    Cc: Peter Zijlstra (Intel)
    Cc: "Rafael J. Wysocki"
    Cc: Russell King
    Cc: Thomas Gleixner
    Cc: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Chris Metcalf
     
  • Patch series "improvements to the nmi_backtrace code" v9.

    This patch series modifies the trigger_xxx_backtrace() NMI-based remote
    backtracing code to make it more flexible, and makes a few small
    improvements along the way.

    The motivation comes from the task isolation code, where there are
    scenarios where we want to be able to diagnose a case where some cpu is
    about to interrupt a task-isolated cpu. It can be helpful to see both
    where the interrupting cpu is, and also an approximation of where the
    cpu that is being interrupted is. The nmi_backtrace framework allows us
    to discover the stack of the interrupted cpu.

    I've tested that the change works as desired on tile, and build-tested
    x86, arm, mips, and sparc64. For x86 I confirmed that the generic
    cpuidle stuff as well as the architecture-specific routines are in the
    new cpuidle section. For arm, mips, and sparc I just build-tested it
    and made sure the generic cpuidle routines were in the new cpuidle
    section, but I didn't attempt to figure out which the platform-specific
    idle routines might be. That might be more usefully done by someone
    with platform experience in follow-up patches.

    This patch (of 4):

    Currently you can only request a backtrace of either all cpus, or all
    cpus but yourself. It can also be helpful to request a remote backtrace
    of a single cpu, and since we want that, the logical extension is to
    support a cpumask as the underlying primitive.

    This change modifies the existing lib/nmi_backtrace.c code to take a
    cpumask as its basic primitive, and modifies the linux/nmi.h code to use
    the new "cpumask" method instead.

    The existing clients of nmi_backtrace (arm and x86) are converted to
    using the new cpumask approach in this change.

    The other users of the backtracing API (sparc64 and mips) are converted
    to use the cpumask approach rather than the all/allbutself approach.
    The mips code ignored the "include_self" boolean but with this change it
    will now also dump a local backtrace if requested.
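
Why a cpumask works as the underlying primitive can be seen in a small model where the mask is a 64-bit word. This is a sketch under that simplification: the type cpumask_sketch_t and the helper names are hypothetical and do not correspond to the kernel's cpumask API.

```c
#include <stdint.h>

/* Simplified mask: bit N set means CPU N is a backtrace target. */
typedef uint64_t cpumask_sketch_t;

/* Mask containing only the given CPU (cpu must be below 64). */
static cpumask_sketch_t mask_of(unsigned int cpu)
{
    return 1ULL << cpu;
}

/* "All cpus": every online CPU is a target. */
static cpumask_sketch_t targets_all(cpumask_sketch_t online)
{
    return online;
}

/* "All cpus but yourself": clear the caller's own bit. */
static cpumask_sketch_t targets_allbutself(cpumask_sketch_t online,
                                           unsigned int self)
{
    return online & ~mask_of(self);
}

/* Single cpu: only that CPU, and only if it is online. */
static cpumask_sketch_t targets_single(cpumask_sketch_t online,
                                       unsigned int cpu)
{
    return online & mask_of(cpu);
}
```

Each of the three request types reduces to computing a mask, so one mask-based backend can serve all of them.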

    Link: http://lkml.kernel.org/r/1472487169-14923-2-git-send-email-cmetcalf@mellanox.com
    Signed-off-by: Chris Metcalf
    Tested-by: Daniel Thompson [arm]
    Reviewed-by: Aaron Tomlin
    Reviewed-by: Petr Mladek
    Cc: "Rafael J. Wysocki"
    Cc: Russell King
    Cc: Thomas Gleixner
    Cc: Ingo Molnar
    Cc: Ralf Baechle
    Cc: David Miller
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Chris Metcalf
     
  • This came to light when implementing native 64-bit atomics for ARCv2.

    The atomic64 self-test code uses CONFIG_ARCH_HAS_ATOMIC64_DEC_IF_POSITIVE
    to check whether atomic64_dec_if_positive() is available. It seems it
    was needed when not every arch defined it. However, as of the current
    code the Kconfig option seems needless:

    - for CONFIG_GENERIC_ATOMIC64 it is auto-enabled in lib/Kconfig and a
    generic definition of the API is present in lib/atomic64.c
    - arches with native 64-bit atomics select it in arch/*/Kconfig and
    define the API in their headers

    So I see no point in keeping the Kconfig option.
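
For readers unfamiliar with the operation being tested, the semantics of a "decrement if positive" primitive can be sketched with C11 atomics. This is a user-space illustration, not any architecture's implementation: dec_if_positive() is a hypothetical name, and the sketch ignores the overflow corner case at the minimum 64-bit value.

```c
#include <stdatomic.h>
#include <stdint.h>

/*
 * Decrement *v by one only if the result would not be negative.
 * Returns the old value minus one in both cases, so a negative
 * return means the variable was left untouched.
 */
static int64_t dec_if_positive(_Atomic int64_t *v)
{
    int64_t old = atomic_load(v);

    for (;;) {
        int64_t dec = old - 1;

        if (dec < 0)
            return dec;            /* not positive: do not store */
        /* On failure, old is reloaded with the current value. */
        if (atomic_compare_exchange_weak(v, &old, dec))
            return dec;
    }
}
```

Starting from 2, successive calls return 1, 0 and then -1, with the variable staying at 0 after the last call.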

    Compile tested for:
    - blackfin (CONFIG_GENERIC_ATOMIC64)
    - x86 (!CONFIG_GENERIC_ATOMIC64)
    - ia64

    Link: http://lkml.kernel.org/r/1473703083-8625-3-git-send-email-vgupta@synopsys.com
    Signed-off-by: Vineet Gupta
    Cc: Richard Henderson
    Cc: Ivan Kokshaysky
    Cc: Matt Turner
    Cc: Russell King
    Cc: Catalin Marinas
    Cc: Will Deacon
    Cc: Ralf Baechle
    Cc: "James E.J. Bottomley"
    Cc: Helge Deller
    Cc: Benjamin Herrenschmidt
    Cc: Paul Mackerras
    Cc: Michael Ellerman
    Cc: Martin Schwidefsky
    Cc: Heiko Carstens
    Cc: "David S. Miller"
    Cc: Chris Metcalf
    Cc: Thomas Gleixner
    Cc: Ingo Molnar
    Cc: "H. Peter Anvin"
    Cc: Vineet Gupta
    Cc: Zhaoxiu Zeng
    Cc: Linus Walleij
    Cc: Alexander Potapenko
    Cc: Andrey Ryabinin
    Cc: Herbert Xu
    Cc: Ming Lin
    Cc: Arnd Bergmann
    Cc: Geert Uytterhoeven
    Cc: Peter Zijlstra
    Cc: Borislav Petkov
    Cc: Andi Kleen
    Cc: Boqun Feng
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vineet Gupta
     

06 Oct, 2016

1 commit

  • Pull sparc updates from David Miller:
    "Besides some cleanups the major thing here is supporting relaxed
    ordering PCIe transactions on newer sparc64 machines, from Chris
    Hyser"

    * git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc:
    sparc: fixing ident and beautifying code
    sparc64: Enable setting "relaxed ordering" in IOMMU mappings
    sparc64: Enable PCI IOMMU version 2 API
    sparc: migrate exception table users off module.h and onto extable.h

    Linus Torvalds