08 Apr, 2015

1 commit

  • New code will require TRACE_SYSTEM to be a valid C variable name,
    but some tracepoints have TRACE_SYSTEM with '-' and not '_', so
    it cannot be used directly. Instead, add a TRACE_SYSTEM_VAR that can
    give the tracing infrastructure a unique name for the trace system.
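
    A sketch of the intended usage, for a trace system whose name contains a
    '-' (the kvm-s390 tracepoints are the case at hand; exact file not shown
    here):

    #undef TRACE_SYSTEM
    #define TRACE_SYSTEM kvm-s390

    /* TRACE_SYSTEM contains '-', so give the infrastructure a valid C name */
    #undef TRACE_SYSTEM_VAR
    #define TRACE_SYSTEM_VAR kvm_s390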

    Link: http://lkml.kernel.org/r/20150402111500.5e52c1ed.cornelia.huck@de.ibm.com

    Cc: Martin Schwidefsky
    Cc: Heiko Carstens
    Cc: David Hildenbrand
    Cc: Christian Borntraeger
    Acked-by: Cornelia Huck
    Reviewed-by: Masami Hiramatsu
    Tested-by: Masami Hiramatsu
    Signed-off-by: Steven Rostedt

    Steven Rostedt (Red Hat)
     

10 Mar, 2015

2 commits

  • Pull kvm/s390 bugfixes from Marcelo Tosatti.

    * git://git.kernel.org/pub/scm/virt/kvm/kvm:
    KVM: s390: non-LPAR case obsolete during facilities mask init
    KVM: s390: include guest facilities in kvm facility test
    KVM: s390: fix in memory copy of facility lists
    KVM: s390/cpacf: Fix kernel bug under z/VM
    KVM: s390/cpacf: Enable key wrapping by default

    Linus Torvalds
     
  • Pull s390 fixes from Martin Schwidefsky:
    "One performance optimization for page_clear and a couple of bug fixes"

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
    s390/mm: fix incorrect ASCE after crst_table_downgrade
    s390/ftrace: fix crashes when switching tracers / add notrace to cpu_relax()
    s390/pci: unify pci_iomap symbol exports
    s390/pci: fix [un]map_resources sequence
    s390: let the compiler do page clearing
    s390/pci: fix possible information leak in mmio syscall
    s390/dcss: array index 'i' is used before limits check.
    s390/scm_block: fix off by one during cluster reservation
    s390/jump label: improve and fix sanity check
    s390/jump label: add missing jump_label_apply_nops() call

    Linus Torvalds
     

04 Mar, 2015

4 commits

  • With patch "include guest facilities in kvm facility test" it is no
    longer necessary to have special handling for the non-LPAR case.

    Signed-off-by: Michael Mueller
    Signed-off-by: Christian Borntraeger

    Michael Mueller
     
  • Most facility related decisions in KVM have to take into account:

    - the facilities offered by the underlying run container (LPAR/VM)
    - the facilities supported by the KVM code itself
    - the facilities requested by a guest VM

    This patch adds the KVM driver requested facilities to the test routine.

    It additionally renames struct s390_model_fac to kvm_s390_fac and its field
    names to be more meaningful.

    The semantics of the facilities stored in the KVM architecture structure
    are changed: arch.model.fac->list now points to the guest
    facility list and arch.model.fac->mask points to the KVM facility mask.

    This patch fixes the behaviour of KVM for some facilities for guests
    that ignore the guest visible facility bits, e.g. guests could use
    transactional memory instructions on hosts supporting them even if the
    chosen cpu model would not offer them.

    The userspace interface is not affected by this change.
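
    The resulting test is roughly of this shape (a sketch following the
    mask/list description above; exact helper names are not quoted from the
    patch):

    /* a facility is usable by the guest only if KVM supports it (mask)
     * and the chosen cpu model exposes it to the guest (list) */
    static inline int test_kvm_facility(struct kvm *kvm, unsigned long nr)
    {
            return __test_facility(nr, kvm->arch.model.fac->mask) &&
                   __test_facility(nr, kvm->arch.model.fac->list);
    }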

    Signed-off-by: Michael Mueller
    Signed-off-by: Christian Borntraeger

    Michael Mueller
     
  • The facility lists were not fully copied.

    Signed-off-by: Michael Mueller
    Signed-off-by: Christian Borntraeger

    Michael Mueller
     
  • Under z/VM PQAP might trigger an operation exception if no crypto cards
    are defined via APVIRTUAL or APDEDICATED.

    [ 386.098666] Kernel BUG at 0000000000135c56 [verbose debug info unavailable]
    [ 386.098693] illegal operation: 0001 ilc:2 [#1] SMP
    [...]
    [ 386.098751] Krnl PSW : 0704c00180000000 0000000000135c56 (kvm_s390_apxa_installed+0x46/0x98)
    [...]
    [ 386.098804] [] kvm_arch_init_vm+0x29c/0x358
    [ 386.098806] [] kvm_dev_ioctl+0xc0/0x460
    [ 386.098809] [] do_vfs_ioctl+0x332/0x508
    [ 386.098811] [] SyS_ioctl+0x9e/0xb0
    [ 386.098814] [] system_call+0xd6/0x258
    [ 386.098815] [] 0x3fffc7400a2

    Let's add an extable entry and provide a zeroed config in that case.

    Reported-by: Stefan Zimmermann
    Signed-off-by: Christian Borntraeger
    Reviewed-by: Thomas Huth
    Tested-by: Stefan Zimmermann

    Christian Borntraeger
     

03 Mar, 2015

3 commits

    z/VM and LPAR enable key wrapping by default, let's do the same on KVM.

    Signed-off-by: Tony Krowiak
    Signed-off-by: Christian Borntraeger

    Tony Krowiak
     
  • The switch_mm function does nothing in case the prev and next mm
    are the same. It can happen that a crst_table_downgrade has changed
    the top-level pgd in the meantime on a different CPU. Always store
    the new ASCE to be picked up in entry.S.

    [heiko.carstens@de.ibm.com]: Bug was introduced with git commit
    53e857f30867 ("s390/mm,tlb: race of lazy TLB flush vs. recreation
    of TLB entries") and causes random crashes due to broken page tables
    being used.
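
    A sketch of the shape of the fix (simplified, not the exact s390 code):

    static inline void switch_mm(struct mm_struct *prev, struct mm_struct *next,
                                 struct task_struct *tsk)
    {
            /* publish the new ASCE even if prev == next, so a concurrent
             * crst_table_downgrade() cannot leave a stale top-level pgd
             * behind; entry.S picks this value up on return to user space */
            S390_lowcore.user_asce = next->context.asce_bits | __pa(next->pgd);
            if (prev == next)
                    return;
            /* ... rest of the mm switch ... */
    }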

    Reported-by: Dominik Vogt
    Signed-off-by: Martin Schwidefsky
    Signed-off-by: Heiko Carstens

    Martin Schwidefsky
     
  • With git commit 4d92f50249eb ("s390: reintroduce diag 44 calls for
    cpu_relax()") I reintroduced a non-trivial cpu_relax() variant on s390.

    The difference to the previous variant however is that the new version is
    an out-of-line function, which will be traced if function tracing is enabled.

    Switching to a different tracer includes instruction patching. Therefore this
    is done within stop_machine() "context" to prevent any function tracing
    from going on while instructions are being patched.
    With the new out-of-line variant of cpu_relax() this is not true anymore,
    since cpu_relax() gets called in a busy loop by all waiting cpus within
    stop_machine() until function patching is finished.
    Therefore cpu_relax() must be marked notrace.

    This fixes kernel crashes when frequently switching between "function" and
    "function_graph" tracers.

    Moving cpu_relax() back to a header file doesn't work because of header
    include order dependencies.
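
    The shape of the fix, as a sketch (the real s390 cpu_relax() body is
    simplified here):

    void notrace cpu_relax(void)    /* notrace: never patched by ftrace */
    {
            if (MACHINE_HAS_DIAG44)
                    asm volatile("diag 0,0,0x44");
            barrier();
    }
    EXPORT_SYMBOL(cpu_relax);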

    Signed-off-by: Heiko Carstens
    Signed-off-by: Martin Schwidefsky

    Heiko Carstens
     

01 Mar, 2015

1 commit

    Core mm expects __PAGETABLE_{PUD,PMD}_FOLDED to be defined if these page
    table levels are folded. Usually, these defines are provided by
    <asm-generic/pgtable-nopmd.h> and <asm-generic/pgtable-nopud.h>.

    But some architectures fold page table levels in a custom way. They
    need to define these macros themselves. This patch adds the missing defines.

    The patch fixes mm->nr_pmds underflow and eliminates dead __pmd_alloc()
    and __pud_alloc() on architectures without these page table levels.
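
    For an architecture that folds these levels by hand, the fix boils down
    to adding the announcements to its pgtable header, e.g.:

    #define __PAGETABLE_PUD_FOLDED
    #define __PAGETABLE_PMD_FOLDED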

    Signed-off-by: Kirill A. Shutemov
    Cc: Aaro Koskinen
    Cc: David Howells
    Cc: Geert Uytterhoeven
    Cc: Heiko Carstens
    Cc: Helge Deller
    Cc: "James E.J. Bottomley"
    Cc: Koichi Yasutake
    Cc: Martin Schwidefsky
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kirill A. Shutemov
     

28 Feb, 2015

2 commits

  • Since commit 8cfc99b58366 ("s390: add pci_iomap_range") we use
    EXPORT_SYMBOL for pci_iomap but EXPORT_SYMBOL_GPL for pci_iounmap.
    Change the related functions to use EXPORT_SYMBOL like the asm-generic
    variants do.
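
    A sketch of the resulting export pairing (function bodies simplified):

    void __iomem *pci_iomap(struct pci_dev *dev, int bar, unsigned long maxlen)
    {
            return pci_iomap_range(dev, bar, 0, maxlen);
    }
    EXPORT_SYMBOL(pci_iomap);               /* matches the asm-generic variant */

    void pci_iounmap(struct pci_dev *dev, void __iomem *addr)
    {
            /* ... drop the mapping ... */
    }
    EXPORT_SYMBOL(pci_iounmap);             /* was EXPORT_SYMBOL_GPL */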

    Signed-off-by: Sebastian Ott
    Signed-off-by: Martin Schwidefsky

    Sebastian Ott
     
  • Commit 8cfc99b58366 ("s390: add pci_iomap_range") introduced counters
    to keep track of the number of mappings created. This revealed that
    we don't have our internal mappings in order when using hotunplug or
    resume from hibernate. This patch addresses both issues.

    Signed-off-by: Sebastian Ott
    Signed-off-by: Martin Schwidefsky

    Sebastian Ott
     

26 Feb, 2015

4 commits

    The hardware folks told me that for page clearing "when you exactly
    know what to do, hand written xc+pfd is usually faster than mvcl for
    page clearing, as it saves millicode overhead and parameter parsing
    and checking" as long as you don't need the cache bypassing.
    Turns out that gcc already does a proper xc,pfd loop.

    A small test on z196 that does

    buff = mmap(NULL, bufsize, PROT_EXEC|PROT_WRITE|PROT_READ, MAP_PRIVATE|MAP_ANONYMOUS, 0, 0);
    for (i = 0; i < bufsize; i += 256)
            buff[i] = 0x5;

    gets 20% faster (touches every cache line of a page)

    and

    buff = mmap(NULL, bufsize, PROT_EXEC|PROT_WRITE|PROT_READ, MAP_PRIVATE|MAP_ANONYMOUS, 0, 0);
    for (i = 0; i < bufsize; i += 4096)
            buff[i] = 0x5;

    is within noise ratio (touches one cache line of a page).

    As the clear_page is usually called for first memory accesses
    we can assume that at least one cache line is used afterwards,
    so this change should be always better.
    Another benchmark, a make -j 40 of my testsuite in tmpfs with
    hot caches on a 32cpu system:

            -- unpatched --    -- patched --
    real    0m1.017s           0m0.994s     (~2% faster, but in noise)
    user    0m5.339s           0m5.016s     (~6% faster)
    sys     0m0.691s           0m0.632s     (~8% faster)

    Let's use the same memset-based define as the asm-generic variant does.
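
    That is, something along these lines (a sketch of the memset-based
    define; the asm-generic header uses the same pattern):

    #define clear_page(page)        memset((page), 0, PAGE_SIZE)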

    Signed-off-by: Christian Borntraeger
    Signed-off-by: Martin Schwidefsky

    Christian Borntraeger
     
  • Make sure that even in error situations we do not use copy_to_user
    on uninitialized kernel memory.

    Cc: stable@vger.kernel.org # 3.19+
    Signed-off-by: Sebastian Ott
    Signed-off-by: Martin Schwidefsky

    Sebastian Ott
     
  • Fix the output of the jump label sanity check and also print the
    code pattern that is supposed to be written to the jump label.

    Signed-off-by: Heiko Carstens
    Signed-off-by: Martin Schwidefsky

    Heiko Carstens
     
  • When modules are loaded we want to transform the compile time generated
    nops into runtime generated nops. Otherwise the jump label sanity check
    will detect invalid code when trying to patch code.

    Fixes this crash:

    Jump label code mismatch at __rds_conn_create+0x3c/0x720
    Found: c0 04 00 00 00 01
    Expected: c0 04 00 00 00 00
    Kernel panic - not syncing: Corrupted kernel text
    CPU: 0 PID: 10 Comm: migration/0 Not tainted 3.19.0-01935-g006610f #14
    Call Trace:
    show_trace+0xf8/0x158)
    show_stack+0x6a/0xe8
    dump_stack+0x7c/0xd8
    panic+0xe4/0x288
    jump_label_bug.isra.2+0xbe/0xc001
    __jump_label_transform+0x94/0xc8
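
    A sketch of where the missing call goes (simplified; the real s390
    module_finalize() does more than this):

    int module_finalize(const Elf_Ehdr *hdr, const Elf_Shdr *sechdrs,
                        struct module *me)
    {
            /* turn compile-time generated nops into runtime generated nops */
            jump_label_apply_nops(me);
            return 0;
    }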

    Reported-by: Sebastian Ott
    Signed-off-by: Heiko Carstens
    Signed-off-by: Martin Schwidefsky

    Heiko Carstens
     

23 Feb, 2015

2 commits

  • Pull more vfs updates from Al Viro:
    "Assorted stuff from this cycle. The big ones here are multilayer
    overlayfs from Miklos and beginning of sorting ->d_inode accesses out
    from David"

    * 'for-linus-2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (51 commits)
    autofs4 copy_dev_ioctl(): keep the value of ->size we'd used for allocation
    procfs: fix race between symlink removals and traversals
    debugfs: leave freeing a symlink body until inode eviction
    Documentation/filesystems/Locking: ->get_sb() is long gone
    trylock_super(): replacement for grab_super_passive()
    fanotify: Fix up scripted S_ISDIR/S_ISREG/S_ISLNK conversions
    Cachefiles: Fix up scripted S_ISDIR/S_ISREG/S_ISLNK conversions
    VFS: (Scripted) Convert S_ISLNK/DIR/REG(dentry->d_inode) to d_is_*(dentry)
    SELinux: Use d_is_positive() rather than testing dentry->d_inode
    Smack: Use d_is_positive() rather than testing dentry->d_inode
    TOMOYO: Use d_is_dir() rather than d_inode and S_ISDIR()
    Apparmor: Use d_is_positive/negative() rather than testing dentry->d_inode
    Apparmor: mediated_filesystem() should use dentry->d_sb not inode->i_sb
    VFS: Split DCACHE_FILE_TYPE into regular and special types
    VFS: Add a fallthrough flag for marking virtual dentries
    VFS: Add a whiteout dentry type
    VFS: Introduce inode-getting helpers for layered/unioned fs environments
    Infiniband: Fix potential NULL d_inode dereference
    posix_acl: fix reference leaks in posix_acl_create
    autofs4: Wrong format for printing dentry
    ...

    Linus Torvalds
     
  • Convert the following where appropriate:

    (1) S_ISLNK(dentry->d_inode) to d_is_symlink(dentry).

    (2) S_ISREG(dentry->d_inode) to d_is_reg(dentry).

    (3) S_ISDIR(dentry->d_inode) to d_is_dir(dentry). This is actually more
    complicated than it appears as some calls should be converted to
    d_can_lookup() instead. The difference is whether the directory in
    question is a real dir with a ->lookup op or whether it's a fake dir with
    a ->d_automount op.

    In some circumstances, we can subsume checks for dentry->d_inode not being
    NULL into this, provided the code isn't in a filesystem that expects
    d_inode to be NULL if the dirent really *is* negative (ie. if we're going to
    use d_inode() rather than d_backing_inode() to get the inode pointer).

    Note that the dentry type field may be set to something other than
    DCACHE_MISS_TYPE when d_inode is NULL in the case of unionmount, where the VFS
    manages the fall-through from a negative dentry to a lower layer. In such a
    case, the dentry type of the negative union dentry is set to the same as the
    type of the lower dentry.

    However, if you know d_inode is not NULL at the call site, then you can use
    the d_is_xxx() functions even in a filesystem.

    There is one further complication: a 0,0 chardev dentry may be labelled
    DCACHE_WHITEOUT_TYPE rather than DCACHE_SPECIAL_TYPE. Strictly, this was
    intended for special directory entry types that don't have attached inodes.

    The following perl+coccinelle script was used:

    use strict;

    my $fd;
    my @callers;
    open($fd, 'git grep -l \'S_IS[A-Z].*->d_inode\' |') ||
    die "Can't grep for S_ISDIR and co. callers";
    @callers = <$fd>;
    close($fd);
    unless (@callers) {
    print "No matches\n";
    exit(0);
    }

    my @cocci = (
    '@@',
    'expression E;',
    '@@',
    '',
    '- S_ISLNK(E->d_inode->i_mode)',
    '+ d_is_symlink(E)',
    '',
    '@@',
    'expression E;',
    '@@',
    '',
    '- S_ISDIR(E->d_inode->i_mode)',
    '+ d_is_dir(E)',
    '',
    '@@',
    'expression E;',
    '@@',
    '',
    '- S_ISREG(E->d_inode->i_mode)',
    '+ d_is_reg(E)' );

    my $coccifile = "tmp.sp.cocci";
    open($fd, ">$coccifile") || die $coccifile;
    print($fd "$_\n") || die $coccifile foreach (@cocci);
    close($fd);

    foreach my $file (@callers) {
    chomp $file;
    print "Processing ", $file, "\n";
    system("spatch", "--sp-file", $coccifile, $file, "--in-place", "--no-show-diff") == 0 ||
    die "spatch failed";
    }

    [AV: overlayfs parts skipped]

    Signed-off-by: David Howells
    Signed-off-by: Al Viro

    David Howells
     

22 Feb, 2015

1 commit

  • Pull s390 fixes from Martin Schwidefsky:
    "Two patches to save some memory if CONFIG_NR_CPUS is large, a changed
    default for the use of compare-and-delay, and a couple of bug fixes"

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
    s390/spinlock: disabled compare-and-delay by default
    s390/mm: align 64-bit PIE binaries to 4GB
    s390/cacheinfo: coding style changes
    s390/cacheinfo: fix shared cpu masks
    s390/smp: reduce size of struct pcpu
    s390/topology: convert cpu_topology array to per cpu variable
    s390/topology: delay initialization of topology cpu masks
    s390/vdso: fix clock_gettime for CLOCK_THREAD_CPUTIME_ID, -2 and -3

    Linus Torvalds
     

19 Feb, 2015

2 commits

  • The base address (STACK_TOP / 3 * 2) for a 64-bit program is two thirds
    into the 4GB segment at 0x2aa00000000. The randomization added on z13
    can eat another 1GB of the remaining 1.33GB to the next 4GB boundary.
    In the worst case 300MB are left for the executable + bss which may
    cross into the next 4GB segment. This is bad for branch prediction,
    therefore align the base address to 4GB to give the program more room
    before it crosses the 4GB boundary.

    Signed-off-by: Martin Schwidefsky

    Martin Schwidefsky
     
  • Pull virtio updates from Rusty Russell:
    "OK, this has the big virtio 1.0 implementation, as specified by OASIS.

    On top of that is the major rework of lguest, to use PCI and virtio
    1.0, to double-check the implementation.

    Then comes the inevitable fixes and cleanups from that work"

    * tag 'virtio-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux: (80 commits)
    virtio: don't set VIRTIO_CONFIG_S_DRIVER_OK twice.
    virtio_net: unconditionally define struct virtio_net_hdr_v1.
    tools/lguest: don't use legacy definitions for net device in example launcher.
    virtio: Don't expose legacy net features when VIRTIO_NET_NO_LEGACY defined.
    tools/lguest: use common error macros in the example launcher.
    tools/lguest: give virtqueues names for better error messages
    tools/lguest: more documentation and checking of virtio 1.0 compliance.
    lguest: don't look in console features to find emerg_wr.
    tools/lguest: don't start devices until DRIVER_OK status set.
    tools/lguest: handle indirect partway through chain.
    tools/lguest: insert driver references from the 1.0 spec (4.1 Virtio Over PCI)
    tools/lguest: insert device references from the 1.0 spec (4.1 Virtio Over PCI)
    tools/lguest: rename virtio_pci_cfg_cap field to match spec.
    tools/lguest: fix features_accepted logic in example launcher.
    tools/lguest: handle device reset correctly in example launcher.
    virtual: Documentation: simplify and generalize paravirt_ops.txt
    lguest: remove NOTIFY call and eventfd facility.
    lguest: remove NOTIFY facility from demonstration launcher.
    lguest: use the PCI console device's emerg_wr for early boot messages.
    lguest: always put console in PCI slot #1.
    ...

    Linus Torvalds
     

14 Feb, 2015

2 commits

    For instrumenting global variables KASan needs shadow memory backing the
    memory for modules. So on module load we will need to allocate memory for
    the shadow and map it at the address in shadow that corresponds to the
    address allocated by module_alloc().

    __vmalloc_node_range() could be used for this purpose, except it puts a
    guard hole after the allocated area. A guard hole in shadow memory would be a
    problem because at some future point we might need to have shadow memory
    at the address occupied by the guard hole, so we could fail to allocate
    shadow for module_alloc().

    Now we have the VM_NO_GUARD flag disabling the guard page, so we need to
    pass it into __vmalloc_node_range(). Add a new parameter 'vm_flags' to the
    __vmalloc_node_range() function.
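
    The extended prototype then looks roughly like this (sketch; a caller
    such as module_alloc() can pass VM_NO_GUARD through the new argument):

    void *__vmalloc_node_range(unsigned long size, unsigned long align,
                               unsigned long start, unsigned long end,
                               gfp_t gfp_mask, pgprot_t prot,
                               unsigned long vm_flags, int node,
                               const void *caller);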

    Signed-off-by: Andrey Ryabinin
    Cc: Dmitry Vyukov
    Cc: Konstantin Serebryany
    Cc: Dmitry Chernenkov
    Signed-off-by: Andrey Konovalov
    Cc: Yuri Gribov
    Cc: Konstantin Khlebnikov
    Cc: Sasha Levin
    Cc: Christoph Lameter
    Cc: Joonsoo Kim
    Cc: Dave Hansen
    Cc: Andi Kleen
    Cc: Ingo Molnar
    Cc: Thomas Gleixner
    Cc: "H. Peter Anvin"
    Cc: Christoph Lameter
    Cc: Pekka Enberg
    Cc: David Rientjes
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrey Ryabinin
     
  • Pull KVM update from Paolo Bonzini:
    "Fairly small update, but there are some interesting new features.

    Common:
    Optional support for adding a small amount of polling on each HLT
    instruction executed in the guest (or equivalent for other
    architectures). This can improve latency up to 50% on some
    scenarios (e.g. O_DSYNC writes or TCP_RR netperf tests). This
    also has to be enabled manually for now, but the plan is to
    auto-tune this in the future.

    ARM/ARM64:
    The highlights are support for GICv3 emulation and dirty page
    tracking

    s390:
    Several optimizations and bugfixes. Also a first: a feature
    exposed by KVM (UUID and long guest name in /proc/sysinfo) before
    it is available in IBM's hypervisor! :)

    MIPS:
    Bugfixes.

    x86:
    Support for PML (page modification logging, a new feature in
    Broadwell Xeons that speeds up dirty page tracking), nested
    virtualization improvements (nested APICv---a nice optimization),
    usual round of emulation fixes.

    There is also a new option to reduce latency of the TSC deadline
    timer in the guest; this needs to be tuned manually.

    Some commits are common between this pull and Catalin's; I see you
    have already included his tree.

    Powerpc:
    Nothing yet.

    The KVM/PPC changes will come in through the PPC maintainers,
    because I haven't received them yet and I might end up being
    offline for some part of next week"

    * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (130 commits)
    KVM: ia64: drop kvm.h from installed user headers
    KVM: x86: fix build with !CONFIG_SMP
    KVM: x86: emulate: correct page fault error code for NoWrite instructions
    KVM: Disable compat ioctl for s390
    KVM: s390: add cpu model support
    KVM: s390: use facilities and cpu_id per KVM
    KVM: s390/CPACF: Choose crypto control block format
    s390/kernel: Update /proc/sysinfo file with Extended Name and UUID
    KVM: s390: reenable LPP facility
    KVM: s390: floating irqs: fix user triggerable endless loop
    kvm: add halt_poll_ns module parameter
    kvm: remove KVM_MMIO_SIZE
    KVM: MIPS: Don't leak FPU/DSP to guest
    KVM: MIPS: Disable HTW while in guest
    KVM: nVMX: Enable nested posted interrupt processing
    KVM: nVMX: Enable nested virtual interrupt delivery
    KVM: nVMX: Enable nested apic register virtualization
    KVM: nVMX: Make nested control MSRs per-cpu
    KVM: nVMX: Enable nested virtualize x2apic mode
    KVM: nVMX: Prepare for using hardware MSR bitmap
    ...

    Linus Torvalds
     

13 Feb, 2015

2 commits

  • Now that all in-tree users of strnicmp have been converted to
    strncasecmp, the wrapper can be removed.

    Signed-off-by: Rasmus Villemoes
    Cc: David Howells
    Cc: Heiko Carstens
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Rasmus Villemoes
     
  • If an attacker can cause a controlled kernel stack overflow, overwriting
    the restart block is a very juicy exploit target. This is because the
    restart_block is held in the same memory allocation as the kernel stack.

    Moving the restart block to struct task_struct prevents this exploit by
    making the restart_block harder to locate.

    Note that there are other fields in thread_info that are also easy
    targets, at least on some architectures.

    It's also a decent simplification, since the restart code is more or less
    identical on all architectures.
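
    Schematically (a sketch of the move, not the full patch):

    struct task_struct {
            /* ... */
            struct restart_block restart_block;     /* moved out of thread_info */
            /* ... */
    };

    /* callers switch from current_thread_info()->restart_block
     * to current->restart_block */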

    [james.hogan@imgtec.com: metag: align thread_info::supervisor_stack]
    Signed-off-by: Andy Lutomirski
    Cc: Thomas Gleixner
    Cc: Al Viro
    Cc: "H. Peter Anvin"
    Cc: Ingo Molnar
    Cc: Kees Cook
    Cc: David Miller
    Acked-by: Richard Weinberger
    Cc: Richard Henderson
    Cc: Ivan Kokshaysky
    Cc: Matt Turner
    Cc: Vineet Gupta
    Cc: Russell King
    Cc: Catalin Marinas
    Cc: Will Deacon
    Cc: Haavard Skinnemoen
    Cc: Hans-Christian Egtvedt
    Cc: Steven Miao
    Cc: Mark Salter
    Cc: Aurelien Jacquiot
    Cc: Mikael Starvik
    Cc: Jesper Nilsson
    Cc: David Howells
    Cc: Richard Kuo
    Cc: "Luck, Tony"
    Cc: Geert Uytterhoeven
    Cc: Michal Simek
    Cc: Ralf Baechle
    Cc: Jonas Bonn
    Cc: "James E.J. Bottomley"
    Cc: Helge Deller
    Cc: Benjamin Herrenschmidt
    Cc: Paul Mackerras
    Acked-by: Michael Ellerman (powerpc)
    Tested-by: Michael Ellerman (powerpc)
    Cc: Martin Schwidefsky
    Cc: Heiko Carstens
    Cc: Chen Liqin
    Cc: Lennox Wu
    Cc: Chris Metcalf
    Cc: Guan Xuetao
    Cc: Chris Zankel
    Cc: Max Filippov
    Cc: Oleg Nesterov
    Cc: Guenter Roeck
    Signed-off-by: James Hogan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andy Lutomirski
     

12 Feb, 2015

11 commits

  • Just some minor coding style changes, while I had to look at the code.

    Signed-off-by: Heiko Carstens
    Signed-off-by: Martin Schwidefsky

    Heiko Carstens
     
  • When testing Sudeep Holla's cache info rework I didn't realize that the
    shared cpu masks are broken (all have the same cpu set).
    Let's fix this.

    Signed-off-by: Heiko Carstens
    Signed-off-by: Martin Schwidefsky

    Heiko Carstens
     
  • Reduce the size of struct pcpu, since the pcpu_devices array consists
    of NR_CPUS elements of type struct pcpu. For most machines this is just
    a waste of memory.
    So let's try to make it a bit smaller.
    This saves 16k with performance_defconfig.

    Signed-off-by: Heiko Carstens
    Signed-off-by: Martin Schwidefsky

    Heiko Carstens
     
  • Convert the per cpu topology cpu masks to a per cpu variable.
    At least for machines which have fewer possible cpus than NR_CPUS this can
    save a bit of memory (z/VM: max 64 vs 512 for performance_defconfig).

    This reduces the kernel image size by 100k.
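
    Schematically, the conversion looks like this (a sketch; the struct name
    is assumed from the s390 topology code):

    /* before: NR_CPUS entries, regardless of how many cpus exist */
    struct cpu_topology_s390 cpu_topology[NR_CPUS];

    /* after: storage only for the possible cpus */
    DEFINE_PER_CPU(struct cpu_topology_s390, cpu_topology);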

    Signed-off-by: Heiko Carstens
    Signed-off-by: Martin Schwidefsky

    Heiko Carstens
     
  • There is no reason to initialize the topology cpu masks already while
    setup_arch() is being called. It is sufficient to initialize the masks
    before the scheduler becomes SMP aware.
    Therefore a pre-SMP initcall, i.e. an early_initcall, is sufficient.

    This also allows converting the cpu_topology array into a per cpu
    variable with a later patch. Without this patch that wouldn't be
    possible, since the per cpu memory areas are not yet allocated while
    setup_arch() is executed.
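
    A sketch of the initcall placement (the function name here is
    hypothetical):

    static int __init s390_topology_init(void)      /* hypothetical name */
    {
            /* build the topology cpu masks; per cpu areas exist by now */
            return 0;
    }
    early_initcall(s390_topology_init);     /* pre-SMP, after setup_arch() */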

    Signed-off-by: Heiko Carstens
    Signed-off-by: Martin Schwidefsky

    Heiko Carstens
     
  • Git commit 8d8f2e18a6dbd3d09dd918788422e6ac8c878e96
    "s390/vdso: ectg gettime support for CLOCK_THREAD_CPUTIME_ID"
    broke clock_gettime for CLOCK_THREAD_CPUTIME_ID.

    Git commit c742b31c03f37c5c499178f09f57381aa6c70131
    "fast vdso implementation for CLOCK_THREAD_CPUTIME_ID"
    introduced the ECTG for clock id -2. Correct would have been
    clock id -3.

    Fix the whole mess: CLOCK_THREAD_CPUTIME_ID is based on
    CPUCLOCK_SCHED and cannot be sped up by the vdso. A speedup
    is only available for clock id -3, which is CPUCLOCK_VIRT for
    the task currently running on the CPU.

    Signed-off-by: Martin Schwidefsky

    Martin Schwidefsky
     
  • Merge second set of updates from Andrew Morton:
    "More of MM"

    * emailed patches from Andrew Morton : (83 commits)
    mm/nommu.c: fix arithmetic overflow in __vm_enough_memory()
    mm/mmap.c: fix arithmetic overflow in __vm_enough_memory()
    vmstat: Reduce time interval to stat update on idle cpu
    mm/page_owner.c: remove unnecessary stack_trace field
    Documentation/filesystems/proc.txt: describe /proc//map_files
    mm: incorporate read-only pages into transparent huge pages
    vmstat: do not use deferrable delayed work for vmstat_update
    mm: more aggressive page stealing for UNMOVABLE allocations
    mm: always steal split buddies in fallback allocations
    mm: when stealing freepages, also take pages created by splitting buddy page
    mincore: apply page table walker on do_mincore()
    mm: /proc/pid/clear_refs: avoid split_huge_page()
    mm: pagewalk: fix misbehavior of walk_page_range for vma(VM_PFNMAP)
    mempolicy: apply page table walker on queue_pages_range()
    arch/powerpc/mm/subpage-prot.c: use walk->vma and walk_page_vma()
    memcg: cleanup preparation for page table walk
    numa_maps: remove numa_maps->vma
    numa_maps: fix typo in gather_hugetbl_stats
    pagemap: use walk->vma instead of calling find_vma()
    clear_refs: remove clear_refs_private->vma and introduce clear_refs_test_walk()
    ...

    Linus Torvalds
     
  • Pull s390 updates from Martin Schwidefsky:

    - The remaining patches for the z13 machine support: kernel build
    option for z13, the cache synonym avoidance, SMT support,
    compare-and-delay for spinloops and the CEX5S crypto adapter.

    - The ftrace support for function tracing with the gcc hotpatch option.
    This touches common code Makefiles; Steven is OK with the changes.

    - The hypfs file system gets an extension to access diagnose 0x0c data
    in user space for performance analysis for Linux running under z/VM.

    - The iucv hvc console gets wildcard support for the user id filtering.

    - The cacheinfo code is converted to use the generic infrastructure.

    - Cleanup and bug fixes.

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (42 commits)
    s390/process: free vx save area when releasing tasks
    s390/hypfs: Eliminate hypfs interval
    s390/hypfs: Add diagnose 0c support
    s390/cacheinfo: don't use smp_processor_id() in preemptible context
    s390/zcrypt: fixed domain scanning problem (again)
    s390/smp: increase maximum value of NR_CPUS to 512
    s390/jump label: use different nop instruction
    s390/jump label: add sanity checks
    s390/mm: correct missing space when reporting user process faults
    s390/dasd: cleanup profiling
    s390/dasd: add locking for global_profile access
    s390/ftrace: hotpatch support for function tracing
    ftrace: let notrace function attribute disable hotpatching if necessary
    ftrace: allow architectures to specify ftrace compile options
    s390: reintroduce diag 44 calls for cpu_relax()
    s390/zcrypt: Add support for new crypto express (CEX5S) adapter.
    s390/zcrypt: Number of supported ap domains is not retrievable.
    s390/spinlock: add compare-and-delay to lock wait loops
    s390/tape: remove redundant if statement
    s390/hvc_iucv: add simple wildcard matches to the iucv allow filter
    ...

    Linus Torvalds
     
  • This allows the get_user_pages_fast slow path to release the mmap_sem
    before blocking.

    Signed-off-by: Andrea Arcangeli
    Reviewed-by: Kirill A. Shutemov
    Cc: Andres Lagar-Cavilla
    Cc: Peter Feiner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrea Arcangeli
     
  • LKP has triggered a compiler warning after my recent patch "mm: account
    pmd page tables to the process":

    mm/mmap.c: In function 'exit_mmap':
    >> mm/mmap.c:2857:2: warning: right shift count >= width of type [enabled by default]

    The code:

    > 2857 WARN_ON(mm_nr_pmds(mm) >
    2858 round_up(FIRST_USER_ADDRESS, PUD_SIZE) >> PUD_SHIFT);

    In this, on tile, FIRST_USER_ADDRESS is defined as 0, so round_up() yields
    the same type -- int -- and right-shifting that int by PUD_SHIFT is what
    triggers the "right shift count >= width of type" warning.

    I think the best way to fix it is to define FIRST_USER_ADDRESS as unsigned
    long on every arch, for consistency.
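
    Per arch the change is essentially (sketch):

    /* before */
    #define FIRST_USER_ADDRESS      0
    /* after: unsigned long everywhere, so the shift above is done in 64 bits */
    #define FIRST_USER_ADDRESS      0UL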

    Signed-off-by: Kirill A. Shutemov
    Reported-by: Wu Fengguang
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kirill A. Shutemov
     
    Currently we have many duplicates in definitions around
    follow_huge_addr(), follow_huge_pmd(), and follow_huge_pud(), so this
    patch tries to remove them. The basic idea is to put the default
    implementation for these functions in mm/hugetlb.c as weak symbols
    (regardless of CONFIG_ARCH_WANT_GENERAL_HUGETLB), and to implement
    arch-specific code only when the arch needs it.

    For follow_huge_addr(), only powerpc and ia64 have their own
    implementation, and in all other architectures this function just returns
    ERR_PTR(-EINVAL). So this patch sets returning ERR_PTR(-EINVAL) as
    default.
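
    The default then looks roughly like this (a sketch of the weak symbol in
    mm/hugetlb.c):

    struct page * __weak
    follow_huge_addr(struct mm_struct *mm, unsigned long address, int write)
    {
            return ERR_PTR(-EINVAL);        /* powerpc and ia64 override this */
    }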

    As for follow_huge_(pmd|pud)(), if (pmd|pud)_huge() is implemented to
    always return 0 in your architecture (like in ia64 or sparc), it's never
    called (the callsite is optimized away) no matter how it is implemented.
    So in such architectures, we don't need an arch-specific implementation.

    In some architectures (like mips, s390 and tile), the current
    arch-specific follow_huge_(pmd|pud)() are effectively identical to the
    common code, so this patch lets these architectures use the common code.

    One exception is metag, where pmd_huge() could return non-zero but it
    expects follow_huge_pmd() to always return NULL. This means that we need
    arch-specific implementation which returns NULL. This behavior looks
    strange to me (because non-zero pmd_huge() implies that the architecture
    supports PMD-based hugepage, so follow_huge_pmd() can/should return some
    relevant value,) but that's beyond this cleanup patch, so let's keep it.

    Justification of non-trivial changes:
    - in s390, follow_huge_pmd() checks !MACHINE_HAS_HPAGE at first, and this
    patch removes the check. This is OK because we can assume MACHINE_HAS_HPAGE
    is true when follow_huge_pmd() can be called (note that pmd_huge() has
    the same check and always returns 0 for !MACHINE_HAS_HPAGE.)
    - in s390 and mips, we use HPAGE_MASK instead of PMD_MASK as done in common
    code. This patch forces these archs to use PMD_MASK, but it's OK because
    they are identical in both archs.
    In s390, both of HPAGE_SHIFT and PMD_SHIFT are 20.
    In mips, HPAGE_SHIFT is defined as (PAGE_SHIFT + PAGE_SHIFT - 3) and
    PMD_SHIFT is defined as (PAGE_SHIFT + PAGE_SHIFT + PTE_ORDER - 3), but
    PTE_ORDER is always 0, so these are identical.

    Signed-off-by: Naoya Horiguchi
    Acked-by: Hugh Dickins
    Cc: James Hogan
    Cc: David Rientjes
    Cc: Mel Gorman
    Cc: Johannes Weiner
    Cc: Michal Hocko
    Cc: Rik van Riel
    Cc: Andrea Arcangeli
    Cc: Luiz Capitulino
    Cc: Nishanth Aravamudan
    Cc: Lee Schermerhorn
    Cc: Steve Capper
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Naoya Horiguchi