03 May, 2014

1 commit


28 Apr, 2014

1 commit

  • This merges the patch to fix possible loss of dirty bit on munmap() or
    madvice(DONTNEED). If there are concurrent writers on other CPU's that
    have the unmapped/unneeded page in their TLBs, their writes to the page
    could possibly get lost if a third CPU raced with the TLB flush and did
    a page_mkclean() before the page was fully written.

    Admittedly, if you unmap() or madvice(DONTNEED) an area _while_ another
    thread is still busy writing to it, you deserve all the lost writes you
    could get. But we kernel people hold ourselves to higher quality
    standards than "crazy people deserve to lose", because, well, we've seen
    people do all kinds of crazy things.

    So let's get it right, just because we can, and we don't have to worry
    about it.

    * safe-dirty-tlb-flush:
    mm: split 'tlb_flush_mmu()' into tlb flushing and memory freeing parts

    Linus Torvalds
     

26 Apr, 2014

2 commits

  • The mmu-gather operation 'tlb_flush_mmu()' has done two things: the
    actual tlb flush operation, and the batched freeing of the pages that
    the TLB entries pointed at.

    This splits the operation into separate phases, so that the forced
    batched flushing done by zap_pte_range() can now do the actual TLB flush
    while still holding the page table lock, but delay the batched freeing
    of all the pages to after the lock has been dropped.

    This in turn allows us to avoid a race condition between
    set_page_dirty() (as called by zap_pte_range() when it finds a dirty
    shared memory pte) and page_mkclean(): because we now flush all the
    dirty page data from the TLB's while holding the pte lock,
    page_mkclean() will be held up walking the (recently cleaned) page
    tables until after the TLB entries have been flushed from all CPU's.

    Reported-by: Benjamin Herrenschmidt
    Tested-by: Dave Hansen
    Acked-by: Hugh Dickins
    Cc: Peter Zijlstra
    Cc: Russell King - ARM Linux
    Cc: Tony Luck
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     
  • commit 0b60f9ead5d4816e7e3d6e28f4a0d22d4a1b2513 (s390: use
    device_remove_file_self() instead of device_schedule_callback())

    caused random memory corruption on my s390 box. Turns out that the
    last element of the ccwgroup structure is of dynamic size, so we
    must move the newly introduced work structure _before_ the zero
    length array.

    Signed-off-by: Christian Borntraeger
    CC: Tejun Heo
    CC: Martin Schwidefsky
    CC: Heiko Carstens
    CC: Sebastian Ott
    CC: Peter Oberparleiter
    Acked-by: Tejun Heo
    Signed-off-by: Greg Kroah-Hartman

    Christian Borntraeger
     

25 Apr, 2014

1 commit


17 Apr, 2014

1 commit

  • Pull s390 patches from Martin Schwidefsky:
    "An update to the oops output with additional information about the
    crash. The renameat2 system call is enabled. Two patches in regard
    to the PTR_ERR_OR_ZERO cleanup. And a bunch of bug fixes"

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
    s390/sclp_cmd: replace PTR_RET with PTR_ERR_OR_ZERO
    s390/sclp: replace PTR_RET with PTR_ERR_OR_ZERO
    s390/sclp_vt220: Fix kernel panic due to early terminal input
    s390/compat: fix typo
    s390/uaccess: fix possible register corruption in strnlen_user_srst()
    s390: add 31 bit warning message
    s390: wire up sys_renameat2
    s390: show_registers() should not map user space addresses to kernel symbols
    s390/mm: print control registers and page table walk on crash
    s390/smp: fix smp_stop_cpu() for !CONFIG_SMP
    s390: fix control register update

    Linus Torvalds
     

13 Apr, 2014

2 commits

  • Pull vfs updates from Al Viro:
    "The first vfs pile, with deep apologies for being very late in this
    window.

    Assorted cleanups and fixes, plus a large preparatory part of iov_iter
    work. There's a lot more of that, but it'll probably go into the next
    merge window - it *does* shape up nicely, removes a lot of
    boilerplate, gets rid of locking inconsistencie between aio_write and
    splice_write and I hope to get Kent's direct-io rewrite merged into
    the same queue, but some of the stuff after this point is having
    (mostly trivial) conflicts with the things already merged into
    mainline and with some I want more testing.

    This one passes LTP and xfstests without regressions, in addition to
    usual beating. BTW, readahead02 in ltp syscalls testsuite has started
    giving failures since "mm/readahead.c: fix readahead failure for
    memoryless NUMA nodes and limit readahead pages" - might be a false
    positive, might be a real regression..."

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (63 commits)
    missing bits of "splice: fix racy pipe->buffers uses"
    cifs: fix the race in cifs_writev()
    ceph_sync_{,direct_}write: fix an oops on ceph_osdc_new_request() failure
    kill generic_file_buffered_write()
    ocfs2_file_aio_write(): switch to generic_perform_write()
    ceph_aio_write(): switch to generic_perform_write()
    xfs_file_buffered_aio_write(): switch to generic_perform_write()
    export generic_perform_write(), start getting rid of generic_file_buffer_write()
    generic_file_direct_write(): get rid of ppos argument
    btrfs_file_aio_write(): get rid of ppos
    kill the 5th argument of generic_file_buffered_write()
    kill the 4th argument of __generic_file_aio_write()
    lustre: don't open-code kernel_recvmsg()
    ocfs2: don't open-code kernel_recvmsg()
    drbd: don't open-code kernel_recvmsg()
    constify blk_rq_map_user_iov() and friends
    lustre: switch to kernel_sendmsg()
    ocfs2: don't open-code kernel_sendmsg()
    take iov_iter stuff to mm/iov_iter.c
    process_vm_access: tidy up a bit
    ...

    Linus Torvalds
     
  • Pull audit updates from Eric Paris.

    * git://git.infradead.org/users/eparis/audit: (28 commits)
    AUDIT: make audit_is_compat depend on CONFIG_AUDIT_COMPAT_GENERIC
    audit: renumber AUDIT_FEATURE_CHANGE into the 1300 range
    audit: do not cast audit_rule_data pointers pointlesly
    AUDIT: Allow login in non-init namespaces
    audit: define audit_is_compat in kernel internal header
    kernel: Use RCU_INIT_POINTER(x, NULL) in audit.c
    sched: declare pid_alive as inline
    audit: use uapi/linux/audit.h for AUDIT_ARCH declarations
    syscall_get_arch: remove useless function arguments
    audit: remove stray newline from audit_log_execve_info() audit_panic() call
    audit: remove stray newlines from audit_log_lost messages
    audit: include subject in login records
    audit: remove superfluous new- prefix in AUDIT_LOGIN messages
    audit: allow user processes to log from another PID namespace
    audit: anchor all pid references in the initial pid namespace
    audit: convert PPIDs to the inital PID namespace.
    pid: get pid_t ppid of task in init_pid_ns
    audit: rename the misleading audit_get_context() to audit_take_context()
    audit: Add generic compat syscall support
    audit: Add CONFIG_HAVE_ARCH_AUDITSYSCALL
    ...

    Linus Torvalds
     

11 Apr, 2014

5 commits


09 Apr, 2014

4 commits

  • Print extra debugging information to the console if the kernel or a user
    space process crashed (with user space debugging enabled):

    - contents of control register 7 and 13
    - failing address and translation exception identification
    - page table walk for the failing address

    Signed-off-by: Heiko Carstens
    Signed-off-by: Martin Schwidefsky

    Heiko Carstens
     
  • smp_stop_cpu() should stop the current cpu even for !CONFIG_SMP.
    Otherwise machine_halt() will return and and the machine generates a
    panic instread of simply stopping the current cpu:

    Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000000

    CPU: 0 PID: 1 Comm: systemd-shutdow Not tainted 3.14.0-01527-g2b6ef16a6bc5 #10
    [...]
    Call Trace:
    ([] show_trace+0xf8/0x158)
    [] show_stack+0x6a/0xe8
    [] panic+0xe4/0x268
    [] do_exit+0xa88/0xb2c
    [] SyS_reboot+0x1f0/0x234
    [] sysc_nr_ok+0x22/0x28
    [] 0x7d5a09b4

    Signed-off-by: Heiko Carstens
    Signed-off-by: Martin Schwidefsky

    Heiko Carstens
     
  • The git commit c63badebfebacdba827ab1cc1d420fc81bd8d818
    "s390: optimize control register update" broke the update for
    control register 0. After the update do the lctlg from the correct
    value.

    Cc: # 3.14
    Signed-off-by: Martin Schwidefsky

    Martin Schwidefsky
     
  • Pull second set of s390 patches from Martin Schwidefsky:
    "The second part of Heikos uaccess rework, the page table walker for
    uaccess is now a thing of the past (yay!)

    The code change to fix the theoretical TLB flush problem allows us to
    add a TLB flush optimization for zEC12, this machine has new
    instructions that allow to do CPU local TLB flushes for single pages
    and for all pages of a specific address space.

    Plus the usual bug fixing and some more cleanup"

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
    s390/uaccess: rework uaccess code - fix locking issues
    s390/mm,tlb: optimize TLB flushing for zEC12
    s390/mm,tlb: safeguard against speculative TLB creation
    s390/irq: Use defines for external interruption codes
    s390/irq: Add defines for external interruption codes
    s390/sclp: add timeout for queued requests
    kvm/s390: also set guest pages back to stable on kexec/kdump
    lcs: Add missing destroy_timer_on_stack()
    s390/tape: Add missing destroy_timer_on_stack()
    s390/tape: Use del_timer_sync()
    s390/3270: fix crash with multiple reset device requests
    s390/bitops,atomic: add missing memory barriers
    s390/zcrypt: add length check for aligned data to avoid overflow in msg-type 6

    Linus Torvalds
     

08 Apr, 2014

4 commits

  • Merge second patch-bomb from Andrew Morton:
    - the rest of MM
    - zram updates
    - zswap updates
    - exit
    - procfs
    - exec
    - wait
    - crash dump
    - lib/idr
    - rapidio
    - adfs, affs, bfs, ufs
    - cris
    - Kconfig things
    - initramfs
    - small amount of IPC material
    - percpu enhancements
    - early ioremap support
    - various other misc things

    * emailed patches from Andrew Morton : (156 commits)
    MAINTAINERS: update Intel C600 SAS driver maintainers
    fs/ufs: remove unused ufs_super_block_third pointer
    fs/ufs: remove unused ufs_super_block_second pointer
    fs/ufs: remove unused ufs_super_block_first pointer
    fs/ufs/super.c: add __init to init_inodecache()
    doc/kernel-parameters.txt: add early_ioremap_debug
    arm64: add early_ioremap support
    arm64: initialize pgprot info earlier in boot
    x86: use generic early_ioremap
    mm: create generic early_ioremap() support
    x86/mm: sparse warning fix for early_memremap
    lglock: map to spinlock when !CONFIG_SMP
    percpu: add preemption checks to __this_cpu ops
    vmstat: use raw_cpu_ops to avoid false positives on preemption checks
    slub: use raw_cpu_inc for incrementing statistics
    net: replace __this_cpu_inc in route.c with raw_cpu_inc
    modules: use raw_cpu_write for initialization of per cpu refcount.
    mm: use raw_cpu ops for determining current NUMA node
    percpu: add raw_cpu_ops
    slub: fix leak of 'name' in sysfs_slab_add
    ...

    Linus Torvalds
     
  • If the renamed symbol is defined lib/iomap.c implements ioport_map and
    ioport_unmap and currently (nearly) all platforms define the port
    accessor functions outb/inb and friend unconditionally. So
    HAS_IOPORT_MAP is the better name for this.

    Consequently NO_IOPORT is renamed to NO_IOPORT_MAP.

    The motivation for this change is to reintroduce a symbol HAS_IOPORT
    that signals if outb/int et al are available. I will address that at
    least one merge window later though to keep surprises to a minimum and
    catch new introductions of (HAS|NO)_IOPORT.

    The changes in this commit were done using:

    $ git grep -l -E '(NO|HAS)_IOPORT' | xargs perl -p -i -e 's/\b((?:CONFIG_)?(?:NO|HAS)_IOPORT)\b/$1_MAP/'

    Signed-off-by: Uwe Kleine-König
    Acked-by: Arnd Bergmann
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Uwe Kleine-König
     
  • The main motivation behind this patch is to provide a way to disable THP
    for jobs where the code cannot be modified, and using a malloc hook with
    madvise is not an option (i.e. statically allocated data). This patch
    allows us to do just that, without affecting other jobs running on the
    system.

    We need to do this sort of thing for jobs where THP hurts performance,
    due to the possibility of increased remote memory accesses that can be
    created by situations such as the following:

    When you touch 1 byte of an untouched, contiguous 2MB chunk, a THP will
    be handed out, and the THP will be stuck on whatever node the chunk was
    originally referenced from. If many remote nodes need to do work on
    that same chunk, they'll be making remote accesses.

    With THP disabled, 4K pages can be handed out to separate nodes as
    they're needed, greatly reducing the amount of remote accesses to
    memory.

    This patch is based on some of my work combined with some
    suggestions/patches given by Oleg Nesterov. The main goal here is to
    add a prctl switch to allow us to disable to THP on a per mm_struct
    basis.

    Here's a bit of test data with the new patch in place...

    First with the flag unset:

    # perf stat -a ./prctl_wrapper_mmv3 0 ./thp_pthread -C 0 -m 0 -c 512 -b 256g
    Setting thp_disabled for this task...
    thp_disable: 0
    Set thp_disabled state to 0
    Process pid = 18027

    PF/
    MAX MIN TOTCPU/ TOT_PF/ TOT_PF/ WSEC/
    TYPE: CPUS WALL WALL SYS USER TOTCPU CPU WALL_SEC SYS_SEC CPU NODES
    512 1.120 0.060 0.000 0.110 0.110 0.000 28571428864 -9223372036854775808 55803572 23

    Performance counter stats for './prctl_wrapper_mmv3_hack 0 ./thp_pthread -C 0 -m 0 -c 512 -b 256g':

    273719072.841402 task-clock # 641.026 CPUs utilized [100.00%]
    1,008,986 context-switches # 0.000 M/sec [100.00%]
    7,717 CPU-migrations # 0.000 M/sec [100.00%]
    1,698,932 page-faults # 0.000 M/sec
    355,222,544,890,379 cycles # 1.298 GHz [100.00%]
    536,445,412,234,588 stalled-cycles-frontend # 151.02% frontend cycles idle [100.00%]
    409,110,531,310,223 stalled-cycles-backend # 115.17% backend cycles idle [100.00%]
    148,286,797,266,411 instructions # 0.42 insns per cycle
    # 3.62 stalled cycles per insn [100.00%]
    27,061,793,159,503 branches # 98.867 M/sec [100.00%]
    1,188,655,196 branch-misses # 0.00% of all branches

    427.001706337 seconds time elapsed

    Now with the flag set:

    # perf stat -a ./prctl_wrapper_mmv3 1 ./thp_pthread -C 0 -m 0 -c 512 -b 256g
    Setting thp_disabled for this task...
    thp_disable: 1
    Set thp_disabled state to 1
    Process pid = 144957

    PF/
    MAX MIN TOTCPU/ TOT_PF/ TOT_PF/ WSEC/
    TYPE: CPUS WALL WALL SYS USER TOTCPU CPU WALL_SEC SYS_SEC CPU NODES
    512 0.620 0.260 0.250 0.320 0.570 0.001 51612901376 128000000000 100806448 23

    Performance counter stats for './prctl_wrapper_mmv3_hack 1 ./thp_pthread -C 0 -m 0 -c 512 -b 256g':

    138789390.540183 task-clock # 641.959 CPUs utilized [100.00%]
    534,205 context-switches # 0.000 M/sec [100.00%]
    4,595 CPU-migrations # 0.000 M/sec [100.00%]
    63,133,119 page-faults # 0.000 M/sec
    147,977,747,269,768 cycles # 1.066 GHz [100.00%]
    200,524,196,493,108 stalled-cycles-frontend # 135.51% frontend cycles idle [100.00%]
    105,175,163,716,388 stalled-cycles-backend # 71.07% backend cycles idle [100.00%]
    180,916,213,503,160 instructions # 1.22 insns per cycle
    # 1.11 stalled cycles per insn [100.00%]
    26,999,511,005,868 branches # 194.536 M/sec [100.00%]
    714,066,351 branch-misses # 0.00% of all branches

    216.196778807 seconds time elapsed

    As with previous versions of the patch, We're getting about a 2x
    performance increase here. Here's a link to the test case I used, along
    with the little wrapper to activate the flag:

    http://oss.sgi.com/projects/memtests/thp_pthread_mmprctlv3.tar.gz

    This patch (of 3):

    Revert commit 8e72033f2a48 and add in code to fix up any issues caused
    by the revert.

    The revert is necessary because hugepage_madvise would return -EINVAL
    when VM_NOHUGEPAGE is set, which will break subsequent chunks of this
    patch set.

    Here's a snip of an e-mail from Gerald detailing the original purpose of
    this code, and providing justification for the revert:

    "The intent of commit 8e72033f2a48 was to guard against any future
    programming errors that may result in an madvice(MADV_HUGEPAGE) on
    guest mappings, which would crash the kernel.

    Martin suggested adding the bit to arch/s390/mm/pgtable.c, if
    8e72033f2a48 was to be reverted, because that check will also prevent
    a kernel crash in the case described above, it will now send a
    SIGSEGV instead.

    This would now also allow to do the madvise on other parts, if
    needed, so it is a more flexible approach. One could also say that
    it would have been better to do it this way right from the
    beginning..."

    Signed-off-by: Alex Thorlton
    Suggested-by: Oleg Nesterov
    Tested-by: Christian Borntraeger
    Cc: Gerald Schaefer
    Cc: Martin Schwidefsky
    Cc: Heiko Carstens
    Cc: Paolo Bonzini
    Cc: "Kirill A. Shutemov"
    Cc: Mel Gorman
    Cc: Rik van Riel
    Cc: Ingo Molnar
    Cc: Peter Zijlstra
    Cc: Andrea Arcangeli
    Cc: Oleg Nesterov
    Cc: "Eric W. Biederman"
    Cc: Johannes Weiner
    Cc: David Rientjes
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alex Thorlton
     
  • Pull CPU hotplug notifiers registration fixes from Rafael Wysocki:
    "The purpose of this single series of commits from Srivatsa S Bhat
    (with a small piece from Gautham R Shenoy) touching multiple
    subsystems that use CPU hotplug notifiers is to provide a way to
    register them that will not lead to deadlocks with CPU online/offline
    operations as described in the changelog of commit 93ae4f978ca7f ("CPU
    hotplug: Provide lockless versions of callback registration
    functions").

    The first three commits in the series introduce the API and document
    it and the rest simply goes through the users of CPU hotplug notifiers
    and converts them to using the new method"

    * tag 'cpu-hotplug-3.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (52 commits)
    net/iucv/iucv.c: Fix CPU hotplug callback registration
    net/core/flow.c: Fix CPU hotplug callback registration
    mm, zswap: Fix CPU hotplug callback registration
    mm, vmstat: Fix CPU hotplug callback registration
    profile: Fix CPU hotplug callback registration
    trace, ring-buffer: Fix CPU hotplug callback registration
    xen, balloon: Fix CPU hotplug callback registration
    hwmon, via-cputemp: Fix CPU hotplug callback registration
    hwmon, coretemp: Fix CPU hotplug callback registration
    thermal, x86-pkg-temp: Fix CPU hotplug callback registration
    octeon, watchdog: Fix CPU hotplug callback registration
    oprofile, nmi-timer: Fix CPU hotplug callback registration
    intel-idle: Fix CPU hotplug callback registration
    clocksource, dummy-timer: Fix CPU hotplug callback registration
    drivers/base/topology.c: Fix CPU hotplug callback registration
    acpi-cpufreq: Fix CPU hotplug callback registration
    zsmalloc: Fix CPU hotplug callback registration
    scsi, fcoe: Fix CPU hotplug callback registration
    scsi, bnx2fc: Fix CPU hotplug callback registration
    scsi, bnx2i: Fix CPU hotplug callback registration
    ...

    Linus Torvalds
     

04 Apr, 2014

1 commit

  • Pull tracing updates from Steven Rostedt:
    "Most of the changes were largely clean ups, and some documentation.
    But there were a few features that were added:

    Uprobes now work with event triggers and multi buffers and have
    support under ftrace and perf.

    The big feature is that the function tracer can now be used within the
    multi buffer instances. That is, you can now trace some functions in
    one buffer, others in another buffer, all functions in a third buffer
    and so on. They are basically agnostic from each other. This only
    works for the function tracer and not for the function graph trace,
    although you can have the function graph tracer running in the top
    level buffer (or any tracer for that matter) and have different
    function tracing going on in the sub buffers"

    * tag 'trace-3.15' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (45 commits)
    tracing: Add BUG_ON when stack end location is over written
    tracepoint: Remove unused API functions
    Revert "tracing: Move event storage for array from macro to standalone function"
    ftrace: Constify ftrace_text_reserved
    tracepoints: API doc update to tracepoint_probe_register() return value
    tracepoints: API doc update to data argument
    ftrace: Fix compilation warning about control_ops_free
    ftrace/x86: BUG when ftrace recovery fails
    ftrace: Warn on error when modifying ftrace function
    ftrace: Remove freelist from struct dyn_ftrace
    ftrace: Do not pass data to ftrace_dyn_arch_init
    ftrace: Pass retval through return in ftrace_dyn_arch_init()
    ftrace: Inline the code from ftrace_dyn_table_alloc()
    ftrace: Cleanup of global variables ftrace_new_pgs and ftrace_update_cnt
    tracing: Evaluate len expression only once in __dynamic_array macro
    tracing: Correctly expand len expressions from __dynamic_array macro
    tracing/module: Replace include of tracepoint.h with jump_label.h in module.h
    tracing: Fix event header migrate.h to include tracepoint.h
    tracing: Fix event header writeback.h to include tracepoint.h
    tracing: Warn if a tracepoint is not set via debugfs
    ...

    Linus Torvalds
     

03 Apr, 2014

8 commits

  • The current uaccess code uses a page table walk in some circumstances,
    e.g. in case of the in atomic futex operations or if running on old
    hardware which doesn't support the mvcos instruction.

    However it turned out that the page table walk code does not correctly
    lock page tables when accessing page table entries.
    In other words: a different cpu may invalidate a page table entry while
    the current cpu inspects the pte. This may lead to random data corruption.

    Adding correct locking however isn't trivial for all uaccess operations.
    Especially copy_in_user() is problematic since that requires to hold at
    least two locks, but must be protected against ABBA deadlock when a
    different cpu also performs a copy_in_user() operation.

    So the solution is a different approach where we change address spaces:

    User space runs in primary address mode, or access register mode within
    vdso code, like it currently already does.

    The kernel usually also runs in home space mode, however when accessing
    user space the kernel switches to primary or secondary address mode if
    the mvcos instruction is not available or if a compare-and-swap (futex)
    instruction on a user space address is performed.
    KVM however is special, since that requires the kernel to run in home
    address space while implicitly accessing user space with the sie
    instruction.

    So we end up with:

    User space:
    - runs in primary or access register mode
    - cr1 contains the user asce
    - cr7 contains the user asce
    - cr13 contains the kernel asce

    Kernel space:
    - runs in home space mode
    - cr1 contains the user or kernel asce
    -> the kernel asce is loaded when a uaccess requires primary or
    secondary address mode
    - cr7 contains the user or kernel asce, (changed with set_fs())
    - cr13 contains the kernel asce

    In case of uaccess the kernel changes to:
    - primary space mode in case of a uaccess (copy_to_user) and uses
    e.g. the mvcp instruction to access user space. However the kernel
    will stay in home space mode if the mvcos instruction is available
    - secondary space mode in case of futex atomic operations, so that the
    instructions come from primary address space and data from secondary
    space

    In case of kvm the kernel runs in home space mode, but cr1 gets switched
    to contain the gmap asce before the sie instruction gets executed. When
    the sie instruction is finished cr1 will be switched back to contain the
    user asce.

    A context switch between two processes will always load the kernel asce
    for the next process in cr1. So the first exit to user space is a bit
    more expensive (one extra load control register instruction) than before,
    however keeps the code rather simple.

    In sum this means there is no need to perform any error prone page table
    walks anymore when accessing user space.

    The patch seems to be rather large, however it mainly removes the
    the page table walk code and restores the previously deleted "standard"
    uaccess code, with a couple of changes.

    The uaccess without mvcos mode can be enforced with the "uaccess_primary"
    kernel parameter.

    Reported-by: Christian Borntraeger
    Signed-off-by: Heiko Carstens
    Signed-off-by: Martin Schwidefsky

    Heiko Carstens
     
  • The zEC12 machines introduced the local-clearing control for the IDTE
    and IPTE instruction. If the control is set only the TLB of the local
    CPU is cleared of entries, either all entries of a single address space
    for IDTE, or the entry for a single page-table entry for IPTE.
    Without the local-clearing control the TLB flush is broadcasted to all
    CPUs in the configuration, which is expensive.

    The reset of the bit mask of the CPUs that need flushing after a
    non-local IDTE is tricky. As TLB entries for an address space remain
    in the TLB even if the address space is detached a new bit field is
    required to keep track of attached CPUs vs. CPUs in the need of a
    flush. After a non-local flush with IDTE the bit-field of attached CPUs
    is copied to the bit-field of CPUs in need of a flush. The ordering
    of operations on cpu_attach_mask, attach_count and mm_cpumask(mm) is
    such that an underindication in mm_cpumask(mm) is prevented but an
    overindication in mm_cpumask(mm) is possible.

    Signed-off-by: Martin Schwidefsky

    Martin Schwidefsky
     
  • The principles of operations states that the CPU is allowed to create
    TLB entries for an address space anytime while an ASCE is loaded to
    the control register. This is true even if the CPU is running in the
    kernel and the user address space is not (actively) accessed.

    In theory this can affect two aspects of the TLB flush logic.
    For full-mm flushes the ASCE of the dying process is still attached.
    The approach to flush first with IDTE and then just free all page
    tables can in theory lead to stale TLB entries. Use the batched
    free of page tables for the full-mm flushes as well.

    For operations that can have a stale ASCE in the control register,
    e.g. a delayed update_user_asce in switch_mm, load the kernel ASCE
    to prevent invalid TLBs from being created.

    Signed-off-by: Martin Schwidefsky

    Martin Schwidefsky
     
  • Use the new defines for external interruption codes to get rid
    of "magic" numbers in the s390 source code. And while we're at it,
    also rename the (un-)register_external_interrupt function to
    something shorter so that this patch does not exceed the 80
    columns all over the place.

    Signed-off-by: Thomas Huth
    Signed-off-by: Martin Schwidefsky

    Thomas Huth
     
  • Introduce defines for external interruption codes so that we
    can get rid of some "magic" numbers in the s390 source code.

    Signed-off-by: Thomas Huth
    Signed-off-by: Martin Schwidefsky

    Thomas Huth
     
  • Pull networking updates from David Miller:
    "Here is my initial pull request for the networking subsystem during
    this merge window:

    1) Support for ESN in AH (RFC 4302) from Fan Du.

    2) Add full kernel doc for ethtool command structures, from Ben
    Hutchings.

    3) Add BCM7xxx PHY driver, from Florian Fainelli.

    4) Export computed TCP rate information in netlink socket dumps, from
    Eric Dumazet.

    5) Allow IPSEC SA to be dumped partially using a filter, from Nicolas
    Dichtel.

    6) Convert many drivers to pci_enable_msix_range(), from Alexander
    Gordeev.

    7) Record SKB timestamps more efficiently, from Eric Dumazet.

    8) Switch to microsecond resolution for TCP round trip times, also
    from Eric Dumazet.

    9) Clean up and fix 6lowpan fragmentation handling by making use of
    the existing inet_frag api for it's implementation.

    10) Add TX grant mapping to xen-netback driver, from Zoltan Kiss.

    11) Auto size SKB lengths when composing netlink messages based upon
    past message sizes used, from Eric Dumazet.

    12) qdisc dumps can take a long time, add a cond_resched(), From Eric
    Dumazet.

    13) Sanitize netpoll core and drivers wrt. SKB handling semantics.
    Get rid of never-used-in-tree netpoll RX handling. From Eric W
    Biederman.

    14) Support inter-address-family and namespace changing in VTI tunnel
    driver(s). From Steffen Klassert.

    15) Add Altera TSE driver, from Vince Bridgers.

    16) Optimizing csum_replace2() so that it doesn't adjust the checksum
    by checksumming the entire header, from Eric Dumazet.

    17) Expand BPF internal implementation for faster interpreting, more
    direct translations into JIT'd code, and much cleaner uses of BPF
    filtering in non-socket ocntexts. From Daniel Borkmann and Alexei
    Starovoitov"

    * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1976 commits)
    netpoll: Use skb_irq_freeable to make zap_completion_queue safe.
    net: Add a test to see if a skb is freeable in irq context
    qlcnic: Fix build failure due to undefined reference to `vxlan_get_rx_port'
    net: ptp: move PTP classifier in its own file
    net: sxgbe: make "core_ops" static
    net: sxgbe: fix logical vs bitwise operation
    net: sxgbe: sxgbe_mdio_register() frees the bus
    Call efx_set_channels() before efx->type->dimension_resources()
    xen-netback: disable rogue vif in kthread context
    net/mlx4: Set proper build dependancy with vxlan
    be2net: fix build dependency on VxLAN
    mac802154: make csma/cca parameters per-wpan
    mac802154: allow only one WPAN to be up at any given time
    net: filter: minor: fix kdoc in __sk_run_filter
    netlink: don't compare the nul-termination in nla_strcmp
    can: c_can: Avoid led toggling for every packet.
    can: c_can: Simplify TX interrupt cleanup
    can: c_can: Store dlc private
    can: c_can: Reduce register access
    can: c_can: Make the code readable
    ...

    Linus Torvalds
     
  • Pull trivial tree updates from Jiri Kosina:
    "Usual rocket science -- mostly documentation and comment updates"

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial:
    sparse: fix comment
    doc: fix double words
    isdn: capi: fix "CAPI_VERSION" comment
    doc: DocBook: Fix typos in xml and template file
    Bluetooth: add module name for btwilink
    driver core: unexport static function create_syslog_header
    mmc: core: typo fix in printk specifier
    ARM: spear: clean up editing mistake
    net-sysfs: fix comment typo 'CONFIG_SYFS'
    doc: Insert MODULE_ in module-signing macros
    Documentation: update URL to hfsplus Technote 1150
    gpio: update path to documentation
    ixgbe: Fix format string in ixgbe_fcoe.
    Kconfig: Remove useless "default N" lines
    user_namespace.c: Remove duplicated word in comment
    CREDITS: fix formatting
    treewide: Fix typo in Documentation/DocBook
    mm: Fix warning on make htmldocs caused by slab.c
    ata: ata-samsung_cf: cleanup in header file
    idr: remove unused prototype of idr_free()

    Linus Torvalds
     
  • Pull kvm updates from Paolo Bonzini:
    "PPC and ARM do not have much going on this time. Most of the cool
    stuff, instead, is in s390 and (after a few releases) x86.

    ARM has some caching fixes and PPC has transactional memory support in
    guests. MIPS has some fixes, with more probably coming in 3.16 as
    QEMU will soon get support for MIPS KVM.

    For x86 there are optimizations for debug registers, which trigger on
    some Windows games, and other important fixes for Windows guests. We
    now expose to the guest Broadwell instruction set extensions and also
    Intel MPX. There's also a fix/workaround for OS X guests, nested
    virtualization features (preemption timer), and a couple kvmclock
    refinements.

    For s390, the main news is asynchronous page faults, together with
    improvements to IRQs (floating irqs and adapter irqs) that speed up
    virtio devices"

    * tag 'kvm-3.15-1' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (96 commits)
    KVM: PPC: Book3S HV: Save/restore host PMU registers that are new in POWER8
    KVM: PPC: Book3S HV: Fix decrementer timeouts with non-zero TB offset
    KVM: PPC: Book3S HV: Don't use kvm_memslots() in real mode
    KVM: PPC: Book3S HV: Return ENODEV error rather than EIO
    KVM: PPC: Book3S: Trim top 4 bits of physical address in RTAS code
    KVM: PPC: Book3S HV: Add get/set_one_reg for new TM state
    KVM: PPC: Book3S HV: Add transactional memory support
    KVM: Specify byte order for KVM_EXIT_MMIO
    KVM: vmx: fix MPX detection
    KVM: PPC: Book3S HV: Fix KVM hang with CONFIG_KVM_XICS=n
    KVM: PPC: Book3S: Introduce hypervisor call H_GET_TCE
    KVM: PPC: Book3S HV: Fix incorrect userspace exit on ioeventfd write
    KVM: s390: clear local interrupts at cpu initial reset
    KVM: s390: Fix possible memory leak in SIGP functions
    KVM: s390: fix calculation of idle_mask array size
    KVM: s390: randomize sca address
    KVM: ioapic: reinject pending interrupts on KVM_SET_IRQCHIP
    KVM: Bump KVM_MAX_IRQ_ROUTES for s390
    KVM: s390: irq routing for adapter interrupts.
    KVM: s390: adapter interrupt sources
    ...

    Linus Torvalds
     

02 Apr, 2014

4 commits

  • it only makes control flow in __fput() and friends more convoluted.

    Signed-off-by: Al Viro

    Al Viro
     
  • Pull driver core and sysfs updates from Greg KH:
    "Here's the big driver core / sysfs update for 3.15-rc1.

    Lots of kernfs updates to make it useful for other subsystems, and a
    few other tiny driver core patches.

    All have been in linux-next for a while"

    * tag 'driver-core-3.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (42 commits)
    Revert "sysfs, driver-core: remove unused {sysfs|device}_schedule_callback_owner()"
    kernfs: cache atomic_write_len in kernfs_open_file
    numa: fix NULL pointer access and memory leak in unregister_one_node()
    Revert "driver core: synchronize device shutdown"
    kernfs: fix off by one error.
    kernfs: remove duplicate dir.c at the top dir
    x86: align x86 arch with generic CPU modalias handling
    cpu: add generic support for CPU feature based module autoloading
    sysfs: create bin_attributes under the requested group
    driver core: unexport static function create_syslog_header
    firmware: use power efficient workqueue for unloading and aborting fw load
    firmware: give a protection when map page failed
    firmware: google memconsole driver fixes
    firmware: fix google/gsmi duplicate efivars_sysfs_init()
    drivers/base: delete non-required instances of include
    kernfs: fix kernfs_node_from_dentry()
    ACPI / platform: drop redundant ACPI_HANDLE check
    kernfs: fix hash calculation in kernfs_rename_ns()
    kernfs: add CONFIG_KERNFS
    sysfs, kobject: add sysfs wrapper for kernfs_enable_ns()
    ...

    Linus Torvalds
     
  • Pull PCI changes from Bjorn Helgaas:
    "Enumeration
    - Increment max correctly in pci_scan_bridge() (Andreas Noever)
    - Clarify the "scan anyway" comment in pci_scan_bridge() (Andreas Noever)
    - Assign CardBus bus number only during the second pass (Andreas Noever)
    - Use request_resource_conflict() instead of insert_ for bus numbers (Andreas Noever)
    - Make sure bus number resources stay within their parents bounds (Andreas Noever)
    - Remove pci_fixup_parent_subordinate_busnr() (Andreas Noever)
    - Check for child busses which use more bus numbers than allocated (Andreas Noever)
    - Don't scan random busses in pci_scan_bridge() (Andreas Noever)
    - x86: Drop pcibios_scan_root() check for bus already scanned (Bjorn Helgaas)
    - x86: Use pcibios_scan_root() instead of pci_scan_bus_with_sysdata() (Bjorn Helgaas)
    - x86: Use pcibios_scan_root() instead of pci_scan_bus_on_node() (Bjorn Helgaas)
    - x86: Merge pci_scan_bus_on_node() into pcibios_scan_root() (Bjorn Helgaas)
    - x86: Drop return value of pcibios_scan_root() (Bjorn Helgaas)

    NUMA
    - x86: Add x86_pci_root_bus_node() to look up NUMA node from PCI bus (Bjorn Helgaas)
    - x86: Use x86_pci_root_bus_node() instead of get_mp_bus_to_node() (Bjorn Helgaas)
    - x86: Remove mp_bus_to_node[], set_mp_bus_to_node(), get_mp_bus_to_node() (Bjorn Helgaas)
    - x86: Use NUMA_NO_NODE, not -1, for unknown node (Bjorn Helgaas)
    - x86: Remove acpi_get_pxm() usage (Bjorn Helgaas)
    - ia64: Use NUMA_NO_NODE, not MAX_NUMNODES, for unknown node (Bjorn Helgaas)
    - ia64: Remove acpi_get_pxm() usage (Bjorn Helgaas)
    - ACPI: Fix acpi_get_node() prototype (Bjorn Helgaas)

    Resource management
    - i2o: Fix and refactor PCI space allocation (Bjorn Helgaas)
    - Add resource_contains() (Bjorn Helgaas)
    - Add %pR support for IORESOURCE_UNSET (Bjorn Helgaas)
    - Mark resources as IORESOURCE_UNSET if we can't assign them (Bjorn Helgaas)
    - Don't clear IORESOURCE_UNSET when updating BAR (Bjorn Helgaas)
    - Check IORESOURCE_UNSET before updating BAR (Bjorn Helgaas)
    - Don't try to claim IORESOURCE_UNSET resources (Bjorn Helgaas)
    - Mark 64-bit resource as IORESOURCE_UNSET if we only support 32-bit (Bjorn Helgaas)
    - Don't enable decoding if BAR hasn't been assigned an address (Bjorn Helgaas)
    - Add "weak" generic pcibios_enable_device() implementation (Bjorn Helgaas)
    - alpha, microblaze, sh, sparc, tile: Use default pcibios_enable_device() (Bjorn Helgaas)
    - s390: Use generic pci_enable_resources() (Bjorn Helgaas)
    - Don't check resource_size() in pci_bus_alloc_resource() (Bjorn Helgaas)
    - Set type in __request_region() (Bjorn Helgaas)
    - Check all IORESOURCE_TYPE_BITS in pci_bus_alloc_from_region() (Bjorn Helgaas)
    - Change pci_bus_alloc_resource() type_mask to unsigned long (Bjorn Helgaas)
    - Log IDE resource quirk in dmesg (Bjorn Helgaas)
    - Revert "[PATCH] Insert GART region into resource map" (Bjorn Helgaas)

    PCI device hotplug
    - Make check_link_active() non-static (Rajat Jain)
    - Use link change notifications for hot-plug and removal (Rajat Jain)
    - Enable link state change notifications (Rajat Jain)
    - Don't disable the link permanently during removal (Rajat Jain)
    - Don't check adapter or latch status while disabling (Rajat Jain)
    - Disable link notification across slot reset (Rajat Jain)
    - Ensure very fast hotplug events are also processed (Rajat Jain)
    - Add hotplug_lock to serialize hotplug events (Rajat Jain)
    - Remove a non-existent card, regardless of "surprise" capability (Rajat Jain)
    - Don't turn slot off when hot-added device already exists (Yijing Wang)

    MSI
    - Keep pci_enable_msi() documentation (Alexander Gordeev)
    - ahci: Fix broken single MSI fallback (Alexander Gordeev)
    - ahci, vfio: Use pci_enable_msi_range() (Alexander Gordeev)
    - Check kmalloc() return value, fix leak of name (Greg Kroah-Hartman)
    - Fix leak of msi_attrs (Greg Kroah-Hartman)
    - Fix pci_msix_vec_count() htmldocs failure (Masanari Iida)

    Virtualization
    - Device-specific ACS support (Alex Williamson)

    Freescale i.MX6
    - Wait for retraining (Marek Vasut)

    Marvell MVEBU
    - Use Device ID and revision from underlying endpoint (Andrew Lunn)
    - Fix incorrect size for PCI aperture resources (Jason Gunthorpe)
    - Call request_resource() on the apertures (Jason Gunthorpe)
    - Fix potential issue in range parsing (Jean-Jacques Hiblot)

    Renesas R-Car
    - Check platform_get_irq() return code (Ben Dooks)
    - Add error interrupt handling (Ben Dooks)
    - Fix bridge logic configuration accesses (Ben Dooks)
    - Register each instance independently (Magnus Damm)
    - Break out window size handling (Magnus Damm)
    - Make the Kconfig dependencies more generic (Magnus Damm)

    Synopsys DesignWare
    - Fix RC BAR to be single 64-bit non-prefetchable memory (Mohit Kumar)

    Miscellaneous
    - Remove unused SR-IOV VF Migration support (Bjorn Helgaas)
    - Enable INTx if BIOS left them disabled (Bjorn Helgaas)
    - Fix hex vs decimal typo in cpqhpc_probe() (Dan Carpenter)
    - Clean up par-arch object file list (Liviu Dudau)
    - Set IORESOURCE_ROM_SHADOW only for the default VGA device (Sander Eikelenboom)
    - ACPI, ARM, drm, powerpc, pcmcia, PCI: Use list_for_each_entry() for bus traversal (Yijing Wang)
    - Fix pci_bus_b() build failure (Paul Gortmaker)"

    * tag 'pci-v3.15-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (108 commits)
    Revert "[PATCH] Insert GART region into resource map"
    PCI: Log IDE resource quirk in dmesg
    PCI: Change pci_bus_alloc_resource() type_mask to unsigned long
    PCI: Check all IORESOURCE_TYPE_BITS in pci_bus_alloc_from_region()
    resources: Set type in __request_region()
    PCI: Don't check resource_size() in pci_bus_alloc_resource()
    s390/PCI: Use generic pci_enable_resources()
    tile PCI RC: Use default pcibios_enable_device()
    sparc/PCI: Use default pcibios_enable_device() (Leon only)
    sh/PCI: Use default pcibios_enable_device()
    microblaze/PCI: Use default pcibios_enable_device()
    alpha/PCI: Use default pcibios_enable_device()
    PCI: Add "weak" generic pcibios_enable_device() implementation
    PCI: Don't enable decoding if BAR hasn't been assigned an address
    PCI: Enable INTx in pci_reenable_device() only when MSI/MSI-X not enabled
    PCI: Mark 64-bit resource as IORESOURCE_UNSET if we only support 32-bit
    PCI: Don't try to claim IORESOURCE_UNSET resources
    PCI: Check IORESOURCE_UNSET before updating BAR
    PCI: Don't clear IORESOURCE_UNSET when updating BAR
    PCI: Mark resources as IORESOURCE_UNSET if we can't assign them
    ...

    Conflicts:
    arch/x86/include/asm/topology.h
    drivers/ata/ahci.c

    Linus Torvalds
     
  • Pull irq code updates from Thomas Gleixner:
    "The irq department proudly presents:

    - Another tree wide sweep of irq infrastructure abuse. Clear winner
    of the trainwreck engineering contest was:
    #include "../../../kernel/irq/settings.h"

    - Tree wide update of irq_set_affinity() callbacks which miss a cpu
    online check when picking a single cpu out of the affinity mask.

    - Tree wide consolidation of interrupt statistics.

    - Updates to the threaded interrupt infrastructure to allow explicit
    wakeup of the interrupt thread and a variant of synchronize_irq()
    which synchronizes only the hard interrupt handler. Both are
    needed to replace the homebrewn thread handling in the mmc/sdhci
    code.

    - New irq chip callbacks to allow proper support for GPIO based irqs.
    The GPIO based interrupts need to request/release GPIO resources
    from request/free_irq.

    - A few new ARM interrupt chips. No revolutionary new hardware, just
    differently wreckaged variations of the scheme.

    - Small improvments, cleanups and updates all over the place"

    I was hoping that that trainwreck engineering contest was a April Fools'
    joke. But no.

    * 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (68 commits)
    irqchip: sun7i/sun6i: Disable NMI before registering the handler
    ARM: sun7i/sun6i: dts: Fix IRQ number for sun6i NMI controller
    ARM: sun7i/sun6i: irqchip: Update the documentation
    ARM: sun7i/sun6i: dts: Add NMI irqchip support
    ARM: sun7i/sun6i: irqchip: Add irqchip driver for NMI controller
    genirq: Export symbol no_action()
    arm: omap: Fix typo in ams-delta-fiq.c
    m68k: atari: Fix the last kernel_stat.h fallout
    irqchip: sun4i: Simplify sun4i_irq_ack
    irqchip: sun4i: Use handle_fasteoi_irq for all interrupts
    genirq: procfs: Make smp_affinity values go+r
    softirq: Add linux/irq.h to make it compile again
    m68k: amiga: Add linux/irq.h to make it compile again
    irqchip: sun4i: Don't ack IRQs > 0, fix acking of IRQ 0
    irqchip: sun4i: Fix a comment about mask register initialization
    irqchip: sun4i: Fix irq 0 not working
    genirq: Add a new IRQCHIP_EOI_THREADED flag
    genirq: Document IRQCHIP_ONESHOT_SAFE flag
    ARM: sunxi: dt: Convert to the new irq controller compatibles
    irqchip: sunxi: Change compatibles
    ...

    Linus Torvalds
     

01 Apr, 2014

5 commits

  • We need to reset the usage state of the pages on kexec/kdump,
    which use subcode 0 and 1. We will only do the cmma reset in
    the kernel, everything else is done in userspace as before.

    Signed-off-by: Christian Borntraeger
    Signed-off-by: Martin Schwidefsky

    Christian Borntraeger
     
  • When reworking the bitops and atomic ops I missed that those instructions
    that got atomic behaviour only perform a "specific-operand-serialization"
    instead of a full "serialization".
    The compare-and-swap instruction used before performs a full serialization
    before and after the instruction is executed, which means it has full
    memory barrier semantics.
    In order to give the new bitops and atomic ops functions also full memory
    barrier semantics add a "bcr 14,0" before and after each of those new
    instructions which performs full serialization as well.

    This restores memory barrier semantics for bitops and atomic ops functions
    which return values, like e.g. atomic_add_return(), but not for functions
    which do not return a value, like e.g. atomic_add().
    This is consistent to other architectures and what common code requires.

    Cc: stable@vger.kernel.org # v3.13+
    Signed-off-by: Heiko Carstens
    Signed-off-by: Martin Schwidefsky

    Heiko Carstens
     
  • Pull s390 updates from Martin Schwidefsky:
    "There are two memory management related changes, the CMMA support for
    KVM to avoid swap-in of freed pages and the split page table lock for
    the PMD level. These two come with common code changes in mm/.

    A fix for the long standing theoretical TLB flush problem, this one
    comes with a common code change in kernel/sched/.

    Another set of changes is Heikos uaccess work, included is the initial
    set of patches with more to come.

    And fixes and cleanups as usual"

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (36 commits)
    s390/con3270: optionally disable auto update
    s390/mm: remove unecessary parameter from pgste_ipte_notify
    s390/mm: remove unnecessary parameter from gmap_do_ipte_notify
    s390/mm: fixing comment so that parameter name match
    s390/smp: limit number of cpus in possible cpu mask
    hypfs: Add clarification for "weight_min" attribute
    s390: update defconfigs
    s390/ptrace: add support for PTRACE_SINGLEBLOCK
    s390/perf: make print_debug_cf() static
    s390/topology: Remove call to update_cpu_masks()
    s390/compat: remove compat exec domain
    s390: select CONFIG_TTY for use of tty in unconditional keyboard driver
    s390/appldata_os: fix cpu array size calculation
    s390/checksum: remove memset() within csum_partial_copy_from_user()
    s390/uaccess: remove copy_from_user_real()
    s390/sclp_early: Return correct HSA block count also for zero
    s390: add some drivers/subsystems to the MAINTAINERS file
    s390: improve debug feature usage
    s390/airq: add support for irq ranges
    s390/mm: enable split page table lock for PMD level
    ...

    Linus Torvalds
     
  • Pull s390 compat wrapper rework from Heiko Carstens:
    "S390 compat system call wrapper simplification work.

    The intention of this work is to get rid of all hand written assembly
    compat system call wrappers on s390, which perform proper sign or zero
    extension, or pointer conversion of compat system call parameters.
    Instead all of this should be done with C code eg by using Al's
    COMPAT_SYSCALL_DEFINEx() macro.

    Therefore all common code and s390 specific compat system calls have
    been converted to the COMPAT_SYSCALL_DEFINEx() macro.

    In order to generate correct code all compat system calls may only
    have eg compat_ulong_t parameters, but no unsigned long parameters.
    Those patches which change parameter types from unsigned long to
    compat_ulong_t parameters are separate in this series, but shouldn't
    cause any harm.

    The only compat system calls which intentionally have 64 bit
    parameters (preadv64 and pwritev64) in support of the x86/32 ABI
    haven't been changed, but are now only available if an architecture
    defines __ARCH_WANT_COMPAT_SYS_PREADV64/PWRITEV64.

    System calls which do not have a compat variant but still need proper
    zero extension on s390, like eg "long sys_brk(unsigned long brk)" will
    get a proper wrapper function with the new s390 specific
    COMPAT_SYSCALL_WRAPx() macro:

    COMPAT_SYSCALL_WRAP1(brk, unsigned long, brk);

    which generates the following code (simplified):

    asmlinkage long sys_brk(unsigned long brk);
    asmlinkage long compat_sys_brk(long brk)
    {
    return sys_brk((u32)brk);
    }

    Given that the C file which contains all the COMPAT_SYSCALL_WRAP lines
    includes both linux/syscall.h and linux/compat.h, it will generate
    build errors, if the declaration of sys_brk() doesn't match, or if
    there exists a non-matching compat_sys_brk() declaration.

    In addition this will intentionally result in a link error if
    somewhere else a compat_sys_brk() function exists, which probably
    should have been used instead. Two more BUILD_BUG_ONs make sure the
    size and type of each compat syscall parameter can be handled
    correctly with the s390 specific macros.

    I converted the compat system calls step by step to verify the
    generated code is correct and matches the previous code. In fact it
    did not always match, however that was always a bug in the hand
    written asm code.

    In result we get less code, less bugs, and much more sanity checking"

    * 'compat' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (44 commits)
    s390/compat: add copyright statement
    compat: include linux/unistd.h within linux/compat.h
    s390/compat: get rid of compat wrapper assembly code
    s390/compat: build error for large compat syscall args
    mm/compat: convert to COMPAT_SYSCALL_DEFINE with changing parameter types
    kexec/compat: convert to COMPAT_SYSCALL_DEFINE with changing parameter types
    net/compat: convert to COMPAT_SYSCALL_DEFINE with changing parameter types
    ipc/compat: convert to COMPAT_SYSCALL_DEFINE with changing parameter types
    fs/compat: convert to COMPAT_SYSCALL_DEFINE with changing parameter types
    ipc/compat: convert to COMPAT_SYSCALL_DEFINE
    fs/compat: convert to COMPAT_SYSCALL_DEFINE
    security/compat: convert to COMPAT_SYSCALL_DEFINE
    mm/compat: convert to COMPAT_SYSCALL_DEFINE
    net/compat: convert to COMPAT_SYSCALL_DEFINE
    kernel/compat: convert to COMPAT_SYSCALL_DEFINE
    fs/compat: optional preadv64/pwrite64 compat system calls
    ipc/compat_sys_msgrcv: change msgtyp type from long to compat_long_t
    s390/compat: partial parameter conversion within syscall wrappers
    s390/compat: automatic zero, sign and pointer conversion of syscalls
    s390/compat: add sync_file_range and fallocate compat syscalls
    ...

    Linus Torvalds
     
  • Pull core locking updates from Ingo Molnar:
    "The biggest change is the MCS spinlock generalization changes from Tim
    Chen, Peter Zijlstra, Jason Low et al. There's also lockdep
    fixes/enhancements from Oleg Nesterov, in particular a false negative
    fix related to lockdep_set_novalidate_class() usage"

    * 'core-locking-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (22 commits)
    locking/mutex: Fix debug checks
    locking/mutexes: Add extra reschedule point
    locking/mutexes: Introduce cancelable MCS lock for adaptive spinning
    locking/mutexes: Unlock the mutex without the wait_lock
    locking/mutexes: Modify the way optimistic spinners are queued
    locking/mutexes: Return false if task need_resched() in mutex_can_spin_on_owner()
    locking: Move mcs_spinlock.h into kernel/locking/
    m68k: Skip futex_atomic_cmpxchg_inatomic() test
    futex: Allow architectures to skip futex_atomic_cmpxchg_inatomic() test
    Revert "sched/wait: Suppress Sparse 'variable shadowing' warning"
    lockdep: Change lockdep_set_novalidate_class() to use _and_name
    lockdep: Change mark_held_locks() to check hlock->check instead of lockdep_no_validate
    lockdep: Don't create the wrong dependency on hlock->check == 0
    lockdep: Make held_lock->check and "int check" argument bool
    locking/mcs: Allow architecture specific asm files to be used for contended case
    locking/mcs: Order the header files in Kbuild of each architecture in alphabetical order
    sched/wait: Suppress Sparse 'variable shadowing' warning
    hung_task/Documentation: Fix hung_task_warnings description
    locking/mcs: Allow architectures to hook in to contended paths
    locking/mcs: Micro-optimize the MCS code, add extra comments
    ...

    Linus Torvalds
     

31 Mar, 2014

1 commit

  • This patch adds a jited flag into sk_filter struct in order to indicate
    whether a filter is currently jited or not. The size of sk_filter is
    not being expanded as the 32 bit 'len' member allows upper bits to be
    reused since a filter can currently only grow as large as BPF_MAXINSNS.

    Therefore, there's enough room also for other in future needed flags to
    reuse 'len' field if necessary. The jited flag also allows for having
    alternative interpreter functions running as currently, we can only
    detect jit compiled filters by testing fp->bpf_func to not equal the
    address of sk_run_filter().

    Joint work with Alexei Starovoitov.

    Signed-off-by: Alexei Starovoitov
    Signed-off-by: Daniel Borkmann
    Cc: Pablo Neira Ayuso
    Signed-off-by: David S. Miller

    Daniel Borkmann