20 Oct, 2014

1 commit

  • Pull audit updates from Eric Paris:
    "So this change across a whole bunch of arches really solves one basic
    problem. We want to audit when seccomp is killing a process. seccomp
    hooks in before the audit syscall entry code. audit_syscall_entry
    took as an argument the arch of the given syscall. Since the arch is
    part of what makes a syscall number meaningful it's an important part
    of the record, but it isn't available when seccomp shoots the
    syscall...

    For most arches we have a better way to get the arch (syscall_get_arch()).
    So the solution was twofold: Implement syscall_get_arch() everywhere
    there is audit which didn't have it. Use syscall_get_arch() in the
    seccomp audit code. Having syscall_get_arch() everywhere meant it was
    a useless flag on the stack and we could get rid of it for the typical
    syscall entry.

    The other changes inside the audit system aren't grand, fixed some
    records that had invalid spaces. Better locking around the task comm
    field. Removing some dead functions and structs. Make some things
    static. Really minor stuff"

    * git://git.infradead.org/users/eparis/audit: (31 commits)
    audit: rename audit_log_remove_rule to disambiguate for trees
    audit: cull redundancy in audit_rule_change
    audit: WARN if audit_rule_change called illegally
    audit: put rule existence check in canonical order
    next: openrisc: Fix build
    audit: get comm using lock to avoid race in string printing
    audit: remove open_arg() function that is never used
    audit: correct AUDIT_GET_FEATURE return message type
    audit: set nlmsg_len for multicast messages.
    audit: use union for audit_field values since they are mutually exclusive
    audit: invalid op= values for rules
    audit: use atomic_t to simplify audit_serial()
    kernel/audit.c: use ARRAY_SIZE instead of sizeof/sizeof[0]
    audit: reduce scope of audit_log_fcaps
    audit: reduce scope of audit_net_id
    audit: arm64: Remove the audit arch argument to audit_syscall_entry
    arm64: audit: Add audit hook in syscall_trace_enter/exit()
    audit: x86: drop arch from __audit_syscall_entry() interface
    sparc: implement is_32bit_task
    sparc: properly conditionalize use of TIF_32BIT
    ...

    Linus Torvalds
     

19 Oct, 2014

2 commits

  • Pull module fix from Rusty Russell:
    "A single panic fix for a rare race, stable CC'd"

    * tag 'modules-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux:
    modules, lock around setting of MODULE_STATE_UNFORMED

    Linus Torvalds
     
  • Commit b0c29f79ecea (futexes: Avoid taking the hb->lock if there's
    nothing to wake up) changes the futex code to avoid taking a lock when
    there are no waiters. This code has been subsequently fixed in commit
    11d4616bd07f (futex: revert back to the explicit waiter counting code).
    Both the original commit and the fix-up rely on get_futex_key_refs() to
    always imply a barrier.

    However, for private futexes, none of the cases in the switch statement
    of get_futex_key_refs() would be hit and the function completes without
    a memory barrier as required before checking the "waiters" in
    futex_wake() -> hb_waiters_pending(). The consequence is a race with a
    thread waiting on a futex on another CPU, allowing the waker thread to
    read "waiters == 0" while the waiter thread has read "futex_val ==
    locked" (in kernel).

    Without this fix, the problem (user space deadlocks) can be seen with
    Android bionic's mutex implementation on an arm64 multi-cluster system.
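
    The ordering requirement described above can be sketched in user space
    with C11 atomics. This is only an illustration, not the kernel code: the
    variables futex_val and waiters and both function names are hypothetical
    stand-ins for the futex word and the hash-bucket waiter count. Each side
    writes one variable, issues a full barrier, then reads the other one, so
    at least one side is guaranteed to observe the other's write.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-ins: 1 = locked, 0 = unlocked. */
static atomic_int futex_val = 1;
/* Hypothetical stand-in for the per-bucket waiter count. */
static atomic_int waiters = 0;

/* Waiter side: announce ourselves first, then re-check the futex word. */
static bool waiter_should_sleep(void)
{
    int prev = atomic_fetch_add_explicit(&waiters, 1, memory_order_relaxed);
    assert(prev >= 0);                           /* count is never negative */
    atomic_thread_fence(memory_order_seq_cst);   /* full barrier */
    return atomic_load_explicit(&futex_val, memory_order_relaxed) == 1;
}

/* Waker side: release the futex word first, then look for waiters. */
static bool waker_must_wake(void)
{
    atomic_store_explicit(&futex_val, 0, memory_order_relaxed);
    /* Without this full barrier the load below may be satisfied before
     * the store above is visible, which is the race in the report. */
    atomic_thread_fence(memory_order_seq_cst);
    return atomic_load_explicit(&waiters, memory_order_relaxed) > 0;
}
```

    With both fences in place, the outcome "waker reads waiters == 0 and
    waiter reads futex_val == locked" cannot occur; dropping the fence on
    the waker side is what re-opens that window.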

    Signed-off-by: Catalin Marinas
    Reported-by: Matteo Franchin
    Fixes: b0c29f79ecea (futexes: Avoid taking the hb->lock if there's nothing to wake up)
    Acked-by: Davidlohr Bueso
    Tested-by: Mike Galbraith
    Cc:
    Cc: Darren Hart
    Cc: Thomas Gleixner
    Cc: Peter Zijlstra
    Cc: Ingo Molnar
    Cc: Paul E. McKenney
    Signed-off-by: Linus Torvalds

    Catalin Marinas
     

15 Oct, 2014

2 commits

  • Pull percpu consistent-ops changes from Tejun Heo:
    "Way back, before the current percpu allocator was implemented, static
    and dynamic percpu memory areas were allocated and handled separately
    and had their own accessors. The distinction has been gone for many
    years now; however, the now duplicate two sets of accessors remained
    with the pointer based ones - this_cpu_*() - evolving various other
    operations over time. During the process, we also accumulated other
    inconsistent operations.

    This pull request contains Christoph's patches to clean up the
    duplicate accessor situation. __get_cpu_var() uses are replaced with
    this_cpu_ptr() and __this_cpu_ptr() with raw_cpu_ptr().

    Unfortunately, the former sometimes is tricky thanks to C being a bit
    messy with the distinction between lvalues and pointers, which led to
    a rather ugly solution for cpumask_var_t involving the introduction of
    this_cpu_cpumask_var_ptr().

    This converts most of the uses but not all. Christoph will follow up
    with the remaining conversions in this merge window and hopefully
    remove the obsolete accessors"

    * 'for-3.18-consistent-ops' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu: (38 commits)
    irqchip: Properly fetch the per cpu offset
    percpu: Resolve ambiguities in __get_cpu_var/cpumask_var_t -fix
    ia64: sn_nodepda cannot be assigned to after this_cpu conversion. Use __this_cpu_write.
    percpu: Resolve ambiguities in __get_cpu_var/cpumask_var_t
    Revert "powerpc: Replace __get_cpu_var uses"
    percpu: Remove __this_cpu_ptr
    clocksource: Replace __this_cpu_ptr with raw_cpu_ptr
    sparc: Replace __get_cpu_var uses
    avr32: Replace __get_cpu_var with __this_cpu_write
    blackfin: Replace __get_cpu_var uses
    tile: Use this_cpu_ptr() for hardware counters
    tile: Replace __get_cpu_var uses
    powerpc: Replace __get_cpu_var uses
    alpha: Replace __get_cpu_var
    ia64: Replace __get_cpu_var uses
    s390: cio driver &__get_cpu_var replacements
    s390: Replace __get_cpu_var uses
    mips: Replace __get_cpu_var uses
    MIPS: Replace __get_cpu_var uses in FPU emulator.
    arm: Replace __this_cpu_ptr with raw_cpu_ptr
    ...

    Linus Torvalds
     
  • A panic was seen in the following situation.

    There are two threads running on the system. The first thread is a system
    monitoring thread that is reading /proc/modules. The second thread is
    loading and unloading a module (in this example I'm using my simple
    dummy-module.ko). Note, in the "real world" this occurred with the qlogic
    driver module.

    When doing this, the following panic occurred:

    ------------[ cut here ]------------
    kernel BUG at kernel/module.c:3739!
    invalid opcode: 0000 [#1] SMP
    Modules linked in: binfmt_misc sg nfsv3 rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache intel_powerclamp coretemp kvm_intel kvm crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel lrw igb gf128mul glue_helper iTCO_wdt iTCO_vendor_support ablk_helper ptp sb_edac cryptd pps_core edac_core shpchp i2c_i801 pcspkr wmi lpc_ich ioatdma mfd_core dca ipmi_si nfsd ipmi_msghandler auth_rpcgss nfs_acl lockd sunrpc xfs libcrc32c sr_mod cdrom sd_mod crc_t10dif crct10dif_common mgag200 syscopyarea sysfillrect sysimgblt i2c_algo_bit drm_kms_helper ttm isci drm libsas ahci libahci scsi_transport_sas libata i2c_core dm_mirror dm_region_hash dm_log dm_mod [last unloaded: dummy_module]
    CPU: 37 PID: 186343 Comm: cat Tainted: GF O-------------- 3.10.0+ #7
    Hardware name: Intel Corporation S2600CP/S2600CP, BIOS RMLSDP.86I.00.29.D696.1311111329 11/11/2013
    task: ffff8807fd2d8000 ti: ffff88080fa7c000 task.ti: ffff88080fa7c000
    RIP: 0010:[] [] module_flags+0xb5/0xc0
    RSP: 0018:ffff88080fa7fe18 EFLAGS: 00010246
    RAX: 0000000000000003 RBX: ffffffffa03b5200 RCX: 0000000000000000
    RDX: 0000000000001000 RSI: ffff88080fa7fe38 RDI: ffffffffa03b5000
    RBP: ffff88080fa7fe28 R08: 0000000000000010 R09: 0000000000000000
    R10: 0000000000000000 R11: 000000000000000f R12: ffffffffa03b5000
    R13: ffffffffa03b5008 R14: ffffffffa03b5200 R15: ffffffffa03b5000
    FS: 00007f6ae57ef740(0000) GS:ffff88101e7a0000(0000) knlGS:0000000000000000
    CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 0000000000404f70 CR3: 0000000ffed48000 CR4: 00000000001407e0
    DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
    Stack:
    ffffffffa03b5200 ffff8810101e4800 ffff88080fa7fe70 ffffffff810d666c
    ffff88081e807300 000000002e0f2fbf 0000000000000000 ffff88100f257b00
    ffffffffa03b5008 ffff88080fa7ff48 ffff8810101e4800 ffff88080fa7fee0
    Call Trace:
    [] m_show+0x19c/0x1e0
    [] seq_read+0x16e/0x3b0
    [] proc_reg_read+0x3d/0x80
    [] vfs_read+0x9c/0x170
    [] SyS_read+0x58/0xb0
    [] system_call_fastpath+0x16/0x1b
    Code: 48 63 c2 83 c2 01 c6 04 03 29 48 63 d2 eb d9 0f 1f 80 00 00 00 00 48 63 d2 c6 04 13 2d 41 8b 0c 24 8d 50 02 83 f9 01 75 b2 eb cb 0b 66 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 48 89 e5 41
    RIP [] module_flags+0xb5/0xc0
    RSP

    Consider the two processes running on the system.

    CPU 0 (/proc/modules reader)
    CPU 1 (loading/unloading module)

    CPU 0 opens /proc/modules, and starts displaying data for each module by
    traversing the modules list via fs/seq_file.c:seq_open() and
    fs/seq_file.c:seq_read(). For each module in the modules list, seq_read
    does

    op->start()
    op->show()
    op->stop()
    [...]
    BUG_ON(mod->state == MODULE_STATE_UNFORMED);
    ...

    The other thread, CPU 1, in unloading the module calls the syscall
    delete_module() defined in kernel/module.c. The module_mutex is acquired
    for a short time, and then released. free_module() is called without the
    module_mutex. free_module() then sets mod->state = MODULE_STATE_UNFORMED,
    also without the module_mutex. Some additional code is called and then the
    module_mutex is reacquired to remove the module from the modules list:

    /* Now we can delete it from the lists */
    mutex_lock(&module_mutex);
    stop_machine(__unlink_module, mod, NULL);
    mutex_unlock(&module_mutex);

    This is the sequence of events that leads to the panic.

    CPU 1 is removing dummy_module via delete_module(). It acquires the
    module_mutex, and then releases it. CPU 1 has NOT set dummy_module->state to
    MODULE_STATE_UNFORMED yet.

    CPU 0, which is reading the /proc/modules, acquires the module_mutex and
    acquires a pointer to the dummy_module which is still in the modules list.
    CPU 0 calls m_show for dummy_module. The check in m_show() for
    MODULE_STATE_UNFORMED passed for dummy_module even though it is being
    torn down.

    Meanwhile CPU 1, which has been continuing to remove dummy_module without
    holding the module_mutex, now calls free_module() and sets
    dummy_module->state to MODULE_STATE_UNFORMED.

    CPU 0 now calls module_flags() with dummy_module and ...

    static char *module_flags(struct module *mod, char *buf)
    {
    int bx = 0;

    BUG_ON(mod->state == MODULE_STATE_UNFORMED);

    and BOOM.

    Acquire and release the module_mutex lock around the setting of
    MODULE_STATE_UNFORMED in the teardown path, which should resolve the
    problem.
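
    The locking rule can be illustrated with a small user-space sketch using
    a pthread mutex. The enum, struct, and function names here are
    hypothetical and only mirror the roles of module_mutex, free_module(),
    and m_show(); this is not the kernel implementation.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical state names mirroring the module states discussed above. */
enum mod_state { MOD_LIVE, MOD_GOING, MOD_UNFORMED };

struct mod {
    enum mod_state state;
};

static pthread_mutex_t mod_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Teardown side: publish UNFORMED only while holding the mutex. */
static void mod_mark_unformed(struct mod *m)
{
    pthread_mutex_lock(&mod_mutex);
    m->state = MOD_UNFORMED;
    pthread_mutex_unlock(&mod_mutex);
}

/* Reader side: the check and the use share one critical section.
 * Returns 1 if the module was shown, 0 if it was skipped. */
static int mod_show(const struct mod *m)
{
    int shown = 0;

    pthread_mutex_lock(&mod_mutex);
    if (m->state != MOD_UNFORMED) {
        /* The writer needs mod_mutex, so the state cannot flip here. */
        assert(m->state != MOD_UNFORMED);
        shown = 1;
    }
    pthread_mutex_unlock(&mod_mutex);
    return shown;
}
```

    Because the writer must take the same mutex, a reader that passed the
    UNFORMED check can no longer trip over a state change before it
    finishes printing.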

    Testing: In the unpatched kernel I can panic the system within 1 minute by
    doing

    while (true) do insmod dummy_module.ko; rmmod dummy_module.ko; done

    and

    while (true) do cat /proc/modules; done

    in separate terminals.

    In the patched kernel I was able to run just over one hour without seeing
    any issues. I also verified the output of panic via sysrq-c and the output
    of /proc/modules looks correct for all three states for the dummy_module.

    dummy_module 12661 0 - Unloading 0xffffffffa03a5000 (OE-)
    dummy_module 12661 0 - Live 0xffffffffa03bb000 (OE)
    dummy_module 14015 1 - Loading 0xffffffffa03a5000 (OE+)

    Signed-off-by: Prarit Bhargava
    Reviewed-by: Oleg Nesterov
    Signed-off-by: Rusty Russell
    Cc: stable@kernel.org

    Prarit Bhargava
     

14 Oct, 2014

12 commits

  • Merge second patch-bomb from Andrew Morton:
    - a few hotfixes
    - drivers/dma updates
    - MAINTAINERS updates
    - Quite a lot of lib/ updates
    - checkpatch updates
    - binfmt updates
    - autofs4
    - drivers/rtc/
    - various small tweaks to less used filesystems
    - ipc/ updates
    - kernel/watchdog.c changes

    * emailed patches from Andrew Morton : (135 commits)
    mm: softdirty: enable write notifications on VMAs after VM_SOFTDIRTY cleared
    kernel/param: consolidate __{start,stop}___param[] in
    ia64: remove duplicate declarations of __per_cpu_start[] and __per_cpu_end[]
    frv: remove unused declarations of __start___ex_table and __stop___ex_table
    kvm: ensure hard lockup detection is disabled by default
    kernel/watchdog.c: control hard lockup detection default
    staging: rtl8192u: use %*pEn to escape buffer
    staging: rtl8192e: use %*pEn to escape buffer
    staging: wlan-ng: use %*pEhp to print SN
    lib80211: remove unused print_ssid()
    wireless: hostap: proc: print properly escaped SSID
    wireless: ipw2x00: print SSID via %*pE
    wireless: libertas: print esaped string via %*pE
    lib/vsprintf: add %*pE[achnops] format specifier
    lib / string_helpers: introduce string_escape_mem()
    lib / string_helpers: refactoring the test suite
    lib / string_helpers: move documentation to c-file
    include/linux: remove strict_strto* definitions
    arch/x86/mm/numa.c: fix boot failure when all nodes are hotpluggable
    fs: check bh blocknr earlier when searching lru
    ...

    Linus Torvalds
     
  • Pull s390 updates from Martin Schwidefsky:
    "This patch set contains the main portion of the changes for 3.18 in
    regard to the s390 architecture. It is a bit bigger than usual,
    mainly because of a new driver and the vector extension patches.

    The interesting bits are:
    - Quite a bit of work on the tracing front. Uprobes is enabled and
    the ftrace code is reworked to get some of the lost performance
    back if CONFIG_FTRACE is enabled.
    - To improve boot time with CONFIG_DEBUG_PAGEALLOC, support for the
    IPTE range facility is added.
    - The rwlock code is re-factored to improve writer fairness and to be
    able to use the interlocked-access instructions.
    - The kernel part for the support of the vector extension is added.
    - The device driver to access the CD/DVD on the HMC is added, this
    will hopefully come in handy to improve the installation process.
    - Add support for control-unit initiated reconfiguration.
    - The crypto device driver is enhanced to enable the additional AP
    domains and to allow the new crypto hardware to be used.
    - Bug fixes"

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (39 commits)
    s390/ftrace: simplify enabling/disabling of ftrace_graph_caller
    s390/ftrace: remove 31 bit ftrace support
    s390/kdump: add support for vector extension
    s390/disassembler: add vector instructions
    s390: add support for vector extension
    s390/zcrypt: Toleration of new crypto hardware
    s390/idle: consolidate idle functions and definitions
    s390/nohz: use a per-cpu flag for arch_needs_cpu
    s390/vtime: do not reset idle data on CPU hotplug
    s390/dasd: add support for control unit initiated reconfiguration
    s390/dasd: fix infinite loop during format
    s390/mm: make use of ipte range facility
    s390/setup: correct 4-level kernel page table detection
    s390/topology: call set_sched_topology early
    s390/uprobes: architecture backend for uprobes
    s390/uprobes: common library for kprobes and uprobes
    s390/rwlock: use the interlocked-access facility 1 instructions
    s390/rwlock: improve writer fairness
    s390/rwlock: remove interrupt-enabling rwlock variant.
    s390/mm: remove change bit override support
    ...

    Linus Torvalds
     
  • Pull x86 seccomp changes from Ingo Molnar:
    "This tree includes x86 seccomp filter speedups and related preparatory
    work, which touches core seccomp facilities as well.

    The main idea is to split seccomp into two phases, to be able to enter
    a simple fast path for syscalls with ptrace side effects.

    There are no substantial user-visible (or ABI) effects expected from
    this, except a change in how we emit a better audit record for
    SECCOMP_RET_TRACE events"

    * 'x86-seccomp-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    x86_64, entry: Use split-phase syscall_trace_enter for 64-bit syscalls
    x86_64, entry: Treat regs->ax the same in fastpath and slowpath syscalls
    x86: Split syscall_trace_enter into two phases
    x86, entry: Only call user_exit if TIF_NOHZ
    x86, x32, audit: Fix x32's AUDIT_ARCH wrt audit
    seccomp: Document two-phase seccomp and arch-provided seccomp_data
    seccomp: Allow arch code to provide seccomp_data
    seccomp: Refactor the filter callback and the API
    seccomp,x86,arm,mips,s390: Remove nr parameter from secure_computing

    Linus Torvalds
     
  • Consolidate the various external const and non-const declarations of
    __start___param[] and __stop___param in . This
    requires making a few struct kernel_param pointers in kernel/params.c
    const.

    Signed-off-by: Geert Uytterhoeven
    Acked-by: Rusty Russell
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Geert Uytterhoeven
     
  • In some cases we don't want hard lockup detection enabled by default.
    An example is when running as a guest. Introduce

    watchdog_enable_hardlockup_detector(bool)

    allowing those cases to disable hard lockup detection. This must be
    executed early by the boot processor from e.g. smp_prepare_boot_cpu, in
    order to allow kernel command line arguments to override it, as well as
    to avoid hard lockup detection being enabled before we've had a chance
    to indicate that it's unwanted. In summary,

    initial boot: default=enabled
    smp_prepare_boot_cpu
    watchdog_enable_hardlockup_detector(false): default=disabled
    cmdline has 'nmi_watchdog=1': default=enabled
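
    The precedence in the summary above can be modelled in a few lines of
    user-space C. This is a sketch under stated assumptions:
    set_hardlockup_default() is a hypothetical stand-in for
    watchdog_enable_hardlockup_detector(), and the command-line handling is
    reduced to a substring match for 'nmi_watchdog=1'.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

static bool hardlockup_default = true;      /* initial boot: enabled */

/* Hypothetical stand-in for watchdog_enable_hardlockup_detector(). */
static void set_hardlockup_default(bool val)
{
    hardlockup_default = val;
}

/* Simplified command-line handling: only 'nmi_watchdog=1' is modelled. */
static void parse_cmdline(const char *cmdline)
{
    assert(cmdline != NULL);
    if (strstr(cmdline, "nmi_watchdog=1") != NULL)
        hardlockup_default = true;
}

/* Replays the boot order: default, early arch override, then cmdline. */
static bool boot_sketch(bool is_guest, const char *cmdline)
{
    hardlockup_default = true;
    if (is_guest)
        set_hardlockup_default(false);      /* early, on the boot CPU */
    parse_cmdline(cmdline);                 /* later, so it can override */
    return hardlockup_default;
}
```

    The ordering is the point of the sketch: because the override runs
    before command-line parsing, an explicit 'nmi_watchdog=1' still wins.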

    The running kernel still has the ability to enable/disable at any time
    with /proc/sys/kernel/nmi_watchdog as usual. However, even when the
    default has been overridden, /proc/sys/kernel/nmi_watchdog will initially
    show '1'. To truly turn it on one must disable/enable it, i.e.

    echo 0 > /proc/sys/kernel/nmi_watchdog
    echo 1 > /proc/sys/kernel/nmi_watchdog

    This patch will be immediately useful for KVM with the next patch of this
    series. Other hypervisor guest types may find it useful as well.

    [akpm@linux-foundation.org: fix build]
    [dzickus@redhat.com: fix compile issues on sparc]
    Signed-off-by: Ulrich Obergfell
    Signed-off-by: Andrew Jones
    Signed-off-by: Don Zickus
    Signed-off-by: Don Zickus
    Cc: Stephen Rothwell
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ulrich Obergfell
     
  • The kernel used to contain two functions for length-delimited,
    case-insensitive string comparison, strnicmp with correct semantics and
    a slightly buggy strncasecmp. The latter is the POSIX name, so strnicmp
    was renamed to strncasecmp, and strnicmp made into a wrapper for the new
    strncasecmp to avoid breaking existing users.

    To allow the compat wrapper strnicmp to be removed at some point in the
    future, and to avoid the extra indirection cost, do
    s/strnicmp/strncasecmp/g.
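
    For reference, the POSIX function of the same name behaves as follows
    in user space. The helper has_prefix_nocase() is a hypothetical name
    used only for this illustration; the kernel's own strncasecmp() lives
    in lib/string.c and is not what is called here.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>
#include <strings.h>   /* POSIX strncasecmp() */

/* Returns true if s starts with prefix, ignoring ASCII case. */
static bool has_prefix_nocase(const char *s, const char *prefix)
{
    assert(s != NULL && prefix != NULL);
    /* Compare at most strlen(prefix) bytes; stops early at a NUL in s. */
    return strncasecmp(s, prefix, strlen(prefix)) == 0;
}
```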

    Signed-off-by: Rasmus Villemoes
    Cc: Jason Wessel
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Rasmus Villemoes
     
  • We have a large university system in the UK that is experiencing very long
    delays modprobing the driver for a specific I/O device. The delay is from
    8-10 minutes per device and there are 31 devices in the system. This 4 to
    5 hour delay in starting up those I/O devices is very much a burden on the
    customer.

    There are two causes for requiring a restart/reload of the drivers. First
    is periodic preventive maintenance (PM) and the second is if any of the
    devices experience a fatal error. Both of these trigger this excessively
    long delay in bringing the system back up to full capability.

    The problem was tracked down to a very slow IOREMAP operation and the
    excessively long ioresource lookup to ensure that the user is not
    attempting to ioremap RAM. These patches provide a speed up to that
    function.

    The modprobe time appears to be affected quite a bit by previous activity
    on the ioresource list, which I suspect is due to cache preloading. While
    the overall improvement is impacted by other overhead of starting the
    devices, this drastically improves the modprobe time.

    Also our system is considerably smaller so the percentages gained will not
    be the same. Best case improvement with the modprobe on our 20 device
    smallish system was from 'real 5m51.913s' to 'real 0m18.275s'.

    This patch (of 2):

    Since the ioremap operation is verifying that the specified address range
    is NOT RAM, it will search the entire ioresource list if the condition is
    true. To make matters worse, it does this one 4k page at a time. For a
    128M BAR region this is 32768 passes (128M / 4k) to determine the entire region does not
    contain any RAM addresses.

    This patch provides another resource lookup function, region_is_ram, that
    searches for the entire region specified, verifying that it is completely
    contained within the resource region. If it is found, then it is checked
    to be RAM or not, within a single pass.

    The return value is -1 if the region was not found; otherwise it is 1 if
    the region is RAM and 0 if it is not. This allows the caller to fall back
    to the previous page-by-page search when the region was not found.
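
    The tri-state lookup can be sketched in user space as follows. The
    struct res table and the function names are hypothetical; the real
    region_is_ram() walks the kernel's iomem resource tree rather than a
    flat array. The second helper shows how many 4k pages a per-page search
    would have to visit for a given region size.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flat resource table entry; 'end' is inclusive. */
struct res {
    uint64_t start;
    uint64_t end;
    int is_ram;
};

/* Returns 1 if [start, start+size) lies entirely inside a RAM entry,
 * 0 if it lies entirely inside a non-RAM entry, and -1 if no single
 * entry contains the whole region. */
static int region_lookup(const struct res *tbl, size_t n,
                         uint64_t start, uint64_t size)
{
    assert(size > 0);
    uint64_t end = start + size - 1;

    for (size_t i = 0; i < n; i++) {
        if (start >= tbl[i].start && end <= tbl[i].end)
            return tbl[i].is_ram ? 1 : 0;
    }
    return -1;   /* caller falls back to the page-by-page search */
}

/* Number of pages a page-at-a-time search has to check. */
static uint64_t pages_in_region(uint64_t size, uint64_t page_size)
{
    assert(page_size > 0);
    return (size + page_size - 1) / page_size;
}
```

    For a 128 MiB region and 4 KiB pages, pages_in_region() gives 32768,
    which is the number of lookups the single-pass check replaces.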

    [akpm@linux-foundation.org: fix spellos and typos in comment]
    Signed-off-by: Mike Travis
    Acked-by: Alex Thorlton
    Reviewed-by: Cliff Wickman
    Cc: Thomas Gleixner
    Cc: H. Peter Anvin
    Cc: Mark Salter
    Cc: Dave Young
    Cc: Rik van Riel
    Cc: Peter Zijlstra
    Cc: Mel Gorman
    Cc: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mike Travis
     
  • This is a cleanup. In function parse_crashkernel_suffix, the parameter
    crash_base is not used. So here remove it.

    Signed-off-by: Baoquan He
    Acked-by: Vivek Goyal
    Cc: Eric W. Biederman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Baoquan He
     
  • In locate_mem_hole functions, a memory hole is located and added as
    kexec_segment. But judging by the name locate_mem_hole, it should only take
    responsibility for finding an available memory hole to hold data of a
    specified size.

    So in this patch add a new field 'mem' into kexec_buf, then take that
    kexec segment adding code out of locate_mem_hole_top_down and
    locate_mem_hole_bottom_up. This makes the functionality of
    locate_mem_hole match what its name declares. It also means
    locate_mem_hole_callback could be used later if anyone wants to locate a
    memory hole for other uses.

    Meanwhile Vivek suggested open-coding the function __kexec_add_segment();
    that way we have to retrieve the ksegment pointer only once and the code
    is easy to read. So just do it in this patch and remove
    __kexec_add_segment() since no one uses it anymore.

    Signed-off-by: Baoquan He
    Acked-by: Vivek Goyal
    Cc: Eric W. Biederman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Baoquan He
     
  • Reduce boilerplate code by using __seq_open_private() instead of
    seq_open() in kallsyms_open().

    Signed-off-by: Rob Jones
    Cc: Gideon Israel Dsouza
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Rob Jones
     
  • Commit 458df9fd4815 ("printk: remove separate printk_sched buffers and use
    printk buf instead") hardcodes printk_deferred() to KERN_WARNING and
    inserts the string "[sched_delayed] " before the actual message. However
    it doesn't take into account the KERN_* prefix of the message, that now
    ends up in the middle of the output:

    [sched_delayed] ^a4CE: hpet increased min_delta_ns to 20115 nsec

    Fix this by just getting rid of the "[sched_delayed] " scnprintf(). The
    prefix has been useless since 458df9fd4815 anyway, because from that commit on
    printk_deferred() inserts the message into the kernel printk buffer
    immediately. So if the message eventually gets printed to console, it is
    printed in the correct order with other messages and there's no need for
    any special prefix. And if the kernel crashes before the message makes it
    to console, then prefix in the printk buffer doesn't make the situation
    any better.
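
    Why the level marker ends up in the middle of the output can be shown
    with a small user-space sketch. It assumes the kernel's encoding of a
    log level as an SOH byte ('\001') followed by a digit at the very start
    of the message; parse_level() is a hypothetical helper, not the
    kernel's parser.

```c
#include <assert.h>
#include <stddef.h>

#define SOH '\001'   /* start-of-header byte that introduces a log level */

/* Returns the level (0..7) if msg begins with SOH + digit, else -1. */
static int parse_level(const char *msg)
{
    assert(msg != NULL);
    if (msg[0] == SOH && msg[1] >= '0' && msg[1] <= '7')
        return msg[1] - '0';
    return -1;   /* no prefix at the start: the bytes stay in the text */
}
```

    A message that starts with the marker yields level 4, while the same
    message with "[sched_delayed] " prepended yields -1, so the SOH byte
    and the '4' are printed as part of the text (shown as "^a4" above).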

    Link: http://lkml.org/lkml/2014/9/14/4

    Signed-off-by: Markus Trippelsdorf
    Acked-by: Jan Kara
    Acked-by: Steven Rostedt
    Cc: Geert Uytterhoeven
    Cc: Peter Zijlstra
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Markus Trippelsdorf
     
  • When configuring a uniprocessor kernel, don't bother the user with an
    irrelevant LOG_CPU_MAX_BUF_SHIFT question, and don't build the unused
    code.

    Signed-off-by: Geert Uytterhoeven
    Acked-by: Luis R. Rodriguez
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Geert Uytterhoeven
     

13 Oct, 2014

7 commits

  • Pull scheduler updates from Ingo Molnar:
    "The main changes in this cycle were:

    - Optimized support for Intel "Cluster-on-Die" (CoD) topologies (Dave
    Hansen)

    - Various sched/idle refinements for better idle handling (Nicolas
    Pitre, Daniel Lezcano, Chuansheng Liu, Vincent Guittot)

    - sched/numa updates and optimizations (Rik van Riel)

    - sysbench speedup (Vincent Guittot)

    - capacity calculation cleanups/refactoring (Vincent Guittot)

    - Various cleanups to thread group iteration (Oleg Nesterov)

    - Double-rq-lock removal optimization and various refactorings
    (Kirill Tkhai)

    - various sched/deadline fixes

    ... and lots of other changes"

    * 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (72 commits)
    sched/dl: Use dl_bw_of() under rcu_read_lock_sched()
    sched/fair: Delete resched_cpu() from idle_balance()
    sched, time: Fix build error with 64 bit cputime_t on 32 bit systems
    sched: Improve sysbench performance by fixing spurious active migration
    sched/x86: Fix up typo in topology detection
    x86, sched: Add new topology for multi-NUMA-node CPUs
    sched/rt: Use resched_curr() in task_tick_rt()
    sched: Use rq->rd in sched_setaffinity() under RCU read lock
    sched: cleanup: Rename 'out_unlock' to 'out_free_new_mask'
    sched: Use dl_bw_of() under RCU read lock
    sched/fair: Remove duplicate code from can_migrate_task()
    sched, mips, ia64: Remove __ARCH_WANT_UNLOCKED_CTXSW
    sched: print_rq(): Don't use tasklist_lock
    sched: normalize_rt_tasks(): Don't use _irqsave for tasklist_lock, use task_rq_lock()
    sched: Fix the task-group check in tg_has_rt_tasks()
    sched/fair: Leverage the idle state info when choosing the "idlest" cpu
    sched: Let the scheduler see CPU idle states
    sched/deadline: Fix inter- exclusive cpusets migrations
    sched/deadline: Clear dl_entity params when setscheduling to different class
    sched/numa: Kill the wrong/dead TASK_DEAD check in task_numa_fault()
    ...

    Linus Torvalds
     
  • Pull watchdog fixes from Ingo Molnar:
    "Two small watchdog subsystem fixes"

    * 'perf-watchdog-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    watchdog: Fix print-once on enable
    watchdog: Remove unnecessary header files

    Linus Torvalds
     
  • Pull perf fixes from Ingo Molnar:
    "Two leftover fixes from the v3.17 cycle - these will be forwarded to
    stable as well, if they prove problem-free in wider testing as well"

    [ Side note: the "fix perf bug in fork()" fix had also come in through
    Andrew's patch-bomb - Linus ]

    * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    perf: Fix perf bug in fork()
    perf: Fix unclone_ctx() vs. locking

    Linus Torvalds
     
  • Pull perf updates from Ingo Molnar:
    "Kernel side updates:

    - Fix and enhance poll support (Jiri Olsa)

    - Re-enable inheritance optimization (Jiri Olsa)

    - Enhance Intel memory events support (Stephane Eranian)

    - Refactor the Intel uncore driver to be more maintainable (Zheng
    Yan)

    - Enhance and fix Intel CPU and uncore PMU drivers (Peter Zijlstra,
    Andi Kleen)

    - [ plus various smaller fixes/cleanups ]

    User visible tooling updates:

    - Add +field argument support for --field option, so that one can add
    fields to the default list of fields to show, ie now one can just
    do:

    perf report --fields +pid

    And the pid will appear in addition to the default fields (Jiri
    Olsa)

    - Add +field argument support for --sort option (Jiri Olsa)

    - Honour -w in the report tools (report, top), allowing to specify
    the widths for the histogram entries columns (Namhyung Kim)

    - Properly show submicrosecond times in 'perf kvm stat' (Christian
    Borntraeger)

    - Add beautifier for mremap flags param in 'trace' (Alex Snast)

    - perf script: Allow callchains if any event samples them

    - Don't truncate Intel style addresses in 'annotate' (Alex Converse)

    - Allow profiling when kptr_restrict == 1 for non root users, kernel
    samples will just remain unresolved (Andi Kleen)

    - Allow configuring default options for callchains in config file
    (Namhyung Kim)

    - Support operations for shared futexes. (Davidlohr Bueso)

    - "perf kvm stat report" improvements by Alexander Yarygin:
    - Save pid string in opts.target.pid
    - Enable the target.system_wide flag
    - Unify the title bar output

    - [ plus lots of other fixes and small improvements. ]

    Tooling infrastructure changes:

    - Refactor unit and scale function parameters for PMU parsing
    routines (Matt Fleming)

    - Improve DSO long names lookup with rbtree, resulting in great
    speedup for workloads with lots of DSOs (Waiman Long)

    - We were not handling POLLHUP notifications for event file
    descriptors

    Fix it by filtering entries in the events file descriptor array
    after poll() returns, refcounting mmaps so that when the last fd
    pointing to a perf mmap goes away we do the unmap (Arnaldo Carvalho
    de Melo)

    - Intel PT prep work, from Adrian Hunter, including:
    - Let a user specify a PMU event without any config terms
    - Add perf-with-kcore script
    - Let default config be defined for a PMU
    - Add perf_pmu__scan_file()
    - Add a 'perf test' for tracking with sched_switch
    - Add 'flush' callback to scripting API

    - Use ring buffer consume method to look like other tools (Arnaldo
    Carvalho de Melo)

    - hists browser (used in top and report) refactorings, getting rid of
    unused variables and reducing source code size by handling similar
    cases in fewer functions (Namhyung Kim).

    - Replace thread unsafe strerror() with strerror_r() across the
    whole tools/perf/ tree (Masami Hiramatsu)

    - Rename ordered_samples to ordered_events and allow setting a queue
    size for ordering events (Jiri Olsa)

    - [ plus lots of fixes, cleanups and other improvements ]"

    * 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (198 commits)
    perf/x86: Tone down kernel messages when the PMU check fails in a virtual environment
    perf/x86/intel/uncore: Fix minor race in box set up
    perf record: Fix error message for --filter option not coming after tracepoint
    perf tools: Fix build breakage on arm64 targets
    perf symbols: Improve DSO long names lookup speed with rbtree
    perf symbols: Encapsulate dsos list head into struct dsos
    perf bench futex: Sanitize -q option in requeue
    perf bench futex: Support operations for shared futexes
    perf trace: Fix mmap return address truncation to 32-bit
    perf tools: Refactor unit and scale function parameters
    perf tools: Fix line number in the config file error message
    perf tools: Convert {record,top}.call-graph option to call-graph.record-mode
    perf tools: Introduce perf_callchain_config()
    perf callchain: Move some parser functions to callchain.c
    perf tools: Move callchain config from record_opts to callchain_param
    perf hists browser: Fix callchain print bug on TUI
    perf tools: Use ACCESS_ONCE() instead of volatile cast
    perf tools: Modify error code for when perf_session__new() fails
    perf tools: Fix perf record as non root with kptr_restrict == 1
    perf stat: Fix --per-core on multi socket systems
    ...

    Linus Torvalds
     
  • Pull core locking updates from Ingo Molnar:
    "The main updates in this cycle were:

    - mutex MCS refactoring finishing touches: improve comments, refactor
    and clean up code, reduce debug data structure footprint, etc.

    - qrwlock finishing touches: remove old code, self-test updates.

    - small rwsem optimization

    - various smaller fixes/cleanups"

    * 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    locking/lockdep: Revert qrwlock recusive stuff
    locking/rwsem: Avoid double checking before try acquiring write lock
    locking/rwsem: Move EXPORT_SYMBOL() lines to follow function definition
    locking/rwlock, x86: Delete unused asm/rwlock.h and rwlock.S
    locking/rwlock, x86: Clean up asm/spinlock*.h to remove old rwlock code
    locking/semaphore: Resolve some shadow warnings
    locking/selftest: Support queued rwlock
    locking/lockdep: Restrict the use of recursive read_lock() with qrwlock
    locking/spinlocks: Always evaluate the second argument of spin_lock_nested()
    locking/Documentation: Update locking/mutex-design.txt disadvantages
    locking/Documentation: Move locking related docs into Documentation/locking/
    locking/mutexes: Use MUTEX_SPIN_ON_OWNER when appropriate
    locking/mutexes: Refactor optimistic spinning code
    locking/mcs: Remove obsolete comment
    locking/mutexes: Document quick lock release when unlocking
    locking/mutexes: Standardize arguments in lock/unlock slowpaths
    locking: Remove deprecated smp_mb__() barriers

    Linus Torvalds
     
  • Pull RCU updates from Ingo Molnar:
    "The main changes in this cycle were:

    - changes related to No-CBs CPUs and NO_HZ_FULL

    - RCU-tasks implementation

    - torture-test updates

    - miscellaneous fixes

    - locktorture updates

    - RCU documentation updates"

    * 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (81 commits)
    workqueue: Use cond_resched_rcu_qs macro
    workqueue: Add quiescent state between work items
    locktorture: Cleanup header usage
    locktorture: Cannot hold read and write lock
    locktorture: Fix __acquire annotation for spinlock irq
    locktorture: Support rwlocks
    rcu: Eliminate deadlock between CPU hotplug and expedited grace periods
    locktorture: Document boot/module parameters
    rcutorture: Rename rcutorture_runnable parameter
    locktorture: Add test scenario for rwsem_lock
    locktorture: Add test scenario for mutex_lock
    locktorture: Make torture scripting account for new _runnable name
    locktorture: Introduce torture context
    locktorture: Support rwsems
    locktorture: Add infrastructure for torturing read locks
    torture: Address race in module cleanup
    locktorture: Make statistics generic
    locktorture: Teach about lock debugging
    locktorture: Support mutexes
    locktorture: Add documentation
    ...

    Linus Torvalds
     
  • Pull vfs updates from Al Viro:
    "The big thing in this pile is Eric's unmount-on-rmdir series; we
    finally have everything we need for that. The final piece of prereqs
    is delayed mntput() - now filesystem shutdown always happens on
    shallow stack.

    Other than that, we have several new primitives for iov_iter (Matt
    Wilcox, culled from his XIP-related series) pushing the conversion to
    ->read_iter()/ ->write_iter() a bit more, a bunch of fs/dcache.c
    cleanups and fixes (including the external name refcounting, which
    gives consistent behaviour of d_move() wrt procfs symlinks for long
    and short names alike) and assorted cleanups and fixes all over the
    place.

    This is just the first pile; there's a lot of stuff from various
    people that ought to go in this window. Starting with
    unionmount/overlayfs mess... ;-/"

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (60 commits)
    fs/file_table.c: Update alloc_file() comment
    vfs: Deduplicate code shared by xattr system calls operating on paths
    reiserfs: remove pointless forward declaration of struct nameidata
    don't need that forward declaration of struct nameidata in dcache.h anymore
    take dname_external() into fs/dcache.c
    let path_init() failures treated the same way as subsequent link_path_walk()
    fix misuses of f_count() in ppp and netlink
    ncpfs: use list_for_each_entry() for d_subdirs walk
    vfs: move getname() from callers to do_mount()
    gfs2_atomic_open(): skip lookups on hashed dentry
    [infiniband] remove pointless assignments
    gadgetfs: saner API for gadgetfs_create_file()
    f_fs: saner API for ffs_sb_create_file()
    jfs: don't hash direct inode
    [s390] remove pointless assignment of ->f_op in vmlogrdr ->open()
    ecryptfs: ->f_op is never NULL
    android: ->f_op is never NULL
    nouveau: __iomem misannotations
    missing annotation in fs/file.c
    fs: namespace: suppress 'may be used uninitialized' warnings
    ...

    Linus Torvalds
     

12 Oct, 2014

2 commits

  • Pull tracing fixes from Steven Rostedt:
    "Seems that Peter Zijlstra added a new check that is making old code
    scream nasty warnings:

    WARNING: CPU: 0 PID: 91 at kernel/sched/core.c:7253 __might_sleep+0x9a/0x378()
    do not call blocking ops when !TASK_RUNNING; state=1 set at [] event_test_thread+0x48/0x93
    Call Trace:
    __might_sleep+0x9a/0x378
    down_read+0x26/0x98
    exit_signals+0x27/0x1c2
    do_exit+0x193/0x10bd
    kthread+0x156/0x156
    ret_from_fork+0x7a/0xb0

    These are triggered by some self tests that run at start up when
    configured in. Although the code is technically correct, it is a
    little sloppy and not very robust. It works now because it runs at
    boot up and the tests do not call anything that might trigger a
    spurious wake up. But that doesn't mean those tests won't change in
    the future.

    It's best to clean them now to make sure the tests used to test the
    internal workings of the system don't cause breakage themselves.

    This also quiets the warnings made by the new checks"

    * tag 'trace-3.18-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
    tracing: Clean up scheduling in trace_wakeup_test_thread()
    tracing: Robustify wait loop

    Linus Torvalds
     
  • Pull tracing updates from Steven Rostedt:
    "This set has a few minor updates, but the big change is the redesign
    of the trampoline logic.

    The trampoline logic of 3.17 required a descriptor for every function
    that is registered to be traced and uses a trampoline. Currently,
    only the function graph tracer uses a trampoline, but if you were to
    trace all 32,000 (give or take a few thousand) functions with the
    function graph tracer, it would create 32,000 descriptors to let us
    know that there's a trampoline associated with it. This takes up a
    bit of memory when there's a better way to do it.

    The redesign now reuses the ftrace_ops' (what registers the function
    graph tracer) hash tables. The hash tables tell ftrace what the
    tracer wants to trace or doesn't want to trace. There's two of them:
    one that tells us what to trace, the other tells us what not to trace.
    If the first one is empty, it means all functions should be traced,
    otherwise only the ones that are listed should be. The second hash
    table tells us what not to trace, and if it is empty, all functions
    may be traced, and if there's any listed, then those should not be
    traced even if they exist in the first hash table.

    It took a bit of massaging, but now these hashes can be used to keep
    track of what has a trampoline and what does not, and allows the
    ftrace accounting to work. Now we can trace all functions when using
    the function graph trampoline, and avoid needing to create any special
    descriptors to hold all the functions that are being traced"
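
    The two-hash rule described above (an empty first hash means "trace
    everything", and the second hash always wins) can be sketched as a
    small predicate. This is a toy model under stated assumptions: struct
    func_set, set_contains() and should_trace() are hypothetical names, and
    a flat array of strings stands in for the real ftrace hash tables.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Toy stand-in for one hash: a flat list of function names. */
struct func_set {
    const char *const *names;
    size_t count;               /* 0 means the set is empty */
};

static bool set_contains(const struct func_set *set, const char *func)
{
    for (size_t i = 0; i < set->count; i++)
        if (strcmp(set->names[i], func) == 0)
            return true;
    return false;
}

/*
 * filter:  what to trace; empty means "all functions".
 * notrace: what not to trace; a match here overrides the filter.
 */
static bool should_trace(const struct func_set *filter,
                         const struct func_set *notrace,
                         const char *func)
{
    assert(filter && notrace && func);

    if (filter->count != 0 && !set_contains(filter, func))
        return false;
    return !set_contains(notrace, func);
}
```

    With both sets empty every function qualifies, which is the case that
    previously needed one descriptor per traced function.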

    * tag 'trace-3.18' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
    ftrace: Only disable ftrace_enabled to test buffer in selftest
    ftrace: Add sanity check when unregistering last ftrace_ops
    kernel: trace_syscalls: Replace rcu_assign_pointer() with RCU_INIT_POINTER()
    tracing: generate RCU warnings even when tracepoints are disabled
    ftrace: Replace tramp_hash with old_*_hash to save space
    ftrace: Annotate the ops operation on update
    ftrace: Grab any ops for a rec for enabled_functions output
    ftrace: Remove freeing of old_hash from ftrace_hash_move()
    ftrace: Set callback to ftrace_stub when no ops are registered
    ftrace: Add helper function ftrace_ops_get_func()
    ftrace: Add separate function for non recursive callbacks

    Linus Torvalds
     

11 Oct, 2014

5 commits


10 Oct, 2014

9 commits

  • Pull percpu updates from Tejun Heo:
    "A lot of activities on percpu front. Notable changes are...

    - percpu allocator now can take @gfp. If @gfp doesn't contain
    GFP_KERNEL, it tries to allocate from what's already available to
    the allocator and a work item tries to keep the reserve around
    certain level so that these atomic allocations usually succeed.

    This will replace the ad-hoc percpu memory pool used by
    blk-throttle and also be used by the planned blkcg support for
    writeback IOs.

    Please note that I noticed a bug in how @gfp is interpreted while
    preparing this pull request and applied the fix 6ae833c7fe0c
    ("percpu: fix how @gfp is interpreted by the percpu allocator")
    just now.

    - percpu_ref now uses longs for percpu and global counters instead of
    ints. It leads to more sparse packing of the percpu counters on
    64bit machines but the overhead should be negligible and this
    allows using percpu_ref for refcnting pages and in-memory objects
    directly.

    - The switching between percpu and single counter modes of a
    percpu_ref is made independent of putting the base ref and a
    percpu_ref can now optionally be initialized in single or killed
    mode. This allows avoiding percpu shutdown latency for cases where
    the refcounted objects may be synchronously created and destroyed
    in rapid succession with only a fraction of them reaching fully
    operational status (SCSI probing does this when combined with
    blk-mq support). It's also planned to be used to implement forced
    single mode to detect underflow more timely for debugging.

    There's a separate branch percpu/for-3.18-consistent-ops which cleans
    up the duplicate percpu accessors. That branch causes a number of
    conflicts with s390 and other trees. I'll send a separate pull
    request w/ resolutions once other branches are merged"
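
    To illustrate the percpu_ref points above (long counters, and a mode
    switch that is separate from dropping the base reference), here is a
    single-threaded toy model. It is only a sketch: struct toy_ref and the
    toy_ref_*() functions are hypothetical, there is no locking or RCU, and
    the real percpu_ref implementation differs in its details.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_NR_CPUS 4

/* Toy model only; not the kernel's struct percpu_ref. */
struct toy_ref {
    long percpu[TOY_NR_CPUS];   /* per-CPU deltas, may individually go negative */
    long global;                /* shared count; starts with the base reference */
    bool atomic_mode;           /* true: every get/put goes to 'global' */
};

/* Start either in percpu mode or directly in single-counter mode. */
static void toy_ref_init(struct toy_ref *ref, bool start_atomic)
{
    for (int i = 0; i < TOY_NR_CPUS; i++)
        ref->percpu[i] = 0;
    ref->global = 1;
    ref->atomic_mode = start_atomic;
}

static void toy_ref_get(struct toy_ref *ref, int cpu)
{
    assert(cpu >= 0 && cpu < TOY_NR_CPUS);
    if (ref->atomic_mode)
        ref->global++;
    else
        ref->percpu[cpu]++;
}

/*
 * Returns true when the count reached zero. That is only detectable in
 * single-counter mode; in percpu mode the total is not known cheaply.
 */
static bool toy_ref_put(struct toy_ref *ref, int cpu)
{
    assert(cpu >= 0 && cpu < TOY_NR_CPUS);
    if (!ref->atomic_mode) {
        ref->percpu[cpu]--;
        return false;
    }
    return --ref->global == 0;
}

/* Fold the per-CPU deltas into the shared counter; no reference is dropped. */
static void toy_ref_switch_to_atomic(struct toy_ref *ref)
{
    if (ref->atomic_mode)
        return;
    for (int i = 0; i < TOY_NR_CPUS; i++) {
        ref->global += ref->percpu[i];
        ref->percpu[i] = 0;
    }
    ref->atomic_mode = true;
}
```

    An object that is created and destroyed almost immediately can pass
    start_atomic = true and never pay for the percpu-to-single switch,
    which is the latency case the pull message mentions for SCSI probing.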

    * 'for-3.18' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu: (33 commits)
    percpu: fix how @gfp is interpreted by the percpu allocator
    blk-mq, percpu_ref: start q->mq_usage_counter in atomic mode
    percpu_ref: make INIT_ATOMIC and switch_to_atomic() sticky
    percpu_ref: add PERCPU_REF_INIT_* flags
    percpu_ref: decouple switching to percpu mode and reinit
    percpu_ref: decouple switching to atomic mode and killing
    percpu_ref: add PCPU_REF_DEAD
    percpu_ref: rename things to prepare for decoupling percpu/atomic mode switch
    percpu_ref: replace pcpu_ prefix with percpu_
    percpu_ref: minor code and comment updates
    percpu_ref: relocate percpu_ref_reinit()
    Revert "blk-mq, percpu_ref: implement a kludge for SCSI blk-mq stall during probe"
    Revert "percpu: free percpu allocation info for uniprocessor system"
    percpu-refcount: make percpu_ref based on longs instead of ints
    percpu-refcount: improve WARN messages
    percpu: fix locking regression in the failure path of pcpu_alloc()
    percpu-refcount: add @gfp to percpu_ref_init()
    proportions: add @gfp to init functions
    percpu_counter: add @gfp to percpu_counter_init()
    percpu_counter: make percpu_counters_lock irq-safe
    ...

    Linus Torvalds
     
  • Pull cgroup updates from Tejun Heo:
    "Nothing too interesting. Just a handful of cleanup patches"

    * 'for-3.18' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
    Revert "cgroup: remove redundant variable in cgroup_mount()"
    cgroup: remove redundant variable in cgroup_mount()
    cgroup: fix missing unlock in cgroup_release_agent()
    cgroup: remove CGRP_RELEASABLE flag
    perf/cgroup: Remove perf_put_cgroup()
    cgroup: remove redundant check in cgroup_ino()
    cpuset: simplify proc_cpuset_show()
    cgroup: simplify proc_cgroup_show()
    cgroup: use a per-cgroup work for release agent
    cgroup: remove bogus comments
    cgroup: remove redundant code in cgroup_rmdir()
    cgroup: remove some useless forward declarations
    cgroup: fix a typo in comment.

    Linus Torvalds
     
  • Merge patch-bomb from Andrew Morton:
    - part of OCFS2 (review is laggy again)
    - procfs
    - slab
    - all of MM
    - zram, zbud
    - various other random things: arch, filesystems.

    * emailed patches from Andrew Morton: (164 commits)
    nosave: consolidate __nosave_{begin,end} in
    include/linux/screen_info.h: remove unused ORIG_* macros
    kernel/sys.c: compat sysinfo syscall: fix undefined behavior
    kernel/sys.c: whitespace fixes
    acct: eliminate compile warning
    kernel/async.c: switch to pr_foo()
    include/linux/blkdev.h: use NULL instead of zero
    include/linux/kernel.h: deduplicate code implementing clamp* macros
    include/linux/kernel.h: rewrite min3, max3 and clamp using min and max
    alpha: use Kbuild logic to include
    frv: remove deprecated IRQF_DISABLED
    frv: remove unused cpuinfo_frv and friends to fix future build error
    zbud: avoid accessing last unused freelist
    zsmalloc: simplify init_zspage free obj linking
    mm/zsmalloc.c: correct comment for fullness group computation
    zram: use notify_free to account all free notifications
    zram: report maximum used memory
    zram: zram memory size limitation
    zsmalloc: change return value unit of zs_get_total_size_bytes
    zsmalloc: move pages_allocated to zs_pool
    ...

    Linus Torvalds
     
  • Fix undefined behavior and compiler warning by replacing right shift 32
    with upper_32_bits macro

    Signed-off-by: Scotty Bauer
    Cc: Clemens Ladisch
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Scotty Bauer
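
    Shifting a value right by 32 is undefined behavior in C when the
    operand's type is only 32 bits wide, which is what this change avoids.
    The sketch below is modelled on the kernel's upper_32_bits() helper but
    written for user space with <stdint.h> types; split_u64() is a
    hypothetical function added only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Two 16-bit shifts instead of one 32-bit shift: each step stays below the
 * width of a 32-bit operand, so the expression is well defined for both
 * 32-bit and 64-bit unsigned arguments.
 */
#define upper_32_bits(n) ((uint32_t)(((n) >> 16) >> 16))
#define lower_32_bits(n) ((uint32_t)(n))

/* Hypothetical helper: split a 64-bit value into its two 32-bit halves. */
static void split_u64(uint64_t v, uint32_t *hi, uint32_t *lo)
{
    assert(hi && lo);
    *hi = upper_32_bits(v);
    *lo = lower_32_bits(v);
}
```

    For a 32-bit unsigned argument the macro simply yields 0, where a plain
    ">> 32" would draw a compiler warning and have no defined result.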
     
  • Fix minor errors and warning messages in kernel/sys.c. These errors were
    reported by checkpatch while I was working on some modifications to
    sys.c. Fixing them first will help me improve my further patches.

    ERROR: trailing whitespace - 9
    ERROR: do not use assignment in if condition - 4
    ERROR: spaces required around that '?' (ctx:VxO) - 10
    ERROR: switch and case should be at the same indent - 3

    total 26 errors & 3 warnings fixed.

    Signed-off-by: vishnu.ps
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    vishnu.ps
     
  • If ACCT_VERSION is not defined as 3, the warning below appears:
    CC kernel/acct.o
    kernel/acct.c: In function `do_acct_process':
    kernel/acct.c:475:24: warning: unused variable `ns' [-Wunused-variable]

    [akpm@linux-foundation.org: retain the local for code size improvements]
    Signed-off-by: Ying Xue
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ying Xue
     
  • Signed-off-by: Ionut Alexa
    Cc: Tejun Heo
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ionut Alexa
     
  • Dump the contents of the relevant mm_struct when we hit the bug condition.

    Signed-off-by: Sasha Levin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Sasha Levin
     
  • 1. vma_policy_mof(task) is simply not safe unless task == current,
    it can race with do_exit()->mpol_put(). Remove this arg and update
    its single caller.

    2. vma can not be NULL, remove this check and simplify the code.

    Signed-off-by: Oleg Nesterov
    Cc: KAMEZAWA Hiroyuki
    Cc: David Rientjes
    Cc: KOSAKI Motohiro
    Cc: Alexander Viro
    Cc: Cyrill Gorcunov
    Cc: "Eric W. Biederman"
    Cc: "Kirill A. Shutemov"
    Cc: Peter Zijlstra
    Cc: Hugh Dickins
    Cc: Andi Kleen
    Cc: Naoya Horiguchi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov