22 Aug, 2012

1 commit

  • It seems commit 4a9d4b024a31 ("switch fput to task_work_add") reintroduced
    the problem addressed in 944be0b22472 ("close_files(): add scheduling
    point").

    If a server process with a lot of files (say 2 million TCP sockets) is
    killed, we can spend a lot of time in task_work_run() and trigger a soft
    lockup; a sketch of the kind of scheduling point that avoids this follows
    this entry.

    Signed-off-by: Eric Dumazet
    Signed-off-by: Linus Torvalds

    Eric Dumazet
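
    The fix follows the same idea as the earlier close_files() change: drop a
    scheduling point into a potentially very long loop. Below is a minimal
    sketch, not the actual kernel code (work_item and run_pending_work() are
    illustrative names), of how such a loop avoids tripping the soft-lockup
    watchdog:

    #include <linux/sched.h>        /* cond_resched() */

    struct work_item {
            struct work_item *next;
            void (*func)(struct work_item *);
    };

    static void run_pending_work(struct work_item *head)
    {
            while (head) {
                    struct work_item *next = head->next;

                    head->func(head);       /* e.g. the final fput() of a file */
                    head = next;
                    cond_resched();         /* the added scheduling point */
            }
    }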
     

21 Aug, 2012

1 commit


19 Aug, 2012

2 commits

  • Merge alpha architecture update from Michael Cree:
    "The Alpha Maintainer, Matt Turner, is currently unavailable, so I have
    collected up patches that have been posted to the linux-alpha mailing
    list over the last couple of months, and am forwarding them to you in
    the hope that you are prepared to accept them via me.

    I have been running the patches by Al Viro and myself against kernels
    for two months now, so they have had quite a bit of testing. All except
    one patch were intended for the 3.5 kernel but because of Matt's
    unavailability never got forwarded to you."

    * emailed patches from Michael Cree: (9 commits)
    alpha: Fix fall-out from disintegrating asm/system.h
    Redefine ATOMIC_INIT and ATOMIC64_INIT to drop the casts
    alpha: fix fpu.h usage in userspace
    alpha/mm/fault.c: Port OOM changes to do_page_fault
    alpha: take kernel_execve() out of entry.S
    alpha: take a bunch of syscalls into osf_sys.c
    alpha: Use new generic strncpy_from_user() and strnlen_user()
    alpha: Wire up cross memory attach syscalls
    alpha: Don't export SOCK_NONBLOCK to user space.

    Linus Torvalds
     
  • New helper: current_thread_info(). Allows us to do a bunch of odd syscalls
    in C. While we are at it, there has never been a reason to do
    osf_getpriority() in assembler. We also get "namespace"-aware (read:
    consistent with getuid(2), etc.) behaviour from the getx?id() syscalls now.

    Signed-off-by: Al Viro
    Signed-off-by: Michael Cree
    Acked-by: Matt Turner
    Signed-off-by: Linus Torvalds

    Al Viro
     

14 Aug, 2012

5 commits

  • Make the stop scheduler class do the same accounting as the other classes.

    Migration threads can be caught in the act while doing exec balancing,
    leading to the below due to use of unmaintained ->se.exec_start. The
    load that triggered this particular instance was an apparently out of
    control heavily threaded application that does system monitoring in
    what equated to an exec bomb, with one of the VERY frequently migrated
    tasks being ps.

    %CPU PID USER CMD
    99.3 45 root [migration/10]
    97.7 53 root [migration/12]
    97.0 57 root [migration/13]
    90.1 49 root [migration/11]
    89.6 65 root [migration/15]
    88.7 17 root [migration/3]
    80.4 37 root [migration/8]
    78.1 41 root [migration/9]
    44.2 13 root [migration/2]

    Signed-off-by: Mike Galbraith
    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/r/1344051854.6739.19.camel@marge.simpson.net
    Signed-off-by: Thomas Gleixner

    Mike Galbraith
     
  • Root task group bandwidth replenishment must service all CPUs, regardless of
    where the timer was last started, and regardless of the isolation mechanism,
    lest 'Quoth the Raven, "Nevermore"' become rt scheduling policy.

    Signed-off-by: Mike Galbraith
    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/r/1344326558.6968.25.camel@marge.simpson.net
    Signed-off-by: Thomas Gleixner

    Mike Galbraith
     
  • With multiple instances of task_groups, for_each_rt_rq() is a noop,
    no task groups having been added to the rt.c list instance. This
    renders __enable/disable_runtime() and print_rt_stats() noop, the
    user (non) visible effect being that rt task groups are missing in
    /proc/sched_debug.

    Signed-off-by: Mike Galbraith
    Cc: stable@kernel.org # v3.3+
    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/r/1344308413.6846.7.camel@marge.simpson.net
    Signed-off-by: Thomas Gleixner

    Mike Galbraith
     
  • On architectures where cputime_t is a 64 bit type, it is possible to
    trigger a divide by zero on the do_div(temp, (__force u32) total) line if
    total is a non-zero number whose lower 32 bits are all zero. Removing the
    cast is not a good solution, since some do_div() implementations cast to
    u32 internally.

    This problem can be triggered in practice on very long-lived processes (a
    short userspace illustration of the truncation follows this entry):

    PID: 2331 TASK: ffff880472814b00 CPU: 2 COMMAND: "oraagent.bin"
    #0 [ffff880472a51b70] machine_kexec at ffffffff8103214b
    #1 [ffff880472a51bd0] crash_kexec at ffffffff810b91c2
    #2 [ffff880472a51ca0] oops_end at ffffffff814f0b00
    #3 [ffff880472a51cd0] die at ffffffff8100f26b
    #4 [ffff880472a51d00] do_trap at ffffffff814f03f4
    #5 [ffff880472a51d60] do_divide_error at ffffffff8100cfff
    #6 [ffff880472a51e00] divide_error at ffffffff8100be7b
    [exception RIP: thread_group_times+0x56]
    RIP: ffffffff81056a16 RSP: ffff880472a51eb8 RFLAGS: 00010046
    RAX: bc3572c9fe12d194 RBX: ffff880874150800 RCX: 0000000110266fad
    RDX: 0000000000000000 RSI: ffff880472a51eb8 RDI: 001038ae7d9633dc
    RBP: ffff880472a51ef8 R8: 00000000b10a3a64 R9: ffff880874150800
    R10: 00007fcba27ab680 R11: 0000000000000202 R12: ffff880472a51f08
    R13: ffff880472a51f10 R14: 0000000000000000 R15: 0000000000000007
    ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
    #7 [ffff880472a51f00] do_sys_times at ffffffff8108845d
    #8 [ffff880472a51f40] sys_times at ffffffff81088524
    #9 [ffff880472a51f80] system_call_fastpath at ffffffff8100b0f2
    RIP: 0000003808caac3a RSP: 00007fcba27ab6d8 RFLAGS: 00000202
    RAX: 0000000000000064 RBX: ffffffff8100b0f2 RCX: 0000000000000000
    RDX: 00007fcba27ab6e0 RSI: 000000000076d58e RDI: 00007fcba27ab6e0
    RBP: 00007fcba27ab700 R8: 0000000000000020 R9: 000000000000091b
    R10: 00007fcba27ab680 R11: 0000000000000202 R12: 00007fff9ca41940
    R13: 0000000000000000 R14: 00007fcba27ac9c0 R15: 00007fff9ca41940
    ORIG_RAX: 0000000000000064 CS: 0033 SS: 002b

    Cc: stable@vger.kernel.org
    Signed-off-by: Stanislaw Gruszka
    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/r/20120808092714.GA3580@redhat.com
    Signed-off-by: Thomas Gleixner

    Stanislaw Gruszka
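
    A short sketch in plain userspace C (not kernel code) of the truncation
    described above: casting a 64-bit total to u32 discards the high bits, so
    a huge but non-zero total whose low 32 bits are zero becomes a zero
    divisor, exactly what do_div(temp, (__force u32) total) hit here.

    #include <stdint.h>
    #include <stdio.h>

    int main(void)
    {
            uint64_t total = 0x100000000ULL;    /* non-zero, low 32 bits all zero */
            uint64_t temp = 123456789ULL;
            uint32_t divisor = (uint32_t)total; /* what the (__force u32) cast does */

            printf("total=%llu divisor=%u\n",
                   (unsigned long long)total, divisor);

            if (divisor == 0) {
                    printf("dividing temp by divisor would trap, as in the crash above\n");
                    return 1;
            }

            printf("temp/divisor = %llu\n", (unsigned long long)(temp / divisor));
            return 0;
    }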
     
  • Peter Portante reported that for large cgroup hierarchies (and/or on
    large CPU counts) we get immense lock contention on rq->lock and stuff
    stops working properly.

    His workload was a ton of processes, each in their own cgroup,
    everybody idling except for a sporadic wakeup once every so often.

    It was found that:

    schedule()
      idle_balance()
        load_balance()
          local_irq_save()
          double_rq_lock()
          update_h_load()
            walk_tg_tree(tg_load_down)
              tg_load_down()

    Results in an entire cgroup hierarchy walk under rq->lock for every
    new-idle balance, and since new-idle balance isn't throttled, this
    results in a lot of work while holding rq->lock.

    This patch does two things: it removes the work from under rq->lock,
    based on the good principle of "race and pray" which is widely employed
    in the load-balancer as a whole, and it throttles the update_h_load()
    calculation to at most once per jiffy (a sketch of this throttling
    follows this entry).

    I considered excluding update_h_load() for new-idle balance
    altogether, but purely relying on regular balance passes to update
    this data might not work out under some rare circumstances where the
    new-idle busiest isn't the regular busiest for a while (unlikely, but
    a nightmare to debug if someone hits it and suffers).

    Cc: pjt@google.com
    Cc: Larry Woodman
    Cc: Mike Galbraith
    Reported-by: Peter Portante
    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/n/tip-aaarrzfpnaam7pqrekofu8a6@git.kernel.org
    Signed-off-by: Thomas Gleixner

    Peter Zijlstra
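
    A minimal sketch of the once-per-jiffy throttle described above. The
    names are illustrative, not the actual scheduler code; the real walk
    happens via walk_tg_tree(tg_load_down, ...):

    #include <linux/jiffies.h>

    static unsigned long h_load_throttle;   /* jiffy of the last refresh */

    static void walk_tg_tree_and_update(void)
    {
            /* stand-in for the expensive cgroup-hierarchy walk */
    }

    static void update_h_load_throttled(void)
    {
            unsigned long now = jiffies;

            if (h_load_throttle == now)     /* already refreshed this jiffy */
                    return;
            h_load_throttle = now;

            walk_tg_tree_and_update();
    }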
     

13 Aug, 2012

2 commits

  • Pull power management fixes from Rafael J. Wysocki:

    - Fix for two recent regressions in the generic PM domains framework.

    - Revert of a commit that introduced a resume regression and is
    conceptually incorrect in my opinion.

    - Fix for a return value in pcc-cpufreq.c from Julia Lawall.

    - RTC wakeup signaling fix from Neil Brown.

    - Suppression of compiler warnings for CONFIG_PM_SLEEP unset in ACPI,
    platform/x86 and TPM drivers.

    * tag 'pm-for-3.6-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
    tpm_tis / PM: Fix unused function warning for CONFIG_PM_SLEEP
    platform / x86 / PM: Fix unused function warnings for CONFIG_PM_SLEEP
    ACPI / PM: Fix unused function warnings for CONFIG_PM_SLEEP
    Revert "NMI watchdog: fix for lockup detector breakage on resume"
    PM: Make dev_pm_get_subsys_data() always return 0 on success
    drivers/cpufreq/pcc-cpufreq.c: fix error return code
    RTC: Avoid races between RTC alarm wakeup and suspend.

    Linus Torvalds
     
  • While tracking down a weird buffer overflow issue in a program that
    looked to be sane, I started double checking the length returned by
    syslog(SYSLOG_ACTION_READ_ALL, ...) to make sure it wasn't overflowing
    the buffer.

    Sure enough, it was. I saw this in strace:

    11339 syslog(SYSLOG_ACTION_READ_ALL, "[244017.708129] REISERFS (dev"..., 8192) = 8279

    It turns out that the loops that calculate how much space the entries
    will take when they're copied don't include the newlines and prefixes
    that will be included in the final output, since the prev flags are
    passed as zero.

    This patch properly accounts for them and fixes the overflow (a
    simplified illustration of the mismatch follows this entry).

    CC: stable@kernel.org
    Signed-off-by: Jeff Mahoney
    Signed-off-by: Linus Torvalds

    Jeff Mahoney
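
    A simplified sketch of the accounting mismatch (the structures here are
    illustrative, not the actual printk record code): if the sizing pass
    formats records without the prefix and trailing newline that the copy
    pass emits, the reported length can exceed the caller's buffer.

    #include <stddef.h>

    struct record {
            size_t text_len;        /* message body only */
            size_t prefix_len;      /* "[244017.708129] " style prefix */
    };

    /* length of one record as it appears in the output buffer */
    static size_t record_len(const struct record *r, int with_prefix)
    {
            return r->text_len + (with_prefix ? r->prefix_len : 0) + 1; /* '\n' */
    }

    /* The buggy sizing pass used with_prefix = 0; the fix is to size records
     * with the same settings the copy loop uses, so both passes agree. */
    static size_t total_len(const struct record *recs, size_t n, int with_prefix)
    {
            size_t len = 0;
            size_t i;

            for (i = 0; i < n; i++)
                    len += record_len(&recs[i], with_prefix);
            return len;
    }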
     

09 Aug, 2012

1 commit

  • Revert commit 45226e9 (NMI watchdog: fix for lockup detector breakage
    on resume) which breaks resume from system suspend on my SH7372
    Mackerel board (by causing a NULL pointer dereference to happen) and
    is generally wrong, because it abuses the CPU hotplug functionality
    in a shamelessly blatant way.

    The original issue should be addressed through an appropriate syscore
    resume callback instead.

    Signed-off-by: Rafael J. Wysocki

    Rafael J. Wysocki
     

05 Aug, 2012

1 commit

  • Tetsuo Handa reported that sporadically the system clock starts
    counting up too quickly, which is enough to confuse the hangcheck
    timer into printing a bogus stall warning.

    Commit 2a8c0883 "time: Move xtime_nsec adjustment underflow handling
    timekeeping_adjust" overlooked this exit path:

    } else
            return;

    which should really be a proper exit sequence, fixing the bug as a
    side effect.

    Also make the flow more readable by properly balancing curly
    braces.

    Reported-by: Tetsuo Handa
    Tested-by: Tetsuo Handa
    Signed-off-by: Ingo Molnar
    Cc: john.stultz@linaro.org
    Cc: a.p.zijlstra@chello.nl
    Cc: richardcochran@gmail.com
    Cc: prarit@redhat.com
    Link: http://lkml.kernel.org/r/20120804192114.GA28347@gmail.com
    Signed-off-by: Ingo Molnar

    Ingo Molnar
     

04 Aug, 2012

6 commits


02 Aug, 2012

1 commit

  • Pull second vfs pile from Al Viro:
    "The stuff in there: fsfreeze deadlock fixes by Jan (essentially, the
    deadlock reproduced by xfstests 068), symlink and hardlink restriction
    patches, plus assorted cleanups and fixes.

    Note that another fsfreeze deadlock (emergency thaw one) is *not*
    dealt with - the series by Fernando conflicts a lot with Jan's, breaks
    userland ABI (FIFREEZE semantics gets changed) and trades the deadlock
    for massive vfsmount leak; this is going to be handled next cycle.
    There probably will be another pull request, but that stuff won't be
    in it."

    Fix up trivial conflicts due to unrelated changes next to each other in
    drivers/{staging/gdm72xx/usb_boot.c, usb/gadget/storage_common.c}

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (54 commits)
    delousing target_core_file a bit
    Documentation: Correct s_umount state for freeze_fs/unfreeze_fs
    fs: Remove old freezing mechanism
    ext2: Implement freezing
    btrfs: Convert to new freezing mechanism
    nilfs2: Convert to new freezing mechanism
    ntfs: Convert to new freezing mechanism
    fuse: Convert to new freezing mechanism
    gfs2: Convert to new freezing mechanism
    ocfs2: Convert to new freezing mechanism
    xfs: Convert to new freezing code
    ext4: Convert to new freezing mechanism
    fs: Protect write paths by sb_start_write - sb_end_write
    fs: Skip atime update on frozen filesystem
    fs: Add freezing handling to mnt_want_write() / mnt_drop_write()
    fs: Improve filesystem freezing handling
    switch the protection of percpu_counter list to spinlock
    nfsd: Push mnt_want_write() outside of i_mutex
    btrfs: Push mnt_want_write() outside of i_mutex
    fat: Push mnt_want_write() outside of i_mutex
    ...

    Linus Torvalds
     

01 Aug, 2012

9 commits

  • Pull irqdomain changes from Grant Likely:
    "Round of refactoring and enhancements to irq_domain infrastructure.
    This series starts the process of simplifying irqdomain. The ultimate
    goal is to merge LEGACY, LINEAR and TREE mappings into a single
    system, but had to back off from that after some last minute bugs.
    Instead it mainly reorganizes the code and ensures that the reverse
    map gets populated when the irq is mapped instead of the first time it
    is looked up.

    Merging of the irq_domain types is deferred to v3.7

    In other news, this series adds helpers for creating static mappings
    on a linear or tree mapping."

    * tag 'irqdomain-for-linus' of git://git.secretlab.ca/git/linux-2.6:
    irqdomain: Improve diagnostics when a domain mapping fails
    irqdomain: eliminate slow-path revmap lookups
    irqdomain: Fix irq_create_direct_mapping() to test irq_domain type.
    irqdomain: Eliminate dedicated radix lookup functions
    irqdomain: Support for static IRQ mapping and association.
    irqdomain: Always update revmap when setting up a virq
    irqdomain: Split disassociating code into separate function
    irq_domain: correct a minor wrong comment for linear revmap
    irq_domain: Standardise legacy/linear domain selection
    irqdomain: Make ops->map hook optional
    irqdomain: Remove unnecessary test for IRQ_DOMAIN_MAP_LEGACY
    irqdomain: Simple NUMA awareness.
    devicetree: add helper inline for retrieving a node's full name

    Linus Torvalds
     
  • Merge Andrew's second set of patches:
    - MM
    - a few random fixes
    - a couple of RTC leftovers

    * emailed patches from Andrew Morton : (120 commits)
    rtc/rtc-88pm80x: remove unneed devm_kfree
    rtc/rtc-88pm80x: assign ret only when rtc_register_driver fails
    mm: hugetlbfs: close race during teardown of hugetlbfs shared page tables
    tmpfs: distribute interleave better across nodes
    mm: remove redundant initialization
    mm: warn if pg_data_t isn't initialized with zero
    mips: zero out pg_data_t when it's allocated
    memcg: fix memory accounting scalability in shrink_page_list
    mm/sparse: remove index_init_lock
    mm/sparse: more checks on mem_section number
    mm/sparse: optimize sparse_index_alloc
    memcg: add mem_cgroup_from_css() helper
    memcg: further prevent OOM with too many dirty pages
    memcg: prevent OOM with too many dirty pages
    mm: mmu_notifier: fix freed page still mapped in secondary MMU
    mm: memcg: only check anon swapin page charges for swap cache
    mm: memcg: only check swap cache pages for repeated charging
    mm: memcg: split swapin charge function into private and public part
    mm: memcg: remove needless !mm fixup to init_mm when charging
    mm: memcg: remove unneeded shmem charge type
    ...

    Linus Torvalds
     
  • Pull random subsystem patches from Ted Ts'o:
    "This patch series contains a major revamp of how we collect entropy
    from interrupts for /dev/random and /dev/urandom.

    The goal is to address weaknesses discussed in the paper "Mining
    your Ps and Qs: Detection of Widespread Weak Keys in Network Devices",
    by Nadia Heninger, Zakir Durumeric, Eric Wustrow, J. Alex Halderman,
    which will be published in the Proceedings of the 21st Usenix Security
    Symposium, August 2012. (See https://factorable.net for more
    information and an extended version of the paper.)"

    Fix up trivial conflicts due to nearby changes in
    drivers/{mfd/ab3100-core.c, usb/gadget/omap_udc.c}

    * tag 'random_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/random: (33 commits)
    random: mix in architectural randomness in extract_buf()
    dmi: Feed DMI table to /dev/random driver
    random: Add comment to random_initialize()
    random: final removal of IRQF_SAMPLE_RANDOM
    um: remove IRQF_SAMPLE_RANDOM which is now a no-op
    sparc/ldc: remove IRQF_SAMPLE_RANDOM which is now a no-op
    [ARM] pxa: remove IRQF_SAMPLE_RANDOM which is now a no-op
    board-palmz71: remove IRQF_SAMPLE_RANDOM which is now a no-op
    isp1301_omap: remove IRQF_SAMPLE_RANDOM which is now a no-op
    pxa25x_udc: remove IRQF_SAMPLE_RANDOM which is now a no-op
    omap_udc: remove IRQF_SAMPLE_RANDOM which is now a no-op
    goku_udc: remove IRQF_SAMPLE_RANDOM which was commented out
    uartlite: remove IRQF_SAMPLE_RANDOM which is now a no-op
    drivers: hv: remove IRQF_SAMPLE_RANDOM which is now a no-op
    xen-blkfront: remove IRQF_SAMPLE_RANDOM which is now a no-op
    n2_crypto: remove IRQF_SAMPLE_RANDOM which is now a no-op
    pda_power: remove IRQF_SAMPLE_RANDOM which is now a no-op
    i2c-pmcmsp: remove IRQF_SAMPLE_RANDOM which is now a no-op
    input/serio/hp_sdc.c: remove IRQF_SAMPLE_RANDOM which is now a no-op
    mfd: remove IRQF_SAMPLE_RANDOM which is now a no-op
    ...

    Linus Torvalds
     
  • This is needed to allow network softirq packet processing to make use of
    PF_MEMALLOC.

    Currently softirq context cannot use PF_MEMALLOC because it is not
    associated with a task, and therefore has no task flags to fiddle with;
    thus the gfp-to-alloc-flags mapping ignores the task flags when in
    interrupt (hard or soft) context.

    Allowing softirqs to make use of PF_MEMALLOC therefore requires some
    trickery. This patch borrows the task flags from whatever process happens
    to be preempted by the softirq. It then modifies the gfp-to-alloc-flags
    mapping to not exclude task flags in softirq context, and modifies the
    softirq code to save, clear and restore the PF_MEMALLOC flag (a sketch of
    this save/clear/restore follows this entry).

    The save and clear ensure the preempted task's PF_MEMALLOC flag doesn't
    leak into the softirq. The restore ensures a softirq's PF_MEMALLOC flag
    cannot leak back into the preempted process. This should be safe for the
    following reasons:

    Softirqs can run on multiple CPUs, sure, but the same task should not be
    executing the same softirq code. Neither should the softirq
    handler be preempted by any other softirq handler, so the flags
    should not leak to an unrelated softirq.

    Softirqs re-enable hardware interrupts in __do_softirq(), so they can be
    preempted by hardware interrupts, and PF_MEMALLOC is then inherited
    by the hard IRQ. However, this is similar to a process in
    reclaim being preempted by a hardirq. While PF_MEMALLOC is
    set, gfp_to_alloc_flags() distinguishes between hard and
    soft irqs and avoids giving a hardirq the ALLOC_NO_WATERMARKS
    flag.

    If the softirq is deferred to ksoftirq then its flags may be used
    instead of a normal task's, but as the softirq cannot be preempted,
    the PF_MEMALLOC flag does not leak to other code by accident.

    [davem@davemloft.net: Document why PF_MEMALLOC is safe]
    Signed-off-by: Peter Zijlstra
    Signed-off-by: Mel Gorman
    Cc: David Miller
    Cc: Neil Brown
    Cc: Mike Christie
    Cc: Eric B Munson
    Cc: Eric Dumazet
    Cc: Sebastian Andrzej Siewior
    Cc: Mel Gorman
    Cc: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mel Gorman
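
    A sketch of the save/clear/restore described above (illustrative only;
    run_softirq_handlers() is a made-up stand-in for the softirq handler
    loop):

    #include <linux/sched.h>

    static void run_softirq_handlers(void);     /* stand-in for the handler loop */

    static void softirq_memalloc_sketch(void)
    {
            /* save: remember whether the preempted task had PF_MEMALLOC set */
            unsigned int pflags = current->flags & PF_MEMALLOC;

            /* clear: the softirq must not inherit the task's PF_MEMALLOC */
            current->flags &= ~PF_MEMALLOC;

            run_softirq_handlers();             /* may set PF_MEMALLOC for itself */

            /* restore: put back exactly what the preempted task had, so a
             * softirq-set PF_MEMALLOC cannot leak into the task either */
            current->flags = (current->flags & ~PF_MEMALLOC) | pflags;
    }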
     
  • When hotadd_new_pgdat() is called to create a new pgdat for a new node, a
    fallback zonelist should be created for the new node. There's code that
    tries to achieve that in hotadd_new_pgdat() as below:

    /*
     * The node we allocated has no zone fallback lists. For avoiding
     * to access not-initialized zonelist, build here.
     */
    mutex_lock(&zonelists_mutex);
    build_all_zonelists(pgdat, NULL);
    mutex_unlock(&zonelists_mutex);

    But it doesn't work as expected. When hotadd_new_pgdat() is called, the
    new node is still in offline state because node_set_online(nid) hasn't
    been called yet. And build_all_zonelists() only builds zonelists for
    online nodes as:

    for_each_online_node(nid) {
            pg_data_t *pgdat = NODE_DATA(nid);

            build_zonelists(pgdat);
            build_zonelist_cache(pgdat);
    }

    Though we hope to create the zonelists for the new pgdat, it doesn't
    happen. So add a new "pgdat" parameter to build_all_zonelists() so that
    zonelists are also built for the new pgdat (a sketch of the adjusted
    logic follows this entry).

    Signed-off-by: Jiang Liu
    Signed-off-by: Xishi Qiu
    Cc: Mel Gorman
    Cc: Michal Hocko
    Cc: Minchan Kim
    Cc: Rusty Russell
    Cc: Yinghai Lu
    Cc: Tony Luck
    Cc: KAMEZAWA Hiroyuki
    Cc: KOSAKI Motohiro
    Cc: David Rientjes
    Cc: Keping Chen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jiang Liu
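
    A sketch of the adjusted logic (function names follow the commit text,
    but this is illustrative rather than the actual mm/page_alloc.c code):
    the new pgdat is handled explicitly, since the online-node loop would
    skip it.

    #include <linux/mmzone.h>
    #include <linux/nodemask.h>

    static void build_all_zonelists_sketch(pg_data_t *new_pgdat)
    {
            int nid;

            if (new_pgdat)                  /* hot-added node, not online yet */
                    build_zonelists(new_pgdat);

            for_each_online_node(nid) {
                    pg_data_t *pgdat = NODE_DATA(nid);

                    build_zonelists(pgdat);
                    build_zonelist_cache(pgdat);
            }
    }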
     
  • Sanity:

    CONFIG_CGROUP_MEM_RES_CTLR -> CONFIG_MEMCG
    CONFIG_CGROUP_MEM_RES_CTLR_SWAP -> CONFIG_MEMCG_SWAP
    CONFIG_CGROUP_MEM_RES_CTLR_SWAP_ENABLED -> CONFIG_MEMCG_SWAP_ENABLED
    CONFIG_CGROUP_MEM_RES_CTLR_KMEM -> CONFIG_MEMCG_KMEM

    [mhocko@suse.cz: fix missed bits]
    Cc: Glauber Costa
    Acked-by: Michal Hocko
    Cc: Johannes Weiner
    Cc: KAMEZAWA Hiroyuki
    Cc: Hugh Dickins
    Cc: Tejun Heo
    Cc: Aneesh Kumar K.V
    Cc: David Rientjes
    Cc: KOSAKI Motohiro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
     
  • Since per-BDI flusher threads were introduced in 2.6, the pdflush
    mechanism is not used any more. But the old interface exported through
    /proc/sys/vm/nr_pdflush_threads still exists and is obviously useless.

    For backward compatibility, print a warning and return 2 to notify users
    that the interface has been removed.

    Signed-off-by: Wanpeng Li
    Cc: Wu Fengguang
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Wanpeng Li
     
  • vm_stat_account() accounts the shared_vm, stack_vm and reserved_vm now.
    But we can also account for total_vm in vm_stat_account(), which makes
    the code tidier.

    Even for mprotect_fixup(), we can get the right result in the end.

    Signed-off-by: Huang Shijie
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Huang Shijie
     
  • Pull perf updates from Ingo Molnar:
    "The biggest changes are Intel Nehalem-EX PMU uncore support, uprobes
    updates/cleanups/fixes from Oleg and diverse tooling updates (mostly
    fixes) now that Arnaldo is back from vacation."

    * 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (40 commits)
    uprobes: __replace_page() needs munlock_vma_page()
    uprobes: Rename vma_address() and make it return "unsigned long"
    uprobes: Fix register_for_each_vma()->vma_address() check
    uprobes: Introduce vaddr_to_offset(vma, vaddr)
    uprobes: Teach build_probe_list() to consider the range
    uprobes: Remove insert_vm_struct()->uprobe_mmap()
    uprobes: Remove copy_vma()->uprobe_mmap()
    uprobes: Fix overflow in vma_address()/find_active_uprobe()
    uprobes: Suppress uprobe_munmap() from mmput()
    uprobes: Uprobe_mmap/munmap needs list_for_each_entry_safe()
    uprobes: Clean up and document write_opcode()->lock_page(old_page)
    uprobes: Kill write_opcode()->lock_page(new_page)
    uprobes: __replace_page() should not use page_address_in_vma()
    uprobes: Don't recheck vma/f_mapping in write_opcode()
    perf/x86: Fix missing struct before structure name
    perf/x86: Fix format definition of SNB-EP uncore QPI box
    perf/x86: Make bitfield unsigned
    perf/x86: Fix LLC-* and node-* events on Intel SandyBridge
    perf/x86: Add Intel Nehalem-EX uncore support
    perf/x86: Fix typo in format definition of uncore PCU filter
    ...

    Linus Torvalds
     

31 Jul, 2012

11 commits

  • Ingo noted that the numerous timekeeper.value references made
    the timekeeping code ugly and caused many long lines that
    had to be broken up. He recommended replacing timekeeper.value
    references with tk->value.

    This patch provides a local tk pointer in all top level time
    functions and sets it to &timekeeper. All timekeeper access is
    then done via the tk pointer (a minimal before/after example
    follows this entry).

    Signed-off-by: John Stultz
    Cc: Prarit Bhargava
    Link: http://lkml.kernel.org/r/1343414893-45779-6-git-send-email-john.stultz@linaro.org
    Signed-off-by: Ingo Molnar

    John Stultz
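
    A minimal before/after example of the cleanup (struct and field names
    are illustrative, not the real timekeeper layout):

    struct timekeeper_sketch {
            unsigned long long xtime_nsec;
            unsigned int shift;
    };

    static struct timekeeper_sketch timekeeper;

    /* before: every access spells out the global, producing long lines */
    static void accumulate_before(unsigned long long nsec)
    {
            timekeeper.xtime_nsec += nsec << timekeeper.shift;
    }

    /* after: a local tk pointer keeps the lines short */
    static void accumulate_after(unsigned long long nsec)
    {
            struct timekeeper_sketch *tk = &timekeeper;

            tk->xtime_nsec += nsec << tk->shift;
    }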
     
  • For performance reasons, we maintain ktime_t based duplicates of
    wall_to_monotonic (offs_real) and total_sleep_time (offs_boot).

    Since large problems could occur (such as the resume regression
    on 3.5-rc7, or the leapsecond hrtimer issue) if these value
    pairs were to be inconsistently updated, this patch cleans
    up how we modify these value pairs to ensure we are always
    consistent.

    As a side-effect this is also more efficient, as we only
    calculate the duplicate values when they are changed,
    rather than on every update_wall_time call.

    This also provides WARN_ONs to detect if future changes break
    the invariants.

    Signed-off-by: John Stultz
    Cc: Peter Zijlstra
    Cc: Richard Cochran
    Cc: Prarit Bhargava
    Link: http://lkml.kernel.org/r/1343414893-45779-5-git-send-email-john.stultz@linaro.org
    [ Cleaned up minor style issues. ]
    Signed-off-by: Ingo Molnar

    John Stultz
     
  • Ingo noted inconsistent newline usage between functions.
    This patch cleans those up.

    Signed-off-by: John Stultz
    Cc: Prarit Bhargava
    Link: http://lkml.kernel.org/r/1343414893-45779-4-git-send-email-john.stultz@linaro.org
    Signed-off-by: Ingo Molnar

    John Stultz
     
  • Ingo noted that ACTHZ is a confusing name, and requested it
    be renamed, so this patch renames ACTHZ to SHIFTED_HZ to
    better describe it.

    Signed-off-by: John Stultz
    Cc: Prarit Bhargava
    Link: http://lkml.kernel.org/r/1343414893-45779-3-git-send-email-john.stultz@linaro.org
    Signed-off-by: Ingo Molnar

    John Stultz
     
  • Merge in Linus's branch which already has timers/core merged.

    Signed-off-by: Ingo Molnar

    Ingo Molnar
     
  • A few events are interesting not only for the current task.
    For example, sched_stat_* events are interesting for a task
    which wakes up. For this reason, it would be good if such
    events were delivered to the target task too.

    Now a target task can be set by using __perf_task().

    The original idea and a draft patch belongs to Peter Zijlstra.

    I need these events for profiling sleep times. sched_switch is used for
    getting callchains and sched_stat_* is used for getting time periods.
    These events are combined in user space, then it can be analyzed by
    perf tools.

    Inspired-by: Peter Zijlstra
    Cc: Steven Rostedt
    Cc: Paul Mackerras
    Cc: Arnaldo Carvalho de Melo
    Cc: Steven Rostedt
    Cc: Arun Sharma
    Signed-off-by: Andrew Vagin
    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/r/1342016098-213063-1-git-send-email-avagin@openvz.org
    Signed-off-by: Ingo Molnar

    Andrew Vagin
     
  • With this patch struct lb_env has a pointer to the load balancing
    cpumask, and we don't need to pass a cpumask around anymore.

    Signed-off-by: Michael Wang
    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/r/4FFE8665.3080705@linux.vnet.ibm.com
    Signed-off-by: Ingo Molnar

    Michael Wang
     
  • Currently the kernel never sets KGDB_REASON_NMI. We do now, when we enter
    KGDB/KDB from an NMI.

    This is not to be confused with kgdb_nmicallback(): the NMI callback is
    the entry for the slave CPUs during CPU roundup, while REASON_NMI is the
    entry for the master CPU.

    Signed-off-by: Anton Vorontsov
    Signed-off-by: Jason Wessel

    Anton Vorontsov
     
  • Having the CPU in the more prompt is completely redundant vs the
    standard kdb prompt, and it also wastes 32 bytes on the stack.

    Signed-off-by: Jason Wessel

    Jason Wessel
     
  • This code cleanup was missed in the original kdb merge, and this code
    is simply not used at all. The code that was previously used to set
    the KDB_FLAG_ONLY_DO_DUMP was removed prior to the initial kdb merge.

    Signed-off-by: Jason Wessel

    Jason Wessel
     
  • When the requested range is outside of the root range, the logic in
    __reserve_region_with_split will cause an infinite recursion which will
    overflow the stack, as seen in the warning below.

    This particular stack overflow was caused by requesting the
    (100000000-107ffffff) range while the root range was (0-ffffffff). In
    this case __request_resource would return the whole root range as the
    conflict range (i.e. 0-ffffffff). Then, the logic in
    __reserve_region_with_split would continue the recursion requesting the
    new range as (conflict->end+1, end), which incidentally in this case
    equals the originally requested range.

    This patch aborts looking for a usable range when the request does not
    intersect with the root range. When the request partially overlaps with
    the root range, it adjusts the request to fall within the root range and
    then continues with the new request (a sketch of this abort/clamp logic
    follows this entry).

    When the request is modified or aborted, an error and a stack trace are
    logged to allow catching the errors in the upper layers.

    [ 5.968374] WARNING: at kernel/sched.c:4129 sub_preempt_count+0x63/0x89()
    [ 5.975150] Modules linked in:
    [ 5.978184] Pid: 1, comm: swapper Not tainted 3.0.22-mid27-00004-gb72c817 #46
    [ 5.985324] Call Trace:
    [ 5.987759] [] ? console_unlock+0x17b/0x18d
    [ 5.992891] [] warn_slowpath_common+0x48/0x5d
    [ 5.998194] [] ? sub_preempt_count+0x63/0x89
    [ 6.003412] [] warn_slowpath_null+0xf/0x13
    [ 6.008453] [] sub_preempt_count+0x63/0x89
    [ 6.013499] [] _raw_spin_unlock+0x27/0x3f
    [ 6.018453] [] add_partial+0x36/0x3b
    [ 6.022973] [] deactivate_slab+0x96/0xb4
    [ 6.027842] [] __slab_alloc.isra.54.constprop.63+0x204/0x241
    [ 6.034456] [] ? kzalloc.constprop.5+0x29/0x38
    [ 6.039842] [] ? kzalloc.constprop.5+0x29/0x38
    [ 6.045232] [] kmem_cache_alloc_trace+0x51/0xb0
    [ 6.050710] [] ? kzalloc.constprop.5+0x29/0x38
    [ 6.056100] [] kzalloc.constprop.5+0x29/0x38
    [ 6.061320] [] __reserve_region_with_split+0x1c/0xd1
    [ 6.067230] [] __reserve_region_with_split+0xc6/0xd1
    ...
    [ 7.179057] [] __reserve_region_with_split+0xc6/0xd1
    [ 7.184970] [] reserve_region_with_split+0x30/0x42
    [ 7.190709] [] e820_reserve_resources_late+0xd1/0xe9
    [ 7.196623] [] pcibios_resource_survey+0x23/0x2a
    [ 7.202184] [] pcibios_init+0x23/0x35
    [ 7.206789] [] pci_subsys_init+0x3f/0x44
    [ 7.211659] [] do_one_initcall+0x72/0x122
    [ 7.216615] [] ? pci_legacy_init+0x3d/0x3d
    [ 7.221659] [] kernel_init+0xa6/0x118
    [ 7.226265] [] ? start_kernel+0x334/0x334
    [ 7.231223] [] kernel_thread_helper+0x6/0x10

    Signed-off-by: Octavian Purdila
    Signed-off-by: Ram Pai
    Cc: Jesse Barnes
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Octavian Purdila
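
    A sketch of the abort/clamp logic described above (an illustrative
    helper, not the actual kernel/resource.c change):

    #include <linux/ioport.h>
    #include <linux/kernel.h>

    static int clamp_request_to_root(struct resource *root,
                                     resource_size_t *start, resource_size_t *end)
    {
            if (*end < root->start || *start > root->end) {
                    /* entirely outside the root range: abort and leave a trace */
                    WARN(1, "request [0x%llx-0x%llx] outside root [0x%llx-0x%llx]\n",
                         (unsigned long long)*start, (unsigned long long)*end,
                         (unsigned long long)root->start,
                         (unsigned long long)root->end);
                    return -EINVAL;
            }

            /* partial overlap: adjust the request to fall inside the root */
            if (*start < root->start)
                    *start = root->start;
            if (*end > root->end)
                    *end = root->end;

            return 0;       /* caller continues with the clamped request */
    }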