31 Mar, 2011

1 commit


23 Mar, 2011

1 commit

  • Change the printk() calls to use the KERN_INFO/KERN_ERR log levels, and
    fix other coding style errors. Not _all_ of them are gone, though.

    [akpm@linux-foundation.org: revert the bits I disagree with]
    Signed-off-by: Michael Rodriguez
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michael Rodriguez
     

07 Jan, 2011

1 commit

  • …-linus', 'x86-paravirt-for-linus', 'core-locking-for-linus' and 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip

    * 'x86-alternatives-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    x86, suspend: Avoid unnecessary smp alternatives switch during suspend/resume

    * 'x86-fpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    x86-64, asm: Use fxsaveq/fxrestorq in more places

    * 'x86-hwmon-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    x86, hwmon: Add core threshold notification to therm_throt.c

    * 'x86-paravirt-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    x86, paravirt: Use native_halt on a halt, not native_safe_halt

    * 'core-locking-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    locking, lockdep: Convert sprintf_symbol to %pS

    * 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    irq: Better struct irqaction layout

    Linus Torvalds
     

14 Dec, 2010

1 commit

  • During suspend, we disable all the non boot cpus. And during resume we bring
    them all back again. So no need to do alternatives_smp_switch() in between.

    On my core 2 based laptop, this speeds up the suspend path by 15 msec and the
    resume path by 5 msec (the difference between the two speedups can be attributed
    to the different P-states that the cpu is in during suspend/resume).

    Signed-off-by: Suresh Siddha
    LKML-Reference:
    Cc: Rafael J. Wysocki
    Signed-off-by: H. Peter Anvin

    Suresh Siddha
     

23 Nov, 2010

2 commits

  • Oleg mentioned that there is no actual guarantee the dying cpu's
    migration thread is actually finished running when we get there, so
    replace the BUG_ON() with a spinloop waiting for it.

    Reported-by: Oleg Nesterov
    Signed-off-by: Peter Zijlstra
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     
  • GCC warns us about:

    kernel/cpu.c: In function ‘take_cpu_down’:
    kernel/cpu.c:200:15: warning: unused variable ‘cpu’

    This variable is unused since param->hcpu is directly
    used later on in cpu_notify.

    Signed-off-by: Dhaval Giani
    Signed-off-by: Peter Zijlstra
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Dhaval Giani
     

18 Nov, 2010

1 commit

  • While discussing the need for sched_idle_next(), Oleg remarked that
    since try_to_wake_up() ensures sleeping tasks will end up running on a
    sane cpu, we can do away with migrate_live_tasks().

    If we then extend the existing hack of migrating current from
    CPU_DYING to migrating the full rq worth of tasks from CPU_DYING, the
    need for the sched_idle_next() abomination disappears as well, since
    idle will be the only possible thread left after the migration thread
    stops.

    This greatly simplifies the hot-unplug task migration path, as can be
    seen from the resulting code reduction (and about half the new lines
    are comments).

    Suggested-by: Oleg Nesterov
    Signed-off-by: Peter Zijlstra
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     

09 Jun, 2010

1 commit

  • Currently, when a cpu goes down, cpu_active is cleared before
    CPU_DOWN_PREPARE starts and cpuset configuration is updated from a
    default priority cpu notifier. When a cpu is coming up, it's set
    before CPU_ONLINE but cpuset configuration again is updated from the
    same cpu notifier.

    For cpu notifiers, this presents an inconsistent state. Threads which
    a CPU_DOWN_PREPARE notifier expects to be bound to the CPU can be
    migrated to other cpus because the cpu is no longer active.

    Fix it by updating cpu_active in the highest priority cpu notifier and
    cpuset configuration in the second highest when a cpu is coming up.
    Down path is updated similarly. This guarantees that all other cpu
    notifiers see consistent cpu_active and cpuset configuration.

    The cpuset_track_online_cpus() notifier is converted to
    cpuset_update_active_cpus(), which just updates the configuration and
    is now called from the cpuset_cpu_[in]active() notifiers registered from
    sched_init_smp(). If cpuset is disabled, cpuset_update_active_cpus()
    degenerates into partition_sched_domains(), making a separate notifier
    for !CONFIG_CPUSETS unnecessary.

    This problem is triggered by cmwq. During CPU_DOWN_PREPARE, hotplug
    callback creates a kthread and kthread_bind()s it to the target cpu,
    and the thread is expected to run on that cpu.

    * Ingo's test discovered __cpuinit/exit markups were incorrect.
    Fixed.
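
    The ordering argument above can be illustrated with a small userspace
    sketch of a priority-sorted notifier chain. This is not the kernel code:
    the PRI_* names, the struct layout and the callbacks are all hypothetical,
    chosen only to show that the highest-priority callback (updating
    cpu_active) and the second-highest (updating the cpuset configuration)
    run before any ordinary notifier, whatever the registration order.

```c
#include <limits.h>
#include <stddef.h>

/* Hypothetical priority names for this sketch: higher values run first. */
#define PRI_SCHED_ACTIVE  INT_MAX
#define PRI_CPUSET_ACTIVE (INT_MAX - 1)
#define PRI_DEFAULT       0

struct notifier {
    int (*call)(unsigned long event);
    int priority;
    struct notifier *next;
};

static struct notifier *chain;
static int cpu_active, cpuset_configured, saw_consistent_state;

/* Insert so that the list stays sorted by descending priority. */
static void chain_register(struct notifier *n)
{
    struct notifier **pp = &chain;

    while (*pp && (*pp)->priority >= n->priority)
        pp = &(*pp)->next;
    n->next = *pp;
    *pp = n;
}

/* Call every registered notifier, highest priority first. */
static void chain_call(unsigned long event)
{
    for (struct notifier *n = chain; n; n = n->next)
        n->call(event);
}

static int sched_active_cb(unsigned long event)
{
    (void)event;
    cpu_active = 1;
    return 0;
}

static int cpuset_active_cb(unsigned long event)
{
    (void)event;
    /* Only counts as configured if cpu_active was already updated. */
    cpuset_configured = cpu_active;
    return 0;
}

static int ordinary_cb(unsigned long event)
{
    (void)event;
    saw_consistent_state = cpu_active && cpuset_configured;
    return 0;
}

static struct notifier ordinary = { ordinary_cb, PRI_DEFAULT, NULL };
static struct notifier cpuset   = { cpuset_active_cb, PRI_CPUSET_ACTIVE, NULL };
static struct notifier active   = { sched_active_cb, PRI_SCHED_ACTIVE, NULL };

/* Register in the "wrong" order on purpose; priority decides call order. */
static int bring_cpu_up(void)
{
    chain_register(&ordinary);
    chain_register(&cpuset);
    chain_register(&active);
    chain_call(1UL /* stand-in for a CPU-up event */);
    return saw_consistent_state;
}
```

    Calling bring_cpu_up() once returns 1 in this model, because the ordinary
    notifier only runs after both higher-priority callbacks have finished.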

    Signed-off-by: Tejun Heo
    Acked-by: Peter Zijlstra
    Cc: Rusty Russell
    Cc: Ingo Molnar
    Cc: Paul Menage

    Tejun Heo
     

02 Jun, 2010

1 commit

  • In commit e9fb7631ebcd ("cpu-hotplug: introduce cpu_notify(),
    __cpu_notify(), cpu_notify_nofail()") the new helper functions access
    cpu_chain. As a result, it shouldn't be marked __cpuinitdata (via
    section mismatch warning).

    Alternatively, the helper functions should be forced inline, or marked
    __ref or __cpuinit. In the meantime, this patch silences the warning
    the trivial way.

    Signed-off-by: Daniel J Blueman
    Signed-off-by: Linus Torvalds

    Daniel J Blueman
     

31 May, 2010

1 commit


28 May, 2010

4 commits

  • Commit e9fb7631ebcd ("cpu-hotplug: introduce cpu_notify(),
    __cpu_notify(), cpu_notify_nofail()") also introduced this annoying
    warning:

    kernel/cpu.c:157: warning: 'cpu_notify_nofail' defined but not used

    when CONFIG_HOTPLUG_CPU wasn't set.

    So move that helper inside the #ifdef CONFIG_HOTPLUG_CPU region, and
    simplify it while at it.

    Signed-off-by: Linus Torvalds

    Linus Torvalds
     
  • Since get_online_cpus() does nothing when CONFIG_HOTPLUG_CPU=n, we don't
    need cpu_hotplug_begin() either.

    This patch moves cpu_hotplug_begin()/cpu_hotplug_done() into the code
    block of CONFIG_HOTPLUG_CPU=y.

    Signed-off-by: Lai Jiangshan
    Cc: Gautham R Shenoy
    Cc: Ingo Molnar

    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Lai Jiangshan
     
  • Currently, when onlining or offlining a CPU fails because one of the cpu
    notifiers returns an error, the result is always -EINVAL (i.e. writing 0
    or 1 to /sys/devices/system/cpu/cpuX/online gets EINVAL).

    To get better error reporting rather than always getting -EINVAL, this
    changes cpu_notify() to return a -errno value via notifier_to_errno() and
    fixes the callers. Cpu notifiers can now return an encapsulated errno
    value.

    Currently, all cpu hotplug notifiers return NOTIFY_OK, NOTIFY_BAD, or
    NOTIFY_DONE, so cpu_notify() can return 0 or -EPERM with this change for
    now.

    (notifier_to_errno(NOTIFY_OK) == 0, notifier_to_errno(NOTIFY_DONE) == 0,
    notifier_to_errno(NOTIFY_BAD) == -EPERM)

    Forthcoming patches convert several cpu notifiers to return an encapsulated
    errno value with notifier_from_errno().
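
    As a rough illustration of the encoding described above, here is a
    userspace sketch of the two helpers. The constant values and function
    bodies are written from memory of include/linux/notifier.h and should be
    treated as assumptions rather than a copy of the kernel source; they do
    reproduce the three identities listed in this message.

```c
#include <errno.h>

/* Assumed values, modelled on include/linux/notifier.h. */
#define NOTIFY_DONE      0x0000
#define NOTIFY_OK        0x0001
#define NOTIFY_STOP_MASK 0x8000
#define NOTIFY_BAD       (NOTIFY_STOP_MASK | 0x0002)

/* Encode 0 or a negative errno into a notifier return value. */
static int notifier_from_errno(int err)
{
    if (err)
        return NOTIFY_STOP_MASK | (NOTIFY_OK - err);
    return NOTIFY_OK;
}

/* Decode a notifier return value back into 0 or a negative errno. */
static int notifier_to_errno(int ret)
{
    ret &= ~NOTIFY_STOP_MASK;
    return ret > NOTIFY_OK ? NOTIFY_OK - ret : 0;
}
```

    With this encoding a notifier that fails with, say, -ENOMEM can return
    notifier_from_errno(-ENOMEM), and the caller recovers -ENOMEM through
    notifier_to_errno() instead of a generic error.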

    Signed-off-by: Akinobu Mita
    Cc: Rusty Russell
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Akinobu Mita
     
  • No functional change. These are just wrappers of
    raw_cpu_notifier_call_chain.

    Signed-off-by: Akinobu Mita
    Cc: Rusty Russell
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Akinobu Mita
     

25 May, 2010

3 commits

  • Add global mutex zonelists_mutex to fix the possible race:

    CPU0 CPU1 CPU2
    (1) zone->present_pages += online_pages;
    (2) build_all_zonelists();
    (3) alloc_page();
    (4) free_page();
    (5) build_all_zonelists();
    (6) __build_all_zonelists();
    (7) zone->pageset = alloc_percpu();

    In step (3,4), zone->pageset still points to boot_pageset, so bad
    things may happen if 2+ nodes are in this state. Even if only 1 node
    is accessing the boot_pageset, (3) may still consume too much memory
    to fail the memory allocations in step (7).

    Besides, the atomic operation ensures alloc_percpu() in step (7) will never fail
    since there is a fresh memory block added in step (6).

    [haicheng.li@linux.intel.com: hold zonelists_mutex when build_all_zonelists]
    Signed-off-by: Haicheng Li
    Signed-off-by: Wu Fengguang
    Reviewed-by: Andi Kleen
    Cc: Christoph Lameter
    Cc: Mel Gorman
    Cc: Tejun Heo
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Haicheng Li
     
  • For each newly populated zone of a hot-added node, its pagesets need to be
    updated with dynamically allocated per_cpu_pageset structs for all possible CPUs:

    1) Detach zone->pageset from the shared boot_pageset
    at end of __build_all_zonelists().

    2) Use mutex to protect zone->pageset when it's still
    shared in onlined_pages()

    Otherwise, multiple zones of different nodes would share the same bootstrapping
    boot_pageset for the same CPU, which will eventually cause the kernel panic below:

    ------------[ cut here ]------------
    kernel BUG at mm/page_alloc.c:1239!
    invalid opcode: 0000 [#1] SMP
    ...
    Call Trace:
    [] __alloc_pages_nodemask+0x131/0x7b0
    [] alloc_pages_current+0x87/0xd0
    [] __page_cache_alloc+0x67/0x70
    [] __do_page_cache_readahead+0x120/0x260
    [] ra_submit+0x21/0x30
    [] ondemand_readahead+0x166/0x2c0
    [] page_cache_async_readahead+0x80/0xa0
    [] generic_file_aio_read+0x364/0x670
    [] nfs_file_read+0xca/0x130
    [] do_sync_read+0xfa/0x140
    [] vfs_read+0xb5/0x1a0
    [] sys_read+0x51/0x80
    [] system_call_fastpath+0x16/0x1b
    RIP [] get_page_from_freelist+0x883/0x900
    RSP
    ---[ end trace 4bda28328b9990db ]

    [akpm@linux-foundation.org: merge fix]
    Signed-off-by: Haicheng Li
    Signed-off-by: Wu Fengguang
    Reviewed-by: Andi Kleen
    Reviewed-by: Christoph Lameter
    Cc: Mel Gorman
    Cc: Tejun Heo
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Haicheng Li
     
  • Enable users to online CPUs even if the CPUs belong to a numa node which
    doesn't have onlined local memory.

    The zonelists (pg_data_t.node_zonelists[]) of a numa node are created either
    during system boot/init, or when local memory is onlined. For a
    numa node without onlined local memory, its zonelists are not initialized
    at present. As a result, any memory allocation operations executed by
    CPUs within this node will fail. In fact, an out-of-memory error is
    triggered when attempting to online CPUs before memory comes online.

    This patch tries to create zonelists for such numa nodes, so that
    memory allocations for this node can fall back to other nodes.

    [akpm@linux-foundation.org: remove unneeded export]
    [akpm@linux-foundation.org: coding-style fixes]
    Signed-off-by: minskey guo
    Cc: Minchan Kim
    Cc: Yasunori Goto
    Cc: Andi Kleen
    Cc: Christoph Lameter
    Cc: Tejun Heo
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    minskey guo
     

07 May, 2010

1 commit

  • Reimplement stop_machine using cpu_stop. As cpu stoppers are
    guaranteed to be available for all online cpus,
    stop_machine_create/destroy() are no longer necessary and removed.

    With resource management and synchronization handled by cpu_stop, the
    new implementation is much simpler. Asking the cpu_stop to execute
    the stop_cpu() state machine on all online cpus with cpu hotplug
    disabled is enough.

    stop_machine itself doesn't need to manage any global resources
    anymore, so all per-instance information is rolled into struct
    stop_machine_data and the mutex and all static data variables are
    removed.

    The previous implementation created and destroyed RT workqueues as
    necessary which made stop_machine() calls highly expensive on very
    large machines. According to Dimitri Sivanich, preventing the dynamic
    creation/destruction makes booting more than twice as fast on very
    large machines. cpu_stop resources are preallocated for all online
    cpus and should have the same effect.

    Signed-off-by: Tejun Heo
    Acked-by: Rusty Russell
    Acked-by: Peter Zijlstra
    Cc: Oleg Nesterov
    Cc: Dimitri Sivanich

    Tejun Heo
     

15 Apr, 2010

1 commit


03 Apr, 2010

1 commit

  • _cpu_down() changes the current task's affinity and then recovers it at
    the end. The problems are well known: we can't restore old_allowed if it
    was bound to the now-dead-cpu, and we can race with the userspace which
    can change cpu-affinity during unplug.

    _cpu_down() should not play with current->cpus_allowed at all. Instead,
    take_cpu_down() can migrate the caller of _cpu_down() after __cpu_disable()
    removes the dying cpu from cpu_online_mask.

    Signed-off-by: Oleg Nesterov
    Acked-by: Rafael J. Wysocki
    Signed-off-by: Peter Zijlstra
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Oleg Nesterov
     

30 Mar, 2010

1 commit

  • …it slab.h inclusion from percpu.h

    percpu.h is included by sched.h and module.h and thus ends up being
    included when building most .c files. percpu.h includes slab.h which
    in turn includes gfp.h making everything defined by the two files
    universally available and complicating inclusion dependencies.

    The percpu.h -> slab.h dependency is about to be removed. Prepare for
    this change by updating users of gfp and slab facilities to include those
    headers directly instead of assuming availability. As this conversion
    needs to touch a large number of source files, the following script is
    used as the basis of the conversion.

    http://userweb.kernel.org/~tj/misc/slabh-sweep.py

    The script does the following.

    * Scan files for gfp and slab usages and update includes such that
    only the necessary includes are there. ie. if only gfp is used,
    gfp.h, if slab is used, slab.h.

    * When the script inserts a new include, it looks at the include
    blocks and tries to put the new include such that its order conforms
    to its surroundings. It's put in the include block which contains
    core kernel includes, in the same order that the rest are ordered -
    alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
    doesn't seem to be any matching order.

    * If the script can't find a place to put a new include (mostly
    because the file doesn't have a fitting include block), it prints out
    an error message indicating which .h file needs to be added to the
    file.

    The conversion was done in the following steps.

    1. The initial automatic conversion of all .c files updated slightly
    over 4000 files, deleting around 700 includes and adding ~480 gfp.h
    and ~3000 slab.h inclusions. The script emitted errors for ~400
    files.

    2. Each error was manually checked. Some didn't need the inclusion,
    some needed manual addition while adding it to implementation .h or
    embedding .c file was more appropriate for others. This step added
    inclusions to around 150 files.

    3. The script was run again and the output was compared to the edits
    from #2 to make sure no file was left behind.

    4. Several build tests were done and a couple of problems were fixed.
    e.g. lib/decompress_*.c used malloc/free() wrappers around slab
    APIs requiring slab.h to be added manually.

    5. The script was run on all .h files but without automatically
    editing them as sprinkling gfp.h and slab.h inclusions around .h
    files could easily lead to inclusion dependency hell. Most gfp.h
    inclusion directives were ignored as stuff from gfp.h was usually
    widely available and often used in preprocessor macros. Each
    slab.h inclusion directive was examined and added manually as
    necessary.

    6. percpu.h was updated not to include slab.h.

    7. Build tests were done on the following configurations and failures
    were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my
    distributed build env didn't work with gcov compiles) and a few
    more options had to be turned off depending on archs to make things
    build (like ipr on powerpc/64 which failed due to missing writeq).

    * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
    * powerpc and powerpc64 SMP allmodconfig
    * sparc and sparc64 SMP allmodconfig
    * ia64 SMP allmodconfig
    * s390 SMP allmodconfig
    * alpha SMP allmodconfig
    * um on x86_64 SMP allmodconfig

    8. percpu.h modifications were reverted so that it could be applied as
    a separate patch and serve as bisection point.

    Given the fact that I had only a couple of failures from tests on step
    6, I'm fairly confident about the coverage of this conversion patch.
    If there is a breakage, it's likely to be something in one of the arch
    headers, which should be easily discoverable on most builds of
    the specific arch.

    Signed-off-by: Tejun Heo <tj@kernel.org>
    Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
    Cc: Ingo Molnar <mingo@redhat.com>
    Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>

    Tejun Heo
     

07 Mar, 2010

1 commit


28 Jan, 2010

2 commits

  • Due to an incorrect line break the output currently contains tabs.
    Also remove trailing space.

    The actual output that logcheck sent me looked like this:
    Task events/1 (pid = 10) is on cpu 1^I^I^I^I(state = 1, flags = 84208040)

    After this patch it becomes:
    Task events/1 (pid = 10) is on cpu 1 (state = 1, flags = 84208040)
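
    The class of bug fixed here can be reproduced in plain C. The sketch
    below is a userspace illustration, not the kernel's printk() call: the
    function names are hypothetical and the indentation uses spaces where the
    kernel source had tabs. A backslash-newline inside a string literal pulls
    the next line's indentation into the string, while adjacent string
    literals are joined without it.

```c
#include <stdio.h>
#include <string.h>

/*
 * Buggy form: the backslash-newline sits inside the string literal, so the
 * indentation of the continuation line becomes part of the output.
 */
static void format_buggy(char *buf, size_t len, int cpu, int state)
{
    snprintf(buf, len, "is on cpu %d\
        (state = %d)", cpu, state);
}

/* Fixed form: adjacent string literals are concatenated by the compiler. */
static void format_fixed(char *buf, size_t len, int cpu, int state)
{
    snprintf(buf, len, "is on cpu %d "
             "(state = %d)", cpu, state);
}
```

    format_fixed() yields a single space before "(state", whereas
    format_buggy() carries whatever whitespace precedes "(state" in the
    source file.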

    Signed-off-by: Frans Pop
    Signed-off-by: Peter Zijlstra
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Frans Pop
     
  • We moved to migrate on wakeup, which means that sleeping tasks could
    still be present on offline cpus. Amend the check to only test running
    tasks.

    Reported-by: Heiko Carstens
    Signed-off-by: Peter Zijlstra
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     

17 Dec, 2009

1 commit

  • Sachin found cpu hotplug test failures on powerpc, which made
    the kernel hang on his POWER box.

    The problem is that we fail to re-activate a cpu when a
    hot-unplug fails. Fix this by moving the de-activation into
    _cpu_down after doing the initial checks.

    Remove the synchronize_sched() calls and rely on those implied
    by rebuilding the sched domains using the new mask.

    Reported-by: Sachin Sant
    Signed-off-by: Xiaotian Feng
    Tested-by: Sachin Sant
    Signed-off-by: Peter Zijlstra
    Cc: Mike Galbraith
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Xiaotian Feng
     

13 Dec, 2009

1 commit

  • …l/git/tip/linux-2.6-tip

    * 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (21 commits)
    sched: Remove forced2_migrations stats
    sched: Fix memory leak in two error corner cases
    sched: Fix build warning in get_update_sysctl_factor()
    sched: Update normalized values on user updates via proc
    sched: Make tunable scaling style configurable
    sched: Fix missing sched tunable recalculation on cpu add/remove
    sched: Fix task priority bug
    sched: cgroup: Implement different treatment for idle shares
    sched: Remove unnecessary RCU exclusion
    sched: Discard some old bits
    sched: Clean up check_preempt_wakeup()
    sched: Move update_curr() in check_preempt_wakeup() to avoid redundant call
    sched: Sanitize fork() handling
    sched: Clean up ttwu() rq locking
    sched: Remove rq->clock coupling from set_task_cpu()
    sched: Consolidate select_task_rq() callers
    sched: Remove sysctl.sched_features
    sched: Protect sched_rr_get_param() access to task->sched_class
    sched: Protect task->cpus_allowed access in sched_getaffinity()
    sched: Fix balance vs hotplug race
    ...

    Fixed up conflicts in kernel/sysctl.c (due to sysctl cleanup)

    Linus Torvalds
     

07 Dec, 2009

1 commit

  • Since (e761b77: cpu hotplug, sched: Introduce cpu_active_map and redo
    sched domain management) we have cpu_active_mask which is supposed to rule
    scheduler migration and load-balancing, except it never (fully) did.

    The particular problem being solved here is a crash in try_to_wake_up()
    where select_task_rq() ends up selecting an offline cpu because
    select_task_rq_fair() trusts the sched_domain tree to reflect the
    current state of affairs, similarly select_task_rq_rt() trusts the
    root_domain.

    However, the sched_domains are updated from CPU_DEAD, which is after the
    cpu is taken offline and after stop_machine is done. Therefore it can
    race perfectly well with code assuming the domains are right.

    Cure this by building the domains from cpu_active_mask on
    CPU_DOWN_PREPARE.

    Signed-off-by: Peter Zijlstra
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     

26 Nov, 2009

1 commit

  • Limit the number of per cpu calibration messages by only
    printing out results for the first cpu to boot.

    Also, don't print "CPUx is down" as this is expected, and we
    don't need 4096 reminders... ;-)

    Signed-off-by: Mike Travis
    Cc: Heiko Carstens
    Cc: Roland Dreier
    Cc: Randy Dunlap
    Cc: Tejun Heo
    Cc: Andi Kleen
    Cc: Greg Kroah-Hartman
    Cc: Yinghai Lu
    Cc: David Rientjes
    Cc: Steven Rostedt
    Cc: Rusty Russell
    Cc: Hidetoshi Seto
    Cc: Jack Steiner
    Cc: Frederic Weisbecker
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Mike Travis
     

16 Sep, 2009

1 commit

  • * 'x86-pat-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    x86, pat: Fix cacheflush address in change_page_attr_set_clr()
    mm: remove !NUMA condition from PAGEFLAGS_EXTENDED condition set
    x86: Fix earlyprintk=dbgp for machines without NX
    x86, pat: Sanity check remap_pfn_range for RAM region
    x86, pat: Lookup the protection from memtype list on vm_insert_pfn()
    x86, pat: Add lookup_memtype to get the current memtype of a paddr
    x86, pat: Use page flags to track memtypes of RAM pages
    x86, pat: Generalize the use of page flag PG_uncached
    x86, pat: Add rbtree to do quick lookup in memtype tracking
    x86, pat: Add PAT reserve free to io_mapping* APIs
    x86, pat: New i/f for driver to request memtype for IO regions
    x86, pat: ioremap to follow same PAT restrictions as other PAT users
    x86, pat: Keep identity maps consistent with mmaps even when pat_disabled
    x86, mtrr: make mtrr_aps_delayed_init static bool
    x86, pat/mtrr: Rendezvous all the cpus for MTRR/PAT init
    generic-ipi: Allow cpus not yet online to call smp_call_function with irqs disabled
    x86: Fix an incorrect argument of reserve_bootmem()
    x86: Fix system crash when loading with "reservetop" parameter

    Linus Torvalds
     

02 Sep, 2009

1 commit


22 Aug, 2009

1 commit

  • SDM Vol 3a section titled "MTRR considerations in MP systems" specifies
    the need for synchronizing the logical cpus while initializing/updating
    MTRR.

    Currently the Linux kernel synchronizes all cpus only when
    a single MTRR register is programmed/updated. During an AP online
    (during boot/cpu-online/resume), where we initialize all the MTRR/PAT registers,
    we don't follow this synchronization algorithm.

    This can lead to scenarios where, during a dynamic cpu online, that logical cpu
    is initializing MTRR/PAT with cache disabled (cr0.cd=1) etc. while the other logical
    HT sibling continues to run (also with cache disabled because of cr0.cd=1
    on its sibling).

    Starting from Westmere, VMX transitions with cr0.cd=1 don't work properly
    (because of some VMX performance optimizations) and the above scenario
    (with one logical cpu doing VMX activity and another logical cpu coming online)
    can result in system crash.

    Fix the MTRR initialization by doing a rendezvous of all the cpus. During
    boot and resume, we delay the MTRR/PAT init for APs until all the
    logical cpus come online, and the rendezvous process at the end of the APs' bringup
    will initialize the MTRR/PAT for all APs.

    For dynamic single cpu online, we synchronize all the logical cpus and
    do the MTRR/PAT init on the AP that is coming online.

    Signed-off-by: Suresh Siddha
    Signed-off-by: H. Peter Anvin

    Suresh Siddha
     

22 Jul, 2009

1 commit

  • Support for graceful handling of sleep states (S3/S4/S5) after an Intel(R) TXT launch.

    Without this patch, attempting to place the system in one of the ACPI sleep
    states (S3/S4/S5) will cause the TXT hardware to treat this as an attack and
    will cause a system reset, with memory locked. Not only may the subsequent
    memory scrub take some time, but the platform will be unable to enter the
    requested power state.

    This patch calls back into the tboot so that it may properly and securely clean
    up system state and clear the secrets-in-memory flag, after which it will place
    the system into the requested sleep state using ACPI information passed by the kernel.

    arch/x86/kernel/smpboot.c | 2 ++
    drivers/acpi/acpica/hwsleep.c | 3 +++
    kernel/cpu.c | 7 ++++++-
    3 files changed, 11 insertions(+), 1 deletion(-)

    Signed-off-by: Joseph Cihula
    Signed-off-by: Shane Wang
    Signed-off-by: H. Peter Anvin

    Joseph Cihula
     

23 Jun, 2009

1 commit

  • SLAB uses get/put_online_cpus() which use a mutex which is itself only
    initialized when cpu_hotplug_init() is called. Currently we hang during
    boot in SLAB due to doing that too late.

    Reported by James Bottomley and Sachin Sant (and possibly others).
    Debugged by Benjamin Herrenschmidt.

    This just removes the dynamic initialization of the data structures, and
    replaces it with a static one, avoiding this dependency entirely, and
    removing one unnecessary special initcall.
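
    The idea of replacing a run-time init call with a static initializer can
    be sketched in userspace C with pthreads. This is an analogy under stated
    assumptions, not the kernel code: the kernel uses its own mutex type and
    initializer macro, and the real get/put_online_cpus() do more than adjust
    a counter. online_cpus_refcount() is a hypothetical helper added only so
    the state can be inspected.

```c
#include <pthread.h>

/*
 * Statically initialized state: usable from the very first caller, with no
 * dependency on an init function having run earlier during startup.
 */
static struct {
    pthread_mutex_t lock;
    int refcount;
} cpu_hotplug = {
    .lock = PTHREAD_MUTEX_INITIALIZER,
    .refcount = 0,
};

/* Simplified reader-side entry: take a reference under the lock. */
static void get_online_cpus(void)
{
    pthread_mutex_lock(&cpu_hotplug.lock);
    cpu_hotplug.refcount++;
    pthread_mutex_unlock(&cpu_hotplug.lock);
}

/* Simplified reader-side exit: drop the reference under the lock. */
static void put_online_cpus(void)
{
    pthread_mutex_lock(&cpu_hotplug.lock);
    cpu_hotplug.refcount--;
    pthread_mutex_unlock(&cpu_hotplug.lock);
}

/* Hypothetical helper for inspecting the counter in this sketch. */
static int online_cpus_refcount(void)
{
    int n;

    pthread_mutex_lock(&cpu_hotplug.lock);
    n = cpu_hotplug.refcount;
    pthread_mutex_unlock(&cpu_hotplug.lock);
    return n;
}
```

    Because the lock is valid at load time, an early caller such as an
    allocator can take it safely, which is the ordering problem this commit
    removes.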

    Tested-by: Sachin Sant
    Tested-by: James Bottomley
    Tested-by: Benjamin Herrenschmidt
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     

30 Mar, 2009

1 commit


08 Jan, 2009

1 commit

  • disable_nonboot_cpus calls _cpu_down. But _cpu_down requires that the
    caller already created the stop_machine workqueue (like cpu_down does).
    Otherwise a call to stop_machine will lead to accesses to random memory
    regions.

    When introducing this new interface (9ea09af3bd3090e8349ca2899ca2011bd94cda85
    "stop_machine: introduce stop_machine_create/destroy") I missed the second
    call site of _cpu_down.
    So add the missing stop_machine_create/destroy calls to disable_nonboot_cpus
    as well.

    Fixes suspend-to-ram/disk and also this bug:

    [ 286.547348] BUG: unable to handle kernel paging request at 6b6b6b6b
    [ 286.548940] IP: [] __stop_machine+0x88/0xe3
    [ 286.550598] Oops: 0002 [#1] SMP
    [ 286.560580] Pid: 3273, comm: halt Not tainted (2.6.28-06127-g238c6d5
    [ 286.560580] EIP: is at __stop_machine+0x88/0xe3
    [ 286.560580] Process halt (pid: 3273, ti=f1a28000 task=f4530f30
    [ 286.560580] Call Trace:
    [ 286.560580] [] ? _cpu_down+0x10f/0x234
    [ 286.560580] [] ? disable_nonboot_cpus+0x58/0xdc
    [ 286.560580] [] ? kernel_poweroff+0x22/0x39
    [ 286.560580] [] ? sys_reboot+0xde/0x14c
    [ 286.560580] [] ? complete_signal+0x179/0x191
    [ 286.560580] [] ? send_signal+0x1cc/0x1e1
    [ 286.560580] [] ? _spin_unlock_irqrestore+0x2d/0x3c
    [ 286.560580] [] ? group_send_signal_info+0x58/0x61
    [ 286.560580] [] ? kill_pid_info+0x30/0x3a
    [ 286.560580] [] ? sys_kill+0x75/0x13a
    [ 286.560580] [] ? mntput_no_expire+0x1f/0x101
    [ 286.560580] [] ? dput+0x1e/0x105
    [ 286.560580] [] ? __fput+0x150/0x158
    [ 286.560580] [] ? audit_syscall_entry+0x137/0x159
    [ 286.560580] [] ? sysenter_do_call+0x12/0x34

    Reported-and-tested-by: "Justin P. Mattock"
    Reviewed-by: Pekka Enberg
    Signed-off-by: Heiko Carstens
    Tested-by: Ingo Molnar
    Signed-off-by: Linus Torvalds

    Heiko Carstens
     

05 Jan, 2009

1 commit

  • Introduce stop_machine_create/destroy. With this interface subsystems
    that need a non-failing stop_machine environment can create the
    stop_machine threads before actually calling stop_machine.
    When the threads aren't needed anymore they can be killed with
    stop_machine_destroy again.

    When stop_machine gets called and the threads aren't present they
    will be created and destroyed automatically. This restores the old
    behaviour of stop_machine.

    This patch also converts cpu hotplug to the new interface since it
    is special: cpu_down calls __stop_machine instead of stop_machine.
    However the kstop threads will only be created when stop_machine
    gets called.

    Changing the code so that the threads would be created automatically
    on __stop_machine is currently not possible: when __stop_machine gets
    called we hold cpu_add_remove_lock, which is the same lock that
    create_rt_workqueue would take. So the workqueue needs to be created
    before the cpu hotplug code locks cpu_add_remove_lock.

    Signed-off-by: Heiko Carstens
    Signed-off-by: Rusty Russell

    Heiko Carstens
     

01 Jan, 2009

1 commit

  • Impact: Reduce kernel stack and memory usage, use new cpumask API.

    Use cpumask_var_t for take_cpu_down() stack var, and frozen_cpus.

    Note that notify_cpu_starting() can be called before core_initcall
    allocates frozen_cpus, but the NULL check is optimized out by gcc for
    the CONFIG_CPUMASK_OFFSTACK=n case.

    Signed-off-by: Rusty Russell

    Rusty Russell
     

30 Dec, 2008

2 commits

  • They're only for use in boot/cpu hotplug code anyway, and this avoids
    the use of deprecated cpu_*_map.

    Stephen Rothwell points out that gcc 4.2.4 (on powerpc at least)
    didn't like the cast away of const anyway:

    include/linux/cpumask.h: In function 'set_cpu_possible':
    include/linux/cpumask.h:1052: warning: passing argument 2 of 'cpumask_set_cpu' discards qualifiers from pointer target type

    So this kills two birds with one stone.

    Signed-off-by: Rusty Russell

    Rusty Russell
     
  • Impact: cleanup

    This implements the obsolescent cpu_online_map in terms of
    cpu_online_mask, rather than the other way around. Same for the other
    maps.

    The documentation comments are also updated to refer to _mask rather
    than _map.
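
    A minimal userspace model of this inversion, assuming a single 64-bit
    word as the mask storage: the mask pointer is the primary interface and
    the old _map name is only a macro layered on top of it. The struct layout
    and the cpu_online_bits name are simplifications for this sketch and are
    not claimed to match the kernel's definitions.

```c
#include <stdbool.h>

#define NR_CPUS 64

/* Simplified mask type: one bit per cpu in a single 64-bit word. */
typedef struct cpumask {
    unsigned long long bits;
} cpumask_t;

/* Backing storage is private; readers go through the const pointer. */
static cpumask_t cpu_online_bits;
const struct cpumask *const cpu_online_mask = &cpu_online_bits;

/* Obsolescent name, now defined in terms of the mask. */
#define cpu_online_map (*(cpumask_t *)cpu_online_mask)

/* Writers use a dedicated setter instead of touching the map directly. */
static void set_cpu_online(unsigned int cpu, bool online)
{
    if (cpu >= NR_CPUS)
        return;
    if (online)
        cpu_online_bits.bits |= 1ULL << cpu;
    else
        cpu_online_bits.bits &= ~(1ULL << cpu);
}

static bool cpu_online(unsigned int cpu)
{
    if (cpu >= NR_CPUS)
        return false;
    return (cpu_online_mask->bits >> cpu) & 1ULL;
}
```

    Old code that still reads cpu_online_map keeps working, while new code
    reads through cpu_online_mask and modifies state only via the setter.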

    Signed-off-by: Rusty Russell
    Signed-off-by: Mike Travis

    Rusty Russell
     

13 Dec, 2008

1 commit

  • Impact: cleanup

    Each SMP arch defines these themselves. Move them to a central
    location.

    Twists:
    1) Some archs (m32r, parisc, s390) set possible_map to all 1, so we add a
    CONFIG_INIT_ALL_POSSIBLE for this rather than break them.

    2) mips and sparc32 '#define cpu_possible_map phys_cpu_present_map'.
    Those archs simply have phys_cpu_present_map replaced everywhere.

    3) Alpha defined cpu_possible_map to cpu_present_map; this is tricky
    so I just manipulate them both in sync.

    4) IA64, cris and m32r have gratuitous 'extern cpumask_t cpu_possible_map'
    declarations.

    Signed-off-by: Rusty Russell
    Reviewed-by: Grant Grundler
    Tested-by: Tony Luck
    Acked-by: Ingo Molnar
    Cc: Mike Travis
    Cc: ink@jurassic.park.msu.ru
    Cc: rmk@arm.linux.org.uk
    Cc: starvik@axis.com
    Cc: tony.luck@intel.com
    Cc: takata@linux-m32r.org
    Cc: ralf@linux-mips.org
    Cc: grundler@parisc-linux.org
    Cc: paulus@samba.org
    Cc: schwidefsky@de.ibm.com
    Cc: lethal@linux-sh.org
    Cc: wli@holomorphy.com
    Cc: davem@davemloft.net
    Cc: jdike@addtoit.com
    Cc: mingo@redhat.com

    Rusty Russell