08 Apr, 2014

1 commit

  • Pull CPU hotplug notifiers registration fixes from Rafael Wysocki:
    "The purpose of this single series of commits from Srivatsa S Bhat
    (with a small piece from Gautham R Shenoy) touching multiple
    subsystems that use CPU hotplug notifiers is to provide a way to
    register them that will not lead to deadlocks with CPU online/offline
    operations as described in the changelog of commit 93ae4f978ca7f ("CPU
    hotplug: Provide lockless versions of callback registration
    functions").

    The first three commits in the series introduce the API and document
    it and the rest simply goes through the users of CPU hotplug notifiers
    and converts them to using the new method"

    * tag 'cpu-hotplug-3.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (52 commits)
    net/iucv/iucv.c: Fix CPU hotplug callback registration
    net/core/flow.c: Fix CPU hotplug callback registration
    mm, zswap: Fix CPU hotplug callback registration
    mm, vmstat: Fix CPU hotplug callback registration
    profile: Fix CPU hotplug callback registration
    trace, ring-buffer: Fix CPU hotplug callback registration
    xen, balloon: Fix CPU hotplug callback registration
    hwmon, via-cputemp: Fix CPU hotplug callback registration
    hwmon, coretemp: Fix CPU hotplug callback registration
    thermal, x86-pkg-temp: Fix CPU hotplug callback registration
    octeon, watchdog: Fix CPU hotplug callback registration
    oprofile, nmi-timer: Fix CPU hotplug callback registration
    intel-idle: Fix CPU hotplug callback registration
    clocksource, dummy-timer: Fix CPU hotplug callback registration
    drivers/base/topology.c: Fix CPU hotplug callback registration
    acpi-cpufreq: Fix CPU hotplug callback registration
    zsmalloc: Fix CPU hotplug callback registration
    scsi, fcoe: Fix CPU hotplug callback registration
    scsi, bnx2fc: Fix CPU hotplug callback registration
    scsi, bnx2i: Fix CPU hotplug callback registration
    ...

    Linus Torvalds
     

20 Mar, 2014

1 commit

  • The following method of CPU hotplug callback registration is not safe
    due to the possibility of an ABBA deadlock involving the cpu_add_remove_lock
    and the cpu_hotplug.lock.

    get_online_cpus();

    for_each_online_cpu(cpu)
    init_cpu(cpu);

    register_cpu_notifier(&foobar_cpu_notifier);

    put_online_cpus();

    The deadlock is shown below:

    CPU 0 CPU 1
    ----- -----

    Acquire cpu_hotplug.lock
    [via get_online_cpus()]

    CPU online/offline operation
    takes cpu_add_remove_lock
    [via cpu_maps_update_begin()]

    Try to acquire
    cpu_add_remove_lock
    [via register_cpu_notifier()]

    CPU online/offline operation
    tries to acquire cpu_hotplug.lock
    [via cpu_hotplug_begin()]

    *** DEADLOCK! ***

    The problem here is that callback registration takes the locks in one order
    whereas the CPU hotplug operations take the same locks in the opposite order.
    To avoid this issue and to provide a race-free method to register CPU hotplug
    callbacks (along with initialization of already online CPUs), introduce new
    variants of the callback registration APIs that simply register the callbacks
    without holding the cpu_add_remove_lock during the registration. That way,
    we can avoid the ABBA scenario. However, we will need to hold the
    cpu_add_remove_lock throughout the entire critical section, to protect updates
    to the callback/notifier chain.

    This can be achieved by writing the callback registration code as follows:

    cpu_maps_update_begin(); [ or cpu_notifier_register_begin(); see below ]

    for_each_online_cpu(cpu)
    init_cpu(cpu);

    /* This doesn't take the cpu_add_remove_lock */
    __register_cpu_notifier(&foobar_cpu_notifier);

    cpu_maps_update_done(); [ or cpu_notifier_register_done(); see below ]

    Note that we can't use get_online_cpus() here instead of cpu_maps_update_begin()
    because the cpu_hotplug.lock is dropped during the invocation of CPU_POST_DEAD
    notifiers, and hence get_online_cpus() cannot provide the necessary
    synchronization to protect the callback/notifier chains against concurrent
    reads and writes. On the other hand, since the cpu_add_remove_lock protects
    the entire hotplug operation (including CPU_POST_DEAD), we can use
    cpu_maps_update_begin/done() to guarantee proper synchronization.

    Also, since cpu_maps_update_begin/done() is like a super-set of
    get/put_online_cpus(), the former naturally protects the critical sections
    from concurrent hotplug operations.

    Since the names cpu_maps_update_begin/done() don't make much sense in CPU
    hotplug callback registration scenarios, we'll introduce new APIs named
    cpu_notifier_register_begin/done() and map them to cpu_maps_update_begin/done().

    In summary, introduce the lockless variants of un/register_cpu_notifier() and
    also export the cpu_notifier_register_begin/done() APIs for use by modules.
    This way, we provide a race-free way to register hotplug callbacks as well as
    perform initialization for the CPUs that are already online.

    Cc: Thomas Gleixner
    Cc: Andrew Morton
    Cc: Peter Zijlstra
    Cc: Ingo Molnar
    Acked-by: Oleg Nesterov
    Acked-by: Toshi Kani
    Reviewed-by: Gautham R. Shenoy
    Signed-off-by: Srivatsa S. Bhat
    Signed-off-by: Rafael J. Wysocki

    Srivatsa S. Bhat
     

19 Feb, 2014

1 commit

  • The x86 CPU feature modalias handling existed before it was reimplemented
    generically. This patch aligns the x86 handling so that it
    (a) reuses some more code that is now generic;
    (b) uses the generic format for the modalias module metadata entry, i.e., it
    now uses 'cpu:type:x86,venVVVVfamFFFFmodMMMM:feature:,XXXX,YYYY' instead of
    the 'x86cpu:vendor:VVVV:family:FFFF:model:MMMM:feature:,XXXX,YYYY' that was
    used before.

    Signed-off-by: Ard Biesheuvel
    Acked-by: H. Peter Anvin
    Signed-off-by: Greg Kroah-Hartman

    Ard Biesheuvel
     

14 Nov, 2013

1 commit

  • Pull ACPI and power management updates from Rafael J Wysocki:

    - New power capping framework and the the Intel Running Average Power
    Limit (RAPL) driver using it from Srinivas Pandruvada and Jacob Pan.

    - Addition of the in-kernel switching feature to the arm_big_little
    cpufreq driver from Viresh Kumar and Nicolas Pitre.

    - cpufreq support for iMac G5 from Aaro Koskinen.

    - Baytrail processors support for intel_pstate from Dirk Brandewie.

    - cpufreq support for Midway/ECX-2000 from Mark Langsdorf.

    - ARM vexpress/TC2 cpufreq support from Sudeep KarkadaNagesha.

    - ACPI power management support for the I2C and SPI bus types from Mika
    Westerberg and Lv Zheng.

    - cpufreq core fixes and cleanups from Viresh Kumar, Srivatsa S Bhat,
    Stratos Karafotis, Xiaoguang Chen, Lan Tianyu.

    - cpufreq drivers updates (mostly fixes and cleanups) from Viresh
    Kumar, Aaro Koskinen, Jungseok Lee, Sudeep KarkadaNagesha, Lukasz
    Majewski, Manish Badarkhe, Hans-Christian Egtvedt, Evgeny Kapaev.

    - intel_pstate updates from Dirk Brandewie and Adrian Huang.

    - ACPICA update to version 20130927 includig fixes and cleanups and
    some reduction of divergences between the ACPICA code in the kernel
    and ACPICA upstream in order to improve the automatic ACPICA patch
    generation process. From Bob Moore, Lv Zheng, Tomasz Nowicki, Naresh
    Bhat, Bjorn Helgaas, David E Box.

    - ACPI IPMI driver fixes and cleanups from Lv Zheng.

    - ACPI hotplug fixes and cleanups from Bjorn Helgaas, Toshi Kani, Zhang
    Yanfei, Rafael J Wysocki.

    - Conversion of the ACPI AC driver to the platform bus type and
    multiple driver fixes and cleanups related to ACPI from Zhang Rui.

    - ACPI processor driver fixes and cleanups from Hanjun Guo, Jiang Liu,
    Bartlomiej Zolnierkiewicz, Mathieu Rhéaume, Rafael J Wysocki.

    - Fixes and cleanups and new blacklist entries related to the ACPI
    video support from Aaron Lu, Felipe Contreras, Lennart Poettering,
    Kirill Tkhai.

    - cpuidle core cleanups from Viresh Kumar and Lorenzo Pieralisi.

    - cpuidle drivers fixes and cleanups from Daniel Lezcano, Jingoo Han,
    Bartlomiej Zolnierkiewicz, Prarit Bhargava.

    - devfreq updates from Sachin Kamat, Dan Carpenter, Manish Badarkhe.

    - Operation Performance Points (OPP) core updates from Nishanth Menon.

    - Runtime power management core fix from Rafael J Wysocki and update
    from Ulf Hansson.

    - Hibernation fixes from Aaron Lu and Rafael J Wysocki.

    - Device suspend/resume lockup detection mechanism from Benoit Goby.

    - Removal of unused proc directories created for various ACPI drivers
    from Lan Tianyu.

    - ACPI LPSS driver fix and new device IDs for the ACPI platform scan
    handler from Heikki Krogerus and Jarkko Nikula.

    - New ACPI _OSI blacklist entry for Toshiba NB100 from Levente Kurusa.

    - Assorted fixes and cleanups related to ACPI from Andy Shevchenko, Al
    Stone, Bartlomiej Zolnierkiewicz, Colin Ian King, Dan Carpenter,
    Felipe Contreras, Jianguo Wu, Lan Tianyu, Yinghai Lu, Mathias Krause,
    Liu Chuansheng.

    - Assorted PM fixes and cleanups from Andy Shevchenko, Thierry Reding,
    Jean-Christophe Plagniol-Villard.

    * tag 'pm+acpi-3.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (386 commits)
    cpufreq: conservative: fix requested_freq reduction issue
    ACPI / hotplug: Consolidate deferred execution of ACPI hotplug routines
    PM / runtime: Use pm_runtime_put_sync() in __device_release_driver()
    ACPI / event: remove unneeded NULL pointer check
    Revert "ACPI / video: Ignore BIOS initial backlight value for HP 250 G1"
    ACPI / video: Quirk initial backlight level 0
    ACPI / video: Fix initial level validity test
    intel_pstate: skip the driver if ACPI has power mgmt option
    PM / hibernate: Avoid overflow in hibernate_preallocate_memory()
    ACPI / hotplug: Do not execute "insert in progress" _OST
    ACPI / hotplug: Carry out PCI root eject directly
    ACPI / hotplug: Merge device hot-removal routines
    ACPI / hotplug: Make acpi_bus_hot_remove_device() internal
    ACPI / hotplug: Simplify device ejection routines
    ACPI / hotplug: Fix handle_root_bridge_removal()
    ACPI / hotplug: Refuse to hot-remove all objects with disabled hotplug
    ACPI / scan: Start matching drivers after trying scan handlers
    ACPI: Remove acpi_pci_slot_init() headers from internal.h
    ACPI / blacklist: fix name of ThinkPad Edge E530
    PowerCap: Fix build error with option -Werror=format-security
    ...

    Conflicts:
    arch/arm/mach-omap2/opp.c
    drivers/Kconfig
    drivers/spi/spi.c

    Linus Torvalds
     

16 Oct, 2013

1 commit

  • Use for_each_node_by_type() to iterate all cpu nodes in the
    system.

    Provide and overridable function arch_find_n_match_cpu_physical_id,
    which sees if the given device node matches 'cpu' and if so sets
    '*thread' when non-NULL to the cpu thread number within the core.

    The default implementation behaves the same as the existing code.

    Add a sparc64 implementation.

    Signed-off-by: David S. Miller
    Tested-by: Sudeep KarkadaNagesha
    Signed-off-by: Grant Likely

    David Miller
     

01 Oct, 2013

1 commit

  • cpu_hotplug_driver_lock() serializes CPU online/offline operations
    when ARCH_CPU_PROBE_RELEASE is set. This lock interface is no longer
    necessary with the following reason:

    - lock_device_hotplug() now protects CPU online/offline operations,
    including the probe & release interfaces enabled by
    ARCH_CPU_PROBE_RELEASE. The use of cpu_hotplug_driver_lock() is
    redundant.
    - cpu_hotplug_driver_lock() is only valid when ARCH_CPU_PROBE_RELEASE
    is defined, which is misleading and is only enabled on powerpc.

    This patch removes the cpu_hotplug_driver_lock() interface. As
    a result, ARCH_CPU_PROBE_RELEASE only enables / disables the cpu
    probe & release interface as intended. There is no functional change
    in this patch.

    Signed-off-by: Toshi Kani
    Reviewed-by: Nathan Fontenot
    Signed-off-by: Rafael J. Wysocki

    Toshi Kani
     

27 Aug, 2013

1 commit

  • * pm-cpufreq: (60 commits)
    cpufreq: pmac32-cpufreq: remove device tree parsing for cpu nodes
    cpufreq: pmac64-cpufreq: remove device tree parsing for cpu nodes
    cpufreq: maple-cpufreq: remove device tree parsing for cpu nodes
    cpufreq: arm_big_little: remove device tree parsing for cpu nodes
    cpufreq: kirkwood-cpufreq: remove device tree parsing for cpu nodes
    cpufreq: spear-cpufreq: remove device tree parsing for cpu nodes
    cpufreq: highbank-cpufreq: remove device tree parsing for cpu nodes
    cpufreq: cpufreq-cpu0: remove device tree parsing for cpu nodes
    cpufreq: imx6q-cpufreq: remove device tree parsing for cpu nodes
    drivers/bus: arm-cci: avoid parsing DT for cpu device nodes
    ARM: mvebu: remove device tree parsing for cpu nodes
    ARM: topology: remove hwid/MPIDR dependency from cpu_capacity
    of/device: add helper to get cpu device node from logical cpu index
    driver/core: cpu: initialize of_node in cpu's device struture
    ARM: DT/kernel: define ARM specific arch_match_cpu_phys_id
    of: move of_get_cpu_node implementation to DT core library
    powerpc: refactor of_get_cpu_node to support other architectures
    openrisc: remove undefined of_get_cpu_node declaration
    microblaze: remove undefined of_get_cpu_node declaration
    cpufreq: fix bad unlock balance on !CONFIG_SMP
    ...

    Rafael J. Wysocki
     

21 Aug, 2013

1 commit

  • This patch moves the generalized implementation of of_get_cpu_node from
    PowerPC to DT core library, thereby adding support for retrieving cpu
    node for a given logical cpu index on any architecture.

    The CPU subsystem can now use this function to assign of_node in the
    cpu device while registering CPUs.

    It is recommended to use these helper function only in pre-SMP/early
    initialisation stages to retrieve CPU device node pointers in logical
    ordering. Once the cpu devices are registered, it can be retrieved easily
    from cpu device of_node which avoids unnecessary parsing and matching.

    Cc: Benjamin Herrenschmidt
    Cc: Grant Likely
    Acked-by: Rob Herring
    Signed-off-by: Sudeep KarkadaNagesha

    Sudeep KarkadaNagesha
     

13 Aug, 2013

1 commit

  • CPU system maps are protected with reader/writer locks. The reader
    lock, get_online_cpus(), assures that the maps are not updated while
    holding the lock. The writer lock, cpu_hotplug_begin(), is used to
    udpate the cpu maps along with cpu_maps_update_begin().

    However, the ACPI processor handler updates the cpu maps without
    holding the the writer lock.

    acpi_map_lsapic() is called from acpi_processor_hotadd_init() to
    update cpu_possible_mask and cpu_present_mask. acpi_unmap_lsapic()
    is called from acpi_processor_remove() to update cpu_possible_mask.
    Currently, they are either unprotected or protected with the reader
    lock, which is not correct.

    For example, the get_online_cpus() below is supposed to assure that
    cpu_possible_mask is not changed while the code is iterating with
    for_each_possible_cpu().

    get_online_cpus();
    for_each_possible_cpu(cpu) {
    :
    }
    put_online_cpus();

    However, this lock has no protection with CPU hotplug since the ACPI
    processor handler does not use the writer lock when it updates
    cpu_possible_mask. The reader lock does not serialize within the
    readers.

    This patch protects them with the writer lock with cpu_hotplug_begin()
    along with cpu_maps_update_begin(), which must be held before calling
    cpu_hotplug_begin(). It also protects arch_register_cpu() /
    arch_unregister_cpu(), which creates / deletes a sysfs cpu device
    interface. For this purpose it changes cpu_hotplug_begin() and
    cpu_hotplug_done() to global and exports them in cpu.h.

    Signed-off-by: Toshi Kani
    Signed-off-by: Rafael J. Wysocki

    Toshi Kani
     

15 Jul, 2013

1 commit

  • The __cpuinit type of throwaway sections might have made sense
    some time ago when RAM was more constrained, but now the savings
    do not offset the cost and complications. For example, the fix in
    commit 5e427ec2d0 ("x86: Fix bit corruption at CPU resume time")
    is a good example of the nasty type of bugs that can be created
    with improper use of the various __init prefixes.

    After a discussion on LKML[1] it was decided that cpuinit should go
    the way of devinit and be phased out. Once all the users are gone,
    we can then finally remove the macros themselves from linux/init.h.

    This removes all the uses of the __cpuinit macros from C files in
    the core kernel directories (kernel, init, lib, mm, and include)
    that don't really have a specific maintainer.

    [1] https://lkml.org/lkml/2013/5/20/589

    Signed-off-by: Paul Gortmaker

    Paul Gortmaker
     

05 Jul, 2013

1 commit

  • Pull trivial tree updates from Jiri Kosina:
    "The usual stuff from trivial tree"

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (34 commits)
    treewide: relase -> release
    Documentation/cgroups/memory.txt: fix stat file documentation
    sysctl/net.txt: delete reference to obsolete 2.4.x kernel
    spinlock_api_smp.h: fix preprocessor comments
    treewide: Fix typo in printk
    doc: device tree: clarify stuff in usage-model.txt.
    open firmware: "/aliasas" -> "/aliases"
    md: bcache: Fixed a typo with the word 'arithmetic'
    irq/generic-chip: fix a few kernel-doc entries
    frv: Convert use of typedef ctl_table to struct ctl_table
    sgi: xpc: Convert use of typedef ctl_table to struct ctl_table
    doc: clk: Fix incorrect wording
    Documentation/arm/IXP4xx fix a typo
    Documentation/networking/ieee802154 fix a typo
    Documentation/DocBook/media/v4l fix a typo
    Documentation/video4linux/si476x.txt fix a typo
    Documentation/virtual/kvm/api.txt fix a typo
    Documentation/early-userspace/README fix a typo
    Documentation/video4linux/soc-camera.txt fix a typo
    lguest: fix CONFIG_PAE -> CONFIG_x86_PAE in comment
    ...

    Linus Torvalds
     

13 Jun, 2013

1 commit

  • There are instances in the kernel where we would like to disable CPU
    hotplug (from sysfs) during some important operation. Today the freezer
    code depends on this and the code to do it was kinda tailor-made for
    that.

    Restructure the code and make it generic enough to be useful for other
    usecases too.

    Signed-off-by: Srivatsa S. Bhat
    Signed-off-by: Robin Holt
    Cc: H. Peter Anvin
    Cc: Thomas Gleixner
    Cc: Ingo Molnar
    Cc: Russ Anderson
    Cc: Robin Holt
    Cc: Russell King
    Cc: Guan Xuetao
    Cc: Shawn Guo
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Srivatsa S. Bhat
     

28 May, 2013

1 commit


08 Apr, 2013

2 commits

  • All idle functions in arch/* are more or less the same, plus minus a
    few bugs and extra instrumentation, tickless support and other
    optional items.

    Implement a generic idle function which resembles the functionality
    found in arch/. Provide weak arch_cpu_idle_* functions which can be
    overridden by the architecture code if needed.

    Signed-off-by: Thomas Gleixner
    Cc: Linus Torvalds
    Cc: Rusty Russell
    Cc: Paul McKenney
    Cc: Peter Zijlstra
    Reviewed-by: Cc: Srivatsa S. Bhat
    Cc: Magnus Damm
    Link: http://lkml.kernel.org/r/20130321215233.646635455@linutronix.de
    Signed-off-by: Thomas Gleixner

    Thomas Gleixner
     
  • For now this calls cpu_idle(), but in the long run we want to move the
    cpu bringup code to the core and therefor we add a state argument.

    Signed-off-by: Thomas Gleixner
    Cc: Linus Torvalds
    Cc: Rusty Russell
    Cc: Paul McKenney
    Cc: Peter Zijlstra
    Reviewed-by: Cc: Srivatsa S. Bhat
    Cc: Magnus Damm
    Link: http://lkml.kernel.org/r/20130321215233.583190032@linutronix.de
    Signed-off-by: Thomas Gleixner

    Thomas Gleixner
     

18 Jul, 2012

1 commit

  • Currently, all workqueue cpu hotplug operations run off
    CPU_PRI_WORKQUEUE which is higher than normal notifiers. This is to
    ensure that workqueue is up and running while bringing up a CPU before
    other notifiers try to use workqueue on the CPU.

    Per-cpu workqueues are supposed to remain working and bound to the CPU
    for normal CPU_DOWN_PREPARE notifiers. This holds mostly true even
    with workqueue offlining running with higher priority because
    workqueue CPU_DOWN_PREPARE only creates a bound trustee thread which
    runs the per-cpu workqueue without concurrency management without
    explicitly detaching the existing workers.

    However, if the trustee needs to create new workers, it creates
    unbound workers which may wander off to other CPUs while
    CPU_DOWN_PREPARE notifiers are in progress. Furthermore, if the CPU
    down is cancelled, the per-CPU workqueue may end up with workers which
    aren't bound to the CPU.

    While reliably reproducible with a convoluted artificial test-case
    involving scheduling and flushing CPU burning work items from CPU down
    notifiers, this isn't very likely to happen in the wild, and, even
    when it happens, the effects are likely to be hidden by the following
    successful CPU down.

    Fix it by using different priorities for up and down notifiers - high
    priority for up operations and low priority for down operations.

    Workqueue cpu hotplug operations will soon go through further cleanup.

    Signed-off-by: Tejun Heo
    Cc: stable@vger.kernel.org
    Acked-by: "Rafael J. Wysocki"

    Tejun Heo
     

01 Jun, 2012

1 commit

  • Many architectures clear tasks' mm_cpumask like this:

    read_lock(&tasklist_lock);
    for_each_process(p) {
    if (p->mm)
    cpumask_clear_cpu(cpu, mm_cpumask(p->mm));
    }
    read_unlock(&tasklist_lock);

    Depending on the context, the code above may have several problems,
    such as:

    1. Working with task->mm w/o getting mm or grabing the task lock is
    dangerous as ->mm might disappear (exit_mm() assigns NULL under
    task_lock(), so tasklist lock is not enough).

    2. Checking for process->mm is not enough because process' main
    thread may exit or detach its mm via use_mm(), but other threads
    may still have a valid mm.

    This patch implements a small helper function that does things
    correctly, i.e.:

    1. We take the task's lock while whe handle its mm (we can't use
    get_task_mm()/mmput() pair as mmput() might sleep);

    2. To catch exited main thread case, we use find_lock_task_mm(),
    which walks up all threads and returns an appropriate task
    (with task lock held).

    Also, Per Peter Zijlstra's idea, now we don't grab tasklist_lock in
    the new helper, instead we take the rcu read lock. We can do this
    because the function is called after the cpu is taken down and marked
    offline, so no new tasks will get this cpu set in their mm mask.

    Signed-off-by: Anton Vorontsov
    Cc: Richard Weinberger
    Cc: Oleg Nesterov
    Cc: Peter Zijlstra
    Cc: Russell King
    Cc: Benjamin Herrenschmidt
    Cc: Mike Frysinger
    Cc: Paul Mundt
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Anton Vorontsov
     

17 May, 2012

1 commit

  • It's been broken forever (i.e. it's not scheduling in a power
    aware fashion), as reported by Suresh and others sending
    patches, and nobody cares enough to fix it properly ...
    so remove it to make space free for something better.

    There's various problems with the code as it stands today, first
    and foremost the user interface which is bound to topology
    levels and has multiple values per level. This results in a
    state explosion which the administrator or distro needs to
    master and almost nobody does.

    Furthermore large configuration state spaces aren't good, it
    means the thing doesn't just work right because it's either
    under so many impossibe to meet constraints, or even if
    there's an achievable state workloads have to be aware of
    it precisely and can never meet it for dynamic workloads.

    So pushing this kind of decision to user-space was a bad idea
    even with a single knob - it's exponentially worse with knobs
    on every node of the topology.

    There is a proposal to replace the user interface with a single
    3 state knob:

    sched_balance_policy := { performance, power, auto }

    where 'auto' would be the preferred default which looks at things
    like Battery/AC mode and possible cpufreq state or whatever the hw
    exposes to show us power use expectations - but there's been no
    progress on it in the past many months.

    Aside from that, the actual implementation of the various knobs
    is known to be broken. There have been sporadic attempts at
    fixing things but these always stop short of reaching a mergable
    state.

    Therefore this wholesale removal with the hopes of spurring
    people who care to come forward once again and work on a
    coherent replacement.

    Signed-off-by: Peter Zijlstra
    Cc: Suresh Siddha
    Cc: Arjan van de Ven
    Cc: Vincent Guittot
    Cc: Vaidyanathan Srinivasan
    Cc: Linus Torvalds
    Cc: Andrew Morton
    Link: http://lkml.kernel.org/r/1326104915.2442.53.camel@twins
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     

25 Mar, 2012

1 commit

  • Pull avoidance patches from Paul Gortmaker:
    "Nearly every subsystem has some kind of header with a proto like:

    void foo(struct device *dev);

    and yet there is no reason for most of these guys to care about the
    sub fields within the device struct. This allows us to significantly
    reduce the scope of headers including headers. For this instance, a
    reduction of about 40% is achieved by replacing the include with the
    simple fact that the device is some kind of a struct.

    Unlike the much larger module.h cleanup, this one is simply two
    commits. One to fix the implicit users, and then one
    to delete the device.h includes from the linux/include/ dir wherever
    possible."

    * tag 'device-for-3.4' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux:
    device.h: audit and cleanup users in main include dir
    device.h: cleanup users outside of linux/include (C files)

    Linus Torvalds
     

16 Mar, 2012

1 commit

  • The header includes a lot of stuff, and
    it in turn gets a lot of use just for the basic "struct device"
    which appears so often.

    Clean up the users as follows:

    1) For those headers only needing "struct device" as a pointer
    in fcn args, replace the include with exactly that.

    2) For headers not really using anything from device.h, simply
    delete the include altogether.

    3) For headers relying on getting device.h implicitly before
    being included themselves, now explicitly include device.h

    4) For files in which doing #1 or #2 uncovers an implicit
    dependency on some other header, fix by explicitly adding
    the required header(s).

    Any C files that were implicitly relying on device.h to be
    present have already been dealt with in advance.

    Total removals from #1 and #2: 51. Total additions coming
    from #3: 9. Total other implicit dependencies from #4: 7.

    As of 3.3-rc1, there were 110, so a net removal of 42 gives
    about a 38% reduction in device.h presence in include/*

    Signed-off-by: Paul Gortmaker

    Paul Gortmaker
     

27 Jan, 2012

1 commit

  • This patch is based on Andi Kleen's work:
    Implement autoprobing/loading of modules serving CPU
    specific features (x86cpu autoloading).

    And Kay Siever's work to get rid of sysdev cpu structures
    and making use of struct device instead.

    Before, the cpuid driver had to be loaded to get the x86cpu
    autoloading feature. With this patch autoloading works through
    the /sys/devices/system/cpu object

    Cc: Kay Sievers
    Cc: Dave Jones
    Cc: Jens Axboe
    Cc: Herbert Xu
    Cc: Huang Ying
    Cc: Len Brown
    Acked-by: Andi Kleen
    Signed-off-by: Thomas Renninger
    Acked-by: H. Peter Anvin
    Signed-off-by: Greg Kroah-Hartman

    Thomas Renninger
     

08 Jan, 2012

1 commit

  • * 'driver-core-next' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (73 commits)
    arm: fix up some samsung merge sysdev conversion problems
    firmware: Fix an oops on reading fw_priv->fw in sysfs loading file
    Drivers:hv: Fix a bug in vmbus_driver_unregister()
    driver core: remove __must_check from device_create_file
    debugfs: add missing #ifdef HAS_IOMEM
    arm: time.h: remove device.h #include
    driver-core: remove sysdev.h usage.
    clockevents: remove sysdev.h
    arm: convert sysdev_class to a regular subsystem
    arm: leds: convert sysdev_class to a regular subsystem
    kobject: remove kset_find_obj_hinted()
    m86k: gpio - convert sysdev_class to a regular subsystem
    mips: txx9_sram - convert sysdev_class to a regular subsystem
    mips: 7segled - convert sysdev_class to a regular subsystem
    sh: dma - convert sysdev_class to a regular subsystem
    sh: intc - convert sysdev_class to a regular subsystem
    power: suspend - convert sysdev_class to a regular subsystem
    power: qe_ic - convert sysdev_class to a regular subsystem
    power: cmm - convert sysdev_class to a regular subsystem
    s390: time - convert sysdev_class to a regular subsystem
    ...

    Fix up conflicts with 'struct sysdev' removal from various platform
    drivers that got changed:
    - arch/arm/mach-exynos/cpu.c
    - arch/arm/mach-exynos/irq-eint.c
    - arch/arm/mach-s3c64xx/common.c
    - arch/arm/mach-s3c64xx/cpu.c
    - arch/arm/mach-s5p64x0/cpu.c
    - arch/arm/mach-s5pv210/common.c
    - arch/arm/plat-samsung/include/plat/cpu.h
    - arch/powerpc/kernel/sysfs.c
    and fix up cpu_is_hotpluggable() as per Greg in include/linux/cpu.h

    Linus Torvalds
     

22 Dec, 2011

1 commit

  • This moves the 'cpu sysdev_class' over to a regular 'cpu' subsystem
    and converts the devices to regular devices. The sysdev drivers are
    implemented as subsystem interfaces now.

    After all sysdev classes are ported to regular driver core entities, the
    sysdev implementation will be entirely removed from the kernel.

    Userspace relies on events and generic sysfs subsystem infrastructure
    from sysdev devices, which are made available with this conversion.

    Cc: Haavard Skinnemoen
    Cc: Hans-Christian Egtvedt
    Cc: Tony Luck
    Cc: Fenghua Yu
    Cc: Arnd Bergmann
    Cc: Benjamin Herrenschmidt
    Cc: Paul Mackerras
    Cc: Martin Schwidefsky
    Cc: Heiko Carstens
    Cc: Paul Mundt
    Cc: "David S. Miller"
    Cc: Chris Metcalf
    Cc: Thomas Gleixner
    Cc: Ingo Molnar
    Cc: "H. Peter Anvin"
    Cc: Borislav Petkov
    Cc: Tigran Aivazian
    Cc: Len Brown
    Cc: Zhang Rui
    Cc: Dave Jones
    Cc: Peter Zijlstra
    Cc: Russell King
    Cc: Andrew Morton
    Cc: Arjan van de Ven
    Cc: "Rafael J. Wysocki"
    Cc: "Srivatsa S. Bhat"
    Signed-off-by: Kay Sievers
    Signed-off-by: Greg Kroah-Hartman

    Kay Sievers
     

12 Dec, 2011

1 commit

  • When architectures register CPUs, they indicate whether the CPU allows
    hotplugging; notably, x86 and ARM don't allow hotplugging CPU 0.
    Userspace can easily query the hotpluggability of a CPU via sysfs;
    however, the kernel has no convenient way of accessing that property in
    an architecture-independent way. While the kernel can simply try it and
    see, some code needs to distinguish between "hotplug failed" and
    "hotplug has no hope of working on this CPU"; for example, rcutorture's
    CPU hotplug tests want to avoid drowning out real hotplug failures with
    expected failures.

    Expose this property via a new cpu_is_hotpluggable function, so that the
    rest of the kernel can access it in an architecture-independent way.

    Signed-off-by: Josh Triplett
    Signed-off-by: Paul E. McKenney

    Josh Triplett
     

05 Nov, 2011

1 commit


26 Jul, 2011

1 commit

  • We presently define all kinds of notifiers in notifier.h. This is not
    necessary at all, since different subsystems use different notifiers, they
    are almost non-related with each other.

    This can also save much build time. Suppose I add a new netdevice event,
    really I don't have to recompile all the source, just network related.
    Without this patch, all the source will be recompiled.

    I move the notify events near to their subsystem notifier registers, so
    that they can be found more easily.

    This patch:

    It is not necessary to share the same notifier.h.

    Signed-off-by: WANG Cong
    Cc: David Miller
    Cc: "Rafael J. Wysocki"
    Cc: Greg KH
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Amerigo Wang
     

11 Nov, 2010

1 commit


29 Jun, 2010

1 commit

  • Reimplement CPU hotplugging support using trustee thread. On CPU
    down, a trustee thread is created and each step of CPU down is
    executed by the trustee and workqueue_cpu_callback() simply drives and
    waits for trustee state transitions.

    CPU down operation no longer waits for works to be drained but trustee
    sticks around till all pending works have been completed. If CPU is
    brought back up while works are still draining,
    workqueue_cpu_callback() tells trustee to step down and tell workers
    to rebind to the cpu.

    As it's difficult to tell whether cwqs are empty if it's freezing or
    frozen, trustee doesn't consider draining to be complete while a gcwq
    is freezing or frozen (tracked by new GCWQ_FREEZING flag). Also,
    workers which get unbound from their cpu are marked with WORKER_ROGUE.

    Trustee based implementation doesn't bring any new feature at this
    point but it will be used to manage worker pool when dynamic shared
    worker pool is implemented.

    Signed-off-by: Tejun Heo

    Tejun Heo
     

09 Jun, 2010

2 commits

  • Currently, when a cpu goes down, cpu_active is cleared before
    CPU_DOWN_PREPARE starts and cpuset configuration is updated from a
    default priority cpu notifier. When a cpu is coming up, it's set
    before CPU_ONLINE but cpuset configuration again is updated from the
    same cpu notifier.

    For cpu notifiers, this presents an inconsistent state. Threads which
    a CPU_DOWN_PREPARE notifier expects to be bound to the CPU can be
    migrated to other cpus because the cpu is no more inactive.

    Fix it by updating cpu_active in the highest priority cpu notifier and
    cpuset configuration in the second highest when a cpu is coming up.
    Down path is updated similarly. This guarantees that all other cpu
    notifiers see consistent cpu_active and cpuset configuration.

    cpuset_track_online_cpus() notifier is converted to
    cpuset_update_active_cpus() which just updates the configuration and
    now called from cpuset_cpu_[in]active() notifiers registered from
    sched_init_smp(). If cpuset is disabled, cpuset_update_active_cpus()
    degenerates into partition_sched_domains() making separate notifier
    for !CONFIG_CPUSETS unnecessary.

    This problem is triggered by cmwq. During CPU_DOWN_PREPARE, hotplug
    callback creates a kthread and kthread_bind()s it to the target cpu,
    and the thread is expected to run on that cpu.

    * Ingo's test discovered __cpuinit/exit markups were incorrect.
    Fixed.

    Signed-off-by: Tejun Heo
    Acked-by: Peter Zijlstra
    Cc: Rusty Russell
    Cc: Ingo Molnar
    Cc: Paul Menage

    Tejun Heo
     
  • Instead of hardcoding priority 10 and 20 in sched and perf, collect
    them into CPU_PRI_* enums.

    Signed-off-by: Tejun Heo
    Acked-by: Peter Zijlstra
    Cc: Rusty Russell
    Cc: Paul Mackerras
    Cc: Ingo Molnar
    Cc: Arnaldo Carvalho de Melo

    Tejun Heo
     

09 Dec, 2009

2 commits

  • Currently the cpu-allocation/deallocation process comprises of two steps:
    - Set the indicators and to update the device tree with DLPAR node
    information.

    - Online/offline the allocated/deallocated CPU.

    This is achieved by writing to the sysfs tunables "probe" during allocation
    and "release" during deallocation.

    At the sametime, the userspace can independently online/offline the CPUs of
    the system using the sysfs tunable "online".

    It is quite possible that when a userspace tool offlines a CPU
    for the purpose of deallocation and is in the process of updating the device
    tree, some other userspace tool could bring the CPU back online by writing to
    the "online" sysfs tunable thereby causing the deallocate process to fail.

    The solution to this is to serialize writes to the "probe/release" sysfs
    tunable with the writes to the "online" sysfs tunable.

    This patch employs a mutex to provide this serialization, which is a no-op on
    all architectures except PPC_PSERIES

    Signed-off-by: Gautham R Shenoy
    Acked-by: Vaidyanathan Srinivasan
    Signed-off-by: Benjamin Herrenschmidt

    Gautham R Shenoy
     
  • Version 3 of this patch is updated with documentation added to
    Documentation/ABI. There are no changes to any of the C code from v2
    of the patch.

    In order to support kernel DLPAR of CPU resources we need to provide an
    interface to add (probe) and remove (release) the resource from the system.
    This patch Creates new generic probe and release sysfs files to facilitate
    cpu probe/release. The probe/release interface provides for allowing each
    arch to supply their own routines for implementing the backend of adding
    and removing cpus to/from the system.

    This also creates the powerpc specific stubs to handle the arch callouts
    from writes to the sysfs files.

    The creation and use of these files is regulated by the
    CONFIG_ARCH_CPU_PROBE_RELEASE option so that only architectures that need the
    capability will have the files created.

    Signed-off-by: Nathan Fontenot
    Signed-off-by: Benjamin Herrenschmidt

    Nathan Fontenot
     

16 Aug, 2009

1 commit

  • This patch introduces a new cpu_notifier() API that is similar
    to hotcpu_notifier(), but which also notifies of CPUs coming
    online during boot in the !HOTPLUG_CPU case.

    Reported-by: Ingo Molnar
    Reported-by: Hugh Dickins
    Tested-by: Hugh Dickins
    Signed-off-by: Paul E. McKenney
    Cc: laijs@cn.fujitsu.com
    Cc: dipankar@in.ibm.com
    Cc: josht@linux.vnet.ibm.com
    Cc: akpm@linux-foundation.org
    Cc: mathieu.desnoyers@polymtl.ca
    Cc: dvhltc@us.ibm.com
    Cc: niv@us.ibm.com
    Cc: peterz@infradead.org
    Cc: rostedt@goodmis.org
    Cc: benh@kernel.crashing.org
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Paul E. McKenney
     

23 Jun, 2009

1 commit

  • SLAB uses get/put_online_cpus() which use a mutex which is itself only
    initialized when cpu_hotplug_init() is called. Currently we hang suring
    boot in SLAB due to doing that too late.

    Reported by James Bottomley and Sachin Sant (and possibly others).
    Debugged by Benjamin Herrenschmidt.

    This just removes the dynamic initialization of the data structures, and
    replaces it with a static one, avoiding this dependency entirely, and
    removing one unnecessary special initcall.

    Tested-by: Sachin Sant
    Tested-by: James Bottomley
    Tested-by: Benjamin Herrenschmidt
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     

03 Apr, 2009

1 commit


09 Sep, 2008

1 commit

  • Right now, there is no notifier that is called on a new cpu, before the new
    cpu begins processing interrupts/softirqs.
    Various kernel function would need that notification, e.g. kvm works around
    by calling smp_call_function_single(), rcu polls cpu_online_map.

    The patch adds a CPU_STARTING notification. It also adds a helper function
    that sends the message to all cpu_chain handlers.

    Tested on x86-64.
    All other archs are untested. Especially on sparc, I'm not sure if I got
    it right.

    Signed-off-by: Manfred Spraul
    Signed-off-by: Ingo Molnar

    Manfred Spraul
     

26 Jul, 2008

1 commit

  • workqueue_cpu_callback(CPU_DEAD) flushes cwq->thread under
    cpu_maps_update_begin(). This means that the multithreaded workqueues
    can't use get_online_cpus() due to the possible deadlock, very bad and
    very old problem.

    Introduce the new state, CPU_POST_DEAD, which is called after
    cpu_hotplug_done() but before cpu_maps_update_done().

    Change workqueue_cpu_callback() to use CPU_POST_DEAD instead of CPU_DEAD.
    This means that create/destroy functions can't rely on get_online_cpus()
    any longer and should take cpu_add_remove_lock instead.

    [akpm@linux-foundation.org: fix CONFIG_SMP=n]
    Signed-off-by: Oleg Nesterov
    Acked-by: Gautham R Shenoy
    Cc: Heiko Carstens
    Cc: Max Krasnyansky
    Cc: Paul Jackson
    Cc: Paul Menage
    Cc: Peter Zijlstra
    Cc: Vegard Nossum
    Cc: Martin Schwidefsky
    Cc: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     

29 Apr, 2008

1 commit

  • Fix following warnings:
    WARNING: vmlinux.o(.data+0x5020): Section mismatch in reference from the variable cpu_vsyscall_notifier_nb.12876 to the function .cpuinit.text:cpu_vsyscall_notifier()
    WARNING: vmlinux.o(.data+0x9ce0): Section mismatch in reference from the variable profile_cpu_callback_nb.17654 to the function .devinit.text:profile_cpu_callback()
    WARNING: vmlinux.o(.data+0xd380): Section mismatch in reference from the variable workqueue_cpu_callback_nb.15004 to the function .devinit.text:workqueue_cpu_callback()
    WARNING: vmlinux.o(.data+0x11d00): Section mismatch in reference from the variable relay_hotcpu_callback_nb.19626 to the function .cpuinit.text:relay_hotcpu_callback()
    WARNING: vmlinux.o(.data+0x12970): Section mismatch in reference from the variable cpu_callback_nb.24694 to the function .devinit.text:cpu_callback()
    WARNING: vmlinux.o(.data+0x3fee0): Section mismatch in reference from the variable percpu_counter_hotcpu_callback_nb.10903 to the function .cpuinit.text:percpu_counter_hotcpu_callback()
    WARNING: vmlinux.o(.data+0x74ce0): Section mismatch in reference from the variable topology_cpu_callback_nb.12506 to the function .cpuinit.text:topology_cpu_callback()

    Functions used as argument are by definition only used in HOTPLUG_CPU
    situations so thay are annotated __cpuinit. Annotate the static variable used
    by hotcpu_register with __cpuinitdata to match this definition.

    Signed-off-by: Sam Ravnborg
    Cc: Gautham R Shenoy
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Sam Ravnborg
     

19 Apr, 2008

1 commit


26 Jan, 2008

1 commit