22 Dec, 2011

1 commit

  • This moves the 'cpu sysdev_class' over to a regular 'cpu' subsystem
    and converts the devices to regular devices. The sysdev drivers are
    implemented as subsystem interfaces now.

    After all sysdev classes are ported to regular driver core entities, the
    sysdev implementation will be entirely removed from the kernel.

    Userspace relies on events and generic sysfs subsystem infrastructure
    from sysdev devices, which are made available with this conversion.

    Cc: Haavard Skinnemoen
    Cc: Hans-Christian Egtvedt
    Cc: Tony Luck
    Cc: Fenghua Yu
    Cc: Arnd Bergmann
    Cc: Benjamin Herrenschmidt
    Cc: Paul Mackerras
    Cc: Martin Schwidefsky
    Cc: Heiko Carstens
    Cc: Paul Mundt
    Cc: "David S. Miller"
    Cc: Chris Metcalf
    Cc: Thomas Gleixner
    Cc: Ingo Molnar
    Cc: "H. Peter Anvin"
    Cc: Borislav Petkov
    Cc: Tigran Aivazian
    Cc: Len Brown
    Cc: Zhang Rui
    Cc: Dave Jones
    Cc: Peter Zijlstra
    Cc: Russell King
    Cc: Andrew Morton
    Cc: Arjan van de Ven
    Cc: "Rafael J. Wysocki"
    Cc: "Srivatsa S. Bhat"
    Signed-off-by: Kay Sievers
    Signed-off-by: Greg Kroah-Hartman

    Kay Sievers
     

08 Nov, 2011

1 commit

  • * 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux:
    cpuidle: Single/Global registration of idle states
    cpuidle: Split cpuidle_state structure and move per-cpu statistics fields
    cpuidle: Remove CPUIDLE_FLAG_IGNORE and dev->prepare()
    cpuidle: Move dev->last_residency update to driver enter routine; remove dev->last_state
    ACPI: Fix CONFIG_ACPI_DOCK=n compiler warning
    ACPI: Export FADT pm_profile integer value to userspace
    thermal: Prevent polling from happening during system suspend
    ACPI: Drop ACPI_NO_HARDWARE_INIT
    ACPI atomicio: Convert width in bits to bytes in __acpi_ioremap_fast()
    PNPACPI: Simplify disabled resource registration
    ACPI: Fix possible recursive locking in hwregs.c
    ACPI: use kstrdup()
    mrst pmu: update comment
    tools/power turbostat: less verbose debugging

    Linus Torvalds
     

07 Nov, 2011

4 commits

  • This patch makes the cpuidle_states structure global (single copy)
    instead of per-cpu. The statistics needed on per-cpu basis
    by the governor are kept per-cpu. This simplifies the cpuidle
    subsystem, as state registration is done by a single cpu only.
    Having a single copy of cpuidle_states saves memory. The rare case
    of asymmetric C-states can be handled within the cpuidle driver,
    and architectures such as POWER do not have asymmetric C-states.

    With single/global registration of all the idle states,
    dynamic C-state transitions on x86 are handled by
    the boot cpu. Here, the boot cpu disables all the devices,
    re-populates the states and later enables all the devices,
    irrespective of which cpu receives the notification first.

    Reference:
    https://lkml.org/lkml/2011/4/25/83

    Signed-off-by: Deepthi Dharwar
    Signed-off-by: Trinabh Gupta
    Tested-by: Jean Pihet
    Reviewed-by: Kevin Hilman
    Acked-by: Arjan van de Ven
    Acked-by: Kevin Hilman
    Signed-off-by: Len Brown

    Deepthi Dharwar
     
  • This is the first step towards global registration of cpuidle
    states. The statistics used primarily by the governor are per-cpu
    and have to be split from the rest of the fields inside cpuidle_state,
    which will be made global, i.e. a single copy. The driver_data field
    is also per-cpu and is moved as well.

    Signed-off-by: Deepthi Dharwar
    Signed-off-by: Trinabh Gupta
    Tested-by: Jean Pihet
    Reviewed-by: Kevin Hilman
    Acked-by: Arjan van de Ven
    Acked-by: Kevin Hilman
    Signed-off-by: Len Brown

    Deepthi Dharwar
     
  • The cpuidle_device->prepare() mechanism causes updates to the
    cpuidle_state[].flags, setting and clearing CPUIDLE_FLAG_IGNORE
    to tell the governor not to choose a state on a per-cpu basis at
    run-time. State demotion is now handled by the driver, which returns
    the actual state entered. Hence, this mechanism is not required.
    This also removes per-cpu flags from cpuidle_state, enabling
    it to be made global.

    Reference:
    https://lkml.org/lkml/2011/3/25/52

    Signed-off-by: Deepthi Dharwar
    Signed-off-by: Trinabh Gupta
    Tested-by: Jean Pihet
    Acked-by: Arjan van de Ven
    Reviewed-by: Kevin Hilman
    Signed-off-by: Len Brown

    Deepthi Dharwar
     
  • The cpuidle governor only suggests the state to enter using the
    governor->select() interface, but allows the low-level driver to
    override the recommended state. The actual entered state
    may be different because of software or hardware demotion. Software
    demotion is done by the back-end cpuidle driver and can be accounted
    correctly. The current cpuidle code uses the last_state field to
    capture the actual state entered and, based on that, updates the
    statistics for the state entered.

    Ideally the driver enter routine should update the counters,
    and it should return the state actually entered rather than the time
    spent there. The generic cpuidle code should simply handle where
    the counters live in the sysfs namespace, not update the counters.

    Reference:
    https://lkml.org/lkml/2011/3/25/52

    Signed-off-by: Deepthi Dharwar
    Signed-off-by: Trinabh Gupta
    Tested-by: Jean Pihet
    Reviewed-by: Kevin Hilman
    Acked-by: Arjan van de Ven
    Acked-by: Kevin Hilman
    Signed-off-by: Len Brown

    Deepthi Dharwar
     

01 Nov, 2011

2 commits


25 Aug, 2011

1 commit


04 Aug, 2011

3 commits

  • cpuidle users should call cpuidle_idle_call() directly
    rather than via the (pm_idle)() function pointer.

    An architecture may choose to continue using (pm_idle)(),
    but cpuidle need not depend on it:

    void my_arch_cpu_idle(void)
    {
            ...
            /* fall back to pm_idle() if cpuidle did not handle idle */
            if (cpuidle_idle_call())
                    pm_idle();
    }

    cc: Kevin Hilman
    cc: Paul Mundt
    cc: x86@kernel.org
    Acked-by: H. Peter Anvin
    Signed-off-by: Len Brown

    Len Brown
     
  • When a Xen Dom0 kernel boots on a hypervisor, it gets access
    to the raw-hardware ACPI tables. While it parses the idle tables
    for the hypervisor's benefit, it uses HLT for its own idle.

    Rather than have xen scribble on pm_idle and access default_idle,
    have it simply call disable_cpuidle() so acpi_idle will not load and
    the architecture default HLT will be used.

    cc: xen-devel@lists.xensource.com
    Tested-by: Konrad Rzeszutek Wilk
    Acked-by: H. Peter Anvin
    Signed-off-by: Len Brown

    Len Brown
     
  • Useful for disabling cpuidle to fall back
    to the architecture-default idle loop.

    cpuidle drivers and governors will fail to register.
    On x86 they'll say so:

    intel_idle: intel_idle yielding to (null)
    ACPI: acpi_idle yielding to (null)

    Signed-off-by: Len Brown

    Len Brown
     

30 May, 2011

1 commit

  • * 'idle-release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-idle-2.6:
    x86 idle: deprecate mwait_idle() and "idle=mwait" cmdline param
    x86 idle: deprecate "no-hlt" cmdline param
    x86 idle APM: deprecate CONFIG_APM_CPU_IDLE
    x86 idle floppy: deprecate disable_hlt()
    x86 idle: EXPORT_SYMBOL(default_idle, pm_idle) only when APM demands it
    x86 idle: clarify AMD erratum 400 workaround
    idle governor: Avoid lock acquisition to read pm_qos before entering idle
    cpuidle: menu: fixed wrapping timers at 4.294 seconds

    Linus Torvalds
     

29 May, 2011

1 commit

  • The cpuidle menu governor uses u32 as a temporary data type for storing
    nanosecond values, which wrap around at 4.294 seconds (2^32 ns). This
    causes errors in predicted sleep times, resulting in higher-than-appropriate C state
    selection and increased power consumption. This also breaks cpuidle
    state residency statistics.

    cc: stable@kernel.org # .32.x through .39.x
    Signed-off-by: Tero Kristo
    Signed-off-by: Len Brown

    Tero Kristo
     

15 Feb, 2011

1 commit


19 Jan, 2011

1 commit

  • Fix a bunch of
    warning: ‘inline’ is not at beginning of declaration
    messages when building a 'make allyesconfig' kernel with -Wextra.

    These warnings are trivial to kill, yet rather annoying when building with
    -Wextra.
    The more we can cut down on pointless crap like this the better (IMHO).

    A previous patch to do this for an 'allnoconfig' build has already been
    merged. This just takes the cleanup a little further.

    Signed-off-by: Jesper Juhl
    Signed-off-by: Jiri Kosina

    Jesper Juhl
     

13 Jan, 2011

6 commits

  • Len Brown
     
  • Len Brown
     
  • … from the cpuidle layer

    Currently the intel_idle and acpi_idle drivers emit duplicate cpu_idle
    "exit idle" events -> this patch fixes that and makes emitting cpu_idle events less complex.

    It also introduces cpu_idle events for all architectures which use
    the cpuidle subsystem, namely:
    - arch/arm/mach-at91/cpuidle.c
    - arch/arm/mach-davinci/cpuidle.c
    - arch/arm/mach-kirkwood/cpuidle.c
    - arch/arm/mach-omap2/cpuidle34xx.c
    - drivers/acpi/processor_idle.c (for all cases, not only mwait)
    - arch/x86/kernel/process.c (did throw events before, but was a mess)
    - drivers/idle/intel_idle.c (did throw events before)

    Convention should be:
    Fire cpu_idle events inside the current pm_idle function (not somewhere
    down the callee tree) to keep things easy.

    Current possible pm_idle functions on x86:
    c1e_idle, poll_idle, cpuidle_idle_call, mwait_idle, default_idle
    -> this is now really easy.

    This affects userspace:
    The type field of the cpu_idle power event can now be directly
    mapped to:
    /sys/devices/system/cpu/cpuX/cpuidle/stateX/{name,desc,usage,time,...}
    instead of emitting very CPU/mwait-specific values.
    This change is not visible for the intel_idle driver.
    For the acpi_idle driver it should only be visible if the vendor
    omits C-states in the BIOS.
    Another (perf timechart) patch reads the cpuidle info for cpu_idle
    events from:
    /sys/.../cpuidle/stateX/*, so the cpuidle events are mapped
    to the correct C-/cpuidle state again, even if e.g. vendors omit
    C-states in their BIOS and, for example, only export C1 and C3.
    -> everything is fine.

    Signed-off-by: Thomas Renninger <trenn@suse.de>
    CC: Robert Schoene <robert.schoene@tu-dresden.de>
    CC: Jean Pihet <j-pihet@ti.com>
    CC: Arjan van de Ven <arjan@linux.intel.com>
    CC: Ingo Molnar <mingo@elte.hu>
    CC: Frederic Weisbecker <fweisbec@gmail.com>
    CC: linux-pm@lists.linux-foundation.org
    CC: linux-acpi@vger.kernel.org
    CC: linux-kernel@vger.kernel.org
    CC: linux-perf-users@vger.kernel.org
    CC: linux-omap@vger.kernel.org
    Signed-off-by: Len Brown <len.brown@intel.com>

    Thomas Renninger
     
  • it serves no purpose

    Signed-off-by: Len Brown

    Len Brown
     
  • C0 means, and is well known as, "not idle".
    All documentation out there uses this term for the "running"/"not idle"
    state. Linux userspace tools (e.g. cpufreq-aperf and turbostat) also
    show C0 residency, which is correct there but means something entirely
    different from the cpuidle "POLL" state.

    Signed-off-by: Thomas Renninger
    Signed-off-by: Len Brown

    Thomas Renninger
     
  • The following scenario is possible with the current cpuidle code and
    the ACPI cpuidle driver:
    (1) acpi_processor_cst_has_changed() is called,
    (2) cpuidle_disable_device() is called,
    (3) cpuidle_remove_state_sysfs() is called to remove the (presumably
    outdated) states info from sysfs,
    (4) acpi_processor_get_power_info() is called, the first entry in the
    pr->power.states[] table is filled with zeros,
    (5) acpi_processor_setup_cpuidle() is called and it doesn't fill the
    first entry in pr->power.states[],
    (6) cpuidle_enable_device() is called,
    (7) __cpuidle_register_device() is _not_ called, since the device has
    already been registered,
    (8) Consequently, poll_idle_init() is _not_ called either,
    (9) cpuidle_add_state_sysfs() is called to create the sysfs attributes
    for the new states and it uses the bogus first table entry from
    acpi_processor_get_power_info() for creating state0.

    This problem is avoided if cpuidle_enable_device()
    unconditionally calls poll_idle_init().

    Reported-by: Len Brown
    Signed-off-by: Rafael J. Wysocki
    Signed-off-by: Len Brown
    cc: stable@kernel.org

    Rafael J. Wysocki
     

08 Jan, 2011

1 commit

  • * 'for-2.6.38' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu: (30 commits)
    gameport: use this_cpu_read instead of lookup
    x86: udelay: Use this_cpu_read to avoid address calculation
    x86: Use this_cpu_inc_return for nmi counter
    x86: Replace uses of current_cpu_data with this_cpu ops
    x86: Use this_cpu_ops to optimize code
    vmstat: User per cpu atomics to avoid interrupt disable / enable
    irq_work: Use per cpu atomics instead of regular atomics
    cpuops: Use cmpxchg for xchg to avoid lock semantics
    x86: this_cpu_cmpxchg and this_cpu_xchg operations
    percpu: Generic this_cpu_cmpxchg() and this_cpu_xchg support
    percpu,x86: relocate this_cpu_add_return() and friends
    connector: Use this_cpu operations
    xen: Use this_cpu_inc_return
    taskstats: Use this_cpu_ops
    random: Use this_cpu_inc_return
    fs: Use this_cpu_inc_return in buffer.c
    highmem: Use this_cpu_xx_return() operations
    vmstat: Use this_cpu_inc_return for vm statistics
    x86: Support for this_cpu_add, sub, dec, inc_return
    percpu: Generic support for this_cpu_add, sub, dec, inc_return
    ...

    Fixed up conflicts: in arch/x86/kernel/{apic/nmi.c, apic/x2apic_uv_x.c, process.c}
    as per Tejun.

    Linus Torvalds
     

04 Jan, 2011

1 commit

  • Add these new power trace events:

    power:cpu_idle
    power:cpu_frequency
    power:machine_suspend

    The old C-state/idle accounting events:
    power:power_start
    power:power_end

    now have a replacement (but we are still keeping the old
    tracepoints for compatibility):

    power:cpu_idle

    and
    power:power_frequency

    is replaced with:
    power:cpu_frequency

    power:machine_suspend is newly introduced.

    Jean Pihet has a patch integrated into the generic layer
    (kernel/power/suspend.c) which will make use of it.

    The type= field was removed from both; it was never
    used, and the type is distinguished by the event type itself.

    perf timechart userspace tool gets adjusted in a separate patch.

    Signed-off-by: Thomas Renninger
    Signed-off-by: Ingo Molnar
    Acked-by: Arjan van de Ven
    Acked-by: Jean Pihet
    Cc: Arnaldo Carvalho de Melo
    Cc: Peter Zijlstra
    Cc: Linus Torvalds
    Cc: rjw@sisk.pl
    LKML-Reference:
    Signed-off-by: Ingo Molnar
    LKML-Reference:

    Thomas Renninger
     

17 Dec, 2010

1 commit

  • __get_cpu_var() can be replaced with this_cpu_read and will then use a single
    read instruction with implied address calculation to access the correct per cpu
    instance.

    However, the address of a per cpu variable passed to __this_cpu_read() cannot be
    determined (since it's an implied address conversion through segment prefixes).
    Therefore apply this only to uses of __get_cpu_var where the address of the
    variable is not used.

    V3->V4:
    - Move one instance of this_cpu_inc_return to a later patch
    so that this one can go in without percpu infrastructure
    changes.

    Sedat: fixed compile failure caused by an extra ')'.

    Cc: Neil Horman
    Cc: Martin Schwidefsky
    Cc: Sedat Dilek
    Acked-by: H. Peter Anvin
    Signed-off-by: Christoph Lameter
    Signed-off-by: Tejun Heo

    Christoph Lameter
     

29 Sep, 2010

1 commit


10 Aug, 2010

1 commit

  • On some SoC chips, HW resources may be in use during any particular idle
    period. As a consequence, the cpuidle states that the SoC is safe to
    enter can change from idle period to idle period. In addition, the
    latency and threshold of each cpuidle state can vary, depending on the
    operating condition when the CPU becomes idle, e.g. the current cpu
    frequency, the current state of the HW blocks, etc.

    cpuidle core and the menu governor, in the current form, are geared
    towards cpuidle states that are static, i.e. the availability of the
    states, their latencies, their thresholds are non-changing during run
    time. cpuidle does not provide any hook that cpuidle drivers can use to
    adjust those values on the fly for the current idle period before the menu
    governor selects the target cpuidle state.

    This patch extends cpuidle core and the menu governor to handle states
    that are dynamic. There are three additions in the patch and the patch
    maintains backwards-compatibility with existing cpuidle drivers.

    1) add prepare() to struct cpuidle_device. A cpuidle driver can hook
    into the callback and cpuidle will call prepare() before calling the
    governor's select function. The callback gives the cpuidle driver a
    chance to update the dynamic information of the cpuidle states for the
    current idle period, e.g. state availability, latencies, thresholds,
    power values, etc.

    2) add CPUIDLE_FLAG_IGNORE as one of the state flags. In the prepare()
    function, a cpuidle driver can set/clear the flag to indicate to the
    menu governor whether a cpuidle state should be ignored, i.e. not
    available, during the current idle period.

    3) add power_specified bit to struct cpuidle_device. The menu governor
    currently assumes that the cpuidle states are arranged in the order of
    increasing latency, threshold, and power savings. This is true or can
    be made true for static states. Once the state parameters are dynamic,
    the latencies, thresholds, and power savings for the cpuidle states can
    increase or decrease by different amounts from idle period to idle
    period. So the assumption of increasing latency, threshold, and power
    savings from Cn to C(n+1) can no longer be guaranteed.

    It can be straightforward to calculate the power consumption of each
    available state and to specify it in power_usage for the idle period.
    Using the power_usage fields, the menu governor then selects the state
    that has the lowest power consumption and that still satisfies all other
    criteria. The power_specified bit defaults to 0. For existing cpuidle
    drivers, cpuidle detects that power_specified is 0 and fills in a dummy
    set of power_usage values.

    Signed-off-by: Ai Li
    Cc: Len Brown
    Acked-by: Arjan van de Ven
    Cc: Ingo Molnar
    Cc: Venkatesh Pallipadi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ai Li
     

04 Aug, 2010

1 commit

  • and fix the broken case if a core's frequency depends on others.

    trace_power_frequency was implemented in a rather ungeneric way,
    only in the acpi-cpufreq driver's target() function.
    -> Move the call to trace_power_frequency to
    cpufreq.c:cpufreq_notify_transition() where the CPUFREQ_POSTCHANGE
    notifier is triggered.
    This will support power frequency tracing by all cpufreq drivers.

    trace_power_frequency did not trace frequency changes correctly when
    the userspace governor was used or when CPU cores' frequencies depend
    on each other.
    -> Moving this into the CPUFREQ_POSTCHANGE notifier and passing the cpu
    which gets switched automatically fixes this.

    Robert Schoene provided some important fixes on top of my initial
    quick shot version which are integrated in this patch:
    - Forgot some changes in power_end trace (TP_printk/variable names)
    - Variable dummy in power_end must now be cpu_id
    - Use static 64 bit variable instead of unsigned int for cpu_id

    Signed-off-by: Thomas Renninger
    CC: davej@redhat.com
    CC: arjan@infradead.org
    CC: linux-kernel@vger.kernel.org
    CC: robert.schoene@tu-dresden.de
    Tested-by: robert.schoene@tu-dresden.de
    Signed-off-by: Dave Jones

    Thomas Renninger
     

01 Jul, 2010

1 commit

  • Commit 0224cf4c5e (sched: Intoduce get_cpu_iowait_time_us())
    broke things by not making sure preemption was indeed disabled
    by the callers of nr_iowait_cpu() which took the iowait value of
    the current cpu.

    This resulted in a heap of preempt warnings. Cure this by making
    nr_iowait_cpu() take a cpu number and fix up the callers to pass
    in the right number.

    Signed-off-by: Peter Zijlstra
    Cc: Arjan van de Ven
    Cc: Sergey Senozhatsky
    Cc: Rafael J. Wysocki
    Cc: Maxim Levitsky
    Cc: Len Brown
    Cc: Pavel Machek
    Cc: Jiri Slaby
    Cc: linux-pm@lists.linux-foundation.org
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     

29 May, 2010

1 commit

  • * 'idle-release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-idle-2.6:
    intel_idle: native hardware cpuidle driver for latest Intel processors
    ACPI: acpi_idle: touch TS_POLLING only in the non-MWAIT case
    acpi_pad: uses MONITOR/MWAIT, so it doesn't need to clear TS_POLLING
    sched: clarify commment for TS_POLLING
    ACPI: allow a native cpuidle driver to displace ACPI
    cpuidle: make cpuidle_curr_driver static
    cpuidle: add cpuidle_unregister_driver() error check
    cpuidle: fail to register if !CONFIG_CPU_IDLE

    Linus Torvalds
     

28 May, 2010

2 commits


25 May, 2010

1 commit

  • Currently, the menu governor uses the (corrected) next timer as the key
    item for predicting the idle duration.

    It turns out that there are specific cases where this breaks down: there
    are cases where we have a very repetitive pattern of idle durations, where
    the idle period is pretty much the same, for reasons completely unrelated
    to the next timer event. Examples of such repeating patterns are network
    loads with irq mitigation, the mouse moving, and in theory also wifi
    beacons.

    This patch adds a relatively simple detector for such repeating patterns,
    where the standard deviation of the last 8 idle periods is compared to a
    threshold.

    With this extra predictor in place, measurements show that the DECAY
    factor can now be increased (the decaying average will now decay slower)
    to get an even more stable result.

    [arjan@infradead.org: fix bug identified by Frank]
    Signed-off-by: Arjan van de Ven
    Cc: Corrado Zoccolo
    Cc: Frank Rowand
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Arjan van de Ven
     

11 May, 2010

1 commit

  • This patch changes the string-based list management to a handle-based
    implementation to help with the hot-path use of pm-qos. It also renames
    much of the API to use "request" as opposed to "requirement", which was
    used in the initial implementation. I did this because "request" more
    accurately represents what it actually does.

    Also, I added a string based ABI for users wanting to use a string
    interface. So if the user writes 0xDDDDDDDD formatted hex it will be
    accepted by the interface. (someone asked me for it and I don't think
    it hurts anything.)

    This patch updates some documentation input I got from Randy.

    Signed-off-by: markgross
    Signed-off-by: Rafael J. Wysocki

    Mark Gross
     

10 May, 2010

1 commit

  • commit 672917dcc78 ("cpuidle: menu governor: reduce latency on exit")
    added an optimization, where the analysis on the past idle period moved
    from the end of idle, to the beginning of the new idle.

    Unfortunately, this optimization had a bug where it zeroed one key
    variable for new use that is still needed for the analysis. The fix is
    simple: zero the variable after doing the work from the previous idle.

    During the audit of the code that found this issue, another issue was
    also found: the ->measured_us data structure member is never set; a
    local variable is always used instead.

    Signed-off-by: Arjan van de Ven
    Cc: Corrado Zoccolo
    Cc: stable@kernel.org
    Signed-off-by: Linus Torvalds

    Arjan van de Ven
     

30 Mar, 2010

1 commit

  • …it slab.h inclusion from percpu.h

    percpu.h is included by sched.h and module.h and thus ends up being
    included when building most .c files. percpu.h includes slab.h which
    in turn includes gfp.h making everything defined by the two files
    universally available and complicating inclusion dependencies.

    percpu.h -> slab.h dependency is about to be removed. Prepare for
    this change by updating users of gfp and slab facilities to include those
    headers directly instead of assuming availability. As this conversion
    needs to touch a large number of source files, the following script is
    used as the basis of conversion.

    http://userweb.kernel.org/~tj/misc/slabh-sweep.py

    The script does the following.

    * Scan files for gfp and slab usages and update includes such that
    only the necessary includes are there, i.e. if only gfp is used,
    gfp.h; if slab is used, slab.h.

    * When the script inserts a new include, it looks at the include
    blocks and tries to put the new include such that its order conforms
    to its surrounding. It's put in the include block which contains
    core kernel includes, in the same order that the rest are ordered -
    alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
    doesn't seem to be any matching order.

    * If the script can't find a place to put a new include (mostly
    because the file doesn't have fitting include block), it prints out
    an error message indicating which .h file needs to be added to the
    file.

    The conversion was done in the following steps.

    1. The initial automatic conversion of all .c files updated slightly
    over 4000 files, deleting around 700 includes and adding ~480 gfp.h
    and ~3000 slab.h inclusions. The script emitted errors for ~400
    files.

    2. Each error was manually checked. Some didn't need the inclusion,
    some needed manual addition while adding it to implementation .h or
    embedding .c file was more appropriate for others. This step added
    inclusions to around 150 files.

    3. The script was run again and the output was compared to the edits
    from #2 to make sure no file was left behind.

    4. Several build tests were done and a couple of problems were fixed.
    e.g. lib/decompress_*.c used malloc/free() wrappers around slab
    APIs requiring slab.h to be added manually.

    5. The script was run on all .h files but without automatically
    editing them as sprinkling gfp.h and slab.h inclusions around .h
    files could easily lead to inclusion dependency hell. Most gfp.h
    inclusion directives were ignored as stuff from gfp.h was usually
    widely available and often used in preprocessor macros. Each
    slab.h inclusion directive was examined and added manually as
    necessary.

    6. percpu.h was updated not to include slab.h.

    7. Build tests were done on the following configurations and failures
    were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my
    distributed build env didn't work with gcov compiles) and a few
    more options had to be turned off depending on archs to make things
    build (like ipr on powerpc/64 which failed due to missing writeq).

    * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
    * powerpc and powerpc64 SMP allmodconfig
    * sparc and sparc64 SMP allmodconfig
    * ia64 SMP allmodconfig
    * s390 SMP allmodconfig
    * alpha SMP allmodconfig
    * um on x86_64 SMP allmodconfig

    8. percpu.h modifications were reverted so that it could be applied as
    a separate patch and serve as bisection point.

    Given the fact that I had only a couple of failures from tests on step
    7, I'm fairly confident about the coverage of this conversion patch.
    If there is a breakage, it's likely to be something in one of the arch
    headers which should be easily discoverable on most builds of
    the specific arch.

    Signed-off-by: Tejun Heo <tj@kernel.org>
    Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
    Cc: Ingo Molnar <mingo@redhat.com>
    Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>

    Tejun Heo
     

08 Mar, 2010

2 commits

  • Constify struct sysfs_ops.

    This is part of the ops structure constification
    effort started by Arjan van de Ven et al.

    Benefits of this constification:

    * prevents modification of data that is shared
    (referenced) by many other structure instances
    at runtime

    * detects/prevents accidental (but not intentional)
    modification attempts on archs that enforce
    read-only kernel data at runtime

    * potentially better optimized code as the compiler
    can assume that the const data cannot be changed

    * the compiler/linker move const data into .rodata
    and therefore exclude them from false sharing

    Signed-off-by: Emese Revfy
    Acked-by: David Teigland
    Acked-by: Matt Domsch
    Acked-by: Maciej Sosnowski
    Acked-by: Hans J. Koch
    Acked-by: Pekka Enberg
    Acked-by: Jens Axboe
    Acked-by: Stephen Hemminger
    Signed-off-by: Greg Kroah-Hartman

    Emese Revfy
     
  • Passing the attribute to the low-level IO functions allows all kinds
    of cleanups, by sharing low-level IO code without requiring
    a separate function for every piece of data.

    Also, drivers can extend the attributes with their own data fields
    and use them in the low-level function.

    Similar to sysdev_attributes and normal attributes.

    This is a tree-wide sweep, converting everything in one go.

    No functional changes in this patch other than passing the new
    argument everywhere.

    Tested on x86; the non-x86 parts are uncompiled.

    Signed-off-by: Andi Kleen
    Signed-off-by: Greg Kroah-Hartman

    Andi Kleen
     

07 Mar, 2010

1 commit


12 Jan, 2010

1 commit