25 Sep, 2013

1 commit

  • cpufreq_get() can be called from external drivers which might not be aware if
    cpufreq driver is registered or not. And so we should actually check if cpufreq
    driver is registered or not and also if cpufreq is active or disabled, at the
    beginning of cpufreq_get().

    Otherwise call to lock_policy_rwsem_read() might hit BUG_ON(!policy).

    Reported-and-tested-by: Linus Walleij
    Signed-off-by: Viresh Kumar
    Reviewed-by: Srivatsa S. Bhat
    Signed-off-by: Rafael J. Wysocki

    Viresh Kumar
     

20 Sep, 2013

1 commit

  • On systems that support intel_pstate, acpi_cpufreq fails to load, and
    udev keeps trying until trace gets filled up and kernel crashes.

    The root cause is driver return ret from cpufreq_register_driver(),
    because when some other driver takes over before, it will return
    EBUSY and then udev will keep trying ...

    cpufreq_register_driver() should return EEXIST instead so that the
    system can boot without appending intel_pstate=disable and still use
    intel_pstate.

    Signed-off-by: Yinghai Lu
    Acked-by: Viresh Kumar
    Signed-off-by: Rafael J. Wysocki

    Yinghai Lu
     

18 Sep, 2013

2 commits

  • Current code looks like this:

    WARN_ON(lock_policy_rwsem_write(cpu));
    update_policy_cpu(policy, new_cpu);
    unlock_policy_rwsem_write(cpu);

    {lock|unlock}_policy_rwsem_write(cpu) takes/releases policy->cpu's rwsem.
    Because cpu is changing with the call to update_policy_cpu(), the
    unlock_policy_rwsem_write() will release the incorrect lock.

    The right solution would be to release the same lock as was taken earlier. Also
    update_policy_cpu() was also called from cpufreq_add_dev() without any locks and
    so its better if we move this locking to inside update_policy_cpu().

    This patch fixes a regression introduced in 3.12 by commit f9ba680d23
    (cpufreq: Extract the handover of policy cpu to a helper function).

    Reported-and-tested-by: Jon Medhurst
    Signed-off-by: Viresh Kumar
    Signed-off-by: Rafael J. Wysocki

    Viresh Kumar
     
  • This broke after a recent change "cedb70a cpufreq: Split __cpufreq_remove_dev()
    into two parts" from Srivatsa.

    Consider a scenario where we have two CPUs in a policy (0 & 1) and we are
    removing CPU 1. On the call to __cpufreq_remove_dev_prepare() we have cleared 1
    from policy->cpus and now on a call to __cpufreq_remove_dev_finish() we read
    cpumask_weight of policy->cpus, which will come as 1 and this code will behave
    as if we are removing the last CPU from policy :)

    Fix it by clearing the CPU mask in __cpufreq_remove_dev_finish() instead of
    __cpufreq_remove_dev_prepare().

    Tested-by: Stephen Warren
    Reviewed-by: Srivatsa S. Bhat
    Signed-off-by: Viresh Kumar
    Signed-off-by: Rafael J. Wysocki

    Viresh Kumar
     

12 Sep, 2013

4 commits

  • In cpufreq_policy_restore() before system suspend policy is read from
    percpu's cpufreq_cpu_data_fallback. It's a read operation rather
    than a write one, so take the lock for reading in there.

    Signed-off-by: Lan Tianyu
    Reviewed-by: Srivatsa S. Bhat
    Acked-by: Viresh Kumar
    Signed-off-by: Rafael J. Wysocki

    Lan Tianyu
     
  • If update_policy_cpu() is invoked with the existing policy->cpu itself
    as the new-cpu parameter, then a lot of things can go terribly wrong.

    In its present form, update_policy_cpu() always assumes that the new-cpu
    is different from policy->cpu and invokes other functions to perform their
    respective updates. And those functions implement the actual update like
    this:

    per_cpu(..., new_cpu) = per_cpu(..., last_cpu);
    per_cpu(..., last_cpu) = NULL;

    Thus, when new_cpu == last_cpu, the final NULL assignment makes the per-cpu
    references vanish into thin air! (memory leak). From there, it leads to more
    problems: cpufreq_stats_create_table() now doesn't find the per-cpu reference
    and hence tries to create a new sysfs-group; but sysfs already had created
    the group earlier, so it complains that it cannot create a duplicate filename.
    In short, the repercussions of a rather innocuous invocation of
    update_policy_cpu() can turn out to be pretty nasty.

    Ideally update_policy_cpu() should handle this situation (new == last)
    gracefully, and not lead to such severe problems. So fix it by adding an
    appropriate check.

    Signed-off-by: Srivatsa S. Bhat
    Tested-by: Stephen Warren
    Signed-off-by: Rafael J. Wysocki

    Srivatsa S. Bhat
     
  • In __cpufreq_remove_dev_prepare(), the code which decides whether to remove
    the sysfs link or nominate a new policy cpu, is governed by an if/else block
    with a rather complex set of conditionals. Worse, they harbor a subtlety
    which leads to certain unintended behavior.

    The code looks like this:

    if (cpu != policy->cpu && !frozen) {
    sysfs_remove_link(&dev->kobj, "cpufreq");
    } else if (cpus > 1) {
    new_cpu = cpufreq_nominate_new_policy_cpu(...);
    ...
    update_policy_cpu(..., new_cpu);
    }

    The original intention was:
    If the CPU going offline is not policy->cpu, just remove the link.
    On the other hand, if the CPU going offline is the policy->cpu itself,
    handover the policy->cpu job to some other surviving CPU in that policy.

    But because the 'if' condition also includes the 'frozen' check, now there
    are *two* possibilities by which we can enter the 'else' block:

    1. cpu == policy->cpu (intended)
    2. cpu != policy->cpu && frozen (unintended)

    Due to the second (unintended) scenario, we end up spuriously nominating
    a CPU as the policy->cpu, even when the existing policy->cpu is alive and
    well. This can cause problems further down the line, especially when we end
    up nominating the same policy->cpu as the new one (ie., old == new),
    because it totally confuses update_policy_cpu().

    To avoid this mess, restructure the if/else block to only do what was
    originally intended, and thus prevent any unwelcome surprises.

    Signed-off-by: Srivatsa S. Bhat
    Tested-by: Stephen Warren
    Signed-off-by: Rafael J. Wysocki

    Srivatsa S. Bhat
     
  • Stephen Warren reported that the cpufreq-stats code hits a NULL pointer
    dereference during the second attempt to suspend a system. He also
    pin-pointed the problem to commit 5302c3f "cpufreq: Perform light-weight
    init/teardown during suspend/resume".

    That commit actually ensured that the cpufreq-stats table and the
    cpufreq-stats sysfs entries are *not* torn down (ie., not freed) during
    suspend/resume, which makes it all the more surprising. However, it turns
    out that the root-cause is not that we access an already freed memory, but
    that the reference to the allocated memory gets moved around and we lose
    track of that during resume, leading to the reported crash in a subsequent
    suspend attempt.

    In the suspend path, during CPU offline, the value of policy->cpu is updated
    by choosing one of the surviving CPUs in that policy, as long as there is
    atleast one CPU in that policy. And cpufreq_stats_update_policy_cpu() is
    invoked to update the reference to the stats structure by assigning it to
    the new CPU. However, in the resume path, during CPU online, we end up
    assigning a fresh CPU as the policy->cpu, without letting cpufreq-stats
    know about this. Thus the reference to the stats structure remains
    (incorrectly) associated with the old CPU. So, in a subsequent suspend attempt,
    during CPU offline, we end up accessing an incorrect location to get the
    stats structure, which eventually leads to the NULL pointer dereference.

    Fix this by letting cpufreq-stats know about the update of the policy->cpu
    during CPU online in the resume path. (Also, move the update_policy_cpu()
    function higher up in the file, so that __cpufreq_add_dev() can invoke
    it).

    Reported-and-tested-by: Stephen Warren
    Signed-off-by: Srivatsa S. Bhat
    Signed-off-by: Rafael J. Wysocki

    Srivatsa S. Bhat
     

10 Sep, 2013

8 commits

  • Commit 7c30ed5 (cpufreq: make sure frequency transitions are
    serialized) attempted to serialize frequency transitions by
    adding checks to the CPUFREQ_PRECHANGE and CPUFREQ_POSTCHANGE
    notifications. However, it assumed that the notifications will
    always originate from the driver's .target() callback, but they
    also can be triggered by cpufreq_out_of_sync() and that leads to
    warnings like this on some systems:

    WARNING: CPU: 0 PID: 14543 at drivers/cpufreq/cpufreq.c:317
    __cpufreq_notify_transition+0x238/0x260()
    In middle of another frequency transition

    accompanied by a call trace similar to this one:

    [] dump_stack+0x46/0x58
    [] warn_slowpath_common+0x8c/0xc0
    [] ? acpi_cpufreq_target+0x320/0x320
    [] warn_slowpath_fmt+0x46/0x50
    [] __cpufreq_notify_transition+0x238/0x260
    [] cpufreq_notify_transition+0x3e/0x70
    [] cpufreq_out_of_sync+0x6d/0xb0
    [] cpufreq_update_policy+0x10c/0x160
    [] ? cpufreq_update_policy+0x160/0x160
    [] cpufreq_set_cur_state+0x8c/0xb5
    [] processor_set_cur_state+0xa3/0xcf
    [] thermal_cdev_update+0x9c/0xb0
    [] step_wise_throttle+0x5a/0x90
    [] handle_thermal_trip+0x4f/0x140
    [] thermal_zone_device_update+0x57/0xa0
    [] acpi_thermal_check+0x2e/0x30
    [] acpi_thermal_notify+0x40/0xdc
    [] acpi_device_notify+0x19/0x1b
    [] acpi_ev_notify_dispatch+0x41/0x5c
    [] acpi_os_execute_deferred+0x25/0x32
    [] process_one_work+0x170/0x4a0
    [] worker_thread+0x121/0x390
    [] ? manage_workers.isra.20+0x170/0x170
    [] kthread+0xc0/0xd0
    [] ? flush_kthread_worker+0xb0/0xb0
    [] ret_from_fork+0x7c/0xb0
    [] ? flush_kthread_worker+0xb0/0xb0

    For this reason, revert commit 7c30ed5 along with the fix 266c13d
    (cpufreq: Fix serialization of frequency transitions) on top of it
    and we will revisit the serialization problem later.

    Reported-by: Alessandro Bono
    Signed-off-by: Rafael J. Wysocki

    Rafael J. Wysocki
     
  • There are places where the variable 'ret' is declared as unsigned int
    and then used to store negative return values such as -EINVAL. Fix them
    by declaring the variable as a signed quantity.

    Signed-off-by: Srivatsa S. Bhat
    Signed-off-by: Rafael J. Wysocki

    Srivatsa S. Bhat
     
  • Commit "cpufreq: serialize calls to __cpufreq_governor()" had been a temporary
    and partial solution to the race condition between writing to a cpufreq sysfs
    file and taking a CPU offline. Now that we have a proper and complete solution
    to that problem, remove the temporary fix.

    Signed-off-by: Srivatsa S. Bhat
    Signed-off-by: Rafael J. Wysocki

    Srivatsa S. Bhat
     
  • The functions that are used to write to cpufreq sysfs files (such as
    store_scaling_max_freq()) are not hotplug safe. They can race with CPU
    hotplug tasks and lead to problems such as trying to acquire an already
    destroyed timer-mutex etc.

    Eg:

    __cpufreq_remove_dev()
    __cpufreq_governor(policy, CPUFREQ_GOV_STOP);
    policy->governor->governor(policy, CPUFREQ_GOV_STOP);
    cpufreq_governor_dbs()
    case CPUFREQ_GOV_STOP:
    mutex_destroy(&cpu_cdbs->timer_mutex)
    cpu_cdbs->cur_policy = NULL;

    store()
    __cpufreq_set_policy()
    __cpufreq_governor(policy, CPUFREQ_GOV_LIMITS);
    policy->governor->governor(policy, CPUFREQ_GOV_LIMITS);
    case CPUFREQ_GOV_LIMITS:
    mutex_lock(&cpu_cdbs->timer_mutex); max < cpu_cdbs->cur_policy->cur)
    Signed-off-by: Srivatsa S. Bhat
    Signed-off-by: Rafael J. Wysocki

    Srivatsa S. Bhat
     
  • __cpufreq_remove_dev_finish() handles the kobject cleanup for a CPU going
    offline. But because we destroy the kobject towards the end of the CPU offline
    phase, there are certain race windows where a task can try to write to a
    cpufreq sysfs file (eg: using store_scaling_max_freq()) while we are taking
    that CPU offline, and this can bump up the kobject refcount, which in turn might
    hinder the CPU offline task from running to completion. (It can also cause
    other more serious problems such as trying to acquire a destroyed timer-mutex
    etc., depending on the exact stage of the cleanup at which the task managed to
    take a new refcount).

    To fix the race window, we will need to synchronize those store_*() call-sites
    with CPU hotplug, using get_online_cpus()/put_online_cpus(). However, that
    in turn can cause a total deadlock because it can end up waiting for the
    CPU offline task to complete, with incremented refcount!

    Write to sysfs CPU offline task
    -------------- ----------------
    kobj_refcnt++

    Acquire cpu_hotplug.lock

    get_online_cpus();

    Wait for kobj_refcnt to drop to zero

    **DEADLOCK**

    A simple way to avoid this problem is to perform the kobject cleanup in the
    CPU offline path, with the cpu_hotplug.lock *released*. That is, we can
    perform the wait-for-kobj-refcnt-to-drop as well as the subsequent cleanup
    in the CPU_POST_DEAD stage of CPU offline, which is run with cpu_hotplug.lock
    released. Doing this helps us avoid deadlocks due to holding kobject refcounts
    and waiting on each other on the cpu_hotplug.lock.

    (Note: We can't move all of the cpufreq CPU offline steps to the
    CPU_POST_DEAD stage, because certain things such as stopping the governors
    have to be done before the outgoing CPU is marked offline. So retain those
    parts in the CPU_DOWN_PREPARE stage itself).

    Reported-by: Stephen Boyd
    Signed-off-by: Srivatsa S. Bhat
    Signed-off-by: Rafael J. Wysocki

    Srivatsa S. Bhat
     
  • During CPU offline, the cpufreq core invokes __cpufreq_remove_dev()
    to perform work such as stopping the cpufreq governor, clearing the
    CPU from the policy structure etc, and finally cleaning up the
    kobject.

    There are certain subtle issues related to the kobject cleanup, and
    it would be much easier to deal with them if we separate that part
    from the rest of the cleanup-work in the CPU offline phase. So split
    the __cpufreq_remove_dev() function into 2 parts: one that handles
    the kobject cleanup, and the other that handles the rest of the work.

    Reported-by: Stephen Boyd
    Signed-off-by: Srivatsa S. Bhat
    Signed-off-by: Rafael J. Wysocki

    Srivatsa S. Bhat
     
  • We can't take a big lock around __cpufreq_governor() as this causes
    recursive locking for some cases. But calls to this routine must be
    serialized for every policy. Otherwise we can see some unpredictable
    events.

    For example, consider following scenario:

    __cpufreq_remove_dev()
    __cpufreq_governor(policy, CPUFREQ_GOV_STOP);
    policy->governor->governor(policy, CPUFREQ_GOV_STOP);
    cpufreq_governor_dbs()
    case CPUFREQ_GOV_STOP:
    mutex_destroy(&cpu_cdbs->timer_mutex)
    cpu_cdbs->cur_policy = NULL;

    store()
    __cpufreq_set_policy()
    __cpufreq_governor(policy, CPUFREQ_GOV_LIMITS);
    policy->governor->governor(policy, CPUFREQ_GOV_LIMITS);
    case CPUFREQ_GOV_LIMITS:
    mutex_lock(&cpu_cdbs->timer_mutex); max < cpu_cdbs->cur_policy->cur)
    Signed-off-by: Viresh Kumar
    Signed-off-by: Rafael J. Wysocki

    Viresh Kumar
     
  • __cpufreq_governor() returns with -EBUSY when governor is already
    stopped and we try to stop it again, but when it is stopped we must
    not allow calls to CPUFREQ_GOV_LIMITS event as well.

    This patch adds this check in __cpufreq_governor().

    Reported-by: Stephen Boyd
    Signed-off-by: Viresh Kumar
    Signed-off-by: Rafael J. Wysocki

    Viresh Kumar
     

21 Aug, 2013

1 commit

  • This patch tries to fix lockdep complaint attached below.

    It seems that we should always read acquire the cpufreq_rwsem,
    whether CONFIG_SMP is enabled or not. And CONFIG_HOTPLUG_CPU
    depends on CONFIG_SMP, so it seems we don't need CONFIG_SMP for the
    code enabled by CONFIG_HOTPLUG_CPU.

    [ 0.504191] =====================================
    [ 0.504627] [ BUG: bad unlock balance detected! ]
    [ 0.504627] 3.11.0-rc6-next-20130819 #1 Not tainted
    [ 0.504627] -------------------------------------
    [ 0.504627] swapper/1 is trying to release lock (cpufreq_rwsem) at:
    [ 0.504627] [] cpufreq_add_dev+0x13a/0x3e0
    [ 0.504627] but there are no more locks to release!
    [ 0.504627]
    [ 0.504627] other info that might help us debug this:
    [ 0.504627] 1 lock held by swapper/1:
    [ 0.504627] #0: (subsys mutex#4){+.+.+.}, at: [] subsys_interface_register+0x4f/0xe0
    [ 0.504627]
    [ 0.504627] stack backtrace:
    [ 0.504627] CPU: 0 PID: 1 Comm: swapper Not tainted 3.11.0-rc6-next-20130819 #1
    [ 0.504627] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2007
    [ 0.504627] ffffffff813d927a ffff88007f847c98 ffffffff814c062b ffff88007f847cc8
    [ 0.504627] ffffffff81098bce ffff88007f847cf8 ffffffff81aadc30 ffffffff813d927a
    [ 0.504627] 00000000ffffffff ffff88007f847d68 ffffffff8109d0be 0000000000000006
    [ 0.504627] Call Trace:
    [ 0.504627] [] ? cpufreq_add_dev+0x13a/0x3e0
    [ 0.504627] [] dump_stack+0x19/0x1b
    [ 0.504627] [] print_unlock_imbalance_bug+0xfe/0x110
    [ 0.504627] [] ? cpufreq_add_dev+0x13a/0x3e0
    [ 0.504627] [] lock_release_non_nested+0x1ee/0x310
    [ 0.504627] [] ? mark_held_locks+0xae/0x120
    [ 0.504627] [] ? kfree+0xcb/0x1d0
    [ 0.504627] [] ? cpufreq_policy_free+0x4a/0x60
    [ 0.504627] [] ? cpufreq_add_dev+0x13a/0x3e0
    [ 0.504627] [] lock_release+0xc4/0x250
    [ 0.504627] [] up_read+0x23/0x40
    [ 0.504627] [] cpufreq_add_dev+0x13a/0x3e0
    [ 0.504627] [] subsys_interface_register+0x99/0xe0
    [ 0.504627] [] ? cpufreq_gov_dbs_init+0x12/0x12
    [ 0.504627] [] cpufreq_register_driver+0x9d/0x1d0
    [ 0.504627] [] ? cpufreq_gov_dbs_init+0x12/0x12
    [ 0.504627] [] acpi_cpufreq_init+0xfe/0x1f8
    [ 0.504627] [] do_one_initcall+0xda/0x180
    [ 0.504627] [] kernel_init_freeable+0x12c/0x1bb
    [ 0.504627] [] ? do_early_param+0x8c/0x8c
    [ 0.504627] [] ? rest_init+0x140/0x140
    [ 0.504627] [] kernel_init+0xe/0xf0
    [ 0.504627] [] ret_from_fork+0x7a/0xb0
    [ 0.504627] [] ? rest_init+0x140/0x140

    Signed-off-by: Li Zhong
    Acked-and-tested-by: Viresh Kumar
    Signed-off-by: Rafael J. Wysocki

    Li Zhong
     

20 Aug, 2013

5 commits

  • To iterate over all policies we currently iterate over all online
    CPUs and then get the policy for each of them which is suboptimal.
    Use the newly created cpufreq_policy_list for this purpose instead.

    Signed-off-by: Viresh Kumar
    Signed-off-by: Rafael J. Wysocki

    Viresh Kumar
     
  • cpufreq_policy_cpu per-cpu variables are used for storing the ID of
    the CPU that manages the given CPU's policy. However, we also store
    a policy pointer for each cpu in cpufreq_cpu_data, so the
    cpufreq_policy_cpu information is simply redundant.

    It is better to use cpufreq_cpu_data to retrieve a policy and get
    policy->cpu from there, so make that happen everywhere and drop the
    cpufreq_policy_cpu per-cpu variables which aren't necessary any more.

    [rjw: Changelog]
    Signed-off-by: Viresh Kumar
    Signed-off-by: Rafael J. Wysocki

    Viresh Kumar
     
  • We don't need to check if event is CPUFREQ_GOV_POLICY_INIT and put
    governor module as we are sure event can only be START/STOP here.

    Remove the useless check.

    Signed-off-by: Viresh Kumar
    Signed-off-by: Rafael J. Wysocki

    Viresh Kumar
     
  • cpufreq_policy_list is a list of active policies. We do remove
    policies from this list when all CPUs belonging to that policy are
    removed. But during system suspend we don't really free a policy
    struct as it will be used again during resume, so we didn't remove
    it from cpufreq_policy_list as well..

    However, this is incorrect. We are saying this policy isn't valid
    anymore and must not be referenced (though we haven't freed it), but
    it can still be used by code that iterates over cpufreq_policy_list.

    Remove policy from this list during system suspend as well.
    Of course, we must add it back whenever the first CPU belonging to
    that policy shows up.

    [rjw: Changelog]
    Signed-off-by: Viresh Kumar
    Signed-off-by: Rafael J. Wysocki

    Viresh Kumar
     
  • Align closing brace '}' of an if block.

    [rjw: Subject and changelog]
    Signed-off-by: Viresh Kumar
    Signed-off-by: Rafael J. Wysocki

    Viresh Kumar
     

18 Aug, 2013

1 commit


10 Aug, 2013

5 commits

  • The __cpufreq_governor() function can fail in rare cases especially
    if there are bugs in cpufreq drivers. Thus we must stop processing
    as soon as this routine fails, otherwise it may result in undefined
    behavior.

    This patch adds error checking code whenever this routine is called
    from any place.

    Signed-off-by: Viresh Kumar
    Signed-off-by: Rafael J. Wysocki

    Viresh Kumar
     
  • Critical sections of the cpufreq core are protected with the help of
    the driver module owner's refcount, which isn't the correct approach,
    because it causes rmmod to return an error when some routine has
    updated that refcount.

    Let's use rwsem for this purpose instead. Only
    cpufreq_unregister_driver() will use write sem
    and everybody else will use read sem.

    [rjw: Subject & changelog]
    Signed-off-by: Viresh Kumar
    Signed-off-by: Rafael J. Wysocki

    Viresh Kumar
     
  • The cpufreq governor owner refcount usage is broken. We should only
    increment that refcount when a CPUFREQ_GOV_POLICY_INIT event has come
    and it should only be decremented if CPUFREQ_GOV_POLICY_EXIT has come.

    Currently, there can be situations where the governor is in use, but
    we have allowed it to be unloaded which may result in undefined
    behavior. Let's fix it.

    Signed-off-by: Viresh Kumar
    Signed-off-by: Rafael J. Wysocki

    Viresh Kumar
     
  • To iterate over all policies we currently iterate over all CPUs and
    then get the policy for each of them. Let's use the newly created
    cpufreq_policy_list for this purpose.

    [rjw: Changelog]
    Signed-off-by: Viresh Kumar
    Signed-off-by: Rafael J. Wysocki

    Viresh Kumar
     
  • Policies available in the cpufreq framework are now linked together.
    They are accessible via cpufreq_policy_list defined in the cpufreq
    core.

    [rjw: Fix from Yinghai Lu folded in]
    Signed-off-by: Lukasz Majewski
    Signed-off-by: Myungjoo Ham
    Signed-off-by: Viresh Kumar
    Signed-off-by: Rafael J. Wysocki

    Lukasz Majewski
     

08 Aug, 2013

12 commits

  • Chapter 14 of Documentation/CodingStyle says:

    The preferred form for passing a size of a struct is the following:

    p = kmalloc(sizeof(*p), ...);

    The alternative form where struct name is spelled out hurts
    readability and introduces an opportunity for a bug when the pointer
    variable type is changed but the corresponding sizeof that is passed
    to a memory allocator is not.

    This wasn't followed consistently in drivers/cpufreq, let's make it
    more consistent by always following this rule.

    Signed-off-by: Viresh Kumar
    Signed-off-by: Rafael J. Wysocki

    Viresh Kumar
     
  • They are called policy, cur_policy, new_policy, data, etc. Just call
    them policy wherever possible.

    Signed-off-by: Viresh Kumar
    Signed-off-by: Rafael J. Wysocki

    Viresh Kumar
     
  • This patch addresses the following issues in the header files in the
    cpufreq core:
    - Include headers in ascending order, so that we don't add same
    many times by mistake.
    - must be included after , so that they override
    whatever they need to.
    - Remove unnecessary includes.
    - Don't include files already included by cpufreq.h or
    cpufreq_governor.h.

    [rjw: Changelog]
    Signed-off-by: Viresh Kumar
    Signed-off-by: Rafael J. Wysocki

    Viresh Kumar
     
  • * pm-cpufreq:
    cpufreq: Remove unused function __cpufreq_driver_getavg()
    cpufreq: Remove unused APERF/MPERF support
    cpufreq: ondemand: Change the calculation of target frequency

    Rafael J. Wysocki
     
  • The caller of cpufreq_add_policy_cpu() already has a pointer to the
    policy structure and there is no need to look it up again in
    cpufreq_add_policy_cpu(). Let's pass it directly.

    Signed-off-by: Viresh Kumar
    Reviewed-by: Srivatsa S. Bhat
    Signed-off-by: Rafael J. Wysocki

    Viresh Kumar
     
  • The only case triggering a jump to the err_out_unregister label in
    __cpufreq_add_dev() is when cpufreq_add_dev_interface() fails.
    However, if cpufreq_add_dev_interface() fails, it calls kobject_put()
    for the policy kobject in its error code path and since that causes
    the kobject's refcount to become 0, the additional kobject_put() for
    the same kobject under err_out_unregister and the
    wait_for_completion() following it are pointless, so drop them.

    Signed-off-by: Rafael J. Wysocki
    Reviewed-by: Srivatsa S. Bhat
    Acked-by: Viresh Kumar

    Rafael J. Wysocki
     
  • The cpufreq core is a little inconsistent in the way it uses the
    driver module refcount.

    Namely, if __cpufreq_add_dev() is called for a CPU that doesn't
    share the policy object with any other CPUs, the driver module
    refcount it grabs to start with will be dropped by it before
    returning and will be equal to whatever it had been before that
    function was invoked.

    However, if the given CPU does share the policy object with other
    CPUs, either cpufreq_add_policy_cpu() is called to link the new CPU
    to the existing policy, or cpufreq_add_dev_symlink() is used to link
    the other CPUs sharing the policy with it to the just created policy
    object. In that case, because both cpufreq_add_policy_cpu() and
    cpufreq_add_dev_symlink() call cpufreq_cpu_get() for the given
    policy (the latter possibly many times) without the balancing
    cpufreq_cpu_put() (unless there is an error), the driver module
    refcount will be left by __cpufreq_add_dev() with a nonzero value
    (different from the initial one).

    To remove that inconsistency make cpufreq_add_policy_cpu() execute
    cpufreq_cpu_put() for the given policy before returning, which
    decrements the driver module refcount so that it will be equal to its
    initial value after __cpufreq_add_dev() returns. Also remove the
    cpufreq_cpu_get() call from cpufreq_add_dev_symlink(), since both the
    policy refcount and the driver module refcount are nonzero when it is
    called and they don't need to be bumped up by it.

    Accordingly, drop the cpufreq_cpu_put() from __cpufreq_remove_dev(),
    since it is only necessary to balance the cpufreq_cpu_get() called
    by cpufreq_add_policy_cpu() or cpufreq_add_dev_symlink().

    Signed-off-by: Rafael J. Wysocki
    Reviewed-by: Srivatsa S. Bhat
    Acked-by: Viresh Kumar

    Rafael J. Wysocki
     
  • Pointer to struct cpufreq_policy is already passed to these routines
    and we don't need to send policy->cpu to them as well. So, get rid
    of this extra argument and use policy->cpu everywhere.

    Signed-off-by: Viresh Kumar
    Signed-off-by: Rafael J. Wysocki

    Viresh Kumar
     
  • We call cpufreq_cpu_get() in cpufreq_add_dev_symlink() to increase usage
    refcount of policy, but not to get a policy for the given CPU. So, we
    don't really need to capture the return value of this routine. We can
    simply use policy passed as an argument to cpufreq_add_dev_symlink().

    Moreover debug print is rewritten to make it more clear.

    [rjw: Changelog]
    Signed-off-by: Viresh Kumar
    Signed-off-by: Rafael J. Wysocki

    Viresh Kumar
     
  • Now that we have the infrastructure to perform a light-weight init/tear-down,
    use that in the cpufreq CPU hotplug notifier when invoked from the
    suspend/resume path.

    This also ensures that the file permissions of the cpufreq sysfs files are
    preserved across suspend/resume, something which commit a66b2e (cpufreq:
    Preserve sysfs files across suspend/resume) originally intended to do, but
    had to be reverted due to other problems.

    Signed-off-by: Srivatsa S. Bhat
    Acked-by: Viresh Kumar
    Signed-off-by: Rafael J. Wysocki

    Srivatsa S. Bhat
     
  • To perform light-weight cpu-init and teardown in the cpufreq subsystem
    during suspend/resume, we need to separate out the 2 main functionalities
    of the cpufreq CPU hotplug callbacks, as outlined below:

    1. Init/tear-down of core cpufreq and CPU-specific components, which are
    critical to the correct functioning of the cpufreq subsystem.

    2. Init/tear-down of cpufreq sysfs files during suspend/resume.

    The first part requires accurate updates to the policy structure such as
    its ->cpus and ->related_cpus masks, whereas the second part requires that
    the policy->kobj structure is not released or re-initialized during
    suspend/resume.

    To handle both these requirements, we need to allow updates to the policy
    structure throughout suspend/resume, but prevent the structure from getting
    freed up. Also, we must have a mechanism by which the cpu-up callbacks can
    restore the policy structure, without allocating things afresh. (That also
    helps avoid memory leaks).

    To achieve this, we use 2 schemes:
    a. Use a fallback per-cpu storage area for preserving the policy structures
    during suspend, so that they can be restored during resume appropriately.

    b. Use the 'frozen' flag to determine when to free or allocate the policy
    structure vs when to restore the policy from the saved fallback storage.
    Thus we can successfully preserve the structure across suspend/resume.

    Effectively, this helps us complete the separation of the 'light-weight'
    and the 'full' init/tear-down sequences in the cpufreq subsystem, so that
    this can be made use of in the suspend/resume scenario.

    Signed-off-by: Srivatsa S. Bhat
    Acked-by: Viresh Kumar
    Signed-off-by: Rafael J. Wysocki

    Srivatsa S. Bhat
     
  • During suspend/resume we would like to do a light-weight init/teardown of
    CPUs in the cpufreq subsystem and preserve certain things such as sysfs files
    etc across suspend/resume transitions. Add a flag called 'frozen' to help
    distinguish the full init/teardown sequence from the light-weight one.

    Signed-off-by: Srivatsa S. Bhat
    Acked-by: Viresh Kumar
    Signed-off-by: Rafael J. Wysocki

    Srivatsa S. Bhat