28 May, 2011

1 commit

  • sched_domain iteration now needs to be protected by rcu_read_lock().
    This patch adds two more places that need the RCU lock, which were
    spotted by the following suspicious rcu_dereference_check() usage warnings.

    kernel/sched_rt.c:1244 invoked rcu_dereference_check() without protection!
    kernel/sched_stats.h:41 invoked rcu_dereference_check() without protection!

    Signed-off-by: Xiaotian Feng
    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/r/1303469634-11678-1-git-send-email-dfeng@redhat.com
    Signed-off-by: Ingo Molnar

    Xiaotian Feng
     

24 Oct, 2010

1 commit


18 Jun, 2010

1 commit

  • account_group_xxx() functions check ->exit_state to ensure that
    current->signal is valid and can't go away. This is no longer needed:
    since ea6d290c, task->signal is pinned to task_struct.

    The comment and another hack in account_group_exec_runtime() refer
    to task_rq_unlock_wait(), which was already removed by b7b8ff63.

    Signed-off-by: Oleg Nesterov
    Signed-off-by: Peter Zijlstra
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Oleg Nesterov
     

25 Mar, 2009

1 commit

  • Impact: cleanup, new schedstat ABI

    Since they are used only in statistics and are always set to zero, the
    following fields have been removed from struct rq: yld_exp_empty,
    yld_act_empty and yld_both_empty.

    Both the Sched Debug version and SCHEDSTAT_VERSION have also been
    incremented, since the ABIs have changed.

    The schedtop tool has been updated to properly handle the new version of
    schedstat:

    http://rt.wiki.kernel.org/index.php/Schedtop_utility

    Signed-off-by: Luis Henriques
    Acked-by: Gregory Haskins
    Acked-by: Peter Zijlstra
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Luis Henriques
     

05 Feb, 2009

1 commit

  • Change the process wide cpu timers/clocks so that we:

    1) don't mess up the kernel with too many threads,
    2) don't have a per-cpu allocation for each process,
    3) have no impact when not used.

    In order to accomplish this we're going to split it into two parts:

    - clocks; which can take all the time they want since they run
    from user context -- ie. sys_clock_gettime(CLOCK_PROCESS_CPUTIME_ID)

    - timers; which need constant time sampling but since they're
    explicitly used, the user can pay the overhead.

    The clock readout will go back to a full sum of the thread group, while the
    timers will run off a global 'clock' that only runs when needed, so only
    programs that make use of the facility pay the price.

    Signed-off-by: Peter Zijlstra
    Reviewed-by: Ingo Molnar
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
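
The "clocks" half of this split is what user space sees through clock_gettime(CLOCK_PROCESS_CPUTIME_ID). A minimal user-space sketch of such a readout follows; the helper names process_cpu_ns() and spin() are made up for illustration and are not part of the patch.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdint.h>
#include <time.h>

/*
 * Read the CPU time consumed so far by the whole process (all threads),
 * in nanoseconds. Returns -1 if the clock cannot be read.
 * process_cpu_ns() is a hypothetical helper name.
 */
static int64_t process_cpu_ns(void)
{
    struct timespec ts;

    if (clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &ts) != 0)
        return -1;
    return (int64_t)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

/* Burn some CPU so that the process clock has a chance to advance. */
static uint64_t spin(uint64_t iterations)
{
    volatile uint64_t acc = 0;

    for (uint64_t i = 0; i < iterations; i++)
        acc += i;
    return acc;
}
```

Calling process_cpu_ns() before and after spin() should show a non-decreasing value. Per the message above, the cost of summing over the thread group is paid by the caller of the clock, not by every tick.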
     

08 Jan, 2009

1 commit

  • Either we bounce one cacheline per cpu per tick, yielding n^2 bounces
    or we just bounce a single..

    Also, using per-cpu allocations for the thread-groups complicates the
    per-cpu allocator in that it's currently aimed to be a fixed-size
    allocator and the only possible extension to that would be vmap based,
    which is seriously constrained on 32 bit archs.

    So making the per-cpu memory requirement depend on the number of
    processes is an issue.

    Lastly, it didn't deal with cpu-hotplug, although admittedly that might
    be fixable.

    Signed-off-by: Peter Zijlstra
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     

03 Jan, 2009

1 commit

  • …/git/tip/linux-2.6-tip

    * 'cpus4096-for-linus-2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (66 commits)
    x86: export vector_used_by_percpu_irq
    x86: use logical apicid in x2apic_cluster's x2apic_cpu_mask_to_apicid_and()
    sched: nominate preferred wakeup cpu, fix
    x86: fix lguest used_vectors breakage, -v2
    x86: fix warning in arch/x86/kernel/io_apic.c
    sched: fix warning in kernel/sched.c
    sched: move test_sd_parent() to an SMP section of sched.h
    sched: add SD_BALANCE_NEWIDLE at MC and CPU level for sched_mc>0
    sched: activate active load balancing in new idle cpus
    sched: bias task wakeups to preferred semi-idle packages
    sched: nominate preferred wakeup cpu
    sched: favour lower logical cpu number for sched_mc balance
    sched: framework for sched_mc/smt_power_savings=N
    sched: convert BALANCE_FOR_xx_POWER to inline functions
    x86: use possible_cpus=NUM to extend the possible cpus allowed
    x86: fix cpu_mask_to_apicid_and to include cpu_online_mask
    x86: update io_apic.c to the new cpumask code
    x86: Introduce topology_core_cpumask()/topology_thread_cpumask()
    x86: xen: use smp_call_function_many()
    x86: use work_on_cpu in x86/kernel/cpu/mcheck/mce_amd_64.c
    ...

    Fixed up trivial conflict in kernel/time/tick-sched.c manually

    Linus Torvalds
     

18 Dec, 2008

1 commit

  • Impact: simplify code

    When we turn on CONFIG_SCHEDSTATS, per-task cpu runtime is accumulated
    twice. Once in task->se.sum_exec_runtime and once in sched_info.cpu_time.
    These two stats are exactly the same.

    Given that task->se.sum_exec_runtime is always accumulated by the core
    scheduler, sched_info can reuse that data instead of duplicating the accounting.

    Signed-off-by: Ken Chen
    Acked-by: Peter Zijlstra
    Signed-off-by: Ingo Molnar

    Ken Chen
     

13 Dec, 2008

2 commits

  • Conflicts:

    arch/x86/kernel/io_apic.c
    kernel/sched.c
    kernel/sched_stats.h

    Rusty Russell
     
  • …t_scnprintf to take pointers.

    Impact: change calling convention of existing cpumask APIs

    Most cpumask functions started with cpus_: these have been replaced by
    cpumask_ ones which take struct cpumask pointers as expected.

    These four functions don't have good replacement names; fortunately
    they're rarely used, so we just change them over.

    Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
    Signed-off-by: Mike Travis <travis@sgi.com>
    Acked-by: Ingo Molnar <mingo@elte.hu>
    Cc: paulus@samba.org
    Cc: mingo@redhat.com
    Cc: tony.luck@intel.com
    Cc: ralf@linux-mips.org
    Cc: Greg Kroah-Hartman <gregkh@suse.de>
    Cc: cl@linux-foundation.org
    Cc: srostedt@redhat.com

    Rusty Russell
     

25 Nov, 2008

1 commit


17 Nov, 2008

1 commit

  • Impact: fix potential NULL dereference

    Contrary to the ad474caca3e2a0550b7ce0706527ad5ab389a4d4 changelog, other
    acct_group_xxx() helpers can be called after exit_notify() by the timer tick.
    Thanks to Roland for pointing this out. Somehow I missed this simple fact
    when I read the original patch, and I am afraid I confused Frank during
    the discussion. Sorry.

    Fortunately, these helpers work with current, so we can check ->exit_state
    to ensure that ->signal can't go away under us.

    Also, add a comment and a compiler barrier to account_group_exec_runtime(),
    to make sure we load ->signal only once.

    Signed-off-by: Oleg Nesterov
    Signed-off-by: Ingo Molnar

    Oleg Nesterov
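
The "load ->signal only once" idea can be sketched in user space as below. This is an illustration under stated assumptions, not the kernel code: struct model_task, struct group_totals and compiler_barrier() are hypothetical stand-ins, and the barrier macro relies on GCC/Clang inline assembly.

```c
#include <stddef.h>
#include <stdint.h>

/* Compiler-only barrier (GCC/Clang specific; an assumption of this sketch). */
#define compiler_barrier() __asm__ __volatile__("" ::: "memory")

/* Hypothetical stand-ins for the per-group totals and the task. */
struct group_totals {
    uint64_t sum_exec_runtime;
};

struct model_task {
    struct group_totals *signal; /* may be cleared concurrently */
};

/*
 * Read tsk->signal into a local exactly once, then test and use only the
 * local copy. The barrier keeps the compiler from replacing later uses of
 * `sig` with a fresh read of tsk->signal.
 */
static void model_account_exec_runtime(struct model_task *tsk, uint64_t ns)
{
    struct group_totals *sig = tsk->signal;

    compiler_barrier();
    if (!sig)
        return;
    sig->sum_exec_runtime += ns;
}
```

The point of the pattern is that the NULL check and the update both act on the same snapshot of the pointer.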
     

24 Oct, 2008

2 commits

  • * 'proc' of git://git.kernel.org/pub/scm/linux/kernel/git/adobriyan/proc: (35 commits)
    proc: remove fs/proc/proc_misc.c
    proc: move /proc/vmcore creation to fs/proc/vmcore.c
    proc: move pagecount stuff to fs/proc/page.c
    proc: move all /proc/kcore stuff to fs/proc/kcore.c
    proc: move /proc/schedstat boilerplate to kernel/sched_stats.h
    proc: move /proc/modules boilerplate to kernel/module.c
    proc: move /proc/diskstats boilerplate to block/genhd.c
    proc: move /proc/zoneinfo boilerplate to mm/vmstat.c
    proc: move /proc/vmstat boilerplate to mm/vmstat.c
    proc: move /proc/pagetypeinfo boilerplate to mm/vmstat.c
    proc: move /proc/buddyinfo boilerplate to mm/vmstat.c
    proc: move /proc/vmallocinfo to mm/vmalloc.c
    proc: move /proc/slabinfo boilerplate to mm/slub.c, mm/slab.c
    proc: move /proc/slab_allocators boilerplate to mm/slab.c
    proc: move /proc/interrupts boilerplate code to fs/proc/interrupts.c
    proc: move /proc/stat to fs/proc/stat.c
    proc: move rest of /proc/partitions code to block/genhd.c
    proc: move /proc/cpuinfo code to fs/proc/cpuinfo.c
    proc: move /proc/devices code to fs/proc/devices.c
    proc: move rest of /proc/locks to fs/locks.c
    ...

    Linus Torvalds
     
  • …l/git/tip/linux-2.6-tip

    * 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    sched: disable the hrtick for now
    sched: revert back to per-rq vruntime
    sched: fair scheduler should not resched rt tasks
    sched: optimize group load balancer
    sched: minor fast-path overhead reduction
    sched: fix the wrong mask_len, cleanup
    sched: kill unused scheduler decl.
    sched: fix the wrong mask_len
    sched: only update rq->clock while holding rq->lock

    Linus Torvalds
     

23 Oct, 2008

1 commit


17 Oct, 2008

2 commits


28 Sep, 2008

1 commit


23 Sep, 2008

1 commit

  • This is the second resubmission of the posix timer rework patch, posted
    a few days ago.

    This includes the changes from the previous resubmission, which addressed
    Oleg Nesterov's comments, removing the RCU stuff from the patch and
    un-inlining the thread_group_cputime() function for SMP.

    In addition, per Ingo Molnar it simplifies the UP code, consolidating much
    of it with the SMP version and depending on lower-level SMP/UP handling to
    take care of the differences.

    It also cleans up some UP compile errors, moves the scheduler stats-related
    macros into kernel/sched_stats.h, cleans up a merge error in
    kernel/fork.c and has a few other minor fixes and cleanups as suggested
    by Oleg and Ingo. Thanks for the review, guys.

    Signed-off-by: Frank Mayhar
    Cc: Roland McGrath
    Cc: Alexey Dobriyan
    Cc: Andrew Morton
    Signed-off-by: Ingo Molnar

    Frank Mayhar
     

04 Jul, 2008

1 commit

  • On Thu, Jun 19, 2008 at 12:27:14PM +0200, Peter Zijlstra wrote:
    > On Thu, 2008-06-05 at 10:50 +0530, Ankita Garg wrote:
    >
    > > Thanks Peter for the explanation...
    > >
    > > I agree with the above and that is the reason why I did not see weird
    > > values with cpu_time. But, run_delay still would suffer skews as the end
    > > points for delta could be taken on different cpus due to migration (more
    > > so on RT kernel due to the push-pull operations). With the below patch,
    > > I could not reproduce the issue I had seen earlier. After every dequeue,
    > > we take the delta and start wait measurements from zero when moved to a
    > > different rq.
    >
    > OK, so task delay accounting is broken because it doesn't take
    > migration into account.
    >
    > What you've done is make it symmetric wrt enqueue, and account it like
    >
    > cpu0      cpu1
    >
    > enqueue
    >
    > dequeue
    >           enqueue
    >
    >           run
    >
    > Where you add both d1 and d2 to the run_delay,.. right?
    >

    Thanks for reviewing the patch. The above is exactly what I have done.

    > This seems like a good fix, however it looks like the patch will break
    > compilation in !CONFIG_SCHEDSTATS && !CONFIG_TASK_DELAY_ACCT, due to it
    > failing to provide a stub for sched_info_dequeue() in that case.

    Fixed. Please find the new patch below.

    Signed-off-by: Ankita Garg
    Acked-by: Peter Zijlstra
    Cc: Gregory Haskins
    Cc: rostedt@goodmis.org
    Cc: suresh.b.siddha@intel.com
    Cc: aneesh.kumar@linux.vnet.ibm.com
    Cc: dhaval@linux.vnet.ibm.com
    Cc: vatsa@linux.vnet.ibm.com
    Cc: David Bahi
    Signed-off-by: Ingo Molnar

    Ankita Garg
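
The accounting discussed in the thread above (close the wait interval at every dequeue, restart it on the next runqueue) can be modelled in a few lines. This is a user-space sketch with hypothetical names (model_sched_info, model_queued(), model_dequeued(), model_arrive()); timestamps are passed in explicitly to stand for each CPU's own runqueue clock, and 0 is assumed to mean "not waiting".

```c
#include <stdint.h>

/* Hypothetical model of the per-task wait accounting. */
struct model_sched_info {
    uint64_t last_queued; /* timestamp on the current runqueue; 0 = not waiting */
    uint64_t run_delay;   /* total time spent waiting on runqueues */
};

/* Close the current wait interval, if one is open. */
static void model_close_wait(struct model_sched_info *si, uint64_t now)
{
    if (si->last_queued)
        si->run_delay += now - si->last_queued;
    si->last_queued = 0;
}

/* Task is placed on a runqueue whose clock reads `now`. */
static void model_queued(struct model_sched_info *si, uint64_t now)
{
    if (!si->last_queued)
        si->last_queued = now;
}

/* Task leaves a runqueue without having run, e.g. because it migrates. */
static void model_dequeued(struct model_sched_info *si, uint64_t now)
{
    model_close_wait(si, now);
}

/* Task finally gets the CPU. */
static void model_arrive(struct model_sched_info *si, uint64_t now)
{
    model_close_wait(si, now);
}
```

With this shape both end points of every delta come from the same clock, so d1 (measured on cpu0) and d2 (measured on cpu1) add up correctly even when the two clocks are far apart.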
     

19 Jun, 2008

1 commit

  • This patch corrects the incorrect value of per process run-queue wait
    time reported by delay statistics. The anomaly was due to the following
    reason. When a process leaves the CPU and immediately starts waiting for
    CPU on the runqueue (which means it remains in the TASK_RUNNING state),
    the time of re-entry into the run-queue is never recorded. Due to this,
    the waiting time on the runqueue from this point of re-entry up to the
    next time it hits the CPU is not accounted for. This is solved by
    recording the time of re-entry of a process leaving the CPU in the
    sched_info_depart() function IF the process will go back to waiting on
    the run-queue. This IF condition is verified by checking whether the
    process is still in the TASK_RUNNING state.

    The patch was tested on 2.6.26-rc6 using two simple CPU hog programs.
    The values noted prior to the fix did not account for the time spent on
    the runqueue waiting. After the fix, the correct values were reported
    back to user space.

    Signed-off-by: Bharath Ravi
    Signed-off-by: Madhava K R
    Cc: dhaval@linux.vnet.ibm.com
    Cc: vatsa@in.ibm.com
    Cc: balbir@in.ibm.com
    Acked-by: Peter Zijlstra
    Signed-off-by: Ingo Molnar

    Bharath Ravi
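
A small user-space model of the fix described above: when a task leaves the CPU but is still runnable, the departure time is also recorded as its re-entry time on the runqueue. All names here (requeue_info, requeue_arrive(), requeue_depart()) are hypothetical, and the `still_runnable` flag stands in for the task-state check mentioned in the message.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the per-task delay statistics. */
struct requeue_info {
    uint64_t last_arrival; /* when the task last got the CPU */
    uint64_t last_queued;  /* when it started waiting; 0 = not waiting */
    uint64_t cpu_time;     /* total time on the CPU */
    uint64_t run_delay;    /* total time waiting on the runqueue */
};

/* Task gets the CPU: account the wait that just ended. */
static void requeue_arrive(struct requeue_info *si, uint64_t now)
{
    if (si->last_queued)
        si->run_delay += now - si->last_queued;
    si->last_queued = 0;
    si->last_arrival = now;
}

/*
 * Task leaves the CPU. If it is still runnable it goes straight back to
 * waiting, so the wait interval has to start here.
 */
static void requeue_depart(struct requeue_info *si, uint64_t now,
                           bool still_runnable)
{
    si->cpu_time += now - si->last_arrival;
    if (still_runnable)
        si->last_queued = now;
}
```

Without the `still_runnable` branch, a preempted CPU hog would report a run_delay of zero, which matches the symptom the message describes.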
     

29 May, 2008

1 commit

  • The Coverity checker spotted a memleak introduced by commit
    39106dcf85285e78f3b290022122c76f851379b8 (cpumask: use new cpus_scnprintf
    function).

    It seems the kfree() got lost between v2 and v3 of this patch...

    Signed-off-by: Adrian Bunk
    Cc: Mike Travis
    Signed-off-by: Andrew Morton
    Signed-off-by: Ingo Molnar

    Adrian Bunk
     

20 Apr, 2008

1 commit

  • * Cleaned up references to cpumask_scnprintf() and added new
    cpulist_scnprintf() interfaces where appropriate.

    * Fix some small bugs (or code efficiency improvements) for various uses
    of cpumask_scnprintf.

    * Clean up some checkpatch errors.

    Signed-off-by: Mike Travis
    Signed-off-by: Ingo Molnar

    Mike Travis
     

28 Nov, 2007

1 commit


10 Nov, 2007

1 commit

  • Fix the delay accounting regression introduced by commit
    75d4ef16a6aa84f708188bada182315f80aab6fa. rq no longer has sched_info
    data associated with it. The task_struct sched_info structure is used by
    delay accounting to report statistics back to user space.

    Also remove direct use of sched_clock() (which is not a valid thing to
    do anymore) and use rq->clock instead.

    Signed-off-by: Balbir Singh
    Signed-off-by: Ingo Molnar

    Balbir Singh
     

19 Oct, 2007

1 commit

  • schedstat is useful in investigating CPU scheduler behavior. Ideally,
    I think it is beneficial to have it on all the time. However, the
    cost of turning it on in a production system is quite high, largely due
    to the number of events it collects and also due to its large memory
    footprint.

    Most of the fields probably don't need to be full 64-bit on 64-bit
    archs. Rolling over 4 billion events will most likely take a long time,
    and user space tools can be made to accommodate that. I'm proposing that
    the kernel cut back the width of most variables on 64-bit systems. (Note:
    the following patch doesn't affect 32-bit systems.)

    Signed-off-by: Ken Chen
    Signed-off-by: Ingo Molnar

    Ken Chen
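
One way a user-space tool could accommodate rollover of a 32-bit event counter is to work with wrapping deltas between samples. The sketch below assumes the counter is sampled more often than once per 2^32 events; counter_delta32() and struct wide_counter are illustrative names, not part of any existing tool.

```c
#include <stdint.h>

/*
 * Difference between two samples of a free-running 32-bit counter.
 * Unsigned arithmetic wraps modulo 2^32, so the result stays correct
 * across a single rollover between `prev` and `cur`.
 */
static uint32_t counter_delta32(uint32_t prev, uint32_t cur)
{
    return (uint32_t)(cur - prev);
}

/* 64-bit running total maintained by the tool itself (hypothetical). */
struct wide_counter {
    uint32_t last;  /* most recent raw sample */
    uint64_t total; /* events accumulated since the first sample */
};

/* Fold a new raw sample into the 64-bit total. */
static void wide_counter_update(struct wide_counter *w, uint32_t sample)
{
    w->total += counter_delta32(w->last, sample);
    w->last = sample;
}
```

For example, going from a sample of 0xFFFFFFF0 to 0x10 yields a delta of 0x20, not a huge negative jump.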
     

15 Oct, 2007

2 commits

  • rename all 'cnt' fields and variables to the less yucky 'count' name.

    yuckage noticed by Andrew Morton.

    no change in code, other than the /proc/sched_debug bkl_count string got
    a bit larger:

       text    data     bss     dec     hex filename
      38236    3506      24   41766    a326 sched.o.before
      38240    3506      24   41770    a32a sched.o.after

    Signed-off-by: Ingo Molnar
    Reviewed-by: Thomas Gleixner

    Ingo Molnar
     
  • fix delay accounting performance regression - those sched_clock()
    calls are not needed.

    Signed-off-by: Ingo Molnar
    Signed-off-by: Peter Zijlstra
    Reviewed-by: Thomas Gleixner

    Ingo Molnar
     

02 Aug, 2007

1 commit


10 Jul, 2007

2 commits