03 Aug, 2018

1 commit

  • [ Upstream commit 02acc80d19edb0d5684c997b2004ad19f9f5236e ]

    try_to_wake_up() might invoke delayacct_blkio_end() while holding the
    pi_lock (which is a raw_spinlock_t). delayacct_blkio_end() acquires
    task_delay_info.lock which is a spinlock_t. This causes a might sleep splat
    on -RT where non raw spinlocks are converted to 'sleeping' spinlocks.

    task_delay_info.lock is only held for a short amount of time so it's not a
    problem latency wise to make convert it to a raw spinlock.

    Signed-off-by: Sebastian Andrzej Siewior
    Signed-off-by: Thomas Gleixner
    Cc: Balbir Singh
    Link: https://lkml.kernel.org/r/20180423161024.6710-1-bigeasy@linutronix.de
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Sebastian Andrzej Siewior
     

24 Jan, 2018

1 commit

  • commit c96f5471ce7d2aefd0dda560cc23f08ab00bc65d upstream.

    Before commit:

    e33a9bba85a8 ("sched/core: move IO scheduling accounting from io_schedule_timeout() into scheduler")

    delayacct_blkio_end() was called after context-switching into the task which
    completed I/O.

    This resulted in double counting: the task would account a delay both waiting
    for I/O and for time spent in the runqueue.

    With e33a9bba85a8, delayacct_blkio_end() is called by try_to_wake_up().
    In ttwu, we have not yet context-switched. This is more correct, in that
    the delay accounting ends when the I/O is complete.

    But delayacct_blkio_end() relies on 'get_current()', and we have not yet
    context-switched into the task whose I/O completed. This results in the
    wrong task having its delay accounting statistics updated.

    Instead of doing that, pass the task_struct being woken to delayacct_blkio_end(),
    so that it can update the statistics of the correct task.

    Signed-off-by: Josh Snyder
    Acked-by: Tejun Heo
    Acked-by: Balbir Singh
    Cc: Brendan Gregg
    Cc: Jens Axboe
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: linux-block@vger.kernel.org
    Fixes: e33a9bba85a8 ("sched/core: move IO scheduling accounting from io_schedule_timeout() into scheduler")
    Link: http://lkml.kernel.org/r/1513613712-571-1-git-send-email-joshs@netflix.com
    Signed-off-by: Ingo Molnar
    Signed-off-by: Greg Kroah-Hartman

    Josh Snyder
     

02 Mar, 2017

2 commits


01 Feb, 2017

2 commits

  • Use the new nsec based cputime accessors as part of the whole cputime
    conversion from cputime_t to nsecs.

    Signed-off-by: Frederic Weisbecker
    Cc: Benjamin Herrenschmidt
    Cc: Fenghua Yu
    Cc: Heiko Carstens
    Cc: Linus Torvalds
    Cc: Martin Schwidefsky
    Cc: Michael Ellerman
    Cc: Paul Mackerras
    Cc: Peter Zijlstra
    Cc: Rik van Riel
    Cc: Stanislaw Gruszka
    Cc: Thomas Gleixner
    Cc: Tony Luck
    Cc: Wanpeng Li
    Link: http://lkml.kernel.org/r/1485832191-26889-14-git-send-email-fweisbec@gmail.com
    Signed-off-by: Ingo Molnar

    Frederic Weisbecker
     
  • This API returns a task's cputime in cputime_t in order to ease the
    conversion of cputime internals to use nsecs units instead. Blindly
    converting all cputime readers to use this API now will later let us
    convert more smoothly and step by step all these places to use the
    new nsec based cputime.

    Signed-off-by: Frederic Weisbecker
    Cc: Benjamin Herrenschmidt
    Cc: Fenghua Yu
    Cc: Heiko Carstens
    Cc: Linus Torvalds
    Cc: Martin Schwidefsky
    Cc: Michael Ellerman
    Cc: Paul Mackerras
    Cc: Peter Zijlstra
    Cc: Rik van Riel
    Cc: Stanislaw Gruszka
    Cc: Thomas Gleixner
    Cc: Tony Luck
    Cc: Wanpeng Li
    Link: http://lkml.kernel.org/r/1485832191-26889-7-git-send-email-fweisbec@gmail.com
    Signed-off-by: Ingo Molnar

    Frederic Weisbecker
     

15 Jan, 2016

1 commit

  • Mark those kmem allocations that are known to be easily triggered from
    userspace as __GFP_ACCOUNT/SLAB_ACCOUNT, which makes them accounted to
    memcg. For the list, see below:

    - threadinfo
    - task_struct
    - task_delay_info
    - pid
    - cred
    - mm_struct
    - vm_area_struct and vm_region (nommu)
    - anon_vma and anon_vma_chain
    - signal_struct
    - sighand_struct
    - fs_struct
    - files_struct
    - fdtable and fdtable->full_fds_bits
    - dentry and external_name
    - inode for all filesystems. This is the most tedious part, because
    most filesystems overwrite the alloc_inode method.

    The list is far from complete, so feel free to add more objects.
    Nevertheless, it should be close to "account everything" approach and
    keep most workloads within bounds. Malevolent users will be able to
    breach the limit, but this was possible even with the former "account
    everything" approach (simply because it did not account everything in
    fact).

    [akpm@linux-foundation.org: coding-style fixes]
    Signed-off-by: Vladimir Davydov
    Acked-by: Johannes Weiner
    Acked-by: Michal Hocko
    Cc: Tejun Heo
    Cc: Greg Thelen
    Cc: Christoph Lameter
    Cc: Pekka Enberg
    Cc: David Rientjes
    Cc: Joonsoo Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vladimir Davydov
     

24 Jul, 2014

2 commits


12 Jun, 2014

1 commit

  • do_posix_clock_monotonic_gettime() is a leftover from the initial
    posix timer implementation which maps to ktime_get_ts(). Remove the
    silly wrapper while at it.

    Signed-off-by: Thomas Gleixner
    Cc: John Stultz
    Cc: Peter Zijlstra
    Link: http://lkml.kernel.org/r/20140611234606.931409215@linutronix.de
    Signed-off-by: Thomas Gleixner

    Thomas Gleixner
     

13 Nov, 2013

1 commit


28 Jan, 2013

1 commit

  • This is in preparation for the full dynticks feature. While
    remotely reading the cputime of a task running in a full
    dynticks CPU, we'll need to do some extra-computation. This
    way we can account the time it spent tickless in userspace
    since its last cputime snapshot.

    Signed-off-by: Frederic Weisbecker
    Cc: Andrew Morton
    Cc: Ingo Molnar
    Cc: Li Zhong
    Cc: Namhyung Kim
    Cc: Paul E. McKenney
    Cc: Paul Gortmaker
    Cc: Peter Zijlstra
    Cc: Steven Rostedt
    Cc: Thomas Gleixner

    Frederic Weisbecker
     

14 Jul, 2011

1 commit

  • To implement steal time, we need the hypervisor to pass the guest
    information about how much time was spent running other processes
    outside the VM, while the vcpu had meaningful work to do - halt
    time does not count.

    This information is acquired through the run_delay field of
    delayacct/schedstats infrastructure, that counts time spent in a
    runqueue but not running.

    Steal time is a per-cpu information, so the traditional MSR-based
    infrastructure is used. A new msr, KVM_MSR_STEAL_TIME, holds the
    memory area address containing information about steal time

    This patch contains the hypervisor part of the steal time infrasructure,
    and can be backported independently of the guest portion.

    [avi, yongjie: export delayacct_on, to avoid build failures in some configs]

    Signed-off-by: Glauber Costa
    Tested-by: Eric B Munson
    CC: Rik van Riel
    CC: Jeremy Fitzhardinge
    CC: Peter Zijlstra
    CC: Anthony Liguori
    Signed-off-by: Yongjie Ren
    Signed-off-by: Avi Kivity

    Glauber Costa
     

19 Sep, 2009

1 commit


18 Dec, 2008

1 commit

  • Impact: simplify code

    When we turn on CONFIG_SCHEDSTATS, per-task cpu runtime is accumulated
    twice. Once in task->se.sum_exec_runtime and once in sched_info.cpu_time.
    These two stats are exactly the same.

    Given that task->se.sum_exec_runtime is always accumulated by the core
    scheduler, sched_info can reuse that data instead of duplicate the accounting.

    Signed-off-by: Ken Chen
    Acked-by: Peter Zijlstra
    Signed-off-by: Ingo Molnar

    Ken Chen
     

26 Jul, 2008

2 commits

  • Add members for memory reclaim delay to taskstats, and accumulate them in
    __delayacct_add_tsk() .

    Signed-off-by: Keika Kobayashi
    Cc: Hiroshi Shimamoto
    Cc: Balbir Singh
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Keika Kobayashi
     
  • Sometimes, application responses become bad under heavy memory load.
    Applications take a bit time to reclaim memory. The statistics, how long
    memory reclaim takes, will be useful to measure memory usage.

    This patch adds accounting memory reclaim to per-task-delay-accounting for
    accounting the time of do_try_to_free_pages().

    - When System is under low memory load,
    memory reclaim may not occur.

    $ free
    total used free shared buffers cached
    Mem: 8197800 1577300 6620500 0 4808 1516724
    -/+ buffers/cache: 55768 8142032
    Swap: 16386292 0 16386292

    $ vmstat 1
    procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
    r b swpd free buff cache si so bi bo in cs us sy id wa
    0 0 0 5069748 10612 3014060 0 0 0 0 3 26 0 0 100 0
    0 0 0 5069748 10612 3014060 0 0 0 0 4 22 0 0 100 0
    0 0 0 5069748 10612 3014060 0 0 0 0 3 18 0 0 100 0

    Measure the time of tar command.

    $ ls -s test.dat
    1501472 test.dat

    $ time tar cvf test.tar test.dat
    real 0m13.388s
    user 0m0.116s
    sys 0m5.304s

    $ ./delayget -d -p
    CPU count real total virtual total delay total
    428 5528345500 5477116080 62749891
    IO count delay total
    338 8078977189
    SWAP count delay total
    0 0
    RECLAIM count delay total
    0 0

    - When system is under heavy memory load
    memory reclaim may occur.

    $ vmstat 1
    procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
    r b swpd free buff cache si so bi bo in cs us sy id wa
    0 0 7159032 49724 1812 3012 0 0 0 0 3 24 0 0 100 0
    0 0 7159032 49724 1812 3012 0 0 0 0 4 24 0 0 100 0
    0 0 7159032 49848 1812 3012 0 0 0 0 3 22 0 0 100 0

    In this case, one process uses more 8G memory
    by execution of malloc() and memset().

    $ time tar cvf test.tar test.dat
    real 1m38.563s
    CPU count real total virtual total delay total
    9021 7140446250 7315277975 923201824
    IO count delay total
    8965 90466349669
    SWAP count delay total
    3 21036367
    RECLAIM count delay total
    740 61011951153

    In the later case, the value of RECLAIM is increasing.
    So, taskstats can show how much memory reclaim influences TAT.

    Signed-off-by: Keika Kobayashi
    Acked-by: Balbir Singh
    Acked-by: KOSAKI Motohiro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Keika Kobayashi
     

19 Oct, 2007

1 commit

  • This adds items to the taststats struct to account for user and system
    time based on scaling the CPU frequency and instruction issue rates.

    Adds account_(user|system)_time_scaled callbacks which architectures
    can use to account for time using this mechanism.

    Signed-off-by: Michael Neuling
    Cc: Balbir Singh
    Cc: Jay Lan
    Cc: Paul Mackerras
    Cc: Benjamin Herrenschmidt
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michael Neuling
     

15 Oct, 2007

1 commit

  • rename all 'cnt' fields and variables to the less yucky 'count' name.

    yuckage noticed by Andrew Morton.

    no change in code, other than the /proc/sched_debug bkl_count string got
    a bit larger:

    text data bss dec hex filename
    38236 3506 24 41766 a326 sched.o.before
    38240 3506 24 41770 a32a sched.o.after

    Signed-off-by: Ingo Molnar
    Reviewed-by: Thomas Gleixner

    Ingo Molnar
     

10 Jul, 2007

1 commit


08 May, 2007

1 commit

  • This patch provides a new macro

    KMEM_CACHE(, )

    to simplify slab creation. KMEM_CACHE creates a slab with the name of the
    struct, with the size of the struct and with the alignment of the struct.
    Additional slab flags may be specified if necessary.

    Example

    struct test_slab {
    int a,b,c;
    struct list_head;
    } __cacheline_aligned_in_smp;

    test_slab_cache = KMEM_CACHE(test_slab, SLAB_PANIC)

    will create a new slab named "test_slab" of the size sizeof(struct
    test_slab) and aligned to the alignment of test slab. If it fails then we
    panic.

    Signed-off-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     

08 Dec, 2006

2 commits

  • Replace all uses of kmem_cache_t with struct kmem_cache.

    The patch was generated using the following script:

    #!/bin/sh
    #
    # Replace one string by another in all the kernel sources.
    #

    set -e

    for file in `find * -name "*.c" -o -name "*.h"|xargs grep -l $1`; do
    quilt add $file
    sed -e "1,\$s/$1/$2/g" $file >/tmp/$$
    mv /tmp/$$ $file
    quilt refresh
    done

    The script was run like this

    sh replace kmem_cache_t "struct kmem_cache"

    Signed-off-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     
  • SLAB_KERNEL is an alias of GFP_KERNEL.

    Signed-off-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     

06 Nov, 2006

1 commit

  • Make the delayacct lock irqsave; this avoids the possible deadlock where
    an interrupt is taken while holding the delayacct lock which needs to
    take the delayacct lock.

    Signed-off-by: Peter Zijlstra
    Acked-by: Oleg Nesterov
    Cc: Balbir Singh
    Cc: Shailabh Nagar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Peter Zijlstra
     

02 Sep, 2006

1 commit

  • Cleanup allocation and freeing of tsk->delays used by delay accounting.
    This solves two problems reported for delay accounting:

    1. oops in __delayacct_blkio_ticks
    http://www.uwsg.indiana.edu/hypermail/linux/kernel/0608.2/1844.html

    Currently tsk->delays is getting freed too early in task exit which can
    cause a NULL tsk->delays to get accessed via reading of /proc//stats.
    The patch fixes this problem by freeing tsk->delays closer to when
    task_struct itself is freed up. As a result, it also eliminates the use of
    tsk->delays_lock which was only being used (inadequately) to safeguard
    access to tsk->delays while a task was exiting.

    2. Possible memory leak in kernel/delayacct.c
    http://www.uwsg.indiana.edu/hypermail/linux/kernel/0608.2/1389.html

    The patch cleans up tsk->delays allocations after a bad fork which was
    missing earlier.

    The patch has been tested to fix the problems listed above and stress
    tested with rapid calls to delay accounting's taskstats command interface
    (which is the other path that can access the same data, besides the /proc
    interface causing the oops above).

    Signed-off-by: Shailabh Nagar
    Cc: Balbir Singh
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Shailabh Nagar
     

01 Aug, 2006

1 commit

  • Enable delay accounting by default so that feature gets coverage testing
    without requiring special measures.

    Earlier, it was off by default and had to be enabled via a boot time param.
    This patch reverses the default behaviour to improve coverage testing. It
    can be removed late in the kernel development cycle if its believed users
    shouldn't have to incur any cost if they don't want delay accounting. Or
    it can be retained forever if the utility of the stats is deemed common
    enough to warrant keeping the feature on.

    Signed-off-by: Shailabh Nagar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Shailabh Nagar
     

15 Jul, 2006

4 commits

  • Export I/O delays seen by a task through /proc//stats for use in top
    etc.

    Note that delays for I/O done for swapping in pages (swapin I/O) is clubbed
    together with all other I/O here (this is not the case in the netlink
    interface where the swapin I/O is kept distinct)

    [akpm@osdl.org: printk warning fix]
    Signed-off-by: Shailabh Nagar
    Signed-off-by: Balbir Singh
    Cc: Jes Sorensen
    Cc: Peter Chubb
    Cc: Erich Focht
    Cc: Levent Serinol
    Cc: Jay Lan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Shailabh Nagar
     
  • Usage of taskstats interface by delay accounting.

    Signed-off-by: Shailabh Nagar
    Signed-off-by: Balbir Singh
    Cc: Jes Sorensen
    Cc: Peter Chubb
    Cc: Erich Focht
    Cc: Levent Serinol
    Cc: Jay Lan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Shailabh Nagar
     
  • Unlike earlier iterations of the delay accounting patches, now delays are only
    collected for the actual I/O waits rather than try and cover the delays seen
    in I/O submission paths.

    Account separately for block I/O delays incurred as a result of swapin page
    faults whose frequency can be affected by the task/process' rss limit. Hence
    swapin delays can act as feedback for rss limit changes independent of I/O
    priority changes.

    Signed-off-by: Shailabh Nagar
    Signed-off-by: Balbir Singh
    Cc: Jes Sorensen
    Cc: Peter Chubb
    Cc: Erich Focht
    Cc: Levent Serinol
    Cc: Jay Lan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Shailabh Nagar
     
  • Initialization code related to collection of per-task "delay" statistics which
    measure how long it had to wait for cpu, sync block io, swapping etc. The
    collection of statistics and the interface are in other patches. This patch
    sets up the data structures and allows the statistics collection to be
    disabled through a kernel boot parameter.

    Signed-off-by: Shailabh Nagar
    Signed-off-by: Balbir Singh
    Cc: Jes Sorensen
    Cc: Peter Chubb
    Cc: Erich Focht
    Cc: Levent Serinol
    Cc: Jay Lan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Shailabh Nagar