15 Jan, 2016

1 commit

  • Mark those kmem allocations that are known to be easily triggered from
    userspace as __GFP_ACCOUNT/SLAB_ACCOUNT, which makes them accounted to
    memcg. For the list, see below:

    - threadinfo
    - task_struct
    - task_delay_info
    - pid
    - cred
    - mm_struct
    - vm_area_struct and vm_region (nommu)
    - anon_vma and anon_vma_chain
    - signal_struct
    - sighand_struct
    - fs_struct
    - files_struct
    - fdtable and fdtable->full_fds_bits
    - dentry and external_name
    - inode for all filesystems. This is the most tedious part, because
      most filesystems override the alloc_inode method.

    The list is far from complete, so feel free to add more objects.
    Nevertheless, it should be close to the "account everything" approach and
    keep most workloads within bounds. Malevolent users will be able to
    breach the limit, but this was possible even with the former "account
    everything" approach (simply because it did not account everything in
    fact).

    [akpm@linux-foundation.org: coding-style fixes]
    Signed-off-by: Vladimir Davydov
    Acked-by: Johannes Weiner
    Acked-by: Michal Hocko
    Cc: Tejun Heo
    Cc: Greg Thelen
    Cc: Christoph Lameter
    Cc: Pekka Enberg
    Cc: David Rientjes
    Cc: Joonsoo Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vladimir Davydov
     

24 Jul, 2014

2 commits


12 Jun, 2014

1 commit

  • do_posix_clock_monotonic_gettime() is a leftover from the initial
    posix timer implementation which maps to ktime_get_ts(). Remove the
    silly wrapper while at it.

    Signed-off-by: Thomas Gleixner
    Cc: John Stultz
    Cc: Peter Zijlstra
    Link: http://lkml.kernel.org/r/20140611234606.931409215@linutronix.de
    Signed-off-by: Thomas Gleixner

    Thomas Gleixner
     

13 Nov, 2013

1 commit


28 Jan, 2013

1 commit

  • This is in preparation for the full dynticks feature. While
    remotely reading the cputime of a task running on a full
    dynticks CPU, we'll need to do some extra computation. This
    way we can account for the time it spent tickless in userspace
    since its last cputime snapshot.

    Signed-off-by: Frederic Weisbecker
    Cc: Andrew Morton
    Cc: Ingo Molnar
    Cc: Li Zhong
    Cc: Namhyung Kim
    Cc: Paul E. McKenney
    Cc: Paul Gortmaker
    Cc: Peter Zijlstra
    Cc: Steven Rostedt
    Cc: Thomas Gleixner

    Frederic Weisbecker
     

14 Jul, 2011

1 commit

  • To implement steal time, we need the hypervisor to pass the guest
    information about how much time was spent running other processes
    outside the VM, while the vcpu had meaningful work to do - halt
    time does not count.

    This information is acquired through the run_delay field of the
    delayacct/schedstats infrastructure, which counts time spent on a
    runqueue but not running.

    Steal time is per-cpu information, so the traditional MSR-based
    infrastructure is used. A new MSR, KVM_MSR_STEAL_TIME, holds the
    address of the memory area containing the steal time information.

    This patch contains the hypervisor part of the steal time infrastructure,
    and can be backported independently of the guest portion.
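
    The run_delay bookkeeping described above can be sketched in plain C.
    This is a minimal userspace illustration, not the KVM code: struct
    steal_state and steal_update() are hypothetical names, and the only
    idea taken from the message is that steal time grows by the run_delay
    accumulated since the previous update.

```c
#include <stdint.h>

/* Hypothetical per-vcpu bookkeeping; not the actual KVM structures. */
struct steal_state {
    uint64_t steal_ns;          /* accumulated steal time shown to the guest */
    uint64_t last_run_delay_ns; /* run_delay observed at the previous update */
};

/* Add the run_delay accumulated since the last update to steal_ns. */
void steal_update(struct steal_state *st, uint64_t run_delay_ns)
{
    st->steal_ns += run_delay_ns - st->last_run_delay_ns;
    st->last_run_delay_ns = run_delay_ns;
}
```

    Because only the difference between two run_delay readings is added,
    time the vcpu spent waiting before the first snapshot is not counted.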

    [avi, yongjie: export delayacct_on, to avoid build failures in some configs]

    Signed-off-by: Glauber Costa
    Tested-by: Eric B Munson
    CC: Rik van Riel
    CC: Jeremy Fitzhardinge
    CC: Peter Zijlstra
    CC: Anthony Liguori
    Signed-off-by: Yongjie Ren
    Signed-off-by: Avi Kivity

    Glauber Costa
     

19 Sep, 2009

1 commit


18 Dec, 2008

1 commit

  • Impact: simplify code

    When we turn on CONFIG_SCHEDSTATS, per-task cpu runtime is accumulated
    twice. Once in task->se.sum_exec_runtime and once in sched_info.cpu_time.
    These two stats are exactly the same.

    Given that task->se.sum_exec_runtime is always accumulated by the core
    scheduler, sched_info can reuse that data instead of duplicating the accounting.

    Signed-off-by: Ken Chen
    Acked-by: Peter Zijlstra
    Signed-off-by: Ingo Molnar

    Ken Chen
     

26 Jul, 2008

2 commits

  • Add members for memory reclaim delay to taskstats, and accumulate them in
    __delayacct_add_tsk().

    Signed-off-by: Keika Kobayashi
    Cc: Hiroshi Shimamoto
    Cc: Balbir Singh
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Keika Kobayashi
     
  • Sometimes, application response times degrade under heavy memory load,
    because applications have to spend time reclaiming memory. Statistics on
    how long memory reclaim takes are useful when measuring memory usage.

    This patch adds memory reclaim to per-task delay accounting, measuring
    the time spent in do_try_to_free_pages().
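
    The start/end pattern behind this kind of delay accounting can be
    sketched as follows. This is a simplified userspace illustration, not
    the kernel implementation: struct reclaim_delay, reclaim_start() and
    reclaim_end() are hypothetical names, and timestamps are passed in
    explicitly so the example stays deterministic.

```c
#include <stdint.h>

/* Hypothetical per-task record; field names are illustrative only. */
struct reclaim_delay {
    uint64_t start_ns; /* timestamp taken when reclaim began */
    uint64_t total_ns; /* accumulated time spent in reclaim */
    uint32_t count;    /* number of completed reclaim episodes */
};

/* Remember when the task entered memory reclaim. */
void reclaim_start(struct reclaim_delay *d, uint64_t now_ns)
{
    d->start_ns = now_ns;
}

/* Add the elapsed time to the running total and count the episode. */
void reclaim_end(struct reclaim_delay *d, uint64_t now_ns)
{
    if (now_ns >= d->start_ns) { /* skip samples from a backwards clock */
        d->total_ns += now_ns - d->start_ns;
        d->count++;
    }
}
```

    A reader of the statistics then sees a count and a cumulative delay,
    which is the shape of the RECLAIM line in the output shown below.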

    - When the system is under low memory load,
      memory reclaim may not occur.

    $ free
                 total       used       free     shared    buffers     cached
    Mem:       8197800    1577300    6620500          0       4808    1516724
    -/+ buffers/cache:      55768    8142032
    Swap:     16386292          0   16386292

    $ vmstat 1
    procs ------------memory------------ ---swap-- -----io---- -system-- ----cpu-----
     r  b    swpd    free   buff   cache   si   so    bi    bo   in   cs us sy  id wa
     0  0       0 5069748  10612 3014060    0    0     0     0    3   26  0  0 100  0
     0  0       0 5069748  10612 3014060    0    0     0     0    4   22  0  0 100  0
     0  0       0 5069748  10612 3014060    0    0     0     0    3   18  0  0 100  0

    Measure the time of tar command.

    $ ls -s test.dat
    1501472 test.dat

    $ time tar cvf test.tar test.dat
    real 0m13.388s
    user 0m0.116s
    sys 0m5.304s

    $ ./delayget -d -p
    CPU     count   real total      virtual total   delay total
            428     5528345500      5477116080      62749891
    IO      count   delay total
            338     8078977189
    SWAP    count   delay total
            0       0
    RECLAIM count   delay total
            0       0

    - When the system is under heavy memory load,
      memory reclaim may occur.

    $ vmstat 1
    procs ------------memory------------ ---swap-- -----io---- -system-- ----cpu-----
     r  b    swpd    free   buff   cache   si   so    bi    bo   in   cs us sy  id wa
     0  0 7159032   49724   1812    3012    0    0     0     0    3   24  0  0 100  0
     0  0 7159032   49724   1812    3012    0    0     0     0    4   24  0  0 100  0
     0  0 7159032   49848   1812    3012    0    0     0     0    3   22  0  0 100  0

    In this case, one process uses more than 8G of memory
    through malloc() and memset().

    $ time tar cvf test.tar test.dat
    real 1m38.563s
    CPU     count   real total      virtual total   delay total
            9021    7140446250      7315277975      923201824
    IO      count   delay total
            8965    90466349669
    SWAP    count   delay total
            3       21036367
    RECLAIM count   delay total
            740     61011951153

    In the latter case, the RECLAIM values increase.
    So, taskstats can show how much memory reclaim influences turnaround
    time (TAT).
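
    As a worked example, the RECLAIM line from the heavy-load run can be
    turned into an average delay per reclaim episode. The sketch assumes
    the delay totals are in nanoseconds; avg_delay_ns() is a hypothetical
    helper, not part of any tool mentioned here.

```c
#include <stdint.h>

/* Average delay per event; returns 0 when no events were recorded. */
uint64_t avg_delay_ns(uint64_t total_ns, uint64_t count)
{
    return count ? total_ns / count : 0;
}
```

    With total = 61011951153 and count = 740 this gives 82448582 ns, or
    roughly 82 ms per reclaim episode. Under the same assumption, the total
    is about 61 s of the 1m38.563s real time reported by time(1).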

    Signed-off-by: Keika Kobayashi
    Acked-by: Balbir Singh
    Acked-by: KOSAKI Motohiro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Keika Kobayashi
     

19 Oct, 2007

1 commit

  • This adds items to the taskstats struct to account for user and system
    time based on scaling the CPU frequency and instruction issue rates.

    Adds account_(user|system)_time_scaled callbacks which architectures
    can use to account for time using this mechanism.
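
    One way to picture scaled time is a simple frequency ratio. The sketch
    below is only an assumption about what an architecture might do: the
    message leaves the scaling to the architecture callbacks, and
    scale_cputime() with its kHz parameters is a hypothetical helper.

```c
#include <stdint.h>

/*
 * Scale a time delta by the ratio of the current CPU frequency to the
 * maximum frequency. Hypothetical helper: the multiplication can overflow
 * for very large deltas, which a real implementation would need to handle.
 */
uint64_t scale_cputime(uint64_t delta, uint64_t cur_khz, uint64_t max_khz)
{
    if (max_khz == 0)
        return delta; /* no scaling information available */
    return delta * cur_khz / max_khz;
}
```

    For instance, 1000 time units spent at half the maximum frequency
    would be reported as 500 scaled units.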

    Signed-off-by: Michael Neuling
    Cc: Balbir Singh
    Cc: Jay Lan
    Cc: Paul Mackerras
    Cc: Benjamin Herrenschmidt
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michael Neuling
     

15 Oct, 2007

1 commit

  • rename all 'cnt' fields and variables to the less yucky 'count' name.

    yuckage noticed by Andrew Morton.

    no change in code, other than the /proc/sched_debug bkl_count string got
    a bit larger:

    text data bss dec hex filename
    38236 3506 24 41766 a326 sched.o.before
    38240 3506 24 41770 a32a sched.o.after

    Signed-off-by: Ingo Molnar
    Reviewed-by: Thomas Gleixner

    Ingo Molnar
     

10 Jul, 2007

1 commit


08 May, 2007

1 commit

  • This patch provides a new macro

    KMEM_CACHE(<struct>, <flags>)

    to simplify slab creation. KMEM_CACHE creates a slab with the name of the
    struct, with the size of the struct and with the alignment of the struct.
    Additional slab flags may be specified if necessary.

    Example

    struct test_slab {
            int a, b, c;
            struct list_head list;
    } __cacheline_aligned_in_smp;

    test_slab_cache = KMEM_CACHE(test_slab, SLAB_PANIC);

    will create a new slab named "test_slab" of the size sizeof(struct
    test_slab) and aligned to the alignment of struct test_slab. If creation
    fails, we panic.
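
    How such a macro can derive name, size and alignment from a struct tag
    can be shown in userspace C. This is an illustrative analogue, not the
    kernel macro: CACHE_DESC and struct cache_desc are hypothetical, and
    struct list_head is a local stand-in so the example compiles on its
    own.

```c
#include <stddef.h>

/* Hypothetical descriptor holding what a slab cache would be created with. */
struct cache_desc {
    const char *name;    /* struct tag, stringified by the preprocessor */
    size_t size;         /* sizeof(struct ...) */
    size_t align;        /* natural alignment of the struct */
    unsigned long flags; /* caller-supplied flags */
};

/* Build a descriptor from a struct tag: #stype yields the tag as a string. */
#define CACHE_DESC(stype, sflags)                  \
    ((struct cache_desc){                          \
        .name = #stype,                            \
        .size = sizeof(struct stype),              \
        .align = _Alignof(struct stype),           \
        .flags = (sflags),                         \
    })

/* Local stand-ins for the types used in the example above. */
struct list_head { struct list_head *next, *prev; };
struct test_slab {
    int a, b, c;
    struct list_head list;
};
```

    CACHE_DESC(test_slab, 0) then yields a descriptor named "test_slab"
    whose size and alignment follow the struct definition, so they cannot
    drift out of sync with it.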

    Signed-off-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     

08 Dec, 2006

2 commits

  • Replace all uses of kmem_cache_t with struct kmem_cache.

    The patch was generated using the following script:

    #!/bin/sh
    #
    # Replace one string by another in all the kernel sources.
    #

    set -e

    for file in `find * -name "*.c" -o -name "*.h"|xargs grep -l $1`; do
            quilt add $file
            sed -e "1,\$s/$1/$2/g" $file >/tmp/$$
            mv /tmp/$$ $file
            quilt refresh
    done

    The script was run like this

    sh replace kmem_cache_t "struct kmem_cache"

    Signed-off-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     
  • SLAB_KERNEL is an alias of GFP_KERNEL.

    Signed-off-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     

06 Nov, 2006

1 commit

  • Make the delayacct lock irqsave; this avoids a possible deadlock where
    an interrupt that needs the delayacct lock is taken while that lock is
    already held.

    Signed-off-by: Peter Zijlstra
    Acked-by: Oleg Nesterov
    Cc: Balbir Singh
    Cc: Shailabh Nagar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Peter Zijlstra
     

02 Sep, 2006

1 commit

  • Cleanup allocation and freeing of tsk->delays used by delay accounting.
    This solves two problems reported for delay accounting:

    1. oops in __delayacct_blkio_ticks
    http://www.uwsg.indiana.edu/hypermail/linux/kernel/0608.2/1844.html

    Currently tsk->delays is getting freed too early in task exit which can
    cause a NULL tsk->delays to be accessed when reading /proc/<pid>/stats.
    The patch fixes this problem by freeing tsk->delays closer to when
    task_struct itself is freed up. As a result, it also eliminates the use of
    tsk->delays_lock which was only being used (inadequately) to safeguard
    access to tsk->delays while a task was exiting.

    2. Possible memory leak in kernel/delayacct.c
    http://www.uwsg.indiana.edu/hypermail/linux/kernel/0608.2/1389.html

    The patch cleans up tsk->delays allocations after a bad fork which was
    missing earlier.

    The patch has been tested to fix the problems listed above and stress
    tested with rapid calls to delay accounting's taskstats command interface
    (which is the other path that can access the same data, besides the /proc
    interface causing the oops above).

    Signed-off-by: Shailabh Nagar
    Cc: Balbir Singh
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Shailabh Nagar
     

01 Aug, 2006

1 commit

  • Enable delay accounting by default so that feature gets coverage testing
    without requiring special measures.

    Earlier, it was off by default and had to be enabled via a boot time param.
    This patch reverses the default behaviour to improve coverage testing. It
    can be removed late in the kernel development cycle if it's believed users
    shouldn't have to incur any cost if they don't want delay accounting. Or
    it can be retained forever if the utility of the stats is deemed common
    enough to warrant keeping the feature on.

    Signed-off-by: Shailabh Nagar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Shailabh Nagar
     

15 Jul, 2006

4 commits

  • Export I/O delays seen by a task through /proc/<pid>/stats for use in top
    etc.

    Note that delays for I/O done for swapping in pages (swapin I/O) are lumped
    together with all other I/O here (this is not the case in the netlink
    interface, where the swapin I/O is kept distinct).

    [akpm@osdl.org: printk warning fix]
    Signed-off-by: Shailabh Nagar
    Signed-off-by: Balbir Singh
    Cc: Jes Sorensen
    Cc: Peter Chubb
    Cc: Erich Focht
    Cc: Levent Serinol
    Cc: Jay Lan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Shailabh Nagar
     
  • Make delay accounting use the taskstats interface.

    Signed-off-by: Shailabh Nagar
    Signed-off-by: Balbir Singh
    Cc: Jes Sorensen
    Cc: Peter Chubb
    Cc: Erich Focht
    Cc: Levent Serinol
    Cc: Jay Lan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Shailabh Nagar
     
  • Unlike earlier iterations of the delay accounting patches, delays are now only
    collected for the actual I/O waits rather than trying to cover the delays seen
    in I/O submission paths.

    Account separately for block I/O delays incurred as a result of swapin page
    faults whose frequency can be affected by the task/process' rss limit. Hence
    swapin delays can act as feedback for rss limit changes independent of I/O
    priority changes.
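
    Keeping swapin waits apart from other block I/O waits can be sketched
    as two counter pairs selected by a flag. This is a simplified userspace
    illustration under that assumption: struct io_delays and
    io_wait_account() are hypothetical names, not the kernel interface.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-task totals; swapin waits are tracked separately. */
struct io_delays {
    uint64_t blkio_ns;
    uint32_t blkio_count;
    uint64_t swapin_ns;
    uint32_t swapin_count;
};

/* Charge one completed I/O wait to the swapin or the block I/O bucket. */
void io_wait_account(struct io_delays *d, uint64_t wait_ns, bool swapin)
{
    if (swapin) {
        d->swapin_ns += wait_ns;
        d->swapin_count++;
    } else {
        d->blkio_ns += wait_ns;
        d->blkio_count++;
    }
}
```

    With the two buckets separate, a growing swapin total can be read as
    feedback on the rss limit without being mixed into ordinary block I/O
    waits.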

    Signed-off-by: Shailabh Nagar
    Signed-off-by: Balbir Singh
    Cc: Jes Sorensen
    Cc: Peter Chubb
    Cc: Erich Focht
    Cc: Levent Serinol
    Cc: Jay Lan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Shailabh Nagar
     
  • Initialization code related to collection of per-task "delay" statistics which
    measure how long a task had to wait for cpu, sync block io, swapping etc. The
    collection of statistics and the interface are in other patches. This patch
    sets up the data structures and allows the statistics collection to be
    disabled through a kernel boot parameter.

    Signed-off-by: Shailabh Nagar
    Signed-off-by: Balbir Singh
    Cc: Jes Sorensen
    Cc: Peter Chubb
    Cc: Erich Focht
    Cc: Levent Serinol
    Cc: Jay Lan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Shailabh Nagar