11 Jul, 2016

1 commit

  • Variable "now" seems to be genuinely used unintialized
    if branch

    if (CPUCLOCK_PERTHREAD(timer->it_clock)) {

    is not taken and the branch

    if (unlikely(sighand == NULL)) {

    is taken. In this case the process has been reaped and the timer is marked as
    disarmed anyway. So none of the postprocessing of the sample is
    required. Return right away.
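
    A minimal sketch of the resulting control flow, in the style of the
    snippets above (the surrounding function, locking and variable
    declarations are assumed, not quoted from the patch):

    if (CPUCLOCK_PERTHREAD(timer->it_clock)) {
            cpu_clock_sample(timer->it_clock, p, &now);
    } else {
            sighand = lock_task_sighand(p, &flags);
            if (unlikely(sighand == NULL)) {
                    /*
                     * The process has been reaped; the timer is already
                     * disarmed and "now" was never sampled, so skip the
                     * sample post-processing and return immediately.
                     */
                    return;
            }
            cpu_timer_sample_group(timer->it_clock, p, &now);
            /* ... post-process the sample, rearm if needed ... */
            unlock_task_sighand(p, &flags);
    }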

    Signed-off-by: Alexey Dobriyan
    Cc: stable@vger.kernel.org
    Link: http://lkml.kernel.org/r/20160707223911.GA26483@p183.telecom.by
    Signed-off-by: Thomas Gleixner

    Alexey Dobriyan
     

02 Mar, 2016

1 commit

  • Instead of providing asynchronous checks for the nohz subsystem to verify
    the posix cpu timers tick dependency, migrate the latter to the new tick
    dependency mask.

    In order to keep track of the running timers and expose the tick
    dependency accordingly, we must probe the timers' queuing and dequeuing
    on the thread and process lists.

    Unfortunately this implies both task-level and signal-level dependencies.
    We should be able to further optimize this and merge it all into the
    task-level dependency, at the cost of a bit of complexity and maybe some
    overhead.
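
    As an illustration of the model (a sketch only: the tick_dep_set_task()/
    tick_dep_set_signal() helpers and TICK_DEP_BIT_POSIX_TIMER come from the
    new tick dependency mask API; the wrapper below and its call site are
    assumptions):

    /* Hypothetical wrapper, invoked when a posix CPU timer is queued. */
    static void posix_timer_set_tick_dep(struct k_itimer *timer,
                                         struct task_struct *p)
    {
            if (CPUCLOCK_PERTHREAD(timer->it_clock))
                    /* Per-thread timer: only this task needs the tick. */
                    tick_dep_set_task(p, TICK_DEP_BIT_POSIX_TIMER);
            else
                    /* Process-wide timer: the whole thread group does. */
                    tick_dep_set_signal(p->signal, TICK_DEP_BIT_POSIX_TIMER);
    }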

    Reviewed-by: Chris Metcalf
    Cc: Christoph Lameter
    Cc: Chris Metcalf
    Cc: Ingo Molnar
    Cc: Luiz Capitulino
    Cc: Peter Zijlstra
    Cc: Rik van Riel
    Cc: Thomas Gleixner
    Cc: Viresh Kumar
    Signed-off-by: Frederic Weisbecker

    Frederic Weisbecker
     

15 Oct, 2015

4 commits

  • It was found while running a database workload on large systems that
    significant time was spent trying to acquire the sighand lock.

    The issue was that whenever an itimer expired, many threads ended up
    simultaneously trying to send the signal. Most of the time, nothing
    happened after acquiring the sighand lock because another thread
    had already sent the signal and updated the "next expire" time.
    fastpath_timer_check() didn't help much, since the "next expire"
    time was only updated after the threads had exited fastpath_timer_check().

    This patch addresses this by having the thread_group_cputimer structure
    maintain a boolean that signifies when a thread in the group is already
    checking for process-wide timers, and by adding extra logic in the
    fastpath to check that boolean.
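
    A rough sketch of the fastpath side of this (the checking_timer flag is
    the one added by this patch; the surrounding fastpath_timer_check()
    logic is abbreviated):

    /* Only consider taking the sighand lock for process-wide timers if
     * no other thread is already scanning them. */
    sig = tsk->signal;
    if (READ_ONCE(sig->cputimer.running) &&
        !READ_ONCE(sig->cputimer.checking_timer)) {
            /* ... sample the group cputime and compare it against the
             * "next expire" values; the slowpath sets checking_timer
             * while it scans the process-wide timers and clears it once
             * the expiry times have been updated ... */
    }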

    Signed-off-by: Jason Low
    Reviewed-by: Oleg Nesterov
    Reviewed-by: George Spelvin
    Cc: Paul E. McKenney
    Cc: Frederic Weisbecker
    Cc: Davidlohr Bueso
    Cc: Steven Rostedt
    Cc: hideaki.kimura@hpe.com
    Cc: terry.rudd@hpe.com
    Cc: scott.norton@hpe.com
    Cc: Peter Zijlstra
    Link: http://lkml.kernel.org/r/1444849677-29330-5-git-send-email-jason.low2@hp.com
    Signed-off-by: Thomas Gleixner

    Jason Low
     
  • In the next patch in this series, a new field 'checking_timer' will
    be added to 'struct thread_group_cputimer'. Both this and the
    existing 'running' integer field are just used as boolean values. To
    save space in the structure, we can make both of these fields booleans.

    This is a preparatory patch to convert the existing running integer
    field to a boolean.
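
    For context, the structure ends up looking roughly like this once both
    patches are applied (a sketch; comments and surrounding fields of the
    real definition are omitted):

    struct thread_group_cputimer {
            struct task_cputime_atomic cputime_atomic;
            bool running;         /* process-wide timers are armed */
            bool checking_timer;  /* a thread is scanning those timers */
    };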

    Suggested-by: George Spelvin
    Signed-off-by: Jason Low
    Reviewed-by: George Spelvin
    Cc: Oleg Nesterov
    Cc: Paul E. McKenney
    Cc: Frederic Weisbecker
    Cc: Davidlohr Bueso
    Cc: Steven Rostedt
    Cc: hideaki.kimura@hpe.com
    Cc: terry.rudd@hpe.com
    Cc: scott.norton@hpe.com
    Cc: Peter Zijlstra
    Link: http://lkml.kernel.org/r/1444849677-29330-4-git-send-email-jason.low2@hp.com
    Signed-off-by: Thomas Gleixner

    Jason Low
     
  • The fastpath_timer_check() function contains logic to check whether any
    timers are set, by checking !task_cputime_zero(). Similarly, we can do
    this check before calling check_thread_timers(). In the case where there
    are only process-wide timers, this skips all of the per-thread timer
    computations.

    As suggested by George, we can put the task_cputime_zero() check in
    check_thread_timers(), since that is more of an optimization to the
    function. Similarly, we move the existing check of cputimer->running
    into check_process_timers().
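
    A sketch of where the two checks end up (the function names are the
    ones referenced above; the elided bodies and exact signatures are
    assumptions):

    static void check_thread_timers(struct task_struct *tsk,
                                    struct list_head *firing)
    {
            /* Bail out early when no per-thread timer is armed. */
            if (task_cputime_zero(&tsk->cputime_expires))
                    return;
            /* ... per-thread timer expiry processing ... */
    }

    static void check_process_timers(struct task_struct *tsk,
                                     struct list_head *firing)
    {
            /* Nothing to do unless process-wide timers are running. */
            if (!READ_ONCE(tsk->signal->cputimer.running))
                    return;
            /* ... process-wide timer expiry processing ... */
    }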

    Signed-off-by: Jason Low
    Reviewed-by: Oleg Nesterov
    Reviewed-by: George Spelvin
    Cc: Paul E. McKenney
    Cc: Frederic Weisbecker
    Cc: Davidlohr Bueso
    Cc: Steven Rostedt
    Cc: hideaki.kimura@hpe.com
    Cc: terry.rudd@hpe.com
    Cc: scott.norton@hpe.com
    Cc: Peter Zijlstra
    Link: http://lkml.kernel.org/r/1444849677-29330-3-git-send-email-jason.low2@hp.com
    Signed-off-by: Thomas Gleixner

    Jason Low
     
  • In fastpath_timer_check(), the task_cputime() function is always
    called to compute the utime and stime values. However, this is not
    necessary if there are no per-thread timers to check for. This patch
    modifies the code such that we compute the task_cputime values only
    when there are per-thread timers set.
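
    A sketch of the resulting fastpath check (field and helper names follow
    the code referenced above; the exact layout is assumed):

    /* Only pay for the utime/stime sample when a per-thread timer is
     * actually armed. */
    if (!task_cputime_zero(&tsk->cputime_expires)) {
            struct task_cputime task_sample;

            task_cputime(tsk, &task_sample.utime, &task_sample.stime);
            task_sample.sum_exec_runtime = tsk->se.sum_exec_runtime;

            if (task_cputime_expired(&task_sample, &tsk->cputime_expires))
                    return 1;
    }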

    Signed-off-by: Jason Low
    Reviewed-by: Oleg Nesterov
    Reviewed-by: Frederic Weisbecker
    Reviewed-by: Davidlohr Bueso
    Reviewed-by: George Spelvin
    Cc: Paul E. McKenney
    Cc: Steven Rostedt
    Cc: hideaki.kimura@hpe.com
    Cc: terry.rudd@hpe.com
    Cc: scott.norton@hpe.com
    Cc: Peter Zijlstra
    Link: http://lkml.kernel.org/r/1444849677-29330-2-git-send-email-jason.low2@hp.com
    Signed-off-by: Thomas Gleixner

    Jason Low
     

08 May, 2015

3 commits

  • Recent optimizations were made to thread_group_cputimer to improve its
    scalability by keeping track of cputime stats without a lock. However,
    the values were open-coded in the structure, placing them at a
    different abstraction level from the regular task_cputime structure.
    Furthermore, any subsequent similar optimizations would not be able to
    share the new code, since it is specific to thread_group_cputimer.

    This patch adds the new task_cputime_atomic data structure (introduced in
    the previous patch in the series) to thread_group_cputimer for keeping
    track of the cputime atomically, which also helps generalize the code.
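
    A sketch of the generalized layout (the task_cputime_atomic structure is
    the one introduced earlier in this series; the exact field set is
    assumed):

    /* Atomic mirror of struct task_cputime, usable without cputimer->lock. */
    struct task_cputime_atomic {
            atomic64_t utime;
            atomic64_t stime;
            atomic64_t sum_exec_runtime;
    };

    struct thread_group_cputimer {
            struct task_cputime_atomic cputime_atomic;
            int running;
    };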

    Suggested-by: Ingo Molnar
    Signed-off-by: Jason Low
    Signed-off-by: Peter Zijlstra (Intel)
    Acked-by: Thomas Gleixner
    Acked-by: Rik van Riel
    Cc: Andrew Morton
    Cc: Aswin Chandramouleeswaran
    Cc: Borislav Petkov
    Cc: Davidlohr Bueso
    Cc: Frederic Weisbecker
    Cc: H. Peter Anvin
    Cc: Linus Torvalds
    Cc: Mel Gorman
    Cc: Mike Galbraith
    Cc: Oleg Nesterov
    Cc: Paul E. McKenney
    Cc: Peter Zijlstra
    Cc: Preeti U Murthy
    Cc: Scott J Norton
    Cc: Steven Rostedt
    Cc: Waiman Long
    Link: http://lkml.kernel.org/r/1430251224-5764-6-git-send-email-jason.low2@hp.com
    Signed-off-by: Ingo Molnar

    Jason Low
     
  • While running a database workload, we found a scalability issue with itimers.

    Much of the problem was caused by the thread_group_cputimer spinlock.
    Each time we account for group system/user time, we need to obtain the
    thread_group_cputimer's spinlock to update the timers. On larger systems
    (such as a 16-socket machine), more than 30% of total time was spent
    trying to obtain this kernel lock to update these group timer stats.

    This patch converts the timers to 64-bit atomic variables and uses
    atomic adds to update them without a lock. With this patch, the
    percentage of total time spent updating thread group cputimer timers
    was reduced from 30% down to less than 1%.

    Note: On 32-bit systems using the generic 64-bit atomics, this causes
    sample_group_cputimer() to take locks 3 times instead of just once.
    However, we tested this patch on a 32-bit ARM system using the
    generic atomics and did not find the overhead to be much of an issue.
    A likely explanation is that 32-bit systems usually have small numbers
    of CPUs, and cacheline contention from extra spinlocks taken
    periodically is not really apparent on smaller systems.
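
    The per-tick accounting update then reduces to a plain 64-bit atomic
    add, roughly as follows (a sketch; the helper name and calling context
    are assumptions):

    /* Hypothetical helper, called from the cputime accounting paths. */
    static inline void account_group_runtime_sketch(struct thread_group_cputimer *ct,
                                                    unsigned long long ns)
    {
            /* No spinlock needed: a 64-bit atomic add keeps the sum
             * consistent even with many CPUs accounting concurrently. */
            atomic64_add(ns, &ct->cputime_atomic.sum_exec_runtime);
    }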

    Signed-off-by: Jason Low
    Signed-off-by: Peter Zijlstra (Intel)
    Acked-by: Thomas Gleixner
    Acked-by: Rik van Riel
    Cc: Andrew Morton
    Cc: Aswin Chandramouleeswaran
    Cc: Borislav Petkov
    Cc: Davidlohr Bueso
    Cc: Frederic Weisbecker
    Cc: H. Peter Anvin
    Cc: Linus Torvalds
    Cc: Mel Gorman
    Cc: Mike Galbraith
    Cc: Oleg Nesterov
    Cc: Paul E. McKenney
    Cc: Peter Zijlstra
    Cc: Preeti U Murthy
    Cc: Scott J Norton
    Cc: Steven Rostedt
    Cc: Waiman Long
    Link: http://lkml.kernel.org/r/1430251224-5764-4-git-send-email-jason.low2@hp.com
    Signed-off-by: Ingo Molnar

    Jason Low
     
  • ACCESS_ONCE() doesn't work reliably on non-scalar types. This patch
    removes the remaining usages of ACCESS_ONCE() in the scheduler and uses
    the new READ_ONCE() and WRITE_ONCE() APIs as appropriate.
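
    A representative conversion, using the cputimer running flag as an
    example (illustrative only; the specific call sites touched by the
    patch are not reproduced here):

    /* Before: ACCESS_ONCE() is only well-defined for scalar types. */
    if (ACCESS_ONCE(cputimer->running))
            return true;

    /* After: READ_ONCE() for loads, WRITE_ONCE() for stores, both of
     * which also handle non-scalar types correctly. */
    if (READ_ONCE(cputimer->running))
            return true;
    WRITE_ONCE(cputimer->running, 1);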

    Signed-off-by: Jason Low
    Signed-off-by: Peter Zijlstra (Intel)
    Acked-by: Thomas Gleixner
    Acked-by: Rik van Riel
    Acked-by: Waiman Long
    Cc: Andrew Morton
    Cc: Aswin Chandramouleeswaran
    Cc: Borislav Petkov
    Cc: Davidlohr Bueso
    Cc: Frederic Weisbecker
    Cc: H. Peter Anvin
    Cc: Linus Torvalds
    Cc: Mel Gorman
    Cc: Mike Galbraith
    Cc: Oleg Nesterov
    Cc: Paul E. McKenney
    Cc: Preeti U Murthy
    Cc: Scott J Norton
    Cc: Steven Rostedt
    Link: http://lkml.kernel.org/r/1430251224-5764-2-git-send-email-jason.low2@hp.com
    Signed-off-by: Ingo Molnar

    Jason Low
     

13 Feb, 2015

1 commit

  • If an attacker can cause a controlled kernel stack overflow, overwriting
    the restart block is a very juicy exploit target. This is because the
    restart_block is held in the same memory allocation as the kernel stack.

    Moving the restart block to struct task_struct prevents this exploit by
    making the restart_block harder to locate.

    Note that there are other fields in thread_info that are also easy
    targets, at least on some architectures.

    It's also a decent simplification, since the restart code is more or less
    identical on all architectures.
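
    Structurally, the change amounts to moving one field (a sketch;
    surrounding members are omitted):

    /* Before: restart_block shares an allocation with the kernel stack. */
    struct thread_info {
            /* ... arch-specific fields ... */
            struct restart_block restart_block;
    };

    /* After: restart_block lives in task_struct, away from the stack. */
    struct task_struct {
            /* ... */
            struct restart_block restart_block;
            /* ... */
    };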

    [james.hogan@imgtec.com: metag: align thread_info::supervisor_stack]
    Signed-off-by: Andy Lutomirski
    Cc: Thomas Gleixner
    Cc: Al Viro
    Cc: "H. Peter Anvin"
    Cc: Ingo Molnar
    Cc: Kees Cook
    Cc: David Miller
    Acked-by: Richard Weinberger
    Cc: Richard Henderson
    Cc: Ivan Kokshaysky
    Cc: Matt Turner
    Cc: Vineet Gupta
    Cc: Russell King
    Cc: Catalin Marinas
    Cc: Will Deacon
    Cc: Haavard Skinnemoen
    Cc: Hans-Christian Egtvedt
    Cc: Steven Miao
    Cc: Mark Salter
    Cc: Aurelien Jacquiot
    Cc: Mikael Starvik
    Cc: Jesper Nilsson
    Cc: David Howells
    Cc: Richard Kuo
    Cc: "Luck, Tony"
    Cc: Geert Uytterhoeven
    Cc: Michal Simek
    Cc: Ralf Baechle
    Cc: Jonas Bonn
    Cc: "James E.J. Bottomley"
    Cc: Helge Deller
    Cc: Benjamin Herrenschmidt
    Cc: Paul Mackerras
    Acked-by: Michael Ellerman (powerpc)
    Tested-by: Michael Ellerman (powerpc)
    Cc: Martin Schwidefsky
    Cc: Heiko Carstens
    Cc: Chen Liqin
    Cc: Lennox Wu
    Cc: Chris Metcalf
    Cc: Guan Xuetao
    Cc: Chris Zankel
    Cc: Max Filippov
    Cc: Oleg Nesterov
    Cc: Guenter Roeck
    Signed-off-by: James Hogan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andy Lutomirski
     

16 Nov, 2014

1 commit

  • While looking over the cpu-timer code I found that we appear to add
    the delta for the calling task twice, through:

    cpu_timer_sample_group()
      thread_group_cputimer()
        thread_group_cputime()
          times->sum_exec_runtime += task_sched_runtime();

      *sample = cputime.sum_exec_runtime + task_delta_exec();

    Which would make the sample run ahead, making the sleep short.
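
    Since thread_group_cputime() already folds in the caller's pending
    runtime via task_sched_runtime(), adding task_delta_exec() on top counts
    that delta a second time. The fix is presumably to drop the extra term,
    along these lines (a sketch of the CPUCLOCK_SCHED case in
    cpu_timer_sample_group(); the exact code is assumed):

    case CPUCLOCK_SCHED:
            /* sum_exec_runtime already includes the caller's pending
             * delta; do not add task_delta_exec() on top of it. */
            *sample = cputime.sum_exec_runtime;
            break;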

    Signed-off-by: Peter Zijlstra (Intel)
    Cc: KOSAKI Motohiro
    Cc: Oleg Nesterov
    Cc: Stanislaw Gruszka
    Cc: Christoph Lameter
    Cc: Frederic Weisbecker
    Cc: Linus Torvalds
    Cc: Rik van Riel
    Cc: Tejun Heo
    Link: http://lkml.kernel.org/r/20141112113737.GI10476@twins.programming.kicks-ass.net
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     

08 Sep, 2014

1 commit

  • Both times() and clock_gettime(CLOCK_PROCESS_CPUTIME_ID) have scalability
    issues on large systems, due to both functions being serialized with a
    lock.

    The lock protects against reporting a wrong value when a thread in the
    task group exits and its statistics are reported up to the signal struct,
    which could otherwise cause that exited task's statistics to be counted
    twice (or not at all).

    Protecting that with a lock results in times() and clock_gettime() being
    completely serialized on large systems.

    This can be fixed by using a seqlock around the events that gather and
    propagate statistics. As an additional benefit, the protection code can
    be moved into thread_group_cputime(), slightly simplifying the calling
    functions.

    In the case of posix_cpu_clock_get_task() things can be simplified a
    lot, because the calling function already ensures that the task sticks
    around, and the rest is now taken care of in thread_group_cputime().

    This way the statistics reporting code can run lockless.
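
    The read side of such a seqlock follows the usual retry pattern, roughly
    as below (a sketch; the sig->stats_lock name and the exact set of
    protected fields are assumptions):

    unsigned int seq;

    /* Reader: retry if a writer updated the stats concurrently. */
    do {
            seq = read_seqbegin(&sig->stats_lock);
            utime = sig->utime;
            stime = sig->stime;
            sum_exec = sig->sum_sched_runtime;
    } while (read_seqretry(&sig->stats_lock, seq));

    /* Writer (e.g. an exiting thread folding its times into the signal
     * struct) brackets the update with the write lock: */
    write_seqlock_irq(&sig->stats_lock);
    /* ... accumulate the exiting task's utime/stime/runtime ... */
    write_sequnlock_irq(&sig->stats_lock);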

    Signed-off-by: Rik van Riel
    Signed-off-by: Peter Zijlstra (Intel)
    Cc: Alex Thorlton
    Cc: Andrew Morton
    Cc: Daeseok Youn
    Cc: David Rientjes
    Cc: Dongsheng Yang
    Cc: Geert Uytterhoeven
    Cc: Guillaume Morin
    Cc: Ionut Alexa
    Cc: Kees Cook
    Cc: Linus Torvalds
    Cc: Li Zefan
    Cc: Michal Hocko
    Cc: Michal Schmidt
    Cc: Oleg Nesterov
    Cc: Vladimir Davydov
    Cc: umgwanakikbuti@gmail.com
    Cc: fweisbec@gmail.com
    Cc: srao@redhat.com
    Cc: lwoodman@redhat.com
    Cc: atheurer@redhat.com
    Link: http://lkml.kernel.org/r/20140816134010.26a9b572@annuminas.surriel.com
    Signed-off-by: Ingo Molnar

    Rik van Riel
     

23 Jun, 2014

1 commit