14 May, 2010

4 commits

  • This patch adds data to be passed to tracepoint callbacks.

    The functions created by DECLARE_TRACE() now take a mandatory data
    parameter. For example:

    DECLARE_TRACE(mytracepoint, int value, value)

    will create the register function:

    int register_trace_mytracepoint((void(*)(void *data, int value))probe,
    void *data);

    All callbacks (probes) must now take a (void *data) parameter as their
    first argument. A callback for the above tracepoint will look like:

    void myprobe(void *data, int value)
    {
    }

    The callback may choose to ignore the data parameter.

    This change allows a private data pointer to be registered along
    with the probe function.

    void mycallback(void *data, int value);

    register_trace_mytracepoint(mycallback, mydata);

    mycallback() will then receive "mydata" as its first parameter,
    before the tracepoint arguments.

    A more detailed example:

    DECLARE_TRACE(mytracepoint, TP_PROTO(int status), TP_ARGS(status));

    /* In the C file */

    DEFINE_TRACE(mytracepoint, TP_PROTO(int status), TP_ARGS(status));

    [...]

    trace_mytracepoint(status);

    /* In a file registering this tracepoint */

    int my_callback(void *data, int status)
    {
    struct my_struct *my_data = data;
    [...]
    }

    [...]
    my_data = kmalloc(sizeof(*my_data), GFP_KERNEL);
    init_my_data(my_data);
    register_trace_mytracepoint(my_callback, my_data);

    The same callback can also be registered to the same tracepoint as long
    as the data registered is different. Note, the data must also be used
    to unregister the callback:

    unregister_trace_mytracepoint(my_callback, my_data);

    Because of the data parameter, a tracepoint declared this way cannot have
    an empty argument list. That is:

    DECLARE_TRACE(mytracepoint, TP_PROTO(void), TP_ARGS());

    will cause an error.

    If no arguments are needed, a new macro can be used instead:

    DECLARE_TRACE_NOARGS(mytracepoint);

    Since there are no arguments, the proto and args fields are left out.
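
    For illustration, a probe for such a no-argument tracepoint still takes
    only the registered data pointer. A rough sketch (the probe and data
    names here are made up, not taken from the patch):

    void my_noargs_probe(void *data)
    {
    }

    register_trace_mytracepoint(my_noargs_probe, mydata);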

    This is part of a series to make the tracepoint footprint smaller:

       text    data    bss     dec    hex filename
    4913961 1088356 861512 6863829 68bbd5 vmlinux.orig
    4914025 1088868 861512 6864405 68be15 vmlinux.class
    4918492 1084612 861512 6864616 68bee8 vmlinux.tracepoint

    Again, this patch increases the size of the kernel, but it
    lays the groundwork for decreasing it.

    v5: Fixed net/core/drop_monitor.c to handle these updates.

    v4: Moved DECLARE_TRACE() and DECLARE_TRACE_NOARGS() out of the
    #ifdef CONFIG_TRACEPOINTS block, since the two are the same in both
    cases; it is __DECLARE_TRACE() that changes.
    Thanks to Frederic Weisbecker for pointing this out.

    v3: Made all register_* functions require data to be passed and
    all callbacks take a void * parameter as their first argument.
    This makes the calling functions comply with C standards.

    Also added more comments to the modifications of DECLARE_TRACE().

    v2: Made the DECLARE_TRACE() have the ability to pass arguments
    and added a new DECLARE_TRACE_NOARGS() for tracepoints that
    do not need any arguments.

    Acked-by: Mathieu Desnoyers
    Acked-by: Masami Hiramatsu
    Acked-by: Frederic Weisbecker
    Cc: Neil Horman
    Cc: David S. Miller
    Signed-off-by: Steven Rostedt

    Steven Rostedt
     
    This check is meant for tracepoint users that cast callbacks directly to
    (void *) for registration, thus bypassing the register_trace_##name and
    unregister_trace_##name type checks.

    It ensures that the callback type matches the expected function type at the
    call site, without generating any code.
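
    A minimal sketch of the idea (illustrative names, not necessarily the
    exact macro generated by this patch): an empty static inline that takes a
    function pointer of the expected probe type, so passing a mismatched
    callback produces a compiler warning while the empty inline compiles away.

    /* generated per tracepoint, alongside register_trace_mytracepoint() */
    static inline void check_trace_callback_type_mytracepoint(
            void (*cb)(void *data, int value))
    {
    }

    /* at the call site, before casting the callback to (void *) */
    check_trace_callback_type_mytracepoint(my_callback);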

    Acked-by: Masami Hiramatsu
    Acked-by: Frederic Weisbecker
    Signed-off-by: Mathieu Desnoyers
    LKML-Reference:
    CC: Ingo Molnar
    CC: Andrew Morton
    CC: Thomas Gleixner
    CC: Peter Zijlstra
    CC: Arnaldo Carvalho de Melo
    CC: Lai Jiangshan
    CC: Li Zefan
    CC: Christoph Hellwig
    Signed-off-by: Steven Rostedt

    Mathieu Desnoyers
     
    This patch creates an ftrace_event_class struct that event structs point to.
    The class struct will eventually hold information used to modify the
    events; currently it only holds the event's system name.
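
    Roughly, at this stage the structs might look like the following
    (a simplified sketch, not necessarily the exact definitions in the patch):

    struct ftrace_event_class {
        char *system;                   /* subsystem the events belong to */
    };

    struct ftrace_event_call {
        struct ftrace_event_class *class;   /* shared per-class data */
        /* ... existing per-event fields ... */
    };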

    This patch slightly increases the kernel size, but it lays the groundwork
    for other changes that make the footprint of tracepoints smaller.

    With 82 standard tracepoints, and 618 system call tracepoints
    (two tracepoints per syscall: enter and exit):

       text    data    bss     dec    hex filename
    4913961 1088356 861512 6863829 68bbd5 vmlinux.orig
    4914025 1088868 861512 6864405 68be15 vmlinux.class

    This patch also cleans up some stale comments in ftrace.h.

    v2: Fixed missing semi-colon in macro.

    Acked-by: Frederic Weisbecker
    Acked-by: Mathieu Desnoyers
    Acked-by: Masami Hiramatsu
    Signed-off-by: Steven Rostedt

    Steven Rostedt
     
  • …inux-2.6-tip into trace/tip/tracing/core-4

    Steven Rostedt
     

11 May, 2010

1 commit

    epoll should not touch flags in wait_queue_t. This patch introduces a new
    function, __add_wait_queue_exclusive(), for users who use the wait queue as
    a LIFO queue.

    __add_wait_queue_tail_exclusive() is also introduced, to replace
    add_wait_queue_exclusive_locked(). remove_wait_queue_locked() is removed, as
    it duplicates __remove_wait_queue(), is disliked by users, and has fewer
    users.
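
    A plausible sketch of the two new helpers, built on the existing
    __add_wait_queue()/__add_wait_queue_tail() primitives (details may differ
    from the actual patch); both expect the waitqueue lock to be held:

    static inline void __add_wait_queue_exclusive(wait_queue_head_t *q,
                                                  wait_queue_t *wait)
    {
        wait->flags |= WQ_FLAG_EXCLUSIVE;
        __add_wait_queue(q, wait);          /* add at head: LIFO wakeup */
    }

    static inline void __add_wait_queue_tail_exclusive(wait_queue_head_t *q,
                                                       wait_queue_t *wait)
    {
        wait->flags |= WQ_FLAG_EXCLUSIVE;
        __add_wait_queue_tail(q, wait);     /* add at tail: FIFO wakeup */
    }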

    Signed-off-by: Changli Gao
    Signed-off-by: Peter Zijlstra
    Cc: Alexander Viro
    Cc: Paul Menage
    Cc: Li Zefan
    Cc: Davide Libenzi
    Cc:
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Changli Gao
     

10 May, 2010

9 commits

  • This comment should have been removed together with uids_mutex
    when removing user sched.

    Signed-off-by: Li Zefan
    Cc: Peter Zijlstra
    Cc: Dhaval Giani
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Li Zefan
     
    Pavel Machek pointed out that not all CPUs have an efficient
    idle at high frequency. Specifically, older Intel and various
    AMD CPUs see higher power usage when copying files from
    USB.

    Mike Chan pointed out that the same is true for various ARM
    chips as well.

    Thomas Renninger suggested making this a sysfs tunable with a
    reasonable default.

    This patch adds a sysfs tunable for the new behavior, and uses
    a very simple function to determine a reasonable default,
    depending on the CPU vendor/type.
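
    As a rough sketch of the idea (the function name and the CPU-model cutoff
    below are illustrative assumptions, not taken from the changelog): expose
    a boolean knob in sysfs and pick its default from the CPU type, enabling
    the new behavior only on CPUs known to idle efficiently at high frequency.

    /* illustrative default selection, not the patch's exact logic */
    static int should_io_be_busy(void)
    {
    #if defined(CONFIG_X86)
        /* assume recent Intel CPUs (family 6, model >= 15) idle efficiently */
        if (boot_cpu_data.x86_vendor == X86_VENDOR_INTEL &&
            boot_cpu_data.x86 == 6 &&
            boot_cpu_data.x86_model >= 15)
            return 1;
    #endif
        return 0;   /* conservative default for older/other CPUs */
    }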

    Signed-off-by: Arjan van de Ven
    Acked-by: Rik van Riel
    Acked-by: Pavel Machek
    Acked-by: Peter Zijlstra
    Cc: davej@redhat.com
    LKML-Reference:
    [ minor tidyup ]
    Signed-off-by: Ingo Molnar

    Arjan van de Ven
     
    The ondemand cpufreq governor uses CPU busy time (i.e. non-idle
    time) as a measure for scaling the CPU frequency up or down.
    If the CPU is busy, the CPU frequency scales up; if it's idle,
    the CPU frequency scales down. Effectively, it uses the CPU busy
    time as a proxy variable for the more nebulous "how critical is
    performance right now" question.

    This algorithm falls flat on its face with workloads where you're
    alternately disk- and CPU-bound, such as the ever-popular
    "git grep", but also things like program startup and
    maildir-using email clients... much to the chagrin of Andrew
    Morton.

    This patch changes the ondemand algorithm to count iowait time
    as busy, not idle, time. As the cases above show, iowait is
    often performance critical, and by counting iowait, the proxy
    variable becomes a more accurate representation of the
    "how critical is performance" question.

    The problem and fix are both verified with the "perf timechart"
    tool.
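
    In pseudo-C, the change boils down to subtracting iowait from the measured
    idle time when computing load (a simplified sketch of the governor's
    bookkeeping, not the exact patch):

    /* per-CPU sample inside the ondemand load calculation */
    idle_time   = cur_idle_time   - prev_idle_time;
    iowait_time = cur_iowait_time - prev_iowait_time;

    /* treat time spent waiting on I/O as busy time */
    if (idle_time >= iowait_time)
        idle_time -= iowait_time;

    load = 100 * (wall_time - idle_time) / wall_time;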

    Signed-off-by: Arjan van de Ven
    Signed-off-by: Andrew Morton
    Signed-off-by: Dave Jones
    Reviewed-by: Rik van Riel
    Acked-by: Peter Zijlstra
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Arjan van de Ven
     
    For the ondemand cpufreq governor, it is desirable that iowait
    time be micro-accounted in the same way as idle time.

    This patch introduces the infrastructure to account and expose
    this information via the get_cpu_iowait_time_us() function.
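
    A sketch of what the accessor plausibly looks like, mirroring the existing
    get_cpu_idle_time_us() (field and helper names assumed here; see the patch
    for the authoritative version):

    u64 get_cpu_iowait_time_us(int cpu, u64 *last_update_time)
    {
        struct tick_sched *ts = &per_cpu(tick_cpu_sched, cpu);

        if (!tick_nohz_enabled)
            return -1;

        /* refresh the per-CPU stats so the returned value is current */
        update_ts_time_stats(ts, ktime_get(), last_update_time);

        return ktime_to_us(ts->iowait_sleeptime);
    }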

    [akpm@linux-foundation.org: fix CONFIG_NO_HZ=n build]
    Signed-off-by: Arjan van de Ven
    Signed-off-by: Andrew Morton
    Reviewed-by: Rik van Riel
    Acked-by: Peter Zijlstra
    Cc: davej@redhat.com
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Arjan van de Ven
     
  • Now that the only user of ts->idle_lastupdate is
    update_ts_time_stats(), the entire field can be eliminated.

    In update_ts_time_stats(), idle_lastupdate is first set to
    "now", and a few lines later, the only user is an if() statement
    that assigns a variable either to "now" or to
    ts->idle_lastupdate, which has the value of "now" at that point.

    Signed-off-by: Arjan van de Ven
    Signed-off-by: Andrew Morton
    Reviewed-by: Rik van Riel
    Acked-by: Peter Zijlstra
    Cc: davej@redhat.com
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Arjan van de Ven
     
  • This patch folds the updating of the last_update_time into the
    update_ts_time_stats() function, and updates the callers.

    This allows for further cleanups that are done in the next
    patch.

    Signed-off-by: Arjan van de Ven
    Signed-off-by: Andrew Morton
    Reviewed-by: Rik van Riel
    Acked-by: Peter Zijlstra
    Cc: davej@redhat.com
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Arjan van de Ven
     
    Right now, get_cpu_idle_time_us() only reports the idle
    statistics up to the point the CPU last entered idle, not what is
    valid right now.

    This patch adds an update of the idle statistics to
    get_cpu_idle_time_us(), so that calling this function always
    returns statistics that are accurate at the point of the call.

    This includes resetting the start of the idle time for
    accounting purposes to avoid double accounting.

    Signed-off-by: Arjan van de Ven
    Signed-off-by: Andrew Morton
    Reviewed-by: Rik van Riel
    Acked-by: Peter Zijlstra
    Cc: davej@redhat.com
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Arjan van de Ven
     
  • Currently, two places update the idle statistics (and more to
    come later in this series).

    This patch creates a helper function for updating these
    statistics.

    Signed-off-by: Arjan van de Ven
    Signed-off-by: Andrew Morton
    Reviewed-by: Rik van Riel
    Acked-by: Peter Zijlstra
    Cc: davej@redhat.com
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Arjan van de Ven
     
    The exported function get_cpu_idle_time_us() has no comment
    describing it; add a kerneldoc comment.
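
    For reference, a kerneldoc block for this function would look roughly like
    this (the wording here is illustrative, not the exact comment added):

    /**
     * get_cpu_idle_time_us - get the total idle time of a cpu
     * @cpu: CPU number to query
     * @last_update_time: variable to store the update time in
     *
     * Return the cumulative idle time (since boot) for a given CPU, in
     * microseconds. Only available when the kernel is built with NO_HZ
     * support.
     */
    u64 get_cpu_idle_time_us(int cpu, u64 *last_update_time);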

    Signed-off-by: Arjan van de Ven
    Signed-off-by: Andrew Morton
    Reviewed-by: Rik van Riel
    Acked-by: Peter Zijlstra
    Cc: davej@redhat.com
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Arjan van de Ven
     

09 May, 2010

1 commit


08 May, 2010

1 commit

    When !CONFIG_SMP, the cpu_stop functions weren't defined at all,
    which could lead to build failures if UP code uses the cpu_stop
    facility. Add a dummy cpu_stop implementation for UP. The waiting
    variants execute the work function directly with preemption
    disabled, and stop_one_cpu_nowait() schedules a workqueue work item.
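
    A sketch of one of the UP dummies described above (simplified; the nowait
    variant would additionally defer the call through a workqueue item), where
    cpu_stop_fn_t is the cpu_stop callback type:

    #ifndef CONFIG_SMP
    /* UP: run the callback directly with preemption disabled */
    static inline int stop_one_cpu(unsigned int cpu, cpu_stop_fn_t fn, void *arg)
    {
        int ret = -ENOENT;

        preempt_disable();
        if (cpu == smp_processor_id())
            ret = fn(arg);
        preempt_enable();
        return ret;
    }
    #endif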

    The Makefile and the ifdefs around the stop_machine implementation are
    updated to accommodate the CONFIG_SMP && !CONFIG_STOP_MACHINE case.

    Signed-off-by: Tejun Heo
    Reported-by: Ingo Molnar

    Tejun Heo
     

07 May, 2010

8 commits

    struct rq isn't visible outside of sched.o, so it's nearly useless to
    expose the pointer; also there are no users of it, so remove it.

    Acked-by: Steven Rostedt
    Signed-off-by: Peter Zijlstra
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     
  • Ingo Molnar
     
  • If synchronize_sched_expedited() is ever to be called from within
    kernel/sched.c in a !SMP PREEMPT kernel, the !SMP implementation needs
    a barrier().

    Signed-off-by: Paul E. McKenney
    Signed-off-by: Tejun Heo

    Paul E. McKenney
     
  • The memory barriers must be in the SMP case, not in the !SMP case.
    Also add a barrier after the atomic_inc() in order to ensure that
    other CPUs see post-synchronize_sched_expedited() actions as following
    the expedited grace period.

    Signed-off-by: Paul E. McKenney
    Signed-off-by: Tejun Heo

    Paul E. McKenney
     
    The paranoid check which verifies that the cpu_stop callback is
    actually called on all online cpus is completely superfluous. It's
    guaranteed by the cpu_stop facility, and if it didn't work as
    advertised, other things would go horribly wrong and trying to
    recover using synchronize_sched() wouldn't be very meaningful.

    Kill the paranoid check. Removal of this feature is done as a
    separate step so that it can serve as a bisection point if something
    actually goes wrong.

    Signed-off-by: Tejun Heo
    Acked-by: Peter Zijlstra
    Cc: Ingo Molnar
    Cc: Dipankar Sarma
    Cc: Josh Triplett
    Cc: Paul E. McKenney
    Cc: Oleg Nesterov
    Cc: Dimitri Sivanich

    Tejun Heo
     
  • Currently migration_thread is serving three purposes - migration
    pusher, context to execute active_load_balance() and forced context
    switcher for expedited RCU synchronize_sched. All three roles are
    hardcoded into migration_thread() and determining which job is
    scheduled is slightly messy.

    This patch kills migration_thread and replaces all three uses with
    cpu_stop. The three different roles of migration_thread() are
    split into three separate cpu_stop callbacks -
    migration_cpu_stop(), active_load_balance_cpu_stop() and
    synchronize_sched_expedited_cpu_stop() - and each use case now simply
    asks cpu_stop to execute the callback as necessary.

    synchronize_sched_expedited() was implemented with private
    preallocated resources and custom multi-cpu queueing and waiting
    logic, both of which are now provided by cpu_stop.
    synchronize_sched_expedited_count is made atomic and all other shared
    resources along with the mutex are dropped.

    synchronize_sched_expedited() also implemented a check to detect cases
    where not all the callbacks got executed on their assigned cpus and to
    fall back to synchronize_sched(). If called with cpu hotplug blocked,
    cpu_stop already guarantees that, and the condition cannot happen;
    otherwise, stop_machine() would break. However, this patch preserves
    the paranoid check, using a cpumask to record which cpus the stopper
    ran on, so that it can serve as a bisection point if something actually
    goes wrong there.

    Because the internal execution state is no longer visible,
    rcu_expedited_torture_stats() is removed.

    This patch also renames cpu_stop threads from "stopper/%d" to
    "migration/%d". The names of these threads ultimately don't matter
    and there's no reason to make unnecessary userland visible changes.

    With this patch applied, stop_machine() and sched now share the same
    resources. stop_machine() is faster without wasting any resources and
    sched migration users are much cleaner.

    Signed-off-by: Tejun Heo
    Acked-by: Peter Zijlstra
    Cc: Ingo Molnar
    Cc: Dipankar Sarma
    Cc: Josh Triplett
    Cc: Paul E. McKenney
    Cc: Oleg Nesterov
    Cc: Dimitri Sivanich

    Tejun Heo
     
  • Reimplement stop_machine using cpu_stop. As cpu stoppers are
    guaranteed to be available for all online cpus,
    stop_machine_create/destroy() are no longer necessary and removed.

    With resource management and synchronization handled by cpu_stop, the
    new implementation is much simpler. Asking cpu_stop to execute
    the stop_cpu() state machine on all online cpus with cpu hotplug
    disabled is enough.

    stop_machine itself doesn't need to manage any global resources
    anymore, so all per-instance information is rolled into struct
    stop_machine_data and the mutex and all static data variables are
    removed.

    The previous implementation created and destroyed RT workqueues as
    needed, which made stop_machine() calls highly expensive on very
    large machines. According to Dimitri Sivanich, preventing the dynamic
    creation/destruction makes booting more than twice as fast on very
    large machines. cpu_stop resources are preallocated for all online
    cpus and should have the same effect.

    Signed-off-by: Tejun Heo
    Acked-by: Rusty Russell
    Acked-by: Peter Zijlstra
    Cc: Oleg Nesterov
    Cc: Dimitri Sivanich

    Tejun Heo
     
    Implement a simplistic per-cpu maximum priority cpu monopolization
    mechanism. A non-sleeping callback can be scheduled to run on one or
    multiple cpus with maximum priority, monopolizing those cpus. This is
    primarily to replace and unify the RT workqueue usage in stop_machine
    and the scheduler's migration_thread, which currently serves multiple
    purposes.

    Four functions are provided - stop_one_cpu(), stop_one_cpu_nowait(),
    stop_cpus() and try_stop_cpus().
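
    Their signatures are roughly as follows (a sketch based on the description
    above; see include/linux/stop_machine.h for the authoritative prototypes):

    typedef int (*cpu_stop_fn_t)(void *arg);

    int  stop_one_cpu(unsigned int cpu, cpu_stop_fn_t fn, void *arg);
    void stop_one_cpu_nowait(unsigned int cpu, cpu_stop_fn_t fn, void *arg,
                             struct cpu_stop_work *work_buf);
    int  stop_cpus(const struct cpumask *cpumask, cpu_stop_fn_t fn, void *arg);
    int  try_stop_cpus(const struct cpumask *cpumask, cpu_stop_fn_t fn, void *arg);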

    This is to allow clean sharing of resources among stop_cpu and all the
    migration thread users. One stopper thread per cpu is created which
    is currently named "stopper/CPU". This will eventually replace the
    migration thread and take on its name.

    * This facility was originally named cpuhog and lived in separate
    files, but Peter Zijlstra nacked the name, so it was renamed to
    cpu_stop and moved into stop_machine.c.

    * Better reporting of preemption leak as per Peter's suggestion.

    Signed-off-by: Tejun Heo
    Acked-by: Peter Zijlstra
    Cc: Oleg Nesterov
    Cc: Dimitri Sivanich

    Tejun Heo
     

06 May, 2010

1 commit


05 May, 2010

3 commits

    When more than one header is included under CREATE_TRACE_POINTS,
    the DECLARE_TRACE() macro is not defined back to its original meaning,
    and the second include will fail to initialize TRACE_EVENT()
    and DECLARE_TRACE() correctly.

    To fix this, tracepoint.h moves the definition of DECLARE_TRACE()
    out of the #ifdef _LINUX_TRACEPOINT_H protection (just like the
    definition of TRACE_EVENT()). This way, define_trace.h will undef
    DECLARE_TRACE() at the end and allow new headers to start
    from scratch.
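
    With this fix, a single C file can create the tracepoints for several
    headers in one go, for example (illustrative header names):

    #define CREATE_TRACE_POINTS
    #include <trace/events/sched.h>    /* first header */
    #include <trace/events/irq.h>      /* second header now initializes correctly */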

    This patch also requires fixing include/events/napi.h, which
    currently uses DECLARE_TRACE() and should be converted to the
    TRACE_EVENT() format; I'll leave that change to the authors of
    that file. But since napi.h depends on CREATE_TRACE_POINTS and
    does not define its own DEFINE_TRACE(), it must use the
    define_trace.h method instead.

    Cc: Neil Horman
    Cc: David S. Miller
    Cc: Mathieu Desnoyers
    Signed-off-by: Steven Rostedt

    Steven Rostedt
     
    Make it clear that event-list is a comma-separated list of events.

    Reported-by: KOSAKI Motohiro
    Signed-off-by: Li Zefan
    LKML-Reference:
    Signed-off-by: Steven Rostedt

    Li Zefan
     
  • Wrap open-coded WARN_ONCE functionality into the equivalent macro.

    Signed-off-by: Borislav Petkov
    LKML-Reference:
    Signed-off-by: Steven Rostedt

    Borislav Petkov
     

04 May, 2010

1 commit

    The ftrace.h file defines several functions as macros when the
    functions are disabled by config options. This patch converts
    most of them to static inlines.

    There are two exceptions:

    register_ftrace_function() and unregister_ftrace_function()

    This is because their parameter "ops" must not be evaluated since
    code using the function is allowed to #ifdef out the creation of
    the parameter.

    This also fixes an error caused by recent changes:

    kernel/trace/trace_irqsoff.c: In function 'start_irqsoff_tracer':
    kernel/trace/trace_irqsoff.c:571: error: expected expression before 'do'
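
    The error comes from using a do { } while (0) style macro where an
    expression is expected; a static inline stub does not have that problem.
    A generic illustration (not the actual ftrace.h code):

    /* as a macro, this cannot be used where a value/expression is needed */
    #define register_thing(ops)    do { } while (0)

    /* as a static inline, it can */
    static inline int register_thing(struct thing_ops *ops)
    {
        return 0;   /* stub when the feature is configured out */
    }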

    Reported-by: Ingo Molnar
    Signed-off-by: Steven Rostedt

    Steven Rostedt
     

30 Apr, 2010

11 commits