14 May, 2010

4 commits

  • This patch adds data to be passed to tracepoint callbacks.

    The created functions from DECLARE_TRACE() now need a mandatory data
    parameter. For example:

    DECLARE_TRACE(mytracepoint, int value, value)

    Will create the register function:

    int register_trace_mytracepoint(void (*probe)(void *data, int value),
                                    void *data);

    As the first argument, all callbacks (probes) must take a (void *data)
    parameter. So a callback for the above tracepoint will look like:

    void myprobe(void *data, int value)
    {
    }

    The callback may choose to ignore the data parameter.

    This change allows callbacks to register a private data pointer along
    with the function probe.

    void mycallback(void *data, int value);

    register_trace_mytracepoint(mycallback, mydata);

    Then the mycallback() will receive the "mydata" as the first parameter
    before the args.

    A more detailed example:

    DECLARE_TRACE(mytracepoint, TP_PROTO(int status), TP_ARGS(status));

    /* In the C file */

    DEFINE_TRACE(mytracepoint, TP_PROTO(int status), TP_ARGS(status));

    [...]

    trace_mytracepoint(status);

    /* In a file registering this tracepoint */

    void my_callback(void *data, int status)
    {
            struct my_struct *my_data = data;
            [...]
    }

    [...]
    my_data = kmalloc(sizeof(*my_data), GFP_KERNEL);
    init_my_data(my_data);
    register_trace_mytracepoint(my_callback, my_data);

    The same callback can also be registered to the same tracepoint as long
    as the data registered is different. Note, the data must also be used
    to unregister the callback:

    unregister_trace_mytracepoint(my_callback, my_data);
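
    The pairing of a probe with its private data can be sketched in plain
    userspace C. This is only a model of the idea, not the kernel
    implementation: register_probe(), unregister_probe(), trace_fire() and
    the fixed-size table are hypothetical names invented for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_PROBES 8

typedef void (*probe_fn)(void *data, int value);

/* One registration is the pair (function, private data). */
struct probe_entry {
    probe_fn func;
    void *data;
};

static struct probe_entry probes[MAX_PROBES];
static size_t nr_probes;

/* Register a (func, data) pair; an exact duplicate pair is rejected. */
static int register_probe(probe_fn func, void *data)
{
    for (size_t i = 0; i < nr_probes; i++)
        if (probes[i].func == func && probes[i].data == data)
            return -1;
    if (nr_probes == MAX_PROBES)
        return -1;
    probes[nr_probes].func = func;
    probes[nr_probes].data = data;
    nr_probes++;
    return 0;
}

/* Unregistering needs the same data pointer that was registered. */
static int unregister_probe(probe_fn func, void *data)
{
    for (size_t i = 0; i < nr_probes; i++) {
        if (probes[i].func == func && probes[i].data == data) {
            probes[i] = probes[--nr_probes];
            return 0;
        }
    }
    return -1;
}

/* Call site: each probe receives its private data first, then the args. */
static void trace_fire(int value)
{
    for (size_t i = 0; i < nr_probes; i++)
        probes[i].func(probes[i].data, value);
}

/* Example probe: adds the traced value to the counter passed as data. */
static void add_to_counter(void *data, int value)
{
    int *counter = data;

    assert(counter != NULL);
    *counter += value;
}
```

    Registering add_to_counter() twice with two different counters makes
    both counters advance on every trace_fire(), which is the "same
    callback, different data" case described above.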

    Because of the data parameter, tracepoints declared this way must take
    at least one argument. That is:

    DECLARE_TRACE(mytracepoint, TP_PROTO(void), TP_ARGS());

    will cause an error.

    If no arguments are needed, a new macro can be used instead:

    DECLARE_TRACE_NOARGS(mytracepoint);

    Since there are no arguments, the proto and args fields are left out.

    This is part of a series to make the tracepoint footprint smaller:

       text    data     bss     dec     hex filename
    4913961 1088356  861512 6863829  68bbd5 vmlinux.orig
    4914025 1088868  861512 6864405  68be15 vmlinux.class
    4918492 1084612  861512 6864616  68bee8 vmlinux.tracepoint

    Again, this patch also increases the size of the kernel, but
    lays the groundwork for decreasing it.

    v5: Fixed net/core/drop_monitor.c to handle these updates.

    v4: Moved DECLARE_TRACE() and DECLARE_TRACE_NOARGS() out of the
    #ifdef CONFIG_TRACE_POINTS, since the two are the same in both
    cases. The __DECLARE_TRACE() is what changes.
    Thanks to Frederic Weisbecker for pointing this out.

    v3: Made all register_* functions require data to be passed and
    all callbacks to take a void * parameter as their first argument.
    This makes the calling functions comply with C standards.

    Also added more comments to the modifications of DECLARE_TRACE().

    v2: Made the DECLARE_TRACE() have the ability to pass arguments
    and added a new DECLARE_TRACE_NOARGS() for tracepoints that
    do not need any arguments.

    Acked-by: Mathieu Desnoyers
    Acked-by: Masami Hiramatsu
    Acked-by: Frederic Weisbecker
    Cc: Neil Horman
    Cc: David S. Miller
    Signed-off-by: Steven Rostedt

    Steven Rostedt
     
  • This check is meant to be used by tracepoint users which do a direct cast of
    callbacks to (void *) for direct registration, thus bypassing the
    register_trace_##name and unregister_trace_##name checks.

    This ensures that the callback type matches the function type at the
    call site, without generating any code.
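
    One way to get such a check is an empty inline function whose only
    parameter has the expected probe type. The sketch below assumes that
    approach; DEFINE_CALLBACK_TYPE_CHECK(), check_callback_type_*() and
    register_raw() are hypothetical names, not the ones used in the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical macro: emits an empty inline typed like the probe. */
#define DEFINE_CALLBACK_TYPE_CHECK(name, proto)                          \
    static inline void check_callback_type_##name(void (*cb)(proto))    \
    {                                                                    \
        (void)cb; /* no work: the parameter type is the whole check */  \
    }

DEFINE_CALLBACK_TYPE_CHECK(mytracepoint, int value)

/* Stand-in for a registration path that erases the probe type. */
typedef void (*generic_fn)(void);
static generic_fn registered_probe;

static void register_raw(generic_fn probe)
{
    registered_probe = probe;
}

static int last_value;

static void my_probe(int value)
{
    last_value = value;
}

static void register_my_probe(void)
{
    /* Compile-time check only: a wrong prototype is diagnosed here. */
    check_callback_type_mytracepoint(my_probe);
    /* The cast below would otherwise hide any prototype mismatch. */
    register_raw((generic_fn)my_probe);
}

/* Call site: casts back to the real type before calling. */
static void fire(int value)
{
    assert(registered_probe != NULL);
    ((void (*)(int))registered_probe)(value);
}
```

    If my_probe() took a long instead of an int, the call to
    check_callback_type_mytracepoint() would draw a compiler diagnostic,
    while the cast in register_raw() alone would stay silent.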

    Acked-by: Masami Hiramatsu
    Acked-by: Frederic Weisbecker
    Signed-off-by: Mathieu Desnoyers
    LKML-Reference:
    CC: Ingo Molnar
    CC: Andrew Morton
    CC: Thomas Gleixner
    CC: Peter Zijlstra
    CC: Arnaldo Carvalho de Melo
    CC: Lai Jiangshan
    CC: Li Zefan
    CC: Christoph Hellwig
    Signed-off-by: Steven Rostedt

    Mathieu Desnoyers
     
  • This patch creates a ftrace_event_class struct that event structs point to.
    This class struct will be made to hold information to modify the
    events. Currently the class struct only holds the event's system name.
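
    The shape of this change can be sketched as follows. The struct and
    field names here are simplified, hypothetical stand-ins, not the
    definitions from the patch; the point is only that several events
    share one class object instead of each carrying its own copy.

```c
#include <assert.h>
#include <string.h>

/* Shared per-system information; currently just the system name. */
struct event_class {
    const char *system;
};

/* Each event points at its class rather than storing the name itself. */
struct event_call {
    const char *name;
    struct event_class *cls;
};

static struct event_class sched_class = { .system = "sched" };

static struct event_call sched_switch = { "sched_switch", &sched_class };
static struct event_call sched_wakeup = { "sched_wakeup", &sched_class };

/* Look up the system name through the class pointer. */
static const char *event_system(const struct event_call *call)
{
    assert(call->cls != NULL);
    return call->cls->system;
}
```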

    This patch slightly increases the size, but this change lays the groundwork
    for other changes to make the footprint of tracepoints smaller.

    With 82 standard tracepoints, and 618 system call tracepoints
    (two tracepoints per syscall: enter and exit):

       text    data     bss     dec     hex filename
    4913961 1088356  861512 6863829  68bbd5 vmlinux.orig
    4914025 1088868  861512 6864405  68be15 vmlinux.class

    This patch also cleans up some stale comments in ftrace.h.

    v2: Fixed missing semi-colon in macro.

    Acked-by: Frederic Weisbecker
    Acked-by: Mathieu Desnoyers
    Acked-by: Masami Hiramatsu
    Signed-off-by: Steven Rostedt

    Steven Rostedt
     
  • …inux-2.6-tip into trace/tip/tracing/core-4

    Steven Rostedt
     

11 May, 2010

1 commit

  • epoll should not touch flags in wait_queue_t. This patch introduces a new
    function, __add_wait_queue_exclusive(), for users that use a wait queue as
    a LIFO queue.

    __add_wait_queue_tail_exclusive() is also introduced, replacing
    add_wait_queue_exclusive_locked(). remove_wait_queue_locked() is removed
    because it duplicates __remove_wait_queue(), is disliked by users, and has
    fewer users.
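
    The difference between the two helpers can be modelled in userspace C.
    This is a sketch with hypothetical types and names (struct wait_entry,
    add_wait_exclusive(), add_wait_tail_exclusive()), not the kernel's wait
    queue code: both helpers set the exclusive flag themselves, so callers
    never touch it, and they differ only in where the entry is linked.

```c
#include <assert.h>
#include <stddef.h>

#define WQ_FLAG_EXCLUSIVE 0x01

/* Circular doubly linked list; the queue head is a sentinel entry. */
struct wait_entry {
    unsigned int flags;
    int id;
    struct wait_entry *prev, *next;
};

static void wq_init(struct wait_entry *head)
{
    head->flags = 0;
    head->id = -1;
    head->prev = head->next = head;
}

static void insert_between(struct wait_entry *e,
                           struct wait_entry *prev, struct wait_entry *next)
{
    assert(e != NULL && prev != NULL && next != NULL);
    e->prev = prev;
    e->next = next;
    prev->next = e;
    next->prev = e;
}

/* Head insertion: the newest exclusive waiter is reached first (LIFO). */
static void add_wait_exclusive(struct wait_entry *head, struct wait_entry *e)
{
    e->flags |= WQ_FLAG_EXCLUSIVE;
    insert_between(e, head, head->next);
}

/* Tail insertion: exclusive waiters are reached in arrival order (FIFO). */
static void add_wait_tail_exclusive(struct wait_entry *head,
                                    struct wait_entry *e)
{
    e->flags |= WQ_FLAG_EXCLUSIVE;
    insert_between(e, head->prev, head);
}

/* Id of the first waiter a wake-up would reach, or -1 if the queue is empty. */
static int first_waiter(const struct wait_entry *head)
{
    return head->next == head ? -1 : head->next->id;
}
```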

    Signed-off-by: Changli Gao
    Signed-off-by: Peter Zijlstra
    Cc: Alexander Viro
    Cc: Paul Menage
    Cc: Li Zefan
    Cc: Davide Libenzi
    Cc:
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Changli Gao
     

10 May, 2010

2 commits

  • For the ondemand cpufreq governor, it is desired that the iowait
    time is microaccounted in a similar way as idle time is.

    This patch introduces the infrastructure to account and expose
    this information via the get_cpu_iowait_time_us() function.
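
    As a rough illustration of what "microaccounting" iowait alongside idle
    time means, here is a simplified userspace model. It is an assumption
    about the general scheme, not the code from the patch: struct
    tick_stats, idle_enter(), idle_exit() and get_iowait_time_us() are
    hypothetical names, and time is passed in explicitly as microseconds.

```c
#include <assert.h>
#include <stdint.h>

struct tick_stats {
    int idle_active;         /* cpu is currently idle */
    int nr_iowait;           /* tasks waiting on I/O when idle began */
    uint64_t idle_entry_us;  /* start of the not-yet-accounted interval */
    uint64_t idle_us;        /* accumulated idle time */
    uint64_t iowait_us;      /* part of idle time spent waiting on I/O */
};

/* Fold the time since the last update into the accumulated totals. */
static void update_time_stats(struct tick_stats *ts, uint64_t now_us)
{
    if (!ts->idle_active)
        return;
    assert(now_us >= ts->idle_entry_us);
    uint64_t delta = now_us - ts->idle_entry_us;
    ts->idle_us += delta;
    if (ts->nr_iowait > 0)
        ts->iowait_us += delta;
    ts->idle_entry_us = now_us;
}

static void idle_enter(struct tick_stats *ts, uint64_t now_us, int nr_iowait)
{
    ts->idle_active = 1;
    ts->nr_iowait = nr_iowait;
    ts->idle_entry_us = now_us;
}

static void idle_exit(struct tick_stats *ts, uint64_t now_us)
{
    update_time_stats(ts, now_us);
    ts->idle_active = 0;
}

/* Reader side: bring the totals up to date, then report iowait time. */
static uint64_t get_iowait_time_us(struct tick_stats *ts, uint64_t now_us)
{
    update_time_stats(ts, now_us);
    return ts->iowait_us;
}
```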

    [akpm@linux-foundation.org: fix CONFIG_NO_HZ=n build]
    Signed-off-by: Arjan van de Ven
    Signed-off-by: Andrew Morton
    Reviewed-by: Rik van Riel
    Acked-by: Peter Zijlstra
    Cc: davej@redhat.com
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Arjan van de Ven
     
  • Now that the only user of ts->idle_lastupdate is
    update_ts_time_stats(), the entire field can be eliminated.

    In update_ts_time_stats(), idle_lastupdate is first set to
    "now", and a few lines later, the only user is an if() statement
    that assigns a variable either to "now" or to
    ts->idle_lastupdate, which has the value of "now" at that point.

    Signed-off-by: Arjan van de Ven
    Signed-off-by: Andrew Morton
    Reviewed-by: Rik van Riel
    Acked-by: Peter Zijlstra
    Cc: davej@redhat.com
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Arjan van de Ven
     

09 May, 2010

1 commit


08 May, 2010

1 commit

  • When !CONFIG_SMP, cpu_stop functions weren't defined at all which
    could lead to build failures if UP code uses cpu_stop facility. Add
    dummy cpu_stop implementation for UP. The waiting variants execute
    the work function directly with preempt disabled and
    stop_one_cpu_nowait() schedules a workqueue work.
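
    A waiting variant for UP might look roughly like the userspace model
    below. It is a sketch under stated assumptions: the preempt counter is
    a plain integer stand-in, stop_one_cpu_up() is a hypothetical name, and
    returning -ENOENT for any cpu other than 0 is an assumption rather
    than something the text above specifies.

```c
#include <assert.h>
#include <errno.h>

typedef int (*cpu_stop_fn_t)(void *arg);

/* Stand-in for the kernel's preemption counter. */
static int preempt_count;

static void preempt_disable(void)
{
    preempt_count++;
}

static void preempt_enable(void)
{
    assert(preempt_count > 0);
    preempt_count--;
}

/* UP model: run the work function directly with preemption disabled. */
static int stop_one_cpu_up(unsigned int cpu, cpu_stop_fn_t fn, void *arg)
{
    int ret = -ENOENT; /* assumption: only cpu 0 exists on UP */

    preempt_disable();
    if (cpu == 0)
        ret = fn(arg);
    preempt_enable();
    return ret;
}

static int observed_preempt;

/* Example work function: records the preempt depth, doubles *arg. */
static int record_and_double(void *arg)
{
    int *v = arg;

    observed_preempt = preempt_count;
    *v *= 2;
    return 0;
}
```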

    Makefile and ifdefs around stop_machine implementation are updated to
    accommodate CONFIG_SMP && !CONFIG_STOP_MACHINE case.

    Signed-off-by: Tejun Heo
    Reported-by: Ingo Molnar

    Tejun Heo
     

07 May, 2010

5 commits

  • struct rq isn't visible outside of sched.o, so it's nearly useless to
    expose the pointer. There are also no users of it, so remove it.

    Acked-by: Steven Rostedt
    Signed-off-by: Peter Zijlstra
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     
  • Ingo Molnar
     
  • Currently migration_thread is serving three purposes - migration
    pusher, context to execute active_load_balance() and forced context
    switcher for expedited RCU synchronize_sched. All three roles are
    hardcoded into migration_thread() and determining which job is
    scheduled is slightly messy.

    This patch kills migration_thread and replaces all three uses with
    cpu_stop. The three different roles of migration_thread() are
    split into three separate cpu_stop callbacks -
    migration_cpu_stop(), active_load_balance_cpu_stop() and
    synchronize_sched_expedited_cpu_stop() - and each use case now simply
    asks cpu_stop to execute the callback as necessary.

    synchronize_sched_expedited() was implemented with private
    preallocated resources and custom multi-cpu queueing and waiting
    logic, both of which are provided by cpu_stop.
    synchronize_sched_expedited_count is made atomic and all other shared
    resources along with the mutex are dropped.

    synchronize_sched_expedited() also implemented a check to detect cases
    where not all the callbacks got executed on their assigned cpus and
    fall back to synchronize_sched(). If called with cpu hotplug blocked,
    cpu_stop already guarantees that and the condition cannot happen;
    otherwise, stop_machine() would break. However, this patch preserves
    the paranoid check using a cpumask to record on which cpus the stopper
    ran so that it can serve as a bisection point if something actually
    goes wrong there.
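
    The paranoid check amounts to comparing a "ran here" mask against the
    set of online cpus. A minimal userspace model, assuming a 64-bit
    integer as the mask and hypothetical function names, could look like
    this:

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_NR_CPUS 64

/* One bit per cpu on which the stopper callback actually ran. */
static uint64_t ran_mask;

static void stopper_callback(unsigned int cpu)
{
    assert(cpu < MODEL_NR_CPUS);
    ran_mask |= (uint64_t)1 << cpu;
}

/*
 * Run the callback on every cpu in "executed" (a stand-in for what the
 * stopper machinery really did) and report whether the caller would
 * have to fall back to the slow path because some online cpu was missed.
 */
static int expedited_needs_fallback(uint64_t online, uint64_t executed)
{
    ran_mask = 0;
    for (unsigned int cpu = 0; cpu < MODEL_NR_CPUS; cpu++)
        if (executed & ((uint64_t)1 << cpu))
            stopper_callback(cpu);
    return ran_mask != online;
}
```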

    Because the internal execution state is no longer visible,
    rcu_expedited_torture_stats() is removed.

    This patch also renames cpu_stop threads from "stopper/%d" to
    "migration/%d". The names of these threads ultimately don't matter
    and there's no reason to make unnecessary userland visible changes.

    With this patch applied, stop_machine() and sched now share the same
    resources. stop_machine() is faster without wasting any resources and
    sched migration users are much cleaner.

    Signed-off-by: Tejun Heo
    Acked-by: Peter Zijlstra
    Cc: Ingo Molnar
    Cc: Dipankar Sarma
    Cc: Josh Triplett
    Cc: Paul E. McKenney
    Cc: Oleg Nesterov
    Cc: Dimitri Sivanich

    Tejun Heo
     
  • Reimplement stop_machine using cpu_stop. As cpu stoppers are
    guaranteed to be available for all online cpus,
    stop_machine_create/destroy() are no longer necessary and removed.

    With resource management and synchronization handled by cpu_stop, the
    new implementation is much simpler. Asking the cpu_stop to execute
    the stop_cpu() state machine on all online cpus with cpu hotplug
    disabled is enough.

    stop_machine itself doesn't need to manage any global resources
    anymore, so all per-instance information is rolled into struct
    stop_machine_data and the mutex and all static data variables are
    removed.

    The previous implementation created and destroyed RT workqueues as
    necessary which made stop_machine() calls highly expensive on very
    large machines. According to Dimitri Sivanich, preventing the dynamic
    creation/destruction makes booting more than twice as fast on very
    large machines. cpu_stop resources are preallocated for all online
    cpus and should have the same effect.

    Signed-off-by: Tejun Heo
    Acked-by: Rusty Russell
    Acked-by: Peter Zijlstra
    Cc: Oleg Nesterov
    Cc: Dimitri Sivanich

    Tejun Heo
     
  • Implement a simplistic per-cpu maximum priority cpu monopolization
    mechanism. A non-sleeping callback can be scheduled to run on one or
    multiple cpus with maximum priority, monopolizing those cpus. This is
    primarily to replace and unify RT workqueue usage in stop_machine and
    scheduler migration_thread which currently is serving multiple
    purposes.

    Four functions are provided - stop_one_cpu(), stop_one_cpu_nowait(),
    stop_cpus() and try_stop_cpus().

    This is to allow clean sharing of resources among stop_cpu and all the
    migration thread users. One stopper thread per cpu is created which
    is currently named "stopper/CPU". This will eventually replace the
    migration thread and take on its name.

    * This facility was originally named cpuhog and lived in separate
    files, but Peter Zijlstra nacked the name, so it was renamed to
    cpu_stop and moved into stop_machine.c.

    * Better reporting of preemption leak as per Peter's suggestion.

    Signed-off-by: Tejun Heo
    Acked-by: Peter Zijlstra
    Cc: Oleg Nesterov
    Cc: Dimitri Sivanich

    Tejun Heo
     

05 May, 2010

1 commit

  • When more than one header is included under CREATE_TRACE_POINTS
    the DECLARE_TRACE() macro is not defined back to its original meaning
    and the second include will fail to initialize the TRACE_EVENT()
    and DECLARE_TRACE() correctly.

    To fix this the tracepoint.h file moves the define of DECLARE_TRACE()
    out of the #ifdef _LINUX_TRACEPOINT_H protection (just like the
    define of the TRACE_EVENT()). This way the define_trace.h will undef
    the DECLARE_TRACE() at the end and allow new headers to start
    from scratch.

    This patch also requires fixing include/events/napi.h.

    It currently uses DECLARE_TRACE() and should be converted to a TRACE_EVENT()
    format. But I'll leave that change to the authors of that file.
    But since the napi.h file depends on using the CREATE_TRACE_POINTS
    and does not define its own DEFINE_TRACE() it must use the define_trace.h
    method instead.

    Cc: Neil Horman
    Cc: David S. Miller
    Cc: Mathieu Desnoyers
    Signed-off-by: Steven Rostedt

    Steven Rostedt
     

04 May, 2010

1 commit

  • The ftrace.h file contains several functions as macros when the
    functions are disabled due to config options. This patch converts
    most of them to static inlines.

    There are two exceptions:

    register_ftrace_function() and unregister_ftrace_function()

    This is because their parameter "ops" must not be evaluated since
    code using the function is allowed to #ifdef out the creation of
    the parameter.
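
    The trade-off between the two kinds of stub can be shown with a small
    example. The names below (tracer_start_stub(), register_hook_stub(),
    ops_that_was_ifdefed_out) are hypothetical. A static inline stub
    type-checks and evaluates its arguments; a macro stub that expands to
    a plain expression discards them, so the argument does not even have
    to be declared.

```c
#include <assert.h>

/* Stub as a static inline: arguments are type-checked and evaluated. */
static inline int tracer_start_stub(int flags)
{
    (void)flags;
    return 0;
}

/* Stub as a macro: "ops" is never evaluated, so it need not exist. */
#define register_hook_stub(ops) (0)

static int start_tracer(void)
{
    int ret;

    /* Both stubs are expressions, so their result can be assigned. */
    ret = tracer_start_stub(1);
    if (ret)
        return ret;

    /* The operand is dropped by the preprocessor before compilation. */
    return register_hook_stub(&ops_that_was_ifdefed_out);
}
```

    A stub written as "do { } while (0)" is a statement rather than an
    expression, so using it on the right-hand side of an assignment
    produces the kind of "expected expression before 'do'" error quoted
    below.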

    This also fixes an error caused by recent changes:

    kernel/trace/trace_irqsoff.c: In function 'start_irqsoff_tracer':
    kernel/trace/trace_irqsoff.c:571: error: expected expression before 'do'

    Reported-by: Ingo Molnar
    Signed-off-by: Steven Rostedt

    Steven Rostedt
     

30 Apr, 2010

1 commit

  • * 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6:
    nfs: fix memory leak in nfs_get_sb with CONFIG_NFS_V4
    nfs: fix some issues in nfs41_proc_reclaim_complete()
    NFS: Ensure that nfs_wb_page() waits for Pg_writeback to clear
    NFS: Fix an unstable write data integrity race
    nfs: testing for null instead of ERR_PTR()
    NFS: rsize and wsize settings ignored on v4 mounts
    NFSv4: Don't attempt an atomic open if the file is a mountpoint
    SUNRPC: Fix a bug in rpcauth_prune_expired

    Linus Torvalds
     

29 Apr, 2010

3 commits

  • * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (27 commits)
    sfc: Change falcon_probe_board() to fail for unsupported boards
    sfc: Always close net device at the end of a disabling reset
    sfc: Wait at most 10ms for the MC to finish reading out MAC statistics
    sctp: Fix oops when sending queued ASCONF chunks
    sctp: fix to calc the INIT/INIT-ACK chunk length correctly is set
    sctp: per_cpu variables should be in bh_disabled section
    sctp: fix potential reference of a freed pointer
    sctp: avoid irq lock inversion while call sk->sk_data_ready()
    Revert "tcp: bind() fix when many ports are bound"
    net/usb: add sierra_net.c driver
    cdc_ether: fix autosuspend for mbm devices
    bluetooth: handle l2cap_create_connless_pdu() errors
    gianfar: Wait for both RX and TX to stop
    ipheth: potential null dereferences on error path
    smc91c92_cs: spin_unlock_irqrestore before calling smc_interrupt()
    drivers/usb/net/kaweth.c: add device "Allied Telesyn AT-USB10 USB Ethernet Adapter"
    bnx2: Update version to 2.0.9.
    bnx2: Prevent "scheduling while atomic" warning with cnic, bonding and vlan.
    bnx2: Fix lost MSI-X problem on 5709 NICs.
    cxgb3: Wait longer for control packets on initialization
    ...

    Linus Torvalds
     
  • When we finish processing ASCONF_ACK chunk, we try to send
    the next queued ASCONF. This action runs the sctp state
    machine recursively and it's not prepared to do so.

    kernel BUG at kernel/timer.c:790!
    invalid opcode: 0000 [#1] SMP
    last sysfs file: /sys/module/ipv6/initstate
    Modules linked in: sha256_generic sctp libcrc32c ipv6 dm_multipath
    uinput 8139too i2c_piix4 8139cp mii i2c_core pcspkr virtio_net joydev
    floppy virtio_blk virtio_pci [last unloaded: scsi_wait_scan]

    Pid: 0, comm: swapper Not tainted 2.6.34-rc4 #15 /Bochs
    EIP: 0060:[] EFLAGS: 00010286 CPU: 0
    EIP is at add_timer+0xd/0x1b
    EAX: cecbab14 EBX: 000000f0 ECX: c0957b1c EDX: 03595cf4
    ESI: cecba800 EDI: cf276f00 EBP: c0957aa0 ESP: c0957aa0
    DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
    Process swapper (pid: 0, ti=c0956000 task=c0988ba0 task.ti=c0956000)
    Stack:
    c0957ae0 d1851214 c0ab62e4 c0ab5f26 0500ffff 00000004 00000005 00000004
    00000000 d18694fd 00000004 1666b892 cecba800 cecba800 c0957b14
    00000004
    c0957b94 d1851b11 ceda8b00 cecba800 cf276f00 00000001 c0957b14
    000000d0
    Call Trace:
    [] ? sctp_side_effects+0x607/0xdfc [sctp]
    [] ? sctp_do_sm+0x108/0x159 [sctp]
    [] ? sctp_pname+0x0/0x1d [sctp]
    [] ? sctp_primitive_ASCONF+0x36/0x3b [sctp]
    [] ? sctp_process_asconf_ack+0x2a4/0x2d3 [sctp]
    [] ? sctp_sf_do_asconf_ack+0x1dd/0x2b4 [sctp]
    [] ? sctp_do_sm+0xb8/0x159 [sctp]
    [] ? sctp_cname+0x0/0x52 [sctp]
    [] ? sctp_assoc_bh_rcv+0xac/0xe1 [sctp]
    [] ? sctp_inq_push+0x2d/0x30 [sctp]
    [] ? sctp_rcv+0x797/0x82e [sctp]

    Tested-by: Wei Yongjun
    Signed-off-by: Yuansong Qiao
    Signed-off-by: Shuaijun Zhang
    Signed-off-by: Vlad Yasevich
    Signed-off-by: David S. Miller

    Vlad Yasevich
     
  • sk->sk_data_ready() of sctp socket can be called from both BH and non-BH
    contexts, but the default sk->sk_data_ready(), sock_def_readable(), can
    not be used in this case. Therefore, we have to make a new function
    sctp_data_ready() to grab sk->sk_data_ready() with BH disabling.

    =========================================================
    [ INFO: possible irq lock inversion dependency detected ]
    2.6.33-rc6 #129
    ---------------------------------------------------------
    sctp_darn/1517 just changed the state of lock:
    (clock-AF_INET){++.?..}, at: [] sock_def_readable+0x20/0x80
    but this lock took another, SOFTIRQ-unsafe lock in the past:
    (slock-AF_INET){+.-...}

    and interrupts could create inverse lock ordering between them.

    other info that might help us debug this:
    1 lock held by sctp_darn/1517:
    #0: (sk_lock-AF_INET){+.+.+.}, at: [] sctp_sendmsg+0x23d/0xc00 [sctp]

    Signed-off-by: Wei Yongjun
    Signed-off-by: Vlad Yasevich
    Signed-off-by: David S. Miller

    Wei Yongjun
     

28 Apr, 2010

5 commits

  • * 'urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/brodo/pcmcia-2.6:
    pcmcia: fix matching rules for pseudo-multi-function cards
    pcmcia: pcmcia_dev_present bugfix

    Linus Torvalds
     
  • * 'for-linus' of git://git.kernel.dk/linux-2.6-block:
    coda: move backing-dev.h kernel include inside __KERNEL__
    mtd: ensure that bdi entries are properly initialized and registered
    Move mtd_bdi_*mappable to mtdcore.c
    btrfs: convert to using bdi_setup_and_register()
    Catch filesystems lacking s_bdi
    drbd: Terminate a connection early if sending the protocol fails
    drbd: fix memory leak
    Fix JFFS2 sync silent failure
    smbfs: add bdi backing to mount session
    ncpfs: add bdi backing to mount session
    exofs: add bdi backing to mount session
    ecryptfs: add bdi backing to mount session
    coda: add bdi backing to mount session
    cifs: add bdi backing to mount session
    afs: add bdi backing to mount session.
    9p: add bdi backing to mount session
    bdi: add helper function for doing init and register of a bdi for a file system
    block: ensure jiffies wrap is handled correctly in blk_rq_timed_out_timer

    Linus Torvalds
     
  • Otherwise we must export backing-dev.h as well, which doesn't make
    any sense.

    Signed-off-by: Jens Axboe

    Jens Axboe
     
  • When performing a non-consuming read, a synchronize_sched() is
    performed once for every cpu which is actively tracing.

    This is very expensive, and can make it take several seconds to open
    up the 'trace' file with lots of cpus.

    Only one synchronize_sched() call is actually necessary. What is
    desired is for all cpus to see the disabling state change. So we
    transform the existing sequence:

    for_each_cpu() {
            ring_buffer_read_start();
    }

    where each ring_buffer_read_start() call performs a synchronize_sched(),
    into the following:

    for_each_cpu() {
            ring_buffer_read_prepare();
    }
    ring_buffer_read_prepare_sync();
    for_each_cpu() {
            ring_buffer_read_start();
    }

    wherein only the single ring_buffer_read_prepare_sync() call needs to
    do the synchronize_sched().

    The first phase, via ring_buffer_read_prepare(), allocates the 'iter'
    memory and increments ->record_disabled.

    In the second phase, ring_buffer_read_prepare_sync() makes sure this
    ->record_disabled state is visible fully to all cpus.

    And in the final third phase, the ring_buffer_read_start() calls reset
    the 'iter' objects allocated in the first phase since we now know that
    none of the cpus are adding trace entries any more.
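
    The three phases can be modelled in userspace to show why only one
    synchronization is needed. This is a sketch, not the ring buffer code:
    the functions below are simplified stand-ins with hypothetical names,
    and fake_synchronize_sched() only counts how often it is called.

```c
#include <assert.h>

#define NR_MODEL_CPUS 4

static int sync_calls;                      /* expensive syncs performed */
static int record_disabled[NR_MODEL_CPUS];  /* per-cpu "stop writing" count */
static int iter_ready[NR_MODEL_CPUS];       /* per-cpu iterator was reset */

static void fake_synchronize_sched(void)
{
    sync_calls++;
}

/* Phase 1: mark recording as disabled for this cpu (no waiting). */
static void read_prepare(int cpu)
{
    assert(cpu >= 0 && cpu < NR_MODEL_CPUS);
    record_disabled[cpu]++;
}

/* Phase 2: one sync makes every cpu's disabled state visible. */
static void read_prepare_sync(void)
{
    fake_synchronize_sched();
}

/* Phase 3: writers are known to be quiet, so the iterator can be reset. */
static void read_start(int cpu)
{
    assert(record_disabled[cpu] > 0);
    iter_ready[cpu] = 1;
}

static void open_trace(void)
{
    for (int cpu = 0; cpu < NR_MODEL_CPUS; cpu++)
        read_prepare(cpu);
    read_prepare_sync();
    for (int cpu = 0; cpu < NR_MODEL_CPUS; cpu++)
        read_start(cpu);
}
```

    With the old sequence the sync count would grow with the number of
    cpus; here it stays at one regardless of NR_MODEL_CPUS.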

    This makes opening the 'trace' file nearly instantaneous on a
    sparc64 Niagara2 box with 128 cpus tracing.

    Signed-off-by: David S. Miller
    LKML-Reference:
    Signed-off-by: Steven Rostedt

    David Miller
     
  • Add function graph output to irqsoff tracer.

    The graph output is enabled by setting new 'display-graph' trace option.

    Signed-off-by: Jiri Olsa
    LKML-Reference:
    Signed-off-by: Steven Rostedt

    Jiri Olsa
     

25 Apr, 2010

2 commits

  • noop_backing_dev_info is used only as a flag to mark filesystems that
    don't have any backing store, like tmpfs, procfs, spufs, etc.

    Signed-off-by: Joern Engel

    Changed the BUG_ON() to a WARN_ON(). Note that adding dirty inodes
    to the noop_backing_dev_info is not legal and will not result in
    them being flushed, but we already catch this condition in
    __mark_inode_dirty() when checking for a registered bdi.

    Signed-off-by: Jens Axboe

    Jörn Engel
     
  • If a futex key happens to be located within a huge page mapped
    MAP_PRIVATE, get_futex_key() can go into an infinite loop waiting for a
    page->mapping that will never exist.

    See https://bugzilla.redhat.com/show_bug.cgi?id=552257 for more details
    about the problem.

    This patch makes the page->mapping of a huge page mapped MAP_PRIVATE a
    poisoned value that includes PAGE_MAPPING_ANON. This is enough for futex to
    continue but because of PAGE_MAPPING_ANON, the poisoned value is not
    dereferenced or used by futex. No other part of the VM should be
    dereferencing the page->mapping of a hugetlbfs page as its page cache is
    not on the LRU.
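
    The trick relies on the low bit of the mapping value acting as a tag.
    The sketch below is a userspace model only: struct page_model,
    EXAMPLE_POISON_BASE and classify_for_futex() are hypothetical, and the
    poison constant is an illustrative value, not necessarily the one the
    patch uses.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_MAPPING_ANON 1UL

/* Illustrative poison base with the low bit clear (an assumption). */
#define EXAMPLE_POISON_BASE 0x00300300UL

struct page_model {
    uintptr_t mapping; /* pointer-sized value, low bit used as a tag */
};

/* Private huge page: store a non-NULL poison tagged as anonymous. */
static void poison_private_hugepage(struct page_model *p)
{
    p->mapping = EXAMPLE_POISON_BASE | PAGE_MAPPING_ANON;
}

static int page_is_anon(const struct page_model *p)
{
    return (p->mapping & PAGE_MAPPING_ANON) != 0;
}

/*
 * Futex-style decision:
 *  -1: no mapping yet, the caller would retry (the endless-loop case)
 *   1: anonymous/private, the mapping value is never dereferenced
 *   0: file-backed, the mapping would be used as a real pointer
 */
static int classify_for_futex(const struct page_model *p)
{
    if (p->mapping == 0)
        return -1;
    if (page_is_anon(p))
        return 1;
    return 0;
}
```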

    This patch fixes the problem with the test case described in the bugzilla.

    [akpm@linux-foundation.org: mel cant spel]
    Signed-off-by: Mel Gorman
    Acked-by: Peter Zijlstra
    Acked-by: Darren Hart
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mel Gorman
     

24 Apr, 2010

1 commit

  • This cleans up a few of the complaints about __generic_block_fiemap. I've
    fixed all the typing stuff, used inline functions instead of macros,
    gotten rid of a couple of variables, and made sure the size and block
    requests are all block aligned. It also fixes a problem where sometimes
    FIEMAP_EXTENT_LAST wasn't being set properly.
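
    Block-aligning a byte range usually means rounding the start down and
    the end up to the block size. The helper below is a generic sketch of
    that arithmetic under the assumption of a power-of-two block size; it
    is not taken from the patch, and block_align_request() is a
    hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

struct byte_range {
    uint64_t start;
    uint64_t len;
};

/* Round x down to a multiple of blocksize (a power of two). */
static uint64_t align_down(uint64_t x, uint64_t blocksize)
{
    return x & ~(blocksize - 1);
}

/* Round x up to a multiple of blocksize (a power of two). */
static uint64_t align_up(uint64_t x, uint64_t blocksize)
{
    return (x + blocksize - 1) & ~(blocksize - 1);
}

/* Widen [start, start + len) so both ends sit on block boundaries. */
static struct byte_range block_align_request(uint64_t start, uint64_t len,
                                             uint64_t blocksize)
{
    struct byte_range r;

    assert(blocksize != 0 && (blocksize & (blocksize - 1)) == 0);
    uint64_t end = align_up(start + len, blocksize);
    r.start = align_down(start, blocksize);
    r.len = end - r.start;
    return r;
}
```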

    Signed-off-by: Josef Bacik
    Signed-off-by: Linus Torvalds

    Josef Bacik
     

23 Apr, 2010

4 commits


22 Apr, 2010

6 commits


21 Apr, 2010

2 commits