04 Jan, 2012

1 commit


01 Nov, 2011

1 commit


11 Aug, 2011

1 commit

  • Add FLUSH/FUA support to blktrace. As FLUSH precedes WRITE and/or
    FUA follows WRITE, use the same 'F' flag for both cases and
    distinguish them by their (relative) position. The end results
    look like (other flags might be shown also):

    - WRITE: W
    - WRITE_FLUSH: FW
    - WRITE_FUA: WF
    - WRITE_FLUSH_FUA: FWF

    Note that we reuse TC_BARRIER due to lack of bit space of act_mask
    so that the older versions of blktrace tools will report flush
    requests as barriers from now on.
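
    The positional encoding above can be sketched in plain C as follows.
    This is only an illustration: the EX_* flag bits and ex_fill_rwbs()
    are hypothetical names, not the kernel's REQ_* values or its actual
    helper, and printing 'R' for a non-write is an assumption.

```c
#include <stddef.h>

/* Hypothetical flag bits for illustration; the kernel's REQ_* values differ. */
enum {
    EX_WRITE = 1u << 0,
    EX_FLUSH = 1u << 1,
    EX_FUA   = 1u << 2,
    EX_SYNC  = 1u << 3,
};

/*
 * Fill buf with an rwbs-style string. An 'F' before the direction letter
 * stands for FLUSH, an 'F' after it stands for FUA. buf must hold at
 * least 5 bytes (up to four letters plus the terminator).
 */
static void ex_fill_rwbs(char *buf, unsigned int flags)
{
    size_t i = 0;

    if (flags & EX_FLUSH)
        buf[i++] = 'F';                      /* flush precedes the write */
    buf[i++] = (flags & EX_WRITE) ? 'W' : 'R';
    if (flags & EX_FUA)
        buf[i++] = 'F';                      /* FUA follows the write */
    if (flags & EX_SYNC)
        buf[i++] = 'S';                      /* other flags may also be shown */
    buf[i] = '\0';
}
```

    Calling ex_fill_rwbs() with EX_WRITE | EX_FLUSH | EX_FUA yields the
    FWF form listed above.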

    Cc: Steven Rostedt
    Cc: Frederic Weisbecker
    Cc: Ingo Molnar
    Signed-off-by: Namhyung Kim
    Reviewed-by: Jeff Moyer
    Signed-off-by: Jens Axboe

    Namhyung Kim
     

16 Apr, 2011

1 commit

  • It's a pretty close match to what we had before - the timer triggering
    would mean that nobody unplugged the plug in due time. In the new
    scheme, this matches very closely what the schedule() unplug now is.
    It's essentially the difference between an explicit unplug (IO unplug)
    or an implicit unplug (timer unplug, we scheduled with pending IO
    queued).

    Signed-off-by: Jens Axboe

    Jens Axboe
     

12 Apr, 2011

2 commits


12 Mar, 2011

1 commit

  • In blk_add_trace_rq, we only chose the lowest 2 bits from the
    request's cmd_flags and did some checks for discard, so most of
    the other flags (e.g. REQ_SYNC) are missing.

    For example, with a sync write after blkparse we get:
    8,16 1 1 0.001776503 7509 A WS 1349632 + 1024 [...]

    [...] cmd_flags directly to __blk_add_trace.

    With this patch, after a sync write we get:
    8,16 1 1 0.001776900 5425 A WS 1189888 + 1024
    Acked-by: Jeff Moyer
    Signed-off-by: Jens Axboe

    Tao Ma
     

03 Mar, 2011

1 commit

  • If we enable trace events to trace block actions, we use
    blk_fill_rwbs_rq to analyze the corresponding actions
    in the request's cmd_flags, but we only choose the lowest 2 bits
    from it, so most of the other flags (e.g. REQ_SYNC) are missing.
    For example, with a sync write we get:
    write_test-2409 [001] 160.013869: block_rq_insert: 3,64 W 0 () 258135 + 8 [write_test]

    Since now we have integrated the flags of both bio and request,
    it is safe to pass rq->cmd_flags directly to blk_fill_rwbs and
    blk_fill_rwbs_rq isn't needed any more.

    With this patch, after a sync write we get:
    write_test-2417 [000] 226.603878: block_rq_insert: 3,64 WS 0 () 258135 + 8 [write_test]
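
    The effect of the masking can be reproduced with a small user-space
    sketch. The EX_REQ_* bit positions and the ex_* helpers below are
    assumptions made for illustration; they are not the kernel's
    definitions of REQ_SYNC or blk_fill_rwbs.

```c
#include <stddef.h>

/* Hypothetical bit layout: the sync flag sits above the two lowest bits. */
enum {
    EX_REQ_WRITE = 1u << 0,
    EX_REQ_SYNC  = 1u << 4,
};

/* Format a direction letter plus an optional 'S'. buf needs 3 bytes. */
static void ex_rwbs(char *buf, unsigned int flags)
{
    size_t i = 0;

    buf[i++] = (flags & EX_REQ_WRITE) ? 'W' : 'R';
    if (flags & EX_REQ_SYNC)
        buf[i++] = 'S';
    buf[i] = '\0';
}

/* Behaviour described as the problem: only the lowest 2 bits survive. */
static void ex_rwbs_masked(char *buf, unsigned int cmd_flags)
{
    ex_rwbs(buf, cmd_flags & 3u);
}

/* Behaviour after the change: the full flag word reaches the formatter. */
static void ex_rwbs_full(char *buf, unsigned int cmd_flags)
{
    ex_rwbs(buf, cmd_flags);
}
```

    For a sync write, the masked variant produces W and the full
    variant produces WS, mirroring the two trace lines above.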

    Signed-off-by: Tao Ma
    Acked-by: Jeff Moyer
    Signed-off-by: Jens Axboe

    Tao Ma
     

19 Jan, 2011

1 commit

  • Now if we enable blktrace, cfq outputs too many messages to the
    trace buffer. That is fine if we don't specify any action mask.
    But if I run:
    blktrace /dev/sdb -a issue -a complete -o - | blkparse -i -
    I only want to see 'D' and 'C', while with the following command
    dd if=/mnt/ocfs2/test of=/dev/null bs=4k count=1 iflag=direct

    I will get (with a 2.6.37 vanilla kernel):
    8,16 0 0 0.000000000 0 m N cfq3805 alloced
    8,16 0 0 0.000004126 0 m N cfq3805 insert_request
    8,16 0 0 0.000004884 0 m N cfq3805 add_to_rr
    8,16 0 0 0.000008417 0 m N cfq workload slice:300
    8,16 0 0 0.000009557 0 m N cfq3805 set_active wl_prio:0 wl_type:2
    8,16 0 0 0.000010640 0 m N cfq3805 fifo= (null)
    8,16 0 0 0.000011193 0 m N cfq3805 dispatch_insert
    8,16 0 0 0.000012221 0 m N cfq3805 dispatched a request
    8,16 0 0 0.000012802 0 m N cfq3805 activate rq, drv=1
    8,16 0 1 0.000013181 3805 D R 114759 + 8 [dd]
    8,16 0 2 0.000164244 0 C R 114759 + 8 [0]
    8,16 0 0 0.000167997 0 m N cfq3805 complete rqnoidle 0
    8,16 0 0 0.000168782 0 m N cfq3805 set_slice=100
    8,16 0 0 0.000169874 0 m N cfq3805 arm_idle: 8 group_idle: 0
    8,16 0 0 0.000170189 0 m N cfq schedule dispatch
    8,16 0 0 0.000397938 0 m N cfq3805 slice expired t=0
    8,16 0 0 0.000399763 0 m N cfq3805 sl_used=1 disp=1 charge=1 iops=0 sect=8
    8,16 0 0 0.000400227 0 m N cfq3805 del_from_rr
    8,16 0 0 0.000400882 0 m N cfq3805 put_queue

    See, there are 19 lines while I only need 2. I don't think it is
    appropriate for a user.

    So this patch disables all messages if BLK_TC_NOTIFY isn't set.
    Now the output for the same command will look like:
    8,16 0 1 0.000000000 4908 D R 114759 + 8 [dd]
    8,16 0 2 0.000146827 0 C R 114759 + 8 [0]

    Yes, it is what I want to see.
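
    The filtering rule can be sketched as below. The EX_TC_* bits and
    both helpers are hypothetical stand-ins for the BLK_TC_* categories
    and the kernel's check; treating an empty selection as "everything"
    is an assumption based on the description above.

```c
#include <stdbool.h>

/* Hypothetical category bits; the kernel's BLK_TC_* values differ. */
enum {
    EX_TC_ISSUE    = 1u << 0,
    EX_TC_COMPLETE = 1u << 1,
    EX_TC_NOTIFY   = 1u << 2,
    EX_TC_ALL      = EX_TC_ISSUE | EX_TC_COMPLETE | EX_TC_NOTIFY,
};

/* In this sketch, an empty user selection means "trace every category". */
static unsigned int ex_effective_mask(unsigned int requested)
{
    return requested ? requested : EX_TC_ALL;
}

/* Message ('m') lines are emitted only when the notify category is set. */
static bool ex_should_log_message(unsigned int act_mask)
{
    return (act_mask & EX_TC_NOTIFY) != 0;
}
```

    With "-a issue -a complete" the mask lacks the notify bit, so the
    cfq messages are dropped while 'D' and 'C' events still pass.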

    Cc: Steven Rostedt
    Cc: Jeff Moyer
    Signed-off-by: Tao Ma
    Signed-off-by: Jens Axboe

    Tao Ma
     

10 Jan, 2011

1 commit


07 Jan, 2011

1 commit

  • blktrace.c block bio complete callback needs to gain a new argument to reflect
    the newly added "error" tracepoint argument. This is needed to match the new
    block_bio_complete TRACE_EVENT as of
    commit de983a7bfcb7c020901ca6e2314cf55a4207ab5a.

    Signed-off-by: Mathieu Desnoyers
    CC: Jeff Moyer
    CC: Steven Rostedt
    CC: Frederic Weisbecker
    CC: Ingo Molnar
    CC: Thomas Gleixner
    CC: Li Zefan
    Signed-off-by: Jens Axboe

    Mathieu Desnoyers
     

16 Nov, 2010

1 commit


10 Nov, 2010

1 commit

  • REQ_HARDBARRIER is dead now, so remove the leftovers. What's left
    at this point is:

    - various checks inside the block layer.
    - sanity checks in bio based drivers.
    - now unused bio_empty_barrier helper.
    - Xen blockfront use of BLKIF_OP_WRITE_BARRIER - it's been dead for a while,
    but Xen really needs to sort out its barrier situation.
    - setting of ordered tags in uas - dead code copied from old scsi
    drivers.
    - scsi different retry for barriers - it's dead and should have been
    removed when flushes were converted to FS requests.
    - blktrace handling of barriers - removed. Someone who knows blktrace
    better should add support for REQ_FLUSH and REQ_FUA, though.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     

23 Oct, 2010

1 commit

  • * 'llseek' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/bkl:
    vfs: make no_llseek the default
    vfs: don't use BKL in default_llseek
    llseek: automatically add .llseek fop
    libfs: use generic_file_llseek for simple_attr
    mac80211: disallow seeks in minstrel debug code
    lirc: make chardev nonseekable
    viotape: use noop_llseek
    raw: use explicit llseek file operations
    ibmasmfs: use generic_file_llseek
    spufs: use llseek in all file operations
    arm/omap: use generic_file_llseek in iommu_debug
    lkdtm: use generic_file_llseek in debugfs
    net/wireless: use generic_file_llseek in debugfs
    drm: use noop_llseek

    Linus Torvalds
     

19 Oct, 2010

1 commit


15 Oct, 2010

1 commit

  • All file_operations should get a .llseek operation so we can make
    nonseekable_open the default for future file operations without a
    .llseek pointer.

    The three cases that we can automatically detect are no_llseek, seq_lseek
    and default_llseek. For cases where we can automatically prove that
    the file offset is always ignored, we use noop_llseek, which maintains
    the current behavior of not returning an error from a seek.

    New drivers should normally not use noop_llseek but instead use no_llseek
    and call nonseekable_open at open time. Existing drivers can be converted
    to do the same when the maintainer knows for certain that no user code
    relies on calling seek on the device file.

    The generated code is often incorrectly indented and right now contains
    comments that clarify for each added line why a specific variant was
    chosen. In the version that gets submitted upstream, the comments will
    be gone and I will manually fix the indentation, because there does not
    seem to be a way to do that using coccinelle.

    Some amount of new code is currently sitting in linux-next that should get
    the same modifications, which I will do at the end of the merge window.

    Many thanks to Julia Lawall for helping me learn to write a semantic
    patch that does all this.

    ===== begin semantic patch =====
    // This adds an llseek= method to all file operations,
    // as a preparation for making no_llseek the default.
    //
    // The rules are
    // - use no_llseek explicitly if we do nonseekable_open
    // - use seq_lseek for sequential files
    // - use default_llseek if we know we access f_pos
    // - use noop_llseek if we know we don't access f_pos,
    // but we still want to allow users to call lseek
    //
    @ open1 exists @
    identifier nested_open;
    @@
    nested_open(...)
    {

    }

    @ open exists@
    identifier open_f;
    identifier i, f;
    identifier open1.nested_open;
    @@
    int open_f(struct inode *i, struct file *f)
    {

    }

    @ read disable optional_qualifier exists @
    identifier read_f;
    identifier f, p, s, off;
    type ssize_t, size_t, loff_t;
    expression E;
    identifier func;
    @@
    ssize_t read_f(struct file *f, char *p, size_t s, loff_t *off)
    {

    }

    @ read_no_fpos disable optional_qualifier exists @
    identifier read_f;
    identifier f, p, s, off;
    type ssize_t, size_t, loff_t;
    @@
    ssize_t read_f(struct file *f, char *p, size_t s, loff_t *off)
    {
    ... when != off
    }

    @ write @
    identifier write_f;
    identifier f, p, s, off;
    type ssize_t, size_t, loff_t;
    expression E;
    identifier func;
    @@
    ssize_t write_f(struct file *f, const char *p, size_t s, loff_t *off)
    {

    }

    @ write_no_fpos @
    identifier write_f;
    identifier f, p, s, off;
    type ssize_t, size_t, loff_t;
    @@
    ssize_t write_f(struct file *f, const char *p, size_t s, loff_t *off)
    {
    ... when != off
    }

    @ fops0 @
    identifier fops;
    @@
    struct file_operations fops = {
    ...
    };

    @ has_llseek depends on fops0 @
    identifier fops0.fops;
    identifier llseek_f;
    @@
    struct file_operations fops = {
    ...
    .llseek = llseek_f,
    ...
    };

    @ has_read depends on fops0 @
    identifier fops0.fops;
    identifier read_f;
    @@
    struct file_operations fops = {
    ...
    .read = read_f,
    ...
    };

    @ has_write depends on fops0 @
    identifier fops0.fops;
    identifier write_f;
    @@
    struct file_operations fops = {
    ...
    .write = write_f,
    ...
    };

    @ has_open depends on fops0 @
    identifier fops0.fops;
    identifier open_f;
    @@
    struct file_operations fops = {
    ...
    .open = open_f,
    ...
    };

    // use no_llseek if we call nonseekable_open
    ////////////////////////////////////////////
    @ nonseekable1 depends on !has_llseek && has_open @
    identifier fops0.fops;
    identifier nso ~= "nonseekable_open";
    @@
    struct file_operations fops = {
    ... .open = nso, ...
    +.llseek = no_llseek, /* nonseekable */
    };

    @ nonseekable2 depends on !has_llseek @
    identifier fops0.fops;
    identifier open.open_f;
    @@
    struct file_operations fops = {
    ... .open = open_f, ...
    +.llseek = no_llseek, /* open uses nonseekable */
    };

    // use seq_lseek for sequential files
    /////////////////////////////////////
    @ seq depends on !has_llseek @
    identifier fops0.fops;
    identifier sr ~= "seq_read";
    @@
    struct file_operations fops = {
    ... .read = sr, ...
    +.llseek = seq_lseek, /* we have seq_read */
    };

    // use default_llseek if there is a readdir
    ///////////////////////////////////////////
    @ fops1 depends on !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
    identifier fops0.fops;
    identifier readdir_e;
    @@
    // any other fop is used that changes pos
    struct file_operations fops = {
    ... .readdir = readdir_e, ...
    +.llseek = default_llseek, /* readdir is present */
    };

    // use default_llseek if at least one of read/write touches f_pos
    /////////////////////////////////////////////////////////////////
    @ fops2 depends on !fops1 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
    identifier fops0.fops;
    identifier read.read_f;
    @@
    // read fops use offset
    struct file_operations fops = {
    ... .read = read_f, ...
    +.llseek = default_llseek, /* read accesses f_pos */
    };

    @ fops3 depends on !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
    identifier fops0.fops;
    identifier write.write_f;
    @@
    // write fops use offset
    struct file_operations fops = {
    ... .write = write_f, ...
    + .llseek = default_llseek, /* write accesses f_pos */
    };

    // Use noop_llseek if neither read nor write accesses f_pos
    ///////////////////////////////////////////////////////////

    @ fops4 depends on !fops1 && !fops2 && !fops3 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
    identifier fops0.fops;
    identifier read_no_fpos.read_f;
    identifier write_no_fpos.write_f;
    @@
    // write fops use offset
    struct file_operations fops = {
    ...
    .write = write_f,
    .read = read_f,
    ...
    +.llseek = noop_llseek, /* read and write both use no f_pos */
    };

    @ depends on has_write && !has_read && !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
    identifier fops0.fops;
    identifier write_no_fpos.write_f;
    @@
    struct file_operations fops = {
    ... .write = write_f, ...
    +.llseek = noop_llseek, /* write uses no f_pos */
    };

    @ depends on has_read && !has_write && !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
    identifier fops0.fops;
    identifier read_no_fpos.read_f;
    @@
    struct file_operations fops = {
    ... .read = read_f, ...
    +.llseek = noop_llseek, /* read uses no f_pos */
    };

    @ depends on !has_read && !has_write && !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
    identifier fops0.fops;
    @@
    struct file_operations fops = {
    ...
    +.llseek = noop_llseek, /* no read or write fn */
    };
    ===== End semantic patch =====
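
    The order in which the rules are tried can be summarised as a plain
    C decision function. This is a simplified sketch: the struct, enum
    and function names are hypothetical, and the final fallback glosses
    over the fact that the semantic patch only picks noop_llseek when it
    can show that the offset argument is never used.

```c
#include <stdbool.h>

enum ex_llseek {
    EX_KEEP_EXISTING,   /* fops already sets .llseek, leave it alone */
    EX_NO_LLSEEK,
    EX_SEQ_LSEEK,
    EX_DEFAULT_LLSEEK,
    EX_NOOP_LLSEEK,
};

/* Facts a tool would have to establish about one file_operations. */
struct ex_fops_facts {
    bool has_llseek;
    bool open_is_nonseekable;   /* .open is, or calls, nonseekable_open */
    bool read_is_seq_read;
    bool has_readdir;
    bool read_touches_pos;
    bool write_touches_pos;
};

/* Apply the rules in the same priority order as the semantic patch. */
static enum ex_llseek ex_pick_llseek(const struct ex_fops_facts *f)
{
    if (f->has_llseek)
        return EX_KEEP_EXISTING;
    if (f->open_is_nonseekable)
        return EX_NO_LLSEEK;
    if (f->read_is_seq_read)
        return EX_SEQ_LSEEK;
    if (f->has_readdir)
        return EX_DEFAULT_LLSEEK;
    if (f->read_touches_pos || f->write_touches_pos)
        return EX_DEFAULT_LLSEEK;
    return EX_NOOP_LLSEEK;
}
```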

    Signed-off-by: Arnd Bergmann
    Cc: Julia Lawall
    Cc: Christoph Hellwig

    Arnd Bergmann
     

12 Aug, 2010

1 commit

  • Secure discard is the same as discard except that all copies of the
    discarded sectors (perhaps created by garbage collection) must also be
    erased.

    Signed-off-by: Adrian Hunter
    Acked-by: Jens Axboe
    Cc: Kyungmin Park
    Cc: Madhusudhan Chikkature
    Cc: Christoph Hellwig
    Cc: Ben Gardiner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Adrian Hunter
     

08 Aug, 2010

3 commits

  • The blktrace driver currently needs the BKL, but
    we should not need to take that in the block layer,
    so just push it down into the driver itself.

    It is quite likely that the BKL is not actually
    required in blktrace code and could be removed
    in a follow-on patch.

    Signed-off-by: Arnd Bergmann
    Acked-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Arnd Bergmann
     
  • Remove the current bio flags and reuse the request flags for the bio, too.
    This makes it easier to trace the type of I/O from the filesystem
    down to the block driver. There were two flags in the bio that were
    missing in the requests: BIO_RW_UNPLUG and BIO_RW_AHEAD. Also I've
    renamed two request flags that had a superfluous RW in them.

    Note that the flags are in bio.h despite having the REQ_ name - as
    blkdev.h includes bio.h that is the only way to go for now.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     
  • Remove all the trivial wrappers for the cmd_type and cmd_flags fields in
    struct request. This allows much easier grepping for different request
    types instead of unwinding through macros.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     

31 May, 2010

1 commit

  • Fix blktrace.c kernel-doc warnings:
    Warning(kernel/trace/blktrace.c:858): No description found for parameter 'ignore'
    Warning(kernel/trace/blktrace.c:890): No description found for parameter 'ignore'

    Signed-off-by: Randy Dunlap
    Cc: Jens Axboe
    Cc: Steven Rostedt
    Cc: Frederic Weisbecker
    Signed-off-by: Ingo Molnar

    Randy Dunlap
     

15 May, 2010

1 commit

  • Multiple events may use the same method to print their data.
    Instead of having all events have a pointer to their print functions,
    the trace_event structure now points to a trace_event_functions structure
    that will hold the way to print out the event.

    The event itself is now passed to the print function to let the print
    function know what kind of event it should print.

    This opens the door to consolidating the way several events print
    their output.
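
    The indirection can be pictured with a small user-space sketch. All
    names below (ex_event, ex_event_functions, ex_print_generic) are
    hypothetical and only mirror the idea; they are not the kernel's
    trace_event definitions.

```c
#include <stdio.h>

struct ex_event;

/* Shared table of output methods; several events can point at one table. */
struct ex_event_functions {
    int (*print)(const struct ex_event *ev, char *out, size_t len);
};

struct ex_event {
    int type;
    const char *name;
    const struct ex_event_functions *funcs;
};

/* One print routine serves many events; the event itself is passed in
 * so the routine knows which kind of event it is printing. */
static int ex_print_generic(const struct ex_event *ev, char *out, size_t len)
{
    return snprintf(out, len, "%s(%d)", ev->name, ev->type);
}

static const struct ex_event_functions ex_generic_funcs = {
    .print = ex_print_generic,
};

/* Two distinct events that share the same function table. */
static const struct ex_event ex_plug   = { 1, "plug",   &ex_generic_funcs };
static const struct ex_event ex_unplug = { 2, "unplug", &ex_generic_funcs };
```

    Each event costs one pointer to a shared table rather than its own
    set of function pointers, which is what allows the consolidation.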

    text data bss dec hex filename
    4913961 1088356 861512 6863829 68bbd5 vmlinux.orig
    4900382 1048964 861512 6810858 67ecea vmlinux.init
    4900446 1049028 861512 6810986 67ed6a vmlinux.preprint

    This change slightly increases the size but is needed for the next change.

    v3: Fix the branch tracer events to handle this change.

    v2: Fix the new function graph tracer event calls to handle this change.

    Acked-by: Mathieu Desnoyers
    Acked-by: Masami Hiramatsu
    Acked-by: Frederic Weisbecker
    Signed-off-by: Steven Rostedt

    Steven Rostedt
     

14 May, 2010

1 commit

  • This patch adds data to be passed to tracepoint callbacks.

    The created functions from DECLARE_TRACE() now need a mandatory data
    parameter. For example:

    DECLARE_TRACE(mytracepoint, int value, value)

    Will create the register function:

    int register_trace_mytracepoint((void(*)(void *data, int value))probe,
    void *data);

    As the first argument, all callbacks (probes) must take a (void *data)
    parameter. So a callback for the above tracepoint will look like:

    void myprobe(void *data, int value)
    {
    }

    The callback may choose to ignore the data parameter.

    This change allows callbacks to register a private data pointer along
    with the function probe.

    void mycallback(void *data, int value);

    register_trace_mytracepoint(mycallback, mydata);

    Then the mycallback() will receive the "mydata" as the first parameter
    before the args.

    A more detailed example:

    DECLARE_TRACE(mytracepoint, TP_PROTO(int status), TP_ARGS(status));

    /* In the C file */

    DEFINE_TRACE(mytracepoint, TP_PROTO(int status), TP_ARGS(status));

    [...]

    trace_mytracepoint(status);

    /* In a file registering this tracepoint */

    void my_callback(void *data, int status)
    {
    struct my_struct *my_data = data;
    [...]
    }

    [...]
    my_data = kmalloc(sizeof(*my_data), GFP_KERNEL);
    init_my_data(my_data);
    register_trace_mytracepoint(my_callback, my_data);

    The same callback can also be registered to the same tracepoint as long
    as the data registered is different. Note, the data must also be used
    to unregister the callback:

    unregister_trace_mytracepoint(my_callback, my_data);
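
    The (probe, data) pairing can be modelled in user space as follows.
    This is a sketch under stated assumptions: the fixed-size table and
    every ex_* name are hypothetical and do not reflect how the kernel
    stores tracepoint probes.

```c
#include <stddef.h>

typedef void (*ex_probe_fn)(void *data, int value);

struct ex_probe {
    ex_probe_fn fn;
    void *data;
};

#define EX_MAX_PROBES 8
static struct ex_probe ex_probes[EX_MAX_PROBES];
static size_t ex_nr_probes;

/* Register (fn, data). Fails with -1 on a duplicate pair or a full table. */
static int ex_register(ex_probe_fn fn, void *data)
{
    for (size_t i = 0; i < ex_nr_probes; i++)
        if (ex_probes[i].fn == fn && ex_probes[i].data == data)
            return -1;
    if (ex_nr_probes == EX_MAX_PROBES)
        return -1;
    ex_probes[ex_nr_probes++] = (struct ex_probe){ fn, data };
    return 0;
}

/* Unregister needs the same data pointer that was used to register. */
static int ex_unregister(ex_probe_fn fn, void *data)
{
    for (size_t i = 0; i < ex_nr_probes; i++) {
        if (ex_probes[i].fn == fn && ex_probes[i].data == data) {
            ex_probes[i] = ex_probes[--ex_nr_probes];
            return 0;
        }
    }
    return -1;
}

/* Fire the tracepoint: every probe gets its own data as first argument. */
static void ex_trace(int value)
{
    for (size_t i = 0; i < ex_nr_probes; i++)
        ex_probes[i].fn(ex_probes[i].data, value);
}

/* Example probe: adds value to the int counter passed as data. */
static void ex_count(void *data, int value)
{
    *(int *)data += value;
}
```

    Registering ex_count twice with two different counters is accepted,
    while registering the same pair twice is rejected.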

    Because of the data parameter, tracepoints declared this way can not have
    no args. That is:

    DECLARE_TRACE(mytracepoint, TP_PROTO(void), TP_ARGS());

    will cause an error.

    If no arguments are needed, a new macro can be used instead:

    DECLARE_TRACE_NOARGS(mytracepoint);

    Since there are no arguments, the proto and args fields are left out.

    This is part of a series to make the tracepoint footprint smaller:

    text data bss dec hex filename
    4913961 1088356 861512 6863829 68bbd5 vmlinux.orig
    4914025 1088868 861512 6864405 68be15 vmlinux.class
    4918492 1084612 861512 6864616 68bee8 vmlinux.tracepoint

    Again, this patch also increases the size of the kernel, but
    lays the ground work for decreasing it.

    v5: Fixed net/core/drop_monitor.c to handle these updates.

    v4: Moved the DECLARE_TRACE() DECLARE_TRACE_NOARGS out of the
    #ifdef CONFIG_TRACE_POINTS, since the two are the same in both
    cases. The __DECLARE_TRACE() is what changes.
    Thanks to Frederic Weisbecker for pointing this out.

    v3: Made all register_* functions require data to be passed and
    all callbacks to take a void * parameter as its first argument.
    This makes the calling functions comply with C standards.

    Also added more comments to the modifications of DECLARE_TRACE().

    v2: Made the DECLARE_TRACE() have the ability to pass arguments
    and added a new DECLARE_TRACE_NOARGS() for tracepoints that
    do not need any arguments.

    Acked-by: Mathieu Desnoyers
    Acked-by: Masami Hiramatsu
    Acked-by: Frederic Weisbecker
    Cc: Neil Horman
    Cc: David S. Miller
    Signed-off-by: Steven Rostedt

    Steven Rostedt
     

30 Mar, 2010

1 commit

  • …it slab.h inclusion from percpu.h

    percpu.h is included by sched.h and module.h and thus ends up being
    included when building most .c files. percpu.h includes slab.h which
    in turn includes gfp.h making everything defined by the two files
    universally available and complicating inclusion dependencies.

    percpu.h -> slab.h dependency is about to be removed. Prepare for
    this change by updating users of gfp and slab facilities include those
    headers directly instead of assuming availability. As this conversion
    needs to touch large number of source files, the following script is
    used as the basis of conversion.

    http://userweb.kernel.org/~tj/misc/slabh-sweep.py

    The script does the following.

    * Scan files for gfp and slab usages and update includes such that
    only the necessary includes are there. ie. if only gfp is used,
    gfp.h, if slab is used, slab.h.

    * When the script inserts a new include, it looks at the include
    blocks and tries to put the new include such that its order conforms
    to its surroundings. It's put in the include block which contains
    core kernel includes, in the same order that the rest are ordered -
    alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
    doesn't seem to be any matching order.

    * If the script can't find a place to put a new include (mostly
    because the file doesn't have fitting include block), it prints out
    an error message indicating which .h file needs to be added to the
    file.

    The conversion was done in the following steps.

    1. The initial automatic conversion of all .c files updated slightly
    over 4000 files, deleting around 700 includes and adding ~480 gfp.h
    and ~3000 slab.h inclusions. The script emitted errors for ~400
    files.

    2. Each error was manually checked. Some didn't need the inclusion,
    some needed manual addition while adding it to implementation .h or
    embedding .c file was more appropriate for others. This step added
    inclusions to around 150 files.

    3. The script was run again and the output was compared to the edits
    from #2 to make sure no file was left behind.

    4. Several build tests were done and a couple of problems were fixed.
    e.g. lib/decompress_*.c used malloc/free() wrappers around slab
    APIs requiring slab.h to be added manually.

    5. The script was run on all .h files but without automatically
    editing them as sprinkling gfp.h and slab.h inclusions around .h
    files could easily lead to inclusion dependency hell. Most gfp.h
    inclusion directives were ignored as stuff from gfp.h was usually
    widely available and often used in preprocessor macros. Each
    slab.h inclusion directive was examined and added manually as
    necessary.

    6. percpu.h was updated not to include slab.h.

    7. Build tests were done on the following configurations and failures
    were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my
    distributed build env didn't work with gcov compiles) and a few
    more options had to be turned off depending on archs to make things
    build (like ipr on powerpc/64 which failed due to missing writeq).

    * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
    * powerpc and powerpc64 SMP allmodconfig
    * sparc and sparc64 SMP allmodconfig
    * ia64 SMP allmodconfig
    * s390 SMP allmodconfig
    * alpha SMP allmodconfig
    * um on x86_64 SMP allmodconfig

    8. percpu.h modifications were reverted so that it could be applied as
    a separate patch and serve as bisection point.

    Given the fact that I had only a couple of failures from tests on step
    6, I'm fairly confident about the coverage of this conversion patch.
    If there is a breakage, it's likely to be something in one of the arch
    headers which should be easily discoverable on most builds of
    the specific arch.

    Signed-off-by: Tejun Heo <tj@kernel.org>
    Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
    Cc: Ingo Molnar <mingo@redhat.com>
    Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>

    Tejun Heo
     

01 Mar, 2010

1 commit


02 Oct, 2009

2 commits

  • Since 2.6.31 now has request-based device-mapper, it's useful to have
    a tracepoint for request-remapping as well as bio-remapping.
    This patch adds a tracepoint for request-remapping, trace_block_rq_remap().

    Signed-off-by: Kiyoshi Ueda
    Signed-off-by: Jun'ichi Nomura
    Cc: Alasdair G Kergon
    Cc: Li Zefan
    Signed-off-by: Jens Axboe

    Jun'ichi Nomura
     
  • Add missing blk_trace_remove_sysfs to be in pair with blk_trace_init_sysfs
    introduced in commit 1d54ad6da9192fed5dd3b60224d9f2dfea0dcd82.
    Release kobject also in case the request_fn is NULL.

    Problem was noticed via kmemleak backtrace when some sysfs entries were
    not properly destroyed during device removal:

    unreferenced object 0xffff88001aa76640 (size 80):
    comm "lvcreate", pid 2120, jiffies 4294885144
    hex dump (first 32 bytes):
    01 00 00 00 00 00 00 00 f0 65 a7 1a 00 88 ff ff .........e......
    90 66 a7 1a 00 88 ff ff 86 1d 53 81 ff ff ff ff .f........S.....
    backtrace:
    [] kmemleak_alloc+0x26/0x60
    [] kmem_cache_alloc+0x133/0x1c0
    [] sysfs_new_dirent+0x41/0x120
    [] sysfs_add_file_mode+0x3c/0xb0
    [] internal_create_group+0xc1/0x1a0
    [] sysfs_create_group+0x13/0x20
    [] blk_trace_init_sysfs+0x14/0x20
    [] blk_register_queue+0x3c/0xf0
    [] add_disk+0x94/0x160
    [] dm_create+0x598/0x6e0 [dm_mod]
    [] dev_create+0x51/0x350 [dm_mod]
    [] ctl_ioctl+0x1a3/0x240 [dm_mod]
    [] dm_compat_ctl_ioctl+0x12/0x20 [dm_mod]
    [] compat_sys_ioctl+0xcd/0x4f0
    [] sysenter_dispatch+0x7/0x2c
    [] 0xffffffffffffffff

    Signed-off-by: Zdenek Kabelac
    Reviewed-by: Li Zefan
    Signed-off-by: Jens Axboe

    Zdenek Kabelac
     

06 Sep, 2009

1 commit


05 Sep, 2009

1 commit

  • The latency tracers (irqsoff and wakeup) can swap trace buffers
    on the fly. If an event is happening and has reserved data on one of
    the buffers, and the latency tracer swaps the global buffer with the
    max buffer, the result is that the event may commit the data to the
    wrong buffer.

    This patch changes the trace recording API to receive the
    buffer that was used to reserve a commit. This buffer can then be
    passed in to the commit.
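
    The race and the fix can be illustrated with a minimal user-space
    model. The ex_* types and functions are hypothetical; they stand in
    for the ring buffer reserve/commit calls without reproducing them.

```c
struct ex_buffer {
    int committed;
};

struct ex_tracer {
    struct ex_buffer *cur;   /* buffer events are currently written to */
    struct ex_buffer *max;   /* snapshot buffer used by latency tracers */
};

/* Reserve space: hands back the buffer that is current right now. */
static struct ex_buffer *ex_reserve(struct ex_tracer *t)
{
    return t->cur;
}

/* A latency tracer swapping the two buffers on the fly. */
static void ex_swap(struct ex_tracer *t)
{
    struct ex_buffer *tmp = t->cur;

    t->cur = t->max;
    t->max = tmp;
}

/* Problematic variant: looks the buffer up again at commit time. */
static void ex_commit_lookup(struct ex_tracer *t)
{
    t->cur->committed++;
}

/* Fixed variant: commits to the buffer the reservation was made on. */
static void ex_commit(struct ex_buffer *reserved)
{
    reserved->committed++;
}
```

    If a swap happens between reserve and commit, the lookup variant
    commits to the other buffer, while the fixed variant still reaches
    the buffer that was reserved.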

    Signed-off-by: Steven Rostedt

    Steven Rostedt
     

13 Aug, 2009

1 commit

  • commit fd51d251e4cdb21f68e9dbc4336514d64a105a79
    Author: Stefan Raspl
    Date: Tue May 19 09:59:08 2009 +0200

    blktrace: remove debugfs entries on bad path

    added in an explicit invocation of debugfs_remove for bt->dir, in
    blk_remove_buf_file_callback we are also getting the directory removed. On
    occasion I am seeing memory corruption that I have bisected down to
    this commit. [The testing involves a (long) series of I/O benchmarks
    with blktrace invoked around the actual runs.] I believe that this
    committed patch is correct, but the problem actually lies in the code
    in blk_remove_buf_file_callback.

    With this patch I am able to consistently get complete runs whereas
    previously I could not get a single run to complete.

    The first part of the patch simply moves the debugfs_remove below the
    relay_close: the relay_close call will remove files under bt->dir, and
    so we should not remove the directory until all the files we created
    have been removed. (Note: This is not sufficient to fix the problem -
    the file system code has ref counts on the directory, so our invocation
    does not cause the directory to actually be removed. Nonetheless, we
    should not rely upon that feature.)

    Signed-off-by: Alan D. Brunelle
    Signed-off-by: Jens Axboe

    Alan D. Brunelle
     

13 Jul, 2009

1 commit

  • * Remove smp_lock.h from files which don't need it (including some headers!)
    * Add smp_lock.h to files which do need it
    * Make smp_lock.h include conditional in hardirq.h
    It's needed only for one kernel_locked() usage which is under CONFIG_PREEMPT

    This will make hardirq.h inclusion cheaper for every PREEMPT=n config
    (which includes allmodconfig/allyesconfig, BTW)

    Signed-off-by: Alexey Dobriyan
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
     

12 Jun, 2009

1 commit

  • * 'for-2.6.31' of git://git.kernel.dk/linux-2.6-block: (153 commits)
    block: add request clone interface (v2)
    floppy: fix hibernation
    ramdisk: remove long-deprecated "ramdisk=" boot-time parameter
    fs/bio.c: add missing __user annotation
    block: prevent possible io_context->refcount overflow
    Add serial number support for virtio_blk, V4a
    block: Add missing bounce_pfn stacking and fix comments
    Revert "block: Fix bounce limit setting in DM"
    cciss: decode unit attention in SCSI error handling code
    cciss: Remove no longer needed sendcmd reject processing code
    cciss: change SCSI error handling routines to work with interrupts enabled.
    cciss: separate error processing and command retrying code in sendcmd_withirq_core()
    cciss: factor out fix target status processing code from sendcmd functions
    cciss: simplify interface of sendcmd() and sendcmd_withirq()
    cciss: factor out core of sendcmd_withirq() for use by SCSI error handling code
    cciss: Use schedule_timeout_uninterruptible in SCSI error handling code
    block: needs to set the residual length of a bidi request
    Revert "block: implement blkdev_readpages"
    block: Fix bounce limit setting in DM
    Removed reference to non-existing file Documentation/PCI/PCI-DMA-mapping.txt
    ...

    Manually fix conflicts with tracing updates in:
    block/blk-sysfs.c
    drivers/ide/ide-atapi.c
    drivers/ide/ide-cd.c
    drivers/ide/ide-floppy.c
    drivers/ide/ide-tape.c
    include/trace/events/block.h
    kernel/trace/blktrace.c

    Linus Torvalds
     

10 Jun, 2009

1 commit

  • TRACE_EVENT is a more generic way to define tracepoints. Doing so adds
    these new capabilities to this tracepoint:

    - zero-copy and per-cpu splice() tracing
    - binary tracing without printf overhead
    - structured logging records exposed under /debug/tracing/events
    - trace events embedded in function tracer output and other plugins
    - user-defined, per tracepoint filter expressions
    ...

    Cons:

    - no dev_t info for the output of plug, unplug_timer and unplug_io events.
    no dev_t info for getrq and sleeprq events if bio == NULL.
    no dev_t info for rq_abort,...,rq_requeue events if rq->rq_disk == NULL.

    This is mainly because we can't get the device from a request queue.
    But this may change in the future.

    - A packet command is converted to a string in TP_assign, not TP_print,
    whereas blktrace does the conversion just before output.

    Since pc requests should be rather rare, this is not a big issue.

    - In blktrace, an event can have 2 different print formats, but a TRACE_EVENT
    has a unique format, which means we have some unused data in a trace entry.

    The overhead is minimized by using __dynamic_array() instead of __array().

    I've benchmarked the ioctl blktrace vs the splice based TRACE_EVENT tracing:

    run   dd                 dd + ioctl blktrace   dd + TRACE_EVENT (splice)
    1     7.36s, 42.7 MB/s   7.50s, 42.0 MB/s      7.41s, 42.5 MB/s
    2     7.43s, 42.3 MB/s   7.48s, 42.1 MB/s      7.43s, 42.4 MB/s
    3     7.38s, 42.6 MB/s   7.45s, 42.2 MB/s      7.41s, 42.5 MB/s

    So the overhead of tracing is very small, and no regression when using
    those trace events vs blktrace.

    And the binary output of TRACE_EVENT is much smaller than blktrace:

    # ls -l -h
    -rw-r--r-- 1 root root 8.8M 06-09 13:24 sda.blktrace.0
    -rw-r--r-- 1 root root 195K 06-09 13:24 sda.blktrace.1
    -rw-r--r-- 1 root root 2.7M 06-09 13:25 trace_splice.out

    Following are some comparisons between TRACE_EVENT and blktrace:

    plug:
    kjournald-480 [000] 303.084981: block_plug: [kjournald]
    kjournald-480 [000] 303.084981: 8,0 P N [kjournald]

    unplug_io:
    kblockd/0-118 [000] 300.052973: block_unplug_io: [kblockd/0] 1
    kblockd/0-118 [000] 300.052974: 8,0 U N [kblockd/0] 1

    remap:
    kjournald-480 [000] 303.085042: block_remap: 8,0 W 102736992 + 8 ...

    Changelog from v2 -> v3:

    - use the newly introduced __dynamic_array().

    Changelog from v1 -> v2:

    - use __string() instead of __array() to minimize the memory required
    to store hex dump of rq->cmd().

    - support large pc requests.

    - add missing blk_fill_rwbs_rq() in block_rq_requeue TRACE_EVENT.

    - some cleanups.

    Signed-off-by: Li Zefan
    LKML-Reference:
    Signed-off-by: Steven Rostedt

    Li Zefan
     

19 May, 2009

1 commit

  • debugfs directory entries for devices are not removed on some
    of the failure paths in do_blk_trace_setup().
    One way to reproduce is to start blktrace on multiple devices
    with insufficient Vmalloc space: Devices will fail with
    a message like this:

    BLKTRACESETUP(2) /dev/sdu failed: 5/Input/output error

    If so, the respective entries in debugfs
    (e.g. /sys/kernel/debug/block/sdu) will remain and subsequent
    attempts to start blktrace on the respective devices will not
    succeed due to existing directories.

    [ Impact: fix /debug/tracing file cleanup corner case ]

    Signed-off-by: Stefan Raspl
    Acked-by: Li Zefan
    Cc: Li Zefan
    Cc: schwidefsky@de.ibm.com
    Cc: heiko.carstens@de.ibm.com
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Stefan Raspl
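
The failure-path cleanup described in the entry above can be sketched in user-space C. This is a minimal illustration, not the code of do_blk_trace_setup(): `struct fake_trace`, `fake_trace_setup()` and the `alloc_fails` switch are hypothetical stand-ins, and a boolean flag plays the role of the per-device debugfs directory.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the per-device trace state. */
struct fake_trace {
    bool dir_exists;  /* stands in for the per-device debugfs directory */
    bool buf_ready;   /* stands in for the trace buffer allocation */
};

/*
 * Set up tracing for one device. If a later step fails, the directory
 * created earlier is removed again so that a retry does not trip over it.
 */
static int fake_trace_setup(struct fake_trace *t, bool alloc_fails)
{
    assert(t != NULL);

    if (t->dir_exists)
        return -EEXIST;       /* a stale directory blocks a new setup */
    t->dir_exists = true;     /* "create" the directory */

    if (alloc_fails)
        goto err;             /* simulated allocation failure */
    t->buf_ready = true;
    return 0;

err:
    t->dir_exists = false;    /* undo the directory on the failure path */
    return -EIO;
}
```

Without the assignment under the `err` label, a second call for the same device would return -EEXIST, which mirrors the symptom reported in the entry.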
     

11 May, 2009

3 commits

  • I got this:
    8,0 1 305.417782332 2037 I R 32 (ffffff9e 10 00 ...) [bash]

    It should be:
    8,0 1 305.417782332 2037 I R 32 (9e 10 00 ...) [bash]

    [ Impact: fix output of pc events ]

    Signed-off-by: Li Zefan
    Cc: Jens Axboe
    Cc: Arnaldo Carvalho de Melo
    Cc: Steven Rostedt
    Cc: Frederic Weisbecker
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Li Zefan
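
The stray ffffff prefix in the entry above is what a sign-extended byte looks like when printed in hex. The sketch below assumes that this is the cause and that int is 32 bits wide; the helper names are hypothetical and not taken from the kernel.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Widen a signed byte via int first: the byte 0x9e becomes 0xffffff9e
 * (assuming a 32-bit int). */
static unsigned int widen_sign_extended(signed char b)
{
    return (unsigned int)(int)b;
}

/* Reinterpret the byte as unsigned first, so only the low 8 bits survive. */
static unsigned int widen_masked(signed char b)
{
    return (unsigned int)(unsigned char)b;
}

/* Hypothetical helper: print a command buffer in the "(9e 10 00)" style. */
static void print_cmd(const signed char *cmd, size_t len)
{
    assert(cmd != NULL);
    putchar('(');
    for (size_t i = 0; i < len; i++)
        printf("%s%02x", i ? " " : "", widen_masked(cmd[i]));
    puts(")");
}
```

Bytes below 0x80 print the same either way, which is why only some command bytes show the problem.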
     
  • struct request has had a few different ways to represent some
    properties of a request. ->hard_* represent block layer's view of the
    request progress (completion cursor) and the ones without the prefix
    are supposed to represent the issue cursor and allowed to be updated
    as necessary by the low level drivers. The thing is that as block
    layer supports partial completion, the two cursors really aren't
    necessary and only cause confusion. In addition, manual management of
    request detail from low level drivers is cumbersome and error-prone at
    the very least.

    Other interesting duplicate fields are rq->[hard_]nr_sectors and
    rq->{hard_cur|current}_nr_sectors against rq->data_len and
    rq->bio->bi_size. This is more convoluted than the hard_ case.

    rq->[hard_]nr_sectors are initialized for requests with bio but
    blk_rq_bytes() uses it only for !pc requests. rq->data_len is
    initialized for all requests but blk_rq_bytes() uses it only for pc
    requests. This causes a good amount of confusion throughout block layer
    and its drivers and determining the request length has been a bit of
    black magic which may or may not work depending on circumstances and
    what the specific LLD is actually doing.

    rq->{hard_cur|current}_nr_sectors represent the number of sectors in
    the contiguous data area at the front. This is mainly used by drivers
    which transfer data by walking request segment-by-segment. This
    value always equals rq->bio->bi_size >> 9. However, data length for
    pc requests may not be a multiple of 512 bytes and using this field
    becomes a bit confusing.

    In general, having multiple fields to represent the same property
    leads only to confusion and subtle bugs. With recent block low level
    driver cleanups, no driver is accessing or manipulating these
    duplicate fields directly. Drop all the duplicates. Now rq->sector
    means the current sector, rq->data_len the current total length and
    rq->bio->bi_size the current segment length. Everything else is
    defined in terms of these three and available only through accessors.

    * blk_recalc_rq_sectors() is collapsed into blk_update_request() and
    now handles pc and fs requests equally other than rq->sector update.
    This means that now pc requests can use partial completion too (no
    in-kernel user yet though).

    * bio_cur_sectors() is replaced with bio_cur_bytes() as block layer
    now uses byte count as the primary data length.

    * blk_rq_pos() is now guaranteed to be always correct. In-block users
    converted.

    * blk_rq_bytes() is now guaranteed to be always valid as is
    blk_rq_sectors(). In-block users converted.

    * blk_rq_sectors() is now guaranteed to equal blk_rq_bytes() >> 9.
    Whichever is more convenient is used.

    * blk_rq_bytes() and blk_rq_cur_bytes() are now inlined and take const
    pointer to request.

    [ Impact: API cleanup, single way to represent one property of a request ]

    Signed-off-by: Tejun Heo
    Cc: Boaz Harrosh
    Signed-off-by: Jens Axboe

    Tejun Heo
     
  • Implement accessors - blk_rq_pos(), blk_rq_sectors() and
    blk_rq_cur_sectors() which return rq->hard_sector, rq->hard_nr_sectors
    and rq->hard_cur_sectors respectively and convert direct references of
    the said fields to the accessors.

    This is in preparation of request data length handling cleanup.

    Geert : suggested adding const to struct request * parameter to accessors
    Sergei : spotted error in patch description

    [ Impact: cleanup ]

    Signed-off-by: Tejun Heo
    Acked-by: Geert Uytterhoeven
    Acked-by: Stephen Rothwell
    Tested-by: Grant Likely
    Acked-by: Grant Likely
    Acked-by: Sergei Shtylyov
    Cc: Bartlomiej Zolnierkiewicz
    Cc: Borislav Petkov
    Cc: James Bottomley
    Signed-off-by: Jens Axboe

    Tejun Heo
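
The accessor relationships described in the two entries above can be sketched with a mock structure. `struct mock_request` and the `mock_rq_*` functions are hypothetical, simplified stand-ins for struct request and the blk_rq_*() accessors; `cur_bytes` stands in for rq->bio->bi_size. Only the relations stated in the commit text are modelled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for the three fields everything else derives from. */
struct mock_request {
    uint64_t sector;         /* current sector */
    unsigned int data_len;   /* current total length in bytes */
    unsigned int cur_bytes;  /* current segment length in bytes */
};

/* Current position, in 512-byte sectors. */
static inline uint64_t mock_rq_pos(const struct mock_request *rq)
{
    assert(rq != NULL);
    return rq->sector;
}

/* Bytes left in the whole request. */
static inline unsigned int mock_rq_bytes(const struct mock_request *rq)
{
    assert(rq != NULL);
    return rq->data_len;
}

/* Bytes left in the current segment. */
static inline unsigned int mock_rq_cur_bytes(const struct mock_request *rq)
{
    assert(rq != NULL);
    return rq->cur_bytes;
}

/* Sectors left: defined as the byte count shifted right by 9. */
static inline unsigned int mock_rq_sectors(const struct mock_request *rq)
{
    return mock_rq_bytes(rq) >> 9;
}
```

Taking a const pointer, as the entries mention, lets callers that only hold a read-only view of a request use the accessors.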
     

06 May, 2009

2 commits

  • Remove redundant from-sector parameter: it's /always/ the bio's sector
    passed in.

    [ Impact: cleanup ]

    Signed-off-by: Alan D. Brunelle
    Reviewed-by: Li Zefan
    Reviewed-by: KOSAKI Motohiro
    Cc: Jens Axboe
    Cc: Arnaldo Carvalho de Melo
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Alan D. Brunelle
     
  • This attempts to clarify names utilized during block I/O remap
    operations (partition, volume manager). It correctly matches up the
    /from/ information for both device & sector. This takes in the concept
    from Kosaki Motohiro and extends it to include better naming for the
    "device_from" field.

    [ Impact: cleanup ]

    Signed-off-by: Alan D. Brunelle
    Reviewed-by: Li Zefan
    Reviewed-by: KOSAKI Motohiro
    Cc: Jens Axboe
    Cc: Arnaldo Carvalho de Melo
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Alan D. Brunelle
     

16 Apr, 2009

1 commit

  • When current tracer is set to blk tracer, TRACE_ITER_CONTEXT_INFO is
    unset, but actually context-info is printed:

    pdflush-431 [000] 821.181576: 8,0 P N [pdflush]

    And then if we enable TRACE_ITER_CONTEXT_INFO:

    # echo context-info > trace_options

    We'll see context-info printed twice. What's worse, when we use blk
    tracer and trace events at the same time, we'll see no context-info
    for trace events at all:

    jbd2_commit_logging: dev dm-0:8 transaction 333227
    jbd2_end_commit: dev dm-0:8 transaction 333227 head 332814
    rm-25433 [001] 9578.307485: 8,18 m N cfq25433 slice expired t=0
    rm-25433 [001] 9578.307486: 8,18 m N cfq25433 put_queue

    This patch adds blk_tracer->set_flags(), and context-info flag is unset
    only when we set the output to classic mode.

    Note that after this patch, context-info must be unset explicitly to
    get binary output that can be parsed by blkparse:

    # echo nocontext-info > trace_options
    # echo bin > trace_options
    # echo blk > current_tracer
    # cat trace_pipe | blkparse -i -

    Reported-by: Theodore Ts'o
    Signed-off-by: Li Zefan
    Cc: Jens Axboe
    Cc: Arnaldo Carvalho de Melo
    Cc: Frederic Weisbecker
    Cc: Steven Rostedt
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Li Zefan