06 Mar, 2014

1 commit

  • trace_block_rq_complete does not take into account that a request can
    be partially completed, so we can get the following incorrect output
    of blkparser:

    C R 232 + 240 [0]
    C R 240 + 232 [0]
    C R 248 + 224 [0]
    C R 256 + 216 [0]

    but should be:

    C R 232 + 8 [0]
    C R 240 + 8 [0]
    C R 248 + 8 [0]
    C R 256 + 8 [0]

    Also, the whole output summary statistics of completed requests and
    final throughput will be incorrect.

    This patch takes into account real completion size of the request and
    fixes wrong completion accounting.

    Signed-off-by: Roman Pen
    CC: Steven Rostedt
    CC: Frederic Weisbecker
    CC: Ingo Molnar
    CC: linux-kernel@vger.kernel.org
    Cc: stable@kernel.org
    Signed-off-by: Jens Axboe

    Roman Pen
     

24 Nov, 2013

1 commit

  • Immutable biovecs are going to require an explicit iterator. To
    implement immutable bvecs, a later patch is going to add a bi_bvec_done
    member to this struct; for now, this patch effectively just renames
    things.

    Signed-off-by: Kent Overstreet
    Cc: Jens Axboe
    Cc: Geert Uytterhoeven
    Cc: Benjamin Herrenschmidt
    Cc: Paul Mackerras
    Cc: "Ed L. Cashin"
    Cc: Nick Piggin
    Cc: Lars Ellenberg
    Cc: Jiri Kosina
    Cc: Matthew Wilcox
    Cc: Geoff Levand
    Cc: Yehuda Sadeh
    Cc: Sage Weil
    Cc: Alex Elder
    Cc: ceph-devel@vger.kernel.org
    Cc: Joshua Morris
    Cc: Philip Kelleher
    Cc: Rusty Russell
    Cc: "Michael S. Tsirkin"
    Cc: Konrad Rzeszutek Wilk
    Cc: Jeremy Fitzhardinge
    Cc: Neil Brown
    Cc: Alasdair Kergon
    Cc: Mike Snitzer
    Cc: dm-devel@redhat.com
    Cc: Martin Schwidefsky
    Cc: Heiko Carstens
    Cc: linux390@de.ibm.com
    Cc: Boaz Harrosh
    Cc: Benny Halevy
    Cc: "James E.J. Bottomley"
    Cc: Greg Kroah-Hartman
    Cc: "Nicholas A. Bellinger"
    Cc: Alexander Viro
    Cc: Chris Mason
    Cc: "Theodore Ts'o"
    Cc: Andreas Dilger
    Cc: Jaegeuk Kim
    Cc: Steven Whitehouse
    Cc: Dave Kleikamp
    Cc: Joern Engel
    Cc: Prasad Joshi
    Cc: Trond Myklebust
    Cc: KONISHI Ryusuke
    Cc: Mark Fasheh
    Cc: Joel Becker
    Cc: Ben Myers
    Cc: xfs@oss.sgi.com
    Cc: Steven Rostedt
    Cc: Frederic Weisbecker
    Cc: Ingo Molnar
    Cc: Len Brown
    Cc: Pavel Machek
    Cc: "Rafael J. Wysocki"
    Cc: Herton Ronaldo Krzesinski
    Cc: Ben Hutchings
    Cc: Andrew Morton
    Cc: Guo Chao
    Cc: Tejun Heo
    Cc: Asai Thambi S P
    Cc: Selvan Mani
    Cc: Sam Bradshaw
    Cc: Wei Yongjun
    Cc: "Roger Pau Monné"
    Cc: Jan Beulich
    Cc: Stefano Stabellini
    Cc: Ian Campbell
    Cc: Sebastian Ott
    Cc: Christian Borntraeger
    Cc: Minchan Kim
    Cc: Jiang Liu
    Cc: Nitin Gupta
    Cc: Jerome Marchand
    Cc: Joe Perches
    Cc: Peng Tao
    Cc: Andy Adamson
    Cc: fanchaoting
    Cc: Jie Liu
    Cc: Sunil Mushran
    Cc: "Martin K. Petersen"
    Cc: Namjae Jeon
    Cc: Pankaj Kumar
    Cc: Dan Magenheimer
    Cc: Mel Gorman

    Kent Overstreet
     

22 Sep, 2013

1 commit

  • Adding the number of bios in a remapped request to 'block_rq_remap'
    tracepoint.

    Request remapper clones bios in a request to track the completion
    status of each bio. So the number of bios can be useful information
    for investigation.

    Related discussions:
    http://www.redhat.com/archives/dm-devel/2013-August/msg00084.html
    http://www.redhat.com/archives/dm-devel/2013-September/msg00024.html

    Signed-off-by: Jun'ichi Nomura
    Acked-by: Mike Snitzer
    Cc: Jens Axboe
    Signed-off-by: Jens Axboe

    Jun'ichi Nomura
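
As a rough illustration of the value this commit adds to the tracepoint, counting the bios in a request amounts to walking its bio chain. The types and the `count_bios()` helper below are hypothetical, simplified stand-ins, not the kernel's definitions.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-ins; the kernel structures carry far more state. */
struct sketch_bio {
    struct sketch_bio *next;  /* next bio in the same request */
};

struct sketch_rq {
    struct sketch_bio *bio;   /* head of the bio chain */
};

/* Walk the chain and count how many bios the request carries. */
static unsigned int count_bios(const struct sketch_rq *rq)
{
    unsigned int n = 0;

    for (const struct sketch_bio *b = rq->bio; b != NULL; b = b->next)
        n++;
    return n;
}
```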
     

09 May, 2013

1 commit

  • Pull block core updates from Jens Axboe:

    - Major bit is Kent's prep work for immutable bio vecs.

    - Stable candidate fix for a scheduling-while-atomic in the queue
    bypass operation.

    - Fix for the hang on exceeded rq->datalen 32-bit unsigned when merging
    discard bios.

    - Tejun's changes to convert the writeback thread pool to the generic
    workqueue mechanism.

    - Runtime PM framework, SCSI patches exist on top of these in James'
    tree.

    - A few random fixes.

    * 'for-3.10/core' of git://git.kernel.dk/linux-block: (40 commits)
    relay: move remove_buf_file inside relay_close_buf
    partitions/efi.c: replace useless kzalloc's by kmalloc's
    fs/block_dev.c: fix iov_shorten() criteria in blkdev_aio_read()
    block: fix max discard sectors limit
    blkcg: fix "scheduling while atomic" in blk_queue_bypass_start
    Documentation: cfq-iosched: update documentation help for cfq tunables
    writeback: expose the bdi_wq workqueue
    writeback: replace custom worker pool implementation with unbound workqueue
    writeback: remove unused bdi_pending_list
    aoe: Fix unitialized var usage
    bio-integrity: Add explicit field for owner of bip_buf
    block: Add an explicit bio flag for bios that own their bvec
    block: Add bio_alloc_pages()
    block: Convert some code to bio_for_each_segment_all()
    block: Add bio_for_each_segment_all()
    bounce: Refactor __blk_queue_bounce to not use bi_io_vec
    raid1: use bio_copy_data()
    pktcdvd: Use bio_reset() in disabled code to kill bi_idx usage
    pktcdvd: use bio_copy_data()
    block: Add bio_copy_data()
    ...

    Linus Torvalds
     

19 Apr, 2013

1 commit

  • This reverts commit 3a366e614d0837d9fc23f78cdb1a1186ebc3387f.

    Wanlong Gao reports that it causes a kernel panic on his machine several
    minutes after boot. Reverting it removes the panic.

    Jens says:
    "It's not quite clear why that is yet, so I think we should just revert
    the commit for 3.9 final (which I'm assuming is pretty close).

    The wifi is crap at the LSF hotel, so sending this email instead of
    queueing up a revert and pull request."

    Reported-by: Wanlong Gao
    Requested-by: Jens Axboe
    Cc: Tejun Heo
    Cc: Steven Rostedt
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     

24 Mar, 2013

1 commit

  • Bunch of places in the code weren't using it where they could be -
    this'll reduce the size of the patch that puts bi_sector/bi_size/bi_idx
    into a struct bvec_iter.

    Signed-off-by: Kent Overstreet
    CC: Jens Axboe
    CC: "Ed L. Cashin"
    CC: Nick Piggin
    CC: Jiri Kosina
    CC: Jim Paris
    CC: Geoff Levand
    CC: Alasdair Kergon
    CC: dm-devel@redhat.com
    CC: Neil Brown
    CC: Steven Rostedt
    Acked-by: Ed Cashin

    Kent Overstreet
     

14 Jan, 2013

3 commits

  • The former is triggered from touch_buffer() and the latter from
    mark_buffer_dirty().

    This is part of tracepoint additions to improve visibility into
    dirtying / writeback operations for io tracer and userland.

    v2: Transformed writeback_dirty_buffer to block_dirty_buffer and made
    it share TP definition with block_touch_buffer.

    Signed-off-by: Tejun Heo
    Cc: Fengguang Wu
    Signed-off-by: Jens Axboe

    Tejun Heo
     
  • bio_{front|back}_merge tracepoints report a bio merging into an
    existing request but didn't specify which request the bio is being
    merged into. Add @req to it. This makes it impossible to share the
    event template with block_bio_queue - split it out.

    @req isn't used or exported to userland at this point and there is no
    userland visible behavior change. Later changes will make use of the
    extra parameter.

    Signed-off-by: Tejun Heo
    Signed-off-by: Jens Axboe

    Tejun Heo
     
  • bio completion didn't kick block_bio_complete TP. Only dm was
    explicitly triggering the TP on IO completion. This makes
    block_bio_complete TP useless for tracers which want to know about
    bios, and all other bio based drivers skip generating blktrace
    completion events.

    This patch makes all bio completions via bio_endio() generate
    block_bio_complete TP.

    * Explicit trace_block_bio_complete() invocation removed from dm and
    the trace point is unexported.

    * @rq dropped from trace_block_bio_complete(). bios may fly around
    w/o queue associated. Verifying and accessing the associated queue
    belongs to TP probes.

    * blktrace now gets both request and bio completions. Make it ignore
    bio completions if request completion path is happening.

    This makes all bio based drivers generate blktrace completion events
    properly and makes the block_bio_complete TP actually useful.

    v2: With this change, block_bio_complete TP could be invoked on sg
    commands which have bio's with %NULL bi_bdev. Update TP
    assignment code to check whether bio->bi_bdev is %NULL before
    dereferencing.

    Signed-off-by: Tejun Heo
    Original-patch-by: Namhyung Kim
    Cc: Tejun Heo
    Cc: Steven Rostedt
    Cc: Alasdair Kergon
    Cc: dm-devel@redhat.com
    Cc: Neil Brown
    Signed-off-by: Jens Axboe

    Tejun Heo
     

11 Aug, 2011

1 commit

  • Add FLUSH/FUA support to blktrace. As FLUSH precedes WRITE and/or
    FUA follows WRITE, use the same 'F' flag for both cases and
    distinguish them by their (relative) position. The end results
    look like (other flags might be shown also):

    - WRITE: W
    - WRITE_FLUSH: FW
    - WRITE_FUA: WF
    - WRITE_FLUSH_FUA: FWF

    Note that we reuse TC_BARRIER due to lack of bit space of act_mask
    so that the older versions of blktrace tools will report flush
    requests as barriers from now on.

    Cc: Steven Rostedt
    Cc: Frederic Weisbecker
    Cc: Ingo Molnar
    Signed-off-by: Namhyung Kim
    Reviewed-by: Jeff Moyer
    Signed-off-by: Jens Axboe

    Namhyung Kim
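
The positional 'F' convention from this commit can be sketched as below. The flag bits and the `fill_rwbs()` helper are hypothetical and exist only for illustration; they are not the kernel's REQ_* values or its actual blk_fill_rwbs() implementation.

```c
#include <stddef.h>

/* Hypothetical flag bits for illustration; not the kernel's REQ_* values. */
enum {
    SK_WRITE = 1u << 0,
    SK_FLUSH = 1u << 1,
    SK_FUA   = 1u << 2,
};

/*
 * Build the flag string: a flush is shown as 'F' before 'W',
 * FUA as 'F' after 'W'. buf must hold at least 4 bytes.
 */
static void fill_rwbs(char *buf, unsigned int flags)
{
    size_t i = 0;

    if (flags & SK_FLUSH)
        buf[i++] = 'F';
    if (flags & SK_WRITE)
        buf[i++] = 'W';
    if (flags & SK_FUA)
        buf[i++] = 'F';
    buf[i] = '\0';
}
```

The same letter is reused because only its position relative to 'W' distinguishes the two cases, which matches the four combinations listed in the commit message.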
     

16 Apr, 2011

1 commit

  • It's a pretty close match to what we had before - the timer triggering
    would mean that nobody unplugged the plug in due time, in the new
    scheme this matches very closely what the schedule() unplug now is.
    It's essentially the difference between an explicit unplug (IO unplug)
    or an implicit unplug (timer unplug, we scheduled with pending IO
    queued).

    Signed-off-by: Jens Axboe

    Jens Axboe
     

12 Apr, 2011

2 commits


03 Mar, 2011

1 commit

  • If we enable trace events to trace block actions, we use
    blk_fill_rwbs_rq to analyze the corresponding actions
    in the request's cmd_flags, but we only take the lowest 2 bits
    from it, so most of the other flags (e.g., REQ_SYNC) are missing.
    For example, with a sync write we get:
    write_test-2409 [001] 160.013869: block_rq_insert: 3,64 W 0 () 258135 + 8 [write_test]

    Since now we have integrated the flags of both bio and request,
    it is safe to pass rq->cmd_flags directly to blk_fill_rwbs and
    blk_fill_rwbs_rq isn't needed any more.

    With this patch, after a sync write we get:
    write_test-2417 [000] 226.603878: block_rq_insert: 3,64 WS 0 () 258135 + 8 [write_test]

    Signed-off-by: Tao Ma
    Acked-by: Jeff Moyer
    Signed-off-by: Jens Axboe

    Tao Ma
     

07 Jan, 2011

1 commit

  • The "error" field in block_bio_complete is not assigned, leaving the memory area
    uninitialized (keeping garbage data). Pass an additional tracepoint argument to
    this event to initialize this field.

    Signed-off-by: Jeff Moyer
    Signed-off-by: Mathieu Desnoyers
    CC: Steven Rostedt
    CC: Frederic Weisbecker
    CC: Ingo Molnar
    CC: Thomas Gleixner
    CC: Li Zefan
    CC: Alan.Brunelle@hp.com
    Signed-off-by: Jens Axboe

    Jeff Moyer
     

16 Nov, 2010

1 commit


08 Aug, 2010

1 commit


09 Mar, 2010

1 commit


26 Nov, 2009

1 commit

  • use DECLARE_EVENT_CLASS to remove duplicate code:

       text    data     bss     dec     hex filename
      53570    3284     184   57038    dece block/blk-core.o.old
      43702    3284     144   47130    b81a block/blk-core.o

    12 events are converted:

    block_rq: block_rq_insert, block_rq_issue
    block_rq_with_error: block_rq_{abort, requeue, complete}
    block_bio: block_bio_{backmerge, frontmerge, queue}
    block_get_rq: block_getrq, block_sleeprq
    block_unplug: block_unplug_timer, block_unplug_io

    No change in functionality.

    Signed-off-by: Li Zefan
    Cc: Jens Axboe
    Cc: Steven Rostedt
    Cc: Frederic Weisbecker
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Li Zefan
     

02 Oct, 2009

1 commit

  • Since 2.6.31 now has request-based device-mapper, it's useful to have
    a tracepoint for request-remapping as well as bio-remapping.
    This patch adds a tracepoint for request-remapping, trace_block_rq_remap().

    Signed-off-by: Kiyoshi Ueda
    Signed-off-by: Jun'ichi Nomura
    Cc: Alasdair G Kergon
    Cc: Li Zefan
    Signed-off-by: Jens Axboe

    Jun'ichi Nomura
     

13 Sep, 2009

1 commit

  • Booting 2.6.31 and executing
    echo 1 >/sys/kernel/debug/tracing/events/enable
    leads to
    BUG: unable to handle kernel NULL pointer dereference at (null)
    IP: [] ftrace_raw_event_block_bio_bounce+0x4b/0xb9

    Apparently,
    bio = bio_map_user(q, NULL, uaddr, len, reading, gfp_mask);
    is called in block/blk-map.c:58 where bio->bi_bdev is set to NULL and
    still is NULL when an attempt is made to evaluate bio->bi_bdev->bd_dev
    in include/trace/events/block.h:189.

    The tracepoint should ensure bio->bi_bdev is not dereferenced, if NULL.

    Signed-off-by: Carsten Emde
    LKML-Reference:
    Signed-off-by: Steven Rostedt

    Carsten Emde
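
The guard this commit calls for can be sketched with mock types. `struct sketch_bdev`, `struct sketch_bio` and `bio_dev_or_zero()` are hypothetical names, and reporting 0 when no device is attached is an assumption made for this example.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the kernel structures. */
struct sketch_bdev {
    unsigned int bd_dev;
};

struct sketch_bio {
    struct sketch_bdev *bi_bdev;  /* may be NULL, e.g. for user-mapped bios */
};

/* Only dereference bi_bdev when it is set; otherwise report device 0. */
static unsigned int bio_dev_or_zero(const struct sketch_bio *bio)
{
    return bio->bi_bdev ? bio->bi_bdev->bd_dev : 0;
}
```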
     

13 Jul, 2009

1 commit

  • If TRACE_INCLUDE_FILE is defined,
    will be included and compiled, otherwise it will be

    So TRACE_SYSTEM should be defined outside of #if protection,
    just like TRACE_INCLUDE_FILE.

    Imagine this scenario:

    #include
    -> TRACE_SYSTEM == foo
    ...
    #include
    -> TRACE_SYSTEM == bar
    ...
    #define CREATE_TRACE_POINTS
    #include
    -> TRACE_SYSTEM == bar !!!

    and then bar.h will be included and compiled.

    Signed-off-by: Li Zefan
    Cc: Steven Rostedt
    Cc: Frederic Weisbecker
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Li Zefan
     

12 Jun, 2009

1 commit

  • * 'for-2.6.31' of git://git.kernel.dk/linux-2.6-block: (153 commits)
    block: add request clone interface (v2)
    floppy: fix hibernation
    ramdisk: remove long-deprecated "ramdisk=" boot-time parameter
    fs/bio.c: add missing __user annotation
    block: prevent possible io_context->refcount overflow
    Add serial number support for virtio_blk, V4a
    block: Add missing bounce_pfn stacking and fix comments
    Revert "block: Fix bounce limit setting in DM"
    cciss: decode unit attention in SCSI error handling code
    cciss: Remove no longer needed sendcmd reject processing code
    cciss: change SCSI error handling routines to work with interrupts enabled.
    cciss: separate error processing and command retrying code in sendcmd_withirq_core()
    cciss: factor out fix target status processing code from sendcmd functions
    cciss: simplify interface of sendcmd() and sendcmd_withirq()
    cciss: factor out core of sendcmd_withirq() for use by SCSI error handling code
    cciss: Use schedule_timeout_uninterruptible in SCSI error handling code
    block: needs to set the residual length of a bidi request
    Revert "block: implement blkdev_readpages"
    block: Fix bounce limit setting in DM
    Removed reference to non-existing file Documentation/PCI/PCI-DMA-mapping.txt
    ...

    Manually fix conflicts with tracing updates in:
    block/blk-sysfs.c
    drivers/ide/ide-atapi.c
    drivers/ide/ide-cd.c
    drivers/ide/ide-floppy.c
    drivers/ide/ide-tape.c
    include/trace/events/block.h
    kernel/trace/blktrace.c

    Linus Torvalds
     

10 Jun, 2009

2 commits

  • The sector field is either u64 or unsigned long depending on
    the arch. This patch casts the sector to unsigned long long to
    prevent the printf warnings.

    [ Impact: remove compile warnings ]

    Signed-off-by: Steven Rostedt

    Steven Rostedt
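
The cast described in this commit can be illustrated in plain C. This is a sketch only: `sketch_sector_t` and `format_sector()` are hypothetical names, and the typedef assumes a build where the sector type is `unsigned long`.

```c
#include <stdio.h>

/* Assumption: the sector type is unsigned long here; on other arches it may be u64. */
typedef unsigned long sketch_sector_t;

/*
 * Always cast to unsigned long long and print with %llu, so the format
 * string is correct whichever underlying type the arch picked.
 */
static int format_sector(char *buf, size_t len, sketch_sector_t sector,
                         unsigned int nr_sectors)
{
    return snprintf(buf, len, "%llu + %u",
                    (unsigned long long)sector, nr_sectors);
}
```

Using one fixed-width cast keeps a single format string valid on every architecture instead of needing a per-arch conversion specifier.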
     
  • TRACE_EVENT is a more generic way to define tracepoints. Doing so adds
    these new capabilities to this tracepoint:

    - zero-copy and per-cpu splice() tracing
    - binary tracing without printf overhead
    - structured logging records exposed under /debug/tracing/events
    - trace events embedded in function tracer output and other plugins
    - user-defined, per tracepoint filter expressions
    ...

    Cons:

    - no dev_t info for the output of plug, unplug_timer and unplug_io events.
    no dev_t info for getrq and sleeprq events if bio == NULL.
    no dev_t info for rq_abort,...,rq_requeue events if rq->rq_disk == NULL.

    This is mainly because we can't get the device from a request queue.
    But this may change in the future.

    - A packet command is converted to a string in TP_assign, not TP_print,
    whereas blktrace does the conversion just before output.

    Since pc requests should be rather rare, this is not a big issue.

    - In blktrace, an event can have 2 different print formats, but a TRACE_EVENT
    has a unique format, which means we have some unused data in a trace entry.

    The overhead is minimized by using __dynamic_array() instead of __array().

    I've benchmarked the ioctl blktrace vs the splice based TRACE_EVENT tracing:

         dd                  dd + ioctl blktrace   dd + TRACE_EVENT (splice)
    1    7.36s, 42.7 MB/s    7.50s, 42.0 MB/s      7.41s, 42.5 MB/s
    2    7.43s, 42.3 MB/s    7.48s, 42.1 MB/s      7.43s, 42.4 MB/s
    3    7.38s, 42.6 MB/s    7.45s, 42.2 MB/s      7.41s, 42.5 MB/s

    So the overhead of tracing is very small, and no regression when using
    those trace events vs blktrace.

    And the binary output of TRACE_EVENT is much smaller than blktrace:

    # ls -l -h
    -rw-r--r-- 1 root root 8.8M 06-09 13:24 sda.blktrace.0
    -rw-r--r-- 1 root root 195K 06-09 13:24 sda.blktrace.1
    -rw-r--r-- 1 root root 2.7M 06-09 13:25 trace_splice.out

    Following are some comparisons between TRACE_EVENT and blktrace:

    plug:
    kjournald-480 [000] 303.084981: block_plug: [kjournald]
    kjournald-480 [000] 303.084981: 8,0 P N [kjournald]

    unplug_io:
    kblockd/0-118 [000] 300.052973: block_unplug_io: [kblockd/0] 1
    kblockd/0-118 [000] 300.052974: 8,0 U N [kblockd/0] 1

    remap:
    kjournald-480 [000] 303.085042: block_remap: 8,0 W 102736992 + 8

    Changelog from v2 -> v3:

    - use the newly introduced __dynamic_array().

    Changelog from v1 -> v2:

    - use __string() instead of __array() to minimize the memory required
    to store hex dump of rq->cmd().

    - support large pc requests.

    - add missing blk_fill_rwbs_rq() in block_rq_requeue TRACE_EVENT.

    - some cleanups.

    Signed-off-by: Li Zefan
    LKML-Reference:
    Signed-off-by: Steven Rostedt

    Li Zefan