28 Oct, 2016

1 commit

  • Now that the common flags no longer need to overflow outside the range
    of a 32-bit type, we can encode them the same way for both the bio and
    request fields. This additionally allows us to place the operation
    first (and make some room for more ops while we're at it) and to
    stop having to shift the operation values around.

    In addition this allows passing around only one value in the block layer
    instead of two (and eventually also in the file systems, but we can do
    that later), and thus cleans up a lot of code.

    Last but not least, this allows decreasing the size of the cmd_flags
    field in struct request to 32 bits. Various functions passing this
    value could also be updated, but I'd like to avoid the churn for now.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Christoph Hellwig
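
    For orientation, here is a rough sketch of the layout this commit
    describes, assuming an 8-bit op field; the names follow block-layer
    style, but the exact widths and flag positions are illustrative, not
    quoted from the kernel:

        /* Operation in the low bits of one 32-bit value, flags above it;
         * bio->bi_opf and request->cmd_flags then share the same encoding.
         * Widths and flag positions are illustrative. */
        #define REQ_OP_BITS   8
        #define REQ_OP_MASK   ((1U << REQ_OP_BITS) - 1)

        #define bio_op(bio)   ((bio)->bi_opf & REQ_OP_MASK)
        #define req_op(req)   ((req)->cmd_flags & REQ_OP_MASK)

        /* Flags start immediately above the op field. */
        #define REQ_SYNC      (1U << (REQ_OP_BITS + 0))

    With a layout like this, reading the op back is a simple mask and no
    shifting is needed.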
     

16 Aug, 2016

1 commit

  • Commit 288dab8a35a0 ("block: add a separate operation type for secure
    erase") split REQ_OP_SECURE_ERASE from REQ_OP_DISCARD without considering
    all the places REQ_OP_DISCARD was being used to mean either. Fix those.

    Signed-off-by: Adrian Hunter
    Fixes: 288dab8a35a0 ("block: add a separate operation type for secure erase")
    Signed-off-by: Jens Axboe

    Adrian Hunter
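
    A hedged sketch of the shape such fixes take; queue_discard_ok() is a
    hypothetical helper standing in for the real call sites spread across
    the block and driver code:

        /* Before: only the discard op was checked at call sites like this. */
        if (req_op(rq) == REQ_OP_DISCARD)
                return queue_discard_ok(q);   /* hypothetical helper */

        /* After: secure erase must be accepted wherever discard was. */
        if (req_op(rq) == REQ_OP_DISCARD || req_op(rq) == REQ_OP_SECURE_ERASE)
                return queue_discard_ok(q);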
     

08 Aug, 2016

1 commit

  • Since commit 63a4cc24867d, bio->bi_rw contains flags in the lower
    portion and the op code in the higher portion. This means that
    old code that relies on manually setting bi_rw is most likely
    going to be broken. Instead of letting that brokenness linger,
    rename the member to force old and out-of-tree code to break
    at compile time instead of at runtime.

    No intended functional changes in this commit.

    Signed-off-by: Jens Axboe

    Jens Axboe
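
    A minimal before/after sketch of what the rename means for code that
    pokes at the member directly (the flag test and counter are arbitrary
    examples, not taken from the patch):

        /* Old member name, now removed -- code like this stops compiling: */
        if (bio->bi_rw & REQ_SYNC)
                sync_count++;

        /* New name, "op" plus "flags": */
        if (bio->bi_opf & REQ_SYNC)
                sync_count++;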
     

27 Jul, 2016

1 commit

  • Pull block driver updates from Jens Axboe:
    "This branch also contains core changes. I've come to the conclusion
    that from 4.9 and forward, I'll be doing just a single branch. We
    often have dependencies between core and drivers, and it's hard to
    always split them up appropriately without pulling core into drivers
    when that happens.

    That said, this contains:

    - separate secure erase type for the core block layer, from
    Christoph.

    - set of discard fixes, from Christoph.

    - bio shrinking fixes from Christoph, as a follow-up to the
    op/flags change in the core branch.

    - map and append request fixes from Christoph.

    - NVMeF (NVMe over Fabrics) code from Christoph. This is pretty
    exciting!

    - nvme-loop fixes from Arnd.

    - removal of ->driverfs_dev from Dan, after providing a
    device_add_disk() helper.

    - bcache fixes from Bhaktipriya and Yijing.

    - cdrom subchannel read fix from Vchannaiah.

    - set of lightnvm updates from Wenwei, Matias, Johannes, and Javier.

    - set of drbd updates and fixes from Fabian, Lars, and Philipp.

    - mg_disk error path fix from Bart.

    - user notification for failed device add for loop, from Minfei.

    - NVMe in general:
    + NVMe delay quirk from Guilherme.
    + SR-IOV support and command retry limits from Keith.
    + fix for memory-less NUMA node from Masayoshi.
    + use UINT_MAX for discard sectors, from Minfei.
    + cancel IO fixes from Ming.
    + don't allocate unused major, from Neil.
    + error code fixup from Dan.
    + use constants for PSDT/FUSE from James.
    + variable init fix from Jay.
    + fabrics fixes from Ming, Sagi, and Wei.
    + various fixes"

    * 'for-4.8/drivers' of git://git.kernel.dk/linux-block: (115 commits)
    nvme/pci: Provide SR-IOV support
    nvme: initialize variable before logical OR'ing it
    block: unexport various bio mapping helpers
    scsi/osd: open code blk_make_request
    target: stop using blk_make_request
    block: simplify and export blk_rq_append_bio
    block: ensure bios return from blk_get_request are properly initialized
    virtio_blk: use blk_rq_map_kern
    memstick: don't allow REQ_TYPE_BLOCK_PC requests
    block: shrink bio size again
    block: simplify and cleanup bvec pool handling
    block: get rid of bio_rw and READA
    block: don't ignore -EOPNOTSUPP blkdev_issue_write_same
    block: introduce BLKDEV_DISCARD_ZERO to fix zeroout
    NVMe: don't allocate unused nvme_major
    nvme: avoid crashes when node 0 is memoryless node.
    nvme: Limit command retries
    loop: Make user notify for adding loop device failed
    nvme-loop: fix nvme-loop Kconfig dependencies
    nvmet: fix return value check in nvmet_subsys_alloc()
    ...

    Linus Torvalds
     

18 Jun, 2016

1 commit

  • The blktrace code stores the current time in a 32-bit word in its
    user interface. This is a bad idea because 32-bit seconds overflow
    at some point.

    We probably have until 2106 before this one overflows, as it seems
    to use an 'unsigned' variable, but we should confirm that user
    space treats it the same way.

    Aside from this, we want to stop using 'struct timespec' here,
    so I'm adding a comment about the overflow and changing the code
    to use timespec64 instead, to make the loss of range more obvious.

    Signed-off-by: Arnd Bergmann
    Signed-off-by: Jens Axboe

    Arnd Bergmann
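
    A hedged sketch of the conversion, assuming the blktrace helper packs
    seconds and nanoseconds into two 32-bit words; the surrounding function
    and exact field names are assumptions, not a quote of the blktrace code:

        struct timespec64 now;
        u32 words[2];

        ktime_get_real_ts64(&now);
        words[0] = (u32)now.tv_sec;   /* deliberately truncated: the user ABI
                                         is 32-bit seconds, which wraps in 2106 */
        words[1] = now.tv_nsec;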
     

09 Jun, 2016

1 commit


08 Jun, 2016

3 commits

  • To avoid confusion between REQ_OP_FLUSH, which is handled by
    request_fn drivers, and upper layers requesting that the block layer
    perform a flush sequence along with possibly a WRITE, this patch
    renames REQ_FLUSH to REQ_PREFLUSH.

    Signed-off-by: Mike Christie
    Reviewed-by: Christoph Hellwig
    Reviewed-by: Hannes Reinecke
    Signed-off-by: Jens Axboe

    Mike Christie
     
  • This adds a REQ_OP_FLUSH operation that is sent to request_fn
    based drivers by the block layer's flush code, instead of
    sending requests with the request->cmd_flags REQ_FLUSH bit set.

    Signed-off-by: Mike Christie
    Reviewed-by: Christoph Hellwig
    Reviewed-by: Hannes Reinecke
    Signed-off-by: Jens Axboe

    Mike Christie
     
  • Have blktrace use the req/bio op accessor to get the REQ_OP.

    Signed-off-by: Mike Christie
    Reviewed-by: Christoph Hellwig
    Reviewed-by: Hannes Reinecke
    Signed-off-by: Jens Axboe

    Mike Christie
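
    Taken together, the three commits above separate what submitters request
    from what request_fn drivers see. A rough sketch of the resulting
    convention, using 4.8-era flag names; issue_cache_flush() is a
    hypothetical driver helper:

        /* Submitter side: ask the block layer for a flush sequence around
         * a write (flag spellings follow the 4.8-era names). */
        bio->bi_rw |= REQ_PREFLUSH | REQ_FUA;

        /* Driver side: the flush machinery hands request_fn drivers a
         * distinct op, read through the accessor rather than by testing
         * cmd_flags bits directly. */
        if (req_op(rq) == REQ_OP_FLUSH)
                issue_cache_flush(rq);   /* hypothetical driver helper */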
     

10 May, 2016

2 commits


23 Mar, 2016

1 commit

  • Use the more common logging method with the eventual goal of removing
    pr_warning altogether.

    Miscellanea:

    - Realign arguments
    - Coalesce formats
    - Add missing space between a few coalesced formats

    Signed-off-by: Joe Perches
    Acked-by: Rafael J. Wysocki [kernel/power/suspend.c]
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Joe Perches
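
    The conversion itself is mechanical; an illustrative example (the
    message text is made up):

        /* Before: */
        pr_warning("blktrace: failed to allocate trace buffer\n");

        /* After, using the more common form: */
        pr_warn("blktrace: failed to allocate trace buffer\n");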
     

04 Jan, 2016

1 commit


07 Nov, 2015

1 commit

  • Pull tracing updates from Steven Rostedt:
    "Most of the changes are clean ups and small fixes. Some of them have
    stable tags to them. I searched through my INBOX just as the merge
    window opened and found lots of patches to pull. I ran them through
    all my tests and they were in linux-next for a few days.

    Features added this release:
    ----------------------------

    - Module globbing. You can now filter function tracing to several
    modules. # echo '*:mod:*snd*' > set_ftrace_filter (Dmitry Safonov)

    - Tracer-specific options are now visible even when the tracer is not
    active. It was rather annoying that you could only see and modify
    tracer options after enabling the tracer. Now they live in the
    options/ directory even when the tracer is not active, although
    they still only appear in the trace_options file while the tracer
    is active.

    - Trace options are now per instance (although some of the tracer
    specific options are global)

    - New tracefs file: set_event_pid. If any pid is added to this file,
    then all events in the instance will filter out events that are not
    part of this pid. sched_switch and sched_wakeup events handle next
    and the wakee pids"

    * tag 'trace-v4.4' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (68 commits)
    tracefs: Fix refcount imbalance in start_creating()
    tracing: Put back comma for empty fields in boot string parsing
    tracing: Apply tracer specific options from kernel command line.
    tracing: Add some documentation about set_event_pid
    ring_buffer: Remove unneeded smp_wmb() before wakeup of reader benchmark
    tracing: Allow dumping traces without tracking trace started cpus
    ring_buffer: Fix more races when terminating the producer in the benchmark
    ring_buffer: Do no not complete benchmark reader too early
    tracing: Remove redundant TP_ARGS redefining
    tracing: Rename max_stack_lock to stack_trace_max_lock
    tracing: Allow arch-specific stack tracer
    recordmcount: arm64: Replace the ignored mcount call into nop
    recordmcount: Fix endianness handling bug for nop_mcount
    tracepoints: Fix documentation of RCU lockdep checks
    tracing: ftrace_event_is_function() can return boolean
    tracing: is_legal_op() can return boolean
    ring-buffer: rb_event_is_commit() can return boolean
    ring-buffer: rb_per_cpu_empty() can return boolean
    ring_buffer: ring_buffer_empty{cpu}() can return boolean
    ring-buffer: rb_is_reader_page() can return boolean
    ...

    Linus Torvalds
     

30 Oct, 2015

1 commit

  • This is really about simplifying the double xchg patterns into
    a single cmpxchg, with the same logic. Other than the immediate
    cleanup, there are some subtleties this change deals with:

    (i) While the load of the old bt is fully ordered wrt everything,
    ie:

    old_bt = xchg(&q->blk_trace, bt); [barrier]
    if (old_bt)
    (void) xchg(&q->blk_trace, old_bt); [barrier]

    blk_trace could still be changed between the xchg and the old_bt
    load. Note that this race is merely theoretical and the window is,
    afaict, very small, but doing everything in a single cmpxchg
    closes it.

    (ii) Ordering guarantees are obviously kept with cmpxchg.

    (iii) Gets rid of the hacky-by-nature (void)xchg pattern.

    Signed-off-by: Davidlohr Bueso
    Reviewed-by: Jeff Moyer
    Signed-off-by: Jens Axboe

    Davidlohr Bueso
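
    A sketch of the before/after shape, with error handling simplified to a
    plain return (the real code jumps to a cleanup label):

        /* Old pattern: install bt, then back it out if someone beat us to it. */
        old_bt = xchg(&q->blk_trace, bt);
        if (old_bt) {
                (void)xchg(&q->blk_trace, old_bt);
                return -EBUSY;
        }

        /* New pattern: install only if the slot is still empty, in one
         * atomic step. */
        if (cmpxchg(&q->blk_trace, NULL, bt))
                return -EBUSY;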
     

01 Oct, 2015

1 commit

  • In preparation to make trace options per instance, the global trace_flags
    needs to be moved from being a global variable to a field within the trace
    instance trace_array structure.

    There's still more work to do, as there are some functions that use
    trace_flags without passing in a way to get to the current_trace array. For
    those, the global_trace is used directly (from trace.c). This includes
    setting and clearing the trace_flags. This means that when a new instance is
    created, it just gets the trace_flags of the global_trace and will not be
    able to modify them. Depending on which functions have access to the
    trace_array, the flags of an instance may not affect the parts of its trace
    where the global_trace is used. These will be fixed in future changes.

    Signed-off-by: Steven Rostedt

    Steven Rostedt (Red Hat)
     

26 Sep, 2015

1 commit

  • In preparation for having trace options be per instance, the trace_array
    needs to be passed to the trace_buffer_unlock_commit(). The
    trace_event_buffer_lock_reserve() already passes in the trace_event_file
    where the trace_array can be derived from.

    Also added an __init annotation to the boot-up test event plus the
    function-tracing function function_test_events_call().

    Signed-off-by: Steven Rostedt

    Steven Rostedt (Red Hat)
     

29 Jul, 2015

1 commit

  • Currently we have two different ways to signal an I/O error on a BIO:

    (1) by clearing the BIO_UPTODATE flag
    (2) by returning a Linux errno value to the bi_end_io callback

    The first one has the drawback of only communicating a single possible
    error (-EIO), and the second one has the drawbacks of not being persistent
    when bios are queued up and of not being passed along from child to parent
    bio in the ever more popular chaining scenario. Having both mechanisms
    available has the additional drawback of utterly confusing driver authors
    and introducing bugs where various I/O submitters only deal with one of
    them, and the others have to add boilerplate code to deal with both kinds
    of error returns.

    So add a new bi_error field to store an errno value directly in struct
    bio and remove the existing mechanisms to clean all this up.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Hannes Reinecke
    Reviewed-by: NeilBrown
    Signed-off-by: Jens Axboe

    Christoph Hellwig
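
    A hedged sketch of what a bio completion handler looks like before and
    after this change; handle_failure() is a hypothetical helper standing in
    for driver-specific error handling:

        /* Before: the error arrives both as a flag and as an errno argument. */
        static void my_end_io(struct bio *bio, int error)
        {
                if (!test_bit(BIO_UPTODATE, &bio->bi_flags) || error)
                        handle_failure(error ? error : -EIO);  /* hypothetical */
                bio_put(bio);
        }

        /* After: a single errno lives in the bio itself. */
        static void my_end_io(struct bio *bio)
        {
                if (bio->bi_error)
                        handle_failure(bio->bi_error);
                bio_put(bio);
        }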
     

27 Jun, 2015

1 commit

  • Pull tracing updates from Steven Rostedt:
    "This patch series contains several clean ups and even a new trace
    clock "monitonic raw". Also some enhancements to make the ring buffer
    even faster. But the biggest and most noticeable change is the
    renaming of the ftrace* files, structures and variables that have to
    deal with trace events.

    Over the years I've had several developers tell me about their
    confusion with what ftrace is compared to events. Technically,
    "ftrace" is the infrastructure to do the function hooks, which include
    tracing and also helps with live kernel patching. But the trace
    events are a separate entity altogether, and the files that affect the
    trace events should not be named "ftrace". These include:

    include/trace/ftrace.h -> include/trace/trace_events.h
    include/linux/ftrace_event.h -> include/linux/trace_events.h

    Also, functions that are specific for trace events have also been renamed:

    ftrace_print_*() -> trace_print_*()
    (un)register_ftrace_event() -> (un)register_trace_event()
    ftrace_event_name() -> trace_event_name()
    ftrace_trigger_soft_disabled() -> trace_trigger_soft_disabled()
    ftrace_define_fields_##call() -> trace_define_fields_##call()
    ftrace_get_offsets_##call() -> trace_get_offsets_##call()

    Structures have been renamed:

    ftrace_event_file -> trace_event_file
    ftrace_event_{call,class} -> trace_event_{call,class}
    ftrace_event_buffer -> trace_event_buffer
    ftrace_subsystem_dir -> trace_subsystem_dir
    ftrace_event_raw_##call -> trace_event_raw_##call
    ftrace_event_data_offset_##call -> trace_event_data_offset_##call
    ftrace_event_type_funcs_##call -> trace_event_type_funcs_##call

    And a few various variables and flags have also been updated.

    This has been sitting in linux-next for some time, and I have not
    heard a single complaint about this rename breaking anything. Mostly
    because these functions, variables and structures are mostly internal
    to the tracing system and are seldom (if ever) used by anything
    external to that"

    * tag 'trace-v4.2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (33 commits)
    ring_buffer: Allow to exit the ring buffer benchmark immediately
    ring-buffer-benchmark: Fix the wrong type
    ring-buffer-benchmark: Fix the wrong param in module_param
    ring-buffer: Add enum names for the context levels
    ring-buffer: Remove useless unused tracing_off_permanent()
    ring-buffer: Give NMIs a chance to lock the reader_lock
    ring-buffer: Add trace_recursive checks to ring_buffer_write()
    ring-buffer: Allways do the trace_recursive checks
    ring-buffer: Move recursive check to per_cpu descriptor
    ring-buffer: Add unlikelys to make fast path the default
    tracing: Rename ftrace_get_offsets_##call() to trace_event_get_offsets_##call()
    tracing: Rename ftrace_define_fields_##call() to trace_event_define_fields_##call()
    tracing: Rename ftrace_event_type_funcs_##call to trace_event_type_funcs_##call
    tracing: Rename ftrace_data_offset_##call to trace_event_data_offset_##call
    tracing: Rename ftrace_raw_##call event structures to trace_event_raw_##call
    tracing: Rename ftrace_trigger_soft_disabled() to trace_trigger_soft_disabled()
    tracing: Rename FTRACE_EVENT_FL_* flags to EVENT_FILE_FL_*
    tracing: Rename struct ftrace_subsystem_dir to trace_subsystem_dir
    tracing: Rename ftrace_event_name() to trace_event_name()
    tracing: Rename FTRACE_MAX_EVENT to TRACE_EVENT_TYPE_MAX
    ...

    Linus Torvalds
     

26 Jun, 2015

1 commit

  • Part of the disassembly of do_blk_trace_setup:

    231b: e8 00 00 00 00 callq 2320
    231c: R_X86_64_PC32 strlen+0xfffffffffffffffc
    2320: eb 0a jmp 232c
    2322: 66 0f 1f 44 00 00 nopw 0x0(%rax,%rax,1)
    2328: 48 83 c3 01 add $0x1,%rbx
    232c: 48 39 d8 cmp %rbx,%rax
    232f: 76 47 jbe 2378
    2331: 41 80 3c 1c 2f cmpb $0x2f,(%r12,%rbx,1)
    2336: 75 f0 jne 2328
    2338: 41 c6 04 1c 5f movb $0x5f,(%r12,%rbx,1)
    233d: 4c 89 e7 mov %r12,%rdi
    2340: e8 00 00 00 00 callq 2345
    2341: R_X86_64_PC32 strlen+0xfffffffffffffffc
    2345: eb e1 jmp 2328

    Yep, that's right: gcc isn't smart enough to realize that replacing '/' by
    '_' cannot change the strlen(), so we call it again and again (at least
    when a '/' is found). Even if gcc were that smart, this construction
    would still loop over the string twice, once for the initial strlen() call
    and once more in the open-coded loop.

    Let's simply use strreplace() instead.

    Signed-off-by: Rasmus Villemoes
    Acked-by: Steven Rostedt
    Liked-by: Jens Axboe
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Rasmus Villemoes
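
    Reconstructed from the description above (not quoted from the patch),
    the change is roughly:

        /* Old: strlen() gets re-evaluated on every iteration. */
        for (i = 0; i < strlen(buts->name); i++)
                if (buts->name[i] == '/')
                        buts->name[i] = '_';

        /* New: a single pass with no repeated strlen() calls. */
        strreplace(buts->name, '/', '_');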
     

14 May, 2015

1 commit


14 Dec, 2014

1 commit

  • Pull block driver core update from Jens Axboe:
    "This is the pull request for the core block IO changes for 3.19. Not
    a huge round this time, mostly lots of little good fixes:

    - Fix a bug in sysfs blktrace interface causing a NULL pointer
    dereference, when enabled/disabled through that API. From Arianna
    Avanzini.

    - Various updates/fixes/improvements for blk-mq:

    - A set of updates from Bart, mostly fixing bugs in the tag
    handling.

    - Cleanup/code consolidation from Christoph.

    - Extend queue_rq API to be able to handle batching issues of IO
    requests. NVMe will utilize this shortly. From me.

    - A few tag and request handling updates from me.

    - Cleanup of the preempt handling for running queues from Paolo.

    - Prevent running of unmapped hardware queues from Ming Lei.

    - Move the kdump memory limiting check to be in the correct
    location, from Shaohua.

    - Initialize all software queues at init time from Takashi. This
    prevents a kobject warning when CPUs are brought online that
    weren't online when a queue was registered.

    - Single writeback fix for I_DIRTY clearing from Tejun. Queued with
    the core IO changes, since it's just a single fix.

    - Version X of the __bio_add_page() segment addition retry from
    Maurizio. Hope the Xth time is the charm.

    - Documentation fixup for IO scheduler merging from Jan.

    - Introduce (and use) generic IO stat accounting helpers for non-rq
    drivers, from Gu Zheng.

    - Kill off artificial limiting of max sectors in a request from
    Christoph"

    * 'for-3.19/core' of git://git.kernel.dk/linux-block: (26 commits)
    bio: modify __bio_add_page() to accept pages that don't start a new segment
    blk-mq: Fix uninitialized kobject at CPU hotplugging
    blktrace: don't let the sysfs interface remove trace from running list
    blk-mq: Use all available hardware queues
    blk-mq: Micro-optimize bt_get()
    blk-mq: Fix a race between bt_clear_tag() and bt_get()
    blk-mq: Avoid that __bt_get_word() wraps multiple times
    blk-mq: Fix a use-after-free
    blk-mq: prevent unmapped hw queue from being scheduled
    blk-mq: re-check for available tags after running the hardware queue
    blk-mq: fix hang in bt_get()
    blk-mq: move the kdump check to blk_mq_alloc_tag_set
    blk-mq: cleanup tag free handling
    blk-mq: use 'nr_cpu_ids' as highest CPU ID count for hwq cpu map
    blk: introduce generic io stat accounting help function
    blk-mq: handle the single queue case in blk_mq_hctx_next_cpu
    genhd: check for int overflow in disk_expand_part_tbl()
    blk-mq: add blk_mq_free_hctx_request()
    blk-mq: export blk_mq_free_request()
    blk-mq: use get_cpu/put_cpu instead of preempt_disable/preempt_enable
    ...

    Linus Torvalds
     

10 Dec, 2014

1 commit

  • Currently, blktrace can be started/stopped via its ioctl-based interface
    (used by the userspace blktrace tool) or via its ftrace interface. The
    function blk_trace_remove_queue(), called each time an "enable" tunable
    of the ftrace interface transitions to zero, removes the trace from the
    running list, even if no function from the sysfs interface adds it to
    such a list. This leads to a null pointer dereference. This commit
    changes the blk_trace_remove_queue() function so that it does not remove
    the blk_trace from the running list.

    v2:
    - Now the patch removes the invocation of list_del() instead of
    adding a useless if branch, as suggested by Namhyung Kim.

    Signed-off-by: Arianna Avanzini
    Signed-off-by: Jens Axboe

    Arianna Avanzini
     

20 Nov, 2014

1 commit

  • Checking the return code of every trace_seq_printf() operation and having
    to return early if it overflowed makes the code messy.

    Using the new trace_seq_has_overflowed() and trace_handle_return() functions
    allows us to clean up the code.

    In the future, trace_seq_printf() and friends will be turning into void
    functions and not returning a value. The trace_seq_has_overflowed() is to
    be used instead. This cleanup allows that change to take place.

    Cc: Jens Axboe
    Reviewed-by: Petr Mladek
    Signed-off-by: Steven Rostedt

    Steven Rostedt (Red Hat)
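
    A sketch of the pattern change this enables; the format string is only
    illustrative of the blktrace output code:

        /* Old pattern: every print's return value had to be checked. */
        ret = trace_seq_printf(s, "%3d,%-3d ", MAJOR(t->device), MINOR(t->device));
        if (!ret)
                return TRACE_TYPE_PARTIAL_LINE;
        return TRACE_TYPE_HANDLED;

        /* New pattern: print unconditionally, then ask whether the seq
         * overflowed. */
        trace_seq_printf(s, "%3d,%-3d ", MAJOR(t->device), MINOR(t->device));
        return trace_handle_return(s);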
     

04 Apr, 2014

1 commit

  • Pull tracing updates from Steven Rostedt:
    "Most of the changes were largely clean ups, and some documentation.
    But there were a few features that were added:

    Uprobes now work with event triggers and multi buffers and have
    support under ftrace and perf.

    The big feature is that the function tracer can now be used within the
    multi buffer instances. That is, you can now trace some functions in
    one buffer, others in another buffer, all functions in a third buffer
    and so on. They are basically agnostic from each other. This only
    works for the function tracer and not for the function graph trace,
    although you can have the function graph tracer running in the top
    level buffer (or any tracer for that matter) and have different
    function tracing going on in the sub buffers"

    * tag 'trace-3.15' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (45 commits)
    tracing: Add BUG_ON when stack end location is over written
    tracepoint: Remove unused API functions
    Revert "tracing: Move event storage for array from macro to standalone function"
    ftrace: Constify ftrace_text_reserved
    tracepoints: API doc update to tracepoint_probe_register() return value
    tracepoints: API doc update to data argument
    ftrace: Fix compilation warning about control_ops_free
    ftrace/x86: BUG when ftrace recovery fails
    ftrace: Warn on error when modifying ftrace function
    ftrace: Remove freelist from struct dyn_ftrace
    ftrace: Do not pass data to ftrace_dyn_arch_init
    ftrace: Pass retval through return in ftrace_dyn_arch_init()
    ftrace: Inline the code from ftrace_dyn_table_alloc()
    ftrace: Cleanup of global variables ftrace_new_pgs and ftrace_update_cnt
    tracing: Evaluate len expression only once in __dynamic_array macro
    tracing: Correctly expand len expressions from __dynamic_array macro
    tracing/module: Replace include of tracepoint.h with jump_label.h in module.h
    tracing: Fix event header migrate.h to include tracepoint.h
    tracing: Fix event header writeback.h to include tracepoint.h
    tracing: Warn if a tracepoint is not set via debugfs
    ...

    Linus Torvalds
     

06 Mar, 2014

1 commit

  • trace_block_rq_complete does not take into account that a request can
    be partially completed, so we can get the following incorrect output
    from blkparse:

    C R 232 + 240 [0]
    C R 240 + 232 [0]
    C R 248 + 224 [0]
    C R 256 + 216 [0]

    but should be:

    C R 232 + 8 [0]
    C R 240 + 8 [0]
    C R 248 + 8 [0]
    C R 256 + 8 [0]

    Also, the overall summary statistics of completed requests and the
    final throughput will be incorrect.

    This patch takes into account real completion size of the request and
    fixes wrong completion accounting.

    Signed-off-by: Roman Pen
    CC: Steven Rostedt
    CC: Frederic Weisbecker
    CC: Ingo Molnar
    CC: linux-kernel@vger.kernel.org
    Cc: stable@kernel.org
    Signed-off-by: Jens Axboe

    Roman Pen
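
    A hedged sketch of the fix's shape; the exact tracepoint signature and
    field names are approximations:

        /* The completion trace is driven by the bytes completed in this
         * step, not by the request's total size. */
        trace_block_rq_complete(req->q, req, nr_bytes);

        /* ...and the event records that partial size: */
        __entry->nr_sector = nr_bytes >> 9;   /* sectors actually completed */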
     

21 Feb, 2014

1 commit


24 Nov, 2013

1 commit

  • Immutable biovecs are going to require an explicit iterator. To
    implement immutable bvecs, a later patch is going to add a bi_bvec_done
    member to this struct; for now, this patch effectively just renames
    things.

    Signed-off-by: Kent Overstreet
    Cc: Jens Axboe
    Cc: Geert Uytterhoeven
    Cc: Benjamin Herrenschmidt
    Cc: Paul Mackerras
    Cc: "Ed L. Cashin"
    Cc: Nick Piggin
    Cc: Lars Ellenberg
    Cc: Jiri Kosina
    Cc: Matthew Wilcox
    Cc: Geoff Levand
    Cc: Yehuda Sadeh
    Cc: Sage Weil
    Cc: Alex Elder
    Cc: ceph-devel@vger.kernel.org
    Cc: Joshua Morris
    Cc: Philip Kelleher
    Cc: Rusty Russell
    Cc: "Michael S. Tsirkin"
    Cc: Konrad Rzeszutek Wilk
    Cc: Jeremy Fitzhardinge
    Cc: Neil Brown
    Cc: Alasdair Kergon
    Cc: Mike Snitzer
    Cc: dm-devel@redhat.com
    Cc: Martin Schwidefsky
    Cc: Heiko Carstens
    Cc: linux390@de.ibm.com
    Cc: Boaz Harrosh
    Cc: Benny Halevy
    Cc: "James E.J. Bottomley"
    Cc: Greg Kroah-Hartman
    Cc: "Nicholas A. Bellinger"
    Cc: Alexander Viro
    Cc: Chris Mason
    Cc: "Theodore Ts'o"
    Cc: Andreas Dilger
    Cc: Jaegeuk Kim
    Cc: Steven Whitehouse
    Cc: Dave Kleikamp
    Cc: Joern Engel
    Cc: Prasad Joshi
    Cc: Trond Myklebust
    Cc: KONISHI Ryusuke
    Cc: Mark Fasheh
    Cc: Joel Becker
    Cc: Ben Myers
    Cc: xfs@oss.sgi.com
    Cc: Steven Rostedt
    Cc: Frederic Weisbecker
    Cc: Ingo Molnar
    Cc: Len Brown
    Cc: Pavel Machek
    Cc: "Rafael J. Wysocki"
    Cc: Herton Ronaldo Krzesinski
    Cc: Ben Hutchings
    Cc: Andrew Morton
    Cc: Guo Chao
    Cc: Tejun Heo
    Cc: Asai Thambi S P
    Cc: Selvan Mani
    Cc: Sam Bradshaw
    Cc: Wei Yongjun
    Cc: "Roger Pau Monné"
    Cc: Jan Beulich
    Cc: Stefano Stabellini
    Cc: Ian Campbell
    Cc: Sebastian Ott
    Cc: Christian Borntraeger
    Cc: Minchan Kim
    Cc: Jiang Liu
    Cc: Nitin Gupta
    Cc: Jerome Marchand
    Cc: Joe Perches
    Cc: Peng Tao
    Cc: Andy Adamson
    Cc: fanchaoting
    Cc: Jie Liu
    Cc: Sunil Mushran
    Cc: "Martin K. Petersen"
    Cc: Namjae Jeon
    Cc: Pankaj Kumar
    Cc: Dan Magenheimer
    Cc: Mel Gorman

    Kent Overstreet
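
    For orientation, a sketch of what the rename looks like at a typical
    use site:

        /* Before: fields lived directly in struct bio. */
        sector = bio->bi_sector;
        bytes  = bio->bi_size;

        /* After: they live inside an explicit iterator. */
        sector = bio->bi_iter.bi_sector;
        bytes  = bio->bi_iter.bi_size;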
     

09 Nov, 2013

1 commit


08 Nov, 2013

1 commit

  • Currently each task sends a BLK_TN_PROCESS event to the first traced
    device it interacts with after a new trace is started. When there are
    several traced devices and the task accesses more devices, this logic
    can result in BLK_TN_PROCESS being sent several times to some devices
    while it is never sent to other devices. Thus blkparse doesn't display
    the command name when parsing some blktrace files.

    Fix the problem by sending BLK_TN_PROCESS event to all traced devices
    when a task interacts with any of them.

    Signed-off-by: Jan Kara
    Reviewed-by: Jeff Moyer
    Signed-off-by: Jens Axboe

    Jan Kara
     

09 May, 2013

1 commit

  • Pull block driver updates from Jens Axboe:
    "It might look big in volume, but when categorized, not a lot of
    drivers are touched. The pull request contains:

    - mtip32xx fixes from Micron.

    - A slew of drbd updates, this time in a nicer series.

    - bcache, a flash/ssd caching framework from Kent.

    - Fixes for cciss"

    * 'for-3.10/drivers' of git://git.kernel.dk/linux-block: (66 commits)
    bcache: Use bd_link_disk_holder()
    bcache: Allocator cleanup/fixes
    cciss: bug fix to prevent cciss from loading in kdump crash kernel
    cciss: add cciss_allow_hpsa module parameter
    drivers/block/mg_disk.c: add CONFIG_PM_SLEEP to suspend/resume functions
    mtip32xx: Workaround for unaligned writes
    bcache: Make sure blocksize isn't smaller than device blocksize
    bcache: Fix merge_bvec_fn usage for when it modifies the bvm
    bcache: Correctly check against BIO_MAX_PAGES
    bcache: Hack around stuff that clones up to bi_max_vecs
    bcache: Set ra_pages based on backing device's ra_pages
    bcache: Take data offset from the bdev superblock.
    mtip32xx: mtip32xx: Disable TRIM support
    mtip32xx: fix a smatch warning
    bcache: Disable broken btree fuzz tester
    bcache: Fix a format string overflow
    bcache: Fix a minor memory leak on device teardown
    bcache: Documentation updates
    bcache: Use WARN_ONCE() instead of __WARN()
    bcache: Add missing #include
    ...

    Linus Torvalds
     

30 Apr, 2013

1 commit

  • Pull tracing updates from Steven Rostedt:
    "Along with the usual minor fixes and clean ups there are a few major
    changes with this pull request.

    1) Multiple buffers for the ftrace facility

    This feature has been requested by many people over the last few
    years. I even heard that Google was about to implement it themselves.
    I finally had time and cleaned up the code such that you can now
    create multiple instances of the ftrace buffer and have different
    events go to different buffers. This way, a low frequency event will
    not be lost in the noise of a high frequency event.

    Note, currently only events can go to different buffers, the tracers
    (ie function, function_graph and the latency tracers) still can only
    be written to the main buffer.

    2) The function tracer triggers have now been extended.

    The function tracer had two triggers. One to enable tracing when a
    function is hit, and one to disable tracing. Now you can record a
    stack trace on a single (or many) function(s), take a snapshot of the
    buffer (copy it to the snapshot buffer), and you can enable or disable
    an event to be traced when a function is hit.

    3) A perf clock has been added.

    A "perf" clock can be chosen to be used when tracing. This will cause
    ftrace to use the same clock as perf uses, and hopefully this will
    make it easier to interleave the perf and ftrace data for analysis."

    * tag 'trace-3.10' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (82 commits)
    tracepoints: Prevent null probe from being added
    tracing: Compare to 1 instead of zero for is_signed_type()
    tracing: Remove obsolete macro guard _TRACE_PROFILE_INIT
    ftrace: Get rid of ftrace_profile_bits
    tracing: Check return value of tracing_init_dentry()
    tracing: Get rid of unneeded key calculation in ftrace_hash_move()
    tracing: Reset ftrace_graph_filter_enabled if count is zero
    tracing: Fix off-by-one on allocating stat->pages
    kernel: tracing: Use strlcpy instead of strncpy
    tracing: Update debugfs README file
    tracing: Fix ftrace_dump()
    tracing: Rename trace_event_mutex to trace_event_sem
    tracing: Fix comment about prefix in arch_syscall_match_sym_name()
    tracing: Convert trace_destroy_fields() to static
    tracing: Move find_event_field() into trace_events.c
    tracing: Use TRACE_MAX_PRINT instead of constant
    tracing: Use pr_warn_once instead of open coded implementation
    ring-buffer: Add ring buffer startup selftest
    tracing: Bring Documentation/trace/ftrace.txt up to date
    tracing: Add "perf" trace_clock
    ...

    Conflicts:
    kernel/trace/ftrace.c
    kernel/trace/trace.c

    Linus Torvalds
     

19 Apr, 2013

1 commit

  • This reverts commit 3a366e614d0837d9fc23f78cdb1a1186ebc3387f.

    Wanlong Gao reports that it causes a kernel panic on his machine several
    minutes after boot. Reverting it removes the panic.

    Jens says:
    "It's not quite clear why that is yet, so I think we should just revert
    the commit for 3.9 final (which I'm assuming is pretty close).

    The wifi is crap at the LSF hotel, so sending this email instead of
    queueing up a revert and pull request."

    Reported-by: Wanlong Gao
    Requested-by: Jens Axboe
    Cc: Tejun Heo
    Cc: Steven Rostedt
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     

24 Mar, 2013

1 commit


15 Mar, 2013

1 commit

  • Currently, the way the latency tracers and snapshot feature works
    is to have a separate trace_array called "max_tr" that holds the
    snapshot buffer. For latency tracers, this snapshot buffer is used
    to swap the running buffer with this buffer to save the current max
    latency.

    The only items needed for the max_tr are really just a copy of the buffer
    itself, the per_cpu data pointers, the time_start timestamp that states
    when the max latency was triggered, and the cpu that the max latency
    was triggered on. All other fields in trace_array are unused by the
    max_tr, making the max_tr mostly bloat.

    This change removes the max_tr completely, and adds a new structure
    called trace_buffer, that holds the buffer pointer, the per_cpu data
    pointers, the time_start timestamp, and the cpu where the latency occurred.

    The trace_array now has two trace_buffers, one for the normal trace and
    one for the max trace or snapshot. By doing this, not only do we remove
    the bloat from the max_tr, but trace instances can now use their own
    snapshot feature instead of only the top-level global_trace having the
    snapshot feature and latency tracers to itself.

    Signed-off-by: Steven Rostedt

    Steven Rostedt (Red Hat)
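
    A rough sketch of the new structure's shape, reconstructed from the
    description above rather than quoted from the kernel:

        struct trace_buffer {
                struct trace_array              *tr;
                struct ring_buffer              *buffer;
                struct trace_array_cpu __percpu *data;
                u64                             time_start; /* when the max latency hit */
                int                             cpu;        /* CPU it hit on */
        };

    struct trace_array then carries two of these: one for the live trace and
    one for the max/snapshot buffer.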
     

01 Mar, 2013

1 commit

  • Pull block IO core bits from Jens Axboe:
    "Below are the core block IO bits for 3.9. It was delayed a few days
    since my workstation kept crashing every 2-8h after pulling it into
    current -git, but turns out it is a bug in the new pstate code (divide
    by zero, will report separately). In any case, it contains:

    - The big cfq/blkcg update from Tejun and Vivek.

    - Additional block and writeback tracepoints from Tejun.

    - Improvement of the should sort (based on queues) logic in the plug
    flushing.

    - _io() variants of the wait_for_completion() interface, using
    io_schedule() instead of schedule() to contribute to io wait
    properly.

    - Various little fixes.

    You'll get two trivial merge conflicts, which should be easy enough to
    fix up"

    Fix up the trivial conflicts due to hlist traversal cleanups (commit
    b67bfe0d42ca: "hlist: drop the node parameter from iterators").

    * 'for-3.9/core' of git://git.kernel.dk/linux-block: (39 commits)
    block: remove redundant check to bd_openers()
    block: use i_size_write() in bd_set_size()
    cfq: fix lock imbalance with failed allocations
    drivers/block/swim3.c: fix null pointer dereference
    block: don't select PERCPU_RWSEM
    block: account iowait time when waiting for completion of IO request
    sched: add wait_for_completion_io[_timeout]
    writeback: add more tracepoints
    block: add block_{touch|dirty}_buffer tracepoint
    buffer: make touch_buffer() an exported function
    block: add @req to bio_{front|back}_merge tracepoints
    block: add missing block_bio_complete() tracepoint
    block: Remove should_sort judgement when flush blk_plug
    block,elevator: use new hashtable implementation
    cfq-iosched: add hierarchical cfq_group statistics
    cfq-iosched: collect stats from dead cfqgs
    cfq-iosched: separate out cfqg_stats_reset() from cfq_pd_reset_stats()
    blkcg: make blkcg_print_blkgs() grab q locks instead of blkcg lock
    block: RCU free request_queue
    blkcg: implement blkg_[rw]stat_recursive_sum() and blkg_[rw]stat_merge()
    ...

    Linus Torvalds
     

22 Jan, 2013

1 commit

  • typeof(&buffer) is a pointer to an array of 1024 char, i.e. char (*)[1024],
    but typeof(&buffer[0]) is a pointer to char, which matches the return type
    of get_trace_buf(). As is well known, the value of &buffer is equal to
    &buffer[0], so returning this_cpu_ptr(&percpu_buffer->buffer[0]) avoids
    the type cast.

    Link: http://lkml.kernel.org/r/50A1A800.3020102@gmail.com

    Reviewed-by: Christoph Lameter
    Signed-off-by: Shan Wei
    Signed-off-by: Steven Rostedt

    Shan Wei
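
    A simplified sketch of the pattern being discussed; the real
    get_trace_buf() takes no argument and uses file-scope per-cpu data, so
    treat the signature here as illustrative:

        struct trace_buffer_struct {
                char buffer[1024];
        };

        static char *get_trace_buf(struct trace_buffer_struct __percpu *percpu_buffer)
        {
                /* &buffer has type char (*)[1024]; &buffer[0] is plain
                 * char *, which already matches the return type, so no
                 * cast is needed. */
                return this_cpu_ptr(&percpu_buffer->buffer[0]);
        }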
     

14 Jan, 2013

2 commits

  • The bio_{front|back}_merge tracepoints report a bio merging into an
    existing request but don't specify which request the bio is being
    merged into. Add @req to them. This makes it impossible to share the
    event template with block_bio_queue - split it out.

    @req isn't used or exported to userland at this point and there is no
    userland visible behavior change. Later changes will make use of the
    extra parameter.

    Signed-off-by: Tejun Heo
    Signed-off-by: Jens Axboe

    Tejun Heo
     
  • bio completion didn't kick block_bio_complete TP. Only dm was
    explicitly triggering the TP on IO completion. This makes
    block_bio_complete TP useless for tracers which want to know about
    bios, and all other bio based drivers skip generating blktrace
    completion events.

    This patch makes all bio completions via bio_endio() generate
    block_bio_complete TP.

    * Explicit trace_block_bio_complete() invocation removed from dm and
    the trace point is unexported.

    * @rq dropped from trace_block_bio_complete(). bios may fly around
    w/o an associated queue. Verifying and accessing the associated queue
    belongs to TP probes.

    * blktrace now gets both request and bio completions. Make it ignore
    bio completions if request completion path is happening.

    This makes all bio based drivers generate blktrace completion events
    properly and makes the block_bio_complete TP actually useful.

    v2: With this change, block_bio_complete TP could be invoked on sg
    commands which have bios with %NULL bi_bdev. Update TP
    assignment code to check whether bio->bi_bdev is %NULL before
    dereferencing.

    Signed-off-by: Tejun Heo
    Original-patch-by: Namhyung Kim
    Cc: Tejun Heo
    Cc: Steven Rostedt
    Cc: Alasdair Kergon
    Cc: dm-devel@redhat.com
    Cc: Neil Brown
    Signed-off-by: Jens Axboe

    Tejun Heo
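
    A sketch of the guarded tracepoint assignment mentioned in the v2 note;
    field names follow the 2013-era struct bio layout and are approximate:

        /* bios completed for sg commands can have a NULL bi_bdev, so the
         * tracepoint must not dereference it blindly. */
        __entry->dev    = bio->bi_bdev ? bio->bi_bdev->bd_dev : 0;
        __entry->sector = bio->bi_sector;
        __entry->error  = error;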
     

06 Apr, 2012

1 commit

  • Many users of debugfs copy the implementation of default_open() when
    they want to support a custom read/write function op. This leads to a
    proliferation of the default_open() implementation across the entire
    tree.

    Now that the common implementation has been consolidated into libfs we
    can replace all the users of this function with simple_open().

    This replacement was done with the following semantic patch:

    @ open @
    identifier open_f != simple_open;
    identifier i, f;
    @@
    -int open_f(struct inode *i, struct file *f)
    -{
    (
    -if (i->i_private)
    -f->private_data = i->i_private;
    |
    -f->private_data = i->i_private;
    )
    -return 0;
    -}

    @ has_open depends on open @
    identifier fops;
    identifier open.open_f;
    @@
    struct file_operations fops = {
    ...
    -.open = open_f,
    +.open = simple_open,
    ...
    };

    [akpm@linux-foundation.org: checkpatch fixes]
    Signed-off-by: Stephen Boyd
    Cc: Greg Kroah-Hartman
    Cc: Al Viro
    Cc: Julia Lawall
    Acked-by: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Stephen Boyd