Eric Lee / smarc-fsl-linux-kernel

04 Jan, 2012

1 commit

f4ae40a6a switch debugfs to umode_t ... Browse Code »

Signed-off-by: Al Viro

Al Viro
2012-01-04 11:54:56 +0800

01 Nov, 2011

1 commit

6e5fdeedc kernel: Fix files explicitly needing EXPORT_SYMBOL infrastructure ... Browse Code »

These files were getting via an implicit non-obvious
path, but we want to crush those out of existence since they cost
time during compiles of processing thousands of lines of headers
for no reason. Give them the lightweight header that just contains
the EXPORT_SYMBOL infrastructure.

Signed-off-by: Paul Gortmaker

Paul Gortmaker
2011-11-01 07:30:05 +0800

11 Aug, 2011

1 commit

c09c47cae blktrace: add FLUSH/FUA support ... Browse Code »

Add FLUSH/FUA support to blktrace. As FLUSH precedes WRITE and/or
FUA follows WRITE, use the same 'F' flag for both cases and
distinguish them by their (relative) position. The end results
look like (other flags might be shown also):

- WRITE: W
- WRITE_FLUSH: FW
- WRITE_FUA: WF
- WRITE_FLUSH_FUA: FWF

Note that we reuse TC_BARRIER due to lack of bit space of act_mask
so that the older versions of blktrace tools will report flush
requests as barriers from now on.

Cc: Steven Rostedt
Cc: Frederic Weisbecker
Cc: Ingo Molnar
Signed-off-by: Namhyung Kim
Reviewed-by: Jeff Moyer
Signed-off-by: Jens Axboe

Namhyung Kim
2011-08-11 16:36:05 +0800

16 Apr, 2011

1 commit

49cac01e1 block: make unplug timer trace event correspond to the schedule() unplug ... Browse Code »

It's a pretty close match to what we had before - the timer triggering
would mean that nobody unplugged the plug in due time, in the new
scheme this matches very closely what the schedule() unplug now is.
It's essentially the difference between an explicit unplug (IO unplug)
or an implicit unplug (timer unplug, we scheduled with pending IO
queued).

Signed-off-by: Jens Axboe

Jens Axboe
2011-04-16 19:51:05 +0800

12 Apr, 2011

2 commits

94b5eb28b block: fixup block IO unplug trace call ... Browse Code »

It was removed with the on-stack plugging, readd it and track the
depth of requests added when flushing the plug.

Signed-off-by: Jens Axboe

Jens Axboe
2011-04-12 16:12:19 +0800
d9c978331 block: remove block_unplug_timer() trace point ... Browse Code »

We no longer have an unplug timer running, so no point in keeping
the trace point.

Signed-off-by: Jens Axboe

Jens Axboe
2011-04-12 16:06:33 +0800

12 Mar, 2011

1 commit

805f6b5e1 blktrace: Use rq->cmd_flags directly in blk_add_trace_rq. ... Browse Code »

In blk_add_trace_rq, we only chose the minor 2 bits from
request's cmd_flags and did some check for discard.
so most of other flags(e.g, REQ_SYNC) are missing.

For example, with a sync write after blkparse we get:
8,16 1 1 0.001776503 7509 A WS 1349632 + 1024 cmd_flags directly to __blk_add_trace.

With this patch, after a sync write we get:
8,16 1 1 0.001776900 5425 A WS 1189888 + 1024
Acked-by: Jeff Moyer
Signed-off-by: Jens Axboe

Tao Ma
2011-03-12 03:11:59 +0800

03 Mar, 2011

1 commit

2d3a8497f blktrace: Remove blk_fill_rwbs_rq. ... Browse Code »

If we enable trace events to trace block actions, We use
blk_fill_rwbs_rq to analyze the corresponding actions
in request's cmd_flags, but we only choose the minor 2 bits
from it, so most of other flags(e.g, REQ_SYNC) are missing.
For example, with a sync write we get:
write_test-2409 [001] 160.013869: block_rq_insert: 3,64 W 0 () 258135 + =
8 [write_test]

Since now we have integrated the flags of both bio and request,
it is safe to pass rq->cmd_flags directly to blk_fill_rwbs and
blk_fill_rwbs_rq isn't needed any more.

With this patch, after a sync write we get:
write_test-2417 [000] 226.603878: block_rq_insert: 3,64 WS 0 () 258135 +=
8 [write_test]

Signed-off-by: Tao Ma
Acked-by: Jeff Moyer
Signed-off-by: Jens Axboe

Tao Ma
2011-03-03 23:53:20 +0800

19 Jan, 2011

1 commit

490da40d8 blktrace: Don't output messages if NOTIFY isn't set. ... Browse Code »

Now if we enable blktrace, cfq has too many messages output to the
trace buffer. It is fine if we don't specify any action mask.
But if I do like this:
blktrace /dev/sdb -a issue -a complete -o - | blkparse -i -
I only want to see 'D' and 'C', while with the following command
dd if=/mnt/ocfs2/test of=/dev/null bs=4k count=1 iflag=direct

I will get(with a 2.6.37 vanilla kernel):
8,16 0 0 0.000000000 0 m N cfq3805 alloced
8,16 0 0 0.000004126 0 m N cfq3805 insert_request
8,16 0 0 0.000004884 0 m N cfq3805 add_to_rr
8,16 0 0 0.000008417 0 m N cfq workload slice:300
8,16 0 0 0.000009557 0 m N cfq3805 set_active wl_prio:0 wl_type:2
8,16 0 0 0.000010640 0 m N cfq3805 fifo= (null)
8,16 0 0 0.000011193 0 m N cfq3805 dispatch_insert
8,16 0 0 0.000012221 0 m N cfq3805 dispatched a request
8,16 0 0 0.000012802 0 m N cfq3805 activate rq, drv=1
8,16 0 1 0.000013181 3805 D R 114759 + 8 [dd]
8,16 0 2 0.000164244 0 C R 114759 + 8 [0]
8,16 0 0 0.000167997 0 m N cfq3805 complete rqnoidle 0
8,16 0 0 0.000168782 0 m N cfq3805 set_slice=100
8,16 0 0 0.000169874 0 m N cfq3805 arm_idle: 8 group_idle: 0
8,16 0 0 0.000170189 0 m N cfq schedule dispatch
8,16 0 0 0.000397938 0 m N cfq3805 slice expired t=0
8,16 0 0 0.000399763 0 m N cfq3805 sl_used=1 disp=1 charge=1 iops=0 sect=8
8,16 0 0 0.000400227 0 m N cfq3805 del_from_rr
8,16 0 0 0.000400882 0 m N cfq3805 put_queue

See, there are 19 lines while I only need 2. I don't think it is
appropriate for a user.

So this patch will disable any messages if the BLK_TC_NOTIFY isn't set.
Now the output for the same command will look like:
8,16 0 1 0.000000000 4908 D R 114759 + 8 [dd]
8,16 0 2 0.000146827 0 C R 114759 + 8 [0]

Yes, it is what I want to see.

Cc: Steven Rostedt
Cc: Jeff Moyer
Signed-off-by: Tao Ma
Signed-off-by: Jens Axboe

Tao Ma
2011-01-19 23:25:02 +0800

10 Jan, 2011

1 commit

797a455d2 block: ensure that completion error gets properly traced ... Browse Code »

We normally just use the BIO_UPTODATE flag to signal 0/-EIO. If
we have more information available, we should pass that along to
the trace output.

Signed-off-by: Jens Axboe

Jens Axboe
2011-01-10 16:06:44 +0800

07 Jan, 2011

1 commit

23036f1a3 blktrace: add missing probe argument to block_bio_complete ... Browse Code »

blktrace.c block bio complete callback needs to gain a new argument to reflect
the newly added "error" tracepoint argument. This is needed to match the new
block_bio_complete TRACE_EVENT as of
commit de983a7bfcb7c020901ca6e2314cf55a4207ab5a.

Signed-off-by: Mathieu Desnoyers
CC: Jeff Moyer
CC: Steven Rostedt
CC: Frederic Weisbecker
CC: Ingo Molnar
CC: Thomas Gleixner
CC: Li Zefan
Signed-off-by: Jens Axboe

Mathieu Desnoyers
2011-01-07 15:53:58 +0800

16 Nov, 2010

1 commit

d07335e51 block: Rename "block_remap" tracepoint to "block_bio_remap" to clarify the event. ... Browse Code »

Signed-off-by: Kiyoshi Ueda
Signed-off-by: Jun'ichi Nomura
Signed-off-by: Mike Snitzer
Signed-off-by: Jens Axboe

Mike Snitzer
2010-11-16 19:53:39 +0800

10 Nov, 2010

1 commit

02e031cbc block: remove REQ_HARDBARRIER ... Browse Code »

REQ_HARDBARRIER is dead now, so remove the leftovers. What's left
at this point is:

- various checks inside the block layer.
- sanity checks in bio based drivers.
- now unused bio_empty_barrier helper.
- Xen blockfront use of BLKIF_OP_WRITE_BARRIER - it's dead for a while,
but Xen really needs to sort out it's barrier situaton.
- setting of ordered tags in uas - dead code copied from old scsi
drivers.
- scsi different retry for barriers - it's dead and should have been
removed when flushes were converted to FS requests.
- blktrace handling of barriers - removed. Someone who knows blktrace
better should add support for REQ_FLUSH and REQ_FUA, though.

Signed-off-by: Christoph Hellwig
Signed-off-by: Jens Axboe

Christoph Hellwig
2010-11-10 21:54:09 +0800

23 Oct, 2010

1 commit

092e0e7e5 Merge branch 'llseek' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/bkl ... Browse Code »

* 'llseek' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/bkl:
vfs: make no_llseek the default
vfs: don't use BKL in default_llseek
llseek: automatically add .llseek fop
libfs: use generic_file_llseek for simple_attr
mac80211: disallow seeks in minstrel debug code
lirc: make chardev nonseekable
viotape: use noop_llseek
raw: use explicit llseek file operations
ibmasmfs: use generic_file_llseek
spufs: use llseek in all file operations
arm/omap: use generic_file_llseek in iommu_debug
lkdtm: use generic_file_llseek in debugfs
net/wireless: use generic_file_llseek in debugfs
drm: use noop_llseek

Linus Torvalds
2010-10-23 01:52:56 +0800

19 Oct, 2010

1 commit

01b284f9b blktrace: remove the big kernel lock ... Browse Code »

According to Jens, this code does not need the BKL at all,
it is sufficiently serialized by bd_mutex.

Signed-off-by: Arnd Bergmann
Cc: Jens Axboe
Cc: Steven Rostedt

Arnd Bergmann
2010-10-19 17:29:57 +0800

15 Oct, 2010

1 commit

6038f373a llseek: automatically add .llseek fop ... Browse Code »

All file_operations should get a .llseek operation so we can make
nonseekable_open the default for future file operations without a
.llseek pointer.

The three cases that we can automatically detect are no_llseek, seq_lseek
and default_llseek. For cases where we can we can automatically prove that
the file offset is always ignored, we use noop_llseek, which maintains
the current behavior of not returning an error from a seek.

New drivers should normally not use noop_llseek but instead use no_llseek
and call nonseekable_open at open time. Existing drivers can be converted
to do the same when the maintainer knows for certain that no user code
relies on calling seek on the device file.

The generated code is often incorrectly indented and right now contains
comments that clarify for each added line why a specific variant was
chosen. In the version that gets submitted upstream, the comments will
be gone and I will manually fix the indentation, because there does not
seem to be a way to do that using coccinelle.

Some amount of new code is currently sitting in linux-next that should get
the same modifications, which I will do at the end of the merge window.

Many thanks to Julia Lawall for helping me learn to write a semantic
patch that does all this.

===== begin semantic patch =====
// This adds an llseek= method to all file operations,
// as a preparation for making no_llseek the default.
//
// The rules are
// - use no_llseek explicitly if we do nonseekable_open
// - use seq_lseek for sequential files
// - use default_llseek if we know we access f_pos
// - use noop_llseek if we know we don't access f_pos,
// but we still want to allow users to call lseek
//
@ open1 exists @
identifier nested_open;
@@
nested_open(...)
{

}

@ open exists@
identifier open_f;
identifier i, f;
identifier open1.nested_open;
@@
int open_f(struct inode *i, struct file *f)
{

}

@ read disable optional_qualifier exists @
identifier read_f;
identifier f, p, s, off;
type ssize_t, size_t, loff_t;
expression E;
identifier func;
@@
ssize_t read_f(struct file *f, char *p, size_t s, loff_t *off)
{

}

@ read_no_fpos disable optional_qualifier exists @
identifier read_f;
identifier f, p, s, off;
type ssize_t, size_t, loff_t;
@@
ssize_t read_f(struct file *f, char *p, size_t s, loff_t *off)
{
... when != off
}

@ write @
identifier write_f;
identifier f, p, s, off;
type ssize_t, size_t, loff_t;
expression E;
identifier func;
@@
ssize_t write_f(struct file *f, const char *p, size_t s, loff_t *off)
{

}

@ write_no_fpos @
identifier write_f;
identifier f, p, s, off;
type ssize_t, size_t, loff_t;
@@
ssize_t write_f(struct file *f, const char *p, size_t s, loff_t *off)
{
... when != off
}

@ fops0 @
identifier fops;
@@
struct file_operations fops = {
...
};

@ has_llseek depends on fops0 @
identifier fops0.fops;
identifier llseek_f;
@@
struct file_operations fops = {
...
.llseek = llseek_f,
...
};

@ has_read depends on fops0 @
identifier fops0.fops;
identifier read_f;
@@
struct file_operations fops = {
...
.read = read_f,
...
};

@ has_write depends on fops0 @
identifier fops0.fops;
identifier write_f;
@@
struct file_operations fops = {
...
.write = write_f,
...
};

@ has_open depends on fops0 @
identifier fops0.fops;
identifier open_f;
@@
struct file_operations fops = {
...
.open = open_f,
...
};

// use no_llseek if we call nonseekable_open
////////////////////////////////////////////
@ nonseekable1 depends on !has_llseek && has_open @
identifier fops0.fops;
identifier nso ~= "nonseekable_open";
@@
struct file_operations fops = {
... .open = nso, ...
+.llseek = no_llseek, /* nonseekable */
};

@ nonseekable2 depends on !has_llseek @
identifier fops0.fops;
identifier open.open_f;
@@
struct file_operations fops = {
... .open = open_f, ...
+.llseek = no_llseek, /* open uses nonseekable */
};

// use seq_lseek for sequential files
/////////////////////////////////////
@ seq depends on !has_llseek @
identifier fops0.fops;
identifier sr ~= "seq_read";
@@
struct file_operations fops = {
... .read = sr, ...
+.llseek = seq_lseek, /* we have seq_read */
};

// use default_llseek if there is a readdir
///////////////////////////////////////////
@ fops1 depends on !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
identifier fops0.fops;
identifier readdir_e;
@@
// any other fop is used that changes pos
struct file_operations fops = {
... .readdir = readdir_e, ...
+.llseek = default_llseek, /* readdir is present */
};

// use default_llseek if at least one of read/write touches f_pos
/////////////////////////////////////////////////////////////////
@ fops2 depends on !fops1 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
identifier fops0.fops;
identifier read.read_f;
@@
// read fops use offset
struct file_operations fops = {
... .read = read_f, ...
+.llseek = default_llseek, /* read accesses f_pos */
};

@ fops3 depends on !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
identifier fops0.fops;
identifier write.write_f;
@@
// write fops use offset
struct file_operations fops = {
... .write = write_f, ...
+ .llseek = default_llseek, /* write accesses f_pos */
};

// Use noop_llseek if neither read nor write accesses f_pos
///////////////////////////////////////////////////////////

@ fops4 depends on !fops1 && !fops2 && !fops3 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
identifier fops0.fops;
identifier read_no_fpos.read_f;
identifier write_no_fpos.write_f;
@@
// write fops use offset
struct file_operations fops = {
...
.write = write_f,
.read = read_f,
...
+.llseek = noop_llseek, /* read and write both use no f_pos */
};

@ depends on has_write && !has_read && !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
identifier fops0.fops;
identifier write_no_fpos.write_f;
@@
struct file_operations fops = {
... .write = write_f, ...
+.llseek = noop_llseek, /* write uses no f_pos */
};

@ depends on has_read && !has_write && !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
identifier fops0.fops;
identifier read_no_fpos.read_f;
@@
struct file_operations fops = {
... .read = read_f, ...
+.llseek = noop_llseek, /* read uses no f_pos */
};

@ depends on !has_read && !has_write && !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
identifier fops0.fops;
@@
struct file_operations fops = {
...
+.llseek = noop_llseek, /* no read or write fn */
};
===== End semantic patch =====

Signed-off-by: Arnd Bergmann
Cc: Julia Lawall
Cc: Christoph Hellwig

Arnd Bergmann
2010-10-15 21:53:27 +0800

12 Aug, 2010

1 commit

8d57a98cc block: add secure discard ... Browse Code »

Secure discard is the same as discard except that all copies of the
discarded sectors (perhaps created by garbage collection) must also be
erased.

Signed-off-by: Adrian Hunter
Acked-by: Jens Axboe
Cc: Kyungmin Park
Cc: Madhusudhan Chikkature
Cc: Christoph Hellwig
Cc: Ben Gardiner
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Adrian Hunter
2010-08-12 23:43:30 +0800

08 Aug, 2010

3 commits

62c2a7d96 block: push BKL into blktrace ioctls ... Browse Code »

The blktrace driver currently needs the BKL, but
we should not need to take that in the block layer,
so just push it down into the driver itself.

It is quite likely that the BKL is not actually
required in blktrace code and could be removed
in a follow-on patch.

Signed-off-by: Arnd Bergmann
Acked-by: Christoph Hellwig
Signed-off-by: Jens Axboe

Arnd Bergmann
2010-08-08 00:26:08 +0800
7b6d91dae block: unify flags for struct bio and struct request ... Browse Code »

Remove the current bio flags and reuse the request flags for the bio, too.
This allows to more easily trace the type of I/O from the filesystem
down to the block driver. There were two flags in the bio that were
missing in the requests: BIO_RW_UNPLUG and BIO_RW_AHEAD. Also I've
renamed two request flags that had a superflous RW in them.

Note that the flags are in bio.h despite having the REQ_ name - as
blkdev.h includes bio.h that is the only way to go for now.

Signed-off-by: Christoph Hellwig
Signed-off-by: Jens Axboe

Christoph Hellwig
2010-08-08 00:20:39 +0800
33659ebba block: remove wrappers for request type/flags ... Browse Code »

Remove all the trivial wrappers for the cmd_type and cmd_flags fields in
struct requests. This allows much easier grepping for different request
types instead of unwinding through macros.

Signed-off-by: Christoph Hellwig
Signed-off-by: Jens Axboe

Christoph Hellwig
2010-08-08 00:17:56 +0800

31 May, 2010

1 commit

546cf44a1 blktrace: Fix new kernel-doc warnings ... Browse Code »

Fix blktrace.c kernel-doc warnings:
Warning(kernel/trace/blktrace.c:858): No description found for parameter 'ignore'
Warning(kernel/trace/blktrace.c:890): No description found for parameter 'ignore'

Signed-off-by: Randy Dunlap
Cc: Jens Axboe
Cc: Steven Rostedt
Cc: Frederic Weisbecker
LKML-Reference:
Signed-off-by: Ingo Molnar

Randy Dunlap
2010-05-31 15:58:20 +0800

15 May, 2010

1 commit

a9a577638 tracing: Allow events to share their print functions ... Browse Code »

Multiple events may use the same method to print their data.
Instead of having all events have a pointer to their print funtions,
the trace_event structure now points to a trace_event_functions structure
that will hold the way to print ouf the event.

The event itself is now passed to the print function to let the print
function know what kind of event it should print.

This opens the door to consolidating the way several events print
their output.

text data bss dec hex filename
4913961 1088356 861512 6863829 68bbd5 vmlinux.orig
4900382 1048964 861512 6810858 67ecea vmlinux.init
4900446 1049028 861512 6810986 67ed6a vmlinux.preprint

This change slightly increases the size but is needed for the next change.

v3: Fix the branch tracer events to handle this change.

v2: Fix the new function graph tracer event calls to handle this change.

Acked-by: Mathieu Desnoyers
Acked-by: Masami Hiramatsu
Acked-by: Frederic Weisbecker
Signed-off-by: Steven Rostedt

Steven Rostedt
2010-05-15 02:20:32 +0800

14 May, 2010

1 commit

38516ab59 tracing: Let tracepoints have data passed to tracepoint callbacks ... Browse Code »

This patch adds data to be passed to tracepoint callbacks.

The created functions from DECLARE_TRACE() now need a mandatory data
parameter. For example:

DECLARE_TRACE(mytracepoint, int value, value)

Will create the register function:

int register_trace_mytracepoint((void(*)(void *data, int value))probe,
void *data);

As the first argument, all callbacks (probes) must take a (void *data)
parameter. So a callback for the above tracepoint will look like:

void myprobe(void *data, int value)
{
}

The callback may choose to ignore the data parameter.

This change allows callbacks to register a private data pointer along
with the function probe.

void mycallback(void *data, int value);

register_trace_mytracepoint(mycallback, mydata);

Then the mycallback() will receive the "mydata" as the first parameter
before the args.

A more detailed example:

DECLARE_TRACE(mytracepoint, TP_PROTO(int status), TP_ARGS(status));

/* In the C file */

DEFINE_TRACE(mytracepoint, TP_PROTO(int status), TP_ARGS(status));

[...]

trace_mytracepoint(status);

/* In a file registering this tracepoint */

int my_callback(void *data, int status)
{
struct my_struct my_data = data;
[...]
}

[...]
my_data = kmalloc(sizeof(*my_data), GFP_KERNEL);
init_my_data(my_data);
register_trace_mytracepoint(my_callback, my_data);

The same callback can also be registered to the same tracepoint as long
as the data registered is different. Note, the data must also be used
to unregister the callback:

unregister_trace_mytracepoint(my_callback, my_data);

Because of the data parameter, tracepoints declared this way can not have
no args. That is:

DECLARE_TRACE(mytracepoint, TP_PROTO(void), TP_ARGS());

will cause an error.

If no arguments are needed, a new macro can be used instead:

DECLARE_TRACE_NOARGS(mytracepoint);

Since there are no arguments, the proto and args fields are left out.

This is part of a series to make the tracepoint footprint smaller:

text data bss dec hex filename
4913961 1088356 861512 6863829 68bbd5 vmlinux.orig
4914025 1088868 861512 6864405 68be15 vmlinux.class
4918492 1084612 861512 6864616 68bee8 vmlinux.tracepoint

Again, this patch also increases the size of the kernel, but
lays the ground work for decreasing it.

v5: Fixed net/core/drop_monitor.c to handle these updates.

v4: Moved the DECLARE_TRACE() DECLARE_TRACE_NOARGS out of the
#ifdef CONFIG_TRACE_POINTS, since the two are the same in both
cases. The __DECLARE_TRACE() is what changes.
Thanks to Frederic Weisbecker for pointing this out.

v3: Made all register_* functions require data to be passed and
all callbacks to take a void * parameter as its first argument.
This makes the calling functions comply with C standards.

Also added more comments to the modifications of DECLARE_TRACE().

v2: Made the DECLARE_TRACE() have the ability to pass arguments
and added a new DECLARE_TRACE_NOARGS() for tracepoints that
do not need any arguments.

Acked-by: Mathieu Desnoyers
Acked-by: Masami Hiramatsu
Acked-by: Frederic Weisbecker
Cc: Neil Horman
Cc: David S. Miller
Signed-off-by: Steven Rostedt

Steven Rostedt
2010-05-14 21:50:34 +0800

30 Mar, 2010

1 commit

5a0e3ad6a include cleanup: Update gfp.h and slab.h includes to prepare for breaking implic… ... Browse Code »

…it slab.h inclusion from percpu.h

percpu.h is included by sched.h and module.h and thus ends up being
included when building most .c files. percpu.h includes slab.h which
in turn includes gfp.h making everything defined by the two files
universally available and complicating inclusion dependencies.

percpu.h -> slab.h dependency is about to be removed. Prepare for
this change by updating users of gfp and slab facilities include those
headers directly instead of assuming availability. As this conversion
needs to touch large number of source files, the following script is
used as the basis of conversion.

http://userweb.kernel.org/~tj/misc/slabh-sweep.py

The script does the followings.

* Scan files for gfp and slab usages and update includes such that
only the necessary includes are there. ie. if only gfp is used,
gfp.h, if slab is used, slab.h.

* When the script inserts a new include, it looks at the include
blocks and try to put the new include such that its order conforms
to its surrounding. It's put in the include block which contains
core kernel includes, in the same order that the rest are ordered -
alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
doesn't seem to be any matching order.

* If the script can't find a place to put a new include (mostly
because the file doesn't have fitting include block), it prints out
an error message indicating which .h file needs to be added to the
file.

The conversion was done in the following steps.

1. The initial automatic conversion of all .c files updated slightly
over 4000 files, deleting around 700 includes and adding ~480 gfp.h
and ~3000 slab.h inclusions. The script emitted errors for ~400
files.

2. Each error was manually checked. Some didn't need the inclusion,
some needed manual addition while adding it to implementation .h or
embedding .c file was more appropriate for others. This step added
inclusions to around 150 files.

3. The script was run again and the output was compared to the edits
from #2 to make sure no file was left behind.

4. Several build tests were done and a couple of problems were fixed.
e.g. lib/decompress_*.c used malloc/free() wrappers around slab
APIs requiring slab.h to be added manually.

5. The script was run on all .h files but without automatically
editing them as sprinkling gfp.h and slab.h inclusions around .h
files could easily lead to inclusion dependency hell. Most gfp.h
inclusion directives were ignored as stuff from gfp.h was usually
wildly available and often used in preprocessor macros. Each
slab.h inclusion directive was examined and added manually as
necessary.

6. percpu.h was updated not to include slab.h.

7. Build test were done on the following configurations and failures
were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my
distributed build env didn't work with gcov compiles) and a few
more options had to be turned off depending on archs to make things
build (like ipr on powerpc/64 which failed due to missing writeq).

* x86 and x86_64 UP and SMP allmodconfig and a custom test config.
* powerpc and powerpc64 SMP allmodconfig
* sparc and sparc64 SMP allmodconfig
* ia64 SMP allmodconfig
* s390 SMP allmodconfig
* alpha SMP allmodconfig
* um on x86_64 SMP allmodconfig

8. percpu.h modifications were reverted so that it could be applied as
a separate patch and serve as bisection point.

Given the fact that I had only a couple of failures from tests on step
6, I'm fairly confident about the coverage of this conversion patch.
If there is a breakage, it's likely to be something in one of the arch
headers which should be easily discoverable easily on most builds of
the specific arch.

Signed-off-by: Tejun Heo <tj@kernel.org>
Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>

Tejun Heo
2010-03-30 21:02:32 +0800

01 Mar, 2010

1 commit

9a8c28c83 blktrace: perform cleanup after setup error ... Browse Code »

Currently even if BLKTRACESETUP ioctl has failed user must call
BLKTRACETEARDOWN to be shure what all staff was cleaned, which
is contr-intuitive.
Let's setup ioctl make necessery cleanup by it self.

Signed-off-by: Dmitry Monakhov
Signed-off-by: Jens Axboe

Dmitry Monakhov
2010-03-01 02:47:19 +0800

02 Oct, 2009

2 commits

b0da3f0da Add a tracepoint for block request remapping ... Browse Code »

Since 2.6.31 now has request-based device-mapper, it's useful to have
a tracepoint for request-remapping as well as bio-remapping.
This patch adds a tracepoint for request-remapping, trace_block_rq_remap().

Signed-off-by: Kiyoshi Ueda
Signed-off-by: Jun'ichi Nomura
Cc: Alasdair G Kergon
Cc: Li Zefan
Signed-off-by: Jens Axboe

Jun'ichi Nomura
2009-10-02 03:19:34 +0800
48c0d4d4c Add missing blk_trace_remove_sysfs to be in pair with blk_trace_init_sysfs ... Browse Code »

Add missing blk_trace_remove_sysfs to be in pair with blk_trace_init_sysfs
introduced in commit 1d54ad6da9192fed5dd3b60224d9f2dfea0dcd82.
Release kobject also in case the request_fn is NULL.

Problem was noticed via kmemleak backtrace when some sysfs entries were
note properly destroyed during device removal:

unreferenced object 0xffff88001aa76640 (size 80):
comm "lvcreate", pid 2120, jiffies 4294885144
hex dump (first 32 bytes):
01 00 00 00 00 00 00 00 f0 65 a7 1a 00 88 ff ff .........e......
90 66 a7 1a 00 88 ff ff 86 1d 53 81 ff ff ff ff .f........S.....
backtrace:
[] kmemleak_alloc+0x26/0x60
[] kmem_cache_alloc+0x133/0x1c0
[] sysfs_new_dirent+0x41/0x120
[] sysfs_add_file_mode+0x3c/0xb0
[] internal_create_group+0xc1/0x1a0
[] sysfs_create_group+0x13/0x20
[] blk_trace_init_sysfs+0x14/0x20
[] blk_register_queue+0x3c/0xf0
[] add_disk+0x94/0x160
[] dm_create+0x598/0x6e0 [dm_mod]
[] dev_create+0x51/0x350 [dm_mod]
[] ctl_ioctl+0x1a3/0x240 [dm_mod]
[] dm_compat_ctl_ioctl+0x12/0x20 [dm_mod]
[] compat_sys_ioctl+0xcd/0x4f0
[] sysenter_dispatch+0x7/0x2c
[] 0xffffffffffffffff

Signed-off-by: Zdenek Kabelac
Reviewed-by: Li Zefan
Signed-off-by: Jens Axboe

Zdenek Kabelac
2009-10-02 03:15:46 +0800

06 Sep, 2009

1 commit

ed011b22c Merge commit 'v2.6.31-rc9' into tracing/core ... Browse Code »

Merge reason: move from -rc5 to -rc9.

Signed-off-by: Ingo Molnar

Ingo Molnar
2009-09-06 12:11:42 +0800

05 Sep, 2009

1 commit

e77405ad8 tracing: pass around ring buffer instead of tracer ... Browse Code »

The latency tracers (irqsoff and wakeup) can swap trace buffers
on the fly. If an event is happening and has reserved data on one of
the buffers, and the latency tracer swaps the global buffer with the
max buffer, the result is that the event may commit the data to the
wrong buffer.

This patch changes the API to the trace recording to be recieve the
buffer that was used to reserve a commit. Then this buffer can be passed
in to the commit.

Signed-off-by: Steven Rostedt

Steven Rostedt
2009-09-05 06:59:39 +0800

13 Aug, 2009

1 commit

39cbb602b Remove double removal of blktrace directory ... Browse Code »

commit fd51d251e4cdb21f68e9dbc4336514d64a105a79
Author: Stefan Raspl
Date: Tue May 19 09:59:08 2009 +0200

blktrace: remove debugfs entries on bad path

added in an explicit invocation of debugfs_remove for bt->dir, in
blk_remove_buf_file_callback we are also getting the directory removed. On
occasion I am seeing memory corruption that I have bisected down to
this commit. [The testing involves a (long) series of I/O benchmarks
with blktrace invoked around the actual runs.] I believe that this
committed patch is correct, but the problem actually lies in the code
in blk_remove_buf_file_callback.

With this patch I am able to consistently get complete runs whereas
previously I could not get a single run to complete.

The first part of the patch simply moves the debugfs_remove below the
relay_close: the relay_close call will remove files under bt->dir, and
so we should not remove the directory until all the files we created
have been removed. (Note: This is not sufficient to fix the problem -
the file system code has ref counts on the directoy, so our invocation
does not cause the directory to actually be removed. Nonetheless, we
should not rely upon that feature.)

Signed-off-by: Alan D. Brunelle
Signed-off-by: Jens Axboe

Alan D. Brunelle
2009-08-13 00:50:08 +0800

13 Jul, 2009

1 commit

405f55712 headers: smp_lock.h redux ... Browse Code »

* Remove smp_lock.h from files which don't need it (including some headers!)
* Add smp_lock.h to files which do need it
* Make smp_lock.h include conditional in hardirq.h
It's needed only for one kernel_locked() usage which is under CONFIG_PREEMPT

This will make hardirq.h inclusion cheaper for every PREEMPT=n config
(which includes allmodconfig/allyesconfig, BTW)

Signed-off-by: Alexey Dobriyan
Signed-off-by: Linus Torvalds

Alexey Dobriyan
2009-07-13 03:22:34 +0800

12 Jun, 2009

1 commit

c9059598e Merge branch 'for-2.6.31' of git://git.kernel.dk/linux-2.6-block ... Browse Code »

* 'for-2.6.31' of git://git.kernel.dk/linux-2.6-block: (153 commits)
block: add request clone interface (v2)
floppy: fix hibernation
ramdisk: remove long-deprecated "ramdisk=" boot-time parameter
fs/bio.c: add missing __user annotation
block: prevent possible io_context->refcount overflow
Add serial number support for virtio_blk, V4a
block: Add missing bounce_pfn stacking and fix comments
Revert "block: Fix bounce limit setting in DM"
cciss: decode unit attention in SCSI error handling code
cciss: Remove no longer needed sendcmd reject processing code
cciss: change SCSI error handling routines to work with interrupts enabled.
cciss: separate error processing and command retrying code in sendcmd_withirq_core()
cciss: factor out fix target status processing code from sendcmd functions
cciss: simplify interface of sendcmd() and sendcmd_withirq()
cciss: factor out core of sendcmd_withirq() for use by SCSI error handling code
cciss: Use schedule_timeout_uninterruptible in SCSI error handling code
block: needs to set the residual length of a bidi request
Revert "block: implement blkdev_readpages"
block: Fix bounce limit setting in DM
Removed reference to non-existing file Documentation/PCI/PCI-DMA-mapping.txt
...

Manually fix conflicts with tracing updates in:
block/blk-sysfs.c
drivers/ide/ide-atapi.c
drivers/ide/ide-cd.c
drivers/ide/ide-floppy.c
drivers/ide/ide-tape.c
include/trace/events/block.h
kernel/trace/blktrace.c

Linus Torvalds
2009-06-12 02:10:35 +0800

10 Jun, 2009

1 commit

55782138e tracing/events: convert block trace points to TRACE_EVENT() ... Browse Code »

TRACE_EVENT is a more generic way to define tracepoints. Doing so adds
these new capabilities to this tracepoint:

- zero-copy and per-cpu splice() tracing
- binary tracing without printf overhead
- structured logging records exposed under /debug/tracing/events
- trace events embedded in function tracer output and other plugins
- user-defined, per tracepoint filter expressions
...

Cons:

- no dev_t info for the output of plug, unplug_timer and unplug_io events.
no dev_t info for getrq and sleeprq events if bio == NULL.
no dev_t info for rq_abort,...,rq_requeue events if rq->rq_disk == NULL.

This is mainly because we can't get the deivce from a request queue.
But this may change in the future.

- A packet command is converted to a string in TP_assign, not TP_print.
While blktrace do the convertion just before output.

Since pc requests should be rather rare, this is not a big issue.

- In blktrace, an event can have 2 different print formats, but a TRACE_EVENT
has a unique format, which means we have some unused data in a trace entry.

The overhead is minimized by using __dynamic_array() instead of __array().

I've benchmarked the ioctl blktrace vs the splice based TRACE_EVENT tracing:

dd dd + ioctl blktrace dd + TRACE_EVENT (splice)
1 7.36s, 42.7 MB/s 7.50s, 42.0 MB/s 7.41s, 42.5 MB/s
2 7.43s, 42.3 MB/s 7.48s, 42.1 MB/s 7.43s, 42.4 MB/s
3 7.38s, 42.6 MB/s 7.45s, 42.2 MB/s 7.41s, 42.5 MB/s

So the overhead of tracing is very small, and no regression when using
those trace events vs blktrace.

And the binary output of TRACE_EVENT is much smaller than blktrace:

# ls -l -h
-rw-r--r-- 1 root root 8.8M 06-09 13:24 sda.blktrace.0
-rw-r--r-- 1 root root 195K 06-09 13:24 sda.blktrace.1
-rw-r--r-- 1 root root 2.7M 06-09 13:25 trace_splice.out

Following are some comparisons between TRACE_EVENT and blktrace:

plug:
kjournald-480 [000] 303.084981: block_plug: [kjournald]
kjournald-480 [000] 303.084981: 8,0 P N [kjournald]

unplug_io:
kblockd/0-118 [000] 300.052973: block_unplug_io: [kblockd/0] 1
kblockd/0-118 [000] 300.052974: 8,0 U N [kblockd/0] 1

remap:
kjournald-480 [000] 303.085042: block_remap: 8,0 W 102736992 + 8 v3:

- use the newly introduced __dynamic_array().

Changelog from v1 -> v2:

- use __string() instead of __array() to minimize the memory required
to store hex dump of rq->cmd().

- support large pc requests.

- add missing blk_fill_rwbs_rq() in block_rq_requeue TRACE_EVENT.

- some cleanups.

Signed-off-by: Li Zefan
LKML-Reference:
Signed-off-by: Steven Rostedt

Li Zefan
2009-06-10 00:34:23 +0800

19 May, 2009

1 commit

fd51d251e blktrace: remove debugfs entries on bad path ... Browse Code »

debugfs directory entries for devices are not removed on some
of the failure pathes in do_blk_trace_setup().
One way to reproduce is to start blktrace on multiple devices
with insufficient Vmalloc space: Devices will fail with
a message like this:

BLKTRACESETUP(2) /dev/sdu failed: 5/Input/output error

If so, the respective entries in debugfs
(e.g. /sys/kernel/debug/block/sdu) will remain and subsequent
attempts to start blktrace on the respective devices will not
succeed due to existing directories.

[ Impact: fix /debug/tracing file cleanup corner case ]

Signed-off-by: Stefan Raspl
Acked-by: Li Zefan
Cc: Li Zefan
Cc: schwidefsky@de.ibm.com
Cc: heiko.carstens@de.ibm.com
LKML-Reference:
Signed-off-by: Ingo Molnar

Stefan Raspl
2009-05-19 16:29:21 +0800

11 May, 2009

3 commits

049862579 blktrace: pdu_buf of pc events should be unsigned ... Browse Code »

I got this:
8,0 1 305.417782332 2037 I R 32 (ffffff9e 10 00 ...) [bash]

It should be:
8,0 1 305.417782332 2037 I R 32 (9e 10 00 ...) [bash]

[ Impact: fix output of pc events ]

Signed-off-by: Li Zefan
Cc: Jens Axboe
Cc: Arnaldo Carvalho de Melo
Cc: Steven Rostedt
Cc: Frederic Weisbecker
LKML-Reference:
Signed-off-by: Ingo Molnar

Li Zefan
2009-05-11 18:25:50 +0800
2e46e8b27 block: drop request->hard_* and *nr_sectors ... Browse Code »

struct request has had a few different ways to represent some
properties of a request. ->hard_* represent block layer's view of the
request progress (completion cursor) and the ones without the prefix
are supposed to represent the issue cursor and allowed to be updated
as necessary by the low level drivers. The thing is that as block
layer supports partial completion, the two cursors really aren't
necessary and only cause confusion. In addition, manual management of
request detail from low level drivers is cumbersome and error-prone at
the very least.

Another interesting duplicate fields are rq->[hard_]nr_sectors and
rq->{hard_cur|current}_nr_sectors against rq->data_len and
rq->bio->bi_size. This is more convoluted than the hard_ case.

rq->[hard_]nr_sectors are initialized for requests with bio but
blk_rq_bytes() uses it only for !pc requests. rq->data_len is
initialized for all request but blk_rq_bytes() uses it only for pc
requests. This causes good amount of confusion throughout block layer
and its drivers and determining the request length has been a bit of
black magic which may or may not work depending on circumstances and
what the specific LLD is actually doing.

rq->{hard_cur|current}_nr_sectors represent the number of sectors in
the contiguous data area at the front. This is mainly used by drivers
which transfers data by walking request segment-by-segment. This
value always equals rq->bio->bi_size >> 9. However, data length for
pc requests may not be multiple of 512 bytes and using this field
becomes a bit confusing.

In general, having multiple fields to represent the same property
leads only to confusion and subtle bugs. With recent block low level
driver cleanups, no driver is accessing or manipulating these
duplicate fields directly. Drop all the duplicates. Now rq->sector
means the current sector, rq->data_len the current total length and
rq->bio->bi_size the current segment length. Everything else is
defined in terms of these three and available only through accessors.

* blk_recalc_rq_sectors() is collapsed into blk_update_request() and
now handles pc and fs requests equally other than rq->sector update.
This means that now pc requests can use partial completion too (no
in-kernel user yet tho).

* bio_cur_sectors() is replaced with bio_cur_bytes() as block layer
now uses byte count as the primary data length.

* blk_rq_pos() is now guranteed to be always correct. In-block users
converted.

* blk_rq_bytes() is now guaranteed to be always valid as is
blk_rq_sectors(). In-block users converted.

* blk_rq_sectors() is now guaranteed to equal blk_rq_bytes() >> 9.
More convenient one is used.

* blk_rq_bytes() and blk_rq_cur_bytes() are now inlined and take const
pointer to request.

[ Impact: API cleanup, single way to represent one property of a request ]

Signed-off-by: Tejun Heo
Cc: Boaz Harrosh
Signed-off-by: Jens Axboe

Tejun Heo
2009-05-11 15:50:54 +0800
5b93629b4 block: implement blk_rq_pos/[cur_]sectors() and convert obvious ones ... Browse Code »

Implement accessors - blk_rq_pos(), blk_rq_sectors() and
blk_rq_cur_sectors() which return rq->hard_sector, rq->hard_nr_sectors
and rq->hard_cur_sectors respectively and convert direct references of
the said fields to the accessors.

This is in preparation of request data length handling cleanup.

Geert : suggested adding const to struct request * parameter to accessors
Sergei : spotted error in patch description

[ Impact: cleanup ]

Signed-off-by: Tejun Heo
Acked-by: Geert Uytterhoeven
Acked-by: Stephen Rothwell
Tested-by: Grant Likely
Acked-by: Grant Likely
Ackec-by: Sergei Shtylyov
Cc: Bartlomiej Zolnierkiewicz
Cc: Borislav Petkov
Cc: James Bottomley
Signed-off-by: Jens Axboe

Tejun Heo
2009-05-11 15:50:53 +0800

06 May, 2009

2 commits

22a7c31a9 blktrace: from-sector redundant in trace_block_remap ... Browse Code »

Remove redundant from-sector parameter: it's /always/ the bio's sector
passed in.

[ Impact: cleanup ]

Signed-off-by: Alan D. Brunelle
Reviewed-by: Li Zefan
Reviewed-by: KOSAKI Motohiro
Cc: Jens Axboe
Cc: Arnaldo Carvalho de Melo
LKML-Reference:
Signed-off-by: Ingo Molnar

Alan D. Brunelle
2009-05-06 20:13:01 +0800
a42aaa3bb blktrace: correct remap names ... Browse Code »

This attempts to clarify names utilized during block I/O remap
operations (partition, volume manager). It correctly matches up the
/from/ information for both device & sector. This takes in the concept
from Kosaki Motohiro and extends it to include better naming for the
"device_from" field.

[ Impact: cleanup ]

Signed-off-by: Alan D. Brunelle
Reviewed-by: Li Zefan
Reviewed-by: KOSAKI Motohiro
Cc: Jens Axboe
Cc: Arnaldo Carvalho de Melo
LKML-Reference:
Signed-off-by: Ingo Molnar

Alan D. Brunelle
2009-05-06 20:13:00 +0800

16 Apr, 2009

1 commit

f3948f885 blktrace: fix context-info when mixed-using blk tracer and trace events ... Browse Code »

When current tracer is set to blk tracer, TRACE_ITER_CONTEXT_INFO is
unset, but actually context-info is printed:

pdflush-431 [000] 821.181576: 8,0 P N [pdflush]

And then if we enable TRACE_ITER_CONTEXT_INFO:

# echo context-info > trace_options

We'll see context-info printed twice. What's worse, when we use blk
tracer and trace events at the same time, we'll see no context-info
for trace events at all:

jbd2_commit_logging: dev dm-0:8 transaction 333227
jbd2_end_commit: dev dm-0:8 transaction 333227 head 332814
rm-25433 [001] 9578.307485: 8,18 m N cfq25433 slice expired t=0
rm-25433 [001] 9578.307486: 8,18 m N cfq25433 put_queue

This patch adds blk_tracer->set_flags(), and context-info flag is unset
only when we set the output to classic mode.

Note after this patch, one should unset context-info explicitly if he
wants to get binary output that can be parsed by blkparse:

# echo nocontext-info > trace_options
# echo bin > trace_options
# echo blk > current_tracer
# cat trace_pipe | blkparse -i -

Reported-by: Theodore Ts'o
Signed-off-by: Li Zefan
Cc: Jens Axboe
Cc: Arnaldo Carvalho de Melo
Cc: Frederic Weisbecker
Cc: Steven Rostedt
LKML-Reference:
Signed-off-by: Ingo Molnar

Li Zefan
2009-04-16 16:11:01 +0800