Eric Lee / smarc-fsl-linux-kernel

04 Jan, 2012

1 commit

2c9ede55e switch device_get_devnode() and ->devnode() to umode_t *

Both callers of device_get_devnode() are only interested in the lower
16 bits, and nobody tries to return anything wider than 16 bits anyway.

Signed-off-by: Al Viro

Al Viro
2012-01-04 11:54:55 +0800

10 Nov, 2011

1 commit

d0985394e block: Revert "[SCSI] genhd: add a new attribute "alias" in gendisk"

This reverts commit a72c5e5eb738033938ab30d6a634b74d1d060f10.

The commit introduced alias for block devices which is intended to be
used during logging although actual usage hasn't been committed yet.
This approach adds very limited benefit (raw log might be easier to
follow) which can be trivially implemented in userland but has a lot
of problems.

It is much worse than netif renames because it doesn't rename the
actual device but just adds a convenience name which isn't used
universally or enforced. Everything internal including device lookup
and sysfs still uses the internal name and nothing prevents two
devices from using conflicting alias - ie. sda can have sdb as its
alias.

This has been nacked by people working on device driver core, block
layer and kernel-userland interface and shouldn't have been
upstreamed. Revert it.

http://thread.gmane.org/gmane.linux.kernel/1155104
http://thread.gmane.org/gmane.linux.scsi/68632
http://thread.gmane.org/gmane.linux.scsi/69776

Signed-off-by: Tejun Heo
Acked-by: Greg Kroah-Hartman
Acked-by: Kay Sievers
Cc: "James E.J. Bottomley"
Cc: Nao Nishijima
Cc: Alan Cox
Cc: Al Viro
Signed-off-by: Jens Axboe

Tejun Heo
2011-11-10 16:03:55 +0800

05 Nov, 2011

1 commit

3d0a8d10c Merge branch 'for-3.2/drivers' of git://git.kernel.dk/linux-block

* 'for-3.2/drivers' of git://git.kernel.dk/linux-block: (30 commits)
virtio-blk: use ida to allocate disk index
hpsa: add small delay when using PCI Power Management to reset for kump
cciss: add small delay when using PCI Power Management to reset for kump
xen/blkback: Fix two races in the handling of barrier requests.
xen/blkback: Check for proper operation.
xen/blkback: Fix the inhibition to map pages when discarding sector ranges.
xen/blkback: Report VBD_WSECT (wr_sect) properly.
xen/blkback: Support 'feature-barrier' aka old-style BARRIER requests.
xen-blkfront: plug device number leak in xlblk_init() error path
xen-blkfront: If no barrier or flush is supported, use invalid operation.
xen-blkback: use kzalloc() in favor of kmalloc()+memset()
xen-blkback: fixed indentation and comments
xen-blkfront: fix a deadlock while handling discard response
xen-blkfront: Handle discard requests.
xen-blkback: Implement discard requests ('feature-discard')
xen-blkfront: add BLKIF_OP_DISCARD and discard request struct
drivers/block/loop.c: remove unnecessary bdev argument from loop_clr_fd()
drivers/block/loop.c: emit uevent on auto release
drivers/block/cpqarray.c: use pci_dev->revision
loop: always allow userspace partitions and optionally support automatic scanning
...

Fix up trivial header file inclusion conflict in drivers/block/loop.c

Linus Torvalds
2011-11-05 08:22:14 +0800

29 Aug, 2011

1 commit

a72c5e5eb [SCSI] genhd: add a new attribute "alias" in gendisk

This patch allows the user to set an "alias" of the disk via sysfs interface.

This patch only adds a new attribute "alias" in gendisk structure.
To show the alias instead of the device name in kernel messages,
we need to revise printk messages and use alias_name() in them.

Example:
(current) printk("disk name is %s\n", disk->disk_name);
(new) printk("disk name is %s\n", alias_name(disk));

Users can use letters, digits, '-' and '_' in the "alias" attribute. A disk can
have an "alias" whose length is up to 255 bytes. This attribute is write-once.
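
The character and length rules above can be sketched as a small userspace
check. This is only an illustration: alias_is_valid() and ALIAS_MAX_LEN are
hypothetical names, not functions from the patch, and rejecting an empty
alias is an assumption.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Limit taken from the commit text: an alias is up to 255 bytes long. */
#define ALIAS_MAX_LEN 255

/*
 * Hypothetical helper: accept only letters, digits, '-' and '_'.
 * Treating the empty string as invalid is an assumption of this sketch.
 */
static bool alias_is_valid(const char *s)
{
    size_t len = strlen(s);

    if (len == 0 || len > ALIAS_MAX_LEN)
        return false;

    for (size_t i = 0; i < len; i++) {
        unsigned char c = (unsigned char)s[i];

        if (!isalnum(c) && c != '-' && c != '_')
            return false;
    }
    return true;
}
```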

Suggested-by: James Bottomley
Suggested-by: Jon Masters
Signed-off-by: Nao Nishijima
Signed-off-by: James Bottomley

Nao Nishijima
2011-08-29 15:16:19 +0800

24 Aug, 2011

1 commit

d27769ec3 block: add GENHD_FL_NO_PART_SCAN

There are cases where suppressing partition scan is useful - e.g. for
lo devices and pseudo SATA devices which advertise to be a disk but
get upset on partition scan (some port multiplier control devices show
such behavior).

This patch adds GENHD_FL_NO_PART_SCAN which suppresses partition scan
regardless of the number of possible partitions. disk_partitionable()
is renamed to disk_part_scan_enabled() as suppressing partition scan
doesn't imply the device can't be partitioned using
BLKPG_ADD/DEL_PARTITION calls from userland. show_partition() now
directly tests disk_max_parts() to maintain backward-compatibility.
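
The distinction between "can be partitioned" and "gets scanned" can be
sketched in userspace as follows. This is a simplified model, not the kernel
source: struct disk_model is a hypothetical stand-in for struct gendisk, the
flag value is illustrative, and disk_max_parts() is reduced to returning the
minor count.

```c
#include <stdbool.h>

/* Illustrative value; the real definition lives in include/linux/genhd.h. */
#define GENHD_FL_NO_PART_SCAN 0x200

/* Hypothetical stand-in for struct gendisk with only the fields used here. */
struct disk_model {
    int minors; /* minors reserved for the disk; 1 leaves no partition slots */
    int flags;
};

/* Simplified: the number of possible partitions is the minor count. */
static int disk_max_parts(const struct disk_model *d)
{
    return d->minors;
}

/* Scanning needs partition slots AND the absence of the new flag. */
static bool disk_part_scan_enabled(const struct disk_model *d)
{
    return disk_max_parts(d) > 1 && !(d->flags & GENHD_FL_NO_PART_SCAN);
}
```

A disk with the flag set still reports its full disk_max_parts() value, which
is why userland can keep adding partitions with BLKPG calls.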

-v2: Updated to make it clear that only partition scan is suppressed
not partitioning itself as suggested by Kay Sievers.

Signed-off-by: Tejun Heo
Cc: Kay Sievers
Signed-off-by: Jens Axboe

Tejun Heo
2011-08-24 02:01:04 +0800

01 Jul, 2011

1 commit

85ef06d1d block: flush MEDIA_CHANGE from drivers on close(2)

Currently, only open(2) is defined as the 'clearing' point. It has
two roles - first, it's an acknowledgement from userland indicating
that the event has been received and kernel can clear pending states
and proceed to generate more events. Secondly, it's passed on to
device drivers as a hint indicating that a synchronization point has
been reached and it might want to take a deeper look at the device.

The latter currently is only used by sr which uses two different
mechanisms - GET_EVENT_MEDIA_STATUS_NOTIFICATION and TEST_UNIT_READY
to discover events, where the former is lighter weight and safe to be
used repeatedly but may not provide full coverage. Among other
things, GET_EVENT can't detect media removal while TUR can.

This patch makes close(2) - blkdev_put() - indicate clearing hint for
MEDIA_CHANGE to drivers. disk_check_events() is renamed to
disk_flush_events() and updated to take @mask for events to flush
which is or'd to ev->clearing and will be passed to the driver on the
next ->check_events() invocation.
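
A minimal userspace sketch of the hint accumulation described above. The
struct and the helper names are hypothetical; the real disk_flush_events()
also takes a lock and schedules an event check, which is left out here.

```c
/* Event bits as named in the commit messages on this page. */
#define DISK_EVENT_MEDIA_CHANGE  (1u << 0)
#define DISK_EVENT_EJECT_REQUEST (1u << 1)

/* Hypothetical stand-in for the per-disk event state. */
struct events_model {
    unsigned int clearing; /* hints pending for the next ->check_events() */
};

/* close(2) path: OR the requested mask into the pending clearing hint. */
static void flush_events_model(struct events_model *ev, unsigned int mask)
{
    ev->clearing |= mask;
}

/* Next check: hand the accumulated hint to the driver and reset it. */
static unsigned int take_clearing_model(struct events_model *ev)
{
    unsigned int clearing = ev->clearing;

    ev->clearing = 0;
    return clearing;
}
```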

This change makes sr generate MEDIA_CHANGE when media is ejected from
userland - e.g. with eject(1).

Note: Given the current usage, it seems @clearing hint is needlessly
complex. disk_clear_events() can simply clear all events and the hint
can be boolean @flush.

Signed-off-by: Tejun Heo
Cc: Kay Sievers
Signed-off-by: Jens Axboe

Tejun Heo
2011-07-01 22:17:47 +0800

30 May, 2011

1 commit

a1706ac4c Revert "block: Remove extra discard_alignment from hd_struct."

It was not a good idea to start dereferencing disk->queue from
the fs sysfs strategy for displaying discard alignment. We ran
into first a NULL pointer deref, and after fixing that we sometimes
see invalid disk->queue pointer values.

Since discard is the only one of the bunch actually looking into
the queue, just revert the change.

This reverts commit 23ceb5b7719e9276d4fa72a3ecf94dd396755276.

Conflicts:
fs/partitions/check.c

Jens Axboe
2011-05-30 13:42:51 +0800

07 May, 2011

1 commit

23ceb5b77 block: Remove extra discard_alignment from hd_struct.

Currently, hd_struct.discard_alignment is only used when we
show /sys/block/sdx/sdx/discard_alignment. So remove it and
calculate when it is asked to show.

Signed-off-by: Tao Ma
Signed-off-by: Jens Axboe

Tao Ma
2011-05-07 09:30:02 +0800

22 Apr, 2011

1 commit

d4dc210f6 block: don't block events on excl write for non-optical devices

Disk event code automatically blocks events on excl write. This is
primarily to avoid issuing polling commands while burning is in
progress. This behavior doesn't fit other types of devices with
removable media where polling commands don't have adverse side
effects and door locking usually doesn't exist.

This patch introduces new genhd flag which controls the auto-blocking
behavior and uses it to enable auto-blocking only on optical devices.

Note for stable: 2.6.38 and later only

Cc: stable@kernel.org
Signed-off-by: Tejun Heo
Reported-by: Kay Sievers
Signed-off-by: Jens Axboe

Tejun Heo
2011-04-22 02:54:46 +0800

22 Mar, 2011

1 commit

1e9bb8808 block: fix non-atomic access to genhd inflight structures

After the stack plugging introduction, these are called lockless.
Ensure that the counters are updated atomically.
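
As a rough userspace analogue, lockless counter updates can be made safe
with C11 atomics. This is a sketch under assumptions: struct part_model is
hypothetical, and the kernel uses its own atomic_t helpers rather than
<stdatomic.h>.

```c
#include <stdatomic.h>

/* Hypothetical stand-in for the per-partition in-flight counters. */
struct part_model {
    atomic_int in_flight[2]; /* [0] = reads, [1] = writes */
};

/* Atomic read-modify-write, safe without holding a lock. */
static void part_inc_in_flight(struct part_model *p, int rw)
{
    atomic_fetch_add(&p->in_flight[rw], 1);
}

static void part_dec_in_flight(struct part_model *p, int rw)
{
    atomic_fetch_sub(&p->in_flight[rw], 1);
}

/* Total requests in flight; each counter is loaded atomically. */
static int part_in_flight(struct part_model *p)
{
    return atomic_load(&p->in_flight[0]) + atomic_load(&p->in_flight[1]);
}
```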

Signed-off-by: Shaohua Li
Signed-off-by: Jens Axboe

Shaohua Li
2011-03-22 15:35:35 +0800

13 Jan, 2011

1 commit

81c5e2ae3 Merge branch 'for-2.6.38/event-handling' into for-2.6.38/core

Jens Axboe
2011-01-13 21:47:54 +0800

07 Jan, 2011

1 commit

6c23a9681 block: add internal hd part table references

We can't use krefs since it's apparently restricted to very basic
reference counting.

This reverts commit e4a683c8.

Signed-off-by: Jens Axboe

Jens Axboe
2011-01-07 15:43:37 +0800

05 Jan, 2011

1 commit

09e099d4b block: fix accounting bug on cross partition merges

/proc/diskstats would display a strange output as follows.

$ cat /proc/diskstats |grep sda
8 0 sda 90524 7579 102154 20464 0 0 0 0 0 14096 20089
8 1 sda1 19085 1352 21841 4209 0 0 0 0 4294967064 15689 4293424691
~~~~~~~~~~
8 2 sda2 71252 3624 74891 15950 0 0 0 0 232 23995 1562390
8 3 sda3 54 487 2188 92 0 0 0 0 0 88 92
8 4 sda4 4 0 8 0 0 0 0 0 0 0 0
8 5 sda5 81 2027 2130 138 0 0 0 0 0 87 137

The cause is incorrect accounting of hd_struct->in_flight when a bio is merged
into a request that belongs to a different partition by ELEVATOR_FRONT_MERGE.

The detailed root cause is as follows.

Assume that there are two partitions, sda1 and sda2.

1. A request for sda2 is in request_queue. Hence sda1's hd_struct->in_flight
is 0 and sda2's one is 1.

| hd_struct->in_flight
---------------------------
sda1 | 0
sda2 | 1
---------------------------

2. A bio that belongs to sda1 is issued and is merged into the request mentioned
in step 1 by ELEVATOR_FRONT_MERGE. The first sector of the request is changed
from the sda2 region to the sda1 region. However, the two partitions'
hd_struct->in_flight counters are not changed.

| hd_struct->in_flight
---------------------------
sda1 | 0
sda2 | 1
---------------------------

3. The request is finished and blk_account_io_done() is called. In this case,
sda1's hd_struct->in_flight, not sda2's, is decremented.

| hd_struct->in_flight
---------------------------
sda1 | -1
sda2 | 1
---------------------------

The patch fixes the problem by caching the partition lookup
inside the request structure, hence making sure that the increment
and decrement will always happen on the same partition struct. This
also speeds up IO with accounting enabled, since it cuts down on
the number of lookups we have to do.

Also add a refcount to struct hd_struct to keep the partition in
memory as long as users exist. We use kref_test_and_get() to ensure
we don't add a reference to a partition which is going away.
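
The three steps above, and the effect of caching the partition in the
request, can be modelled in userspace. This is a simplified sketch: the
structs and function names are hypothetical, and locking and reference
counting are omitted.

```c
#include <stddef.h>

/* Hypothetical partition: sector range plus in-flight counter. */
struct part_model {
    unsigned long start, nr;
    int in_flight;
};

/* Hypothetical request: current first sector plus cached partition. */
struct req_model {
    unsigned long sector;
    struct part_model *part;
};

/* Linear lookup of the partition that contains a sector. */
static struct part_model *lookup(struct part_model *tbl, size_t n,
                                 unsigned long sector)
{
    for (size_t i = 0; i < n; i++)
        if (sector >= tbl[i].start && sector < tbl[i].start + tbl[i].nr)
            return &tbl[i];
    return NULL;
}

/* Start of I/O: look up once, cache the result, increment. */
static void account_start(struct req_model *rq, struct part_model *tbl,
                          size_t n)
{
    rq->part = lookup(tbl, n, rq->sector);
    if (rq->part)
        rq->part->in_flight++;
}

/* Old behaviour: look up again at completion; wrong after a front merge. */
static void account_done_lookup(struct req_model *rq,
                                struct part_model *tbl, size_t n)
{
    struct part_model *p = lookup(tbl, n, rq->sector);

    if (p)
        p->in_flight--;
}

/* Fixed behaviour: decrement the partition cached at start. */
static void account_done_cached(struct req_model *rq)
{
    if (rq->part)
        rq->part->in_flight--;
}
```

With sda1 covering sectors 0-99 and sda2 covering 100-199, starting a
request at sector 150 and then moving its first sector to 90 reproduces the
-1/1 table with account_done_lookup(), while account_done_cached() returns
both counters to 0.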

Signed-off-by: Jerome Marchand
Signed-off-by: Yasuaki Ishimatsu
Cc: stable@kernel.org
Signed-off-by: Jens Axboe

Jerome Marchand
2011-01-05 23:57:38 +0800

17 Dec, 2010

3 commits

77ea887e4 implement in-kernel gendisk events handling

Currently, media presence polling for removable block devices is done
from userland. There are several issues with this.

* Polling is done by periodically opening the device. For SCSI
devices, the command sequence generated by such action involves a
few different commands including TEST_UNIT_READY. This behavior,
while perfectly legal, is different from Windows which only issues
a single command, GET_EVENT_STATUS_NOTIFICATION. Unfortunately, some
ATAPI devices lock up after being periodically queried with such
command sequences.

* There is no reliable and unintrusive way for a userland program to
tell whether the target device is safe for media presence polling.
For example, polling for media presence during an on-going burning
session can make it fail. The polling program can avoid this by
opening the device with O_EXCL but then it risks making a valid
exclusive user of the device fail w/ -EBUSY.

* Userland polling is unnecessarily heavy and in-kernel implementation
is lighter and better coordinated (workqueue, timer slack).

This patch implements framework for in-kernel disk event handling,
which includes media presence polling.

* bdops->check_events() is added, which supersedes ->media_changed().
It should check whether there's any pending event and return it if so.
Currently, two events are defined - DISK_EVENT_MEDIA_CHANGE and
DISK_EVENT_EJECT_REQUEST. ->check_events() is guaranteed not to be
called concurrently.

* gendisk->events and ->async_events are added. These should be
initialized by block driver before passing the device to add_disk().
The former contains the mask of all supported events and the latter
the mask of all events which the device can report without polling.
/sys/block/*/events[_async] export these to userland.

* Kernel parameter block.events_dfl_poll_msecs controls the system
polling interval (default is 0 which means disable) and
/sys/block/*/events_poll_msecs control polling intervals for
individual devices (default is -1 meaning use system setting). Note
that if a device can report all supported events asynchronously and
its polling interval isn't explicitly set, the device won't be
polled regardless of the system polling interval.

* If a device is opened exclusively with write access, event checking
is automatically disabled until all write exclusive accesses are
released.

* There are event 'clearing' events. For example, both of currently
defined events are cleared after the device has been successfully
opened. This information is passed to ->check_events() callback
using @clearing argument as a hint.

* Event checking is always performed from system_nrt_wq and timer
slack is set to 25% for polling.

* Nothing changes for drivers which implement ->media_changed() but
not ->check_events(). Going forward, all drivers will be converted
to ->check_events() and ->media_changed() will be dropped.
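
The polling-interval rules in the list above can be sketched as a small
decision function. This is a userspace model under assumptions: the struct
and effective_poll_msecs() are hypothetical names, and a return value of 0
stands for "do not poll".

```c
/* Hypothetical stand-in for the per-disk event configuration. */
struct disk_events_model {
    unsigned int events;       /* mask of all supported events */
    unsigned int async_events; /* events reported without polling */
    long poll_msecs;           /* per-device setting, -1 = use system value */
};

/*
 * Effective interval in milliseconds, 0 meaning polling is disabled.
 * dfl_poll_msecs models the block.events_dfl_poll_msecs parameter.
 */
static unsigned long effective_poll_msecs(const struct disk_events_model *d,
                                          unsigned long dfl_poll_msecs)
{
    /* An explicit per-device setting always wins. */
    if (d->poll_msecs >= 0)
        return (unsigned long)d->poll_msecs;

    /* Fall back to the system value only if some event needs polling. */
    if (d->events & ~d->async_events)
        return dfl_poll_msecs;

    return 0;
}
```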

Signed-off-by: Tejun Heo
Cc: Kay Sievers
Cc: Jan Kara
Signed-off-by: Jens Axboe

Tejun Heo
2010-12-17 00:53:38 +0800

d2bf1b672 block: move register_disk() and del_gendisk() to block/genhd.c

There's no reason for register_disk() and del_gendisk() to be in
fs/partitions/check.c. Move both to genhd.c. While at it, collapse
unlink_gendisk(), which was artificially in a separate function due to
genhd.c / check.c split, into del_gendisk().

Signed-off-by: Tejun Heo
Signed-off-by: Jens Axboe

Tejun Heo
2010-12-17 00:53:38 +0800

dddd9dc34 block: kill genhd_media_change_notify()

There's no user of the facility. Kill it.

Signed-off-by: Tejun Heo
Signed-off-by: Jens Axboe

Tejun Heo
2010-12-17 00:53:38 +0800

25 Oct, 2010

1 commit

f253b86b4 Revert "block: fix accounting bug on cross partition merges"

This reverts commit 7681bfeeccff5efa9eb29bf09249a3c400b15327.

Conflicts:

include/linux/genhd.h

It has numerous issues with the cleanup path and non-elevator
devices. Revert it for now so we can come up with a clean
version without rushing things.

Signed-off-by: Jens Axboe

Jens Axboe
2010-10-25 04:06:02 +0800

23 Oct, 2010

1 commit

e9dd2b683 Merge branch 'for-2.6.37/core' of git://git.kernel.dk/linux-2.6-block

* 'for-2.6.37/core' of git://git.kernel.dk/linux-2.6-block: (39 commits)
cfq-iosched: Fix a gcc 4.5 warning and put some comments
block: Turn bvec_k{un,}map_irq() into static inline functions
block: fix accounting bug on cross partition merges
block: Make the integrity mapped property a bio flag
block: Fix double free in blk_integrity_unregister
block: Ensure physical block size is unsigned int
blkio-throttle: Fix possible multiplication overflow in iops calculations
blkio-throttle: limit max iops value to UINT_MAX
blkio-throttle: There is no need to convert jiffies to milli seconds
blkio-throttle: Fix link failure failure on i386
blkio: Recalculate the throttled bio dispatch time upon throttle limit change
blkio: Add root group to td->tg_list
blkio: deletion of a cgroup was causes oops
blkio: Do not export throttle files if CONFIG_BLK_DEV_THROTTLING=n
block: set the bounce_pfn to the actual DMA limit rather than to max memory
block: revert bad fix for memory hotplug causing bounces
Fix compile error in blk-exec.c for !CONFIG_DETECT_HUNG_TASK
block: set the bounce_pfn to the actual DMA limit rather than to max memory
block: Prevent hang_check firing during long I/O
cfq: improve fsync performance for small files
...

Fix up trivial conflicts due to __rcu sparse annotation in include/linux/genhd.h

Linus Torvalds
2010-10-23 08:00:32 +0800

19 Oct, 2010

1 commit

7681bfeec block: fix accounting bug on cross partition merges

/proc/diskstats would display a strange output as follows.

$ cat /proc/diskstats |grep sda
8 0 sda 90524 7579 102154 20464 0 0 0 0 0 14096 20089
8 1 sda1 19085 1352 21841 4209 0 0 0 0 4294967064 15689 4293424691
~~~~~~~~~~
8 2 sda2 71252 3624 74891 15950 0 0 0 0 232 23995 1562390
8 3 sda3 54 487 2188 92 0 0 0 0 0 88 92
8 4 sda4 4 0 8 0 0 0 0 0 0 0 0
8 5 sda5 81 2027 2130 138 0 0 0 0 0 87 137

The cause is incorrect accounting of hd_struct->in_flight when a bio is merged
into a request that belongs to a different partition by ELEVATOR_FRONT_MERGE.

The detailed root cause is as follows.

Assume that there are two partitions, sda1 and sda2.

1. A request for sda2 is in request_queue. Hence sda1's hd_struct->in_flight
is 0 and sda2's one is 1.

| hd_struct->in_flight
---------------------------
sda1 | 0
sda2 | 1
---------------------------

2. A bio that belongs to sda1 is issued and is merged into the request mentioned
in step 1 by ELEVATOR_FRONT_MERGE. The first sector of the request is changed
from the sda2 region to the sda1 region. However, the two partitions'
hd_struct->in_flight counters are not changed.

| hd_struct->in_flight
---------------------------
sda1 | 0
sda2 | 1
---------------------------

3. The request is finished and blk_account_io_done() is called. In this case,
sda1's hd_struct->in_flight, not sda2's, is decremented.

| hd_struct->in_flight
---------------------------
sda1 | -1
sda2 | 1
---------------------------

The patch fixes the problem by caching the partition lookup
inside the request structure, hence making sure that the increment
and decrement will always happen on the same partition struct. This
also speeds up IO with accounting enabled, since it cuts down on
the number of lookups we have to do.

When reloading partition tables, quiesce IO to ensure that no
request references to the partition struct exist. When it is safe
to free the partition table, the IO for that device is restarted
again.
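
The value 4294967064 in the sda1 row of the /proc/diskstats output above is
what a negative count looks like when read as an unsigned 32-bit number:
2^32 - 232 = 4294967064, mirroring the 232 shown for sda2. A minimal sketch
of that conversion (the helper name is hypothetical, and the 32-bit width is
an assumption based on the printed value):

```c
#include <stdint.h>

/* Reinterpret a signed 32-bit counter the way an unsigned print would. */
static uint32_t as_unsigned32(int32_t value)
{
    return (uint32_t)value; /* conversion is modulo 2^32 */
}
```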

Signed-off-by: Yasuaki Ishimatsu
Cc: stable@kernel.org
Signed-off-by: Jens Axboe

Yasuaki Ishimatsu
2010-10-19 15:07:02 +0800

15 Sep, 2010

1 commit

6d1d8050b block, partition: add partition_meta_info to hd_struct

I'm reposting this patch series as v4 since there have been no additional
comments, and I cleaned up one extra bit of unneeded code (in 3/3). The patches
are against Linus's tree: 2bfc96a127bc1cc94d26bfaa40159966064f9c8c
(2.6.36-rc3).

Would this patchset be suitable for inclusion in an mm branch?

This change adds a partition_meta_info struct which itself contains a
union of structures that provide partition table specific metadata.

This change leaves the union empty. The subsequent patch includes an
implementation for CONFIG_EFI_PARTITION-based metadata.

Signed-off-by: Will Drewry
Signed-off-by: Jens Axboe

Will Drewry
2010-09-15 22:13:18 +0800

20 Aug, 2010

1 commit

4d2deb40b kernel: __rcu annotations

This adds annotations for RCU operations in core kernel components

Signed-off-by: Arnd Bergmann
Signed-off-by: Paul E. McKenney
Cc: Al Viro
Cc: Jens Axboe
Cc: Andrew Morton
Reviewed-by: Josh Triplett

Arnd Bergmann
2010-08-20 08:18:03 +0800

16 Mar, 2010

1 commit

97fedbbe1 Remove GENHD_FL_DRIVERFS

This flag is not used, so best discarded.

Signed-off-by: NeilBrown
--
Hi Jens,
I came across this recently - these are the only two occurrences
of "GENHD_FL_DRIVERFS" in the kernel, so it cannot be needed.
NeilBrown
Signed-off-by: Jens Axboe

NeilBrown
2010-03-16 15:55:32 +0800

17 Feb, 2010

1 commit

43cf38eb5 percpu: add __percpu sparse annotations to core kernel subsystems

Add __percpu sparse annotations to core subsystems.

These annotations are to make sparse consider percpu variables to be
in a different address space and warn if accessed without going
through percpu accessors. This patch doesn't affect normal builds.

Signed-off-by: Tejun Heo
Reviewed-by: Christoph Lameter
Acked-by: Paul E. McKenney
Cc: Jens Axboe
Cc: linux-mm@kvack.org
Cc: Rusty Russell
Cc: Dipankar Sarma
Cc: Peter Zijlstra
Cc: Andrew Morton
Cc: Eric Biederman

Tejun Heo
2010-02-17 10:17:38 +0800

11 Jan, 2010

1 commit

7af92f875 genhd: overlapping variable definition

This fixes the sparse warning:
fs/ext4/super.c:2390:40: warning: symbol 'i' shadows an earlier one
fs/ext4/super.c:2368:22: originally declared here

Using 'i' in a macro is dubious practice.
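
Why a generic name such as 'i' is risky inside a macro can be shown with a
small, made-up example. Neither macro below is from genhd.h; they only
illustrate the shadowing problem and one common way to avoid it.

```c
/*
 * Risky: the macro declares its own 'i'. If the caller passes an
 * expression that mentions the caller's 'i' (for example as n), that
 * expression silently refers to the macro's variable instead.
 */
#define SUM_ARRAY_RISKY(arr, n, out)            \
    do {                                        \
        int i;                                  \
        (out) = 0;                              \
        for (i = 0; i < (n); i++)               \
            (out) += (arr)[i];                  \
    } while (0)

/* Safer: a prefixed name that callers are unlikely to use themselves. */
#define SUM_ARRAY_SAFE(arr, n, out)                 \
    do {                                            \
        int _sum_idx;                               \
        (out) = 0;                                  \
        for (_sum_idx = 0; _sum_idx < (n); _sum_idx++) \
            (out) += (arr)[_sum_idx];               \
    } while (0)
```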

Signed-off-by: Stephen Hemminger
Signed-off-by: Jens Axboe

Stephen Hemminger
2010-01-11 21:32:44 +0800

10 Nov, 2009

1 commit

86b372814 block: Expose discard granularity

While SSDs track block usage on a per-sector basis, RAID arrays often
have allocation blocks that are bigger. Allow the discard granularity
and alignment to be set and teach the topology stacking logic how to
handle them.

Signed-off-by: Martin K. Petersen
Signed-off-by: Jens Axboe

Martin K. Petersen
2009-11-10 18:50:21 +0800

07 Oct, 2009

1 commit

316d315bf block: Seperate read and write statistics of in_flight requests v2

Commit a9327cac440be4d8333bba975cbbf76045096275 added separate read
and write statistics of in_flight requests, and exported the number
of read and write requests in progress separately through sysfs.

But Corrado Zoccolo reported getting strange
output from "iostat -kx 2". Global values for service time and
utilization were garbage. For interval values, utilization was always
100%, and service time is higher than normal.

So this was reverted by commit 0f78ab9899e9d6acb09d5465def618704255963b

The problem was in part_round_stats_single(), I missed the following:
     if (now == part->stamp)
             return;

-    if (part->in_flight) {
+    if (part_in_flight(part)) {
             __part_stat_add(cpu, part, time_in_queue,
                             part_in_flight(part) * (now - part->stamp));
             __part_stat_add(cpu, part, io_ticks, (now - part->stamp));

With this chunk included, the reported regression gets fixed.
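
The effect of the missed hunk can be seen in a userspace model of the
rounding step. This is a sketch, not the kernel function: the struct is a
hypothetical stand-in, and per-CPU statistics are replaced by plain fields.

```c
/* Hypothetical stand-in for a partition with split in-flight counters. */
struct part_stats_model {
    unsigned long stamp;         /* time of the last update */
    int in_flight[2];            /* [0] = reads, [1] = writes */
    unsigned long time_in_queue;
    unsigned long io_ticks;
};

/* After the split, "any I/O in flight" means the sum of both counters. */
static int part_in_flight(const struct part_stats_model *p)
{
    return p->in_flight[0] + p->in_flight[1];
}

/* Account elapsed time only while at least one request is in flight. */
static void part_round_stats_model(struct part_stats_model *p,
                                   unsigned long now)
{
    if (now == p->stamp)
        return;

    if (part_in_flight(p)) {
        p->time_in_queue += (unsigned long)part_in_flight(p) * (now - p->stamp);
        p->io_ticks += now - p->stamp;
    }
    p->stamp = now;
}
```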

Signed-off-by: Nikanth Karthikesan

--
Signed-off-by: Jens Axboe

Nikanth Karthikesan
2009-10-07 02:16:55 +0800

05 Oct, 2009

1 commit

0f78ab989 Revert "Seperate read and write statistics of in_flight requests"

This reverts commit a9327cac440be4d8333bba975cbbf76045096275.

Corrado Zoccolo reports:

"with 2.6.32-rc1 I started getting the following strange output from
"iostat -kx 2":
Linux 2.6.31bisect (et2) 04/10/2009 _i686_ (2 CPU)

avg-cpu: %user %nice %system %iowait %steal %idle
10,70 0,00 3,16 15,75 0,00 70,38

Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s
avgrq-sz avgqu-sz await svctm %util
sda 18,22 0,00 0,67 0,01 14,77 0,02
43,94 0,01 10,53 39043915,03 2629219,87
sdb 60,89 9,68 50,79 3,04 1724,43 50,52
65,95 0,70 13,06 488437,47 2629219,87

avg-cpu: %user %nice %system %iowait %steal %idle
2,72 0,00 0,74 0,00 0,00 96,53

Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s
avgrq-sz avgqu-sz await svctm %util
sda 0,00 0,00 0,00 0,00 0,00 0,00
0,00 0,00 0,00 0,00 100,00
sdb 0,00 0,00 0,00 0,00 0,00 0,00
0,00 0,00 0,00 0,00 100,00

avg-cpu: %user %nice %system %iowait %steal %idle
6,68 0,00 0,99 0,00 0,00 92,33

Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s
avgrq-sz avgqu-sz await svctm %util
sda 0,00 0,00 0,00 0,00 0,00 0,00
0,00 0,00 0,00 0,00 100,00
sdb 0,00 0,00 0,00 0,00 0,00 0,00
0,00 0,00 0,00 0,00 100,00

avg-cpu: %user %nice %system %iowait %steal %idle
4,40 0,00 0,73 1,47 0,00 93,40

Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s
avgrq-sz avgqu-sz await svctm %util
sda 0,00 0,00 0,00 0,00 0,00 0,00
0,00 0,00 0,00 0,00 100,00
sdb 0,00 4,00 0,00 3,00 0,00 28,00
18,67 0,06 19,50 333,33 100,00

Global values for service time and utilization are garbage. For
interval values, utilization is always 100%, and service time is
higher than normal.

I bisected it down to:
[a9327cac440be4d8333bba975cbbf76045096275] Seperate read and write
statistics of in_flight requests
and verified that reverting just that commit indeed solves the issue
on 2.6.32-rc1."

So until this is debugged, revert the bad commit.

Signed-off-by: Jens Axboe

Jens Axboe
2009-10-05 03:04:38 +0800

22 Sep, 2009

1 commit

83d5cde47 const: make block_device_operations const

Signed-off-by: Alexey Dobriyan
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Alexey Dobriyan
2009-09-22 22:17:25 +0800

20 Sep, 2009

1 commit

e454cea20 Driver-Core: extend devnode callbacks to provide permissions

This allows subsystems to provide devtmpfs with non-default permissions
for the device node. Instead of the default mode of 0600, null, zero,
random, urandom, full, tty, ptmx now have a mode of 0666, which allows
non-privileged processes to access standard device nodes in case no
other userspace process applies the expected permissions.

This also fixes a wrong assignment in pktcdvd and a checkpatch.pl complaint.

Signed-off-by: Kay Sievers
Signed-off-by: Greg Kroah-Hartman

Kay Sievers
2009-09-20 03:50:38 +0800

14 Sep, 2009

1 commit

a9327cac4 Seperate read and write statistics of in_flight requests

Currently, there is a single in_flight counter measuring the number of
requests in the request_queue. But some monitoring tools would like to
know how many read requests and write requests are in progress. Split the
current in_flight counter into two separate counters for read and write.

This information is exported as a sysfs attribute, as changing the
currently available stat files would break the existing tools.

Signed-off-by: Nikanth Karthikesan
Signed-off-by: Jens Axboe

Nikanth Karthikesan
2009-09-14 14:24:52 +0800

16 Jun, 2009

1 commit

b03f38b68 Driver Core: block: add nodename support for block drivers.

This adds support for block drivers to report their requested nodename
to userspace. It also updates a number of block drivers to provide the
needed subdirectory and device name to be used for them.

Signed-off-by: Kay Sievers
Signed-off-by: Jan Blunck
Signed-off-by: Greg Kroah-Hartman

Kay Sievers
2009-06-16 12:30:25 +0800

13 Jun, 2009

1 commit

d614aec47 Merge branch 'for-2.6.31' of git://git.kernel.org/pub/scm/linux/kernel/git/bart/ide-2.6

* 'for-2.6.31' of git://git.kernel.org/pub/scm/linux/kernel/git/bart/ide-2.6: (29 commits)
ide: re-implement ide_pci_init_one() on top of ide_pci_init_two()
ide: unexport ide_find_dma_mode()
ide: fix PowerMac bootup oops
ide: skip probe if there are no devices on the port (v2)
sl82c105: add printk() logging facility
ide-tape: fix proc warning
ide: add IDE_DFLAG_NIEN_QUIRK device flag
ide: respect quirk_drives[] list on all controllers
hpt366: enable all quirks for devices on quirk_drives[] list
hpt366: sync quirk_drives[] list with pdc202xx_{new,old}.c
ide: remove superfluous SELECT_MASK() call from do_rw_taskfile()
ide: remove superfluous SELECT_MASK() call from ide_driveid_update()
icside: remove superfluous ->maskproc method
ide-tape: fix IDE_AFLAG_* atomic accesses
ide-tape: change IDE_AFLAG_IGNORE_DSC non-atomically
pdc202xx_old: kill resetproc() method
pdc202xx_old: don't call pdc202xx_reset() on IRQ timeout
pdc202xx_old: use ide_dma_test_irq()
ide: preserve Host Protected Area by default (v2)
ide-gd: implement block device ->set_capacity method (v2)
...

Linus Torvalds
2009-06-13 00:29:42 +0800

07 Jun, 2009

1 commit

db429e9ec partitions: add ->set_capacity block device method

* Add ->set_capacity block device method and use it in rescan_partitions()
to attempt enabling native capacity of the device upon detecting the
partition which exceeds device capacity.

* Add GENHD_FL_NATIVE_CAPACITY flag to try to limit attempts of enabling
native capacity during partition scan.

Together with the subsequent patch implementing ->set_capacity method in
ide-gd device driver this allows automatic disabling of Host Protected Area
(HPA) if any partitions overlapping HPA are detected.

Cc: Robert Hancock
Cc: Frans Pop
Cc: "Andries E. Brouwer"
Acked-by: Al Viro
Emphatically-Acked-by: Alan Cox
Signed-off-by: Bartlomiej Zolnierkiewicz

Bartlomiej Zolnierkiewicz
2009-06-07 19:52:52 +0800

23 May, 2009

1 commit

c72758f33 block: Export I/O topology for block devices and partitions

To support devices with physical block sizes bigger than 512 bytes we
need to ensure proper alignment. This patch adds support for exposing
I/O topology characteristics as devices are stacked.

logical_block_size is the smallest unit the device can address.

physical_block_size indicates the smallest I/O the device can write
without incurring a read-modify-write penalty.

The io_min parameter is the smallest preferred I/O size reported by
the device. In many cases this is the same as the physical block
size. However, the io_min parameter can be scaled up when stacking
(RAID5 chunk size > physical block size).

The io_opt characteristic indicates the optimal I/O size reported by
the device. This is usually the stripe width for arrays.

The alignment_offset parameter indicates the number of bytes the start
of the device/partition is offset from the device's natural alignment.
Partition tools and MD/DM utilities can use this to pad their offsets
so filesystems start on proper boundaries.
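
As a worked example of why alignment matters, consider a partition that
starts at sector 63 on a device with 512-byte logical and 4096-byte physical
blocks. The helpers below are hypothetical and only do the arithmetic; they
are not the formula the kernel uses to compute alignment_offset.

```c
/* Bytes by which the start lies past the previous physical-block boundary. */
static unsigned long long misalignment_bytes(unsigned long long start_sector,
                                             unsigned int logical_block_size,
                                             unsigned int physical_block_size)
{
    return (start_sector * logical_block_size) % physical_block_size;
}

/* Padding needed to move the start up to the next physical-block boundary. */
static unsigned long long padding_bytes(unsigned long long start_sector,
                                        unsigned int logical_block_size,
                                        unsigned int physical_block_size)
{
    unsigned long long off = misalignment_bytes(start_sector,
                                                logical_block_size,
                                                physical_block_size);

    return off ? physical_block_size - off : 0;
}
```

Sector 63 is byte 32256, which is 3584 bytes past a 4096-byte boundary, so
512 bytes of padding (one logical block) would realign it. A start at sector
2048 needs no padding.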

Signed-off-by: Martin K. Petersen
Signed-off-by: Jens Axboe

Martin K. Petersen
2009-05-23 05:22:55 +0800

22 Apr, 2009

1 commit

71982a409 block: include empty disks in /proc/diskstats

/proc/diskstats used to show stats for all disks whether they're
zero-sized or not and their non-zero partitions. Commit
074a7aca7afa6f230104e8e65eba3420263714a5 accidentally changed the
behavior such that it doesn't print out zero sized disks. This patch
implements DISK_PITER_INCL_EMPTY_PART0 flag to partition iterator and
uses it in diskstats_show() such that empty part0 is shown in
/proc/diskstats.

Reported and bisected by Daniel Collins.

Signed-off-by: Tejun Heo
Reported-by: Daniel Collins
Signed-off-by: Jens Axboe

Tejun Heo
2009-04-22 14:35:10 +0800

24 Mar, 2009

2 commits

d39922864 block: genhd.h cleanup patch

In include/linux/genhd.h: Line 335 has a comment that needs to be updated from: /* drivers/block/ll_rw_blk.c */ to /* block/blk-core.c */. Also as of kernel 2.6.16, the function definition for get_blkdev_list was removed from block/genhd.c but the function declaration is still present on line 339. This patch addresses both those fixes, by updating the comment and removing the declaration.

Signed-off-by: Petros Koutoupis
Signed-off-by: Jens Axboe

Petros Koutoupis
2009-03-24 19:35:17 +0800

32ca163c9 block: genhd.h comment needs updating

The include/linux/genhd.h file declares some function prototypes on
lines 338-352, and the comment on line 338 states that the definitions
of these functions are to be found in drivers/block/genhd.c. The problem
is that genhd.c has been relocated to block/genhd.c. See the attached
patch to correct this minor cosmetic error.

Signed-off-by: Petros Koutoupis
Signed-off-by: Jens Axboe

Petros Koutoupis
2009-03-24 19:35:17 +0800

29 Dec, 2008

1 commit

a6f23657d block: add one-hit cache for disk partition lookup

disk_map_sector_rcu() returns a partition from a sector offset,
which we use for IO statistics on a per-partition basis. The
lookup itself is an O(N) list lookup, where N is the number of
partitions. This actually hurts performance quite a bit, even
on the lower end partitions. On higher numbered partitions,
it can get pretty bad.

Solve this by adding a one-hit cache for partition lookup.
This makes the lookup O(1) for the case where we do most IO to
one partition. Even for mixed partition workloads, amortized cost
is pretty close to O(1) since the natural IO batching makes the
one-hit cache last for lots of IOs.
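
A one-hit cache in front of a linear lookup can be sketched as follows.
This is a userspace illustration under assumptions: the structs and
map_sector() are hypothetical, RCU protection is omitted, and the scans
counter exists only to make the effect of the cache visible.

```c
#include <stddef.h>

/* Hypothetical partition described by its sector range. */
struct part_model {
    unsigned long start, nr;
};

/* Hypothetical partition table with a single cached lookup result. */
struct part_table_model {
    struct part_model *parts;
    size_t len;
    struct part_model *last_lookup; /* the one-hit cache */
    unsigned long scans;            /* number of full O(N) scans performed */
};

static int sector_in_part(const struct part_model *p, unsigned long sector)
{
    return sector >= p->start && sector < p->start + p->nr;
}

/* Check the cached partition first; fall back to the linear scan. */
static struct part_model *map_sector(struct part_table_model *t,
                                     unsigned long sector)
{
    if (t->last_lookup && sector_in_part(t->last_lookup, sector))
        return t->last_lookup;

    t->scans++;
    for (size_t i = 0; i < t->len; i++) {
        if (sector_in_part(&t->parts[i], sector)) {
            t->last_lookup = &t->parts[i];
            return t->last_lookup;
        }
    }
    return NULL;
}
```

Repeated lookups that land in the same partition hit the cache and skip the
scan, which is the common case the commit message describes.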

Signed-off-by: Jens Axboe

Jens Axboe
2008-12-29 15:29:51 +0800

18 Nov, 2008

1 commit

ba32929a9 block: make add_partition() return pointer to hd_struct

Make add_partition() return pointer to the new hd_struct on success
and ERR_PTR() value on failure. This change will be used to fix md
autodetection bug.
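
The ERR_PTR() convention packs a negative errno into the returned pointer so
that one return value carries either the object or the failure reason. The
helpers below are a userspace imitation of the kernel's err.h idiom, and
add_partition_model() is a hypothetical function, not the real
add_partition().

```c
#include <errno.h>
#include <stdint.h>

#define MAX_ERRNO 4095

/* Userspace imitations of the kernel helpers of the same names. */
static inline void *ERR_PTR(long error)
{
    return (void *)(intptr_t)error;
}

static inline long PTR_ERR(const void *ptr)
{
    return (long)(intptr_t)ptr;
}

/* Error values occupy the top MAX_ERRNO addresses of the address space. */
static inline int IS_ERR(const void *ptr)
{
    return (uintptr_t)ptr >= (uintptr_t)(intptr_t)-MAX_ERRNO;
}

/* Hypothetical partition object and a fixed table of four slots. */
struct part_model {
    int partno;
};

static struct part_model slots[4];
static int slot_used[4];

/* Return the new partition on success, ERR_PTR(-errno) on failure. */
static struct part_model *add_partition_model(int partno)
{
    if (partno < 0 || partno >= 4)
        return ERR_PTR(-EINVAL);
    if (slot_used[partno])
        return ERR_PTR(-EBUSY);

    slot_used[partno] = 1;
    slots[partno].partno = partno;
    return &slots[partno];
}
```

A caller checks IS_ERR() on the result and, on failure, recovers the errno
with PTR_ERR().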

Signed-off-by: Tejun Heo
Cc: Neil Brown
Signed-off-by: Jens Axboe

Tejun Heo
2008-11-18 22:08:56 +0800

23 Oct, 2008

1 commit

31d85ab28 proc: move /proc/diskstats boilerplate to block/genhd.c

Signed-off-by: Alexey Dobriyan
Acked-by: Jens Axboe

Alexey Dobriyan
2008-10-23 21:57:37 +0800