07 Oct, 2009
1 commit
-
Commit a9327cac440be4d8333bba975cbbf76045096275 added seperate read
and write statistics of in_flight requests. And exported the number
of read and write requests in progress seperately through sysfs.But Corrado Zoccolo reported getting strange
output from "iostat -kx 2". Global values for service time and
utilization were garbage. For interval values, utilization was always
100%, and service time is higher than normal.So this was reverted by commit 0f78ab9899e9d6acb09d5465def618704255963b
The problem was in part_round_stats_single(), I missed the following:
if (now == part->stamp)
return;- if (part->in_flight) {
+ if (part_in_flight(part)) {
__part_stat_add(cpu, part, time_in_queue,
part_in_flight(part) * (now - part->stamp));
__part_stat_add(cpu, part, io_ticks, (now - part->stamp));With this chunk included, the reported regression gets fixed.
Signed-off-by: Nikanth Karthikesan
--
Signed-off-by: Jens Axboe
05 Oct, 2009
1 commit
-
This reverts commit a9327cac440be4d8333bba975cbbf76045096275.
Corrado Zoccolo reports:
"with 2.6.32-rc1 I started getting the following strange output from
"iostat -kx 2":
Linux 2.6.31bisect (et2) 04/10/2009 _i686_ (2 CPU)avg-cpu: %user %nice %system %iowait %steal %idle
10,70 0,00 3,16 15,75 0,00 70,38Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s
avgrq-sz avgqu-sz await svctm %util
sda 18,22 0,00 0,67 0,01 14,77 0,02
43,94 0,01 10,53 39043915,03 2629219,87
sdb 60,89 9,68 50,79 3,04 1724,43 50,52
65,95 0,70 13,06 488437,47 2629219,87avg-cpu: %user %nice %system %iowait %steal %idle
2,72 0,00 0,74 0,00 0,00 96,53Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s
avgrq-sz avgqu-sz await svctm %util
sda 0,00 0,00 0,00 0,00 0,00 0,00
0,00 0,00 0,00 0,00 100,00
sdb 0,00 0,00 0,00 0,00 0,00 0,00
0,00 0,00 0,00 0,00 100,00avg-cpu: %user %nice %system %iowait %steal %idle
6,68 0,00 0,99 0,00 0,00 92,33Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s
avgrq-sz avgqu-sz await svctm %util
sda 0,00 0,00 0,00 0,00 0,00 0,00
0,00 0,00 0,00 0,00 100,00
sdb 0,00 0,00 0,00 0,00 0,00 0,00
0,00 0,00 0,00 0,00 100,00avg-cpu: %user %nice %system %iowait %steal %idle
4,40 0,00 0,73 1,47 0,00 93,40Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s
avgrq-sz avgqu-sz await svctm %util
sda 0,00 0,00 0,00 0,00 0,00 0,00
0,00 0,00 0,00 0,00 100,00
sdb 0,00 4,00 0,00 3,00 0,00 28,00
18,67 0,06 19,50 333,33 100,00Global values for service time and utilization are garbage. For
interval values, utilization is always 100%, and service time is
higher than normal.I bisected it down to:
[a9327cac440be4d8333bba975cbbf76045096275] Seperate read and write
statistics of in_flight requests
and verified that reverting just that commit indeed solves the issue
on 2.6.32-rc1."So until this is debugged, revert the bad commit.
Signed-off-by: Jens Axboe
22 Sep, 2009
1 commit
-
Signed-off-by: Alexey Dobriyan
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
16 Sep, 2009
1 commit
-
Let attribute group vectors be declared "const". We'd
like to let most attribute metadata live in read-only
sections... this is a start.Signed-off-by: David Brownell
Signed-off-by: Greg Kroah-Hartman
14 Sep, 2009
1 commit
-
Currently, there is a single in_flight counter measuring the number of
requests in the request_queue. But some monitoring tools would like to
know how many read requests and write requests are in progress. Split the
current in_flight counter into two seperate counters for read and write.This information is exported as a sysfs attribute, as changing the
currently available stat files would break the existing tools.Signed-off-by: Nikanth Karthikesan
Signed-off-by: Jens Axboe
13 Jul, 2009
1 commit
-
git commit f67f129e "Driver core: implement uevent suppress in kobject"
contains this chunk for fs/partitions/check.c:/* suppress uevent if the disk supresses it */
- if (!ddev->uevent_suppress)
+ if (!dev_get_uevent_suppress(pdev))
kobject_uevent(&pdev->kobj, KOBJ_ADD);However that should have been
- if (!ddev->uevent_suppress)
+ if (!dev_get_uevent_suppress(ddev))Signed-off-by: Heiko Carstens
Acked-by: Ming Lei
Cc: stable
Signed-off-by: Greg Kroah-Hartman
13 Jun, 2009
1 commit
-
* 'for-2.6.31' of git://git.kernel.org/pub/scm/linux/kernel/git/bart/ide-2.6: (29 commits)
ide: re-implement ide_pci_init_one() on top of ide_pci_init_two()
ide: unexport ide_find_dma_mode()
ide: fix PowerMac bootup oops
ide: skip probe if there are no devices on the port (v2)
sl82c105: add printk() logging facility
ide-tape: fix proc warning
ide: add IDE_DFLAG_NIEN_QUIRK device flag
ide: respect quirk_drives[] list on all controllers
hpt366: enable all quirks for devices on quirk_drives[] list
hpt366: sync quirk_drives[] list with pdc202xx_{new,old}.c
ide: remove superfluous SELECT_MASK() call from do_rw_taskfile()
ide: remove superfluous SELECT_MASK() call from ide_driveid_update()
icside: remove superfluous ->maskproc method
ide-tape: fix IDE_AFLAG_* atomic accesses
ide-tape: change IDE_AFLAG_IGNORE_DSC non-atomically
pdc202xx_old: kill resetproc() method
pdc202xx_old: don't call pdc202xx_reset() on IRQ timeout
pdc202xx_old: use ide_dma_test_irq()
ide: preserve Host Protected Area by default (v2)
ide-gd: implement block device ->set_capacity method (v2)
...
07 Jun, 2009
2 commits
-
* Add ->set_capacity block device method and use it in rescan_partitions()
to attempt enabling native capacity of the device upon detecting the
partition which exceeds device capacity.* Add GENHD_FL_NATIVE_CAPACITY flag to try limit attempts of enabling
native capacity during partition scan.Together with the consecutive patch implementing ->set_capacity method in
ide-gd device driver this allows automatic disabling of Host Protected Area
(HPA) if any partitions overlapping HPA are detected.Cc: Robert Hancock
Cc: Frans Pop
Cc: "Andries E. Brouwer"
Acked-by: Al Viro
Emphatically-Acked-by: Alan Cox
Signed-off-by: Bartlomiej Zolnierkiewicz -
The current warning message says only about the kernel's action taken
without mentioning the underlying reason behind it.Noticed-by: Robert Hancock
Cc: Frans Pop
Cc: "Andries E. Brouwer"
Cc: Al Viro
Emphatically-Acked-by: Alan Cox
Signed-off-by: Bartlomiej Zolnierkiewicz
23 May, 2009
2 commits
-
To support devices with physical block sizes bigger than 512 bytes we
need to ensure proper alignment. This patch adds support for exposing
I/O topology characteristics as devices are stacked.logical_block_size is the smallest unit the device can address.
physical_block_size indicates the smallest I/O the device can write
without incurring a read-modify-write penalty.The io_min parameter is the smallest preferred I/O size reported by
the device. In many cases this is the same as the physical block
size. However, the io_min parameter can be scaled up when stacking
(RAID5 chunk size > physical block size).The io_opt characteristic indicates the optimal I/O size reported by
the device. This is usually the stripe width for arrays.The alignment_offset parameter indicates the number of bytes the start
of the device/partition is offset from the device's natural alignment.
Partition tools and MD/DM utilities can use this to pad their offsets
so filesystems start on proper boundaries.Signed-off-by: Martin K. Petersen
Signed-off-by: Jens Axboe -
Until now we have had a 1:1 mapping between storage device physical
block size and the logical block sized used when addressing the device.
With SATA 4KB drives coming out that will no longer be the case. The
sector size will be 4KB but the logical block size will remain
512-bytes. Hence we need to distinguish between the physical block size
and the logical ditto.This patch renames hardsect_size to logical_block_size.
Signed-off-by: Martin K. Petersen
Signed-off-by: Jens Axboe
02 Apr, 2009
1 commit
-
Conflicts:
include/linux/slub_def.h
lib/Kconfig.debug
mm/slob.c
mm/slub.c
27 Mar, 2009
1 commit
-
* 'for-linus' of git://git390.marist.edu/pub/scm/linux-2.6: (81 commits)
[S390] remove duplicated #includes
[S390] cpumask: use mm_cpumask() wrapper
[S390] cpumask: Use accessors code.
[S390] cpumask: prepare for iterators to only go to nr_cpu_ids/nr_cpumask_bits.
[S390] cpumask: remove cpu_coregroup_map
[S390] fix clock comparator save area usage
[S390] Add hwcap flag for the etf3 enhancement facility
[S390] Ensure that ipl panic notifier is called late.
[S390] fix dfp elf hwcap/facility bit detection
[S390] smp: perform initial cpu reset before starting a cpu
[S390] smp: fix memory leak on __cpu_up
[S390] ipl: Improve checking logic and remove switch defaults.
[S390] s390dbf: Remove needless check for NULL pointer.
[S390] s390dbf: Remove redundant initilizations.
[S390] use kzfree()
[S390] BUG to BUG_ON changes
[S390] zfcpdump: Prevent zcore from beeing built as a kernel module.
[S390] Use csum_partial in checksum.h
[S390] cleanup lowcore.h
[S390] eliminate ipl_device from lowcore
...
26 Mar, 2009
1 commit
-
The dasd device driver will now support ECKD devices with more then
65520 cylinders.
In the traditional ECKD adressing scheme each track is addressed
by a 16-bit cylinder and 16-bit head number. The new addressing
scheme makes use of the fact that the actual number of heads is
never larger then 15, so 12 bits of the head number can be redefined
to be part of the cylinder address.Signed-off-by: Stefan Weinhuber
Signed-off-by: Martin Schwidefsky
25 Mar, 2009
1 commit
-
This patch implements uevent suppress in kobject and removes it
from struct device, based on the following ideas:1,Uevent sending should be one attribute of kobject, so suppressing it
in kobject layer is more natural than in device layer. By this way,
we can do it for other objects embedded with kobject.2,It may save several bytes for each instance of struct device.(On my
omap3(32bit ARM) based box, can save 8bytes per device object)This patch also introduces dev_set|get_uevent_suppress() helpers to
set and query uevent_suppress attribute in case to help kobject
as private part of struct device in future.[This version is against the latest driver-core patch set of Greg,please
ignore the last version.]Signed-off-by: Ming Lei
Signed-off-by: Greg Kroah-Hartman
27 Jan, 2009
1 commit
-
Also make sure sparse (make C=2 block/blktrace.o) is happy too.
Reported-by: Ingo Molnar
Signed-off-by: Arnaldo Carvalho de Melo
Signed-off-by: Ingo Molnar
26 Jan, 2009
1 commit
-
Impact: New way of using the blktrace infrastructure
This drops the requirement of userspace utilities to use the blktrace
facility.Configuration is done thru sysfs, adding a "trace" directory to the
partition directory where blktrace can be enabled for the associated
request_queue.The same filters present in the IOCTL interface are present as sysfs
device attributes.The /sys/block/sdX/sdXN/trace/enable file allows tracing without any
filters.The other files in this directory: pid, act_mask, start_lba and end_lba
can be used with the same meaning as with the IOCTL interface.Using the sysfs interface will only setup the request_queue->blk_trace
fields, tracing will only take place when the "blk" tracer is selected
via the ftrace interface, as in the following example:To see the trace, one can use the /d/tracing/trace file or the
/d/tracign/trace_pipe file, with semantics defined in the ftrace
documentation in Documentation/ftrace.txt.[root@f10-1 ~]# cat /t/trace
kjournald-305 [000] 3046.491224: 8,1 A WBS 6367 + 8 -0 [000] 3046.511914: 8,1 C RS 6367 + 8 [6367]
[root@f10-1 ~]#The default line context (prefix) format is the one described in the ftrace
documentation, with the blktrace specific bits using its existing format,
described in blkparse(8).If one wants to have the classic blktrace formatting, this is possible by
using:[root@f10-1 ~]# echo blk_classic > /t/trace_options
[root@f10-1 ~]# cat /t/trace
8,1 0 3046.491224 305 A WBS 6367 + 8 /t/trace_options
[root@f10-1 ~]# echo stacktrace > /t/trace_options[root@f10-1 ~]# cat /t/trace
kjournald-305 [000] 3318.826779: 8,1 A WBS 6375 + 8
Signed-off-by: Ingo Molnar
10 Jan, 2009
1 commit
-
Neil writes:
Hi Jens,
I've found a little bug for you. It was introduced by
a6f23657d3072bde6844055bbc2290e497f33fbcblock: add one-hit cache for disk partition lookup
and has the effect of killing my machine whenever I try to assemble
an md array :-(
One of the devices in the array has partitions, and mdadm always
deletes partitions before putting a whole-device in an array (as it
can cause confusion). The next IO to that device locks the machine.
I don't really understand exactly why it locks up, but it happens in
disk_map_sector_rcu(). This patch fixes it.Which is due to a missing clear of the (now) stale partition lookup
data. So clear that when we delete a partition.Signed-off-by: Jens Axboe
07 Jan, 2009
1 commit
-
Cc: Jens Axboe
Signed-off-by: Kay Sievers
Signed-off-by: Greg Kroah-Hartman
18 Nov, 2008
3 commits
-
Block ext devt conversion missed md_autodetect_dev() call in
rescan_partitions() leaving md autodetect unable to see partitions.
Fix it.Signed-off-by: Tejun Heo
Cc: Neil Brown
Signed-off-by: Jens Axboe -
Make add_partition() return pointer to the new hd_struct on success
and ERR_PTR() value on failure. This change will be used to fix md
autodetection bug.Signed-off-by: Tejun Heo
Cc: Neil Brown
Signed-off-by: Jens Axboe -
Partition stats structure was not freed on devt allocation failure
path. Fix it.Signed-off-by: Tejun Heo
Signed-off-by: Jens Axboe
21 Oct, 2008
2 commits
-
* get rid of fake struct file/struct dentry in __blkdev_get()
* merge __blkdev_get() and do_open()
* get rid of flags argument of blkdev_get()Signed-off-by: Al Viro
-
Signed-off-by: Al Viro
17 Oct, 2008
3 commits
-
With extended devt, finding out the partition number becomes a bit
more challenging as subtracting the minor number from that of the
parent device doesn't work anymore. The only thing left is parsing
the partition name which is brittle and not exactly universal (some
have '-' between the device name and partition number while others
don't). This patch introduced partition attribute which contains the
partition number of the device. This should make finding partitions
and its index easier.This problem and solution were suggested by H. Peter Anvin.
Signed-off-by: Tejun Heo
Cc: H. Peter Anvin
Signed-off-by: Jens Axboe -
We currently follow blindly what the partition table lies about the
disk, and let the kernel create block devices which can not be accessed.
Trying to identify the device leads to kernel logs full of:
sdb: rw=0, want=73392, limit=28800
attempt to access beyond end of deviceHere is an example of a broken partition table, where sda2 starts
behind the end of the disk, and sdb3 is larger than the entire disk:
Disk /dev/sdb: 14 MB, 14745600 bytes
1 heads, 29 sectors/track, 993 cylinders, total 28800 sectors
Device Boot Start End Blocks Id System
/dev/sdb1 29 7800 3886 83 Linux
/dev/sdb2 37801 45601 3900+ 83 Linux
/dev/sdb3 15602 73402 28900+ 83 Linux
/dev/sdb4 23403 28796 2697 83 LinuxThe kernel creates these completely invalid devices, which can not be
accessed, or may lead to other unpredictable failures:
grep . /sys/class/block/sdb*/{start,size}
/sys/class/block/sdb/size:28800
/sys/class/block/sdb1/start:29
/sys/class/block/sdb1/size:7772
/sys/class/block/sdb2/start:37801
/sys/class/block/sdb2/size:7801
/sys/class/block/sdb3/start:15602
/sys/class/block/sdb3/size:57801
/sys/class/block/sdb4/start:23403
/sys/class/block/sdb4/size:5394With this patch, we ignore partitions which start behind the end of the disk,
and limit partitions to the end of the disk if they pretend to be larger:
grep . /sys/class/block/sdb*/{start,size}
/sys/class/block/sdb/size:28800
/sys/class/block/sdb1/start:29
/sys/class/block/sdb1/size:7772
/sys/class/block/sdb3/start:15602
/sys/class/block/sdb3/size:13198
/sys/class/block/sdb4/start:23403
/sys/class/block/sdb4/size:5394These warnings are printed to the kernel log:
sdb: p2 ignored, start 37801 is behind the end of the disk
sdb: p3 size 57801 limited to end of diskSigned-off-by: Kay Sievers
Cc: Herton Ronaldo Krzesinski
Cc: Jens Axboe
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
I missed this when I did the arm26 removal.
Reported-by: Robert P. J. Day
Signed-off-by: Adrian Bunk
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
09 Oct, 2008
13 commits
-
Check for device resize in the rescan_partitions() routine. If the device
has been resized, the bdev size is set to match. The rescan_partitions()
routine is called when opening the device and when calling the
BLKRRPART ioctl.Signed-off-by: Andrew Patterson
Signed-off-by: Jens Axboe -
Now that disk and partition handlings are mostly unified, it's easy to
allow disk to have extended device number. This patch makes
add_disk() use extended device number if disk->minors is zero. Both
sd and ide-disk are updated to use this.* sd_format_disk_name() is implemented which can generically determine
the drive name. This removes disk number restriction stemming from
limited device names.* If sd index goes over SD_MAX_DISKS (which can be increased now BTW),
sd simply doesn't initialize minors letting block layer choose
extended device number.* If CONFIG_DEBUG_EXT_DEVT is set, both sd and ide-disk always set
minors to 0 and use extended device numbers.Signed-off-by: Tejun Heo
Signed-off-by: Jens Axboe -
With previous changes, it's meaningless to limit the number of
partitions. Replace @ext_minors with GENHD_FL_EXT_DEVT such that
setting the flag allows the disk to have maximum number of allowed
partitions (only limited by the number of entries in parsed_partitions
as determined by MAX_PART constant).This kills not-too-pretty alloc_disk_ext[_node]() functions and makes
@minors parameter to alloc_disk[_node]() unnecessary. The parameter
is left alone to avoid disturbing the users.Signed-off-by: Tejun Heo
Signed-off-by: Jens Axboe -
disk->__part used to be statically allocated to the maximum possible
number of partitions. This patch makes partition array allocation
dynamic. The added overhead is minimal as only real change is one
memory dereference changed to RCU one. This saves both a bit of
memory and cpu cycles iterating through unoccupied slots and makes
increasing partition limit easier.Signed-off-by: Tejun Heo
Signed-off-by: Jens Axboe -
Move stats related fields - stamp, in_flight, dkstats - from disk to
part0 and unify stat handling such that...* part_stat_*() now updates part0 together if the specified partition
is not part0. ie. part_stat_*() are now essentially all_stat_*().* {disk|all}_stat_*() are gone.
* part_round_stats() is updated similary. It handles part0 stats
automatically and disk_round_stats() is killed.* part_{inc|dec}_in_fligh() is implemented which automatically updates
part0 stats for parts other than part0.* disk_map_sector_rcu() is updated to return part0 if no part matches.
Combined with the above changes, this makes NULL special case
handling in callers unnecessary.* Separate stats show code paths for disk are collapsed into part
stats show code paths.* Rename disk_stat_lock/unlock() to part_stat_lock/unlock()
While at it, reposition stat handling macros a bit and add missing
parentheses around macro parameters.Signed-off-by: Tejun Heo
Signed-off-by: Jens Axboe -
GENHD_FL_FAIL for disk is what make_it_fail is for parts. Kill it and
use part0->make_it_fail. Sysfs node handling is unified too.Signed-off-by: Tejun Heo
Signed-off-by: Jens Axboe -
Till now, bdev->bd_part is set only if the bdev was for parts other
than part0. This patch makes bdev->bd_part always set so that code
paths don't have to differenciate common handling.Signed-off-by: Tejun Heo
Signed-off-by: Jens Axboe -
Move disk->holder_dir to part0->holder_dir. Kill now mostly
superflous bdev_get_holder().While at it, kill superflous kobject_get/put() around holder_dir,
slave_dir and cmd_filter creation and collapse
disk_sysfs_add_subdirs() into register_disk(). These serve no purpose
but obfuscating the code.Signed-off-by: Tejun Heo
Signed-off-by: Jens Axboe -
Move disk->policy to part0->policy. Implement and use get_disk_ro().
Signed-off-by: Tejun Heo
Signed-off-by: Jens Axboe -
Now that capacity and __dev are moved to part0, part0 and others can
share the same method.Signed-off-by: Tejun Heo
Signed-off-by: Jens Axboe -
Move disk->capacity to part0->nr_sects and convert all users who
directly accessed the field to use {get|set}_capacity(). This is done
early to allow the __dev field to be moved.Signed-off-by: Tejun Heo
Signed-off-by: Jens Axboe -
genhd and partition code handled disk and partitions separately. All
information about the whole disk was in struct genhd and partitions in
struct hd_struct. However, the whole disk (part0) and other
partitions have a lot in common and the data structures end up having
good number of common fields and thus separate code paths doing the
same thing. Also, the partition array was indexed by partno - 1 which
gets pretty confusing at times.This patch introduces partition 0 and makes the partition array
indexed by partno. Following patches will unify the handling of disk
and parts piece-by-piece.This patch also implements disk_partitionable() which tests whether a
disk is partitionable. With coming dynamic partition array change,
the most common usage of disk_max_parts() will be testing whether a
disk is partitionable and the number of max partitions will become
much less important.Signed-off-by: Tejun Heo
Signed-off-by: Jens Axboe -
Implement {disk|part}_to_dev() and use them to access generic device
instead of directly dereferencing {disk|part}->dev. To make sure no
user is left behind, rename generic devices fields to __dev.This is in preparation of unifying partition 0 handling with other
partitions.Signed-off-by: Tejun Heo
Signed-off-by: Jens Axboe