02 Oct, 2009

1 commit

  • Add missing blk_trace_remove_sysfs to be in pair with blk_trace_init_sysfs
    introduced in commit 1d54ad6da9192fed5dd3b60224d9f2dfea0dcd82.
    Release kobject also in case the request_fn is NULL.

    The problem was noticed via a kmemleak backtrace when some sysfs entries were
    not properly destroyed during device removal:

    unreferenced object 0xffff88001aa76640 (size 80):
    comm "lvcreate", pid 2120, jiffies 4294885144
    hex dump (first 32 bytes):
    01 00 00 00 00 00 00 00 f0 65 a7 1a 00 88 ff ff .........e......
    90 66 a7 1a 00 88 ff ff 86 1d 53 81 ff ff ff ff .f........S.....
    backtrace:
    [] kmemleak_alloc+0x26/0x60
    [] kmem_cache_alloc+0x133/0x1c0
    [] sysfs_new_dirent+0x41/0x120
    [] sysfs_add_file_mode+0x3c/0xb0
    [] internal_create_group+0xc1/0x1a0
    [] sysfs_create_group+0x13/0x20
    [] blk_trace_init_sysfs+0x14/0x20
    [] blk_register_queue+0x3c/0xf0
    [] add_disk+0x94/0x160
    [] dm_create+0x598/0x6e0 [dm_mod]
    [] dev_create+0x51/0x350 [dm_mod]
    [] ctl_ioctl+0x1a3/0x240 [dm_mod]
    [] dm_compat_ctl_ioctl+0x12/0x20 [dm_mod]
    [] compat_sys_ioctl+0xcd/0x4f0
    [] sysenter_dispatch+0x7/0x2c
    [] 0xffffffffffffffff

    Signed-off-by: Zdenek Kabelac
    Reviewed-by: Li Zefan
    Signed-off-by: Jens Axboe

    Zdenek Kabelac
     

14 Sep, 2009

1 commit


02 Sep, 2009

1 commit

  • The patch "block: Use accessor functions for queue limits"
    (ae03bf639a5027d27270123f5f6e3ee6a412781d) changed queue_max_sectors_store()
    to use blk_queue_max_sectors() instead of directly assigning the value.

    But blk_queue_max_sectors() differs a bit:
    1. It sets both max_sectors_kb and max_hw_sectors_kb.
    2. It never allows one to change max_sectors_kb above BLK_DEF_MAX_SECTORS. If one
    specifies a greater value, max_hw_sectors is set to that value but
    max_sectors is set to BLK_DEF_MAX_SECTORS.

    I am not sure whether blk_queue_max_sectors() should be changed, as it seems
    to have been that way for a long time, and there may be callers dependent on
    that behaviour.

    This patch simply reverts to the older way of directly assigning the value to
    max_sectors as it was before.

    Signed-off-by: Nikanth Karthikesan
    Signed-off-by: Jens Axboe

    Nikanth Karthikesan
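
The difference described in the 02 Sep, 2009 commit can be sketched as a small userspace model. This is only an illustration under stated assumptions: `limits_model`, `model_accessor_set()` and `model_direct_store()` are hypothetical names, the value of `DEF_MAX_SECTORS` is assumed, and any validation done by the real sysfs store path is omitted.

```c
/* Assumed cap for illustration; the real BLK_DEF_MAX_SECTORS is defined
 * in the kernel headers and may differ. */
#define DEF_MAX_SECTORS 1024u

/* Hypothetical stand-in for the two queue limits discussed in the commit. */
struct limits_model {
    unsigned int max_sectors;
    unsigned int max_hw_sectors;
};

/* Models the accessor as the commit describes it: it touches both
 * limits and never lets max_sectors exceed the default cap. */
static void model_accessor_set(struct limits_model *l, unsigned int sectors)
{
    l->max_hw_sectors = sectors;
    l->max_sectors = sectors > DEF_MAX_SECTORS ? DEF_MAX_SECTORS : sectors;
}

/* Models the restored behaviour: only max_sectors is assigned. */
static void model_direct_store(struct limits_model *l, unsigned int sectors)
{
    l->max_sectors = sectors;
}
```

In this model, asking the accessor for 4096 sectors leaves max_sectors at the cap while raising max_hw_sectors, whereas the direct store changes max_sectors alone.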
     

17 Jul, 2009

1 commit

  • In blk-sysfs.c, queue_var_store uses unsigned long to store data,
    but queue_var_show uses unsigned int to show data. This causes:

    # echo 70000000000 > /sys/block//queue/read_ahead_kb
    # cat /sys/block//queue/read_ahead_kb => get wrong value

    Fix it by using unsigned long.

    While at it, convert queue_rq_affinity_show() such that it uses bool
    variable instead of explicit != 0 testing.

    Signed-off-by: Xiaotian Feng
    Signed-off-by: Tejun Heo

    Xiaotian Feng
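
The width mismatch in the 17 Jul, 2009 commit can be illustrated with fixed-width types. This is a minimal sketch, assuming a platform where unsigned long is 64 bits and unsigned int is 32 bits; `show_as_u32()` and `show_as_u64()` are hypothetical helpers, not kernel functions, and they show only the narrowing step, not any unit conversion the real attribute performs.

```c
#include <stdint.h>

/* Models showing a stored value through a 32-bit type: the upper
 * 32 bits are silently discarded. */
static uint32_t show_as_u32(uint64_t stored)
{
    return (uint32_t)stored;
}

/* Models showing the value through a type as wide as the stored one. */
static uint64_t show_as_u64(uint64_t stored)
{
    return stored;
}
```

With the value from the commit message, `show_as_u32(70000000000ULL)` yields 1280523264 (70000000000 modulo 2^32), while `show_as_u64()` returns the value unchanged.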
     

12 Jun, 2009

1 commit

  • * 'for-2.6.31' of git://git.kernel.dk/linux-2.6-block: (153 commits)
    block: add request clone interface (v2)
    floppy: fix hibernation
    ramdisk: remove long-deprecated "ramdisk=" boot-time parameter
    fs/bio.c: add missing __user annotation
    block: prevent possible io_context->refcount overflow
    Add serial number support for virtio_blk, V4a
    block: Add missing bounce_pfn stacking and fix comments
    Revert "block: Fix bounce limit setting in DM"
    cciss: decode unit attention in SCSI error handling code
    cciss: Remove no longer needed sendcmd reject processing code
    cciss: change SCSI error handling routines to work with interrupts enabled.
    cciss: separate error processing and command retrying code in sendcmd_withirq_core()
    cciss: factor out fix target status processing code from sendcmd functions
    cciss: simplify interface of sendcmd() and sendcmd_withirq()
    cciss: factor out core of sendcmd_withirq() for use by SCSI error handling code
    cciss: Use schedule_timeout_uninterruptible in SCSI error handling code
    block: needs to set the residual length of a bidi request
    Revert "block: implement blkdev_readpages"
    block: Fix bounce limit setting in DM
    Removed reference to non-existing file Documentation/PCI/PCI-DMA-mapping.txt
    ...

    Manually fix conflicts with tracing updates in:
    block/blk-sysfs.c
    drivers/ide/ide-atapi.c
    drivers/ide/ide-cd.c
    drivers/ide/ide-floppy.c
    drivers/ide/ide-tape.c
    include/trace/events/block.h
    kernel/trace/blktrace.c

    Linus Torvalds
     

23 May, 2009

4 commits

  • To support devices with physical block sizes bigger than 512 bytes we
    need to ensure proper alignment. This patch adds support for exposing
    I/O topology characteristics as devices are stacked.

    logical_block_size is the smallest unit the device can address.

    physical_block_size indicates the smallest I/O the device can write
    without incurring a read-modify-write penalty.

    The io_min parameter is the smallest preferred I/O size reported by
    the device. In many cases this is the same as the physical block
    size. However, the io_min parameter can be scaled up when stacking
    (RAID5 chunk size > physical block size).

    The io_opt characteristic indicates the optimal I/O size reported by
    the device. This is usually the stripe width for arrays.

    The alignment_offset parameter indicates the number of bytes the start
    of the device/partition is offset from the device's natural alignment.
    Partition tools and MD/DM utilities can use this to pad their offsets
    so filesystems start on proper boundaries.

    Signed-off-by: Martin K. Petersen
    Signed-off-by: Jens Axboe

    Martin K. Petersen
     
  • Currently stacking devices do not have a queue directory in sysfs.
    However, many of the I/O characteristics like sector size, maximum
    request size, etc. are queue properties.

    This patch enables the queue directory for MD/DM devices. The elevator
    code has been modified to deal with queues that do not have an I/O
    scheduler.

    Signed-off-by: Martin K. Petersen
    Signed-off-by: Jens Axboe

    Martin K. Petersen
     
  • Convert all external users of queue limits to using wrapper functions
    instead of poking the request queue variables directly.

    Signed-off-by: Martin K. Petersen
    Signed-off-by: Jens Axboe

    Martin K. Petersen
     
  • Until now we have had a 1:1 mapping between storage device physical
    block size and the logical block size used when addressing the device.
    With SATA 4KB drives coming out that will no longer be the case. The
    sector size will be 4KB but the logical block size will remain
    512 bytes. Hence we need to distinguish between the physical block size
    and the logical ditto.

    This patch renames hardsect_size to logical_block_size.

    Signed-off-by: Martin K. Petersen
    Signed-off-by: Jens Axboe

    Martin K. Petersen
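
The I/O topology commit above says partition tools can pad their offsets so that filesystems start on proper boundaries. The arithmetic involved might look like the sketch below. It is not the kernel's computation of alignment_offset; `misalignment()` and `padding_to_align()` are hypothetical helpers, and a non-zero physical block size is assumed.

```c
#include <stdint.h>

/* Bytes by which start_bytes lies past the previous physical-block
 * boundary. Assumes physical_block_size is non-zero. */
static uint64_t misalignment(uint64_t start_bytes, uint32_t physical_block_size)
{
    return start_bytes % physical_block_size;
}

/* Bytes of padding needed to reach the next boundary, or 0 when the
 * start is already aligned. */
static uint64_t padding_to_align(uint64_t start_bytes, uint32_t physical_block_size)
{
    uint64_t rem = misalignment(start_bytes, physical_block_size);
    return rem ? physical_block_size - rem : 0;
}
```

For example, a partition starting at 512-byte sector 63 on a device with a 4096-byte physical block size starts 3584 bytes past a boundary, so 512 bytes of padding would align it.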
     

07 May, 2009

1 commit


24 Apr, 2009

1 commit

  • This simplifies I/O stat accounting switching code and separates it
    completely from I/O scheduler switch code.

    Requests are accounted according to the state of their request queue
    at the time of the request allocation. There is no need anymore to
    flush the request queue when switching I/O accounting state.

    Signed-off-by: Jerome Marchand
    Signed-off-by: Jens Axboe

    Jerome Marchand
     

16 Apr, 2009

1 commit

  • Impact: allow ftrace-plugin blktrace to trace device-mapper devices

    To trace a single partition:
    # echo 1 > /sys/block/sda/sda1/enable

    To trace the whole sda instead:
    # echo 1 > /sys/block/sda/enable

    Thus we also fix an issue reported by Ted, that ftrace-plugin blktrace
    can't be used to trace device-mapper devices.

    Now:

    # echo 1 > /sys/block/dm-0/trace/enable
    echo: write error: No such device or address
    # mount -t ext4 /dev/dm-0 /mnt
    # echo 1 > /sys/block/dm-0/trace/enable
    # echo blk > /debug/tracing/current_tracer

    Reported-by: Theodore Tso
    Signed-off-by: Li Zefan
    Acked-by: "Theodore Ts'o"
    Cc: Arnaldo Carvalho de Melo
    Cc: Shawn Du
    Cc: Jens Axboe
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Li Zefan
     

15 Apr, 2009

1 commit


07 Apr, 2009

1 commit


06 Apr, 2009

1 commit


30 Jan, 2009

2 commits


29 Dec, 2008

1 commit


09 Oct, 2008

2 commits

  • This patch adds support for controlling the IO completion CPU of
    either all requests on a queue, or on a per-request basis. We export
    a sysfs variable (rq_affinity) which, if set, migrates completions
    of requests to the CPU that originally submitted them. A bio helper
    (bio_set_completion_cpu()) is also added, so that queuers can ask
    for completion on that specific CPU.

    In testing, this has been shown to cut the system time by as much
    as 20-40% on synthetic workloads where CPU affinity is desired.

    This requires a little help from the architecture, so it'll only
    work as designed for archs that are using the new generic smp
    helper infrastructure.

    Signed-off-by: Jens Axboe

    Jens Axboe
     
  • Implement {disk|part}_to_dev() and use them to access the generic device
    instead of directly dereferencing {disk|part}->dev. To make sure no
    user is left behind, rename the generic device fields to __dev.

    This is in preparation of unifying partition 0 handling with other
    partitions.

    Signed-off-by: Tejun Heo
    Signed-off-by: Jens Axboe

    Tejun Heo
     

07 May, 2008

1 commit


29 Apr, 2008

1 commit

  • The block I/O + elevator + I/O scheduler code spends a lot of time trying
    to merge I/Os -- rightfully so under "normal" circumstances. However,
    if one were to know that the incoming I/O stream was /very/ random in
    nature, the cycles are wasted.

    This patch adds a per-request_queue tunable that (when set) disables
    merge attempts (beyond the simple one-hit cache check), thus freeing up
    a non-trivial amount of CPU cycles.

    Signed-off-by: Alan D. Brunelle
    Signed-off-by: Jens Axboe

    Alan D. Brunelle
     

21 Apr, 2008

1 commit

  • blk_register_queue() returns -ENXIO when queue->request_fn is NULL. But there
    are some block drivers that call blk_register_queue() via add_disk() with
    queue->request_fn == NULL. (For example, brd, loop)

    Although no one checks the return value of blk_register_queue(), this patch makes
    it return 0 instead of -ENXIO when queue->request_fn is NULL.

    This patch also adds a warning when blk_register_queue() and
    blk_unregister_queue() are called with queue == NULL, rather than silently
    ignoring invalid usage.

    Signed-off-by: Akinobu Mita
    Cc: Jens Axboe
    Signed-off-by: Andrew Morton
    Signed-off-by: Jens Axboe

    Akinobu Mita
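
The return-value change in the 21 Apr, 2008 commit can be modelled in userspace as follows. This is a sketch only: `queue_model`, `model_register_queue()` and `dummy_request_fn()` are hypothetical, the `registered` flag stands in for the real sysfs work, and returning -ENXIO for a NULL queue is an assumption, since the commit message states only that a warning is added for that case.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for a request queue. */
struct queue_model {
    void (*request_fn)(void);
    int registered; /* set when the modelled sysfs registration runs */
};

/* Placeholder request function for queues that do have one. */
static void dummy_request_fn(void)
{
}

static int model_register_queue(struct queue_model *q)
{
    if (q == NULL)
        return -ENXIO; /* assumption: the real code warns here as well */

    if (q->request_fn == NULL)
        return 0; /* after the patch: nothing to register, but not an error */

    q->registered = 1; /* the real function would create sysfs entries here */
    return 0;
}
```

Under this model, a queue without a request function (as the commit notes for brd and loop) yields 0 with no registration performed, instead of an error.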
     

01 Feb, 2008

1 commit


30 Jan, 2008

2 commits