16 Sep, 2009

3 commits

  • * git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core-2.6:
    Driver Core: devtmpfs - kernel-maintained tmpfs-based /dev
    debugfs: Modify default debugfs directory for debugging pktcdvd.
    debugfs: Modified default dir of debugfs for debugging UHCI.
    debugfs: Change debugfs directory of IWMC3200
    debugfs: Change debugfs directory of trace-events-sample.h
    debugfs: Fix mount directory of debugfs by default in events.txt
    hpilo: add poll f_op
    hpilo: add interrupt handler
    hpilo: staging for interrupt handling
    driver core: platform_device_add_data(): use kmemdup()
    Driver core: Add support for compatibility classes
    uio: add generic driver for PCI 2.3 devices
    driver-core: move dma-coherent.c from kernel to driver/base
    mem_class: fix bug
    mem_class: use minor as index instead of searching the array
    driver model: constify attribute groups
    UIO: remove 'default n' from Kconfig
    Driver core: Add accessor for device platform data
    Driver core: move dev_get/set_drvdata to drivers/base/dd.c
    Driver core: add new device to bus's list before probing

    Linus Torvalds
     
  • Let attribute group vectors be declared "const". We'd
    like to let most attribute metadata live in read-only
    sections... this is a start.
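
    A hedged illustration (dev_attr_foo and the my_* names are invented,
    not from this patch) of what can now live in read-only data:

        /* the attribute list itself stays writable, but both the group
         * and the vector of group pointers can be const */
        static struct attribute *my_attrs[] = {
                &dev_attr_foo.attr,
                NULL,
        };

        static const struct attribute_group my_group = {
                .attrs = my_attrs,
        };

        static const struct attribute_group *my_groups[] = {
                &my_group,
                NULL,
        };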

    Signed-off-by: David Brownell
    Signed-off-by: Greg Kroah-Hartman

    David Brownell
     
  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu: (46 commits)
    powerpc64: convert to dynamic percpu allocator
    sparc64: use embedding percpu first chunk allocator
    percpu: kill lpage first chunk allocator
    x86,percpu: use embedding for 64bit NUMA and page for 32bit NUMA
    percpu: update embedding first chunk allocator to handle sparse units
    percpu: use group information to allocate vmap areas sparsely
    vmalloc: implement pcpu_get_vm_areas()
    vmalloc: separate out insert_vmalloc_vm()
    percpu: add chunk->base_addr
    percpu: add pcpu_unit_offsets[]
    percpu: introduce pcpu_alloc_info and pcpu_group_info
    percpu: move pcpu_lpage_build_unit_map() and pcpul_lpage_dump_cfg() upward
    percpu: add @align to pcpu_fc_alloc_fn_t
    percpu: make @dyn_size mandatory for pcpu_setup_first_chunk()
    percpu: drop @static_size from first chunk allocators
    percpu: generalize first chunk allocator selection
    percpu: build first chunk allocators selectively
    percpu: rename 4k first chunk allocator to page
    percpu: improve boot messages
    percpu: fix pcpu_reclaim() locking
    ...

    Fix trivial conflict as per Tejun Heo in kernel/sched.c

    Linus Torvalds
     

15 Sep, 2009

1 commit

  • * 'for-2.6.32' of git://git.kernel.dk/linux-2.6-block: (29 commits)
    block: use blkdev_issue_discard in blk_ioctl_discard
    Make DISCARD_BARRIER and DISCARD_NOBARRIER writes instead of reads
    block: don't assume device has a request list backing in nr_requests store
    block: Optimal I/O limit wrapper
    cfq: choose a new next_req when a request is dispatched
    Separate read and write statistics of in_flight requests
    aoe: end barrier bios with EOPNOTSUPP
    block: trace bio queueing trial only when it occurs
    block: enable rq CPU completion affinity by default
    cfq: fix the log message after dispatched a request
    block: use printk_once
    cciss: memory leak in cciss_init_one()
    splice: update mtime and atime on files
    block: make blk_iopoll_prep_sched() follow normal 0/1 return convention
    cfq-iosched: get rid of must_alloc flag
    block: use interrupts disabled version of raise_softirq_irqoff()
    block: fix comment in blk-iopoll.c
    block: adjust default budget for blk-iopoll
    block: fix long lines in block/blk-iopoll.c
    block: add blk-iopoll, a NAPI like approach for block devices
    ...

    Linus Torvalds
     

14 Sep, 2009

5 commits

  • blk_ioctl_discard duplicates large amounts of code from blkdev_issue_discard;
    the only difference between the two is that blkdev_issue_discard needs to
    send a barrier discard request while blk_ioctl_discard sends a non-barrier
    one, and blk_ioctl_discard needs to wait on the request. To facilitate
    this, add a flags argument to blkdev_issue_discard to control both aspects
    of the behaviour. This will be very useful later on for using the waiting
    functionality for other callers.
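
    A minimal sketch of the unified call (the DISCARD_FL_* flag names are
    assumptions, not confirmed by this log):

        static int discard_extent(struct block_device *bdev, sector_t sector,
                                  sector_t nr_sects, bool from_ioctl)
        {
                /* ioctl path: non-barrier, but wait for completion;
                 * kernel-internal path: barrier discard, no waiting */
                int flags = from_ioctl ? DISCARD_FL_WAIT : DISCARD_FL_BARRIER;

                return blkdev_issue_discard(bdev, sector, nr_sects,
                                            GFP_KERNEL, flags);
        }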

    Based on an earlier patch from Matthew Wilcox.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     
  • Stacked devices do not have a request list backing. For now, just error
    out with -EINVAL. Later
    we could make the limit apply on stacked devices too, for throttling
    reasons.

    This fixes

    5a54cd13353bb3b88887604e2c980aa01e314309

    and should go into 2.6.31 stable as well.

    Cc: stable@kernel.org
    Signed-off-by: Jens Axboe

    Jens Axboe
     
  • Implement blk_limits_io_opt() and make blk_queue_io_opt() a wrapper
    around it. DM needs this to avoid poking at the queue_limits directly.
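
    A sketch of what the wrapper pair looks like (a plausible
    reconstruction, not quoted from the patch):

        void blk_limits_io_opt(struct queue_limits *limits, unsigned int opt)
        {
                limits->io_opt = opt;
        }

        void blk_queue_io_opt(struct request_queue *q, unsigned int opt)
        {
                /* DM stacks queue_limits without a request_queue, so it
                 * can call blk_limits_io_opt() on its own limits */
                blk_limits_io_opt(&q->limits, opt);
        }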

    Signed-off-by: Martin K. Petersen
    Signed-off-by: Mike Snitzer
    Signed-off-by: Jens Axboe

    Martin K. Petersen
     
  • This patch addresses http://bugzilla.kernel.org/show_bug.cgi?id=13401, a
    regression introduced in 2.6.30.

    From the bug report:

    Signed-off-by: Jens Axboe

    Jeff Moyer
     
  • Currently, there is a single in_flight counter measuring the number of
    requests in the request_queue. But some monitoring tools would like to
    know how many read requests and write requests are in progress. Split the
    current in_flight counter into two separate counters for read and write.

    This information is exported as a sysfs attribute, as changing the
    currently available stat files would break the existing tools.
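
    A hedged sketch of the split (field and helper shapes are assumptions):

        /* one counter per direction instead of a single shared one */
        struct hd_struct {
                /* ... */
                unsigned int in_flight[2];      /* [0] reads, [1] writes */
        };

        static inline void part_inc_in_flight(struct hd_struct *part, int rw)
        {
                part->in_flight[rw]++;
        }

        static inline void part_dec_in_flight(struct hd_struct *part, int rw)
        {
                part->in_flight[rw]--;
        }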

    Signed-off-by: Nikanth Karthikesan
    Signed-off-by: Jens Axboe

    Nikanth Karthikesan
     

11 Sep, 2009

17 commits

  • If a BIO is discarded or crosses over the end of the device,
    no BIO queueing trial occurs.

    Originally, the trace was called just before make_request:
    [PATCH] Block queue IO tracing support (blktrace) as of 2006-03-23
         2056a782f8e7e65fd4bfd027506b4ce1c5e9ccd4

    Then two patches added checks in between:
    [PATCH] md: check bio address after mapping through partitions
           5ddfe9691c91a244e8d1be597b6428fcefd58103,
    [BLOCK] Don't allow empty barriers to be passed down to
    queues that don't grok them
           51fd77bd9f512ab6cc9df0733ba1caaab89eb957

    This breaks the original goal, so let's trace the event only when it
    actually happens.
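
    A sketch of the resulting order in __generic_make_request() (helper
    names are era-appropriate assumptions, surrounding code elided):

        if (bio_check_eod(bio, nr_sectors))
                goto end_io;            /* crosses end of device */

        if (bio_rw_flagged(bio, BIO_RW_DISCARD) &&
            !blk_queue_discard(q)) {
                err = -EOPNOTSUPP;
                goto end_io;            /* discard not supported */
        }

        /* only trace once every check has passed, i.e. when the bio
         * is really about to be queued */
        trace_block_bio_queue(q, bio);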

    Signed-off-by: Minchan Kim
    Acked-by: Wu Fengguang
    Cc: Li Zefan
    Signed-off-by: Jens Axboe

    Minchan Kim
     
  • Use cfq_log_cfqq() instead of cfq_log() so that the blktrace tools can
    show the process id when cfq dispatches a request.

    Signed-off-by: Shan Wei
    Signed-off-by: Jens Axboe

    Shan Wei
     
  • It's not currently used, as pointed out by Gui Jianfeng. We already
    check the wait_request flag to give an idling queue priority access to
    request allocation, so we don't need this extra flag.

    Signed-off-by: Jens Axboe

    Jens Axboe
     
  • We already have interrupts disabled at that point, so use the
    __raise_softirq_irqoff() variant.

    Signed-off-by: Jens Axboe

    Jens Axboe
     
  • Signed-off-by: Jens Axboe

    Jens Axboe
     
  • It's not exported, and I doubt we'll have a reason to change this...

    Signed-off-by: Jens Axboe

    Jens Axboe
     
  • Not sure why they happened in the first place; probably some bad
    terminal setting.

    Signed-off-by: Jens Axboe

    Jens Axboe
     
  • This borrows some code from NAPI and implements a polled completion
    mode for block devices. The idea is the same as NAPI - instead of
    doing the command completion when the irq occurs, schedule a dedicated
    softirq in the hopes that we will complete more IO when the iopoll
    handler is invoked. Devices have a budget of commands assigned, and will
    stay in polled mode as long as they continue to consume their budget
    from the iopoll softirq handler. If they do not, the device is set back
    to interrupt completion mode.

    This patch holds the core bits for blk-iopoll, device driver support
    sold separately.
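
    A hedged driver-side sketch of the interface (struct my_dev and the
    my_* helpers are invented; the blk_iopoll_* calls and their return
    conventions are assumed from the description above):

        #include <linux/blk-iopoll.h>
        #include <linux/interrupt.h>

        #define MY_BUDGET 64    /* max commands completed per poll pass */

        struct my_dev {
                struct blk_iopoll iopoll;
                /* ... hardware state ... */
        };

        /* iopoll handler: runs from the blk-iopoll softirq */
        static int my_iopoll(struct blk_iopoll *iop, int budget)
        {
                struct my_dev *dev = container_of(iop, struct my_dev,
                                                  iopoll);
                int done = 0;

                /* my_complete_one() would reap one finished command */
                while (done < budget && my_complete_one(dev))
                        done++;

                /* budget not consumed: leave polled mode, back to irqs */
                if (done < budget) {
                        blk_iopoll_complete(iop);
                        my_enable_device_irq(dev);
                }
                return done;
        }

        static irqreturn_t my_irq(int irq, void *data)
        {
                struct my_dev *dev = data;

                /* don't complete here; defer completions to the softirq */
                if (blk_iopoll_sched_prep(&dev->iopoll)) {
                        my_disable_device_irq(dev);
                        blk_iopoll_sched(&dev->iopoll);
                }
                return IRQ_HANDLED;
        }

        /* at probe: blk_iopoll_init(&dev->iopoll, MY_BUDGET, my_iopoll); */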

    Signed-off-by: Jens Axboe

    Jens Axboe
     
  • Instead of just checking whether this device uses block layer
    tagging, we can improve the detection by looking at the maximum
    queue depth it has reached. If that crosses 4, then deem it a
    queuing device.

    This is important on high IOPS devices, since plugging hurts
    the performance there (it can be as much as 10-15% of the sys
    time).

    Signed-off-by: Jens Axboe

    Jens Axboe
     
  • Get rid of any functions that test for these bits and make callers
    use bio_rw_flagged() directly. Then it is at least directly apparent
    what variable and flag they check.

    Signed-off-by: Jens Axboe

    Jens Axboe
     
  • Whenever a block device changes its read-only attribute,
    notify userspace about it.

    Signed-off-by: Hannes Reinecke
    Signed-off-by: Nikanth Karthikesan
    Signed-off-by: Jens Axboe

    Hannes Reinecke
     
  • o Get rid of busy_rt_queues infrastructure. Looks like it is redundant.

    o Once an RT queue gets a request, it will immediately preempt any of the
    BE or IDLE queues. Otherwise the queue will be put on the service tree and
    the scheduler will anyway select it before any BE or IDLE queue. Hence
    there seems to be no need to keep track of how many busy RT queues are
    currently on the service tree.

    Signed-off-by: Vivek Goyal
    Signed-off-by: Jens Axboe

    Vivek Goyal
     
  • To lessen the impact of async IO on sync IO, let the device drain of
    any async IO in progress when switching to a sync cfqq that has idling
    enabled.

    Signed-off-by: Jens Axboe

    Jens Axboe
     
  • Update scsi_io_completion() such that it only fails requests up to the
    next error boundary and retries the leftover. This enables the block
    layer to merge requests with different failfast settings and still behave
    correctly on errors. Allow merging of requests with different failfast
    settings.

    As SCSI is currently the only subsystem which follows failfast status,
    there's no need to worry about other block drivers for now.

    Signed-off-by: Tejun Heo
    Cc: Niel Lambrechts
    Cc: James Bottomley
    Signed-off-by: Jens Axboe

    Tejun Heo
     
  • Failfast has characteristics different from other attributes. When
    issuing, executing and successfully completing requests, failfast doesn't
    make any difference. It only affects how a request is handled on failure.
    Allowing requests with different failfast settings to be merged causes
    normal IOs to fail prematurely, while not allowing merges has performance
    penalties, as failfast is used for readaheads which are likely to be
    located near in-flight or to-be-issued normal IOs.

    This patch introduces the concept of 'mixed merge'. A request is a
    mixed merge if it is merge of segments which require different
    handling on failure. Currently the only mixable attributes are
    failfast ones (or lack thereof).

    When a bio with different failfast settings is added to an existing
    request or requests of different failfast settings are merged, the
    merged request is marked mixed. Each bio carries failfast settings
    and the request always tracks failfast state of the first bio. When
    the request fails, blk_rq_err_bytes() can be used to determine how
    many bytes can be safely failed without crossing into an area which
    requires further retries.

    This allows request merging regardless of failfast settings while
    keeping the failure handling correct.

    This patch only implements mixed merge but doesn't enable it. The
    next one will update SCSI to make use of mixed merge.
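
    For illustration, a hedged sketch of a driver error path consuming
    blk_rq_err_bytes() (blk_end_request() semantics assumed as of this
    era):

        /* fail only the bytes that share the first bio's failfast
         * handling; anything beyond the boundary must be retried */
        unsigned int safe_bytes = blk_rq_err_bytes(rq);

        if (blk_end_request(rq, error, safe_bytes)) {
                /* bytes past the boundary remain: requeue for retry */
                blk_requeue_request(rq->q, rq);
        }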

    Signed-off-by: Tejun Heo
    Cc: Niel Lambrechts
    Signed-off-by: Jens Axboe

    Tejun Heo
     
  • bio and request use the same set of failfast bits. This patch makes
    the following changes to simplify things.

    * Enumify BIO_RW* bits and reorder them such that BIO_RW_FAILFAST_*
    bits coincide with __REQ_FAILFAST_* bits.

    * The above pushes BIO_RW_AHEAD out of sync with __REQ_FAILFAST_DEV
    but the matching is useless anyway. init_request_from_bio() is
    responsible for setting FAILFAST bits on FS requests and non-FS
    requests never use BIO_RW_AHEAD. Drop the code and comment from
    blk_rq_bio_prep().

    * Define REQ_FAILFAST_MASK which is OR of all FAILFAST bits and
    simplify FAILFAST flags handling in init_request_from_bio().
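
    Per the description, the mask is simply the OR of the failfast
    request bits (sketch):

        #define REQ_FAILFAST_MASK \
                (REQ_FAILFAST_DEV | REQ_FAILFAST_TRANSPORT | \
                 REQ_FAILFAST_DRIVER)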

    Signed-off-by: Tejun Heo
    Signed-off-by: Jens Axboe

    Tejun Heo
     
  • This enables us to track who does what and print info. Its main use
    is catching dirty inodes on the default_backing_dev_info, so we can
    fix that up.

    Signed-off-by: Jens Axboe

    Jens Axboe
     

02 Sep, 2009

1 commit

  • The patch "block: Use accessor functions for queue limits"
    (ae03bf639a5027d27270123f5f6e3ee6a412781d) changed queue_max_sectors_store()
    to use blk_queue_max_sectors() instead of directly assigning the value.

    But blk_queue_max_sectors() differs a bit:
    1. It sets both max_sectors_kb and max_hw_sectors_kb.
    2. It never allows one to change max_sectors_kb above BLK_DEF_MAX_SECTORS.
    If a greater value is specified, max_hw_sectors is set to that value but
    max_sectors is set to BLK_DEF_MAX_SECTORS.

    I am not sure whether blk_queue_max_sectors() should be changed, as it
    seems to have been that way for a long time, and there may be callers
    depending on that behaviour.

    This patch simply reverts to the older way of directly assigning the value to
    max_sectors as it was before.
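
    The store helper thus goes back to a direct assignment, roughly
    (sketch; the sysfs value is in KB, max_sectors in 512-byte sectors):

        spin_lock_irq(q->queue_lock);
        q->limits.max_sectors = max_sectors_kb << 1;    /* KB -> sectors */
        spin_unlock_irq(q->queue_lock);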

    Signed-off-by: Nikanth Karthikesan
    Signed-off-by: Jens Axboe

    Nikanth Karthikesan
     

14 Aug, 2009

1 commit

  • Conflicts:
    arch/sparc/kernel/smp_64.c
    arch/x86/kernel/cpu/perf_counter.c
    arch/x86/kernel/setup_percpu.c
    drivers/cpufreq/cpufreq_ondemand.c
    mm/percpu.c

    Conflicts in the core and arch percpu code are mostly from commit
    ed78e1e078dd44249f88b1dd8c76dafb39567161, which substituted nr_cpu_ids
    for many uses of num_possible_cpus(). As the for-next branch has moved
    all the first chunk allocators into mm/percpu.c, the changes are moved
    from arch code to mm/percpu.c.

    Signed-off-by: Tejun Heo

    Tejun Heo
     

05 Aug, 2009

1 commit


01 Aug, 2009

4 commits


29 Jul, 2009

1 commit

  • Prior to the change for more sane end_io functions, we exported
    the helpers with the normal EXPORT_SYMBOL(). That got changed
    to _GPL() for the new interface. Revert that particular change,
    on the basis that this is basic functionality and doesn't dip
    into internal structures. If these exports can't be non-GPL,
    then we may as well make EXPORT_SYMBOL() imply GPL for
    everything.

    Signed-off-by: Jens Axboe

    Jens Axboe
     

28 Jul, 2009

2 commits


17 Jul, 2009

2 commits

  • In blk-sysfs.c, queue_var_store uses unsigned long to store data, but
    queue_var_show uses unsigned int to show data. This causes:

    # echo 70000000000 > /sys/block//queue/read_ahead_kb
    # cat /sys/block//queue/read_ahead_kb => shows a wrong value

    Fix it by using unsigned long.

    While at it, convert queue_rq_affinity_show() such that it uses bool
    variable instead of explicit != 0 testing.
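
    The fix amounts to widening the show side to match the store side,
    roughly:

        /* before: took an unsigned int, truncating large values */
        static ssize_t queue_var_show(unsigned long var, char *page)
        {
                return sprintf(page, "%lu\n", var);
        }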

    Signed-off-by: Xiaotian Feng
    Signed-off-by: Tejun Heo

    Xiaotian Feng
     
  • Commit ab0fd1debe730ec9998678a0c53caefbd121ed10 tries to prevent merging
    of requests with different failfast settings. In elv_rq_merge_ok(), it
    compares the new bio's failfast flags against the merge target request's.
    However, the flag testing accessors for bio and blk don't return booleans
    but the tested bit value directly, and the FAILFAST bits on bio and blk
    don't match, so directly comparing them with == results in false
    negatives, unnecessarily preventing merges of readahead requests.

    This patch converts the results to booleans by negating them before
    comparison.
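
    A hedged sketch of the fixed comparison in elv_rq_merge_ok()
    (accessor names are era-appropriate assumptions):

        /* negate each side so only boolean outcomes are compared */
        if (!bio_failfast_dev(bio)       != !blk_failfast_dev(rq) ||
            !bio_failfast_transport(bio) != !blk_failfast_transport(rq) ||
            !bio_failfast_driver(bio)    != !blk_failfast_driver(rq))
                return 0;       /* differing failfast settings: no merge */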

    Signed-off-by: Tejun Heo
    Cc: Jens Axboe
    Cc: Boaz Harrosh
    Cc: FUJITA Tomonori
    Cc: James Bottomley
    Cc: Jeff Garzik

    Tejun Heo
     

11 Jul, 2009

2 commits

  • In case memory is scarce, we now default to oom_cfqq. Once memory is
    available again, we should allocate a new cfqq and stop using oom_cfqq for
    a particular io context.

    Once a new request comes in, check if we are using oom_cfqq, and if yes,
    try to allocate a new cfqq.
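
    A hedged sketch of the check in the request set-up path (function
    and field names are assumptions based on the cfq code of this era):

        cfqq = cic_to_cfqq(cic, is_sync);
        if (!cfqq || cfqq == &cfqd->oom_cfqq) {
                /* memory may be available again: retry a real allocation */
                cfqq = cfq_get_queue(cfqd, is_sync, cic->ioc, gfp_mask);
                cic_set_cfqq(cic, cfqq, is_sync);
        }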

    Tested the patch by forcing the use of oom_cfqq; upon the next request,
    the thread realized it was using oom_cfqq and allocated a new cfqq.

    Signed-off-by: Vivek Goyal
    Signed-off-by: Jens Axboe

    Vivek Goyal
     
  • Currently, blk_scsi_ioctl_init() is not called since it lacks an
    initcall marking. This causes the command table to be uninitialized,
    hence some commands are blocked when they should not have been.

    This fixes a regression introduced by commit
    018e0446890661504783f92388ecce7138c1566d
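
    The fix is a one-liner of this shape (the initcall level is an
    assumption):

        /* ensure the command table is initialized during boot */
        fs_initcall(blk_scsi_ioctl_init);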

    Signed-off-by: FUJITA Tomonori
    Signed-off-by: Jens Axboe

    FUJITA Tomonori