14 Aug, 2009

1 commit

  • Conflicts:
    arch/sparc/kernel/smp_64.c
    arch/x86/kernel/cpu/perf_counter.c
    arch/x86/kernel/setup_percpu.c
    drivers/cpufreq/cpufreq_ondemand.c
    mm/percpu.c

    Conflicts in core and arch percpu code are mostly from commit
    ed78e1e078dd44249f88b1dd8c76dafb39567161, which replaced many uses of
    num_possible_cpus() with nr_cpu_ids. As the for-next branch has moved
    all the first chunk allocators into mm/percpu.c, the changes are moved
    from arch code to mm/percpu.c.

    Signed-off-by: Tejun Heo

    Tejun Heo
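
    A minimal userspace sketch of why the two quantities are not
    interchangeable, assuming the usual kernel meanings
    (num_possible_cpus() counts the CPUs in the possible mask, nr_cpu_ids
    is the highest possible CPU id plus one). The cpu_mask_t type and the
    model_* helpers are hypothetical stand-ins, not kernel code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical userspace model: one bit per possible CPU id. */
typedef uint64_t cpu_mask_t;

/* Models num_possible_cpus(): the number of bits set in the mask. */
static unsigned int model_num_possible_cpus(cpu_mask_t possible)
{
    unsigned int n = 0;

    for (; possible; possible &= possible - 1)  /* clear lowest set bit */
        n++;
    return n;
}

/* Models nr_cpu_ids: the highest possible CPU id plus one. */
static unsigned int model_nr_cpu_ids(cpu_mask_t possible)
{
    unsigned int ids = 0;

    assert(possible != 0);  /* at least one CPU must be possible */
    for (; possible; possible >>= 1)
        ids++;
    return ids;
}
```

    With a sparse mask such as 0x5 (CPUs 0 and 2), the count is 2 while
    an array indexed by CPU id needs 3 slots, so sizing such an array by
    the count would be too small in this model.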
     

05 Aug, 2009

1 commit


01 Aug, 2009

4 commits


29 Jul, 2009

1 commit

  • Prior to the change for more sane end_io functions, we exported
    the helpers with the normal EXPORT_SYMBOL(). That got changed
    to _GPL() for the new interface. Revert that particular change,
    on the basis that this is basic functionality and doesn't dip
    into internal structures. If these exports can't be non-GPL,
    then we may as well make EXPORT_SYMBOL() imply GPL for
    everything.

    Signed-off-by: Jens Axboe

    Jens Axboe
     

28 Jul, 2009

2 commits


17 Jul, 2009

2 commits

  • In blk-sysfs.c, queue_var_store() uses unsigned long to store data,
    but queue_var_show() uses unsigned int to show it. As a result:

    # echo 70000000000 > /sys/block//queue/read_ahead_kb
    # cat /sys/block//queue/read_ahead_kb => get wrong value

    Fix it by using unsigned long.

    While at it, convert queue_rq_affinity_show() such that it uses bool
    variable instead of explicit != 0 testing.

    Signed-off-by: Xiaotian Feng
    Signed-off-by: Tejun Heo

    Xiaotian Feng
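
    The truncation can be reproduced in userspace. This is a sketch
    only, assuming an LP64 platform where unsigned long is 64 bits and
    unsigned int is 32 bits; the shown_through_* helpers are hypothetical
    and model just the parameter width, not the real sysfs code.

```c
#include <assert.h>

static_assert(sizeof(unsigned long) > sizeof(unsigned int),
              "this sketch assumes an LP64 platform");

/* Models the pre-fix show path: the value passes through an unsigned int. */
static unsigned long shown_through_uint(unsigned long stored)
{
    unsigned int var = (unsigned int)stored;  /* upper bits are dropped here */

    return var;
}

/* Models the fixed show path: the parameter is as wide as the stored value. */
static unsigned long shown_through_ulong(unsigned long stored)
{
    return stored;
}
```

    In this model, 70000000000 comes back as 1280523264 (the value
    modulo 2^32) through the narrow path and unchanged through the wide
    one.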
     
  • Commit ab0fd1debe730ec9998678a0c53caefbd121ed10 tries to prevent merging
    of requests with different failfast settings. In elv_rq_merge_ok(),
    it compares the new bio's failfast flags against the merge target
    request's. However, the flag testing accessors for bio and blk don't
    return a boolean but the tested bit value directly, and the FAILFAST
    bits on bio and blk don't match, so directly comparing them with ==
    results in false negatives, unnecessarily preventing the merge of
    readahead requests.

    This patch converts the results to boolean by negating them before
    comparison.

    Signed-off-by: Tejun Heo
    Cc: Jens Axboe
    Cc: Boaz Harrosh
    Cc: FUJITA Tomonori
    Cc: James Bottomley
    Cc: Jeff Garzik

    Tejun Heo
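
    The pitfall and the fix can be sketched in plain C. The DEMO_* bit
    positions and the demo_*/merge_ok_* helpers below are hypothetical;
    they only assume, as the message says, that the same logical flag
    sits on different bits in the two flag words.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical layouts: the same logical flag lives on different bits. */
#define DEMO_BIO_FAILFAST (1u << 2)
#define DEMO_REQ_FAILFAST (1u << 5)

static_assert(DEMO_BIO_FAILFAST != DEMO_REQ_FAILFAST,
              "the sketch needs mismatched bit positions");

/* Accessors that return the tested bit value, not 0 or 1. */
static unsigned int demo_bio_failfast(unsigned int bio_flags)
{
    return bio_flags & DEMO_BIO_FAILFAST;
}

static unsigned int demo_req_failfast(unsigned int req_flags)
{
    return req_flags & DEMO_REQ_FAILFAST;
}

/* Buggy check: compares raw bit values, so "both set" never matches. */
static bool merge_ok_raw(unsigned int bio_flags, unsigned int req_flags)
{
    return demo_bio_failfast(bio_flags) == demo_req_failfast(req_flags);
}

/* Fixed check: negation collapses each result to 0 or 1 first. */
static bool merge_ok_bool(unsigned int bio_flags, unsigned int req_flags)
{
    return !demo_bio_failfast(bio_flags) == !demo_req_failfast(req_flags);
}
```

    When both sides have the flag set, merge_ok_raw() refuses the merge
    while merge_ok_bool() allows it; when only one side has it, both
    refuse.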
     

11 Jul, 2009

2 commits

  • In case memory is scarce, we now default to oom_cfqq. Once memory is
    available again, we should allocate a new cfqq and stop using oom_cfqq for
    a particular io context.

    Once a new request comes in, check if we are using oom_cfqq, and if yes,
    try to allocate a new cfqq.

    Tested the patch by forcing the use of oom_cfqq; on the next request, the
    thread noticed that it was using oom_cfqq and allocated a new cfqq.

    Signed-off-by: Vivek Goyal
    Signed-off-by: Jens Axboe

    Vivek Goyal
     
  • Currently, blk_scsi_ioctl_init() is not called since it lacks
    an initcall marking. This causes the command table to be
    uninitialized, hence some commands are blocked when they should
    not have been.

    This fixes a regression introduced by commit
    018e0446890661504783f92388ecce7138c1566d

    Signed-off-by: FUJITA Tomonori
    Signed-off-by: Jens Axboe

    FUJITA Tomonori
     

04 Jul, 2009

2 commits

  • Pull linus#master to merge PER_CPU_DEF_ATTRIBUTES and alpha build fix
    changes. As alpha in percpu tree uses 'weak' attribute instead of
    inline assembly, there's no need for __used attribute.

    Conflicts:
    arch/alpha/include/asm/percpu.h
    arch/mn10300/kernel/vmlinux.lds.S
    include/linux/percpu-defs.h

    Tejun Heo
     
  • Block layer used to merge requests and bios with different failfast
    settings. This caused regular IOs to fail prematurely when they were
    merged into failfast requests for readahead.

    Niel Lambrechts could trigger the problem semi-reliably on ext4 when
    resuming from STR. ext4 uses readahead when reading inodes and
    combined with the deterministic extra SATA PHY exception cycle during
    resume on the specific configuration, non-readahead inode read would
    fail causing ext4 errors. Please read the following thread for
    details.

    http://lkml.org/lkml/2009/5/23/21

    This patch makes block layer reject merging if the failfast settings
    don't match. This is correct but likely to lower IO performance by
    preventing regular IOs from mingling into surrounding readahead
    requests. Changes to allow such mixed merges and handle errors
    correctly will be added later.

    Signed-off-by: Tejun Heo
    Reported-by: Niel Lambrechts
    Cc: Theodore Tso
    Signed-off-by: Jens Axboe

    Tejun Heo
     

01 Jul, 2009

6 commits

  • With the changes for falling back to an oom_cfqq, we never fail
    to find/allocate a queue in cfq_get_queue(). So remove the check.

    Signed-off-by: Shan Wei
    Signed-off-by: Jens Axboe

    Shan Wei
     
  • The next_ordered flag is only meaningful for devices that use __make_request.
    So move the test against next_ordered out of generic code and into
    __make_request.

    Since this test was added, barriers have not worked on md or any
    devices that don't use __make_request and so don't bother to set
    next_ordered. (dm explicitly sets something other than
    QUEUE_ORDERED_NONE since
    commit 99360b4c18f7675b50d283301d46d755affe75fd
    but notes in the comments that it is otherwise meaningless).

    Cc: Ken Milmore
    Cc: stable@kernel.org
    Signed-off-by: NeilBrown
    Signed-off-by: Jens Axboe

    NeilBrown
     
  • The initial patches to support this through sysfs export were broken
    and have been if 0'ed out in any release. So let's just kill the code
    and reclaim some space in struct request_queue. If anyone would later
    like to fix up the sysfs bits, the git history can easily restore
    the removed bits.

    Signed-off-by: Jens Axboe

    Jens Axboe
     
  • This patch restores stacking ability to the block layer integrity
    infrastructure by creating a set of dedicated bip slabs. Each bip slab
    has an embedded bio_vec array at the end. This cuts down on memory
    allocations and also simplifies the code compared to the original bvec
    version. Only the largest bip slab is backed by a mempool. The pool is
    contained in the bio_set so stacking drivers can ensure forward
    progress.

    Signed-off-by: Martin K. Petersen
    Signed-off-by: Jens Axboe

    Martin K. Petersen
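
    The "embedded array at the end" layout can be illustrated with a C
    flexible array member. This is a userspace sketch under stated
    assumptions: struct demo_payload and struct demo_vec are hypothetical
    stand-ins for the bip and bio_vec, and calloc() replaces the
    dedicated slabs and mempool.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a bio_vec entry. */
struct demo_vec {
    void *page;
    unsigned int len;
    unsigned int offset;
};

/* Hypothetical stand-in for the payload: header plus trailing vector array. */
struct demo_payload {
    unsigned short nr_vecs;
    struct demo_vec vecs[];  /* flexible array member, lives in the same block */
};

/* One allocation covers both the header and nr_vecs vector entries. */
static struct demo_payload *demo_payload_alloc(unsigned short nr_vecs)
{
    struct demo_payload *p;

    assert(nr_vecs > 0);
    p = calloc(1, sizeof(*p) + nr_vecs * sizeof(p->vecs[0]));
    if (p)
        p->nr_vecs = nr_vecs;
    return p;
}
```

    A single free() releases header and vectors together, which is the
    sense in which this layout saves an allocation compared with a
    separately allocated vector array.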
     
  • Setup an emergency fallback cfqq that we allocate at IO scheduler init
    time. If the slab allocation fails in cfq_find_alloc_queue(), we'll just
    punt IO to that cfqq instead. This ensures that cfq_find_alloc_queue()
    never fails without having to ensure free memory.

    On cfqq lookup, always try to allocate a new cfqq if the given cfq io
    context has the oom_cfqq assigned. This ensures that we only temporarily
    punt to this shared queue.

    Reviewed-by: Jeff Moyer
    Signed-off-by: Jens Axboe

    Jens Axboe
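
    The fallback scheme can be sketched in userspace as follows. All
    names (demo_sched, demo_ctx, demo_find_alloc_queue, demo_get_queue)
    and the simulate_oom switch are hypothetical; malloc() stands in for
    the slab allocation, and the code only models the control flow
    described above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

struct demo_queue {
    int id;
};

struct demo_sched {
    struct demo_queue oom_queue;  /* set up once at init, shared fallback */
    bool simulate_oom;            /* stands in for a failing allocation */
};

struct demo_ctx {
    struct demo_queue *queue;     /* queue currently assigned to this context */
};

/* Never fails: falls back to the preallocated queue when allocation fails. */
static struct demo_queue *demo_find_alloc_queue(struct demo_sched *s)
{
    struct demo_queue *q = s->simulate_oom ? NULL : malloc(sizeof(*q));

    if (!q)
        return &s->oom_queue;
    q->id = 1;
    return q;
}

/* Lookup: (re)try allocation if nothing is assigned or we were punted. */
static struct demo_queue *demo_get_queue(struct demo_sched *s,
                                         struct demo_ctx *ctx)
{
    assert(s != NULL && ctx != NULL);
    if (!ctx->queue || ctx->queue == &s->oom_queue)
        ctx->queue = demo_find_alloc_queue(s);
    return ctx->queue;
}
```

    A context that was handed the shared queue under memory pressure
    gets a queue of its own on the first lookup after allocation
    succeeds again, so use of the fallback stays temporary.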
     
  • We're going to be needing that init code outside of that function
    to get rid of the __GFP_NOFAIL in cfqq allocation.

    Reviewed-by: Jeff Moyer
    Signed-off-by: Jens Axboe

    Jens Axboe
     

24 Jun, 2009

1 commit

  • Percpu variable definition is about to be updated such that all percpu
    symbols including the static ones must be unique. Update percpu
    variable definitions accordingly.

    * as,cfq: rename ioc_count uniquely

    * cpufreq: rename cpu_dbs_info uniquely

    * xen: move nesting_count out of xen_evtchn_do_upcall() and rename it

    * mm: move ratelimits out of balance_dirty_pages_ratelimited_nr() and
    rename it

    * ipv4,6: rename cookie_scratch uniquely

    * x86 perf_counter: rename prev_left to pmc_prev_left, irq_entry to
    pmc_irq_entry and nmi_entry to pmc_nmi_entry

    * perf_counter: rename disable_count to perf_disable_count

    * ftrace: rename test_event_disable to ftrace_test_event_disable

    * kmemleak: rename test_pointer to kmemleak_test_pointer

    * mce: rename next_interval to mce_next_interval

    [ Impact: percpu usage cleanups, no duplicate static percpu var names ]

    Signed-off-by: Tejun Heo
    Reviewed-by: Christoph Lameter
    Cc: Ivan Kokshaysky
    Cc: Jens Axboe
    Cc: Dave Jones
    Cc: Jeremy Fitzhardinge
    Cc: linux-mm
    Cc: David S. Miller
    Cc: Peter Zijlstra
    Cc: Steven Rostedt
    Cc: Li Zefan
    Cc: Catalin Marinas
    Cc: Andi Kleen

    Tejun Heo
     

21 Jun, 2009

1 commit


19 Jun, 2009

2 commits


18 Jun, 2009

1 commit


17 Jun, 2009

1 commit

  • * git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core-2.6: (64 commits)
    debugfs: use specified mode to possibly mark files read/write only
    debugfs: Fix terminology inconsistency of dir name to mount debugfs filesystem.
    xen: remove driver_data direct access of struct device from more drivers
    usb: gadget: at91_udc: remove driver_data direct access of struct device
    uml: remove driver_data direct access of struct device
    block/ps3: remove driver_data direct access of struct device
    s390: remove driver_data direct access of struct device
    parport: remove driver_data direct access of struct device
    parisc: remove driver_data direct access of struct device
    of_serial: remove driver_data direct access of struct device
    mips: remove driver_data direct access of struct device
    ipmi: remove driver_data direct access of struct device
    infiniband: ehca: remove driver_data direct access of struct device
    ibmvscsi: gadget: at91_udc: remove driver_data direct access of struct device
    hvcs: remove driver_data direct access of struct device
    xen block: remove driver_data direct access of struct device
    thermal: remove driver_data direct access of struct device
    scsi: remove driver_data direct access of struct device
    pcmcia: remove driver_data direct access of struct device
    PCIE: remove driver_data direct access of struct device
    ...

    Manually fix up trivial conflicts due to different direct driver_data
    direct access fixups in drivers/block/{ps3disk.c,ps3vram.c}

    Linus Torvalds
     

16 Jun, 2009

7 commits


12 Jun, 2009

3 commits

  • Fix kernel-doc warnings in recently changed block/ source code.

    Signed-off-by: Randy Dunlap
    Signed-off-by: Linus Torvalds

    Randy Dunlap
     
  • * 'for-2.6.31' of git://git.kernel.dk/linux-2.6-block: (153 commits)
    block: add request clone interface (v2)
    floppy: fix hibernation
    ramdisk: remove long-deprecated "ramdisk=" boot-time parameter
    fs/bio.c: add missing __user annotation
    block: prevent possible io_context->refcount overflow
    Add serial number support for virtio_blk, V4a
    block: Add missing bounce_pfn stacking and fix comments
    Revert "block: Fix bounce limit setting in DM"
    cciss: decode unit attention in SCSI error handling code
    cciss: Remove no longer needed sendcmd reject processing code
    cciss: change SCSI error handling routines to work with interrupts enabled.
    cciss: separate error processing and command retrying code in sendcmd_withirq_core()
    cciss: factor out fix target status processing code from sendcmd functions
    cciss: simplify interface of sendcmd() and sendcmd_withirq()
    cciss: factor out core of sendcmd_withirq() for use by SCSI error handling code
    cciss: Use schedule_timeout_uninterruptible in SCSI error handling code
    block: needs to set the residual length of a bidi request
    Revert "block: implement blkdev_readpages"
    block: Fix bounce limit setting in DM
    Removed reference to non-existing file Documentation/PCI/PCI-DMA-mapping.txt
    ...

    Manually fix conflicts with tracing updates in:
    block/blk-sysfs.c
    drivers/ide/ide-atapi.c
    drivers/ide/ide-cd.c
    drivers/ide/ide-floppy.c
    drivers/ide/ide-tape.c
    include/trace/events/block.h
    kernel/trace/blktrace.c

    Linus Torvalds
     
  • * 'for-2.6.31' of git://git.kernel.org/pub/scm/linux/kernel/git/bart/ide-2.6: (28 commits)
    ide-tape: fix debug call
    alim15x3: Remove historical hacks, re-enable init_hwif for PowerPC
    ide-dma: don't reset request fields on dma_timeout_retry()
    ide: drop rq->data handling from ide_map_sg()
    ide-atapi: kill unused fields and callbacks
    ide-tape: simplify read/write functions
    ide-tape: use byte size instead of sectors on rw issue functions
    ide-tape: unify r/w init paths
    ide-tape: kill idetape_bh
    ide-tape: use standard data transfer mechanism
    ide-tape: use single continuous buffer
    ide-atapi,tape,floppy: allow ->pc_callback() to change rq->data_len
    ide-tape,floppy: fix failed command completion after request sense
    ide-pm: don't abuse rq->data
    ide-cd,atapi: use bio for internal commands
    ide-atapi: convert ide-{floppy,tape} to using preallocated sense buffer
    ide-cd: convert to using generic sense request
    ide: add helpers for preparing sense requests
    ide-cd: don't abuse rq->buffer
    ide-atapi: don't abuse rq->buffer
    ...

    Linus Torvalds
     

11 Jun, 2009

3 commits

  • This patch adds the following 2 interfaces for request-stacking drivers:

    - blk_rq_prep_clone(struct request *clone, struct request *orig,
                        struct bio_set *bs, gfp_t gfp_mask,
                        int (*bio_ctr)(struct bio *, struct bio*, void *),
                        void *data)
        * Clones bios in the original request to the clone request
          (bio_ctr is called for each cloned bio.)
        * Copies attributes of the original request to the clone request.
          The actual data parts (e.g. ->cmd, ->buffer, ->sense) are not
          copied.

    - blk_rq_unprep_clone(struct request *clone)
        * Frees cloned bios from the clone request.

    Request stacking drivers (e.g. request-based dm) need to make a clone
    request for a submitted request and dispatch it to other devices.

    To allocate the request for the clone, request stacking drivers may
    not be able to use blk_get_request() because the allocation may be
    done in an irq-disabled context.
    So blk_rq_prep_clone() takes a request allocated by the caller
    as an argument.

    For each clone bio in the clone request, request stacking drivers
    should be able to set up their own completion handler.
    So blk_rq_prep_clone() takes a callback function which is called
    for each clone bio, and a pointer for private data which is passed
    to the callback.

    NOTE:
    blk_rq_prep_clone() doesn't copy any actual data of the original
    request. Pages are shared between original bios and cloned bios.
    So the caller must not complete the original request before the
    clone request.

    Signed-off-by: Kiyoshi Ueda
    Signed-off-by: Jun'ichi Nomura
    Cc: Boaz Harrosh
    Signed-off-by: Jens Axboe

    Kiyoshi Ueda
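
    A simplified userspace model of the clone/unclone pair, for
    illustration only. The demo_* types and functions are hypothetical:
    they drop the bio_set and gfp_mask arguments, use malloc()/free(),
    and return -1 on failure, none of which is taken from the real
    interface. What the sketch keeps is the caller-allocated clone, the
    per-bio constructor callback with private data, and the shared pages.

```c
#include <assert.h>
#include <stdlib.h>

struct demo_bio {
    void *page;                /* data page, shared between original and clone */
    struct demo_bio *next;
};

struct demo_request {
    struct demo_bio *bio;      /* head of the bio chain */
    struct demo_bio *biotail;  /* tail of the bio chain */
    unsigned int flags;        /* stands in for the copied attributes */
};

typedef int (*demo_bio_ctr)(struct demo_bio *clone, struct demo_bio *orig,
                            void *data);

/* Example constructor: counts cloned bios through the private data pointer. */
static int demo_count_ctr(struct demo_bio *clone, struct demo_bio *orig,
                          void *data)
{
    (void)clone;
    (void)orig;
    ++*(int *)data;
    return 0;
}

/* Frees the cloned bios; the shared pages are left untouched. */
static void demo_rq_unprep_clone(struct demo_request *clone)
{
    struct demo_bio *bio;

    while ((bio = clone->bio) != NULL) {
        clone->bio = bio->next;
        free(bio);
    }
    clone->biotail = NULL;
}

/* Fills a caller-allocated clone; returns 0 on success, -1 on failure. */
static int demo_rq_prep_clone(struct demo_request *clone,
                              struct demo_request *orig,
                              demo_bio_ctr ctr, void *data)
{
    struct demo_bio *src, *bio;

    assert(clone != NULL && orig != NULL);
    clone->bio = clone->biotail = NULL;

    for (src = orig->bio; src; src = src->next) {
        bio = malloc(sizeof(*bio));
        if (!bio)
            goto fail;
        bio->page = src->page;  /* share the page, copy no data */
        bio->next = NULL;
        if (ctr && ctr(bio, src, data)) {
            free(bio);
            goto fail;
        }
        if (clone->biotail)
            clone->biotail->next = bio;
        else
            clone->bio = bio;
        clone->biotail = bio;
    }
    clone->flags = orig->flags;  /* attributes are copied, data is not */
    return 0;

fail:
    demo_rq_unprep_clone(clone);
    return -1;
}
```

    Because the cloned bios point at the same pages as the originals,
    the model shows why the original request has to outlive the clone.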
     
  • * 'tracing-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (244 commits)
    Revert "x86, bts: reenable ptrace branch trace support"
    tracing: do not translate event helper macros in print format
    ftrace/documentation: fix typo in function grapher name
    tracing/events: convert block trace points to TRACE_EVENT(), fix !CONFIG_BLOCK
    tracing: add protection around module events unload
    tracing: add trace_seq_vprint interface
    tracing: fix the block trace points print size
    tracing/events: convert block trace points to TRACE_EVENT()
    ring-buffer: fix ret in rb_add_time_stamp
    ring-buffer: pass in lockdep class key for reader_lock
    tracing: add annotation to what type of stack trace is recorded
    tracing: fix multiple use of __print_flags and __print_symbolic
    tracing/events: fix output format of user stack
    tracing/events: fix output format of kernel stack
    tracing/trace_stack: fix the number of entries in the header
    ring-buffer: discard timestamps that are at the start of the buffer
    ring-buffer: try to discard unneeded timestamps
    ring-buffer: fix bug in ring_buffer_discard_commit
    ftrace: do not profile functions when disabled
    tracing: make trace pipe recognize latency format flag
    ...

    Linus Torvalds
     
  • Currently io_context has an atomic_t (32-bit) as its refcount. In the
    case of cfq, for each device against which a task does I/O, a reference
    to the io_context is taken. And when multiple processes share an
    io_context (CLONE_IO), each of them also holds a reference to the same
    io_context.

    Theoretically the possible maximum number of processes sharing the same
    io_context + the number of disks/cfq_data referring to the same io_context
    can overflow the 32-bit counter on a very high-end machine.

    Even though it is an improbable case, let us make it atomic_long_t.

    Signed-off-by: Nikanth Karthikesan
    Signed-off-by: Andrew Morton
    Signed-off-by: Jens Axboe

    Nikanth Karthikesan
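
    The overflow can be demonstrated with C11 atomics in userspace. This
    is a sketch assuming an LP64 platform (32-bit int, 64-bit long);
    atomic_int and atomic_long stand in for the kernel's atomic_t and
    atomic_long_t, and the demo_get_ref* helpers are hypothetical.

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>

static_assert(sizeof(int) == 4 && sizeof(long) == 8,
              "this sketch assumes an LP64 platform");

/* Takes one reference on a 32-bit counter and returns the new value. */
static int demo_get_ref32(atomic_int *refcount)
{
    /* C11 atomic arithmetic on signed types wraps around silently. */
    atomic_fetch_add(refcount, 1);
    return atomic_load(refcount);
}

/* Takes one reference on a long counter and returns the new value. */
static long demo_get_ref_long(atomic_long *refcount)
{
    atomic_fetch_add(refcount, 1);
    return atomic_load(refcount);
}
```

    Starting from INT_MAX, one more reference wraps the 32-bit counter
    to INT_MIN, while the long counter simply moves on to INT_MAX + 1.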