18 Apr, 2008

1 commit


11 Apr, 2008

1 commit

  • This patch adds the missing include directive to the
    cciss.c source file. This was discovered by our release team when building
    the kernel for the Alpha architecture.

    The build failed on Alpha because the functions 'sg_init_table' and
    'sg_page' are not declared without the include.

    Signed-off-by: Mike Pagano
    Cc: Jens Axboe
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mike Pagano
     

09 Apr, 2008

1 commit

  • When __blk_end_request returns nonzero, it means that the request was
    not completely processed and some BIOs are still attached. Since we
    have dequeued it by that time, it means leaking requests and hanging
    processes, which is why BUG() was in there. In ub this happens if
    a packet request ends normally, but with residue (e.g. when scsi_id
    issues INQUIRY).

    The fix is to make sure that arguments passed to __blk_end_request
    are correct: the full request length and not just transferred length.
    The transferred length is indicated to applications by adjusting
    rq->data_len with old, unchanged code outside of this patch.

    Signed-off-by: Pete Zaitcev
    Cc: Kiyoshi Ueda
    Cc: Greg KH
    Cc: Boaz Harrosh
    Cc: Martin Schwidefsky
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pete Zaitcev
     

04 Apr, 2008

1 commit


03 Apr, 2008

1 commit

  • NBD does not protect the nbd_device's socket from becoming NULL during
    receives.

    This closes a race with the NBD_CLEAR_SOCK ioctl (nbd-client -d) setting
    the nbd_device's socket to NULL right before NBD calls sock_xmit.

    Signed-off-by: Mike Snitzer
    Cc: Paul Clements
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mike Snitzer
     

26 Mar, 2008

1 commit


18 Mar, 2008

1 commit


17 Mar, 2008

2 commits


14 Mar, 2008

1 commit

  • Floppy rmmod locks up when no such hardware was initialized, since there is
    nobody to wake the remove code up. Remove the completion, because release is
    called during platform_unregister anyway.

    Signed-off-by: Jiri Slaby
    Cc: Jens Axboe
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jiri Slaby
     

13 Mar, 2008

1 commit

  • The iSeries viodasd driver does some very strange things with
    scatterlists, one of which causes a BUG_ON to trigger when
    scatterlist debugging is enabled, because it initializes the
    scatterlist with memset instead of sg_init_table().

    This fixes it by using sg_init_table(). The rest of the stuff
    it does to that poor list is still pretty awful but it will work.

    I may look into fixing things in a nicer way some other time.
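
    The difference between the two initializations can be illustrated with
    a toy userspace model. This is a sketch under assumptions, not the
    kernel's scatterlist code: struct sg_model, SG_MAGIC_MODEL and the
    helper names are hypothetical, and the model only captures the idea
    that a debug build stamps each entry and marks the last one.

```c
#include <assert.h>
#include <string.h>

#define SG_MAGIC_MODEL 0x87654321UL /* hypothetical debug stamp */

/* Toy scatterlist entry; the real structure carries page/offset/length. */
struct sg_model {
	unsigned long magic; /* checked by debug code before each use */
	int is_last;         /* end-of-table marker */
};

/* Zero the table, stamp every entry, and mark the final entry. */
static void sg_init_table_model(struct sg_model *sgl, unsigned int nents)
{
	unsigned int i;

	assert(sgl != NULL && nents > 0);
	memset(sgl, 0, sizeof(*sgl) * nents);
	for (i = 0; i < nents; i++)
		sgl[i].magic = SG_MAGIC_MODEL;
	sgl[nents - 1].is_last = 1;
}

/* What a debug check would look at: a bare memset leaves magic == 0. */
static int sg_entry_valid(const struct sg_model *sg)
{
	return sg->magic == SG_MAGIC_MODEL;
}
```

    A table cleared with memset alone fails sg_entry_valid() on every
    entry, which corresponds to the BUG_ON described above.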

    Signed-off-by: Benjamin Herrenschmidt
    Signed-off-by: Paul Mackerras

    Benjamin Herrenschmidt
     

05 Mar, 2008

1 commit

  • On my system, pkt_open() consumes 584 bytes because the compiler decides to
    inline lots of functions that would not normally be part of long call chains.
    The following patch fixes that problem on my system.

    Signed-off-by: Peter Osterlund
    Cc: Nix
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Peter Osterlund
     

04 Mar, 2008

2 commits

  • This patch removes the #define READ_AHEAD 1024 from the driver and uses the
    block layer defaults, instead. We have found that under certain workloads
    the setting can cause a disk connected to the e200 controller to go offline.
    If the disk hiccups the link may try to downshift but the controller is
    never notified that the link successfully completed the renegotiation.
    We've also found that performance using the block layer default of 32 pages
    was on par with the 1024 setting. We tried setting it to zero at one time
    based on info from our firmware guys but that killed performance. Turns out
    we were talking about 2 different read ahead settings.
    Please consider this for inclusion.

    Signed-off-by: Mike Miller
    Signed-off-by: Jens Axboe

    Mike Miller
     
  • volumes

    This patch allows us to display information about all of the logical volumes
    configured on a particular controller without stepping on memory even when
    there are many volumes (128 or more) configured.
    Please consider this for inclusion.

    Signed-off-by: Mike Miller
    Signed-off-by: Jens Axboe

    Mike Miller
     

24 Feb, 2008

1 commit

  • NBD doesn't work well with CFQ (or AS) schedulers, so let's default to
    something else.

    The two problems I have experienced with nbd and cfq are:

    1) nbd hangs with cfq on RHEL 5 (2.6.18) -- this may well have been
    fixed

    There's a similar Debian bug that has been filed as well:

    http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=447638

    There have been posts to nbd-general mailing list about problems with
    cfq and nbd also.

    2) nbd performs about 10% better (the last time I tested) with deadline
    vs. cfq (the overhead of cfq doesn't provide much advantage to nbd [not
    being a real disk], and you end up going through the I/O scheduler on
    the nbd server anyway, so it makes sense that deadline is better with
    nbd)

    Signed-off-by: Paul Clements
    Cc: Jens Axboe
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Paul Clements
     

22 Feb, 2008

1 commit

  • The below implements the getgeo hook for Xen block devices. Extracted
    from the xen-unstable tree where it has been used for ages.

    It is useful to have because it allows things like grub2 (used by the
    Debian installer images) to work in a guest domain without having to
    sprinkle Xen specific hacks around the place.

    Signed-off-by: Ian Campbell
    Signed-off-by: Jeremy Fitzhardinge
    Signed-off-by: Linus Torvalds

    Ian Campbell
     

15 Feb, 2008

1 commit

  • The current pmac32_defconfig fails to build with the following error:

    Building modules, stage 2.
    ERROR: "check_media_bay" [drivers/block/swim3.ko] undefined!
    WARNING: modpost: Found 23 section mismatch(es).
    To see full details build your kernel with:
    'make CONFIG_DEBUG_SECTION_MISMATCH=y'
    make[2]: *** [__modpost] Error 1

    This patch fixes that.

    Signed-off-by: Tony Breeds
    Acked-by: Benjamin Herrenschmidt
    Cc: Paul Mackerras
    Acked-by: Bartlomiej Zolnierkiewicz
    Cc: Josh Boyer
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Tony Breeds
     

10 Feb, 2008

1 commit


09 Feb, 2008

16 commits

  • * 'for-linus' of git://git.kernel.dk/linux-2.6-block:
    Enhanced partition statistics: documentation update
    Enhanced partition statistics: remove old partition statistics
    Enhanced partition statistics: procfs
    Enhanced partition statistics: sysfs
    Enhanced partition statistics: aoe fix
    Enhanced partition statistics: update partition statitics
    Enhanced partition statistics: core statistics
    block: fixup rq_init() a bit

    Manually fixed conflict in drivers/block/aoe/aoecmd.c due to statistics
    support.

    Linus Torvalds
     
  • Remove the arbitrary 128 device limit for NBD. nbds_max can now be set to
    any number. In certain scenarios where devices are used sparsely we have
    run into the 128 device limit.

    Signed-off-by: Paul Clements
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Paul Clements
     
  • I guess aoedev_init() can go away now.

    Cc: Greg KH
    Cc: "Ed L. Cashin"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
     
  • Update the year in the copyright notices.

    Signed-off-by: Ed L. Cashin
    Cc: Greg KH
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ed L. Cashin
     
  • Andrew Morton pointed out that the "too many targets" message in patch 2 could
    be printed for failing GFP_ATOMIC allocations. This patch makes the messages
    more specific.

    Signed-off-by: Ed L. Cashin
    Cc: Greg KH
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ed L. Cashin
     
  • The aoedev aoeminor member doesn't need a long format.

    Signed-off-by: Ed L. Cashin
    Cc: Greg KH
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ed L. Cashin
     
  • An AoE target provides an estimate of the number of outstanding commands that
    the AoE initiator can send before getting a response. The aoe_maxout
    parameter provides a way to set an even lower limit. It will not allow a user
    to use more outstanding commands than the target permits. If a user discovers
    a problem with a large setting, this parameter provides a way for us to work
    with them to debug the problem. We expect to improve the dynamic window
    sizing algorithm and drop this parameter. For the time being, it is a
    debugging aid.

    Signed-off-by: Ed L. Cashin
    Cc: Greg KH
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ed L. Cashin
     
  • An aoe driver user who had about 70 AoE targets found that he was hitting a
    BUG in sysfs_create_file because the aoe driver was trying to tell the kernel
    about an AoE device more than once. Each AoE device was reachable by several
    local network interfaces, and multiple ATA device identify responses were
    returning from that single device.

    This patch eliminates a race condition so that aoe informs the block
    layer of a new AoE device exactly once, even in the presence of multiple
    incoming ATA device identify responses.

    Signed-off-by: Ed L. Cashin
    Cc: Greg KH
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ed L. Cashin
     
  • What this Patch Does

    Even before this recent series of 12 patches to 2.6.22-rc4, the aoe
    driver was reusing a small set of skbs that were allocated once and
    were only used for outbound AoE commands.

    The network layer cannot be allowed to put_page on the data that is
    still associated with a bio we haven't returned to the block layer,
    so the aoe driver (even before the patch under discussion) is still
    the owner of skbs that have been handed to the network layer for
    transmission. We need to keep track of these skbs so that we can
    free them, but by tracking them, we can also easily re-use them.

    The new patch was a response to the behavior of certain network
    drivers. We cannot reuse an skb that the network driver still has
    in its transmit ring. Network drivers can defer transmit ring
    cleanup and then use the state in the skb to determine how many data
    segments to clean up in its transmit ring. The tg3 driver is one
    driver that behaves in this way.

    When the network driver defers cleanup of its transmit ring, the aoe
    driver can find itself in a situation where it would like to send an
    AoE command, and the AoE target is ready for more work, but the
    network driver still has all of the pre-allocated skbs. In that
    case, the new patch just calls alloc_skb, as you'd expect.

    We don't want to get carried away, though. We try not to do
    excessive allocation in the write path, so we cap the number of skbs
    we dynamically allocate.

    Probably calling it a "dynamic pool" is misleading. We were already
    trying to use a small fixed-size set of pre-allocated skbs before
    this patch, and this patch just provides a little headroom (with a
    ceiling, though) to accommodate network drivers that hang onto skbs,
    by allocating when needed. The d->skbpool_hd list of allocated skbs
    is necessary so that we can free them later.

    We didn't notice the need for this headroom until AoE targets got
    fast enough.
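
    The allocation policy described above (reuse a small pre-allocated
    set, allocate a little extra when all of it is still held elsewhere,
    stop at a ceiling, and keep everything on one list so it can be freed)
    can be sketched in userspace. The buffer type, the sizes and every
    function name below are hypothetical; a simple busy flag stands in for
    "the network driver still has this skb".

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

enum { NPREALLOC = 2, NEXTRA_MAX = 2 }; /* hypothetical sizes */

struct buf {
	bool busy;        /* still held by the (simulated) network driver */
	struct buf *next; /* links every buffer so all can be freed later */
};

struct pool {
	struct buf *head; /* plays the role of d->skbpool_hd */
	int count;
};

static struct buf *pool_grow(struct pool *p)
{
	struct buf *b = calloc(1, sizeof(*b));

	if (b) {
		b->next = p->head;
		p->head = b;
		p->count++;
	}
	return b;
}

static int pool_init(struct pool *p)
{
	assert(p != NULL);
	p->head = NULL;
	p->count = 0;
	while (p->count < NPREALLOC)
		if (!pool_grow(p))
			return -1;
	return 0;
}

/* Reuse an idle buffer if there is one; otherwise allocate up to the cap. */
static struct buf *pool_get(struct pool *p)
{
	struct buf *b;

	for (b = p->head; b; b = b->next)
		if (!b->busy)
			break;
	if (!b && p->count < NPREALLOC + NEXTRA_MAX)
		b = pool_grow(p);
	if (b)
		b->busy = true;
	return b; /* NULL: every buffer is busy and the ceiling is reached */
}

/* Called when the network driver has finally let go of the buffer. */
static void pool_put(struct buf *b)
{
	b->busy = false;
}

static void pool_destroy(struct pool *p)
{
	while (p->head) {
		struct buf *b = p->head;

		p->head = b->next;
		free(b);
	}
	p->count = 0;
}
```

    When pool_get() returns NULL the caller simply waits for a buffer to
    come back, which keeps allocation in the write path bounded.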

    Alternatives

    If the network layer never did a put_page on the pages in the bio's
    we get from the block layer, then it would be possible for us to
    hand skbs to the network layer and forget about them, allowing the
    network layer to free skbs itself (and thereby calling our own
    skb->destructor callback function if we needed that). In that case
    we could get rid of the pre-allocated skbs and also the
    d->skbpool_hd, instead just calling alloc_skb every time we wanted
    to transmit a packet. The slab allocator would effectively maintain
    the list of skbs.

    Besides a loss of CPU cache locality, the main concern with that
    approach is the danger that it would increase the likelihood of
    deadlock when VM is trying to free pages by writing dirty data from
    the page cache through the aoe driver out to persistent storage on
    an AoE device. Right now we have a situation where we have
    pre-allocation that corresponds to how much we use, which seems
    ideal.

    Of course, there's still the separate issue of receiving the packets
    that tell us that a write has successfully completed on the AoE
    target. When memory is low and VM is using AoE to flush dirty data
    to free up pages, it would be perfect if there were a way for us to
    register a fast callback that could recognize write command
    completion responses. But I don't think the current problems with
    the receive side of the situation are a justification for
    exacerbating the problem on the transmit side.

    Signed-off-by: Ed L. Cashin
    Cc: Greg KH
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ed L. Cashin
     
  • When an AoE device is detected, the kernel is informed, and a new block device
    is created. If the device is unused, the block device corresponding to a remote
    device that is no longer available may be removed from the system by telling
    the aoe driver to "flush" its list of devices.

    Without this patch, software like GPFS and LVM may attempt to read from AoE
    devices that were discovered earlier but are no longer present, blocking until
    the I/O attempt times out.

    Signed-off-by: Ed L. Cashin
    Cc: Greg KH
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ed L. Cashin
     
  • Adam Richter suggested eliminating this goto.

    Signed-off-by: Ed L. Cashin
    Cc: Greg KH
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ed L. Cashin
     
  • By returning unsigned long long, mac_addr does not generate compiler warnings
    on 64-bit architectures.
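
    A userspace sketch of the idea, under assumptions: the name mac_addr
    comes from the commit message, but the body here and format_mac() are
    illustrative only. The point is that a value returned as unsigned long
    long always matches a "%llx" format, whereas a fixed-width 64-bit
    typedef may map to a different C type depending on the architecture.

```c
#include <assert.h>
#include <stdio.h>

/* Pack a 6-byte Ethernet address into the low 48 bits, first byte first. */
static unsigned long long mac_addr(const unsigned char addr[6])
{
	unsigned long long n = 0;
	int i;

	assert(addr != NULL);
	for (i = 0; i < 6; i++)
		n = (n << 8) | addr[i];
	return n;
}

/* "%012llx" matches unsigned long long regardless of the word size. */
static int format_mac(char *buf, size_t len, const unsigned char addr[6])
{
	return snprintf(buf, len, "%012llx", mac_addr(addr));
}
```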

    Signed-off-by: Ed L. Cashin
    Cc: Greg KH
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ed L. Cashin
     
  • A remote AoE device is something that can process ATA commands and is
    identified by an AoE shelf number and an AoE slot number. Such a device
    might have more than one network interface, and it might be reachable by
    more than one local network interface. This patch tracks the network paths
    available to each AoE device, allowing them to be used more efficiently.

    Andrew Morton asked about the call to msleep_interruptible in the revalidate
    function. Yes, if a signal is pending, then msleep_interruptible will not
    return 0. That means we will not loop but will call aoenet_xmit with a NULL
    skb, which is a noop. If the system is too low on memory or the aoe driver is
    too low on frames, then the user can hit control-C to interrupt the attempt to
    do a revalidate. I have added a comment to the code summarizing that.

    Andrew Morton asked whether the allocation performed inside addtgt could use a
    more relaxed allocation like GFP_KERNEL, but addtgt is called when the aoedev
    lock has been locked with spin_lock_irqsave. It would be nice to allocate the
    memory under fewer restrictions, but targets are only added when the device is
    being discovered, and if the target can't be added right now, we can try again
    in a minute when the next AoE config query broadcast goes out.

    Andrew Morton pointed out that the "too many targets" message could be printed
    for failing GFP_ATOMIC allocations. The last patch in this series makes the
    messages more specific.

    Signed-off-by: Ed L. Cashin
    Cc: Greg KH
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ed L. Cashin
     
  • Signed-off-by: Ed L. Cashin
    Cc: Greg KH
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ed L. Cashin
     
  • Support direct_access XIP method with brd.

    Signed-off-by: Nick Piggin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nick Piggin
     
  • This is a rewrite of the ramdisk block device driver.

    The old one is really difficult because it effectively implements a block
    device which serves data out of its own buffer cache. It relies on the dirty
    bit being set to pin its backing store in cache; however, there are
    non-trivial paths which can clear the dirty bit (eg. try_to_free_buffers()),
    which has recently led to data corruption. And in general it is completely
    wrong for a block device driver to do this.

    The new one is more like a regular block device driver. It has no idea about
    vm/vfs stuff. Its backing store is similar to the buffer cache (a simple
    radix-tree of pages), but it doesn't know anything about page cache (the pages
    in the radix tree are not pagecache pages).
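
    As a rough userspace illustration of a driver-private page store: the
    sketch below maps sectors to pages that are allocated on first write
    and reads never-written areas as zeroes. It is a simplification under
    assumptions, not brd itself: a flat pointer array stands in for the
    radix tree, and the sizes, struct ramdisk_model and all function names
    are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define PAGE_SIZE_MODEL  4096u
#define SECTOR_SIZE      512u
#define SECTORS_PER_PAGE (PAGE_SIZE_MODEL / SECTOR_SIZE)
#define NPAGES           64u /* hypothetical tiny device */

struct ramdisk_model {
	unsigned char *pages[NPAGES]; /* flat array standing in for the tree */
};

/* Return the page backing a sector, allocating it zero-filled on first use. */
static unsigned char *lookup_or_alloc_page(struct ramdisk_model *rd,
					   unsigned long sector)
{
	unsigned long idx = sector / SECTORS_PER_PAGE;

	assert(rd != NULL);
	if (idx >= NPAGES)
		return NULL;
	if (!rd->pages[idx])
		rd->pages[idx] = calloc(1, PAGE_SIZE_MODEL);
	return rd->pages[idx];
}

static int write_sector(struct ramdisk_model *rd, unsigned long sector,
			const void *src)
{
	unsigned char *page = lookup_or_alloc_page(rd, sector);

	if (!page)
		return -1;
	memcpy(page + (sector % SECTORS_PER_PAGE) * SECTOR_SIZE, src, SECTOR_SIZE);
	return 0;
}

static void read_sector(const struct ramdisk_model *rd, unsigned long sector,
			void *dst)
{
	unsigned long idx = sector / SECTORS_PER_PAGE;
	const unsigned char *page = idx < NPAGES ? rd->pages[idx] : NULL;

	if (page)
		memcpy(dst, page + (sector % SECTORS_PER_PAGE) * SECTOR_SIZE,
		       SECTOR_SIZE);
	else
		memset(dst, 0, SECTOR_SIZE); /* never-written pages read as zeroes */
}
```

    Because the store is private to the driver, nothing outside it can
    reclaim or clean these pages behind its back.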

    There is one slight downside -- direct block device access and filesystem
    metadata access goes through an extra copy and gets stored in RAM twice.
    However, this downside is only slight, because the real buffercache of the
    device is now reclaimable (because we're not playing crazy games with it), so
    under memory intensive situations, footprint should effectively be the same --
    maybe even a slight advantage to the new driver because it can also reclaim
    buffer heads.

    The fact that it now goes through all the regular vm/fs paths makes it
    much more useful for testing, too.

       text    data     bss     dec     hex filename
       2837     849     384    4070     fe6 drivers/block/rd.o
       3528     371      12    3911     f47 drivers/block/brd.o

    Text is larger, but data and bss are smaller, making total size smaller.

    A few other nice things about it:
    - Similar structure and layout to the new loop device handling.
    - Dynamic ramdisk creation.
    - Runtime flexible buffer head size (because it is no longer part of the
      ramdisk code).
    - Boot / load time flexible ramdisk size, which could easily be extended
      to a per-ramdisk runtime changeable size (eg. with an ioctl).
    - Can use highmem for the backing store.

    [akpm@linux-foundation.org: fix build]
    [byron.bbradley@gmail.com: make rd_size non-static]
    Signed-off-by: Nick Piggin
    Signed-off-by: Byron Bradley
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nick Piggin
     

08 Feb, 2008

2 commits

  • Updates the enhanced partition statistics in ATA over Ethernet driver
    (not tested).

    Signed-off-by: Jerome Marchand

    Jerome Marchand
     
  • * 'for-2.6.25' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc: (69 commits)
    [POWERPC] Add SPE registers to core dumps
    [POWERPC] Use regset code for compat PTRACE_*REGS* calls
    [POWERPC] Use generic compat_sys_ptrace
    [POWERPC] Use generic compat_ptrace_request
    [POWERPC] Use generic ptrace peekdata/pokedata
    [POWERPC] Use regset code for PTRACE_*REGS* requests
    [POWERPC] Switch to generic compat_binfmt_elf code
    [POWERPC] Switch to using user_regset-based core dumps
    [POWERPC] Add user_regset compat support
    [POWERPC] Add user_regset_view definitions
    [POWERPC] Use user_regset accessors for GPRs
    [POWERPC] ptrace accessors for special regs MSR and TRAP
    [POWERPC] Use user_regset accessors for SPE regs
    [POWERPC] Use user_regset accessors for altivec regs
    [POWERPC] Use user_regset accessors for FP regs
    [POWERPC] mpc52xx: fix compile error introduce when rebasing patch
    [POWERPC] 4xx: PCIe indirect DCR spinlock fix.
    [POWERPC] Add missing native dcr dcr_ind_lock spinlock
    [POWERPC] 4xx: Fix offset value on Warp board
    [POWERPC] 4xx: Add 440EPx Sequoia ehci dts entry
    ...

    Linus Torvalds
     

07 Feb, 2008

4 commits

  • Josh Boyer
     
  • Commit edfaa7c36574f1bf09c65ad602412db9da5f96bf

    Driver core: convert block from raw kobjects to core devices

    This moves the block devices to /sys/class/block. It will create a
    flat list of all block devices, with the disks and partitions in one
    directory. For compatibility /sys/block is created and contains symlinks
    to the disks.

    introduced a global disk_type variable in <linux/genhd.h>, causing the
    following compile error on Atari:

    drivers/block/ataflop.c:93: error: conflicting types for 'disk_type'
    include/linux/genhd.h:21: error: previous declaration of 'disk_type' was here

    Rename the local disk_type variable in drivers/block/ataflop.c to
    atari_disk_type, to avoid the conflict.

    Signed-off-by: Geert Uytterhoeven
    Cc: Kay Sievers
    Acked-by: Greg Kroah-Hartman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Geert Uytterhoeven
     
  • Use upper_32_bits(x) macro to handle shifts that may be >= the width of
    the data type.

    drivers/block/cciss.c: In function 'do_cciss_request':
    drivers/block/cciss.c:2655: warning: right shift count >= width of type
    drivers/block/cciss.c:2656: warning: right shift count >= width of type
    drivers/block/cciss.c:2657: warning: right shift count >= width of type
    drivers/block/cciss.c:2658: warning: right shift count >= width of type

    Signed-off-by: Randy Dunlap
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Randy Dunlap
     
  • Signed-off-by: Robert P. J. Day
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Robert P. J. Day