04 Jul, 2009

2 commits

  • Block layer used to merge requests and bios with different failfast
    settings. This caused regular IOs to fail prematurely when they were
    merged into failfast requests for readahead.

    Niel Lambrechts could trigger the problem semi-reliably on ext4 when
    resuming from STR. ext4 uses readahead when reading inodes. Combined
    with the deterministic extra SATA PHY exception cycle during resume
    on the specific configuration, this made non-readahead inode reads
    fail, causing ext4 errors. Please read the following thread for
    details.

    http://lkml.org/lkml/2009/5/23/21

    This patch makes block layer reject merging if the failfast settings
    don't match. This is correct but likely to lower IO performance by
    preventing regular IOs from mingling into surrounding readahead
    requests. Changes to allow such mixed merges and handle errors
    correctly will be added later.

    Signed-off-by: Tejun Heo
    Reported-by: Niel Lambrechts
    Cc: Theodore Tso
    Signed-off-by: Jens Axboe

    Tejun Heo
     
  • When doing an unexpected shutdown like kexec the cciss
    firmware might still have some commands in flight, which
    it is trying to complete.
    The driver is doing its best to reset the HBA,
    but sadly there's a firmware issue causing the firmware
    _not_ to abort or drop old commands.
    So the firmware will send us commands which we haven't
    accounted for, causing the driver to panic.

    With this patch we're just ignoring these commands as
    there is nothing we could be doing with them anyway.

    Signed-off-by: Hannes Reinecke
    Acked-by: Mike Miller
    Signed-off-by: Jens Axboe

    Hannes Reinecke
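
The merge rule described in the block-layer commit of this group (reject a merge when the failfast settings differ) can be sketched in plain userspace C. This is only an illustration under stated assumptions: the `FF_*` flag bits, `struct io_unit`, and `can_back_merge()` are hypothetical stand-ins and not the kernel's actual request/bio definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical failfast flag bits; the kernel's real flags differ. */
enum {
    FF_DEV       = 1u << 0,
    FF_TRANSPORT = 1u << 1,
    FF_DRIVER    = 1u << 2,
    FF_MASK      = FF_DEV | FF_TRANSPORT | FF_DRIVER
};

/* Hypothetical stand-in for both a request and a bio. */
struct io_unit {
    unsigned int flags;        /* includes the failfast bits */
    unsigned long sector;      /* first sector covered */
    unsigned long nr_sectors;  /* number of sectors covered */
};

/* Two units may only be combined if all failfast bits agree. */
static bool failfast_matches(const struct io_unit *a, const struct io_unit *b)
{
    return (a->flags & FF_MASK) == (b->flags & FF_MASK);
}

/* Back merge: 'bio' would be appended to the end of 'req'. */
bool can_back_merge(const struct io_unit *req, const struct io_unit *bio)
{
    assert(req != NULL && bio != NULL);

    /* The new IO must start exactly where the request ends. */
    if (req->sector + req->nr_sectors != bio->sector)
        return false;

    /* Mixing failfast and regular IO would let the regular IO fail early. */
    return failfast_matches(req, bio);
}
```

With this check, a regular read that is contiguous with a failfast readahead request is kept separate, which is the performance cost the commit message mentions.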
     

02 Jul, 2009

10 commits

  • Somehow I managed to generate a diff that put these 2 lines
    into the wrong function: should have been in dump_struct()
    instead of in dump_enum().

    Signed-off-by: Randy Dunlap
    Signed-off-by: Linus Torvalds

    Randy Dunlap
     
  • * git://git.infradead.org/mtd-2.6:
    mtd: nand: fix build failure and incorrect return from omap_wait()
    mtd: Use BLOCK_NIL consistently in NFTL/INFTL
    mtd: m25p80 timeout too short for worst-case m25p16 devices
    mtd: atmel_nand: Fix typo s/parititions/partitions/
    mtd: cmdlineparts: Use 64-bit format when printing a debug message.
    mtd: maps: Remove BUS_ID_SIZE from integrator_flash
    jffs2: fix another potential leak on error path in scan.c

    Linus Torvalds
     
  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse:
    fuse: invalidation reverse calls
    fuse: allow umask processing in userspace
    fuse: fix bad return value in fuse_file_poll()
    fuse: fix return value of fuse_dev_write()

    Linus Torvalds
     
  • This fixes kernel.org bug #13584. The IOVA code attempted to optimise
    the insertion of new ranges into the rbtree, with the unfortunate result
    that some ranges just didn't get inserted into the tree at all. Then
    those ranges would be handed out more than once, and things kind of go
    downhill from there.

    Introduced after 2.6.25 by ddf02886cbe665d67ca750750196ea5bf524b10b
    ("PCI: iova RB tree setup tweak").

    Signed-off-by: David Woodhouse
    Cc: mark gross
    Cc: Andrew Morton
    Cc: stable@kernel.org
    Signed-off-by: Linus Torvalds

    David Woodhouse
     
  • We can run a 32-bit kernel on boxes with an IOMMU, so we need
    pci_unmap_addr() etc. to work -- without it, drivers will leak mappings.

    To be honest, this whole thing looks like it's more pain than it's
    worth; I'm half inclined to remove the no-op #else case altogether.

    But this is the minimal fix, which just does the right thing if
    CONFIG_DMAR is set.

    Signed-off-by: David Woodhouse
    Cc: stable@kernel.org [ for 2.6.30 ]
    Signed-off-by: Linus Torvalds

    David Woodhouse
     
  • Check before using it.

    Signed-off-by: WANG Cong
    Cc: Alexander Viro
    Cc: David Howells
    Acked-by: Roland McGrath
    Acked-by: James Morris
    Signed-off-by: Linus Torvalds

    Amerigo Wang
     
  • * 'for-linus' of git://git.kernel.dk/linux-2.6-block:
    cfq-iosched: remove redundant check for NULL cfqq in cfq_set_request()
    block: Restore barrier support for md and probably other virtual devices.
    block: get rid of queue-private command filter
    block: Create bip slabs with embedded integrity vectors
    cfq-iosched: get rid of the need for __GFP_NOFAIL in cfq_find_alloc_queue()
    cfq-iosched: move cfqq initialization out of cfq_find_alloc_queue()
    Trivial typo fixes in Documentation/block/data-integrity.txt.

    Linus Torvalds
     
  • * 'for-linus' of git://neil.brown.name/md:
    md: use interruptible wait when duration is controlled by userspace.
    md/raid5: suspend shouldn't affect read requests.
    md: tidy up error paths in md_alloc
    md: fix error path when duplicate name is found on md device creation.
    md: avoid dereferencing NULL pointer when accessing suspend_* sysfs attributes.
    md: Use new topology calls to indicate alignment and I/O sizes

    Linus Torvalds
     
  • * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (31 commits)
    Revert "ipv4: arp announce, arp_proxy and windows ip conflict verification"
    igb: return PCI_ERS_RESULT_DISCONNECT on permanent error
    e1000e: io_error_detected callback should return PCI_ERS_RESULT_DISCONNECT
    e1000: return PCI_ERS_RESULT_DISCONNECT on permanent error
    e1000: fix unmap bug
    igb: fix unmap length bug
    ixgbe: fix unmap length bug
    ixgbe: Fix link capabilities during adapter resets
    ixgbe: Fix device capabilities of 82599 single speed fiber NICs.
    ixgbe: Fix SFP log messages
    usbnet: Remove private stats structure
    usbnet: Use netdev stats structure
    smsc95xx: Use netdev stats structure
    rndis_host: Use netdev stats structure
    net1080: Use netdev stats structure
    dm9601: Use netdev stats structure
    cdc_eem: Use netdev stats structure
    ipv4: Fix fib_trie rebalancing, part 3
    bnx2x: Fix the behavior of ethtool when ONBOOT=no
    sctp: xmit sctp packet always return no route error
    ...

    Linus Torvalds
     
  • One of the kmemleak changes caused the following
    scheduling-while-holding-the-tasklist-lock regression on x86:

    BUG: sleeping function called from invalid context at mm/kmemleak.c:795
    in_atomic(): 1, irqs_disabled(): 0, pid: 1737, name: kmemleak
    2 locks held by kmemleak/1737:
    #0: (scan_mutex){......}, at: [] kmemleak_scan_thread+0x45/0x86
    #1: (tasklist_lock){......}, at: [] kmemleak_scan+0x1a9/0x39c
    Pid: 1737, comm: kmemleak Not tainted 2.6.31-rc1-tip #59266
    Call Trace:
    [] ? __debug_show_held_locks+0x1e/0x20
    [] __might_sleep+0x10a/0x111
    [] scan_yield+0x17/0x3b
    [] scan_block+0x39/0xd4
    [] kmemleak_scan+0x1bb/0x39c
    [] ? kmemleak_scan_thread+0x0/0x86
    [] kmemleak_scan_thread+0x4a/0x86
    [] kthread+0x6e/0x73
    [] ? kthread+0x0/0x73
    [] kernel_thread_helper+0x7/0x10
    kmemleak: 834 new suspected memory leaks (see /sys/kernel/debug/kmemleak)

    The bit causing it is highly dubious:

    static void scan_yield(void)
    {
            might_sleep();

            if (time_is_before_eq_jiffies(next_scan_yield)) {
                    schedule();
                    next_scan_yield = jiffies + jiffies_scan_yield;
            }
    }

    It is called deep inside the codepath and in a conditional way,
    and that is what crapped up when one of the new scan_block()
    uses grew a tasklist_lock dependency.

    This minimal patch removes that yielding stuff and adds the
    proper cond_resched().

    The background scanning thread could probably also be reniced
    to +10.

    Signed-off-by: Ingo Molnar
    Acked-by: Pekka Enberg
    Signed-off-by: Linus Torvalds

    Ingo Molnar
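
The pattern behind the kmemleak fix in this group (offer to reschedule from the top-level loop, never from a leaf helper that may run under a lock) can be modelled in userspace C. This is a sketch under assumptions, not the actual patch: `atomic_section` stands in for holding tasklist_lock, `resched_point()` stands in for cond_resched(), and `struct region` and `scan_all()` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins: "a spinning lock is held" and a yield counter. */
static bool atomic_section;
static unsigned int resched_points;

/* Stand-in for cond_resched(): only legal outside atomic sections. */
static void resched_point(void)
{
    assert(!atomic_section);
    resched_points++;
}

/* Leaf helper: does the work and never yields. Counts non-zero words. */
static size_t scan_block(const unsigned long *start, size_t nwords)
{
    size_t hits = 0;

    for (size_t i = 0; i < nwords; i++)
        if (start[i] != 0)
            hits++;
    return hits;
}

struct region {
    const unsigned long *start;
    size_t nwords;
};

size_t scan_all(const struct region *objs, size_t nobjs,
                const struct region *locked, size_t nlocked)
{
    size_t hits = 0;

    /* Unlocked phase: yield between objects, at the top level. */
    for (size_t i = 0; i < nobjs; i++) {
        hits += scan_block(objs[i].start, objs[i].nwords);
        resched_point();
    }

    /* Locked phase (think: task stacks under tasklist_lock): no yielding. */
    atomic_section = true;
    for (size_t i = 0; i < nlocked; i++)
        hits += scan_block(locked[i].start, locked[i].nwords);
    atomic_section = false;

    return hits;
}

unsigned int resched_points_seen(void)
{
    return resched_points;
}
```

Because the yield lives in the caller, a new scan_block() user that holds a lock cannot accidentally sleep, which is the failure mode shown in the trace above.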
     

01 Jul, 2009

28 commits

  • With the changes for falling back to an oom_cfqq, we never fail
    to find/allocate a queue in cfq_get_queue(). So remove the check.

    Signed-off-by: Shan Wei
    Signed-off-by: Jens Axboe

    Shan Wei
     
  • The next_ordered flag is only meaningful for devices that use __make_request.
    So move the test against next_ordered out of generic code and into
    __make_request.

    Since this test was added, barriers have not worked on md or any
    devices that don't use __make_request and so don't bother to set
    next_ordered. (dm explicitly sets something other than
    QUEUE_ORDERED_NONE since
    commit 99360b4c18f7675b50d283301d46d755affe75fd
    but notes in the comments that it is otherwise meaningless).

    Cc: Ken Milmore
    Cc: stable@kernel.org
    Signed-off-by: NeilBrown
    Signed-off-by: Jens Axboe

    NeilBrown
     
  • The initial patches to support this through sysfs export were broken
    and have been #if 0'ed out in every release. So let's just kill the code
    and reclaim some space in struct request_queue. If anyone would later
    like to fix up the sysfs bits, the git history can easily restore
    the removed bits.

    Signed-off-by: Jens Axboe

    Jens Axboe
     
  • This patch restores stacking ability to the block layer integrity
    infrastructure by creating a set of dedicated bip slabs. Each bip slab
    has an embedded bio_vec array at the end. This cuts down on memory
    allocations and also simplifies the code compared to the original bvec
    version. Only the largest bip slab is backed by a mempool. The pool is
    contained in the bio_set so stacking drivers can ensure forward
    progress.

    Signed-off-by: Martin K. Petersen
    Signed-off-by: Jens Axboe

    Martin K. Petersen
     
  • Set up an emergency fallback cfqq that we allocate at IO scheduler init
    time. If the slab allocation fails in cfq_find_alloc_queue(), we'll just
    punt IO to that cfqq instead. This ensures that cfq_find_alloc_queue()
    never fails without having to ensure free memory.

    On cfqq lookup, always try to allocate a new cfqq if the given cfq io
    context has the oom_cfqq assigned. This ensures that we only temporarily
    punt to this shared queue.

    Reviewed-by: Jeff Moyer
    Signed-off-by: Jens Axboe

    Jens Axboe
     
  • We're going to be needing that init code outside of that function
    to get rid of the __GFP_NOFAIL in cfqq allocation.

    Reviewed-by: Jeff Moyer
    Signed-off-by: Jens Axboe

    Jens Axboe
     
  • Signed-off-by: Andre Noll
    Acked-by: Martin K. Petersen
    Signed-off-by: Jens Axboe

    Andre Noll
     
  • User space can set various limits on an md array so that resync waits
    when it gets to a certain point, or so that I/O is blocked for a short
    while.
    When md is waiting against one of these limits, it should use an
    interruptible wait so as not to add to the load average, and so as
    not to trigger a warning if the wait goes on for too long.

    Signed-off-by: NeilBrown

    NeilBrown
     
  • md allows writes to regions of an array to be suspended temporarily.
    This allows user-space to participate in aspects of reshape.
    In particular, data can be copied with no risk of a race.
    We should not be blocking read requests though, so don't.

    Cc: stable@kernel.org
    Signed-off-by: NeilBrown

    NeilBrown
     
  • This reverts commit 73ce7b01b4496a5fbf9caf63033c874be692333f.

    After discovering that we don't listen to gratuitous arps in 2.6.30
    I tracked the failure down to this commit.

    The patch makes absolutely no sense. RFC 2131, RFC 3927 and RFC 5227
    are all in agreement that an arp request with sip == 0 should be used
    for the probe (to prevent learning) and an arp request with sip == tip
    should be used for the gratuitous announcement that people can learn
    from.

    It appears the author of the broken patch got those two cases confused
    and modified the code to drop all gratuitous arp traffic. Ouch!

    Cc: stable@kernel.org
    Signed-off-by: Eric W. Biederman
    Signed-off-by: David S. Miller

    Eric W. Biederman
     
  • PCI drivers that implement the io_error_detected callback should return
    PCI_ERS_RESULT_DISCONNECT if the state passed in is
    pci_channel_io_perm_failure. This patch fixes the issue for igb.

    Signed-off-by: Alexander Duyck
    Signed-off-by: Jeff Kirsher
    Signed-off-by: David S. Miller

    Alexander Duyck
     
  • on permanent failure

    PCI drivers that implement the io_error_detected callback
    should return PCI_ERS_RESULT_DISCONNECT if the state
    passed in is pci_channel_io_perm_failure. This state is not
    checked in many of the network drivers.

    This patch fixes the omission in the e1000e driver.

    Signed-off-by: Mike Mason
    Signed-off-by: Jeff Kirsher
    Signed-off-by: David S. Miller

    Mike Mason
     
  • PCI drivers that implement the io_error_detected callback
    should return PCI_ERS_RESULT_DISCONNECT if the state
    passed in is pci_channel_io_perm_failure. This state is
    not checked in many of the network drivers.

    The patch fixes the omission in the e1000 driver.

    Based on Mike Mason's similar patch for e1000e.

    Signed-off-by: Andre Detsch
    CC: Mike Mason
    Signed-off-by: Jeff Kirsher
    Signed-off-by: David S. Miller

    Andre Detsch
     
  • as reported by kerneloops.org

    [ 121.781161] ------------[ cut here ]------------
    [ 121.781171] WARNING: at lib/dma-debug.c:793 check_unmap+0x14e/0x577()
    [ 121.781173] Hardware name: S5520HC
    [ 121.781177] e1000 0000:0a:00.0: DMA-API: device driver tries to free DMA
    memory it has not allocated [device address=0x00000001d688b0fa] [size=1522
    bytes]
    [ 121.781180] Modules linked in: e1000 mdio dca [last unloaded: ixgbe]
    [ 121.781187] Pid: 4793, comm: bash Tainted: P 2.6.30-master-06161113 #3
    [ 121.781190] Call Trace:
    [ 121.781195] [] ? check_unmap+0x14e/0x577
    [ 121.781201] [] warn_slowpath_common+0x77/0x8f
    [ 121.781205] [] warn_slowpath_fmt+0x9f/0xa1
    [ 121.781212] [] ? _spin_lock_irqsave+0x3f/0x49
    [ 121.781216] [] ? get_hash_bucket+0x28/0x33
    [ 121.781220] [] check_unmap+0x14e/0x577
    [ 121.781225] [] ? check_bytes_and_report+0x38/0xcb
    [ 121.781230] [] debug_dma_unmap_page+0x80/0x92
    [ 121.781234] [] ? unmap_single+0x1a/0x4e
    [ 121.781239] [] ? __kfree_skb+0x74/0x78
    [ 121.781250] [] pci_unmap_single+0x64/0x6d [e1000]
    [ 121.781259] [] e1000_clean_rx_ring+0x4c/0xbf [e1000]
    [ 121.781268] [] e1000_clean_all_rx_rings+0x28/0x36 [e1000]
    [ 121.781277] [] e1000_down+0x138/0x141 [e1000]
    [ 121.781286] [] __e1000_shutdown+0x6b/0x198 [e1000]
    [ 121.781296] [] e1000_suspend+0x17/0x50 [e1000]
    [ 121.781301] [] pci_legacy_suspend+0x3b/0xbe
    [ 121.781305] [] pci_pm_suspend+0x3e/0xf1
    [ 121.781310] [] pm_op+0x57/0xde
    [ 121.781314] [] dpm_suspend_start+0x31e/0x470
    [ 121.781319] [] suspend_devices_and_enter+0x3e/0x1a2
    [ 121.781323] [] enter_state+0xd1/0x127
    [ 121.781327] [] state_store+0xa7/0xc9
    [ 121.781332] [] kobj_attr_store+0x17/0x19
    [ 121.781336] [] sysfs_write_file+0xe5/0x121
    [ 121.781341] [] vfs_write+0xab/0x105
    [ 121.781344] [] sys_write+0x47/0x6d
    [ 121.781349] [] system_call_fastpath+0x16/0x1b
    [ 121.781352] ---[ end trace 97bacaaac2ed7786 ]---

    Fix is to correctly zero out internal ->dma value when unmapping
    and make sure never to unmap unless there specifically was a mapping done.

    Signed-off-by: Jesse Brandeburg
    Signed-off-by: Jeff Kirsher
    Signed-off-by: David S. Miller

    Jesse Brandeburg
     
  • The driver was mixing NET_IP_ALIGN count bytes in map/unmap calls
    unevenly. Only map the bytes that the hardware might DMA into.

    Also fix an unmap-related bug where ->dma was not being cleared
    after unmap.

    Signed-off-by: Jesse Brandeburg
    Signed-off-by: Jeff Kirsher
    Signed-off-by: David S. Miller

    Jesse Brandeburg
     
  • This patch addresses three WARN_ON statements from the DMA-API debug code.

    ixgbe is mapping more than it unmaps, so reduce the length of the map call and
    remove the "used once" local variable.

    Found by Joerg Roedel in 2.6.30, so this is a candidate
    for -stable.

    In addition, fix the missing ->dma = 0 after unmap to prevent a double free with
    pci_unmap_single.

    Lastly, don't unmap (half) pages that aren't mapped.

    Signed-off-by: Jesse Brandeburg
    CC: Joerg Roedel
    Signed-off-by: Jeff Kirsher
    Signed-off-by: David S. Miller

    Jesse Brandeburg
     
  • Adapter link advertisement capabilities were not persistent during
    adapter resets. While configuring a multispeed fiber link, check the
    PHY autoneg_advertised settings before overwriting them with the
    default link capabilities.

    Signed-off-by: Mallikarjuna R Chilakala
    Signed-off-by: Jeff Kirsher
    Signed-off-by: David S. Miller

    Mallikarjuna R Chilakala
     
  • 82599 single speed fiber modules only support 10G/Full. Return
    proper device capabilities when querying the adapter, and return an
    error when changing device advertisement/speed/duplex capabilities.

    Signed-off-by: Mallikarjuna R Chilakala
    Signed-off-by: Jeff Kirsher
    Signed-off-by: David S. Miller

    Mallikarjuna R Chilakala
     
  • We had a wide range of log messages for the same sort of SFP
    failure. This patch makes them all more similar and less
    confusing along with converting them to dev_err.

    Signed-off-by: Don Skidmore
    Signed-off-by: Jeff Kirsher
    Signed-off-by: David S. Miller

    Don Skidmore
     
  • Now that nothing uses the private stats structure we can remove it.

    Signed-off-by: Herbert Xu
    Signed-off-by: David S. Miller

    Herbert Xu
     
  • Now that netdev has its own stats structure we should use that
    instead.

    Signed-off-by: Herbert Xu
    Signed-off-by: David S. Miller

    Herbert Xu
     
  • Now that netdev has its own stats structure we should use that
    instead.

    Signed-off-by: Herbert Xu
    Signed-off-by: David S. Miller

    Herbert Xu
     
  • Now that netdev has its own stats structure we should use that
    instead.

    Signed-off-by: Herbert Xu
    Signed-off-by: David S. Miller

    Herbert Xu
     
  • Now that netdev has its own stats structure we should use that
    instead.

    Signed-off-by: Herbert Xu
    Signed-off-by: David S. Miller

    Herbert Xu
     
  • Now that netdev has its own stats structure we should use that
    instead.

    Signed-off-by: Herbert Xu
    Signed-off-by: David S. Miller

    Herbert Xu
     
  • Now that netdev has its own stats structure we should use that
    instead.

    Signed-off-by: Herbert Xu
    Signed-off-by: David S. Miller

    Herbert Xu
     
  • As the recent bug in md_alloc showed, having a single exit path for
    unlocking and putting is a good idea. So restructure md_alloc to have
    a single mutex_unlock and mddev_put, and use gotos where necessary.

    Found-by: Jiri Slaby
    Signed-off-by: NeilBrown

    NeilBrown
     
  • When an md device is created by name (rather than number) we need to
    check that the name is not already in use. If this check finds a
    duplicate, we return an error without dropping the lock or freeing
    the newly created mddev.
    This patch fixes that.

    Cc: stable@kernel.org
    Found-by: Jiri Slaby
    Signed-off-by: NeilBrown

    NeilBrown
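
The two md_alloc commits that close this group describe a duplicate-name check that returned without unlocking or releasing, and a restructuring to a single exit path. A minimal userspace sketch of that shape follows. It is an illustration only: `dev_create()`, the fixed-size `registry`, the `lock_depth` counter standing in for a mutex, and the reference counting are all hypothetical and do not reproduce md's actual code.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define MAX_DEVS 4
#define NAME_LEN 16

struct dev {
    char name[NAME_LEN];
    int refs;                      /* simple reference count */
};

static struct dev *registry[MAX_DEVS];
static int lock_depth;             /* stand-in for a mutex; 0 = unlocked */
static int live_objects;           /* allocated and not yet freed */

static void registry_lock(void)   { assert(lock_depth == 0); lock_depth++; }
static void registry_unlock(void) { assert(lock_depth == 1); lock_depth--; }

/* Drop one reference; free the object when the last one goes away. */
static void dev_put(struct dev *d)
{
    if (--d->refs == 0) {
        free(d);
        live_objects--;
    }
}

int dev_create(const char *name)
{
    struct dev *d;
    int slot = -1;
    int err = 0;

    /* Early returns are fine here: nothing is held yet. */
    if (strlen(name) >= NAME_LEN)
        return -EINVAL;
    d = calloc(1, sizeof(*d));
    if (!d)
        return -ENOMEM;
    d->refs = 1;                   /* the caller's reference */
    live_objects++;
    strcpy(d->name, name);

    registry_lock();
    for (int i = 0; i < MAX_DEVS; i++) {
        if (registry[i] && strcmp(registry[i]->name, name) == 0) {
            err = -EEXIST;         /* duplicate: leave via the shared exit */
            goto out;
        }
        if (!registry[i] && slot < 0)
            slot = i;
    }
    if (slot < 0) {
        err = -ENOSPC;
        goto out;
    }
    registry[slot] = d;
    d->refs++;                     /* the registry's reference */
out:
    registry_unlock();             /* the only unlock */
    dev_put(d);                    /* the only put; frees d on error */
    return err;
}

int live_dev_count(void)       { return live_objects; }
bool registry_is_locked(void)  { return lock_depth != 0; }
```

Creating the same name twice returns -EEXIST on the second call, and the lock and the temporary object are both released because every path after the lock is taken leaves through `out`.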