06 Jun, 2011

1 commit

  • Since we are modifying this RCU pointer, we need to hold
    the lock that protects it while doing so.

    This fixes a potential reuse and double free of a cfq
    io_context structure. The bug has been in CFQ for a long
    time; it hit very few people, but those it did hit seemed
    to see it a lot.
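
    A minimal userspace sketch of the locking rule follows. Everything in it is
    hypothetical: a C11 atomic_bool spinlock stands in for ioc->lock, an atomic
    pointer store stands in for rcu_assign_pointer(), and no RCU grace periods
    are modeled. It only shows that every writer of the one-hit cache takes the
    lock (including the compare-then-clear), while lookups stay lockless.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

struct cic { int id; };                     /* hypothetical stand-in for cfq_io_context */

struct ioc {
    atomic_bool locked;                     /* stands in for ioc->lock (a spinlock) */
    _Atomic(struct cic *) ioc_data;         /* one-hit cache, read without the lock */
};

static void ioc_init(struct ioc *ioc)
{
    atomic_init(&ioc->locked, false);
    atomic_init(&ioc->ioc_data, NULL);
}

static void ioc_lock(struct ioc *ioc)
{
    while (atomic_exchange_explicit(&ioc->locked, true, memory_order_acquire))
        ;                                   /* spin until the holder releases it */
}

static void ioc_unlock(struct ioc *ioc)
{
    atomic_store_explicit(&ioc->locked, false, memory_order_release);
}

/* Lockless read, as an RCU reader would do it. */
static struct cic *ioc_cache_lookup(struct ioc *ioc)
{
    return atomic_load_explicit(&ioc->ioc_data, memory_order_acquire);
}

/* Writers publish a new cached pointer only under the lock. */
static void ioc_cache_set(struct ioc *ioc, struct cic *cic)
{
    ioc_lock(ioc);
    atomic_store_explicit(&ioc->ioc_data, cic, memory_order_release);
    ioc_unlock(ioc);
}

/* Clear the cache only if it still points at cic; the check and the
 * store both happen under the lock, so no other writer can slip in. */
static void ioc_cache_drop(struct ioc *ioc, struct cic *cic)
{
    ioc_lock(ioc);
    if (atomic_load_explicit(&ioc->ioc_data, memory_order_relaxed) == cic)
        atomic_store_explicit(&ioc->ioc_data, NULL, memory_order_release);
    ioc_unlock(ioc);
}
```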

    Tracked in RH bugzilla here:

    https://bugzilla.redhat.com/show_bug.cgi?id=577968

    Credit goes to Paul Bolle for figuring out that the issue
    was around the one-hit ioc->ioc_data cache. Thanks to his
    hard work the issue is now fixed.

    Cc: stable@kernel.org
    Signed-off-by: Jens Axboe

    Jens Axboe
     

03 Jun, 2011

1 commit

  • Hi, Jens,

    If you recall, I posted an RFC patch for this back in July of last year:
    http://lkml.org/lkml/2010/7/13/279

    The basic problem is that a process can issue a never-ending stream of
    async direct I/Os to the same sector on a device, thus starving out
    other I/O in the system (due to the way the alias handling works in both
    cfq and deadline). The solution I proposed back then was to start
    dispatching from the fifo after a certain number of aliases had been
    dispatched. Vivek asked why we had to treat aliases differently at all,
    and I never had a good answer. So, I put together a simple patch which
    allows aliases to be added to the rb tree (it adds them to the right,
    though that doesn't matter as the order isn't guaranteed anyway). I
    think this is the preferred solution, as it doesn't break up time slices
    in CFQ or batches in deadline. I've tested it, and it does solve the
    starvation issue. Let me know what you think.
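
    A minimal sketch of the idea, assuming a plain unbalanced binary search
    tree instead of the kernel's rb tree (all names are hypothetical): a
    request whose sector equals an existing node's sector is sent to the
    right instead of being treated as an alias, so it becomes just another
    node in sector order.

```c
#include <stddef.h>

struct req {
    unsigned long long sector;   /* start sector of the request */
    int id;
    struct req *left, *right;
};

static void sort_insert(struct req **root, struct req *r)
{
    struct req **p = root;

    r->left = r->right = NULL;
    while (*p) {
        if (r->sector < (*p)->sector)
            p = &(*p)->left;
        else
            p = &(*p)->right;    /* equal sector ("alias"): go right */
    }
    *p = r;
}

/* In-order walk: store ids in sector order, return the count written. */
static size_t walk(const struct req *n, int *out, size_t pos)
{
    if (!n)
        return pos;
    pos = walk(n->left, out, pos);
    out[pos++] = n->id;
    return walk(n->right, out, pos);
}
```

    Inserting sectors 100, 100 and 50 (ids 1, 2, 3) yields the in-order
    sequence 3, 1, 2: both requests for sector 100 stay in the tree.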

    Cheers,
    Jeff

    Signed-off-by: Jeff Moyer
    Signed-off-by: Jens Axboe

    Jeff Moyer
     

02 Jun, 2011

3 commits

  • This reverts commit a197b59ae6e8bee56fcef37ea2482dc08414e2ac.

    As rmk says:
    "Commit a197b59ae6e8 (mm: fail GFP_DMA allocations when ZONE_DMA is not
    configured) is causing regressions on ARM with various drivers which
    use GFP_DMA.

    The behaviour up until now has been to silently ignore that flag when
    CONFIG_ZONE_DMA is not enabled, and to allocate from the normal zone.
    However, as a result of the above commit, such allocations now fail
    which causes drivers to fail. These are regressions compared to the
    previous kernel version."

    so just revert it.
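
    The behaviour being restored can be sketched as a compile-time fallback.
    This is a simplified, hypothetical model: the flag value and function name
    are made up, though as far as I know include/linux/gfp.h uses a similar
    OPT_ZONE_DMA definition for the same purpose.

```c
enum zone_type { ZONE_DMA, ZONE_NORMAL };

#define SKETCH_GFP_DMA 0x01u            /* hypothetical flag value */

/* Without CONFIG_ZONE_DMA, a DMA request quietly maps to the normal zone. */
#ifdef CONFIG_ZONE_DMA
#define OPT_ZONE_DMA ZONE_DMA
#else
#define OPT_ZONE_DMA ZONE_NORMAL
#endif

static enum zone_type pick_zone(unsigned int gfp_flags)
{
    return (gfp_flags & SKETCH_GFP_DMA) ? OPT_ZONE_DMA : ZONE_NORMAL;
}
```

    Built with -DCONFIG_ZONE_DMA, pick_zone(SKETCH_GFP_DMA) returns ZONE_DMA
    instead; without it the flag is ignored rather than causing a failure.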

    Requested-by: Russell King
    Acked-by: Andrew Morton
    Cc: David Rientjes
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     
  • * git://git.infradead.org/iommu-2.6:
    intel-iommu: Fix off-by-one in RMRR setup
    intel-iommu: Add domain check in domain_remove_one_dev_info
    intel-iommu: Remove Host Bridge devices from identity mapping
    intel-iommu: Use coherent DMA mask when requested
    intel-iommu: Dont cache iova above 32bit
    intel-iommu: Speed up processing of the identity_mapping function
    intel-iommu: Check for identity mapping candidate using system dma mask
    intel-iommu: Only unlink device domains from iommu
    intel-iommu: Enable super page (2MiB, 1GiB, etc.) support
    intel-iommu: Flush unmaps at domain_exit
    intel-iommu: Remove obsolete comment from detect_intel_iommu
    intel-iommu: fix VT-d PMR disable for TXT on S3 resume

    Linus Torvalds
     
  • Jens' back-merge commit 698567f3fa79 ("Merge commit 'v2.6.39' into
    for-2.6.40/core") was incorrectly done, and re-introduced the
    DISK_EVENT_MEDIA_CHANGE lines that had been removed earlier in commits

    - 9fd097b14918 ("block: unexport DISK_EVENT_MEDIA_CHANGE for
    legacy/fringe drivers")

    - 7eec77a1816a ("ide: unexport DISK_EVENT_MEDIA_CHANGE for ide-gd
    and ide-cd")

    because of conflicts with the nearby "g->flags" updates made by commit
    d4dc210f69bc ("block: don't block events on excl write for non-optical
    devices")

    As a result, we re-introduced the hanging behavior due to infinite disk
    media change reports.

    Tssk, tssk, people! Don't do back-merges at all, and *definitely* don't
    do them to hide merge conflicts from me - especially as I'm likely
    better at merging them than you are, since I do so many merges.

    Reported-by: Steven Rostedt
    Cc: Jens Axboe
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     

01 Jun, 2011

19 commits

  • * git://git.infradead.org/mtd-2.6:
    mtd: fix physmap.h warnings

    Linus Torvalds
     
  • We were mapping an extra byte (and hence usually an extra page):
    iommu_prepare_identity_map() expects to be given an 'end' argument which
    is the last byte to be mapped; not the first byte *not* to be mapped.
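
    The difference between the two conventions is easy to see with a little
    arithmetic. The helper below is a hypothetical sketch assuming 4KiB pages:
    it counts the pages covered by an inclusive range [start, last], so
    passing the first byte *not* to be mapped as last adds a page whenever
    that byte begins a new page.

```c
#define PAGE_SHIFT 12   /* assume 4KiB pages */

/* Pages touched by the inclusive byte range [start, last]. */
static unsigned long long pages_inclusive(unsigned long long start,
                                          unsigned long long last)
{
    return (last >> PAGE_SHIFT) - (start >> PAGE_SHIFT) + 1;
}
```

    For a two-page region starting at 0x1000, the last byte is 0x2fff;
    passing 0x3000 instead yields three pages.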

    Signed-off-by: David Woodhouse

    David Woodhouse
     
  • The comment in domain_remove_one_dev_info() states "No need to compare
    PCI domain; it has to be the same". But for the si_domain that isn't
    going to be true, as it consists of all the PCI devices that are
    identity mapped, so multiple PCI domains can be in si_domain. The
    code needs to validate the PCI domain too.
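
    A hypothetical sketch of the comparison: a device is identified by its
    PCI domain (segment) as well as its bus and devfn, so the match must check
    all three. The struct and field names are assumptions, not the kernel's.

```c
struct pci_loc {
    int segment;            /* PCI domain number */
    unsigned char bus;
    unsigned char devfn;
};

/* In a domain spanning several PCI segments, bus/devfn alone are ambiguous. */
static int same_device(const struct pci_loc *a, const struct pci_loc *b)
{
    return a->segment == b->segment &&
           a->bus == b->bus &&
           a->devfn == b->devfn;
}
```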

    Signed-off-by: Mike Habeck
    Signed-off-by: Mike Travis
    Cc: stable@kernel.org
    Signed-off-by: David Woodhouse

    Mike Habeck
     
  • When using the 1:1 (identity) PCI DMA remapping, PCI Host Bridge devices
    that do not use the IOMMU cause a kernel panic. Fix that by not
    inserting those devices into the si_domain.
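
    One way to express the filter, as a sketch: PCI class code 0x0600 is a
    host bridge, and the 24-bit class value carries the programming interface
    in its low byte. The helper names are hypothetical.

```c
#define PCI_CLASS_BRIDGE_HOST 0x0600    /* base class 0x06, subclass 0x00 */

/* class24 is (base class << 16) | (subclass << 8) | prog-if */
static int is_host_bridge(unsigned int class24)
{
    return (class24 >> 8) == PCI_CLASS_BRIDGE_HOST;
}

/* Hypothetical policy: never put host bridges into the identity domain. */
static int may_add_to_si_domain(unsigned int class24)
{
    return !is_host_bridge(class24);
}
```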

    Signed-off-by: Mike Travis
    Reviewed-by: Mike Habeck
    Cc: stable@kernel.org
    Signed-off-by: David Woodhouse

    Mike Travis
     
  • The __intel_map_single function is not honoring the passed-in DMA mask.
    This results in not using the coherent DMA mask when called from
    intel_alloc_coherent().

    Signed-off-by: Mike Travis
    Acked-by: Chris Wright
    Reviewed-by: Mike Habeck
    Cc: stable@kernel.org
    Signed-off-by: David Woodhouse

    Mike Travis
     
  • Mike Travis and Mike Habeck reported an issue where iova allocation
    would return a range that was larger than a device's dma mask.

    https://lkml.org/lkml/2011/3/29/423

    The dmar initialization code will reserve all PCI MMIO regions and copy
    those reservations into a domain specific iova tree. It is possible for
    one of those regions to be above the dma mask of a device. It is typical
    to allocate iovas with a 32bit mask (despite the device's dma mask possibly
    being larger) and cache the result until it exhausts the lower 32bit
    address space. Freeing the iova range that is >= the last iova in the
    lower 32bit range when there is still an iova above the 32bit range will
    corrupt the cached iova by pointing it to a region that is above 32bit.
    If that region is also larger than the device's dma mask, a subsequent
    allocation will return an unusable iova and cause dma failure.

    Simply don't cache an iova that is above the 32bit caching boundary.
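
    A hypothetical model of the rule, using a sorted array in place of the
    iova rb tree: when a range at or above the cached position is freed, the
    cache moves to the next range only if that range starts below the 32-bit
    boundary; otherwise the cache is dropped. Names and layout are assumptions.

```c
#include <stddef.h>

struct iova { unsigned long pfn_lo, pfn_hi; };

struct iova_cache {
    const struct iova *ranges;   /* allocated ranges, sorted by pfn_lo */
    size_t n;
    long cached;                 /* index of the cached 32-bit node, -1 if none */
    unsigned long dma_32bit_pfn; /* pfn of the 4GiB boundary (0x100000 with 4KiB pages) */
};

/* Call before ranges[freed] is released. */
static void cache_on_free(struct iova_cache *c, size_t freed)
{
    if (c->cached < 0 || freed < (size_t)c->cached)
        return;                  /* freed range is below the cache: nothing to do */

    size_t next = freed + 1;
    if (next < c->n && c->ranges[next].pfn_lo < c->dma_32bit_pfn)
        c->cached = (long)next;
    else
        c->cached = -1;          /* never let the 32-bit cache point above 4GiB */
}
```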

    Reported-by: Mike Travis
    Reported-by: Mike Habeck
    Cc: stable@kernel.org
    Acked-by: Mike Travis
    Tested-by: Mike Habeck
    Signed-off-by: Chris Wright
    Signed-off-by: David Woodhouse

    Chris Wright
     
  • When there are a large number of PCI devices, and the passthrough
    option for iommu is set, much time is spent in the identity_mapping
    function hunting through the iommu domains to check if a specific
    device is "identity mapped".

    Speed up the function by checking the cached info to see if
    it's mapped to the static identity domain.
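
    A hypothetical sketch of the faster check: rather than walking every
    domain on every iommu, read the per-device info that is already cached
    and compare its domain with the static identity domain. The types and
    field names are simplified stand-ins.

```c
#include <stddef.h>

struct dmar_domain { int id; };
struct device_domain_info { struct dmar_domain *domain; };
struct pci_dev { struct device_domain_info *iommu_info; };  /* cached per-device info */

static struct dmar_domain si_domain_storage = { 0 };
static struct dmar_domain *const si_domain = &si_domain_storage;  /* static identity domain */

/* O(1): consult the info cached on the device instead of searching
 * every iommu's domain list. */
static int is_identity_mapped(const struct pci_dev *pdev)
{
    const struct device_domain_info *info = pdev->iommu_info;

    return info != NULL && info->domain == si_domain;
}
```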

    Signed-off-by: Mike Travis
    Reviewed-by: Mike Habeck
    Cc: stable@kernel.org
    Signed-off-by: David Woodhouse

    Mike Travis
     
  • The identity mapping code appears to assume that if the
    device's dma_mask is wider than 32 bits, the device can use identity
    mapping. But that is not true: take the case where we have a 40-bit
    device on a 44-bit architecture. The device can potentially receive a
    physical address that it will truncate and cause incorrect addresses
    to be used.

    Instead check to see if the device's dma_mask is large enough
    to address the system's dma_mask.
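
    The difference between the two tests, as a sketch. DMA_BIT_MASK matches
    the kernel macro's definition; the helper names are hypothetical, and
    required_mask stands in for whatever mask the system needs to reach all
    of memory.

```c
#include <stdint.h>

/* Same definition as the kernel's DMA_BIT_MASK(n). */
#define DMA_BIT_MASK(n) (((n) == 64) ? ~0ULL : ((1ULL << (n)) - 1))

/* Roughly the old heuristic: anything wider than 32 bits qualifies. */
static int old_identity_candidate(uint64_t dev_mask)
{
    return dev_mask > DMA_BIT_MASK(32);
}

/* The new rule: the device must reach every address the system may hand it. */
static int new_identity_candidate(uint64_t dev_mask, uint64_t required_mask)
{
    return dev_mask >= required_mask;
}
```

    A 40-bit device passes the old test but fails the new one on a system
    that needs 44 bits.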

    Signed-off-by: Mike Travis
    Reviewed-by: Mike Habeck
    Cc: stable@kernel.org
    Signed-off-by: David Woodhouse

    Chris Wright
     
  • Commit a97590e5 added unlinking of domains from iommus, to mirror the
    unlinking of iommus from domains that was already being done. We
    actually want to only do this for device domains and never for the
    static identity map domain or VM domains. The SI domain is special and
    never freed, while VM domain IDs live in their own special address
    space, separate from iommu->domain_ids.

    In the current code, a VM can get domain->id zero, then mark that
    domain unused when unbound from pci-stub. This leads to DMAR
    write faults when the device is re-bound to the host driver.
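
    A hypothetical sketch of the guard: only ordinary device domains give
    their ID back to iommu->domain_ids. The flag names follow the intel-iommu
    driver, but their values here are assumptions.

```c
#include <stdbool.h>

/* Flag names as in the intel-iommu driver; the values are assumptions. */
#define DOMAIN_FLAG_VIRTUAL_MACHINE  (1u << 0)
#define DOMAIN_FLAG_STATIC_IDENTITY  (1u << 1)

struct domain { unsigned int flags; int id; };

/* Only ordinary device domains own an ID in iommu->domain_ids; the SI
 * domain is never freed and VM domain IDs come from a separate space. */
static bool may_release_domain_id(const struct domain *d)
{
    return !(d->flags & (DOMAIN_FLAG_VIRTUAL_MACHINE | DOMAIN_FLAG_STATIC_IDENTITY));
}
```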

    Signed-off-by: Alex Williamson
    Cc: stable@kernel.org
    Signed-off-by: David Woodhouse

    Alex Williamson
     
  • There are no externally-visible changes with this. In the loop in the
    internal __domain_mapping() function, we simply detect if we are mapping:
    - size >= 2MiB, and
    - virtual address aligned to 2MiB, and
    - physical address aligned to 2MiB, and
    - on hardware that supports superpages.

    (and likewise for larger superpages).

    We automatically use a superpage for such mappings. We never have to
    worry about *breaking* superpages, since we trust that we will always
    *unmap* the same range that was mapped. So all we need to do is ensure
    that dma_pte_clear_range() will also cope with superpages.
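
    The alignment test above can be sketched as follows, assuming 4KiB base
    pages and 512 entries per page-table level (a 9-bit stride), so each extra
    level multiplies the page size by 512. This is loosely modeled on the idea
    behind the patch and is not the kernel's code; hw_levels is a hypothetical
    count of the superpage sizes the hardware supports.

```c
#define STRIDE_SHIFT 9                                /* 512 entries per level */
#define STRIDE_LOW_BITS ((1UL << STRIDE_SHIFT) - 1)

/* Return the page-table level to map at: 1 = 4KiB, 2 = 2MiB, 3 = 1GiB. */
static int superpage_level(unsigned long iov_pfn, unsigned long phys_pfn,
                           unsigned long pages, int hw_levels)
{
    int level = 1;
    /* OR the two pfns so one test checks that both are aligned. */
    unsigned long merged = iov_pfn | phys_pfn;

    while (hw_levels > 0 && !(merged & STRIDE_LOW_BITS)) {
        pages >>= STRIDE_SHIFT;
        if (!pages)
            break;               /* too few pages to fill one page of the next size */
        merged >>= STRIDE_SHIFT;
        level++;
        hw_levels--;
    }
    return level;
}
```

    For example, 512 pages with both pfns 2MiB-aligned map at level 2, while
    511 pages, or a misaligned pfn, fall back to level 1.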

    Adjust pfn_to_dma_pte() to take a superpage 'level' as an argument, so
    it can return a PTE at the appropriate level rather than always
    extending the page tables all the way down to level 1. Again, this is
    simplified by the fact that we should never encounter existing small
    pages when we're creating a mapping; any old mapping that used the same
    virtual range will have been entirely removed and its obsolete page
    tables freed.

    Provide an 'intel_iommu=sp_off' argument on the command line as a
    chicken bit. Not that it should ever be required.

    ==

    The original commit seen in the iommu-2.6.git was Youquan's
    implementation (and completion) of my own half-baked code which I'd
    typed into an email. Followed by half a dozen subsequent 'fixes'.

    I've taken the unusual step of rewriting history and collapsing the
    original commits in order to keep the main history simpler, and make
    life easier for the people who are going to have to backport this to
    older kernels. And also so I can give it a more coherent commit comment
    which (hopefully) gives a better explanation of what's going on.

    The original sequence of commits leading to identical code was:

    Youquan Song (3):
    intel-iommu: super page support
    intel-iommu: Fix superpage alignment calculation error
    intel-iommu: Fix superpage level calculation error in dma_pfn_level_pte()

    David Woodhouse (4):
    intel-iommu: Precalculate superpage support for dmar_domain
    intel-iommu: Fix hardware_largepage_caps()
    intel-iommu: Fix inappropriate use of superpages in __domain_mapping()
    intel-iommu: Fix phys_pfn in __domain_mapping for sglist pages

    Signed-off-by: Youquan Song
    Signed-off-by: David Woodhouse

    Youquan Song
     
  • Fix build warnings in physmap.h:

    include/linux/mtd/physmap.h:25: warning: 'struct platform_device' declared inside parameter list
    include/linux/mtd/physmap.h:25: warning: its scope is only this definition or declaration, which is probably not what you want
    include/linux/mtd/physmap.h:26: warning: 'struct platform_device' declared inside parameter list
    include/linux/mtd/physmap.h:27: warning: 'struct platform_device' declared inside parameter list
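
    For reference, the usual cure for this warning is a file-scope forward
    declaration placed ahead of the prototypes. The struct below is a
    hypothetical stand-in for the header's callback table, not its exact
    contents.

```c
#include <stddef.h>

/* File-scope forward declaration: every prototype below now refers to
 * this one struct type instead of declaring a new one in its parameter list. */
struct platform_device;

/* Hypothetical callback table in the style of the physmap header. */
struct flash_hooks {
    int  (*init)(struct platform_device *);
    void (*exit)(struct platform_device *);
    void (*set_vpp)(struct platform_device *, int);
};

static int hooks_complete(const struct flash_hooks *h)
{
    return h->init != NULL && h->exit != NULL && h->set_vpp != NULL;
}
```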

    Signed-off-by: Randy Dunlap
    Signed-off-by: David Woodhouse

    Randy Dunlap
     
  • …s/security-testing-2.6

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/security-testing-2.6:
    AppArmor: fix oops in apparmor_setprocattr

    Linus Torvalds
     
  • The new instruction_pointer_set helper is defined for people who have
    converted to asm-generic/ptrace.h, so don't use it generally unless
    the arch needs it (in which case it has been converted). This should
    fix building of kgdb tests for arches not yet converted.

    Signed-off-by: Mike Frysinger
    Acked-by: Stephen Rothwell
    Cc: Jason Wessel
    Cc: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mike Frysinger
     
  • When invalid parameters are passed to apparmor_setprocattr a NULL deref
    oops occurs when it tries to record an audit message. This is because
    it is passing NULL for the profile parameter for aa_audit. But aa_audit
    now requires that the profile passed is not NULL.

    Fix this by passing the current profile on the task that is trying to
    setprocattr.

    Signed-off-by: Kees Cook
    Signed-off-by: John Johansen
    Cc: stable@kernel.org
    Signed-off-by: James Morris

    Kees Cook
     
  • * git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-for-linus:
    virtio_net: delay TX callbacks
    virtio: add api for delayed callbacks
    virtio_test: support event index
    vhost: support event index
    virtio_ring: support event idx feature
    virtio ring: inline function to check for events
    virtio: event index interface
    virtio: add full three-clause BSD text to headers.
    virtio balloon: kill tell-host-first logic
    virtio console: don't manually set or finalize VIRTIO_CONSOLE_F_MULTIPORT.
    drivers, block: virtio_blk: Replace cryptic number with the macro
    virtio_blk: allow re-reading config space at runtime
    lguest: remove support for VIRTIO_F_NOTIFY_ON_EMPTY.
    lguest: fix up compilation after move
    lguest: fix timer interrupt setup

    Linus Torvalds
     
  • * 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux-2.6:
    [IA64] wire up sendmmsg() syscall for Itanium

    Linus Torvalds
     
  • Add entries in unistd.h and entry.S to make this new syscall visible.

    Signed-off-by: Tony Luck

    Tony Luck
     
  • …/git/tip/linux-2.6-tip

    * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    x86: Fix mwait_play_dead() faulting on mwait-incapable cpus
    x86 idle: Fix mwait deprecation warning message

    Evil merge to remove extra quote noticed by Joe Perches

    Linus Torvalds
     
  • …l/git/tip/linux-2.6-tip

    * 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    rcu: Cure load woes

    Linus Torvalds
     

31 May, 2011

4 commits

  • …l/git/tip/linux-2.6-tip

    * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    x86: Put back -pg to tsc.o and add no GCOV to vread_tsc_64.o

    Linus Torvalds
     
  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6:
    autofs4: bogus dentry_unhash() added in ->unlink()
    vfs: shrink_dcache_parent before rmdir, dir rename

    Linus Torvalds
     
  • The Apple custom PIC only exists in some earlier machine models;
    anything with an MPIC will crash on suspend if we register those
    syscore ops unconditionally.

    This is a regression caused by commit f5a592f7d74e ("PM / PowerPC: Use
    struct syscore_ops instead of sysdevs for PM")

    Signed-off-by: Benjamin Herrenschmidt
    Signed-off-by: Linus Torvalds

    Benjamin Herrenschmidt
     
  • Commit cc3ce5176d83 (rcu: Start RCU kthreads in TASK_INTERRUPTIBLE
    state) fudges a sleeping task's state, resulting in the scheduler seeing
    a TASK_UNINTERRUPTIBLE task going to sleep, but a TASK_INTERRUPTIBLE
    task waking up. The result is unbalanced load calculation.

    The problem that patch tried to address is that the RCU threads could
    stay in UNINTERRUPTIBLE state for quite a while and trigger the hung
    task detector due to on-demand wake-ups.

    Cure the problem differently by always giving the tasks at least one
    wake-up once the CPU is fully up and running; this will kick them out of
    the initial UNINTERRUPTIBLE state and into the regular INTERRUPTIBLE
    wait state.

    [ The alternative would be teaching kthread_create() to start threads as
    INTERRUPTIBLE but that needs a tad more thought. ]

    Reported-by: Damien Wyart
    Signed-off-by: Peter Zijlstra
    Acked-by: Paul E. McKenney
    Link: http://lkml.kernel.org/r/1306755291.1200.2872.camel@twins
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     

30 May, 2011

12 commits

  • A logic error in mwait_play_dead() causes the kernel to use
    mwait even on cpus which don't support it, such as KVM virtual
    cpus.

    Introduced by:

    349c004e3d31: x86: A fast way to check capabilities of the current cpu

    Fixes: https://bugzilla.kernel.org/show_bug.cgi?id=36222
    Reported-by: Török Edwin
    Signed-off-by: Avi Kivity
    Cc: Christoph Lameter
    Cc: Tejun Heo
    Link: http://lkml.kernel.org/r/1306758237-9327-1-git-send-email-avi@redhat.com
    Signed-off-by: Ingo Molnar

    Avi Kivity
     
  • Fix:

    arch/x86/kernel/process.c:645:1: warning: unknown escape sequence '\i'

    due to missing escape backslash, introduced by this commit:

    5d4c47e0195b: x86 idle: deprecate mwait_idle() and "idle=mwait" cmdline param
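
    For illustration, GCC warns like this when a backslash precedes a
    character that is not a defined escape sequence; quotes embedded in a
    string each need \". The message below is hypothetical, not the exact
    string from process.c.

```c
#include <string.h>

/* "\i" is not a C escape sequence, so GCC warns about it; quotes inside
 * the string must each be written as \" (hypothetical message text). */
static const char deprecation_msg[] = "WARNING: \"idle=mwait\" is deprecated\n";
```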

    Signed-off-by: Borislav Petkov
    Cc: Len Brown
    Cc: Linus Torvalds
    Link: http://lkml.kernel.org/r/1306748286-24701-1-git-send-email-bp@alien8.de
    Signed-off-by: Ingo Molnar

    Borislav Petkov
     
  • Signed-off-by: Al Viro

    Al Viro
     
  • The dentry_unhash push-down series missed that shrink_dcache_parent needs to
    be called prior to rmdir or dir rename to clear DCACHE_REFERENCED and
    allow efficient dentry reclaim.

    Reported-by: Dave Chinner
    Signed-off-by: Sage Weil
    Signed-off-by: Al Viro

    Sage Weil
     
  • Ask for delayed callbacks on TX ring full, to give the
    other side more of a chance to make progress.

    Signed-off-by: Michael S. Tsirkin
    Acked-by: David S. Miller
    Signed-off-by: Rusty Russell

    Michael S. Tsirkin
     
  • Add an API that tells the other side that callbacks
    should be delayed until a lot of work has been done.
    Implement using the new event_idx feature.

    Note: it might seem advantageous to let the drivers
    ask for a callback after a specific capacity has
    been reached. However, as a single head can
    free many entries in the descriptor table,
    we don't really have a clue about capacity
    until get_buf is called. The API is the simplest
    to implement at the moment; we'll see what kind of
    hints drivers can pass when there's more than one
    user of the feature.
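
    As a hedged sketch of the idea: instead of asking for an interrupt on the
    very next used buffer, the driver publishes a used-ring index further
    ahead. The 3/4 fraction and the helper name below are assumptions for
    illustration; all index arithmetic is modulo 2^16, since ring indices are
    16-bit.

```c
#include <stdint.h>

/* Hypothetical helper: choose the used-ring index at which the device
 * should next interrupt, about 3/4 of the way through the buffers that
 * are still outstanding. Indices are 16-bit and wrap around. */
static uint16_t delayed_used_event(uint16_t avail_idx, uint16_t last_used_idx)
{
    uint16_t outstanding = (uint16_t)(avail_idx - last_used_idx);
    uint16_t bufs = (uint16_t)(outstanding * 3 / 4);

    return (uint16_t)(last_used_idx + bufs);
}
```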

    Signed-off-by: Michael S. Tsirkin
    Signed-off-by: Rusty Russell

    Michael S. Tsirkin
     
  • Add the ability to test the new event idx feature,
    and enable it by default.

    Signed-off-by: Rusty Russell

    Michael S. Tsirkin
     
  • Support the new event index feature. When acked,
    utilize it to reduce the number of interrupts sent to the guest.

    Signed-off-by: Michael S. Tsirkin
    Signed-off-by: Rusty Russell

    Michael S. Tsirkin
     
  • Support for the new event idx feature:
    1. When enabling interrupts, publish the current avail index
    value to the host to get interrupts on the next update.
    2. Use the new avail_event feature to reduce the number
    of exits from the guest.

    Simple test with the simulator:

    [virtio]# time ./virtio_test
    spurious wakeus: 0x7

    real 0m0.169s
    user 0m0.140s
    sys 0m0.019s
    [virtio]# time ./virtio_test --no-event-idx
    spurious wakeus: 0x11

    real 0m0.649s
    user 0m0.295s
    sys 0m0.335s

    Signed-off-by: Michael S. Tsirkin
    Signed-off-by: Rusty Russell

    Michael S. Tsirkin
     
  • With the new used_event and avail_event features, both
    host and guest need similar logic to check whether events are
    enabled, so it helps to put the common code in the header.

    Note that Xen has similar logic for notification hold-off
    in include/xen/interface/io/ring.h with req_event and req_prod
    corresponding to event_idx + 1 and new_idx respectively.
    +1 comes from the fact that req_event and req_prod in Xen start at 1,
    while event index in virtio starts at 0.
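
    The shared check is small enough to show. This sketch follows the logic
    of the vring_need_event() helper in include/linux/virtio_ring.h as far as
    I know it; the name need_event is illustrative. The other side is notified
    only if the index moved past event_idx since the last notification, using
    16-bit wrapping arithmetic.

```c
#include <stdint.h>

/* Nonzero if moving the index from old_idx to new_idx crossed event_idx.
 * Casting each difference to 16 bits keeps it correct across wraparound. */
static int need_event(uint16_t event_idx, uint16_t new_idx, uint16_t old_idx)
{
    return (uint16_t)(new_idx - event_idx - 1) < (uint16_t)(new_idx - old_idx);
}
```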

    Signed-off-by: Michael S. Tsirkin
    Signed-off-by: Rusty Russell

    Michael S. Tsirkin
     
  • Define a new feature bit for the guest and host to utilize
    an event index (like Xen) instead of a flag bit to enable/disable
    interrupts and kicks.

    Signed-off-by: Michael S. Tsirkin
    Signed-off-by: Rusty Russell

    Michael S. Tsirkin
     
  • It's unclear to me if it's important, but it's obviously causing my
    technical colleagues some headaches and I'd hate such imprecision to
    slow virtio adoption.

    I've emailed this to all non-trivial contributors for approval, too.

    Signed-off-by: Rusty Russell
    Acked-by: Grant Likely
    Acked-by: Ryan Harper
    Acked-by: Anthony Liguori
    Acked-by: Eric Van Hensbergen
    Acked-by: john cooper
    Acked-by: Aneesh Kumar K.V
    Acked-by: Christian Borntraeger
    Acked-by: Fernando Luis Vazquez Cao

    Rusty Russell