05 Dec, 2009

1 commit


04 Dec, 2009

16 commits

  • I have observed cases where the implicit stop_machine_destroy() done by
    stop_machine() hangs while destroying the workqueues, specifically in
    kthread_stop(). This seems to be because timer ticks are not restarted
    until after stop_machine() returns.

    Fortunately stop_machine provides a facility to pre-create/post-destroy
    the workqueues so use this to ensure that workqueues are only destroyed
    after everything is really up and running again.

    I only actually observed this failure with 2.6.30. It seems that newer
    kernels are somehow more robust against doing kthread_stop() without timer
    interrupts (I tried some backports of some likely looking candidates but
    did not track down the commit which added this robustness). However this
    change seems like a reasonable belt&braces thing to do.

    Signed-off-by: Ian Campbell
    Signed-off-by: Jeremy Fitzhardinge
    Cc: Stable Kernel

    Ian Campbell
     
  • The existing error handling has a few issues:
    - If freeze_processes() fails it exits with shutting_down = SHUTDOWN_SUSPEND.
    - If dpm_suspend_noirq() fails it exits without resuming xenbus.
    - If stop_machine() fails it exits without resuming xenbus or calling
    dpm_resume_end().
    - xs_suspend()/xs_resume() and dpm_suspend_noirq()/dpm_resume_noirq() were not
    nested in the obvious way.

    Fix by ensuring each failure case goto's the correct label. Treat a failure of
    stop_machine() as a cancelled suspend in order to follow the correct resume
    path.

    Signed-off-by: Ian Campbell
    Signed-off-by: Jeremy Fitzhardinge
    Cc: Stable Kernel

    Ian Campbell
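
The unwinding order described in this commit can be modelled in plain userspace C. This is only a sketch: `step()`, `note()`, `suspend_trace` and `do_suspend_sketch()` are hypothetical stand-ins that record call order, not the kernel functions themselves, and the resume side is simplified to a single `xs_resume_or_cancel` marker.

```c
#include <string.h>

/* Records the order of calls so each unwinding path can be inspected. */
char suspend_trace[256];

static void note(const char *name)
{
    strcat(suspend_trace, name);
    strcat(suspend_trace, ";");
}

/* Hypothetical stand-in for a step that can fail: fails when idx == fail_at. */
static int step(int idx, int fail_at, const char *name)
{
    note(name);
    return idx == fail_at ? -1 : 0;
}

/* Returns 0 on success, -1 if the step numbered fail_at failed. */
int do_suspend_sketch(int fail_at)
{
    int err;

    suspend_trace[0] = '\0';

    err = step(0, fail_at, "freeze_processes");
    if (err)
        goto out;

    err = step(1, fail_at, "dpm_suspend_start");
    if (err)
        goto out_thaw;

    note("xs_suspend");                 /* cannot fail in this sketch */

    err = step(2, fail_at, "dpm_suspend_noirq");
    if (err)
        goto out_resume_xs;

    /* A stop_machine failure falls through, like a cancelled suspend. */
    err = step(3, fail_at, "stop_machine");

    note("dpm_resume_noirq");
out_resume_xs:
    note("xs_resume_or_cancel");
    note("dpm_resume_end");
out_thaw:
    note("thaw_processes");
out:
    return err;
}
```

Calling `do_suspend_sketch(2)` simulates a dpm_suspend_noirq() failure; the trace then shows xenbus being resumed and dpm_resume_end() being reached, with each suspend step undone in reverse order.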
     
  • On resume irq_info[*].evtchn is reset to 0 since event channel mappings
    are not preserved over suspend/resume. The other contents of irq_info
    are preserved to allow rebind_evtchn_irq() to function.

    However when a device resumes it will try to unbind from the
    previous IRQ (e.g. blkfront goes blkfront_resume() -> blkif_free() ->
    unbind_from_irqhandler() -> unbind_from_irq()). This will fail due to the
    check for VALID_EVTCHN in unbind_from_irq() and the IRQ is leaked. The
    device will then continue to resume and allocate a new IRQ, eventually
    leading to find_unbound_irq() panic()ing.

    Fix this by changing unbind_from_irq() to handle teardown of interrupts
    which have type!=IRQT_UNBOUND but are not currently bound to a specific
    event channel.

    Signed-off-by: Ian Campbell
    Signed-off-by: Jeremy Fitzhardinge
    Cc: Stable Kernel

    Ian Campbell
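
A minimal userspace model of the behaviour this commit describes, assuming a simplified `irq_info` entry with only a type and an event channel. The `_sketch` names and the `closed_channels_sketch` counter are hypothetical; the real function does considerably more bookkeeping.

```c
/* Simplified model of one irq_info entry (assumption: only these fields). */
enum irq_type_sketch { IRQT_UNBOUND_SKETCH = 0, IRQT_EVTCHN_SKETCH };

struct irq_info_sketch {
    enum irq_type_sketch type;
    unsigned int evtchn;        /* 0 means "no event channel bound" */
};

#define VALID_EVTCHN_SKETCH(chn) ((chn) != 0)

/* Counts how many event channels the sketch "closed". */
int closed_channels_sketch;

void unbind_from_irq_sketch(struct irq_info_sketch *info)
{
    /* Only close the event channel if one is actually bound. */
    if (VALID_EVTCHN_SKETCH(info->evtchn))
        closed_channels_sketch++;

    /*
     * Tear down the IRQ whenever it is in use, even if the event channel
     * was reset to 0 across suspend/resume. Skipping this step when the
     * channel is invalid is what would leak the IRQ.
     */
    if (info->type != IRQT_UNBOUND_SKETCH) {
        info->type = IRQT_UNBOUND_SKETCH;
        info->evtchn = 0;
    }
}
```

An entry with a bound type but `evtchn == 0`, which is the post-resume state, ends up unbound without any channel being closed.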
     
  • tick_resume() is never called on secondary processors. Presumably this
    is because they are offlined for suspend on native and so this is
    normally taken care of in the CPU onlining path. Under Xen we keep all
    CPUs online over a suspend.

    This patch papers over the issue for me but I will investigate a more
    generic, less hacky, way of doing the same.

    tick_suspend is also only called on the boot CPU which I presume should
    be fixed too.

    Signed-off-by: Ian Campbell
    Signed-off-by: Jeremy Fitzhardinge
    Cc: Stable Kernel
    Cc: Thomas Gleixner

    Ian Campbell
     
  • If Xen wants to return to a 32b usermode with sysret it must use the
    right form. When using VGCF_in_syscall to trigger this, it looks at
    the code segment and does a 32b sysret if it is FLAT_USER_CS32.
    However, this is different from __USER32_CS, so it fails to return
    properly if we use the normal Linux segment.

    So avoid the whole mess by dropping VGCF_in_syscall and simply use
    plain iret to return to usermode.

    Signed-off-by: Jeremy Fitzhardinge
    Acked-by: Jan Beulich
    Cc: Stable Kernel

    Jeremy Fitzhardinge
     
  • dpm_resume_noirq() takes a mutex, so it can't be called from a no-interrupt
    context. Don't call it from within the stop-machine function, but just
    afterwards, since we're resuming anyway, regardless of what happened.

    Signed-off-by: Jeremy Fitzhardinge
    Cc: Stable Kernel

    Jeremy Fitzhardinge
     
  • printk timestamping uses sched_clock, which in turn relies on runstate
    info under Xen. So make sure we set it up before any printks can
    be called.

    Signed-off-by: Jeremy Fitzhardinge
    Cc: Stable Kernel

    Jeremy Fitzhardinge
     
  • The commit "xen: re-register runstate area earlier on resume" caused us
    to never try to set up the runstate area for secondary CPUs. Ensure that
    we do this...

    Signed-off-by: Ian Campbell
    Signed-off-by: Jeremy Fitzhardinge
    Cc: Stable Kernel

    Ian Campbell
     
  • Otherwise the timer is disabled by dpm_suspend_noirq() which in turn prevents
    correct operation of stop_machine on multi-processor systems and breaks
    suspend.

    Signed-off-by: Ian Campbell
    Signed-off-by: Jeremy Fitzhardinge
    Cc: Stable Kernel

    Ian Campbell
     
  • pvops kernels >= 2.6.30 can currently only be saved and restored once. The
    second attempt to save results in:

    ERROR Internal error: Frame# in pfn-to-mfn frame list is not in pseudophys
    ERROR Internal error: entry 0: p2m_frame_list[0] is 0xf2c2c2c2, max 0x120000
    ERROR Internal error: Failed to map/save the p2m frame list

    I finally narrowed it down to:

    commit cdaead6b4e657f960d6d6f9f380e7dfeedc6a09b
    Author: Jeremy Fitzhardinge
    Date: Fri Feb 27 15:34:59 2009 -0800

    xen: split construction of p2m mfn tables from registration

    Build the p2m_mfn_list_list early with the rest of the p2m table, but
    register it later when the real shared_info structure is in place.

    Signed-off-by: Jeremy Fitzhardinge

    The unforeseen side-effect of this change was to cause the mfn list list to not
    be rebuilt on resume. Prior to this change it would have been rebuilt via
    xen_post_suspend() -> xen_setup_shared_info() -> xen_setup_mfn_list_list().

    Fix by explicitly calling xen_build_mfn_list_list() from xen_post_suspend().

    Signed-off-by: Ian Campbell
    Signed-off-by: Jeremy Fitzhardinge
    Cc: Stable Kernel

    Ian Campbell
     
  • Even if have_vcpu_info_placement is not set, we still need to set up
    the runstate area on each resumed vcpu.

    Signed-off-by: Jeremy Fitzhardinge
    Cc: Stable Kernel

    Jeremy Fitzhardinge
     
  • This is necessary to ensure the runstate area is available to
    xen_sched_clock before any calls to printk which will require it in
    order to provide a timestamp.

    I chose to pull the xen_setup_runstate_info out of xen_time_init into
    the caller in order to maintain parity with calling
    xen_setup_runstate_info separately from calling xen_time_resume.

    Signed-off-by: Ian Campbell
    Signed-off-by: Jeremy Fitzhardinge
    Cc: Stable Kernel

    Ian Campbell
     
  • Increases the device timeout from 10s to 5 minutes, giving the user a
    visual indication during that time in case there are problems. The patch
    is a backport of changesets 144 and 150 in the Xenbits tree.

    Cc: Jeremy Fitzhardinge
    Signed-off-by: Paolo Bonzini
    Signed-off-by: Jeremy Fitzhardinge

    Paolo Bonzini
     
  • When printing a warning about a timed-out device, print the
    current state of both ends of the device connection (i.e., backend as
    well as frontend). This backports half of changeset 146 from the
    Xenbits tree.

    Cc: Jeremy Fitzhardinge
    Signed-off-by: Paolo Bonzini
    Signed-off-by: Jeremy Fitzhardinge

    Paolo Bonzini
     
  • The logic of is_disconnected_device/exists_disconnected_device is wrong
    in that they are used to test whether a device is trying to connect (i.e.
    connecting). For this reason the patch fixes them to not consider a
    Closing or Closed device to be connecting. At the same time the patch
    also renames the functions according to what they really do; you could
    say a closed device is "disconnected" (the old name), but not "connecting"
    (the new name).

    This patch is a backport of changeset 909 from the Xenbits tree.

    Cc: Jeremy Fitzhardinge
    Signed-off-by: Paolo Bonzini
    Signed-off-by: Jeremy Fitzhardinge

    Paolo Bonzini
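
The distinction between "disconnected" and "connecting" can be illustrated with a small predicate pair. This is a sketch under assumptions: the state numbering is assumed to follow Xen's public xenbus interface (abridged here), and both comparison forms are a plausible reading of the description above rather than a copy of the patch.

```c
#include <stdbool.h>

/* Assumed state ordering; later states such as reconfiguring are omitted. */
enum xenbus_state_sketch {
    STATE_UNKNOWN      = 0,
    STATE_INITIALISING = 1,
    STATE_INIT_WAIT    = 2,
    STATE_INITIALISED  = 3,
    STATE_CONNECTED    = 4,
    STATE_CLOSING      = 5,
    STATE_CLOSED       = 6
};

/* Old-style test: anything that is not Connected counts as "disconnected". */
bool is_disconnected_device_sketch(enum xenbus_state_sketch s)
{
    return s != STATE_CONNECTED;
}

/* New-style test: only states before Connected count as "connecting". */
bool is_device_connecting_sketch(enum xenbus_state_sketch s)
{
    return s < STATE_CONNECTED;
}
```

With this ordering a Closed device is "disconnected" under the old test but is not "connecting" under the new one, so a wait loop built on the new test would not keep waiting for it.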
     
  • They don't need to be global, and may cause linker clashes.

    Signed-off-by: Jeremy Fitzhardinge
    Cc: Stable Kernel

    Jeremy Fitzhardinge
     

04 Nov, 2009

2 commits


28 Oct, 2009

1 commit


23 Oct, 2009

2 commits

  • * git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-for-linus:
    move virtrng_remove to .devexit.text
    move virtballoon_remove to .devexit.text
    virtio_blk: Revert serial number support
    virtio: let header files include virtio_ids.h
    virtio_blk: revert QUEUE_FLAG_VIRT addition

    Linus Torvalds
     
  • * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (21 commits)
    niu: VLAN_ETH_HLEN should be used to make sure that the whole MAC header was copied to the head buffer in the Vlan packets case
    KS8851: Fix ks8851_set_rx_mode() for IFF_MULTICAST
    KS8851: Fix MAC address write order
    KS8851: Add soft reset at probe time
    net: fix section mismatch in fec.c
    net: Fix struct inet_timewait_sock bitfield annotation
    tcp: Try to catch MSG_PEEK bug
    net: Fix IP_MULTICAST_IF
    bluetooth: static lock key fix
    bluetooth: scheduling while atomic bug fix
    tcp: fix TCP_DEFER_ACCEPT retrans calculation
    tcp: reduce SYN-ACK retrans for TCP_DEFER_ACCEPT
    tcp: accept socket after TCP_DEFER_ACCEPT period
    Revert "tcp: fix tcp_defer_accept to consider the timeout"
    AF_UNIX: Fix deadlock on connecting to shutdown socket
    ethoc: clear only pending irqs
    ethoc: inline regs access
    vmxnet3: use dev_dbg, fix build for CONFIG_BLOCK=n
    virtio_net: use dev_kfree_skb_any() in free_old_xmit_skbs()
    be2net: fix support for PCI hot plug
    ...

    Linus Torvalds
     

22 Oct, 2009

16 commits

  • The function virtrng_remove is used only wrapped by __devexit_p so define
    it using __devexit.

    Signed-off-by: Uwe Kleine-König
    Acked-by: Sam Ravnborg
    Cc: Rusty Russell
    Cc: Michael S. Tsirkin
    Acked-by: Christian Borntraeger
    Cc: linux-kernel@vger.kernel.org
    Signed-off-by: Rusty Russell

    Uwe Kleine-König
     
  • The function virtballoon_remove is used only wrapped by __devexit_p so
    define it using __devexit.

    Signed-off-by: Uwe Kleine-König
    Acked-by: Sam Ravnborg
    Acked-by: Michael S. Tsirkin
    Signed-off-by: Rusty Russell

    Uwe Kleine-König
     
  • This reverts "Add serial number support for virtio_blk, V4a".

    Turns out that virtio_pci, lguest and s/390 all have an 8-bit limit
    on virtio config space, so no one could ever use this.

    This is coming back later in a cleaner form.

    Signed-off-by: Rusty Russell
    Cc: john cooper
    Cc: Jens Axboe

    Rusty Russell
     
  • Rusty,

    commit 3ca4f5ca73057a617f9444a91022d7127041970a
    virtio: add virtio IDs file
    moved all device IDs into a single file. While the change itself is
    a very good one, it can break userspace applications. For example
    if a userspace tool wanted to get the ID of virtio_net it used to
    include virtio_net.h. This no longer works, since virtio_net.h
    does not include virtio_ids.h.
    This patch moves all "#include <linux/virtio_ids.h>" from the C
    files into the header files, making the header files compatible with
    the old ones.

    In addition, this patch exports virtio_ids.h to userspace.

    CC: Fernando Luis Vazquez Cao
    Signed-off-by: Christian Borntraeger
    Signed-off-by: Rusty Russell

    Christian Borntraeger
     
  • It seems like the addition of QUEUE_FLAG_VIRT causes major performance
    regressions for Fedora users:

    https://bugzilla.redhat.com/show_bug.cgi?id=509383
    https://bugzilla.redhat.com/show_bug.cgi?id=505695

    While I can't reproduce those extreme regressions myself, I think the flag
    is wrong.

    Rationale:

    QUEUE_FLAG_VIRT expands to QUEUE_FLAG_NONROT, which causes the queue to
    be unplugged immediately. This is not good behaviour for at least
    qemu and kvm, where we do have significant overhead for every
    I/O operation. Even with all the latest speedups (native AIO,
    MSI support, zero copy) we can only get native speed for up to 128kb
    I/O requests; we are already down to 66% of native performance for 4kb
    requests, even on my laptop running the Intel X25-M SSD for which
    QUEUE_FLAG_NONROT was designed.
    If we ever get virtio-blk overhead low enough that this flag makes
    sense it should only be set based on a feature flag set by the host.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Rusty Russell

    Christoph Hellwig
     
  • …ied to the head buffer in the Vlan packets case

    Signed-off-by: Joyce Yu <joyce.yu@sun.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

    Joyce Yu
     
  • * 'for-linus' of git://git.infradead.org/users/eparis/notify:
    dnotify: ignore FS_EVENT_ON_CHILD
    inotify: fix coalesce duplicate events into a single event in special case
    inotify: deprecate the inotify kernel interface
    fsnotify: do not set group for a mark before it is on the i_list

    Linus Torvalds
     
  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
    Input: hp_sdc_rtc - fix test in hp_sdc_rtc_read_rt()
    Input: atkbd - consolidate force release quirks for volume keys
    Input: logips2pp - model 73 is actually TrackMan FX
    Input: i8042 - add Sony Vaio VGN-FZ240E to the nomux list
    Input: fix locking issue in /proc/bus/input/ handlers
    Input: atkbd - postpone restoring LED/repeat rate at resume
    Input: atkbd - restore resetting LED state at startup
    Input: i8042 - make pnp_data_busted variable boolean instead of int
    Input: synaptics - add another Protege M300 to rate blacklist

    Linus Torvalds
     
  • * 'kvm-updates/2.6.32' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
    KVM: Prevent kvm_init from corrupting debugfs structures
    KVM: MMU: fix pointer cast
    KVM: use proper hrtimer function to retrieve expiration time

    Linus Torvalds
     
  • * git://git.kernel.org/pub/scm/linux/kernel/git/agk/linux-2.6-dm:
    dm snapshot: allow chunk size to be less than page size
    dm snapshot: use unsigned integer chunk size
    dm snapshot: lock snapshot while supplying status
    dm exception store: fix failed set_chunk_size error path
    dm snapshot: require non zero chunk size by end of ctr
    dm: dec_pending needs locking to save error value
    dm: add missing del_gendisk to alloc_dev error path
    dm log: userspace fix incorrect luid cast in userspace_ctr
    dm snapshot: free exception store on init failure
    dm snapshot: sort by chunk size to fix race

    Linus Torvalds
     
  • Increase TEST_SUSPEND_SECONDS to 10 so the warning in
    suspend_test_finish() doesn't annoy the users of slower systems so much.

    Also, make the warning print the suspend-resume cycle time, so that we
    know why the warning actually triggered.

    Patch prepared during the hacking session at the Kernel Summit in Tokyo.

    Signed-off-by: Rafael J. Wysocki
    Signed-off-by: Linus Torvalds

    Rafael J. Wysocki
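
As a rough illustration of a warning that reports the measured cycle time, here is a userspace sketch. The function name, message text and buffer interface are hypothetical; only the 10 second threshold comes from the commit message above.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

#define TEST_SUSPEND_SECONDS 10

/*
 * Formats a warning into buf (len must be > 0) and returns true when the
 * cycle took longer than the threshold; otherwise leaves buf empty and
 * returns false.
 */
bool suspend_cycle_warning_sketch(unsigned int msec, char *buf, size_t len)
{
    buf[0] = '\0';
    if (msec <= TEST_SUSPEND_SECONDS * 1000u)
        return false;

    /* Include the measured time so the reader can see why it triggered. */
    snprintf(buf, len, "suspend-resume cycle took %u.%03u seconds",
             msec / 1000, msec % 1000);
    return true;
}
```

A cycle of exactly 10000 ms stays silent in this sketch, while anything longer produces a message that carries the elapsed time.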
     
  • This fixes a compile bug introduced in

    6ef297f (ARM: 5720/1: Move MMCI header to amba include dir)

    That commit moved arch/arm/include/asm/mach/mmc.h to
    include/linux/amba/mmci.h. Just removing the include was enough.

    Signed-off-by: Uwe Kleine-König
    Acked-by: Linus Walleij
    Acked-by: Nicolas Ferre
    Acked-by: Bill Gatliff
    Cc: Catalin Marinas
    Cc: Russell King
    Cc: Pierre Ossman
    Cc: linux-arm-kernel@lists.infradead.org
    Cc: Andrew Morton
    Signed-off-by: Linus Torvalds

    Uwe Kleine-König
     
  • * 'sh/for-2.6.32' of git://git.kernel.org/pub/scm/linux/kernel/git/lethal/sh-2.6:
    sh: Kill off stray HAVE_FTRACE_SYSCALLS reference.
    sh: Remove BKL from landisk gio.
    sh: disabled cache handling fix.
    sh: Fix up single page flushing to use PAGE_SIZE.

    Linus Torvalds
     
  • * git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
    crypto: aesni-intel - Fix irq_fpu_usable usage
    crypto: padlock-sha - Fix stack alignment

    Linus Torvalds
     
  • Fix a (small) memory leak in one of the error paths of the NFS mount
    options parsing code.

    Regression introduced in 2.6.30 by commit a67d18f (NFS: load the
    rpc/rdma transport module automatically).

    Reported-by: Yinghai Lu
    Reported-by: Pekka Enberg
    Signed-off-by: Ingo Molnar
    Signed-off-by: Trond Myklebust
    Cc: stable@kernel.org
    Signed-off-by: Linus Torvalds

    Yinghai Lu
     
  • This patch fixes a null pointer exception in pipe_rdwr_open() which
    generates the stack trace:

    > Unable to handle kernel NULL pointer dereference at 0000000000000028 RIP:
    > [] pipe_rdwr_open+0x35/0x70
    > [] __dentry_open+0x13c/0x230
    > [] do_filp_open+0x2d/0x40
    > [] do_sys_open+0x5a/0x100
    > [] sysenter_do_call+0x1b/0x67

    The failure mode is triggered by an attempt to open an anonymous
    pipe via /proc/pid/fd/* as exemplified by this script:

    =============================================================
    while : ; do
        { echo y ; sleep 1 ; } | { while read ; do echo z$REPLY; done ; } &
        PID=$!
        OUT=$(ps -efl | grep 'sleep 1' | grep -v grep |
              { read PID REST ; echo $PID; } )
        OUT="${OUT%% *}"
        DELAY=$((RANDOM * 1000 / 32768))
        usleep $((DELAY * 1000 + RANDOM % 1000 ))
        echo n > /proc/$OUT/fd/1 # Trigger defect
    done
    =============================================================

    Note that the failure window is quite small and I could only
    reliably reproduce the defect by inserting a small delay
    in pipe_rdwr_open(). For example:

    static int
    pipe_rdwr_open(struct inode *inode, struct file *filp)
    {
            msleep(100);
            mutex_lock(&inode->i_mutex);

    Although the defect was observed in pipe_rdwr_open(), I think it
    makes sense to replicate the change through all the pipe_*_open()
    functions.

    The core of the change is to verify that inode->i_pipe has not
    been released before attempting to manipulate it. If inode->i_pipe
    is no longer present, return ENOENT to indicate so.

    The comment about potentially using atomic_t for i_pipe->readers
    and i_pipe->writers has also been removed because it is no longer
    relevant in this context. The inode->i_mutex lock must be used so
    that inode->i_pipe can be dealt with correctly.

    Signed-off-by: Earl Chew
    Cc: stable@kernel.org
    Signed-off-by: Linus Torvalds

    Earl Chew
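
The core of the check described above, testing for a released pipe while holding the inode lock, can be sketched in userspace C. The structures and the pthread mutex are hypothetical stand-ins for the kernel's inode, pipe info and i_mutex; the open-mode handling of the real functions is left out.

```c
#include <errno.h>
#include <pthread.h>
#include <stddef.h>

/* Stand-in for the pipe state hanging off an inode. */
struct pipe_sketch {
    int readers;
    int writers;
};

/* Stand-in for the inode: a lock plus a pointer that may become NULL. */
struct inode_sketch {
    pthread_mutex_t i_mutex;
    struct pipe_sketch *i_pipe;   /* NULL once the pipe has been released */
};

int pipe_rdwr_open_sketch(struct inode_sketch *inode)
{
    int ret = -ENOENT;            /* reported if the pipe is already gone */

    pthread_mutex_lock(&inode->i_mutex);
    if (inode->i_pipe) {
        /* Safe to touch the pipe: it cannot be released while we hold the lock. */
        inode->i_pipe->readers++;
        inode->i_pipe->writers++;
        ret = 0;
    }
    pthread_mutex_unlock(&inode->i_mutex);

    return ret;
}
```

The check and the counter updates sit inside the same critical section, which is why the lock cannot be replaced by atomic counters alone: the pointer itself has to be validated under the lock.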
     

21 Oct, 2009

2 commits

  • In ks8851_set_rx_mode() the case handling IFF_MULTICAST was also setting
    the RXCR1_AE bit by accident. This meant that all unicast frames were
    being accepted by the device. Remove RXCR1_AE from this case.

    Note, RXCR1_AE was also masking a problem with setting the MAC address
    properly, so this patch needs to be applied after fixing the MAC write order.

    Fixes a bug reported by Doong, Ping of Micrel. This version of the
    patch avoids setting RXCR1_ME for all cases.

    Signed-off-by: Ben Dooks
    Signed-off-by: David S. Miller

    Ben Dooks
     
  • The MAC address register was being written in the wrong order, so add
    a new address macro to convert mac-address byte to register address and
    a ks8851_wrreg8() function to write each byte without having to worry
    about any difficult byte swapping.

    Fixes a bug reported by Doong, Ping of Micrel.

    Signed-off-by: Ben Dooks
    Signed-off-by: David S. Miller

    Ben Dooks
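
The idea of an address macro plus a byte-wide register write can be sketched as follows. The register base, the reversed byte layout and every `_SKETCH`/`_sketch` name are assumptions for illustration; the array stands in for the device's register file and no SPI traffic is modelled.

```c
#include <stdint.h>

#define ETH_ALEN_SKETCH 6
#define MAR_BASE_SKETCH 0x10    /* hypothetical base of the MAC address registers */

/* Maps MAC byte index 0..5 to a register address; byte 0 lands highest. */
#define MAR_ADDR_SKETCH(i) (MAR_BASE_SKETCH + (ETH_ALEN_SKETCH - 1) - (i))

/* Stand-in for the device register file. */
uint8_t regs_sketch[0x20];

/* Byte-wide register write, so the caller never has to swap bytes. */
void wrreg8_sketch(unsigned int reg, uint8_t val)
{
    regs_sketch[reg] = val;
}

void write_mac_sketch(const uint8_t mac[ETH_ALEN_SKETCH])
{
    for (int i = 0; i < ETH_ALEN_SKETCH; i++)
        wrreg8_sketch(MAR_ADDR_SKETCH(i), mac[i]);
}
```

Writing one byte at a time through the macro keeps the ordering decision in a single place, instead of spreading 16-bit swaps across the caller.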