13 Jun, 2013

25 commits

  • The bitmap accessed by bitops must be large enough to hold the required
    number of bits rounded up to a multiple of BITS_PER_LONG. Also, the
    bitmap must not be zeroed by memset() if the number of bits cleared is
    not a multiple of BITS_PER_LONG.

    This fixes incorrect zeroing and allocation size for frontswap_map. The
    incorrect zeroing doesn't cause any problem because frontswap_map is
    freed just after zeroing. But the wrongly calculated allocation size
    may cause problems.

    For 32bit systems, the allocation size of frontswap_map is about twice
    as large as the required size. For 64bit systems, the allocation size
    is smaller than required if the number of bits is not a multiple of
    BITS_PER_LONG.
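    The sizing rule can be sketched in userspace C. BITS_TO_LONGS() mirrors
    the kernel macro of that name. wrong_bitmap_bytes() is an assumption
    about the shape of the old calculation (the message does not quote it),
    chosen because it reproduces both symptoms above: about twice the needed
    size on 32bit, and too little on 64bit when the bit count is not a
    multiple of BITS_PER_LONG. bitmap_zero_sketch() is hypothetical.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <string.h>

/* Local re-definitions for a userspace sketch; the kernel has its own. */
#define BITS_PER_LONG (sizeof(long) * CHAR_BIT)
#define BITS_TO_LONGS(nr) (((nr) + BITS_PER_LONG - 1) / BITS_PER_LONG)

/* ASSUMED shape of the wrong size: bits divided by sizeof(long) bytes,
 * which is neither "bits / 8" nor rounded up to whole longs. */
static size_t wrong_bitmap_bytes(size_t nbits)
{
	return nbits / sizeof(long);
}

/* Correct size in bytes: the bit count rounded up to whole longs. */
static size_t bitmap_bytes(size_t nbits)
{
	return BITS_TO_LONGS(nbits) * sizeof(long);
}

/* Hypothetical helper: zero whole longs, so the trailing bits of the last
 * word are cleared too and nothing past the allocation is touched. */
static void bitmap_zero_sketch(unsigned long *map, size_t nbits)
{
	memset(map, 0, bitmap_bytes(nbits));
}
```

    Rounding up to whole longs matters because bitops such as set_bit()
    read and write the bitmap one unsigned long at a time.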

    Signed-off-by: Akinobu Mita
    Cc: Konrad Rzeszutek Wilk
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Akinobu Mita
     
  • audit_add_tree_rule() must set 'rule->tree = NULL;' first, to protect
    against the rule itself being freed in kill_rules().

    The reason is that when it is killed, the 'rule' itself may already have
    been released, so we must not access it. One example: we add a rule to an
    inode just as another task is deleting this inode.

    The work flow for adding a rule:

    audit_receive() -> (need audit_cmd_mutex lock)
    audit_receive_skb() ->
    audit_receive_msg() ->
    audit_receive_filter() ->
    audit_add_rule() ->
    audit_add_tree_rule() -> (need audit_filter_mutex lock)
    ...
    unlock audit_filter_mutex
    get_tree()
    ...
    iterate_mounts() -> (iterate all related inodes)
    tag_mount() ->
    tag_trunk() ->
    create_trunk() -> (assume it is 1st rule)
    fsnotify_add_mark() ->
    fsnotify_add_inode_mark() -> (add mark to inode->i_fsnotify_marks)
    ...
    get_tree(); (each inode will get one)
    ...
    lock audit_filter_mutex

    The work flow for deleting an inode:

    __destroy_inode() ->
    fsnotify_inode_delete() ->
    __fsnotify_inode_delete() ->
    fsnotify_clear_marks_by_inode() -> (get mark from inode->i_fsnotify_marks)
    fsnotify_destroy_mark() ->
    fsnotify_destroy_mark_locked() ->
    audit_tree_freeing_mark() ->
    evict_chunk() ->
    ...
    tree->goner = 1
    ...
    kill_rules() -> (assume current->audit_context == NULL)
    call_rcu() -> (rule->tree != NULL)
    audit_free_rule_rcu() ->
    audit_free_rule()
    ...
    audit_schedule_prune() -> (assume current->audit_context == NULL)
    kthread_run() -> (need audit_cmd_mutex and audit_filter_mutex lock)
    prune_one() -> (delete it from prune_list)
    put_tree(); (match the original get_tree above)
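    A minimal single-threaded model of why clearing rule->tree first helps.
    It assumes, as the "call_rcu() -> (rule->tree != NULL)" step in the
    deletion flow suggests, that the kill path only frees rules whose tree
    pointer is still set. All types and functions here are hypothetical; the
    real code also involves audit_filter_mutex and RCU, which are omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified types; not the kernel's audit structures. */
struct tree_sketch { int id; };
struct rule_sketch { struct tree_sketch *tree; bool freed; };

/* Models kill_rules(): only a rule whose tree is still set is torn down;
 * 'freed' stands in for call_rcu(..., audit_free_rule_rcu). */
static void kill_rule_sketch(struct rule_sketch *r)
{
	if (r->tree) {
		r->tree = NULL;
		r->freed = true;
	}
}

/* Models the first step of audit_add_tree_rule(): remember the seed tree
 * and clear rule->tree before the mutex is dropped. */
static struct tree_sketch *begin_add_sketch(struct rule_sketch *r)
{
	struct tree_sketch *seed = r->tree;

	r->tree = NULL;
	return seed;
}
```

    In this model a kill that races with the add finds tree == NULL and
    leaves the rule alone, so the adder can still use it safely.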

    Signed-off-by: Chen Gang
    Cc: Eric Paris
    Cc: Al Viro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Chen Gang
     
  • When we have a page fault for an address backed by a hugepage under
    migration, the kernel can't wait correctly and busy-loops on the
    hugepage fault until the migration finishes. As a result, users who
    trigger hugepage migration (via soft offlining, for example)
    occasionally experience long delays or soft lockups.

    This is because pte_offset_map_lock() can't get a correct migration entry
    or a correct page table lock for hugepage. This patch introduces
    migration_entry_wait_huge() to solve this.

    Signed-off-by: Naoya Horiguchi
    Reviewed-by: Rik van Riel
    Reviewed-by: Wanpeng Li
    Reviewed-by: Michal Hocko
    Cc: Mel Gorman
    Cc: Andi Kleen
    Cc: KOSAKI Motohiro
    Cc: [2.6.35+]
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Naoya Horiguchi
     
  • dlm_mig_lockres_handler() is missing a dlm_lockres_put() on an error path.
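    The handler's actual paths are not shown here, so this is only a sketch
    of the general rule, using the goto-cleanup idiom common in kernel code:
    every path that took a reference must drop it, including error paths.
    struct lockres_sketch and the get/put helpers are hypothetical stand-ins
    for a dlm lock resource and dlm_lockres_put().

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical refcounted object standing in for a dlm lock resource. */
struct lockres_sketch { int refs; };

static void lockres_get(struct lockres_sketch *res) { res->refs++; }
static void lockres_put(struct lockres_sketch *res) { res->refs--; }

/* Takes a reference and releases it on every exit by funnelling all
 * returns through a single label. */
static int mig_handler_sketch(struct lockres_sketch *res, int fail)
{
	int ret = 0;

	lockres_get(res);
	if (fail) {
		ret = -EINVAL;
		goto out_put;	/* returning directly here would leak the ref */
	}
	/* ... normal processing would happen here ... */
out_put:
	lockres_put(res);
	return ret;
}
```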

    Signed-off-by: joyce
    Reviewed-by: shencanquan
    Cc: Mark Fasheh
    Cc: Joel Becker
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Xue jiufei
     
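  • The watermark check consists of two sub-checks. The first one is:

    if (free_pages <= min + lowmem_reserve)
            return false;

    It ensures that there is a minimal amount of RAM in the zone. If the
    allocation cannot use CMA, free_pages has already been reduced by the
    number of free CMA pages before this check.

    The second check prevents the zone from getting too fragmented:

    for (o = 0; o < order; o++) {
            free_pages -= z->free_area[o].nr_free << o;
            min >>= 1;
            if (free_pages <= min)
                    return false;
    }

    The field z->free_area[o].nr_free is equal to the number of free pages
    including free CMA pages. Therefore the CMA pages are subtracted twice.
    This may cause a false positive fail of __zone_watermark_ok() if the CMA
    area gets strongly fragmented. In such a case there are many 0-order
    free pages located in CMA. Those pages are subtracted twice, therefore
    they will quickly drain free_pages during the check against
    fragmentation. The test fails even though there are many free non-cma
    pages in the zone.

    This patch fixes this issue by subtracting CMA pages only for the
    purpose of the (free_pages <= min + lowmem_reserve) check.

    Signed-off-by: Tomasz Stanislawski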
    Signed-off-by: Kyungmin Park
    Tested-by: Laura Abbott
    Cc: Bartlomiej Zolnierkiewicz
    Acked-by: Minchan Kim
    Cc: Mel Gorman
    Tested-by: Marek Szyprowski
    Cc: [3.7+]
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
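    A simplified model of the two sub-checks, to show the double counting.
    struct zone_sketch and the 'fixed' flag are hypothetical; the real
    __zone_watermark_ok() has more inputs (alloc flags, classzone index)
    that are left out here. The loop follows the one quoted above.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_ORDERS_SKETCH 11

/* Hypothetical zone: nr_free[o] counts free blocks of order o, CMA
 * blocks included, like z->free_area[o].nr_free. */
struct zone_sketch {
	long nr_free[NR_ORDERS_SKETCH];
	long free_cma;		/* free pages that live in CMA */
	long lowmem_reserve;
};

/* fixed == true: free CMA pages are excluded only from the first check.
 * fixed == false: they are removed from free_pages up front and then
 * subtracted again in the per-order loop. */
static bool watermark_ok_sketch(const struct zone_sketch *z, int order,
				long min, long free_pages, bool alloc_cma,
				bool fixed)
{
	long free_cma = alloc_cma ? 0 : z->free_cma;
	int o;

	/* First sub-check: enough pages usable by this allocation. */
	if (free_pages - free_cma <= min + z->lowmem_reserve)
		return false;
	if (!fixed)
		free_pages -= free_cma;	/* pre-fix: causes double counting */

	/* Second sub-check: guard against fragmentation. */
	for (o = 0; o < order; o++) {
		free_pages -= z->nr_free[o] << o;
		min >>= 1;
		if (free_pages <= min)
			return false;
	}
	return true;
}
```

    For example, with 500 free order-0 pages all in CMA and 250 free
    order-1 blocks outside it (1000 free pages), an order-1 request with
    min = 100 passes with fixed == true and fails with fixed == false.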

    Tomasz Stanislawski
     
  • The "info.fill" array isn't initialized so it can leak uninitialized stack
    information to user space.
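    A sketch of the usual remedy for this class of leak: zero the whole
    structure before filling it in and copying it out. struct info_sketch
    and its fields are hypothetical; only the "fill" array comes from the
    message, and memcpy() stands in for copy_to_user().

```c
#include <assert.h>
#include <string.h>

/* Hypothetical reply structure with a reserved "fill" array. */
struct info_sketch {
	int version;
	int nr_items;
	char fill[16];
};

/* Build the reply on the stack, then copy it out. The memset() makes
 * fill[] (and any padding) hold zeros instead of stale stack bytes. */
static void get_info_sketch(struct info_sketch *user_buf)
{
	struct info_sketch info;

	memset(&info, 0, sizeof(info));
	info.version = 1;
	info.nr_items = 4;
	memcpy(user_buf, &info, sizeof(info));
}
```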

    Signed-off-by: Dan Carpenter
    Acked-by: Robin Holt
    Acked-by: Dimitri Sivanich
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dan Carpenter
     
  • There was a regression introduced by 36f5588905c1 ("aio: refcounting
    cleanup"), reported by Jens Axboe - the refcounting cleanup switched to
    using RCU in the shutdown path, but the synchronize_rcu() was done in
    the context of the io_destroy() syscall greatly increasing the time it
    could block.

    This patch switches it to call_rcu() and makes shutdown asynchronous
    (more asynchronous than it was originally; before the refcount changes
    io_destroy() would still wait on pending kiocbs).

    Note that there's a global quota on the max outstanding kiocbs, and that
    quota must be manipulated synchronously; otherwise io_setup() could
    return -EAGAIN when there isn't quota available, and userspace won't
    have any way of waiting until shutdown of the old kioctxs has finished
    (besides busy looping).

    So we release our quota before kioctx shutdown has finished, which
    should be fine since the quota never corresponded to anything real
    anyway.

    Signed-off-by: Kent Overstreet
    Cc: Zach Brown
    Cc: Felipe Balbi
    Cc: Greg Kroah-Hartman
    Cc: Mark Fasheh
    Cc: Joel Becker
    Cc: Rusty Russell
    Reported-by: Jens Axboe
    Tested-by: Jens Axboe
    Cc: Asai Thambi S P
    Cc: Selvan Mani
    Cc: Sam Bradshaw
    Cc: Jeff Moyer
    Cc: Al Viro
    Signed-off-by: Benjamin LaHaise
    Tested-by: Benjamin LaHaise
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kent Overstreet
     
  • Add support for the at91sam9x5-family which must use the shadow
    interrupt mask due to a hardware issue (causing RTC_IMR to always be
    zero).

    Signed-off-by: Johan Hovold
    Acked-by: Nicolas Ferre
    Cc: Douglas Gilbert
    Cc: Jean-Christophe PLAGNIOL-VILLARD
    Cc: Ludovic Desroches
    Cc: Robert Nelson
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Johan Hovold
     
  • Add shadow interrupt-mask register which can be used on SoCs where the
    actual hardware register is broken.

    Note that some care needs to be taken to make sure the shadow mask
    corresponds to the actual hardware state. The added overhead is not an
    issue for the non-broken SoCs due to the relatively infrequent
    interrupt-mask updates. We do, however, only use the shadow mask value
    as a fall-back when it is actually needed, as there is still a
    theoretical possibility that the mask is incorrect (see the code for
    details).
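    A userspace sketch of a shadow interrupt mask behind register accessors,
    selected by a per-SoC configuration flag. All names are hypothetical; the
    IER/IDR/IMR split follows the AT91 RTC's enable, disable and mask
    registers. The real driver also needs synchronisation to keep the shadow
    in step with the hardware, which this single-threaded model omits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-SoC configuration selected at probe time. */
struct rtc_config_sketch {
	bool use_shadow_imr;
};

/* Simulated device: on the broken SoCs the IMR always reads as zero. */
struct rtc_sketch {
	const struct rtc_config_sketch *cfg;
	bool imr_reads_zero;	/* models the hardware erratum */
	uint32_t hw_mask;	/* interrupts the hardware really has enabled */
	uint32_t shadow_imr;	/* software copy, kept only when configured */
};

/* Accessor for the interrupt-enable register (IER). */
static void rtc_irq_enable(struct rtc_sketch *r, uint32_t mask)
{
	r->hw_mask |= mask;
	if (r->cfg->use_shadow_imr)
		r->shadow_imr |= mask;
}

/* Accessor for the interrupt-disable register (IDR). */
static void rtc_irq_disable(struct rtc_sketch *r, uint32_t mask)
{
	r->hw_mask &= ~mask;
	if (r->cfg->use_shadow_imr)
		r->shadow_imr &= ~mask;
}

/* Accessor for the interrupt-mask register (IMR). */
static uint32_t rtc_read_imr(const struct rtc_sketch *r)
{
	if (r->cfg->use_shadow_imr)
		return r->shadow_imr;	/* fall back to the shadow copy */
	return r->imr_reads_zero ? 0 : r->hw_mask;
}
```

    Routing every access through accessors keeps the shadow logic in one
    place, so SoCs with a working IMR pay only a flag test.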

    Signed-off-by: Johan Hovold
    Acked-by: Nicolas Ferre
    Cc: Douglas Gilbert
    Cc: Jean-Christophe PLAGNIOL-VILLARD
    Cc: Ludovic Desroches
    Cc: Robert Nelson
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Johan Hovold
     
  • Add accessors for the interrupt register.

    This will allow us to easily add a shadow interrupt-mask register to use
    on SoCs where the interrupt-mask register cannot be used.

    Signed-off-by: Johan Hovold
    Acked-by: Nicolas Ferre
    Cc: Douglas Gilbert
    Cc: Jean-Christophe PLAGNIOL-VILLARD
    Cc: Ludovic Desroches
    Cc: Robert Nelson
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Johan Hovold
     
  • Add configuration support which can be used to implement SoC-specific
    workarounds for broken hardware.

    Signed-off-by: Johan Hovold
    Acked-by: Nicolas Ferre
    Cc: Douglas Gilbert
    Cc: Jean-Christophe PLAGNIOL-VILLARD
    Cc: Ludovic Desroches
    Cc: Robert Nelson
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Johan Hovold
     
  • The members of Atmel's at91sam9x5 family (9x5) have a broken RTC
    interrupt mask register (AT91_RTC_IMR). It does not reflect enabled
    interrupts but instead always returns zero.

    The kernel's rtc-at91rm9200 driver handles the RTC for the 9x5 family.
    Currently, when the date/time is set, an interrupt is generated and this
    driver neglects to handle it. The kernel complains about the unhandled
    interrupt and disables it from then on. This not only breaks the RTC
    function, but since that interrupt is shared (Atmel's SYS interrupt),
    other things break as well (e.g. the debug port no longer accepts
    characters).

    Tested on the at91sam9g25. Bug confirmed by Atmel.

    This patch (of 5):

    Add missing match-table compile guard.

    Signed-off-by: Johan Hovold
    Acked-by: Nicolas Ferre
    Cc: Douglas Gilbert
    Cc: Jean-Christophe PLAGNIOL-VILLARD
    Cc: Ludovic Desroches
    Cc: Robert Nelson
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Johan Hovold
     
  • While removing a non-empty directory, the kernel dumps a message:

    (rmdir,21743,1):ocfs2_unlink:953 ERROR: status = -39

    Suppress this error message so that it is not printed in dmesg and users
    don't panic.

    Signed-off-by: Goldwyn Rodrigues
    Cc: Mark Fasheh
    Cc: Joel Becker
    Acked-by: Sunil Mushran
    Reviewed-by: Jie Liu
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Goldwyn Rodrigues
     
  • read_swap_cache_async() can race against get_swap_page(), and stumble
    across a SWAP_HAS_CACHE entry in the swap map whose page wasn't brought
    into the swapcache yet.

    This swap_map state is expected to be transient, but the current
    placement of discard in scan_swap_map() inserts a wait for I/O
    completion, making the thread in read_swap_cache_async() loop around
    its -EEXIST case while the other end, in get_swap_page(), is scheduled
    away in scan_swap_map(). This can leave the system deadlocked if the
    I/O completion happens to be waiting on the CPU waitqueue where
    read_swap_cache_async() is busy looping and !CONFIG_PREEMPT.

    This patch introduces a cond_resched() call so that the
    read_swap_cache_async() busy loop gives up the CPU when necessary, thus
    avoiding the subtle race window.

    Signed-off-by: Rafael Aquini
    Acked-by: Johannes Weiner
    Acked-by: KOSAKI Motohiro
    Acked-by: Hugh Dickins
    Cc: Shaohua Li
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Rafael Aquini
     
  • When booted in legacy mode, device_init_wakeup() gets called by
    drivers/mfd/twl-core.c when the children are initialized. However, when
    booted using device tree, the children are created with
    of_platform_populate() instead of add_children().

    This means that the RTC driver will not have device_init_wakeup() set,
    and we need to call it from the driver probe like RTC drivers typically
    do.

    Without this we cannot test PM wake-up events on omaps for cases where
    there may not be any physical wake-up event.

    Signed-off-by: Tony Lindgren
    Reported-by: Kevin Hilman
    Cc: Alessandro Zummo
    Cc: Jingoo Han
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Tony Lindgren
     
  • If a new logical drive is added and the CCISS_REGNEWD ioctl is invoked
    (as is normal with the Array Configuration Utility) the process will
    hang as below. It attempts to acquire the same mutex twice, once in
    do_ioctl() and once in cciss_unlocked_open(). The BKL was recursive,
    the mutex isn't.

    Linux version 3.10.0-rc2 (scameron@localhost.localdomain) (gcc version 4.4.7 20120313 (Red Hat 4.4.7-3) (GCC) ) #1 SMP Fri May 24 14:32:12 CDT 2013
    [...]
    acu D 0000000000000001 0 3246 3191 0x00000080
    Call Trace:
    schedule+0x29/0x70
    schedule_preempt_disabled+0xe/0x10
    __mutex_lock_slowpath+0x17b/0x220
    mutex_lock+0x2b/0x50
    cciss_unlocked_open+0x2f/0x110 [cciss]
    __blkdev_get+0xd3/0x470
    blkdev_get+0x5c/0x1e0
    register_disk+0x182/0x1a0
    add_disk+0x17c/0x310
    cciss_add_disk+0x13a/0x170 [cciss]
    cciss_update_drive_info+0x39b/0x480 [cciss]
    rebuild_lun_table+0x258/0x370 [cciss]
    cciss_ioctl+0x34f/0x470 [cciss]
    do_ioctl+0x49/0x70 [cciss]
    __blkdev_driver_ioctl+0x28/0x30
    blkdev_ioctl+0x200/0x7b0
    block_ioctl+0x3c/0x40
    do_vfs_ioctl+0x89/0x350
    SyS_ioctl+0xa1/0xb0
    system_call_fastpath+0x16/0x1b

    This mutex usage was added into the ioctl path when the big kernel lock
    was removed. As it turns out, these paths are all thread safe anyway
    (or can easily be made so) and we don't want ioctl() to be single
    threaded in any case.
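    The hang is a non-recursive lock taken twice by the same thread. A
    userspace analogy with POSIX threads, not the cciss code: an
    error-checking mutex reports EDEADLK on the second acquisition where a
    kernel mutex simply blocks forever. relock_same_thread() is hypothetical;
    the remedy described above is to drop this mutex from the ioctl path.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Lock the same mutex twice from one thread, as do_ioctl() and then
 * cciss_unlocked_open() did. Returns the second lock's result. */
static int relock_same_thread(void)
{
	pthread_mutexattr_t attr;
	pthread_mutex_t m;
	int second;

	pthread_mutexattr_init(&attr);
	pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
	pthread_mutex_init(&m, &attr);

	pthread_mutex_lock(&m);			/* first: the ioctl path */
	second = pthread_mutex_lock(&m);	/* second: the open path */

	pthread_mutex_unlock(&m);
	pthread_mutex_destroy(&m);
	pthread_mutexattr_destroy(&attr);
	return second;
}
```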

    Signed-off-by: Stephen M. Cameron
    Cc: Jens Axboe
    Cc: Mike Miller
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Stephen M. Cameron
     
  • audit_log_start() does wait_for_auditd() in a loop until
    audit_backlog_wait_time passes or audit_skb_queue has room.

    If signal_pending() is true, this becomes a busy-wait loop, because
    schedule() in TASK_INTERRUPTIBLE won't block.

    Thanks to Guy for fully investigating and explaining the problem.

    (akpm: that'll cause the system to lock up on a non-preemptible
    uniprocessor kernel)

    (Guy: "Our customer was in fact running a uniprocessor machine, and they
    reported a system hang.")

    Signed-off-by: Oleg Nesterov
    Reported-by: Guy Streeter
    Cc: Eric Paris
    Cc: Al Viro
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • During resume, we call hpet_rtc_timer_init after masking an irq bit in
    hpet. This will cause the call to hpet_disable_rtc_channel to be undone
    if RTC_AIE is the only bit not masked.

    Allowing the cmos interrupt handler to run before resuming caused some
    issues where the timer for the alarm was not removed. This would cause
    other, later timers to not be cleared, so utilities such as hwclock
    would time out when waiting for the update interrupt.

    [akpm@linux-foundation.org: coding-style tweak]
    Signed-off-by: Derek Basehore
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Derek Basehore
     
  • Use device_init_wakeup() instead of device_set_wakeup_capable() and move
    it before the RTC device is registered. This fixes the alarmtimer not
    being registered when the tps6586x RTC is the only wakeup-capable RTC in
    the system.

    Signed-off-by: Dmitry Osipenko
    Cc: Laxman Dewangan
    Cc: Jingoo Han
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dmitry Osipenko
     
  • struct memcg_cache_params has a union. Different parts of this union
    are used for root and non-root caches. The part holding the destroy
    work is used only for non-root caches.

    BUG: unable to handle kernel paging request at 0000000fffffffe0
    IP: kmem_cache_alloc+0x41/0x1f0
    Modules linked in: netlink_diag af_packet_diag udp_diag tcp_diag inet_diag unix_diag ip6table_filter ip6_tables i2c_piix4 virtio_net virtio_balloon microcode i2c_core pcspkr floppy
    CPU: 0 PID: 1929 Comm: lt-vzctl Tainted: G D 3.10.0-rc1+ #2
    Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
    RIP: kmem_cache_alloc+0x41/0x1f0
    Call Trace:
    getname_flags.part.34+0x30/0x140
    getname+0x38/0x60
    do_sys_open+0xc5/0x1e0
    SyS_open+0x22/0x30
    system_call_fastpath+0x16/0x1b
    Code: f4 53 48 83 ec 18 8b 05 8e 53 b7 00 4c 8b 4d 08 21 f0 a8 10 74 0d 4c 89 4d c0 e8 1b 76 4a 00 4c 8b 4d c0 e9 92 00 00 00 4d 89 f5 8b 45 00 65 4c 03 04 25 48 cd 00 00 49 8b 50 08 4d 8b 38 49
    RIP [] kmem_cache_alloc+0x41/0x1f0

    Signed-off-by: Andrey Vagin
    Cc: Konstantin Khlebnikov
    Cc: Glauber Costa
    Cc: Johannes Weiner
    Cc: Balbir Singh
    Cc: KAMEZAWA Hiroyuki
    Reviewed-by: Michal Hocko
    Cc: Li Zefan
    Cc: [3.9.x]
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrey Vagin
     
  • If an error occurs, for example an EIO in __ocfs2_prepare_orphan_dir,
    ocfs2_prep_new_orphaned_file releases the inode_ac, yet its caller still
    gets a 0 return and goes on to use the now-NULL ocfs2_alloc_context
    struct in the functions that follow. A kernel panic results.

    Signed-off-by: "Xiaowei.Hu"
    Reviewed-by: shencanquan
    Acked-by: Sunil Mushran
    Cc: Joe Jin
    Cc: Mark Fasheh
    Cc: Joel Becker
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Xiaowei.Hu
     
  • The 'while' loop must stop when 'nbytes == 0'; otherwise its
    'nbytes >= 0' condition can never become false, because 'nbytes' is a
    size_t and is always greater than or equal to zero.

    The related warning: (with EXTRA_CFLAGS=-W)

    lib/mpi/mpicoder.c:40:2: warning: comparison of unsigned expression >= 0 is always true [-Wtype-limits]
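    A sketch of the corrected loop condition. The exact loop at
    lib/mpi/mpicoder.c:40 is not reproduced in this message, so the shape
    below (stripping leading zero bytes from a buffer) is an assumption, and
    skip_leading_zeros() is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>

/* Skip leading zero bytes. With a size_t counter, 'nbytes >= 0' is always
 * true, so the loop must test 'nbytes > 0' to stop at the end. */
static const unsigned char *skip_leading_zeros(const unsigned char *buf,
					       size_t *nbytes)
{
	while (*nbytes > 0 && buf[0] == 0) {
		buf++;
		(*nbytes)--;
	}
	return buf;
}
```

    With the '>= 0' form, an all-zero buffer would walk past its end and
    the counter would wrap around to SIZE_MAX.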

    Signed-off-by: Chen Gang
    Cc: Rusty Russell
    Cc: David Howells
    Cc: James Morris
    Cc: Andy Shevchenko
    Acked-by: David Howells
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Chen Gang
     
  • The dmesg_restrict sysctl currently covers the syslog method for
    accessing dmesg; however, /dev/kmsg isn't covered by the same
    protections. Most people haven't noticed because older versions of
    util-linux dmesg(1) default to using the syslog method. Newer versions
    of util-linux dmesg(1) default to reading directly from /dev/kmsg.

    To fix /dev/kmsg, let's compare the existing interfaces and what they
    allow:

    - /proc/kmsg allows:
      - open (SYSLOG_ACTION_OPEN) if CAP_SYSLOG, since it uses a destructive
        single-reader interface (SYSLOG_ACTION_READ).
      - everything, after an open.

    - syslog syscall allows:
      - anything, if CAP_SYSLOG.
      - SYSLOG_ACTION_READ_ALL and SYSLOG_ACTION_SIZE_BUFFER, if
        dmesg_restrict==0.
      - nothing else (EPERM).

    The use-cases were:
    - dmesg(1) needs to do non-destructive SYSLOG_ACTION_READ_ALLs.
    - sysklog(1) needs to open /proc/kmsg, drop privs, and still issue the
      destructive SYSLOG_ACTION_READs.

    AIUI, dmesg(1) is moving to /dev/kmsg, and systemd-journald doesn't
    clear the ring buffer.

    Based on the comments in devkmsg_llseek, it sounds like actions besides
    reading aren't going to be supported by /dev/kmsg (i.e.
    SYSLOG_ACTION_CLEAR), so we have a strict subset of the non-destructive
    syslog syscall actions.

    To this end, move the check as Josh had done, but also rename the
    constants to reflect their new uses (SYSLOG_FROM_CALL becomes
    SYSLOG_FROM_READER, and SYSLOG_FROM_FILE becomes SYSLOG_FROM_PROC).
    SYSLOG_FROM_READER allows non-destructive actions, and SYSLOG_FROM_PROC
    allows destructive actions after a capabilities-constrained
    SYSLOG_ACTION_OPEN check.

    - /dev/kmsg allows:
      - open if CAP_SYSLOG or dmesg_restrict==0
      - reading/polling, after open
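    The permission rules listed above can be written as two predicates. This
    is a hypothetical encoding for illustration, not the kernel's
    check_syslog_permissions(); ACT_CLEAR stands for SYSLOG_ACTION_CLEAR as an
    example of "anything else".

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical subset of syslog actions. */
enum syslog_action_sketch { ACT_READ_ALL, ACT_SIZE_BUFFER, ACT_CLEAR };

/* syslog syscall: anything with CAP_SYSLOG; otherwise only the two
 * non-destructive actions, and only when dmesg_restrict == 0. */
static bool syslog_call_allowed(bool cap_syslog, int dmesg_restrict,
				enum syslog_action_sketch act)
{
	if (cap_syslog)
		return true;
	if (dmesg_restrict == 0 &&
	    (act == ACT_READ_ALL || act == ACT_SIZE_BUFFER))
		return true;
	return false;	/* EPERM */
}

/* /dev/kmsg: open needs CAP_SYSLOG or dmesg_restrict == 0; reading and
 * polling are then allowed. */
static bool devkmsg_open_allowed(bool cap_syslog, int dmesg_restrict)
{
	return cap_syslog || dmesg_restrict == 0;
}
```

    Because /dev/kmsg only supports reading, its open check can reuse the
    same condition that guards the non-destructive syscall actions.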

    Addresses https://bugzilla.redhat.com/show_bug.cgi?id=903192

    [akpm@linux-foundation.org: use pr_warn_once()]
    Signed-off-by: Kees Cook
    Reported-by: Christian Kujau
    Tested-by: Josh Boyer
    Cc: Kay Sievers
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kees Cook
     
  • We recently noticed that rebooting a 1024 cpu machine spends approx 16
    minutes just stopping the cpus. The slowdown was tracked to commit
    f96972f2dc63 ("kernel/sys.c: call disable_nonboot_cpus() in
    kernel_restart()").

    The current implementation does all the work of hot removing the cpus
    before halting the system. We are switching to just migrating to the
    boot cpu and then continuing with shutdown/reboot.

    This also has the effect of not breaking x86's command line parameter
    for specifying the reboot cpu. Note, this code was shamelessly copied
    from arch/x86/kernel/reboot.c with bits removed pertaining to the
    reboot_cpu command line parameter.

    Signed-off-by: Robin Holt
    Tested-by: Shawn Guo
    Cc: "Srivatsa S. Bhat"
    Cc: H. Peter Anvin
    Cc: Thomas Gleixner
    Cc: Ingo Molnar
    Cc: Russ Anderson
    Cc: Robin Holt
    Cc: Russell King
    Cc: Guan Xuetao
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Robin Holt
     
  • There are instances in the kernel where we would like to disable CPU
    hotplug (from sysfs) during some important operation. Today the freezer
    code depends on this and the code to do it was kinda tailor-made for
    that.

    Restructure the code and make it generic enough to be useful for other
    usecases too.

    Signed-off-by: Srivatsa S. Bhat
    Signed-off-by: Robin Holt
    Cc: H. Peter Anvin
    Cc: Thomas Gleixner
    Cc: Ingo Molnar
    Cc: Russ Anderson
    Cc: Robin Holt
    Cc: Russell King
    Cc: Guan Xuetao
    Cc: Shawn Guo
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Srivatsa S. Bhat
     

12 Jun, 2013

2 commits

  • Pull NVMe fixes from Matthew Wilcox.

    * 'fixes-3.10' of git://git.infradead.org/users/willy/linux-nvme:
    NVMe: Add MSI support
    NVMe: Use dma_set_mask() correctly
    Return the result from user admin command IOCTL even in case of failure
    NVMe: Do not cancel command multiple times
    NVMe: fix error return code in nvme_submit_bio_queue()
    NVMe: check for integer overflow in nvme_map_user_pages()
    MAINTAINERS: update NVM EXPRESS DRIVER file list
    NVMe: Fix a signedness bug in nvme_trans_modesel_get_mp
    NVMe: Remove redundant version.h header include

    Linus Torvalds
     
  • Pull kvm bugfixes from Gleb Natapov:
    "There is one more fix for MIPS KVM ABI here, MIPS and PPC build
    breakage fixes and a couple of PPC bug fixes"

    * 'fixes' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
    kvm/ppc/booke64: Fix lazy ee handling in kvmppc_handle_exit()
    kvm/ppc/booke: Hold srcu lock when calling gfn functions
    kvm/ppc/booke64: Disable e6500 support
    kvm/ppc/booke64: Fix AltiVec interrupt numbers and build breakage
    mips/kvm: Use KVM_REG_MIPS and proper size indicators for *_ONE_REG
    kvm: Add definition of KVM_REG_MIPS
    KVM: add kvm_para_available to asm-generic/kvm_para.h

    Linus Torvalds
     

11 Jun, 2013

12 commits

  • EE is hard-disabled on entry to kvmppc_handle_exit(), so call
    hard_irq_disable() so that PACA_IRQ_HARD_DIS is set, and soft_enabled
    is unset.

    Without this, we get warnings such as arch/powerpc/kernel/time.c:300,
    and sometimes the host kernel hangs.

    Signed-off-by: Scott Wood
    Signed-off-by: Gleb Natapov

    Scott Wood
     
  • KVM core expects arch code to acquire the srcu lock when calling
    gfn_to_memslot and similar functions.

    Signed-off-by: Scott Wood
    Signed-off-by: Gleb Natapov

    Scott Wood
     
  • The previous patch made 64-bit booke KVM build again, but Altivec
    support is still not complete, and we can't prevent the guest from
    turning on Altivec (which can corrupt host state until state
    save/restore is implemented). Disable e6500 on KVM until this is
    fixed.

    Signed-off-by: Scott Wood
    Signed-off-by: Gleb Natapov

    Scott Wood
     
  • Interrupt numbers defined for Book3E follow the IVOR definitions. Align
    BOOKE_INTERRUPT_ALTIVEC_UNAVAIL and BOOKE_INTERRUPT_ALTIVEC_ASSIST to this
    rule, which also fixes the build breakage. IVORs 32 and 33 are shared,
    so reflect this in the interrupt naming.

    This fixes a build break for 64-bit booke KVM.

    Signed-off-by: Mihai Caraman
    Signed-off-by: Scott Wood
    Signed-off-by: Gleb Natapov

    Mihai Caraman
     
  • The API requires that register identifiers passed to the GET_ONE_REG
    and SET_ONE_REG ioctls encode this extra information (the KVM_REG_MIPS
    architecture value and the register size).

    Signed-off-by: David Daney
    Signed-off-by: Gleb Natapov

    David Daney
     
  • We use 0x7000000000000000ULL as 0x6000000000000000ULL is reserved for
    ARM64.
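    A sketch of how a ONE_REG identifier carries the architecture in its top
    byte and the register size in bits 52..55. The constants are copied
    locally with the values we understand the uapi <linux/kvm.h> header to
    use; register index 5 is an arbitrary illustrative value, and the helper
    functions are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Local copies of ONE_REG encoding constants (see <linux/kvm.h>). */
#define KVM_REG_ARCH_MASK  0xff00000000000000ULL
#define KVM_REG_ARM64      0x6000000000000000ULL
#define KVM_REG_MIPS       0x7000000000000000ULL
#define KVM_REG_SIZE_SHIFT 52
#define KVM_REG_SIZE_MASK  0x00f0000000000000ULL
#define KVM_REG_SIZE_U32   0x0020000000000000ULL
#define KVM_REG_SIZE_U64   0x0030000000000000ULL

/* Build an id: architecture | size field | arch-specific index. */
static uint64_t mips_one_reg_id(uint64_t size_field, uint64_t index)
{
	return KVM_REG_MIPS | size_field | index;
}

static uint64_t one_reg_arch(uint64_t id)
{
	return id & KVM_REG_ARCH_MASK;
}

/* The size field holds log2 of the register size in bytes. */
static unsigned one_reg_size_bytes(uint64_t id)
{
	return 1u << ((id & KVM_REG_SIZE_MASK) >> KVM_REG_SIZE_SHIFT);
}
```

    Giving MIPS its own top-byte value keeps its ids distinct from the
    ARM64 range, which is why 0x6000000000000000ULL could not be reused.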

    Signed-off-by: David Daney
    Signed-off-by: Gleb Natapov

    David Daney
     
  • The stop machine logic can lock up if all but one of the migration
    threads make it through the disable-irq step and the one remaining
    thread gets stuck in __do_softirq. The reason __do_softirq can hang is
    that it has a bail-out based on jiffies timeout, but in the lockup case,
    jiffies itself is not incremented.

    To work around this, re-add the max_restart counter in __do_softirq and
    stop processing softirqs after 10 restarts.
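    A simplified model of why a restart counter is needed when the
    time-based bail-out relies on jiffies that never advance. The function
    and constant names are hypothetical, and the real loop also checks
    need_resched() and wakes ksoftirqd, which is omitted here.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_RESTART_SKETCH 10	/* "stop processing after 10 restarts" */

/* Wrap-safe comparison in the style of time_before(a, b). */
static bool time_before_sketch(unsigned long a, unsigned long b)
{
	return (long)(a - b) < 0;
}

/* Work is always re-raised and 'now' (jiffies) never advances. Returns
 * the number of passes, or -1 if the unbounded loop hit the safety cap
 * that stands in for "spins forever". */
static int softirq_passes(unsigned long now, unsigned long end, bool bounded)
{
	int max_restart = MAX_RESTART_SKETCH;
	int passes = 0;

	for (;;) {
		passes++;			/* handle one batch */
		if (!time_before_sketch(now, end))
			break;			/* time-based bail-out */
		if (bounded && --max_restart == 0)
			break;			/* restart-count bail-out */
		if (passes >= 1000000)
			return -1;
	}
	return passes;
}
```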

    Thanks to Tejun Heo and Rusty Russell and others for helping me track
    this down.

    This was introduced in 3.9 by commit c10d73671ad3 ("softirq: reduce
    latencies").

    It may be worth looking into ath9k to see if it has issues with its irq
    handler at a later date.

    The hang stack traces look something like this:

    ------------[ cut here ]------------
    WARNING: at kernel/watchdog.c:245 watchdog_overflow_callback+0x9c/0xa7()
    Watchdog detected hard LOCKUP on cpu 2
    Modules linked in: ath9k ath9k_common ath9k_hw ath mac80211 cfg80211 nfsv4 auth_rpcgss nfs fscache nf_nat_ipv4 nf_nat veth 8021q garp stp mrp llc pktgen lockd sunrpc]
    Pid: 23, comm: migration/2 Tainted: G C 3.9.4+ #11
    Call Trace:
    warn_slowpath_common+0x85/0x9f
    warn_slowpath_fmt+0x46/0x48
    watchdog_overflow_callback+0x9c/0xa7
    __perf_event_overflow+0x137/0x1cb
    perf_event_overflow+0x14/0x16
    intel_pmu_handle_irq+0x2dc/0x359
    perf_event_nmi_handler+0x19/0x1b
    nmi_handle+0x7f/0xc2
    do_nmi+0xbc/0x304
    end_repeat_nmi+0x1e/0x2e
    <>
    cpu_stopper_thread+0xae/0x162
    smpboot_thread_fn+0x258/0x260
    kthread+0xc7/0xcf
    ret_from_fork+0x7c/0xb0
    ---[ end trace 4947dfa9b0a4cec3 ]---
    BUG: soft lockup - CPU#1 stuck for 22s! [migration/1:17]
    Modules linked in: ath9k ath9k_common ath9k_hw ath mac80211 cfg80211 nfsv4 auth_rpcgss nfs fscache nf_nat_ipv4 nf_nat veth 8021q garp stp mrp llc pktgen lockd sunrpc]
    irq event stamp: 835637905
    hardirqs last enabled at (835637904): __do_softirq+0x9f/0x257
    hardirqs last disabled at (835637905): apic_timer_interrupt+0x6d/0x80
    softirqs last enabled at (5654720): __do_softirq+0x1ff/0x257
    softirqs last disabled at (5654725): irq_exit+0x5f/0xbb
    CPU 1
    Pid: 17, comm: migration/1 Tainted: G WC 3.9.4+ #11 To be filled by O.E.M. To be filled by O.E.M./To be filled by O.E.M.
    RIP: tasklet_hi_action+0xf0/0xf0
    Process migration/1
    Call Trace:

    __do_softirq+0x117/0x257
    irq_exit+0x5f/0xbb
    smp_apic_timer_interrupt+0x8a/0x98
    apic_timer_interrupt+0x72/0x80

    printk+0x4d/0x4f
    stop_machine_cpu_stop+0x22c/0x274
    cpu_stopper_thread+0xae/0x162
    smpboot_thread_fn+0x258/0x260
    kthread+0xc7/0xcf
    ret_from_fork+0x7c/0xb0

    Signed-off-by: Ben Greear
    Acked-by: Tejun Heo
    Acked-by: Pekka Riikonen
    Cc: Eric Dumazet
    Cc: stable@kernel.org
    Signed-off-by: Linus Torvalds

    Ben Greear
     
  • Pull net/9p bug fix from Eric Van Hensbergen:
    "zero copy error fix"

    * tag '9p-3.10-bug-fix-1' of git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs:
    net/9p: Handle error in zero copy request correctly for 9p2000.u

    Linus Torvalds
     
  • Pull spi fixes from Mark Brown:
    "A few nasty issues, particularly a race with the interrupt controller
    in the xilinx driver, together with a couple of more minor fixes and a
    much needed move of the mailing list away from sourceforge."

    * tag 'spi-v3.10-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi:
    spi: hspi: fixup long delay time
    spi: spi-xilinx: Remove ISR race condition
    spi: topcliff-pch: fix error return code in pch_spi_probe()
    spi: topcliff-pch: Pass correct pointer to free_irq()
    spi: Move mailing list to vger

    Linus Torvalds
     
  • …kernel/git/konrad/xen

    Pull xen fixes from Konrad Rzeszutek Wilk:
    "Two bug-fixes for regressions:
    - xen/tmem stopped working after a certain combination of
    modprobe/swapon was used
    - cpu online/offlining would trigger WARN_ON."

    * tag 'stable/for-linus-3.10-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen:
    xen/tmem: Don't over-write tmem_frontswap_poolid after tmem_frontswap_init set it.
    xen/smp: Fixup NOHZ per cpu data when onlining an offline CPU.

    Linus Torvalds
     
  • Pull regmap fixes from Mark Brown:
    "The biggest fix here is Lars-Peter's fix for custom locking callbacks
    which is pretty localised but important for those devices that use the
    feature. Otherwise we've got a couple of fairly small cleanups which
    would have been sent sooner were it not for letting Lars-Peter's patch
    soak for a while"

    * tag 'regmap-v3.10-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap:
    regmap: rbtree: Fixed node range check on sync
    regmap: regcache: Fixup locking for custom lock callbacks
    regmap: debugfs: Check return value of regmap_write()

    Linus Torvalds
     
  • Pull crypto fixes from Herbert Xu:
    "This fixes a build problem in sahara and temporarily disables two new
    optimisations because of performance regressions until a permanent fix
    is ready"

    * git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
    crypto: sahara - fix building as module
    crypto: blowfish - disable AVX2 implementation
    crypto: twofish - disable AVX2 implementation

    Linus Torvalds
     

10 Jun, 2013

1 commit

  • Commit 10a7a0771399a57a297fca9615450dbb3f88081a ("xen: tmem: enable Xen
    tmem shim to be built/loaded as a module") allows the tmem module
    to be loaded at any time. For this to work, the frontswap API had to
    be able to call tmem_frontswap_init asynchronously, before or after
    the swap image had been set. That was added in commit
    905cd0e1bf9ffe82d6906a01fd974ea0f70be97a
    ("mm: frontswap: lazy initialization to allow tmem backends to build/run as modules").

    This means we could do this (the common case):

    modprobe tmem [so calls frontswap_register_ops, no ->init]
    modifies tmem_frontswap_poolid = -1
    swapon /dev/xvda1 [__frontswap_init, calls -> init, tmem_frontswap_poolid is
    < 0 so tmem hypercall done]

    Or the failing one:

    swapon /dev/xvda1 [calls __frontswap_init, sets the need_init bitmap]
    modprobe tmem [calls frontswap_register_ops, -->init calls, finds out
    tmem_frontswap_poolid is 0, does not make a hypercall.
    Later in the module_init, sets tmem_frontswap_poolid=-1]

    As a result, in the failing case we would never make the hypercall to
    initialize the pool and could never make any frontswap backend calls.

    Moving the frontswap_register_ops() call after setting
    tmem_frontswap_poolid fixes it.
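    A single-threaded model of the two orderings above. Every name here is
    hypothetical and only echoes the message; the model assumes, as
    described, that ->init creates a pool only when the pool id is negative
    and that registering ops runs ->init at once if swapon already happened.

```c
#include <assert.h>
#include <stdbool.h>

static int poolid;		/* tmem_frontswap_poolid: 0 until set */
static bool ops_registered;	/* frontswap_register_ops() has run */
static bool swap_active;	/* swapon already ran __frontswap_init */
static int hypercalls;		/* pool-creation hypercalls issued */

/* The backend's ->init: create a pool only when poolid < 0. */
static void tmem_init_hook(void)
{
	if (poolid < 0) {
		hypercalls++;
		poolid = 1;	/* hypothetical id from the hypervisor */
	}
}

/* Registering ops runs ->init right away if swap is already active. */
static void register_ops(void)
{
	ops_registered = true;
	if (swap_active)
		tmem_init_hook();
}

static void swapon_sketch(void)
{
	swap_active = true;
	if (ops_registered)
		tmem_init_hook();
}

static void module_load(bool fixed)
{
	if (fixed) {
		poolid = -1;	/* set the sentinel first ... */
		register_ops();	/* ... so an immediate ->init sees it */
	} else {
		register_ops();	/* ->init may run while poolid is still 0 */
		poolid = -1;
	}
}

static void reset(bool swap_first)
{
	poolid = 0;
	ops_registered = false;
	swap_active = swap_first;
	hypercalls = 0;
}
```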

    Signed-off-by: Konrad Rzeszutek Wilk
    Reviewed-by: Bob Liu

    Konrad Rzeszutek Wilk