24 Sep, 2010

13 commits


23 Sep, 2010

27 commits

  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu:
    percpu: fix pcpu_last_unit_cpu

    Linus Torvalds
     
  • When we reboot, we disable vmx extensions or otherwise INIT gets blocked.
    If a task on another cpu hits a vmx instruction, it will fault if vmx is
    disabled. We trap that to avoid a nasty oops and spin until the reboot
    completes.

    Problem is, we sleep with interrupts disabled. This blocks smp_send_stop()
    from running, and the reboot process halts.

    Fix by enabling interrupts before spinning.

    KVM-Stable-Tag.
    Signed-off-by: Avi Kivity
    Signed-off-by: Marcelo Tosatti

    Avi Kivity
     
  • I think I see the following (theoretical) race:

    During irqfd assign, we drop irqfds lock before we
    schedule inject work. Therefore, deassign running
    on another CPU could cause shutdown and flush to run
    before inject, causing use after free in inject.

    A simple fix is to schedule inject under the lock.

    Signed-off-by: Michael S. Tsirkin
    Acked-by: Gregory Haskins
    Signed-off-by: Marcelo Tosatti

    Michael S. Tsirkin
     
  • When modprobe.conf has
    options ipmi_si type="kcs" ports=0xCA2 regspacings="4"

    ipmi_si loads properly, but trying to unload it gives:

    Sep 20 15:00:27 xx abrt: Kerneloops: Reported 1 kernel oopses to Abrt
    Sep 20 15:00:27 xx abrtd: Directory 'kerneloops-1285020027-1' creation detected
    Sep 20 15:00:27 xx abrtd: New crash /var/spool/abrt/kerneloops-1285020027-1, processing
    Sep 20 15:01:09 xx kernel: ------------[ cut here ]------------
    Sep 20 15:01:09 xx kernel: WARNING: at drivers/base/driver.c:262 driver_unregister+0x8a/0xa0()
    Sep 20 15:01:09 xx kernel: Hardware name: Sun Fire x4800
    Sep 20 15:01:09 xx kernel: Unexpected driver unregister!
    Sep 20 15:01:09 xx kernel: Modules linked in: ipmi_si(-) ipmi_msghandler ip6table_filter ip6_tables ebtable_nat ebtables ipt_MASQUERADE iptable_nat nf_nat bridge stp llc autofs4 sunrpc cpufreq_ondemand acpi_cpufreq freq_table mperf xt_physdev be2iscsi iscsi_boot_sysfs bnx2i cnic uio cxgb3i iw_cxgb3 cxgb3 mdio ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr ipv6 iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi dm_mirror dm_region_hash dm_log dm_mod vhost_net macvtap macvlan tun kvm_intel kvm uinput sg ses enclosure ahci libahci pcspkr i2c_i801 i2c_core iTCO_wdt iTCO_vendor_support igb dca i7core_edac edac_core ext3 jbd mbcache sd_mod crc_t10dif megaraid_sas [last unloaded: ipmi_devintf]
    Sep 20 15:01:09 xx kernel: Pid: 10625, comm: modprobe Tainted: G W 2.6.36-rc5-tip+ #6
    Sep 20 15:01:09 xx kernel: Call Trace:
    Sep 20 15:01:09 xx kernel: [] warn_slowpath_common+0x7f/0xc0
    Sep 20 15:01:09 xx kernel: [] warn_slowpath_fmt+0x46/0x50
    Sep 20 15:01:09 xx kernel: [] driver_unregister+0x8a/0xa0
    Sep 20 15:01:09 xx kernel: [] pnp_unregister_driver+0x12/0x20
    Sep 20 15:01:09 xx kernel: [] cleanup_ipmi_si+0x3c/0xa7 [ipmi_si]
    Sep 20 15:01:09 xx kernel: [] sys_delete_module+0x1a0/0x270
    Sep 20 15:01:09 xx kernel: [] ? do_page_fault+0x150/0x320
    Sep 20 15:01:09 xx kernel: [] system_call_fastpath+0x16/0x1b
    Sep 20 15:01:09 xx kernel: ---[ end trace 0d1967161adcee0d ]---

    We need to check if ipmi_pnp_driver is loaded before we try to unload it.

    Signed-off-by: Yinghai Lu
    Cc: Corey Minyard
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Yinghai Lu
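
The unload fix described above ("check if ipmi_pnp_driver is loaded before we try to unload it") can be modelled in user space as a registration flag. This is only a sketch: `fake_register_driver()`, `fake_unregister_driver()`, `module_init_sketch()` and `module_exit_sketch()` are hypothetical names, and the idea that explicit `ports=` options cause init to skip PNP registration is an assumption made for illustration, not something the commit message states.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the driver core's register/unregister calls. */
static int registered_drivers;   /* number of drivers the fake core knows */

static int fake_register_driver(void)
{
    registered_drivers++;
    return 0;
}

static void fake_unregister_driver(void)
{
    registered_drivers--;
}

/* Records whether init really registered the PNP driver. */
static bool pnp_registered;

void module_init_sketch(bool use_hardcoded_ports)
{
    /* Assumption: hardcoded ports mean the PNP path is never taken. */
    if (!use_hardcoded_ports && fake_register_driver() == 0)
        pnp_registered = true;
}

void module_exit_sketch(void)
{
    /* Only undo what init actually did. */
    if (pnp_registered) {
        fake_unregister_driver();
        pnp_registered = false;
    }
}
```

Calling `module_init_sketch(true)` followed by `module_exit_sketch()` leaves the fake core untouched, which is the case that triggered the "Unexpected driver unregister!" warning in the log.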
     
  • This change resolves a problem about unbalanced calls of
    enable_irq_wakeup() and disable_irq_wakeup() for alarm interrupt.

    Bug reproduction:

    root@eb600:~# echo 0 > /sys/class/rtc/rtc0/wakealarm

    WARNING: at kernel/irq/manage.c:361 set_irq_wake+0x7c/0xe4()
    Unbalanced IRQ 46 wake disable
    Modules linked in:
    [] (unwind_backtrace+0x0/0xd8) from [] (warn_slowpath_common+0x44/0x5c)
    [] (warn_slowpath_common+0x44/0x5c) from [] (warn_slowpath_fmt+0x24/0x30)
    [] (warn_slowpath_fmt+0x24/0x30) from [] (set_irq_wake+0x7c/0xe4)
    [] (set_irq_wake+0x7c/0xe4) from [] (s3c_rtc_setalarm+0xa8/0xb8)
    [] (s3c_rtc_setalarm+0xa8/0xb8) from [] (rtc_set_alarm+0x60/0x74)
    [] (rtc_set_alarm+0x60/0x74) from [] (rtc_sysfs_set_wakealarm+0xc8/0xd8)
    [] (rtc_sysfs_set_wakealarm+0xc8/0xd8) from [] (dev_attr_store+0x20/0x24)
    [] (dev_attr_store+0x20/0x24) from [] (sysfs_write_file+0x104/0x13c)
    [] (sysfs_write_file+0x104/0x13c) from [] (vfs_write+0xb0/0x158)
    [] (vfs_write+0xb0/0x158) from [] (sys_write+0x3c/0x68)
    [] (sys_write+0x3c/0x68) from [] (ret_fast_syscall+0x0/0x28)

    Signed-off-by: Vladimir Zapolskiy
    Cc: Alessandro Zummo
    Cc: Ben Dooks
    Cc: Atul Dahiya
    Cc: Taekgyun Ko
    Cc: Kukjin Kim
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vladimir Zapolskiy
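
A minimal user-space sketch of how wake enable/disable calls can be kept balanced. It does not reproduce the actual driver change; it only shows one common pattern, where the driver remembers what it last requested. All names (`fake_enable_irq_wake()`, `fake_disable_irq_wake()`, `set_alarm_wake()`) are hypothetical.

```c
#include <stdbool.h>

static int wake_depth;   /* models the IRQ core's wake reference count */
static int warnings;     /* counts "Unbalanced IRQ wake disable" events */

static void fake_enable_irq_wake(void)
{
    wake_depth++;
}

static void fake_disable_irq_wake(void)
{
    if (wake_depth == 0) {
        warnings++;      /* disable without a matching enable */
        return;
    }
    wake_depth--;
}

/* Driver-side record of the state it last requested. */
static bool wake_enabled;

void set_alarm_wake(bool enabled)
{
    if (enabled == wake_enabled)
        return;          /* nothing to change, so no unbalanced call */

    if (enabled)
        fake_enable_irq_wake();
    else
        fake_disable_irq_wake();

    wake_enabled = enabled;
}
```

With this guard, disabling the alarm first (the `echo 0 > wakealarm` case from the reproduction) is a no-op instead of an unbalanced disable.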
     
  • If __split_vma fails because of an out-of-memory condition, the
    anon_vma_chain isn't torn down and freed, potentially leading to rmap
    walks accessing freed vma information, plus there's a memory leak.

    Signed-off-by: Andrea Arcangeli
    Acked-by: Johannes Weiner
    Acked-by: Rik van Riel
    Acked-by: Hugh Dickins
    Cc: Marcelo Tosatti
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrea Arcangeli
     
  • The below bug in fork led to the rmap walk finding the parent huge-pmd
    twice instead of just once, because the anon_vma_chain objects of the
    child vma still point to the vma->vm_mm of the parent.

    The patch fixes it by making the rmap walk accurate during fork. It's not
    a big deal normally but it's worth being accurate considering the cost is
    the same.

    Signed-off-by: Andrea Arcangeli
    Acked-by: Johannes Weiner
    Acked-by: Rik van Riel
    Acked-by: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrea Arcangeli
     
  • drivers/pci/intel-iommu.c: In function `__iommu_calculate_agaw':
    drivers/pci/intel-iommu.c:437: sorry, unimplemented: inlining failed in call to 'width_to_agaw': function body not available
    drivers/pci/intel-iommu.c:445: sorry, unimplemented: called from here

    Move the offending function (and its siblings) to top-of-file, remove the
    forward declaration.

    Addresses https://bugzilla.kernel.org/show_bug.cgi?id=17441

    Reported-by: Martin Mokrejs
    Cc: David Woodhouse
    Cc: Jesse Barnes
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
     
  • /proc/sys/vm/oom_dump_tasks is enabled by default, so it's necessary to
    limit the information it emits as much as possible.

    The tasklist dump should be filtered to only those tasks that are eligible
    for oom kill. This is already done for memcg ooms, but this patch extends
    it to both cpuset and mempolicy ooms as well as init.

    In addition to suppressing irrelevant information, this also reduces
    confusion since users currently don't know which tasks in the tasklist
    aren't eligible for kill (such as those attached to cpusets or bound to
    mempolicies with a disjoint set of mems or nodes, respectively) since that
    information is not shown.

    Signed-off-by: David Rientjes
    Reviewed-by: KOSAKI Motohiro
    Cc: KAMEZAWA Hiroyuki
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Rientjes
     
  • The FBIOGET_VBLANK device ioctl allows unprivileged users to read 16 bytes
    of uninitialized stack memory, because the "reserved" member of the
    fb_vblank struct declared on the stack is not altered or zeroed before
    being copied back to the user. This patch takes care of it.

    Signed-off-by: Dan Rosenberg
    Cc: Thomas Winischhofer
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dan Rosenberg
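
The information leak described above comes from copying a partially filled stack struct to the caller. A user-space sketch of the usual remedy, zeroing the whole struct before filling it, is shown below. `struct vblank_info` and `fill_vblank()` are hypothetical; the layout (four 32-bit fields followed by a 16-byte reserved area) is assumed for illustration, and `memcpy()` stands in for the copy to user space.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical layout: four counters plus 16 reserved bytes. */
struct vblank_info {
    uint32_t flags;
    uint32_t count;
    uint32_t vcount;
    uint32_t hcount;
    uint32_t reserved[4];
};

void fill_vblank(struct vblank_info *out, uint32_t count)
{
    struct vblank_info v;

    /* Zero everything first so no stale stack bytes survive. */
    memset(&v, 0, sizeof(v));
    v.count = count;

    /* Stand-in for copy_to_user(): the whole struct goes out. */
    memcpy(out, &v, sizeof(v));
}
```

Without the `memset()`, `reserved` and the unset counters would carry whatever happened to be on the stack.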
     
  • This fixes:
    incompatible pointer type: => 89
    arch/um/kernel/exec.c: warning: passing argument 2 of 'execve1' from
    incompatible pointer type: => 69, 85
    arch/um/kernel/exec.c: warning: passing argument 3 of 'execve1' from
    incompatible pointer type: => 69, 85

    which was introduced by d7627467b7a8d ("Make do_execve() take a const
    filename pointer")

    Signed-off-by: Richard Weinberger
    Cc: David Howells
    Cc: Jeff Dike
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Richard Weinberger
     
  • Currently, /proc/<pid>/smaps has wrong dirty pages accounting.
    Shared_Dirty and Private_Dirty count only pte-dirty pages and ignore the
    PG_dirty page flag. This differs from the documentation and is also
    inconsistent with the Referenced field (Referenced checks both pte and
    page flags).

    This patch fixes it.

    Test program:

    large-array.c
    ---------------------------------------------------
    #include <string.h>   /* memset() */
    #include <unistd.h>   /* pause() */

    char array[1*1024*1024*1024L];

    int main(void)
    {
            memset(array, 1, sizeof(array));
            pause();

            return 0;
    }
    ---------------------------------------------------

    Test case:
    1. run ./large-array
    2. cat /proc/`pidof large-array`/smaps
    3. swapoff -a
    4. cat /proc/`pidof large-array`/smaps again

    Test result:

    00601000-40601000 rw-p 00000000 00:00 0
    Size: 1048576 kB
    Rss: 1048576 kB
    Pss: 1048576 kB
    Shared_Clean: 0 kB
    Shared_Dirty: 0 kB
    Private_Clean: 218992 kB

    00601000-40601000 rw-p 00000000 00:00 0
    Size: 1048576 kB
    Rss: 1048576 kB
    Pss: 1048576 kB
    Shared_Clean: 0 kB
    Shared_Dirty: 0 kB
    Private_Clean: 0 kB
    Private_Dirty: 1048576 kB

    Acked-by: Hugh Dickins
    Cc: Matt Mackall
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KOSAKI Motohiro
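
The accounting rule described above (a page counts as dirty if either the pte or the page flag says so) can be sketched as follows. This is a simplified model covering only private pages; `struct smaps_counts` and `account_private_page()` are hypothetical names, not kernel symbols.

```c
#include <stdbool.h>

/* Hypothetical per-mapping totals, in kB. */
struct smaps_counts {
    unsigned long private_clean_kb;
    unsigned long private_dirty_kb;
};

void account_private_page(struct smaps_counts *c, bool pte_dirty,
                          bool page_dirty, unsigned long size_kb)
{
    /* Dirty if either source reports it, mirroring how Referenced
     * is described as checking both pte and page flags. */
    if (pte_dirty || page_dirty)
        c->private_dirty_kb += size_kb;
    else
        c->private_clean_kb += size_kb;
}
```

A page whose pte is clean but whose page flag is dirty lands in `private_dirty_kb`, the case the old accounting reported as clean.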
     
  • Fix the lockdep warning:

    [ 13.657164] INFO: trying to register non-static key.
    [ 13.657169] the code is fine but needs lockdep annotation.
    [ 13.657171] turning off the locking correctness validator.
    [ 13.657177] Pid: 622, comm: modprobe Not tainted 2.6.36-rc3c #8
    [ 13.657180] Call Trace:
    [ 13.657194] [] ? printk+0x18/0x20
    [ 13.657202] [] register_lock_class+0x336/0x350
    [ 13.657208] [] __lock_acquire+0x449/0x1180
    [ 13.657215] [] lock_acquire+0x67/0x80
    [ 13.657222] [] ? __cancel_work_timer+0x51/0x230
    [ 13.657227] [] __cancel_work_timer+0x83/0x230
    [ 13.657231] [] ? __cancel_work_timer+0x51/0x230
    [ 13.657236] [] ? mark_held_locks+0x62/0x80
    [ 13.657243] [] ? kfree+0x7f/0xe0
    [ 13.657248] [] ? trace_hardirqs_on_caller+0x11c/0x160
    [ 13.657253] [] ? trace_hardirqs_on+0xb/0x10
    [ 13.657259] [] ? fbcon_deinit+0x16d/0x1e0
    [ 13.657263] [] ? fbcon_deinit+0x16d/0x1e0
    [ 13.657268] [] cancel_work_sync+0xa/0x10
    [ 13.657272] [] fbcon_deinit+0xe4/0x1e0
    ...

    The warning is caused by trying to cancel an uninitialized work from
    fbcon_exit(). Fix it by adding a check for queue.func, similarly to other
    places in this code.

    Signed-off-by: Jarek Poplawski
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jarek Poplawski
     
  • Enable the EFI framebuffer on 14 more Macs, including the iMac11,1
    iMac10,1 iMac8,1 Macmini3,1 Macmini4,1 MacBook5,1 MacBook6,1 MacBook7,1
    MacBookPro2,2 MacBookPro5,2 MacBookPro5,3 MacBookPro6,1 MacBookPro6,2 and
    MacBookPro7,1

    Information gathered from various user submissions.

    https://bugzilla.redhat.com/show_bug.cgi?id=528232
    http://ubuntuforums.org/showthread.php?t=1557326

    [akpm@linux-foundation.org: coding-style fixes]
    Signed-off-by: Luke Macken
    Signed-off-by: Peter Jones
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Luke Macken
     
  • Some Apple machines have identical DMI data but different memory
    configurations for the video. Given that, check that the address in our
    table is actually within the range of a PCI BAR on a VGA device in the
    machine.

    This also fixes up the return value from set_system(), which has always
    been wrong, but never resulted in bad behavior since there's only ever
    been one matching entry in the dmi table.

    The patch

    1) stops people's machines from crashing when we get their display wrong,
    which seems to be unfortunately inevitable,

    2) allows us to support identical dmi data with differing video memory
    configurations

    This also adds me as the efifb maintainer, since I've effectively been
    acting as such for quite some time.

    Signed-off-by: Peter Jones
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Peter Jones
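
The check described above, "the address in our table is actually within the range of a PCI BAR", amounts to a range test against each BAR. A minimal sketch, with a hypothetical `struct bar_range` and `address_in_any_bar()` standing in for whatever the driver iterates over:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one BAR: start address and length. */
struct bar_range {
    uint64_t start;
    uint64_t len;
};

bool address_in_any_bar(uint64_t addr, const struct bar_range *bars, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (bars[i].len == 0)
            continue;                       /* unused BAR */
        /* Subtract instead of adding to avoid start + len overflow. */
        if (addr >= bars[i].start && addr - bars[i].start < bars[i].len)
            return true;
    }
    return false;
}
```

A table entry whose framebuffer address fails this test for every BAR of the VGA device would simply be skipped rather than trusted.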
     
  • OCFS2 can return ERESTARTSYS from its write function when the process is
    signalled while waiting for a cluster lock (and the filesystem is mounted
    with intr mount option). Generally, it seems reasonable to allow
    filesystems to return this error code from its IO functions. As we must
    not leak ERESTARTSYS (and similar error codes) to userspace as a result of
    an AIO operation, we have to properly convert it to EINTR inside AIO code
    (restarting the syscall isn't really an option because other AIO could
    have been already submitted by the same io_submit syscall).

    Signed-off-by: Jan Kara
    Reviewed-by: Jeff Moyer
    Cc: Christoph Hellwig
    Cc: Zach Brown
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jan Kara
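
The conversion described above can be sketched as a small helper. The numeric values below are those the kernel uses internally for its restart codes (the message itself notes that 512 == ERESTARTSYS elsewhere in this log); the `K_` prefixed names and `aio_sanitize_result()` are hypothetical, chosen so the sketch compiles in user space where these codes are not defined.

```c
#include <errno.h>

/* Kernel-internal restart codes, renamed to avoid clashing with libc. */
enum {
    K_ERESTARTSYS           = 512,
    K_ERESTARTNOINTR        = 513,
    K_ERESTARTNOHAND        = 514,
    K_ERESTART_RESTARTBLOCK = 516
};

long aio_sanitize_result(long ret)
{
    /* An AIO request cannot be restarted, so report EINTR instead
     * of leaking a restart code to user space. */
    switch (ret) {
    case -K_ERESTARTSYS:
    case -K_ERESTARTNOINTR:
    case -K_ERESTARTNOHAND:
    case -K_ERESTART_RESTARTBLOCK:
        return -EINTR;
    default:
        return ret;
    }
}
```

Successful byte counts and ordinary errors such as `-EIO` pass through unchanged.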
     
  • M. Vefa Bicakci reported that a 2.6.35 kernel hangs during hibernation
    on his 32-bit machine with 3GB of memory
    (https://bugzilla.kernel.org/show_bug.cgi?id=16771). He also bisected
    the regression to

    commit bb21c7ce18eff8e6e7877ca1d06c6db719376e3c
    Author: KOSAKI Motohiro
    Date: Fri Jun 4 14:15:05 2010 -0700

    vmscan: fix do_try_to_free_pages() return value when priority==0 reclaim failure

    At first glance, this seemed very strange because the above commit
    only changed a function return value, and hibernate_preallocate_memory()
    ignores the return value of shrink_all_memory(). But it is related.

    Now, page allocation from hibernation code may enter an infinite loop if
    the system has highmem. The reason is that vmscan doesn't handle the OOM
    case properly when oom_killer_disabled is set.

    The problem sequence is as follows:

    1. hibernation
    2. oom_disable
    3. alloc_pages
    4. do_try_to_free_pages
       if (scanning_global_lru(sc) && !all_unreclaimable)
               return 1;

    If kswapd is not frozen, it would set zone->all_unreclaimable to 1, and
    then shrink_zones may return true (i.e., all_unreclaimable is true). So
    eventually alloc_pages could go to _nopage_, in which case there would be
    no problem.

    This patch adds an all_unreclaimable check to the direct reclaim path,
    too. It takes care of the hibernation OOM case and helps bail out of the
    all_unreclaimable case slightly.

    Signed-off-by: KOSAKI Motohiro
    Signed-off-by: Minchan Kim
    Reported-by: M. Vefa Bicakci
    Reported-by:
    Reviewed-by: Johannes Weiner
    Tested-by:
    Acked-by: Rafael J. Wysocki
    Acked-by: Rik van Riel
    Acked-by: KAMEZAWA Hiroyuki
    Cc: Balbir Singh
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Minchan Kim
     
  • Otherwise, calling platform_get_drvdata() in ab3100_rtc_remove() returns
    NULL.

    Signed-off-by: Axel Lin
    Acked-by: Wan ZongShun
    Acked-by: Linus Walleij
    Cc: Alessandro Zummo
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Axel Lin
     
  • Alter the maintainer of the AVR32 architecture and the AVR32/AT32AP
    machine support to me. Haavard is moving on to new challenges, and we've
    found it better to transfer the maintainer part to me. I will have good
    contact with Haavard anyway.

    Signed-off-by: Hans-Christian Egtvedt
    Acked-by: Haavard Skinnemoen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hans-Christian Egtvedt
     
  • In an effort to minimize customer confusion we want to unify naming
    convention for VMware-provided kernel modules. This change renames the
    balloon driver from vmware_balloon to vmw_balloon.

    We expect to follow this naming convention (vmw_) for all
    modules that are part of mainline kernel and/or being distributed by
    VMware, with the sole exception of vmxnet3 driver (since the name of
    mainline driver happens to match with the name used in VMware Tools).

    Signed-off-by: Dmitry Torokhov
    Acked-by: Bhavesh Davda
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dmitry Torokhov
     
  • This fixes the regression caused by the commit 6fee48cd330c68
    ("dma-mapping: arm: use generic pci_set_dma_mask and
    pci_set_consistent_dma_mask").

    ARM needs to clip the dma coherent mask for dmabounce devices. This
    restores the old trick.

    Note that strictly speaking, the DMA API doesn't allow architectures to do
    this, but I'm not sure it's worth adding a new API to set the dma mask
    that allows architectures to clip it.

    Reported-by: Krzysztof Halasa
    Signed-off-by: FUJITA Tomonori
    Acked-by: Russell King
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    FUJITA Tomonori
     
  • Commit 73296bc611 ("procfs: Use generic_file_llseek in /proc/vmcore")
    broke seeking on /proc/vmcore. This changes it back to use default_llseek
    in order to restore the original behaviour.

    The problem with generic_file_llseek is that it only allows seeks up to
    inode->i_sb->s_maxbytes, which is zero on procfs and some other virtual
    file systems. We should merge generic_file_llseek and default_llseek some
    day and clean this up in a proper way, but for 2.6.35/36, reverting vmcore
    is the safer solution.

    Signed-off-by: Arnd Bergmann
    Cc: Frederic Weisbecker
    Reported-by: CAI Qian
    Tested-by: CAI Qian
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Arnd Bergmann
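
Why a zero `s_maxbytes` breaks seeking can be seen in a simplified model of the two behaviours described above. Only absolute (SEEK_SET-style) offsets are modelled, and `bounded_seek()` and `unbounded_seek()` are hypothetical names standing in for the bounded and unbounded variants; the real functions take a file and handle more cases.

```c
#include <errno.h>

/* Bounded variant: rejects any offset beyond the filesystem limit. */
long long bounded_seek(long long offset, long long maxbytes)
{
    if (offset < 0 || offset > maxbytes)
        return -EINVAL;
    return offset;
}

/* Unbounded variant: only negative offsets are rejected. */
long long unbounded_seek(long long offset)
{
    if (offset < 0)
        return -EINVAL;
    return offset;
}
```

With `maxbytes` equal to zero, as the message says is the case on procfs, `bounded_seek()` accepts only offset 0, so any real seek into /proc/vmcore fails, while `unbounded_seek()` keeps working.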
     
  • After d9e1b6c45059ccf ("ipmi: fix ACPI detection with regspacing") we get

    [ 11.026326] ipmi_si: probing via ACPI
    [ 11.030019] ipmi_si 00:09: (null) regsize 1 spacing 1 irq 0
    [ 11.035594] ipmi_si: Adding ACPI-specified kcs state machine

    on an old system with only one range for ipmi kcs.

    Try to fix it by adding another res pointer.

    Signed-off-by: Yinghai Lu
    Signed-off-by: Corey Minyard
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Yinghai Lu
     
  • A task's badness score is roughly a proportion of its rss and swap
    compared to the system's capacity. The scale ranges from 0 to 1000 with
    the highest score chosen for kill. Thus, this scale operates on a
    resolution of 0.1% of RAM + swap. Admin tasks are also given a 3% bonus,
    so the badness score of an admin task using 3% of memory, for example,
    would still be 0.

    It's possible that an exceptionally large number of tasks will combine to
    exhaust all resources but never have a single task that uses more than
    0.1% of RAM and swap (or 3.0% for admin tasks).

    This patch ensures that the badness score of any eligible task is never 0
    so the machine doesn't unnecessarily panic because it cannot find a task
    to kill.

    Signed-off-by: David Rientjes
    Cc: Dave Hansen
    Cc: Nitin Gupta
    Cc: Pekka Enberg
    Cc: Minchan Kim
    Cc: KAMEZAWA Hiroyuki
    Cc: KOSAKI Motohiro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Rientjes
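
The arithmetic described above can be sketched in a few lines. This is a deliberately simplified model: `badness_score()` is a hypothetical function that takes the task's page usage and the system capacity directly, applies the 0..1000 scale and the 3% (30 point) admin bonus from the message, and never returns 0 for an eligible task.

```c
/* used_pages: rss + swap of the task; total_pages: RAM + swap capacity. */
long badness_score(unsigned long long used_pages,
                   unsigned long long total_pages, int is_admin)
{
    long points;

    if (total_pages == 0)
        return 1;                 /* degenerate input, still eligible */

    /* Proportion of capacity on a 0..1000 scale (0.1% resolution). */
    points = (long)(used_pages * 1000 / total_pages);

    if (is_admin)
        points -= 30;             /* 3% bonus for admin tasks */

    if (points <= 0)
        return 1;                 /* never 0, so a victim can be found */

    return points < 1000 ? points : 1000;
}
```

A task using 0.01% of memory, or an admin task using exactly 3%, now scores 1 rather than 0, so a swarm of tiny tasks can no longer leave the OOM killer with nothing to choose.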
     
  • In 32-bit compatibility mode, the error handling for
    compat_do_readv_writev() may free an uninitialized pointer, potentially
    leading to all sorts of ugly memory corruption. This is reliably
    triggerable by unprivileged users by invoking the readv()/writev()
    syscalls with an invalid iovec pointer. The below patch fixes this to
    emulate the non-compat version.

    Introduced by commit b83733639a49 ("compat: factor out
    compat_rw_copy_check_uvector from compat_do_readv_writev")

    Signed-off-by: Dan Rosenberg
    Cc: stable@kernel.org (2.6.35)
    Cc: Al Viro
    Signed-off-by: Linus Torvalds

    Dan Rosenberg
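
The bug class described above (an error path freeing a pointer that was never set) is commonly avoided by pointing the variable at an on-stack array before any early exit and freeing only if it was later replaced. The sketch below is a user-space model with hypothetical names (`struct iov_sketch`, `copy_check_sketch()`, `FAST_IOV`); `heap_frees` exists only so the behaviour can be observed.

```c
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

#define FAST_IOV 8                /* hypothetical on-stack vector size */

struct iov_sketch {
    void  *base;
    size_t len;
};

static int heap_frees;            /* counts frees of heap vectors */

long copy_check_sketch(const struct iov_sketch *user_vec, size_t nr)
{
    struct iov_sketch stack_vec[FAST_IOV];
    struct iov_sketch *vec = stack_vec;   /* valid before any goto out */
    size_t i;
    long ret;

    if (user_vec == NULL) {
        ret = -EFAULT;            /* invalid pointer: early exit */
        goto out;
    }

    if (nr > FAST_IOV) {
        vec = malloc(nr * sizeof(*vec));
        if (vec == NULL) {
            vec = stack_vec;      /* keep the cleanup test meaningful */
            ret = -ENOMEM;
            goto out;
        }
    }

    for (i = 0; i < nr; i++)
        vec[i] = user_vec[i];
    ret = (long)nr;

out:
    /* Free only what was actually allocated. */
    if (vec != stack_vec) {
        free(vec);
        heap_frees++;
    }
    return ret;
}
```

Passing a NULL vector takes the early exit and frees nothing, which is the path that previously reached the cleanup code with an uninitialized pointer.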
     
  • * git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc-2.6:
    sparc: Prevent no-handler signal syscall restart recursion.
    sparc: Don't mask signal when we can't setup signal frame.
    sparc64: Fix race in signal instruction flushing.
    sparc64: Support RAW perf events.

    Linus Torvalds
     
  • Make sigreturn zero regs->trap, make do_signal() do the same on all
    paths. As it is, signal interrupting e.g. read() from fd 512 (==
    ERESTARTSYS) with another signal getting unblocked when the first
    handler finishes will lead to restart one insn earlier than it ought
    to. Same for multiple signals with in-kernel handlers interrupting
    that sucker at the same time. Same for multiple signals of any kind
    interrupting that sucker on 64bit...

    Signed-off-by: Al Viro
    Acked-by: Paul Mackerras
    Signed-off-by: Linus Torvalds

    Al Viro