22 Mar, 2011

40 commits

  • Greg Kroah-Hartman
     
  • commit bc10f96757bd6ab3721510df8defa8f21c32f974 upstream.

    Remove the call to tty_ldisc_flush() from the RESULT_NO_CARRIER
    branch of isdn_tty_modem_result(), as already proposed in commit
    00409bb045887ec5e7b9e351bc080c38ab6bfd33.
    This avoids a "sleeping function called from invalid context" BUG
    when the hardware driver calls the statcallb() callback with
    command==ISDN_STAT_DHUP in atomic context, which in turn calls
    isdn_tty_modem_result(RESULT_NO_CARRIER, ~), and from there,
    tty_ldisc_flush() which may sleep.

    Signed-off-by: Tilman Schmidt
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Tilman Schmidt
     
  • commit 4981d01eada5354d81c8929d5b2836829ba3df7b upstream.

    According to the Intel CPU manual, every time a PGD entry is changed in i386 PAE
    mode, we need to do a full TLB flush. Current code follows this and there is
    a comment for this in the code too.

    But current code misses the multi-threaded case. A changed page table
    might be used by several CPUs, and every such CPU should flush its TLB.
    Usually this isn't a problem, because we prepopulate all PGD entries at
    process fork. But when the process does munmap followed by a new mmap,
    this issue will be triggered.

    When it happens, some CPUs keep doing page faults:

    http://marc.info/?l=linux-kernel&m=129915020508238&w=2

    Reported-by: Yasunori Goto
    Tested-by: Yasunori Goto
    Reviewed-by: Rik van Riel
    Signed-off-by: Shaohua Li
    Cc: Mallick Asit K
    Cc: Linus Torvalds
    Cc: Andrew Morton
    Cc: linux-mm
    LKML-Reference:
    Signed-off-by: Ingo Molnar
    Signed-off-by: Greg Kroah-Hartman

    Shaohua Li
     
  • commit 45a5791920ae643eafc02e2eedef1a58e341b736 upstream.

    Paul McKenney's review pointed out two problems with the barriers in the
    2.6.38 update to the smp call function many code.

    First, a barrier that would force the func and info members of data to
    be visible before their consumption in the interrupt handler was
    missing. This can be solved by adding a smp_wmb between setting the
    func and info members and setting the cpumask; this will pair
    with the existing and required smp_rmb ordering the cpumask read before
    the read of refs. This placement avoids the need for a second smp_rmb in
    the interrupt handler which would be executed on each of the N cpus
    executing the call request. (I was thinking this barrier was present
    but it was not).

    Second, the previous write to refs (establishing the zero that the
    interrupt handler was testing from all cpus) was performed by a third
    party cpu. This would invoke transitivity which, as a recent or
    concurrent addition to memory-barriers.txt now explicitly states, would
    require a full smp_mb().

    However, we know the cpumask will only be set by one cpu (the data
    owner) and any previous iteration of the mask would have been cleared by the
    reading cpu. By redundantly writing refs to 0 on the owning cpu before
    the smp_wmb, the write to refs will follow the same path as the writes
    that set the cpumask, which in turn allows us to keep the barrier in the
    interrupt handler a smp_rmb instead of promoting it to a smp_mb (which
    will be executed by N cpus for each of the possible M elements on the
    list).
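    The pairing described above can be sketched in user space with C11
    atomics. This is only an analogy under stated assumptions, not the kernel
    code: struct call_data, publish(), consume() and bump() are hypothetical
    names, and the release and acquire fences stand in for smp_wmb and smp_rmb
    (they are somewhat stronger than the kernel primitives).

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for the per-call data shared with the handler. */
struct call_data {
    void (*func)(void *info);
    void *info;
    atomic_int refs;
    atomic_uint cpumask;            /* one bit per target cpu */
};

/* Example callback used for illustration only. */
void bump(void *info)
{
    ++*(int *)info;
}

/* Owner side: write refs = 0 and the payload, then publish the mask. */
void publish(struct call_data *d, void (*func)(void *), void *info,
             unsigned int mask)
{
    assert(func != NULL);
    /* Redundant zeroing on the owning cpu, as the message describes. */
    atomic_store_explicit(&d->refs, 0, memory_order_relaxed);
    d->func = func;
    d->info = info;
    /* Plays the role of smp_wmb: the stores above are ordered
     * before the mask store below. */
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(&d->cpumask, mask, memory_order_relaxed);
}

/* Handler side: runs func and returns 1 if this cpu's bit is set. */
int consume(struct call_data *d, unsigned int cpu)
{
    unsigned int mask =
        atomic_load_explicit(&d->cpumask, memory_order_relaxed);

    if (!(mask & (1u << cpu)))
        return 0;
    /* Plays the role of smp_rmb: the mask read is ordered before
     * the reads of func and info. */
    atomic_thread_fence(memory_order_acquire);
    d->func(d->info);
    return 1;
}
```

    A single fence on the writer side covers both the payload and the refs
    store, which is why the reader side can stay with the cheaper read barrier.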

    I moved and expanded the comment about our (ab)use of the rcu list
    primitives for the concurrent walk earlier into this function. I
    considered moving the first two paragraphs to the queue list head and
    lock, but felt it would have been too disconnected from the code.

    Cc: Paul McKinney
    Signed-off-by: Milton Miller
    Signed-off-by: Linus Torvalds
    Signed-off-by: Greg Kroah-Hartman

    Milton Miller
     
  • commit e6cd1e07a185d5f9b0aa75e020df02d3c1c44940 upstream.

    Peter pointed out there was nothing preventing the list_del_rcu in
    smp_call_function_interrupt from running before the list_add_rcu in
    smp_call_function_many.

    Fix this by not setting refs until we have gotten the lock for the list.
    Take advantage of the wmb in list_add_rcu to save an explicit additional
    one.

    I tried to force this race with a udelay before the lock & list_add and
    by mixing all 64 online cpus with just 3 random cpus in the mask, but
    was unsuccessful. Still, inspection shows a valid race, and the fix is
    an extension of the existing protection window in the current code.

    Reported-by: Peter Zijlstra
    Signed-off-by: Milton Miller
    Signed-off-by: Linus Torvalds
    Signed-off-by: Greg Kroah-Hartman

    Milton Miller
     
  • commit d7433142b63d727b5a217c37b1a1468b116a9771 upstream.

    (crossport of 1f7bebb9e911d870fa8f997ddff838e82b5715ea
    by Andreas Schlick )

    When ext3_dx_add_entry() has to split an index node, it has to ensure that
    name_len of dx_node's fake_dirent is also zero, because otherwise e2fsck
    won't recognise it as an intermediate htree node and consider the htree to
    be corrupted.

    Signed-off-by: Eric Sandeen
    Signed-off-by: Jan Kara
    Signed-off-by: Greg Kroah-Hartman

    Eric Sandeen
     
  • commit 0837e3242c73566fc1c0196b4ec61779c25ffc93 upstream.

    Events on POWER7 can roll back if a speculative event doesn't
    eventually complete. Unfortunately in some rare cases they will
    raise a performance monitor exception. We need to catch this to
    ensure we reset the PMC. In all cases the PMC will be 256 or fewer
    cycles from overflow.

    Signed-off-by: Anton Blanchard
    Signed-off-by: Peter Zijlstra
    LKML-Reference:
    Signed-off-by: Ingo Molnar
    Signed-off-by: Greg Kroah-Hartman

    Anton Blanchard
     
  • commit e020c6800c9621a77223bf2c1ff68180e41e8ebf upstream.

    This fixes a race in which the task->tk_callback() puts the rpc_task
    to sleep, setting a new callback. Under certain circumstances, the current
    code may end up executing the task->tk_action before it gets round to the
    callback.

    Signed-off-by: Trond Myklebust
    Signed-off-by: Greg Kroah-Hartman

    Trond Myklebust
     
  • commit ed0f36bc5719b25659b637f80ceea85494b84502 upstream.

    The use of blk_execute_rq_nowait() implies __blk_put_request() is needed
    in stpg_endio() rather than blk_put_request() -- blk_finish_request() is
    called with queue lock already held.

    Signed-off-by: Joseph Gruher
    Signed-off-by: Ilgu Hong
    Signed-off-by: Mike Snitzer
    Signed-off-by: James Bottomley
    Signed-off-by: Greg Kroah-Hartman

    Joseph Gruher
     
  • commit efed5f26664f93991c929d5bb343e65f900d72bc upstream.

    Clear input settings before initialization.

    Signed-off-by: Przemyslaw Bruski
    Signed-off-by: Takashi Iwai
    Signed-off-by: Greg Kroah-Hartman

    Przemyslaw Bruski
     
  • commit f164753a263bfd2daaf3e0273b179de7e099c57d upstream.

    SPDIF status retrieval always returned the default settings instead of
    the actual ones.

    Signed-off-by: Przemyslaw Bruski
    Signed-off-by: Takashi Iwai
    Signed-off-by: Greg Kroah-Hartman

    Przemyslaw Bruski
     
  • commit 4c1847e884efddcc3ede371f7839e5e65b25c34d upstream.

    SPDIF status mask creation was incorrect.

    Signed-off-by: Przemyslaw Bruski
    Signed-off-by: Takashi Iwai
    Signed-off-by: Greg Kroah-Hartman

    Przemyslaw Bruski
     
  • commit 0f12a4e29368a9476076515881d9ef4e5876c6e2 upstream.

    Commit 280c73d ("PCI: centralize the capabilities code in
    pci-sysfs.c") changed the initialisation of the "rom" and "vpd"
    attributes, and made the failure path for the "vpd" attribute
    incorrect. We must free the new attribute structure (attr), but
    instead we currently free dev->vpd->attr. That will normally be NULL,
    resulting in a memory leak, but it might be a stale pointer, resulting
    in a double-free.
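    The failure path described above can be sketched in plain C. The types
    and create_vpd_attr() below are hypothetical stand-ins, and the
    register_ok flag only simulates the registration step failing; the point
    is that the error path frees the object allocated locally.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the kernel structures. */
struct bin_attr { int size; };
struct vpd      { struct bin_attr *attr; };
struct device   { struct vpd *vpd; };

/* register_ok simulates the result of the sysfs registration step. */
int create_vpd_attr(struct device *dev, int register_ok)
{
    struct bin_attr *attr;

    assert(dev != NULL && dev->vpd != NULL);
    attr = calloc(1, sizeof(*attr));
    if (!attr)
        return -1;
    if (!register_ok) {
        /* Free the object allocated here, not dev->vpd->attr,
         * which has not been assigned yet. */
        free(attr);
        return -1;
    }
    dev->vpd->attr = attr;
    return 0;
}
```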

    Found by inspection; compile-tested only.

    Signed-off-by: Ben Hutchings
    Signed-off-by: Jesse Barnes
    Signed-off-by: Greg Kroah-Hartman

    Ben Hutchings
     
  • commit 87e3dc3855430bd254370afc79f2ed92250f5b7c upstream.

    Some broken BIOSes on ICH4 chipset report an ACPI region which is in
    conflict with legacy IDE ports when ACPI is disabled. Even though the
    regions overlap, IDE ports are working correctly (we cannot find out
    the decoding rules on chipsets).

    So the only problem is the reported region itself; if we don't reserve
    the region in the quirk, everything works as expected.

    This patch avoids reserving any quirk regions below PCIBIOS_MIN_IO
    which is 0x1000. Some regions might be below this border (and, judging
    by a quick Google search, some are), but the only difference is that
    they won't be reserved anymore. They should still work the same as before.
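    A minimal sketch of the check, assuming the decision reduces to comparing
    the region start against PCIBIOS_MIN_IO. The helper name
    quirk_region_reservable() is hypothetical; only the 0x1000 value comes
    from the text above.

```c
#include <assert.h>
#include <stdint.h>

#define PCIBIOS_MIN_IO 0x1000u  /* value quoted in the commit message */

/* Returns 1 if a quirk I/O region starting at `start` should be reserved. */
int quirk_region_reservable(uint32_t start, uint32_t size)
{
    assert(size > 0);
    return start >= PCIBIOS_MIN_IO;
}
```

    With this rule the 128-byte region reported at 0x0100 is simply skipped,
    so it can no longer collide with the legacy IDE ports at 0x0170-0x0177.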

    The conflicts look like (1f.0 is bridge, 1f.1 is IDE ctrl):
    pci 0000:00:1f.1: address space collision: [io 0x0170-0x0177] conflicts with 0000:00:1f.0 [io 0x0100-0x017f]

    At 0x0100 a 128 bytes long ACPI region is reported in the quirk for
    ICH4. ata_piix then fails to find disks because the IDE legacy ports
    are zeroed:
    ata_piix 0000:00:1f.1: device not available (can't reserve [io 0x0000-0x0007])

    References: https://bugzilla.novell.com/show_bug.cgi?id=558740
    Signed-off-by: Jiri Slaby
    Cc: Bjorn Helgaas
    Cc: "David S. Miller"
    Cc: Thomas Renninger
    Signed-off-by: Jesse Barnes
    Signed-off-by: Greg Kroah-Hartman

    Jiri Slaby
     
  • commit cdb9755849fbaf2bb9c0a009ba5baa817a0f152d upstream.

    Per ICH4 and ICH6 specs, ACPI and GPIO regions are valid iff ACPI_EN
    and GPIO_EN bits are set to 1. Add checks for these bits into the
    quirks prior to the region creation.

    While at it, name the constants by macros.

    Signed-off-by: Jiri Slaby
    Cc: Bjorn Helgaas
    Cc: "David S. Miller"
    Cc: Thomas Renninger
    Signed-off-by: Jesse Barnes
    Signed-off-by: Greg Kroah-Hartman

    Jiri Slaby
     
  • commit b99af4b002e4908d1a5cdaf424529bdf1dc69768 upstream.

    Revert commit 7eb93b175d4de9438a4b0af3a94a112cb5266944
    Author: Yu Zhao
    Date: Fri Apr 3 15:18:11 2009 +0800

    PCI: SR-IOV quirk for Intel 82576 NIC

    If BIOS doesn't allocate resources for the SR-IOV BARs, zero the Flash
    BAR and program the SR-IOV BARs to use the old Flash Memory Space.

    Please refer to Intel 82576 Gigabit Ethernet Controller Datasheet
    section 7.9.2.14.2 for details.
    http://download.intel.com/design/network/datashts/82576_Datasheet.pdf

    Signed-off-by: Yu Zhao
    Signed-off-by: Jesse Barnes

    This quirk was added before SR-IOV was in production and now all machines that
    originally had this issue already have BIOS updates to correct the issue. The
    quirk itself is no longer needed and in fact causes bugs if run. Remove it.

    Signed-off-by: Jesse Brandeburg
    CC: Yu Zhao
    CC: Jesse Barnes
    Signed-off-by: Jesse Barnes
    Signed-off-by: Greg Kroah-Hartman

    Brandeburg, Jesse
     
  • commit 094a42452abd5564429045e210281c6d22e67fca upstream.

    When the mux for the digital mic is different from the mux for other mics,
    the current auto-parser doesn't handle them correctly and provides
    only one mic. This patch fixes the issue.

    Signed-off-by: Vitaliy Kulikov
    Signed-off-by: Takashi Iwai
    Signed-off-by: Greg Kroah-Hartman

    Vitaliy Kulikov
     
  • commit a122eb2fdfd78b58c6dd992d6f4b1aaef667eef9 upstream.

    The XFS_IOC_FSGETXATTR ioctl allows unprivileged users to read 12
    bytes of uninitialized stack memory, because the fsxattr struct
    declared on the stack in xfs_ioc_fsgetxattr() does not alter (or zero)
    the 12-byte fsx_pad member before copying it back to the user. This
    patch takes care of it.
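    The usual remedy for this class of leak is to clear the whole structure
    before filling it in. The sketch below assumes that approach;
    struct fsxattr_sketch and fill_fsxattr() are illustrative names, and
    memcpy() stands in for the copy to user space.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Illustrative layout: four 32-bit fields followed by a 12-byte pad. */
struct fsxattr_sketch {
    uint32_t fsx_xflags;
    uint32_t fsx_extsize;
    uint32_t fsx_nextents;
    uint32_t fsx_projid;
    unsigned char fsx_pad[12];
};

void fill_fsxattr(struct fsxattr_sketch *out, uint32_t xflags,
                  uint32_t extsize)
{
    struct fsxattr_sketch fa;

    assert(out != NULL);
    memset(&fa, 0, sizeof(fa));     /* clears fsx_pad and unset fields */
    fa.fsx_xflags = xflags;
    fa.fsx_extsize = extsize;
    memcpy(out, &fa, sizeof(fa));   /* stands in for copy_to_user() */
}
```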

    Signed-off-by: Dan Rosenberg
    Reviewed-by: Eric Sandeen
    Signed-off-by: Alex Elder
    Cc: dann frazier
    Signed-off-by: Greg Kroah-Hartman

    Dan Rosenberg
     
  • commit d14fc1a74e846d7851f24fc9519fe87dc12a1231 upstream.

    Alan's commit 335f8514f200e63d689113d29cb7253a5c282967 introduced
    .carrier_raised function in several drivers. That also means
    tty_port_block_til_ready can now suspend the process trying to open the serial
    port when Carrier Detect is low and put it into tty_port.open_wait queue. We
    need to wake up the process when Carrier Detect goes high and trigger TTY
    hangup when CD goes low.

    Some of the devices do not report modem status line changes, or at least we
    don't understand the status message, so for those we remove .carrier_raised
    again.

    Signed-off-by: Libor Pechacek
    Signed-off-by: Greg Kroah-Hartman

    Libor Pechacek
     
  • commit 9926c0df7b31b2128eebe92e0e2b052f380ea464 upstream.

    Removed device ID 0x10C4/0x8149 for West Mountain Radio Computerized
    Battery Analyzer. This device is actually based on a SiLabs C8051Fxxx,
    see http://www.etheus.net/SiUSBXp_Linux_Driver for further info.

    Signed-off-by: Craig Shelley
    Signed-off-by: Greg Kroah-Hartman

    Craig Shelley
     
  • commit faea63f7ccfddfb8fc19798799fcd38c58415172 upstream.

    Device IDs added for IRZ Automation Teleport SG-10 GSM/GPRS Modem and
    DekTec DTA Plus VHF/UHF Booster/Attenuator.

    Signed-off-by: Craig Shelley
    Signed-off-by: Greg Kroah-Hartman

    Craig Shelley
     
  • commit 7571f089d7522a95c103558faf313c7af8856ceb upstream.

    In the vhci_urb_dequeue() function the TCP connection is checked twice.
    Each time when the TCP connection is closed the URB is unlinked and given
    back. Remove the second attempt of unlinking and giving back of the URB completely.

    This patch fixes the bug described at https://bugzilla.kernel.org/show_bug.cgi?id=24872 .

    Signed-off-by: Márton Németh
    Signed-off-by: Greg Kroah-Hartman

    Márton Németh
     
  • commit 4bdab43323b459900578b200a4b8cf9713ac8fab upstream.

    sctp_packet_config() is called when getting the packet ready
    for appending of chunks. The function should not touch the
    current state, since it's possible to ping-pong between two
    transports when sending, and that can result in packet corruption
    followed by an skb overflow crash.

    Reported-by: Thomas Dreibholz
    Signed-off-by: Vlad Yasevich
    Signed-off-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Vlad Yasevich
     
  • commit 2a1b7e575b80ceb19ea50bfa86ce0053ea57181d upstream.

    I may have an explanation for the LSI 1068 HBA hangs provoked by ATA
    pass-through commands, in particular by smartctl.

    First, my version of the symptoms. On an LSI SAS1068E B3 HBA running
    01.29.00.00 firmware, with SATA disks, and with smartd running, I'm seeing
    occasional task, bus, and host resets, some of which lead to hard faults of
    the HBA requiring a reboot. Abusively looping the smartctl command,

    # while true; do smartctl -a /dev/sdb > /dev/null; done

    dramatically increases the frequency of these failures to nearly one per
    minute. A high IO load through the HBA while looping smartctl seems to
    improve the chance of a full scsi host reset or a non-recoverable hang.

    I reduced what smartctl was doing down to a simple test case which
    causes the hang with a single IO when pointed at the sd interface. See
    the code at the bottom of this e-mail. It uses an SG_IO ioctl to issue
    a single pass-through ATA identify device command. If the buffer
    userspace gives for the read data has certain alignments, the task is
    issued to the HBA but the HBA fails to respond. If run against the sg
    interface, neither the test code nor smartctl causes a hang.

    sd and sg handle the SG_IO ioctl slightly differently. Unless you
    specifically set a flag to do direct IO, sg passes a buffer of its own,
    which is page-aligned, to the block layer and later copies the result
    into the userspace buffer regardless of its alignment. sd, on the other
    hand, always does direct IO unless the userspace buffer fails an
    alignment test at block/blk-map.c line 57, in which case a page-aligned
    buffer is created and used for the transfer.

    The alignment test currently checks for word-alignment, the default
    setup by scsi_lib.c; therefore, userspace buffers of almost any
    alignment are given directly to the HBA as DMA targets. The LSI 1068
    hardware doesn't seem to like at least a couple of the alignments which
    cross a page boundary (see the test code below). Curiously, many
    page-boundary-crossing alignments do work just fine.

    So, either the hardware has a bug handling certain alignments or the
    hardware has a stricter alignment requirement than the driver is
    advertising. If stricter alignment is required, then in no case should
    misaligned buffers from userspace be allowed through without being
    bounced or at least causing an error to be returned.

    It seems the mptsas driver could use blk_queue_dma_alignment() to advertise
    a stricter alignment requirement. If it does, sd does the right thing and
    bounces misaligned buffers (see block/blk-map.c line 57). The following
    patch to 2.6.34-rc5 makes my symptoms go away. I'm sure this is the wrong
    place for this code, but it gets my idea across.
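    The alignment test can be sketched as a mask check. This is a simplified
    model, not the block layer code: needs_bounce() is a hypothetical helper,
    and the mask is assumed to be "alignment minus one" (3 for 4-byte
    alignment, 511 for 512-byte alignment).

```c
#include <assert.h>
#include <stdint.h>

/* mask is "alignment - 1"; returns 1 if the buffer must be bounced. */
int needs_bounce(uintptr_t uaddr, unsigned int mask)
{
    assert(((mask + 1u) & mask) == 0);  /* alignment must be a power of two */
    return (uaddr & mask) != 0;
}
```

    A buffer at 0x1ffc passes a 4-byte test and would go straight to the HBA,
    while a stricter 512-byte requirement forces it through a bounce buffer.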

    Acked-by: Kashyap Desai
    Signed-off-by: James Bottomley
    Signed-off-by: Greg Kroah-Hartman

    Ryan Kuester
     
  • commit e75e863dd5c7d96b91ebbd241da5328fc38a78cc upstream.

    There is a possible 32-bit variable overflow in the multiplication in the
    task_times() and thread_group_times() functions. When the
    overflow happens, the scaled utime value becomes erroneously
    small and the scaled stime becomes erroneously big.
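    The effect can be reproduced in user space. The two helpers below are
    hypothetical and only model the arithmetic: one multiplies in 32 bits
    before widening, the other widens first.

```c
#include <assert.h>
#include <stdint.h>

/* Product computed in 32 bits, then widened: the high bits are lost. */
uint64_t scale_utime_32(uint32_t rtime, uint32_t utime, uint32_t total)
{
    uint32_t product;

    assert(total != 0);
    product = rtime * utime;            /* wraps modulo 2^32 */
    return (uint64_t)product / total;
}

/* Widen first, then multiply: no bits are lost. */
uint64_t scale_utime_64(uint32_t rtime, uint32_t utime, uint32_t total)
{
    uint64_t product;

    assert(total != 0);
    product = (uint64_t)rtime * utime;
    return product / total;
}
```

    With rtime = 100000, utime = 50000 and total = 100000, the 32-bit variant
    yields 7050 instead of 50000, so utime comes out too small and the
    remainder attributed to stime too big.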

    Reported here:

    https://bugzilla.redhat.com/show_bug.cgi?id=633037
    https://bugzilla.kernel.org/show_bug.cgi?id=16559

    Reported-by: Michael Chapman
    Reported-by: Ciriaco Garcia de Celis
    Signed-off-by: Stanislaw Gruszka
    Signed-off-by: Peter Zijlstra
    Cc: Hidetoshi Seto
    LKML-Reference:
    Signed-off-by: Ingo Molnar
    Signed-off-by: Greg Kroah-Hartman

    Stanislaw Gruszka
     
  • commit 9c4cf6d94fb362c27a24df5223ed6e327eb7279a upstream.

    This patch adds the device id for the windy31 USB device to the rt73usb
    driver.

    Thanks to Ralf Flaxa for reporting this and providing testing and a
    sample device.

    Reported-by: Ralf Flaxa
    Tested-by: Ralf Flaxa
    Signed-off-by: Greg Kroah-Hartman
    Acked-by: Ivo van Doorn
    Signed-off-by: John W. Linville

    Greg Kroah-Hartman
     
  • commit 950eaaca681c44aab87a46225c9e44f902c080aa upstream.

    [ 23.584719]
    [ 23.584720] ===================================================
    [ 23.585059] [ INFO: suspicious rcu_dereference_check() usage. ]
    [ 23.585176] ---------------------------------------------------
    [ 23.585176] kernel/pid.c:419 invoked rcu_dereference_check() without protection!
    [ 23.585176]
    [ 23.585176] other info that might help us debug this:
    [ 23.585176]
    [ 23.585176]
    [ 23.585176] rcu_scheduler_active = 1, debug_locks = 1
    [ 23.585176] 1 lock held by rc.sysinit/728:
    [ 23.585176] #0: (tasklist_lock){.+.+..}, at: [] sys_setpgid+0x5f/0x193
    [ 23.585176]
    [ 23.585176] stack backtrace:
    [ 23.585176] Pid: 728, comm: rc.sysinit Not tainted 2.6.36-rc2 #2
    [ 23.585176] Call Trace:
    [ 23.585176] [] lockdep_rcu_dereference+0x99/0xa2
    [ 23.585176] [] find_task_by_pid_ns+0x50/0x6a
    [ 23.585176] [] find_task_by_vpid+0x1d/0x1f
    [ 23.585176] [] sys_setpgid+0x67/0x193
    [ 23.585176] [] system_call_fastpath+0x16/0x1b
    [ 24.959669] type=1400 audit(1282938522.956:4): avc: denied { module_request } for pid=766 comm="hwclock" kmod="char-major-10-135" scontext=system_u:system_r:hwclock_t:s0 tcontext=system_u:system_r:kernel_t:s0 tclas

    It turns out that the setpgid() system call fails to enter an RCU
    read-side critical section before doing a PID-to-task_struct translation.
    This commit therefore does rcu_read_lock() before the translation, and
    also does rcu_read_unlock() after the last use of the returned pointer.

    Reported-by: Andrew Morton
    Signed-off-by: Paul E. McKenney
    Acked-by: David Howells
    Cc: Jiri Slaby
    Cc: Oleg Nesterov
    Signed-off-by: Greg Kroah-Hartman

    Paul E. McKenney
     
  • commit 46b30ea9bc3698bc1d1e6fd726c9601d46fa0a91 upstream.

    pcpu_first/last_unit_cpu are used to track which cpu has the first and
    last units assigned. This in turn is used to determine the span of a
    chunk for map/unmap cache flushes and whether an address belongs to
    the first chunk or not in per_cpu_ptr_to_phys().

    When the number of possible CPUs isn't a power of two, a chunk may
    contain unassigned units towards the end of a chunk. The logic to
    determine pcpu_last_unit_cpu was incorrect when there was an unused
    unit at the end of a chunk. It failed to ignore the unused unit and
    assigned the unused marker NR_CPUS to pcpu_last_unit_cpu.
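    A minimal sketch of the corrected lookup, assuming a unit-to-cpu table in
    which unused units hold the NR_CPUS marker. The table layout, the value 64
    and last_unit_cpu() are assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define NR_CPUS 64u   /* used here as the "unit not assigned" marker */

/* Returns the cpu that owns the last assigned unit, or NR_CPUS if none. */
unsigned int last_unit_cpu(const unsigned int *unit_to_cpu, size_t nr_units)
{
    assert(unit_to_cpu != NULL);
    for (size_t i = nr_units; i > 0; i--) {
        if (unit_to_cpu[i - 1] != NR_CPUS)  /* skip unused trailing units */
            return unit_to_cpu[i - 1];
    }
    return NR_CPUS;
}
```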

    This was discovered through kdump failure which was caused by
    malfunctioning per_cpu_ptr_to_phys() on a kvm setup with 50 possible
    CPUs by CAI Qian.

    Signed-off-by: Tejun Heo
    Reported-by: CAI Qian
    Signed-off-by: Greg Kroah-Hartman

    Tejun Heo
     
  • commit 72853e2991a2702ae93aaf889ac7db743a415dd3 upstream.

    When allocating a page, the system uses NR_FREE_PAGES counters to
    determine if watermarks would remain intact after the allocation was made.
    This check is made without interrupts disabled or the zone lock held and
    so is race-prone by nature. Unfortunately, when pages are being freed in
    batch, the counters are updated before the pages are added on the list.
    During this window, the counters are misleading as the pages do not exist
    yet. When under significant pressure on systems with large numbers of
    CPUs, it's possible for processes to make progress even though they should
    have been stalled. This is particularly problematic if a number of the
    processes are using GFP_ATOMIC as the min watermark can be accidentally
    breached and in extreme cases, the system can livelock.

    This patch updates the counters after the pages have been added to the
    list. This makes the allocator more cautious with respect to preserving
    the watermarks and mitigates livelock possibilities.

    [akpm@linux-foundation.org: avoid modifying incoming args]
    Signed-off-by: Mel Gorman
    Reviewed-by: Rik van Riel
    Reviewed-by: Minchan Kim
    Reviewed-by: KAMEZAWA Hiroyuki
    Reviewed-by: Christoph Lameter
    Reviewed-by: KOSAKI Motohiro
    Acked-by: Johannes Weiner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    Signed-off-by: Greg Kroah-Hartman

    Mel Gorman
     
  • commit 9ee493ce0a60bf42c0f8fd0b0fe91df5704a1cbf upstream.

    When under significant memory pressure, a process enters direct reclaim
    and immediately afterwards tries to allocate a page. If it fails and no
    further progress is made, it's possible the system will go OOM. However,
    on systems with large amounts of memory, it's possible that a significant
    number of pages are on per-cpu lists and inaccessible to the calling
    process. This leads to a process entering direct reclaim more often than
    it should increasing the pressure on the system and compounding the
    problem.

    This patch notes that if direct reclaim is making progress but allocations
    are still failing that the system is already under heavy pressure. In
    this case, it drains the per-cpu lists and tries the allocation a second
    time before continuing.

    Signed-off-by: Mel Gorman
    Reviewed-by: Minchan Kim
    Reviewed-by: KAMEZAWA Hiroyuki
    Reviewed-by: KOSAKI Motohiro
    Reviewed-by: Christoph Lameter
    Cc: Dave Chinner
    Cc: Wu Fengguang
    Cc: David Rientjes
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    Signed-off-by: Greg Kroah-Hartman

    Mel Gorman
     
  • …low and kswapd is awake

    commit aa45484031ddee09b06350ab8528bfe5b2c76d1c upstream.

    Ordinarily watermark checks are based on the vmstat NR_FREE_PAGES as it is
    cheaper than scanning a number of lists. To avoid synchronization
    overhead, counter deltas are maintained on a per-cpu basis and drained
    both periodically and when the delta is above a threshold. On large CPU
    systems, the difference between the estimated and real value of
    NR_FREE_PAGES can be very high. If NR_FREE_PAGES is much higher than the
    number of real free pages in the buddy allocator, the VM can allocate pages below min
    watermark, at worst reducing the real number of pages to zero. Even if
    the OOM killer kills some victim for freeing memory, it may not free
    memory if the exit path requires a new page resulting in livelock.

    This patch introduces a zone_page_state_snapshot() function (courtesy of
    Christoph) that takes a slightly more accurate view of an arbitrary vmstat
    counter. It is used to read NR_FREE_PAGES while kswapd is awake to avoid
    the watermark being accidentally broken. The estimate is not perfect and
    may result in cache line bounces but is expected to be lighter than the
    IPI calls necessary to continually drain the per-cpu counters while kswapd
    is awake.
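    The idea of the snapshot can be sketched as the global counter plus every
    per-cpu delta that has not been folded in yet. counter_snapshot() and its
    parameters are hypothetical; the kernel function works on zone and vmstat
    structures rather than plain arrays.

```c
#include <assert.h>
#include <stddef.h>

/* global: the shared counter; deltas: per-cpu updates not yet folded in. */
unsigned long counter_snapshot(long global, const int *deltas,
                               size_t nr_cpus)
{
    long x = global;

    assert(deltas != NULL || nr_cpus == 0);
    for (size_t cpu = 0; cpu < nr_cpus; cpu++)
        x += deltas[cpu];
    if (x < 0)          /* unsynchronized reads can sum below zero */
        x = 0;
    return (unsigned long)x;
}
```

    Reading the deltas without locking keeps the cost low, at the price of
    touching other CPUs' cache lines and of a result that is still only an
    estimate.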

    Signed-off-by: Christoph Lameter <cl@linux.com>
    Signed-off-by: Mel Gorman <mel@csn.ul.ie>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

    Christoph Lameter
     
  • commit 3d96406c7da1ed5811ea52a3b0905f4f0e295376 upstream.

    Fix a bug in keyctl_session_to_parent() whereby it tries to check the ownership
    of the parent process's session keyring whether or not the parent has a session
    keyring [CVE-2010-2960].

    This results in the following oops:

    BUG: unable to handle kernel NULL pointer dereference at 00000000000000a0
    IP: [] keyctl_session_to_parent+0x251/0x443
    ...
    Call Trace:
    [] ? keyctl_session_to_parent+0x67/0x443
    [] ? __do_fault+0x24b/0x3d0
    [] sys_keyctl+0xb4/0xb8
    [] system_call_fastpath+0x16/0x1b

    if the parent process has no session keyring.

    If the system is using pam_keyinit then it is mostly protected against this as all
    processes derived from a login will have inherited the session keyring created
    by pam_keyinit during the log in procedure.

    To test this, pam_keyinit calls need to be commented out in /etc/pam.d/.
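    The shape of the fix can be sketched as a guarded ownership check. The
    structures and parent_keyring_permits() below are hypothetical, and the
    sketch assumes that a parent without a session keyring is simply not
    subject to the ownership test.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily reduced stand-ins for key and credential data. */
struct key_sketch  { unsigned int uid; };
struct cred_sketch { unsigned int euid; struct key_sketch *session_keyring; };

/* Returns 1 if the parent's keyring does not block the caller. */
int parent_keyring_permits(const struct cred_sketch *parent,
                           const struct cred_sketch *mine)
{
    assert(parent != NULL && mine != NULL);
    /* Only inspect the parent's keyring when one exists. */
    if (parent->session_keyring != NULL &&
        parent->session_keyring->uid != mine->euid)
        return 0;
    return 1;
}
```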

    Reported-by: Tavis Ormandy
    Signed-off-by: David Howells
    Acked-by: Tavis Ormandy
    Cc: dann frazier
    Signed-off-by: Linus Torvalds
    Signed-off-by: Greg Kroah-Hartman

    David Howells
     
  • commit 9d1ac65a9698513d00e5608d93fca0c53f536c14 upstream.

    There's an unprotected access to the parent process's credentials in the middle
    of keyctl_session_to_parent(). This results in the following RCU warning:

    ===================================================
    [ INFO: suspicious rcu_dereference_check() usage. ]
    ---------------------------------------------------
    security/keys/keyctl.c:1291 invoked rcu_dereference_check() without protection!

    other info that might help us debug this:

    rcu_scheduler_active = 1, debug_locks = 0
    1 lock held by keyctl-session-/2137:
    #0: (tasklist_lock){.+.+..}, at: [] keyctl_session_to_parent+0x60/0x236

    stack backtrace:
    Pid: 2137, comm: keyctl-session- Not tainted 2.6.36-rc2-cachefs+ #1
    Call Trace:
    [] lockdep_rcu_dereference+0xaa/0xb3
    [] keyctl_session_to_parent+0xed/0x236
    [] sys_keyctl+0xb4/0xb6
    [] system_call_fastpath+0x16/0x1b

    The code should take the RCU read lock to make sure the parent's credentials
    don't go away, even though it's holding a spinlock and has IRQs disabled.

    Signed-off-by: David Howells
    Signed-off-by: Linus Torvalds
    Cc: dann frazier
    Signed-off-by: Greg Kroah-Hartman

    David Howells
     
  • commit 611da04f7a31b2208e838be55a42c7a1310ae321 upstream.

    Since the .31 or so notify rewrite inotify has not sent events about
    inodes which are unmounted. This patch restores those events.

    Signed-off-by: Eric Paris
    Cc: Ben Hutchings
    Signed-off-by: Greg Kroah-Hartman

    Eric Paris
     
  • commit 2d2b6901649a62977452be85df53eda2412def24 upstream.

    Tony's fix (f574c843191728d9407b766a027f779dcd27b272) has a small bug,
    it incorrectly uses "r3" as a scratch register in the first of the two
    unlock paths ... it is also inefficient. Optimize the fast path again.

    Signed-off-by: Petr Tesarik
    Signed-off-by: Tony Luck
    Signed-off-by: Greg Kroah-Hartman

    Petr Tesarik
     
  • commit f574c843191728d9407b766a027f779dcd27b272 upstream.

    When ia64 converted to using ticket locks, an inline implementation
    of trylock/unlock in fsys.S was missed. This was not noticed because
    in most circumstances it simply resulted in using the slow path because
    the siglock was apparently not available (under old spinlock rules).

    Problems occur when the ticket spinlock has value 0x0 (when first
    initialised, or when it wraps around). At this point the fsys.S
    code acquires the lock (changing the 0x0 to 0x1). If another process
    attempts to get the lock at this point, it will change the value from
    0x1 to 0x2 (using new ticket lock rules). Then the fsys.S code will
    free the lock using old spinlock rules by writing 0x0 to it. From
    here a variety of bad things can happen.

    Signed-off-by: Tony Luck
    Signed-off-by: Greg Kroah-Hartman

    Tony Luck
     
  • commit f790674d3f87df6390828ac21a7d1530f71b59c8 upstream.

    Functions set_fan_min() and set_fan_div() assume that the fan_div
    values have already been read from the register. The driver currently
    doesn't initialize them at load time, they are only set when function
    via686a_update_device() is called. This means that set_fan_min() and
    set_fan_div() misbehave if, for example, "sensors -s" is called
    before any monitoring application (e.g. "sensors") has been run.

    Fix the problem by always initializing the fan_div values at device
    bind time.

    Signed-off-by: Jean Delvare
    Acked-by: Guenter Roeck
    Signed-off-by: Greg Kroah-Hartman

    Jean Delvare
     
  • commit 068e35eee9ef98eb4cab55181977e24995d273be upstream.

    Hardware breakpoints can't be registered within pid namespaces
    because tsk->pid is passed rather than the pid in the current
    namespace.

    (See https://bugzilla.kernel.org/show_bug.cgi?id=17281 )

    This is a quick fix demonstrating the problem but is not the
    best method of solving the problem since passing pids internally
    is not the best way to avoid pid namespace bugs. Subsequent patches
    will show a better solution.

    Much thanks to Frederic Weisbecker for doing
    the bulk of the work finding this bug.

    Reported-by: Robin Green
    Signed-off-by: Matt Helsley
    Signed-off-by: Peter Zijlstra
    Cc: Prasad
    Cc: Arnaldo Carvalho de Melo
    Cc: Steven Rostedt
    Cc: Will Deacon
    Cc: Mahesh Salgaonkar
    LKML-Reference:
    Signed-off-by: Ingo Molnar
    Signed-off-by: Frederic Weisbecker
    Signed-off-by: Greg Kroah-Hartman

    Matt Helsley
     
  • commit f362b73244fb16ea4ae127ced1467dd8adaa7733 upstream.

    Using a program like the following:

    #include <signal.h>
    #include <stdlib.h>
    #include <unistd.h>
    #include <sys/wait.h>

    int main() {
        id_t id;
        siginfo_t infop;
        pid_t res;

        id = fork();
        if (id == 0) { sleep(1); exit(0); }
        kill(id, SIGSTOP);
        alarm(1);
        waitid(P_PID, id, &infop, WCONTINUED);
        return 0;
    }

    to call waitid() on a stopped process results in access to the child task's
    credentials without the RCU read lock being held - which may be replaced in the
    meantime - eliciting the following warning:

    ===================================================
    [ INFO: suspicious rcu_dereference_check() usage. ]
    ---------------------------------------------------
    kernel/exit.c:1460 invoked rcu_dereference_check() without protection!

    other info that might help us debug this:

    rcu_scheduler_active = 1, debug_locks = 1
    2 locks held by waitid02/22252:
    #0: (tasklist_lock){.?.?..}, at: [] do_wait+0xc5/0x310
    #1: (&(&sighand->siglock)->rlock){-.-...}, at: []
    wait_consider_task+0x19a/0xbe0

    stack backtrace:
    Pid: 22252, comm: waitid02 Not tainted 2.6.35-323cd+ #3
    Call Trace:
    [] lockdep_rcu_dereference+0xa4/0xc0
    [] wait_consider_task+0xaf1/0xbe0
    [] do_wait+0xf5/0x310
    [] sys_waitid+0x86/0x1f0
    [] ? child_wait_callback+0x0/0x70
    [] system_call_fastpath+0x16/0x1b

    This is fixed by holding the RCU read lock in wait_task_continued() to ensure
    that the task's current credentials aren't destroyed between us reading the
    cred pointer and us reading the UID from those credentials.

    Furthermore, protect wait_task_stopped() in the same way.

    We don't need to keep holding the RCU read lock once we've read the UID from
    the credentials as holding the RCU read lock doesn't stop the target task from
    changing its creds under us - so the credentials may be outdated immediately
    after we've read the pointer, lock or no lock.

    Signed-off-by: Daniel J Blueman
    Signed-off-by: David Howells
    Acked-by: Paul E. McKenney
    Acked-by: Oleg Nesterov
    Signed-off-by: Linus Torvalds
    Signed-off-by: Greg Kroah-Hartman

    Daniel J Blueman
     
  • commit b4aaa78f4c2f9cde2f335b14f4ca30b01f9651ca upstream.

    The VIAFB_GET_INFO device ioctl allows unprivileged users to read 246
    bytes of uninitialized stack memory, because the "reserved" member of
    the viafb_ioctl_info struct declared on the stack is not altered or
    zeroed before being copied back to the user. This patch takes care of
    it.

    Signed-off-by: Dan Rosenberg
    Signed-off-by: Florian Tobias Schandinat
    Signed-off-by: Greg Kroah-Hartman

    Dan Rosenberg