10 Dec, 2011

9 commits


09 Dec, 2011

30 commits

  • In order to safely dereference current->real_parent inside an
    rcu_read_lock, we need an rcu_dereference.

    Signed-off-by: Mandeep Singh Baines
    Cc: Thomas Gleixner
    Cc: Pavel Emelyanov
    Cc: Oleg Nesterov
    Cc: Kees Cook
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mandeep Singh Baines
     
  • Modify initialization of PCIe capability registers in Tsi721 mport driver:
    - change Completion Timeout value to avoid unexpected data transfer
    aborts during intensive traffic.
    - replace hardcoded offset of PCIe capability block by making it use the
    common function.

    This patch is applicable to kernel versions starting from 3.2-rc1.

    Signed-off-by: Alexandre Bounine
    Cc: Matt Porter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexandre Bounine
     
  • Bug fix for Tsi721 RapidIO mport driver: Tsi721 supports four RapidIO
    mailboxes (MBOX0 - MBOX3) as defined by RapidIO specification. Mailbox
    resources have to be properly reported to allow use of all available
    mailboxes (initial version reports only MBOX0).

    This patch is applicable to kernel versions starting from 3.2-rc1.

    Signed-off-by: Alexandre Bounine
    Cc: Matt Porter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexandre Bounine
     
  • Replace the pair dma_alloc_coherent()+memset() with the new
    dma_zalloc_coherent() added by Andrew Morton for kernel version 3.2.

    Signed-off-by: Alexandre Bounine
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexandre Bounine
     
  • Since commit a25cac5198d4 ("proc: Consider NO_HZ when printing idle and
    iowait times") we are reporting idle/io_wait time also while a CPU is
    tickless. We rely on get_{idle,iowait}_time functions to retrieve
    proper data.

    These functions, however, use usecs_to_cputime to translate a
    microsecond value to cputime64_t. This is just an alias to usecs_to_jiffies
    which reduces the data type from u64 to unsigned int and also checks
    whether the given parameter overflows jiffies_to_usecs(MAX_JIFFY_OFFSET)
    and returns MAX_JIFFY_OFFSET in that case.

    When we overflow depends on CONFIG_HZ but especially for CONFIG_HZ_300
    it is quite low (1431649781) so we are getting MAX_JIFFY_OFFSET for
    >3000s! until we overflow unsigned int. Just for reference
    CONFIG_HZ_100 has an overflow window around 20s, CONFIG_HZ_250 ~8s and
    CONFIG_HZ_1000 ~2s.

    This resulted in a bug where people saw [h]top going mad reporting 100%
    CPU usage even though there was basically no CPU load. The reason was
    simply that /proc/stat stopped reporting idle/io_wait changes (and
    reported MAX_JIFFY_OFFSET) and so the only change happening was for user
    and system time.

    Let's use nsecs_to_jiffies64 instead, which doesn't reduce the precision
    to a 32b type and is much more appropriate for cumulative time values
    (unlike usecs_to_jiffies, which is intended for timeout calculations).

    Signed-off-by: Michal Hocko
    Tested-by: Artem S. Tashkinov
    Cc: Dave Jones
    Cc: Arnd Bergmann
    Cc: Alexey Dobriyan
    Cc: Thomas Gleixner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michal Hocko
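
The truncation described in this commit can be illustrated with a small userspace model. This is only a sketch: the function names, the `hz` parameter and the macros are hypothetical, and the kernel's MAX_JIFFY_OFFSET clamp is not modelled, only the loss of the upper 32 bits.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_USEC_PER_SEC 1000000ull
#define MODEL_NSEC_PER_SEC 1000000000ull

/* Narrow model (hypothetical): the microsecond count is truncated to
 * 32 bits before scaling, so anything past about 4294 seconds wraps. */
static uint64_t model_usecs_to_ticks32(uint64_t usecs, unsigned int hz)
{
    uint32_t u = (uint32_t)usecs;   /* high bits are silently dropped */

    assert(hz > 0);
    return (uint64_t)u * hz / MODEL_USEC_PER_SEC;
}

/* Wide model (hypothetical): whole seconds and the sub-second remainder
 * are scaled separately, keeping the intermediate products small for
 * realistic tick rates. */
static uint64_t model_nsecs_to_ticks64(uint64_t nsecs, unsigned int hz)
{
    assert(hz > 0);
    return (nsecs / MODEL_NSEC_PER_SEC) * hz +
           (nsecs % MODEL_NSEC_PER_SEC) * hz / MODEL_NSEC_PER_SEC;
}
```

With an assumed rate of 250 ticks per second and 5000 seconds of accumulated idle time, the wide model yields 1250000 ticks, while the narrow model yields only 176258 because the microsecond count no longer fits in 32 bits.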
     
  • Commit f5252e00 ("mm: avoid null pointer access in vm_struct via
    /proc/vmallocinfo") adds newly allocated vm_structs to the vmlist after
    it is fully initialised. Unfortunately, it did not check that
    __vmalloc_area_node() successfully populated the area. In the event of
    allocation failure, the vmalloc area is freed but the pointer to freed
    memory is inserted into the vmlist leading to a crash later in
    get_vmalloc_info().

    This patch adds a check for __vmalloc_area_node() failure within
    __vmalloc_node_range. It does not use "goto fail" as in the previous
    error path as a warning was already displayed by __vmalloc_area_node()
    before it called vfree in its failure path.

    Credit goes to Luciano Chavez for doing all the real work of identifying
    exactly where the problem was.

    Signed-off-by: Mel Gorman
    Reported-by: Luciano Chavez
    Tested-by: Luciano Chavez
    Reviewed-by: Rik van Riel
    Acked-by: David Rientjes
    Cc: [3.1.x+]
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mel Gorman
     
  • setup_zone_migrate_reserve() expects that zone->start_pfn starts at a
    pageblock_nr_pages-aligned pfn; otherwise we could access beyond an
    existing memblock resulting in the following panic if
    CONFIG_HOLES_IN_ZONE is not configured and we do not check pfn_valid:

    IP: [] setup_zone_migrate_reserve+0xcd/0x180
    *pdpt = 0000000000000000 *pde = f000ff53f000ff53
    Oops: 0000 [#1] SMP
    Pid: 1, comm: swapper Not tainted 3.0.7-0.7-pae #1 VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform
    EIP: 0060:[] EFLAGS: 00010006 CPU: 0
    EIP is at setup_zone_migrate_reserve+0xcd/0x180
    EAX: 000c0000 EBX: f5801fc0 ECX: 000c0000 EDX: 00000000
    ESI: 000c01fe EDI: 000c01fe EBP: 00140000 ESP: f2475f58
    DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
    Process swapper (pid: 1, ti=f2474000 task=f2472cd0 task.ti=f2474000)
    Call Trace:
    [] __setup_per_zone_wmarks+0xec/0x160
    [] setup_per_zone_wmarks+0xf/0x20
    [] init_per_zone_wmark_min+0x27/0x86
    [] do_one_initcall+0x2b/0x160
    [] kernel_init+0xbe/0x157
    [] kernel_thread_helper+0x6/0xd
    Code: a5 39 f5 89 f7 0f 46 fd 39 cf 76 40 8b 03 f6 c4 08 74 32 eb 91 90 89 c8 c1 e8 0e 0f be 80 80 2f 86 c0 8b 14 85 60 2f 86 c0 89 c8 82 b4 12 00 00 c1 e0 05 03 82 ac 12 00 00 8b 00 f6 c4 08 0f
    EIP: [] setup_zone_migrate_reserve+0xcd/0x180 SS:ESP 0068:f2475f58
    CR2: 00000000000012b4

    We crashed in pageblock_is_reserved() when accessing pfn 0xc0000 because
    highstart_pfn = 0x36ffe.

    The issue was introduced in 3.0-rc1 by 6d3163ce ("mm: check if any page
    in a pageblock is reserved before marking it MIGRATE_RESERVE").

    Make sure that start_pfn is always aligned to pageblock_nr_pages to
    ensure that pfn_valid is always called at the start of each pageblock.
    Architectures with holes in pageblocks will be correctly handled by
    pfn_valid_within in pageblock_is_reserved.

    Signed-off-by: Michal Hocko
    Signed-off-by: Mel Gorman
    Tested-by: Dang Bo
    Reviewed-by: KAMEZAWA Hiroyuki
    Cc: Andrea Arcangeli
    Cc: David Rientjes
    Cc: Arve Hjønnevåg
    Cc: KOSAKI Motohiro
    Cc: John Stultz
    Cc: Dave Hansen
    Cc: [3.0+]
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michal Hocko
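
The alignment step described in this commit amounts to rounding the starting pfn up to the next pageblock boundary. A minimal sketch, assuming a pageblock of 512 pages (the macro and function names are hypothetical and the real pageblock size depends on the configuration):

```c
#include <assert.h>

/* Assumed for illustration: 512 pages per pageblock. */
#define MODEL_PAGEBLOCK_NR_PAGES 512UL

/* Round a pfn up to the next pageblock boundary; an already aligned
 * pfn is returned unchanged. */
static unsigned long model_round_up_to_pageblock(unsigned long pfn)
{
    unsigned long rem = pfn % MODEL_PAGEBLOCK_NR_PAGES;

    return rem ? pfn + (MODEL_PAGEBLOCK_NR_PAGES - rem) : pfn;
}
```

Under that assumption, the highstart_pfn value 0x36ffe quoted above rounds up to 0x37000, so a walk in pageblock-sized steps starts on a boundary.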
     
  • Avoid unlocking an unlocked page if we failed to lock it.

    Signed-off-by: Hillf Danton
    Cc: Naoya Horiguchi
    Cc: Andrea Arcangeli
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hillf Danton
     
  • Commit 70b50f94f1644 ("mm: thp: tail page refcounting fix") keeps all
    page_tail->_count zero at all times. But the current kernel does not
    set page_tail->_count to zero if a 1GB page is utilized. So when an
    IOMMU 1GB page is used by KVM, it will result in a kernel oops because a
    tail page's _count does not equal zero.

    kernel BUG at include/linux/mm.h:386!
    invalid opcode: 0000 [#1] SMP
    Call Trace:
    gup_pud_range+0xb8/0x19d
    get_user_pages_fast+0xcb/0x192
    ? trace_hardirqs_off+0xd/0xf
    hva_to_pfn+0x119/0x2f2
    gfn_to_pfn_memslot+0x2c/0x2e
    kvm_iommu_map_pages+0xfd/0x1c1
    kvm_iommu_map_memslots+0x7c/0xbd
    kvm_iommu_map_guest+0xaa/0xbf
    kvm_vm_ioctl_assigned_device+0x2ef/0xa47
    kvm_vm_ioctl+0x36c/0x3a2
    do_vfs_ioctl+0x49e/0x4e4
    sys_ioctl+0x5a/0x7c
    system_call_fastpath+0x16/0x1b
    RIP gup_huge_pud+0xf2/0x159

    Signed-off-by: Youquan Song
    Reviewed-by: Andrea Arcangeli
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Youquan Song
     
  • With the 3.2-rc kernel, IOMMU 2M pages in KVM work. But when I tried
    to use IOMMU 1GB pages in KVM, I encountered an oops and the 1GB page
    failed to be used.

    The root cause is that 1GB page allocation calls gup_huge_pud() while 2M
    page allocation calls gup_huge_pmd(). If compound pages are used and the
    page is a tail page, gup_huge_pmd() increases _mapcount to record that
    the tail page is mapped, while gup_huge_pud() does not.

    So when the mapped page is released, it results in a kernel oops
    because the page is not marked as mapped.

    This patch adds tail-page handling for compound pages to the 1GB huge
    page path, matching what is done for 2M pages.

    Reproduce like:
    1. Add grub boot option: hugepagesz=1G hugepages=8
    2. mount -t hugetlbfs -o pagesize=1G hugetlbfs /dev/hugepages
    3. qemu-kvm -m 2048 -hda os-kvm.img -cpu kvm64 -smp 4 -mem-path /dev/hugepages
    -net none -device pci-assign,host=07:00.1

    kernel BUG at mm/swap.c:114!
    invalid opcode: 0000 [#1] SMP
    Call Trace:
    put_page+0x15/0x37
    kvm_release_pfn_clean+0x31/0x36
    kvm_iommu_put_pages+0x94/0xb1
    kvm_iommu_unmap_memslots+0x80/0xb6
    kvm_assign_device+0xba/0x117
    kvm_vm_ioctl_assigned_device+0x301/0xa47
    kvm_vm_ioctl+0x36c/0x3a2
    do_vfs_ioctl+0x49e/0x4e4
    sys_ioctl+0x5a/0x7c
    system_call_fastpath+0x16/0x1b
    RIP put_compound_page+0xd4/0x168

    Signed-off-by: Youquan Song
    Reviewed-by: Andrea Arcangeli
    Cc: Andi Kleen
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Youquan Song
     
  • Commit 4f2a8d3cf5e ("printk: Fix console_sem vs logbuf_lock unlock race")
    introduced another silly bug where we would want to acquire an already
    held lock. Avoid this.

    Reported-by: Andrea Arcangeli
    Signed-off-by: Peter Zijlstra
    Cc: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Peter Zijlstra
     
  • More developers have joined memory cgroup development, and Johannes'
    great work changed the internal design of the memory cgroup
    dramatically; he will be doing more work. Michal Hocko has made many
    bug fixes and knows the memory cgroup very well. Daisuke Nishimura
    helped us very much but seems busy now. Thanks to him for his work.

    Signed-off-by: KAMEZAWA Hiroyuki
    Acked-by: Michal Hocko
    Acked-by: Johannes Weiner
    Acked-by: Daisuke Nishimura
    Cc: Balbir Singh
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KAMEZAWA Hiroyuki
     
  • If an error occurs after the clock is enabled, the enable/disable state
    can become unbalanced.

    Signed-off-by: Jonghwan Choi
    Cc: Alessandro Zummo
    Acked-by: Kukjin Kim
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jonghwan Choi
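
The balancing problem described in this commit can be sketched with a counter standing in for the clock's enable state. Everything here is hypothetical (struct, function names, the `fail_late_step` flag); it only shows the usual pattern of undoing the enable on a later error path.

```c
#include <assert.h>

/* Hypothetical stand-in for a clock with an enable refcount. */
struct model_clk {
    int enable_count;
};

static void model_clk_enable(struct model_clk *clk)
{
    clk->enable_count++;
}

static void model_clk_disable(struct model_clk *clk)
{
    assert(clk->enable_count > 0);  /* catches unbalanced disables */
    clk->enable_count--;
}

/* Probe-like routine: any failure after the enable must disable the
 * clock again before returning, or the count stays unbalanced. */
static int model_probe(struct model_clk *clk, int fail_late_step)
{
    int ret = 0;

    model_clk_enable(clk);

    if (fail_late_step) {
        ret = -1;
        goto err_clk;
    }

    return 0;   /* success: the clock intentionally stays enabled */

err_clk:
    model_clk_disable(clk);
    return ret;
}
```

After a failed call the count is back at zero; after a successful call it is one, which a matching remove routine would be expected to drop.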
     
  • Small clean-up for my CREDITS entry; the GPG fingerprint was not up to
    date, so I fixed other details at the same time too.

    Signed-off-by: Kees Cook
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kees Cook
     
  • khugepaged can sometimes cause suspend to fail, requiring that the user
    retry the suspend operation.

    Use wait_event_freezable_timeout() instead of
    schedule_timeout_interruptible() to avoid missing freezer wakeups. A
    try_to_freeze() would have been needed in the khugepaged_alloc_hugepage
    tight loop too in case of the allocation failing repeatedly, and
    wait_event_freezable_timeout will provide it too.

    khugepaged would still freeze just fine by trying again the next minute
    but it's better if it freezes immediately.

    Reported-by: Jiri Slaby
    Signed-off-by: Andrea Arcangeli
    Tested-by: Jiri Slaby
    Cc: Tejun Heo
    Cc: Oleg Nesterov
    Cc: "Srivatsa S. Bhat"
    Cc: "Rafael J. Wysocki"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrea Arcangeli
     
  • Fix the error message "directives may not be used inside a macro argument"
    which appears when the kernel is compiled for the cris architecture.

    Signed-off-by: Claudio Scordino
    Cc: Andrea Arcangeli
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Claudio Scordino
     
  • Use atomic-long operations instead of looping around cmpxchg().

    [akpm@linux-foundation.org: massage atomic.h inclusions]
    Signed-off-by: Konstantin Khlebnikov
    Cc: Dave Chinner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Konstantin Khlebnikov
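
As a userspace analogy for the change in this commit, the sketch below contrasts a hand-written compare-and-swap loop with a single atomic read-modify-write, using C11 <stdatomic.h> rather than the kernel's atomic_long_t API. The function names are hypothetical.

```c
#include <stdatomic.h>

/* Open-coded update: retry the compare-and-swap until no other
 * thread has changed the value in between. */
static long model_add_with_cas_loop(atomic_long *v, long delta)
{
    long old = atomic_load(v);

    /* On failure, 'old' is refreshed with the current value. */
    while (!atomic_compare_exchange_weak(v, &old, old + delta))
        ;
    return old + delta;
}

/* The same update expressed as one atomic operation. */
static long model_add_with_fetch_add(atomic_long *v, long delta)
{
    return atomic_fetch_add(v, delta) + delta;
}
```

Both functions return the new value; the second leaves the retry logic to the atomic primitive, which is shorter and harder to get wrong.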
     
  • A shrinker function can return -1, meaning that it cannot do anything
    without a risk of deadlock. For example, prune_super() does this if it
    cannot grab a superblock reference, even if nr_to_scan=0. Currently we
    interpret this -1 as a ULONG_MAX size shrinker and evaluate `total_scan'
    according to this. So the next time around this shrinker can cause
    really big pressure. Let's skip such shrinkers instead.

    Also make total_scan signed, otherwise the check (total_scan < 0) below
    never works.

    Signed-off-by: Konstantin Khlebnikov
    Cc: Dave Chinner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Konstantin Khlebnikov
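
The two arithmetic pitfalls named in this commit can be reproduced in plain C. This is a sketch with hypothetical helper names, not the vmscan code: it shows how -1 stored in an unsigned long becomes ULONG_MAX, and why a running total must be signed for a "less than zero" check to ever fire.

```c
#include <limits.h>

/* A -1 return value stored in an unsigned long turns into ULONG_MAX. */
static unsigned long model_as_unsigned_count(int shrinker_ret)
{
    return (unsigned long)shrinker_ret;
}

/* Skip a shrinker that reported no usable object count. */
static int model_should_skip(long max_pass)
{
    return max_pass <= 0;
}

/* Unsigned subtraction wraps around instead of going negative... */
static unsigned long model_sub_unsigned(unsigned long total, unsigned long n)
{
    return total - n;
}

/* ...while the signed version yields a value that (total < 0) can catch. */
static long model_sub_signed(long total, long n)
{
    return total - n;
}
```

For example, subtracting 10 from 5 gives -5 in the signed helper, but a value just below ULONG_MAX in the unsigned one.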
     
  • Commit 29b68415e335 ("x86: amd_iommu: move to drivers/iommu/")
    moved the files; update the patterns.

    CC: Ohad Ben-Cohen
    CC: Joerg Roedel

    Signed-off-by: Joe Perches
    Signed-off-by: Joerg Roedel

    Joe Perches
     
  • If we encounter an efi_memory_desc_t without EFI_MEMORY_WB set
    in ->attribute we currently call set_memory_uc(), which in turn
    calls __pa() on a potentially ioremap'd address.

    On CONFIG_X86_32 this is invalid, resulting in the following
    oops on some machines:

    BUG: unable to handle kernel paging request at f7f22280
    IP: [] reserve_ram_pages_type+0x89/0x210
    [...]

    Call Trace:
    [] ? page_is_ram+0x1a/0x40
    [] reserve_memtype+0xdf/0x2f0
    [] set_memory_uc+0x49/0xa0
    [] efi_enter_virtual_mode+0x1c2/0x3aa
    [] start_kernel+0x291/0x2f2
    [] ? loglevel+0x1b/0x1b
    [] i386_start_kernel+0xbf/0xc8

    A better approach to this problem is to map the memory region
    with the correct attributes from the start, instead of modifying
    it after the fact. The uncached case can be handled by
    ioremap_nocache() and the cached by ioremap_cache().

    Despite first impressions, it's not possible to use
    ioremap_cache() to map all cached memory regions on
    CONFIG_X86_64 because EFI_RUNTIME_SERVICES_DATA regions really
    don't like being mapped into the vmalloc space, as detailed in
    the following bug report,

    https://bugzilla.redhat.com/show_bug.cgi?id=748516

    Therefore, we need to ensure that any EFI_RUNTIME_SERVICES_DATA
    regions are covered by the direct kernel mapping table on
    CONFIG_X86_64. To accomplish this we now map E820_RESERVED_EFI
    regions via the direct kernel mapping with the initial call to
    init_memory_mapping() in setup_arch(), whereas previously these
    regions wouldn't be mapped if they were after the last E820_RAM
    region until efi_ioremap() was called. Doing it this way allows
    us to delete efi_ioremap() completely.

    Signed-off-by: Matt Fleming
    Cc: H. Peter Anvin
    Cc: Matthew Garrett
    Cc: Zhang Rui
    Cc: Huang Ying
    Cc: Linus Torvalds
    Cc: Andrew Morton
    Link: http://lkml.kernel.org/r/1321621751-3650-1-git-send-email-matt@console-pimps.org
    Signed-off-by: Ingo Molnar

    Matt Fleming
     
  • Prior to commit eaf35b1, cifs_save_resume_key had some NULL pointer
    checks at the top. It turns out that at least one of those NULL
    pointer checks is needed after all.

    When the LastNameOffset in a FIND reply appears to be beyond the end of
    the buffer, CIFSFindFirst and CIFSFindNext will set srch_inf.last_entry
    to NULL. Since eaf35b1, the code will now oops in this situation.

    Fix this by having the callers check for a NULL last entry pointer
    before calling cifs_save_resume_key. No change is needed for the
    call site in cifs_readdir as it's not reachable with a NULL
    current_entry pointer.

    This should fix:

    https://bugzilla.redhat.com/show_bug.cgi?id=750247

    Cc: stable@vger.kernel.org
    Cc: Christoph Hellwig
    Reported-by: Adam G. Metzler
    Signed-off-by: Jeff Layton
    Signed-off-by: Steve French

    Jeff Layton
     
  • In the recent overhaul of the demultiplex thread receive path, I
    neglected to ensure that we attempt to freeze on each pass through the
    receive loop.

    Reported-and-Tested-by: Woody Suwalski
    Reported-and-Tested-by: Adam Williamson
    Signed-off-by: Jeff Layton
    Signed-off-by: Steve French

    Jeff Layton
     
  • Fix sparse endian check warning while calling cifs_strtoUCS

    CHECK fs/cifs/smbencrypt.c
    fs/cifs/smbencrypt.c:216:37: warning: incorrect type in argument 1
    (different base types)
    fs/cifs/smbencrypt.c:216:37: expected restricted __le16 [usertype] *
    fs/cifs/smbencrypt.c:216:37: got unsigned short *

    Signed-off-by: Steve French
    Acked-by: Shirish Pargaonkar <shirishpargaonkar@gmail.com>

    Steve French
     
  • Signed-off-by: Pavel Shilovsky
    Signed-off-by: Steve French

    Pavel Shilovsky
     
  • NULL pointer access causes crash in raid5 module.

    Signed-off-by: Adam Kwolek
    Signed-off-by: NeilBrown

    Adam Kwolek
     
  • * 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    alarmtimers: Fix time comparison
    ptp: Fix clock_getres() implementation

    Linus Torvalds
     
  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
    Btrfs: drop spin lock when memory alloc fails
    Btrfs: check if the to-be-added device is writable
    Btrfs: try cluster but don't advance in search list
    Btrfs: try to allocate from cluster even at LOOP_NO_EMPTY_SIZE

    Linus Torvalds
     
  • * 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: (28 commits)
    ARM: sa1100: fix build error
    ARM: OMAP1: recalculate loops per jiffy after dpll1 reprogram
    ARM: davinci: dm365 evm: align nand partition table to u-boot
    ARM: davinci: da850 evm: change audio edma event queue to EVENTQ_0
    ARM: davinci: dm646x evm: wrong register used in setup_vpif_input_channel_mode
    ARM: davinci: dm646x does not have a DSP domain
    ARM: davinci: psc: fix incorrect offsets
    ARM: davinci: psc: fix incorrect mask
    ARM: mx28: LRADC macro rename
    arm: mx23: recognise stmp378x as mx23
    ARM: mxs: fix machines' initializers order
    ARM: mxs/tx28: add __initconst for fec pdata
    ARM: S3C64XX: Staticise s3c6400_sysclass
    ARM: S3C64XX: Add linux/export.h to dev-spi.c
    ARM: S3C64XX: Remove extern from definition of framebuffer setup call
    MAINTAINERS: Extend Samsung patterns to cover SPI and ASoC drivers
    MAINTAINERS: Add linux-samsung-soc mailing list for Samsung
    MAINTAINERS: Consolidate Samsung MAINTAINERS
    ARM: CSR: PM: fix build error due to undeclared 'THIS_MODULE'
    ARM: CSR: fix build error due to new mdesc->dma_zone_size
    ...

    Linus Torvalds
     
  • Current tomoyo_realpath_from_path() implementation returns strange pathname
    when calculating pathname of a file which belongs to lazy unmounted tree.
    Use local pathname rather than strange absolute pathname in that case.

    Also, this patch fixes a regression introduced by commit 02125a82 "fix
    apparmor dereferencing potentially freed dentry, sanitize __d_path() API".

    Signed-off-by: Tetsuo Handa
    Acked-by: Al Viro
    Cc: stable@vger.kernel.org
    Signed-off-by: Linus Torvalds

    Tetsuo Handa
     
  • When HPET is operating in RTC mode, the TN_ENABLE bit on timer1
    controls whether the HPET or the RTC delivers interrupts to irq8. When
    the system goes into suspend, the RTC driver sends a signal to the
    HPET driver so that the HPET releases control of irq8, allowing the
    RTC to wake the system from suspend. The switchover is accomplished by
    a write to the HPET configuration registers which currently only
    occurs while servicing the HPET interrupt.

    On some systems, I have seen the system suspend before an HPET
    interrupt occurs, preventing the write to the HPET configuration
    register and leaving the HPET in control of the irq8. As the HPET is
    not active during suspend, it does not generate a wake signal and RTC
    alarms do not work.

    This patch forces the HPET driver to immediately transfer control of
    the irq8 channel to the RTC instead of waiting until the next
    interrupt event.

    Signed-off-by: Mark Langsdorf
    Link: http://lkml.kernel.org/r/20111118153306.GB16319@alberich.amd.com
    Tested-by: Andreas Herrmann
    Signed-off-by: Andreas Herrmann
    Signed-off-by: Thomas Gleixner
    Cc: stable@vger.kernel.org

    Mark Langsdorf
     

08 Dec, 2011

1 commit