15 Jan, 2012

1 commit

  • Autogenerated GPG tag for Rusty D1ADB8F1: 15EE 8D6C AB0E 7F0C F999 BFCB D920 0E6C D1AD B8F1

    * tag 'for-linus' of git://github.com/rustyrussell/linux:
    module_param: check that bool parameters really are bool.
    intelfbdrv.c: bailearly is an int module_param
    paride/pcd: fix bool verbose module parameter.
    module_param: make bool parameters really bool (drivers & misc)
    module_param: make bool parameters really bool (arch)
    module_param: make bool parameters really bool (core code)
    kernel/async: remove redundant declaration.
    printk: fix unnecessary module_param_name.
    lirc_parallel: fix module parameter description.
    module_param: avoid bool abuse, add bint for special cases.
    module_param: check type correctness for module_param_array
    modpost: use linker section to generate table.
    modpost: use a table rather than a giant if/else statement.
    modules: sysfs - export: taint, coresize, initsize
    kernel/params: replace DEBUGP with pr_debug
    module: replace DEBUGP with pr_debug
    module: struct module_ref should contains long fields
    module: Fix performance regression on modules with large symbol tables
    module: Add comments describing how the "strmap" logic works

    Fix up conflicts in scripts/mod/file2alias.c due to the new linker-
    generated table approach to adding __mod_*_device_table entries. The
    ARM sa11x0 mcp bus needed to be converted to that too.

    Linus Torvalds
     

13 Jan, 2012

2 commits

  • node_to_cpumask() has been replaced by cpumask_of_node(), and wholly
    removed since commit 29c337a0 ("cpumask: remove obsolete node_to_cpumask
    now everyone uses cpumask_of_node").

    So update the comments for setup_node_to_cpumask_map().

    Signed-off-by: Wanlong Gao
    Acked-by: Rusty Russell
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Wanlong Gao
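
    For reference, a minimal sketch of the replacement API mentioned
    above: cpumask_of_node() instead of the removed node_to_cpumask().
    The helper walk_node_cpus() below is hypothetical and only
    illustrates the call; cpumask_of_node() and for_each_cpu() are the
    real interfaces.

        #include <linux/cpumask.h>
        #include <linux/printk.h>
        #include <linux/topology.h>

        /* Hypothetical helper: walk the CPUs of a NUMA node via the
         * cpumask_of_node() API that replaced node_to_cpumask(). */
        static void walk_node_cpus(int nid)
        {
                const struct cpumask *mask = cpumask_of_node(nid);
                int cpu;

                for_each_cpu(cpu, mask)
                        pr_info("node %d has cpu %d\n", nid, cpu);
        }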
     
  • module_param(bool) used to counter-intuitively take an int. In
    fddd5201 (mid-2009) we allowed bool or int/unsigned int using a messy
    trick.

    It's time to remove the int/unsigned int option. For this version
    it'll simply give a warning, but it'll break in the next kernel
    version. (A minimal example of a real bool parameter follows below.)

    Signed-off-by: Rusty Russell

    Rusty Russell
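
    A minimal, hypothetical module illustrating what the stricter check
    above enforces: the variable backing a bool parameter must really be
    a C bool (int-backed special cases should use the new bint type
    instead). The module and parameter names are made up.

        #include <linux/init.h>
        #include <linux/module.h>
        #include <linux/moduleparam.h>

        /* A genuine bool backing a bool module parameter. */
        static bool verbose;
        module_param(verbose, bool, 0644);
        MODULE_PARM_DESC(verbose, "Enable verbose logging");

        static int __init boolparam_demo_init(void)
        {
                pr_info("boolparam_demo: verbose=%d\n", verbose);
                return 0;
        }

        static void __exit boolparam_demo_exit(void)
        {
        }

        module_init(boolparam_demo_init);
        module_exit(boolparam_demo_exit);
        MODULE_LICENSE("GPL");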
     

12 Jan, 2012

1 commit

  • * 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    x86/numa: Add constraints check for nid parameters
    mm, x86: Remove debug_pagealloc_enabled
    x86/mm: Initialize high mem before free_all_bootmem()
    arch/x86/kernel/e820.c: quiet sparse noise about plain integer as NULL pointer
    arch/x86/kernel/e820.c: Eliminate bubble sort from sanitize_e820_map()
    x86: Fix mmap random address range
    x86, mm: Unify zone_sizes_init()
    x86, mm: Prepare zone_sizes_init() for unification
    x86, mm: Use max_low_pfn for ZONE_NORMAL on 64-bit
    x86, mm: Wrap ZONE_DMA32 with CONFIG_ZONE_DMA32
    x86, mm: Use max_pfn instead of highend_pfn
    x86, mm: Move zone init from paging_init() on 64-bit
    x86, mm: Use MAX_DMA_PFN for ZONE_DMA on 32-bit

    Linus Torvalds
     

07 Jan, 2012

3 commits

  • * 'x86-cleanups-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    x86: Use "do { } while(0)" for empty lock_cmos()/unlock_cmos() macros
    x86: Use "do { } while(0)" for empty flush_tlb_fix_spurious_fault() macro
    x86, CPU: Drop superfluous get_cpu_cap() prototype
    arch/x86/mm/pageattr.c: Quiet sparse noise; local functions should be static
    arch/x86/kernel/ptrace.c: Quiet sparse noise
    x86: Use kmemdup() in copy_thread(), rather than duplicating its implementation
    x86: Replace the EVT_TO_HPET_DEV() macro with an inline function

    Linus Torvalds
     
  • * 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (21 commits)
    x86: Fix atomic64_xxx_cx8() functions
    x86: Fix and improve cmpxchg_double{,_local}()
    x86_64, asm: Optimise fls(), ffs() and fls64()
    x86, bitops: Move fls64.h inside __KERNEL__
    x86: Fix and improve percpu_cmpxchg{8,16}b_double()
    x86: Report cpb and eff_freq_ro flags correctly
    x86/i386: Use less assembly in strlen(), speed things up a bit
    x86: Use the same node_distance for 32 and 64-bit
    x86: Fix rflags in FAKE_STACK_FRAME
    x86: Clean up and extend do_int3()
    x86: Call do_notify_resume() with interrupts enabled
    x86/div64: Add a micro-optimization shortcut if base is power of two
    x86-64: Cleanup some assembly entry points
    x86-64: Slightly shorten line system call entry and exit paths
    x86-64: Reduce amount of redundant code generated for invalidate_interruptNN
    x86-64: Slightly shorten int_ret_from_sys_call
    x86, efi: Convert efi_phys_get_time() args to physical addresses
    x86: Default to vsyscall=emulate
    x86-64: Set siginfo and context on vsyscall emulation faults
    x86: consolidate xchg and xadd macros
    ...

    Linus Torvalds
     
  • * 'x86-apic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    x86: Skip cpus with apic-ids >= 255 in !x2apic_mode
    x86, x2apic: Allow "nox2apic" to disable x2apic mode setup by BIOS
    x86, x2apic: Fallback to xapic when BIOS doesn't setup interrupt-remapping
    x86, acpi: Skip acpi x2apic entries if the x2apic feature is not present
    x86, apic: Add probe() for apic_flat
    x86: Simplify code by removing a !SMP #ifdefs from 'struct cpuinfo_x86'
    x86: Convert per-cpu counter icr_read_retry_count into a member of irq_stat
    x86: Add per-cpu stat counter for APIC ICR read tries
    pci, x86/io-apic: Allow PCI_IOAPIC to be user configurable on x86
    x86: Fix the !CONFIG_NUMA build of the new CPU ID fixup code support
    x86: Add NumaChip support
    x86: Add x86_init platform override to fix up NUMA core numbering
    x86: Make flat_init_apic_ldr() available

    Linus Torvalds
     

24 Dec, 2011

1 commit

  • If the x2apic feature is not present (either the cpu is not capable of it
    or the user has disabled the feature using boot-parameter etc), ignore the
    x2apic MADT and SRAT entries provided by the ACPI tables.

    Signed-off-by: Yinghai Lu
    Link: http://lkml.kernel.org/r/20111222014632.540896503@sbsiddha-desk.sc.intel.com
    Signed-off-by: Suresh Siddha
    Signed-off-by: H. Peter Anvin

    Yinghai Lu
     

09 Dec, 2011

2 commits

  • With the 3.2-rc kernel, IOMMU 2M pages in KVM work. But when I tried
    to use IOMMU 1GB pages in KVM, I encountered an oops and the 1GB page
    failed to be used.

    The root cause is that the 1GB page allocation path calls
    gup_huge_pud() while the 2M page path calls gup_huge_pmd(). When
    compound pages are used and the page is a tail page, gup_huge_pmd()
    increases _mapcount to record that the tail page is mapped, while
    gup_huge_pud() does not.

    So when the mapped page is released, the kernel oopses because the
    page was never marked mapped.

    This patch adds tail-page handling for compound pages on the 1GB huge
    page path, matching what the 2M path already does (see the sketch below).

    To reproduce:
    1. Add grub boot option: hugepagesz=1G hugepages=8
    2. mount -t hugetlbfs -o pagesize=1G hugetlbfs /dev/hugepages
    3. qemu-kvm -m 2048 -hda os-kvm.img -cpu kvm64 -smp 4 -mem-path /dev/hugepages
    -net none -device pci-assign,host=07:00.1

    kernel BUG at mm/swap.c:114!
    invalid opcode: 0000 [#1] SMP
    Call Trace:
    put_page+0x15/0x37
    kvm_release_pfn_clean+0x31/0x36
    kvm_iommu_put_pages+0x94/0xb1
    kvm_iommu_unmap_memslots+0x80/0xb6
    kvm_assign_device+0xba/0x117
    kvm_vm_ioctl_assigned_device+0x301/0xa47
    kvm_vm_ioctl+0x36c/0x3a2
    do_vfs_ioctl+0x49e/0x4e4
    sys_ioctl+0x5a/0x7c
    system_call_fastpath+0x16/0x1b
    RIP put_compound_page+0xd4/0x168

    Signed-off-by: Youquan Song
    Reviewed-by: Andrea Arcangeli
    Cc: Andi Kleen
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Youquan Song
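
    A hedged sketch of the idea described above, not the literal patch:
    when the 1GB (PUD) gup path collects the small pages backing a huge
    mapping, it must take a tail-page reference on compound tail pages,
    exactly as the 2M (PMD) path already does. The helper name
    record_huge_subpages() is hypothetical; get_huge_page_tail() is the
    tail-accounting helper the 2M path of that era uses.

        #include <linux/mm.h>

        /* Hypothetical helper mirroring the fixed gup_huge_pud() loop:
         * record each subpage and account tail pages so that a later
         * put_page() finds them properly marked as mapped. */
        static void record_huge_subpages(struct page *head,
                                         struct page **pages,
                                         int *nr, int npages)
        {
                struct page *page = head;
                int i;

                for (i = 0; i < npages; i++, page++) {
                        pages[(*nr)++] = page;
                        if (PageTail(page))
                                get_huge_page_tail(page); /* tail refcount */
                }
        }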
     
  • This patch adds constraint checks to the numa_set_distance()
    function.

    When the check triggers (which should not happen normally) it
    emits a warning and avoids a store to a negative index in the
    numa_distance[] array, i.e. it avoids memory corruption (see the
    sketch below).

    Negative ids can be passed when the pxm-to-nids mapping is not
    properly filled while parsing the SRAT.

    Signed-off-by: Petr Holasek
    Acked-by: David Rientjes
    Cc: Anton Arapov
    Link: http://lkml.kernel.org/r/20111208121640.GA2229@dhcp-27-244.brq.redhat.com
    Signed-off-by: Ingo Molnar

    Petr Holasek
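
    A hedged sketch of the constraint check described above; the
    numa_distance/numa_distance_cnt names follow the x86 NUMA code of
    that era, and the warning text is illustrative, not the literal
    wording of the patch.

        #include <linux/init.h>
        #include <linux/printk.h>
        #include <linux/types.h>

        /* File-scope state as in arch/x86/mm/numa.c (declared here only
         * so the sketch stands on its own). */
        static int numa_distance_cnt;
        static u8 *numa_distance;

        /* Sketch of the added guard: reject node ids that would index
         * outside numa_distance[] and thereby corrupt memory. */
        void __init numa_set_distance(int from, int to, int distance)
        {
                if (from >= numa_distance_cnt || to >= numa_distance_cnt ||
                    from < 0 || to < 0) {
                        pr_warn_once("NUMA: node ids out of bounds, "
                                     "from=%d to=%d distance=%d\n",
                                     from, to, distance);
                        return;
                }

                numa_distance[from * numa_distance_cnt + to] = distance;
        }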
     

06 Dec, 2011

5 commits

  • When (no)bootmem finishes its work, it passes pages to the buddy
    allocator. Since debug_pagealloc_enabled is not set at that point,
    we do not protect the pages, which is not what we want with
    CONFIG_DEBUG_PAGEALLOC=y.

    To fix this, remove debug_pagealloc_enabled. That variable was
    introduced by commit 12d6f21e ("x86: do not PSE on
    CONFIG_DEBUG_PAGEALLOC=y") to get more CPA (change page
    attribute) code testing. But we now have CONFIG_CPA_DEBUG,
    which tests CPA.

    Signed-off-by: Stanislaw Gruszka
    Acked-by: Mel Gorman
    Cc: linux-mm@kvack.org
    Link: http://lkml.kernel.org/r/1322582711-14571-1-git-send-email-sgruszka@redhat.com
    Signed-off-by: Ingo Molnar

    Stanislaw Gruszka
     
  • This patch fixes a boot crash with pagealloc debugging enabled:

    Initializing HighMem for node 0 (000377fe:0003fff0)
    BUG: unable to handle kernel paging request at f6fefe80
    IP: [] find_range_array+0x5e/0x69
    [...]
    Call Trace:
    [] __get_free_all_memory_range+0x39/0xb4
    [] add_highpages_with_active_regions+0x18/0x9b
    [] set_highmem_pages_init+0x70/0x90
    [] mem_init+0x50/0x21b
    [] start_kernel+0x1bf/0x31c
    [] i386_start_kernel+0x65/0x67

    The crash happens when memblock wants to allocate big area for
    temporary "struct range" array and reuses pages from top of low
    memory, which were already passed to the buddy allocator.

    Reported-by: Ingo Molnar
    Signed-off-by: Stanislaw Gruszka
    Cc: linux-mm@kvack.org
    Cc: Mel Gorman
    Link: http://lkml.kernel.org/r/20111206080833.GB3105@redhat.com
    Signed-off-by: Ingo Molnar

    Stanislaw Gruszka
     
  • Local functions should be marked static.  This also quiets the
    following sparse noise:

    warning: symbol '_set_memory_array' was not declared. Should it be static?

    Signed-off-by: H Hartley Sweeten
    Signed-off-by: Andrew Morton
    Cc: hartleys@visionengravers.com
    Signed-off-by: Ingo Molnar

    H Hartley Sweeten
     
  • On x86_32 casting the unsigned int result of get_random_int() to
    long may result in a negative value. On x86_32 the range of
    mmap_rnd() therefore was -255 to 255. The 32bit mode on x86_64
    used 0 to 255 as intended.

    The bug was introduced by 675a081 ("x86: unify mmap_{32|64}.c")
    in January 2008.

    Signed-off-by: Ludwig Nussel
    Cc: Linus Torvalds
    Cc: harvey.harrison@gmail.com
    Cc: "H. Peter Anvin"
    Cc: Harvey Harrison
    Signed-off-by: Andrew Morton
    Link: http://lkml.kernel.org/r/201111152246.pAFMklOB028527@wpaz5.hot.corp.google.com
    Signed-off-by: Ingo Molnar

    Ludwig Nussel
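
    The signedness issue is easy to demonstrate in plain C. The
    stand-alone sketch below mimics the old calculation; fake_random()
    stands in for get_random_int(). On a 32-bit build, where long is 32
    bits, the cast can go negative before the modulo, yielding -255..255
    instead of 0..255.

        #include <stdio.h>

        /* Stand-in for get_random_int(): any value with the top bit set. */
        static unsigned int fake_random(void)
        {
                return 0xdeadbeefu;
        }

        int main(void)
        {
                /* Old pattern: with a 32-bit long the cast is negative and
                 * the modulo keeps the sign (prints -17 on x86_32). */
                long buggy = (long)fake_random() % 256;

                /* Keeping the arithmetic unsigned restores 0..255. */
                unsigned long fixed = fake_random() % 256;

                printf("buggy=%ld fixed=%lu\n", buggy, fixed);
                return 0;
        }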
     
  • Fix an outstanding issue that has been reported since 2.6.37.
    Under a heavily loaded machine, processing "fork()" calls could
    crash with:

    BUG: unable to handle kernel paging request at f573fc8c
    IP: [] swap_count_continued+0x104/0x180
    *pdpt = 000000002a3b9027 *pde = 0000000001bed067 *pte = 0000000000000000 Oops: 0000 [#1] SMP
    Modules linked in:
    Pid: 1638, comm: apache2 Not tainted 3.0.4-linode37 #1
    EIP: 0061:[] EFLAGS: 00210246 CPU: 3
    EIP is at swap_count_continued+0x104/0x180
    .. snip..
    Call Trace:
    [] ? __swap_duplicate+0xc2/0x160
    [] ? pte_mfn_to_pfn+0x87/0xe0
    [] ? swap_duplicate+0x14/0x40
    [] ? copy_pte_range+0x45b/0x500
    [] ? copy_page_range+0x195/0x200
    [] ? dup_mmap+0x1c6/0x2c0
    [] ? dup_mm+0xa8/0x130
    [] ? copy_process+0x98a/0xb30
    [] ? do_fork+0x4f/0x280
    [] ? getnstimeofday+0x43/0x100
    [] ? sys_clone+0x30/0x40
    [] ? ptregs_clone+0x15/0x48
    [] ? syscall_call+0x7/0xb

    The problem is that in copy_page_range() we turn lazy mode on,
    and then in swap_entry_free() we call swap_count_continued()
    which ends up in:

    map = kmap_atomic(page, KM_USER0) + offset;

    and then later we touch *map.

    Since we are running in batched (lazy) mode, the PTE mapping set up
    by kmap_atomic() is not flushed synchronously, so we end up trying
    to dereference a page whose mapping has not actually been installed
    yet.

    Looking at kmap_atomic_prot_pfn(), it uses
    'arch_flush_lazy_mmu_mode' and doing the same in
    kmap_atomic_prot() and __kunmap_atomic() makes the problem go
    away.

    Interestingly, commit b8bcfe997e4615 ("x86/paravirt: remove lazy
    mode in interrupts") removed part of this to fix an interrupt
    issue - but it went too far and did not consider this scenario.

    Signed-off-by: Konrad Rzeszutek Wilk
    Cc: Peter Zijlstra
    Cc: Jeremy Fitzhardinge
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Ingo Molnar

    Konrad Rzeszutek Wilk
     

05 Dec, 2011

1 commit

  • To make this work, we teach the page fault handler how to send
    signals on failed uaccess. This only works for user addresses
    (kernel addresses will never hit the page fault handler in the
    first place), so we need to generate signals for those
    separately.

    This gets the tricky case right: if the user buffer spans
    multiple pages and only the second page is invalid, we set
    cr2 and si_addr correctly. UML relies on this behavior to
    "fault in" pages as needed.

    We steal a bit from thread_info.uaccess_err to enable this.
    Before this change, uaccess_err was a 32-bit boolean value.

    This fixes issues with UML when vsyscall=emulate.

    Reported-by: Adrian Bunk
    Signed-off-by: Andy Lutomirski
    Cc: richard -rw- weinberger
    Cc: H. Peter Anvin
    Cc: Linus Torvalds
    Link: http://lkml.kernel.org/r/4c8f91de7ec5cd2ef0f59521a04e1015f11e42b4.1320712291.git.luto@amacapital.net
    Signed-off-by: Ingo Molnar

    Andy Lutomirski
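
    A hedged illustration of the "steal a bit" idea above; the struct
    and field names are illustrative only, not the exact kernel layout.

        /* Before: a whole 32-bit int used as a boolean. */
        struct thread_info_before_sketch {
                int uaccess_err;                  /* uaccess failed */
        };

        /* After: uaccess_err shrinks to one bit, freeing another bit
         * that asks the page fault handler to send a signal when a
         * uaccess to a user address faults (used by the vsyscall
         * emulation path). */
        struct thread_info_after_sketch {
                unsigned int uaccess_err:1;
                unsigned int sig_on_uaccess_err:1;
        };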
     

29 Nov, 2011

1 commit

  • Conflicts & resolutions:

    * arch/x86/xen/setup.c

    dc91c728fd "xen: allow extra memory to be in multiple regions"
    24aa07882b "memblock, x86: Replace memblock_x86_reserve/free..."

    conflicted on xen_add_extra_mem() updates. The resolution is
    trivial as the latter just wants to replace
    memblock_x86_reserve_range() with memblock_reserve().

    * drivers/pci/intel-iommu.c

    166e9278a3f "x86/ia64: intel-iommu: move to drivers/iommu/"
    5dfe8660a3d "bootmem: Replace work_with_active_regions() with..."

    conflicted as the former moved the file under drivers/iommu/.
    Resolved by applying the changes from the latter to the moved
    file.

    * mm/Kconfig

    6661672053a "memblock: add NO_BOOTMEM config symbol"
    c378ddd53f9 "memblock, x86: Make ARCH_DISCARD_MEMBLOCK a config option"

    conflicted trivially. Both added config options. Just
    letting both add their own options resolves the conflict.

    * mm/memblock.c

    d1f0ece6cdc "mm/memblock.c: small function definition fixes"
    ed7b56a799c "memblock: Remove memblock_memory_can_coalesce()"

    conflicted. The former updates a function that the latter
    removes. The resolution is trivial.

    Signed-off-by: Tejun Heo

    Tejun Heo
     

11 Nov, 2011

7 commits

  • Now that zone_sizes_init() is identical on 32-bit and 64-bit,
    move the code to arch/x86/mm/init.c and use it for both
    architectures.

    Acked-by: Tejun Heo
    Acked-by: Yinghai Lu
    Signed-off-by: Pekka Enberg
    Link: http://lkml.kernel.org/r/1320155902-10424-7-git-send-email-penberg@kernel.org
    Signed-off-by: Ingo Molnar

    Pekka Enberg
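
    A hedged sketch of the shape of the unified zone_sizes_init() this
    series builds toward, stitched together from the commits above
    (ZONE_DMA capped at MAX_DMA_PFN, ZONE_DMA32 wrapped in
    CONFIG_ZONE_DMA32, ZONE_NORMAL at max_low_pfn, highmem at max_pfn);
    the exact guards in arch/x86/mm/init.c may differ slightly.

        #include <linux/bootmem.h>   /* max_low_pfn, max_pfn */
        #include <linux/init.h>
        #include <linux/mm.h>
        #include <linux/string.h>
        #include <asm/dma.h>         /* MAX_DMA_PFN, MAX_DMA32_PFN */

        void __init zone_sizes_init(void)
        {
                unsigned long max_zone_pfns[MAX_NR_ZONES];

                memset(max_zone_pfns, 0, sizeof(max_zone_pfns));

        #ifdef CONFIG_ZONE_DMA
                max_zone_pfns[ZONE_DMA]     = MAX_DMA_PFN;
        #endif
        #ifdef CONFIG_ZONE_DMA32
                max_zone_pfns[ZONE_DMA32]   = MAX_DMA32_PFN;
        #endif
                /* max_low_pfn == max_pfn on 64-bit (no highmem). */
                max_zone_pfns[ZONE_NORMAL]  = max_low_pfn;
        #ifdef CONFIG_HIGHMEM
                max_zone_pfns[ZONE_HIGHMEM] = max_pfn;
        #endif

                free_area_init_nodes(max_zone_pfns);
        }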
     
  • Make 32-bit and 64-bit zone_sizes_init() identical in
    preparation for unification.

    Acked-by: Tejun Heo
    Acked-by: Yinghai Lu
    Signed-off-by: Pekka Enberg
    Link: http://lkml.kernel.org/r/1320155902-10424-6-git-send-email-penberg@kernel.org
    Signed-off-by: Ingo Molnar

    Pekka Enberg
     
  • 64-bit has no highmem so max_low_pfn is always the same as
    'max_pfn'.

    Acked-by: Tejun Heo
    Acked-by: Yinghai Lu
    Acked-by: David Rientjes
    Signed-off-by: Pekka Enberg
    Link: http://lkml.kernel.org/r/1320155902-10424-5-git-send-email-penberg@kernel.org
    Signed-off-by: Ingo Molnar

    Pekka Enberg
     
  • In preparation for unifying 32-bit and 64-bit zone_sizes_init()
    make sure ZONE_DMA32 is wrapped in CONFIG_ZONE_DMA32.

    Acked-by: Tejun Heo
    Acked-by: Yinghai Lu
    Acked-by: David Rientjes
    Acked-by: Arun Sharma
    Signed-off-by: Pekka Enberg
    Link: http://lkml.kernel.org/r/1320155902-10424-4-git-send-email-penberg@kernel.org
    Signed-off-by: Ingo Molnar

    Pekka Enberg
     
  • The 'highend_pfn' variable is always set to 'max_pfn' so just
    use the latter directly.

    Acked-by: Tejun Heo
    Acked-by: Yinghai Lu
    Signed-off-by: Pekka Enberg
    Link: http://lkml.kernel.org/r/1320155902-10424-3-git-send-email-penberg@kernel.org
    Signed-off-by: Ingo Molnar

    Pekka Enberg
     
  • This patch introduces a zone_sizes_init() helper function on
    64-bit to make it more similar to 32-bit init.

    Acked-by: Tejun Heo
    Acked-by: Yinghai Lu
    Acked-by: David Rientjes
    Signed-off-by: Pekka Enberg
    Link: http://lkml.kernel.org/r/1320155902-10424-2-git-send-email-penberg@kernel.org
    Signed-off-by: Ingo Molnar

    Pekka Enberg
     
  • Use MAX_DMA_PFN which represents the 16 MB ISA DMA limit on
    32-bit x86 just like we do on 64-bit.

    Acked-by: Tejun Heo
    Acked-by: Yinghai Lu
    Acked-by: David Rientjes
    Signed-off-by: Pekka Enberg
    Link: http://lkml.kernel.org/r/1320155902-10424-1-git-send-email-penberg@kernel.org
    Signed-off-by: Ingo Molnar

    Pekka Enberg
     

03 Nov, 2011

2 commits

  • This avoids duplicating the function in every arch gup_fast.

    Signed-off-by: Andrea Arcangeli
    Cc: Peter Zijlstra
    Cc: Hugh Dickins
    Cc: Johannes Weiner
    Cc: Rik van Riel
    Cc: Mel Gorman
    Cc: KOSAKI Motohiro
    Cc: Benjamin Herrenschmidt
    Cc: David Gibson
    Cc: Martin Schwidefsky
    Cc: Heiko Carstens
    Cc: David Miller
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrea Arcangeli
     
  • Michel, while working on the working set estimation code, noticed that
    calling get_page_unless_zero() on a random pfn_to_page(random_pfn)
    wasn't safe if the pfn ended up being a tail page of a transparent
    hugepage being split by __split_huge_page_refcount().

    He then found the problem could also theoretically materialize with
    page_cache_get_speculative() during the speculative radix tree lookups
    that use get_page_unless_zero() in SMP, if the radix tree page is freed
    and reallocated and get_user_pages is called on it before
    page_cache_get_speculative has a chance to call get_page_unless_zero().

    So the best way to fix the problem is to keep page_tail->_count zero at
    all times. This will guarantee that get_page_unless_zero() can never
    succeed on any tail page. page_tail->_mapcount is guaranteed zero and
    is unused for all tail pages of a compound page, so we can simply
    account the tail page references there and transfer them to
    tail_page->_count in __split_huge_page_refcount() (in addition to the
    head_page->_mapcount).

    While debugging this s/_count/_mapcount/ change I also noticed that
    get_page() is called by direct-io.c on pages returned by
    get_user_pages(). That wasn't entirely safe because the two
    atomic_inc() calls in get_page() weren't atomic with respect to each
    other. Other get_user_pages() users, such as the secondary-MMU page
    fault handlers that establish shadow pagetables, never call a
    superfluous get_page() after get_user_pages() returns. It's safer to
    make get_page() universally safe for tail pages and to use
    get_page_foll() within follow_page() (inside get_user_pages()).
    get_page_foll() can safely do the refcounting for tail pages without
    taking any locks because it runs within PT-lock-protected critical
    sections (the PT lock for pte, page_table_lock for pmd_trans_huge).

    The standard get_page() as invoked by direct-io will now take the
    compound_lock instead, but still only for tail pages. The direct-io
    paths are usually I/O bound and the compound_lock is per THP, so it
    is very fine-grained and there is no risk of scalability issues with
    it. A simple direct-io benchmark with all the lockdep, prove-locking
    and spinlock debugging infrastructure enabled shows identical
    performance and no overhead. So it's worth it. Ideally direct-io
    should stop calling get_page() on pages returned by
    get_user_pages(). The spinlock in get_page() is already optimized
    away for no-THP builds, but doing get_page() on tail pages returned
    by GUP is generally a rare operation and usually only run in I/O
    paths.

    This new refcounting on page_tail->_mapcount, in addition to avoiding
    new RCU critical sections, will also allow the working set estimation
    code to work without any further complexity associated with the tail
    page refcounting under THP (see the sketch below).

    Signed-off-by: Andrea Arcangeli
    Reported-by: Michel Lespinasse
    Reviewed-by: Michel Lespinasse
    Reviewed-by: Minchan Kim
    Cc: Peter Zijlstra
    Cc: Hugh Dickins
    Cc: Johannes Weiner
    Cc: Rik van Riel
    Cc: Mel Gorman
    Cc: KOSAKI Motohiro
    Cc: Benjamin Herrenschmidt
    Cc: David Gibson
    Cc:
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrea Arcangeli
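
    A hedged sketch of the refcounting rule described above, not the
    literal mm/ code: locking is omitted (the real get_page_foll() relies
    on the PT lock, and get_page() takes the compound_lock for tail
    pages), and the function name is illustrative.

        #include <linux/mm.h>

        /* Sketch: tail pages keep _count at zero at all times, so
         * get_page_unless_zero() can never succeed on them; references
         * taken through follow_page()/GUP are recorded in the tail
         * page's otherwise-unused _mapcount plus the head page's _count,
         * and are transferred back in __split_huge_page_refcount(). */
        static void get_page_foll_sketch(struct page *page)
        {
                if (PageTail(page)) {
                        struct page *head = compound_head(page);

                        atomic_inc(&head->_count);    /* pin the compound page */
                        atomic_inc(&page->_mapcount); /* remember the tail ref */
                } else {
                        atomic_inc(&page->_count);
                }
        }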
     

28 Oct, 2011

2 commits

  • * 'x86-vdso-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    x86-64, doc: Remove int 0xcc from entry_64.S documentation
    x86, vsyscall: Add missing <asm/fixmap.h> to arch/x86/mm/fault.c

    Fix up trivial conflicts in arch/x86/mm/fault.c (asm/fixmap.h vs
    asm/vsyscall.h: both work, which to use? Whatever..)

    Linus Torvalds
     
  • * 'x86-cpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    x86, amd: Include linux/elf.h since we use stuff from asm/elf.h
    x86: cache_info: Update calculation of AMD L3 cache indices
    x86: cache_info: Kill the atomic allocation in amd_init_l3_cache()
    x86: cache_info: Kill the moronic shadow struct
    x86: cache_info: Remove bogus free of amd_l3_cache data
    x86, amd: Include elf.h explicitly, prepare the code for the module.h split
    x86-32, amd: Move va_align definition to unbreak 32-bit build
    x86, amd: Move BSP code to cpu_dev helper
    x86: Add a BSP cpu_dev helper
    x86, amd: Avoid cache aliasing penalties on AMD family 15h

    Linus Torvalds
     

25 Oct, 2011

1 commit

  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (59 commits)
    MAINTAINERS: linux-m32r is moderated for non-subscribers
    linux@lists.openrisc.net is moderated for non-subscribers
    Drop default from "DM365 codec select" choice
    parisc: Kconfig: cleanup Kernel page size default
    Kconfig: remove redundant CONFIG_ prefix on two symbols
    cris: remove arch/cris/arch-v32/lib/nand_init.S
    microblaze: add missing CONFIG_ prefixes
    h8300: drop puzzling Kconfig dependencies
    MAINTAINERS: microblaze-uclinux@itee.uq.edu.au is moderated for non-subscribers
    tty: drop superfluous dependency in Kconfig
    ARM: mxc: fix Kconfig typo 'i.MX51'
    Fix file references in Kconfig files
    aic7xxx: fix Kconfig references to READMEs
    Fix file references in drivers/ide/
    thinkpad_acpi: Fix printk typo 'bluestooth'
    bcmring: drop commented out line in Kconfig
    btmrvl_sdio: fix typo 'btmrvl_sdio_sd6888'
    doc: raw1394: Trivial typo fix
    CIFS: Don't free volume_info->UNC until we are entirely done with it.
    treewide: Correct spelling of successfully in comments
    ...

    Linus Torvalds
     

24 Oct, 2011

1 commit

  • Commit 4b239f458 ("x86-64, mm: Put early page table high") causes an S4
    regression since 2.6.39, namely the machine occasionally reboots at S4
    resume. It doesn't always happen, the overall rate is about 1/20, but,
    like other bugs, once it happens it keeps happening.

    This patch fixes the problem by essentially reverting to the older
    memory assignment.

    Signed-off-by: Takashi Iwai
    Cc:
    Cc: Rafael J. Wysocki
    Cc: Yinghai Lu
    [ We'll hopefully find the real fix, but that's too late for 3.1 now ]
    Signed-off-by: Linus Torvalds

    Takashi Iwai
     

29 Sep, 2011

1 commit

  • Erratum 93 applies to AMD K8 CPUs only, and its workaround
    (forcing the upper 32 bits of %rip to all get set under certain
    conditions) is actually getting in the way of analyzing page
    faults occurring during EFI physical mode runtime calls (in
    particular the page table walk shown is completely unrelated to
    the actual fault). This is because typically EFI runtime code
    lives in the space between 2G and 4G, which - modulo the above
    manipulation - is likely to overlap with the kernel or modules
    area.

    While even for the other errata workarounds their taking effect
    could be limited to just the affected CPUs, none of them appears
    to be destructive, and they're generally getting called only
    outside of performance critical paths, so they're being left
    untouched.

    Signed-off-by: Jan Beulich
    Link: http://lkml.kernel.org/r/4E835FE30200007800058464@nat28.tlf.novell.com
    Signed-off-by: Ingo Molnar

    Jan Beulich
     

16 Aug, 2011

2 commits

  • arch/x86/mm/fault.c now depends on having the symbol VSYSCALL_START
    defined, which is best handled by including <asm/fixmap.h> (it isn't
    unreasonable that we may want other fixed addresses in this file in
    the future, and so it is cleaner than including <asm/vsyscall.h>
    directly.)

    This addresses an x86-64 allnoconfig build failure. On other
    configurations it was masked by an indirect path:

    -> -> ->

    ... however, the first such include is conditional on CONFIG_X86_LOCAL_APIC.

    Originally-by: Randy Dunlap
    Cc: Linus Torvalds
    Link: http://lkml.kernel.org/r/CA%2B55aFxsOMc9=p02r8-QhJ=h=Mqwckk4_Pnx9LQt5%2BfqMp_exQ@mail.gmail.com
    Signed-off-by: H. Peter Anvin

    H. Peter Anvin
     
  • arch/x86/mm/fault.c needs to include asm/vsyscall.h to fix a
    build error:

    arch/x86/mm/fault.c: In function '__bad_area_nosemaphore':
    arch/x86/mm/fault.c:728: error: 'VSYSCALL_START' undeclared (first use in this function)

    Signed-off-by: Randy Dunlap
    Signed-off-by: Linus Torvalds

    Randy Dunlap
     

13 Aug, 2011

1 commit

  • * 'x86-vdso-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-tip:
    x86-64: Rework vsyscall emulation and add vsyscall= parameter
    x86-64: Wire up getcpu syscall
    x86: Remove unnecessary compile flag tweaks for vsyscall code
    x86-64: Add vsyscall:emulate_vsyscall trace event
    x86-64: Add user_64bit_mode paravirt op
    x86-64, xen: Enable the vvar mapping
    x86-64: Work around gold bug 13023
    x86-64: Move the "user" vsyscall segment out of the data segment.
    x86-64: Pad vDSO to a page boundary

    Linus Torvalds
     

11 Aug, 2011

1 commit

  • There are three choices:

    vsyscall=native: Vsyscalls are native code that issues the
    corresponding syscalls.

    vsyscall=emulate (default): Vsyscalls are emulated by instruction
    fault traps, tested in the bad_area path. The actual contents of
    the vsyscall page are the same as in the vsyscall=native case except
    that it's marked NX. This way programs that make assumptions about
    what the code in the page does will not be confused when they read
    that code.

    vsyscall=none: Trying to execute a vsyscall will segfault.

    Signed-off-by: Andy Lutomirski
    Link: http://lkml.kernel.org/r/8449fb3abf89851fd6b2260972666a6f82542284.1312988155.git.luto@mit.edu
    Signed-off-by: H. Peter Anvin

    Andy Lutomirski