26 Sep, 2014

1 commit

  • commit b13b1d2d8692b437203de7a404c6b809d2cc4d99 upstream.

    We use the accessed bit to age a page at page reclaim time,
    and currently we also flush the TLB when doing so.

    But in some workloads TLB flush overhead is very heavy. In my
    simple multithreaded app with a lot of swap to several pcie
    SSDs, removing the tlb flush gives about 20% ~ 30% swapout
    speedup.

    Fortunately just removing the TLB flush is a valid optimization:
    on x86 CPUs, clearing the accessed bit without a TLB flush
    doesn't cause data corruption.

    It could cause incorrect page aging and the (mistaken) reclaim of
    hot pages, but the chance of that should be relatively low.

    So as a performance optimization don't flush the TLB when
    clearing the accessed bit, it will eventually be flushed by
    a context switch or a VM operation anyway. [ In the rare
    event of it not getting flushed for a long time the delay
    shouldn't really matter because there's no real memory
    pressure for swapout to react to. ]
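
    A minimal sketch of what the x86 helper ends up doing after this change
    (an illustration based on the description above, not the literal diff):

    int ptep_clear_flush_young(struct vm_area_struct *vma,
                               unsigned long address, pte_t *ptep)
    {
            /*
             * Age the page by clearing the accessed bit, but skip the
             * TLB shootdown that used to follow: on x86 this can't
             * corrupt data, at worst a hot page gets aged too early.
             */
            return ptep_test_and_clear_young(vma, address, ptep);
    }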

    Suggested-by: Linus Torvalds
    Signed-off-by: Shaohua Li
    Acked-by: Rik van Riel
    Acked-by: Mel Gorman
    Acked-by: Hugh Dickins
    Acked-by: Johannes Weiner
    Cc: linux-mm@kvack.org
    Cc: Peter Zijlstra
    Link: http://lkml.kernel.org/r/20140408075809.GA1764@kernel.org
    [ Rewrote the changelog and the code comments. ]
    Signed-off-by: Ingo Molnar
    Signed-off-by: Mel Gorman
    Signed-off-by: Jiri Slaby

    Shaohua Li
     

10 Jul, 2013

1 commit

  • The old code accumulated addr to get the right pmd. However, pmds are now
    preallocated and transferred as a parameter, so there is no need to
    accumulate the addr variable any more; this patch removes it.

    Signed-off-by: Wanpeng Li
    Reviewed-by: Michal Hocko
    Reviewed-by: Zhang Yanfei
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Wanpeng Li
     

13 Apr, 2013

1 commit

  • This patch attempts to fix:

    https://bugzilla.kernel.org/show_bug.cgi?id=56461

    The symptom is a crash and messages like this:

    chrome: Corrupted page table at address 34a03000
    *pdpt = 0000000000000000 *pde = 0000000000000000
    Bad pagetable: 000f [#1] PREEMPT SMP

    Ingo guesses this got introduced by commit 611ae8e3f520 ("x86/tlb:
    enable tlb flush range support for x86") since that code started to free
    unused pagetables.

    On x86-32 PAE kernels, that new code has the potential to free an entire
    PMD page and will clear one of the four page-directory-pointer-table
    (aka pgd_t) entries.

    The hardware aggressively "caches" these top-level entries and invlpg
    does not actually affect the CPU's copy. If we clear one we *HAVE* to
    do a full TLB flush, otherwise we might continue using a freed pmd page.
    (note, we do this properly on the population side in pud_populate()).

    This patch tracks whenever we clear one of these entries in the 'struct
    mmu_gather', and ensures that we follow up with a full tlb flush.

    BTW, I disassembled and checked that:

    if (tlb->fullmm == 0)
    and
    if (!tlb->fullmm && !tlb->need_flush_all)

    generate essentially the same code, so there should be zero impact there
    to the !PAE case.
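
    A condensed sketch of the mechanism (member and helper names follow the
    x86 code of that era, but treat this as an illustration rather than the
    exact patch):

    struct mmu_gather {
            /* ... existing members ... */
            unsigned int need_flush_all;    /* a PAE top-level entry was cleared */
    };

    /*
     * Freeing a PMD page on x86-32 PAE clears one of the four
     * page-directory-pointer-table entries; invlpg won't drop the CPU's
     * cached copy, so request a full flush instead of a ranged one.
     */
    void ___pmd_free_tlb(struct mmu_gather *tlb, pmd_t *pmd)
    {
    #ifdef CONFIG_X86_PAE
            tlb->need_flush_all = 1;
    #endif
            tlb_remove_page(tlb, virt_to_page(pmd));
    }

    /* ...and the arch tlb_flush() path then becomes: */
    if (!tlb->fullmm && !tlb->need_flush_all)
            flush_tlb_mm_range(tlb->mm, tlb->start, tlb->end, 0UL);
    else
            flush_tlb_mm(tlb->mm);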

    Signed-off-by: Dave Hansen
    Cc: Peter Anvin
    Cc: Ingo Molnar
    Cc: Artem S Tashkinov
    Signed-off-by: Linus Torvalds

    Dave Hansen
     


17 Dec, 2012

1 commit

  • Pull Automatic NUMA Balancing bare-bones from Mel Gorman:
    "There are three implementations for NUMA balancing, this tree
    (balancenuma), numacore which has been developed in tip/master and
    autonuma which is in aa.git.

    In almost all respects balancenuma is the dumbest of the three because
    its main impact is on the VM side with no attempt to be smart about
    scheduling. In the interest of getting the ball rolling, it would be
    desirable to see this much merged for 3.8 with the view to building
    scheduler smarts on top and adapting the VM where required for 3.9.

    The most recent set of comparisons available from different people are

    mel: https://lkml.org/lkml/2012/12/9/108
    mingo: https://lkml.org/lkml/2012/12/7/331
    tglx: https://lkml.org/lkml/2012/12/10/437
    srikar: https://lkml.org/lkml/2012/12/10/397

    The results are a mixed bag. In my own tests, balancenuma does
    reasonably well. It's dumb as rocks and does not regress against
    mainline. On the other hand, Ingo's tests show that balancenuma is
    incapable of converging for these workloads driven by perf, which is bad
    but is potentially explained by the lack of scheduler smarts. Thomas'
    results show balancenuma improves on mainline but falls far short of
    numacore or autonuma. Srikar's results indicate we all suffer on a
    large machine with imbalanced node sizes.

    My own testing showed that recent numacore results have improved
    dramatically, particularly in the last week but not universally.
    We've butted heads heavily on system CPU usage and high levels of
    migration even when it shows that overall performance is better.
    There are also cases where it regresses. Of interest is that for
    specjbb in some configurations it will regress for lower numbers of
    warehouses and show gains for higher numbers, which is not reported by
    the tool by default and sometimes missed in reports. Recently I
    reported for numacore that the JVM was crashing with
    NullPointerExceptions, but currently it's unclear what the source of
    this problem is. Initially I thought it was in how numacore
    batch-handles PTEs, but I no longer think this is the case. It's possible
    numacore is just able to trigger it due to higher rates of migration.

    These reports were quite late in the cycle so I/we would like to start
    with this tree as it contains much of the code we can agree on and has
    not changed significantly over the last 2-3 weeks."

    * tag 'balancenuma-v11' of git://git.kernel.org/pub/scm/linux/kernel/git/mel/linux-balancenuma: (50 commits)
    mm/rmap, migration: Make rmap_walk_anon() and try_to_unmap_anon() more scalable
    mm/rmap: Convert the struct anon_vma::mutex to an rwsem
    mm: migrate: Account a transhuge page properly when rate limiting
    mm: numa: Account for failed allocations and isolations as migration failures
    mm: numa: Add THP migration for the NUMA working set scanning fault case build fix
    mm: numa: Add THP migration for the NUMA working set scanning fault case.
    mm: sched: numa: Delay PTE scanning until a task is scheduled on a new node
    mm: sched: numa: Control enabling and disabling of NUMA balancing if !SCHED_DEBUG
    mm: sched: numa: Control enabling and disabling of NUMA balancing
    mm: sched: Adapt the scanning rate if a NUMA hinting fault does not migrate
    mm: numa: Use a two-stage filter to restrict pages being migrated for unlikely tasknode relationships
    mm: numa: migrate: Set last_nid on newly allocated page
    mm: numa: split_huge_page: Transfer last_nid on tail page
    mm: numa: Introduce last_nid to the page frame
    sched: numa: Slowly increase the scanning period as NUMA faults are handled
    mm: numa: Rate limit setting of pte_numa if node is saturated
    mm: numa: Rate limit the amount of memory that is migrated between nodes
    mm: numa: Structures for Migrate On Fault per NUMA migration rate limiting
    mm: numa: Migrate pages handled during a pmd_numa hinting fault
    mm: numa: Migrate on reference policy
    ...

    Linus Torvalds
     

11 Dec, 2012

2 commits

  • Intel has an architectural guarantee that the TLB entry causing
    a page fault gets invalidated automatically. This means
    we should be able to drop the local TLB invalidation.

    Because of the way other areas of the page fault code work,
    chances are good that all x86 CPUs do this. However, if
    someone somewhere has an x86 CPU that does not invalidate
    the TLB entry causing a page fault, this one-liner should
    be easy to revert.
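
    Roughly, the pte helper ends up like this after this change and the
    local-only-flush change in the next entry (a sketch, not the exact
    upstream code):

    int ptep_set_access_flags(struct vm_area_struct *vma,
                              unsigned long address, pte_t *ptep,
                              pte_t entry, int dirty)
    {
            int changed = !pte_same(*ptep, entry);

            if (changed && dirty) {
                    *ptep = entry;
                    pte_update_defer(vma->vm_mm, address, ptep);
                    /*
                     * No TLB flush, local or remote: the faulting entry
                     * is dropped by the CPU itself, and a stale entry on
                     * another CPU only causes a spurious fault there.
                     */
            }

            return changed;
    }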

    Signed-off-by: Rik van Riel
    Cc: Linus Torvalds
    Cc: Andrew Morton
    Cc: Michel Lespinasse
    Cc: Peter Zijlstra
    Cc: Ingo Molnar

    Rik van Riel
     
  • The function ptep_set_access_flags() is only ever invoked to set access
    flags or add write permission on a PTE. The write bit is only ever set
    together with the dirty bit.

    Because we only ever upgrade a PTE, it is safe to skip flushing entries on
    remote TLBs. The worst that can happen is a spurious page fault on other
    CPUs, which would flush that TLB entry.

    Lazily letting another CPU incur a spurious page fault occasionally is
    (much!) cheaper than aggressively flushing everybody else's TLB.

    Signed-off-by: Rik van Riel
    Cc: Linus Torvalds
    Cc: Andrew Morton
    Cc: Peter Zijlstra
    Cc: Michel Lespinasse
    Cc: Ingo Molnar

    Rik van Riel
     


23 Nov, 2012

1 commit

  • If we have a write protection #PF and fix up the pmd then the
    hugetlb code [the only user of pmdp_set_access_flags], in its
    do_huge_pmd_wp_page() page fault resolution function calls
    pmdp_set_access_flags() to mark the pmd permissive again,
    and flushes the TLB.

    This TLB flush is unnecessary: a flush on #PF is guaranteed on
    most (all?) x86 CPUs, and even in the worst-case we'll generate
    a spurious fault.

    So remove it.

    Cc: Linus Torvalds
    Cc: Andrew Morton
    Cc: Peter Zijlstra
    Cc: Paul Turner
    Cc: Lee Schermerhorn
    Cc: Andrea Arcangeli
    Cc: Rik van Riel
    Cc: Johannes Weiner
    Cc: Christoph Lameter
    Cc: Mel Gorman
    Cc: Hugh Dickins
    Link: http://lkml.kernel.org/r/20121120120251.GA15742@gmail.com
    Signed-off-by: Ingo Molnar

    Ingo Molnar
     

18 Mar, 2011

1 commit

  • According to the Intel CPU manual, every time a PGD entry is changed in
    i386 PAE mode, we need to do a full TLB flush. The current code follows
    this, and there is a comment about it in the code, too.

    But the current code misses the multi-threaded case: a changed page table
    might be in use by several CPUs, and every such CPU should flush its TLB.
    Usually this isn't a problem, because we prepopulate all PGD entries at
    process fork. But when the process does a munmap followed by a new mmap,
    the issue is triggered.

    When it happens, some CPUs keep doing page faults:

    http://marc.info/?l=linux-kernel&m=129915020508238&w=2
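
    A sketch of the kind of fix this implies for the PAE pud_populate() path
    (illustrative; the exact upstream change may differ in detail):

    static inline void pud_populate(struct mm_struct *mm, pud_t *pudp, pmd_t *pmd)
    {
            paravirt_alloc_pmd(mm, __pa(pmd) >> PAGE_SHIFT);
            set_pud(pudp, __pud(__pa(pmd) | _PAGE_PRESENT));

            /*
             * A changed top-level entry in i386 PAE mode needs a full TLB
             * flush -- and not only on this CPU: other threads of the
             * process may be running on other CPUs, so flush the whole mm
             * instead of just reloading the local cr3.
             */
            flush_tlb_mm(mm);
    }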

    Reported-by: Yasunori Goto
    Tested-by: Yasunori Goto
    Reviewed-by: Rik van Riel
    Signed-off-by: Shaohua Li
    Cc: Mallick Asit K
    Cc: Linus Torvalds
    Cc: Andrew Morton
    Cc: linux-mm
    Cc: stable
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Shaohua Li
     

10 Mar, 2011

1 commit

  • It's forbidden to take the page_table_lock with irqs disabled: if there's
    contention, the IPIs (for TLB flushes) sent with the page_table_lock held
    will never run, leading to a deadlock.

    Nobody takes the pgd_lock from irq context so the _irqsave can be
    removed.
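
    In code terms the change is essentially this (sketch):

    /* Before: irqs disabled while pgd_lock is held */
    spin_lock_irqsave(&pgd_lock, flags);
    /* ... walk pgd_list, potentially racing with TLB-flush IPIs ... */
    spin_unlock_irqrestore(&pgd_lock, flags);

    /* After: nobody takes pgd_lock from irq context, so plain locking is
     * enough and the IPI deadlock scenario goes away. */
    spin_lock(&pgd_lock);
    /* ... walk pgd_list ... */
    spin_unlock(&pgd_lock);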

    Signed-off-by: Andrea Arcangeli
    Acked-by: Rik van Riel
    Tested-by: Konrad Rzeszutek Wilk
    Signed-off-by: Andrew Morton
    Cc: Peter Zijlstra
    Cc: Linus Torvalds
    Cc:
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Andrea Arcangeli
     

14 Jan, 2011

2 commits

  • Add support for transparent hugepages to x86 32bit.

    Share the same VM_ bitflag for VM_MAPPED_COPY. mm/nommu.c will never
    support transparent hugepages.

    Signed-off-by: Johannes Weiner
    Signed-off-by: Andrea Arcangeli
    Reviewed-by: Rik van Riel
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Johannes Weiner
     
  • Add the needed pmd mangling functions, symmetric with their pte
    counterparts. pmdp_splitting_flush() is the only new addition among the
    pmd_ methods, and it's needed to serialize the VM against
    split_huge_page. It simply sets the splitting bit atomically, in much the
    same way pmdp_clear_flush_young atomically clears the accessed bit.
    pmdp_splitting_flush() also flushes the TLB to make it effective against
    gup_fast; strictly speaking the TLB flush isn't required, but it is the
    simplest operation we can invoke to serialize pmdp_splitting_flush()
    against gup_fast.
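
    For reference, a sketch of what such a helper looks like on x86 (close to,
    but not guaranteed identical with, the upstream version):

    void pmdp_splitting_flush(struct vm_area_struct *vma,
                              unsigned long address, pmd_t *pmdp)
    {
            int set;

            VM_BUG_ON(address & ~HPAGE_PMD_MASK);
            set = !test_and_set_bit(_PAGE_BIT_SPLITTING, (unsigned long *)pmdp);
            if (set) {
                    pmd_update(vma->vm_mm, address, pmdp);
                    /* the TLB flush only serializes against gup_fast */
                    flush_tlb_range(vma, address, address + HPAGE_PMD_SIZE);
            }
    }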

    Signed-off-by: Andrea Arcangeli
    Acked-by: Rik van Riel
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrea Arcangeli
     


20 Oct, 2010

1 commit

  • Take mm->page_table_lock while syncing the vmalloc region. This prevents
    a race with the Xen pagetable pin/unpin code, which expects that the
    page_table_lock is already held. If this race occurs, then Xen can see
    an inconsistent page type (a page can either be read/write or a pagetable
    page, and pin/unpin converts it between them), which will cause either
    the pin or the set_p[gm]d to fail; either will crash the kernel.

    vmalloc_sync_all() should be called rarely, so this extra use of
    page_table_lock should not interfere with its normal users.

    The mm pointer is stashed in the pgd page's index field, as that won't
    be otherwise used for pgds.
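
    The "stash the mm in the pgd page" trick amounts to a pair of small
    helpers along these lines (sketch):

    static void pgd_set_mm(pgd_t *pgd, struct mm_struct *mm)
    {
            /* the pgd's struct page is never used as a mapping page,
             * so its index field is free to hold the owning mm */
            virt_to_page(pgd)->index = (pgoff_t)mm;
    }

    struct mm_struct *pgd_page_get_mm(struct page *page)
    {
            return (struct mm_struct *)page->index;
    }

    vmalloc_sync_all() can then look up the mm for each entry on pgd_list and
    take that mm's page_table_lock while syncing it.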

    Reported-by: Ian Campbell
    Originally-by: Jan Beulich
    LKML-Reference:
    Signed-off-by: Jeremy Fitzhardinge
    Signed-off-by: H. Peter Anvin

    Jeremy Fitzhardinge
     


30 Mar, 2010

1 commit

  • …it slab.h inclusion from percpu.h

    percpu.h is included by sched.h and module.h and thus ends up being
    included when building most .c files. percpu.h includes slab.h which
    in turn includes gfp.h making everything defined by the two files
    universally available and complicating inclusion dependencies.

    percpu.h -> slab.h dependency is about to be removed. Prepare for
    this change by updating users of gfp and slab facilities to include those
    headers directly instead of assuming availability. As this conversion
    needs to touch large number of source files, the following script is
    used as the basis of conversion.

    http://userweb.kernel.org/~tj/misc/slabh-sweep.py

    The script does the following.

    * Scan files for gfp and slab usages and update includes such that
    only the necessary includes are there. ie. if only gfp is used,
    gfp.h, if slab is used, slab.h.

    * When the script inserts a new include, it looks at the include
    blocks and tries to put the new include such that its order conforms
    to its surroundings. It's put in the include block which contains
    core kernel includes, in the same order that the rest are ordered -
    alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
    doesn't seem to be any matching order.

    * If the script can't find a place to put a new include (mostly
    because the file doesn't have a fitting include block), it prints out
    an error message indicating which .h file needs to be added to the
    file.

    The conversion was done in the following steps.

    1. The initial automatic conversion of all .c files updated slightly
    over 4000 files, deleting around 700 includes and adding ~480 gfp.h
    and ~3000 slab.h inclusions. The script emitted errors for ~400
    files.

    2. Each error was manually checked. Some didn't need the inclusion,
    some needed manual addition while adding it to implementation .h or
    embedding .c file was more appropriate for others. This step added
    inclusions to around 150 files.

    3. The script was run again and the output was compared to the edits
    from #2 to make sure no file was left behind.

    4. Several build tests were done and a couple of problems were fixed.
    e.g. lib/decompress_*.c used malloc/free() wrappers around slab
    APIs requiring slab.h to be added manually.

    5. The script was run on all .h files but without automatically
    editing them as sprinkling gfp.h and slab.h inclusions around .h
    files could easily lead to inclusion dependency hell. Most gfp.h
    inclusion directives were ignored as stuff from gfp.h was usually
    widely available and often used in preprocessor macros. Each
    slab.h inclusion directive was examined and added manually as
    necessary.

    6. percpu.h was updated not to include slab.h.

    7. Build tests were done on the following configurations and failures
    were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my
    distributed build env didn't work with gcov compiles) and a few
    more options had to be turned off depending on archs to make things
    build (like ipr on powerpc/64 which failed due to missing writeq).

    * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
    * powerpc and powerpc64 SMP allmodconfig
    * sparc and sparc64 SMP allmodconfig
    * ia64 SMP allmodconfig
    * s390 SMP allmodconfig
    * alpha SMP allmodconfig
    * um on x86_64 SMP allmodconfig

    8. percpu.h modifications were reverted so that it could be applied as
    a separate patch and serve as bisection point.

    Given the fact that I had only a couple of failures from tests on step
    6, I'm fairly confident about the coverage of this conversion patch.
    If there is a breakage, it's likely to be something in one of the arch
    headers which should be easily discoverable on most builds of
    the specific arch.

    Signed-off-by: Tejun Heo <tj@kernel.org>
    Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
    Cc: Ingo Molnar <mingo@redhat.com>
    Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>

    Tejun Heo
     

25 Feb, 2010

1 commit

  • Distros generally (I looked at Debian, RHEL5 and SLES11) seem to
    enable CONFIG_HIGHPTE for any x86 configuration which has highmem
    enabled. This means that the overhead applies even to machines which
    have a fairly modest amount of high memory and which therefore do not
    really benefit from allocating PTEs in high memory but still pay the
    price of the additional mapping operations.

    Running kernbench on a 4G box I found that with CONFIG_HIGHPTE=y but
    no actual highptes being allocated there was a reduction in system
    time used from 59.737s to 55.9s.

    With CONFIG_HIGHPTE=y and highmem PTEs being allocated:
    Average Optimal load -j 4 Run (std deviation):
    Elapsed Time 175.396 (0.238914)
    User Time 515.983 (5.85019)
    System Time 59.737 (1.26727)
    Percent CPU 263.8 (71.6796)
    Context Switches 39989.7 (4672.64)
    Sleeps 42617.7 (246.307)

    With CONFIG_HIGHPTE=y but with no highmem PTEs being allocated:
    Average Optimal load -j 4 Run (std deviation):
    Elapsed Time 174.278 (0.831968)
    User Time 515.659 (6.07012)
    System Time 55.9 (1.07799)
    Percent CPU 263.8 (71.266)
    Context Switches 39929.6 (4485.13)
    Sleeps 42583.7 (373.039)

    This patch allows the user to control the allocation of PTEs in
    highmem from the command line ("userpte=nohigh") but retains the
    status-quo as the default.

    It is possible that some simple heuristic could be developed which
    allows auto-tuning of this option however I don't have a sufficiently
    large machine available to me to perform any particularly meaningful
    experiments. We could probably handwave up an argument for a threshold
    at 16G of total RAM.

    Assuming 768M of lowmem we have 196608 potential lowmem PTE
    pages. Each page can map 2M of RAM in a PAE-enabled configuration,
    meaning a maximum of 384G of RAM could potentially be mapped using
    lowmem PTEs.

    Even allowing generous factor of 10 to account for other required
    lowmem allocations, generous slop to account for page sharing (which
    reduces the total amount of RAM mappable by a given number of PT
    pages) and other inaccuracies in the estimations it would seem that
    even a 32G machine would not have a particularly pressing need for
    highmem PTEs. I think 32G could be considered to be at the upper bound
    of what might be sensible on a 32 bit machine (although I think in
    practice 64G is still supported).

    It seems questionable whether HIGHPTE is even a win for any amount of RAM
    you would sensibly run a 32 bit kernel on rather than going 64 bit.
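
    The resulting knob boils down to something like this (a sketch of the
    boot-parameter plumbing; the PGALLOC_* gfp masks are assumed from the x86
    code of that era):

    gfp_t __userpte_alloc_gfp = PGALLOC_GFP | PGALLOC_USER_GFP;

    pgtable_t pte_alloc_one(struct mm_struct *mm, unsigned long address)
    {
            struct page *pte;

            /* user PTE pages honour the (possibly downgraded) gfp mask */
            pte = alloc_pages(__userpte_alloc_gfp, 0);
            if (pte)
                    pgtable_page_ctor(pte);
            return pte;
    }

    static int __init setup_userpte(char *arg)
    {
            if (!arg)
                    return -EINVAL;

            /* "userpte=nohigh" keeps user pagetables out of highmem */
            if (!strcmp(arg, "nohigh"))
                    __userpte_alloc_gfp &= ~__GFP_HIGHMEM;
            else
                    return -EINVAL;
            return 0;
    }
    early_param("userpte", setup_userpte);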

    Signed-off-by: Ian Campbell
    LKML-Reference:
    Signed-off-by: H. Peter Anvin

    Ian Campbell
     


28 Jul, 2009

1 commit

  • mm: Pass virtual address to [__]p{te,ud,md}_free_tlb()

    Upcoming patches to support the new 64-bit "BookE" powerpc architecture
    will need to have the virtual address corresponding to PTE page when
    freeing it, due to the way the HW table walker works.

    Basically, the TLB can be loaded with "large" pages that cover the whole
    virtual space (well, sort-of, half of it actually) represented by a PTE
    page, and which contain an "indirect" bit indicating that this TLB entry
    RPN points to an array of PTEs from which the TLB can then create direct
    entries. Thus, in order to invalidate those when PTE pages are deleted,
    we need the virtual address to pass to tlbilx or tlbivax instructions.

    The old trick of sticking it somewhere in the PTE page's struct page sucks
    too much: the address is readily available in almost all call sites, and
    almost everybody implements these as macros, so we may as well add the
    argument everywhere. I added it to the pmd and pud variants for consistency.
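
    The interface change itself is just an extra argument threaded through the
    existing macros, roughly:

    /* before */
    #define pte_free_tlb(tlb, ptepage)          __pte_free_tlb(tlb, ptepage)

    /* after: the virtual address covered by the page being freed is passed
     * down, so an arch like 64-bit BookE can invalidate the indirect TLB
     * entry (tlbilx/tlbivax) that maps it */
    #define pte_free_tlb(tlb, ptepage, address) __pte_free_tlb(tlb, ptepage, address)
    #define pmd_free_tlb(tlb, pmdp, address)    __pmd_free_tlb(tlb, pmdp, address)
    #define pud_free_tlb(tlb, pudp, address)    __pud_free_tlb(tlb, pudp, address)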

    Signed-off-by: Benjamin Herrenschmidt
    Acked-by: David Howells [MN10300 & FRV]
    Acked-by: Nick Piggin
    Acked-by: Martin Schwidefsky [s390]
    Signed-off-by: Linus Torvalds

    Benjamin Herrenschmidt
     


10 Apr, 2009

1 commit

  • Use phys_addr_t for receiving a physical address argument instead of
    unsigned long. This allows fixmap to handle pages higher than 4GB on
    x86-32.
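
    The signature change is essentially (sketch):

    /* before: a 32-bit physical address on x86-32 */
    void native_set_fixmap(enum fixed_addresses idx,
                           unsigned long phys, pgprot_t flags);

    /* after: a 64-bit phys_addr_t lets a 32-bit PAE kernel map pages
     * above 4GB into fixmap slots */
    void native_set_fixmap(enum fixed_addresses idx,
                           phys_addr_t phys, pgprot_t flags);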

    Signed-off-by: Masami Hiramatsu
    Cc: Ingo Molnar
    Acked-by: Mathieu Desnoyers
    Signed-off-by: Linus Torvalds

    Masami Hiramatsu
     


07 Sep, 2008

1 commit

  • Giving pgd_ctor() a properly typed parameter allows eliminating a local
    variable. Adjust pgd_dtor() to match.

    Signed-off-by: Jan Beulich
    Acked-by: Jeremy Fitzhardinge
    Cc: "Jeremy Fitzhardinge"
    Signed-off-by: Ingo Molnar

    Jan Beulich
     

12 Aug, 2008

1 commit

  • Simon Horman reported that gcc-3.4.x crashes when compiling
    pgd_prepopulate_pmd() when PREALLOCATED_PMDS == 0 and CONFIG_DEBUG_INFO
    is enabled.

    Adding an extra check for PREALLOCATED_PMDS == 0 [which is compiled out
    by gcc] seems to avoid the problem.
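
    The workaround is a single early return that gcc folds away (sketch):

    static void pgd_prepopulate_pmd(struct mm_struct *mm, pgd_t *pgd, pmd_t *pmds[])
    {
            /*
             * With PREALLOCATED_PMDS == 0 gcc compiles the whole body out,
             * which sidesteps the gcc-3.4.x ICE seen with CONFIG_DEBUG_INFO.
             */
            if (PREALLOCATED_PMDS == 0)
                    return;

            /* ... populate the preallocated pmds as before ... */
    }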

    Reported-by: Simon Horman
    Signed-off-by: Jeremy Fitzhardinge
    Acked-by: Simon Horman
    Signed-off-by: Ingo Molnar

    Jeremy Fitzhardinge
     

08 Jul, 2008

3 commits

  • Jan Beulich points out that vmalloc_sync_all() assumes that the
    kernel's pmd is always expected to be present in the pgd. The current
    pgd construction code will add the pgd to the pgd_list before its pmds
    have been pre-populated, thereby making it visible to
    vmalloc_sync_all().

    However, because pgd_prepopulate_pmd also does the allocation, it may
    block and cannot be done under spinlock.

    The solution is to preallocate the pmds out of the spinlock, then
    populate them while holding the pgd_list lock.

    This patch also pulls the pmd preallocation and mop-up functions out
    to be common, assuming that the compiler will generate no code for
    them when PREALLOCATED_PMDS is 0. Also, there's no need for pgd_ctor
    to clear the pgd again, since it's allocated as a zeroed page.
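
    The resulting allocation flow is roughly the following (sketch; the
    paravirt hook and the pmd cleanup on the error path are omitted):

    pgd_t *pgd_alloc(struct mm_struct *mm)
    {
            pgd_t *pgd;
            pmd_t *pmds[PREALLOCATED_PMDS];
            unsigned long flags;

            /* allocated zeroed, so pgd_ctor need not clear it again */
            pgd = (pgd_t *)__get_free_page(GFP_KERNEL | __GFP_ZERO);
            if (pgd == NULL)
                    goto out;

            mm->pgd = pgd;

            /* allocation may sleep, so do it before taking the lock */
            if (preallocate_pmds(pmds) != 0)
                    goto out_free_pgd;

            /*
             * Only a fully constructed pgd goes onto pgd_list, so
             * vmalloc_sync_all() never sees a pgd without its pmds.
             */
            spin_lock_irqsave(&pgd_lock, flags);
            pgd_ctor(pgd);
            pgd_prepopulate_pmd(mm, pgd, pmds);
            spin_unlock_irqrestore(&pgd_lock, flags);

            return pgd;

    out_free_pgd:
            free_page((unsigned long)pgd);
    out:
            return NULL;
    }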

    Signed-off-by: Jeremy Fitzhardinge
    Cc: xen-devel
    Cc: Stephen Tweedie
    Cc: Eduardo Habkost
    Cc: Mark McLoughlin
    Signed-off-by: Ingo Molnar
    Cc: Jan Beulich

    Signed-off-by: Ingo Molnar

    Jeremy Fitzhardinge
     
  • Add hooks which are called at pgd_alloc/free time. The pgd_alloc hook
    may return an error code which, if non-zero, causes the pgd allocation
    to fail. The hooks may be used to allocate/free auxiliary
    per-pgd information.

    also fix:

    > * Ingo Molnar wrote:
    >
    > include/asm/pgalloc.h: In function ‘paravirt_pgd_free':
    > include/asm/pgalloc.h:14: error: parameter name omitted
    > arch/x86/kernel/entry_64.S: In file included from
    > arch/x86/kernel/traps_64.c:51:include/asm/pgalloc.h: In function ‘paravirt_pgd_free':
    > include/asm/pgalloc.h:14: error: parameter name omitted
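
    The hooks boil down to something like this (sketch; the quoted build error
    came from an unnamed parameter in the non-paravirt stub):

    #ifdef CONFIG_PARAVIRT
    #include <asm/paravirt.h>
    #else
    static inline int  paravirt_pgd_alloc(struct mm_struct *mm) { return 0; }
    static inline void paravirt_pgd_free(struct mm_struct *mm, pgd_t *pgd) {}
    #endif

    /* pgd_alloc() then bails out cleanly if the hook rejects the pgd: */
    if (paravirt_pgd_alloc(mm) != 0)
            goto out_free_pmds;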

    Signed-off-by: Jeremy Fitzhardinge
    Cc: xen-devel
    Cc: Stephen Tweedie
    Cc: Eduardo Habkost
    Cc: Mark McLoughlin
    Signed-off-by: Ingo Molnar

    Jeremy Fitzhardinge
     
  • Conflicts:

    arch/x86/mm/init_64.c

    Signed-off-by: Ingo Molnar

    Ingo Molnar
     


25 Apr, 2008

7 commits