Eric Lee / smarc-fsl-linux-kernel

30 Mar, 2010

1 commit

5a0e3ad6a include cleanup: Update gfp.h and slab.h includes to prepare for breaking implic… ... Browse Code »

…it slab.h inclusion from percpu.h

percpu.h is included by sched.h and module.h and thus ends up being
included when building most .c files. percpu.h includes slab.h which
in turn includes gfp.h making everything defined by the two files
universally available and complicating inclusion dependencies.

percpu.h -> slab.h dependency is about to be removed. Prepare for
this change by updating users of gfp and slab facilities include those
headers directly instead of assuming availability. As this conversion
needs to touch large number of source files, the following script is
used as the basis of conversion.

http://userweb.kernel.org/~tj/misc/slabh-sweep.py

The script does the followings.

* Scan files for gfp and slab usages and update includes such that
only the necessary includes are there. ie. if only gfp is used,
gfp.h, if slab is used, slab.h.

* When the script inserts a new include, it looks at the include
blocks and try to put the new include such that its order conforms
to its surrounding. It's put in the include block which contains
core kernel includes, in the same order that the rest are ordered -
alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
doesn't seem to be any matching order.

* If the script can't find a place to put a new include (mostly
because the file doesn't have fitting include block), it prints out
an error message indicating which .h file needs to be added to the
file.

The conversion was done in the following steps.

1. The initial automatic conversion of all .c files updated slightly
over 4000 files, deleting around 700 includes and adding ~480 gfp.h
and ~3000 slab.h inclusions. The script emitted errors for ~400
files.

2. Each error was manually checked. Some didn't need the inclusion,
some needed manual addition while adding it to implementation .h or
embedding .c file was more appropriate for others. This step added
inclusions to around 150 files.

3. The script was run again and the output was compared to the edits
from #2 to make sure no file was left behind.

4. Several build tests were done and a couple of problems were fixed.
e.g. lib/decompress_*.c used malloc/free() wrappers around slab
APIs requiring slab.h to be added manually.

5. The script was run on all .h files but without automatically
editing them as sprinkling gfp.h and slab.h inclusions around .h
files could easily lead to inclusion dependency hell. Most gfp.h
inclusion directives were ignored as stuff from gfp.h was usually
wildly available and often used in preprocessor macros. Each
slab.h inclusion directive was examined and added manually as
necessary.

6. percpu.h was updated not to include slab.h.

7. Build test were done on the following configurations and failures
were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my
distributed build env didn't work with gcov compiles) and a few
more options had to be turned off depending on archs to make things
build (like ipr on powerpc/64 which failed due to missing writeq).

* x86 and x86_64 UP and SMP allmodconfig and a custom test config.
* powerpc and powerpc64 SMP allmodconfig
* sparc and sparc64 SMP allmodconfig
* ia64 SMP allmodconfig
* s390 SMP allmodconfig
* alpha SMP allmodconfig
* um on x86_64 SMP allmodconfig

8. percpu.h modifications were reverted so that it could be applied as
a separate patch and serve as bisection point.

Given the fact that I had only a couple of failures from tests on step
6, I'm fairly confident about the coverage of this conversion patch.
If there is a breakage, it's likely to be something in one of the arch
headers which should be easily discoverable easily on most builds of
the specific arch.

Signed-off-by: Tejun Heo <tj@kernel.org>
Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>

Tejun Heo
2010-03-30 21:02:32 +0800

07 Mar, 2010

1 commit

85f1fb72f mm/migrate.c: kill anon local variable from migrate_page_copy ... Browse Code »

commit 01b1ae63c2 ("memcg: simple migration handling") removed
mem_cgroup_uncharge_cache_page() call from migrate_page_copy. Local
variable `anon' is now unused.

Signed-off-by: KOSAKI Motohiro
Cc: KAMEZAWA Hiroyuki
Cc: Daisuke Nishimura
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

KOSAKI Motohiro
2010-03-07 03:26:25 +0800

02 Mar, 2010

1 commit

ac0f6f927 Merge branch 'for-linus' of master.kernel.org:/home/rmk/linux-2.6-arm ... Browse Code »

* 'for-linus' of master.kernel.org:/home/rmk/linux-2.6-arm: (100 commits)
ARM: Eliminate decompressor -Dstatic= PIC hack
ARM: 5958/1: ARM: U300: fix inverted clk round rate
ARM: 5956/1: misplaced parentheses
ARM: 5955/1: ep93xx: move timer defines into core.c and document
ARM: 5954/1: ep93xx: move gpio interrupt support to gpio.c
ARM: 5953/1: ep93xx: fix broken build of clock.c
ARM: 5952/1: ARM: MM: Add ARM_L1_CACHE_SHIFT_6 for handle inside each ARCH Kconfig
ARM: 5949/1: NUC900 add gpio virtual memory map
ARM: 5948/1: Enable timer0 to time4 clock support for nuc910
ARM: 5940/2: ARM: MMCI: remove custom DBG macro and printk
ARM: make_coherent(): fix problems with highpte, part 2
MM: Pass a PTE pointer to update_mmu_cache() rather than the PTE itself
ARM: 5945/1: ep93xx: include correct irq.h in core.c
ARM: 5933/1: amba-pl011: support hardware flow control
ARM: 5930/1: Add PKMAP area description to memory.txt.
ARM: 5929/1: Add checks to detect overlap of memory regions.
ARM: 5928/1: Change type of VMALLOC_END to unsigned long.
ARM: 5927/1: Make delimiters of DMA area globally visibly.
ARM: 5926/1: Add "Virtual kernel memory..." printout.
ARM: 5920/1: OMAP4: Enable L2 Cache
...

Fix up trivial conflict in arch/arm/mach-mx25/clock.c

Linus Torvalds
2010-03-02 01:15:15 +0800

22 Feb, 2010

1 commit

87b8d1ade mm: Make copy_from_user() in migrate.c statically predictable ... Browse Code »

x86-32 has had a static test for copy_on_user() overflow for a while.
This test currently fails in mm/migrate.c resulting in an
allyesconfig/allmodconfig build failure on x86-32:

In function ‘copy_from_user’,
inlined from ‘do_pages_stat’ at
/home/hpa/kernel/git/mm/migrate.c:1012:
/home/hpa/kernel/git/arch/x86/include/asm/uaccess_32.h:212: error:
call to ‘copy_from_user_overflow’ declared

Make the logic more explicit and therefore easier for gcc to
understand.

v2: rewrite the loop entirely using a more normal structure for a
chunked-data loop (Linus Torvalds)

Reported-by: Len Brown
Signed-off-by: H. Peter Anvin
Reviewed-and-Tested-by: KOSAKI Motohiro
Cc: Arjan van de Ven
Cc: Andrew Morton
Cc: Christoph Lameter
Cc: Hugh Dickins
Cc: Rik van Riel
Signed-off-by: Linus Torvalds

H. Peter Anvin
2010-02-22 00:57:08 +0800

21 Feb, 2010

1 commit

4b3073e1c MM: Pass a PTE pointer to update_mmu_cache() rather than the PTE itself ... Browse Code »

On VIVT ARM, when we have multiple shared mappings of the same file
in the same MM, we need to ensure that we have coherency across all
copies. We do this via make_coherent() by making the pages
uncacheable.

This used to work fine, until we allowed highmem with highpte - we
now have a page table which is mapped as required, and is not available
for modification via update_mmu_cache().

Ralf Beache suggested getting rid of the PTE value passed to
update_mmu_cache():

On MIPS update_mmu_cache() calls __update_tlb() which walks pagetables
to construct a pointer to the pte again. Passing a pte_t * is much
more elegant. Maybe we might even replace the pte argument with the
pte_t?

Ben Herrenschmidt would also like the pte pointer for PowerPC:

Passing the ptep in there is exactly what I want. I want that
-instead- of the PTE value, because I have issue on some ppc cases,
for I$/D$ coherency, where set_pte_at() may decide to mask out the
_PAGE_EXEC.

So, pass in the mapped page table pointer into update_mmu_cache(), and
remove the PTE value, updating all implementations and call sites to
suit.

Includes a fix from Stephen Rothwell:

sparc: fix fallout from update_mmu_cache API change

Signed-off-by: Stephen Rothwell

Acked-by: Benjamin Herrenschmidt
Signed-off-by: Russell King

Russell King
2010-02-21 00:41:46 +0800

07 Feb, 2010

1 commit

6f5a55f1a Fix potential crash with sys_move_pages ... Browse Code »

We incorrectly depended on the 'node_state/node_isset()' functions
testing the node range, rather than checking it explicitly. That's not
reliable, even if it might often happen to work. So do the proper
explicit test.

Reported-by: Marcus Meissner
Acked-and-tested-by: Brice Goglin
Acked-by: Hugh Dickins
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds

Linus Torvalds
2010-02-07 05:00:37 +0800

16 Dec, 2009

5 commits

418b27ef5 mm: remove unevictable_migrate_page function ... Browse Code »

unevictable_migrate_page() in mm/internal.h is a relic of the since
removed UNEVICTABLE_LRU Kconfig option. This patch removes the function
and open codes the test in migrate_page_copy().

Signed-off-by: Lee Schermerhorn
Reviewed-by: Christoph Lameter
Acked-by: Hugh Dickins
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Lee Schermerhorn
2009-12-16 00:53:23 +0800
62b61f611 ksm: memory hotremove migration only ... Browse Code »

The previous patch enables page migration of ksm pages, but that soon gets
into trouble: not surprising, since we're using the ksm page lock to lock
operations on its stable_node, but page migration switches the page whose
lock is to be used for that. Another layer of locking would fix it, but
do we need that yet?

Do we actually need page migration of ksm pages? Yes, memory hotremove
needs to offline sections of memory: and since we stopped allocating ksm
pages with GFP_HIGHUSER, they will tend to be GFP_HIGHUSER_MOVABLE
candidates for migration.

But KSM is currently unconscious of NUMA issues, happily merging pages
from different NUMA nodes: at present the rule must be, not to use
MADV_MERGEABLE where you care about NUMA. So no, NUMA page migration of
ksm pages does not make sense yet.

So, to complete support for ksm swapping we need to make hotremove safe.
ksm_memory_callback() take ksm_thread_mutex when MEM_GOING_OFFLINE and
release it when MEM_OFFLINE or MEM_CANCEL_OFFLINE. But if mapped pages
are freed before migration reaches them, stable_nodes may be left still
pointing to struct pages which have been removed from the system: the
stable_node needs to identify a page by pfn rather than page pointer, then
it can safely prune them when MEM_OFFLINE.

And make NUMA migration skip PageKsm pages where it skips PageReserved.
But it's only when we reach unmap_and_move() that the page lock is taken
and we can be sure that raised pagecount has prevented a PageAnon from
being upgraded: so add offlining arg to migrate_pages(), to migrate ksm
page when offlining (has sufficient locking) but reject it otherwise.

Signed-off-by: Hugh Dickins
Cc: Izik Eidus
Cc: Andrea Arcangeli
Cc: Chris Wright
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Hugh Dickins
2009-12-16 00:53:20 +0800
e9995ef97 ksm: rmap_walk to remove_migation_ptes ... Browse Code »

A side-effect of making ksm pages swappable is that they have to be placed
on the LRUs: which then exposes them to isolate_lru_page() and hence to
page migration.

Add rmap_walk() for remove_migration_ptes() to use: rmap_walk_anon() and
rmap_walk_file() in rmap.c, but rmap_walk_ksm() in ksm.c. Perhaps some
consolidation with existing code is possible, but don't attempt that yet
(try_to_unmap needs to handle nonlinears, but migration pte removal does
not).

rmap_walk() is sadly less general than it appears: rmap_walk_anon(), like
remove_anon_migration_ptes() which it replaces, avoids calling
page_lock_anon_vma(), because that includes a page_mapped() test which
fails when all migration ptes are in place. That was valid when NUMA page
migration was introduced (holding mmap_sem provided the missing guarantee
that anon_vma's slab had not already been destroyed), but I believe not
valid in the memory hotremove case added since.

For now do the same as before, and consider the best way to fix that
unlikely race later on. When fixed, we can probably use rmap_walk() on
hwpoisoned ksm pages too: for now, they remain among hwpoison's various
exceptions (its PageKsm test comes before the page is locked, but its
page_lock_anon_vma fails safely if an anon gets upgraded).

Signed-off-by: Hugh Dickins
Cc: Izik Eidus
Cc: Andrea Arcangeli
Cc: Chris Wright
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Hugh Dickins
2009-12-16 00:53:20 +0800
3ca7b3c5b mm: define PAGE_MAPPING_FLAGS ... Browse Code »

At present we define PageAnon(page) by the low PAGE_MAPPING_ANON bit set
in page->mapping, with the higher bits a pointer to the anon_vma; and have
defined PageKsm(page) as that with NULL anon_vma.

But KSM swapping will need to store a pointer there: so in preparation for
that, now define PAGE_MAPPING_FLAGS as the low two bits, including
PAGE_MAPPING_KSM (always set along with PAGE_MAPPING_ANON, until some
other use for the bit emerges).

Declare page_rmapping(page) to return the pointer part of page->mapping,
and page_anon_vma(page) to return the anon_vma pointer when that's what it
is. Use these in a few appropriate places: notably, unuse_vma() has been
testing page->mapping, but is better to be testing page_anon_vma() (cases
may be added in which flag bits are set without any pointer).

Signed-off-by: Hugh Dickins
Cc: Izik Eidus
Cc: Andrea Arcangeli
Cc: Nick Piggin
Cc: KOSAKI Motohiro
Reviewed-by: Rik van Riel
Cc: Lee Schermerhorn
Cc: Andi Kleen
Cc: KAMEZAWA Hiroyuki
Cc: Wu Fengguang
Cc: Minchan Kim
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Hugh Dickins
2009-12-16 00:53:17 +0800
6d9c285a6 mm: move inc_zone_page_state(NR_ISOLATED) to just isolated place ... Browse Code »

Christoph pointed out inc_zone_page_state(NR_ISOLATED) should be placed
in right after isolate_page().

This patch does it.

Reviewed-by: Christoph Lameter
Signed-off-by: KOSAKI Motohiro
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

KOSAKI Motohiro
2009-12-16 00:53:12 +0800

12 Dec, 2009

1 commit

b92558503 mm: Adjust do_pages_stat() so gcc can see copy_from_user() is safe ... Browse Code »

Slightly adjust the logic for determining the size of the
copy_form_user() in do_pages_stat(); with this change, gcc can see
that the copying is safe.

Without this, we get a build error for i386 allyesconfig:

/home/hpa/kernel/linux-2.6-tip.urgent/arch/x86/include/asm/uaccess_32.h:213:
error: call to ‘copy_from_user_overflow’ declared with attribute
error: copy_from_user() buffer size is not provably correct

Unlike an earlier patch from Arjan, this doesn't introduce new
variables; merely reshuffles the compare so that gcc can see that an
overflow cannot happen.

Signed-off-by: H. Peter Anvin
Cc: Brice Goglin
Cc: Arjan van de Ven
Cc: Andrew Morton
Cc: KOSAKI Motohiro
LKML-Reference:

H. Peter Anvin
2009-12-12 07:27:47 +0800

12 Nov, 2009

1 commit

e00e43161 memcg: fix wrong pointer initialization at page migration when memcg is disabled. ... Browse Code »

Lee Schermerhorn reported that he saw bad pointer dereference in
mem_cgroup_end_migration() when he disabled memcg by boot option.

memcg's page migration logic works as

mem_cgroup_prepare_migration(page, &ptr);
do page migration
mem_cgroup_end_migration(page, ptr);

Now, ptr is not initialized in prepare_migration when memcg is disabled
by boot option. This causes panic in end_migration. This patch fixes it.

Reported-by: Lee Schermerhorn
Cc: Balbir Singh
Signed-off-by: KAMEZAWA Hiroyuki
Reviewed-by: Daisuke Nishimura
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

KAMEZAWA Hiroyuki
2009-11-12 23:25:56 +0800

24 Sep, 2009

1 commit

db1682636 Merge branch 'hwpoison' of git://git.kernel.org/pub/scm/linux/kernel/git/ak/linux-mce-2.6 ... Browse Code »

* 'hwpoison' of git://git.kernel.org/pub/scm/linux/kernel/git/ak/linux-mce-2.6: (21 commits)
HWPOISON: Enable error_remove_page on btrfs
HWPOISON: Add simple debugfs interface to inject hwpoison on arbitary PFNs
HWPOISON: Add madvise() based injector for hardware poisoned pages v4
HWPOISON: Enable error_remove_page for NFS
HWPOISON: Enable .remove_error_page for migration aware file systems
HWPOISON: The high level memory error handler in the VM v7
HWPOISON: Add PR_MCE_KILL prctl to control early kill behaviour per process
HWPOISON: shmem: call set_page_dirty() with locked page
HWPOISON: Define a new error_remove_page address space op for async truncation
HWPOISON: Add invalidate_inode_page
HWPOISON: Refactor truncate to allow direct truncating of page v2
HWPOISON: check and isolate corrupted free pages v2
HWPOISON: Handle hardware poisoned pages in try_to_unmap
HWPOISON: Use bitmask/action code for try_to_unmap behaviour
HWPOISON: x86: Add VM_FAULT_HWPOISON handling to x86 page fault handler v2
HWPOISON: Add poison check to page fault handling
HWPOISON: Add basic support for poisoned pages in fault handler v3
HWPOISON: Add new SIGBUS error codes for hardware poison signals
HWPOISON: Add support for poison swap entries v2
HWPOISON: Export some rmap vma locking to outside world
...

Linus Torvalds
2009-09-24 22:53:22 +0800

22 Sep, 2009

5 commits

edcf4748c mm: return boolean from page_has_private() ... Browse Code »

Make page_has_private() return a true boolean value and remove the double
negations from the two callsites using it for arithmetic.

Signed-off-by: Johannes Weiner
Cc: Christoph Lameter
Reviewed-by: Christoph Lameter
Reviewed-by: KOSAKI Motohiro
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Johannes Weiner
2009-09-22 22:17:38 +0800
6c0b13519 mm: return boolean from page_is_file_cache() ... Browse Code »

page_is_file_cache() has been used for both boolean checks and LRU
arithmetic, which was always a bit weird.

Now that page_lru_base_type() exists for LRU arithmetic, make
page_is_file_cache() a real predicate function and adjust the
boolean-using callsites to drop those pesky double negations.

Signed-off-by: Johannes Weiner
Reviewed-by: KOSAKI Motohiro
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Johannes Weiner
2009-09-22 22:17:37 +0800
a731286de mm: vmstat: add isolate pages ... Browse Code »

If the system is running a heavy load of processes then concurrent reclaim
can isolate a large number of pages from the LRU. /proc/vmstat and the
output generated for an OOM do not show how many pages were isolated.

This has been observed during process fork bomb testing (mstctl11 in LTP).

This patch shows the information about isolated pages.

Reproduced via:

-----------------------
% ./hackbench 140 process 1000
=> OOM occur

active_anon:146 inactive_anon:0 isolated_anon:49245
active_file:79 inactive_file:18 isolated_file:113
unevictable:0 dirty:0 writeback:0 unstable:0 buffer:39
free:370 slab_reclaimable:309 slab_unreclaimable:5492
mapped:53 shmem:15 pagetables:28140 bounce:0

Signed-off-by: KOSAKI Motohiro
Acked-by: Rik van Riel
Acked-by: Wu Fengguang
Reviewed-by: Minchan Kim
Cc: Hugh Dickins
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

KOSAKI Motohiro
2009-09-22 22:17:29 +0800
4b02108ac mm: oom analysis: add shmem vmstat ... Browse Code »

Recently we encountered OOM problems due to memory use of the GEM cache.
Generally a large amuont of Shmem/Tmpfs pages tend to create a memory
shortage problem.

We often use the following calculation to determine the amount of shmem
pages:

shmem = NR_ACTIVE_ANON + NR_INACTIVE_ANON - NR_ANON_PAGES

however the expression does not consider isolated and mlocked pages.

This patch adds explicit accounting for pages used by shmem and tmpfs.

Signed-off-by: KOSAKI Motohiro
Acked-by: Rik van Riel
Reviewed-by: Christoph Lameter
Acked-by: Wu Fengguang
Cc: David Rientjes
Cc: Hugh Dickins
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

KOSAKI Motohiro
2009-09-22 22:17:27 +0800
abfc34881 memory hotplug: migrate swap cache page ... Browse Code »

In test, some pages in swap-cache can't be migrated, as they aren't rmap.

unmap_and_move() ignores swap-cache page which is just read in and hasn't
rmap (see the comments in the code), but swap_aops provides .migratepage.
Better to migrate such pages instead of ignore them.

Signed-off-by: Shaohua Li
Cc: Mel Gorman
Cc: Christoph Lameter
Cc: Yakui Zhao
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Shaohua Li
2009-09-22 22:17:26 +0800

16 Sep, 2009

1 commit

14fa31b89 HWPOISON: Use bitmask/action code for try_to_unmap behaviour ... Browse Code »

try_to_unmap currently has multiple modi (migration, munlock, normal unmap)
which are selected by magic flag variables. The logic is not very straight
forward, because each of these flag change multiple behaviours (e.g.
migration turns off aging, not only sets up migration ptes etc.)
Also the different flags interact in magic ways.

A later patch in this series adds another mode to try_to_unmap, so
this becomes quickly unmanageable.

Replace the different flags with a action code (migration, munlock, munmap)
and some additional flags as modifiers (ignore mlock, ignore aging).
This makes the logic more straight forward and allows easier extension
to new behaviours. Change all the caller to declare what they want to
do.

This patch is supposed to be a nop in behaviour. If anyone can prove
it is not that would be a bug.

Cc: Lee.Schermerhorn@hp.com
Cc: npiggin@suse.de

Signed-off-by: Andi Kleen

Andi Kleen
2009-09-16 17:50:10 +0800

17 Jun, 2009

2 commits

35282a2de migration: only migrate_prep() once per move_pages() ... Browse Code »

migrate_prep() is fairly expensive (72us on 16-core barcelona 1.9GHz).
Commit 3140a2273009c01c27d316f35ab76a37e105fdd8 improved move_pages()
throughput by breaking it into chunks, but it also made migrate_prep() be
called once per chunk (every 128pages or so) instead of once per
move_pages().

This patch reverts to calling migrate_prep() only once per chunk as we did
before 2.6.29. It is also a followup to commit
0aedadf91a70a11c4a3e7c7d99b21e5528af8d5d ("mm: move migrate_prep out from
under mmap_sem").

This improves migration throughput on the above machine from 600MB/s to
750MB/s.

Signed-off-by: Brice Goglin
Acked-by: Christoph Lameter
Cc: KOSAKI Motohiro
Cc: Heiko Carstens
Cc: Nick Piggin
Cc: Hugh Dickins
Cc: Rik van Riel
Cc: Lee Schermerhorn
Reviewed-by: KAMEZAWA Hiroyuki
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Brice Goglin
2009-06-17 10:47:41 +0800
6484eb3e2 page allocator: do not check NUMA node ID when the caller knows the node is valid ... Browse Code »

Callers of alloc_pages_node() can optionally specify -1 as a node to mean
"allocate from the current node". However, a number of the callers in
fast paths know for a fact their node is valid. To avoid a comparison and
branch, this patch adds alloc_pages_exact_node() that only checks the nid
with VM_BUG_ON(). Callers that know their node is valid are then
converted.

Signed-off-by: Mel Gorman
Reviewed-by: Christoph Lameter
Reviewed-by: KOSAKI Motohiro
Reviewed-by: Pekka Enberg
Acked-by: Paul Mundt [for the SLOB NUMA bits]
Cc: Peter Zijlstra
Cc: Nick Piggin
Cc: Dave Hansen
Cc: Lee Schermerhorn
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Mel Gorman
2009-06-17 10:47:32 +0800

03 Apr, 2009

1 commit

266cf658e FS-Cache: Recruit a page flags for cache management ... Browse Code »

Recruit a page flag to aid in cache management. The following extra flag is
defined:

(1) PG_fscache (PG_private_2)

The marked page is backed by a local cache and is pinning resources in the
cache driver.

If PG_fscache is set, then things that checked for PG_private will now also
check for that. This includes things like truncation and page invalidation.
The function page_has_private() had been added to make the checks for both
PG_private and PG_private_2 at the same time.

Signed-off-by: David Howells
Acked-by: Steve Dickson
Acked-by: Trond Myklebust
Acked-by: Rik van Riel
Acked-by: Al Viro
Tested-by: Daire Byrne

David Howells
2009-04-03 23:42:36 +0800

12 Feb, 2009

1 commit

1001c9fb8 migration: migrate_vmas should check "vma" ... Browse Code »

migrate_vmas() should check "vma" not "vma->vm_next" for for-loop condition.

Signed-off-by: Daisuke Nishimura
Cc: Christoph Lameter
Cc: Johannes Weiner
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Daisuke Nishimura
2009-02-12 06:25:34 +0800

14 Jan, 2009

1 commit

938bb9f5e [CVE-2009-0029] System call wrappers part 28 ... Browse Code »

Signed-off-by: Heiko Carstens

Heiko Carstens
2009-01-14 21:15:30 +0800

09 Jan, 2009

2 commits

01b1ae63c memcg: simple migration handling ... Browse Code »

Now, management of "charge" under page migration is done under following
manner. (Assume migrate page contents from oldpage to newpage)

before
- "newpage" is charged before migration.
at success.
- "oldpage" is uncharged at somewhere(unmap, radix-tree-replace)
at failure
- "newpage" is uncharged.
- "oldpage" is charged if necessary (*1)

But (*1) is not reliable....because of GFP_ATOMIC.

This patch tries to change behavior as following by charge/commit/cancel ops.

before
- charge PAGE_SIZE (no target page)
success
- commit charge against "newpage".
failure
- commit charge against "oldpage".
(PCG_USED bit works effectively to avoid double-counting)
- if "oldpage" is obsolete, cancel charge of PAGE_SIZE.

Signed-off-by: KAMEZAWA Hiroyuki
Reviewed-by: Daisuke Nishimura
Cc: Balbir Singh
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

KAMEZAWA Hiroyuki
2009-01-09 00:31:04 +0800
7a81b88cb memcg: introduce charge-commit-cancel style of functions ... Browse Code »

There is a small race in do_swap_page(). When the page swapped-in is
charged, the mapcount can be greater than 0. But, at the same time some
process (shares it ) call unmap and make mapcount 1->0 and the page is
uncharged.

CPUA CPUB
mapcount == 1.
(1) charge if mapcount==0 zap_pte_range()
(2) mapcount 1 => 0.
(3) uncharge(). (success)
(4) set page's rmap()
mapcount 0=>1

Then, this swap page's account is leaked.

For fixing this, I added a new interface.
- charge
account to res_counter by PAGE_SIZE and try to free pages if necessary.
- commit
register page_cgroup and add to LRU if necessary.
- cancel
uncharge PAGE_SIZE because of do_swap_page failure.

CPUA
(1) charge (always)
(2) set page's rmap (mapcount > 0)
(3) commit charge was necessary or not after set_pte().

This protocol uses PCG_USED bit on page_cgroup for avoiding over accounting.
Usual mem_cgroup_charge_common() does charge -> commit at a time.

And this patch also adds following function to clarify all charges.

- mem_cgroup_newpage_charge() ....replacement for mem_cgroup_charge()
called against newly allocated anon pages.

- mem_cgroup_charge_migrate_fixup()
called only from remove_migration_ptes().
we'll have to rewrite this later.(this patch just keeps old behavior)
This function will be removed by additional patch to make migration
clearer.

Good for clarifying "what we do"

Then, we have 4 following charge points.
- newpage
- swap-in
- add-to-cache.
- migration.

[akpm@linux-foundation.org: add missing inline directives to stubs]
Signed-off-by: KAMEZAWA Hiroyuki
Reviewed-by: Daisuke Nishimura
Cc: Balbir Singh
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

KAMEZAWA Hiroyuki
2009-01-09 00:31:04 +0800

07 Jan, 2009

3 commits

6d91add09 mm: add Set,ClearPageSwapCache stubs ... Browse Code »

If we add NOOP stubs for SetPageSwapCache() and ClearPageSwapCache(), then
we can remove the #ifdef CONFIG_SWAPs from mm/migrate.c.

Signed-off-by: Hugh Dickins
Acked-by: Christoph Lameter
Cc: Nick Piggin
Cc: Mel Gorman
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Hugh Dickins
2009-01-07 07:59:02 +0800
5bd1455c2 mm: move_pages: no need to set pp->page to ZERO_PAGE(0) by default ... Browse Code »

pp->page is never used when not set to the right page, so there is no need
to set it to ZERO_PAGE(0) by default.

Signed-off-by: Brice Goglin
Acked-by: Christoph Lameter
Cc: Nick Piggin
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Brice Goglin
2009-01-07 07:58:58 +0800
3140a2273 mm: rework do_pages_move() to work on page_sized chunks ... Browse Code »

Rework do_pages_move() to work by page-sized chunks of struct page_to_node
that are passed to do_move_page_to_node_array(). We now only have to
allocate a single page instead a possibly very large vmalloc area to store
all page_to_node entries.

As a result, new_page_node() will now have a very small lookup, hidding
much of the overall sys_move_pages() overhead.

Signed-off-by: Brice Goglin
Signed-off-by: Nathalie Furmento
Acked-by: Christoph Lameter
Cc: Nick Piggin
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Brice Goglin
2009-01-07 07:58:58 +0800

25 Dec, 2008

1 commit

cbacc2c7f Merge branch 'next' into for-linus Browse Code »

James Morris
2008-12-25 08:40:09 +0800

17 Dec, 2008

1 commit

c095adbc2 mm: Don't touch uninitialized variable in do_pages_stat_array() ... Browse Code »

Commit 80bba1290ab5122c60cdb73332b26d288dc8aedd removed one necessary
variable initialization. As a result following warning happened:

CC mm/migrate.o
mm/migrate.c: In function 'sys_move_pages':
mm/migrate.c:1001: warning: 'err' may be used uninitialized in this function

More unfortunately, if find_vma() failed, kernel read uninitialized
memory.

Signed-off-by: KOSAKI Motohiro
CC: Brice Goglin
Cc: Christoph Lameter
Cc: KAMEZAWA Hiroyuki
Cc: Nick Piggin
Cc: Hugh Dickins
Signed-off-by: Linus Torvalds

KOSAKI Motohiro
2008-12-17 00:19:23 +0800

11 Dec, 2008

1 commit

80bba1290 mm: no get_user/put_user while holding mmap_sem in do_pages_stat? ... Browse Code »

Since commit 2f007e74bb85b9fc4eab28524052161703300f1a, do_pages_stat()
gets the page address from user-space and puts the corresponding status
back while holding the mmap_sem for read. There is no need to hold
mmap_sem there while some page-faults may occur.

This patch adds a temporary address and status buffer so as to only
hold mmap_sem while working on these kernel buffers. This is
implemented by extracting do_pages_stat_array() out of do_pages_stat().

Signed-off-by: Brice Goglin
Cc: Christoph Lameter
Cc: KAMEZAWA Hiroyuki
Cc: Nick Piggin
Cc: Hugh Dickins
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Brice Goglin
2008-12-11 00:01:53 +0800

04 Dec, 2008

1 commit

ec98ce480 Merge branch 'master' into next ... Browse Code »

Conflicts:
fs/nfsd/nfs4recover.c

Manually fixed above to use new creds API functions, e.g.
nfs4_save_creds().

Signed-off-by: James Morris

James Morris
2008-12-04 14:16:36 +0800

20 Nov, 2008

1 commit

bda8550de migration: fix writepage error ... Browse Code »

Page migration's writeout() has got understandably confused by the nasty
AOP_WRITEPAGE_ACTIVATE case: as in normal success, a writepage() error has
unlocked the page, so writeout() then needs to relock it.

Signed-off-by: Hugh Dickins
Cc: KAMEZAWA Hiroyuki
Cc: Christoph Lameter
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Hugh Dickins
2008-11-20 10:49:58 +0800

14 Nov, 2008

4 commits

2b8289256 Merge branch 'master' into next ... Browse Code »

Conflicts:
security/keys/internal.h
security/keys/process_keys.c
security/keys/request_key.c

Fixed conflicts above by using the non 'tsk' versions.

Signed-off-by: James Morris

James Morris
2008-11-14 08:29:12 +0800
c69e8d9c0 CRED: Use RCU to access another task's creds and to release a task's own creds ... Browse Code »

Use RCU to access another task's creds and to release a task's own creds.
This means that it will be possible for the credentials of a task to be
replaced without another task (a) requiring a full lock to read them, and (b)
seeing deallocated memory.

Signed-off-by: David Howells
Acked-by: James Morris
Acked-by: Serge Hallyn
Signed-off-by: James Morris

David Howells
2008-11-14 07:39:19 +0800
b6dff3ec5 CRED: Separate task security context from task_struct ... Browse Code »

Separate the task security context from task_struct. At this point, the
security data is temporarily embedded in the task_struct with two pointers
pointing to it.

Note that the Alpha arch is altered as it refers to (E)UID and (E)GID in
entry.S via asm-offsets.

With comment fixes Signed-off-by: Marc Dionne

Signed-off-by: David Howells
Acked-by: James Morris
Acked-by: Serge Hallyn
Signed-off-by: James Morris

David Howells
2008-11-14 07:39:16 +0800
76aac0e9a CRED: Wrap task credential accesses in the core kernel ... Browse Code »

Wrap access to task credentials so that they can be separated more easily from
the task_struct during the introduction of COW creds.

Change most current->(|e|s|fs)[ug]id to current_(|e|s|fs)[ug]id().

Change some task->e?[ug]id to task_e?[ug]id(). In some places it makes more
sense to use RCU directly rather than a convenient wrapper; these will be
addressed by later patches.

Signed-off-by: David Howells
Reviewed-by: James Morris
Acked-by: Serge Hallyn
Cc: Al Viro
Cc: linux-audit@redhat.com
Cc: containers@lists.linux-foundation.org
Cc: linux-mm@kvack.org
Signed-off-by: James Morris

David Howells
2008-11-14 07:39:12 +0800

07 Nov, 2008

1 commit

0aedadf91 mm: move migrate_prep out from under mmap_sem ... Browse Code »

Move the migrate_prep outside the mmap_sem for the following system calls

1. sys_move_pages
2. sys_migrate_pages
3. sys_mbind()

It really does not matter when we flush the lru. The system is free to
add pages onto the lru even during migration which will make the page
migration either skip the page (mbind, migrate_pages) or return a busy
state (move_pages).

Fixes this lockdep warning (and potential deadlock):

Some VM place has
mmap_sem -> kevent_wq via lru_add_drain_all()

net/core/dev.c::dev_ioctl() has
rtnl_lock -> mmap_sem (*) the ioctl has copy_from_user() and it can do page fault.

linkwatch_event has
kevent_wq -> rtnl_lock

Signed-off-by: Christoph Lameter
Cc: KOSAKI Motohiro
Reported-by: Heiko Carstens
Cc: Nick Piggin
Cc: Hugh Dickins
Cc: Rik van Riel
Cc: Lee Schermerhorn
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Christoph Lameter
2008-11-07 07:41:18 +0800