Eric Lee / smarc-fsl-linux-kernel

27 Jul, 2011

1 commit

778d3b0ff cpusets: randomize node rotor used in cpuset_mem_spread_node()

[ This patch has already been accepted as commit 0ac0c0d0f837 but later
reverted (commit 35926ff5fba8) because it introduced arch specific
__node_random which was defined only for x86 code so it broke other
archs. This is a followup without any arch specific code. Other than
that there are no functional changes.]

Some workloads that create a large number of small files tend to assign
too many pages to node 0 (multi-node systems). Part of the reason is
that the rotor (in cpuset_mem_spread_node()) used to assign nodes starts
at node 0 for newly created tasks.

This patch changes the rotor to be initialized to a random node number
of the cpuset.

[akpm@linux-foundation.org: fix layout]
[Lee.Schermerhorn@hp.com: Define stub numa_random() for !NUMA configuration]
[mhocko@suse.cz: Make it arch independent]
[akpm@linux-foundation.org: fix CONFIG_NUMA=y, MAX_NUMNODES>1 build]
Signed-off-by: Jack Steiner
Signed-off-by: Lee Schermerhorn
Signed-off-by: Michal Hocko
Reviewed-by: KOSAKI Motohiro
Cc: Christoph Lameter
Cc: Pekka Enberg
Cc: Paul Menage
Cc: Jack Steiner
Cc: Robin Holt
Cc: David Rientjes
Cc: Christoph Lameter
Cc: David Rientjes
Cc: Jack Steiner
Cc: KOSAKI Motohiro
Cc: Lee Schermerhorn
Cc: Michal Hocko
Cc: Paul Menage
Cc: Pekka Enberg
Cc: Robin Holt
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Michal Hocko
2011-07-27 07:49:43 +0800
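
The rotor change in 778d3b0ff can be illustrated with a small userspace sketch. This is not the kernel code: a single unsigned long stands in for the kernel nodemask, and the names nodemask_sketch_t, sketch_weight() and sketch_node_random() are hypothetical. The idea shown is only "pick the k-th allowed node, where k is a random number reduced modulo the number of allowed nodes", so that new tasks do not all start at node 0.

```c
#include <assert.h>

/* Hypothetical stand-in for a nodemask: bit n set means node n is allowed. */
typedef unsigned long nodemask_sketch_t;

/* Count how many nodes are set in the mask. */
static int sketch_weight(nodemask_sketch_t mask)
{
    int w = 0;

    while (mask) {
        mask &= mask - 1;   /* clear the lowest set bit */
        w++;
    }
    return w;
}

/*
 * Return the index of one set bit of mask, chosen by rnd, or -1 if the
 * mask is empty. rnd is any caller-supplied random number.
 */
static int sketch_node_random(nodemask_sketch_t mask, unsigned int rnd)
{
    int w = sketch_weight(mask);
    int node = 0;
    int skip;

    if (w == 0)
        return -1;

    skip = (int)(rnd % (unsigned int)w);    /* which allowed node to pick */
    for (;; node++) {
        if (mask & (1UL << node)) {
            if (skip == 0)
                return node;
            skip--;
        }
    }
}
```

A caller would seed a per-task rotor once with sketch_node_random(allowed_mask, random_value) and then advance it round-robin as before.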

25 May, 2011

6 commits

f69ff943d mm: proc: move show_numa_map() to fs/proc/task_mmu.c

Moving show_numa_map() from mempolicy.c to task_mmu.c solves several
issues.

- Having the show() operation "miles away" from the corresponding
seq_file iteration operations is a maintenance burden.

- The need to export ad hoc info like struct proc_maps_private is
eliminated.

- The implementation of show_numa_map() can be improved in a simple
manner by cooperating with the other seq_file operations (start,
stop, etc) -- something that would be messy to do without this
change.

Signed-off-by: Stephen Wilson
Reviewed-by: KOSAKI Motohiro
Cc: Hugh Dickins
Cc: David Rientjes
Cc: Lee Schermerhorn
Cc: Alexey Dobriyan
Cc: Christoph Lameter
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Stephen Wilson
2011-05-25 23:39:34 +0800
9840e3723 mm: remove check_huge_range()

This function has been superseded by gather_hugetbl_stats() and is no
longer needed.

Signed-off-by: Stephen Wilson
Reviewed-by: KOSAKI Motohiro
Cc: Hugh Dickins
Cc: David Rientjes
Cc: Lee Schermerhorn
Cc: Alexey Dobriyan
Cc: Christoph Lameter
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Stephen Wilson
2011-05-25 23:39:33 +0800
722e2ee09 mm: make gather_stats() type-safe and remove forward declaration

Improve the prototype of gather_stats() to take a struct numa_maps as
argument instead of a generic void *. Update all callers to make the
required type explicit.

Since gather_stats() is not needed before its definition and is scheduled
to be moved out of mempolicy.c the declaration is removed as well.

Signed-off-by: Stephen Wilson
Reviewed-by: KOSAKI Motohiro
Cc: Hugh Dickins
Cc: David Rientjes
Cc: Lee Schermerhorn
Cc: Alexey Dobriyan
Cc: Christoph Lameter
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Stephen Wilson
2011-05-25 23:39:33 +0800
b1f72d185 mm: remove MPOL_MF_STATS

Mapping statistics in a NUMA environment are now computed using the generic
walk_page_range() logic. Remove the old, equivalent functionality.

Signed-off-by: Stephen Wilson
Reviewed-by: KOSAKI Motohiro
Cc: Hugh Dickins
Cc: David Rientjes
Cc: Lee Schermerhorn
Cc: Alexey Dobriyan
Cc: Christoph Lameter
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Stephen Wilson
2011-05-25 23:39:33 +0800
29ea2f698 mm: use walk_page_range() instead of custom page table walking code

Converting show_numa_map() to use the generic routine decouples the
function from mempolicy.c, allowing it to be moved out of the mm subsystem
and into fs/proc.

Also, include KSM pages in /proc/pid/numa_maps statistics. The pagewalk
logic implemented by check_pte_range() failed to account for such pages as
they were not applicable to the page migration case.

Signed-off-by: Stephen Wilson
Reviewed-by: KOSAKI Motohiro
Cc: Hugh Dickins
Cc: David Rientjes
Cc: Lee Schermerhorn
Cc: Alexey Dobriyan
Cc: Christoph Lameter
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Stephen Wilson
2011-05-25 23:39:32 +0800
d98f6cb67 mm: export get_vma_policy()

In commit 48fce3429d ("mempolicies: unexport get_vma_policy()")
get_vma_policy() was marked static as all clients were local to
mempolicy.c.

However, the decision to generate /proc/pid/numa_maps in the numa memory
policy code and outside the procfs subsystem introduces an artificial
interdependency between the two systems. Exporting get_vma_policy() once
again is the first step to clean up this interdependency.

Signed-off-by: Stephen Wilson
Reviewed-by: KOSAKI Motohiro
Cc: Hugh Dickins
Cc: David Rientjes
Cc: Lee Schermerhorn
Cc: Alexey Dobriyan
Cc: Christoph Lameter
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Stephen Wilson
2011-05-25 23:39:32 +0800

23 Mar, 2011

1 commit

757196618 mempolicy: remove redundant check in __mpol_equal()

The 'flags' field is already checked, no need to do it again.

Signed-off-by: Namhyung Kim
Cc: Bob Liu
Cc: Lee Schermerhorn
Reviewed-by: Minchan Kim
Cc: Andi Kleen
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Namhyung Kim
2011-03-23 08:44:04 +0800

19 Mar, 2011

1 commit

e16b396ce Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (47 commits)
doc: CONFIG_UNEVICTABLE_LRU doesn't exist anymore
Update cpuset info & webiste for cgroups
dcdbas: force SMI to happen when expected
arch/arm/Kconfig: remove one to many l's in the word.
asm-generic/user.h: Fix spelling in comment
drm: fix printk typo 'sracth'
Remove one to many n's in a word
Documentation/filesystems/romfs.txt: fixing link to genromfs
drivers:scsi Change printk typo initate -> initiate
serial, pch uart: Remove duplicate inclusion of linux/pci.h header
fs/eventpoll.c: fix spelling
mm: Fix out-of-date comments which refers non-existent functions
drm: Fix printk typo 'failled'
coh901318.c: Change initate to initiate.
mbox-db5500.c Change initate to initiate.
edac: correct i82975x error-info reported
edac: correct i82975x mci initialisation
edac: correct commented info
fs: update comments to point correct document
target: remove duplicate include of target/target_core_device.h from drivers/target/target_core_hba.c
...

Trivial conflict in fs/eventpoll.c (spelling vs addition)

Linus Torvalds
2011-03-19 01:37:40 +0800

05 Mar, 2011

2 commits

5c4b4be3b mm: use correct numa policy node for transparent hugepages

Pass down the correct node for a transparent hugepage allocation. Most
callers continue to use the current node, however the khugepaged daemon
now uses the node on which the first page to be collapsed resides instead.
This ensures that khugepaged does not mess up local memory for an
existing process which uses local policy.

The choice of node is somewhat primitive currently: it just uses the
node of the first page in the pmd range. An alternative would be to
look at multiple pages and use the most popular node. I used the
simplest variant for now which should work well enough for the case of
all pages being on the same node.

[akpm@linux-foundation.org: coding-style fixes]
Acked-by: Andrea Arcangeli
Signed-off-by: Andi Kleen
Reviewed-by: KAMEZAWA Hiroyuki
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Andi Kleen
2011-03-05 09:53:39 +0800
2f5f9486f mm: change alloc_pages_vma to pass down the policy node for local policy

Currently alloc_pages_vma() always uses the local node as policy node for
the LOCAL policy. Pass this node down as an argument instead.

No behaviour change from this patch, but will be needed for followons.

Acked-by: Andrea Arcangeli
Signed-off-by: Andi Kleen
Reviewed-by: KAMEZAWA Hiroyuki
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Andi Kleen
2011-03-05 09:53:39 +0800

01 Mar, 2011

1 commit

ae0e47f02 Remove one to many n's in a word

Signed-off-by: Justin P. Mattock
Signed-off-by: Jiri Kosina

Justin P. Mattock
2011-03-01 22:47:58 +0800

26 Feb, 2011

1 commit

8eac563c1 thp: fix interleaving for transparent hugepages

The THP code didn't pass the correct interleaving shift to the memory
policy code. Fix this here by adjusting for the order.

Signed-off-by: Andi Kleen
Reviewed-by: Christoph Lameter
Acked-by: Andrea Arcangeli
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Andi Kleen
2011-02-26 07:07:37 +0800
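
The interleaving fix in 8eac563c1 comes down to computing the interleave index with a shift that accounts for the allocation order. The following userspace sketch shows the arithmetic only. It assumes 4 KiB base pages, and the names SKETCH_PAGE_SHIFT, sketch_interleave_index() and sketch_interleave_node() are hypothetical; the real policy code also walks the policy's nodemask rather than taking a simple modulo.

```c
#include <assert.h>

#define SKETCH_PAGE_SHIFT 12U   /* assumption: 4 KiB base pages */

/*
 * Index of the allocation unit containing addr, counted from the start of
 * the backing object. order is the allocation order: 0 for a base page,
 * 9 for a 2 MiB huge page when base pages are 4 KiB.
 */
static unsigned long sketch_interleave_index(unsigned long vm_start,
                                             unsigned long vm_pgoff,
                                             unsigned long addr,
                                             unsigned int order)
{
    unsigned int shift = SKETCH_PAGE_SHIFT + order;
    unsigned long off = vm_pgoff >> (shift - SKETCH_PAGE_SHIFT);

    off += (addr - vm_start) >> shift;
    return off;
}

/* Simplified node choice: spread consecutive units over nr_nodes nodes. */
static unsigned int sketch_interleave_node(unsigned long index,
                                           unsigned int nr_nodes)
{
    assert(nr_nodes > 0);
    return (unsigned int)(index % nr_nodes);
}
```

With order 0, every 2 MiB-aligned address yields an index that is a multiple of 512, so on a two-node system all huge pages would land on the same node. With order 9 the index advances by one per huge page and the nodes alternate.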

14 Jan, 2011

5 commits

0bbbc0b33 thp: add numa awareness to hugepage allocations

It's mostly a matter of replacing alloc_pages with alloc_pages_vma after
introducing alloc_pages_vma. khugepaged needs special handling as the
allocation has to happen inside collapse_huge_page where the vma is known
and an error has to be returned to the outer loop to sleep
alloc_sleep_millisecs in case of failure. But it retains the more
efficient logic of handling allocation failures in khugepaged in case of
CONFIG_NUMA=n.

Signed-off-by: Andrea Arcangeli
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Andrea Arcangeli
2011-01-14 09:32:45 +0800
bae9c19bf thp: split_huge_page_mm/vma

split_huge_page_pmd compat code. Each one of those would need to be
expanded to hundreds of lines of complex code without a fully reliable
split_huge_page_pmd design.

Signed-off-by: Andrea Arcangeli
Acked-by: Rik van Riel
Acked-by: Mel Gorman
Signed-off-by: Johannes Weiner
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Andrea Arcangeli
2011-01-14 09:32:41 +0800
1e50df39f mempolicy: remove tasklist_lock from migrate_pages

Today, tasklist_lock in migrate_pages doesn't protect anything.
rcu_read_lock() provides enough protection for the pid hash walk.

Signed-off-by: KOSAKI Motohiro
Reported-by: Peter Zijlstra
Cc: Christoph Lameter
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

KOSAKI Motohiro
2011-01-14 09:32:36 +0800
7f0f24967 mm: migration: cleanup migrate_pages API by matching types for offlining and sync

With the introduction of the boolean sync parameter, the API looks a
little inconsistent as offlining is still an int. Convert offlining to a
bool for the sake of being tidy.

Signed-off-by: Mel Gorman
Cc: Andrea Arcangeli
Cc: KOSAKI Motohiro
Cc: Rik van Riel
Acked-by: Johannes Weiner
Cc: Andy Whitcroft
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Mel Gorman
2011-01-14 09:32:34 +0800
77f1fe6b0 mm: migration: allow migration to operate asynchronously and avoid synchronous compaction in the faster path

Migration synchronously waits for writeback if the initial passes fail.
Callers of memory compaction do not necessarily want this behaviour if the
caller is latency sensitive or expects that synchronous migration is not
going to have a significantly better success rate.

This patch adds a sync parameter to migrate_pages() allowing the caller to
indicate if wait_on_page_writeback() is allowed within migration or not.
For reclaim/compaction, try_to_compact_pages() is first called
asynchronously, direct reclaim runs and then try_to_compact_pages() is
called synchronously as there is a greater expectation that it'll succeed.

[akpm@linux-foundation.org: build/merge fix]
Signed-off-by: Mel Gorman <mel@csn.ul.ie>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Rik van Riel <riel@redhat.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Andy Whitcroft <apw@shadowen.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Mel Gorman
2011-01-14 09:32:34 +0800

03 Dec, 2010

1 commit

55cfaa3cb mm/mempolicy.c: add rcu read lock to protect pid structure

find_task_by_vpid() should be protected by rcu_read_lock(), to prevent
free_pid() reclaiming pid.

Signed-off-by: Zeng Zhaoming
Cc: "Paul E. McKenney"
Cc: KOSAKI Motohiro
Cc: Christoph Lameter
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Zeng Zhaoming
2010-12-03 06:51:14 +0800

29 Oct, 2010

1 commit

800416f79 numa: fix slab_node(MPOL_BIND)

When a node contains only HighMem memory, slab_node(MPOL_BIND)
dereferences a NULL pointer.

[ This code seems to go back all the way to commit 19770b32609b: "mm:
filter based on a nodemask as well as a gfp_mask". Which was back in
April 2008, and it got merged into 2.6.26. - Linus ]

Signed-off-by: Eric Dumazet
Cc: Mel Gorman
Cc: Christoph Lameter
Cc: Lee Schermerhorn
Cc: Andrew Morton
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds

Eric Dumazet
2010-10-29 01:04:30 +0800

27 Oct, 2010

2 commits

0def08e3a mm/mempolicy.c: check return code of check_range

Function check_range may return ERR_PTR(...). Check for it.

Signed-off-by: Vasiliy Kulikov
Acked-by: David Rientjes
Reviewed-by: Christoph Lameter
Reviewed-by: KOSAKI Motohiro
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Vasiliy Kulikov
2010-10-27 07:52:06 +0800
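
The ERR_PTR convention that 0def08e3a checks for can be sketched in userspace C. The helpers below mimic the idea behind the kernel's ERR_PTR(), PTR_ERR() and IS_ERR() (a small negative errno encoded in the top of the pointer range), but they are re-implementations under hypothetical sketch_ names, and sketch_check_range() is a made-up stand-in that only simulates a function returning either a valid pointer or an encoded error.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define SKETCH_MAX_ERRNO 4095

/* Encode a negative errno value as a pointer. */
static inline void *sketch_err_ptr(long error)
{
    return (void *)(intptr_t)error;
}

/* Decode the errno value from an error pointer. */
static inline long sketch_ptr_err(const void *ptr)
{
    return (long)(intptr_t)ptr;
}

/* True if ptr lies in the last SKETCH_MAX_ERRNO values of the address range. */
static inline int sketch_is_err(const void *ptr)
{
    return (uintptr_t)ptr >= (uintptr_t)-SKETCH_MAX_ERRNO;
}

static int sketch_dummy_vma;    /* stands in for a real object */

/* Hypothetical range check: returns an object or an encoded error. */
static void *sketch_check_range(int fail)
{
    if (fail)
        return sketch_err_ptr(-EFAULT);
    return &sketch_dummy_vma;
}

/* The caller must test the result before using it as a pointer. */
static long sketch_caller(int fail)
{
    void *vma = sketch_check_range(fail);

    if (sketch_is_err(vma))
        return sketch_ptr_err(vma);
    return 0;
}
```

Note that a NULL check alone does not catch this case: an encoded error is a non-NULL value, so dereferencing it without the sketch_is_err() test would fault.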
cf608ac19 mm: compaction: fix COMPACTPAGEFAILED counting

Presently update_nr_listpages() doesn't have a role. That's because the
lists passed to it are always empty just after calling migrate_pages().
migrate_pages() cleans up the list of pages which have failed to migrate
before returning, a behaviour introduced by aaa994b3:

[PATCH] page migration: handle freeing of pages in migrate_pages()

Do not leave pages on the lists passed to migrate_pages(). Seems that we will
not need any postprocessing of pages. This will simplify the handling of
pages by the callers of migrate_pages().

At that time, we thought we didn't need any postprocessing of pages. But
the situation has changed. Compaction needs to know the number of pages
that failed to migrate for the COMPACTPAGEFAILED stat.

This patch makes a new rule for callers of migrate_pages(): they must call
putback_lru_pages(). The caller cleans up the lists itself, so it has a
chance to postprocess the pages. [suggested by Christoph Lameter]

Signed-off-by: Minchan Kim
Cc: Hugh Dickins
Cc: Andi Kleen
Reviewed-by: Mel Gorman
Reviewed-by: Wu Fengguang
Acked-by: Christoph Lameter
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Minchan Kim
2010-10-27 07:52:06 +0800

10 Aug, 2010

2 commits

596d7cfa2 mempolicy: reduce stack size of migrate_pages()

migrate_pages() is using >500 bytes stack. Reduce it.

mm/mempolicy.c: In function 'sys_migrate_pages':
mm/mempolicy.c:1344: warning: the frame size of 528 bytes is larger than 512 bytes

[akpm@linux-foundation.org: don't play with a might-be-NULL pointer]
Signed-off-by: KOSAKI Motohiro
Reviewed-by: KAMEZAWA Hiroyuki
Reviewed-by: Christoph Lameter
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

KOSAKI Motohiro
2010-08-10 11:44:58 +0800
6f48d0ebd oom: select task from tasklist for mempolicy ooms

The oom killer presently kills current whenever there is no more memory
free or reclaimable on its mempolicy's nodes. There is no guarantee that
current is a memory-hogging task or that killing it will free any
substantial amount of memory, however.

In such situations, it is better to scan the tasklist for tasks that are
allowed to allocate on current's set of nodes and kill the task with the
highest badness() score. This ensures that the most memory-hogging task,
or the one configured by the user with /proc/pid/oom_adj, is always
selected in such scenarios.

Signed-off-by: David Rientjes
Reviewed-by: KOSAKI Motohiro
Cc: KAMEZAWA Hiroyuki
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

David Rientjes
2010-08-10 11:44:56 +0800
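
The selection rule described in 6f48d0ebd can be sketched as a simple scan. This is a userspace illustration under stated assumptions, not the oom killer itself: struct sketch_task, its fields and sketch_select_victim() are hypothetical, node sets are plain bitmasks, and the badness value is taken as given rather than computed.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily reduced view of a task for this illustration. */
struct sketch_task {
    int pid;
    unsigned long mems_allowed; /* bit n set: task may allocate on node n */
    unsigned long badness;      /* precomputed score, higher is worse */
};

/*
 * Among the tasks that may allocate on any node in oom_nodes, return the
 * one with the highest badness score, or NULL if no task is eligible.
 */
static const struct sketch_task *
sketch_select_victim(const struct sketch_task *tasks, size_t n,
                     unsigned long oom_nodes)
{
    const struct sketch_task *victim = NULL;
    size_t i;

    for (i = 0; i < n; i++) {
        /* Skip tasks whose nodes do not intersect the constrained set. */
        if (!(tasks[i].mems_allowed & oom_nodes))
            continue;
        if (!victim || tasks[i].badness > victim->badness)
            victim = &tasks[i];
    }
    return victim;
}
```

Compared with always killing current, the scan can pick a task that actually holds memory on the exhausted nodes.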

30 Jun, 2010

1 commit

5c0c16549 mempolicy: fix dangling reference to tmpfs superblock mpol

My patch to "Factor out duplicate put/frees in mpol_shared_policy_init()
to a common return path"; and Dan Carpenter's fix thereto both left a
dangling reference to the incoming tmpfs superblock mempolicy structure.
A similar leak was introduced earlier when the nodemask was moved offstack
to the scratch area despite the note in the comment block regarding the
incoming ref.

Move the remaining put of the incoming "mpol" to the common exit path to
drop the reference.

Signed-off-by: Lee Schermerhorn
Acked-by: Dan Carpenter
Cc: KOSAKI Motohiro
Cc: David Rientjes
Cc: Christoph Lameter
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Lee Schermerhorn
2010-06-30 06:29:31 +0800

26 May, 2010

1 commit

0cae3457b mempolicy: ERR_PTR dereference in mpol_shared_policy_init()

The original code called mpol_put(new) while "new" was an ERR_PTR.

Signed-off-by: Dan Carpenter
Cc: Lee Schermerhorn
Cc: KOSAKI Motohiro
Cc: Christoph Lameter
Acked-by: David Rientjes
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Dan Carpenter
2010-05-26 23:19:23 +0800

25 May, 2010

10 commits

6ec3a1271 mm: consider the entire user address space during node migration

Use mm->task_size instead of TASK_SIZE to ensure that the entire user
address space is migrated. mm->task_size is independent of the calling
task context. TASK_SIZE may be dependent on the address space size of the
calling process. Usage of TASK_SIZE can lead to partial address space
migration if the calling process was 32 bit and the migrating process was
64 bit.

Here is the test script used on a 64 bit system with a 32 bit echo process:

mount -t cgroup none /cgroup -o cpuset
cd /cgroup

mkdir 0
echo 1 > 0/cpuset.cpus
echo 0 > 0/cpuset.mems
echo 1 > 0/cpuset.memory_migrate

mkdir 1
echo 1 > 1/cpuset.cpus
echo 1 > 1/cpuset.mems
echo 1 > 1/cpuset.memory_migrate

echo $$ > 0/tasks
64_bit_process &
pid=$!

echo $pid > 1/tasks # This does not migrate all process pages without
# this patch. If 64 bit echo is used or this patch is
# applied, then the full address space of $pid is
# migrated.

To check memory migration, I watched:
grep MemUsed /sys/devices/system/node/node*/meminfo

Signed-off-by: Greg Thelen
Acked-by: Christoph Lameter
Reviewed-by: KOSAKI Motohiro
Cc: Mel Gorman
Cc: KAMEZAWA Hiroyuki
Cc: Daisuke Nishimura
Cc: Balbir Singh
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Greg Thelen
2010-05-25 23:07:00 +0800
c0ff7453b cpuset,mm: fix no node to alloc memory when changing cpuset's mems

Before applying this patch, cpuset updates task->mems_allowed and
mempolicy by setting all new bits in the nodemask first, and clearing all
old unallowed bits later. But in the meantime, the allocator may find that
there is no node to alloc memory from.

The reason is that when cpuset rebinds the task's mempolicy, it clears the
nodes which the allocator can alloc pages on, for example:

(mpol: mempolicy)
task1                   task1's mpol    task2
alloc page              1
  alloc on node0? NO    1
                        1               change mems from 1 to 0
                        1               rebind task1's mpol
                        0-1               set new bits
                        0                 clear disallowed bits
  alloc on node1? NO    0
  ...
can't alloc page
  goto oom

This patch fixes this problem by expanding the nodes range first (set newly
allowed bits) and shrinking it lazily (clear newly disallowed bits). So we
use a variable to tell the write-side task that the read-side task is
reading the nodemask, and the write-side task clears newly disallowed nodes
after the read-side task ends the current memory allocation.

[akpm@linux-foundation.org: fix spello]
Signed-off-by: Miao Xie
Cc: David Rientjes
Cc: Nick Piggin
Cc: Paul Menage
Cc: Lee Schermerhorn
Cc: Hugh Dickins
Cc: Ravikiran Thirumalai
Cc: KOSAKI Motohiro
Cc: Christoph Lameter
Cc: Andi Kleen
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Miao Xie
2010-05-25 23:06:57 +0800
708c1bbc9 mempolicy: restructure rebinding-mempolicy functions

Nick Piggin reported that the allocator may see an empty nodemask when
changing cpuset's mems[1]. It happens only on kernels that do not do
atomic nodemask_t stores (MAX_NUMNODES > BITS_PER_LONG).

But I found that there is also a problem on kernels that can do atomic
nodemask_t stores. The problem is that the allocator can't find a node to
alloc a page from when changing cpuset's mems, even though there is a lot
of free memory. The reason is like this:

(mpol: mempolicy)
task1                   task1's mpol    task2
alloc page              1
  alloc on node0? NO    1
                        1               change mems from 1 to 0
                        1               rebind task1's mpol
                        0-1               set new bits
                        0                 clear disallowed bits
  alloc on node1? NO    0
  ...
can't alloc page
  goto oom

I can use the attached program to reproduce it with the following steps:

# mkdir /dev/cpuset
# mount -t cpuset cpuset /dev/cpuset
# mkdir /dev/cpuset/1
# echo `cat /dev/cpuset/cpus` > /dev/cpuset/1/cpus
# echo `cat /dev/cpuset/mems` > /dev/cpuset/1/mems
# echo $$ > /dev/cpuset/1/tasks
# numactl --membind=`cat /dev/cpuset/mems` ./cpuset_mem_hog &
= max(nr_cpus - 1, 1)
# killall -s SIGUSR1 cpuset_mem_hog
# ./change_mems.sh

Several hours later, oom will happen even though there is a lot of free memory.

This patchset fixes this problem by expanding the nodes range first (set
newly allowed bits) and shrinking it lazily (clear newly disallowed bits).
So we use a variable to tell the write-side task that the read-side task is
reading the nodemask, and the write-side task clears newly disallowed nodes
after the read-side task ends the current memory allocation.

This patch:

In order to fix the "no node to alloc memory" problem, when we want to
update mempolicy and mems_allowed, we expand the set of nodes first (set
all the newly allowed nodes) and shrink the set of nodes lazily (clear the
disallowed nodes). But the mempolicy's rebind functions may break the
expansion.

So we restructure the mempolicy's rebind functions and split the rebind
work into two steps, just like the update of cpuset's mems: The 1st step:
expand the set of the mempolicy's nodes. The 2nd step: shrink the set of
the mempolicy's nodes. This is used when there is no real lock to protect
the mempolicy on the read side. Otherwise we can do the rebind work at once.

In order to implement it, we define

	enum mpol_rebind_step {
		MPOL_REBIND_ONCE,
		MPOL_REBIND_STEP1,
		MPOL_REBIND_STEP2,
		MPOL_REBIND_NSTEP,
	};

If the mempolicy needn't be updated by two steps, we can pass
MPOL_REBIND_ONCE to the rebind functions. Or we can pass
MPOL_REBIND_STEP1 to do the first step of the rebind work and pass
MPOL_REBIND_STEP2 to do the second step work.

Besides that, there may be a long time between these two steps, and we have
to release the lock that protects mempolicy and mems_allowed in between.
When we take the lock again, we must check whether the current mempolicy is
under rebinding (the first step has been done) or not, because the task may
alloc a new mempolicy while we don't hold the lock. So we define the
following flag to identify it:

#define MPOL_F_REBINDING (1 << 2)

The new functions will be used in the next patch.

Signed-off-by: Miao Xie
Cc: David Rientjes
Cc: Nick Piggin
Cc: Paul Menage
Cc: Lee Schermerhorn
Cc: Hugh Dickins
Cc: Ravikiran Thirumalai
Cc: KOSAKI Motohiro
Cc: Christoph Lameter
Cc: Andi Kleen
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Miao Xie
2010-05-25 23:06:57 +0800
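
The expand-then-shrink idea behind 708c1bbc9 and c0ff7453b can be sketched with plain bitmasks. The enum values are the ones named in the commit message, but sketch_rebind() is a hypothetical helper written for this illustration: the real rebind functions also handle the static and relative nodes flags and node remapping, which are left out here.

```c
#include <assert.h>

enum mpol_rebind_step {
    MPOL_REBIND_ONCE,   /* do the rebind in one go (read side is locked out) */
    MPOL_REBIND_STEP1,  /* first step: expand, set the newly allowed nodes */
    MPOL_REBIND_STEP2,  /* second step: shrink, clear the disallowed nodes */
    MPOL_REBIND_NSTEP,
};

/*
 * Return the policy's node set after applying one rebind step.
 * cur is the current set, newmask the set being switched to.
 */
static unsigned long sketch_rebind(unsigned long cur, unsigned long newmask,
                                   enum mpol_rebind_step step)
{
    switch (step) {
    case MPOL_REBIND_STEP1:
        return cur | newmask;   /* union: old and new nodes both usable */
    case MPOL_REBIND_STEP2:
    case MPOL_REBIND_ONCE:
        return newmask;         /* only the new nodes remain */
    default:
        return cur;             /* MPOL_REBIND_NSTEP is a count, not a step */
    }
}
```

For a move from node 1 to node 0, the intermediate set after the first step is {0, 1}, so a concurrent allocation that started with node 1 in mind and one that starts afterwards with node 0 in mind both find a node they are allowed to use.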
15d77835a mempolicy: factor mpol_shared_policy_init() return paths

Factor out duplicate put/frees in mpol_shared_policy_init() to a common
return path.

Signed-off-by: Lee Schermerhorn
Cc: Hugh Dickins
Cc: Ravikiran Thirumalai
Cc: KOSAKI Motohiro
Cc: Christoph Lameter
Cc: David Rientjes
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Lee Schermerhorn
2010-05-25 23:06:57 +0800
345ace9c7 mempolicy: rename policy_types and cleanup initialization

Rename 'policy_types[]' to 'policy_modes[]' to better match the array
contents.

Use designated initializer syntax for policy_modes[].

Signed-off-by: Lee Schermerhorn
Cc: Hugh Dickins
Cc: Ravikiran Thirumalai
Cc: KOSAKI Motohiro
Cc: Christoph Lameter
Cc: David Rientjes
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Lee Schermerhorn
2010-05-25 23:06:57 +0800
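
Designated initializer syntax, as adopted for policy_modes[] in 345ace9c7, ties each string to its enum value instead of relying on the order of the entries. The sketch below is an illustration only: the SKETCH_ enum, the table contents and sketch_parse_mode() are assumptions made for the example and are not copied from mempolicy.c.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical mode numbering for this example. */
enum sketch_mode {
    SKETCH_MPOL_DEFAULT,
    SKETCH_MPOL_PREFERRED,
    SKETCH_MPOL_BIND,
    SKETCH_MPOL_INTERLEAVE,
    SKETCH_MPOL_MAX,
};

/* Each entry names its own index, so reordering the enum cannot skew it. */
static const char * const sketch_policy_modes[] = {
    [SKETCH_MPOL_DEFAULT]    = "default",
    [SKETCH_MPOL_PREFERRED]  = "prefer",
    [SKETCH_MPOL_BIND]       = "bind",
    [SKETCH_MPOL_INTERLEAVE] = "interleave",
};

/* Look a mode name up, using the mode itself as the loop variable. */
static int sketch_parse_mode(const char *str)
{
    int mode;

    for (mode = 0; mode < SKETCH_MPOL_MAX; mode++)
        if (strcmp(str, sketch_policy_modes[mode]) == 0)
            return mode;
    return -1;  /* unknown mode name */
}
```

The lookup loop iterates over the mode directly, without a separate index variable that is later copied into it.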
b4652e842 mempolicy: lose unnecessary loop variable in mpol_parse_str()

We don't really need the extra variable 'i' in mpol_parse_str(). The only
use is as the loop variable. Then, it's assigned to 'mode'. Just use
mode, and lose the 'uninitialized_var()' macro.

Signed-off-by: Lee Schermerhorn
Cc: Hugh Dickins
Cc: Ravikiran Thirumalai
Cc: KOSAKI Motohiro
Cc: Christoph Lameter
Cc: David Rientjes
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Lee Schermerhorn
2010-05-25 23:06:57 +0800
e17f74af3 mempolicy: don't call mpol_set_nodemask() when no_context

No need to call mpol_set_nodemask() when we have no context for the
mempolicy. This can occur when we're parsing a tmpfs 'mpol' mount option.
Just save the raw nodemask in the mempolicy's w.user_nodemask member for
use when a tmpfs/shmem file is created. mpol_shared_policy_init() will
"contextualize" the policy for the new file based on the creating task's
context.

Signed-off-by: Lee Schermerhorn
Cc: Hugh Dickins
Cc: Ravikiran Thirumalai
Cc: KOSAKI Motohiro
Cc: Christoph Lameter
Cc: David Rientjes
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Lee Schermerhorn
2010-05-25 23:06:57 +0800
198005025 mempolicy: remove redundant check

Lee's patch "mempolicy: use MPOL_PREFERRED for system-wide default policy"
means MPOL_DEFAULT is now used only in the memory policy APIs. So, no
need to check for it in __mpol_equal() as well. Also get rid of
mpol_match_intent() and move its logic directly into __mpol_equal().

Signed-off-by: Bob Liu
Acked-by: David Rientjes
Cc: Andi Kleen
Cc: Lee Schermerhorn
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Bob Liu
2010-05-25 23:06:57 +0800
6eb27e1fd mempolicy: remove case MPOL_INTERLEAVE from policy_zonelist()

In policy_zonelist() mode MPOL_INTERLEAVE shouldn't happen, so fall
through to BUG() instead of break to return. I also fixed the comment.

Signed-off-by: Bob Liu
Acked-by: David Rientjes
Cc: Andi Kleen
Cc: Lee Schermerhorn
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Bob Liu
2010-05-25 23:06:57 +0800
6d556294d mempolicy: remove redundant code

1. In function is_valid_nodemask(), variable k will be initialized to 0 in
the following loop, so there is no need to initialize it to policy_zone
anymore.

2. (MPOL_F_STATIC_NODES | MPOL_F_RELATIVE_NODES) is already defined
as MPOL_MODE_FLAGS in mempolicy.h.

Signed-off-by: Bob Liu
Acked-by: David Rientjes
Cc: KOSAKI Motohiro
Cc: Christoph Lameter
Cc: Mel Gorman
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Bob Liu
2010-05-25 23:06:57 +0800

30 Mar, 2010

1 commit

5a0e3ad6a include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h

percpu.h is included by sched.h and module.h and thus ends up being
included when building most .c files. percpu.h includes slab.h which
in turn includes gfp.h making everything defined by the two files
universally available and complicating inclusion dependencies.

percpu.h -> slab.h dependency is about to be removed. Prepare for
this change by updating users of gfp and slab facilities to include those
headers directly instead of assuming availability. As this conversion
needs to touch large number of source files, the following script is
used as the basis of conversion.

http://userweb.kernel.org/~tj/misc/slabh-sweep.py

The script does the following.

* Scan files for gfp and slab usages and update includes such that
only the necessary includes are there. ie. if only gfp is used,
gfp.h, if slab is used, slab.h.

* When the script inserts a new include, it looks at the include
blocks and tries to put the new include such that its order conforms
to its surroundings. It's put in the include block which contains
core kernel includes, in the same order that the rest are ordered -
alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
doesn't seem to be any matching order.

* If the script can't find a place to put a new include (mostly
because the file doesn't have fitting include block), it prints out
an error message indicating which .h file needs to be added to the
file.

The conversion was done in the following steps.

1. The initial automatic conversion of all .c files updated slightly
over 4000 files, deleting around 700 includes and adding ~480 gfp.h
and ~3000 slab.h inclusions. The script emitted errors for ~400
files.

2. Each error was manually checked. Some didn't need the inclusion,
some needed manual addition while adding it to implementation .h or
embedding .c file was more appropriate for others. This step added
inclusions to around 150 files.

3. The script was run again and the output was compared to the edits
from #2 to make sure no file was left behind.

4. Several build tests were done and a couple of problems were fixed.
e.g. lib/decompress_*.c used malloc/free() wrappers around slab
APIs requiring slab.h to be added manually.

5. The script was run on all .h files but without automatically
editing them as sprinkling gfp.h and slab.h inclusions around .h
files could easily lead to inclusion dependency hell. Most gfp.h
inclusion directives were ignored as stuff from gfp.h was usually
widely available and often used in preprocessor macros. Each
slab.h inclusion directive was examined and added manually as
necessary.

6. percpu.h was updated not to include slab.h.

7. Build tests were done on the following configurations and failures
were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my
distributed build env didn't work with gcov compiles) and a few
more options had to be turned off depending on archs to make things
build (like ipr on powerpc/64 which failed due to missing writeq).

* x86 and x86_64 UP and SMP allmodconfig and a custom test config.
* powerpc and powerpc64 SMP allmodconfig
* sparc and sparc64 SMP allmodconfig
* ia64 SMP allmodconfig
* s390 SMP allmodconfig
* alpha SMP allmodconfig
* um on x86_64 SMP allmodconfig

8. percpu.h modifications were reverted so that it could be applied as
a separate patch and serve as bisection point.

Given the fact that I had only a couple of failures from tests on step
6, I'm fairly confident about the coverage of this conversion patch.
If there is a breakage, it's likely to be something in one of the arch
headers which should be easily discoverable on most builds of
the specific arch.

Signed-off-by: Tejun Heo <tj@kernel.org>
Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>

Tejun Heo
2010-03-30 21:02:32 +0800

25 Mar, 2010

3 commits

c6b6ef8bb mempolicy: fix get_mempolicy() for relative and static nodes

Discovered while testing other mempolicy changes:

get_mempolicy() does not handle static/relative mode flags correctly.
Return the value that the user specified so that it can be restored
via set_mempolicy() if desired.

Signed-off-by: Lee Schermerhorn
Cc: Hugh Dickins
Cc: Ravikiran Thirumalai
Cc: KOSAKI Motohiro
Cc: Christoph Lameter
Cc: David Rientjes
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Lee Schermerhorn
2010-03-25 07:31:22 +0800
926f2ae04 tmpfs: cleanup mpol_parse_str()

mpol_parse_str() has had many bugs related to its 'err' variable, because
the code is ugly and unfriendly to review.

This patch simplifies it.

Signed-off-by: KOSAKI Motohiro
Cc: Ravikiran Thirumalai
Cc: Christoph Lameter
Cc: Mel Gorman
Acked-by: Lee Schermerhorn
Cc: Hugh Dickins
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

KOSAKI Motohiro
2010-03-25 07:31:21 +0800
12821f5fb tmpfs: handle MPOL_LOCAL mount option properly

commit 71fe804b6d5 (mempolicy: use struct mempolicy pointer in
shmem_sb_info) added the mpol=local mount option, but the feature has been
broken since it was born, because the code always returns 1 (i.e. mount
failure).

This patch fixes it.

Signed-off-by: KOSAKI Motohiro
Cc: Ravikiran Thirumalai
Cc: Christoph Lameter
Cc: Mel Gorman
Acked-by: Lee Schermerhorn
Cc: Hugh Dickins
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

KOSAKI Motohiro
2010-03-25 07:31:21 +0800