25 May, 2010

31 commits

  • The unusable free space index measures how much of the available free
    memory cannot be used to satisfy an allocation of a given size and is a
    value between 0 and 1. The higher the value, the more of the free memory
    is unusable and, by implication, the worse the external fragmentation is.
    For the most part, the huge page size will be the size of interest, but
    not necessarily, so the index is exported on a per-order and per-zone
    basis via /sys/kernel/debug/extfrag/unusable_index.

    > cat /sys/kernel/debug/extfrag/unusable_index
    Node 0, zone DMA 0.000 0.000 0.000 0.001 0.005 0.013 0.021 0.037 0.037 0.101 0.230
    Node 0, zone Normal 0.000 0.000 0.000 0.001 0.002 0.002 0.005 0.015 0.028 0.028 0.054
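
    The same index can be approximated from userspace using the per-order
    free counts in /proc/buddyinfo: for a target order, it is the fraction of
    free memory sitting in blocks too small to satisfy an allocation of that
    order. The following standalone sketch illustrates the calculation (it is
    not the kernel implementation; the target order of 9 is an assumption
    picked to match x86 huge pages):

    #include <stdio.h>

    #define MAX_ORDERS 11

    int main(void)
    {
        char zone[32], line[512];
        int node, target_order = 9;     /* assumed order of interest */
        FILE *fp = fopen("/proc/buddyinfo", "r");

        if (!fp)
            return 1;

        while (fgets(line, sizeof(line), fp)) {
            char *p = line;
            int consumed, order;
            unsigned long count;
            double total = 0, usable = 0;

            if (sscanf(p, "Node %d, zone %31s%n", &node, zone, &consumed) < 2)
                continue;
            p += consumed;

            for (order = 0; order < MAX_ORDERS; order++) {
                if (sscanf(p, "%lu%n", &count, &consumed) != 1)
                    break;
                p += consumed;
                total += (double)count * (1UL << order);
                if (order >= target_order)
                    usable += (double)count * (1UL << order);
            }
            if (total > 0)
                printf("Node %d, zone %-8s unusable_index(order %d) = %.3f\n",
                       node, zone, target_order, 1.0 - usable / total);
        }
        fclose(fp);
        return 0;
    }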

    [akpm@linux-foundation.org: Fix allnoconfig]
    Signed-off-by: Mel Gorman
    Reviewed-by: Minchan Kim
    Reviewed-by: KOSAKI Motohiro
    Reviewed-by: Christoph Lameter
    Cc: Rik van Riel
    Cc: KAMEZAWA Hiroyuki
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mel Gorman
     
  • …tion by not migrating temporary stacks

    Page migration requires rmap to be able to find all ptes mapping a page
    at all times, otherwise the migration entry can be instantiated, but it
    is possible to leave one behind if the second rmap_walk fails to find
    the page. If this page is later faulted, migration_entry_to_page() will
    call BUG because the page is not locked, indicating the page was migrated
    but the migration PTE was not cleaned up. For example:

    kernel BUG at include/linux/swapops.h:105!
    invalid opcode: 0000 [#1] PREEMPT SMP
    ...
    Call Trace:
    [<ffffffff810e951a>] handle_mm_fault+0x3f8/0x76a
    [<ffffffff8130c7a2>] do_page_fault+0x44a/0x46e
    [<ffffffff813099b5>] page_fault+0x25/0x30
    [<ffffffff8114de33>] load_elf_binary+0x152a/0x192b
    [<ffffffff8111329b>] search_binary_handler+0x173/0x313
    [<ffffffff81114896>] do_execve+0x219/0x30a
    [<ffffffff8100a5c6>] sys_execve+0x43/0x5e
    [<ffffffff8100320a>] stub_execve+0x6a/0xc0
    RIP [<ffffffff811094ff>] migration_entry_wait+0xc1/0x129

    There is a race between shift_arg_pages() and migration that triggers
    this bug. A temporary stack is set up during exec and later moved. If
    migration moves a page in the temporary stack and the VMA is then removed
    before migration completes, the migration PTE may not be found, leading
    to a BUG when the stack is faulted.

    This patch causes pages within the temporary stack during exec to be
    skipped by migration. It does this by marking the VMA covering the
    temporary stack with an otherwise impossible combination of VMA flags.
    These flags are cleared when the temporary stack is moved to its final
    location.
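
    As a rough standalone illustration of the trick (the flag values below
    are made up for the example; the kernel keeps its real flags in the
    vm_flags word of struct vm_area_struct), the temporary stack carries a
    combination no normal VMA ever has, and the migration path simply skips
    any VMA with that combination:

    #include <stdbool.h>
    #include <stdio.h>

    #define VM_GROWSDOWN            0x1UL
    #define VM_GROWSUP              0x2UL
    /* "Impossible" combination: a stack cannot grow both ways. */
    #define VM_STACK_INCOMPLETE     (VM_GROWSDOWN | VM_GROWSUP)

    struct vma { unsigned long flags; };

    static bool is_temporary_stack(const struct vma *vma)
    {
        return (vma->flags & VM_STACK_INCOMPLETE) == VM_STACK_INCOMPLETE;
    }

    int main(void)
    {
        struct vma exec_stack = { VM_STACK_INCOMPLETE }; /* during exec   */
        struct vma final_stack = { VM_GROWSDOWN };       /* after the move */

        printf("skip exec stack:  %d\n", is_temporary_stack(&exec_stack));
        printf("skip final stack: %d\n", is_temporary_stack(&final_stack));
        return 0;
    }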

    [kamezawa.hiroyu@jp.fujitsu.com: idea for having migration skip temporary stacks]
    Signed-off-by: Mel Gorman <mel@csn.ul.ie>
    Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
    Reviewed-by: Rik van Riel <riel@redhat.com>
    Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
    Cc: Minchan Kim <minchan.kim@gmail.com>
    Cc: Christoph Lameter <cl@linux.com>
    Cc: Andrea Arcangeli <aarcange@redhat.com>
    Cc: Rik van Riel <riel@redhat.com>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

    Mel Gorman
     
  • CONFIG_MIGRATION currently depends on CONFIG_NUMA or on the architecture
    being able to hot-remove memory. The main users of page migration, such
    as sys_move_pages(), sys_migrate_pages() and cpuset process migration,
    are only beneficial on NUMA, so the dependency makes sense.

    As memory compaction will operate within a zone and is useful on both NUMA
    and non-NUMA systems, this patch allows CONFIG_MIGRATION to be set if the
    user selects CONFIG_COMPACTION as an option.

    [akpm@linux-foundation.org: Depend on CONFIG_HUGETLB_PAGE]
    Signed-off-by: Mel Gorman
    Reviewed-by: Christoph Lameter
    Reviewed-by: Rik van Riel
    Reviewed-by: KAMEZAWA Hiroyuki
    Cc: Minchan Kim
    Cc: KOSAKI Motohiro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mel Gorman
     
  • PageAnon pages that are unmapped may or may not have an anon_vma, so they
    are not currently migrated. However, a swap cache page can be migrated
    and fits this description. This patch identifies swap cache pages and
    allows them to be migrated, but ensures that no attempt is made to remap
    the pages in a way that would potentially access an already freed
    anon_vma.

    Signed-off-by: Mel Gorman
    Reviewed-by: KAMEZAWA Hiroyuki
    Reviewed-by: Minchan Kim
    Cc: Rik van Riel
    Cc: KOSAKI Motohiro
    Cc: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mel Gorman
     
  • rmap_walk_anon() was triggering errors in memory compaction that look like
    use-after-free errors. The problem is that between the page being
    isolated from the LRU and rcu_read_lock() being taken, the mapcount of
    the page drops to 0 and the anon_vma gets freed. This can happen during
    memory compaction if pages being migrated belong to a process that exits
    before migration completes. Hence, the use-after-free race looks like:

    1. Page isolated for migration
    2. Process exits
    3. page_mapcount(page) drops to zero so the anon_vma is no longer reliable
    4. unmap_and_move() takes the rcu_lock but the anon_vma is already garbage
    5. try_to_unmap() is called, looks up the anon_vma and "locks" it but the
    lock is garbage.

    This patch checks the mapcount after the rcu lock is taken. If the
    mapcount is zero, the anon_vma is assumed to be freed and no further
    action is taken.

    Signed-off-by: Mel Gorman
    Acked-by: Rik van Riel
    Reviewed-by: Minchan Kim
    Reviewed-by: KAMEZAWA Hiroyuki
    Cc: KOSAKI Motohiro
    Cc: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mel Gorman
     
  • For clarity of review, KSM and page migration have separate refcounts on
    the anon_vma. While clear, this is a waste of memory. This patch gets
    KSM and page migration to share their toys in a spirit of harmony.

    Signed-off-by: Mel Gorman
    Reviewed-by: Minchan Kim
    Reviewed-by: KOSAKI Motohiro
    Reviewed-by: Christoph Lameter
    Reviewed-by: KAMEZAWA Hiroyuki
    Cc: Rik van Riel
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mel Gorman
     
  • This patchset is a memory compaction mechanism that reduces external
    fragmentation by moving GFP_MOVABLE pages to a smaller number of
    pageblocks. The term "compaction" was chosen as there are a number of
    mechanisms, not mutually exclusive, that can be used to defragment
    memory. For example, lumpy reclaim is a form of defragmentation, as was
    slub "defragmentation" (really a form of targeted reclaim). Hence, this
    is called "compaction" to distinguish it from other forms of
    defragmentation.

    In this implementation, a full compaction run involves two scanners
    operating within a zone - a migration and a free scanner. The migration
    scanner starts at the beginning of a zone and finds all movable pages
    within one pageblock_nr_pages-sized area and isolates them on a
    migratepages list. The free scanner begins at the end of the zone and
    searches on a per-area basis for enough free pages to migrate all the
    pages on the migratepages list. As each area is respectively migrated or
    exhausted of free pages, the scanners are advanced one area. A compaction
    run completes within a zone when the two scanners meet.
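
    A rough standalone model of the run just described (illustrative only,
    not the kernel code): a zone is an array of page states, the migration
    scanner advances forward one area at a time, the free scanner searches
    backward for free targets, and the run ends when the two meet.

    #include <stdio.h>

    enum page_state { FREE, MOVABLE, UNMOVABLE };
    #define ZONE_PAGES 64
    #define AREA_PAGES 8        /* stands in for pageblock_nr_pages */

    int main(void)
    {
        enum page_state zone[ZONE_PAGES];
        int i, migrate = 0, free_scan = ZONE_PAGES;

        /* Fragmented start: unmovable every 7th page, movable every 3rd. */
        for (i = 0; i < ZONE_PAGES; i++)
            zone[i] = (i % 7 == 0) ? UNMOVABLE : (i % 3 == 0) ? MOVABLE : FREE;

        while (migrate < free_scan) {
            int area_end = migrate + AREA_PAGES;

            /* Migration scanner: walk one area, moving each movable page. */
            for (; migrate < area_end && migrate < free_scan; migrate++) {
                if (zone[migrate] != MOVABLE)
                    continue;
                /* Free scanner: find a free target from the zone's tail. */
                while (free_scan > migrate && zone[--free_scan] != FREE)
                    ;
                if (free_scan <= migrate)
                    break;
                zone[free_scan] = MOVABLE;  /* "migrate" the page       */
                zone[migrate] = FREE;       /* the source becomes free  */
            }
        }

        for (i = 0; i < ZONE_PAGES; i++)
            putchar(zone[i] == FREE ? '.' : zone[i] == MOVABLE ? 'M' : 'U');
        putchar('\n');
        return 0;
    }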

    This method is a bit primitive, but it is easy to understand and greater
    sophistication would require maintaining counters on a per-pageblock
    basis. That would have a big impact on allocator fast paths just to
    improve compaction, which is a poor trade-off.

    It also does not try to relocate virtually contiguous pages to be
    physically contiguous. However, assuming transparent hugepages were in
    use, a hypothetical khugepaged might reuse compaction code to isolate
    free pages, split them and relocate userspace pages for promotion.

    Memory compaction can be triggered in one of three ways. It may be
    triggered explicitly by writing any value to /proc/sys/vm/compact_memory,
    which compacts all of memory. It can be triggered on a per-node basis by
    writing any value to /sys/devices/system/node/nodeN/compact where N is the
    node ID to be compacted. When a process fails to allocate a high-order
    page, it may compact memory in an attempt to satisfy the allocation
    instead of entering direct reclaim. Explicit compaction does not finish
    until the two scanners meet and direct compaction ends if a suitable page
    becomes available that would meet watermarks.

    The series is in 14 patches. The first three are not "core" to the series
    but are important pre-requisites.

    Patch 1 reference counts anon_vma for rmap_walk_anon(). Without this
    patch, it's possible to use anon_vma after free if the caller is
    not holding a VMA or mmap_sem for the pages in question. While
    there should be no existing user that causes this problem,
    it's a requirement for memory compaction to be stable. The patch
    is at the start of the series for bisection reasons.
    Patch 2 merges the KSM and migrate counts. It could be merged with patch 1
    but would be slightly harder to review.
    Patch 3 skips over unmapped anon pages during migration as there are no
    guarantees about the anon_vma existing. There is a window between
    when a page was isolated and migration started during which anon_vma
    could disappear.
    Patch 4 notes that PageSwapCache pages can still be migrated even if they
    are unmapped.
    Patch 5 allows CONFIG_MIGRATION to be set without CONFIG_NUMA
    Patch 6 exports an "unusable free space index" via debugfs. It's
    a measure of external fragmentation that takes the size of the
    allocation request into account. It can also be calculated from
    userspace so can be dropped if requested
    Patch 7 exports a "fragmentation index" which only has meaning when an
    allocation request fails. It determines if an allocation failure
    would be due to a lack of memory or external fragmentation.
    Patch 8 moves the definition for LRU isolation modes for use by compaction
    Patch 9 is the compaction mechanism although it's unreachable at this point
    Patch 10 adds a means of compacting all of memory with a proc trigger
    Patch 11 adds a means of compacting a specific node with a sysfs trigger
    Patch 12 adds "direct compaction" before "direct reclaim" if it is
    determined there is a good chance of success.
    Patch 13 adds a sysctl that allows tuning of the threshold at which the
    kernel will compact or direct reclaim
    Patch 14 temporarily disables compaction if an allocation failure occurs
    after compaction.

    Testing of compaction was in three stages. For the test, debugging,
    preempt, the sleep watchdog and lockdep were all enabled but nothing nasty
    popped out. min_free_kbytes was tuned as recommended by hugeadm to help
    fragmentation avoidance and high-order allocations. It was tested on X86,
    X86-64 and PPC64.

    The first test represents one of the easiest cases that can be faced by
    lumpy reclaim or memory compaction.

    1. Machine freshly booted and configured for hugepage usage with
    a) hugeadm --create-global-mounts
    b) hugeadm --pool-pages-max DEFAULT:8G
    c) hugeadm --set-recommended-min_free_kbytes
    d) hugeadm --set-recommended-shmmax

    The min_free_kbytes here is important. Anti-fragmentation works best
    when pageblocks don't mix. hugeadm knows how to calculate a value that
    will significantly reduce the worst of external-fragmentation-related
    events as reported by the mm_page_alloc_extfrag tracepoint.

    2. Load up memory
    a) Start updatedb
    b) Create, in parallel, X files of pagesize*128 in size. Wait
    until the files are created. By parallel, I mean that 4096 instances
    of dd were launched, one after the other using &. The crude
    objective is to mix filesystem metadata allocations with
    the buffer cache.
    c) Delete every second file so that pageblocks are likely to
    have holes
    d) kill updatedb if it's still running

    At this point, the system is quiet, memory is full but it's full with
    clean filesystem metadata and clean buffer cache that is unmapped.
    This is readily migrated or discarded so you'd expect lumpy reclaim
    to have no significant advantage over compaction but this is at
    the POC stage.

    3. In increments, attempt to allocate 5% of memory as hugepages.
    Measure how long it took, how successful it was, how many
    direct reclaims took place and how many compactions. Note that
    the compaction figures might not fully add up, as compactions
    can take place for orders other than the hugepage size.

                                     vanilla    compaction
    X86     Final page count             913           916   (attempted 1002)
            pages reclaimed            68296          9791

    X86-64  Final page count             901           902   (attempted 1002)
            Total pages reclaimed     112599         53234

    PPC64   Final page count              93            94   (attempted 110)
            Total pages reclaimed     103216         61838

    There was not a dramatic improvement in success rates but it wouldn't be
    expected in this case either. What was important is that fewer pages were
    reclaimed in all cases reducing the amount of IO required to satisfy a
    huge page allocation.

    The second tests were all performance related - kernbench, netperf, iozone
    and sysbench. None showed anything too remarkable.

    The last test was a high-order allocation stress test. Many kernel
    compiles are started to fill memory with a pressured mix of unmovable and
    movable allocations. During this, an attempt is made to allocate 90% of
    memory as huge pages - one at a time with small delays between attempts to
    avoid flooding the IO queue.

                                              vanilla    compaction
    Percentage of request allocated X86            98            99
    Percentage of request allocated X86-64         95            98
    Percentage of request allocated PPC64          55            70

    This patch:

    rmap_walk_anon() does not use page_lock_anon_vma() for looking up and
    locking an anon_vma and it does not appear to have sufficient locking to
    ensure the anon_vma does not disappear from under it.

    This patch copies an approach used by KSM to take a reference on the
    anon_vma while pages are being migrated. This should prevent rmap_walk()
    running into nasty surprises later because anon_vma has been freed.

    Signed-off-by: Mel Gorman
    Acked-by: Rik van Riel
    Cc: Minchan Kim
    Cc: KOSAKI Motohiro
    Cc: Christoph Lameter
    Cc: KAMEZAWA Hiroyuki
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mel Gorman
     
  • There are two types of zonelist ordering methodologies:

    - node order, preferring that allocations on a node stay local to that
    node, and

    - zone order, preferring that allocations come from a higher zone to
    avoid allocating in lowmem zones even though they may not be local.

    The ordering technique used by the kernel is configurable on the command
    line, but also has some logic to determine what the default should be.

    This logic currently lacks knowledge of systems where a node may only have
    lowmem. For such systems, it is necessary to use node order so that
    GFP_KERNEL allocations may be satisfied by nodes consisting of only
    lowmem.

    If zone order is used, GFP_KERNEL allocations to such nodes are actually
    allocated on a node with local affinity that includes ZONE_NORMAL.

    This change defaults to node zonelist ordering if any node lacks
    ZONE_NORMAL.

    To force zone order, append 'numa_zonelist_order=zone' to the kernel
    command line.
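
    The default decision described above reduces to a simple check; a
    standalone toy version (the node topology data here is invented purely
    for illustration) might look like this:

    #include <stdbool.h>
    #include <stdio.h>

    #define MAX_NODES 4

    /* Toy topology: node 1 is a lowmem-only node (no ZONE_NORMAL). */
    static bool node_has_normal[MAX_NODES] = { true, false, true, true };

    /* Default to node ordering if any node lacks ZONE_NORMAL. */
    static const char *default_zonelist_order(void)
    {
        for (int nid = 0; nid < MAX_NODES; nid++)
            if (!node_has_normal[nid])
                return "node";
        return "zone";
    }

    int main(void)
    {
        printf("numa_zonelist_order defaults to: %s order\n",
               default_zonelist_order());
        return 0;
    }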

    Signed-off-by: David Rientjes
    Acked-by: Mel Gorman
    Cc: KAMEZAWA Hiroyuki
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Rientjes
     
  • If !CONFIG_HUGETLB_PAGE, pagemap_hugetlb_range() is never called. So put
    it (and its calling function) into an #ifdef block.

    Signed-off-by: Naoya Horiguchi
    Acked-by: Matt Mackall
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Naoya Horiguchi
     
  • Do page table walks with the well-known nested loops we use in several
    other places already.

    This avoids doing full page table walks after every pte range and also
    allows handling unmapped areas bigger than one pte range in one go.
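
    A standalone model of why the nested form helps (illustrative only; real
    page tables have more levels and kernel-specific types): the outer loop
    over directory slots skips a whole unmapped range in one step, while the
    inner loop handles the ptes that are present.

    #include <stdio.h>

    #define DIR_ENTRIES  4
    #define PTES_PER_DIR 4

    /* Tiny two-level "page table"; a NULL directory slot is a hole. */
    static int ptes[DIR_ENTRIES][PTES_PER_DIR];
    static int *dir[DIR_ENTRIES] = { ptes[0], NULL, NULL, ptes[3] };

    static void walk(unsigned long start, unsigned long end)
    {
        unsigned long addr = start;

        while (addr < end) {
            unsigned long d = addr / PTES_PER_DIR;
            unsigned long next = (d + 1) * PTES_PER_DIR;

            if (next > end)
                next = end;

            if (!dir[d]) {
                /* Whole unmapped range handled in one step. */
                printf("hole %2lu-%2lu\n", addr, next - 1);
            } else {
                for (; addr < next; addr++)
                    printf("pte  %2lu -> %d\n", addr,
                           dir[d][addr % PTES_PER_DIR]);
            }
            addr = next;
        }
    }

    int main(void)
    {
        walk(0, DIR_ENTRIES * PTES_PER_DIR);
        return 0;
    }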

    Signed-off-by: Johannes Weiner
    Cc: Andrea Arcangeli
    Cc: Naoya Horiguchi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Johannes Weiner
     
  • Instead of passing a start address and a number of pages into the helper
    functions, convert them to use a start and an end address.

    Signed-off-by: Johannes Weiner
    Cc: Andrea Arcangeli
    Cc: Naoya Horiguchi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Johannes Weiner
     
  • Split out functions to handle hugetlb ranges, pte ranges and unmapped
    ranges, to improve readability but also to prepare the file structure for
    nested page table walks.

    No semantic changes intended.

    Signed-off-by: Johannes Weiner
    Cc: Andrea Arcangeli
    Cc: Naoya Horiguchi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Johannes Weiner
     
  • This fixes some minor issues that bugged me while going over the code:

    o adjust argument order of do_mincore() to match the syscall
    o simplify range length calculation
    o drop superfluous shift in huge tlb calculation, address is page aligned
    o drop dead nr_huge calculation
    o check pte_none() before pte_present()
    o comment and whitespace fixes

    No semantic changes intended.

    Signed-off-by: Johannes Weiner
    Cc: Andrea Arcangeli
    Cc: Naoya Horiguchi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Johannes Weiner
     
  • Before applying this patch, cpuset updates task->mems_allowed and
    mempolicy by setting all new bits in the nodemask first, and clearing all
    old unallowed bits later. But along the way, the allocator may find that
    there is no node to allocate memory from.

    The reason is that when cpuset rebinds the task's mempolicy, it clears
    the nodes which the allocator can allocate pages from; for example:

    (mpol: mempolicy)
    task1                          task1's mpol      task2
    alloc page                     1
      alloc on node0? NO           1
                                   1                 change mems from 1 to 0
                                   1                 rebind task1's mpol
                                   0-1                 set new bits
                                   0                   clear disallowed bits
      alloc on node1? NO           0
      ...
    can't alloc page
      goto oom

    This patch fixes the problem by expanding the node range first (setting
    the newly allowed bits) and shrinking it lazily (clearing the newly
    disallowed bits). We use a variable to tell the write-side task that a
    read-side task is reading the nodemask, and the write-side task clears
    the newly disallowed nodes only after the read-side task finishes its
    current memory allocation.
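
    A standalone illustration of the ordering (plain unsigned ints stand in
    for nodemask_t; purely illustrative): a reader sampling the mask at any
    point sees the old nodes, the union, or the new nodes, but never an
    empty mask.

    #include <stdio.h>

    int main(void)
    {
        unsigned int mems = 0x2;        /* old mems_allowed: node 1 only */
        unsigned int new_mems = 0x1;    /* new mems_allowed: node 0 only */

        /* Step 1: set the newly allowed bits -- readers now see {0,1}. */
        mems |= new_mems;
        printf("after expand: 0x%x\n", mems);

        /*
         * Step 2, deferred until in-flight allocations have finished:
         * clear the newly disallowed bits -- readers now see {0}.
         */
        mems &= new_mems;
        printf("after shrink: 0x%x\n", mems);
        return 0;
    }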

    [akpm@linux-foundation.org: fix spello]
    Signed-off-by: Miao Xie
    Cc: David Rientjes
    Cc: Nick Piggin
    Cc: Paul Menage
    Cc: Lee Schermerhorn
    Cc: Hugh Dickins
    Cc: Ravikiran Thirumalai
    Cc: KOSAKI Motohiro
    Cc: Christoph Lameter
    Cc: Andi Kleen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Miao Xie
     
  • Nick Piggin reported that the allocator may see an empty nodemask when
    changing cpuset's mems[1]. It happens only on kernels that do not do
    atomic nodemask_t stores (MAX_NUMNODES > BITS_PER_LONG).

    But I found that there is also a problem on kernels that can do atomic
    nodemask_t stores. The problem is that the allocator can't find a node to
    allocate pages from when changing cpuset's mems, even though there is a
    lot of free memory. The reason is like this:

    (mpol: mempolicy)
    task1                          task1's mpol      task2
    alloc page                     1
      alloc on node0? NO           1
                                   1                 change mems from 1 to 0
                                   1                 rebind task1's mpol
                                   0-1                 set new bits
                                   0                   clear disallowed bits
      alloc on node1? NO           0
      ...
    can't alloc page
      goto oom

    I can reproduce it with the attached program by the following steps:

    # mkdir /dev/cpuset
    # mount -t cpuset cpuset /dev/cpuset
    # mkdir /dev/cpuset/1
    # echo `cat /dev/cpuset/cpus` > /dev/cpuset/1/cpus
    # echo `cat /dev/cpuset/mems` > /dev/cpuset/1/mems
    # echo $$ > /dev/cpuset/1/tasks
    # numactl --membind=`cat /dev/cpuset/mems` ./cpuset_mem_hog <nr_tasks> &
      <nr_tasks> = max(nr_cpus - 1, 1)
    # killall -s SIGUSR1 cpuset_mem_hog
    # ./change_mems.sh

    Several hours later, OOM will happen even though there is a lot of free
    memory.

    This patchset fixes the problem by expanding the node range first
    (setting the newly allowed bits) and shrinking it lazily (clearing the
    newly disallowed bits). We use a variable to tell the write-side task
    that a read-side task is reading the nodemask, and the write-side task
    clears the newly disallowed nodes only after the read-side task finishes
    its current memory allocation.

    This patch:

    In order to fix the "no node to allocate memory from" problem, when we
    want to update the mempolicy and mems_allowed, we expand the set of nodes
    first (set all the newly allowed nodes) and shrink the set of nodes
    lazily (clear the disallowed nodes). But the mempolicy's rebind functions
    may break the expanding.

    So we restructure the mempolicy's rebind functions and split the rebind
    work into two steps, just like the update of cpuset's mems: the first
    step expands the set of the mempolicy's nodes, and the second step
    shrinks it. The two-step form is used when there is no real lock to
    protect the mempolicy on the read side; otherwise we can do the rebind
    work at once.

    In order to implement it, we define

    enum mpol_rebind_step {
            MPOL_REBIND_ONCE,       /* do the whole rebind in one pass */
            MPOL_REBIND_STEP1,      /* first step: expand the nodes */
            MPOL_REBIND_STEP2,      /* second step: shrink the nodes */
            MPOL_REBIND_NSTEP,
    };

    If the mempolicy needn't be updated by two steps, we can pass
    MPOL_REBIND_ONCE to the rebind functions. Or we can pass
    MPOL_REBIND_STEP1 to do the first step of the rebind work and pass
    MPOL_REBIND_STEP2 to do the second step work.

    Besides that, a long time may pass between the two steps, and we have to
    release the lock that protects the mempolicy and mems_allowed. When we
    take the lock again, we must check whether the current mempolicy is in
    the middle of a rebind (the first step has been done), because the task
    may have allocated a new mempolicy while we did not hold the lock. So we
    define the following flag to identify it:

    #define MPOL_F_REBINDING (1 << 2)       /* step 1 done, step 2 pending */

    The new functions will be used in the next patch.

    Signed-off-by: Miao Xie
    Cc: David Rientjes
    Cc: Nick Piggin
    Cc: Paul Menage
    Cc: Lee Schermerhorn
    Cc: Hugh Dickins
    Cc: Ravikiran Thirumalai
    Cc: KOSAKI Motohiro
    Cc: Christoph Lameter
    Cc: Andi Kleen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Miao Xie
     
  • Update Documentation/filesystems/tmpfs.txt to describe the interaction of
    tmpfs mount option memory policy with tasks' cpuset mems_allowed.

    Note: the mount(8) man page [in the util-linux-ng package] requires
    similar updates.

    Signed-off-by: Lee Schermerhorn
    Cc: Hugh Dickins
    Cc: Ravikiran Thirumalai
    Cc: KOSAKI Motohiro
    Cc: Christoph Lameter
    Cc: David Rientjes
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Lee Schermerhorn
     
  • Factor out duplicate put/frees in mpol_shared_policy_init() to a common
    return path.

    Signed-off-by: Lee Schermerhorn
    Cc: Hugh Dickins
    Cc: Ravikiran Thirumalai
    Cc: KOSAKI Motohiro
    Cc: Christoph Lameter
    Cc: David Rientjes
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Lee Schermerhorn
     
  • Rename 'policy_types[]' to 'policy_modes[]' to better match the array
    contents.

    Use designated initializer syntax for policy_modes[].

    Signed-off-by: Lee Schermerhorn
    Cc: Hugh Dickins
    Cc: Ravikiran Thirumalai
    Cc: KOSAKI Motohiro
    Cc: Christoph Lameter
    Cc: David Rientjes
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Lee Schermerhorn
     
  • We don't really need the extra variable 'i' in mpol_parse_str(). Its only
    use is as the loop variable, and then it's assigned to 'mode'. Just use
    'mode', and lose the 'uninitialized_var()' macro.

    Signed-off-by: Lee Schermerhorn
    Cc: Hugh Dickins
    Cc: Ravikiran Thirumalai
    Cc: KOSAKI Motohiro
    Cc: Christoph Lameter
    Cc: David Rientjes
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Lee Schermerhorn
     
  • No need to call mpol_set_nodemask() when we have no context for the
    mempolicy. This can occur when we're parsing a tmpfs 'mpol' mount option.
    Just save the raw nodemask in the mempolicy's w.user_nodemask member for
    use when a tmpfs/shmem file is created. mpol_shared_policy_init() will
    "contextualize" the policy for the new file based on the creating task's
    context.

    Signed-off-by: Lee Schermerhorn
    Cc: Hugh Dickins
    Cc: Ravikiran Thirumalai
    Cc: KOSAKI Motohiro
    Cc: Christoph Lameter
    Cc: David Rientjes
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Lee Schermerhorn
     
  • Lee's patch "mempolicy: use MPOL_PREFERRED for system-wide default policy"
    has made MPOL_DEFAULT only used in the memory policy APIs, so there is no
    need to check for it in __mpol_equal() as well. Also get rid of
    mpol_match_intent() and move its logic directly into __mpol_equal().

    Signed-off-by: Bob Liu
    Acked-by: David Rientjes
    Cc: Andi Kleen
    Cc: Lee Schermerhorn
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Bob Liu
     
  • In policy_zonelist(), the MPOL_INTERLEAVE mode shouldn't happen, so fall
    through to BUG() instead of breaking out to return. I also fixed the
    comment.

    Signed-off-by: Bob Liu
    Acked-by: David Rientjes
    Cc: Andi Kleen
    Cc: Lee Schermerhorn
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Bob Liu
     
  • 1. In the function is_valid_nodemask(), the variable k is initialised to
    0 in the following loop, so it needn't be initialised to policy_zone
    anymore.

    2. (MPOL_F_STATIC_NODES | MPOL_F_RELATIVE_NODES) is already defined as
    MPOL_MODE_FLAGS in mempolicy.h.

    Signed-off-by: Bob Liu
    Acked-by: David Rientjes
    Cc: KOSAKI Motohiro
    Cc: Christoph Lameter
    Cc: Mel Gorman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Bob Liu
     
  • putback_lru_page() can never fail, so counting "the number of pages put
    back" doesn't matter.

    In addition, users of this function don't use the return value.

    Let's remove the unnecessary code.

    Signed-off-by: Minchan Kim
    Reviewed-by: Rik van Riel
    Reviewed-by: KOSAKI Motohiro
    Reviewed-by: KAMEZAWA Hiroyuki
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Minchan Kim
     
  • prep_new_page() will call set_page_private(page, 0) to initialise the
    page, so the code is redundant.

    Signed-off-by: Huang Shijie
    Reviewed-by: Minchan Kim
    Acked-by: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Huang Shijie
     
  • We need to put mem_map high when virtual memmap is not used.

    before this patch
    free mem pfn range on first node:
    [ 0.000000] 19 - 1f
    [ 0.000000] 28 40 - 80 95
    [ 0.000000] 702 740 - 1000 1000
    [ 0.000000] 347c - 347e
    [ 0.000000] 34e7 3500 - 3b80 3b8b
    [ 0.000000] 73b8b 73bc0 - 73c00 73c00
    [ 0.000000] 73ddd - 73e00
    [ 0.000000] 73fdd - 74000
    [ 0.000000] 741dd - 74200
    [ 0.000000] 743dd - 74400
    [ 0.000000] 745dd - 74600
    [ 0.000000] 747dd - 74800
    [ 0.000000] 749dd - 74a00
    [ 0.000000] 74bdd - 74c00
    [ 0.000000] 74ddd - 74e00
    [ 0.000000] 74fdd - 75000
    [ 0.000000] 751dd - 75200
    [ 0.000000] 753dd - 75400
    [ 0.000000] 755dd - 75600
    [ 0.000000] 757dd - 75800
    [ 0.000000] 759dd - 75a00
    [ 0.000000] 79bdd 79c00 - 7d540 7d550
    [ 0.000000] 7f745 - 7f750
    [ 0.000000] 10000b 100040 - 2080000 2080000
    so only 79c00 - 7d540 is the major free block under 4g...

    after this patch, we will get
    [ 0.000000] 19 - 1f
    [ 0.000000] 28 40 - 80 95
    [ 0.000000] 702 740 - 1000 1000
    [ 0.000000] 347c - 347e
    [ 0.000000] 34e7 3500 - 3600 3600
    [ 0.000000] 37dd - 3800
    [ 0.000000] 39dd - 3a00
    [ 0.000000] 3bdd - 3c00
    [ 0.000000] 3ddd - 3e00
    [ 0.000000] 3fdd - 4000
    [ 0.000000] 41dd - 4200
    [ 0.000000] 43dd - 4400
    [ 0.000000] 45dd - 4600
    [ 0.000000] 47dd - 4800
    [ 0.000000] 49dd - 4a00
    [ 0.000000] 4bdd - 4c00
    [ 0.000000] 4ddd - 4e00
    [ 0.000000] 4fdd - 5000
    [ 0.000000] 51dd - 5200
    [ 0.000000] 53dd - 5400
    [ 0.000000] 95dd 9600 - 7d540 7d550
    [ 0.000000] 7f745 - 7f750
    [ 0.000000] 17000b 170040 - 2080000 2080000
    we will have 9600 - 7d540 as the major free block...

    sparse-vmemmap path already used __alloc_bootmem_node_high()

    Signed-off-by: Yinghai Lu
    Cc: Jiri Slaby
    Cc: "H. Peter Anvin"
    Cc: Thomas Gleixner
    Cc: Ingo Molnar
    Cc: Christoph Lameter
    Cc: Greg Thelen
    Cc: Johannes Weiner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Yinghai Lu
     
  • …re merging to the tail of the free lists

    In order to reduce fragmentation, this patch classifies freed pages into
    two groups according to their probability of being part of a high-order
    merge. Pages belonging to a compound whose next-highest buddy is free are
    more likely to be part of a high-order merge in the near future, so they
    will be added at the tail of the freelist. The remaining pages are put at
    the front of the freelist.

    In this way, the pages that are more likely to cause a big merge are kept
    free longer. Consequently there is a tendency to aggregate the
    long-living allocations on a subset of the compounds, reducing the
    fragmentation.
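
    A standalone sketch of the placement decision (plain block indices stand
    in for struct page, and the free-block lookup is a stub invented for the
    example): the freed block goes to the tail when the buddy of its would-be
    parent block at the next order is already free, since a bigger merge is
    then likely soon.

    #include <stdbool.h>
    #include <stdio.h>

    /* Buddy of block 'idx' at 'order', and its parent index after a merge. */
    static unsigned long buddy_index(unsigned long idx, unsigned int order)
    {
        return idx ^ (1UL << order);
    }

    static unsigned long parent_index(unsigned long idx, unsigned int order)
    {
        return idx & ~(1UL << order);
    }

    /* Stub for the allocator's free-block information: pretend only block 8
     * at order 3 is free. */
    static bool block_is_free(unsigned long idx, unsigned int order)
    {
        return order == 3 && idx == 8;
    }

    /* Tail if the parent's buddy at the next order is free, else head. */
    static bool queue_at_tail(unsigned long idx, unsigned int order)
    {
        unsigned long parent = parent_index(idx, order);

        return block_is_free(buddy_index(parent, order + 1), order + 1);
    }

    int main(void)
    {
        printf("free block  4, order 2 -> %s\n",
               queue_at_tail(4, 2) ? "tail" : "head");
        printf("free block 16, order 2 -> %s\n",
               queue_at_tail(16, 2) ? "tail" : "head");
        return 0;
    }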

    This heuristic was tested on three machines, x86, x86-64 and ppc64 with
    3GB of RAM in each machine. The tests were kernbench, netperf, sysbench
    and STREAM for performance and a high-order stress test for huge page
    allocations.

    KernBench X86
    Elapsed mean 374.77 ( 0.00%) 375.10 (-0.09%)
    User mean 649.53 ( 0.00%) 650.44 (-0.14%)
    System mean 54.75 ( 0.00%) 54.18 ( 1.05%)
    CPU mean 187.75 ( 0.00%) 187.25 ( 0.27%)

    KernBench X86-64
    Elapsed mean 94.45 ( 0.00%) 94.01 ( 0.47%)
    User mean 323.27 ( 0.00%) 322.66 ( 0.19%)
    System mean 36.71 ( 0.00%) 36.50 ( 0.57%)
    CPU mean 380.75 ( 0.00%) 381.75 (-0.26%)

    KernBench PPC64
    Elapsed mean 173.45 ( 0.00%) 173.74 (-0.17%)
    User mean 587.99 ( 0.00%) 587.95 ( 0.01%)
    System mean 60.60 ( 0.00%) 60.57 ( 0.05%)
    CPU mean 373.50 ( 0.00%) 372.75 ( 0.20%)

    Nothing notable for kernbench.

    NetPerf UDP X86
    64 42.68 ( 0.00%) 42.77 ( 0.21%)
    128 85.62 ( 0.00%) 85.32 (-0.35%)
    256 170.01 ( 0.00%) 168.76 (-0.74%)
    1024 655.68 ( 0.00%) 652.33 (-0.51%)
    2048 1262.39 ( 0.00%) 1248.61 (-1.10%)
    3312 1958.41 ( 0.00%) 1944.61 (-0.71%)
    4096 2345.63 ( 0.00%) 2318.83 (-1.16%)
    8192 4132.90 ( 0.00%) 4089.50 (-1.06%)
    16384 6770.88 ( 0.00%) 6642.05 (-1.94%)*

    NetPerf UDP X86-64
    64 148.82 ( 0.00%) 154.92 ( 3.94%)
    128 298.96 ( 0.00%) 312.95 ( 4.47%)
    256 583.67 ( 0.00%) 626.39 ( 6.82%)
    1024 2293.18 ( 0.00%) 2371.10 ( 3.29%)
    2048 4274.16 ( 0.00%) 4396.83 ( 2.79%)
    3312 6356.94 ( 0.00%) 6571.35 ( 3.26%)
    4096 7422.68 ( 0.00%) 7635.42 ( 2.79%)*
    8192 12114.81 ( 0.00%)* 12346.88 ( 1.88%)
    16384 17022.28 ( 0.00%)* 17033.19 ( 0.06%)*
    1.64% 2.73%

    NetPerf UDP PPC64
    64 49.98 ( 0.00%) 50.25 ( 0.54%)
    128 98.66 ( 0.00%) 100.95 ( 2.27%)
    256 197.33 ( 0.00%) 191.03 (-3.30%)
    1024 761.98 ( 0.00%) 785.07 ( 2.94%)
    2048 1493.50 ( 0.00%) 1510.85 ( 1.15%)
    3312 2303.95 ( 0.00%) 2271.72 (-1.42%)
    4096 2774.56 ( 0.00%) 2773.06 (-0.05%)
    8192 4918.31 ( 0.00%) 4793.59 (-2.60%)
    16384 7497.98 ( 0.00%) 7749.52 ( 3.25%)

    The tests are run to have confidence limits within 1%. Results marked
    with a * were not confident although in this case, it's only outside by
    small amounts. Even with some results that were not confident, the
    netperf UDP results were generally positive.

    NetPerf TCP X86
    64 652.25 ( 0.00%)* 648.12 (-0.64%)*
    23.80% 22.82%
    128 1229.98 ( 0.00%)* 1220.56 (-0.77%)*
    21.03% 18.90%
    256 2105.88 ( 0.00%) 1872.03 (-12.49%)*
    1.00% 16.46%
    1024 3476.46 ( 0.00%)* 3548.28 ( 2.02%)*
    13.37% 11.39%
    2048 4023.44 ( 0.00%)* 4231.45 ( 4.92%)*
    9.76% 12.48%
    3312 4348.88 ( 0.00%)* 4396.96 ( 1.09%)*
    6.49% 8.75%
    4096 4726.56 ( 0.00%)* 4877.71 ( 3.10%)*
    9.85% 8.50%
    8192 4732.28 ( 0.00%)* 5777.77 (18.10%)*
    9.13% 13.04%
    16384 5543.05 ( 0.00%)* 5906.24 ( 6.15%)*
    7.73% 8.68%

    NETPERF TCP X86-64
    netperf-tcp-vanilla-netperf netperf-tcp
    tcp-vanilla pgalloc-delay
    64 1895.87 ( 0.00%)* 1775.07 (-6.81%)*
    5.79% 4.78%
    128 3571.03 ( 0.00%)* 3342.20 (-6.85%)*
    3.68% 6.06%
    256 5097.21 ( 0.00%)* 4859.43 (-4.89%)*
    3.02% 2.10%
    1024 8919.10 ( 0.00%)* 8892.49 (-0.30%)*
    5.89% 6.55%
    2048 10255.46 ( 0.00%)* 10449.39 ( 1.86%)*
    7.08% 7.44%
    3312 10839.90 ( 0.00%)* 10740.15 (-0.93%)*
    6.87% 7.33%
    4096 10814.84 ( 0.00%)* 10766.97 (-0.44%)*
    6.86% 8.18%
    8192 11606.89 ( 0.00%)* 11189.28 (-3.73%)*
    7.49% 5.55%
    16384 12554.88 ( 0.00%)* 12361.22 (-1.57%)*
    7.36% 6.49%

    NETPERF TCP PPC64
    netperf-tcp-vanilla-netperf netperf-tcp
    tcp-vanilla pgalloc-delay
    64 594.17 ( 0.00%) 596.04 ( 0.31%)*
    1.00% 2.29%
    128 1064.87 ( 0.00%)* 1074.77 ( 0.92%)*
    1.30% 1.40%
    256 1852.46 ( 0.00%)* 1856.95 ( 0.24%)
    1.25% 1.00%
    1024 3839.46 ( 0.00%)* 3813.05 (-0.69%)
    1.02% 1.00%
    2048 4885.04 ( 0.00%)* 4881.97 (-0.06%)*
    1.15% 1.04%
    3312 5506.90 ( 0.00%) 5459.72 (-0.86%)
    4096 6449.19 ( 0.00%) 6345.46 (-1.63%)
    8192 7501.17 ( 0.00%) 7508.79 ( 0.10%)
    16384 9618.65 ( 0.00%) 9490.10 (-1.35%)

    There was a distinct lack of confidence in the X86* figures so I included
    what the deviation was where the results were not confident. Many of the
    results, whether gains or losses were within the standard deviation so no
    solid conclusion can be reached on performance impact. Looking at the
    figures, only the X86-64 ones look suspicious with a few losses that were
    outside the noise. However, the results were so unstable that without
    knowing why they vary so much, a solid conclusion cannot be reached.

    SYSBENCH X86
    sysbench-vanilla pgalloc-delay
    1 7722.85 ( 0.00%) 7756.79 ( 0.44%)
    2 14901.11 ( 0.00%) 13683.44 (-8.90%)
    3 15171.71 ( 0.00%) 14888.25 (-1.90%)
    4 14966.98 ( 0.00%) 15029.67 ( 0.42%)
    5 14370.47 ( 0.00%) 14865.00 ( 3.33%)
    6 14870.33 ( 0.00%) 14845.57 (-0.17%)
    7 14429.45 ( 0.00%) 14520.85 ( 0.63%)
    8 14354.35 ( 0.00%) 14362.31 ( 0.06%)

    SYSBENCH X86-64
    1 17448.70 ( 0.00%) 17484.41 ( 0.20%)
    2 34276.39 ( 0.00%) 34251.00 (-0.07%)
    3 50805.25 ( 0.00%) 50854.80 ( 0.10%)
    4 66667.10 ( 0.00%) 66174.69 (-0.74%)
    5 66003.91 ( 0.00%) 65685.25 (-0.49%)
    6 64981.90 ( 0.00%) 65125.60 ( 0.22%)
    7 64933.16 ( 0.00%) 64379.23 (-0.86%)
    8 63353.30 ( 0.00%) 63281.22 (-0.11%)
    9 63511.84 ( 0.00%) 63570.37 ( 0.09%)
    10 62708.27 ( 0.00%) 63166.25 ( 0.73%)
    11 62092.81 ( 0.00%) 61787.75 (-0.49%)
    12 61330.11 ( 0.00%) 61036.34 (-0.48%)
    13 61438.37 ( 0.00%) 61994.47 ( 0.90%)
    14 62304.48 ( 0.00%) 62064.90 (-0.39%)
    15 63296.48 ( 0.00%) 62875.16 (-0.67%)
    16 63951.76 ( 0.00%) 63769.09 (-0.29%)

    SYSBENCH PPC64
    -sysbench-pgalloc-delay-sysbench
    sysbench-vanilla pgalloc-delay
    1 7645.08 ( 0.00%) 7467.43 (-2.38%)
    2 14856.67 ( 0.00%) 14558.73 (-2.05%)
    3 21952.31 ( 0.00%) 21683.64 (-1.24%)
    4 27946.09 ( 0.00%) 28623.29 ( 2.37%)
    5 28045.11 ( 0.00%) 28143.69 ( 0.35%)
    6 27477.10 ( 0.00%) 27337.45 (-0.51%)
    7 26489.17 ( 0.00%) 26590.06 ( 0.38%)
    8 26642.91 ( 0.00%) 25274.33 (-5.41%)
    9 25137.27 ( 0.00%) 24810.06 (-1.32%)
    10 24451.99 ( 0.00%) 24275.85 (-0.73%)
    11 23262.20 ( 0.00%) 23674.88 ( 1.74%)
    12 24234.81 ( 0.00%) 23640.89 (-2.51%)
    13 24577.75 ( 0.00%) 24433.50 (-0.59%)
    14 25640.19 ( 0.00%) 25116.52 (-2.08%)
    15 26188.84 ( 0.00%) 26181.36 (-0.03%)
    16 26782.37 ( 0.00%) 26255.99 (-2.00%)

    Again, there is little to conclude here. While there are a few losses,
    the results vary by +/- 8% in some cases. These are the results of most
    concern as there are some large losses, but they are also within the
    variance typically seen between kernel releases.

    The STREAM results varied so little and are so verbose that I didn't
    include them here.

    The final test stressed how many huge pages can be allocated. The
    absolute number of huge pages allocated is the same with or without the
    patch. However, the "unusable free space index", which is a measure of
    external fragmentation, was slightly lower (lower is better) throughout
    the lifetime of the system. I also measured the latency of how long it
    took to successfully allocate a huge page. The latency was slightly
    lower, and on X86 and PPC64 more huge pages were allocated almost
    immediately from the free lists. The improvement is slight but there.

    [mel@csn.ul.ie: Tested, reworked for less branches]
    [czoccolo@gmail.com: fix oops by checking pfn_valid_within()]
    Signed-off-by: Mel Gorman <mel@csn.ul.ie>
    Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
    Cc: Christoph Lameter <cl@linux-foundation.org>
    Acked-by: Rik van Riel <riel@redhat.com>
    Reviewed-by: Pekka Enberg <penberg@cs.helsinki.fi>
    Cc: Corrado Zoccolo <czoccolo@gmail.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

    Corrado Zoccolo
     
  • Shaohua Li reported that parallel file copies on tmpfs can lead to the OOM
    killer. This is a regression caused by commit 9ff473b9a7 ("vmscan: evict
    streaming IO first"). Wow, it is a two-year-old patch!

    Currently, tmpfs file cache is inserted on the active list at first. This
    means that the insertion not only increases the number of pages on the
    anon LRU, but also reduces the anon scanning ratio. Therefore, vmscan
    gets totally confused. It scans almost only the file LRU even though the
    system has plenty of unused tmpfs pages.

    Historically, lru_cache_add_active_anon() was used for two reasons:
    1) to prioritize shmem pages over regular file cache, and
    2) to avoid reclaim priority inversion of used-once pages.

    But we've lost both motivations because (1) we now have separate anon and
    file LRU lists, so inserting on the active list doesn't provide such
    prioritization, and (2) in the past, one pte access bit would cause page
    activation, so inserting on the inactive list with the pte access bit set
    meant a higher priority than inserting on the active list. That priority
    inversion could lead to unintended LRU churn, but it was already solved
    by commit 645747462 ("vmscan: detect mapped file pages used only once").
    (Thanks Hannes, you are great!)

    Thus, now we can use lru_cache_add_anon() instead.

    Signed-off-by: KOSAKI Motohiro
    Reported-by: Shaohua Li
    Reviewed-by: Wu Fengguang
    Reviewed-by: Johannes Weiner
    Reviewed-by: Rik van Riel
    Reviewed-by: Minchan Kim
    Acked-by: Hugh Dickins
    Cc: Henrique de Moraes Holschuh
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KOSAKI Motohiro
     
  • fix the following 'make includecheck' warnings:

    arch/xtensa/kernel/vectors.S: asm/processor.h is included more than once.
    arch/xtensa/kernel/vectors.S: asm/ptrace.h is included more than once.

    Signed-off-by: Jaswinder Singh Rajput
    Cc: Chris Zankel
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jaswinder Singh Rajput
     
  • Also remove lots of unused irq_cpustat fields.

    Signed-off-by: Christoph Hellwig
    Cc: Chris Zankel
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Hellwig
     
  • Architectures that handle DMA-non-coherent memory need to set
    ARCH_KMALLOC_MINALIGN to make sure that a kmalloc'ed buffer is DMA-safe:
    the buffer doesn't share a cache line with others.

    Signed-off-by: FUJITA Tomonori
    Cc: Chris Zankel
    Acked-by: Pekka Enberg
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    FUJITA Tomonori
     

24 May, 2010

9 commits

  • * git://git.kernel.org/pub/scm/linux/kernel/git/davem/ide-2.6:
    cmd640: fix kernel oops in test_irq() method
    pdc202xx_old: ignore "FIFO empty" bit in test_irq() method
    pdc202xx_old: wire test_irq() method for PDC2026x
    IDE: pass IRQ flags to the IDE core
    ide: fix comment typo in ide.h

    Linus Torvalds
     
  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/vapier/blackfin: (30 commits)
    Blackfin: SMP: fix continuation lines
    Blackfin: acvilon: fix timeout usage for I2C
    Blackfin: fix typo in BF537 IRQ comment
    Blackfin: unify duplicate MEM_MT48LC32M8A2_75 kconfig options
    Blackfin: set ARCH_KMALLOC_MINALIGN
    Blackfin: use atomic kmalloc in L1 alloc so it too can be atomic
    Blackfin: another year of changes (update copyright in boot log)
    Blackfin: optimize strncpy a bit
    Blackfin: isram: clean up ITEST_COMMAND macro and improve the selftests
    Blackfin: move string functions to normal lib/ assembly
    Blackfin: SIC: cut down on IAR MMR reads a bit
    Blackfin: bf537-minotaur: fix build errors due to header changes
    Blackfin: kgdb: pass up the CC register instead of a 0 stub
    Blackfin: handle HW errors in the new "FAULT" printing code
    Blackfin: show the whole accumulator in the pseudo DBG insn
    Blackfin: support all possible registers in the pseudo instructions
    Blackfin: add support for the DBG (debug output) pseudo insn
    Blackfin: change the BUG opcode to an unused 16-bit opcode
    Blackfin: allow NMI watchdog to be used w/RETN as a scratch reg
    Blackfin: add support for the DBGA (debug assert) pseudo insn
    ...

    Linus Torvalds
     
  • * 'bkl/ioctl' of git://git.kernel.org/pub/scm/linux/kernel/git/frederic/random-tracing:
    uml: Pushdown the bkl from harddog_kern ioctl
    sunrpc: Pushdown the bkl from sunrpc cache ioctl
    sunrpc: Pushdown the bkl from ioctl
    autofs4: Pushdown the bkl from ioctl
    uml: Convert to unlocked_ioctls to remove implicit BKL
    ncpfs: BKL ioctl pushdown
    coda: Clean-up whitespace problems in pioctl.c
    coda: BKL ioctl pushdown
    drivers: Push down BKL into various drivers
    isdn: Push down BKL into ioctl functions
    scsi: Push down BKL into ioctl functions
    dvb: Push down BKL into ioctl functions
    smbfs: Push down BKL into ioctl function
    coda/psdev: Remove BKL from ioctl function
    um/mmapper: Remove BKL usage
    sn_hwperf: Kill BKL usage
    hfsplus: Push down BKL into ioctl function

    Linus Torvalds
     
  • * git://git.infradead.org/battery-2.6:
    ds2760_battery: Document ABI change
    ds2760_battery: Make charge_now and charge_full writeable
    power_supply: Add support for writeable properties
    power_supply: Use attribute groups
    power_supply: Add test_power driver
    tosa_battery: Fix build error due to direct driver_data usage
    wm97xx_battery: Quieten sparse warning (bat_set_pdata not declared)
    ds2782_battery: Get rid of magic numbers in driver_data
    ds2782_battery: Add support for ds2786 battery gas gauge
    pda_power: Add function callbacks for suspend and resume
    wm831x_power: Use genirq
    Driver for Zipit Z2 battery chip
    ds2782_battery: Fix clientdata on removal

    Linus Torvalds
     
  • …nel/git/tip/linux-2.6-tip

    * 'timers-for-linus-urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    timers: Fix slack calculation for expired timers
    timekeeping: Fix timezone update

    Linus Torvalds
     
  • * git://git.kernel.org/pub/scm/linux/kernel/git/lethal/sh-2.6: (25 commits)
    sh: fix up sh7785lcr_32bit_defconfig.
    arch/sh/lib/strlen.S: Checkpatch cleanup
    sh: fix up sh7786 dmaengine build.
    sh: guard cookie consistency across termination in the DMA driver
    sh: prevent the DMA driver from unloading, while in use
    sh: fix Oops in the serial SCI driver
    sh: allow platforms to specify SD-card supported voltages
    mmc: let MFD's provide supported Vdd card voltages to tmio_mmc
    sh: disable SD-card write-protection detection on kfr2r09
    mfd: pass platform flags down to the tmio_mmc driver
    tmio: add a platform flag to disable card write-protection detection
    sh: Add SDHI DMA support to migor
    sh: Add SDHI DMA support to kfr2r09
    sh: Add SDHI DMA support to ms7724se
    sh: Add SDHI DMA support to ecovec
    mmc: add DMA support to tmio_mmc driver, when used on SuperH
    sh: prepare the SDHI MFD driver to pass DMA configuration to tmio_mmc.c
    mmc: prepare tmio_mmc for passing of DMA configuration from the MFD cell
    sh: add DMA slave definitions to sh7724
    sh: add DMA slaves for two SDHI controllers to sh7722
    ...

    Linus Torvalds
     
  • * 'for-linus' of git://git.open-osd.org/linux-open-osd:
    exofs: confusion between kmap() and kmap_atomic() api
    exofs: Add default address_space_operations

    Linus Torvalds
     
  • This reverts commit 03ceedea972a82d343fa5c2528b3952fa9e615d5, since it
    breaks resume from suspend-to-ram on Rafael's Acer Ferrari One.
    NetworkManager thinks everything is ok, but it can't connect to the AP
    to get an IP address after the resume.

    In fact, it even breaks resume for non-ath9k chipsets: reverting it also
    fixes Rafael's Toshiba Protege R500 with the iwlagn driver. As Johannes
    says:

    "Indeed, this patch needs to be reverted. That mac80211 change is wrong
    and completely unnecessary."

    Reported-and-requested-by: Rafael J. Wysocki
    Acked-by: Johannes Berg
    Cc: Daniel Yingqiang Ma
    Cc: John W. Linville
    Cc: David Miller
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     
  • * git://git.kernel.org/pub/scm/linux/kernel/git/hirofumi/fatfs-2.6:
    fat: convert to unlocked_ioctl
    fat: Cleanup nls_unload() usage
    fat: use pack_hex_byte() instead of custom one

    Linus Torvalds