Eric Lee / smarc-fsl-linux-kernel

07 Mar, 2010

26 commits

5beb49305 mm: change anon_vma linking to fix multi-process server scalability issue ... Browse Code »

The old anon_vma code can lead to scalability issues with heavily forking
workloads. Specifically, each anon_vma will be shared between the parent
process and all its child processes.

In a workload with 1000 child processes and a VMA with 1000 anonymous
pages per process that get COWed, this leads to a system with a million
anonymous pages in the same anon_vma, each of which is mapped in just one
of the 1000 processes. However, the current rmap code needs to walk them
all, leading to O(N) scanning complexity for each page.

This can result in systems where one CPU is walking the page tables of
1000 processes in page_referenced_one, while all other CPUs are stuck on
the anon_vma lock. This leads to catastrophic failure for a benchmark
like AIM7, where the total number of processes can reach in the tens of
thousands. Real workloads are still a factor 10 less process intensive
than AIM7, but they are catching up.

This patch changes the way anon_vmas and VMAs are linked, which allows us
to associate multiple anon_vmas with a VMA. At fork time, each child
process gets its own anon_vmas, in which its COWed pages will be
instantiated. The parents' anon_vma is also linked to the VMA, because
non-COWed pages could be present in any of the children.

This reduces rmap scanning complexity to O(1) for the pages of the 1000
child processes, with O(N) complexity for at most 1/N pages in the system.
This reduces the average scanning cost in heavily forking workloads from
O(N) to 2.

The only real complexity in this patch stems from the fact that linking a
VMA to anon_vmas now involves memory allocations. This means vma_adjust
can fail, if it needs to attach a VMA to anon_vma structures. This in
turn means error handling needs to be added to the calling functions.

A second source of complexity is that, because there can be multiple
anon_vmas, the anon_vma linking in vma_adjust can no longer be done under
"the" anon_vma lock. To prevent the rmap code from walking up an
incomplete VMA, this patch introduces the VM_LOCK_RMAP VMA flag. This bit
flag uses the same slot as the NOMMU VM_MAPPED_COPY, with an ifdef in mm.h
to make sure it is impossible to compile a kernel that needs both symbolic
values for the same bitflag.

Some test results:

Without the anon_vma changes, when AIM7 hits around 9.7k users (on a test
box with 16GB RAM and not quite enough IO), the system ends up running
>99% in system time, with every CPU on the same anon_vma lock in the
pageout code.

With these changes, AIM7 hits the cross-over point around 29.7k users.
This happens with ~99% IO wait time, there never seems to be any spike in
system time. The anon_vma lock contention appears to be resolved.

[akpm@linux-foundation.org: cleanups]
Signed-off-by: Rik van Riel
Cc: KOSAKI Motohiro
Cc: Larry Woodman
Cc: Lee Schermerhorn
Cc: Minchan Kim
Cc: Andrea Arcangeli
Cc: Hugh Dickins
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Rik van Riel
2010-03-07 03:26:26 +0800
648bcc771 mm/memcontrol.c: fix "integer as NULL pointer" sparse warning ... Browse Code »

mm/memcontrol.c:2548:32: warning: Using plain integer as NULL pointer

Signed-off-by: Thiago Farina
Acked-by: Balbir Singh
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Thiago Farina
2010-03-07 03:26:26 +0800
19adf9c5d include/linux/fs.h: convert FMODE_* constants to hex ... Browse Code »

It was tolerable until Eric went and added 8388608.

Cc: Eric Paris
Cc: Wu Fengguang
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Andrew Morton
2010-03-07 03:26:25 +0800
0141450f6 readahead: introduce FMODE_RANDOM for POSIX_FADV_RANDOM ... Browse Code »

This fixes inefficient page-by-page reads on POSIX_FADV_RANDOM.

POSIX_FADV_RANDOM used to set ra_pages=0, which leads to poor performance:
a 16K read will be carried out in 4 _sync_ 1-page reads.

In other places, ra_pages==0 means
- it's ramfs/tmpfs/hugetlbfs/sysfs/configfs
- some IO error happened
where multi-page read IO won't help or should be avoided.

POSIX_FADV_RANDOM actually want a different semantics: to disable the
*heuristic* readahead algorithm, and to use a dumb one which faithfully
submit read IO for whatever application requests.

So introduce a flag FMODE_RANDOM for POSIX_FADV_RANDOM.

Note that the random hint is not likely to help random reads performance
noticeably. And it may be too permissive on huge request size (its IO
size is not limited by read_ahead_kb).

In Quentin's report (http://lkml.org/lkml/2009/12/24/145), the overall
(NFS read) performance of the application increased by 313%!

Tested-by: Quentin Barnes
Signed-off-by: Wu Fengguang
Cc: Nick Piggin
Cc: Andi Kleen
Cc: Steven Whitehouse
Cc: David Howells
Cc: Jonathan Corbet
Cc: Al Viro
Cc: Christoph Hellwig
Cc: Trond Myklebust
Cc: Chuck Lever
Cc: [2.6.33.x]
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Wu Fengguang
2010-03-07 03:26:25 +0800
42e496086 vfs: take f_lock on modifying f_mode after open time ... Browse Code »

We'll introduce FMODE_RANDOM which will be runtime modified. So protect
all runtime modification to f_mode with f_lock to avoid races.

Signed-off-by: Wu Fengguang
Cc: Al Viro
Cc: Christoph Hellwig
Cc: Trond Myklebust
Cc: Chuck Lever
Cc: [2.6.33.x]
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Wu Fengguang
2010-03-07 03:26:25 +0800
85f1fb72f mm/migrate.c: kill anon local variable from migrate_page_copy ... Browse Code »

commit 01b1ae63c2 ("memcg: simple migration handling") removed
mem_cgroup_uncharge_cache_page() call from migrate_page_copy. Local
variable `anon' is now unused.

Signed-off-by: KOSAKI Motohiro
Cc: KAMEZAWA Hiroyuki
Cc: Daisuke Nishimura
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

KOSAKI Motohiro
2010-03-07 03:26:25 +0800
da0aa1389 mm/mempolicy.c: fix indentation of the comments of do_migrate_pages ... Browse Code »

Currently, do_migrate_pages() have very long comment and this is not
indent properly. I often misunderstand it is function starting commnents
and confused it.

this patch fixes it.

note: this patch doesn't break 80 column rule. I guess original
author intended this indentaion, but an accident corrupted it.

Signed-off-by: KOSAKI Motohiro
Reviewed-by: Christoph Lameter
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

KOSAKI Motohiro
2010-03-07 03:26:25 +0800
d96ae5309 memory-hotplug: create /sys/firmware/memmap entry for new memory ... Browse Code »

A memmap is a directory in sysfs which includes 3 text files: start, end
and type. For example:

start: 0x100000
end: 0x7e7b1cff
type: System RAM

Interface firmware_map_add was not called explicitly. Remove it and add
function firmware_map_add_hotplug as hotplug interface of memmap.

Each memory entry has a memmap in sysfs, When we hot-add new memory, sysfs
does not export memmap entry for it. We add a call in function add_memory
to function firmware_map_add_hotplug.

Add a new function add_sysfs_fw_map_entry() to create memmap entry, it
will be called when initialize memmap and hot-add memory.

[akpm@linux-foundation.org: un-kernedoc a no longer kerneldoc comment]
Signed-off-by: Shaohui Zheng
Acked-by: Andi Kleen
Acked-by: Yasunori Goto
Reviewed-by: Wu Fengguang
Cc: Dave Hansen
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

akpm@linux-foundation.org
2010-03-07 03:26:25 +0800
9d8cebd4b mm: fix mbind vma merge problem ... Browse Code »

Strangely, current mbind() doesn't merge vma with neighbor vma although it's possible.
Unfortunately, many vma can reduce performance...

This patch fixes it.

reproduced program
----------------------------------------------------------------
#include
#include
#include
#include
#include
#include
#include

static unsigned long pagesize;

int main(int argc, char** argv)
{
void* addr;
int ch;
int node;
struct bitmask *nmask = numa_allocate_nodemask();
int err;
int node_set = 0;
char buf[128];

while ((ch = getopt(argc, argv, "n:")) != -1){
switch (ch){
case 'n':
node = strtol(optarg, NULL, 0);
numa_bitmask_setbit(nmask, node);
node_set = 1;
break;
default:
;
}
}
argc -= optind;
argv += optind;

if (!node_set)
numa_bitmask_setbit(nmask, 0);

pagesize = getpagesize();

addr = mmap(NULL, pagesize*3, PROT_READ|PROT_WRITE,
MAP_ANON|MAP_PRIVATE, 0, 0);
if (addr == MAP_FAILED)
perror("mmap "), exit(1);

fprintf(stderr, "pid = %d \n" "addr = %p\n", getpid(), addr);

/* make page populate */
memset(addr, 0, pagesize*3);

/* first mbind */
err = mbind(addr+pagesize, pagesize, MPOL_BIND, nmask->maskp,
nmask->size, MPOL_MF_MOVE_ALL);
if (err)
error("mbind1 ");

/* second mbind */
err = mbind(addr, pagesize*3, MPOL_DEFAULT, NULL, 0, 0);
if (err)
error("mbind2 ");

sprintf(buf, "cat /proc/%d/maps", getpid());
system(buf);

return 0;
}
----------------------------------------------------------------

result without this patch

addr = 0x7fe26ef09000
[snip]
7fe26ef09000-7fe26ef0a000 rw-p 00000000 00:00 0
7fe26ef0a000-7fe26ef0b000 rw-p 00000000 00:00 0
7fe26ef0b000-7fe26ef0c000 rw-p 00000000 00:00 0
7fe26ef0c000-7fe26ef0d000 rw-p 00000000 00:00 0

=> 0x7fe26ef09000-0x7fe26ef0c000 have three vmas.

result with this patch

addr = 0x7fc9ebc76000
[snip]
7fc9ebc76000-7fc9ebc7a000 rw-p 00000000 00:00 0
7fffbe690000-7fffbe6a5000 rw-p 00000000 00:00 0 [stack]

=> 0x7fc9ebc76000-0x7fc9ebc7a000 have only one vma.

[minchan.kim@gmail.com: fix file offset passed to vma_merge()]
Signed-off-by: KOSAKI Motohiro
Reviewed-by: Christoph Lameter
Cc: Nick Piggin
Cc: Hugh Dickins
Cc: Andrea Arcangeli
Cc: Mel Gorman
Cc: Lee Schermerhorn
Signed-off-by: Minchan Kim
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

KOSAKI Motohiro
2010-03-07 03:26:25 +0800
93e4a89a8 mm: restore zone->all_unreclaimable to independence word ... Browse Code »

commit e815af95 ("change all_unreclaimable zone member to flags") changed
all_unreclaimable member to bit flag. But it had an undesireble side
effect. free_one_page() is one of most hot path in linux kernel and
increasing atomic ops in it can reduce kernel performance a bit.

Thus, this patch revert such commit partially. at least
all_unreclaimable shouldn't share memory word with other zone flags.

[akpm@linux-foundation.org: fix patch interaction]
Signed-off-by: KOSAKI Motohiro
Cc: David Rientjes
Cc: Wu Fengguang
Cc: KAMEZAWA Hiroyuki
Cc: Minchan Kim
Cc: Huang Shijie
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

KOSAKI Motohiro
2010-03-07 03:26:25 +0800
fc91668ea mm: remove free_hot_page() ... Browse Code »

free_hot_page() is just a wrapper around free_hot_cold_page() with
parameter 'cold = 0'. After adding a clear comment for
free_hot_cold_page(), it is reasonable to remove a level of call.

[akpm@linux-foundation.org: fix build]
Signed-off-by: Li Hong
Cc: Mel Gorman
Cc: Rik van Riel
Cc: Ingo Molnar
Cc: Larry Woodman
Cc: Peter Zijlstra
Cc: Li Ming Chun
Cc: KOSAKI Motohiro
Cc: Americo Wang
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Li Hong
2010-03-07 03:26:25 +0800
c475dab63 mm/page_alloc.c: adjust a call site to trace_mm_page_free_direct ... Browse Code »

Move a call of trace_mm_page_free_direct() from free_hot_page() to
free_hot_cold_page(). It is clearer and close to kmemcheck_free_shadow(),
as it is done in function __free_pages_ok().

Signed-off-by: Li Hong
Cc: Mel Gorman
Cc: Rik van Riel
Cc: Ingo Molnar
Cc: Larry Woodman
Cc: Peter Zijlstra
Cc: Li Ming Chun
Cc: KOSAKI Motohiro
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Li Hong
2010-03-07 03:26:24 +0800
f650316c8 mm/page_alloc.c: remove duplicate call to trace_mm_page_free_direct ... Browse Code »

trace_mm_page_free_direct() is called in function __free_pages(). But it
is called again in free_hot_page() if order == 0 and produce duplicate
records in trace file for mm_page_free_direct event. As below:

K-PID CPU# TIMESTAMP FUNCTION
gnome-terminal-1567 [000] 4415.246466: mm_page_free_direct: page=ffffea0003db9f40 pfn=1155800 order=0
gnome-terminal-1567 [000] 4415.246468: mm_page_free_direct: page=ffffea0003db9f40 pfn=1155800 order=0
gnome-terminal-1567 [000] 4415.246506: mm_page_alloc: page=ffffea0003db9f40 pfn=1155800 order=0 migratetype=0 gfp_flags=GFP_KERNEL
gnome-terminal-1567 [000] 4415.255557: mm_page_free_direct: page=ffffea0003db9f40 pfn=1155800 order=0
gnome-terminal-1567 [000] 4415.255557: mm_page_free_direct: page=ffffea0003db9f40 pfn=1155800 order=0

This patch removes the first call and adds a call to
trace_mm_page_free_direct() in __free_pages_ok().

Signed-off-by: Li Hong
Cc: Mel Gorman
Cc: Rik van Riel
Cc: Ingo Molnar
Cc: Larry Woodman
Cc: Peter Zijlstra
Cc: Li Ming Chun
Cc: KOSAKI Motohiro
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Li Hong
2010-03-07 03:26:24 +0800
76ca542d8 mm, lockdep: annotate reclaim context to zone reclaim too ... Browse Code »

Commit cf40bd16fd ("lockdep: annotate reclaim context") introduced reclaim
context annotation. But it didn't annotate zone reclaim. This patch do
it.

The point is, commit cf40bd16fd annotate __alloc_pages_direct_reclaim but
zone-reclaim doesn't use __alloc_pages_direct_reclaim.

current call graph is

__alloc_pages_nodemask
get_page_from_freelist
zone_reclaim()
__alloc_pages_slowpath
__alloc_pages_direct_reclaim
try_to_free_pages

Actually, if zone_reclaim_mode=1, VM never call
__alloc_pages_direct_reclaim in usual VM pressure.

Signed-off-by: KOSAKI Motohiro
Reviewed-by: Minchan Kim
Acked-by: Nick Piggin
Cc: Peter Zijlstra
Cc: Ingo Molnar
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

KOSAKI Motohiro
2010-03-07 03:26:24 +0800
84b18490d vmscan: get_scan_ratio() cleanup ... Browse Code »

The get_scan_ratio() should have all scan-ratio related calculations.
Thus, this patch move some calculation into get_scan_ratio.

Signed-off-by: KOSAKI Motohiro
Reviewed-by: Rik van Riel
Reviewed-by: KAMEZAWA Hiroyuki
Reviewed-by: Minchan Kim
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

KOSAKI Motohiro
2010-03-07 03:26:24 +0800
45973d74f vmscan: check high watermark after shrink zone ... Browse Code »

Kswapd checks that zone has sufficient pages free via zone_watermark_ok().

If any zone doesn't have enough pages, we set all_zones_ok to zero.
!all_zone_ok makes kswapd retry rather than sleeping.

I think the watermark check before shrink_zone() is pointless. Only after
kswapd has tried to shrink the zone is the check meaningful.

Move the check to after the call to shrink_zone().

[akpm@linux-foundation.org: fix comment, layout]
Signed-off-by: Minchan Kim
Reviewed-by: KOSAKI Motohiro
Cc: Mel Gorman
Cc: Rik van Riel
Reviewed-by: Wu Fengguang
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Minchan Kim
2010-03-07 03:26:24 +0800
59e99e5b9 mm: use rlimit helpers ... Browse Code »

Make sure compiler won't do weird things with limits. E.g. fetching them
twice may return 2 different values after writable limits are implemented.

I.e. either use rlimit helpers added in
3e10e716abf3c71bdb5d86b8f507f9e72236c9cd ("resource: add helpers for
fetching rlimits") or ACCESS_ONCE if not applicable.

Signed-off-by: Jiri Slaby
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Jiri Slaby
2010-03-07 03:26:24 +0800
06f9d8c2b mm: mlock_vma_pages_range() only return success or failure ... Browse Code »

Currently, mlock_vma_pages_range() only return len or 0. then current
error handling of mmap_region() is meaningless complex.

This patch makes simplify and makes consist with brk() code.

Signed-off-by: KOSAKI Motohiro
Cc: Nick Piggin
Cc: Lee Schermerhorn
Cc: Rik van Riel
Cc: KAMEZAWA Hiroyuki
Cc: Hugh Dickins
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

KOSAKI Motohiro
2010-03-07 03:26:24 +0800
c58267c32 mm: mlock_vma_pages_range() never return negative value ... Browse Code »

Currently, mlock_vma_pages_range() never return negative value. Then, we
can remove some worthless error check.

Signed-off-by: KOSAKI Motohiro
Cc: Nick Piggin
Cc: Lee Schermerhorn
Cc: Rik van Riel
Cc: KAMEZAWA Hiroyuki
Cc: Hugh Dickins
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

KOSAKI Motohiro
2010-03-07 03:26:24 +0800
b084d4353 mm: count swap usage ... Browse Code »

A frequent questions from users about memory management is what numbers of
swap ents are user for processes. And this information will give some
hints to oom-killer.

Besides we can count the number of swapents per a process by scanning
/proc//smaps, this is very slow and not good for usual process
information handler which works like 'ps' or 'top'. (ps or top is now
enough slow..)

This patch adds a counter of swapents to mm_counter and update is at each
swap events. Information is exported via /proc//status file as

[kamezawa@bluextal memory]$ cat /proc/self/status
Name: cat
State: R (running)
Tgid: 2910
Pid: 2910
PPid: 2823
TracerPid: 0
Uid: 500 500 500 500
Gid: 500 500 500 500
FDSize: 256
Groups: 500
VmPeak: 82696 kB
VmSize: 82696 kB
VmLck: 0 kB
VmHWM: 432 kB
VmRSS: 432 kB
VmData: 172 kB
VmStk: 84 kB
VmExe: 48 kB
VmLib: 1568 kB
VmPTE: 40 kB
VmSwap: 0 kB
Reviewed-by: Minchan Kim
Reviewed-by: Christoph Lameter
Cc: Lee Schermerhorn
Cc: David Rientjes
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

KAMEZAWA Hiroyuki
2010-03-07 03:26:24 +0800
34e55232e mm: avoid false sharing of mm_counter ... Browse Code »

Considering the nature of per mm stats, it's the shared object among
threads and can be a cache-miss point in the page fault path.

This patch adds per-thread cache for mm_counter. RSS value will be
counted into a struct in task_struct and synchronized with mm's one at
events.

Now, in this patch, the event is the number of calls to handle_mm_fault.
Per-thread value is added to mm at each 64 calls.

rough estimation with small benchmark on parallel thread (2threads) shows
[before]
4.5 cache-miss/faults
[after]
4.0 cache-miss/faults
Anyway, the most contended object is mmap_sem if the number of threads grows.

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: KAMEZAWA Hiroyuki
Cc: Minchan Kim
Cc: Christoph Lameter
Cc: Lee Schermerhorn
Cc: David Rientjes
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

KAMEZAWA Hiroyuki
2010-03-07 03:26:24 +0800
d559db086 mm: clean up mm_counter ... Browse Code »

Presently, per-mm statistics counter is defined by macro in sched.h

This patch modifies it to
- defined in mm.h as inlinf functions
- use array instead of macro's name creation.

This patch is for reducing patch size in future patch to modify
implementation of per-mm counter.

Signed-off-by: KAMEZAWA Hiroyuki
Reviewed-by: Minchan Kim
Cc: Christoph Lameter
Cc: Lee Schermerhorn
Cc: David Rientjes
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

KAMEZAWA Hiroyuki
2010-03-07 03:26:23 +0800
19b629f58 infiniband: use for_each_set_bit() ... Browse Code »

Replace open-coded loop with for_each_set_bit().

Signed-off-by: Akinobu Mita
Acked-by: Roland Dreier
Cc: Sean Hefty
Cc: Hal Rosenstock
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Akinobu Mita
2010-03-07 03:26:23 +0800
984b3f574 bitops: rename for_each_bit() to for_each_set_bit() ... Browse Code »

Rename for_each_bit to for_each_set_bit in the kernel source tree. To
permit for_each_clear_bit(), should that ever be added.

The patch includes a macro to map the old for_each_bit() onto the new
for_each_set_bit(). This is a (very) temporary thing to ease the migration.

[akpm@linux-foundation.org: add temporary for_each_bit()]
Suggested-by: Alexey Dobriyan
Suggested-by: Andrew Morton
Signed-off-by: Akinobu Mita
Cc: "David S. Miller"
Cc: Russell King
Cc: David Woodhouse
Cc: Artem Bityutskiy
Cc: Stephen Rothwell
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Akinobu Mita
2010-03-07 03:26:23 +0800
e3cb91ce1 timbgpio: fix build ... Browse Code »

Use of get_irq_chip_data() et al. requires including linux/irq.h

Signed-off-by: David S. Miller
Cc: Richard Röjfors
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

David Miller
2010-03-07 03:26:23 +0800
781b16775 Fix a dumb typo - use of & instead of && ... Browse Code »

We managed to lose O_DIRECTORY testing due to a stupid typo in commit
1f36f774b2 ("Switch !O_CREAT case to use of do_last()")

Reported-by: Walter Sheets
Signed-off-by: Al Viro
Signed-off-by: Linus Torvalds

Al Viro
2010-03-07 02:54:48 +0800

06 Mar, 2010

14 commits

64096c174 Merge branch 'slab-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/penberg/slab-2.6 ... Browse Code »

* 'slab-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/penberg/slab-2.6:
SLUB: Fix per-cpu merge conflict
failslab: add ability to filter slab caches
slab: fix regression in touched logic
dma kmalloc handling fixes
slub: remove impossible condition
slab: initialize unused alien cache entry as NULL at alloc_alien_cache().
SLUB: Make slub statistics use this_cpu_inc
SLUB: this_cpu: Remove slub kmem_cache fields
SLUB: Get rid of dynamic DMA kmalloc cache allocation
SLUB: Use this_cpu operations in slub

Linus Torvalds
2010-03-06 06:35:40 +0800
cc7889ff5 Merge branch 'nfs-for-2.6.34' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6 ... Browse Code »

* 'nfs-for-2.6.34' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6: (44 commits)
NFS: Remove requirement for inode->i_mutex from nfs_invalidate_mapping
NFS: Clean up nfs_sync_mapping
NFS: Simplify nfs_wb_page()
NFS: Replace __nfs_write_mapping with sync_inode()
NFS: Simplify nfs_wb_page_cancel()
NFS: Ensure inode is always marked I_DIRTY_DATASYNC, if it has unstable pages
NFS: Run COMMIT as an asynchronous RPC call when wbc->for_background is set
NFS: Reduce the number of unnecessary COMMIT calls
NFS: Add a count of the number of unstable writes carried by an inode
NFS: Cleanup - move nfs_write_inode() into fs/nfs/write.c
nfs41 fix NFS4ERR_CLID_INUSE for exchange id
NFS: Fix an allocation-under-spinlock bug
SUNRPC: Handle EINVAL error returns from the TCP connect operation
NFSv4.1: Various fixes to the sequence flag error handling
nfs4: renewd renew operations should take/put a client reference
nfs41: renewd sequence operations should take/put client reference
nfs: prevent backlogging of renewd requests
nfs: kill renewd before clearing client minor version
NFS: Make close(2) asynchronous when closing NFS O_DIRECT files
NFS: Improve NFS iostat byte count accuracy for writes
...

Linus Torvalds
2010-03-06 05:25:45 +0800
b13d3c6e8 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs ... Browse Code »

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs:
fs/9p: Add hardlink support to .u extension
9P2010.L handshake: .L protocol negotiation
9P2010.L handshake: Remove "dotu" variable
9P2010.L handshake: Add mount option
9P2010.L handshake: Add VFS flags
net/9p: Handle mount errors correctly.
net/9p: Remove MAX_9P_CHAN limit
net/9p: Add multi channel support.

Linus Torvalds
2010-03-06 05:25:24 +0800
e213e26ab Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs-2.6 ... Browse Code »

* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs-2.6: (33 commits)
quota: stop using QUOTA_OK / NO_QUOTA
dquot: cleanup dquot initialize routine
dquot: move dquot initialization responsibility into the filesystem
dquot: cleanup dquot drop routine
dquot: move dquot drop responsibility into the filesystem
dquot: cleanup dquot transfer routine
dquot: move dquot transfer responsibility into the filesystem
dquot: cleanup inode allocation / freeing routines
dquot: cleanup space allocation / freeing routines
ext3: add writepage sanity checks
ext3: Truncate allocated blocks if direct IO write fails to update i_size
quota: Properly invalidate caches even for filesystems with blocksize < pagesize
quota: generalize quota transfer interface
quota: sb_quota state flags cleanup
jbd: Delay discarding buffers in journal_unmap_buffer
ext3: quota_write cross block boundary behaviour
quota: drop permission checks from xfs_fs_set_xstate/xfs_fs_set_xquota
quota: split out compat_sys_quotactl support from quota.c
quota: split out netlink notification support from quota.c
quota: remove invalid optimization from quota_sync_all
...

Fixed trivial conflicts in fs/namei.c and fs/ufs/inode.c

Linus Torvalds
2010-03-06 05:20:53 +0800
c812a51d1 Merge branch 'kvm-updates/2.6.34' of git://git.kernel.org/pub/scm/virt/kvm/kvm ... Browse Code »

* 'kvm-updates/2.6.34' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (145 commits)
KVM: x86: Add KVM_CAP_X86_ROBUST_SINGLESTEP
KVM: VMX: Update instruction length on intercepted BP
KVM: Fix emulate_sys[call, enter, exit]()'s fault handling
KVM: Fix segment descriptor loading
KVM: Fix load_guest_segment_descriptor() to inject page fault
KVM: x86 emulator: Forbid modifying CS segment register by mov instruction
KVM: Convert kvm->requests_lock to raw_spinlock_t
KVM: Convert i8254/i8259 locks to raw_spinlocks
KVM: x86 emulator: disallow opcode 82 in 64-bit mode
KVM: x86 emulator: code style cleanup
KVM: Plan obsolescence of kernel allocated slots, paravirt mmu
KVM: x86 emulator: Add LOCK prefix validity checking
KVM: x86 emulator: Check CPL level during privilege instruction emulation
KVM: x86 emulator: Fix popf emulation
KVM: x86 emulator: Check IOPL level during io instruction emulation
KVM: x86 emulator: fix memory access during x86 emulation
KVM: x86 emulator: Add Virtual-8086 mode of emulation
KVM: x86 emulator: Add group9 instruction decoding
KVM: x86 emulator: Add group8 instruction decoding
KVM: do not store wqh in irqfd
...

Trivial conflicts in Documentation/feature-removal-schedule.txt

Linus Torvalds
2010-03-06 05:12:34 +0800
5717144a0 fs/9p: Add hardlink support to .u extension ... Browse Code »

For regular file and directories we put the link
count in th extension field in a tagged string format.

Signed-off-by: Aneesh Kumar K.V
Signed-off-by: Eric Van Hensbergen

Aneesh Kumar K.V
2010-03-06 05:04:42 +0800
c5a7697da 9P2010.L handshake: .L protocol negotiation ... Browse Code »

This patch adds 9P2010.L protocol negotiation with the server

Signed-off-by: Sripathi Kodi
Signed-off-by: Eric Van Hensbergen

Sripathi Kodi
2010-03-06 05:04:42 +0800
342fee1d5 9P2010.L handshake: Remove "dotu" variable ... Browse Code »

Removes 'dotu' variable and make everything dependent
on 'proto_version' field.

Signed-off-by: Sripathi Kodi
Signed-off-by: Eric Van Hensbergen

Sripathi Kodi
2010-03-06 05:04:42 +0800
0fb80abd9 9P2010.L handshake: Add mount option ... Browse Code »

Add new mount V9FS mount option to specify protocol version

This patch adds a new mount option to specify protocol version.
With this option it is possible to use "-o version=" switch to
specify 9P protocol version to use. Valid options for version
are:
9p2000
9p2000.u
9p2010.L

Signed-off-by: Sripathi Kodi
Signed-off-by: Eric Van Hensbergen

Sripathi Kodi
2010-03-06 05:04:42 +0800
dd6102fbd 9P2010.L handshake: Add VFS flags ... Browse Code »

Add 9P2000.u and 9P2010.L protocol flags to V9FS VFS

This patch adds 9P2000.u and 9P2010.L protocol flags into V9FS VFS side code
and removes the single flag used for 'extended'.

Signed-off-by: Sripathi Kodi
Signed-off-by: Eric Van Hensbergen

Sripathi Kodi
2010-03-06 05:04:41 +0800
c1a7c2262 net/9p: Handle mount errors correctly. ... Browse Code »

With this patch we have

# mount -t 9p -o trans=virtio virtio2 /mnt/
# mount -t 9p -o trans=virtio virtio2 /mnt/
mount: virtio2 already mounted or /mnt/ busy
mount: according to mtab, virtio2 is already mounted on /mnt
# mount -t 9p -o trans=virtio virtio3 /mnt/ -o debug=0xfff
mount: special device virtio3 does not exist

Signed-off-by: Aneesh Kumar K.V
Signed-off-by: Eric Van Hensbergen

Aneesh Kumar K.V
2010-03-06 05:04:41 +0800
37c1209d4 net/9p: Remove MAX_9P_CHAN limit ... Browse Code »

Use a list to track the channel instead of statically
allocated array

Signed-off-by: Aneesh Kumar K.V
Signed-off-by: Eric Van Hensbergen

Aneesh Kumar K.V
2010-03-06 05:04:41 +0800
f75580c4a net/9p: Add multi channel support. ... Browse Code »

This is needed for supporting multiple mount points.

We can find out the device names to be used with mount by checking

/sys/devices/virtio-pci/virtio*/device file

if the device file have value 9 then the specific virtio device can
be used for mounting.

ex:
#cat /sys/devices/virtio-pci/virtio1/device
9

now we can mount using
# mount -t 9p -o trans=virtio virtio1 /mnt/

Signed-off-by: Aneesh Kumar K.V
Signed-off-by: Eric Van Hensbergen

Aneesh Kumar K.V
2010-03-06 05:04:41 +0800
3fa04ecd7 Merge branch 'writeback-for-2.6.34' into nfs-for-2.6.34 Browse Code »

Trond Myklebust
2010-03-06 04:46:18 +0800