Eric Lee / smarc-fsl-linux-kernel

07 Mar, 2010

35 commits

66b89159c Merge git://git.kernel.org/pub/scm/linux/kernel/git/joern/logfs

* git://git.kernel.org/pub/scm/linux/kernel/git/joern/logfs:
[LogFS] Change magic number
[LogFS] Remove h_version field
[LogFS] Check feature flags
[LogFS] Only write journal if dirty
[LogFS] Fix bdev erases
[LogFS] Silence gcc
[LogFS] Prevent 64bit divisions in hash_index
[LogFS] Plug memory leak on error paths
[LogFS] Add MAINTAINERS entry
[LogFS] add new flash file system

Fixed up trivial conflict in lib/Kconfig, and a semantic conflict in
fs/logfs/inode.c introduced by write_inode() being changed to use
'writeback_control' by commit a9185b41a4f84971b930c519f0c63bd450c4810d
("pass writeback_control to ->write_inode")

Linus Torvalds
2010-03-07 05:18:03 +0800
87c7ae06c Merge git://git.kernel.org/pub/scm/linux/kernel/git/agk/linux-2.6-dm

* git://git.kernel.org/pub/scm/linux/kernel/git/agk/linux-2.6-dm:
dm raid1: fix deadlock when suspending failed device
dm: eliminate some holes data structures
dm ioctl: introduce flag indicating uevent was generated
dm: free dm_io before bio_endio not after
dm table: remove unused dm_get_device range parameters
dm ioctl: only issue uevent on resume if state changed
dm raid1: always return error if all legs fail
dm mpath: refactor pg_init
dm mpath: wait for pg_init completion when suspending
dm mpath: hold io until all pg_inits completed
dm mpath: avoid storing private suspended state
dm: document when snapshot has finished merging
dm table: remove dm_get from dm_table_get_md
dm mpath: skip activate_path for failed paths
dm mpath: pass struct pgpath to pg init done

Linus Torvalds
2010-03-07 03:34:04 +0800
05c5cb31e Merge branch 'for-2.6.34' of git://linux-nfs.org/~bfields/linux

* 'for-2.6.34' of git://linux-nfs.org/~bfields/linux: (22 commits)
nfsd4: fix minor memory leak
svcrpc: treat uid's as unsigned
nfsd: ensure sockets are closed on error
Revert "sunrpc: move the close processing after do recvfrom method"
Revert "sunrpc: fix peername failed on closed listener"
sunrpc: remove unnecessary svc_xprt_put
NFSD: NFSv4 callback client should use RPC_TASK_SOFTCONN
xfs_export_operations.commit_metadata
commit_metadata export operation replacing nfsd_sync_dir
lockd: don't clear sm_monitored on nsm_reboot_lookup
lockd: release reference to nsm_handle in nlm_host_rebooted
nfsd: Use vfs_fsync_range() in nfsd_commit
NFSD: Create PF_INET6 listener in write_ports
SUNRPC: NFS kernel APIs shouldn't return ENOENT for "transport not found"
SUNRPC: Bury "#ifdef IPV6" in svc_create_xprt()
NFSD: Support AF_INET6 in svc_addsock() function
SUNRPC: Use rpc_pton() in ip_map_parse()
nfsd: 4.1 has an rfc number
nfsd41: Create the recovery entry for the NFSv4.1 client
nfsd: use vfs_fsync for non-directories
...

Linus Torvalds
2010-03-07 03:31:38 +0800
89ea8bbe9 gpio: pca953x.c: add interrupt handling capability

Most of the GPIO expanders controlled by the pca953x driver are able to
report changes on the input pins through an *INT pin.

This patch implements the irq_chip functionality (edge detection only).

The driver has been tested on an Arcom Zeus.

[akpm@linux-foundation.org: the compiler does inlining for us nowadays]
Signed-off-by: Marc Zyngier
Cc: Eric Miao
Cc: Haojian Zhuang
Cc: David Brownell
Cc: Nate Case
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Marc Zyngier
2010-03-07 03:26:48 +0800
3e45f1d11 gpio: introduce gpio_request_one() and friends

gpio_request() without initial configuration of the GPIO is normally
useless, so introduce gpio_request_one() together with GPIOF_ flags for
input/output direction and initial output level.

Also add gpio_{request,free}_array() for multiple GPIOs.
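
A userspace sketch of the request-and-configure pattern that a single call saves a driver from spelling out. Everything here is hypothetical: the MODEL_GPIOF_* bits only mimic the idea of the GPIOF_ direction and initial-level flags, struct fake_gpio stands in for real hardware, and the kernel's actual signatures are not reproduced.

```c
#include <stdbool.h>

/* Hypothetical flag bits in the spirit of the GPIOF_ flags:
 * bit 0 selects input, bit 1 selects a high initial output level. */
#define MODEL_GPIOF_DIR_IN        (1UL << 0)
#define MODEL_GPIOF_INIT_HIGH     (1UL << 1)
#define MODEL_GPIOF_IN            MODEL_GPIOF_DIR_IN
#define MODEL_GPIOF_OUT_INIT_LOW  0UL
#define MODEL_GPIOF_OUT_INIT_HIGH MODEL_GPIOF_INIT_HIGH

/* Hypothetical in-memory pin standing in for real hardware. */
struct fake_gpio {
	bool requested;
	bool is_output;
	int value;
};

/* Request a pin and configure it in one call.
 * Returns 0, or -1 if the pin is already taken. */
static int model_request_one(struct fake_gpio *pin, unsigned long flags)
{
	if (pin->requested)
		return -1;
	pin->requested = true;
	pin->is_output = !(flags & MODEL_GPIOF_DIR_IN);
	if (pin->is_output)
		pin->value = (flags & MODEL_GPIOF_INIT_HIGH) ? 1 : 0;
	return 0;
}

/* Request n pins; on failure, release the ones already requested. */
static int model_request_array(struct fake_gpio *pins,
			       const unsigned long *flags, int n)
{
	for (int i = 0; i < n; i++) {
		if (model_request_one(&pins[i], flags[i]) != 0) {
			while (--i >= 0)
				pins[i].requested = false;
			return -1;
		}
	}
	return 0;
}
```

Rolling back already-taken pins on a partial failure is one reasonable design for an array helper; the sketch assumes it rather than quoting the kernel's behaviour.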

Signed-off-by: Eric Miao
Cc: David Brownell
Cc: Ben Nizette
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Eric Miao
2010-03-07 03:26:48 +0800
62fecb70c pca953x: minor include cleanup

linux/i2c/pca953x.h is a very bare include file. Fix its guard against
multiple inclusion, and add the includes it depends on to the header
file.

Signed-off-by: Olof Johansson
Acked-by: Wolfram Sang
Acked-by: Jean Delvare
Cc: David Brownell
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Olof Johansson
2010-03-07 03:26:48 +0800
e952805d2 gpio: add driver for MAX7300 I2C GPIO extender

Add the MAX7300-I2C variant of the MAX7301-SPI version. Both chips share
the same core logic, so the in-kernel SPI driver is refactored into a
generic part. The I2C- and SPI-specific functions are then wrapped into
separate drivers that pick up the generic part.

Signed-off-by: Wolfram Sang
Cc: Juergen Beisert
Cc: David Brownell
Cc: Jean Delvare
Cc: Anton Vorontsov
Cc: Randy Dunlap
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Wolfram Sang
2010-03-07 03:26:48 +0800
86c340081 mfd/mc13783: new function reading irq mask and status register

The driver for the mc13783 rtc needs to know if the TODA irq is pending.

Instead of having the rtc driver track whether the irq is enabled, provide
that information, too.

Signed-off-by: Uwe Kleine-König
Cc: Alessandro Zummo
Cc: Paul Gortmaker
Cc: Valentin Longchamp
Cc: Sascha Hauer
Cc: Samuel Ortiz
Cc: Dmitry Torokhov
Cc: Luotao Fu
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Uwe Kleine-König
2010-03-07 03:26:47 +0800
57205026d mc13783: rename mc13783_{{un,}mask,ack_irq} to have a mc13783_irq prefix

Also group these functions together in the source file.

The mc13783 header file provides fallback implementations for the old
names to prevent build failures. When all users of the old names are
fixed to use the new names, these can go away.

Signed-off-by: Uwe Kleine-König
Cc: Alessandro Zummo
Cc: Paul Gortmaker
Cc: Valentin Longchamp
Cc: Sascha Hauer
Cc: Samuel Ortiz
Cc: Dmitry Torokhov
Cc: Luotao Fu
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Uwe Kleine-König
2010-03-07 03:26:47 +0800
30736a4d4 coredump: pass mm->flags as a coredump parameter for consistency

Pass mm->flags as a coredump parameter for consistency.

---
        if (mm->core_state || !get_dumpable(mm)) {      <- (1)
                up_write(&mm->mmap_sem);
                put_cred(cred);
                goto fail;
        }

[...]
        if (get_dumpable(mm) == 2) {    /* Setuid core dump mode */    <- (2)
                flag = O_EXCL;          /* Stop rewrite attacks */
                cred->fsuid = 0;        /* Dump root private */
        }
---

Since the dumpable bits are not protected by a lock, they may change
between (1) and (2).

To solve this issue, this patch copies mm->flags to
coredump_params.mm_flags at the beginning of do_coredump() and uses it
instead of get_dumpable() while dumping core.

This copy is also passed to binfmt->core_dump, since elf*_core_dump() uses
dump_filter bits in mm->flags.
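
A simplified userspace model of the fix. It assumes, hypothetically, that the low two bits of the flags word hold the dumpable mode (0 = no dump, 1 = normal dump, 2 = setuid mode); the real bit layout is not reproduced. The between_checks hook simulates another thread changing the flags between checks (1) and (2); because both checks read the snapshot, they stay consistent.

```c
#include <stddef.h>

/* Hypothetical encoding of the dumpable mode in the low two bits. */
#define MODEL_DUMPABLE_MASK 0x3UL

struct model_mm { unsigned long flags; };
struct model_coredump_params { unsigned long mm_flags; };

static int dumpable_from(unsigned long flags)
{
	return (int)(flags & MODEL_DUMPABLE_MASK);
}

/* Simulates another thread switching the mm to setuid dump mode. */
static void model_set_setuid_mode(struct model_mm *mm)
{
	mm->flags = (mm->flags & ~MODEL_DUMPABLE_MASK) | 2UL;
}

/* Returns 0 (no dump), 1 (normal dump) or 2 (dump as root). Both
 * checks read the snapshot in cprm, never mm->flags directly. */
static int model_do_coredump(struct model_mm *mm,
			     struct model_coredump_params *cprm,
			     void (*between_checks)(struct model_mm *))
{
	cprm->mm_flags = mm->flags;              /* copy once, at the start */

	if (dumpable_from(cprm->mm_flags) == 0)  /* check (1) */
		return 0;

	if (between_checks)
		between_checks(mm);              /* concurrent change to mm->flags */

	if (dumpable_from(cprm->mm_flags) == 2)  /* check (2) */
		return 2;
	return 1;
}
```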

[akpm@linux-foundation.org: fix merge]
Signed-off-by: Masami Hiramatsu
Acked-by: Roland McGrath
Cc: Hidehiro Kawai
Cc: Oleg Nesterov
Cc: Ingo Molnar
Reviewed-by: KOSAKI Motohiro
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Masami Hiramatsu
2010-03-07 03:26:46 +0800
8d9032bbe elf coredump: add extended numbering support

The current ELF dumper implementation can produce broken corefiles if
the number of program headers exceeds 65535. This number is determined by
the number of vmas the process has. In particular, some extreme programs
may use more than 65535 vmas. (If you google max_map_count, you can find
some users facing this problem.) Such programs can never generate correct
coredumps.

This patch implements ``extended numbering'', which uses the sh_info field
of the first section header instead of the e_phnum field in order to
represent up to 4294967295 vmas.

This is supported by
AMD64-ABI (http://www.x86-64.org/documentation.html) and
Solaris (http://docs.sun.com/app/docs/doc/817-1984/).
Of course, we are preparing patches for gdb and binutils.
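
A sketch of the encoding rule, using hand-written structs that carry only the two fields involved rather than the full ELF header and section header types. The sentinel 0xffff is what the ELF gABI calls PN_XNUM; it is defined locally here so the example does not depend on <elf.h>. A reader of the core file applies the inverse rule.

```c
#include <stdint.h>

/* The gABI sentinel meaning "count does not fit" (PN_XNUM in <elf.h>). */
#define XNUM_SENTINEL 0xffffu

/* Only the fields involved; real ELF headers have many more. */
struct mini_ehdr  { uint16_t e_phnum; };   /* in the ELF header */
struct mini_shdr0 { uint32_t sh_info; };   /* in section header 0 */

/* Store a program header count, switching to extended numbering when
 * it does not fit in the 16-bit e_phnum field. */
static void encode_phnum(uint32_t phnum, struct mini_ehdr *eh,
			 struct mini_shdr0 *sh0)
{
	if (phnum >= XNUM_SENTINEL) {
		eh->e_phnum = XNUM_SENTINEL;   /* "see section header 0" */
		sh0->sh_info = phnum;          /* the real 32-bit count */
	} else {
		eh->e_phnum = (uint16_t)phnum;
		sh0->sh_info = 0;
	}
}

/* The rule a reader (debugger, binutils) applies to get the count back. */
static uint32_t decode_phnum(const struct mini_ehdr *eh,
			     const struct mini_shdr0 *sh0)
{
	return eh->e_phnum == XNUM_SENTINEL ? sh0->sh_info : eh->e_phnum;
}
```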

Signed-off-by: Daisuke HATAYAMA
Cc: "Luck, Tony"
Cc: Jeff Dike
Cc: David Howells
Cc: Greg Ungerer
Cc: Roland McGrath
Cc: Oleg Nesterov
Cc: Ingo Molnar
Cc: Alexander Viro
Cc: Andi Kleen
Cc: Alan Cox
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Daisuke HATAYAMA
2010-03-07 03:26:46 +0800
1fcccbac8 elf coredump: replace ELF_CORE_EXTRA_* macros by functions

elf_core_dump() and elf_fdpic_core_dump() use #ifdef and the corresponding
macros to hide _multiline_ logic inside functions. This patch removes the
#ifdef and replaces ELF_CORE_EXTRA_* with corresponding functions. For
architectures not implementing ELF_CORE_EXTRA_*, we use weak functions in
order to limit the scope of the modification.
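
A minimal illustration of the weak-function technique, assuming GCC or Clang. arch_extra_phdrs() is a hypothetical name standing in for the functions that replace the ELF_CORE_EXTRA_* macros. Generic code calls it unconditionally; an architecture that needs extra program headers supplies an ordinary (strong) definition in its own file, which the linker then prefers over the weak default.

```c
/* Weak default used when no architecture overrides it. An override is
 * a normal definition of the same function in another object file,
 * for example:
 *     int arch_extra_phdrs(void) { return 1; }
 */
__attribute__((weak)) int arch_extra_phdrs(void)
{
	return 0;
}

/* Generic dumper code: no #ifdef, the call is always made. */
static int total_phdrs(int vma_count)
{
	return vma_count + arch_extra_phdrs();
}
```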

This cleanup is for my next patches, but I think this cleanup itself is
worth doing regardless of my final purpose.

Signed-off-by: Daisuke HATAYAMA
Cc: "Luck, Tony"
Cc: Jeff Dike
Cc: David Howells
Cc: Greg Ungerer
Cc: Roland McGrath
Cc: Oleg Nesterov
Cc: Ingo Molnar
Cc: Alexander Viro
Cc: Andi Kleen
Cc: Alan Cox
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Daisuke HATAYAMA
2010-03-07 03:26:45 +0800
088e7af73 coredump: move dump_write() and dump_seek() into a header file

My next patch will replace the ELF_CORE_EXTRA_* macros with functions,
putting them into other newly created *.c files. Each of those files would
then contain dump_write(), and the copies in each pair of binfmt_*.c and
elfcore.c would have to be identical. So this patch moves dump_write() and
dump_seek() into a header file. It also deletes the confusing DUMP_WRITE
macros in each file.

Signed-off-by: Daisuke HATAYAMA
Cc: "Luck, Tony"
Cc: Jeff Dike
Cc: David Howells
Cc: Greg Ungerer
Cc: Roland McGrath
Cc: Oleg Nesterov
Cc: Ingo Molnar
Cc: Alexander Viro
Cc: Andi Kleen
Cc: Alan Cox
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Daisuke HATAYAMA
2010-03-07 03:26:45 +0800
6b5eda369 sdio: put active devices into 1-bit mode during suspend

And bring them back to 4-bit mode during resume.

Signed-off-by: Daniel Drake
Signed-off-by: Nicolas Pitre
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Daniel Drake
2010-03-07 03:26:37 +0800
da68c4eb2 sdio: introduce API for special power management features

This patch series provides the core changes needed to allow SDIO cards to
remain powered and active while the host system is suspended, and to let
them wake up the host system when needed. At the moment, this is used to
implement wake-on-LAN with SDIO wireless cards. Patches to add that
support to the libertas driver will be posted separately.

This patch:

Some SDIO cards have the ability to keep on running autonomously when the
host system is suspended, and wake it up when needed. This however
requires that the host controller preserve power to the card, and
configure itself appropriately for wake-up.

There are, however, four layers of abstraction involved: the host
controller driver, the MMC core code, the SDIO card management code, and
the actual SDIO function driver. To keep things simple and manageable,
host drivers must advertise their PM capabilities with a feature bitmask,
and function drivers can then query and set those features from their
suspend method. Each layer in the suspend call chain is then expected to
act upon those bits accordingly.
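
A userspace model of the capability handshake described above. The bit names, struct model_host and the three functions are hypothetical; they only illustrate the flow: the host advertises a bitmask, the function driver requests a subset from its suspend method, and the host acts on the result later in the suspend chain.

```c
#include <errno.h>

/* Hypothetical capability bits for a host PM feature bitmask. */
#define PM_KEEP_POWER    (1u << 0)  /* keep the card powered in suspend */
#define PM_WAKE_SDIO_IRQ (1u << 1)  /* card interrupt may wake the host */

struct model_host {
	unsigned int pm_caps;   /* advertised by the host controller driver */
	unsigned int pm_flags;  /* requested by function drivers this suspend */
};

/* Function-driver side: query what the host supports. */
static unsigned int model_get_host_pm_caps(const struct model_host *h)
{
	return h->pm_caps;
}

/* Function-driver side: request features; refuse anything unadvertised. */
static int model_set_host_pm_flags(struct model_host *h, unsigned int flags)
{
	if (flags & ~h->pm_caps)
		return -EINVAL;
	h->pm_flags |= flags;
	return 0;
}

/* Host side, later in the suspend chain: act on the accumulated bits. */
static int model_host_suspend_powers_off_card(const struct model_host *h)
{
	return !(h->pm_flags & PM_KEEP_POWER);
}
```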

[akpm@linux-foundation.org: fix typo in comment]
Signed-off-by: Nicolas Pitre
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Nicolas Pitre
2010-03-07 03:26:36 +0800
3fb7fb4a0 sdio: add quirk to clamp byte mode transfer

Some SDIO cards expect byte transfers not to exceed the configured block
transfer size. Add a quirk to that effect.
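
A hypothetical sketch of what the clamp means for transfer sizing. It assumes that without the quirk a byte-mode transfer may be as large as the function's maximum block size, and that byte mode is capped at 512 bytes, the largest byte count a single SDIO CMD53 byte-mode transfer can carry. The function names are made up for illustration.

```c
/* Largest byte-mode transfer allowed; with the quirk, the limit is the
 * currently configured block size instead of the maximum one. */
static unsigned int max_byte_transfer(unsigned int max_blksize,
				      unsigned int cur_blksize,
				      int clamp_quirk)
{
	unsigned int limit = clamp_quirk ? cur_blksize : max_blksize;

	return limit < 512 ? limit : 512;
}

/* Number of byte-mode commands needed for len bytes (chunk > 0). */
static unsigned int byte_mode_commands(unsigned int len, unsigned int chunk)
{
	return (len + chunk - 1) / chunk;
}
```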

Patches to make use of this quirk will be sent separately.

Signed-off-by: Bing Zhao
Signed-off-by: Nicolas Pitre
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Bing Zhao
2010-03-07 03:26:36 +0800
9a86e2bad lib: fix first line of kernel-doc for a few functions

The function name must be followed by a space, hyphen, space, and a short
description.
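
For reference, a kernel-doc comment whose first line follows that rule (name, space, hyphen, space, summary). clamp_to_range() is a made-up function used only to carry the comment.

```c
/**
 * clamp_to_range - limit a value to the closed interval [lo, hi]
 * @val: value to limit
 * @lo: lower bound
 * @hi: upper bound; must not be less than @lo
 *
 * Return: @val if it lies within the interval, otherwise the nearer bound.
 */
static int clamp_to_range(int val, int lo, int hi)
{
	if (val < lo)
		return lo;
	if (val > hi)
		return hi;
	return val;
}
```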

Signed-off-by: Ben Hutchings
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Ben Hutchings
2010-03-07 03:26:35 +0800
cfd8d6c0e smp: fix documentation in include/linux/smp.h

Fix the documentation of smp_processor_id() in include/linux/smp.h.

Signed-off-by: Rakib Mullick
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Rakib Mullick
2010-03-07 03:26:32 +0800
72c336885 nodemask.h: remove macro any_online_node

The macro any_online_node() is prone to producing sparse warnings due to
the local symbol 'node'. Since all the in-tree users are really
requesting the first online node (the mask argument is either
NODE_MASK_ALL or node_online_map), just use the first_online_node macro
and remove the any_online_node macro, since no users remain.

Signed-off-by: H Hartley Sweeten
Acked-by: David Rientjes
Reviewed-by: KAMEZAWA Hiroyuki
Cc: Mel Gorman
Cc: Lee Schermerhorn
Acked-by: Benjamin Herrenschmidt
Cc: Paul Mackerras
Cc: Dave Hansen
Cc: Milton Miller
Cc: Nathan Fontenot
Cc: Geoff Levand
Cc: Grant Likely
Cc: J. Bruce Fields
Cc: Neil Brown
Cc: Trond Myklebust
Cc: David S. Miller
Cc: Benny Halevy
Cc: Chuck Lever
Cc: Ricardo Labiaga
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

H Hartley Sweeten
2010-03-07 03:26:31 +0800
221e3ebf6 cpumask: let num_*_cpus() function always return unsigned values

Depending on CONFIG_SMP, the num_*_cpus() functions return either unsigned
or signed values. Let them always return unsigned values to avoid strange
casts.

Fixes at least one warning:

kernel/kprobes.c: In function 'register_kretprobe':
kernel/kprobes.c:1038: warning: comparison of distinct pointer types lacks a cast
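
Why the signedness matters: the kernel's min()/max() macros deliberately compare the addresses of two temporaries so that mixing argument types produces exactly this kind of warning. typed_max below is a userspace imitation of that trick (GNU C, so GCC or Clang), and model_num_possible_cpus() is a hypothetical stand-in that returns unsigned, as the num_*_cpus() functions now always do. The sketch does not reproduce the kprobes code itself.

```c
/* Type-checked max in the style of the kernel macro: if x and y have
 * different types, comparing &_max1 with &_max2 makes the compiler
 * warn "comparison of distinct pointer types lacks a cast". */
#define typed_max(x, y) ({			\
	__typeof__(x) _max1 = (x);		\
	__typeof__(y) _max2 = (y);		\
	(void)(&_max1 == &_max2);		\
	_max1 > _max2 ? _max1 : _max2; })

/* Hypothetical stand-in for num_possible_cpus(), returning unsigned. */
static unsigned int model_num_possible_cpus(void)
{
	return 4;
}

static unsigned int default_maxactive(void)
{
	/* Both operands are unsigned int, so no warning is emitted. With a
	 * signed CPU count here, the address comparison would warn. */
	return typed_max(10U, 2 * model_num_possible_cpus());
}
```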

Signed-off-by: Heiko Carstens
Cc: Heiko Carstens
Cc: Ananth N Mavinakayanahalli
Cc: Masami Hiramatsu
Cc: Ingo Molnar
Cc: Rusty Russell
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Heiko Carstens
2010-03-07 03:26:29 +0800
478352e78 mm: add comment about deprecation of __GFP_NOFAIL

__GFP_NOFAIL was deprecated in dab48dab, so add a comment that no new
users should be added.

Reviewed-by: KAMEZAWA Hiroyuki
Signed-off-by: David Rientjes
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

David Rientjes
2010-03-07 03:26:27 +0800
645747462 vmscan: detect mapped file pages used only once

The VM currently assumes that an inactive, mapped and referenced file page
is in use and promotes it to the active list.

However, every mapped file page starts out like this and thus a problem
arises when workloads create a stream of such pages that are used only for
a short time. By flooding the active list with those pages, the VM
quickly gets into trouble finding eligible reclaim candidates. The result
is long allocation latencies and eviction of the wrong pages.

This patch reuses the PG_referenced page flag (used for unmapped file
pages) to implement a usage detection that scales with the speed of LRU
list cycling (i.e. memory pressure).

If the scanner encounters those pages, the flag is set and the page is
cycled again on the inactive list. Only if it returns with another page
table reference is it activated. Otherwise it is reclaimed as 'not
recently used cache'.
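
A simplified model of the decision for one mapped file page on the inactive list. struct model_page, the verdict names and the referenced_by_pte argument are hypothetical; the real scanner also handles anonymous pages, unmapped pages and other cases that are left out here.

```c
#include <stdbool.h>

enum scan_verdict { KEEP_INACTIVE, ACTIVATE, RECLAIM };

struct model_page {
	bool pg_referenced;   /* stands in for the PG_referenced flag */
};

/* referenced_by_pte: whether a page table reference was seen since the
 * page was last scanned. */
static enum scan_verdict scan_mapped_file_page(struct model_page *p,
					       bool referenced_by_pte)
{
	if (referenced_by_pte) {
		if (p->pg_referenced)
			return ACTIVATE;        /* referenced again: real reuse */
		p->pg_referenced = true;        /* first sighting: one more lap */
		return KEEP_INACTIVE;
	}
	return RECLAIM;                         /* not recently used cache */
}
```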

This effectively changes the minimum lifetime of a used-once mapped file
page from a full memory cycle to an inactive list cycle, which allows it
to occur in linear streams without affecting the stable working set of the
system.

Signed-off-by: Johannes Weiner
Reviewed-by: Rik van Riel
Cc: Minchan Kim
Cc: OSAKI Motohiro
Cc: Lee Schermerhorn
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Johannes Weiner
2010-03-07 03:26:27 +0800
452aa6999 mm/pm: force GFP_NOIO during suspend/hibernation and resume

There are quite a few GFP_KERNEL memory allocations made during
suspend/hibernation and resume that may cause the system to hang, because
the I/O operations they depend on cannot be completed due to the
underlying devices being suspended.

Avoid this problem by clearing the __GFP_IO and __GFP_FS bits in
gfp_allowed_mask before suspend/hibernation and restoring the original
values of these bits in gfp_allowed_mask during the subsequent resume.
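
A userspace model of the mask handling. The M_GFP_* values and function names are hypothetical, not the kernel's. The point is that every request is filtered through one global mask, so clearing the I/O and FS bits turns a GFP_KERNEL-style request into a GFP_NOIO-style one until the mask is restored.

```c
/* Hypothetical bit values; only their independence matters here. */
#define M_GFP_WAIT   0x1u
#define M_GFP_IO     0x2u
#define M_GFP_FS     0x4u
#define M_GFP_KERNEL (M_GFP_WAIT | M_GFP_IO | M_GFP_FS)

static unsigned int allowed_mask = ~0u;  /* everything allowed normally */
static unsigned int saved_mask;

/* Before suspend/hibernation: remember the mask, forbid I/O and FS. */
static void model_restrict_gfp_mask(void)
{
	saved_mask = allowed_mask;
	allowed_mask &= ~(M_GFP_IO | M_GFP_FS);
}

/* During resume: put the original bits back. */
static void model_restore_gfp_mask(void)
{
	allowed_mask = saved_mask;
}

/* The allocator filters every request through the mask. */
static unsigned int effective_gfp(unsigned int gfp)
{
	return gfp & allowed_mask;
}
```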

[akpm@linux-foundation.org: fix CONFIG_PM=n linkage]
Signed-off-by: Rafael J. Wysocki
Reported-by: Maxim Levitsky
Cc: Sebastian Ott
Cc: Benjamin Herrenschmidt
Cc: KOSAKI Motohiro
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Rafael J. Wysocki
2010-03-07 03:26:26 +0800
fc148a5f7 mm: remove VM_LOCK_RMAP code

When a VMA is in an inconsistent state during setup or teardown, the worst
that can happen is that the rmap code will not be able to find the page.

The mapping is in the process of being torn down (PTEs just got
invalidated by munmap), or set up (no PTEs have been instantiated yet).

It is also impossible for the rmap code to follow a pointer to an already
freed VMA, because the rmap code holds the anon_vma->lock, which the VMA
teardown code needs to take before the VMA is removed from the anon_vma
chain.

Hence, we should not need the VM_LOCK_RMAP locking at all.

Signed-off-by: Rik van Riel
Cc: Nick Piggin
Cc: KOSAKI Motohiro
Cc: Larry Woodman
Cc: Lee Schermerhorn
Cc: Andrea Arcangeli
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Rik van Riel
2010-03-07 03:26:26 +0800
c44b67432 rmap: move exclusively owned pages to own anon_vma in do_wp_page()

When the parent process breaks the COW on a page, both the original page,
which is mapped in the child, and the new page, which is mapped in the
parent, end up in the same anon_vma. Generally this won't be a problem,
but for some workloads it could preserve the O(N) rmap scanning
complexity.

A simple fix is to ensure that, when a page mapped in the child gets
reused in do_wp_page() because we are already its exclusive owner, the
page is moved to the child's own exclusive anon_vma.

Signed-off-by: Rik van Riel
Cc: KOSAKI Motohiro
Cc: Larry Woodman
Cc: Lee Schermerhorn
Reviewed-by: Minchan Kim
Cc: Andrea Arcangeli
Cc: Hugh Dickins
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Rik van Riel
2010-03-07 03:26:26 +0800
5beb49305 mm: change anon_vma linking to fix multi-process server scalability issue

The old anon_vma code can lead to scalability issues with heavily forking
workloads. Specifically, each anon_vma will be shared between the parent
process and all its child processes.

In a workload with 1000 child processes and a VMA with 1000 anonymous
pages per process that get COWed, this leads to a system with a million
anonymous pages in the same anon_vma, each of which is mapped in just one
of the 1000 processes. However, the current rmap code needs to walk them
all, leading to O(N) scanning complexity for each page.

This can result in systems where one CPU is walking the page tables of
1000 processes in page_referenced_one, while all other CPUs are stuck on
the anon_vma lock. This leads to catastrophic failure for a benchmark
like AIM7, where the total number of processes can reach the tens of
thousands. Real workloads are still a factor of 10 less process-intensive
than AIM7, but they are catching up.

This patch changes the way anon_vmas and VMAs are linked, which allows us
to associate multiple anon_vmas with a VMA. At fork time, each child
process gets its own anon_vmas, in which its COWed pages will be
instantiated. The parent's anon_vma is also linked to the VMA, because
non-COWed pages could be present in any of the children.

This reduces rmap scanning complexity to O(1) for the pages of the 1000
child processes, with O(N) complexity for at most 1/N pages in the system.
This reduces the average scanning cost in heavily forking workloads from
O(N) to 2.

The only real complexity in this patch stems from the fact that linking a
VMA to anon_vmas now involves memory allocations. This means vma_adjust
can fail, if it needs to attach a VMA to anon_vma structures. This in
turn means error handling needs to be added to the calling functions.

A second source of complexity is that, because there can be multiple
anon_vmas, the anon_vma linking in vma_adjust can no longer be done under
"the" anon_vma lock. To prevent the rmap code from walking up an
incomplete VMA, this patch introduces the VM_LOCK_RMAP VMA flag. This bit
flag uses the same slot as the NOMMU VM_MAPPED_COPY, with an ifdef in mm.h
to make sure it is impossible to compile a kernel that needs both symbolic
values for the same bitflag.

Some test results:

Without the anon_vma changes, when AIM7 hits around 9.7k users (on a test
box with 16GB RAM and not quite enough IO), the system ends up running
>99% in system time, with every CPU on the same anon_vma lock in the
pageout code.

With these changes, AIM7 hits the cross-over point around 29.7k users.
This happens with ~99% IO wait time; there never seems to be any spike in
system time. The anon_vma lock contention appears to be resolved.

[akpm@linux-foundation.org: cleanups]
Signed-off-by: Rik van Riel
Cc: KOSAKI Motohiro
Cc: Larry Woodman
Cc: Lee Schermerhorn
Cc: Minchan Kim
Cc: Andrea Arcangeli
Cc: Hugh Dickins
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Rik van Riel
2010-03-07 03:26:26 +0800
19adf9c5d include/linux/fs.h: convert FMODE_* constants to hex

It was tolerable until Eric went and added 8388608.

Cc: Eric Paris
Cc: Wu Fengguang
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Andrew Morton
2010-03-07 03:26:25 +0800
0141450f6 readahead: introduce FMODE_RANDOM for POSIX_FADV_RANDOM

This fixes inefficient page-by-page reads on POSIX_FADV_RANDOM.

POSIX_FADV_RANDOM used to set ra_pages=0, which leads to poor performance:
a 16K read will be carried out in 4 _sync_ 1-page reads.

In other places, ra_pages==0 means
- it's ramfs/tmpfs/hugetlbfs/sysfs/configfs
- some IO error happened
where multi-page read IO won't help or should be avoided.

POSIX_FADV_RANDOM actually wants different semantics: to disable the
*heuristic* readahead algorithm, and to use a dumb one which faithfully
submits read IO for whatever the application requests.

So introduce a flag FMODE_RANDOM for POSIX_FADV_RANDOM.

Note that the random hint is not likely to help random read performance
noticeably. And it may be too permissive on huge request sizes (its IO
size is not limited by read_ahead_kb).

In Quentin's report (http://lkml.org/lkml/2009/12/24/145), the overall
(NFS read) performance of the application increased by 313%!
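
From userspace, the hint is given with posix_fadvise(). A small wrapper is sketched below; advise_random() is a hypothetical helper, and treating ESPIPE (a pipe or FIFO) as harmless is a design choice of this sketch, not something the patch requires.

```c
#define _XOPEN_SOURCE 600
#include <errno.h>
#include <fcntl.h>

/* Ask for random-access behaviour on an open file. offset 0 and len 0
 * cover the whole file. posix_fadvise() returns an error number
 * directly instead of setting errno. */
static int advise_random(int fd)
{
	int err = posix_fadvise(fd, 0, 0, POSIX_FADV_RANDOM);

	/* Pipes and FIFOs cannot take the hint (ESPIPE); since it is only
	 * advice, this sketch treats that case as success. */
	return err == ESPIPE ? 0 : err;
}
```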

Tested-by: Quentin Barnes
Signed-off-by: Wu Fengguang
Cc: Nick Piggin
Cc: Andi Kleen
Cc: Steven Whitehouse
Cc: David Howells
Cc: Jonathan Corbet
Cc: Al Viro
Cc: Christoph Hellwig
Cc: Trond Myklebust
Cc: Chuck Lever
Cc: [2.6.33.x]
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Wu Fengguang
2010-03-07 03:26:25 +0800
d96ae5309 memory-hotplug: create /sys/firmware/memmap entry for new memory

A memmap is a directory in sysfs which includes 3 text files: start, end
and type. For example:

start: 0x100000
end: 0x7e7b1cff
type: System RAM
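
A sketch of turning one entry into a size in bytes. memmap_entry_size() is a hypothetical helper; it assumes both values are hexadecimal and that 'end' is inclusive, which the example's 0x...ff end value suggests. For the entry above it gives 0x7e6b1d00 bytes.

```c
#include <stdint.h>
#include <stdlib.h>

/* Size of a memmap entry from the text of its "start" and "end" files.
 * strtoull() with base 16 accepts the "0x" prefix and stops at the
 * trailing newline that sysfs files usually carry. */
static uint64_t memmap_entry_size(const char *start_text, const char *end_text)
{
	uint64_t start = strtoull(start_text, NULL, 16);
	uint64_t end = strtoull(end_text, NULL, 16);

	return end - start + 1;   /* inclusive end, by assumption */
}
```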

The interface firmware_map_add() was not called explicitly. Remove it and
add the function firmware_map_add_hotplug() as the hotplug interface of
memmap.

Each memory entry has a memmap in sysfs. When we hot-add new memory,
however, sysfs does not export a memmap entry for it. We add a call to
firmware_map_add_hotplug() in the function add_memory().

Add a new function add_sysfs_fw_map_entry() to create a memmap entry; it
is called when memmap is initialized and when memory is hot-added.

[akpm@linux-foundation.org: un-kerneldoc a no longer kerneldoc comment]
Signed-off-by: Shaohui Zheng
Acked-by: Andi Kleen
Acked-by: Yasunori Goto
Reviewed-by: Wu Fengguang
Cc: Dave Hansen
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

akpm@linux-foundation.org
2010-03-07 03:26:25 +0800
93e4a89a8 mm: restore zone->all_unreclaimable to independence word

commit e815af95 ("change all_unreclaimable zone member to flags") changed
the all_unreclaimable member to a bit flag. But it had an undesirable side
effect: free_one_page() is one of the hottest paths in the Linux kernel,
and increasing the atomic ops in it can reduce kernel performance a bit.

Thus, this patch partially reverts that commit. At least,
all_unreclaimable shouldn't share a memory word with other zone flags.

[akpm@linux-foundation.org: fix patch interaction]
Signed-off-by: KOSAKI Motohiro
Cc: David Rientjes
Cc: Wu Fengguang
Cc: KAMEZAWA Hiroyuki
Cc: Minchan Kim
Cc: Huang Shijie
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

KOSAKI Motohiro
2010-03-07 03:26:25 +0800
fc91668ea mm: remove free_hot_page()

free_hot_page() is just a wrapper around free_hot_cold_page() with the
parameter 'cold = 0'. After adding a clear comment for
free_hot_cold_page(), it is reasonable to remove one level of calls.

[akpm@linux-foundation.org: fix build]
Signed-off-by: Li Hong
Cc: Mel Gorman
Cc: Rik van Riel
Cc: Ingo Molnar
Cc: Larry Woodman
Cc: Peter Zijlstra
Cc: Li Ming Chun
Cc: KOSAKI Motohiro
Cc: Americo Wang
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Li Hong
2010-03-07 03:26:25 +0800
b084d4353 mm: count swap usage

A frequent question from users about memory management is how many swap
entries are used by processes. This information would also give the
oom-killer some hints.

Although we can count the number of swap entries per process by scanning
/proc/<pid>/smaps, this is very slow and not good for ordinary process
information tools that work like 'ps' or 'top'. (ps and top are already
slow enough.)

This patch adds a counter of swap entries to mm_counter and updates it at
each swap event. The information is exported via the /proc/<pid>/status
file as:

[kamezawa@bluextal memory]$ cat /proc/self/status
Name: cat
State: R (running)
Tgid: 2910
Pid: 2910
PPid: 2823
TracerPid: 0
Uid: 500 500 500 500
Gid: 500 500 500 500
FDSize: 256
Groups: 500
VmPeak: 82696 kB
VmSize: 82696 kB
VmLck: 0 kB
VmHWM: 432 kB
VmRSS: 432 kB
VmData: 172 kB
VmStk: 84 kB
VmExe: 48 kB
VmLib: 1568 kB
VmPTE: 40 kB
VmSwap: 0 kB
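
A tool like ps or top can pick the new field out of the status text. status_field_kb() is a hypothetical helper that works on a string, so reading /proc/<pid>/status itself is left to the caller.

```c
#include <stdlib.h>
#include <string.h>

/* Return the value of a "Key: N kB" line in the text of
 * /proc/<pid>/status, or -1 if the key is absent. */
static long status_field_kb(const char *status, const char *key)
{
	size_t klen = strlen(key);
	const char *line = status;

	while (line && *line) {
		/* Match the whole key, then the colon, to avoid prefix hits. */
		if (strncmp(line, key, klen) == 0 && line[klen] == ':')
			return strtol(line + klen + 1, NULL, 10);
		line = strchr(line, '\n');
		if (line)
			line++;
	}
	return -1;
}
```
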
Reviewed-by: Minchan Kim
Reviewed-by: Christoph Lameter
Cc: Lee Schermerhorn
Cc: David Rientjes
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

KAMEZAWA Hiroyuki
2010-03-07 03:26:24 +0800
34e55232e mm: avoid false sharing of mm_counter

Considering the nature of per-mm stats, they are an object shared among
threads and can be a cache-miss point in the page fault path.

This patch adds a per-thread cache for mm_counter. RSS values are
counted into a struct in task_struct and synchronized with the mm's
counters at certain events.

In this patch, the event is the number of calls to handle_mm_fault():
the per-thread value is added to the mm every 64 calls.
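
A single-threaded model of the batching, with hypothetical names; the counters are indexed by an enum, in the spirit of the array-based mm_counter cleanup. A real implementation must fold the deltas into the shared counters atomically; a plain addition is used here because only one thread exists.

```c
/* Fold thread-local deltas into the shared counters every SYNC_EVENTS
 * faults (64, matching the interval described above). */
#define SYNC_EVENTS 64

enum { FILEPAGES, ANONPAGES, NR_COUNTERS };

struct model_mm { long count[NR_COUNTERS]; };   /* shared by threads */

struct model_task {
	struct model_mm *mm;
	long delta[NR_COUNTERS];   /* thread-private, not yet in mm */
	int events;                /* faults since the last sync */
};

static void sync_task_counters(struct model_task *t)
{
	for (int i = 0; i < NR_COUNTERS; i++) {
		t->mm->count[i] += t->delta[i];   /* needs an atomic add for real */
		t->delta[i] = 0;
	}
	t->events = 0;
}

/* Called once per page fault; member selects the counter that changed. */
static void account_fault(struct model_task *t, int member, long delta)
{
	t->delta[member] += delta;
	if (++t->events >= SYNC_EVENTS)
		sync_task_counters(t);
}
```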

A rough estimate from a small benchmark with parallel threads (2 threads)
shows:
[before]
4.5 cache-miss/faults
[after]
4.0 cache-miss/faults
Anyway, the most contended object is mmap_sem as the number of threads
grows.

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: KAMEZAWA Hiroyuki
Cc: Minchan Kim
Cc: Christoph Lameter
Cc: Lee Schermerhorn
Cc: David Rientjes
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

KAMEZAWA Hiroyuki
2010-03-07 03:26:24 +0800
d559db086 mm: clean up mm_counter

Presently, the per-mm statistics counters are defined by macros in
sched.h.

This patch modifies them to be
- defined in mm.h as inline functions
- based on an array instead of macro-generated names.

This patch is meant to reduce the size of a future patch that modifies
the implementation of the per-mm counter.

Signed-off-by: KAMEZAWA Hiroyuki
Reviewed-by: Minchan Kim
Cc: Christoph Lameter
Cc: Lee Schermerhorn
Cc: David Rientjes
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

KAMEZAWA Hiroyuki
2010-03-07 03:26:23 +0800
984b3f574 bitops: rename for_each_bit() to for_each_set_bit()

Rename for_each_bit to for_each_set_bit in the kernel source tree, to
permit a for_each_clear_bit() should that ever be added.

The patch includes a macro to map the old for_each_bit() onto the new
for_each_set_bit(). This is a (very) temporary thing to ease the migration.
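
A single-word userspace analogue of the renamed iterator, with hypothetical names (next_set_bit(), for_each_set_bit_in_word()). A for_each_clear_bit() counterpart could be built the same way on ~word.

```c
/* Index of the first set bit of word at or after start, or size if
 * there is none. size must not exceed the width of unsigned long. */
static unsigned int next_set_bit(unsigned long word, unsigned int size,
				 unsigned int start)
{
	for (unsigned int i = start; i < size; i++)
		if (word & (1UL << i))
			return i;
	return size;
}

/* Visit every set bit of word below size, lowest first. */
#define for_each_set_bit_in_word(bit, word, size)			\
	for ((bit) = next_set_bit((word), (size), 0);			\
	     (bit) < (size);						\
	     (bit) = next_set_bit((word), (size), (bit) + 1))
```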

[akpm@linux-foundation.org: add temporary for_each_bit()]
Suggested-by: Alexey Dobriyan
Suggested-by: Andrew Morton
Signed-off-by: Akinobu Mita
Cc: "David S. Miller"
Cc: Russell King
Cc: David Woodhouse
Cc: Artem Bityutskiy
Cc: Stephen Rothwell
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Akinobu Mita
2010-03-07 03:26:23 +0800

06 Mar, 2010

5 commits

924e600d4 dm: eliminate some holes data structures

Eliminate a 4-byte hole in 'struct dm_io_memory' by moving 'offset' above
the 'ptr' to which it applies (size reduced from 24 to 16 bytes). As a
consequence, one 4-byte hole is also eliminated in 'struct dm_io_request'
(size reduced from 56 to 48 bytes).

Eliminate all six 4-byte holes and one cache line in 'struct dm_snapshot'
(size reduced from 392 to 368 bytes).
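
The first hole described above can be reproduced with two hypothetical layouts. On an LP64 target (4-byte int, 8-byte pointer), placing a 4-byte field after a pointer that itself follows a 4-byte field forces a 4-byte hole plus 4 bytes of tail padding; moving the field up removes both, matching the 24 to 16 byte change reported above.

```c
#include <stddef.h>

struct io_memory_before {
	int type;                               /* 4 bytes, then a 4-byte hole */
	union { void *vma; void *addr; } ptr;   /* 8 bytes */
	unsigned offset;                        /* 4 bytes, then 4 of padding */
};

struct io_memory_after {
	int type;                               /* 4 bytes */
	unsigned offset;                        /* fills the former hole */
	union { void *vma; void *addr; } ptr;   /* 8 bytes */
};
```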

Signed-off-by: Mike Snitzer
Signed-off-by: Alasdair G Kergon

Mike Snitzer
2010-03-06 10:32:33 +0800
3abf85b5b dm ioctl: introduce flag indicating uevent was generated

Set a new DM_UEVENT_GENERATED_FLAG when returning from ioctls to
indicate that a uevent was actually generated. This tells the userspace
caller that it may need to wait for the event to be processed.

Signed-off-by: Peter Rajnoha
Signed-off-by: Alasdair G Kergon

Peter Rajnoha
2010-03-06 10:32:31 +0800
8215d6ec5 dm table: remove unused dm_get_device range parameters

Remove the unused parameters (start and len) of dm_get_device()
and fix the callers.

Signed-off-by: Nikanth Karthikesan
Signed-off-by: Alasdair G Kergon

Nikanth Karthikesan
2010-03-06 10:32:27 +0800
64096c174 Merge branch 'slab-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/penberg/slab-2.6

* 'slab-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/penberg/slab-2.6:
SLUB: Fix per-cpu merge conflict
failslab: add ability to filter slab caches
slab: fix regression in touched logic
dma kmalloc handling fixes
slub: remove impossible condition
slab: initialize unused alien cache entry as NULL at alloc_alien_cache().
SLUB: Make slub statistics use this_cpu_inc
SLUB: this_cpu: Remove slub kmem_cache fields
SLUB: Get rid of dynamic DMA kmalloc cache allocation
SLUB: Use this_cpu operations in slub

Linus Torvalds
2010-03-06 06:35:40 +0800
cc7889ff5 Merge branch 'nfs-for-2.6.34' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6

* 'nfs-for-2.6.34' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6: (44 commits)
NFS: Remove requirement for inode->i_mutex from nfs_invalidate_mapping
NFS: Clean up nfs_sync_mapping
NFS: Simplify nfs_wb_page()
NFS: Replace __nfs_write_mapping with sync_inode()
NFS: Simplify nfs_wb_page_cancel()
NFS: Ensure inode is always marked I_DIRTY_DATASYNC, if it has unstable pages
NFS: Run COMMIT as an asynchronous RPC call when wbc->for_background is set
NFS: Reduce the number of unnecessary COMMIT calls
NFS: Add a count of the number of unstable writes carried by an inode
NFS: Cleanup - move nfs_write_inode() into fs/nfs/write.c
nfs41 fix NFS4ERR_CLID_INUSE for exchange id
NFS: Fix an allocation-under-spinlock bug
SUNRPC: Handle EINVAL error returns from the TCP connect operation
NFSv4.1: Various fixes to the sequence flag error handling
nfs4: renewd renew operations should take/put a client reference
nfs41: renewd sequence operations should take/put client reference
nfs: prevent backlogging of renewd requests
nfs: kill renewd before clearing client minor version
NFS: Make close(2) asynchronous when closing NFS O_DIRECT files
NFS: Improve NFS iostat byte count accuracy for writes
...

Linus Torvalds
2010-03-06 05:25:45 +0800