17 Jun, 2009

25 commits

  • Fix allocation of page cache/slab objects on disallowed nodes when memory
    spread is set, by updating tasks' mems_allowed as soon as their cpuset's
    mems is changed.

    In order to update tasks' mems_allowed in time, we must modify the memory
    policy code, because the memory policy was originally applied only in the
    task's own context. With this patch, one task directly manipulates another
    task's mems_allowed, and we use alloc_lock in the task_struct to protect
    the task's mems_allowed and memory policy.

    In the fast path, however, we do not take the lock, because adding a lock
    there may cause a performance regression. Without the lock, the task might
    see an empty nodemask while its cpuset's mems_allowed is being changed to a
    non-overlapping set. To avoid this, we set all newly allowed nodes first and
    only then clear the newly disallowed ones.
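
    As an illustration only (a standalone userspace sketch with a plain bitmask
    standing in for nodemask_t, not the kernel code), the two-step update looks
    like this:

    #include <stdio.h>

    /* Toy nodemask: one bit per node. */
    typedef unsigned long nodemask;

    /*
     * Step 1 ORs in the new nodes, step 2 clears the old ones.  At no point
     * is the mask empty, so a racing reader sees the old mask, the union,
     * or the new mask, but never "no allowed nodes".
     */
    static void update_nodemask(nodemask *mask, nodemask newmask)
    {
            *mask |= newmask;       /* set all newly allowed nodes    */
            *mask &= newmask;       /* clear newly disallowed nodes   */
    }

    int main(void)
    {
            nodemask mems_allowed = 0x3;            /* nodes 0-1         */

            update_nodemask(&mems_allowed, 0xc);    /* move to nodes 2-3 */
            printf("mems_allowed = 0x%lx\n", mems_allowed);
            return 0;
    }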

    [lee.schermerhorn@hp.com:
    The rework of mpol_new() to extract the adjusting of the node mask to
    apply cpuset and mpol flags "context" breaks set_mempolicy() and mbind()
    with MPOL_PREFERRED and a NULL nodemask--i.e., explicit local
    allocation. Fix this by adding the check for MPOL_PREFERRED and empty
    node mask to mpol_new_mempolicy().

    Remove the now unneeded 'nodes = NULL' from mpol_new().

    Note that mpol_new_mempolicy() is always called with a non-NULL
    'nodes' parameter now that it has been removed from mpol_new().
    Therefore, we don't need to test nodes for NULL before testing it for
    'empty'. However, just to be extra paranoid, add a VM_BUG_ON() to
    verify this assumption.]
    [lee.schermerhorn@hp.com:

    I don't think the function name 'mpol_new_mempolicy' is descriptive
    enough to differentiate it from mpol_new().

    This function applies the cpuset context, usually constraining the nodes
    to those allowed by the cpuset. However, when the 'RELATIVE_NODES' flag
    is set, it also translates the nodes. So I settled on
    'mpol_set_nodemask()', because the comment block for mpol_new() mentions
    that we need to call this function to "set nodes".

    Some additional minor line length, whitespace and typo cleanup.]
    Signed-off-by: Miao Xie
    Cc: Ingo Molnar
    Cc: Peter Zijlstra
    Cc: Christoph Lameter
    Cc: Paul Menage
    Cc: Nick Piggin
    Cc: Yasunori Goto
    Cc: Pekka Enberg
    Cc: David Rientjes
    Signed-off-by: Lee Schermerhorn
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Miao Xie
     
  • Fix the bug that the kernel did not spread page cache/slab objects evenly
    over all the allowed nodes when the spread flags were set, by updating
    tasks' page/slab spread flags in time.

    Signed-off-by: Miao Xie
    Cc: Ingo Molnar
    Cc: Peter Zijlstra
    Cc: Christoph Lameter
    Cc: Paul Menage
    Cc: Nick Piggin
    Cc: Yasunori Goto
    Cc: Pekka Enberg
    Cc: David Rientjes
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Miao Xie
     
  • The kernel still allocates page cache on the old nodes after a task's
    cpuset mems are modified while 'memory_spread_page' is set, and it does
    not spread the page cache evenly over all the nodes the faulting task is
    allowed to use after memory_spread_page is set. This is caused by the
    task's stale mems_allowed and flags: the current kernel does not update
    them unless some function invokes cpuset_update_task_memory_state(),
    which is sometimes too late. We must update the tasks' mems_allowed and
    flags in time.

    Slab has the same problem.

    The following patches fix this bug by updating tasks' mems_allowed and
    spread flags as soon as their cpuset's mems or spread flags are changed.

    This patch:

    Extract a function from cpuset_update_task_memory_state(). It will be
    used later to update tasks' page/slab spread flags after their cpuset's
    flags are changed.

    Signed-off-by: Miao Xie
    Cc: Ingo Molnar
    Cc: Peter Zijlstra
    Cc: Christoph Lameter
    Cc: Paul Menage
    Cc: Nick Piggin
    Cc: Yasunori Goto
    Cc: Pekka Enberg
    Cc: David Rientjes
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Miao Xie
     
  • get_dirty_limits() calls clip_bdi_dirty_limit() and task_dirty_limit()
    with variable pbdi_dirty as one of the arguments. This variable is an
    unsigned long * but both functions expect it to be a long *. This causes
    the following sparse warnings:

    warning: incorrect type in argument 3 (different signedness)
    expected long *pbdi_dirty
    got unsigned long *pbdi_dirty
    warning: incorrect type in argument 2 (different signedness)
    expected long *pdirty
    got unsigned long *pbdi_dirty

    Fix the warnings by changing the long * to unsigned long * in both
    functions.
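
    For illustration (a minimal standalone example of this class of warning,
    with made-up names; not the page-writeback code itself):

    #include <stdio.h>

    /*
     * The callers hold "unsigned long" dirty counts, so the helper must take
     * "unsigned long *"; declaring the parameter as "long *" is what makes
     * sparse complain about different signedness.
     */
    static void clip_dirty(unsigned long *pdirty, unsigned long limit)
    {
            if (*pdirty > limit)
                    *pdirty = limit;
    }

    int main(void)
    {
            unsigned long bdi_dirty = 150;

            clip_dirty(&bdi_dirty, 100);
            printf("%lu\n", bdi_dirty);
            return 0;
    }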

    Signed-off-by: H Hartley Sweeten
    Cc: Johannes Weiner
    Cc: Peter Zijlstra
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    H Hartley Sweeten
     
  • Commit 33c120ed2843090e2bd316de1588b8bf8b96cbde ("more aggressively use
    lumpy reclaim") increased how aggressive lumpy reclaim was by isolating
    both active and inactive pages for asynchronous lumpy reclaim on
    costly-high-order pages and for cheap-high-order pages when memory pressure
    is high. However, if the system is under heavy pressure and there are dirty
    pages, asynchronous IO may not be sufficient to reclaim a suitable page in
    time.

    This patch causes the caller to enter synchronous lumpy reclaim for
    costly-high-order pages and for cheap-high-order pages when under memory
    pressure.

    Minchan.kim@gmail.com said:

    Andy added synchronous lumpy reclaim with
    c661b078fd62abe06fd11fab4ac5e4eeafe26b6d. At that time, lumpy reclaim was
    not aggressive; his intention was to cover only high-order users (above
    PAGE_ALLOC_COSTLY_ORDER).

    After some time, Rik added aggressive lumpy reclaim with
    33c120ed2843090e2bd316de1588b8bf8b96cbde. His intention was to do lumpy
    reclaim whenever high-order allocations have trouble getting a small set
    of contiguous pages.

    So we also have to add synchronous pageout for small sets of contiguous
    pages.

    Cc: Lee Schermerhorn
    Cc: Andy Whitcroft
    Acked-by: Peter Zijlstra
    Cc: Rik van Riel
    Signed-off-by: KOSAKI Motohiro
    Reviewed-by: Minchan Kim
    Reviewed-by: Mel Gorman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KOSAKI Motohiro
     
  • Move more documentation for get_user_pages_fast into the new kerneldoc comment.
    Add some comments for get_user_pages as well.

    Also, move get_user_pages_fast declaration up to get_user_pages. It wasn't
    there initially because it was once a static inline function.

    [akpm@linux-foundation.org: coding-style fixes]
    Signed-off-by: Nick Piggin
    Cc: Andy Grover
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nick Piggin
     
  • Now that we do readahead for sequential mmap reads, here is a simple
    evaluation of the impacts, and one further optimization.

    It's an NFS-root debian desktop system, readahead size = 60 pages.
    The numbers are grabbed after a fresh boot into console.

    approach   pgmajfault   RA miss ratio   mmap IO count   avg IO size (pages)
    A          383          31.6%           383             11
    B          225          32.4%           390             11
    C          224          32.6%           307             13

    case A: mmap sync/async readahead disabled
    case B: mmap sync/async readahead enabled, with enforced full async readahead size
    case C: mmap sync/async readahead enabled, with enforced full sync/async readahead size
    or:
    A = vanilla 2.6.30-rc1
    B = A plus mmap readahead
    C = B plus this patch

    The numbers show that
    - there are good possibilities for random mmap reads to trigger readahead
    - 'pgmajfault' is reduced by 1/3, due to the _async_ nature of readahead
    - case C can further reduce IO count by 1/4
    - readahead miss ratios are not quite affected

    The theory is:
    - readahead is _good_ for clustered random reads, and can perform
      _better_ than readaround because it can be _async_;
    - the async readahead size is guaranteed to be larger than the readaround
      size, and being _async_ it will mostly behave better.
    However, for B:
    - the sync readahead size could be smaller than the readaround size, and
      hence may make things worse by producing more, smaller IOs,
    which is fixed by this patch.

    Final conclusion:
    - mmap readahead reduced major faults by 1/3 with no obvious overheads;
    - mmap io can be further reduced by 1/4 with this patch.

    Signed-off-by: Wu Fengguang
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Wu Fengguang
     
  • Signed-off-by: Wu Fengguang
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Wu Fengguang
     
  • Introduce page cache context based readahead algorithm.
    This is to better support concurrent read streams in general.

    RATIONALE
    ---------
    The current readahead algorithm detects interleaved reads in a _passive_ way.
    Given a sequence of interleaved streams 1,1001,2,1002,3,4,1003,5,1004,1005,6,...
    By checking for (offset == prev_offset + 1), it will discover the sequentialness
    between 3,4 and between 1004,1005, and start doing sequential readahead for the
    individual streams since page 4 and page 1005.

    The context readahead algorithm guarantees to discover the sequentialness no
    matter how the streams are interleaved. For the above example, it will start
    sequential readahead since page 2 and 1002.

    The trick is to poke for page @offset-1 in the page cache when there are no
    other clues about the sequentialness of request @offset: if the current
    request belongs to a sequential stream, that stream must have accessed page
    @offset-1 recently, and the page will still be cached now. So if page
    @offset-1 is there, we can take request @offset as a sequential access.
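
    A toy model of that probe (standalone C, with an array standing in for the
    page cache; the real code does a radix tree lookup, so this is purely
    illustrative):

    #include <stdbool.h>
    #include <stdio.h>

    #define CACHE_SLOTS 2048

    /* Toy "page cache": remembers which offsets were recently read. */
    static bool cached[CACHE_SLOTS];

    /*
     * Context readahead probe: with no other clue about @offset, treat the
     * read as sequential if page @offset-1 is still present in the cache.
     */
    static bool looks_sequential(unsigned long offset)
    {
            return offset > 0 && offset - 1 < CACHE_SLOTS && cached[offset - 1];
    }

    static void read_page(unsigned long offset)
    {
            printf("read %4lu -> %s\n", offset,
                   looks_sequential(offset) ? "sequential" : "random");
            if (offset < CACHE_SLOTS)
                    cached[offset] = true;
    }

    int main(void)
    {
            /* Two interleaved sequential streams, as in the example above. */
            unsigned long reads[] = { 1, 1001, 2, 1002, 3, 1003 };

            for (unsigned int i = 0; i < sizeof(reads) / sizeof(reads[0]); i++)
                    read_page(reads[i]);
            return 0;
    }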

    BENEFICIARIES
    -------------
    - strictly interleaved reads i.e. 1,1001,2,1002,3,1003,...
    the current readahead will take them as silly random reads;
    the context readahead will take them as two sequential streams.

    - cooperative IO processes i.e. NFS and SCST
    They create a thread pool, farming off (sequential) IO requests to different
    threads which will be performing interleaved IO.

    It was not easy (or even possible) to reliably tell from file->f_ra that all
    those cooperative processes are working on the same sequential stream, since
    they will have different file->f_ra instances. And NFSD's file->f_ra is
    particularly unusable, since its file objects are dynamically created for
    each request. nfsd does have code trying to restore the f_ra bits, but it is
    not satisfactory.

    The new scheme is to detect the sequential pattern via looking up the page
    cache, which provides one single and consistent view of the pages recently
    accessed. That makes sequential detection for cooperative processes possible.

    USER REPORT
    -----------
    Vladislav recommends the addition of context readahead as a result of his SCST
    benchmarks. It leads to 6%~40% performance gains in various cases and achieves
    equal performance in others. http://lkml.org/lkml/2009/3/19/239

    OVERHEADS
    ---------
    In theory, it introduces one extra page cache lookup per random read. However,
    the benchmark below shows context readahead to be slightly faster, which is a
    bit puzzling.

    Randomly reading 200MB amount of data on a sparse file, repeat 20 times for
    each block size. The average throughputs are:

                          original ra     context ra      gain
    4K random reads:      65.561MB/s      65.648MB/s      +0.1%
    16K random reads:     124.767MB/s     124.951MB/s     +0.1%
    64K random reads:     162.123MB/s     162.278MB/s     +0.1%

    Cc: Jens Axboe
    Cc: Jeff Moyer
    Tested-by: Vladislav Bolkhovitin
    Signed-off-by: Wu Fengguang
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Wu Fengguang
     
  • Split all readahead cases, and move the random one to bottom.

    No behavior changes.

    This is to prepare for the introduction of context readahead, and make it
    easy for inserting accounting/tracing points for each case.

    Signed-off-by: Wu Fengguang
    Cc: Vladislav Bolkhovitin
    Cc: Jens Axboe
    Cc: Jeff Moyer
    Cc: Nick Piggin
    Cc: Ying Han
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Wu Fengguang
     
  • The counterpart of radix_tree_next_hole(). To be used by context readahead.

    Signed-off-by: Wu Fengguang
    Cc: Vladislav Bolkhovitin
    Cc: Jens Axboe
    Cc: Jeff Moyer
    Cc: Nick Piggin
    Cc: Ying Han
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Wu Fengguang
     
  • Mmap read-around now shares the same code style and data structure with
    readahead code.

    This also removes do_page_cache_readahead(). Its last user, mmap
    read-around, has been changed to call ra_submit().

    The no-readahead-if-congested logic is dropped along the way. Users will be
    pretty sensitive to slow loading of executables, so it's unfavorable to
    disable mmap read-around on a congested queue.

    [akpm@linux-foundation.org: coding-style fixes]
    Cc: Nick Piggin
    Signed-off-by: Fengguang Wu
    Cc: Ying Han
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Wu Fengguang
     
  • We need this in one particular case and two more general ones.

    Now we do async readahead for sequential mmap reads, and do it with the
    help of PG_readahead. For normal reads, PG_readahead is the sufficient
    condition to do a sequential readahead. But unfortunately, for mmap
    reads, there is a tiny nuisance:

    [11736.998347] readahead-init0(process: sh/23926, file: sda1/w3m, offset=0:4503599627370495, ra=0+4-3) = 4
    [11737.014985] readahead-around(process: w3m/23926, file: sda1/w3m, offset=0:0, ra=290+32-0) = 17
    [11737.019488] readahead-around(process: w3m/23926, file: sda1/w3m, offset=0:0, ra=118+32-0) = 32
    [11737.024921] readahead-interleaved(process: w3m/23926, file: sda1/w3m, offset=0:2, ra=4+6-6) = 6
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ~~~~~~~~~~~~~

    An unfavorably small readahead. The original dumb read-around size could
    be more efficient.

    That happened because ld-linux.so does a read(832) in L1 before mmap(),
    which triggers a 4-page readahead, with the second page tagged
    PG_readahead.

    L0: open("/lib/libc.so.6", O_RDONLY) = 3
    L1: read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\340\342"..., 832) = 832
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    L2: fstat(3, {st_mode=S_IFREG|0755, st_size=1420624, ...}) = 0
    L3: mmap(NULL, 3527256, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7fac6e51d000
    L4: mprotect(0x7fac6e671000, 2097152, PROT_NONE) = 0
    L5: mmap(0x7fac6e871000, 20480, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x154000) = 0x7fac6e871000
    L6: mmap(0x7fac6e876000, 16984, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7fac6e876000
    L7: close(3) = 0

    In general, the PG_readahead flag will also be hit in these cases:

    - sequential reads

    - clustered random reads

    A full readahead size is desirable in both cases.

    Cc: Nick Piggin
    Signed-off-by: Wu Fengguang
    Cc: Ying Han
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Wu Fengguang
     
  • Auto-detect sequential mmap reads and do readahead for them.

    The sequential mmap readahead will be triggered when
    - sync readahead: it's a major fault and (prev_offset == offset-1);
    - async readahead: minor fault on PG_readahead page with valid readahead state.

    The benefits of doing readahead instead of read-around:
    - less I/O wait thanks to async readahead
    - double real I/O size and no more cache hits

    The single stream case is improved a little.
    For 100,000 sequential mmap reads:

                                           user    system  cpu     total
    (1-1) plain -mm, 128KB readaround:     3.224   2.554   48.40%  11.838
    (1-2) plain -mm, 256KB readaround:     3.170   2.392   46.20%  11.976
    (2)   patched -mm, 128KB readahead:    3.117   2.448   47.33%  11.607

    The patched kernel (2) has the smallest total time, since it has no cache hit
    overheads and less I/O block time (thanks to async readahead). Here the I/O
    size makes little difference, since there's only a single stream.

    Note that (1-1)'s real I/O size is 64KB and (1-2)'s real I/O size is 128KB,
    since half of the read-around pages will be readahead cache hits.

    This is going to make _real_ differences for _concurrent_ IO streams.

    Cc: Nick Piggin
    Signed-off-by: Wu Fengguang
    Cc: Ying Han
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Wu Fengguang
     
  • This shouldn't really change behavior all that much, but the single rather
    complex function with read-ahead inside a loop etc is broken up into more
    manageable pieces.

    The behaviour is also less subtle, with the read-ahead being done up-front
    rather than inside some subtle loop and thus avoiding the now unnecessary
    extra state variables (ie "did_readaround" is gone).

    Fengguang: the code split in fact fixed a bug reported by Pavel Levshin:
    the PGMAJFAULT accounting used to be bypassed when MADV_RANDOM is set, in
    which case the original code would jump directly to no_cached_page reading.

    Cc: Pavel Levshin
    Cc:
    Cc: Nick Piggin
    Signed-off-by: Wu Fengguang
    Signed-off-by: Linus Torvalds
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     
  • The readahead call scheme is error-prone in that it expects the call sites
    to check for async readahead after doing a sync one. I.e.

    if (!page)
            page_cache_sync_readahead();
    page = find_get_page();
    if (page && PageReadahead(page))
            page_cache_async_readahead();

    This is because PG_readahead could be set by a sync readahead for the
    _current_ newly faulted-in page, and the readahead code simply expects one
    more callback on the same page to start the async readahead. If the
    caller fails to do so, it will miss the PG_readahead bits and will never
    be able to start an async readahead.

    Eliminate this insane constraint by piggy-backing the async part into the
    current readahead window.

    Now if an async readahead should be started immediately after a sync one,
    the readahead logic itself will do it. So the following code becomes
    valid: (the 'else' in particular)

    if (!page)
            page_cache_sync_readahead();
    else if (PageReadahead(page))
            page_cache_async_readahead();

    Cc: Nick Piggin
    Signed-off-by: Wu Fengguang
    Cc: Ying Han
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Wu Fengguang
     
  • Make sure the interleaved readahead size is larger than the request size.
    This also makes the readahead window grow more quickly.

    Reported-by: Xu Chenfeng
    Signed-off-by: Wu Fengguang
    Cc: Ying Han
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Wu Fengguang
     
  • (hit_readahead_marker != 0) means the page at @offset is present, so we
    can search for a non-present page starting from @offset+1.

    Reported-by: Xu Chenfeng
    Signed-off-by: Wu Fengguang
    Cc: Ying Han
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Wu Fengguang
     
  • Just in case someone aggressively sets a huge readahead size.

    Cc: Nick Piggin
    Signed-off-by: Wu Fengguang
    Cc: Ying Han
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Wu Fengguang
     
  • Impact: code simplification.

    Cc: Nick Piggin
    Signed-off-by: Wu Fengguang
    Cc: Ying Han
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Wu Fengguang
     
  • This makes the performance impact of a possible mmap_miss wraparound
    temporary and tolerable: i.e. MMAP_LOTSAMISS=100 extra readarounds.

    Otherwise, if mmap_miss ever wraps around to negative, it takes INT_MAX
    cache misses to bring it back to a normal state. During that time mmap
    readaround will be _enabled_ for whatever wild random workload is running.
    That's an almost permanent performance impact.
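
    The idea, sketched in plain C (hypothetical helper and limit names; not
    necessarily the exact kernel change):

    #include <stdio.h>

    #define MMAP_LOTSAMISS  100
    #define MMAP_MISS_MAX   (MMAP_LOTSAMISS * 10)

    /*
     * Saturate the miss counter instead of letting it grow without bound:
     * once the workload turns sequential again, only a bounded number of
     * extra readarounds is needed to get back below MMAP_LOTSAMISS.
     */
    static void account_mmap_miss(unsigned int *mmap_miss, int hit)
    {
            if (hit) {
                    if (*mmap_miss > 0)
                            (*mmap_miss)--;
            } else if (*mmap_miss < MMAP_MISS_MAX) {
                    (*mmap_miss)++;
            }
    }

    int main(void)
    {
            unsigned int mmap_miss = 0;

            for (int i = 0; i < 1000000; i++)
                    account_mmap_miss(&mmap_miss, 0);   /* long miss streak */
            printf("mmap_miss saturates at %u\n", mmap_miss);
            return 0;
    }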

    Signed-off-by: Wu Fengguang
    Cc: Ying Han
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Wu Fengguang
     
  • * create mm/init-mm.c, move init_mm there
    * remove INIT_MM, initialize init_mm with C99 initializer
    * unexport init_mm on all arches:

    init_mm is already unexported on x86.

    One strange place is some OMAP driver (drivers/video/omap/) which
    won't build as a module, but already wants the get_vm_area() export.
    Somebody should look there.
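
    In miniature, the change of style is (a generic example, not the actual
    init_mm definition):

    #include <stdio.h>

    struct mm_like {
            unsigned long start;
            unsigned long end;
            int users;
    };

    /*
     * Instead of a macro such as INIT_MM(name) expanding to a positional
     * brace list, the object is defined once with a C99 designated
     * initializer: field order no longer matters and unnamed fields
     * default to zero.
     */
    static struct mm_like init_mm_example = {
            .users = 1,
    };

    int main(void)
    {
            printf("users = %d\n", init_mm_example.users);
            return 0;
    }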

    [akpm@linux-foundation.org: add missing #includes]
    Signed-off-by: Alexey Dobriyan
    Cc: Mike Frysinger
    Cc: Americo Wang
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
     
  • Addresses http://bugzilla.kernel.org/show_bug.cgi?id=13484

    Peer reported:
    | The bug is introduced from kernel 2.6.27, if E820 table reserve the memory
    | above 4G in 32bit OS(BIOS-e820: 00000000fff80000 - 0000000120000000
    | (reserved)), system will report Int 6 error and hang up. The bug is caused by
    | the following code in drivers/firmware/memmap.c, the resource_size_t is 32bit
    | variable in 32bit OS, the BUG_ON() will be invoked to result in the Int 6
    | error. I try the latest 32bit Ubuntu and Fedora distributions, all hit this
    | bug.
    |======
    |static int firmware_map_add_entry(resource_size_t start, resource_size_t end,
    |                                  const char *type,
    |                                  struct firmware_map_entry *entry)

    This only happens when CONFIG_PHYS_ADDR_T_64BIT is not set.

    It turns out we need to pass u64 instead of resource_size_t there.
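
    The truncation is easy to see in a standalone example (here a 32-bit
    typedef stands in for resource_size_t without CONFIG_PHYS_ADDR_T_64BIT):

    #include <inttypes.h>
    #include <stdio.h>

    /* resource_size_t on a 32-bit kernel without 64-bit phys addresses */
    typedef uint32_t resource_size_32;

    int main(void)
    {
            uint64_t end = 0x120000000ULL;              /* E820 end above 4G */
            resource_size_32 truncated = (resource_size_32)end;

            /* The high bits are lost, so a sanity check such as
             * BUG_ON(start > end) can trigger on perfectly valid input. */
            printf("full: 0x%" PRIx64 ", truncated: 0x%" PRIx32 "\n",
                   end, truncated);
            return 0;
    }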

    [akpm@linux-foundation.org: add comment]
    Reported-and-tested-by: Peer Chen
    Signed-off-by: Yinghai Lu
    Cc: Ingo Molnar
    Acked-by: H. Peter Anvin
    Cc: Thomas Gleixner
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Yinghai Lu
     
  • Do not take the size of a pointer to determine the size of the pointed-to
    type.
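
    The bug class in a nutshell (illustrative only):

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    struct item {
            int id;
            char name[64];
    };

    int main(void)
    {
            struct item *ok  = malloc(sizeof(*ok));    /* size of the struct        */
            struct item *bad = malloc(sizeof(bad));    /* size of a pointer: 4 or 8 */

            if (!ok || !bad)
                    return 1;
            memset(ok, 0, sizeof(*ok));                /* safe */
            /* memset(bad, 0, sizeof(*bad)) would overrun the tiny allocation */
            printf("sizeof(*ok) = %zu, sizeof(bad) = %zu\n",
                   sizeof(*ok), sizeof(bad));
            free(ok);
            free(bad);
            return 0;
    }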

    Signed-off-by: Roel Kluin
    Acked-by: Anton Vorontsov
    Cc: David Brownell
    Cc: Benjamin Herrenschmidt
    Cc: Kumar Gala
    Acked-by: Grant Likely
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Roel Kluin
     
  • PIT_TICK_RATE is currently defined in four architectures, but in three
    different places. While linux/timex.h is not the perfect place for it, it
    is still a reasonable replacement for those drivers that traditionally use
    asm/timex.h to get CLOCK_TICK_RATE and expect it to be the PIT frequency.

    Note that for Alpha, the actual value changed from 1193182UL to 1193180UL.
    This is unlikely to make a difference, and probably can only improve
    accuracy. There was a discussion on the correct value of CLOCK_TICK_RATE
    a few years ago, after which every existing instance was getting changed
    to 1193182. According to the specification, it should be
    1193181.818181...
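
    (That value is the PC's 14.31818 MHz base oscillator, i.e. four times the
    315/88 MHz NTSC colour subcarrier, divided by 12: 4 * (315/88) / 12 MHz =
    315/264 MHz = 1193181.8181... Hz, which rounds to 1193182.)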

    Signed-off-by: Arnd Bergmann
    Cc: Richard Henderson
    Cc: Ivan Kokshaysky
    Cc: Ralf Baechle
    Cc: Benjamin Herrenschmidt
    Cc: Ingo Molnar
    Cc: Thomas Gleixner
    Cc: "H. Peter Anvin"
    Cc: Len Brown
    Cc: john stultz
    Cc: Dmitry Torokhov
    Cc: Takashi Iwai
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Arnd Bergmann
     

16 Jun, 2009

10 commits

  • * 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux-2.6:
    [IA64] fix compile error in arch/ia64/mm/extable.c

    Linus Torvalds
     
  • …kernel/git/tip/linux-2.6-tip

    * 'timers-for-linus-migration' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    timers: Logic to move non pinned timers
    timers: /proc/sys sysctl hook to enable timer migration
    timers: Identifying the existing pinned timers
    timers: Framework for identifying pinned timers
    timers: allow deferrable timers for intervals tv2-tv5 to be deferred

    Fix up conflicts in kernel/sched.c and kernel/timer.c manually

    Linus Torvalds
     
  • …x/kernel/git/tip/linux-2.6-tip

    * 'timers-for-linus-clockevents' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    clockevent: export register_device and delta2ns
    clockevents: tick_broadcast_device can become static

    Linus Torvalds
     
  • …x/kernel/git/tip/linux-2.6-tip

    * 'timers-for-linus-clocksource' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    clocksource: prevent selection of low resolution clocksourse also for nohz=on
    clocksource: sanity check sysfs clocksource changes

    Linus Torvalds
     
  • …/git/tip/linux-2.6-tip

    * 'timers-for-linus-ntp' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    ntp: fix comment typos
    ntp: adjust SHIFT_PLL to improve NTP convergence

    Linus Torvalds
     
  • * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6: (1244 commits)
    pkt_sched: Rename PSCHED_US2NS and PSCHED_NS2US
    ipv4: Fix fib_trie rebalancing
    Bluetooth: Fix issue with uninitialized nsh.type in DTL-1 driver
    Bluetooth: Fix Kconfig issue with RFKILL integration
    PIM-SM: namespace changes
    ipv4: update ARPD help text
    net: use a deferred timer in rt_check_expire
    ieee802154: fix kconfig bool/tristate muckup
    bonding: initialization rework
    bonding: use is_zero_ether_addr
    bonding: network device names are case sensative
    bonding: elminate bad refcount code
    bonding: fix style issues
    bonding: fix destructor
    bonding: remove bonding read/write semaphore
    bonding: initialize before registration
    bonding: bond_create always called with default parameters
    x_tables: Convert printk to pr_err
    netfilter: conntrack: optional reliable conntrack event delivery
    list_nulls: add hlist_nulls_add_head and hlist_nulls_del
    ...

    Linus Torvalds
     
  • * 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc: (103 commits)
    powerpc: Fix bug in move of altivec code to vector.S
    powerpc: Add support for swiotlb on 32-bit
    powerpc/spufs: Remove unused error path
    powerpc: Fix warning when printing a resource_size_t
    powerpc/xmon: Remove unused variable in xmon.c
    powerpc/pseries: Fix warnings when printing resource_size_t
    powerpc: Shield code specific to 64-bit server processors
    powerpc: Separate PACA fields for server CPUs
    powerpc: Split exception handling out of head_64.S
    powerpc: Introduce CONFIG_PPC_BOOK3S
    powerpc: Move VMX and VSX asm code to vector.S
    powerpc: Set init_bootmem_done on NUMA platforms as well
    powerpc/mm: Fix a AB->BA deadlock scenario with nohash MMU context lock
    powerpc/mm: Fix some SMP issues with MMU context handling
    powerpc: Add PTRACE_SINGLEBLOCK support
    fbdev: Add PLB support and cleanup DCR in xilinxfb driver.
    powerpc/virtex: Add ml510 reference design device tree
    powerpc/virtex: Add Xilinx ML510 reference design support
    powerpc/virtex: refactor intc driver and add support for i8259 cascading
    powerpc/virtex: Add support for Xilinx PCI host bridge
    ...

    Linus Torvalds
     
  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/lrg/voltage-2.6:
    regulator/max1586: fix V3 gain calculation integer overflow
    regulator/max1586: support increased V3 voltage range
    regulator: lp3971 - fix driver link error when built-in.
    LP3971 PMIC regulator driver (updated and combined version)
    regulator: remove driver_data direct access of struct device
    regulator: Set MODULE_ALIAS for regulator drivers
    regulator: Support list_voltage for fixed voltage regulator
    regulator: Move regulator drivers to subsys_initcall()
    regulator: build fix for powerpc - renamed show_state
    regulator: add userspace-consumer driver
    Maxim 1586 regulator driver

    Linus Torvalds
     
  • ad6561dffa17f17bb68d7207d422c26c381c4313 ("module: trim exception table on init
    free.") put a bogus trim_init_extable() function into ia64 which didn't compile.

    Signed-off-by: Rusty Russell
    Signed-off-by: Tony Luck

    Rusty Russell
     
  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ryusuke/nilfs2: (22 commits)
    nilfs2: support contiguous lookup of blocks
    nilfs2: add sync_page method to page caches of meta data
    nilfs2: use device's backing_dev_info for btree node caches
    nilfs2: return EBUSY against delete request on snapshot
    nilfs2: modify list of unsupported features in caveats
    nilfs2: enable sync_page method
    nilfs2: set bio unplug flag for the last bio in segment
    nilfs2: allow future expansion of metadata read out via get info ioctl
    NILFS2: Pagecache usage optimization on NILFS2
    nilfs2: remove nilfs_btree_operations from btree mapping
    nilfs2: remove nilfs_direct_operations from direct mapping
    nilfs2: remove bmap pointer operations
    nilfs2: remove useless b_low and b_high fields from nilfs_bmap struct
    nilfs2: remove pointless NULL check of bpop_commit_alloc_ptr function
    nilfs2: move get block functions in bmap.c into btree codes
    nilfs2: remove nilfs_bmap_delete_block
    nilfs2: remove nilfs_bmap_put_block
    nilfs2: remove header file for segment list operations
    nilfs2: eliminate removal list of segments
    nilfs2: add sufile function that can modify multiple segment usages
    ...

    Linus Torvalds
     

15 Jun, 2009

5 commits

  • On Thu, May 28, 2009 at 10:59 AM, Mark Brown wrote:
    > On Thu, May 28, 2009 at 07:15:16AM +0200, Philipp Zabel wrote:
    >> The V3 regulator can be configured with an external resistor
    >> connected to the feedback pin (R24 in the data sheet) to
    >> increase the voltage range.
    >>
    >> For example, hx4700 has R24 = 3.32 kOhm to achieve a maximum
    >> V3 voltage of 1.55 V which is needed for 624 MHz CPU frequency.
    >>
    >> Signed-off-by: Philipp Zabel
    >
    > Looks good.
    >
    > Acked-by: Mark Brown

    Thanks, but it turns out I hit a 32 bit integer overflow in
    the gain calculation. I'd like to mend that with the following
    patch. Now max_uV could be increased up to 4.294 V, enough to
    charge LiPo cells.

    Signed-off-by: Philipp Zabel
    Acked-by: Robert Jarzmik
    Signed-off-by: Liam Girdwood

    Philipp Zabel
     
  • The V3 regulator can be configured with an external resistor
    connected to the feedback pin (R24 in the data sheet) to
    increase the voltage range.

    For example, hx4700 has R24 = 3.32 kOhm to achieve a maximum
    V3 voltage of 1.55 V which is needed for 624 MHz CPU frequency.

    Signed-off-by: Philipp Zabel
    Acked-by: Mark Brown
    Acked-by: Robert Jarzmik
    Signed-off-by: Liam Girdwood

    Philipp Zabel
     
  • `lp3971_i2c_remove' referenced in section `.data' of drivers/built-in.o:
    defined in discarded section `.devexit.text' of drivers/built-in.o

    Signed-off-by: Liam Girdwood

    Liam Girdwood
     
  • This patch adds regulator drivers for the National Semiconductor LP3971
    PMIC. The LP3971 PMIC controller has 3 DC/DC voltage converters and 5
    low drop-out (LDO) regulators, and is controlled over an I2C interface.

    Reviewed-by: Kyungmin Park
    Signed-off-by: Marek Szyprowski
    Acked-by: Mark Brown
    Signed-off-by: Liam Girdwood

    Marek Szyprowski
     
  • In the near future, the driver core will no longer allow direct access
    to the driver_data pointer in struct device. Instead, the functions
    dev_get_drvdata() and dev_set_drvdata() should be used. These functions
    have been around since the beginning, so are backwards compatible with
    all older kernel versions.
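
    Roughly, the conversion looks like this in a hypothetical platform driver
    (an illustrative fragment, not any specific driver):

    #include <linux/device.h>
    #include <linux/platform_device.h>
    #include <linux/slab.h>

    struct my_state {
            int value;
    };

    static int my_probe(struct platform_device *pdev)
    {
            struct my_state *state = kzalloc(sizeof(*state), GFP_KERNEL);

            if (!state)
                    return -ENOMEM;
            /* was: pdev->dev.driver_data = state; */
            dev_set_drvdata(&pdev->dev, state);
            return 0;
    }

    static int my_remove(struct platform_device *pdev)
    {
            /* was: struct my_state *state = pdev->dev.driver_data; */
            struct my_state *state = dev_get_drvdata(&pdev->dev);

            kfree(state);
            return 0;
    }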

    Cc: Mark Brown
    Cc: Liam Girdwood
    Signed-off-by: Greg Kroah-Hartman
    Acked-by: Mark Brown
    Signed-off-by: Liam Girdwood

    Greg Kroah-Hartman