Eric Lee / smarc-fsl-linux-kernel

30 May, 2012

18 commits

4d67d8605 mm: use kcalloc() instead of kzalloc() to allocate array ... Browse Code »

The advantage of kcalloc is, that will prevent integer overflows which
could result from the multiplication of number of elements and size and
it is also a bit nicer to read.

The semantic patch that makes this change is available in
https://lkml.org/lkml/2011/11/25/107

Signed-off-by: Thomas Meyer
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Thomas Meyer
2012-05-30 07:22:19 +0800
f62388187 mm: fix off-by-one bug in print_nodes_state() ... Browse Code »

/sys/devices/system/node/{online,possible} outputs a garbage byte
because print_nodes_state() returns content size + 1. To fix the bug,
the patch changes the use of cpuset_sprintf_cpulist to follow the use at
other places, which is clearer and safer.

This bug was introduced in v2.6.24 (commit bde631a51876: "mm: add node
states sysfs class attributeS").

Signed-off-by: Ryota Ozaki
Cc: Lee Schermerhorn
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Ryota Ozaki
2012-05-30 07:22:19 +0800
23b9da55c mm: vmscan: remove reclaim_mode_t ... Browse Code »

There is little motiviation for reclaim_mode_t once RECLAIM_MODE_[A]SYNC
and lumpy reclaim have been removed. This patch gets rid of
reclaim_mode_t as well and improves the documentation about what
reclaim/compaction is and when it is triggered.

Signed-off-by: Mel Gorman
Acked-by: Rik van Riel
Acked-by: KOSAKI Motohiro
Cc: Konstantin Khlebnikov
Cc: Hugh Dickins
Cc: Ying Han
Cc: Andy Whitcroft
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Mel Gorman
2012-05-30 07:22:19 +0800
41ac1999c mm: vmscan: do not stall on writeback during memory compaction ... Browse Code »

This patch stops reclaim/compaction entering sync reclaim as this was
only intended for lumpy reclaim and an oversight. Page migration has
its own logic for stalling on writeback pages if necessary and memory
compaction is already using it.

Waiting on page writeback is bad for a number of reasons but the primary
one is that waiting on writeback to a slow device like USB can take a
considerable length of time. Page reclaim instead uses
wait_iff_congested() to throttle if too many dirty pages are being
scanned.

Signed-off-by: Mel Gorman
Acked-by: Rik van Riel
Acked-by: KOSAKI Motohiro
Cc: Konstantin Khlebnikov
Cc: Hugh Dickins
Cc: Ying Han
Cc: Andy Whitcroft
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Mel Gorman
2012-05-30 07:22:19 +0800
c53919adc mm: vmscan: remove lumpy reclaim ... Browse Code »

This series removes lumpy reclaim and some stalling logic that was
unintentionally being used by memory compaction. The end result is that
stalling on dirty pages during page reclaim now depends on
wait_iff_congested().

Four kernels were compared

3.3.0 vanilla
3.4.0-rc2 vanilla
3.4.0-rc2 lumpyremove-v2 is patch one from this series
3.4.0-rc2 nosync-v2r3 is the full series

Removing lumpy reclaim saves almost 900 bytes of text whereas the full
series removes 1200 bytes.

text data bss dec hex filename
6740375 1927944 2260992 10929311 a6c49f vmlinux-3.4.0-rc2-vanilla
6739479 1927944 2260992 10928415 a6c11f vmlinux-3.4.0-rc2-lumpyremove-v2
6739159 1927944 2260992 10928095 a6bfdf vmlinux-3.4.0-rc2-nosync-v2

There are behaviour changes in the series and so tests were run with
monitoring of ftrace events. This disrupts results so the performance
results are distorted but the new behaviour should be clearer.

fs-mark running in a threaded configuration showed little of interest as
it did not push reclaim aggressively

FS-Mark Multi Threaded
3.3.0-vanilla rc2-vanilla lumpyremove-v2r3 nosync-v2r3
Files/s min 3.20 ( 0.00%) 3.20 ( 0.00%) 3.20 ( 0.00%) 3.20 ( 0.00%)
Files/s mean 3.20 ( 0.00%) 3.20 ( 0.00%) 3.20 ( 0.00%) 3.20 ( 0.00%)
Files/s stddev 0.00 ( 0.00%) 0.00 ( 0.00%) 0.00 ( 0.00%) 0.00 ( 0.00%)
Files/s max 3.20 ( 0.00%) 3.20 ( 0.00%) 3.20 ( 0.00%) 3.20 ( 0.00%)
Overhead min 508667.00 ( 0.00%) 521350.00 (-2.49%) 544292.00 (-7.00%) 547168.00 (-7.57%)
Overhead mean 551185.00 ( 0.00%) 652690.73 (-18.42%) 991208.40 (-79.83%) 570130.53 (-3.44%)
Overhead stddev 18200.69 ( 0.00%) 331958.29 (-1723.88%) 1579579.43 (-8578.68%) 9576.81 (47.38%)
Overhead max 576775.00 ( 0.00%) 1846634.00 (-220.17%) 6901055.00 (-1096.49%) 585675.00 (-1.54%)
MMTests Statistics: duration
Sys Time Running Test (seconds) 309.90 300.95 307.33 298.95
User+Sys Time Running Test (seconds) 319.32 309.67 315.69 307.51
Total Elapsed Time (seconds) 1187.85 1193.09 1191.98 1193.73

MMTests Statistics: vmstat
Page Ins 80532 82212 81420 79480
Page Outs 111434984 111456240 111437376 111582628
Swap Ins 0 0 0 0
Swap Outs 0 0 0 0
Direct pages scanned 44881 27889 27453 34843
Kswapd pages scanned 25841428 25860774 25861233 25843212
Kswapd pages reclaimed 25841393 25860741 25861199 25843179
Direct pages reclaimed 44881 27889 27453 34843
Kswapd efficiency 99% 99% 99% 99%
Kswapd velocity 21754.791 21675.460 21696.029 21649.127
Direct efficiency 100% 100% 100% 100%
Direct velocity 37.783 23.375 23.031 29.188
Percentage direct scans 0% 0% 0% 0%

ftrace showed that there was no stalling on writeback or pages submitted
for IO from reclaim context.

postmark was similar and while it was more interesting, it also did not
push reclaim heavily.

POSTMARK
3.3.0-vanilla rc2-vanilla lumpyremove-v2r3 nosync-v2r3
Transactions per second: 16.00 ( 0.00%) 20.00 (25.00%) 18.00 (12.50%) 17.00 ( 6.25%)
Data megabytes read per second: 18.80 ( 0.00%) 24.27 (29.10%) 22.26 (18.40%) 20.54 ( 9.26%)
Data megabytes written per second: 35.83 ( 0.00%) 46.25 (29.08%) 42.42 (18.39%) 39.14 ( 9.24%)
Files created alone per second: 28.00 ( 0.00%) 38.00 (35.71%) 34.00 (21.43%) 30.00 ( 7.14%)
Files create/transact per second: 8.00 ( 0.00%) 10.00 (25.00%) 9.00 (12.50%) 8.00 ( 0.00%)
Files deleted alone per second: 556.00 ( 0.00%) 1224.00 (120.14%) 3062.00 (450.72%) 6124.00 (1001.44%)
Files delete/transact per second: 8.00 ( 0.00%) 10.00 (25.00%) 9.00 (12.50%) 8.00 ( 0.00%)

MMTests Statistics: duration
Sys Time Running Test (seconds) 113.34 107.99 109.73 108.72
User+Sys Time Running Test (seconds) 145.51 139.81 143.32 143.55
Total Elapsed Time (seconds) 1159.16 899.23 980.17 1062.27

MMTests Statistics: vmstat
Page Ins 13710192 13729032 13727944 13760136
Page Outs 43071140 42987228 42733684 42931624
Swap Ins 0 0 0 0
Swap Outs 0 0 0 0
Direct pages scanned 0 0 0 0
Kswapd pages scanned 9941613 9937443 9939085 9929154
Kswapd pages reclaimed 9940926 9936751 9938397 9928465
Direct pages reclaimed 0 0 0 0
Kswapd efficiency 99% 99% 99% 99%
Kswapd velocity 8576.567 11051.058 10140.164 9347.109
Direct efficiency 100% 100% 100% 100%
Direct velocity 0.000 0.000 0.000 0.000

It looks like here that the full series regresses performance but as
ftrace showed no usage of wait_iff_congested() or sync reclaim I am
assuming it's a disruption due to monitoring. Other data such as memory
usage, page IO, swap IO all looked similar.

Running a benchmark with a plain DD showed nothing very interesting.
The full series stalled in wait_iff_congested() slightly less but stall
times on vanilla kernels were marginal.

Running a benchmark that hammered on file-backed mappings showed stalls
due to congestion but not in sync writebacks

MICRO
3.3.0-vanilla rc2-vanilla lumpyremove-v2r3 nosync-v2r3
MMTests Statistics: duration
Sys Time Running Test (seconds) 308.13 294.50 298.75 299.53
User+Sys Time Running Test (seconds) 330.45 316.28 318.93 320.79
Total Elapsed Time (seconds) 1814.90 1833.88 1821.14 1832.91

MMTests Statistics: vmstat
Page Ins
Page Outs
Swap Ins
Swap Outs
Direct pages scanned
Kswapd pages scanned
Kswapd pages reclaimed
Direct pages reclaimed
Kswapd efficiency
Kswapd velocity
Direct efficiency
Direct velocity
Percentage direct scans
Page writes by reclaim
Page writes file
Page writes anon
Page reclaim immediate
Page rescued immediate
Slabs scanned
Direct inode steals
Kswapd inode steals 108712 120708 97224 110344 155514576 156017404 155813676 156193256 0 0 0 0 0 0 0 0 2599253 1550480 2512822 2414760 69742364 71150694 68839041 69692533 34824488 34773341 34796602 34799396 53693 94750 61792 75205 49% 48% 50% 49% 38427.662 38797.901 37799.972 38022.889 2% 6% 2% 3% 1432.174 845.464 1379.807 1317.446 3% 2% 3% 3% 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1218 0 0 0 0 15360 16384 13312 16384 0 0 0 0 4340 4327 1630 4323

FTrace Reclaim Statistics: congestion_wait
Direct number congest waited 0 0 0 0
Direct time congest waited 0ms 0ms 0ms 0ms
Direct full congest waited 0 0 0 0
Direct number conditional waited 900 870 754 789
Direct time conditional waited 0ms 0ms 0ms 20ms
Direct full conditional waited 0 0 0 0
KSwapd number congest waited 2106 2308 2116 1915
KSwapd time congest waited 139924ms 157832ms 125652ms 132516ms
KSwapd full congest waited 1346 1530 1202 1278
KSwapd number conditional waited 12922 16320 10943 14670
KSwapd time conditional waited 0ms 0ms 0ms 0ms
KSwapd full conditional waited 0 0 0 0

Reclaim statistics are not radically changed. The stall times in kswapd
are massive but it is clear that it is due to calls to congestion_wait()
and that is almost certainly the call in balance_pgdat(). Otherwise
stalls due to dirty pages are non-existant.

I ran a benchmark that stressed high-order allocation. This is very
artifical load but was used in the past to evaluate lumpy reclaim and
compaction. Generally I look at allocation success rates and latency
figures.

STRESS-HIGHALLOC
3.3.0-vanilla rc2-vanilla lumpyremove-v2r3 nosync-v2r3
Pass 1 81.00 ( 0.00%) 28.00 (-53.00%) 24.00 (-57.00%) 28.00 (-53.00%)
Pass 2 82.00 ( 0.00%) 39.00 (-43.00%) 38.00 (-44.00%) 43.00 (-39.00%)
while Rested 88.00 ( 0.00%) 87.00 (-1.00%) 88.00 ( 0.00%) 88.00 ( 0.00%)

MMTests Statistics: duration
Sys Time Running Test (seconds) 740.93 681.42 685.14 684.87
User+Sys Time Running Test (seconds) 2922.65 3269.52 3281.35 3279.44
Total Elapsed Time (seconds) 1161.73 1152.49 1159.55 1161.44

MMTests Statistics: vmstat
Page Ins 4486020 2807256 2855944 2876244
Page Outs 7261600 7973688 7975320 7986120
Swap Ins 31694 0 0 0
Swap Outs 98179 0 0 0
Direct pages scanned 53494 57731 34406 113015
Kswapd pages scanned 6271173 1287481 1278174 1219095
Kswapd pages reclaimed 2029240 1281025 1260708 1201583
Direct pages reclaimed 1468 14564 16649 92456
Kswapd efficiency 32% 99% 98% 98%
Kswapd velocity 5398.133 1117.130 1102.302 1049.641
Direct efficiency 2% 25% 48% 81%
Direct velocity 46.047 50.092 29.672 97.306
Percentage direct scans 0% 4% 2% 8%
Page writes by reclaim 1616049 0 0 0
Page writes file 1517870 0 0 0
Page writes anon 98179 0 0 0
Page reclaim immediate 103778 27339 9796 17831
Page rescued immediate 0 0 0 0
Slabs scanned 1096704 986112 980992 998400
Direct inode steals 223 215040 216736 247881
Kswapd inode steals 175331 61548 68444 63066
Kswapd skipped wait 21991 0 1 0
THP fault alloc 1 135 125 134
THP collapse alloc 393 311 228 236
THP splits 25 13 7 8
THP fault fallback 0 0 0 0
THP collapse fail 3 5 7 7
Compaction stalls 865 1270 1422 1518
Compaction success 370 401 353 383
Compaction failures 495 869 1069 1135
Compaction pages moved 870155 3828868 4036106 4423626
Compaction move failure 26429 23865 29742 27514

Success rates are completely hosed for 3.4-rc2 which is almost certainly
due to commit fe2c2a106663 ("vmscan: reclaim at order 0 when compaction
is enabled"). I expected this would happen for kswapd and impair
allocation success rates (https://lkml.org/lkml/2012/1/25/166) but I did
not anticipate this much a difference: 80% less scanning, 37% less
reclaim by kswapd

In comparison, reclaim/compaction is not aggressive and gives up easily
which is the intended behaviour. hugetlbfs uses __GFP_REPEAT and would
be much more aggressive about reclaim/compaction than THP allocations
are. The stress test above is allocating like neither THP or hugetlbfs
but is much closer to THP.

Mainline is now impaired in terms of high order allocation under heavy
load although I do not know to what degree as I did not test with
__GFP_REPEAT. Keep this in mind for bugs related to hugepage pool
resizing, THP allocation and high order atomic allocation failures from
network devices.

In terms of congestion throttling, I see the following for this test

FTrace Reclaim Statistics: congestion_wait
Direct number congest waited 3 0 0 0
Direct time congest waited 0ms 0ms 0ms 0ms
Direct full congest waited 0 0 0 0
Direct number conditional waited 957 512 1081 1075
Direct time conditional waited 0ms 0ms 0ms 0ms
Direct full conditional waited 0 0 0 0
KSwapd number congest waited 36 4 3 5
KSwapd time congest waited 3148ms 400ms 300ms 500ms
KSwapd full congest waited 30 4 3 5
KSwapd number conditional waited 88514 197 332 542
KSwapd time conditional waited 4980ms 0ms 0ms 0ms
KSwapd full conditional waited 49 0 0 0

The "conditional waited" times are the most interesting as this is
directly impacted by the number of dirty pages encountered during scan.
As lumpy reclaim is no longer scanning contiguous ranges, it is finding
fewer dirty pages. This brings wait times from about 5 seconds to 0.
kswapd itself is still calling congestion_wait() so it'll still stall but
it's a lot less.

In terms of the type of IO we were doing, I see this

FTrace Reclaim Statistics: mm_vmscan_writepage
Direct writes anon sync 0 0 0 0
Direct writes anon async 0 0 0 0
Direct writes file sync 0 0 0 0
Direct writes file async 0 0 0 0
Direct writes mixed sync 0 0 0 0
Direct writes mixed async 0 0 0 0
KSwapd writes anon sync 0 0 0 0
KSwapd writes anon async 91682 0 0 0
KSwapd writes file sync 0 0 0 0
KSwapd writes file async 822629 0 0 0
KSwapd writes mixed sync 0 0 0 0
KSwapd writes mixed async 0 0 0 0

In 3.2, kswapd was doing a bunch of async writes of pages but
reclaim/compaction was never reaching a point where it was doing sync
IO. This does not guarantee that reclaim/compaction was not calling
wait_on_page_writeback() but I would consider it unlikely. It indicates
that merging patches 2 and 3 to stop reclaim/compaction calling
wait_on_page_writeback() should be safe.

This patch:

Lumpy reclaim had a purpose but in the mind of some, it was to kick the
system so hard it trashed. For others the purpose was to complicate
vmscan.c. Over time it was giving softer shoes and a nicer attitude but
memory compaction needs to step up and replace it so this patch sends
lumpy reclaim to the farm.

The tracepoint format changes for isolating LRU pages with this patch
applied. Furthermore reclaim/compaction can no longer queue dirty pages
in pageout() if the underlying BDI is congested. Lumpy reclaim used
this logic and reclaim/compaction was using it in error.

Signed-off-by: Mel Gorman
Acked-by: Rik van Riel
Acked-by: KOSAKI Motohiro
Cc: Konstantin Khlebnikov
Cc: Hugh Dickins
Cc: Ying Han
Cc: Andy Whitcroft
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Mel Gorman
2012-05-30 07:22:19 +0800
e709ffd61 mm: remove swap token code ... Browse Code »

The swap token code no longer fits in with the current VM model. It
does not play well with cgroups or the better NUMA placement code in
development, since we have only one swap token globally.

It also has the potential to mess with scalability of the system, by
increasing the number of non-reclaimable pages on the active and
inactive anon LRU lists.

Last but not least, the swap token code has been broken for a year
without complaints, as reported by Konstantin Khlebnikov. This suggests
we no longer have much use for it.

The days of sub-1G memory systems with heavy use of swap are over. If
we ever need thrashing reducing code in the future, we will have to
implement something that does scale.

Signed-off-by: Rik van Riel
Cc: Konstantin Khlebnikov
Acked-by: Johannes Weiner
Cc: Mel Gorman
Cc: Hugh Dickins
Acked-by: Bob Picco
Acked-by: KOSAKI Motohiro
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Rik van Riel
2012-05-30 07:22:19 +0800
edad9d2c3 mm, thp: allow fallback when pte_alloc_one() fails for huge pmd ... Browse Code »

The transparent hugepages feature is careful to not invoke the oom
killer when a hugepage cannot be allocated.

pte_alloc_one() failing in __do_huge_pmd_anonymous_page(), however,
currently results in VM_FAULT_OOM which invokes the pagefault oom killer
to kill a memory-hogging task.

This is unnecessary since it's possible to drop the reference to the
hugepage and fallback to allocating a small page.

Signed-off-by: David Rientjes
Cc: Andrea Arcangeli
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

David Rientjes
2012-05-30 07:22:19 +0800
aa2e878ef mm, thp: remove unnecessary ret variable ... Browse Code »

The "ret" variable is unnecessary in __do_huge_pmd_anonymous_page(), so
remove it.

Signed-off-by: David Rientjes
Cc: Andrea Arcangeli
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

David Rientjes
2012-05-30 07:22:18 +0800
f2135a4a5 mm/hugetlb.c: use long vars instead of int in region_count() ... Browse Code »

The arguments f & t and fields from & to of struct file_region are
defined as long. So use long instead of int to type the temp vars.

Signed-off-by: Wang Sheng-Hui
Acked-by: David Rientjes
Acked-by: Hillf Danton
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Wang Sheng-Hui
2012-05-30 07:22:18 +0800
89c522c78 mm/mempolicy.c: use enum value MPOL_REBIND_ONCE in mpol_rebind_policy() ... Browse Code »

We have enum definition in mempolicy.h: MPOL_REBIND_ONCE. It should
replace the magic number 0 for step comparison in function
mpol_rebind_policy.

Signed-off-by: Wang Sheng-Hui
Acked-by: David Rientjes
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Wang Sheng-Hui
2012-05-30 07:22:18 +0800
71dd0b8ae mm/memory_failure: let the compiler add the function name ... Browse Code »

These things tend to get out of sync with time so let the compiler
automatically enter the current function name using __func__.

No functional change.

Signed-off-by: Borislav Petkov
Acked-by: Andi Kleen
Cc: David Rientjes
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Borislav Petkov
2012-05-30 07:22:18 +0800
08fa29d91 mm: fix NULL ptr deref when walking hugepages ... Browse Code »

A missing validation of the value returned by find_vma() could cause a
NULL ptr dereference when walking the pagetable.

This is triggerable from usermode by a simple user by trying to read a
page info out of /proc/pid/pagemap which doesn't exist.

Introduced by commit 025c5b2451e4 ("thp: optimize away unnecessary page
table locking").

Signed-off-by: Sasha Levin
Reviewed-by: Naoya Horiguchi
Cc: David Rientjes
Cc: Andi Kleen
Cc: Andrea Arcangeli
Cc: KOSAKI Motohiro
Cc: KAMEZAWA Hiroyuki
Cc: [3.4.x]
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Sasha Levin
2012-05-30 07:22:18 +0800
4c9c6a1bc cris: select GENERIC_ATOMIC64 ... Browse Code »

Cris doesn't implement atomic64 operations neither, should select
GENERIC_ATOMIC64.

Signed-off-by: WANG Cong
Cc: Mikael Starvik
Cc: Jesper Nilsson
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Cong Wang
2012-05-30 07:22:18 +0800
af2e84097 pagemap.h: fix warning about possibly used before init var ... Browse Code »

Commit f56f821feb7b ("mm: extend prefault helpers to fault in more than
PAGE_SIZE") added in the new functions: fault_in_multipages_writeable()
and fault_in_multipages_readable().

However, we currently see:

include/linux/pagemap.h:492: warning: 'ret' may be used uninitialized in this function
include/linux/pagemap.h:492: note: 'ret' was declared here

Unlike a lot of gcc nags, this one appears somewhat legit. i.e. passing
in an invalid negative value of "size" does make it look like all the
conditionals in there would be bypassed and the uninitialized value would
be returned.

Signed-off-by: Paul Gortmaker
Cc: Daniel Vetter
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Paul Gortmaker
2012-05-30 07:22:18 +0800
4b7814746 Merge tag 'mfd-3.5-1' of git://git.kernel.org/pub/scm/linux/kernel/git/sameo/mfd-2.6 ... Browse Code »

Pull MFD changes from Samuel Ortiz:
"Besides the usual cleanups, this one brings:

* Support for 5 new chipsets: Intel's ICH LPC and SCH Centerton,
ST-E's STAX211, Samsung's MAX77693 and TI's LM3533.

* Device tree support for the twl6040, tps65910, da9502 and ab8500
drivers.

* Fairly big tps56910, ab8500 and db8500 updates.

* i2c support for mc13xxx.

* Our regular update for the wm8xxx driver from Mark."

Fix up various conflicts with other trees, largely due to ab5500 removal
etc.

* tag 'mfd-3.5-1' of git://git.kernel.org/pub/scm/linux/kernel/git/sameo/mfd-2.6: (106 commits)
mfd: Fix build break of max77693 by adding REGMAP_I2C option
mfd: Fix twl6040 build failure
mfd: Fix max77693 build failure
mfd: ab8500-core should depend on MFD_DB8500_PRCMU
gpio: tps65910: dt: process gpio specific device node info
mfd: Remove the parsing of dt info for tps65910 gpio
mfd: Save device node parsed platform data for tps65910 sub devices
mfd: Add r_select to lm3533 platform data
gpio: Add Intel Centerton support to gpio-sch
mfd: Emulate active low IRQs as well as active high IRQs for wm831x
mfd: Mark two lm3533 zone registers as volatile
mfd: Fix return type of lm533 attribute is_visible
mfd: Enable Device Tree support in the ab8500-pwm driver
mfd: Enable Device Tree support in the ab8500-sysctrl driver
mfd: Add support for Device Tree to twl6040
mfd: Register the twl6040 child for the ASoC codec unconditionally
mfd: Allocate twl6040 IRQ numbers dynamically
mfd: twl6040 code cleanup in interrupt initialization part
mfd: Enable ab8500-gpadc driver for Device Tree
mfd: Prevent unassigned pointer from being used in ab8500-gpadc driver
...

Linus Torvalds
2012-05-30 02:53:11 +0800
53f2c4a8f Merge tag 'nfs-for-3.5-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs ... Browse Code »

Pull NFS client updates from Trond Myklebust:
"New features include:
- Rewrite the O_DIRECT code so that it can share the same coalescing
and pNFS functionality as the page cache code.
- Allow the server to provide hints as to when we should use pNFS,
and when it is more efficient to read and write through the
metadata server.
- NFS cache consistency updates:
* Use the ctime to emulate a change attribute for NFSv2/v3 so that
all NFS versions can share the same cache management code.
* New cache management code will only look at the change attribute
and size attribute when deciding whether or not our cached data
is still valid or not.
* Don't request NFSv4 post-op attributes on writes in cases such as
O_DIRECT, where we don't care about data cache consistency, or
when we have a write delegation, and know that our cache is still
consistent.
* Don't request NFSv4 post-op attributes on operations such as
COMMIT, where there are no expected metadata updates.
* Don't request NFSv4 directory post-op attributes in cases where
the operations themselves already return change attribute
updates: i.e. operations such as OPEN, CREATE, REMOVE, LINK and
RENAME.
- Speed up 'ls' and friends by using READDIR rather than READDIRPLUS
if we detect no attempts to lookup filenames.
- Improve the code sharing between NFSv2/v3 and v4 mounts
- NFSv4.1 state management efficiency improvements
- More patches in preparation for NFSv4/v4.1 migration functionality."

Fix trivial conflict in fs/nfs/nfs4proc.c that was due to the dcache
qstr name initialization changes (that made the length/hash a 64-bit
union)

* tag 'nfs-for-3.5-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (146 commits)
NFSv4: Add debugging printks to state manager
NFSv4: Map NFS4ERR_SHARE_DENIED into an EACCES error instead of EIO
NFSv4: update_changeattr does not need to set NFS_INO_REVAL_PAGECACHE
NFSv4.1: nfs4_reset_session should use nfs4_handle_reclaim_lease_error
NFSv4.1: Handle other occurrences of NFS4ERR_CONN_NOT_BOUND_TO_SESSION
NFSv4.1: Handle NFS4ERR_CONN_NOT_BOUND_TO_SESSION in the state manager
NFSv4.1: Handle errors in nfs4_bind_conn_to_session
NFSv4.1: nfs4_bind_conn_to_session should drain the session
NFSv4.1: Don't clobber the seqid if exchange_id returns a confirmed clientid
NFSv4.1: Add DESTROY_CLIENTID
NFSv4.1: Ensure we use the correct credentials for bind_conn_to_session
NFSv4.1: Ensure we use the correct credentials for session create/destroy
NFSv4.1: Move NFSPROC4_CLNT_BIND_CONN_TO_SESSION to the end of the operations
NFSv4.1: Handle NFS4ERR_SEQ_MISORDERED when confirming the lease
NFSv4: When purging the lease, we must clear NFS4CLNT_LEASE_CONFIRM
NFSv4: Clean up the error handling for nfs4_reclaim_lease
NFSv4.1: Exchange ID must use GFP_NOFS allocation mode
nfs41: Use BIND_CONN_TO_SESSION for CB_PATH_DOWN*
nfs4.1: add BIND_CONN_TO_SESSION operation
NFSv4.1 test the mdsthreshold hint parameters
...

Linus Torvalds
2012-05-30 01:43:51 +0800
8f6576ad4 tty: fix ldisc lock inversion trace ... Browse Code »
46

This is caused by tty_release using tty_lock_pair to lock both sides of
the pty/tty pair, and then tty_ldisc_release dropping and relocking one
side only. We can drop both fine, so drop both to avoid any lock
ordering concerns.

Rework the release path to fix the new locking model.

Signed-off-by: Alan Cox
Acked-by: Greg Kroah-Hartman
Signed-off-by: Linus Torvalds

Alan Cox
2012-05-30 01:42:13 +0800
d3ca8b64b pty: Fix lock inversion ... Browse Code »
46

The ptmx_open path takes the tty and devpts locks in the wrong order
because tty_init_dev locks and returns a locked tty. As far as I can
tell this is actually safe anyway because the tty being returned is new
so nobody can get a reference to lock it at this point.

However we don't even need the devpts lock at this point, it's only held
as a byproduct of the way the locks were pushe down.

Signed-off-by: Alan Cox
Acked-by: Greg Kroah-Hartman
Signed-off-by: Linus Torvalds

Alan Cox
2012-05-30 01:42:13 +0800

29 May, 2012

9 commits

cc0a98436 NFSv4: Add debugging printks to state manager ... Browse Code »

Signed-off-by: Trond Myklebust

Trond Myklebust
2012-05-29 05:21:57 +0800
fb13bfa7e NFSv4: Map NFS4ERR_SHARE_DENIED into an EACCES error instead of EIO ... Browse Code »
1

If a file OPEN is denied due to a share lock, the resulting
NFS4ERR_SHARE_DENIED is currently mapped to the default EIO.
This patch adds a more appropriate mapping, and brings Linux
into line with what Solaris 10 does.

See https://bugzilla.kernel.org/show_bug.cgi?id=43286

Signed-off-by: Trond Myklebust
Cc: stable@vger.kernel.org

Trond Myklebust
2012-05-29 05:21:48 +0800
a01ee165a Merge branch 'for-linus' of git://git.open-osd.org/linux-open-osd ... Browse Code »

Pull exofs updates from Boaz Harrosh:
"Just a couple of patches. The first is a BUG fix destined for stable
which missed the 3.4-rc7 Kernel. The second is just a fixture
addition so exofs is able to be better exported as a cluster file
system via pNFS."

* 'for-linus' of git://git.open-osd.org/linux-open-osd:
exofs: Add SYSFS info for autologin/pNFS export
exofs: Fix CRASH on very early IO errors.

Linus Torvalds
2012-05-29 04:10:41 +0800
d766023ee Merge branch 'doc' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial ... Browse Code »

Pull documentation updates from Jiri Kosina:
"I am currently relaying documentation patches through 'doc' branch of
trivial tree, until Rob, the new documentation maintainer, has
established a proper tree."

* 'doc' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial:
doc: ext3: update documentation with barrier=1 default
Documentation/initrd.txt: Change the location of util-linux
Documentation/SubmittingPatches: suggested the use of scripts/get_maintainer.pl
Documentation/kernel-parameters: remove autotest and mcatest

Linus Torvalds
2012-05-29 01:40:11 +0800
905cec410 Merge branch 'misc' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild ... Browse Code »

Pull misc kbuild changes from Michal Marek:
"The non-critical part of kbuild for 3.5 includes

- two new coccinelle checks
- fix for make deb-pkg to include generated headers in arch/*/include

I have more make-deb-pkg fixes in the backlog, but these will likely
have to wait for 3.6."

* 'misc' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild:
builddeb: include autogenerated header files
scripts/coccinelle: sizeof of pointer
scripts/coccinelle: address test is always true

Linus Torvalds
2012-05-29 01:39:07 +0800
da85d3426 Merge branch 'kconfig' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild ... Browse Code »

Pull kconfig changes from Michal Marek:

- Error handling for make KCONFIG_ALLCONFIG= all*config plus a fix
for a bug that was exposed by this

- Fix for the script/config utility.

* 'kconfig' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild:
scripts/config: properly report and set string options
kbuild: all{no,yes,mod,def,rand}config only read files when instructed to.
kconfig: Add error handling to KCONFIG_ALLCONFIG

Linus Torvalds
2012-05-29 01:37:56 +0800
1347a2ceb Merge branch 'kbuild' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild ... Browse Code »

Pull kbuild updates from Michal Marek.

Fixed up nontrivial merge conflict in Makefile as per Stephen Rothwell
and linux-next (and trivial arch/sparc/Makefile changes due to removed
sparc32 logic).

* 'kbuild' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild:
mips: Fix KBUILD_CPPFLAGS definition
kbuild: fix ia64 link
kbuild: document KBUILD_LDS, KBUILD_VMLINUX_{INIT,MAIN} and LDFLAGS_vmlinux
kbuild: link of vmlinux moved to a script
kbuild: refactor final link of sparc32
kbuild: drop unused KBUILD_VMLINUX_OBJS from top-level Makefile
kbuild: Makefile: remove unnecessary check for m68knommu ARCH

Linus Torvalds
2012-05-29 01:32:28 +0800
90324cc1b Merge tag 'writeback' of git://git.kernel.org/pub/scm/linux/kernel/git/wfg/linux ... Browse Code »

Pull writeback tree from Wu Fengguang:
"Mainly from Jan Kara to avoid iput() in the flusher threads."

* tag 'writeback' of git://git.kernel.org/pub/scm/linux/kernel/git/wfg/linux:
writeback: Avoid iput() from flusher thread
vfs: Rename end_writeback() to clear_inode()
vfs: Move waiting for inode writeback from end_writeback() to evict_inode()
writeback: Refactor writeback_single_inode()
writeback: Remove wb->list_lock from writeback_single_inode()
writeback: Separate inode requeueing after writeback
writeback: Move I_DIRTY_PAGES handling
writeback: Move requeueing when I_SYNC set to writeback_sb_inodes()
writeback: Move clearing of I_SYNC into inode_sync_complete()
writeback: initialize global_dirty_limit
fs: remove 8 bytes of padding from struct writeback_control on 64 bit builds
mm: page-writeback.c: local functions should not be exposed globally

Linus Torvalds
2012-05-29 00:54:45 +0800
fb8b00675 Merge branch 'next' of git://git.monstr.eu/linux-2.6-microblaze ... Browse Code »

Pull microblaze changes from Michal Simek.

* 'next' of git://git.monstr.eu/linux-2.6-microblaze:
microblaze: Setup correct pointer to TLS area
microblaze: Add TLS support to sys_clone
microblaze: ftrace: Pass the first calling instruction for dynamic ftrace
microblaze: Port OOM changes to do_page_fault
microblaze: Do not select GENERIC_GPIO by default

Linus Torvalds
2012-05-29 00:49:56 +0800

28 May, 2012

9 commits

359d7d1c9 NFSv4: update_changeattr does not need to set NFS_INO_REVAL_PAGECACHE ... Browse Code »

We're already invalidating the data cache, and setting the new change
attribute. Since directories don't care about the i_size field, there
is no need to be forcing any extra revalidation of the page cache.

We do keep the NFS_INO_INVALID_ATTR flag, in order to force an
attribute cache revalidation on stat() calls since we do not
update the mtime and ctime fields.

Signed-off-by: Trond Myklebust

Trond Myklebust
2012-05-28 22:05:47 +0800
b48b2c3e5 openrisc: use generic strnlen_user() function ... Browse Code »
46

The generic version is both easier to support and more correct.

Signed-off-by: Jonas Bonn
Signed-off-by: Linus Torvalds

Jonas Bonn
2012-05-28 12:00:32 +0800
1629372ca powerpc: Use the new generic strncpy_from_user() and strnlen_user() ... Browse Code »

This is much the same as for SPARC except that we can do the find_zero()
function more efficiently using the count-leading-zeroes instructions.
Tested on 32-bit and 64-bit PowerPC.

Signed-off-by: Paul Mackerras
Acked-by: David S. Miller
Signed-off-by: Linus Torvalds

Paul Mackerras
2012-05-28 12:00:07 +0800
69ea64059 lib: Fix generic strnlen_user for 32-bit big-endian machines ... Browse Code »

The aligned_byte_mask() definition is wrong for 32-bit big-endian
machines: the "7-(n)" part of the definition assumes a long is 8
bytes. This fixes it by using BITS_PER_LONG - 8 instead of 8*7.
Tested on 32-bit and 64-bit PowerPC.

Signed-off-by: Paul Mackerras
Acked-by: David S. Miller
Signed-off-by: Linus Torvalds

Paul Mackerras
2012-05-28 11:59:46 +0800
f2c1b5100 NFSv4.1: nfs4_reset_session should use nfs4_handle_reclaim_lease_error ... Browse Code »

The results from a call to nfs4_proc_create_session() should always
be fed into nfs4_handle_reclaim_lease_error, so that we can
handle errors such as NFS4ERR_SEQ_MISORDERED correctly.

Signed-off-by: Trond Myklebust

Trond Myklebust
2012-05-28 02:50:44 +0800
9f594791d NFSv4.1: Handle other occurrences of NFS4ERR_CONN_NOT_BOUND_TO_SESSION ... Browse Code »

Let nfs4_schedule_session_recovery() handle the details of choosing
between resetting the session, and other session related recovery.

Signed-off-by: Trond Myklebust

Trond Myklebust
2012-05-28 02:33:07 +0800
7c5d72568 NFSv4.1: Handle NFS4ERR_CONN_NOT_BOUND_TO_SESSION in the state manager ... Browse Code »

Signed-off-by: Trond Myklebust

Trond Myklebust
2012-05-28 02:33:07 +0800
bf674c822 NFSv4.1: Handle errors in nfs4_bind_conn_to_session ... Browse Code »

Ensure that we handle NFS4ERR_DELAY errors separately, and then
let nfs4_recovery_handle_error() handle all other cases.

Signed-off-by: Trond Myklebust

Trond Myklebust
2012-05-28 02:32:06 +0800
43ac544cb NFSv4.1: nfs4_bind_conn_to_session should drain the session ... Browse Code »

In order to avoid races with other RPC calls that end up setting the
NFS4CLNT_BIND_CONN_TO_SESSION flag.

Signed-off-by: Trond Myklebust

Trond Myklebust
2012-05-28 02:00:08 +0800

27 May, 2012

4 commits

1e2aec873 Merge branch 'generic-string-functions' ... Browse Code »

This makes actually live up to its promise of
allowing architectures to help tune the string functions that do their
work a word at a time.

David had already taken the x86 strncpy_from_user() function, modified
it to work on sparc, and then done the extra work to make it generically
useful. This then expands on that work by making x86 use that generic
version, completing the circle.

But more importantly, it fixes up the word-at-a-time interfaces so that
it's now easy to also support things like strnlen_user(), and pretty
much most random string functions.

David reports that it all works fine on sparc, and Jonas Bonn reported
that an earlier version of this worked on OpenRISC too. It's pretty
easy for architectures to add support for this and just replace their
private versions with the generic code.

* generic-string-functions:
sparc: use the new generic strnlen_user() function
x86: use the new generic strnlen_user() function
lib: add generic strnlen_user() function
word-at-a-time: make the interfaces truly generic
x86: use generic strncpy_from_user routine

Linus Torvalds
2012-05-27 07:57:16 +0800
19a4b9889 builddeb: include autogenerated header files ... Browse Code »

After 303395ac3bf3e2cb488435537d416bc840438fcb, some headers are
autogenerated. Include these autogenerated headers (mainly
unistd_32_ia32.h) in out-of-tree builds to allow DKMS modules to be
built succesfully.

Signed-off-by: Peter Lekensteyn
Signed-off-by: Michal Marek

Lekensteyn
2012-05-27 04:54:50 +0800
ae32adc1e Merge branch 'i2c-embedded/for-next' of git://git.pengutronix.de/git/wsa/linux ... Browse Code »

Pull i2c-embedded changes from Wolfram Sang:
"Major changes:

- lots of devicetree additions for existing drivers. I tried hard to
make sure the bindings are proper. In more complicated cases, I
requested acks from people having more experience with them than
me. That took a bit of extra time and also some time went into
discussions with developers about what bindings are and what not.
I have the feeling that the workflow with bindings should be
improved to scale better. I will spend some more thought on
this...

- i2c-muxes are succesfully used meanwhile, so we dropped
EXPERIMENTAL for them and renamed the drivers to a standard pattern
to match the rest of the subsystem. They can also be used with
devicetree now.

- ixp2000 was removed since the whole platform goes away.

- cleanups (strlcpy instead of strcpy, NULL instead of 0)

- The rest is typical driver fixes I assume.

All patches have been in linux-next at least since v3.4-rc6."

Fixed up trivial conflict in arch/arm/mach-lpc32xx/common.c due to the
same patch already having come in through the arm/soc trees, with
additional patches on top of it.

* 'i2c-embedded/for-next' of git://git.pengutronix.de/git/wsa/linux: (35 commits)
i2c: davinci: Free requested IRQ in remove
i2c: ocores: register OF i2c devices
i2c: tegra: notify transfer-complete after clearing status.
I2C: xiic: Add OF binding support
i2c: Rename last mux driver to standard pattern
i2c: tegra: fix 10bit address configuration
i2c: muxes: rename first set of drivers to a standard pattern
of/i2c: implement of_find_i2c_adapter_by_node
i2c: implement i2c_verify_adapter
i2c-s3c2410: Add HDMIPHY quirk for S3C2440
i2c-s3c2410: Rework device type handling
i2c: muxes are not EXPERIMENTAL anymore
i2c/of: Automatically populate i2c mux busses from device tree data.
i2c: Add a struct device * parameter to i2c_add_mux_adapter()
of/i2c: call i2c_verify_client from of_find_i2c_device_by_node
i2c: designware: Add clk_{un}prepare() support
i2c: designware: add PM support
i2c: ixp2000: remove driver
i2c: pnx: add device tree support
i2c: imx: don't use strcpy but strlcpy
...

Linus Torvalds
2012-05-27 04:35:03 +0800
f465d145d Merge tag 'cleanup-initcall' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc ... Browse Code »

Pull sweeping late_initcall cleanup for arm-soc from Olof Johansson:
"This is a patch series from Shawn Guo that moves from individual
late_initcalls() to using a member in the machine structure to invoke
a platform's late initcalls.

This cleanup is a step in the move towards multiplatform kernels since
it would reduce the need to check for compatible platforms in each and
every initcall."

Fix up trivial conflicts in arch/arm/mach-{exynos/mach-universal_c210.c,
imx/mach-cpuimx51.c, omap2/board-generic.c} due to changes nearby (and,
in the case of cpuimx51.c the board support being deleted)

* tag 'cleanup-initcall' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc:
ARM: ux500: use machine specific hook for late init
ARM: tegra: use machine specific hook for late init
ARM: shmobile: use machine specific hook for late init
ARM: sa1100: use machine specific hook for late init
ARM: s3c64xx: use machine specific hook for late init
ARM: prima2: use machine specific hook for late init
ARM: pnx4008: use machine specific hook for late init
ARM: omap2: use machine specific hook for late init
ARM: omap1: use machine specific hook for late init
ARM: msm: use machine specific hook for late init
ARM: imx: use machine specific hook for late init
ARM: exynos: use machine specific hook for late init
ARM: ep93xx: use machine specific hook for late init
ARM: davinci: use machine specific hook for late init
ARM: provide a late_initcall hook for platform initialization

Linus Torvalds
2012-05-27 04:14:01 +0800