05 Jun, 2014

40 commits

  • [akpm@linux-foundation.org: don't overwrite kstrtoull()'s errno]
    Signed-off-by: Fabian Frederick
    Cc: Michal Hocko
    Cc: Tejun Heo
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Fabian Frederick
     
  • Signed-off-by: Fabian Frederick
    Cc: Steven Rostedt
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Fabian Frederick
     
  • Signed-off-by: Fabian Frederick
    Cc: Peter Zijlstra
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Fabian Frederick
     
  • This patch also fixes one function declaration over 80 characters.

    Signed-off-by: Fabian Frederick
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Fabian Frederick
     
  • Fix checkpatch warnings about EXPORT_SYMBOL and return()

    Signed-off-by: Fabian Frederick
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Fabian Frederick
     
  • - EXPORT_SYMBOL

    - typo: unexpectidly->unexpectedly

    - function prototype over 80 characters

    Signed-off-by: Fabian Frederick
    Cc: Serge Hallyn
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Fabian Frederick
     
  • Signed-off-by: Fabian Frederick
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Fabian Frederick
     
    printk() calls with no level are converted to pr_warn() (error path)
    and pr_info() (disabling non-boot cpus); other printk() calls are
    converted to their respective levels (sketched below).
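
    A representative conversion, as a sketch (these are illustrative
    lines, not the exact hunks of this patch):

        /* before: bare printk, no level, in the error path */
        printk("Error taking CPU%d down: %d\n", cpu, error);
        /* after: explicit warning level */
        pr_warn("Error taking CPU%d down: %d\n", cpu, error);

        /* before: bare printk when disabling non-boot CPUs */
        printk("Disabling non-boot CPUs ...\n");
        /* after: informational level */
        pr_info("Disabling non-boot CPUs ...\n");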

    Signed-off-by: Fabian Frederick
    Cc: "Rafael J. Wysocki"
    Cc: Peter Zijlstra
    Cc: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Fabian Frederick
     
  • Usually, BUG_ON and friends aren't even evaluated in sparse, but recently
    compiletime_assert_atomic_type() was added, and that now results in a
    sparse warning every time it is used.

    The reason turns out to be the temporary variable: after it, sparse
    no longer considers the value to be a constant, and emits a warning
    and an error. The error is the more annoying part of this, as it
    suppresses any further warnings in the same file, hiding other
    problems.

    Unfortunately the condition cannot simply be expanded out to avoid
    the temporary variable, since that breaks compiletime_assert on old
    versions of GCC such as GCC 4.2.4, which the latest metag compiler
    is based on.

    Therefore #ifndef __CHECKER__ out the __compiletime_error_fallback,
    which uses the potentially negative-size array to trigger a
    conditional compile error, so that sparse doesn't see it.

    Signed-off-by: James Hogan
    Cc: Johannes Berg
    Cc: Daniel Santos
    Cc: Luciano Coelho
    Cc: Peter Zijlstra
    Cc: Paul E. McKenney
    Acked-by: Johannes Berg
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    James Hogan
     
Fix two typos in function comments.

    Signed-off-by: Fabian Frederick
    Cc: Al Viro
    Cc: "J. Bruce Fields"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Fabian Frederick
     
  • ...like other filesystems.

    Signed-off-by: Fabian Frederick
    Cc: Matthew Garrett
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Fabian Frederick
     
  • sys_sgetmask and sys_ssetmask are obsolete system calls no longer
    supported in libc.

    This patch replaces the architecture-specific __ARCH_WANT_SYS_SGETMASK
    with an expert-mode configuration option. That option is enabled by
    default on those architectures.

    Signed-off-by: Fabian Frederick
    Cc: Steven Miao
    Cc: Mikael Starvik
    Cc: Jesper Nilsson
    Cc: David Howells
    Cc: Geert Uytterhoeven
    Cc: Michal Simek
    Cc: Ralf Baechle
    Cc: Koichi Yasutake
    Cc: "James E.J. Bottomley"
    Cc: Helge Deller
    Cc: Benjamin Herrenschmidt
    Cc: Paul Mackerras
    Cc: "David S. Miller"
    Cc: Thomas Gleixner
    Cc: Ingo Molnar
    Cc: "H. Peter Anvin"
    Cc: Greg Ungerer
    Cc: Heiko Carstens
    Cc: Oleg Nesterov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Fabian Frederick
     
  • zswap_dstmem is a percpu block of memory, which should be allocated using
    kmalloc_node(), to get better NUMA locality.

    Without it, all the blocks are allocated from a single node.
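
    A sketch of the change inside the per-CPU setup path (names follow
    the description above; the size is zswap's two-page destination
    buffer):

        /* allocate each CPU's compression buffer on that CPU's node */
        u8 *dst = kmalloc_node(PAGE_SIZE * 2, GFP_KERNEL, cpu_to_node(cpu));
        if (!dst)
            return NOTIFY_BAD;
        per_cpu(zswap_dstmem, cpu) = dst;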

    Signed-off-by: Eric Dumazet
    Acked-by: Seth Jennings
    Acked-by: David Rientjes
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Eric Dumazet
     
    Now we can build zsmalloc as a module because unmap_kernel_range()
    was exported.

    Signed-off-by: Minchan Kim
    Cc: Nitin Gupta
    Cc: Sergey Senozhatsky
    Cc: Jerome Marchand
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Minchan Kim
     
    zsmalloc needs unmap_kernel_range() exported in order to be built as
    a module. See https://lkml.org/lkml/2013/1/18/487

    I didn't send a patch to make unmap_kernel_range() exportable at
    that time because zram was staging stuff, and I thought exporting VM
    functions for staging stuff made no sense.

    Now zsmalloc has been promoted. If we can't build zsmalloc as a
    module, we can't build zram as a module either. Additionally, its
    buddy map_vm_area() is already exported, so let's export
    unmap_kernel_range() to help its buddy.
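
    The change itself is a one-line export in mm/vmalloc.c (assuming a
    GPL-only export, as is typical for VM internals):

        EXPORT_SYMBOL_GPL(unmap_kernel_range);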

    Signed-off-by: Minchan Kim
    Cc: Nitin Gupta
    Cc: Sergey Senozhatsky
    Cc: Jerome Marchand
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Minchan Kim
     
    According to the calculation, the ZS_SIZE_CLASSES value is 255 on
    systems with a 4K page size, not 254. The old value apparently
    forgot to count the class at ZS_MIN_ALLOC_SIZE.

    This patch fixes this trivial issue in the comments.
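
    Rough arithmetic, assuming the usual zsmalloc constants on a 4K-page
    system (ZS_MIN_ALLOC_SIZE = 32, ZS_SIZE_CLASS_DELTA = 16, maximum
    allocation = PAGE_SIZE):

        (4096 - 32) / 16 + 1 = 254 + 1 = 255

    The "+ 1" is the size class at ZS_MIN_ALLOC_SIZE itself, which the
    old comment apparently dropped.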

    Signed-off-by: Weijie Yang
    Cc: Minchan Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Weijie Yang
     
    zbud_alloc() is only called by zswap_frontswap_store() with an
    unsigned int len. Change the function parameter accordingly and
    update the >= 0 check.

    Signed-off-by: Fabian Frederick
    Acked-by: Seth Jennings
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Fabian Frederick
     
    We want to skip a physical block (PAGE_SIZE) which is only partially
    covered by the discard bio, so we check the remaining size and
    subtract it if there is a need to move to the next physical block.

    The current offset usage in zram_bio_discard() is incorrect and will
    cause the filesystem on top of it to break down. Consider the
    following scenario:

    On some architecture or config, PAGE_SIZE is 64K, for example. A
    filesystem is set up on the zram disk without PAGE_SIZE alignment,
    and a discard bio arrives with offset = 4K and size = 72K. Normally
    it should not discard any physical block, as it only partially
    covers two physical blocks. With the current offset usage, however,
    it will discard the second physical block and free its memory, which
    will break the filesystem.

    This patch corrects the offset usage in zram_bio_discard.
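
    A sketch of the corrected logic (variable names follow the
    description above; this is not the verbatim hunk):

        if (offset) {
            /* the bio only partially covers the first physical block;
             * skip it rather than discarding it */
            if (n <= (PAGE_SIZE - offset))
                return;
            n -= (PAGE_SIZE - offset);
            index++;
        }
        while (n >= PAGE_SIZE) {
            /* every remaining full block is now safe to free */
            zram_free_page(zram, index);
            index++;
            n -= PAGE_SIZE;
        }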

    Signed-off-by: Weijie Yang
    Cc: Minchan Kim
    Cc: Nitin Gupta
    Acked-by: Joonsoo Kim
    Cc: Sergey Senozhatsky
    Cc: Bob Liu
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Weijie Yang
     
    mem_cgroup_force_empty_list() can iterate a large number of pages on
    an lru, and mem_cgroup_move_parent() doesn't return an errno unless
    certain criteria are met, none of which indicate that the iteration
    may be taking too long.

    We have encountered the following stack trace many times indicating
    "need_resched set for > 51000020 ns (51 ticks) without schedule", for
    example:

    scheduler_tick()
    mem_cgroup_move_account+0x4d/0x1d5
    mem_cgroup_move_parent+0x8d/0x109
    mem_cgroup_reparent_charges+0x149/0x2ba
    mem_cgroup_css_offline+0xeb/0x11b
    cgroup_offline_fn+0x68/0x16b
    process_one_work+0x129/0x350

    If this iteration is taking too long, we still need to do cond_resched()
    even when an individual page is not busy.
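
    A structural sketch of the fix, moving the reschedule out of the
    busy-page branch (the loop body is abbreviated to the relevant
    part):

        if (ret == -EBUSY || ret == -EINVAL) {
            /* found lock contention or "pc" is obsolete */
            busy = page;
        } else
            busy = NULL;
        cond_resched();    /* previously only reached on -EBUSY */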

    [rientjes@google.com: changelog]
    Signed-off-by: Hugh Dickins
    Signed-off-by: David Rientjes
    Acked-by: Johannes Weiner
    Acked-by: Michal Hocko
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     
    The existing description is worded in a way that almost encourages
    setting vfs_cache_pressure above 100, possibly way above it.

    Users are left in the dark about what this numeric value is: an int?
    A percentage? What is the scale?

    As a result, we are getting reports about noticeable performance
    degradation from users who have set vfs_cache_pressure to ridiculously
    high values - because they thought there is no downside to it.

    Via code inspection it's obvious that this value is treated as a
    percentage. This patch changes text to reflect this fact, and adds a
    cautionary paragraph advising against setting vfs_cache_pressure sky high.

    Signed-off-by: Denys Vlasenko
    Cc: Alexander Viro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Denys Vlasenko
     
    Currently the memory error handler handles action-optional errors in
    a deferred manner by default, and a recovery-aware application that
    wants to handle them immediately can do so by setting the
    PF_MCE_EARLY flag. However, such a signal can be sent only to the
    main thread, which is problematic if the application wants a
    dedicated thread to handle such signals.

    So this patch adds dedicated-thread support to the memory error
    handler. We have a PF_MCE_EARLY flag for each thread separately, so
    with this patch the AO signal is sent to the thread with
    PF_MCE_EARLY set, not to the main thread. To implement a dedicated
    thread, call prctl() to set PF_MCE_EARLY on that thread (see the
    sketch after Tony's note below).

    The memory error handler collects processes to be killed, so this
    patch makes it check the PF_MCE_EARLY flag on each thread in the
    collecting routines.

    No behavioral change for all non-early kill cases.

    Tony said:

    : The old behavior was crazy - someone with a multithreaded process might
    : well expect that if they call prctl(PF_MCE_EARLY) in just one thread, then
    : that thread would see the SIGBUS with si_code = BUS_MCEERR_A0 - even if
    : that thread wasn't the main thread for the process.
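
    For illustration, a minimal sketch of how a dedicated handler thread
    might opt in (assuming the thread installs its own SIGBUS handler
    separately):

        #include <sys/prctl.h>

        /* run in the dedicated memory-error handler thread */
        prctl(PR_MCE_KILL, PR_MCE_KILL_SET, PR_MCE_KILL_EARLY, 0, 0);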

    [akpm@linux-foundation.org: coding-style fixes]
    Signed-off-by: Naoya Horiguchi
    Reviewed-by: Tony Luck
    Cc: Kamil Iskra
    Cc: Andi Kleen
    Cc: Borislav Petkov
    Cc: Chen Gong
    Cc: [3.2+]
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Naoya Horiguchi
     
  • When Linux sees an "action optional" machine check (where h/w has reported
    an error that is not in the current execution path) we generally do not
    want to signal a process, since most processes do not have a SIGBUS
    handler - we'd just prematurely terminate the process for a problem that
    they might never actually see.

    task_early_kill() decides whether to consider a process - and it checks
    whether this specific process has been marked for early signals with
    "prctl", or if the system administrator has requested early signals for
    all processes using /proc/sys/vm/memory_failure_early_kill.

    But for the MF_ACTION_REQUIRED case we must not defer. The error is
    in the execution path of the current thread, so we must send the
    SIGBUS immediately.

    Fix by passing a flag argument through collect_procs*() to
    task_early_kill() so it knows whether we can defer or must take action.
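
    A sketch of the resulting helper (close to, but not necessarily
    verbatim, the kernel code):

        static int task_early_kill(struct task_struct *tsk, int force_early)
        {
            if (!tsk->mm)
                return 0;
            if (force_early)
                return 1;       /* MF_ACTION_REQUIRED: never defer */
            if (tsk->flags & PF_MCE_PROCESS)
                return !!(tsk->flags & PF_MCE_EARLY);
            return sysctl_memory_failure_early_kill;
        }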

    Signed-off-by: Tony Luck
    Signed-off-by: Naoya Horiguchi
    Cc: Andi Kleen
    Cc: Borislav Petkov
    Cc: Chen Gong
    Cc: [3.2+]
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Tony Luck
     
  • When a thread in a multi-threaded application hits a machine check because
    of an uncorrectable error in memory - we want to send the SIGBUS with
    si.si_code = BUS_MCEERR_AR to that thread. Currently we fail to do that
    if the active thread is not the primary thread in the process.
    collect_procs() just finds primary threads and this test:

    if ((flags & MF_ACTION_REQUIRED) && t == current) {

    will see that the thread we found isn't the current thread, and so
    it sends si.si_code = BUS_MCEERR_AO to the primary thread (and
    nothing to the active thread at this time).

    We can fix this by checking whether "current" shares the same mm with the
    process that collect_procs() said owned the page. If so, we send the
    SIGBUS to current (with code BUS_MCEERR_AR).
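
    A sketch of the fixed test inside the signalling path (hedged; t is
    the thread that collect_procs() found):

        if ((flags & MF_ACTION_REQUIRED) && t->mm == current->mm) {
            /* the active thread shares the mm that owns the page */
            si.si_code = BUS_MCEERR_AR;
            ret = force_sig_info(SIGBUS, &si, current);
        } else {
            si.si_code = BUS_MCEERR_AO;
            ret = send_sig_info(SIGBUS, &si, t);
        }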

    Signed-off-by: Tony Luck
    Signed-off-by: Naoya Horiguchi
    Reported-by: Otto Bruggeman
    Cc: Andi Kleen
    Cc: Borislav Petkov
    Cc: Chen Gong
    Cc: [3.2+]
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Tony Luck
     
    There is an orphaned prehistoric comment which used to sit against
    get_dirty_limits(), the forerunner of global_dirtyable_memory().

    Back then the implementation of get_dirty_limits() was complicated
    and full of magic numbers, so the comment was necessary. But we now
    use the clear and neat global_dirtyable_memory(), which renders the
    comment ambiguous and useless. Remove it.

    Signed-off-by: Jianyu Zhan
    Acked-by: Johannes Weiner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jianyu Zhan
     
    Via commit ebc2a1a69111 ("swap: make cluster allocation per-cpu"),
    all SWP_SOLIDSTATE "seek is cheap" (SSD) allocations already go
    through the si->cluster_info scan_swap_map_try_ssd_cluster() path,
    so the "last_in_cluster < scan_base" loop in the body of
    scan_swap_map() has become a dead code snippet and should be
    deleted.

    This patch deletes the redundant loop, as Hugh and Shaohua
    suggested.

    [hughd@google.com: fix comment, simplify code]
    Signed-off-by: Chen Yucong
    Cc: Shaohua Li
    Acked-by: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Chen Yucong
     
    We already have a function named hugepages_supported(), and the
    similarly named hugepage_migration_support() sits a bit awkwardly
    next to it, so let's rename it to hugepage_migration_supported().

    Signed-off-by: Naoya Horiguchi
    Acked-by: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Naoya Horiguchi
     
  • Some clarification on how faultaround works.

    [akpm@linux-foundation.org: tweak comment text]
    Signed-off-by: Kirill A. Shutemov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kirill A. Shutemov
     
    There is evidence that the faultaround feature is less relevant on
    architectures with a page size bigger than 4k, which makes sense
    since the page fault overhead per byte of mapped area should be
    lower there.

    Let's rework the feature to specify faultaround area in bytes instead of
    page order. It's 64 kilobytes for now.

    The patch effectively disables faultaround on architectures with page size
    >= 64k (like ppc64).

    It's possible that some other faultaround area size is relevant for
    a platform. We can expose the `fault_around_bytes' variable to
    arch-specific code once such platforms are found.
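
    A sketch of the reworked knob (matching the 64-kilobyte default
    described above):

        static unsigned long fault_around_bytes = 65536;

        static inline unsigned long fault_around_pages(void)
        {
            /* on 64K-page systems this rounds down to a single page,
             * i.e. faultaround is effectively disabled */
            return rounddown_pow_of_two(fault_around_bytes) / PAGE_SIZE;
        }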

    Signed-off-by: Kirill A. Shutemov
    Cc: Rusty Russell
    Cc: Hugh Dickins
    Cc: Madhavan Srinivasan
    Cc: Benjamin Herrenschmidt
    Cc: Paul Mackerras
    Cc: Rik van Riel
    Cc: Mel Gorman
    Cc: Andi Kleen
    Cc: Peter Zijlstra
    Cc: Ingo Molnar
    Cc: Dave Hansen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kirill A. Shutemov
     
    add_active_range() has been replaced by memblock_set_node(). Clean
    up the comments to reflect that change.

    Signed-off-by: Zhang Zhen
    Cc: Tejun Heo
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Zhang Zhen
     
    Transform the action part of ttu_flags into individual bits. These
    flags aren't part of any user-space visible API or even trace
    events; a sketch of the resulting layout follows.
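
    As a sketch, the action modes become individual low bits while the
    modifier flags stay in the high bits (values hedged to the resulting
    header):

        enum ttu_flags {
            TTU_UNMAP = 1,                   /* unmap mode */
            TTU_MIGRATION = 2,               /* migration mode */
            TTU_MUNLOCK = 4,                 /* munlock mode */

            TTU_IGNORE_MLOCK = (1 << 8),     /* ignore mlock */
            TTU_IGNORE_ACCESS = (1 << 9),    /* don't age */
            TTU_IGNORE_HWPOISON = (1 << 10), /* corrupted page is recoverable */
        };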

    Signed-off-by: Konstantin Khlebnikov
    Cc: Rik van Riel
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Konstantin Khlebnikov
     
    In its munlock mode, try_to_unmap_one() searches other mlocked vmas;
    it never unmaps pages. There is no reason for invalidation because
    the ptes are left unchanged.

    Signed-off-by: Konstantin Khlebnikov
    Cc: Rik van Riel
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Konstantin Khlebnikov
     
    CONFIG_CROSS_MEMORY_ATTACH adds a couple of syscalls,
    process_vm_readv and process_vm_writev: a kind of IPC for copying
    data between processes. Currently this option is placed inside
    "Processor type and features".

    This patch moves it into "General setup" (where all the other
    arch-independent syscalls and IPC features are placed) and changes
    the prompt string to something less cryptic.

    Signed-off-by: Konstantin Khlebnikov
    Cc: Christopher Yeoh
    Cc: Davidlohr Bueso
    Cc: Hugh Dickins
    Cc: Al Viro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Konstantin Khlebnikov
     
  • Commit "mm: vmscan: obey proportional scanning requirements for kswapd"
    ensured that file/anon lists were scanned proportionally for reclaim from
    kswapd but ignored it for direct reclaim. The intent was to minimse
    direct reclaim latency but Yuanhan Liu pointer out that it substitutes one
    long stall for many small stalls and distorts aging for normal workloads
    like streaming readers/writers. Hugh Dickins pointed out that a
    side-effect of the same commit was that when one LRU list dropped to zero
    that the entirety of the other list was shrunk leading to excessive
    reclaim in memcgs. This patch scans the file/anon lists proportionally
    for direct reclaim to similarly age page whether reclaimed by kswapd or
    direct reclaim but takes care to abort reclaim if one LRU drops to zero
    after reclaiming the requested number of pages.

    Based on ext4 and using the Intel VM scalability test

    3.15.0-rc5 (shrinker)   3.15.0-rc5 (proportion)
    Unit lru-file-readonce elapsed 5.3500 ( 0.00%) 5.4200 ( -1.31%)
    Unit lru-file-readonce time_range 0.2700 ( 0.00%) 0.1400 ( 48.15%)
    Unit lru-file-readonce time_stddv 0.1148 ( 0.00%) 0.0536 ( 53.33%)
    Unit lru-file-readtwice elapsed 8.1700 ( 0.00%) 8.1700 ( 0.00%)
    Unit lru-file-readtwice time_range 0.4300 ( 0.00%) 0.2300 ( 46.51%)
    Unit lru-file-readtwice time_stddv 0.1650 ( 0.00%) 0.0971 ( 41.16%)

    The test cases run multiple dd instances reading sparse files. The
    results are within the noise for this small test machine. The
    impact of the patch is more noticeable in the vmstats:

    3.15.0-rc5 (shrinker)   3.15.0-rc5 (proportion)
    Minor Faults 35154 36784
    Major Faults 611 1305
    Swap Ins 394 1651
    Swap Outs 4394 5891
    Allocation stalls 118616 44781
    Direct pages scanned 4935171 4602313
    Kswapd pages scanned 15921292 16258483
    Kswapd pages reclaimed 15913301 16248305
    Direct pages reclaimed 4933368 4601133
    Kswapd efficiency 99% 99%
    Kswapd velocity 670088.047 682555.961
    Direct efficiency 99% 99%
    Direct velocity 207709.217 193212.133
    Percentage direct scans 23% 22%
    Page writes by reclaim 4858.000 6232.000
    Page writes file 464 341
    Page writes anon 4394 5891

    Note that there are fewer allocation stalls even though the amount
    of direct reclaim scanning is very approximately the same.

    Signed-off-by: Mel Gorman
    Cc: Johannes Weiner
    Cc: Hugh Dickins
    Cc: Tim Chen
    Cc: Dave Chinner
    Tested-by: Yuanhan Liu
    Cc: Bob Liu
    Cc: Jan Kara
    Cc: Rik van Riel
    Cc: Al Viro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mel Gorman
     
    We remove the call to grab_super_passive() in super_cache_count().
    This had become a scalability bottleneck when multiple threads were
    doing memory reclamation, e.g. during large amounts of file reads
    while the page cache is under pressure: the cached objects quickly
    get reclaimed down to 0 and we abort the cache_scan() reclaim, but
    the counting creates a logjam acquiring the sb_lock.

    We are holding the shrinker_rwsem, which ensures the safety of the
    calls to list_lru_count_node() and s_op->nr_cached_objects. The
    shrinker is now unregistered before ->kill_sb(), so the operation is
    safe during unmount (see the sketch below).
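
    A sketch of the resulting counting path (trimmed; assumes the
    shrinker is embedded in the superblock as s_shrink):

        static unsigned long super_cache_count(struct shrinker *shrink,
                                               struct shrink_control *sc)
        {
            struct super_block *sb =
                container_of(shrink, struct super_block, s_shrink);
            long total_objects = 0;

            /* previously: if (!grab_super_passive(sb)) return 0;
             * shrinker_rwsem, held by our caller, now keeps sb alive */
            if (sb->s_op && sb->s_op->nr_cached_objects)
                total_objects = sb->s_op->nr_cached_objects(sb, sc->nid);

            total_objects += list_lru_count_node(&sb->s_dentry_lru, sc->nid);
            total_objects += list_lru_count_node(&sb->s_inode_lru, sc->nid);

            return vfs_pressure_ratio(total_objects);
        }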

    The impact will depend heavily on the machine and the workload but for a
    small machine using postmark tuned to use 4xRAM size the results were

    3.15.0-rc5 (vanilla)   3.15.0-rc5 (shrinker-v1r1)
    Ops/sec Transactions 21.00 ( 0.00%) 24.00 ( 14.29%)
    Ops/sec FilesCreate 39.00 ( 0.00%) 44.00 ( 12.82%)
    Ops/sec CreateTransact 10.00 ( 0.00%) 12.00 ( 20.00%)
    Ops/sec FilesDeleted 6202.00 ( 0.00%) 6202.00 ( 0.00%)
    Ops/sec DeleteTransact 11.00 ( 0.00%) 12.00 ( 9.09%)
    Ops/sec DataRead/MB 25.97 ( 0.00%) 29.10 ( 12.05%)
    Ops/sec DataWrite/MB 49.99 ( 0.00%) 56.02 ( 12.06%)

    ffsb running in a configuration that is meant to simulate a mail server showed

    3.15.0-rc5 (vanilla)   3.15.0-rc5 (shrinker-v1r1)
    Ops/sec readall 9402.63 ( 0.00%) 9567.97 ( 1.76%)
    Ops/sec create 4695.45 ( 0.00%) 4735.00 ( 0.84%)
    Ops/sec delete 173.72 ( 0.00%) 179.83 ( 3.52%)
    Ops/sec Transactions 14271.80 ( 0.00%) 14482.81 ( 1.48%)
    Ops/sec Read 37.00 ( 0.00%) 37.60 ( 1.62%)
    Ops/sec Write 18.20 ( 0.00%) 18.30 ( 0.55%)

    Signed-off-by: Tim Chen
    Signed-off-by: Mel Gorman
    Cc: Johannes Weiner
    Cc: Hugh Dickins
    Cc: Dave Chinner
    Tested-by: Yuanhan Liu
    Cc: Bob Liu
    Cc: Jan Kara
    Acked-by: Rik van Riel
    Cc: Al Viro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Tim Chen
     
    This series is aimed at regressions noticed during reclaim activity.
    The first two patches are shrinker patches that were posted ages ago
    but never merged, for reasons that are unclear to me. I'm posting
    them again to see if there was a reason they were dropped or if they
    just got lost. Dave? Tim? The last patch adjusts proportional
    reclaim. Yuanhan Liu, can you retest the vm scalability test cases
    on a larger machine? Hugh, does this work for you on the memcg test
    cases?

    Based on ext4, I get the following results but unfortunately my larger
    test machines are all unavailable so this is based on a relatively small
    machine.

    postmark
    3.15.0-rc5 (vanilla)   3.15.0-rc5 (proportion-v1r4)
    Ops/sec Transactions 21.00 ( 0.00%) 25.00 ( 19.05%)
    Ops/sec FilesCreate 39.00 ( 0.00%) 45.00 ( 15.38%)
    Ops/sec CreateTransact 10.00 ( 0.00%) 12.00 ( 20.00%)
    Ops/sec FilesDeleted 6202.00 ( 0.00%) 6202.00 ( 0.00%)
    Ops/sec DeleteTransact 11.00 ( 0.00%) 12.00 ( 9.09%)
    Ops/sec DataRead/MB 25.97 ( 0.00%) 30.02 ( 15.59%)
    Ops/sec DataWrite/MB 49.99 ( 0.00%) 57.78 ( 15.58%)

    ffsb (mail server simulator)
    3.15.0-rc5 (vanilla)   3.15.0-rc5 (proportion-v1r4)
    Ops/sec readall 9402.63 ( 0.00%) 9805.74 ( 4.29%)
    Ops/sec create 4695.45 ( 0.00%) 4781.39 ( 1.83%)
    Ops/sec delete 173.72 ( 0.00%) 177.23 ( 2.02%)
    Ops/sec Transactions 14271.80 ( 0.00%) 14764.37 ( 3.45%)
    Ops/sec Read 37.00 ( 0.00%) 38.50 ( 4.05%)
    Ops/sec Write 18.20 ( 0.00%) 18.50 ( 1.65%)

    dd of a large file
    3.15.0-rc5 (vanilla)   3.15.0-rc5 (proportion-v1r4)
    WallTime DownloadTar 75.00 ( 0.00%) 61.00 ( 18.67%)
    WallTime DD 423.00 ( 0.00%) 401.00 ( 5.20%)
    WallTime Delete 2.00 ( 0.00%) 5.00 (-150.00%)

    stutter (times mmap latency during large amounts of IO)

    3.15.0-rc5 (vanilla)   3.15.0-rc5 (proportion-v1r4)
    Unit >5ms Delays 80252.0000 ( 0.00%) 81523.0000 ( -1.58%)
    Unit Mmap min 8.2118 ( 0.00%) 8.3206 ( -1.33%)
    Unit Mmap mean 17.4614 ( 0.00%) 17.2868 ( 1.00%)
    Unit Mmap stddev 24.9059 ( 0.00%) 34.6771 (-39.23%)
    Unit Mmap max 2811.6433 ( 0.00%) 2645.1398 ( 5.92%)
    Unit Mmap 90% 20.5098 ( 0.00%) 18.3105 ( 10.72%)
    Unit Mmap 93% 22.9180 ( 0.00%) 20.1751 ( 11.97%)
    Unit Mmap 95% 25.2114 ( 0.00%) 22.4988 ( 10.76%)
    Unit Mmap 99% 46.1430 ( 0.00%) 43.5952 ( 5.52%)
    Unit Ideal Tput 85.2623 ( 0.00%) 78.8906 ( 7.47%)
    Unit Tput min 44.0666 ( 0.00%) 43.9609 ( 0.24%)
    Unit Tput mean 45.5646 ( 0.00%) 45.2009 ( 0.80%)
    Unit Tput stddev 0.9318 ( 0.00%) 1.1084 (-18.95%)
    Unit Tput max 46.7375 ( 0.00%) 46.7539 ( -0.04%)

    This patch (of 3):

    We would like to unregister the sb shrinker before ->kill_sb(). This
    allows cached objects to be counted without a call to
    grab_super_passive() to update the ref count on the sb. We want to
    avoid locking during memory reclamation, especially when we are
    skipping the memory reclaim because we are out of cached objects.

    This is safe because grab_super_passive() now does a try-lock on
    sb->s_umount, so if we are in the unmount process it won't ever
    block. That means the deadlock and the races we used to avoid with
    grab_super_passive() now look like this:

    shrinker                               umount

    down_read(shrinker_rwsem)
                                           down_write(sb->s_umount)
                                           shrinker_unregister
                                             down_write(shrinker_rwsem)
    grab_super_passive(sb)
      down_read_trylock(sb->s_umount)

    ....
    up_read(shrinker_rwsem)

                                             up_write(shrinker_rwsem)
                                           ->kill_sb()
                                           ....

    So it is safe to deregister the shrinker before ->kill_sb().
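
    The reordering itself is small; a sketch of deactivate_locked_super()
    after the change:

        /* unregister the shrinker first, so ->kill_sb() can never race
         * with a concurrent count/scan on this superblock */
        unregister_shrinker(&s->s_shrink);
        fs->kill_sb(s);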

    Signed-off-by: Tim Chen
    Signed-off-by: Mel Gorman
    Cc: Johannes Weiner
    Cc: Hugh Dickins
    Cc: Dave Chinner
    Tested-by: Yuanhan Liu
    Cc: Bob Liu
    Cc: Jan Kara
    Acked-by: Rik van Riel
    Cc: Al Viro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dave Chinner
     
  • Signed-off-by: Kirill A. Shutemov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kirill A. Shutemov
     
  • msync() currently syncs more than POSIX requires or BSD or Solaris
    implement. It is supposed to be equivalent to fdatasync(), not fsync(),
    and it is only supposed to sync the portion of the file that overlaps the
    range passed to msync.

    If the VMA is non-linear, fall back to syncing the entire file, but we
    still optimise to only fdatasync() the entire file, not the full fsync().

    akpm: there are obvious concerns with back-compatibility: is anyone
    relying on the undocumented side-effect for their data integrity?
    And how would they ever know if this change broke their data
    integrity?

    We think the risk is reasonably low, and this patch brings the kernel into
    line with other OS's and with what the manpage has always said...
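
    For illustration, a minimal (hypothetical) user of the POSIX
    semantics; only pages overlapping the flushed range are written
    back, with fdatasync-like behaviour. Error handling is omitted and
    data.bin is assumed to be at least four pages long:

        #include <fcntl.h>
        #include <string.h>
        #include <unistd.h>
        #include <sys/mman.h>

        int main(void)
        {
            long pg = sysconf(_SC_PAGESIZE);
            int fd = open("data.bin", O_RDWR);
            char *map = mmap(NULL, 4 * pg, PROT_READ | PROT_WRITE,
                             MAP_SHARED, fd, 0);

            memcpy(map + pg, "hello", 5);
            /* addr must be page-aligned; the length is rounded up to
             * page granularity by the kernel */
            msync(map + pg, 5, MS_SYNC);

            munmap(map, 4 * pg);
            close(fd);
            return 0;
        }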

    Signed-off-by: Matthew Wilcox
    Reviewed-by: Christoph Hellwig
    Acked-by: Jeff Moyer
    Cc: Chris Mason
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Matthew Wilcox
     
    Remove the unused global variable mce_entry and the related
    operations in do_machine_check().

    Signed-off-by: Chen Yucong
    Cc: Naoya Horiguchi
    Cc: Wu Fengguang
    Cc: Andi Kleen
    Cc: Ingo Molnar
    Cc: Thomas Gleixner
    Cc: "H. Peter Anvin"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Chen Yucong
     
  • Compaction uses compact_checklock_irqsave() function to periodically check
    for lock contention and need_resched() to either abort async compaction,
    or to free the lock, schedule and retake the lock. When aborting,
    cc->contended is set to signal the contended state to the caller. Two
    problems have been identified in this mechanism.

    First, compaction also calls cond_resched() directly in both
    scanners when no lock is yet taken. This call neither aborts async
    compaction nor sets cc->contended appropriately. This patch
    introduces a new compact_should_abort() function to achieve both.
    In isolate_freepages(), the check frequency is reduced to once per
    SWAP_CLUSTER_MAX pageblocks to match what the migration scanner does
    in the preliminary page checks. In case a pageblock is found
    suitable for calling isolate_freepages_block(), the checks within
    there are done at a higher frequency.

    Second, isolate_freepages() does not check if isolate_freepages_block()
    aborted due to contention, and advances to the next pageblock. This
    violates the principle of aborting on contention, and might result in
    pageblocks not being scanned completely, since the scanning cursor is
    advanced. This problem has been noticed in the code by Joonsoo Kim when
    reviewing related patches. This patch makes isolate_freepages_block()
    check the cc->contended flag and abort.

    In case isolate_freepages() has already isolated some pages before
    aborting due to contention, page migration will proceed, which is OK
    since we do not want to waste the work that has been done, and page
    migration has its own checks for contention. However, we do not
    want another isolation attempt by either of the scanners, so a
    cc->contended flag check is added also to compaction_alloc() and
    compact_finished() to make sure compaction is aborted right after
    the migration.

    The outcome of the patch should be reduced lock contention by async
    compaction and lower latencies for higher-order allocations where direct
    compaction is involved.
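
    A sketch of the new helper (field and mode names hedged to the
    description above, assuming the compaction mode is tracked in
    cc->mode):

        static inline bool compact_should_abort(struct compact_control *cc)
        {
            /* async compaction aborts instead of rescheduling */
            if (need_resched()) {
                if (cc->mode == MIGRATE_ASYNC) {
                    cc->contended = true;
                    return true;
                }
                cond_resched();
            }
            return false;
        }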

    [akpm@linux-foundation.org: fix typo in comment]
    Reported-by: Joonsoo Kim
    Signed-off-by: Vlastimil Babka
    Reviewed-by: Naoya Horiguchi
    Cc: Minchan Kim
    Cc: Mel Gorman
    Cc: Bartlomiej Zolnierkiewicz
    Cc: Michal Nazarewicz
    Cc: Christoph Lameter
    Cc: Rik van Riel
    Acked-by: Michal Nazarewicz
    Tested-by: Shawn Guo
    Tested-by: Kevin Hilman
    Tested-by: Stephen Warren
    Tested-by: Fabio Estevam
    Cc: David Rientjes
    Cc: Stephen Rothwell
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vlastimil Babka
     
  • Fix checkpatch warning:
    WARNING: kfree(NULL) is safe this check is probably not required

    Signed-off-by: Fabian Frederick
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Fabian Frederick