Eric Lee / smarc-ti-linux-kernel | Embedian Git Server

04 Jun, 2014

1 commit

0c188a07b perf probe: Fix a segfault if asked for variable it doesn't find ... Browse Code »

Fix a segfault bug by asking for variable it doesn't find.
Since the convert_variable() didn't handle error code returned
from convert_variable_location(), it just passed an incomplete
variable field and then a segfault was occurred when formatting
the field.

This fixes that bug by handling success code correctly in
convert_variable(). Other callers of convert_variable_location()
are correctly checking the return code.

This bug was introduced by following commit. But another hidden
erroneous error handling has been there previously (-ENOMEM case).

commit 3d918a12a1b3088ac16ff37fa52760639d6e2403

Signed-off-by: Masami Hiramatsu
Reported-by: Arnaldo Carvalho de Melo
Tested-by: Arnaldo Carvalho de Melo
Cc: Peter Zijlstra
Cc: Paul Mackerras
Cc: Ingo Molnar
Cc: Namhyung Kim
Link: http://lkml.kernel.org/r/20140529105232.28251.30447.stgit@ltc230.yrl.intra.hitachi.co.jp
Signed-off-by: Jiri Olsa

Masami Hiramatsu
2014-06-04 20:48:03 +0800

19 May, 2014

3 commits

b69cf5364 perf: Fix a race between ring_buffer_detach() and ring_buffer_attach() ... Browse Code »
13

Alexander noticed that we use RCU iteration on rb->event_list but do
not use list_{add,del}_rcu() to add,remove entries to that list, nor
do we observe proper grace periods when re-using the entries.

Merge ring_buffer_detach() into ring_buffer_attach() such that
attaching to the NULL buffer is detaching.

Furthermore, ensure that between any 'detach' and 'attach' of the same
event we observe the required grace period, but only when strictly
required. In effect this means that only ioctl(.request =
PERF_EVENT_IOC_SET_OUTPUT) will wait for a grace period, while the
normal initial attach and final detach will not be delayed.

This patch should, I think, do the right thing under all
circumstances, the 'normal' cases all should never see the extra grace
period, but the two cases:

1) PERF_EVENT_IOC_SET_OUTPUT on an event which already has a
ring_buffer set, will now observe the required grace period between
removing itself from the old and attaching itself to the new buffer.

This case is 'simple' in that both buffers are present in
perf_event_set_output() one could think an unconditional
synchronize_rcu() would be sufficient; however...

2) an event that has a buffer attached, the buffer is destroyed
(munmap) and then the event is attached to a new/different buffer
using PERF_EVENT_IOC_SET_OUTPUT.

This case is more complex because the buffer destruction does:
ring_buffer_attach(.rb = NULL)
followed by the ioctl() doing:
ring_buffer_attach(.rb = foo);

and we still need to observe the grace period between these two
calls due to us reusing the event->rb_entry list_head.

In order to make 2 happen we use Paul's latest cond_synchronize_rcu()
call.

Cc: Paul Mackerras
Cc: Stephane Eranian
Cc: Andi Kleen
Cc: "Paul E. McKenney"
Cc: Ingo Molnar
Cc: Frederic Weisbecker
Cc: Mike Galbraith
Reported-by: Alexander Shishkin
Signed-off-by: Peter Zijlstra
Link: http://lkml.kernel.org/r/20140507123526.GD13658@twins.programming.kicks-ass.net
Signed-off-by: Thomas Gleixner

Peter Zijlstra
2014-05-19 20:44:56 +0800
39af6b167 perf: Prevent false warning in perf_swevent_add ... Browse Code »
6

The perf cpu offline callback takes down all cpu context
events and releases swhash->swevent_hlist.

This could race with task context software event being just
scheduled on this cpu via perf_swevent_add while cpu hotplug
code already cleaned up event's data.

The race happens in the gap between the cpu notifier code
and the cpu being actually taken down. Note that only cpu
ctx events are terminated in the perf cpu hotplug code.

It's easily reproduced with:
$ perf record -e faults perf bench sched pipe

while putting one of the cpus offline:
# echo 0 > /sys/devices/system/cpu/cpu1/online

Console emits following warning:
WARNING: CPU: 1 PID: 2845 at kernel/events/core.c:5672 perf_swevent_add+0x18d/0x1a0()
Modules linked in:
CPU: 1 PID: 2845 Comm: sched-pipe Tainted: G W 3.14.0+ #256
Hardware name: Intel Corporation Montevina platform/To be filled by O.E.M., BIOS AMVACRB1.86C.0066.B00.0805070703 05/07/2008
0000000000000009 ffff880077233ab8 ffffffff81665a23 0000000000200005
0000000000000000 ffff880077233af8 ffffffff8104732c 0000000000000046
ffff88007467c800 0000000000000002 ffff88007a9cf2a0 0000000000000001
Call Trace:
[] dump_stack+0x4f/0x7c
[] warn_slowpath_common+0x8c/0xc0
[] warn_slowpath_null+0x1a/0x20
[] perf_swevent_add+0x18d/0x1a0
[] event_sched_in.isra.75+0x9e/0x1f0
[] group_sched_in+0x6a/0x1f0
[] ? sched_clock_local+0x25/0xa0
[] ctx_sched_in+0x1f6/0x450
[] perf_event_sched_in+0x6b/0xa0
[] perf_event_context_sched_in+0x7b/0xc0
[] __perf_event_task_sched_in+0x43e/0x460
[] ? put_lock_stats.isra.18+0xe/0x30
[] finish_task_switch+0xb8/0x100
[] __schedule+0x30e/0xad0
[] ? pipe_read+0x3e2/0x560
[] ? preempt_schedule_irq+0x3e/0x70
[] ? preempt_schedule_irq+0x3e/0x70
[] preempt_schedule_irq+0x44/0x70
[] retint_kernel+0x20/0x30
[] ? lockdep_sys_exit+0x1a/0x90
[] lockdep_sys_exit_thunk+0x35/0x67
[] ? sysret_check+0x5/0x56

Fixing this by tracking the cpu hotplug state and displaying
the WARN only if current cpu is initialized properly.

Cc: Corey Ashford
Cc: Frederic Weisbecker
Cc: Ingo Molnar
Cc: Paul Mackerras
Cc: Arnaldo Carvalho de Melo
Cc: stable@vger.kernel.org
Reported-by: Fengguang Wu
Signed-off-by: Jiri Olsa
Signed-off-by: Peter Zijlstra
Link: http://lkml.kernel.org/r/1396861448-10097-1-git-send-email-jolsa@redhat.com
Signed-off-by: Thomas Gleixner

Jiri Olsa
2014-05-19 20:44:55 +0800
0819b2e30 perf: Limit perf_event_attr::sample_period to 63 bits ... Browse Code »
6

Vince reported that using a large sample_period (one with bit 63 set)
results in wreckage since while the sample_period is fundamentally
unsigned (negative periods don't make sense) the way we implement
things very much rely on signed logic.

So limit sample_period to 63 bits to avoid tripping over this.

Reported-by: Vince Weaver
Signed-off-by: Peter Zijlstra
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/n/tip-p25fhunibl4y3qi0zuqmyf4b@git.kernel.org
Signed-off-by: Thomas Gleixner

Peter Zijlstra
2014-05-19 20:44:55 +0800

11 May, 2014

1 commit

a45e90384 Merge branch 'liblockdep-fixes-3.15' of git://git.kernel.org/pub/scm/linux/kerne… ... Browse Code »

…l/git/sashal/linux into perf/urgent

Pull liblockdep fixes from Sasha Levin, to fix two build related bugs.

Signed-off-by: Ingo Molnar <mingo@kernel.org>

Ingo Molnar
2014-05-11 13:23:23 +0800

09 May, 2014

2 commits

ad3b564de tools/liblockdep: Remove all build files when doing make clean ... Browse Code »

We forgot to remove the shared library with the version number when
'make clean' ran, fix the clean pattern.

Signed-off-by: Sasha Levin

Sasha Levin
2014-05-09 01:55:13 +0800
0041898ec tools/liblockdep: Build liblockdep from tools/Makefile ... Browse Code »

add targets to build liblockdep with
make -C tools liblockdep
like the way other stuff under tools/ can be built

Signed-off-by: S. Lockwood-Childs
Signed-off-by: Sasha Levin

S. Lockwood-Childs
2014-05-09 01:34:45 +0800

07 May, 2014

22 commits

a4b4f11b2 perf/x86/intel: Fix Silvermont's event constraints ... Browse Code »

Event 0x013c is not the same as fixed counter2, remove it from
Silvermont's event constraints.

Signed-off-by: Yan, Zheng
Signed-off-by: Peter Zijlstra
Cc: Arnaldo Carvalho de Melo
Cc: Stephane Eranian
Link: http://lkml.kernel.org/r/1398755081-12471-1-git-send-email-zheng.z.yan@intel.com
Signed-off-by: Ingo Molnar

Yan, Zheng
2014-05-07 17:33:16 +0800
ffb4ef21a perf: Fix perf_event_init_context() ... Browse Code »

perf_pin_task_context() can return NULL but perf_event_init_context()
assumes it will not, correct this.

Reported-by: Vince Weaver
Signed-off-by: Peter Zijlstra
Cc: Arnaldo Carvalho de Melo
Link: http://lkml.kernel.org/r/20140505171428.GU26782@laptop.programming.kicks-ass.net
Signed-off-by: Ingo Molnar

Peter Zijlstra
2014-05-07 17:33:15 +0800
46ce0fe97 perf: Fix race in removing an event ... Browse Code »
6

When removing a (sibling) event we do:

raw_spin_lock_irq(&ctx->lock);
perf_group_detach(event);
raw_spin_unlock_irq(&ctx->lock);

perf_remove_from_context(event);
raw_spin_lock_irq(&ctx->lock);
...
raw_spin_unlock_irq(&ctx->lock);

Now, assuming the event is a sibling, it will be 'unreachable' for
things like ctx_sched_out() because that iterates the
groups->siblings, and we just unhooked the sibling.

So, if during we get ctx_sched_out(), it will miss the event
and not call event_sched_out() on it, leaving it programmed on the
PMU.

The subsequent perf_remove_from_context() call will find the ctx is
inactive and only call list_del_event() to remove the event from all
other lists.

Hereafter we can proceed to free the event; while still programmed!

Close this hole by moving perf_group_detach() inside the same
ctx->lock region(s) perf_remove_from_context() has.

The condition on inherited events only in __perf_event_exit_task() is
likely complete crap because non-inherited events are part of groups
too and we're tearing down just the same. But leave that for another
patch.

Most-likely-Fixes: e03a9a55b4e ("perf: Change close() semantics for group events")
Reported-by: Vince Weaver
Tested-by: Vince Weaver
Much-staring-at-traces-by: Vince Weaver
Much-staring-at-traces-by: Thomas Gleixner
Cc: Arnaldo Carvalho de Melo
Cc: Linus Torvalds
Signed-off-by: Peter Zijlstra
Link: http://lkml.kernel.org/r/20140505093124.GN17778@laptop.programming.kicks-ass.net
Signed-off-by: Ingo Molnar

Peter Zijlstra
2014-05-07 17:33:14 +0800
38583f095 Merge branch 'akpm' (incoming from Andrew) ... Browse Code »

Merge misc fixes from Andrew Morton:
"13 fixes"

* emailed patches from Andrew Morton :
agp: info leak in agpioc_info_wrap()
fs/affs/super.c: bugfix / double free
fanotify: fix -EOVERFLOW with large files on 64-bit
slub: use sysfs'es release mechanism for kmem_cache
revert "mm: vmscan: do not swap anon pages just because free+file is low"
autofs: fix lockref lookup
mm: filemap: update find_get_pages_tag() to deal with shadow entries
mm/compaction: make isolate_freepages start at pageblock boundary
MAINTAINERS: zswap/zbud: change maintainer email address
mm/page-writeback.c: fix divide by zero in pos_ratio_polynom
hugetlb: ensure hugepage access is denied if hugepages are not supported
slub: fix memcg_propagate_slab_attrs
drivers/rtc/rtc-pcf8523.c: fix month definition

Linus Torvalds
2014-05-07 04:07:41 +0800
3ca9e5d36 agp: info leak in agpioc_info_wrap() ... Browse Code »

On 64 bit systems the agp_info struct has a 4 byte hole between
->agp_mode and ->aper_base. We need to clear it to avoid disclosing
stack information to userspace.

Signed-off-by: Dan Carpenter
Cc: David Airlie
Cc: Daniel Vetter
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Dan Carpenter
2014-05-07 04:05:00 +0800
d353efd02 fs/affs/super.c: bugfix / double free ... Browse Code »
5

Commit 842a859db26b ("affs: use ->kill_sb() to simplify ->put_super()
and failure exits of ->mount()") adds .kill_sb which frees sbi but
doesn't remove sbi free in case of parse_options error causing double
free+random crash.

Signed-off-by: Fabian Frederick
Cc: Alexander Viro
Cc: [3.14.x]
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Fabian Frederick
2014-05-07 04:05:00 +0800
1e2ee49f7 fanotify: fix -EOVERFLOW with large files on 64-bit ... Browse Code »

On 64-bit systems, O_LARGEFILE is automatically added to flags inside
the open() syscall (also openat(), blkdev_open(), etc). Userspace
therefore defines O_LARGEFILE to be 0 - you can use it, but it's a
no-op. Everything should be O_LARGEFILE by default.

But: when fanotify does create_fd() it uses dentry_open(), which skips
all that. And userspace can't set O_LARGEFILE in fanotify_init()
because it's defined to 0. So if fanotify gets an event regarding a
large file, the read() will just fail with -EOVERFLOW.

This patch adds O_LARGEFILE to fanotify_init()'s event_f_flags on 64-bit
systems, using the same test as open()/openat()/etc.

Addresses https://bugzilla.redhat.com/show_bug.cgi?id=696821

Signed-off-by: Will Woods
Acked-by: Eric Paris
Reviewed-by: Jan Kara
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Will Woods
2014-05-07 04:04:59 +0800
41a212859 slub: use sysfs'es release mechanism for kmem_cache ... Browse Code »
13

debugobjects warning during netfilter exit:

------------[ cut here ]------------
WARNING: CPU: 6 PID: 4178 at lib/debugobjects.c:260 debug_print_object+0x8d/0xb0()
ODEBUG: free active (active state 0) object type: timer_list hint: delayed_work_timer_fn+0x0/0x20
Modules linked in:
CPU: 6 PID: 4178 Comm: kworker/u16:2 Tainted: G W 3.11.0-next-20130906-sasha #3984
Workqueue: netns cleanup_net
Call Trace:
dump_stack+0x52/0x87
warn_slowpath_common+0x8c/0xc0
warn_slowpath_fmt+0x46/0x50
debug_print_object+0x8d/0xb0
__debug_check_no_obj_freed+0xa5/0x220
debug_check_no_obj_freed+0x15/0x20
kmem_cache_free+0x197/0x340
kmem_cache_destroy+0x86/0xe0
nf_conntrack_cleanup_net_list+0x131/0x170
nf_conntrack_pernet_exit+0x5d/0x70
ops_exit_list+0x5e/0x70
cleanup_net+0xfb/0x1c0
process_one_work+0x338/0x550
worker_thread+0x215/0x350
kthread+0xe7/0xf0
ret_from_fork+0x7c/0xb0

Also during dcookie cleanup:

WARNING: CPU: 12 PID: 9725 at lib/debugobjects.c:260 debug_print_object+0x8c/0xb0()
ODEBUG: free active (active state 0) object type: timer_list hint: delayed_work_timer_fn+0x0/0x20
Modules linked in:
CPU: 12 PID: 9725 Comm: trinity-c141 Not tainted 3.15.0-rc2-next-20140423-sasha-00018-gc4ff6c4 #408
Call Trace:
dump_stack (lib/dump_stack.c:52)
warn_slowpath_common (kernel/panic.c:430)
warn_slowpath_fmt (kernel/panic.c:445)
debug_print_object (lib/debugobjects.c:262)
__debug_check_no_obj_freed (lib/debugobjects.c:697)
debug_check_no_obj_freed (lib/debugobjects.c:726)
kmem_cache_free (mm/slub.c:2689 mm/slub.c:2717)
kmem_cache_destroy (mm/slab_common.c:363)
dcookie_unregister (fs/dcookies.c:302 fs/dcookies.c:343)
event_buffer_release (arch/x86/oprofile/../../../drivers/oprofile/event_buffer.c:153)
__fput (fs/file_table.c:217)
____fput (fs/file_table.c:253)
task_work_run (kernel/task_work.c:125 (discriminator 1))
do_notify_resume (include/linux/tracehook.h:196 arch/x86/kernel/signal.c:751)
int_signal (arch/x86/kernel/entry_64.S:807)

Sysfs has a release mechanism. Use that to release the kmem_cache
structure if CONFIG_SYSFS is enabled.

Only slub is changed - slab currently only supports /proc/slabinfo and
not /sys/kernel/slab/*. We talked about adding that and someone was
working on it.

[akpm@linux-foundation.org: fix CONFIG_SYSFS=n build]
[akpm@linux-foundation.org: fix CONFIG_SYSFS=n build even more]
Signed-off-by: Christoph Lameter
Reported-by: Sasha Levin
Tested-by: Sasha Levin
Acked-by: Greg KH
Cc: Thomas Gleixner
Cc: Pekka Enberg
Cc: Russell King
Cc: Bart Van Assche
Cc: Al Viro
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Christoph Lameter
2014-05-07 04:04:59 +0800
623762517 revert "mm: vmscan: do not swap anon pages just because free+file is low" ... Browse Code »
10

This reverts commit 0bf1457f0cfc ("mm: vmscan: do not swap anon pages
just because free+file is low") because it introduced a regression in
mostly-anonymous workloads, where reclaim would become ineffective and
trap every allocating task in direct reclaim.

The problem is that there is a runaway feedback loop in the scan balance
between file and anon, where the balance tips heavily towards a tiny
thrashing file LRU and anonymous pages are no longer being looked at.
The commit in question removed the safe guard that would detect such
situations and respond with forced anonymous reclaim.

This commit was part of a series to fix premature swapping in loads with
relatively little cache, and while it made a small difference, the cure
is obviously worse than the disease. Revert it.

Signed-off-by: Johannes Weiner
Reported-by: Christian Borntraeger
Acked-by: Christian Borntraeger
Acked-by: Rafael Aquini
Cc: Rik van Riel
Cc: [3.12+]
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Johannes Weiner
2014-05-07 04:04:59 +0800
6b6751f7f autofs: fix lockref lookup ... Browse Code »
5

autofs needs to be able to see private data dentry flags for its dentrys
that are being created but not yet hashed and for its dentrys that have
been rmdir()ed but not yet freed. It needs to do this so it can block
processes in these states until a status has been returned to indicate
the given operation is complete.

It does this by keeping two lists, active and expring, of dentrys in
this state and uses ->d_release() to keep them stable while it checks
the reference count to determine if they should be used.

But with the recent lockref changes dentrys being freed sometimes don't
transition to a reference count of 0 before being freed so autofs can
occassionally use a dentry that is invalid which can lead to a panic.

Signed-off-by: Ian Kent
Cc: Al Viro
Cc: Linus Torvalds
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Ian Kent
2014-05-07 04:04:59 +0800
139b6a6fb mm: filemap: update find_get_pages_tag() to deal with shadow entries ... Browse Code »

Dave Jones reports the following crash when find_get_pages_tag() runs
into an exceptional entry:

kernel BUG at mm/filemap.c:1347!
RIP: find_get_pages_tag+0x1cb/0x220
Call Trace:
find_get_pages_tag+0x36/0x220
pagevec_lookup_tag+0x21/0x30
filemap_fdatawait_range+0xbe/0x1e0
filemap_fdatawait+0x27/0x30
sync_inodes_sb+0x204/0x2a0
sync_inodes_one_sb+0x19/0x20
iterate_supers+0xb2/0x110
sys_sync+0x44/0xb0
ia32_do_call+0x13/0x13

1343 /*
1344 * This function is never used on a shmem/tmpfs
1345 * mapping, so a swap entry won't be found here.
1346 */
1347 BUG();

After commit 0cd6144aadd2 ("mm + fs: prepare for non-page entries in
page cache radix trees") this comment and BUG() are out of date because
exceptional entries can now appear in all mappings - as shadows of
recently evicted pages.

However, as Hugh Dickins notes,

"it is truly surprising for a PAGECACHE_TAG_WRITEBACK (and probably
any other PAGECACHE_TAG_*) to appear on an exceptional entry.

I expect it comes down to an occasional race in RCU lookup of the
radix_tree: lacking absolute synchronization, we might sometimes
catch an exceptional entry, with the tag which really belongs with
the unexceptional entry which was there an instant before."

And indeed, not only is the tree walk lockless, the tags are also read
in chunks, one radix tree node at a time. There is plenty of time for
page reclaim to swoop in and replace a page that was already looked up
as tagged with a shadow entry.

Remove the BUG() and update the comment. While reviewing all other
lookup sites for whether they properly deal with shadow entries of
evicted pages, update all the comments and fix memcg file charge moving
to not miss shmem/tmpfs swapcache pages.

Fixes: 0cd6144aadd2 ("mm + fs: prepare for non-page entries in page cache radix trees")
Signed-off-by: Johannes Weiner
Reported-by: Dave Jones
Acked-by: Hugh Dickins
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Johannes Weiner
2014-05-07 04:04:59 +0800
49e068f0b mm/compaction: make isolate_freepages start at pageblock boundary ... Browse Code »
6

The compaction freepage scanner implementation in isolate_freepages()
starts by taking the current cc->free_pfn value as the first pfn. In a
for loop, it scans from this first pfn to the end of the pageblock, and
then subtracts pageblock_nr_pages from the first pfn to obtain the first
pfn for the next for loop iteration.

This means that when cc->free_pfn starts at offset X rather than being
aligned on pageblock boundary, the scanner will start at offset X in all
scanned pageblock, ignoring potentially many free pages. Currently this
can happen when

a) zone's end pfn is not pageblock aligned, or

b) through zone->compact_cached_free_pfn with CONFIG_HOLES_IN_ZONE
enabled and a hole spanning the beginning of a pageblock

This patch fixes the problem by aligning the initial pfn in
isolate_freepages() to pageblock boundary. This also permits replacing
the end-of-pageblock alignment within the for loop with a simple
pageblock_nr_pages increment.

Signed-off-by: Vlastimil Babka
Reported-by: Heesub Shin
Acked-by: Minchan Kim
Cc: Mel Gorman
Acked-by: Joonsoo Kim
Cc: Bartlomiej Zolnierkiewicz
Cc: Michal Nazarewicz
Cc: Naoya Horiguchi
Cc: Christoph Lameter
Acked-by: Rik van Riel
Cc: Dongjun Shin
Cc: Sunghwan Yun
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Vlastimil Babka
2014-05-07 04:04:59 +0800
0e3b7e540 MAINTAINERS: zswap/zbud: change maintainer email address ... Browse Code »

sjenning@linux.vnet.ibm.com is no longer a viable entity.

Signed-off-by: Seth Jennings
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Seth Jennings
2014-05-07 04:04:58 +0800
d5c9fde3d mm/page-writeback.c: fix divide by zero in pos_ratio_polynom ... Browse Code »
5

It is possible for "limit - setpoint + 1" to equal zero, after getting
truncated to a 32 bit variable, and resulting in a divide by zero error.

Using the fully 64 bit divide functions avoids this problem. It also
will cause pos_ratio_polynom() to return the correct value when
(setpoint - limit) exceeds 2^32.

Also uninline pos_ratio_polynom, at Andrew's request.

Signed-off-by: Rik van Riel
Reviewed-by: Michal Hocko
Cc: Aneesh Kumar K.V
Cc: Mel Gorman
Cc: Nishanth Aravamudan
Cc: Luiz Capitulino
Cc: Masayoshi Mizuma
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Rik van Riel
2014-05-07 04:04:58 +0800
457c1b27e hugetlb: ensure hugepage access is denied if hugepages are not supported ... Browse Code »
7

Currently, I am seeing the following when I `mount -t hugetlbfs /none
/dev/hugetlbfs`, and then simply do a `ls /dev/hugetlbfs`. I think it's
related to the fact that hugetlbfs is properly not correctly setting
itself up in this state?:

Unable to handle kernel paging request for data at address 0x00000031
Faulting instruction address: 0xc000000000245710
Oops: Kernel access of bad area, sig: 11 [#1]
SMP NR_CPUS=2048 NUMA pSeries
....

In KVM guests on Power, in a guest not backed by hugepages, we see the
following:

AnonHugePages: 0 kB
HugePages_Total: 0
HugePages_Free: 0
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 64 kB

HPAGE_SHIFT == 0 in this configuration, which indicates that hugepages
are not supported at boot-time, but this is only checked in
hugetlb_init(). Extract the check to a helper function, and use it in a
few relevant places.

This does make hugetlbfs not supported (not registered at all) in this
environment. I believe this is fine, as there are no valid hugepages
and that won't change at runtime.

[akpm@linux-foundation.org: use pr_info(), per Mel]
[akpm@linux-foundation.org: fix build when HPAGE_SHIFT is undefined]
Signed-off-by: Nishanth Aravamudan
Reviewed-by: Aneesh Kumar K.V
Acked-by: Mel Gorman
Cc: Randy Dunlap
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Nishanth Aravamudan
2014-05-07 04:04:58 +0800
93030d83b slub: fix memcg_propagate_slab_attrs ... Browse Code »

After creating a cache for a memcg we should initialize its sysfs attrs
with the values from its parent. That's what memcg_propagate_slab_attrs
is for. Currently it's broken - we clearly muddled root-vs-memcg caches
there. Let's fix it up.

Signed-off-by: Vladimir Davydov
Cc: Christoph Lameter
Cc: Pekka Enberg
Cc: Michal Hocko
Cc: Johannes Weiner
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Vladimir Davydov
2014-05-07 04:04:58 +0800
35738392b drivers/rtc/rtc-pcf8523.c: fix month definition ... Browse Code »

PCF8523 uses 1-12 to represent month according to datasheet.
link: www.nxp.com/documents/data_sheet/PCF8523.pdf.

Signed-off-by: Chris Cui
Cc: Alessandro Zummo
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Chris Cui
2014-05-07 04:04:58 +0800
8169d3005 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs ... Browse Code »

Pull vfs fixes from Al Viro:
"dcache fixes + kvfree() (uninlined, exported by mm/util.c) + posix_acl
bugfix from hch"

The dcache fixes are for a subtle LRU list corruption bug reported by
Miklos Szeredi, where people inside IBM saw list corruptions with the
LTP/host01 test.

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
nick kvfree() from apparmor
posix_acl: handle NULL ACL in posix_acl_equiv_mode
dcache: don't need rcu in shrink_dentry_list()
more graceful recovery in umount_collect()
don't remove from shrink list in select_collect()
dentry_kill(): don't try to remove from shrink list
expand the call of dentry_lru_del() in dentry_kill()
new helper: dentry_free()
fold try_prune_one_dentry()
fold d_kill() and d_free()
fix races between __d_instantiate() and checks of dentry flags

Linus Torvalds
2014-05-07 03:22:20 +0800
39f1f78d5 nick kvfree() from apparmor ... Browse Code »

too many places open-code it

Signed-off-by: Al Viro

Al Viro
2014-05-07 02:02:53 +0800
50c6e282b posix_acl: handle NULL ACL in posix_acl_equiv_mode ... Browse Code »
5

Various filesystems don't bother checking for a NULL ACL in
posix_acl_equiv_mode, and thus can dereference a NULL pointer when it
gets passed one. This usually happens from the NFS server, as the ACL tools
never pass a NULL ACL, but instead of one representing the mode bits.

Instead of adding boilerplat to all filesystems put this check into one place,
which will allow us to remove the check from other filesystems as well later
on.

Signed-off-by: Christoph Hellwig
Reported-by: Ben Greear
Reported-by: Marco Munderloh ,
Cc: Chuck Lever
Cc: stable@vger.kernel.org
Signed-off-by: Al Viro

Christoph Hellwig
2014-05-07 01:58:42 +0800
256cf4c43 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse ... Browse Code »

Pull fuse fixes from Miklos Szeredi:
"This adds ctime update in the new cached writeback mode and also
fixes/simplifies the mtime update handling. Support for rename flags
(aka renameat2) is also added to the userspace API"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse:
fuse: add renameat2 support
fuse: clear MS_I_VERSION
fuse: clear FUSE_I_CTIME_DIRTY flag on setattr
fuse: trust kernel i_ctime only
fuse: remove .update_time
fuse: allow ctime flushing to userspace
fuse: fuse: add time_gran to INIT_OUT
fuse: add .write_inode
fuse: clean up fsync
fuse: fuse: fallocate: use file_update_time()
fuse: update mtime on open(O_TRUNC) in atomic_o_trunc mode
fuse: update mtime on truncate(2)
fuse: do not use uninitialized i_mode
fuse: fix mtime update error in fsync
fuse: check fallocate mode
fuse: add __exit to fuse_ctl_cleanup

Linus Torvalds
2014-05-07 00:09:35 +0800
9bd29c56a Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc ... Browse Code »

Pull sparc fixes from David Miller:
"I've been auditing the THP support on sparc64 and found several bugs,
hopefully most of which are fixed completely here.

Also an RT kernel locking fix from Kirill Tkhai"

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc:
sparc64: Give more detailed information in {pgd,pmd}_ERROR() and kill pte_ERROR().
sparc64: Add basic validations to {pud,pmd}_bad().
sparc64: Use 'ILOG2_4MB' instead of constant '22'.
sparc64: Fix range check in kern_addr_valid().
sparc64: Fix top-level fault handling bugs.
sparc64: Handle 32-bit tasks properly in compute_effective_address().
sparc64: Don't use _PAGE_PRESENT in pte_modify() mask.
sparc64: Fix hex values in comment above pte_modify().
sparc64: Fix bugs in get_user_pages_fast() wrt. THP.
sparc64: Fix huge PMD invalidation.
sparc64: Fix executable bit testing in set_pmd_at() paths.
sparc64: Normalize NMI watchdog logging and behavior.
sparc64: Make itc_sync_lock raw
sparc64: Fix argument sign extension for compat_sys_futex().

Linus Torvalds
2014-05-07 00:08:03 +0800

06 May, 2014

11 commits

30321c7b6 slab: Fix off by one in object max number tests. ... Browse Code »

If freelist_idx_t is a byte, SLAB_OBJ_MAX_NUM should be 255 not 256, and
likewise if freelist_idx_t is a short, then it should be 65535 not
65536.

This was leading to all kinds of random crashes on sparc64 where
PAGE_SIZE is 8192. One problem shown was that if spinlock debugging was
enabled, we'd get deadlocks in copy_pte_range() or do_wp_page() with the
same cpu already holding a lock it shouldn't hold, or the lock belonging
to a completely unrelated process.

Fixes: a41adfaa23df ("slab: introduce byte sized index for the freelist of a slab")
Signed-off-by: David S. Miller
Signed-off-by: Linus Torvalds

David Miller
2014-05-06 11:38:49 +0800
7cc68973c slab: fix the type of the index on freelist index accessor ... Browse Code »

Commit a41adfaa23df ("slab: introduce byte sized index for the freelist
of a slab") changes the size of freelist index and also changes
prototype of accessor function to freelist index. And there was a
mistake.

The mistake is that although it changes the size of freelist index
correctly, it changes the size of the index of freelist index
incorrectly. With patch, freelist index can be 1 byte or 2 bytes, that
means that num of object on on a slab can be more than 255. So we need
more than 1 byte for the index to find the index of free object on
freelist. But, above patch makes this index type 1 byte, so slab which
have more than 255 objects cannot work properly and in consequence of
it, the system cannot boot.

This issue was reported by Steven King on m68knommu which would use
2 bytes freelist index:

https://lkml.org/lkml/2014/4/16/433

To fix is easy. To change the type of the index of freelist index on
accessor functions is enough to fix this bug. Although 2 bytes is
enough, I use 4 bytes since it have no bad effect and make things more
easier. This fix was suggested and tested by Steven in his original
report.

Signed-off-by: Joonsoo Kim
Reported-and-acked-by: Steven King
Acked-by: Christoph Lameter
Tested-by: James Hogan
Tested-by: David Miller
Cc: Pekka Enberg
Signed-off-by: Linus Torvalds

Joonsoo Kim
2014-05-06 11:38:49 +0800
2080cee43 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net ... Browse Code »

Pull networking fixes from David Miller:

1) e1000e computes header length incorrectly wrt vlans, fix from Vlad
Yasevich.

2) ns_capable() check in sock_diag netlink code, from Andrew
Lutomirski.

3) Fix invalid queue pairs handling in virtio_net, from Amos Kong.

4) Checksum offloading busted in sxgbe driver due to incorrect
descriptor layout, fix from Byungho An.

5) Fix build failure with SMC_DEBUG set to 2 or larger, from Zi Shen
Lim.

6) Fix uninitialized A and X registers in BPF interpreter, from Alexei
Starovoitov.

7) Fix arch dependencies of candence driver.

8) Fix netlink capabilities checking tree-wide, from Eric W Biederman.

9) Don't dump IFLA_VF_PORTS if netlink request didn't ask for it in
IFLA_EXT_MASK, from David Gibson.

10) IPV6 FIB dump restart doesn't handle table changes that happen
meanwhile, causing the code to loop forever or emit dups, fix from
Kumar Sandararajan.

11) Memory leak on VF removal in bnx2x, from Yuval Mintz.

12) Bug fixes for new Altera TSE driver from Vince Bridgers.

13) Fix route lookup key in SCTP, from Xugeng Zhang.

14) Use BH blocking spinlocks in SLIP, as per a similar fix to CAN/SLCAN
driver. From Oliver Hartkopp.

15) TCP doesn't bump retransmit counters in some code paths, fix from
Eric Dumazet.

16) Clamp delayed_ack in tcp_cubic to prevent theoretical divides by
zero. Fix from Liu Yu.

17) Fix locking imbalance in error paths of HHF packet scheduler, from
John Fastabend.

18) Properly reference the transport module when vsock_core_init() runs,
from Andy King.

19) Fix buffer overflow in cdc_ncm driver, from Bjørn Mork.

20) IP_ECN_decapsulate() doesn't see a correct SKB network header in
ip_tunnel_rcv(), fix from Ying Cai.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (132 commits)
net: macb: Fix race between HW and driver
net: macb: Remove 'unlikely' optimization
net: macb: Re-enable RX interrupt only when RX is done
net: macb: Clear interrupt flags
net: macb: Pass same size to DMA_UNMAP as used for DMA_MAP
ip_tunnel: Set network header properly for IP_ECN_decapsulate()
e1000e: Restrict MDIO Slow Mode workaround to relevant parts
e1000e: Fix issue with link flap on 82579
e1000e: Expand workaround for 10Mb HD throughput bug
e1000e: Workaround for dropped packets in Gig/100 speeds on 82579
net/mlx4_core: Don't issue PCIe speed/width checks for VFs
net/mlx4_core: Load the Eth driver first
net/mlx4_core: Fix slave id computation for single port VF
net/mlx4_core: Adjust port number in qp_attach wrapper when detaching
net: cdc_ncm: fix buffer overflow
Altera TSE: ALTERA_TSE should depend on HAS_DMA
vsock: Make transport the proto owner
net: sched: lock imbalance in hhf qdisc
net: mvmdio: Check for a valid interrupt instead of an error
net phy: Check for aneg completion before setting state to PHY_RUNNING
...

Linus Torvalds
2014-05-06 06:59:46 +0800
783e9e8ed Merge tag 'usb-3.15-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb ... Browse Code »

Pull USB fixes from Greg KH:
"Here are some small fixes and device ids for 3.15-rc4.

All have been in linux-next just fine"

* tag 'usb-3.15-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
USB: Nokia 5300 should be treated as unusual dev
USB: Nokia 305 should be treated as unusual dev
fsl-usb: do not test for PHY_CLK_VALID bit on controller version 1.6
usb: storage: shuttle_usbat: fix discs being detected twice
usb: qcserial: add a number of Dell devices
USB: OHCI: fix problem with global suspend on ATI controllers
usb: gadget: at91-udc: fix irq and iomem resource retrieval
usb: phy: fsm: change "|" to "||" for condition OTG_STATE_A_WAIT_BCON at statemachine
usb: phy: fsm: update OTG HNP state transition

Linus Torvalds
2014-05-06 06:51:17 +0800
df862f625 Merge tag 'tty-3.15-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty ... Browse Code »

Pull tty/serial fixes from Greg KH:
"Here are some tty and serial driver fixes for things reported
recently"

* tag 'tty-3.15-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
tty: Fix lockless tty buffer race
Revert "tty: Fix race condition between __tty_buffer_request_room and flush_to_ldisc"
drivers/tty/hvc: don't free hvc_console_setup after init
n_tty: Fix n_tty_write crash when echoing in raw mode
tty: serial: 8250_core.c Bug fix for Exar chips.

Linus Torvalds
2014-05-06 06:50:16 +0800
a1e74464f Merge tag 'staging-3.15-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging ... Browse Code »

Pull staging / iio fixes from Greg KH:
"Here are some small IIO driver fixes for 3.15-rc4 that resolve some
reported issues"

* tag 'staging-3.15-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging:
iio: adc: Nothing in ADC should be a bool CONFIG
iio: exynos_adc: use indio_dev->dev structure to handle child nodes
iio:imu:mpu6050: Fixed segfault in Invensens MPU driver due to null dereference
staging:iio:ad2s1200 fix missing parenthesis in a for statment.

Linus Torvalds
2014-05-06 06:49:38 +0800
03787ff6f Merge tag 'xtensa-next-20140503' of git://github.com/czankel/xtensa-linux ... Browse Code »

Pull Xtensa fixes from Chris Zankel:
- Fixes allmodconfig, allnoconfig builds
- Adds highmem support
- Enables build-time exception table sorting.

* tag 'xtensa-next-20140503' of git://github.com/czankel/xtensa-linux:
xtensa: ISS: don't depend on CONFIG_TTY
xtensa: xt2000: drop redundant sysmem initialization
xtensa: add support for KC705
xtensa: xtfpga: introduce SoC I/O bus
xtensa: add HIGHMEM support
xtensa: optimize local_flush_tlb_kernel_range
xtensa: dump sysmem from the bootmem_init
xtensa: handle memmap kernel option
xtensa: keep sysmem banks ordered in mem_reserve
xtensa: keep sysmem banks ordered in add_sysmem_bank
xtensa: split bootparam and kernel meminfo
xtensa: enable sorting extable at build time
xtensa: export __{invalidate,flush}_dcache_range
xtensa: Export __invalidate_icache_range

Linus Torvalds
2014-05-06 06:36:59 +0800
5575eeb7b Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client ... Browse Code »

Pull Ceph fixes from Sage Weil:
"First, there is a critical fix for the new primary-affinity function
that went into -rc1.

The second batch of patches from Zheng fix a range of problems with
directory fragmentation, readdir, and a few odds and ends for cephfs"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client:
ceph: reserve caps for file layout/lock MDS requests
ceph: avoid releasing caps that are being used
ceph: clear directory's completeness when creating file
libceph: fix non-default values check in apply_primary_affinity()
ceph: use fpos_cmp() to compare dentry positions
ceph: check directory's completeness before emitting directory entry

Linus Torvalds
2014-05-06 06:17:02 +0800
c8ea5a22b net: macb: Fix race between HW and driver ... Browse Code »

Under "heavy" RX load, the driver cannot handle the descriptors fast
enough. In detail, when a descriptor is consumed, its used flag is
cleared and once the RX budget is consumed all descriptors with a
cleared used flag are prepared to receive more data. Under load though,
the HW may constantly receive more data and use those descriptors with a
cleared used flag before they are actually prepared for next usage.

The head and tail pointers into the RX-ring should always be valid and
we can omit clearing and checking of the used flag.

Signed-off-by: Soren Brinkmann
Signed-off-by: David S. Miller

Soren Brinkmann
2014-05-06 05:11:18 +0800
504ad98df net: macb: Remove 'unlikely' optimization ... Browse Code »

Coverage data suggests that the unlikely case of receiving data while
the receive handler is running may not be that unlikely.
Coverage data after running iperf for a while:
91320: 891: work_done = bp->macbgem_ops.mog_rx(bp, budget);
91320: 892: if (work_done < budget) {
2362: 893: napi_complete(napi);
-: 894:
-: 895: /* Packets received while interrupts were disabled */
4724: 896: status = macb_readl(bp, RSR);
2362: 897: if (unlikely(status)) {
762: 898: if (bp->caps & MACB_CAPS_ISR_CLEAR_ON_WRITE)
762: 899: macb_writel(bp, ISR, MACB_BIT(RCOMP));
-: 900: napi_reschedule(napi);
-: 901: } else {
1600: 902: macb_writel(bp, IER, MACB_RX_INT_FLAGS);
-: 903: }
-: 904: }

Signed-off-by: Soren Brinkmann
Signed-off-by: David S. Miller

Soren Brinkmann
2014-05-06 05:11:18 +0800
02f7a34f3 net: macb: Re-enable RX interrupt only when RX is done ... Browse Code »

When data is received during the driver processing received data the
NAPI is re-scheduled. In that case the RX interrupt should not be
re-enabled.

Signed-off-by: Soren Brinkmann
Signed-off-by: David S. Miller

Soren Brinkmann
2014-05-06 05:11:18 +0800