Eric Lee / smarc-ti-linux-kernel | Embedian Git Server

28 Apr, 2014

1 commit

a949ae560 ftrace/module: Hardcode ftrace_module_init() call into load_module() ... Browse Code »
10

A race exists between module loading and enabling of function tracer.

CPU 1 CPU 2
----- -----
load_module()
module->state = MODULE_STATE_COMING

register_ftrace_function()
mutex_lock(&ftrace_lock);
ftrace_startup()
update_ftrace_function();
ftrace_arch_code_modify_prepare()
set_all_module_text_rw();

ftrace_arch_code_modify_post_process()
set_all_module_text_ro();

[ here all module text is set to RO,
including the module that is
loading!! ]

blocking_notifier_call_chain(MODULE_STATE_COMING);
ftrace_init_module()

[ tries to modify code, but it's RO, and fails!
ftrace_bug() is called]

When this race happens, ftrace_bug() will produces a nasty warning and
all of the function tracing features will be disabled until reboot.

The simple solution is to treate module load the same way the core
kernel is treated at boot. To hardcode the ftrace function modification
of converting calls to mcount into nops. This is done in init/main.c
there's no reason it could not be done in load_module(). This gives
a better control of the changes and doesn't tie the state of the
module to its notifiers as much. Ftrace is special, it needs to be
treated as such.

The reason this would work, is that the ftrace_module_init() would be
called while the module is in MODULE_STATE_UNFORMED, which is ignored
by the set_all_module_text_ro() call.

Link: http://lkml.kernel.org/r/1395637826-3312-1-git-send-email-indou.takao@jp.fujitsu.com

Reported-by: Takao Indoh
Acked-by: Rusty Russell
Cc: stable@vger.kernel.org # 2.6.38+
Signed-off-by: Steven Rostedt

Steven Rostedt (Red Hat)
2014-04-28 22:37:21 +0800

20 Apr, 2014

1 commit

8f98f6f5d Merge branch 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip ... Browse Code »

Pull scheduler fixes from Ingo Molnar:
"Two fixes:

- a SCHED_DEADLINE task selection fix
- a sched/numa related lockdep splat fix"

* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched: Check for stop task appearance when balancing happens
sched/numa: Fix task_numa_free() lockdep splat

Linus Torvalds
2014-04-20 01:40:51 +0800

19 Apr, 2014

3 commits

ebfc45ee7 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net ... Browse Code »

Pull more networking fixes from David Miller:

1) Fix mlx4_en_netpoll implementation, it needs to schedule a NAPI
context, not synchronize it. From Chris Mason.

2) Ipv4 flow input interface should never be zero, it should be
LOOPBACK_IFINDEX instead. From Cong Wang and Julian Anastasov.

3) Properly configure MAC to PHY connection in mvneta devices, from
Thomas Petazzoni.

4) sys_recv should use SYSCALL_DEFINE. From Jan Glauber.

5) Tunnel driver ioctls do not use the correct namespace, fix from
Nicolas Dichtel.

6) Fix memory leak on seccomp filter attach, from Kees Cook.

7) Fix lockdep warning for nested vlans, from Ding Tianhong.

8) Crashes can happen in SCTP due to how the auth_enable value is
managed, fix from Vlad Yasevich.

9) Wireless fixes from John W Linville and co.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (45 commits)
net: sctp: cache auth_enable per endpoint
tg3: update rx_jumbo_pending ring param only when jumbo frames are enabled
vlan: Fix lockdep warning when vlan dev handle notification
seccomp: fix memory leak on filter attach
isdn: icn: buffer overflow in icn_command()
ip6_tunnel: use the right netns in ioctl handler
sit: use the right netns in ioctl handler
ip_tunnel: use the right netns in ioctl handler
net: use SYSCALL_DEFINEx for sys_recv
net: mdio-gpio: Add support for separate MDI and MDO gpio pins
net: mdio-gpio: Add support for active low gpio pins
net: mdio-gpio: Use devm_ functions where possible
ipv4, route: pass 0 instead of LOOPBACK_IFINDEX to fib_validate_source()
ipv4, fib: pass LOOPBACK_IFINDEX instead of 0 to flowi4_iif
mlx4_en: don't use napi_synchronize inside mlx4_en_netpoll
net: mvneta: properly configure the MAC PHY connection in all situations
net: phy: add minimal support for QSGMII PHY
sfc:On MCDI timeout, issue an FLR (and mark MCDI to fail-fast)
mwifiex: fix hung task on command timeout
mwifiex: process event before command response
...

Linus Torvalds
2014-04-19 08:53:46 +0800
7861144b8 kernel/watchdog.c:touch_softlockup_watchdog(): use raw_cpu_write() ... Browse Code »

Fix:

BUG: using __this_cpu_write() in preemptible [00000000] code: systemd-udevd/497
caller is __this_cpu_preempt_check+0x13/0x20
CPU: 3 PID: 497 Comm: systemd-udevd Tainted: G W 3.15.0-rc1 #9
Hardware name: Hewlett-Packard HP EliteBook 8470p/179B, BIOS 68ICF Ver. F.02 04/27/2012
Call Trace:
check_preemption_disabled+0xe1/0xf0
__this_cpu_preempt_check+0x13/0x20
touch_nmi_watchdog+0x28/0x40

Reported-by: Luis Henriques
Tested-by: Luis Henriques
Cc: Eric Piel
Cc: Robert Moore
Cc: Lv Zheng
Cc: "Rafael J. Wysocki"
Cc: Len Brown
Cc: Christoph Lameter
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Andrew Morton
2014-04-19 07:40:08 +0800
7d77879bf Merge tag 'trace-fixes-v3.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/g… ... Browse Code »

…it/rostedt/linux-trace

Pull tracing fixes from Steven Rostedt:
"This contains two fixes.

The first is to remove a duplication of creating debugfs files that
already exist and causes an error report to be printed due to the
failure of the second creation.

The second is a memory leak fix that was introduced in 3.14"

* tag 'trace-fixes-v3.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
tracing/uprobes: Fix uprobe_cpu_buffer memory leak
tracing: Do not try to recreated toplevel set_ftrace_* files

Linus Torvalds
2014-04-19 01:16:43 +0800

18 Apr, 2014

1 commit

87a54cae0 Merge branch 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip ... Browse Code »

Pull timer fixes from Thomas Gleixner:
"Viresh unearthed the following three hickups in the timer/timekeeping
code:

- Negated check for the result of a clock event selection

- A missing early exit in the jiffies update path which causes
update_wall_time to be called for nothing causing lock contention
and wasted cycles in the timer interrupt

- Checking a variable in the NOHZ code enable code for true which can
only be set by that very code after the check succeeds. That
results in a rock solid runtime disablement of that feature"

* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
tick-sched: Check tick_nohz_enabled in tick_nohz_switch_to_nohz()
tick-sched: Don't call update_wall_time() when delta is lesser than tick_period
tick-common: Fix wrong check in tick_check_replacement()

Linus Torvalds
2014-04-18 07:19:10 +0800

17 Apr, 2014

5 commits

6ea6215fe tracing/uprobes: Fix uprobe_cpu_buffer memory leak ... Browse Code »
5

Forgot to free uprobe_cpu_buffer percpu page in uprobe_buffer_disable().

Link: http://lkml.kernel.org/p/534F8B3F.1090407@huawei.com

Cc: stable@vger.kernel.org # v3.14+
Acked-by: Namhyung Kim
Signed-off-by: zhangwei(Jovi)
Signed-off-by: Steven Rostedt

zhangwei(Jovi)
2014-04-17 22:44:42 +0800
a1d9a3231 sched: Check for stop task appearance when balancing happens ... Browse Code »

We need to do it like we do for the other higher priority classes..

Signed-off-by: Kirill Tkhai
Cc: Michael wang
Cc: Sasha Levin
Signed-off-by: Peter Zijlstra
Link: http://lkml.kernel.org/r/336561397137116@web27h.yandex.ru
Signed-off-by: Ingo Molnar

Kirill Tkhai
2014-04-17 19:39:51 +0800
d99d5917e Merge branch 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip ... Browse Code »

Pull locking fixes from Ingo Molnar:
"liblockdep fixes and mutex debugging fixes"

* 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
locking/mutex: Fix debug_mutexes
tools/liblockdep: Add proper versioning to the shared obj
tools/liblockdep: Ignore asmlinkage and visible

Linus Torvalds
2014-04-17 07:35:18 +0800
5d6c97c55 tracing: Do not try to recreated toplevel set_ftrace_* files ... Browse Code »

With the restructing of the function tracer working with instances, the
"top level" buffer is a bit special, as the function tracing is mapped
to the same set of filters. This is done by using a "global_ops" descriptor
and having the "set_ftrace_filter" and "set_ftrace_notrace" map to it.

When an instance is created, it creates the same files but its for the
local instance and not the global_ops.

The issues is that the local instance creation shares some code with
the global instance one and we end up trying to create th top level
"set_ftrace_*" files twice, and on boot up, we get an error like this:

Could not create debugfs 'set_ftrace_filter' entry
Could not create debugfs 'set_ftrace_notrace' entry

The reason they failed to be created was because they were created
twice, and the second time gives this error as you can not create the
same file twice.

Reported-by: Borislav Petkov
Signed-off-by: Steven Rostedt

Steven Rostedt (Red Hat)
2014-04-17 07:21:53 +0800
0acf07d24 seccomp: fix memory leak on filter attach ... Browse Code »

This sets the correct error code when final filter memory is unavailable,
and frees the raw filter no matter what.

unreferenced object 0xffff8800d6ea4000 (size 512):
comm "sshd", pid 278, jiffies 4294898315 (age 46.653s)
hex dump (first 32 bytes):
21 00 00 00 04 00 00 00 15 00 01 00 3e 00 00 c0 !...........>...
06 00 00 00 00 00 00 00 21 00 00 00 00 00 00 00 ........!.......
backtrace:
[] kmemleak_alloc+0x4e/0xb0
[] __kmalloc+0x280/0x320
[] prctl_set_seccomp+0x11e/0x3b0
[] SyS_prctl+0x3bb/0x4a0
[] system_call_fastpath+0x1a/0x1f
[] 0xffffffffffffffff

Reported-by: Masami Ichikawa
Signed-off-by: Kees Cook
Tested-by: Masami Ichikawa
Acked-by: Daniel Borkmann
Signed-off-by: David S. Miller

Kees Cook
2014-04-17 03:25:53 +0800

16 Apr, 2014

4 commits

10ec34fcb Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net ... Browse Code »

Pull networking fixes from David Miller:

1) Fix BPF filter validation of netlink attribute accesses, from
Mathias Kruase.

2) Netfilter conntrack generation seqcount not initialized properly,
from Andrey Vagin.

3) Fix comparison mask computation on big-endian in nft_cmp_fast(),
from Patrick McHardy.

4) Properly limit MTU over ipv6, from Eric Dumazet.

5) Fix seccomp system call argument population on 32-bit, from Daniel
Borkmann.

6) skb_network_protocol() should not use hard-coded ETH_HLEN, instead
skb->mac_len needs to be used. From Vlad Yasevich.

7) We have several cases of using socket based communications to
implement a tunnel. For example, some tunnels are encapsulations
over UDP so we use an internal kernel UDP socket to do the
transmits.

These tunnels should behave just like other software devices and
pass the packets on down to the next layer.

Most importantly we want the top-level socket (eg TCP) that created
the traffic to be charged for the SKB memory.

However, once you get into the IP output path, we have code that
assumed that whatever was attached to skb->sk is an IP socket.

To keep the top-level socket being charged for the SKB memory,
whilst satisfying the needs of the IP output path, we now pass in an
explicit 'sk' argument.

From Eric Dumazet.

8) ping_init_sock() leaks group info, from Xiaoming Wang.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (33 commits)
cxgb4: use the correct max size for firmware flash
qlcnic: Fix MSI-X initialization code
ip6_gre: don't allow to remove the fb_tunnel_dev
ipv4: add a sock pointer to dst->output() path.
ipv4: add a sock pointer to ip_queue_xmit()
driver/net: cosa driver uses udelay incorrectly
at86rf230: fix __at86rf230_read_subreg function
at86rf230: remove check if AVDD settled
net: cadence: Add architecture dependencies
net: Start with correct mac_len in skb_network_protocol
Revert "net: sctp: Fix a_rwnd/rwnd management to reflect real state of the receiver's buffer"
cxgb4: Save the correct mac addr for hw-loopback connections in the L2T
net: filter: seccomp: fix wrong decoding of BPF_S_ANC_SECCOMP_LD_W
seccomp: fix populating a0-a5 syscall args in 32-bit x86 BPF
qlcnic: Do not disable SR-IOV when VFs are assigned to VMs
qlcnic: Fix QLogic application/driver interface for virtual NIC configuration
qlcnic: Fix PVID configuration on eSwitch port.
qlcnic: Fix max ring count calculation
qlcnic: Fix to send INIT_NIC_FUNC as first mailbox.
qlcnic: Fix panic due to uninitialzed delayed_work struct in use.
...

Linus Torvalds
2014-04-16 11:30:30 +0800
27630532e tick-sched: Check tick_nohz_enabled in tick_nohz_switch_to_nohz() ... Browse Code »
21

Since commit d689fe222 (NOHZ: Check for nohz active instead of nohz
enabled) the tick_nohz_switch_to_nohz() function returns because it
checks for the tick_nohz_active flag. This can't be set, because the
function itself sets it.

Undo the change in tick_nohz_switch_to_nohz().

Signed-off-by: Viresh Kumar
Cc: linaro-kernel@lists.linaro.org
Cc: fweisbec@gmail.com
Cc: Arvind.Chauhan@arm.com
Cc: linaro-networking@linaro.org
Cc: # 3.13+
Link: http://lkml.kernel.org/r/40939c05f2d65d781b92b20302b02243d0654224.1397537987.git.viresh.kumar@linaro.org
Signed-off-by: Thomas Gleixner

Viresh Kumar
2014-04-16 02:26:58 +0800
03e6bdc5c tick-sched: Don't call update_wall_time() when delta is lesser than tick_period ... Browse Code »
5

In tick_do_update_jiffies64() we are processing ticks only if delta is
greater than tick_period. This is what we are supposed to do here and
it broke a bit with this patch:

commit 47a1b796 (tick/timekeeping: Call update_wall_time outside the
jiffies lock)

With above patch, we might end up calling update_wall_time() even if
delta is found to be smaller that tick_period. Fix this by returning
when the delta is less than tick period.

[ tglx: Made it a 3 liner and massaged changelog ]

Signed-off-by: Viresh Kumar
Cc: linaro-kernel@lists.linaro.org
Cc: fweisbec@gmail.com
Cc: Arvind.Chauhan@arm.com
Cc: linaro-networking@linaro.org
Cc: John Stultz
Cc: # v3.14+
Link: http://lkml.kernel.org/r/80afb18a494b0bd9710975bcc4de134ae323c74f.1397537987.git.viresh.kumar@linaro.org
Signed-off-by: Thomas Gleixner

Viresh Kumar
2014-04-16 02:26:45 +0800
521c42990 tick-common: Fix wrong check in tick_check_replacement() ... Browse Code »
5

tick_check_replacement() returns if a replacement of clock_event_device is
possible or not. It does this as the first check:

if (tick_check_percpu(curdev, newdev, smp_processor_id()))
return false;

Thats wrong. tick_check_percpu() returns true when the device is
useable. Check for false instead.

[ tglx: Massaged changelog ]

Signed-off-by: Viresh Kumar
Cc: # v3.11+
Cc: linaro-kernel@lists.linaro.org
Cc: fweisbec@gmail.com
Cc: Arvind.Chauhan@arm.com
Cc: linaro-networking@linaro.org
Link: http://lkml.kernel.org/r/486a02efe0246635aaba786e24b42d316438bf3b.1397537987.git.viresh.kumar@linaro.org
Signed-off-by: Thomas Gleixner

Viresh Kumar
2014-04-16 02:26:44 +0800

15 Apr, 2014

2 commits

e79323bd8 user namespace: fix incorrect memory barriers ... Browse Code »
5

smp_read_barrier_depends() can be used if there is data dependency between
the readers - i.e. if the read operation after the barrier uses address
that was obtained from the read operation before the barrier.

In this file, there is only control dependency, no data dependecy, so the
use of smp_read_barrier_depends() is incorrect. The code could fail in the
following way:
* the cpu predicts that idx < entries is true and starts executing the
body of the for loop
* the cpu fetches map->extent[0].first and map->extent[0].count
* the cpu fetches map->nr_extents
* the cpu verifies that idx < extents is true, so it commits the
instructions in the body of the for loop

The problem is that in this scenario, the cpu read map->extent[0].first
and map->nr_extents in the wrong order. We need a full read memory barrier
to prevent it.

Signed-off-by: Mikulas Patocka
Cc: stable@vger.kernel.org
Signed-off-by: Linus Torvalds

Mikulas Patocka
2014-04-15 07:03:02 +0800
2eac76483 seccomp: fix populating a0-a5 syscall args in 32-bit x86 BPF ... Browse Code »

Linus reports that on 32-bit x86 Chromium throws the following seccomp
resp. audit log messages:

audit: type=1326 audit(1397359304.356:28108): auid=500 uid=500
gid=500 ses=2 subj=unconfined_u:unconfined_r:chrome_sandbox_t:s0-s0:c0.c1023
pid=3677 comm="chrome" exe="/opt/google/chrome/chrome" sig=0
syscall=172 compat=0 ip=0xb2dd9852 code=0x30000

audit: type=1326 audit(1397359304.356:28109): auid=500 uid=500
gid=500 ses=2 subj=unconfined_u:unconfined_r:chrome_sandbox_t:s0-s0:c0.c1023
pid=3677 comm="chrome" exe="/opt/google/chrome/chrome" sig=0 syscall=5
compat=0 ip=0xb2dd9852 code=0x50000

These audit messages are being triggered via audit_seccomp() through
__secure_computing() in seccomp mode (BPF) filter with seccomp return
codes 0x30000 (== SECCOMP_RET_TRAP) and 0x50000 (== SECCOMP_RET_ERRNO)
during filter runtime. Moreover, Linus reports that x86_64 Chromium
seems fine.

The underlying issue that explains this is that the implementation of
populate_seccomp_data() is wrong. Our seccomp data structure sd that
is being shared with user ABI is:

struct seccomp_data {
int nr;
__u32 arch;
__u64 instruction_pointer;
__u64 args[6];
};

Therefore, a simple cast to 'unsigned long *' for storing the value of
the syscall argument via syscall_get_arguments() is just wrong as on
32-bit x86 (or any other 32bit arch), it would result in storing a0-a5
at wrong offsets in args[] member, and thus i) could leak stack memory
to user space and ii) tampers with the logic of seccomp BPF programs
that read out and check for syscall arguments:

syscall_get_arguments(task, regs, 0, 1, (unsigned long *) &sd->args[0]);

Tested on 32-bit x86 with Google Chrome, unfortunately only via remote
test machine through slow ssh X forwarding, but it fixes the issue on
my side. So fix it up by storing args in type correct variables, gcc
is clever and optimizes the copy away in other cases, e.g. x86_64.

Fixes: bd4cf0ed331a ("net: filter: rework/optimize internal BPF interpreter's instruction set")
Reported-and-bisected-by: Linus Torvalds
Signed-off-by: Daniel Borkmann
Signed-off-by: Alexei Starovoitov
Cc: Linus Torvalds
Cc: Eric Paris
Cc: James Morris
Cc: Kees Cook
Signed-off-by: David S. Miller

Daniel Borkmann
2014-04-15 04:26:47 +0800

13 Apr, 2014

4 commits

d7e8af1af futex: update documentation for ordering guarantees ... Browse Code »

Commits 11d4616bd07f ("futex: revert back to the explicit waiter
counting code") and 69cd9eba3886 ("futex: avoid race between requeue and
wake") changed some of the finer details of how we think about futexes.
One was a late fix and the other a consequence of overlooking the whole
requeuing logic.

The first change caused our documentation to be incorrect, and the
second made us aware that we need to explicitly add more details to it.

Signed-off-by: Davidlohr Bueso
Signed-off-by: Linus Torvalds

Davidlohr Bueso
2014-04-13 08:57:51 +0800
5166701b3 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs ... Browse Code »

Pull vfs updates from Al Viro:
"The first vfs pile, with deep apologies for being very late in this
window.

Assorted cleanups and fixes, plus a large preparatory part of iov_iter
work. There's a lot more of that, but it'll probably go into the next
merge window - it *does* shape up nicely, removes a lot of
boilerplate, gets rid of locking inconsistencie between aio_write and
splice_write and I hope to get Kent's direct-io rewrite merged into
the same queue, but some of the stuff after this point is having
(mostly trivial) conflicts with the things already merged into
mainline and with some I want more testing.

This one passes LTP and xfstests without regressions, in addition to
usual beating. BTW, readahead02 in ltp syscalls testsuite has started
giving failures since "mm/readahead.c: fix readahead failure for
memoryless NUMA nodes and limit readahead pages" - might be a false
positive, might be a real regression..."

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (63 commits)
missing bits of "splice: fix racy pipe->buffers uses"
cifs: fix the race in cifs_writev()
ceph_sync_{,direct_}write: fix an oops on ceph_osdc_new_request() failure
kill generic_file_buffered_write()
ocfs2_file_aio_write(): switch to generic_perform_write()
ceph_aio_write(): switch to generic_perform_write()
xfs_file_buffered_aio_write(): switch to generic_perform_write()
export generic_perform_write(), start getting rid of generic_file_buffer_write()
generic_file_direct_write(): get rid of ppos argument
btrfs_file_aio_write(): get rid of ppos
kill the 5th argument of generic_file_buffered_write()
kill the 4th argument of __generic_file_aio_write()
lustre: don't open-code kernel_recvmsg()
ocfs2: don't open-code kernel_recvmsg()
drbd: don't open-code kernel_recvmsg()
constify blk_rq_map_user_iov() and friends
lustre: switch to kernel_sendmsg()
ocfs2: don't open-code kernel_sendmsg()
take iov_iter stuff to mm/iov_iter.c
process_vm_access: tidy up a bit
...

Linus Torvalds
2014-04-13 05:49:50 +0800
0a7418f5f Merge tag 'trace-3.15-v2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace ... Browse Code »

Pull more tracing updates from Steven Rostedt:
"This includes the final patch to clean up and fix the issue with the
design of tracepoints and how a user could register a tracepoint and
have that tracepoint not be activated but no error was shown.

The design was for an out of tree module but broke in tree users. The
clean up was to remove the saving of the hash table of tracepoint
names such that they can be enabled before they exist (enabling a
module tracepoint before that module is loaded). This added more
complexity than needed. The clean up was to remove that code and just
enable tracepoints that exist or fail if they do not.

This removed a lot of code as well as the complexity that it brought.
As a side effect, instead of registering a tracepoint by its name, the
tracepoint needs to be registered with the tracepoint descriptor.
This removes having to duplicate the tracepoint names that are
enabled.

The second patch was added that simplified the way modules were
searched for.

This cleanup required changes that were in the 3.15 queue as well as
some changes that were added late in the 3.14-rc cycle. This final
change waited till the two were merged in upstream and then the change
was added and full tests were run. Unfortunately, the test found some
errors, but after it was already submitted to the for-next branch and
not to be rebased. Sparse errors were detected by Fengguang Wu's bot
tests, and my internal tests discovered that the anonymous union
initialization triggered a bug in older gcc compilers. Luckily, there
was a bugzilla for the gcc bug which gave a work around to the
problem. The third and fourth patch handled the sparse error and the
gcc bug respectively.

A final patch was tagged along to fix a missing documentation for the
README file"

* tag 'trace-3.15-v2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
tracing: Add missing function triggers dump and cpudump to README
tracing: Fix anonymous unions in struct ftrace_event_call
tracepoint: Fix sparse warnings in tracepoint.c
tracepoint: Simplify tracepoint module search
tracepoint: Use struct pointer instead of name hash for reg/unreg tracepoints

Linus Torvalds
2014-04-13 04:06:10 +0800
0b747172d Merge git://git.infradead.org/users/eparis/audit ... Browse Code »

Pull audit updates from Eric Paris.

* git://git.infradead.org/users/eparis/audit: (28 commits)
AUDIT: make audit_is_compat depend on CONFIG_AUDIT_COMPAT_GENERIC
audit: renumber AUDIT_FEATURE_CHANGE into the 1300 range
audit: do not cast audit_rule_data pointers pointlesly
AUDIT: Allow login in non-init namespaces
audit: define audit_is_compat in kernel internal header
kernel: Use RCU_INIT_POINTER(x, NULL) in audit.c
sched: declare pid_alive as inline
audit: use uapi/linux/audit.h for AUDIT_ARCH declarations
syscall_get_arch: remove useless function arguments
audit: remove stray newline from audit_log_execve_info() audit_panic() call
audit: remove stray newlines from audit_log_lost messages
audit: include subject in login records
audit: remove superfluous new- prefix in AUDIT_LOGIN messages
audit: allow user processes to log from another PID namespace
audit: anchor all pid references in the initial pid namespace
audit: convert PPIDs to the inital PID namespace.
pid: get pid_t ppid of task in init_pid_ns
audit: rename the misleading audit_get_context() to audit_take_context()
audit: Add generic compat syscall support
audit: Add CONFIG_HAVE_ARCH_AUDITSYSCALL
...

Linus Torvalds
2014-04-13 03:38:53 +0800

12 Apr, 2014

1 commit

a786c06d9 missing bits of "splice: fix racy pipe->buffers uses" ... Browse Code »

that commit has fixed only the parts of that mess in fs/splice.c itself;
there had been more in several other ->splice_read() instances...

Signed-off-by: Al Viro

Al Viro
2014-04-12 19:04:19 +0800

11 Apr, 2014

3 commits

a227960fe locking/mutex: Fix debug_mutexes ... Browse Code »

debug_mutex_unlock() would bail when !debug_locks and forgets to
actually unlock.

Reported-by: "Michael L. Semon"
Reported-by: "Kirill A. Shutemov"
Reported-by: Valdis Kletnieks
Fixes: 6f008e72cd11 ("locking/mutex: Fix debug checks")
Tested-by: Dave Jones
Cc: Jason Low
Signed-off-by: Peter Zijlstra
Link: http://lkml.kernel.org/r/20140410141559.GE13658@twins.programming.kicks-ass.net
Signed-off-by: Ingo Molnar

Peter Zijlstra
2014-04-11 16:40:35 +0800
60e69eed8 sched/numa: Fix task_numa_free() lockdep splat ... Browse Code »

Sasha reported that lockdep claims that the following commit:
made numa_group.lock interrupt unsafe:

156654f491dd ("sched/numa: Move task_numa_free() to __put_task_struct()")

While I don't see how that could be, given the commit in question moved
task_numa_free() from one irq enabled region to another, the below does
make both gripes and lockups upon gripe with numa=fake=4 go away.

Reported-by: Sasha Levin
Fixes: 156654f491dd ("sched/numa: Move task_numa_free() to __put_task_struct()")
Signed-off-by: Mike Galbraith
Signed-off-by: Peter Zijlstra
Cc: torvalds@linux-foundation.org
Cc: mgorman@suse.com
Cc: akpm@linux-foundation.org
Cc: Dave Jones
Link: http://lkml.kernel.org/r/1396860915.5170.5.camel@marge.simpson.net
Signed-off-by: Ingo Molnar

Mike Galbraith
2014-04-11 16:39:15 +0800
17a280ea8 tracing: Add missing function triggers dump and cpudump to README ... Browse Code »

The debugfs tracing README file lists all the function triggers except for
dump and cpudump. These should be added too.

Signed-off-by: Steven Rostedt

Steven Rostedt (Red Hat)
2014-04-11 10:43:37 +0800

10 Apr, 2014

1 commit

abb43f699 tracing: Fix anonymous unions in struct ftrace_event_call ... Browse Code »

gcc
Signed-off-by: Steven Rostedt

Mathieu Desnoyers
2014-04-10 08:02:55 +0800

09 Apr, 2014

4 commits

69cd9eba3 futex: avoid race between requeue and wake ... Browse Code »
5

Jan Stancek reported:
"pthread_cond_broadcast/4-1.c testcase from openposix testsuite (LTP)
occasionally fails, because some threads fail to wake up.

Testcase creates 5 threads, which are all waiting on same condition.
Main thread then calls pthread_cond_broadcast() without holding mutex,
which calls:

futex(uaddr1, FUTEX_CMP_REQUEUE_PRIVATE, 1, 2147483647, uaddr2, ..)

This immediately wakes up single thread A, which unlocks mutex and
tries to wake up another thread:

futex(uaddr2, FUTEX_WAKE_PRIVATE, 1)

If thread A manages to call futex_wake() before any waiters are
requeued for uaddr2, no other thread is woken up"

The ordering constraints for the hash bucket waiter counting are that
the waiter counts have to be incremented _before_ getting the spinlock
(because the spinlock acts as part of the memory barrier), but the
"requeue" operation didn't honor those rules, and nobody had even
thought about that case.

This fairly simple patch just increments the waiter count for the target
hash bucket (hb2) when requeing a futex before taking the locks. It
then decrements them again after releasing the lock - the code that
actually moves the futex(es) between hash buckets will do the additional
required waiter count housekeeping.

Reported-and-tested-by: Jan Stancek
Acked-by: Davidlohr Bueso
Cc: Peter Zijlstra
Cc: Thomas Gleixner
Cc: stable@vger.kernel.org # 3.14
Signed-off-by: Linus Torvalds

Linus Torvalds
2014-04-09 23:02:12 +0800
b725dfea2 tracepoint: Fix sparse warnings in tracepoint.c ... Browse Code »

Fix the following sparse warnings:

CHECK kernel/tracepoint.c
kernel/tracepoint.c:184:18: warning: incorrect type in assignment (different address spaces)
kernel/tracepoint.c:184:18: expected struct tracepoint_func *tp_funcs
kernel/tracepoint.c:184:18: got struct tracepoint_func [noderef] *funcs
kernel/tracepoint.c:216:18: warning: incorrect type in assignment (different address spaces)
kernel/tracepoint.c:216:18: expected struct tracepoint_func *tp_funcs
kernel/tracepoint.c:216:18: got struct tracepoint_func [noderef] *funcs
kernel/tracepoint.c:392:24: error: return expression in void function
CC kernel/tracepoint.o
kernel/tracepoint.c: In function tracepoint_module_going:
kernel/tracepoint.c:491:6: warning: symbol 'syscall_regfunc' was not declared. Should it be static?
kernel/tracepoint.c:508:6: warning: symbol 'syscall_unregfunc' was not declared. Should it be static?

Link: http://lkml.kernel.org/r/1397049883-28692-1-git-send-email-mathieu.desnoyers@efficios.com

Signed-off-by: Mathieu Desnoyers
Signed-off-by: Steven Rostedt

Mathieu Desnoyers
2014-04-09 22:12:11 +0800
eb7d035c5 tracepoint: Simplify tracepoint module search ... Browse Code »

Instead of copying the num_tracepoints and tracepoints_ptrs from
the module structure to the tp_mod structure, which only uses it to
find the module associated to tracepoints of modules that are coming
and going, simply copy the pointer to the module struct to the tracepoint
tp_module structure.

Also removed un-needed brackets around an if statement.

Link: http://lkml.kernel.org/r/20140408201705.4dad2c4a@gandalf.local.home

Acked-by: Mathieu Desnoyers
Signed-off-by: Steven Rostedt

Steven Rostedt (Red Hat)
2014-04-09 08:45:34 +0800
de7b29739 tracepoint: Use struct pointer instead of name hash for reg/unreg tracepoints ... Browse Code »
28

Register/unregister tracepoint probes with struct tracepoint pointer
rather than tracepoint name.

This change, which vastly simplifies tracepoint.c, has been proposed by
Steven Rostedt. It also removes 8.8kB (mostly of text) to the vmlinux
size.

From this point on, the tracers need to pass a struct tracepoint pointer
to probe register/unregister. A probe can now only be connected to a
tracepoint that exists. Moreover, tracers are responsible for
unregistering the probe before the module containing its associated
tracepoint is unloaded.

text data bss dec hex filename
10443444 4282528 10391552 25117524 17f4354 vmlinux.orig
10434930 4282848 10391552 25109330 17f2352 vmlinux

Link: http://lkml.kernel.org/r/1396992381-23785-2-git-send-email-mathieu.desnoyers@efficios.com

CC: Ingo Molnar
CC: Frederic Weisbecker
CC: Andrew Morton
CC: Frank Ch. Eigler
CC: Johannes Berg
Signed-off-by: Mathieu Desnoyers
[ SDR - fixed return val in void func in tracepoint_module_going() ]
Signed-off-by: Steven Rostedt

Mathieu Desnoyers
2014-04-09 08:43:28 +0800

08 Apr, 2014

10 commits

26c12d933 Merge branch 'akpm' (incoming from Andrew) ... Browse Code »

Merge second patch-bomb from Andrew Morton:
- the rest of MM
- zram updates
- zswap updates
- exit
- procfs
- exec
- wait
- crash dump
- lib/idr
- rapidio
- adfs, affs, bfs, ufs
- cris
- Kconfig things
- initramfs
- small amount of IPC material
- percpu enhancements
- early ioremap support
- various other misc things

* emailed patches from Andrew Morton : (156 commits)
MAINTAINERS: update Intel C600 SAS driver maintainers
fs/ufs: remove unused ufs_super_block_third pointer
fs/ufs: remove unused ufs_super_block_second pointer
fs/ufs: remove unused ufs_super_block_first pointer
fs/ufs/super.c: add __init to init_inodecache()
doc/kernel-parameters.txt: add early_ioremap_debug
arm64: add early_ioremap support
arm64: initialize pgprot info earlier in boot
x86: use generic early_ioremap
mm: create generic early_ioremap() support
x86/mm: sparse warning fix for early_memremap
lglock: map to spinlock when !CONFIG_SMP
percpu: add preemption checks to __this_cpu ops
vmstat: use raw_cpu_ops to avoid false positives on preemption checks
slub: use raw_cpu_inc for incrementing statistics
net: replace __this_cpu_inc in route.c with raw_cpu_inc
modules: use raw_cpu_write for initialization of per cpu refcount.
mm: use raw_cpu ops for determining current NUMA node
percpu: add raw_cpu_ops
slub: fix leak of 'name' in sysfs_slab_add
...

Linus Torvalds
2014-04-08 07:38:06 +0800
64b47e8fd lglock: map to spinlock when !CONFIG_SMP ... Browse Code »

When the system has only one CPU, lglock is effectively a spinlock; map
it directly to spinlock to eliminate the indirection and duplicate code.

In addition to removing overhead, this drops 1.6k of code with a
defconfig modified to have !CONFIG_SMP, and 1.1k with a minimal config.

Signed-off-by: Josh Triplett
Cc: Rusty Russell
Cc: Michal Marek
Cc: Thomas Gleixner
Cc: David Howells
Cc: "H. Peter Anvin"
Cc: Nick Piggin
Cc: Peter Zijlstra
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Josh Triplett
2014-04-08 07:36:14 +0800
08f141d3d modules: use raw_cpu_write for initialization of per cpu refcount. ... Browse Code »

The initialization of a structure is not subject to synchronization.
The use of __this_cpu would trigger a false positive with the additional
preemption checks for __this_cpu ops.

So simply disable the check through the use of raw_cpu ops.

Trace:

__this_cpu_write operation in preemptible [00000000] code: modprobe/286
caller is __this_cpu_preempt_check+0x38/0x60
CPU: 3 PID: 286 Comm: modprobe Tainted: GF 3.12.0-rc4+ #187
Call Trace:
dump_stack+0x4e/0x82
check_preemption_disabled+0xec/0x110
__this_cpu_preempt_check+0x38/0x60
load_module+0xcfd/0x2650
SyS_init_module+0xa6/0xd0
tracesys+0xe1/0xe6

Signed-off-by: Christoph Lameter
Acked-by: Ingo Molnar
Acked-by: Rusty Russell
Cc: Tejun Heo
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Christoph Lameter
2014-04-08 07:36:14 +0800
52f5684c8 kernel: use macros from compiler.h instead of __attribute__((...)) ... Browse Code »

To increase compiler portability there is which
provides convenience macros for various gcc constructs. Eg: __weak for
__attribute__((weak)). I've replaced all instances of gcc attributes
with the right macro in the kernel subsystem.

Signed-off-by: Gideon Israel Dsouza
Cc: "Rafael J. Wysocki"
Cc: Ingo Molnar
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Gideon Israel Dsouza
2014-04-08 07:36:11 +0800
d7c0847fe kernel/panic.c: display reason at end + pr_emerg ... Browse Code »

Currently, booting without initrd specified on 80x25 screen gives a call
trace followed by atkbd : Spurious ACK. Original message ("VFS: Unable
to mount root fs") is not available. Of course this could happen in
other situations...

This patch displays panic reason after call trace which could help lot
of people even if it's not the very last line on screen.

Also, convert all panic.c printk(KERN_EMERG to pr_emerg(

[akpm@linux-foundation.org: missed a couple of pr_ conversions]
Signed-off-by: Fabian Frederick
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Fabian Frederick
2014-04-08 07:36:08 +0800
80df28476 hung_task: check the value of "sysctl_hung_task_timeout_sec" ... Browse Code »
5

As sysctl_hung_task_timeout_sec is unsigned long, when this value is
larger then LONG_MAX/HZ, the function schedule_timeout_interruptible in
watchdog will return immediately without sleep and with print :

schedule_timeout: wrong timeout value ffffffffffffff83

and then the funtion watchdog will call schedule_timeout_interruptible
again and again. The screen will be filled with

"schedule_timeout: wrong timeout value ffffffffffffff83"

This patch does some check and correction in sysctl, to let the function
schedule_timeout_interruptible allways get the valid parameter.

Signed-off-by: Liu Hua
Tested-by: Satoru Takeuchi
Cc: [3.4+]
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Liu Hua
2014-04-08 07:36:07 +0800
7c733eb3e wait: WSTOPPED|WCONTINUED doesn't work if a zombie leader is traced by another process ... Browse Code »

Even if the main thread is dead the process still can stop/continue.
However, if the leader is ptraced wait_consider_task(ptrace => false)
always skips wait_task_stopped/wait_task_continued, so WSTOPPED or
WCONTINUED can never work for the natural parent in this case.

Move the "A zombie ptracee is only visible to its ptracer" check into the
"if (!delay_group_leader(p))" block. ->notask_error is cleared by the
"fall through" code below.

This depends on the previous change, wait_task_stopped/continued must be
avoided if !delay_group_leader() and the tracer is ->real_parent.
Otherwise WSTOPPED|WEXITED could wrongly report "stopped" when the child
is already dead (single-threaded or not). If it is traced by another task
then the "stopped" state is fine until the debugger detaches and reveals a
zombie state.

Stupid test-case:

void *tfunc(void *arg)
{
sleep(1); // wait for zombie leader
raise(SIGSTOP);
exit(0x13);
return NULL;
}

int run_child(void)
{
pthread_t thread;

if (!fork()) {
int tracee = getppid();

assert(ptrace(PTRACE_ATTACH, tracee, 0,0) == 0);
do
ptrace(PTRACE_CONT, tracee, 0,0);
while (wait(NULL) > 0);

return 0;
}

sleep(1); // wait for PTRACE_ATTACH
assert(pthread_create(&thread, NULL, tfunc, NULL) == 0);
pthread_exit(NULL);
}

int main(void)
{
int child, stat;

child = fork();
if (!child)
return run_child();

assert(child == waitpid(-1, &stat, WSTOPPED));
assert(stat == 0x137f);

kill(child, SIGCONT);

assert(child == waitpid(-1, &stat, WCONTINUED));
assert(stat == 0xffff);

assert(child == waitpid(-1, &stat, 0));
assert(stat == 0x1300);

return 0;
}

Without this patch it hangs in waitpid(WSTOPPED), wait_task_stopped() is
never called.

Note: this doesn't fix all problems with a zombie delay_group_leader(),
WCONTINUED | WEXITED check is not exactly right. debugger can't assume it
will be notified if another thread reaps the whole thread group.

Signed-off-by: Oleg Nesterov
Cc: Al Viro
Cc: Jan Kratochvil
Cc: Lennart Poettering
Cc: Michal Schmidt
Cc: Roland McGrath
Cc: Tejun Heo
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Oleg Nesterov
2014-04-08 07:36:06 +0800
377d75daf wait: WSTOPPED|WCONTINUED hangs if a zombie child is traced by real_parent ... Browse Code »

"A zombie is only visible to its ptracer" logic in wait_consider_task()
is very wrong. Trivial test-case:

#include
#include
#include
#include

int main(void)
{
int child = fork();

if (!child) {
assert(ptrace(PTRACE_TRACEME, 0,0,0) == 0);
return 0x23;
}

assert(waitid(P_ALL, child, NULL, WEXITED | WNOWAIT) == 0);
assert(waitid(P_ALL, 0, NULL, WSTOPPED) == -1);
return 0;
}

it hangs in waitpid(WSTOPPED) despite the fact it has a single zombie
child. This is because wait_consider_task(ptrace => 0) sees p->ptrace and
cleares ->notask_error assuming that the debugger should detach and notify
us.

Change wait_consider_task(ptrace => 0) to pretend that ptrace == T if the
child is traced by us. This really simplifies the logic and allows us to
do more fixes, see the next changes. This also hides the unwanted group
stop state automatically, we can remove another ptrace_reparented() check.

Unfortunately, this adds the following behavioural changes:

1. Before this patch wait(WEXITED | __WNOTHREAD) does not reap
a natural child if it is traced by the caller's sub-thread.

Hopefully nobody will ever notice this change, and I think
that nobody should rely on this behaviour anyway.

2. SIGNAL_STOP_CONTINUED is no longer hidden from debugger if
it is real parent.

While this change comes as a side effect, I think it is good
by itself. The group continued state can not be consumed by
another process in this case, it doesn't depend on ptrace,
it doesn't make sense to hide it from real parent.

Perhaps we should add the thread_group_leader() check before
wait_task_continued()? May be, but this shouldn't depend on
ptrace_reparented().

Signed-off-by: Oleg Nesterov
Cc: Al Viro
Cc: Jan Kratochvil
Cc: Lennart Poettering
Cc: Michal Schmidt
Cc: Roland McGrath
Cc: Tejun Heo
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Oleg Nesterov
2014-04-08 07:36:06 +0800
b3ab03160 wait: completely ignore the EXIT_DEAD tasks ... Browse Code »
13

Now that EXIT_DEAD is the terminal state it doesn't make sense to call
eligible_child() or security_task_wait() if the task is really dead.

Signed-off-by: Oleg Nesterov
Tested-by: Michal Schmidt
Cc: Jan Kratochvil
Cc: Al Viro
Cc: Lennart Poettering
Cc: Roland McGrath
Cc: Tejun Heo
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Oleg Nesterov
2014-04-08 07:36:06 +0800
b43606905 wait: use EXIT_TRACE only if thread_group_leader(zombie) ... Browse Code »

wait_task_zombie() always uses EXIT_TRACE/ptrace_unlink() if
ptrace_reparented(). This is suboptimal and a bit confusing: we do not
need do_notify_parent(p) if !thread_group_leader(p) and in this case we
also do not need ptrace_unlink(), we can rely on ptrace_release_task().

Change wait_task_zombie() to check thread_group_leader() along with
ptrace_reparented() and simplify the final p->exit_state transition.

Signed-off-by: Oleg Nesterov
Tested-by: Michal Schmidt
Cc: Jan Kratochvil
Cc: Al Viro
Cc: Lennart Poettering
Cc: Roland McGrath
Cc: Tejun Heo
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Oleg Nesterov
2014-04-08 07:36:05 +0800