Eric Lee / smarc-fsl-linux-kernel

03 Aug, 2013

1 commit

1fe0135b9 Merge tag 'pm+acpi-3.11-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm ... Browse Code »

Pull ACPI and power management fixes from Rafael Wysocki:

- Revert two cpuidle commits added during the 3.8 development cycle
that turn out to have introduced a significant performance regression
as requested by Jeremy Eder.

- The recent patches that made the freezer less heavy-weight introduced
a regression causing user-space-driven hibernation using the ioctl()
interface to block indefinitely when the hibernate process executes
try_to_freeze(). Fix from Colin Cross addresses this by adding a
process flag to mark the hibernate/suspend process to inform the
freezer that that process should be ignored.

- One of the recent cpufreq reverts uncovered a problem in the core
causing the cpufreq driver module refcount to become negative after a
system suspend-resume cycle. Fix from Rafael J Wysocki.

- The evaluation of the ACPI battery _BIX method has never worked
correctly, because the commit that added support for it forgot to
take the "Revision" field in the return package into account. As a
result, the reading of battery info doesn't work at all on some
systems, which is addressed by a fix from Lan Tianyu.

* tag 'pm+acpi-3.11-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
freezer: set PF_SUSPEND_TASK flag on tasks that call freeze_processes
ACPI / battery: Fix parsing _BIX return value
cpufreq: Fix cpufreq driver module refcount balance after suspend/resume
Revert "cpuidle: Quickly notice prediction failure for repeat mode"
Revert "cpuidle: Quickly notice prediction failure in general case"

Linus Torvalds
2013-08-03 03:21:32 +0800

01 Aug, 2013

8 commits

19788a900 Merge branch 'akpm' (patches from Andrew Morton) ... Browse Code »

Merge more patches from Andrew Morton:
"A bunch of fixes.

Plus Joe's printk move and rework. It's not a -rc3 thing but now
would be a nice time to offload it, while things are quiet. I've been
sitting on it all for a couple of weeks, no issues"

* emailed patches from Andrew Morton :
vmpressure: make sure there are no events queued after memcg is offlined
vmpressure: do not check for pending work to prevent from new work
vmpressure: change vmpressure::sr_lock to spinlock
printk: rename struct log to struct printk_log
printk: use pointer for console_cmdline indexing
printk: move braille console support into separate braille.[ch] files
printk: add console_cmdline.h
printk: move to separate directory for easier modification
drivers/rtc/rtc-twl.c: fix: rtcX/wakealarm attribute isn't created
mm: zbud: fix condition check on allocation size
thp, mm: avoid PageUnevictable on active/inactive lru lists
mm/swap.c: clear PageActive before adding pages onto unevictable list
arch/x86/platform/ce4100/ce4100.c: include reboot.h
mm: sched: numa: fix NUMA balancing when !SCHED_DEBUG
rapidio: fix use after free in rio_unregister_scan()
.gitignore: ignore *.lz4 files
MAINTAINERS: dynamic debug: Jason's not there...
dmi_scan: add comments on dmi_present() and the loop in dmi_scan_machine()
ocfs2/refcounttree: add the missing NULL check of the return value of find_or_create_page()
mm: mempolicy: fix mbind_range() && vma_adjust() interaction

Linus Torvalds
2013-08-01 08:52:04 +0800
62e32ac35 printk: rename struct log to struct printk_log ... Browse Code »

Rename the struct to enable moving portions of
printk.c to separate files.

The rename changes output of /proc/vmcoreinfo.

Signed-off-by: Joe Perches
Cc: Samuel Thibault
Cc: Ming Lei
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Joe Perches
2013-08-01 05:41:03 +0800
23475408c printk: use pointer for console_cmdline indexing ... Browse Code »

Make the code a bit more compact by always using a pointer for the active
console_cmdline.

Move overly indented code to correct indent level.

Signed-off-by: Joe Perches
Cc: Samuel Thibault
Cc: Ming Lei
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Joe Perches
2013-08-01 05:41:03 +0800
bbeddf52a printk: move braille console support into separate braille.[ch] files ... Browse Code »

Create files with prototypes and static inlines for braille support. Make
braille_console functions return 1 on success.

Corrected CONFIG_A11Y_BRAILLE_CONSOLE=n _braille_console_setup
return value to NULL.

Signed-off-by: Joe Perches
Reviewed-by: Samuel Thibault
Cc: Ming Lei
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Joe Perches
2013-08-01 05:41:03 +0800
d197c43d0 printk: add console_cmdline.h ... Browse Code »

Add an include file for the console_cmdline struct so that the braille
console driver can be separated.

Signed-off-by: Joe Perches
Cc: Samuel Thibault
Cc: Ming Lei
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Joe Perches
2013-08-01 05:41:03 +0800
b9ee979e9 printk: move to separate directory for easier modification ... Browse Code »

Make it easier to break up printk into bite-sized chunks.

Remove printk path/filename from comment.

Signed-off-by: Joe Perches
Cc: Samuel Thibault
Cc: Ming Lei
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Joe Perches
2013-08-01 05:41:03 +0800
10e84b97e mm: sched: numa: fix NUMA balancing when !SCHED_DEBUG ... Browse Code »

Commit 3105b86a9fee ("mm: sched: numa: Control enabling and disabling of
NUMA balancing if !SCHED_DEBUG") defined numabalancing_enabled to
control the enabling and disabling of automatic NUMA balancing, but it
is never used.

I believe the intention was to use this in place of sched_feat_numa(NUMA).

Currently, if SCHED_DEBUG is not defined, sched_feat_numa(NUMA) will
never be changed from the initial "false".

Signed-off-by: Dave Kleikamp
Acked-by: Mel Gorman
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Dave Kleikamp
2013-08-01 05:41:02 +0800
06693f305 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net ... Browse Code »

Pull networking fixes from David Miller:

1) Fix association failures not triggering a connect-failure event in
cfg80211, from Johannes Berg.

2) Eliminate a potential NULL deref with older iptables tools when
configuring xt_socket rules, from Eric Dumazet.

3) Missing RTNL locking in wireless regulatory code, from Johannes
Berg.

4) Fix OOPS caused by firmware loading races in ath9k_htc, from Alexey
Khoroshilov.

5) Fix usb URB leak in usb_8dev CAN driver, also from Alexey
Khoroshilov.

6) VXLAN namespace teardown fails to unregister devices, from Stephen
Hemminger.

7) Fix multicast settings getting dropped by firmware in qlcnic driver,
from Sucheta Chakraborty.

8) Add sysctl range enforcement for tcp_syn_retries, from Michal Tesar.

9) Fix a nasty bug in bridging where an active timer would get
reinitialized with a setup_timer() call. From Eric Dumazet.

10) Fix use after free in new mlx5 driver, from Dan Carpenter.

11) Fix freed pointer reference in ipv6 multicast routing on namespace
cleanup, from Hannes Frederic Sowa.

12) Some usbnet drivers report TSO and SG in their feature set, but the
usbnet layer doesn't really support them. From Eric Dumazet.

13) Fix crash on EEH errors in tg3 driver, from Gavin Shan.

14) Drop cb_lock when requesting modules in genetlink, from Stanislaw
Gruszka.

15) Kernel stack leaks in cbq scheduler and af_key pfkey messages, from
Dan Carpenter.

16) FEC driver erroneously signals NETDEV_TX_BUSY on transmit leading to
endless loops, from Uwe Kleine-König.

17) Fix hangs from loading mvneta driver, from Arnaud Patard.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (84 commits)
mlx5: fix error return code in mlx5_alloc_uuars()
mvneta: Try to fix mvneta when compiled as module
mvneta: Fix hang when loading the mvneta driver
atl1c: Fix misuse of netdev_alloc_skb in refilling rx ring
genetlink: fix usage of NLM_F_EXCL or NLM_F_REPLACE
af_key: more info leaks in pfkey messages
net/fec: Don't let ndo_start_xmit return NETDEV_TX_BUSY without link
net_sched: Fix stack info leak in cbq_dump_wrr().
igb: fix vlan filtering in promisc mode when not in VT mode
ixgbe: Fix Tx Hang issue with lldpad on 82598EB
genetlink: release cb_lock before requesting additional module
net: fec: workaround stop tx during errata ERR006358
qlcnic: Fix diagnostic interrupt test for 83xx adapters.
qlcnic: Fix setting Guest VLAN
qlcnic: Fix operation type and command type.
qlcnic: Fix initialization of work function.
Revert "atl1c: Fix misuse of netdev_alloc_skb in refilling rx ring"
atl1c: Fix misuse of netdev_alloc_skb in refilling rx ring
net/tg3: Fix warning from pci_disable_device()
net/tg3: Fix kernel crash
...

Linus Torvalds
2013-08-01 03:56:18 +0800

30 Jul, 2013

1 commit

2b44c4db2 freezer: set PF_SUSPEND_TASK flag on tasks that call freeze_processes ... Browse Code »

Calling freeze_processes sets a global flag that will cause any
process that calls try_to_freeze to enter the refrigerator. It
skips sending a signal to the current task, but if the current
task ever hits try_to_freeze, all threads will be frozen and the
system will deadlock.

Set a new flag, PF_SUSPEND_TASK, on the task that calls
freeze_processes. The flag notifies the freezer that the thread
is involved in suspend and should not be frozen. Also add a
WARN_ON in thaw_processes if the caller does not have the
PF_SUSPEND_TASK flag set to catch if a different task calls
thaw_processes than the one that called freeze_processes, leaving
a task with PF_SUSPEND_TASK permanently set on it.

Threads that spawn off a task with PF_SUSPEND_TASK set (which
swsusp does) will also have PF_SUSPEND_TASK set, preventing them
from freezing while they are helping with suspend, but they need
to be dead by the time suspend is triggered, otherwise they may
run when userspace is expected to be frozen. Add a WARN_ON in
thaw_processes if more than one thread has the PF_SUSPEND_TASK
flag set.

Reported-and-tested-by: Michael Leun
Signed-off-by: Colin Cross
Signed-off-by: Rafael J. Wysocki

Colin Cross
2013-07-30 20:05:06 +0800

29 Jul, 2013

2 commits

148519120 Revert "cpuidle: Quickly notice prediction failure for repeat mode" ... Browse Code »

Revert commit 69a37bea (cpuidle: Quickly notice prediction failure for
repeat mode), because it has been identified as the source of a
significant performance regression in v3.8 and later as explained by
Jeremy Eder:

We believe we've identified a particular commit to the cpuidle code
that seems to be impacting performance of variety of workloads.
The simplest way to reproduce is using netperf TCP_RR test, so
we're using that, on a pair of Sandy Bridge based servers. We also
have data from a large database setup where performance is also
measurably/positively impacted, though that test data isn't easily
share-able.

Included below are test results from 3 test kernels:

kernel reverts
-----------------------------------------------------------
1) vanilla upstream (no reverts)

2) perfteam2 reverts e11538d1f03914eb92af5a1a378375c05ae8520c

3) test reverts 69a37beabf1f0a6705c08e879bdd5d82ff6486c4
e11538d1f03914eb92af5a1a378375c05ae8520c

In summary, netperf TCP_RR numbers improve by approximately 4%
after reverting 69a37beabf1f0a6705c08e879bdd5d82ff6486c4. When
69a37beabf1f0a6705c08e879bdd5d82ff6486c4 is included, C0 residency
never seems to get above 40%. Taking that patch out gets C0 near
100% quite often, and performance increases.

The below data are histograms representing the %c0 residency @
1-second sample rates (using turbostat), while under netperf test.

- If you look at the first 4 histograms, you can see %c0 residency
almost entirely in the 30,40% bin.
- The last pair, which reverts 69a37beabf1f0a6705c08e879bdd5d82ff6486c4,
shows %c0 in the 80,90,100% bins.

Below each kernel name are netperf TCP_RR trans/s numbers for the
particular kernel that can be disclosed publicly, comparing the 3
test kernels. We ran a 4th test with the vanilla kernel where
we've also set /dev/cpu_dma_latency=0 to show overall impact
boosting single-threaded TCP_RR performance over 11% above
baseline.

3.10-rc2 vanilla RX + c0 lock (/dev/cpu_dma_latency=0):
TCP_RR trans/s 54323.78

-----------------------------------------------------------
3.10-rc2 vanilla RX (no reverts)
TCP_RR trans/s 48192.47

Receiver %c0
0.0000 - 10.0000 [ 1]: *
10.0000 - 20.0000 [ 0]:
20.0000 - 30.0000 [ 0]:
30.0000 - 40.0000 [ 59]:
***********************************************************
40.0000 - 50.0000 [ 1]: *
50.0000 - 60.0000 [ 0]:
60.0000 - 70.0000 [ 0]:
70.0000 - 80.0000 [ 0]:
80.0000 - 90.0000 [ 0]:
90.0000 - 100.0000 [ 0]:

Sender %c0
0.0000 - 10.0000 [ 1]: *
10.0000 - 20.0000 [ 0]:
20.0000 - 30.0000 [ 0]:
30.0000 - 40.0000 [ 11]: ***********
40.0000 - 50.0000 [ 49]:
*************************************************
50.0000 - 60.0000 [ 0]:
60.0000 - 70.0000 [ 0]:
70.0000 - 80.0000 [ 0]:
80.0000 - 90.0000 [ 0]:
90.0000 - 100.0000 [ 0]:

-----------------------------------------------------------
3.10-rc2 perfteam2 RX (reverts commit
e11538d1f03914eb92af5a1a378375c05ae8520c)
TCP_RR trans/s 49698.69

Receiver %c0
0.0000 - 10.0000 [ 1]: *
10.0000 - 20.0000 [ 1]: *
20.0000 - 30.0000 [ 0]:
30.0000 - 40.0000 [ 59]:
***********************************************************
40.0000 - 50.0000 [ 0]:
50.0000 - 60.0000 [ 0]:
60.0000 - 70.0000 [ 0]:
70.0000 - 80.0000 [ 0]:
80.0000 - 90.0000 [ 0]:
90.0000 - 100.0000 [ 0]:

Sender %c0
0.0000 - 10.0000 [ 1]: *
10.0000 - 20.0000 [ 0]:
20.0000 - 30.0000 [ 0]:
30.0000 - 40.0000 [ 2]: **
40.0000 - 50.0000 [ 58]:
**********************************************************
50.0000 - 60.0000 [ 0]:
60.0000 - 70.0000 [ 0]:
70.0000 - 80.0000 [ 0]:
80.0000 - 90.0000 [ 0]:
90.0000 - 100.0000 [ 0]:

-----------------------------------------------------------
3.10-rc2 test RX (reverts 69a37beabf1f0a6705c08e879bdd5d82ff6486c4
and e11538d1f03914eb92af5a1a378375c05ae8520c)
TCP_RR trans/s 47766.95

Receiver %c0
0.0000 - 10.0000 [ 1]: *
10.0000 - 20.0000 [ 1]: *
20.0000 - 30.0000 [ 0]:
30.0000 - 40.0000 [ 27]: ***************************
40.0000 - 50.0000 [ 2]: **
50.0000 - 60.0000 [ 0]:
60.0000 - 70.0000 [ 2]: **
70.0000 - 80.0000 [ 0]:
80.0000 - 90.0000 [ 0]:
90.0000 - 100.0000 [ 28]: ****************************

Sender:
0.0000 - 10.0000 [ 1]: *
10.0000 - 20.0000 [ 0]:
20.0000 - 30.0000 [ 0]:
30.0000 - 40.0000 [ 11]: ***********
40.0000 - 50.0000 [ 0]:
50.0000 - 60.0000 [ 1]: *
60.0000 - 70.0000 [ 0]:
70.0000 - 80.0000 [ 3]: ***
80.0000 - 90.0000 [ 7]: *******
90.0000 - 100.0000 [ 38]: **************************************

These results demonstrate gaining back the tendency of the CPU to
stay in more responsive, performant C-states (and thus yield
measurably better performance), by reverting commit
69a37beabf1f0a6705c08e879bdd5d82ff6486c4.

Requested-by: Jeremy Eder
Tested-by: Len Brown
Cc: 3.8+
Signed-off-by: Rafael J. Wysocki

Rafael J. Wysocki
2013-07-29 19:32:29 +0800
6803f37e0 Merge tag 'trace-fixes-3.11-rc2' of git://git.kernel.org/pub/scm/linux/kernel/gi… ... Browse Code »

…t/rostedt/linux-trace

Pull tracing fixes from Steven Rostedt:
"Oleg is working on fixing a very tight race between opening a event
file and deleting that event at the same time (both must be done as
root).

I also found a bug while testing Oleg's patches which has to do with a
race with kprobes using the function tracer.

There's also a deadlock fix that was introduced with the previous
fixes"

* tag 'trace-fixes-3.11-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
tracing: Remove locking trace_types_lock from tracing_reset_all_online_cpus()
ftrace: Add check for NULL regs if ops has SAVE_REGS set
tracing: Kill trace_cpu struct/members
tracing: Change tracing_fops/snapshot_fops to rely on tracing_get_cpu()
tracing: Change tracing_entries_fops to rely on tracing_get_cpu()
tracing: Change tracing_stats_fops to rely on tracing_get_cpu()
tracing: Change tracing_buffers_fops to rely on tracing_get_cpu()
tracing: Change tracing_pipe_fops() to rely on tracing_get_cpu()
tracing: Introduce trace_create_cpu_file() and tracing_get_cpu()

Linus Torvalds
2013-07-29 09:10:39 +0800

27 Jul, 2013

1 commit

d738ce8fd sysctl: range checking in do_proc_dointvec_ms_jiffies_conv ... Browse Code »

When (integer) sysctl values are expressed in ms and have to be
represented internally as jiffies. The msecs_to_jiffies function
returns an unsigned long, which gets assigned to the integer.
This patch prevents the value to be assigned if bigger than
INT_MAX, done in a similar way as in cba9f3 ("Range checking in
do_proc_dointvec_(userhz_)jiffies_conv").

Signed-off-by: Francesco Fusco
CC: Andrew Morton
CC: linux-kernel@vger.kernel.org
Signed-off-by: David S. Miller

Francesco Fusco
2013-07-27 05:22:10 +0800

26 Jul, 2013

1 commit

09d8091c0 tracing: Remove locking trace_types_lock from tracing_reset_all_online_cpus() ... Browse Code »

Commit a82274151af "tracing: Protect ftrace_trace_arrays list in trace_events.c"
added taking the trace_types_lock mutex in trace_events.c as there were
several locations that needed it for protection. Unfortunately, it also
encapsulated a call to tracing_reset_all_online_cpus() which also takes
the trace_types_lock, causing a deadlock.

This happens when a module has tracepoints and has been traced. When the
module is removed, the trace events module notifier will grab the
trace_types_lock, do a bunch of clean ups, and also clears the buffer
by calling tracing_reset_all_online_cpus. This doesn't happen often
which explains why it wasn't caught right away.

Commit a82274151af was marked for stable, which means this must be
sent to stable too.

Link: http://lkml.kernel.org/r/51EEC646.7070306@broadcom.com

Reported-by: Arend van Spril
Tested-by: Arend van Spriel
Cc: Alexander Z Lam
Cc: Vaibhav Nagarnaik
Cc: David Sharp
Cc: stable@vger.kernel.org # 3.10
Signed-off-by: Steven Rostedt

Steven Rostedt (Red Hat)
2013-07-26 20:57:32 +0800

24 Jul, 2013

10 commits

195a8afc7 ftrace: Add check for NULL regs if ops has SAVE_REGS set ... Browse Code »

If a ftrace ops is registered with the SAVE_REGS flag set, and there's
already a ops registered to one of its functions but without the
SAVE_REGS flag, there's a small race window where the SAVE_REGS ops gets
added to the list of callbacks to call for that function before the
callback trampoline gets set to save the regs.

The problem is, the function is not currently saving regs, which opens
a small race window where the ops that is expecting regs to be passed
to it, wont. This can cause a crash if the callback were to reference
the regs, as the SAVE_REGS guarantees that regs will be set.

To fix this, we add a check in the loop case where it checks if the ops
has the SAVE_REGS flag set, and if so, it will ignore it if regs is
not set.

Signed-off-by: Steven Rostedt

Steven Rostedt (Red Hat)
2013-07-24 23:22:54 +0800
9c01fe459 tracing: Kill trace_cpu struct/members ... Browse Code »

After the previous changes trace_array_cpu->trace_cpu and
trace_array->trace_cpu becomes write-only. Remove these members
and kill "struct trace_cpu" as well.

As a side effect this also removes memset(per_cpu_memory, 0).
It was not needed, alloc_percpu() returns zero-filled memory.

Link: http://lkml.kernel.org/r/20130723152613.GA23741@redhat.com

Signed-off-by: Oleg Nesterov
Signed-off-by: Steven Rostedt

Oleg Nesterov
2013-07-24 23:22:53 +0800
6484c71cb tracing: Change tracing_fops/snapshot_fops to rely on tracing_get_cpu() ... Browse Code »

tracing_open() and tracing_snapshot_open() are racy, the memory
inode->i_private points to can be already freed.

Convert these last users of "inode->i_private == trace_cpu" to
use "i_private = trace_array" and rely on tracing_get_cpu().

v2: incorporate the fix from Steven, tracing_release() must not
blindly dereference file->private_data unless we know that
the file was opened for reading.

Link: http://lkml.kernel.org/r/20130723152610.GA23737@redhat.com

Signed-off-by: Oleg Nesterov
Signed-off-by: Steven Rostedt

Oleg Nesterov
2013-07-24 23:22:53 +0800
0bc392ee4 tracing: Change tracing_entries_fops to rely on tracing_get_cpu() ... Browse Code »

tracing_open_generic_tc() is racy, the memory inode->i_private
points to can be already freed.

1. Change its last user, tracing_entries_fops, to use
tracing_*_generic_tr() instead.

2. Change debugfs_create_file("buffer_size_kb", data) callers
to pass "data = tr".

3. Change tracing_entries_read() and tracing_entries_write() to
use tracing_get_cpu().

4. Kill the no longer used tracing_open_generic_tc() and
tracing_release_generic_tc().

Link: http://lkml.kernel.org/r/20130723152606.GA23730@redhat.com

Signed-off-by: Oleg Nesterov
Signed-off-by: Steven Rostedt

Oleg Nesterov
2013-07-24 23:22:52 +0800
4d3435b8a tracing: Change tracing_stats_fops to rely on tracing_get_cpu() ... Browse Code »

tracing_open_generic_tc() is racy, the memory inode->i_private
points to can be already freed.

1. Change one of its users, tracing_stats_fops, to use
tracing_*_generic_tr() instead.

2. Change trace_create_cpu_file("stats", data) to pass "data = tr".

3. Change tracing_stats_read() to use tracing_get_cpu().

Link: http://lkml.kernel.org/r/20130723152603.GA23727@redhat.com

Signed-off-by: Oleg Nesterov
Signed-off-by: Steven Rostedt

Oleg Nesterov
2013-07-24 23:22:52 +0800
46ef2be0d tracing: Change tracing_buffers_fops to rely on tracing_get_cpu() ... Browse Code »

tracing_buffers_open() is racy, the memory inode->i_private points
to can be already freed.

Change debugfs_create_file("trace_pipe_raw", data) caller to pass
"data = tr", tracing_buffers_open() can use tracing_get_cpu().

Change debugfs_create_file("snapshot_raw_fops", data) caller too,
this file uses tracing_buffers_open/release.

Link: http://lkml.kernel.org/r/20130723152600.GA23720@redhat.com

Signed-off-by: Oleg Nesterov
Signed-off-by: Steven Rostedt

Oleg Nesterov
2013-07-24 23:22:51 +0800
15544209c tracing: Change tracing_pipe_fops() to rely on tracing_get_cpu() ... Browse Code »

tracing_open_pipe() is racy, the memory inode->i_private points to
can be already freed.

Change debugfs_create_file("trace_pipe", data) callers to to pass
"data = tr", tracing_open_pipe() can use tracing_get_cpu().

Link: http://lkml.kernel.org/r/20130723152557.GA23717@redhat.com

Signed-off-by: Oleg Nesterov
Signed-off-by: Steven Rostedt

Oleg Nesterov
2013-07-24 23:22:51 +0800
649e9c70d tracing: Introduce trace_create_cpu_file() and tracing_get_cpu() ... Browse Code »

Every "file_operations" used by tracing_init_debugfs_percpu is buggy.
f_op->open/etc does:

1. struct trace_cpu *tc = inode->i_private;
struct trace_array *tr = tc->tr;

2. trace_array_get(tr) or fail;

3. do_something(tc);

But tc (and tr) can be already freed before trace_array_get() is called.
And it doesn't matter whether this file is per-cpu or it was created by
init_tracer_debugfs(), free_percpu() or kfree() are equally bad.

Note that even 1. is not safe, the freed memory can be unmapped. But even
if it was safe trace_array_get() can wrongly succeed if we also race with
the next new_instance_create() which can re-allocate the same tr, or tc
was overwritten and ->tr points to the valid tr. In this case 3. uses the
freed/reused memory.

Add the new trivial helper, trace_create_cpu_file() which simply calls
trace_create_file() and encodes "cpu" in "struct inode". Another helper,
tracing_get_cpu() will be used to read cpu_nr-or-RING_BUFFER_ALL_CPUS.

The patch abuses ->i_cdev to encode the number, it is never used unless
the file is S_ISCHR(). But we could use something else, say, i_bytes or
even ->d_fsdata. In any case this hack is hidden inside these 2 helpers,
it would be trivial to change them if needed.

This patch only changes tracing_init_debugfs_percpu() to use the new
trace_create_cpu_file(), the next patches will change file_operations.

Note: tracing_get_cpu(inode) is always safe but you can't trust the
result unless trace_array_get() was called, without trace_types_lock
which acts as a barrier it can wrongly return RING_BUFFER_ALL_CPUS.

Link: http://lkml.kernel.org/r/20130723152554.GA23710@redhat.com

Cc: Al Viro
Signed-off-by: Oleg Nesterov
Signed-off-by: Steven Rostedt

Oleg Nesterov
2013-07-24 23:22:13 +0800
c2468d32f Merge branch 'for-3.11-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup ... Browse Code »

Pull cgroup changes from Tejun Heo:
"This contains two patches, both of which aren't fixes per-se but I
think it'd be better to fast-track them.

One removes bcache_subsys_id which was added without proper review
through the block tree. Fortunately, bcache cgroup code is
unconditionally disabled, so this was never exposed to userland. The
cgroup subsys_id is removed. Kent will remove the affected (disabled)
code through bcache branch.

The other simplifies task_group_path_from_hierarchy(). The function
doesn't currently have in-kernel users but there are external code and
development going on dependent on the function and making the function
available for 3.11 would make things go smoother"

* 'for-3.11-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
cgroup: replace task_cgroup_path_from_hierarchy() with task_cgroup_path()
cgroup: remove bcache_subsys_id which got added stealthily

Linus Torvalds
2013-07-24 06:48:35 +0800
42577ca8c Fix __wait_on_atomic_t() to call the action func if the counter != 0 ... Browse Code »

Fix __wait_on_atomic_t() so that it calls the action func if the counter != 0
rather than if the counter is 0 so as to be analogous to __wait_on_bit().

Thanks to Yacine who found this by visual inspection.

This will affect FS-Cache in that it will could fail to sleep correctly when
trying to clean up after a netfs cookie is withdrawn.

Reported-by: Yacine Belkadi
Signed-off-by: David Howells
Reviewed-by: Jeff Layton
cc: Milosz Tanski
Signed-off-by: Linus Torvalds

David Howells
2013-07-24 06:46:48 +0800

23 Jul, 2013

1 commit

b3a3a9c44 Merge tag 'trace-3.11-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace ... Browse Code »

Pull tracing fixes and cleanups from Steven Rostedt:
"This contains fixes, optimizations and some clean ups

Some of the fixes need to go back to 3.10. They are minor, and deal
mostly with incorrect ref counting in accessing event files.

There was a couple of optimizations that should have perf perform a
bit better when accessing trace events.

And some various clean ups. Some of the clean ups are necessary to
help in a fix to a theoretical race between opening a event file and
deleting that event"

* tag 'trace-3.11-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
tracing: Kill the unbalanced tr->ref++ in tracing_buffers_open()
tracing: Kill trace_array->waiter
tracing: Do not (ab)use trace_seq in event_id_read()
tracing: Simplify the iteration logic in f_start/f_next
tracing: Add ref_data to function and fgraph tracer structs
tracing: Miscellaneous fixes for trace_array ref counting
tracing: Fix error handling to ensure instances can always be removed
tracing/kprobe: Wait for disabling all running kprobe handlers
tracing/perf: Move the PERF_MAX_TRACE_SIZE check into perf_trace_buf_prepare()
tracing/syscall: Avoid perf_trace_buf_*() if sys_data->perf_events is empty
tracing/function: Avoid perf_trace_buf_*() if event_function.perf_events is empty
tracing: Typo fix on ring buffer comments
tracing: Use trace_seq_puts()/trace_seq_putc() where possible
tracing: Use correct config guard CONFIG_STACK_TRACER

Linus Torvalds
2013-07-23 10:07:24 +0800

20 Jul, 2013

2 commits

e70e78e3c tracing: Kill the unbalanced tr->ref++ in tracing_buffers_open() ... Browse Code »

tracing_buffers_open() does trace_array_get() and then it wrongly
inrcements tr->ref again under trace_types_lock. This means that
every caller leaks trace_array:

# cd /sys/kernel/debug/tracing/
# mkdir instances/X
# true < instances/X/per_cpu/cpu0/trace_pipe_raw
# rmdir instances/X
rmdir: failed to remove `instances/X': Device or resource busy

Link: http://lkml.kernel.org/r/20130719153644.GA18899@redhat.com

Cc: Ingo Molnar
Cc: Frederic Weisbecker
Cc: Masami Hiramatsu
Cc: stable@vger.kernel.org # 3.10
Signed-off-by: Oleg Nesterov
Signed-off-by: Steven Rostedt

Oleg Nesterov
2013-07-20 02:32:22 +0800
b7356abb9 Merge tag 'pm+acpi-3.11-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm ... Browse Code »

Pull power management and ACPI fixes from Rafael Wysocki:
"These are fixes collected over the last week, most importnatly two
cpufreq reverts fixing regressions introduced in 3.10, an autoseelp
fix preventing systems using it from crashing during shutdown and two
ACPI scan fixes related to hotplug.

Specifics:

- Two cpufreq commits from the 3.10 cycle introduced regressions.
The first of them was buggy (it did way much more than it needed to
do) and the second one attempted to fix an issue introduced by the
first one. Fixes from Srivatsa S Bhat revert both.

- If autosleep triggers during system shutdown and the shutdown
callbacks of some device drivers have been called already, it may
crash the system. Fix from Liu Shuo prevents that from happening
by making try_to_suspend() check system_state.

- The ACPI memory hotplug driver doesn't clear its driver_data on
errors which may cause a NULL poiter dereference to happen later.
Fix from Toshi Kani.

- The ACPI namespace scanning code should not try to attach scan
handlers to device objects that have them already, which may
confuse things quite a bit, and it should rescan the whole
namespace branch starting at the given node after receiving a bus
check notify event even if the device at that particular node has
been discovered already. Fixes from Rafael J Wysocki.

- New ACPI video blacklist entry for a system whose initial backlight
setting from the BIOS doesn't make sense. From Lan Tianyu.

- Garbage string output avoindance for ACPI PNP from Liu Shuo.

- Two Kconfig fixes for issues introduced recently in the s3c24xx
cpufreq driver (when moving the driver to drivers/cpufreq) from
Paul Bolle.

- Trivial comment fix in pm_wakeup.h from Chanwoo Choi"

* tag 'pm+acpi-3.11-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
ACPI / video: ignore BIOS initial backlight value for Fujitsu E753
PNP / ACPI: avoid garbage in resource name
cpufreq: Revert commit 2f7021a8 to fix CPU hotplug regression
cpufreq: s3c24xx: fix "depends on ARM_S3C24XX" in Kconfig
cpufreq: s3c24xx: rename CONFIG_CPU_FREQ_S3C24XX_DEBUGFS
PM / Sleep: Fix comment typo in pm_wakeup.h
PM / Sleep: avoid 'autosleep' in shutdown progress
cpufreq: Revert commit a66b2e to fix suspend/resume regression
ACPI / memhotplug: Fix a stale pointer in error path
ACPI / scan: Always call acpi_bus_scan() for bus check notifications
ACPI / scan: Do not try to attach scan handlers to devices having them

Linus Torvalds
2013-07-20 00:59:06 +0800

19 Jul, 2013

13 commits

a644a7e95 tracing: Kill trace_array->waiter ... Browse Code »

Trivial. trace_array->waiter has no users since 6eaaa5d5
"tracing/core: use appropriate waiting on trace_pipe".

Link: http://lkml.kernel.org/r/20130719142036.GA1594@redhat.com

Signed-off-by: Oleg Nesterov
Signed-off-by: Steven Rostedt

Oleg Nesterov
2013-07-19 22:56:02 +0800
cd458ba9d tracing: Do not (ab)use trace_seq in event_id_read() ... Browse Code »

event_id_read() has no reason to kmalloc "struct trace_seq"
(more than PAGE_SIZE!), it can use a small buffer instead.

Note: "if (*ppos) return 0" looks strange and even wrong,
simple_read_from_buffer() handles ppos != 0 case corrrectly.

And it seems that almost every user of trace_seq in this file
should be converted too. Unless you use seq_open(), trace_seq
buys nothing compared to the raw buffer, but it needs a bit
more memory and code.

Link: http://lkml.kernel.org/r/20130718184712.GA4786@redhat.com

Signed-off-by: Oleg Nesterov
Signed-off-by: Steven Rostedt

Oleg Nesterov
2013-07-19 09:31:33 +0800
7710b6399 tracing: Simplify the iteration logic in f_start/f_next ... Browse Code »

f_next() looks overcomplicated, and it is not strictly correct
even if this doesn't matter.

Say, FORMAT_FIELD_SEPERATOR should not return NULL (means EOF)
if trace_get_fields() returns an empty list, we should simply
advance to FORMAT_PRINTFMT as we do when we find the end of list.

1. Change f_next() to return "struct list_head *" rather than
"ftrace_event_field *", and change f_show() to do list_entry().

This simplifies the code a bit, only f_show() needs to know
about ftrace_event_field, and f_next() can play with ->prev
directly

2. Change f_next() to not play with ->prev / return inside the
switch() statement. It can simply set node = head/common_head,
the prev-or-advance-to-the-next-magic below does all work.

While at it. f_start() looks overcomplicated too. I don't think
*pos == 0 makes sense as a separate case, just change this code
to do "while" instead of "do/while".

The patch also moves f_start() down, close to f_stop(). This is
purely cosmetic, just to make the locking added by the next patch
more clear/visible.

Link: http://lkml.kernel.org/r/20130718184710.GA4783@redhat.com

Signed-off-by: Oleg Nesterov
Signed-off-by: Steven Rostedt

Oleg Nesterov
2013-07-19 09:31:32 +0800
8f7689933 tracing: Add ref_data to function and fgraph tracer structs ... Browse Code »

The selftest for function and function graph tracers are defined as
__init, as they are only executed at boot up. The "tracer" structs
that are associated to those tracers are not setup as __init as they
are used after boot. To stop mismatch warnings, those structures
need to be annotated with __ref_data.

Currently, the tracer structures are defined to __read_mostly, as they
do not really change. But in the future they should be converted to
consts, but that will take a little work because they have a "next"
pointer that gets updated when they are registered. That will have to
wait till the next major release.

Link: http://lkml.kernel.org/r/1373596735.17876.84.camel@gandalf.local.home

Reported-by: kbuild test robot
Reported-by: Chen Gang
Signed-off-by: Steven Rostedt

Steven Rostedt (Red Hat)
2013-07-19 09:31:31 +0800
f77d09a38 tracing: Miscellaneous fixes for trace_array ref counting ... Browse Code »

Some error paths did not handle ref counting properly, and some trace files need
ref counting.

Link: http://lkml.kernel.org/r/1374171524-11948-1-git-send-email-azl@google.com

Cc: stable@vger.kernel.org # 3.10
Cc: Vaibhav Nagarnaik
Cc: David Sharp
Cc: Alexander Z Lam
Signed-off-by: Alexander Z Lam
Signed-off-by: Steven Rostedt

Alexander Z Lam
2013-07-19 09:31:30 +0800
609e85a70 tracing: Fix error handling to ensure instances can always be removed ... Browse Code »

Remove debugfs directories for tracing instances during creation if an error
occurs causing the trace_array for that instance to not be added to
ftrace_trace_arrays. If the directory continues to exist after the error, it
cannot be removed because the respective trace_array is not in
ftrace_trace_arrays.

Link: http://lkml.kernel.org/r/1373502874-1706-2-git-send-email-azl@google.com

Cc: stable@vger.kernel.org # 3.10
Cc: Vaibhav Nagarnaik
Cc: David Sharp
Cc: Alexander Z Lam
Signed-off-by: Alexander Z Lam
Signed-off-by: Steven Rostedt

Alexander Z Lam
2013-07-19 09:31:30 +0800
a232e270d tracing/kprobe: Wait for disabling all running kprobe handlers ... Browse Code »

Wait for disabling all running kprobe handlers when a kprobe
event is disabled, since the caller, trace_remove_event_call()
supposes that a removing event is disabled completely by
disabling the event.
With this change, ftrace can ensure that there is no running
event handlers after disabling it.

Link: http://lkml.kernel.org/r/20130709093526.20138.93100.stgit@mhiramat-M0-7522

Signed-off-by: Masami Hiramatsu
Signed-off-by: Steven Rostedt

Masami Hiramatsu
2013-07-19 09:31:29 +0800
cd92bf61d tracing/perf: Move the PERF_MAX_TRACE_SIZE check into perf_trace_buf_prepare() ... Browse Code »

Every perf_trace_buf_prepare() caller does
WARN_ONCE(size > PERF_MAX_TRACE_SIZE, message) and "message" is
almost the same.

Shift this WARN_ONCE() into perf_trace_buf_prepare(). This changes
the meaning of _ONCE, but I think this is fine.

- 4947014 2932448 10104832 17984294 1126b26 vmlinux
+ 4948422 2932448 10104832 17985702 11270a6 vmlinux

on my build.

Link: http://lkml.kernel.org/r/20130617170211.GA19813@redhat.com

Acked-by: Peter Zijlstra
Signed-off-by: Oleg Nesterov
Signed-off-by: Steven Rostedt

Oleg Nesterov
2013-07-19 09:31:28 +0800
421c7860c tracing/syscall: Avoid perf_trace_buf_*() if sys_data->perf_events is empty ... Browse Code »

perf_trace_buf_prepare() + perf_trace_buf_submit(head, task => NULL)
make no sense if hlist_empty(head). Change perf_syscall_enter/exit()
to check sys_data->{enter,exit}_event->perf_events beforehand.

Link: http://lkml.kernel.org/r/20130617170207.GA19806@redhat.com

Acked-by: Peter Zijlstra
Signed-off-by: Oleg Nesterov
Signed-off-by: Steven Rostedt

Oleg Nesterov
2013-07-19 09:31:28 +0800
b8ebfd3f7 tracing/function: Avoid perf_trace_buf_*() if event_function.perf_events is empty ... Browse Code »

perf_trace_buf_prepare() + perf_trace_buf_submit(head, task => NULL)
make no sense if hlist_empty(head). Change perf_ftrace_function_call()
to check event_function.perf_events beforehand.

Link: http://lkml.kernel.org/r/20130617170204.GA19803@redhat.com

Acked-by: Peter Zijlstra
Signed-off-by: Oleg Nesterov
Signed-off-by: Steven Rostedt

Oleg Nesterov
2013-07-19 09:31:27 +0800
d611851b4 tracing: Typo fix on ring buffer comments ... Browse Code »

There have some mismatch between comments with
real function name, update it.

This patch also add some missed function arguments
description.

Link: http://lkml.kernel.org/r/51E3B3B2.4080307@huawei.com

Signed-off-by: zhangwei(Jovi)
Signed-off-by: Steven Rostedt

zhangwei(Jovi)
2013-07-19 09:31:26 +0800
146c3442f tracing: Use trace_seq_puts()/trace_seq_putc() where possible ... Browse Code »

For string without format specifiers, use trace_seq_puts()
or trace_seq_putc().

Link: http://lkml.kernel.org/r/51E3B3AC.1000605@huawei.com

Signed-off-by: zhangwei(Jovi)
[ fixed a trace_seq_putc(s, " ") to trace_seq_putc(s, ' ') ]
Signed-off-by: Steven Rostedt

zhangwei(Jovi)
2013-07-19 09:30:36 +0800
7a62711aa Merge tag 'driver-core-3.11-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core ... Browse Code »

Pull driver core patches from Greg KH:
"Here are some driver core patches for 3.11-rc2. They aren't really
bugfixes, but a bunch of new helper macros for drivers to properly
create attribute groups, which drivers and subsystems need to fix up a
ton of race issues with incorrectly creating sysfs files (binary and
normal) after userspace has been told that the device is present.

Also here is the ability to create binary files as attribute groups,
to solve that race condition, which was impossible to do before this,
so that's my fault the drivers were broken.

The majority of the .c changes is indenting and moving code around a
bit. It affects no existing code, but allows the large backlog of 70+
patches that I already have created to start flowing into the
different subtrees, instead of having to live in my driver-core tree,
causing merge nightmares in linux-next for the next few months.

These were finalized too late for the -rc1 merge window, which is why
they were didn't make that pull request, testing and review from
others didn't happen until a few weeks ago, and then there's the whole
distraction of the past few days, which prevented these from getting
to you sooner, sorry about that.

Oh, and there's a bugfix for the documentation build warning in here
as well. All of these have been in linux-next this week, with no
reported problems"

* tag 'driver-core-3.11-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
driver-core: fix new kernel-doc warning in base/platform.c
sysfs: use file mode defines from stat.h
sysfs: add more helper macro's for (bin_)attribute(_groups)
driver core: add default groups to struct class
driver core: Introduce device_create_groups
sysfs: prevent warning when only using binary attributes
sysfs: add support for binary attributes in groups
driver core: device.h: add RW and RO attribute macros
sysfs.h: add BIN_ATTR macro
sysfs.h: add ATTRIBUTE_GROUPS() macro
sysfs.h: add __ATTR_RW() macro

Linus Torvalds
2013-07-19 03:48:40 +0800