Eric Lee / smarc-fsl-linux-kernel

18 Dec, 2011

1 commit

b3bba872d writeback: show writeback reason with __print_symbolic ... Browse Code »

This makes the binary trace understandable by trace-cmd.

CC: Dave Chinner
CC: Curt Wohlgemuth
CC: Steven Rostedt
Signed-off-by: Wu Fengguang

Wu Fengguang
2011-12-18 14:20:17 +0800

07 Nov, 2011

2 commits

32aaeffbd Merge branch 'modsplit-Oct31_2011' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux ... Browse Code »

* 'modsplit-Oct31_2011' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux: (230 commits)
Revert "tracing: Include module.h in define_trace.h"
irq: don't put module.h into irq.h for tracking irqgen modules.
bluetooth: macroize two small inlines to avoid module.h
ip_vs.h: fix implicit use of module_get/module_put from module.h
nf_conntrack.h: fix up fallout from implicit moduleparam.h presence
include: replace linux/module.h with "struct module" wherever possible
include: convert various register fcns to macros to avoid include chaining
crypto.h: remove unused crypto_tfm_alg_modname() inline
uwb.h: fix implicit use of asm/page.h for PAGE_SIZE
pm_runtime.h: explicitly requires notifier.h
linux/dmaengine.h: fix implicit use of bitmap.h and asm/page.h
miscdevice.h: fix up implicit use of lists and types
stop_machine.h: fix implicit use of smp.h for smp_processor_id
of: fix implicit use of errno.h in include/linux/of.h
of_platform.h: delete needless include
acpi: remove module.h include from platform/aclinux.h
miscdevice.h: delete unnecessary inclusion of module.h
device_cgroup.h: delete needless include
net: sch_generic remove redundant use of
net: inet_timewait_sock doesnt need
...

Fix up trivial conflicts (other header files, and removal of the ab3550 mfd driver) in
- drivers/media/dvb/frontends/dibx000_common.c
- drivers/media/video/{mt9m111.c,ov6650.c}
- drivers/mfd/ab3550-core.c
- include/linux/dmaengine.h

Linus Torvalds
2011-11-07 11:44:47 +0800
208bca086 Merge branch 'writeback-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/wfg/linux ... Browse Code »

* 'writeback-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/wfg/linux:
writeback: Add a 'reason' to wb_writeback_work
writeback: send work item to queue_io, move_expired_inodes
writeback: trace event balance_dirty_pages
writeback: trace event bdi_dirty_ratelimit
writeback: fix ppc compile warnings on do_div(long long, unsigned long)
writeback: per-bdi background threshold
writeback: dirty position control - bdi reserve area
writeback: control dirty pause time
writeback: limit max dirty pause time
writeback: IO-less balance_dirty_pages()
writeback: per task dirty rate limit
writeback: stabilize bdi->dirty_ratelimit
writeback: dirty rate control
writeback: add bg_threshold parameter to __bdi_update_bandwidth()
writeback: dirty position control
writeback: account per-bdi accumulated dirtied pages

Linus Torvalds
2011-11-07 11:02:23 +0800

03 Nov, 2011

1 commit

f1f8935a5 Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 ... Browse Code »

* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (97 commits)
jbd2: Unify log messages in jbd2 code
jbd/jbd2: validate sb->s_first in journal_get_superblock()
ext4: let ext4_ext_rm_leaf work with EXT_DEBUG defined
ext4: fix a syntax error in ext4_ext_insert_extent when debugging enabled
ext4: fix a typo in struct ext4_allocation_context
ext4: Don't normalize an falloc request if it can fit in 1 extent.
ext4: remove comments about extent mount option in ext4_new_inode()
ext4: let ext4_discard_partial_buffers handle unaligned range correctly
ext4: return ENOMEM if find_or_create_pages fails
ext4: move vars to local scope in ext4_discard_partial_page_buffers_no_lock()
ext4: Create helper function for EXT4_IO_END_UNWRITTEN and i_aiodio_unwritten
ext4: optimize locking for end_io extent conversion
ext4: remove unnecessary call to waitqueue_active()
ext4: Use correct locking for ext4_end_io_nolock()
ext4: fix race in xattr block allocation path
ext4: trace punch_hole correctly in ext4_ext_map_blocks
ext4: clean up AGGRESSIVE_TEST code
ext4: move variables to their scope
ext4: fix quota accounting during migration
ext4: migrate cleanup
...

Linus Torvalds
2011-11-03 01:06:20 +0800

01 Nov, 2011

3 commits

4356f21d0 mm: change isolate mode from #define to bitwise type ... Browse Code »

Change ISOLATE_XXX macro with bitwise isolate_mode_t type. Normally,
macro isn't recommended as it's type-unsafe and making debugging harder as
symbol cannot be passed throught to the debugger.

Quote from Johannes
" Hmm, it would probably be cleaner to fully convert the isolation mode
into independent flags. INACTIVE, ACTIVE, BOTH is currently a
tri-state among flags, which is a bit ugly."

This patch moves isolate mode from swap.h to mmzone.h by memcontrol.h

Signed-off-by: Minchan Kim
Cc: Johannes Weiner
Cc: KAMEZAWA Hiroyuki
Cc: KOSAKI Motohiro
Cc: Mel Gorman
Cc: Rik van Riel
Cc: Michal Hocko
Cc: Andrea Arcangeli
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Minchan Kim
2011-11-01 08:30:44 +0800
67b84999b Revert "tracing: Include module.h in define_trace.h" ... Browse Code »

This reverts commit 3a9f987b3141f086de27832514aad9f50a53f754.

With all the files that are real modules now having module.h
explicitly called out for inclusion, and no reliance on any
implicit presence of module.h assumed, we should no longer
need this workaround.

Signed-off-by: Paul Gortmaker

Paul Gortmaker
2011-11-01 07:32:35 +0800
de4772542 include: replace linux/module.h with "struct module" wherever possible ... Browse Code »
43

The pretty much brings in the kitchen sink along
with it, so it should be avoided wherever reasonably possible in
terms of being included from other commonly used
files, as it results in a measureable increase on compile times.

The worst culprit was probably device.h since it is used everywhere.
This file also had an implicit dependency/usage of mutex.h which was
masked by module.h, and is also fixed here at the same time.

There are over a dozen other headers that simply declare the
struct instead of pulling in the whole file, so follow their lead
and simply make it a few more.

Most of the implicit dependencies on module.h being present by
these headers pulling it in have been now weeded out, so we can
finally make this change with hopefully minimal breakage.

Signed-off-by: Paul Gortmaker

Paul Gortmaker
2011-11-01 07:32:32 +0800

31 Oct, 2011

4 commits

0e175a183 writeback: Add a 'reason' to wb_writeback_work ... Browse Code »

This creates a new 'reason' field in a wb_writeback_work
structure, which unambiguously identifies who initiates
writeback activity. A 'wb_reason' enumeration has been
added to writeback.h, to enumerate the possible reasons.

The 'writeback_work_class' and tracepoint event class and
'writeback_queue_io' tracepoints are updated to include the
symbolic 'reason' in all trace events.

And the 'writeback_inodes_sbXXX' family of routines has had
a wb_stats parameter added to them, so callers can specify
why writeback is being started.

Acked-by: Jan Kara
Signed-off-by: Curt Wohlgemuth
Signed-off-by: Wu Fengguang

Curt Wohlgemuth
2011-10-31 00:33:36 +0800
ad4e38dd6 writeback: send work item to queue_io, move_expired_inodes ... Browse Code »

Instead of sending ->older_than_this to queue_io() and
move_expired_inodes(), send the entire wb_writeback_work
structure. There are other fields of a work item that are
useful in these routines and in tracepoints.

Acked-by: Jan Kara
Signed-off-by: Curt Wohlgemuth
Signed-off-by: Wu Fengguang

Curt Wohlgemuth
2011-10-31 00:33:27 +0800
ece13ac31 writeback: trace event balance_dirty_pages ... Browse Code »

Useful for analyzing the dynamics of the throttling algorithms and
debugging user reported problems.

Signed-off-by: Wu Fengguang

Wu Fengguang
2011-10-31 00:29:38 +0800
b48c104d2 writeback: trace event bdi_dirty_ratelimit ... Browse Code »

It helps understand how various throttle bandwidths are updated.

Signed-off-by: Wu Fengguang

Wu Fengguang
2011-10-31 00:29:21 +0800

29 Oct, 2011

1 commit

68d99b2c8 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound ... Browse Code »

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (549 commits)
ALSA: hda - Fix ADC input-amp handling for Cx20549 codec
ALSA: hda - Keep EAPD turned on for old Conexant chips
ALSA: hda/realtek - Fix missing volume controls with ALC260
ASoC: wm8940: Properly set codec->dapm.bias_level
ALSA: hda - Fix pin-config for ASUS W90V
ALSA: hda - Fix surround/CLFE headphone and speaker pins order
ALSA: hda - Fix typo
ALSA: Update the sound git tree URL
ALSA: HDA: Add new revision for ALC662
ASoC: max98095: Convert codec->hw_write to snd_soc_write
ASoC: keep pointer to resource so it can be freed
ASoC: sgtl5000: Fix wrong mask in some snd_soc_update_bits calls
ASoC: wm8996: Fix wrong mask for setting WM8996_AIF_CLOCKING_2
ASoC: da7210: Add support for line out and DAC
ASoC: da7210: Add support for DAPM
ALSA: hda/realtek - Fix DAC assignments of multiple speakers
ASoC: Use SGTL5000_LINREG_VDDD_MASK instead of hardcoded mask value
ASoC: Set sgtl5000->ldo in ldo_regulator_register
ASoC: wm8996: Use SND_SOC_DAPM_AIF_OUT for AIF2 Capture
ASoC: wm8994: Use SND_SOC_DAPM_AIF_OUT for AIF3 Capture
...

Linus Torvalds
2011-10-29 05:25:01 +0800

27 Oct, 2011

1 commit

6f91bc5fd ext4: optimize ext4_ext_convert_to_initialized() ... Browse Code »

This patch introduces a fast path in ext4_ext_convert_to_initialized()
for the case when the conversion can be performed by transferring
the newly initialized blocks from the uninitialized extent into
an adjacent initialized extent. Doing so removes the expensive
invocations of memmove() which occur during extent insertion and
the subsequent merge.

In practice this should be the common case for clients performing
append writes into files pre-allocated via
fallocate(FALLOC_FL_KEEP_SIZE). In such a workload performed via
direct IO and when using a suboptimal implementation of memmove()
(x86_64 prior to the 2.6.39 rewrite), this patch reduces kernel CPU
consumption by 32%.

Two new trace points are added to ext4_ext_convert_to_initialized()
to offer visibility into its operations. No exit trace point has
been added due to the multiplicity of return points. This can be
revisited once the upstream cleanup is backported.

Signed-off-by: Eric Gouriou
Signed-off-by: "Theodore Ts'o"

Eric Gouriou
2011-10-27 23:43:23 +0800

26 Oct, 2011

4 commits

8a4a8918e Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip ... Browse Code »

* 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (46 commits)
llist: Add back llist_add_batch() and llist_del_first() prototypes
sched: Don't use tasklist_lock for debug prints
sched: Warn on rt throttling
sched: Unify the ->cpus_allowed mask copy
sched: Wrap scheduler p->cpus_allowed access
sched: Request for idle balance during nohz idle load balance
sched: Use resched IPI to kick off the nohz idle balance
sched: Fix idle_cpu()
llist: Remove cpu_relax() usage in cmpxchg loops
sched: Convert to struct llist
llist: Add llist_next()
irq_work: Use llist in the struct irq_work logic
llist: Return whether list is empty before adding in llist_add()
llist: Move cpu_relax() to after the cmpxchg()
llist: Remove the platform-dependent NMI checks
llist: Make some llist functions inline
sched, tracing: Show PREEMPT_ACTIVE state in trace_sched_switch
sched: Remove redundant test in check_preempt_tick()
sched: Add documentation for bandwidth control
sched: Return unused runtime on group dequeue
...

Linus Torvalds
2011-10-26 23:08:43 +0800
7115e3fcf Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip ... Browse Code »
43

* 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (121 commits)
perf symbols: Increase symbol KSYM_NAME_LEN size
perf hists browser: Refuse 'a' hotkey on non symbolic views
perf ui browser: Use libslang to read keys
perf tools: Fix tracing info recording
perf hists browser: Elide DSO column when it is set to just one DSO, ditto for threads
perf hists: Don't consider filtered entries when calculating column widths
perf hists: Don't decay total_period for filtered entries
perf hists browser: Honour symbol_conf.show_{nr_samples,total_period}
perf hists browser: Do not exit on tab key with single event
perf annotate browser: Don't change selection line when returning from callq
perf tools: handle endianness of feature bitmap
perf tools: Add prelink suggestion to dso update message
perf script: Fix unknown feature comment
perf hists browser: Apply the dso and thread filters when merging new batches
perf hists: Move the dso and thread filters from hist_browser
perf ui browser: Honour the xterm colors
perf top tui: Give color hints just on the percentage, like on --stdio
perf ui browser: Make the colors configurable and change the defaults
perf tui: Remove unneeded call to newtCls on startup
perf hists: Don't format the percentage on hist_entry__snprintf
...

Fix up conflicts in arch/x86/kernel/kprobes.c manually.

Ingo's tree did the insane "add volatile to const array", which just
doesn't make sense ("volatile const"?). But we could remove the const
*and* make the array volatile to make doubly sure that gcc doesn't
optimize it away..

Also fix up kernel/trace/ring_buffer.c non-data-conflicts manually: the
reader_lock has been turned into a raw lock by the core locking merge,
and there was a new user of it introduced in this perf core merge. Make
sure that new use also uses the raw accessor functions.

Linus Torvalds
2011-10-26 23:03:38 +0800
19b4a8d52 Merge branch 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip ... Browse Code »

* 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (45 commits)
rcu: Move propagation of ->completed from rcu_start_gp() to rcu_report_qs_rsp()
rcu: Remove rcu_needs_cpu_flush() to avoid false quiescent states
rcu: Wire up RCU_BOOST_PRIO for rcutree
rcu: Make rcu_torture_boost() exit loops at end of test
rcu: Make rcu_torture_fqs() exit loops at end of test
rcu: Permit rt_mutex_unlock() with irqs disabled
rcu: Avoid having just-onlined CPU resched itself when RCU is idle
rcu: Suppress NMI backtraces when stall ends before dump
rcu: Prohibit grace periods during early boot
rcu: Simplify unboosting checks
rcu: Prevent early boot set_need_resched() from __rcu_pending()
rcu: Dump local stack if cannot dump all CPUs' stacks
rcu: Move __rcu_read_unlock()'s barrier() within if-statement
rcu: Improve rcu_assign_pointer() and RCU_INIT_POINTER() documentation
rcu: Make rcu_assign_pointer() unconditionally insert a memory barrier
rcu: Make rcu_implicit_dynticks_qs() locals be correct size
rcu: Eliminate in_irq() checks in rcu_enter_nohz()
nohz: Remove nohz_cpu_mask
rcu: Document interpretation of RCU-lockdep splats
rcu: Allow rcutorture's stat_interval parameter to be changed at runtime
...

Linus Torvalds
2011-10-26 22:26:53 +0800
e33bae14f Merge branch 'for-linus' of git://github.com/ericvh/linux ... Browse Code »

* 'for-linus' of git://github.com/ericvh/linux:
9p: fix 9p.txt to advertise msize instead of maxdata
net/9p: Convert net/9p protocol dumps to tracepoints
fs/9p: change an int to unsigned int
fs/9p: Cleanup option parsing in 9p
9p: move dereference after NULL check
fs/9p: inode file operation is properly initialized init_special_inode
fs/9p: Update zero-copy implementation in 9p

Linus Torvalds
2011-10-26 20:20:53 +0800

25 Oct, 2011

3 commits

7e0bb71e7 Merge branch 'pm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm ... Browse Code »

* 'pm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (63 commits)
PM / Clocks: Remove redundant NULL checks before kfree()
PM / Documentation: Update docs about suspend and CPU hotplug
ACPI / PM: Add Sony VGN-FW21E to nonvs blacklist.
ARM: mach-shmobile: sh7372 A4R support (v4)
ARM: mach-shmobile: sh7372 A3SP support (v4)
PM / Sleep: Mark devices involved in wakeup signaling during suspend
PM / Hibernate: Improve performance of LZO/plain hibernation, checksum image
PM / Hibernate: Do not initialize static and extern variables to 0
PM / Freezer: Make fake_signal_wake_up() wake TASK_KILLABLE tasks too
PM / Hibernate: Add resumedelay kernel param in addition to resumewait
MAINTAINERS: Update linux-pm list address
PM / ACPI: Blacklist Vaio VGN-FW520F machine known to require acpi_sleep=nonvs
PM / ACPI: Blacklist Sony Vaio known to require acpi_sleep=nonvs
PM / Hibernate: Add resumewait param to support MMC-like devices as resume file
PM / Hibernate: Fix typo in a kerneldoc comment
PM / Hibernate: Freeze kernel threads after preallocating memory
PM: Update the policy on default wakeup settings
PM / VT: Cleanup #if defined uglyness and fix compile error
PM / Suspend: Off by one in pm_suspend()
PM / Hibernate: Include storage keys in hibernation image on s390
...

Linus Torvalds
2011-10-25 21:18:39 +0800
4e7e2a200 Merge branch 'for-linus' of git://opensource.wolfsonmicro.com/regmap ... Browse Code »

* 'for-linus' of git://opensource.wolfsonmicro.com/regmap: (62 commits)
mfd: Enable rbtree cache for wm831x devices
regmap: Support some block operations on cached devices
regmap: Allow caches for devices with no defaults
regmap: Ensure rbtree syncs registers set to zero properly
regmap: Allow rbtree to cache zero default values
regmap: Warn on raw I/O as well as bulk reads that bypass cache
regmap: Return a sensible error code if we fail to read the cache
regmap: Use bsearch() to search the register defaults
regmap: Fix doc comment
regmap: Optimize the lookup path to use binary search
regmap: Ensure we scream if we enable cache bypass/only at the same time
regmap: Implement regcache_cache_bypass helper function
regmap: Save/restore the bypass state upon syncing
regmap: Lock the sync path, ensure we use the lockless _regmap_write()
regmap: Fix apostrophe usage
regmap: Make _regmap_write() global
regmap: Fix lock used for regcache_cache_only()
regmap: Grab the lock in regcache_cache_only()
regmap: Modify map->cache_bypass directly
regmap: Fix regcache_sync generic implementation
...

Linus Torvalds
2011-10-25 19:57:45 +0800
348b59012 net/9p: Convert net/9p protocol dumps to tracepoints ... Browse Code »

This helps in more control over debugging.
root@qemu-img-64:~# ls /pass/123
ls: cannot access /pass/123: No such file or directory
root@qemu-img-64:~# cat /sys/kernel/debug/tracing/trace
# tracer: nop
#
# TASK-PID CPU# TIMESTAMP FUNCTION
# | | | | |
ls-1536 [001] 70.928584: 9p_protocol_dump: clnt 18446612132784021504 P9_TWALK(tag = 1)
000: 16 00 00 00 6e 01 00 01 00 00 00 02 00 00 00 01
010: 00 03 00 31 32 33 00 00 00 ff ff ff ff 00 00 00

ls-1536 [001] 70.928587:
=> trace_9p_protocol_dump
=> p9pdu_finalize
=> p9_client_rpc
=> p9_client_walk
=> v9fs_vfs_lookup
=> d_alloc_and_lookup
=> walk_component
=> path_lookupat
ls-1536 [000] 70.929696: 9p_protocol_dump: clnt 18446612132784021504 P9_RLERROR(tag = 1)
000: 0b 00 00 00 07 01 00 02 00 00 00 4e 03 00 02 00
010: 00 00 00 00 03 00 02 00 00 00 00 00 ff 43 00 00

ls-1536 [000] 70.929697:
=> trace_9p_protocol_dump
=> p9_client_rpc
=> p9_client_walk
=> v9fs_vfs_lookup
=> d_alloc_and_lookup
=> walk_component
=> path_lookupat
=> do_path_lookup

Signed-off-by: Aneesh Kumar K.V
Signed-off-by: Eric Van Hensbergen

Aneesh Kumar K.V
2011-10-25 00:13:12 +0800

08 Oct, 2011

1 commit

d727b6065 Merge branch 'pm-runtime' into pm-for-linus ... Browse Code »

* pm-runtime:
PM / Tracing: build rpm-traces.c only if CONFIG_PM_RUNTIME is set
PM / Runtime: Replace dev_dbg() with trace_rpm_*()
PM / Runtime: Introduce trace points for tracing rpm_* functions
PM / Runtime: Don't run callbacks under lock for power.irq_safe set
USB: Add wakeup info to debugging messages
PM / Runtime: pm_runtime_idle() can be called in atomic context
PM / Runtime: Add macro to test for runtime PM events
PM / Runtime: Add might_sleep() to runtime PM functions

Rafael J. Wysocki
2011-10-08 05:16:55 +0800

06 Oct, 2011

1 commit

9d0140202 Merge commit 'v3.1-rc9' into perf/core ... Browse Code »

Merge reason: pick up latest fixes.

Signed-off-by: Ingo Molnar

Ingo Molnar
2011-10-06 18:49:21 +0800

04 Oct, 2011

2 commits

22f92bacb Merge branch 'linus' into sched/core ... Browse Code »

Merge reason: pick up the latest fixes.

Signed-off-by: Ingo Molnar

Ingo Molnar
2011-10-04 17:09:08 +0800
92e51938f perf: Fix counter of ftrace events ... Browse Code »

Each event adds some points to its counters. By default it adds 1,
and a number of points may be transmited in event's parameters.

E.g. sched:sched_stat_runtime adds how long process has been running.

But this functionality was broken by v2.6.31-rc5-392-gf413cdb
and now the event's parameters doesn't affect on a number of points.

TP_perf_assign isn't defined, so __perf_count(c) isn't executed and
__count is always equal to 1.

Signed-off-by: Andrew Vagin
Signed-off-by: Peter Zijlstra
Link: http://lkml.kernel.org/r/1317052535-1765247-2-git-send-email-avagin@openvz.org
Signed-off-by: Ingo Molnar

Andrew Vagin
2011-10-04 17:07:54 +0800

03 Oct, 2011

1 commit

143dfe861 writeback: IO-less balance_dirty_pages() ... Browse Code »

As proposed by Chris, Dave and Jan, don't start foreground writeback IO
inside balance_dirty_pages(). Instead, simply let it idle sleep for some
time to throttle the dirtying task. In the mean while, kick off the
per-bdi flusher thread to do background writeback IO.

RATIONALS
=========

- disk seeks on concurrent writeback of multiple inodes (Dave Chinner)

If every thread doing writes and being throttled start foreground
writeback, it leads to N IO submitters from at least N different
inodes at the same time, end up with N different sets of IO being
issued with potentially zero locality to each other, resulting in
much lower elevator sort/merge efficiency and hence we seek the disk
all over the place to service the different sets of IO.
OTOH, if there is only one submission thread, it doesn't jump between
inodes in the same way when congestion clears - it keeps writing to
the same inode, resulting in large related chunks of sequential IOs
being issued to the disk. This is more efficient than the above
foreground writeback because the elevator works better and the disk
seeks less.

- lock contention and cache bouncing on concurrent IO submitters (Dave Chinner)

With this patchset, the fs_mark benchmark on a 12-drive software RAID0 goes
from CPU bound to IO bound, freeing "3-4 CPUs worth of spinlock contention".

* "CPU usage has dropped by ~55%", "it certainly appears that most of
the CPU time saving comes from the removal of contention on the
inode_wb_list_lock" (IMHO at least 10% comes from the reduction of
cacheline bouncing, because the new code is able to call much less
frequently into balance_dirty_pages() and hence access the global
page states)

* the user space "App overhead" is reduced by 20%, by avoiding the
cacheline pollution by the complex writeback code path

* "for a ~5% throughput reduction", "the number of write IOs have
dropped by ~25%", and the elapsed time reduced from 41:42.17 to
40:53.23.

* On a simple test of 100 dd, it reduces the CPU %system time from 30% to 3%,
and improves IO throughput from 38MB/s to 42MB/s.

- IO size too small for fast arrays and too large for slow USB sticks

The write_chunk used by current balance_dirty_pages() cannot be
directly set to some large value (eg. 128MB) for better IO efficiency.
Because it could lead to more than 1 second user perceivable stalls.
Even the current 4MB write size may be too large for slow USB sticks.
The fact that balance_dirty_pages() starts IO on itself couples the
IO size to wait time, which makes it hard to do suitable IO size while
keeping the wait time under control.

Now it's possible to increase writeback chunk size proportional to the
disk bandwidth. In a simple test of 50 dd's on XFS, 1-HDD, 3GB ram,
the larger writeback size dramatically reduces the seek count to 1/10
(far beyond my expectation) and improves the write throughput by 24%.

- long block time in balance_dirty_pages() hurts desktop responsiveness

Many of us may have the experience: it often takes a couple of seconds
or even long time to stop a heavy writing dd/cp/tar command with
Ctrl-C or "kill -9".

- IO pipeline broken by bumpy write() progress

There are a broad class of "loop {read(buf); write(buf);}" applications
whose read() pipeline will be under-utilized or even come to a stop if
the write()s have long latencies _or_ don't progress in a constant rate.
The current threshold based throttling inherently transfers the large
low level IO completion fluctuations to bumpy application write()s,
and further deteriorates with increasing number of dirtiers and/or bdi's.

For example, when doing 50 dd's + 1 remote rsync to an XFS partition,
the rsync progresses very bumpy in legacy kernel, and throughput is
improved by 67% by this patchset. (plus the larger write chunk size,
it will be 93% speedup).

The new rate based throttling can support 1000+ dd's with excellent
smoothness, low latency and low overheads.

For the above reasons, it's much better to do IO-less and low latency
pauses in balance_dirty_pages().

Jan Kara, Dave Chinner and me explored the scheme to let
balance_dirty_pages() wait for enough writeback IO completions to
safeguard the dirty limit. However it's found to have two problems:

- in large NUMA systems, the per-cpu counters may have big accounting
errors, leading to big throttle wait time and jitters.

- NFS may kill large amount of unstable pages with one single COMMIT.
Because NFS server serves COMMIT with expensive fsync() IOs, it is
desirable to delay and reduce the number of COMMITs. So it's not
likely to optimize away such kind of bursty IO completions, and the
resulted large (and tiny) stall times in IO completion based throttling.

So here is a pause time oriented approach, which tries to control the
pause time in each balance_dirty_pages() invocations, by controlling
the number of pages dirtied before calling balance_dirty_pages(), for
smooth and efficient dirty throttling:

- avoid useless (eg. zero pause time) balance_dirty_pages() calls
- avoid too small pause time (less than 4ms, which burns CPU power)
- avoid too large pause time (more than 200ms, which hurts responsiveness)
- avoid big fluctuations of pause times

It can control pause times at will. The default policy (in a followup
patch) will be to do ~10ms pauses in 1-dd case, and increase to ~100ms
in 1000-dd case.

BEHAVIOR CHANGE
===============

(1) dirty threshold

Users will notice that the applications will get throttled once crossing
the global (background + dirty)/2=15% threshold, and then balanced around
17.5%. Before patch, the behavior is to just throttle it at 20% dirtyable
memory in 1-dd case.

Since the task will be soft throttled earlier than before, it may be
perceived by end users as performance "slow down" if his application
happens to dirty more than 15% dirtyable memory.

(2) smoothness/responsiveness

Users will notice a more responsive system during heavy writeback.
"killall dd" will take effect instantly.

Signed-off-by: Wu Fengguang

Wu Fengguang
2011-10-03 21:08:57 +0800

01 Oct, 2011

1 commit

048b71802 Merge branch 'rcu/next' of git://github.com/paulmckrcu/linux into core/rcu Browse Code »

Ingo Molnar
2011-10-01 20:21:36 +0800

29 Sep, 2011

5 commits

d4c08f2ac rcu: Add grace-period, quiescent-state, and call_rcu trace events ... Browse Code »

Add trace events to record grace-period start and end, quiescent states,
CPUs noticing grace-period start and end, grace-period initialization,
call_rcu() invocation, tasks blocking in RCU read-side critical sections,
tasks exiting those same critical sections, force_quiescent_state()
detection of dyntick-idle and offline CPUs, CPUs entering and leaving
dyntick-idle mode (except from NMIs), CPUs coming online and going
offline, and CPUs being kicked for staying in dyntick-idle mode for too
long (as in many weeks, even on 32-bit systems).

Signed-off-by: Paul E. McKenney
Signed-off-by: Paul E. McKenney

rcu: Add the rcu flavor to callback trace events

The earlier trace events for registering RCU callbacks and for invoking
them did not include the RCU flavor (rcu_bh, rcu_preempt, or rcu_sched).
This commit adds the RCU flavor to those trace events.

Signed-off-by: Paul E. McKenney

Paul E. McKenney
2011-09-29 12:38:21 +0800
385680a94 rcu: Add event-trace markers to TREE_RCU kthreads ... Browse Code »

Add event-trace markers to TREE_RCU kthreads to allow including these
kthread's CPU time in the utilization calculations.

Signed-off-by: Paul E. McKenney
Signed-off-by: Paul E. McKenney

Paul E. McKenney
2011-09-29 12:38:19 +0800
72fe701b7 rcu: Add RCU type to callback-invocation tracing ... Browse Code »

Add a string to the rcu_batch_start() and rcu_batch_end() trace
messages that indicates the RCU type ("rcu_sched", "rcu_bh", or
"rcu_preempt"). The trace messages for the actual invocations
themselves are not marked, as it should be clear from the
rcu_batch_start() and rcu_batch_end() events before and after.

Signed-off-by: Paul E. McKenney
Signed-off-by: Paul E. McKenney

Paul E. McKenney
2011-09-29 12:38:15 +0800
300df91ca rcu: Event-trace markers for computing RCU CPU utilization ... Browse Code »

This commit adds the trace_rcu_utilization() marker that is to be
used to allow postprocessing scripts compute RCU's CPU utilization,
give or take event-trace overhead. Note that we do not include RCU's
dyntick-idle interface because event tracing requires RCU protection,
which is not available in dyntick-idle mode.

Signed-off-by: Paul E. McKenney
Signed-off-by: Paul E. McKenney

Paul E. McKenney
2011-09-29 12:38:13 +0800
29c00b4a1 rcu: Add event-tracing for RCU callback invocation ... Browse Code »

There was recently some controversy about the overhead of invoking RCU
callbacks. Add TRACE_EVENT()s to obtain fine-grained timings for the
start and stop of a batch of callbacks and also for each callback invoked.

Signed-off-by: Paul E. McKenney
Signed-off-by: Paul E. McKenney

Paul E. McKenney
2011-09-29 12:38:12 +0800

28 Sep, 2011

1 commit

53b615ccc PM / Runtime: Introduce trace points for tracing rpm_* functions ... Browse Code »

This patch introduces 3 trace points to prepare for tracing
rpm_idle/rpm_suspend/rpm_resume functions, so we can use these
trace points to replace the current dev_dbg().

Signed-off-by: Ming Lei
Acked-by: Steven Rostedt
Signed-off-by: Rafael J. Wysocki

Ming Lei
2011-09-28 04:53:27 +0800

26 Sep, 2011

1 commit

557ab4254 sched, tracing: Show PREEMPT_ACTIVE state in trace_sched_switch ... Browse Code »

We had need to see the difference between scheduling a runnable task and
a runnable task being involuntarily preempted.

No app should rely on the old string output (the binary
trace event record format is not changed).

Signed-off-by: Peter Zijlstra
Cc: Thomas Gleixner
Cc: Steven Rostedt
Link: http://lkml.kernel.org/r/1316164603.10174.11.camel@twins
Signed-off-by: Ingo Molnar

Peter Zijlstra
2011-09-26 19:25:54 +0800

23 Sep, 2011

1 commit

e56235e09 ASoC: Add another DAPM stat for neighbour checks ... Browse Code »

The number of times we look at a potentially connected neighbour is just
as important as the number of times we actually recurse into looking at
that neighbour so also collect that statistic.

Signed-off-by: Mark Brown

Mark Brown
2011-09-23 00:24:40 +0800

21 Sep, 2011

1 commit

de02d0786 ASoC: Trace and collect statistics for DAPM graph walking ... Browse Code »

One of the longest standing areas for improvement in ASoC has been the
DAPM algorithm - it repeats the same checks many times whenever it is run
and makes no effort to limit the areas of the graph it checks meaning we
do an awful lot of walks over the full graph. This has never mattered too
much as the size of the graph has generally been small in relation to the
size of the devices supported and the speed of CPUs but it is annoying.

In preparation for work on improving this insert a trace point after the
graph walk has been done. This gives us specific timing information for
the walk, and in order to give quantifiable (non-benchmark) numbers also
count every time we check a link or check the power for a widget and report
those numbers. Substantial changes in the algorithm may require tweaks to
the stats but they should be useful for simpler things.

Signed-off-by: Mark Brown

Mark Brown
2011-09-21 21:53:44 +0800

20 Sep, 2011

1 commit

593600890 regmap: Add the regcache_sync trace event ... Browse Code »

Signed-off-by: Dimitris Papastamos
Signed-off-by: Mark Brown

Dimitris Papastamos
2011-09-20 02:06:34 +0800

10 Sep, 2011

1 commit

d8990240d ext4: add some tracepoints in ext4/extents.c ... Browse Code »

This patch adds some tracepoints in ext4/extents.c and updates a tracepoint in
ext4/inode.c.

Tested: Built and ran the kernel and verified that these tracepoints work.
Also ran xfstests.

Signed-off-by: Aditya Kali
Signed-off-by: "Theodore Ts'o"

Aditya Kali
2011-09-10 07:18:51 +0800

31 Aug, 2011

1 commit

c8ad62063 writeback: show raw dirtied_when in trace writeback_single_inode ... Browse Code »

Save inode->dirtied_when in the raw trace output for reliable scripting,
and to also show in formatted output the relative age in seconds for
easy human reading.

CC: Jan Kara
Acked-by: Christoph Hellwig
Signed-off-by: Wu Fengguang

Wu Fengguang
2011-08-31 08:48:15 +0800

20 Aug, 2011

1 commit

5ccc38740 Merge branch 'for-linus' of git://git.kernel.dk/linux-block ... Browse Code »

* 'for-linus' of git://git.kernel.dk/linux-block: (23 commits)
Revert "cfq: Remove special treatment for metadata rqs."
block: fix flush machinery for stacking drivers with differring flush flags
block: improve rq_affinity placement
blktrace: add FLUSH/FUA support
Move some REQ flags to the common bio/request area
allow blk_flush_policy to return REQ_FSEQ_DATA independent of *FLUSH
xen/blkback: Make description more obvious.
cfq-iosched: Add documentation about idling
block: Make rq_affinity = 1 work as expected
block: swim3: fix unterminated of_device_id table
block/genhd.c: remove useless cast in diskstats_show()
drivers/cdrom/cdrom.c: relax check on dvd manufacturer value
drivers/block/drbd/drbd_nl.c: use bitmap_parse instead of __bitmap_parse
bsg-lib: add module.h include
cfq-iosched: Reduce linked group count upon group destruction
blk-throttle: correctly determine sync bio
loop: fix deadlock when sysfs and LOOP_CLR_FD race against each other
loop: add BLK_DEV_LOOP_MIN_COUNT=%i to allow distros 0 pre-allocated loop devices
loop: add management interface for on-demand device allocation
loop: replace linked list of allocated devices with an idr index
...

Linus Torvalds
2011-08-20 01:47:07 +0800

11 Aug, 2011

1 commit

c09c47cae blktrace: add FLUSH/FUA support ... Browse Code »

Add FLUSH/FUA support to blktrace. As FLUSH precedes WRITE and/or
FUA follows WRITE, use the same 'F' flag for both cases and
distinguish them by their (relative) position. The end results
look like (other flags might be shown also):

- WRITE: W
- WRITE_FLUSH: FW
- WRITE_FUA: WF
- WRITE_FLUSH_FUA: FWF

Note that we reuse TC_BARRIER due to lack of bit space of act_mask
so that the older versions of blktrace tools will report flush
requests as barriers from now on.

Cc: Steven Rostedt
Cc: Frederic Weisbecker
Cc: Ingo Molnar
Signed-off-by: Namhyung Kim
Reviewed-by: Jeff Moyer
Signed-off-by: Jens Axboe

Namhyung Kim
2011-08-11 16:36:05 +0800