18 Dec, 2011

1 commit


07 Nov, 2011

2 commits

  • * 'modsplit-Oct31_2011' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux: (230 commits)
    Revert "tracing: Include module.h in define_trace.h"
    irq: don't put module.h into irq.h for tracking irqgen modules.
    bluetooth: macroize two small inlines to avoid module.h
    ip_vs.h: fix implicit use of module_get/module_put from module.h
    nf_conntrack.h: fix up fallout from implicit moduleparam.h presence
    include: replace linux/module.h with "struct module" wherever possible
    include: convert various register fcns to macros to avoid include chaining
    crypto.h: remove unused crypto_tfm_alg_modname() inline
    uwb.h: fix implicit use of asm/page.h for PAGE_SIZE
    pm_runtime.h: explicitly requires notifier.h
    linux/dmaengine.h: fix implicit use of bitmap.h and asm/page.h
    miscdevice.h: fix up implicit use of lists and types
    stop_machine.h: fix implicit use of smp.h for smp_processor_id
    of: fix implicit use of errno.h in include/linux/of.h
    of_platform.h: delete needless include
    acpi: remove module.h include from platform/aclinux.h
    miscdevice.h: delete unnecessary inclusion of module.h
    device_cgroup.h: delete needless include
    net: sch_generic remove redundant use of <linux/module.h>
    net: inet_timewait_sock doesnt need <linux/module.h>
    ...

    Fix up trivial conflicts (other header files, and removal of the ab3550 mfd driver) in
    - drivers/media/dvb/frontends/dibx000_common.c
    - drivers/media/video/{mt9m111.c,ov6650.c}
    - drivers/mfd/ab3550-core.c
    - include/linux/dmaengine.h

    Linus Torvalds
     
  • * 'writeback-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/wfg/linux:
    writeback: Add a 'reason' to wb_writeback_work
    writeback: send work item to queue_io, move_expired_inodes
    writeback: trace event balance_dirty_pages
    writeback: trace event bdi_dirty_ratelimit
    writeback: fix ppc compile warnings on do_div(long long, unsigned long)
    writeback: per-bdi background threshold
    writeback: dirty position control - bdi reserve area
    writeback: control dirty pause time
    writeback: limit max dirty pause time
    writeback: IO-less balance_dirty_pages()
    writeback: per task dirty rate limit
    writeback: stabilize bdi->dirty_ratelimit
    writeback: dirty rate control
    writeback: add bg_threshold parameter to __bdi_update_bandwidth()
    writeback: dirty position control
    writeback: account per-bdi accumulated dirtied pages

    Linus Torvalds
     

03 Nov, 2011

1 commit

  • * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (97 commits)
    jbd2: Unify log messages in jbd2 code
    jbd/jbd2: validate sb->s_first in journal_get_superblock()
    ext4: let ext4_ext_rm_leaf work with EXT_DEBUG defined
    ext4: fix a syntax error in ext4_ext_insert_extent when debugging enabled
    ext4: fix a typo in struct ext4_allocation_context
    ext4: Don't normalize an falloc request if it can fit in 1 extent.
    ext4: remove comments about extent mount option in ext4_new_inode()
    ext4: let ext4_discard_partial_buffers handle unaligned range correctly
    ext4: return ENOMEM if find_or_create_pages fails
    ext4: move vars to local scope in ext4_discard_partial_page_buffers_no_lock()
    ext4: Create helper function for EXT4_IO_END_UNWRITTEN and i_aiodio_unwritten
    ext4: optimize locking for end_io extent conversion
    ext4: remove unnecessary call to waitqueue_active()
    ext4: Use correct locking for ext4_end_io_nolock()
    ext4: fix race in xattr block allocation path
    ext4: trace punch_hole correctly in ext4_ext_map_blocks
    ext4: clean up AGGRESSIVE_TEST code
    ext4: move variables to their scope
    ext4: fix quota accounting during migration
    ext4: migrate cleanup
    ...

    Linus Torvalds
     

01 Nov, 2011

3 commits

  • Replace the ISOLATE_XXX macros with a bitwise isolate_mode_t type.
    Macros are normally not recommended because they are type-unsafe and make
    debugging harder, as the symbols cannot be passed through to the debugger.

    Quote from Johannes
    " Hmm, it would probably be cleaner to fully convert the isolation mode
    into independent flags. INACTIVE, ACTIVE, BOTH is currently a
    tri-state among flags, which is a bit ugly."

    This patch moves isolate mode from swap.h to mmzone.h by memcontrol.h

    Signed-off-by: Minchan Kim
    Cc: Johannes Weiner
    Cc: KAMEZAWA Hiroyuki
    Cc: KOSAKI Motohiro
    Cc: Mel Gorman
    Cc: Rik van Riel
    Cc: Michal Hocko
    Cc: Andrea Arcangeli
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Minchan Kim
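
The idea of independent isolation flags can be sketched outside the kernel. The following is a minimal Python illustration, not the kernel code: the class name `IsolateMode`, the helper `may_isolate`, and the numeric flag values are assumptions made for the example.

```python
from enum import IntFlag


class IsolateMode(IntFlag):
    """Hypothetical stand-in for a bitwise isolate_mode_t."""
    INACTIVE = 0x1  # caller wants inactive pages (value is an assumption)
    ACTIVE = 0x2    # caller wants active pages (value is an assumption)


# "BOTH" is no longer a third state, just the union of two flags.
BOTH = IsolateMode.INACTIVE | IsolateMode.ACTIVE


def may_isolate(page_is_active: bool, mode: IsolateMode) -> bool:
    """Return True if a page with the given state matches the requested mode."""
    wanted = IsolateMode.ACTIVE if page_is_active else IsolateMode.INACTIVE
    return bool(mode & wanted)
```

With flags, a mismatched argument such as a plain string is rejected by the `&` operation, which loosely mirrors the type-safety argument made in the commit message.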
     
  • This reverts commit 3a9f987b3141f086de27832514aad9f50a53f754.

    With all the files that are real modules now having module.h
    explicitly called out for inclusion, and no reliance on any
    implicit presence of module.h assumed, we should no longer
    need this workaround.

    Signed-off-by: Paul Gortmaker

    Paul Gortmaker
     
  • The module.h header pretty much brings in the kitchen sink along
    with it, so it should be avoided wherever reasonably possible in
    terms of being included from other commonly used header
    files, as it results in a measurable increase in compile times.

    The worst culprit was probably device.h since it is used everywhere.
    This file also had an implicit dependency/usage of mutex.h which was
    masked by module.h, and is also fixed here at the same time.

    There are over a dozen other headers that simply declare the
    struct instead of pulling in the whole file, so follow their lead
    and simply make it a few more.

    Most of the implicit dependencies on module.h being present by
    these headers pulling it in have been now weeded out, so we can
    finally make this change with hopefully minimal breakage.

    Signed-off-by: Paul Gortmaker

    Paul Gortmaker
     

31 Oct, 2011

4 commits

  • This creates a new 'reason' field in a wb_writeback_work
    structure, which unambiguously identifies who initiates
    writeback activity. A 'wb_reason' enumeration has been
    added to writeback.h, to enumerate the possible reasons.

    The 'writeback_work_class' tracepoint event class and the
    'writeback_queue_io' tracepoint are updated to include the
    symbolic 'reason' in all trace events.

    The 'writeback_inodes_sbXXX' family of routines has also had
    a 'reason' parameter added, so callers can specify
    why writeback is being started.

    Acked-by: Jan Kara
    Signed-off-by: Curt Wohlgemuth
    Signed-off-by: Wu Fengguang

    Curt Wohlgemuth
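
As a rough illustration of a work item that carries its own reason, here is a small Python sketch. The enumeration members, the `WritebackWork` fields and the trace format are hypothetical and only loosely modelled on the description above; they are not the kernel definitions.

```python
from dataclasses import dataclass
from enum import Enum


class WbReason(Enum):
    """Hypothetical subset of reasons; the real list lives in writeback.h."""
    BACKGROUND = "background"
    SYNC = "sync"
    PERIODIC = "periodic"


@dataclass
class WritebackWork:
    """Simplified stand-in for a writeback work item."""
    nr_pages: int
    reason: WbReason  # who initiated this writeback


def format_trace(work: WritebackWork) -> str:
    """Render the symbolic reason the way a trace event might print it."""
    return f"nr_pages={work.nr_pages} reason={work.reason.value}"
```

Because the reason travels with the work item, every consumer (queueing code, tracepoints) can report it without guessing from other fields.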
     
  • Instead of sending ->older_than_this to queue_io() and
    move_expired_inodes(), send the entire wb_writeback_work
    structure. There are other fields of a work item that are
    useful in these routines and in tracepoints.

    Acked-by: Jan Kara
    Signed-off-by: Curt Wohlgemuth
    Signed-off-by: Wu Fengguang

    Curt Wohlgemuth
     
  • Useful for analyzing the dynamics of the throttling algorithms and
    debugging user reported problems.

    Signed-off-by: Wu Fengguang

    Wu Fengguang
     
  • It helps understand how various throttle bandwidths are updated.

    Signed-off-by: Wu Fengguang

    Wu Fengguang
     

29 Oct, 2011

1 commit

  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (549 commits)
    ALSA: hda - Fix ADC input-amp handling for Cx20549 codec
    ALSA: hda - Keep EAPD turned on for old Conexant chips
    ALSA: hda/realtek - Fix missing volume controls with ALC260
    ASoC: wm8940: Properly set codec->dapm.bias_level
    ALSA: hda - Fix pin-config for ASUS W90V
    ALSA: hda - Fix surround/CLFE headphone and speaker pins order
    ALSA: hda - Fix typo
    ALSA: Update the sound git tree URL
    ALSA: HDA: Add new revision for ALC662
    ASoC: max98095: Convert codec->hw_write to snd_soc_write
    ASoC: keep pointer to resource so it can be freed
    ASoC: sgtl5000: Fix wrong mask in some snd_soc_update_bits calls
    ASoC: wm8996: Fix wrong mask for setting WM8996_AIF_CLOCKING_2
    ASoC: da7210: Add support for line out and DAC
    ASoC: da7210: Add support for DAPM
    ALSA: hda/realtek - Fix DAC assignments of multiple speakers
    ASoC: Use SGTL5000_LINREG_VDDD_MASK instead of hardcoded mask value
    ASoC: Set sgtl5000->ldo in ldo_regulator_register
    ASoC: wm8996: Use SND_SOC_DAPM_AIF_OUT for AIF2 Capture
    ASoC: wm8994: Use SND_SOC_DAPM_AIF_OUT for AIF3 Capture
    ...

    Linus Torvalds
     

27 Oct, 2011

1 commit

  • This patch introduces a fast path in ext4_ext_convert_to_initialized()
    for the case when the conversion can be performed by transferring
    the newly initialized blocks from the uninitialized extent into
    an adjacent initialized extent. Doing so removes the expensive
    invocations of memmove() which occur during extent insertion and
    the subsequent merge.

    In practice this should be the common case for clients performing
    append writes into files pre-allocated via
    fallocate(FALLOC_FL_KEEP_SIZE). In such a workload performed via
    direct IO and when using a suboptimal implementation of memmove()
    (x86_64 prior to the 2.6.39 rewrite), this patch reduces kernel CPU
    consumption by 32%.

    Two new trace points are added to ext4_ext_convert_to_initialized()
    to offer visibility into its operations. No exit trace point has
    been added due to the multiplicity of return points. This can be
    revisited once the upstream cleanup is backported.

    Signed-off-by: Eric Gouriou
    Signed-off-by: "Theodore Ts'o"

    Eric Gouriou
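
The fast path described above can be pictured with a toy model. This Python sketch assumes a simplified extent representation (`Extent` with `start`, `length`, `initialized`) and a hypothetical helper `convert_fast_path`; the real function has further conditions (such as extent length limits) that are not modelled here.

```python
from dataclasses import dataclass


@dataclass
class Extent:
    """Toy extent: a run of blocks that is either initialized or not."""
    start: int
    length: int
    initialized: bool


def convert_fast_path(prev: Extent, ex: Extent,
                      write_start: int, write_len: int) -> bool:
    """Move the first write_len blocks of uninitialized `ex` into `prev`.

    Returns True when the fast path applies, False when the caller would
    have to fall back to the general split/insert/merge path.
    """
    eligible = (
        prev.initialized
        and not ex.initialized
        and prev.start + prev.length == ex.start  # extents are adjacent
        and write_start == ex.start               # write begins at ex's head
        and 0 < write_len < ex.length             # ex keeps some blocks
    )
    if not eligible:
        return False
    # Only two extents are adjusted in place; nothing is inserted,
    # so no other extent entries need to be shifted.
    prev.length += write_len
    ex.start += write_len
    ex.length -= write_len
    return True
```

An append write into a preallocated file matches this pattern repeatedly: each write lands at the head of the remaining uninitialized extent, right after the initialized one.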
     

26 Oct, 2011

4 commits

  • * 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (46 commits)
    llist: Add back llist_add_batch() and llist_del_first() prototypes
    sched: Don't use tasklist_lock for debug prints
    sched: Warn on rt throttling
    sched: Unify the ->cpus_allowed mask copy
    sched: Wrap scheduler p->cpus_allowed access
    sched: Request for idle balance during nohz idle load balance
    sched: Use resched IPI to kick off the nohz idle balance
    sched: Fix idle_cpu()
    llist: Remove cpu_relax() usage in cmpxchg loops
    sched: Convert to struct llist
    llist: Add llist_next()
    irq_work: Use llist in the struct irq_work logic
    llist: Return whether list is empty before adding in llist_add()
    llist: Move cpu_relax() to after the cmpxchg()
    llist: Remove the platform-dependent NMI checks
    llist: Make some llist functions inline
    sched, tracing: Show PREEMPT_ACTIVE state in trace_sched_switch
    sched: Remove redundant test in check_preempt_tick()
    sched: Add documentation for bandwidth control
    sched: Return unused runtime on group dequeue
    ...

    Linus Torvalds
     
  • * 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (121 commits)
    perf symbols: Increase symbol KSYM_NAME_LEN size
    perf hists browser: Refuse 'a' hotkey on non symbolic views
    perf ui browser: Use libslang to read keys
    perf tools: Fix tracing info recording
    perf hists browser: Elide DSO column when it is set to just one DSO, ditto for threads
    perf hists: Don't consider filtered entries when calculating column widths
    perf hists: Don't decay total_period for filtered entries
    perf hists browser: Honour symbol_conf.show_{nr_samples,total_period}
    perf hists browser: Do not exit on tab key with single event
    perf annotate browser: Don't change selection line when returning from callq
    perf tools: handle endianness of feature bitmap
    perf tools: Add prelink suggestion to dso update message
    perf script: Fix unknown feature comment
    perf hists browser: Apply the dso and thread filters when merging new batches
    perf hists: Move the dso and thread filters from hist_browser
    perf ui browser: Honour the xterm colors
    perf top tui: Give color hints just on the percentage, like on --stdio
    perf ui browser: Make the colors configurable and change the defaults
    perf tui: Remove unneeded call to newtCls on startup
    perf hists: Don't format the percentage on hist_entry__snprintf
    ...

    Fix up conflicts in arch/x86/kernel/kprobes.c manually.

    Ingo's tree did the insane "add volatile to const array", which just
    doesn't make sense ("volatile const"?). But we could remove the const
    *and* make the array volatile to make doubly sure that gcc doesn't
    optimize it away..

    Also fix up kernel/trace/ring_buffer.c non-data-conflicts manually: the
    reader_lock has been turned into a raw lock by the core locking merge,
    and there was a new user of it introduced in this perf core merge. Make
    sure that new use also uses the raw accessor functions.

    Linus Torvalds
     
  • * 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (45 commits)
    rcu: Move propagation of ->completed from rcu_start_gp() to rcu_report_qs_rsp()
    rcu: Remove rcu_needs_cpu_flush() to avoid false quiescent states
    rcu: Wire up RCU_BOOST_PRIO for rcutree
    rcu: Make rcu_torture_boost() exit loops at end of test
    rcu: Make rcu_torture_fqs() exit loops at end of test
    rcu: Permit rt_mutex_unlock() with irqs disabled
    rcu: Avoid having just-onlined CPU resched itself when RCU is idle
    rcu: Suppress NMI backtraces when stall ends before dump
    rcu: Prohibit grace periods during early boot
    rcu: Simplify unboosting checks
    rcu: Prevent early boot set_need_resched() from __rcu_pending()
    rcu: Dump local stack if cannot dump all CPUs' stacks
    rcu: Move __rcu_read_unlock()'s barrier() within if-statement
    rcu: Improve rcu_assign_pointer() and RCU_INIT_POINTER() documentation
    rcu: Make rcu_assign_pointer() unconditionally insert a memory barrier
    rcu: Make rcu_implicit_dynticks_qs() locals be correct size
    rcu: Eliminate in_irq() checks in rcu_enter_nohz()
    nohz: Remove nohz_cpu_mask
    rcu: Document interpretation of RCU-lockdep splats
    rcu: Allow rcutorture's stat_interval parameter to be changed at runtime
    ...

    Linus Torvalds
     
  • * 'for-linus' of git://github.com/ericvh/linux:
    9p: fix 9p.txt to advertise msize instead of maxdata
    net/9p: Convert net/9p protocol dumps to tracepoints
    fs/9p: change an int to unsigned int
    fs/9p: Cleanup option parsing in 9p
    9p: move dereference after NULL check
    fs/9p: inode file operation is properly initialized init_special_inode
    fs/9p: Update zero-copy implementation in 9p

    Linus Torvalds
     

25 Oct, 2011

3 commits

  • * 'pm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (63 commits)
    PM / Clocks: Remove redundant NULL checks before kfree()
    PM / Documentation: Update docs about suspend and CPU hotplug
    ACPI / PM: Add Sony VGN-FW21E to nonvs blacklist.
    ARM: mach-shmobile: sh7372 A4R support (v4)
    ARM: mach-shmobile: sh7372 A3SP support (v4)
    PM / Sleep: Mark devices involved in wakeup signaling during suspend
    PM / Hibernate: Improve performance of LZO/plain hibernation, checksum image
    PM / Hibernate: Do not initialize static and extern variables to 0
    PM / Freezer: Make fake_signal_wake_up() wake TASK_KILLABLE tasks too
    PM / Hibernate: Add resumedelay kernel param in addition to resumewait
    MAINTAINERS: Update linux-pm list address
    PM / ACPI: Blacklist Vaio VGN-FW520F machine known to require acpi_sleep=nonvs
    PM / ACPI: Blacklist Sony Vaio known to require acpi_sleep=nonvs
    PM / Hibernate: Add resumewait param to support MMC-like devices as resume file
    PM / Hibernate: Fix typo in a kerneldoc comment
    PM / Hibernate: Freeze kernel threads after preallocating memory
    PM: Update the policy on default wakeup settings
    PM / VT: Cleanup #if defined uglyness and fix compile error
    PM / Suspend: Off by one in pm_suspend()
    PM / Hibernate: Include storage keys in hibernation image on s390
    ...

    Linus Torvalds
     
  • * 'for-linus' of git://opensource.wolfsonmicro.com/regmap: (62 commits)
    mfd: Enable rbtree cache for wm831x devices
    regmap: Support some block operations on cached devices
    regmap: Allow caches for devices with no defaults
    regmap: Ensure rbtree syncs registers set to zero properly
    regmap: Allow rbtree to cache zero default values
    regmap: Warn on raw I/O as well as bulk reads that bypass cache
    regmap: Return a sensible error code if we fail to read the cache
    regmap: Use bsearch() to search the register defaults
    regmap: Fix doc comment
    regmap: Optimize the lookup path to use binary search
    regmap: Ensure we scream if we enable cache bypass/only at the same time
    regmap: Implement regcache_cache_bypass helper function
    regmap: Save/restore the bypass state upon syncing
    regmap: Lock the sync path, ensure we use the lockless _regmap_write()
    regmap: Fix apostrophe usage
    regmap: Make _regmap_write() global
    regmap: Fix lock used for regcache_cache_only()
    regmap: Grab the lock in regcache_cache_only()
    regmap: Modify map->cache_bypass directly
    regmap: Fix regcache_sync generic implementation
    ...

    Linus Torvalds
     
  • This helps in more control over debugging.
    root@qemu-img-64:~# ls /pass/123
    ls: cannot access /pass/123: No such file or directory
    root@qemu-img-64:~# cat /sys/kernel/debug/tracing/trace
    # tracer: nop
    #
    # TASK-PID CPU# TIMESTAMP FUNCTION
    # | | | | |
    ls-1536 [001] 70.928584: 9p_protocol_dump: clnt 18446612132784021504 P9_TWALK(tag = 1)
    000: 16 00 00 00 6e 01 00 01 00 00 00 02 00 00 00 01
    010: 00 03 00 31 32 33 00 00 00 ff ff ff ff 00 00 00

    ls-1536 [001] 70.928587:
    => trace_9p_protocol_dump
    => p9pdu_finalize
    => p9_client_rpc
    => p9_client_walk
    => v9fs_vfs_lookup
    => d_alloc_and_lookup
    => walk_component
    => path_lookupat
    ls-1536 [000] 70.929696: 9p_protocol_dump: clnt 18446612132784021504 P9_RLERROR(tag = 1)
    000: 0b 00 00 00 07 01 00 02 00 00 00 4e 03 00 02 00
    010: 00 00 00 00 03 00 02 00 00 00 00 00 ff 43 00 00

    ls-1536 [000] 70.929697:
    => trace_9p_protocol_dump
    => p9_client_rpc
    => p9_client_walk
    => v9fs_vfs_lookup
    => d_alloc_and_lookup
    => walk_component
    => path_lookupat
    => do_path_lookup

    Signed-off-by: Aneesh Kumar K.V
    Signed-off-by: Eric Van Hensbergen

    Aneesh Kumar K.V
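
The first bytes of each dump above can be decoded by hand, since a 9P message starts with a little-endian size[4] type[1] tag[2] header. The Python sketch below decodes the first row of each dump; the helper names are made up for the example, and the type table only covers the two message types that appear in this trace.

```python
import struct

# Only the two message types seen in the trace above.
P9_TYPES = {7: "P9_RLERROR", 110: "P9_TWALK"}


def decode_header(dump_hex: str):
    """Decode size[4] type[1] tag[2] from a hex dump row."""
    data = bytes.fromhex(dump_hex)
    size, mtype, tag = struct.unpack_from("<IBH", data, 0)
    return size, P9_TYPES.get(mtype, str(mtype)), tag


def decode_rlerror_ecode(dump_hex: str) -> int:
    """Rlerror carries a 4-byte error code right after the 7-byte header."""
    data = bytes.fromhex(dump_hex)
    (ecode,) = struct.unpack_from("<I", data, 7)
    return ecode


twalk_row = "16 00 00 00 6e 01 00 01 00 00 00 02 00 00 00 01"
rlerror_row = "0b 00 00 00 07 01 00 02 00 00 00 4e 03 00 02 00"

print(decode_header(twalk_row))
print(decode_header(rlerror_row))
print(decode_rlerror_ecode(rlerror_row))
```

The decoded tag of 1 matches the `tag = 1` printed by the tracepoint, and the Rlerror code of 2 corresponds to the "No such file or directory" reported by `ls`.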
     

08 Oct, 2011

1 commit

  • * pm-runtime:
    PM / Tracing: build rpm-traces.c only if CONFIG_PM_RUNTIME is set
    PM / Runtime: Replace dev_dbg() with trace_rpm_*()
    PM / Runtime: Introduce trace points for tracing rpm_* functions
    PM / Runtime: Don't run callbacks under lock for power.irq_safe set
    USB: Add wakeup info to debugging messages
    PM / Runtime: pm_runtime_idle() can be called in atomic context
    PM / Runtime: Add macro to test for runtime PM events
    PM / Runtime: Add might_sleep() to runtime PM functions

    Rafael J. Wysocki
     

06 Oct, 2011

1 commit


04 Oct, 2011

2 commits

  • Merge reason: pick up the latest fixes.

    Signed-off-by: Ingo Molnar

    Ingo Molnar
     
  • Each event adds some points to its counters. By default it adds 1,
    and a number of points may be transmitted in the event's parameters.

    E.g. sched:sched_stat_runtime adds how long a process has been running.

    But this functionality was broken by v2.6.31-rc5-392-gf413cdb
    and now the event's parameters no longer affect the number of points.

    TP_perf_assign isn't defined, so __perf_count(c) isn't executed and
    __count is always equal to 1.

    Signed-off-by: Andrew Vagin
    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/r/1317052535-1765247-2-git-send-email-avagin@openvz.org
    Signed-off-by: Ingo Molnar

    Andrew Vagin
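
The failure mode is easy to reproduce in miniature. In this Python sketch, `event_count` and its `perf_assign` hook are hypothetical stand-ins for the macro machinery: when the hook is missing, the count silently stays at its default of 1.

```python
def event_count(params: dict, perf_assign=None) -> int:
    """Return how many points one event occurrence adds to its counter."""
    count = 1  # default: each event adds one point
    if perf_assign is not None:
        # Analogue of __perf_count(c): take the count from the parameters.
        count = perf_assign(params)
    return count


sample = {"runtime": 5000}

# Broken behaviour: the assignment hook is not wired up.
broken = event_count(sample)

# Intended behaviour: the event contributes its runtime.
fixed = event_count(sample, perf_assign=lambda p: p["runtime"])
```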
     

03 Oct, 2011

1 commit

  • As proposed by Chris, Dave and Jan, don't start foreground writeback IO
    inside balance_dirty_pages(). Instead, simply let it idle sleep for some
    time to throttle the dirtying task. In the meantime, kick off the
    per-bdi flusher thread to do background writeback IO.

    RATIONALE
    =========

    - disk seeks on concurrent writeback of multiple inodes (Dave Chinner)

    If every thread doing writes and being throttled starts foreground
    writeback, it leads to N IO submitters from at least N different
    inodes at the same time, ending up with N different sets of IO being
    issued with potentially zero locality to each other, resulting in
    much lower elevator sort/merge efficiency, and hence we seek the disk
    all over the place to service the different sets of IO.
    OTOH, if there is only one submission thread, it doesn't jump between
    inodes in the same way when congestion clears - it keeps writing to
    the same inode, resulting in large related chunks of sequential IOs
    being issued to the disk. This is more efficient than the above
    foreground writeback because the elevator works better and the disk
    seeks less.

    - lock contention and cache bouncing on concurrent IO submitters (Dave Chinner)

    With this patchset, the fs_mark benchmark on a 12-drive software RAID0 goes
    from CPU bound to IO bound, freeing "3-4 CPUs worth of spinlock contention".

    * "CPU usage has dropped by ~55%", "it certainly appears that most of
    the CPU time saving comes from the removal of contention on the
    inode_wb_list_lock" (IMHO at least 10% comes from the reduction of
    cacheline bouncing, because the new code is able to call much less
    frequently into balance_dirty_pages() and hence access the global
    page states)

    * the user space "App overhead" is reduced by 20%, by avoiding the
    cacheline pollution by the complex writeback code path

    * "for a ~5% throughput reduction", "the number of write IOs have
    dropped by ~25%", and the elapsed time reduced from 41:42.17 to
    40:53.23.

    * On a simple test of 100 dd, it reduces the CPU %system time from 30% to 3%,
    and improves IO throughput from 38MB/s to 42MB/s.

    - IO size too small for fast arrays and too large for slow USB sticks

    The write_chunk used by the current balance_dirty_pages() cannot be
    directly set to some large value (e.g. 128MB) for better IO efficiency,
    because that could lead to user-perceivable stalls of more than 1 second.
    Even the current 4MB write size may be too large for slow USB sticks.
    The fact that balance_dirty_pages() starts IO by itself couples the
    IO size to the wait time, which makes it hard to choose a suitable IO
    size while keeping the wait time under control.

    Now it's possible to increase writeback chunk size proportional to the
    disk bandwidth. In a simple test of 50 dd's on XFS, 1-HDD, 3GB ram,
    the larger writeback size dramatically reduces the seek count to 1/10
    (far beyond my expectation) and improves the write throughput by 24%.

    - long block time in balance_dirty_pages() hurts desktop responsiveness

    Many of us may have had the experience: it often takes a couple of
    seconds or even longer to stop a heavy writing dd/cp/tar command with
    Ctrl-C or "kill -9".

    - IO pipeline broken by bumpy write() progress

    There is a broad class of "loop {read(buf); write(buf);}" applications
    whose read() pipeline will be under-utilized or even come to a stop if
    the write()s have long latencies _or_ don't progress at a constant rate.
    The current threshold based throttling inherently transfers the large
    low level IO completion fluctuations to bumpy application write()s,
    and further deteriorates with an increasing number of dirtiers and/or
    bdi's.

    For example, when doing 50 dd's + 1 remote rsync to an XFS partition,
    the rsync progresses very bumpily on the legacy kernel, and throughput
    is improved by 67% by this patchset (with the larger write chunk size,
    it is a 93% speedup).

    The new rate based throttling can support 1000+ dd's with excellent
    smoothness, low latency and low overheads.

    For the above reasons, it's much better to do IO-less and low latency
    pauses in balance_dirty_pages().

    Jan Kara, Dave Chinner and I explored the scheme to let
    balance_dirty_pages() wait for enough writeback IO completions to
    safeguard the dirty limit. However, it was found to have two problems:

    - in large NUMA systems, the per-cpu counters may have big accounting
    errors, leading to big throttle wait times and jitter.

    - NFS may kill a large number of unstable pages with a single COMMIT.
    Because the NFS server serves COMMIT with expensive fsync() IOs, it is
    desirable to delay and reduce the number of COMMITs. So such bursty IO
    completions are unlikely to be optimized away, and neither are the
    resulting large (and tiny) stall times in IO completion based throttling.

    So here is a pause time oriented approach, which tries to control the
    pause time in each balance_dirty_pages() invocation, by controlling
    the number of pages dirtied before calling balance_dirty_pages(), for
    smooth and efficient dirty throttling:

    - avoid useless (eg. zero pause time) balance_dirty_pages() calls
    - avoid too small pause time (less than 4ms, which burns CPU power)
    - avoid too large pause time (more than 200ms, which hurts responsiveness)
    - avoid big fluctuations of pause times

    It can control pause times at will. The default policy (in a followup
    patch) will be to do ~10ms pauses in 1-dd case, and increase to ~100ms
    in 1000-dd case.
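
The relationship between the pause time and the number of pages dirtied per call can be sketched as follows. This is a simplified Python model, not the kernel algorithm: `pages_per_call`, its parameters and the use of a per-task rate limit in pages per second are assumptions; only the 4ms and 200ms bounds come from the list above.

```python
MIN_PAUSE_MS = 4    # shorter pauses burn CPU power
MAX_PAUSE_MS = 200  # longer pauses hurt responsiveness


def pages_per_call(task_ratelimit_pps: float, target_pause_ms: float):
    """Pick how many pages a task may dirty between throttling calls.

    Dirtying N pages and then sleeping N / ratelimit seconds keeps the task
    at its rate limit, so choosing N is the same as choosing the pause.
    Returns (pages, pause_ms) after clamping the pause to the allowed range.
    """
    pause_ms = min(max(target_pause_ms, MIN_PAUSE_MS), MAX_PAUSE_MS)
    pages = max(1, int(task_ratelimit_pps * pause_ms / 1000))
    return pages, pause_ms
```

For example, a task limited to 25600 pages per second with a 10ms target pause would be asked to call in every 256 pages; a 1ms target is raised to 4ms and a 500ms target is capped at 200ms.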

    BEHAVIOR CHANGE
    ===============

    (1) dirty threshold

    Users will notice that applications get throttled once they cross
    the global (background + dirty)/2=15% threshold, and are then balanced
    around 17.5%. Before this patch, the behavior was to just throttle at
    20% of dirtyable memory in the 1-dd case.

    Since the task will be soft throttled earlier than before, it may be
    perceived by end users as a performance "slow down" if their application
    happens to dirty more than 15% of dirtyable memory.
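
The percentages in (1) can be checked with a few lines of arithmetic. The Python sketch below assumes a 10% background ratio, which is what the quoted 15% implies together with the 20% dirty ratio; treating 17.5% as the midpoint between the throttle start and the dirty limit is a reading of the numbers, not a formula stated in the commit message.

```python
def throttle_points(background_pct: float, dirty_pct: float):
    """Return (throttle_start, balance_point) as percentages of dirtyable memory."""
    # Throttling starts halfway between the background and dirty thresholds.
    start = (background_pct + dirty_pct) / 2
    # Assumption: the balance point sits halfway between start and the limit.
    balance = (start + dirty_pct) / 2
    return start, balance


start, balance = throttle_points(background_pct=10, dirty_pct=20)
print(start, balance)
```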

    (2) smoothness/responsiveness

    Users will notice a more responsive system during heavy writeback.
    "killall dd" will take effect instantly.

    Signed-off-by: Wu Fengguang

    Wu Fengguang
     

01 Oct, 2011

1 commit


29 Sep, 2011

5 commits

  • Add trace events to record grace-period start and end, quiescent states,
    CPUs noticing grace-period start and end, grace-period initialization,
    call_rcu() invocation, tasks blocking in RCU read-side critical sections,
    tasks exiting those same critical sections, force_quiescent_state()
    detection of dyntick-idle and offline CPUs, CPUs entering and leaving
    dyntick-idle mode (except from NMIs), CPUs coming online and going
    offline, and CPUs being kicked for staying in dyntick-idle mode for too
    long (as in many weeks, even on 32-bit systems).

    Signed-off-by: Paul E. McKenney
    Signed-off-by: Paul E. McKenney

    rcu: Add the rcu flavor to callback trace events

    The earlier trace events for registering RCU callbacks and for invoking
    them did not include the RCU flavor (rcu_bh, rcu_preempt, or rcu_sched).
    This commit adds the RCU flavor to those trace events.

    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • Add event-trace markers to TREE_RCU kthreads to allow including these
    kthread's CPU time in the utilization calculations.

    Signed-off-by: Paul E. McKenney
    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • Add a string to the rcu_batch_start() and rcu_batch_end() trace
    messages that indicates the RCU type ("rcu_sched", "rcu_bh", or
    "rcu_preempt"). The trace messages for the actual invocations
    themselves are not marked, as it should be clear from the
    rcu_batch_start() and rcu_batch_end() events before and after.

    Signed-off-by: Paul E. McKenney
    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     
  • This commit adds the trace_rcu_utilization() marker, which is intended
    to allow postprocessing scripts to compute RCU's CPU utilization,
    give or take event-trace overhead. Note that we do not include RCU's
    dyntick-idle interface because event tracing requires RCU protection,
    which is not available in dyntick-idle mode.

    Signed-off-by: Paul E. McKenney
    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
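
A postprocessing script of the kind mentioned above could look roughly like this. The sketch is an assumption-laden illustration in Python: it takes already-parsed `(timestamp, marker)` pairs for a single CPU, assumes marker strings begin with "Start" or "End", and assumes the intervals do not nest.

```python
def rcu_utilization(events, window_seconds: float) -> float:
    """Fraction of the window spent between Start and End markers.

    `events` is an iterable of (timestamp_seconds, marker) pairs in time
    order, where marker begins with "Start" or "End" (assumed format).
    """
    busy = 0.0
    start = None
    for ts, marker in events:
        if marker.startswith("Start"):
            start = ts
        elif marker.startswith("End") and start is not None:
            busy += ts - start
            start = None
    return busy / window_seconds


trace = [
    (0.0, "Start work"),
    (0.25, "End work"),
    (0.5, "Start work"),
    (0.75, "End work"),
]
print(rcu_utilization(trace, window_seconds=1.0))
```

The result includes whatever overhead the trace events themselves add, which is the "give or take" caveat in the commit message.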
     
  • There was recently some controversy about the overhead of invoking RCU
    callbacks. Add TRACE_EVENT()s to obtain fine-grained timings for the
    start and stop of a batch of callbacks and also for each callback invoked.

    Signed-off-by: Paul E. McKenney
    Signed-off-by: Paul E. McKenney

    Paul E. McKenney
     

28 Sep, 2011

1 commit


26 Sep, 2011

1 commit

  • We needed to see the difference between scheduling a runnable task and
    a runnable task being involuntarily preempted.

    No app should rely on the old string output (the binary
    trace event record format is not changed).

    Signed-off-by: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: Steven Rostedt
    Link: http://lkml.kernel.org/r/1316164603.10174.11.camel@twins
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     

23 Sep, 2011

1 commit


21 Sep, 2011

1 commit

  • One of the longest standing areas for improvement in ASoC has been the
    DAPM algorithm - it repeats the same checks many times whenever it is run
    and makes no effort to limit the areas of the graph it checks meaning we
    do an awful lot of walks over the full graph. This has never mattered too
    much as the size of the graph has generally been small in relation to the
    size of the devices supported and the speed of CPUs but it is annoying.

    In preparation for work on improving this insert a trace point after the
    graph walk has been done. This gives us specific timing information for
    the walk, and in order to give quantifiable (non-benchmark) numbers also
    count every time we check a link or check the power for a widget and report
    those numbers. Substantial changes in the algorithm may require tweaks to
    the stats but they should be useful for simpler things.

    Signed-off-by: Mark Brown

    Mark Brown
     

20 Sep, 2011

1 commit


10 Sep, 2011

1 commit


31 Aug, 2011

1 commit


20 Aug, 2011

1 commit

  • * 'for-linus' of git://git.kernel.dk/linux-block: (23 commits)
    Revert "cfq: Remove special treatment for metadata rqs."
    block: fix flush machinery for stacking drivers with differring flush flags
    block: improve rq_affinity placement
    blktrace: add FLUSH/FUA support
    Move some REQ flags to the common bio/request area
    allow blk_flush_policy to return REQ_FSEQ_DATA independent of *FLUSH
    xen/blkback: Make description more obvious.
    cfq-iosched: Add documentation about idling
    block: Make rq_affinity = 1 work as expected
    block: swim3: fix unterminated of_device_id table
    block/genhd.c: remove useless cast in diskstats_show()
    drivers/cdrom/cdrom.c: relax check on dvd manufacturer value
    drivers/block/drbd/drbd_nl.c: use bitmap_parse instead of __bitmap_parse
    bsg-lib: add module.h include
    cfq-iosched: Reduce linked group count upon group destruction
    blk-throttle: correctly determine sync bio
    loop: fix deadlock when sysfs and LOOP_CLR_FD race against each other
    loop: add BLK_DEV_LOOP_MIN_COUNT=%i to allow distros 0 pre-allocated loop devices
    loop: add management interface for on-demand device allocation
    loop: replace linked list of allocated devices with an idr index
    ...

    Linus Torvalds
     

11 Aug, 2011

1 commit

  • Add FLUSH/FUA support to blktrace. As FLUSH precedes WRITE and/or
    FUA follows WRITE, use the same 'F' flag for both cases and
    distinguish them by their (relative) position. The end results
    look like (other flags might be shown also):

    - WRITE: W
    - WRITE_FLUSH: FW
    - WRITE_FUA: WF
    - WRITE_FLUSH_FUA: FWF

    Note that we reuse TC_BARRIER due to the lack of bit space in act_mask,
    so older versions of the blktrace tools will report flush
    requests as barriers from now on.

    Cc: Steven Rostedt
    Cc: Frederic Weisbecker
    Cc: Ingo Molnar
    Signed-off-by: Namhyung Kim
    Reviewed-by: Jeff Moyer
    Signed-off-by: Jens Axboe

    Namhyung Kim
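
The positional use of the 'F' flag can be shown with a small helper. This Python sketch is only an illustration of the encoding listed above: the function name `rwbs` and its boolean parameters are made up, and the 'R' used for reads is an assumption since the commit message only lists the write cases.

```python
def rwbs(write: bool, flush: bool = False, fua: bool = False) -> str:
    """Build the flag string: 'F' before the op means FLUSH, after means FUA."""
    out = ""
    if flush:
        out += "F"                    # FLUSH precedes the data operation
    out += "W" if write else "R"      # 'R' for reads is an assumption
    if fua:
        out += "F"                    # FUA follows the data operation
    return out


for flush, fua in [(False, False), (True, False), (False, True), (True, True)]:
    print(rwbs(True, flush, fua))
```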