23 Jul, 2013

2 commits

  • Pull tracing fixes and cleanups from Steven Rostedt:
    "This contains fixes, optimizations and some clean ups

    Some of the fixes need to go back to 3.10. They are minor, and deal
    mostly with incorrect ref counting in accessing event files.

    There was a couple of optimizations that should have perf perform a
    bit better when accessing trace events.

    And some various clean ups. Some of the clean ups are necessary to
    help in a fix to a theoretical race between opening a event file and
    deleting that event"

    * tag 'trace-3.11-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
    tracing: Kill the unbalanced tr->ref++ in tracing_buffers_open()
    tracing: Kill trace_array->waiter
    tracing: Do not (ab)use trace_seq in event_id_read()
    tracing: Simplify the iteration logic in f_start/f_next
    tracing: Add ref_data to function and fgraph tracer structs
    tracing: Miscellaneous fixes for trace_array ref counting
    tracing: Fix error handling to ensure instances can always be removed
    tracing/kprobe: Wait for disabling all running kprobe handlers
    tracing/perf: Move the PERF_MAX_TRACE_SIZE check into perf_trace_buf_prepare()
    tracing/syscall: Avoid perf_trace_buf_*() if sys_data->perf_events is empty
    tracing/function: Avoid perf_trace_buf_*() if event_function.perf_events is empty
    tracing: Typo fix on ring buffer comments
    tracing: Use trace_seq_puts()/trace_seq_putc() where possible
    tracing: Use correct config guard CONFIG_STACK_TRACER

    Linus Torvalds
     
  • Pull block IO driver bits from Jens Axboe:
    "As I mentioned in the core block pull request, due to real life
    circumstances the driver pull request would be late. Now it looks
    like -rc2 late... On the plus side, apart form the rsxx update, these
    are all things that I could argue could go in later in the cycle as
    they are fixes and not features. So even though things are late, it's
    not ALL bad.

    The pull request contains:

    - Updates to bcache, all bug fixes, from Kent.

    - A pile of drbd bug fixes (no big features this time!).

    - xen blk front/back fixes.

    - rsxx driver updates, some of them deferred form 3.10. So should be
    well cooked by now"

    * 'for-3.11/drivers' of git://git.kernel.dk/linux-block: (63 commits)
    bcache: Allocation kthread fixes
    bcache: Fix GC_SECTORS_USED() calculation
    bcache: Journal replay fix
    bcache: Shutdown fix
    bcache: Fix a sysfs splat on shutdown
    bcache: Advertise that flushes are supported
    bcache: check for allocation failures
    bcache: Fix a dumb race
    bcache: Use standard utility code
    bcache: Update email address
    bcache: Delete fuzz tester
    bcache: Document shrinker reserve better
    bcache: FUA fixes
    drbd: Allow online change of al-stripes and al-stripe-size
    drbd: Constants should be UPPERCASE
    drbd: Ignore the exit code of a fence-peer handler if it returns too late
    drbd: Fix rcu_read_lock balance on error path
    drbd: fix error return code in drbd_init()
    drbd: Do not sleep inside rcu
    bcache: Refresh usage docs
    ...

    Linus Torvalds
     

19 Jul, 2013

1 commit

  • Every perf_trace_buf_prepare() caller does
    WARN_ONCE(size > PERF_MAX_TRACE_SIZE, message) and "message" is
    almost the same.

    Shift this WARN_ONCE() into perf_trace_buf_prepare(). This changes
    the meaning of _ONCE, but I think this is fine.

    - 4947014 2932448 10104832 17984294 1126b26 vmlinux
    + 4948422 2932448 10104832 17985702 11270a6 vmlinux

    on my build.

    Link: http://lkml.kernel.org/r/20130617170211.GA19813@redhat.com

    Acked-by: Peter Zijlstra
    Signed-off-by: Oleg Nesterov
    Signed-off-by: Steven Rostedt

    Oleg Nesterov
     

12 Jul, 2013

2 commits

  • Pull SCSI target updates from Nicholas Bellinger:
    "Lots of activity this round on performance improvements in target-core
    while benchmarking the prototype scsi-mq initiator code with
    vhost-scsi fabric ports, along with a number of iscsi/iser-target
    improvements and hardening fixes for exception path cases post v3.10
    merge.

    The highlights include:

    - Make persistent reservations APTPL buffer allocated on-demand, and
    drop per t10_reservation buffer. (grover)
    - Make virtual LUN=0 a NULLIO device, and skip allocation of NULLIO
    device pages (grover)
    - Add transport_cmd_check_stop write_pending bit to avoid extra
    access of ->t_state_lock is WRITE I/O submission fast-path. (nab)
    - Drop unnecessary CMD_T_DEV_ACTIVE check from
    transport_lun_remove_cmd to avoid extra access of ->t_state_lock in
    release fast-path. (nab)
    - Avoid extra t_state_lock access in __target_execute_cmd fast-path
    (nab)
    - Drop unnecessary vhost-scsi wait_for_tasks=true usage +
    ->t_state_lock access in release fast-path. (nab)
    - Convert vhost-scsi to use modern se_cmd->cmd_kref
    TARGET_SCF_ACK_KREF usage (nab)
    - Add tracepoints for SCSI commands being processed (roland)
    - Refactoring of iscsi-target handling of ISCSI_OP_NOOP +
    ISCSI_OP_TEXT to be transport independent (nab)
    - Add iscsi-target SendTargets=$IQN support for in-band discovery
    (nab)
    - Add iser-target support for in-band discovery (nab + Or)
    - Add iscsi-target demo-mode TPG authentication context support (nab)
    - Fix isert_put_reject payload buffer post (nab)
    - Fix iscsit_add_reject* usage for iser (nab)
    - Fix iscsit_sequence_cmd reject handling for iser (nab)
    - Fix ISCSI_OP_SCSI_TMFUNC handling for iser (nab)
    - Fix session reset bug with RDMA_CM_EVENT_DISCONNECTED (nab)

    The last five iscsi/iser-target items are CC'ed to stable, as they do
    address issues present in v3.10 code. They are certainly larger than
    I'd like for stable patch set, but are important to ensure proper
    REJECT exception handling in iser-target for 3.10.y"

    * 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending: (51 commits)
    iser-target: Ignore non TEXT + LOGOUT opcodes for discovery
    target: make queue_tm_rsp() return void
    target: remove unused codes from enum tcm_tmrsp_table
    iscsi-target: kstrtou* configfs attribute parameter cleanups
    iscsi-target: Fix tfc_tpg_auth_cit configfs length overflow
    iscsi-target: Fix tfc_tpg_nacl_auth_cit configfs length overflow
    iser-target: Add support for ISCSI_OP_TEXT opcode + payload handling
    iser-target: Rename sense_buf_[dma,len] to pdu_[dma,len]
    iser-target: Add vendor_err debug output
    target: Add (obsolete) checking for PMI/LBA fields in READ CAPACITY(10)
    target: Return correct sense data for IO past the end of a device
    target: Add tracepoints for SCSI commands being processed
    iser-target: Fix session reset bug with RDMA_CM_EVENT_DISCONNECTED
    iscsi-target: Fix ISCSI_OP_SCSI_TMFUNC handling for iser
    iscsi-target: Fix iscsit_sequence_cmd reject handling for iser
    iscsi-target: Fix iscsit_add_reject* usage for iser
    iser-target: Fix isert_put_reject payload buffer post
    iscsi-target: missing kfree() on error path
    iscsi-target: Drop left-over iscsi_conn->bad_hdr
    target: Make core_scsi3_update_and_write_aptpl return sense_reason_t
    ...

    Linus Torvalds
     
  • Pull tracing changes from Steven Rostedt:
    "The majority of the changes here are cleanups for the large changes
    that were added to 3.10, which includes several bug fixes that have
    been marked for stable.

    As for new features, there were a few, but nothing to write to LWN
    about. These include:

    New function trigger called "dump" and "cpudump" that will cause
    ftrace to dump its buffer to the console when the function is called.
    The difference between "dump" and "cpudump" is that "dump" will dump
    the entire contents of the ftrace buffer, where as "cpudump" will only
    dump the contents of the ftrace buffer for the CPU that called the
    function.

    Another small enhancement is a new sysctl switch called
    "traceoff_on_warning" which, when enabled, will disable tracing if any
    WARN_ON() is triggered. This is useful if you want to debug what
    caused a warning and do not want to risk losing your trace data by the
    ring buffer overwriting the data before you can disable it. There's
    also a kernel command line option that will make this enabled at boot
    up called the same thing"

    * tag 'trace-3.11' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (34 commits)
    tracing: Make tracing_open_generic_{tr,tc}() static
    tracing: Remove ftrace() function
    tracing: Remove TRACE_EVENT_TYPE enum definition
    tracing: Make tracer_tracing_{off,on,is_on}() static
    tracing: Fix irqs-off tag display in syscall tracing
    uprobes: Fix return value in error handling path
    tracing: Fix race between deleting buffer and setting events
    tracing: Add trace_array_get/put() to event handling
    tracing: Get trace_array ref counts when accessing trace files
    tracing: Add trace_array_get/put() to handle instance refs better
    tracing: Protect ftrace_trace_arrays list in trace_events.c
    tracing: Make trace_marker use the correct per-instance buffer
    ftrace: Do not run selftest if command line parameter is set
    tracing/kprobes: Don't pass addr=ip to perf_trace_buf_submit()
    tracing: Use flag buffer_disabled for irqsoff tracer
    tracing/kprobes: Turn trace_probe->files into list_head
    tracing: Fix disabling of soft disable
    tracing: Add missing syscall_metadata comment
    tracing: Simplify code for showing of soft disabled flag
    tracing/kprobes: Kill probe_enable_lock
    ...

    Linus Torvalds
     

10 Jul, 2013

2 commits

  • …inux/kernel/git/ericvh/v9fs

    Pull 9p update from Eric Van Hensbergen:
    "Grab bag of little fixes and enhancements:
    - optional security enhancements
    - fix path coverage in MAINTAINERS
    - switch to using most used protocol and transport as default
    - clean up buffer dumps in trace code

    Held off on RDMA patches as they need to be cleaned up a bit, but will
    try to get the cleaned, checked, and pushed by mid-week"

    * tag 'for-linus-3.11-merge-window-part-1' of git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs:
    9p: Add rest of 9p files to MAINTAINERS entry
    9p: trace: use %*ph to dump buffer
    net/9p: Handle error in zero copy request correctly for 9p2000.u
    net/9p: Use virtio transpart as the default transport
    net/9p: Make 9P2000.L the default protocol for 9p file system

    Linus Torvalds
     
  • Pull btrfs update from Chris Mason:
    "These are the usual mixture of bugs, cleanups and performance fixes.
    Miao has some really nice tuning of our crc code as well as our
    transaction commits.

    Josef is peeling off more and more problems related to early enospc,
    and has a number of important bug fixes in here too"

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: (81 commits)
    Btrfs: wait ordered range before doing direct io
    Btrfs: only do the tree_mod_log_free_eb if this is our last ref
    Btrfs: hold the tree mod lock in __tree_mod_log_rewind
    Btrfs: make backref walking code handle skinny metadata
    Btrfs: fix crash regarding to ulist_add_merge
    Btrfs: fix several potential problems in copy_nocow_pages_for_inode
    Btrfs: cleanup the code of copy_nocow_pages_for_inode()
    Btrfs: fix oops when recovering the file data by scrub function
    Btrfs: make the chunk allocator completely tree lockless
    Btrfs: cleanup orphaned root orphan item
    Btrfs: fix wrong mirror number tuning
    Btrfs: cleanup redundant code in btrfs_submit_direct()
    Btrfs: remove btrfs_sector_sum structure
    Btrfs: check if we can nocow if we don't have data space
    Btrfs: stop using try_to_writeback_inodes_sb_nr to flush delalloc
    Btrfs: use a percpu to keep track of possibly pinned bytes
    Btrfs: check for actual acls rather than just xattrs when caching no acl
    Btrfs: move btrfs_truncate_page to btrfs_cont_expand instead of btrfs_truncate
    Btrfs: optimize reada_for_balance
    Btrfs: optimize read_block_for_search
    ...

    Linus Torvalds
     

08 Jul, 2013

1 commit

  • This patch adds tracepoints to the target code for commands being
    received and being completed, which is quite useful for debugging
    interactions with initiators. For example, one can do something like the
    following to watch commands that are completing unsuccessfully:

    # echo 'scsi_status!=0' > /sys/kernel/debug/tracing/events/target/target_cmd_complete/filter
    # echo 1 > /sys/kernel/debug/tracing/events/target/target_cmd_complete/enable

    # cat /sys/kernel/debug/tracing/trace
    iscsi_trx-0-1902 [003] ...1 990185.810385: target_cmd_complete: iqn.1993-08.org.debian:01:e51ede6aacfd
    Signed-off-by: Nicholas Bellinger

    Roland Dreier
     

04 Jul, 2013

4 commits

  • Merge first patch-bomb from Andrew Morton:
    - various misc bits
    - I'm been patchmonkeying ocfs2 for a while, as Joel and Mark have been
    distracted. There has been quite a bit of activity.
    - About half the MM queue
    - Some backlight bits
    - Various lib/ updates
    - checkpatch updates
    - zillions more little rtc patches
    - ptrace
    - signals
    - exec
    - procfs
    - rapidio
    - nbd
    - aoe
    - pps
    - memstick
    - tools/testing/selftests updates

    * emailed patches from Andrew Morton : (445 commits)
    tools/testing/selftests: don't assume the x bit is set on scripts
    selftests: add .gitignore for kcmp
    selftests: fix clean target in kcmp Makefile
    selftests: add .gitignore for vm
    selftests: add hugetlbfstest
    self-test: fix make clean
    selftests: exit 1 on failure
    kernel/resource.c: remove the unneeded assignment in function __find_resource
    aio: fix wrong comment in aio_complete()
    drivers/w1/slaves/w1_ds2408.c: add magic sequence to disable P0 test mode
    drivers/memstick/host/r592.c: convert to module_pci_driver
    drivers/memstick/host/jmb38x_ms: convert to module_pci_driver
    pps-gpio: add device-tree binding and support
    drivers/pps/clients/pps-gpio.c: convert to module_platform_driver
    drivers/pps/clients/pps-gpio.c: convert to devm_* helpers
    drivers/parport/share.c: use kzalloc
    Documentation/accounting/getdelays.c: avoid strncpy in accounting tool
    aoe: update internal version number to v83
    aoe: update copyright date
    aoe: perform I/O completions in parallel
    ...

    Linus Torvalds
     
  • Andrew Perepechko reported a problem whereby pages are being prematurely
    evicted as the mark_page_accessed() hint is ignored for pages that are
    currently on a pagevec --
    http://www.spinics.net/lists/linux-ext4/msg37340.html .

    Alexey Lyahkov and Robin Dong have also reported problems recently that
    could be due to hot pages reaching the end of the inactive list too
    quickly and be reclaimed.

    Rather than addressing this on a per-filesystem basis, this series aims
    to fix the mark_page_accessed() interface by deferring what LRU a page
    is added to pagevec drain time and allowing mark_page_accessed() to call
    SetPageActive on a pagevec page.

    Patch 1 adds two tracepoints for LRU page activation and insertion. Using
    these processes it's possible to build a model of pages in the
    LRU that can be processed offline.

    Patch 2 defers making the decision on what LRU to add a page to until when
    the pagevec is drained.

    Patch 3 searches the local pagevec for pages to mark PageActive on
    mark_page_accessed. The changelog explains why only the local
    pagevec is examined.

    Patches 4 and 5 tidy up the API.

    postmark, a dd-based test and fs-mark both single and threaded mode were
    run but none of them showed any performance degradation or gain as a
    result of the patch.

    Using patch 1, I built a *very* basic model of the LRU to examine
    offline what the average age of different page types on the LRU were in
    milliseconds. Of course, capturing the trace distorts the test as it's
    written to local disk but it does not matter for the purposes of this
    test. The average age of pages in milliseconds were

    vanilla deferdrain
    Average age mapped anon: 1454 1250
    Average age mapped file: 127841 155552
    Average age unmapped anon: 85 235
    Average age unmapped file: 73633 38884
    Average age unmapped buffers: 74054 116155

    The LRU activity was mostly files which you'd expect for a dd-based
    workload. Note that the average age of buffer pages is increased by the
    series and it is expected this is due to the fact that the buffer pages
    are now getting added to the active list when drained from the pagevecs.
    Note that the average age of the unmapped file data is decreased as they
    are still added to the inactive list and are reclaimed before the
    buffers.

    There is no guarantee this is a universal win for all workloads and it
    would be nice if the filesystem people gave some thought as to whether
    this decision is generally a win or a loss.

    This patch:

    Using these tracepoints it is possible to model LRU activity and the
    average residency of pages of different types. This can be used to
    debug problems related to premature reclaim of pages of particular
    types.

    Signed-off-by: Mel Gorman
    Reviewed-by: Rik van Riel
    Cc: Jan Kara
    Cc: Johannes Weiner
    Cc: Alexey Lyahkov
    Cc: Andrew Perepechko
    Cc: Robin Dong
    Cc: Theodore Tso
    Cc: Hugh Dickins
    Cc: Rik van Riel
    Cc: Bernd Schubert
    Cc: David Howells
    Cc: Trond Myklebust
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mel Gorman
     
  • Pull power management and ACPI updates from Rafael Wysocki:
    "This time the total number of ACPI commits is slightly greater than
    the number of cpufreq commits, but Viresh Kumar (who works on cpufreq)
    remains the most active patch submitter.

    To me, the most significant change is the addition of offline/online
    device operations to the driver core (with the Greg's blessing) and
    the related modifications of the ACPI core hotplug code. Next are the
    freezer updates from Colin Cross that should make the freezing of
    tasks a bit less heavy weight.

    We also have a couple of regression fixes, a number of fixes for
    issues that have not been identified as regressions, two new drivers
    and a bunch of cleanups all over.

    Highlights:

    - Hotplug changes to support graceful hot-removal failures.

    It sometimes is necessary to fail device hot-removal operations
    gracefully if they cannot be carried out completely. For example,
    if memory from a memory module being hot-removed has been allocated
    for the kernel's own use and cannot be moved elsewhere, it's
    desirable to fail the hot-removal operation in a graceful way
    rather than to crash the kernel, but currenty a success or a kernel
    crash are the only possible outcomes of an attempted memory
    hot-removal. Needless to say, that is not a very attractive
    alternative and it had to be addressed.

    However, in order to make it work for memory, I first had to make
    it work for CPUs and for this purpose I needed to modify the ACPI
    processor driver. It's been split into two parts, a resident one
    handling the low-level initialization/cleanup and a modular one
    playing the actual driver's role (but it binds to the CPU system
    device objects rather than to the ACPI device objects representing
    processors). That's been sort of like a live brain surgery on a
    patient who's riding a bike.

    So this is a little scary, but since we found and fixed a couple of
    regressions it caused to happen during the early linux-next testing
    (a month ago), nobody has complained.

    As a bonus we remove some duplicated ACPI hotplug code, because the
    ACPI-based CPU hotplug is now going to use the common ACPI hotplug
    code.

    - Lighter weight freezing of tasks.

    These changes from Colin Cross and Mandeep Singh Baines are
    targeted at making the freezing of tasks a bit less heavy weight
    operation. They reduce the number of tasks woken up every time
    during the freezing, by using the observation that the freezer
    simply doesn't need to wake up some of them and wait for them all
    to call refrigerator(). The time needed for the freezer to decide
    to report a failure is reduced too.

    Also reintroduced is the check causing a lockdep warining to
    trigger when try_to_freeze() is called with locks held (which is
    generally unsafe and shouldn't happen).

    - cpufreq updates

    First off, a commit from Srivatsa S Bhat fixes a resume regression
    introduced during the 3.10 cycle causing some cpufreq sysfs
    attributes to return wrong values to user space after resume. The
    fix is kind of fresh, but also it's pretty obvious once Srivatsa
    has identified the root cause.

    Second, we have a new freqdomain_cpus sysfs attribute for the
    acpi-cpufreq driver to provide information previously available via
    related_cpus. From Lan Tianyu.

    Finally, we fix a number of issues, mostly related to the
    CPUFREQ_POSTCHANGE notifier and cpufreq Kconfig options and clean
    up some code. The majority of changes from Viresh Kumar with bits
    from Jacob Shin, Heiko Stübner, Xiaoguang Chen, Ezequiel Garcia,
    Arnd Bergmann, and Tang Yuantian.

    - ACPICA update

    A usual bunch of updates from the ACPICA upstream.

    During the 3.4 cycle we introduced support for ACPI 5 extended
    sleep registers, but they are only supposed to be used if the
    HW-reduced mode bit is set in the FADT flags and the code attempted
    to use them without checking that bit. That caused suspend/resume
    regressions to happen on some systems. Fix from Lv Zheng causes
    those registers to be used only if the HW-reduced mode bit is set.

    Apart from this some other ACPICA bugs are fixed and code cleanups
    are made by Bob Moore, Tomasz Nowicki, Lv Zheng, Chao Guan, and
    Zhang Rui.

    - cpuidle updates

    New driver for Xilinx Zynq processors is added by Michal Simek.

    Multidriver support simplification, addition of some missing
    kerneldoc comments and Kconfig-related fixes come from Daniel
    Lezcano.

    - ACPI power management updates

    Changes to make suspend/resume work correctly in Xen guests from
    Konrad Rzeszutek Wilk, sparse warning fix from Fengguang Wu and
    cleanups and fixes of the ACPI device power state selection
    routine.

    - ACPI documentation updates

    Some previously missing pieces of ACPI documentation are added by
    Lv Zheng and Aaron Lu (hopefully, that will help people to
    uderstand how the ACPI subsystem works) and one outdated doc is
    updated by Hanjun Guo.

    - Assorted ACPI updates

    We finally nailed down the IA-64 issue that was the reason for
    reverting commit 9f29ab11ddbf ("ACPI / scan: do not match drivers
    against objects having scan handlers"), so we can fix it and move
    the ACPI scan handler check added to the ACPI video driver back to
    the core.

    A mechanism for adding CMOS RTC address space handlers is
    introduced by Lan Tianyu to allow some EC-related breakage to be
    fixed on some systems.

    A spec-compliant implementation of acpi_os_get_timer() is added by
    Mika Westerberg.

    The evaluation of _STA is added to do_acpi_find_child() to avoid
    situations in which a pointer to a disabled device object is
    returned instead of an enabled one with the same _ADR value. From
    Jeff Wu.

    Intel BayTrail PCH (Platform Controller Hub) support is added to
    the ACPI driver for Intel Low-Power Subsystems (LPSS) and that
    driver is modified to work around a couple of known BIOS issues.
    Changes from Mika Westerberg and Heikki Krogerus.

    The EC driver is fixed by Vasiliy Kulikov to use get_user() and
    put_user() instead of dereferencing user space pointers blindly.

    Code cleanups are made by Bjorn Helgaas, Nicholas Mazzuca and Toshi
    Kani.

    - Assorted power management updates

    The "runtime idle" helper routine is changed to take the return
    values of the callbacks executed by it into account and to call
    rpm_suspend() if they return 0, which allows us to reduce the
    overall code bloat a bit (by dropping some code that's not
    necessary any more after that modification).

    The runtime PM documentation is updated by Alan Stern (to reflect
    the "runtime idle" behavior change).

    New trace points for PM QoS are added by Sahara
    ().

    PM QoS documentation is updated by Lan Tianyu.

    Code cleanups are made and minor issues are addressed by Bernie
    Thompson, Bjorn Helgaas, Julius Werner, and Shuah Khan.

    - devfreq updates

    New driver for the Exynos5-bus device from Abhilash Kesavan.

    Minor cleanups, fixes and MAINTAINERS update from MyungJoo Ham,
    Abhilash Kesavan, Paul Bolle, Rajagopal Venkat, and Wei Yongjun.

    - OMAP power management updates

    Adaptive Voltage Scaling (AVS) SmartReflex voltage control driver
    updates from Andrii Tseglytskyi and Nishanth Menon."

    * tag 'pm+acpi-3.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (162 commits)
    cpufreq: Fix cpufreq regression after suspend/resume
    ACPI / PM: Fix possible NULL pointer deref in acpi_pm_device_sleep_state()
    PM / Sleep: Warn about system time after resume with pm_trace
    cpufreq: don't leave stale policy pointer in cdbs->cur_policy
    acpi-cpufreq: Add new sysfs attribute freqdomain_cpus
    cpufreq: make sure frequency transitions are serialized
    ACPI: implement acpi_os_get_timer() according the spec
    ACPI / EC: Add HP Folio 13 to ec_dmi_table in order to skip DSDT scan
    ACPI: Add CMOS RTC Operation Region handler support
    ACPI / processor: Drop unused variable from processor_perflib.c
    cpufreq: tegra: call CPUFREQ_POSTCHANGE notfier in error cases
    cpufreq: s3c64xx: call CPUFREQ_POSTCHANGE notfier in error cases
    cpufreq: omap: call CPUFREQ_POSTCHANGE notfier in error cases
    cpufreq: imx6q: call CPUFREQ_POSTCHANGE notfier in error cases
    cpufreq: exynos: call CPUFREQ_POSTCHANGE notfier in error cases
    cpufreq: dbx500: call CPUFREQ_POSTCHANGE notfier in error cases
    cpufreq: davinci: call CPUFREQ_POSTCHANGE notfier in error cases
    cpufreq: arm-big-little: call CPUFREQ_POSTCHANGE notfier in error cases
    cpufreq: powernow-k8: call CPUFREQ_POSTCHANGE notfier in error cases
    cpufreq: pcc: call CPUFREQ_POSTCHANGE notfier in error cases
    ...

    Linus Torvalds
     
  • Pull regmap updates from Mark Brown:
    "A small but useful set of regmap updates this time around:

    - An abstraction for bitfields within a register map contributed by
    Srinivas Kandagatla, allowing drivers to cope more easily when
    hardware designers randomly move things about (mainly when talking
    to things like system controllers).

    - Changes from Lars-Peter Clausen to allow the MMIO regmap to be used
    from hard IRQ context.

    - Small improvements to the cache infrastructure and performance,
    including a default cache sync operation so now all regmaps can
    sync easily.

    There's also a pinctrl driver making use of the new bitfield API,
    merged here for dependency reasons. There will be a simple add/add
    conflict with the pinctrl tree as a result."

    * tag 'regmap-v3.11' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap:
    pinctrl: st: Remove unnecessary use of of_match_ptr macro
    pinctrl: st: fix return value check
    pinctrl: st: Add pinctrl and pinconf support.
    regmap: debugfs: Suppress cache for partial register files
    regmap: Add regmap_field APIs
    regmap: core: Cache all registers by default when cache is enabled
    regmap: Implemented default cache sync operation
    regmap: Make regmap-mmio usable from atomic contexts
    regmap: regcache: Fixup locking for custom lock callbacks
    regmap: debugfs: Fix return from regmap_debugfs_get_dump_start
    regmap: debugfs: Don't mark lockdep as broken due to debugfs write
    regmap: rbtree: Use range information to allocate nodes
    regmap: rbtree: Factor out node allocation
    regmap: Make regmap_check_range_table() a public API
    regmap: Add support for discarding parts of the register cache

    Linus Torvalds
     

03 Jul, 2013

2 commits

  • Pull x86 tracing updates from Ingo Molnar:
    "This tree adds IRQ vector tracepoints that are named after the handler
    and which output the vector #, based on a zero-overhead approach that
    relies on changing the IDT entries, by Seiji Aguchi.

    The new tracepoints look like this:

    # perf list | grep -i irq_vector
    irq_vectors:local_timer_entry [Tracepoint event]
    irq_vectors:local_timer_exit [Tracepoint event]
    irq_vectors:reschedule_entry [Tracepoint event]
    irq_vectors:reschedule_exit [Tracepoint event]
    irq_vectors:spurious_apic_entry [Tracepoint event]
    irq_vectors:spurious_apic_exit [Tracepoint event]
    irq_vectors:error_apic_entry [Tracepoint event]
    irq_vectors:error_apic_exit [Tracepoint event]
    [...]"

    * 'x86-tracing-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    x86/tracing: Add config option checking to the definitions of mce handlers
    trace,x86: Do not call local_irq_save() in load_current_idt()
    trace,x86: Move creation of irq tracepoints from apic.c to irq.c
    x86, trace: Add irq vector tracepoints
    x86: Rename variables for debugging
    x86, trace: Introduce entering/exiting_irq()
    tracing: Add DEFINE_EVENT_FN() macro

    Linus Torvalds
     
  • Pull perf updates from Ingo Molnar:
    "Kernel improvements:

    - watchdog driver improvements by Li Zefan
    - Power7 CPI stack events related improvements by Sukadev Bhattiprolu
    - event multiplexing via hrtimers and other improvements by Stephane
    Eranian
    - kernel stack use optimization by Andrew Hunter
    - AMD IOMMU uncore PMU support by Suravee Suthikulpanit
    - NMI handling rate-limits by Dave Hansen
    - various hw_breakpoint fixes by Oleg Nesterov
    - hw_breakpoint overflow period sampling and related signal handling
    fixes by Jiri Olsa
    - Intel Haswell PMU support by Andi Kleen

    Tooling improvements:

    - Reset SIGTERM handler in workload child process, fix from David
    Ahern.
    - Makefile reorganization, prep work for Kconfig patches, from Jiri
    Olsa.
    - Add automated make test suite, from Jiri Olsa.
    - Add --percent-limit option to 'top' and 'report', from Namhyung
    Kim.
    - Sorting improvements, from Namhyung Kim.
    - Expand definition of sysfs format attribute, from Michael Ellerman.

    Tooling fixes:

    - 'perf tests' fixes from Jiri Olsa.
    - Make Power7 CPI stack events available in sysfs, from Sukadev
    Bhattiprolu.
    - Handle death by SIGTERM in 'perf record', fix from David Ahern.
    - Fix printing of perf_event_paranoid message, from David Ahern.
    - Handle realloc failures in 'perf kvm', from David Ahern.
    - Fix divide by 0 in variance, from David Ahern.
    - Save parent pid in thread struct, from David Ahern.
    - Handle JITed code in shared memory, from Andi Kleen.
    - Fixes for 'perf diff', from Jiri Olsa.
    - Remove some unused struct members, from Jiri Olsa.
    - Add missing liblk.a dependency for python/perf.so, fix from Jiri
    Olsa.
    - Respect CROSS_COMPILE in liblk.a, from Rabin Vincent.
    - No need to do locking when adding hists in perf report, only 'top'
    needs that, from Namhyung Kim.
    - Fix alignment of symbol column in in the hists browser (top,
    report) when -v is given, from NAmhyung Kim.
    - Fix 'perf top' -E option behavior, from Namhyung Kim.
    - Fix bug in isupper() and islower(), from Sukadev Bhattiprolu.
    - Fix compile errors in bp_signal 'perf test', from Sukadev
    Bhattiprolu.

    ... and more things"

    * 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (102 commits)
    perf/x86: Disable PEBS-LL in intel_pmu_pebs_disable()
    perf/x86: Fix shared register mutual exclusion enforcement
    perf/x86/intel: Support full width counting
    x86: Add NMI duration tracepoints
    perf: Drop sample rate when sampling is too slow
    x86: Warn when NMI handlers take large amounts of time
    hw_breakpoint: Introduce "struct bp_cpuinfo"
    hw_breakpoint: Simplify *register_wide_hw_breakpoint()
    hw_breakpoint: Introduce cpumask_of_bp()
    hw_breakpoint: Simplify the "weight" usage in toggle_bp_slot() paths
    hw_breakpoint: Simplify list/idx mess in toggle_bp_slot() paths
    perf/x86/intel: Add mem-loads/stores support for Haswell
    perf/x86/intel: Support Haswell/v4 LBR format
    perf/x86/intel: Move NMI clearing to end of PMI handler
    perf/x86/intel: Add Haswell PEBS support
    perf/x86/intel: Add simple Haswell PMU support
    perf/x86/intel: Add Haswell PEBS record support
    perf/x86/intel: Fix sparse warning
    perf/x86/amd: AMD IOMMU Performance Counter PERF uncore PMU implementation
    perf/x86/amd: Add IOMMU Performance Counter resource management
    ...

    Linus Torvalds
     

02 Jul, 2013

1 commit


01 Jul, 2013

1 commit


30 Jun, 2013

1 commit


27 Jun, 2013

2 commits

  • Old gcc doesnt like the struct hack, and it is kind of ugly. So finish
    off the work to convert pr_debug() statements to tracepoints, and delete
    pkey()/pbtree().

    Signed-off-by: Kent Overstreet

    Kent Overstreet
     
  • The tracepoints were reworked to be more sensible, and fixed a null
    pointer deref in one of the tracepoints.

    Converted some of the pr_debug()s to tracepoints - this is partly a
    performance optimization; it used to be that with DEBUG or
    CONFIG_DYNAMIC_DEBUG pr_debug() was an empty macro; but at some point it
    was changed to an empty inline function.

    Some of the pr_debug() statements had rather expensive function calls as
    part of the arguments, so this code was getting run unnecessarily even
    on non debug kernels - in some fast paths, too.

    Signed-off-by: Kent Overstreet

    Kent Overstreet
     

24 Jun, 2013

3 commits


23 Jun, 2013

1 commit

  • This patch has been invaluable in my adventures finding
    issues in the perf NMI handler. I'm as big a fan of
    printk() as anybody is, but using printk() in NMIs is
    deadly when they're happening frequently.

    Even hacking in trace_printk() ended up eating enough
    CPU to throw off some of the measurements I was making.

    Signed-off-by: Dave Hansen
    Acked-by: Peter Zijlstra
    Cc: paulus@samba.org
    Cc: acme@ghostprotocols.net
    Cc: Dave Hansen
    Signed-off-by: Ingo Molnar

    Dave Hansen
     

21 Jun, 2013

1 commit

  • Each TRACE_EVENT() adds several helper functions. If two or more trace events
    share the same structure and print format, they can also share most of these
    helper functions and save a lot of space from duplicate code. This is why the
    DECLARE_EVENT_CLASS() and DEFINE_EVENT() were created.

    Some events require a trigger to be called at registering and unregistering of
    the event and to do so they use TRACE_EVENT_FN().

    If multiple events require a trigger, they currently have no choice but to use
    TRACE_EVENT_FN() as there's no DEFINE_EVENT_FN() available. This unfortunately
    causes a lot of wasted duplicate code created.

    By adding a DEFINE_EVENT_FN(), these events can still use a
    DECLARE_EVENT_CLASS() and then define their own triggers.

    Signed-off-by: Steven Rostedt
    Link: http://lkml.kernel.org/r/51C3236C.8030508@hds.com
    Signed-off-by: Seiji Aguchi
    Signed-off-by: H. Peter Anvin

    Steven Rostedt
     

14 Jun, 2013

1 commit


07 Jun, 2013

1 commit

  • Rename ext4_da_writepages() to ext4_writepages() and use it for all
    modes. We still need to iterate over all the pages in the case of
    data=journalling, but in the case of nodelalloc/data=ordered (which is
    what file systems mounted using ext3 backwards compatibility will use)
    this will allow us to use a much more efficient I/O submission path.

    Signed-off-by: "Theodore Ts'o"

    Theodore Ts'o
     

05 Jun, 2013

2 commits

  • There are two issues with current writeback path in ext4. For one we
    don't necessarily map complete pages when blocksize < pagesize and
    thus needn't do any writeback in one iteration. We always map some
    blocks though so we will eventually finish mapping the page. Just if
    writeback races with other operations on the file, forward progress is
    not really guaranteed. The second problem is that current code
    structure makes it hard to associate all the bios to some range of
    pages with one io_end structure so that unwritten extents can be
    converted after all the bios are finished. This will be especially
    difficult later when io_end will be associated with reserved
    transaction handle.

    We restructure the writeback path to a relatively simple loop which
    first prepares extent of pages, then maps one or more extents so that
    no page is partially mapped, and once page is fully mapped it is
    submitted for IO. We keep all the mapping and IO submission
    information in mpage_da_data structure to somewhat reduce stack usage.
    Resulting code is somewhat shorter than the old one and hopefully also
    easier to read.

    Reviewed-by: Zheng Liu
    Signed-off-by: Jan Kara
    Signed-off-by: "Theodore Ts'o"

    Jan Kara
     
  • Reviewed-by: Zheng Liu
    Signed-off-by: Jan Kara
    Signed-off-by: "Theodore Ts'o"

    Jan Kara
     

29 May, 2013

1 commit


28 May, 2013

2 commits

  • Currently punch hole is disabled in file systems with bigalloc
    feature enabled. However the recent changes in punch hole patch should
    make it easier to support punching holes on bigalloc enabled file
    systems.

    This commit changes partial_cluster handling in ext4_remove_blocks(),
    ext4_ext_rm_leaf() and ext4_ext_remove_space(). Currently
    partial_cluster is unsigned long long type and it makes sure that we
    will free the partial cluster if all extents has been released from that
    cluster. However it has been specifically designed only for truncate.

    With punch hole we can be freeing just some extents in the cluster
    leaving the rest untouched. So we have to make sure that we will notice
    cluster which still has some extents. To do this I've changed
    partial_cluster to be signed long long type. The only scenario where
    this could be a problem is when cluster_size == block size, however in
    that case there would not be any partial clusters so we're safe. For
    bigger clusters the signed type is enough. Now we use the negative value
    in partial_cluster to mark such cluster used, hence we know that we must
    not free it even if all other extents has been freed from such cluster.

    This scenario can be described in simple diagram:

    |FFF...FF..FF.UUU|
    ^----------^
    punch hole

    . - free space
    | - cluster boundary
    F - freed extent
    U - used extent

    Also update respective tracepoints to use signed long long type for
    partial_cluster.

    Signed-off-by: Lukas Czerner
    Reviewed-by: Jan Kara
    Signed-off-by: Theodore Ts'o

    Lukas Czerner
     
  • Add "end" variable.

    Signed-off-by: Lukas Czerner
    Reviewed-by: Jan Kara
    Signed-off-by: Theodore Ts'o

    Lukas Czerner
     

22 May, 2013

2 commits


15 May, 2013

1 commit

  • Pull ext4 update from Ted Ts'o:
    "Fixed regressions (two stability regressions and a performance
    regression) introduced during the 3.10-rc1 merge window.

    Also included is a bug fix relating to allocating blocks after
    resizing an ext3 file system when using the ext4 file system driver"

    * tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
    jbd,jbd2: fix oops in jbd2_journal_put_journal_head()
    ext4: revert "ext4: use io_end for multiple bios"
    ext4: limit group search loop for non-extent files
    ext4: fix fio regression

    Linus Torvalds
     

12 May, 2013

1 commit


09 May, 2013

3 commits

  • Pull f2fs updates from Jaegeuk Kim:
    "This patch-set includes the following major enhancement patches.
    - introduce a new gloabl lock scheme
    - add tracepoints on several major functions
    - fix the overall cleaning process focused on victim selection
    - apply the block plugging to merge IOs as much as possible
    - enhance management of free nids and its list
    - enhance the readahead mode for node pages
    - address several cretical deadlock conditions
    - reduce lock_page calls

    The other minor bug fixes and enhancements are as follows.
    - calculation mistakes: overflow
    - bio types: READ, READA, and READ_SYNC
    - fix the recovery flow, data races, and null pointer errors"

    * tag 'f2fs-for-v3.10' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (68 commits)
    f2fs: cover free_nid management with spin_lock
    f2fs: optimize scan_nat_page()
    f2fs: code cleanup for scan_nat_page() and build_free_nids()
    f2fs: bugfix for alloc_nid_failed()
    f2fs: recover when journal contains deleted files
    f2fs: continue to mount after failing recovery
    f2fs: avoid deadlock during evict after f2fs_gc
    f2fs: modify the number of issued pages to merge IOs
    f2fs: remove useless #include as we're now using sysfs as debug entry.
    f2fs: fix inconsistent using of NM_WOUT_THRESHOLD
    f2fs: check truncation of mapping after lock_page
    f2fs: enhance alloc_nid and build_free_nids flows
    f2fs: add a tracepoint on f2fs_new_inode
    f2fs: check nid == 0 in add_free_nid
    f2fs: add REQ_META about metadata requests for submit
    f2fs: give a chance to merge IOs by IO scheduler
    f2fs: avoid frequent background GC
    f2fs: add tracepoints to debug checkpoint request
    f2fs: add tracepoints for write page operations
    f2fs: add tracepoints to debug the block allocation
    ...

    Linus Torvalds
     
  • Pull block driver updates from Jens Axboe:
    "It might look big in volume, but when categorized, not a lot of
    drivers are touched. The pull request contains:

    - mtip32xx fixes from Micron.

    - A slew of drbd updates, this time in a nicer series.

    - bcache, a flash/ssd caching framework from Kent.

    - Fixes for cciss"

    * 'for-3.10/drivers' of git://git.kernel.dk/linux-block: (66 commits)
    bcache: Use bd_link_disk_holder()
    bcache: Allocator cleanup/fixes
    cciss: bug fix to prevent cciss from loading in kdump crash kernel
    cciss: add cciss_allow_hpsa module parameter
    drivers/block/mg_disk.c: add CONFIG_PM_SLEEP to suspend/resume functions
    mtip32xx: Workaround for unaligned writes
    bcache: Make sure blocksize isn't smaller than device blocksize
    bcache: Fix merge_bvec_fn usage for when it modifies the bvm
    bcache: Correctly check against BIO_MAX_PAGES
    bcache: Hack around stuff that clones up to bi_max_vecs
    bcache: Set ra_pages based on backing device's ra_pages
    bcache: Take data offset from the bdev superblock.
    mtip32xx: mtip32xx: Disable TRIM support
    mtip32xx: fix a smatch warning
    bcache: Disable broken btree fuzz tester
    bcache: Fix a format string overflow
    bcache: Fix a minor memory leak on device teardown
    bcache: Documentation updates
    bcache: Use WARN_ONCE() instead of __WARN()
    bcache: Add missing #include
    ...

    Linus Torvalds
     
  • Pull block core updates from Jens Axboe:

    - Major bit is Kents prep work for immutable bio vecs.

    - Stable candidate fix for a scheduling-while-atomic in the queue
    bypass operation.

    - Fix for the hang on exceeded rq->datalen 32-bit unsigned when merging
    discard bios.

    - Tejuns changes to convert the writeback thread pool to the generic
    workqueue mechanism.

    - Runtime PM framework, SCSI patches exists on top of these in James'
    tree.

    - A few random fixes.

    * 'for-3.10/core' of git://git.kernel.dk/linux-block: (40 commits)
    relay: move remove_buf_file inside relay_close_buf
    partitions/efi.c: replace useless kzalloc's by kmalloc's
    fs/block_dev.c: fix iov_shorten() criteria in blkdev_aio_read()
    block: fix max discard sectors limit
    blkcg: fix "scheduling while atomic" in blk_queue_bypass_start
    Documentation: cfq-iosched: update documentation help for cfq tunables
    writeback: expose the bdi_wq workqueue
    writeback: replace custom worker pool implementation with unbound workqueue
    writeback: remove unused bdi_pending_list
    aoe: Fix unitialized var usage
    bio-integrity: Add explicit field for owner of bip_buf
    block: Add an explicit bio flag for bios that own their bvec
    block: Add bio_alloc_pages()
    block: Convert some code to bio_for_each_segment_all()
    block: Add bio_for_each_segment_all()
    bounce: Refactor __blk_queue_bounce to not use bi_io_vec
    raid1: use bio_copy_data()
    pktcdvd: Use bio_reset() in disabled code to kill bi_idx usage
    pktcdvd: use bio_copy_data()
    block: Add bio_copy_data()
    ...

    Linus Torvalds
     

06 May, 2013

2 commits

  • Pull kvm updates from Gleb Natapov:
    "Highlights of the updates are:

    general:
    - new emulated device API
    - legacy device assignment is now optional
    - irqfd interface is more generic and can be shared between arches

    x86:
    - VMCS shadow support and other nested VMX improvements
    - APIC virtualization and Posted Interrupt hardware support
    - Optimize mmio spte zapping

    ppc:
    - BookE: in-kernel MPIC emulation with irqfd support
    - Book3S: in-kernel XICS emulation (incomplete)
    - Book3S: HV: migration fixes
    - BookE: more debug support preparation
    - BookE: e6500 support

    ARM:
    - reworking of Hyp idmaps

    s390:
    - ioeventfd for virtio-ccw

    And many other bug fixes, cleanups and improvements"

    * tag 'kvm-3.10-1' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (204 commits)
    kvm: Add compat_ioctl for device control API
    KVM: x86: Account for failing enable_irq_window for NMI window request
    KVM: PPC: Book3S: Add API for in-kernel XICS emulation
    kvm/ppc/mpic: fix missing unlock in set_base_addr()
    kvm/ppc: Hold srcu lock when calling kvm_io_bus_read/write
    kvm/ppc/mpic: remove users
    kvm/ppc/mpic: fix mmio region lists when multiple guests used
    kvm/ppc/mpic: remove default routes from documentation
    kvm: KVM_CAP_IOMMU only available with device assignment
    ARM: KVM: iterate over all CPUs for CPU compatibility check
    KVM: ARM: Fix spelling in error message
    ARM: KVM: define KVM_ARM_MAX_VCPUS unconditionally
    KVM: ARM: Fix API documentation for ONE_REG encoding
    ARM: KVM: promote vfp_host pointer to generic host cpu context
    ARM: KVM: add architecture specific hook for capabilities
    ARM: KVM: perform HYP initilization for hotplugged CPUs
    ARM: KVM: switch to a dual-step HYP init code
    ARM: KVM: rework HYP page table freeing
    ARM: KVM: enforce maximum size for identity mapped code
    ARM: KVM: move to a KVM provided HYP idmap
    ...

    Linus Torvalds
     
  • Pull 'full dynticks' support from Ingo Molnar:
    "This tree from Frederic Weisbecker adds a new, (exciting! :-) core
    kernel feature to the timer and scheduler subsystems: 'full dynticks',
    or CONFIG_NO_HZ_FULL=y.

    This feature extends the nohz variable-size timer tick feature from
    idle to busy CPUs (running at most one task) as well, potentially
    reducing the number of timer interrupts significantly.

    This feature got motivated by real-time folks and the -rt tree, but
    the general utility and motivation of full-dynticks runs wider than
    that:

    - HPC workloads get faster: CPUs running a single task should be able
    to utilize a maximum amount of CPU power. A periodic timer tick at
    HZ=1000 can cause a constant overhead of up to 1.0%. This feature
    removes that overhead - and speeds up the system by 0.5%-1.0% on
    typical distro configs even on modern systems.

    - Real-time workload latency reduction: CPUs running critical tasks
    should experience as little jitter as possible. The last remaining
    source of kernel-related jitter was the periodic timer tick.

    - A single task executing on a CPU is a pretty common situation,
    especially with an increasing number of cores/CPUs, so this feature
    helps desktop and mobile workloads as well.

    The cost of the feature is mainly related to increased timer
    reprogramming overhead when a CPU switches its tick period, and thus
    slightly longer to-idle and from-idle latency.

    Configuration-wise a third mode of operation is added to the existing
    two NOHZ kconfig modes:

    - CONFIG_HZ_PERIODIC: [formerly !CONFIG_NO_HZ], now explicitly named
    as a config option. This is the traditional Linux periodic tick
    design: there's a HZ tick going on all the time, regardless of
    whether a CPU is idle or not.

    - CONFIG_NO_HZ_IDLE: [formerly CONFIG_NO_HZ=y], this turns off the
    periodic tick when a CPU enters idle mode.

    - CONFIG_NO_HZ_FULL: this new mode, in addition to turning off the
    tick when a CPU is idle, also slows the tick down to 1 Hz (one
    timer interrupt per second) when only a single task is running on a
    CPU.

    The .config behavior is compatible: existing !CONFIG_NO_HZ and
    CONFIG_NO_HZ=y settings get translated to the new values, without the
    user having to configure anything. CONFIG_NO_HZ_FULL is turned off by
    default.

    This feature is based on a lot of infrastructure work that has been
    steadily going upstream in the last 2-3 cycles: related RCU support
    and non-periodic cputime support in particular is upstream already.

    This tree adds the final pieces and activates the feature. The pull
    request is marked RFC because:

    - it's marked 64-bit only at the moment - the 32-bit support patch is
    small but did not get ready in time.

    - it has a number of fresh commits that came in after the merge
    window. The overwhelming majority of commits are from before the
    merge window, but still some aspects of the tree are fresh and so I
    marked it RFC.

    - it's a pretty wide-reaching feature with lots of effects - and
    while the components have been in testing for some time, the full
    combination is still not very widely used. That it's default-off
    should reduce its regression abilities and obviously there are no
    known regressions with CONFIG_NO_HZ_FULL=y enabled either.

    - the feature is not completely idempotent: there is no 100%
    equivalent replacement for a periodic scheduler/timer tick. In
    particular there's ongoing work to map out and reduce its effects
    on scheduler load-balancing and statistics. This should not impact
    correctness though, there are no known regressions related to this
    feature at this point.

    - it's a pretty ambitious feature that with time will likely be
    enabled by most Linux distros, and we'd like you to make input on
    its design/implementation, if you dislike some aspect we missed.
    Without flaming us to crisp! :-)

    Future plans:

    - there's ongoing work to reduce 1Hz to 0Hz, to essentially shut off
    the periodic tick altogether when there's a single busy task on a
    CPU. We'd first like 1 Hz to be exposed more widely before we go
    for the 0 Hz target though.

    - once we reach 0 Hz we can remove the periodic tick assumption from
    nr_running>=2 as well, by essentially interrupting busy tasks only
    as frequently as the sched_latency constraints require us to do -
    once every 4-40 msecs, depending on nr_running.

    I am personally leaning towards biting the bullet and doing this in
    v3.10, like the -rt tree this effort has been going on for too long -
    but the final word is up to you as usual.

    More technical details can be found in Documentation/timers/NO_HZ.txt"

    * 'timers-nohz-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (39 commits)
    sched: Keep at least 1 tick per second for active dynticks tasks
    rcu: Fix full dynticks' dependency on wide RCU nocb mode
    nohz: Protect smp_processor_id() in tick_nohz_task_switch()
    nohz_full: Add documentation.
    cputime_nsecs: use math64.h for nsec resolution conversion helpers
    nohz: Select VIRT_CPU_ACCOUNTING_GEN from full dynticks config
    nohz: Reduce overhead under high-freq idling patterns
    nohz: Remove full dynticks' superfluous dependency on RCU tree
    nohz: Fix unavailable tick_stop tracepoint in dynticks idle
    nohz: Add basic tracing
    nohz: Select wide RCU nocb for full dynticks
    nohz: Disable the tick when irq resume in full dynticks CPU
    nohz: Re-evaluate the tick for the new task after a context switch
    nohz: Prepare to stop the tick on irq exit
    nohz: Implement full dynticks kick
    nohz: Re-evaluate the tick from the scheduler IPI
    sched: New helper to prevent from stopping the tick in full dynticks
    sched: Kick full dynticks CPU that have more than one task enqueued.
    perf: New helper to prevent full dynticks CPUs from stopping tick
    perf: Kick full dynticks CPU if events rotation is needed
    ...

    Linus Torvalds