23 Sep, 2013

2 commits

  • Pull block IO fixes from Jens Axboe:
    "After merge window, no new stuff this time, only a collection of neatly
    confined and simple fixes"

    * 'for-3.12/core' of git://git.kernel.dk/linux-block:
    cfq: explicitly use 64bit divide operation for 64bit arguments
    block: Add nr_bios to block_rq_remap tracepoint
    If the queue is dying then we only call the rq->end_io callout. This leaves bios setup on the request, because the caller assumes when the blk_execute_rq_nowait/blk_execute_rq call has completed that the rq->bios have been cleaned up.
    bio-integrity: Fix use of bs->bio_integrity_pool after free
    blkcg: relocate root_blkg setting and clearing
    block: Convert kmalloc_node(...GFP_ZERO...) to kzalloc_node(...)
    block: trace all devices plug operation

    Linus Torvalds
     
  • Pull btrfs fixes from Chris Mason:
    "These are mostly bug fixes and two small performance fixes. The
    most important of the bunch are Josef's fix for a snapshotting
    regression and Mark's update to fix compile problems on arm"

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: (25 commits)
    Btrfs: create the uuid tree on remount rw
    btrfs: change extent-same to copy entire argument struct
    Btrfs: dir_inode_operations should use btrfs_update_time also
    btrfs: Add btrfs: prefix to kernel log output
    btrfs: refuse to remount read-write after abort
    Btrfs: btrfs_ioctl_default_subvol: Revert back to toplevel subvolume when arg is 0
    Btrfs: don't leak transaction in btrfs_sync_file()
    Btrfs: add the missing mutex unlock in write_all_supers()
    Btrfs: iput inode on allocation failure
    Btrfs: remove space_info->reservation_progress
    Btrfs: kill delay_iput arg to the wait_ordered functions
    Btrfs: fix worst case calculator for space usage
    Revert "Btrfs: rework the overcommit logic to be based on the total size"
    Btrfs: improve replacing nocow extents
    Btrfs: drop dir i_size when adding new names on replay
    Btrfs: replay dir_index items before other items
    Btrfs: check roots last log commit when checking if an inode has been logged
    Btrfs: actually log directory we are fsync()'ing
    Btrfs: actually limit the size of delalloc range
    Btrfs: allocate the free space by the existed max extent size when ENOSPC
    ...

    Linus Torvalds
     

22 Sep, 2013

1 commit

  • Adding the number of bios in a remapped request to 'block_rq_remap'
    tracepoint.

    Request remapper clones bios in a request to track the completion
    status of each bio. So the number of bios can be useful information
    for investigation.

    Related discussions:
    http://www.redhat.com/archives/dm-devel/2013-August/msg00084.html
    http://www.redhat.com/archives/dm-devel/2013-September/msg00024.html

    Signed-off-by: Jun'ichi Nomura
    Acked-by: Mike Snitzer
    Cc: Jens Axboe
    Signed-off-by: Jens Axboe

    Jun'ichi Nomura
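
    The value this commit adds to the tracepoint is a count of the bios attached to a request. A minimal sketch of such a count follows. The `struct bio` and `struct request` here are simplified, hypothetical stand-ins rather than the kernel definitions, and `count_request_bios()` is an illustrative name, not the helper the patch uses.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the kernel structures. */
struct bio {
    struct bio *bi_next;    /* next bio in the request's chain */
};

struct request {
    struct bio *bio;        /* head of the chain, NULL when empty */
};

/* Walk the chain and count its bios; a tracepoint field could record this. */
static unsigned int count_request_bios(const struct request *rq)
{
    unsigned int nr_bios = 0;

    assert(rq != NULL);
    for (const struct bio *bio = rq->bio; bio != NULL; bio = bio->bi_next)
        nr_bios++;
    return nr_bios;
}
```

    Since the remapper clones each bio to track its completion, a count like this tells whoever reads the trace how many clones are in flight for the remapped request.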
     

21 Sep, 2013

1 commit


13 Sep, 2013

2 commits

  • Pull vfs pile 4 from Al Viro:
    "list_lru pile, mostly"

    This came out of Andrew's pile, Al ended up doing the merge work so that
    Andrew didn't have to.

    Additionally, a few fixes.

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (42 commits)
    super: fix for destroy lrus
    list_lru: dynamically adjust node arrays
    shrinker: Kill old ->shrink API.
    shrinker: convert remaining shrinkers to count/scan API
    staging/lustre/libcfs: cleanup linux-mem.h
    staging/lustre/ptlrpc: convert to new shrinker API
    staging/lustre/obdclass: convert lu_object shrinker to count/scan API
    staging/lustre/ldlm: convert to shrinkers to count/scan API
    hugepage: convert huge zero page shrinker to new shrinker API
    i915: bail out earlier when shrinker cannot acquire mutex
    drivers: convert shrinkers to new count/scan API
    fs: convert fs shrinkers to new scan/count API
    xfs: fix dquot isolation hang
    xfs-convert-dquot-cache-lru-to-list_lru-fix
    xfs: convert dquot cache lru to list_lru
    xfs: rework buffer dispose list tracking
    xfs-convert-buftarg-lru-to-generic-code-fix
    xfs: convert buftarg LRU to generic code
    fs: convert inode and dentry shrinking to be node aware
    vmscan: per-node deferred work
    ...

    Linus Torvalds
     
  • Pull btrfs updates from Chris Mason:
    "This is against 3.11-rc7, but was pulled and tested against your tree
    as of yesterday. We do have two small incrementals queued up, but I
    wanted to get this bunch out the door before I hop on an airplane.

    This is a fairly large batch of fixes, performance improvements, and
    cleanups from the usual Btrfs suspects.

    We've included Stefan Behrens' work to index subvolume UUIDs, which is
    targeted at speeding up send/receive with many subvolumes or snapshots
    in place. It closes a long standing performance issue that was built
    in to the disk format.

    Mark Fasheh's offline dedup work is also here. In this case offline
    means the FS is mounted and active, but the dedup work is not done
    inline during file IO. This is a building block where utilities are
    able to ask the FS to dedup a series of extents. The kernel takes
    care of verifying the data involved really is the same. Today this
    involves reading both extents, but we'll continue to evolve the
    patches"

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: (118 commits)
    Btrfs: optimize key searches in btrfs_search_slot
    Btrfs: don't use an async starter for most of our workers
    Btrfs: only update disk_i_size as we remove extents
    Btrfs: fix deadlock in uuid scan kthread
    Btrfs: stop refusing the relocation of chunk 0
    Btrfs: fix memory leak of uuid_root in free_fs_info
    btrfs: reuse kbasename helper
    btrfs: return btrfs error code for dev excl ops err
    Btrfs: allow partial ordered extent completion
    Btrfs: convert all bug_ons in free-space-cache.c
    Btrfs: add support for asserts
    Btrfs: adjust the fs_devices->missing count on unmount
    Btrf: cleanup: don't check for root_refs == 0 twice
    Btrfs: fix for patch "cleanup: don't check the same thing twice"
    Btrfs: get rid of one BUG() in write_all_supers()
    Btrfs: allocate prelim_ref with a slab allocater
    Btrfs: pass gfp_t to __add_prelim_ref() to avoid always using GFP_ATOMIC
    Btrfs: fix race conditions in BTRFS_IOC_FS_INFO ioctl
    Btrfs: fix race between removing a dev and writing sbs
    Btrfs: remove ourselves from the cluster list under lock
    ...

    Linus Torvalds
     

12 Sep, 2013

1 commit

  • In the current code, the value of fallback_migratetype that is printed
    using the mm_page_alloc_extfrag tracepoint, is the value of the
    migratetype *after* it has been set to the preferred migratetype (if the
    ownership was changed). Obviously that wouldn't have been the original
    intent. (We already have a separate 'change_ownership' field to tell
    whether the ownership of the pageblock was changed from the
    fallback_migratetype to the preferred type.)

    The intent of the fallback_migratetype field is to show the migratetype
    from which we borrowed pages in order to satisfy the allocation request.
    So fix the code to print that value correctly.

    Signed-off-by: Srivatsa S. Bhat
    Cc: Mel Gorman
    Cc: Minchan Kim
    Cc: Cody P Schafer
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Srivatsa S. Bhat
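
    The bug described above is an ordering problem: the traced value was read after it had been overwritten. A minimal sketch of the corrected ordering follows. The enum, the `struct extfrag_event` record and `record_fallback()` are hypothetical names for illustration and do not mirror the page allocator code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical subset of migratetypes, for illustration only. */
enum migratetype { MT_UNMOVABLE, MT_RECLAIMABLE, MT_MOVABLE };

/* Hypothetical record of what the tracepoint would print. */
struct extfrag_event {
    enum migratetype alloc_migratetype;     /* type the caller asked for */
    enum migratetype fallback_migratetype;  /* type the pages were borrowed from */
    bool change_ownership;                  /* did the pageblock change type? */
};

static struct extfrag_event record_fallback(enum migratetype preferred,
                                            enum migratetype fallback,
                                            bool take_ownership)
{
    struct extfrag_event ev;
    enum migratetype new_type = fallback;   /* type the pageblock ends up with */

    assert(preferred != fallback);          /* a fallback is a different type */
    if (take_ownership)
        new_type = preferred;

    ev.alloc_migratetype = preferred;
    ev.fallback_migratetype = fallback;     /* the original value, not new_type */
    ev.change_ownership = (new_type == preferred);
    return ev;
}
```

    Keeping the post-update type in its own variable means the fallback type stays available for reporting, while `change_ownership` still records whether the pageblock was converted.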
     

11 Sep, 2013

1 commit

  • There are no more users of this API, so kill it dead, dead, dead and
    quietly bury the corpse in a shallow, unmarked grave in a dark forest deep
    in the hills...

    [glommer@openvz.org: added flowers to the grave]
    Signed-off-by: Dave Chinner
    Signed-off-by: Glauber Costa
    Reviewed-by: Greg Thelen
    Acked-by: Mel Gorman
    Cc: "Theodore Ts'o"
    Cc: Adrian Hunter
    Cc: Al Viro
    Cc: Artem Bityutskiy
    Cc: Arve Hjønnevåg
    Cc: Carlos Maiolino
    Cc: Christoph Hellwig
    Cc: Chuck Lever
    Cc: Daniel Vetter
    Cc: David Rientjes
    Cc: Gleb Natapov
    Cc: Greg Thelen
    Cc: J. Bruce Fields
    Cc: Jan Kara
    Cc: Jerome Glisse
    Cc: John Stultz
    Cc: KAMEZAWA Hiroyuki
    Cc: Kent Overstreet
    Cc: Kirill A. Shutemov
    Cc: Marcelo Tosatti
    Cc: Mel Gorman
    Cc: Steven Whitehouse
    Cc: Thomas Hellstrom
    Cc: Trond Myklebust
    Signed-off-by: Andrew Morton

    Signed-off-by: Al Viro

    Dave Chinner
     

10 Sep, 2013

1 commit

  • Pull NFS client updates from Trond Myklebust:
    "Highlights include:

    - Fix NFSv4 recovery so that it doesn't recover lost locks in cases
    such as lease loss due to a network partition, where doing so may
    result in data corruption. Add a kernel parameter to control
    choice of legacy behaviour or not.
    - Performance improvements when 2 processes are writing to the same
    file.
    - Flush data to disk when an RPCSEC_GSS session timeout is imminent.
    - Implement NFSv4.1 SP4_MACH_CRED state protection to prevent other
    NFS clients from being able to manipulate our lease and file
    locking state.
    - Allow sharing of RPCSEC_GSS caches between different rpc clients.
    - Fix the broken NFSv4 security auto-negotiation between client and
    server.
    - Fix rmdir() to wait for outstanding sillyrename unlinks to complete
    - Add a tracepoint framework for debugging NFSv4 state recovery
    issues.
    - Add tracing to the generic NFS layer.
    - Add tracing for the SUNRPC socket connection state.
    - Clean up the rpc_pipefs mount/umount event management.
    - Merge more patches from Chuck in preparation for NFSv4 migration
    support"

    * tag 'nfs-for-3.12-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (107 commits)
    NFSv4: use mach cred for SECINFO_NO_NAME w/ integrity
    NFS: nfs_compare_super shouldn't check the auth flavour unless 'sec=' was set
    NFSv4: Allow security autonegotiation for submounts
    NFSv4: Disallow security negotiation for lookups when 'sec=' is specified
    NFSv4: Fix security auto-negotiation
    NFS: Clean up nfs_parse_security_flavors()
    NFS: Clean up the auth flavour array mess
    NFSv4.1 Use MDS auth flavor for data server connection
    NFS: Don't check lock owner compatability unless file is locked (part 2)
    NFS: Don't check lock owner compatibility in writes unless file is locked
    nfs4: Map NFS4ERR_WRONG_CRED to EPERM
    nfs4.1: Add SP4_MACH_CRED write and commit support
    nfs4.1: Add SP4_MACH_CRED stateid support
    nfs4.1: Add SP4_MACH_CRED secinfo support
    nfs4.1: Add SP4_MACH_CRED cleanup support
    nfs4.1: Add state protection handler
    nfs4.1: Minimal SP4_MACH_CRED implementation
    SUNRPC: Replace pointer values with task->tk_pid and rpc_clnt->cl_clid
    SUNRPC: Add an identifier for struct rpc_clnt
    SUNRPC: Ensure rpc_task->tk_pid is available for tracepoints
    ...

    Linus Torvalds
     

05 Sep, 2013

4 commits

  • Instead of the pointer values, use the task and client identifier values
    for tracing purposes.

    Signed-off-by: Trond Myklebust

    Trond Myklebust
     
  • Pull ext4 updates from Ted Ts'o:
    "New features for 3.12:

    - Added aggressive extent caching using the extent status tree. This
    can actually decrease memory usage in read-mostly workloads since
    the information is much more compactly stored in the extent status
    tree than if we had to keep the extent tree metadata blocks in the
    buffer cache. This also improves asynchronous I/O, since it makes it
    much less likely that we need to do metadata I/O to look up
    the extent tree information.

    - Improve the recovery after corrupted allocation bitmaps are found
    when running in errors=ignore mode.

    Also fixed some writeback vs truncate races when using a blocksize
    less than the page size"

    * tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (25 commits)
    ext4: allow specifying external journal by pathname mount option
    ext4: mark group corrupt on group descriptor checksum
    ext4: mark block group as corrupt on inode bitmap error
    ext4: mark block group as corrupt on block bitmap error
    ext4: fix type declaration of ext4_validate_block_bitmap
    ext4: error out if verifying the block bitmap fails
    jbd2: Fix endian mixing problems in the checksumming code
    ext4: isolate ext4_extents.h file
    ext4: Fix misspellings using 'codespell' tool
    ext4: convert write_begin methods to stable_page_writes semantics
    ext4: fix use of potentially uninitialized variables in debugging code
    ext4: fix lost truncate due to race with writeback
    ext4: simplify truncation code in ext4_setattr()
    ext4: fix ext4_writepages() in presence of truncate
    ext4: move test whether extent to map can be extended to one place
    ext4: fix warning in ext4_da_update_reserve_space()
    quota: provide interface for readding allocated space into reserved space
    ext4: avoid reusing recently deleted inodes in no journal mode
    ext4: allocate delayed allocation blocks before rename
    ext4: start handle at least possible moment when renaming files
    ...

    Linus Torvalds
     
  • Pull timers/nohz changes from Ingo Molnar:
    "It mostly contains fixes and full dynticks off-case optimizations, by
    Frederic Weisbecker"

    * 'timers-nohz-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (24 commits)
    nohz: Include local CPU in full dynticks global kick
    nohz: Optimize full dynticks's sched hooks with static keys
    nohz: Optimize full dynticks state checks with static keys
    nohz: Rename a few state variables
    vtime: Always debug check snapshot source _before_ updating it
    vtime: Always scale generic vtime accounting results
    vtime: Optimize full dynticks accounting off case with static keys
    vtime: Describe overriden functions in dedicated arch headers
    m68k: hardirq_count() only need preempt_mask.h
    hardirq: Split preempt count mask definitions
    context_tracking: Split low level state headers
    vtime: Fix racy cputime delta update
    vtime: Remove a few unneeded generic vtime state checks
    context_tracking: User/kernel broundary cross trace events
    context_tracking: Optimize context switch off case with static keys
    context_tracking: Optimize guest APIs off case with static key
    context_tracking: Optimize main APIs off case with static key
    context_tracking: Ground setup for static key use
    context_tracking: Remove full dynticks' hacky dependency on wide context tracking
    nohz: Only enable context tracking on full dynticks CPUs
    ...

    Linus Torvalds
     
  • Add client side debugging to help trace socket connection/disconnection
    and unexpected state change issues.

    Signed-off-by: Trond Myklebust

    Trond Myklebust
     

04 Sep, 2013

2 commits

  • …rnel.org/pub/scm/linux/kernel/git/tip/tip

    Pull perf changes from Ingo Molnar:
    "As a first remark I'd like to point out that the obsolete '-f'
    (--force) option, which has not done anything for several releases,
    has been removed from 'perf record' and related utilities. Everyone
    please update muscle memory accordingly! :-)

    Main changes on the perf kernel side:

    - Performance optimizations:
    . for trace events, by Steve Rostedt.
    . for time values, by Peter Zijlstra

    - New hardware support:
    . for Intel Silvermont (22nm Atom) CPUs, by Zheng Yan
    . for Intel SNB-EP uncore PMUs, by Zheng Yan

    - Enhanced hardware support:
    . for Intel uncore PMUs: add filter support for QPI boxes, by Zheng Yan

    - Core perf events code enhancements and fixes:
    . for full-nohz feature handling, by Frederic Weisbecker
    . for group events, by Jiri Olsa
    . for call chains, by Frederic Weisbecker
    . for event stream parsing, by Adrian Hunter

    - New ABI details:
    . Add attr->mmap2 attribute, by Stephane Eranian
    . Add PERF_EVENT_IOC_ID ioctl to return event ID, by Jiri Olsa
    . Export u64 time_zero on the mmap header page to allow TSC
    calculation, by Adrian Hunter
    . Add dummy software event, by Adrian Hunter.
    . Add a new PERF_SAMPLE_IDENTIFIER to make samples always
    parseable, by Adrian Hunter.
    . Make Power7 events available via sysfs, by Runzhen Wang.

    - Code cleanups and refactorings:
    . for nohz-full, by Frederic Weisbecker
    . for group events, by Jiri Olsa

    - Documentation updates:
    . for perf_event_type, by Peter Zijlstra

    Main changes on the perf tooling side (some of these tooling changes
    utilize the above kernel side changes):

    - Lots of 'perf trace' enhancements:

    . Make 'perf trace' command line arguments consistent with
    'perf record', by David Ahern.

    . Allow specifying syscalls a la strace, by Arnaldo Carvalho de Melo.

    . Add --verbose and -o/--output options, by Arnaldo Carvalho de Melo.

    . Support ! in -e expressions, to filter a list of syscalls,
    by Arnaldo Carvalho de Melo.

    . Arg formatting improvements to allow masking arguments in
    syscalls such as futex and open, where some arguments are
    ignored and thus should not be printed depending on other args,
    by Arnaldo Carvalho de Melo.

    . Beautify open, openat, open_by_handle_at, lseek and futex
    syscalls, by Arnaldo Carvalho de Melo.

    . Add option to analyze events in a file versus live, so that
    one can do:

    [root@zoo ~]# perf record -a -e raw_syscalls:* sleep 1
    [ perf record: Woken up 0 times to write data ]
    [ perf record: Captured and wrote 25.150 MB perf.data (~1098836 samples) ]
    [root@zoo ~]# perf trace -i perf.data -e futex --duration 1
    17.799 ( 1.020 ms): 7127 futex(uaddr: 0x7fff3f6c6674, op: 393, val: 1, utime: 0x7fff3f6c6470, ua
    113.344 (95.429 ms): 7127 futex(uaddr: 0x7fff3f6c6674, op: 393, val: 1, utime: 0x7fff3f6c6470, uaddr2: 0x7fff3f6c6648, val3: 4294967
    133.778 ( 1.042 ms): 18004 futex(uaddr: 0x7fff3f6c6674, op: 393, val: 1, utime: 0x7fff3f6c6470, uaddr2: 0x7fff3f6c6648, val3: 429496
    [root@zoo ~]#

    By David Ahern.

    . Honor target pid / tid options when analyzing a file, by David Ahern.

    . Introduce better formatting of syscall arguments, including so
    far beautifiers for mmap, madvise, syscall return values,
    by Arnaldo Carvalho de Melo.

    . Handle HUGEPAGE defines in the mmap beautifier, by David Ahern.

    - 'perf report/top' enhancements:

    . Do annotation using /proc/kcore and /proc/kallsyms when
    available, removing the forced need for a vmlinux file kernel
    assembly annotation. This also improves this use case because
    vmlinux has just the initial kernel image, not what is actually
    in use after various code patchings by things like alternatives.
    By Adrian Hunter.

    . Add --ignore-callees=<regex> option to collapse undesired parts
    of call graphs, by Greg Price.

    . Simplify symbol filtering by doing it at machine class level,
    by Adrian Hunter.

    . Add support for callchains in the gtk UI, by Namhyung Kim.

    . Add --objdump option to 'perf top', by Sukadev Bhattiprolu.

    - 'perf kvm' enhancements:

    . Add option to print only events that exceed a specified time
    duration, by David Ahern.

    . Improve stack trace printing, by David Ahern.

    . Update documentation of the live command, by David Ahern

    . Add perf kvm stat live mode that combines aspects of 'perf kvm
    stat' record and report, by David Ahern.

    . Add option to analyze specific VM in perf kvm stat report, by
    David Ahern.

    . Do not require /lib/modules/* on a guest, by Jason Wessel.

    - 'perf script' enhancements:

    . Fix symbol offset computation for some dsos, by David Ahern.

    . Fix named threads support, by David Ahern.

    . Don't install scripting files when perl/python support
    is disabled, by Arnaldo Carvalho de Melo.

    - 'perf test' enhancements:

    . Add various improvements and fixes to the "vmlinux matches
    kallsyms" 'perf test' entry, related to the /proc/kcore
    annotation feature. By Adrian Hunter.

    . Add sample parsing test, by Adrian Hunter.

    . Add test for reading object code, by Adrian Hunter.

    . Add attr record group sampling test, by Jiri Olsa.

    . Misc testing infrastructure improvements and other details,
    by Jiri Olsa.

    - 'perf list' enhancements:

    . Skip unsupported hardware events, by Namhyung Kim.

    . List pmu events, by Andi Kleen.

    - 'perf diff' enhancements:

    . Add support for more than two files comparison, by Jiri Olsa.

    - 'perf sched' enhancements:

    . Various improvements, including removing reliance on some
    scheduler tracepoints that provide the same information as the
    PERF_RECORD_{FORK,EXIT} events. By David Ahern.

    . Remove odd build stall by moving a large struct initialization
    from a local variable to a global one, by Namhyung Kim.

    - 'perf stat' enhancements:

    . Add --initial-delay option to skip measuring for a defined
    startup phase, by Andi Kleen.

    - Generic perf tooling infrastructure/plumbing changes:

    . Tidy up sample parsing validation, by Adrian Hunter.

    . Fix up jobserver setup in libtraceevent Makefile,
    by Arnaldo Carvalho de Melo.

    . Debug improvements, by Adrian Hunter.

    . Fix correlation of samples coming after PERF_RECORD_EXIT event,
    by David Ahern.

    . Improve robustness of the topology parsing code,
    by Stephane Eranian.

    . Add group leader sampling, that allows just one event in a group
    to sample while the other events have just their values read,
    by Jiri Olsa.

    . Add support for a new modifier "D", which requests that the
    event, or group of events, be pinned to the PMU.
    By Michael Ellerman.

    . Support callchain sorting based on addresses, by Andi Kleen

    . Prep work for multi perf data file storage, by Jiri Olsa.

    . libtraceevent cleanups, by Namhyung Kim.

    And lots and lots of other fixes and code reorganizations that did not
    make it into the list, see the shortlog, diffstat and the Git log for
    details!"

    [ Also merge a leftover from the 3.11 cycle ]

    * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    perf: Prevent race in unthrottling code

    * 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (237 commits)
    perf trace: Tell arg formatters the arg index
    perf trace: Add beautifier for open's flags arg
    perf trace: Add beautifier for lseek's whence arg
    perf tools: Fix symbol offset computation for some dsos
    perf list: Skip unsupported events
    perf tests: Add 'keep tracking' test
    perf tools: Add support for PERF_COUNT_SW_DUMMY
    perf: Add a dummy software event to keep tracking
    perf trace: Add beautifier for futex 'operation' parm
    perf trace: Allow syscall arg formatters to mask args
    perf: Convert kmalloc_node(...GFP_ZERO...) to kzalloc_node()
    perf: Export struct perf_branch_entry to userspace
    perf: Add attr->mmap2 attribute to an event
    perf/x86: Add Silvermont (22nm Atom) support
    perf/x86: use INTEL_UEVENT_EXTRA_REG to define MSR_OFFCORE_RSP_X
    perf trace: Handle missing HUGEPAGE defines
    perf trace: Honor target pid / tid options when analyzing a file
    perf trace: Add option to analyze events in a file versus live
    perf evlist: Add tracepoint lookup by name
    perf tests: Add a sample parsing test
    ...

    Linus Torvalds
     
  • Pull RCU updates from Ingo Molnar:
    "Main RCU changes this cycle were:

    - Full-system idle detection. This is for use by Frederic
    Weisbecker's adaptive-ticks mechanism. Its purpose is to allow the
    timekeeping CPU to shut off its tick when all other CPUs are idle.

    - Miscellaneous fixes.

    - Improved rcutorture test coverage.

    - Updated RCU documentation"

    * 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (30 commits)
    nohz_full: Force RCU's grace-period kthreads onto timekeeping CPU
    nohz_full: Add full-system-idle state machine
    jiffies: Avoid undefined behavior from signed overflow
    rcu: Simplify _rcu_barrier() processing
    rcu: Make rcutorture emit online failures if verbose
    rcu: Remove unused variable from rcu_torture_writer()
    rcu: Sort rcutorture module parameters
    rcu: Increase rcutorture test coverage
    rcu: Add duplicate-callback tests to rcutorture
    doc: Fix memory-barrier control-dependency example
    rcu: Update RTFP documentation
    nohz_full: Add full-system-idle arguments to API
    nohz_full: Add full-system idle states and variables
    nohz_full: Add per-CPU idle-state tracking
    nohz_full: Add rcu_dyntick data for scalable detection of all-idle state
    nohz_full: Add Kconfig parameter for scalable detection of all-idle state
    nohz_full: Add testing information to documentation
    rcu: Eliminate unused APIs intended for adaptive ticks
    rcu: Select IRQ_WORK from TREE_PREEMPT_RCU
    rculist: list_first_or_null_rcu() should use list_entry_rcu()
    ...

    Linus Torvalds
     

03 Sep, 2013

1 commit

  • …/linux-rcu into core/rcu

    Pull RCU updates from Paul E. McKenney:

    "
    * Update RCU documentation. These were posted to LKML at
    https://lkml.org/lkml/2013/8/19/611.

    * Miscellaneous fixes. These were posted to LKML at
    https://lkml.org/lkml/2013/8/19/619.

    * Full-system idle detection. This is for use by Frederic
    Weisbecker's adaptive-ticks mechanism. Its purpose is
    to allow the timekeeping CPU to shut off its tick when
    all other CPUs are idle. These were posted to LKML at
    https://lkml.org/lkml/2013/8/19/648.

    * Improve rcutorture test coverage. These were posted to LKML at
    https://lkml.org/lkml/2013/8/19/675.
    "

    Signed-off-by: Ingo Molnar <mingo@kernel.org>

    Ingo Molnar
     

01 Sep, 2013

1 commit


29 Aug, 2013

1 commit

  • After applying commit 4a092d73, we have reduced the number of
    source files that need to #include ext4_extents.h. But we can do
    better.

    This commit defines ext4_zeroout_es() in extents.c and moves
    EXT_MAX_BLOCKS into ext4.h so that indirect.c and ioctl.c no longer
    need to include ext4_extents.h. Meanwhile, extent_status.c only needs
    to include this file when ES_AGGRESSIVE_TEST is defined. In addition,
    this commit removes a duplicated declaration in trace/events/ext4.h.

    After applying this patch, we just need to include the ext4_extents.h
    file in {super,migrate,move_extents,extents}.c, and it is easy for us
    to define a new extent disk layout.

    Signed-off-by: Zheng Liu
    Signed-off-by: "Theodore Ts'o"

    Zheng Liu
     

27 Aug, 2013

1 commit


17 Aug, 2013

2 commits

  • When we read in an extent tree leaf block from disk, arrange to have
    all of its entries cached. In nearly all cases the in-memory
    representation will be more compact than the on-disk representation in
    the buffer cache, and it allows us to get the information without
    having to traverse the extent tree for successive extents.

    Signed-off-by: "Theodore Ts'o"
    Reviewed-by: Zheng Liu

    Theodore Ts'o
     
  • Don't use an unsigned long long for the es_status flags; this requires
    that we pass 64-bit values around which is painful on 32-bit systems.
    Instead pass the extent status flags around using the low 4 bits of an
    unsigned int, and shift them into place when we are reading or writing
    es_pblk.

    Signed-off-by: "Theodore Ts'o"
    Reviewed-by: Zheng Liu

    Theodore Ts'o
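
    The approach described above can be sketched as follows: the flags travel as the low 4 bits of an unsigned int and are shifted into a 64-bit word only when the combined value is read or written. The shift of 60, the mask names and the helper functions are assumptions made for this sketch; the commit message only says that the flags are shifted into place in es_pblk.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout for this sketch: top 4 bits hold flags, low 60 bits the block. */
#define ES_SHIFT      60
#define ES_FLAG_MASK  0xFu
#define ES_PBLK_MASK  ((UINT64_C(1) << ES_SHIFT) - 1)

/* Combine a physical block number with flags passed as a plain unsigned int. */
static uint64_t es_pack(uint64_t pblk, unsigned int flags)
{
    assert((flags & ~ES_FLAG_MASK) == 0);   /* flags must fit in 4 bits */
    assert((pblk & ~ES_PBLK_MASK) == 0);    /* block must fit in 60 bits */
    return pblk | ((uint64_t)flags << ES_SHIFT);
}

/* Recover the flags as an unsigned int; no 64-bit value leaves this helper. */
static unsigned int es_flags(uint64_t es_pblk)
{
    return (unsigned int)(es_pblk >> ES_SHIFT);
}

/* Recover the block number by masking off the flag bits. */
static uint64_t es_block(uint64_t es_pblk)
{
    return es_pblk & ES_PBLK_MASK;
}
```

    Callers then pass and compare ordinary unsigned int flag values, which avoids 64-bit arithmetic on 32-bit systems everywhere except the two points where the packed word is touched.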
     

14 Aug, 2013

4 commits

  • This can be useful to track all kernel/user round trips.
    And it's also helpful to debug the context tracking subsystem.

    Signed-off-by: Frederic Weisbecker
    Cc: Steven Rostedt
    Cc: Paul E. McKenney
    Cc: Ingo Molnar
    Cc: Thomas Gleixner
    Cc: Peter Zijlstra
    Cc: Borislav Petkov
    Cc: Li Zhong
    Cc: Mike Galbraith
    Cc: Kevin Hilman

    Frederic Weisbecker
     
  • perf_trace_buf_prepare() + perf_trace_buf_submit(task => NULL)
    make no sense if hlist_empty(head). Change perf_trace_##call()
    to check ->perf_events beforehand and do nothing if it is empty.

    This removes the overhead for tasks without events associated
    with them. For example, "perf record -e sched:sched_switch -p1"
    attaches the counter(s) to the single task, but every task in
    system will do perf_trace_buf_prepare/submit() just to realize
    that it was not attached to this event.

    However, we can only do this if __task == NULL, so we also add
    the __builtin_constant_p(__task) check.

    With this patch "perf bench sched pipe" shows approximately 4%
    improvement when "perf record -p1" runs in parallel, many thanks
    to Steven for the testing.

    Link: http://lkml.kernel.org/r/20130806160847.GA2746@redhat.com

    Tested-by: David Ahern
    Acked-by: Peter Zijlstra
    Signed-off-by: Oleg Nesterov
    Signed-off-by: Steven Rostedt

    Oleg Nesterov
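
    The fast path described above amounts to testing for listeners before doing any buffer work, and only when no explicit task was passed. A minimal sketch follows. `struct listener`, `struct event_stats` and `trace_event_fire()` are hypothetical names; the counters stand in for the real prepare and submit calls.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an entry on the per-CPU list of attached events. */
struct listener {
    struct listener *next;
};

/* Counters that stand in for the real buffer work, so the effect is visible. */
struct event_stats {
    unsigned int prepared;  /* times the costly prepare/submit path ran */
    unsigned int skipped;   /* times the early exit was taken */
};

/* Returns true when the event went through the costly path. */
static bool trace_event_fire(const struct listener *head, const void *task,
                             struct event_stats *stats)
{
    assert(stats != NULL);

    /* The early exit is only valid when no explicit task was given. */
    if (task == NULL && head == NULL) {
        stats->skipped++;
        return false;
    }

    stats->prepared++;      /* stands in for buffer prepare + submit */
    return true;
}
```

    A call with an explicit task always takes the slow path, which matches the restriction in the commit message that the check applies only when __task == NULL.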
     
  • The next patch tries to avoid the costly perf_trace_buf_* calls
    when possible, but there is a problem. We can only do this if
    __task == NULL, because perf_tp_event(task != NULL) has additional
    code for that case.

    Unfortunately, TP_perf_assign/__perf_xxx which changes the default
    values of __count/__task variables for perf_trace_buf_submit() is
    called "too late", after we already did perf_trace_buf_prepare(),
    and the optimization above can't work.

    So this patch simply embeds __perf_xxx() into TP_ARGS(), this way
    DECLARE_EVENT_CLASS() can use the result of assignments hidden in
    "args" right after ftrace_get_offsets_##call() which is mostly
    trivial. This allows us to have the fast-path "__task != NULL"
    check at the start, see the next patch.

    Link: http://lkml.kernel.org/r/20130806160844.GA2739@redhat.com

    Tested-by: David Ahern
    Acked-by: Peter Zijlstra
    Signed-off-by: Oleg Nesterov
    Signed-off-by: Steven Rostedt

    Oleg Nesterov
     
  • To simplify the review of the next patches:

    1. We are going to reimplement __perf_task/counter and embed them
    into TP_ARGS(). Expand TRACE_EVENT(sched_stat_runtime) into
    DECLARE_EVENT_CLASS() + DEFINE_EVENT(), this way they can use
    different TP_ARGS's.

    2. Change perf_trace_##call() macro to do perf_fetch_caller_regs()
    right before perf_trace_buf_prepare().

    This way it evaluates TP_ARGS() asap, the next patch explores
    this fact.

    Note: after 87f44bbc perf_trace_buf_prepare() doesn't need
    "struct pt_regs *regs", perhaps it makes sense to remove this
    argument. And perhaps we can teach perf_trace_buf_submit()
    to accept regs == NULL and do fetch_caller_regs(CALLER_ADDR1)
    in this case.

    3. Cosmetic, but the typecast from "void*" buys nothing. It just
    adds noise, so remove it.

    Link: http://lkml.kernel.org/r/20130806160841.GA2736@redhat.com

    Acked-by: Peter Zijlstra
    Tested-by: David Ahern
    Signed-off-by: Oleg Nesterov
    Signed-off-by: Steven Rostedt

    Oleg Nesterov
     

30 Jul, 2013

1 commit

  • All the RCU tracepoints and functions that reference char pointers do
    so with just 'char *' even though they do not modify the contents of
    the string itself. This will cause warnings if a const char * is used
    in one of these functions.

    The RCU tracepoints store the pointer to the string to refer back to it
    when the trace output is displayed. As this can be minutes, hours or
    even days later, those strings had better be constant.

    This change also opens the door to allow the RCU tracepoint strings and
    their addresses to be exported so that userspace tracing tools can
    translate the contents of the pointers of the RCU tracepoints.

    Signed-off-by: Steven Rostedt

    Steven Rostedt (Red Hat)
     

27 Jul, 2013

1 commit

  • A new trace event is added to PM events to print the time it takes to
    suspend and resume a device. It generates trace messages that
    include device, driver, parent information in addition to the type of
    PM ops invoked as well as the PM event and error status from the PM
    ops. Example trace below:

    bash-2239 [000] .... 290.883035: device_pm_report_time: backlight acpi_video0 parent=0000:00:02.0 state=freeze ops=class nsecs=332 err=0
    bash-2239 [000] .... 290.883041: device_pm_report_time: rfkill rfkill3 parent=phy0 state=freeze ops=legacy class nsecs=216 err=0
    bash-2239 [001] .... 290.973892: device_pm_report_time: ieee80211 phy0 parent=0000:01:00.0 state=freeze ops=legacy class nsecs=90846477 err=0

    bash-2239 [001] .... 293.660129: device_pm_report_time: ieee80211 phy0 parent=0000:01:00.0 state=restore ops=legacy class nsecs=101295162 err=0
    bash-2239 [001] .... 293.660147: device_pm_report_time: rfkill rfkill3 parent=phy0 state=restore ops=legacy class nsecs=1804 err=0
    bash-2239 [001] .... 293.660157: device_pm_report_time: backlight acpi_video0 parent=0000:00:02.0 state=restore ops=class nsecs=757 err=0
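
    A minimal sketch of how such trace lines could be post-processed to find
    the slowest device. It assumes the field order shown in the example
    above (two names, then parent=, state=, ops=, nsecs=, err=); the group
    names "driver" and "device" and the helper names are illustrative
    assumptions, not part of the patch.

```python
import re

# Assumed layout, taken from the example trace lines in this commit message.
# "ops" may contain spaces (e.g. "legacy class"), so it is matched lazily.
LINE_RE = re.compile(
    r"device_pm_report_time:\s+(?P<driver>\S+)\s+(?P<device>\S+)\s+"
    r"parent=(?P<parent>\S+)\s+state=(?P<state>\S+)\s+"
    r"ops=(?P<ops>.*?)\s*nsecs=(?P<nsecs>\d+)\s+err=(?P<err>-?\d+)"
)

SAMPLE = """\
bash-2239 [001] .... 293.660129: device_pm_report_time: ieee80211 phy0 parent=0000:01:00.0 state=restore ops=legacy class nsecs=101295162 err=0
bash-2239 [001] .... 293.660147: device_pm_report_time: rfkill rfkill3 parent=phy0 state=restore ops=legacy class nsecs=1804 err=0
bash-2239 [001] .... 293.660157: device_pm_report_time: backlight acpi_video0 parent=0000:00:02.0 state=restore ops=class nsecs=757 err=0
"""


def parse_pm_report(text):
    """Return one dict per device_pm_report_time line; skip other lines."""
    records = []
    for line in text.splitlines():
        match = LINE_RE.search(line)
        if match is None:
            continue
        record = match.groupdict()
        record["nsecs"] = int(record["nsecs"])
        record["err"] = int(record["err"])
        records.append(record)
    return records


def slowest(records):
    """Return the record with the largest nsecs value."""
    return max(records, key=lambda record: record["nsecs"])


if __name__ == "__main__":
    worst = slowest(parse_pm_report(SAMPLE))
    print(f'{worst["device"]}: {worst["nsecs"]} ns during {worst["state"]}')
```

    Feeding the real contents of the trace file to parse_pm_report() instead
    of SAMPLE gives the same kind of summary for a full suspend/resume cycle.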

    Signed-off-by: Shuah Khan
    Signed-off-by: Rafael J. Wysocki

    Shuah Khan
     

23 Jul, 2013

2 commits

  • Pull tracing fixes and cleanups from Steven Rostedt:
    "This contains fixes, optimizations and some clean ups

    Some of the fixes need to go back to 3.10. They are minor, and deal
    mostly with incorrect ref counting in accessing event files.

    There were a couple of optimizations that should have perf perform a
    bit better when accessing trace events.

    And some various clean ups. Some of the clean ups are necessary to
    help in a fix to a theoretical race between opening an event file and
    deleting that event"

    * tag 'trace-3.11-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
    tracing: Kill the unbalanced tr->ref++ in tracing_buffers_open()
    tracing: Kill trace_array->waiter
    tracing: Do not (ab)use trace_seq in event_id_read()
    tracing: Simplify the iteration logic in f_start/f_next
    tracing: Add ref_data to function and fgraph tracer structs
    tracing: Miscellaneous fixes for trace_array ref counting
    tracing: Fix error handling to ensure instances can always be removed
    tracing/kprobe: Wait for disabling all running kprobe handlers
    tracing/perf: Move the PERF_MAX_TRACE_SIZE check into perf_trace_buf_prepare()
    tracing/syscall: Avoid perf_trace_buf_*() if sys_data->perf_events is empty
    tracing/function: Avoid perf_trace_buf_*() if event_function.perf_events is empty
    tracing: Typo fix on ring buffer comments
    tracing: Use trace_seq_puts()/trace_seq_putc() where possible
    tracing: Use correct config guard CONFIG_STACK_TRACER

    Linus Torvalds
     
  • Pull block IO driver bits from Jens Axboe:
    "As I mentioned in the core block pull request, due to real life
    circumstances the driver pull request would be late. Now it looks
    like -rc2 late... On the plus side, apart from the rsxx update, these
    are all things that I could argue could go in later in the cycle as
    they are fixes and not features. So even though things are late, it's
    not ALL bad.

    The pull request contains:

    - Updates to bcache, all bug fixes, from Kent.

    - A pile of drbd bug fixes (no big features this time!).

    - xen blk front/back fixes.

    - rsxx driver updates, some of them deferred from 3.10. So should be
    well cooked by now"

    * 'for-3.11/drivers' of git://git.kernel.dk/linux-block: (63 commits)
    bcache: Allocation kthread fixes
    bcache: Fix GC_SECTORS_USED() calculation
    bcache: Journal replay fix
    bcache: Shutdown fix
    bcache: Fix a sysfs splat on shutdown
    bcache: Advertise that flushes are supported
    bcache: check for allocation failures
    bcache: Fix a dumb race
    bcache: Use standard utility code
    bcache: Update email address
    bcache: Delete fuzz tester
    bcache: Document shrinker reserve better
    bcache: FUA fixes
    drbd: Allow online change of al-stripes and al-stripe-size
    drbd: Constants should be UPPERCASE
    drbd: Ignore the exit code of a fence-peer handler if it returns too late
    drbd: Fix rcu_read_lock balance on error path
    drbd: fix error return code in drbd_init()
    drbd: Do not sleep inside rcu
    bcache: Refresh usage docs
    ...

    Linus Torvalds
     

19 Jul, 2013

1 commit

  • Every perf_trace_buf_prepare() caller does
    WARN_ONCE(size > PERF_MAX_TRACE_SIZE, message) and "message" is
    almost the same.

    Shift this WARN_ONCE() into perf_trace_buf_prepare(). This changes
    the meaning of _ONCE, but I think this is fine.

    - 4947014 2932448 10104832 17984294 1126b26 vmlinux
    + 4948422 2932448 10104832 17985702 11270a6 vmlinux

    on my build.
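
    The remark about the meaning of _ONCE can be illustrated with a small
    userspace model. This is only a sketch: OnceFlag, run_before() and
    run_after() are hypothetical names, the caller names are made up, and
    the threshold value is used purely for illustration. It shows that a
    once-flag per call site allows one warning per caller, while a single
    flag inside the shared helper allows one warning in total.

```python
PERF_MAX_TRACE_SIZE = 2048  # illustrative threshold for this model only


class OnceFlag:
    """Models the hidden one-shot state behind a single WARN_ONCE() site."""

    def __init__(self):
        self.fired = False

    def warn(self, condition, log, message):
        # Record the message only the first time the condition is true.
        if condition and not self.fired:
            self.fired = True
            log.append(message)
        return condition


def run_before(sizes_by_caller):
    """Old layout: every caller carries its own once-flag."""
    log = []
    flags = {caller: OnceFlag() for caller in sizes_by_caller}
    for caller, sizes in sizes_by_caller.items():
        for size in sizes:
            flags[caller].warn(size > PERF_MAX_TRACE_SIZE, log,
                               f"{caller}: buffer not large enough")
    return log


def run_after(sizes_by_caller):
    """New layout: one once-flag lives inside the shared helper."""
    log = []
    shared = OnceFlag()
    for sizes in sizes_by_caller.values():
        for size in sizes:
            shared.warn(size > PERF_MAX_TRACE_SIZE, log,
                        "perf buffer not large enough")
    return log


if __name__ == "__main__":
    calls = {"kprobe": [4096, 4096], "syscall": [4096], "function": [128]}
    print(len(run_before(calls)), "warnings before,",
          len(run_after(calls)), "after")
```

    In this model the second oversized caller no longer gets its own
    warning after the change, which is the behavioural difference the
    commit message accepts.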

    Link: http://lkml.kernel.org/r/20130617170211.GA19813@redhat.com

    Acked-by: Peter Zijlstra
    Signed-off-by: Oleg Nesterov
    Signed-off-by: Steven Rostedt

    Oleg Nesterov
     

12 Jul, 2013

2 commits

  • Pull SCSI target updates from Nicholas Bellinger:
    "Lots of activity this round on performance improvements in target-core
    while benchmarking the prototype scsi-mq initiator code with
    vhost-scsi fabric ports, along with a number of iscsi/iser-target
    improvements and hardening fixes for exception path cases post v3.10
    merge.

    The highlights include:

    - Make persistent reservations APTPL buffer allocated on-demand, and
    drop per t10_reservation buffer. (grover)
    - Make virtual LUN=0 a NULLIO device, and skip allocation of NULLIO
    device pages (grover)
    - Add transport_cmd_check_stop write_pending bit to avoid extra
    access of ->t_state_lock in WRITE I/O submission fast-path. (nab)
    - Drop unnecessary CMD_T_DEV_ACTIVE check from
    transport_lun_remove_cmd to avoid extra access of ->t_state_lock in
    release fast-path. (nab)
    - Avoid extra t_state_lock access in __target_execute_cmd fast-path
    (nab)
    - Drop unnecessary vhost-scsi wait_for_tasks=true usage +
    ->t_state_lock access in release fast-path. (nab)
    - Convert vhost-scsi to use modern se_cmd->cmd_kref
    TARGET_SCF_ACK_KREF usage (nab)
    - Add tracepoints for SCSI commands being processed (roland)
    - Refactoring of iscsi-target handling of ISCSI_OP_NOOP +
    ISCSI_OP_TEXT to be transport independent (nab)
    - Add iscsi-target SendTargets=$IQN support for in-band discovery
    (nab)
    - Add iser-target support for in-band discovery (nab + Or)
    - Add iscsi-target demo-mode TPG authentication context support (nab)
    - Fix isert_put_reject payload buffer post (nab)
    - Fix iscsit_add_reject* usage for iser (nab)
    - Fix iscsit_sequence_cmd reject handling for iser (nab)
    - Fix ISCSI_OP_SCSI_TMFUNC handling for iser (nab)
    - Fix session reset bug with RDMA_CM_EVENT_DISCONNECTED (nab)

    The last five iscsi/iser-target items are CC'ed to stable, as they do
    address issues present in v3.10 code. They are certainly larger than
    I'd like for stable patch set, but are important to ensure proper
    REJECT exception handling in iser-target for 3.10.y"

    * 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending: (51 commits)
    iser-target: Ignore non TEXT + LOGOUT opcodes for discovery
    target: make queue_tm_rsp() return void
    target: remove unused codes from enum tcm_tmrsp_table
    iscsi-target: kstrtou* configfs attribute parameter cleanups
    iscsi-target: Fix tfc_tpg_auth_cit configfs length overflow
    iscsi-target: Fix tfc_tpg_nacl_auth_cit configfs length overflow
    iser-target: Add support for ISCSI_OP_TEXT opcode + payload handling
    iser-target: Rename sense_buf_[dma,len] to pdu_[dma,len]
    iser-target: Add vendor_err debug output
    target: Add (obsolete) checking for PMI/LBA fields in READ CAPACITY(10)
    target: Return correct sense data for IO past the end of a device
    target: Add tracepoints for SCSI commands being processed
    iser-target: Fix session reset bug with RDMA_CM_EVENT_DISCONNECTED
    iscsi-target: Fix ISCSI_OP_SCSI_TMFUNC handling for iser
    iscsi-target: Fix iscsit_sequence_cmd reject handling for iser
    iscsi-target: Fix iscsit_add_reject* usage for iser
    iser-target: Fix isert_put_reject payload buffer post
    iscsi-target: missing kfree() on error path
    iscsi-target: Drop left-over iscsi_conn->bad_hdr
    target: Make core_scsi3_update_and_write_aptpl return sense_reason_t
    ...

    Linus Torvalds
     
  • Pull tracing changes from Steven Rostedt:
    "The majority of the changes here are cleanups for the large changes
    that were added to 3.10, which includes several bug fixes that have
    been marked for stable.

    As for new features, there were a few, but nothing to write to LWN
    about. These include:

    New function trigger called "dump" and "cpudump" that will cause
    ftrace to dump its buffer to the console when the function is called.
    The difference between "dump" and "cpudump" is that "dump" will dump
    the entire contents of the ftrace buffer, whereas "cpudump" will only
    dump the contents of the ftrace buffer for the CPU that called the
    function.

    Another small enhancement is a new sysctl switch called
    "traceoff_on_warning" which, when enabled, will disable tracing if any
    WARN_ON() is triggered. This is useful if you want to debug what
    caused a warning and do not want to risk losing your trace data by the
    ring buffer overwriting the data before you can disable it. There's
    also a kernel command line option that will make this enabled at boot
    up called the same thing"
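
    As a rough sketch of how these two features might be driven from a
    script, the helper below only builds the (path, value) pairs and the
    equivalent shell commands; it writes nothing. The debugfs mount point,
    the "function:command" form written to set_ftrace_filter, and the
    example function name "schedule" are assumptions made for illustration,
    and applying the result would need root on a kernel with these features.

```python
TRACING_DIR = "/sys/kernel/debug/tracing"  # assumed debugfs mount point
TRACEOFF_SYSCTL = "/proc/sys/kernel/traceoff_on_warning"

FUNCTION_TRIGGERS = ("dump", "cpudump")


def function_trigger(function, command):
    """Return the (path, value) pair that would attach a trigger to a function."""
    if command not in FUNCTION_TRIGGERS:
        raise ValueError(f"unknown trigger: {command!r}")
    return (f"{TRACING_DIR}/set_ftrace_filter", f"{function}:{command}")


def traceoff_on_warning(enabled=True):
    """Return the (path, value) pair that would flip the sysctl switch."""
    return (TRACEOFF_SYSCTL, "1" if enabled else "0")


def as_shell(plan):
    """Render a list of (path, value) pairs as echo commands for review."""
    return [f"echo '{value}' > {path}" for path, value in plan]


if __name__ == "__main__":
    plan = [
        function_trigger("schedule", "cpudump"),  # hypothetical target function
        traceoff_on_warning(True),
    ]
    for command in as_shell(plan):
        print(command)
```

    Keeping the plan as data makes it easy to review the commands before
    anything is written to the tracing files.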

    * tag 'trace-3.11' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (34 commits)
    tracing: Make tracing_open_generic_{tr,tc}() static
    tracing: Remove ftrace() function
    tracing: Remove TRACE_EVENT_TYPE enum definition
    tracing: Make tracer_tracing_{off,on,is_on}() static
    tracing: Fix irqs-off tag display in syscall tracing
    uprobes: Fix return value in error handling path
    tracing: Fix race between deleting buffer and setting events
    tracing: Add trace_array_get/put() to event handling
    tracing: Get trace_array ref counts when accessing trace files
    tracing: Add trace_array_get/put() to handle instance refs better
    tracing: Protect ftrace_trace_arrays list in trace_events.c
    tracing: Make trace_marker use the correct per-instance buffer
    ftrace: Do not run selftest if command line parameter is set
    tracing/kprobes: Don't pass addr=ip to perf_trace_buf_submit()
    tracing: Use flag buffer_disabled for irqsoff tracer
    tracing/kprobes: Turn trace_probe->files into list_head
    tracing: Fix disabling of soft disable
    tracing: Add missing syscall_metadata comment
    tracing: Simplify code for showing of soft disabled flag
    tracing/kprobes: Kill probe_enable_lock
    ...

    Linus Torvalds
     

10 Jul, 2013

2 commits

  • …inux/kernel/git/ericvh/v9fs

    Pull 9p update from Eric Van Hensbergen:
    "Grab bag of little fixes and enhancements:
    - optional security enhancements
    - fix path coverage in MAINTAINERS
    - switch to using most used protocol and transport as default
    - clean up buffer dumps in trace code

    Held off on RDMA patches as they need to be cleaned up a bit, but will
    try to get them cleaned, checked, and pushed by mid-week"

    * tag 'for-linus-3.11-merge-window-part-1' of git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs:
    9p: Add rest of 9p files to MAINTAINERS entry
    9p: trace: use %*ph to dump buffer
    net/9p: Handle error in zero copy request correctly for 9p2000.u
    net/9p: Use virtio transpart as the default transport
    net/9p: Make 9P2000.L the default protocol for 9p file system

    Linus Torvalds
     
  • Pull btrfs update from Chris Mason:
    "These are the usual mixture of bugs, cleanups and performance fixes.
    Miao has some really nice tuning of our crc code as well as our
    transaction commits.

    Josef is peeling off more and more problems related to early enospc,
    and has a number of important bug fixes in here too"

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: (81 commits)
    Btrfs: wait ordered range before doing direct io
    Btrfs: only do the tree_mod_log_free_eb if this is our last ref
    Btrfs: hold the tree mod lock in __tree_mod_log_rewind
    Btrfs: make backref walking code handle skinny metadata
    Btrfs: fix crash regarding to ulist_add_merge
    Btrfs: fix several potential problems in copy_nocow_pages_for_inode
    Btrfs: cleanup the code of copy_nocow_pages_for_inode()
    Btrfs: fix oops when recovering the file data by scrub function
    Btrfs: make the chunk allocator completely tree lockless
    Btrfs: cleanup orphaned root orphan item
    Btrfs: fix wrong mirror number tuning
    Btrfs: cleanup redundant code in btrfs_submit_direct()
    Btrfs: remove btrfs_sector_sum structure
    Btrfs: check if we can nocow if we don't have data space
    Btrfs: stop using try_to_writeback_inodes_sb_nr to flush delalloc
    Btrfs: use a percpu to keep track of possibly pinned bytes
    Btrfs: check for actual acls rather than just xattrs when caching no acl
    Btrfs: move btrfs_truncate_page to btrfs_cont_expand instead of btrfs_truncate
    Btrfs: optimize reada_for_balance
    Btrfs: optimize read_block_for_search
    ...

    Linus Torvalds
     

08 Jul, 2013

1 commit

  • This patch adds tracepoints to the target code for commands being
    received and being completed, which is quite useful for debugging
    interactions with initiators. For example, one can do something like the
    following to watch commands that are completing unsuccessfully:

    # echo 'scsi_status!=0' > /sys/kernel/debug/tracing/events/target/target_cmd_complete/filter
    # echo 1 > /sys/kernel/debug/tracing/events/target/target_cmd_complete/enable

    # cat /sys/kernel/debug/tracing/trace
    iscsi_trx-0-1902 [003] ...1 990185.810385: target_cmd_complete: iqn.1993-08.org.debian:01:e51ede6aacfd

    Signed-off-by: Nicholas Bellinger
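
    The effect of the 'scsi_status!=0' filter can be pictured with a small
    userspace model. This is a sketch only: it handles a single
    "field==N" or "field!=N" comparison, whereas the kernel's event filter
    language is richer, and the event records below are hypothetical.

```python
import operator
import re

_OPS = {"==": operator.eq, "!=": operator.ne}
_FILTER_RE = re.compile(r"^\s*(\w+)\s*(==|!=)\s*(-?\d+)\s*$")


def compile_filter(expr):
    """Turn 'field==N' or 'field!=N' into a predicate over event dicts."""
    match = _FILTER_RE.match(expr)
    if match is None:
        raise ValueError(f"unsupported filter: {expr!r}")
    field = match.group(1)
    compare = _OPS[match.group(2)]
    value = int(match.group(3))
    return lambda event: compare(event[field], value)


# Hypothetical completion events; only scsi_status matters for the filter.
EVENTS = [
    {"initiator": "host-a", "scsi_status": 0},
    {"initiator": "host-b", "scsi_status": 2},
    {"initiator": "host-a", "scsi_status": 0},
]

if __name__ == "__main__":
    keep = compile_filter("scsi_status!=0")
    for event in EVENTS:
        if keep(event):
            print(event)
```

    Only the record with a non-zero status passes, which mirrors what the
    filter file does for target_cmd_complete events.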

    Roland Dreier
     

04 Jul, 2013

4 commits

  • Merge first patch-bomb from Andrew Morton:
    - various misc bits
    - I've been patchmonkeying ocfs2 for a while, as Joel and Mark have been
    distracted. There has been quite a bit of activity.
    - About half the MM queue
    - Some backlight bits
    - Various lib/ updates
    - checkpatch updates
    - zillions more little rtc patches
    - ptrace
    - signals
    - exec
    - procfs
    - rapidio
    - nbd
    - aoe
    - pps
    - memstick
    - tools/testing/selftests updates

    * emailed patches from Andrew Morton : (445 commits)
    tools/testing/selftests: don't assume the x bit is set on scripts
    selftests: add .gitignore for kcmp
    selftests: fix clean target in kcmp Makefile
    selftests: add .gitignore for vm
    selftests: add hugetlbfstest
    self-test: fix make clean
    selftests: exit 1 on failure
    kernel/resource.c: remove the unneeded assignment in function __find_resource
    aio: fix wrong comment in aio_complete()
    drivers/w1/slaves/w1_ds2408.c: add magic sequence to disable P0 test mode
    drivers/memstick/host/r592.c: convert to module_pci_driver
    drivers/memstick/host/jmb38x_ms: convert to module_pci_driver
    pps-gpio: add device-tree binding and support
    drivers/pps/clients/pps-gpio.c: convert to module_platform_driver
    drivers/pps/clients/pps-gpio.c: convert to devm_* helpers
    drivers/parport/share.c: use kzalloc
    Documentation/accounting/getdelays.c: avoid strncpy in accounting tool
    aoe: update internal version number to v83
    aoe: update copyright date
    aoe: perform I/O completions in parallel
    ...

    Linus Torvalds
     
  • Andrew Perepechko reported a problem whereby pages are being prematurely
    evicted as the mark_page_accessed() hint is ignored for pages that are
    currently on a pagevec --
    http://www.spinics.net/lists/linux-ext4/msg37340.html .

    Alexey Lyahkov and Robin Dong have also reported problems recently that
    could be due to hot pages reaching the end of the inactive list too
    quickly and being reclaimed.

    Rather than addressing this on a per-filesystem basis, this series aims
    to fix the mark_page_accessed() interface by deferring the decision on
    what LRU a page is added to until pagevec drain time and allowing
    mark_page_accessed() to call SetPageActive on a pagevec page.

    Patch 1 adds two tracepoints for LRU page activation and insertion. Using
    these tracepoints it's possible to build a model of pages in the
    LRU that can be processed offline.

    Patch 2 defers making the decision on what LRU to add a page to until when
    the pagevec is drained.

    Patch 3 searches the local pagevec for pages to mark PageActive on
    mark_page_accessed. The changelog explains why only the local
    pagevec is examined.

    Patches 4 and 5 tidy up the API.

    postmark, a dd-based test and fs-mark in both single and threaded mode
    were run but none of them showed any performance degradation or gain as
    a result of the patch.

    Using patch 1, I built a *very* basic model of the LRU to examine
    offline what the average age of different page types on the LRU was in
    milliseconds. Of course, capturing the trace distorts the test as it's
    written to local disk but it does not matter for the purposes of this
    test. The average ages of pages in milliseconds were

                                       vanilla deferdrain
    Average age mapped anon:          1454       1250
    Average age mapped file:        127841     155552
    Average age unmapped anon:          85        235
    Average age unmapped file:       73633      38884
    Average age unmapped buffers:    74054     116155

    The LRU activity was mostly files which you'd expect for a dd-based
    workload. Note that the average age of buffer pages is increased by the
    series and it is expected this is due to the fact that the buffer pages
    are now getting added to the active list when drained from the pagevecs.
    Note that the average age of the unmapped file data is decreased as they
    are still added to the inactive list and are reclaimed before the
    buffers.
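
    The direction and size of each change can be made explicit with a
    little arithmetic. The sketch below uses only the figures from the
    table above; the dictionary layout and function names are illustrative.

```python
# (vanilla, deferdrain) average ages in milliseconds, copied from the table.
AVERAGE_AGE_MS = {
    "mapped anon": (1454, 1250),
    "mapped file": (127841, 155552),
    "unmapped anon": (85, 235),
    "unmapped file": (73633, 38884),
    "unmapped buffers": (74054, 116155),
}


def percent_change(before, after):
    """Relative change from before to after, in percent."""
    return (after - before) / before * 100.0


def summarize(table):
    """Map each page type to its percentage change in average age."""
    return {kind: percent_change(*ages) for kind, ages in table.items()}


if __name__ == "__main__":
    for kind, change in summarize(AVERAGE_AGE_MS).items():
        print(f"{kind:18s} {change:+7.1f}%")
```

    With these inputs the buffer pages age roughly 57% longer while
    unmapped file pages age roughly 47% less, which matches the two
    observations made in the paragraph above.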

    There is no guarantee this is a universal win for all workloads and it
    would be nice if the filesystem people gave some thought as to whether
    this decision is generally a win or a loss.

    This patch:

    Using these tracepoints it is possible to model LRU activity and the
    average residency of pages of different types. This can be used to
    debug problems related to premature reclaim of pages of particular
    types.

    Signed-off-by: Mel Gorman
    Reviewed-by: Rik van Riel
    Cc: Jan Kara
    Cc: Johannes Weiner
    Cc: Alexey Lyahkov
    Cc: Andrew Perepechko
    Cc: Robin Dong
    Cc: Theodore Tso
    Cc: Hugh Dickins
    Cc: Rik van Riel
    Cc: Bernd Schubert
    Cc: David Howells
    Cc: Trond Myklebust
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mel Gorman
     
  • Pull power management and ACPI updates from Rafael Wysocki:
    "This time the total number of ACPI commits is slightly greater than
    the number of cpufreq commits, but Viresh Kumar (who works on cpufreq)
    remains the most active patch submitter.

    To me, the most significant change is the addition of offline/online
    device operations to the driver core (with Greg's blessing) and
    the related modifications of the ACPI core hotplug code. Next are the
    freezer updates from Colin Cross that should make the freezing of
    tasks a bit less heavy weight.

    We also have a couple of regression fixes, a number of fixes for
    issues that have not been identified as regressions, two new drivers
    and a bunch of cleanups all over.

    Highlights:

    - Hotplug changes to support graceful hot-removal failures.

    It sometimes is necessary to fail device hot-removal operations
    gracefully if they cannot be carried out completely. For example,
    if memory from a memory module being hot-removed has been allocated
    for the kernel's own use and cannot be moved elsewhere, it's
    desirable to fail the hot-removal operation in a graceful way
    rather than to crash the kernel, but currently a success or a kernel
    crash are the only possible outcomes of an attempted memory
    hot-removal. Needless to say, that is not a very attractive
    alternative and it had to be addressed.

    However, in order to make it work for memory, I first had to make
    it work for CPUs and for this purpose I needed to modify the ACPI
    processor driver. It's been split into two parts, a resident one
    handling the low-level initialization/cleanup and a modular one
    playing the actual driver's role (but it binds to the CPU system
    device objects rather than to the ACPI device objects representing
    processors). That's been sort of like a live brain surgery on a
    patient who's riding a bike.

    So this is a little scary, but since we found and fixed a couple of
    regressions it caused to happen during the early linux-next testing
    (a month ago), nobody has complained.

    As a bonus we remove some duplicated ACPI hotplug code, because the
    ACPI-based CPU hotplug is now going to use the common ACPI hotplug
    code.

    - Lighter weight freezing of tasks.

    These changes from Colin Cross and Mandeep Singh Baines are
    targeted at making the freezing of tasks a bit less heavy weight
    operation. They reduce the number of tasks woken up every time
    during the freezing, by using the observation that the freezer
    simply doesn't need to wake up some of them and wait for them all
    to call refrigerator(). The time needed for the freezer to decide
    to report a failure is reduced too.

    Also reintroduced is the check causing a lockdep warning to
    trigger when try_to_freeze() is called with locks held (which is
    generally unsafe and shouldn't happen).

    - cpufreq updates

    First off, a commit from Srivatsa S Bhat fixes a resume regression
    introduced during the 3.10 cycle causing some cpufreq sysfs
    attributes to return wrong values to user space after resume. The
    fix is kind of fresh, but also it's pretty obvious once Srivatsa
    has identified the root cause.

    Second, we have a new freqdomain_cpus sysfs attribute for the
    acpi-cpufreq driver to provide information previously available via
    related_cpus. From Lan Tianyu.

    Finally, we fix a number of issues, mostly related to the
    CPUFREQ_POSTCHANGE notifier and cpufreq Kconfig options and clean
    up some code. The majority of changes from Viresh Kumar with bits
    from Jacob Shin, Heiko Stübner, Xiaoguang Chen, Ezequiel Garcia,
    Arnd Bergmann, and Tang Yuantian.

    - ACPICA update

    A usual bunch of updates from the ACPICA upstream.

    During the 3.4 cycle we introduced support for ACPI 5 extended
    sleep registers, but they are only supposed to be used if the
    HW-reduced mode bit is set in the FADT flags and the code attempted
    to use them without checking that bit. That caused suspend/resume
    regressions to happen on some systems. Fix from Lv Zheng causes
    those registers to be used only if the HW-reduced mode bit is set.

    Apart from this some other ACPICA bugs are fixed and code cleanups
    are made by Bob Moore, Tomasz Nowicki, Lv Zheng, Chao Guan, and
    Zhang Rui.

    - cpuidle updates

    New driver for Xilinx Zynq processors is added by Michal Simek.

    Multidriver support simplification, addition of some missing
    kerneldoc comments and Kconfig-related fixes come from Daniel
    Lezcano.

    - ACPI power management updates

    Changes to make suspend/resume work correctly in Xen guests from
    Konrad Rzeszutek Wilk, sparse warning fix from Fengguang Wu and
    cleanups and fixes of the ACPI device power state selection
    routine.

    - ACPI documentation updates

    Some previously missing pieces of ACPI documentation are added by
    Lv Zheng and Aaron Lu (hopefully, that will help people to
    understand how the ACPI subsystem works) and one outdated doc is
    updated by Hanjun Guo.

    - Assorted ACPI updates

    We finally nailed down the IA-64 issue that was the reason for
    reverting commit 9f29ab11ddbf ("ACPI / scan: do not match drivers
    against objects having scan handlers"), so we can fix it and move
    the ACPI scan handler check added to the ACPI video driver back to
    the core.

    A mechanism for adding CMOS RTC address space handlers is
    introduced by Lan Tianyu to allow some EC-related breakage to be
    fixed on some systems.

    A spec-compliant implementation of acpi_os_get_timer() is added by
    Mika Westerberg.

    The evaluation of _STA is added to do_acpi_find_child() to avoid
    situations in which a pointer to a disabled device object is
    returned instead of an enabled one with the same _ADR value. From
    Jeff Wu.

    Intel BayTrail PCH (Platform Controller Hub) support is added to
    the ACPI driver for Intel Low-Power Subsystems (LPSS) and that
    driver is modified to work around a couple of known BIOS issues.
    Changes from Mika Westerberg and Heikki Krogerus.

    The EC driver is fixed by Vasiliy Kulikov to use get_user() and
    put_user() instead of dereferencing user space pointers blindly.

    Code cleanups are made by Bjorn Helgaas, Nicholas Mazzuca and Toshi
    Kani.

    - Assorted power management updates

    The "runtime idle" helper routine is changed to take the return
    values of the callbacks executed by it into account and to call
    rpm_suspend() if they return 0, which allows us to reduce the
    overall code bloat a bit (by dropping some code that's not
    necessary any more after that modification).

    The runtime PM documentation is updated by Alan Stern (to reflect
    the "runtime idle" behavior change).

    New trace points for PM QoS are added by Sahara.

    PM QoS documentation is updated by Lan Tianyu.

    Code cleanups are made and minor issues are addressed by Bernie
    Thompson, Bjorn Helgaas, Julius Werner, and Shuah Khan.

    - devfreq updates

    New driver for the Exynos5-bus device from Abhilash Kesavan.

    Minor cleanups, fixes and MAINTAINERS update from MyungJoo Ham,
    Abhilash Kesavan, Paul Bolle, Rajagopal Venkat, and Wei Yongjun.

    - OMAP power management updates

    Adaptive Voltage Scaling (AVS) SmartReflex voltage control driver
    updates from Andrii Tseglytskyi and Nishanth Menon."

    * tag 'pm+acpi-3.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (162 commits)
    cpufreq: Fix cpufreq regression after suspend/resume
    ACPI / PM: Fix possible NULL pointer deref in acpi_pm_device_sleep_state()
    PM / Sleep: Warn about system time after resume with pm_trace
    cpufreq: don't leave stale policy pointer in cdbs->cur_policy
    acpi-cpufreq: Add new sysfs attribute freqdomain_cpus
    cpufreq: make sure frequency transitions are serialized
    ACPI: implement acpi_os_get_timer() according the spec
    ACPI / EC: Add HP Folio 13 to ec_dmi_table in order to skip DSDT scan
    ACPI: Add CMOS RTC Operation Region handler support
    ACPI / processor: Drop unused variable from processor_perflib.c
    cpufreq: tegra: call CPUFREQ_POSTCHANGE notfier in error cases
    cpufreq: s3c64xx: call CPUFREQ_POSTCHANGE notfier in error cases
    cpufreq: omap: call CPUFREQ_POSTCHANGE notfier in error cases
    cpufreq: imx6q: call CPUFREQ_POSTCHANGE notfier in error cases
    cpufreq: exynos: call CPUFREQ_POSTCHANGE notfier in error cases
    cpufreq: dbx500: call CPUFREQ_POSTCHANGE notfier in error cases
    cpufreq: davinci: call CPUFREQ_POSTCHANGE notfier in error cases
    cpufreq: arm-big-little: call CPUFREQ_POSTCHANGE notfier in error cases
    cpufreq: powernow-k8: call CPUFREQ_POSTCHANGE notfier in error cases
    cpufreq: pcc: call CPUFREQ_POSTCHANGE notfier in error cases
    ...

    Linus Torvalds
     
  • Pull regmap updates from Mark Brown:
    "A small but useful set of regmap updates this time around:

    - An abstraction for bitfields within a register map contributed by
    Srinivas Kandagatla, allowing drivers to cope more easily when
    hardware designers randomly move things about (mainly when talking
    to things like system controllers).

    - Changes from Lars-Peter Clausen to allow the MMIO regmap to be used
    from hard IRQ context.

    - Small improvements to the cache infrastructure and performance,
    including a default cache sync operation so now all regmaps can
    sync easily.

    There's also a pinctrl driver making use of the new bitfield API,
    merged here for dependency reasons. There will be a simple add/add
    conflict with the pinctrl tree as a result."
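
    The idea behind the bitfield abstraction can be sketched in a few
    lines. This is a userspace model under stated assumptions, not the
    kernel's C interface: RegField is a hypothetical class, a plain dict
    stands in for the register map, and the register addresses and bit
    positions are invented for the example.

```python
class RegField:
    """A named bit range (lsb..msb inclusive) inside one register."""

    def __init__(self, reg, lsb, msb):
        if not 0 <= lsb <= msb:
            raise ValueError("expected 0 <= lsb <= msb")
        self.reg = reg
        self.shift = lsb
        self.mask = ((1 << (msb - lsb + 1)) - 1) << lsb

    def read(self, regs):
        """Extract the field value from the register map."""
        return (regs[self.reg] & self.mask) >> self.shift

    def write(self, regs, value):
        """Update only the field's bits, leaving the rest of the register."""
        bits = (value << self.shift) & self.mask
        regs[self.reg] = (regs[self.reg] & ~self.mask) | bits


# Two hypothetical chip revisions that place the same field differently.
LAYOUT_REV_A = {"drive_strength": RegField(0x10, 2, 3)}
LAYOUT_REV_B = {"drive_strength": RegField(0x14, 6, 7)}


def set_drive_strength(regs, layout, value):
    """Driver-side code is identical for both layouts."""
    layout["drive_strength"].write(regs, value)


if __name__ == "__main__":
    regs = {0x10: 0xFF, 0x14: 0x00}
    set_drive_strength(regs, LAYOUT_REV_A, 1)
    set_drive_strength(regs, LAYOUT_REV_B, 2)
    print({hex(reg): hex(val) for reg, val in regs.items()})
```

    Because the driver refers to the field by name, moving it to another
    register or bit position only changes the layout table.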

    * tag 'regmap-v3.11' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap:
    pinctrl: st: Remove unnecessary use of of_match_ptr macro
    pinctrl: st: fix return value check
    pinctrl: st: Add pinctrl and pinconf support.
    regmap: debugfs: Suppress cache for partial register files
    regmap: Add regmap_field APIs
    regmap: core: Cache all registers by default when cache is enabled
    regmap: Implemented default cache sync operation
    regmap: Make regmap-mmio usable from atomic contexts
    regmap: regcache: Fixup locking for custom lock callbacks
    regmap: debugfs: Fix return from regmap_debugfs_get_dump_start
    regmap: debugfs: Don't mark lockdep as broken due to debugfs write
    regmap: rbtree: Use range information to allocate nodes
    regmap: rbtree: Factor out node allocation
    regmap: Make regmap_check_range_table() a public API
    regmap: Add support for discarding parts of the register cache

    Linus Torvalds
     

03 Jul, 2013

1 commit

  • Pull x86 tracing updates from Ingo Molnar:
    "This tree adds IRQ vector tracepoints that are named after the handler
    and which output the vector #, based on a zero-overhead approach that
    relies on changing the IDT entries, by Seiji Aguchi.

    The new tracepoints look like this:

    # perf list | grep -i irq_vector
    irq_vectors:local_timer_entry [Tracepoint event]
    irq_vectors:local_timer_exit [Tracepoint event]
    irq_vectors:reschedule_entry [Tracepoint event]
    irq_vectors:reschedule_exit [Tracepoint event]
    irq_vectors:spurious_apic_entry [Tracepoint event]
    irq_vectors:spurious_apic_exit [Tracepoint event]
    irq_vectors:error_apic_entry [Tracepoint event]
    irq_vectors:error_apic_exit [Tracepoint event]
    [...]"

    * 'x86-tracing-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    x86/tracing: Add config option checking to the definitions of mce handlers
    trace,x86: Do not call local_irq_save() in load_current_idt()
    trace,x86: Move creation of irq tracepoints from apic.c to irq.c
    x86, trace: Add irq vector tracepoints
    x86: Rename variables for debugging
    x86, trace: Introduce entering/exiting_irq()
    tracing: Add DEFINE_EVENT_FN() macro

    Linus Torvalds