10 Oct, 2012

1 commit

  • Pull btrfs update from Chris Mason:
    "This is a large pull, with the bulk of the updates coming from:

    - Hole punching

    - send/receive fixes

    - fsync performance

    - Disk format extension allowing more hardlinks inside a single
    directory (btrfs-progs patch required to enable the compat bit for
    this one)

    I'm cooking more unrelated RAID code, but I wanted to make sure this
    original batch makes it in. The largest updates here are relatively
    old and have been in testing for some time."

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: (121 commits)
    btrfs: init ref_index to zero in add_inode_ref
    Btrfs: remove repeated eb->pages check in, disk-io.c/csum_dirty_buffer
    Btrfs: fix page leakage
    Btrfs: do not warn_on when we cannot alloc a page for an extent buffer
    Btrfs: don't bug on enomem in readpage
    Btrfs: cleanup pages properly when ENOMEM in compression
    Btrfs: make filesystem read-only when submitting barrier fails
    Btrfs: detect corrupted filesystem after write I/O errors
    Btrfs: make compress and nodatacow mount options mutually exclusive
    btrfs: fix message printing
    Btrfs: don't bother committing delayed inode updates when fsyncing
    btrfs: move inline function code to header file
    Btrfs: remove unnecessary IS_ERR in bio_readpage_error()
    btrfs: remove unused function btrfs_insert_some_items()
    Btrfs: don't commit instead of overcommitting
    Btrfs: confirmation of value is added before trace_btrfs_get_extent() is called
    Btrfs: be smarter about dropping things from the tree log
    Btrfs: don't lookup csums for prealloc extents
    Btrfs: cache extent state when writing out dirty metadata pages
    Btrfs: do not hold the file extent leaf locked when adding extent item
    ...

    Linus Torvalds
     

09 Oct, 2012

1 commit

  • When transparent huge pages were introduced, memory compaction and swap
    storms were an issue, and the kernel had to be careful to not make THP
    allocations cause pageout or compaction.

    Now that we have working compaction deferral, kswapd is smart enough to
    invoke compaction and the quadratic behaviour around isolate_free_pages
    has been fixed, it should be safe to remove __GFP_NO_KSWAPD.

    [minchan@kernel.org: Comment fix]
    [mgorman@suse.de: Avoid direct reclaim for deferred compaction]
    Cc: Andrea Arcangeli
    Signed-off-by: Rik van Riel
    Signed-off-by: Mel Gorman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Rik van Riel
     

08 Oct, 2012

1 commit

  • Pull ext4 updates from Ted Ts'o:
    "The big new feature added this time is supporting online resizing
    using the meta_bg feature. This allows us to resize file systems
    which are greater than 16TB. In addition, the speed of online
    resizing has been improved in general.

    We also fix a number of races, some of which could lead to deadlocks,
    in ext4's Asynchronous I/O and online defrag support, thanks to good
    work by Dmitry Monakhov.

    There are also a large number of more minor bug fixes and cleanups
    from a number of other ext4 contributors, quite a few of whom have
    submitted fixes for the first time."

    * tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (69 commits)
    ext4: fix ext4_flush_completed_IO wait semantics
    ext4: fix mtime update in nodelalloc mode
    ext4: fix ext_remove_space for punch_hole case
    ext4: punch_hole should wait for DIO writers
    ext4: serialize truncate with owerwrite DIO workers
    ext4: endless truncate due to nonlocked dio readers
    ext4: serialize unlocked dio reads with truncate
    ext4: serialize dio nonlocked reads with defrag workers
    ext4: completed_io locking cleanup
    ext4: fix unwritten counter leakage
    ext4: give i_aiodio_unwritten a more appropriate name
    ext4: ext4_inode_info diet
    ext4: convert to use leXX_add_cpu()
    ext4: ext4_bread usage audit
    fs: reserve fallocate flag codepoint
    ext4: remove redundant offset check in mext_check_arguments()
    ext4: don't clear orphan list on ro mount with errors
    jbd2: fix assertion failure in commit code due to lacking transaction credits
    ext4: release donor reference when EXT4_IOC_MOVE_EXT ioctl fails
    ext4: enable FITRIM ioctl on bigalloc file system
    ...

    Linus Torvalds
     

03 Oct, 2012

1 commit


02 Oct, 2012

2 commits

  • We've added a new field 'sequence' to delayed ref node, so update related
    tracepoints.

    Signed-off-by: Liu Bo

    Liu Bo
     
  • Pull the trivial tree from Jiri Kosina:
    "Tiny usual fixes all over the place"

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (34 commits)
    doc: fix old config name of kprobetrace
    fs/fs-writeback.c: cleanup riteback_sb_inodes kerneldoc
    btrfs: fix the commment for the action flags in delayed-ref.h
    btrfs: fix trivial typo for the comment of BTRFS_FREE_INO_OBJECTID
    vfs: fix kerneldoc for generic_fh_to_parent()
    treewide: fix comment/printk/variable typos
    ipr: fix small coding style issues
    doc: fix broken utf8 encoding
    nfs: comment fix
    platform/x86: fix asus_laptop.wled_type module parameter
    mfd: printk/comment fixes
    doc: getdelays.c: remember to close() socket on error in create_nl_socket()
    doc: aliasing-test: close fd on write error
    mmc: fix comment typos
    dma: fix comments
    spi: fix comment/printk typos in spi
    Coccinelle: fix typo in memdup_user.cocci
    tmiofb: missing NULL pointer checks
    tools: perf: Fix typo in tools/perf
    tools/testing: fix comment / output typos
    ...

    Linus Torvalds
     

21 Sep, 2012

1 commit

  • When allocating memory fails, page is NULL. page_to_pfn() will
    cause the kernel to panic if we don't use sparsemem vmemmap.

    Link: http://lkml.kernel.org/r/505AB1FF.8020104@cn.fujitsu.com

    Cc: Frederic Weisbecker
    Cc: Ingo Molnar
    Cc: Andrew Morton
    Cc: stable
    Acked-by: Mel Gorman
    Reviewed-by: Minchan Kim
    Signed-off-by: Wen Congyang
    Signed-off-by: Steven Rostedt

    Wen Congyang
     

02 Sep, 2012

1 commit


17 Aug, 2012

2 commits

  • Signed-off-by: Anatol Pomozov
    Signed-off-by: "Theodore Ts'o"

    Anatol Pomozov
     
  • Most hardware architectures require data (including struct fields) to
    be aligned in memory. To achieve this, the compiler inserts padding
    between struct fields that are not correctly aligned.

    Reorder fields to remove padding and make the structures denser. Smaller
    data saves memory, which is very important for trace events: the tracing
    buffer has a limited size, so smaller objects let us store more of them
    before the buffer overflows.

    To find data struct holes I used 'pahole -H 1 -E -I vmlinux.o' from
    'dwarves' package.
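
    A minimal sketch of the padding effect described above, using Python's
    ctypes to mimic the C compiler's layout rules. The two structures and
    their field names are hypothetical and are not taken from the patch.

```python
import ctypes


class PaddedEvent(ctypes.Structure):
    # Hypothetical layout: two 1-byte fields separated by an 8-byte field,
    # which forces padding after each small field.
    _fields_ = [
        ("flag", ctypes.c_uint8),
        ("inode", ctypes.c_uint64),
        ("mode", ctypes.c_uint8),
    ]


class DenseEvent(ctypes.Structure):
    # Same fields, largest first, so the small fields share one slot.
    _fields_ = [
        ("inode", ctypes.c_uint64),
        ("flag", ctypes.c_uint8),
        ("mode", ctypes.c_uint8),
    ]


def padding_bytes(struct_type):
    """Return the bytes of the structure not occupied by any field."""
    payload = sum(ctypes.sizeof(ftype) for _name, ftype in struct_type._fields_)
    return ctypes.sizeof(struct_type) - payload


if __name__ == "__main__":
    for struct_type in (PaddedEvent, DenseEvent):
        print(struct_type.__name__,
              "size:", ctypes.sizeof(struct_type),
              "padding:", padding_bytes(struct_type))
```

    The exact sizes depend on the platform ABI, but the reordered layout
    should never be larger than the interleaved one.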

    Signed-off-by: Anatol Pomozov
    Signed-off-by: "Theodore Ts'o"

    Anatol Pomozov
     

04 Aug, 2012

1 commit


01 Aug, 2012

3 commits

  • Merge Andrew's second set of patches:
    - MM
    - a few random fixes
    - a couple of RTC leftovers

    * emailed patches from Andrew Morton: (120 commits)
    rtc/rtc-88pm80x: remove unneed devm_kfree
    rtc/rtc-88pm80x: assign ret only when rtc_register_driver fails
    mm: hugetlbfs: close race during teardown of hugetlbfs shared page tables
    tmpfs: distribute interleave better across nodes
    mm: remove redundant initialization
    mm: warn if pg_data_t isn't initialized with zero
    mips: zero out pg_data_t when it's allocated
    memcg: gix memory accounting scalability in shrink_page_list
    mm/sparse: remove index_init_lock
    mm/sparse: more checks on mem_section number
    mm/sparse: optimize sparse_index_alloc
    memcg: add mem_cgroup_from_css() helper
    memcg: further prevent OOM with too many dirty pages
    memcg: prevent OOM with too many dirty pages
    mm: mmu_notifier: fix freed page still mapped in secondary MMU
    mm: memcg: only check anon swapin page charges for swap cache
    mm: memcg: only check swap cache pages for repeated charging
    mm: memcg: split swapin charge function into private and public part
    mm: memcg: remove needless !mm fixup to init_mm when charging
    mm: memcg: remove unneeded shmem charge type
    ...

    Linus Torvalds
     
  • Pull random subsystem patches from Ted Ts'o:
    "This patch series contains a major revamp of how we collect entropy
    from interrupts for /dev/random and /dev/urandom.

    The goal is to address weaknesses discussed in the paper "Mining
    your Ps and Qs: Detection of Widespread Weak Keys in Network Devices",
    by Nadia Heninger, Zakir Durumeric, Eric Wustrow, J. Alex Halderman,
    which will be published in the Proceedings of the 21st Usenix Security
    Symposium, August 2012. (See https://factorable.net for more
    information and an extended version of the paper.)"

    Fix up trivial conflicts due to nearby changes in
    drivers/{mfd/ab3100-core.c, usb/gadget/omap_udc.c}

    * tag 'random_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/random: (33 commits)
    random: mix in architectural randomness in extract_buf()
    dmi: Feed DMI table to /dev/random driver
    random: Add comment to random_initialize()
    random: final removal of IRQF_SAMPLE_RANDOM
    um: remove IRQF_SAMPLE_RANDOM which is now a no-op
    sparc/ldc: remove IRQF_SAMPLE_RANDOM which is now a no-op
    [ARM] pxa: remove IRQF_SAMPLE_RANDOM which is now a no-op
    board-palmz71: remove IRQF_SAMPLE_RANDOM which is now a no-op
    isp1301_omap: remove IRQF_SAMPLE_RANDOM which is now a no-op
    pxa25x_udc: remove IRQF_SAMPLE_RANDOM which is now a no-op
    omap_udc: remove IRQF_SAMPLE_RANDOM which is now a no-op
    goku_udc: remove IRQF_SAMPLE_RANDOM which was commented out
    uartlite: remove IRQF_SAMPLE_RANDOM which is now a no-op
    drivers: hv: remove IRQF_SAMPLE_RANDOM which is now a no-op
    xen-blkfront: remove IRQF_SAMPLE_RANDOM which is now a no-op
    n2_crypto: remove IRQF_SAMPLE_RANDOM which is now a no-op
    pda_power: remove IRQF_SAMPLE_RANDOM which is now a no-op
    i2c-pmcmsp: remove IRQF_SAMPLE_RANDOM which is now a no-op
    input/serio/hp_sdc.c: remove IRQF_SAMPLE_RANDOM which is now a no-op
    mfd: remove IRQF_SAMPLE_RANDOM which is now a no-op
    ...

    Linus Torvalds
     
  • __GFP_MEMALLOC will allow the allocation to disregard the watermarks, much
    like PF_MEMALLOC. It allows one to pass along the memalloc state in
    object-related allocation flags, such as sk->sk_allocation, as opposed to
    task-related flags. This removes the need for ALLOC_PFMEMALLOC, as callers
    using __GFP_MEMALLOC can get the ALLOC_NO_WATERMARK flag, which is now
    enough to identify allocations related to page reclaim.

    Signed-off-by: Peter Zijlstra
    Signed-off-by: Mel Gorman
    Cc: David Miller
    Cc: Neil Brown
    Cc: Mike Christie
    Cc: Eric B Munson
    Cc: Eric Dumazet
    Cc: Sebastian Andrzej Siewior
    Cc: Mel Gorman
    Cc: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mel Gorman
     

31 Jul, 2012

1 commit

  • Some events are of interest not only to the current task.
    For example, sched_stat_* events are of interest to the task
    that is woken up. For this reason, such events should also
    be delivered to the target task.

    A target task can now be set by using __perf_task().

    The original idea and a draft patch belong to Peter Zijlstra.

    I need these events for profiling sleep times. sched_switch is used for
    getting callchains and sched_stat_* is used for getting time periods.
    These events are combined in user space, where they can then be analyzed
    by perf tools.

    Inspired-by: Peter Zijlstra
    Cc: Steven Rostedt
    Cc: Paul Mackerras
    Cc: Arnaldo Carvalho de Melo
    Cc: Steven Rostedt
    Cc: Arun Sharma
    Signed-off-by: Andrew Vagin
    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/r/1342016098-213063-1-git-send-email-avagin@openvz.org
    Signed-off-by: Ingo Molnar

    Andrew Vagin
     

27 Jul, 2012

1 commit

  • Pull x86/mm changes from Peter Anvin:
    "The big change here is the patchset by Alex Shi to use INVLPG to flush
    only the affected pages when we only need to flush a small page range.

    It also removes the special INVALIDATE_TLB_VECTOR interrupts (32
    vectors!) and replaces them with an ordinary IPI function call."

    Fix up trivial conflicts in arch/x86/include/asm/apic.h (added code next
    to changed line)

    * 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    x86/tlb: Fix build warning and crash when building for !SMP
    x86/tlb: do flush_tlb_kernel_range by 'invlpg'
    x86/tlb: replace INVALIDATE_TLB_VECTOR by CALL_FUNCTION_VECTOR
    x86/tlb: enable tlb flush range support for x86
    mm/mmu_gather: enable tlb flush range in generic mmu_gather
    x86/tlb: add tlb_flushall_shift knob into debugfs
    x86/tlb: add tlb_flushall_shift for specific CPU
    x86/tlb: fall back to flush all when meet a THP large page
    x86/flush_tlb: try flush_tlb_single one by one in flush_tlb_range
    x86/tlb_info: get last level TLB entry number of CPU
    x86: Add read_mostly declaration/definition to variables from smp.h
    x86: Define early read-mostly per-cpu macros

    Linus Torvalds
     

25 Jul, 2012

2 commits

  • Pull workqueue changes from Tejun Heo:
    "There are three major changes.

    - WQ_HIGHPRI has been reimplemented so that high priority work items
    are served by worker threads with -20 nice value from dedicated
    highpri worker pools.

    - CPU hotplug support has been reimplemented such that idle workers
    are kept across CPU hotplug events. This makes CPU hotplug cheaper
    (for PM) and makes the code simpler.

    - flush_kthread_work() has been reimplemented so that a work item can
    be freed while executing. This removes an annoying behavior
    difference between kthread_worker and workqueue."

    * 'for-3.6' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq:
    workqueue: fix spurious CPU locality WARN from process_one_work()
    kthread_worker: reimplement flush_kthread_work() to allow freeing the work item being executed
    kthread_worker: reorganize to prepare for flush_kthread_work() reimplementation
    workqueue: simplify CPU hotplug code
    workqueue: remove CPU offline trustee
    workqueue: don't butcher idle workers on an offline CPU
    workqueue: reimplement CPU online rebinding to handle idle workers
    workqueue: drop @bind from create_worker()
    workqueue: use mutex for global_cwq manager exclusion
    workqueue: ROGUE workers are UNBOUND workers
    workqueue: drop CPU_DYING notifier operation
    workqueue: perform cpu down operations from low priority cpu_notifier()
    workqueue: reimplement WQ_HIGHPRI using a separate worker_pool
    workqueue: introduce NR_WORKER_POOLS and for_each_worker_pool()
    workqueue: separate out worker_pool flags
    workqueue: use @pool instead of @gcwq or @cpu where applicable
    workqueue: factor out worker_pool from global_cwq
    workqueue: don't use WQ_HIGHPRI for unbound workqueues

    Linus Torvalds
     
  • Pull KVM updates from Avi Kivity:
    "Highlights include
    - full big real mode emulation on pre-Westmere Intel hosts (can be
    disabled with emulate_invalid_guest_state=0)
    - relatively small ppc and s390 updates
    - PCID/INVPCID support in guests
    - EOI avoidance (3.6 guests should perform better on 3.6 hosts on
    interrupt intensive workloads)
    - Lockless write faults during live migration
    - EPT accessed/dirty bits support for new Intel processors"

    Fix up conflicts in:
    - Documentation/virtual/kvm/api.txt:

    Stupid subchapter numbering, added next to each other.

    - arch/powerpc/kvm/booke_interrupts.S:

    PPC asm changes clashing with the KVM fixes

    - arch/s390/include/asm/sigp.h, arch/s390/kvm/sigp.c:

    Duplicated commits through the kvm tree and the s390 tree, with
    subsequent edits in the KVM tree.

    * tag 'kvm-3.6-1' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (93 commits)
    KVM: fix race with level interrupts
    x86, hyper: fix build with !CONFIG_KVM_GUEST
    Revert "apic: fix kvm build on UP without IOAPIC"
    KVM guest: switch to apic_set_eoi_write, apic_write
    apic: add apic_set_eoi_write for PV use
    KVM: VMX: Implement PCID/INVPCID for guests with EPT
    KVM: Add x86_hyper_kvm to complete detect_hypervisor_platform check
    KVM: PPC: Critical interrupt emulation support
    KVM: PPC: e500mc: Fix tlbilx emulation for 64-bit guests
    KVM: PPC64: booke: Set interrupt computation mode for 64-bit host
    KVM: PPC: bookehv: Add ESR flag to Data Storage Interrupt
    KVM: PPC: bookehv64: Add support for std/ld emulation.
    booke: Added crit/mc exception handler for e500v2
    booke/bookehv: Add host crit-watchdog exception support
    KVM: MMU: document mmu-lock and fast page fault
    KVM: MMU: fix kvm_mmu_pagetable_walk tracepoint
    KVM: MMU: trace fast page fault
    KVM: MMU: fast path of handling guest page fault
    KVM: MMU: introduce SPTE_MMU_WRITEABLE bit
    KVM: MMU: fold tlb flush judgement into mmu_spte_update
    ...

    Linus Torvalds
     

23 Jul, 2012

1 commit

  • Pull perf events changes from Ingo Molnar:

    "- kernel side:

    - Intel uncore PMU support for Nehalem and Sandy Bridge CPUs, we
    support both the events available via the MSR and via the PCI
    access space.

    - various uprobes cleanups and restructurings

    - PMU driver quirks by microcode version and required x86 microcode
    loader cleanups/robustization

    - various tracing robustness updates

    - static keys: remove obsolete static_branch()

    - tooling side:

    - GTK browser improvements

    - perf report browser: support screenshots to file

    - more automated tests

    - perf kvm improvements

    - perf bench refinements

    - build environment improvements

    - pipe mode improvements

    - libtraceevent updates, we have now hopefully merged most bits with
    the out of tree forked code base

    ... and many other goodies."

    * 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (138 commits)
    tracing: Check for allocation failure in __tracing_open()
    perf/x86: Fix intel_perfmon_event_mapformatting
    jump label: Remove static_branch()
    tracepoint: Use static_key_false(), since static_branch() is deprecated
    perf/x86: Uncore filter support for SandyBridge-EP
    perf/x86: Detect number of instances of uncore CBox
    perf/x86: Fix event constraint for SandyBridge-EP C-Box
    perf/x86: Use 0xff as pseudo code for fixed uncore event
    perf/x86: Save a few bytes in 'struct x86_pmu'
    perf/x86: Add a microcode revision check for SNB-PEBS
    perf/x86: Improve debug output in check_hw_exists()
    perf/x86/amd: Unify AMD's generic and family 15h pmus
    perf/x86: Move Intel specific code to intel_pmu_init()
    perf/x86: Rename Intel specific macros
    perf/x86: Fix USER/KERNEL tagging of samples
    perf tools: Split event symbols arrays to hw and sw parts
    perf tools: Split out PE_VALUE_SYM parsing token to SW and HW tokens
    perf tools: Add empty rule for new line in event syntax parsing
    perf test: Use ARRAY_SIZE in parse events tests
    tools lib traceevent: Cleanup realloc use
    ...

    Linus Torvalds
     

15 Jul, 2012

1 commit


13 Jul, 2012

1 commit

  • Move worklist and all worker management fields from global_cwq into
    the new struct worker_pool. worker_pool points back to the containing
    gcwq. worker and cpu_workqueue_struct are updated to point to
    worker_pool instead of gcwq too.

    This change is mechanical and doesn't introduce any functional
    difference other than rearranging of fields and an added level of
    indirection in some places. This is to prepare for multiple pools per
    gcwq.

    v2: Comment typo fixes as suggested by Namhyung.

    Signed-off-by: Tejun Heo
    Cc: Namhyung Kim

    Tejun Heo
     

06 Jul, 2012

1 commit


03 Jul, 2012

1 commit


29 Jun, 2012

1 commit

  • The kvm_emulate_insn tracepoint used __print_insn()
    for printing its instructions. However, this makes the
    format of the event hard to parse, as it reveals TP
    internals.

    Fortunately, the kernel provides __print_hex for almost
    the same purpose, so we can use it instead of open coding
    it. User space can be changed to parse it later.

    This means raw kernel tracing will not be affected
    by this change:

    # cd /sys/kernel/debug/tracing/
    # cat events/kvm/kvm_emulate_insn/format
    name: kvm_emulate_insn
    ID: 29
    format:
    ...
    print fmt: "%x:%llx:%s (%s)%s", REC->csbase, REC->rip, __print_hex(REC->insn, REC->len), \
    __print_symbolic(REC->flags, { 0, "real" }, { (1 << 0) | (1 << 1), "vm16" }, \
    { (1 << 0), "prot16" }, { (1 << 0) | (1 << 2), "prot32" }, { (1 << 0) | (1 << 3), "prot64" }), \
    REC->failed ? " failed" : ""

    # echo 1 > events/kvm/kvm_emulate_insn/enable
    # cat trace
    # tracer: nop
    #
    # entries-in-buffer/entries-written: 2183/2183   #P:12
    #
    #                              _-----=> irqs-off
    #                             / _----=> need-resched
    #                            | / _---=> hardirq/softirq
    #                            || / _--=> preempt-depth
    #                            ||| /     delay
    #           TASK-PID   CPU#  ||||    TIMESTAMP  FUNCTION
    #              | |       |   ||||       |         |
            qemu-kvm-1782  [002] ...1   140.931636: kvm_emulate_insn: 0:c102fa25:89 10 (prot32)
            qemu-kvm-1781  [004] ...1   140.931637: kvm_emulate_insn: 0:c102fa25:89 10 (prot32)
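
    A minimal sketch of how user space could parse the event payload shown
    above, assuming the print fmt "%x:%llx:%s (%s)%s" and space-separated hex
    bytes from __print_hex. The function name and the returned dictionary
    keys are hypothetical; this is not the parser used by any existing tool.

```python
import re

# Payload layout assumed from the print fmt: csbase:rip:hex bytes (mode)[ failed]
_EVENT_RE = re.compile(
    r"^(?P<csbase>[0-9a-f]+):(?P<rip>[0-9a-f]+):"
    r"(?P<insn>[0-9a-f]{2}(?: [0-9a-f]{2})*) "
    r"\((?P<mode>\w+)\)(?P<failed> failed)?$"
)


def parse_kvm_emulate_insn(payload):
    """Split the text after 'kvm_emulate_insn: ' into typed fields."""
    match = _EVENT_RE.match(payload.strip())
    if match is None:
        raise ValueError("unrecognized kvm_emulate_insn payload: %r" % payload)
    return {
        "csbase": int(match.group("csbase"), 16),
        "rip": int(match.group("rip"), 16),
        "insn": bytes.fromhex(match.group("insn")),
        "mode": match.group("mode"),
        "failed": match.group("failed") is not None,
    }


if __name__ == "__main__":
    print(parse_kvm_emulate_insn("0:c102fa25:89 10 (prot32)"))
```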

    Link: http://lkml.kernel.org/n/tip-wfw6y3b9ugtey8snaow9nmg5@git.kernel.org
    Link: http://lkml.kernel.org/r/1340757701-10711-2-git-send-email-namhyung@kernel.org

    Cc: Arnaldo Carvalho de Melo
    Cc: Frederic Weisbecker
    Cc: Peter Zijlstra
    Cc: Ingo Molnar
    Cc: Namhyung Kim
    Cc: kvm@vger.kernel.org
    Acked-by: Avi Kivity
    Signed-off-by: Namhyung Kim
    Signed-off-by: Steven Rostedt

    Namhyung Kim
     

28 Jun, 2012

1 commit

  • x86 has no instruction-level support for flush_tlb_range. Currently,
    flush_tlb_range is simply implemented by flushing the whole TLB. That is
    not the best solution for all scenarios. In fact, if we just use 'invlpg'
    to flush a few entries from the TLB, we gain performance from later
    accesses to the TLB entries that remain.

    But the 'invlpg' instruction is expensive. Its execution time is
    comparable to rewriting cr3, and even a bit higher on SNB CPUs.

    So, on a CPU with 512 4KB TLB entries, the balance point is at:
    (512 - X) * 100ns(assumed TLB refill cost) =
    X(TLB flush entries) * 100ns(assumed invlpg cost)

    Here, X is 256, that is 1/2 of the 512 entries.

    But with the mysterious CPU pre-fetcher and page miss handler unit, the
    assumed TLB refill cost is far lower than 100ns for sequential access. And
    2 HT siblings in one core make memory access faster if they are
    accessing the same memory. So, in this patch, I only make the change when
    the number of target entries is less than 1/16 of all active TLB entries.
    Actually, I have no data to support the fraction '1/16', so any
    suggestions are welcome.
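
    The balance-point arithmetic and the 1/16 cutoff above can be sketched as
    follows. The function names are hypothetical, the 100ns costs are the
    assumed figures from the message, and this is not the kernel's code.

```python
def balance_point(tlb_entries, refill_ns, invlpg_ns):
    """Solve (tlb_entries - x) * refill_ns == x * invlpg_ns for x."""
    return tlb_entries * refill_ns / (refill_ns + invlpg_ns)


def should_flush_by_page(pages_to_flush, active_tlb_entries, shift=4):
    """Use per-page invalidation only below 1/2**shift of the active entries.

    shift=4 corresponds to the 1/16 fraction chosen in the message.
    """
    return pages_to_flush < (active_tlb_entries >> shift)


if __name__ == "__main__":
    # Equal refill and invlpg costs give the 1/2 balance point.
    print(balance_point(512, 100, 100))
    # A cheaper refill (an assumed 25ns) moves the balance point lower.
    print(balance_point(512, 25, 100))
    print(should_flush_by_page(31, 512), should_flush_by_page(32, 512))
```

    With equal costs the balance point is half of the entries; any refill
    cost below the invlpg cost pushes it lower, which is the motivation for
    a more conservative cutoff.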

    As for hugetlb, I guess that because of the smaller page table and fewer
    active TLB entries, I didn't see any benefit in my benchmark, so there is
    no optimization for it now.

    My micro benchmark shows that in ideal scenarios, read performance
    improves by 70 percent. In the worst scenario, read/write performance
    is similar to the unpatched 3.4-rc4 kernel.

    Here is the read data on my 2P * 4 cores * HT NHM EP machine, with THP
    'always':

    Multi-thread testing; the '-t' parameter is the thread number:

                         with patch   unpatched 3.4-rc4
    ./mprotect -t 1      14ns         24ns
    ./mprotect -t 2      13ns         22ns
    ./mprotect -t 4      12ns         19ns
    ./mprotect -t 8      14ns         16ns
    ./mprotect -t 16     28ns         26ns
    ./mprotect -t 32     54ns         51ns
    ./mprotect -t 128    200ns        199ns

    Single process with sequential flushing and memory access:

                                        with patch   unpatched 3.4-rc4
    ./mprotect                          7ns          11ns
    ./mprotect -p 4096 -l 8 -n 10240    21ns         21ns

    [ hpa: http://lkml.kernel.org/r/1B4B44D9196EFF41AE41FDA404FC0A100BFF94@SHSMSX101.ccr.corp.intel.com
    has additional performance numbers. ]

    Signed-off-by: Alex Shi
    Link: http://lkml.kernel.org/r/1340845344-27557-3-git-send-email-alex.shi@intel.com
    Signed-off-by: H. Peter Anvin

    Alex Shi
     

18 Jun, 2012

1 commit

  • This is a preparatory patch for the KVM/ARM implementation. KVM/ARM will use
    the KVM_IRQ_LINE ioctl, which is currently conditional on
    __KVM_HAVE_IOAPIC, but ARM obviously doesn't have any IOAPIC support and we
    need a separate define.

    Signed-off-by: Christoffer Dall
    Signed-off-by: Avi Kivity

    Christoffer Dall
     

14 Jun, 2012

1 commit


07 Jun, 2012

1 commit

  • In the current code, a short dyntick-idle interval (where there is
    at least one non-lazy callback on the CPU) and a long dyntick-idle
    interval (where there are only lazy callbacks on the CPU) are traced
    identically, which can be less than helpful. This commit therefore
    emits different event traces in these two cases.

    Signed-off-by: Paul E. McKenney
    Signed-off-by: Paul E. McKenney
    Tested-by: Heiko Carstens
    Tested-by: Pascal Chapperon

    Paul E. McKenney
     

30 May, 2012

4 commits

  • There is little motivation for reclaim_mode_t once RECLAIM_MODE_[A]SYNC
    and lumpy reclaim have been removed. This patch gets rid of
    reclaim_mode_t as well and improves the documentation about what
    reclaim/compaction is and when it is triggered.

    Signed-off-by: Mel Gorman
    Acked-by: Rik van Riel
    Acked-by: KOSAKI Motohiro
    Cc: Konstantin Khlebnikov
    Cc: Hugh Dickins
    Cc: Ying Han
    Cc: Andy Whitcroft
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mel Gorman
     
  • This patch stops reclaim/compaction from entering sync reclaim, as that
    was only intended for lumpy reclaim and was an oversight. Page migration has
    its own logic for stalling on writeback pages if necessary and memory
    compaction is already using it.

    Waiting on page writeback is bad for a number of reasons but the primary
    one is that waiting on writeback to a slow device like USB can take a
    considerable length of time. Page reclaim instead uses
    wait_iff_congested() to throttle if too many dirty pages are being
    scanned.

    Signed-off-by: Mel Gorman
    Acked-by: Rik van Riel
    Acked-by: KOSAKI Motohiro
    Cc: Konstantin Khlebnikov
    Cc: Hugh Dickins
    Cc: Ying Han
    Cc: Andy Whitcroft
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mel Gorman
     
  • This series removes lumpy reclaim and some stalling logic that was
    unintentionally being used by memory compaction. The end result is that
    stalling on dirty pages during page reclaim now depends on
    wait_iff_congested().

    Four kernels were compared:

    3.3.0     vanilla
    3.4.0-rc2 vanilla
    3.4.0-rc2 lumpyremove-v2 is patch one from this series
    3.4.0-rc2 nosync-v2r3    is the full series

    Removing lumpy reclaim saves almost 900 bytes of text whereas the full
    series removes 1200 bytes.

       text    data     bss      dec    hex filename
    6740375 1927944 2260992 10929311 a6c49f vmlinux-3.4.0-rc2-vanilla
    6739479 1927944 2260992 10928415 a6c11f vmlinux-3.4.0-rc2-lumpyremove-v2
    6739159 1927944 2260992 10928095 a6bfdf vmlinux-3.4.0-rc2-nosync-v2
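
    The savings quoted above follow directly from the text column. A small
    sketch of the arithmetic, with the sizes copied from the table and a
    hypothetical helper name:

```python
# Text segment sizes in bytes, copied from the size output above.
TEXT_BYTES = {
    "vanilla": 6740375,
    "lumpyremove-v2": 6739479,
    "nosync-v2": 6739159,
}


def text_saved(variant, baseline="vanilla"):
    """Return how many bytes of text a variant saves over the baseline."""
    return TEXT_BYTES[baseline] - TEXT_BYTES[variant]


if __name__ == "__main__":
    for name in ("lumpyremove-v2", "nosync-v2"):
        print(name, text_saved(name))
```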

    There are behaviour changes in the series and so tests were run with
    monitoring of ftrace events. This disrupts results so the performance
    results are distorted but the new behaviour should be clearer.

    fs-mark running in a threaded configuration showed little of interest as
    it did not push reclaim aggressively

    FS-Mark Multi Threaded
                     3.3.0-vanilla           rc2-vanilla             lumpyremove-v2r3        nosync-v2r3
    Files/s min           3.20 ( 0.00%)           3.20 ( 0.00%)           3.20 ( 0.00%)           3.20 ( 0.00%)
    Files/s mean          3.20 ( 0.00%)           3.20 ( 0.00%)           3.20 ( 0.00%)           3.20 ( 0.00%)
    Files/s stddev        0.00 ( 0.00%)           0.00 ( 0.00%)           0.00 ( 0.00%)           0.00 ( 0.00%)
    Files/s max           3.20 ( 0.00%)           3.20 ( 0.00%)           3.20 ( 0.00%)           3.20 ( 0.00%)
    Overhead min     508667.00 ( 0.00%)      521350.00 (-2.49%)      544292.00 (-7.00%)      547168.00 (-7.57%)
    Overhead mean    551185.00 ( 0.00%)      652690.73 (-18.42%)     991208.40 (-79.83%)     570130.53 (-3.44%)
    Overhead stddev   18200.69 ( 0.00%)      331958.29 (-1723.88%)  1579579.43 (-8578.68%)     9576.81 (47.38%)
    Overhead max     576775.00 ( 0.00%)     1846634.00 (-220.17%)   6901055.00 (-1096.49%)   585675.00 (-1.54%)
    MMTests Statistics: duration
    Sys Time Running Test (seconds)          309.90   300.95   307.33   298.95
    User+Sys Time Running Test (seconds)     319.32   309.67   315.69   307.51
    Total Elapsed Time (seconds)            1187.85  1193.09  1191.98  1193.73

    MMTests Statistics: vmstat
    Page Ins                       80532       82212       81420       79480
    Page Outs                  111434984   111456240   111437376   111582628
    Swap Ins                           0           0           0           0
    Swap Outs                          0           0           0           0
    Direct pages scanned           44881       27889       27453       34843
    Kswapd pages scanned        25841428    25860774    25861233    25843212
    Kswapd pages reclaimed      25841393    25860741    25861199    25843179
    Direct pages reclaimed         44881       27889       27453       34843
    Kswapd efficiency                99%         99%         99%         99%
    Kswapd velocity            21754.791   21675.460   21696.029   21649.127
    Direct efficiency               100%        100%        100%        100%
    Direct velocity               37.783      23.375      23.031      29.188
    Percentage direct scans           0%          0%          0%          0%
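
    The derived rows in the table above appear consistent with two simple
    formulas: efficiency as pages reclaimed over pages scanned (truncated to
    a whole percent), and velocity as pages scanned per second of elapsed
    time. These formulas are inferred from the figures and are not stated in
    the message; the helper names are hypothetical.

```python
def efficiency_pct(reclaimed, scanned):
    """Whole-percent share of scanned pages that were reclaimed.

    Reporting 100 when nothing was scanned is an assumption made here.
    """
    if scanned == 0:
        return 100
    return reclaimed * 100 // scanned


def velocity(scanned, elapsed_seconds):
    """Pages scanned per second of total elapsed test time."""
    return scanned / elapsed_seconds


if __name__ == "__main__":
    # First column (3.3.0-vanilla) of the FS-Mark run above.
    elapsed = 1187.85
    print(efficiency_pct(25841393, 25841428), "%")
    print(round(velocity(25841428, elapsed), 3))
    print(round(velocity(44881, elapsed), 3))
```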

    ftrace showed that there was no stalling on writeback or pages submitted
    for IO from reclaim context.

    postmark was similar and while it was more interesting, it also did not
    push reclaim heavily.

    POSTMARK
                                        3.3.0-vanilla        rc2-vanilla          lumpyremove-v2r3     nosync-v2r3
    Transactions per second:              16.00 ( 0.00%)       20.00 (25.00%)       18.00 (12.50%)       17.00 ( 6.25%)
    Data megabytes read per second:       18.80 ( 0.00%)       24.27 (29.10%)       22.26 (18.40%)       20.54 ( 9.26%)
    Data megabytes written per second:    35.83 ( 0.00%)       46.25 (29.08%)       42.42 (18.39%)       39.14 ( 9.24%)
    Files created alone per second:       28.00 ( 0.00%)       38.00 (35.71%)       34.00 (21.43%)       30.00 ( 7.14%)
    Files create/transact per second:      8.00 ( 0.00%)       10.00 (25.00%)        9.00 (12.50%)        8.00 ( 0.00%)
    Files deleted alone per second:      556.00 ( 0.00%)     1224.00 (120.14%)    3062.00 (450.72%)    6124.00 (1001.44%)
    Files delete/transact per second:      8.00 ( 0.00%)       10.00 (25.00%)        9.00 (12.50%)        8.00 ( 0.00%)

    MMTests Statistics: duration
    Sys Time Running Test (seconds)          113.34   107.99   109.73   108.72
    User+Sys Time Running Test (seconds)     145.51   139.81   143.32   143.55
    Total Elapsed Time (seconds)            1159.16   899.23   980.17  1062.27

    MMTests Statistics: vmstat
    Page Ins                    13710192    13729032    13727944    13760136
    Page Outs                   43071140    42987228    42733684    42931624
    Swap Ins                           0           0           0           0
    Swap Outs                          0           0           0           0
    Direct pages scanned               0           0           0           0
    Kswapd pages scanned         9941613     9937443     9939085     9929154
    Kswapd pages reclaimed       9940926     9936751     9938397     9928465
    Direct pages reclaimed             0           0           0           0
    Kswapd efficiency                99%         99%         99%         99%
    Kswapd velocity             8576.567   11051.058   10140.164    9347.109
    Direct efficiency               100%        100%        100%        100%
    Direct velocity                0.000       0.000       0.000       0.000

    It looks here as though the full series regresses performance, but as
    ftrace showed no usage of wait_iff_congested() or sync reclaim, I am
    assuming it is a disruption due to monitoring. Other data such as memory
    usage, page IO and swap IO all looked similar.

    Running a benchmark with a plain DD showed nothing very interesting.
    The full series stalled in wait_iff_congested() slightly less but stall
    times on vanilla kernels were marginal.

    Running a benchmark that hammered on file-backed mappings showed stalls
    due to congestion but not in sync writebacks

    MICRO
                                         3.3.0-vanilla  rc2-vanilla  lumpyremove-v2r3  nosync-v2r3
    MMTests Statistics: duration
    Sys Time Running Test (seconds)          308.13   294.50   298.75   299.53
    User+Sys Time Running Test (seconds)     330.45   316.28   318.93   320.79
    Total Elapsed Time (seconds)            1814.90  1833.88  1821.14  1832.91

    MMTests Statistics: vmstat
    Page Ins                      108712      120708       97224      110344
    Page Outs                  155514576   156017404   155813676   156193256
    Swap Ins                           0           0           0           0
    Swap Outs                          0           0           0           0
    Direct pages scanned         2599253     1550480     2512822     2414760
    Kswapd pages scanned        69742364    71150694    68839041    69692533
    Kswapd pages reclaimed      34824488    34773341    34796602    34799396
    Direct pages reclaimed         53693       94750       61792       75205
    Kswapd efficiency                49%         48%         50%         49%
    Kswapd velocity            38427.662   38797.901   37799.972   38022.889
    Direct efficiency                 2%          6%          2%          3%
    Direct velocity             1432.174     845.464    1379.807    1317.446
    Percentage direct scans           3%          2%          3%          3%
    Page writes by reclaim             0           0           0           0
    Page writes file                   0           0           0           0
    Page writes anon                   0           0           0           0
    Page reclaim immediate             0           0           0        1218
    Page rescued immediate             0           0           0           0
    Slabs scanned                  15360       16384       13312       16384
    Direct inode steals                0           0           0           0
    Kswapd inode steals             4340        4327        1630        4323

    FTrace Reclaim Statistics: congestion_wait
    Direct number congest waited 0 0 0 0
    Direct time congest waited 0ms 0ms 0ms 0ms
    Direct full congest waited 0 0 0 0
    Direct number conditional waited 900 870 754 789
    Direct time conditional waited 0ms 0ms 0ms 20ms
    Direct full conditional waited 0 0 0 0
    KSwapd number congest waited 2106 2308 2116 1915
    KSwapd time congest waited 139924ms 157832ms 125652ms 132516ms
    KSwapd full congest waited 1346 1530 1202 1278
    KSwapd number conditional waited 12922 16320 10943 14670
    KSwapd time conditional waited 0ms 0ms 0ms 0ms
    KSwapd full conditional waited 0 0 0 0

    Reclaim statistics are not radically changed. The stall times in kswapd
    are massive but it is clear that it is due to calls to congestion_wait()
    and that is almost certainly the call in balance_pgdat(). Otherwise
    stalls due to dirty pages are non-existent.

    I ran a benchmark that stressed high-order allocation. This is a very
    artificial load but was used in the past to evaluate lumpy reclaim and
    compaction. Generally I look at allocation success rates and latency
    figures.

    STRESS-HIGHALLOC
                     3.3.0-vanilla       rc2-vanilla  lumpyremove-v2r3       nosync-v2r3
    Pass 1          81.00 ( 0.00%)   28.00 (-53.00%)   24.00 (-57.00%)   28.00 (-53.00%)
    Pass 2          82.00 ( 0.00%)   39.00 (-43.00%)   38.00 (-44.00%)   43.00 (-39.00%)
    while Rested    88.00 ( 0.00%)    87.00 (-1.00%)    88.00 ( 0.00%)    88.00 ( 0.00%)

    MMTests Statistics: duration
    Sys Time Running Test (seconds) 740.93 681.42 685.14 684.87
    User+Sys Time Running Test (seconds) 2922.65 3269.52 3281.35 3279.44
    Total Elapsed Time (seconds) 1161.73 1152.49 1159.55 1161.44

    MMTests Statistics: vmstat
    Page Ins 4486020 2807256 2855944 2876244
    Page Outs 7261600 7973688 7975320 7986120
    Swap Ins 31694 0 0 0
    Swap Outs 98179 0 0 0
    Direct pages scanned 53494 57731 34406 113015
    Kswapd pages scanned 6271173 1287481 1278174 1219095
    Kswapd pages reclaimed 2029240 1281025 1260708 1201583
    Direct pages reclaimed 1468 14564 16649 92456
    Kswapd efficiency 32% 99% 98% 98%
    Kswapd velocity 5398.133 1117.130 1102.302 1049.641
    Direct efficiency 2% 25% 48% 81%
    Direct velocity 46.047 50.092 29.672 97.306
    Percentage direct scans 0% 4% 2% 8%
    Page writes by reclaim 1616049 0 0 0
    Page writes file 1517870 0 0 0
    Page writes anon 98179 0 0 0
    Page reclaim immediate 103778 27339 9796 17831
    Page rescued immediate 0 0 0 0
    Slabs scanned 1096704 986112 980992 998400
    Direct inode steals 223 215040 216736 247881
    Kswapd inode steals 175331 61548 68444 63066
    Kswapd skipped wait 21991 0 1 0
    THP fault alloc 1 135 125 134
    THP collapse alloc 393 311 228 236
    THP splits 25 13 7 8
    THP fault fallback 0 0 0 0
    THP collapse fail 3 5 7 7
    Compaction stalls 865 1270 1422 1518
    Compaction success 370 401 353 383
    Compaction failures 495 869 1069 1135
    Compaction pages moved 870155 3828868 4036106 4423626
    Compaction move failure 26429 23865 29742 27514

    Success rates are completely hosed for 3.4-rc2 which is almost certainly
    due to commit fe2c2a106663 ("vmscan: reclaim at order 0 when compaction
    is enabled"). I expected this would happen for kswapd and impair
    allocation success rates (https://lkml.org/lkml/2012/1/25/166) but I did
    not anticipate this much of a difference: 80% less scanning and 37% less
    reclaim by kswapd.
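
    The derived rows in the vmstat table above can be reproduced from the
    raw counters. The sketch below assumes, based on the printed figures,
    that efficiency is pages reclaimed over pages scanned truncated to a
    whole percent, that velocity is pages scanned per second of elapsed
    time, and that a scan count of zero is reported as 100% efficiency. The
    function names are illustrative and are not taken from MMTests:

```python
# Counters copied from the STRESS-HIGHALLOC tables above
# (3.3.0-vanilla and rc2-vanilla columns).
vanilla = {"elapsed": 1161.73, "kswapd_scanned": 6271173,
           "kswapd_reclaimed": 2029240, "direct_scanned": 53494}
rc2 = {"elapsed": 1152.49, "kswapd_scanned": 1287481,
       "kswapd_reclaimed": 1281025, "direct_scanned": 57731}

def efficiency(reclaimed, scanned):
    """Reclaimed/scanned as a truncated whole percentage (assumed formula)."""
    return 100 * reclaimed // scanned if scanned else 100

def velocity(scanned, elapsed):
    """Pages scanned per second of elapsed test time (assumed formula)."""
    return scanned / elapsed

def percent_direct(direct, kswapd):
    """Share of all scanning done by direct reclaim, truncated."""
    return 100 * direct // (direct + kswapd)

def percent_less(before, after):
    """How much smaller 'after' is than 'before', in percent."""
    return 100.0 * (before - after) / before

scan_drop = percent_less(vanilla["kswapd_scanned"], rc2["kswapd_scanned"])
reclaim_drop = percent_less(vanilla["kswapd_reclaimed"], rc2["kswapd_reclaimed"])
print(f"kswapd scanning down {scan_drop:.1f}%, reclaim down {reclaim_drop:.1f}%")
```

    With these inputs the two drops come out close to the 80% and 37%
    quoted above.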

    In comparison, reclaim/compaction is not aggressive and gives up easily
    which is the intended behaviour. hugetlbfs uses __GFP_REPEAT and would
    be much more aggressive about reclaim/compaction than THP allocations
    are. The stress test above is allocating like neither THP nor hugetlbfs
    but is much closer to THP.

    Mainline is now impaired in terms of high order allocation under heavy
    load although I do not know to what degree as I did not test with
    __GFP_REPEAT. Keep this in mind for bugs related to hugepage pool
    resizing, THP allocation and high order atomic allocation failures from
    network devices.

    In terms of congestion throttling, I see the following for this test

    FTrace Reclaim Statistics: congestion_wait
    Direct number congest waited            3        0        0        0
    Direct time congest waited            0ms      0ms      0ms      0ms
    Direct full congest waited              0        0        0        0
    Direct number conditional waited      957      512     1081     1075
    Direct time conditional waited        0ms      0ms      0ms      0ms
    Direct full conditional waited          0        0        0        0
    KSwapd number congest waited           36        4        3        5
    KSwapd time congest waited         3148ms    400ms    300ms    500ms
    KSwapd full congest waited             30        4        3        5
    KSwapd number conditional waited    88514      197      332      542
    KSwapd time conditional waited     4980ms      0ms      0ms      0ms
    KSwapd full conditional waited         49        0        0        0

    The "conditional waited" times are the most interesting as this is
    directly impacted by the number of dirty pages encountered during scan.
    As lumpy reclaim is no longer scanning contiguous ranges, it is finding
    fewer dirty pages. This brings wait times from about 5 seconds to 0.
    kswapd itself is still calling congestion_wait() so it'll still stall but
    it's a lot less.

    In terms of the type of IO we were doing, I see this

    FTrace Reclaim Statistics: mm_vmscan_writepage
    Direct writes anon sync              0        0        0        0
    Direct writes anon async             0        0        0        0
    Direct writes file sync              0        0        0        0
    Direct writes file async             0        0        0        0
    Direct writes mixed sync             0        0        0        0
    Direct writes mixed async            0        0        0        0
    KSwapd writes anon sync              0        0        0        0
    KSwapd writes anon async         91682        0        0        0
    KSwapd writes file sync              0        0        0        0
    KSwapd writes file async        822629        0        0        0
    KSwapd writes mixed sync             0        0        0        0
    KSwapd writes mixed async            0        0        0        0

    In 3.2, kswapd was doing a bunch of async writes of pages but
    reclaim/compaction was never reaching a point where it was doing sync
    IO. This does not guarantee that reclaim/compaction was not calling
    wait_on_page_writeback() but I would consider it unlikely. It indicates
    that merging patches 2 and 3 to stop reclaim/compaction calling
    wait_on_page_writeback() should be safe.

    This patch:

    Lumpy reclaim had a purpose but in the mind of some, it was to kick the
    system so hard it thrashed. For others the purpose was to complicate
    vmscan.c. Over time it was given softer shoes and a nicer attitude but
    memory compaction needs to step up and replace it so this patch sends
    lumpy reclaim to the farm.

    The tracepoint format changes for isolating LRU pages with this patch
    applied. Furthermore reclaim/compaction can no longer queue dirty pages
    in pageout() if the underlying BDI is congested. Lumpy reclaim used
    this logic and reclaim/compaction was using it in error.

    Signed-off-by: Mel Gorman
    Acked-by: Rik van Riel
    Acked-by: KOSAKI Motohiro
    Cc: Konstantin Khlebnikov
    Cc: Hugh Dickins
    Cc: Ying Han
    Cc: Andy Whitcroft
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mel Gorman
     
  • The swap token code no longer fits in with the current VM model. It
    does not play well with cgroups or the better NUMA placement code in
    development, since we have only one swap token globally.

    It also has the potential to mess with scalability of the system, by
    increasing the number of non-reclaimable pages on the active and
    inactive anon LRU lists.

    Last but not least, the swap token code has been broken for a year
    without complaints, as reported by Konstantin Khlebnikov. This suggests
    we no longer have much use for it.

    The days of sub-1G memory systems with heavy use of swap are over. If
    we ever need thrashing reducing code in the future, we will have to
    implement something that does scale.

    Signed-off-by: Rik van Riel
    Cc: Konstantin Khlebnikov
    Acked-by: Johannes Weiner
    Cc: Mel Gorman
    Cc: Hugh Dickins
    Acked-by: Bob Picco
    Acked-by: KOSAKI Motohiro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Rik van Riel
     

29 May, 2012

1 commit

  • Pull writeback tree from Wu Fengguang:
    "Mainly from Jan Kara to avoid iput() in the flusher threads."

    * tag 'writeback' of git://git.kernel.org/pub/scm/linux/kernel/git/wfg/linux:
    writeback: Avoid iput() from flusher thread
    vfs: Rename end_writeback() to clear_inode()
    vfs: Move waiting for inode writeback from end_writeback() to evict_inode()
    writeback: Refactor writeback_single_inode()
    writeback: Remove wb->list_lock from writeback_single_inode()
    writeback: Separate inode requeueing after writeback
    writeback: Move I_DIRTY_PAGES handling
    writeback: Move requeueing when I_SYNC set to writeback_sb_inodes()
    writeback: Move clearing of I_SYNC into inode_sync_complete()
    writeback: initialize global_dirty_limit
    fs: remove 8 bytes of padding from struct writeback_control on 64 bit builds
    mm: page-writeback.c: local functions should not be exposed globally

    Linus Torvalds
     

25 May, 2012

1 commit

  • Pull ext2, ext3 and quota fixes from Jan Kara:
    "Interesting bits are:
    - removal of a special i_mutex locking subclass (I_MUTEX_QUOTA) since
    quota code does not need i_mutex anymore in any unusual way.
    - backport (from ext4) of a fix of a checkpointing bug (missing cache
    flush) that could lead to fs corruption on power failure

    The rest are just random small fixes & cleanups."

    * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
    ext2: trivial fix to comment for ext2_free_blocks
    ext2: remove the redundant comment for ext2_export_ops
    ext3: return 32/64-bit dir name hash according to usage type
    quota: Get rid of nested I_MUTEX_QUOTA locking subclass
    quota: Use precomputed value of sb_dqopt in dquot_quota_sync
    ext2: Remove i_mutex use from ext2_quota_write()
    reiserfs: Remove i_mutex use from reiserfs_quota_write()
    ext4: Remove i_mutex use from ext4_quota_write()
    ext3: Remove i_mutex use from ext3_quota_write()
    quota: Fix double lock in add_dquot_ref() with CONFIG_QUOTA_DEBUG
    jbd: Write journal superblock with WRITE_FUA after checkpointing
    jbd: protect all log tail updates with j_checkpoint_mutex
    jbd: Split updating of journal superblock and marking journal empty
    ext2: do not register write_super within VFS
    ext2: Remove s_dirt handling
    ext2: write superblock only once on unmount
    ext3: update documentation with barrier=1 default
    ext3: remove max_debt in find_group_orlov()
    jbd: Refine commit writeout logic

    Linus Torvalds
     

24 May, 2012

3 commits

  • Pull user namespace enhancements from Eric Biederman:
    "This is a course correction for the user namespace, so that we can
    reach an inexpensive, maintainable, and reasonably complete
    implementation.

    Highlights:
    - Config guards make it impossible to enable the user namespace and
    code that has not been converted to be user namespace safe.

    - Use of the new kuid_t type ensures that if you somehow get past the
    config guards the kernel will encounter type errors if you enable
    user namespaces and attempt to compile in code whose permission
    checks have not been updated to be user namespace safe.

    - All uids from child user namespaces are mapped into the initial
    user namespace before they are processed. Removing the need to add
    an additional check to see if the user namespace of the compared
    uids remains the same.

    - With the user namespaces compiled out the performance is as good or
    better than it is today.

    - For most operations absolutely nothing changes, in performance or
    operationally, with the user namespace enabled.

    - The worst case performance I could come up with was timing 1
    billion cache cold stat operations with the user namespace code
    enabled. This went from 156s to 164s on my laptop (or 156ns to
    164ns per stat operation).

    - (uid_t)-1 and (gid_t)-1 are reserved as an internal error value.
    Most uid/gid setting system calls treat these values specially
    anyway so attempting to use -1 as a uid would likely cause
    entertaining failures in userspace.

    - If setuid is called with a uid that can not be mapped setuid fails.
    I have looked at sendmail, login, ssh and every other program I
    could think of that would call setuid and they all check for and
    handle the case where setuid fails.

    - If stat or a similar system call is called from a context in which
    we can not map a uid we lie and return overflowuid. The LFS
    experience suggests not lying and returning an error code might be
    better, but the historical precedent with uids is different and I
    can not think of anything that would break by lying about a uid we
    can't map.

    - Capabilities are localized to the current user namespace making it
    safe to give the initial user in a user namespace all capabilities.

    My git tree covers all of the modifications needed to convert the core
    kernel and enough changes to make a system bootable to runlevel 1."
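
    The worst-case figure quoted in the highlights can be checked with a few
    lines of arithmetic. This is only a sketch of the unit conversion; the
    156s and 164s totals and the count of 1 billion stat operations come
    from the message above, and the variable and function names are
    illustrative:

```python
OPS = 1_000_000_000          # stat operations timed
BEFORE_S, AFTER_S = 156, 164 # total seconds without/with user namespace code

def ns_per_op(total_seconds, ops):
    """Convert a total run time in seconds into nanoseconds per operation."""
    return total_seconds * 1e9 / ops

overhead_ns = ns_per_op(AFTER_S, OPS) - ns_per_op(BEFORE_S, OPS)
overhead_pct = 100.0 * (AFTER_S - BEFORE_S) / BEFORE_S
print(f"{overhead_ns:.0f} ns per stat, {overhead_pct:.1f}% slower")
```

    That is 8ns per cache cold stat, or roughly 5%, in the worst case the
    author could construct.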

    Fix up trivial conflicts due to nearby independent changes in fs/stat.c

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: (46 commits)
    userns: Silence silly gcc warning.
    cred: use correct cred accessor with regards to rcu read lock
    userns: Convert the move_pages, and migrate_pages permission checks to use uid_eq
    userns: Convert cgroup permission checks to use uid_eq
    userns: Convert tmpfs to use kuid and kgid where appropriate
    userns: Convert sysfs to use kgid/kuid where appropriate
    userns: Convert sysctl permission checks to use kuid and kgids.
    userns: Convert proc to use kuid/kgid where appropriate
    userns: Convert ext4 to user kuid/kgid where appropriate
    userns: Convert ext3 to use kuid/kgid where appropriate
    userns: Convert ext2 to use kuid/kgid where appropriate.
    userns: Convert devpts to use kuid/kgid where appropriate
    userns: Convert binary formats to use kuid/kgid where appropriate
    userns: Add negative depends on entries to avoid building code that is userns unsafe
    userns: signal remove unnecessary map_cred_ns
    userns: Teach inode_capable to understand inodes whose uids map to other namespaces.
    userns: Fail exec for suid and sgid binaries with ids outside our user namespace.
    userns: Convert stat to return values mapped from kuids and kgids
    userns: Convert user specfied uids and gids in chown into kuids and kgid
    userns: Use uid_eq gid_eq helpers when comparing kuids and kgids in the vfs
    ...

    Linus Torvalds
     
  • Pull power management updates from Rafael Wysocki:

    - Implementation of opportunistic suspend (autosleep) and user space
    interface for manipulating wakeup sources.

    - Hibernate updates from Bojan Smojver and Minho Ban.

    - Updates of the runtime PM core and generic PM domains framework
    related to PM QoS.

    - Assorted fixes.

    * tag 'pm-for-3.5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (25 commits)
    epoll: Fix user space breakage related to EPOLLWAKEUP
    PM / Domains: Make it possible to add devices to inactive domains
    PM / Hibernate: Use get_gendisk to verify partition if resume_file is integer format
    PM / Domains: Fix computation of maximum domain off time
    PM / Domains: Fix link checking when add subdomain
    PM / Sleep: User space wakeup sources garbage collector Kconfig option
    PM / Sleep: Make the limit of user space wakeup sources configurable
    PM / Documentation: suspend-and-cpuhotplug.txt: Fix typo
    PM / Domains: Cache device stop and domain power off governor results, v3
    PM / Domains: Make device removal more straightforward
    PM / Sleep: Fix a mistake in a conditional in autosleep_store()
    epoll: Add a flag, EPOLLWAKEUP, to prevent suspend while epoll events are ready
    PM / QoS: Create device constraints objects on notifier registration
    PM / Runtime: Remove device fields related to suspend time, v2
    PM / Domains: Rework default domain power off governor function, v2
    PM / Domains: Rework default device stop governor function, v2
    PM / Sleep: Add user space interface for manipulating wakeup sources, v3
    PM / Sleep: Add "prevent autosleep time" statistics to wakeup sources
    PM / Sleep: Implement opportunistic sleep, v2
    PM / Sleep: Add wakeup_source_activate and wakeup_source_deactivate tracepoints
    ...

    Linus Torvalds
     
  • Pull sound updates from Takashi Iwai:
    "This is the first big chunk for 3.5 merges of sound stuff.

    There are a few big changes in different areas. First off, the
    streaming logic of USB-audio endpoints has been largely rewritten for
    the better support of "implicit feedback". If anything about USB got
    broken, this change has to be checked.

    For HD-audio, the resume procedure was changed; instead of delaying
    the resume of the hardware until the first use, now waking up
    immediately at resume. This is for buggy BIOS.

    For ASoC, dynamic PCM support and the improved support for digital
    links between off-SoC devices are major framework changes.

    Some highlights are below:

    * HD-audio
    - Avoid accesses of invalid pin-control bits that may stall the codec
    - V-ref setup cleanups
    - Fix the races in power-saving code
    - Fix the races in codec cache hashes and connection lists
    - Split some common codes for BIOS auto-parser to hda_auto_parser.c
    - Changed the PM resume code to wake up immediately for buggy BIOS
    - Creative SoundCore3D support
    - Add Conexant CX20751/2/3/4 codec support

    * ASoC
    - Dynamic PCM support, allowing support for SoCs with internal
    routing through components with tight sequencing and formatting
    constraints within their internal paths or where there are multiple
    components connected with CPU managed DMA controllers inside the
    SoC.
    - Greatly improved support for direct digital links between off-SoC
    devices, providing a much simpler way of connecting things like
    digital basebands to CODECs.
    - Much more fine grained and robust locking, cleaning up some of the
    confusion that crept in with multi-component.
    - CPU support for nVidia Tegra 30 I2S and audio hub controllers and
    ST-Ericsson MSP I2S controllers
    - New CODEC drivers for Cirrus CS42L52, LAPIS Semiconductor ML26124,
    Texas Instruments LM49453.
    - Some regmap changes needed by the Tegra I2S driver.
    - mc13783 audio support.

    * Misc
    - Rewrite with module_pci_driver()
    - Xonar DGX support for snd-oxygen
    - Improvement of packet handling in snd-firewire driver
    - New USB-endpoint streaming logic
    - Enhanced M-audio FTU quirks and relevant cleanups
    - Increment the support of OSS devices to 256
    - snd-aloop accuracy improvement

    There are a few more pending changes for 3.5, but they will be sent
    slightly later as partly depending on the changes of DRM."

    Fix up conflicts in regmap (due to duplicate patches, with some further
    updates then having already come in from the regmap tree). Also some
    fairly trivial context conflicts in the imx and mcx soc drivers.

    * tag 'sound-3.5' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (280 commits)
    ALSA: snd-usb: fix stream info output in /proc
    ALSA: pcm - Add proper state checks to snd_pcm_drain()
    ALSA: sh: Fix up namespace collision in sh_dac_audio.
    ALSA: hda/realtek - Fix unused variable compile warning
    ASoC: sh: fsi: enable chip specific data transfer mode
    ASoC: sh: fsi: call fsi_hw_startup/shutdown from fsi_dai_trigger()
    ASoC: sh: fsi: use same format for IN/OUT
    ASoC: sh: fsi: add fsi_version() and removed meaningless version check
    ASoC: sh: fsi: use register field macro name on IN/OUT_DMAC
    ASoC: tegra: Add machine driver for WM8753 codec
    ALSA: hda - Fix possible races of accesses to connection list array
    ASoC: OMAP: HDMI: Introduce codec
    ARM: mx31_3ds: Add sound support
    ASoC: imx-mc13783 cleanup
    mx31moboard: Add sound support
    ASoC: mc13783 codec cleanups
    ASoC: add imx-mc13783 sound support
    ASoC: Add mc13783 codec
    mfd: mc13xxx: add codec platform data
    ASoC: don't flip master of DT-instantiated DAI links
    ...

    Linus Torvalds
     

23 May, 2012

1 commit

  • Pull trivial updates from Jiri Kosina:
    "As usual, it's mostly typo fixes, redundant code elimination and some
    documentation updates."

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (57 commits)
    edac, mips: don't change code that has been removed in edac/mips tree
    xtensa: Change mail addresses of Hannes Weiner and Oskar Schirmer
    lib: Change mail address of Oskar Schirmer
    net: Change mail address of Oskar Schirmer
    arm/m68k: Change mail address of Sebastian Hess
    i2c: Change mail address of Oskar Schirmer
    net: Fix tcp_build_and_update_options comment in struct tcp_sock
    atomic64_32.h: fix parameter naming mismatch
    Kconfig: replace "--- help ---" with "---help---"
    c2port: fix bogus Kconfig "default no"
    edac: Fix spelling errors.
    qla1280: Remove redundant NULL check before release_firmware() call
    remoteproc: remove redundant NULL check before release_firmware()
    qla2xxx: Remove redundant NULL check before release_firmware() call.
    aic94xx: Get rid of redundant NULL check before release_firmware() call
    tehuti: delete redundant NULL check before release_firmware()
    qlogic: get rid of a redundant test for NULL before call to release_firmware()
    bna: remove redundant NULL test before release_firmware()
    tg3: remove redundant NULL test before release_firmware() call
    typhoon: get rid of redundant conditional before all to release_firmware()
    ...

    Linus Torvalds
     

16 May, 2012

2 commits