Eric Lee / smarc-fsl-linux-kernel

10 Oct, 2012

1 commit

72055425e Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs ... Browse Code »

Pull btrfs update from Chris Mason:
"This is a large pull, with the bulk of the updates coming from:

- Hole punching

- send/receive fixes

- fsync performance

- Disk format extension allowing more hardlinks inside a single
directory (btrfs-progs patch required to enable the compat bit for
this one)

I'm cooking more unrelated RAID code, but I wanted to make sure this
original batch makes it in. The largest updates here are relatively
old and have been in testing for some time."

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: (121 commits)
btrfs: init ref_index to zero in add_inode_ref
Btrfs: remove repeated eb->pages check in, disk-io.c/csum_dirty_buffer
Btrfs: fix page leakage
Btrfs: do not warn_on when we cannot alloc a page for an extent buffer
Btrfs: don't bug on enomem in readpage
Btrfs: cleanup pages properly when ENOMEM in compression
Btrfs: make filesystem read-only when submitting barrier fails
Btrfs: detect corrupted filesystem after write I/O errors
Btrfs: make compress and nodatacow mount options mutually exclusive
btrfs: fix message printing
Btrfs: don't bother committing delayed inode updates when fsyncing
btrfs: move inline function code to header file
Btrfs: remove unnecessary IS_ERR in bio_readpage_error()
btrfs: remove unused function btrfs_insert_some_items()
Btrfs: don't commit instead of overcommitting
Btrfs: confirmation of value is added before trace_btrfs_get_extent() is called
Btrfs: be smarter about dropping things from the tree log
Btrfs: don't lookup csums for prealloc extents
Btrfs: cache extent state when writing out dirty metadata pages
Btrfs: do not hold the file extent leaf locked when adding extent item
...

Linus Torvalds
2012-10-10 09:49:20 +0800

09 Oct, 2012

1 commit

c65434592 mm: remove __GFP_NO_KSWAPD ... Browse Code »

When transparent huge pages were introduced, memory compaction and swap
storms were an issue, and the kernel had to be careful to not make THP
allocations cause pageout or compaction.

Now that we have working compaction deferral, kswapd is smart enough to
invoke compaction and the quadratic behaviour around isolate_free_pages
has been fixed, it should be safe to remove __GFP_NO_KSWAPD.

[minchan@kernel.org: Comment fix]
[mgorman@suse.de: Avoid direct reclaim for deferred compaction]
Cc: Andrea Arcangeli
Signed-off-by: Rik van Riel
Signed-off-by: Mel Gorman
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Rik van Riel
2012-10-09 15:22:15 +0800

08 Oct, 2012

1 commit

6432f2128 Merge tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 ... Browse Code »

Pull ext4 updates from Ted Ts'o:
"The big new feature added this time is supporting online resizing
using the meta_bg feature. This allows us to resize file systems
which are greater than 16TB. In addition, the speed of online
resizing has been improved in general.

We also fix a number of races, some of which could lead to deadlocks,
in ext4's Asynchronous I/O and online defrag support, thanks to good
work by Dmitry Monakhov.

There are also a large number of more minor bug fixes and cleanups
from a number of other ext4 contributors, quite of few of which have
submitted fixes for the first time."

* tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (69 commits)
ext4: fix ext4_flush_completed_IO wait semantics
ext4: fix mtime update in nodelalloc mode
ext4: fix ext_remove_space for punch_hole case
ext4: punch_hole should wait for DIO writers
ext4: serialize truncate with owerwrite DIO workers
ext4: endless truncate due to nonlocked dio readers
ext4: serialize unlocked dio reads with truncate
ext4: serialize dio nonlocked reads with defrag workers
ext4: completed_io locking cleanup
ext4: fix unwritten counter leakage
ext4: give i_aiodio_unwritten a more appropriate name
ext4: ext4_inode_info diet
ext4: convert to use leXX_add_cpu()
ext4: ext4_bread usage audit
fs: reserve fallocate flag codepoint
ext4: remove redundant offset check in mext_check_arguments()
ext4: don't clear orphan list on ro mount with errors
jbd2: fix assertion failure in commit code due to lacking transaction credits
ext4: release donor reference when EXT4_IOC_MOVE_EXT ioctl fails
ext4: enable FITRIM ioctl on bigalloc file system
...

Linus Torvalds
2012-10-08 05:36:39 +0800

03 Oct, 2012

1 commit

a1ce39288 UAPI: (Scripted) Convert #include "..." to #include <path/...> in kernel system headers ... Browse Code »

Convert #include "..." to #include in kernel system headers.

Signed-off-by: David Howells
Acked-by: Arnd Bergmann
Acked-by: Thomas Gleixner
Acked-by: Paul E. McKenney
Acked-by: Dave Jones

David Howells
2012-10-03 01:01:25 +0800

02 Oct, 2012

2 commits

dea7d76ec Btrfs: update delayed ref's tracepoints to show sequence ... Browse Code »

We've added a new field 'sequence' to delayed ref node, so update related
tracepoints.

Signed-off-by: Liu Bo

Liu Bo
2012-10-02 03:19:17 +0800
99dbb1632 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial ... Browse Code »

Pull the trivial tree from Jiri Kosina:
"Tiny usual fixes all over the place"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (34 commits)
doc: fix old config name of kprobetrace
fs/fs-writeback.c: cleanup riteback_sb_inodes kerneldoc
btrfs: fix the commment for the action flags in delayed-ref.h
btrfs: fix trivial typo for the comment of BTRFS_FREE_INO_OBJECTID
vfs: fix kerneldoc for generic_fh_to_parent()
treewide: fix comment/printk/variable typos
ipr: fix small coding style issues
doc: fix broken utf8 encoding
nfs: comment fix
platform/x86: fix asus_laptop.wled_type module parameter
mfd: printk/comment fixes
doc: getdelays.c: remember to close() socket on error in create_nl_socket()
doc: aliasing-test: close fd on write error
mmc: fix comment typos
dma: fix comments
spi: fix comment/printk typos in spi
Coccinelle: fix typo in memdup_user.cocci
tmiofb: missing NULL pointer checks
tools: perf: Fix typo in tools/perf
tools/testing: fix comment / output typos
...

Linus Torvalds
2012-10-02 00:06:36 +0800

21 Sep, 2012

1 commit

85f2a2ef1 tracing: Don't call page_to_pfn() if page is NULL ... Browse Code »

When allocating memory fails, page is NULL. page_to_pfn() will
cause the kernel panicked if we don't use sparsemem vmemmap.

Link: http://lkml.kernel.org/r/505AB1FF.8020104@cn.fujitsu.com

Cc: Frederic Weisbecker
Cc: Ingo Molnar
Cc: Andrew Morton
Cc: stable
Acked-by: Mel Gorman
Reviewed-by: Minchan Kim
Signed-off-by: Wen Congyang
Signed-off-by: Steven Rostedt

Wen Congyang
2012-09-21 03:51:16 +0800

02 Sep, 2012

1 commit

4907cb7b1 treewide: fix comment/printk/variable typos ... Browse Code »

Signed-off-by: Anatol Pomozov
Signed-off-by: Jiri Kosina

Anatol Pomozov
2012-09-02 01:33:05 +0800

17 Aug, 2012

2 commits

813702917 ext4: add missing space to trace message ... Browse Code »

Signed-off-by: Anatol Pomozov
Signed-off-by: "Theodore Ts'o"

Anatol Pomozov
2012-08-17 21:52:17 +0800
210c05264 ext4: realign trace events structs to make it smaller ... Browse Code »

Most hardware architectures require that data (including struct fields)
have to be aligned in memory. To make it happen compiler inserts padding
between struct fields if they are not aligned correctly.

Reorder fields to remove paddings and make structures denser. Making data
smaller saves some memory that is very important for trace events.
Tracing buffer has limited size and making objects smaller we can put more
of them without overflowing the tracing buffer.

To find data struct holes I used 'pahole -H 1 -E -I vmlinux.o' from
'dwarves' package.

Signed-off-by: Anatol Pomozov
Signed-off-by: "Theodore Ts'o"

Anatol Pomozov
2012-08-17 21:50:17 +0800

04 Aug, 2012

1 commit

bd463a060 Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip ... Browse Code »

Pull perf fixes from Ingo Molnar:
"Fix merge window fallout and fix sleep profiling (this was always
broken, so it's not a fix for the merge window - we can skip this one
from the head of the tree)."

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/trace: Add ability to set a target task for events
perf/x86: Fix USER/KERNEL tagging of samples properly
perf/x86/intel/uncore: Make UNCORE_PMU_HRTIMER_INTERVAL 64-bit

Linus Torvalds
2012-08-04 01:57:20 +0800

01 Aug, 2012

3 commits

ac694dbdb Merge branch 'akpm' (Andrew's patch-bomb) ... Browse Code »

Merge Andrew's second set of patches:
- MM
- a few random fixes
- a couple of RTC leftovers

* emailed patches from Andrew Morton : (120 commits)
rtc/rtc-88pm80x: remove unneed devm_kfree
rtc/rtc-88pm80x: assign ret only when rtc_register_driver fails
mm: hugetlbfs: close race during teardown of hugetlbfs shared page tables
tmpfs: distribute interleave better across nodes
mm: remove redundant initialization
mm: warn if pg_data_t isn't initialized with zero
mips: zero out pg_data_t when it's allocated
memcg: gix memory accounting scalability in shrink_page_list
mm/sparse: remove index_init_lock
mm/sparse: more checks on mem_section number
mm/sparse: optimize sparse_index_alloc
memcg: add mem_cgroup_from_css() helper
memcg: further prevent OOM with too many dirty pages
memcg: prevent OOM with too many dirty pages
mm: mmu_notifier: fix freed page still mapped in secondary MMU
mm: memcg: only check anon swapin page charges for swap cache
mm: memcg: only check swap cache pages for repeated charging
mm: memcg: split swapin charge function into private and public part
mm: memcg: remove needless !mm fixup to init_mm when charging
mm: memcg: remove unneeded shmem charge type
...

Linus Torvalds
2012-08-01 10:25:39 +0800
3e9a97082 Merge tag 'random_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/random ... Browse Code »

Pull random subsystem patches from Ted Ts'o:
"This patch series contains a major revamp of how we collect entropy
from interrupts for /dev/random and /dev/urandom.

The goal is to addresses weaknesses discussed in the paper "Mining
your Ps and Qs: Detection of Widespread Weak Keys in Network Devices",
by Nadia Heninger, Zakir Durumeric, Eric Wustrow, J. Alex Halderman,
which will be published in the Proceedings of the 21st Usenix Security
Symposium, August 2012. (See https://factorable.net for more
information and an extended version of the paper.)"

Fix up trivial conflicts due to nearby changes in
drivers/{mfd/ab3100-core.c, usb/gadget/omap_udc.c}

* tag 'random_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/random: (33 commits)
random: mix in architectural randomness in extract_buf()
dmi: Feed DMI table to /dev/random driver
random: Add comment to random_initialize()
random: final removal of IRQF_SAMPLE_RANDOM
um: remove IRQF_SAMPLE_RANDOM which is now a no-op
sparc/ldc: remove IRQF_SAMPLE_RANDOM which is now a no-op
[ARM] pxa: remove IRQF_SAMPLE_RANDOM which is now a no-op
board-palmz71: remove IRQF_SAMPLE_RANDOM which is now a no-op
isp1301_omap: remove IRQF_SAMPLE_RANDOM which is now a no-op
pxa25x_udc: remove IRQF_SAMPLE_RANDOM which is now a no-op
omap_udc: remove IRQF_SAMPLE_RANDOM which is now a no-op
goku_udc: remove IRQF_SAMPLE_RANDOM which was commented out
uartlite: remove IRQF_SAMPLE_RANDOM which is now a no-op
drivers: hv: remove IRQF_SAMPLE_RANDOM which is now a no-op
xen-blkfront: remove IRQF_SAMPLE_RANDOM which is now a no-op
n2_crypto: remove IRQF_SAMPLE_RANDOM which is now a no-op
pda_power: remove IRQF_SAMPLE_RANDOM which is now a no-op
i2c-pmcmsp: remove IRQF_SAMPLE_RANDOM which is now a no-op
input/serio/hp_sdc.c: remove IRQF_SAMPLE_RANDOM which is now a no-op
mfd: remove IRQF_SAMPLE_RANDOM which is now a no-op
...

Linus Torvalds
2012-08-01 10:07:42 +0800
b37f1dd0f mm: introduce __GFP_MEMALLOC to allow access to emergency reserves ... Browse Code »

__GFP_MEMALLOC will allow the allocation to disregard the watermarks, much
like PF_MEMALLOC. It allows one to pass along the memalloc state in
object related allocation flags as opposed to task related flags, such as
sk->sk_allocation. This removes the need for ALLOC_PFMEMALLOC as callers
using __GFP_MEMALLOC can get the ALLOC_NO_WATERMARK flag which is now
enough to identify allocations related to page reclaim.

Signed-off-by: Peter Zijlstra
Signed-off-by: Mel Gorman
Cc: David Miller
Cc: Neil Brown
Cc: Mike Christie
Cc: Eric B Munson
Cc: Eric Dumazet
Cc: Sebastian Andrzej Siewior
Cc: Mel Gorman
Cc: Christoph Lameter
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Mel Gorman
2012-08-01 09:42:45 +0800

31 Jul, 2012

1 commit

e6dab5ffa perf/trace: Add ability to set a target task for events ... Browse Code »

A few events are interesting not only for a current task.
For example, sched_stat_* events are interesting for a task
which wakes up. For this reason, it will be good if such
events will be delivered to a target task too.

Now a target task can be set by using __perf_task().

The original idea and a draft patch belongs to Peter Zijlstra.

I need these events for profiling sleep times. sched_switch is used for
getting callchains and sched_stat_* is used for getting time periods.
These events are combined in user space, then it can be analyzed by
perf tools.

Inspired-by: Peter Zijlstra
Cc: Steven Rostedt
Cc: Paul Mackerras
Cc: Arnaldo Carvalho de Melo
Cc: Steven Rostedt
Cc: Arun Sharma
Signed-off-by: Andrew Vagin
Signed-off-by: Peter Zijlstra
Link: http://lkml.kernel.org/r/1342016098-213063-1-git-send-email-avagin@openvz.org
Signed-off-by: Ingo Molnar

Andrew Vagin
2012-07-31 23:02:05 +0800

27 Jul, 2012

1 commit

4cb38750d Merge branch 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip ... Browse Code »
92

Pull x86/mm changes from Peter Anvin:
"The big change here is the patchset by Alex Shi to use INVLPG to flush
only the affected pages when we only need to flush a small page range.

It also removes the special INVALIDATE_TLB_VECTOR interrupts (32
vectors!) and replace it with an ordinary IPI function call."

Fix up trivial conflicts in arch/x86/include/asm/apic.h (added code next
to changed line)

* 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/tlb: Fix build warning and crash when building for !SMP
x86/tlb: do flush_tlb_kernel_range by 'invlpg'
x86/tlb: replace INVALIDATE_TLB_VECTOR by CALL_FUNCTION_VECTOR
x86/tlb: enable tlb flush range support for x86
mm/mmu_gather: enable tlb flush range in generic mmu_gather
x86/tlb: add tlb_flushall_shift knob into debugfs
x86/tlb: add tlb_flushall_shift for specific CPU
x86/tlb: fall back to flush all when meet a THP large page
x86/flush_tlb: try flush_tlb_single one by one in flush_tlb_range
x86/tlb_info: get last level TLB entry number of CPU
x86: Add read_mostly declaration/definition to variables from smp.h
x86: Define early read-mostly per-cpu macros

Linus Torvalds
2012-07-27 04:17:17 +0800

25 Jul, 2012

2 commits

a08489c56 Merge branch 'for-3.6' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq ... Browse Code »
46

Pull workqueue changes from Tejun Heo:
"There are three major changes.

- WQ_HIGHPRI has been reimplemented so that high priority work items
are served by worker threads with -20 nice value from dedicated
highpri worker pools.

- CPU hotplug support has been reimplemented such that idle workers
are kept across CPU hotplug events. This makes CPU hotplug cheaper
(for PM) and makes the code simpler.

- flush_kthread_work() has been reimplemented so that a work item can
be freed while executing. This removes an annoying behavior
difference between kthread_worker and workqueue."

* 'for-3.6' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq:
workqueue: fix spurious CPU locality WARN from process_one_work()
kthread_worker: reimplement flush_kthread_work() to allow freeing the work item being executed
kthread_worker: reorganize to prepare for flush_kthread_work() reimplementation
workqueue: simplify CPU hotplug code
workqueue: remove CPU offline trustee
workqueue: don't butcher idle workers on an offline CPU
workqueue: reimplement CPU online rebinding to handle idle workers
workqueue: drop @bind from create_worker()
workqueue: use mutex for global_cwq manager exclusion
workqueue: ROGUE workers are UNBOUND workers
workqueue: drop CPU_DYING notifier operation
workqueue: perform cpu down operations from low priority cpu_notifier()
workqueue: reimplement WQ_HIGHPRI using a separate worker_pool
workqueue: introduce NR_WORKER_POOLS and for_each_worker_pool()
workqueue: separate out worker_pool flags
workqueue: use @pool instead of @gcwq or @cpu where applicable
workqueue: factor out worker_pool from global_cwq
workqueue: don't use WQ_HIGHPRI for unbound workqueues

Linus Torvalds
2012-07-25 08:46:16 +0800
5fecc9d8f Merge tag 'kvm-3.6-1' of git://git.kernel.org/pub/scm/virt/kvm/kvm ... Browse Code »

Pull KVM updates from Avi Kivity:
"Highlights include
- full big real mode emulation on pre-Westmere Intel hosts (can be
disabled with emulate_invalid_guest_state=0)
- relatively small ppc and s390 updates
- PCID/INVPCID support in guests
- EOI avoidance; 3.6 guests should perform better on 3.6 hosts on
interrupt intensive workloads)
- Lockless write faults during live migration
- EPT accessed/dirty bits support for new Intel processors"

Fix up conflicts in:
- Documentation/virtual/kvm/api.txt:

Stupid subchapter numbering, added next to each other.

- arch/powerpc/kvm/booke_interrupts.S:

PPC asm changes clashing with the KVM fixes

- arch/s390/include/asm/sigp.h, arch/s390/kvm/sigp.c:

Duplicated commits through the kvm tree and the s390 tree, with
subsequent edits in the KVM tree.

* tag 'kvm-3.6-1' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (93 commits)
KVM: fix race with level interrupts
x86, hyper: fix build with !CONFIG_KVM_GUEST
Revert "apic: fix kvm build on UP without IOAPIC"
KVM guest: switch to apic_set_eoi_write, apic_write
apic: add apic_set_eoi_write for PV use
KVM: VMX: Implement PCID/INVPCID for guests with EPT
KVM: Add x86_hyper_kvm to complete detect_hypervisor_platform check
KVM: PPC: Critical interrupt emulation support
KVM: PPC: e500mc: Fix tlbilx emulation for 64-bit guests
KVM: PPC64: booke: Set interrupt computation mode for 64-bit host
KVM: PPC: bookehv: Add ESR flag to Data Storage Interrupt
KVM: PPC: bookehv64: Add support for std/ld emulation.
booke: Added crit/mc exception handler for e500v2
booke/bookehv: Add host crit-watchdog exception support
KVM: MMU: document mmu-lock and fast page fault
KVM: MMU: fix kvm_mmu_pagetable_walk tracepoint
KVM: MMU: trace fast page fault
KVM: MMU: fast path of handling guest page fault
KVM: MMU: introduce SPTE_MMU_WRITEABLE bit
KVM: MMU: fold tlb flush judgement into mmu_spte_update
...

Linus Torvalds
2012-07-25 03:01:20 +0800

23 Jul, 2012

1 commit

2eafeb6a4 Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip ... Browse Code »

Pull perf events changes from Ingo Molnar:

"- kernel side:

- Intel uncore PMU support for Nehalem and Sandy Bridge CPUs, we
support both the events available via the MSR and via the PCI
access space.

- various uprobes cleanups and restructurings

- PMU driver quirks by microcode version and required x86 microcode
loader cleanups/robustization

- various tracing robustness updates

- static keys: remove obsolete static_branch()

- tooling side:

- GTK browser improvements

- perf report browser: support screenshots to file

- more automated tests

- perf kvm improvements

- perf bench refinements

- build environment improvements

- pipe mode improvements

- libtraceevent updates, we have now hopefully merged most bits with
the out of tree forked code base

... and many other goodies."

* 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (138 commits)
tracing: Check for allocation failure in __tracing_open()
perf/x86: Fix intel_perfmon_event_mapformatting
jump label: Remove static_branch()
tracepoint: Use static_key_false(), since static_branch() is deprecated
perf/x86: Uncore filter support for SandyBridge-EP
perf/x86: Detect number of instances of uncore CBox
perf/x86: Fix event constraint for SandyBridge-EP C-Box
perf/x86: Use 0xff as pseudo code for fixed uncore event
perf/x86: Save a few bytes in 'struct x86_pmu'
perf/x86: Add a microcode revision check for SNB-PEBS
perf/x86: Improve debug output in check_hw_exists()
perf/x86/amd: Unify AMD's generic and family 15h pmus
perf/x86: Move Intel specific code to intel_pmu_init()
perf/x86: Rename Intel specific macros
perf/x86: Fix USER/KERNEL tagging of samples
perf tools: Split event symbols arrays to hw and sw parts
perf tools: Split out PE_VALUE_SYM parsing token to SW and HW tokens
perf tools: Add empty rule for new line in event syntax parsing
perf test: Use ARRAY_SIZE in parse events tests
tools lib traceevent: Cleanup realloc use
...

Linus Torvalds
2012-07-23 02:10:36 +0800

15 Jul, 2012

1 commit

00ce1db1a random: add tracepoints for easier debugging and verification ... Browse Code »

Signed-off-by: "Theodore Ts'o"

Theodore Ts'o
2012-07-15 08:17:48 +0800

13 Jul, 2012

1 commit

bd7bdd43d workqueue: factor out worker_pool from global_cwq ... Browse Code »

Move worklist and all worker management fields from global_cwq into
the new struct worker_pool. worker_pool points back to the containing
gcwq. worker and cpu_workqueue_struct are updated to point to
worker_pool instead of gcwq too.

This change is mechanical and doesn't introduce any functional
difference other than rearranging of fields and an added level of
indirection in some places. This is to prepare for multiple pools per
gcwq.

v2: Comment typo fixes as suggested by Namhyung.

Signed-off-by: Tejun Heo
Cc: Namhyung Kim

Tejun Heo
2012-07-13 05:46:37 +0800

06 Jul, 2012

1 commit

35c2f48c6 Merge branch 'tip/perf/core' of git://git.kernel.org/pub/scm/linux/kernel/git/ro… ... Browse Code »

…stedt/linux-trace into perf/core

Pull tracing updates from Steve Rostedt.

Signed-off-by: Ingo Molnar <mingo@kernel.org>

Ingo Molnar
2012-07-06 17:12:17 +0800

03 Jul, 2012

1 commit

a83eff0a8 rcu: Add tracing for _rcu_barrier() ... Browse Code »

This commit adds event tracing for _rcu_barrier() execution. This
is defined only if RCU_TRACE=y.

Signed-off-by: Paul E. McKenney
Signed-off-by: Paul E. McKenney
Reviewed-by: Josh Triplett

Paul E. McKenney
2012-07-03 03:33:23 +0800

29 Jun, 2012

1 commit

b102f1d0f tracing/kvm: Use __print_hex() for kvm_emulate_insn tracepoint ... Browse Code »

The kvm_emulate_insn tracepoint used __print_insn()
for printing its instructions. However it makes the
format of the event hard to parse as it reveals TP
internals.

Fortunately, kernel provides __print_hex for almost
same purpose, we can use it instead of open coding
it. The user-space can be changed to parse it later.

That means raw kernel tracing will not be affected
by this change:

# cd /sys/kernel/debug/tracing/
# cat events/kvm/kvm_emulate_insn/format
name: kvm_emulate_insn
ID: 29
format:
...
print fmt: "%x:%llx:%s (%s)%s", REC->csbase, REC->rip, __print_hex(REC->insn, REC->len), \
__print_symbolic(REC->flags, { 0, "real" }, { (1 << 0) | (1 << 1), "vm16" }, \
{ (1 << 0), "prot16" }, { (1 << 0) | (1 << 2), "prot32" }, { (1 << 0) | (1 << 3), "prot64" }), \
REC->failed ? " failed" : ""

# echo 1 > events/kvm/kvm_emulate_insn/enable
# cat trace
# tracer: nop
#
# entries-in-buffer/entries-written: 2183/2183 #P:12
#
# _-----=> irqs-off
# / _----=> need-resched
# | / _---=> hardirq/softirq
# || / _--=> preempt-depth
# ||| / delay
# TASK-PID CPU# |||| TIMESTAMP FUNCTION
# | | | |||| | |
qemu-kvm-1782 [002] ...1 140.931636: kvm_emulate_insn: 0:c102fa25:89 10 (prot32)
qemu-kvm-1781 [004] ...1 140.931637: kvm_emulate_insn: 0:c102fa25:89 10 (prot32)

Link: http://lkml.kernel.org/n/tip-wfw6y3b9ugtey8snaow9nmg5@git.kernel.org
Link: http://lkml.kernel.org/r/1340757701-10711-2-git-send-email-namhyung@kernel.org

Cc: Arnaldo Carvalho de Melo
Cc: Frederic Weisbecker
Cc: Peter Zijlstra
Cc: Ingo Molnar
Cc: Namhyung Kim
Cc: kvm@vger.kernel.org
Acked-by: Avi Kivity
Signed-off-by: Namhyung Kim
Signed-off-by: Steven Rostedt

Namhyung Kim
2012-06-29 01:52:15 +0800

28 Jun, 2012

1 commit

e7b52ffd4 x86/flush_tlb: try flush_tlb_single one by one in flush_tlb_range ... Browse Code »

x86 has no flush_tlb_range support in instruction level. Currently the
flush_tlb_range just implemented by flushing all page table. That is not
the best solution for all scenarios. In fact, if we just use 'invlpg' to
flush few lines from TLB, we can get the performance gain from later
remain TLB lines accessing.

But the 'invlpg' instruction costs much of time. Its execution time can
compete with cr3 rewriting, and even a bit more on SNB CPU.

So, on a 512 4KB TLB entries CPU, the balance points is at:
(512 - X) * 100ns(assumed TLB refill cost) =
X(TLB flush entries) * 100ns(assumed invlpg cost)

Here, X is 256, that is 1/2 of 512 entries.

But with the mysterious CPU pre-fetcher and page miss handler Unit, the
assumed TLB refill cost is far lower then 100ns in sequential access. And
2 HT siblings in one core makes the memory access more faster if they are
accessing the same memory. So, in the patch, I just do the change when
the target entries is less than 1/16 of whole active tlb entries.
Actually, I have no data support for the percentage '1/16', so any
suggestions are welcomed.

As to hugetlb, guess due to smaller page table, and smaller active TLB
entries, I didn't see benefit via my benchmark, so no optimizing now.

My micro benchmark show in ideal scenarios, the performance improves 70
percent in reading. And in worst scenario, the reading/writing
performance is similar with unpatched 3.4-rc4 kernel.

Here is the reading data on my 2P * 4cores *HT NHM EP machine, with THP
'always':

multi thread testing, '-t' paramter is thread number:
with patch unpatched 3.4-rc4
./mprotect -t 1 14ns 24ns
./mprotect -t 2 13ns 22ns
./mprotect -t 4 12ns 19ns
./mprotect -t 8 14ns 16ns
./mprotect -t 16 28ns 26ns
./mprotect -t 32 54ns 51ns
./mprotect -t 128 200ns 199ns

Single process with sequencial flushing and memory accessing:

with patch unpatched 3.4-rc4
./mprotect 7ns 11ns
./mprotect -p 4096 -l 8 -n 10240
21ns 21ns

[ hpa: http://lkml.kernel.org/r/1B4B44D9196EFF41AE41FDA404FC0A100BFF94@SHSMSX101.ccr.corp.intel.com
has additional performance numbers. ]

Signed-off-by: Alex Shi
Link: http://lkml.kernel.org/r/1340845344-27557-3-git-send-email-alex.shi@intel.com
Signed-off-by: H. Peter Anvin

Alex Shi
2012-06-28 10:29:07 +0800

18 Jun, 2012

1 commit

a1e4ccb99 KVM: Introduce __KVM_HAVE_IRQ_LINE ... Browse Code »

This is a preparatory patch for the KVM/ARM implementation. KVM/ARM will use
the KVM_IRQ_LINE ioctl, which is currently conditional on
__KVM_HAVE_IOAPIC, but ARM obviously doesn't have any IOAPIC support and we
need a separate define.

Signed-off-by: Christoffer Dall
Signed-off-by: Avi Kivity

Christoffer Dall
2012-06-18 21:06:35 +0800

14 Jun, 2012

1 commit

dcce04894 KVM: trace events: update list of exit reasons ... Browse Code »

The list of exit reasons for the kvm_userspace_exit event was
missing recent additions; bring it into sync again.

Reviewed-by: Christian Borntraeger
Signed-off-by: Cornelia Huck
Signed-off-by: Marcelo Tosatti

Cornelia Huck
2012-06-14 07:53:46 +0800

07 Jun, 2012

1 commit

fd4b35268 rcu: Update RCU_FAST_NO_HZ tracing for lazy callbacks ... Browse Code »

In the current code, a short dyntick-idle interval (where there is
at least one non-lazy callback on the CPU) and a long dyntick-idle
interval (where there are only lazy callbacks on the CPU) are traced
identically, which can be less than helpful. This commit therefore
emits different event traces in these two cases.

Signed-off-by: Paul E. McKenney
Signed-off-by: Paul E. McKenney
Tested-by: Heiko Carstens
Tested-by: Pascal Chapperon

Paul E. McKenney
2012-06-07 11:43:27 +0800

30 May, 2012

4 commits

23b9da55c mm: vmscan: remove reclaim_mode_t ... Browse Code »

There is little motiviation for reclaim_mode_t once RECLAIM_MODE_[A]SYNC
and lumpy reclaim have been removed. This patch gets rid of
reclaim_mode_t as well and improves the documentation about what
reclaim/compaction is and when it is triggered.

Signed-off-by: Mel Gorman
Acked-by: Rik van Riel
Acked-by: KOSAKI Motohiro
Cc: Konstantin Khlebnikov
Cc: Hugh Dickins
Cc: Ying Han
Cc: Andy Whitcroft
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Mel Gorman
2012-05-30 07:22:19 +0800
41ac1999c mm: vmscan: do not stall on writeback during memory compaction ... Browse Code »

This patch stops reclaim/compaction entering sync reclaim as this was
only intended for lumpy reclaim and an oversight. Page migration has
its own logic for stalling on writeback pages if necessary and memory
compaction is already using it.

Waiting on page writeback is bad for a number of reasons but the primary
one is that waiting on writeback to a slow device like USB can take a
considerable length of time. Page reclaim instead uses
wait_iff_congested() to throttle if too many dirty pages are being
scanned.

Signed-off-by: Mel Gorman
Acked-by: Rik van Riel
Acked-by: KOSAKI Motohiro
Cc: Konstantin Khlebnikov
Cc: Hugh Dickins
Cc: Ying Han
Cc: Andy Whitcroft
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Mel Gorman
2012-05-30 07:22:19 +0800
c53919adc mm: vmscan: remove lumpy reclaim ... Browse Code »

This series removes lumpy reclaim and some stalling logic that was
unintentionally being used by memory compaction. The end result is that
stalling on dirty pages during page reclaim now depends on
wait_iff_congested().

Four kernels were compared

3.3.0 vanilla
3.4.0-rc2 vanilla
3.4.0-rc2 lumpyremove-v2 is patch one from this series
3.4.0-rc2 nosync-v2r3 is the full series

Removing lumpy reclaim saves almost 900 bytes of text whereas the full
series removes 1200 bytes.

text data bss dec hex filename
6740375 1927944 2260992 10929311 a6c49f vmlinux-3.4.0-rc2-vanilla
6739479 1927944 2260992 10928415 a6c11f vmlinux-3.4.0-rc2-lumpyremove-v2
6739159 1927944 2260992 10928095 a6bfdf vmlinux-3.4.0-rc2-nosync-v2

There are behaviour changes in the series and so tests were run with
monitoring of ftrace events. This disrupts results so the performance
results are distorted but the new behaviour should be clearer.

fs-mark running in a threaded configuration showed little of interest as
it did not push reclaim aggressively

FS-Mark Multi Threaded
3.3.0-vanilla rc2-vanilla lumpyremove-v2r3 nosync-v2r3
Files/s min 3.20 ( 0.00%) 3.20 ( 0.00%) 3.20 ( 0.00%) 3.20 ( 0.00%)
Files/s mean 3.20 ( 0.00%) 3.20 ( 0.00%) 3.20 ( 0.00%) 3.20 ( 0.00%)
Files/s stddev 0.00 ( 0.00%) 0.00 ( 0.00%) 0.00 ( 0.00%) 0.00 ( 0.00%)
Files/s max 3.20 ( 0.00%) 3.20 ( 0.00%) 3.20 ( 0.00%) 3.20 ( 0.00%)
Overhead min 508667.00 ( 0.00%) 521350.00 (-2.49%) 544292.00 (-7.00%) 547168.00 (-7.57%)
Overhead mean 551185.00 ( 0.00%) 652690.73 (-18.42%) 991208.40 (-79.83%) 570130.53 (-3.44%)
Overhead stddev 18200.69 ( 0.00%) 331958.29 (-1723.88%) 1579579.43 (-8578.68%) 9576.81 (47.38%)
Overhead max 576775.00 ( 0.00%) 1846634.00 (-220.17%) 6901055.00 (-1096.49%) 585675.00 (-1.54%)
MMTests Statistics: duration
Sys Time Running Test (seconds) 309.90 300.95 307.33 298.95
User+Sys Time Running Test (seconds) 319.32 309.67 315.69 307.51
Total Elapsed Time (seconds) 1187.85 1193.09 1191.98 1193.73

MMTests Statistics: vmstat
Page Ins 80532 82212 81420 79480
Page Outs 111434984 111456240 111437376 111582628
Swap Ins 0 0 0 0
Swap Outs 0 0 0 0
Direct pages scanned 44881 27889 27453 34843
Kswapd pages scanned 25841428 25860774 25861233 25843212
Kswapd pages reclaimed 25841393 25860741 25861199 25843179
Direct pages reclaimed 44881 27889 27453 34843
Kswapd efficiency 99% 99% 99% 99%
Kswapd velocity 21754.791 21675.460 21696.029 21649.127
Direct efficiency 100% 100% 100% 100%
Direct velocity 37.783 23.375 23.031 29.188
Percentage direct scans 0% 0% 0% 0%

ftrace showed that there was no stalling on writeback or pages submitted
for IO from reclaim context.

postmark was similar and while it was more interesting, it also did not
push reclaim heavily.

POSTMARK
3.3.0-vanilla rc2-vanilla lumpyremove-v2r3 nosync-v2r3
Transactions per second: 16.00 ( 0.00%) 20.00 (25.00%) 18.00 (12.50%) 17.00 ( 6.25%)
Data megabytes read per second: 18.80 ( 0.00%) 24.27 (29.10%) 22.26 (18.40%) 20.54 ( 9.26%)
Data megabytes written per second: 35.83 ( 0.00%) 46.25 (29.08%) 42.42 (18.39%) 39.14 ( 9.24%)
Files created alone per second: 28.00 ( 0.00%) 38.00 (35.71%) 34.00 (21.43%) 30.00 ( 7.14%)
Files create/transact per second: 8.00 ( 0.00%) 10.00 (25.00%) 9.00 (12.50%) 8.00 ( 0.00%)
Files deleted alone per second: 556.00 ( 0.00%) 1224.00 (120.14%) 3062.00 (450.72%) 6124.00 (1001.44%)
Files delete/transact per second: 8.00 ( 0.00%) 10.00 (25.00%) 9.00 (12.50%) 8.00 ( 0.00%)

MMTests Statistics: duration
Sys Time Running Test (seconds) 113.34 107.99 109.73 108.72
User+Sys Time Running Test (seconds) 145.51 139.81 143.32 143.55
Total Elapsed Time (seconds) 1159.16 899.23 980.17 1062.27

MMTests Statistics: vmstat
Page Ins 13710192 13729032 13727944 13760136
Page Outs 43071140 42987228 42733684 42931624
Swap Ins 0 0 0 0
Swap Outs 0 0 0 0
Direct pages scanned 0 0 0 0
Kswapd pages scanned 9941613 9937443 9939085 9929154
Kswapd pages reclaimed 9940926 9936751 9938397 9928465
Direct pages reclaimed 0 0 0 0
Kswapd efficiency 99% 99% 99% 99%
Kswapd velocity 8576.567 11051.058 10140.164 9347.109
Direct efficiency 100% 100% 100% 100%
Direct velocity 0.000 0.000 0.000 0.000

It looks like here that the full series regresses performance but as
ftrace showed no usage of wait_iff_congested() or sync reclaim I am
assuming it's a disruption due to monitoring. Other data such as memory
usage, page IO, swap IO all looked similar.

Running a benchmark with a plain DD showed nothing very interesting.
The full series stalled in wait_iff_congested() slightly less but stall
times on vanilla kernels were marginal.

Running a benchmark that hammered on file-backed mappings showed stalls
due to congestion but not in sync writebacks

MICRO
3.3.0-vanilla rc2-vanilla lumpyremove-v2r3 nosync-v2r3
MMTests Statistics: duration
Sys Time Running Test (seconds) 308.13 294.50 298.75 299.53
User+Sys Time Running Test (seconds) 330.45 316.28 318.93 320.79
Total Elapsed Time (seconds) 1814.90 1833.88 1821.14 1832.91

MMTests Statistics: vmstat
Page Ins
Page Outs
Swap Ins
Swap Outs
Direct pages scanned
Kswapd pages scanned
Kswapd pages reclaimed
Direct pages reclaimed
Kswapd efficiency
Kswapd velocity
Direct efficiency
Direct velocity
Percentage direct scans
Page writes by reclaim
Page writes file
Page writes anon
Page reclaim immediate
Page rescued immediate
Slabs scanned
Direct inode steals
Kswapd inode steals 108712 120708 97224 110344 155514576 156017404 155813676 156193256 0 0 0 0 0 0 0 0 2599253 1550480 2512822 2414760 69742364 71150694 68839041 69692533 34824488 34773341 34796602 34799396 53693 94750 61792 75205 49% 48% 50% 49% 38427.662 38797.901 37799.972 38022.889 2% 6% 2% 3% 1432.174 845.464 1379.807 1317.446 3% 2% 3% 3% 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1218 0 0 0 0 15360 16384 13312 16384 0 0 0 0 4340 4327 1630 4323

FTrace Reclaim Statistics: congestion_wait
Direct number congest waited 0 0 0 0
Direct time congest waited 0ms 0ms 0ms 0ms
Direct full congest waited 0 0 0 0
Direct number conditional waited 900 870 754 789
Direct time conditional waited 0ms 0ms 0ms 20ms
Direct full conditional waited 0 0 0 0
KSwapd number congest waited 2106 2308 2116 1915
KSwapd time congest waited 139924ms 157832ms 125652ms 132516ms
KSwapd full congest waited 1346 1530 1202 1278
KSwapd number conditional waited 12922 16320 10943 14670
KSwapd time conditional waited 0ms 0ms 0ms 0ms
KSwapd full conditional waited 0 0 0 0

Reclaim statistics are not radically changed. The stall times in kswapd
are massive but it is clear that it is due to calls to congestion_wait()
and that is almost certainly the call in balance_pgdat(). Otherwise
stalls due to dirty pages are non-existant.

I ran a benchmark that stressed high-order allocation. This is very
artifical load but was used in the past to evaluate lumpy reclaim and
compaction. Generally I look at allocation success rates and latency
figures.

STRESS-HIGHALLOC
3.3.0-vanilla rc2-vanilla lumpyremove-v2r3 nosync-v2r3
Pass 1 81.00 ( 0.00%) 28.00 (-53.00%) 24.00 (-57.00%) 28.00 (-53.00%)
Pass 2 82.00 ( 0.00%) 39.00 (-43.00%) 38.00 (-44.00%) 43.00 (-39.00%)
while Rested 88.00 ( 0.00%) 87.00 (-1.00%) 88.00 ( 0.00%) 88.00 ( 0.00%)

MMTests Statistics: duration
Sys Time Running Test (seconds) 740.93 681.42 685.14 684.87
User+Sys Time Running Test (seconds) 2922.65 3269.52 3281.35 3279.44
Total Elapsed Time (seconds) 1161.73 1152.49 1159.55 1161.44

MMTests Statistics: vmstat
Page Ins 4486020 2807256 2855944 2876244
Page Outs 7261600 7973688 7975320 7986120
Swap Ins 31694 0 0 0
Swap Outs 98179 0 0 0
Direct pages scanned 53494 57731 34406 113015
Kswapd pages scanned 6271173 1287481 1278174 1219095
Kswapd pages reclaimed 2029240 1281025 1260708 1201583
Direct pages reclaimed 1468 14564 16649 92456
Kswapd efficiency 32% 99% 98% 98%
Kswapd velocity 5398.133 1117.130 1102.302 1049.641
Direct efficiency 2% 25% 48% 81%
Direct velocity 46.047 50.092 29.672 97.306
Percentage direct scans 0% 4% 2% 8%
Page writes by reclaim 1616049 0 0 0
Page writes file 1517870 0 0 0
Page writes anon 98179 0 0 0
Page reclaim immediate 103778 27339 9796 17831
Page rescued immediate 0 0 0 0
Slabs scanned 1096704 986112 980992 998400
Direct inode steals 223 215040 216736 247881
Kswapd inode steals 175331 61548 68444 63066
Kswapd skipped wait 21991 0 1 0
THP fault alloc 1 135 125 134
THP collapse alloc 393 311 228 236
THP splits 25 13 7 8
THP fault fallback 0 0 0 0
THP collapse fail 3 5 7 7
Compaction stalls 865 1270 1422 1518
Compaction success 370 401 353 383
Compaction failures 495 869 1069 1135
Compaction pages moved 870155 3828868 4036106 4423626
Compaction move failure 26429 23865 29742 27514

Success rates are completely hosed for 3.4-rc2 which is almost certainly
due to commit fe2c2a106663 ("vmscan: reclaim at order 0 when compaction
is enabled"). I expected this would happen for kswapd and impair
allocation success rates (https://lkml.org/lkml/2012/1/25/166) but I did
not anticipate this much a difference: 80% less scanning, 37% less
reclaim by kswapd

In comparison, reclaim/compaction is not aggressive and gives up easily
which is the intended behaviour. hugetlbfs uses __GFP_REPEAT and would
be much more aggressive about reclaim/compaction than THP allocations
are. The stress test above is allocating like neither THP or hugetlbfs
but is much closer to THP.

Mainline is now impaired in terms of high order allocation under heavy
load although I do not know to what degree as I did not test with
__GFP_REPEAT. Keep this in mind for bugs related to hugepage pool
resizing, THP allocation and high order atomic allocation failures from
network devices.

In terms of congestion throttling, I see the following for this test

FTrace Reclaim Statistics: congestion_wait
Direct number congest waited 3 0 0 0
Direct time congest waited 0ms 0ms 0ms 0ms
Direct full congest waited 0 0 0 0
Direct number conditional waited 957 512 1081 1075
Direct time conditional waited 0ms 0ms 0ms 0ms
Direct full conditional waited 0 0 0 0
KSwapd number congest waited 36 4 3 5
KSwapd time congest waited 3148ms 400ms 300ms 500ms
KSwapd full congest waited 30 4 3 5
KSwapd number conditional waited 88514 197 332 542
KSwapd time conditional waited 4980ms 0ms 0ms 0ms
KSwapd full conditional waited 49 0 0 0

The "conditional waited" times are the most interesting as this is
directly impacted by the number of dirty pages encountered during scan.
As lumpy reclaim is no longer scanning contiguous ranges, it is finding
fewer dirty pages. This brings wait times from about 5 seconds to 0.
kswapd itself is still calling congestion_wait() so it'll still stall but
it's a lot less.

In terms of the type of IO we were doing, I see this

FTrace Reclaim Statistics: mm_vmscan_writepage
Direct writes anon sync 0 0 0 0
Direct writes anon async 0 0 0 0
Direct writes file sync 0 0 0 0
Direct writes file async 0 0 0 0
Direct writes mixed sync 0 0 0 0
Direct writes mixed async 0 0 0 0
KSwapd writes anon sync 0 0 0 0
KSwapd writes anon async 91682 0 0 0
KSwapd writes file sync 0 0 0 0
KSwapd writes file async 822629 0 0 0
KSwapd writes mixed sync 0 0 0 0
KSwapd writes mixed async 0 0 0 0

In 3.2, kswapd was doing a bunch of async writes of pages but
reclaim/compaction was never reaching a point where it was doing sync
IO. This does not guarantee that reclaim/compaction was not calling
wait_on_page_writeback() but I would consider it unlikely. It indicates
that merging patches 2 and 3 to stop reclaim/compaction calling
wait_on_page_writeback() should be safe.

This patch:

Lumpy reclaim had a purpose but in the mind of some, it was to kick the
system so hard it trashed. For others the purpose was to complicate
vmscan.c. Over time it was giving softer shoes and a nicer attitude but
memory compaction needs to step up and replace it so this patch sends
lumpy reclaim to the farm.

The tracepoint format changes for isolating LRU pages with this patch
applied. Furthermore reclaim/compaction can no longer queue dirty pages
in pageout() if the underlying BDI is congested. Lumpy reclaim used
this logic and reclaim/compaction was using it in error.

Signed-off-by: Mel Gorman
Acked-by: Rik van Riel
Acked-by: KOSAKI Motohiro
Cc: Konstantin Khlebnikov
Cc: Hugh Dickins
Cc: Ying Han
Cc: Andy Whitcroft
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Mel Gorman
2012-05-30 07:22:19 +0800
e709ffd61 mm: remove swap token code ... Browse Code »

The swap token code no longer fits in with the current VM model. It
does not play well with cgroups or the better NUMA placement code in
development, since we have only one swap token globally.

It also has the potential to mess with scalability of the system, by
increasing the number of non-reclaimable pages on the active and
inactive anon LRU lists.

Last but not least, the swap token code has been broken for a year
without complaints, as reported by Konstantin Khlebnikov. This suggests
we no longer have much use for it.

The days of sub-1G memory systems with heavy use of swap are over. If
we ever need thrashing reducing code in the future, we will have to
implement something that does scale.

Signed-off-by: Rik van Riel
Cc: Konstantin Khlebnikov
Acked-by: Johannes Weiner
Cc: Mel Gorman
Cc: Hugh Dickins
Acked-by: Bob Picco
Acked-by: KOSAKI Motohiro
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Rik van Riel
2012-05-30 07:22:19 +0800

29 May, 2012

1 commit

90324cc1b Merge tag 'writeback' of git://git.kernel.org/pub/scm/linux/kernel/git/wfg/linux ... Browse Code »

Pull writeback tree from Wu Fengguang:
"Mainly from Jan Kara to avoid iput() in the flusher threads."

* tag 'writeback' of git://git.kernel.org/pub/scm/linux/kernel/git/wfg/linux:
writeback: Avoid iput() from flusher thread
vfs: Rename end_writeback() to clear_inode()
vfs: Move waiting for inode writeback from end_writeback() to evict_inode()
writeback: Refactor writeback_single_inode()
writeback: Remove wb->list_lock from writeback_single_inode()
writeback: Separate inode requeueing after writeback
writeback: Move I_DIRTY_PAGES handling
writeback: Move requeueing when I_SYNC set to writeback_sb_inodes()
writeback: Move clearing of I_SYNC into inode_sync_complete()
writeback: initialize global_dirty_limit
fs: remove 8 bytes of padding from struct writeback_control on 64 bit builds
mm: page-writeback.c: local functions should not be exposed globally

Linus Torvalds
2012-05-29 00:54:45 +0800

25 May, 2012

1 commit

ece78b7df Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs ... Browse Code »

Pull ext2, ext3 and quota fixes from Jan Kara:
"Interesting bits are:
- removal of a special i_mutex locking subclass (I_MUTEX_QUOTA) since
quota code does not need i_mutex anymore in any unusual way.
- backport (from ext4) of a fix of a checkpointing bug (missing cache
flush) that could lead to fs corruption on power failure

The rest are just random small fixes & cleanups."

* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
ext2: trivial fix to comment for ext2_free_blocks
ext2: remove the redundant comment for ext2_export_ops
ext3: return 32/64-bit dir name hash according to usage type
quota: Get rid of nested I_MUTEX_QUOTA locking subclass
quota: Use precomputed value of sb_dqopt in dquot_quota_sync
ext2: Remove i_mutex use from ext2_quota_write()
reiserfs: Remove i_mutex use from reiserfs_quota_write()
ext4: Remove i_mutex use from ext4_quota_write()
ext3: Remove i_mutex use from ext3_quota_write()
quota: Fix double lock in add_dquot_ref() with CONFIG_QUOTA_DEBUG
jbd: Write journal superblock with WRITE_FUA after checkpointing
jbd: protect all log tail updates with j_checkpoint_mutex
jbd: Split updating of journal superblock and marking journal empty
ext2: do not register write_super within VFS
ext2: Remove s_dirt handling
ext2: write superblock only once on unmount
ext3: update documentation with barrier=1 default
ext3: remove max_debt in find_group_orlov()
jbd: Refine commit writeout logic

Linus Torvalds
2012-05-25 23:14:59 +0800

24 May, 2012

3 commits

644473e9c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace ... Browse Code »

Pull user namespace enhancements from Eric Biederman:
"This is a course correction for the user namespace, so that we can
reach an inexpensive, maintainable, and reasonably complete
implementation.

Highlights:
- Config guards make it impossible to enable the user namespace and
code that has not been converted to be user namespace safe.

- Use of the new kuid_t type ensures the if you somehow get past the
config guards the kernel will encounter type errors if you enable
user namespaces and attempt to compile in code whose permission
checks have not been updated to be user namespace safe.

- All uids from child user namespaces are mapped into the initial
user namespace before they are processed. Removing the need to add
an additional check to see if the user namespace of the compared
uids remains the same.

- With the user namespaces compiled out the performance is as good or
better than it is today.

- For most operations absolutely nothing changes performance or
operationally with the user namespace enabled.

- The worst case performance I could come up with was timing 1
billion cache cold stat operations with the user namespace code
enabled. This went from 156s to 164s on my laptop (or 156ns to
164ns per stat operation).

- (uid_t)-1 and (gid_t)-1 are reserved as an internal error value.
Most uid/gid setting system calls treat these value specially
anyway so attempting to use -1 as a uid would likely cause
entertaining failures in userspace.

- If setuid is called with a uid that can not be mapped setuid fails.
I have looked at sendmail, login, ssh and every other program I
could think of that would call setuid and they all check for and
handle the case where setuid fails.

- If stat or a similar system call is called from a context in which
we can not map a uid we lie and return overflowuid. The LFS
experience suggests not lying and returning an error code might be
better, but the historical precedent with uids is different and I
can not think of anything that would break by lying about a uid we
can't map.

- Capabilities are localized to the current user namespace making it
safe to give the initial user in a user namespace all capabilities.

My git tree covers all of the modifications needed to convert the core
kernel and enough changes to make a system bootable to runlevel 1."

Fix up trivial conflicts due to nearby independent changes in fs/stat.c

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: (46 commits)
userns: Silence silly gcc warning.
cred: use correct cred accessor with regards to rcu read lock
userns: Convert the move_pages, and migrate_pages permission checks to use uid_eq
userns: Convert cgroup permission checks to use uid_eq
userns: Convert tmpfs to use kuid and kgid where appropriate
userns: Convert sysfs to use kgid/kuid where appropriate
userns: Convert sysctl permission checks to use kuid and kgids.
userns: Convert proc to use kuid/kgid where appropriate
userns: Convert ext4 to user kuid/kgid where appropriate
userns: Convert ext3 to use kuid/kgid where appropriate
userns: Convert ext2 to use kuid/kgid where appropriate.
userns: Convert devpts to use kuid/kgid where appropriate
userns: Convert binary formats to use kuid/kgid where appropriate
userns: Add negative depends on entries to avoid building code that is userns unsafe
userns: signal remove unnecessary map_cred_ns
userns: Teach inode_capable to understand inodes whose uids map to other namespaces.
userns: Fail exec for suid and sgid binaries with ids outside our user namespace.
userns: Convert stat to return values mapped from kuids and kgids
userns: Convert user specfied uids and gids in chown into kuids and kgid
userns: Use uid_eq gid_eq helpers when comparing kuids and kgids in the vfs
...

Linus Torvalds
2012-05-24 08:42:39 +0800
468f4d1a8 Merge tag 'pm-for-3.5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm ... Browse Code »

Pull power management updates from Rafael Wysocki:

- Implementation of opportunistic suspend (autosleep) and user space
interface for manipulating wakeup sources.

- Hibernate updates from Bojan Smojver and Minho Ban.

- Updates of the runtime PM core and generic PM domains framework
related to PM QoS.

- Assorted fixes.

* tag 'pm-for-3.5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (25 commits)
epoll: Fix user space breakage related to EPOLLWAKEUP
PM / Domains: Make it possible to add devices to inactive domains
PM / Hibernate: Use get_gendisk to verify partition if resume_file is integer format
PM / Domains: Fix computation of maximum domain off time
PM / Domains: Fix link checking when add subdomain
PM / Sleep: User space wakeup sources garbage collector Kconfig option
PM / Sleep: Make the limit of user space wakeup sources configurable
PM / Documentation: suspend-and-cpuhotplug.txt: Fix typo
PM / Domains: Cache device stop and domain power off governor results, v3
PM / Domains: Make device removal more straightforward
PM / Sleep: Fix a mistake in a conditional in autosleep_store()
epoll: Add a flag, EPOLLWAKEUP, to prevent suspend while epoll events are ready
PM / QoS: Create device constraints objects on notifier registration
PM / Runtime: Remove device fields related to suspend time, v2
PM / Domains: Rework default domain power off governor function, v2
PM / Domains: Rework default device stop governor function, v2
PM / Sleep: Add user space interface for manipulating wakeup sources, v3
PM / Sleep: Add "prevent autosleep time" statistics to wakeup sources
PM / Sleep: Implement opportunistic sleep, v2
PM / Sleep: Add wakeup_source_activate and wakeup_source_deactivate tracepoints
...

Linus Torvalds
2012-05-24 05:07:06 +0800
2e341ca68 Merge tag 'sound-3.5' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound ... Browse Code »

Pull sound updates from Takashi Iwai:
"This is the first big chunk for 3.5 merges of sound stuff.

There are a few big changes in different areas. First off, the
streaming logic of USB-audio endpoints has been largely rewritten for
the better support of "implicit feedback". If anything about USB got
broken, this change has to be checked.

For HD-audio, the resume procedure was changed; instead of delaying
the resume of the hardware until the first use, now waking up
immediately at resume. This is for buggy BIOS.

For ASoC, dynamic PCM support and the improved support for digital
links between off-SoC devices are major framework changes.

Some highlights are below:

* HD-audio
- Avoid accesses of invalid pin-control bits that may stall the codec
- V-ref setup cleanups
- Fix the races in power-saving code
- Fix the races in codec cache hashes and connection lists
- Split some common codes for BIOS auto-parser to hda_auto_parser.c
- Changed the PM resume code to wake up immediately for buggy BIOS
- Creative SoundCore3D support
- Add Conexant CX20751/2/3/4 codec support

* ASoC
- Dynamic PCM support, allowing support for SoCs with internal
routing through components with tight sequencing and formatting
constraints within their internal paths or where there are multiple
components connected with CPU managed DMA controllers inside the
SoC.
- Greatly improved support for direct digital links between off-SoC
devices, providing a much simpler way of connecting things like
digital basebands to CODECs.
- Much more fine grained and robust locking, cleaning up some of the
confusion that crept in with multi-component.
- CPU support for nVidia Tegra 30 I2S and audio hub controllers and
ST-Ericsson MSP I2S controolers
- New CODEC drivers for Cirrus CS42L52, LAPIS Semiconductor ML26124,
Texas Instruments LM49453.
- Some regmap changes needed by the Tegra I2S driver.
- mc13783 audio support.

* Misc
- Rewrite with module_pci_driver()
- Xonar DGX support for snd-oxygen
- Improvement of packet handling in snd-firewire driver
- New USB-endpoint streaming logic
- Enhanced M-audio FTU quirks and relevant cleanups
- Increment the support of OSS devices to 256
- snd-aloop accuracy improvement

There are a few more pending changes for 3.5, but they will be sent
slightly later as partly depending on the changes of DRM."

Fix up conflicts in regmap (due to duplicate patches, with some further
updates then having already come in from the regmap tree). Also some
fairly trivial context conflicts in the imx and mcx soc drivers.

* tag 'sound-3.5' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (280 commits)
ALSA: snd-usb: fix stream info output in /proc
ALSA: pcm - Add proper state checks to snd_pcm_drain()
ALSA: sh: Fix up namespace collision in sh_dac_audio.
ALSA: hda/realtek - Fix unused variable compile warning
ASoC: sh: fsi: enable chip specific data transfer mode
ASoC: sh: fsi: call fsi_hw_startup/shutdown from fsi_dai_trigger()
ASoC: sh: fsi: use same format for IN/OUT
ASoC: sh: fsi: add fsi_version() and removed meaningless version check
ASoC: sh: fsi: use register field macro name on IN/OUT_DMAC
ASoC: tegra: Add machine driver for WM8753 codec
ALSA: hda - Fix possible races of accesses to connection list array
ASoC: OMAP: HDMI: Introduce codec
ARM: mx31_3ds: Add sound support
ASoC: imx-mc13783 cleanup
mx31moboard: Add sound support
ASoC: mc13783 codec cleanups
ASoC: add imx-mc13783 sound support
ASoC: Add mc13783 codec
mfd: mc13xxx: add codec platform data
ASoC: don't flip master of DT-instantiated DAI links
...

Linus Torvalds
2012-05-24 04:05:43 +0800

23 May, 2012

1 commit

e8650a082 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial ... Browse Code »

Pull trivial updates from Jiri Kosina:
"As usual, it's mostly typo fixes, redundant code elimination and some
documentation updates."

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (57 commits)
edac, mips: don't change code that has been removed in edac/mips tree
xtensa: Change mail addresses of Hannes Weiner and Oskar Schirmer
lib: Change mail address of Oskar Schirmer
net: Change mail address of Oskar Schirmer
arm/m68k: Change mail address of Sebastian Hess
i2c: Change mail address of Oskar Schirmer
net: Fix tcp_build_and_update_options comment in struct tcp_sock
atomic64_32.h: fix parameter naming mismatch
Kconfig: replace "--- help ---" with "---help---"
c2port: fix bogus Kconfig "default no"
edac: Fix spelling errors.
qla1280: Remove redundant NULL check before release_firmware() call
remoteproc: remove redundant NULL check before release_firmware()
qla2xxx: Remove redundant NULL check before release_firmware() call.
aic94xx: Get rid of redundant NULL check before release_firmware() call
tehuti: delete redundant NULL check before release_firmware()
qlogic: get rid of a redundant test for NULL before call to release_firmware()
bna: remove redundant NULL test before release_firmware()
tg3: remove redundant NULL test before release_firmware() call
typhoon: get rid of redundant conditional before all to release_firmware()
...

Linus Torvalds
2012-05-23 10:22:50 +0800

16 May, 2012

2 commits

08cefc7ab userns: Convert ext4 to user kuid/kgid where appropriate ... Browse Code »

Acked-by: Serge Hallyn
Signed-off-by: Eric W. Biederman

Eric W. Biederman
2012-05-16 05:59:27 +0800
1523299d5 userns: Convert ext3 to use kuid/kgid where appropriate ... Browse Code »

Acked-by: Serge Hallyn
Signed-off-by: Eric W. Biederman

Eric W. Biederman
2012-05-16 05:59:27 +0800