03 Feb, 2011

1 commit

  • Currently the trace_event structures are placed in the _ftrace_events
    section, and at link time, the linker makes one large array of all
    the trace_event structures. On boot up, this array is read (much like
    the initcall sections) and the events are processed.

    The problem is that there is no guarantee that gcc will place complex
    structures nicely together in an array format. Two structures in the
    same file may be placed awkwardly, because gcc has no clue that they
    are supposed to be in an array.

    A hack was previously used to force the alignment to 4, to pack the
    structures together. But this caused alignment issues with other
    architectures (sparc).

    Instead of packing the structures into an array, the structures' addresses
    are now put into the _ftrace_event section. As pointers always have their
    natural alignment, gcc should always pack them tightly together
    (otherwise initcall, extable, etc. would also fail).

    By having the pointers to the structures in the section, we can still
    iterate the trace_events without causing unnecessary alignment problems
    with other architectures, or depending on the current behaviour of
    gcc that will likely change in the future just to tick us kernel developers
    off a little more.

    The _ftrace_event section is also moved into the .init.data section
    as it is now only needed at boot up.

    Suggested-by: David Miller
    Cc: Mathieu Desnoyers
    Acked-by: David S. Miller
    Signed-off-by: Steven Rostedt

    Steven Rostedt
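A minimal user-space sketch of the pointer-array technique described in the 03 Feb commit, assuming GCC or Clang with an ELF linker (GNU ld, gold or lld) that defines __start_<section> and __stop_<section> symbols for sections whose names are valid C identifiers. The section name demo_events, struct demo_event and the DEFINE_DEMO_EVENT macro are hypothetical; the kernel defines its own section boundary symbols in its linker script instead.

```c
#include <stddef.h>

/* Hypothetical stand-in for struct trace_event (the real one is larger). */
struct demo_event {
    const char *name;
    int id;
};

/* Define the structure normally and place only a pointer to it in the
 * "demo_events" section. Pointers have natural alignment, so the section
 * becomes a dense array of pointers. "used" stops the compiler from
 * discarding the otherwise unreferenced pointer. */
#define DEFINE_DEMO_EVENT(ev, num)                                      \
    static struct demo_event demo_event_##ev = { #ev, num };            \
    static struct demo_event *demo_event_ptr_##ev                       \
        __attribute__((used, section("demo_events"))) = &demo_event_##ev

DEFINE_DEMO_EVENT(sched_wakeup, 1);
DEFINE_DEMO_EVENT(kfree_skb, 2);
DEFINE_DEMO_EVENT(block_bio_complete, 3);

/* Provided by the linker for the "demo_events" section. */
extern struct demo_event *__start_demo_events[];
extern struct demo_event *__stop_demo_events[];

/* Walk the pointer array, as boot code walks the trace_event pointers. */
size_t count_demo_events(void)
{
    size_t n = 0;
    for (struct demo_event **p = __start_demo_events; p < __stop_demo_events; p++)
        n++;
    return n;
}

int sum_demo_event_ids(void)
{
    int sum = 0;
    for (struct demo_event **p = __start_demo_events; p < __stop_demo_events; p++)
        sum += (*p)->id;
    return sum;
}
```

Because only pointers live in the section, every element has the same size and alignment, so stepping a `struct demo_event **` across it is safe however the compiler lays out the structures themselves. The order of entries is not guaranteed, which is why the sketch only counts and sums them.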
     

16 Jan, 2011

1 commit

  • …linus' and 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip

    * 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    rcu: avoid pointless blocked-task warnings
    rcu: demote SRCU_SYNCHRONIZE_DELAY from kernel-parameter status
    rtmutex: Fix comment about why new_owner can be NULL in wake_futex_pi()

    * 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    x86, olpc: Add missing Kconfig dependencies
    x86, mrst: Set correct APB timer IRQ affinity for secondary cpu
    x86: tsc: Fix calibration refinement conditionals to avoid divide by zero
    x86, ia64, acpi: Clean up x86-ism in drivers/acpi/numa.c

    * 'timers-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    timekeeping: Make local variables static
    time: Rename misnamed minsec argument of clocks_calc_mult_shift()

    * 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    tracing: Remove syscall_exit_fields
    tracing: Only process module tracepoints once
    perf record: Add "nodelay" mode, disabled by default
    perf sched: Fix list of events, dropping unsupported ':r' modifier
    Revert "perf tools: Emit clearer message for sys_perf_event_open ENOENT return"
    perf top: Fix annotate segv
    perf evsel: Fix order of event list deletion

    Linus Torvalds
     

15 Jan, 2011

1 commit

  • The commit:

    9f987b3141f086de27832514aad9f50a53f754
    tracing: Include module.h in define_trace.h

    only solved half the problem. If the trace/events/module.h header is
    included at the time of define_trace.h (or in ftrace.h within it),
    the module.h TRACE_SYSTEM will override the current TRACE_SYSTEM
    macro.

    Since define_trace.h is included when CREATE_TRACE_POINTS is set,
    and the first thing it does is to #undef CREATE_TRACE_POINTS,
    by placing the module.h TRACE_SYSTEM inside an
    #ifdef CREATE_TRACE_POINTS block,
    we can prevent it from overriding the TRACE_SYSTEM that is
    being processed, and still process the module.h tracepoints
    when the module code defines CREATE_TRACE_POINTS and includes
    the trace/events/module.h header.

    As with commit 9f987b3141, this is only an issue if module.h
    is not included before the trace/events/.h file is
    included, which (luckily) has not happened yet.

    Reported-by: Peter Zijlstra
    Signed-off-by: Steven Rostedt

    Steven Rostedt
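The guard described in the 15 Jan commit can be seen in a single-file sketch. A standalone example cannot include the real kernel headers, so the relevant fragment of trace/events/module.h is pasted inline twice, and the value kmem stands in for whatever TRACE_SYSTEM the header being processed happens to define; both are illustrative assumptions.

```c
#define STR_(x) #x
#define STR(x) STR_(x)   /* expand the macro first, then stringify */

/* Case 1: module code does "#define CREATE_TRACE_POINTS" and includes
 * trace/events/module.h. The guarded block is active, so TRACE_SYSTEM
 * becomes module, which is what the module tracepoints need. */
#define CREATE_TRACE_POINTS
#ifdef CREATE_TRACE_POINTS
#undef TRACE_SYSTEM
#define TRACE_SYSTEM module
#endif
static const char *const case1_system = STR(TRACE_SYSTEM);

/* Case 2: another header (hypothetically TRACE_SYSTEM kmem) is being
 * processed by define_trace.h, which starts with
 * "#undef CREATE_TRACE_POINTS". If module.h is pulled in at this point,
 * the guarded block is skipped and TRACE_SYSTEM is left alone. */
#undef TRACE_SYSTEM
#define TRACE_SYSTEM kmem
#undef CREATE_TRACE_POINTS
#ifdef CREATE_TRACE_POINTS
#undef TRACE_SYSTEM
#define TRACE_SYSTEM module
#endif
static const char *const case2_system = STR(TRACE_SYSTEM);
```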
     

14 Jan, 2011

8 commits

  • With compaction being used instead of lumpy reclaim, the name lumpy_mode
    and associated variables are a bit misleading. Rename lumpy_mode to
    reclaim_mode which is a better fit. There is no functional change.

    Signed-off-by: Mel Gorman
    Cc: Andrea Arcangeli
    Cc: KOSAKI Motohiro
    Cc: Rik van Riel
    Acked-by: Johannes Weiner
    Cc: Andy Whitcroft
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mel Gorman
     
  • Currently lumpy_mode is an enum and determines if lumpy reclaim is off,
    synchronous or asynchronous. In preparation for using compaction instead of
    lumpy reclaim, this patch converts the flags into a bitmap.

    Signed-off-by: Mel Gorman
    Cc: Andrea Arcangeli
    Cc: KOSAKI Motohiro
    Cc: Rik van Riel
    Cc: Johannes Weiner
    Cc: Andy Whitcroft
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mel Gorman
     
  • In preparation for patches promoting the use of memory compaction over
    lumpy reclaim, this patch adds trace points for memory compaction
    activity. Using them, we can monitor the scanning activity of the
    migration and free page scanners as well as the number and success rates
    of pages passed to page migration.

    Signed-off-by: Mel Gorman
    Cc: Andrea Arcangeli
    Cc: KOSAKI Motohiro
    Cc: Rik van Riel
    Cc: Johannes Weiner
    Cc: Andy Whitcroft
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mel Gorman
     
  • This tracks when balance_dirty_pages() tries to wake up the flusher thread
    for background writeback (if it was not started already).

    Suggested-by: Christoph Hellwig
    Signed-off-by: Wu Fengguang
    Cc: Jan Kara
    Cc: Johannes Weiner
    Cc: Dave Chinner
    Cc: Jan Engelhardt
    Cc: Jens Axboe
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Wu Fengguang
     
  • * 'for-2.6.38/core' of git://git.kernel.dk/linux-2.6-block: (43 commits)
    block: ensure that completion error gets properly traced
    blktrace: add missing probe argument to block_bio_complete
    block cfq: don't use atomic_t for cfq_group
    block cfq: don't use atomic_t for cfq_queue
    block: trace event block fix unassigned field
    block: add internal hd part table references
    block: fix accounting bug on cross partition merges
    kref: add kref_test_and_get
    bio-integrity: mark kintegrityd_wq highpri and CPU intensive
    block: make kblockd_workqueue smarter
    Revert "sd: implement sd_check_events()"
    block: Clean up exit_io_context() source code.
    Fix compile warnings due to missing removal of a 'ret' variable
    fs/block: type signature of major_to_index(int) to major_to_index(unsigned)
    block: convert !IS_ERR(p) && p to !IS_ERR_NOR_NULL(p)
    cfq-iosched: don't check cfqg in choose_service_tree()
    fs/splice: Pull buf->ops->confirm() from splice_from_pipe actors
    cdrom: export cdrom_check_events()
    sd: implement sd_check_events()
    sr: implement sr_check_events()
    ...

    Linus Torvalds
     
  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound-2.6: (348 commits)
    ALSA: hda - Fix NULL-derefence with a single mic in STAC auto-mic detection
    ALSA: hda - Add missing NID 0x19 fixup for Sony VAIO
    ALSA: hda - Fix ALC275 enable hardware EQ for SONY VAIO
    ALSA: oxygen: fix Xonar DG input
    ALSA: hda - Fix EAPD on Lenovo NB ALC269 to low
    ALSA: hda - Fix missing EAPD for Acer 4930G
    ALSA: hda: Disable 4/6 channels on some NVIDIA GPUs.
    ALSA: hda - Add static_hdmi_pcm option to HDMI codec parser
    ALSA: hda - Don't refer ELD when unplugged
    ASoC: tpa6130a2: Fix compiler warning
    ASoC: tlv320dac33: Add DAPM selection for LOM invert
    ASoC: DMIC codec: Adding a generic DMIC codec
    ALSA: snd-usb-us122l: Fix missing NULL checks
    ALSA: snd-usb-us122l: Fix MIDI output
    ASoC: soc-cache: Fix invalid memory access during snd_soc_lzo_cache_sync()
    ASoC: Fix section mismatch in wm8995.c
    ALSA: oxygen: add S/PDIF source selection for Claro cards
    ALSA: oxygen: fix CD/MIDI for X-Meridian (2G)
    ASoC: fix migor audio build
    ALSA: include delay.h for msleep in Xonar DG support
    ...

    Linus Torvalds
     
  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/lrg/voltage-2.6: (45 commits)
    regulator: missing index in PTR_ERR() in isl6271a_probe()
    regulator: Assign return value of mc13xxx_reg_rmw to ret
    regulator: Add initial per-regulator debugfs support
    regulator: Make regulator_has_full_constraints a bool
    regulator: Clean up logging a bit
    regulator: Optimise out noop voltage changes
    regulator: Add API to re-apply voltage to hardware
    regulator: Staticise non-exported functions in mc13892
    regulator: Only notify voltage changes when they succeed
    regulator: Provide a selector based set_voltage_sel() operation
    regulator: Factor out voltage set operation into a separate function
    regulator: Convert WM8994 to use get_voltage_sel()
    regulator: Convert WM835x to use get_voltage_sel()
    regulator: Allow modular build of mc13xxx-core
    regulator: support PMIC mc13892
    make mc13783 regulator code generic
    Change the register name definitions for mc13783
    mach-ux500: Updated and connected ab8500 regulator board configuration
    regulators: Removed macros for initialization of ab8500 regulators
    regulators: Added verbose debug messages to ab8500 regulators
    ...

    Linus Torvalds
     
  • * 'kvm-updates/2.6.38' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (142 commits)
    KVM: Initialize fpu state in preemptible context
    KVM: VMX: when entering real mode align segment base to 16 bytes
    KVM: MMU: handle 'map_writable' in set_spte() function
    KVM: MMU: audit: allow audit more guests at the same time
    KVM: Fetch guest cr3 from hardware on demand
    KVM: Replace reads of vcpu->arch.cr3 by an accessor
    KVM: MMU: only write protect mappings at pagetable level
    KVM: VMX: Correct asm constraint in vmcs_load()/vmcs_clear()
    KVM: MMU: Initialize base_role for tdp mmus
    KVM: VMX: Optimize atomic EFER load
    KVM: VMX: Add definitions for more vm entry/exit control bits
    KVM: SVM: copy instruction bytes from VMCB
    KVM: SVM: implement enhanced INVLPG intercept
    KVM: SVM: enhance mov DR intercept handler
    KVM: SVM: enhance MOV CR intercept handler
    KVM: SVM: add new SVM feature bit names
    KVM: cleanup emulate_instruction
    KVM: move complete_insn_gp() into x86.c
    KVM: x86: fix CR8 handling
    KVM guest: Fix kvm clock initialization when it's configured out
    ...

    Linus Torvalds
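The 14 Jan commit that converts lumpy_mode from an enum into a bitmap can be sketched as follows. The flag names and values are hypothetical, not the ones in mm/vmscan.c; the point is that independent bits let the synchronous/asynchronous choice change without disturbing the page-selection choice, which an enum holding a single state cannot express.

```c
#include <stdbool.h>

/* Hypothetical reclaim-mode bits. */
typedef unsigned int demo_reclaim_mode;
#define DEMO_RECLAIM_SINGLE  0x01u  /* reclaim individual order-0 pages */
#define DEMO_RECLAIM_ASYNC   0x02u  /* do not wait on page writeback */
#define DEMO_RECLAIM_SYNC    0x04u  /* wait on page writeback */
#define DEMO_RECLAIM_LUMPY   0x08u  /* reclaim physically contiguous ranges */

/* Switch between sync and async without touching the selection bits. */
static demo_reclaim_mode demo_set_sync(demo_reclaim_mode mode, bool sync)
{
    mode &= ~(DEMO_RECLAIM_ASYNC | DEMO_RECLAIM_SYNC);
    return mode | (sync ? DEMO_RECLAIM_SYNC : DEMO_RECLAIM_ASYNC);
}

static bool demo_is_lumpy(demo_reclaim_mode mode)
{
    return (mode & DEMO_RECLAIM_LUMPY) != 0;
}
```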
     

12 Jan, 2011

6 commits

  • Provide some basic trace facilities to the regulator API. We generate
    events on regulator enable, disable and voltage setting over the actual
    hardware operations (which are assumed to be the expensive ones which
    require interaction with the actual device). This is intended to facilitate
    debug of the performance and behaviour with consumers allowing unified
    traces to be generated including the regulator operations within the
    context of the other components of the system.

    For enable we log the explicit delay for the voltage ramp separately to
    the interaction with the hardware to highlight the time consumed in I/O.
    We should add a similar delay for voltage changes, though there the
    relatively small magnitude of the changes in the context of the I/O
    costs makes it much less critical for most regulators.

    Only hardware interactions are currently traced as the primary focus is
    on the performance and synchronisation of actual hardware interactions.
    Additional tracepoints for debugging of the logical operations can be
    added later if required.

    Signed-off-by: Mark Brown
    Signed-off-by: Liam Girdwood

    Mark Brown
     
  • Use 'DECLARE_EVENT_CLASS' to cleanup async_pf tracepoints

    Acked-by: Gleb Natapov
    Signed-off-by: Xiao Guangrong
    Signed-off-by: Marcelo Tosatti

    Xiao Guangrong
     
  • Tracing 'async' and '*pfn' is useless, since 'async' is always true
    and '*pfn' is always 'fault_pfn'.

    We can trace 'gva' and 'gfn' instead, which helps us see the
    life-cycle of an async_pf.

    Signed-off-by: Xiao Guangrong
    Signed-off-by: Marcelo Tosatti

    Xiao Guangrong
     
  • Add tracepoint for userspace exit.

    Signed-off-by: Gleb Natapov
    Signed-off-by: Marcelo Tosatti

    Gleb Natapov
     
  • Send an async page fault to a PV guest if it accesses swapped-out memory.
    The guest will choose another task to run upon receiving the fault.

    Allow async page fault injection only when the guest is in user mode,
    since otherwise the guest may be in a non-sleepable context and will
    not be able to reschedule.

    The vcpu will be halted if the guest faults on the same page again or
    if the vcpu executes kernel code.

    Acked-by: Rik van Riel
    Signed-off-by: Gleb Natapov
    Signed-off-by: Marcelo Tosatti

    Gleb Natapov
     
  • If a guest accesses swapped-out memory, do not swap it in from the vcpu
    thread context. Schedule work to do the swapping and put the vcpu into
    the halted state instead.

    Interrupts will still be delivered to the guest, and if an interrupt
    causes a reschedule, the guest will continue by running another task.

    [avi: remove call to get_user_pages_noio(), nacked by Linus; this
    makes everything synchronous again]

    Acked-by: Rik van Riel
    Signed-off-by: Gleb Natapov
    Signed-off-by: Marcelo Tosatti

    Gleb Natapov
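The DECLARE_EVENT_CLASS cleanup of the async_pf tracepoints (12 Jan) shares one record layout and output format among several events. The user-space sketch below imitates that idea with an ordinary C macro; it is not the kernel's TRACE_EVENT machinery, and the event names and demo_ helpers are hypothetical. The records carry gva and gfn, the fields the follow-up commit chose to trace.

```c
#include <stdint.h>
#include <stdio.h>

/* The "class": one record layout and one formatter, written once. */
struct demo_async_pf_rec {
    const char *event;   /* which event instance produced the record */
    uint64_t gva;        /* guest virtual address */
    uint64_t gfn;        /* guest frame number */
};

static int demo_async_pf_format(const struct demo_async_pf_rec *r,
                                char *buf, size_t len)
{
    return snprintf(buf, len, "%s: gva %#llx, gfn %#llx", r->event,
                    (unsigned long long)r->gva, (unsigned long long)r->gfn);
}

/* Each "event" reuses the class instead of repeating layout and format. */
#define DEFINE_DEMO_ASYNC_PF_EVENT(name)                                 \
    static int trace_demo_##name(uint64_t gva, uint64_t gfn,            \
                                 char *buf, size_t len)                 \
    {                                                                   \
        struct demo_async_pf_rec r = { #name, gva, gfn };               \
        return demo_async_pf_format(&r, buf, len);                      \
    }

DEFINE_DEMO_ASYNC_PF_EVENT(async_pf_not_present)
DEFINE_DEMO_ASYNC_PF_EVENT(async_pf_ready)
```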
     

08 Jan, 2011

2 commits

  • The check for a NULL skb in the kfree_skb trace event duplicates the
    check already done in its only caller, kfree_skb(). Remove this duplicate check.

    Signed-off-by: Mathieu Desnoyers
    LKML-Reference:
    Acked-by: Neil Horman
    Acked-by: David S. Miller
    CC: Steven Rostedt
    CC: Frederic Weisbecker
    CC: Ingo Molnar
    CC: Thomas Gleixner
    CC: Zhaolei
    Signed-off-by: Steven Rostedt

    Mathieu Desnoyers
     
  • While doing some developing, Peter Zijlstra and I have found
    that if a CREATE_TRACE_POINTS include is done before module.h
    is included, it can break the build.

    We have been lucky so far that this has not broken the build
    since module.h is included in almost everything.

    Reported-by: Peter Zijlstra
    Signed-off-by: Steven Rostedt

    Steven Rostedt
     

07 Jan, 2011

1 commit

  • The "error" field in block_bio_complete is not assigned, leaving the memory area
    uninitialized (keeping garbage data). Pass an additional tracepoint argument to
    this event to initialize this field.

    Signed-off-by: Jeff Moyer
    Signed-off-by: Mathieu Desnoyers
    CC: Steven Rostedt
    CC: Frederic Weisbecker
    CC: Ingo Molnar
    CC: Thomas Gleixner
    CC: Li Zefan
    CC: Alan.Brunelle@hp.com
    Signed-off-by: Jens Axboe

    Jeff Moyer
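A small sketch of why the block_bio_complete fix needs an extra tracepoint argument: trace buffer slots are reused, so a field that the assign step never writes keeps whatever bytes an earlier event left there. The record layout and helper name below are hypothetical, loosely modelled on the event.

```c
/* Hypothetical record loosely modelled on block_bio_complete. */
struct demo_bio_complete_rec {
    unsigned long long sector;
    unsigned int nr_sector;
    int error;
};

/* Assign every field from the probe arguments. Without the extra
 * 'error' argument there would be nothing to assign error from, and
 * the slot's previous contents would be reported instead. */
static void demo_assign_bio_complete(struct demo_bio_complete_rec *slot,
                                     unsigned long long sector,
                                     unsigned int nr_sector, int error)
{
    slot->sector = sector;
    slot->nr_sector = nr_sector;
    slot->error = error;
}
```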
     

04 Jan, 2011

1 commit

  • Add these new power trace events:

    power:cpu_idle
    power:cpu_frequency
    power:machine_suspend

    The old C-state/idle accounting events:
    power:power_start
    power:power_end

    now have a replacement (the old tracepoints are still kept
    for compatibility):

    power:cpu_idle

    and
    power:power_frequency

    is replaced with:
    power:cpu_frequency

    power:machine_suspend is newly introduced.

    Jean Pihet has a patch integrated into the generic layer
    (kernel/power/suspend.c) which will make use of it.

    The type= field was removed from both; it was never
    used, and the type is distinguished by the event type itself.

    perf timechart userspace tool gets adjusted in a separate patch.

    Signed-off-by: Thomas Renninger
    Signed-off-by: Ingo Molnar
    Acked-by: Arjan van de Ven
    Acked-by: Jean Pihet
    Cc: Arnaldo Carvalho de Melo
    Cc: Peter Zijlstra
    Cc: Linus Torvalds
    Cc: rjw@sisk.pl
    LKML-Reference:
    Signed-off-by: Ingo Molnar
    LKML-Reference:

    Thomas Renninger
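A user-space tool that wants to prefer the new power events, while still understanding kernels that only emit the old ones, could keep the renames listed in the 04 Jan commit in a lookup table. This helper is a hypothetical illustration, not code from perf timechart.

```c
#include <stddef.h>
#include <string.h>

struct power_event_rename {
    const char *old_event;
    const char *new_event;
};

/* Renames as listed in the commit message; power:machine_suspend is new
 * and has no predecessor. */
static const struct power_event_rename power_renames[] = {
    { "power:power_start",     "power:cpu_idle" },
    { "power:power_end",       "power:cpu_idle" },
    { "power:power_frequency", "power:cpu_frequency" },
};

/* Returns the replacement for an old event name, or NULL if none. */
static const char *power_event_replacement(const char *old_event)
{
    for (size_t i = 0; i < sizeof power_renames / sizeof power_renames[0]; i++)
        if (strcmp(power_renames[i].old_event, old_event) == 0)
            return power_renames[i].new_event;
    return NULL;
}
```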
     

06 Dec, 2010

1 commit

  • As jack detection can trigger DAPM and the latency in debouncing can create
    confusing windows in operation, provide some trace events which will hopefully
    help in diagnostics. The soc-jack core traces all reports that it gets and
    the resulting notifications to upper layers. An event for jack IRQs is also
    provided for instrumentation of debounce, and used in the GPIO jack code.

    Signed-off-by: Mark Brown
    Acked-by: Liam Girdwood

    Mark Brown
     

03 Dec, 2010

1 commit

  • There are instances in the kernel where we only want to trace
    a tracepoint when a certain condition is set. But we do not
    want to test for that condition in the core kernel.
    If we test for that condition before calling the tracepoint, then
    we will be performing that test even when tracing is not enabled,
    which is 99.99% of the time.

    Currently we can only filter on that condition, but that happens
    after we write to the trace buffer. We just wasted time writing to
    the ring buffer for an event we never cared about.

    This patch adds:

    TRACE_EVENT_CONDITION() and DEFINE_EVENT_CONDITION()

    These have a new TP_CONDITION() argument that comes right after
    the TP_ARGS(). This condition can use the parameters of TP_ARGS()
    in the TRACE_EVENT() to determine if the tracepoint should be traced
    or not. The TP_CONDITION() will be placed in an 'if (cond) trace;'.

    For example, for the tracepoint sched_wakeup, it is useless to
    trace a wakeup event where the caller never actually wakes
    anything up (where success == 0). So adding:

    TP_CONDITION(success),

    which uses the "success" parameter of the wakeup tracepoint
    will have it only trace when we have successfully woken up a
    task.

    Acked-by: Mathieu Desnoyers
    Acked-by: Frederic Weisbecker
    Cc: Arjan van de Ven
    Cc: Thomas Gleixner
    Signed-off-by: Steven Rostedt

    Steven Rostedt
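The ordering that TRACE_EVENT_CONDITION() relies on (the cheap enabled check first, the TP_CONDITION() test only when the event is on, and the buffer write only when both pass) can be modelled in user space as below. The flag, counter and demo_ names are hypothetical stand-ins for the tracepoint's enabled state and its ring-buffer write; the macro is not the kernel's implementation.

```c
#include <stdbool.h>

static bool demo_sched_wakeup_enabled;     /* tracepoint on/off state */
static unsigned int demo_records_written;  /* stands in for ring-buffer writes */

static void demo_record_sched_wakeup(int pid, int success)
{
    (void)pid;
    (void)success;
    demo_records_written++;
}

/* Like a conditional tracepoint with TP_CONDITION(success): when tracing
 * is off, only the enabled flag is tested; when it is on, failed wakeups
 * are dropped before anything is written. */
#define demo_trace_sched_wakeup(pid, success)            \
    do {                                                 \
        if (demo_sched_wakeup_enabled && (success))      \
            demo_record_sched_wakeup((pid), (success));  \
    } while (0)
```

Compared with writing every event and filtering afterwards, a wakeup with success == 0 never touches the buffer at all.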
     

19 Nov, 2010

3 commits


18 Nov, 2010

3 commits

  • As with the raw syscall events, individual syscall events won't
    leak system-wide information during task-bound tracing. Allow
    non-privileged users to use them in such workflows.

    Signed-off-by: Frederic Weisbecker
    Cc: Ingo Molnar
    Cc: Peter Zijlstra
    Cc: Arnaldo Carvalho de Melo
    Cc: Thomas Gleixner
    Cc: Steven Rostedt
    Cc: Li Zefan
    Cc: Jason Baron

    Frederic Weisbecker
     
  • This allows non-privileged users to use the raw syscall trace events
    for task-bound tracing in perf.

    It is safe because raw syscall trace events don't leak system-wide
    information.

    Signed-off-by: Frederic Weisbecker
    Cc: Ingo Molnar
    Cc: Peter Zijlstra
    Cc: Arnaldo Carvalho de Melo
    Cc: Thomas Gleixner
    Cc: Steven Rostedt
    Cc: Li Zefan
    Cc: Jason Baron

    Frederic Weisbecker
     
  • This introduces the new TRACE_EVENT_FLAGS() macro in order
    to set up initial event flags value.

    This macro must simply follow the definition of a trace event
    and take the event name and the flag value as parameters:

    TRACE_EVENT(my_event, .....
    ....
    );

    TRACE_EVENT_FLAGS(my_event, 1)

    This will set up 1 as the initial my_event->flags value.

    Signed-off-by: Frederic Weisbecker
    Cc: Ingo Molnar
    Cc: Peter Zijlstra
    Cc: Arnaldo Carvalho de Melo
    Cc: Thomas Gleixner
    Cc: Steven Rostedt
    Cc: Li Zefan
    Cc: Jason Baron

    Frederic Weisbecker
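A user-space sketch of the two ideas in the 18 Nov commits: a TRACE_EVENT_FLAGS()-style macro that follows an event definition and emits a function setting the event's initial flags, and a permission check that lets unprivileged users open only flagged events, and only for task-bound tracing. The flag name DEMO_EVENT_FL_CAP_ANY, struct flagged_event and all demo_ helpers are hypothetical; the real perf permission logic has more cases.

```c
#include <stdbool.h>

/* Hypothetical flag: any user may open this event for task-bound tracing. */
#define DEMO_EVENT_FL_CAP_ANY 0x1u

struct flagged_event {
    const char *name;
    unsigned int flags;
};

/* Stand-in for TRACE_EVENT(): defines the event with zeroed flags. */
#define DEMO_TRACE_EVENT(ev) \
    static struct flagged_event event_##ev = { #ev, 0 }

/* Stand-in for TRACE_EVENT_FLAGS(): emits an init function that sets
 * the event's initial flags value. */
#define DEMO_TRACE_EVENT_FLAGS(ev, fl)      \
    static void demo_init_flags_##ev(void)  \
    {                                       \
        event_##ev.flags = (fl);            \
    }

DEMO_TRACE_EVENT(sys_enter);
DEMO_TRACE_EVENT_FLAGS(sys_enter, DEMO_EVENT_FL_CAP_ANY)

DEMO_TRACE_EVENT(sched_switch);

/* Privileged users may open anything; others only flagged events, and
 * only when tracing is bound to a task. */
static bool demo_may_open(const struct flagged_event *ev, bool privileged,
                          bool task_bound)
{
    if (privileged)
        return true;
    return task_bound && (ev->flags & DEMO_EVENT_FL_CAP_ANY) != 0;
}
```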
     

16 Nov, 2010

1 commit


11 Nov, 2010

2 commits

  • Trace events for DAPM allow us to monitor the performance and behaviour
    of DAPM with logging which can be built into the kernel permanently, is
    more suited to automated analysis and display and less likely to suffer
    interference from other logging activity.

    Currently trace events are generated for:

    - Start and stop of DAPM processing
    - Start and stop of bias level changes
    - Power decisions for widgets
    - Widget event execution start and stop

    giving some view as to what is happening and where latencies occur.

    Actual changes in widget power can be seen via the register write trace in
    soc-core.

    Signed-off-by: Mark Brown
    Acked-by: Liam Girdwood

    Mark Brown
     
  • The trace subsystem provides a convenient way of instrumenting the kernel
    which can be left on all the time with extremely low impact on the system,
    unlike prints to the kernel log, which can be very spammy. Begin adding
    support for instrumenting ASoC via this interface by adding trace for the
    register access primitives.

    Signed-off-by: Mark Brown
    Acked-by: Liam Girdwood

    Mark Brown
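Tracing at the register access primitive, as the ASoC commit of 11 Nov does, means every register write passes through a single hook. A hypothetical user-space model follows, with a plain array standing in for codec hardware and a static string standing in for the trace buffer; none of these names come from soc-core.

```c
#include <stdio.h>

/* Hypothetical codec model: an array stands in for the hardware. */
struct demo_codec {
    const char *name;
    unsigned int regs[16];
};

static char demo_trace_buf[128];   /* last trace line, instead of a ring buffer */

static void demo_trace_reg_write(const struct demo_codec *c,
                                 unsigned int reg, unsigned int val)
{
    snprintf(demo_trace_buf, sizeof demo_trace_buf,
             "%s reg=%x val=%x", c->name, reg, val);
}

/* Every caller goes through this primitive, so tracing here covers all
 * register writes without touching the callers. */
static int demo_write(struct demo_codec *c, unsigned int reg, unsigned int val)
{
    if (reg >= sizeof c->regs / sizeof c->regs[0])
        return -1;
    demo_trace_reg_write(c, reg, val);
    c->regs[reg] = val;
    return 0;
}
```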
     

09 Nov, 2010

1 commit


28 Oct, 2010

6 commits

  • Conflicts:
    fs/ext4/inode.c
    fs/ext4/mballoc.c
    include/trace/events/ext4.h

    Theodore Ts'o
     
  • Unfortunately perf can't deal with anything other than direct structure
    accesses in the TP_printk() section. It will drop dead when it sees
    jbd2_dev_to_name() in the "print fmt" section of the tracepoint.

    Addresses-Google-Bug: 3138508

    Signed-off-by: "Theodore Ts'o"

    Theodore Ts'o
     
  • …/git/tip/linux-2.6-tip

    * 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (50 commits)
    perf python scripting: Add futex-contention script
    perf python scripting: Fixup cut'n'paste error in sctop script
    perf scripting: Shut up 'perf record' final status
    perf record: Remove newline character from perror() argument
    perf python scripting: Support fedora 11 (audit 1.7.17)
    perf python scripting: Improve the syscalls-by-pid script
    perf python scripting: print the syscall name on sctop
    perf python scripting: Improve the syscalls-counts script
    perf python scripting: Improve the failed-syscalls-by-pid script
    kprobes: Remove redundant text_mutex lock in optimize
    x86/oprofile: Fix uninitialized variable use in debug printk
    tracing: Fix 'faild' -> 'failed' typo
    perf probe: Fix format specified for Dwarf_Off parameter
    perf trace: Fix detection of script extension
    perf trace: Use $PERF_EXEC_PATH in canned report scripts
    perf tools: Document event modifiers
    perf tools: Remove direct slang.h include
    perf_events: Fix for transaction recovery in group_sched_in()
    perf_events: Revert: Fix transaction recovery in group_sched_in()
    perf, x86: Use NUMA aware allocations for PEBS/BTS/DS allocations
    ...

    Linus Torvalds
     
  • Many tracepoints were populating an ext4_allocation_context
    to pass in, but this requires a slab allocation even when
    tracepoints are off. In fact, 4 of 5 of these allocations
    were only for tracing. In addition, we were only using a
    small fraction of the 144 bytes of this structure for this
    purpose.

    We can do away with all these alloc/frees of the ac and
    simply pass in the bits we care about, instead.

    I tested this by turning on tracing and running through
    xfstests on x86_64. I did not actually do anything with
    the trace output, however.

    Signed-off-by: Eric Sandeen
    Signed-off-by: "Theodore Ts'o"

    Eric Sandeen
     
  • Our QA reported an oops in the ext4_mb_release_group_pa tracing,
    and Josef Bacik pointed out that it was because we may have a
    non-null but uninitialized ac_inode in the allocation context.

    I can reproduce it when running xfstests with ext4 tracepoints on,
    on a CONFIG_SLAB_DEBUG kernel.

    We call trace_ext4_mb_release_group_pa from 2 places,
    ext4_mb_discard_group_preallocations and
    ext4_mb_discard_lg_preallocations

    In both cases we allocate an ac as a container just for tracing (!)
    and never fill in the ac_inode. There's no reason to be assigning,
    testing, or printing it as far as I can see, so just remove it from
    the tracepoint.

    Signed-off-by: Eric Sandeen
    Reviewed-by: Josef Bacik
    Signed-off-by: "Theodore Ts'o"

    Eric Sandeen
     
  • ac->inode is set to NULL in ext4_mb_release_group_pa(), and when
    trace_ext4_mballoc_discard(ac) is then called, the kernel
    panics:

    BUG: unable to handle kernel NULL pointer dereference at 000000a4
    IP: [] ftrace_raw_event_ext4__mballoc+0x54/0xc0 [ext4]
    *pdpt = 0000000000abd001 *pde = 0000000000000000
    Oops: 0000 [#1] SMP

    Pid: 550, comm: flush-8:16 Not tainted 2.6.36-rc1 #1 SE7320EP2/Altos G530
    EIP: 0060:[] EFLAGS: 00010206 CPU: 1
    EIP is at ftrace_raw_event_ext4__mballoc+0x54/0xc0 [ext4]
    EAX: f32ac840 EBX: f3f1cf88 ECX: f32ac840 EDX: 00000000
    ESI: f32ac83c EDI: f880b9d8 EBP: 00000000 ESP: f4b77ae4
    DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
    Process flush-8:16 (pid: 550, ti=f4b76000 task=f613e540 task.ti=f4b76000)
    Call Trace:
    [] ? ext4_mb_release_group_pa+0x121/0x150 [ext4]
    [] ? ext4_mb_discard_group_preallocations+0x336/0x400 [ext4]
    [] ? ext4_mb_new_blocks+0x3d1/0x4f0 [ext4]
    [] ? __make_request+0x10b/0x440
    [] ? ext4_ext_map_blocks+0x1334/0x1980 [ext4]
    [] ? rb_reserve_next_event+0xaa/0x3b0
    [] ? ext4_map_blocks+0xd6/0x1d0 [ext4]
    [] ? mpage_da_map_blocks+0xc7/0x8a0 [ext4]
    [] ? find_get_pages_tag+0x38/0x110
    [] ? __pagevec_release+0x15/0x20
    [] ? ext4_da_writepages+0x2b5/0x5d0 [ext4]
    [] ? __writepage+0x0/0x30
    [] ? do_writepages+0x14/0x30
    [] ? writeback_single_inode+0xa0/0x240
    [] ? writeback_sb_inodes+0xc1/0x180
    [] ? writeback_inodes_wb+0x88/0x140
    [] ? wb_writeback+0x20b/0x320
    [] ? lock_timer_base+0x27/0x50
    [] ? wb_do_writeback+0x150/0x190
    [] ? bdi_writeback_thread+0x88/0x1f0
    [] ? complete+0x40/0x60
    [] ? bdi_writeback_thread+0x0/0x1f0
    [] ? kthread+0x74/0x80
    [] ? kthread+0x0/0x80
    [] ? kernel_thread_helper+0x6/0x10

    Signed-off-by: Wen Congyang
    Acked-by: Steven Rostedt
    Signed-off-by: "Theodore Ts'o"

    Wen Congyang
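The 28 Oct ext4 change replaces "allocate a context, fill it, trace it, free it" with passing the few scalars the tracepoint records, which also removes the context pointer whose unset fields caused the oopses above. The sketch below shows the resulting shape; the function names, argument list and output format are hypothetical and do not match the real ext4 tracepoints.

```c
#include <stdio.h>

static char demo_trace_line[96];   /* stands in for the trace buffer */

/* The hook takes exactly the values it records: no allocation is needed
 * and there is no context pointer to dereference. */
static void demo_trace_mb_discard(unsigned int dev, unsigned int group,
                                  unsigned int start, unsigned int len)
{
    snprintf(demo_trace_line, sizeof demo_trace_line,
             "dev %u group %u start %u len %u", dev, group, start, len);
}

/* Hypothetical caller. Previously it would have allocated a 144-byte
 * allocation context only to carry these values, then freed it. */
static void demo_discard_group_preallocations(unsigned int dev,
                                              unsigned int group,
                                              unsigned int start,
                                              unsigned int len)
{
    /* ... the actual discard work would happen here ... */
    demo_trace_mb_discard(dev, group, start, len);
}
```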
     

27 Oct, 2010

1 commit

  • …r if significant congestion is not being encountered in the current zone

    If congestion_wait() is called with no BDI congested, the caller will
    sleep for the full timeout and this may be an unnecessary sleep. This
    patch adds a wait_iff_congested() that checks congestion and only sleeps
    if a BDI is congested; otherwise, it calls cond_resched() to ensure the caller
    is not hogging the CPU longer than its quota but otherwise will not sleep.

    This is aimed at reducing some of the major desktop stalls reported during
    IO. For example, while kswapd is operating, it calls congestion_wait()
    but it could just have been reclaiming clean page cache pages with no
    congestion. Without this patch, it would sleep for a full timeout but
    after this patch, it'll just call schedule() if it has been on the CPU too
    long. Similar logic applies to direct reclaimers that are not making
    enough progress.

    Signed-off-by: Mel Gorman <mel@csn.ul.ie>
    Cc: Johannes Weiner <hannes@cmpxchg.org>
    Cc: Minchan Kim <minchan.kim@gmail.com>
    Cc: Wu Fengguang <fengguang.wu@intel.com>
    Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
    Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
    Cc: Rik van Riel <riel@redhat.com>
    Cc: Jens Axboe <axboe@kernel.dk>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

    Mel Gorman
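A deliberately simplified model of the decision wait_iff_congested() makes, as described in the 27 Oct commit. The global counter and the demo_ function are hypothetical; the real function has a different signature and actually sleeps or calls cond_resched() rather than returning a number.

```c
/* Hypothetical stand-in for the count of currently congested BDIs. */
static int demo_nr_congested_bdis;

/* Returns the upper bound on how long the caller would sleep. */
static long demo_wait_iff_congested(long timeout)
{
    if (demo_nr_congested_bdis == 0) {
        /* Nothing is congested, so sleeping would only add latency.
         * The kernel calls cond_resched() here and returns. */
        return 0;
    }
    /* Some BDI is congested: behave like congestion_wait(), sleeping
     * for up to the full timeout. */
    return timeout;
}
```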