13 Mar, 2009

40 commits

  • Introduce softirq entry/exit tracepoints. These are useful for
    augmenting existing tracers, and to figure out softirq frequencies and
    timings.

    [
    s/irq_softirq_/softirq_/ for trace point names and
    Fixed printf format in TRACE_FORMAT macro
    - Steven Rostedt
    ]

    LKML-Reference:
    Signed-off-by: Jason Baron
    Signed-off-by: Steven Rostedt

    Jason Baron
     
  • Create a 'softirq_to_name' array, which is indexed by softirq #, so
    that we can easily convert between the softirq index # and its name, in
    order to get more meaningful output messages.

    LKML-Reference:
    Signed-off-by: Jason Baron
    Signed-off-by: Steven Rostedt

    Jason Baron
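
As a rough userspace illustration of the lookup-table idea in the entry above, the sketch below indexes an array of names by softirq number. The enum is a shortened, illustrative subset, the name strings are assumptions, and `softirq_name()` with its "UNKNOWN" fallback is a hypothetical helper rather than part of the patch.

```c
/* Illustrative subset of softirq numbers; the kernel's own enum is longer. */
enum {
    HI_SOFTIRQ = 0,
    TIMER_SOFTIRQ,
    NET_TX_SOFTIRQ,
    NET_RX_SOFTIRQ,
    NR_SOFTIRQS
};

/* Name table indexed by softirq number, using designated initializers. */
static const char *const softirq_to_name[NR_SOFTIRQS] = {
    [HI_SOFTIRQ]     = "HI",
    [TIMER_SOFTIRQ]  = "TIMER",
    [NET_TX_SOFTIRQ] = "NET_TX",
    [NET_RX_SOFTIRQ] = "NET_RX",
};

/* Hypothetical helper: bounds-checked lookup of a softirq name. */
static const char *softirq_name(unsigned int nr)
{
    if (nr >= NR_SOFTIRQS)
        return "UNKNOWN";
    return softirq_to_name[nr];
}
```

Keying the initializers by enum value keeps the table correct even if the enum order changes.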
     
  • If the stack tracing is disabled (by default) the stack_trace file
    will only contain the header:

    # cat /debug/tracing/stack_trace
    Depth Size Location (0 entries)
    ----- ---- --------

    This can be frustrating to a developer that does not realize that the
    stack tracer is disabled. This patch adds the following text:

    # cat /debug/tracing/stack_trace
    Depth Size Location (0 entries)
    ----- ---- --------
    #
    # Stack tracer disabled
    #
    # To enable the stack tracer, either add 'stacktrace' to the
    # kernel command line
    # or 'echo 1 > /proc/sys/kernel/stack_tracer_enabled'
    #

    Signed-off-by: Steven Rostedt

    Steven Rostedt
     
  • The stack tracer used to look like this:

    # cat /debug/tracing/stack_trace
    Depth Size Location (57 entries)
    ----- ---- --------
    0) 5088 16 mempool_alloc_slab+0x16/0x18
    1) 5072 144 mempool_alloc+0x4d/0xfe
    2) 4928 16 scsi_sg_alloc+0x48/0x4a [scsi_mod]

    Now it looks like this:

    # cat /debug/tracing/stack_trace

    Depth Size Location (57 entries)
    ----- ---- --------
    0) 5088 16 mempool_alloc_slab+0x16/0x18
    1) 5072 144 mempool_alloc+0x4d/0xfe
    2) 4928 16 scsi_sg_alloc+0x48/0x4a [scsi_mod]

    Signed-off-by: Steven Rostedt

    Steven Rostedt
     
  • The binary printk saves a pointer to the format string in the ring buffer.
    On output, the format is processed. But if the user is reading the
    ring buffer through a binary interface, the pointer is meaningless.

    This patch creates a file called printk_formats that maps the pointers
    to the formats.

    # cat /debug/tracing/printk_formats
    0xffffffff80713d40 : "irq_handler_entry: irq=%d handler=%s\n"
    0xffffffff80713d48 : "lock_acquire: %s%s%s\n"
    0xffffffff80713d50 : "lock_release: %s\n"

    Signed-off-by: Steven Rostedt

    Steven Rostedt
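
A reader of the binary interface would need to turn each printk_formats line back into a pointer and a format string. The following is a minimal sketch of that parsing step, assuming the line layout shown in the entry above; `parse_format_line()` is a hypothetical helper and not a kernel or tool API.

```c
#include <stdio.h>
#include <string.h>

/*
 * Parse one line of the form:   0x<hex address> : "<format>"
 * The text between the first and last double quote is copied to fmt
 * unchanged (escape sequences such as \n are not decoded).
 * Returns 0 on success, -1 if the line does not match or fmt is too small.
 */
static int parse_format_line(const char *line, unsigned long long *addr,
                             char *fmt, size_t fmt_len)
{
    const char *open, *close;
    size_t len;

    if (sscanf(line, "%llx", addr) != 1)
        return -1;

    open = strchr(line, '"');
    close = strrchr(line, '"');
    if (!open || close == open)
        return -1;

    len = (size_t)(close - open - 1);
    if (len >= fmt_len)
        return -1;

    memcpy(fmt, open + 1, len);
    fmt[len] = '\0';
    return 0;
}
```

The parsed pairs could then be stored in a table so that a pointer found in the ring buffer can be mapped back to its format.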
     
  • Impact: speed up on event tracing

    The event_trace_printk is currently a wrapper function that calls
    trace_vprintk. Because it uses a variable for the fmt it misses out
    on the optimization of using the binary printk.

    This patch makes event_trace_printk into a macro wrapper so that it
    handles fmt the same way trace_printk does.

    Signed-off-by: Steven Rostedt

    Steven Rostedt
     
  • The bprint record is using TRACE_PRINT when it should be TRACE_BPRINT.

    Signed-off-by: Steven Rostedt

    Steven Rostedt
     
  • Impact: fix callsites with dynamic format strings

    Since its new binary implementation, trace_printk() internally uses static
    containers for the format strings at each callsite. But the value is
    assigned once at build time, which means that it can't take dynamic
    formats.

    So this patch unearths the raw trace_printk implementation for the callers
    that will need trace_printk to be able to carry these dynamic format
    strings. The trace_printk() macro will use the appropriate implementation
    for each callsite. Most of the time however, the binary implementation will
    still be used.

    The other impact of this patch is that mmiotrace_printk() will use the old
    implementation because it calls the low level trace_vprintk and we can't
    guess here whether the format passed in it is dynamic or not.

    Some parts of this patch have been written by Steven Rostedt (most notably
    the part that chooses the appropriate implementation for each callsite).

    Signed-off-by: Frederic Weisbecker
    Signed-off-by: Steven Rostedt

    Frederic Weisbecker
     
  • Impact: do not confuse user on small trace buffer sizes

    When the system boots up, the trace buffer is small to conserve memory.
    It is only two pages per online CPU. When the tracer is used, it expands
    to the default value.

    This can confuse the user if they look at the buffer size and see only
    7, but then later they see 1408.

    # cat /debug/tracing/buffer_size_kb
    7

    # echo sched_switch > /debug/tracing/current_tracer

    # cat /debug/tracing/buffer_size_kb
    1408

    This patch tries to help remove this confusion by showing that the
    buffer has not been expanded.

    # cat /debug/tracing/buffer_size_kb
    7 (expanded: 1408)

    Signed-off-by: Steven Rostedt

    Steven Rostedt
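
The output format described in the entry above can be sketched as a small formatting routine. This is only an illustration under assumed names: `format_buffer_size()` and its parameters are hypothetical and do not come from the patch.

```c
#include <stdio.h>

/*
 * Write the buffer size in KB into buf. While the buffer has not yet
 * been expanded, also show the size it will grow to on first use.
 * Returns the value returned by snprintf().
 */
static int format_buffer_size(char *buf, size_t len,
                              unsigned long size_kb,
                              unsigned long expanded_kb,
                              int is_expanded)
{
    if (!is_expanded)
        return snprintf(buf, len, "%lu (expanded: %lu)\n",
                        size_kb, expanded_kb);
    return snprintf(buf, len, "%lu\n", size_kb);
}
```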
     
  • Impact: speed up and remove possible races

    The get_online_cpus was added to the ring buffer because the original
    design would free the ring buffer on a CPU that was being taken
    off line. The final design kept the ring buffer around even when the
    CPU was taken off line. This is to allow a user to still read the
    information on that ring buffer.

    Most of the get_online_cpus are no longer needed since the ring buffer will
    not disappear from the use cases.

    Reported-by: KOSAKI Motohiro
    Signed-off-by: Steven Rostedt

    Steven Rostedt
     
  • The hotplug code in the ring buffers is for use with CPU hotplug,
    not generic hotplug.

    Reported-by: Andrew Morton
    Signed-off-by: Steven Rostedt

    Steven Rostedt
     
  • Impact: prevent races with ring_buffer_expanded

    This patch places the expanding of the tracing buffer under the
    protection of the trace_types_lock mutex. It is highly unlikely
    that there would be any contention, but better safe than sorry.

    Reported-by: Andrew Morton
    Signed-off-by: Steven Rostedt

    Steven Rostedt
     
  • Impact: cleanup

    Some of the comments about the trace buffer resizing is gobbledygook.
    And I wonder why people question if I'm a native English speaker.

    This patch makes the comments make a bit more sense.

    Reported-by: Andrew Morton
    Signed-off-by: Steven Rostedt

    Steven Rostedt
     
  • …ip/linux-2.6-tip into trace/tip/tracing/ftrace-merge

    Steven Rostedt
     
  • Ingo Molnar
     
  • Impact: cleanup

    The naming clashes with upcoming softirq tracepoints, so rename the
    APIs to lockdep_*().

    Requested-by: Steven Rostedt
    Signed-off-by: Ingo Molnar

    Ingo Molnar
     
  • Ingo Molnar
     
  • * git://git.kernel.org/pub/scm/linux/kernel/git/sam/kbuild-fixes:
    kbuild: remove unused -r option for module-init-tool depmod
    kbuild: fix 'make rpm' when CONFIG_LOCALVERSION_AUTO=y and using SCM tree
    kbuild: fix mkspec to cleanup RPM_BUILD_ROOT
    kbuild: fix C libary confusion in unifdef.c due to getline()

    Linus Torvalds
     
  • * git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-for-linus:
    cpumask: mm_cpumask for accessing the struct mm_struct's cpu_vm_mask.
    cpumask: tsk_cpumask for accessing the struct task_struct's cpus_allowed.

    Linus Torvalds
     
  • * git://git.kernel.org/pub/scm/linux/kernel/git/pkl/squashfs-linus:
    Squashfs: Valid filesystems are flagged as bad by the corrupted fs patch

    Linus Torvalds
     
  • * 'hwmon-for-linus' of git://jdelvare.pck.nerim.net/jdelvare-2.6:
    hwmon: (f75375s) Remove unnecessary and confusing initialization
    hwmon: (it87) Properly decode -128 degrees C temperature
    hwmon: (lm90) Document support for the MAX6648/6692 chips
    hwmon: (abituguru3) Fix I/O error handling

    Linus Torvalds
     
  • Trivial patch to fix bad links in the ext2 and ext3 documentation.

    Signed-off-by: Jody McIntyre
    Signed-off-by: Linus Torvalds

    Jody McIntyre
     
  • * 'fixes-20090312' of git://git.kernel.org/pub/scm/linux/kernel/git/willy/pci:
    PCIe: portdrv: call pci_disable_device during remove
    pci: Fix typo in message while disabling HT MSI mapping
    pci: don't disable too many HT MSI mapping
    powerpc/pseries: The RPA PCI hotplug driver depends on EEH
    PCIe: AER: during disable, check subordinate before walking
    PCI: Add PCI quirk to disable L0s ASPM state for 82575 and 82598

    Linus Torvalds
     
  • STag zero is a special STag that allows consumers to access any bus
    address without registering memory. The nes driver unfortunately
    allows STag zero to be used even with QPs created by unprivileged
    userspace consumers, which means that any process with direct verbs
    access to the nes device can read and write any memory accessible to
    the underlying PCI device (usually any memory in the system). Such
    access is usually given for cluster software such as MPI to use, so
    this is a local privilege escalation bug on most systems running this
    driver.

    The driver was using STag zero to receive the last streaming mode
    data; to allow STag zero to be disabled for unprivileged QPs, the
    driver now registers a special MR for this data.

    Cc:
    Signed-off-by: Faisal Latif
    Signed-off-by: Roland Dreier
    Signed-off-by: Linus Torvalds

    Faisal Latif
     
  • There was a report of a data corruption
    http://lkml.org/lkml/2008/11/14/121. There is a script included to
    reproduce the problem.

    During testing, I encountered a number of strange things with ext3, so I
    tried ext2 to attempt to reduce complexity of the problem. I found that
    fsstress would quickly hang in wait_on_inode, waiting for I_LOCK to be
    cleared, even though instrumentation showed that unlock_new_inode had
    already been called for that inode. This points to a memory scribble or a
    synchronisation problem.

    i_state of I_NEW inodes is not protected by inode_lock because other
    processes are not supposed to touch them until I_LOCK (and I_NEW) is
    cleared. Adding WARN_ON(inode->i_state & I_NEW) to sites where we modify
    i_state revealed that generic_sync_sb_inodes is picking up new inodes from
    the inode lists and passing them to __writeback_single_inode without
    waiting for I_NEW. Subsequently modifying i_state causes corruption. In
    my case it would look like this:

    CPU0                            CPU1
    unlock_new_inode()              __sync_single_inode()
     reg <- inode->i_state
     reg -> reg & ~(I_LOCK|I_NEW)   reg <- inode->i_state
     reg -> inode->i_state          reg -> reg | I_SYNC
                                    reg -> inode->i_state

    Non-atomic RMW on CPU1 overwrites CPU0 store and sets I_LOCK|I_NEW again.

    The fix is to skip over I_NEW inodes rather than wait for them:
    inodes concurrently being created are not subject to data integrity
    operations, and should not significantly contribute to dirty memory
    either.

    After this change, I'm unable to reproduce any of the added warnings or
    hangs after ~1hour of running. Previously, the new warnings would start
    immediately and hang would happen in under 5 minutes.

    I'm also testing on ext3 now, and so far no problems there either. I
    don't know whether this fixes the problem reported above, but it fixes a
    real problem for me.

    Cc: "Jorge Boncompte [DTI2]"
    Reported-by: Adrian Hunter
    Cc: Jan Kara
    Cc:
    Signed-off-by: Nick Piggin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nick Piggin
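
The lost update described in the entry above can be replayed deterministically in a single thread. The sketch below follows the same interleaving of loads and stores; the flag values are arbitrary illustration values, not the kernel's, and both functions are hypothetical.

```c
/* Arbitrary bit values for illustration only. */
#define I_NEW  0x1u
#define I_LOCK 0x2u
#define I_SYNC 0x4u

/*
 * Replay the interleaving: both "CPUs" load i_state before either
 * stores, so the later store on CPU1 overwrites CPU0's cleared bits.
 */
static unsigned int replay_lost_update(unsigned int i_state)
{
    unsigned int reg0, reg1;

    reg0 = i_state;                 /* CPU0: load                    */
    reg0 &= ~(I_LOCK | I_NEW);      /* CPU0: clear I_LOCK and I_NEW  */
    reg1 = i_state;                 /* CPU1: load (still stale)      */
    i_state = reg0;                 /* CPU0: store                   */
    reg1 |= I_SYNC;                 /* CPU1: set I_SYNC              */
    i_state = reg1;                 /* CPU1: store, undoing CPU0     */

    return i_state;
}

/* The approach taken by the fix, as a predicate: skip inodes still being set up. */
static int should_skip_inode(unsigned int i_state)
{
    return (i_state & I_NEW) != 0;
}
```

Starting from `I_LOCK | I_NEW`, the replay ends with both bits set again plus `I_SYNC`, which matches the hang in wait_on_inode described in the message.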
     
  • Even when page reclaim is under mem_cgroup, the number of pages to scan is
    determined by the status of the global LRU. Fix that.

    Signed-off-by: KOSAKI Motohiro
    Signed-off-by: KAMEZAWA Hiroyuki
    Cc: Daisuke Nishimura
    Cc: Balbir Singh
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KOSAKI Motohiro
     
  • No software visible difference from revision A.

    Signed-off-by: Mark Brown
    Cc: Samuel Ortiz
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mark Brown
     
  • Currently we disable the Acer WMI backlight device if there is no ACPI
    backlight device. As a result, we end up with no backlight device at all.
    We should instead disable it if there is an ACPI device, as the other
    laptop drivers do. This regression was introduced in febf2d9 ("Acer-WMI:
    fingers off backlight if video.ko is serving this functionality").

    Each laptop driver with backlight support got a similar change around
    febf2d9. The changes to the other drivers look correct; see e.g.
    a598c82f for a similar but correct change. The regression is also in
    2.6.28.

    Signed-off-by: Michael Spang
    Acked-by: Thomas Renninger
    Cc: Zhang Rui
    Cc: Andi Kleen
    Cc: Carlos Corbacho
    Cc: Len Brown
    Cc: "Rafael J. Wysocki"
    Cc: [2.6.28.x]
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michael Spang
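
The inverted condition described in the entry above comes down to one boolean test. A minimal sketch, assuming two hypothetical inputs (whether an ACPI backlight device exists and whether the WMI interface reports brightness support), neither of which is an actual driver function:

```c
#include <stdbool.h>

/*
 * Decide whether the WMI backlight device should be registered.
 * Intended behaviour: stand back only when ACPI already provides one.
 */
static bool should_register_wmi_backlight(bool acpi_backlight_present,
                                          bool wmi_has_brightness)
{
    return !acpi_backlight_present && wmi_has_brightness;
}
```

With the regression, the first operand was effectively tested without the negation, so a machine with no ACPI backlight ended up with no backlight device at all.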
     
  • The s3cmci driver is calling s3c2410_dma_config with incorrect data for
    the DCON register. The S3C2410_DCON_HWTRIG is implicit in the channel
    configuration and the device selection of S3C2410_DCON_CH0_SDI is
    incorrect as the DMA system may not select channel 0.

    Signed-off-by: Ben Dooks
    Acked-by: Pierre Ossman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ben Dooks
     
  • Unfortunately, Linux Foundation funding for my work on
    man-pages/testing/doc under the auspices of the LF documentation
    fellowship ran out a short while ago (after earlier attempts
    to seek funding, only Google stepped forward with a bit of further funding
    for the position), so the patch below acknowledges something closer to
    reality.

    Unfortunately, there will (probably very) soon be a further downgrade from
    "Maintained" to "Odd Fixes" or "Orphan", unless some funding miracle
    occurs. So, if anyone is looking to become man-pages maintainer, there
    may soon be an opening (okay, don't trample me in the rush ;-).)

    Signed-off-by: Michael Kerrisk
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michael Kerrisk
     
  • The 'battery remaining capacity' calculation in
    drivers/power/ds2760_battery.c lacks a parameter check to a division
    operation which causes the kernel to oops on my board.

    [ 21.233750] Division by zero in kernel.
    [ 21.237646] [] (__div0+0x0/0x20) from [] (Ldiv0+0x8/0x10)
    [ 21.244816] [] (ds2760_battery_read_status+0x0/0x2a4) from [] (ds2760_battery_get_property+0x30/0xdc)
    [ 21.255803] r8:c03a22c0 r7:c7886100 r6:00000009 r5:c782fe7c r4:c7886084
    [ 21.262518] [] (ds2760_battery_get_property+0x0/0xdc) from [] (power_supply_show_property+0x48/0x114)
    [ 21.273480] r6:c7996000 r5:00000009 r4:00000000
    [ 21.278111] [] (power_supply_show_property+0x0/0x114) from [] (power_supply_uevent+0x188/0x280)
    [ 21.288537] r8:00000001 r7:c7886100 r6:c7996000 r5:000000b4 r4:00000000
    [ 21.295222] [] (power_supply_uevent+0x0/0x280) from [] (dev_uevent+0xd4/0x10c)
    [ 21.304199] [] (dev_uevent+0x0/0x10c) from [] (kobject_uevent_env+0x180/0x390)
    [ 21.313170] r5:00000000 r4:c78860ac
    [ 21.316725] [] (kobject_uevent_env+0x0/0x390) from [] (kobject_uevent+0x14/0x18)
    [ 21.325850] [] (kobject_uevent+0x0/0x18) from [] (power_supply_changed_work+0x5c/0x70)
    [ 21.335506] [] (power_supply_changed_work+0x0/0x70) from [] (run_workqueue+0xbc/0x144)
    [ 21.345167] r4:c7812040
    [ 21.347716] [] (run_workqueue+0x0/0x144) from [] (worker_thread+0xa8/0xbc)
    [ 21.356296] r7:c7812040 r6:c7820b00 r5:c782ffa4 r4:c7812048
    [ 21.361957] [] (worker_thread+0x0/0xbc) from [] (kthread+0x5c/0x94)
    [ 21.369971] r7:00000000 r6:c004d8a4 r5:c7812040 r4:c782e000
    [ 21.375612] [] (kthread+0x0/0x94) from [] (do_exit+0x0/0x688)

    Signed-off-by: Daniel Mack
    Cc: Szabolcs Gyurko
    Acked-by: Matt Reimer
    Acked-by: Anton Vorontsov
    Cc: David Woodhouse
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Daniel Mack
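
The kind of guard this fix calls for can be sketched as follows. The function and parameter names are hypothetical and the formula is only an assumed shape for a "remaining capacity" percentage, not the driver's actual code.

```c
/*
 * Hypothetical helper: percentage of usable charge remaining.
 * Returns 0 when the usable range is zero, instead of dividing by it.
 */
static int remaining_percent(int charge_now_uAh, int empty_uAh, int full_uAh)
{
    int span = full_uAh - empty_uAh;

    if (span == 0)
        return 0;   /* no usable range: avoid dividing by zero */

    return (int)(((long)(charge_now_uAh - empty_uAh) * 100L) / span);
}
```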
     
  • In sget(), destroy_super(s) is called with s->s_umount held, which makes
    lockdep unhappy.

    Signed-off-by: Li Zefan
    Cc: Al Viro
    Acked-by: Peter Zijlstra
    Cc: Paul Menage
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Li Zefan
     
  • If the second fasync_helper() fails, pipe_rdwr_fasync() returns the error
    but leaves the file on ->fasync_readers.

    This was always wrong, but since 233e70f4228e78eb2f80dc6650f65d3ae3dbf17c
    "saner FASYNC handling on file close" we have the new problem. Because in
    this case setfl() doesn't set FASYNC bit, __fput() will not do
    ->fasync(0), and we leak fasync_struct with ->fa_file pointing to the
    freed file.

    Signed-off-by: Oleg Nesterov
    Cc: Al Viro
    Cc: Andi Kleen
    Cc: Jonathan Corbet
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
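
The error path described in the entry above is an instance of "undo the first registration if the second one fails". The sketch below shows that pattern with stand-in types; `struct fake_list`, `fake_fasync_helper()` and `rdwr_fasync_sketch()` are hypothetical and only mimic the shape of the real code.

```c
/* Stand-in for a fasync list; fail_next simulates an allocation failure. */
struct fake_list {
    int registered;
    int fail_next;
};

/* Stand-in for the registration helper: register (on=1) or remove (on=0). */
static int fake_fasync_helper(struct fake_list *list, int on)
{
    if (on && list->fail_next)
        return -1;
    list->registered = on ? 1 : 0;
    return 0;
}

/* Register on both lists; roll back the first if the second fails. */
static int rdwr_fasync_sketch(struct fake_list *readers,
                              struct fake_list *writers, int on)
{
    int ret = fake_fasync_helper(readers, on);

    if (ret >= 0) {
        ret = fake_fasync_helper(writers, on);
        if (ret < 0)
            fake_fasync_helper(readers, 0);  /* undo the first step */
    }
    return ret;
}
```

Without the rollback, a failed call would leave the file on the readers list even though the caller sees an error.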
     
  • W1 master implementations are expected to return 0 or 1 from their
    read_bit() function. However, not all platforms do return these values
    from gpio_get_value() - namely PXAs won't. Hence the w1 gpio-master needs
    to break the result down to 0 or 1 itself.

    Signed-off-by: Daniel Mack
    Cc: Ville Syrjala
    Cc: Evgeniy Polyakov
    Cc: David Brownell
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Daniel Mack
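
To illustrate why the result needs normalising, the sketch below uses a stand-in GPIO read that returns the raw masked register bit, as some platforms may do. `fake_gpio_register`, `raw_gpio_get_value()` and `w1_read_bit_sketch()` are hypothetical names for this example only.

```c
/* Stand-in for a GPIO level register. */
static unsigned int fake_gpio_register;

/* Stand-in GPIO read: returns the masked bit, so any non-zero value is possible. */
static int raw_gpio_get_value(unsigned int pin)
{
    return (int)(fake_gpio_register & (1u << pin));
}

/* Reduce the raw value to exactly 0 or 1, as a read_bit() caller expects. */
static unsigned char w1_read_bit_sketch(unsigned int pin)
{
    return raw_gpio_get_value(pin) ? 1 : 0;
}
```

With pin 5 high, the raw read in this sketch yields 32, while the normalised read yields 1.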
     
  • Fix the following warning on x86_64:

    LD vmlinux.o
    MODPOST vmlinux.o
    WARNING: vmlinux: 'memcpy' exported twice. Previous export was in vmlinux

    For x86_64, this symbol is already exported from arch/um/sys-x86_64/ksyms.c.

    Reported-by: Boaz Harrosh
    Signed-off-by: WANG Cong
    Tested-by: Boaz Harrosh
    Cc: Jeff Dike
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    akpm@linux-foundation.org
     
  • It is currently impossible to run a user-mode linux machine inside another
    user-mode linux (UML on UML). It breaks after a few instructions. When
    it tries to check whether SYSEMU is installed (the inner) UML receives an
    inconsistent result (from the outer UML).

    This is the output of a broken attempt:
    $ ./linux mem=256m ubd0=cow
    Locating the bottom of the address space ... 0x0
    Locating the top of the address space ... 0xc0000000
    Core dump limits :
    soft - 0
    hard - NONE
    Checking that ptrace can change system call numbers...OK
    Checking ptrace new tags for syscall emulation...unsupported
    Checking syscall emulation patch for ptrace...check_sysemu : expected SIGTRAP, got status = 256
    $

    The problem is the following:

    PTRACE_SYSCALL/SINGLESTEP is currently managed inside arch_ptrace for ARCH=um.

    PTRACE_SYSEMU/SYSEMU_SINGLESTEP is not captured in arch_ptrace's switch,
    therefore it is erroneously passed back to ptrace_request (in
    kernel/ptrace).

    This simple patch forces ptrace to return an error on
    PTRACE_SYSEMU/SYSEMU_SINGLESTEP as it is unsupported on ARCH=um, and fixes
    the problem.

    [akpm@linux-foundation.org: coding-style fixes]
    Signed-off-by: Renzo Davoli
    Reviewed-by: WANG Cong
    Cc: Jeff Dike
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Renzo Davoli
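
The dispatch problem described in the entry above can be sketched as a switch that rejects the unsupported requests explicitly instead of letting them fall through to the generic handler. The request numbers, the `SKETCH_` names, the "defer" return value and the choice of `-EIO` are all assumptions made for this illustration.

```c
#include <errno.h>

/* Placeholder request numbers, for illustration only. */
enum {
    SKETCH_PTRACE_SYSCALL           = 24,
    SKETCH_PTRACE_SYSEMU            = 31,
    SKETCH_PTRACE_SYSEMU_SINGLESTEP = 32
};

/* Return value meaning "not handled here, pass to the generic code". */
#define SKETCH_DEFER_TO_GENERIC 1

static int arch_ptrace_sketch(long request)
{
    switch (request) {
    case SKETCH_PTRACE_SYSCALL:
        return 0;                        /* handled by the arch code */
    case SKETCH_PTRACE_SYSEMU:
    case SKETCH_PTRACE_SYSEMU_SINGLESTEP:
        return -EIO;                     /* unsupported: fail explicitly */
    default:
        return SKETCH_DEFER_TO_GENERIC;  /* everything else is deferred */
    }
}
```

An explicit error gives the inner UML a consistent "unsupported" answer when it probes for SYSEMU.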
     
  • The PCIe port driver calls pci_enable_device() during probe but
    never calls pci_disable_device() during remove.

    Cc: stable@kernel.org
    Signed-off-by: Alex Chiang
    Signed-off-by: Matthew Wilcox

    Alex Chiang
     
  • "Enabling" should read "Disabling"

    Signed-off-by: Prakash Punnoor
    Signed-off-by: Matthew Wilcox

    Prakash Punnoor
     
  • Prakash's system needs MSI disabled on some bridges, but not all.
    This seems to be the minimal fix for 2.6.29, but should be replaced
    during 2.6.30.

    Signed-off-by: Prakash Punnoor
    Signed-off-by: Matthew Wilcox

    Prakash Punnoor
     
  • The RPA PCI hotplug driver calls EEH routines, so should depend on
    EEH. Also PPC_PSERIES implies PPC64, so remove that.

    Signed-off-by: Michael Ellerman
    Signed-off-by: Matthew Wilcox

    Michael Ellerman