24 Jan, 2007

2 commits

  • Any newly added irq handler may obviously make any old spurious irq
    status invalid, since the new handler may well be the thing that is
    supposed to handle any interrupts that came in.

    So just clear the statistics when adding handlers.

    Pointed-out-by: Alan Cox
    Acked-by: Thomas Gleixner
    Acked-by: Ingo Molnar
    Signed-off-by: Linus Torvalds

    Linus Torvalds
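
The idea in the commit above can be sketched in user space as follows. This is a simplified model, not the kernel code: the struct layouts and the names note_interrupt() and add_handler() are assumptions made for illustration.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the kernel's per-IRQ bookkeeping. */
struct irq_action {
    int (*handler)(int irq);
    struct irq_action *next;
};

struct irq_desc {
    struct irq_action *action;   /* chain of registered handlers */
    unsigned int irq_count;      /* interrupts seen since the last reset */
    unsigned int irqs_unhandled; /* of those, how many nobody claimed */
};

/* Record one interrupt; 'handled' is nonzero if some handler claimed it. */
static void note_interrupt(struct irq_desc *desc, int handled)
{
    desc->irq_count++;
    if (!handled)
        desc->irqs_unhandled++;
}

/* Add a handler and discard the spurious-IRQ history gathered so far. */
static void add_handler(struct irq_desc *desc, struct irq_action *new_action)
{
    new_action->next = desc->action;
    desc->action = new_action;

    /* The old counts may describe interrupts meant for the new handler. */
    desc->irq_count = 0;
    desc->irqs_unhandled = 0;
}
```

Resetting both counters together keeps the unhandled ratio meaningful: it is then computed only over interrupts that arrived while the current set of handlers was installed.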
     
  • While lock-profiling the -rt kernel I noticed weird contention during
    mmap-intense workloads, and the tracer showed the following gem, in one
    of our MM hotpaths:

    threaded-2771 1.... 65us : sys_munmap (sysenter_do_call)
    threaded-2771 1.... 66us : profile_munmap (sys_munmap)
    threaded-2771 1.... 66us : blocking_notifier_call_chain (profile_munmap)
    threaded-2771 1.... 66us : rt_down_read (blocking_notifier_call_chain)

    Ouch! A global rw-semaphore taken in one of the most performance-sensitive
    codepaths of the kernel. And I don't even have oprofile enabled! All
    distro kernels have CONFIG_PROFILING enabled, so this scalability
    problem affects the majority of Linux users.

    The fix is to enhance blocking_notifier_call_chain() to only take the
    lock if there appears to be work on the call-chain.

    With this patch applied I get a nicely saturated system, and much higher
    munmap performance, on SMP systems.

    And as a bonus this also fixes a similar scalability bottleneck in the
    thread-exit codepath: profile_task_exit() ...

    Signed-off-by: Ingo Molnar
    Acked-by: Peter Zijlstra
    Acked-by: Nick Piggin
    Signed-off-by: Linus Torvalds

    Ingo Molnar
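
A minimal single-threaded sketch of the "only take the lock if there appears to be work" idea from the commit above. The fake_rwsem type merely counts acquisitions so the effect is observable; it, call_chain() and echo_event() are hypothetical names, and a real implementation would need a properly ordered read of the chain head.

```c
#include <stddef.h>

/* Single-threaded stand-in for an rw-semaphore; it only counts acquisitions. */
struct fake_rwsem {
    unsigned long read_acquisitions;
};

static void fake_down_read(struct fake_rwsem *sem) { sem->read_acquisitions++; }
static void fake_up_read(struct fake_rwsem *sem) { (void)sem; }

struct notifier {
    int (*call)(unsigned long event, void *data);
    struct notifier *next;
};

struct notifier_head {
    struct fake_rwsem rwsem;
    struct notifier *head;
};

static int call_chain(struct notifier_head *nh, unsigned long event, void *data)
{
    int ret = 0;

    /* Unlocked peek: skip the lock entirely when the chain looks empty. */
    if (nh->head == NULL)
        return ret;

    fake_down_read(&nh->rwsem);
    for (struct notifier *n = nh->head; n != NULL; n = n->next)
        ret = n->call(event, data);
    fake_up_read(&nh->rwsem);
    return ret;
}

/* Example callback: reports the event number back to the caller. */
static int echo_event(unsigned long event, void *data)
{
    (void)data;
    return (int)event;
}
```

The unlocked check is safe in this pattern because a caller that races with a registration behaves exactly like one that ran just before the registration happened.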
     

23 Jan, 2007

1 commit


12 Jan, 2007

3 commits

  • This adds the profile=kvm boot option, which enables KVM to profile VM
    exits.

    Use: "readprofile -m ./System.map | sort -n" to see the resulting
    output:

    [...]
    18246 serial_out 148.3415
    18945 native_flush_tlb 378.9000
    23618 serial_in 212.7748
    29279 __spin_unlock_irq 622.9574
    43447 native_apic_write 2068.9048
    52702 enable_8259A_irq 742.2817
    54250 vgacon_scroll 89.3740
    67394 ide_inb 6126.7273
    79514 copy_page_range 98.1654
    84868 do_wp_page 86.6000
    140266 pit_read 783.6089
    151436 ide_outb 25239.3333
    152668 native_io_delay 21809.7143
    174783 mask_and_ack_8259A 783.7803
    362404 native_set_pte_at 36240.4000
    1688747 total 0.5009

    Signed-off-by: Ingo Molnar
    Acked-by: Avi Kivity
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ingo Molnar
     
  • Compiling the kernel with CONFIG_HOTPLUG = y and CONFIG_HOTPLUG_CPU = n
    with CONFIG_RELOCATABLE = y generates the following modpost warnings

    WARNING: vmlinux - Section mismatch: reference to .init.data: from
    .text between '_cpu_up' (at offset 0xc0141b7d) and 'cpu_up'
    WARNING: vmlinux - Section mismatch: reference to .init.data: from
    .text between '_cpu_up' (at offset 0xc0141b9c) and 'cpu_up'
    WARNING: vmlinux - Section mismatch: reference to .init.text:__cpu_up
    from .text between '_cpu_up' (at offset 0xc0141bd8) and 'cpu_up'
    WARNING: vmlinux - Section mismatch: reference to .init.data: from
    .text between '_cpu_up' (at offset 0xc0141c05) and 'cpu_up'
    WARNING: vmlinux - Section mismatch: reference to .init.data: from
    .text between '_cpu_up' (at offset 0xc0141c26) and 'cpu_up'
    WARNING: vmlinux - Section mismatch: reference to .init.data: from
    .text between '_cpu_up' (at offset 0xc0141c37) and 'cpu_up'

    This is because cpu_up, _cpu_up and __cpu_up (in some architectures) are
    defined as __devinit, and __cpu_up calls some __cpuinit functions.

    Since __cpuinit would map to __init with this kind of a configuration,
    we get a .text referring to .init.data warning.

    This patch solves the problem by converting all of __cpu_up, _cpu_up
    and cpu_up from __devinit to __cpuinit. The approach is justified since
    the callers of cpu_up are either dependent on CONFIG_HOTPLUG_CPU or
    are of __init type.

    Thus when CONFIG_HOTPLUG_CPU=y, all these cpu up functions would land up
    in .text section, and when CONFIG_HOTPLUG_CPU=n, all these functions would
    land up in .init section.

    Tested on an i386 SMP machine running linux-2.6.20-rc3-mm1.

    Signed-off-by: Gautham R Shenoy
    Cc: Vivek Goyal
    Cc: Mikael Starvik
    Cc: Ralf Baechle
    Cc: Kyle McMartin
    Cc: Paul Mackerras
    Cc: Benjamin Herrenschmidt
    Cc: "David S. Miller"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Gautham R Shenoy
     
  • Commit 5c1e176781f43bc902a51e5832f789756bff911b ("sched: force /sbin/init
    off isolated cpus") sets init's cpus_allowed to a subset of cpu_online_map
    at boot time, which means that tasks won't be scheduled on cpus that are
    added to the system later.

    Make init's cpus_allowed a subset of cpu_possible_map instead. This should
    still preserve the behavior that Nick's change intended.

    Thanks to Giuliano Pochini for reporting this and testing the fix:

    http://ozlabs.org/pipermail/linuxppc-dev/2006-December/029397.html

    Signed-off-by: Nathan Lynch
    Acked-by: Ingo Molnar
    Cc: Nick Piggin
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nathan Lynch
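
The difference between the two masks in the commit above can be illustrated with plain bitmasks. This is a sketch under simplifying assumptions: cpumask_t is reduced to one unsigned long, and init_cpus_allowed() and may_run_on() are hypothetical helpers.

```c
#include <stdbool.h>

typedef unsigned long cpumask_t; /* one bit per CPU; hypothetical simplification */

#define CPU_BIT(n) (1UL << (n))

/* Allowed mask for init: every candidate CPU that is not isolated. */
static cpumask_t init_cpus_allowed(cpumask_t candidate_map, cpumask_t isolated_map)
{
    return candidate_map & ~isolated_map;
}

/* True if a task with this allowed mask may be scheduled on 'cpu'. */
static bool may_run_on(cpumask_t allowed, int cpu)
{
    return (allowed & CPU_BIT(cpu)) != 0;
}
```

With CPUs 0 and 1 online at boot and CPUs 2 and 3 merely possible, a mask derived from the online map never admits CPU 2 after it is hot-added, while a mask derived from the possible map does, and isolated CPUs stay excluded in both cases.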
     

11 Jan, 2007

1 commit

  • o noirqdebug_setup() is __init but it is being called by
    quirk_intel_irqbalance() which is of type __devinit. If CONFIG_HOTPLUG=y,
    quirk_intel_irqbalance() is put into the text section and it is wrong to
    call a function in the __init section.

    o MODPOST flags this on i386 if CONFIG_RELOCATABLE=y

    WARNING: vmlinux - Section mismatch: reference to .init.text:noirqdebug_setup from .text between 'quirk_intel_irqbalance' (at offset 0xc010969e) and 'i8237A_suspend'

    o Make noirqdebug_setup() non-init.

    Signed-off-by: Vivek Goyal
    Signed-off-by: Andi Kleen

    Vivek Goyal
     

06 Jan, 2007

5 commits


01 Jan, 2007

1 commit

  • Commit b2b2cbc4b2a2f389442549399a993a8306420baf introduced a user-
    visible change: ->pdeath_signal is sent only when the entire thread
    group exits.

    While this change is imho good, it may break things. So restore the
    old behaviour for now.

    Signed-off-by: Oleg Nesterov
    To: Albert Cahalan
    Cc: Eric W. Biederman
    Cc: Andrew Morton
    Cc: Linus Torvalds
    Cc: Ingo Molnar
    Cc: Qi Yong
    Cc: Roland McGrath
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
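
For context, user space requests the ->pdeath_signal discussed above through prctl(2) on Linux. The sketch below only sets and reads back the setting; the wrapper names are hypothetical, and it does not demonstrate when the kernel actually delivers the signal.

```c
#include <signal.h>
#include <sys/prctl.h>

/* Ask the kernel to signal the caller with 'sig' when its parent goes away.
 * Returns 0 on success, -1 on error (errno is set by prctl). */
static int set_parent_death_signal(int sig)
{
    return prctl(PR_SET_PDEATHSIG, (unsigned long)sig);
}

/* Read back the configured parent-death signal; 0 means none is set. */
static int get_parent_death_signal(void)
{
    int sig = 0;

    if (prctl(PR_GET_PDEATHSIG, &sig) != 0)
        return -1;
    return sig;
}
```

A child would typically call set_parent_death_signal(SIGTERM) right after fork(), which is why a change in what counts as the parent exiting is visible to such programs.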
     

31 Dec, 2006

5 commits

  • kernel/lockdep.c: In function `lookup_chain_cache':
    kernel/lockdep.c:1339: warning: long long unsigned int format, u64 arg (arg 2)
    kernel/lockdep.c:1344: warning: long long unsigned int format, u64 arg (arg 2)

    Cc: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
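
One common way to silence this class of warning is to cast the 64-bit value to the type the format string names. The sketch below shows the pattern in portable C; format_chain_key() is a hypothetical helper, and the actual patch may have done something different.

```c
#include <stdint.h>
#include <stdio.h>

/* Format a 64-bit key as 16 hex digits. The cast makes the argument match
 * the %llx conversion whatever underlying type the platform picks for a
 * 64-bit integer. */
static int format_chain_key(char *buf, size_t len, uint64_t chain_key)
{
    return snprintf(buf, len, "%016llx", (unsigned long long)chain_key);
}
```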
     
  • fs/proc/base.c:1869: warning: initialization discards qualifiers from pointer target type
    fs/proc/base.c:2150: warning: initialization discards qualifiers from pointer target type

    Cc: Paul Jackson
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
     
  • mod_sysfs_setup() does not return an error when kobject_add_dir() fails.

    Signed-off-by: Akinobu Mita
    Cc: Greg KH
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Akinobu Mita
     
  • Remove the __resched_legal() check: it is conceptually broken. The biggest
    problem it had is that it can mask buggy cond_resched() calls. A
    cond_resched() call is only legal if we are not in an atomic context, with
    two narrow exceptions:

    - if the system is booting
    - a reacquire_kernel_lock() down() done while PREEMPT_ACTIVE is set

    But __resched_legal() hid this and just silently returned whenever
    these primitives were called from invalid contexts. (Same goes for
    cond_resched_locked() and cond_resched_softirq()).

    Furthermore, the __resched_legal(0) call was buggy in that it caused
    unnecessarily long softirq latencies via cond_resched_softirq(), which is
    only called from softirq-off sections, hence the code did nothing.

    The fix is to resurrect the efficiency of the might_sleep checks and to
    only allow the narrow exceptions.

    Signed-off-by: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ingo Molnar
     
  • Fix suspend hang: rcutorture threads need to be nofreeze.

    Signed-off-by: Ingo Molnar
    Acked-by: Paul E. McKenney
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ingo Molnar
     

24 Dec, 2006

1 commit

  • Clark Williams reported that suspend doesn't work on his laptop on
    2.6.20-rc1-rt kernels. The bug was introduced by the following cleanup
    commit:

    commit 112cecb2cc0e7341db92281ba04b26c41bb8146d
    Author: Siddha, Suresh B
    Date: Wed Dec 6 20:34:31 2006 -0800

    [PATCH] suspend: don't change cpus_allowed for task initiating the suspend

    because with this change 'error' is not initialized to 0 anymore, if
    there are no other online CPUs. (i.e. if the system is single-CPU).

    The fix is to initialize it to 0. The really weird thing is that my
    version of gcc does not warn about this non-initialized variable
    situation ...

    (also fix the kernel printk in the error branch, it was missing a
    newline)

    Reported-by: Clark Williams
    Signed-off-by: Ingo Molnar
    Signed-off-by: Linus Torvalds

    Ingo Molnar
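
The bug pattern described above, a result variable assigned only inside a loop that may run zero times, can be sketched as follows. All names here are hypothetical stand-ins, not the kernel's functions.

```c
#include <stdbool.h>

#define MAX_CPUS 8

/* Hypothetical stand-in: pretend to take one CPU offline; 0 means success. */
static int take_cpu_down(int cpu)
{
    (void)cpu;
    return 0;
}

/* Offline every online CPU except 'boot_cpu'; returns 0 or the first error. */
static int disable_nonboot_cpus_sketch(const bool online[MAX_CPUS], int boot_cpu)
{
    int error = 0; /* must start at 0: the loop body may never assign it */

    for (int cpu = 0; cpu < MAX_CPUS; cpu++) {
        if (!online[cpu] || cpu == boot_cpu)
            continue;
        error = take_cpu_down(cpu);
        if (error)
            break;
    }
    return error;
}
```

On a single-CPU system every iteration hits the continue, so without the initializer the function would return whatever happened to be on the stack.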
     

23 Dec, 2006

11 commits

  • * 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6: (68 commits)
    ACPI: replace kmalloc+memset with kzalloc
    ACPI: Add support for acpi_load_table/acpi_unload_table_id
    fbdev: update after backlight argument change
    ACPI: video: Add dev argument for backlight_device_register
    ACPI: Implement acpi_video_get_next_level()
    ACPI: Kconfig - depend on PM rather than selecting it
    ACPI: fix NULL check in drivers/acpi/osl.c
    ACPI: make drivers/acpi/ec.c:ec_ecdt static
    ACPI: prevent processor module from loading on failures
    ACPI: fix single linked list manipulation
    ACPI: ibm_acpi: allow clean removal
    ACPI: fix git automerge failure
    ACPI: ibm_acpi: respond to workqueue update
    ACPI: dock: add uevent to indicate change in device status
    ACPI: ec: Lindent once again
    ACPI: ec: Change #define to enums there possible.
    ACPI: ec: Style changes.
    ACPI: ec: Acquire Global Lock under EC mutex.
    ACPI: ec: Drop udelay() from poll mode. Loop by reading status field instead.
    ACPI: ec: Rename gpe_bit to gpe
    ...

    Linus Torvalds
     
  • This patch fixes the case when we reparent to a different thread in the
    same thread group. This modifies the code so that we do not send
    signals and do not change the signal to send to SIGCHLD unless we have
    changed the thread group of our parents. It also suppresses sending
    pdeath_sig in this case, since the result of getppid doesn't
    change.

    Thanks to Oleg for spotting my bug of only fixing this for non-ptraced
    tasks.

    Signed-off-by: Eric W. Biederman
    Cc: Mike Galbraith
    Cc: Albert Cahalan
    Cc: Andrew Morton
    Cc: Roland McGrath
    Cc: Ingo Molnar
    Cc: Coywolf Qi Hunt
    Acked-by: Oleg Nesterov
    Signed-off-by: Linus Torvalds

    Eric W. Biederman
     
  •            text   data    bss    dec    hex filename
    before:    4036     44      0   4080    ff0 kernel/relay.o
    after:     3727     44      0   3771    ebb kernel/relay.o

    Cc: Mathieu Desnoyers
    Cc: Tom Zanussi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
     
  • Christoph Hellwig has expressed concerns that the recent fdtable changes
    expose the details of the RCU methodology used to release no-longer-used
    fdtable structures to the rest of the kernel. The trivial patch below
    addresses these concerns by introducing the appropriate free_fdtable()
    calls, which simply wrap the release RCU usage. Since free_fdtable() is a
    one-liner, it makes sense to promote it to an inline helper.

    Signed-off-by: Vadim Lobanov
    Cc: Christoph Hellwig
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vadim Lobanov
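
The shape of such a wrapper can be sketched in user space as below. The struct layouts are simplified, and call_rcu_stub() is a hypothetical stand-in that runs the callback immediately, whereas the kernel's call_rcu() defers it until a grace period has elapsed.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical user-space stand-ins for the kernel's RCU callback plumbing. */
struct rcu_head {
    void (*func)(struct rcu_head *head);
};

/* Stand-in for call_rcu(): runs 'func' at once instead of deferring it. */
static void call_rcu_stub(struct rcu_head *head, void (*func)(struct rcu_head *))
{
    head->func = func;
    func(head);
}

struct fdtable {
    unsigned int max_fds;
    struct rcu_head rcu;
};

static int tables_released; /* instrumentation for this sketch only */

static void free_fdtable_rcu(struct rcu_head *head)
{
    /* Recover the enclosing fdtable from its embedded rcu_head. */
    struct fdtable *fdt =
        (struct fdtable *)((char *)head - offsetof(struct fdtable, rcu));

    free(fdt);
    tables_released++;
}

/* Callers see only this helper; the release mechanism stays private. */
static inline void free_fdtable(struct fdtable *fdt)
{
    call_rcu_stub(&fdt->rcu, free_fdtable_rcu);
}
```

Because callers only ever name free_fdtable(), the release strategy can later change without touching them.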
     
  • Kyle is hitting this warning, and we don't have a clue what it's caused by.
    Add the obligatory dump_stack().

    Cc: kyle
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
     
  • kstrdup() returns NULL on error.

    Cc: David Woodhouse
    Signed-off-by: Akinobu Mita
    Cc: Al Viro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Akinobu Mita
     
  • The sanity check for no_irq_chip in __set_irq_handler() is unconditional on
    both install and uninstall of a handler. This triggers false warnings and
    replaces no_irq_chip by dummy_irq_chip in the uninstall case.

    Check only when a real handler is installed.

    Signed-off-by: Thomas Gleixner
    Acked-by: Ingo Molnar
    Acked-by: Sylvain Munaut
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Thomas Gleixner
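
A minimal model of the corrected control flow: the chip check runs only on the install path. The types are reduced to the fields the check touches, and set_irq_handler_sketch(), its boolean return value and example_handler() are hypothetical additions for illustration.

```c
#include <stdbool.h>
#include <stddef.h>

struct irq_chip { const char *name; };

static struct irq_chip no_irq_chip = { "none" };
static struct irq_chip dummy_irq_chip = { "dummy" };

typedef void (*irq_flow_handler_t)(unsigned int irq);

static void handle_bad_irq(unsigned int irq) { (void)irq; }

struct irq_desc {
    struct irq_chip *chip;
    irq_flow_handler_t handle_irq;
};

/* Returns true if the "handler without a chip" warning path was taken. */
static bool set_irq_handler_sketch(struct irq_desc *desc, irq_flow_handler_t handle)
{
    bool warned = false;

    if (handle == NULL) {
        /* Uninstall: fall back to the bad-IRQ handler, leave the chip alone. */
        handle = handle_bad_irq;
    } else if (desc->chip == &no_irq_chip) {
        /* Install of a real handler on a chip-less IRQ: warn and patch up. */
        warned = true;
        desc->chip = &dummy_irq_chip;
    }
    desc->handle_irq = handle;
    return warned;
}

/* Example of a real flow handler to install. */
static void example_handler(unsigned int irq) { (void)irq; }
```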
     
  • The structure cpu_isolated_map is used not only during initialization.
    Multi-core scheduler configuration changes and exclusive cpusets
    use this during run time. During setting of sched_mc_power_savings
    policy, this structure is accessed to update sched_domains.

    Signed-off-by: Tim Chen
    Acked-by: Suresh Siddha
    Acked-by: Ingo Molnar
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Tim Chen
     
  • Signed-off-by: Adrian Bunk
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Adrian Bunk
     
  • Fix kernel-doc warnings in 2.6.20-rc1.

    Signed-off-by: Randy Dunlap
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Randy Dunlap
     
  • Commit 2d7d253548cffdce80f4e03664686e9ccb1b0ed7 ("fix cond_resched() fix")
    introduced an 'expected_preempt_count' parameter to __resched_legal() to
    fix a bug where it was returning a false negative when called from
    cond_resched_lock() and preemption was enabled.

    Unfortunately this broke things for when preemption is disabled.
    preempt_count() will always return zero, thus failing the check against any
    value of expected_preempt_count not equal to zero. cond_resched_lock() for
    example, passes an expected_preempt_count value of 1.

    So fix the fix for the cond_resched() fix by skipping the check of
    preempt_count() against expected_preempt_count when preemption is disabled.

    Credit should go to Sunil Mushran for spotting the bug during testing.

    Signed-off-by: Mark Fasheh
    Acked-by: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mark Fasheh
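
The logic of this fix can be modelled with a runtime flag in place of the compile-time preemption option. The struct and the function name resched_legal() are assumptions for illustration; the kernel selects the behaviour at build time rather than through a field.

```c
#include <stdbool.h>

/* Hypothetical model of the state the check consults. */
struct sched_state {
    bool preempt_enabled; /* models whether preemption is compiled in */
    int preempt_count;    /* stays 0 when preemption is compiled out */
};

static bool resched_legal(const struct sched_state *s, int expected_preempt_count)
{
    /* Compare counts only when the kernel actually maintains them. */
    if (s->preempt_enabled && s->preempt_count != expected_preempt_count)
        return false;
    return true;
}
```

With preemption disabled the count is always zero, so a caller that expects a count of 1 is accepted only because the comparison is skipped.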
     

21 Dec, 2006

3 commits

  • fix the schedule_on_each_cpu() implementation: __queue_work() is now
    stricter, hence set the work-pending bit before passing in the new work.

    (found in the -rt tree, using Peter Zijlstra's files-lock scalability
    patchset)

    Signed-off-by: Ingo Molnar
    Signed-off-by: Linus Torvalds

    Ingo Molnar
     
  • Problem:
    sched_fork() has always called scheduler_tick() in some (unlikely)
    circumstances in order to update the current task in light of those
    circumstances. It has always been the case that the work done by
    scheduler_tick() was more than was required to handle the problem in
    hand but no harm was done except for the waste of a few CPU cycles.

    However, the splitting of scheduler_tick() into two procedures in
    2.6.20-rc1 enables the wasted cycles to be saved as the new procedure
    task_running_tick() does all the work that is required to rectify the
    problem being handled.

    Solution:
    Replace the call to scheduler_tick() in sched_fork() with a call to
    task_running_tick().

    Signed-off-by: Peter Williams
    Acked-by: Ingo Molnar
    Signed-off-by: Linus Torvalds

    Peter Williams
     
  • __set_irq_handler: Kill a bogus space

    Signed-off-by: Geert Uytterhoeven
    Signed-off-by: Linus Torvalds

    Geert Uytterhoeven
     

20 Dec, 2006

1 commit


17 Dec, 2006

1 commit

  • On architectures where the atomicity of the bit operations is handled by
    external means (ie a separate spinlock to protect concurrent accesses),
    just doing a direct assignment on the workqueue data field (as done by
    commit 4594bf159f1962cec3b727954b7c598b07e2e737) can cause the
    assignment to be lost due to lack of serialization with the bitops on
    the same word.

    So we need to serialize the assignment with the locks on those
    architectures (notably older ARM chips, PA-RISC and sparc32).

    So rather than using an "unsigned long", let's use "atomic_long_t",
    which already has a safe assignment operation (atomic_long_set()) on
    such architectures.

    This requires that the atomic operations use the same atomicity locks as
    the bit operations do, but that is largely the case anyway. Sparc32
    will probably need fixing.

    Architectures (including modern ARM with LL/SC) that implement sane
    atomic operations for SMP won't see any of this matter.

    Cc: Russell King
    Cc: David Howells
    Cc: David Miller
    Cc: Matthew Wilcox
    Cc: Linux Arch Maintainers
    Cc: Andrew Morton
    Signed-off-by: Linus Torvalds

    Linus Torvalds
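
A sketch of why the assignment has to share the bit operations' lock on such architectures. Every emu_* name is hypothetical; the spinlock is built on C11 atomic_flag purely so the example is portable, whereas the affected architectures would build it from whatever primitive they have.

```c
#include <stdatomic.h>

/* One lock guards every emulated atomic operation on the word. */
static atomic_flag emu_lock = ATOMIC_FLAG_INIT;

static void emu_lock_acquire(void)
{
    while (atomic_flag_test_and_set_explicit(&emu_lock, memory_order_acquire))
        ; /* spin until the flag was previously clear */
}

static void emu_lock_release(void)
{
    atomic_flag_clear_explicit(&emu_lock, memory_order_release);
}

typedef struct { long counter; } emu_atomic_long_t;

/* Emulated bit operation: a locked read-modify-write. */
static void emu_set_bit(int nr, emu_atomic_long_t *v)
{
    emu_lock_acquire();
    v->counter |= 1L << nr;
    emu_lock_release();
}

/* Emulated assignment: takes the same lock, so it cannot land in the middle
 * of a concurrent emu_set_bit() and be overwritten by its write-back. */
static void emu_atomic_long_set(emu_atomic_long_t *v, long value)
{
    emu_lock_acquire();
    v->counter = value;
    emu_lock_release();
}

static long emu_atomic_long_read(emu_atomic_long_t *v)
{
    emu_lock_acquire();
    long value = v->counter;
    emu_lock_release();
    return value;
}
```

A plain store to v->counter would bypass emu_lock, which is the lost-assignment hazard the commit message describes.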
     

16 Dec, 2006

2 commits


14 Dec, 2006

3 commits