16 Sep, 2009

2 commits

  • * 'x86-pat-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    x86, pat: Fix cacheflush address in change_page_attr_set_clr()
    mm: remove !NUMA condition from PAGEFLAGS_EXTENDED condition set
    x86: Fix earlyprintk=dbgp for machines without NX
    x86, pat: Sanity check remap_pfn_range for RAM region
    x86, pat: Lookup the protection from memtype list on vm_insert_pfn()
    x86, pat: Add lookup_memtype to get the current memtype of a paddr
    x86, pat: Use page flags to track memtypes of RAM pages
    x86, pat: Generalize the use of page flag PG_uncached
    x86, pat: Add rbtree to do quick lookup in memtype tracking
    x86, pat: Add PAT reserve free to io_mapping* APIs
    x86, pat: New i/f for driver to request memtype for IO regions
    x86, pat: ioremap to follow same PAT restrictions as other PAT users
    x86, pat: Keep identity maps consistent with mmaps even when pat_disabled
    x86, mtrr: make mtrr_aps_delayed_init static bool
    x86, pat/mtrr: Rendezvous all the cpus for MTRR/PAT init
    generic-ipi: Allow cpus not yet online to call smp_call_function with irqs disabled
    x86: Fix an incorrect argument of reserve_bootmem()
    x86: Fix system crash when loading with "reservetop" parameter

    Linus Torvalds
     
  • * 'x86-txt-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    x86, intel_txt: clean up the impact on generic code, unbreak non-x86
    x86, intel_txt: Handle ACPI_SLEEP without X86_TRAMPOLINE
    x86, intel_txt: Fix typos in Kconfig help
    x86, intel_txt: Factor out the code for S3 setup
    x86, intel_txt: tboot.c needs
    intel_txt: Force IOMMU on for Intel TXT launch
    x86, intel_txt: Intel TXT Sx shutdown support
    x86, intel_txt: Intel TXT reboot/halt shutdown support
    x86, intel_txt: Intel TXT boot support

    Linus Torvalds
     

15 Sep, 2009

14 commits

  • * 'for-linus3' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/security-testing-2.6:
    SELinux: inline selinux_is_enabled in !CONFIG_SECURITY_SELINUX
    KEYS: Fix garbage collector
    KEYS: Unlock tasklist when exiting early from keyctl_session_to_parent
    CRED: Allow put_cred() to cope with a NULL groups list
    SELinux: flush the avc before disabling SELinux
    SELinux: seperate avc_cache flushing
    Creds: creds->security can be NULL is selinux is disabled

    Linus Torvalds
     
  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/suspend-2.6: (23 commits)
    at_hdmac: Rework suspend_late()/resume_early()
    PM: Reset transition_started at dpm_resume_noirq
    PM: Update kerneldoc comments in drivers/base/power/main.c
    PM: Add convenience macro to make switching to dev_pm_ops less error-prone
    hp-wmi: Switch driver to dev_pm_ops
    floppy: Switch driver to dev_pm_ops
    PM: Trivial fixes
    PM / Hibernate / Memory hotplug: Always use for_each_populated_zone()
    PM/Hibernate: Do not try to allocate too much memory too hard (rev. 2)
    PM/Hibernate: Do not release preallocated memory unnecessarily (rev. 2)
    PM/Hibernate: Rework shrinking of memory
    PM: Fix typo in label name s/Platofrm_finish/Platform_finish/
    PM: Run-time PM platform device bus support
    PM: Introduce core framework for run-time PM of I/O devices (rev. 17)
    Driver Core: Make PM operations a const pointer
    PM: Remove platform device suspend_late()/resume_early() V2
    USB: Rework musb suspend()/resume_early()
    I2C: Rework i2c-s3c2410 suspend_late()/resume() V2
    I2C: Rework i2c-pxa suspend_late()/resume_early()
    DMA: Rework txx9dmac suspend_late()/resume_early()
    ...

    Fix trivial conflict in drivers/base/platform.c (due to same
    constification patch being merged in both sides, along with some other
    PM work in the PM branch)

    Linus Torvalds
     
  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-2.6-kconfig:
    kconfig: add missing dependency of conf to localyesconfig
    kconfig: test if a .config already exists
    kconfig: make local .config default for streamline_config
    kconfig: test for /boot/config-uname after /proc/config.gz in localconfig
    kconfig: unset IKCONFIG_PROC and clean up nesting
    kconfig: search for a config to base the local(mod|yes)config on
    kconfig: keep config.gz around even if CONFIG_IKCONFIG_PROC is not set
    kconfig: have extract-ikconfig read ELF files
    kconfig: add check if end exists in extract-ikconfig
    kconfig: enable CONFIG_IKCONFIG from streamline_config.pl
    kconfig: do not warn about modules built in
    kconfig: streamline_config.pl do not stop with no depends
    kconfig: add make localyesconfig option
    kconfig: make localmodconfig to run streamline_config.pl
    kconfig: add streamline_config.pl to scripts

    Linus Torvalds
     
  • * 'for-2.6.32' of git://git.kernel.dk/linux-2.6-block: (29 commits)
    block: use blkdev_issue_discard in blk_ioctl_discard
    Make DISCARD_BARRIER and DISCARD_NOBARRIER writes instead of reads
    block: don't assume device has a request list backing in nr_requests store
    block: Optimal I/O limit wrapper
    cfq: choose a new next_req when a request is dispatched
    Seperate read and write statistics of in_flight requests
    aoe: end barrier bios with EOPNOTSUPP
    block: trace bio queueing trial only when it occurs
    block: enable rq CPU completion affinity by default
    cfq: fix the log message after dispatched a request
    block: use printk_once
    cciss: memory leak in cciss_init_one()
    splice: update mtime and atime on files
    block: make blk_iopoll_prep_sched() follow normal 0/1 return convention
    cfq-iosched: get rid of must_alloc flag
    block: use interrupts disabled version of raise_softirq_irqoff()
    block: fix comment in blk-iopoll.c
    block: adjust default budget for blk-iopoll
    block: fix long lines in block/blk-iopoll.c
    block: add blk-iopoll, a NAPI like approach for block devices
    ...

    Linus Torvalds
     
  • console_print() is an old legacy interface mostly unused in the entire
    kernel tree. It's best to clean up its existing use and let developers
    use their own implementation of it as they feel fit.

    Signed-off-by: Anirban Sinha
    Signed-off-by: Linus Torvalds

    Anirban Sinha
     
  • put_cred() will oops if given a NULL groups list, but that is now possible with
    the existence of cred_alloc_blank(), as used in keyctl_session_to_parent().

    Added in commit:

    commit ee18d64c1f632043a02e6f5ba5e045bb26a5465f
    Author: David Howells
    Date: Wed Sep 2 09:14:21 2009 +0100
    KEYS: Add a keyctl to install a process's session keyring on its parent [try #6]

    Reported-by: Marc Dionne
    Signed-off-by: David Howells
    Signed-off-by: James Morris

    David Howells
     
  • Fix the definition of BM_BITS_PER_BLOCK and kerneldoc
    description of create_bm_block_list().

    [rjw: Added changelog.]

    Signed-off-by: Wu Fengguang
    Signed-off-by: Rafael J. Wysocki

    Wu Fengguang
     
  • Use for_each_populated_zone() instead of for_each_zone() in hibernation
    code. This fixes a bug on s390, where we allow both config options
    HIBERNATION and MEMORY_HOTPLUG, so that we also have a ZONE_MOVABLE
    here. We only allow hibernation if no memory hotplug operation was
    performed, so in fact both features can only be used exclusively, but
    this way we don't need 2 differently configured (distribution) kernels.

    If we have an unpopulated ZONE_MOVABLE, we allow hibernation but run
    into a BUG_ON() in memory_bm_test/set/clear_bit() because hibernation
    code iterates through all zones, not only the populated zones, in
    several places. For example, swsusp_free() does for_each_zone() and
    then checks for pfn_valid(), which is true even if the zone is not
    populated, resulting in a BUG_ON() later because the pfn cannot be
    found in the memory bitmap.

    Replacing all occurrences of for_each_zone() in hibernation code with
    for_each_populated_zone() would fix this issue.

    [rjw: Rebased on top of linux-next hibernation patches.]

    Signed-off-by: Gerald Schaefer
    Acked-by: KOSAKI Motohiro
    Signed-off-by: Rafael J. Wysocki

    Gerald Schaefer
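
The difference between walking all zones and walking only populated zones can be modelled in user space. This is a sketch, not kernel code: `struct zone_model`, `zone_is_populated()` and `count_visited_zones()` are hypothetical names, and the assumption is that "populated" simply means a nonzero page count.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical user-space stand-in for a memory zone. */
struct zone_model {
    const char *name;
    unsigned long present_pages; /* 0 means the zone is unpopulated */
};

/* Assumed populated check: a zone counts only if it holds pages. */
static int zone_is_populated(const struct zone_model *z)
{
    return z->present_pages != 0;
}

/*
 * Count the zones a walker would visit. With only_populated == 0 every
 * zone is visited; otherwise empty zones (such as an unused movable
 * zone) are skipped, so later per-page lookups never see them.
 */
static size_t count_visited_zones(const struct zone_model *zones, size_t n,
                                  int only_populated)
{
    size_t visited = 0;

    assert(zones != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (only_populated && !zone_is_populated(&zones[i]))
            continue;
        visited++;
    }
    return visited;
}
```

With three zones of which one is empty, the unfiltered walk visits three zones and the filtered walk visits two.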
     
  • We want to avoid attempting to free too much memory too hard during
    hibernation, so estimate the minimum size of the image to use as the
    lower limit for preallocating memory.

    The approach here is based on the (experimental) observation that we
    can't free more page frames than the sum of:

    * global_page_state(NR_SLAB_RECLAIMABLE)
    * global_page_state(NR_ACTIVE_ANON)
    * global_page_state(NR_INACTIVE_ANON)
    * global_page_state(NR_ACTIVE_FILE)
    * global_page_state(NR_INACTIVE_FILE)

    minus

    * global_page_state(NR_FILE_MAPPED)

    Namely, if this number is subtracted from the number of saveable
    pages in the system, we get a good estimate of the minimum reasonable
    size of a hibernation image.

    Signed-off-by: Rafael J. Wysocki
    Acked-by: Wu Fengguang

    Rafael J. Wysocki
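
The estimate described above can be written out as plain arithmetic. The following is a user-space sketch only: `struct page_counts` and both function names are hypothetical, the counter values would come from global_page_state() in the kernel, and clamping at zero is an assumption made here to avoid unsigned underflow.

```c
#include <assert.h>

/* Hypothetical snapshot of the counters named in the commit message. */
struct page_counts {
    unsigned long slab_reclaimable;
    unsigned long active_anon;
    unsigned long inactive_anon;
    unsigned long active_file;
    unsigned long inactive_file;
    unsigned long file_mapped;
};

/* Upper bound on freeable page frames: the five counters minus mapped file pages. */
static unsigned long freeable_estimate(const struct page_counts *c)
{
    unsigned long sum;

    assert(c != 0);
    sum = c->slab_reclaimable + c->active_anon + c->inactive_anon +
          c->active_file + c->inactive_file;
    /* Assumption: clamp at zero instead of wrapping around. */
    return sum > c->file_mapped ? sum - c->file_mapped : 0;
}

/* Minimum reasonable image size: saveable pages minus what could be freed. */
static unsigned long minimum_image_size(unsigned long saveable,
                                        const struct page_counts *c)
{
    unsigned long freeable = freeable_estimate(c);

    return saveable > freeable ? saveable - freeable : 0;
}
```

For example, counters of 100, 200, 300, 400 and 500 pages with 250 mapped file pages give a freeable estimate of 1250, so 2000 saveable pages yield a minimum image size of 750 pages.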
     
  • Since the hibernation code is now going to use allocations of memory
    to make enough room for the image, it can also use the page frames
    allocated at this stage as image page frames. The low-level
    hibernation code needs to be rearranged for this purpose, but it
    allows us to avoid freeing a great number of pages and allocating
    these same pages once again later, so it generally is worth doing.

    [rev. 2: Take highmem into account correctly.]

    Signed-off-by: Rafael J. Wysocki

    Rafael J. Wysocki
     
  • Rework swsusp_shrink_memory() so that it calls shrink_all_memory()
    just once to make some room for the image and then allocates memory
    to apply more pressure to the memory management subsystem, if
    necessary.

    Unfortunately, we don't seem to be able to drop shrink_all_memory()
    entirely just yet, because that would lead to huge performance
    regressions in some test cases.

    Signed-off-by: Rafael J. Wysocki
    Acked-by: Pavel Machek

    Rafael J. Wysocki
     
  • Although the same label name is used somewhere else in the file, this
    particular label was consistently typoed in all of its uses.

    Signed-off-by: Thadeu Lima de Souza Cascardo
    Signed-off-by: Rafael J. Wysocki

    Thadeu Lima de Souza Cascardo
     
  • Rafael J. Wysocki
     
  • * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6: (1623 commits)
    netxen: update copyright
    netxen: fix tx timeout recovery
    netxen: fix file firmware leak
    netxen: improve pci memory access
    netxen: change firmware write size
    tg3: Fix return ring size breakage
    netxen: build fix for INET=n
    cdc-phonet: autoconfigure Phonet address
    Phonet: back-end for autoconfigured addresses
    Phonet: fix netlink address dump error handling
    ipv6: Add IFA_F_DADFAILED flag
    net: Add DEVTYPE support for Ethernet based devices
    mv643xx_eth.c: remove unused txq_set_wrr()
    ucc_geth: Fix hangs after switching from full to half duplex
    ucc_geth: Rearrange some code to avoid forward declarations
    phy/marvell: Make non-aneg speed/duplex forcing work for 88E1111 PHYs
    drivers/net/phy: introduce missing kfree
    drivers/net/wan: introduce missing kfree
    net: force bridge module(s) to be GPL
    Subject: [PATCH] appletalk: Fix skb leak when ipddp interface is not loaded
    ...

    Fixed up trivial conflicts:

    - arch/x86/include/asm/socket.h

    converted to <asm-generic/socket.h> in the x86 tree. The generic
    header has the same new #define's, so that works out fine.

    - drivers/net/tun.c

    fix conflict between 89f56d1e9 ("tun: reuse struct sock fields") that
    switched over to using 'tun->socket.sk' instead of the redundantly
    available (and thus removed) 'tun->sk', and 2b980dbd ("lsm: Add hooks
    to the TUN driver") which added a new 'tun->sk' use.

    Noted in 'next' by Stephen Rothwell.

    Linus Torvalds
     

12 Sep, 2009

12 commits

  • * 'tracing-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (105 commits)
    ring-buffer: only enable ring_buffer_swap_cpu when needed
    ring-buffer: check for swapped buffers in start of committing
    tracing: report error in trace if we fail to swap latency buffer
    tracing: add trace_array_printk for internal tracers to use
    tracing: pass around ring buffer instead of tracer
    tracing: make tracing_reset safe for external use
    tracing: use timestamp to determine start of latency traces
    tracing: Remove mentioning of legacy latency_trace file from documentation
    tracing/filters: Defer pred allocation, fix memory leak
    tracing: remove users of tracing_reset
    tracing: disable buffers and synchronize_sched before resetting
    tracing: disable update max tracer while reading trace
    tracing: print out start and stop in latency traces
    ring-buffer: disable all cpu buffers when one finds a problem
    ring-buffer: do not count discarded events
    ring-buffer: remove ring_buffer_event_discard
    ring-buffer: fix ring_buffer_read crossing pages
    ring-buffer: remove unnecessary cpu_relax
    ring-buffer: do not swap buffers during a commit
    ring-buffer: do not reset while in a commit
    ...

    Linus Torvalds
     
  • * 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (64 commits)
    sched: Fix sched::sched_stat_wait tracepoint field
    sched: Disable NEW_FAIR_SLEEPERS for now
    sched: Keep kthreads at default priority
    sched: Re-tune the scheduler latency defaults to decrease worst-case latencies
    sched: Turn off child_runs_first
    sched: Ensure that a child can't gain time over it's parent after fork()
    sched: enable SD_WAKE_IDLE
    sched: Deal with low-load in wake_affine()
    sched: Remove short cut from select_task_rq_fair()
    sched: Turn on SD_BALANCE_NEWIDLE
    sched: Clean up topology.h
    sched: Fix dynamic power-balancing crash
    sched: Remove reciprocal for cpu_power
    sched: Try to deal with low capacity, fix update_sd_power_savings_stats()
    sched: Try to deal with low capacity
    sched: Scale down cpu_power due to RT tasks
    sched: Implement dynamic cpu_power
    sched: Add smt_gain
    sched: Update the cpu_power sum during load-balance
    sched: Add SD_PREFER_SIBLING
    ...

    Linus Torvalds
     
  • * 'perfcounters-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (60 commits)
    perf tools: Avoid unnecessary work in directory lookups
    perf stat: Clean up statistics calculations a bit more
    perf stat: More advanced variance computation
    perf stat: Use stddev_mean in stead of stddev
    perf stat: Remove the limit on repeat
    perf stat: Change noise calculation to use stddev
    x86, perf_counter, bts: Do not allow kernel BTS tracing for now
    x86, perf_counter, bts: Correct pointer-to-u64 casts
    x86, perf_counter, bts: Fail if BTS is not available
    perf_counter: Fix output-sharing error path
    perf trace: Fix read_string()
    perf trace: Print out in nanoseconds
    perf tools: Seek to the end of the header area
    perf trace: Fix parsing of perf.data
    perf trace: Sample timestamps as well
    perf_counter: Introduce new (non-)paranoia level to allow raw tracepoint access
    perf trace: Sample the CPU too
    perf tools: Work around strict aliasing related warnings
    perf tools: Clean up warnings list in the Makefile
    perf tools: Complete support for dynamic strings
    ...

    Linus Torvalds
     
  • * 'irq-threaded-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    genirq: Do not mask oneshot edge type interrupts
    genirq: Support nested threaded irq handling
    genirq: Add buslock support
    genirq: Add oneshot support

    Linus Torvalds
     
  • * 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    pci/intr_remapping: Allocate irq_iommu on node
    irq: Add irq_node() primitive
    irq: Make sure irq_desc for legacy irq get correct node setting
    genirq: Add prototype for handle_nested_irq()
    irq: Remove superfluous NULL pointer check in check_irq_resend()
    irq: Clean up by removing irqfixup MODULE_PARM_DESC()
    genirq: Fix comment describing suspend_device_irqs()
    genirq: Remove obsolete defines and typedefs

    Linus Torvalds
     
  • * 'core-rcu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (28 commits)
    rcu: Move end of special early-boot RCU operation earlier
    rcu: Changes from reviews: avoid casts, fix/add warnings, improve comments
    rcu: Create rcutree plugins to handle hotplug CPU for multi-level trees
    rcu: Remove lockdep annotations from RCU's _notrace() API members
    rcu: Add #ifdef to suppress __rcu_offline_cpu() warning in !HOTPLUG_CPU builds
    rcu: Add CPU-offline processing for single-node configurations
    rcu: Add "notrace" to RCU function headers used by ftrace
    rcu: Remove CONFIG_PREEMPT_RCU
    rcu: Merge preemptable-RCU functionality into hierarchical RCU
    rcu: Simplify rcu_pending()/rcu_check_callbacks() API
    rcu: Use debugfs_remove_recursive() simplify code.
    rcu: Merge per-RCU-flavor initialization into pre-existing macro
    rcu: Fix online/offline indication for rcudata.csv trace file
    rcu: Consolidate sparse and lockdep declarations in include/linux/rcupdate.h
    rcu: Renamings to increase RCU clarity
    rcu: Move private definitions from include/linux/rcutree.h to kernel/rcutree.h
    rcu: Expunge lingering references to CONFIG_CLASSIC_RCU, optimize on !SMP
    rcu: Delay rcu_barrier() wait until beginning of next CPU-hotunplug operation.
    rcu: Fix typo in rcu_irq_exit() comment header
    rcu: Make rcupreempt_trace.c look at offline CPUs
    ...

    Linus Torvalds
     
  • * 'core-printk-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    printk: Fix "printk: Enable the use of more than one CON_BOOT (early console)"
    printk: Restore previous console_loglevel when re-enabling logging
    printk: Ensure that "console enabled" messages are printed on the console
    printk: Enable the use of more than one CON_BOOT (early console)

    Linus Torvalds
     
  • * 'core-locking-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (32 commits)
    locking, m68k/asm-offsets: Rename signal defines
    locking: Inline spinlock code for all locking variants on s390
    locking: Simplify spinlock inlining
    locking: Allow arch-inlined spinlocks
    locking: Move spinlock function bodies to header file
    locking, m68k: Calculate thread_info offset with asm offset
    locking, m68k/asm-offsets: Rename pt_regs offset defines
    locking, sparc: Rename __spin_try_lock() and friends
    locking, powerpc: Rename __spin_try_lock() and friends
    lockdep: Remove recursion stattistics
    lockdep: Simplify lock_stat seqfile code
    lockdep: Simplify lockdep_chains seqfile code
    lockdep: Simplify lockdep seqfile code
    lockdep: Fix missing entries in /proc/lock_chains
    lockdep: Fix missing entry in /proc/lock_stat
    lockdep: Fix memory usage info of BFS
    lockdep: Reintroduce generation count to make BFS faster
    lockdep: Deal with many similar locks
    lockdep: Introduce lockdep_assert_held()
    lockdep: Fix style nits
    ...

    Linus Torvalds
     
  • * 'core-futexes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    futex: Detect mismatched requeue targets
    futex: Correct futex_wait_requeue_pi() commentary

    Linus Torvalds
     
  • * 'core-debug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    debug lockups: Improve lockup detection, fix generic arch fallback
    debug lockups: Improve lockup detection

    Linus Torvalds
     
  • * 'core-cleanups-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    workqueues: Improve schedule_work() documentation

    Linus Torvalds
     
  • * 'writeback' of git://git.kernel.dk/linux-2.6-block:
    writeback: check for registered bdi in flusher add and inode dirty
    writeback: add name to backing_dev_info
    writeback: add some debug inode list counters to bdi stats
    writeback: get rid of pdflush completely
    writeback: switch to per-bdi threads for flushing data
    writeback: move dirty inodes from super_block to backing_dev_info
    writeback: get rid of generic_sync_sb_inodes() export

    Linus Torvalds
     

11 Sep, 2009

5 commits

  • This borrows some code from NAPI and implements a polled completion
    mode for block devices. The idea is the same as NAPI - instead of
    doing the command completion when the irq occurs, schedule a dedicated
    softirq in the hopes that we will complete more IO when the iopoll
    handler is invoked. Devices have a budget of commands assigned, and will
    stay in polled mode as long as they continue to consume their budget
    from the iopoll softirq handler. If they do not, the device is set back
    to interrupt completion mode.

    This patch holds the core bits for blk-iopoll, device driver support
    sold separately.

    Signed-off-by: Jens Axboe

    Jens Axboe
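
The budget rule described above can be illustrated with a small user-space model. It is a sketch under stated assumptions, not the blk-iopoll interface: `struct device_model`, `enum completion_mode` and `poll_once()` are hypothetical, and "consuming the budget" is taken to mean completing exactly `budget` commands in one pass.

```c
#include <assert.h>

/* Hypothetical completion modes for the modelled device. */
enum completion_mode { MODE_INTERRUPT, MODE_POLLED };

/* Hypothetical device state: completions waiting plus the current mode. */
struct device_model {
    unsigned int pending;
    enum completion_mode mode;
};

/*
 * One pass of the polling handler: complete up to 'budget' commands.
 * The device stays in polled mode only if it used the whole budget;
 * otherwise it is set back to interrupt completion mode.
 */
static unsigned int poll_once(struct device_model *dev, unsigned int budget)
{
    unsigned int done;

    assert(dev != 0 && budget > 0);
    done = dev->pending < budget ? dev->pending : budget;
    dev->pending -= done;
    dev->mode = (done == budget) ? MODE_POLLED : MODE_INTERRUPT;
    return done;
}
```

With 10 pending completions and a budget of 4, two passes complete 4 commands each and stay polled; the third completes the remaining 2 and drops back to interrupt mode.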
     
  • This enables us to track who does what and print info. Its main use
    is catching dirty inodes on the default_backing_dev_info, so we can
    fix that up.

    Signed-off-by: Jens Axboe

    Jens Axboe
     
  • James Morris
     
  • This weird perf trace output:

    cc1-9943 [001] 2802.059479616: sched_stat_wait: task: as:9944 wait: 2801938766276 [ns]

    Is caused by setting one component field of the delta to zero
    a bit too early. Move it to later.

    ( Note, this does not affect the NEW_FAIR_SLEEPERS interactivity bug,
    it's just a reporting bug in essence. )

    Acked-by: Peter Zijlstra
    Cc: Nikos Chantziaras
    Cc: Jens Axboe
    Cc: Mike Galbraith
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Ingo Molnar
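
Why clearing a field too early produces such a large value can be shown with a small model. This is an illustrative sketch: `struct wait_stats` and both functions are hypothetical, and it assumes the wait time is computed as "now minus start", which the message does not spell out.

```c
#include <assert.h>

/* Hypothetical per-task statistics: when the task started waiting (ns). */
struct wait_stats {
    unsigned long long wait_start;
};

/* Buggy order: the start time is cleared before the delta is computed. */
static unsigned long long wait_delta_buggy(struct wait_stats *s,
                                           unsigned long long now)
{
    s->wait_start = 0;
    return now - s->wait_start; /* degenerates to 'now' */
}

/* Fixed order: compute the delta first, then clear the start time. */
static unsigned long long wait_delta_fixed(struct wait_stats *s,
                                           unsigned long long now)
{
    unsigned long long delta;

    assert(now >= s->wait_start);
    delta = now - s->wait_start;
    s->wait_start = 0;
    return delta;
}
```

Under this assumption the buggy variant reports roughly the current clock value, which would be consistent with the trace above, where the reported wait (about 2801.9 seconds) is close to the event timestamp (2802.06).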
     
  • Nikos Chantziaras and Jens Axboe reported that turning off
    NEW_FAIR_SLEEPERS improves desktop interactivity visibly.

    Nikos described his experiences the following way:

    " With this setting, I can do "nice -n 19 make -j20" and
    still have a very smooth desktop and watch a movie at
    the same time. Various other annoyances (like the
    "logout/shutdown/restart" dialog of KDE not appearing
    at all until the background fade-out effect has finished)
    are also gone. So this seems to be the single most
    important setting that vastly improves desktop behavior,
    at least here. "

    Jens described it the following way, referring to a 10-seconds
    xmodmap scheduling delay he was trying to debug:

    " Then I tried switching NO_NEW_FAIR_SLEEPERS on, and then
    I get:

    Performance counter stats for 'xmodmap .xmodmap-carl':

    9.009137 task-clock-msecs # 0.447 CPUs
    18 context-switches # 0.002 M/sec
    1 CPU-migrations # 0.000 M/sec
    315 page-faults # 0.035 M/sec

    0.020167093 seconds time elapsed

    Woot! "

    So disable it for now. In perf trace output I can see weird
    delta timestamps:

    cc1-9943 [001] 2802.059479616: sched_stat_wait: task: as:9944 wait: 2801938766276 [ns]

    That nsec field is not supposed to be that large. More digging
    is needed - but let's turn it off while the real bug is found.

    Reported-by: Nikos Chantziaras
    Tested-by: Nikos Chantziaras
    Reported-by: Jens Axboe
    Tested-by: Jens Axboe
    Acked-by: Peter Zijlstra
    Cc: Mike Galbraith
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Ingo Molnar
     

10 Sep, 2009

1 commit


09 Sep, 2009

3 commits

    Remove the kthread/workqueue priority boosts; they increase worst-case
    desktop latencies.

    Signed-off-by: Mike Galbraith
    Acked-by: Peter Zijlstra
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Mike Galbraith
     
  • Reduce the latency target from 20 msecs to 5 msecs.

    Why? Larger latencies increase spread, which is good for scaling,
    but bad for worst case latency.

    We still have the ilog(nr_cpus) rule to scale up on bigger
    server boxes.

    Signed-off-by: Mike Galbraith
    Acked-by: Peter Zijlstra
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Mike Galbraith
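
The effect of a logarithmic scale-up rule on the new default can be sketched as follows. The message only names an "ilog(nr_cpus) rule"; the exact factor used here, 1 + floor(log2(nr_cpus)), is an assumption, and `ilog2_floor()` and `scaled_latency_ns()` are hypothetical helper names.

```c
#include <assert.h>

/* Floor of log2(n) for n > 0. */
static unsigned int ilog2_floor(unsigned int n)
{
    unsigned int log = 0;

    assert(n > 0);
    while (n >>= 1)
        log++;
    return log;
}

/*
 * Scale a base latency target by the number of CPUs.
 * Assumed rule: factor = 1 + floor(log2(nr_cpus)).
 */
static unsigned long scaled_latency_ns(unsigned long base_ns,
                                       unsigned int nr_cpus)
{
    return base_ns * (1 + ilog2_floor(nr_cpus));
}
```

Under this assumed rule, a 5 msec base stays at 5 msecs on one CPU and becomes 20 msecs on 8 CPUs, whereas the old 20 msec base would have become 80 msecs on the same machine.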
     
  • Set child_runs_first default to off.

    It hurts 'optimal' make -j workloads as make jobs
    get preempted by child tasks, reducing parallelism.

    Note, this patch might make existing races in user
    applications more prominent than before - so breakages
    might be bisected to this commit.

    Child-runs-first is broken on SMP to begin with, and we
    already had it off briefly in v2.6.23 so most of the
    offenders ought to be fixed. Would be nice not to revert
    this commit but fix those apps finally ...

    Signed-off-by: Mike Galbraith
    Acked-by: Peter Zijlstra
    LKML-Reference:
    [ made the sysctl independent of CONFIG_SCHED_DEBUG, in case
    people want to work around broken apps. ]
    Signed-off-by: Ingo Molnar

    Mike Galbraith
     

08 Sep, 2009

3 commits

  • A fork/exec load is usually "pass the baton", so the child
    should never be placed behind the parent. With START_DEBIT we
    make room for the new task, but with child_runs_first, that
    room comes out of the _parent's_ hide. There's nothing to say
    that the parent wasn't ahead of min_vruntime at fork() time,
    which means that the "baton carrier", who is essentially the
    parent in drag, can gain time and increase scheduling latencies
    for waiters.

    With NEW_FAIR_SLEEPERS + START_DEBIT + child_runs_first
    enabled, we essentially pass the sleeper fairness off to the
    child, which is fine, but if we don't base placement on the
    parent's updated vruntime, we can end up compounding latency
    woes if the child itself then does fork/exec. The debit
    incurred at fork doesn't hurt the parent who is then going to
    sleep and maybe exit, but the child who acquires the error
    harms all comers.

    This improves latencies of make -j kernel build workloads.

    Reported-by: Jens Axboe
    Signed-off-by: Mike Galbraith
    Acked-by: Peter Zijlstra
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Mike Galbraith
     
  • wake_affine() would always fail under low-load situations where
    both prev and this were idle, because adding a single task will
    always be a significant imbalance, even if there's nothing
    around that could balance it.

    Deal with this by allowing imbalance when there's nothing you
    can do about it.

    Signed-off-by: Peter Zijlstra
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     
  • select_task_rq_fair() incorrectly skips the wake_affine()
    logic, remove this.

    When prev_cpu == this_cpu, the code jumps straight to the
    wake_idle() logic, this doesn't give the wake_affine() logic
    the chance to pin the task to this cpu.

    Signed-off-by: Peter Zijlstra
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Peter Zijlstra