
26 Jan, 2008

36 commits

  • Forgot to remove this when removing the appldata binary sysctls.

    Signed-off-by: Heiko Carstens
    Signed-off-by: Martin Schwidefsky

    Heiko Carstens
     
  • Merge git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6: (200 commits)
    [SCSI] usbstorage: use last_sector_bug flag universally
    [SCSI] libsas: abstract STP task status into a function
    [SCSI] ultrastor: clean up inline asm warnings
    [SCSI] aic7xxx: fix firmware build
    [SCSI] aacraid: fib context lock for management ioctls
    [SCSI] ch: remove forward declarations
    [SCSI] ch: fix device minor number management bug
    [SCSI] ch: handle class_device_create failure properly
    [SCSI] NCR5380: fix section mismatch
    [SCSI] sg: fix /proc/scsi/sg/devices when no SCSI devices
    [SCSI] IB/iSER: add logical unit reset support
    [SCSI] don't use __GFP_DMA for sense buffers if not required
    [SCSI] use dynamically allocated sense buffer
    [SCSI] scsi.h: add macro for enclosure bit of inquiry data
    [SCSI] sd: add fix for devices with last sector access problems
    [SCSI] fix pcmcia compile problem
    [SCSI] aacraid: add Voodoo Lite class of cards.
    [SCSI] aacraid: add new driver features flags
    [SCSI] qla2xxx: Update version number to 8.02.00-k7.
    [SCSI] qla2xxx: Issue correct MBC_INITIALIZE_FIRMWARE command.
    ...

    Linus Torvalds
     
    Right now, the Linux kernel (with scheduler statistics enabled) keeps track
    of the maximum time a process is waiting to be scheduled. While the maximum
    is a very useful metric, tracking average and total is equally useful
    (at least for latencytop) to figure out the accumulated effect of scheduler
    delays. The accumulated effect is important to judge the performance impact
    of scheduler tuning/behavior.

    Signed-off-by: Arjan van de Ven
    Signed-off-by: Ingo Molnar

    Arjan van de Ven
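
    A minimal, runnable sketch of the bookkeeping described above (all
    names are illustrative, not the kernel's):

        #include <stdio.h>

        /* Alongside the existing maximum, also accumulate a total and a
         * count so an average can be derived. */
        struct delay_stats {
            unsigned long long max_ns;
            unsigned long long total_ns;
            unsigned long count;
        };

        static void account_delay(struct delay_stats *s,
                                  unsigned long long delay_ns)
        {
            if (delay_ns > s->max_ns)
                s->max_ns = delay_ns;
            s->total_ns += delay_ns;    /* the accumulated effect */
            s->count++;
        }

        int main(void)
        {
            struct delay_stats s = { 0, 0, 0 };
            unsigned long long samples[] = { 120000, 45000, 900000 };

            for (int i = 0; i < 3; i++)
                account_delay(&s, samples[i]);
            printf("max=%llu total=%llu avg=%llu (ns)\n",
                   s.max_ns, s.total_ns, s.total_ns / s.count);
            return 0;
        }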
     
  • print_cfs_stats is callable from interrupt context (sysrq), hence it should
    not take mutexes. Change it to use RCU, since the task group data is
    RCU-freed anyway.

    Signed-off-by: Peter Zijlstra
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
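
    A kernel-style sketch of the pattern (list and callback names are
    assumed for illustration): instead of taking a mutex, walk the
    RCU-protected group list under rcu_read_lock(), which is safe from
    interrupt context:

        /* sketch, not the actual print_cfs_stats() body */
        rcu_read_lock();
        list_for_each_entry_rcu(tg, &task_groups, list)
            print_cfs_rq(m, cpu, tg->cfs_rq[cpu]);
        rcu_read_unlock();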
     
  • The attached patch is something really simple that can sometimes help
    in getting more info out of a hung system.

    Signed-off-by: Ingo Molnar

    Nick Piggin
     
  • printk timestamps: use ktime_get().

    Some platforms have a functioning clocksource only after early
    bootup is done, so delay this until the system has left the
    SYSTEM_BOOTING state.

    It's also inherently safe now, as any bugs in this area will be
    caught by the printk recursion checks.

    Signed-off-by: Ingo Molnar

    Ingo Molnar
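
    A sketch of the guard being described (the helper's exact shape is
    assumed): stamp messages with zero while still in SYSTEM_BOOTING and
    only use ktime_get() afterwards:

        /* sketch: printk timestamp source */
        static unsigned long long printk_clock_ns(void)
        {
            if (system_state == SYSTEM_BOOTING)
                return 0;    /* clocksource may not be functional yet */
            return ktime_to_ns(ktime_get());
        }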
     
    fix the signedness of the softlockup tunables.

    mark the tunables read-mostly.

    Signed-off-by: Ingo Molnar

    Ingo Molnar
     
    LatencyTOP kernel infrastructure; it measures latencies in the
    scheduler and tracks them system-wide and per process.

    Signed-off-by: Arjan van de Ven
    Signed-off-by: Ingo Molnar

    Arjan van de Ven
     
  • looking at it one more time:

    (1) it looks to me that there is no need to call
    sched_rt_ratio_exceeded() from pick_next_rt_entity()

    - [ for CONFIG_FAIR_GROUP_SCHED ] queues with rt_rq->rt_throttled are
    not within this 'tree-like hierarchy' (or whatever we should call it
    :-)

    - there is also no need to re-check 'rt_rq->rt_time > ratio' at this
    point, as 'rt_rq->rt_time' couldn't have been increased since the last
    call to update_curr_rt() (which obviously calls
    sched_rt_ratio_exceeded()).

    Well, it might be that 'ratio' for this rt_rq has been re-configured
    (and the period over which this rt_rq was active has not yet
    finished)... but I don't think we should really take this into
    account.

    (2) now pick_next_rt_entity() must never return NULL, so let's change
    pick_next_task_rt() accordingly.

    Signed-off-by: Dmitry Adamushko
    Signed-off-by: Ingo Molnar

    Dmitry Adamushko
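
    A sketch of point (2) (caller shape assumed): since
    pick_next_rt_entity() can no longer return NULL, pick_next_task_rt()
    can assert instead of branching:

        /* sketch of the caller after the change */
        rt_se = pick_next_rt_entity(rq, rt_rq);
        BUG_ON(!rt_se);    /* must never be NULL now */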
     
  • We monitor clock overflows; let's also monitor clock underflows.

    Signed-off-by: Guillaume Chazarain
    Signed-off-by: Ingo Molnar

    Guillaume Chazarain
     
  • sched: fix rq->clock warps on frequency changes

    Fix 2bacec8c318ca0418c0ee9ac662ee44207765dd4
    (sched: touch softlockup watchdog after idling), which reintroduced
    warps on frequency changes. touch_softlockup_watchdog() calls
    __update_rq_clock(), which checks rq->clock for warps, so call it
    after adjusting rq->clock.

    Signed-off-by: Guillaume Chazarain
    Signed-off-by: Ingo Molnar

    Guillaume Chazarain
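
    The fix is purely one of ordering; a sketch of the intended sequence
    (variable names assumed):

        /* sketch: adjust the clock before poking the watchdog */
        rq->clock += delta_ns;          /* frequency-change compensation */
        touch_softlockup_watchdog();    /* calls __update_rq_clock(),
                                         * which checks for warps */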
     
  • Ensure that the kernel threads are created with the usual nice level
    and affinity even if kthreadd's properties were changed from the
    default by root.

    Signed-off-by: Michal Schmidt
    Signed-off-by: Ingo Molnar

    Michal Schmidt
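
    A kernel-style sketch of the reset (the nice-level constant is an
    assumption): explicitly apply the default nice level and full CPU
    affinity to each new kthread instead of inheriting kthreadd's current
    settings:

        /* sketch: normalize a freshly created kernel thread */
        set_user_nice(task, KTHREAD_NICE_LEVEL);    /* name assumed */
        set_cpus_allowed(task, CPU_MASK_ALL);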
     
  • Before:
    total: 25 errors, 13 warnings, 602 lines checked

    After:
    total: 0 errors, 2 warnings, 601 lines checked

    No code changed:

    kernel/profile.o:
       text    data     bss     dec     hex  filename
       3048     236      24    3308     cec  profile.o.before
       3048     236      24    3308     cec  profile.o.after
    md5:
    2501d64748a4d350dffb11203e2a5182 profile.o.before.asm
    2501d64748a4d350dffb11203e2a5182 profile.o.after.asm

    Signed-off-by: Paolo Ciarrocchi
    Signed-off-by: Ingo Molnar

    Paolo Ciarrocchi
     
  • remove the !PREEMPT_BKL code.

    this removes 160 lines of legacy code.

    Signed-off-by: Ingo Molnar

    Ingo Molnar
     
  • make PREEMPT_BKL the default.

    precursor to removal of the !PREEMPT_BKL code.

    Signed-off-by: Ingo Molnar

    Ingo Molnar
     
  • Based on a suggestion from Andi:

    In various cases, the unload of a module may leave some bad state around
    that causes a kernel crash AFTER a module is unloaded, and it's then hard
    to find which module caused that.

    This patch tracks the last unloaded module, and prints this as part of the
    module list in the oops trace.

    Right now, only the most recently unloaded module is tracked; I expect
    that this is enough for the vast majority of cases where this
    information matters. If it turns out that tracking more is important,
    we can always extend it.

    [ mingo@elte.hu: build fix ]

    Signed-off-by: Arjan van de Ven
    Signed-off-by: Ingo Molnar

    Arjan van de Ven
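
    A sketch of the mechanism (buffer name and size assumed): remember
    the most recently unloaded module's name and append it when the
    module list is printed in an oops:

        /* sketch, not the exact patch */
        static char last_unloaded_module[64];

        /* on unload: */
        strlcpy(last_unloaded_module, mod->name,
                sizeof(last_unloaded_module));

        /* in the oops module-list printer: */
        if (last_unloaded_module[0])
            printk(" [last unloaded: %s]", last_unloaded_module);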
     
  • It's rather common that an oops/WARN_ON/BUG happens during the load or
    unload of a module. Unfortunately, it's not always easy to see directly
    which module is being loaded/unloaded from the oops itself. Worse,
    it's not even always possible to ask the bug reporter, since there
    are so many components (udev etc) that auto-load modules that there's
    a good chance that even the reporter doesn't know which module this is.

    This patch extends the existing "show if it's tainting" print code,
    which is used as part of printing the modules in the oops/BUG/WARN_ON
    to include a "+" for "being loaded" and a "-" for "being unloaded".

    As a result of this extension, the "taint_flags()" function gets renamed to
    "module_flags()" (and takes a module struct as argument, not a taint
    flags int).

    Signed-off-by: Arjan van de Ven
    Signed-off-by: Ingo Molnar

    Arjan van de Ven
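
    A sketch of the extension (buffer handling assumed): module_flags()
    appends a '+' while a module is coming up and a '-' while it is going
    away, next to the existing taint flags:

        /* sketch of the state markers */
        if (mod->state == MODULE_STATE_COMING)
            buf[len++] = '+';    /* being loaded */
        if (mod->state == MODULE_STATE_GOING)
            buf[len++] = '-';    /* being unloaded */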
     
  • Remove the curious logic to set it_sched_expires in the future. It is
    useless because rt.timeout wouldn't be incremented anyway.

    Explicitly check for RLIM_INFINITY, as a test program that had a 1s
    soft limit and an infinite hard limit would SIGKILL at 1s. This is
    because RLIM_INFINITY+d-1 wraps around to d-2.

    Signed-off-by: Peter Zijlstra
    CC: Michal Schmidt
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
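
    The wraparound is plain unsigned arithmetic and can be demonstrated
    in userspace (this demo only illustrates the overflow; it is not the
    kernel code):

        #include <stdio.h>
        #include <sys/resource.h>

        int main(void)
        {
            /* RLIM_INFINITY is the all-ones value of rlim_t, so
             * adding d and subtracting 1 wraps around to d - 2. */
            rlim_t d = 5;
            rlim_t wrapped = RLIM_INFINITY + d - 1;

            printf("wrapped = %llu, d - 2 = %llu\n",
                   (unsigned long long)wrapped,
                   (unsigned long long)(d - 2));
            return 0;
        }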
     
  • Only reschedule if the new group has a higher prio task.

    Signed-off-by: Peter Zijlstra
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     
  • hrtimer_wakeup creates a

    base->lock
    rq->lock

    lock dependency. Avoid this by switching to HRTIMER_CB_IRQSAFE_NO_SOFTIRQ,
    which doesn't hold base->lock.

    This fully untangles hrtimer locks from the scheduler locks, and allows
    hrtimer usage in the scheduler proper.

    Signed-off-by: Peter Zijlstra
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
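
    A sketch of the switch (the cb_mode field name is taken on assumption
    from hrtimer code of this vintage): mark the timer so its callback
    runs directly from hard-irq context, without base->lock held:

        /* sketch */
        struct hrtimer timer;

        hrtimer_init(&timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
        timer.cb_mode = HRTIMER_CB_IRQSAFE_NO_SOFTIRQ;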
     
  • Currently all highres=off timers are run from softirq context, but
    HRTIMER_CB_IRQSAFE_NO_SOFTIRQ timers expect to run from irq context.

    Fix this up by splitting it similarly to the highres=on case.

    Signed-off-by: Peter Zijlstra
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     
  • To make it easier for the scheduler to use timers, clean up
    the locking a bit.

    Signed-off-by: Peter Zijlstra
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     
  • We need to teach no_hz about rt throttling because it's tick driven.

    Signed-off-by: Peter Zijlstra
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     
  • "goto out" is an odd way to spell "skip".

    Signed-off-by: Mike Galbraith
    Signed-off-by: Ingo Molnar

    Mike Galbraith
     
  • Extend group scheduling to also cover the realtime classes. It uses the time
    limiting introduced by the previous patch to allow multiple realtime groups.

    The hard time limit is required to keep behaviour deterministic.

    The algorithms used make the realtime scheduler O(tg): linear scaling
    with respect to the number of task groups. This is the worst-case
    behaviour I can't seem to get rid of; the average case of the
    algorithms can be improved, but I focused on correctness and the
    worst case.

    [ akpm@linux-foundation.org: move side-effects out of BUG_ON(). ]

    Signed-off-by: Peter Zijlstra
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     
  • Very simple time limit on the realtime scheduling classes.
    Allow the rq's realtime class to consume sched_rt_ratio of every
    sched_rt_period slice. If the class exceeds this quota, the fair class
    will preempt the realtime class.

    Signed-off-by: Peter Zijlstra
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
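
    A runnable toy model of the quota check (the fixed-point scale and
    all names are assumptions for illustration; this is not the kernel
    code):

        #include <stdio.h>

        #define RT_FRAC_SHIFT 16    /* assumed fixed-point scale */

        /* Has the rt class used more than ratio/2^16 of the period? */
        static int rt_over_quota(unsigned long long rt_time_ns,
                                 unsigned long long period_ns,
                                 unsigned int ratio)
        {
            unsigned long long quota =
                (period_ns * ratio) >> RT_FRAC_SHIFT;
            return rt_time_ns > quota;
        }

        int main(void)
        {
            unsigned int ratio = (95u << RT_FRAC_SHIFT) / 100;

            /* 0.96s of rt runtime in a 1s period exceeds a 95% quota */
            printf("throttled: %d\n",
                   rt_over_quota(960000000ULL, 1000000000ULL, ratio));
            return 0;
        }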
     
  • Use HR-timers (when available) to deliver an accurate preemption tick.

    The regular scheduler tick that runs at 1/HZ can be too coarse when
    nice levels are used. The fairness system will still keep CPU
    utilisation 'fair' by delaying the task that got an excessive amount
    of CPU time, but it tries to minimize this by delivering preemption
    points spot-on.

    The average frequency of this extra interrupt is
    sched_latency / nr_latency, which need not be higher than 1/HZ; it's
    just that the distribution within the sched_latency period is
    important.

    Signed-off-by: Peter Zijlstra
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     
  • Why do we even have cond_resched when real preemption
    is on? It seems to be a waste of space and time.

    Remove cond_resched() when CONFIG_PREEMPT is on.

    Signed-off-by: Ingo Molnar

    Herbert Xu
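
    A hedged sketch of the shape of such a change (not the actual patch):
    under CONFIG_PREEMPT the kernel already preempts at any safe point,
    so cond_resched() can compile away to a no-op:

        /* sketch */
        #ifdef CONFIG_PREEMPT
        static inline int cond_resched(void) { return 0; }  /* no-op */
        #else
        extern int _cond_resched(void);
        #define cond_resched() _cond_resched()
        #endif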
     
  • whitespace fixes.

    Signed-off-by: Ingo Molnar

    Ingo Molnar
     
  • Introduce a new rlimit that allows the user to set a runtime limit on
    the slice of real-time tasks. Once this limit is exceeded, the task
    will receive SIGXCPU.

    That is, it measures runtime since the last sleep.

    Input and ideas by Thomas Gleixner and Lennart Poettering.

    Signed-off-by: Peter Zijlstra
    CC: Lennart Poettering
    CC: Michael Kerrisk
    CC: Ulrich Drepper
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
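
    A userspace usage sketch (the fallback define is for headers that
    predate the feature; the limit is in microseconds of runtime): a task
    caps its own CPU time between sleeps, receiving SIGXCPU at the soft
    limit:

        #include <stdio.h>
        #include <sys/resource.h>

        #ifndef RLIMIT_RTTIME
        #define RLIMIT_RTTIME 15    /* Linux resource number */
        #endif

        int main(void)
        {
            struct rlimit rl = {
                .rlim_cur = 500000,        /* soft: 500 ms (us) */
                .rlim_max = RLIM_INFINITY, /* hard: unlimited */
            };

            if (setrlimit(RLIMIT_RTTIME, &rl) != 0)
                perror("setrlimit(RLIMIT_RTTIME)");
            return 0;
        }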
     
  • Move the task_struct members specific to rt scheduling together.
    A future optimization could be to put sched_entity and sched_rt_entity
    into a union.

    Signed-off-by: Peter Zijlstra
    CC: Srivatsa Vaddagiri
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     
  • There are already 4 error paths in alloc_uid() that do incremental rollbacks.
    I think it's time to merge them. This costs us 8 lines of code :)

    Maybe it would be better to merge this patch with the previous one, but I
    remember that some time ago I sent a similar patch (fixing the error path and
    cleaning it), but I was told to make two patches in such cases.

    Signed-off-by: Pavel Emelyanov
    Acked-by: Dhaval Giani
    Signed-off-by: Andrew Morton
    Signed-off-by: Ingo Molnar

    Pavel Emelyanov
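
    A self-contained toy of the pattern (illustrative only; this is not
    the alloc_uid() code): several failure points share one unwinding
    sequence via gotos instead of each rolling back incrementally:

        #include <stdlib.h>

        struct ctx { void *a; void *b; };

        static struct ctx *ctx_alloc(void)
        {
            struct ctx *x = malloc(sizeof(*x));

            if (!x)
                goto out;
            x->a = malloc(16);
            if (!x->a)
                goto out_free_x;
            x->b = malloc(16);
            if (!x->b)
                goto out_free_a;
            return x;

        out_free_a:    /* single merged rollback chain */
            free(x->a);
        out_free_x:
            free(x);
        out:
            return NULL;
        }

        int main(void)
        {
            struct ctx *x = ctx_alloc();

            if (x) {
                free(x->a);
                free(x->b);
                free(x);
            }
            return 0;
        }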
     
  • The baseline code statically builds the span maps when the domain is formed.
    Previous attempts at dynamically updating the maps caused a suspend-to-ram
    regression, which should now be fixed.

    Signed-off-by: Gregory Haskins
    CC: Gautham R Shenoy
    Signed-off-by: Ingo Molnar

    Gregory Haskins
     
  • This patch allows preemptible RCU to tolerate CPU-hotplug operations.
    It accomplishes this by maintaining a local copy of a map of online
    CPUs, which it accesses under its own lock.

    Signed-off-by: Gautham R Shenoy
    Signed-off-by: Paul E. McKenney
    Reviewed-by: Steven Rostedt
    Signed-off-by: Ingo Molnar

    Paul E. McKenney
     
  • This patch implements a new version of RCU which allows its read-side
    critical sections to be preempted. It uses a set of counter pairs
    to keep track of the read-side critical sections and flips them
    when all tasks exit their read-side critical sections. The details
    of this implementation can be found in this paper:

    http://www.rdrop.com/users/paulmck/RCU/OLSrtRCU.2006.08.11a.pdf

    and the article:

    http://lwn.net/Articles/253651/

    This patch was developed as a part of the -rt kernel development and
    is meant to provide better latencies when read-side critical sections
    of RCU don't disable preemption. As a consequence of keeping track of
    RCU readers, the readers have a slight overhead (optimizations are
    discussed in the paper). This implementation co-exists with the
    "classic" RCU implementations and can be selected at compile time.

    Also includes RCU tracing summarized in debugfs.

    [ akpm@linux-foundation.org: build fixes on non-preempt architectures ]

    Signed-off-by: Gautham R Shenoy
    Signed-off-by: Dipankar Sarma
    Signed-off-by: Paul E. McKenney
    Reviewed-by: Steven Rostedt
    Signed-off-by: Ingo Molnar

    Paul E. McKenney
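
    A heavily simplified, runnable toy of the counter-pair idea (the real
    implementation is per-CPU and needs memory barriers; all names here
    are illustrative):

        #include <stdatomic.h>
        #include <stdio.h>

        static atomic_int readers[2];    /* the counter pair */
        static atomic_int phase;         /* counter new readers use */

        static int read_lock(void)       /* returns phase to unlock with */
        {
            int p = atomic_load(&phase) & 1;
            atomic_fetch_add(&readers[p], 1);
            return p;
        }

        static void read_unlock(int p)
        {
            atomic_fetch_sub(&readers[p], 1);
        }

        static void synchronize(void)    /* updater: flip, drain old side */
        {
            int old = atomic_fetch_xor(&phase, 1) & 1;

            while (atomic_load(&readers[old]) != 0)
                ;    /* wait for pre-flip readers to finish */
        }

        int main(void)
        {
            int p = read_lock();
            read_unlock(p);
            synchronize();    /* no readers left: returns immediately */
            printf("grace period complete\n");
            return 0;
        }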
     
  • Fix rcu_barrier() to work properly in a preemptive kernel environment.
    Also, the ordering of callbacks must be preserved while moving
    callbacks to another CPU during CPU hotplug.

    Signed-off-by: Gautham R Shenoy
    Signed-off-by: Dipankar Sarma
    Signed-off-by: Paul E. McKenney
    Reviewed-by: Steven Rostedt
    Signed-off-by: Ingo Molnar

    Paul E. McKenney