06 Dec, 2011

11 commits

  • This patch changes the fields in cpustat from a structure to a
    u64 array. Math gets easier, and the code is more flexible.

    Signed-off-by: Glauber Costa
    Reviewed-by: KAMEZAWA Hiroyuki
    Cc: Linus Torvalds
    Cc: Andrew Morton
    Cc: Paul Turner
    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/r/1322498719-2255-2-git-send-email-glommer@parallels.com
    Signed-off-by: Ingo Molnar

    Glauber Costa
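
To illustrate why an array makes the math easier, here is a minimal sketch, not the kernel's actual definitions. The enum names, `struct cpustat_sketch`, and both helpers are hypothetical and chosen only for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical counter indices; the real kernel names may differ. */
enum cpustat_index {
    CPUSTAT_USER,
    CPUSTAT_NICE,
    CPUSTAT_SYSTEM,
    CPUSTAT_IDLE,
    CPUSTAT_NR /* number of counters, must stay last */
};

/* One u64 slot per counter instead of one named field per counter. */
struct cpustat_sketch {
    uint64_t stat[CPUSTAT_NR];
};

/* Charge a delta to a counter selected by index at run time. */
static inline void cpustat_add(struct cpustat_sketch *cs,
                               enum cpustat_index idx, uint64_t delta)
{
    assert(idx < CPUSTAT_NR);
    cs->stat[idx] += delta;
}

/* Summing every counter is a plain loop rather than a list of fields. */
static inline uint64_t cpustat_total(const struct cpustat_sketch *cs)
{
    uint64_t sum = 0;

    for (int i = 0; i < CPUSTAT_NR; i++)
        sum += cs->stat[i];
    return sum;
}
```

With named fields, a caller that wants to charge time to a counter picked at run time needs a pointer to the field; with an array, an index is enough.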
     
  • nr_busy_cpus in the sched_group_power indicates whether the group
    is semi idle or not. This helps remove is_semi_idle_group() and simplify
    find_new_ilb() when looking for an optimal cpu that can do
    idle load balancing.

    Signed-off-by: Suresh Siddha
    Signed-off-by: Peter Zijlstra
    Cc: Thomas Gleixner
    Link: http://lkml.kernel.org/r/20111202010832.656983582@sbsiddha-desk.sc.intel.com
    Signed-off-by: Ingo Molnar

    Suresh Siddha
     
  • When there are many logical cpus that enter and exit idle often, members of
    the global nohz data structure are modified very frequently, causing
    a lot of cache-line contention.

    Make the nohz idle load balancing more scalable by using the sched domain
    topology and 'nr_busy_cpus' in the struct sched_group_power.

    Idle load balance is kicked on one of the idle cpus when there is at least
    one idle cpu and:

    - a busy rq having more than one task or

    - a busy rq's scheduler group that shares package resources (like HT/MC
    siblings) and has more than one member in that group busy or

    - for the SD_ASYM_PACKING domain, if the lower numbered cpus in that
    domain are idle compared to the busy ones.

    This will help in kicking the idle load balancing request only when
    there is a potential imbalance. And once it is mostly balanced, these kicks will
    be minimized.

    These changes improved a workload that is context switch intensive
    between a number of task pairs by 2x on an 8 socket NHM-EX based system.

    Reported-by: Tim Chen
    Signed-off-by: Suresh Siddha
    Signed-off-by: Peter Zijlstra
    Cc: Thomas Gleixner
    Link: http://lkml.kernel.org/r/20111202010832.602203411@sbsiddha-desk.sc.intel.com
    Signed-off-by: Ingo Molnar

    Suresh Siddha
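
The three kick conditions listed in this changelog can be read as a single predicate. The following is a rough sketch under stated assumptions: `struct busy_rq_view` and all of its fields are hypothetical stand-ins for state the scheduler would derive from the runqueue and its sched domains, and the function is not the kernel's implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical snapshot of a busy runqueue and its scheduler group. */
struct busy_rq_view {
    int  nr_running;           /* tasks queued on this busy rq */
    bool shares_pkg_resources; /* group shares HT/MC resources */
    int  group_nr_busy_cpus;   /* busy members in that group */
    bool asym_packing;         /* domain has SD_ASYM_PACKING set */
    bool lower_cpu_idle;       /* a lower numbered cpu in the domain is idle */
};

/* Returns true when an idle load balance kick looks worthwhile. */
static inline bool should_kick_ilb(int nr_idle_cpus,
                                   const struct busy_rq_view *v)
{
    assert(v != NULL);

    if (nr_idle_cpus < 1)
        return false;   /* nobody available to do the balancing */
    if (v->nr_running > 1)
        return true;    /* busy rq has more than one task */
    if (v->shares_pkg_resources && v->group_nr_busy_cpus > 1)
        return true;    /* siblings are competing for shared resources */
    if (v->asym_packing && v->lower_cpu_idle)
        return true;    /* work should pack onto lower numbered cpus */
    return false;
}
```

Once the system is mostly balanced, none of the three conditions holds and no kick is sent, which matches the stated goal of minimizing kicks.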
     
  • Introduce nr_busy_cpus in the struct sched_group_power [Not in sched_group
    because sched groups are duplicated for the SD_OVERLAP scheduler domain]
    and for each cpu that enters and exits idle, this parameter will
    be updated in each scheduler group of the scheduler domain that this cpu
    belongs to.

    To avoid the frequent update of this state as the cpu enters
    and exits idle, the update of the state during idle exit is
    delayed to the first timer tick that happens after the cpu becomes busy.
    This is done using the NOHZ_IDLE flag in the struct rq's nohz_flags.

    Signed-off-by: Suresh Siddha
    Signed-off-by: Peter Zijlstra
    Cc: Thomas Gleixner
    Link: http://lkml.kernel.org/r/20111202010832.555984323@sbsiddha-desk.sc.intel.com
    Signed-off-by: Ingo Molnar

    Suresh Siddha
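
The delayed update can be sketched as follows. This is a simplified model with a single group per cpu and hypothetical type and function names; the real code walks every scheduler group of the cpu's domains and keeps the flag in the rq's nohz_flags.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct sched_group_power. */
struct group_power_sketch {
    atomic_int nr_busy_cpus;
};

/* Hypothetical per-cpu state; nohz_idle models the NOHZ_IDLE flag. */
struct cpu_sketch {
    bool nohz_idle;
    struct group_power_sketch *sgp;
};

/* Idle entry: account the cpu as not busy, at most once. */
static inline void cpu_enter_idle(struct cpu_sketch *c)
{
    assert(c != NULL && c->sgp != NULL);
    if (c->nohz_idle)
        return;
    c->nohz_idle = true;
    atomic_fetch_sub(&c->sgp->nr_busy_cpus, 1);
}

/*
 * First timer tick after the cpu became busy: only now is the counter
 * restored, so short idle exits do not touch the shared cache line.
 */
static inline void cpu_busy_tick(struct cpu_sketch *c)
{
    assert(c != NULL && c->sgp != NULL);
    if (!c->nohz_idle)
        return;
    c->nohz_idle = false;
    atomic_fetch_add(&c->sgp->nr_busy_cpus, 1);
}
```

In this model, leaving idle does nothing by itself. Repeated idle entries while the flag is still set are also free, because the early return skips the shared counter.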
     
  • Introduce nohz_flags in the struct rq, which will track these two flags
    for now.

    NOHZ_TICK_STOPPED keeps track of the tick stopped status that gets set when
    the tick is stopped. It will be used to update the nohz idle load balancer data
    structures during the first busy tick after the tick is restarted. At this
    first busy tick after tickless idle, NOHZ_TICK_STOPPED flag will be reset.
    This will minimize the nohz idle load balancer status updates that currently
    happen for every tickless exit, making it more scalable when there
    are many logical cpus that enter and exit idle often.

    NOHZ_BALANCE_KICK will track the need for nohz idle load balance
    on this rq. This will replace the nohz_balance_kick in the rq, which was
    not being updated atomically.

    Signed-off-by: Suresh Siddha
    Signed-off-by: Peter Zijlstra
    Cc: Thomas Gleixner
    Link: http://lkml.kernel.org/r/20111202010832.499438999@sbsiddha-desk.sc.intel.com
    Signed-off-by: Ingo Molnar

    Suresh Siddha
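
As a userspace illustration of why a flags word with atomic bit operations is safer than a plain field, here is a small sketch using C11 atomics. The flag names come from the changelog; the bit values, `struct rq_sketch`, and the helper names are assumptions made for this example, and the kernel uses its own bitops rather than `<stdatomic.h>`.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Bit numbers are assumed for this sketch. */
enum {
    NOHZ_TICK_STOPPED = 0,
    NOHZ_BALANCE_KICK = 1
};

/* Hypothetical stand-in for the part of struct rq that holds the flags. */
struct rq_sketch {
    atomic_ulong nohz_flags;
};

/* Atomically set a flag; returns true if it was already set. */
static inline bool nohz_test_and_set(struct rq_sketch *rq, int bit)
{
    unsigned long mask = 1UL << bit;

    assert(rq != NULL);
    return (atomic_fetch_or(&rq->nohz_flags, mask) & mask) != 0;
}

/* Atomically clear a flag. */
static inline void nohz_clear(struct rq_sketch *rq, int bit)
{
    assert(rq != NULL);
    atomic_fetch_and(&rq->nohz_flags, ~(1UL << bit));
}

/* Read a flag without modifying it. */
static inline bool nohz_test(struct rq_sketch *rq, int bit)
{
    assert(rq != NULL);
    return (atomic_load(&rq->nohz_flags) & (1UL << bit)) != 0;
}
```

A caller that finds the kick flag already set can skip sending a second kick, which a non-atomic read followed by a write cannot guarantee when two cpus race.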
     
  • The second call to sched_rt_period() is redundant, because the value of the
    rt_runtime was already read and it was protected by the ->rt_runtime_lock.

    Signed-off-by: Shan Hai
    Reviewed-by: Kamalesh Babulal
    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/r/1322535836-13590-2-git-send-email-haishan.bai@gmail.com
    Signed-off-by: Ingo Molnar

    Shan Hai
     
  • For the SD_OVERLAP domain, sched_groups for each CPU's sched_domain are
    privately allocated and not shared with any other cpu. So the
    sched group allocation should come from the cpu's node for which
    SD_OVERLAP sched domain is being setup.

    Signed-off-by: Suresh Siddha
    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/r/20111118230554.164910950@sbsiddha-desk.sc.intel.com
    Signed-off-by: Ingo Molnar

    Suresh Siddha
     
  • This is another case where we are on our way to schedule(),
    so we can save a useless clock update and the resulting microscopic
    vruntime update.

    Signed-off-by: Mike Galbraith
    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/r/1321971686.6855.18.camel@marge.simson.net
    Signed-off-by: Ingo Molnar

    Mike Galbraith
     
  • rt.nr_cpus_allowed is always available, so use it to bail from select_task_rq()
    when only one cpu can be used, and save some cycles for pinned tasks.

    See the line marked with '*' below:

    # taskset -c 3 pipe-test

    PerfTop: 997 irqs/sec kernel:89.5% exact: 0.0% [1000Hz cycles], (all, CPU: 3)
    ------------------------------------------------------------------------------------------------

      Virgin                                     Patched
      samples  pcnt function                     samples  pcnt function
      _______ _____ ___________________________  _______ _____ ___________________________

      2880.00 10.2% __schedule                   3136.00 11.3% __schedule
      1634.00  5.8% pipe_read                    1615.00  5.8% pipe_read
      1458.00  5.2% system_call                  1534.00  5.5% system_call
      1382.00  4.9% _raw_spin_lock_irqsave       1412.00  5.1% _raw_spin_lock_irqsave
      1202.00  4.3% pipe_write                   1255.00  4.5% copy_user_generic_string
      1164.00  4.1% copy_user_generic_string     1241.00  4.5% __switch_to
      1097.00  3.9% __switch_to                   929.00  3.3% mutex_lock
       872.00  3.1% mutex_lock                    846.00  3.0% mutex_unlock
       687.00  2.4% mutex_unlock                  804.00  2.9% pipe_write
       682.00  2.4% native_sched_clock            713.00  2.6% native_sched_clock
       643.00  2.3% system_call_after_swapgs      653.00  2.3% _raw_spin_unlock_irqrestore
       617.00  2.2% sched_clock_local             633.00  2.3% fsnotify
       612.00  2.2% fsnotify                      605.00  2.2% sched_clock_local
       596.00  2.1% _raw_spin_unlock_irqrestore   593.00  2.1% system_call_after_swapgs
       542.00  1.9% sysret_check                  559.00  2.0% sysret_check
       467.00  1.7% fget_light                    472.00  1.7% fget_light
       462.00  1.6% finish_task_switch            461.00  1.7% finish_task_switch
       437.00  1.5% vfs_write                     442.00  1.6% vfs_write
       431.00  1.5% do_sync_write                 428.00  1.5% do_sync_write
    *  413.00  1.5% select_task_rq_fair           404.00  1.5% _raw_spin_lock_irq
       386.00  1.4% update_curr                   402.00  1.4% update_curr
       385.00  1.4% rw_verify_area                389.00  1.4% do_sync_read
       377.00  1.3% _raw_spin_lock_irq            378.00  1.4% vfs_read
       369.00  1.3% do_sync_read                  340.00  1.2% pipe_iov_copy_from_user
       360.00  1.3% vfs_read                      316.00  1.1% __wake_up_sync_key
       342.00  1.2% hrtick_start_fair             313.00  1.1% __wake_up_common

    Signed-off-by: Mike Galbraith
    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/r/1321971504.6855.15.camel@marge.simson.net
    Signed-off-by: Ingo Molnar

    Mike Galbraith
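
The early bail-out described in this changelog might look roughly like the sketch below. `struct task_sketch`, `slow_select_cpu()`, and the call counter are hypothetical, and returning the previous cpu for a pinned task is an assumption of this example (a pinned task last ran on its only allowed cpu).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical task view; the changelog refers to rt.nr_cpus_allowed. */
struct task_sketch {
    int prev_cpu;        /* cpu the task last ran on */
    int nr_cpus_allowed; /* size of the task's affinity mask */
};

/* Counts how often the expensive path runs, for illustration only. */
static int slow_path_calls;

/* Stand-in for the costly walk over the scheduler domains. */
static int slow_select_cpu(const struct task_sketch *p)
{
    slow_path_calls++;
    return p->prev_cpu;
}

/* Pinned tasks have exactly one candidate, so skip the search entirely. */
static int select_task_rq_sketch(const struct task_sketch *p)
{
    assert(p != NULL && p->nr_cpus_allowed >= 1);

    if (p->nr_cpus_allowed == 1)
        return p->prev_cpu;
    return slow_select_cpu(p);
}
```

In the profile above, the row marked with '*' shows select_task_rq_fair only in the Virgin column, which is consistent with the search being skipped for the pinned pipe-test.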
     
  • Instead of going through the scheduler domain hierarchy multiple times
    (for giving priority to an idle core over an idle SMT sibling in a busy
    core), start with the highest scheduler domain with the SD_SHARE_PKG_RESOURCES
    flag and traverse the domain hierarchy down till we find an idle group.

    This cleanup also addresses an issue reported by Mike where the recent
    changes returned the busy thread even in the presence of an idle SMT
    sibling in single socket platforms.

    Signed-off-by: Suresh Siddha
    Tested-by: Mike Galbraith
    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/r/1321556904.15339.25.camel@sbsiddha-desk.sc.intel.com
    Signed-off-by: Ingo Molnar

    Suresh Siddha
     
  • This tracepoint shows how long a task is sleeping in uninterruptible state.

    E.g. it may show how long and where a mutex is waited for.

    Signed-off-by: Andrew Vagin
    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/r/1322471015-107825-8-git-send-email-avagin@openvz.org
    Signed-off-by: Ingo Molnar

    Andrew Vagin
     

17 Nov, 2011

2 commits


16 Nov, 2011

6 commits


14 Nov, 2011

10 commits

  • In UP systems, the idle task is initialized using the init_task
    structure from which the command name is taken (currently "swapper").

    In SMP systems, one idle task per CPU is forked by the worker thread
    from which the task structure is copied. The command name is, therefore,
    "kworker/0:0" or "kworker/0:1", if not updated. Since such an update was
    lacking, all idle tasks in SMP systems were incorrectly named. This
    long-standing bug was not discovered immediately, because there is no /proc/0
    entry - the bug only becomes apparent when tracing is enabled.

    This patch sets the command name of the idle tasks in SMP systems to the
    name that is used in the INIT_TASK structure suffixed by a slash and the
    number of the CPU.

    Signed-off-by: Carsten Emde
    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/r/20111026211708.768925506@osadl.org
    Signed-off-by: Ingo Molnar

    Carsten Emde
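
The naming scheme (base name, a slash, then the CPU number) can be sketched in plain C as below. `COMM_LEN` and `set_idle_comm()` are hypothetical names for this example, and the buffer size is an assumption rather than a value taken from the patch.

```c
#include <assert.h>
#include <stdio.h>

#define COMM_LEN 16 /* assumed size of the command-name buffer */

/*
 * Build "<base>/<cpu>", for example "swapper/3". snprintf() always
 * terminates the buffer and truncates if the name would not fit.
 * Returns the length the full name would have had.
 */
static inline int set_idle_comm(char *comm, size_t len,
                                const char *base, int cpu)
{
    assert(comm != NULL && base != NULL && len > 0);
    return snprintf(comm, len, "%s/%d", base, cpu);
}
```

For example, `set_idle_comm(comm, sizeof(comm), "swapper", 3)` fills a `COMM_LEN` buffer with "swapper/3".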
     
  • Normally the RT bandwidth scheme will share bandwidth across the
    entire root_domain. However, sometimes it's convenient to disable this
    sharing for debug purposes. Provide a simple feature switch to this
    end.

    Signed-off-by: Peter Zijlstra
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     
  • The return-value convention for these functions varies depending on
    whether they're interruptible or can timeout. It can be a little
    confusing--document it.

    Signed-off-by: J. Bruce Fields
    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/r/20111006192246.GB28026@fieldses.org
    Signed-off-by: Ingo Molnar

    J. Bruce Fields
     
  • Signed-off-by: Hui Kang
    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/r/1318388459-4427-1-git-send-email-hkang.sunysb@gmail.com
    Signed-off-by: Ingo Molnar

    Hui Kang
     
  • Every time I have to stare at this function I need to completely
    reverse engineer its workings, about time I write a comment
    explaining the thing.

    Collected bits and pieces from previous changelogs, mostly:

    4be9daaa1b33701f011f4117f22dc1e45a3e6e34
    83378269a5fad98f562ebc0f09c349575e6cbfe1

    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/r/1318518057.27731.2.camel@twins
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     
  • * 'rmobile-fixes-for-linus' of git://github.com/pmundt/linux-sh:
    ARM: mach-shmobile: cpuidle single/global and last_state fixes
    ARM: mach-shmobile: move helper macro PORTCR to sh_pfc.h
    ARM: mach-shmobile: move helper macro PORT_xx to sh_pfc.h
    ARM: mach-shmobile: move helper macro PORT_DATA_xx to sh_pfc.h
    ARM: mach-shmobile: ap4evb: remove white space from end of line
    ARM: mach-shmobile: clock-sh7372: remove un-necessary index
    ARM: mach-shmobile: kota2: add comment out separator
    ARM: mach-shmobile: sh73a0: add MMC data pin pull-up

    Linus Torvalds
     
  • * 'sh-fixes-for-linus' of git://github.com/pmundt/linux-sh:
    mailmap: Fix up some renesas attributions
    sh: clkfwk: Kill off remaining debugfs cruft.
    drivers: sh: Kill off dead pathname for runtime PM stub.
    drivers: sh: Generalize runtime PM platform stub.
    sh: Wire up process_vm syscalls.
    sh: clkfwk: add clk_rate_mult_range_round()
    serial: sh-sci: Fix up SH-2A SCIF support.
    sh: Fix cached/uncaced address calculation in 29bit mode

    Linus Torvalds
     
  • * git://github.com/rustyrussell/linux:
    virtio-pci: fix use after free

    Linus Torvalds
     
  • Commit 31a3ddda166cda86d2b5111e09ba4bda5239fae6 introduced
    a use after free in virtio-pci. The main issue is
    that the release method signals removal of the virtio device,
    while remove signals removal of the pci device.

    For example, on driver removal or hot-unplug,
    virtio_pci_release_dev is called before virtio_pci_remove.
    We then might get a crash as virtio_pci_remove tries to use the
    device freed by virtio_pci_release_dev.

    We allocate/free all resources together with the
    pci device, so we can leave the release method empty.

    Signed-off-by: Michael S. Tsirkin
    Acked-by: Amit Shah
    Signed-off-by: Rusty Russell
    Cc: stable@kernel.org

    Michael S. Tsirkin
     
  • * 'drm-fixes' of git://people.freedesktop.org/~airlied/linux:
    drm/radeon/kms/combios: fix dynamic allocation of PM clock modes

    Linus Torvalds
     

13 Nov, 2011

2 commits

  • After commit e978aa7d7d57 ("cpuidle: Move dev->last_residency update to
    driver enter routine; remove dev->last_state") setting acpi_idle_suspend
    to 1 by acpi_processor_suspend() causes the ACPI cpuidle routines to
    return error codes continuously, which in turn causes cpuidle to lock up
    (hard).

    However, acpi_idle_suspend doesn't appear to be useful for any
    particular purpose (it's racy and doesn't really provide any real
    protection), so it can be removed, which makes the problem go away.

    Reported-and-tested-by: Tomas M.
    Reported-and-tested-by: Ferenc Wagner
    Tested-by: Arnd Bergmann
    Signed-off-by: Rafael J. Wysocki
    Signed-off-by: Linus Torvalds

    Rafael J. Wysocki
     
  • I missed the combios path when I updated the atombios pm code.

    Reported by amarsh04 on IRC.

    Signed-off-by: Alex Deucher
    Signed-off-by: Dave Airlie

    Alex Deucher
     

12 Nov, 2011

9 commits

  • * 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc:
    arm/imx: fix imx6q mmc error when mounting rootfs
    arm/imx: fix AUTO_ZRELADDR selection
    arm/imx: fix the references to ARCH_MX3
    ARM: mx51/53: set pwm clock parent to ipg_perclk
    arm/tegra: enable headphone detection gpio on seaboard
    arm/dt: Fix ventana SDHCI power-gpios
    arm/tegra: Don't create duplicate gpio and pinmux devices
    ARM: at91: Fix USBA gadget registration
    atmel/spi: fix missing probe
    at91/yl-9200: Fix section mismatch
    at91: vmalloc fix missing AT91_VIRT_BASE define
    ARM: at91: usart: drop static map regs for dbgu
    ARM: picoxcell: add extra temp register to addruart
    ARM: msm: fix compilation flags for MSM_SCM
    arm/mxs: fix mmc device adding for mach-mx28evk
    ARM: mxc: Remove test_for_ltirq
    ARM:i.MX: fix build error in clock-mx51-mx53.c
    ARM:i.MX: fix build error in tzic/avic.c
    ARM: mxc: fix local timer interrupt handling
    msm: boards: Fix fallout from removal of machine_desc in fixup

    Linus Torvalds
     
  • The variable i was removed by commit ded8433
    "[CPUFREQ] db8500: remove unneeded for loop iteration over freq_table",
    but the current code to print available frequencies still uses the i variable.
    Thus add the i variable back to fix the build error below:

    CC drivers/cpufreq/db8500-cpufreq.o
    drivers/cpufreq/db8500-cpufreq.c: In function 'db8500_cpufreq_init':
    drivers/cpufreq/db8500-cpufreq.c:123: error: 'i' undeclared (first use in this function)
    drivers/cpufreq/db8500-cpufreq.c:123: error: (Each undeclared identifier is reported only once
    drivers/cpufreq/db8500-cpufreq.c:123: error: for each function it appears in.)
    make[2]: *** [drivers/cpufreq/db8500-cpufreq.o] Error 1
    make[1]: *** [drivers/cpufreq] Error 2
    make: *** [drivers] Error 2

    This patch also fixes the use of an uninitialized i variable as an array index.

    Signed-off-by: Axel Lin
    Acked-by: Linus Walleij
    Signed-off-by: Dave Jones

    Axel Lin
     
  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/geert/linux-m68k: (29 commits)
    m68k/mac: Remove mac_irq_{en,dis}able() wrappers
    m68k/irq: Remove obsolete support for user vector interrupt fixups
    m68k/irq: Remove obsolete m68k irq framework
    m68k/q40: Convert Q40/Q60 to genirq
    m68k/sun3: Convert Sun3/3x to genirq
    m68k/sun3: Use the kstat_irqs_cpu() wrapper
    m68k/apollo: Convert Apollo to genirq
    m68k/vme: Convert VME to genirq
    m68k/hp300: Convert HP9000/300 and HP9000/400 to genirq
    m68k/mac: Optimize interrupts using chain handlers
    m68k/mac: Convert Mac to genirq
    m68k/amiga: Optimize interrupts using chain handlers
    m68k/amiga: Convert Amiga to genirq
    m68k/amiga: Refactor amiints.c
    m68k/atari: Remove code and comments about different irq types
    m68k/atari: Convert Atari to genirq
    m68k/irq: Add genirq support
    m68k/irq: Remove obsolete IRQ_FLG_* users
    m68k/irq: Rename {,__}m68k_handle_int()
    m68k/irq: Add m68k_setup_irq_controller()
    ...

    Linus Torvalds
     
  • * 'v4l_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media:
    [media] v4l2-ctrl: Send change events to all fh for auto cluster slave controls
    [media] v4l2-event: Don't set sev->fh to NULL on unsubscribe
    [media] v4l2-event: Remove pending events from fh event queue when unsubscribing
    [media] v4l2-event: Deny subscribing with a type of V4L2_EVENT_ALL
    [media] MAINTAINERS: add a maintainer for s5p-mfc driver
    [media] v4l: s5p-mfc: fix reported capabilities
    [media] media: vb2: reset queued list on REQBUFS(0) call
    [media] media: vb2: set buffer length correctly for all buffer types
    [media] media: vb2: add a check for uninitialized buffer
    [media] mxl111sf: fix build warning
    [media] mxl111sf: remove pointless if condition in mxl111sf_config_spi
    [media] mxl111sf: check for errors after mxl111sf_write_reg in mxl111sf_idac_config
    [media] mxl111sf: fix return value of mxl111sf_idac_config
    [media] uvcvideo: GET_RES should only be checked for BITMAP type menu controls

    Linus Torvalds
     
  • * 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc:
    powerpc/kvm: Fix build failure with HV KVM and CBE
    powerpc/ps3: Fix lv1_gpu_attribute hcall
    powerpc/ps3: Fix PS3 repository build warnings
    powerpc/ps3: irq: Remove IRQF_DISABLED
    powerpc/irq: Remove IRQF_DISABLED
    powerpc/numa: NUMA topology support for PowerNV
    powerpc: Add System RAM to /proc/iomem
    powerpc: Add KVM as module to defconfigs
    powerpc/kvm: Fix build with older toolchains
    powerpc, tqm5200: update tqm5200_defconfig to fit for charon board.
    powerpc/5200: add support for charon board

    Linus Torvalds
     
  • * 'rc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild:
    kbuild: Fix missing system calls check on mips.

    Linus Torvalds
     
  • This needed the sfi IRQ 0xFF fix to go in first. It simply plumbs in the
    bma023 driver with the firmware naming of it.

    Signed-off-by: William Douglas
    Signed-off-by: Alan Cox
    Signed-off-by: Linus Torvalds

    William Douglas
     
  • Real world year equals the value in vrtc YEAR register plus an offset.
    We used 1960 as the offset to make leap year consistent, but for a
    device's first use, its YEAR register is 0 and the system year will
    be parsed as 1960 which is not a valid UNIX time and will cause many
    applications to fail mysteriously. So we use 1972 instead to fix this
    issue.

    Updated patch which adds a sanity check suggested by Mathias

    This isn't a change in behaviour for systems, because 1972 is the one we
    actually use. It's the old version in upstream which is out of sync with
    all devices.

    Signed-off-by: Feng Tang
    Signed-off-by: Alan Cox
    Signed-off-by: Linus Torvalds

    Feng Tang
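
A small sketch of the year arithmetic follows. The offset 1972 is taken from the changelog; the macro and function names are hypothetical. Because 1960 and 1972 are both leap years and differ by a multiple of four, a register value maps to a leap year under either offset for the years a device would realistically see.

```c
#include <assert.h>
#include <stdbool.h>

#define VRTC_YEAR_OFFSET 1972 /* offset stated in the changelog */

/* Standard Gregorian leap year rule. */
static inline bool is_leap_year(int year)
{
    return (year % 4 == 0 && year % 100 != 0) || year % 400 == 0;
}

/* Real world year = value of the vrtc YEAR register + offset. */
static inline int vrtc_to_real_year(int reg_year)
{
    assert(reg_year >= 0);
    return reg_year + VRTC_YEAR_OFFSET;
}
```

A fresh device whose YEAR register reads 0 therefore reports 1972, which lies after the UNIX epoch, instead of 1960.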
     
  • Fix a build error. CE4100 with no serial support fails to build because the
    alternate function is only a prototype, not a null function as intended.

    Signed-off-by: Zhang Rui
    Signed-off-by: Alan Cox
    Signed-off-by: Linus Torvalds

    Zhang Rui