12 Jan, 2012

1 commit

  • * 'linux-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci: (80 commits)
    x86/PCI: Expand the x86_msi_ops to have a restore MSIs.
    PCI: Increase resource array mask bit size in pcim_iomap_regions()
    PCI: DEVICE_COUNT_RESOURCE should be equal to PCI_NUM_RESOURCES
    PCI: pci_ids: add device ids for STA2X11 device (aka ConneXT)
    PNP: work around Dell 1536/1546 BIOS MMCONFIG bug that breaks USB
    x86/PCI: amd: factor out MMCONFIG discovery
    PCI: Enable ATS at the device state restore
    PCI: msi: fix imbalanced refcount of msi irq sysfs objects
    PCI: kconfig: English typo in pci/pcie/Kconfig
    PCI/PM/Runtime: make PCI traces quieter
    PCI: remove pci_create_bus()
    xtensa/PCI: convert to pci_scan_root_bus() for correct root bus resources
    x86/PCI: convert to pci_create_root_bus() and pci_scan_root_bus()
    x86/PCI: use pci_scan_bus() instead of pci_scan_bus_parented()
    x86/PCI: read Broadcom CNB20LE host bridge info before PCI scan
    sparc32, leon/PCI: convert to pci_scan_root_bus() for correct root bus resources
    sparc/PCI: convert to pci_create_root_bus()
    sh/PCI: convert to pci_scan_root_bus() for correct root bus resources
    powerpc/PCI: convert to pci_create_root_bus()
    powerpc/PCI: split PHB part out of pcibios_map_io_space()
    ...

    Fix up conflicts in drivers/pci/msi.c and include/linux/pci_regs.h due
    to the same patches being applied in other branches.

    Linus Torvalds
     

11 Jan, 2012

1 commit

  • * 'kvm-updates/3.3' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (74 commits)
    KVM: PPC: Whitespace fix for kvm.h
    KVM: Fix whitespace in kvm_para.h
    KVM: PPC: annotate kvm_rma_init as __init
    KVM: x86 emulator: implement RDPMC (0F 33)
    KVM: x86 emulator: fix RDPMC privilege check
    KVM: Expose the architectural performance monitoring CPUID leaf
    KVM: VMX: Intercept RDPMC
    KVM: SVM: Intercept RDPMC
    KVM: Add generic RDPMC support
    KVM: Expose a version 2 architectural PMU to a guests
    KVM: Expose kvm_lapic_local_deliver()
    KVM: x86 emulator: Use opcode::execute for Group 9 instruction
    KVM: x86 emulator: Use opcode::execute for Group 4/5 instructions
    KVM: x86 emulator: Use opcode::execute for Group 1A instruction
    KVM: ensure that debugfs entries have been created
    KVM: drop bsp_vcpu pointer from kvm struct
    KVM: x86: Consolidate PIT legacy test
    KVM: x86: Do not rely on implicit inclusions
    KVM: Make KVM_INTEL depend on CPU_SUP_INTEL
    KVM: Use memdup_user instead of kmalloc/copy_from_user
    ...

    Linus Torvalds
     

09 Jan, 2012

3 commits

  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (53 commits)
    Kconfig: acpi: Fix typo in comment.
    misc latin1 to utf8 conversions
    devres: Fix a typo in devm_kfree comment
    btrfs: free-space-cache.c: remove extra semicolon.
    fat: Spelling s/obsolate/obsolete/g
    SCSI, pmcraid: Fix spelling error in a pmcraid_err() call
    tools/power turbostat: update fields in manpage
    mac80211: drop spelling fix
    types.h: fix comment spelling for 'architectures'
    typo fixes: aera -> area, exntension -> extension
    devices.txt: Fix typo of 'VMware'.
    sis900: Fix enum typo 'sis900_rx_bufer_status'
    decompress_bunzip2: remove invalid vi modeline
    treewide: Fix comment and string typo 'bufer'
    hyper-v: Update MAINTAINERS
    treewide: Fix typos in various parts of the kernel, and fix some comments.
    clockevents: drop unknown Kconfig symbol GENERIC_CLOCKEVENTS_MIGR
    gpio: Kconfig: drop unknown symbol 'CS5535_GPIO'
    leds: Kconfig: Fix typo 'D2NET_V2'
    sound: Kconfig: drop unknown symbol ARCH_CLPS7500
    ...

    Fix up trivial conflicts in arch/powerpc/platforms/40x/Kconfig (some new
    kconfig additions, close to removed commented-out old ones)

    Linus Torvalds
     
  • * 'pm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (76 commits)
    PM / Hibernate: Implement compat_ioctl for /dev/snapshot
    PM / Freezer: fix return value of freezable_schedule_timeout_killable()
    PM / shmobile: Allow the A4R domain to be turned off at run time
    PM / input / touchscreen: Make st1232 use device PM QoS constraints
    PM / QoS: Introduce dev_pm_qos_add_ancestor_request()
    PM / shmobile: Remove the stay_on flag from SH7372's PM domains
    PM / shmobile: Don't include SH7372's INTCS in syscore suspend/resume
    PM / shmobile: Add support for the sh7372 A4S power domain / sleep mode
    PM: Drop generic_subsys_pm_ops
    PM / Sleep: Remove forward-only callbacks from AMBA bus type
    PM / Sleep: Remove forward-only callbacks from platform bus type
    PM: Run the driver callback directly if the subsystem one is not there
    PM / Sleep: Make pm_op() and pm_noirq_op() return callback pointers
    PM/Devfreq: Add Exynos4-bus device DVFS driver for Exynos4210/4212/4412.
    PM / Sleep: Merge internal functions in generic_ops.c
    PM / Sleep: Simplify generic system suspend callbacks
    PM / Hibernate: Remove deprecated hibernation snapshot ioctls
    PM / Sleep: Fix freezer failures due to racy usermodehelper_is_disabled()
    ARM: S3C64XX: Implement basic power domain support
    PM / shmobile: Use common always on power domain governor
    ...

    Fix up trivial conflict in fs/xfs/xfs_buf.c due to removal of unused
    XBT_FORCE_SLEEP bit

    Linus Torvalds
     
  • * 'for-linus2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (165 commits)
    reiserfs: Properly display mount options in /proc/mounts
    vfs: prevent remount read-only if pending removes
    vfs: count unlinked inodes
    vfs: protect remounting superblock read-only
    vfs: keep list of mounts for each superblock
    vfs: switch ->show_options() to struct dentry *
    vfs: switch ->show_path() to struct dentry *
    vfs: switch ->show_devname() to struct dentry *
    vfs: switch ->show_stats to struct dentry *
    switch security_path_chmod() to struct path *
    vfs: prefer ->dentry->d_sb to ->mnt->mnt_sb
    vfs: trim includes a bit
    switch mnt_namespace ->root to struct mount
    vfs: take /proc/*/mounts and friends to fs/proc_namespace.c
    vfs: opencode mntget() mnt_set_mountpoint()
    vfs: spread struct mount - remaining argument of next_mnt()
    vfs: move fsnotify junk to struct mount
    vfs: move mnt_devname
    vfs: move mnt_list to struct mount
    vfs: switch pnode.h macros to struct mount *
    ...

    Linus Torvalds
     

08 Jan, 2012

1 commit

  • * 'driver-core-next' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (73 commits)
    arm: fix up some samsung merge sysdev conversion problems
    firmware: Fix an oops on reading fw_priv->fw in sysfs loading file
    Drivers:hv: Fix a bug in vmbus_driver_unregister()
    driver core: remove __must_check from device_create_file
    debugfs: add missing #ifdef HAS_IOMEM
    arm: time.h: remove device.h #include
    driver-core: remove sysdev.h usage.
    clockevents: remove sysdev.h
    arm: convert sysdev_class to a regular subsystem
    arm: leds: convert sysdev_class to a regular subsystem
    kobject: remove kset_find_obj_hinted()
    m86k: gpio - convert sysdev_class to a regular subsystem
    mips: txx9_sram - convert sysdev_class to a regular subsystem
    mips: 7segled - convert sysdev_class to a regular subsystem
    sh: dma - convert sysdev_class to a regular subsystem
    sh: intc - convert sysdev_class to a regular subsystem
    power: suspend - convert sysdev_class to a regular subsystem
    power: qe_ic - convert sysdev_class to a regular subsystem
    power: cmm - convert sysdev_class to a regular subsystem
    s390: time - convert sysdev_class to a regular subsystem
    ...

    Fix up conflicts with 'struct sysdev' removal from various platform
    drivers that got changed:
    - arch/arm/mach-exynos/cpu.c
    - arch/arm/mach-exynos/irq-eint.c
    - arch/arm/mach-s3c64xx/common.c
    - arch/arm/mach-s3c64xx/cpu.c
    - arch/arm/mach-s5p64x0/cpu.c
    - arch/arm/mach-s5pv210/common.c
    - arch/arm/plat-samsung/include/plat/cpu.h
    - arch/powerpc/kernel/sysfs.c
    and fix up cpu_is_hotpluggable() as per Greg in include/linux/cpu.h

    Linus Torvalds
     

07 Jan, 2012

6 commits

  • * 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc: (185 commits)
    powerpc: fix compile error with 85xx/p1010rdb.c
    powerpc: fix compile error with 85xx/p1023_rds.c
    powerpc/fsl: add MSI support for the Freescale hypervisor
    arch/powerpc/sysdev/fsl_rmu.c: introduce missing kfree
    powerpc/fsl: Add support for Integrated Flash Controller
    powerpc/fsl: update compatiable on fsl 16550 uart nodes
    powerpc/85xx: fix PCI and localbus properties in p1022ds.dts
    powerpc/85xx: re-enable ePAPR byte channel driver in corenet32_smp_defconfig
    powerpc/fsl: Update defconfigs to enable some standard FSL HW features
    powerpc: Add TBI PHY node to first MDIO bus
    sbc834x: put full compat string in board match check
    powerpc/fsl-pci: Allow 64-bit PCIe devices to DMA to any memory address
    powerpc: Fix unpaired probe_hcall_entry and probe_hcall_exit
    offb: Fix setting of the pseudo-palette for >8bpp
    offb: Add palette hack for qemu "standard vga" framebuffer
    offb: Fix bug in calculating requested vram size
    powerpc/boot: Change the WARN to INFO for boot wrapper overlap message
    powerpc/44x: Fix build error on currituck platform
    powerpc/boot: Change the load address for the wrapper to fit the kernel
    powerpc/44x: Enable CRASH_DUMP for 440x
    ...

    Fix up a trivial conflict in arch/powerpc/include/asm/cputime.h due to
    the additional sparse-checking code for cputime_t.

    Linus Torvalds
     
  • * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1958 commits)
    net: pack skb_shared_info more efficiently
    net_sched: red: split red_parms into parms and vars
    net_sched: sfq: extend limits
    cnic: Improve error recovery on bnx2x devices
    cnic: Re-init dev->stats_addr after chip reset
    net_sched: Bug in netem reordering
    bna: fix sparse warnings/errors
    bna: make ethtool_ops and strings const
    xgmac: cleanups
    net: make ethtool_ops const
    vmxnet3" make ethtool ops const
    xen-netback: make ops structs const
    virtio_net: Pass gfp flags when allocating rx buffers.
    ixgbe: FCoE: Add support for ndo_get_fcoe_hbainfo() call
    netdev: FCoE: Add new ndo_get_fcoe_hbainfo() call
    igb: reset PHY after recovering from PHY power down
    igb: add basic runtime PM support
    igb: Add support for byte queue limits.
    e1000: cleanup CE4100 MDIO registers access
    e1000: unmap ce4100_gbe_mdio_base_virt in e1000_remove
    ...

    Linus Torvalds
     
  • CC: Benjamin Herrenschmidt
    Signed-off-by: Bjorn Helgaas
    Signed-off-by: Jesse Barnes

    Bjorn Helgaas
     
  • This patch converts PowerPC's architecture-specific
    'pcibios_set_master()' routine to a non-inlined function. This will
    allow follow-on patches to create a generic 'pcibios_set_master()'
    function using the '__weak' attribute, which all architectures can
    use as a default and which architecture-specific code can override
    where necessary.

    Converting 'pcibios_set_master()' to a non-inlined function will
    allow PowerPC's 'pcibios_set_master()' implementation to remain
    architecture-specific after the generic version is introduced and
    thus not change current behavior.

    No functional change.

    Acked-by: Benjamin Herrenschmidt
    Signed-off-by: Myron Stowe
    Signed-off-by: Jesse Barnes

    Myron Stowe
     
  • This resolves the conflict in the arch/arm/mach-s3c64xx/s3c6400.c file,
    and it fixes the build error in the arch/x86/kernel/microcode_core.c
    file, that the merge did not catch.

    The microcode_core.c patch was provided by Stephen Rothwell
    who was invaluable in the merge issues involved
    with the large sysdev removal process in the driver-core tree.

    Signed-off-by: Greg Kroah-Hartman

    Greg Kroah-Hartman
     
  • * 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (40 commits)
    sched/tracing: Add a new tracepoint for sleeptime
    sched: Disable scheduler warnings during oopses
    sched: Fix cgroup movement of waking process
    sched: Fix cgroup movement of newly created process
    sched: Fix cgroup movement of forking process
    sched: Remove cfs bandwidth period check in tg_set_cfs_period()
    sched: Fix load-balance lock-breaking
    sched: Replace all_pinned with a generic flags field
    sched: Only queue remote wakeups when crossing cache boundaries
    sched: Add missing rcu_dereference() around ->real_parent usage
    [S390] fix cputime overflow in uptime_proc_show
    [S390] cputime: add sparse checking and cleanup
    sched: Mark parent and real_parent as __rcu
    sched, nohz: Fix missing RCU read lock
    sched, nohz: Set the NOHZ_BALANCE_KICK flag for idle load balancer
    sched, nohz: Fix the idle cpu check in nohz_idle_balance
    sched: Use jump_labels for sched_feat
    sched/accounting: Fix parameter passing in task_group_account_field
    sched/accounting: Fix user/system tick double accounting
    sched/accounting: Re-use scheduler statistics for the root cgroup
    ...

    Fix up conflicts in
    - arch/ia64/include/asm/cputime.h, include/asm-generic/cputime.h
    usecs_to_cputime64() vs the sparse cleanups
    - kernel/sched/fair.c, kernel/time/tick-sched.c
    scheduler changes in multiple branches

    Linus Torvalds
     

06 Jan, 2012

1 commit

  • * 'core-memblock-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (52 commits)
    memblock: Reimplement memblock allocation using reverse free area iterator
    memblock: Kill early_node_map[]
    score: Use HAVE_MEMBLOCK_NODE_MAP
    s390: Use HAVE_MEMBLOCK_NODE_MAP
    mips: Use HAVE_MEMBLOCK_NODE_MAP
    ia64: Use HAVE_MEMBLOCK_NODE_MAP
    SuperH: Use HAVE_MEMBLOCK_NODE_MAP
    sparc: Use HAVE_MEMBLOCK_NODE_MAP
    powerpc: Use HAVE_MEMBLOCK_NODE_MAP
    memblock: Implement memblock_add_node()
    memblock: s/memblock_analyze()/memblock_allow_resize()/ and update users
    memblock: Track total size of regions automatically
    powerpc: Cleanup memblock usage
    memblock: Reimplement memblock_enforce_memory_limit() using __memblock_remove()
    memblock: Make memblock functions handle overflowing range @size
    memblock: Reimplement __memblock_remove() using memblock_isolate_range()
    memblock: Separate out memblock_isolate_range() from memblock_set_node()
    memblock: Kill memblock_init()
    memblock: Kill sentinel entries at the end of static region arrays
    memblock: Add __memblock_dump_all()
    ...

    Linus Torvalds
     

05 Jan, 2012

1 commit


04 Jan, 2012

2 commits


31 Dec, 2011

1 commit


30 Dec, 2011

1 commit

  • Commit 2a95ea6c0d129b4 ("procfs: do not overflow get_{idle,iowait}_time
    for nohz") did not take into account that on some architectures jiffies
    and cputime use different units.

    This causes get_idle_time() to return numbers in the wrong units, making
    the idle time fields in /proc/stat wrong.

    Instead of converting the usec value returned by
    get_cpu_{idle,iowait}_time_us to units of jiffies, use the new function
    usecs_to_cputime64 to convert it to the correct unit of cputime64_t.
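
As a rough illustration of the unit mismatch described above, the sketch below converts the same microsecond value into ticks of two different clocks. The helper name `usecs_to_ticks` and both tick rates used in the note that follows (100 Hz for jiffies, 512 MHz for cputime) are assumptions chosen for the example, not values or code taken from the kernel.

```c
#include <assert.h>
#include <stdint.h>

#define USEC_PER_SEC 1000000ULL

/*
 * Hypothetical helper: convert microseconds to ticks of a clock that
 * advances ticks_per_sec times per second. Whole seconds and the
 * sub-second remainder are converted separately to limit overflow.
 */
uint64_t usecs_to_ticks(uint64_t usecs, uint64_t ticks_per_sec)
{
    assert(ticks_per_sec > 0);
    return (usecs / USEC_PER_SEC) * ticks_per_sec +
           (usecs % USEC_PER_SEC) * ticks_per_sec / USEC_PER_SEC;
}
```

With these assumed rates, 2,000,000 usec is 200 jiffies but 1,024,000,000 cputime ticks, so a jiffies-valued result stored in a cputime field would be off by a large factor.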

    Signed-off-by: Andreas Schwab
    Acked-by: Michal Hocko
    Cc: Arnd Bergmann
    Cc: "Artem S. Tashkinov"
    Cc: Dave Jones
    Cc: Alexey Dobriyan
    Cc: Thomas Gleixner
    Cc: "Luck, Tony"
    Cc: Benjamin Herrenschmidt
    Cc: Martin Schwidefsky
    Cc: Heiko Carstens
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andreas Schwab
     

27 Dec, 2011

1 commit


26 Dec, 2011

1 commit


22 Dec, 2011

1 commit

  • This moves the 'cpu sysdev_class' over to a regular 'cpu' subsystem
    and converts the devices to regular devices. The sysdev drivers are
    implemented as subsystem interfaces now.

    After all sysdev classes are ported to regular driver core entities, the
    sysdev implementation will be entirely removed from the kernel.

    Userspace relies on events and generic sysfs subsystem infrastructure
    from sysdev devices, which are made available with this conversion.

    Cc: Haavard Skinnemoen
    Cc: Hans-Christian Egtvedt
    Cc: Tony Luck
    Cc: Fenghua Yu
    Cc: Arnd Bergmann
    Cc: Benjamin Herrenschmidt
    Cc: Paul Mackerras
    Cc: Martin Schwidefsky
    Cc: Heiko Carstens
    Cc: Paul Mundt
    Cc: "David S. Miller"
    Cc: Chris Metcalf
    Cc: Thomas Gleixner
    Cc: Ingo Molnar
    Cc: "H. Peter Anvin"
    Cc: Borislav Petkov
    Cc: Tigran Aivazian
    Cc: Len Brown
    Cc: Zhang Rui
    Cc: Dave Jones
    Cc: Peter Zijlstra
    Cc: Russell King
    Cc: Andrew Morton
    Cc: Arjan van de Ven
    Cc: "Rafael J. Wysocki"
    Cc: "Srivatsa S. Bhat"
    Signed-off-by: Kay Sievers
    Signed-off-by: Greg Kroah-Hartman

    Kay Sievers
     

20 Dec, 2011

2 commits

  • We find the runtime address of _stext and relocate ourselves based
    on the following calculation.

    virtual_base = ALIGN(KERNELBASE,KERNEL_TLB_PIN_SIZE) +
    MODULO(_stext.run,KERNEL_TLB_PIN_SIZE)

    relocate() is called with the Effective Virtual Base Address (as
    shown below)

                | Phys. Addr| Virt. Addr |
    Page        |------------------------|
    Boundary    |           |            |
                |           |            |
                |           |            |
    Kernel Load |___________|_ __ _ _ _ _|
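
The calculation above can be sketched in plain C as follows. This is a minimal illustration, assuming that ALIGN() rounds up to a power-of-two boundary and that MODULO() is the offset within that boundary; the function names and the sample addresses in the note below are hypothetical and not part of the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Round addr up to the next multiple of size (size must be a power of two). */
static uint32_t align_up(uint32_t addr, uint32_t size)
{
    assert(size != 0 && (size & (size - 1)) == 0);
    return (addr + size - 1) & ~(size - 1);
}

/* Offset of addr within its size-aligned region (the MODULO() step). */
static uint32_t modulo(uint32_t addr, uint32_t size)
{
    assert(size != 0 && (size & (size - 1)) == 0);
    return addr & (size - 1);
}

/* virtual_base = ALIGN(KERNELBASE, pin size) + MODULO(_stext.run, pin size) */
uint32_t effective_virtual_base(uint32_t kernelbase, uint32_t stext_run,
                                uint32_t tlb_pin_size)
{
    return align_up(kernelbase, tlb_pin_size) + modulo(stext_run, tlb_pin_size);
}
```

For example, with an assumed KERNELBASE of 0xC0000000, a 256M pin size and a kernel loaded at 0x11000000, the result is 0xC1000000: the kernel keeps its 16M offset inside the pinned TLB entry.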
    Cc: Benjamin Herrenschmidt
    Cc: Kumar Gala
    Cc: linuxppc-dev
    Signed-off-by: Josh Boyer

    Suzuki Poulose
     
  • The current implementation of CONFIG_RELOCATABLE in BookE is based
    on mapping the page aligned kernel load address to KERNELBASE. This
    approach, however, is not enough for platforms where the TLB page size
    is large (e.g., 256M on 44x). So we are renaming the RELOCATABLE used
    currently in BookE to DYNAMIC_MEMSTART to reflect the actual method.

    The CONFIG_RELOCATABLE for PPC32(BookE) based on processing of the
    dynamic relocations will be introduced later in the patch series.

    This change would allow the use of the old method of RELOCATABLE for
    platforms which can afford to enforce the page alignment (platforms with
    smaller TLB size).

    Changes since v3:

    * Introduced a new config, NONSTATIC_KERNEL, to denote a kernel which is
    either RELOCATABLE or DYNAMIC_MEMSTART (Suggested by: Josh Boyer)

    Suggested-by: Scott Wood
    Tested-by: Scott Wood

    Signed-off-by: Suzuki K. Poulose
    Cc: Scott Wood
    Cc: Kumar Gala
    Cc: Josh Boyer
    Cc: Benjamin Herrenschmidt
    Cc: linux ppc dev
    Signed-off-by: Josh Boyer

    Suzuki Poulose
     

19 Dec, 2011

4 commits

  • We support 16TB of user address space and half a million contexts
    so update the comment to reflect this.

    Signed-off-by: Anton Blanchard
    Signed-off-by: Benjamin Herrenschmidt

    Anton Blanchard
     
  • Commit d57af9b (taskstats: use real microsecond granularity for CPU times)
    renamed msecs_to_cputime to usecs_to_cputime, but failed to update all
    numbers on the way. This causes nonsensical cpu idle/iowait values to be
    displayed in /proc/stat (the only user of usecs_to_cputime so far).

    This also renames __cputime_msec_factor to __cputime_usec_factor, adapting
    its value and using it directly in cputime_to_usecs instead of doing two
    multiplications.

    Signed-off-by: Andreas Schwab
    Acked-by: Anton Blanchard
    Signed-off-by: Benjamin Herrenschmidt

    Andreas Schwab
     
  • PPC64 uses long long for u64 in the kernel, but powerpc's asm/types.h
    prevents 64-bit userland from seeing this definition, instead defaulting
    to u64 == long in userspace. Some user programs (e.g. kvmtool) may actually
    want LL64, so this patch adds a check for __SANE_USERSPACE_TYPES__ so that,
    if defined, int-ll64.h is included instead.
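
A user program that wants the LL64 definitions might opt in as sketched below. This assumes a Linux userspace build with the kernel's exported headers installed; `format_u64` is a hypothetical helper used only to show that `%llu` then matches `__u64`.

```c
/* Must be defined before any kernel header is pulled in. */
#define __SANE_USERSPACE_TYPES__
#include <asm/types.h>

#include <assert.h>
#include <stdio.h>

/* __u64 is expected to be an 8-byte type either way. */
static_assert(sizeof(__u64) == 8, "__u64 must be 64 bits wide");

/* Hypothetical helper: with LL64 types, %llu is the matching format. */
int format_u64(char *buf, size_t len, __u64 value)
{
    return snprintf(buf, len, "%llu", value);
}
```

Without the define, 64-bit powerpc userland sees `__u64` as `unsigned long`, and the same `%llu` format would draw a compiler warning.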

    Signed-off-by: Matt Evans
    Acked-by: Ingo Molnar
    Signed-off-by: Benjamin Herrenschmidt

    Matt Evans
     
  • Implement a POWER7 optimised copy_to_user/copy_from_user using VMX.
    For large aligned copies this new loop is over 10% faster, and for
    large unaligned copies it is over 200% faster.

    If we take a fault we fall back to the old version; this keeps
    things relatively simple and easy to verify.

    On POWER7 unaligned stores rarely slow down - they only flush when
    a store crosses a 4KB page boundary. Furthermore this flush is
    handled completely in hardware and should be 20-30 cycles.

    Unaligned loads on the other hand flush much more often - whenever
    crossing a 128 byte cache line, or a 32 byte sector if either sector
    is an L1 miss.

    Considering this information we really want to get the loads aligned
    and not worry about the alignment of the stores. Microbenchmarks
    confirm that this approach is much faster than the current unaligned
    copy loop that uses shifts and rotates to ensure both loads and
    stores are aligned.

    We also want to try and do the stores in cacheline aligned, cacheline
    sized chunks. If the store queue is unable to merge an entire
    cacheline of stores then the L2 cache will have to do a
    read/modify/write. Even worse, we will serialise this with the stores
    in the next iteration of the copy loop since both iterations hit
    the same cacheline.

    Based on this, the new loop does the following things:

    1 - 127 bytes
    Get the source 8 byte aligned and use 8 byte loads and stores. Pretty
    boring and similar to how the current loop works.

    128 - 4095 bytes
    Get the source 8 byte aligned and use 8 byte loads and stores,
    1 cacheline at a time. We aren't doing the stores in cacheline
    aligned chunks so we will potentially serialise once per cacheline.
    Even so it is much better than the loop we have today.

    4096 bytes and above
    If both source and destination have the same alignment get them both
    16 byte aligned, then get the destination cacheline aligned. Do
    cacheline sized loads and stores using VMX.

    If source and destination do not have the same alignment, we get the
    destination cacheline aligned, and use permute to do aligned loads.

    In both cases the VMX loop should be optimal - we always do aligned
    loads and stores and are always doing stores in cacheline aligned,
    cacheline sized chunks.

    To be able to use VMX we must be careful about interrupts and
    sleeping. We don't use the VMX loop when in an interrupt (which should
    be rare anyway) and we wrap the VMX loop in disable/enable_pagefault
    and fall back to the existing copy_tofrom_user loop if we do need to
    sleep.
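
The size thresholds and the alignment test described above can be summarised in a small C sketch. The names here are hypothetical, and two details are assumptions rather than statements from the patch: that a copy in interrupt context uses the cacheline-at-a-time loop, and that "same alignment" means the same offset within a 16-byte vector.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define CACHELINE_BYTES 128   /* POWER7 cache line size from the text */
#define VMX_BREAKPOINT  4096  /* copies this large may use VMX */

static_assert(VMX_BREAKPOINT > CACHELINE_BYTES, "thresholds must be ordered");

enum copy_path { COPY_SHORT, COPY_CACHELINE, COPY_VMX };

/* Pick a copy strategy from the length and the calling context. */
enum copy_path choose_copy_path(size_t len, bool in_interrupt)
{
    if (len < CACHELINE_BYTES)
        return COPY_SHORT;          /* 8 byte loads and stores */
    if (len < VMX_BREAKPOINT || in_interrupt)
        return COPY_CACHELINE;      /* 8 byte accesses, one cacheline at a time */
    return COPY_VMX;                /* cacheline sized VMX loads and stores */
}

/* True when the buffers differ in their offset within a 16 byte vector,
 * so the VMX loop has to use permute to keep its loads aligned. */
bool vmx_needs_permute(const void *dst, const void *src)
{
    return (((uintptr_t)dst ^ (uintptr_t)src) & 15) != 0;
}
```

The fallback to the existing loop on a page fault is not modelled here; it would sit around the VMX path rather than in the selection logic.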

    The VMX breakpoint of 4096 bytes was chosen using this microbenchmark:

    http://ozlabs.org/~anton/junkcode/copy_to_user.c

    Since we are using VMX and there is a cost to saving and restoring
    the user VMX state there are two broad cases we need to benchmark:

    - Best case - userspace never uses VMX

    - Worst case - userspace always uses VMX

    In reality a userspace process will sit somewhere between these two
    extremes. Since we need to test both aligned and unaligned copies we
    end up with 4 combinations. The point at which the VMX loop begins to
    win is:

    0% VMX
    aligned 2048 bytes
    unaligned 2048 bytes

    100% VMX
    aligned 16384 bytes
    unaligned 8192 bytes

    Considering this is a microbenchmark, the data is hot in cache and
    the VMX loop has better store queue merging properties we set the
    breakpoint to 4096 bytes, a little below the unaligned breakpoints.

    Some future optimisations we can look at:

    - Looking at the perf data, a significant part of the cost when a
    task is always using VMX is the extra exception we take to restore
    the VMX state. As such we should do something similar to the x86
    optimisation that restores FPU state for heavy users, i.e.:

    /*
     * If the task has used fpu the last 5 timeslices, just do a full
     * restore of the math state immediately to avoid the trap; the
     * chances of needing FPU soon are obviously high now
     */
    preload_fpu = tsk_used_math(next_p) && next_p->fpu_counter > 5;

    and

    /*
     * fpu_counter contains the number of consecutive context switches
     * that the FPU is used. If this is over a threshold, the lazy fpu
     * saving becomes unlazy to save the trap. This is an unsigned char
     * so that after 256 times the counter wraps and the behavior turns
     * lazy again; this to deal with bursty apps that only use FPU for
     * a short time
     */

    - We could create a paca bit to mirror the VMX enabled MSR bit and check
    that first, avoiding multiple calls to enable_kernel_altivec.
    That should help with iovec based system calls like readv.

    - We could have two VMX breakpoints, one for when we know the user VMX
    state is loaded into the registers and one when it isn't. This could
    be a second bit in the paca so we can calculate the break points quickly.

    - One suggestion from Ben was to save and restore the VSX registers
    we use inline instead of using enable_kernel_altivec.

    [BenH: Fixed a problem with preempt and fixed build without CONFIG_ALTIVEC]

    Signed-off-by: Anton Blanchard
    Signed-off-by: Benjamin Herrenschmidt

    Anton Blanchard
     

16 Dec, 2011

5 commits


15 Dec, 2011

1 commit


09 Dec, 2011

2 commits

  • Based on original work by David 'Shaggy' Kleikamp.

    Signed-off-by: Tony Breeds
    Signed-off-by: Josh Boyer

    Tony Breeds
     
  • 24aa07882b (memblock, x86: Replace memblock_x86_reserve/free_range()
    with generic ones) removed arch/x86/include/asm/memblock.h and dropped
    its inclusion from include/linux/memblock.h which breaks other
    architectures which depended on the generic memblock.h pulling in the
    arch specific one.

    However, the proper fix isn't adding back the asm inclusion. memblock
    doesn't have any arch dependent part and doesn't need an arch specific
    header file, and the asm/memblock.h files are either practically empty
    or contain mostly unrelated arch specific stuff.

    * In microblaze, sh, powerpc, sparc and openrisc, asm/memblock.h is
    either empty or just contains an unused MEMBLOCK_DBG() macro. Remove
    them.

    * In arm and unicore32, asm/memblock.h contains arch specific stuff.
    Include it directly from its users. It might be a good idea to
    rename the header file to avoid confusion.

    Signed-off-by: Tejun Heo
    Reported-by: "H. Peter Anvin"
    Cc: Yinghai Lu
    Cc: Russell King
    Cc: Michal Simek
    Cc: Benjamin Herrenschmidt
    Cc: Paul Mundt
    Cc: "David S. Miller"
    Cc: Guan Xuetao

    Tejun Heo
     

08 Dec, 2011

5 commits

  • This fixes a problem where a CPU thread coming out of nap mode can
    think it has valid values in the nonvolatile GPRs (r14 - r31) as saved
    away in power7_idle, but in fact the values have been trashed because
    the thread was used for KVM in the meantime. The result is that the
    thread crashes because code that called power7_idle (e.g.,
    pnv_smp_cpu_kill_self()) goes to use values in registers that have
    been trashed.

    The bit field in SRR1 that tells whether state was lost only reflects
    the most recent nap, which may not have been the nap instruction in
    power7_idle. So we need an extra PACA field to indicate that state
    has been lost even if SRR1 indicates that the most recent nap didn't
    lose state. We clear this field when saving the state in power7_idle,
    we set it to a non-zero value when we use the thread for KVM, and we
    test it in power7_wakeup_noloss.
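
The clear/set/test protocol for the extra field can be modelled in C roughly as below. This is only a sketch: the struct, the field name `state_lost` and the function names are hypothetical stand-ins, and the real code lives in the PACA and the assembly idle paths.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the per-thread PACA field described above. */
struct thread_idle_state {
    unsigned char state_lost;
};

/* Called when the nonvolatile GPRs are saved before napping: clear the flag. */
void idle_save_state(struct thread_idle_state *t)
{
    assert(t != NULL);
    t->state_lost = 0;
}

/* Called when the napping thread is used for KVM: registers get trashed. */
void kvm_borrow_thread(struct thread_idle_state *t)
{
    assert(t != NULL);
    t->state_lost = 1;
}

/* On wakeup, restore if either SRR1 or the extra flag reports lost state. */
bool wakeup_must_restore(const struct thread_idle_state *t, bool srr1_state_lost)
{
    assert(t != NULL);
    return srr1_state_lost || t->state_lost != 0;
}
```

The key case is a wakeup where SRR1 reports no loss for the most recent nap but the thread was borrowed earlier; the flag makes that case restore the registers as well.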

    Signed-off-by: Paul Mackerras
    Signed-off-by: Benjamin Herrenschmidt

    Paul Mackerras
     
  • With CONFIG_STRICT_DEVMEM=y, user space cannot read any part of /dev/mem.
    Since this breaks librtas, punch a hole in /dev/mem to allow access to the
    rmo_buffer that librtas needs.

    Anton Blanchard reported the problem and helped with the fix.

    A quick test for this patch:

    # cat /proc/rtas/rmo_buffer
    000000000f190000 10000

    # python -c "print 0x000000000f190000 / 0x10000"
    3865

    # dd if=/dev/mem of=/tmp/foo count=1 bs=64k skip=3865
    1+0 records in
    1+0 records out
    65536 bytes (66 kB) copied, 0.000205235 s, 319 MB/s

    # dd if=/dev/mem of=/tmp/foo
    dd: reading `/dev/mem': Operation not permitted
    0+0 records in
    0+0 records out
    0 bytes (0 B) copied, 0.00022519 s, 0.0 kB/s
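
The arithmetic in the test above (buffer address divided by the 64k block size gives the block to skip to) corresponds to a simple range check on the page frame number. The sketch below is an assumption about how such a check could look, with a hypothetical function name; it is not the code added by the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical check: does page frame pfn start inside the buffer
 * [buf_base, buf_base + buf_size)? page_shift is log2 of the page size.
 */
bool pfn_in_buffer(uint64_t pfn, uint64_t buf_base, uint64_t buf_size,
                   unsigned int page_shift)
{
    assert(page_shift < 64);
    uint64_t paddr = pfn << page_shift;
    return paddr >= buf_base && paddr < buf_base + buf_size;
}
```

With the values from the transcript (base 0x0f190000, size 0x10000, 64k pages), frame 3865 is inside the buffer, while frames 3864 and 3866 are not.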

    Signed-off-by: Sukadev Bhattiprolu
    Signed-off-by: Benjamin Herrenschmidt

    sukadev@linux.vnet.ibm.com
     
  • The lv1 hcall #91 should be named lv1_read_repository_node, and
    not lv1_get_repository_node_value. Adjust the lv1 hcall table
    and all calls.

    Signed-off-by: Geoff Levand
    Signed-off-by: Benjamin Herrenschmidt

    Geoff Levand
     
  • The lv1_get_version_info hcall takes 2, not 1 output
    arguments. Adjust the lv1 hcall table and all calls.

    Usage:

    int lv1_get_version_info(u64 *version_number, u64 *vendor_id)

    Signed-off-by: Geoff Levand
    Signed-off-by: Benjamin Herrenschmidt

    Geoff Levand
     
  • The lv1_get_virtual_address_space_id_of_ppe hcall takes 0, not 1 input
    arguments. Adjust the lv1 hcall table and all calls.

    Signed-off-by: Geoff Levand
    Signed-off-by: Benjamin Herrenschmidt

    Geoff Levand