03 Jan, 2006

1 commit

  • In commit 3D59121003721a8fad11ee72e646fd9d3076b5679c, the x86 and x86-64
    asm/param.h were changed to include <linux/config.h> for the
    configurable timer frequency.

    However, asm/param.h is sometimes used in userland (it is included
    indirectly from <sys/param.h>), so your commit pollutes the userland
    namespace with tons of CONFIG_FOO macros. This greatly confuses
    software packages (such as BusyBox) which use CONFIG_FOO macros
    themselves to control the inclusion of optional features.

    After a short exchange, Christoph approved this patch.

    Signed-off-by: Linus Torvalds

    Dag-Erling Smørgrav
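
    A minimal userland sketch of the clash described above; the CONFIG_FEATURE_VERBOSE
    name is purely illustrative and not taken from any real package:

        /* If a kernel header dragged in via <sys/param.h> leaks CONFIG_* macros,
         * an application's own CONFIG_* feature switches can be clobbered. */
        #include <stdio.h>
        #include <sys/param.h>              /* may indirectly pull in asm/param.h */

        #ifndef CONFIG_FEATURE_VERBOSE      /* hypothetical BusyBox-style switch */
        #define CONFIG_FEATURE_VERBOSE 0    /* this feature is meant to be disabled */
        #endif

        int main(void)
        {
            /* With a polluted header the macro may already be defined,
             * silently enabling the "optional" feature. */
            printf("CONFIG_FEATURE_VERBOSE = %d\n", CONFIG_FEATURE_VERBOSE);
            return 0;
        }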
     

25 Dec, 2005

2 commits


24 Nov, 2005

1 commit

  • alpha, sparc64 and x86_64 are each missing some primitives from their atomic64
    support: fill in the gaps I've noticed by extrapolating the existing asm,
    following the groupings in each file. But powerpc and parisc still lack atomic64.

    Signed-off-by: Hugh Dickins
    Cc: Richard Henderson
    Cc: Ivan Kokshaysky
    Cc: "David S. Miller"
    Cc: Andi Kleen
    Cc: Nick Piggin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
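
    A small usage sketch of the atomic64_t API whose gaps the entry above fills in;
    the particular calls below are illustrative, the exact per-architecture list of
    added primitives is in the patch itself:

        #include <asm/atomic.h>

        static atomic64_t events = ATOMIC64_INIT(0);

        static long record_event(void)
        {
            /* add/inc-and-return style primitives were among the missing pieces */
            return atomic64_inc_return(&events);
        }

        static int drain_one(void)
        {
            /* test variant: true when the counter reaches zero */
            return atomic64_dec_and_test(&events);
        }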
     

21 Nov, 2005

1 commit

  • Ever since we removed msr.c from the x86_64 branch and started grabbing it from
    i386, the msr device (read functionality) has been broken for us.

    This is due to the differences between the asm-i386/msr.h and asm-x86_64/msr.h interfaces.

    Here is a patch on our side to fix this.

    Thankfully, as of the current (2.6.15-rc1-git6) tree, arch/i386/kernel/msr.c is the only file that uses the rdmsr_safe macro.

    Signed-off-by: Jacob Shin
    Signed-off-by: Andi Kleen
    Signed-off-by: Linus Torvalds

    Jacob.Shin@amd.com
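
    For reference, a small userspace sketch of the interface this fixes: the msr
    device hands back 8 bytes per read, with the file offset selecting the MSR
    number (0x10, the TSC, is used here; needs root and the msr device nodes):

        #include <fcntl.h>
        #include <stdint.h>
        #include <stdio.h>
        #include <unistd.h>

        int main(void)
        {
            uint64_t val;
            int fd = open("/dev/cpu/0/msr", O_RDONLY);   /* device backed by msr.c */

            if (fd < 0) {
                perror("open");
                return 1;
            }
            /* the read offset is the MSR number; 0x10 is the TSC */
            if (pread(fd, &val, sizeof(val), 0x10) != (ssize_t)sizeof(val)) {
                perror("pread");
                close(fd);
                return 1;
            }
            printf("MSR 0x10 (TSC) = %#llx\n", (unsigned long long)val);
            close(fd);
            return 0;
        }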
     

15 Nov, 2005

19 commits

  • Linus Torvalds
     
  • This is needed for large multinode IBM systems which have a sparse
    APIC space in clustered mode, fully covering the available 8 bits.

    Previous kernels would limit the local APIC number to 127,
    which caused them to reject some of the CPUs at boot.

    I increased the maximum and shrank the apic_version array a bit
    to make up for that (the version is only 8 bits, so it doesn't need
    a full int to store).

    Cc: Chris McDermott

    Signed-off-by: Andi Kleen
    Signed-off-by: Linus Torvalds

    Andi Kleen
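
    A sketch of the direction of the change described above; the constants are
    illustrative, the exact values are in the patch:

        /* The local APIC ID is 8 bits wide, so allow all 256 IDs instead of
         * capping at 127, and store the (8-bit) APIC version in a u8 rather
         * than a full int so the larger table costs no extra memory. */
        #define MAX_LOCAL_APIC 256

        static u8 apic_version[MAX_LOCAL_APIC];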
     
  • Keeping this function does not make sense because it's a (buggy) copy of
    sys_time. The only difference is that now.tv_sec (which is
    a time_t, i.e. a 64-bit long) is copied (and truncated) into an int
    (32-bit).

    The prototype is the same (they both take a long __user *), so let's drop
    this and redirect it to sys_time (and make sure it exists by defining
    __ARCH_WANT_SYS_TIME).

    The only disadvantage is that the sys_stime definition is also compiled (this may
    be fixed, if needed, by adding a separate __ARCH_WANT_SYS_STIME macro and
    defining it for all arches that define __ARCH_WANT_SYS_TIME except x86_64).

    Acked-by: Andi Kleen
    Signed-off-by: Paolo 'Blaisorblade' Giarrusso
    Signed-off-by: Andrew Morton
    Signed-off-by: Andi Kleen
    Signed-off-by: Linus Torvalds

    Paolo 'Blaisorblade' Giarrusso
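
    A userspace illustration of the narrowing described above: stuffing a 64-bit
    time_t into a 32-bit int silently truncates once the value no longer fits:

        #include <stdio.h>
        #include <time.h>

        int main(void)
        {
            time_t after_2038 = (time_t)1 << 31;   /* no longer fits in a signed 32-bit int */
            int truncated = (int)after_2038;       /* what the buggy copy effectively did */

            printf("time_t: %lld, truncated int: %d\n",
                   (long long)after_2038, truncated);
            return 0;
        }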
     
  • The current value was correct before the introduction of Intel EM64T support -
    but now L1_CACHE_SHIFT_MAX can be less than L1_CACHE_SHIFT, which _is_ funny!

    Among the few users of ____cacheline_maxaligned_in_smp, we also have (for
    example) rcu_ctrlblk, and struct zone, with zone->{lru_,}lock. I.e. we have
    a lot of excess cacheline bouncing on them.

    No correctness issues, obviously. So this could even be merged for 2.6.14
    (I'm not a fan of this idea, though).

    CC: Andi Kleen
    Signed-off-by: Paolo 'Blaisorblade' Giarrusso
    Signed-off-by: Andrew Morton
    Signed-off-by: Andi Kleen
    Signed-off-by: Linus Torvalds

    Paolo 'Blaisorblade' Giarrusso
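
    A sketch of why the constant matters, paraphrasing the include/linux/cache.h
    definitions of this era; the example shift values (6 vs 7) are illustrative:

        #define L1_CACHE_BYTES  (1 << L1_CACHE_SHIFT)

        /* users such as rcu_ctrlblk claim "a whole cache line" with this
         * attribute, which is sized by L1_CACHE_SHIFT_MAX, not L1_CACHE_SHIFT */
        #define ____cacheline_maxaligned_in_smp \
                __attribute__((__aligned__(1 << L1_CACHE_SHIFT_MAX)))

        /* If L1_CACHE_SHIFT_MAX is 6 (64 bytes) while an EM64T kernel uses
         * L1_CACHE_SHIFT of 7 (128 bytes), two "max-aligned" objects can share
         * one 128-byte line and bounce between CPUs, as described above. */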
     
  • Not needed since x86-64 always uses the spinlock based rwsems.

    Signed-off-by: Andi Kleen
    Signed-off-by: Linus Torvalds

    Andi Kleen
     
  • Fields obtained through cpuid vector 0x1 (ebx[16:23]) and
    vector 0x4 (eax[14:25], eax[26:31]) indicate the maximum values and might not
    always be the same as what is available and what the OS sees. So make sure the
    "siblings" and "cpu cores" values in /proc/cpuinfo reflect the values as seen
    by the OS instead of what the cpuid instruction says. This will also fix the buggy BIOS
    cases (for example where cpuid on a single-core cpu says there are "2" siblings,
    even when HT is disabled in the BIOS.
    http://bugzilla.kernel.org/show_bug.cgi?id=4359)

    Signed-off-by: Suresh Siddha
    Signed-off-by: Andi Kleen
    Signed-off-by: Linus Torvalds

    Siddha, Suresh B
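
    A userspace sketch of the CPUID fields in question; these report the hardware
    maxima, which (as explained above) may differ from the sibling/core counts the
    OS actually brings up:

        #include <stdio.h>

        static void cpuid_count(unsigned leaf, unsigned sub,
                                unsigned *a, unsigned *b, unsigned *c, unsigned *d)
        {
            __asm__ volatile("cpuid"
                             : "=a"(*a), "=b"(*b), "=c"(*c), "=d"(*d)
                             : "a"(leaf), "c"(sub));
        }

        int main(void)
        {
            unsigned a, b, c, d;

            cpuid_count(1, 0, &a, &b, &c, &d);
            printf("max logical CPUs per package (leaf 1, ebx[23:16]): %u\n",
                   (b >> 16) & 0xff);

            cpuid_count(4, 0, &a, &b, &c, &d);
            printf("max cores per package (leaf 4, eax[31:26]): %u\n",
                   ((a >> 26) & 0x3f) + 1);
            return 0;
        }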
     
  • Signed-off-by: Andi Kleen
    Signed-off-by: Linus Torvalds

    Andi Kleen
     
  • No functional changes.

    Signed-off-by: Andi Kleen
    Signed-off-by: Linus Torvalds

    Andi Kleen
     
  • Signed-off-by: Andi Kleen
    Signed-off-by: Linus Torvalds

    Andi Kleen
     
  • With an NR_CPUS==128 kernel with CPU hotplug enabled we would waste 4MB
    on per-CPU data for all possible CPUs. The reason was that HOTPLUG
    always set the possible map up to NR_CPUS CPUs, and then we need to allocate
    that much (each per-CPU data area is roughly ~32k now).

    The underlying problem is that ACPI didn't tell us how many hotplug CPUs
    the platform supports. So the old code just assumed all of them, which
    led to this memory wastage.

    This implements some new heuristics:

    - If the BIOS specified disabled CPUs in the ACPI/mptables, assume they
      can be enabled later (this is bending the ACPI specification a bit,
      but seems like an obvious extension).
    - The user can override it with a new additionals_cpus=NUM option.
    - Otherwise use half of the available CPUs, or 2, whichever is more.

    Cc: ashok.raj@intel.com
    Cc: len.brown@intel.com

    Signed-off-by: Andi Kleen
    Signed-off-by: Linus Torvalds

    Andi Kleen
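
    A plain-C sketch of the heuristic spelled out above; the function and parameter
    names are illustrative, not the ones used in the patch:

        /* how many extra (hotpluggable) CPUs to reserve per-CPU data for */
        static int extra_possible_cpus(int present, int bios_disabled, int cmdline)
        {
            if (cmdline >= 0)          /* explicit additionals_cpus=NUM override */
                return cmdline;
            if (bios_disabled > 0)     /* disabled in ACPI/mptables: assume hotpluggable */
                return bios_disabled;
            return present / 2 > 2 ? present / 2 : 2;   /* half the CPUs, at least 2 */
        }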
     
  • Pointed out by Eric Dumazet

    Signed-off-by: Andi Kleen
    Signed-off-by: Linus Torvalds

    Andi Kleen
     
  • It is for physical addresses, not for PFNs.

    Pointed out by Tejun Heo.

    Cc: htejun@gmail.com

    Signed-off-by: Andi Kleen
    Signed-off-by: Linus Torvalds

    Andi Kleen
     
  • We should zap the low mappings as soon as possible, so that we can catch
    kernel bugs more effectively. Previously early boot had NULL mapped
    and didn't trap on NULL references.

    This patch introduces boot_level4_pgt, which will always have the low identity
    addresses mapped. During boot, all the processors will use this as their
    level4 pgt. On the BP, we will switch to init_level4_pgt as soon as we enter C
    code and zap the low mappings as soon as we are done with the usage of the
    identity-mapped low addresses. On the APs we will zap the low mappings as
    soon as we jump to C code.

    Signed-off-by: Suresh Siddha
    Signed-off-by: Ashok Raj
    Signed-off-by: Andi Kleen
    Signed-off-by: Linus Torvalds

    Siddha, Suresh B
     
  • Don't go from the CPU number to a mapping array. The node number is
    now often used in fast paths.

    This also adds a generic numa_node_id to all the topology includes

    Suggested by Eric Dumazet

    Signed-off-by: Andi Kleen
    Signed-off-by: Linus Torvalds

    Andi Kleen
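
    The generic fallback added here looks essentially like this (sketch;
    architectures with a faster per-CPU lookup can override it):

        /* derive the node of the currently running CPU from cpu_to_node() */
        #ifndef numa_node_id
        #define numa_node_id()  (cpu_to_node(raw_smp_processor_id()))
        #endif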
     
  • pfn_to_page really requires pfn_valid to be true now, no question.
    Some people stumbled over it, but it was misleading and wrong.

    Signed-off-by: Andi Kleen
    Signed-off-by: Linus Torvalds

    Andi Kleen
     
  • Here's a patch that builds on Natalie Protasevich's IRQ compression
    patch and tries to work for MPS boots as well as ACPI. It is meant for
    a 4-node IBM x460 NUMA box, which was dying because it had interrupt
    pins with GSI numbers > NR_IRQS and thus overflowed irq_desc.

    The problem is that this system has 270 GSIs (which are 1:1 mapped with
    I/O APIC RTEs) and an 8-node box would have 540. This is much bigger
    than NR_IRQS (224 for both i386 and x86_64). Also, there aren't enough
    vectors to go around. There are about 190 usable vectors, not counting
    the reserved ones and the unused vectors at 0x20 to 0x2F. So, my patch
    attempts to compress the GSI range and share vectors by sharing IRQs.

    Cc: "Protasevich, Natalie"

    Signed-off-by: Andi Kleen
    Signed-off-by: Linus Torvalds

    James Cleverdon
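
    A heavily simplified illustration of the compression idea (all names and sizes
    here are ours): instead of indexing irq_desc[] directly by GSI, which can exceed
    NR_IRQS, dense IRQ numbers are handed out on first use:

        #define MAX_GSI 1024                   /* illustrative bound on GSI numbers */

        static int gsi_to_irq[MAX_GSI];        /* 0 = not assigned yet */
        static int next_irq = 15;              /* start above the legacy ISA IRQs */

        static int irq_for_gsi(int gsi)
        {
            if (gsi < 16)
                return gsi;                    /* legacy IRQs keep the identity mapping */
            if (gsi_to_irq[gsi] == 0)
                gsi_to_irq[gsi] = ++next_irq;  /* allocate the next compressed IRQ */
            return gsi_to_irq[gsi];
        }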
     
  • MC4_MISC - DRAM Errors Threshold Register realized under AMD K8 Rev F.
    This register is used to count correctable and uncorrectable ECC errors that occur during DRAM read operations.
    The user may interface through sysfs files in order to change the threshold configuration.

    bank%d/error_count - reads current error count, write to clear.
    bank%d/interrupt_enable - set/clear interrupt enable.
    bank%d/threshold_limit - read/write the threshold limit.

    APIC vector 0xF9 in hw_irq.h.
    5 software defined bank ids in mce.h.
    new apic.c function to setup threshold apic lvt.
    defaults to interrupt off, count enabled, and threshold limit max.
    sysfs interface created on /sys/devices/system/threshold.

    AK: added some ifdefs to make it compile on UP

    Signed-off-by: Jacob Shin
    Signed-off-by: Andi Kleen
    Signed-off-by: Linus Torvalds

    Jacob Shin
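
    A userspace sketch of driving the sysfs files listed above; bank number 4 and the
    exact directory layout under /sys/devices/system/threshold are illustrative and
    may differ per CPU and bank:

        #include <stdio.h>

        int main(void)
        {
            unsigned long count = 0;
            FILE *f = fopen("/sys/devices/system/threshold/bank4/error_count", "r");

            if (f && fscanf(f, "%lu", &count) == 1)
                printf("DRAM ECC errors counted so far: %lu\n", count);
            if (f)
                fclose(f);

            /* raise the threshold limit (root only); writing error_count clears it */
            f = fopen("/sys/devices/system/threshold/bank4/threshold_limit", "w");
            if (f) {
                fprintf(f, "%d\n", 1000);
                fclose(f);
            }
            return 0;
        }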
     
  • Signed-off-by: Jan Beulich
    Signed-off-by: Andi Kleen
    Signed-off-by: Linus Torvalds

    Jan Beulich
     
  • Add a new 4GB GFP_DMA32 zone between the GFP_DMA and GFP_NORMAL zones.

    As a bit of historical background: when the x86-64 port
    was originally designed we had some discussion about whether we should
    use a 16MB DMA zone like i386, a 4GB DMA zone like IA64, or
    both. Having both was ruled out at that point because it was early in
    2.4, when the VM was still quite shaky and had serious trouble even
    dealing with one DMA zone. We settled on the 16MB DMA zone mainly
    because we worried about older soundcards and the floppy.

    But this has caused problems ever since, because
    device drivers have had trouble getting enough DMA-able memory. These days
    the VM works much better and the wide use of NUMA has proven
    it can deal with many zones successfully.

    So this patch adds both zones.

    This helps drivers that need a lot of memory below 4GB because
    their hardware cannot address more (graphics drivers, both proprietary
    and free, video frame buffer drivers, sound drivers, etc.).
    Previously they could only use the IOMMU plus 16MB of GFP_DMA, which
    was not enough memory.

    Another common problem is hardware that has full memory
    addressing for >4GB but misses it for some control structures in memory
    (like transmit rings or other metadata). Such drivers tended to allocate memory
    in the 16MB GFP_DMA zone or via the IOMMU/swiotlb using pci_alloc_consistent,
    but that can tie up a lot of precious 16MB GFP_DMA/IOMMU/swiotlb memory
    (even on AMD systems the IOMMU tends to be quite small), especially if you have
    many devices. With the new zone pci_alloc_consistent can just put
    this stuff into memory below 4GB, which works better.

    One remaining argument was whether the zone should be 4GB or 2GB. The main
    motivation for 2GB would be an unnamed, not so unpopular hardware
    RAID controller (mostly found in older machines from a particular four-letter
    company) which has a strange 2GB restriction in firmware. But
    that one works OK with swiotlb/IOMMU anyway, so it doesn't really
    need GFP_DMA32. I chose 4GB to be compatible with IA64 and because
    it seems to be the most common restriction.

    The new zone is so far added only for x86-64.

    For other architectures that don't set up this
    new zone, nothing changes. Architectures can set a compatibility
    define in Kconfig, CONFIG_DMA_IS_DMA32, that will define GFP_DMA32
    as GFP_DMA. Otherwise it's a no-op, because on 32-bit architectures
    it's normally not needed: GFP_NORMAL (=0) is already DMA-able
    enough.

    One problem is still that GFP_DMA means different things on different
    architectures, e.g. some drivers used to have "#ifdef ia64: use GFP_DMA
    (trusting it to be 4GB), #elif __x86_64__: use other hacks like
    the swiotlb because 16MB is not enough", and so on. This was quite
    ugly and is now obsolete.

    These should now be converted to use GFP_DMA32 unconditionally (I haven't done
    this yet), or better, to only use pci_alloc_consistent/dma_alloc_coherent,
    which will use GFP_DMA32 transparently.

    Signed-off-by: Andi Kleen
    Signed-off-by: Linus Torvalds

    Andi Kleen
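
    A sketch of the driver-side conversion described above; the helper names and the
    notion of a "descriptor ring" are placeholders:

        #include <linux/dma-mapping.h>
        #include <linux/gfp.h>
        #include <linux/mm.h>

        /* allocate a ring for hardware that can only address memory below 4GB */
        static void *alloc_ring(struct device *dev, size_t bytes, dma_addr_t *handle)
        {
            /* preferred: let the DMA API pick a suitable zone (it can now use
             * GFP_DMA32 transparently on x86-64) */
            return dma_alloc_coherent(dev, bytes, handle, GFP_KERNEL);
        }

        /* or, for raw page allocations, request the new zone explicitly instead
         * of the old per-architecture #ifdef guesswork */
        static void *alloc_ring_pages(size_t bytes)
        {
            return (void *)__get_free_pages(GFP_DMA32, get_order(bytes));
        }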
     

14 Nov, 2005

3 commits

  • Introduce an atomic_inc_not_zero operation. Make this a special case of
    atomic_add_unless because lockless pagecache actually wants
    atomic_inc_not_negativeone due to its offset refcount.

    Signed-off-by: Nick Piggin
    Cc: "Paul E. McKenney"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nick Piggin
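
    A self-contained illustration of the atomic_add_unless()/atomic_inc_not_zero()
    semantics introduced above, built on a compare-and-swap loop; the helper names
    and the use of GCC __sync builtins are ours (the kernel versions are built on
    atomic_cmpxchg()):

        #include <stdio.h>

        /* atomically add a to *v unless *v == u; returns non-zero if the add happened */
        static int add_unless(int *v, int a, int u)
        {
            int c = *v, old;

            while (c != u && (old = __sync_val_compare_and_swap(v, c, c + a)) != c)
                c = old;                     /* lost a race: retry with the fresh value */

            return c != u;
        }

        static int inc_not_zero(int *v)
        {
            return add_unless(v, 1, 0);      /* the special case described above */
        }

        int main(void)
        {
            int live = 2, dead = 0;
            int r1 = inc_not_zero(&live);
            int r2 = inc_not_zero(&dead);

            printf("live: returned %d, count now %d\n", r1, live);
            printf("dead: returned %d, count now %d\n", r2, dead);
            return 0;
        }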
     
  • Introduce an atomic_cmpxchg operation.

    Signed-off-by: Nick Piggin
    Cc: "Paul E. McKenney"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nick Piggin
     
  • Fix the x86_64 TSS limit in TSS descriptor.

    Signed-off-by: Suresh Siddha
    Acked-by: Andi Kleen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Siddha, Suresh B
     

11 Nov, 2005

1 commit

  • MSI hardcoded the delivery mode to use logical delivery mode. Recently
    x86_64 moved to use physical mode addressing to support physflat mode.
    With this mode enabled I noticed that my ethernet devices using MSI weren't working.

    msi_address_init() was hardcoded to use logical mode for i386 and x86_64,
    so when we switched to physical mode, things stopped working.

    Since we don't use lowest-priority delivery with MSI anyway, it is always
    directed to just a single CPU. It is safe and simpler to use
    physical mode always, even when we use logical delivery mode for IPIs
    or other ioapic RTEs.

    Signed-off-by: Ashok Raj
    Signed-off-by: Greg Kroah-Hartman

    Ashok Raj
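
    A sketch of the MSI address word this amounts to, following the x86 MSI address
    layout (the helper name is ours): bit 2 selects physical vs. logical destination
    mode and bits 19:12 carry the target APIC ID:

        #include <stdint.h>

        #define MSI_ADDR_BASE            0xfee00000u
        #define MSI_ADDR_DESTMODE_PHYS   (0u << 2)   /* bit 2: 0 = physical, 1 = logical */
        #define MSI_ADDR_NO_REDIRECTION  (0u << 3)   /* no lowest-priority redirection */

        static uint32_t msi_address_for_cpu(uint8_t apic_id)
        {
            /* always physical mode, always a single target CPU, regardless of how
             * IPIs or IO-APIC RTEs are delivered */
            return MSI_ADDR_BASE | ((uint32_t)apic_id << 12)
                   | MSI_ADDR_DESTMODE_PHYS | MSI_ADDR_NO_REDIRECTION;
        }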
     

07 Nov, 2005

2 commits


01 Nov, 2005

1 commit


31 Oct, 2005

4 commits

  • __MUTEX_INITIALIZER() has no users, and equates to the more commonly used
    DECLARE_MUTEX(), thus making it pretty much redundant. Remove it for good.

    Signed-off-by: Arthur Othieno
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Arthur Othieno
     
  • This patch removes page_pte_prot and page_pte macros from all
    architectures. Some architectures define both, some only page_pte (broken)
    and others none. These macros are not used anywhere.

    page_pte_prot(page, prot) is identical to mk_pte(page, prot) and
    page_pte(page) is identical to page_pte_prot(page, __pgprot(0)).

    * The following architectures define both page_pte_prot and page_pte

    arm, arm26, ia64, sh64, sparc, sparc64

    * The following architectures define only page_pte (broken)

    frv, i386, m32r, mips, sh, x86-64

    * All other architectures define neither

    Signed-off-by: Tejun Heo
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Tejun Heo
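
    For reference, the identities stated above, written out as the removed macros:

        #define page_pte_prot(page, prot)  mk_pte(page, prot)
        #define page_pte(page)             page_pte_prot(page, __pgprot(0))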
     
  • Make sure we always return, as all syscalls should. Also move the common
    prototype to

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Miklos Szeredi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Hellwig
     
  • Handle 32-bit mtrr ioctls in the mtrr driver instead of the ia32
    compatibility layer.

    Signed-off-by: Brian Gerst
    Cc: Andi Kleen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Brian Gerst
     

30 Oct, 2005

1 commit

  • Add sem_is_read/write_locked functions to the read/write semaphores, along the
    same lines as the *_is_locked spinlock functions. The swap token tuning patch
    uses sem_is_read_locked; sem_is_write_locked is added for completeness.

    Signed-off-by: Rik van Riel
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Rik Van Riel
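
    A sketch of what such predicates look like for the generic spinlock-based rwsem,
    where ->activity counts readers and is -1 while a writer holds the lock; the
    names follow the entry above, the bodies are ours:

        static inline int sem_is_read_locked(struct rw_semaphore *sem)
        {
            return sem->activity > 0;     /* one or more readers hold it */
        }

        static inline int sem_is_write_locked(struct rw_semaphore *sem)
        {
            return sem->activity < 0;     /* -1 marks an active writer */
        }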
     

28 Oct, 2005

3 commits


21 Oct, 2005

1 commit