20 Jul, 2007

5 commits

  • Remove the arg+env limit of MAX_ARG_PAGES by copying the strings directly from
    the old mm into the new mm.

    We create the new mm before the binfmt code runs, and place the new stack at
    the very top of the address space. Once the binfmt code runs and figures out
    where the stack should be, we move it downwards.

    It is a bit peculiar in that we have one task with two mm's, one of which is
    inactive.

    [a.p.zijlstra@chello.nl: limit stack size]
    Signed-off-by: Ollie Wild
    Signed-off-by: Peter Zijlstra
    Cc: Hugh Dickins
    [bunk@stusta.de: unexport bprm_mm_init]
    Signed-off-by: Adrian Bunk
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ollie Wild
     
  • Currently most of the per cpu data, which is accessed by different cpus,
    has a ____cacheline_aligned_in_smp attribute. Move all this data to the
    new per cpu shared data section: .data.percpu.shared_aligned.

    This will separate the percpu data which is referenced frequently by other
    cpus from the local only percpu data.

    Signed-off-by: Fenghua Yu
    Acked-by: Suresh Siddha
    Cc: Rusty Russell
    Cc: Christoph Lameter
    Cc: "Luck, Tony"
    Cc: Andi Kleen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Fenghua Yu
     
  • The per cpu data section contains two types of data: one set that is
    exclusively accessed by the local cpu, and another set that is per cpu
    but also shared by remote cpus. In the current kernel, these two sets are
    not clearly separated. This can cause the same data cacheline to be
    shared between the two sets of data, which results in
    unnecessary bouncing of the cacheline between cpus.

    One way to fix the problem is to cacheline align the remotely accessed per
    cpu data, both at the beginning and at the end. Because of the padding at
    both ends, this will likely cause some memory wastage and also the
    interface to achieve this is not clean.

    This patch:

    Moves the remotely accessed per cpu data (which is currently marked
    as ____cacheline_aligned_in_smp) into a different section, where all the data
    elements are cacheline aligned. And as such, this differentiates the local
    only data and remotely accessed data cleanly.

    Signed-off-by: Fenghua Yu
    Acked-by: Suresh Siddha
    Cc: Rusty Russell
    Cc: Christoph Lameter
    Cc: "Luck, Tony"
    Cc: Andi Kleen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Fenghua Yu
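
    The separation described in this commit can be sketched in userspace C.
    This is only an illustration under stated assumptions: the 64-byte
    CACHELINE_SIZE, the struct names and the same_cacheline() helper are all
    hypothetical, and the sketch uses a plain alignment attribute rather than
    the kernel's per cpu section machinery.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed cache line size; real hardware varies. */
#define CACHELINE_SIZE 64

/* Hypothetical item touched only by the owning CPU. */
struct local_only {
    unsigned long events;
};

/*
 * Hypothetical item that remote CPUs also access. The alignment attribute
 * starts every object on a cache line boundary and pads its size up to a
 * multiple of the line size, so two objects never share a line.
 */
struct shared_aligned {
    unsigned long remote_hits;
} __attribute__((aligned(CACHELINE_SIZE)));

/* One slot per pretend CPU; the two kinds of data live in separate arrays. */
struct local_only local_data[4];
struct shared_aligned shared_data[4];

/* Returns 1 if both addresses fall in the same cache line, 0 otherwise. */
int same_cacheline(const void *a, const void *b)
{
    return ((uintptr_t)a / CACHELINE_SIZE) == ((uintptr_t)b / CACHELINE_SIZE);
}
```

    With this layout, neighbouring shared_data slots never land in the same
    line, while the local-only items stay densely packed.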
     
  • I realise jprobes are a razor-blades-included type of interface, but that
    doesn't mean we can't try and make them safer to use. This guy I know once
    wrote code like this:

    struct jprobe jp = { .kp.symbol_name = "foo", .entry = "jprobe_foo" };

    And then his kernel exploded. Oops.

    This patch adds an arch hook, arch_deref_entry_point() (I don't like it
    either) which takes the void * in a struct jprobe, and gives back the text
    address that it represents.

    We can then use that in register_jprobe() to check that the entry point we're
    passed is actually in the kernel text, rather than just some random value.

    Signed-off-by: Michael Ellerman
    Cc: Prasanna S Panchamukhi
    Acked-by: Ananth N Mavinakayanahalli
    Cc: Anil S Keshavamurthy
    Cc: David S. Miller
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michael Ellerman
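
    The kind of check this commit describes can be sketched as a simple range
    test. This is a userspace illustration, not the kernel code: struct
    text_range, addr_in_text() and check_entry_point() are hypothetical names,
    and the text bounds are supplied by the caller rather than taken from the
    kernel's own symbols.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical bounds of the executable text region. */
struct text_range {
    uintptr_t start;   /* first byte of text */
    uintptr_t end;     /* one past the last byte of text */
};

/* Returns 1 if addr lies inside the text region, 0 otherwise. */
static int addr_in_text(const struct text_range *text, uintptr_t addr)
{
    return addr >= text->start && addr < text->end;
}

/*
 * Hypothetical stand-in for a registration-time check: reject an entry
 * pointer that does not resolve to an address inside the text region.
 * Returns 0 when the entry looks valid and -1 when it is rejected.
 */
static int check_entry_point(const struct text_range *text, const void *entry)
{
    return addr_in_text(text, (uintptr_t)entry) ? 0 : -1;
}
```

    A pointer to a string literal such as "jprobe_foo" would normally point
    into a data section, so a check of this shape would reject it instead of
    letting the probe jump there.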
     
  • This patch completes Linus's wish that the fault return codes be made into
    bit flags, which I agree makes everything nicer. This requires
    all handle_mm_fault callers to be modified (possibly the modifications
    should go further and do things like fault accounting in handle_mm_fault --
    however that would be for another patch).

    [akpm@linux-foundation.org: fix alpha build]
    [akpm@linux-foundation.org: fix s390 build]
    [akpm@linux-foundation.org: fix sparc build]
    [akpm@linux-foundation.org: fix sparc64 build]
    [akpm@linux-foundation.org: fix ia64 build]
    Signed-off-by: Nick Piggin
    Cc: Richard Henderson
    Cc: Ivan Kokshaysky
    Cc: Russell King
    Cc: Ian Molton
    Cc: Bryan Wu
    Cc: Mikael Starvik
    Cc: David Howells
    Cc: Yoshinori Sato
    Cc: "Luck, Tony"
    Cc: Hirokazu Takata
    Cc: Geert Uytterhoeven
    Cc: Roman Zippel
    Cc: Greg Ungerer
    Cc: Matthew Wilcox
    Cc: Paul Mackerras
    Cc: Benjamin Herrenschmidt
    Cc: Heiko Carstens
    Cc: Martin Schwidefsky
    Cc: Paul Mundt
    Cc: Kazumoto Kojima
    Cc: Richard Curnow
    Cc: William Lee Irwin III
    Cc: "David S. Miller"
    Cc: Jeff Dike
    Cc: Paolo 'Blaisorblade' Giarrusso
    Cc: Miles Bader
    Cc: Chris Zankel
    Acked-by: Kyle McMartin
    Acked-by: Haavard Skinnemoen
    Acked-by: Ralf Baechle
    Acked-by: Andi Kleen
    Signed-off-by: Andrew Morton
    [ Still apparently needs some ARM and PPC loving - Linus ]
    Signed-off-by: Linus Torvalds

    Nick Piggin
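
    What a caller looks like once fault results are bit flags can be sketched
    as follows. The flag names and values (FAULT_OOM, FAULT_SIGBUS,
    FAULT_MAJOR), struct fault_stats and account_fault() are assumptions made
    for this illustration and are not taken from the patch.

```c
#include <assert.h>

/* Illustrative flag values; names and numbers are assumptions. */
#define FAULT_OOM    0x1u   /* ran out of memory while handling the fault */
#define FAULT_SIGBUS 0x2u   /* the access cannot be satisfied */
#define FAULT_MAJOR  0x4u   /* the fault needed I/O to complete */

/* Any of these bits means the fault failed. */
#define FAULT_ERROR  (FAULT_OOM | FAULT_SIGBUS)

/* Hypothetical per-task accounting. */
struct fault_stats {
    unsigned long major;
    unsigned long minor;
    unsigned long errors;
};

/*
 * Inspect a bit-flag fault result. Errors are tested first with one mask;
 * otherwise the MAJOR bit selects which counter to bump.
 * Returns 0 on success and -1 if an error bit was set.
 */
static int account_fault(struct fault_stats *st, unsigned int ret)
{
    if (ret & FAULT_ERROR) {
        st->errors++;
        return -1;
    }
    if (ret & FAULT_MAJOR)
        st->major++;
    else
        st->minor++;
    return 0;
}
```

    Because the result is a set of bits, one return value can carry both an
    error condition and extra information, and callers test each with a mask
    instead of a switch over exclusive codes.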
     

18 Jul, 2007

5 commits

  • * 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux-2.6:
    [IA64] Clean away some code inside some non-existent CONFIG ifdefs
    [IA64] ar.itc access must really be after xtime_lock.sequence has been read
    [IA64] correctly count CPU objects in the ia64/sn hwperf interface
    [IA64] arbitary speed tty ioctl support
    [IA64] use machvec=dig on hpzx1 platforms

    Linus Torvalds
     
  • Signed-off-by: Al Viro
    Acked-by: David S. Miller
    Acked-by: Geert Uytterhoeven
    Signed-off-by: Linus Torvalds

    Al Viro
     
  • Based on usage and testing over the past couple of years, kprobes on
    i386, ia64, powerpc and x86_64 is no longer EXPERIMENTAL.

    This is a follow-up to Robert P.J. Day's patch making "Instrumentation
    support" non-EXPERIMENTAL:

    http://marc.info/?l=linux-kernel&m=118396955423812&w=2

    Arch maintainers for sparc64, avr32 and s390 need to take a similar call.

    Signed-off-by: Ananth N Mavinakayanahalli
    Cc: Andi Kleen
    Cc: Paul Mackerras
    Cc: Benjamin Herrenschmidt
    Cc: "Luck, Tony"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ananth N Mavinakayanahalli
     
  • If the kernel OOPSed or BUGed then it probably should be considered as
    tainted. Thus, all subsequent OOPSes and SysRq dumps will report the
    tainted kernel. This saves a lot of time explaining oddities in the
    calltraces.

    Signed-off-by: Pavel Emelianov
    Acked-by: Randy Dunlap
    Signed-off-by: Andrew Morton
    [ Added parisc patch from Matthew Wilson -Linus ]
    Signed-off-by: Linus Torvalds

    Pavel Emelianov
     
  • This patch adds the kernelcore= parameter for x86.

    Once all patches are applied, a new command-line parameter and a new
    sysctl exist. This patch adds the necessary documentation.

    From: Yasunori Goto

    When "kernelcore" boot option is specified, kernel can't boot up on ia64
    because of an infinite loop. In addition, the parsing code can be handled
    in an architecture-independent manner.

    This patch uses common code to handle the kernelcore= parameter. It is
    only available to architectures that support arch-independent zone-sizing
    (i.e. define CONFIG_ARCH_POPULATES_NODE_MAP). Other architectures will
    ignore the boot parameter.

    [bunk@stusta.de: make cmdline_parse_kernelcore() static]
    Signed-off-by: Mel Gorman
    Signed-off-by: Yasunori Goto
    Acked-by: Andy Whitcroft
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mel Gorman
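
    Architecture-independent handling of a size parameter such as kernelcore=
    can be sketched in userspace C. This is an assumption-laden illustration
    and not the code from the patch: parse_size(), size_to_pages() and the
    4 KiB SKETCH_PAGE_SHIFT are hypothetical, and overflow of very large
    values is not handled.

```c
#include <assert.h>
#include <stdlib.h>

#define SKETCH_PAGE_SHIFT 12   /* assumed 4 KiB pages */

/*
 * Parse a size such as "4096", "512M" or "2G" into bytes.
 * Returns 0 on success and -1 on malformed input.
 */
static int parse_size(const char *s, unsigned long long *bytes)
{
    char *end;
    unsigned long long v;

    if (s == NULL || *s == '\0')
        return -1;

    v = strtoull(s, &end, 0);
    if (end == s)
        return -1;              /* no digits at all */

    switch (*end) {
    case 'G': case 'g':
        v <<= 10;               /* fall through */
    case 'M': case 'm':
        v <<= 10;               /* fall through */
    case 'K': case 'k':
        v <<= 10;
        end++;
        break;
    default:
        break;
    }

    if (*end != '\0')
        return -1;              /* trailing garbage */

    *bytes = v;
    return 0;
}

/* Round a byte count up to whole pages. */
static unsigned long long size_to_pages(unsigned long long bytes)
{
    return (bytes + (1ULL << SKETCH_PAGE_SHIFT) - 1) >> SKETCH_PAGE_SHIFT;
}
```

    Keeping the parsing in one common helper means every architecture that
    opts in sees the same accepted syntax for the parameter.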
     

17 Jul, 2007

3 commits

  • The OpenVZ Linux kernel team has discovered a problem with 32bit quota tools
    working on 64bit architectures. In the 2.6.10 kernel the sys32_quotactl()
    function was replaced by sys_quotactl() with the comment "sys_quotactl seems
    to be 32/64bit clean, enable it for 32bit". However, this isn't right. Look at
    the if_dqblk structure:

    struct if_dqblk {
            __u64 dqb_bhardlimit;
            __u64 dqb_bsoftlimit;
            __u64 dqb_curspace;
            __u64 dqb_ihardlimit;
            __u64 dqb_isoftlimit;
            __u64 dqb_curinodes;
            __u64 dqb_btime;
            __u64 dqb_itime;
            __u32 dqb_valid;
    };

    For 32 bit quota tools sizeof(if_dqblk) == 0x44.
    But for a 64 bit kernel its size is 0x48, because of alignment!
    Thus we have a problem. The attached patch reintroduces the sys32_quotactl()
    function, which handles this and related situations.

    [michal.k.k.piotrowski@gmail.com: build fix]
    [akpm@linux-foundation.org: Make it link with CONFIG_QUOTA=n]
    Signed-off-by: Vasily Tarasov
    Cc: Andi Kleen
    Cc: "Luck, Tony"
    Cc: Jan Kara
    Signed-off-by: Michal Piotrowski
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vasily Tarasov
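
    The size mismatch described in this commit can be reproduced in userspace.
    The sketch below is an illustration only: the struct names
    if_dqblk_native and if_dqblk_compat and the dqblk_from_compat() helper are
    hypothetical, the 32 bit layout is emulated with GCC/Clang attributes, and
    the native size assumes an ABI where 64 bit integers are 8-byte aligned.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Natural layout on a 64 bit ABI: the tail is padded to 8-byte alignment. */
struct if_dqblk_native {
    uint64_t dqb_bhardlimit;
    uint64_t dqb_bsoftlimit;
    uint64_t dqb_curspace;
    uint64_t dqb_ihardlimit;
    uint64_t dqb_isoftlimit;
    uint64_t dqb_curinodes;
    uint64_t dqb_btime;
    uint64_t dqb_itime;
    uint32_t dqb_valid;
};

/*
 * Emulated 32 bit layout: 64 bit fields need only 4-byte alignment,
 * so no tail padding is added after dqb_valid.
 */
struct if_dqblk_compat {
    uint64_t dqb_bhardlimit;
    uint64_t dqb_bsoftlimit;
    uint64_t dqb_curspace;
    uint64_t dqb_ihardlimit;
    uint64_t dqb_isoftlimit;
    uint64_t dqb_curinodes;
    uint64_t dqb_btime;
    uint64_t dqb_itime;
    uint32_t dqb_valid;
} __attribute__((packed, aligned(4)));

/* Hypothetical translation step: copy field by field, never by sizeof. */
static void dqblk_from_compat(struct if_dqblk_native *dst,
                              const struct if_dqblk_compat *src)
{
    dst->dqb_bhardlimit = src->dqb_bhardlimit;
    dst->dqb_bsoftlimit = src->dqb_bsoftlimit;
    dst->dqb_curspace   = src->dqb_curspace;
    dst->dqb_ihardlimit = src->dqb_ihardlimit;
    dst->dqb_isoftlimit = src->dqb_isoftlimit;
    dst->dqb_curinodes  = src->dqb_curinodes;
    dst->dqb_btime      = src->dqb_btime;
    dst->dqb_itime      = src->dqb_itime;
    dst->dqb_valid      = src->dqb_valid;
}
```

    The field offsets are identical in both layouts; only the total size
    differs, which is why copying sizeof() bytes across the user/kernel
    boundary touches four bytes the 32 bit caller never allocated.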
     
  • This patch is using mmap()'s randomization functionality in such a way that
    it maps the main executable of (specially compiled/linked -pie/-fpie)
    ET_DYN binaries onto a random address (in cases in which mmap() is allowed
    to perform a randomization).

    Origin of this patch is in exec-shield
    (http://people.redhat.com/mingo/exec-shield/)

    [jkosina@suse.cz: pie randomization: fix BAD_ADDR macro]
    Signed-off-by: Jan Kratochvil
    Signed-off-by: Jiri Kosina
    Cc: Ingo Molnar
    Cc: Roland McGrath
    Cc: Jakub Jelinek
    Signed-off-by: Jiri Kosina
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jan Kratochvil
     
  • Because SERIAL_PORT_DFNS has been removed from include/asm-i386/serial.h and
    include/asm-x86_64/serial.h, the serial8250_ports need to be probed late in
    the serial initialization stage. The call chain console_init =>
    serial8250_console_init => register_console => serial8250_console_setup
    returns -ENODEV, so console ttyS0 cannot be enabled at that time. It has to
    wait until uart_add_one_port in drivers/serial/serial_core.c calls
    register_console to get console ttyS0, and that is too late.

    Make early_uart use early_param, so that the uart console can be used
    earlier. Make it a boot console with the CON_BOOT flag, so that it can use
    the console handover feature and switch to the corresponding normal serial
    console automatically.

    new command line will be:
    console=uart8250,io,0x3f8,9600n8
    console=uart8250,mmio,0xff5e0000,115200n8
    or
    earlycon=uart8250,io,0x3f8,9600n8
    earlycon=uart8250,mmio,0xff5e0000,115200n8

    it will print in very early stage:
    Early serial console at I/O port 0x3f8 (options '9600n8')
    console [uart0] enabled
    later for console it will print:
    console handover: boot [uart0] -> real [ttyS0]

    Signed-off-by:
    Cc: Andi Kleen
    Cc: Bjorn Helgaas
    Cc: Russell King
    Cc: Gerd Hoffmann
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Yinghai Lu
     

14 Jul, 2007

4 commits

  • Robert P.J. Day has a script that finds places in the code that
    use non-existent CONFIG variables. It complained of two uses in
    ia64 specific code: CONFIG_IA64_SDV and CONFIG_KDB (both used in
    the hp/sim code).

    Signed-off-by: Tony Luck

    Tony Luck
     
  • The ".acq" semantics of the load only apply w.r.t. other data access.
    Reading the clock (ar.itc) isn't a data access so strange things can
    happen here. Specifically the read of ar.itc can be launched as soon
    as the read of xtime_lock.sequence is ISSUED. Since this may cache
    miss, and that might cause a thread switch, and there may be cache
    contention for the line containing xtime_lock, it may be a long time
    before the actual value is returned, so the ar.itc value may be very
    stale.

    Move the consumption of r28 up before the read of ar.itc to make sure
    that we really have got the current value of xtime_lock.sequence
    before we look at ar.itc.

    Signed-off-by: Hidetoshi Seto
    Signed-off-by: Tony Luck

    Hidetoshi Seto
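
    The retry protocol around xtime_lock.sequence can be sketched with C11
    atomics. This is a portable illustration under stated assumptions, not the
    ia64 assembly the commit changes: struct seq_clock, clock_write(),
    clock_read() and fake_counter() are hypothetical, and C11 ordering cannot
    express the ia64-specific step of consuming the loaded sequence value
    before sampling ar.itc.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical clock state protected by a sequence counter. */
struct seq_clock {
    atomic_uint sequence;       /* odd while a writer is mid-update */
    _Atomic uint64_t base_ns;   /* time base published by the writer */
};

/* Writer: make the sequence odd, update the data, make it even again. */
static void clock_write(struct seq_clock *c, uint64_t ns)
{
    atomic_fetch_add_explicit(&c->sequence, 1, memory_order_relaxed);
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(&c->base_ns, ns, memory_order_relaxed);
    atomic_fetch_add_explicit(&c->sequence, 1, memory_order_release);
}

/*
 * Reader: the sequence is loaded first, and only then are the data and the
 * cycle counter sampled. The loop retries if a writer was active (odd
 * sequence) or the sequence changed while sampling.
 */
static uint64_t clock_read(struct seq_clock *c, uint64_t (*read_counter)(void))
{
    unsigned int start, end;
    uint64_t base, now;

    do {
        start = atomic_load_explicit(&c->sequence, memory_order_acquire);
        base = atomic_load_explicit(&c->base_ns, memory_order_relaxed);
        now = read_counter();
        atomic_thread_fence(memory_order_acquire);
        end = atomic_load_explicit(&c->sequence, memory_order_relaxed);
    } while ((start & 1u) || start != end);

    return base + now;
}

/* Hypothetical stand-in for a hardware cycle counter. */
static uint64_t fake_counter(void)
{
    return 5;
}
```

    A stale counter sample paired with a fresh time base is exactly the
    mismatch the retry loop is meant to catch, which is why the order of the
    sequence read and the counter read matters.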
     
  • Correctly count CPU objects for SGI ia64/sn hwperf interface

    Signed-off-by: Mark Goodwin
    Signed-off-by: Jack Steiner
    Signed-off-by: Tony Luck

    Mark Goodwin
     
  • On HP zx1 machines, the 'machvec=dig' parameter is needed for the
    kdump kernel to avoid problems with the HP sba iommu. The problem
    is that during the boot of the kdump kernel, the iommu is re-initialized,
    so in-flight DMA from improperly shutdown drivers causes an IOTLB
    miss which leads to an MCA. With kdump, the idea is to get into the
    kdump kernel with as little code as we can, so shutting down drivers
    properly is not an option.

    The workaround is to add 'machvec=dig' to the kdump kernel boot
    parameters. This makes the kdump kernel avoid using the sba iommu
    altogether, leaving the IOTLB intact. Any ongoing DMA falls
    harmlessly outside the kdump kernel. After the kdump kernel reboots,
    all devices will have been shutdown properly and DMA stopped.

    This patch pushes that functionality into the sba iommu
    initialization code, so that users won't have to find the obscure
    documentation telling them about 'machvec=dig'.

    This patch only affects HP platforms. It still includes one
    extern declaration in the file, because no applicable header file
    exists.

    Signed-off-by: Terry Loftin
    Signed-off-by: Alex Williamson
    Signed-off-by: Tony Luck

    Terry Loftin
     

13 Jul, 2007

1 commit


12 Jul, 2007

4 commits

  • The PCI syscalls are built on every architecture except X86, but only
    a few have ever hooked them up. Use a new Kconfig symbol to save a
    couple of kB on the architectures that have never used the syscalls.
    Tested on x86 and ia64 only.

    Signed-off-by: Matthew Wilcox
    Signed-off-by: Greg Kroah-Hartman

    Matthew Wilcox
     
  • Linux does not gracefully deal with multiple processors going
    through OS_MCA as part of the same MCA event. The first cpu
    into OS_MCA grabs the ia64_mca_serialize lock. Subsequent
    cpus wait for that lock, preventing them from reporting in as
    rendezvoused. The first cpu waits 5 seconds then complains
    that all the cpus have not rendezvoused. The first cpu then
    handles its MCA and frees up all the rendezvoused cpus and
    releases the ia64_mca_serialize lock. One of the subsequent
    cpus going through OS_MCA then gets the ia64_mca_serialize
    lock, waits another 5 seconds and then complains that none of
    the other cpus have rendezvoused.

    This patch allows multiple CPUs to gracefully go through OS_MCA.

    The first CPU into ia64_mca_handler() grabs a mca_count lock.
    Subsequent CPUs into ia64_mca_handler() are added to a list of cpus
    that need to go through OS_MCA (a bit set in mca_cpu), and report
    in as rendezvoused, but spin waiting their turn.

    The first CPU sees everyone rendezvous, handles his MCA, wakes up
    one of the other CPUs waiting to process their MCA (by clearing
    one mca_cpu bit), and then waits for the other cpus to complete
    their MCA handling. The next CPU handles his MCA and the process
    repeats until all the CPUs have handled their MCA. When the last
    CPU has handled its MCA, it sets monarch_cpu to -1, releasing all
    the CPUs.

    In testing this works more reliably and faster.

    Thanks to Keith Owens for suggesting numerous improvements
    to this code.

    Signed-off-by: Russ Anderson
    Signed-off-by: Tony Luck

    Russ Anderson
     
  • Tell GCC to stop spewing out unnecessary warnings for unused variables
    passed to functions as pointers for ia64 files.

    Signed-off-by: Jes Sorensen
    Signed-off-by: Tony Luck

    Jes Sorensen
     
  • Example memory map (HP rx7640 with 'default' acpiconfig setting, VGA disabled):
    0x00000000 - 0x3FFFBFFF supports only WB (cacheable) access

    If a user attempts to perform an MMIO mmap (using the PCIIOC_MMAP_IS_MEM ioctl)
    to PCI config space (like mmap'ing and accessing memory at 0xA0000),
    we will MCA because the kernel will attempt to use a mapping with the UC
    attribute.

    So check the memory attribute in kern_mmap and the EFI memmap. If WC is
    requested, and WC or UC access is supported for the region, allow it.
    Otherwise, use the same attribute the kernel uses.

    Updates documentation and test cases as well.

    Signed-off-by: Alex Chiang
    Signed-off-by: Bjorn Helgaas
    Signed-off-by: Tony Luck

    Alex Chiang
     

10 Jul, 2007

4 commits

  • SDM says that brl instruction must be followed by a stop bit.
    Fix instance in BRL_COND_FSYS_BUBBLE_DOWN where it isn't.

    Signed-off-by: Christian Kandeler
    Signed-off-by: Tony Luck

    Christian Kandeler
     
  • On SN systems, when setting the IORESOURCE_ROM_BIOS_COPY resource flag,
    the resource length should be set to the actual size of the ROM image
    so that a call to pci_map_rom() returns the correct size.

    Signed-off-by: John Keller
    Signed-off-by: Andrew Morton
    Signed-off-by: Tony Luck

    John Keller
     
  • It's not a good idea to use "ssm psr.ic | psr.i" to simultaneously
    enable interrupts and interrupt state collection, the two bits can
    take effect asynchronously, so it is possible for an interrupt to
    be serviced while psr.ic is still zero.

    Signed-off-by: Tony Luck

    Tony Luck
     
  • the SMP load-balancer uses the boot-time migration-cost estimation
    code to attempt to improve the quality of balancing. The reason for
    this code is that the discrete priority queues do not preserve
    the order of scheduling accurately, so the load-balancer skips
    tasks that were running on a CPU 'recently'.

    this code is fundamentally fragile: the boot-time migration cost detector
    doesn't really work on systems that have large L3 caches, it caused boot
    delays on large systems and the whole cache-hot concept made the
    balancing code pretty nondeterministic as well.

    (and hey, i wrote most of it, so i can say it out loud that it sucks ;-)

    under CFS the same purpose of cache affinity can be achieved without
    any special cache-hot special-case: tasks are sorted in the 'timeline'
    tree and the SMP balancer picks tasks from the left side of the
    tree, thus the most cache-cold task is balanced automatically.

    Signed-off-by: Ingo Molnar

    Ingo Molnar
     

27 Jun, 2007

5 commits


25 May, 2007

2 commits

  • Section mismatch: reference to .init.text:acpi_find_rsdp
    (between 'acpi_get_sysname' and 'acpi_request_vector')

    acpi_get_sysname() needs to call the __init function acpi_find_rsdp, but it
    doesn't have the __init attribute itself, hence the warning. Luckily it is
    only called from machvec_init() which has __init attribute, so the fix
    is to define acpi_get_sysname() as __init too.

    Signed-off-by: Tony Luck

    Tony Luck
     
  • Silly bug in _PDC data setup. Haven't seen any real side-effects of this one
    yet. But, needs fixing regardless.

    Signed-off-by: Venkatesh Pallipadi
    Signed-off-by: Tony Luck

    Venki Pallipadi
     

24 May, 2007

1 commit


23 May, 2007

4 commits

  • * 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux-2.6:
    [IA64] fix kmalloc(0) in arch/ia64/pci/pci.c
    [IA64] Only unwind non-running tasks.
    [IA64] Improve unwind checking.
    [IA64] Yet another section mismatch warning
    [IA64] Fix bogus messages about system calls not implemented.

    Linus Torvalds
     
  • Hiroyuki Kamezawa reported the problem that pci_acpi_scan_root() of
    ia64 might call kmalloc_node() with zero size.

    Currently ia64's pci_acpi_scan_root() assumes that the _CRS method of a root
    bridge has at least one resource window. However, root bridges that
    have no resource window must also be taken into account.

    Signed-off-by: Kenji Kaneshige
    Signed-off-by: Andrew Morton
    Signed-off-by: Tony Luck

    Kenji Kaneshige
     
  • Unwinding a running task has proven problematic.

    In one instance, the running task was attempting to unwind itself and
    received an interrupt between the point where get_wchan allocated local
    variables on the stack and the call to unw_init_from_blocked_task. This
    resulted in unw_init_frame_info placing this task's task_struct pointer
    over the switch stack's ar_bspstore entry.

    Signed-off-by: Robin Holt
    Signed-off-by: Tony Luck

    Robin Holt
     
  • This patch adds some sanity checks to keep register and memory stack
    pointers in the unw_frame_info structure within the task's stack address
    range.

    Signed-off-by: Robin Holt
    Signed-off-by: Tony Luck

    Robin Holt
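
    A range check of the kind this commit describes can be sketched as below.
    The names struct stack_bounds, ptr_in_stack() and check_frame() are
    hypothetical, and the bounds are passed in by the caller; the real patch
    works on the unw_frame_info structure instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of a task's stack area: [low, high). */
struct stack_bounds {
    uintptr_t low;
    uintptr_t high;
};

/*
 * Returns 1 if the len bytes starting at p lie entirely inside the stack
 * area. The comparison is written so that p + len cannot overflow.
 */
static int ptr_in_stack(const struct stack_bounds *b, uintptr_t p, size_t len)
{
    return p >= b->low && p <= b->high && len <= b->high - p;
}

/*
 * Validate both stack pointers of a frame before using them: the register
 * stack pointer (bsp) and the memory stack pointer (sp).
 * Returns 0 if both are plausible and -1 otherwise.
 */
static int check_frame(const struct stack_bounds *b, uintptr_t bsp, uintptr_t sp)
{
    if (!ptr_in_stack(b, bsp, sizeof(uint64_t)))
        return -1;
    if (!ptr_in_stack(b, sp, sizeof(uint64_t)))
        return -1;
    return 0;
}
```

    Rejecting a frame as soon as either pointer leaves the stack area stops
    the unwinder before it dereferences unrelated memory.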
     

19 May, 2007

2 commits