09 Dec, 2011

1 commit

  • memblock_init() initializes the arrays for regions and for memblock
    itself; however, all of this can be done with struct initializers, so
    memblock_init() can be removed. This patch kills memblock_init() and
    initializes memblock with a struct initializer.

    The only difference is that the first dummy entries don't have .nid
    set to MAX_NUMNODES initially. This doesn't cause any behavior
    difference.
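
    A minimal sketch of the struct-initializer approach (field names
    follow mm/memblock.c of that era; the exact layout here is
    illustrative, not a verbatim copy of the patch):

        struct memblock memblock __initdata_memblock = {
                .memory.regions   = memblock_memory_init_regions,
                .memory.cnt       = 1,  /* empty dummy entry */
                .memory.max       = INIT_MEMBLOCK_REGIONS,

                .reserved.regions = memblock_reserved_init_regions,
                .reserved.cnt     = 1,  /* empty dummy entry */
                .reserved.max     = INIT_MEMBLOCK_REGIONS,

                .current_limit    = MEMBLOCK_ALLOC_ANYWHERE,
        };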

    Signed-off-by: Tejun Heo
    Cc: Benjamin Herrenschmidt
    Cc: Yinghai Lu
    Cc: Russell King
    Cc: Michal Simek
    Cc: Paul Mundt
    Cc: "David S. Miller"
    Cc: Guan Xuetao
    Cc: "H. Peter Anvin"

    Tejun Heo
     

15 Jul, 2011

1 commit

  • Other than sanity checks and debug messages, the x86-specific versions
    of the memblock reserve/free functions are simple wrappers around the
    generic memblock_reserve()/memblock_free().

    This patch adds debug messages with caller identification to the
    generic versions, then replaces and removes the x86-specific ones.
    arch/x86/include/asm/memblock.h and arch/x86/mm/memblock.c are empty
    after this change and are removed.

    Signed-off-by: Tejun Heo
    Link: http://lkml.kernel.org/r/1310462166-31469-14-git-send-email-tj@kernel.org
    Cc: Yinghai Lu
    Cc: Benjamin Herrenschmidt
    Cc: Thomas Gleixner
    Cc: Ingo Molnar
    Cc: "H. Peter Anvin"
    Signed-off-by: H. Peter Anvin

    Tejun Heo
     

20 Mar, 2011

1 commit

  • cleanup_highmap() currently runs in two steps: one early in head64.c
    that only clears above _end, and a second in init_memory_mapping()
    that tries to clean from _brk_end to _end.
    It should check whether those boundaries are PMD_SIZE aligned, but
    currently does not.
    Also, init_memory_mapping() is called several times for NUMA and
    memory hotplug, so we really should not handle initial kernel
    mappings there.

    This patch moves cleanup_highmap() down until after _brk_end is
    settled, so everything can be done in one step.
    The implementation also honors max_pfn_mapped.
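
    Roughly what the consolidated function looks like after the move (a
    sketch based on arch/x86/mm/init_64.c of that era; the exact bounds
    handling is illustrative):

        void __init cleanup_highmap(void)
        {
                unsigned long vaddr = __START_KERNEL_map;
                unsigned long vaddr_end = __START_KERNEL_map +
                                          (max_pfn_mapped << PAGE_SHIFT);
                unsigned long end = roundup((unsigned long)_brk_end,
                                            PMD_SIZE) - 1;
                pmd_t *pmd = level2_kernel_pgt;

                /* clear kernel-image PMDs outside [_text, _brk_end) */
                for (; vaddr + PMD_SIZE - 1 < vaddr_end;
                     pmd++, vaddr += PMD_SIZE) {
                        if (pmd_none(*pmd))
                                continue;
                        if (vaddr < (unsigned long)_text || vaddr > end)
                                set_pmd(pmd, __pmd(0));
                }
        }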

    Signed-off-by: Yinghai Lu
    Signed-off-by: Stefano Stabellini
    LKML-Reference:
    Signed-off-by: H. Peter Anvin

    Yinghai Lu
     

14 Oct, 2010

1 commit

  • head_64.S maps up to 512 MiB, but that is not necessarily true for
    other entry paths, such as Xen.

    Thus, co-locate the setting of max_pfn_mapped with the code to
    actually set up the page tables in head_64.S. The 32-bit code is
    already so co-located. (The Xen code already sets max_pfn_mapped
    correctly for its own use case.)

    -v2:

    Yinghai fixed the following bug in this patch:

    |
    | max_pfn_mapped is in .bss section, so we need to set that
    | after bss get cleared. Without that we crash on bootup.
    |
    | That is safe because Xen does not call x86_64_start_kernel().
    |
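
    In code terms, the ordering constraint looks like this (a sketch of
    the relevant lines in x86_64_start_kernel(); using KERNEL_IMAGE_SIZE
    for head_64.S's 512 MiB mapping is an assumption from the commit
    text):

        clear_bss();             /* must come first ... */
        zap_identity_mappings();

        /* ... because max_pfn_mapped lives in .bss and would
         * otherwise be wiped again by clear_bss() */
        max_pfn_mapped = KERNEL_IMAGE_SIZE >> PAGE_SHIFT;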

    Signed-off-by: Jeremy Fitzhardinge
    Fixed-by: Yinghai Lu
    Signed-off-by: H. Peter Anvin
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Jeremy Fitzhardinge
     

28 Aug, 2010

2 commits

  • 1. Include linux/memblock.h directly, so that later patches can
    reduce e820.h references.
    2. This patch was done mainly with sed scripts.

    -v2: use MEMBLOCK_ERROR instead of -1ULL or -1UL

    Signed-off-by: Yinghai Lu
    Signed-off-by: H. Peter Anvin

    Yinghai Lu
     
  • 1. Replace find_e820_area() with memblock_find_in_range().
    2. Replace reserve_early() with memblock_x86_reserve_range().
    3. Replace free_early() with memblock_x86_free_range().
    4. NO_BOOTMEM will switch to use memblock too.
    5. Keep the _e820/_early wrappers in this patch; a following patch
    will replace them all.
    6. Because memblock_x86_free_range() supports partial free, we can
    remove some special-casing.
    7. Make sure memblock_find_in_range() is only called after
    memblock_x86_fill(), so adjust some call sites later in
    setup.c::setup_arch()
    -- corruption_check and mptable_update
    (A sketch of the substitution pattern follows the version notes
    below.)

    -v2: Move reserve_brk() early, before fill_memblock_area(), to avoid
    an overlap between brk and memblock_find_in_range(). That could
    happen when we have more than 128 RAM entries in the E820 table,
    since memblock_x86_fill() may use memblock_find_in_range() to find a
    new place for the memblock.memory.region array. We then don't need
    to use extend_brk() after fill_memblock_area(), so move
    reserve_brk() before it.
    -v3: Move find_smp_config() early, to make sure
    memblock_find_in_range() does not find a wrong place if the BIOS
    doesn't put the mptable in the right place.
    -v4: Treat RESERVED_KERN as RAM in memblock.memory; those ranges are
    already in memblock.reserved anyway.
    Use __NOT_KEEP_MEMBLOCK to make sure memblock-related code can be
    freed later.
    -v5: The generic __memblock_find_in_range() goes from high to low,
    and on 32-bit the active_region does include high pages, so replace
    the limit with memblock.default_alloc_limit, aka get_max_mapped().
    -v6: Use current_limit instead.
    -v7: Check against MEMBLOCK_ERROR instead of -1ULL or -1L.
    -v8: Set memblock_can_resize early to handle EFI with more RAM
    entries.
    -v9: Update after the kmemleak changes in mainline.
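
    A hedged before/after sketch of the core substitution (all function
    names are from this series; the "MP-table" usage is illustrative):

        /* before: x86's private early-reservation API */
        addr = find_e820_area(start, end, size, align);
        reserve_early(addr, addr + size, "MP-table");
        free_early(addr, addr + size);

        /* after: generic memblock plus the x86 wrappers */
        addr = memblock_find_in_range(start, end, size, align);
        memblock_x86_reserve_range(addr, addr + size, "MP-table");
        memblock_x86_free_range(addr, addr + size);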

    Suggested-by: David S. Miller
    Suggested-by: Benjamin Herrenschmidt
    Suggested-by: Thomas Gleixner
    Signed-off-by: Yinghai Lu
    Signed-off-by: H. Peter Anvin

    Yinghai Lu
     

30 Mar, 2010

1 commit

  • When CONFIG_NO_BOOTMEM=y, memory can be used more efficiently, in a
    more compact fashion.

    Example:

    Allocated new RAMDISK: 00ec2000 - 0248ce57
    Move RAMDISK from 000000002ea04000 - 000000002ffcee56 to 00ec2000 - 0248ce56

    The new RAMDISK's end is not page aligned, so its last page could be
    shared with other users.

    When free_init_pages() is called for the initrd or .init section,
    that page could be freed and we could corrupt other data.

    code segment in free_init_pages():

    | for (; addr < end; addr += PAGE_SIZE) {
    |         ClearPageReserved(virt_to_page(addr));
    |         init_page_count(virt_to_page(addr));
    |         memset((void *)(addr & ~(PAGE_SIZE-1)),
    |                POISON_FREE_INITMEM, PAGE_SIZE);
    |         free_page(addr);
    |         totalram_pages++;
    | }

    The last half page could then be handed out as one whole free page.

    So page-align the boundaries.

    -v2: Make the original ramdisk aligned as well, according to
    Johannes; otherwise we have a chance of losing one page.
    We still need to keep initrd_end unaligned, otherwise it could
    confuse the decompressor.
    -v3: Change to WARN_ON instead, suggested by Johannes.
    -v4: Use PAGE_ALIGN, suggested by Johannes.
    We may later rename that macro to PAGE_ALIGN_UP and add
    PAGE_ALIGN_DOWN.
    Add comments about assuming the ramdisk start is aligned
    in relocate_initrd(); re-read ramdisk_image instead of saving it,
    to keep the diff smaller. Add a warning for a wrong range, suggested
    by Johannes.
    -v6: Remove one WARN().
    We need to align the beginning in free_init_pages().
    Do not copy more than ramdisk_size, noticed by Johannes.
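
    A minimal sketch of the boundary fix in free_init_pages(), assuming
    the standard PAGE_ALIGN/PAGE_MASK macros from <linux/mm.h>:

        /* round the start up and the end down, so a page that may be
         * shared with another user is never freed */
        begin = PAGE_ALIGN(begin);
        end &= PAGE_MASK;

        if (begin >= end)
                return;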

    Reported-by: Stanislaw Gruszka
    Tested-by: Stanislaw Gruszka
    Signed-off-by: Yinghai Lu
    Acked-by: Johannes Weiner
    Cc: David Miller
    Cc: Benjamin Herrenschmidt
    Cc: Linus Torvalds
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Yinghai Lu
     

11 Dec, 2009

1 commit

  • Jens found the following crash/regression:

    [ 0.000000] found SMP MP-table at [ffff8800000fdd80] fdd80
    [ 0.000000] Kernel panic - not syncing: Overlapping early reservations 12-f011 MP-table mpc to 0-fff BIOS data page

    and

    [ 0.000000] Kernel panic - not syncing: Overlapping early reservations 12-f011 MP-table mpc to 6000-7fff TRAMPOLINE

    and bisected it to b24c2a9 ("x86: Move find_smp_config()
    earlier and avoid bootmem usage").

    It turns out the BIOS is using the first 64k for the mptable,
    without reserving it.

    So try to find a good range for the real-mode trampoline instead of
    hardcoding it, in case some BIOS tries to use that range for
    something else.
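
    A hedged sketch of the approach (reserve_trampoline_memory() and
    TRAMPOLINE_SIZE follow the x86 code of that period; the exact bounds
    are illustrative):

        void __init reserve_trampoline_memory(void)
        {
                unsigned long mem;

                /* must be in very low memory, to run real-mode AP code */
                mem = find_e820_area(0, 1 << 20, TRAMPOLINE_SIZE, PAGE_SIZE);
                if (mem == -1L)
                        panic("Cannot allocate trampoline\n");

                trampoline_base = __va(mem);
                reserve_early(mem, mem + TRAMPOLINE_SIZE, "TRAMPOLINE");
        }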

    Reported-by: Jens Axboe
    Signed-off-by: Yinghai Lu
    Tested-by: Jens Axboe
    Cc: Randy Dunlap
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Yinghai Lu
     

31 Aug, 2009

1 commit

  • Platforms like Moorestown require early setup and want to avoid the
    call to reserve_ebda_region. The x86_init override is too late when
    the MRST detection happens in setup_arch. Move the default i386
    x86_init overrides and the call to reserve_ebda_region into a separate
    function which is called as the default of a switch case depending on
    the hardware_subarch id in boot params. This allows us to add a case
    for MRST and let MRST have its own early setup function.
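
    A sketch of the dispatch, assuming the names this series introduces
    (X86_SUBARCH_MRST and x86_mrst_early_setup() for the MRST case; the
    default funnels into the former i386 setup path):

        static void __init i386_default_early_setup(void)
        {
                /* default i386 x86_init overrides go here */
                reserve_ebda_region();
        }

        void __init i386_start_kernel(void)
        {
                /* ... early reservations ... */

                switch (boot_params.hdr.hardware_subarch) {
                case X86_SUBARCH_MRST:
                        x86_mrst_early_setup();
                        break;
                default:
                        i386_default_early_setup();
                        break;
                }
                /* ... */
        }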

    Signed-off-by: Thomas Gleixner

    Thomas Gleixner
     

15 Mar, 2009

1 commit

  • Impact: new interface

    Add a brk()-like allocator which effectively extends the bss in order
    to allow very early code to do dynamic allocations. This is better than
    using statically allocated arrays for data in subsystems which may never
    get used.

    The space for brk allocations is in the bss ELF segment, so that the
    space is mapped properly by the code which maps the kernel, and so
    that bootloaders keep the space free rather than putting a ramdisk or
    something into it.

    The bss itself, delimited by __bss_stop, ends before the brk area
    (__brk_base to __brk_limit). The kernel text, data and bss are
    reserved up to __bss_stop.

    Any brk-allocated data is reserved separately just before the kernel
    pagetable is built, as that code allocates from unreserved spaces
    in the e820 map, potentially allocating from any unused brk memory.
    Ultimately any unused memory in the brk area is used in the general
    kernel memory pool.

    Initially the brk space is set to 1MB, which is probably much larger
    than any user needs (the largest current user is i386 head_32.S's code
    to build the pagetables to map the kernel, which can get fairly large
    with a big kernel image and no PSE support). So long as the system
    has sufficient memory for the bootloader to reserve the kernel+1MB brk,
    there are no bad effects resulting from an over-large brk.
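
    The allocator itself is small; this is essentially the shape of
    extend_brk() as described (a sketch: __brk_limit is the linker
    symbol named above, while _brk_start/_brk_end are assumed
    bookkeeping variables):

        void * __init extend_brk(size_t size, size_t align)
        {
                size_t mask = align - 1;
                void *ret;

                BUG_ON(_brk_start == 0);  /* brk already finalized? */
                BUG_ON(align & mask);     /* align must be a power of 2 */

                _brk_end = (_brk_end + mask) & ~mask;
                BUG_ON((char *)(_brk_end + size) > __brk_limit);

                ret = (void *)_brk_end;
                _brk_end += size;

                memset(ret, 0, size);
                return ret;
        }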

    Signed-off-by: Jeremy Fitzhardinge
    Signed-off-by: H. Peter Anvin

    Jeremy Fitzhardinge
     

20 Jan, 2009

1 commit

  • Impact: cleanup

    Copy the code to cpu_init() to satisfy the requirement that the cpu
    be reinitialized. Remove all other calls, since the segments are
    already initialized in head_64.S.

    Signed-off-by: Brian Gerst
    Signed-off-by: Tejun Heo

    Brian Gerst
     

16 Jan, 2009

7 commits

  • Do the following cleanups:

    * kill x86_64_init_pda() which now is equivalent to pda_init()

    * use per_cpu_offset() instead of cpu_pda() when initializing
    initial_gs

    Signed-off-by: Tejun Heo
    Signed-off-by: Ingo Molnar

    Tejun Heo
     
  • [ Based on original patch from Christoph Lameter and Mike Travis. ]

    As pda is now allocated in the percpu area, it can easily be made a
    proper percpu variable. Make it so by defining the per-cpu symbol
    from the linker script and declaring it in C code for SMP, and by
    simply defining it for UP. This change cleans up code and brings SMP
    and UP a bit closer.

    Signed-off-by: Tejun Heo
    Signed-off-by: Ingo Molnar

    Tejun Heo
     
  • Now that pda is allocated as part of the percpu area, percpu doesn't
    need to be accessed through pda. Unify x86_64 SMP percpu access with
    the x86_32 SMP one. Other than the segment register, operand size
    and the base of the percpu symbols, they behave identically now.

    This patch replaces the now-unnecessary pda->data_offset with a
    dummy field, which is needed to keep stack_canary in its place. It
    also moves per_cpu_offset initialization out of init_gdt() into
    setup_per_cpu_areas(). Note that this change also necessitates
    explicit per_cpu_offset initializations in voyager_smp.c.

    With this change, x86_OP_percpu()'s are as efficient on x86_64 as on
    x86_32, and x86_64 can also use the assembly PER_CPU macros.

    Signed-off-by: Tejun Heo
    Signed-off-by: Ingo Molnar

    Tejun Heo
     
  • [ Based on original patch from Christoph Lameter and Mike Travis. ]

    Currently pdas and percpu areas are allocated separately. %gs points
    to local pda and percpu area can be reached using pda->data_offset.
    This patch folds pda into percpu area.

    Due to a strange gcc requirement, pda needs to be at the beginning
    of the percpu area so that pda->stack_canary is at %gs:40. To
    achieve this, a new percpu output section macro -
    PERCPU_VADDR_PREALLOC() - is added and used to reserve a pda-sized
    chunk at the start of the percpu area.

    After this change, for the boot cpu, %gs first points to the pda in
    the data.init area and later, during setup_per_cpu_areas(), gets
    updated to point to the actual pda. This means that
    setup_per_cpu_areas() needs to reload %gs for CPU0 while clearing
    the pda area only for the other cpus, as CPU0 has already modified
    its pda by the time control reaches setup_per_cpu_areas().

    This patch also removes now unnecessary get_local_pda() and its call
    sites.

    A lot of this patch is taken from Mike Travis' "x86_64: Fold pda into
    per cpu area" patch.

    Signed-off-by: Tejun Heo
    Signed-off-by: Ingo Molnar

    Tejun Heo
     
  • _cpu_pda array first uses statically allocated storage in data.init
    and then switches to allocated bootmem to conserve space. However,
    after folding pda area into percpu area, _cpu_pda array will be
    removed completely. Drop the reallocation part to simplify the code
    for soon-to-follow changes.

    Signed-off-by: Tejun Heo
    Signed-off-by: Ingo Molnar

    Tejun Heo
     
  • [ Based on original patch from Christoph Lameter and Mike Travis. ]

    CPU startup code in head_64.S loads the address of a zero page into
    %gs for temporary use until the pda is loaded, but the address of
    the actual pda is already available at that point. Load the real
    address directly instead.

    This will help unify percpu and pda handling later on.

    This patch is mostly taken from Mike Travis' "x86_64: Fold pda into
    per cpu area" patch.

    Signed-off-by: Tejun Heo

    Tejun Heo
     
  • [ Based on original patch from Christoph Lameter and Mike Travis. ]

    This patch makes percpu symbols zero-based on x86_64 SMP by adding
    PERCPU_VADDR() to vmlinux.lds.h, which helps set an explicit vaddr
    on the percpu output section, and using it in vmlinux_64.lds.S. A
    new PHDR is added, as existing ones cannot contain sections near
    address zero. PERCPU_VADDR() also adds a new symbol, __per_cpu_load,
    which always points to the vaddr of the loaded percpu data.init
    region.

    The following adjustments have been made to accommodate the address
    change.

    * code to locate the percpu gdt_page in head_64.S is updated to add
    the load address to the gdt_page offset.

    * __per_cpu_load is used in places where access to the init data
    area is necessary.

    * pda->data_offset is initialized soon after C code is entered, as a
    zero value doesn't work anymore.

    This patch is mostly taken from Mike Travis' "x86_64: Base percpu
    variables at zero" patch.

    Signed-off-by: Tejun Heo
    Signed-off-by: Ingo Molnar

    Tejun Heo
     

03 Jan, 2009

1 commit

  • The pda rework (commit 3461b0af025251bbc6b3d56c821c6ac2de6f7209)
    to remove static boot cpu pdas introduced a performance bug.

    _boot_cpu_pda is the actual pda used by the boot cpu; it is
    definitely not "__read_mostly" and ended up polluting the
    read-mostly section with writes. This bug caused a regression of
    about 8-10% on certain syscall-intensive workloads.
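
    The offending declaration, in sketch form (the exact declaration in
    the tree may differ; the point is that a write-heavy structure must
    not live in .data.read_mostly):

        /* before: writes from the boot cpu dirty .data.read_mostly */
        static struct x8664_pda _boot_cpu_pda __read_mostly;

        /* after: drop the annotation so it lands in ordinary data */
        static struct x8664_pda _boot_cpu_pda;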

    Signed-off-by: Ravikiran Thirumalai
    Acked-by: Mike Travis
    Cc:
    Signed-off-by: Ingo Molnar

    Ravikiran G Thirumalai
     

08 Dec, 2008

1 commit

  • Impact: fix trampoline sizing bug, save space

    While debugging a suspend-to-RAM related issue it occurred to me
    that if the trampoline code had grown past 4 KB, we would have been
    allocating too little memory for it, since the 4 KB size of the
    trampoline is hardcoded into arch/x86/kernel/e820.c. Change that
    by making the kernel compute the trampoline size and allocate as
    much memory as necessary.
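
    A hedged sketch of computing the size from the linker-provided
    trampoline bounds instead of assuming 4 KB (trampoline_data and
    trampoline_end are the symbols from the trampoline assembly; the
    rounding is illustrative):

        extern const unsigned char trampoline_data[];
        extern const unsigned char trampoline_end[];

        /* reserve whole pages covering the actual trampoline code */
        size_t size = PAGE_ALIGN(trampoline_end - trampoline_data);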

    Signed-off-by: Rafael J. Wysocki
    Signed-off-by: Ingo Molnar

    Rafael J. Wysocki
     

16 Jul, 2008

1 commit

  • As a stopgap until Mike Travis's x86-64 gs-based percpu patches are
    ready, provide workaround functions for x86_read/write_percpu for
    Xen's use.

    Specifically, this means that we can't really make use of vcpu
    placement, because we can't use a single gs-based memory access to get
    to vcpu fields. So disable all that for now.

    Signed-off-by: Jeremy Fitzhardinge
    Cc: Stephen Tweedie
    Cc: Eduardo Habkost
    Cc: Mark McLoughlin
    Signed-off-by: Ingo Molnar

    Jeremy Fitzhardinge
     

08 Jul, 2008

5 commits

  • Ying Huang would like setup_data to be reserved, but not included in
    the nosave range.

    Here we try to modify the e820 table to reserve that range early.
    Also add it to early_res, in case the bootloader messes things up
    with the ramdisk.

    Other solutions would be:
    1. add early_res_to_highmem...
    2. early_res_to_e820...
    but those could wrongly reserve memory of another type, if early_res
    holds a resource that was reserved early and is no longer needed but
    has not been removed from early_res in time -- like the RAMDISK
    (already handled).
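
    A hedged sketch of the early e820 fixup (the walk and the
    E820_RESERVED_KERN type follow this series; error handling is
    trimmed):

        u64 pa_data = boot_params.hdr.setup_data;

        while (pa_data) {
                struct setup_data *data;

                data = early_ioremap(pa_data, sizeof(*data));
                /* retype the node's range so it is reserved but still
                 * treated as kernel-usable RAM */
                e820_update_range(pa_data, sizeof(*data) + data->len,
                                  E820_RAM, E820_RESERVED_KERN);
                pa_data = data->next;
                early_iounmap(data, sizeof(*data));
        }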

    Signed-off-by: Yinghai Lu
    Cc: andi@firstfloor.org
    Tested-by: Huang, Ying
    Signed-off-by: Ingo Molnar

    Yinghai Lu
     
  • Split x86_64_start_kernel() into two pieces:

    The first essentially cleans up after head_64.S. It clears the
    bss, zaps low identity mappings, sets up some early exception
    handlers.

    The second part preserves the boot data, reserves the kernel's
    text/data/bss, pagetables and ramdisk, and then starts the kernel
    proper.

    This split is so that Xen can call the second part to do the setup
    it needs done. It doesn't need any of the first part's setup,
    because it doesn't boot via head_64.S, and that work would be
    redundant or actively damaging.
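
    In outline (x86_64_start_reservations() is the entry point this
    split creates for Xen; the bodies are abbreviated):

        void __init x86_64_start_kernel(char *real_mode_data)
        {
                /* clean up after head_64.S */
                clear_bss();
                zap_identity_mappings();
                /* ... install early exception handlers ... */

                x86_64_start_reservations(real_mode_data);
        }

        /* Xen enters here, skipping the head_64.S-specific cleanup */
        void __init x86_64_start_reservations(char *real_mode_data)
        {
                copy_bootdata(__va(real_mode_data));
                reserve_early(__pa_symbol(&_text), __pa_symbol(&_end),
                              "TEXT DATA BSS");
                /* ... reserve the ramdisk ... */
                start_kernel();
        }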

    Signed-off-by: Jeremy Fitzhardinge
    Cc: xen-devel
    Cc: Stephen Tweedie
    Cc: Eduardo Habkost
    Cc: Mark McLoughlin
    Signed-off-by: Ingo Molnar

    Jeremy Fitzhardinge
     
  • Conflicts:

    arch/x86/Kconfig
    arch/x86/kernel/e820.c
    arch/x86/kernel/efi_64.c
    arch/x86/kernel/mpparse.c
    arch/x86/kernel/setup.c
    arch/x86/kernel/setup_32.c
    arch/x86/mm/init_64.c
    include/asm-x86/proto.h

    Signed-off-by: Ingo Molnar

    Ingo Molnar
     
  • Ingo Molnar wrote:
    ...
    > they crashed after about 3 randconfig iterations with:
    >
    > early res: 4 [8000-afff] PGTABLE
    > early res: 5 [b000-b87f] MEMNODEMAP
    > PANIC: early exception 0e rip 10:ffffffff8077a150 error 2 cr2 37
    > Pid: 0, comm: swapper Not tainted 2.6.25-sched-devel.git-x86-latest.git #14
    >
    > Call Trace:
    > [] early_idt_handler+0x56/0x6a
    > [] ? numa_set_node+0x30/0x60
    > [] ? numa_set_node+0x9/0x60
    > [] numa_init_array+0x93/0xf0
    > [] acpi_scan_nodes+0x3b9/0x3f0
    > [] numa_initmem_init+0x136/0x150
    > [] setup_arch+0x48f/0x700
    > [] ? clockevents_register_notifier+0x3a/0x50
    > [] start_kernel+0xd7/0x440
    > [] x86_64_start_kernel+0x222/0x280
    ...
    Here's the fixup... This one should follow the previous patches.

    Thanks,
    Mike
    Signed-off-by: Ingo Molnar
    Signed-off-by: Thomas Gleixner

    Mike Travis
     
  • * Remove the boot_cpu_pda array and pointer table from the data section.
    Allocate the pointer table and array during init. do_boot_cpu()
    will reallocate the pda in node local memory and if the cpu is being
    brought up before the bootmem array is released (after_bootmem = 0),
    then it will free the initial pda. This will happen for all cpus
    present at system startup.

    This removes 512k + 32k bytes from the data section.

    For inclusion into sched-devel/latest tree.

    Based on:
    git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git
    + sched-devel/latest .../mingo/linux-2.6-sched-devel.git

    Signed-off-by: Mike Travis
    Signed-off-by: Ingo Molnar
    Signed-off-by: Thomas Gleixner

    Mike Travis
     

27 Apr, 2008

1 commit

  • This patch adds a field to the real-mode kernel header: a 64-bit
    physical pointer to a NULL-terminated, singly linked list of struct
    setup_data. This is used as a more extensible boot parameters
    passing mechanism.
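
    The list node itself, with the layout the description implies:

        struct setup_data {
                __u64 next;   /* 64-bit physical pointer to the next
                                 node; 0 terminates the list */
                __u32 type;
                __u32 len;
                __u8 data[0]; /* variable-length payload */
        };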

    Signed-off-by: Huang Ying
    Signed-off-by: Ingo Molnar
    Signed-off-by: Thomas Gleixner

    Huang, Ying
     

20 Apr, 2008

1 commit

  • The ramdisk is reserved via reserve_early() in x86_64_start_kernel();
    later, early_res_to_bootmem() converts that into a bootmem
    reservation.

    So there is no need to reserve it again.

    Signed-off-by: Yinghai Lu
    Signed-off-by: Ingo Molnar
    Signed-off-by: Thomas Gleixner

    Yinghai Lu
     

17 Apr, 2008

5 commits

  • All of early setup runs with interrupts disabled, so there is no
    need to set up early exception handlers for vectors >= 32

    This saves some minor text size.
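
    A sketch of the change in x86_64_start_kernel(), assuming the loop
    bound moves from all 256 IDT entries to the 32 exception vectors:

        /* before: every IDT entry got an early handler */
        for (i = 0; i < IDT_ENTRIES; i++)
                set_intr_gate(i, early_idt_handler);

        /* after: interrupts stay off during early setup, so only
         * exception vectors (< 32) can fire */
        for (i = 0; i < NUM_EXCEPTION_VECTORS; i++)
                set_intr_gate(i, early_idt_handler);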

    Signed-off-by: Andi Kleen
    Cc: mingo@elte.hu
    Signed-off-by: Thomas Gleixner
    Signed-off-by: Ingo Molnar

    Andi Kleen
     
  • Jeremy Fitzhardinge pointed out that looking at the boot_params
    struct to determine if the system is running in a paravirtual
    environment is not reliable for the Xen case, currently. He also
    points out that there already exists a function to determine if
    the system is running in a paravirtual environment. So let's use
    that instead. This gets rid of the preprocessor test too.

    Signed-off-by: Alexander van Heukelum
    Acked-by: H. Peter Anvin
    Signed-off-by: Ingo Molnar

    Alexander van Heukelum
     
  • This patch is an add-on to the 64-bit ebda patch. It makes
    the functions reserve_ebda_region (renamed from reserve_ebda)
    and copy_e820_map equal to the 32-bit versions of the previous
    patch.

    Changes:

    Use u64 and u32 for local variables in copy_e820_map.

    The amount of conventional memory and the start of the EBDA are
    detected by reading the BIOS data area directly. Paravirtual
    environments do not provide this area, so we bail out early
    in that case. They will just have to set up a correct memory
    map to start with.

    Add a safety net for zeroed out BIOS data area.

    Signed-off-by: Alexander van Heukelum
    Signed-off-by: Ingo Molnar

    Alexander van Heukelum
     
  • Explicitly reserve_early the whole address range from the end of
    conventional memory, as reported by the BIOS data area, up to the
    1 MB mark. Regard the info retrieved from the BIOS data area with
    a bit of paranoia, though, because some BIOSes forget to register
    the EBDA correctly.
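
    A condensed sketch combining the EBDA patches above
    (BIOS_LOWMEM_KILOBYTES is the BDA word at 0x413, get_bios_ebda()
    reads the EBDA segment pointer, and paravirt_enabled() stands in for
    the existing detection function mentioned earlier; the fixup
    thresholds are illustrative):

        static void __init reserve_ebda_region(void)
        {
                unsigned int lowmem, ebda_addr;

                /* paravirtual environments provide no BIOS data area */
                if (paravirt_enabled())
                        return;

                /* end of low (conventional) memory, stored in KB */
                lowmem = *(unsigned short *)__va(BIOS_LOWMEM_KILOBYTES);
                lowmem <<= 10;

                ebda_addr = get_bios_ebda();

                /* safety net for a zeroed-out BIOS data area */
                if (lowmem == 0 || lowmem >= 0x100000)
                        lowmem = 0x9f000;

                /* don't trust an EBDA below the reported low memory */
                if (ebda_addr && ebda_addr < lowmem)
                        lowmem = ebda_addr;

                /* reserve everything from there up to the 1 MB mark */
                reserve_early(lowmem, 0x100000, "BIOS reserved");
        }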

    Signed-off-by: Alexander van Heukelum
    Signed-off-by: Ingo Molnar

    Alexander van Heukelum
     
  • These build-time and link-time checks would have prevented the
    vmlinux size regression.

    Signed-off-by: Ingo Molnar

    Ingo Molnar