12 Jan, 2007

1 commit

  • Fix an oops experienced on the Cell architecture when init-time functions,
    early_*(), are called at runtime. It alters the call paths to make sure
    that the callers explicitly say whether the call is being made on behalf of
    a hotplug event or is happening at boot time.

    It has been compile tested on ppc64, ia64, s390, i386 and x86_64.

    Acked-by: Arnd Bergmann
    Signed-off-by: Dave Hansen
    Cc: Yasunori Goto
    Acked-by: Andy Whitcroft
    Cc: Christoph Lameter
    Cc: Martin Schwidefsky
    Acked-by: Heiko Carstens
    Cc: Benjamin Herrenschmidt
    Cc: Paul Mackerras
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dave Hansen
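A minimal sketch of the idea, assuming a two-valued context enum that is passed down the call path. The names `memmap_context`, `MEMMAP_EARLY`, `MEMMAP_HOTPLUG`, `early_pfn_valid_stub()` and `init_one_page()` are illustrative here, not quoted from the patch. The stub stands in for the init-time early_*() helpers, whose data no longer exists after boot.

```c
#include <assert.h>
#include <stdbool.h>

/* Who is asking: boot-time initialization or a runtime hotplug event. */
enum memmap_context { MEMMAP_EARLY, MEMMAP_HOTPLUG };

/* Hypothetical stand-in for __init data that is discarded after boot. */
static bool init_data_present = true;
static int early_calls_after_boot;

/* Hypothetical early_*()-style helper: only safe while init data exists. */
static bool early_pfn_valid_stub(unsigned long pfn)
{
	if (!init_data_present)
		early_calls_after_boot++;	/* the real kernel would oops here */
	return pfn != 0;	/* pretend pfn 0 is a hole in the boot-time map */
}

/* Initialize one page; consult early helpers only on the boot path. */
static bool init_one_page(unsigned long pfn, enum memmap_context context)
{
	if (context == MEMMAP_EARLY && !early_pfn_valid_stub(pfn))
		return false;	/* skip pfns the boot-time map marks invalid */
	return true;
}
```

A hotplug caller passes MEMMAP_HOTPLUG and never reaches the early helper, which is the property the fix relies on.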
     

08 Dec, 2006

1 commit

  • The zone table is mostly not needed. If we have a node in the page flags
    then we can get to the zone via NODE_DATA() which is much more likely to be
    already in the cpu cache.

    In the SMP and UP cases, NODE_DATA() is a constant pointer, which allows
    us to access an exact replica of the zone table in the node_zones field.
    In all of the above cases there will be no need at all for the zone table.

    The only remaining case is if in a NUMA system the node numbers do not fit
    into the page flags. In that case we make sparse generate a table that
    maps sections to nodes and use that table to figure out the node number.
    This table is sized to fit in a single cache line for the known 32-bit
    NUMA platform which makes it very likely that the information can be
    obtained without a cache miss.

    For sparsemem the zone table seems to have been fairly large, based on
    the maximum possible number of sections and the number of zones per node.
    Removing zone_table saves some memory, but the main benefit is a reduced
    cache footprint for the VM's frequent zone lookups. It also simplifies
    the page allocator.

    [akpm@osdl.org: build fix]
    Signed-off-by: Christoph Lameter
    Cc: Dave Hansen
    Cc: Andy Whitcroft
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
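A simplified sketch of the lookup that replaces the zone table, assuming the node id and zone index both fit in page->flags. The helper names mirror kernel ones (page_to_nid(), page_zonenum(), page_zone(), NODE_DATA()), but the bit layout, sizes and structures below are illustrative stand-ins, not the kernel's definitions. The sparsemem section-to-node table case is not modeled.

```c
#include <assert.h>
#include <limits.h>

/* Illustrative sizes; the real values are configuration dependent. */
#define MAX_NR_ZONES	3
#define MAX_NUMNODES	4
#define ZONES_SHIFT	2
#define NODES_SHIFT	2

/* Pack node and zone ids into the top bits of page->flags. */
#define BITS_PER_LONG_SKETCH	(sizeof(unsigned long) * CHAR_BIT)
#define ZONES_PGOFF	(BITS_PER_LONG_SKETCH - ZONES_SHIFT)
#define NODES_PGOFF	(ZONES_PGOFF - NODES_SHIFT)
#define ZONES_MASK	((1UL << ZONES_SHIFT) - 1)
#define NODES_MASK	((1UL << NODES_SHIFT) - 1)

struct zone { int dummy; };
struct pglist_data { struct zone node_zones[MAX_NR_ZONES]; };
struct page { unsigned long flags; };

static struct pglist_data node_data[MAX_NUMNODES];
#define NODE_DATA(nid)	(&node_data[(nid)])

static void set_page_links(struct page *page, unsigned long nid,
			   unsigned long zone)
{
	assert(nid < MAX_NUMNODES && zone < MAX_NR_ZONES);
	page->flags &= ~((NODES_MASK << NODES_PGOFF) | (ZONES_MASK << ZONES_PGOFF));
	page->flags |= (nid << NODES_PGOFF) | (zone << ZONES_PGOFF);
}

static unsigned long page_to_nid(const struct page *page)
{
	return (page->flags >> NODES_PGOFF) & NODES_MASK;
}

static unsigned long page_zonenum(const struct page *page)
{
	return (page->flags >> ZONES_PGOFF) & ZONES_MASK;
}

/* No global zone_table: go through the (likely cached) node descriptor. */
static struct zone *page_zone(const struct page *page)
{
	return &NODE_DATA(page_to_nid(page))->node_zones[page_zonenum(page)];
}
```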
     

01 Oct, 2006

3 commits


30 Sep, 2006

2 commits

  • ratelimit_pages in page-writeback.c is recalculated (in set_ratelimit())
    every time a CPU is hot-added/removed. But this value is not recalculated
    when new pages are hot-added.

    This patch fixes that problem by calling set_ratelimit() when new pages
    are hot-added.

    [akpm@osdl.org: cleanups]
    Signed-off-by: Chandra Seetharaman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Chandra Seetharaman
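A sketch of the shape of the fix: one recalculation routine, called from both the CPU and the memory hotplug paths. The clamping constants are assumptions modeled on page-writeback.c of that period, the page size is assumed to be 4 KiB, and the hook names `cpu_hotplug_event()` and `memory_hot_add()` are hypothetical.

```c
#include <assert.h>

#define PAGE_SIZE_SKETCH 4096UL	/* assumption: 4 KiB pages */

static unsigned long total_pages;	/* pages managed by the VM */
static unsigned long online_cpus = 1;
static unsigned long ratelimit_pages;

/* Recompute the writeback ratelimit from the current totals. */
static void set_ratelimit(void)
{
	ratelimit_pages = total_pages / (online_cpus * 32);
	if (ratelimit_pages < 16)
		ratelimit_pages = 16;
	if (ratelimit_pages * PAGE_SIZE_SKETCH > 4096 * 1024)
		ratelimit_pages = (4096 * 1024) / PAGE_SIZE_SKETCH;
}

/* Hypothetical hotplug hooks: both paths must refresh the ratelimit. */
static void cpu_hotplug_event(unsigned long new_online_cpus)
{
	online_cpus = new_online_cpus;
	set_ratelimit();
}

static void memory_hot_add(unsigned long nr_pages)
{
	total_pages += nr_pages;
	set_ratelimit();	/* the call this patch adds on memory hot-add */
}
```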
     
  • Change the list of memory nodes allowed to tasks in the top (root) cpuset
    to dynamically track which memory nodes are online, using a call to a
    cpuset hook from the memory hotplug code. Make the top cpuset's 'mems'
    file read-only.

    On systems that have cpusets configured in their kernel, but that aren't
    actively using cpusets (for some distros, this covers the majority of
    systems), all tasks end up in the top cpuset.

    If that system does support memory hotplug, then these tasks cannot make
    use of memory nodes that are added after system boot, because the memory
    nodes are not allowed in the top cpuset. This is a surprising regression
    over earlier kernels that didn't have cpusets enabled.

    One key motivation for this change is to remain consistent with the
    behaviour for the top_cpuset's 'cpus', which is also read-only, and which
    automatically tracks the cpu_online_map.

    This change also has the minor benefit that it fixes a long-standing,
    little-noticed, minor bug in cpusets. The cpuset performance tweak to
    short-circuit the cpuset_zone_allowed() check on systems with just a single
    cpuset (see 'number_of_cpusets', in linux/cpuset.h) meant that simply
    changing the 'mems' of the top_cpuset had no effect, even though the change
    (the write system call) appeared to succeed. With this change, that write
    to the 'mems' file fails with -EACCES, and the 'mems' file stubbornly
    refuses to be changed via user space writes. Thus no one should be misled
    into thinking they've changed the top_cpuset's 'mems' when in effect they
    haven't.

    In order to keep the behaviour of cpusets consistent between systems
    actively making use of them and systems not using them, this patch changes
    the behaviour of the 'mems' file in the top (root) cpuset, making it
    read-only and making it automatically track the value of node_online_map.
    Thus tasks in the top cpuset will automatically be allowed to use
    hot-plugged memory nodes.

    [akpm@osdl.org: build fix]
    [bunk@stusta.de: build fix]
    Signed-off-by: Paul Jackson
    Signed-off-by: Adrian Bunk
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Paul Jackson
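A toy model of the new behaviour, assuming node masks fit in an unsigned long. All names here are hypothetical. The model only shows the two rules described above: the top cpuset's 'mems' follows the online node map on hot-add, and writes to it fail with -EACCES. It omits the subset checks a real child cpuset write would perform.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Illustrative cpuset: one bit per memory node in 'mems'. */
struct cpuset_sketch {
	bool is_top;
	unsigned long mems;
};

static unsigned long node_online_map_sketch = 0x1;	/* node 0 online */
static struct cpuset_sketch top_cpuset_sketch = { true, 0x1 };

/* Hypothetical hook called by memory hotplug after a node comes online. */
static void track_online_nodes_sketch(void)
{
	top_cpuset_sketch.mems = node_online_map_sketch;
}

static void node_hot_add_sketch(int nid)
{
	node_online_map_sketch |= 1UL << nid;
	track_online_nodes_sketch();
}

/* A user space write to a cpuset's 'mems' file. */
static int write_mems_sketch(struct cpuset_sketch *cs, unsigned long mems)
{
	if (cs->is_top)
		return -EACCES;	/* the top cpuset's 'mems' is read-only */
	cs->mems = mems;
	return 0;
}
```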
     

06 Aug, 2006

3 commits

  • This patch enhances the collision check for memory hot-add.

    It's better to do the resource collision check before doing the memory
    hot-add, which touches memory management structures.

    add_section() should also check whether the section already exists before
    calling sparse_add_one_section(). (sparse_add_one_section() does its own
    check anyway, but checking in memory_hotplug.c is easier to understand.)

    Signed-off-by: KAMEZAWA Hiroyuki
    Cc: keith mannthey
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KAMEZAWA Hiroyuki
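A sketch of the ordering this patch argues for, assuming a simple array of registered ranges in place of the kernel's iomem resource tree. The function and variable names are hypothetical. The point is that the collision check runs, and can fail with -EEXIST, before any memory management structure is modified.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define MAX_RES 8

/* Illustrative resource with an inclusive end, like struct resource. */
struct res { unsigned long start, end; };

static struct res registered[MAX_RES];
static int nr_registered;
static int mm_updates;	/* counts (hypothetical) memmap/zone updates */

static bool res_overlap(struct res a, struct res b)
{
	return a.start <= b.end && b.start <= a.end;
}

/* Register [start, start + size - 1]; refuse any collision. */
static int register_memory_resource_sketch(unsigned long start,
					   unsigned long size)
{
	struct res r = { start, start + size - 1 };

	assert(size > 0);
	for (int i = 0; i < nr_registered; i++)
		if (res_overlap(r, registered[i]))
			return -EEXIST;
	assert(nr_registered < MAX_RES);
	registered[nr_registered++] = r;
	return 0;
}

static int add_memory_sketch(unsigned long start, unsigned long size)
{
	int ret = register_memory_resource_sketch(start, size);

	if (ret)
		return ret;	/* bail out before touching mm structures */
	mm_updates++;	/* only now would memmap and zones be modified */
	return 0;
}
```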
     
  • find_next_system_ram() is used to find available memory resources when
    onlining newly added memory. This patch fixes the following problem:
    find_next_system_ram() cannot catch the case where a resource and a
    section cover exactly the same range:

    Resource: (start)-------------(end)
    Section : (start)-------------(end)

    Signed-off-by: KAMEZAWA Hiroyuki
    Cc: Keith Mannthey
    Cc: Yasunori Goto
    Cc: Dave Hansen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KAMEZAWA Hiroyuki
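A sketch of an intersection search that does catch the identical-range case, assuming RAM resources are kept in an array sorted by start, with inclusive ends. `find_next_ram_sketch()` is hypothetical and is not the kernel's find_next_system_ram() code.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative RAM resource with an inclusive end, sorted by start. */
struct res { unsigned long start, end; };

/*
 * Find the first RAM resource intersecting [*start, *end] and clip the
 * range to it. Inclusive comparisons mean a resource covering exactly
 * the same range as the section is found too.
 */
static int find_next_ram_sketch(const struct res *ram, size_t n,
				unsigned long *start, unsigned long *end)
{
	for (size_t i = 0; i < n; i++) {
		if (ram[i].start > *end || ram[i].end < *start)
			continue;	/* disjoint */
		if (ram[i].start > *start)
			*start = ram[i].start;
		if (ram[i].end < *end)
			*end = ram[i].end;
		return 0;
	}
	return -1;	/* no RAM in the range */
}
```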
     
  • The ioresource handling code in memory hotplug allows unaligned memory
    hot-add. But when memmap and other memory structures are initialized, the
    parameters should be aligned (if they are not, mem_map initialization goes
    wrong, because it assumes aligned parameters). This patch fixes that.

    This patch also allows the ioresource collision check to handle -EEXIST.

    Signed-off-by: KAMEZAWA Hiroyuki
    Cc: Keith Mannthey
    Cc: Yasunori Goto
    Cc: Dave Hansen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KAMEZAWA Hiroyuki
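One way to satisfy the alignment requirement, sketched under assumptions: the ioresource may be unaligned, but the range handed to mem_map initialization is rounded outward to whole sections. The section size and `align_to_sections()` are hypothetical, and the rounding the patch actually uses may differ.

```c
#include <assert.h>

/* Illustrative section size in pages; the real value is per-arch. */
#define PAGES_PER_SECTION_SKETCH 0x8000UL

/* Round a pfn range outward so it starts and ends on section boundaries. */
static void align_to_sections(unsigned long *start_pfn,
			      unsigned long *nr_pages)
{
	const unsigned long mask = PAGES_PER_SECTION_SKETCH - 1;
	unsigned long end_pfn = *start_pfn + *nr_pages;

	*start_pfn &= ~mask;
	end_pfn = (end_pfn + mask) & ~mask;
	*nr_pages = end_pfn - *start_pfn;
}
```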
     

01 Jul, 2006

1 commit


28 Jun, 2006

5 commits

  • When a new node is enabled by hot-add, a new sysfs file must be created
    for it. So, if a new node is enabled by add_memory(), register_one_node()
    is called to create it. In addition, i386's arch_register_node() and part
    of powerpc's register_nodes() are consolidated into register_one_node() as
    generic code.

    This was tested on Tiger4 (IPF) with node hot-plug emulation.

    Signed-off-by: Keiichiro Tokunaga
    Signed-off-by: Yasunori Goto
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Yasunori Goto
     
  • This patch allows hot-adding memory that is not aligned to a section.

    Currently, hot-added memory has to be aligned to the section size. On
    architectures with large sections, this is not useful.

    When hot-added memory is registered as an iomem resource by the iomem
    resource patch, we can make use of that information to detect the valid
    memory range.

    Note: With this, unaligned memory can be registered. To allow hot-adding
    memory with holes, more work is needed around add_memory().
    (It doesn't allow adding memory to an already existing mem section.)

    Signed-off-by: KAMEZAWA Hiroyuki
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KAMEZAWA Hiroyuki
     
  • Register hot-added memory in iomem_resource. With this, /proc/iomem can
    show hot-added memory.

    Note: kdump uses /proc/iomem to determine memory ranges when it is
    installed, so kdump should be re-installed after /proc/iomem changes.

    Signed-off-by: KAMEZAWA Hiroyuki
    Cc: Vivek Goyal
    Cc: Greg KH
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KAMEZAWA Hiroyuki
     
  • Add node-hot-add support to add_memory().

    Node hot-add uses this sequence:
    1. Allocate the pgdat.
    2. Refresh NODE_DATA().
    3. Call free_area_init_node() to initialize it.
    4. Create the sysfs entry.
    5. Add memory (the old add_memory()).
    6. Set the node online.
    7. Run kswapd for the new node.
    (8). Update the zonelist after pages are onlined. (This is already merged
    in -mm, because the update happens in a different phase.)

    Note:
    To share as much common code as possible, there are 2 changes from v2.
    - The old add_memory(), which is defined by each arch, is renamed to
    arch_add_memory(). The new add_memory() becomes common code that calls
    the arch-dependent function.

    - This patch changes add_memory()'s interface
    From: add_memory(start, end)
    To:   add_memory(nid, start, end).
    The old interface meant that each arch's add_memory() contained similar
    code to find the node id from the physical address.

    In addition, the ACPI memory hotplug driver can find the node id more
    easily. In v2, because its input was just a physical address, it had to
    walk the DSDT's _CRS, matching the physical address to get the handle of
    its memory device, and then get _PXM and the node id. In v3, the ACPI
    driver can use the handle to get _PXM and the node id for the new memory
    device, and pass just the node id to add_memory().

    The fix to the interface of arch_add_memory() is in the next patch.

    Signed-off-by: Yasunori Goto
    Signed-off-by: KAMEZAWA Hiroyuki
    Cc: Dave Hansen
    Cc: "Brown, Len"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Yasunori Goto
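A sketch of the node hot-add sequence as control flow, under stated assumptions: each step is reduced to a stub that records its name, and `add_memory_sketch()`, `arch_add_memory_stub()` and `struct pgdat_sketch` are hypothetical. Step 8 (the zonelist rebuild) is left out because it happens later, after pages are onlined.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

#define MAX_NODES_SKETCH 4
#define MAX_STEPS 16

struct pgdat_sketch { int nid; int online; };

static struct pgdat_sketch *node_data_sketch[MAX_NODES_SKETCH];
static const char *steps[MAX_STEPS];
static int nsteps;

static void step(const char *name)
{
	assert(nsteps < MAX_STEPS);
	steps[nsteps++] = name;
}

/* Hypothetical stand-in for the per-arch part (the old add_memory()). */
static int arch_add_memory_stub(int nid, unsigned long start,
				unsigned long size)
{
	(void)nid; (void)start; (void)size;
	step("arch_add_memory");
	return 0;
}

/* Common add_memory(): creates the node first if it does not exist yet. */
static int add_memory_sketch(int nid, unsigned long start, unsigned long size)
{
	int new_node = 0;
	int ret;

	assert(nid >= 0 && nid < MAX_NODES_SKETCH);
	if (!node_data_sketch[nid]) {
		struct pgdat_sketch *pgdat = calloc(1, sizeof(*pgdat)); /* 1 */

		if (!pgdat)
			return -ENOMEM;
		pgdat->nid = nid;
		node_data_sketch[nid] = pgdat;		/* 2: refresh NODE_DATA() */
		step("alloc pgdat");
		step("free_area_init_node");		/* 3 */
		step("create sysfs entry");		/* 4 */
		new_node = 1;
	}
	ret = arch_add_memory_stub(nid, start, size);	/* 5 */
	if (ret) {
		if (new_node) {				/* undo the new node */
			free(node_data_sketch[nid]);
			node_data_sketch[nid] = NULL;
		}
		return ret;
	}
	if (new_node) {
		node_data_sketch[nid]->online = 1;	/* 6 */
		step("set node online");
		step("run kswapd");			/* 7 */
	}
	return 0;
}
```

Adding memory to a node that already exists skips straight to the arch-specific step, which is why the node id is now part of the interface.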
     
  • Change the name of the old add_memory() to arch_add_memory(), and use the
    node id to get the node's pgdat via NODE_DATA().

    Note: powerpc's old add_memory() is defined as __devinit. However,
    add_memory() is usually called only after bootup, so I suspect this is
    redundant. But I'm not familiar enough with powerpc, so I have kept it.
    (__meminit would be better, at least.)

    Signed-off-by: Yasunori Goto
    Cc: Dave Hansen
    Cc: "Brown, Len"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Yasunori Goto
     

23 Jun, 2006

3 commits

  • Signed-off-by: KAMEZAWA Hiroyuki
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KAMEZAWA Hiroyuki
     
  • In the current code, the zonelist is assumed to be built once and never
    modified. But memory hotplug can add a new zone/pgdat, so it must be
    updated.

    This patch modifies build_all_zonelists() so that it can reconfigure the
    pgdats' zonelists.

    To update them safely, this patch uses stop_machine_run(), so that other
    CPUs don't touch the zonelists while they are being updated.

    In the old version (v2 of node hot-add), the kernel updated them after
    zone initialization. But present_pages of the new zone is still 0 at that
    point, because online_page() has not been called yet, and
    build_zonelists() checks present_pages to find populated zones. That was
    too early, so I moved the update to after online_pages().

    Signed-off-by: Yasunori Goto
    Signed-off-by: KAMEZAWA Hiroyuki
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Yasunori Goto
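A single-node sketch of why the ordering matters, assuming the zonelist simply collects zones with a nonzero present_pages, highest zone first. The structures and function names are hypothetical, and locking is reduced to a comment; the patch itself uses stop_machine_run().

```c
#include <assert.h>
#include <stddef.h>

#define NR_ZONES_SKETCH 4

struct zone_sketch { unsigned long present_pages; };

static struct zone_sketch zones[NR_ZONES_SKETCH];
static struct zone_sketch *zonelist[NR_ZONES_SKETCH + 1]; /* NULL-terminated */

/* Rebuild the fallback list, highest zone first, skipping empty zones. */
static size_t build_zonelist_sketch(void)
{
	size_t n = 0;

	for (int i = NR_ZONES_SKETCH - 1; i >= 0; i--)
		if (zones[i].present_pages)
			zonelist[n++] = &zones[i];
	zonelist[n] = NULL;
	return n;
}

/* Online pages first, then rebuild, so the new zone is no longer empty. */
static void online_pages_sketch(int zone_idx, unsigned long nr_pages)
{
	zones[zone_idx].present_pages += nr_pages;
	build_zonelist_sketch();	/* the kernel does this under stop_machine_run() */
}
```

Rebuilding right after zone initialization, while present_pages is still 0, would leave the new zone out of the list.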
     
  • …for init_current_empty_zone

    When add_zone() is called on an empty (unpopulated) zone, we have to
    initialize the zone, which was not initialized at boot time. But
    init_currently_empty_zone() may fail when allocating the wait table, so
    this patch catches its error code.

    Changes to the wait_table are in the next patch.

    Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
    Signed-off-by: Yasunori Goto <y-goto@jp.fujitsu.com>
    Signed-off-by: Andrew Morton <akpm@osdl.org>
    Signed-off-by: Linus Torvalds <torvalds@osdl.org>

    Yasunori Goto
     

01 Jun, 2006

1 commit

  • From: Yasunori Goto

    If the hot-added memory's address is lower than the old area, spanned_pages
    will not be updated. This must be fixed.

    Example: old zone_start_pfn = 0x60000 and spanned_pages = 0x10000;
    the added memory's start_pfn = 0x50000 and end_pfn = 0x60000.

    With the old code, the new spanned_pages is still 0x10000 (it should be
    updated to 0x20000), because old_zone_end_pfn is 0x70000 and end_pfn is
    smaller than it, so spanned_pages is not updated.

    In the current code, spanned_pages is updated only when end_pfn is updated.
    Instead, it should be computed as the difference between the larger end_pfn
    and the new zone_start_pfn.

    Signed-off-by: Yasunori Goto
    Signed-off-by: Dave Hansen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Yasunori Goto
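A sketch of the corrected span update, assuming a zone is described only by zone_start_pfn and spanned_pages. `struct zone_span` and `grow_zone_span_sketch()` are hypothetical names; the arithmetic follows the rule stated above, and the test reuses the example's numbers.

```c
#include <assert.h>

struct zone_span { unsigned long zone_start_pfn, spanned_pages; };

/* Grow the zone to cover [start_pfn, end_pfn), on either side. */
static void grow_zone_span_sketch(struct zone_span *z,
				  unsigned long start_pfn,
				  unsigned long end_pfn)
{
	unsigned long old_end_pfn = z->zone_start_pfn + z->spanned_pages;

	if (z->spanned_pages == 0) {	/* empty zone: adopt the new range */
		z->zone_start_pfn = start_pfn;
		z->spanned_pages = end_pfn - start_pfn;
		return;
	}
	if (start_pfn < z->zone_start_pfn)
		z->zone_start_pfn = start_pfn;
	/* span = larger end minus the (possibly new) start */
	z->spanned_pages = (end_pfn > old_end_pfn ? end_pfn : old_end_pfn)
			   - z->zone_start_pfn;
}
```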
     

02 May, 2006

1 commit

  • Based on an older patch from Mike Kravetz

    We need to have a mem_map for high addresses in order to make fops->no_page
    work on spufs mem and register files. So far, we have used the
    memory_present() function during early bootup, but that did not work when
    CONFIG_NUMA was enabled.

    We now use the __add_pages() function to add the mem_map when loading the
    spufs module, which is a lot nicer.

    Signed-off-by: Arnd Bergmann
    Cc: Paul Mackerras
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Joel H Schopp
     

10 Mar, 2006

1 commit

  • When pages are onlined, not only zone->present_pages but also
    pgdat->node_present_pages should be refreshed.

    This parameter is used to show information in
    /sys/devices/system/node/nodeX/meminfo via si_meminfo_node().

    Without the refresh, MemUsed, which is calculated as
    (node_present_pages - free pages of all zones), shows a strange value.

    Signed-off-by: Yasunori Goto
    Cc: Dave Hansen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Yasunori Goto
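A sketch of the bookkeeping, assuming a node with a fixed number of zones and a per-zone free-page counter. The names are hypothetical; `mem_used_pages()` computes MemUsed as described above.

```c
#include <assert.h>

#define ZONES_PER_NODE 3

struct zone_c { unsigned long present_pages, free_pages; };
struct node_c {
	unsigned long node_present_pages;
	struct zone_c zones[ZONES_PER_NODE];
};

/* Onlining must refresh the zone counters and the node total together. */
static void online_pages_sketch(struct node_c *node, int zone_idx,
				unsigned long nr_pages)
{
	struct zone_c *z = &node->zones[zone_idx];

	z->present_pages += nr_pages;
	z->free_pages += nr_pages;		/* newly onlined pages start free */
	node->node_present_pages += nr_pages;	/* the refresh this patch adds */
}

/* MemUsed = node_present_pages - free pages of all zones. */
static unsigned long mem_used_pages(const struct node_c *node)
{
	unsigned long free = 0;

	for (int i = 0; i < ZONES_PER_NODE; i++)
		free += node->zones[i].free_pages;
	return node->node_present_pages - free;
}
```

If node_present_pages were left stale, MemUsed would shrink as pages were onlined, and the unsigned subtraction could even wrap around once the free pages exceeded the old node total.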
     

07 Jan, 2006

1 commit


14 Dec, 2005

1 commit


30 Oct, 2005

3 commits

  • From: IWAMOTO Toshihiro
    > I found the tests does not work well with Dave's patchset.
    > I've found the followings:
    >
    > - setup_per_zone_pages_min() calls should be added in
    > capture_page_range() and online_pages()
    > - lru_add_drain() should be called before try_to_migrate_pages()

    The following patch deals with the first item.

    Signed-off-by: IWAMOTO Toshihiro
    Signed-off-by: Dave Hansen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dave Hansen
     
  • This basically keeps us from having to extern __kmalloc_section_memmap().

    The vaddr_in_vmalloc_area() helper could go in a vmalloc header, but that
    header gets hard to work with, because it needs some arch-specific macros.
    Just stick it in here for now, instead of creating another header.

    Signed-off-by: Dave Hansen
    Signed-off-by: Lion Vollnhals
    Signed-off-by: Jiri Slaby
    Signed-off-by: Yasunori Goto
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dave Hansen
     
  • This adds generic memory add/remove and supporting functions for memory
    hotplug into a new file as well as a memory hotplug kernel config option.

    Individual architecture patches will follow.

    For now, disable memory hotplug when swsusp is enabled. There's a lot of
    churn there right now. We'll fix it up properly once it calms down.

    Signed-off-by: Matt Tolentino
    Signed-off-by: Dave Hansen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dave Hansen