17 Oct, 2007

3 commits

  • We need the check for a node with cpu in zone reclaim. Zone reclaim will not
    allow remote zone reclaim if a node has a cpu.

    [Lee.Schermerhorn@hp.com: Move setup of N_CPU node state mask]
    Signed-off-by: Christoph Lameter
    Tested-by: Lee Schermerhorn
    Acked-by: Bob Picco
    Cc: Nishanth Aravamudan
    Cc: KAMEZAWA Hiroyuki
    Cc: Mel Gorman
    Signed-off-by: Lee Schermerhorn
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     
  • It is necessary to know if nodes have memory since we have recently begun to
    add support for memoryless nodes. For that purpose we introduce a two new
    node states: N_HIGH_MEMORY and N_NORMAL_MEMORY.

    A node has its bit in N_HIGH_MEMORY set if it has any memory regardless of the
    type of mmemory. If a node has memory then it has at least one zone defined
    in its pgdat structure that is located in the pgdat itself.

    A node has its bit in N_NORMAL_MEMORY set if it has a lower zone than
    ZONE_HIGHMEM. This means it is possible to allocate memory that is not
    subject to kmap.

    N_HIGH_MEMORY and N_NORMAL_MEMORY can then be used in various places to insure
    that we do the right thing when we encounter a memoryless node.

    [akpm@linux-foundation.org: build fix]
    [Lee.Schermerhorn@hp.com: update N_HIGH_MEMORY node state for memory hotadd]
    [y-goto@jp.fujitsu.com: Fix memory hotplug + sparsemem build]
    Signed-off-by: Lee Schermerhorn
    Signed-off-by: Nishanth Aravamudan
    Signed-off-by: Christoph Lameter
    Acked-by: Bob Picco
    Cc: KAMEZAWA Hiroyuki
    Cc: Mel Gorman
    Signed-off-by: Yasunori Goto
    Signed-off-by: Paul Mundt
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     
  • Why do we need to support memoryless nodes?

    KAMEZAWA Hiroyuki wrote:

    > For fujitsu, problem is called "empty" node.
    >
    > When ACPI's SRAT table includes "possible nodes", ia64 bootstrap(acpi_numa_init)
    > creates nodes, which includes no memory, no cpu.
    >
    > I tried to remove empty-node in past, but that was denied.
    > It was because we can hot-add cpu to the empty node.
    > (node-hotplug triggered by cpu is not implemented now. and it will be ugly.)
    >
    >
    > For HP, (Lee can comment on this later), they have memory-less-node.
    > As far as I hear, HP's machine can have following configration.
    >
    > (example)
    > Node0: CPU0 memory AAA MB
    > Node1: CPU1 memory AAA MB
    > Node2: CPU2 memory AAA MB
    > Node3: CPU3 memory AAA MB
    > Node4: Memory XXX GB
    >
    > AAA is very small value (below 16MB) and will be omitted by ia64 bootstrap.
    > After boot, only Node 4 has valid memory (but have no cpu.)
    >
    > Maybe this is memory-interleave by firmware config.

    Christoph Lameter wrote:

    > Future SGI platforms (actually also current one can have but nothing like
    > that is deployed to my knowledge) have nodes with only cpus. Current SGI
    > platforms have nodes with just I/O that we so far cannot manage in the
    > core. So the arch code maps them to the nearest memory node.

    Lee Schermerhorn wrote:

    > For the HP platforms, we can configure each cell with from 0% to 100%
    > "cell local memory". When we configure with improve bandwidth at the expense of latency for numa-challenged
    > applications [and OSes, but not our problem ;-)]. When we boot Linux on
    > such a config, all of the real nodes have no memory--it all resides in a
    > single interleaved pseudo-node.
    >
    > When we boot Linux on a 100% CLM configuration [== NUMA], we still have
    > the interleaved pseudo-node. It contains a few hundred MB stolen from
    > the real nodes to contain the DMA zone. [Interleaved memory resides at
    > phys addr 0]. The memoryless-nodes patches, along with the zoneorder
    > patches, support this config as well.
    >
    > Also, when we boot a NUMA config with the "mem=" command line,
    > specifying less memory than actually exists, Linux takes the excluded
    > memory "off the top" rather than distributing it across the nodes. This
    > can result in memoryless nodes, as well.
    >

    This patch:

    Preparation for memoryless node patches.

    Provide a generic way to keep nodemasks describing various characteristics of
    NUMA nodes.

    Remove the node_online_map and the node_possible map and realize the same
    functionality using two nodes stats: N_POSSIBLE and N_ONLINE.

    [Lee.Schermerhorn@hp.com: Initialize N_*_MEMORY and N_CPU masks for non-NUMA config]
    Signed-off-by: Christoph Lameter
    Tested-by: Lee Schermerhorn
    Acked-by: Lee Schermerhorn
    Acked-by: Bob Picco
    Cc: Nishanth Aravamudan
    Cc: KAMEZAWA Hiroyuki
    Cc: Mel Gorman
    Signed-off-by: Lee Schermerhorn
    Cc: "Serge E. Hallyn"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     

21 Feb, 2007

1 commit

  • highest_possible_node_id() is currently used to calculate the last possible
    node idso that the network subsystem can figure out how to size per node
    arrays.

    I think having the ability to determine the maximum amount of nodes in a
    system at runtime is useful but then we should name this entry
    correspondingly, it should return the number of node_ids, and the the value
    needs to be setup only once on bootup. The node_possible_map does not
    change after bootup.

    This patch introduces nr_node_ids and replaces the use of
    highest_possible_node_id(). nr_node_ids is calculated on bootup when the
    page allocators pagesets are initialized.

    [deweerdt@free.fr: fix oops]
    Signed-off-by: Christoph Lameter
    Cc: Neil Brown
    Cc: Trond Myklebust
    Signed-off-by: Frederik Deweerdt
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     

12 Oct, 2006

1 commit

  • lib/bitmap.c:bitmap_parse() is a library function that received as input a
    user buffer. This seemed to have originated from the way the write_proc
    function of the /proc filesystem operates.

    This has been reworked to not use kmalloc and eliminates a lot of
    get_user() overhead by performing one access_ok before using __get_user().

    We need to test if we are in kernel or user space (is_user) and access the
    buffer differently. We cannot use __get_user() to access kernel addresses
    in all cases, for example in architectures with separate address space for
    kernel and user.

    This function will be useful for other uses as well; for example, taking
    input for /sysfs instead of /proc, so it was changed to accept kernel
    buffers. We have this use for the Linux UWB project, as part as the
    upcoming bandwidth allocator code.

    Only a few routines used this function and they were changed too.

    Signed-off-by: Reinette Chatre
    Signed-off-by: Inaky Perez-Gonzalez
    Cc: Paul Jackson
    Cc: Joe Korty
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Reinette Chatre
     

02 Oct, 2006

1 commit


28 Mar, 2006

1 commit

  • This patch defines for_each_online_pgdat() as a replacement of
    for_each_pgdat()

    Now, online nodes are managed by node_online_map. But for_each_pgdat()
    uses pgdat_link to iterate over all nodes(pgdat). This means management
    structure for online pgdat is duplicated.

    I think using node_online_map for for_each_pgdat() is simple and sane
    rather ather than pgdat_link. New macro is named as
    for_each_online_pgdat(). Following patch will fix callers of
    for_each_pgdat().

    The bootmem allocater uses for_each_pgdat() before pgdat initialization. I
    don't think it's sane. Following patch will fix it.

    Signed-off-by: Yasunori Goto
    Signed-off-by: KAMEZAWA Hiroyuki
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KAMEZAWA Hiroyuki
     

08 Feb, 2006

1 commit


31 Oct, 2005

1 commit

  • In the forthcoming task migration support, a key calculation will be
    mapping cpu and node numbers from the old set to the new set while
    preserving cpuset-relative offset.

    For example, if a task and its pages on nodes 8-11 are being migrated to
    nodes 24-27, then pages on node 9 (the 2nd node in the old set) should be
    moved to node 25 (the 2nd node in the new set.)

    As with other bitmap operations, the proper way to code this is to provide
    the underlying calculation in lib/bitmap.c, and then to provide the usual
    cpumask and nodemask wrappers.

    This patch provides that. These operations are termed 'remap' operations.
    Both remapping a single bit and a set of bits is supported.

    Signed-off-by: Paul Jackson
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Paul Jackson
     

17 Apr, 2005

1 commit

  • Initial git repository build. I'm not bothering with the full history,
    even though we have it. We can create a separate "historical" git
    archive of that later if we want to, and in the meantime it's about
    3.2GB when imported into git - space that would just make the early
    git days unnecessarily complicated, when we don't have a lot of good
    infrastructure for it.

    Let it rip!

    Linus Torvalds