17 Oct, 2007
40 commits
-
The checks for node_online in the uncached allocator are made to make sure
that memory is available on these nodes. Thus switch all the checks to use
N_HIGH_MEMORY and to N_ONLINE.Signed-off-by: Christoph Lameter
Signed-off-by: Jes Sorensen
Acked-by: Lee Schermerhorn
Acked-by: Bob Picco
Cc: Nishanth Aravamudan
Cc: KAMEZAWA Hiroyuki
Cc: Mel Gorman
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Simply switch all for_each_online_node to for_each_node_state(NORMAL_MEMORY).
That way SLUB only operates on nodes with regular memory. Any allocation
attempt on a memoryless node or a node with just highmem will fall whereupon
SLUB will fetch memory from a nearby node (depending on how memory policies
and cpuset describe fallback).Signed-off-by: Christoph Lameter
Tested-by: Lee Schermerhorn
Acked-by: Bob Picco
Cc: Nishanth Aravamudan
Cc: KAMEZAWA Hiroyuki
Cc: Mel Gorman
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Slab should not allocate control structures for nodes without memory. This
may seem to work right now but its unreliable since not all allocations can
fall back due to the use of GFP_THISNODE.Switching a few for_each_online_node's to N_NORMAL_MEMORY will allow us to
only allocate for nodes that have regular memory.Signed-off-by: Christoph Lameter
Acked-by: Nishanth Aravamudan
Acked-by: Lee Schermerhorn
Acked-by: Bob Picco
Cc: KAMEZAWA Hiroyuki
Cc: Mel Gorman
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
A node without memory does not need a kswapd. So use the memory map instead
of the online map when starting kswapd.Signed-off-by: Christoph Lameter
Acked-by: Nishanth Aravamudan
Tested-by: Lee Schermerhorn
Acked-by: Bob Picco
Cc: KAMEZAWA Hiroyuki
Cc: Mel Gorman
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
constrained_alloc() builds its own memory map for nodes with memory. We have
that available in N_HIGH_MEMORY now. So simplify the code.Signed-off-by: Christoph Lameter
Acked-by: Nishanth Aravamudan
Acked-by: Lee Schermerhorn
Acked-by: Bob Picco
Cc: KAMEZAWA Hiroyuki
Cc: Mel Gorman
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
MPOL_INTERLEAVE currently simply loops over all nodes. Allocations on
memoryless nodes will be redirected to nodes with memory. This results in an
imbalance because the neighboring nodes to memoryless nodes will get
significantly more interleave hits that the rest of the nodes on the system.We can avoid this imbalance by clearing the nodes in the interleave node set
that have no memory. If we use the node map of the memory nodes instead of
the online nodes then we have only the nodes we want.Signed-off-by: Christoph Lameter
Signed-off-by: Nishanth Aravamudan
Tested-by: Lee Schermerhorn
Acked-by: Bob Picco
Cc: KAMEZAWA Hiroyuki
Cc: Mel Gorman
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
It is necessary to know if nodes have memory since we have recently begun to
add support for memoryless nodes. For that purpose we introduce a two new
node states: N_HIGH_MEMORY and N_NORMAL_MEMORY.A node has its bit in N_HIGH_MEMORY set if it has any memory regardless of the
type of mmemory. If a node has memory then it has at least one zone defined
in its pgdat structure that is located in the pgdat itself.A node has its bit in N_NORMAL_MEMORY set if it has a lower zone than
ZONE_HIGHMEM. This means it is possible to allocate memory that is not
subject to kmap.N_HIGH_MEMORY and N_NORMAL_MEMORY can then be used in various places to insure
that we do the right thing when we encounter a memoryless node.[akpm@linux-foundation.org: build fix]
[Lee.Schermerhorn@hp.com: update N_HIGH_MEMORY node state for memory hotadd]
[y-goto@jp.fujitsu.com: Fix memory hotplug + sparsemem build]
Signed-off-by: Lee Schermerhorn
Signed-off-by: Nishanth Aravamudan
Signed-off-by: Christoph Lameter
Acked-by: Bob Picco
Cc: KAMEZAWA Hiroyuki
Cc: Mel Gorman
Signed-off-by: Yasunori Goto
Signed-off-by: Paul Mundt
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Why do we need to support memoryless nodes?
KAMEZAWA Hiroyuki wrote:
> For fujitsu, problem is called "empty" node.
>
> When ACPI's SRAT table includes "possible nodes", ia64 bootstrap(acpi_numa_init)
> creates nodes, which includes no memory, no cpu.
>
> I tried to remove empty-node in past, but that was denied.
> It was because we can hot-add cpu to the empty node.
> (node-hotplug triggered by cpu is not implemented now. and it will be ugly.)
>
>
> For HP, (Lee can comment on this later), they have memory-less-node.
> As far as I hear, HP's machine can have following configration.
>
> (example)
> Node0: CPU0 memory AAA MB
> Node1: CPU1 memory AAA MB
> Node2: CPU2 memory AAA MB
> Node3: CPU3 memory AAA MB
> Node4: Memory XXX GB
>
> AAA is very small value (below 16MB) and will be omitted by ia64 bootstrap.
> After boot, only Node 4 has valid memory (but have no cpu.)
>
> Maybe this is memory-interleave by firmware config.Christoph Lameter wrote:
> Future SGI platforms (actually also current one can have but nothing like
> that is deployed to my knowledge) have nodes with only cpus. Current SGI
> platforms have nodes with just I/O that we so far cannot manage in the
> core. So the arch code maps them to the nearest memory node.Lee Schermerhorn wrote:
> For the HP platforms, we can configure each cell with from 0% to 100%
> "cell local memory". When we configure with improve bandwidth at the expense of latency for numa-challenged
> applications [and OSes, but not our problem ;-)]. When we boot Linux on
> such a config, all of the real nodes have no memory--it all resides in a
> single interleaved pseudo-node.
>
> When we boot Linux on a 100% CLM configuration [== NUMA], we still have
> the interleaved pseudo-node. It contains a few hundred MB stolen from
> the real nodes to contain the DMA zone. [Interleaved memory resides at
> phys addr 0]. The memoryless-nodes patches, along with the zoneorder
> patches, support this config as well.
>
> Also, when we boot a NUMA config with the "mem=" command line,
> specifying less memory than actually exists, Linux takes the excluded
> memory "off the top" rather than distributing it across the nodes. This
> can result in memoryless nodes, as well.
>This patch:
Preparation for memoryless node patches.
Provide a generic way to keep nodemasks describing various characteristics of
NUMA nodes.Remove the node_online_map and the node_possible map and realize the same
functionality using two nodes stats: N_POSSIBLE and N_ONLINE.[Lee.Schermerhorn@hp.com: Initialize N_*_MEMORY and N_CPU masks for non-NUMA config]
Signed-off-by: Christoph Lameter
Tested-by: Lee Schermerhorn
Acked-by: Lee Schermerhorn
Acked-by: Bob Picco
Cc: Nishanth Aravamudan
Cc: KAMEZAWA Hiroyuki
Cc: Mel Gorman
Signed-off-by: Lee Schermerhorn
Cc: "Serge E. Hallyn"
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
prepare/commit_write no longer returns AOP_TRUNCATED_PAGE since OCFS2 and
GFS2 were converted to the new aops, so we can make some simplifications
for that.[michal.k.k.piotrowski@gmail.com: fix warning]
Signed-off-by: Nick Piggin
Cc: Michael Halcrow
Cc: Mark Fasheh
Cc: Steven Whitehouse
Signed-off-by: Michal Piotrowski
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Implement nobh in new aops. This is a bit tricky. FWIW, nobh_truncate is
now implemented in a way that does not create blocks in sparse regions,
which is a silly thing for it to have been doing (isn't it?)ext2 survives fsx and fsstress. jfs is converted as well... ext3
should be easy to do (but not done yet).[akpm@linux-foundation.org: coding-style fixes]
Cc: Badari Pulavarty
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Plug ocfs2 into the ->write_begin and ->write_end aops.
A bunch of custom code is now gone - the iovec iteration stuff during write
and the ocfs2 splice write actor.Signed-off-by: Mark Fasheh
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Cc: Roman Zippel
Signed-off-by: Nick Piggin
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Acked-by: Russell King
Signed-off-by: Nick Piggin
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Signed-off-by: Nick Piggin
Acked-by: Dave Kleikamp
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Signed-off-by: Nick Piggin
Cc: Andries Brouwer
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Signed-off-by: Nick Piggin
Cc: Christoph Hellwig
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Convert udf to new aops. Also seem to have fixed pagecache corruption in
udf_adinicb_commit_write -- page was marked uptodate when it is not. Also,
fixed the silly setup where prepare_write was doing a kmap to be used in
commit_write: just do kmap_atomic in write_end. Use libfs helpers to make
this easier.Signed-off-by: Nick Piggin
Cc:
Cc: Jan Kara
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Signed-off-by: Nick Piggin
Cc: Evgeniy Dushistov
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Signed-off-by: Nick Piggin
Cc: David Woodhouse
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
This also gets rid of a lot of useless read_file stuff. And also
optimises the full page write case by marking a !uptodate page uptodate.Signed-off-by: Nick Piggin
Cc: Jeff Dike
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
[mszeredi]
- don't send zero length write requests
- it is not legal for the filesystem to return with zero written bytesSigned-off-by: Nick Piggin
Signed-off-by: Miklos Szeredi
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Signed-off-by: Nick Piggin
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
[akpm@linux-foundation.org: fix against git-nfs]
[peterz@infradead.org: fix against git-nfs]
Signed-off-by: Nick Piggin
Acked-by: Trond Myklebust
Cc: "J. Bruce Fields"
Cc: Peter Zijlstra
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Signed-off-by: Nick Piggin
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
This patch makes reiserfs to use AOP_FLAG_CONT_EXPAND
in order to get rid of the special generic_cont_expand routineSigned-off-by: Vladimir Saveliev
Signed-off-by: Nick Piggin
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Convert reiserfs to new aops
Signed-off-by: Vladimir Saveliev
Signed-off-by: Nick Piggin
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Make reiserfs to write via generic routines.
Original reiserfs write optimized for big writes is deadlock roneSigned-off-by: Vladimir Saveliev
Signed-off-by: Nick Piggin
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Signed-off-by: Nick Piggin
Acked-by: Anders Larsen
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Signed-off-by: Nick Piggin
Cc: Tigran Aivazian
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Signed-off-by: Nick Piggin
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Signed-off-by: Nick Piggin
Cc: Roman Zippel
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Signed-off-by: Nick Piggin
Cc: Roman Zippel
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Signed-off-by: Nick Piggin
Cc: OGAWA Hirofumi
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Rework the generic block "cont" routines to handle the new aops. Supporting
cont_prepare_write would take quite a lot of code to support, so remove it
instead (and we later convert all filesystems to use it).write_begin gets passed AOP_FLAG_CONT_EXPAND when called from
generic_cont_expand, so filesystems can avoid the old hacks they used.Signed-off-by: Nick Piggin
Cc: OGAWA Hirofumi
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Cc: Nick Piggin
Cc: Steven Whitehouse
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Signed-off-by: Nick Piggin
Cc: David Chinner
Cc: Timothy Shimmin
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Convert ext4 to use write_begin()/write_end() methods.
Signed-off-by: Badari Pulavarty
Signed-off-by: Nick Piggin
Cc: Dmitriy Monakhov
Cc: Mark Fasheh
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Various fixes and improvements
Signed-off-by: Badari Pulavarty
Signed-off-by: Nick Piggin
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Signed-off-by: Nick Piggin
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Signed-off-by: Nick Piggin
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds