28 Jun, 2008

1 commit

  • This patch adds saved stack-traces to the backtrace suite of self-tests.

    Note that we don't depend on or unconditionally enable CONFIG_STACKTRACE
    because not all architectures may have it (and we still want to enable the
    other tests for those architectures).

    Cc: Arjan van de Ven
    Signed-off-by: Vegard Nossum
    Cc: Andrew Morton
    Signed-off-by: Ingo Molnar

    Vegard Nossum
     

15 May, 2008

4 commits

  • * 'for-linus' of ssh://master.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs:
    9p: fix error path during early mount
    9p: make cryptic unknown error from server less scary
    9p: fix flags length in net
    9p: Correct fidpool creation failure in p9_client_create
    9p: use struct mutex instead of struct semaphore
    9p: propagate parse_option changes to client and transports
    fs/9p/v9fs.c (v9fs_parse_options): Handle kstrdup and match_strdup failure.
    9p: Documentation updates
    add match_strlcpy(), use it to make v9fs uname and remotename parsing more robust

    Linus Torvalds
     
  • * git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc-2.6:
    sparc64: Use a TS_RESTORE_SIGMASK
    lmb: Make lmb debugging more useful.
    lmb: Fix inconsistent alignment of size argument.
    sparc: Fix mremap address range validation.

    Linus Torvalds
     
  • Add a common hex array in hexdump.c so everyone can use it.

    Add a common hi/lo helper to avoid the shifting and masking that is
    done to get the upper and lower nibbles of a byte value.

    Pull the pack_hex_byte helper out of kgdb, as it is open-coded in
    many places in the tree that will be consolidated.
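
    For reference, a minimal sketch of what these helpers look like
    (reconstructed here from the description, not quoted from the patch):

        extern const char hex_asc[];            /* "0123456789abcdef" */
        #define hex_asc_lo(x)  hex_asc[((x) & 0x0f)]
        #define hex_asc_hi(x)  hex_asc[((x) & 0xf0) >> 4]

        static inline char *pack_hex_byte(char *buf, u8 byte)
        {
                *buf++ = hex_asc_hi(byte);      /* upper nibble first */
                *buf++ = hex_asc_lo(byte);
                return buf;
        }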

    Signed-off-by: Harvey Harrison
    Acked-by: Paul Mundt
    Cc: Jason Wessel
    Cc: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Harvey Harrison
     
  • match_strcpy() is a somewhat creepy function: the caller needs to make sure
    that the destination buffer is big enough, and when he screws up or
    forgets, match_strcpy() happily overruns the buffer.

    There's exactly one customer: v9fs_parse_options(). I believe it currently
    can't overflow its buffer, but that's not exactly obvious.

    The source string is a substring of the mount options. The kernel silently
    truncates those to PAGE_SIZE bytes, including the terminating zero. See
    compat_sys_mount() and do_mount().

    The destination buffer is obtained from __getname(), which allocates from
    name_cachep, which is initialized by vfs_caches_init() for size PATH_MAX.

    We're safe as long as PATH_MAX is no smaller than PAGE_SIZE.
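
    As a hedged sketch (buffer handling shown only for illustration),
    the match_strlcpy() replacement bounds the copy the way strlcpy()
    does, so the invariant above no longer has to hold for safety:

        /* copies at most size - 1 bytes and always NUL-terminates,
         * so the destination cannot be overrun */
        char *name = __getname();               /* PATH_MAX-sized buffer */
        match_strlcpy(name, &args[0], PATH_MAX);
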
    Cc: Latchesar Ionkov
    Cc: Jim Meyering
    Cc: "Randy.Dunlap"
    Signed-off-by: Andrew Morton
    Signed-off-by: Eric Van Hensbergen

    Markus Armbruster
     

13 May, 2008

3 commits

  • They aren't used. They were briefly used as part of some other patches to
    provide an alternative format for displaying some /proc and /sys cpumasks.
    They probably should have been removed when those other patches were dropped,
    in favor of a different solution.

    Signed-off-by: Paul Jackson
    Cc: "Mike Travis"
    Cc: "Bert Wesarg"
    Cc: Alexey Dobriyan
    Cc: WANG Cong
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Paul Jackson
     
    Having to muck with the build and set DEBUG just to get
    lmb_dump_all() to print things isn't very useful.

    So use pr_info() and an early boot param "lmb=debug", so we can
    simply ask users to reboot with this option when we need some
    debugging from them.
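
    The boot-param hook is roughly this shape (a sketch; the flag
    variable name is assumed):

        static int __init early_lmb(char *p)
        {
                if (p && strstr(p, "debug"))
                        lmb_debug = 1;  /* enables lmb_dump_all() output */
                return 0;
        }
        early_param("lmb", early_lmb);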

    Signed-off-by: David S. Miller

    David S. Miller
     
  • When allocating, if we will align up the size when making
    the reservation, we should also align the size for the
    check that the space is actually available.

    The simplest thing is to just align the size up from the beginning;
    then we can use plain 'size' throughout.
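
    The shape of the fix is roughly the following, using the generic
    ALIGN() helper for illustration (the lmb code may use its own
    rounding helper):

        size = ALIGN(size, align);      /* round up once, at the top */
        /* both the availability check and the reservation below now
         * see the same, already-aligned 'size' */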

    Signed-off-by: David S. Miller

    David S. Miller
     

11 May, 2008

1 commit

  • The generic semaphore rewrite had a huge performance regression on AIM7
    (and potentially other BKL-heavy benchmarks) because the generic
    semaphores had been rewritten to be simple to understand and fair. The
    latter, in particular, turns a semaphore-based BKL implementation into a
    mess of scheduling.

    The attempt to fix the performance regression failed miserably (see the
    previous commit 00b41ec2611dc98f87f30753ee00a53db648d662 'Revert
    "semaphore: fix"'), and so for now the simple and sane approach is to
    instead just go back to the old spinlock-based BKL implementation that
    never had any issues like this.

    According to Yanmin Zhang, this patch also has the advantage of
    fixing the regression completely, unlike the semaphore hack, which
    still left a couple of percentage points of regression.

    As a spinlock, the BKL obviously has the potential to be a latency
    issue, but it's not really any different from any other spinlock in that
    respect. We do want to get rid of the BKL asap, but that has been the
    plan for several years.

    These days, the biggest users are in the tty layer (open/release in
    particular) and Alan holds out some hope:

    "tty release is probably a few months away from getting cured - I'm
    afraid it will almost certainly be the very last user of the BKL in
    tty to get fixed as it depends on everything else being sanely locked."

    so while we're not there yet, we do have a plan of action.

    Tested-by: Yanmin Zhang
    Cc: Ingo Molnar
    Cc: Andi Kleen
    Cc: Matthew Wilcox
    Cc: Alexander Viro
    Cc: Andrew Morton
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     

01 May, 2008

8 commits

  • The return inside the loop makes us free only a single layer.

    Signed-off-by: Nadia Derbey
    Cc: "Paul E. McKenney"
    Cc: Manfred Spraul
    Cc: Jim Houston
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nadia Derbey
     
  • Add a new sysfs_streq() string comparison function, which ignores
    the trailing newlines found in sysfs inputs. By example:

    sysfs_streq("a", "b") ==> false
    sysfs_streq("a", "a") ==> true
    sysfs_streq("a", "a\n") ==> true
    sysfs_streq("a\n", "a") ==> true

    This is intended to simplify parsing of sysfs inputs, letting them
    avoid the need to manually strip off newlines from inputs.
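
    The comparison is roughly the following (a sketch matching the
    examples above, not necessarily the exact patch text):

        bool sysfs_streq(const char *s1, const char *s2)
        {
                while (*s1 && *s1 == *s2) {
                        s1++;
                        s2++;
                }
                if (*s1 == *s2)
                        return true;    /* identical strings */
                if (!*s1 && *s2 == '\n' && !s2[1])
                        return true;    /* s2 has a trailing newline */
                if (*s1 == '\n' && !s1[1] && !*s2)
                        return true;    /* s1 has a trailing newline */
                return false;
        }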

    Signed-off-by: David Brownell
    Acked-by: Greg KH
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Brownell
     
  • Rename div64_64 to div64_u64 to make it consistent with the other divide
    functions, so it clearly includes the type of the divide. Move its definition
    to math64.h as currently no architecture overrides the generic implementation.
    They can still override it of course, but the duplicated declarations are
    avoided.

    Signed-off-by: Roman Zippel
    Cc: Avi Kivity
    Cc: Russell King
    Cc: Geert Uytterhoeven
    Cc: Ralf Baechle
    Cc: David Howells
    Cc: Jeff Dike
    Cc: Ingo Molnar
    Cc: "David S. Miller"
    Cc: Patrick McHardy
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Roman Zippel
     
    The current do_div doesn't explicitly say that it's unsigned, and
    the signed counterpart is missing, which is needed e.g. when dealing
    with time values.

    This introduces 64bit signed/unsigned divide functions which also
    attempt to clean up the somewhat awkward calling API, which often
    requires the use of temporary variables for the dividend. To avoid
    the need for temporary variables everywhere for the remainder, each
    divide variant also provides a version which doesn't return the
    remainder.

    Each architecture can now provide optimized versions of these
    functions; otherwise generic fallback implementations will be used.

    As an example I provided an alternative for the current x86 divide,
    which avoids the asm casts; using a union allows gcc to generate
    better code. It also avoids the upper divide in a few more cases,
    where the result is known (i.e. the upper quotient is zero).
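
    The resulting interface in <linux/math64.h> looks roughly like this
    (signatures given from memory; treat as a sketch):

        u64 div_u64(u64 dividend, u32 divisor);
        u64 div_u64_rem(u64 dividend, u32 divisor, u32 *remainder);
        s64 div_s64(s64 dividend, s32 divisor);
        s64 div_s64_rem(s64 dividend, s32 divisor, s32 *remainder);
        u64 div64_u64(u64 dividend, u64 divisor);

        /* e.g. no temporary needed for the dividend, unlike do_div(): */
        u32 rem;
        u64 secs = div_u64_rem(ns, NSEC_PER_SEC, &rem);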

    Signed-off-by: Roman Zippel
    Cc: john stultz
    Cc: Thomas Gleixner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Roman Zippel
     
  • Finally clean up the odd spacing in these files.

    Signed-off-by: Greg Kroah-Hartman

    Greg Kroah-Hartman
     
    Use a resource_size_t instead of unsigned long, since some
    architectures are capable of having ioremap() deal with addresses
    greater than the size of an unsigned long.
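
    The changed prototype is approximately:

        void __iomem *devm_ioremap(struct device *dev,
                                   resource_size_t offset,  /* was unsigned long */
                                   unsigned long size);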

    Signed-off-by: Kumar Gala
    Cc: Tejun Heo
    Cc: Jeff Garzik
    Signed-off-by: Greg Kroah-Hartman

    Kumar Gala
     
  • This prevents a few unneeded copies.

    Signed-off-by: Kay Sievers
    Signed-off-by: Greg Kroah-Hartman

    Kay Sievers
     
  • Add klist_add_after() and klist_add_before() which puts a new node
    after and before an existing node, respectively. This is useful for
    callers which need to keep klist ordered. Note that synchronizing
    between simultaneous additions for ordering is the caller's
    responsibility.
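
    The new entry points are, approximately:

        /* insert n immediately after / before pos in the klist */
        void klist_add_after(struct klist_node *n, struct klist_node *pos);
        void klist_add_before(struct klist_node *n, struct klist_node *pos);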

    Signed-off-by: Tejun Heo
    Signed-off-by: Greg Kroah-Hartman

    Tejun Heo
     

30 Apr, 2008

5 commits

  • __FUNCTION__ is gcc specific, use __func__

    Signed-off-by: Harvey Harrison
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Harvey Harrison
     
    Add calls to the generic object debugging infrastructure and provide
    fixup functions which make it possible to keep the system alive when
    recoverable problems have been detected by the object debugging core
    code.
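
    As an illustration, one of the timer fixup handlers is roughly
    shaped like this (a sketch based on the description, not quoted
    from the patch):

        static int timer_fixup_free(void *addr, enum debug_obj_state state)
        {
                struct timer_list *timer = addr;

                switch (state) {
                case ODEBUG_STATE_ACTIVE:
                        /* freeing an active timer: stop it first,
                         * then let the free proceed */
                        del_timer_sync(timer);
                        return 1;       /* fixed up, system stays alive */
                default:
                        return 0;
                }
        }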

    Signed-off-by: Thomas Gleixner
    Acked-by: Ingo Molnar
    Cc: Greg KH
    Cc: Randy Dunlap
    Cc: Kay Sievers
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Thomas Gleixner
     
  • We can see an ever repeating problem pattern with objects of any kind in the
    kernel:

    1) freeing of active objects
    2) reinitialization of active objects

    Both problems can be hard to debug because the crash happens at a point where
    we have no chance to decode the root cause anymore. One problem spot is
    kernel timers, where the detection of the problem often happens in interrupt
    context and usually causes the machine to panic.

    While working on a timer related bug report I had to hack specialized code
    into the timer subsystem to get a reasonable hint for the root cause. This
    debug hack was fine for temporary use, but far from a mergeable solution due
    to the intrusiveness into the timer code.

    The code further lacked the ability to detect and report the root cause
    instantly and keep the system operational.

    Keeping the system operational is important to get hold of the debug
    information without special debugging aids like serial consoles and special
    knowledge of the bug reporter.

    The problems described above are not restricted to timers, but timers tend to
    expose it usually in a full system crash. Other objects are less explosive,
    but the symptoms caused by such mistakes can be even harder to debug.

    Instead of creating specialized debugging code for the timer
    subsystem, a generic infrastructure is created which allows
    developers to verify their code and provides an easy-to-enable debug
    facility for users in case of trouble.

    The debugobjects core code keeps track of operations on static and
    dynamic objects by inserting them into a hashed list and sanity
    checking them on object operations; it also provides additional
    checks whenever kernel memory is freed.

    The tracked object operations are:
    - initializing an object
    - adding an object to a subsystem list
    - deleting an object from a subsystem list

    Each operation is sanity checked before it is executed, and the
    subsystem specific code can provide a fixup function which makes it
    possible to prevent damage from the operation. When a sanity check
    triggers, a warning message and a stack trace are printed.

    The list of operations can be extended if the need arises. For now it's
    limited to the requirements of the first user (timers).

    The core code enqueues the objects into hash buckets. The hash index is
    generated from the address of the object to simplify the lookup for the check
    on kfree/vfree. Each bucket has its own spinlock to avoid contention on a
    global lock.

    The debug code can be compiled in without being active. The runtime overhead
    is minimal and could be optimized by asm alternatives. A kernel command line
    option enables the debugging code.
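
    A subsystem hooks in by describing its object type and fixup
    callbacks; the descriptor and tracked operations look roughly like
    this (a sketch following the description above):

        struct debug_obj_descr {
                const char *name;
                int (*fixup_init)(void *addr, enum debug_obj_state state);
                int (*fixup_activate)(void *addr, enum debug_obj_state state);
                int (*fixup_destroy)(void *addr, enum debug_obj_state state);
                int (*fixup_free)(void *addr, enum debug_obj_state state);
        };

        /* called from the subsystem's own code paths: */
        debug_object_init(obj, &descr);
        debug_object_activate(obj, &descr);
        debug_object_deactivate(obj, &descr);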

    Thanks to Ingo Molnar for review, suggestions and cleanup patches.

    Signed-off-by: Thomas Gleixner
    Signed-off-by: Ingo Molnar
    Cc: Greg KH
    Cc: Randy Dunlap
    Cc: Kay Sievers
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Thomas Gleixner
     
  • Add "max_ratio" to /sys/class/bdi. This indicates the maximum percentage of
    the global dirty threshold allocated to this bdi.

    [mszeredi@suse.cz]

    - fix parsing in max_ratio_store().
    - export bdi_set_max_ratio() to modules
    - limit bdi_dirty with bdi->max_ratio
    - document new sysfs attribute
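
    The exported setter is approximately (signature assumed from the
    description):

        /* cap this bdi at max_ratio percent of the global dirty threshold */
        int bdi_set_max_ratio(struct backing_dev_info *bdi,
                              unsigned int max_ratio);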

    Signed-off-by: Peter Zijlstra
    Signed-off-by: Miklos Szeredi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Peter Zijlstra
     
  • Provide a place in sysfs (/sys/class/bdi) for the backing_dev_info object.
    This allows us to see and set the various BDI specific variables.

    In particular this properly exposes the read-ahead window for all relevant
    users, and /sys/block/<device>/queue/read_ahead_kb should be deprecated.

    With patient help from Kay Sievers and Greg KH

    [mszeredi@suse.cz]

    - split off NFS and FUSE changes into separate patches
    - document new sysfs attributes under Documentation/ABI
    - do bdi_class_init as a core_initcall, otherwise the "default" BDI
    won't be initialized
    - remove bdi_init_fmt macro, it's not used very much

    [akpm@linux-foundation.org: fix ia64 warning]
    Signed-off-by: Peter Zijlstra
    Cc: Kay Sievers
    Acked-by: Greg KH
    Cc: Trond Myklebust
    Signed-off-by: Miklos Szeredi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Peter Zijlstra
     

29 Apr, 2008

11 commits

  • * 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc:
    [RAPIDIO] Change RapidIO doorbell source and target ID field to 16-bit
    [RAPIDIO] Add RapidIO connection info print out and re-training for broken connections
    [RAPIDIO] Add serial RapidIO controller support, which includes MPC8548, MPC8641
    [RAPIDIO] Add RapidIO node probing into MPC86xx_HPCN board id table
    [RAPIDIO] Add RapidIO node into MPC8641HPCN dts file
    [RAPIDIO] Auto-probe the RapidIO system size
    [RAPIDIO] Add OF-tree support to RapidIO controller driver
    [RAPIDIO] Add RapidIO multi mport support
    [RAPIDIO] Move include/asm-ppc/rio.h to asm-powerpc
    [RAPIDIO] Add RapidIO option to kernel configuration
    [RAPIDIO] Change RIO function mpc85xx_ to fsl_
    [POWERPC] Provide walk_memory_resource() for powerpc
    [POWERPC] Update lmb data structures for hotplug memory add/remove
    [POWERPC] Hotplug memory remove notifications for powerpc
    [POWERPC] windfarm: Add PowerMac 12,1 support
    [POWERPC] Fix building of pmac32 when CONFIG_NVRAM=m
    [POWERPC] Add IRQSTACKS support on ppc32
    [POWERPC] Use __always_inline for xchg* and cmpxchg*
    [POWERPC] Add fast little-endian switch system call

    Linus Torvalds
     
    The mapsize optimizations which were moved from x86 to the generic
    code in commit 64970b68d2b3ed32b964b0b30b1b98518fde388e increased the
    binary size on non-x86 architectures.

    Looking into the real effects of the "optimizations" it turned out
    that they are not used in find_next_bit() and find_next_zero_bit().

    The ones in find_first_bit() and find_first_zero_bit() are used in a
    couple of places but none of them is a real hot path.

    Remove the "optimizations" all together and call the library functions
    unconditionally.

    Boot-tested on x86 and compile tested on every cross compiler I have.

    Signed-off-by: Thomas Gleixner
    Signed-off-by: Linus Torvalds

    Thomas Gleixner
     
    Avoid a possible kmem_cache_create() failure by creating
    idr_layer_cache unconditionally at boot time rather than creating it
    on-demand the first time idr_init() is called.

    This change also enables us to eliminate the check every time idr_init() is
    called.

    [akpm@linux-foundation.org: rename init_id_cache() to idr_init_cache()]
    [akpm@linux-foundation.org: fix alpha build]
    Signed-off-by: Akinobu Mita
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Akinobu Mita
     
  • Change all ia64 machvecs to use the new dma_*map*_attrs() interfaces.
    Implement the old dma_*map_*() interfaces in terms of the corresponding new
    interfaces. For ia64/sn, make use of one dma attribute,
    DMA_ATTR_WRITE_BARRIER. Introduce swiotlb_*map*_attrs() functions.
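
    Callers pass attributes roughly like this (a sketch of the new
    interface; buffer and direction are illustrative):

        DEFINE_DMA_ATTRS(attrs);
        dma_set_attr(DMA_ATTR_WRITE_BARRIER, &attrs);
        dma_addr = dma_map_single_attrs(dev, buf, size,
                                        DMA_TO_DEVICE, &attrs);
        /* the old dma_map_single() becomes a wrapper with NULL attrs */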

    Signed-off-by: Arthur Kepner
    Cc: Tony Luck
    Cc: Jesse Barnes
    Cc: Jes Sorensen
    Cc: Randy Dunlap
    Cc: Roland Dreier
    Cc: James Bottomley
    Cc: David Miller
    Cc: Benjamin Herrenschmidt
    Cc: Grant Grundler
    Cc: Michael Ellerman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Arthur Kepner
     
    A WARN_ON in rcupreempt.h triggered repeatedly and left me with a 2G
    syslog file. For some serious kernel complaints we do need to repeat
    the warnings, so here I isolate the ratelimit part of printk.c into
    a standalone file.
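
    Typical call sites keep the familiar pattern (message text invented
    for illustration):

        if (printk_ratelimit())         /* now backed by lib/ratelimit.c */
                printk(KERN_WARNING "suspicious state detected\n");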

    Signed-off-by: Dave Young
    Acked-by: Paul E. McKenney
    Tested-by: Paul E. McKenney
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dave Young
     
  • iommu_is_span_boundary in lib/iommu-helper.c was exported for PARISC IOMMUs
    (commit 3715863aa142c4f4c5208f5f3e5e9bac06006d2f). SWIOTLB can use it instead
    of the homegrown function.
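
    For reference, the shared helper's signature is approximately:

        int iommu_is_span_boundary(unsigned int index, unsigned int nr,
                                   unsigned long shift,
                                   unsigned long boundary_size);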

    Signed-off-by: FUJITA Tomonori
    Cc: Thomas Gleixner
    Cc: Ingo Molnar
    Cc: H. Peter Anvin
    Cc: Tony Luck
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    FUJITA Tomonori
     
  • There's a pointlessly braced block of code in there. Remove the braces and
    save a tabstop.

    Cc: Andi Kleen
    Cc: FUJITA Tomonori
    Cc: Jan Beulich
    Cc: Tony Luck
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
     
    Almost all implementations of pci_iomap() in the kernel, including
    the generic lib/iomap.c one, copy the content of a struct resource
    into unsigned long's, which will break on 32-bit platforms with
    64-bit resources.

    This fixes all definitions of pci_iomap() to use resource_size_t. I
    also "fixed" the 64-bit architectures for consistency.

    Signed-off-by: Benjamin Herrenschmidt
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Benjamin Herrenschmidt
     
  • lib/inflate.c (inflate_dynamic): Don't deref NULL upon failed malloc.

    Signed-off-by: Jim Meyering
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jim Meyering
     
  • Provide walk_memory_resource() for 64-bit powerpc. PowerPC maintains
    logical memory region mapping in the lmb.memory structure. Walk
    through these structures and do the callbacks for the contiguous
    chunks.

    Signed-off-by: Badari Pulavarty
    Cc: Yasunori Goto
    Cc: Benjamin Herrenschmidt
    Signed-off-by: Andrew Morton
    Signed-off-by: Paul Mackerras

    Badari Pulavarty
     
  • The powerpc kernel maintains information about logical memory blocks
    in the lmb.memory structure, which is initialized and updated at boot
    time, but not when memory is added or removed while the kernel is
    running.

    This adds a hotplug memory notifier which updates lmb.memory when
    memory is added or removed. This information is useful for eHEA
    driver to find out the memory layout and holes.

    NOTE: No special locking is needed for lmb_add() and lmb_remove().
    Calls to these are serialized by the caller (pSeries_reconfig_chain).

    Signed-off-by: Badari Pulavarty
    Cc: Yasunori Goto
    Cc: Benjamin Herrenschmidt
    Cc: "David S. Miller"
    Signed-off-by: Andrew Morton
    Signed-off-by: Paul Mackerras

    Badari Pulavarty
     

28 Apr, 2008

2 commits

  • The following adds two more bitmap operators, bitmap_onto() and bitmap_fold(),
    with the usual cpumask and nodemask wrappers.

    The bitmap_onto() operator computes one bitmap relative to another. If the
    n-th bit in the origin mask is set, then the m-th bit of the destination mask
    will be set, where m is the position of the n-th set bit in the relative mask.

    The bitmap_fold() operator folds a bitmap into a second that has bit m set iff
    the input bitmap has some bit n set, where m == n mod sz, for the specified sz
    value.
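
    A small worked example of the two operators (values invented for
    illustration):

        /* bitmap_onto(): orig = {1, 3}, relmap = {4, 9, 18, 20}.
         * The 1st and 3rd set bits of relmap (counting from 0) are
         * 9 and 20, so dst = {9, 20}. */
        bitmap_onto(dst, orig, relmap, bits);

        /* bitmap_fold(): orig = {12, 15}, sz = 8. Bit positions are
         * taken modulo 8, so dst = {4, 7}. */
        bitmap_fold(dst, orig, 8, bits);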

    There are two substantive changes between this patch and its
    predecessor bitmap_relative:
    1) Renamed bitmap_relative() to be bitmap_onto().
    2) Added bitmap_fold().

    The essential motivation for bitmap_onto() is to provide a mechanism for
    converting a cpuset-relative CPU or Node mask to an absolute mask. Cpuset
    relative masks are written as if the current task were in a cpuset whose CPUs
    or Nodes were just the consecutive ones numbered 0..N-1, for some N. The
    bitmap_onto() operator is provided in anticipation of adding support for the
    first such cpuset relative mask, by the mbind() and set_mempolicy() system
    calls, using a planned flag of MPOL_F_RELATIVE_NODES. These bitmap operators
    (and their nodemask wrappers, in particular) will be used in code that
    converts the user specified cpuset relative memory policy to a specific system
    node numbered policy, given the current mems_allowed of the task's cpuset.

    Such cpuset relative mempolicies will address two deficiencies
    of the existing interface between cpusets and mempolicies:
    1) A task cannot at present reliably establish a cpuset
    relative mempolicy because there is an essential race
    condition, in that the task's cpuset may be changed in
    between the time the task can query its cpuset placement,
    and the time the task can issue the applicable mbind or
    set_mempolicy system call.
    2) A task cannot at present establish what cpuset relative
    mempolicy it would like to have, if it is in a smaller
    cpuset than it might have mempolicy preferences for,
    because the existing interface only allows specifying
    mempolicies for nodes currently allowed by the cpuset.

    Cpuset relative mempolicies are useful for tasks that don't distinguish
    particularly between one CPU or Node and another, but only between how many of
    each are allowed, and the proper placement of threads and memory pages on the
    various CPUs and Nodes available.

    The motivation for the added bitmap_fold() can be seen in the following
    example.

    Let's say an application has specified some mempolicies that presume 16 memory
    nodes, including say a mempolicy that specified MPOL_F_RELATIVE_NODES (cpuset
    relative) nodes 12-15. Then let's say that application is crammed into a
    cpuset that only has 8 memory nodes, 0-7. If one just uses bitmap_onto(),
    this mempolicy, mapped to that cpuset, would ignore the requested relative
    nodes above 7, leaving it empty of nodes. That's not good; better to fold the
    higher nodes down, so that some nodes are included in the resulting mapped
    mempolicy. In this case, the mempolicy nodes 12-15 are taken modulo 8 (the
    weight of the mems_allowed of the confining cpuset), resulting in a mempolicy
    specifying nodes 4-7.

    Signed-off-by: Paul Jackson
    Signed-off-by: David Rientjes
    Cc: Christoph Lameter
    Cc: Andi Kleen
    Cc: Mel Gorman
    Cc: Lee Schermerhorn
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Paul Jackson
     
  • Migrate flags must be set on slab creation as agreed upon when the antifrag
    logic was reviewed. Otherwise some slabs of a slabcache will end up in the
    unmovable and others in the reclaimable section depending on which flag was
    active when a new slab page was allocated.

    This likely slid in somehow when antifrag was merged. Remove it.

    The buffer_heads are always allocated with __GFP_RECLAIMABLE because the
    SLAB_RECLAIM_ACCOUNT option is set. The set_migrateflags() never had any
    effect there.

    Radix tree allocations are not directly reclaimable but they are allocated
    with __GFP_RECLAIMABLE set on each allocation. We now set
    SLAB_RECLAIM_ACCOUNT on radix tree slab creation making sure that radix
    tree slabs are consistently placed in the reclaimable section. Radix tree
    slabs will also be accounted as such.

    There is then no user left of set_migrateflags(). So remove it.

    Signed-off-by: Christoph Lameter
    Cc: Mel Gorman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter