Eric Lee / smarc-fsl-linux-kernel

17 Oct, 2008

2 commits

6d97e2345 ipc/sem.c: make free_un() static ... Browse Code »

Signed-off-by: Adrian Bunk
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Adrian Bunk
2008-10-17 02:21:51 +0800
f221e726b sysctl: simplify ->strategy ... Browse Code »

name and nlen parameters passed to ->strategy hook are unused, remove
them. In general ->strategy hook should know what it's doing, and don't
do something tricky for which, say, pointer to original userspace array
may be needed (name).

Signed-off-by: Alexey Dobriyan
Acked-by: David S. Miller [ networking bits ]
Cc: Ralf Baechle
Cc: David Howells
Cc: Matt Mackall
Cc: "Eric W. Biederman"
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Alexey Dobriyan
2008-10-17 02:21:47 +0800

27 Jul, 2008

2 commits

f419a2e3b [PATCH] kill nameidata passing to permission(), rename to inode_permission() ... Browse Code »

Incidentally, the name that gives hundreds of false positives on grep
is not a good idea...

Signed-off-by: Al Viro

Al Viro
2008-07-27 08:53:31 +0800
51cc50685 SL*B: drop kmem cache argument from constructor ... Browse Code »

Kmem cache passed to constructor is only needed for constructors that are
themselves multiplexeres. Nobody uses this "feature", nor does anybody uses
passed kmem cache in non-trivial way, so pass only pointer to object.

Non-trivial places are:
arch/powerpc/mm/init_64.c
arch/powerpc/mm/hugetlbpage.c

This is flag day, yes.

Signed-off-by: Alexey Dobriyan
Acked-by: Pekka Enberg
Acked-by: Christoph Lameter
Cc: Jon Tollefson
Cc: Nick Piggin
Cc: Matt Mackall
[akpm@linux-foundation.org: fix arch/powerpc/mm/hugetlbpage.c]
[akpm@linux-foundation.org: fix mm/slab.c]
[akpm@linux-foundation.org: fix ubifs]
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Alexey Dobriyan
2008-07-27 03:00:07 +0800

26 Jul, 2008

8 commits

9eefe520c ipc: do not use a negative value to re-enable msgmni automatic recomputing ... Browse Code »

This patch proposes an alternative to the "magical
positive-versus-negative number trick" Andrew complained about last week
in http://lkml.org/lkml/2008/6/24/418.

This had been introduced with the patches that scale msgmni to the amount
of lowmem. With these patches, msgmni has a registered notification
routine that recomputes msgmni value upon memory add/remove or ipc
namespace creation/ removal.

When msgmni is changed from user space (i.e. value written to the proc
file), that notification routine is unregistered, and the way to make it
registered back is to write a negative value into the proc file. This is
the "magical positive-versus-negative number trick".

To fix this, a new proc file is introduced: /proc/sys/kernel/auto_msgmni.
This file acts as ON/OFF for msgmni automatic recomputing.

With this patch, the process is the following:
1) kernel boots in "automatic recomputing mode"
/proc/sys/kernel/msgmni contains the value that has been computed (depends
on lowmem)
/proc/sys/kernel/automatic_msgmni contains "1"

2) echo > /proc/sys/kernel/msgmni
. sets msg_ctlmni to
. de-activates automatic recomputing (i.e. if, say, some memory is added
msgmni won't be recomputed anymore)
. /proc/sys/kernel/automatic_msgmni now contains "0"

3) echo "0" > /proc/sys/kernel/automatic_msgmni
. de-activates msgmni automatic recomputing
this has the same effect as 2) except that msg_ctlmni's value stays
blocked at its current value)

3) echo "1" > /proc/sys/kernel/automatic_msgmni
. recomputes msgmni's value based on the current available memory size
and number of ipc namespaces
. re-activates automatic recomputing for msgmni.

Signed-off-by: Nadia Derbey
Cc: Solofo Ramangalahy
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Nadia Derbey
2008-07-26 01:53:42 +0800
f1a43f93f ipc: use simple_read_from_buffer() ... Browse Code »

Also this patch kills unneccesary trailing NULL character.

Signed-off-by: Akinobu Mita
Cc: Nadia Derbey
Cc: Manfred Spraul
Cc: Pierre Peiffer
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Akinobu Mita
2008-07-26 01:53:42 +0800
380af1b33 ipc/sem.c: rewrite undo list locking ... Browse Code »

The attached patch:
- reverses the locking order of ulp->lock and sem_lock:
Previously, it was first ulp->lock, then inside sem_lock.
Now it's the other way around.
- converts the undo structure to rcu.

Benefits:
- With the old locking order, IPC_RMID could not kfree the undo structures.
The stale entries remained in the linked lists and were released later.
- The patch fixes a a race in semtimedop(): if both IPC_RMID and a semget() that
recreates exactly the same id happen between find_alloc_undo() and sem_lock,
then semtimedop() would access already kfree'd memory.

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Manfred Spraul
Reviewed-by: Nadia Derbey
Cc: Pierre Peiffer
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Manfred Spraul
2008-07-26 01:53:42 +0800
a1193f8ec ipc/sem.c: convert sem_array.sem_pending to struct list_head ... Browse Code »

sem_array.sem_pending is a double linked list, the attached patch converts
it to struct list_head.

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Manfred Spraul
Reviewed-by: Nadia Derbey
Cc: Pierre Peiffer
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Manfred Spraul
2008-07-26 01:53:42 +0800
2c0c29d41 ipc/sem.c: remove unused entries from struct sem_queue ... Browse Code »

sem_queue.sma and sem_queue.id were never used, the attached patch removes
them.

Signed-off-by: Manfred Spraul
Reviewed-by: Nadia Derbey
Cc: Pierre Peiffer
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Manfred Spraul
2008-07-26 01:53:42 +0800
4daa28f6d ipc/sem.c: convert undo structures to struct list_head ... Browse Code »

The undo structures contain two linked lists, the attached patch replaces
them with generic struct list_head lists.

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Manfred Spraul
Cc: Nadia Derbey
Cc: Pierre Peiffer
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Manfred Spraul
2008-07-26 01:53:42 +0800
00c2bf85d ipc: get rid of ipc_lock_down() ... Browse Code »

Remove the ipc_lock_down() routines: they used to call idr_find() locklessly
(given that the ipc ids lock was already held), so they are not needed
anymore.

Signed-off-by: Nadia Derbey
Acked-by: "Paul E. McKenney"
Cc: Manfred Spraul
Cc: Jim Houston
Cc: Pierre Peiffer
Acked-by: Rik van Riel
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Nadia Derbey
2008-07-26 01:53:42 +0800
983bfb7db ipc: call idr_find() without locking in ipc_lock() ... Browse Code »

Call idr_find() locklessly from ipc_lock(), since the idr tree is now RCU
protected.

Signed-off-by: Nadia Derbey
Acked-by: "Paul E. McKenney"
Cc: Manfred Spraul
Cc: Jim Houston
Cc: Pierre Peiffer
Acked-by: Rik van Riel
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Nadia Derbey
2008-07-26 01:53:42 +0800

25 Jul, 2008

1 commit

a55164389 hugetlb: modular state for hugetlb page size ... Browse Code »

The goal of this patchset is to support multiple hugetlb page sizes. This
is achieved by introducing a new struct hstate structure, which
encapsulates the important hugetlb state and constants (eg. huge page
size, number of huge pages currently allocated, etc).

The hstate structure is then passed around the code which requires these
fields, they will do the right thing regardless of the exact hstate they
are operating on.

This patch adds the hstate structure, with a single global instance of it
(default_hstate), and does the basic work of converting hugetlb to use the
hstate.

Future patches will add more hstate structures to allow for different
hugetlbfs mounts to have different page sizes.

[akpm@linux-foundation.org: coding-style fixes]
Acked-by: Adam Litke
Acked-by: Nishanth Aravamudan
Signed-off-by: Andi Kleen
Signed-off-by: Nick Piggin
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Andi Kleen
2008-07-25 01:47:17 +0800

14 Jun, 2008

1 commit

4ae127d1b Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 ... Browse Code »

Conflicts:

drivers/net/smc911x.c

David S. Miller
2008-06-14 11:52:39 +0800

13 Jun, 2008

1 commit

6c826818f /proc/sysvipc/shm: fix 32-bit truncation of segment sizes ... Browse Code »

sysvipc_shm_proc_show() picks between format strings (based on the
expected maximum length of a SHM segment) in a way that prevents gcc from
performing format checks on the seq_printf() parameters. This hid two
format errors - shp->shm_segsz and shp->shm_nattach are both unsigned
long, but were being printed as unsigned int and signed int respectively.
This leads to 32-bit truncation of SHM segment sizes reported in
/proc/sysvipc/shm. (And for nattach, but that's less of a problem for
most users).

This patch makes the format string directly visible to gcc's format
specifier checker, and fixes the two broken format specifiers.

Signed-off-by: Paul Menage
Cc: Nadia Derbey
Cc: Manfred Spraul
Cc: Pierre Peiffer
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Paul Menage
2008-06-13 09:05:41 +0800

10 Jun, 2008

1 commit

c592713b3 shm: Remove silly double assignment ... Browse Code »

Found a silly double assignment of err is do_shmat. Silly, but good to
clean up the useless code.

Signed-off-by: Neil Horman
Signed-off-by: Linus Torvalds

Neil Horman
2008-06-10 22:58:00 +0800

07 Jun, 2008

1 commit

dfcceb26f ipc: only output msgmni value at boot time ... Browse Code »

When posting:
[PATCH 1/8] Scaling msgmni to the amount of lowmem
(see http://lkml.org/lkml/2008/2/11/171), I have added a KERN_INFO message
that is output each time msgmni is recomputed.

In http://lkml.org/lkml/2008/4/29/575 Tony Luck complained that this
message references an ipc namespace address that is useless.

I first thought of using an audit_log instead of a printk, as suggested by
Serge Hallyn. But unfortunately, we do not have any other information
than the namespace address to provide here too. So I chose to move the
message and output it only at boot time, removing the reference to the
namespace.

Signed-off-by: Nadia Derbey
Cc: Pierre Peiffer
Cc: Manfred Spraul
Acked-by: Tony Luck
Cc: "Serge E. Hallyn"
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Nadia Derbey
2008-06-07 02:29:12 +0800

06 Jun, 2008

1 commit

9457afee8 netlink: Remove nonblock parameter from netlink_attachskb ... Browse Code »

Signed-off-by: Denis V. Lunev
Signed-off-by: David S. Miller

Denis V. Lunev
2008-06-06 02:23:39 +0800

04 May, 2008

1 commit

269f21344 tiny mq_open optimization ... Browse Code »

A very small cleanup for mq_open.

We do not have to call set_close_on_exit if we create the file
descriptor right away with the flag set. We have a function for this
now. The resulting code is smaller and a tiny bit faster.

Signed-off-by: Ulrich Drepper
Signed-off-by: Linus Torvalds

Ulrich Drepper
2008-05-04 04:50:33 +0800

29 Apr, 2008

19 commits

6a6375db1 sysvipc: use non-racy method for proc entries creation ... Browse Code »

Use proc_create_data() to make sure that ->proc_fops and ->data be setup
before gluing PDE to main tree.

Signed-off-by: Denis V. Lunev
Cc: Alexey Dobriyan
Cc: "Eric W. Biederman"
Cc: Nadia Derbey
Cc: Pierre Peiffer
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Denis V. Lunev
2008-04-29 23:06:20 +0800
9edff4ab1 ipc: sysvsem: implement sys_unshare(CLONE_SYSVSEM) ... Browse Code »

sys_unshare(CLONE_NEWIPC) doesn't handle the undo lists properly, this can
cause a kernel memory corruption. CLONE_NEWIPC must detach from the existing
undo lists.

Fix, part 1: add support for sys_unshare(CLONE_SYSVSEM)

The original reason to not support it was the potential (inevitable?)
confusion due to the fact that sys_unshare(CLONE_SYSVSEM) has the
inverse meaning of clone(CLONE_SYSVSEM).

Our two most reasonable options then appear to be (1) fully support
CLONE_SYSVSEM, or (2) continue to refuse explicit CLONE_SYSVSEM,
but always do it anyway on unshare(CLONE_SYSVSEM). This patch does
(1).

Changelog:
Apr 16: SEH: switch to Manfred's alternative patch which
removes the unshare_semundo() function which
always refused CLONE_SYSVSEM.

Signed-off-by: Manfred Spraul
Signed-off-by: Serge E. Hallyn
Acked-by: "Eric W. Biederman"
Cc: Pavel Emelyanov
Cc: Michael Kerrisk
Cc: Pierre Peiffer
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Manfred Spraul
2008-04-29 23:06:14 +0800
44f564a4b ipc: add definitions of USHORT_MAX and others ... Browse Code »

Add definitions of USHORT_MAX and others into kernel. ipc uses it and slub
implementation might also use it.

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Zhang Yanmin
Reviewed-by: Christoph Lameter
Cc: Nadia Derbey
Cc: "Pierre Peiffer"
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Zhang, Yanmin
2008-04-29 23:06:14 +0800
a5f75e7f2 IPC: consolidate all xxxctl_down() functions ... Browse Code »

semctl_down(), msgctl_down() and shmctl_down() are used to handle the same set
of commands for each kind of IPC. They all start to do the same job (they
retrieve the ipc and do some permission checks) before handling the commands
on their own.

This patch proposes to consolidate this by moving these same pieces of code
into one common function called ipcctl_pre_down().

It simplifies a little these xxxctl_down() functions and increases a little
the maintainability.

Signed-off-by: Pierre Peiffer
Acked-by: Serge Hallyn
Cc: Nadia Derbey
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Pierre Peiffer
2008-04-29 23:06:14 +0800
8f4a3809c IPC: introduce ipc_update_perm() ... Browse Code »

The IPC_SET command performs the same permission setting for all IPCs. This
patch introduces a common ipc_update_perm() function to update these
permissions and makes use of it for all IPCs.

Signed-off-by: Pierre Peiffer
Acked-by: Serge Hallyn
Cc: Nadia Derbey
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Pierre Peiffer
2008-04-29 23:06:13 +0800
016d7132f IPC: get rid of the use *_setbuf structure. ... Browse Code »

All IPCs make use of an intermetiate *_setbuf structure to handle the IPC_SET
command. This is not really needed and, moreover, it complicates a little bit
the code.

This patch gets rid of the use of it and uses directly the semid64_ds/
msgid64_ds/shmid64_ds structure.

In addition of removing one struture declaration, it also simplifies and
improves a little bit the common 64-bits path.

Signed-off-by: Pierre Peiffer
Acked-by: Serge Hallyn
Cc: Nadia Derbey
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Pierre Peiffer
2008-04-29 23:06:13 +0800
21a4826a7 IPC/semaphores: remove one unused parameter from semctl_down() ... Browse Code »

semctl_down() takes one unused parameter: semnum. This patch proposes to get
rid of it.

Signed-off-by: Pierre Peiffer
Acked-by: Serge Hallyn
Cc: Nadia Derbey
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Pierre Peiffer
2008-04-29 23:06:13 +0800
522bb2a2b IPC/semaphores: move the rwmutex handling inside semctl_down ... Browse Code »

semctl_down is called with the rwmutex (the one which protects the list of
ipcs) taken in write mode.

This patch moves this rwmutex taken in write-mode inside semctl_down.

This has the advantages of reducing a little bit the window during which this
rwmutex is taken, clarifying sys_semctl, and finally of having a coherent
behaviour with [shm|msg]ctl_down

Signed-off-by: Pierre Peiffer
Acked-by: Serge Hallyn
Cc: Nadia Derbey
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Pierre Peiffer
2008-04-29 23:06:13 +0800
a0d092fc2 IPC/message queues: introduce msgctl_down ... Browse Code »

Currently, sys_msgctl is not easy to read.

This patch tries to improve that by introducing the msgctl_down function to
handle all commands requiring the rwmutex to be taken in write mode (ie
IPC_SET and IPC_RMID for now). It is the equivalent function of semctl_down
for message queues.

This greatly changes the readability of sys_msgctl and also harmonizes the way
these commands are handled among all IPCs.

Signed-off-by: Pierre Peiffer
Acked-by: Serge Hallyn
Cc: Nadia Derbey
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Pierre Peiffer
2008-04-29 23:06:13 +0800
8d4cc8b5c IPC/shared memory: introduce shmctl_down ... Browse Code »

Currently, the way the different commands are handled in sys_shmctl introduces
some duplicated code.

This patch introduces the shmctl_down function to handle all the commands
requiring the rwmutex to be taken in write mode (ie IPC_SET and IPC_RMID for
now). It is the equivalent function of semctl_down for shared memory.

This removes some duplicated code for handling these both commands and
harmonizes the way they are handled among all IPCs.

Signed-off-by: Pierre Peiffer
Acked-by: Serge Hallyn
Cc: Nadia Derbey
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Pierre Peiffer
2008-04-29 23:06:13 +0800
6ff379721 IPC/semaphores: code factorisation ... Browse Code »

Trivial patch which adds some small locking functions and makes use of them to
factorize some part of the code and to make it cleaner.

Signed-off-by: Pierre Peiffer
Acked-by: Serge Hallyn
Cc: Nadia Derbey
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Pierre Peiffer
2008-04-29 23:06:13 +0800
6546bc427 ipc: re-enable msgmni automatic recomputing msgmni if set to negative ... Browse Code »

The enhancement as asked for by Yasunori: if msgmni is set to a negative
value, register it back into the ipcns notifier chain.

A new interface has been added to the notification mechanism:
notifier_chain_cond_register() registers a notifier block only if not already
registered. With that new interface we avoid taking care of the states
changes in procfs.

Signed-off-by: Nadia Derbey
Cc: Yasunori Goto
Cc: Matt Helsley
Cc: Mingming Cao
Cc: Pierre Peiffer
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Nadia Derbey
2008-04-29 23:06:13 +0800
91cfb2b4b ipc: do not recompute msgmni anymore if explicitly set by user ... Browse Code »

Make msgmni not recomputed anymore upon ipc namespace creation / removal or
memory add/remove, as soon as it has been set from userland.

As soon as msgmni is explicitly set via procfs or sysctl(), the associated
callback routine is unregistered from the ipc namespace notifier chain.

Signed-off-by: Nadia Derbey
Cc: Yasunori Goto
Cc: Matt Helsley
Cc: Mingming Cao
Cc: Pierre Peiffer
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Nadia Derbey
2008-04-29 23:06:13 +0800
e2c284d8a ipc: recompute msgmni on ipc namespace creation/removal ... Browse Code »

Introduce a notification mechanism that aims at recomputing msgmni each time
an ipc namespace is created or removed.

The ipc namespace notifier chain already defined for memory hotplug management
is used for that purpose too.

Each time a new ipc namespace is allocated or an existing ipc namespace is
removed, the ipcns notifier chain is notified. The callback routine for each
registered ipc namespace is then activated in order to recompute msgmni for
that namespace.

Signed-off-by: Nadia Derbey
Cc: Yasunori Goto
Cc: Matt Helsley
Cc: Mingming Cao
Cc: Pierre Peiffer
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Nadia Derbey
2008-04-29 23:06:12 +0800
424450c1d ipc: invoke the ipcns notifier chain as a work item ... Browse Code »

Make the memory hotplug chain's mutex held for a shorter time: when memory is
offlined or onlined a work item is added to the global workqueue. When the
work item is run, it notifies the ipcns notifier chain with the
IPCNS_MEMCHANGED event.

Signed-off-by: Nadia Derbey
Cc: Yasunori Goto
Cc: Matt Helsley
Cc: Mingming Cao
Cc: Pierre Peiffer
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Nadia Derbey
2008-04-29 23:06:12 +0800
b6b337ad1 ipc: recompute msgmni on memory add / remove ... Browse Code »

Introduce the registration of a callback routine that recomputes msg_ctlmni
upon memory add / remove.

A single notifier block is registered in the hotplug memory chain for all the
ipc namespaces.

Since the ipc namespaces are not linked together, they have their own
notification chain: one notifier_block is defined per ipc namespace.

Each time an ipc namespace is created (removed) it registers (unregisters) its
notifier block in (from) the ipcns chain. The callback routine registered in
the memory chain invokes the ipcns notifier chain with the IPCNS_LOWMEM event.
Each callback routine registered in the ipcns namespace, in turn, recomputes
msgmni for the owning namespace.

Signed-off-by: Nadia Derbey
Cc: Yasunori Goto
Cc: Matt Helsley
Cc: Mingming Cao
Cc: Pierre Peiffer
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Nadia Derbey
2008-04-29 23:06:12 +0800
4d89dc6ab ipc: scale msgmni to the number of ipc namespaces ... Browse Code »

Since all the namespaces see the same amount of memory (the total one) this
patch introduces a new variable that counts the ipc namespaces and divides
msg_ctlmni by this counter.

Signed-off-by: Nadia Derbey
Cc: Yasunori Goto
Cc: Matt Helsley
Cc: Mingming Cao
Cc: Pierre Peiffer
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Nadia Derbey
2008-04-29 23:06:12 +0800
f7bf3df8b ipc: scale msgmni to the amount of lowmem ... Browse Code »

On large systems we'd like to allow a larger number of message queues. In
some cases up to 32K. However simply setting MSGMNI to a larger value may
cause problems for smaller systems.

The first patch of this series introduces a default maximum number of message
queue ids that scales with the amount of lowmem.

Since msgmni is per namespace and there is no amount of memory dedicated to
each namespace so far, the second patch of this series scales msgmni to the
number of ipc namespaces too.

Since msgmni depends on the amount of memory, it becomes necessary to
recompute it upon memory add/remove. In the 4th patch, memory hotplug
management is added: a notifier block is registered into the memory hotplug
notifier chain for the ipc subsystem. Since the ipc namespaces are not linked
together, they have their own notification chain: one notifier_block is
defined per ipc namespace. Each time an ipc namespace is created (removed) it
registers (unregisters) its notifier block in (from) the ipcns chain. The
callback routine registered in the memory chain invokes the ipcns notifier
chain with the IPCNS_MEMCHANGE event. Each callback routine registered in the
ipcns namespace, in turn, recomputes msgmni for the owning namespace.

The 5th patch makes it possible to keep the memory hotplug notifier chain's
lock for a lesser amount of time: instead of directly notifying the ipcns
notifier chain upon memory add/remove, a work item is added to the global
workqueue. When activated, this work item is the one who notifies the ipcns
notifier chain.

Since msgmni depends on the number of ipc namespaces, it becomes necessary to
recompute it upon ipc namespace creation / removal. The 6th patch uses the
ipc namespace notifier chain for that purpose: that chain is notified each
time an ipc namespace is created or removed. This makes it possible to
recompute msgmni for all the namespaces each time one of them is created or
removed.

When msgmni is explicitely set from userspace, we should avoid recomputing it
upon memory add/remove or ipcns creation/removal. This is what the 7th patch
does: it simply unregisters the ipcns callback routine as soon as msgmni has
been changed from procfs or sysctl().

Even if msgmni is set by hand, it should be possible to make it back
automatically recomputed upon memory add/remove or ipcns creation/removal.
This what is achieved in patch 8: if set to a negative value, msgmni is added
back to the ipcns notifier chain, making it automatically recomputed again.

This patch:

Compute msg_ctlmni to make it scale with the amount of lowmem. msg_ctlmni is
now set to make the message queues occupy 1/32 of the available lowmem.

Some cleaning has also been done for the MSGPOOL constant: the msgctl man page
says it's not used, but it also defines it as a size in bytes (the code
expresses it in Kbytes).

Signed-off-by: Nadia Derbey
Cc: Yasunori Goto
Cc: Matt Helsley
Cc: Mingming Cao
Cc: Pierre Peiffer
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Nadia Derbey
2008-04-29 23:06:12 +0800
48dea404e IPC: use ipc_buildid() directly from ipc_addid() ... Browse Code »

By continuing to consolidate a little the IPC code, each id can be built
directly in ipc_addid() instead of having it built from each callers of
ipc_addid()

And I also remove shm_addid() in order to have, as much as possible, the
same code for shm/sem/msg.

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Pierre Peiffer
Cc: Nadia Derbey
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Pierre Peiffer
2008-04-29 23:06:12 +0800

28 Apr, 2008

2 commits

52cd3b074 mempolicy: rework mempolicy Reference Counting [yet again] ... Browse Code »

After further discussion with Christoph Lameter, it has become clear that my
earlier attempts to clean up the mempolicy reference counting were a bit of
overkill in some areas, resulting in superflous ref/unref in what are usually
fast paths. In other areas, further inspection reveals that I botched the
unref for interleave policies.

A separate patch, suitable for upstream/stable trees, fixes up the known
errors in the previous attempt to fix reference counting.

This patch reworks the memory policy referencing counting and, one hopes,
simplifies the code. Maybe I'll get it right this time.

See the update to the numa_memory_policy.txt document for a discussion of
memory policy reference counting that motivates this patch.

Summary:

Lookup of mempolicy, based on (vma, address) need only add a reference for
shared policy, and we need only unref the policy when finished for shared
policies. So, this patch backs out all of the unneeded extra reference
counting added by my previous attempt. It then unrefs only shared policies
when we're finished with them, using the mpol_cond_put() [conditional put]
helper function introduced by this patch.

Note that shmem_swapin() calls read_swap_cache_async() with a dummy vma
containing just the policy. read_swap_cache_async() can call alloc_page_vma()
multiple times, so we can't let alloc_page_vma() unref the shared policy in
this case. To avoid this, we make a copy of any non-null shared policy and
remove the MPOL_F_SHARED flag from the copy. This copy occurs before reading
a page [or multiple pages] from swap, so the overhead should not be an issue
here.

I introduced a new static inline function "mpol_cond_copy()" to copy the
shared policy to an on-stack policy and remove the flags that would require a
conditional free. The current implementation of mpol_cond_copy() assumes that
the struct mempolicy contains no pointers to dynamically allocated structures
that must be duplicated or reference counted during copy.

Signed-off-by: Lee Schermerhorn
Cc: Christoph Lameter
Cc: David Rientjes
Cc: Mel Gorman
Cc: Andi Kleen
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Lee Schermerhorn
2008-04-28 23:58:24 +0800
ae4d8c16a mempolicy: fixup Fallback for Default Shmem Policy ... Browse Code »

get_vma_policy() is not handling fallback to task policy correctly when the
get_policy() vm_op returns NULL. The NULL overwrites the 'pol' variable that
was holding the fallback task mempolicy. So, it was falling back directly to
system default policy.

Fix get_vma_policy() to use only non-NULL policy returned from the vma
get_policy op.

shm_get_policy() was falling back to current task's mempolicy if the "backing
file system" [tmpfs vs hugetlbfs] does not support the get_policy vm_op and
the vma policy is null. This is incorrect for show_numa_maps() which is
likely querying the numa_maps of some task other than current. Remove this
fallback.

Signed-off-by: Lee Schermerhorn
Cc: Christoph Lameter
Cc: David Rientjes
Cc: Mel Gorman
Cc: Andi Kleen
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Lee Schermerhorn
2008-04-28 23:58:24 +0800