27 Jul, 2011

1 commit

  • Add support for the shm_rmid_forced sysctl. If set to 1, all shared
    memory objects in current ipc namespace will be automatically forced to
    use IPC_RMID.

    The POSIX way of handling shmem allows one to create shm objects and
    call shmdt(), leaving shm object associated with no process, thus
    consuming memory not counted via rlimits.

    With shm_rmid_forced=1 the shared memory object is counted at least for
    one process, so OOM killer may effectively kill the fat process holding
    the shared memory.

    It obviously breaks POSIX - some programs relying on the feature would
    stop working. So set shm_rmid_forced=1 only if you're sure nobody uses
    "orphaned" memory. Use shm_rmid_forced=0 by default for compatability
    reasons.

    The feature was previously impemented in -ow as a configure option.

    [akpm@linux-foundation.org: fix documentation, per Randy]
    [akpm@linux-foundation.org: fix warning]
    [akpm@linux-foundation.org: readability/conventionality tweaks]
    [akpm@linux-foundation.org: fix shm_rmid_forced/shm_forced_rmid confusion, use standard comment layout]
    Signed-off-by: Vasiliy Kulikov
    Cc: Randy Dunlap
    Cc: "Eric W. Biederman"
    Cc: "Serge E. Hallyn"
    Cc: Daniel Lezcano
    Cc: Oleg Nesterov
    Cc: Tejun Heo
    Cc: Ingo Molnar
    Cc: Alan Cox
    Cc: Solar Designer
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vasiliy Kulikov
     

24 Mar, 2011

2 commits

  • CAP_IPC_OWNER and CAP_IPC_LOCK can be checked against current_user_ns(),
    because the resource comes from current's own ipc namespace.

    setuid/setgid are to uids in own namespace, so again checks can be against
    current_user_ns().

    Changelog:
    Jan 11: Use task_ns_capable() in place of sched_capable().
    Jan 11: Use nsown_capable() as suggested by Bastian Blank.
    Jan 11: Clarify (hopefully) some logic in futex and sched.c
    Feb 15: use ns_capable for ipc, not nsown_capable
    Feb 23: let copy_ipcs handle setting ipc_ns->user_ns
    Feb 23: pass ns down rather than taking it from current

    [akpm@linux-foundation.org: coding-style fixes]
    Signed-off-by: Serge E. Hallyn
    Acked-by: "Eric W. Biederman"
    Acked-by: Daniel Lezcano
    Acked-by: David Howells
    Cc: James Morris
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Serge E. Hallyn
     
  • Changelog:
    Feb 15: Don't set new ipc->user_ns if we didn't create a new
    ipc_ns.
    Feb 23: Move extern declaration to ipc_namespace.h, and group
    fwd declarations at top.

    Signed-off-by: Serge E. Hallyn
    Acked-by: "Eric W. Biederman"
    Acked-by: Daniel Lezcano
    Acked-by: David Howells
    Cc: James Morris
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Serge E. Hallyn
     

13 Mar, 2010

1 commit

  • Remove INIT_NSPROXY(), use C99 initializer.
    Remove INIT_IPC_NS(), INIT_NET_NS() while I'm at it.

    Note: headers trim will be done later, now it's quite pointless because
    results will be invalidated by merge window.

    Signed-off-by: Alexey Dobriyan
    Acked-by: Serge Hallyn
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
     

16 Dec, 2009

1 commit

  • We have HARD_MSGMAX lower on 64bit than on 32bit, since usually 64bit
    machines have more memory than 32bit machines.

    Making it higher on 64bit seems reasonable, and keep the original number
    on 32bit.

    Acked-by: Serge E. Hallyn
    Cc: Cedric Le Goater
    Signed-off-by: WANG Cong
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Amerigo Wang
     

19 Jun, 2009

2 commits


07 Apr, 2009

3 commits

  • Largely inspired from ipc/ipc_sysctl.c. This patch isolates the mqueue
    sysctl stuff in its own file.

    [akpm@linux-foundation.org: build fix]
    Signed-off-by: Cedric Le Goater
    Signed-off-by: Nadia Derbey
    Signed-off-by: Serge E. Hallyn
    Cc: Alexey Dobriyan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Serge E. Hallyn
     
  • Implement multiple mounts of the mqueue file system, and link it to usage
    of CLONE_NEWIPC.

    Each ipc ns has a corresponding mqueuefs superblock. When a user does
    clone(CLONE_NEWIPC) or unshare(CLONE_NEWIPC), the unshare will cause an
    internal mount of a new mqueuefs sb linked to the new ipc ns.

    When a user does 'mount -t mqueue mqueue /dev/mqueue', he mounts the
    mqueuefs superblock.

    Posix message queues can be worked with both through the mq_* system calls
    (see mq_overview(7)), and through the VFS through the mqueue mount. Any
    usage of mq_open() and friends will work with the acting task's ipc
    namespace. Any actions through the VFS will work with the mqueuefs in
    which the file was created. So if a user doesn't remount mqueuefs after
    unshare(CLONE_NEWIPC), mq_open("/ab") will not be reflected in "ls
    /dev/mqueue".

    If task a mounts mqueue for ipc_ns:1, then clones task b with a new ipcns,
    ipcns:2, and then task a is the last task in ipc_ns:1 to exit, then (1)
    ipc_ns:1 will be freed, (2) it's superblock will live on until task b
    umounts the corresponding mqueuefs, and vfs actions will continue to
    succeed, but (3) sb->s_fs_info will be NULL for the sb corresponding to
    the deceased ipc_ns:1.

    To make this happen, we must protect the ipc reference count when

    a) a task exits and drops its ipcns->count, since it might be dropping
    it to 0 and freeing the ipcns

    b) a task accesses the ipcns through its mqueuefs interface, since it
    bumps the ipcns refcount and might race with the last task in the ipcns
    exiting.

    So the kref is changed to an atomic_t so we can use
    atomic_dec_and_lock(&ns->count,mq_lock), and every access to the ipcns
    through ns = mqueuefs_sb->s_fs_info is protected by the same lock.

    Signed-off-by: Cedric Le Goater
    Signed-off-by: Serge E. Hallyn
    Cc: Alexey Dobriyan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Serge E. Hallyn
     
  • Move mqueue vfsmount plus a few tunables into the ipc_namespace struct.
    The CONFIG_IPC_NS boolean and the ipc_namespace struct will serve both the
    posix message queue namespaces and the SYSV ipc namespaces.

    The sysctl code will be fixed separately in patch 3. After just this
    patch, making a change to posix mqueue tunables always changes the values
    in the initial ipc namespace.

    Signed-off-by: Cedric Le Goater
    Signed-off-by: Serge E. Hallyn
    Cc: Alexey Dobriyan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Serge E. Hallyn
     

26 Jul, 2008

1 commit

  • This patch proposes an alternative to the "magical
    positive-versus-negative number trick" Andrew complained about last week
    in http://lkml.org/lkml/2008/6/24/418.

    This had been introduced with the patches that scale msgmni to the amount
    of lowmem. With these patches, msgmni has a registered notification
    routine that recomputes msgmni value upon memory add/remove or ipc
    namespace creation/ removal.

    When msgmni is changed from user space (i.e. value written to the proc
    file), that notification routine is unregistered, and the way to make it
    registered back is to write a negative value into the proc file. This is
    the "magical positive-versus-negative number trick".

    To fix this, a new proc file is introduced: /proc/sys/kernel/auto_msgmni.
    This file acts as ON/OFF for msgmni automatic recomputing.

    With this patch, the process is the following:
    1) kernel boots in "automatic recomputing mode"
    /proc/sys/kernel/msgmni contains the value that has been computed (depends
    on lowmem)
    /proc/sys/kernel/automatic_msgmni contains "1"

    2) echo > /proc/sys/kernel/msgmni
    . sets msg_ctlmni to
    . de-activates automatic recomputing (i.e. if, say, some memory is added
    msgmni won't be recomputed anymore)
    . /proc/sys/kernel/automatic_msgmni now contains "0"

    3) echo "0" > /proc/sys/kernel/automatic_msgmni
    . de-activates msgmni automatic recomputing
    this has the same effect as 2) except that msg_ctlmni's value stays
    blocked at its current value)

    3) echo "1" > /proc/sys/kernel/automatic_msgmni
    . recomputes msgmni's value based on the current available memory size
    and number of ipc namespaces
    . re-activates automatic recomputing for msgmni.

    Signed-off-by: Nadia Derbey
    Cc: Solofo Ramangalahy
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nadia Derbey
     

29 Apr, 2008

4 commits

  • The enhancement as asked for by Yasunori: if msgmni is set to a negative
    value, register it back into the ipcns notifier chain.

    A new interface has been added to the notification mechanism:
    notifier_chain_cond_register() registers a notifier block only if not already
    registered. With that new interface we avoid taking care of the states
    changes in procfs.

    Signed-off-by: Nadia Derbey
    Cc: Yasunori Goto
    Cc: Matt Helsley
    Cc: Mingming Cao
    Cc: Pierre Peiffer
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nadia Derbey
     
  • Introduce a notification mechanism that aims at recomputing msgmni each time
    an ipc namespace is created or removed.

    The ipc namespace notifier chain already defined for memory hotplug management
    is used for that purpose too.

    Each time a new ipc namespace is allocated or an existing ipc namespace is
    removed, the ipcns notifier chain is notified. The callback routine for each
    registered ipc namespace is then activated in order to recompute msgmni for
    that namespace.

    Signed-off-by: Nadia Derbey
    Cc: Yasunori Goto
    Cc: Matt Helsley
    Cc: Mingming Cao
    Cc: Pierre Peiffer
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nadia Derbey
     
  • Introduce the registration of a callback routine that recomputes msg_ctlmni
    upon memory add / remove.

    A single notifier block is registered in the hotplug memory chain for all the
    ipc namespaces.

    Since the ipc namespaces are not linked together, they have their own
    notification chain: one notifier_block is defined per ipc namespace.

    Each time an ipc namespace is created (removed) it registers (unregisters) its
    notifier block in (from) the ipcns chain. The callback routine registered in
    the memory chain invokes the ipcns notifier chain with the IPCNS_LOWMEM event.
    Each callback routine registered in the ipcns namespace, in turn, recomputes
    msgmni for the owning namespace.

    Signed-off-by: Nadia Derbey
    Cc: Yasunori Goto
    Cc: Matt Helsley
    Cc: Mingming Cao
    Cc: Pierre Peiffer
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nadia Derbey
     
  • Since all the namespaces see the same amount of memory (the total one) this
    patch introduces a new variable that counts the ipc namespaces and divides
    msg_ctlmni by this counter.

    Signed-off-by: Nadia Derbey
    Cc: Yasunori Goto
    Cc: Matt Helsley
    Cc: Mingming Cao
    Cc: Pierre Peiffer
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nadia Derbey
     

09 Feb, 2008

3 commits

  • sem_exit_ns(), msg_exit_ns() and shm_exit_ns() are all called when an
    ipc_namespace is released to free all ipcs of each type. But in fact, they
    do the same thing: they loop around all ipcs to free them individually by
    calling a specific routine.

    This patch proposes to consolidate this by introducing a common function,
    free_ipcs(), that do the job. The specific routine to call on each
    individual ipcs is passed as parameter. For this, these ipc-specific
    'free' routines are reworked to take a generic 'struct ipc_perm' as
    parameter.

    Signed-off-by: Pierre Peiffer
    Cc: Cedric Le Goater
    Cc: Pavel Emelyanov
    Cc: Nadia Derbey
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pierre Peiffer
     
  • Each ipc_namespace contains a table of 3 pointers to struct ipc_ids (3 for
    msg, sem and shm, structure used to store all ipcs) These 'struct ipc_ids'
    are dynamically allocated for each icp_namespace as the ipc_namespace
    itself (for the init namespace, they are initialized with pointers to
    static variables instead)

    It is so for historical reason: in fact, before the use of idr to store the
    ipcs, the ipcs were stored in tables of variable length, depending of the
    maximum number of ipc allowed. Now, these 'struct ipc_ids' have a fixed
    size. As they are allocated in any cases for each new ipc_namespace, there
    is no gain of memory in having them allocated separately of the struct
    ipc_namespace.

    This patch proposes to make this table static in the struct ipc_namespace.
    Thus, we can allocate all in once and get rid of all the code needed to
    allocate and free these ipc_ids separately.

    Signed-off-by: Pierre Peiffer
    Acked-by: Cedric Le Goater
    Cc: Pavel Emelyanov
    Cc: Nadia Derbey
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pierre Peiffer
     
  • Currently the IPC namespace management code is spread over the ipc/*.c files.
    I moved this code into ipc/namespace.c file which is compiled out when needed.

    The linux/ipc_namespace.h file is used to store the prototypes of the
    functions in namespace.c and the stubs for NAMESPACES=n case. This is done
    so, because the stub for copy_ipc_namespace requires the knowledge of the
    CLONE_NEWIPC flag, which is in sched.h. But the linux/ipc.h file itself in
    included into many many .c files via the sys.h->sem.h sequence so adding the
    sched.h into it will make all these .c depend on sched.h which is not that
    good. On the other hand the knowledge about the namespaces stuff is required
    in 4 .c files only.

    Besides, this patch compiles out some auxiliary functions from ipc/sem.c,
    msg.c and shm.c files. It turned out that moving these functions into
    namespaces.c is not that easy because they use many other calls and macros
    from the original file. Moving them would make this patch complicated. On
    the other hand all these functions can be consolidated, so I will send a
    separate patch doing this a bit later.

    Signed-off-by: Pavel Emelyanov
    Acked-by: Serge Hallyn
    Cc: Cedric Le Goater
    Cc: "Eric W. Biederman"
    Cc: Herbert Poetzl
    Cc: Kirill Korotaev
    Cc: Sukadev Bhattiprolu
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pavel Emelyanov