04 May, 2008

1 commit

  • A very small cleanup for mq_open.

    We do not have to call set_close_on_exit if we create the file
    descriptor right away with the flag set. We have a function for this
    now. The resulting code is smaller and a tiny bit faster.

    Signed-off-by: Ulrich Drepper
    Signed-off-by: Linus Torvalds

    Ulrich Drepper
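    The sketch below is a userspace analogue only. The commit changes kernel-internal code, and the helper names here are hypothetical. It contrasts two ways of getting a close-on-exec descriptor. The first creates the descriptor and then sets the flag with a second fcntl() call. The second passes O_CLOEXEC, so the flag is set at the moment the descriptor is created:

    ```c
    #define _POSIX_C_SOURCE 200809L   /* O_CLOEXEC is POSIX.1-2008 */
    #include <assert.h>
    #include <fcntl.h>
    #include <unistd.h>

    /* Old pattern: create the descriptor, then mark it close-on-exec
     * with extra fcntl() calls. */
    int open_cloexec_two_step(const char *path)
    {
        int fd = open(path, O_RDONLY);
        if (fd < 0)
            return -1;
        int flags = fcntl(fd, F_GETFD);
        if (flags < 0 || fcntl(fd, F_SETFD, flags | FD_CLOEXEC) < 0) {
            close(fd);
            return -1;
        }
        return fd;
    }

    /* New pattern: the flag is set when the descriptor is created. */
    int open_cloexec_atomic(const char *path)
    {
        return open(path, O_RDONLY | O_CLOEXEC);
    }

    /* Returns 1 if fd has FD_CLOEXEC set, 0 if not, -1 on error. */
    int has_cloexec(int fd)
    {
        int flags = fcntl(fd, F_GETFD);
        if (flags < 0)
            return -1;
        return (flags & FD_CLOEXEC) != 0;
    }
    ```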
     

29 Apr, 2008

19 commits

  • Use proc_create_data() to make sure that ->proc_fops and ->data are set up
    before the PDE is glued into the main tree.

    Signed-off-by: Denis V. Lunev
    Cc: Alexey Dobriyan
    Cc: "Eric W. Biederman"
    Cc: Nadia Derbey
    Cc: Pierre Peiffer
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Denis V. Lunev
     
  • sys_unshare(CLONE_NEWIPC) doesn't handle the undo lists properly, which can
    cause kernel memory corruption. CLONE_NEWIPC must detach from the existing
    undo lists.

    Fix, part 1: add support for sys_unshare(CLONE_SYSVSEM)

    The original reason to not support it was the potential (inevitable?)
    confusion due to the fact that sys_unshare(CLONE_SYSVSEM) has the
    inverse meaning of clone(CLONE_SYSVSEM).

    Our two most reasonable options then appear to be (1) fully support
    CLONE_SYSVSEM, or (2) continue to refuse explicit CLONE_SYSVSEM,
    but always do it anyway on unshare(CLONE_NEWIPC). This patch does
    (1).

    Changelog:
    Apr 16: SEH: switch to Manfred's alternative patch which
    removes the unshare_semundo() function which
    always refused CLONE_SYSVSEM.

    Signed-off-by: Manfred Spraul
    Signed-off-by: Serge E. Hallyn
    Acked-by: "Eric W. Biederman"
    Cc: Pavel Emelyanov
    Cc: Michael Kerrisk
    Cc: Pierre Peiffer
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Manfred Spraul
     
  • Add definitions of USHORT_MAX and related constants to the kernel. ipc uses
    them, and the slub implementation might also use them.

    [akpm@linux-foundation.org: coding-style fixes]
    Signed-off-by: Zhang Yanmin
    Reviewed-by: Christoph Lameter
    Cc: Nadia Derbey
    Cc: "Pierre Peiffer"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Zhang, Yanmin
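    This is a userspace sketch of constants of this kind. It assumes definitions in the style of the kernel's other *_MAX macros. uint16_t/int16_t stand in for the kernel's u16/s16, and the exact kernel spelling may differ:

    ```c
    #include <assert.h>
    #include <stdint.h>

    /* All ones, truncated to 16 bits: 65535. */
    #define USHORT_MAX ((uint16_t)(~0U))
    /* Largest signed 16-bit value: 32767. */
    #define SHORT_MAX  ((int16_t)(USHORT_MAX >> 1))
    /* Smallest signed 16-bit value (two's complement): -32768. */
    #define SHORT_MIN  (-SHORT_MAX - 1)

    /* Compile-time checks of the values claimed above (C11 static_assert). */
    static_assert(USHORT_MAX == 65535, "USHORT_MAX");
    static_assert(SHORT_MAX == 32767, "SHORT_MAX");
    static_assert(SHORT_MIN == -32768, "SHORT_MIN");
    ```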
     
  • semctl_down(), msgctl_down() and shmctl_down() are used to handle the same set
    of commands for each kind of IPC. They all start by doing the same job
    (retrieving the ipc and performing some permission checks) before handling
    the commands on their own.

    This patch consolidates this by moving the shared code into one common
    function, ipcctl_pre_down().

    It slightly simplifies these xxxctl_down() functions and slightly improves
    maintainability.

    Signed-off-by: Pierre Peiffer
    Acked-by: Serge Hallyn
    Cc: Nadia Derbey
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pierre Peiffer
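    This simplified sketch shows the consolidation idea. The lookup and the owner/creator/admin check live in one helper, and each type-specific handler calls it first. All structures and function names below are hypothetical stand-ins. The -EINVAL/-EPERM choices are assumptions of the sketch, not a copy of ipcctl_pre_down():

    ```c
    #include <assert.h>
    #include <errno.h>
    #include <stddef.h>

    /* Hypothetical, simplified stand-in for the kernel's per-object header. */
    struct ipc_perm_sketch {
        int id;
        unsigned int uid;   /* current owner */
        unsigned int cuid;  /* creator */
    };

    /* Hypothetical table standing in for one ipc type's id registry. */
    struct ipc_table_sketch {
        struct ipc_perm_sketch *objs;
        size_t n;
    };

    /* Shared prologue: find the object, then require owner, creator or admin.
     * Returns the object, or NULL with *err set to a negative errno. */
    static struct ipc_perm_sketch *
    pre_down_sketch(struct ipc_table_sketch *t, int id, unsigned int caller_uid,
                    int caller_is_admin, int *err)
    {
        for (size_t i = 0; i < t->n; i++) {
            struct ipc_perm_sketch *p = &t->objs[i];

            if (p->id != id)
                continue;
            if (caller_uid == p->uid || caller_uid == p->cuid || caller_is_admin) {
                *err = 0;
                return p;
            }
            *err = -EPERM;
            return NULL;
        }
        *err = -EINVAL;   /* unknown id */
        return NULL;
    }

    /* Each type-specific handler now begins with the shared prologue. */
    static int semctl_down_sketch(struct ipc_table_sketch *t, int id,
                                  unsigned int caller_uid, int caller_is_admin)
    {
        int err;
        struct ipc_perm_sketch *p =
            pre_down_sketch(t, id, caller_uid, caller_is_admin, &err);

        if (p == NULL)
            return err;
        /* ... semaphore-specific IPC_SET / IPC_RMID handling would go here ... */
        return 0;
    }
    ```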
     
  • The IPC_SET command performs the same permission setting for all IPCs. This
    patch introduces a common ipc_update_perm() function to update these
    permissions and makes use of it for all IPCs.

    Signed-off-by: Pierre Peiffer
    Acked-by: Serge Hallyn
    Cc: Nadia Derbey
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pierre Peiffer
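    This sketch shows what a shared IPC_SET permission update can look like. It assumes the update copies the owner uid/gid and only the 0777 permission bits, leaving the creator ids and other mode bits untouched. The struct names are hypothetical simplifications of ipc64_perm and kern_ipc_perm:

    ```c
    #include <assert.h>

    /* rwx for user, group and other (0777): the bits IPC_SET may change. */
    #define S_IRWXUGO 0777

    /* Hypothetical, simplified stand-ins for ipc64_perm / kern_ipc_perm. */
    struct user_perm_sketch { unsigned int uid, gid, mode; };
    struct kern_perm_sketch { unsigned int uid, gid, cuid, cgid, mode; };

    /* Shared IPC_SET step: copy owner uid/gid and the permission bits from
     * the user request; creator ids and any other mode bits are kept. */
    static void update_perm_sketch(const struct user_perm_sketch *in,
                                   struct kern_perm_sketch *out)
    {
        out->uid = in->uid;
        out->gid = in->gid;
        out->mode = (out->mode & ~S_IRWXUGO) | (in->mode & S_IRWXUGO);
    }
    ```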
     
  • All IPCs use an intermediate *_setbuf structure to handle the IPC_SET
    command. This is not really needed and, moreover, it complicates the code a
    little.

    This patch gets rid of it and uses the semid64_ds/msgid64_ds/shmid64_ds
    structures directly.

    In addition to removing one structure declaration, it also slightly
    simplifies and improves the common 64-bit path.

    Signed-off-by: Pierre Peiffer
    Acked-by: Serge Hallyn
    Cc: Nadia Derbey
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pierre Peiffer
     
  • semctl_down() takes one unused parameter: semnum. This patch proposes to get
    rid of it.

    Signed-off-by: Pierre Peiffer
    Acked-by: Serge Hallyn
    Cc: Nadia Derbey
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pierre Peiffer
     
  • semctl_down is called with the rwmutex (the one which protects the list of
    ipcs) taken in write mode.

    This patch moves the write-mode acquisition of this rwmutex inside
    semctl_down.

    This has the advantages of slightly reducing the window during which this
    rwmutex is held, clarifying sys_semctl, and finally making the behaviour
    coherent with [shm|msg]ctl_down.

    Signed-off-by: Pierre Peiffer
    Acked-by: Serge Hallyn
    Cc: Nadia Derbey
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pierre Peiffer
     
  • Currently, sys_msgctl is not easy to read.

    This patch tries to improve that by introducing the msgctl_down function to
    handle all commands requiring the rwmutex to be taken in write mode (i.e.
    IPC_SET and IPC_RMID for now). It is the equivalent of semctl_down for
    message queues.

    This greatly improves the readability of sys_msgctl and also harmonizes the
    way these commands are handled among all IPCs.

    Signed-off-by: Pierre Peiffer
    Acked-by: Serge Hallyn
    Cc: Nadia Derbey
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pierre Peiffer
     
  • Currently, the way the different commands are handled in sys_shmctl introduces
    some duplicated code.

    This patch introduces the shmctl_down function to handle all the commands
    requiring the rwmutex to be taken in write mode (i.e. IPC_SET and IPC_RMID
    for now). It is the equivalent of semctl_down for shared memory.

    This removes some duplicated code for handling both commands and
    harmonizes the way they are handled among all IPCs.

    Signed-off-by: Pierre Peiffer
    Acked-by: Serge Hallyn
    Cc: Nadia Derbey
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pierre Peiffer
     
  • Trivial patch which adds some small locking functions and uses them to
    factor out some parts of the code and make it cleaner.

    Signed-off-by: Pierre Peiffer
    Acked-by: Serge Hallyn
    Cc: Nadia Derbey
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pierre Peiffer
     
  • This is the enhancement Yasunori asked for: if msgmni is set to a negative
    value, register it back into the ipcns notifier chain.

    A new interface has been added to the notification mechanism:
    notifier_chain_cond_register() registers a notifier block only if it is not
    already registered. With that new interface we avoid having to track state
    changes in procfs.

    Signed-off-by: Nadia Derbey
    Cc: Yasunori Goto
    Cc: Matt Helsley
    Cc: Mingming Cao
    Cc: Pierre Peiffer
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nadia Derbey
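    This userspace sketch shows a conditional register on a priority-ordered, singly linked notifier chain. It assumes registration walks the chain and returns early when the block is already linked. Re-inserting a block that is already on such a list can corrupt it, for example by making a block point to itself, and the membership check avoids that. RCU and locking are omitted, and all names are hypothetical:

    ```c
    #include <assert.h>
    #include <stddef.h>

    /* Simplified notifier block: singly linked, sorted by descending priority. */
    struct notifier_block_sketch {
        int (*call)(struct notifier_block_sketch *nb, unsigned long event,
                    void *data);
        struct notifier_block_sketch *next;
        int priority;
    };

    /* Insert n in priority order unless it is already on the chain. */
    static int chain_cond_register(struct notifier_block_sketch **nl,
                                   struct notifier_block_sketch *n)
    {
        while (*nl != NULL) {
            if (*nl == n)
                return 0;             /* already registered: nothing to do */
            if (n->priority > (*nl)->priority)
                break;                /* keep the chain priority-ordered */
            nl = &(*nl)->next;
        }
        n->next = *nl;
        *nl = n;
        return 0;
    }

    static size_t chain_length(const struct notifier_block_sketch *head)
    {
        size_t len = 0;

        for (; head != NULL; head = head->next)
            len++;
        return len;
    }
    ```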
     
  • Stop recomputing msgmni upon ipc namespace creation/removal or memory
    add/remove once it has been set from userland.

    As soon as msgmni is explicitly set via procfs or sysctl(), the associated
    callback routine is unregistered from the ipc namespace notifier chain.

    Signed-off-by: Nadia Derbey
    Cc: Yasunori Goto
    Cc: Matt Helsley
    Cc: Mingming Cao
    Cc: Pierre Peiffer
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nadia Derbey
     
  • Introduce a notification mechanism that aims at recomputing msgmni each time
    an ipc namespace is created or removed.

    The ipc namespace notifier chain already defined for memory hotplug management
    is used for that purpose too.

    Each time a new ipc namespace is allocated or an existing ipc namespace is
    removed, the ipcns notifier chain is notified. The callback routine for each
    registered ipc namespace is then activated in order to recompute msgmni for
    that namespace.

    Signed-off-by: Nadia Derbey
    Cc: Yasunori Goto
    Cc: Matt Helsley
    Cc: Mingming Cao
    Cc: Pierre Peiffer
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nadia Derbey
     
  • Hold the memory hotplug chain's mutex for a shorter time: when memory is
    offlined or onlined, a work item is added to the global workqueue. When the
    work item runs, it notifies the ipcns notifier chain with the
    IPCNS_MEMCHANGED event.

    Signed-off-by: Nadia Derbey
    Cc: Yasunori Goto
    Cc: Matt Helsley
    Cc: Mingming Cao
    Cc: Pierre Peiffer
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nadia Derbey
     
  • Introduce the registration of a callback routine that recomputes msg_ctlmni
    upon memory add / remove.

    A single notifier block is registered in the hotplug memory chain for all the
    ipc namespaces.

    Since the ipc namespaces are not linked together, they have their own
    notification chain: one notifier_block is defined per ipc namespace.

    Each time an ipc namespace is created (removed) it registers (unregisters) its
    notifier block in (from) the ipcns chain. The callback routine registered in
    the memory chain invokes the ipcns notifier chain with the IPCNS_LOWMEM event.
    Each callback routine registered in the ipcns namespace, in turn, recomputes
    msgmni for the owning namespace.

    Signed-off-by: Nadia Derbey
    Cc: Yasunori Goto
    Cc: Matt Helsley
    Cc: Mingming Cao
    Cc: Pierre Peiffer
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nadia Derbey
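    This minimal single-threaded sketch follows the two-level scheme described above. One chain is shared by all namespaces, each namespace embeds one notifier block, and container_of() gets from the block back to its owner. The event constant's value, the struct names and the helpers are hypothetical, and no locking is shown:

    ```c
    #include <assert.h>
    #include <stddef.h>

    #define container_of(ptr, type, member) \
        ((type *)((char *)(ptr) - offsetof(type, member)))

    #define IPCNS_LOWMEM 1   /* arbitrary value for this sketch */

    struct nb_sketch {
        int (*call)(struct nb_sketch *nb, unsigned long event);
        struct nb_sketch *next;
    };

    /* Hypothetical namespace: each one embeds its own notifier block. */
    struct ipc_ns_sketch {
        int msg_ctlmni;
        int recomputed;          /* counts callback runs, for the demo */
        struct nb_sketch nb;
    };

    static struct nb_sketch *ipcns_chain;   /* one chain for all namespaces */

    static int ns_callback(struct nb_sketch *nb, unsigned long event)
    {
        /* Recover the owning namespace from its embedded block. */
        struct ipc_ns_sketch *ns = container_of(nb, struct ipc_ns_sketch, nb);

        if (event == IPCNS_LOWMEM)
            ns->recomputed++;    /* real code would recompute ns->msg_ctlmni */
        return 0;
    }

    /* Namespace creation registers its block in the ipcns chain. */
    static void ns_create(struct ipc_ns_sketch *ns)
    {
        ns->nb.call = ns_callback;
        ns->nb.next = ipcns_chain;
        ipcns_chain = &ns->nb;
    }

    /* Namespace removal unregisters it. */
    static void ns_remove(struct ipc_ns_sketch *ns)
    {
        for (struct nb_sketch **p = &ipcns_chain; *p != NULL; p = &(*p)->next) {
            if (*p == &ns->nb) {
                *p = ns->nb.next;
                return;
            }
        }
    }

    /* What the single memory-chain callback does: notify every namespace. */
    static void memory_changed(void)
    {
        for (struct nb_sketch *nb = ipcns_chain; nb != NULL; nb = nb->next)
            nb->call(nb, IPCNS_LOWMEM);
    }
    ```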
     
  • Since all the namespaces see the same amount of memory (the total amount),
    this patch introduces a new variable that counts the ipc namespaces and
    divides msg_ctlmni by this counter.

    Signed-off-by: Nadia Derbey
    Cc: Yasunori Goto
    Cc: Matt Helsley
    Cc: Mingming Cao
    Cc: Pierre Peiffer
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nadia Derbey
     
  • On large systems we'd like to allow a larger number of message queues. In
    some cases up to 32K. However simply setting MSGMNI to a larger value may
    cause problems for smaller systems.

    The first patch of this series introduces a default maximum number of message
    queue ids that scales with the amount of lowmem.

    Since msgmni is per namespace and there is no amount of memory dedicated to
    each namespace so far, the second patch of this series scales msgmni to the
    number of ipc namespaces too.

    Since msgmni depends on the amount of memory, it becomes necessary to
    recompute it upon memory add/remove. In the 4th patch, memory hotplug
    management is added: a notifier block is registered into the memory hotplug
    notifier chain for the ipc subsystem. Since the ipc namespaces are not linked
    together, they have their own notification chain: one notifier_block is
    defined per ipc namespace. Each time an ipc namespace is created (removed) it
    registers (unregisters) its notifier block in (from) the ipcns chain. The
    callback routine registered in the memory chain invokes the ipcns notifier
    chain with the IPCNS_MEMCHANGE event. Each callback routine registered in the
    ipcns namespace, in turn, recomputes msgmni for the owning namespace.

    The 5th patch makes it possible to hold the memory hotplug notifier chain's
    lock for a shorter time: instead of directly notifying the ipcns notifier
    chain upon memory add/remove, a work item is added to the global workqueue.
    When activated, this work item is the one that notifies the ipcns notifier
    chain.

    Since msgmni depends on the number of ipc namespaces, it becomes necessary to
    recompute it upon ipc namespace creation / removal. The 6th patch uses the
    ipc namespace notifier chain for that purpose: that chain is notified each
    time an ipc namespace is created or removed. This makes it possible to
    recompute msgmni for all the namespaces each time one of them is created or
    removed.

    When msgmni is explicitly set from userspace, we should avoid recomputing it
    upon memory add/remove or ipcns creation/removal. This is what the 7th patch
    does: it simply unregisters the ipcns callback routine as soon as msgmni has
    been changed from procfs or sysctl().

    Even if msgmni has been set by hand, it should be possible to return to
    automatic recomputation upon memory add/remove or ipcns creation/removal.
    This is what patch 8 achieves: if msgmni is set to a negative value, it is
    added back to the ipcns notifier chain, so that it is automatically
    recomputed again.

    This patch:

    Compute msg_ctlmni to make it scale with the amount of lowmem. msg_ctlmni is
    now set to make the message queues occupy 1/32 of the available lowmem.

    Some cleaning has also been done for the MSGPOOL constant: the msgctl man page
    says it's not used, but it also defines it as a size in bytes (the code
    expresses it in Kbytes).

    Signed-off-by: Nadia Derbey
    Cc: Yasunori Goto
    Cc: Matt Helsley
    Cc: Mingming Cao
    Cc: Pierre Peiffer
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nadia Derbey
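    A little arithmetic checks the scaling rule. Assume MSGMNB = 16384 bytes as the default queue size. Then 1 GiB of lowmem and a single namespace give 2^30 / 32 / 16384 = 2048 queue ids, and two namespaces halve that to 1024. The sketch also applies a floor of MSGMNI = 16 and a cap of IPCMNI = 32768 divided by the number of namespaces. Those bounds and constant values are assumptions of this sketch:

    ```c
    #include <assert.h>

    /* Constant values assumed for this sketch. */
    #define MSGMNB        16384ULL  /* default max bytes in one queue */
    #define MSGMNI        16ULL     /* floor: the classic default queue count */
    #define IPCMNI        32768ULL  /* upper bound on ids of one ipc type */
    #define MSG_MEM_SCALE 32ULL     /* queues may use 1/32 of lowmem */

    /* Size msgmni so that msgmni full-size queues fill 1/32 of lowmem,
     * then split that budget evenly across the ipc namespaces. */
    static unsigned long long msgmni_sketch(unsigned long long lowmem_bytes,
                                            unsigned long long nr_ipc_ns)
    {
        unsigned long long allowed;

        assert(nr_ipc_ns > 0);
        allowed = lowmem_bytes / MSG_MEM_SCALE / MSGMNB / nr_ipc_ns;
        if (allowed < MSGMNI)
            return MSGMNI;
        if (allowed > IPCMNI / nr_ipc_ns)
            return IPCMNI / nr_ipc_ns;
        return allowed;
    }
    ```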
     
  • Continuing to consolidate the IPC code a little, each id is now built
    directly in ipc_addid() instead of being built by each caller of
    ipc_addid().

    I also removed shm_addid() in order to have, as much as possible, the
    same code for shm/sem/msg.

    [akpm@linux-foundation.org: coding-style fixes]
    Signed-off-by: Pierre Peiffer
    Cc: Nadia Derbey
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pierre Peiffer
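    For reference, here is a sketch of the id layout this relies on. It assumes, as in kernels of this period, that an id is built as seq * SEQ_MULTIPLIER + slot with SEQ_MULTIPLIER equal to IPCMNI (32768), so both the slot and the sequence number can be recovered from the id. The function names are hypothetical:

    ```c
    #include <assert.h>

    /* Assumed value: one "row" of ids per sequence number. */
    #define SEQ_MULTIPLIER 32768

    /* Build the user-visible id from the idr slot and the sequence counter:
     * the step that is now done once, inside the addid path. */
    static int buildid_sketch(int slot, int seq)
    {
        assert(slot >= 0 && slot < SEQ_MULTIPLIER);
        return SEQ_MULTIPLIER * seq + slot;
    }

    /* The slot locates the object; the sequence lets a stale id be detected. */
    static int id_to_slot(int id) { return id % SEQ_MULTIPLIER; }
    static int id_to_seq(int id)  { return id / SEQ_MULTIPLIER; }
    ```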
     

28 Apr, 2008

2 commits

  • After further discussion with Christoph Lameter, it has become clear that my
    earlier attempts to clean up the mempolicy reference counting were a bit of
    overkill in some areas, resulting in superfluous ref/unref in what are usually
    fast paths. In other areas, further inspection reveals that I botched the
    unref for interleave policies.

    A separate patch, suitable for upstream/stable trees, fixes up the known
    errors in the previous attempt to fix reference counting.

    This patch reworks the memory policy reference counting and, one hopes,
    simplifies the code. Maybe I'll get it right this time.

    See the update to the numa_memory_policy.txt document for a discussion of
    memory policy reference counting that motivates this patch.

    Summary:

    Lookup of a mempolicy based on (vma, address) need only add a reference for
    a shared policy, and we need only unref the policy when finished for shared
    policies. So, this patch backs out all of the unneeded extra reference
    counting added by my previous attempt. It then unrefs only shared policies
    when we're finished with them, using the mpol_cond_put() [conditional put]
    helper function introduced by this patch.

    Note that shmem_swapin() calls read_swap_cache_async() with a dummy vma
    containing just the policy. read_swap_cache_async() can call alloc_page_vma()
    multiple times, so we can't let alloc_page_vma() unref the shared policy in
    this case. To avoid this, we make a copy of any non-null shared policy and
    remove the MPOL_F_SHARED flag from the copy. This copy occurs before reading
    a page [or multiple pages] from swap, so the overhead should not be an issue
    here.

    I introduced a new static inline function "mpol_cond_copy()" to copy the
    shared policy to an on-stack policy and remove the flags that would require a
    conditional free. The current implementation of mpol_cond_copy() assumes that
    the struct mempolicy contains no pointers to dynamically allocated structures
    that must be duplicated or reference counted during copy.

    Signed-off-by: Lee Schermerhorn
    Cc: Christoph Lameter
    Cc: David Rientjes
    Cc: Mel Gorman
    Cc: Andi Kleen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Lee Schermerhorn
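    This userspace sketch shows the conditional-reference idea. It assumes a policy carries a reference count and an MPOL_F_SHARED flag, whose value here is arbitrary. Only shared policies are put. The conditional copy clears the flag on a stack copy so that later conditional puts on the copy are no-ops, and, as an assumption of this sketch, it drops the shared reference at copy time. As noted above, a plain struct copy is valid only because the struct holds no pointers. Names ending in _sketch are hypothetical:

    ```c
    #include <assert.h>
    #include <stdlib.h>

    #define MPOL_F_SHARED 0x1   /* arbitrary value for this sketch */

    /* Hypothetical, simplified policy with no embedded pointers. */
    struct mempolicy_sketch {
        int refcnt;
        unsigned short mode;
        unsigned short flags;
    };

    static void mpol_put_sketch(struct mempolicy_sketch *pol)
    {
        if (--pol->refcnt == 0)
            free(pol);
    }

    /* Only shared policies carry a lookup reference that must be dropped. */
    static int needs_cond_ref(const struct mempolicy_sketch *pol)
    {
        return pol != NULL && (pol->flags & MPOL_F_SHARED);
    }

    static void mpol_cond_put_sketch(struct mempolicy_sketch *pol)
    {
        if (needs_cond_ref(pol))
            mpol_put_sketch(pol);
    }

    /* Copy a shared policy into caller storage, clear MPOL_F_SHARED on the
     * copy, and drop the shared reference. Non-shared policies pass through. */
    static struct mempolicy_sketch *
    mpol_cond_copy_sketch(struct mempolicy_sketch *to,
                          struct mempolicy_sketch *from)
    {
        if (!needs_cond_ref(from))
            return from;
        *to = *from;
        to->flags &= ~MPOL_F_SHARED;
        mpol_put_sketch(from);
        return to;
    }
    ```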
     
  • get_vma_policy() is not handling fallback to task policy correctly when the
    get_policy() vm_op returns NULL. The NULL overwrites the 'pol' variable that
    was holding the fallback task mempolicy. So, it was falling back directly to
    the system default policy.

    Fix get_vma_policy() to use only non-NULL policy returned from the vma
    get_policy op.

    shm_get_policy() was falling back to the current task's mempolicy if the
    "backing file system" [tmpfs vs hugetlbfs] does not support the get_policy
    vm_op and the vma policy is null. This is incorrect for show_numa_maps(),
    which is likely querying the numa_maps of some task other than current.
    Remove this fallback.

    Signed-off-by: Lee Schermerhorn
    Cc: Christoph Lameter
    Cc: David Rientjes
    Cc: Mel Gorman
    Cc: Andi Kleen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Lee Schermerhorn
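    This sketch shows the corrected lookup order, using hypothetical simplified types. The vma hook's result replaces the task policy only when it is non-NULL, and the system default is used only when nothing else applies:

    ```c
    #include <assert.h>
    #include <stddef.h>

    struct policy_sketch { int mode; };

    /* Hypothetical vma: a get_policy hook, a plain vma policy, or neither. */
    struct vma_sketch {
        struct policy_sketch *(*get_policy)(struct vma_sketch *vma);
        struct policy_sketch *vm_policy;
    };

    static struct policy_sketch default_policy = { 0 };

    /* A NULL from get_policy no longer clobbers the task-policy fallback. */
    static struct policy_sketch *
    get_vma_policy_sketch(struct policy_sketch *task_pol, struct vma_sketch *vma)
    {
        struct policy_sketch *pol = task_pol;

        if (vma != NULL) {
            if (vma->get_policy != NULL) {
                struct policy_sketch *vpol = vma->get_policy(vma);

                if (vpol != NULL)
                    pol = vpol;      /* override only with a real policy */
            } else if (vma->vm_policy != NULL) {
                pol = vma->vm_policy;
            }
        }
        if (pol == NULL)
            pol = &default_policy;
        return pol;
    }

    /* Example hook that, like some backing filesystems, returns no policy. */
    static struct policy_sketch *returns_null(struct vma_sketch *vma)
    {
        (void)vma;
        return NULL;
    }
    ```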
     

19 Apr, 2008

2 commits

  • This is the first really tricky patch in the series. It elevates the writer
    count on a mount each time a non-special file is opened for write.

    We used to do this in may_open(), but Miklos pointed out that __dentry_open()
    is used as well to create filps. This will cover even those cases, while a
    call in may_open() would not have.

    There is also an elevated count around the vfs_create() call in open_namei().
    See the comments for more details, but we need this to fix a 'create, remount,
    fail r/w open()' race.

    Some filesystems forgo the use of normal vfs calls to create
    struct files. Make sure that these users elevate the mnt
    writer count, because they will get __fput(), and we need
    to make sure they're balanced.

    Acked-by: Al Viro
    Signed-off-by: Christoph Hellwig
    Signed-off-by: Dave Hansen
    Signed-off-by: Andrew Morton
    Signed-off-by: Al Viro

    Dave Hansen
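    This single-threaded sketch shows the invariant that the writer count protects. It is loosely modeled on the series' mnt_want_write()/mnt_drop_write() pairing. The real counters are per-CPU and locked, while this sketch uses a plain int and hypothetical names. A write may start only on a read-write mount, and a remount to read-only fails while writers are active, which is why every elevation must be balanced by a drop:

    ```c
    #include <assert.h>
    #include <errno.h>

    /* Hypothetical mount with a writer count and a read-only flag. */
    struct mount_sketch {
        int writers;
        int readonly;
    };

    /* Taken before write access starts (e.g. opening a file for write). */
    static int want_write_sketch(struct mount_sketch *m)
    {
        if (m->readonly)
            return -EROFS;
        m->writers++;
        return 0;
    }

    /* Dropped when the write access ends (e.g. at the final fput()). */
    static void drop_write_sketch(struct mount_sketch *m)
    {
        assert(m->writers > 0);   /* an unbalanced drop is a bug */
        m->writers--;
    }

    /* A remount to read-only must fail while any writer is active. */
    static int remount_ro_sketch(struct mount_sketch *m)
    {
        if (m->writers > 0)
            return -EBUSY;
        m->readonly = 1;
        return 0;
    }
    ```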
     
  • Elevate the write count during vfs_rmdir() and vfs_unlink().

    [AV: merged rmdir and unlink parts, added missing pieces in nfsd]

    Acked-by: Serge Hallyn
    Acked-by: Al Viro
    Signed-off-by: Christoph Hellwig
    Signed-off-by: Dave Hansen
    Signed-off-by: Andrew Morton
    Signed-off-by: Al Viro

    Dave Hansen
     

11 Mar, 2008

1 commit

  • Address 3 known bugs in the current memory policy reference counting method.
    I have a series of patches to rework the reference counting to reduce overhead
    in the allocation path. However, that series will require testing in -mm once
    I repost it.

    1) alloc_page_vma() does not release the extra reference taken for
    vma/shared mempolicy when the mode == MPOL_INTERLEAVE. This can result in
    leaking mempolicy structures. This is probably occurring, but not being
    noticed.

    Fix: add the conditional release of the reference.

    2) hugezonelist unconditionally releases a reference on the mempolicy when
    mode == MPOL_INTERLEAVE. This can result in decrementing the reference
    count for system default policy [should have no ill effect] or premature
    freeing of task policy. If this occurred, the next allocation using task
    mempolicy would use the freed structure and probably BUG out.

    Fix: add the necessary check to the release.

    3) The current reference counting method assumes that vma 'get_policy()'
    methods automatically add an extra reference to a non-NULL returned mempolicy.
    This is true for shmem_get_policy() used by tmpfs mappings, including
    regular page shm segments. However, SHM_HUGETLB shm's, backed by
    hugetlbfs, just use the vma policy without the extra reference. This
    results in freeing of the vma policy on the first allocation, with reuse of
    the freed mempolicy structure on subsequent allocations.

    Fix: Rather than add another condition to the conditional reference
    release, which occurs in the allocation path, just add a reference when
    returning the vma policy in shm_get_policy() to match the assumptions.

    Signed-off-by: Lee Schermerhorn
    Cc: Greg KH
    Cc: Andi Kleen
    Cc: Christoph Lameter
    Cc: Mel Gorman
    Cc: David Rientjes
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Lee Schermerhorn
     

09 Feb, 2008

7 commits

  • When sending the pid namespaces patches I wrongly converted tsk->tgid into
    task_pid_vnr(tsk) in the mqueue code (the git id of this patch is
    b488893a390edfe027bae7a46e9af8083e740668).

    The proper behavior is to use task_tgid_vnr(tsk).

    This seems to be the only mistake of that kind.

    Signed-off-by: Pavel Emelyanov
    Cc: "Eric W. Biederman"
    Cc: Oleg Nesterov
    Cc: Balbir Singh
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pavel Emelyanov
     
  • Some time ago the xxx_vnr() calls (e.g. pid_vnr or find_task_by_vpid) were
    _all_ converted to operate on the current pid namespace. After this, each
    call like xxx_nr_ns(foo, current->nsproxy->pid_ns) is equivalent to a plain
    xxx_vnr(foo) call.

    Switch all the xxx_nr_ns() callers to use the xxx_vnr() calls where
    appropriate.

    Signed-off-by: Pavel Emelyanov
    Reviewed-by: Oleg Nesterov
    Cc: "Eric W. Biederman"
    Cc: Balbir Singh
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pavel Emelyanov
     
  • sem_exit_ns(), msg_exit_ns() and shm_exit_ns() are all called when an
    ipc_namespace is released to free all ipcs of each type. But in fact, they
    do the same thing: they loop over all ipcs to free them individually by
    calling a specific routine.

    This patch consolidates this by introducing a common function,
    free_ipcs(), that does the job. The specific routine to call on each
    individual ipc is passed as a parameter. For this, these ipc-specific
    'free' routines are reworked to take a generic 'struct ipc_perm' as
    parameter.

    Signed-off-by: Pierre Peiffer
    Cc: Cedric Le Goater
    Cc: Pavel Emelyanov
    Cc: Nadia Derbey
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pierre Peiffer
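    This sketch shows the callback-based consolidation with hypothetical simplified types. One loop walks the table. Each type-specific release routine receives the generic header and uses container_of() to recover its own structure. All names are illustrative only:

    ```c
    #include <assert.h>
    #include <stddef.h>

    #define container_of(ptr, type, member) \
        ((type *)((char *)(ptr) - offsetof(type, member)))

    /* Generic header embedded in every ipc object. */
    struct ipc_perm_sketch { int id; int freed; };

    /* Hypothetical type-specific objects embedding the generic header. */
    struct sem_array_sketch { int nsems; struct ipc_perm_sketch perm; };
    struct msg_queue_sketch { long qbytes; struct ipc_perm_sketch perm; };

    /* One loop for every ipc type; the per-type release is a callback
     * that takes the generic header. Empty slots are skipped. */
    static void free_ipcs_sketch(struct ipc_perm_sketch **ids, size_t n,
                                 void (*free_one)(struct ipc_perm_sketch *))
    {
        for (size_t i = 0; i < n; i++)
            if (ids[i] != NULL)
                free_one(ids[i]);
    }

    /* Type-specific routines recover their container from the header. */
    static void freeary_sketch(struct ipc_perm_sketch *ipcp)
    {
        struct sem_array_sketch *sma =
            container_of(ipcp, struct sem_array_sketch, perm);

        sma->nsems = 0;
        ipcp->freed = 1;
    }

    static void freeque_sketch(struct ipc_perm_sketch *ipcp)
    {
        struct msg_queue_sketch *msq =
            container_of(ipcp, struct msg_queue_sketch, perm);

        msq->qbytes = 0;
        ipcp->freed = 1;
    }
    ```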
     
  • Each ipc_namespace contains a table of 3 pointers to struct ipc_ids (one
    each for msg, sem and shm; this structure is used to store all ipcs). These
    'struct ipc_ids' are dynamically allocated for each ipc_namespace, like the
    ipc_namespace itself (for the init namespace, they are initialized with
    pointers to static variables instead).

    This is for historical reasons: before idr was used to store the ipcs, they
    were stored in tables of variable length, depending on the maximum number
    of ipcs allowed. Now these 'struct ipc_ids' have a fixed size. As they are
    allocated in any case for each new ipc_namespace, there is no memory gain
    in allocating them separately from the struct ipc_namespace.

    This patch makes this table static in the struct ipc_namespace. Thus, we
    can allocate everything at once and get rid of all the code needed to
    allocate and free these ipc_ids separately.

    Signed-off-by: Pierre Peiffer
    Acked-by: Cedric Le Goater
    Cc: Pavel Emelyanov
    Cc: Nadia Derbey
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pierre Peiffer
     
  • These commands (SEM_STAT and IPC_STAT) do essentially the same thing
    (only the meaning of the id given as input and the return value differ).
    However, for semaphores, they are handled in two different places (two
    different functions).

    This patch consolidates them for clarity by handling both commands in the
    same place in semctl_nolock(). It also removes one unused parameter of this
    function.

    Signed-off-by: Pierre Peiffer
    Cc: Nadia Derbey
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pierre Peiffer
     
  • ipc_lock_check_down(), ipc_lock_check() and ipcget() seem too large to be
    inline. Besides, inlining them gives no benefit, since they make function
    calls in any case.

    Moving them into ipc/util.c saves about 500 bytes of vmlinux and shortens
    the IPC internal API.

    $ ./scripts/bloat-o-meter vmlinux-orig vmlinux
    add/remove: 3/2 grow/shrink: 0/10 up/down: 490/-989 (-499)
    function                 old     new   delta
    ipcget                     -     392    +392
    ipc_lock_check_down        -      49     +49
    ipc_lock_check             -      49     +49
    sys_semget               119     105     -14
    sys_shmget               108      86     -22
    sys_msgget               100      78     -22
    do_msgsnd                665     631     -34
    do_msgrcv                680     644     -36
    do_shmat                 771     733     -38
    sys_msgctl              1302    1229     -73
    ipcget_new                80       -     -80
    sys_semtimedop          1534    1452     -82
    sys_semctl              2034    1922    -112
    sys_shmctl              1919    1765    -154
    ipcget_public            322       -    -322

    The ipcget() growth results from gcc inlining the currently static
    ipcget_new/_public.

    Signed-off-by: Pavel Emelyanov
    Cc: Nadia Derbey
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pavel Emelyanov
     
  • Currently the IPC namespace management code is spread over the ipc/*.c files.
    I moved this code into the ipc/namespace.c file, which is compiled out when
    not needed.

    The linux/ipc_namespace.h file is used to store the prototypes of the
    functions in namespace.c and the stubs for the NAMESPACES=n case. This is
    done because the stub for copy_ipc_namespace requires knowledge of the
    CLONE_NEWIPC flag, which is in sched.h. But the linux/ipc.h file itself is
    included into many .c files via the sys.h->sem.h sequence, so adding
    sched.h to it would make all these .c files depend on sched.h, which is not
    that good. On the other hand, knowledge of the namespaces stuff is required
    in only 4 .c files.

    Besides, this patch compiles out some auxiliary functions from ipc/sem.c,
    msg.c and shm.c. It turned out that moving these functions into
    namespace.c is not that easy, because they use many other calls and macros
    from the original file. Moving them would make this patch complicated. On
    the other hand, all these functions can be consolidated, so I will send a
    separate patch doing this a bit later.

    Signed-off-by: Pavel Emelyanov
    Acked-by: Serge Hallyn
    Cc: Cedric Le Goater
    Cc: "Eric W. Biederman"
    Cc: Herbert Poetzl
    Cc: Kirill Korotaev
    Cc: Sukadev Bhattiprolu
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pavel Emelyanov
     

07 Feb, 2008

2 commits

  • sysvipc_find_ipc() can become static.

    Signed-off-by: Adrian Bunk
    Acked-by: Nadia Derbey
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Adrian Bunk
     
  • In the new implementation of the [sem|shm|msg]_lock[_check]() routines, we
    use the return value of ipc_lock() in container_of() without any check.
    But ipc_lock() may return an error code. Using this error code in
    container_of() may alter it, and we don't want this.

    And in xxx_exit_ns, the pointer returned by idr_find is of type 'struct
    kern_ipc_perm'...

    Today, the code works as is because the member used in these
    container_of() calls is the first member of its container (offset == 0),
    so the error code isn't changed. But in the general case, we can't count
    on this assumption, and this may later lead to a real bug if we don't
    correct it.

    Again, the proposed solution is simple and correct. But, as pointed out by
    Nadia, with this solution the same check will be done several times (in
    all sub-callers...), which is not very optimal...

    Signed-off-by: Pierre Peiffer
    Cc: Nadia Derbey
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pierre Peiffer
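    This userspace sketch illustrates the problem. It re-creates ERR_PTR()/IS_ERR()/PTR_ERR() and container_of() locally, and writes container_of() with integer arithmetic to keep the demo free of undefined pointer arithmetic. When the header is not the first member, applying container_of() to an error pointer shifts the error code, while checking IS_ERR() first propagates it intact. Struct and function names are hypothetical:

    ```c
    #include <assert.h>
    #include <errno.h>
    #include <stddef.h>
    #include <stdint.h>

    #define MAX_ERRNO 4095

    /* Local re-creations of the kernel's error-pointer helpers. */
    static void *ERR_PTR(long error) { return (void *)(intptr_t)error; }
    static long PTR_ERR(const void *ptr) { return (long)(intptr_t)ptr; }
    static int IS_ERR(const void *ptr)
    {
        /* The top MAX_ERRNO addresses encode negative errno values. */
        return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
    }

    /* Integer-based container_of(), avoiding out-of-bounds pointer math. */
    #define container_of(ptr, type, member) \
        ((type *)((uintptr_t)(ptr) - offsetof(type, member)))

    struct kern_ipc_perm_sketch { int id; };

    /* The header is NOT the first member, so the offset is non-zero. */
    struct shmid_kernel_sketch {
        long size;
        struct kern_ipc_perm_sketch shm_perm;
    };

    /* Stand-in for ipc_lock(): returns the header or an error pointer. */
    static struct kern_ipc_perm_sketch *lock_sketch(struct kern_ipc_perm_sketch *p)
    {
        return p ? p : ERR_PTR(-EINVAL);
    }

    /* Buggy: applies container_of() before checking for an error. */
    static struct shmid_kernel_sketch *shm_lock_buggy(struct kern_ipc_perm_sketch *p)
    {
        return container_of(lock_sketch(p), struct shmid_kernel_sketch, shm_perm);
    }

    /* Fixed: pass the error pointer through unchanged. */
    static struct shmid_kernel_sketch *shm_lock_fixed(struct kern_ipc_perm_sketch *p)
    {
        struct kern_ipc_perm_sketch *ipcp = lock_sketch(p);

        if (IS_ERR(ipcp))
            return (struct shmid_kernel_sketch *)ipcp;
        return container_of(ipcp, struct shmid_kernel_sketch, shm_perm);
    }
    ```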
     

30 Nov, 2007

1 commit


07 Nov, 2007

1 commit

  • Commit ed6dcf4a in the history.git tree broke netlink_unicast timeouts
    by moving the schedule_timeout() call to a new function that doesn't
    propagate the remaining timeout back to the caller. This means on each
    retry we start with the full timeout again.

    ipc/mqueue.c seems to actually want to wait indefinitely so this
    behaviour is retained.

    Signed-off-by: Patrick McHardy
    Signed-off-by: David S. Miller

    Patrick McHardy
     

21 Oct, 2007

1 commit


20 Oct, 2007

3 commits

  • With the use of idr to store the ipcs, the case where the idr cache is
    empty when idr_get_new is called (this may happen even if we call
    idr_pre_get() beforehand) is not well handled: semget()/shmget()/msgget()
    return ENOSPC when this cache is empty, which (1) does not reflect the facts
    and (2) does not conform to the man pages.

    This patch fixes this by retrying the whole allocation process in this case.

    Signed-off-by: Pierre Peiffer
    Cc: Nadia Derbey
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pierre Peiffer
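    This sketch shows the retry loop using a hypothetical allocator. An emptied preallocation cache is reported as -EAGAIN, which is assumed here to mirror how the idr API of the time signaled that more preallocation was needed. When a racer drains the cache, the whole preload-and-allocate sequence is retried instead of returning a spurious ENOSPC:

    ```c
    #include <assert.h>
    #include <errno.h>

    /* Hypothetical id allocator whose preallocated node cache can be
     * drained by other users between the preload and allocate steps. */
    struct idr_sketch {
        int cache;      /* preallocated nodes available */
        int next_id;
        int steal;      /* demo only: how many times a racer drains the cache */
    };

    /* Preload step: returns 1 on success, 0 on real memory shortage. */
    static int pre_get_sketch(struct idr_sketch *idr)
    {
        idr->cache = 1;   /* pretend one node was preallocated */
        return 1;
    }

    static int get_new_sketch(struct idr_sketch *idr, int *id)
    {
        if (idr->steal > 0) {     /* simulate another task taking our node */
            idr->steal--;
            idr->cache = 0;
        }
        if (idr->cache == 0)
            return -EAGAIN;       /* empty cache: not a real "no space" */
        idr->cache--;
        *id = idr->next_id++;
        return 0;
    }

    /* The fix: an empty cache means "retry", not ENOSPC. */
    static int addid_sketch(struct idr_sketch *idr, int *id)
    {
        int err;

        do {
            if (!pre_get_sketch(idr))
                return -ENOMEM;   /* genuine allocation failure */
            err = get_new_sketch(idr, id);
        } while (err == -EAGAIN);
        return err;
    }
    ```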
     
  • Virtualization of sysv msg queues is incomplete: msg_hdrs and msg_bytes
    variables visible from userspace are global. Let's make them
    per-namespace.

    Signed-off-by: Alexey Kuznetsov
    Signed-off-by: Kirill Korotaev
    Cc: Pierre Peiffer
    Cc: Nadia Derbey
    Cc: Serge Hallyn
    Acked-by: "Eric W. Biederman"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kirill Korotaev
     
  • Some comments about sem_undo_list seem wrong.
    About the comment above unlock_semundo:
    "... If task2 now exits before task1 releases the lock (by calling
    unlock_semundo()), then task1 will never call spin_unlock(). ..."

    This is just wrong; I see no reason why task1 would not call
    spin_unlock... The rest of this comment is also wrong... Unless I
    am missing something (of course).

    Finally, the (un)lock_semundo functions are useless, so remove them
    for simplification (this avoids a useless if statement).

    Signed-off-by: Pierre Peiffer
    Cc: Nadia Derbey
    Acked-by: Serge Hallyn
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pierre Peiffer