03 Apr, 2009

1 commit

  • shm_get_stat() assumes idr_find(&shm_ids(ns).ipcs_idr) returns "struct
    shmid_kernel *"; all other callers assume that it returns "struct
    kern_ipc_perm *". This works because "struct kern_ipc_perm" is currently
    the first member of "struct shmid_kernel", but it would be better to use
    container_of() to prevent future breakage.
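
    A minimal sketch of the suggested pattern (the helper name shm_lookup is
    hypothetical; shm_perm is the kern_ipc_perm member of struct shmid_kernel):

    static struct shmid_kernel *shm_lookup(struct ipc_namespace *ns, int id)
    {
            struct kern_ipc_perm *ipcp = idr_find(&shm_ids(ns).ipcs_idr, id);

            if (ipcp == NULL)
                    return NULL;
            /* derive the container instead of relying on kern_ipc_perm
             * being the first member of struct shmid_kernel */
            return container_of(ipcp, struct shmid_kernel, shm_perm);
    }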

    Signed-off-by: Tony Battersby
    Cc: Jiri Olsa
    Cc: Jiri Kosina
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Tony Battersby
     

24 Mar, 2009

1 commit


11 Feb, 2009

1 commit

  • When overcommit is disabled, the core VM accounts for pages used by anonymous
    shared, private mappings and special mappings. It keeps track of VMAs that
    should be accounted for with VM_ACCOUNT and VMAs that never had a reserve
    with VM_NORESERVE.

    Overcommit for hugetlbfs is much riskier than overcommit for base pages
    due to contiguity requirements. Hugetlbfs avoids overcommitting on both
    shared and private mappings by using reservation counters that are checked
    and updated during mmap(). This ensures (within limits) that hugepages
    will still exist when faults occur later; otherwise it is too easy for
    applications to be SIGKILLed.

    As hugetlbfs makes its own reservations in a unit different from the base
    page size, VM_ACCOUNT should never be set. Even if the units were correct,
    we would double-account for the usage in both the core VM and hugetlbfs.
    VM_NORESERVE may be set, because an application can request that no
    reserves be made for hugetlbfs at the risk of getting killed later.

    With commit fc8744adc870a8d4366908221508bb113d8b72ee, VM_NORESERVE and
    VM_ACCOUNT are unconditionally set for hugetlbfs-backed mappings. This
    breaks the accounting for both the core VM and hugetlbfs and can trigger
    an OOM storm when hugepage pools are too small, as well as lockups and
    corrupted counters otherwise. This patch brings hugetlbfs more in line
    with how the core VM treats VM_NORESERVE, but prevents VM_ACCOUNT from
    being set.
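
    An illustrative sketch of the flag handling described above (not the patch
    itself; the exact call site and surrounding variables are assumed):

    /* while setting up a mapping: decide the accounting flags */
    if (file && is_file_hugepages(file)) {
            /* hugetlbfs reserves in huge-page units itself, so the core
             * VM must never account this mapping */
            vm_flags &= ~VM_ACCOUNT;
            if (flags & MAP_NORESERVE)
                    vm_flags |= VM_NORESERVE;  /* caller accepts SIGKILL risk */
    } else if (!(flags & MAP_NORESERVE)) {
            vm_flags |= VM_ACCOUNT;            /* normal overcommit accounting */
    }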

    Signed-off-by: Mel Gorman
    Signed-off-by: Linus Torvalds

    Mel Gorman
     

06 Feb, 2009

3 commits

  • Conflicts:
    fs/namei.c

    Manually merged per:

    diff --cc fs/namei.c
    index 734f2b5,bbc15c2..0000000
    --- a/fs/namei.c
    +++ b/fs/namei.c
    @@@ -860,9 -848,8 +849,10 @@@ static int __link_path_walk(const char
    nd->flags |= LOOKUP_CONTINUE;
    err = exec_permission_lite(inode);
    if (err == -EAGAIN)
    - err = vfs_permission(nd, MAY_EXEC);
    + err = inode_permission(nd->path.dentry->d_inode,
    + MAY_EXEC);
    + if (!err)
    + err = ima_path_check(&nd->path, MAY_EXEC);
    if (err)
    break;

    @@@ -1525,14 -1506,9 +1509,14 @@@ int may_open(struct path *path, int acc
    flag &= ~O_TRUNC;
    }

    - error = vfs_permission(nd, acc_mode);
    + error = inode_permission(inode, acc_mode);
    if (error)
    return error;
    +
    - error = ima_path_check(&nd->path,
    ++ error = ima_path_check(path,
    + acc_mode & (MAY_READ | MAY_WRITE | MAY_EXEC));
    + if (error)
    + return error;
    /*
    * An append-only file must be opened in append mode for writing.
    */

    Signed-off-by: James Morris

    James Morris
     
    The number of calls to ima_path_check()/ima_file_free() should be
    balanced. An extra call to fput() indicates that the file could have
    been accessed without first being measured.

    Although f_count is incremented/decremented in places other than
    fget/fput, like fget_light/fput_light and get_file, the current task
    must already hold a file refcnt. The call to __fput() is delayed until
    the refcnt becomes 0, resulting in ima_file_free() flagging any changes.

    - add a hook to increment the opencount for IPC shared memory (SYSV),
    shmat files, and /dev/zero
    - moved the NULL iint test in opencount_get()

    Signed-off-by: Mimi Zohar
    Acked-by: Serge Hallyn
    Signed-off-by: James Morris

    Mimi Zohar
     
  • shm_get_stat() assumes that the inode is a "struct shmem_inode_info",
    which is incorrect for !CONFIG_SHMEM (see fs/ramfs/inode.c:
    ramfs_get_inode() vs. mm/shmem.c: shmem_get_inode()).

    This bad assumption can cause shmctl(SHM_INFO) to lock up when
    shm_get_stat() tries to spin_lock(&info->lock). Users of !CONFIG_SHMEM
    may encounter this lockup simply by invoking the 'ipcs' command.

    Reported by Jiri Olsa back in February 2008:
    http://lkml.org/lkml/2008/2/29/74
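
    A sketch of the corrected shm_get_stat() accounting (roughly the shape of
    the fix; the hstate helpers are assumed to match mainline of that era):

    if (is_file_hugepages(shp->shm_file)) {
            struct address_space *mapping = inode->i_mapping;
            struct hstate *h = hstate_file(shp->shm_file);

            *rss += pages_per_huge_page(h) * mapping->nrpages;
    } else {
    #ifdef CONFIG_SHMEM
            struct shmem_inode_info *info = SHMEM_I(inode);

            spin_lock(&info->lock);
            *rss += inode->i_mapping->nrpages;
            *swp += info->swapped;
            spin_unlock(&info->lock);
    #else
            /* ramfs-backed inode: no shmem_inode_info, no swap */
            *rss += inode->i_mapping->nrpages;
    #endif
    }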

    Signed-off-by: Tony Battersby
    Cc: Jiri Kosina
    Reported-by: Jiri Olsa
    Cc: Hugh Dickins
    Cc: [2.6.everything]
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Tony Battersby
     

01 Feb, 2009

1 commit

  • The mmap_region() code would temporarily set the VM_ACCOUNT flag for
    anonymous shared mappings just to inform shmem_zero_setup() that it
    should enable accounting for the resulting shm object. It would then
    clear the flag after calling ->mmap (for the /dev/zero case) or doing
    shmem_zero_setup() (for the MAP_ANON case).

    This not only resulted in vma merge issues, but also made for unnecessary
    confusion. Use the already-existing VM_NORESERVE flag for this instead,
    and let shmem_{zero|file}_setup() just figure it out from that.

    This also happens to make it obvious that the new DRI2 GEM layer uses a
    non-reserving backing store for its object allocation - which is quite
    possibly not intentional. But since I didn't want to change semantics
    in this patch, I left it alone, and just updated the caller to use the
    new flag semantics.
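
    A sketch of the idea: the accounting decision moves into shmem itself and
    keys off VM_NORESERVE (helper and macro names approximate):

    /* account for the object size unless the caller asked for no reserve */
    static inline int shmem_acct_size(unsigned long flags, loff_t size)
    {
            return (flags & VM_NORESERVE) ?
                    0 : security_vm_enough_memory(VM_ACCT(size));
    }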

    Signed-off-by: Linus Torvalds

    Linus Torvalds
     

14 Jan, 2009

1 commit


08 Jan, 2009

1 commit

  • Make VMAs per mm_struct as for MMU-mode linux. This solves two problems:

    (1) In SYSV SHM where nattch for a segment does not reflect the number of
    shmat's (and forks) done.

    (2) In mmap() where the VMA's vm_mm is set to point to the parent mm by an
    exec'ing process when VM_EXECUTABLE is specified, regardless of the fact
    that a VMA might be shared and already have its vm_mm assigned to another
    process or a dead process.

    A new struct (vm_region) is introduced to track a mapped region and to
    remember the circumstances under which it may be shared (a sketch of it
    follows the list of changes below); the vm_list_struct structure is
    discarded as it's no longer required.

    This patch makes the following additional changes:

    (1) Regions are now allocated with alloc_pages() rather than kmalloc() and
    with no recourse to __GFP_COMP, so the pages are not composite. Instead,
    each page has a reference on it held by the region. Anything else that is
    interested in such a page will have to get a reference on it to retain it.
    When the pages are released due to unmapping, each page is passed to
    put_page() and will be freed when the page usage count reaches zero.

    (2) Excess pages are trimmed after an allocation as the allocation must be
    made as a power-of-2 quantity of pages.

    (3) VMAs are added to the parent MM's R/B tree and mmap lists. As an MM may
    end up with overlapping VMAs within the tree, the VMA struct address is
    appended to the sort key.

    (4) Non-anonymous VMAs are now added to the backing inode's prio list.

    (5) Holes may be punched in anonymous VMAs with munmap(), releasing parts of
    the backing region. The VMA and region structs will be split if
    necessary.

    (6) sys_shmdt() only releases one attachment to a SYSV IPC shared memory
    segment instead of all the attachments at that address. Multiple
    shmat()'s return the same address under NOMMU-mode instead of different
    virtual addresses as under MMU-mode.

    (7) Core dumping for ELF-FDPIC requires fewer exceptions for NOMMU-mode.

    (8) /proc/maps is now the global list of mapped regions, and may list bits
    that aren't actually mapped anywhere.

    (9) /proc/meminfo gains a line (tagged "MmapCopy") that indicates the amount
    of RAM currently allocated by mmap to hold mappable regions that can't be
    mapped directly. These are copies of the backing device or file if not
    anonymous.

    These changes make NOMMU mode more similar to MMU mode. The downside is
    that NOMMU mode now requires some extra memory to track things compared
    with NOMMU mode without this patch (VMAs are no longer shared, and there
    are now region structs).
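
    An abridged sketch of the new region-tracking structure (field set and
    comments approximate, not an exact copy of the header):

    struct vm_region {
            struct rb_node  vm_rb;       /* link in the global region tree */
            unsigned long   vm_flags;    /* VMA vm_flags for this region */
            unsigned long   vm_start;    /* start address of the region */
            unsigned long   vm_end;      /* end of the mapped part */
            unsigned long   vm_top;      /* end of the allocated pages */
            unsigned long   vm_pgoff;    /* offset in vm_file of vm_start */
            struct file    *vm_file;     /* backing file, or NULL if anonymous */
            atomic_t        vm_usage;    /* region usage count (sharing) */
    };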

    Signed-off-by: David Howells
    Tested-by: Mike Frysinger
    Acked-by: Paul Mundt

    David Howells
     

07 Jan, 2009

1 commit

  • Use the macro shm_ids().

    Remove useless check for a userspace pointer, because copy_to_user()
    will check it.

    Some style cleanups.

    Signed-off-by: WANG Cong
    Cc: Nadia Derbey
    Cc: Pierre Peiffer
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    WANG Cong
     

05 Jan, 2009

1 commit


14 Nov, 2008

3 commits

  • Wrap current->cred and a few other accessors to hide their actual
    implementation.

    Signed-off-by: David Howells
    Acked-by: James Morris
    Acked-by: Serge Hallyn
    Signed-off-by: James Morris

    David Howells
     
  • Separate the task security context from task_struct. At this point, the
    security data is temporarily embedded in the task_struct with two pointers
    pointing to it.

    Note that the Alpha arch is altered as it refers to (E)UID and (E)GID in
    entry.S via asm-offsets.

    With comment fixes Signed-off-by: Marc Dionne

    Signed-off-by: David Howells
    Acked-by: James Morris
    Acked-by: Serge Hallyn
    Signed-off-by: James Morris

    David Howells
     
  • Wrap access to task credentials so that they can be separated more easily from
    the task_struct during the introduction of COW creds.

    Change most current->(|e|s|fs)[ug]id to current_(|e|s|fs)[ug]id().

    Change some task->e?[ug]id to task_e?[ug]id(). In some places it makes more
    sense to use RCU directly rather than a convenient wrapper; these will be
    addressed by later patches.
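
    For example, a permission check changes roughly like this (illustrative
    before/after; the surrounding code is assumed):

    /* before: poking at task_struct fields directly */
    if (ipcp->uid != current->euid && ipcp->cuid != current->euid &&
        !capable(CAP_SYS_ADMIN))
            return -EPERM;

    /* after: going through the credential accessor */
    uid_t euid = current_euid();

    if (ipcp->uid != euid && ipcp->cuid != euid && !capable(CAP_SYS_ADMIN))
            return -EPERM;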

    Signed-off-by: David Howells
    Reviewed-by: James Morris
    Acked-by: Serge Hallyn
    Signed-off-by: James Morris

    David Howells
     

21 Oct, 2008

1 commit


20 Oct, 2008

1 commit

  • Shmem segments locked into memory via shmctl(SHM_LOCKED) should not be
    kept on the normal LRU, since scanning them is a waste of time and might
    throw off kswapd's balancing algorithms. Place them on the unevictable
    LRU list instead.

    Use the AS_UNEVICTABLE flag to mark address_space of SHM_LOCKed shared
    memory regions as unevictable. Then these pages will be culled off the
    normal LRU lists during vmscan.

    Add a new wrapper function to clear the mapping's unevictable state
    when/if the shared memory segment is unlocked.

    Add 'scan_mapping_unevictable_page()' to mm/vmscan.c to scan all pages in
    the shmem segment's mapping [struct address_space] for evictability now
    that they're no longer locked. Pages found to be evictable are moved to
    the appropriate zone LRU list.

    Changes depend on [CONFIG_]UNEVICTABLE_LRU.
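
    A sketch of the SHM_LOCK/SHM_UNLOCK handling described above (helper names
    assumed to match mainline of that era; surrounding locking elided):

    if (lock && !(info->flags & VM_LOCKED)) {
            info->flags |= VM_LOCKED;
            mapping_set_unevictable(file->f_mapping);
    }
    if (!lock && (info->flags & VM_LOCKED)) {
            info->flags &= ~VM_LOCKED;
            mapping_clear_unevictable(file->f_mapping);
            /* rescan so already-isolated pages return to a normal LRU */
            scan_mapping_unevictable_pages(file->f_mapping);
    }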

    [kosaki.motohiro@jp.fujitsu.com: revert shm change]
    Signed-off-by: Lee Schermerhorn
    Signed-off-by: Rik van Riel
    Signed-off-by: Kosaki Motohiro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Lee Schermerhorn
     

26 Jul, 2008

1 commit

  • Remove the ipc_lock_down() routines: they used to call idr_find() locklessly
    (given that the ipc ids lock was already held), so they are not needed
    anymore.

    Signed-off-by: Nadia Derbey
    Acked-by: "Paul E. McKenney"
    Cc: Manfred Spraul
    Cc: Jim Houston
    Cc: Pierre Peiffer
    Acked-by: Rik van Riel
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nadia Derbey
     

25 Jul, 2008

1 commit

  • The goal of this patchset is to support multiple hugetlb page sizes. This
    is achieved by introducing a new struct hstate structure, which
    encapsulates the important hugetlb state and constants (eg. huge page
    size, number of huge pages currently allocated, etc).

    The hstate structure is then passed around to the code that requires
    these fields; the callers will do the right thing regardless of the
    exact hstate they are operating on.

    This patch adds the hstate structure, with a single global instance of it
    (default_hstate), and does the basic work of converting hugetlb to use the
    hstate.

    Future patches will add more hstate structures to allow for different
    hugetlbfs mounts to have different page sizes.
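
    An abridged sketch of such an hstate and the kind of helpers it enables
    (fields and helpers approximate):

    struct hstate {
            unsigned int     order;              /* huge page order */
            unsigned long    mask;               /* huge page address mask */
            unsigned long    max_huge_pages;
            unsigned long    nr_huge_pages;
            unsigned long    free_huge_pages;
            unsigned long    resv_huge_pages;
            struct list_head hugepage_freelists[MAX_NUMNODES];
    };

    static inline unsigned long huge_page_size(struct hstate *h)
    {
            return (unsigned long)PAGE_SIZE << h->order;
    }

    static inline unsigned int pages_per_huge_page(struct hstate *h)
    {
            return 1 << h->order;
    }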

    [akpm@linux-foundation.org: coding-style fixes]
    Acked-by: Adam Litke
    Acked-by: Nishanth Aravamudan
    Signed-off-by: Andi Kleen
    Signed-off-by: Nick Piggin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andi Kleen
     

13 Jun, 2008

1 commit

    sysvipc_shm_proc_show() picks between format strings (based on the
    expected maximum length of a SHM segment) in a way that prevents gcc from
    performing format checks on the seq_printf() parameters. This hid two
    format errors - shp->shm_segsz and shp->shm_nattch are both unsigned
    long, but were being printed as unsigned int and signed int respectively.
    This leads to 32-bit truncation of SHM segment sizes reported in
    /proc/sysvipc/shm. (The same applies to shm_nattch, but that's less of a
    problem for most users.)

    This patch makes the format string directly visible to gcc's format
    specifier checker, and fixes the two broken format specifiers.
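
    The underlying pattern, sketched with hypothetical format macros: a format
    string chosen through a variable is invisible to gcc's checker, while one
    chosen inline in the call is checked in both arms.

    #define SMALL_STRING "%10lu %10lu\n"
    #define BIG_STRING   "%21lu %21lu\n"

    /* hidden from gcc: the literal reaches seq_printf() via a variable */
    const char *format;

    if (sizeof(size_t) <= sizeof(int))
            format = SMALL_STRING;
    else
            format = BIG_STRING;
    seq_printf(s, format, shp->shm_segsz, shp->shm_nattch);

    /* visible to gcc: both literal arms of the ?: are format-checked */
    seq_printf(s, sizeof(size_t) <= sizeof(int) ? SMALL_STRING : BIG_STRING,
               shp->shm_segsz, shp->shm_nattch);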

    Signed-off-by: Paul Menage
    Cc: Nadia Derbey
    Cc: Manfred Spraul
    Cc: Pierre Peiffer
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Paul Menage
     

10 Jun, 2008

1 commit


29 Apr, 2008

5 commits

    semctl_down(), msgctl_down() and shmctl_down() are used to handle the
    same set of commands for each kind of IPC. They all start by doing the
    same job (they retrieve the ipc and do some permission checks) before
    handling the commands on their own.

    This patch proposes to consolidate this by moving these same pieces of code
    into one common function called ipcctl_pre_down().

    This simplifies these xxxctl_down() functions a little and improves
    maintainability.

    Signed-off-by: Pierre Peiffer
    Acked-by: Serge Hallyn
    Cc: Nadia Derbey
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pierre Peiffer
     
  • The IPC_SET command performs the same permission setting for all IPCs. This
    patch introduces a common ipc_update_perm() function to update these
    permissions and makes use of it for all IPCs.

    Signed-off-by: Pierre Peiffer
    Acked-by: Serge Hallyn
    Cc: Nadia Derbey
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pierre Peiffer
     
    All IPCs make use of an intermediate *_setbuf structure to handle the
    IPC_SET command. This is not really needed and, moreover, it complicates
    the code a little.

    This patch gets rid of the use of it and uses directly the semid64_ds/
    msgid64_ds/shmid64_ds structure.

    In addition to removing one structure declaration, it also simplifies
    and slightly improves the common 64-bit path.

    Signed-off-by: Pierre Peiffer
    Acked-by: Serge Hallyn
    Cc: Nadia Derbey
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pierre Peiffer
     
  • Currently, the way the different commands are handled in sys_shmctl introduces
    some duplicated code.

    This patch introduces the shmctl_down function to handle all the commands
    requiring the rwmutex to be taken in write mode (ie IPC_SET and IPC_RMID for
    now). It is the equivalent function of semctl_down for shared memory.

    This removes some duplicated code for handling both of these commands
    and harmonizes the way they are handled among all IPCs.

    Signed-off-by: Pierre Peiffer
    Acked-by: Serge Hallyn
    Cc: Nadia Derbey
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pierre Peiffer
     
    Continuing the consolidation of the IPC code, each id can now be built
    directly in ipc_addid() instead of being built by each caller of
    ipc_addid().

    And I also remove shm_addid() in order to have, as much as possible, the
    same code for shm/sem/msg.

    [akpm@linux-foundation.org: coding-style fixes]
    Signed-off-by: Pierre Peiffer
    Cc: Nadia Derbey
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pierre Peiffer
     

28 Apr, 2008

2 commits

    After further discussion with Christoph Lameter, it has become clear
    that my earlier attempts to clean up the mempolicy reference counting
    were a bit of overkill in some areas, resulting in superfluous ref/unref
    in what are usually fast paths. In other areas, further inspection
    reveals that I botched the unref for interleave policies.

    A separate patch, suitable for upstream/stable trees, fixes up the known
    errors in the previous attempt to fix reference counting.

    This patch reworks the memory policy reference counting and, one hopes,
    simplifies the code. Maybe I'll get it right this time.

    See the update to the numa_memory_policy.txt document for a discussion of
    memory policy reference counting that motivates this patch.

    Summary:

    Lookup of mempolicy, based on (vma, address) need only add a reference for
    shared policy, and we need only unref the policy when finished for shared
    policies. So, this patch backs out all of the unneeded extra reference
    counting added by my previous attempt. It then unrefs only shared policies
    when we're finished with them, using the mpol_cond_put() [conditional put]
    helper function introduced by this patch.

    Note that shmem_swapin() calls read_swap_cache_async() with a dummy vma
    containing just the policy. read_swap_cache_async() can call alloc_page_vma()
    multiple times, so we can't let alloc_page_vma() unref the shared policy in
    this case. To avoid this, we make a copy of any non-null shared policy and
    remove the MPOL_F_SHARED flag from the copy. This copy occurs before reading
    a page [or multiple pages] from swap, so the overhead should not be an issue
    here.

    I introduced a new static inline function "mpol_cond_copy()" to copy the
    shared policy to an on-stack policy and remove the flags that would require a
    conditional free. The current implementation of mpol_cond_copy() assumes that
    the struct mempolicy contains no pointers to dynamically allocated structures
    that must be duplicated or reference counted during copy.
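
    A sketch of the conditional-put helper described above (shape per the
    text; flag and helper names as introduced by this work):

    /* only shared policies carry a conditional reference */
    static inline int mpol_needs_cond_ref(struct mempolicy *pol)
    {
            return pol && (pol->flags & MPOL_F_SHARED);
    }

    /* drop the reference only if one was taken at lookup time */
    static inline void mpol_cond_put(struct mempolicy *pol)
    {
            if (mpol_needs_cond_ref(pol))
                    __mpol_put(pol);
    }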

    Signed-off-by: Lee Schermerhorn
    Cc: Christoph Lameter
    Cc: David Rientjes
    Cc: Mel Gorman
    Cc: Andi Kleen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Lee Schermerhorn
     
  • get_vma_policy() is not handling fallback to task policy correctly when the
    get_policy() vm_op returns NULL. The NULL overwrites the 'pol' variable that
    was holding the fallback task mempolicy. So, it was falling back directly to
    system default policy.

    Fix get_vma_policy() to use only non-NULL policy returned from the vma
    get_policy op.

    shm_get_policy() was falling back to current task's mempolicy if the "backing
    file system" [tmpfs vs hugetlbfs] does not support the get_policy vm_op and
    the vma policy is null. This is incorrect for show_numa_maps() which is
    likely querying the numa_maps of some task other than current. Remove this
    fallback.
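
    The get_vma_policy() part of the fix amounts to using the vm_op's result
    only when it is non-NULL, roughly:

    if (vma) {
            if (vma->vm_ops && vma->vm_ops->get_policy) {
                    struct mempolicy *vpol = vma->vm_ops->get_policy(vma, addr);

                    if (vpol)
                            pol = vpol;  /* otherwise keep the fallback policy */
            } else if (vma->vm_policy) {
                    pol = vma->vm_policy;
            }
    }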

    Signed-off-by: Lee Schermerhorn
    Cc: Christoph Lameter
    Cc: David Rientjes
    Cc: Mel Gorman
    Cc: Andi Kleen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Lee Schermerhorn
     

11 Mar, 2008

1 commit

  • Address 3 known bugs in the current memory policy reference counting method.
    I have a series of patches to rework the reference counting to reduce overhead
    in the allocation path. However, that series will require testing in -mm once
    I repost it.

    1) alloc_page_vma() does not release the extra reference taken for
    vma/shared mempolicy when the mode == MPOL_INTERLEAVE. This can result in
    leaking mempolicy structures. This is probably occurring, but not being
    noticed.

    Fix: add the conditional release of the reference.

    2) huge_zonelist() unconditionally releases a reference on the mempolicy when
    mode == MPOL_INTERLEAVE. This can result in decrementing the reference
    count for system default policy [should have no ill effect] or premature
    freeing of task policy. If this occurred, the next allocation using task
    mempolicy would use the freed structure and probably BUG out.

    Fix: add the necessary check to the release.

    3) The current reference counting method assumes that vma 'get_policy()'
    methods automatically add an extra reference on a non-NULL returned mempolicy.
    This is true for shmem_get_policy() used by tmpfs mappings, including
    regular page shm segments. However, SHM_HUGETLB shm's, backed by
    hugetlbfs, just use the vma policy without the extra reference. This
    results in freeing of the vma policy on the first allocation, with reuse of
    the freed mempolicy structure on subsequent allocations.

    Fix: Rather than add another condition to the conditional reference
    release, which occurs in the allocation path, just add a reference when
    returning the vma policy in shm_get_policy() to match the assumptions.
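
    A sketch of the shm_get_policy() change in fix (3), taking the extra
    reference so the caller's conditional unref stays balanced (shape
    approximate):

    static struct mempolicy *shm_get_policy(struct vm_area_struct *vma,
                                            unsigned long addr)
    {
            struct file *file = vma->vm_file;
            struct shm_file_data *sfd = shm_file_data(file);
            struct mempolicy *pol = NULL;

            if (sfd->vm_ops->get_policy)
                    pol = sfd->vm_ops->get_policy(vma, addr);
            else if (vma->vm_policy) {
                    pol = vma->vm_policy;
                    mpol_get(pol);  /* callers expect an extra reference */
            }
            return pol;
    }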

    Signed-off-by: Lee Schermerhorn
    Cc: Greg KH
    Cc: Andi Kleen
    Cc: Christoph Lameter
    Cc: Mel Gorman
    Cc: David Rientjes
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Lee Schermerhorn
     

09 Feb, 2008

3 commits

  • sem_exit_ns(), msg_exit_ns() and shm_exit_ns() are all called when an
    ipc_namespace is released to free all ipcs of each type. But in fact, they
    do the same thing: they loop around all ipcs to free them individually by
    calling a specific routine.

    This patch proposes to consolidate this by introducing a common function,
    free_ipcs(), that does the job. The specific routine to call on each
    individual ipc is passed as a parameter. For this, the ipc-specific
    'free' routines are reworked to take a generic 'struct kern_ipc_perm' as
    parameter.
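
    A sketch of such a free_ipcs() (signature as described; the loop and
    field names are assumed from the ipc code of that era):

    void free_ipcs(struct ipc_namespace *ns, struct ipc_ids *ids,
                   void (*free)(struct ipc_namespace *, struct kern_ipc_perm *))
    {
            struct kern_ipc_perm *perm;
            int next_id;
            int total, in_use;

            down_write(&ids->rw_mutex);
            in_use = ids->in_use;

            for (total = 0, next_id = 0; total < in_use; next_id++) {
                    perm = idr_find(&ids->ipcs_idr, next_id);
                    if (perm == NULL)
                            continue;
                    ipc_lock_by_ptr(perm);
                    free(ns, perm);          /* type-specific teardown */
                    total++;
            }
            up_write(&ids->rw_mutex);
    }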

    Signed-off-by: Pierre Peiffer
    Cc: Cedric Le Goater
    Cc: Pavel Emelyanov
    Cc: Nadia Derbey
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pierre Peiffer
     
    Each ipc_namespace contains a table of 3 pointers to struct ipc_ids (one
    each for msg, sem and shm; the structure used to store all ipcs). These
    'struct ipc_ids' are dynamically allocated for each ipc_namespace, as is
    the ipc_namespace itself (for the init namespace, they are initialized
    with pointers to static variables instead).

    This is so for historical reasons: before idr was used to store the ipcs,
    they were stored in tables of variable length, depending on the maximum
    number of ipcs allowed. Now these 'struct ipc_ids' have a fixed size. As
    they are allocated in any case for each new ipc_namespace, there is no
    memory gain in having them allocated separately from the struct
    ipc_namespace.

    This patch proposes to make this table static in the struct ipc_namespace.
    Thus, we can allocate everything at once and get rid of all the code
    needed to allocate and free these ipc_ids separately.
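
    An abridged sketch of the resulting layout (index macro and accessor names
    assumed to match the ipc code):

    struct ipc_namespace {
            struct kref    kref;
            struct ipc_ids ids[3];   /* sem, msg and shm, no longer pointers */
            /* ... per-type limits (shm_ctlmax, msg_ctlmnb, sem_ctls, ...) ... */
    };

    #define IPC_SEM_IDS  0
    #define IPC_MSG_IDS  1
    #define IPC_SHM_IDS  2

    #define shm_ids(ns)  ((ns)->ids[IPC_SHM_IDS])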

    Signed-off-by: Pierre Peiffer
    Acked-by: Cedric Le Goater
    Cc: Pavel Emelyanov
    Cc: Nadia Derbey
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pierre Peiffer
     
    Currently the IPC namespace management code is spread over the ipc/*.c
    files. I moved this code into the ipc/namespace.c file, which is compiled
    out when it is not needed.

    The linux/ipc_namespace.h file is used to store the prototypes of the
    functions in namespace.c and the stubs for the NAMESPACES=n case. This is
    done because the stub for copy_ipc_namespace requires knowledge of the
    CLONE_NEWIPC flag, which is in sched.h. But the linux/ipc.h file itself is
    included in many, many .c files via the sys.h->sem.h sequence, so adding
    sched.h to it would make all of these .c files depend on sched.h, which is
    not that good. On the other hand, knowledge of the namespace stuff is
    required in only 4 .c files.

    Besides, this patch compiles out some auxiliary functions from ipc/sem.c,
    msg.c and shm.c. It turned out that moving these functions into
    namespace.c is not that easy, because they use many other calls and
    macros from their original files. Moving them would make this patch
    complicated. On the other hand, all these functions can be consolidated,
    so I will send a separate patch doing this a bit later.

    Signed-off-by: Pavel Emelyanov
    Acked-by: Serge Hallyn
    Cc: Cedric Le Goater
    Cc: "Eric W. Biederman"
    Cc: Herbert Poetzl
    Cc: Kirill Korotaev
    Cc: Sukadev Bhattiprolu
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pavel Emelyanov
     

07 Feb, 2008

1 commit

    In the new implementation of the [sem|shm|msg]_lock[_check]() routines, we
    use the return value of ipc_lock() in container_of() without any check.
    But ipc_lock() may return an error code, and passing that error code
    through container_of() may alter it, which we don't want.

    And in xxx_exit_ns, the pointer returned by idr_find() is of type 'struct
    kern_ipc_perm'...

    Today the code works as is because the member used in these
    container_of() calls is the first member of its container (offset == 0),
    so the error code isn't changed. But in the general case we can't count
    on this assumption, and it may later lead to a real bug if we don't
    correct this.

    Again, the proposed solution is simple and correct. But, as pointed out
    by Nadia, with this solution the same check will be done several times
    (in all sub-callers...), which is not optimal...

    Signed-off-by: Pierre Peiffer
    Cc: Nadia Derbey
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pierre Peiffer
     

20 Oct, 2007

8 commits

    With the use of idr to store the ipcs, the case where the idr cache is
    empty when idr_get_new() is called (which may happen even if we call
    idr_pre_get() beforehand) is not well handled: it lets
    semget()/shmget()/msgget() return ENOSPC when this cache is empty, which
    1) does not reflect the facts and 2) does not conform to the man pages.

    This patch fixes this by retrying the whole process of allocation in this case.
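
    A sketch of the retry (the helper name ipc_get_new_id and the surrounding
    locking are illustrative; only the control flow matters here):

    static int ipc_get_new_id(struct ipc_ids *ids, struct ipc_namespace *ns,
                              struct ipc_ops *ops, struct ipc_params *params)
    {
            int err;
    retry:
            /* refill the idr cache; another allocator may still drain it */
            if (!idr_pre_get(&ids->ipcs_idr, GFP_KERNEL))
                    return -ENOMEM;

            err = ops->getnew(ns, params);  /* ends up calling idr_get_new() */
            if (err == -EAGAIN)             /* cache was empty again: retry */
                    goto retry;

            return err;
    }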

    Signed-off-by: Pierre Peiffer
    Cc: Nadia Derbey
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pierre Peiffer
     
    Remove the unneeded parameters from the ipc_checkid() and ipc_buildid()
    interfaces.

    Signed-off-by: Nadia Derbey
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nadia Derbey
     
    This is a patch that fixes the way idr_find() used to be called in
    ipc_lock(): in all the paths that don't imply an update of the ipcs idr,
    it was called without the idr tree being locked.

    The changes are:
    . in ipc_ids, the mutex has been changed into a reader/writer semaphore.
    . ipc_lock() now takes this semaphore as a reader during the idr_find().
    . a new routine ipc_lock_down() has been defined: it doesn't take the
    semaphore, assuming that it is already held by the caller. This is the
    routine that is now called in all the update paths.

    Signed-off-by: Nadia Derbey
    Acked-by: Jarek Poplawski
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nadia Derbey
     
  • This patch fixes the wrong / obsolete comments in the ipc code. Also adds
    a missing lock around ipc_get_maxid() in shm_get_stat().

    Signed-off-by: Nadia Derbey
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nadia Derbey
     
  • This patch converts casts of struct kern_ipc_perm to
    . struct msg_queue
    . struct sem_array
    . struct shmid_kernel
    into the equivalent container_of() macro. It improves code maintenance
    because the code need not change if kern_ipc_perm is no longer at the
    beginning of the containing struct.

    Signed-off-by: Nadia Derbey
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nadia Derbey
     
  • This patch introduces a new ipc_lock_check() routine interface:
    . each time ipc_checkid() is called, this is done after calling ipc_lock().
    ipc_checkid() is now called from inside ipc_lock_check().
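
    A sketch of the combined helper (shape as described; the error code on a
    stale id is assumed):

    struct kern_ipc_perm *ipc_lock_check(struct ipc_ids *ids, int id)
    {
            struct kern_ipc_perm *out = ipc_lock(ids, id);

            if (IS_ERR(out))
                    return out;

            if (ipc_checkid(out, id)) {     /* stale id: sequence mismatch */
                    ipc_unlock(out);
                    return ERR_PTR(-EIDRM);
            }
            return out;
    }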

    [akpm@linux-foundation.org: build fix]
    [akpm@linux-foundation.org: fix RCU locking]
    Signed-off-by: Nadia Derbey
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nadia Derbey
     
  • This is a trivial patch that removes the ipc_get() routine: it is replaced
    by a call to idr_find().

    Signed-off-by: Nadia Derbey
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nadia Derbey
     
  • This patch introduces a change into the sys_msgget(), sys_semget() and
    sys_shmget() routines: they now share a common code, which is better for
    maintainability.

    Signed-off-by: Nadia Derbey
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nadia Derbey