20 Nov, 2008

1 commit

  • A problem was found while reviewing the code after Bugzilla bug
    http://bugzilla.kernel.org/show_bug.cgi?id=11796.

    In ipc_addid(), the newly allocated ipc structure is inserted into the
    ipcs tree (i.e made visible to readers) without locking it. This is not
    correct since its initialization continues after it has been inserted in
    the tree.

    This patch moves the ipc structure lock initialization + locking before
    the actual insertion.

    Signed-off-by: Nadia Derbey
    Reported-by: Clement Calmels
    Cc: Manfred Spraul
    Cc: [2.6.27.x]
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nadia Derbey
     

21 Oct, 2008

1 commit


20 Oct, 2008

2 commits

  • Increase the range of various posix message queue limits.

    Posix gives the message queue user the ability to 'trade off' the maximum
    size of messages with the number of possible messages that can be 'in
    flight'. Linux currently makes this trade off more restrictive than it
    needs to be.

    In particular, the maximum message size today can be made no smaller than
    8192. This greatly restricts those applications that would like to have
    the ability to post large numbers of very small messages.

    So this task lowers the limit that the maximum message size can be set to,
    from 8192 to 128. It also lowers the limit that the maximum #number of
    messages in flight can be set to, from 10 to 1.

    With these changes the message queue user can make better trade offs
    between #messages and message size, in order to get everything to fit
    within the setrlimit(RLIMIT_MSGQUEUE) limit for that particular user.

    This patch also applies the values in

    /proc/sys/fs/mqueue/msg_max
    /proc/sys/fs/mqueue/msgsize_max

    as the defaults for the max #messages allowed and the max message size
    allowed, respectively, for those applications that do not supply these.
    Previously, the defaults were hardwired to 10 and 8192, respectively.

    [akpm@linux-foundation.org: coding-style fixes]
    Signed-off-by: Joe Korty
    Cc: Al Viro
    Cc: Manfred Spraul
    Cc: Nadia Derbey
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Joe Korty
     
  • Shmem segments locked into memory via shmctl(SHM_LOCKED) should not be
    kept on the normal LRU, since scanning them is a waste of time and might
    throw off kswapd's balancing algorithms. Place them on the unevictable
    LRU list instead.

    Use the AS_UNEVICTABLE flag to mark address_space of SHM_LOCKed shared
    memory regions as unevictable. Then these pages will be culled off the
    normal LRU lists during vmscan.

    Add new wrapper function to clear the mapping's unevictable state when/if
    shared memory segment is munlocked.

    Add 'scan_mapping_unevictable_page()' to mm/vmscan.c to scan all pages in
    the shmem segment's mapping [struct address_space] for evictability now
    that they're no longer locked. If so, move them to the appropriate zone
    lru list.

    Changes depend on [CONFIG_]UNEVICTABLE_LRU.

    [kosaki.motohiro@jp.fujitsu.com: revert shm change]
    Signed-off-by: Lee Schermerhorn
    Signed-off-by: Rik van Riel
    Signed-off-by: Kosaki Motohiro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Lee Schermerhorn
     

17 Oct, 2008

2 commits

  • Signed-off-by: Adrian Bunk
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Adrian Bunk
     
  • name and nlen parameters passed to ->strategy hook are unused, remove
    them. In general ->strategy hook should know what it's doing, and don't
    do something tricky for which, say, pointer to original userspace array
    may be needed (name).

    Signed-off-by: Alexey Dobriyan
    Acked-by: David S. Miller [ networking bits ]
    Cc: Ralf Baechle
    Cc: David Howells
    Cc: Matt Mackall
    Cc: "Eric W. Biederman"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
     

27 Jul, 2008

2 commits

  • Incidentally, the name that gives hundreds of false positives on grep
    is not a good idea...

    Signed-off-by: Al Viro

    Al Viro
     
  • Kmem cache passed to constructor is only needed for constructors that are
    themselves multiplexeres. Nobody uses this "feature", nor does anybody uses
    passed kmem cache in non-trivial way, so pass only pointer to object.

    Non-trivial places are:
    arch/powerpc/mm/init_64.c
    arch/powerpc/mm/hugetlbpage.c

    This is flag day, yes.

    Signed-off-by: Alexey Dobriyan
    Acked-by: Pekka Enberg
    Acked-by: Christoph Lameter
    Cc: Jon Tollefson
    Cc: Nick Piggin
    Cc: Matt Mackall
    [akpm@linux-foundation.org: fix arch/powerpc/mm/hugetlbpage.c]
    [akpm@linux-foundation.org: fix mm/slab.c]
    [akpm@linux-foundation.org: fix ubifs]
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
     

26 Jul, 2008

8 commits

  • This patch proposes an alternative to the "magical
    positive-versus-negative number trick" Andrew complained about last week
    in http://lkml.org/lkml/2008/6/24/418.

    This had been introduced with the patches that scale msgmni to the amount
    of lowmem. With these patches, msgmni has a registered notification
    routine that recomputes msgmni value upon memory add/remove or ipc
    namespace creation/ removal.

    When msgmni is changed from user space (i.e. value written to the proc
    file), that notification routine is unregistered, and the way to make it
    registered back is to write a negative value into the proc file. This is
    the "magical positive-versus-negative number trick".

    To fix this, a new proc file is introduced: /proc/sys/kernel/auto_msgmni.
    This file acts as ON/OFF for msgmni automatic recomputing.

    With this patch, the process is the following:
    1) kernel boots in "automatic recomputing mode"
    /proc/sys/kernel/msgmni contains the value that has been computed (depends
    on lowmem)
    /proc/sys/kernel/automatic_msgmni contains "1"

    2) echo > /proc/sys/kernel/msgmni
    . sets msg_ctlmni to
    . de-activates automatic recomputing (i.e. if, say, some memory is added
    msgmni won't be recomputed anymore)
    . /proc/sys/kernel/automatic_msgmni now contains "0"

    3) echo "0" > /proc/sys/kernel/automatic_msgmni
    . de-activates msgmni automatic recomputing
    this has the same effect as 2) except that msg_ctlmni's value stays
    blocked at its current value)

    3) echo "1" > /proc/sys/kernel/automatic_msgmni
    . recomputes msgmni's value based on the current available memory size
    and number of ipc namespaces
    . re-activates automatic recomputing for msgmni.

    Signed-off-by: Nadia Derbey
    Cc: Solofo Ramangalahy
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nadia Derbey
     
  • Also this patch kills unneccesary trailing NULL character.

    Signed-off-by: Akinobu Mita
    Cc: Nadia Derbey
    Cc: Manfred Spraul
    Cc: Pierre Peiffer
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Akinobu Mita
     
  • The attached patch:
    - reverses the locking order of ulp->lock and sem_lock:
    Previously, it was first ulp->lock, then inside sem_lock.
    Now it's the other way around.
    - converts the undo structure to rcu.

    Benefits:
    - With the old locking order, IPC_RMID could not kfree the undo structures.
    The stale entries remained in the linked lists and were released later.
    - The patch fixes a a race in semtimedop(): if both IPC_RMID and a semget() that
    recreates exactly the same id happen between find_alloc_undo() and sem_lock,
    then semtimedop() would access already kfree'd memory.

    [akpm@linux-foundation.org: coding-style fixes]
    Signed-off-by: Manfred Spraul
    Reviewed-by: Nadia Derbey
    Cc: Pierre Peiffer
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Manfred Spraul
     
  • sem_array.sem_pending is a double linked list, the attached patch converts
    it to struct list_head.

    [akpm@linux-foundation.org: coding-style fixes]
    Signed-off-by: Manfred Spraul
    Reviewed-by: Nadia Derbey
    Cc: Pierre Peiffer
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Manfred Spraul
     
  • sem_queue.sma and sem_queue.id were never used, the attached patch removes
    them.

    Signed-off-by: Manfred Spraul
    Reviewed-by: Nadia Derbey
    Cc: Pierre Peiffer
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Manfred Spraul
     
  • The undo structures contain two linked lists, the attached patch replaces
    them with generic struct list_head lists.

    [akpm@linux-foundation.org: coding-style fixes]
    Signed-off-by: Manfred Spraul
    Cc: Nadia Derbey
    Cc: Pierre Peiffer
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Manfred Spraul
     
  • Remove the ipc_lock_down() routines: they used to call idr_find() locklessly
    (given that the ipc ids lock was already held), so they are not needed
    anymore.

    Signed-off-by: Nadia Derbey
    Acked-by: "Paul E. McKenney"
    Cc: Manfred Spraul
    Cc: Jim Houston
    Cc: Pierre Peiffer
    Acked-by: Rik van Riel
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nadia Derbey
     
  • Call idr_find() locklessly from ipc_lock(), since the idr tree is now RCU
    protected.

    Signed-off-by: Nadia Derbey
    Acked-by: "Paul E. McKenney"
    Cc: Manfred Spraul
    Cc: Jim Houston
    Cc: Pierre Peiffer
    Acked-by: Rik van Riel
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nadia Derbey
     

25 Jul, 2008

1 commit

  • The goal of this patchset is to support multiple hugetlb page sizes. This
    is achieved by introducing a new struct hstate structure, which
    encapsulates the important hugetlb state and constants (eg. huge page
    size, number of huge pages currently allocated, etc).

    The hstate structure is then passed around the code which requires these
    fields, they will do the right thing regardless of the exact hstate they
    are operating on.

    This patch adds the hstate structure, with a single global instance of it
    (default_hstate), and does the basic work of converting hugetlb to use the
    hstate.

    Future patches will add more hstate structures to allow for different
    hugetlbfs mounts to have different page sizes.

    [akpm@linux-foundation.org: coding-style fixes]
    Acked-by: Adam Litke
    Acked-by: Nishanth Aravamudan
    Signed-off-by: Andi Kleen
    Signed-off-by: Nick Piggin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andi Kleen
     

14 Jun, 2008

1 commit


13 Jun, 2008

1 commit

  • sysvipc_shm_proc_show() picks between format strings (based on the
    expected maximum length of a SHM segment) in a way that prevents gcc from
    performing format checks on the seq_printf() parameters. This hid two
    format errors - shp->shm_segsz and shp->shm_nattach are both unsigned
    long, but were being printed as unsigned int and signed int respectively.
    This leads to 32-bit truncation of SHM segment sizes reported in
    /proc/sysvipc/shm. (And for nattach, but that's less of a problem for
    most users).

    This patch makes the format string directly visible to gcc's format
    specifier checker, and fixes the two broken format specifiers.

    Signed-off-by: Paul Menage
    Cc: Nadia Derbey
    Cc: Manfred Spraul
    Cc: Pierre Peiffer
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Paul Menage
     

10 Jun, 2008

1 commit


07 Jun, 2008

1 commit

  • When posting:
    [PATCH 1/8] Scaling msgmni to the amount of lowmem
    (see http://lkml.org/lkml/2008/2/11/171), I have added a KERN_INFO message
    that is output each time msgmni is recomputed.

    In http://lkml.org/lkml/2008/4/29/575 Tony Luck complained that this
    message references an ipc namespace address that is useless.

    I first thought of using an audit_log instead of a printk, as suggested by
    Serge Hallyn. But unfortunately, we do not have any other information
    than the namespace address to provide here too. So I chose to move the
    message and output it only at boot time, removing the reference to the
    namespace.

    Signed-off-by: Nadia Derbey
    Cc: Pierre Peiffer
    Cc: Manfred Spraul
    Acked-by: Tony Luck
    Cc: "Serge E. Hallyn"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nadia Derbey
     

06 Jun, 2008

1 commit


04 May, 2008

1 commit

  • A very small cleanup for mq_open.

    We do not have to call set_close_on_exit if we create the file
    descriptor right away with the flag set. We have a function for this
    now. The resulting code is smaller and a tiny bit faster.

    Signed-off-by: Ulrich Drepper
    Signed-off-by: Linus Torvalds

    Ulrich Drepper
     

29 Apr, 2008

17 commits

  • Use proc_create_data() to make sure that ->proc_fops and ->data be setup
    before gluing PDE to main tree.

    Signed-off-by: Denis V. Lunev
    Cc: Alexey Dobriyan
    Cc: "Eric W. Biederman"
    Cc: Nadia Derbey
    Cc: Pierre Peiffer
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Denis V. Lunev
     
  • sys_unshare(CLONE_NEWIPC) doesn't handle the undo lists properly, this can
    cause a kernel memory corruption. CLONE_NEWIPC must detach from the existing
    undo lists.

    Fix, part 1: add support for sys_unshare(CLONE_SYSVSEM)

    The original reason to not support it was the potential (inevitable?)
    confusion due to the fact that sys_unshare(CLONE_SYSVSEM) has the
    inverse meaning of clone(CLONE_SYSVSEM).

    Our two most reasonable options then appear to be (1) fully support
    CLONE_SYSVSEM, or (2) continue to refuse explicit CLONE_SYSVSEM,
    but always do it anyway on unshare(CLONE_SYSVSEM). This patch does
    (1).

    Changelog:
    Apr 16: SEH: switch to Manfred's alternative patch which
    removes the unshare_semundo() function which
    always refused CLONE_SYSVSEM.

    Signed-off-by: Manfred Spraul
    Signed-off-by: Serge E. Hallyn
    Acked-by: "Eric W. Biederman"
    Cc: Pavel Emelyanov
    Cc: Michael Kerrisk
    Cc: Pierre Peiffer
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Manfred Spraul
     
  • Add definitions of USHORT_MAX and others into kernel. ipc uses it and slub
    implementation might also use it.

    [akpm@linux-foundation.org: coding-style fixes]
    Signed-off-by: Zhang Yanmin
    Reviewed-by: Christoph Lameter
    Cc: Nadia Derbey
    Cc: "Pierre Peiffer"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Zhang, Yanmin
     
  • semctl_down(), msgctl_down() and shmctl_down() are used to handle the same set
    of commands for each kind of IPC. They all start to do the same job (they
    retrieve the ipc and do some permission checks) before handling the commands
    on their own.

    This patch proposes to consolidate this by moving these same pieces of code
    into one common function called ipcctl_pre_down().

    It simplifies a little these xxxctl_down() functions and increases a little
    the maintainability.

    Signed-off-by: Pierre Peiffer
    Acked-by: Serge Hallyn
    Cc: Nadia Derbey
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pierre Peiffer
     
  • The IPC_SET command performs the same permission setting for all IPCs. This
    patch introduces a common ipc_update_perm() function to update these
    permissions and makes use of it for all IPCs.

    Signed-off-by: Pierre Peiffer
    Acked-by: Serge Hallyn
    Cc: Nadia Derbey
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pierre Peiffer
     
  • All IPCs make use of an intermetiate *_setbuf structure to handle the IPC_SET
    command. This is not really needed and, moreover, it complicates a little bit
    the code.

    This patch gets rid of the use of it and uses directly the semid64_ds/
    msgid64_ds/shmid64_ds structure.

    In addition of removing one struture declaration, it also simplifies and
    improves a little bit the common 64-bits path.

    Signed-off-by: Pierre Peiffer
    Acked-by: Serge Hallyn
    Cc: Nadia Derbey
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pierre Peiffer
     
  • semctl_down() takes one unused parameter: semnum. This patch proposes to get
    rid of it.

    Signed-off-by: Pierre Peiffer
    Acked-by: Serge Hallyn
    Cc: Nadia Derbey
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pierre Peiffer
     
  • semctl_down is called with the rwmutex (the one which protects the list of
    ipcs) taken in write mode.

    This patch moves this rwmutex taken in write-mode inside semctl_down.

    This has the advantages of reducing a little bit the window during which this
    rwmutex is taken, clarifying sys_semctl, and finally of having a coherent
    behaviour with [shm|msg]ctl_down

    Signed-off-by: Pierre Peiffer
    Acked-by: Serge Hallyn
    Cc: Nadia Derbey
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pierre Peiffer
     
  • Currently, sys_msgctl is not easy to read.

    This patch tries to improve that by introducing the msgctl_down function to
    handle all commands requiring the rwmutex to be taken in write mode (ie
    IPC_SET and IPC_RMID for now). It is the equivalent function of semctl_down
    for message queues.

    This greatly changes the readability of sys_msgctl and also harmonizes the way
    these commands are handled among all IPCs.

    Signed-off-by: Pierre Peiffer
    Acked-by: Serge Hallyn
    Cc: Nadia Derbey
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pierre Peiffer
     
  • Currently, the way the different commands are handled in sys_shmctl introduces
    some duplicated code.

    This patch introduces the shmctl_down function to handle all the commands
    requiring the rwmutex to be taken in write mode (ie IPC_SET and IPC_RMID for
    now). It is the equivalent function of semctl_down for shared memory.

    This removes some duplicated code for handling these both commands and
    harmonizes the way they are handled among all IPCs.

    Signed-off-by: Pierre Peiffer
    Acked-by: Serge Hallyn
    Cc: Nadia Derbey
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pierre Peiffer
     
  • Trivial patch which adds some small locking functions and makes use of them to
    factorize some part of the code and to make it cleaner.

    Signed-off-by: Pierre Peiffer
    Acked-by: Serge Hallyn
    Cc: Nadia Derbey
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pierre Peiffer
     
  • The enhancement as asked for by Yasunori: if msgmni is set to a negative
    value, register it back into the ipcns notifier chain.

    A new interface has been added to the notification mechanism:
    notifier_chain_cond_register() registers a notifier block only if not already
    registered. With that new interface we avoid taking care of the states
    changes in procfs.

    Signed-off-by: Nadia Derbey
    Cc: Yasunori Goto
    Cc: Matt Helsley
    Cc: Mingming Cao
    Cc: Pierre Peiffer
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nadia Derbey
     
  • Make msgmni not recomputed anymore upon ipc namespace creation / removal or
    memory add/remove, as soon as it has been set from userland.

    As soon as msgmni is explicitly set via procfs or sysctl(), the associated
    callback routine is unregistered from the ipc namespace notifier chain.

    Signed-off-by: Nadia Derbey
    Cc: Yasunori Goto
    Cc: Matt Helsley
    Cc: Mingming Cao
    Cc: Pierre Peiffer
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nadia Derbey
     
  • Introduce a notification mechanism that aims at recomputing msgmni each time
    an ipc namespace is created or removed.

    The ipc namespace notifier chain already defined for memory hotplug management
    is used for that purpose too.

    Each time a new ipc namespace is allocated or an existing ipc namespace is
    removed, the ipcns notifier chain is notified. The callback routine for each
    registered ipc namespace is then activated in order to recompute msgmni for
    that namespace.

    Signed-off-by: Nadia Derbey
    Cc: Yasunori Goto
    Cc: Matt Helsley
    Cc: Mingming Cao
    Cc: Pierre Peiffer
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nadia Derbey
     
  • Make the memory hotplug chain's mutex held for a shorter time: when memory is
    offlined or onlined a work item is added to the global workqueue. When the
    work item is run, it notifies the ipcns notifier chain with the
    IPCNS_MEMCHANGED event.

    Signed-off-by: Nadia Derbey
    Cc: Yasunori Goto
    Cc: Matt Helsley
    Cc: Mingming Cao
    Cc: Pierre Peiffer
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nadia Derbey
     
  • Introduce the registration of a callback routine that recomputes msg_ctlmni
    upon memory add / remove.

    A single notifier block is registered in the hotplug memory chain for all the
    ipc namespaces.

    Since the ipc namespaces are not linked together, they have their own
    notification chain: one notifier_block is defined per ipc namespace.

    Each time an ipc namespace is created (removed) it registers (unregisters) its
    notifier block in (from) the ipcns chain. The callback routine registered in
    the memory chain invokes the ipcns notifier chain with the IPCNS_LOWMEM event.
    Each callback routine registered in the ipcns namespace, in turn, recomputes
    msgmni for the owning namespace.

    Signed-off-by: Nadia Derbey
    Cc: Yasunori Goto
    Cc: Matt Helsley
    Cc: Mingming Cao
    Cc: Pierre Peiffer
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nadia Derbey
     
  • Since all the namespaces see the same amount of memory (the total one) this
    patch introduces a new variable that counts the ipc namespaces and divides
    msg_ctlmni by this counter.

    Signed-off-by: Nadia Derbey
    Cc: Yasunori Goto
    Cc: Matt Helsley
    Cc: Mingming Cao
    Cc: Pierre Peiffer
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nadia Derbey