06 Sep, 2020

1 commit

  • Commit 32927393dc1c ("sysctl: pass kernel pointers to ->proc_handler")
    changed ctl_table.proc_handler to take a kernel pointer. Adjust the
    signature of proc_ipc_sem_dointvec to match ctl_table.proc_handler which
    fixes the following sparse error/warning:

    ipc/ipc_sysctl.c:94:47: warning: incorrect type in argument 3 (different address spaces)
    ipc/ipc_sysctl.c:94:47: expected void *buffer
    ipc/ipc_sysctl.c:94:47: got void [noderef] __user *buffer
    ipc/ipc_sysctl.c:194:35: warning: incorrect type in initializer (incompatible argument 3 (different address spaces))
    ipc/ipc_sysctl.c:194:35: expected int ( [usertype] *proc_handler )( ... )
    ipc/ipc_sysctl.c:194:35: got int ( * )( ... )

    Fixes: 32927393dc1c ("sysctl: pass kernel pointers to ->proc_handler")
    Signed-off-by: Tobias Klauser
    Signed-off-by: Andrew Morton
    Cc: Christoph Hellwig
    Cc: Alexander Viro
    Link: https://lkml.kernel.org/r/20200825105846.5193-1-tklauser@distanz.ch
    Signed-off-by: Linus Torvalds

    Tobias Klauser
     

27 Apr, 2020

1 commit

  • Instead of having all the sysctl handlers deal with user pointers, which
    is rather hairy in terms of the BPF interaction, copy the input to and
    from userspace in common code. This also means that the strings are
    always NUL-terminated by the common code, making the API a little bit
    safer.

    As most handler just pass through the data to one of the common handlers
    a lot of the changes are mechnical.

    Signed-off-by: Christoph Hellwig
    Acked-by: Andrey Ignatov
    Signed-off-by: Al Viro

    Christoph Hellwig
     

19 Jul, 2019

1 commit

  • In the sysctl code the proc_dointvec_minmax() function is often used to
    validate the user supplied value between an allowed range. This
    function uses the extra1 and extra2 members from struct ctl_table as
    minimum and maximum allowed value.

    On sysctl handler declaration, in every source file there are some
    readonly variables containing just an integer which address is assigned
    to the extra1 and extra2 members, so the sysctl range is enforced.

    The special values 0, 1 and INT_MAX are very often used as range
    boundary, leading duplication of variables like zero=0, one=1,
    int_max=INT_MAX in different source files:

    $ git grep -E '\.extra[12].*&(zero|one|int_max)' |wc -l
    248

    Add a const int array containing the most commonly used values, some
    macros to refer more easily to the correct array member, and use them
    instead of creating a local one for every object file.

    This is the bloat-o-meter output comparing the old and new binary
    compiled with the default Fedora config:

    # scripts/bloat-o-meter -d vmlinux.o.old vmlinux.o
    add/remove: 2/2 grow/shrink: 0/2 up/down: 24/-188 (-164)
    Data old new delta
    sysctl_vals - 12 +12
    __kstrtab_sysctl_vals - 12 +12
    max 14 10 -4
    int_max 16 - -16
    one 68 - -68
    zero 128 28 -100
    Total: Before=20583249, After=20583085, chg -0.00%

    [mcroce@redhat.com: tipc: remove two unused variables]
    Link: http://lkml.kernel.org/r/20190530091952.4108-1-mcroce@redhat.com
    [akpm@linux-foundation.org: fix net/ipv6/sysctl_net_ipv6.c]
    [arnd@arndb.de: proc/sysctl: make firmware loader table conditional]
    Link: http://lkml.kernel.org/r/20190617130014.1713870-1-arnd@arndb.de
    [akpm@linux-foundation.org: fix fs/eventpoll.c]
    Link: http://lkml.kernel.org/r/20190430180111.10688-1-mcroce@redhat.com
    Signed-off-by: Matteo Croce
    Signed-off-by: Arnd Bergmann
    Acked-by: Kees Cook
    Reviewed-by: Aaron Tomlin
    Cc: Matthew Wilcox
    Cc: Stephen Rothwell
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Matteo Croce
     

05 Jun, 2019

1 commit

  • Based on 1 normalized pattern(s):

    this program is free software you can redistribute it and or modify
    it under the terms of the gnu general public license as published by
    the free software foundation version 2 of the license

    extracted by the scancode license scanner the SPDX license identifier

    GPL-2.0-only

    has been chosen to replace the boilerplate/reference in 315 file(s).

    Signed-off-by: Thomas Gleixner
    Reviewed-by: Allison Randal
    Reviewed-by: Armijn Hemel
    Cc: linux-spdx@vger.kernel.org
    Link: https://lkml.kernel.org/r/20190531190115.503150771@linutronix.de
    Signed-off-by: Greg Kroah-Hartman

    Thomas Gleixner
     

15 May, 2019

2 commits

  • For ipcmni_extend mode, the sequence number space is only 7 bits. So
    the chance of id reuse is relatively high compared with the non-extended
    mode.

    To alleviate this id reuse problem, this patch enables cyclic allocation
    for the index to the radix tree (idx). The disadvantage is that this
    can cause a slight slow-down of the fast path, as the radix tree could
    be higher than necessary.

    To limit the radix tree height, I have chosen the following limits:
    1) The cycling is done over in_use*1.5.
    2) At least, the cycling is done over
    "normal" ipcnmi mode: RADIX_TREE_MAP_SIZE elements
    "ipcmni_extended": 4096 elements

    Result:
    - for normal mode:
    No change for 4095 active objects until the 3rd level
    is added without cyclic allocation.

    For a 2-level radix tree compared to a 1-level radix tree, I have
    observed < 1% performance impact.

    Notes:
    1) Normal "x=semget();y=semget();" is unaffected: Then the idx
    is e.g. a and a+1, regardless if idr_alloc() or idr_alloc_cyclic()
    is used.

    2) The -1% happens in a microbenchmark after this situation:
    x=semget();
    for(i=0;i<
    Acked-by: Waiman Long
    Cc: "Luis R. Rodriguez"
    Cc: Kees Cook
    Cc: Jonathan Corbet
    Cc: Al Viro
    Cc: Matthew Wilcox
    Cc: "Eric W . Biederman"
    Cc: Takashi Iwai
    Cc: Davidlohr Bueso
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Manfred Spraul
     
  • The maximum number of unique System V IPC identifiers was limited to
    32k. That limit should be big enough for most use cases.

    However, there are some users out there requesting for more, especially
    those that are migrating from Solaris which uses 24 bits for unique
    identifiers. To satisfy the need of those users, a new boot time kernel
    option "ipcmni_extend" is added to extend the IPCMNI value to 16M. This
    is a 512X increase which should be big enough for users out there that
    need a large number of unique IPC identifier.

    The use of this new option will change the pattern of the IPC
    identifiers returned by functions like shmget(2). An application that
    depends on such pattern may not work properly. So it should only be
    used if the users really need more than 32k of unique IPC numbers.

    This new option does have the side effect of reducing the maximum number
    of unique sequence numbers from 64k down to 128. So it is a trade-off.

    The computation of a new IPC id is not done in the performance critical
    path. So a little bit of additional overhead shouldn't have any real
    performance impact.

    Link: http://lkml.kernel.org/r/20190329204930.21620-1-longman@redhat.com
    Signed-off-by: Waiman Long
    Acked-by: Manfred Spraul
    Cc: Al Viro
    Cc: Davidlohr Bueso
    Cc: "Eric W . Biederman"
    Cc: Jonathan Corbet
    Cc: Kees Cook
    Cc: "Luis R. Rodriguez"
    Cc: Matthew Wilcox
    Cc: Takashi Iwai
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Waiman Long
     

31 Oct, 2018

2 commits

  • For SysV semaphores, the semmni value is the last part of the 4-element
    sem number array. To make semmni behave in a similar way to msgmni and
    shmmni, we can't directly use the _minmax handler. Instead, a special sem
    specific handler is added to check the last argument to make sure that it
    is limited to the [0, IPCMNI] range. An error will be returned if this is
    not the case.

    Link: http://lkml.kernel.org/r/1536352137-12003-3-git-send-email-longman@redhat.com
    Signed-off-by: Waiman Long
    Reviewed-by: Davidlohr Bueso
    Cc: "Eric W. Biederman"
    Cc: Jonathan Corbet
    Cc: Kees Cook
    Cc: Luis R. Rodriguez
    Cc: Matthew Wilcox
    Cc: Takashi Iwai
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Waiman Long
     
  • Patch series "ipc: IPCMNI limit check for *mni & increase that limit", v9.

    The sysctl parameters msgmni, shmmni and semmni have an inherent limit of
    IPC_MNI (32k). However, users may not be aware of that because they can
    write a value much higher than that without getting any error or
    notification. Reading the parameters back will show the newly written
    values which are not real.

    The real IPCMNI limit is now enforced to make sure that users won't put in
    an unrealistic value. The first 2 patches enforce the limits.

    There are also users out there requesting increase in the IPCMNI value.
    The last 2 patches attempt to do that by using a boot kernel parameter
    "ipcmni_extend" to increase the IPCMNI limit from 32k to 8M if the users
    really want the extended value.

    This patch (of 4):

    A user can write arbitrary integer values to msgmni and shmmni sysctl
    parameters without getting error, but the actual limit is really IPCMNI
    (32k). This can mislead users as they think they can get a value that is
    not real.

    The right limits are now set for msgmni and shmmni so that the users will
    become aware if they set a value outside of the acceptable range.

    Link: http://lkml.kernel.org/r/1536352137-12003-2-git-send-email-longman@redhat.com
    Signed-off-by: Waiman Long
    Acked-by: Luis R. Rodriguez
    Reviewed-by: Davidlohr Bueso
    Cc: Kees Cook
    Cc: Jonathan Corbet
    Cc: Matthew Wilcox
    Cc: "Eric W. Biederman"
    Cc: Takashi Iwai
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Waiman Long
     

14 Dec, 2014

1 commit

  • SysV can be abused to allocate locked kernel memory. For most systems, a
    small limit doesn't make sense, see the discussion with regards to SHMMAX.

    Therefore: increase MSGMNI to the maximum supported.

    And: If we ignore the risk of locking too much memory, then an automatic
    scaling of MSGMNI doesn't make sense. Therefore the logic can be removed.

    The code preserves auto_msgmni to avoid breaking any user space applications
    that expect that the value exists.

    Notes:
    1) If an administrator must limit the memory allocations, then he can set
    MSGMNI as necessary.

    Or he can disable sysv entirely (as e.g. done by Android).

    2) MSGMAX and MSGMNB are intentionally not increased, as these values are used
    to control latency vs. throughput:
    If MSGMNB is large, then msgsnd() just returns and more messages can be queued
    before a task switch to a task that calls msgrcv() is forced.

    [akpm@linux-foundation.org: coding-style fixes]
    Signed-off-by: Manfred Spraul
    Cc: Davidlohr Bueso
    Cc: Rafael Aquini
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Manfred Spraul
     

14 Oct, 2014

1 commit

  • proc_dointvec_minmax() returns zero if a new value has been set. So we
    don't need to check all charecters have been handled.

    Below you can find two examples. In the new value has not been handled
    properly.

    $ strace ./a.out
    open("/proc/sys/kernel/auto_msgmni", O_WRONLY) = 3
    write(3, "0\n\0", 3) = 2
    close(3) = 0
    exit_group(0)
    $ cat /sys/kernel/debug/tracing/trace

    $strace ./a.out
    open("/proc/sys/kernel/auto_msgmni", O_WRONLY) = 3
    write(3, "0\n", 2) = 2
    close(3) = 0

    $ cat /sys/kernel/debug/tracing/trace
    a.out-697 [000] .... 3280.998235: unregister_ipcns_notifier
    Cc: Mathias Krause
    Cc: Manfred Spraul
    Cc: Joe Perches
    Cc: Davidlohr Bueso
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrey Vagin
     

07 Jun, 2014

1 commit


08 Apr, 2014

1 commit


28 Jan, 2014

1 commit

  • The ipc code does not adhere the typical linux coding style.
    This patch fixes lots of simple whitespace errors.

    - mostly autogenerated by
    scripts/checkpatch.pl -f --fix \
    --types=pointer_location,spacing,space_before_tab
    - one manual fixup (keep structure members tab-aligned)
    - removal of additional space_before_tab that were not found by --fix

    Tested with some of my msg and sem test apps.

    Andrew: Could you include it in -mm and move it towards Linus' tree?

    Signed-off-by: Manfred Spraul
    Suggested-by: Li Bin
    Cc: Joe Perches
    Acked-by: Rafael Aquini
    Cc: Davidlohr Bueso
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Manfred Spraul
     

04 Nov, 2013

1 commit

  • Negative message lengths make no sense -- so don't do negative queue
    lenghts or identifier counts. Prevent them from getting negative.

    Also change the underlying data types to be unsigned to avoid hairy
    surprises with sign extensions in cases where those variables get
    evaluated in unsigned expressions with bigger data types, e.g size_t.

    In case a user still wants to have "unlimited" sizes she could just use
    INT_MAX instead.

    Signed-off-by: Mathias Krause
    Cc: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mathias Krause
     

05 Jan, 2013

1 commit

  • Add 3 new variables and sysctls to tune them (by one "next_id" variable
    for messages, semaphores and shared memory respectively). This variable
    can be used to set desired id for next allocated IPC object. By default
    it's equal to -1 and old behaviour is preserved. If this variable is
    non-negative, then desired idr will be extracted from it and used as a
    start value to search for free IDR slot.

    Notes:

    1) this patch doesn't guarantee that the new object will have desired
    id. So it's up to user space how to handle new object with wrong id.

    2) After a sucessful id allocation attempt, "next_id" will be set back
    to -1 (if it was non-negative).

    [akpm@linux-foundation.org: checkpatch fixes]
    Signed-off-by: Stanislav Kinsbursky
    Cc: Serge Hallyn
    Cc: "Eric W. Biederman"
    Cc: Pavel Emelyanov
    Cc: Al Viro
    Cc: KOSAKI Motohiro
    Cc: Michael Kerrisk
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Stanislav Kinsbursky
     

27 Jul, 2011

1 commit

  • Add support for the shm_rmid_forced sysctl. If set to 1, all shared
    memory objects in current ipc namespace will be automatically forced to
    use IPC_RMID.

    The POSIX way of handling shmem allows one to create shm objects and
    call shmdt(), leaving shm object associated with no process, thus
    consuming memory not counted via rlimits.

    With shm_rmid_forced=1 the shared memory object is counted at least for
    one process, so OOM killer may effectively kill the fat process holding
    the shared memory.

    It obviously breaks POSIX - some programs relying on the feature would
    stop working. So set shm_rmid_forced=1 only if you're sure nobody uses
    "orphaned" memory. Use shm_rmid_forced=0 by default for compatability
    reasons.

    The feature was previously impemented in -ow as a configure option.

    [akpm@linux-foundation.org: fix documentation, per Randy]
    [akpm@linux-foundation.org: fix warning]
    [akpm@linux-foundation.org: readability/conventionality tweaks]
    [akpm@linux-foundation.org: fix shm_rmid_forced/shm_forced_rmid confusion, use standard comment layout]
    Signed-off-by: Vasiliy Kulikov
    Cc: Randy Dunlap
    Cc: "Eric W. Biederman"
    Cc: "Serge E. Hallyn"
    Cc: Daniel Lezcano
    Cc: Oleg Nesterov
    Cc: Tejun Heo
    Cc: Ingo Molnar
    Cc: Alan Cox
    Cc: Solar Designer
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vasiliy Kulikov
     

12 Nov, 2009

1 commit


24 Sep, 2009

1 commit

  • It's unused.

    It isn't needed -- read or write flag is already passed and sysctl
    shouldn't care about the rest.

    It _was_ used in two places at arch/frv for some reason.

    Signed-off-by: Alexey Dobriyan
    Cc: David Howells
    Cc: "Eric W. Biederman"
    Cc: Al Viro
    Cc: Ralf Baechle
    Cc: Martin Schwidefsky
    Cc: Ingo Molnar
    Cc: "David S. Miller"
    Cc: James Morris
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
     

03 Apr, 2009

1 commit


07 Jan, 2009

1 commit


17 Oct, 2008

1 commit

  • name and nlen parameters passed to ->strategy hook are unused, remove
    them. In general ->strategy hook should know what it's doing, and don't
    do something tricky for which, say, pointer to original userspace array
    may be needed (name).

    Signed-off-by: Alexey Dobriyan
    Acked-by: David S. Miller [ networking bits ]
    Cc: Ralf Baechle
    Cc: David Howells
    Cc: Matt Mackall
    Cc: "Eric W. Biederman"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
     

26 Jul, 2008

1 commit

  • This patch proposes an alternative to the "magical
    positive-versus-negative number trick" Andrew complained about last week
    in http://lkml.org/lkml/2008/6/24/418.

    This had been introduced with the patches that scale msgmni to the amount
    of lowmem. With these patches, msgmni has a registered notification
    routine that recomputes msgmni value upon memory add/remove or ipc
    namespace creation/ removal.

    When msgmni is changed from user space (i.e. value written to the proc
    file), that notification routine is unregistered, and the way to make it
    registered back is to write a negative value into the proc file. This is
    the "magical positive-versus-negative number trick".

    To fix this, a new proc file is introduced: /proc/sys/kernel/auto_msgmni.
    This file acts as ON/OFF for msgmni automatic recomputing.

    With this patch, the process is the following:
    1) kernel boots in "automatic recomputing mode"
    /proc/sys/kernel/msgmni contains the value that has been computed (depends
    on lowmem)
    /proc/sys/kernel/automatic_msgmni contains "1"

    2) echo > /proc/sys/kernel/msgmni
    . sets msg_ctlmni to
    . de-activates automatic recomputing (i.e. if, say, some memory is added
    msgmni won't be recomputed anymore)
    . /proc/sys/kernel/automatic_msgmni now contains "0"

    3) echo "0" > /proc/sys/kernel/automatic_msgmni
    . de-activates msgmni automatic recomputing
    this has the same effect as 2) except that msg_ctlmni's value stays
    blocked at its current value)

    3) echo "1" > /proc/sys/kernel/automatic_msgmni
    . recomputes msgmni's value based on the current available memory size
    and number of ipc namespaces
    . re-activates automatic recomputing for msgmni.

    Signed-off-by: Nadia Derbey
    Cc: Solofo Ramangalahy
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nadia Derbey
     

29 Apr, 2008

2 commits

  • The enhancement as asked for by Yasunori: if msgmni is set to a negative
    value, register it back into the ipcns notifier chain.

    A new interface has been added to the notification mechanism:
    notifier_chain_cond_register() registers a notifier block only if not already
    registered. With that new interface we avoid taking care of the states
    changes in procfs.

    Signed-off-by: Nadia Derbey
    Cc: Yasunori Goto
    Cc: Matt Helsley
    Cc: Mingming Cao
    Cc: Pierre Peiffer
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nadia Derbey
     
  • Make msgmni not recomputed anymore upon ipc namespace creation / removal or
    memory add/remove, as soon as it has been set from userland.

    As soon as msgmni is explicitly set via procfs or sysctl(), the associated
    callback routine is unregistered from the ipc namespace notifier chain.

    Signed-off-by: Nadia Derbey
    Cc: Yasunori Goto
    Cc: Matt Helsley
    Cc: Mingming Cao
    Cc: Pierre Peiffer
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nadia Derbey
     

09 Feb, 2008

1 commit

  • Currently the IPC namespace management code is spread over the ipc/*.c files.
    I moved this code into ipc/namespace.c file which is compiled out when needed.

    The linux/ipc_namespace.h file is used to store the prototypes of the
    functions in namespace.c and the stubs for NAMESPACES=n case. This is done
    so, because the stub for copy_ipc_namespace requires the knowledge of the
    CLONE_NEWIPC flag, which is in sched.h. But the linux/ipc.h file itself in
    included into many many .c files via the sys.h->sem.h sequence so adding the
    sched.h into it will make all these .c depend on sched.h which is not that
    good. On the other hand the knowledge about the namespaces stuff is required
    in 4 .c files only.

    Besides, this patch compiles out some auxiliary functions from ipc/sem.c,
    msg.c and shm.c files. It turned out that moving these functions into
    namespaces.c is not that easy because they use many other calls and macros
    from the original file. Moving them would make this patch complicated. On
    the other hand all these functions can be consolidated, so I will send a
    separate patch doing this a bit later.

    Signed-off-by: Pavel Emelyanov
    Acked-by: Serge Hallyn
    Cc: Cedric Le Goater
    Cc: "Eric W. Biederman"
    Cc: Herbert Poetzl
    Cc: Kirill Korotaev
    Cc: Sukadev Bhattiprolu
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pavel Emelyanov
     

17 Oct, 2007

1 commit


15 Feb, 2007

2 commits

  • The semantic effect of insert_at_head is that it would allow new registered
    sysctl entries to override existing sysctl entries of the same name. Which is
    pain for caching and the proc interface never implemented.

    I have done an audit and discovered that none of the current users of
    register_sysctl care as (excpet for directories) they do not register
    duplicate sysctl entries.

    So this patch simply removes the support for overriding existing entries in
    the sys_sysctl interface since no one uses it or cares and it makes future
    enhancments harder.

    Signed-off-by: Eric W. Biederman
    Acked-by: Ralf Baechle
    Acked-by: Martin Schwidefsky
    Cc: Russell King
    Cc: David Howells
    Cc: "Luck, Tony"
    Cc: Ralf Baechle
    Cc: Paul Mackerras
    Cc: Martin Schwidefsky
    Cc: Andi Kleen
    Cc: Jens Axboe
    Cc: Corey Minyard
    Cc: Neil Brown
    Cc: "John W. Linville"
    Cc: James Bottomley
    Cc: Jan Kara
    Cc: Trond Myklebust
    Cc: Mark Fasheh
    Cc: David Chinner
    Cc: "David S. Miller"
    Cc: Patrick McHardy
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Eric W. Biederman
     
  • This is just a simple cleanup to keep kernel/sysctl.c from getting to crowded
    with special cases, and by keeping all of the ipc logic to together it makes
    the code a little more readable.

    [gcoady.lk@gmail.com: build fix]
    Signed-off-by: Eric W. Biederman
    Cc: Serge E. Hallyn
    Cc: Herbert Poetzl
    Cc: Kirill Korotaev
    Signed-off-by: Grant Coady
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Eric W. Biederman