24 May, 2019

1 commit

  • Based on 1 normalized pattern(s):

    this file is released under gnu general public licence version 2 or
    at your option any later version see the file copying for more
    details

    extracted by the scancode license scanner the SPDX license identifier

    GPL-2.0-or-later

    has been chosen to replace the boilerplate/reference in 1 file(s).

    Signed-off-by: Thomas Gleixner
    Reviewed-by: Kate Stewart
    Reviewed-by: Richard Fontana
    Reviewed-by: Allison Randal
    Cc: linux-spdx@vger.kernel.org
    Link: https://lkml.kernel.org/r/20190520071857.941092988@linutronix.de
    Signed-off-by: Greg Kroah-Hartman

    Thomas Gleixner
     

15 May, 2019

1 commit

  • msgctl10 of ltp triggers the following lockup When CONFIG_KASAN is
    enabled on large memory SMP systems, the pages initialization can take a
    long time, if msgctl10 requests a huge block memory, and it will block
    rcu scheduler, so release cpu actively.

    After adding schedule() in free_msg, free_msg can not be called when
    holding spinlock, so adding msg to a tmp list, and free it out of
    spinlock

    rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
    rcu: Tasks blocked on level-1 rcu_node (CPUs 16-31): P32505
    rcu: Tasks blocked on level-1 rcu_node (CPUs 48-63): P34978
    rcu: (detected by 11, t=35024 jiffies, g=44237529, q=16542267)
    msgctl10 R running task 21608 32505 2794 0x00000082
    Call Trace:
    preempt_schedule_irq+0x4c/0xb0
    retint_kernel+0x1b/0x2d
    RIP: 0010:__is_insn_slot_addr+0xfb/0x250
    Code: 82 1d 00 48 8b 9b 90 00 00 00 4c 89 f7 49 c1 ee 03 e8 59 83 1d 00 48 b8 00 00 00 00 00 fc ff df 4c 39 eb 48 89 9d 58 ff ff ff c6 04 06 f8 74 66 4c 8d 75 98 4c 89 f1 48 c1 e9 03 48 01 c8 48
    RSP: 0018:ffff88bce041f758 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff13
    RAX: dffffc0000000000 RBX: ffffffff8471bc50 RCX: ffffffff828a2a57
    RDX: dffffc0000000000 RSI: dffffc0000000000 RDI: ffff88bce041f780
    RBP: ffff88bce041f828 R08: ffffed15f3f4c5b3 R09: ffffed15f3f4c5b3
    R10: 0000000000000001 R11: ffffed15f3f4c5b2 R12: 000000318aee9b73
    R13: ffffffff8471bc50 R14: 1ffff1179c083ef0 R15: 1ffff1179c083eec
    kernel_text_address+0xc1/0x100
    __kernel_text_address+0xe/0x30
    unwind_get_return_address+0x2f/0x50
    __save_stack_trace+0x92/0x100
    create_object+0x380/0x650
    __kmalloc+0x14c/0x2b0
    load_msg+0x38/0x1a0
    do_msgsnd+0x19e/0xcf0
    do_syscall_64+0x117/0x400
    entry_SYSCALL_64_after_hwframe+0x49/0xbe

    rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
    rcu: Tasks blocked on level-1 rcu_node (CPUs 0-15): P32170
    rcu: (detected by 14, t=35016 jiffies, g=44237525, q=12423063)
    msgctl10 R running task 21608 32170 32155 0x00000082
    Call Trace:
    preempt_schedule_irq+0x4c/0xb0
    retint_kernel+0x1b/0x2d
    RIP: 0010:lock_acquire+0x4d/0x340
    Code: 48 81 ec c0 00 00 00 45 89 c6 4d 89 cf 48 8d 6c 24 20 48 89 3c 24 48 8d bb e4 0c 00 00 89 74 24 0c 48 c7 44 24 20 b3 8a b5 41 c1 ed 03 48 c7 44 24 28 b4 25 18 84 48 c7 44 24 30 d0 54 7a 82
    RSP: 0018:ffff88af83417738 EFLAGS: 00000282 ORIG_RAX: ffffffffffffff13
    RAX: dffffc0000000000 RBX: ffff88bd335f3080 RCX: 0000000000000002
    RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff88bd335f3d64
    RBP: ffff88af83417758 R08: 0000000000000000 R09: 0000000000000000
    R10: 0000000000000001 R11: ffffed13f3f745b2 R12: 0000000000000000
    R13: 0000000000000002 R14: 0000000000000000 R15: 0000000000000000
    is_bpf_text_address+0x32/0xe0
    kernel_text_address+0xec/0x100
    __kernel_text_address+0xe/0x30
    unwind_get_return_address+0x2f/0x50
    __save_stack_trace+0x92/0x100
    save_stack+0x32/0xb0
    __kasan_slab_free+0x130/0x180
    kfree+0xfa/0x2d0
    free_msg+0x24/0x50
    do_msgrcv+0x508/0xe60
    do_syscall_64+0x117/0x400
    entry_SYSCALL_64_after_hwframe+0x49/0xbe

    Davidlohr said:
    "So after releasing the lock, the msg rbtree/list is empty and new
    calls will not see those in the newly populated tmp_msg list, and
    therefore they cannot access the delayed msg freeing pointers, which
    is good. Also the fact that the node_cache is now freed before the
    actual messages seems to be harmless as this is wanted for
    msg_insert() avoiding GFP_ATOMIC allocations, and after releasing the
    info->lock the thing is freed anyway so it should not change things"

    Link: http://lkml.kernel.org/r/1552029161-4957-1-git-send-email-lirongqing@baidu.com
    Signed-off-by: Li RongQing
    Signed-off-by: Zhang Yu
    Reviewed-by: Davidlohr Bueso
    Cc: Manfred Spraul
    Cc: Arnd Bergmann
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Li Rongqing
     

09 Sep, 2017

1 commit

  • refcount_t type and corresponding API should be used instead of atomic_t
    when the variable is used as a reference counter. This allows to avoid
    accidental refcounter overflows that might lead to use-after-free
    situations.

    Link: http://lkml.kernel.org/r/1499417992-3238-2-git-send-email-elena.reshetova@intel.com
    Signed-off-by: Elena Reshetova
    Signed-off-by: Hans Liljestrand
    Signed-off-by: Kees Cook
    Signed-off-by: David Windsor
    Cc: Peter Zijlstra
    Cc: Greg Kroah-Hartman
    Cc: "Eric W. Biederman"
    Cc: Ingo Molnar
    Cc: Alexey Dobriyan
    Cc: Serge Hallyn
    Cc:
    Cc: Davidlohr Bueso
    Cc: Manfred Spraul
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Elena Reshetova
     

28 Oct, 2016

1 commit

  • When kmem accounting switched from account by default to only account if
    flagged by __GFP_ACCOUNT, IPC mqueue and messages was left out.

    The production use case at hand is that mqueues should be customizable
    via sysctls in Docker containers in a Kubernetes cluster. This can only
    be safely allowed to the users of the cluster (without the risk that
    they can cause resource shortage on a node, influencing other users'
    containers) if all resources they control are bounded, i.e. accounted
    for.

    Link: http://lkml.kernel.org/r/1476806075-1210-1-git-send-email-arozansk@redhat.com
    Signed-off-by: Aristeu Rozanski
    Reported-by: Stefan Schimanski
    Acked-by: Davidlohr Bueso
    Cc: Alexey Dobriyan
    Cc: Johannes Weiner
    Cc: Michal Hocko
    Cc: Vladimir Davydov
    Cc: Stefan Schimanski
    Cc: Michal Hocko
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Aristeu Rozanski
     

03 Aug, 2016

1 commit


07 Nov, 2015

1 commit

  • d0edd8528362 ("ipc: convert invalid scenarios to use WARN_ON") relaxed the
    nil dst parameter check, originally being a full BUG_ON. However, this
    check seems quite unnecessary when the only purpose is for
    ceckpoint/restore (MSG_COPY flag):

    o The copy variable is set initially to nil, apparently as a way of
    ensuring that prepare_copy is previously called. Which is in fact done,
    unconditionally at the beginning of do_msgrcv.

    o There is no concurrency with 'copy' (stack allocated in do_msgrcv).

    Furthermore, any errors in 'copy' (and thus prepare_copy/copy_msg) should
    always handled by IS_ERR() family. Therefore remove this check altogether
    as it can never occur with the current users.

    Signed-off-by: Davidlohr Bueso
    Cc: Stanislav Kinsbursky
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Davidlohr Bueso
     

11 Sep, 2015

1 commit

  • Considering Linus' past rants about the (ab)use of BUG in the kernel, I
    took a look at how we deal with such calls in ipc. Given that any errors
    or corruption in ipc code are most likely contained within the set of
    processes participating in the broken mechanisms, there aren't really many
    strong fatal system failure scenarios that would require a BUG call.
    Also, if something is seriously wrong, ipc might not be the place for such
    a BUG either.

    1. For example, recently, a customer hit one of these BUG_ONs in shm
    after failing shm_lock(). A busted ID imho does not merit a BUG_ON,
    and WARN would have been better.

    2. MSG_COPY functionality of posix msgrcv(2) for checkpoint/restore.
    I don't see how we can hit this anyway -- at least it should be IS_ERR.
    The 'copy' arg from do_msgrcv is always set by calling prepare_copy()
    first and foremost. We could also probably drop this check altogether.
    Either way, it does not merit a BUG_ON.

    3. No ->fault() callback for the fs getting the corresponding page --
    seems selfish to make the system unusable.

    Signed-off-by: Davidlohr Bueso
    Cc: Manfred Spraul
    Cc: Linus Torvalds
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Davidlohr Bueso
     

05 Dec, 2014

2 commits


13 Nov, 2013

1 commit

  • On 64 bit systems the test for negative message sizes is bogus as the
    size, which may be positive when evaluated as a long, will get truncated
    to an int when passed to load_msg(). So a long might very well contain a
    positive value but when truncated to an int it would become negative.

    That in combination with a small negative value of msg_ctlmax (which will
    be promoted to an unsigned type for the comparison against msgsz, making
    it a big positive value and therefore make it pass the check) will lead to
    two problems: 1/ The kmalloc() call in alloc_msg() will allocate a too
    small buffer as the addition of alen is effectively a subtraction. 2/ The
    copy_from_user() call in load_msg() will first overflow the buffer with
    userland data and then, when the userland access generates an access
    violation, the fixup handler copy_user_handle_tail() will try to fill the
    remainder with zeros -- roughly 4GB. That almost instantly results in a
    system crash or reset.

    ,-[ Reproducer (needs to be run as root) ]--
    | #include
    | #include
    | #include
    | #include
    |
    | int main(void) {
    | long msg = 1;
    | int fd;
    |
    | fd = open("/proc/sys/kernel/msgmax", O_WRONLY);
    | write(fd, "-1", 2);
    | close(fd);
    |
    | msgsnd(0, &msg, 0xfffffff0, IPC_NOWAIT);
    |
    | return 0;
    | }
    '---

    Fix the issue by preventing msgsz from getting truncated by consistently
    using size_t for the message length. This way the size checks in
    do_msgsnd() could still be passed with a negative value for msg_ctlmax but
    we would fail on the buffer allocation in that case and error out.

    Also change the type of m_ts from int to size_t to avoid similar nastiness
    in other code paths -- it is used in similar constructs, i.e. signed vs.
    unsigned checks. It should never become negative under normal
    circumstances, though.

    Setting msg_ctlmax to a negative value is an odd configuration and should
    be prevented. As that might break existing userland, it will be handled
    in a separate commit so it could easily be reverted and reworked without
    reintroducing the above described bug.

    Hardening mechanisms for user copy operations would have catched that bug
    early -- e.g. checking slab object sizes on user copy operations as the
    usercopy feature of the PaX patch does. Or, for that matter, detect the
    long vs. int sign change due to truncation, as the size overflow plugin
    of the very same patch does.

    [akpm@linux-foundation.org: fix i386 min() warnings]
    Signed-off-by: Mathias Krause
    Cc: Pax Team
    Cc: Davidlohr Bueso
    Cc: Brad Spengler
    Cc: Manfred Spraul
    Cc: [ v2.3.27+ -- yes, that old ;) ]
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mathias Krause
     

02 May, 2013

2 commits

  • Pull VFS updates from Al Viro,

    Misc cleanups all over the place, mainly wrt /proc interfaces (switch
    create_proc_entry to proc_create(), get rid of the deprecated
    create_proc_read_entry() in favor of using proc_create_data() and
    seq_file etc).

    7kloc removed.

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (204 commits)
    don't bother with deferred freeing of fdtables
    proc: Move non-public stuff from linux/proc_fs.h to fs/proc/internal.h
    proc: Make the PROC_I() and PDE() macros internal to procfs
    proc: Supply a function to remove a proc entry by PDE
    take cgroup_open() and cpuset_open() to fs/proc/base.c
    ppc: Clean up scanlog
    ppc: Clean up rtas_flash driver somewhat
    hostap: proc: Use remove_proc_subtree()
    drm: proc: Use remove_proc_subtree()
    drm: proc: Use minor->index to label things, not PDE->name
    drm: Constify drm_proc_list[]
    zoran: Don't print proc_dir_entry data in debug
    reiserfs: Don't access the proc_dir_entry in r_open(), r_start() r_show()
    proc: Supply an accessor for getting the data from a PDE's parent
    airo: Use remove_proc_subtree()
    rtl8192u: Don't need to save device proc dir PDE
    rtl8187se: Use a dir under /proc/net/r8180/
    proc: Add proc_mkdir_data()
    proc: Move some bits from linux/proc_fs.h to linux/{of.h,signal.h,tty.h}
    proc: Move PDE_NET() to fs/proc/proc_net.c
    ...

    Linus Torvalds
     
  • Split the proc namespace stuff out into linux/proc_ns.h.

    Signed-off-by: David Howells
    cc: netdev@vger.kernel.org
    cc: Serge E. Hallyn
    cc: Eric W. Biederman
    Signed-off-by: Al Viro

    David Howells
     

01 May, 2013

5 commits


09 Mar, 2013

1 commit


05 Jan, 2013

2 commits

  • Remove the redundant and confusing fill_copy(). Also add copy_msg()
    check for error. In this case exit from the function have to be done
    instead of break, because further code interprets any error as EAGAIN.

    Also define copy_msg() for the case when CONFIG_CHECKPOINT_RESTORE is
    disabled.

    Signed-off-by: Stanislav Kinsbursky
    Cc: "Eric W. Biederman"
    Cc: James Morris
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Stanislav Kinsbursky
     
  • This patch is required for checkpoint/restore in userspace.

    c/r requires some way to get all pending IPC messages without deleting
    them from the queue (checkpoint can fail and in this case tasks will be
    resumed, so queue have to be valid).

    To achive this, new operation flag MSG_COPY for sys_msgrcv() system call
    was introduced. If this flag was specified, then mtype is interpreted as
    number of the message to copy.

    If MSG_COPY is set, then kernel will allocate dummy message with passed
    size, and then use new copy_msg() helper function to copy desired message
    (instead of unlinking it from the queue).

    Notes:

    1) Return -ENOSYS if MSG_COPY is specified, but
    CONFIG_CHECKPOINT_RESTORE is not set.

    Signed-off-by: Stanislav Kinsbursky
    Cc: Serge Hallyn
    Cc: "Eric W. Biederman"
    Cc: Pavel Emelyanov
    Cc: Al Viro
    Cc: KOSAKI Motohiro
    Cc: Michael Kerrisk
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Stanislav Kinsbursky
     

20 Nov, 2012

1 commit

  • Assign a unique proc inode to each namespace, and use that
    inode number to ensure we only allocate at most one proc
    inode for every namespace in proc.

    A single proc inode per namespace allows userspace to test
    to see if two processes are in the same namespace.

    This has been a long requested feature and only blocked because
    a naive implementation would put the id in a global space and
    would ultimately require having a namespace for the names of
    namespaces, making migration and certain virtualization tricks
    impossible.

    We still don't have per superblock inode numbers for proc, which
    appears necessary for application unaware checkpoint/restart and
    migrations (if the application is using namespace file descriptors)
    but that is now allowd by the design if it becomes important.

    I have preallocated the ipc and uts initial proc inode numbers so
    their structures can be statically initialized.

    Signed-off-by: Eric W. Biederman

    Eric W. Biederman
     

14 Feb, 2012

1 commit


09 Dec, 2011

1 commit


24 Mar, 2011

1 commit

  • Changelog:
    Feb 15: Don't set new ipc->user_ns if we didn't create a new
    ipc_ns.
    Feb 23: Move extern declaration to ipc_namespace.h, and group
    fwd declarations at top.

    Signed-off-by: Serge E. Hallyn
    Acked-by: "Eric W. Biederman"
    Acked-by: Daniel Lezcano
    Acked-by: David Howells
    Cc: James Morris
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Serge E. Hallyn
     

07 Apr, 2009

2 commits

  • Implement multiple mounts of the mqueue file system, and link it to usage
    of CLONE_NEWIPC.

    Each ipc ns has a corresponding mqueuefs superblock. When a user does
    clone(CLONE_NEWIPC) or unshare(CLONE_NEWIPC), the unshare will cause an
    internal mount of a new mqueuefs sb linked to the new ipc ns.

    When a user does 'mount -t mqueue mqueue /dev/mqueue', he mounts the
    mqueuefs superblock.

    Posix message queues can be worked with both through the mq_* system calls
    (see mq_overview(7)), and through the VFS through the mqueue mount. Any
    usage of mq_open() and friends will work with the acting task's ipc
    namespace. Any actions through the VFS will work with the mqueuefs in
    which the file was created. So if a user doesn't remount mqueuefs after
    unshare(CLONE_NEWIPC), mq_open("/ab") will not be reflected in "ls
    /dev/mqueue".

    If task a mounts mqueue for ipc_ns:1, then clones task b with a new ipcns,
    ipcns:2, and then task a is the last task in ipc_ns:1 to exit, then (1)
    ipc_ns:1 will be freed, (2) it's superblock will live on until task b
    umounts the corresponding mqueuefs, and vfs actions will continue to
    succeed, but (3) sb->s_fs_info will be NULL for the sb corresponding to
    the deceased ipc_ns:1.

    To make this happen, we must protect the ipc reference count when

    a) a task exits and drops its ipcns->count, since it might be dropping
    it to 0 and freeing the ipcns

    b) a task accesses the ipcns through its mqueuefs interface, since it
    bumps the ipcns refcount and might race with the last task in the ipcns
    exiting.

    So the kref is changed to an atomic_t so we can use
    atomic_dec_and_lock(&ns->count,mq_lock), and every access to the ipcns
    through ns = mqueuefs_sb->s_fs_info is protected by the same lock.

    Signed-off-by: Cedric Le Goater
    Signed-off-by: Serge E. Hallyn
    Cc: Alexey Dobriyan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Serge E. Hallyn
     
  • Move mqueue vfsmount plus a few tunables into the ipc_namespace struct.
    The CONFIG_IPC_NS boolean and the ipc_namespace struct will serve both the
    posix message queue namespaces and the SYSV ipc namespaces.

    The sysctl code will be fixed separately in patch 3. After just this
    patch, making a change to posix mqueue tunables always changes the values
    in the initial ipc namespace.

    Signed-off-by: Cedric Le Goater
    Signed-off-by: Serge E. Hallyn
    Cc: Alexey Dobriyan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Serge E. Hallyn
     

14 Dec, 2006

1 commit

  • Run this:

    #!/bin/sh
    for f in $(grep -Erl "\([^\)]*\) *k[cmz]alloc" *) ; do
    echo "De-casting $f..."
    perl -pi -e "s/ ?= ?\([^\)]*\) *(k[cmz]alloc) *\(/ = \1\(/" $f
    done

    And then go through and reinstate those cases where code is casting pointers
    to non-pointers.

    And then drop a few hunks which conflicted with outstanding work.

    Cc: Russell King , Ian Molton
    Cc: Mikael Starvik
    Cc: Yoshinori Sato
    Cc: Roman Zippel
    Cc: Geert Uytterhoeven
    Cc: Ralf Baechle
    Cc: Paul Mackerras
    Cc: Kyle McMartin
    Cc: Benjamin Herrenschmidt
    Cc: Martin Schwidefsky
    Cc: "David S. Miller"
    Cc: Jeff Dike
    Cc: Greg KH
    Cc: Jens Axboe
    Cc: Paul Fulghum
    Cc: Alan Cox
    Cc: Karsten Keil
    Cc: Mauro Carvalho Chehab
    Cc: Jeff Garzik
    Cc: James Bottomley
    Cc: Ian Kent
    Cc: Steven French
    Cc: David Woodhouse
    Cc: Neil Brown
    Cc: Jaroslav Kysela
    Cc: Takashi Iwai
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Robert P. J. Day
     

04 Oct, 2006

1 commit


17 Apr, 2005

1 commit

  • Initial git repository build. I'm not bothering with the full history,
    even though we have it. We can create a separate "historical" git
    archive of that later if we want to, and in the meantime it's about
    3.2GB when imported into git - space that would just make the early
    git days unnecessarily complicated, when we don't have a lot of good
    infrastructure for it.

    Let it rip!

    Linus Torvalds