09 Sep, 2017
1 commit
-
refcount_t type and corresponding API should be used instead of atomic_t
when the variable is used as a reference counter. This allows to avoid
accidental refcounter overflows that might lead to use-after-free
situations.Link: http://lkml.kernel.org/r/1499417992-3238-2-git-send-email-elena.reshetova@intel.com
Signed-off-by: Elena Reshetova
Signed-off-by: Hans Liljestrand
Signed-off-by: Kees Cook
Signed-off-by: David Windsor
Cc: Peter Zijlstra
Cc: Greg Kroah-Hartman
Cc: "Eric W. Biederman"
Cc: Ingo Molnar
Cc: Alexey Dobriyan
Cc: Serge Hallyn
Cc:
Cc: Davidlohr Bueso
Cc: Manfred Spraul
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
28 Oct, 2016
1 commit
-
When kmem accounting switched from account by default to only account if
flagged by __GFP_ACCOUNT, IPC mqueue and messages was left out.The production use case at hand is that mqueues should be customizable
via sysctls in Docker containers in a Kubernetes cluster. This can only
be safely allowed to the users of the cluster (without the risk that
they can cause resource shortage on a node, influencing other users'
containers) if all resources they control are bounded, i.e. accounted
for.Link: http://lkml.kernel.org/r/1476806075-1210-1-git-send-email-arozansk@redhat.com
Signed-off-by: Aristeu Rozanski
Reported-by: Stefan Schimanski
Acked-by: Davidlohr Bueso
Cc: Alexey Dobriyan
Cc: Johannes Weiner
Cc: Michal Hocko
Cc: Vladimir Davydov
Cc: Stefan Schimanski
Cc: Michal Hocko
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
03 Aug, 2016
1 commit
-
Write-only variable.
Link: http://lkml.kernel.org/r/20160708214356.GA6785@p183.telecom.by
Signed-off-by: Alexey Dobriyan
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
07 Nov, 2015
1 commit
-
d0edd8528362 ("ipc: convert invalid scenarios to use WARN_ON") relaxed the
nil dst parameter check, originally being a full BUG_ON. However, this
check seems quite unnecessary when the only purpose is for
ceckpoint/restore (MSG_COPY flag):o The copy variable is set initially to nil, apparently as a way of
ensuring that prepare_copy is previously called. Which is in fact done,
unconditionally at the beginning of do_msgrcv.o There is no concurrency with 'copy' (stack allocated in do_msgrcv).
Furthermore, any errors in 'copy' (and thus prepare_copy/copy_msg) should
always handled by IS_ERR() family. Therefore remove this check altogether
as it can never occur with the current users.Signed-off-by: Davidlohr Bueso
Cc: Stanislav Kinsbursky
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
11 Sep, 2015
1 commit
-
Considering Linus' past rants about the (ab)use of BUG in the kernel, I
took a look at how we deal with such calls in ipc. Given that any errors
or corruption in ipc code are most likely contained within the set of
processes participating in the broken mechanisms, there aren't really many
strong fatal system failure scenarios that would require a BUG call.
Also, if something is seriously wrong, ipc might not be the place for such
a BUG either.1. For example, recently, a customer hit one of these BUG_ONs in shm
after failing shm_lock(). A busted ID imho does not merit a BUG_ON,
and WARN would have been better.2. MSG_COPY functionality of posix msgrcv(2) for checkpoint/restore.
I don't see how we can hit this anyway -- at least it should be IS_ERR.
The 'copy' arg from do_msgrcv is always set by calling prepare_copy()
first and foremost. We could also probably drop this check altogether.
Either way, it does not merit a BUG_ON.3. No ->fault() callback for the fs getting the corresponding page --
seems selfish to make the system unusable.Signed-off-by: Davidlohr Bueso
Cc: Manfred Spraul
Cc: Linus Torvalds
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
05 Dec, 2014
2 commits
-
Signed-off-by: Al Viro
-
for now - just move corresponding ->proc_inum instances over there
Acked-by: "Eric W. Biederman"
Signed-off-by: Al Viro
13 Nov, 2013
1 commit
-
On 64 bit systems the test for negative message sizes is bogus as the
size, which may be positive when evaluated as a long, will get truncated
to an int when passed to load_msg(). So a long might very well contain a
positive value but when truncated to an int it would become negative.That in combination with a small negative value of msg_ctlmax (which will
be promoted to an unsigned type for the comparison against msgsz, making
it a big positive value and therefore make it pass the check) will lead to
two problems: 1/ The kmalloc() call in alloc_msg() will allocate a too
small buffer as the addition of alen is effectively a subtraction. 2/ The
copy_from_user() call in load_msg() will first overflow the buffer with
userland data and then, when the userland access generates an access
violation, the fixup handler copy_user_handle_tail() will try to fill the
remainder with zeros -- roughly 4GB. That almost instantly results in a
system crash or reset.,-[ Reproducer (needs to be run as root) ]--
| #include
| #include
| #include
| #include
|
| int main(void) {
| long msg = 1;
| int fd;
|
| fd = open("/proc/sys/kernel/msgmax", O_WRONLY);
| write(fd, "-1", 2);
| close(fd);
|
| msgsnd(0, &msg, 0xfffffff0, IPC_NOWAIT);
|
| return 0;
| }
'---Fix the issue by preventing msgsz from getting truncated by consistently
using size_t for the message length. This way the size checks in
do_msgsnd() could still be passed with a negative value for msg_ctlmax but
we would fail on the buffer allocation in that case and error out.Also change the type of m_ts from int to size_t to avoid similar nastiness
in other code paths -- it is used in similar constructs, i.e. signed vs.
unsigned checks. It should never become negative under normal
circumstances, though.Setting msg_ctlmax to a negative value is an odd configuration and should
be prevented. As that might break existing userland, it will be handled
in a separate commit so it could easily be reverted and reworked without
reintroducing the above described bug.Hardening mechanisms for user copy operations would have catched that bug
early -- e.g. checking slab object sizes on user copy operations as the
usercopy feature of the PaX patch does. Or, for that matter, detect the
long vs. int sign change due to truncation, as the size overflow plugin
of the very same patch does.[akpm@linux-foundation.org: fix i386 min() warnings]
Signed-off-by: Mathias Krause
Cc: Pax Team
Cc: Davidlohr Bueso
Cc: Brad Spengler
Cc: Manfred Spraul
Cc: [ v2.3.27+ -- yes, that old ;) ]
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
02 May, 2013
2 commits
-
Pull VFS updates from Al Viro,
Misc cleanups all over the place, mainly wrt /proc interfaces (switch
create_proc_entry to proc_create(), get rid of the deprecated
create_proc_read_entry() in favor of using proc_create_data() and
seq_file etc).7kloc removed.
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (204 commits)
don't bother with deferred freeing of fdtables
proc: Move non-public stuff from linux/proc_fs.h to fs/proc/internal.h
proc: Make the PROC_I() and PDE() macros internal to procfs
proc: Supply a function to remove a proc entry by PDE
take cgroup_open() and cpuset_open() to fs/proc/base.c
ppc: Clean up scanlog
ppc: Clean up rtas_flash driver somewhat
hostap: proc: Use remove_proc_subtree()
drm: proc: Use remove_proc_subtree()
drm: proc: Use minor->index to label things, not PDE->name
drm: Constify drm_proc_list[]
zoran: Don't print proc_dir_entry data in debug
reiserfs: Don't access the proc_dir_entry in r_open(), r_start() r_show()
proc: Supply an accessor for getting the data from a PDE's parent
airo: Use remove_proc_subtree()
rtl8192u: Don't need to save device proc dir PDE
rtl8187se: Use a dir under /proc/net/r8180/
proc: Add proc_mkdir_data()
proc: Move some bits from linux/proc_fs.h to linux/{of.h,signal.h,tty.h}
proc: Move PDE_NET() to fs/proc/proc_net.c
... -
Split the proc namespace stuff out into linux/proc_ns.h.
Signed-off-by: David Howells
cc: netdev@vger.kernel.org
cc: Serge E. Hallyn
cc: Eric W. Biederman
Signed-off-by: Al Viro
01 May, 2013
5 commits
-
Signed-off-by: HoSung Jung
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Signed-off-by: Peter Hurley
Acked-by: Stanislav Kinsbursky
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Signed-off-by: Peter Hurley
Acked-by: Stanislav Kinsbursky
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Separating msg allocation enables single-block vmalloc
allocation instead.Signed-off-by: Peter Hurley
Acked-by: Stanislav Kinsbursky
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Signed-off-by: Peter Hurley
Acked-by: Stanislav Kinsbursky
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
09 Mar, 2013
1 commit
-
If the src msg is > 4k, then dest->next points to the
next allocated segment; resetting it just prior to dereferencing
is bad.Signed-off-by: Peter Hurley
Acked-by: Stanislav Kinsbursky
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
05 Jan, 2013
2 commits
-
Remove the redundant and confusing fill_copy(). Also add copy_msg()
check for error. In this case exit from the function have to be done
instead of break, because further code interprets any error as EAGAIN.Also define copy_msg() for the case when CONFIG_CHECKPOINT_RESTORE is
disabled.Signed-off-by: Stanislav Kinsbursky
Cc: "Eric W. Biederman"
Cc: James Morris
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
This patch is required for checkpoint/restore in userspace.
c/r requires some way to get all pending IPC messages without deleting
them from the queue (checkpoint can fail and in this case tasks will be
resumed, so queue have to be valid).To achive this, new operation flag MSG_COPY for sys_msgrcv() system call
was introduced. If this flag was specified, then mtype is interpreted as
number of the message to copy.If MSG_COPY is set, then kernel will allocate dummy message with passed
size, and then use new copy_msg() helper function to copy desired message
(instead of unlinking it from the queue).Notes:
1) Return -ENOSYS if MSG_COPY is specified, but
CONFIG_CHECKPOINT_RESTORE is not set.Signed-off-by: Stanislav Kinsbursky
Cc: Serge Hallyn
Cc: "Eric W. Biederman"
Cc: Pavel Emelyanov
Cc: Al Viro
Cc: KOSAKI Motohiro
Cc: Michael Kerrisk
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
20 Nov, 2012
1 commit
-
Assign a unique proc inode to each namespace, and use that
inode number to ensure we only allocate at most one proc
inode for every namespace in proc.A single proc inode per namespace allows userspace to test
to see if two processes are in the same namespace.This has been a long requested feature and only blocked because
a naive implementation would put the id in a global space and
would ultimately require having a namespace for the names of
namespaces, making migration and certain virtualization tricks
impossible.We still don't have per superblock inode numbers for proc, which
appears necessary for application unaware checkpoint/restart and
migrations (if the application is using namespace file descriptors)
but that is now allowd by the design if it becomes important.I have preallocated the ipc and uts initial proc inode numbers so
their structures can be statically initialized.Signed-off-by: Eric W. Biederman
14 Feb, 2012
1 commit
-
Trim security.h
Signed-off-by: Al Viro
Signed-off-by: James Morris
09 Dec, 2011
1 commit
-
Signed-off-by: Al Viro
24 Mar, 2011
1 commit
-
Changelog:
Feb 15: Don't set new ipc->user_ns if we didn't create a new
ipc_ns.
Feb 23: Move extern declaration to ipc_namespace.h, and group
fwd declarations at top.Signed-off-by: Serge E. Hallyn
Acked-by: "Eric W. Biederman"
Acked-by: Daniel Lezcano
Acked-by: David Howells
Cc: James Morris
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
07 Apr, 2009
2 commits
-
Implement multiple mounts of the mqueue file system, and link it to usage
of CLONE_NEWIPC.Each ipc ns has a corresponding mqueuefs superblock. When a user does
clone(CLONE_NEWIPC) or unshare(CLONE_NEWIPC), the unshare will cause an
internal mount of a new mqueuefs sb linked to the new ipc ns.When a user does 'mount -t mqueue mqueue /dev/mqueue', he mounts the
mqueuefs superblock.Posix message queues can be worked with both through the mq_* system calls
(see mq_overview(7)), and through the VFS through the mqueue mount. Any
usage of mq_open() and friends will work with the acting task's ipc
namespace. Any actions through the VFS will work with the mqueuefs in
which the file was created. So if a user doesn't remount mqueuefs after
unshare(CLONE_NEWIPC), mq_open("/ab") will not be reflected in "ls
/dev/mqueue".If task a mounts mqueue for ipc_ns:1, then clones task b with a new ipcns,
ipcns:2, and then task a is the last task in ipc_ns:1 to exit, then (1)
ipc_ns:1 will be freed, (2) it's superblock will live on until task b
umounts the corresponding mqueuefs, and vfs actions will continue to
succeed, but (3) sb->s_fs_info will be NULL for the sb corresponding to
the deceased ipc_ns:1.To make this happen, we must protect the ipc reference count when
a) a task exits and drops its ipcns->count, since it might be dropping
it to 0 and freeing the ipcnsb) a task accesses the ipcns through its mqueuefs interface, since it
bumps the ipcns refcount and might race with the last task in the ipcns
exiting.So the kref is changed to an atomic_t so we can use
atomic_dec_and_lock(&ns->count,mq_lock), and every access to the ipcns
through ns = mqueuefs_sb->s_fs_info is protected by the same lock.Signed-off-by: Cedric Le Goater
Signed-off-by: Serge E. Hallyn
Cc: Alexey Dobriyan
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Move mqueue vfsmount plus a few tunables into the ipc_namespace struct.
The CONFIG_IPC_NS boolean and the ipc_namespace struct will serve both the
posix message queue namespaces and the SYSV ipc namespaces.The sysctl code will be fixed separately in patch 3. After just this
patch, making a change to posix mqueue tunables always changes the values
in the initial ipc namespace.Signed-off-by: Cedric Le Goater
Signed-off-by: Serge E. Hallyn
Cc: Alexey Dobriyan
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
14 Dec, 2006
1 commit
-
Run this:
#!/bin/sh
for f in $(grep -Erl "\([^\)]*\) *k[cmz]alloc" *) ; do
echo "De-casting $f..."
perl -pi -e "s/ ?= ?\([^\)]*\) *(k[cmz]alloc) *\(/ = \1\(/" $f
doneAnd then go through and reinstate those cases where code is casting pointers
to non-pointers.And then drop a few hunks which conflicted with outstanding work.
Cc: Russell King , Ian Molton
Cc: Mikael Starvik
Cc: Yoshinori Sato
Cc: Roman Zippel
Cc: Geert Uytterhoeven
Cc: Ralf Baechle
Cc: Paul Mackerras
Cc: Kyle McMartin
Cc: Benjamin Herrenschmidt
Cc: Martin Schwidefsky
Cc: "David S. Miller"
Cc: Jeff Dike
Cc: Greg KH
Cc: Jens Axboe
Cc: Paul Fulghum
Cc: Alan Cox
Cc: Karsten Keil
Cc: Mauro Carvalho Chehab
Cc: Jeff Garzik
Cc: James Bottomley
Cc: Ian Kent
Cc: Steven French
Cc: David Woodhouse
Cc: Neil Brown
Cc: Jaroslav Kysela
Cc: Takashi Iwai
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
04 Oct, 2006
1 commit
-
Many files include the filename at the beginning, serveral used a wrong one.
Signed-off-by: Uwe Zeisberger
Signed-off-by: Adrian Bunk
17 Apr, 2005
1 commit
-
Initial git repository build. I'm not bothering with the full history,
even though we have it. We can create a separate "historical" git
archive of that later if we want to, and in the meantime it's about
3.2GB when imported into git - space that would just make the early
git days unnecessarily complicated, when we don't have a lot of good
infrastructure for it.Let it rip!