19 Jun, 2009
40 commits
-
In theory it is not safe to dereference ->parent/real_parent without
tasklist or rcu lock, we can race with re-parenting.Signed-off-by: Oleg Nesterov
Acked-by: Roland McGrath
Cc: David Howells
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
The forked child can have TIF_SIGPENDING if it was copied from parent's
ti->flags. But this is harmless and actually almost never happens,
because copy_process() can't succeed if signal_pending() == T.Signed-off-by: Oleg Nesterov
Acked-by: Roland McGrath
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
There is no reason for thread_group_cputime() in wait_task_zombie(), there
must be no other threads.This call was previously needed to collect the per-cpu data which we do
not have any longer.Signed-off-by: Oleg Nesterov
Cc: Peter Zijlstra
Acked-by: Roland McGrath
Cc: Stanislaw Gruszka
Cc: Vitaly Mayatskikh
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Change ptrace_getsiginfo/ptrace_setsiginfo to use lock_task_sighand()
without tasklist_lock. Perhaps it makes sense to make a single helper
with "bool rw" argument.Signed-off-by: Oleg Nesterov
Acked-by: Roland McGrath
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
If the non-traced sub-thread calls do_notify_parent_cldstop(), we send the
notification to group_leader->real_parent and we report group_leader's
pid.But, if group_leader is traced we use the wrong ->parent->nsproxy->pid_ns,
the tracer and parent can live in different namespaces. Change the code
to use "parent" instead of tsk->parent.Signed-off-by: Oleg Nesterov
Acked-by: Roland McGrath
Acked-by: Sukadev Bhattiprolu
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Change wait_task_zombie() to use ->real_parent instead of ->parent. We
could even use current afaics, but ->real_parent is more clean.We know that the child is not ptrace_reparented() and thus they are equal.
But we should avoid using task_struct->parent, we are going to remove it.Signed-off-by: Oleg Nesterov
Acked-by: Roland McGrath
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
- Use rcu_read_lock() instead of tasklist_lock to find/get the task
in ptrace_get_task_struct().- Make it static, it has no callers outside of ptrace.c.
- The comment doesn't match the reality, this helper does not do
any checks. Beacuse it is really trivial and static I removed the
whole comment.Signed-off-by: Oleg Nesterov
Acked-by: Roland McGrath
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Remove the "Nasty, nasty" lock dance in ptrace_attach()/ptrace_traceme() -
from now task_lock() has nothing to do with ptrace at all.With the recent changes nobody uses task_lock() to serialize with ptrace,
but in fact it was never needed and it was never used consistently.However ptrace_attach() calls __ptrace_may_access() and needs task_lock()
to pin task->mm for get_dumpable(). But we can call __ptrace_may_access()
before we take tasklist_lock, ->cred_exec_mutex protects us against
do_execve() path which can change creds and MMF_DUMP* flags.(ugly, but we can't use ptrace_may_access() because it hides the error
code, so we have to take task_lock() and use __ptrace_may_access()).NOTE: this change assumes that LSM hooks, security_ptrace_may_access() and
security_ptrace_traceme(), can be called without task_lock() held.Signed-off-by: Oleg Nesterov
Cc: Chris Wright
Acked-by: Roland McGrath
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
ptrace_attach() and ptrace_traceme() are the last functions which look as
if the untraced task can have task->ptrace != 0, this must not be
possible. Change the code to just check ->ptrace != 0 and s/|=/=/ to set
PT_PTRACED.Also, a couple of trivial whitespace cleanups in ptrace_attach().
And move ptrace_traceme() up near ptrace_attach() to keep them close to
each other.Signed-off-by: Oleg Nesterov
Cc: Chris Wright
Acked-by: Roland McGrath
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
- Add PF_KTHREAD check to prevent attaching to the kernel thread
with a borrowed ->mm.With or without this change we can race with daemonize() which
can set PF_KTHREAD or clear ->mm after ptrace_attach() does the
check, but this doesn't matter because reparent_to_kthreadd()
does ptrace_unlink().- Kill "!task->mm" check. We don't really care about ->mm != NULL,
and the task can call exit_mm() right after we drop task_lock().
What we need is to make sure we can't attach after exit_notify(),
check task->exit_state != 0 instead.Also, move the "already traced" check down for cosmetic reasons.
Signed-off-by: Oleg Nesterov
Cc: Chris Wright
Acked-by: Roland McGrath
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
No functional changes.
- Nobody except ptrace.c & co should use ptrace flags directly, we have
task_ptrace() for that.- No need to specially check PT_PTRACED, we must not have other PT_ bits
set without PT_PTRACED. And no need to know this flag exists.Signed-off-by: Oleg Nesterov
Cc: Roland McGrath
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
tracehook_unsafe_exec() doesn't need task_lock(), remove the old comment.
Signed-off-by: Oleg Nesterov
Acked-by: Roland McGrath
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
"Search in the siblings" should use ->real_parent, not ->parent. If the
task is traced then ->parent == tracer, while the task's parent is always
->real_parent.Signed-off-by: Oleg Nesterov
Acked-by: Roland McGrath
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
m32r: PTRACE_SINGLESTEP sets PT_DTRACE, it is never used except cleared
after do_execve().Signed-off-by: Oleg Nesterov
Acked-by: Hirokazu Takata
Acked-by: Roland McGrath
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
m68k sets PT_DTRACE in trap_c() but never uses it.
Signed-off-by: Oleg Nesterov
Acked-by: Geert Uytterhoeven
Acked-by: Greg Ungerer
Cc: Roman Zippel
Acked-by: Roland McGrath
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
avr32, mn10300, parisc, s390, sh, xtensa:
They never set PT_DTRACE, but clear it after do_execve().
Signed-off-by: Oleg Nesterov
Cc: David Howells
Acked-by: Kyle McMartin
Cc: Grant Grundler
Cc: Matthew Wilcox
Acked-by: Martin Schwidefsky
Cc: Heiko Carstens
Acked-by: Paul Mundt
Acked-by: Chris Zankel
Acked-by: Roland McGrath
Acked-by: Haavard Skinnemoen
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
h8300 defines PT_DTRACE for asm but never uses it.
DEFINE(PT_PTRACED, PT_PTRACED) seems to be unused too.
Signed-off-by: Oleg Nesterov
Acked-by: Yoshinori Sato
Acked-by: Roland McGrath
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
allow_signal() checks ->mm == NULL. Not sure why. Perhaps to make sure
current is the kernel thread. But this helper must not be used unless we
are the kernel thread, kill this check.Also, document the fact that the CLONE_SIGHAND kthread must not use
allow_signal(), unless the caller really wants to change the parent's
->sighand->action as well.Signed-off-by: Oleg Nesterov
Acked-by: Roland McGrath
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Try to fix memcg's lru rotation sanity: make memcg use the same logic as
the global LRU does.Now, at __isolate_lru_page() retruns -EBUSY, the page is rotated to the
tail of LRU in global LRU's isolate LRU pages. But in memcg, it's not
handled. This makes memcg do the same behavior as global LRU and rotate
LRU in the page is busy.Signed-off-by: KAMEZAWA Hiroyuki
Cc: KOSAKI Motohiro
Acked-by: Daisuke Nishimura
Cc: Balbir Singh
Cc: Mel Gorman
Cc: Minchan Kim
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
We don't have an interface to reset mem.limit or memsw.limit now.
This patch allows to reset mem.limit or memsw.limit when they are being
set to -1.Signed-off-by: Daisuke Nishimura
Cc: KAMEZAWA Hiroyuki
Cc: Balbir Singh
Cc: Li Zefan
Cc: Dhaval Giani
Cc: YAMAMOTO Takashi
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
A user can set memcg.limit_in_bytes == memcg.memsw.limit_in_bytes when the
user just want to limit the total size of applications, in other words,
not very interested in memory usage itself. In this case, swap-out will
be done only by global-LRU.But, under current implementation, memory.limit_in_bytes is checked at
first and try_to_free_page() may do swap-out. But, that swap-out is
useless for memsw.limit_in_bytes and the thread may hit limit again.This patch tries to fix the current behavior at memory.limit ==
memsw.limit case. And documentation is updated to explain the behavior of
this special case.Signed-off-by: KAMEZAWA Hiroyuki
Cc: Daisuke Nishimura
Cc: Balbir Singh
Cc: Li Zefan
Cc: Dhaval Giani
Cc: YAMAMOTO Takashi
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
This patch fixes mis-accounting of swap usage in memcg.
In the current implementation, memcg's swap account is uncharged only when
swap is completely freed. But there are several cases where swap cannot
be freed cleanly. For handling that, this patch changes that memcg
uncharges swap account when swap has no references other than cache.By this, memcg's swap entry accounting can be fully synchronous with the
application's behavior.This patch also changes memcg's hooks for swap-out.
Signed-off-by: KAMEZAWA Hiroyuki
Cc: Daisuke Nishimura
Acked-by: Balbir Singh
Cc: Hugh Dickins
Cc: Johannes Weiner
Cc: Li Zefan
Cc: Dhaval Giani
Cc: YAMAMOTO Takashi
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
This forward declaration seems pointless.
Signed-off-by: Li Zefan
Cc: Balbir Singh
Acked-by: KAMEZAWA Hiroyuki
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
We don't need to check do_swap_account in the case that the function which
checks do_swap_account will never get called if do_swap_account == 0.Signed-off-by: Li Zefan
Cc: Balbir Singh
Acked-by: KAMEZAWA Hiroyuki
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
mem_cgroup_cache_charge_swapin() isn't used any more, so remove no-op
definition of it in header file.Signed-off-by: Daisuke Nishimura
Cc: Balbir Singh
Acked-by: KAMEZAWA Hiroyuki
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Add file RSS tracking per memory cgroup
We currently don't track file RSS, the RSS we report is actually anon RSS.
All the file mapped pages, come in through the page cache and get
accounted there. This patch adds support for accounting file RSS pages.
It should1. Help improve the metrics reported by the memory resource controller
2. Will form the basis for a future shared memory accounting heuristic
that has been proposed by Kamezawa.Unfortunately, we cannot rename the existing "rss" keyword used in
memory.stat to "anon_rss". We however, add "mapped_file" data and hope to
educate the end user through documentation.[hugh.dickins@tiscali.co.uk: fix mem_cgroup_update_mapped_file_stat oops]
Signed-off-by: Balbir Singh
Acked-by: KAMEZAWA Hiroyuki
Cc: Li Zefan
Cc: Paul Menage
Cc: Dhaval Giani
Cc: Daisuke Nishimura
Cc: YAMAMOTO Takashi
Cc: KOSAKI Motohiro
Cc: David Rientjes
Signed-off-by: Hugh Dickins
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
While walking through the whitelist, if the DEV_ALL item is found, no more
check is needed.Signed-off-by: Li Zefan
Acked-by: Serge Hallyn
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
The 'noprefix' option was introduced for backwards-compatibility of
cpuset, but actually it can be used when mounting other subsystems.This results in possibility of name collision, and now the collision can
really happen, because we have 'stat' file in both memory and cpuacct
subsystem:# mount -t cgroup -o noprefix,memory,cpuacct xxx /mnt
Cgroup will happily mount the 2 subsystems, but only 'stat' file of memory
subsys can be seen.We don't want users to use nopreifx, and also want to avoid name
collision, so we change to allow noprefix only if mounting just the cpuset
subsystem.[akpm@linux-foundation.org: fix shift for cpuset_subsys_id >= 32]
Signed-off-by: Li Zefan
Cc: Paul Menage
Acked-by: KAMEZAWA Hiroyuki
Cc: Balbir Singh
Acked-by: Dhaval Giani
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Fix some cgroup messages to read better.
Update MAINTAINERS to include mm/*cgroup* files.Signed-off-by: Randy Dunlap
Cc: Paul Menage
Cc: Li Zefan
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Currently cn_test_want_notify() has no user.
So add an ifdef and a comment which tells us to not remove it.
Signed-off-by: Jaswinder Singh Rajput
Acked-by: Evgeniy Polyakov
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Perl is used on the kernel Makefile to generate documentation, firmwares
in c source form, sources, graphs, and some headers and this fact is
undocumented.[akpm@linux-foundation.org: 80-columns, please]
Signed-off-by: Jose Luis Perez Diez
Cc: Sam Ravnborg
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Several code paths in reiserfs have a construct like:
if (is_direntry_le_ih(ih = B_N_PITEM_HEAD(src, item_num))) ...
which, in addition to being ugly, end up causing compiler warnings with
gcc 4.4.0. Previous compilers didn't issue a warning.fs/reiserfs/do_balan.c:1273: warning: operation on `aux_ih' may be undefined
fs/reiserfs/lbalance.c:393: warning: operation on `ih' may be undefined
fs/reiserfs/lbalance.c:421: warning: operation on `ih' may be undefined
fs/reiserfs/lbalance.c:777: warning: operation on `ih' may be undefinedI believe this is due to the ih being passed to macros which evaluate the
argument more than once. This is old code and we haven't seen any
problems with it, but this patch eliminates the warnings.It converts the multiple evaluation macros to static inlines and does a
preassignment for the cases that were causing the warnings because that
code is just ugly.Reported-by: Chris Mason
Signed-off-by: Jeff Mahoney
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
unsigned i_block,fragment cannot be negative.
Signed-off-by: Roel Kluin
Signed-off-by: Evgeniy Dushistov
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Remove unused variables from isofs_sb_info (used to be some mount
options), unify variables for option to use 0/1 (some options used
'y'/'n'), use bit fields for option flags in superblock.Signed-off-by: Jan Kara
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
isofs allows setting of default uid and gid of files but value 0 was used
to indicate that user did not specify any uid/gid mount option. Since
this option also overrides uid/gid set in Rock Ridge extension, it makes
sense to allow forcing uid/gid 0. Fix option processing to allow this.Cc:
Signed-off-by: Jan Kara
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
So far, permissions set via 'mode' and/or 'dmode' mount options were
effective only if the medium had no rock ridge extensions (or was mounted
without them). Add 'overriderockmode' mount option to indicate that these
options should override permissions set in rock ridge extensions. Maybe
this should be default but the current behavior is there since mount
options were created so I think we should not change how they behave.Cc:
Signed-off-by: Jan Kara
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
As Ted pointed out, it can happen that ext3_truncate() returns without
removing inode from orphan list. This way we could in some rare cases
(like when we get ENOMEM from an allocation in ext3_truncate called
because of failed ext3_write_begin) leave the inode on orphan list and
that triggers assertion failure on umount.So make ext3_truncate() always remove inode from in-memory orphan list.
Cc: Theodore Ts'o
Signed-off-by: Jan Kara
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
I delete the following patch
"commit 3f31fddfa26b7594b44ff2b34f9a04ba409e0f91
Author: Mingming Cao
Date: Fri Jul 25 01:46:22 2008 -0700jbd: fix race between free buffer and commit transaction
This patch is no longer needed because if race between freeing buffer and
committing transaction functionality occurs and dio gets error, currently
dio falls back to buffered IO by the following patch.commit 6ccfa806a9cfbbf1cd43d5b6aa47ef2c0eb518fd
Author: Hisashi Hifumi
Date: Tue Sep 2 14:35:40 2008 -0700VFS: fix dio write returning EIO when try_to_release_page fails
Signed-off-by: Hisashi Hifumi
Cc: Theodore Tso
Cc: Mingming Cao
Acked-by: Jan Kara
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Chain verification in ext3_get_blocks() has been hosed since it called
verify_chain(chain, NULL) which always returns success. As a result
readers could in theory race with truncate. On the other hand the race
probably cannot happen with the current locking scheme, since by the
time ext3_truncate() is called all the pages are already removed and
hence get_block() shouldn't be called on such pages...Signed-off-by: Jan Kara
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
ext2.txt says that dirs can have 32,768 subdirs, but the actual value of
EXT2_LINK_MAX is 32000.ext3 is the same, but the doc does not mention it. One of ext4's features
is to "fix 32000 subdirectory limit".Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds