Eric Lee / smarc-ti-linux-kernel | Embedian Git Server

24 Aug, 2009

1 commit

d8e180dcd bsdacct: switch credentials for writing to the accounting file ... Browse Code »

When process accounting is enabled, every exiting process writes a log to
the account file. In addition, every once in a while one of the exiting
processes checks whether there's enough free space for the log.

SELinux policy may or may not allow the exiting process to stat the fs.
So unsuspecting processes start generating AVC denials just because
someone enabled process accounting.

For these filesystem operations, the exiting process's credentials should
be temporarily switched to that of the process which enabled accounting,
because it's really that process which wanted to have the accounting
information logged.

Signed-off-by: Michal Schmidt
Acked-by: David Howells
Acked-by: Serge Hallyn
Signed-off-by: Andrew Morton
Signed-off-by: James Morris

Michal Schmidt
2009-08-24 09:33:40 +0800

01 Jul, 2009

1 commit

df279ca89 bsdacct: fix access to invalid filp in acct_on() ... Browse Code »

The file opened in acct_on and freshly stored in the ns->bacct struct can
be closed in acct_file_reopen by a concurrent call after we release
acct_lock and before we call mntput(file->f_path.mnt).

Record file->f_path.mnt in a local variable and use this variable only.

Signed-off-by: Renaud Lottiaux
Signed-off-by: Louis Rilling
Cc: Al Viro
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Renaud Lottiaux
2009-07-01 09:56:00 +0800

14 Jan, 2009

1 commit

b290ebe2c [CVE-2009-0029] System call wrappers part 04 ... Browse Code »

Signed-off-by: Heiko Carstens

Heiko Carstens
2009-01-14 21:15:19 +0800

14 Nov, 2008

1 commit

76aac0e9a CRED: Wrap task credential accesses in the core kernel ... Browse Code »

Wrap access to task credentials so that they can be separated more easily from
the task_struct during the introduction of COW creds.

Change most current->(|e|s|fs)[ug]id to current_(|e|s|fs)[ug]id().

Change some task->e?[ug]id to task_e?[ug]id(). In some places it makes more
sense to use RCU directly rather than a convenient wrapper; these will be
addressed by later patches.

Signed-off-by: David Howells
Reviewed-by: James Morris
Acked-by: Serge Hallyn
Cc: Al Viro
Cc: linux-audit@redhat.com
Cc: containers@lists.linux-foundation.org
Cc: linux-mm@kvack.org
Signed-off-by: James Morris

David Howells
2008-11-14 07:39:12 +0800

14 Oct, 2008

1 commit

dbda4c0b9 tty: Fix abusers of current->sighand->tty ... Browse Code »

Various people outside the tty layer still stick their noses in behind the
scenes. We need to make sure they also obey the locking and referencing rules.

Signed-off-by: Alan Cox
Signed-off-by: Linus Torvalds

Alan Cox
2008-10-14 00:51:42 +0800

26 Jul, 2008

9 commits

0c18d7a5d bsdacct: fix and add comments around acct_process() ... Browse Code »

Fix the one describing what this function is and add one more - about
locking absence around pid namespaces loop.

Signed-off-by: Pavel Emelyanov
Cc: Randy Dunlap
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Pavel Emelyanov
2008-07-26 01:53:47 +0800
7d1e13505 bsdacct: account dying tasks in all relevant namespaces ... Browse Code »

This just makes the acct_proces walk the pid namespaces from current up to
the top and account a task in each with the accounting turned on.

ns->parent access if safe lockless, since current it still alive and holds
its namespace, which in turn holds its parent.

Signed-off-by: Pavel Emelyanov
Cc: Balbir Singh
Cc: "Eric W. Biederman"
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Pavel Emelyanov
2008-07-26 01:53:47 +0800
b5a717487 bsdacct: turn acct off for all pidns-s on umount time ... Browse Code »

All the bsd_acct_strcts with opened accounting are linked into a global
list. So, the acct_auto_close(_mnt) walks one and drops the accounting
for each.

Signed-off-by: Pavel Emelyanov
Cc: Balbir Singh
Cc: "Eric W. Biederman"
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Pavel Emelyanov
2008-07-26 01:53:47 +0800
0b6b030fc bsdacct: switch from global bsd_acct_struct instance to per-pidns one ... Browse Code »

Allocate the structure on the first call to sys_acct(). After this each
namespace, that ordered the accounting, will live with this structure till
its own death.

Two notes
- routines, that close the accounting on fs umount time use
the init_pid_ns's acct by now;
- accounting routine accounts to dying task's namespace
(also by now).

Signed-off-by: Pavel Emelyanov
Cc: Balbir Singh
Cc: "Eric W. Biederman"
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Pavel Emelyanov
2008-07-26 01:53:47 +0800
6248b1b34 bsdacct: make internal code work with passed bsd_acct_struct, not global ... Browse Code »

This adds the appropriate pointer to all the internal (i.e. static)
functions that work with global acct instance. API calls pass a global
instance to them (while we still have such).

Mostly this is a s/acct_globals./acct->/ over the file.

Signed-off-by: Pavel Emelyanov
Cc: Balbir Singh
Cc: "Eric W. Biederman"
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Pavel Emelyanov
2008-07-26 01:53:47 +0800
a75d97976 bsdacct: turn the acct_lock from on-the-struct to global ... Browse Code »

Don't use per-bsd-acct-struct lock, but work with a global one.

This lock is taken for short periods, so it doesn't seem it'll become a
bottleneck, but it will allow us to easily avoid many locking difficulties
in the future.

So this is a mostly s/acct_globals.lock/acct_lock/ over the file.

Signed-off-by: Pavel Emelyanov
Cc: Balbir Singh
Cc: "Eric W. Biederman"
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Pavel Emelyanov
2008-07-26 01:53:47 +0800
e59a04a7a bsdacct: make check timer accept a bsd_acct_struct argument ... Browse Code »

We're going to have many bsd_acct_struct instances, not just one, so the
timer (currently working with a global one) has to know which one to work
with.

Use a handy setup_timer macro for it (thanks to Oleg for one).

Signed-off-by: Pavel Emelyanov
Cc: Balbir Singh
Cc: "Eric W. Biederman"
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Pavel Emelyanov
2008-07-26 01:53:46 +0800
1c552858a bsdacct: "truthify" a comment near acct_process ... Browse Code »

The acct_process does not accept any arguments actually.

Signed-off-by: Pavel Emelyanov
Cc: Balbir Singh
Cc: "Eric W. Biederman"
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Pavel Emelyanov
2008-07-26 01:53:46 +0800
081e4c8a7 bsdacct: rename acct_gbls to bsd_acct_struct ... Browse Code »

After I fixed access to task->tgid in kernel/acct.c, Oleg pointed out some
bad side effects with this accounting vs pid namespaces interaction. I.e.
when some task in pid namespace sets this accounting up, this blocks all
the others from doing the same. Restricting this to init namespace only
could help, but didn't look a graceful solution.

So here is the approach to make this accounting work with pid namespaces
properly.

The idea is simple - when a task dies it accounts itself in each namespace
it is visible from and which set the accounting up.

For example here are the commands run and the output of lastcomm from init
and sub namespaces:

init_ns# accton pacct
sub_ns# accton pacct (this is a different file - sub ns is run in
a chroot-ed environment)
init_ns# cat /dev/null
sub_ns# ls /dev/null
init_ns# accton
sub_ns# accton

sub_ns# lastcomm -f pacct
ls 0 [136,0] 0.00 secs Thu May 15 10:30
accton 0 [136,0] 0.00 secs Thu May 15 10:30

init_ns# lastcomm -f pacct
accton root pts/0 0.00 secs Thu May 15 14:30 << got from sub
cat root pts/1 0.00 secs Thu May 15 14:30
ls root pts/0 0.00 secs Thu May 15 14:30 << got from sub
accton root pts/1 0.00 secs Thu May 15 14:30

That was the summary, the details are in patches.

This patch:

It will be visible in pid_namespace.h file, so fix its name to look better
outside the acct.c file.

Signed-off-by: Pavel Emelyanov
Cc: Balbir Singh
Cc: "Eric W. Biederman"
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Pavel Emelyanov
2008-07-26 01:53:46 +0800

25 Mar, 2008

2 commits

5f7b703fe bsd_acct: using task_struct->tgid is not right in pid-namespaces ... Browse Code »

In case we're accounting from a sub-namespace, the tgids reported will not
refer to the right namespace.

Save the pid_namespace we're accounting in on the acct_glbs and use it in
do_acct_process.

Two less :) places using the task_struct.tgid member.

Signed-off-by: Pavel Emelyanov
Cc: Oleg Nesterov
Cc: "Paul E. McKenney"
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Pavel Emelyanov
2008-03-25 10:22:20 +0800
a846a1954 bsd_acct: plain current->real_parent access is not always safe ... Browse Code »

This is minor, but dereferencing even current real_parent is not safe on debug
kernels, since the memory, this points to, can be unmapped - RCU protection is
required.

Besides, the tgid field is deprecated and is to be replaced with task_tgid_xxx
call (the 2nd patch), so RCU will be required anyway.

Signed-off-by: Pavel Emelyanov
Cc: Oleg Nesterov
Cc: "Paul E. McKenney"
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Pavel Emelyanov
2008-03-25 10:22:19 +0800

08 Jan, 2008

1 commit

b59f8197c acct: real_parent ppid ... Browse Code »

The ac_ppid field reported in process accounting records
should match what getppid() would have returned to that
process, regardless of whether a debugger is attached.

Signed-off-by: Roland McGrath
Signed-off-by: Linus Torvalds

Roland McGrath
2008-01-08 06:55:37 +0800

27 Nov, 2007

1 commit

bcbe4a076 sched: fix kernel/acct.c comment ... Browse Code »

fix kernel/acct.c comment.

noticed by Lin Tan. Comment suggested by Olaf Kirch.

also see:

http://bugzilla.kernel.org/show_bug.cgi?id=8220

Reported-by: tammy000@gmail.com
Signed-off-by: Ingo Molnar

Ingo Molnar
2007-11-27 04:21:49 +0800

19 Oct, 2007

1 commit

6ae965cd6 whitespace fixes: process accounting ... Browse Code »

Lots of converting spaces to tabs.

Signed-off-by: Daniel Walker
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Daniel Walker
2007-10-19 05:37:24 +0800

26 Jul, 2007

1 commit

2c6b47de1 Cleanup non-arch xtime uses, use get_seconds() or current_kernel_time(). ... Browse Code »

This avoids use of the kernel-internal "xtime" variable directly outside
of the actual time-related functions. Instead, use the helper functions
that we already have available to us.

This doesn't actually change any behaviour, but this will allow us to
fix the fact that "xtime" isn't updated very often with CONFIG_NO_HZ
(because much of the realtime information is maintained as separate
offsets to 'xtime'), which has caused interfaces that use xtime directly
to get a time that is out of sync with the real-time clock by up to a
third of a second or so.

Signed-off-by: John Stultz
Cc: Ingo Molnar
Cc: Thomas Gleixner
Signed-off-by: Linus Torvalds

john stultz
2007-07-26 01:09:20 +0800

09 Dec, 2006

3 commits

f3a43f3f6 [PATCH] kernel: change uses of f_{dentry, vfsmnt} to use f_path ... Browse Code »

Change all the uses of f_{dentry,vfsmnt} to f_path.{dentry,mnt} in
linux/kernel/.

Signed-off-by: Josef "Jeff" Sipek
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Josef "Jeff" Sipek
2006-12-09 00:28:42 +0800
7bcfa95e5 [PATCH] do_acct_process(): don't take tty_mutex ... Browse Code »

No need to take the global tty_mutex, signal->tty->driver can't go away while
we are holding ->siglock.

Signed-off-by: Oleg Nesterov
Acked-by: Peter Zijlstra
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Oleg Nesterov
2006-12-09 00:28:38 +0800
24ec839c4 [PATCH] tty: ->signal->tty locking ... Browse Code »

Fix the locking of signal->tty.

Use ->sighand->siglock to protect ->signal->tty; this lock is already used
by most other members of ->signal/->sighand. And unless we are 'current'
or the tasklist_lock is held we need ->siglock to access ->signal anyway.

(NOTE: sys_unshare() is broken wrt ->sighand locking rules)

Note that tty_mutex is held over tty destruction, so while holding
tty_mutex any tty pointer remains valid. Otherwise the lifetime of ttys
are governed by their open file handles. This leaves some holes for tty
access from signal->tty (or any other non file related tty access).

It solves the tty SLAB scribbles we were seeing.

(NOTE: the change from group_send_sig_info to __group_send_sig_info needs to
be examined by someone familiar with the security framework, I think
it is safe given the SEND_SIG_PRIV from other __group_send_sig_info
invocations)

[schwidefsky@de.ibm.com: 3270 fix]
[akpm@osdl.org: various post-viro fixes]
Signed-off-by: Peter Zijlstra
Acked-by: Alan Cox
Cc: Oleg Nesterov
Cc: Prarit Bhargava
Cc: Chris Wright
Cc: Roland McGrath
Cc: Stephen Smalley
Cc: James Morris
Cc: "David S. Miller"
Cc: Jeff Dike
Cc: Martin Schwidefsky
Cc: Jan Kara
Signed-off-by: Martin Schwidefsky
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Peter Zijlstra
2006-12-09 00:28:38 +0800

08 Dec, 2006

1 commit

6cfd76a26 [PATCH] lockdep: name some old style locks ... Browse Code »

Name some of the remaning 'old_style_spin_init' locks

Signed-off-by: Peter Zijlstra
Acked-by: Ingo Molnar
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Peter Zijlstra
2006-12-08 00:39:36 +0800

01 Oct, 2006

1 commit

8f0ab5147 [PATCH] csa: convert CONFIG tag for extended accounting routines ... Browse Code »

There were a few accounting data/macros that are used in CSA but are #ifdef'ed
inside CONFIG_BSD_PROCESS_ACCT. This patch is to change those ifdef's from
CONFIG_BSD_PROCESS_ACCT to CONFIG_TASK_XACCT. A few defines are moved from
kernel/acct.c and include/linux/acct.h to kernel/tsacct.c and
include/linux/tsacct_kern.h.

Signed-off-by: Jay Lan
Cc: Shailabh Nagar
Cc: Balbir Singh
Cc: Jes Sorensen
Cc: Chris Sturtivant
Cc: Tony Ernst
Cc: Guillaume Thouvenin
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Jay Lan
2006-10-01 15:39:29 +0800

30 Sep, 2006

1 commit

eb84a20e9 [PATCH] audit/accounting: tty locking ... Browse Code »

Add tty locking around the audit and accounting code.

The whole current->signal-> locking is all deeply strange but it's for
someone else to sort out. Add rather than replace the lock for acct.c

Signed-off-by: Alan Cox
Acked-by: Arjan van de Ven
Cc: Al Viro
Cc: Oleg Nesterov
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Alan Cox
2006-09-30 00:18:25 +0800

15 Jul, 2006

1 commit

a4afee02a [PATCH] Fix sighand->siglock usage in kernel/acct.c ... Browse Code »

IRQs must be disabled before taking ->siglock.

Noticed by lockdep.

Signed-off-by: OGAWA Hirofumi
Cc: Arjan van de Ven
Cc: Ingo Molnar
Cc: "Eric W. Biederman"
Cc: Oleg Nesterov
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

OGAWA Hirofumi
2006-07-15 12:53:54 +0800

01 Jul, 2006

1 commit

6ab3d5624 Remove obsolete #include <linux/config.h> ... Browse Code »

Signed-off-by: Jörn Engel
Signed-off-by: Adrian Bunk

Jörn Engel
2006-07-01 01:25:36 +0800

28 Jun, 2006

2 commits

1dbe83c34 [PATCH] fix kernel-doc in kernel/ dir ... Browse Code »

Fix kernel-doc parameters in kernel/

Warning(/var/linsrc/linux-2617-g9//kernel/auditsc.c:1376): No description found for parameter 'u_abs_timeout'
Warning(/var/linsrc/linux-2617-g9//kernel/auditsc.c:1420): No description found for parameter 'u_msg_prio'
Warning(/var/linsrc/linux-2617-g9//kernel/auditsc.c:1420): No description found for parameter 'u_abs_timeout'
Warning(/var/linsrc/linux-2617-g9//kernel/acct.c:526): No description found for parameter 'pacct'

Signed-off-by: Randy Dunlap
Cc: Al Viro
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Randy Dunlap
2006-06-28 08:32:39 +0800
7f32a25f6 [PATCH] kernel/acct: fix function definition ... Browse Code »

kernel/acct.c:579:19: warning: non-ANSI function declaration of function 'acct_process'

Signed-off-by: Randy Dunlap
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Randy Dunlap
2006-06-28 08:32:35 +0800

26 Jun, 2006

4 commits

77787bfb4 [PATCH] pacct: none-delayed process accounting accumulation ... Browse Code »

In current 2.6.17 implementation, signal_struct refered from task_struct is
used for per-process data structure. The pacct facility also uses it as a
per-process data structure to store stime, utime, minflt, majflt. But those
members are saved in __exit_signal(). It's too late.

For example, if some threads exits at same time, pacct facility has a
possibility to drop accountings for a part of those threads. (see, the
following 'The results of original 2.6.17 kernel') I think accounting
information should be completely collected into the per-process data structure
before writing out an accounting record.

This patch fixes this matter. Accumulation of stime, utime, minflt and majflt
are done before generating accounting record.

[mingo@elte.hu: fix acct_collect() siglock bug found by lockdep]
Signed-off-by: KaiGai Kohei
Signed-off-by: Ingo Molnar
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

KaiGai Kohei
2006-06-26 01:01:25 +0800
f6ec29a42 [PATCH] pacct: avoidance to refer the last thread as a representation of the process ... Browse Code »

When pacct facility generate an 'ac_flag' field in accounting record, it
refers a task_struct of the thread which died last in the process. But any
other task_structs are ignored.

Therefore, pacct facility drops ASU flag even if root-privilege operations are
used by any other threads except the last one. In addition, AFORK flag is
always set when the thread of group-leader didn't die last, although this
process has called execve() after fork().

We have a same matter in ac_exitcode. The recorded ac_exitcode is an exit
code of the last thread in the process. There is a possibility this exitcode
is not the group leader's one.

KaiGai Kohei
2006-06-26 01:01:25 +0800
0e4648141 [PATCH] pacct: add pacct_struct to fix some pacct bugs. ... Browse Code »

The pacct facility need an i/o operation when an accounting record is
generated. There is a possibility to wake OOM killer up. If OOM killer is
activated, it kills some processes to make them release process memory
regions.

But acct_process() is called in the killed processes context before calling
exit_mm(), so those processes cannot release own memory. In the results, any
processes stop in this point and it finally cause a system stall.

KaiGai Kohei
2006-06-26 01:01:25 +0800
11e64757f [PATCH] Remove unecessary NULL check in kernel/acct.c ... Browse Code »

copy_process() appears to be the only caller of acct_clear_integrals() and
does not pass in NULL task pointers. Remove the unecessary check.

Signed-off-by: Matt Helsley
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Matt Helsley
2006-06-26 01:01:09 +0800

23 Jun, 2006

1 commit

726c33422 [PATCH] VFS: Permit filesystem to perform statfs with a known root dentry ... Browse Code »

Give the statfs superblock operation a dentry pointer rather than a superblock
pointer.

This complements the get_sb() patch. That reduced the significance of
sb->s_root, allowing NFS to place a fake root there. However, NFS does
require a dentry to use as a target for the statfs operation. This permits
the root in the vfsmount to be used instead.

linux/mount.h has been added where necessary to make allyesconfig build
successfully.

Interest has also been expressed for use with the FUSE and XFS filesystems.

Signed-off-by: David Howells
Acked-by: Al Viro
Cc: Nathan Scott
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

David Howells
2006-06-23 22:42:45 +0800

01 Apr, 2006

1 commit

bb231fe3a [PATCH] Fix pacct bug in multithreading case. ... Browse Code »

I noticed a bug on the process accounting facility. In multi-threading
process, some data would be recorded incorrectly when the group_leader dies
earlier than one or more threads. The attached patch fixes this problem.

See below. 'bugacct' is a test program that create a worker thread after 4
seconds sleeping, then the group_leader dies soon. The worker thread
consume CPU/Memory for 6 seconds, then exit. We can estimate 10 seconds as
etime and 6 seconds as stime + utime. This is a sample program which the
group_leader dies earlier than other threads.

The results of same binary execution on different kernel are below.
-- accounted records --------------------
| btime | utime | stime | etime | minflt | majflt | comm |
original | 13:16:40 | 0.00 | 0.00 | 6.10 | 171 | 0 | bugacct |
patched | 13:20:21 | 5.83 | 0.18 | 10.03 | 32776 | 0 | bugacct |
(*) bugacct allocates 128MB memory, thus 128MB / 4KB = 32768 of minflt is
appropriate.

-- Test results in original kernel ------
$ date; time -p ./bugacct
Tue Mar 28 13:16:36 JST 2006 start_time.tv_sec*NSEC_PER_SEC
+ current->start_time.tv_nsec;
----

The following section calculates stime and utime of the process.
But it might count the utime and stime of the group_leader duplicatly
and ignore the utime and stime of the thread dies last, when one or
more threads remain after group_leader dead.
The ac_utime should be calculated as the sum of the signal->utime
and utime of the thread dies last. The ac_stime should be done also.

---- do_acct_process() in kernel/acct.c:
jiffies = cputime_to_jiffies(cputime_add(current->group_leader->utime,
current->signal->utime));
ac.ac_utime = encode_comp_t(jiffies_to_AHZ(jiffies));
jiffies = cputime_to_jiffies(cputime_add(current->group_leader->stime,
current->signal->stime));
ac.ac_stime = encode_comp_t(jiffies_to_AHZ(jiffies));
----

The part of the minflt/majflt calculation has same problem.
This patch solves those problems, I think.

Cc: Ingo Molnar
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

KaiGai Kohei
2006-04-01 04:18:54 +0800

12 Jan, 2006

1 commit

c59ede7b7 [PATCH] move capable() to capability.h ... Browse Code »

- Move capable() from sched.h to capability.h;

- Use where capable() is used
(in include/, block/, ipc/, kernel/, a few drivers/,
mm/, security/, & sound/;
many more drivers/ to go)

Signed-off-by: Randy Dunlap
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Randy.Dunlap
2006-01-12 10:42:13 +0800

07 Jan, 2006

1 commit

089545f0c [PATCH] s390: cputime_t fixes ... Browse Code »

There are some more places where the use of cputime_t instead of an integer
type and the associated macros is necessary for the virtual cputime accounting
on s390. Affected are the s390 specific appldata code and BSD process
accounting.

Signed-off-by: Martin Schwidefsky
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Martin Schwidefsky
2006-01-07 00:33:49 +0800

08 Nov, 2005

1 commit

7b7b1ace2 [PATCH] saner handling of auto_acct_off() and DQUOT_OFF() in umount ... Browse Code »

The way we currently deal with quota and process accounting that might
keep vfsmount busy at umount time is inherently broken; we try to turn
them off just in case (not quite correctly, at that) and

a) pray umount doesn't fail (otherwise they'll stay turned off)
b) pray nobody doesn anything funny just as we turn quota off

Moreover, LSM provides hooks for doing the same sort of broken logics.

The proper way to deal with that is to introduce the second kind of
reference to vfsmount. Semantics:

- when the last normal reference is dropped, all special ones are
converted to normal ones and if there had been any, cleanup is done.
- normal reference can be cloned into a special one
- special reference can be converted to normal one; that's a no-op if
we'd already passed the point of no return (i.e. mntput() had
converted special references to normal and started cleanup).

The way it works: e.g. starting process accounting converts the vfsmount
reference pinned by the opened file into special one and turns it back
to normal when it gets shut down; acct_auto_close() is done when no
normal references are left. That way it does *not* obstruct umount(2)
and it silently gets turned off when the last normal reference to
vfsmount is gone. Which is exactly what we want...

The same should be done by LSM module that holds some internal
references to vfsmount and wants to shut them down on umount - it should
make them special and security_sb_umount_close() will be called exactly
when the last normal reference to vfsmount is gone.

quota handling is even simpler - we don't use normal file IO anymore, so
there's no need to hold vfsmounts at all. DQUOT_OFF() is done from
deactivate_super(), where it really belongs.

Signed-off-by: Al Viro
Signed-off-by: Linus Torvalds

Al Viro
2005-11-08 10:18:09 +0800

30 Oct, 2005

1 commit

4294621f4 [PATCH] mm: rss = file_rss + anon_rss ... Browse Code »

I was lazy when we added anon_rss, and chose to change as few places as
possible. So currently each anonymous page has to be counted twice, in rss
and in anon_rss. Which won't be so good if those are atomic counts in some
configurations.

Change that around: keep file_rss and anon_rss separately, and add them
together (with get_mm_rss macro) when the total is needed - reading two
atomics is much cheaper than updating two atomics. And update anon_rss
upfront, typically in memory.c, not tucked away in page_add_anon_rmap.

Signed-off-by: Hugh Dickins
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Hugh Dickins
2005-10-30 12:40:38 +0800