24 Aug, 2009

1 commit

  • When process accounting is enabled, every exiting process writes a log to
    the account file. In addition, every once in a while one of the exiting
    processes checks whether there's enough free space for the log.

    SELinux policy may or may not allow the exiting process to stat the fs.
    So unsuspecting processes start generating AVC denials just because
    someone enabled process accounting.

    For these filesystem operations, the exiting process's credentials should
    be temporarily switched to that of the process which enabled accounting,
    because it's really that process which wanted to have the accounting
    information logged.

    Signed-off-by: Michal Schmidt
    Acked-by: David Howells
    Acked-by: Serge Hallyn
    Signed-off-by: Andrew Morton
    Signed-off-by: James Morris

    Michal Schmidt
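
    The credential switch described above can be sketched as a small
    user-space model. This is not kernel code: struct cred is reduced to two
    fields, current_cred_ptr stands in for the running task's credentials, and
    check_free_space() is a hypothetical placeholder for the filesystem check.
    Only the helper names mirror the kernel's override_creds()/revert_creds()
    pair.

```c
#include <stddef.h>

/* Simplified stand-in for the kernel's credential structure. */
struct cred {
    unsigned int uid;
    int may_stat_fs;   /* models whether policy allows stat on the fs */
};

/* Models the credentials of the currently running (exiting) task. */
static const struct cred *current_cred_ptr;

/* Install new credentials and return the previous ones. */
static const struct cred *override_creds(const struct cred *new_cred)
{
    const struct cred *old = current_cred_ptr;
    current_cred_ptr = new_cred;
    return old;
}

/* Restore the credentials saved by override_creds(). */
static void revert_creds(const struct cred *old)
{
    current_cred_ptr = old;
}

/* Hypothetical free-space check: succeeds only if the active creds allow it. */
static int check_free_space(void)
{
    return current_cred_ptr->may_stat_fs ? 0 : -1;
}

/* Run the check with the credentials of whoever enabled accounting. */
static int acct_check_as_enabler(const struct cred *enabler)
{
    const struct cred *orig = override_creds(enabler);
    int ret = check_free_space();

    revert_creds(orig);   /* the exiting task gets its own creds back */
    return ret;
}
```

    In this model the exiting task never needs permission of its own for the
    check, and its credentials are unchanged once the helper returns.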
     

01 Jul, 2009

1 commit

  • The file opened in acct_on and freshly stored in the ns->bacct struct can
    be closed in acct_file_reopen by a concurrent call after we release
    acct_lock and before we call mntput(file->f_path.mnt).

    Record file->f_path.mnt in a local variable and use this variable only.

    Signed-off-by: Renaud Lottiaux
    Signed-off-by: Louis Rilling
    Cc: Al Viro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Renaud Lottiaux
     

14 Jan, 2009

1 commit


14 Nov, 2008

1 commit

  • Wrap access to task credentials so that they can be separated more easily from
    the task_struct during the introduction of COW creds.

    Change most current->(|e|s|fs)[ug]id to current_(|e|s|fs)[ug]id().

    Change some task->e?[ug]id to task_e?[ug]id(). In some places it makes more
    sense to use RCU directly rather than a convenient wrapper; these will be
    addressed by later patches.

    Signed-off-by: David Howells
    Reviewed-by: James Morris
    Acked-by: Serge Hallyn
    Cc: Al Viro
    Cc: linux-audit@redhat.com
    Cc: containers@lists.linux-foundation.org
    Cc: linux-mm@kvack.org
    Signed-off-by: James Morris

    David Howells
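
    A minimal sketch of the accessor idea described above, as a user-space
    model rather than the kernel's definitions. The task_struct here is a
    simplified stand-in, current_task is a hypothetical global replacing the
    kernel's current, and runs_with_elevated_euid() is an illustrative caller.
    The point is only that call sites go through wrappers, so the storage
    behind them can later move out of task_struct.

```c
/* Simplified stand-in for the credential fields of a task. */
struct task_struct {
    unsigned int uid, euid;
    unsigned int gid, egid;
};

/* Hypothetical replacement for the kernel's "current" task pointer. */
static struct task_struct *current_task;
#define current (current_task)

/* Call sites use these wrappers instead of touching the fields directly. */
#define current_uid()   (current->uid)
#define current_euid()  (current->euid)
#define current_gid()   (current->gid)
#define current_egid()  (current->egid)

/* Wrappers for reading another task's effective IDs. */
#define task_euid(task) ((task)->euid)
#define task_egid(task) ((task)->egid)

/* Illustrative caller: true when a non-root user runs with euid 0. */
static int runs_with_elevated_euid(void)
{
    return current_uid() != 0 && current_euid() == 0;
}
```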
     

14 Oct, 2008

1 commit


26 Jul, 2008

9 commits

  • Fix the one describing what this function is and add one more - about
    locking absence around pid namespaces loop.

    Signed-off-by: Pavel Emelyanov
    Cc: Randy Dunlap
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pavel Emelyanov
     
  • This just makes acct_process walk the pid namespaces from current up to
    the top and account a task in each one that has accounting turned on.

    Accessing ns->parent is safe without locking, since current is still alive
    and holds its namespace, which in turn holds its parent.

    Signed-off-by: Pavel Emelyanov
    Cc: Balbir Singh
    Cc: "Eric W. Biederman"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pavel Emelyanov
     
  • All the bsd_acct_structs with opened accounting are linked into a global
    list. So acct_auto_close(_mnt) walks the list and drops the accounting
    for each.

    Signed-off-by: Pavel Emelyanov
    Cc: Balbir Singh
    Cc: "Eric W. Biederman"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pavel Emelyanov
     
  • Allocate the structure on the first call to sys_acct(). After this, each
    namespace that requested accounting keeps this structure until the
    namespace itself dies.

    Two notes:
    - routines that close the accounting at fs umount time use
    the init_pid_ns's acct for now;
    - the accounting routine accounts to the dying task's namespace
    (also for now).

    Signed-off-by: Pavel Emelyanov
    Cc: Balbir Singh
    Cc: "Eric W. Biederman"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pavel Emelyanov
     
  • This adds the appropriate pointer to all the internal (i.e. static)
    functions that work with the global acct instance. API calls pass the
    global instance to them (while we still have one).

    Mostly this is a s/acct_globals./acct->/ over the file.

    Signed-off-by: Pavel Emelyanov
    Cc: Balbir Singh
    Cc: "Eric W. Biederman"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pavel Emelyanov
     
  • Don't use per-bsd-acct-struct lock, but work with a global one.

    This lock is taken for short periods, so it doesn't seem it'll become a
    bottleneck, but it will allow us to easily avoid many locking difficulties
    in the future.

    So this is mostly a s/acct_globals.lock/acct_lock/ over the file.

    Signed-off-by: Pavel Emelyanov
    Cc: Balbir Singh
    Cc: "Eric W. Biederman"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pavel Emelyanov
     
  • We're going to have many bsd_acct_struct instances, not just one, so the
    timer (currently working with a global one) has to know which one to work
    with.

    Use a handy setup_timer macro for it (thanks to Oleg for one).

    Signed-off-by: Pavel Emelyanov
    Cc: Balbir Singh
    Cc: "Eric W. Biederman"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pavel Emelyanov
     
  • acct_process does not actually accept any arguments.

    Signed-off-by: Pavel Emelyanov
    Cc: Balbir Singh
    Cc: "Eric W. Biederman"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pavel Emelyanov
     
  • After I fixed access to task->tgid in kernel/acct.c, Oleg pointed out some
    bad side effects of the interaction between this accounting and pid
    namespaces. I.e. when some task in a pid namespace sets this accounting
    up, this blocks all the others from doing the same. Restricting this to
    the init namespace only could help, but didn't look like a graceful
    solution.

    So here is the approach to make this accounting work with pid namespaces
    properly.

    The idea is simple - when a task dies it accounts itself in each namespace
    it is visible from and which set the accounting up.

    For example here are the commands run and the output of lastcomm from init
    and sub namespaces:

    init_ns# accton pacct
    sub_ns# accton pacct (this is a different file - sub ns is run in
    a chroot-ed environment)
    init_ns# cat /dev/null
    sub_ns# ls /dev/null
    init_ns# accton
    sub_ns# accton

    sub_ns# lastcomm -f pacct
    ls 0 [136,0] 0.00 secs Thu May 15 10:30
    accton 0 [136,0] 0.00 secs Thu May 15 10:30

    init_ns# lastcomm -f pacct
    accton root pts/0 0.00 secs Thu May 15 14:30 << got from sub
    cat root pts/1 0.00 secs Thu May 15 14:30
    ls root pts/0 0.00 secs Thu May 15 14:30 << got from sub
    accton root pts/1 0.00 secs Thu May 15 14:30

    That was the summary, the details are in patches.

    This patch:

    It will be visible in pid_namespace.h file, so fix its name to look better
    outside the acct.c file.

    Signed-off-by: Pavel Emelyanov
    Cc: Balbir Singh
    Cc: "Eric W. Biederman"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pavel Emelyanov
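
    The idea summarized in the commit above (a dying task accounts itself in
    every namespace it is visible from that has accounting set up) can be
    sketched as a user-space model. The struct and field names ns->parent,
    ns->bacct and bsd_acct_struct follow this log; the active and records
    fields and the function acct_process_model() are hypothetical, and no
    locking is modelled.

```c
#include <stddef.h>

/* Per-namespace accounting state; "active" and "records" are illustrative. */
struct bsd_acct_struct {
    int active;    /* non-zero while accounting is switched on */
    int records;   /* number of records written so far */
};

struct pid_namespace {
    struct pid_namespace *parent;   /* NULL for the init namespace */
    struct bsd_acct_struct *bacct;  /* NULL until accounting was requested */
};

/*
 * Walk from the task's own namespace up to the top and write one record
 * in each namespace that has accounting turned on.
 * Returns the number of namespaces that recorded the exit.
 */
static int acct_process_model(struct pid_namespace *ns)
{
    int written = 0;

    for (; ns != NULL; ns = ns->parent) {
        struct bsd_acct_struct *acct = ns->bacct;

        if (acct == NULL || !acct->active)
            continue;   /* nobody asked for accounting here */
        acct->records++;
        written++;
    }
    return written;
}
```

    This matches the lastcomm output above: a task exiting in the sub
    namespace shows up in both files, while a task exiting in the init
    namespace shows up only in the init namespace's file.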
     

25 Mar, 2008

2 commits

  • In case we're accounting from a sub-namespace, the tgids reported will not
    refer to the right namespace.

    Save the pid_namespace we're accounting in on the acct_glbs and use it in
    do_acct_process.

    Two less :) places using the task_struct.tgid member.

    Signed-off-by: Pavel Emelyanov
    Cc: Oleg Nesterov
    Cc: "Paul E. McKenney"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pavel Emelyanov
     
  • This is minor, but dereferencing even current's real_parent is not safe on
    debug kernels, since the memory it points to can be unmapped; RCU
    protection is required.

    Besides, the tgid field is deprecated and is to be replaced with task_tgid_xxx
    call (the 2nd patch), so RCU will be required anyway.

    Signed-off-by: Pavel Emelyanov
    Cc: Oleg Nesterov
    Cc: "Paul E. McKenney"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pavel Emelyanov
     

08 Jan, 2008

1 commit

  • The ac_ppid field reported in process accounting records
    should match what getppid() would have returned to that
    process, regardless of whether a debugger is attached.

    Signed-off-by: Roland McGrath
    Signed-off-by: Linus Torvalds

    Roland McGrath
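
    A small user-space model of the rule stated above. It assumes the usual
    split between a task's real parent (the process that created it) and the
    parent that a debugger temporarily takes over while tracing. The struct
    task and all three functions are hypothetical; the "old" variant is an
    assumption about the behaviour being fixed, shown only for contrast.

```c
/* Simplified task: two parent links, as assumed in the lead-in. */
struct task {
    int tgid;
    struct task *real_parent;  /* the process that created this task */
    struct task *parent;       /* the tracer while a debugger is attached */
};

/* What getppid() reports in this model. */
static int model_getppid(const struct task *t)
{
    return t->real_parent->tgid;
}

/* Accounting value that matches getppid(), debugger or not. */
static int model_ac_ppid(const struct task *t)
{
    return t->real_parent->tgid;
}

/* Assumed old behaviour: follows the tracer while one is attached. */
static int model_ac_ppid_old(const struct task *t)
{
    return t->parent->tgid;
}
```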
     

27 Nov, 2007

1 commit


19 Oct, 2007

1 commit


26 Jul, 2007

1 commit

  • This avoids use of the kernel-internal "xtime" variable directly outside
    of the actual time-related functions. Instead, use the helper functions
    that we already have available to us.

    This doesn't actually change any behaviour, but this will allow us to
    fix the fact that "xtime" isn't updated very often with CONFIG_NO_HZ
    (because much of the realtime information is maintained as separate
    offsets to 'xtime'), which has caused interfaces that use xtime directly
    to get a time that is out of sync with the real-time clock by up to a
    third of a second or so.

    Signed-off-by: John Stultz
    Cc: Ingo Molnar
    Cc: Thomas Gleixner
    Signed-off-by: Linus Torvalds

    john stultz
     

09 Dec, 2006

3 commits

  • Change all the uses of f_{dentry,vfsmnt} to f_path.{dentry,mnt} in
    linux/kernel/.

    Signed-off-by: Josef "Jeff" Sipek
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Josef "Jeff" Sipek
     
  • No need to take the global tty_mutex, signal->tty->driver can't go away while
    we are holding ->siglock.

    Signed-off-by: Oleg Nesterov
    Acked-by: Peter Zijlstra
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • Fix the locking of signal->tty.

    Use ->sighand->siglock to protect ->signal->tty; this lock is already used
    by most other members of ->signal/->sighand. And unless we are 'current'
    or the tasklist_lock is held we need ->siglock to access ->signal anyway.

    (NOTE: sys_unshare() is broken wrt ->sighand locking rules)

    Note that tty_mutex is held over tty destruction, so while holding
    tty_mutex any tty pointer remains valid. Otherwise the lifetime of ttys
    are governed by their open file handles. This leaves some holes for tty
    access from signal->tty (or any other non file related tty access).

    It solves the tty SLAB scribbles we were seeing.

    (NOTE: the change from group_send_sig_info to __group_send_sig_info needs to
    be examined by someone familiar with the security framework, I think
    it is safe given the SEND_SIG_PRIV from other __group_send_sig_info
    invocations)

    [schwidefsky@de.ibm.com: 3270 fix]
    [akpm@osdl.org: various post-viro fixes]
    Signed-off-by: Peter Zijlstra
    Acked-by: Alan Cox
    Cc: Oleg Nesterov
    Cc: Prarit Bhargava
    Cc: Chris Wright
    Cc: Roland McGrath
    Cc: Stephen Smalley
    Cc: James Morris
    Cc: "David S. Miller"
    Cc: Jeff Dike
    Cc: Martin Schwidefsky
    Cc: Jan Kara
    Signed-off-by: Martin Schwidefsky
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Peter Zijlstra
     

08 Dec, 2006

1 commit


01 Oct, 2006

1 commit

  • There were a few accounting data/macros that are used in CSA but are #ifdef'ed
    inside CONFIG_BSD_PROCESS_ACCT. This patch is to change those ifdef's from
    CONFIG_BSD_PROCESS_ACCT to CONFIG_TASK_XACCT. A few defines are moved from
    kernel/acct.c and include/linux/acct.h to kernel/tsacct.c and
    include/linux/tsacct_kern.h.

    Signed-off-by: Jay Lan
    Cc: Shailabh Nagar
    Cc: Balbir Singh
    Cc: Jes Sorensen
    Cc: Chris Sturtivant
    Cc: Tony Ernst
    Cc: Guillaume Thouvenin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jay Lan
     

30 Sep, 2006

1 commit

  • Add tty locking around the audit and accounting code.

    The whole current->signal-> locking is all deeply strange but it's for
    someone else to sort out. Add rather than replace the lock for acct.c

    Signed-off-by: Alan Cox
    Acked-by: Arjan van de Ven
    Cc: Al Viro
    Cc: Oleg Nesterov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alan Cox
     

15 Jul, 2006

1 commit


01 Jul, 2006

1 commit


28 Jun, 2006

2 commits

  • Fix kernel-doc parameters in kernel/

    Warning(/var/linsrc/linux-2617-g9//kernel/auditsc.c:1376): No description found for parameter 'u_abs_timeout'
    Warning(/var/linsrc/linux-2617-g9//kernel/auditsc.c:1420): No description found for parameter 'u_msg_prio'
    Warning(/var/linsrc/linux-2617-g9//kernel/auditsc.c:1420): No description found for parameter 'u_abs_timeout'
    Warning(/var/linsrc/linux-2617-g9//kernel/acct.c:526): No description found for parameter 'pacct'

    Signed-off-by: Randy Dunlap
    Cc: Al Viro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Randy Dunlap
     
  • kernel/acct.c:579:19: warning: non-ANSI function declaration of function 'acct_process'

    Signed-off-by: Randy Dunlap
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Randy Dunlap
     

26 Jun, 2006

4 commits

  • In the current 2.6.17 implementation, the signal_struct referred to from
    task_struct is used as a per-process data structure. The pacct facility also
    uses it as a per-process data structure to store stime, utime, minflt, majflt.
    But those members are saved in __exit_signal(), which is too late.

    For example, if several threads exit at the same time, the pacct facility may
    drop accounting for some of those threads. (see the following 'The results
    of original 2.6.17 kernel') I think accounting information should be
    completely collected into the per-process data structure before writing out
    an accounting record.

    This patch fixes this matter. Accumulation of stime, utime, minflt and majflt
    is done before generating the accounting record.

    [mingo@elte.hu: fix acct_collect() siglock bug found by lockdep]
    Signed-off-by: KaiGai Kohei
    Signed-off-by: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KaiGai Kohei
     
  • When the pacct facility generates the 'ac_flag' field in an accounting
    record, it refers to the task_struct of the thread which died last in the
    process. Any other task_structs are ignored.

    Therefore, the pacct facility drops the ASU flag even if root-privilege
    operations are used by any thread other than the last one. In addition, the
    AFORK flag is always set when the group leader's thread didn't die last, even
    though the process has called execve() after fork().

    We have the same problem in ac_exitcode. The recorded ac_exitcode is the exit
    code of the last thread in the process. This exit code may not be the
    group leader's one.

    KaiGai Kohei
     
  • The pacct facility needs an I/O operation when an accounting record is
    generated. This can wake up the OOM killer. If the OOM killer is
    activated, it kills some processes to make them release their memory
    regions.

    But acct_process() is called in the killed process's context before calling
    exit_mm(), so those processes cannot release their own memory. As a result,
    the processes stop at this point, which finally causes a system stall.

    KaiGai Kohei
     
  • copy_process() appears to be the only caller of acct_clear_integrals() and
    does not pass in NULL task pointers. Remove the unnecessary check.

    Signed-off-by: Matt Helsley
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Matt Helsley
     

23 Jun, 2006

1 commit

  • Give the statfs superblock operation a dentry pointer rather than a superblock
    pointer.

    This complements the get_sb() patch. That reduced the significance of
    sb->s_root, allowing NFS to place a fake root there. However, NFS does
    require a dentry to use as a target for the statfs operation. This permits
    the root in the vfsmount to be used instead.

    linux/mount.h has been added where necessary to make allyesconfig build
    successfully.

    Interest has also been expressed for use with the FUSE and XFS filesystems.

    Signed-off-by: David Howells
    Acked-by: Al Viro
    Cc: Nathan Scott
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Howells
     

01 Apr, 2006

1 commit

  • I noticed a bug in the process accounting facility. In a multi-threaded
    process, some data would be recorded incorrectly when the group_leader dies
    earlier than one or more threads. The attached patch fixes this problem.

    See below. 'bugacct' is a test program that creates a worker thread after
    sleeping for 4 seconds; the group_leader then dies soon afterwards. The
    worker thread consumes CPU/memory for 6 seconds, then exits. We can estimate
    10 seconds as etime and 6 seconds as stime + utime. This is a sample program
    in which the group_leader dies earlier than the other threads.

    The results of same binary execution on different kernel are below.
    -- accounted records --------------------
             | btime    | utime | stime | etime | minflt | majflt | comm    |
    original | 13:16:40 |  0.00 |  0.00 |  6.10 |    171 |      0 | bugacct |
    patched  | 13:20:21 |  5.83 |  0.18 | 10.03 |  32776 |      0 | bugacct |
    (*) bugacct allocates 128MB memory, thus 128MB / 4KB = 32768 of minflt is
    appropriate.

    -- Test results in original kernel ------
    $ date; time -p ./bugacct
    Tue Mar 28 13:16:36 JST 2006
    [...]
    start_time.tv_sec*NSEC_PER_SEC
    + current->start_time.tv_nsec;
    ----

    The following section calculates stime and utime of the process.
    But it might count the utime and stime of the group_leader twice
    and ignore the utime and stime of the thread that dies last, when one or
    more threads remain after the group_leader is dead.
    The ac_utime should be calculated as the sum of the signal->utime
    and the utime of the thread that dies last. The same applies to ac_stime.

    ---- do_acct_process() in kernel/acct.c:
    jiffies = cputime_to_jiffies(cputime_add(current->group_leader->utime,
    current->signal->utime));
    ac.ac_utime = encode_comp_t(jiffies_to_AHZ(jiffies));
    jiffies = cputime_to_jiffies(cputime_add(current->group_leader->stime,
    current->signal->stime));
    ac.ac_stime = encode_comp_t(jiffies_to_AHZ(jiffies));
    ----

    The minflt/majflt calculation has the same problem.
    This patch solves those problems, I think.

    Cc: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KaiGai Kohei
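
    The corrected calculation described above can be sketched in user space.
    The two structs are hypothetical reductions of signal_struct and
    task_struct to the fields involved, times are plain tick counts rather
    than cputime_t, and expected_minflt() only reproduces the 128MB / 4KB
    arithmetic from the table note.

```c
/* Totals already accumulated from threads that have exited. */
struct signal_model {
    unsigned long long utime, stime;
};

/* The thread that dies last and triggers the accounting record. */
struct thread_model {
    unsigned long long utime, stime;
};

/* ac_utime: accumulated user time plus that of the last thread. */
static unsigned long long ac_utime_model(const struct signal_model *sig,
                                         const struct thread_model *last)
{
    return sig->utime + last->utime;
}

/* ac_stime: the same sum for system time. */
static unsigned long long ac_stime_model(const struct signal_model *sig,
                                         const struct thread_model *last)
{
    return sig->stime + last->stime;
}

/* Minor faults expected when every page of an allocation is touched once. */
static unsigned long long expected_minflt(unsigned long long bytes,
                                          unsigned long long page_size)
{
    return bytes / page_size;
}
```

    With a group leader that exited after doing no work and a worker thread
    that dies last, the sums are simply the worker's own times, and a 128MB
    allocation with 4KB pages gives 32768 expected minor faults.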
     

12 Jan, 2006

1 commit

  • - Move capable() from sched.h to capability.h;

    - Use <linux/capability.h> where capable() is used
    (in include/, block/, ipc/, kernel/, a few drivers/,
    mm/, security/, & sound/;
    many more drivers/ to go)

    Signed-off-by: Randy Dunlap
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Randy.Dunlap
     

07 Jan, 2006

1 commit

  • There are some more places where the use of cputime_t instead of an integer
    type and the associated macros is necessary for the virtual cputime accounting
    on s390. Affected are the s390 specific appldata code and BSD process
    accounting.

    Signed-off-by: Martin Schwidefsky
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Martin Schwidefsky
     

08 Nov, 2005

1 commit

  • The way we currently deal with quota and process accounting that might
    keep vfsmount busy at umount time is inherently broken; we try to turn
    them off just in case (not quite correctly, at that) and

    a) pray umount doesn't fail (otherwise they'll stay turned off)
    b) pray nobody does anything funny just as we turn quota off

    Moreover, LSM provides hooks for doing the same sort of broken logic.

    The proper way to deal with that is to introduce the second kind of
    reference to vfsmount. Semantics:

    - when the last normal reference is dropped, all special ones are
    converted to normal ones and if there had been any, cleanup is done.
    - normal reference can be cloned into a special one
    - special reference can be converted to normal one; that's a no-op if
    we'd already passed the point of no return (i.e. mntput() had
    converted special references to normal and started cleanup).

    The way it works: e.g. starting process accounting converts the vfsmount
    reference pinned by the opened file into special one and turns it back
    to normal when it gets shut down; acct_auto_close() is done when no
    normal references are left. That way it does *not* obstruct umount(2)
    and it silently gets turned off when the last normal reference to
    vfsmount is gone. Which is exactly what we want...

    The same should be done by LSM module that holds some internal
    references to vfsmount and wants to shut them down on umount - it should
    make them special and security_sb_umount_close() will be called exactly
    when the last normal reference to vfsmount is gone.

    quota handling is even simpler - we don't use normal file IO anymore, so
    there's no need to hold vfsmounts at all. DQUOT_OFF() is done from
    deactivate_super(), where it really belongs.

    Signed-off-by: Al Viro
    Signed-off-by: Linus Torvalds

    Al Viro
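
    The reference semantics listed above can be sketched as a toy user-space
    model. Everything here is hypothetical: struct mnt_model, its counters
    and the model_* functions are illustrative names, locking is left out,
    and cleanup is reduced to a counter plus releasing the converted
    references.

```c
/* Toy model of a vfsmount with two kinds of references. */
struct mnt_model {
    int normal;    /* ordinary references; these keep the mount busy */
    int special;   /* references that must not obstruct umount */
    int cleanups;  /* how many times cleanup (e.g. auto-close) ran */
};

/* Clone a normal reference into a special one; the caller keeps its own. */
static void model_clone_special(struct mnt_model *m)
{
    m->special++;
}

/* Convert a special reference back; a no-op past the point of no return. */
static void model_special_to_normal(struct mnt_model *m)
{
    if (m->special > 0) {
        m->special--;
        m->normal++;
    }
}

/* Drop a normal reference. Returns 1 once the mount can go away. */
static int model_put(struct mnt_model *m)
{
    m->normal--;
    if (m->normal > 0)
        return 0;
    if (m->special > 0) {
        m->normal = m->special;  /* special references become normal */
        m->special = 0;
        m->cleanups++;           /* e.g. accounting is shut down here */
        m->normal = 0;           /* cleanup releases the converted refs */
    }
    return 1;
}
```

    Starting accounting corresponds to model_clone_special() followed by
    model_put() of the file's own reference; the mount then goes away as
    soon as the last ordinary user drops its reference, and cleanup runs
    exactly once.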
     

30 Oct, 2005

1 commit

  • I was lazy when we added anon_rss, and chose to change as few places as
    possible. So currently each anonymous page has to be counted twice, in rss
    and in anon_rss. Which won't be so good if those are atomic counts in some
    configurations.

    Change that around: keep file_rss and anon_rss separately, and add them
    together (with get_mm_rss macro) when the total is needed - reading two
    atomics is much cheaper than updating two atomics. And update anon_rss
    upfront, typically in memory.c, not tucked away in page_add_anon_rmap.

    Signed-off-by: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
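
    A minimal sketch of the counting scheme described above, using C11
    atomics in user space. The struct mm_model and the helper names are
    hypothetical; only file_rss, anon_rss and the idea of a get_mm_rss sum
    come from the commit text.

```c
#include <stdatomic.h>

/* Two separate counters instead of a total plus an anonymous subtotal. */
struct mm_model {
    atomic_long file_rss;
    atomic_long anon_rss;
};

static void mm_model_init(struct mm_model *mm)
{
    atomic_init(&mm->file_rss, 0);
    atomic_init(&mm->anon_rss, 0);
}

/* Each new page updates exactly one atomic counter. */
static void add_anon_page(struct mm_model *mm)
{
    atomic_fetch_add(&mm->anon_rss, 1);
}

static void add_file_page(struct mm_model *mm)
{
    atomic_fetch_add(&mm->file_rss, 1);
}

/* The total is computed on demand by reading both counters. */
static long get_mm_rss_model(struct mm_model *mm)
{
    return atomic_load(&mm->file_rss) + atomic_load(&mm->anon_rss);
}
```

    The update path touches one atomic per page, and only readers that need
    the total pay for two loads.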