28 Jul, 2011

1 commit

  • …s/security-testing-2.6

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/security-testing-2.6: (54 commits)
    tpm_nsc: Fix bug when loading multiple TPM drivers
    tpm: Move tpm_tis_reenable_interrupts out of CONFIG_PNP block
    tpm: Fix compilation warning when CONFIG_PNP is not defined
    TOMOYO: Update kernel-doc.
    tpm: Fix a typo
    tpm_tis: Probing function for Intel iTPM bug
    tpm_tis: Fix the probing for interrupts
    tpm_tis: Delay ACPI S3 suspend while the TPM is busy
    tpm_tis: Re-enable interrupts upon (S3) resume
    tpm: Fix display of data in pubek sysfs entry
    tpm_tis: Add timeouts sysfs entry
    tpm: Adjust interface timeouts if they are too small
    tpm: Use interface timeouts returned from the TPM
    tpm_tis: Introduce durations sysfs entry
    tpm: Adjust the durations if they are too small
    tpm: Use durations returned from TPM
    TOMOYO: Enable conditional ACL.
    TOMOYO: Allow using argv[]/envp[] of execve() as conditions.
    TOMOYO: Allow using executable's realpath and symlink's target as conditions.
    TOMOYO: Allow using owner/group etc. of file objects as conditions.
    ...

    Fix up trivial conflict in security/tomoyo/realpath.c

    Linus Torvalds
     

27 Jul, 2011

1 commit

  • This allows us to move duplicated code in <asm/atomic.h>
    (atomic_inc_not_zero() for now) to <linux/atomic.h>

    Signed-off-by: Arun Sharma
    Reviewed-by: Eric Dumazet
    Cc: Ingo Molnar
    Cc: David Miller
    Cc: Eric Dumazet
    Acked-by: Mike Frysinger
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Arun Sharma
     

26 Jul, 2011

1 commit

  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (43 commits)
    fs: Merge split strings
    treewide: fix potentially dangerous trailing ';' in #defined values/expressions
    uwb: Fix misspelling of neighbourhood in comment
    net, netfilter: Remove redundant goto in ebt_ulog_packet
    trivial: don't touch files that are removed in the staging tree
    lib/vsprintf: replace link to Draft by final RFC number
    doc: Kconfig: `to be' -> `be'
    doc: Kconfig: Typo: square -> squared
    doc: Konfig: Documentation/power/{pm => apm-acpi}.txt
    drivers/net: static should be at beginning of declaration
    drivers/media: static should be at beginning of declaration
    drivers/i2c: static should be at beginning of declaration
    XTENSA: static should be at beginning of declaration
    SH: static should be at beginning of declaration
    MIPS: static should be at beginning of declaration
    ARM: static should be at beginning of declaration
    rcu: treewide: Do not use rcu_read_lock_held when calling rcu_dereference_check
    Update my e-mail address
    PCIe ASPM: forcedly -> forcibly
    gma500: push through device driver tree
    ...

    Fix up trivial conflicts:
    - arch/arm/mach-ep93xx/dma-m2p.c (deleted)
    - drivers/gpio/gpio-ep93xx.c (renamed and context nearby)
    - drivers/net/r8169.c (just context changes)

    Linus Torvalds
     

20 Jul, 2011

1 commit


09 Jul, 2011

1 commit


09 Jun, 2011

1 commit

  • We recently found that in some configurations SELinux was blocking the ability
    for cgroupfs to be mounted. The reason is that cgroupfs creates
    files and directories during the get_sb() call and also uses lookup_one_len()
    during that same get_sb() call. This is a problem since the security
    subsystem cannot initialize the superblock and the inodes in that filesystem
    until after the get_sb() call returns. Thus we leave the inodes in
    an uninitialized state during get_sb(). For the vast majority of filesystems
    this is not an issue, but since cgroupfs uses lookup_one_len() it does
    search permission checks on the directories in the path it walks. Since the
    inode security state is not set up SELinux does these checks as if the inodes
    were 'unlabeled.'

    Many 'normal' userspace processes do not have permission to interact with
    unlabeled inodes. The solution presented here is to do the permission checks
    of path walk and inode creation as the kernel rather than as the task that
    called mount. Since the kernel has permission to read/write/create
    unlabeled inodes the get_sb() call will complete successfully and the SELinux
    code will be able to initialize the superblock and those inodes created during
    the get_sb() call.

    This appears to be the same solution used by other filesystems such as devtmpfs
    to solve the same issue and should thus have no negative impact on other LSMs
    which currently work.

    Signed-off-by: Eric Paris
    Acked-by: Paul Menage
    Signed-off-by: James Morris

    eparis@redhat
     

27 May, 2011

4 commits

  • The ns_cgroup is an annoying cgroup at the namespace / cgroup frontier and
    leads to some problems:

    * cgroup creation is out-of-control
    * cgroup name can conflict when pids are looping
    * it is not possible to have a single process handling a lot of
    namespaces without falling into an exponential creation time
    * we may want to create a namespace without creating a cgroup

    The ns_cgroup was replaced by a compatibility flag 'clone_children',
    where a newly created cgroup will copy the parent cgroup values.
    The userspace has to manually create a cgroup and add a task to
    the 'tasks' file.

    This patch removes the ns_cgroup as suggested in the following thread:

    https://lists.linux-foundation.org/pipermail/containers/2009-June/018616.html

    The 'cgroup_clone' function is removed because it is no longer used.

    This is a userspace-visible change. Commit 45531757b45c ("cgroup: notify
    ns_cgroup deprecated") (merged into 2.6.27) caused the kernel to emit a
    printk warning users that the feature is planned for removal. Since that
    time we have heard from XXX users who were affected by this.

    Signed-off-by: Daniel Lezcano
    Signed-off-by: Serge E. Hallyn
    Cc: Eric W. Biederman
    Cc: Jamal Hadi Salim
    Reviewed-by: Li Zefan
    Acked-by: Paul Menage
    Acked-by: Matt Helsley
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Daniel Lezcano
     
  • Convert cgroup_attach_proc to use flex_array.

    The cgroup_attach_proc implementation requires a pre-allocated array to
    store task pointers to atomically move a thread-group, but asking for a
    monolithic array with kmalloc() may be unreliable for very large groups.
    Using flex_array provides the same functionality with less risk of
    failure.
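
    The trade-off described above can be modelled in userspace. The sketch
    below is an assumption-laden illustration, not the kernel's flex_array
    API: demo_flex, demo_flex_put(), demo_flex_get() and the chunk sizes are
    hypothetical names and values. It stores pointers in small chunks that
    are allocated on first use, so no single large allocation is needed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define DEMO_PART_ELEMS 4  /* elements per chunk; tiny on purpose */
#define DEMO_MAX_PARTS  8  /* upper bound on the number of chunks */

/* Hypothetical chunked pointer array; not the kernel's struct flex_array. */
struct demo_flex {
    void **parts[DEMO_MAX_PARTS]; /* each chunk is allocated on first use */
};

/* Store ptr at idx, allocating only the small chunk that covers idx. */
int demo_flex_put(struct demo_flex *fa, unsigned int idx, void *ptr)
{
    unsigned int part = idx / DEMO_PART_ELEMS;

    assert(fa != NULL);
    if (part >= DEMO_MAX_PARTS)
        return -1;
    if (fa->parts[part] == NULL) {
        fa->parts[part] = calloc(DEMO_PART_ELEMS, sizeof(void *));
        if (fa->parts[part] == NULL)
            return -1;
    }
    fa->parts[part][idx % DEMO_PART_ELEMS] = ptr;
    return 0;
}

/* Return the pointer stored at idx, or NULL if nothing was stored there. */
void *demo_flex_get(const struct demo_flex *fa, unsigned int idx)
{
    unsigned int part = idx / DEMO_PART_ELEMS;

    assert(fa != NULL);
    if (part >= DEMO_MAX_PARTS || fa->parts[part] == NULL)
        return NULL;
    return fa->parts[part][idx % DEMO_PART_ELEMS];
}

/* Release every chunk that was allocated. */
void demo_flex_free(struct demo_flex *fa)
{
    size_t i;

    assert(fa != NULL);
    for (i = 0; i < DEMO_MAX_PARTS; i++) {
        free(fa->parts[i]);
        fa->parts[i] = NULL;
    }
}
```

    Each allocation here is DEMO_PART_ELEMS pointers wide regardless of how
    many tasks are stored, which is the property that makes failure less
    likely than with one monolithic array.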

    This is a post-patch for cgroup-procs-write.patch.

    Signed-off-by: Ben Blum
    Cc: "Eric W. Biederman"
    Cc: Li Zefan
    Cc: Matt Helsley
    Reviewed-by: Paul Menage
    Cc: Oleg Nesterov
    Cc: David Rientjes
    Cc: Miao Xie
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ben Blum
     
  • Make procs file writable to move all threads by tgid at once.

    Add functionality that enables users to move all threads in a threadgroup
    at once to a cgroup by writing the tgid to the 'cgroup.procs' file. This
    current implementation makes use of a per-threadgroup rwsem that's taken
    for reading in the fork() path to prevent newly forking threads within the
    threadgroup from "escaping" while the move is in progress.
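
    From userspace, the interface described above amounts to writing a tgid
    into the cgroup's 'cgroup.procs' file. A minimal sketch follows; the
    helper names demo_procs_path() and demo_move_threadgroup() are
    hypothetical, and the cgroupfs mount point is whatever the caller passes.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Build "<cgroup_dir>/cgroup.procs" in buf.
 * Returns 0 on success, -1 if the path does not fit. */
int demo_procs_path(const char *cgroup_dir, char *buf, size_t size)
{
    int n;

    assert(cgroup_dir != NULL && buf != NULL);
    n = snprintf(buf, size, "%s/cgroup.procs", cgroup_dir);
    return (n < 0 || (size_t)n >= size) ? -1 : 0;
}

/* Write tgid to the cgroup's procs file so that every thread of the
 * thread group is moved in one operation.
 * Returns 0 on success, -1 on any error. */
int demo_move_threadgroup(const char *cgroup_dir, long tgid)
{
    char path[512], num[32];
    int fd, len;
    ssize_t written;

    if (demo_procs_path(cgroup_dir, path, sizeof(path)) != 0)
        return -1;
    len = snprintf(num, sizeof(num), "%ld\n", tgid);
    fd = open(path, O_WRONLY);
    if (fd < 0)
        return -1;
    written = write(fd, num, (size_t)len);
    close(fd);
    return written == (ssize_t)len ? 0 : -1;
}
```

    For example, demo_move_threadgroup("/cgroup/foo", getpid()) would ask
    the kernel to move the calling process and all of its threads, assuming
    cgroupfs is mounted under /cgroup and a cgroup named foo exists.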

    Signed-off-by: Ben Blum
    Cc: "Eric W. Biederman"
    Cc: Li Zefan
    Cc: Matt Helsley
    Reviewed-by: Paul Menage
    Cc: Oleg Nesterov
    Cc: David Rientjes
    Cc: Miao Xie
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ben Blum
     
  • Add cgroup subsystem callbacks for per-thread attachment in atomic contexts

    Add can_attach_task(), pre_attach(), and attach_task() as new callbacks
    for cgroup's subsystem interface. Unlike can_attach and attach, these
    are for per-thread operations, to be called potentially many times when
    attaching an entire threadgroup.

    Also, the old "bool threadgroup" interface is removed, as replaced by
    this. All subsystems are modified for the new interface - of note is
    cpuset, which requires from/to nodemasks for attach to be globally scoped
    (though per-cpuset would work too) to persist from its pre_attach to
    attach_task and attach.
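
    The calling pattern implied above can be sketched in userspace. This is
    a simplified model under stated assumptions: struct demo_subsys,
    demo_attach_proc() and the counting callbacks are hypothetical, the real
    callbacks take cgroup and task arguments that are omitted here, and the
    ordering shown (per-thread checks, then pre_attach, then per-thread
    attach_task, then attach) is inferred from the description.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified subsystem interface. */
struct demo_subsys {
    int  (*can_attach_task)(int tid); /* per thread, may veto the move */
    void (*pre_attach)(void);         /* once, before any thread moves */
    void (*attach_task)(int tid);     /* per thread, after it has moved */
    void (*attach)(void);             /* once, after all threads moved */
};

/* Counters so that the call pattern can be observed. */
struct demo_counts { int can_task, pre, task, whole; };
struct demo_counts demo_counts;

static int count_can_attach_task(int tid)
{
    demo_counts.can_task++;
    return tid < 0 ? -1 : 0; /* reject invalid thread ids */
}
static void count_pre_attach(void)     { demo_counts.pre++; }
static void count_attach_task(int tid) { (void)tid; demo_counts.task++; }
static void count_attach(void)         { demo_counts.whole++; }

const struct demo_subsys demo_counting_subsys = {
    count_can_attach_task, count_pre_attach, count_attach_task, count_attach,
};

/* Attach a whole thread group: nothing is changed unless every thread
 * passes can_attach_task() first. */
int demo_attach_proc(const struct demo_subsys *ss, const int *tids, size_t n)
{
    size_t i;
    int ret;

    assert(ss != NULL && tids != NULL);
    for (i = 0; i < n; i++) {
        if (ss->can_attach_task) {
            ret = ss->can_attach_task(tids[i]);
            if (ret)
                return ret;
        }
    }
    if (ss->pre_attach)
        ss->pre_attach();
    for (i = 0; i < n; i++) {
        if (ss->attach_task)
            ss->attach_task(tids[i]);
    }
    if (ss->attach)
        ss->attach();
    return 0;
}
```

    With three threads, the per-thread callbacks run three times each while
    pre_attach and attach run once, which is why state such as cpuset's
    nodemasks has to persist across the calls.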

    This is a pre-patch for cgroup-procs-writable.patch.

    Signed-off-by: Ben Blum
    Cc: "Eric W. Biederman"
    Cc: Li Zefan
    Cc: Matt Helsley
    Reviewed-by: Paul Menage
    Cc: Oleg Nesterov
    Cc: David Rientjes
    Cc: Miao Xie
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ben Blum
     

08 May, 2011

3 commits


31 Mar, 2011

1 commit


23 Mar, 2011

1 commit

  • list_del() leaves poison in the prev and next pointers. The next
    list_empty() will compare those poisons, and say the list isn't empty.
    Any list operations that assume the node is on a list because of such a
    check will be fooled into dereferencing poison. One needs to INIT the
    node after the del, and fortunately there's already a wrapper for that -
    list_del_init().

    Some of the dels are followed by deallocations, so can be ignored, and one
    can be merged with an add to make a move. Apart from that, I erred on the
    side of caution in making nodes list_empty()-queriable.
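
    The failure mode is easy to reproduce with a userspace model of the
    list primitives. The sketch below is illustrative only: the demo_
    helpers mirror the kernel functions of similar names, and the poison
    values are made up rather than the kernel's LIST_POISON1/LIST_POISON2.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct list_head {
    struct list_head *next, *prev;
};

/* Illustrative poison values; the kernel's own constants differ. */
#define DEMO_POISON1 ((struct list_head *)0x100)
#define DEMO_POISON2 ((struct list_head *)0x200)

void demo_init(struct list_head *h) { h->next = h; h->prev = h; }

/* A node is "empty" when it points back at itself. */
bool demo_list_empty(const struct list_head *h) { return h->next == h; }

void demo_list_add(struct list_head *n, struct list_head *head)
{
    n->next = head->next;
    n->prev = head;
    head->next->prev = n;
    head->next = n;
}

static void demo_unlink(struct list_head *e)
{
    e->prev->next = e->next;
    e->next->prev = e->prev;
}

/* Like list_del(): unlink and leave poison in the node. */
void demo_list_del(struct list_head *e)
{
    demo_unlink(e);
    e->next = DEMO_POISON1;
    e->prev = DEMO_POISON2;
}

/* Like list_del_init(): unlink and re-initialise the node. */
void demo_list_del_init(struct list_head *e)
{
    demo_unlink(e);
    demo_init(e);
}

/* Delete a node and report what demo_list_empty() says about it. */
bool demo_node_empty_after_del(bool use_init)
{
    struct list_head head, node;

    demo_init(&head);
    demo_list_add(&node, &head);
    assert(!demo_list_empty(&head));
    if (use_init)
        demo_list_del_init(&node);
    else
        demo_list_del(&node);
    assert(demo_list_empty(&head));
    return demo_list_empty(&node);
}
```

    Only the del_init variant leaves the node in a state where an emptiness
    check gives the answer a caller would expect.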

    Signed-off-by: Phil Carmody
    Reviewed-by: Paul Menage
    Cc: Li Zefan
    Acked-by: Kirill A. Shutemov
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Phil Carmody
     

16 Feb, 2011

2 commits

  • This kernel patch adds the ability to filter monitoring based on
    container groups (cgroups). This is for use in per-cpu mode only.

    The cgroup to monitor is passed as a file descriptor in the pid
    argument to the syscall. The file descriptor must be opened to
    the cgroup name in the cgroup filesystem. For instance, if the
    cgroup name is foo and cgroupfs is mounted in /cgroup, then the
    file descriptor is opened to /cgroup/foo. Cgroup mode is
    activated by passing PERF_FLAG_PID_CGROUP in the flags argument
    to the syscall.

    For instance to measure in cgroup foo on CPU1 assuming
    cgroupfs is mounted under /cgroup:

    struct perf_event_attr attr;
    int cgroup_fd, fd;

    cgroup_fd = open("/cgroup/foo", O_RDONLY);
    fd = perf_event_open(&attr, cgroup_fd, 1, -1, PERF_FLAG_PID_CGROUP);
    close(cgroup_fd);

    Signed-off-by: Stephane Eranian
    [ added perf_cgroup_{exit,attach} ]
    Signed-off-by: Peter Zijlstra
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Stephane Eranian
     
  • Make the ::exit method act like ::attach, it is after all very nearly
    the same thing.

    The bug had no effect on correctness - fixing it is an optimization for
    the scheduler. Also, later perf-cgroups patches rely on it.

    Signed-off-by: Peter Zijlstra
    Acked-by: Paul Menage
    LKML-Reference:
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     

15 Jan, 2011

2 commits


14 Jan, 2011

1 commit


13 Jan, 2011

1 commit


07 Jan, 2011

7 commits

  • Reduce some branches and memory accesses in dcache lookup by adding dentry
    flags to indicate common d_ops are set, rather than having to check them.
    This saves a pointer memory access (dentry->d_op) in common path lookup
    situations, and saves another pointer load and branch in cases where we
    have d_op but not the particular operation.
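
    The idea can be sketched outside the kernel as follows. All names here
    (demo_dentry, demo_set_d_op(), the DEMO_OP_ flags) are hypothetical
    stand-ins: the point is only that the flags are computed once when the
    operations are installed, so the lookup path tests a flag word it
    already has instead of loading d_op and then the method pointer.

```c
#include <assert.h>
#include <stddef.h>

struct demo_dentry;

struct demo_dentry_ops {
    int (*d_hash)(const struct demo_dentry *);
    int (*d_revalidate)(struct demo_dentry *);
};

#define DEMO_OP_HASH       0x1u
#define DEMO_OP_REVALIDATE 0x2u

struct demo_dentry {
    unsigned int d_flags;               /* caches which ops are present */
    const struct demo_dentry_ops *d_op; /* may be NULL */
};

/* Install the operations once and record which methods exist. */
void demo_set_d_op(struct demo_dentry *d, const struct demo_dentry_ops *op)
{
    assert(d != NULL && d->d_op == NULL);
    d->d_op = op;
    if (op == NULL)
        return;
    if (op->d_hash)
        d->d_flags |= DEMO_OP_HASH;
    if (op->d_revalidate)
        d->d_flags |= DEMO_OP_REVALIDATE;
}

/* Fast-path test: one flag check, no d_op dereference. */
int demo_needs_revalidate(const struct demo_dentry *d)
{
    return (d->d_flags & DEMO_OP_REVALIDATE) != 0;
}

/* Example method used to populate an ops table. */
int demo_always_valid(struct demo_dentry *d)
{
    (void)d;
    return 1;
}
```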

    Patched with:

    git grep -E '[.>]([[:space:]])*d_op([[:space:]])*=' | xargs sed -e 's/\([^\t ]*\)->d_op = \(.*\);/d_set_d_op(\1, \2);/' -e 's/\([^\t ]*\)\.d_op = \(.*\);/d_set_d_op(\&\1, \2);/' -i

    Signed-off-by: Nick Piggin

    Nick Piggin
     
  • dget_locked was a shortcut to avoid the lazy lru manipulation when we already
    held dcache_lock (lru manipulation was relatively cheap at that point).
    However, now that the lru lock is an innermost one, we never hold it at any
    caller, so the lock cost can now be avoided. We already have well working lazy
    dcache LRU, so it should be fine to defer LRU manipulations to scan time.

    Signed-off-by: Nick Piggin

    Nick Piggin
     
  • dcache_lock no longer protects anything. remove it.

    Signed-off-by: Nick Piggin

    Nick Piggin
     
  • Protect d_subdirs and d_child with d_lock, except in filesystems that aren't
    using dcache_lock for these anyway (eg. using i_mutex).

    Note: if we change the locking rule in future so that ->d_child protection is
    provided only with ->d_parent->d_lock, it may allow us to reduce some locking.
    But it would be an exception to an otherwise regular locking scheme, so we'd
    have to see some good results. Probably not worthwhile.

    Signed-off-by: Nick Piggin

    Nick Piggin
     
  • Make d_count non-atomic and protect it with d_lock. This allows us to ensure a
    0 refcount dentry remains 0 without dcache_lock. It is also fairly natural when
    we start protecting many other dentry members with d_lock.
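
    A rough userspace sketch of the scheme, assuming a C11 atomic_flag as a
    stand-in for the spinlock; demo_dentry and its helpers are hypothetical
    names. Because the count is a plain integer that is only touched under
    d_lock, a caller that sees zero while holding the lock knows it stays
    zero until the lock is released.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

struct demo_dentry {
    atomic_flag d_lock;   /* stand-in for the kernel spinlock */
    unsigned int d_count; /* plain integer, only touched under d_lock */
    bool dead;            /* set once the dentry has been torn down */
};

void demo_dentry_init(struct demo_dentry *d, unsigned int count)
{
    atomic_flag_clear(&d->d_lock);
    d->d_count = count;
    d->dead = false;
}

static void demo_lock(struct demo_dentry *d)
{
    while (atomic_flag_test_and_set_explicit(&d->d_lock, memory_order_acquire))
        ; /* spin until the lock is free */
}

static void demo_unlock(struct demo_dentry *d)
{
    atomic_flag_clear_explicit(&d->d_lock, memory_order_release);
}

void demo_dget(struct demo_dentry *d)
{
    demo_lock(d);
    d->d_count++;
    demo_unlock(d);
}

void demo_dput(struct demo_dentry *d)
{
    demo_lock(d);
    assert(d->d_count > 0);
    d->d_count--;
    demo_unlock(d);
}

/* Tear the dentry down only if it is unreferenced. The check and the
 * teardown happen under the same lock hold, so nobody can take a new
 * reference in between. */
bool demo_try_kill(struct demo_dentry *d)
{
    bool killed = false;

    demo_lock(d);
    if (d->d_count == 0) {
        d->dead = true;
        killed = true;
    }
    demo_unlock(d);
    return killed;
}
```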

    Signed-off-by: Nick Piggin

    Nick Piggin
     
  • Change d_delete from a dentry deletion notification to a dentry caching
    advice, more like ->drop_inode. Require it to be constant and idempotent,
    and not take d_lock. This is how all existing filesystems use the callback
    anyway.

    This makes fine grained dentry locking of dput and dentry lru scanning
    much simpler.

    Signed-off-by: Nick Piggin

    Nick Piggin
     
  • Switching d_op on a live dentry is racy in general, so avoid it. In this case
    it is a negative dentry, which is safer, but there are still concurrent ops
    which may be called on d_op in that case (eg. d_revalidate). So in general
    a filesystem may not do this. Fix cgroupfs so as not to do this.

    Signed-off-by: Nick Piggin

    Nick Piggin
     

29 Oct, 2010

1 commit


28 Oct, 2010

3 commits

  • strcpy() is used without checking the source string against the maximum
    allowed length, which could overflow the destination string. Add a length
    check before calling strcpy(); the function now returns an error if the
    source string is longer than the maximum.

    akpm: presently considered NotABug, but add the check for general
    future-safeness and robustness.
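
    The pattern is the usual one of rejecting over-long input before a
    fixed-size copy. A minimal sketch, assuming a hypothetical buffer size
    DEMO_PATH_MAX and helper name demo_copy_checked(); the error code chosen
    here is also an assumption.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define DEMO_PATH_MAX 16 /* hypothetical destination buffer size */

/* Copy src into a DEMO_PATH_MAX-byte buffer.
 * Returns 0 on success, or -EINVAL if src (plus its terminating NUL)
 * would not fit. */
int demo_copy_checked(char dst[DEMO_PATH_MAX], const char *src)
{
    assert(dst != NULL && src != NULL);
    if (strlen(src) >= DEMO_PATH_MAX)
        return -EINVAL;
    strcpy(dst, src); /* safe: length was checked above */
    return 0;
}
```

    The comparison uses >= rather than > so that one byte always remains
    for the terminator.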

    Signed-off-by: Evgeny Kuznetsov
    Acked-by: Paul Menage
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Evgeny Kuznetsov
     
  • Current behavior:
    =================

    (1) When we mount a cgroup, we can specify the 'all' option which
    means to enable all the cgroup subsystems. This is the default option
    when no option is specified.

    (2) If we want to mount a cgroup with a subset of the supported cgroup
    subsystems, we have to specify a subsystems name list for the mount
    option.

    (3) If we specify another option like 'noprefix' or 'release_agent',
    the current code also requires 'all' or a subsystem name to be
    specified. This is not critical, but it is unfriendly, since (1) should
    be assumed in this case.

    (4) Logically, the 'all' option is mutually exclusive with a subsystem
    name, but this is not detected.

    In other words:
    succeed : mount -t cgroup -o all,freezer cgroup /cgroup
    => is it 'all' or 'freezer' ?
    fails : mount -t cgroup -o noprefix cgroup /cgroup
    => succeed if we do '-o noprefix,all'

    The following patches tidy up the mount options check.

    New behavior:
    =============

    (1) untouched
    (2) untouched
    (3) the 'all' option becomes the default when only options other than
    a subsystem name are specified
    (4) raises an error

    In other words:
    fails : mount -t cgroup -o all,freezer cgroup /cgroup
    succeed : mount -t cgroup -o noprefix cgroup /cgroup

    For the sake of readability, the if ... then ... else ... if ...
    indentation when parsing the options has been changed to:
    if ... then
    ...
    continue
    fi
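
    The new rules and the flattened parsing style can be sketched together.
    This is a userspace illustration under assumptions: demo_parse_opts(),
    struct demo_opts, the three-entry subsystem table and the error codes
    are hypothetical, and only the 'all' and 'noprefix' options are handled.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical subsystem table; the kernel's real list differs. */
static const char *const demo_subsys[] = { "cpuset", "freezer", "memory" };
#define DEMO_NSUBSYS  (sizeof(demo_subsys) / sizeof(demo_subsys[0]))
#define DEMO_ALL_BITS ((1UL << DEMO_NSUBSYS) - 1)

struct demo_opts {
    unsigned long subsys_bits; /* bit i set => demo_subsys[i] enabled */
    bool noprefix;
};

/* True if the len-byte token at p equals name. */
static bool tok_is(const char *p, size_t len, const char *name)
{
    return strlen(name) == len && memcmp(p, name, len) == 0;
}

/* Parse a comma-separated option string. Returns 0 or a negative errno. */
int demo_parse_opts(const char *data, struct demo_opts *opts)
{
    const char *p, *next;
    bool all_ss = false, one_ss = false;
    size_t i;

    assert(data != NULL && opts != NULL);
    opts->subsys_bits = 0;
    opts->noprefix = false;

    for (p = data; *p != '\0'; p = next) {
        size_t len = strcspn(p, ",");

        next = p + len + (p[len] == ',' ? 1 : 0);
        if (len == 0)
            continue;
        if (tok_is(p, len, "all")) {
            if (one_ss)
                return -EINVAL; /* rule (4): 'all' after a subsystem */
            all_ss = true;
            continue;
        }
        if (tok_is(p, len, "noprefix")) {
            opts->noprefix = true;
            continue;
        }
        for (i = 0; i < DEMO_NSUBSYS; i++) {
            if (tok_is(p, len, demo_subsys[i]))
                break;
        }
        if (i == DEMO_NSUBSYS)
            return -ENOENT;     /* unknown option */
        if (all_ss)
            return -EINVAL;     /* rule (4): subsystem after 'all' */
        opts->subsys_bits |= 1UL << i;
        one_ss = true;
    }

    /* Rule (3): no subsystem named means every subsystem. */
    if (!one_ss)
        opts->subsys_bits = DEMO_ALL_BITS;
    return 0;
}
```

    Each recognised option ends with continue, so the loop body stays flat
    instead of growing an else-if ladder.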

    Signed-off-by: Daniel Lezcano
    Signed-off-by: Serge E. Hallyn
    Reviewed-by: Li Zefan
    Reviewed-by: Paul Menage
    Cc: Eric W. Biederman
    Cc: Jamal Hadi Salim
    Cc: Matt Helsley
    Cc: Balbir Singh
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Daniel Lezcano
     
  • The ns_cgroup is a control group interacting with the namespaces. When a
    new namespace is created, a corresponding cgroup is automatically created
    too. The cgroup name is the pid of the process that called 'unshare' or
    of the child created by 'clone'.

    This cgroup is tied to the namespace because it prevents a process from
    escaping the control group, and it uses the post_clone callback so that
    the child cgroup inherits the values of the parent cgroup.

    Unfortunately, the more we use this cgroup, the more problems we face
    with it:

    (1) when a process unshares, the cgroup name may conflict with a
    previous cgroup with the same pid, so unshare or clone returns -EEXIST

    (2) cgroup creation is out of control because an application may create
    several namespaces, and the system will then automatically create
    several cgroups behind its back and leave them on the cgroupfs (e.g. a
    vrf based on the network namespace).

    (3) the combination of (1) and (2) forces an administrator to regularly
    check and clean up these cgroups.

    This patchset removes the ns_cgroup by adding a new flag to the cgroup and
    to the cgroupfs mount options. The flag enables copying of the parent
    cgroup's values when a child cgroup is created. We can then safely remove
    the ns_cgroup, as this flag provides compatibility. Tasks now have to be
    added to a manually created cgroup, which is consistent with the cgroup
    framework.

    This patch:

    Sent as an answer to a previous thread around the ns_cgroup.

    https://lists.linux-foundation.org/pipermail/containers/2009-June/018627.html

    It adds a control file 'clone_children' for a cgroup. This control file
    is a boolean specifying if the child cgroup should be a clone of the
    parent cgroup or not. The default value is 'false'.

    This flag makes the child cgroup call the post_clone callback of every
    subsystem that provides one.

    At present, cpuset is the only subsystem that implements the
    post_clone callback.

    The option can be set at mount time by specifying the 'clone_children'
    mount option.

    Signed-off-by: Daniel Lezcano
    Signed-off-by: Serge E. Hallyn
    Cc: Eric W. Biederman
    Acked-by: Paul Menage
    Reviewed-by: Li Zefan
    Cc: Jamal Hadi Salim
    Cc: Matt Helsley
    Acked-by: Balbir Singh
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Daniel Lezcano
     

26 Oct, 2010

1 commit

  • Instead of always assigning an increasing inode number in new_inode
    move the call to assign it into those callers that actually need it.
    For now the set of callers that need it is estimated conservatively, that is
    the call is added to all filesystems that do not assign an i_ino
    by themselves. For a few more filesystems we can avoid assigning
    any inode number given that they aren't user visible, and for others
    it could be done lazily when an inode number is actually needed,
    but that's left for later patches.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Dave Chinner
    Signed-off-by: Al Viro

    Christoph Hellwig
     

23 Oct, 2010

1 commit

  • * 'vfs' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/bkl: (30 commits)
    BKL: remove BKL from freevxfs
    BKL: remove BKL from qnx4
    autofs4: Only declare function when CONFIG_COMPAT is defined
    autofs: Only declare function when CONFIG_COMPAT is defined
    ncpfs: Lock socket in ncpfs while setting its callbacks
    fs/locks.c: prepare for BKL removal
    BKL: Remove BKL from ncpfs
    BKL: Remove BKL from OCFS2
    BKL: Remove BKL from squashfs
    BKL: Remove BKL from jffs2
    BKL: Remove BKL from ecryptfs
    BKL: Remove BKL from afs
    BKL: Remove BKL from USB gadgetfs
    BKL: Remove BKL from autofs4
    BKL: Remove BKL from isofs
    BKL: Remove BKL from fat
    BKL: Remove BKL from ext2 filesystem
    BKL: Remove BKL from do_new_mount()
    BKL: Remove BKL from cgroup
    BKL: Remove BKL from NTFS
    ...

    Linus Torvalds
     

07 Oct, 2010

1 commit


05 Oct, 2010

2 commits

  • The BKL is only used in remount_fs and get_sb that are both protected by
    the superblock's s_umount rw_semaphore. Therefore it is safe to remove the
    BKL entirely.

    Signed-off-by: Jan Blunck
    Signed-off-by: Arnd Bergmann

    Jan Blunck
     
  • This patch is a preparation necessary to remove the BKL from do_new_mount().
    It explicitly adds calls to lock_kernel()/unlock_kernel() around
    get_sb/fill_super operations for filesystems that still use the BKL.

    I've read through all the code formerly covered by the BKL inside
    do_kern_mount() and have satisfied myself that it doesn't need the BKL
    any more.

    do_kern_mount() is already called without the BKL when mounting the rootfs
    and in nfsctl. do_kern_mount() calls vfs_kern_mount(), which is called
    from various places without BKL: simple_pin_fs(), nfs_do_clone_mount()
    through nfs_follow_mountpoint(), afs_mntpt_do_automount() through
    afs_mntpt_follow_link(). Both of the latter functions are actually the
    filesystem's follow_link inode operation. vfs_kern_mount() is calling the specified
    get_sb function and lets the filesystem do its job by calling the given
    fill_super function.

    Therefore I think it is safe to push down the BKL from the VFS to the
    low-level filesystems' get_sb/fill_super operations.

    [arnd: do not add the BKL to those file systems that already
    don't use it elsewhere]

    Signed-off-by: Jan Blunck
    Signed-off-by: Arnd Bergmann
    Cc: Matthew Wilcox
    Cc: Christoph Hellwig

    Jan Blunck
     

10 Sep, 2010

1 commit

  • Add cgroup_attach_task_all()

    The existing cgroup_attach_task_current_cg() API is called by a thread to
    attach another thread to all of its cgroups; this is unsuitable for cases
    where a privileged task wants to attach itself to the cgroups of a less
    privileged one, since the call must be made from the context of the target
    task.

    This patch adds a more generic cgroup_attach_task_all() API that allows
    both the source task and to-be-moved task to be specified.
    cgroup_attach_task_current_cg() becomes a specialization of the more
    generic new function.
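
    The relationship between the two functions can be modelled as below.
    This is a userspace sketch with hypothetical demo_ names: a task is
    reduced to one cgroup id per hierarchy, and demo_current stands in for
    the kernel's 'current' task.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_HIERARCHIES 3 /* hypothetical number of mounted hierarchies */

struct demo_task {
    int cgroup[DEMO_HIERARCHIES]; /* cgroup id per hierarchy */
};

/* Stand-in for the kernel's 'current' task. */
struct demo_task *demo_current;

/* Generic form: attach tsk to every cgroup that 'from' belongs to. */
int demo_attach_task_all(const struct demo_task *from, struct demo_task *tsk)
{
    size_t i;

    assert(from != NULL && tsk != NULL);
    for (i = 0; i < DEMO_HIERARCHIES; i++)
        tsk->cgroup[i] = from->cgroup[i];
    return 0;
}

/* Older form, now just the generic call with 'from' fixed to the caller. */
int demo_attach_task_current_cg(struct demo_task *tsk)
{
    return demo_attach_task_all(demo_current, tsk);
}
```

    Because the source task is an explicit argument, a privileged caller
    can pass another task as 'from' and itself as 'tsk', which the older
    form could not express.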

    [menage@google.com: rewrote changelog]
    [akpm@linux-foundation.org: address reviewer comments]
    Signed-off-by: Michael S. Tsirkin
    Tested-by: Alex Williamson
    Acked-by: Paul Menage
    Cc: Li Zefan
    Cc: Ben Blum
    Cc: Sridhar Samudrala
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michael S. Tsirkin
     

20 Aug, 2010

1 commit


11 Aug, 2010

1 commit

  • The original code didn't leave enough space for a NUL terminator. These
    strings are copied with strcpy() into fixed length buffers in
    cgroup_root_from_opts().

    Signed-off-by: Dan Carpenter
    Acked-by: Serge E. Hallyn
    Reviewed-by: KAMEZAWA Hiroyuki
    Cc: Paul Menage
    Cc: Li Zefan
    Cc: Ben Blum
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dan Carpenter