24 Dec, 2010

1 commit

  • In construct_alloc_key(), up_write() is called in the error path if
    __key_link_begin() fails, but this is incorrect as __key_link_begin() only
    returns with the nominated keyring locked if it returns successfully.

    Without this patch, you might see the following in dmesg:

    =====================================
    [ BUG: bad unlock balance detected! ]
    -------------------------------------
    mount.cifs/5769 is trying to release lock (&key->sem) at:
    [] request_key_and_link+0x263/0x3fc
    but there are no more locks to release!

    other info that might help us debug this:
    3 locks held by mount.cifs/5769:
    #0: (&type->s_umount_key#41/1){+.+.+.}, at: [] sget+0x278/0x3e7
    #1: (&ret_buf->session_mutex){+.+.+.}, at: [] cifs_get_smb_ses+0x35a/0x443 [cifs]
    #2: (root_key_user.cons_lock){+.+.+.}, at: [] request_key_and_link+0x10a/0x3fc

    stack backtrace:
    Pid: 5769, comm: mount.cifs Not tainted 2.6.37-rc6+ #1
    Call Trace:
    [] ? request_key_and_link+0x263/0x3fc
    [] print_unlock_inbalance_bug+0xca/0xd5
    [] lock_release_non_nested+0xc1/0x263
    [] ? request_key_and_link+0x263/0x3fc
    [] ? request_key_and_link+0x263/0x3fc
    [] lock_release+0x17d/0x1a4
    [] up_write+0x23/0x3b
    [] request_key_and_link+0x263/0x3fc
    [] ? cifs_get_spnego_key+0x61/0x21f [cifs]
    [] request_key+0x41/0x74
    [] cifs_get_spnego_key+0x200/0x21f [cifs]
    [] CIFS_SessSetup+0x55d/0x1273 [cifs]
    [] cifs_setup_session+0x90/0x1ae [cifs]
    [] cifs_get_smb_ses+0x37f/0x443 [cifs]
    [] cifs_mount+0x1aa1/0x23f3 [cifs]
    [] ? alloc_debug_processing+0xdb/0x120
    [] ? cifs_get_spnego_key+0x1ef/0x21f [cifs]
    [] cifs_do_mount+0x165/0x2b3 [cifs]
    [] vfs_kern_mount+0xaf/0x1dc
    [] do_kern_mount+0x4d/0xef
    [] do_mount+0x6f4/0x733
    [] sys_mount+0x88/0xc2
    [] system_call_fastpath+0x16/0x1b

    Reported-by: Jeff Layton
    Signed-off-by: David Howells
    Reviewed-and-Tested-by: Jeff Layton
    Signed-off-by: Linus Torvalds

    David Howells
     

16 Nov, 2010

1 commit

  • The addition of CONFIG_SECURITY_DMESG_RESTRICT resulted in a build
    failure when CONFIG_PRINTK=n. This is because the capabilities code
    which used the new option was built even though the variable in question
    didn't exist.

    The patch here fixes this by moving the capabilities checks out of the
    LSM and into the caller. All (known) LSMs should have been calling the
    capabilities hook already so it actually makes the code organization
    better to eliminate the hook altogether.

    Signed-off-by: Eric Paris
    Acked-by: James Morris
    Signed-off-by: Linus Torvalds

    Eric Paris
     

13 Nov, 2010

1 commit


12 Nov, 2010

1 commit

  • The kernel syslog contains debugging information that is often useful
    during exploitation of other vulnerabilities, such as kernel heap
    addresses. Rather than futilely attempt to sanitize hundreds (or
    thousands) of printk statements and simultaneously cripple useful
    debugging functionality, it is far simpler to create an option that
    prevents unprivileged users from reading the syslog.

    This patch, loosely based on grsecurity's GRKERNSEC_DMESG, creates the
    dmesg_restrict sysctl. When set to "0", the default, no restrictions are
    enforced. When set to "1", only users with CAP_SYS_ADMIN can read the
    kernel syslog via dmesg(8) or other mechanisms.

    [akpm@linux-foundation.org: explain the config option in kernel.txt]
    Signed-off-by: Dan Rosenberg
    Acked-by: Ingo Molnar
    Acked-by: Eugene Teo
    Acked-by: Kees Cook
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dan Rosenberg
     

11 Nov, 2010

2 commits


29 Oct, 2010

2 commits


27 Oct, 2010

13 commits

  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: (52 commits)
    split invalidate_inodes()
    fs: skip I_FREEING inodes in writeback_sb_inodes
    fs: fold invalidate_list into invalidate_inodes
    fs: do not drop inode_lock in dispose_list
    fs: inode split IO and LRU lists
    fs: switch bdev inode bdi's correctly
    fs: fix buffer invalidation in invalidate_list
    fsnotify: use dget_parent
    smbfs: use dget_parent
    exportfs: use dget_parent
    fs: use RCU read side protection in d_validate
    fs: clean up dentry lru modification
    fs: split __shrink_dcache_sb
    fs: improve DCACHE_REFERENCED usage
    fs: use percpu counter for nr_dentry and nr_dentry_unused
    fs: simplify __d_free
    fs: take dcache_lock inside __d_path
    fs: do not assign default i_ino in new_inode
    fs: introduce a per-cpu last_ino allocator
    new helper: ihold()
    ...

    Linus Torvalds
     
  • * ima-memory-use-fixes:
    IMA: fix the ToMToU logic
    IMA: explicit IMA i_flag to remove global lock on inode_delete
    IMA: drop refcnt from ima_iint_cache since it isn't needed
    IMA: only allocate iint when needed
    IMA: move read counter into struct inode
    IMA: use i_writecount rather than a private counter
    IMA: use inode->i_lock to protect read and write counters
    IMA: convert internal flags from long to char
    IMA: use unsigned int instead of long for counters
    IMA: drop the inode opencount since it isn't needed for operation
    IMA: use rbtree instead of radix tree for inode information cache

    Linus Torvalds
     
  • Current logic looks like this:

    rc = ima_must_measure(NULL, inode, MAY_READ, FILE_CHECK);
    if (rc < 0)
    goto out;

    if (mode & FMODE_WRITE) {
    if (inode->i_readcount)
    send_tomtou = true;
    goto out;
    }

    if (atomic_read(&inode->i_writecount) > 0)
    send_writers = true;

    Lets assume we have a policy which states that all files opened for read
    by root must be measured.

    Lets assume the file has permissions 777.

    Lets assume that root has the given file open for read.

    Lets assume that a non-root process opens the file write.

    The non-root process will get to ima_counts_get() and will check the
    ima_must_measure(). Since it is not supposed to measure it will goto
    out.

    We should check the i_readcount no matter what since we might be causing
    a ToMToU voilation!

    This is close to correct, but still not quite perfect. The situation
    could have been that root, which was interested in the mesurement opened
    and closed the file and another process which is not interested in the
    measurement is the one holding the i_readcount ATM. This is just overly
    strict on ToMToU violations, which is better than not strict enough...

    Signed-off-by: Eric Paris
    Acked-by: Mimi Zohar
    Signed-off-by: Linus Torvalds

    Eric Paris
     
  • Currently for every removed inode IMA must take a global lock and search
    the IMA rbtree looking for an associated integrity structure. Instead
    we explicitly mark an inode when we add an integrity structure so we
    only have to take the global lock and do the removal if it exists.

    Signed-off-by: Eric Paris
    Acked-by: Mimi Zohar
    Signed-off-by: Linus Torvalds

    Eric Paris
     
  • Since finding a struct ima_iint_cache requires a valid struct inode, and
    the struct ima_iint_cache is supposed to have the same lifetime as a
    struct inode (technically they die together but don't need to be created
    at the same time) we don't have to worry about the ima_iint_cache
    outliving or dieing before the inode. So the refcnt isn't useful. Just
    get rid of it and free the structure when the inode is freed.

    Signed-off-by: Eric Paris
    Acked-by: Mimi Zohar
    Signed-off-by: Linus Torvalds

    Eric Paris
     
  • IMA always allocates an integrity structure to hold information about
    every inode, but only needed this structure to track the number of
    readers and writers currently accessing a given inode. Since that
    information was moved into struct inode instead of the integrity struct
    this patch stops allocating the integrity stucture until it is needed.
    Thus greatly reducing memory usage.

    Signed-off-by: Eric Paris
    Acked-by: Mimi Zohar
    Signed-off-by: Linus Torvalds

    Eric Paris
     
  • IMA currently allocated an inode integrity structure for every inode in
    core. This stucture is about 120 bytes long. Most files however
    (especially on a system which doesn't make use of IMA) will never need
    any of this space. The problem is that if IMA is enabled we need to
    know information about the number of readers and the number of writers
    for every inode on the box. At the moment we collect that information
    in the per inode iint structure and waste the rest of the space. This
    patch moves those counters into the struct inode so we can eventually
    stop allocating an IMA integrity structure except when absolutely
    needed.

    This patch does the minimum needed to move the location of the data.
    Further cleanups, especially the location of counter updates, may still
    be possible.

    Signed-off-by: Eric Paris
    Acked-by: Mimi Zohar
    Signed-off-by: Linus Torvalds

    Eric Paris
     
  • IMA tracks the number of struct files which are holding a given inode
    readonly and the number which are holding the inode write or r/w. It
    needs this information so when a new reader or writer comes in it can
    tell if this new file will be able to invalidate results it already made
    about existing files.

    aka if a task is holding a struct file open RO, IMA measured the file
    and recorded those measurements and then a task opens the file RW IMA
    needs to note in the logs that the old measurement may not be correct.
    It's called a "Time of Measure Time of Use" (ToMToU) issue. The same is
    true is a RO file is opened to an inode which has an open writer. We
    cannot, with any validity, measure the file in question since it could
    be changing.

    This patch attempts to use the i_writecount field to track writers. The
    i_writecount field actually embeds more information in it's value than
    IMA needs but it should work for our purposes and allow us to shrink the
    struct inode even more.

    Signed-off-by: Eric Paris
    Acked-by: Mimi Zohar
    Signed-off-by: Linus Torvalds

    Eric Paris
     
  • Currently IMA used the iint->mutex to protect the i_readcount and
    i_writecount. This patch uses the inode->i_lock since we are going to
    start using in inode objects and that is the most appropriate lock.

    Signed-off-by: Eric Paris
    Acked-by: Mimi Zohar
    Signed-off-by: Linus Torvalds

    Eric Paris
     
  • The IMA flags is an unsigned long but there is only 1 flag defined.
    Lets save a little space and make it a char. This packs nicely next to
    the array of u8's.

    Signed-off-by: Eric Paris
    Acked-by: Mimi Zohar
    Signed-off-by: Linus Torvalds

    Eric Paris
     
  • Currently IMA uses 2 longs in struct inode. To save space (and as it
    seems impossible to overflow 32 bits) we switch these to unsigned int.
    The switch to unsigned does require slightly different checks for
    underflow, but it isn't complex.

    Signed-off-by: Eric Paris
    Acked-by: Mimi Zohar
    Signed-off-by: Linus Torvalds

    Eric Paris
     
  • The opencount was used to help debugging to make sure that everything
    which created a struct file also correctly made the IMA calls. Since we
    moved all of that into the VFS this isn't as necessary. We should be
    able to get the same amount of debugging out of just the reader and
    write count.

    Signed-off-by: Eric Paris
    Acked-by: Mimi Zohar
    Signed-off-by: Linus Torvalds

    Eric Paris
     
  • The IMA code needs to store the number of tasks which have an open fd
    granting permission to write a file even when IMA is not in use. It
    needs this information in order to be enabled at a later point in time
    without losing it's integrity garantees.

    At the moment that means we store a little bit of data about every inode
    in a cache. We use a radix tree key'd on the inode's memory address.
    Dave Chinner pointed out that a radix tree is a terrible data structure
    for such a sparse key space. This patch switches to using an rbtree
    which should be more efficient.

    Bug report from Dave:

    "I just noticed that slabtop was reporting an awfully high usage of
    radix tree nodes:

    OBJS ACTIVE USE OBJ SIZE SLABS OBJ/SLAB CACHE SIZE NAME
    4200331 2778082 66% 0.55K 144839 29 2317424K radix_tree_node
    2321500 2060290 88% 1.00K 72581 32 2322592K xfs_inode
    2235648 2069791 92% 0.12K 69864 32 279456K iint_cache

    That is, 2.7M radix tree nodes are allocated, and the cache itself is
    consuming 2.3GB of RAM. I know that the XFS inodei caches are indexed
    by radix tree node, but for 2 million cached inodes that would mean a
    density of 1 inode per radix tree node, which for a system with 16M
    inodes in the filsystems is an impossibly low density. The worst I've
    seen in a production system like kernel.org is about 20-25% density,
    which would mean about 150-200k radix tree nodes for that many inodes.
    So it's not the inode cache.

    So I looked up what the iint_cache was. It appears to used for
    storing per-inode IMA information, and uses a radix tree for indexing.
    It uses the *address* of the struct inode as the indexing key. That
    means the key space is extremely sparse - for XFS the struct inode
    addresses are approximately 1000 bytes apart, which means the closest
    the radix tree index keys get is ~1000. Which means that there is a
    single entry per radix tree leaf node, so the radix tree is using
    roughly 550 bytes for every 120byte structure being cached. For the
    above example, it's probably wasting close to 1GB of RAM...."

    Reported-by: Dave Chinner
    Signed-off-by: Eric Paris
    Acked-by: Mimi Zohar
    Signed-off-by: Linus Torvalds

    Eric Paris
     

26 Oct, 2010

2 commits

  • All callers take dcache_lock just around the call to __d_path, so
    take the lock into it in preparation of getting rid of dcache_lock.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Al Viro

    Christoph Hellwig
     
  • Instead of always assigning an increasing inode number in new_inode
    move the call to assign it into those callers that actually need it.
    For now callers that need it is estimated conservatively, that is
    the call is added to all filesystems that do not assign an i_ino
    by themselves. For a few more filesystems we can avoid assigning
    any inode number given that they aren't user visible, and for others
    it could be done lazily when an inode number is actually needed,
    but that's left for later patches.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Dave Chinner
    Signed-off-by: Al Viro

    Christoph Hellwig
     

23 Oct, 2010

1 commit

  • * 'llseek' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/bkl:
    vfs: make no_llseek the default
    vfs: don't use BKL in default_llseek
    llseek: automatically add .llseek fop
    libfs: use generic_file_llseek for simple_attr
    mac80211: disallow seeks in minstrel debug code
    lirc: make chardev nonseekable
    viotape: use noop_llseek
    raw: use explicit llseek file operations
    ibmasmfs: use generic_file_llseek
    spufs: use llseek in all file operations
    arm/omap: use generic_file_llseek in iommu_debug
    lkdtm: use generic_file_llseek in debugfs
    net/wireless: use generic_file_llseek in debugfs
    drm: use noop_llseek

    Linus Torvalds
     

21 Oct, 2010

16 commits

  • Include vmalloc.h for vmalloc_user (fixes ppc build warning).
    Acked-by: Eric Paris

    Signed-off-by: James Morris

    Stephen Rothwell
     
  • /selinux/policy allows a user to copy the policy back out of the kernel.
    This patch allows userspace to actually mmap that file and use it directly.

    Signed-off-by: Eric Paris
    Signed-off-by: James Morris

    Eric Paris
     
  • There is interest in being able to see what the actual policy is that was
    loaded into the kernel. The patch creates a new selinuxfs file
    /selinux/policy which can be read by userspace. The actual policy that is
    loaded into the kernel will be written back out to userspace.

    Signed-off-by: Eric Paris
    Signed-off-by: James Morris

    Eric Paris
     
  • AVTAB_MAX_SIZE was a define which was supposed to be used in userspace to
    define a maximally sized avtab when userspace wasn't sure how big of a table
    it needed. It doesn't make sense in the kernel since we always know our table
    sizes. The only place it is used we have a more appropiately named define
    called AVTAB_MAX_HASH_BUCKETS, use that instead.

    Signed-off-by: Eric Paris
    Signed-off-by: James Morris

    Eric Paris
     
  • Range transition rules are placed in the hash table in an (almost)
    arbitrary order. This patch inserts them in a fixed order to make policy
    retrival more predictable.

    Signed-off-by: Eric Paris
    Signed-off-by: James Morris

    Eric Paris
     
  • With the (long ago) interface change to have the secid_to_secctx functions
    do the string allocation instead of having the caller do the allocation we
    lost the ability to query the security server for the length of the
    upcoming string. The SECMARK code would like to allocate a netlink skb
    with enough length to hold the string but it is just too unclean to do the
    string allocation twice or to do the allocation the first time and hold
    onto the string and slen. This patch adds the ability to call
    security_secid_to_secctx() with a NULL data pointer and it will just set
    the slen pointer.

    Signed-off-by: Eric Paris
    Reviewed-by: Paul Moore
    Signed-off-by: James Morris

    Eric Paris
     
  • Right now secmark has lots of direct selinux calls. Use all LSM calls and
    remove all SELinux specific knowledge. The only SELinux specific knowledge
    we leave is the mode. The only point is to make sure that other LSMs at
    least test this generic code before they assume it works. (They may also
    have to make changes if they do not represent labels as strings)

    Signed-off-by: Eric Paris
    Acked-by: Paul Moore
    Acked-by: Patrick McHardy
    Signed-off-by: James Morris

    Eric Paris
     
  • Actually I think in this case the appropriate thing to do is to BUG as there
    is currently a case (remove) where the alloc_size needs to be larger than
    the copy_size, and if copy_size is ever greater than alloc_size there is
    a mistake in the caller code.

    Signed-off-by: John Johansen
    Acked-by: Kees Cook
    Signed-off-by: James Morris

    John Johansen
     
  • Configuration files for TOMOYO 2.3 are not compatible with TOMOYO 2.2.
    But current panic() message is too unfriendly and is confusing users.

    Signed-off-by: Tetsuo Handa
    Reviewed-by: KOSAKI Motohiro
    Signed-off-by: James Morris

    Tetsuo Handa
     
  • All security modules shouldn't change sched_param parameter of
    security_task_setscheduler(). This is not only meaningless, but also
    make a harmful result if caller pass a static variable.

    This patch remove policy and sched_param parameter from
    security_task_setscheduler() becuase none of security module is
    using it.

    Cc: James Morris
    Signed-off-by: KOSAKI Motohiro
    Signed-off-by: James Morris

    KOSAKI Motohiro
     
  • This patch fixes up coding-style problem at this commit:

    4f27a7d49789b04404eca26ccde5f527231d01d5
    selinux: fast status update interface (/selinux/status)

    Signed-off-by: KaiGai Kohei
    Signed-off-by: James Morris

    KaiGai Kohei
     
  • Replace EXTRA_CFLAGS with ccflags-y.

    Signed-off-by: matt mooney
    Signed-off-by: James Morris

    matt mooney
     
  • While the previous change to the selinux Makefile reduced the window
    significantly for this failure, it is still possible to see a compile
    failure where cpp starts processing selinux files before the auto
    generated flask.h file is completed. This is easily reproduced by
    adding the following temporary change to expose the issue everytime:

    - cmd_flask = scripts/selinux/genheaders/genheaders ...
    + cmd_flask = sleep 30 ; scripts/selinux/genheaders/genheaders ...

    This failure happens because the creation of the object files in the ss
    subdir also depends on flask.h. So simply incorporate them into the
    parent Makefile, as the ss/Makefile really doesn't do anything unique.

    With this change, compiling of all selinux files is dependent on
    completion of the header file generation, and this test case with
    the "sleep 30" now confirms it is functioning as expected.

    Signed-off-by: Paul Gortmaker
    Signed-off-by: James Morris

    Paul Gortmaker
     
  • Selinux has an autogenerated file, "flask.h" which is included by
    two other selinux files. The current makefile has a single dependency
    on the first object file in the selinux-y list, assuming that will get
    flask.h generated before anyone looks for it, but that assumption breaks
    down in a "make -jN" situation and you get:

    selinux/selinuxfs.c:35: fatal error: flask.h: No such file or directory
    compilation terminated.
    remake[9]: *** [security/selinux/selinuxfs.o] Error 1

    Since flask.h is included by security.h which in turn is included
    nearly everywhere, make the dependency apply to all of the selinux-y
    list of objs.

    Signed-off-by: Paul Gortmaker
    Signed-off-by: James Morris

    Paul Gortmaker
     
  • This patch provides a new /selinux/status entry which allows applications
    read-only mmap(2).
    This region reflects selinux_kernel_status structure in kernel space.
    struct selinux_kernel_status
    {
    u32 length; /* length of this structure */
    u32 sequence; /* sequence number of seqlock logic */
    u32 enforcing; /* current setting of enforcing mode */
    u32 policyload; /* times of policy reloaded */
    u32 deny_unknown; /* current setting of deny_unknown */
    };

    When userspace object manager caches access control decisions provided
    by SELinux, it needs to invalidate the cache on policy reload and setenforce
    to keep consistency.
    However, the applications need to check the kernel state for each accesses
    on userspace avc, or launch a background worker process.
    In heuristic, frequency of invalidation is much less than frequency of
    making access control decision, so it is annoying to invoke a system call
    to check we don't need to invalidate the userspace cache.
    If we can use a background worker thread, it allows to receive invalidation
    messages from the kernel. But it requires us an invasive coding toward the
    base application in some cases; E.g, when we provide a feature performing
    with SELinux as a plugin module, it is unwelcome manner to launch its own
    worker thread from the module.

    If we could map /selinux/status to process memory space, application can
    know updates of selinux status; policy reload or setenforce.

    A typical application checks selinux_kernel_status::sequence when it tries
    to reference userspace avc. If it was changed from the last time when it
    checked userspace avc, it means something was updated in the kernel space.
    Then, the application can reset userspace avc or update current enforcing
    mode, without any system call invocations.
    This sequence number is updated according to the seqlock logic, so we need
    to wait for a while if it is odd number.

    Signed-off-by: KaiGai Kohei
    Acked-by: Eric Paris
    --
    security/selinux/include/security.h | 21 ++++++
    security/selinux/selinuxfs.c | 56 +++++++++++++++
    security/selinux/ss/Makefile | 2 +-
    security/selinux/ss/services.c | 3 +
    security/selinux/ss/status.c | 129 +++++++++++++++++++++++++++++++++++
    5 files changed, 210 insertions(+), 1 deletions(-)
    Signed-off-by: James Morris

    KaiGai Kohei
     
  • Signed-off-by: Yong Zhang
    Signed-off-by: John Johansen
    Signed-off-by: James Morris

    Yong Zhang