27 Jun, 2014

2 commits

  • commit 2fb1c9a4f2dbc2f0bd2431c7fa64d0b5483864e4 upstream.

    Calculating the 'security.evm' HMAC value requires access to the
    EVM encrypted key. Only the kernel should have access to it. This
    patch prevents userspace tools (e.g. setfattr, cp --preserve=xattr)
    from setting/modifying the 'security.evm' HMAC value directly.

    Signed-off-by: Mimi Zohar
    Signed-off-by: Greg Kroah-Hartman

    Mimi Zohar
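
    The rule described above can be sketched in userspace C. This is only an
    illustration, not the patch itself: the function name check_evm_write(),
    the enum names, and the tag byte values are assumptions made for the
    example, and the sketch simply lets every non-HMAC value through.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Type tags carried in the first byte of the xattr value. The names and
 * values are assumptions for this sketch, not copied from the patch. */
enum xattr_type {
    XATTR_TYPE_HMAC   = 0x02, /* keyed hash: needs the EVM key to compute */
    XATTR_TYPE_DIGSIG = 0x03  /* digital signature */
};

/* Return 0 if a userspace write of this xattr may proceed, -EPERM if not. */
int check_evm_write(const char *name, const unsigned char *value, size_t len)
{
    assert(name != NULL);
    if (strcmp(name, "security.evm") != 0)
        return 0;           /* not the EVM attribute: nothing to enforce */
    if (value == NULL || len == 0)
        return -EPERM;      /* nothing to classify: refuse */
    if (value[0] == XATTR_TYPE_HMAC)
        return -EPERM;      /* only the kernel may produce the HMAC */
    return 0;
}
```

    With a check of this shape, an attempt to copy an HMAC onto another file
    fails with EPERM instead of silently installing a value userspace could
    not have computed.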
     
  • commit 0430e49b6e7c6b5e076be8fefdee089958c9adad upstream.

    Commit 8aac62706 "move exit_task_namespaces() outside of exit_notify"
    introduced a kernel oops, present since kernel v3.10, which happens when
    AppArmor and IMA-appraisal are enabled at the same time.

    ----------------------------------------------------------------------
    [ 106.750167] BUG: unable to handle kernel NULL pointer dereference at
    0000000000000018
    [ 106.750221] IP: [] our_mnt+0x1a/0x30
    [ 106.750241] PGD 0
    [ 106.750254] Oops: 0000 [#1] SMP
    [ 106.750272] Modules linked in: cuse parport_pc ppdev bnep rfcomm
    bluetooth rpcsec_gss_krb5 nfsd auth_rpcgss nfs_acl nfs lockd sunrpc
    fscache dm_crypt intel_rapl x86_pkg_temp_thermal intel_powerclamp
    kvm_intel snd_hda_codec_hdmi kvm crct10dif_pclmul crc32_pclmul
    ghash_clmulni_intel aesni_intel aes_x86_64 glue_helper lrw gf128mul
    ablk_helper cryptd snd_hda_codec_realtek dcdbas snd_hda_intel
    snd_hda_codec snd_hwdep snd_pcm snd_page_alloc snd_seq_midi
    snd_seq_midi_event snd_rawmidi psmouse snd_seq microcode serio_raw
    snd_timer snd_seq_device snd soundcore video lpc_ich coretemp mac_hid lp
    parport mei_me mei nbd hid_generic e1000e usbhid ahci ptp hid libahci
    pps_core
    [ 106.750658] CPU: 6 PID: 1394 Comm: mysqld Not tainted 3.13.0-rc7-kds+ #15
    [ 106.750673] Hardware name: Dell Inc. OptiPlex 9010/0M9KCM, BIOS A08
    09/19/2012
    [ 106.750689] task: ffff8800de804920 ti: ffff880400fca000 task.ti:
    ffff880400fca000
    [ 106.750704] RIP: 0010:[] []
    our_mnt+0x1a/0x30
    [ 106.750725] RSP: 0018:ffff880400fcba60 EFLAGS: 00010286
    [ 106.750738] RAX: 0000000000000000 RBX: 0000000000000100 RCX:
    ffff8800d51523e7
    [ 106.750764] RDX: ffffffffffffffea RSI: ffff880400fcba34 RDI:
    ffff880402d20020
    [ 106.750791] RBP: ffff880400fcbae0 R08: 0000000000000000 R09:
    0000000000000001
    [ 106.750817] R10: 0000000000000000 R11: 0000000000000001 R12:
    ffff8800d5152300
    [ 106.750844] R13: ffff8803eb8df510 R14: ffff880400fcbb28 R15:
    ffff8800d51523e7
    [ 106.750871] FS: 0000000000000000(0000) GS:ffff88040d200000(0000)
    knlGS:0000000000000000
    [ 106.750910] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    [ 106.750935] CR2: 0000000000000018 CR3: 0000000001c0e000 CR4:
    00000000001407e0
    [ 106.750962] Stack:
    [ 106.750981] ffffffff813434eb ffff880400fcbb20 ffff880400fcbb18
    0000000000000000
    [ 106.751037] ffff8800de804920 ffffffff8101b9b9 0001800000000000
    0000000000000100
    [ 106.751093] 0000010000000000 0000000000000002 000000000000000e
    ffff8803eb8df500
    [ 106.751149] Call Trace:
    [ 106.751172] [] ? aa_path_name+0x2ab/0x430
    [ 106.751199] [] ? sched_clock+0x9/0x10
    [ 106.751225] [] aa_path_perm+0x7d/0x170
    [ 106.751250] [] ? native_sched_clock+0x15/0x80
    [ 106.751276] [] aa_file_perm+0x33/0x40
    [ 106.751301] [] common_file_perm+0x8e/0xb0
    [ 106.751327] [] apparmor_file_permission+0x18/0x20
    [ 106.751355] [] security_file_permission+0x23/0xa0
    [ 106.751382] [] rw_verify_area+0x52/0xe0
    [ 106.751407] [] vfs_read+0x6d/0x170
    [ 106.751432] [] kernel_read+0x41/0x60
    [ 106.751457] [] ima_calc_file_hash+0x225/0x280
    [ 106.751483] [] ? ima_calc_file_hash+0x32/0x280
    [ 106.751509] [] ima_collect_measurement+0x9d/0x160
    [ 106.751536] [] ? trace_hardirqs_on+0xd/0x10
    [ 106.751562] [] ? ima_file_free+0x6c/0xd0
    [ 106.751587] [] ima_update_xattr+0x34/0x60
    [ 106.751612] [] ima_file_free+0xc0/0xd0
    [ 106.751637] [] __fput+0xd5/0x300
    [ 106.751662] [] ____fput+0xe/0x10
    [ 106.751687] [] task_work_run+0xc4/0xe0
    [ 106.751712] [] do_exit+0x2bd/0xa90
    [ 106.751738] [] ? retint_swapgs+0x13/0x1b
    [ 106.751763] [] do_group_exit+0x4c/0xc0
    [ 106.751788] [] SyS_exit_group+0x14/0x20
    [ 106.751814] [] system_call_fastpath+0x1a/0x1f
    [ 106.751839] Code: c3 0f 1f 44 00 00 55 48 89 e5 e8 22 fe ff ff 5d c3
    0f 1f 44 00 00 55 65 48 8b 04 25 c0 c9 00 00 48 8b 80 28 06 00 00 48 89
    e5 5d 8b 40 18 48 39 87 c0 00 00 00 0f 94 c0 c3 0f 1f 80 00 00 00
    [ 106.752185] RIP [] our_mnt+0x1a/0x30
    [ 106.752214] RSP
    [ 106.752236] CR2: 0000000000000018
    [ 106.752258] ---[ end trace 3c520748b4732721 ]---
    ----------------------------------------------------------------------

    The reason for the oops is that IMA-appraisal uses kernel_read() when a
    file is closed. kernel_read() honors the LSM security hook, which calls
    the AppArmor handler, which uses current->nsproxy->mnt_ns. The 'guilty'
    commit changed the order of the cleanup code so that nsproxy->mnt_ns was
    no longer available to AppArmor.

    Discussion about the issue with Al Viro and Eric W. Biederman suggested
    that kernel_read() is too high-level for IMA. Besides security checking,
    another issue that was identified is mandatory locking. kernel_read()
    honors it as well, and it might prevent IMA from calculating the necessary
    hash. It was suggested to use a simplified version of the function without
    security and locking checks.

    This patch introduces a special version, ima_kernel_read(), which skips the
    security and mandatory locking checks. This prevents the kernel oops.

    Signed-off-by: Dmitry Kasatkin
    Suggested-by: Eric W. Biederman
    Signed-off-by: Mimi Zohar
    Signed-off-by: Greg Kroah-Hartman

    Dmitry Kasatkin
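
    The difference between the two read paths can be modelled in userspace C.
    This is a toy, not kernel code: struct toy_task, struct toy_file,
    checked_read() and unchecked_read() are hypothetical names, and where the
    real kernel dereferenced a NULL pointer the model returns -EFAULT.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Toy stand-ins for kernel objects; every name here is hypothetical. */
struct toy_task { const void *nsproxy; };  /* NULL once namespaces are gone */
struct toy_file { const char *data; size_t size; };

/* The raw copy shared by both read paths. */
static long raw_read(const struct toy_file *f, size_t off, char *buf, size_t len)
{
    if (off >= f->size)
        return 0;
    if (len > f->size - off)
        len = f->size - off;
    memcpy(buf, f->data + off, len);
    return (long)len;
}

/* Checked path: the security hook needs the caller's namespace data.
 * The real oops was a NULL dereference; this model reports -EFAULT. */
long checked_read(const struct toy_task *t, const struct toy_file *f,
                  size_t off, char *buf, size_t len)
{
    assert(t != NULL && f != NULL);
    if (t->nsproxy == NULL)
        return -EFAULT;
    return raw_read(f, off, buf, len);
}

/* Unchecked path: no hook and no locking test, so it works late in exit. */
long unchecked_read(const struct toy_file *f, size_t off, char *buf, size_t len)
{
    assert(f != NULL);
    return raw_read(f, off, buf, len);
}
```

    In the model, a task whose nsproxy has already been torn down can still
    hash file contents through unchecked_read(), which mirrors why the oops
    goes away once IMA stops going through the hooked path.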
     

14 Apr, 2014

1 commit

  • commit f64410ec665479d7b4b77b7519e814253ed0f686 upstream.

    This patch is based on an earlier patch by Eric Paris, who describes
    the problem below:

    "If an inode is accessed before policy load it will get placed on a
    list of inodes to be initialized after policy load. After policy
    load we call inode_doinit() which calls inode_doinit_with_dentry()
    on all inodes accessed before policy load. In the case of inodes
    in procfs that means we'll end up at the bottom where it does:

    /* Default to the fs superblock SID. */
    isec->sid = sbsec->sid;

    if ((sbsec->flags & SE_SBPROC) && !S_ISLNK(inode->i_mode)) {
            if (opt_dentry) {
                    isec->sclass = inode_mode_to_security_class(...)
                    rc = selinux_proc_get_sid(opt_dentry,
                                              isec->sclass,
                                              &sid);
                    if (rc)
                            goto out_unlock;
                    isec->sid = sid;
            }
    }

    Since opt_dentry is null, we'll never call selinux_proc_get_sid()
    and will leave the inode labeled with the label on the superblock.
    I believe a fix would be to mimic the behavior of xattrs. Look
    for an alias of the inode. If it can't be found, just leave the
    inode uninitialized (and pick it up later); if it can be found, we
    should be able to call selinux_proc_get_sid() ..."

    On a system exhibiting this problem, you will notice a lot of files in
    /proc with the generic "proc_t" type (at least the ones that were
    accessed early in the boot), for example:

    # ls -Z /proc/sys/kernel/shmmax | awk '{ print $4 " " $5 }'
    system_u:object_r:proc_t:s0 /proc/sys/kernel/shmmax

    However, with this patch in place we see the expected result:

    # ls -Z /proc/sys/kernel/shmmax | awk '{ print $4 " " $5 }'
    system_u:object_r:sysctl_kernel_t:s0 /proc/sys/kernel/shmmax

    Cc: Eric Paris
    Signed-off-by: Paul Moore
    Acked-by: Eric Paris
    Signed-off-by: Greg Kroah-Hartman

    Paul Moore
     

07 Mar, 2014

1 commit

  • commit 9085a6422900092886da8c404e1c5340c4ff1cbf upstream.

    When writing policy via /sys/fs/selinux/policy I wrote the type and class
    of filename trans rules in CPU endian instead of little endian. On
    x86_64 this works just fine, but on big-endian architectures like
    ppc64 and s390, userspace reads the policy and converts it with
    le32_to_cpu, so the values are all wrong. Write the values in
    little-endian format, as should have been done from the start.

    Signed-off-by: Eric Paris
    Acked-by: Stephen Smalley
    Signed-off-by: Paul Moore
    Signed-off-by: Greg Kroah-Hartman

    Eric Paris
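
    The idea of always serializing in little-endian order, whatever the host
    CPU uses, can be shown with a small userspace sketch. The helper names
    put_le32() and get_le32() are made up for this example; the kernel itself
    uses its own conversion helpers such as cpu_to_le32().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Store v at out[0..3] in little-endian order, whatever the host order. */
void put_le32(unsigned char *out, uint32_t v)
{
    assert(out != NULL);
    out[0] = (unsigned char)(v & 0xff);
    out[1] = (unsigned char)((v >> 8) & 0xff);
    out[2] = (unsigned char)((v >> 16) & 0xff);
    out[3] = (unsigned char)((v >> 24) & 0xff);
}

/* Read back a little-endian 32-bit value from in[0..3]. */
uint32_t get_le32(const unsigned char *in)
{
    assert(in != NULL);
    return (uint32_t)in[0] |
           ((uint32_t)in[1] << 8) |
           ((uint32_t)in[2] << 16) |
           ((uint32_t)in[3] << 24);
}
```

    Because the byte positions are fixed by shifts rather than by the CPU's
    memory layout, a writer on ppc64 and a reader on x86_64 agree on the
    value.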
     

21 Feb, 2014

1 commit

  • commit 2172fa709ab32ca60e86179dc67d0857be8e2c98 upstream.

    Setting an empty security context (length=0) on a file will
    lead to incorrectly dereferencing the type and other fields
    of the security context structure, yielding a kernel BUG.
    As a zero-length security context is never valid, just reject
    all such security contexts whether coming from userspace
    via setxattr or coming from the filesystem upon a getxattr
    request by SELinux.

    Setting a security context value (empty or otherwise) unknown to
    SELinux in the first place is only possible for a root process
    (CAP_MAC_ADMIN), and, if running SELinux in enforcing mode, only
    if the corresponding SELinux mac_admin permission is also granted
    to the domain by policy. In Fedora policies, this is only allowed for
    specific domains such as livecd for setting down security contexts
    that are not defined in the build host policy.

    Reproducer:
    su
    setenforce 0
    touch foo
    setfattr -n security.selinux foo

    Caveat:
    Relabeling or removing foo after doing the above may not be possible
    without booting with SELinux disabled. Any subsequent access to foo
    after doing the above will also trigger the BUG.

    BUG output from Matthew Thode:
    [ 473.893141] ------------[ cut here ]------------
    [ 473.962110] kernel BUG at security/selinux/ss/services.c:654!
    [ 473.995314] invalid opcode: 0000 [#6] SMP
    [ 474.027196] Modules linked in:
    [ 474.058118] CPU: 0 PID: 8138 Comm: ls Tainted: G D I
    3.13.0-grsec #1
    [ 474.116637] Hardware name: Supermicro X8ST3/X8ST3, BIOS 2.0
    07/29/10
    [ 474.149768] task: ffff8805f50cd010 ti: ffff8805f50cd488 task.ti:
    ffff8805f50cd488
    [ 474.183707] RIP: 0010:[] []
    context_struct_compute_av+0xce/0x308
    [ 474.219954] RSP: 0018:ffff8805c0ac3c38 EFLAGS: 00010246
    [ 474.252253] RAX: 0000000000000000 RBX: ffff8805c0ac3d94 RCX:
    0000000000000100
    [ 474.287018] RDX: ffff8805e8aac000 RSI: 00000000ffffffff RDI:
    ffff8805e8aaa000
    [ 474.321199] RBP: ffff8805c0ac3cb8 R08: 0000000000000010 R09:
    0000000000000006
    [ 474.357446] R10: 0000000000000000 R11: ffff8805c567a000 R12:
    0000000000000006
    [ 474.419191] R13: ffff8805c2b74e88 R14: 00000000000001da R15:
    0000000000000000
    [ 474.453816] FS: 00007f2e75220800(0000) GS:ffff88061fc00000(0000)
    knlGS:0000000000000000
    [ 474.489254] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    [ 474.522215] CR2: 00007f2e74716090 CR3: 00000005c085e000 CR4:
    00000000000207f0
    [ 474.556058] Stack:
    [ 474.584325] ffff8805c0ac3c98 ffffffff811b549b ffff8805c0ac3c98
    ffff8805f1190a40
    [ 474.618913] ffff8805a6202f08 ffff8805c2b74e88 00068800d0464990
    ffff8805e8aac860
    [ 474.653955] ffff8805c0ac3cb8 000700068113833a ffff880606c75060
    ffff8805c0ac3d94
    [ 474.690461] Call Trace:
    [ 474.723779] [] ? lookup_fast+0x1cd/0x22a
    [ 474.778049] [] security_compute_av+0xf4/0x20b
    [ 474.811398] [] avc_compute_av+0x2a/0x179
    [ 474.843813] [] avc_has_perm+0x45/0xf4
    [ 474.875694] [] inode_has_perm+0x2a/0x31
    [ 474.907370] [] selinux_inode_getattr+0x3c/0x3e
    [ 474.938726] [] security_inode_getattr+0x1b/0x22
    [ 474.970036] [] vfs_getattr+0x19/0x2d
    [ 475.000618] [] vfs_fstatat+0x54/0x91
    [ 475.030402] [] vfs_lstat+0x19/0x1b
    [ 475.061097] [] SyS_newlstat+0x15/0x30
    [ 475.094595] [] ? __audit_syscall_entry+0xa1/0xc3
    [ 475.148405] [] system_call_fastpath+0x16/0x1b
    [ 475.179201] Code: 00 48 85 c0 48 89 45 b8 75 02 0f 0b 48 8b 45 a0 48
    8b 3d 45 d0 b6 00 8b 40 08 89 c6 ff ce e8 d1 b0 06 00 48 85 c0 49 89 c7
    75 02 0b 48 8b 45 b8 4c 8b 28 eb 1e 49 8d 7d 08 be 80 01 00 00 e8
    [ 475.255884] RIP []
    context_struct_compute_av+0xce/0x308
    [ 475.296120] RSP
    [ 475.328734] ---[ end trace f076482e9d754adc ]---

    Reported-by: Matthew Thode
    Signed-off-by: Stephen Smalley
    Signed-off-by: Paul Moore
    Signed-off-by: Greg Kroah-Hartman

    Stephen Smalley
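
    A minimal sketch of the "reject zero-length contexts at both entry
    points" rule, in userspace C. The function names are hypothetical and the
    parsing that would follow a successful check is left out.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* A zero-length security context is never valid, so refuse it up front. */
static int context_len_ok(const char *value, size_t len)
{
    return (value != NULL && len > 0) ? 0 : -EINVAL;
}

/* Entry point 1 (hypothetical): label supplied by userspace via setxattr. */
int label_from_setxattr(const char *value, size_t len)
{
    return context_len_ok(value, len);
}

/* Entry point 2 (hypothetical): label read back from the filesystem. */
int label_from_filesystem(const char *value, size_t len)
{
    return context_len_ok(value, len);
}
```

    Putting the test in one shared helper keeps the two paths from drifting
    apart, so an empty label already on disk is refused just like one
    arriving from setfattr.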
     

14 Feb, 2014

1 commit

  • commit 8ed814602876bec9bad2649ca17f34b499357a1c upstream.

    Hello.

    I got the leak below with linux-3.10.0-54.0.1.el7.x86_64.

    [ 681.903890] kmemleak: 5538 new suspected memory leaks (see /sys/kernel/debug/kmemleak)

    Below is a patch, but I don't know whether we need special handling for
    undoing the ebitmap_set_bit() call.
    ----------
    From fe97527a90fe95e2239dfbaa7558f0ed559c0992 Mon Sep 17 00:00:00 2001
    From: Tetsuo Handa
    Date: Mon, 6 Jan 2014 16:30:21 +0900
    Subject: SELinux: Fix memory leak upon loading policy

    Commit 2463c26d "SELinux: put name based create rules in a hashtable" did not
    check the return value from hashtab_insert() in filename_trans_read(). It
    leaks memory if hashtab_insert() returns an error.

    unreferenced object 0xffff88005c9160d0 (size 8):
    comm "systemd", pid 1, jiffies 4294688674 (age 235.265s)
    hex dump (first 8 bytes):
    57 0b 00 00 6b 6b 6b a5 W...kkk.
    backtrace:
    [] kmemleak_alloc+0x4e/0xb0
    [] kmem_cache_alloc_trace+0x12e/0x360
    [] policydb_read+0xd1d/0xf70
    [] security_load_policy+0x6c/0x500
    [] sel_write_load+0xac/0x750
    [] vfs_write+0xc0/0x1f0
    [] SyS_write+0x4c/0xa0
    [] system_call_fastpath+0x16/0x1b
    [] 0xffffffffffffffff

    However, we should not return an EEXIST error to the caller, or systemd will
    show the message below and the boot sequence will freeze.

    systemd[1]: Failed to load SELinux policy. Freezing.

    Signed-off-by: Tetsuo Handa
    Acked-by: Eric Paris
    Signed-off-by: Paul Moore
    Signed-off-by: Greg Kroah-Hartman

    Tetsuo Handa
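
    The pattern described above (free what the table did not take, and do not
    treat a duplicate as fatal) can be sketched in userspace C. The table type
    and the functions toy_insert() and load_rule() are invented for this
    example and are far simpler than the kernel's hashtab.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

#define TABLE_SLOTS 8

/* A deliberately tiny "hash table": a flat array of owned strings. */
struct toy_table { char *keys[TABLE_SLOTS]; size_t used; };

/* Takes ownership of key on success only. */
static int toy_insert(struct toy_table *t, char *key)
{
    size_t i;

    for (i = 0; i < t->used; i++)
        if (strcmp(t->keys[i], key) == 0)
            return -EEXIST;
    if (t->used == TABLE_SLOTS)
        return -ENOMEM;
    t->keys[t->used++] = key;
    return 0;
}

/* Load one rule name, checking the insert result instead of ignoring it. */
int load_rule(struct toy_table *t, const char *name)
{
    char *key;
    int rc;

    assert(t != NULL && name != NULL);
    key = malloc(strlen(name) + 1);
    if (key == NULL)
        return -ENOMEM;
    strcpy(key, name);

    rc = toy_insert(t, key);
    if (rc != 0) {
        free(key);          /* the table did not take it, so we must */
        if (rc == -EEXIST)
            rc = 0;         /* a duplicate is harmless: keep loading */
    }
    return rc;
}
```

    Swallowing only EEXIST keeps a policy with duplicate rules loadable, while
    a genuine failure such as ENOMEM still reaches the caller.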
     

26 Jan, 2014

1 commit

  • commit 3dc91d4338d698ce77832985f9cb183d8eeaf6be upstream.

    While running stress tests on adding and deleting ftrace instances I hit
    this bug:

    BUG: unable to handle kernel NULL pointer dereference at 0000000000000020
    IP: selinux_inode_permission+0x85/0x160
    PGD 63681067 PUD 7ddbe067 PMD 0
    Oops: 0000 [#1] PREEMPT
    CPU: 0 PID: 5634 Comm: ftrace-test-mki Not tainted 3.13.0-rc4-test-00033-gd2a6dde-dirty #20
    Hardware name: /DG965MQ, BIOS MQ96510J.86A.0372.2006.0605.1717 06/05/2006
    task: ffff880078375800 ti: ffff88007ddb0000 task.ti: ffff88007ddb0000
    RIP: 0010:[] [] selinux_inode_permission+0x85/0x160
    RSP: 0018:ffff88007ddb1c48 EFLAGS: 00010246
    RAX: 0000000000000000 RBX: 0000000000800000 RCX: ffff88006dd43840
    RDX: 0000000000000001 RSI: 0000000000000081 RDI: ffff88006ee46000
    RBP: ffff88007ddb1c88 R08: 0000000000000000 R09: ffff88007ddb1c54
    R10: 6e6576652f6f6f66 R11: 0000000000000003 R12: 0000000000000000
    R13: 0000000000000081 R14: ffff88006ee46000 R15: 0000000000000000
    FS: 00007f217b5b6700(0000) GS:ffffffff81e21000(0000) knlGS:0000000000000000
    CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 0000000000000020 CR3: 000000006a0fe000 CR4: 00000000000007f0
    Call Trace:
    security_inode_permission+0x1c/0x30
    __inode_permission+0x41/0xa0
    inode_permission+0x18/0x50
    link_path_walk+0x66/0x920
    path_openat+0xa6/0x6c0
    do_filp_open+0x43/0xa0
    do_sys_open+0x146/0x240
    SyS_open+0x1e/0x20
    system_call_fastpath+0x16/0x1b
    Code: 84 a1 00 00 00 81 e3 00 20 00 00 89 d8 83 c8 02 40 f6 c6 04 0f 45 d8 40 f6 c6 08 74 71 80 cf 02 49 8b 46 38 4c 8d 4d cc 45 31 c0 b7 50 20 8b 70 1c 48 8b 41 70 89 d9 8b 78 04 e8 36 cf ff ff
    RIP selinux_inode_permission+0x85/0x160
    CR2: 0000000000000020

    Investigating, I found that the inode->i_security was NULL, and the
    dereference of it caused the oops.

    in selinux_inode_permission():

    isec = inode->i_security;

    rc = avc_has_perm_noaudit(sid, isec->sid, isec->sclass, perms, 0, &avd);

    Note, the crash came from stressing the deletion and reading of debugfs
    files. I was not able to recreate this via normal files. But I'm not
    sure they are safe. It may just be that the race window is much harder
    to hit.

    What seems to have happened (and what I have traced) is that the file is
    being opened at the same time the file or directory is being deleted.
    As the dentry and inode locks are not held during the path walk, nor are
    the inodes' ref counts being incremented, there is nothing saving these
    structures from being discarded except for an rcu_read_lock().

    The rcu_read_lock() protects against freeing of the inode, but it does
    not protect freeing of the inode_security_struct. Now if the freeing of
    the i_security happens with a call_rcu(), and the i_security field of
    the inode is not changed (it gets freed as the inode gets freed) then
    there will be no issue here. (Linus Torvalds suggested not setting the
    field to NULL such that we do not need to check if it is NULL in the
    permission check).

    Note, this is a hack, but it fixes the problem at hand. A real fix is
    to restructure the destroy_inode() to call all the destructor handlers
    from the RCU callback. But that is a major job to do, and requires a
    lot of work. For now, we just band-aid this bug with this fix (it
    works), and work on a more maintainable solution in the future.

    Link: http://lkml.kernel.org/r/20140109101932.0508dec7@gandalf.local.home
    Link: http://lkml.kernel.org/r/20140109182756.17abaaa8@gandalf.local.home

    Signed-off-by: Steven Rostedt
    Signed-off-by: Linus Torvalds
    Signed-off-by: Greg Kroah-Hartman

    Steven Rostedt
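
    The shape of the fix (defer the free, and leave the pointer in place so
    readers never see NULL) can be modelled in single-threaded userspace C.
    This is only a toy: there is no real RCU here, grace_period_end() stands
    in for "all readers have finished", and every name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct sec_blob { unsigned sid; struct sec_blob *next_deferred; };
struct toy_inode { struct sec_blob *i_security; };

static struct sec_blob *deferred_head;  /* blobs waiting for readers */

/* Queue the blob instead of freeing it now. i_security is deliberately
 * left pointing at it, so a late reader still finds valid memory. */
void sec_blob_free_deferred(struct toy_inode *inode)
{
    assert(inode != NULL && inode->i_security != NULL);
    inode->i_security->next_deferred = deferred_head;
    deferred_head = inode->i_security;
}

/* Stand-in for the end of a grace period: no reader can still hold a
 * pointer, so the queued blobs are really freed. Returns how many. */
size_t grace_period_end(void)
{
    size_t n = 0;

    while (deferred_head != NULL) {
        struct sec_blob *b = deferred_head;

        deferred_head = b->next_deferred;
        free(b);
        n++;
    }
    return n;
}
```

    Between the deferred free and the end of the grace period, a reader that
    already holds the inode can still read isec->sid safely, which is the
    window the oops above fell into.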
     

10 Jan, 2014

4 commits

  • commit c0828e50485932b7e019df377a6b0a8d1ebd3080 upstream.

    Due to difficulty in arriving at the proper security label for
    TCP SYN-ACK packets in selinux_ip_postroute(), we need to check packets
    while/before they are undergoing XFRM transforms instead of waiting
    until afterwards so that we can determine the correct security label.

    Reported-by: Janak Desai
    Signed-off-by: Paul Moore
    Signed-off-by: Greg Kroah-Hartman

    Paul Moore
     
  • commit 817eff718dca4e54d5721211ddde0914428fbb7c upstream.

    Previously selinux_skb_peerlbl_sid() would only check for labeled
    IPsec security labels on inbound packets; this patch enables it to
    check both inbound and outbound traffic for labeled IPsec security
    labels.

    Reported-by: Janak Desai
    Signed-off-by: Paul Moore
    Signed-off-by: Greg Kroah-Hartman

    Paul Moore
     
  • commit c0c1439541f5305b57a83d599af32b74182933fe upstream.

    selinux_setprocattr() does ptrace_parent(p) under task_lock(p),
    but task_struct->alloc_lock doesn't pin ->parent or ->ptrace.
    This looks confusing and triggers the "suspicious RCU usage"
    warning because ptrace_parent() does rcu_dereference_check().

    And in theory this is wrong: spin_lock()->preempt_disable()
    doesn't necessarily imply the rcu_read_lock() we need to access
    ->parent.

    Reported-by: Evan McNabb
    Signed-off-by: Oleg Nesterov
    Signed-off-by: Paul Moore
    Signed-off-by: Greg Kroah-Hartman

    Oleg Nesterov
     
  • commit 46d01d63221c3508421dd72ff9c879f61053cffc upstream.

    Fix a broken networking check. Return an error if peer recv fails.
    Previously, if secmark was active and the packet recv succeeded, the
    peer recv error was ignored.

    Signed-off-by: Chad Hanson
    Signed-off-by: Paul Moore
    Signed-off-by: Greg Kroah-Hartman

    Chad Hanson
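
    The kind of mistake described above, where a later successful check
    overwrites an earlier failure, can be shown with two small functions.
    This is a sketch of the pattern only: the function names and parameters
    are invented and do not reproduce the actual SELinux hook.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Buggy shape: the second assignment hides a failed peer check. */
int recv_check_buggy(bool peerlbl_active, int peer_rc,
                     bool secmark_active, int packet_rc)
{
    int err = 0;

    if (peerlbl_active)
        err = peer_rc;
    if (secmark_active)
        err = packet_rc;    /* overwrites a peer failure with success */
    return err;
}

/* Fixed shape: a failed peer check is returned before anything else runs. */
int recv_check_fixed(bool peerlbl_active, int peer_rc,
                     bool secmark_active, int packet_rc)
{
    if (peerlbl_active && peer_rc != 0)
        return peer_rc;
    if (secmark_active)
        return packet_rc;
    return 0;
}
```

    With a denied peer check and a passing packet check, the buggy version
    reports success while the fixed one reports the denial.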
     

20 Dec, 2013

2 commits

  • commit 446b802437f285de68ffb8d6fac3c44c3cab5b04 upstream.

    In selinux_ip_postroute() we perform access checks based on the
    packet's security label. For locally generated traffic we get the
    packet's security label from the associated socket; this works in all
    cases except for TCP SYN-ACK packets. In the case of SYN-ACK packets
    the correct security label is stored in the connection's request_sock,
    not the server's socket. Unfortunately, at the point in time when
    selinux_ip_postroute() is called we can't query the request_sock
    directly; we need to recreate the label using the same logic that
    originally labeled the associated request_sock.

    See the inline comments for more explanation.

    Reported-by: Janak Desai
    Tested-by: Janak Desai
    Signed-off-by: Paul Moore
    Signed-off-by: Greg Kroah-Hartman

    Paul Moore
     
  • commit 47180068276a04ed31d24fe04c673138208b07a9 upstream.

    In selinux_ip_output() we always label packets based on the parent
    socket. While this approach works in almost all cases, it doesn't
    work in the case of TCP SYN-ACK packets when the correct label is not
    the label of the parent socket, but rather the label of the larval
    socket represented by the request_sock struct.

    Unfortunately, since the request_sock isn't queued on the parent
    socket until *after* the SYN-ACK packet is sent, we can't look up the
    request_sock to determine the correct label for the packet; at this
    point in time the best we can do is simply pass/NF_ACCEPT the packet.
    It must be said that simply passing the packet without any explicit
    labeling action, while far from ideal, is not terrible as the SYN-ACK
    packet will inherit any IP option based labeling from the initial
    connection request so the label *should* be correct and all our
    access controls remain in place so we shouldn't have to worry about
    information leaks.

    Reported-by: Janak Desai
    Tested-by: Janak Desai
    Signed-off-by: Paul Moore
    Signed-off-by: Greg Kroah-Hartman

    Paul Moore
     

05 Dec, 2013

1 commit

  • commit 42d64e1add3a1ce8a787116036163b8724362145 upstream.

    The SELinux/NetLabel glue code has a locking bug that affects systems
    with NetLabel enabled; see the kernel error message below. This patch
    corrects this problem by converting the bottom half socket lock to a
    more conventional, and correct for this call-path, lock_sock() call.

    ===============================
    [ INFO: suspicious RCU usage. ]
    3.11.0-rc3+ #19 Not tainted
    -------------------------------
    net/ipv4/cipso_ipv4.c:1928 suspicious rcu_dereference_protected() usage!

    other info that might help us debug this:

    rcu_scheduler_active = 1, debug_locks = 0
    2 locks held by ping/731:
    #0: (slock-AF_INET/1){+.-...}, at: [...] selinux_netlbl_socket_connect
    #1: (rcu_read_lock){.+.+..}, at: [] netlbl_conn_setattr

    stack backtrace:
    CPU: 1 PID: 731 Comm: ping Not tainted 3.11.0-rc3+ #19
    Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
    0000000000000001 ffff88006f659d28 ffffffff81726b6a ffff88003732c500
    ffff88006f659d58 ffffffff810e4457 ffff88006b845a00 0000000000000000
    000000000000000c ffff880075aa2f50 ffff88006f659d90 ffffffff8169bec7
    Call Trace:
    [] dump_stack+0x54/0x74
    [] lockdep_rcu_suspicious+0xe7/0x120
    [] cipso_v4_sock_setattr+0x187/0x1a0
    [] netlbl_conn_setattr+0x187/0x190
    [] ? netlbl_conn_setattr+0x5/0x190
    [] selinux_netlbl_socket_connect+0xae/0xc0
    [] selinux_socket_connect+0x135/0x170
    [] ? might_fault+0x57/0xb0
    [] security_socket_connect+0x16/0x20
    [] SYSC_connect+0x73/0x130
    [] ? sysret_check+0x22/0x5d
    [] ? trace_hardirqs_on_caller+0xfd/0x1c0
    [] ? trace_hardirqs_on_thunk+0x3a/0x3f
    [] SyS_connect+0xe/0x10
    [] system_call_fastpath+0x16/0x1b

    Signed-off-by: Paul Moore
    Signed-off-by: Greg Kroah-Hartman

    Paul Moore
     

30 Nov, 2013

1 commit

  • commit 08de59eb144d7c41351a467442f898d720f0f15f upstream.

    This reverts commit 4c2c392763a682354fac65b6a569adec4e4b5387.

    Everything in the initramfs should be measured and appraised,
    but until the initramfs has extended attribute support, it should
    at least be measured.

    Signed-off-by: Mimi Zohar
    Signed-off-by: Greg Kroah-Hartman

    Mimi Zohar
     

01 Jun, 2013

1 commit


08 May, 2013

1 commit

  • Faster kernel compiles by way of fewer unnecessary includes.

    [akpm@linux-foundation.org: fix fallout]
    [akpm@linux-foundation.org: fix build]
    Signed-off-by: Kent Overstreet
    Cc: Zach Brown
    Cc: Felipe Balbi
    Cc: Greg Kroah-Hartman
    Cc: Mark Fasheh
    Cc: Joel Becker
    Cc: Rusty Russell
    Cc: Jens Axboe
    Cc: Asai Thambi S P
    Cc: Selvan Mani
    Cc: Sam Bradshaw
    Cc: Jeff Moyer
    Cc: Al Viro
    Cc: Benjamin LaHaise
    Reviewed-by: "Theodore Ts'o"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kent Overstreet
     

02 May, 2013

2 commits

  • Pull VFS updates from Al Viro,

    Misc cleanups all over the place, mainly wrt /proc interfaces (switch
    create_proc_entry to proc_create(), get rid of the deprecated
    create_proc_read_entry() in favor of using proc_create_data() and
    seq_file etc).

    7kloc removed.

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (204 commits)
    don't bother with deferred freeing of fdtables
    proc: Move non-public stuff from linux/proc_fs.h to fs/proc/internal.h
    proc: Make the PROC_I() and PDE() macros internal to procfs
    proc: Supply a function to remove a proc entry by PDE
    take cgroup_open() and cpuset_open() to fs/proc/base.c
    ppc: Clean up scanlog
    ppc: Clean up rtas_flash driver somewhat
    hostap: proc: Use remove_proc_subtree()
    drm: proc: Use remove_proc_subtree()
    drm: proc: Use minor->index to label things, not PDE->name
    drm: Constify drm_proc_list[]
    zoran: Don't print proc_dir_entry data in debug
    reiserfs: Don't access the proc_dir_entry in r_open(), r_start() r_show()
    proc: Supply an accessor for getting the data from a PDE's parent
    airo: Use remove_proc_subtree()
    rtl8192u: Don't need to save device proc dir PDE
    rtl8187se: Use a dir under /proc/net/r8180/
    proc: Add proc_mkdir_data()
    proc: Move some bits from linux/proc_fs.h to linux/{of.h,signal.h,tty.h}
    proc: Move PDE_NET() to fs/proc/proc_net.c
    ...

    Linus Torvalds
     
  • Pull networking updates from David Miller:
    "Highlights (1721 non-merge commits, this has to be a record of some
    sort):

    1) Add 'random' mode to team driver, from Jiri Pirko and Eric
    Dumazet.

    2) Make it so that any driver that supports configuration of multiple
    MAC addresses can provide the forwarding database add and del
    calls by providing a default implementation and hooking that up if
    the driver doesn't have an explicit set of handlers. From Vlad
    Yasevich.

    3) Support GSO segmentation over tunnels and other encapsulating
    devices such as VXLAN, from Pravin B Shelar.

    4) Support L2 GRE tunnels in the flow dissector, from Michael Dalton.

    5) Implement Tail Loss Probe (TLP) detection in TCP, from Nandita
    Dukkipati.

    6) In the PHY layer, allow supporting wake-on-lan in situations where
    the PHY registers have to be written for it to be configured.

    Use it to support wake-on-lan in mv643xx_eth.

    From Michael Stapelberg.

    7) Significantly improve firewire IPV6 support, from YOSHIFUJI
    Hideaki.

    8) Allow multiple packets to be sent in a single transmission using
    network coding in batman-adv, from Martin Hundebøll.

    9) Add support for T5 cxgb4 chips, from Santosh Rastapur.

    10) Generalize the VXLAN forwarding tables so that there is more
    flexibility in configuring various aspects of the endpoints.
    From David Stevens.

    11) Support RSS and TSO in hardware over GRE tunnels in bnx2x driver,
    from Dmitry Kravkov.

    12) Zero copy support in nfnetlink_queue, from Eric Dumazet and Pablo
    Neira Ayuso.

    13) Start adding networking selftests.

    14) In situations of overload on the same AF_PACKET fanout socket, or
    per-cpu packet receive queue, minimize drop by distributing the
    load to other cpus/fanouts. From Willem de Bruijn and Eric
    Dumazet.

    15) Add support for new payload offset BPF instruction, from Daniel
    Borkmann.

    16) Convert several drivers over to module_platform_driver(), from
    Sachin Kamat.

    17) Provide a minimal BPF JIT image disassembler userspace tool, from
    Daniel Borkmann.

    18) Rewrite F-RTO implementation in TCP to match the final
    specification of it in RFC4138 and RFC5682. From Yuchung Cheng.

    19) Provide netlink socket diag of netlink sockets ("Yo dawg, I hear
    you like netlink, so I implemented netlink dumping of netlink
    sockets.") From Andrey Vagin.

    20) Remove ugly passing of rtnetlink attributes into rtnl_doit
    functions, from Thomas Graf.

    21) Allow userspace to be able to see if a configuration change occurs
    in the middle of an address or device list dump, from Nicolas
    Dichtel.

    22) Support RFC3168 ECN protection for ipv6 fragments, from Hannes
    Frederic Sowa.

    23) Increase accuracy of packet length used by packet scheduler, from
    Jason Wang.

    24) Beginning set of changes to make ipv4/ipv6 fragment handling more
    scalable and less susceptible to overload and locking contention,
    from Jesper Dangaard Brouer.

    25) Get rid of using non-type-safe NLMSG_* macros and use nlmsg_*()
    instead. From Hong Zhiguo.

    26) Optimize route usage in IPVS by avoiding reference counting where
    possible, from Julian Anastasov.

    27) Convert IPVS schedulers to RCU, also from Julian Anastasov.

    28) Support cpu fanouts in xt_NFQUEUE netfilter target, from Holger
    Eitzenberger.

    29) Network namespace support for nf_log, ebt_log, xt_LOG, ipt_ULOG,
    nfnetlink_log, and nfnetlink_queue. From Gao feng.

    30) Implement RFC3168 ECN protection, from Hannes Frederic Sowa.

    31) Support several new r8169 chips, from Hayes Wang.

    32) Support tokenized interface identifiers in ipv6, from Daniel
    Borkmann.

    33) Use usbnet_link_change() helper in USB net driver, from Ming Lei.

    34) Add 802.1ad vlan offload support, from Patrick McHardy.

    35) Support mmap() based netlink communication, also from Patrick
    McHardy.

    36) Support HW timestamping in mlx4 driver, from Amir Vadai.

    37) Rationalize AF_PACKET packet timestamping when transmitting, from
    Willem de Bruijn and Daniel Borkmann.

    38) Bring parity to what's provided by /proc/net/packet socket dumping
    and the info provided by netlink socket dumping of AF_PACKET
    sockets. From Nicolas Dichtel.

    39) Fix peeking beyond zero sized SKBs in AF_UNIX, from Benjamin
    Poirier"

    * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1722 commits)
    filter: fix va_list build error
    af_unix: fix a fatal race with bit fields
    bnx2x: Prevent memory leak when cnic is absent
    bnx2x: correct reading of speed capabilities
    net: sctp: attribute printl with __printf for gcc fmt checks
    netlink: kconfig: move mmap i/o into netlink kconfig
    netpoll: convert mutex into a semaphore
    netlink: Fix skb ref counting.
    net_sched: act_ipt forward compat with xtables
    mlx4_en: fix a build error on 32bit arches
    Revert "bnx2x: allow nvram test to run when device is down"
    bridge: avoid OOPS if root port not found
    drivers: net: cpsw: fix kernel warn on cpsw irq enable
    sh_eth: use random MAC address if no valid one supplied
    3c509.c: call SET_NETDEV_DEV for all device types (ISA/ISAPnP/EISA)
    tg3: fix to append hardware time stamping flags
    unix/stream: fix peeking with an offset larger than data in queue
    unix/dgram: fix peeking with an offset larger than data in queue
    unix/dgram: peek beyond 0-sized skbs
    openvswitch: Remove unneeded ovs_netdev_get_ifindex()
    ...

    Linus Torvalds
     

01 May, 2013

3 commits

  • Merge third batch of fixes from Andrew Morton:
    "Most of the rest. I still have two large patchsets against AIO and
    IPC, but they're a bit stuck behind other trees and I'm about to
    vanish for six days.

    - random fixlets
    - inotify
    - more of the MM queue
    - show_stack() cleanups
    - DMI update
    - kthread/workqueue things
    - compat cleanups
    - epoll updates
    - binfmt updates
    - nilfs2
    - hfs
    - hfsplus
    - ptrace
    - kmod
    - coredump
    - kexec
    - rbtree
    - pids
    - pidns
    - pps
    - semaphore tweaks
    - some w1 patches
    - relay updates
    - core Kconfig changes
    - sysrq tweaks"

    * emailed patches from Andrew Morton : (109 commits)
    Documentation/sysrq: fix inconstistent help message of sysrq key
    ethernet/emac/sysrq: fix inconstistent help message of sysrq key
    sparc/sysrq: fix inconstistent help message of sysrq key
    powerpc/xmon/sysrq: fix inconstistent help message of sysrq key
    ARM/etm/sysrq: fix inconstistent help message of sysrq key
    power/sysrq: fix inconstistent help message of sysrq key
    kgdb/sysrq: fix inconstistent help message of sysrq key
    lib/decompress.c: fix initconst
    notifier-error-inject: fix module names in Kconfig
    kernel/sys.c: make prctl(PR_SET_MM) generally available
    UAPI: remove empty Kbuild files
    menuconfig: print more info for symbol without prompts
    init/Kconfig: re-order CONFIG_EXPERT options to fix menuconfig display
    kconfig menu: move Virtualization drivers near other virtualization options
    Kconfig: consolidate CONFIG_DEBUG_STRICT_USER_COPY_CHECKS
    relay: use macro PAGE_ALIGN instead of FIX_SIZE
    kernel/relay.c: move FIX_SIZE macro into relay.c
    kernel/relay.c: remove unused function argument actor
    drivers/w1/slaves/w1_ds2760.c: fix the error handling in w1_ds2760_add_slave()
    drivers/w1/slaves/w1_ds2781.c: fix the error handling in w1_ds2781_add_slave()
    ...

    Linus Torvalds
     
  • Use call_usermodehelper_setup() + call_usermodehelper_exec() instead of
    calling call_usermodehelper_fns(). In case there's an OOM in this last
    function the cleanup function may not be called - in this case we would
    miss a call to key_put().
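
    A minimal userspace sketch of the setup + exec split described above.
    All names here (key_model, helper_setup, helper_exec,
    run_helper_with_key) are hypothetical stand-ins, not kernel APIs, and
    taking the reference only after setup succeeds is one way to keep get
    and put balanced; it is an assumption about the shape of the fix, not
    a copy of it.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-in for a refcounted key. */
struct key_model { int refs; };

void key_get_model(struct key_model *k) { k->refs++; }
void key_put_model(struct key_model *k) { assert(k->refs > 0); k->refs--; }

/* Hypothetical helper descriptor: remembers what to clean up and how. */
struct helper_info {
    void (*cleanup)(struct key_model *);
    struct key_model *key;
};

/* Setup step: may fail on allocation; simulate_oom forces that path. */
struct helper_info *helper_setup(void (*cleanup)(struct key_model *),
                                 struct key_model *key, int simulate_oom)
{
    struct helper_info *info = simulate_oom ? NULL : malloc(sizeof(*info));

    if (!info)
        return NULL;
    info->cleanup = cleanup;
    info->key = key;
    return info;
}

/* Exec step: once it owns info, it always runs the cleanup callback. */
int helper_exec(struct helper_info *info)
{
    info->cleanup(info->key);
    free(info);
    return 0;
}

int run_helper_with_key(struct key_model *key, int simulate_oom)
{
    struct helper_info *info = helper_setup(key_put_model, key, simulate_oom);

    if (!info)
        return -ENOMEM;      /* no reference taken yet, nothing to undo */
    key_get_model(key);      /* balanced by cleanup inside helper_exec() */
    return helper_exec(info);
}
```

    Because the caller sees the setup failure directly, it can tell
    whether the cleanup callback will ever run and keep the reference
    count balanced on both paths.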

    Signed-off-by: Lucas De Marchi
    Cc: Oleg Nesterov
    Acked-by: David Howells
    Acked-by: James Morris
    Cc: Al Viro
    Cc: Tejun Heo
    Cc: "Rafael J. Wysocki"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Lucas De Marchi
     
  • Pull security subsystem update from James Morris:
    "Just some minor updates across the subsystem"

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security:
    ima: eliminate passing d_name.name to process_measurement()
    TPM: Retry SaveState command in suspend path
    tpm/tpm_i2c_infineon: Add small comment about return value of __i2c_transfer
    tpm/tpm_i2c_infineon.c: Add OF attributes type and name to the of_device_id table entries
    tpm_i2c_stm_st33: Remove duplicate inclusion of header files
    tpm: Add support for new Infineon I2C TPM (SLB 9645 TT 1.2 I2C)
    char/tpm: Convert struct i2c_msg initialization to C99 format
    drivers/char/tpm/tpm_ppi: use strlcpy instead of strncpy
    tpm/tpm_i2c_stm_st33: formatting and white space changes
    Smack: include magic.h in smackfs.c
    selinux: make security_sb_clone_mnt_opts return an error on context mismatch
    seccomp: allow BPF_XOR based ALU instructions.
    Fix NULL pointer dereference in smack_inode_unlink() and smack_inode_rmdir()
    Smack: add support for modification of existing rules
    smack: SMACK_MAGIC to include/uapi/linux/magic.h
    Smack: add missing support for transmute bit in smack_str_from_perm()
    Smack: prevent revoke-subject from failing when unseen label is written to it
    tomoyo: use DEFINE_SRCU() to define tomoyo_ss
    tomoyo: use DEFINE_SRCU() to define tomoyo_ss

    Linus Torvalds
     

30 Apr, 2013

2 commits

  • Pull cgroup updates from Tejun Heo:

    - Fixes and a lot of cleanups. Locking cleanup is finally complete.
    cgroup_mutex is no longer exposed to individual controllers which
    used to cause nasty deadlock issues. Li fixed and cleaned up quite a
    bit including long standing ones like racy cgroup_path().

    - device cgroup now supports proper hierarchy thanks to Aristeu.

    - perf_event cgroup now supports proper hierarchy.

    - A new mount option "__DEVEL__sane_behavior" is added. As indicated
    by the name, this option is to be used for development only at this
    point and generates a warning message when used. Unfortunately,
    the cgroup interface currently has too many breakages and inconsistencies
    to implement a consistent and unified hierarchy on top. The new flag
    is used to collect the behavior changes which are necessary to
    implement consistent unified hierarchy. It's likely that this flag
    won't be used verbatim when it becomes ready but will be enabled
    implicitly along with unified hierarchy.

    The option currently disables some of the broken behaviors in cgroup core
    and also the .use_hierarchy switch in memcg (will be routed through -mm),
    which can be used to make very unusual hierarchy where nesting is
    partially honored. It will also be used to implement hierarchy
    support for blk-throttle which would be impossible otherwise without
    introducing a full separate set of control knobs.

    This is essentially versioning of interface which isn't very nice but
    at this point I can't see any other options which would allow keeping
    the interface the same while moving towards hierarchy behavior which
    is at least somewhat sane. The planned unified hierarchy is likely
    to require some level of adaptation from userland anyway, so I think
    it'd be best to take the chance and update the interface such that
    it's supportable in the long term.

    Maintaining the existing interface does complicate cgroup core but
    shouldn't put too much strain on individual controllers and I think
    it'd be manageable for the foreseeable future. Maybe we'll be able
    to drop it in a decade.

    Fix up conflicts (including a semantic one adding a new #include to ppc
    that was uncovered by the header file changes) as per Tejun.

    * 'for-3.10' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: (45 commits)
    cpuset: fix compile warning when CONFIG_SMP=n
    cpuset: fix cpu hotplug vs rebuild_sched_domains() race
    cpuset: use rebuild_sched_domains() in cpuset_hotplug_workfn()
    cgroup: restore the call to eventfd->poll()
    cgroup: fix use-after-free when umounting cgroupfs
    cgroup: fix broken file xattrs
    devcg: remove parent_cgroup.
    memcg: force use_hierarchy if sane_behavior
    cgroup: remove cgrp->top_cgroup
    cgroup: introduce sane_behavior mount option
    move cgroupfs_root to include/linux/cgroup.h
    cgroup: convert cgroupfs_root flag bits to masks and add CGRP_ prefix
    cgroup: make cgroup_path() not print double slashes
    Revert "cgroup: remove bind() method from cgroup_subsys."
    perf: make perf_event cgroup hierarchical
    cgroup: implement cgroup_is_descendant()
    cgroup: make sure parent won't be destroyed before its children
    cgroup: remove bind() method from cgroup_subsys.
    devcg: remove broken_hierarchy tag
    cgroup: remove cgroup_lock_is_held()
    ...

    Linus Torvalds
     
  • Signed-off-by: Al Viro

    Al Viro
     

23 Apr, 2013

1 commit

  • Conflicts:
    drivers/net/ethernet/emulex/benet/be_main.c
    drivers/net/ethernet/intel/igb/igb_main.c
    drivers/net/wireless/brcm80211/brcmsmac/mac80211_if.c
    include/net/scm.h
    net/batman-adv/routing.c
    net/ipv4/tcp_input.c

    The e{uid,gid} --> {uid,gid} credentials fix conflicted with the
    cleanup in net-next to now pass cred structs around.

    The be2net driver had a bug fix in 'net' that overlapped with the VLAN
    interface changes by Patrick McHardy in net-next.

    An IGB conflict existed because in 'net' the build_skb() support was
    reverted, and in 'net-next' there was a comment style fix within that
    code.

    Several batman-adv conflicts were resolved by making sure that all
    calls to batadv_is_my_mac() are changed to have a new bat_priv first
    argument.

    Eric Dumazet's TS ECR fix in TCP in 'net' conflicted with the F-RTO
    rewrite in 'net-next', mostly overlapping changes.

    Thanks to Stephen Rothwell and Antonio Quartulli for help with several
    of these merge resolutions.

    Signed-off-by: David S. Miller

    David S. Miller
     

19 Apr, 2013

1 commit

  • In devcgroup_css_alloc(), there is no longer need for parent_cgroup.
    bd2953ebbb("devcg: propagate local changes down the hierarchy") made
    the variable parent_cgroup redundant. This patch removes parent_cgroup
    from devcgroup_css_alloc().

    Signed-off-by: Rami Rosen
    Acked-by: Aristeu Rozanski
    Signed-off-by: Tejun Heo

    Rami Rosen
     

18 Apr, 2013

1 commit

  • Passing a pointer to the dentry name, as a parameter to
    process_measurement(), causes a race condition with rename() and
    is unnecessary, as the dentry name is already accessible via the
    file parameter.

    In the normal case, we use the full pathname as provided by
    bprm->filename, bprm->interp, or ima_d_path(). Only on ima_d_path()
    failure do we fall back to using the d_name.name, which points
    either to external memory or d_iname.

    Reported-by: Al Viro
    Signed-off-by: Mimi Zohar
    Signed-off-by: James Morris

    Mimi Zohar
     

10 Apr, 2013

1 commit

  • Commit 90ba9b1986b5ac (tcp: tcp_make_synack() can use alloc_skb())
    broke certain SELinux/NetLabel configurations by no longer correctly
    assigning the sock to the outgoing SYNACK packet.

    The cost of atomic operations on the LISTEN socket is quite high,
    and we would like to pay it only if really needed.

    This patch introduces a new security_ops->skb_owned_by() method,
    which is a no-op unless SELinux is active.
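
    The hook pattern described above can be sketched in plain C as follows.
    Everything here is a hypothetical model (sock_model, skb_model,
    security_ops_model, make_synack); the "labeling" variant stands in
    for an active security module and simply records the owner, which is
    an assumption about what such a module would do.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for a socket and a packet buffer. */
struct sock_model { int id; };
struct skb_model { const struct sock_model *owner; };

/* Hypothetical ops table with a single hook. */
struct security_ops_model {
    void (*skb_owned_by)(struct skb_model *skb, const struct sock_model *sk);
};

/* Default hook: does nothing, so the fast path pays no extra cost. */
void noop_skb_owned_by(struct skb_model *skb, const struct sock_model *sk)
{
    (void)skb;
    (void)sk;
}

/* Hook used when a labeling module is active: attach the socket. */
void labeling_skb_owned_by(struct skb_model *skb, const struct sock_model *sk)
{
    skb->owner = sk;
}

/* Model of building a SYNACK: the hook decides whether to set an owner. */
void make_synack(const struct security_ops_model *ops, struct skb_model *skb,
                 const struct sock_model *listener)
{
    assert(ops != NULL && ops->skb_owned_by != NULL);
    skb->owner = NULL;
    ops->skb_owned_by(skb, listener);
}
```

    With the no-op hook installed the packet leaves without an owner; with
    the labeling hook it carries the listener, which is what the affected
    configurations rely on.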

    Reported-by: Miroslav Vadkerti
    Diagnosed-by: Paul Moore
    Signed-off-by: Eric Dumazet
    Cc: "David S. Miller"
    Cc: linux-security-module@vger.kernel.org
    Acked-by: James Morris
    Tested-by: Paul Moore
    Acked-by: Paul Moore
    Signed-off-by: David S. Miller

    Eric Dumazet
     

08 Apr, 2013

1 commit


03 Apr, 2013

1 commit


02 Apr, 2013

2 commits

  • I had the following problem reported a while back. If you mount the
    same filesystem twice using NFSv4 with different contexts, then the
    second context= option is ignored. For instance:

    # mount server:/export /mnt/test1
    # mount server:/export /mnt/test2 -o context=system_u:object_r:tmp_t:s0
    # ls -dZ /mnt/test1
    drwxrwxrwt. root root system_u:object_r:nfs_t:s0 /mnt/test1
    # ls -dZ /mnt/test2
    drwxrwxrwt. root root system_u:object_r:nfs_t:s0 /mnt/test2

    When we call into SELinux to set the context of a "cloned" superblock,
    it will currently just bail out when it notices that we're reusing an
    existing superblock. Since the existing superblock is already set up and
    presumably in use, we can't go overwriting its context with the one from
    the "original" sb. Because of this, the second context= option in this
    case cannot take effect.

    This patch fixes this by turning security_sb_clone_mnt_opts into an int
    return operation. When it finds that the "new" superblock that it has
    been handed is already set up, it checks to see whether the contexts on
    the old superblock match it. If they do, then it will just return
    success, otherwise it'll return -EBUSY and emit a printk to tell the
    admin why the second mount failed.
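
    A minimal sketch of that check, using a hypothetical sb_sec structure
    and clone_mnt_opts() function rather than the real SELinux data
    structures, and comparing contexts as plain strings (an assumption
    made only to keep the example self-contained):

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-in for a superblock's security state. */
struct sb_sec {
    int initialized;      /* nonzero once a context has been applied */
    char context[64];     /* e.g. "system_u:object_r:nfs_t:s0" */
};

/*
 * Model of the int-returning clone operation: copy the context to a fresh
 * superblock, or, if the target is already set up, succeed only when the
 * two contexts match.
 */
int clone_mnt_opts(const struct sb_sec *oldsb, struct sb_sec *newsb)
{
    assert(oldsb != NULL && newsb != NULL);

    if (newsb->initialized)
        return strcmp(oldsb->context, newsb->context) == 0 ? 0 : -EBUSY;

    strncpy(newsb->context, oldsb->context, sizeof(newsb->context) - 1);
    newsb->context[sizeof(newsb->context) - 1] = '\0';
    newsb->initialized = 1;
    return 0;
}
```

    A caller that previously ignored the result now has to propagate the
    -EBUSY so that the second mount fails visibly.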

    Note that this patch may cause casualties. The NFSv4 code relies on
    being able to walk down to an export from the pseudoroot. If you mount
    filesystems that are nested within one another with different contexts,
    then this patch will make those mounts fail in new and "exciting" ways.

    For instance, suppose that /export is a separate filesystem on the
    server:

    # mount server:/ /mnt/test1
    # mount server:/export /mnt/test2 -o context=system_u:object_r:tmp_t:s0
    mount.nfs: an incorrect mount option was specified

    ...with the printk in the ring buffer. Because we *might* eventually
    walk down to /mnt/test1/export, the mount is denied due to this patch.
    The second mount needs the pseudoroot superblock, but that's already
    present with the wrong context.

    OTOH, if we mount these in the reverse order, then both mounts work,
    because the pseudoroot superblock created when mounting /export is
    discarded once that mount is done. If we then however try to walk into
    that directory, the automount fails for similar reasons:

    # cd /mnt/test1/scratch/
    -bash: cd: /mnt/test1/scratch: Device or resource busy

    The story I've gotten from the SELinux folks that I've talked to is that
    this is desirable behavior. In SELinux-land, mounting the same data
    under different contexts is wrong -- there can be only one.

    Cc: Steve Dickson
    Cc: Stephen Smalley
    Signed-off-by: Jeff Layton
    Acked-by: Eric Paris
    Signed-off-by: James Morris

    Jeff Layton
     
  • Conflicts:
    net/mac80211/sta_info.c
    net/wireless/core.h

    Two minor conflicts in wireless. Overlapping additions of extern
    declarations in net/wireless/core.h and a bug fix overlapping with
    the addition of a boolean parameter to __ieee80211_key_free().

    Signed-off-by: David S. Miller

    David S. Miller
     

29 Mar, 2013

2 commits

  • Pull userns fixes from Eric W Biederman:
    "The bulk of the changes are fixing the worst consequences of the user
    namespace design oversight in not considering what happens when one
    namespace starts off as a clone of another namespace, as happens with
    the mount namespace.

    The rest of the changes are just plain bug fixes.

    Many thanks to Andy Lutomirski for pointing out many of these issues."

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
    userns: Restrict when proc and sysfs can be mounted
    ipc: Restrict mounting the mqueue filesystem
    vfs: Carefully propogate mounts across user namespaces
    vfs: Add a mount flag to lock read only bind mounts
    userns: Don't allow creation if the user is chrooted
    yama: Better permission check for ptraceme
    pid: Handle the exit of a multi-threaded init.
    scm: Require CAP_SYS_ADMIN over the current pidns to spoof pids.

    Linus Torvalds
     
  • Signed-off-by: Hong Zhiguo
    Acked-by: Thomas Graf
    Signed-off-by: David S. Miller

    Hong zhi guo
     

27 Mar, 2013

1 commit


20 Mar, 2013

5 commits

    This patch makes exception changes propagate down the hierarchy while
    respecting local exceptions where possible.

    New exceptions allowing additional access to devices won't be propagated, but
    it'll be possible to add an exception to access all or part of the newly
    allowed device(s).

    New exceptions disallowing access to devices will be propagated down and the
    local group's exceptions will be revalidated for the new situation.
    Example:
          A
         / \
            B

        group        behavior          exceptions
        A            allow             "b 8:* rwm", "c 116:1 rw"
        B            deny              "c 1:3 rwm", "c 116:2 rwm", "b 3:* rwm"

    If a new exception is added to group A:
    # echo "c 116:* r" > A/devices.deny
    it'll propagate down and after revalidating B's local exceptions, the exception
    "c 116:2 rwm" will be removed.

    If the parent's exceptions change and local exceptions are no longer allowed,
    they'll be deleted.
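
    The revalidation step in the example can be modeled roughly as below.
    This is a simplified sketch: struct exc, exc_conflicts() and
    revalidate() are hypothetical names, and "a local exception is dropped
    when it overlaps a newly denied one" is an assumption that matches the
    example above rather than the full kernel logic.

```c
#include <assert.h>
#include <stddef.h>

#define ACC_R 1
#define ACC_W 2
#define ACC_M 4
#define ANY  (-1)   /* models the '*' wildcard for major or minor */

/* Hypothetical model of one device exception, e.g. "c 116:2 rwm". */
struct exc {
    char type;      /* 'b' or 'c' */
    int major;
    int minor;
    int access;     /* bitmask of ACC_R, ACC_W, ACC_M */
};

static int num_overlaps(int a, int b)
{
    return a == ANY || b == ANY || a == b;
}

/* True if the local exception grants something the parent now denies. */
int exc_conflicts(const struct exc *denied, const struct exc *local)
{
    return denied->type == local->type &&
           num_overlaps(denied->major, local->major) &&
           num_overlaps(denied->minor, local->minor) &&
           (denied->access & local->access) != 0;
}

/* Drop conflicting entries in place; returns the number kept. */
size_t revalidate(struct exc *local, size_t n, const struct exc *denied)
{
    size_t kept = 0;

    assert(local != NULL && denied != NULL);
    for (size_t i = 0; i < n; i++)
        if (!exc_conflicts(denied, &local[i]))
            local[kept++] = local[i];
    return kept;
}
```

    Feeding it B's three exceptions and the new "c 116:* r" entry from A
    leaves "c 1:3 rwm" and "b 3:* rwm" in place and drops "c 116:2 rwm".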

    v7:
    - do not allow behavior change when the cgroup has children
    - update documentation

    v6: fixed issues pointed out by Serge Hallyn
    - only copy parent's exceptions while propagating behavior if the local
    behavior is different
    - while propagating exceptions, do not clear and copy parent's: it'd be against
    the premise we don't propagate access to more devices

    v5: fixed issues pointed out by Serge Hallyn
    - updated documentation
    - not propagating when an exception is written to devices.allow
    - when propagating a new behavior, clean the local exceptions list if they're
    for a different behavior

    v4: fixed issues pointed out by Tejun Heo
    - separated function to walk the tree and collect valid propagation targets

    v3: fixed issues pointed out by Tejun Heo
    - update documentation
    - move css_online/css_offline changes to a new patch
    - use cgroup_for_each_descendant_pre() instead of own descendant walk
    - move exception_copy rework to a separate patch
    - move exception_clean rework to a separate patch

    v2: fixed issues pointed out by Tejun Heo
    - instead of keeping the local settings that won't apply anymore, remove them

    Cc: Tejun Heo
    Cc: Serge Hallyn
    Signed-off-by: Aristeu Rozanski
    Signed-off-by: Tejun Heo

    Aristeu Rozanski
     
  • Allocate resources and change behavior only when online. This is needed in
    order to determine if a node is suitable for hierarchy propagation or if it's
    being removed.

    Locking:
    Both functions take devcgroup_mutex to make changes to the device_cgroup
    structure. Hierarchy propagation will also take devcgroup_mutex before walking
    the tree, while the tree walk itself is protected by the RCU lock.

    Acked-by: Tejun Heo
    Acked-by: Serge Hallyn
    Cc: Tejun Heo
    Cc: Serge Hallyn
    Signed-off-by: Aristeu Rozanski
    Signed-off-by: Tejun Heo

    Aristeu Rozanski
     
    Currently may_access() is only able to verify whether an exception is valid for
    the current cgroup, which has the same behavior. With hierarchy, it'll also be
    used to verify whether a cgroup's local exception is valid with respect to its
    parent cgroup, which might have a different behavior.

    v2:
    - updated patch description
    - rebased on top of a new patch to expand the may_access() logic to make it
    clearer
    - fixed argument description order in may_access()

    Acked-by: Tejun Heo
    Acked-by: Serge Hallyn
    Cc: Tejun Heo
    Cc: Serge Hallyn
    Signed-off-by: Aristeu Rozanski
    Signed-off-by: Tejun Heo

    Aristeu Rozanski
     
    To make the next patch clearer, expand the may_access() logic.

    v2: may_access() returns bool now

    Acked-by: Tejun Heo
    Acked-by: Serge Hallyn
    Cc: Tejun Heo
    Cc: Serge Hallyn
    Signed-off-by: Aristeu Rozanski
    Signed-off-by: Tejun Heo

    Aristeu Rozanski
     
    This patch fixes a kernel Oops caused by the wrong common_audit_data type
    in smack_inode_unlink() and smack_inode_rmdir().

    When the SMACK security module is enabled and SMACK logging is on
    (/smack/logging is not zero) and you try to delete a file which
    1) you cannot delete due to SMACK rules and logging of failures is on
    or
    2) you can delete and logging of success is on,

    you will see the following:

    Unable to handle kernel NULL pointer dereference at virtual address 000002d7

    [] (strlen+0x0/0x28)
    [] (audit_log_untrustedstring+0x14/0x28)
    [] (common_lsm_audit+0x108/0x6ac)
    [] (smack_log+0xc4/0xe4)
    [] (smk_curacc+0x80/0x10c)
    [] (smack_inode_unlink+0x74/0x80)
    [] (security_inode_unlink+0x2c/0x30)
    [] (vfs_unlink+0x7c/0x100)
    [] (do_unlinkat+0x144/0x16c)

    The function smack_inode_unlink() (like smack_inode_rmdir()) needs
    to log two structures of different types. First of all it does:

    smk_ad_init(&ad, __func__, LSM_AUDIT_DATA_DENTRY);
    smk_ad_setfield_u_fs_path_dentry(&ad, dentry);

    This sets the common audit data type to LSM_AUDIT_DATA_DENTRY
    and stores the dentry for auditing (by smk_curacc(), which in turn calls
    dump_common_audit_data(), which actually uses the provided data and logs it).

    /*
    * You need write access to the thing you're unlinking
    */
    rc = smk_curacc(smk_of_inode(ip), MAY_WRITE, &ad);
    if (rc == 0) {
    /*
    * You also need write access to the containing directory
    */

    Then this function wants to log another piece of data:

    smk_ad_setfield_u_fs_path_dentry(&ad, NULL);
    smk_ad_setfield_u_fs_inode(&ad, dir);

    The function sets the inode field, but doesn't change the common_audit_data type.

    rc = smk_curacc(smk_of_inode(dir), MAY_WRITE, &ad);
    }

    So the dump_common_audit_data() function incorrectly interprets the inode
    structure as a dentry, and the Oops happens.

    This patch reinitializes the common_audit_data structures with the correct type.
    I also removed the unneeded
    smk_ad_setfield_u_fs_path_dentry(&ad, NULL);
    initialization, because both the dentry and inode pointers are stored
    in the same union.
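
    The underlying problem is a tagged union whose tag was not updated.
    A minimal sketch, with hypothetical types and helpers (audit_data,
    audit_init, audit_describe) that only model the real structures:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum audit_type { AUDIT_DENTRY, AUDIT_INODE };

/* Hypothetical stand-ins for the two objects being audited. */
struct dentry_model { const char *name; };
struct inode_model { unsigned long ino; };

/* The tag says which union member is valid; both share storage. */
struct audit_data {
    enum audit_type type;
    union {
        const struct dentry_model *dentry;
        const struct inode_model *inode;
    } u;
};

/* Reinitialize the whole structure, including the type tag. */
void audit_init(struct audit_data *ad, enum audit_type type)
{
    assert(ad != NULL);
    memset(ad, 0, sizeof(*ad));
    ad->type = type;
}

/* The logger trusts the tag alone when choosing which member to read. */
const char *audit_describe(const struct audit_data *ad)
{
    switch (ad->type) {
    case AUDIT_DENTRY:
        return "dentry";
    case AUDIT_INODE:
        return "inode";
    }
    return "unknown";
}
```

    Storing an inode pointer in u.inode without calling audit_init() again
    leaves the tag at AUDIT_DENTRY, so the logger would read the pointer as
    a dentry; reinitializing with AUDIT_INODE first avoids that.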

    Signed-off-by: Igor Zhbanov
    Signed-off-by: Kyungmin Park

    Igor Zhbanov