13 Dec, 2012

1 commit

  • Pull big execve/kernel_thread/fork unification series from Al Viro:
    "All architectures are converted to new model. Quite a bit of that
    stuff is actually shared with architecture trees; in such cases it's
    literally shared branch pulled by both, not a cherry-pick.

    A lot of ugliness and black magic is gone (-3KLoC total in this one):

    - kernel_thread()/kernel_execve()/sys_execve() redesign.

    We don't do syscalls from kernel anymore for either kernel_thread()
    or kernel_execve():

    kernel_thread() is essentially clone(2) with callback run before we
    return to userland, the callbacks either never return or do
    successful do_execve() before returning.

    kernel_execve() is a wrapper for do_execve() - it doesn't need to
    do transition to user mode anymore.

    As a result kernel_thread() and kernel_execve() are
    arch-independent now - they live in kernel/fork.c and fs/exec.c,
    respectively. sys_execve() is also in fs/exec.c and is completely
    architecture-independent.

    - daemonize() is gone, along with its parts in fs/*.c

    - struct pt_regs * is no longer passed to do_fork/copy_process/
    copy_thread/do_execve/search_binary_handler/->load_binary/do_coredump.

    - sys_fork()/sys_vfork()/sys_clone() unified; some architectures
    still need wrappers (ones with callee-saved registers not saved in
    pt_regs on syscall entry), but the main part of those suckers is in
    kernel/fork.c now."

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal: (113 commits)
    do_coredump(): get rid of pt_regs argument
    print_fatal_signal(): get rid of pt_regs argument
    ptrace_signal(): get rid of unused arguments
    get rid of ptrace_signal_deliver() arguments
    new helper: signal_pt_regs()
    unify default ptrace_signal_deliver
    flagday: kill pt_regs argument of do_fork()
    death to idle_regs()
    don't pass regs to copy_process()
    flagday: don't pass regs to copy_thread()
    bfin: switch to generic vfork, get rid of pointless wrappers
    xtensa: switch to generic clone()
    openrisc: switch to use of generic fork and clone
    unicore32: switch to generic clone(2)
    score: switch to generic fork/vfork/clone
    c6x: sanitize copy_thread(), get rid of clone(2) wrapper, switch to generic clone()
    take sys_fork/sys_vfork/sys_clone prototypes to linux/syscalls.h
    mn10300: switch to generic fork/vfork/clone
    h8300: switch to generic fork/vfork/clone
    tile: switch to generic clone()
    ...

    Conflicts:
    arch/microblaze/include/asm/Kbuild

    Linus Torvalds
     

30 Nov, 2012

1 commit


29 Nov, 2012

1 commit


19 Nov, 2012

1 commit


12 Nov, 2012

1 commit

  • It can be legitimately triggered via procfs access. Now, at least
    2 of 3 of get_files_struct() callers in procfs are useless, but
    when and if we get rid of those we can always add WARN_ON() here.
    BUG_ON() at that spot is simply wrong.

    Signed-off-by: Al Viro

    Al Viro
     

31 Oct, 2012

1 commit

  • Jack Lin reports that the error return from dup3() for the RLIMIT_NOFILE
    case changed incorrectly after 3.6.

    The culprit is commit f33ff9927f42 ("take rlimit check to callers of
    expand_files()"), which moved the "return -EMFILE" out to the caller
    without noticing that dup3() had special code to turn the EMFILE
    return into EBADF.

    The replace_fd() helper that got added later then inherited the bug too.

    Reported-by: Jack Lin
    Signed-off-by: Al Viro
    [ Noted more bugs, wrote proper changelog, fixed up typos - Linus ]
    Signed-off-by: Linus Torvalds

    Al Viro
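The behaviour this fix restores can be checked from userspace. The following is a minimal sketch, not taken from the commit: the helper name `dup3_errno_at_rlimit()` is hypothetical, and it assumes a Linux system where the soft RLIMIT_NOFILE may be lowered temporarily. On a kernel carrying the fix, the helper should report EBADF rather than EMFILE.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sys/resource.h>
#include <unistd.h>

/* Declared explicitly in case _GNU_SOURCE is defined too late to take effect. */
int dup3(int oldfd, int newfd, int flags);

/*
 * Hypothetical helper: returns the errno set by dup3() when newfd equals
 * the soft RLIMIT_NOFILE, 0 if dup3() unexpectedly succeeds, or -1 if the
 * setup itself fails.
 */
int dup3_errno_at_rlimit(void)
{
    struct rlimit saved, lowered;
    int fds[2], target, result;

    if (pipe(fds) != 0)
        return -1;
    if (getrlimit(RLIMIT_NOFILE, &saved) != 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    /* Pick a descriptor number just above the ones in use and make it the limit. */
    target = (fds[0] > fds[1] ? fds[0] : fds[1]) + 16;
    lowered = saved;
    if ((rlim_t)target < saved.rlim_cur) {
        lowered.rlim_cur = (rlim_t)target;
        if (setrlimit(RLIMIT_NOFILE, &lowered) != 0) {
            close(fds[0]);
            close(fds[1]);
            return -1;
        }
    } else {
        target = (int)saved.rlim_cur;   /* the limit is already that tight */
    }

    errno = 0;
    if (dup3(fds[0], target, 0) >= 0) {
        result = 0;
        close(target);
    } else {
        result = errno;                 /* capture before anything can clobber it */
    }

    setrlimit(RLIMIT_NOFILE, &saved);   /* restore the original soft limit */
    close(fds[0]);
    close(fds[1]);
    return result;
}
```

A caller would compare the return value against EBADF; a kernel with the regression described above would return EMFILE instead.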
     

10 Oct, 2012

1 commit

  • I have tested the attached patch to fix the dup3 regression.

    Rich.

    From 0944e30e12dec6544b3602626b60ff412375c78f Mon Sep 17 00:00:00 2001
    From: "Richard W.M. Jones"
    Date: Tue, 9 Oct 2012 14:42:45 +0100
    Subject: [PATCH] dup3: Return an error when oldfd == newfd.

    The following commit:

    commit fe17f22d7fd0e344ef6447238f799bb49f670c6f
    Author: Al Viro
    Date: Tue Aug 21 11:48:11 2012 -0400

    take purely descriptor-related stuff from fcntl.c to file.c

    was supposed to be just code motion, but it dropped the following two
    lines:

        if (unlikely(oldfd == newfd))
            return -EINVAL;

    from the dup3 system call. dup3 is not specified by POSIX, so Linux
    can do what it likes. However the POSIX proposal for dup3 [1] states
    that it should return an error if oldfd == newfd.

    [1] http://austingroupbugs.net/view.php?id=411

    Signed-off-by: Richard W.M. Jones
    Tested-by: Richard W.M. Jones
    Signed-off-by: Al Viro

    Richard W.M. Jones
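A small userspace check of the restored behaviour, as a sketch only: the helper name `dup3_same_fd_errno()` is hypothetical and the code assumes Linux with a libc that exposes dup3(). With the two dropped lines back in place, the call is expected to fail with EINVAL.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <unistd.h>

/* Declared explicitly in case _GNU_SOURCE is defined too late to take effect. */
int dup3(int oldfd, int newfd, int flags);

/*
 * Hypothetical helper: calls dup3() with oldfd == newfd and returns the
 * resulting errno, 0 if the call unexpectedly succeeds, or -1 if no
 * descriptor could be created for the test.
 */
int dup3_same_fd_errno(void)
{
    int fds[2], result;

    if (pipe(fds) != 0)
        return -1;

    errno = 0;
    result = (dup3(fds[0], fds[0], 0) == -1) ? errno : 0;

    close(fds[0]);
    close(fds[1]);
    return result;
}
```

Note that dup2() behaves differently here: with equal, valid descriptors it returns newfd without error, which is why the check belongs to dup3() alone.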
     

27 Sep, 2012

18 commits


30 Mar, 2012

1 commit

  • Pull x32 support for x86-64 from Ingo Molnar:
    "This tree introduces the X32 binary format and execution mode for x86:
    32-bit data space binaries using 64-bit instructions and 64-bit kernel
    syscalls.

    This allows applications whose working set fits into a 32 bits address
    space to make use of 64-bit instructions while using a 32-bit address
    space with shorter pointers, more compressed data structures, etc."

    Fix up trivial context conflicts in arch/x86/{Kconfig,vdso/vma.c}

    * 'x86-x32-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (71 commits)
    x32: Fix alignment fail in struct compat_siginfo
    x32: Fix stupid ia32/x32 inversion in the siginfo format
    x32: Add ptrace for x32
    x32: Switch to a 64-bit clock_t
    x32: Provide separate is_ia32_task() and is_x32_task() predicates
    x86, mtrr: Use explicit sizing and padding for the 64-bit ioctls
    x86/x32: Fix the binutils auto-detect
    x32: Warn and disable rather than error if binutils too old
    x32: Only clear TIF_X32 flag once
    x32: Make sure TS_COMPAT is cleared for x32 tasks
    fs: Remove missed ->fds_bits from cessation use of fd_set structs internally
    fs: Fix close_on_exec pointer in alloc_fdtable
    x32: Drop non-__vdso weak symbols from the x32 VDSO
    x32: Fix coding style violations in the x32 VDSO code
    x32: Add x32 VDSO support
    x32: Allow x32 to be configured
    x32: If configured, add x32 system calls to system call tables
    x32: Handle process creation
    x32: Signal-related system calls
    x86: Add #ifdef CONFIG_COMPAT to
    ...

    Linus Torvalds
     

29 Feb, 2012

1 commit


24 Feb, 2012

1 commit

  • alloc_fdtable allocates space for the open_fds and close_on_exec
    bitfields together, as 2 * nr / BITS_PER_BYTE. close_on_exec needs to
    point to open_fds + nr / BITS_PER_BYTE, not open_fds + nr /
    BITS_PER_LONG, as introduced in 1fd36adc ("Replace the fd_sets in
    struct fdtable with an array of unsigned longs").

    Signed-off-by: Bobby Powers
    Link: http://lkml.kernel.org/r/1329888587-3087-1-git-send-email-bobbypowers@gmail.com
    Acked-by: David Howells
    Signed-off-by: H. Peter Anvin

    Bobby Powers
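The offsets in this message are byte offsets into the shared allocation. A userspace sketch of the layout, under the assumption that the split is done on a byte-granular pointer: `struct fdtable_model` and the two functions are hypothetical names, and the kernel's minimum allocation size is left out.

```c
#include <limits.h>
#include <stddef.h>
#include <stdlib.h>

#define BITS_PER_BYTE CHAR_BIT
#define BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* Hypothetical stand-in for the two bitmap pointers in struct fdtable. */
struct fdtable_model {
    unsigned long *open_fds;
    unsigned long *close_on_exec;
};

/*
 * Allocate both bitmaps in one block of 2 * nr / BITS_PER_BYTE bytes.
 * nr must be a nonzero multiple of BITS_PER_LONG. Returns 0 on success.
 */
int fdtable_model_alloc(struct fdtable_model *fdt, unsigned int nr)
{
    char *data;

    if (nr == 0 || nr % BITS_PER_LONG != 0)
        return -1;

    data = calloc(2, nr / BITS_PER_BYTE);
    if (data == NULL)
        return -1;

    fdt->open_fds = (unsigned long *)data;
    /* Advance by one whole bitmap, counted in bytes, not in longs. */
    data += nr / BITS_PER_BYTE;
    fdt->close_on_exec = (unsigned long *)data;
    return 0;
}

void fdtable_model_free(struct fdtable_model *fdt)
{
    free(fdt->open_fds);    /* both bitmaps share this one allocation */
}
```

Advancing the byte pointer by nr / BITS_PER_LONG instead would place close_on_exec inside the open_fds bitmap, so the two sets of flags would overlap.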
     

20 Feb, 2012

2 commits

  • Replace the fd_sets in struct fdtable with an array of unsigned longs and then
    use the standard non-atomic bit operations rather than the FD_* macros.

    This:

    (1) Removes the abuses of struct fd_set:

    (a) Since we don't want to allocate a full fd_set the vast majority of the
    time, we actually, in effect, just allocate a just-big-enough array of
    unsigned longs and cast it to an fd_set type - so why bother with the
    fd_set at all?

    (b) Some places outside of the core fdtable handling code (such as
    SELinux) want to look inside the array of unsigned longs hidden inside
    the fd_set struct for more efficient iteration over the entire set.

    (2) Eliminates the use of FD_*() macros in the kernel completely.

    (3) Permits the __FD_*() macros to be deleted entirely where not exposed to
    userspace.

    Signed-off-by: David Howells
    Link: http://lkml.kernel.org/r/20120216174954.23314.48147.stgit@warthog.procyon.org.uk
    Signed-off-by: H. Peter Anvin
    Cc: Al Viro

    David Howells
     
  • Wrap accesses to the fd_sets in struct fdtable (for recording open files and
    close-on-exec flags) so that we can move away from using fd_sets since we
    abuse the fd_set structs by not allocating the full-sized structure under
    normal circumstances and by non-core code looking at the internals of the
    fd_sets.

    The first abuse means that use of FD_ZERO() on these fd_sets is not permitted,
    since that cannot be told about their abnormal lengths.

    This introduces six wrapper functions for setting, clearing and testing
    close-on-exec flags and fd-is-open flags:

    void __set_close_on_exec(int fd, struct fdtable *fdt);
    void __clear_close_on_exec(int fd, struct fdtable *fdt);
    bool close_on_exec(int fd, const struct fdtable *fdt);
    void __set_open_fd(int fd, struct fdtable *fdt);
    void __clear_open_fd(int fd, struct fdtable *fdt);
    bool fd_is_open(int fd, const struct fdtable *fdt);

    Note that I've prepended '__' to the names of the set/clear functions because
    they require the caller to hold a lock to use them.

    Note also that I haven't added wrappers for looking behind the scenes at
    the array. Possibly that should exist too.

    Signed-off-by: David Howells
    Link: http://lkml.kernel.org/r/20120216174942.23314.1364.stgit@warthog.procyon.org.uk
    Signed-off-by: H. Peter Anvin
    Cc: Al Viro

    David Howells
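The two commits above can be pictured with a userspace model of the six wrappers over plain arrays of unsigned long. This is a sketch under assumptions, not the kernel code: the `model_` prefix, `struct fdtable_bits` and the bit helpers are hypothetical, and no locking is modelled even though the real set/clear functions require the caller to hold a lock.

```c
#include <limits.h>
#include <stdbool.h>

#define BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* Hypothetical stand-in for the bitmap part of struct fdtable. */
struct fdtable_bits {
    unsigned int max_fds;
    unsigned long *open_fds;
    unsigned long *close_on_exec;
};

/* Non-atomic bit helpers standing in for the kernel's standard bit operations. */
static void model_set_bit(unsigned int nr, unsigned long *map)
{
    map[nr / BITS_PER_LONG] |= 1UL << (nr % BITS_PER_LONG);
}

static void model_clear_bit(unsigned int nr, unsigned long *map)
{
    map[nr / BITS_PER_LONG] &= ~(1UL << (nr % BITS_PER_LONG));
}

static bool model_test_bit(unsigned int nr, const unsigned long *map)
{
    return (map[nr / BITS_PER_LONG] >> (nr % BITS_PER_LONG)) & 1UL;
}

void model_set_close_on_exec(int fd, struct fdtable_bits *fdt)
{
    model_set_bit((unsigned int)fd, fdt->close_on_exec);
}

void model_clear_close_on_exec(int fd, struct fdtable_bits *fdt)
{
    model_clear_bit((unsigned int)fd, fdt->close_on_exec);
}

bool model_close_on_exec(int fd, const struct fdtable_bits *fdt)
{
    return model_test_bit((unsigned int)fd, fdt->close_on_exec);
}

void model_set_open_fd(int fd, struct fdtable_bits *fdt)
{
    model_set_bit((unsigned int)fd, fdt->open_fds);
}

void model_clear_open_fd(int fd, struct fdtable_bits *fdt)
{
    model_clear_bit((unsigned int)fd, fdt->open_fds);
}

bool model_fd_is_open(int fd, const struct fdtable_bits *fdt)
{
    return model_test_bit((unsigned int)fd, fdt->open_fds);
}
```

Because callers go through the wrappers, the backing store can be any just-big-enough array of unsigned long, which is what makes the fd_set cast unnecessary.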
     

29 Apr, 2011

1 commit

  • Azurit reports large increases in system time after 2.6.36 when running
    Apache. It was bisected down to a892e2d7dcdfa6c76e6 ("vfs: use kmalloc()
    to allocate fdmem if possible").

    That patch caused the vfs to use kmalloc() for very large allocations and
    this is causing excessive work (and presumably excessive reclaim) within
    the page allocator.

    Fix it by falling back to vmalloc() earlier - when the allocation attempt
    would have been considered "costly" by reclaim.

    Reported-by: azurIt
    Tested-by: azurIt
    Acked-by: Changli Gao
    Cc: Americo Wang
    Cc: Jiri Slaby
    Acked-by: Eric Dumazet
    Cc: Mel Gorman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
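The size cutoff described above can be sketched as a pure decision function. Everything named here is an assumption for illustration: the 4096-byte page, the costly order of 3 (believed to mirror the kernel's PAGE_ALLOC_COSTLY_ORDER), and the enum and function names. The real code calls kmalloc() and vmalloc() directly; this model only shows which path a given size would take.

```c
#include <stddef.h>

#define MODEL_PAGE_SIZE 4096u             /* assumed page size */
#define MODEL_PAGE_ALLOC_COSTLY_ORDER 3   /* assumed "costly" allocation order */

enum fdmem_allocator {
    FDMEM_KMALLOC_FIRST,   /* try kmalloc(), fall back to vmalloc() on failure */
    FDMEM_VMALLOC_ONLY     /* skip kmalloc() entirely */
};

/*
 * Hypothetical model of the policy: requests larger than what reclaim
 * treats as a costly order go straight to vmalloc().
 */
enum fdmem_allocator fdmem_pick_allocator(size_t size)
{
    if (size <= ((size_t)MODEL_PAGE_SIZE << MODEL_PAGE_ALLOC_COSTLY_ORDER))
        return FDMEM_KMALLOC_FIRST;
    return FDMEM_VMALLOC_ONLY;
}
```

Under these assumed constants the cutoff is 32 KiB, so a small descriptor table still gets physically contiguous memory while a very large one no longer pressures the page allocator.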
     

11 Aug, 2010

1 commit

  • Use kmalloc() to allocate fdmem if possible.

    vmalloc() is used as a fallback solution for fdmem allocation. A new
    helper function __free_fdtable() is introduced to reduce the lines of
    code.

    A potential bug, calling vfree() on memory allocated by kmalloc(), is fixed.

    [akpm@linux-foundation.org: use __GFP_NOWARN, uninline alloc_fdmem() and free_fdmem()]
    Signed-off-by: Changli Gao
    Cc: Alexander Viro
    Cc: Jiri Slaby
    Cc: "Paul E. McKenney"
    Cc: Alexey Dobriyan
    Cc: Ingo Molnar
    Cc: Peter Zijlstra
    Cc: Avi Kivity
    Cc: Tetsuo Handa
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Changli Gao
     

15 Jun, 2010

1 commit


07 Mar, 2010

1 commit

  • Make sure compiler won't do weird things with limits. E.g. fetching them
    twice may return 2 different values after writable limits are implemented.

    I.e. either use rlimit helpers added in commit 3e10e716abf3 ("resource:
    add helpers for fetching rlimits") or ACCESS_ONCE if not applicable.

    Signed-off-by: Jiri Slaby
    Cc: Alexander Viro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jiri Slaby
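The single-fetch idiom can be illustrated in userspace. This is a sketch with hypothetical names (`struct rlimit_model`, `fd_allowed()`); the macro follows the well-known ACCESS_ONCE() pattern and relies on `__typeof__`, a GNU C extension.

```c
#include <stdbool.h>

/* Force exactly one load of x; the compiler may not re-read it later. */
#define ACCESS_ONCE(x) (*(volatile __typeof__(x) *)&(x))

/* Hypothetical stand-in for a resource limit that another thread may rewrite. */
struct rlimit_model {
    unsigned long cur;
    unsigned long max;
};

/*
 * Hypothetical check: fd must be below both the soft limit and the table
 * size. The limit is snapshotted once, so the clamp and the comparison
 * are guaranteed to use the same value.
 */
bool fd_allowed(unsigned int fd, struct rlimit_model *lim,
                unsigned long table_size)
{
    unsigned long limit = ACCESS_ONCE(lim->cur);

    if (limit > table_size)
        limit = table_size;
    return fd < limit;
}
```

Without the snapshot, a compiler would be free to read `lim->cur` once for the clamp and again for the comparison, and a concurrent writer could make the two reads disagree.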
     

25 Feb, 2010

1 commit

  • Add lockdep-ified RCU primitives to alloc_fd(), files_fdtable()
    and fcheck_files().

    Cc: Alexander Viro
    Signed-off-by: Paul E. McKenney
    Cc: laijs@cn.fujitsu.com
    Cc: dipankar@in.ibm.com
    Cc: mathieu.desnoyers@polymtl.ca
    Cc: josh@joshtriplett.org
    Cc: dvhltc@us.ibm.com
    Cc: niv@us.ibm.com
    Cc: peterz@infradead.org
    Cc: rostedt@goodmis.org
    Cc: Valdis.Kletnieks@vt.edu
    Cc: dhowells@redhat.com
    Cc: Alexander Viro
    Signed-off-by: Ingo Molnar

    Paul E. McKenney
     

12 Oct, 2009

1 commit


01 Aug, 2008

1 commit


27 Jul, 2008

1 commit

  • * dup2() should return -EBADF on exceeded sysctl_nr_open
    * dup() should *not* return -EINVAL even if you have rlimit set to 0;
    it should get -EMFILE instead.

    Check for orig_start exceeding rlimit taken to sys_fcntl().
    Failing expand_files() in dup{2,3}() now gets -EMFILE remapped to -EBADF.
    Consequently, remaining checks for rlimit are taken to expand_files().

    Signed-off-by: Al Viro

    Al Viro
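The dup() half of this change is observable from userspace. A minimal sketch, assuming a Linux system where the soft RLIMIT_NOFILE can be dropped to 0 and then restored; the helper name `dup_errno_with_zero_rlimit()` is hypothetical.

```c
#include <errno.h>
#include <sys/resource.h>
#include <unistd.h>

/*
 * Hypothetical helper: sets the soft RLIMIT_NOFILE to 0, calls dup() on a
 * valid descriptor and returns the resulting errno, 0 if dup() unexpectedly
 * succeeds, or -1 if the setup itself fails.
 */
int dup_errno_with_zero_rlimit(void)
{
    struct rlimit saved, zero;
    int fds[2], copy, result;

    if (pipe(fds) != 0)
        return -1;
    if (getrlimit(RLIMIT_NOFILE, &saved) != 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    zero = saved;
    zero.rlim_cur = 0;                  /* already-open descriptors stay usable */
    if (setrlimit(RLIMIT_NOFILE, &zero) != 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    errno = 0;
    copy = dup(fds[0]);
    result = (copy == -1) ? errno : 0;  /* capture before anything can clobber it */

    setrlimit(RLIMIT_NOFILE, &saved);   /* restore the original soft limit */
    if (copy != -1)
        close(copy);
    close(fds[0]);
    close(fds[1]);
    return result;
}
```

The expected result is EMFILE: the descriptor being duplicated is valid, so EINVAL or EBADF would misreport what is really a per-process limit.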
     

17 May, 2008

2 commits

  • Limit sysctl_nr_open - we don't want ->max_fds to exceed INT_MAX and
    we don't want size calculation for ->fd[] to overflow.

    Signed-off-by: Al Viro

    Al Viro
     
  • Parent _can_ be a clone task, contrary to the comment. Moreover,
    more files could be opened while we allocate a copy, in which case
    we end up copying only part into new descriptor table. Since what
    we get _is_ affected by all changes in the old range, we can get
    rather weird effects - e.g.
        dup2(0, 1024); close(0);
    in parallel with fork() resulting in child that sees the effect of
    close(), but not that of dup2() done just before that close().

    What we need is to recalculate the open_count after having reacquired
    ->file_lock and if external fdtable we'd just allocated is too small for
    it, free the sucker and redo allocation.

    Signed-off-by: Al Viro

    Al Viro