29 Jun, 2013

3 commits

  • Signed-off-by: Al Viro

    Al Viro
     
  • New method - ->iterate(file, ctx). That's the replacement for ->readdir();
    it takes callback from ctx->actor, uses ctx->pos instead of file->f_pos and
    calls dir_emit(ctx, ...) instead of filldir(data, ...). It does *not*
    update file->f_pos (or look at it, for that matter); iterate_dir() does the
    update.

    Note that dir_emit() takes the offset from ctx->pos (and eventually
    filldir_t will lose that argument).

    Signed-off-by: Al Viro

    Al Viro
     
  • iterate_dir(): new helper, replacing vfs_readdir().

    struct dir_context: contains the readdir callback (and will get more stuff
    in it), embedded into whatever data that callback wants to deal with;
    eventually, we'll be passing it to ->readdir() replacement instead of
    (data,filldir) pair.

    Signed-off-by: Al Viro

    Al Viro
     

08 May, 2013

1 commit

  • Faster kernel compiles by way of fewer unnecessary includes.

    [akpm@linux-foundation.org: fix fallout]
    [akpm@linux-foundation.org: fix build]
    Signed-off-by: Kent Overstreet
    Cc: Zach Brown
    Cc: Felipe Balbi
    Cc: Greg Kroah-Hartman
    Cc: Mark Fasheh
    Cc: Joel Becker
    Cc: Rusty Russell
    Cc: Jens Axboe
    Cc: Asai Thambi S P
    Cc: Selvan Mani
    Cc: Sam Bradshaw
    Cc: Jeff Moyer
    Cc: Al Viro
    Cc: Benjamin LaHaise
    Reviewed-by: "Theodore Ts'o"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kent Overstreet
     

05 May, 2013

1 commit


02 May, 2013

1 commit

  • Pull VFS updates from Al Viro,

    Misc cleanups all over the place, mainly wrt /proc interfaces (switch
    create_proc_entry to proc_create(), get rid of the deprecated
    create_proc_read_entry() in favor of using proc_create_data() and
    seq_file etc).

    7kloc removed.

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (204 commits)
    don't bother with deferred freeing of fdtables
    proc: Move non-public stuff from linux/proc_fs.h to fs/proc/internal.h
    proc: Make the PROC_I() and PDE() macros internal to procfs
    proc: Supply a function to remove a proc entry by PDE
    take cgroup_open() and cpuset_open() to fs/proc/base.c
    ppc: Clean up scanlog
    ppc: Clean up rtas_flash driver somewhat
    hostap: proc: Use remove_proc_subtree()
    drm: proc: Use remove_proc_subtree()
    drm: proc: Use minor->index to label things, not PDE->name
    drm: Constify drm_proc_list[]
    zoran: Don't print proc_dir_entry data in debug
    reiserfs: Don't access the proc_dir_entry in r_open(), r_start() r_show()
    proc: Supply an accessor for getting the data from a PDE's parent
    airo: Use remove_proc_subtree()
    rtl8192u: Don't need to save device proc dir PDE
    rtl8187se: Use a dir under /proc/net/r8180/
    proc: Add proc_mkdir_data()
    proc: Move some bits from linux/proc_fs.h to linux/{of.h,signal.h,tty.h}
    proc: Move PDE_NET() to fs/proc/proc_net.c
    ...

    Linus Torvalds
     

01 May, 2013

1 commit

  • Pull compat cleanup from Al Viro:
    "Mostly about syscall wrappers this time; there will be another pile
    with patches in the same general area from various people, but I'd
    rather push those after both that and vfs.git pile are in."

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal:
    syscalls.h: slightly reduce the jungles of macros
    get rid of union semop in sys_semctl(2) arguments
    make do_mremap() static
    sparc: no need to sign-extend in sync_file_range() wrapper
    ppc compat wrappers for add_key(2) and request_key(2) are pointless
    x86: trim sys_ia32.h
    x86: sys32_kill and sys32_mprotect are pointless
    get rid of compat_sys_semctl() and friends in case of ARCH_WANT_OLD_COMPAT_IPC
    merge compat sys_ipc instances
    consolidate compat lookup_dcookie()
    convert vmsplice to COMPAT_SYSCALL_DEFINE
    switch getrusage() to COMPAT_SYSCALL_DEFINE
    switch epoll_pwait to COMPAT_SYSCALL_DEFINE
    convert sendfile{,64} to COMPAT_SYSCALL_DEFINE
    switch signalfd{,4}() to COMPAT_SYSCALL_DEFINE
    make SYSCALL_DEFINE-generated wrappers do asmlinkage_protect
    make HAVE_SYSCALL_WRAPPERS unconditional
    consolidate cond_syscall and SYSCALL_ALIAS declarations
    teach SYSCALL_DEFINE how to deal with long long/unsigned long long
    get rid of duplicate logics in __SC_....[1-6] definitions

    Linus Torvalds
     

10 Apr, 2013

2 commits


13 Mar, 2013

1 commit

  • Looking at mm/process_vm_access.c:process_vm_rw() and comparing it to
    compat_process_vm_rw() shows that the compatibility code requires an
    explicit "access_ok()" check before calling
    compat_rw_copy_check_uvector(). The same difference seems to appear when
    we compare fs/read_write.c:do_readv_writev() to
    fs/compat.c:compat_do_readv_writev().

    This subtle difference between the compat and non-compat requirements
    should probably be debated, as it seems to be error-prone. In fact,
    there are two others sites that use this function in the Linux kernel,
    and they both seem to get it wrong:

    Now shifting our attention to fs/aio.c, we see that aio_setup_iocb()
    also ends up calling compat_rw_copy_check_uvector() through
    aio_setup_vectored_rw(). Unfortunately, the access_ok() check appears to
    be missing. Same situation for
    security/keys/compat.c:compat_keyctl_instantiate_key_iov().

    I propose that we add the access_ok() check directly into
    compat_rw_copy_check_uvector(), so callers don't have to worry about it,
    and it therefore makes the compat call code similar to its non-compat
    counterpart. Place the access_ok() check in the same location where
    copy_from_user() can trigger a -EFAULT error in the non-compat code, so
    the ABI behaviors are alike on both compat and non-compat.

    While we are here, fix compat_do_readv_writev() so it checks for
    compat_rw_copy_check_uvector() negative return values.

    And also, fix a memory leak in compat_keyctl_instantiate_key_iov() error
    handling.

    Acked-by: Linus Torvalds
    Acked-by: Al Viro
    Signed-off-by: Mathieu Desnoyers
    Signed-off-by: Linus Torvalds

    Mathieu Desnoyers
     

04 Mar, 2013

4 commits


04 Feb, 2013

2 commits


13 Oct, 2012

1 commit

  • getname() is intended to copy pathname strings from userspace into a
    kernel buffer. The result is just a string in kernel space. It would
    however be quite helpful to be able to attach some ancillary info to
    the string.

    For instance, we could attach some audit-related info to reduce the
    amount of audit-related processing needed. When auditing is enabled,
    we could also call getname() on the string more than once and not
    need to recopy it from userspace.

    This patchset converts the getname()/putname() interfaces to return
    a struct instead of a string. For now, the struct just tracks the
    string in kernel space and the original userland pointer for it.

    Later, we'll add other information to the struct as it becomes
    convenient.

    Signed-off-by: Jeff Layton
    Signed-off-by: Al Viro

    Jeff Layton
     

03 Oct, 2012

1 commit

  • This function is used by sparc, powerpc and arm64 for compat support.
    The patch adds a generic implementation which calls do_sendfile()
    directly and avoids set_fs().

    The sparc architecture has wrappers for the sign extensions while
    powerpc relies on the compiler to do the this. The patch adds wrappers
    for powerpc to handle the u32->int type conversion.

    compat_sys_sendfile64() can be replaced by a sys_sendfile() call since
    compat_loff_t has the same size as off_t on a 64-bit system.

    On powerpc, the patch also changes the 64-bit sendfile call from
    sys_sendile64 to sys_sendfile.

    Signed-off-by: Catalin Marinas
    Acked-by: David S. Miller
    Cc: Arnd Bergmann
    Cc: Benjamin Herrenschmidt
    Cc: Paul Mackerras
    Cc: Alexander Viro
    Cc: Andrew Morton
    Signed-off-by: Al Viro

    Catalin Marinas
     

27 Sep, 2012

1 commit


21 Aug, 2012

1 commit


02 Jun, 2012

3 commits

  • Pull third pile of signal handling patches from Al Viro:
    "This time it's mostly helpers and conversions to them; there's a lot
    of stuff remaining in the tree, but that'll either go in -rc2
    (isolated bug fixes, ideally via arch maintainers' trees) or will sit
    there until the next cycle."

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal:
    x86: get rid of calling do_notify_resume() when returning to kernel mode
    blackfin: check __get_user() return value
    whack-a-mole with TIF_FREEZE
    FRV: Optimise the system call exit path in entry.S [ver #2]
    FRV: Shrink TIF_WORK_MASK [ver #2]
    FRV: Prevent syscall exit tracing and notify_resume at end of kernel exceptions
    new helper: signal_delivered()
    powerpc: get rid of restore_sigmask()
    most of set_current_blocked() callers want SIGKILL/SIGSTOP removed from set
    set_restore_sigmask() is never called without SIGPENDING (and never should be)
    TIF_RESTORE_SIGMASK can be set only when TIF_SIGPENDING is set
    don't call try_to_freeze() from do_signal()
    pull clearing RESTORE_SIGMASK into block_sigmask()
    sh64: failure to build sigframe != signal without handler
    openrisc: tracehook_signal_handler() is supposed to be called on success
    new helper: sigmask_to_save()
    new helper: restore_saved_sigmask()
    new helpers: {clear,test,test_and_clear}_restore_sigmask()
    HAVE_RESTORE_SIGMASK is defined on all architectures now

    Linus Torvalds
     
  • Pull vfs changes from Al Viro.
    "A lot of misc stuff. The obvious groups:
    * Miklos' atomic_open series; kills the damn abuse of
    ->d_revalidate() by NFS, which was the major stumbling block for
    all work in that area.
    * ripping security_file_mmap() and dealing with deadlocks in the
    area; sanitizing the neighborhood of vm_mmap()/vm_munmap() in
    general.
    * ->encode_fh() switched to saner API; insane fake dentry in
    mm/cleancache.c gone.
    * assorted annotations in fs (endianness, __user)
    * parts of Artem's ->s_dirty work (jff2 and reiserfs parts)
    * ->update_time() work from Josef.
    * other bits and pieces all over the place.

    Normally it would've been in two or three pull requests, but
    signal.git stuff had eaten a lot of time during this cycle ;-/"

    Fix up trivial conflicts in Documentation/filesystems/vfs.txt (the
    'truncate_range' inode method was removed by the VM changes, the VFS
    update adds an 'update_time()' method), and in fs/btrfs/ulist.[ch] (due
    to sparse fix added twice, with other changes nearby).

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (95 commits)
    nfs: don't open in ->d_revalidate
    vfs: retry last component if opening stale dentry
    vfs: nameidata_to_filp(): don't throw away file on error
    vfs: nameidata_to_filp(): inline __dentry_open()
    vfs: do_dentry_open(): don't put filp
    vfs: split __dentry_open()
    vfs: do_last() common post lookup
    vfs: do_last(): add audit_inode before open
    vfs: do_last(): only return EISDIR for O_CREAT
    vfs: do_last(): check LOOKUP_DIRECTORY
    vfs: do_last(): make ENOENT exit RCU safe
    vfs: make follow_link check RCU safe
    vfs: do_last(): use inode variable
    vfs: do_last(): inline walk_component()
    vfs: do_last(): make exit RCU safe
    vfs: split do_lookup()
    Btrfs: move over to use ->update_time
    fs: introduce inode operation ->update_time
    reiserfs: get rid of resierfs_sync_super
    reiserfs: mark the superblock as dirty a bit later
    ...

    Linus Torvalds
     
  • Everyone either defines it in arch thread_info.h or has TIF_RESTORE_SIGMASK
    and picks default set_restore_sigmask() in linux/thread_info.h. Kill the
    ifdefs, slap #error in linux/thread_info.h to catch breakage when new ones
    get merged.

    Signed-off-by: Al Viro

    Al Viro
     

01 Jun, 2012

1 commit

  • A cleanup of rw_copy_check_uvector and compat_rw_copy_check_uvector after
    changes made to support CMA in an earlier patch.

    Rather than having an additional check_access parameter to these
    functions, the first paramater type is overloaded to allow the caller to
    specify CHECK_IOVEC_ONLY which means check that the contents of the iovec
    are valid, but do not check the memory that they point to. This is used
    by process_vm_readv/writev where we need to validate that a iovec passed
    to the syscall is valid but do not want to check the memory that it points
    to at this point because it refers to an address space in another process.

    Signed-off-by: Chris Yeoh
    Reviewed-by: Oleg Nesterov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christopher Yeoh
     

30 May, 2012

1 commit


16 May, 2012

1 commit


30 Mar, 2012

1 commit

  • Pull x32 support for x86-64 from Ingo Molnar:
    "This tree introduces the X32 binary format and execution mode for x86:
    32-bit data space binaries using 64-bit instructions and 64-bit kernel
    syscalls.

    This allows applications whose working set fits into a 32 bits address
    space to make use of 64-bit instructions while using a 32-bit address
    space with shorter pointers, more compressed data structures, etc."

    Fix up trivial context conflicts in arch/x86/{Kconfig,vdso/vma.c}

    * 'x86-x32-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (71 commits)
    x32: Fix alignment fail in struct compat_siginfo
    x32: Fix stupid ia32/x32 inversion in the siginfo format
    x32: Add ptrace for x32
    x32: Switch to a 64-bit clock_t
    x32: Provide separate is_ia32_task() and is_x32_task() predicates
    x86, mtrr: Use explicit sizing and padding for the 64-bit ioctls
    x86/x32: Fix the binutils auto-detect
    x32: Warn and disable rather than error if binutils too old
    x32: Only clear TIF_X32 flag once
    x32: Make sure TS_COMPAT is cleared for x32 tasks
    fs: Remove missed ->fds_bits from cessation use of fd_set structs internally
    fs: Fix close_on_exec pointer in alloc_fdtable
    x32: Drop non-__vdso weak symbols from the x32 VDSO
    x32: Fix coding style violations in the x32 VDSO code
    x32: Add x32 VDSO support
    x32: Allow x32 to be configured
    x32: If configured, add x32 system calls to system call tables
    x32: Handle process creation
    x32: Signal-related system calls
    x86: Add #ifdef CONFIG_COMPAT to
    ...

    Linus Torvalds
     

29 Feb, 2012

1 commit


21 Feb, 2012

1 commit

  • For 32-bit ABIs which have real 64-bit registers, we don't want to
    break the position argument into two. However, we still need compat
    support to deal with 32-bit pointers, so we can't just use
    sys_p{read,write} directly.

    Signed-off-by: H. Peter Anvin

    H. J. Lu
     

14 Feb, 2012

1 commit


04 Jan, 2012

2 commits


01 Nov, 2011

1 commit

  • The basic idea behind cross memory attach is to allow MPI programs doing
    intra-node communication to do a single copy of the message rather than a
    double copy of the message via shared memory.

    The following patch attempts to achieve this by allowing a destination
    process, given an address and size from a source process, to copy memory
    directly from the source process into its own address space via a system
    call. There is also a symmetrical ability to copy from the current
    process's address space into a destination process's address space.

    - Use of /proc/pid/mem has been considered, but there are issues with
    using it:
    - Does not allow for specifying iovecs for both src and dest, assuming
    preadv or pwritev was implemented either the area read from or
    written to would need to be contiguous.
    - Currently mem_read allows only processes who are currently
    ptrace'ing the target and are still able to ptrace the target to read
    from the target. This check could possibly be moved to the open call,
    but its not clear exactly what race this restriction is stopping
    (reason appears to have been lost)
    - Having to send the fd of /proc/self/mem via SCM_RIGHTS on unix
    domain socket is a bit ugly from a userspace point of view,
    especially when you may have hundreds if not (eventually) thousands
    of processes that all need to do this with each other
    - Doesn't allow for some future use of the interface we would like to
    consider adding in the future (see below)
    - Interestingly reading from /proc/pid/mem currently actually
    involves two copies! (But this could be fixed pretty easily)

    As mentioned previously use of vmsplice instead was considered, but has
    problems. Since you need the reader and writer working co-operatively if
    the pipe is not drained then you block. Which requires some wrapping to
    do non blocking on the send side or polling on the receive. In all to all
    communication it requires ordering otherwise you can deadlock. And in the
    example of many MPI tasks writing to one MPI task vmsplice serialises the
    copying.

    There are some cases of MPI collectives where even a single copy interface
    does not get us the performance gain we could. For example in an
    MPI_Reduce rather than copy the data from the source we would like to
    instead use it directly in a mathops (say the reduce is doing a sum) as
    this would save us doing a copy. We don't need to keep a copy of the data
    from the source. I haven't implemented this, but I think this interface
    could in the future do all this through the use of the flags - eg could
    specify the math operation and type and the kernel rather than just
    copying the data would apply the specified operation between the source
    and destination and store it in the destination.

    Although we don't have a "second user" of the interface (though I've had
    some nibbles from people who may be interested in using it for intra
    process messaging which is not MPI). This interface is something which
    hardware vendors are already doing for their custom drivers to implement
    fast local communication. And so in addition to this being useful for
    OpenMPI it would mean the driver maintainers don't have to fix things up
    when the mm changes.

    There was some discussion about how much faster a true zero copy would
    go. Here's a link back to the email with some testing I did on that:

    http://marc.info/?l=linux-mm&m=130105930902915&w=2

    There is a basic man page for the proposed interface here:

    http://ozlabs.org/~cyeoh/cma/process_vm_readv.txt

    This has been implemented for x86 and powerpc, other architecture should
    mainly (I think) just need to add syscall numbers for the process_vm_readv
    and process_vm_writev. There are 32 bit compatibility versions for
    64-bit kernels.

    For arch maintainers there are some simple tests to be able to quickly
    verify that the syscalls are working correctly here:

    http://ozlabs.org/~cyeoh/cma/cma-test-20110718.tgz

    Signed-off-by: Chris Yeoh
    Cc: Ingo Molnar
    Cc: "H. Peter Anvin"
    Cc: Thomas Gleixner
    Cc: Arnd Bergmann
    Cc: Paul Mackerras
    Cc: Benjamin Herrenschmidt
    Cc: David Howells
    Cc: James Morris
    Cc:
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christopher Yeoh
     

29 Oct, 2011

1 commit

  • * 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/hch/vfs-queue: (21 commits)
    leases: fix write-open/read-lease race
    nfs: drop unnecessary locking in llseek
    ext4: replace cut'n'pasted llseek code with generic_file_llseek_size
    vfs: add generic_file_llseek_size
    vfs: do (nearly) lockless generic_file_llseek
    direct-io: merge direct_io_walker into __blockdev_direct_IO
    direct-io: inline the complete submission path
    direct-io: separate map_bh from dio
    direct-io: use a slab cache for struct dio
    direct-io: rearrange fields in dio/dio_submit to avoid holes
    direct-io: fix a wrong comment
    direct-io: separate fields only used in the submission path from struct dio
    vfs: fix spinning prevention in prune_icache_sb
    vfs: add a comment to inode_permission()
    vfs: pass all mask flags check_acl and posix_acl_permission
    vfs: add hex format for MAY_* flag values
    vfs: indicate that the permission functions take all the MAY_* flags
    compat: sync compat_stats with statfs.
    vfs: add "device" tag to /proc/self/mountstats
    cleanup: vfs: small comment fix for block_invalidatepage
    ...

    Fix up trivial conflict in fs/gfs2/file.c (llseek changes)

    Linus Torvalds
     

28 Oct, 2011

1 commit

  • This was found by inspection while tracking a similar
    bug in compat_statfs64, that has been fixed in mainline
    since decemeber.

    - This fixes a bug where not all of the f_spare fields
    were cleared on mips and s390.
    - Add the f_flags field to struct compat_statfs
    - Copy f_flags to userspace in case someone cares.
    - Use __clear_user to copy the f_spare field to userspace
    to ensure that all of the elements of f_spare are cleared.
    On some architectures f_spare is has 5 ints and on some
    architectures f_spare only has 4 ints. Which makes
    the previous technique of clearing each int individually
    broken.

    I don't expect anyone actually uses the old statfs system
    call anymore but if they do let them benefit from having
    the compat and the native version working the same.

    Signed-off-by: Eric W. Biederman
    Signed-off-by: Christoph Hellwig

    Eric W. Biederman
     

25 Oct, 2011

1 commit

  • * 'for-3.2' of git://linux-nfs.org/~bfields/linux: (103 commits)
    nfs41: implement DESTROY_CLIENTID operation
    nfsd4: typo logical vs bitwise negate for want_mask
    nfsd4: allow NFS4_SHARE_SIGNAL_DELEG_WHEN_RESRC_AVAIL | NFS4_SHARE_PUSH_DELEG_WHEN_UNCONTENDED
    nfsd4: seq->status_flags may be used unitialized
    nfsd41: use SEQ4_STATUS_BACKCHANNEL_FAULT when cb_sequence is invalid
    nfsd4: implement new 4.1 open reclaim types
    nfsd4: remove unneeded CLAIM_DELEGATE_CUR workaround
    nfsd4: warn on open failure after create
    nfsd4: preallocate open stateid in process_open1()
    nfsd4: do idr preallocation with stateid allocation
    nfsd4: preallocate nfs4_file in process_open1()
    nfsd4: clean up open owners on OPEN failure
    nfsd4: simplify process_open1 logic
    nfsd4: make is_open_owner boolean
    nfsd4: centralize renew_client() calls
    nfsd4: typo logical vs bitwise negate
    nfs: fix bug about IPv6 address scope checking
    nfsd4: more robust ignoring of WANT bits in OPEN
    nfsd4: move name-length checks to xdr
    nfsd4: move access/deny validity checks to xdr code
    ...

    Linus Torvalds
     

31 Aug, 2011

1 commit


27 Aug, 2011

1 commit


16 Jul, 2011

1 commit

  • As promised in feature-removal-schedule.txt it is time to
    remove the nfsctl system call.

    Userspace has perferred to not use this call throughout 2.6 and it has been
    excluded in the default configuration since 2.6.36 (9 months ago).

    So this patch removes all the code that was being compiled out.

    There are still references to sys_nfsctl in various arch systemcall tables
    and related code. These should be cleaned out too, probably in the next
    merge window.

    Signed-off-by: NeilBrown
    Signed-off-by: J. Bruce Fields

    NeilBrown
     

09 Apr, 2011

1 commit

  • Add the appropriate members into struct user_arg_ptr and teach
    get_user_arg_ptr() to handle is_compat = T case correctly.

    This allows us to remove the compat_do_execve() code from fs/compat.c
    and reimplement compat_do_execve() as the trivial wrapper on top of
    do_execve_common(is_compat => true).

    In fact, this fixes another (minor) bug. "compat_uptr_t str" can
    overflow after "str += len" in compat_copy_strings() if a 64bit
    application execs via sys32_execve().

    Unexport acct_arg_size() and get_arg_page(), fs/compat.c doesn't
    need them any longer.

    Signed-off-by: Oleg Nesterov
    Reviewed-by: KOSAKI Motohiro
    Tested-by: KOSAKI Motohiro

    Oleg Nesterov