02 May, 2013

1 commit

  • Pull VFS updates from Al Viro,

    Misc cleanups all over the place, mainly wrt /proc interfaces (switch
    create_proc_entry to proc_create(), get rid of the deprecated
    create_proc_read_entry() in favor of using proc_create_data() and
    seq_file etc).

    7kloc removed.

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (204 commits)
    don't bother with deferred freeing of fdtables
    proc: Move non-public stuff from linux/proc_fs.h to fs/proc/internal.h
    proc: Make the PROC_I() and PDE() macros internal to procfs
    proc: Supply a function to remove a proc entry by PDE
    take cgroup_open() and cpuset_open() to fs/proc/base.c
    ppc: Clean up scanlog
    ppc: Clean up rtas_flash driver somewhat
    hostap: proc: Use remove_proc_subtree()
    drm: proc: Use remove_proc_subtree()
    drm: proc: Use minor->index to label things, not PDE->name
    drm: Constify drm_proc_list[]
    zoran: Don't print proc_dir_entry data in debug
    reiserfs: Don't access the proc_dir_entry in r_open(), r_start() r_show()
    proc: Supply an accessor for getting the data from a PDE's parent
    airo: Use remove_proc_subtree()
    rtl8192u: Don't need to save device proc dir PDE
    rtl8187se: Use a dir under /proc/net/r8180/
    proc: Add proc_mkdir_data()
    proc: Move some bits from linux/proc_fs.h to linux/{of.h,signal.h,tty.h}
    proc: Move PDE_NET() to fs/proc/proc_net.c
    ...

    Linus Torvalds
     

01 May, 2013

1 commit

  • Pull compat cleanup from Al Viro:
    "Mostly about syscall wrappers this time; there will be another pile
    with patches in the same general area from various people, but I'd
    rather push those after both that and vfs.git pile are in."

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal:
    syscalls.h: slightly reduce the jungles of macros
    get rid of union semop in sys_semctl(2) arguments
    make do_mremap() static
    sparc: no need to sign-extend in sync_file_range() wrapper
    ppc compat wrappers for add_key(2) and request_key(2) are pointless
    x86: trim sys_ia32.h
    x86: sys32_kill and sys32_mprotect are pointless
    get rid of compat_sys_semctl() and friends in case of ARCH_WANT_OLD_COMPAT_IPC
    merge compat sys_ipc instances
    consolidate compat lookup_dcookie()
    convert vmsplice to COMPAT_SYSCALL_DEFINE
    switch getrusage() to COMPAT_SYSCALL_DEFINE
    switch epoll_pwait to COMPAT_SYSCALL_DEFINE
    convert sendfile{,64} to COMPAT_SYSCALL_DEFINE
    switch signalfd{,4}() to COMPAT_SYSCALL_DEFINE
    make SYSCALL_DEFINE-generated wrappers do asmlinkage_protect
    make HAVE_SYSCALL_WRAPPERS unconditional
    consolidate cond_syscall and SYSCALL_ALIAS declarations
    teach SYSCALL_DEFINE how to deal with long long/unsigned long long
    get rid of duplicate logics in __SC_....[1-6] definitions

    Linus Torvalds
     

10 Apr, 2013

5 commits


22 Mar, 2013

1 commit

  • default_file_splice_from() ends up calling vfs_write() (via very convoluted
    callchain). It's overkill, since we already have done rw_verify_area()
    in the caller, and by the time we call vfs_write() we are under
    set_fs(KERNEL_DS), so access_ok() is also pointless. Add a new helper
    (__kernel_write()), use it instead of kernel_write() in there.

    Signed-off-by: Al Viro

    Al Viro
     

04 Mar, 2013

1 commit


26 Feb, 2013

1 commit


23 Feb, 2013

1 commit


07 Jan, 2013

1 commit

  • commit 35f9c09fe9c72e (tcp: tcp_sendpages() should call tcp_push() once)
    added an internal flag : MSG_SENDPAGE_NOTLAST meant to be set on all
    frags but the last one for a splice() call.

    The condition used to set the flag in pipe_to_sendpage() relied on
    splice() user passing the exact number of bytes present in the pipe,
    or a smaller one.

    But some programs pass an arbitrary high value, and the test fails.

    The effect of this bug is a lack of tcp_push() at the end of a
    splice(pipe -> socket) call, and possibly very slow or erratic TCP
    sessions.

    We should test both sd->total_len and the fact that another fragment
    is in the pipe (pipe->nrbufs > 1).

    Many thanks to Willy for providing a very clear bug report, bisection
    and test programs.

    Reported-by: Willy Tarreau
    Bisected-by: Willy Tarreau
    Tested-by: Willy Tarreau
    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
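
The condition described above can be sketched as a small userspace model. This is only an illustration, not the kernel source: the `MODEL_*` constants, `struct model_splice_desc` and `sendpage_flags()` are hypothetical stand-ins, and the flag values are arbitrary. The point it shows is that a fragment is marked "not last" only when the caller asked for more bytes than this fragment carries and another buffer is actually waiting in the pipe.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel flags; the values are illustrative. */
enum {
    MODEL_SPLICE_F_MORE        = 1 << 0,
    MODEL_MSG_MORE             = 1 << 1,
    MODEL_MSG_SENDPAGE_NOTLAST = 1 << 2,
};

/* Hypothetical, trimmed-down view of the splice descriptor. */
struct model_splice_desc {
    size_t len;          /* bytes in the fragment being sent now */
    size_t total_len;    /* bytes the caller asked splice() to move */
    unsigned int flags;  /* MODEL_SPLICE_F_* bits from the caller */
};

/*
 * Compute the send flags for one fragment.  nrbufs is the number of
 * buffers currently queued in the pipe, including the one being sent.
 */
static int sendpage_flags(const struct model_splice_desc *sd,
                          unsigned int nrbufs)
{
    int flags;

    assert(sd != NULL);

    flags = (sd->flags & MODEL_SPLICE_F_MORE) ? MODEL_MSG_MORE : 0;

    /*
     * Testing total_len alone is not enough: a caller may pass an
     * arbitrarily large length, so also require a second queued buffer.
     */
    if (sd->len < sd->total_len && nrbufs > 1)
        flags |= MODEL_MSG_SENDPAGE_NOTLAST;

    return flags;
}
```

With a 4096-byte fragment and a 1 GiB requested length, the model returns no "not last" flag when that fragment is the only buffer in the pipe, so the final push is not skipped.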
     

12 Dec, 2012

1 commit


27 Sep, 2012

1 commit


31 Jul, 2012

1 commit

  • There are several entry points which dirty pages in a filesystem. mmap
    (handled by block_page_mkwrite()), buffered write (handled by
    __generic_file_aio_write()), splice write (generic_file_splice_write),
    truncate, and fallocate (these can dirty last partial page - handled inside
    each filesystem separately). Protect these places with sb_start_write() and
    sb_end_write().

    ->page_mkwrite() calls are particularly complex since they are called with
    mmap_sem held and thus we cannot use standard sb_start_write() due to lock
    ordering constraints. We solve the problem by using a special freeze protection
    sb_start_pagefault() which ranks below mmap_sem.

    BugLink: https://bugs.launchpad.net/bugs/897421
    Tested-by: Kamal Mostafa
    Tested-by: Peter M. Petrakis
    Tested-by: Dann Frazier
    Tested-by: Massimo Morana
    Signed-off-by: Jan Kara
    Signed-off-by: Al Viro

    Jan Kara
     

14 Jun, 2012

1 commit

  • Dave Jones reported a kernel BUG at mm/slub.c:3474! triggered
    by splice_shrink_spd() called from vmsplice_to_pipe()

    commit 35f3d14dbbc5 (pipe: add support for shrinking and growing pipes)
    added capability to adjust pipe->buffers.

    Problem is some paths don't hold pipe mutex and assume pipe->buffers
    doesn't change for their duration.

    Fix this by adding nr_pages_max field in struct splice_pipe_desc, and
    use it in place of pipe->buffers where appropriate.

    splice_shrink_spd() loses its struct pipe_inode_info argument.

    Reported-by: Dave Jones
    Signed-off-by: Eric Dumazet
    Cc: Jens Axboe
    Cc: Alexander Viro
    Cc: Tom Herbert
    Cc: stable # 2.6.35
    Tested-by: Dave Jones
    Signed-off-by: Jens Axboe

    Eric Dumazet
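
The idea of recording the buffer count in the descriptor can be sketched in userspace as follows. Every name here (`model_pipe`, `model_spd`, `model_grow_spd()`, `model_shrink_spd()`, `MODEL_DEF_BUFFERS`) is a hypothetical stand-in, and `calloc()`/`free()` replace the kernel allocator. The sketch only shows why the shrink side must consult the value captured at grow time rather than re-reading the pipe.

```c
#include <assert.h>
#include <stdlib.h>

#define MODEL_DEF_BUFFERS 16  /* hypothetical default pipe size */

struct model_pipe {
    unsigned int buffers;     /* may be resized by another thread */
};

struct model_spd {
    void **pages;
    unsigned int nr_pages_max; /* snapshot taken when the spd was set up */
};

/*
 * Set up the page array.  Small pipes use the caller's inline array
 * (which must hold MODEL_DEF_BUFFERS entries); larger ones allocate.
 * Returns 0 on success, -1 on allocation failure.
 */
static int model_grow_spd(const struct model_pipe *pipe,
                          struct model_spd *spd, void **inline_pages)
{
    unsigned int buffers = pipe->buffers; /* read exactly once */

    assert(buffers > 0);

    spd->nr_pages_max = buffers;
    if (buffers <= MODEL_DEF_BUFFERS) {
        spd->pages = inline_pages;
        return 0;
    }

    spd->pages = calloc(buffers, sizeof(*spd->pages));
    return spd->pages ? 0 : -1;
}

/*
 * Release the page array.  Note there is no pipe argument: the decision
 * depends only on the snapshot.  Returns 1 if heap memory was freed.
 */
static int model_shrink_spd(struct model_spd *spd)
{
    if (spd->nr_pages_max <= MODEL_DEF_BUFFERS)
        return 0;             /* inline array, nothing to free */

    free(spd->pages);
    spd->pages = NULL;
    return 1;
}
```

If the pipe is resized between the two calls, the shrink side still frees exactly what the grow side allocated, which is the property the fix relies on.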
     

02 Jun, 2012

2 commits

  • Pull vfs changes from Al Viro.
    "A lot of misc stuff. The obvious groups:
    * Miklos' atomic_open series; kills the damn abuse of
    ->d_revalidate() by NFS, which was the major stumbling block for
    all work in that area.
    * ripping security_file_mmap() and dealing with deadlocks in the
    area; sanitizing the neighborhood of vm_mmap()/vm_munmap() in
    general.
    * ->encode_fh() switched to saner API; insane fake dentry in
    mm/cleancache.c gone.
    * assorted annotations in fs (endianness, __user)
    * parts of Artem's ->s_dirty work (jffs2 and reiserfs parts)
    * ->update_time() work from Josef.
    * other bits and pieces all over the place.

    Normally it would've been in two or three pull requests, but
    signal.git stuff had eaten a lot of time during this cycle ;-/"

    Fix up trivial conflicts in Documentation/filesystems/vfs.txt (the
    'truncate_range' inode method was removed by the VM changes, the VFS
    update adds an 'update_time()' method), and in fs/btrfs/ulist.[ch] (due
    to sparse fix added twice, with other changes nearby).

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (95 commits)
    nfs: don't open in ->d_revalidate
    vfs: retry last component if opening stale dentry
    vfs: nameidata_to_filp(): don't throw away file on error
    vfs: nameidata_to_filp(): inline __dentry_open()
    vfs: do_dentry_open(): don't put filp
    vfs: split __dentry_open()
    vfs: do_last() common post lookup
    vfs: do_last(): add audit_inode before open
    vfs: do_last(): only return EISDIR for O_CREAT
    vfs: do_last(): check LOOKUP_DIRECTORY
    vfs: do_last(): make ENOENT exit RCU safe
    vfs: make follow_link check RCU safe
    vfs: do_last(): use inode variable
    vfs: do_last(): inline walk_component()
    vfs: do_last(): make exit RCU safe
    vfs: split do_lookup()
    Btrfs: move over to use ->update_time
    fs: introduce inode operation ->update_time
    reiserfs: get rid of resierfs_sync_super
    reiserfs: mark the superblock as dirty a bit later
    ...

    Linus Torvalds
     
  • Btrfs has to make sure we have space to allocate new blocks in order to modify
    the inode, so updating time can fail. We've gotten around this by having our
    own file_update_time but this is kind of a pain, and Christoph has indicated he
    would like to make xfs do something different with atime updates. So introduce
    ->update_time, where we will deal with i_version and a/m/c time updates and
    indicate which changes need to be made. The normal version just does what it
    has always done, updates the time and marks the inode dirty, and then
    filesystems can choose to do something different.

    I've gone through all of the users of file_update_time and made them check for
    errors with the exception of the fault code since it's complicated and I wasn't
    quite sure what to do there, also Jan is going to be pushing the file time
    updates into page_mkwrite for those who have it so that should satisfy btrfs and
    make it not a big deal to check the file_update_time() return code in the
    generic fault path. Thanks,

    Signed-off-by: Josef Bacik

    Josef Bacik
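
The dispatch described above (an optional per-filesystem hook with a generic fallback, and an error return that callers check) can be sketched in userspace like this. The types, the `MODEL_S_*` flag bits, and the functions are all hypothetical and are not the kernel's definitions; `model_nospace_update_time()` is an invented hook that imitates a filesystem with no room left to modify the inode.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical flag bits saying which fields need updating. */
enum {
    MODEL_S_ATIME   = 1 << 0,
    MODEL_S_MTIME   = 1 << 1,
    MODEL_S_CTIME   = 1 << 2,
    MODEL_S_VERSION = 1 << 3,
};

struct model_inode;

struct model_inode_ops {
    /* Optional hook; NULL means "use the generic behaviour". */
    int (*update_time)(struct model_inode *inode, long now, int flags);
};

struct model_inode {
    long atime, mtime, ctime;
    unsigned long version;
    int dirty;
    const struct model_inode_ops *ops;
};

/* Generic behaviour: update the requested fields and mark the inode dirty. */
static int model_generic_update_time(struct model_inode *inode, long now,
                                     int flags)
{
    if (flags & MODEL_S_VERSION)
        inode->version++;
    if (flags & MODEL_S_ATIME)
        inode->atime = now;
    if (flags & MODEL_S_MTIME)
        inode->mtime = now;
    if (flags & MODEL_S_CTIME)
        inode->ctime = now;
    inode->dirty = 1;
    return 0;
}

/* Invented hook for a filesystem that cannot reserve space for the update. */
static int model_nospace_update_time(struct model_inode *inode, long now,
                                     int flags)
{
    (void)inode; (void)now; (void)flags;
    return -ENOSPC;
}

static const struct model_inode_ops model_nospace_ops = {
    .update_time = model_nospace_update_time,
};

/* Entry point: prefer the filesystem hook, else fall back to the generic one. */
static int model_update_time(struct model_inode *inode, long now, int flags)
{
    assert(inode != NULL);

    if (inode->ops && inode->ops->update_time)
        return inode->ops->update_time(inode, now, flags);
    return model_generic_update_time(inode, now, flags);
}
```

A caller in this model is expected to propagate a negative return value instead of assuming the timestamps were written.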
     

01 May, 2012

1 commit


20 Apr, 2012

1 commit

  • It seems there is no fundamental reason to limit vmsplice()
    SPLICE_F_GIFT to page aligned chunks.

    All helpers are prepared to cope with offsets in page.

    This limitation makes vmsplice() API very impractical in the zero-copy
    land.

    Signed-off-by: Eric Dumazet
    Cc: Tom Herbert
    Cc: Jens Axboe
    Cc: Andrew Morton
    Cc: David Miller
    Cc: Al Viro
    Cc: Hugh Dickins
    Cc: Changli Gao
    Cc: Miklos Szeredi
    Signed-off-by: Jens Axboe

    Eric Dumazet
     

06 Apr, 2012

1 commit

  • commit 2f533844242 (tcp: allow splice() to build full TSO packets) added
    a regression for splice() calls using SPLICE_F_MORE.

    We need to call tcp_push() at the end of the last page processed in
    tcp_sendpages(), or else transmits can be deferred and future sends
    stall.

    Add a new internal flag, MSG_SENDPAGE_NOTLAST, acting like MSG_MORE, but
    with different semantics.

    For all sendpage() providers, it's a transparent change. Only
    sock_sendpage() and tcp_sendpages() can differentiate the two different
    flags provided by pipe_to_sendpage()

    Reported-by: Tom Herbert
    Cc: Nandita Dukkipati
    Cc: Neal Cardwell
    Cc: Tom Herbert
    Cc: Yuchung Cheng
    Cc: H.K. Jerry Chu
    Cc: Maciej Żenczykowski
    Cc: Mahesh Bandewar
    Cc: Ilpo Järvinen
    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller

    Eric Dumazet
     

25 Mar, 2012

1 commit

  • Pull cleanup of fs/ and lib/ users of module.h from Paul Gortmaker:
    "Fix up files in fs/ and lib/ dirs to only use module.h if they really
    need it.

    These are trivial in scope vs the work done previously. We now have
    things where any few remaining cleanups can be farmed out to arch or
    subsystem maintainers, and I have done so when possible. What is
    remaining here represents the bits that don't clearly lie within a
    single arch/subsystem boundary, like the fs dir and the lib dir.

    Some duplicate includes arising from overlapping fixes from
    independent subsystem maintainer submissions are also quashed."

    Fix up trivial conflicts due to clashes with other include file cleanups
    (including some due to the previous bug.h cleanup pull).

    * tag 'module-for-3.4' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux:
    lib: reduce the use of module.h wherever possible
    fs: reduce the use of module.h wherever possible
    includecheck: delete any duplicate instances of module.h

    Linus Torvalds
     

20 Mar, 2012

1 commit


29 Feb, 2012

1 commit


04 Jan, 2012

1 commit

  • Move invalidate_bdev, block_sync_page into fs/block_dev.c. Export
    kill_bdev as well, so brd doesn't have to open code it. Reduce
    buffer_head.h requirement accordingly.

    Removed a rather large comment from invalidate_bdev, as it looked a bit
    obsolete to bother moving. The small comment replacing it says enough.

    Signed-off-by: Nick Piggin
    Cc: Al Viro
    Cc: Christoph Hellwig
    Signed-off-by: Andrew Morton
    Signed-off-by: Al Viro

    Al Viro
     

26 Jul, 2011

1 commit

  • Copy __generic_file_splice_read() and generic_file_splice_read() from
    fs/splice.c to shmem_file_splice_read() in mm/shmem.c. Make
    page_cache_pipe_buf_ops and spd_release_page() accessible to it.

    Signed-off-by: Hugh Dickins
    Cc: Jens Axboe
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     

24 May, 2011

1 commit


14 Jan, 2011

1 commit

  • * 'for-2.6.38/core' of git://git.kernel.dk/linux-2.6-block: (43 commits)
    block: ensure that completion error gets properly traced
    blktrace: add missing probe argument to block_bio_complete
    block cfq: don't use atomic_t for cfq_group
    block cfq: don't use atomic_t for cfq_queue
    block: trace event block fix unassigned field
    block: add internal hd part table references
    block: fix accounting bug on cross partition merges
    kref: add kref_test_and_get
    bio-integrity: mark kintegrityd_wq highpri and CPU intensive
    block: make kblockd_workqueue smarter
    Revert "sd: implement sd_check_events()"
    block: Clean up exit_io_context() source code.
    Fix compile warnings due to missing removal of a 'ret' variable
    fs/block: type signature of major_to_index(int) to major_to_index(unsigned)
    block: convert !IS_ERR(p) && p to !IS_ERR_NOR_NULL(p)
    cfq-iosched: don't check cfqg in choose_service_tree()
    fs/splice: Pull buf->ops->confirm() from splice_from_pipe actors
    cdrom: export cdrom_check_events()
    sd: implement sd_check_events()
    sr: implement sr_check_events()
    ...

    Linus Torvalds
     

17 Dec, 2010

1 commit

  • This patch pulls calls to buf->ops->confirm() from all actors passed
    (also indirectly) to splice_from_pipe_feed().

    Is avoiding the call to buf->ops->confirm() while splice()ing to
    /dev/null an intentional optimization? No other user does that
    and this will remove this special case.

    Against current linux.git 6313e3c21743cc88bb5bd8aa72948ee1e83937b6.

    Signed-off-by: Michał Mirosław
    Signed-off-by: Jens Axboe

    Michał Mirosław
     

29 Nov, 2010

2 commits

  • And in particular, use it in 'pipe_fcntl()'.

    The other pipe functions do not need to use the 'careful' version, since
    they are only ever called for things that are already known to be pipes.

    The normal read/write/ioctl functions are called through the file
    operations structures, so if a file isn't a pipe, they'd never get
    called. But pipe_fcntl() is special, and called directly from the
    generic fcntl code, and needs to use the same careful function that the
    splice code is using.

    Cc: Jens Axboe
    Cc: Andrew Morton
    Cc: Al Viro
    Cc: Dave Jones
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     
  • .. and change it to take the 'file' pointer instead of an inode, since
    that's what all users want anyway.

    The renaming is preparatory to exporting it to other users. The old
    'pipe_info()' name was too generic and is already used elsewhere, so
    before making the function public we need to use a more specific name.

    Cc: Jens Axboe
    Cc: Andrew Morton
    Cc: Al Viro
    Cc: Dave Jones
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     

08 Aug, 2010

2 commits

  • SPLICE_F_NONBLOCK is clearly documented to only affect blocking on the
    pipe. In __generic_file_splice_read(), however, it causes an EAGAIN
    if the page is currently being read.

    This makes it impossible to write an application that only wants
    failure if the pipe is full. For example if the same process is
    handling both ends of a pipe and isn't otherwise able to determine
    whether a splice to the pipe will fill it or not.

    We could make the read non-blocking on O_NONBLOCK or some other splice
    flag, but for now this is the simplest fix.

    Signed-off-by: Miklos Szeredi
    CC: stable@kernel.org
    Signed-off-by: Jens Axboe

    Miklos Szeredi
     
  • No real bugs I believe, just some dead code, and some
    shut up code.

    Signed-off-by: Andi Kleen
    Cc: Eric Paris
    Signed-off-by: Andrew Morton
    Signed-off-by: Jens Axboe

    Andi Kleen
     

30 Jun, 2010

2 commits

  • check f_mode for seekable file

    A seekable file is allowed without a llseek function, so the old way
    doesn't work any more.

    Signed-off-by: Changli Gao
    Signed-off-by: Miklos Szeredi
    ----
    fs/splice.c | 6 ++----
    1 file changed, 2 insertions(+), 4 deletions(-)
    Signed-off-by: Jens Axboe

    Changli Gao
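
A minimal userspace sketch of checking a mode bit instead of inspecting the llseek method. The names are hypothetical, and which f_mode bit the real code tests is an assumption here (`MODEL_FMODE_PREAD` is an invented stand-in). The model rejects an explicit offset with ESPIPE when the file is not flagged as supporting positional access, whether or not it has a llseek function.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Invented stand-in for a "positional access allowed" mode bit. */
enum { MODEL_FMODE_PREAD = 1 << 0 };

struct model_file {
    unsigned int f_mode;
    /* May legitimately be NULL even for a seekable file. */
    long (*llseek)(struct model_file *file, long offset, int whence);
    long f_pos;
};

/*
 * Choose the offset to read from.  user_off is NULL when the caller
 * did not supply one, in which case the file position is used.
 * Returns 0 on success or -ESPIPE if an offset was given for a file
 * that does not allow positional access.
 */
static int model_pick_offset(const struct model_file *in,
                             const long *user_off, long *off)
{
    assert(in != NULL && off != NULL);

    if (user_off) {
        /* Decide from f_mode only; the llseek pointer is not consulted. */
        if (!(in->f_mode & MODEL_FMODE_PREAD))
            return -ESPIPE;
        *off = *user_off;
    } else {
        *off = in->f_pos;
    }
    return 0;
}
```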
     
  • direct_splice_actor() shouldn't use sd->pos, as sd->pos is for file reading,
    file->f_pos should be used instead.

    Signed-off-by: Changli Gao
    Signed-off-by: Miklos Szeredi
    ----
    fs/splice.c | 3 ++-
    1 file changed, 2 insertions(+), 1 deletion(-)
    Signed-off-by: Jens Axboe

    Changli Gao
     

25 May, 2010

1 commit

  • mapping_gfp_mask() is not supposed to store allocation context details,
    only page location details. So mapping_gfp_mask should be applied to the
    pagecache page allocation, whereas normal (kernel mapped) memory should be
    used for surrounding allocations such as radix-tree nodes allocated by
    add_to_page_cache. Context modifiers should be applied on a per-callsite
    basis.

    So change splice to follow this convention (which is followed in similar
    code patterns in core code).

    Signed-off-by: Nick Piggin
    Signed-off-by: Andrew Morton
    Signed-off-by: Jens Axboe

    Nick Piggin
     

22 May, 2010

1 commit


30 Mar, 2010

1 commit

  • …it slab.h inclusion from percpu.h

    percpu.h is included by sched.h and module.h and thus ends up being
    included when building most .c files. percpu.h includes slab.h which
    in turn includes gfp.h making everything defined by the two files
    universally available and complicating inclusion dependencies.

    percpu.h -> slab.h dependency is about to be removed. Prepare for
    this change by updating users of gfp and slab facilities to include
    those headers directly instead of assuming availability. As this
    conversion needs to touch a large number of source files, the following
    script is used as the basis of conversion.

    http://userweb.kernel.org/~tj/misc/slabh-sweep.py

    The script does the following.

    * Scan files for gfp and slab usages and update includes such that
    only the necessary includes are there. ie. if only gfp is used,
    gfp.h, if slab is used, slab.h.

    * When the script inserts a new include, it looks at the include
    blocks and tries to put the new include such that its order conforms
    to its surroundings. It's put in the include block which contains
    core kernel includes, in the same order that the rest are ordered -
    alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
    doesn't seem to be any matching order.

    * If the script can't find a place to put a new include (mostly
    because the file doesn't have fitting include block), it prints out
    an error message indicating which .h file needs to be added to the
    file.

    The conversion was done in the following steps.

    1. The initial automatic conversion of all .c files updated slightly
    over 4000 files, deleting around 700 includes and adding ~480 gfp.h
    and ~3000 slab.h inclusions. The script emitted errors for ~400
    files.

    2. Each error was manually checked. Some didn't need the inclusion,
    some needed manual addition while adding it to implementation .h or
    embedding .c file was more appropriate for others. This step added
    inclusions to around 150 files.

    3. The script was run again and the output was compared to the edits
    from #2 to make sure no file was left behind.

    4. Several build tests were done and a couple of problems were fixed.
    e.g. lib/decompress_*.c used malloc/free() wrappers around slab
    APIs requiring slab.h to be added manually.

    5. The script was run on all .h files but without automatically
    editing them as sprinkling gfp.h and slab.h inclusions around .h
    files could easily lead to inclusion dependency hell. Most gfp.h
    inclusion directives were ignored as stuff from gfp.h was usually
    widely available and often used in preprocessor macros. Each
    slab.h inclusion directive was examined and added manually as
    necessary.

    6. percpu.h was updated not to include slab.h.

    7. Build tests were done on the following configurations and failures
    were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my
    distributed build env didn't work with gcov compiles) and a few
    more options had to be turned off depending on archs to make things
    build (like ipr on powerpc/64 which failed due to missing writeq).

    * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
    * powerpc and powerpc64 SMP allmodconfig
    * sparc and sparc64 SMP allmodconfig
    * ia64 SMP allmodconfig
    * s390 SMP allmodconfig
    * alpha SMP allmodconfig
    * um on x86_64 SMP allmodconfig

    8. percpu.h modifications were reverted so that it could be applied as
    a separate patch and serve as bisection point.

    Given the fact that I had only a couple of failures from tests on step
    6, I'm fairly confident about the coverage of this conversion patch.
    If there is a breakage, it's likely to be something in one of the arch
    headers which should be easily discoverable on most builds of
    the specific arch.

    Signed-off-by: Tejun Heo <tj@kernel.org>
    Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
    Cc: Ingo Molnar <mingo@redhat.com>
    Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>

    Tejun Heo
     

04 Nov, 2009

1 commit

  • sendfile(2) was reworked with the splice infrastructure, but it still
    wrongly checks f_op.sendpage() instead of f_op.splice_write(). Although
    f_op.splice_write() currently always exists whenever f_op.sendpage()
    does, that assumption may silently break in the future. This
    patch also brings a side effect: sendfile(2) can work with any output
    file. Some security checks related to f_op are added too.

    Signed-off-by: Changli Gao
    Signed-off-by: Jens Axboe

    Changli Gao
     

15 Sep, 2009

1 commit

  • * 'for-2.6.32' of git://git.kernel.dk/linux-2.6-block: (29 commits)
    block: use blkdev_issue_discard in blk_ioctl_discard
    Make DISCARD_BARRIER and DISCARD_NOBARRIER writes instead of reads
    block: don't assume device has a request list backing in nr_requests store
    block: Optimal I/O limit wrapper
    cfq: choose a new next_req when a request is dispatched
    Seperate read and write statistics of in_flight requests
    aoe: end barrier bios with EOPNOTSUPP
    block: trace bio queueing trial only when it occurs
    block: enable rq CPU completion affinity by default
    cfq: fix the log message after dispatched a request
    block: use printk_once
    cciss: memory leak in cciss_init_one()
    splice: update mtime and atime on files
    block: make blk_iopoll_prep_sched() follow normal 0/1 return convention
    cfq-iosched: get rid of must_alloc flag
    block: use interrupts disabled version of raise_softirq_irqoff()
    block: fix comment in blk-iopoll.c
    block: adjust default budget for blk-iopoll
    block: fix long lines in block/blk-iopoll.c
    block: add blk-iopoll, a NAPI like approach for block devices
    ...

    Linus Torvalds