04 Oct, 2010

1 commit


07 Sep, 2010

2 commits

  • Sparse doesn't understand lock annotations of the form
    __releases(&foo->lock). Change them to __releases(foo->lock). Same
    for __acquires().

    Signed-off-by: Miklos Szeredi

    Miklos Szeredi
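
A minimal user-space sketch of the annotation style this commit settles on. The macro definitions follow the common kernel pattern of expanding to nothing unless sparse (which defines `__CHECKER__`) is running. `struct foo`, its toy integer lock, and the helper names are hypothetical and only stand in for a real spinlock.

```c
#include <assert.h>

#ifdef __CHECKER__                     /* defined only when sparse runs */
# define __acquires(x) __attribute__((context(x, 0, 1)))
# define __releases(x) __attribute__((context(x, 1, 0)))
# define __acquire(x)  __context__(x, 1)
# define __release(x)  __context__(x, -1)
#else                                  /* a normal compiler sees nothing */
# define __acquires(x)
# define __releases(x)
# define __acquire(x)  (void)0
# define __release(x)  (void)0
#endif

/* Hypothetical structure; the int is a toy stand-in for a spinlock. */
struct foo {
    int lock;
    int value;
};

/* Annotate with the lock expression itself, not with its address. */
static void foo_lock(struct foo *foo)
    __acquires(foo->lock)
{
    assert(foo->lock == 0);
    foo->lock = 1;
    __acquire(foo->lock);
}

static void foo_unlock(struct foo *foo)
    __releases(foo->lock)
{
    assert(foo->lock == 1);
    __release(foo->lock);
    foo->lock = 0;
}

/* Increment the counter under the lock and return the new value. */
static int foo_bump(struct foo *foo)
{
    foo_lock(foo);
    int v = ++foo->value;
    foo_unlock(foo);
    return v;
}
```

The annotations change nothing at run time; they only give the checker a lock expression it can match between the acquire and release sides.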
     
  • David Bartley reported that fuse can hang in fuse_get_req_nofail() when
    the connection to the filesystem server is no longer active.

    If bg_queue is not empty then flush_bg_queue() called from
    request_end() can put more requests on to the pending queue. If this
    happens while ending requests on the processing queue then those
    background requests will be queued to the pending list and never
    ended.

    Another problem is that fuse_dev_release() didn't wake up processes
    sleeping on blocked_waitq.

    Solve this by:

    a) flushing the background queue before calling end_requests() on the
    pending and processing queues

    b) setting blocked = 0 and waking up processes waiting on
    blocked_waitq

    Thanks to David for an excellent bug report.

    Reported-by: David Bartley
    Signed-off-by: Miklos Szeredi
    CC: stable@kernel.org

    Miklos Szeredi
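
The ordering problem described in this commit can be illustrated with a toy model in which every list is just a counter. This is a sketch, not the kernel code: `struct conn`, `release_old()` and `release_fixed()` are hypothetical names, all requests are treated as background requests, and lifting `max_background` before the flush is an assumption of the model.

```c
#include <assert.h>
#include <limits.h>

/* Toy model: each list is just a counter of background requests. */
struct conn {
    unsigned bg_queue;          /* waiting for a background slot */
    unsigned pending;           /* queued for the server to read */
    unsigned processing;        /* read by the server, not yet answered */
    unsigned active_background; /* requests counted against the limit */
    unsigned max_background;
};

static void flush_bg_queue(struct conn *c)
{
    while (c->bg_queue > 0 && c->active_background < c->max_background) {
        c->bg_queue--;
        c->active_background++;
        c->pending++;
    }
}

/* Ending a request frees a slot, which may pull more work onto pending. */
static void end_requests(struct conn *c, unsigned *list)
{
    while (*list > 0) {
        assert(c->active_background > 0);
        (*list)--;
        c->active_background--;
        flush_bg_queue(c);
    }
}

static unsigned leftover(const struct conn *c)
{
    return c->bg_queue + c->pending + c->processing;
}

/* Old order: requests flushed while ending "processing" are stranded. */
static unsigned release_old(struct conn c)
{
    end_requests(&c, &c.pending);
    end_requests(&c, &c.processing);
    return leftover(&c);
}

/* Fixed order: drain the background queue first, then end both lists. */
static unsigned release_fixed(struct conn c)
{
    c.max_background = UINT_MAX;
    flush_bg_queue(&c);
    end_requests(&c, &c.pending);
    end_requests(&c, &c.processing);
    return leftover(&c);
}
```

With two queued and two in-flight background requests and a limit of two, the old order leaves two requests on the pending list, while the fixed order ends all four.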
     

11 Aug, 2010

1 commit

  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: (96 commits)
    no need for list_for_each_entry_safe()/resetting with superblock list
    Fix sget() race with failing mount
    vfs: don't hold s_umount over close_bdev_exclusive() call
    sysv: do not mark superblock dirty on remount
    sysv: do not mark superblock dirty on mount
    btrfs: remove junk sb_dirt change
    BFS: clean up the superblock usage
    AFFS: wait for sb synchronization when needed
    AFFS: clean up dirty flag usage
    cifs: truncate fallout
    mbcache: fix shrinker function return value
    mbcache: Remove unused features
    add f_flags to struct statfs(64)
    pass a struct path to vfs_statfs
    update VFS documentation for method changes.
    All filesystems that need invalidate_inode_buffers() are doing that explicitly
    convert remaining ->clear_inode() to ->evict_inode()
    Make ->drop_inode() just return whether inode needs to be dropped
    fs/inode.c:clear_inode() is gone
    fs/inode.c:evict() doesn't care about delete vs. non-delete paths now
    ...

    Fix up trivial conflicts in fs/nilfs2/super.c

    Linus Torvalds
     

10 Aug, 2010

3 commits

  • Signed-off-by: Al Viro

    Al Viro
     
  • Make sure we check the truncate constraints early on in ->setattr by adding
    those checks to inode_change_ok. Also clean up and document inode_change_ok
    to make this obvious.

    As a fallout we don't have to call inode_newsize_ok from simple_setsize and
    simplify it down to a truncate_setsize which doesn't return an error. This
    simplifies a lot of setattr implementations and means we use truncate_setsize
    almost everywhere. Get rid of fat_setsize now that it's trivial and mark
    ext2_setsize static to make the calling convention obvious.

    Keep the inode_newsize_ok in vmtruncate for now as all callers need an
    audit for its removal anyway.

    Note: setattr code in ecryptfs doesn't call inode_change_ok at all and
    needs a deeper audit, but that is left for later.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Al Viro

    Christoph Hellwig
     
  • Make sure we call inode_change_ok before doing any changes in ->setattr,
    and make sure to call it even if our fs wants to ignore normal UNIX
    permissions, but use the ATTR_FORCE to skip those.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Al Viro

    Christoph Hellwig
     

08 Aug, 2010

1 commit


02 Aug, 2010

1 commit

  • Currently MAY_ACCESS means that filesystems must check the permissions
    right then and not rely on cached results or the results of future
    operations on the object. This can be because of a call to sys_access() or
    because of a call to chdir() which needs to check search without relying on
    any future operations inside that dir. I plan to use MAY_ACCESS for other
    purposes in the security system, so I split the MAY_ACCESS and the
    MAY_CHDIR cases.

    Signed-off-by: Eric Paris
    Acked-by: Stephen D. Smalley
    Signed-off-by: James Morris

    Eric Paris
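
A small sketch of how a filesystem might treat the two flags after the split. The bit values are illustrative assumptions, and `must_check_now()` is a hypothetical helper, not a kernel function.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative bit values; treat them as assumptions, not kernel ABI. */
#define MAY_EXEC   0x01
#define MAY_WRITE  0x02
#define MAY_READ   0x04
#define MAY_ACCESS 0x10   /* request came from access(2) */
#define MAY_CHDIR  0x40   /* request came from chdir(2) */

/* True when the filesystem must check now rather than trust a cache. */
static bool must_check_now(int mask)
{
    assert(mask >= 0);
    return (mask & (MAY_ACCESS | MAY_CHDIR)) != 0;
}
```

Keeping the two cases on separate bits lets a caller that only cares about access(2) test `MAY_ACCESS` alone, while a filesystem that needs the old behaviour tests both.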
     

12 Jul, 2010

3 commits

  • Userspace filesystem can request data to be retrieved from the inode's
    mapping. This request is synchronous and the retrieved data is queued
    as a new request. If the write to the fuse device returns an error
    then the retrieve request was not completed and a reply will not be
    sent.

    Only present pages are returned in the retrieve reply. Retrieving
    stops when it finds a non-present page and only data prior to that is
    returned.

    This request doesn't change the dirty state of pages.

    Signed-off-by: Miklos Szeredi

    Miklos Szeredi
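
The "stop at the first non-present page" rule can be sketched as plain arithmetic. `retrievable_bytes()` and the `present` array are hypothetical; the model ignores any file-size or maximum-message-size clamping the real implementation may apply.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Return how many bytes starting at 'offset' could be returned when at
 * most 'size' bytes are requested and present[i] says whether page i is
 * in the mapping. Counting stops at the first page that is absent.
 */
static size_t retrievable_bytes(const bool *present, size_t npages,
                                size_t page_size, size_t offset, size_t size)
{
    assert(present != NULL && page_size > 0);

    size_t total = 0;
    size_t index = offset / page_size;    /* first page touched */
    size_t in_page = offset % page_size;  /* offset within that page */

    while (size > 0 && index < npages && present[index]) {
        size_t chunk = page_size - in_page;
        if (chunk > size)
            chunk = size;
        total += chunk;
        size -= chunk;
        in_page = 0;
        index++;
    }
    return total;
}
```

For example, with pages 0 and 1 present and page 2 absent, a request starting at byte 100 yields the rest of page 0 plus all of page 1, and nothing from page 3 even though it is present.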
     
  • Userspace filesystem can request data to be stored in the inode's
    mapping. This request is synchronous and has no reply. If the write
    to the fuse device returns an error then the store request was not
    fully completed (but may have updated some pages).

    If the stored data overflows the current file size, then the size is
    extended, similarly to a write(2) on the filesystem.

    Pages which have been completely stored are marked uptodate.

    Signed-off-by: Miklos Szeredi

    Miklos Szeredi
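
A rough model of the bookkeeping this commit describes: the file size grows if the stored range runs past it, and only pages covered in full count as completely stored. `model_store()` and `struct store_result` are hypothetical names, and the sketch deliberately ignores any end-of-file special cases.

```c
#include <assert.h>
#include <stddef.h>

struct store_result {
    size_t new_size;        /* file size after the store */
    size_t first_full_page; /* index of the first fully covered page */
    size_t full_pages;      /* number of fully covered pages */
};

/* Model a store of 'len' bytes at 'offset' into a file of 'old_size'. */
static struct store_result model_store(size_t old_size, size_t offset,
                                       size_t len, size_t page_size)
{
    assert(page_size > 0);

    struct store_result r;
    size_t end = offset + len;
    size_t first = (offset + page_size - 1) / page_size; /* round up */
    size_t last = end / page_size;  /* one past the last full page */

    r.new_size = end > old_size ? end : old_size;
    r.first_full_page = first;
    r.full_pages = last > first ? last - first : 0;
    return r;
}
```

Storing three pages' worth of data at byte offset 100 touches four pages but fully covers only pages 1 and 2, so only those two would be candidates for the uptodate flag in this model.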
     
  • Don't use atomic kmap for mapping userspace buffers in device
    read/write/splice.

    This is necessary because the next patch (adding store notify)
    requires that caller of fuse_copy_page() may sleep between
    invocations. The simplest way to ensure this is to change the atomic
    kmaps to non-atomic ones.

    Thankfully architectures where kmap() is not a no-op are going out of
    fashion, so we can ignore the (probably negligible) performance impact
    of this change.

    Signed-off-by: Miklos Szeredi

    Miklos Szeredi
     

31 May, 2010

1 commit


28 May, 2010

1 commit


26 May, 2010

1 commit

  • This adds:
    alias: devname:
    to some common kernel modules, which will allow the on-demand loading
    of the kernel module when the device node is accessed.

    Ideally all these modules would be compiled-in, but distros seem so
    much in love with their modularization that we need to cover the common
    cases with this new facility. It will allow us to remove a bunch of pretty
    useless init scripts and modprobes from init scripts.

    The static device node aliases will be carried in the module itself. The
    program depmod will extract this information to a file in the module directory:
    $ cat /lib/modules/2.6.34-00650-g537b60d-dirty/modules.devname
    # Device nodes to trigger on-demand module loading.
    microcode cpu/microcode c10:184
    fuse fuse c10:229
    ppp_generic ppp c108:0
    tun net/tun c10:200
    dm_mod mapper/control c10:235

    Udev will pick up the depmod created file on startup and create all the
    static device nodes which the kernel modules specify, so that these modules
    get automatically loaded when the device node is accessed:
    $ /sbin/udevd --debug
    ...
    static_dev_create_from_modules: mknod '/dev/cpu/microcode' c10:184
    static_dev_create_from_modules: mknod '/dev/fuse' c10:229
    static_dev_create_from_modules: mknod '/dev/ppp' c108:0
    static_dev_create_from_modules: mknod '/dev/net/tun' c10:200
    static_dev_create_from_modules: mknod '/dev/mapper/control' c10:235
    udev_rules_apply_static_dev_perms: chmod '/dev/net/tun' 0666
    udev_rules_apply_static_dev_perms: chmod '/dev/fuse' 0666

    A few device nodes are switched to statically allocated numbers, to allow
    the static nodes to work. This might also be useful for systems which still run
    a plain static /dev, which is completely unsafe to use with any dynamic minor
    numbers.

    Note:
    The devname aliases must be limited to the *common* and *single*instance*
    device nodes, like the misc devices, and never be used for conceptually limited
    systems like the loop devices, which should rather get fixed properly and get a
    control node for losetup to talk to, instead of creating a random number of
    device nodes in advance, regardless if they are ever used.

    This facility is to hide the mess distros are creating with overly modularized
    kernels, and just to hide that these modules are not compiled-in, and not to
    paper-over broken concepts. Thanks! :)

    Cc: Greg Kroah-Hartman
    Cc: David S. Miller
    Cc: Miklos Szeredi
    Cc: Chris Mason
    Cc: Alasdair G Kergon
    Cc: Tigran Aivazian
    Cc: Ian Kent
    Signed-Off-By: Kay Sievers
    Signed-off-by: Greg Kroah-Hartman

    Kay Sievers
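
The modules.devname lines quoted in this commit have a simple shape: module name, device path relative to /dev, then a type letter followed by major:minor. A minimal parsing sketch, assuming exactly that layout; `struct devname_entry` and `parse_devname_line()` are hypothetical names, not part of depmod or udev.

```c
#include <assert.h>
#include <stdio.h>

struct devname_entry {
    char module[64];   /* kernel module to load on demand */
    char path[128];    /* device node path relative to /dev */
    char type;         /* 'c' for character, 'b' for block */
    unsigned major;
    unsigned minor;
};

/*
 * Parse one line such as "fuse fuse c10:229".
 * Returns 1 on success, 0 for comments and blank lines, -1 on bad input.
 */
static int parse_devname_line(const char *line, struct devname_entry *e)
{
    assert(line != NULL && e != NULL);

    if (line[0] == '#' || line[0] == '\n' || line[0] == '\0')
        return 0;

    int n = sscanf(line, "%63s %127s %c%u:%u",
                   e->module, e->path, &e->type, &e->major, &e->minor);
    return n == 5 ? 1 : -1;
}
```

A consumer in the style of the udev log above would then create /dev/<path> with the parsed type and numbers for every line that returns 1.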
     

25 May, 2010

6 commits

  • Allow userspace filesystem implementation to use splice() to read from
    the fuse device.

    The userspace filesystem can now transfer data coming from a WRITE
    request to an arbitrary file descriptor (regular file, block device or
    socket) without having to go through a userspace buffer.

    The semantics of using splice() to read messages are:

    1) with a single splice() call move the whole message from the fuse
    device to a temporary pipe
    2) read the header from the pipe and determine the message type
    3a) if message is a WRITE then splice data from pipe to destination
    3b) else read rest of message to userspace buffer

    Signed-off-by: Miklos Szeredi

    Miklos Szeredi
     
  • When splicing buffers to the fuse device with SPLICE_F_MOVE, try to
    move pages from the pipe buffer into the page cache. This allows
    populating the fuse filesystem's cache without ever touching the page
    contents, i.e. zero copy read capability.

    The following steps are performed when trying to move a page into the
    page cache:

    - buf->ops->confirm() to make sure the new page is uptodate
    - buf->ops->steal() to try to remove the new page from its previous place
    - remove_from_page_cache() on the old page
    - add_to_page_cache_locked() on the new page

    If any of the above steps fail (non fatally) then the code falls back
    to copying the page. In particular ->steal() will fail if there are
    external references (other than the page cache and the pipe buffer) to
    the page.

    Also since the remove_from_page_cache() + add_to_page_cache_locked()
    are non-atomic it is possible that the page cache is repopulated in
    between the two and add_to_page_cache_locked() will fail. This could
    be fixed by creating a new atomic replace_page_cache_page() function.

    fuse_readpages_end() needed to be reworked so it works even if
    page->mapping is NULL for some or all pages which can happen if the
    add_to_page_cache_locked() failed.

    A number of sanity checks were added to make sure the stolen pages
    don't have weird flags set, etc... These could be moved into generic
    splice/steal code.

    Signed-off-by: Miklos Szeredi

    Miklos Szeredi
     
  • Allow userspace filesystem implementation to use splice() to write to
    the fuse device. The semantics of using splice() are:

    1) buffer the message header and data in a temporary pipe
    2) with a *single* splice() call move the message from the temporary pipe
    to the fuse device

    The READ reply message has the most interesting use for this, since
    now the data from an arbitrary file descriptor (which could be a
    regular file, a block device or a socket) can be transferred into the
    fuse device without having to go through a userspace buffer. It will
    also allow zero copy moving of pages.

    One caveat is that the protocol on the fuse device requires the length
    of the whole message to be written into the header. But the length of
    the data transferred into the temporary pipe may not be known in
    advance. The current library implementation works around this by
    using vmsplice to write the header and modifying the header after
    splicing the data into the pipe (error handling omitted):

    struct fuse_out_header out;

    iov.iov_base = &out;
    iov.iov_len = sizeof(struct fuse_out_header);
    vmsplice(pip[1], &iov, 1, 0);
    len = splice(input_fd, input_offset, pip[1], NULL, len, 0);
    /* retrospectively modify the header: */
    out.len = len + sizeof(struct fuse_out_header);
    splice(pip[0], NULL, fuse_chan_fd(req->ch), NULL, out.len, flags);

    This works since vmsplice only saves a pointer to the data, it does
    not copy the data itself.

    Since pipes are currently limited to 16 pages and messages need to be
    spliced atomically, the length of the data is limited to 15 pages (or
    60kB for 4k pages).

    Signed-off-by: Miklos Szeredi

    Miklos Szeredi
     
  • Acquire a page ref on pages in ->readpages() and release them when the
    read has finished. Not acquiring a reference didn't seem to cause any
    trouble since the page is locked and will not be kicked out of the
    page cache during the read.

    However the following patches will want to remove the page from the
    cache so a separate ref is needed. Making the reference in req->pages
    explicit also makes the code easier to understand.

    Signed-off-by: Miklos Szeredi

    Miklos Szeredi
     
  • Replace uses of get_user_pages() with get_user_pages_fast(). It looks
    nicer and should be faster in most cases.

    Signed-off-by: Miklos Szeredi

    Miklos Szeredi
     
  • "map" isn't needed any more after: 0bd87182d3ab18 "fuse: fix kunmap in
    fuse_ioctl_copy_user"

    Signed-off-by: Dan Carpenter
    Signed-off-by: Miklos Szeredi

    Dan Carpenter
     

30 Mar, 2010

1 commit

  • …it slab.h inclusion from percpu.h

    percpu.h is included by sched.h and module.h and thus ends up being
    included when building most .c files. percpu.h includes slab.h which
    in turn includes gfp.h making everything defined by the two files
    universally available and complicating inclusion dependencies.

    percpu.h -> slab.h dependency is about to be removed. Prepare for
    this change by updating users of gfp and slab facilities to include those
    headers directly instead of assuming availability. As this conversion
    needs to touch a large number of source files, the following script is
    used as the basis of conversion.

    http://userweb.kernel.org/~tj/misc/slabh-sweep.py

    The script does the following.

    * Scan files for gfp and slab usages and update includes such that
    only the necessary includes are there. ie. if only gfp is used,
    gfp.h, if slab is used, slab.h.

    * When the script inserts a new include, it looks at the include
    blocks and tries to put the new include such that its order conforms
    to its surroundings. It's put in the include block which contains
    core kernel includes, in the same order that the rest are ordered -
    alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
    doesn't seem to be any matching order.

    * If the script can't find a place to put a new include (mostly
    because the file doesn't have a fitting include block), it prints out
    an error message indicating which .h file needs to be added to the
    file.

    The conversion was done in the following steps.

    1. The initial automatic conversion of all .c files updated slightly
    over 4000 files, deleting around 700 includes and adding ~480 gfp.h
    and ~3000 slab.h inclusions. The script emitted errors for ~400
    files.

    2. Each error was manually checked. Some didn't need the inclusion,
    some needed manual addition while adding it to implementation .h or
    embedding .c file was more appropriate for others. This step added
    inclusions to around 150 files.

    3. The script was run again and the output was compared to the edits
    from #2 to make sure no file was left behind.

    4. Several build tests were done and a couple of problems were fixed.
    e.g. lib/decompress_*.c used malloc/free() wrappers around slab
    APIs requiring slab.h to be added manually.

    5. The script was run on all .h files but without automatically
    editing them as sprinkling gfp.h and slab.h inclusions around .h
    files could easily lead to inclusion dependency hell. Most gfp.h
    inclusion directives were ignored as stuff from gfp.h was usually
    widely available and often used in preprocessor macros. Each
    slab.h inclusion directive was examined and added manually as
    necessary.

    6. percpu.h was updated not to include slab.h.

    7. Build tests were done on the following configurations and failures
    were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my
    distributed build env didn't work with gcov compiles) and a few
    more options had to be turned off depending on archs to make things
    build (like ipr on powerpc/64 which failed due to missing writeq).

    * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
    * powerpc and powerpc64 SMP allmodconfig
    * sparc and sparc64 SMP allmodconfig
    * ia64 SMP allmodconfig
    * s390 SMP allmodconfig
    * alpha SMP allmodconfig
    * um on x86_64 SMP allmodconfig

    8. percpu.h modifications were reverted so that it could be applied as
    a separate patch and serve as bisection point.

    Given the fact that I had only a couple of failures from tests on step
    6, I'm fairly confident about the coverage of this conversion patch.
    If there is a breakage, it's likely to be something in one of the arch
    headers which should be easily discoverable on most builds of
    the specific arch.

    Signed-off-by: Tejun Heo <tj@kernel.org>
    Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
    Cc: Ingo Molnar <mingo@redhat.com>
    Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>

    Tejun Heo
     

08 Mar, 2010

1 commit


04 Mar, 2010

1 commit


09 Feb, 2010

1 commit

  • In particular, several occurrences of funny versions of 'success',
    'unknown', 'therefore', 'acknowledge', 'argument', 'achieve', 'address',
    'beginning', 'desirable', 'separate' and 'necessary' are fixed.

    Signed-off-by: Daniel Mack
    Cc: Joe Perches
    Cc: Junio C Hamano
    Signed-off-by: Jiri Kosina

    Daniel Mack
     

05 Feb, 2010

2 commits

  • gcc 4.4 warns about:
    fs/fuse/dev.c: In function ‘fuse_notify_inval_entry’:
    fs/fuse/dev.c:925: warning: the frame size of 1060 bytes is larger than 1024 bytes

    The problem is we declare two structures and a large array on the stack.
    I move the array away from the stack and allocate memory for it dynamically.

    Signed-off-by: Fang Wenqi
    Signed-off-by: Miklos Szeredi

    Fang Wenqi
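
The general shape of this fix, moving a large buffer off the stack and allocating it on demand, can be sketched in user space as follows. `NAME_BUF_MAX` and `copy_name()` are hypothetical, and `malloc()` stands in for whatever allocator the kernel code uses.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define NAME_BUF_MAX 1024   /* hypothetical upper bound on a name */

/*
 * Copy 'len' bytes of 'src' into a freshly allocated, NUL-terminated
 * buffer instead of a large on-stack array. The caller frees the result.
 * Returns NULL if the name is too long or allocation fails.
 */
static char *copy_name(const char *src, size_t len)
{
    assert(src != NULL);

    if (len > NAME_BUF_MAX)
        return NULL;

    char *buf = malloc(NAME_BUF_MAX + 1);
    if (buf == NULL)
        return NULL;

    memcpy(buf, src, len);
    buf[len] = '\0';
    return buf;
}
```

The trade-off is one extra allocation and an extra failure path in exchange for a stack frame that stays well under the warning threshold.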
     
  • Small cleanup in fuse_notify_inval_inode() and
    fuse_notify_inval_entry().

    Signed-off-by: Miklos Szeredi

    Miklos Szeredi
     

03 Feb, 2010

1 commit

  • The cache alias problem occurs if changes made through the user's shared
    mapping are not flushed before copying: the user and kernel mappings may
    then land in two different cache lines, and coherence cannot be guaranteed
    after iov_iter_copy_from_user_atomic. So the right steps should be:

    flush_dcache_page(page);
    kmap_atomic(page);
    write to page;
    kunmap_atomic(page);
    flush_dcache_page(page);

    More precisely, we might create two new APIs flush_dcache_user_page and
    flush_dcache_kern_page to replace the two flush_dcache_page accordingly.

    Here is a snippet tested on omap2430 with VIPT cache, and I think it is
    not ARM-specific:

    int val = 0x11111111;
    fd = open("abc", O_RDWR);
    addr = mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);
    *(addr+0) = 0x44444444;
    tmp = *(addr+0);
    *(addr+1) = 0x77777777;
    write(fd, &val, sizeof(int));
    close(fd);

    The results are not always 0x11111111 0x77777777 at the beginning as expected. Sometimes we see 0x44444444 0x77777777.

    Signed-off-by: Anfei
    Cc: Russell King
    Cc: Miklos Szeredi
    Cc: Nick Piggin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    anfei zhou
     

27 Nov, 2009

1 commit

  • The comment in fuse_open about O_DIRECT:

    "VFS checks this, but only _after_ ->open()"

    also holds for fuse_create, however, the same kind of check was missing there.

    As an impact of this bug, open(newfile, O_RDWR|O_CREAT|O_DIRECT) fails, but a
    stub newfile will remain if the fuse server handled the implied FUSE_CREATE
    request appropriately.

    Other impact: in the above situation ima_file_free() will complain about an
    open/free imbalance if CONFIG_IMA is set.

    Signed-off-by: Csaba Henk
    Signed-off-by: Miklos Szeredi
    Cc: Harshavardhana
    Cc: stable@kernel.org

    Csaba Henk
     

04 Nov, 2009

3 commits


28 Sep, 2009

1 commit


24 Sep, 2009

1 commit

  • Update some fs code to make use of new helper functions introduced
    in the previous patch. Should be no significant change in behaviour
    (except CIFS now calls send_sig under i_lock, via inode_newsize_ok).

    Reviewed-by: Christoph Hellwig
    Acked-by: Miklos Szeredi
    Cc: linux-nfs@vger.kernel.org
    Cc: Trond.Myklebust@netapp.com
    Cc: linux-cifs-client@lists.samba.org
    Cc: sfrench@samba.org
    Signed-off-by: Nick Piggin
    Signed-off-by: Al Viro

    npiggin@suse.de
     

19 Sep, 2009

1 commit


16 Sep, 2009

4 commits

  • We do this automatically in get_sb_bdev() from the set_bdev_super()
    callback. Filesystems that have their own private backing_dev_info
    must assign that in ->fill_super().

    Note that ->s_bdi assignment is required for proper writeback!

    Acked-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Jens Axboe
     
  • Make the max_background and congestion_threshold parameters of a FUSE
    mount tunable at runtime by adding the respective knobs to its directory
    within the fusectl filesystem.

    Signed-off-by: Csaba Henk
    Signed-off-by: Miklos Szeredi

    Csaba Henk
     
  • An untrusted user could DoS the system if s/he were allowed to accumulate an
    arbitrary number of pending background requests by setting the above limits
    to extremely high values in INIT. This patch excludes this possibility by
    imposing global upper limits on the possible values of per-mount "max
    background requests" and "congestion threshold" parameters for unprivileged
    FUSE filesystems.

    These global limits are implemented as module parameters.

    Signed-off-by: Csaba Henk
    Signed-off-by: Miklos Szeredi

    Csaba Henk
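
The limiting rule in this commit amounts to a clamp that applies only to unprivileged mounts. A minimal sketch, with `clamp_user_limit()` and its parameter names being hypothetical:

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Return the value a mount may actually use: privileged mounts keep
 * what they asked for, unprivileged ones are capped at the global
 * maximum (which the commit implements as a module parameter).
 */
static unsigned clamp_user_limit(unsigned requested, unsigned global_max,
                                 bool privileged)
{
    assert(global_max > 0);

    if (!privileged && requested > global_max)
        return global_max;
    return requested;
}
```

The same helper would serve both the "max background requests" and the "congestion threshold" parameters, each with its own global cap.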
     
  • drop_nlink() is the API function to decrease the link count of an inode.
    However, in one place the control filesystem used the decrement operator
    on i_nlink directly. Fix this.

    Cc: Anand Avati
    Signed-off-by: Csaba Henk
    Signed-off-by: Miklos Szeredi

    Csaba Henk
     

11 Sep, 2009

1 commit