13 Mar, 2014

1 commit

  • Previously, the no-op "mount -o remount /dev/xxx" operation when the
    file system is already mounted read-write caused an implied,
    unconditional syncfs(). This seems pretty stupid, and it's certainly
    not documented or guaranteed to do this, nor is it particularly useful,
    except in the case where the file system was mounted rw and is getting
    remounted read-only.

    However, it's possible that there might be some file systems that are
    actually depending on this behavior. In most file systems, it's
    probably fine to only call sync_filesystem() when transitioning from
    read-write to read-only, and there are some file systems where this is
    not needed at all (for example, for a pseudo-filesystem or something
    like romfs).

    Signed-off-by: "Theodore Ts'o"
    Cc: linux-fsdevel@vger.kernel.org
    Cc: Christoph Hellwig
    Cc: Artem Bityutskiy
    Cc: Adrian Hunter
    Cc: Evgeniy Dushistov
    Cc: Jan Kara
    Cc: OGAWA Hirofumi
    Cc: Anders Larsen
    Cc: Phillip Lougher
    Cc: Kees Cook
    Cc: Mikulas Patocka
    Cc: Petr Vandrovec
    Cc: xfs@oss.sgi.com
    Cc: linux-btrfs@vger.kernel.org
    Cc: linux-cifs@vger.kernel.org
    Cc: samba-technical@lists.samba.org
    Cc: codalist@coda.cs.cmu.edu
    Cc: linux-ext4@vger.kernel.org
    Cc: linux-f2fs-devel@lists.sourceforge.net
    Cc: fuse-devel@lists.sourceforge.net
    Cc: cluster-devel@redhat.com
    Cc: linux-mtd@lists.infradead.org
    Cc: jfs-discussion@lists.sourceforge.net
    Cc: linux-nfs@vger.kernel.org
    Cc: linux-nilfs@vger.kernel.org
    Cc: linux-ntfs-dev@lists.sourceforge.net
    Cc: ocfs2-devel@oss.oracle.com
    Cc: reiserfs-devel@vger.kernel.org

    Theodore Ts'o
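
The policy the message above calls "probably fine" for most file systems, syncing only on a read-write to read-only transition, can be sketched in user-space C. This is a minimal model, not kernel code: `demo_sb`, `DEMO_RDONLY`, `demo_sync_filesystem()` and `demo_remount()` are hypothetical stand-ins for the superblock, the read-only flag, sync_filesystem() and a file system's remount handler.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for kernel state; not the real VFS types. */
#define DEMO_RDONLY 0x1UL        /* plays the role of a read-only mount flag */

struct demo_sb {
    unsigned long flags;         /* current mount flags */
    int sync_calls;              /* counts simulated sync_filesystem() calls */
};

/* Stand-in for sync_filesystem(): only records that it ran. */
void demo_sync_filesystem(struct demo_sb *sb)
{
    sb->sync_calls++;
}

/* True only for a read-write -> read-only transition. */
bool demo_going_readonly(const struct demo_sb *sb, unsigned long new_flags)
{
    return !(sb->flags & DEMO_RDONLY) && (new_flags & DEMO_RDONLY);
}

/* A remount handler that syncs only when the transition requires it. */
int demo_remount(struct demo_sb *sb, unsigned long new_flags)
{
    if (demo_going_readonly(sb, new_flags))
        demo_sync_filesystem(sb);
    sb->flags = new_flags;
    return 0;
}
```

With this shape, a no-op remount of a read-write file system performs no sync, while the first remount to read-only performs exactly one. A read-only pseudo-filesystem could skip the call entirely.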
     

24 Nov, 2013

1 commit


20 Nov, 2013

7 commits

  • Fix static checker complaint that stream is not checked in
    squashfs_decompressor_destroy().

    Reported-by: Dan Carpenter
    Signed-off-by: Phillip Lougher
    Reviewed-by: Minchan Kim

    Phillip Lougher
     
  • This introduces an implementation of squashfs_readpage_block()
    that directly decompresses into the page cache.

    This uses the previously added page handler abstraction to push
    down the necessary kmap_atomic/kunmap_atomic operations on the
    page cache buffers into the decompressors. This enables
    direct copying into the page cache without using the slow
    kmap/kunmap calls.

    The code detects when multiple threads are racing in
    squashfs_readpage() to decompress the same block, and avoids
    this regression by falling back to using an intermediate
    buffer.

    This patch enhances the performance of Squashfs significantly
    when multiple processes are accessing the filesystem simultaneously
    because it not only reduces memcopying, but it more importantly
    eliminates the lock contention on the intermediate buffer.

    Using single-thread decompression.

    dd if=file1 of=/dev/null bs=4096 &
    dd if=file2 of=/dev/null bs=4096 &
    dd if=file3 of=/dev/null bs=4096 &
    dd if=file4 of=/dev/null bs=4096

    Before:

    629145600 bytes (629 MB) copied, 45.8046 s, 13.7 MB/s

    After:

    629145600 bytes (629 MB) copied, 9.29414 s, 67.7 MB/s

    Signed-off-by: Phillip Lougher
    Reviewed-by: Minchan Kim

    Phillip Lougher
     
  • Restructure squashfs_readpage(), splitting it into separate
    functions for datablocks, fragments and sparse blocks.

    Move the memcpying (from squashfs cache entry) implementation of
    squashfs_readpage_block() into file_cache.c.

    This allows different implementations to be supported.

    Signed-off-by: Phillip Lougher
    Reviewed-by: Minchan Kim

    Phillip Lougher
     
  • Further generalise the decompressors by adding a page handler
    abstraction. This adds helpers to allow the decompressors
    to access and process the output buffers in an
    implementation-independent manner.

    This allows different types of output buffer to be passed
    to the decompressors, with the implementation specific
    aspects handled at decompression time, but without the
    knowledge being held in the decompressor wrapper code.

    This will allow the decompressors to handle Squashfs
    cache buffers, and page cache pages.

    This patch adds the abstraction and an implementation for
    the caches.

    Signed-off-by: Phillip Lougher
    Reviewed-by: Minchan Kim

    Phillip Lougher
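
The idea of a page handler abstraction can be illustrated with a small user-space sketch. It assumes a handler is a set of callbacks that hand out output chunks one at a time. `demo_actor`, its callbacks and `demo_decompress()` are hypothetical names, and the "decompressor" here just copies bytes.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical output-buffer handler; all names are illustrative only. */
struct demo_actor {
    void *(*first)(struct demo_actor *a);   /* start over, return first chunk */
    void *(*next)(struct demo_actor *a);    /* next chunk, or NULL at the end */
    void (*finish)(struct demo_actor *a);   /* release any mapping held */
    void **bufs;                            /* backing chunks */
    size_t count;                           /* number of chunks */
    size_t chunk;                           /* bytes per chunk */
    size_t pos;                             /* index of the next chunk */
};

/* Implementation for plain in-memory buffers (the "cache" case). */
static void *array_first(struct demo_actor *a)
{
    a->pos = 1;
    return a->count ? a->bufs[0] : NULL;
}

static void *array_next(struct demo_actor *a)
{
    return a->pos < a->count ? a->bufs[a->pos++] : NULL;
}

static void array_finish(struct demo_actor *a)
{
    (void)a;                                /* plain memory needs no unmapping */
}

void demo_actor_init(struct demo_actor *a, void **bufs, size_t count,
                     size_t chunk)
{
    a->first = array_first;
    a->next = array_next;
    a->finish = array_finish;
    a->bufs = bufs;
    a->count = count;
    a->chunk = chunk;
    a->pos = 0;
}

/*
 * Stand-in "decompressor": writes src through the handler without knowing
 * what kind of buffer sits behind it. Returns the number of bytes written.
 */
size_t demo_decompress(struct demo_actor *a, const char *src, size_t len)
{
    size_t done = 0;
    void *out = a->first(a);

    while (out && done < len) {
        size_t n = len - done < a->chunk ? len - done : a->chunk;

        memcpy(out, src + done, n);
        done += n;
        out = a->next(a);
    }
    a->finish(a);
    return done;
}
```

A second implementation of the same three callbacks could map and unmap page cache pages instead. The decompressor loop would not change, which is the point of the abstraction.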
     
  • Add a multi-threaded decompression implementation which uses
    percpu variables.

    Using percpu variables has advantages and disadvantages over
    implementations which do not use percpu variables.

    Advantages:
    * the nature of percpu variables ensures decompression is
    load-balanced across the multiple cores.
    * simplicity.

    Disadvantages: it limits decompression to one thread per core.

    Signed-off-by: Phillip Lougher

    Phillip Lougher
     
  • Squashfs currently uses only one stream buffer for decompression,
    which hurts parallel read performance. This patch adds support for
    multiple decompressors to improve parallel I/O performance.

    Test: four concurrent dd reads of 1G files on a KVM machine with 2 CPUs
    and 4G of memory.

    dd if=test/test1.dat of=/dev/null &
    dd if=test/test2.dat of=/dev/null &
    dd if=test/test3.dat of=/dev/null &
    dd if=test/test4.dat of=/dev/null &

    old : 1m39s -> new : 9s

    * From v1
    * Change comp_strm with decomp_strm - Phillip
    * Change/add comments - Phillip

    Signed-off-by: Minchan Kim
    Signed-off-by: Phillip Lougher

    Minchan Kim
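
A pool of decompressor streams can be sketched in user-space C with a mutex and a condition variable. This is a simplified model under stated assumptions, not the patch itself: the pool is a fixed-size array, and `demo_pool`, `demo_stream`, `demo_get_stream()`, `demo_put_stream()` and `DEMO_MAX_STREAMS` are hypothetical names.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define DEMO_MAX_STREAMS 4       /* assumed upper bound on parallel streams */

struct demo_stream {
    int id;                      /* stands in for per-stream decompressor state */
    int busy;                    /* non-zero while a reader holds the stream */
};

struct demo_pool {
    pthread_mutex_t lock;
    pthread_cond_t wait;         /* signalled when a stream is released */
    struct demo_stream streams[DEMO_MAX_STREAMS];
    int avail;                   /* number of idle streams */
};

void demo_pool_init(struct demo_pool *p)
{
    pthread_mutex_init(&p->lock, NULL);
    pthread_cond_init(&p->wait, NULL);
    for (int i = 0; i < DEMO_MAX_STREAMS; i++) {
        p->streams[i].id = i;
        p->streams[i].busy = 0;
    }
    p->avail = DEMO_MAX_STREAMS;
}

/* Take an idle stream, sleeping until one is released if all are busy. */
struct demo_stream *demo_get_stream(struct demo_pool *p)
{
    struct demo_stream *s = NULL;

    pthread_mutex_lock(&p->lock);
    while (p->avail == 0)
        pthread_cond_wait(&p->wait, &p->lock);
    for (int i = 0; i < DEMO_MAX_STREAMS; i++) {
        if (!p->streams[i].busy) {
            s = &p->streams[i];
            s->busy = 1;
            p->avail--;
            break;
        }
    }
    pthread_mutex_unlock(&p->lock);
    return s;
}

/* Return a stream to the pool and wake one waiting reader. */
void demo_put_stream(struct demo_pool *p, struct demo_stream *s)
{
    pthread_mutex_lock(&p->lock);
    assert(s && s->busy);
    s->busy = 0;
    p->avail++;
    pthread_cond_signal(&p->wait);
    pthread_mutex_unlock(&p->lock);
}
```

Each reader brackets its decompression with `demo_get_stream()` and `demo_put_stream()`, so readers only contend on the short critical sections rather than on one shared stream for the whole decompression.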
     
  • The decompressor interface and code was written from
    the point of view of single-threaded operation. In doing
    so it mixed a lot of single-threaded implementation specific
    aspects into the decompressor code and elsewhere which makes it
    difficult to seamlessly support multiple different decompressor
    implementations.

    This patch does the following:

    1. It removes compressor_options parsing from the decompressor
    init() function. This allows the decompressor init() function
    to be dynamically called to instantiate multiple decompressors,
    without the compressor options needing to be read and parsed each
    time.

    2. It moves threading and all sleeping operations out of the
    decompressors. In doing so, it makes the decompressors
    non-blocking wrappers which only deal with interfacing with
    the decompressor implementation.

    3. It splits decompressor.[ch] into decompressor generic functions
    in decompressor.[ch], and moves the single threaded
    decompressor implementation into decompressor_single.c.

    The result of this patch is Squashfs should now be able to
    support multiple decompressors by adding new decompressor_xxx.c
    files with specialised implementations of the functions in
    decompressor_single.c

    Signed-off-by: Phillip Lougher
    Reviewed-by: Minchan Kim

    Phillip Lougher
     

06 Sep, 2013

5 commits

  • We read the type field from disk. This value should be sanity
    checked for correctness to avoid an out of bounds access when
    reading the squashfs_filetype_table array.

    Signed-off-by: Phillip Lougher

    Phillip Lougher
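
The kind of check described above can be sketched as a bounds test before the table lookup. The table contents, `DEMO_MAX_TYPE` and `demo_lookup_type()` are assumptions made for illustration and are not copied from the Squashfs sources.

```c
#include <stddef.h>

/* Illustrative table; the real squashfs_filetype_table may differ. */
static const unsigned char demo_filetype_table[] = {
    0, 4, 8, 10, 6, 2, 1, 12
};

/* Largest valid index into the table. */
#define DEMO_MAX_TYPE \
    (sizeof(demo_filetype_table) / sizeof(demo_filetype_table[0]) - 1)

/*
 * Map an on-disk type value to a table entry.
 * Returns -1 when the value is out of range, which a caller would
 * treat as filesystem corruption instead of indexing past the array.
 */
int demo_lookup_type(unsigned int disk_type)
{
    if (disk_type > DEMO_MAX_TYPE)
        return -1;
    return demo_filetype_table[disk_type];
}
```

Because `disk_type` is unsigned, a single upper-bound comparison is enough to reject every out-of-range value.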
     
  • We read the size (of the name) field from disk. This value should
    be sanity checked for correctness to avoid blindly reading
    huge amounts of unnecessary data from disk on corruption.

    Note, here we're not actually reading the name into a buffer, but
    skipping it, and so corruption doesn't cause buffer overflow, merely
    large amounts of unnecessary data to be read.

    Signed-off-by: Phillip Lougher

    Phillip Lougher
     
  • The dir_count and size fields when read from disk are sanity
    checked for correctness. However, the sanity checks only check the
    values are not greater than expected. As dir_count and size were
    incorrectly defined as signed ints, this can lead to corrupted values
    appearing as negative which are not trapped.

    Signed-off-by: Phillip Lougher

    Phillip Lougher
     
  • The dir_count and size fields when read from disk are sanity
    checked for correctness. However, the sanity checks only check the
    values are not greater than expected. As dir_count and size were
    incorrectly defined as signed ints, this can lead to corrupted values
    appearing as negative which are not trapped.

    Signed-off-by: Phillip Lougher

    Phillip Lougher
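
Why an upper-bound check misses negative values can be shown with two nearly identical functions. This is a minimal sketch: `DEMO_MAX_COUNT` is an assumed limit and the function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define DEMO_MAX_COUNT 256       /* assumed limit, for illustration only */

/*
 * Buggy shape: the field is signed and only compared against an upper
 * bound, so a corrupted value that reads as negative is accepted.
 */
bool demo_check_signed(int32_t dir_count)
{
    return dir_count <= DEMO_MAX_COUNT;
}

/*
 * Fixed shape: the same comparison on an unsigned field. The same bit
 * pattern now reads as a very large number and is rejected.
 */
bool demo_check_unsigned(uint32_t dir_count)
{
    return dir_count <= DEMO_MAX_COUNT;
}
```

For example, the signed check accepts -16, while the unsigned check rejects 0xfffffff0, which is the same 32-bit pattern on a two's-complement machine.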
     
  • Patch "Squashfs: sanity check information from disk" from
    Dan Carpenter adds a missing check for corruption in the
    "size" field while reading the directory index from disk.

    However, it sets err to -EINVAL, and this value is not used later, so
    setting it is completely redundant. Remove it.

    Errors in reading the index are deliberately non-fatal. If we
    get an error in reading the index we just return the part of the
    index we have managed to read - the index isn't essential,
    just quicker.

    Signed-off-by: Phillip Lougher

    Phillip Lougher
     

05 Sep, 2013

1 commit


29 Aug, 2013

1 commit


29 Jun, 2013

1 commit


11 Mar, 2013

1 commit


23 Feb, 2013

1 commit


03 Oct, 2012

2 commits

  • Pull vfs update from Al Viro:

    "- big one - consolidation of descriptor-related logics; almost all of
    that is moved to fs/file.c

    (BTW, I'm seriously tempted to rename the result to fd.c. As it is,
    we have a situation when file_table.c is about handling of struct
    file and file.c is about handling of descriptor tables; the reasons
    are historical - file_table.c used to be about a static array of
    struct file we used to have way back).

    A lot of stray ends got cleaned up and converted to saner primitives,
    disgusting mess in android/binder.c is still disgusting, but at least
    doesn't poke so much in descriptor table guts anymore. A bunch of
    relatively minor races got fixed in process, plus an ext4 struct file
    leak.

    - related thing - fget_light() partially unuglified; see fdget() in
    there (and yes, it generates the code as good as we used to have).

    - also related - bits of Cyrill's procfs stuff that got entangled into
    that work; _not_ all of it, just the initial move to fs/proc/fd.c and
    switch of fdinfo to seq_file.

    - Alex's fs/coredump.c splitoff - the same story, had been easier to
    take that commit than mess with conflicts. The rest is a separate
    pile, this was just a mechanical code movement.

    - a few misc patches all over the place. Not all for this cycle,
    there'll be more (and quite a few currently sit in akpm's tree)."

    Fix up trivial conflicts in the android binder driver, and some fairly
    simple conflicts due to two different changes to the sock_alloc_file()
    interface ("take descriptor handling from sock_alloc_file() to callers"
    vs "net: Providing protocol type via system.sockprotoname xattr of
    /proc/PID/fd entries" adding a dentry name to the socket)

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (72 commits)
    MAX_LFS_FILESIZE should be a loff_t
    compat: fs: Generic compat_sys_sendfile implementation
    fs: push rcu_barrier() from deactivate_locked_super() to filesystems
    btrfs: reada_extent doesn't need kref for refcount
    coredump: move core dump functionality into its own file
    coredump: prevent double-free on an error path in core dumper
    usb/gadget: fix misannotations
    fcntl: fix misannotations
    ceph: don't abuse d_delete() on failure exits
    hypfs: ->d_parent is never NULL or negative
    vfs: delete surplus inode NULL check
    switch simple cases of fget_light to fdget
    new helpers: fdget()/fdput()
    switch o2hb_region_dev_write() to fget_light()
    proc_map_files_readdir(): don't bother with grabbing files
    make get_file() return its argument
    vhost_set_vring(): turn pollstart/pollstop into bool
    switch prctl_set_mm_exe_file() to fget_light()
    switch xfs_find_handle() to fget_light()
    switch xfs_swapext() to fget_light()
    ...

    Linus Torvalds
     
  • There's no reason to call rcu_barrier() on every
    deactivate_locked_super(). We only need to make sure that all delayed rcu
    free inodes are flushed before we destroy related cache.

    Removing rcu_barrier() from deactivate_locked_super() affects some fast
    paths. E.g. on my machine exit_group() of the last process in an IPC
    namespace takes 0.07538s. rcu_barrier() takes 0.05188s of that time.

    Signed-off-by: Kirill A. Shutemov
    Cc: Al Viro
    Signed-off-by: Andrew Morton
    Signed-off-by: Al Viro

    Kirill A. Shutemov
     

21 Sep, 2012

1 commit


14 Jul, 2012

1 commit

  • Just the flags; only NFS cares even about that, but there are
    legitimate uses for such argument. And getting rid of that
    completely would require splitting ->lookup() into a couple
    of methods (at least), so let's leave that alone for now...

    Signed-off-by: Al Viro

    Al Viro
     

29 Mar, 2012

1 commit

  • Pull squashfs updates from Phillip Lougher:
    "Add an extra mount time sanity check, plus some code cleanups and bug
    fixes."

    * tag 'squashfs-updates' of git://git.kernel.org/pub/scm/linux/kernel/git/pkl/squashfs-next:
    Squashfs: add mount time sanity check for block_size and block_log match
    Squashfs: fix f_pos check in get_dir_index_using_offset
    Squashfs: get rid of obsolete definitions in header file
    Squashfs: remove redundant length initialisation in squashfs_lookup
    Squashfs: remove redundant length initialisation in squashfs_readdir
    Squashfs: update comment removing reference to zlib only
    Squashfs: use define instead of constant

    Linus Torvalds
     

22 Mar, 2012

1 commit

  • Pull vfs pile 1 from Al Viro:
    "This is _not_ all; in particular, Miklos' and Jan's stuff is not there
    yet."

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (64 commits)
    ext4: initialization of ext4_li_mtx needs to be done earlier
    debugfs-related mode_t whack-a-mole
    hfsplus: add an ioctl to bless files
    hfsplus: change finder_info to u32
    hfsplus: initialise userflags
    qnx4: new helper - try_extent()
    qnx4: get rid of qnx4_bread/qnx4_getblk
    take removal of PF_FORKNOEXEC to flush_old_exec()
    trim includes in inode.c
    um: uml_dup_mmap() relies on ->mmap_sem being held, but activate_mm() doesn't hold it
    um: embed ->stub_pages[] into mmu_context
    gadgetfs: list_for_each_safe() misuse
    ocfs2: fix leaks on failure exits in module_init
    ecryptfs: make register_filesystem() the last potential failure exit
    ntfs: forgets to unregister sysctls on register_filesystem() failure
    logfs: missing cleanup on register_filesystem() failure
    jfs: mising cleanup on register_filesystem() failure
    make configfs_pin_fs() return root dentry on success
    configfs: configfs_create_dir() has parent dentry in dentry->d_parent
    configfs: sanitize configfs_create()
    ...

    Linus Torvalds
     

21 Mar, 2012

1 commit


20 Mar, 2012

1 commit


10 Mar, 2012

7 commits


14 Jan, 2012

1 commit


04 Jan, 2012

1 commit

  • Seeing that just about every destructor got that INIT_LIST_HEAD() copied into
    it, there is no point whatsoever keeping this INIT_LIST_HEAD in inode_init_once();
    the cost of taking it into inode_init_always() will be negligible for pipes
    and sockets and negative for everything else. Not to mention the removal of
    boilerplate code from ->destroy_inode() instances...

    Signed-off-by: Al Viro

    Al Viro
     

03 Jan, 2012

2 commits

  • The le64_to_cpu() forces the calculation to be unsigned, with
    the effect that it can underflow leading to an incorrect large
    value.

    This bug only triggers in rare(ish) circumstances, an empty file
    encoded as an extended regular file or a completely sparse file.
    Normally empty files are encoded as a regular file rather than as
    an extended regular file (and the regular file i_blocks calculation
    doesn't have this bug). To save space regular file inodes are
    optimised to encode the most commonly occurring files. Less
    common regular files are encoded using extended regular file inodes
    which contain extra information.

    Empty files with nlinks greater than 1, and/or empty files
    with extended attributes are encoded using extended regular file
    inodes and they will hit this bug.

    Signed-off-by: Phillip Lougher

    Phillip Lougher
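
The underflow described above can be reproduced in a few lines. The two formulas below are assumptions chosen to illustrate the failure mode (a "subtract one, shift, add one" round-up versus an "add 511, shift" round-up to 512-byte blocks). They are not quoted from the patch, and the function names are hypothetical.

```c
#include <stdint.h>

/*
 * Buggy shape: with unsigned operands, "size - sparse - 1" wraps around
 * to a huge value when size == sparse (e.g. an empty or fully sparse file).
 */
uint64_t demo_blocks_buggy(uint64_t size, uint64_t sparse)
{
    return ((size - sparse - 1) >> 9) + 1;
}

/*
 * Safer shape: round up by adding 511 before shifting, which never
 * subtracts below zero as long as sparse <= size.
 */
uint64_t demo_blocks_fixed(uint64_t size, uint64_t sparse)
{
    return (size - sparse + 511) >> 9;
}
```

For a non-empty file both forms agree (1024 bytes gives 2 blocks). For an empty file the first form yields 2^55 blocks, while the second yields 0.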
     
  • A Squashfs filesystem containing nothing but an empty directory,
    although unusual and ultimately pointless, is still valid.

    The directory_table >= next_table sanity check rejects these
    filesystems as invalid because the directory_table is empty and
    equal to next_table.

    Signed-off-by: Phillip Lougher

    Phillip Lougher
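
The difference between the rejecting check and a check that tolerates an empty directory table comes down to one comparison operator. This is a sketch assuming the two values are byte offsets of consecutive tables; the function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Strict form: rejects directory_table >= next_table, including equality. */
bool demo_valid_strict(uint64_t directory_table, uint64_t next_table)
{
    return directory_table < next_table;
}

/*
 * Relaxed form: an empty directory table (start == next) is allowed,
 * while a start offset beyond the next table is still rejected.
 */
bool demo_valid_relaxed(uint64_t directory_table, uint64_t next_table)
{
    return directory_table <= next_table;
}
```

Only the equal-offsets case changes outcome between the two forms, which corresponds to the filesystem containing nothing but an empty directory.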
     

30 Dec, 2011

2 commits