08 Oct, 2016

1 commit


08 Jun, 2016

1 commit


09 May, 2016

1 commit


03 May, 2016

1 commit


11 Apr, 2016

1 commit


05 Apr, 2016

2 commits

  • Mostly direct substitution with occasional adjustment or removing
    outdated comments.

    Signed-off-by: Kirill A. Shutemov
    Acked-by: Michal Hocko
    Signed-off-by: Linus Torvalds

    Kirill A. Shutemov
     
  • PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} macros were introduced *long* time
    ago with promise that one day it will be possible to implement page
    cache with bigger chunks than PAGE_SIZE.

    This promise never materialized, and it is unlikely that it ever will.

    We have many places where PAGE_CACHE_SIZE is assumed to be equal to
    PAGE_SIZE. And it's a constant source of confusion on whether
    PAGE_CACHE_* or PAGE_* constant should be used in a particular case,
    especially on the border between fs and mm.

    Global switching to PAGE_CACHE_SIZE != PAGE_SIZE would cause too much
    breakage to be doable.

    Let's stop pretending that pages in page cache are special. They are
    not.

    The changes are pretty straight-forward:

    - E << (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> E;

    - E >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> E;

    - PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} -> PAGE_{SIZE,SHIFT,MASK,ALIGN};

    - page_cache_get() -> get_page();

    - page_cache_release() -> put_page();

    This patch contains automated changes generated with coccinelle using
    script below. For some reason, coccinelle doesn't patch header files.
    I've called spatch for them manually.

    The only adjustment after coccinelle is a revert of the changes to the
    PAGE_CACHE_ALIGN definition: we are going to drop it later.

    There are a few places in the code that coccinelle didn't reach. I'll
    fix them manually in a separate patch. Comments and documentation will
    also be addressed in a separate patch.

    virtual patch

    @@
    expression E;
    @@
    - E << (PAGE_CACHE_SHIFT - PAGE_SHIFT)
    + E

    @@
    expression E;
    @@
    - E >> (PAGE_CACHE_SHIFT - PAGE_SHIFT)
    + E

    @@
    @@
    - PAGE_CACHE_SHIFT
    + PAGE_SHIFT

    @@
    @@
    - PAGE_CACHE_SIZE
    + PAGE_SIZE

    @@
    @@
    - PAGE_CACHE_MASK
    + PAGE_MASK

    @@
    expression E;
    @@
    - PAGE_CACHE_ALIGN(E)
    + PAGE_ALIGN(E)

    @@
    expression E;
    @@
    - page_cache_get(E)
    + get_page(E)

    @@
    expression E;
    @@
    - page_cache_release(E)
    + put_page(E)

    Signed-off-by: Kirill A. Shutemov
    Acked-by: Michal Hocko
    Signed-off-by: Linus Torvalds

    Kirill A. Shutemov
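
    The shift substitutions in the commit above are safe because
    PAGE_CACHE_SHIFT was defined equal to PAGE_SHIFT, so shifting by their
    difference is a shift by zero. A minimal userspace sketch of that
    reasoning follows. It assumes 4 KiB pages (PAGE_SHIFT of 12 is an
    assumption for illustration), redefines the macros locally, and uses
    hypothetical helper names that do not appear in the kernel.

```c
#include <stddef.h>

/* Local stand-ins for the kernel macros; 4 KiB pages are assumed here. */
#define PAGE_SHIFT        12
#define PAGE_SIZE         (1UL << PAGE_SHIFT)
#define PAGE_CACHE_SHIFT  PAGE_SHIFT   /* the old alias being removed */

/* Old style: convert a page-cache index to a page index. */
static unsigned long index_old(unsigned long idx)
{
    return idx << (PAGE_CACHE_SHIFT - PAGE_SHIFT); /* a shift by 0 */
}

/* New style after the substitution: the expression is used as-is. */
static unsigned long index_new(unsigned long idx)
{
    return idx;
}

/* Byte offset to page index, written with the surviving macro. */
static unsigned long offset_to_index(unsigned long long pos)
{
    return (unsigned long)(pos >> PAGE_SHIFT);
}
```

    Both index helpers return the same value for every input, which is
    why the semantic patch can replace the shifted expression with E.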
     

15 Jan, 2016

1 commit

  • Mark those kmem allocations that are known to be easily triggered from
    userspace as __GFP_ACCOUNT/SLAB_ACCOUNT, which makes them accounted to
    memcg. For the list, see below:

    - threadinfo
    - task_struct
    - task_delay_info
    - pid
    - cred
    - mm_struct
    - vm_area_struct and vm_region (nommu)
    - anon_vma and anon_vma_chain
    - signal_struct
    - sighand_struct
    - fs_struct
    - files_struct
    - fdtable and fdtable->full_fds_bits
    - dentry and external_name
    - inode for all filesystems. This is the most tedious part, because
    most filesystems overwrite the alloc_inode method.

    The list is far from complete, so feel free to add more objects.
    Nevertheless, it should be close to the "account everything" approach and
    keep most workloads within bounds. Malevolent users will be able to
    breach the limit, but this was possible even with the former "account
    everything" approach (simply because it did not account everything in
    fact).

    [akpm@linux-foundation.org: coding-style fixes]
    Signed-off-by: Vladimir Davydov
    Acked-by: Johannes Weiner
    Acked-by: Michal Hocko
    Cc: Tejun Heo
    Cc: Greg Thelen
    Cc: Christoph Lameter
    Cc: Pekka Enberg
    Cc: David Rientjes
    Cc: Joonsoo Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vladimir Davydov
     

13 Jan, 2016

1 commit

  • Pull misc vfs updates from Al Viro:
    "All kinds of stuff. That probably should've been 5 or 6 separate
    branches, but by the time I'd realized how large and mixed that bag
    had become it had been too close to -final to play with rebasing.

    Some fs/namei.c cleanups there, memdup_user_nul() introduction and
    switching open-coded instances, burying long-dead code, whack-a-mole
    of various kinds, several new helpers for ->llseek(), assorted
    cleanups and fixes from various people, etc.

    One piece probably deserves special mention - Neil's
    lookup_one_len_unlocked(). Similar to lookup_one_len(), but gets
    called without ->i_mutex and tries to avoid ever taking it. That, of
    course, means that it's not useful for any directory modifications,
    but things like getting inode attributes in nfsd readdirplus are fine
    with that. I really should've asked for a moratorium on lookup-related
    changes this cycle, but since I hadn't done that early enough... I
    *am* asking for that for the coming cycle, though - I'm going to try
    and get conversion of i_mutex to rwsem with ->lookup() done under lock
    taken shared.

    There will be a patch closer to the end of the window, along the lines
    of the one Linus had posted last May - mechanical conversion of
    ->i_mutex accesses to inode_lock()/inode_unlock()/inode_trylock()/
    inode_is_locked()/inode_lock_nested(). To quote Linus back then:

    -----
    | This is an automated patch using
    |
    | sed 's/mutex_lock(&\(.*\)->i_mutex)/inode_lock(\1)/'
    | sed 's/mutex_unlock(&\(.*\)->i_mutex)/inode_unlock(\1)/'
    | sed 's/mutex_lock_nested(&\(.*\)->i_mutex,[ ]*I_MUTEX_\([A-Z0-9_]*\))/inode_lock_nested(\1, I_MUTEX_\2)/'
    | sed 's/mutex_is_locked(&\(.*\)->i_mutex)/inode_is_locked(\1)/'
    | sed 's/mutex_trylock(&\(.*\)->i_mutex)/inode_trylock(\1)/'
    |
    | with a very few manual fixups
    -----

    I'm going to send that once the ->i_mutex-affecting stuff in -next
    gets mostly merged (or when Linus says he's about to stop taking
    merges)"

    * 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (63 commits)
    nfsd: don't hold i_mutex over userspace upcalls
    fs:affs:Replace time_t with time64_t
    fs/9p: use fscache mutex rather than spinlock
    proc: add a reschedule point in proc_readfd_common()
    logfs: constify logfs_block_ops structures
    fcntl: allow to set O_DIRECT flag on pipe
    fs: __generic_file_splice_read retry lookup on AOP_TRUNCATED_PAGE
    fs: xattr: Use kvfree()
    [s390] page_to_phys() always returns a multiple of PAGE_SIZE
    nbd: use ->compat_ioctl()
    fs: use block_device name vsprintf helper
    lib/vsprintf: add %*pg format specifier
    fs: use gendisk->disk_name where possible
    poll: plug an unused argument to do_poll
    amdkfd: don't open-code memdup_user()
    cdrom: don't open-code memdup_user()
    rsxx: don't open-code memdup_user()
    mtip32xx: don't open-code memdup_user()
    [um] mconsole: don't open-code memdup_user_nul()
    [um] hostaudio: don't open-code memdup_user()
    ...

    Linus Torvalds
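
    The pull message above mentions memdup_user_nul() replacing open-coded
    instances. As a rough userspace analogue of the idea (duplicate a
    buffer and append a terminating NUL), the sketch below uses memcpy()
    where the kernel helper copies from user space. The name memdup_nul()
    and the NULL-on-failure convention are assumptions for illustration,
    not the kernel interface.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Duplicate len bytes from src and append a terminating NUL.
 * Returns a malloc()ed buffer of len + 1 bytes, or NULL on failure.
 * The caller owns the result and releases it with free().
 */
static char *memdup_nul(const void *src, size_t len)
{
    char *p;

    if (len == (size_t)-1)      /* len + 1 would overflow */
        return NULL;
    p = malloc(len + 1);
    if (!p)
        return NULL;
    memcpy(p, src, len);
    p[len] = '\0';
    return p;
}
```

    Folding the allocation, copy and termination into one helper removes
    the off-by-one and missing-terminator mistakes that open-coded
    versions tend to collect.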
     

12 Jan, 2016

1 commit

  • Pull vfs xattr updates from Al Viro:
    "Andreas' xattr cleanup series.

    It's a followup to his xattr work that went in last cycle; -0.5KLoC"

    * 'work.xattr' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
    xattr handlers: Simplify list operation
    ocfs2: Replace list xattr handler operations
    nfs: Move call to security_inode_listsecurity into nfs_listxattr
    xfs: Change how listxattr generates synthetic attributes
    tmpfs: listxattr should include POSIX ACL xattrs
    tmpfs: Use xattr handler infrastructure
    btrfs: Use xattr handler infrastructure
    vfs: Distinguish between full xattr names and proper prefixes
    posix acls: Remove duplicate xattr name definitions
    gfs2: Remove gfs2_xattr_acl_chmod
    vfs: Remove vfs_xattr_cmp

    Linus Torvalds
     

07 Jan, 2016

1 commit


31 Dec, 2015

1 commit


14 Dec, 2015

1 commit

  • Change the list operation to only return whether or not an attribute
    should be listed. Copying the attribute names into the buffer is moved
    to the callers.

    Since the result only depends on the dentry and not on the attribute
    name, we do not pass the attribute name to list operations.

    Signed-off-by: Andreas Gruenbacher
    Signed-off-by: Al Viro

    Andreas Gruenbacher
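
    The split described above (the list operation only answers whether an
    attribute should be listed, and the caller copies the name) can be
    sketched in userspace as follows. The struct layouts, the
    trusted_visible field and list_names() are hypothetical stand-ins,
    not the kernel's definitions.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins; not the kernel's struct definitions. */
struct dentry { bool trusted_visible; };

struct handler {
    const char *name;                       /* full attribute name */
    bool (*list)(const struct dentry *d);   /* only answers "list it?" */
};

static bool always(const struct dentry *d) { (void)d; return true; }
static bool if_trusted(const struct dentry *d) { return d->trusted_visible; }

/*
 * Caller-side copy loop. Returns the number of bytes needed or written,
 * or -1 if buf is too small. With buf == NULL only the size is computed.
 */
static long list_names(const struct handler *h, size_t n,
                       const struct dentry *d, char *buf, size_t size)
{
    size_t used = 0;

    for (size_t i = 0; i < n; i++) {
        size_t len = strlen(h[i].name) + 1;  /* include the NUL */

        if (h[i].list && !h[i].list(d))
            continue;                        /* handler says: skip it */
        if (buf) {
            if (len > size - used)
                return -1;
            memcpy(buf + used, h[i].name, len);
        }
        used += len;
    }
    return (long)used;
}
```

    Because the callback no longer touches the buffer, the size handling
    lives in one place instead of being repeated in every handler.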
     

09 Dec, 2015

2 commits

  • new method: ->get_link(); replacement of ->follow_link(). The differences
    are:
    * inode and dentry are passed separately
    * might be called both in RCU and non-RCU mode;
    the former is indicated by passing it a NULL dentry.
    * when called that way it isn't allowed to block
    and should return ERR_PTR(-ECHILD) if it needs to be called
    in non-RCU mode.

    It's a flagday change - the old method is gone, all in-tree instances
    converted. Conversion isn't hard; that said, so far very few instances
    do not immediately bail out when called in RCU mode. That'll change
    in the next commits.

    Signed-off-by: Al Viro

    Al Viro
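
    The calling convention described above (a NULL dentry signals RCU
    mode, in which the method must not block and returns
    ERR_PTR(-ECHILD) if it would have to) might look roughly like this in
    a userspace model. The ERR_PTR helpers are reimplemented locally, the
    structures and slow_read_target() are hypothetical, and the in-kernel
    method's exact signature is not reproduced here.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Minimal userspace versions of the kernel's error-pointer helpers. */
#define MAX_ERRNO 4095
static void *ERR_PTR(long err) { return (void *)(intptr_t)err; }
static long PTR_ERR(const void *p) { return (long)(intptr_t)p; }
static int IS_ERR(const void *p)
{
    return (uintptr_t)p >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical stand-ins for the kernel structures. */
struct inode { const char *cached_target; };   /* NULL if not in memory */
struct dentry { struct inode *inode; };

/* Placeholder for work that may sleep, e.g. reading from disk. */
static const char *slow_read_target(struct inode *inode)
{
    (void)inode;
    return "/slow/target";
}

static const char *example_get_link(struct dentry *dentry, struct inode *inode)
{
    if (inode->cached_target)
        return inode->cached_target;    /* no blocking needed */
    if (!dentry)
        return ERR_PTR(-ECHILD);        /* RCU mode: must not block */
    return slow_read_target(inode);     /* non-RCU mode: may block */
}
```

    A caller that receives -ECHILD is expected to leave RCU mode and call
    the method again with a real dentry.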
     
  • kmap() in page_follow_link_light() needed to go - allowing an
    arbitrary number of kmaps to be held for a long time is a great way
    to deadlock the system.

    A new helper (inode_nohighmem(inode)) needs to be used for pagecache
    symlink inodes; done for all in-tree cases. page_follow_link_light()
    is instrumented to yell about anything missed.

    Signed-off-by: Al Viro

    Al Viro
     

07 Dec, 2015

1 commit

  • Add an additional "name" field to struct xattr_handler. When the name
    is set, the handler matches attributes with exactly that name. When the
    prefix is set instead, the handler matches attributes with the given
    prefix and with a non-empty suffix.

    This patch should avoid bugs like the one fixed in commit c361016a in
    the future.

    Signed-off-by: Andreas Gruenbacher
    Reviewed-by: James Morris
    Signed-off-by: Al Viro

    Andreas Gruenbacher
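
    The matching rule described above (exact match when the name is set,
    otherwise a prefix match that requires a non-empty suffix) can be
    sketched as follows. The struct here is a hypothetical two-field
    stand-in for struct xattr_handler, and handler_matches() is an
    illustrative name rather than a kernel function.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in: exactly one of name/prefix is expected to be set. */
struct handler {
    const char *name;     /* matches this attribute name exactly */
    const char *prefix;   /* matches prefix + non-empty suffix */
};

static bool handler_matches(const struct handler *h, const char *attr)
{
    if (h->name)
        return strcmp(attr, h->name) == 0;
    if (h->prefix) {
        size_t n = strlen(h->prefix);

        /* attr[n] is only read once the first n bytes have matched. */
        return strncmp(attr, h->prefix, n) == 0 && attr[n] != '\0';
    }
    return false;
}
```

    With this rule a handler for "user." does not claim the bare string
    "user.", and a handler with a full name does not claim longer names
    that merely start with it.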
     

14 Nov, 2015

2 commits

  • Now that the xattr handler is passed to the xattr handler operations, we
    have access to the attribute name prefix, so simplify the squashfs xattr
    handlers a bit.

    Signed-off-by: Andreas Gruenbacher
    Cc: Phillip Lougher
    Signed-off-by: Al Viro

    Andreas Gruenbacher
     
  • The xattr_handler operations are currently all passed a file system
    specific flags value which the operations can use to disambiguate between
    different handlers; some file systems use that to distinguish the xattr
    namespace, for example. In some operations, it would be useful to also have
    access to the handler prefix. To allow that, pass a pointer to the handler
    to operations instead of the flags value alone.

    Signed-off-by: Andreas Gruenbacher
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Al Viro

    Andreas Gruenbacher
     

24 Jun, 2015

1 commit

  • list_entry is just a wrapper for container_of, but it is arguably
    wrong (and slightly confusing) to use it when the pointed-to struct
    member is not a struct list_head. Use container_of directly instead.

    Signed-off-by: Rasmus Villemoes
    Signed-off-by: Al Viro

    Rasmus Villemoes
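
    To illustrate the point above, here is a simplified userspace
    container_of (without the kernel's type checking) applied to an
    embedded member that is not a list_head. The structures and the
    example_i() helper are hypothetical.

```c
#include <stddef.h>

/* Simplified container_of; the kernel version also checks the type. */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Stand-in for a generic inode. */
struct vfs_inode { unsigned long ino; };

/* Hypothetical filesystem inode embedding the generic one. */
struct example_inode_info {
    int fs_private;
    struct vfs_inode vfs_inode;
};

/* Recover the enclosing structure from a pointer to its member. */
static struct example_inode_info *example_i(struct vfs_inode *inode)
{
    return container_of(inode, struct example_inode_info, vfs_inode);
}
```

    Using container_of directly makes it clear that no list is involved;
    list_entry would compute the same address but suggest otherwise.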
     

16 Apr, 2015

1 commit


28 Nov, 2014

1 commit


27 Nov, 2014

1 commit


07 Aug, 2014

2 commits


05 Jun, 2014

1 commit


13 Mar, 2014

1 commit

  • Previously, the no-op "mount -o remount /dev/xxx" operation, when the
    file system is already mounted read-write, caused an implied,
    unconditional syncfs(). This seems pretty stupid: it's certainly not
    documented or guaranteed to do this, nor is it particularly useful,
    except in the case where the file system was mounted rw and is getting
    remounted read-only.

    However, it's possible that there might be some file systems that are
    actually depending on this behavior. In most file systems, it's
    probably fine to only call sync_filesystem() when transitioning from
    read-write to read-only, and there are some file systems where this is
    not needed at all (for example, for a pseudo-filesystem or something
    like romfs).

    Signed-off-by: "Theodore Ts'o"
    Cc: linux-fsdevel@vger.kernel.org
    Cc: Christoph Hellwig
    Cc: Artem Bityutskiy
    Cc: Adrian Hunter
    Cc: Evgeniy Dushistov
    Cc: Jan Kara
    Cc: OGAWA Hirofumi
    Cc: Anders Larsen
    Cc: Phillip Lougher
    Cc: Kees Cook
    Cc: Mikulas Patocka
    Cc: Petr Vandrovec
    Cc: xfs@oss.sgi.com
    Cc: linux-btrfs@vger.kernel.org
    Cc: linux-cifs@vger.kernel.org
    Cc: samba-technical@lists.samba.org
    Cc: codalist@coda.cs.cmu.edu
    Cc: linux-ext4@vger.kernel.org
    Cc: linux-f2fs-devel@lists.sourceforge.net
    Cc: fuse-devel@lists.sourceforge.net
    Cc: cluster-devel@redhat.com
    Cc: linux-mtd@lists.infradead.org
    Cc: jfs-discussion@lists.sourceforge.net
    Cc: linux-nfs@vger.kernel.org
    Cc: linux-nilfs@vger.kernel.org
    Cc: linux-ntfs-dev@lists.sourceforge.net
    Cc: ocfs2-devel@oss.oracle.com
    Cc: reiserfs-devel@vger.kernel.org

    Theodore Ts'o
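
    The narrower policy the message calls probably fine for most file
    systems (sync only on a read-write to read-only transition) reduces
    to a small predicate. This is a sketch of that policy only: the flag
    EX_RDONLY and the function name are hypothetical.

```c
#include <stdbool.h>

/* Hypothetical read-only flag bit, defined locally for this sketch. */
#define EX_RDONLY 1UL

/*
 * Return true only when a remount moves the file system from
 * read-write to read-only, the one case where flushing is clearly useful.
 */
static bool remount_needs_sync(unsigned long cur_flags, unsigned long new_flags)
{
    bool was_rw = !(cur_flags & EX_RDONLY);
    bool becomes_ro = (new_flags & EX_RDONLY) != 0;

    return was_rw && becomes_ro;
}
```

    A file system that depends on the old unconditional behavior would
    keep syncing on every remount instead of using such a check.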
     

24 Nov, 2013

1 commit


20 Nov, 2013

7 commits

  • Fix static checker complaint that stream is not checked in
    squashfs_decompressor_destroy().

    Reported-by: Dan Carpenter
    Signed-off-by: Phillip Lougher
    Reviewed-by: Minchan Kim

    Phillip Lougher
     
  • This introduces an implementation of squashfs_readpage_block()
    that directly decompresses into the page cache.

    This uses the previously added page handler abstraction to push
    down the necessary kmap_atomic/kunmap_atomic operations on the
    page cache buffers into the decompressors. This enables
    direct copying into the page cache without using the slow
    kmap/kunmap calls.

    The code detects when multiple threads are racing in
    squashfs_readpage() to decompress the same block, and avoids
    this regression by falling back to using an intermediate
    buffer.

    This patch enhances the performance of Squashfs significantly
    when multiple processes are accessing the filesystem simultaneously
    because it not only reduces memcopying, but it more importantly
    eliminates the lock contention on the intermediate buffer.

    Using single-thread decompression.

    dd if=file1 of=/dev/null bs=4096 &
    dd if=file2 of=/dev/null bs=4096 &
    dd if=file3 of=/dev/null bs=4096 &
    dd if=file4 of=/dev/null bs=4096

    Before:

    629145600 bytes (629 MB) copied, 45.8046 s, 13.7 MB/s

    After:

    629145600 bytes (629 MB) copied, 9.29414 s, 67.7 MB/s

    Signed-off-by: Phillip Lougher
    Reviewed-by: Minchan Kim

    Phillip Lougher
     
  • Restructure squashfs_readpage() splitting it into separate
    functions for datablocks, fragments and sparse blocks.

    Move the memcpying (from squashfs cache entry) implementation of
    squashfs_readpage_block into file_cache.c

    This allows different implementations to be supported.

    Signed-off-by: Phillip Lougher
    Reviewed-by: Minchan Kim

    Phillip Lougher
     
  • Further generalise the decompressors by adding a page handler
    abstraction. This adds helpers to allow the decompressors
    to access and process the output buffers in an implementation
    independent manner.

    This allows different types of output buffer to be passed
    to the decompressors, with the implementation specific
    aspects handled at decompression time, but without the
    knowledge being held in the decompressor wrapper code.

    This will allow the decompressors to handle Squashfs
    cache buffers, and page cache pages.

    This patch adds the abstraction and an implementation for
    the caches.

    Signed-off-by: Phillip Lougher
    Reviewed-by: Minchan Kim

    Phillip Lougher
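
    One way to picture the page handler abstraction above is as an
    iterator that hands out output buffers one at a time, so the
    decompressor never needs to know where they come from. The sketch
    below is a userspace model under that assumption. struct actor, its
    fields and the helper names are hypothetical, and a plain copy stands
    in for a real decompressor.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical output-buffer iterator; names are illustrative only. */
struct actor {
    void **bufs;        /* array of equally sized output buffers */
    size_t count;       /* number of buffers */
    size_t buf_size;    /* bytes per buffer */
    size_t next;        /* index of the next buffer to hand out */
};

/* Hand out the next buffer, or NULL when none are left. */
static void *actor_next(struct actor *a)
{
    return a->next < a->count ? a->bufs[a->next++] : NULL;
}

/* Restart the iteration and hand out the first buffer. */
static void *actor_first(struct actor *a)
{
    a->next = 0;
    return actor_next(a);
}

/*
 * Stand-in "decompressor": copies src into whatever buffers the actor
 * supplies. Returns the number of bytes written, which is less than len
 * if the buffers run out.
 */
static size_t fill_output(struct actor *a, const char *src, size_t len)
{
    size_t done = 0;

    for (void *buf = actor_first(a); buf && done < len; buf = actor_next(a)) {
        size_t chunk = len - done < a->buf_size ? len - done : a->buf_size;

        memcpy(buf, src + done, chunk);
        done += chunk;
    }
    return done;
}
```

    A second implementation of the two iterator functions could hand out
    mapped page cache pages instead, without changing fill_output().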
     
  • Add a multi-threaded decompression implementation which uses
    percpu variables.

    Using percpu variables has advantages and disadvantages over
    implementations which do not use percpu variables.

    Advantages:
    * the nature of percpu variables ensures decompression is
    load-balanced across the multiple cores.
    * simplicity.

    Disadvantages: it limits decompression to one thread per core.

    Signed-off-by: Phillip Lougher

    Phillip Lougher
     
  • Squashfs currently uses only one stream buffer for decompression,
    which hurts parallel read performance. This patch adds support for
    multiple decompressors to improve parallel I/O performance.

    Test: four dd reads of 1G files on a KVM machine with 2 CPUs and 4G
    of memory.

    dd if=test/test1.dat of=/dev/null &
    dd if=test/test2.dat of=/dev/null &
    dd if=test/test3.dat of=/dev/null &
    dd if=test/test4.dat of=/dev/null &

    old : 1m39s -> new : 9s

    Changes since v1:
    * Rename comp_strm to decomp_strm - Phillip
    * Change/add comments - Phillip

    Signed-off-by: Minchan Kim
    Signed-off-by: Phillip Lougher

    Minchan Kim
     
  • The decompressor interface and code was written from
    the point of view of single-threaded operation. In doing
    so it mixed a lot of single-threaded implementation specific
    aspects into the decompressor code and elsewhere which makes it
    difficult to seamlessly support multiple different decompressor
    implementations.

    This patch does the following:

    1. It removes compressor_options parsing from the decompressor
    init() function. This allows the decompressor init() function
    to be dynamically called to instantiate multiple decompressors,
    without the compressor options needing to be read and parsed each
    time.

    2. It moves threading and all sleeping operations out of the
    decompressors. In doing so, it makes the decompressors
    non-blocking wrappers which only deal with interfacing with
    the decompressor implementation.

    3. It splits decompressor.[ch] into decompressor generic functions
    in decompressor.[ch], and moves the single threaded
    decompressor implementation into decompressor_single.c.

    The result of this patch is Squashfs should now be able to
    support multiple decompressors by adding new decompressor_xxx.c
    files with specialised implementations of the functions in
    decompressor_single.c

    Signed-off-by: Phillip Lougher
    Reviewed-by: Minchan Kim

    Phillip Lougher
     

06 Sep, 2013

5 commits

  • We read the type field from disk. This value should be sanity
    checked for correctness to avoid an out of bounds access when
    reading the squashfs_filetype_table array.

    Signed-off-by: Phillip Lougher

    Phillip Lougher
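
    The kind of check described above, validating an on-disk value before
    using it as an array index, can be sketched like this. The table
    contents and the lookup_filetype() helper are illustrative and are
    not taken from the squashfs source.

```c
#include <stddef.h>

/* Illustrative lookup table; the values are not the kernel's. */
static const unsigned char filetype_table[] = { 0, 4, 8, 10, 6, 2, 1, 12 };
#define TABLE_LEN (sizeof(filetype_table) / sizeof(filetype_table[0]))

/*
 * Translate an untrusted on-disk type into a table entry.
 * Returns 0 and stores the entry in *out, or -1 if type is out of range.
 */
static int lookup_filetype(unsigned int type, unsigned char *out)
{
    if (type >= TABLE_LEN)
        return -1;              /* corrupted value: refuse to index */
    *out = filetype_table[type];
    return 0;
}
```

    Deriving the bound from the table itself keeps the check correct if
    entries are ever added or removed.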
     
  • We read the size (of the name) field from disk. This value should
    be sanity checked for correctness to avoid blindly reading
    huge amounts of unnecessary data from disk on corruption.

    Note, here we're not actually reading the name into a buffer, but
    skipping it, so corruption doesn't cause a buffer overflow, merely
    a lot of unnecessary data to be read.

    Signed-off-by: Phillip Lougher

    Phillip Lougher
     
  • The dir_count and size fields when read from disk are sanity
    checked for correctness. However, the sanity checks only check the
    values are not greater than expected. As dir_count and size were
    incorrectly defined as signed ints, this can lead to corrupted values
    appearing as negative which are not trapped.

    Signed-off-by: Phillip Lougher

    Phillip Lougher
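
    The problem described above is easy to reproduce in isolation: an
    upper-bound-only check passes a negative signed value, while the same
    check on an unsigned value rejects it. In this sketch the limit
    MAX_DIR_ENTS and both function names are illustrative.

```c
#define MAX_DIR_ENTS 256   /* illustrative limit for this sketch */

/* Buggy: an upper-bound-only check on a signed value. */
static int count_ok_signed(int dir_count)
{
    return !(dir_count > MAX_DIR_ENTS);
}

/* Fixed: the same check on an unsigned value. */
static int count_ok_unsigned(unsigned int dir_count)
{
    return !(dir_count > MAX_DIR_ENTS);
}
```

    A corrupted field that reads back as -1 passes the signed check, but
    as an unsigned value it becomes the largest representable number and
    fails the comparison.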
     
  • The dir_count and size fields when read from disk are sanity
    checked for correctness. However, the sanity checks only check the
    values are not greater than expected. As dir_count and size were
    incorrectly defined as signed ints, this can lead to corrupted values
    appearing as negative which are not trapped.

    Signed-off-by: Phillip Lougher

    Phillip Lougher
     
  • Patch "Squashfs: sanity check information from disk" from
    Dan Carpenter adds a missing check for corruption in the
    "size" field while reading the directory index from disk.

    However, it sets err to -EINVAL, and this value is not used later,
    so setting it is completely redundant. Remove it.

    Errors in reading the index are deliberately non-fatal. If we
    get an error in reading the index we just return the part of the
    index we have managed to read - the index isn't essential,
    just quicker.

    Signed-off-by: Phillip Lougher

    Phillip Lougher
     

05 Sep, 2013

1 commit