08 Oct, 2016
1 commit
-
These inode operations are no longer used; remove them.
Signed-off-by: Andreas Gruenbacher
Signed-off-by: Al Viro
08 Jun, 2016
1 commit
-
This has ll_rw_block users pass in the operation and flags separately,
so ll_rw_block can set up the bio op and bi_rw flags on the bio that
is submitted.
Signed-off-by: Mike Christie
Reviewed-by: Christoph Hellwig
Reviewed-by: Hannes Reinecke
Signed-off-by: Jens Axboe
09 May, 2016
1 commit
-
don't need to lock directory in ->llseek(), either
Signed-off-by: Al Viro
03 May, 2016
1 commit
-
The rest of work.xattr stuff isn't needed for this branch
11 Apr, 2016
1 commit
-
... and do not assume they are already attached to each other
Signed-off-by: Al Viro
05 Apr, 2016
2 commits
-
Mostly direct substitution with occasional adjustment or removing
outdated comments.
Signed-off-by: Kirill A. Shutemov
Acked-by: Michal Hocko
Signed-off-by: Linus Torvalds
-
PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} macros were introduced a *long* time
ago with the promise that one day it would be possible to implement the page
cache with bigger chunks than PAGE_SIZE.
This promise never materialized. And it likely never will.
We have many places where PAGE_CACHE_SIZE is assumed to be equal to
PAGE_SIZE. And it's a constant source of confusion whether
PAGE_CACHE_* or PAGE_* constants should be used in a particular case,
especially on the border between fs and mm.
Global switching to PAGE_CACHE_SIZE != PAGE_SIZE would cause too much
breakage to be doable.
Let's stop pretending that pages in the page cache are special. They are
not.
The changes are pretty straightforward:
- <foo> << (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;
- <foo> >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;
- PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} -> PAGE_{SIZE,SHIFT,MASK,ALIGN};
- page_cache_get() -> get_page();
- page_cache_release() -> put_page();
This patch contains automated changes generated with coccinelle using the
script below. For some reason, coccinelle doesn't patch header files.
I've called spatch for them manually.
The only adjustment after coccinelle is a revert of the changes to the
PAGE_CACHE_ALIGN definition: we are going to drop it later.
There are a few places in the code that coccinelle didn't reach. I'll
fix them manually in a separate patch. Comments and documentation will
also be addressed in a separate patch.
virtual patch
@@
expression E;
@@
- E << (PAGE_CACHE_SHIFT - PAGE_SHIFT)
+ E

@@
expression E;
@@
- E >> (PAGE_CACHE_SHIFT - PAGE_SHIFT)
+ E

@@
@@
- PAGE_CACHE_SHIFT
+ PAGE_SHIFT

@@
@@
- PAGE_CACHE_SIZE
+ PAGE_SIZE

@@
@@
- PAGE_CACHE_MASK
+ PAGE_MASK

@@
expression E;
@@
- PAGE_CACHE_ALIGN(E)
+ PAGE_ALIGN(E)

@@
expression E;
@@
- page_cache_get(E)
+ get_page(E)

@@
expression E;
@@
- page_cache_release(E)
+ put_page(E)

Signed-off-by: Kirill A. Shutemov
Acked-by: Michal Hocko
Signed-off-by: Linus Torvalds
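Why the first two rules are safe: PAGE_CACHE_SHIFT was defined as PAGE_SHIFT, so (PAGE_CACHE_SHIFT - PAGE_SHIFT) is zero and each shift collapses to the bare expression. A minimal userspace sketch of that arithmetic, assuming 4 KiB pages (PAGE_SHIFT of 12); the macros are local stand-ins for the kernel headers and the helper names are hypothetical:

```c
/* Local stand-ins for the kernel macros, assuming 4 KiB pages. */
#define PAGE_SHIFT       12
#define PAGE_CACHE_SHIFT PAGE_SHIFT /* the old alias: always equal */
#define PAGE_SIZE        (1UL << PAGE_SHIFT)
#define PAGE_MASK        (~(PAGE_SIZE - 1))

/* Hypothetical old-style helper: converts a page-cache index into a
 * page index. The shift amount is 0, so this is the identity. */
static unsigned long old_style_index(unsigned long index)
{
	return index << (PAGE_CACHE_SHIFT - PAGE_SHIFT);
}

/* The same helper after the coccinelle rewrite: the shift is gone. */
static unsigned long new_style_index(unsigned long index)
{
	return index;
}

/* PAGE_MASK rounds a byte offset down to the start of its page. */
static unsigned long page_start(unsigned long pos)
{
	return pos & PAGE_MASK;
}
```

The two index helpers return the same value for every input, which is exactly the equivalence the first two coccinelle rules rely on.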
15 Jan, 2016
1 commit
-
Mark those kmem allocations that are known to be easily triggered from
userspace as __GFP_ACCOUNT/SLAB_ACCOUNT, which makes them accounted to
memcg. For the list, see below:
- threadinfo
- task_struct
- task_delay_info
- pid
- cred
- mm_struct
- vm_area_struct and vm_region (nommu)
- anon_vma and anon_vma_chain
- signal_struct
- sighand_struct
- fs_struct
- files_struct
- fdtable and fdtable->full_fds_bits
- dentry and external_name
- inode for all filesystems. This is the most tedious part, because
most filesystems overwrite the alloc_inode method.
The list is far from complete, so feel free to add more objects.
Nevertheless, it should be close to the "account everything" approach and
keep most workloads within bounds. Malevolent users will be able to
breach the limit, but this was possible even with the former "account
everything" approach (simply because it did not, in fact, account
everything).
[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Vladimir Davydov
Acked-by: Johannes Weiner
Acked-by: Michal Hocko
Cc: Tejun Heo
Cc: Greg Thelen
Cc: Christoph Lameter
Cc: Pekka Enberg
Cc: David Rientjes
Cc: Joonsoo Kim
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
13 Jan, 2016
1 commit
-
Pull misc vfs updates from Al Viro:
"All kinds of stuff. That probably should've been 5 or 6 separate
branches, but by the time I'd realized how large and mixed that bag
had become it had been too close to -final to play with rebasing.
Some fs/namei.c cleanups there, memdup_user_nul() introduction and
switching open-coded instances, burying long-dead code, whack-a-mole
of various kinds, several new helpers for ->llseek(), assorted
cleanups and fixes from various people, etc.
One piece probably deserves special mention - Neil's
lookup_one_len_unlocked(). Similar to lookup_one_len(), but gets
called without ->i_mutex and tries to avoid ever taking it. That, of
course, means that it's not useful for any directory modifications,
but things like getting inode attributes in nfsd readdirplus are fine
with that. I really should've asked for a moratorium on lookup-related
changes this cycle, but since I hadn't done that early enough... I
*am* asking for that for the coming cycle, though - I'm going to try
and get conversion of i_mutex to rwsem with ->lookup() done under lock
taken shared.
There will be a patch closer to the end of the window, along the lines
of the one Linus had posted last May - mechanical conversion of
->i_mutex accesses to inode_lock()/inode_unlock()/inode_trylock()/
inode_is_locked()/inode_lock_nested(). To quote Linus back then:
-----
| This is an automated patch using
|
| sed 's/mutex_lock(&\(.*\)->i_mutex)/inode_lock(\1)/'
| sed 's/mutex_unlock(&\(.*\)->i_mutex)/inode_unlock(\1)/'
| sed 's/mutex_lock_nested(&\(.*\)->i_mutex,[ ]*I_MUTEX_\([A-Z0-9_]*\))/inode_lock_nested(\1, I_MUTEX_\2)/'
| sed 's/mutex_is_locked(&\(.*\)->i_mutex)/inode_is_locked(\1)/'
| sed 's/mutex_trylock(&\(.*\)->i_mutex)/inode_trylock(\1)/'
|
| with a very few manual fixups
-----
I'm going to send that once the ->i_mutex-affecting stuff in -next
gets mostly merged (or when Linus says he's about to stop taking
merges)"
* 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (63 commits)
nfsd: don't hold i_mutex over userspace upcalls
fs:affs:Replace time_t with time64_t
fs/9p: use fscache mutex rather than spinlock
proc: add a reschedule point in proc_readfd_common()
logfs: constify logfs_block_ops structures
fcntl: allow to set O_DIRECT flag on pipe
fs: __generic_file_splice_read retry lookup on AOP_TRUNCATED_PAGE
fs: xattr: Use kvfree()
[s390] page_to_phys() always returns a multiple of PAGE_SIZE
nbd: use ->compat_ioctl()
fs: use block_device name vsprintf helper
lib/vsprintf: add %*pg format specifier
fs: use gendisk->disk_name where possible
poll: plug an unused argument to do_poll
amdkfd: don't open-code memdup_user()
cdrom: don't open-code memdup_user()
rsxx: don't open-code memdup_user()
mtip32xx: don't open-code memdup_user()
[um] mconsole: don't open-code memdup_user_nul()
[um] hostaudio: don't open-code memdup_user()
...
12 Jan, 2016
1 commit
-
Pull vfs xattr updates from Al Viro:
"Andreas' xattr cleanup series.
It's a followup to his xattr work that went in last cycle; -0.5KLoC"
* 'work.xattr' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
xattr handlers: Simplify list operation
ocfs2: Replace list xattr handler operations
nfs: Move call to security_inode_listsecurity into nfs_listxattr
xfs: Change how listxattr generates synthetic attributes
tmpfs: listxattr should include POSIX ACL xattrs
tmpfs: Use xattr handler infrastructure
btrfs: Use xattr handler infrastructure
vfs: Distinguish between full xattr names and proper prefixes
posix acls: Remove duplicate xattr name definitions
gfs2: Remove gfs2_xattr_acl_chmod
vfs: Remove vfs_xattr_cmp
07 Jan, 2016
1 commit
-
Signed-off-by: Dmitry Monakhov
Signed-off-by: Al Viro
31 Dec, 2015
1 commit
-
Signed-off-by: Al Viro
14 Dec, 2015
1 commit
-
Change the list operation to only return whether or not an attribute
should be listed. Copying the attribute names into the buffer is moved
to the callers.
Since the result only depends on the dentry and not on the attribute
name, we do not pass the attribute name to list operations.
Signed-off-by: Andreas Gruenbacher
Signed-off-by: Al Viro
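With this change the handler only answers yes or no, and the caller assembles the listxattr buffer. A rough userspace sketch of the caller side, assuming simplified stand-in types: struct handler, struct attr and list_attrs() are hypothetical, not the kernel's definitions, and treating a handler with no list operation as "always list" is an assumption of this sketch:

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for struct xattr_handler after the change:
 * ->list() only says whether attributes of this handler should be
 * listed for the given dentry; it no longer copies anything. */
struct handler {
	const char *prefix;
	bool (*list)(const void *dentry);
};

/* Hypothetical stored attribute: its handler plus the part of the
 * name that follows the prefix. */
struct attr {
	const struct handler *h;
	const char *suffix;
};

static bool list_always(const void *dentry) { (void)dentry; return true; }
static bool list_never(const void *dentry) { (void)dentry; return false; }

/* Caller side: copy "<prefix><suffix>\0" for every listable attribute.
 * With buf == NULL only the required size is computed (the usual
 * listxattr size probe). Returns the byte count, or -1 if buf is too
 * small. */
static long list_attrs(const struct attr *a, size_t n, const void *dentry,
		       char *buf, size_t size)
{
	size_t used = 0;

	for (size_t i = 0; i < n; i++) {
		if (a[i].h->list && !a[i].h->list(dentry))
			continue;
		size_t plen = strlen(a[i].h->prefix);
		size_t slen = strlen(a[i].suffix);
		size_t len = plen + slen + 1; /* +1 for the NUL */
		if (buf) {
			if (used + len > size)
				return -1;
			memcpy(buf + used, a[i].h->prefix, plen);
			memcpy(buf + used + plen, a[i].suffix, slen + 1);
		}
		used += len;
	}
	return (long)used;
}
```

Because the name is built in one place, individual handlers no longer need their own copy-and-bounds-check code.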
09 Dec, 2015
2 commits
-
new method: ->get_link(); replacement of ->follow_link(). The differences
are:
* inode and dentry are passed separately
* might be called both in RCU and non-RCU mode;
the former is indicated by passing it a NULL dentry.
* when called that way it isn't allowed to block
and should return ERR_PTR(-ECHILD) if it needs to be called
in non-RCU mode.
It's a flag day change - the old method is gone, all in-tree instances
converted. Conversion isn't hard; that said, so far very few instances
do not immediately bail out when called in RCU mode. That'll change
in the next commits.
Signed-off-by: Al Viro
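The RCU-mode contract above can be illustrated outside the kernel. A minimal sketch, assuming userspace stand-ins for ERR_PTR()/IS_ERR()/PTR_ERR() and a hypothetical struct fake_inode whose link body is either cached in memory or must be read (which could block); sketch_get_link() is illustrative and does not use the real ->get_link() signature:

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace stand-ins for the kernel's ERR_PTR()/IS_ERR()/PTR_ERR(). */
#define MAX_ERRNO 4095
static inline void *ERR_PTR(long error) { return (void *)error; }
static inline long PTR_ERR(const void *ptr) { return (long)ptr; }
static inline int IS_ERR(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical inode: the link body may already be cached in memory,
 * or may need to be read from disk (which could block). */
struct fake_inode {
	const char *cached_link;  /* NULL if not in memory */
	const char *on_disk_link;
};

/* Sketch of a ->get_link()-style method taking dentry and inode
 * separately. A NULL dentry means RCU mode: the method must not block,
 * so if it would need to read the link it returns ERR_PTR(-ECHILD). */
static const char *sketch_get_link(const void *dentry, struct fake_inode *inode)
{
	if (inode->cached_link)
		return inode->cached_link;  /* safe in either mode */
	if (!dentry)
		return ERR_PTR(-ECHILD);    /* would block: bail out */
	/* Non-RCU mode: the "blocking" read is simulated by a copy. */
	inode->cached_link = inode->on_disk_link;
	return inode->cached_link;
}
```

A caller that gets -ECHILD in RCU mode is expected to drop out of RCU mode and call again with a real dentry, at which point blocking is allowed.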
-
kmap() in page_follow_link_light() needed to go - allowing an arbitrary
number of kmaps to be held for a long time is a great way to deadlock
the system.
A new helper (inode_nohighmem(inode)) needs to be used for pagecache
symlink inodes; done for all in-tree cases. page_follow_link_light()
is instrumented to yell about anything missed.
Signed-off-by: Al Viro
07 Dec, 2015
1 commit
-
Add an additional "name" field to struct xattr_handler. When the name
is set, the handler matches attributes with exactly that name. When the
prefix is set instead, the handler matches attributes with the given
prefix and with a non-empty suffix.
This patch should avoid bugs like the one fixed in commit c361016a in
the future.
Signed-off-by: Andreas Gruenbacher
Reviewed-by: James Morris
Signed-off-by: Al Viro
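The matching rule can be written as a small predicate. A minimal sketch, assuming a simplified struct sketch_handler with only the name and prefix fields (the real struct xattr_handler has more members, and the kernel's own name-resolution code differs in detail):

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Simplified stand-in for struct xattr_handler: exactly one of
 * name/prefix is expected to be set. */
struct sketch_handler {
	const char *name;   /* match this exact attribute name */
	const char *prefix; /* or: match prefix + non-empty suffix */
};

/* Returns true if the handler is responsible for attribute 'attr'. */
static bool handler_matches(const struct sketch_handler *h, const char *attr)
{
	if (h->name)
		return strcmp(attr, h->name) == 0;

	size_t len = strlen(h->prefix);
	/* The part after the prefix must be non-empty. */
	return strncmp(attr, h->prefix, len) == 0 && attr[len] != '\0';
}
```

With an exact-name handler, a longer name that merely starts with the same characters is no longer accepted, and a prefix handler rejects the bare prefix.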
14 Nov, 2015
2 commits
-
Now that the xattr handler is passed to the xattr handler operations, we
have access to the attribute name prefix, so simplify the squashfs xattr
handlers a bit.
Signed-off-by: Andreas Gruenbacher
Cc: Phillip Lougher
Signed-off-by: Al Viro
-
The xattr_handler operations are currently all passed a file system
specific flags value which the operations can use to disambiguate between
different handlers; some file systems use that to distinguish the xattr
namespace, for example. In some operations, it would be useful to also have
access to the handler prefix. To allow that, pass a pointer to the handler
to operations instead of the flags value alone.
Signed-off-by: Andreas Gruenbacher
Reviewed-by: Christoph Hellwig
Signed-off-by: Al Viro
24 Jun, 2015
1 commit
-
list_entry is just a wrapper for container_of, but it is arguably
wrong (and slightly confusing) to use it when the pointed-to struct
member is not a struct list_head. Use container_of directly instead.
Signed-off-by: Rasmus Villemoes
Signed-off-by: Al Viro
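For reference, container_of() recovers the enclosing structure from a pointer to one of its members by subtracting the member's offset. A userspace sketch, assuming a simplified macro (the kernel version adds type checking) and hypothetical types modelled on the common pattern of a filesystem embedding the generic inode in its own inode-info struct:

```c
#include <stddef.h>

/* Userspace version of container_of(): given a pointer to 'member'
 * inside a 'type', return a pointer to the enclosing 'type'.
 * The kernel macro also type-checks 'ptr'; this sketch does not. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical generic inode: note it is not a struct list_head, which
 * is why list_entry() would be misleading here. */
struct vfs_inode_stub {
	unsigned long i_ino;
};

/* Hypothetical filesystem-private inode info embedding the generic one. */
struct fs_inode_info {
	int block_count;
	struct vfs_inode_stub vfs_inode;
};

/* Map the generic inode back to the filesystem's private structure. */
static struct fs_inode_info *FS_I(struct vfs_inode_stub *inode)
{
	return container_of(inode, struct fs_inode_info, vfs_inode);
}
```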
16 Apr, 2015
1 commit
-
that's the bulk of filesystem drivers dealing with inodes of their own
Signed-off-by: David Howells
Signed-off-by: Al Viro
28 Nov, 2014
1 commit
-
Add the glue code, and also update the documentation.
Signed-off-by: Phillip Lougher
27 Nov, 2014
1 commit
-
Add support for reading file systems compressed with the
LZ4 compression algorithm.
This patch adds the LZ4 decompressor wrapper code.
Signed-off-by: Phillip Lougher
07 Aug, 2014
2 commits
-
- Convert printk to pr_foo()
- Add pr_fmt for future logging entries
- Coalesce formats
Signed-off-by: Fabian Frederick
Cc: Phillip Lougher
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
-
kmalloc_array() manages count*sizeof overflow.
Signed-off-by: Fabian Frederick
Cc: Phillip Lougher
Cc: Joe Perches
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
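The overflow check that kmalloc_array() performs before multiplying can be sketched in userspace. alloc_array() is a hypothetical helper built on malloc(), shown only to illustrate why n * size must be checked before allocating:

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical userspace analogue of kmalloc_array(): refuse the
 * allocation if n * size would overflow size_t, instead of silently
 * allocating a too-small buffer. */
static void *alloc_array(size_t n, size_t size)
{
	if (size != 0 && n > SIZE_MAX / size)
		return NULL;  /* n * size would wrap around */
	return malloc(n * size);
}
```

Without the check, a large count can make n * size wrap to a small number, so the allocation succeeds but is far smaller than the caller expects.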
05 Jun, 2014
1 commit
-
Update the last pr_warning callsite in fs branch
Signed-off-by: Fabian Frederick
Cc: Phillip Lougher
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
13 Mar, 2014
1 commit
-
Previously, the no-op "mount -o remount /dev/xxx" operation when the
file system is already mounted read-write causes an implied,
unconditional syncfs(). This seems pretty stupid, and it's certainly not
documented or guaranteed to do this, nor is it particularly useful,
except in the case where the file system was mounted rw and is getting
remounted read-only.
However, it's possible that there might be some file systems that are
actually depending on this behavior. In most file systems, it's
probably fine to only call sync_filesystem() when transitioning from
read-write to read-only, and there are some file systems where this is
not needed at all (for example, for a pseudo-filesystem or something
like romfs).
Signed-off-by: "Theodore Ts'o"
Cc: linux-fsdevel@vger.kernel.org
Cc: Christoph Hellwig
Cc: Artem Bityutskiy
Cc: Adrian Hunter
Cc: Evgeniy Dushistov
Cc: Jan Kara
Cc: OGAWA Hirofumi
Cc: Anders Larsen
Cc: Phillip Lougher
Cc: Kees Cook
Cc: Mikulas Patocka
Cc: Petr Vandrovec
Cc: xfs@oss.sgi.com
Cc: linux-btrfs@vger.kernel.org
Cc: linux-cifs@vger.kernel.org
Cc: samba-technical@lists.samba.org
Cc: codalist@coda.cs.cmu.edu
Cc: linux-ext4@vger.kernel.org
Cc: linux-f2fs-devel@lists.sourceforge.net
Cc: fuse-devel@lists.sourceforge.net
Cc: cluster-devel@redhat.com
Cc: linux-mtd@lists.infradead.org
Cc: jfs-discussion@lists.sourceforge.net
Cc: linux-nfs@vger.kernel.org
Cc: linux-nilfs@vger.kernel.org
Cc: linux-ntfs-dev@lists.sourceforge.net
Cc: ocfs2-devel@oss.oracle.com
Cc: reiserfs-devel@vger.kernel.org
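The condition the message describes (sync only on a read-write to read-only transition) can be written as a predicate. A minimal sketch, assuming a hypothetical SKETCH_RDONLY flag in place of the kernel's read-only mount flag; it illustrates the reasoning and is not the code this patch added:

```c
#include <stdbool.h>

/* Hypothetical flag standing in for the read-only mount flag. */
#define SKETCH_RDONLY 0x1UL

/* A remount only needs to flush dirty data when the file system goes
 * from read-write to read-only. A rw->rw no-op remount, or any remount
 * of a file system that was already read-only, does not. */
static bool remount_needs_sync(unsigned long old_flags, unsigned long new_flags)
{
	bool was_rw = !(old_flags & SKETCH_RDONLY);
	bool will_be_ro = (new_flags & SKETCH_RDONLY) != 0;

	return was_rw && will_be_ro;
}
```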
24 Nov, 2013
1 commit
-
In direct decompression into the page cache, if we fall back
to using an intermediate buffer (because we cannot grab all the
page cache pages) and we get a decompression failure, we forget to
release the pages.
Reported-by: Roman Peniaev
Signed-off-by: Phillip Lougher
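The shape of the fix is to release every grabbed page on all exit paths, including decompression failure. A simplified sketch, assuming a hypothetical struct sketch_page with only a lock flag and a reference count; release_page() stands in for unlocking and dropping a page reference, and -EIO is an assumed error value:

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a page-cache page. */
struct sketch_page {
	bool locked;
	int refcount;
};

/* Stand-in for unlocking a page and dropping our reference to it. */
static void release_page(struct sketch_page *p)
{
	p->locked = false;
	p->refcount--;
}

/* Fallback-path sketch: the grabbed pages are released whether or not
 * decompression succeeded. The bug described above corresponds to
 * returning early on failure and skipping this loop. */
static int fallback_decompress(struct sketch_page **pages, size_t n,
			       bool decompress_ok)
{
	int err = decompress_ok ? 0 : -EIO;

	/* On success, a real implementation would fill the pages here. */

	for (size_t i = 0; i < n; i++)
		if (pages[i])  /* some pages may not have been grabbed */
			release_page(pages[i]);
	return err;
}
```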
20 Nov, 2013
7 commits
-
Fix static checker complaint that stream is not checked in
squashfs_decompressor_destroy().
Reported-by: Dan Carpenter
Signed-off-by: Phillip Lougher
Reviewed-by: Minchan Kim
-
This introduces an implementation of squashfs_readpage_block()
that directly decompresses into the page cache.
This uses the previously added page handler abstraction to push
down the necessary kmap_atomic/kunmap_atomic operations on the
page cache buffers into the decompressors. This enables
direct copying into the page cache without using the slow
kmap/kunmap calls.
The code detects when multiple threads are racing in
squashfs_readpage() to decompress the same block, and avoids
this regression by falling back to using an intermediate
buffer.
This patch enhances the performance of Squashfs significantly
when multiple processes are accessing the filesystem simultaneously
because it not only reduces memcopying, but more importantly
eliminates the lock contention on the intermediate buffer.
Using single-thread decompression:
dd if=file1 of=/dev/null bs=4096 &
dd if=file2 of=/dev/null bs=4096 &
dd if=file3 of=/dev/null bs=4096 &
dd if=file4 of=/dev/null bs=4096
Before:
629145600 bytes (629 MB) copied, 45.8046 s, 13.7 MB/s
After:
629145600 bytes (629 MB) copied, 9.29414 s, 67.7 MB/s
Signed-off-by: Phillip Lougher
Reviewed-by: Minchan Kim
-
Restructure squashfs_readpage() splitting it into separate
functions for datablocks, fragments and sparse blocks.
Move the memcpying (from squashfs cache entry) implementation of
squashfs_readpage_block into file_cache.c.
This allows different implementations to be supported.
Signed-off-by: Phillip Lougher
Reviewed-by: Minchan Kim
-
Further generalise the decompressors by adding a page handler
abstraction. This adds helpers to allow the decompressors
to access and process the output buffers in an implementation
independent manner.
This allows different types of output buffer to be passed
to the decompressors, with the implementation specific
aspects handled at decompression time, but without the
knowledge being held in the decompressor wrapper code.
This will allow the decompressors to handle Squashfs
cache buffers, and page cache pages.
This patch adds the abstraction and an implementation for
the caches.
Signed-off-by: Phillip Lougher
Reviewed-by: Minchan Kim
-
Add a multi-threaded decompression implementation which uses
percpu variables.
Using percpu variables has advantages and disadvantages over
implementations which do not use percpu variables.
Advantages:
* the nature of percpu variables ensures decompression is
load-balanced across the multiple cores.
* simplicity.
Disadvantages: it limits decompression to one thread per core.
Signed-off-by: Phillip Lougher
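A minimal sketch of the per-CPU idea, assuming a hypothetical fixed CPU count and a stream type that holds decompressor state. Here the caller passes its CPU number explicitly; kernel code would instead use the percpu API and, as assumed here, stay on that CPU with preemption disabled while it uses the stream:

```c
#include <stddef.h>

#define SKETCH_NR_CPUS 4  /* hypothetical CPU count */

/* Hypothetical per-CPU decompressor state (e.g. a zlib or xz context). */
struct sketch_stream {
	int cpu;             /* which CPU owns this stream */
	unsigned long uses;  /* how many decompressions used it */
};

/* One stream per CPU, playing the role of a percpu variable. */
static struct sketch_stream streams[SKETCH_NR_CPUS];

static void streams_init(void)
{
	for (int cpu = 0; cpu < SKETCH_NR_CPUS; cpu++) {
		streams[cpu].cpu = cpu;
		streams[cpu].uses = 0;
	}
}

/* Each CPU only ever touches its own stream, so no lock is needed, but
 * at most one decompression per CPU can run at a time: the "one thread
 * per core" limit noted above. */
static struct sketch_stream *stream_for_cpu(int cpu)
{
	struct sketch_stream *s = &streams[cpu];

	s->uses++;
	return s;
}
```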
-
Currently squashfs uses only one stream buffer for decompression,
which hurts parallel read performance, so this patch supports
multiple decompressors to improve parallel I/O performance.
Four 1G file dd read on a KVM machine which has 2 CPUs and 4G memory:
dd if=test/test1.dat of=/dev/null &
dd if=test/test2.dat of=/dev/null &
dd if=test/test3.dat of=/dev/null &
dd if=test/test4.dat of=/dev/null &
old : 1m39s -> new : 9s
* From v1
* Change comp_strm with decomp_strm - Phillip
* Change/add comments - Phillip
Signed-off-by: Minchan Kim
Signed-off-by: Phillip Lougher
-
The decompressor interface and code was written from
the point of view of single-threaded operation. In doing
so it mixed a lot of single-threaded implementation specific
aspects into the decompressor code and elsewhere which makes it
difficult to seamlessly support multiple different decompressor
implementations.
This patch does the following:
1. It removes compressor_options parsing from the decompressor
init() function. This allows the decompressor init() function
to be dynamically called to instantiate multiple decompressors,
without the compressor options needing to be read and parsed each
time.
2. It moves threading and all sleeping operations out of the
decompressors. In doing so, it makes the decompressors
non-blocking wrappers which only deal with interfacing with
the decompressor implementation.
3. It splits decompressor.[ch] into decompressor generic functions
in decompressor.[ch], and moves the single threaded
decompressor implementation into decompressor_single.c.
The result of this patch is Squashfs should now be able to
support multiple decompressors by adding new decompressor_xxx.c
files with specialised implementations of the functions in
decompressor_single.c.
Signed-off-by: Phillip Lougher
Reviewed-by: Minchan Kim
06 Sep, 2013
5 commits
-
We read the type field from disk. This value should be sanity
checked for correctness to avoid an out of bounds access when
reading the squashfs_filetype_table array.
Signed-off-by: Phillip Lougher
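A minimal sketch of the check, assuming a table modelled on a squashfs-type to DT_* mapping (the values are the usual Linux DT_* constants; the table contents and the -EIO return are assumptions of this sketch, not taken from the patch):

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical table mapping an on-disk type to a DT_* value:
 * UNKNOWN=0, DIR=4, REG=8, LNK=10, BLK=6, CHR=2, FIFO=1, SOCK=12. */
static const unsigned char filetype_table[] = { 0, 4, 8, 10, 6, 2, 1, 12 };

#define FILETYPE_COUNT (sizeof(filetype_table) / sizeof(filetype_table[0]))

/* 'type' was read from disk and is untrusted: validate it before using
 * it as an index, so corruption yields an error instead of an
 * out-of-bounds read. */
static int lookup_filetype(unsigned int type, unsigned char *out)
{
	if (type >= FILETYPE_COUNT)
		return -EIO;
	*out = filetype_table[type];
	return 0;
}
```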
-
We read the size (of the name) field from disk. This value should
be sanity checked for correctness to avoid blindly reading
huge amounts of unnecessary data from disk on corruption.
Note, here we're not actually reading the name into a buffer, but
skipping it, and so corruption doesn't cause buffer overflow, merely
large amounts of unnecessary data to be read.
Signed-off-by: Phillip Lougher
-
The dir_count and size fields when read from disk are sanity
checked for correctness. However, the sanity checks only check the
values are not greater than expected. As dir_count and size were
incorrectly defined as signed ints, this can lead to corrupted values
appearing as negative which are not trapped.
Signed-off-by: Phillip Lougher
-
The dir_count and size fields when read from disk are sanity
checked for correctness. However, the sanity checks only check the
values are not greater than expected. As dir_count and size were
incorrectly defined as signed ints, this can lead to corrupted values
appearing as negative which are not trapped.
Signed-off-by: Phillip Lougher
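The failure mode is easy to reproduce in plain C. A minimal sketch, assuming a hypothetical MAX_DIR_COUNT limit and a two's-complement target (where converting 0xffffffff to int yields -1):

```c
#include <stdbool.h>

/* Hypothetical upper bound on a directory header's entry count. */
#define MAX_DIR_COUNT 256

/* Buggy variant: the on-disk count is held in a signed int, so a
 * corrupted value such as 0xffffffff becomes -1 and passes the
 * "not greater than expected" check. */
static bool check_signed(int dir_count)
{
	return !(dir_count > MAX_DIR_COUNT);
}

/* Fixed variant: an unsigned count cannot be negative, so the same
 * single comparison rejects the corrupted value. */
static bool check_unsigned(unsigned int dir_count)
{
	return !(dir_count > MAX_DIR_COUNT);
}
```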
-
Patch "Squashfs: sanity check information from disk" from
Dan Carpenter adds a missing check for corruption in the
"size" field while reading the directory index from disk.
It, however, sets err to -EINVAL, but this value is not used later, and
so setting it is completely redundant. So remove it.
Errors in reading the index are deliberately non-fatal. If we
get an error in reading the index we just return the part of the
index we have managed to read - the index isn't essential,
just quicker.
Signed-off-by: Phillip Lougher
05 Sep, 2013
1 commit
-
Merged the two for loops. We might get a little gain by overlapping
wait_on_bh and the memcpy operations.
Signed-off-by: Manish Sharma
Signed-off-by: Phillip Lougher