Eric Lee / smarc-fsl-linux-kernel

28 May, 2016

2 commits

d102a56ed Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs ... Browse Code »

Pull vfs fixes from Al Viro:
"Followups to the parallel lookup work:

- update docs

- restore killability of the places that used to take ->i_mutex
killably now that we have down_write_killable() merged

- Additionally, it turns out that I missed a prerequisite for
security_d_instantiate() stuff - ->getxattr() wasn't the only thing
that could be called before dentry is attached to inode; with smack
we needed the same treatment applied to ->setxattr() as well"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
switch ->setxattr() to passing dentry and inode separately
switch xattr_handler->set() to passing dentry and inode separately
restore killability of old mutex_lock_killable(&inode->i_mutex) users
add down_write_killable_nested()
update D/f/directory-locking

Linus Torvalds
2016-05-28 08:14:05 +0800
593012268 switch xattr_handler->set() to passing dentry and inode separately ... Browse Code »

preparation for similar switch in ->setxattr() (see the next commit for
rationale).

Signed-off-by: Al Viro

Al Viro
2016-05-28 03:39:43 +0800

27 May, 2016

1 commit

315227f6d Merge tag 'dax-misc-for-4.7' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm ... Browse Code »

Pull misc DAX updates from Vishal Verma:
"DAX error handling for 4.7

- Until now, dax has been disabled if media errors were found on any
device. This enables the use of DAX in the presence of these
errors by making all sector-aligned zeroing go through the driver.

- The driver (already) has the ability to clear errors on writes that
are sent through the block layer using 'DSMs' defined in ACPI 6.1.

Other misc changes:

- When mounting DAX filesystems, check to make sure the partition is
page aligned. This is a requirement for DAX, and previously, we
allowed such unaligned mounts to succeed, but subsequent
reads/writes would fail.

- Misc/cleanup fixes from Jan that remove unused code from DAX
related to zeroing, writeback, and some size checks"

* tag 'dax-misc-for-4.7' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
dax: fix a comment in dax_zero_page_range and dax_truncate_page
dax: for truncate/hole-punch, do zeroing through the driver if possible
dax: export a low-level __dax_zero_page_range helper
dax: use sb_issue_zerout instead of calling dax_clear_sectors
dax: enable dax in the presence of known media errors (badblocks)
dax: fallback from pmd to pte on error
block: Update blkdev_dax_capable() for consistency
xfs: Add alignment check for DAX mount
ext2: Add alignment check for DAX mount
ext4: Add alignment check for DAX mount
block: Add bdev_dax_supported() for dax mount checks
block: Add vfs_msg() interface
dax: Remove redundant inode size checks
dax: Remove pointless writeback from dax_do_io()
dax: Remove zeroing from dax_io()
dax: Remove dead zeroing code from fault handlers
ext2: Avoid DAX zeroing to corrupt data
ext2: Fix block zeroing in ext2_get_blocks() for DAX
dax: Remove complete_unwritten argument
DAX: move RADIX_DAX_ definitions to dax.c

Linus Torvalds
2016-05-27 10:34:26 +0800

19 May, 2016

1 commit

3dc291610 dax: use sb_issue_zerout instead of calling dax_clear_sectors ... Browse Code »

dax_clear_sectors() cannot handle poisoned blocks. These must be
zeroed using the BIO interface instead. Convert ext2 and XFS to use
only sb_issue_zerout().

Reviewed-by: Jeff Moyer
Reviewed-by: Christoph Hellwig
Reviewed-by: Jan Kara
Signed-off-by: Matthew Wilcox
[vishal: Also remove the dax_clear_sectors function entirely]
Signed-off-by: Vishal Verma

Matthew Wilcox
2016-05-19 02:16:56 +0800

18 May, 2016

1 commit

c2e7b2070 Merge branch 'work.preadv2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs ... Browse Code »

Pull vfs cleanups from Al Viro:
"More cleanups from Christoph"

* 'work.preadv2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
nfsd: use RWF_SYNC
fs: add RWF_DSYNC aand RWF_SYNC
ceph: use generic_write_sync
fs: simplify the generic_write_sync prototype
fs: add IOCB_SYNC and IOCB_DSYNC
direct-io: remove the offset argument to dio_complete
direct-io: eliminate the offset argument to ->direct_IO
xfs: eliminate the pos variable in xfs_file_dio_aio_write
filemap: remove the pos argument to generic_file_direct_write
filemap: remove pos variables in generic_file_read_iter

Linus Torvalds
2016-05-18 06:05:23 +0800

17 May, 2016

4 commits

284854be2 ext2: Add alignment check for DAX mount ... Browse Code »

When a partition is not aligned by 4KB, mount -o dax succeeds,
but any read/write access to the filesystem fails, except for
metadata update.

Call bdev_dax_supported() to perform proper precondition checks
which includes this partition alignment check.

Signed-off-by: Toshi Kani
Reviewed-by: Jan Kara
Cc: Jan Kara
Cc: Dan Williams
Cc: Ross Zwisler
Cc: Christoph Hellwig
Cc: Boaz Harrosh
Signed-off-by: Vishal Verma

Toshi Kani
2016-05-17 14:44:12 +0800
86b0624e4 ext2: Avoid DAX zeroing to corrupt data ... Browse Code »

Currently ext2 zeroes any data blocks allocated for DAX inode however it
still returns them as BH_New. Thus DAX code zeroes them again in
dax_insert_mapping() which can possibly overwrite the data that has been
already stored to those blocks by a racing dax_io(). Avoid marking
pre-zeroed buffers as new.

Reviewed-by: Ross Zwisler
Signed-off-by: Jan Kara
Signed-off-by: Vishal Verma

Jan Kara
2016-05-17 14:44:07 +0800
9b6cd5f76 ext2: Fix block zeroing in ext2_get_blocks() for DAX ... Browse Code »

When zeroing allocated blocks for DAX, we accidentally zeroed only the
first allocated block instead of all of them. So far this problem is
hidden by the fact that page faults always need only a single block and
DAX write code zeroes blocks again. But the zeroing in DAX code is racy
and needs to be removed so fix the zeroing in ext2 to zero all allocated
blocks.

Reported-by: Ross Zwisler
Signed-off-by: Jan Kara
Signed-off-by: Vishal Verma

Jan Kara
2016-05-17 14:41:24 +0800
02fbd1397 dax: Remove complete_unwritten argument ... Browse Code »

Fault handlers currently take complete_unwritten argument to convert
unwritten extents after PTEs are updated. However no filesystem uses
this anymore as the code is racy. Remove the unused argument.

Reviewed-by: Ross Zwisler
Signed-off-by: Jan Kara
Signed-off-by: Vishal Verma

Jan Kara
2016-05-17 08:11:51 +0800

03 May, 2016

3 commits

3b0a3c1ac simple local filesystems: switch to ->iterate_shared() ... Browse Code »

no changes needed (XFS isn't simple, but it has the same parallelism
in the interesting parts exercised from CXFS).

Signed-off-by: Al Viro

Al Viro
2016-05-03 07:49:32 +0800
be5b82dbf make ext2_get_page() and friends work without external serialization ... Browse Code »

Right now ext2_get_page() (and its analogues in a bunch of other filesystems)
relies upon the directory being locked - the way it sets and tests Checked and
Error bits would be racy without that. Switch to a slightly different scheme,
_not_ setting Checked in case of failure. That way the logics becomes
if Checked => OK
else if Error => fail
else if !validate => fail
else => OK
with validation setting Checked or Error on success and failure resp. and
returning which one had happened. Equivalent to the current logics, but unlike
the current logics not sensitive to the order of set_bit, test_bit getting
reordered by CPU, etc.

Signed-off-by: Al Viro

Al Viro
2016-05-03 07:47:25 +0800
84695ffee Merge getxattr prototype change into work.lookups ... Browse Code »

The rest of work.xattr stuff isn't needed for this branch

Al Viro
2016-05-03 07:45:47 +0800

02 May, 2016

1 commit

c8b8e32d7 direct-io: eliminate the offset argument to ->direct_IO ... Browse Code »

Including blkdev_direct_IO and dax_do_io. It has to be ki_pos to actually
work, so eliminate the superflous argument.

Signed-off-by: Christoph Hellwig
Signed-off-by: Al Viro

Christoph Hellwig
2016-05-02 07:58:39 +0800

11 Apr, 2016

2 commits

b296821a7 xattr_handler: pass dentry and inode as separate arguments of ->get() ... Browse Code »

... and do not assume they are already attached to each other

Signed-off-by: Al Viro

Al Viro
2016-04-11 08:48:24 +0800
fc64005c9 don't bother with ->d_inode->i_sb - it's always equal to ->d_sb ... Browse Code »

... and neither can ever be NULL

Signed-off-by: Al Viro

Al Viro
2016-04-11 05:11:51 +0800

05 Apr, 2016

2 commits

ea1754a08 mm, fs: remove remaining PAGE_CACHE_* and page_cache_{get,release} usage ... Browse Code »

Mostly direct substitution with occasional adjustment or removing
outdated comments.

Signed-off-by: Kirill A. Shutemov
Acked-by: Michal Hocko
Signed-off-by: Linus Torvalds

Kirill A. Shutemov
2016-04-05 01:41:08 +0800
09cbfeaf1 mm, fs: get rid of PAGE_CACHE_* and page_cache_{get,release} macros ... Browse Code »

PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} macros were introduced *long* time
ago with promise that one day it will be possible to implement page
cache with bigger chunks than PAGE_SIZE.

This promise never materialized. And unlikely will.

We have many places where PAGE_CACHE_SIZE assumed to be equal to
PAGE_SIZE. And it's constant source of confusion on whether
PAGE_CACHE_* or PAGE_* constant should be used in a particular case,
especially on the border between fs and mm.

Global switching to PAGE_CACHE_SIZE != PAGE_SIZE would cause to much
breakage to be doable.

Let's stop pretending that pages in page cache are special. They are
not.

The changes are pretty straight-forward:

- << (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> ;

- >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> ;

- PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} -> PAGE_{SIZE,SHIFT,MASK,ALIGN};

- page_cache_get() -> get_page();

- page_cache_release() -> put_page();

This patch contains automated changes generated with coccinelle using
script below. For some reason, coccinelle doesn't patch header files.
I've called spatch for them manually.

The only adjustment after coccinelle is revert of changes to
PAGE_CAHCE_ALIGN definition: we are going to drop it later.

There are few places in the code where coccinelle didn't reach. I'll
fix them manually in a separate patch. Comments and documentation also
will be addressed with the separate patch.

virtual patch

@@
expression E;
@@
- E << (PAGE_CACHE_SHIFT - PAGE_SHIFT)
+ E

@@
expression E;
@@
- E >> (PAGE_CACHE_SHIFT - PAGE_SHIFT)
+ E

@@
@@
- PAGE_CACHE_SHIFT
+ PAGE_SHIFT

@@
@@
- PAGE_CACHE_SIZE
+ PAGE_SIZE

@@
@@
- PAGE_CACHE_MASK
+ PAGE_MASK

@@
expression E;
@@
- PAGE_CACHE_ALIGN(E)
+ PAGE_ALIGN(E)

@@
expression E;
@@
- page_cache_get(E)
+ get_page(E)

@@
expression E;
@@
- page_cache_release(E)
+ put_page(E)

Signed-off-by: Kirill A. Shutemov
Acked-by: Michal Hocko
Signed-off-by: Linus Torvalds

Kirill A. Shutemov
2016-04-05 01:41:08 +0800

31 Mar, 2016

1 commit

b8a7a3a66 posix_acl: Inode acl caching fixes ... Browse Code »

When get_acl() is called for an inode whose ACL is not cached yet, the
get_acl inode operation is called to fetch the ACL from the filesystem.
The inode operation is responsible for updating the cached acl with
set_cached_acl(). This is done without locking at the VFS level, so
another task can call set_cached_acl() or forget_cached_acl() before the
get_acl inode operation gets to calling set_cached_acl(), and then
get_acl's call to set_cached_acl() results in caching an outdate ACL.

Prevent this from happening by setting the cached ACL pointer to a
task-specific sentinel value before calling the get_acl inode operation.
Move the responsibility for updating the cached ACL from the get_acl
inode operations to get_acl(). There, only set the cached ACL if the
sentinel value hasn't changed.

The sentinel values are chosen to have odd values. Likewise, the value
of ACL_NOT_CACHED is odd. In contrast, ACL object pointers always have
an even value (ACLs are aligned in memory). This allows to distinguish
uncached ACLs values from ACL objects.

In addition, switch from guarding inode->i_acl and inode->i_default_acl
upates by the inode->i_lock spinlock to using xchg() and cmpxchg().

Filesystems that do not want ACLs returned from their get_acl inode
operations to be cached must call forget_cached_acl() to prevent the VFS
from doing so.

(Patch written by Al Viro and Andreas Gruenbacher.)

Signed-off-by: Andreas Gruenbacher
Signed-off-by: Al Viro

Andreas Gruenbacher
2016-03-31 12:30:15 +0800

18 Mar, 2016

1 commit

faeb20ecf Merge tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 ... Browse Code »

Pull ext4 updates from Ted Ts'o:
"Performance improvements in SEEK_DATA and xattr scalability
improvements, plus a lot of clean ups and bug fixes"

* tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (38 commits)
ext4: clean up error handling in the MMP support
jbd2: do not fail journal because of frozen_buffer allocation failure
ext4: use __GFP_NOFAIL in ext4_free_blocks()
ext4: fix compile error while opening the macro DOUBLE_CHECK
ext4: print ext4 mount option data_err=abort correctly
ext4: fix NULL pointer dereference in ext4_mark_inode_dirty()
ext4: drop unneeded BUFFER_TRACE in ext4_delete_inline_entry()
ext4: fix misspellings in comments.
jbd2: fix FS corruption possibility in jbd2_journal_destroy() on umount path
ext4: more efficient SEEK_DATA implementation
ext4: cleanup handling of bh->b_state in DAX mmap
ext4: return hole from ext4_map_blocks()
ext4: factor out determining of hole size
ext4: fix setting of referenced bit in ext4_es_lookup_extent()
ext4: remove i_ioend_count
ext4: simplify io_end handling for AIO DIO
ext4: move trans handling and completion deferal out of _ext4_get_block
ext4: rename and split get blocks functions
ext4: use i_mutex to serialize unaligned AIO DIO
ext4: pack ioend structure better
...

Linus Torvalds
2016-03-18 07:31:18 +0800

28 Feb, 2016

5 commits

691429e13 Merge branch 'akpm' (patches from Andrew) ... Browse Code »

Merge fixes from Andrew Morton:
"10 fixes"

* emailed patches from Andrew Morton :
dax: move writeback calls into the filesystems
dax: give DAX clearing code correct bdev
ext4: online defrag not supported with DAX
ext2, ext4: only set S_DAX for regular inodes
block: disable block device DAX by default
ocfs2: unlock inode if deleting inode from orphan fails
mm: ASLR: use get_random_long()
drivers: char: random: add get_random_long()
mm: numa: quickly fail allocations for NUMA balancing on full nodes
mm: thp: fix SMP race condition between THP page fault and MADV_DONTNEED

Linus Torvalds
2016-02-28 04:46:16 +0800
1e9d180ba ext2, ext4: fix issue with missing journal entry in ext4_dax_mkwrite() ... Browse Code »

As it is currently written ext4_dax_mkwrite() assumes that the call into
__dax_mkwrite() will not have to do a block allocation so it doesn't create
a journal entry. For a read that creates a zero page to cover a hole
followed by a write that actually allocates storage this is incorrect. The
ext4_dax_mkwrite() -> __dax_mkwrite() -> __dax_fault() path calls
get_blocks() to allocate storage.

Fix this by having the ->page_mkwrite fault handler call ext4_dax_fault()
as this function already has all the logic needed to allocate a journal
entry and call __dax_fault().

Also update the ext2 fault handlers in this same way to remove duplicate
code and keep the logic between ext2 and ext4 the same.

Reviewed-by: Jan Kara
Signed-off-by: Ross Zwisler
Signed-off-by: Theodore Ts'o

Ross Zwisler
2016-02-28 03:01:16 +0800
7f6d5b529 dax: move writeback calls into the filesystems ... Browse Code »

Previously calls to dax_writeback_mapping_range() for all DAX filesystems
(ext2, ext4 & xfs) were centralized in filemap_write_and_wait_range().

dax_writeback_mapping_range() needs a struct block_device, and it used
to get that from inode->i_sb->s_bdev. This is correct for normal inodes
mounted on ext2, ext4 and XFS filesystems, but is incorrect for DAX raw
block devices and for XFS real-time files.

Instead, call dax_writeback_mapping_range() directly from the filesystem
->writepages function so that it can supply us with a valid block
device. This also fixes DAX code to properly flush caches in response
to sync(2).

Signed-off-by: Ross Zwisler
Signed-off-by: Jan Kara
Cc: Al Viro
Cc: Dan Williams
Cc: Dave Chinner
Cc: Jens Axboe
Cc: Matthew Wilcox
Cc: Theodore Ts'o
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Ross Zwisler
2016-02-28 02:28:52 +0800
20a90f589 dax: give DAX clearing code correct bdev ... Browse Code »

dax_clear_blocks() needs a valid struct block_device and previously it
was using inode->i_sb->s_bdev in all cases. This is correct for normal
inodes on mounted ext2, ext4 and XFS filesystems, but is incorrect for
DAX raw block devices and for XFS real-time devices.

Instead, rename dax_clear_blocks() to dax_clear_sectors(), and change
its arguments to take a bdev and a sector instead of an inode and a
block. This better reflects what the function does, and it allows the
filesystem and raw block device code to pass in an appropriate struct
block_device.

Signed-off-by: Ross Zwisler
Suggested-by: Dan Williams
Reviewed-by: Jan Kara
Cc: Theodore Ts'o
Cc: Al Viro
Cc: Dave Chinner
Cc: Jens Axboe
Cc: Matthew Wilcox
Cc: Ross Zwisler
Cc: Theodore Ts'o
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Ross Zwisler
2016-02-28 02:28:52 +0800
0a6cf9137 ext2, ext4: only set S_DAX for regular inodes ... Browse Code »

When S_DAX is set on an inode we assume that if there are pages attached
to the mapping (mapping->nrpages != 0), those pages are clean zero pages
that were used to service reads from holes. Any dirty data associated
with the inode should be in the form of DAX exceptional entries
(mapping->nrexceptional) that is written back via
dax_writeback_mapping_range().

With the current code, though, this isn't always true. For example,
ext2 and ext4 directory inodes can have S_DAX set, but have their dirty
data stored as dirty page cache entries. For these types of inodes,
having S_DAX set doesn't really make sense since their I/O doesn't
actually happen through the DAX code path.

Instead, only allow S_DAX to be set for regular inodes for ext2 and
ext4. This allows us to have strict DAX vs non-DAX paths in the
writeback code.

Signed-off-by: Ross Zwisler
Reviewed-by: Jan Kara
Cc: Theodore Ts'o
Cc: Al Viro
Cc: Dan Williams
Cc: Dave Chinner
Cc: Jens Axboe
Cc: Matthew Wilcox
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Ross Zwisler
2016-02-28 02:28:52 +0800

23 Feb, 2016

3 commits

6048c64b2 mbcache: add reusable flag to cache entries ... Browse Code »

To reduce amount of damage caused by single bad block, we limit number
of inodes sharing an xattr block to 1024. Thus there can be more xattr
blocks with the same contents when there are lots of files with the same
extended attributes. These xattr blocks naturally result in hash
collisions and can form long hash chains and we unnecessarily check each
such block only to find out we cannot use it because it is already
shared by too many inodes.

Add a reusable flag to cache entries which is cleared when a cache entry
has reached its maximum refcount. Cache entries which are not marked
reusable are skipped by mb_cache_entry_find_{first,next}. This
significantly speeds up mbcache when there are many same xattr blocks.
For example for xattr-bench with 5 values and each process handling
20000 files, the run for 64 processes is 25x faster with this patch.
Even for 8 processes the speedup is almost 3x. We have also verified
that for situations where there is only one xattr block of each kind,
the patch doesn't have a measurable cost.

[JK: Remove handling of setting the same value since it is not needed
anymore, check for races in e_reusable setting, improve changelog,
add measurements]

Signed-off-by: Andreas Gruenbacher
Signed-off-by: Jan Kara
Signed-off-by: Theodore Ts'o

Andreas Gruenbacher
2016-02-23 11:44:04 +0800
7a2508e1b mbcache2: rename to mbcache ... Browse Code »

Since old mbcache code is gone, let's rename new code to mbcache since
number 2 is now meaningless. This is just a mechanical replacement.

Signed-off-by: Jan Kara
Signed-off-by: Theodore Ts'o

Jan Kara
2016-02-23 11:35:22 +0800
be0726d33 ext2: convert to mbcache2 ... Browse Code »

The conversion is generally straightforward. We convert filesystem from
a global cache to per-fs one. Similarly to ext4 the tricky part is that
xattr block corresponding to found mbcache entry can get freed before we
get buffer lock for that block. So we have to check whether the entry is
still valid after getting the buffer lock.

Signed-off-by: Jan Kara
Signed-off-by: Theodore Ts'o

Jan Kara
2016-02-23 00:56:38 +0800

24 Jan, 2016

1 commit

cc673757e Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs ... Browse Code »

Pull final vfs updates from Al Viro:

- The ->i_mutex wrappers (with small prereq in lustre)

- a fix for too early freeing of symlink bodies on shmem (they need to
be RCU-delayed) (-stable fodder)

- followup to dedupe stuff merged this cycle

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
vfs: abort dedupe loop if fatal signals are pending
make sure that freeing shmem fast symlinks is RCU-delayed
wrappers for ->i_mutex access
lustre: remove unused declaration

Linus Torvalds
2016-01-24 04:24:56 +0800

23 Jan, 2016

2 commits

80b4adcaf ext2: call dax_pfn_mkwrite() for DAX fsync/msync ... Browse Code »

To properly support the new DAX fsync/msync infrastructure filesystems
need to call dax_pfn_mkwrite() so that DAX can track when user pages are
dirtied.

Signed-off-by: Ross Zwisler
Cc: "H. Peter Anvin"
Cc: "J. Bruce Fields"
Cc: "Theodore Ts'o"
Cc: Alexander Viro
Cc: Andreas Dilger
Cc: Dave Chinner
Cc: Ingo Molnar
Cc: Jan Kara
Cc: Jeff Layton
Cc: Matthew Wilcox
Cc: Thomas Gleixner
Cc: Dan Williams
Cc: Matthew Wilcox
Cc: Dave Hansen
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Ross Zwisler
2016-01-23 09:02:18 +0800
5955102c9 wrappers for ->i_mutex access ... Browse Code »

parallel to mutex_{lock,unlock,trylock,is_locked,lock_nested},
inode_foo(inode) being mutex_foo(&inode->i_mutex).

Please, use those for access to ->i_mutex; over the coming cycle
->i_mutex will become rwsem, with ->lookup() done with it held
only shared.

Signed-off-by: Al Viro

Al Viro
2016-01-23 07:04:28 +0800

15 Jan, 2016

1 commit

5d097056c kmemcg: account certain kmem allocations to memcg ... Browse Code »

Mark those kmem allocations that are known to be easily triggered from
userspace as __GFP_ACCOUNT/SLAB_ACCOUNT, which makes them accounted to
memcg. For the list, see below:

- threadinfo
- task_struct
- task_delay_info
- pid
- cred
- mm_struct
- vm_area_struct and vm_region (nommu)
- anon_vma and anon_vma_chain
- signal_struct
- sighand_struct
- fs_struct
- files_struct
- fdtable and fdtable->full_fds_bits
- dentry and external_name
- inode for all filesystems. This is the most tedious part, because
most filesystems overwrite the alloc_inode method.

The list is far from complete, so feel free to add more objects.
Nevertheless, it should be close to "account everything" approach and
keep most workloads within bounds. Malevolent users will be able to
breach the limit, but this was possible even with the former "account
everything" approach (simply because it did not account everything in
fact).

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Vladimir Davydov
Acked-by: Johannes Weiner
Acked-by: Michal Hocko
Cc: Tejun Heo
Cc: Greg Thelen
Cc: Christoph Lameter
Cc: Pekka Enberg
Cc: David Rientjes
Cc: Joonsoo Kim
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Vladimir Davydov
2016-01-15 08:00:49 +0800

13 Jan, 2016

1 commit

33caf82ac Merge branch 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs ... Browse Code »

Pull misc vfs updates from Al Viro:
"All kinds of stuff. That probably should've been 5 or 6 separate
branches, but by the time I'd realized how large and mixed that bag
had become it had been too close to -final to play with rebasing.

Some fs/namei.c cleanups there, memdup_user_nul() introduction and
switching open-coded instances, burying long-dead code, whack-a-mole
of various kinds, several new helpers for ->llseek(), assorted
cleanups and fixes from various people, etc.

One piece probably deserves special mention - Neil's
lookup_one_len_unlocked(). Similar to lookup_one_len(), but gets
called without ->i_mutex and tries to avoid ever taking it. That, of
course, means that it's not useful for any directory modifications,
but things like getting inode attributes in nfds readdirplus are fine
with that. I really should've asked for moratorium on lookup-related
changes this cycle, but since I hadn't done that early enough... I
*am* asking for that for the coming cycle, though - I'm going to try
and get conversion of i_mutex to rwsem with ->lookup() done under lock
taken shared.

There will be a patch closer to the end of the window, along the lines
of the one Linus had posted last May - mechanical conversion of
->i_mutex accesses to inode_lock()/inode_unlock()/inode_trylock()/
inode_is_locked()/inode_lock_nested(). To quote Linus back then:

-----
| This is an automated patch using
|
| sed 's/mutex_lock(&\(.*\)->i_mutex)/inode_lock(\1)/'
| sed 's/mutex_unlock(&\(.*\)->i_mutex)/inode_unlock(\1)/'
| sed 's/mutex_lock_nested(&\(.*\)->i_mutex,[ ]*I_MUTEX_\([A-Z0-9_]*\))/inode_lock_nested(\1, I_MUTEX_\2)/'
| sed 's/mutex_is_locked(&\(.*\)->i_mutex)/inode_is_locked(\1)/'
| sed 's/mutex_trylock(&\(.*\)->i_mutex)/inode_trylock(\1)/'
|
| with a very few manual fixups
-----

I'm going to send that once the ->i_mutex-affecting stuff in -next
gets mostly merged (or when Linus says he's about to stop taking
merges)"

* 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (63 commits)
nfsd: don't hold i_mutex over userspace upcalls
fs:affs:Replace time_t with time64_t
fs/9p: use fscache mutex rather than spinlock
proc: add a reschedule point in proc_readfd_common()
logfs: constify logfs_block_ops structures
fcntl: allow to set O_DIRECT flag on pipe
fs: __generic_file_splice_read retry lookup on AOP_TRUNCATED_PAGE
fs: xattr: Use kvfree()
[s390] page_to_phys() always returns a multiple of PAGE_SIZE
nbd: use ->compat_ioctl()
fs: use block_device name vsprintf helper
lib/vsprintf: add %*pg format specifier
fs: use gendisk->disk_name where possible
poll: plug an unused argument to do_poll
amdkfd: don't open-code memdup_user()
cdrom: don't open-code memdup_user()
rsxx: don't open-code memdup_user()
mtip32xx: don't open-code memdup_user()
[um] mconsole: don't open-code memdup_user_nul()
[um] hostaudio: don't open-code memdup_user()
...

Linus Torvalds
2016-01-13 09:11:47 +0800

12 Jan, 2016

1 commit

ddf1d6238 Merge branch 'work.xattr' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs ... Browse Code »

Pull vfs xattr updates from Al Viro:
"Andreas' xattr cleanup series.

It's a followup to his xattr work that went in last cycle; -0.5KLoC"

* 'work.xattr' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
xattr handlers: Simplify list operation
ocfs2: Replace list xattr handler operations
nfs: Move call to security_inode_listsecurity into nfs_listxattr
xfs: Change how listxattr generates synthetic attributes
tmpfs: listxattr should include POSIX ACL xattrs
tmpfs: Use xattr handler infrastructure
btrfs: Use xattr handler infrastructure
vfs: Distinguish between full xattr names and proper prefixes
posix acls: Remove duplicate xattr name definitions
gfs2: Remove gfs2_xattr_acl_chmod
vfs: Remove vfs_xattr_cmp

Linus Torvalds
2016-01-12 05:32:10 +0800

07 Jan, 2016

1 commit

a1c6f0573 fs: use block_device name vsprintf helper ... Browse Code »

Signed-off-by: Dmitry Monakhov
Signed-off-by: Al Viro

Dmitry Monakhov
2016-01-07 02:03:18 +0800

31 Dec, 2015

1 commit

fceef393a switch ->get_link() to delayed_call, kill ->put_link() ... Browse Code »

Signed-off-by: Al Viro

Al Viro
2015-12-31 02:01:03 +0800

14 Dec, 2015

1 commit

764a5c6b1 xattr handlers: Simplify list operation ... Browse Code »

Change the list operation to only return whether or not an attribute
should be listed. Copying the attribute names into the buffer is moved
to the callers.

Since the result only depends on the dentry and not on the attribute
name, we do not pass the attribute name to list operations.

Signed-off-by: Andreas Gruenbacher
Signed-off-by: Al Viro

Andreas Gruenbacher
2015-12-14 08:46:12 +0800

09 Dec, 2015

2 commits

6b2553918 replace ->follow_link() with new method that could stay in RCU mode ... Browse Code »

new method: ->get_link(); replacement of ->follow_link(). The differences
are:
* inode and dentry are passed separately
* might be called both in RCU and non-RCU mode;
the former is indicated by passing it a NULL dentry.
* when called that way it isn't allowed to block
and should return ERR_PTR(-ECHILD) if it needs to be called
in non-RCU mode.

It's a flagday change - the old method is gone, all in-tree instances
converted. Conversion isn't hard; said that, so far very few instances
do not immediately bail out when called in RCU mode. That'll change
in the next commits.

Signed-off-by: Al Viro

Al Viro
2015-12-09 11:41:54 +0800
21fc61c73 don't put symlink bodies in pagecache into highmem ... Browse Code »

kmap() in page_follow_link_light() needed to go - allowing to hold
an arbitrary number of kmaps for long is a great way to deadlocking
the system.

new helper (inode_nohighmem(inode)) needs to be used for pagecache
symlinks inodes; done for all in-tree cases. page_follow_link_light()
instrumented to yell about anything missed.

Signed-off-by: Al Viro

Al Viro
2015-12-09 11:41:36 +0800

07 Dec, 2015

1 commit

98e9cb571 vfs: Distinguish between full xattr names and proper prefixes ... Browse Code »

Add an additional "name" field to struct xattr_handler. When the name
is set, the handler matches attributes with exactly that name. When the
prefix is set instead, the handler matches attributes with the given
prefix and with a non-empty suffix.

This patch should avoid bugs like the one fixed in commit c361016a in
the future.

Signed-off-by: Andreas Gruenbacher
Reviewed-by: James Morris
Signed-off-by: Al Viro

Andreas Gruenbacher
2015-12-07 10:33:52 +0800

17 Nov, 2015

1 commit

ef83b6e8f ext2, ext4: warn when mounting with dax enabled ... Browse Code »

Similar to XFS warn when mounting DAX while it is still considered under
development. Also, aspects of the DAX implementation, for example
synchronization against multiple faults and faults causing block
allocation, depend on the correct implementation in the filesystem. The
maturity of a given DAX implementation is filesystem specific.

Cc:
Cc: "Theodore Ts'o"
Cc: Matthew Wilcox
Cc: linux-ext4@vger.kernel.org
Cc: Kirill A. Shutemov
Reported-by: Dave Chinner
Acked-by: Jan Kara
Signed-off-by: Dan Williams

Dan Williams
2015-11-17 01:43:54 +0800