Eric Lee / smarc-fsl-linux-kernel

05 Apr, 2016

2 commits

ea1754a08 mm, fs: remove remaining PAGE_CACHE_* and page_cache_{get,release} usage ... Browse Code »

Mostly direct substitution with occasional adjustment or removing
outdated comments.

Signed-off-by: Kirill A. Shutemov
Acked-by: Michal Hocko
Signed-off-by: Linus Torvalds

Kirill A. Shutemov
2016-04-05 01:41:08 +0800
09cbfeaf1 mm, fs: get rid of PAGE_CACHE_* and page_cache_{get,release} macros ... Browse Code »

PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} macros were introduced *long* time
ago with promise that one day it will be possible to implement page
cache with bigger chunks than PAGE_SIZE.

This promise never materialized. And unlikely will.

We have many places where PAGE_CACHE_SIZE assumed to be equal to
PAGE_SIZE. And it's constant source of confusion on whether
PAGE_CACHE_* or PAGE_* constant should be used in a particular case,
especially on the border between fs and mm.

Global switching to PAGE_CACHE_SIZE != PAGE_SIZE would cause to much
breakage to be doable.

Let's stop pretending that pages in page cache are special. They are
not.

The changes are pretty straight-forward:

- << (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> ;

- >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> ;

- PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} -> PAGE_{SIZE,SHIFT,MASK,ALIGN};

- page_cache_get() -> get_page();

- page_cache_release() -> put_page();

This patch contains automated changes generated with coccinelle using
script below. For some reason, coccinelle doesn't patch header files.
I've called spatch for them manually.

The only adjustment after coccinelle is revert of changes to
PAGE_CAHCE_ALIGN definition: we are going to drop it later.

There are few places in the code where coccinelle didn't reach. I'll
fix them manually in a separate patch. Comments and documentation also
will be addressed with the separate patch.

virtual patch

@@
expression E;
@@
- E << (PAGE_CACHE_SHIFT - PAGE_SHIFT)
+ E

@@
expression E;
@@
- E >> (PAGE_CACHE_SHIFT - PAGE_SHIFT)
+ E

@@
@@
- PAGE_CACHE_SHIFT
+ PAGE_SHIFT

@@
@@
- PAGE_CACHE_SIZE
+ PAGE_SIZE

@@
@@
- PAGE_CACHE_MASK
+ PAGE_MASK

@@
expression E;
@@
- PAGE_CACHE_ALIGN(E)
+ PAGE_ALIGN(E)

@@
expression E;
@@
- page_cache_get(E)
+ get_page(E)

@@
expression E;
@@
- page_cache_release(E)
+ put_page(E)

Signed-off-by: Kirill A. Shutemov
Acked-by: Michal Hocko
Signed-off-by: Linus Torvalds

Kirill A. Shutemov
2016-04-05 01:41:08 +0800

18 Mar, 2016

1 commit

faeb20ecf Merge tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 ... Browse Code »

Pull ext4 updates from Ted Ts'o:
"Performance improvements in SEEK_DATA and xattr scalability
improvements, plus a lot of clean ups and bug fixes"

* tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (38 commits)
ext4: clean up error handling in the MMP support
jbd2: do not fail journal because of frozen_buffer allocation failure
ext4: use __GFP_NOFAIL in ext4_free_blocks()
ext4: fix compile error while opening the macro DOUBLE_CHECK
ext4: print ext4 mount option data_err=abort correctly
ext4: fix NULL pointer dereference in ext4_mark_inode_dirty()
ext4: drop unneeded BUFFER_TRACE in ext4_delete_inline_entry()
ext4: fix misspellings in comments.
jbd2: fix FS corruption possibility in jbd2_journal_destroy() on umount path
ext4: more efficient SEEK_DATA implementation
ext4: cleanup handling of bh->b_state in DAX mmap
ext4: return hole from ext4_map_blocks()
ext4: factor out determining of hole size
ext4: fix setting of referenced bit in ext4_es_lookup_extent()
ext4: remove i_ioend_count
ext4: simplify io_end handling for AIO DIO
ext4: move trans handling and completion deferal out of _ext4_get_block
ext4: rename and split get blocks functions
ext4: use i_mutex to serialize unaligned AIO DIO
ext4: pack ioend structure better
...

Linus Torvalds
2016-03-18 07:31:18 +0800

28 Feb, 2016

5 commits

691429e13 Merge branch 'akpm' (patches from Andrew) ... Browse Code »

Merge fixes from Andrew Morton:
"10 fixes"

* emailed patches from Andrew Morton :
dax: move writeback calls into the filesystems
dax: give DAX clearing code correct bdev
ext4: online defrag not supported with DAX
ext2, ext4: only set S_DAX for regular inodes
block: disable block device DAX by default
ocfs2: unlock inode if deleting inode from orphan fails
mm: ASLR: use get_random_long()
drivers: char: random: add get_random_long()
mm: numa: quickly fail allocations for NUMA balancing on full nodes
mm: thp: fix SMP race condition between THP page fault and MADV_DONTNEED

Linus Torvalds
2016-02-28 04:46:16 +0800
1e9d180ba ext2, ext4: fix issue with missing journal entry in ext4_dax_mkwrite() ... Browse Code »

As it is currently written ext4_dax_mkwrite() assumes that the call into
__dax_mkwrite() will not have to do a block allocation so it doesn't create
a journal entry. For a read that creates a zero page to cover a hole
followed by a write that actually allocates storage this is incorrect. The
ext4_dax_mkwrite() -> __dax_mkwrite() -> __dax_fault() path calls
get_blocks() to allocate storage.

Fix this by having the ->page_mkwrite fault handler call ext4_dax_fault()
as this function already has all the logic needed to allocate a journal
entry and call __dax_fault().

Also update the ext2 fault handlers in this same way to remove duplicate
code and keep the logic between ext2 and ext4 the same.

Reviewed-by: Jan Kara
Signed-off-by: Ross Zwisler
Signed-off-by: Theodore Ts'o

Ross Zwisler
2016-02-28 03:01:16 +0800
7f6d5b529 dax: move writeback calls into the filesystems ... Browse Code »

Previously calls to dax_writeback_mapping_range() for all DAX filesystems
(ext2, ext4 & xfs) were centralized in filemap_write_and_wait_range().

dax_writeback_mapping_range() needs a struct block_device, and it used
to get that from inode->i_sb->s_bdev. This is correct for normal inodes
mounted on ext2, ext4 and XFS filesystems, but is incorrect for DAX raw
block devices and for XFS real-time files.

Instead, call dax_writeback_mapping_range() directly from the filesystem
->writepages function so that it can supply us with a valid block
device. This also fixes DAX code to properly flush caches in response
to sync(2).

Signed-off-by: Ross Zwisler
Signed-off-by: Jan Kara
Cc: Al Viro
Cc: Dan Williams
Cc: Dave Chinner
Cc: Jens Axboe
Cc: Matthew Wilcox
Cc: Theodore Ts'o
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Ross Zwisler
2016-02-28 02:28:52 +0800
20a90f589 dax: give DAX clearing code correct bdev ... Browse Code »

dax_clear_blocks() needs a valid struct block_device and previously it
was using inode->i_sb->s_bdev in all cases. This is correct for normal
inodes on mounted ext2, ext4 and XFS filesystems, but is incorrect for
DAX raw block devices and for XFS real-time devices.

Instead, rename dax_clear_blocks() to dax_clear_sectors(), and change
its arguments to take a bdev and a sector instead of an inode and a
block. This better reflects what the function does, and it allows the
filesystem and raw block device code to pass in an appropriate struct
block_device.

Signed-off-by: Ross Zwisler
Suggested-by: Dan Williams
Reviewed-by: Jan Kara
Cc: Theodore Ts'o
Cc: Al Viro
Cc: Dave Chinner
Cc: Jens Axboe
Cc: Matthew Wilcox
Cc: Ross Zwisler
Cc: Theodore Ts'o
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Ross Zwisler
2016-02-28 02:28:52 +0800
0a6cf9137 ext2, ext4: only set S_DAX for regular inodes ... Browse Code »

When S_DAX is set on an inode we assume that if there are pages attached
to the mapping (mapping->nrpages != 0), those pages are clean zero pages
that were used to service reads from holes. Any dirty data associated
with the inode should be in the form of DAX exceptional entries
(mapping->nrexceptional) that is written back via
dax_writeback_mapping_range().

With the current code, though, this isn't always true. For example,
ext2 and ext4 directory inodes can have S_DAX set, but have their dirty
data stored as dirty page cache entries. For these types of inodes,
having S_DAX set doesn't really make sense since their I/O doesn't
actually happen through the DAX code path.

Instead, only allow S_DAX to be set for regular inodes for ext2 and
ext4. This allows us to have strict DAX vs non-DAX paths in the
writeback code.

Signed-off-by: Ross Zwisler
Reviewed-by: Jan Kara
Cc: Theodore Ts'o
Cc: Al Viro
Cc: Dan Williams
Cc: Dave Chinner
Cc: Jens Axboe
Cc: Matthew Wilcox
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Ross Zwisler
2016-02-28 02:28:52 +0800

23 Feb, 2016

3 commits

6048c64b2 mbcache: add reusable flag to cache entries ... Browse Code »

To reduce amount of damage caused by single bad block, we limit number
of inodes sharing an xattr block to 1024. Thus there can be more xattr
blocks with the same contents when there are lots of files with the same
extended attributes. These xattr blocks naturally result in hash
collisions and can form long hash chains and we unnecessarily check each
such block only to find out we cannot use it because it is already
shared by too many inodes.

Add a reusable flag to cache entries which is cleared when a cache entry
has reached its maximum refcount. Cache entries which are not marked
reusable are skipped by mb_cache_entry_find_{first,next}. This
significantly speeds up mbcache when there are many same xattr blocks.
For example for xattr-bench with 5 values and each process handling
20000 files, the run for 64 processes is 25x faster with this patch.
Even for 8 processes the speedup is almost 3x. We have also verified
that for situations where there is only one xattr block of each kind,
the patch doesn't have a measurable cost.

[JK: Remove handling of setting the same value since it is not needed
anymore, check for races in e_reusable setting, improve changelog,
add measurements]

Signed-off-by: Andreas Gruenbacher
Signed-off-by: Jan Kara
Signed-off-by: Theodore Ts'o

Andreas Gruenbacher
2016-02-23 11:44:04 +0800
7a2508e1b mbcache2: rename to mbcache ... Browse Code »

Since old mbcache code is gone, let's rename new code to mbcache since
number 2 is now meaningless. This is just a mechanical replacement.

Signed-off-by: Jan Kara
Signed-off-by: Theodore Ts'o

Jan Kara
2016-02-23 11:35:22 +0800
be0726d33 ext2: convert to mbcache2 ... Browse Code »

The conversion is generally straightforward. We convert filesystem from
a global cache to per-fs one. Similarly to ext4 the tricky part is that
xattr block corresponding to found mbcache entry can get freed before we
get buffer lock for that block. So we have to check whether the entry is
still valid after getting the buffer lock.

Signed-off-by: Jan Kara
Signed-off-by: Theodore Ts'o

Jan Kara
2016-02-23 00:56:38 +0800

24 Jan, 2016

1 commit

cc673757e Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs ... Browse Code »

Pull final vfs updates from Al Viro:

- The ->i_mutex wrappers (with small prereq in lustre)

- a fix for too early freeing of symlink bodies on shmem (they need to
be RCU-delayed) (-stable fodder)

- followup to dedupe stuff merged this cycle

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
vfs: abort dedupe loop if fatal signals are pending
make sure that freeing shmem fast symlinks is RCU-delayed
wrappers for ->i_mutex access
lustre: remove unused declaration

Linus Torvalds
2016-01-24 04:24:56 +0800

23 Jan, 2016

2 commits

80b4adcaf ext2: call dax_pfn_mkwrite() for DAX fsync/msync ... Browse Code »

To properly support the new DAX fsync/msync infrastructure filesystems
need to call dax_pfn_mkwrite() so that DAX can track when user pages are
dirtied.

Signed-off-by: Ross Zwisler
Cc: "H. Peter Anvin"
Cc: "J. Bruce Fields"
Cc: "Theodore Ts'o"
Cc: Alexander Viro
Cc: Andreas Dilger
Cc: Dave Chinner
Cc: Ingo Molnar
Cc: Jan Kara
Cc: Jeff Layton
Cc: Matthew Wilcox
Cc: Thomas Gleixner
Cc: Dan Williams
Cc: Matthew Wilcox
Cc: Dave Hansen
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Ross Zwisler
2016-01-23 09:02:18 +0800
5955102c9 wrappers for ->i_mutex access ... Browse Code »

parallel to mutex_{lock,unlock,trylock,is_locked,lock_nested},
inode_foo(inode) being mutex_foo(&inode->i_mutex).

Please, use those for access to ->i_mutex; over the coming cycle
->i_mutex will become rwsem, with ->lookup() done with it held
only shared.

Signed-off-by: Al Viro

Al Viro
2016-01-23 07:04:28 +0800

15 Jan, 2016

1 commit

5d097056c kmemcg: account certain kmem allocations to memcg ... Browse Code »

Mark those kmem allocations that are known to be easily triggered from
userspace as __GFP_ACCOUNT/SLAB_ACCOUNT, which makes them accounted to
memcg. For the list, see below:

- threadinfo
- task_struct
- task_delay_info
- pid
- cred
- mm_struct
- vm_area_struct and vm_region (nommu)
- anon_vma and anon_vma_chain
- signal_struct
- sighand_struct
- fs_struct
- files_struct
- fdtable and fdtable->full_fds_bits
- dentry and external_name
- inode for all filesystems. This is the most tedious part, because
most filesystems overwrite the alloc_inode method.

The list is far from complete, so feel free to add more objects.
Nevertheless, it should be close to "account everything" approach and
keep most workloads within bounds. Malevolent users will be able to
breach the limit, but this was possible even with the former "account
everything" approach (simply because it did not account everything in
fact).

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Vladimir Davydov
Acked-by: Johannes Weiner
Acked-by: Michal Hocko
Cc: Tejun Heo
Cc: Greg Thelen
Cc: Christoph Lameter
Cc: Pekka Enberg
Cc: David Rientjes
Cc: Joonsoo Kim
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Vladimir Davydov
2016-01-15 08:00:49 +0800

13 Jan, 2016

1 commit

33caf82ac Merge branch 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs ... Browse Code »

Pull misc vfs updates from Al Viro:
"All kinds of stuff. That probably should've been 5 or 6 separate
branches, but by the time I'd realized how large and mixed that bag
had become it had been too close to -final to play with rebasing.

Some fs/namei.c cleanups there, memdup_user_nul() introduction and
switching open-coded instances, burying long-dead code, whack-a-mole
of various kinds, several new helpers for ->llseek(), assorted
cleanups and fixes from various people, etc.

One piece probably deserves special mention - Neil's
lookup_one_len_unlocked(). Similar to lookup_one_len(), but gets
called without ->i_mutex and tries to avoid ever taking it. That, of
course, means that it's not useful for any directory modifications,
but things like getting inode attributes in nfds readdirplus are fine
with that. I really should've asked for moratorium on lookup-related
changes this cycle, but since I hadn't done that early enough... I
*am* asking for that for the coming cycle, though - I'm going to try
and get conversion of i_mutex to rwsem with ->lookup() done under lock
taken shared.

There will be a patch closer to the end of the window, along the lines
of the one Linus had posted last May - mechanical conversion of
->i_mutex accesses to inode_lock()/inode_unlock()/inode_trylock()/
inode_is_locked()/inode_lock_nested(). To quote Linus back then:

-----
| This is an automated patch using
|
| sed 's/mutex_lock(&\(.*\)->i_mutex)/inode_lock(\1)/'
| sed 's/mutex_unlock(&\(.*\)->i_mutex)/inode_unlock(\1)/'
| sed 's/mutex_lock_nested(&\(.*\)->i_mutex,[ ]*I_MUTEX_\([A-Z0-9_]*\))/inode_lock_nested(\1, I_MUTEX_\2)/'
| sed 's/mutex_is_locked(&\(.*\)->i_mutex)/inode_is_locked(\1)/'
| sed 's/mutex_trylock(&\(.*\)->i_mutex)/inode_trylock(\1)/'
|
| with a very few manual fixups
-----

I'm going to send that once the ->i_mutex-affecting stuff in -next
gets mostly merged (or when Linus says he's about to stop taking
merges)"

* 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (63 commits)
nfsd: don't hold i_mutex over userspace upcalls
fs:affs:Replace time_t with time64_t
fs/9p: use fscache mutex rather than spinlock
proc: add a reschedule point in proc_readfd_common()
logfs: constify logfs_block_ops structures
fcntl: allow to set O_DIRECT flag on pipe
fs: __generic_file_splice_read retry lookup on AOP_TRUNCATED_PAGE
fs: xattr: Use kvfree()
[s390] page_to_phys() always returns a multiple of PAGE_SIZE
nbd: use ->compat_ioctl()
fs: use block_device name vsprintf helper
lib/vsprintf: add %*pg format specifier
fs: use gendisk->disk_name where possible
poll: plug an unused argument to do_poll
amdkfd: don't open-code memdup_user()
cdrom: don't open-code memdup_user()
rsxx: don't open-code memdup_user()
mtip32xx: don't open-code memdup_user()
[um] mconsole: don't open-code memdup_user_nul()
[um] hostaudio: don't open-code memdup_user()
...

Linus Torvalds
2016-01-13 09:11:47 +0800

12 Jan, 2016

1 commit

ddf1d6238 Merge branch 'work.xattr' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs ... Browse Code »

Pull vfs xattr updates from Al Viro:
"Andreas' xattr cleanup series.

It's a followup to his xattr work that went in last cycle; -0.5KLoC"

* 'work.xattr' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
xattr handlers: Simplify list operation
ocfs2: Replace list xattr handler operations
nfs: Move call to security_inode_listsecurity into nfs_listxattr
xfs: Change how listxattr generates synthetic attributes
tmpfs: listxattr should include POSIX ACL xattrs
tmpfs: Use xattr handler infrastructure
btrfs: Use xattr handler infrastructure
vfs: Distinguish between full xattr names and proper prefixes
posix acls: Remove duplicate xattr name definitions
gfs2: Remove gfs2_xattr_acl_chmod
vfs: Remove vfs_xattr_cmp

Linus Torvalds
2016-01-12 05:32:10 +0800

07 Jan, 2016

1 commit

a1c6f0573 fs: use block_device name vsprintf helper ... Browse Code »

Signed-off-by: Dmitry Monakhov
Signed-off-by: Al Viro

Dmitry Monakhov
2016-01-07 02:03:18 +0800

31 Dec, 2015

1 commit

fceef393a switch ->get_link() to delayed_call, kill ->put_link() ... Browse Code »

Signed-off-by: Al Viro

Al Viro
2015-12-31 02:01:03 +0800

14 Dec, 2015

1 commit

764a5c6b1 xattr handlers: Simplify list operation ... Browse Code »

Change the list operation to only return whether or not an attribute
should be listed. Copying the attribute names into the buffer is moved
to the callers.

Since the result only depends on the dentry and not on the attribute
name, we do not pass the attribute name to list operations.

Signed-off-by: Andreas Gruenbacher
Signed-off-by: Al Viro

Andreas Gruenbacher
2015-12-14 08:46:12 +0800

09 Dec, 2015

2 commits

6b2553918 replace ->follow_link() with new method that could stay in RCU mode ... Browse Code »

new method: ->get_link(); replacement of ->follow_link(). The differences
are:
* inode and dentry are passed separately
* might be called both in RCU and non-RCU mode;
the former is indicated by passing it a NULL dentry.
* when called that way it isn't allowed to block
and should return ERR_PTR(-ECHILD) if it needs to be called
in non-RCU mode.

It's a flagday change - the old method is gone, all in-tree instances
converted. Conversion isn't hard; said that, so far very few instances
do not immediately bail out when called in RCU mode. That'll change
in the next commits.

Signed-off-by: Al Viro

Al Viro
2015-12-09 11:41:54 +0800
21fc61c73 don't put symlink bodies in pagecache into highmem ... Browse Code »

kmap() in page_follow_link_light() needed to go - allowing to hold
an arbitrary number of kmaps for long is a great way to deadlocking
the system.

new helper (inode_nohighmem(inode)) needs to be used for pagecache
symlinks inodes; done for all in-tree cases. page_follow_link_light()
instrumented to yell about anything missed.

Signed-off-by: Al Viro

Al Viro
2015-12-09 11:41:36 +0800

07 Dec, 2015

1 commit

98e9cb571 vfs: Distinguish between full xattr names and proper prefixes ... Browse Code »

Add an additional "name" field to struct xattr_handler. When the name
is set, the handler matches attributes with exactly that name. When the
prefix is set instead, the handler matches attributes with the given
prefix and with a non-empty suffix.

This patch should avoid bugs like the one fixed in commit c361016a in
the future.

Signed-off-by: Andreas Gruenbacher
Reviewed-by: James Morris
Signed-off-by: Al Viro

Andreas Gruenbacher
2015-12-07 10:33:52 +0800

17 Nov, 2015

1 commit

ef83b6e8f ext2, ext4: warn when mounting with dax enabled ... Browse Code »

Similar to XFS warn when mounting DAX while it is still considered under
development. Also, aspects of the DAX implementation, for example
synchronization against multiple faults and faults causing block
allocation, depend on the correct implementation in the filesystem. The
maturity of a given DAX implementation is filesystem specific.

Cc:
Cc: "Theodore Ts'o"
Cc: Matthew Wilcox
Cc: linux-ext4@vger.kernel.org
Cc: Kirill A. Shutemov
Reported-by: Dave Chinner
Acked-by: Jan Kara
Signed-off-by: Dan Williams

Dan Williams
2015-11-17 01:43:54 +0800

14 Nov, 2015

2 commits

5d2eb548b Merge branch 'for-linus-3' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs ... Browse Code »

Pull vfs xattr cleanups from Al Viro.

* 'for-linus-3' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
f2fs: xattr simplifications
squashfs: xattr simplifications
9p: xattr simplifications
xattr handlers: Pass handler to operations instead of flags
jffs2: Add missing capability check for listing trusted xattrs
hfsplus: Remove unused xattr handler list operations
ubifs: Remove unused security xattr handler
vfs: Fix the posix_acl_xattr_list return value
vfs: Check attribute names in posix acl xattr handers

Linus Torvalds
2015-11-14 10:02:30 +0800
d9a82a040 xattr handlers: Pass handler to operations instead of flags ... Browse Code »

The xattr_handler operations are currently all passed a file system
specific flags value which the operations can use to disambiguate between
different handlers; some file systems use that to distinguish the xattr
namespace, for example. In some oprations, it would be useful to also have
access to the handler prefix. To allow that, pass a pointer to the handler
to operations instead of the flags value alone.

Signed-off-by: Andreas Gruenbacher
Reviewed-by: Christoph Hellwig
Signed-off-by: Al Viro

Andreas Gruenbacher
2015-11-14 09:34:32 +0800

10 Nov, 2015

2 commits

bd4f203e4 Merge branch 'akpm' (patches from Andrew) ... Browse Code »

Merge third patch-bomb from Andrew Morton:
"We're pretty much done over here - I'm still waiting for a nouveau
merge so I can cleanly finish up Christoph's dma-mapping rework.

- bunch of small misc stuff

- fold abs64() into abs(), remove abs64()

- new_valid_dev() cleanups

- binfmt_elf_fdpic feature work"

* emailed patches from Andrew Morton : (24 commits)
fs/binfmt_elf_fdpic.c: provide NOMMU loader for regular ELF binaries
fs/stat.c: remove unnecessary new_valid_dev() check
fs/reiserfs/namei.c: remove unnecessary new_valid_dev() check
fs/nilfs2/namei.c: remove unnecessary new_valid_dev() check
fs/ncpfs/dir.c: remove unnecessary new_valid_dev() check
fs/jfs: remove unnecessary new_valid_dev() checks
fs/hpfs/namei.c: remove unnecessary new_valid_dev() check
fs/f2fs/namei.c: remove unnecessary new_valid_dev() check
fs/ext2/namei.c: remove unnecessary new_valid_dev() check
fs/exofs/namei.c: remove unnecessary new_valid_dev() check
fs/btrfs/inode.c: remove unnecessary new_valid_dev() check
fs/9p: remove unnecessary new_valid_dev() checks
include/linux/kdev_t.h: old/new_valid_dev() can return bool
include/linux/kdev_t.h: remove unused huge_valid_dev()
kmap_atomic_to_page() has no users, remove it
drivers/scsi/cxgbi: fix build with EXTRA_CFLAGS
dma: remove external references to dma_supported
Documentation/sysctl/vm.txt: fix misleading code reference of overcommit_memory
remove abs64()
kernel.h: make abs() work with 64-bit types
...

Linus Torvalds
2015-11-10 13:05:13 +0800
d7df00072 fs/ext2/namei.c: remove unnecessary new_valid_dev() check ... Browse Code »

new_valid_dev() always returns 1, so the !new_valid_dev() check is not
needed. Remove it.

Signed-off-by: Yaowei Bai
Cc: Alexander Viro
Cc: Jan Kara
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Yaowei Bai
2015-11-10 07:11:24 +0800

19 Oct, 2015

1 commit

5726b27b0 ext2: Add locking for DAX faults ... Browse Code »

Add locking to ensure that DAX faults are isolated from ext2 operations
that modify the data blocks allocation for an inode. This is intended to
be analogous to the work being done in XFS by Dave Chinner:

http://www.spinics.net/lists/linux-fsdevel/msg90260.html

Compared with XFS the ext2 case is greatly simplified by the fact that ext2
already allocates and zeros new blocks before they are returned as part of
ext2_get_block(), so DAX doesn't need to worry about getting unmapped or
unwritten buffer heads.

This means that the only work we need to do in ext2 is to isolate the DAX
faults from inode block allocation changes. I believe this just means that
we need to isolate the DAX faults from truncate operations.

The newly introduced dax_sem is intended to replicate the protection
offered by i_mmaplock in XFS. In addition to truncate the i_mmaplock also
protects XFS operations like hole punching, fallocate down, extent
manipulation IOCTLS like xfs_ioc_space() and extent swapping. Truncate is
the only one of these operations supported by ext2.

Signed-off-by: Ross Zwisler
Signed-off-by: Jan Kara

Ross Zwisler
2015-10-19 20:40:54 +0800

09 Sep, 2015

2 commits

e7b1ea2ad ext2: huge page fault support ... Browse Code »

Use DAX to provide support for huge pages.

Signed-off-by: Matthew Wilcox
Cc: Hillf Danton
Cc: "Kirill A. Shutemov"
Cc: Theodore Ts'o
Cc: Jan Kara
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Matthew Wilcox
2015-09-09 06:35:28 +0800
c94c2acf8 dax: move DAX-related functions to a new header ... Browse Code »

In order to handle the !CONFIG_TRANSPARENT_HUGEPAGES case, we need to
return VM_FAULT_FALLBACK from the inlined dax_pmd_fault(), which is
defined in linux/mm.h. Given that we don't want to include
in , the easiest solution is to move the DAX-related
functions to a new header, . We could also have moved
VM_FAULT_* definitions to a new header, or a different header that isn't
quite such a boil-the-ocean header as , but this felt like
the best option.

Signed-off-by: Matthew Wilcox
Cc: Hillf Danton
Cc: "Kirill A. Shutemov"
Cc: Theodore Ts'o
Cc: Jan Kara
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Matthew Wilcox
2015-09-09 06:35:28 +0800

24 Jul, 2015

1 commit

c2edb305d ext2: Handle error from dquot_initalize() ... Browse Code »

dquot_initialize() can now return error. Handle it where possible.

Signed-off-by: Jan Kara

Jan Kara
2015-07-24 02:59:37 +0800

05 Jul, 2015

1 commit

1dc51b828 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs ... Browse Code »

Pull more vfs updates from Al Viro:
"Assorted VFS fixes and related cleanups (IMO the most interesting in
that part are f_path-related things and Eric's descriptor-related
stuff). UFS regression fixes (it got broken last cycle). 9P fixes.
fs-cache series, DAX patches, Jan's file_remove_suid() work"

[ I'd say this is much more than "fixes and related cleanups". The
file_table locking rule change by Eric Dumazet is a rather big and
fundamental update even if the patch isn't huge. - Linus ]

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (49 commits)
9p: cope with bogus responses from server in p9_client_{read,write}
p9_client_write(): avoid double p9_free_req()
9p: forgetting to cancel request on interrupted zero-copy RPC
dax: bdev_direct_access() may sleep
block: Add support for DAX reads/writes to block devices
dax: Use copy_from_iter_nocache
dax: Add block size note to documentation
fs/file.c: __fget() and dup2() atomicity rules
fs/file.c: don't acquire files->file_lock in fd_install()
fs:super:get_anon_bdev: fix race condition could cause dev exceed its upper limitation
vfs: avoid creation of inode number 0 in get_next_ino
namei: make set_root_rcu() return void
make simple_positive() public
ufs: use dir_pages instead of ufs_dir_pages()
pagemap.h: move dir_pages() over there
remove the pointless include of lglock.h
fs: cleanup slight list_entry abuse
xfs: Correctly lock inode when removing suid and file capabilities
fs: Call security_ops->inode_killpriv on truncate
fs: Provide function telling whether file_remove_privs() will do anything
...

Linus Torvalds
2015-07-05 10:36:06 +0800

01 Jul, 2015

1 commit

68b4449d7 Merge tag 'xfs-for-linus-4.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/dgc/linux-xfs ... Browse Code »

Pul xfs updates from Dave Chinner:
"There's a couple of small API changes to the core DAX code which
required small changes to the ext2 and ext4 code bases, but otherwise
everything is within the XFS codebase.

This update contains:

- A new sparse on-disk inode record format to allow small extents to
be used for inode allocation when free space is fragmented.

- DAX support. This includes minor changes to the DAX core code to
fix problems with lock ordering and bufferhead mapping abuse.

- transaction commit interface cleanup

- removal of various unnecessary XFS specific type definitions

- cleanup and optimisation of freelist preparation before allocation

- various minor cleanups

- bug fixes for
- transaction reservation leaks
- incorrect inode logging in unwritten extent conversion
- mmap lock vs freeze ordering
- remote symlink mishandling
- attribute fork removal issues"

* tag 'xfs-for-linus-4.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/dgc/linux-xfs: (49 commits)
xfs: don't truncate attribute extents if no extents exist
xfs: clean up XFS_MIN_FREELIST macros
xfs: sanitise error handling in xfs_alloc_fix_freelist
xfs: factor out free space extent length check
xfs: xfs_alloc_fix_freelist() can use incore perag structures
xfs: remove xfs_caddr_t
xfs: use void pointers in log validation helpers
xfs: return a void pointer from xfs_buf_offset
xfs: remove inst_t
xfs: remove __psint_t and __psunsigned_t
xfs: fix remote symlinks on V5/CRC filesystems
xfs: fix xfs_log_done interface
xfs: saner xfs_trans_commit interface
xfs: remove the flags argument to xfs_trans_cancel
xfs: pass a boolean flag to xfs_trans_free_items
xfs: switch remaining xfs_trans_dup users to xfs_trans_roll
xfs: check min blks for random debug mode sparse allocations
xfs: fix sparse inodes 32-bit compile failure
xfs: add initial DAX support
xfs: add DAX IO path support
...

Linus Torvalds
2015-07-01 11:16:08 +0800

26 Jun, 2015

1 commit

e4bc13adf Merge branch 'for-4.2/writeback' of git://git.kernel.dk/linux-block ... Browse Code »

Pull cgroup writeback support from Jens Axboe:
"This is the big pull request for adding cgroup writeback support.

This code has been in development for a long time, and it has been
simmering in for-next for a good chunk of this cycle too. This is one
of those problems that has been talked about for at least half a
decade, finally there's a solution and code to go with it.

Also see last weeks writeup on LWN:

http://lwn.net/Articles/648292/"

* 'for-4.2/writeback' of git://git.kernel.dk/linux-block: (85 commits)
writeback, blkio: add documentation for cgroup writeback support
vfs, writeback: replace FS_CGROUP_WRITEBACK with SB_I_CGROUPWB
writeback: do foreign inode detection iff cgroup writeback is enabled
v9fs: fix error handling in v9fs_session_init()
bdi: fix wrong error return value in cgwb_create()
buffer: remove unusued 'ret' variable
writeback: disassociate inodes from dying bdi_writebacks
writeback: implement foreign cgroup inode bdi_writeback switching
writeback: add lockdep annotation to inode_to_wb()
writeback: use unlocked_inode_to_wb transaction in inode_congested()
writeback: implement unlocked_inode_to_wb transaction and use it for stat updates
writeback: implement [locked_]inode_to_wb_and_lock_list()
writeback: implement foreign cgroup inode detection
writeback: make writeback_control track the inode being written back
writeback: relocate wb[_try]_get(), wb_put(), inode_{attach|detach}_wb()
mm: vmscan: disable memcg direct reclaim stalling if cgroup writeback support is in use
writeback: implement memcg writeback domain based throttling
writeback: reset wb_domain->dirty_limit[_tstmp] when memcg domain size changes
writeback: implement memcg wb_domain
writeback: update wb_over_bg_thresh() to use wb_domain aware operations
...

Linus Torvalds
2015-06-26 07:00:17 +0800

24 Jun, 2015

1 commit

b57c2cb9e pagemap.h: move dir_pages() over there ... Browse Code »

That function was declared in a lot of filesystems to calculate
directory pages.

Signed-off-by: Fabian Frederick
Signed-off-by: Al Viro

Fabian Frederick
2015-06-24 06:02:00 +0800

18 Jun, 2015

1 commit

46b15caa7 vfs, writeback: replace FS_CGROUP_WRITEBACK with SB_I_CGROUPWB ... Browse Code »

FS_CGROUP_WRITEBACK indicates whether a file_system_type supports
cgroup writeback; however, different super_blocks of the same
file_system_type may or may not support cgroup writeback depending on
filesystem options. This patch replaces FS_CGROUP_WRITEBACK with a
per-super_block flag.

super_block->s_flags carries some internal flags in the high bits but
it's exposd to userland through uapi header and running out of space
anyway. This patch adds a new field super_block->s_iflags to carry
kernel-internal flags. It is currently only used by the new
SB_I_CGROUPWB flag whose concatenated and abbreviated name is for
consistency with other super_block flags.

ext2_fill_super() is updated to set SB_I_CGROUPWB.

v2: Added super_block->s_iflags instead of stealing another high bit
from sb->s_flags as suggested by Christoph and Jan.

Signed-off-by: Tejun Heo
Cc: Alexander Viro
Cc: linux-fsdevel@vger.kernel.org
Cc: Christoph Hellwig
Cc: Jan Kara
Cc: linux-ext4@vger.kernel.org
Reviewed-by: Christoph Hellwig
Signed-off-by: Jens Axboe

Tejun Heo
2015-06-18 02:47:39 +0800

04 Jun, 2015

1 commit

e842f2903 dax: don't abuse get_block mapping for endio callbacks ... Browse Code »

dax_fault() currently relies on the get_block callback to attach an
io completion callback to the mapping buffer head so that it can
run unwritten extent conversion after zeroing allocated blocks.

Instead of this hack, pass the conversion callback directly into
dax_fault() similar to the get_block callback. When the filesystem
allocates unwritten extents, it will set the buffer_unwritten()
flag, and hence the dax_fault code can call the completion function
in the contexts where it is necessary without overloading the
mapping buffer head.

Note: The changes to ext4 to use this interface are suspect at best.
In fact, the way ext4 did this end_io assignment in the first place
looks suspect because it only set a completion callback when there
wasn't already some other write() call taking place on the same
inode. The ext4 end_io code looks rather intricate and fragile with
all it's reference counting and passing to different contexts for
modification via inode private pointers that aren't protected by
locks...

Signed-off-by: Dave Chinner
Acked-by: Jan Kara
Signed-off-by: Dave Chinner

Dave Chinner
2015-06-04 07:18:18 +0800

02 Jun, 2015

1 commit

108dad65b ext2: enable cgroup writeback support ... Browse Code »

Writeback now supports cgroup writeback and the generic writeback,
buffer, libfs, and mpage helpers that ext2 uses are all updated to
work with cgroup writeback.

This patch enables cgroup writeback for ext2 by adding
FS_CGROUP_WRITEBACK to its ->fs_flags.

Signed-off-by: Tejun Heo
Cc: Jens Axboe
Cc: Jan Kara
Cc: linux-ext4@vger.kernel.org
Signed-off-by: Jens Axboe

Tejun Heo
2015-06-02 22:38:04 +0800

11 May, 2015

1 commit

cbe0fa385 ext2: use simple_follow_link() ... Browse Code »

Reviewed-by: Jan Kara
Signed-off-by: Al Viro

Al Viro
2015-05-11 10:18:21 +0800