Eric Lee / smarc-fsl-linux-kernel

14 Oct, 2016

1 commit

35a891be9 Merge tag 'xfs-reflink-for-linus-4.9-rc1' of git://git.kernel.org/pub/scm/linux/… ... Browse Code »

…kernel/git/dgc/linux-xfs

< XFS has gained super CoW powers! >
----------------------------------
\ ^__^
\ (oo)\_______
(__)\ )\/\
||----w |
|| ||

Pull XFS support for shared data extents from Dave Chinner:
"This is the second part of the XFS updates for this merge cycle. This
pullreq contains the new shared data extents feature for XFS.

Given the complexity and size of this change I am expecting - like the
addition of reverse mapping last cycle - that there will be some
follow-up bug fixes and cleanups around the -rc3 stage for issues that
I'm sure will show up once the code hits a wider userbase.

What it is:

At the most basic level we are simply adding shared data extents to
XFS - i.e. a single extent on disk can now have multiple owners. To do
this we have to add new on-disk features to both track the shared
extents and the number of times they've been shared. This is done by
the new "refcount" btree that sits in every allocation group. When we
share or unshare an extent, this tree gets updated.

Along with this new tree, the reverse mapping tree needs to be updated
to track each owner or a shared extent. This also needs to be updated
ever share/unshare operation. These interactions at extent allocation
and freeing time have complex ordering and recovery constraints, so
there's a significant amount of new intent-based transaction code to
ensure that operations are performed atomically from both the runtime
and integrity/crash recovery perspectives.

We also need to break sharing when writes hit a shared extent - this
is where the new copy-on-write implementation comes in. We allocate
new storage and copy the original data along with the overwrite data
into the new location. We only do this for data as we don't share
metadata at all - each inode has it's own metadata that tracks the
shared data extents, the extents undergoing CoW and it's own private
extents.

Of course, being XFS, nothing is simple - we use delayed allocation
for CoW similar to how we use it for normal writes. ENOSPC is a
significant issue here - we build on the reservation code added in
4.8-rc1 with the reverse mapping feature to ensure we don't get
spurious ENOSPC issues part way through a CoW operation. These
mechanisms also help minimise fragmentation due to repeated CoW
operations. To further reduce fragmentation overhead, we've also
introduced a CoW extent size hint, which indicates how large a region
we should allocate when we execute a CoW operation.

With all this functionality in place, we can hook up .copy_file_range,
.clone_file_range and .dedupe_file_range and we gain all the
capabilities of reflink and other vfs provided functionality that
enable manipulation to shared extents. We also added a fallocate mode
that explicitly unshares a range of a file, which we implemented as an
explicit CoW of all the shared extents in a file.

As such, it's a huge chunk of new functionality with new on-disk
format features and internal infrastructure. It warns at mount time as
an experimental feature and that it may eat data (as we do with all
new on-disk features until they stabilise). We have not released
userspace suport for it yet - userspace support currently requires
download from Darrick's xfsprogs repo and build from source, so the
access to this feature is really developer/tester only at this point.
Initial userspace support will be released at the same time the kernel
with this code in it is released.

The new code causes 5-6 new failures with xfstests - these aren't
serious functional failures but things the output of tests changing
slightly due to perturbations in layouts, space usage, etc. OTOH,
we've added 150+ new tests to xfstests that specifically exercise this
new functionality so it's got far better test coverage than any
functionality we've previously added to XFS.

Darrick has done a pretty amazing job getting us to this stage, and
special mention also needs to go to Christoph (review, testing,
improvements and bug fixes) and Brian (caught several intricate bugs
during review) for the effort they've also put in.

Summary:

- unshare range (FALLOC_FL_UNSHARE) support for fallocate

- copy-on-write extent size hints (FS_XFLAG_COWEXTSIZE) for fsxattr
interface

- shared extent support for XFS

- copy-on-write support for shared extents

- copy_file_range support

- clone_file_range support (implements reflink)

- dedupe_file_range support

- defrag support for reverse mapping enabled filesystems"

* tag 'xfs-reflink-for-linus-4.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/dgc/linux-xfs: (71 commits)
xfs: convert COW blocks to real blocks before unwritten extent conversion
xfs: rework refcount cow recovery error handling
xfs: clear reflink flag if setting realtime flag
xfs: fix error initialization
xfs: fix label inaccuracies
xfs: remove isize check from unshare operation
xfs: reduce stack usage of _reflink_clear_inode_flag
xfs: check inode reflink flag before calling reflink functions
xfs: implement swapext for rmap filesystems
xfs: refactor swapext code
xfs: various swapext cleanups
xfs: recognize the reflink feature bit
xfs: simulate per-AG reservations being critically low
xfs: don't mix reflink and DAX mode for now
xfs: check for invalid inode reflink flags
xfs: set a default CoW extent size of 32 blocks
xfs: convert unwritten status of reverse mappings for shared files
xfs: use interval query for rmap alloc operations on shared files
xfs: add shared rmap map/unmap/convert log item types
xfs: increase log reservations for reflink
...

Linus Torvalds
2016-10-14 11:28:22 +0800

12 Oct, 2016

1 commit

25f4c4141 block: implement (some of) fallocate for block devices ... Browse Code »

After much discussion, it seems that the fallocate feature flag
FALLOC_FL_ZERO_RANGE maps nicely to SCSI WRITE SAME; and the feature
FALLOC_FL_PUNCH_HOLE maps nicely to the devices that have been whitelisted
for zeroing SCSI UNMAP. Punch still requires that FALLOC_FL_KEEP_SIZE is
set. A length that goes past the end of the device will be clamped to the
device size if KEEP_SIZE is set; or will return -EINVAL if not. Both
start and length must be aligned to the device's logical block size.

Since the semantics of fallocate are fairly well established already, wire
up the two pieces. The other fallocate variants (collapse range, insert
range, and allocate blocks) are not supported.

Link: http://lkml.kernel.org/r/147518379992.22791.8849838163218235007.stgit@birch.djwong.org
Signed-off-by: Darrick J. Wong
Reviewed-by: Hannes Reinecke
Reviewed-by: Bart Van Assche
Cc: Theodore Ts'o
Cc: Martin K. Petersen
Cc: Mike Snitzer # tweaked header
Cc: Brian Foster
Cc: Christoph Hellwig
Cc: Hannes Reinecke
Cc: Jens Axboe
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Darrick J. Wong
2016-10-12 06:06:30 +0800

04 Oct, 2016

1 commit

71be6b494 vfs: add a FALLOC_FL_UNSHARE mode to fallocate to unshare a range of blocks ... Browse Code »

Add a new fallocate mode flag that explicitly unshares blocks on
filesystems that support such features. The new flag can only
be used with an allocate-mode fallocate call.

Signed-off-by: Darrick J. Wong

Darrick J. Wong
2016-10-04 00:11:14 +0800

16 Sep, 2016

2 commits

4d0c5ba2f vfs: do get_write_access() on upper layer of overlayfs ... Browse Code »

The problem with writecount is: we want consistent handling of it for
underlying filesystems as well as overlayfs. Making sure i_writecount is
correct on all layers is difficult. Instead this patch makes sure that
when write access is acquired, it's always done on the underlying writable
layer (called the upper layer). We must also make sure to look at the
writecount on this layer when checking for conflicting leases.

Open for write already updates the upper layer's writecount. Leaving only
truncate.

For truncate copy up must happen before get_write_access() so that the
writecount is updated on the upper layer. Problem with this is if
something fails after that, then copy-up was done needlessly. E.g. if
break_lease() was interrupted. Probably not a big deal in practice.

Another interesting case is if there's a denywrite on a lower file that is
then opened for write or truncated. With this patch these will succeed,
which is somewhat counterintuitive. But I think it's still acceptable,
considering that the copy-up does actually create a different file, so the
old, denywrite mapping won't be touched.

On non-overlayfs d_real() is an identity function and d_real_inode() is
equivalent to d_inode() so this patch doesn't change behavior in that case.

Signed-off-by: Miklos Szeredi
Acked-by: Jeff Layton
Cc: "J. Bruce Fields"

Miklos Szeredi
2016-09-16 18:44:21 +0800
c568d6834 locks: fix file locking on overlayfs ... Browse Code »

This patch allows flock, posix locks, ofd locks and leases to work
correctly on overlayfs.

Instead of using the underlying inode for storing lock context use the
overlay inode. This allows locks to be persistent across copy-up.

This is done by introducing locks_inode() helper and using it instead of
file_inode() to get the inode in locking code. For non-overlayfs the two
are equivalent, except for an extra pointer dereference in locks_inode().

Since lock operations are in "struct file_operations" we must also make
sure not to call underlying filesystem's lock operations. Introcude a
super block flag MS_NOREMOTELOCK to this effect.

Signed-off-by: Miklos Szeredi
Acked-by: Jeff Layton
Cc: "J. Bruce Fields"

Miklos Szeredi
2016-09-16 18:44:20 +0800

07 Aug, 2016

1 commit

e9d488c31 Merge tag 'binfmt-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/binfmt_misc ... Browse Code »

Pull binfmt_misc update from James Bottomley:
"This update is to allow architecture emulation containers to function
such that the emulation binary can be housed outside the container
itself. The container and fs parts both have acks from relevant
experts.

To use the new feature you have to add an F option to your binfmt_misc
configuration"

From the docs:
"The usual behaviour of binfmt_misc is to spawn the binary lazily when
the misc format file is invoked. However, this doesn't work very well
in the face of mount namespaces and changeroots, so the F mode opens
the binary as soon as the emulation is installed and uses the opened
image to spawn the emulator, meaning it is always available once
installed, regardless of how the environment changes"

* tag 'binfmt-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/binfmt_misc:
binfmt_misc: add F option description to documentation
binfmt_misc: add persistent opened binary handler for containers
fs: add filp_clone_open API

Linus Torvalds
2016-08-07 22:13:14 +0800

30 Jun, 2016

1 commit

2d902671c vfs: merge .d_select_inode() into .d_real() ... Browse Code »

The two methods essentially do the same: find the real dentry/inode
belonging to an overlay dentry. The difference is in the usage:

vfs_open() uses ->d_select_inode() and expects the function to perform
copy-up if necessary based on the open flags argument.

file_dentry() uses ->d_real() passing in the overlay dentry as well as the
underlying inode.

vfs_rename() uses ->d_select_inode() but passes zero flags. ->d_real()
with a zero inode would have worked just as well here.

This patch merges the functionality of ->d_select_inode() into ->d_real()
by adding an 'open_flags' argument to the latter.

[Al Viro] Make the signature of d_real() match that of ->d_real() again.
And constify the inode argument, while we are at it.

Signed-off-by: Miklos Szeredi

Miklos Szeredi
2016-06-30 14:53:27 +0800

18 May, 2016

1 commit

c52b76185 Merge branch 'work.const-path' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs ... Browse Code »

Pull 'struct path' constification update from Al Viro:
"'struct path' is passed by reference to a bunch of Linux security
methods; in theory, there's nothing to stop them from modifying the
damn thing and LSM community being what it is, sooner or later some
enterprising soul is going to decide that it's a good idea.

Let's remove the temptation and constify all of those..."

* 'work.const-path' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
constify ima_d_path()
constify security_sb_pivotroot()
constify security_path_chroot()
constify security_path_{link,rename}
apparmor: remove useless checks for NULL ->mnt
constify security_path_{mkdir,mknod,symlink}
constify security_path_{unlink,rmdir}
apparmor: constify common_perm_...()
apparmor: constify aa_path_link()
apparmor: new helper - common_path_perm()
constify chmod_common/security_path_chmod
constify security_sb_mount()
constify chown_common/security_path_chown
tomoyo: constify assorted struct path *
apparmor_path_truncate(): path->mnt is never NULL
constify vfs_truncate()
constify security_path_truncate()
[apparmor] constify struct path * in a bunch of helpers

Linus Torvalds
2016-05-18 05:41:03 +0800

17 May, 2016

1 commit

0e0162bb8 Merge branch 'ovl-fixes' into for-linus ... Browse Code »

Backmerge to resolve a conflict in ovl_lookup_real();
"ovl_lookup_real(): use lookup_one_len_unlocked()" instead,
but it was too late in the cycle to rebase.

Al Viro
2016-05-17 14:17:59 +0800

11 May, 2016

1 commit

54d5ca871 vfs: add vfs_select_inode() helper ... Browse Code »

Signed-off-by: Miklos Szeredi
Cc: # v4.2+

Miklos Szeredi
2016-05-11 11:55:01 +0800

03 May, 2016

1 commit

63b6df141 give readdir(2)/getdents(2)/etc. uniform exclusion with lseek() ... Browse Code »

same as read() on regular files has, and for the same reason.

Signed-off-by: Al Viro

Al Viro
2016-05-03 07:49:28 +0800

31 Mar, 2016

1 commit

9a08c352d fs: add filp_clone_open API ... Browse Code »

I need an API that allows me to obtain a clone of the current file
pointer to pass in to an exec handler. I've labelled this as an
internal API because I can't see how it would be useful outside of the
fs subsystem. The use case will be a persistent binfmt_misc handler.

Signed-off-by: James Bottomley
Acked-by: Serge Hallyn
Acked-by: Jan Kara

James Bottomley
2016-03-31 05:12:22 +0800

28 Mar, 2016

3 commits

be01f9f28 constify chmod_common/security_path_chmod ... Browse Code »

Signed-off-by: Al Viro

Al Viro
2016-03-28 12:47:25 +0800
7fd25dac9 constify chown_common/security_path_chown ... Browse Code »

Signed-off-by: Al Viro

Al Viro
2016-03-28 12:47:24 +0800
7df818b23 constify vfs_truncate() ... Browse Code »

Signed-off-by: Al Viro

Al Viro
2016-03-28 12:47:22 +0800

23 Mar, 2016

1 commit

378c6520e fs/coredump: prevent fsuid=0 dumps into user-controlled directories ... Browse Code »

This commit fixes the following security hole affecting systems where
all of the following conditions are fulfilled:

- The fs.suid_dumpable sysctl is set to 2.
- The kernel.core_pattern sysctl's value starts with "/". (Systems
where kernel.core_pattern starts with "|/" are not affected.)
- Unprivileged user namespace creation is permitted. (This is
true on Linux >=3.8, but some distributions disallow it by
default using a distro patch.)

Under these conditions, if a program executes under secure exec rules,
causing it to run with the SUID_DUMP_ROOT flag, then unshares its user
namespace, changes its root directory and crashes, the coredump will be
written using fsuid=0 and a path derived from kernel.core_pattern - but
this path is interpreted relative to the root directory of the process,
allowing the attacker to control where a coredump will be written with
root privileges.

To fix the security issue, always interpret core_pattern for dumps that
are written under SUID_DUMP_ROOT relative to the root directory of init.

Signed-off-by: Jann Horn
Acked-by: Kees Cook
Cc: Al Viro
Cc: "Eric W. Biederman"
Cc: Andy Lutomirski
Cc: Oleg Nesterov
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Jann Horn
2016-03-23 06:36:02 +0800

23 Jan, 2016

1 commit

5955102c9 wrappers for ->i_mutex access ... Browse Code »

parallel to mutex_{lock,unlock,trylock,is_locked,lock_nested},
inode_foo(inode) being mutex_foo(&inode->i_mutex).

Please, use those for access to ->i_mutex; over the coming cycle
->i_mutex will become rwsem, with ->lookup() done with it held
only shared.

Signed-off-by: Al Viro

Al Viro
2016-01-23 07:04:28 +0800

04 Jan, 2016

1 commit

62fb4a155 don't carry MAY_OPEN in op->acc_mode ... Browse Code »

Signed-off-by: Al Viro

Al Viro
2016-01-04 23:28:40 +0800

10 Jul, 2015

1 commit

90f8572b0 vfs: Commit to never having exectuables on proc and sysfs. ... Browse Code »

Today proc and sysfs do not contain any executable files. Several
applications today mount proc or sysfs without noexec and nosuid and
then depend on there being no exectuables files on proc or sysfs.
Having any executable files show on proc or sysfs would cause
a user space visible regression, and most likely security problems.

Therefore commit to never allowing executables on proc and sysfs by
adding a new flag to mark them as filesystems without executables and
enforce that flag.

Test the flag where MNT_NOEXEC is tested today, so that the only user
visible effect will be that exectuables will be treated as if the
execute bit is cleared.

The filesystems proc and sysfs do not currently incoporate any
executable files so this does not result in any user visible effects.

This makes it unnecessary to vet changes to proc and sysfs tightly for
adding exectuable files or changes to chattr that would modify
existing files, as no matter what the individual file say they will
not be treated as exectuable files by the vfs.

Not having to vet changes to closely is important as without this we
are only one proc_create call (or another goof up in the
implementation of notify_change) from having problematic executables
on proc. Those mistakes are all too easy to make and would create
a situation where there are security issues or the assumptions of
some program having to be broken (and cause userspace regressions).

Signed-off-by: "Eric W. Biederman"

Eric W. Biederman
2015-07-10 23:39:25 +0800

24 Jun, 2015

2 commits

45f147a1b fs: Call security_ops->inode_killpriv on truncate ... Browse Code »

Comment in include/linux/security.h says that ->inode_killpriv() should
be called when setuid bit is being removed and that similar security
labels (in fact this applies only to file capabilities) should be
removed at this time as well. However we don't call ->inode_killpriv()
when we remove suid bit on truncate.

We fix the problem by calling ->inode_need_killpriv() and subsequently
->inode_killpriv() on truncate the same way as we do it on file write.

After this patch there's only one user of should_remove_suid() - ocfs2 -
and indeed it's buggy because it doesn't call ->inode_killpriv() on
write. However fixing it is difficult because of special locking
constraints.

Signed-off-by: Jan Kara
Signed-off-by: Al Viro

Jan Kara
2015-06-24 06:01:09 +0800
9bf39ab2a vfs: add file_path() helper ... Browse Code »

Turn
d_path(&file->f_path, ...);
into
file_path(file, ...);

Signed-off-by: Miklos Szeredi
Signed-off-by: Al Viro

Miklos Szeredi
2015-06-24 06:00:05 +0800

19 Jun, 2015

1 commit

4bacc9c92 overlayfs: Make f_path always point to the overlay and f_inode to the underlay ... Browse Code »

Make file->f_path always point to the overlay dentry so that the path in
/proc/pid/fd is correct and to ensure that label-based LSMs have access to the
overlay as well as the underlay (path-based LSMs probably don't need it).

Using my union testsuite to set things up, before the patch I see:

[root@andromeda union-testsuite]# bash 5 /a/foo107
[root@andromeda union-testsuite]# stat /mnt/a/foo107
...
Device: 23h/35d Inode: 13381 Links: 1
...
[root@andromeda union-testsuite]# stat -L /proc/$$/fd/5
...
Device: 23h/35d Inode: 13381 Links: 1
...

After the patch:

[root@andromeda union-testsuite]# bash 5 /mnt/a/foo107
[root@andromeda union-testsuite]# stat /mnt/a/foo107
...
Device: 23h/35d Inode: 40346 Links: 1
...
[root@andromeda union-testsuite]# stat -L /proc/$$/fd/5
...
Device: 23h/35d Inode: 40346 Links: 1
...

Note the change in where /proc/$$/fd/5 points to in the ls command. It was
pointing to /a/foo107 (which doesn't exist) and now points to /mnt/a/foo107
(which is correct).

The inode accessed, however, is the lower layer. The union layer is on device
25h/37d and the upper layer on 24h/36d.

Signed-off-by: David Howells
Signed-off-by: Al Viro

David Howells
2015-06-19 15:19:32 +0800

11 May, 2015

1 commit

63afdfc78 VFS: Handle lower layer dentry/inode in pathwalk ... Browse Code »

Make use of d_backing_inode() in pathwalk to gain access to an
inode or dentry that's on a lower layer.

Signed-off-by: David Howells

David Howells
2015-05-11 20:13:10 +0800

24 Apr, 2015

1 commit

1aef882f0 Merge tag 'xfs-for-linus-4.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/dgc/linux-xfs ... Browse Code »

Pull xfs update from Dave Chinner:
"This update contains:

- RENAME_WHITEOUT support

- conversion of per-cpu superblock accounting to use generic counters

- new inode mmap lock so that we can lock page faults out of
truncate, hole punch and other direct extent manipulation functions
to avoid racing mmap writes from causing data corruption

- rework of direct IO submission and completion to solve data
corruption issue when running concurrent extending DIO writes.
Also solves problem of running IO completion transactions in
interrupt context during size extending AIO writes.

- FALLOC_FL_INSERT_RANGE support for inserting holes into a file via
direct extent manipulation to avoid needing to copy data within the
file

- attribute block header field overflow fix for 64k block size
filesystems

- Lots of changes to log messaging to be more informative and concise
when errors occur. Also prevent a lot of unnecessary log spamming
due to cascading failures in error conditions.

- lots of cleanups and bug fixes

One thing of note is the direct IO fixes that we merged last week
after the window opened. Even though a little late, they fix a user
reported data corruption and have been pretty well tested. I figured
there was not much point waiting another 2 weeks for -rc1 to be
released just so I could send them to you..."

* tag 'xfs-for-linus-4.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/dgc/linux-xfs: (49 commits)
xfs: using generic_file_direct_write() is unnecessary
xfs: direct IO EOF zeroing needs to drain AIO
xfs: DIO write completion size updates race
xfs: DIO writes within EOF don't need an ioend
xfs: handle DIO overwrite EOF update completion correctly
xfs: DIO needs an ioend for writes
xfs: move DIO mapping size calculation
xfs: factor DIO write mapping from get_blocks
xfs: unlock i_mutex in xfs_break_layouts
xfs: kill unnecessary firstused overflow check on attr3 leaf removal
xfs: use larger in-core attr firstused field and detect overflow
xfs: pass attr geometry to attr leaf header conversion functions
xfs: disallow ro->rw remount on norecovery mount
xfs: xfs_shift_file_space can be static
xfs: Add support FALLOC_FL_INSERT_RANGE for fallocate
fs: Add support FALLOC_FL_INSERT_RANGE for fallocate
xfs: Fix incorrect positive ENOMEM return
xfs: xfs_mru_cache_insert() should use GFP_NOFS
xfs: %pF is only for function pointers
xfs: fix shadow warning in xfs_da3_root_split()
...

Linus Torvalds
2015-04-24 22:08:41 +0800

12 Apr, 2015

3 commits

843631820 ->aio_read and ->aio_write removed ... Browse Code »

no remaining users

Signed-off-by: Al Viro

Al Viro
2015-04-12 10:29:43 +0800
c1b8940b4 NFS: fix BUG() crash in notify_change() with patch to chown_common() ... Browse Code »

We have observed a BUG() crash in fs/attr.c:notify_change(). The crash
occurs during an rsync into a filesystem that is exported via NFS.

1.) fs/attr.c:notify_change() modifies the caller's version of attr.
2.) 6de0ec00ba8d ("VFS: make notify_change pass ATTR_KILL_S*ID to
setattr operations") introduced a BUG() restriction such that "no
function will ever call notify_change() with both ATTR_MODE and
ATTR_KILL_S*ID set". Under some circumstances though, it will have
assisted in setting the caller's version of attr to this very
combination.
3.) 27ac0ffeac80 ("locks: break delegations on any attribute
modification") introduced code to handle breaking
delegations. This can result in notify_change() being re-called. attr
_must_ be explicitly reset to avoid triggering the BUG() established
in #2.
4.) The path that that triggers this is via fs/open.c:chmod_common().
The combination of attr flags set here and in the first call to
notify_change() along with a later failed break_deleg_wait()
results in notify_change() being called again via retry_deleg
without resetting attr.

Solution is to move retry_deleg in chmod_common() a bit further up to
ensure attr is completely reset.

There are other places where this seemingly could occur, such as
fs/utimes.c:utimes_common(), but the attr flags are not initially
set in such a way to trigger this.

Fixes: 27ac0ffeac80 ("locks: break delegations on any attribute modification")
Reported-by: Eric Meddaugh
Tested-by: Eric Meddaugh
Signed-off-by: Andrew Elble
Signed-off-by: Al Viro

Andrew Elble
2015-04-12 10:24:34 +0800
e5b811e38 drop bogus check in file_open_root() ... Browse Code »

For one thing, LOOKUP_DIRECTORY will be dealt with in do_last().
For another, name can be an empty string, but not NULL - no callers
pass that and it would oops immediately if they would.

Signed-off-by: Al Viro

Al Viro
2015-04-12 10:24:32 +0800

25 Mar, 2015

1 commit

dd46c7877 fs: Add support FALLOC_FL_INSERT_RANGE for fallocate ... Browse Code »

FALLOC_FL_INSERT_RANGE command is the opposite command of
FALLOC_FL_COLLAPSE_RANGE that is needed for someone who wants to add
some data in the middle of file.

FALLOC_FL_INSERT_RANGE will create space for writing new data within
a file after shifting extents to right as given length. This command
also has same limitations as FALLOC_FL_COLLAPSE_RANGE in that
operations need to be filesystem block boundary aligned and cannot
cross the current EOF.

Signed-off-by: Namjae Jeon
Signed-off-by: Ashish Sangwan
Reviewed-by: Dave Chinner
Signed-off-by: Dave Chinner

Namjae Jeon
2015-03-25 12:07:05 +0800

18 Feb, 2015

1 commit

05016b0f0 Merge branch 'getname2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs ... Browse Code »

Pull getname/putname updates from Al Viro:
"Rework of getname/getname_kernel/etc., mostly from Paul Moore. Gets
rid of quite a pile of kludges between namei and audit..."

* 'getname2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
audit: replace getname()/putname() hacks with reference counters
audit: fix filename matching in __audit_inode() and __audit_inode_child()
audit: enable filename recording via getname_kernel()
simpler calling conventions for filename_mountpoint()
fs: create proper filename objects using getname_kernel()
fs: rework getname_kernel to handle up to PATH_MAX sized filenames
cut down the number of do_path_lookup() callers

Linus Torvalds
2015-02-18 07:27:47 +0800

17 Feb, 2015

1 commit

e748dcd09 vfs: remove get_xip_mem ... Browse Code »

All callers of get_xip_mem() are now gone. Remove checks for it,
initialisers of it, documentation of it and the only implementation of it.
Also remove mm/filemap_xip.c as it is now empty. Also remove
documentation of the long-gone get_xip_page().

Signed-off-by: Matthew Wilcox
Cc: Andreas Dilger
Cc: Boaz Harrosh
Cc: Christoph Hellwig
Cc: Dave Chinner
Cc: Jan Kara
Cc: Jens Axboe
Cc: Kirill A. Shutemov
Cc: Mathieu Desnoyers
Cc: Randy Dunlap
Cc: Ross Zwisler
Cc: Theodore Ts'o
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Matthew Wilcox
2015-02-17 09:56:03 +0800

23 Jan, 2015

1 commit

516891041 fs: create proper filename objects using getname_kernel() ... Browse Code »

There are several areas in the kernel that create temporary filename
objects using the following pattern:

int func(const char *name)
{
struct filename *file = { .name = name };
...
return 0;
}

... which for the most part works okay, but it causes havoc within the
audit subsystem as the filename object does not persist beyond the
lifetime of the function. This patch converts all of these temporary
filename objects into proper filename objects using getname_kernel()
and putname() which ensure that the filename object persists until the
audit subsystem is finished with it.

Also, a special thanks to Al Viro, Guenter Roeck, and Sabrina Dubroca
for helping resolve a difficult kernel panic on boot related to a
use-after-free problem in kern_path_create(); the thread can be seen
at the link below:

* https://lkml.org/lkml/2015/1/20/710

This patch includes code that was either based on, or directly written
by Al in the above thread.

CC: viro@zeniv.linux.org.uk
CC: linux@roeck-us.net
CC: sd@queasysnail.net
CC: linux-fsdevel@vger.kernel.org
Signed-off-by: Paul Moore
Signed-off-by: Al Viro

Paul Moore
2015-01-23 13:22:20 +0800

17 Dec, 2014

1 commit

0b233b7c7 Merge branch 'for-3.19' of git://linux-nfs.org/~bfields/linux ... Browse Code »

Pull nfsd updates from Bruce Fields:
"A comparatively quieter cycle for nfsd this time, but still with two
larger changes:

- RPC server scalability improvements from Jeff Layton (using RCU
instead of a spinlock to find idle threads).

- server-side NFSv4.2 ALLOCATE/DEALLOCATE support from Anna
Schumaker, enabling fallocate on new clients"

* 'for-3.19' of git://linux-nfs.org/~bfields/linux: (32 commits)
nfsd4: fix xdr4 count of server in fs_location4
nfsd4: fix xdr4 inclusion of escaped char
sunrpc/cache: convert to use string_escape_str()
sunrpc: only call test_bit once in svc_xprt_received
fs: nfsd: Fix signedness bug in compare_blob
sunrpc: add some tracepoints around enqueue and dequeue of svc_xprt
sunrpc: convert to lockless lookup of queued server threads
sunrpc: fix potential races in pool_stats collection
sunrpc: add a rcu_head to svc_rqst and use kfree_rcu to free it
sunrpc: require svc_create callers to pass in meaningful shutdown routine
sunrpc: have svc_wake_up only deal with pool 0
sunrpc: convert sp_task_pending flag to use atomic bitops
sunrpc: move rq_cachetype field to better optimize space
sunrpc: move rq_splice_ok flag into rq_flags
sunrpc: move rq_dropme flag into rq_flags
sunrpc: move rq_usedeferral flag to rq_flags
sunrpc: move rq_local field to rq_flags
sunrpc: add a generic rq_flags field to svc_rqst and move rq_secure to it
nfsd: minor off by one checks in __write_versions()
sunrpc: release svc_pool_map reference when serv allocation fails
...

Linus Torvalds
2014-12-17 07:25:31 +0800

14 Dec, 2014

1 commit

820c12d5d fallocate: create FAN_MODIFY and IN_MODIFY events ... Browse Code »

The fanotify and the inotify API can be used to monitor changes of the
file system. System call fallocate() modifies files. Hence it should
trigger the corresponding fanotify (FAN_MODIFY) and inotify (IN_MODIFY)
events. The most interesting case is FALLOC_FL_COLLAPSE_RANGE because
this value allows to create arbitrary file content from random data.

This patch adds the missing call to fsnotify_modify().

The FAN_MODIFY and IN_MODIFY event will be created when fallocate()
succeeds. It will even be created if the file length remains unchanged,
e.g. when calling fanotify with flag FALLOC_FL_KEEP_SIZE.

This logic was primarily chosen to keep the coding simple.

It resembles the logic of the write() system call.

When we call write() we always create a FAN_MODIFY event, even in the case
of overwriting with identical data.

Events FAN_MODIFY and IN_MODIFY do not provide any guarantee that data was
actually changed.

Furthermore even if if the filesize remains unchanged, fallocate() may
influence whether a subsequent write() will succeed and hence the
fallocate() call may be considered a modification.

The fallocate(2) man page teaches: After a successful call, subsequent
writes into the range specified by offset and len are guaranteed not to
fail because of lack of disk space.

So calling fallocate(fd, FALLOC_FL_KEEP_SIZE, offset, len) may result in
different outcomes of a subsequent write depending on the values of offset
and len.

Signed-off-by: Heinrich Schuchardt
Reviewed-by: Jan Kara
Cc: Jan Kara
Cc: Alexander Viro
Cc: Eric Paris
Cc: John McCutchan
Cc: Robert Love
Cc: Michael Kerrisk
Cc: Theodore Ts'o
Cc: Dave Chinner
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Heinrich Schuchardt
2014-12-14 04:42:53 +0800

20 Nov, 2014

2 commits

9f45f5bf3 new helper: audit_file() ... Browse Code »

... for situations when we don't have any candidate in pathnames - basically,
in descriptor-based syscalls.

[Folded the build fix for !CONFIG_AUDITSYSCALL configs from Chen Gang]

Signed-off-by: Al Viro

Al Viro
2014-11-20 02:01:26 +0800
56429e9b3 merge nfs bugfixes into nfsd for-3.19 branch ... Browse Code »

In addition to nfsd bugfixes, there are some fixes in -rc5 for client
bugs that can interfere with my testing.

J. Bruce Fields
2014-11-20 01:06:30 +0800

08 Nov, 2014

1 commit

72c72bdf7 VFS: Rename do_fallocate() to vfs_fallocate() ... Browse Code »

This function needs to be exported so it can be used by the NFSD module
when responding to the new ALLOCATE and DEALLOCATE operations in NFS
v4.2. Christoph Hellwig suggested renaming the function to stay
consistent with how other vfs functions are named.

Signed-off-by: Anna Schumaker
Signed-off-by: J. Bruce Fields

Anna Schumaker
2014-11-08 05:17:44 +0800

24 Oct, 2014

1 commit

4aa7c6346 vfs: add i_op->dentry_open() ... Browse Code »

Add a new inode operation i_op->dentry_open(). This is for stacked filesystems
that want to return a struct file from a different filesystem.

Signed-off-by: Miklos Szeredi

Miklos Szeredi
2014-10-24 06:14:35 +0800

01 Aug, 2014

1 commit

6d2b6170c vfs: fix check for fallocate on active swapfile ... Browse Code »

Fix the broken check for calling sys_fallocate() on an active swapfile,
introduced by commit 0790b31b69374ddadefe ("fs: disallow all fallocate
operation on active swapfile").

Signed-off-by: Eric Biggers
Signed-off-by: Al Viro

Eric Biggers
2014-08-01 14:36:04 +0800

07 May, 2014

2 commits

293bc9822 new methods: ->read_iter() and ->write_iter() ... Browse Code »

Beginning to introduce those. Just the callers for now, and it's
clumsier than it'll eventually become; once we finish converting
aio_read and aio_write instances, the things will get nicer.

For now, these guys are in parallel to ->aio_read() and ->aio_write();
they take iocb and iov_iter, with everything in iov_iter already
validated. File offset is passed in iocb->ki_pos, iov/nr_segs -
in iov_iter.

Main concerns in that series are stack footprint and ability to
split the damn thing cleanly.

[fix from Peter Ujfalusi folded]

Signed-off-by: Al Viro

Al Viro
2014-05-07 05:36:00 +0800
7f7f25e82 replace checking for ->read/->aio_read presence with check in ->f_mode ... Browse Code »

Since we are about to introduce new methods (read_iter/write_iter), the
tests in a bunch of places would have to grow inconveniently. Check
once (at open() time) and store results in ->f_mode as FMODE_CAN_READ
and FMODE_CAN_WRITE resp. It might end up being a temporary measure -
once everything switches from ->aio_{read,write} to ->{read,write}_iter
it might make sense to return to open-coded checks. We'll see...

Signed-off-by: Al Viro

Al Viro
2014-05-07 05:32:55 +0800