Eric Lee / smarc-fsl-linux-kernel

22 Aug, 2018

2 commits

d9a185f8b Merge tag 'ovl-update-4.19' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs ... Browse Code »

Pull overlayfs updates from Miklos Szeredi:
"This contains two new features:

- Stack file operations: this allows removal of several hacks from
the VFS, proper interaction of read-only open files with copy-up,
possibility to implement fs modifying ioctls properly, and others.

- Metadata only copy-up: when file is on lower layer and only
metadata is modified (except size) then only copy up the metadata
and continue to use the data from the lower file"

* tag 'ovl-update-4.19' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs: (66 commits)
ovl: Enable metadata only feature
ovl: Do not do metacopy only for ioctl modifying file attr
ovl: Do not do metadata only copy-up for truncate operation
ovl: add helper to force data copy-up
ovl: Check redirect on index as well
ovl: Set redirect on upper inode when it is linked
ovl: Set redirect on metacopy files upon rename
ovl: Do not set dentry type ORIGIN for broken hardlinks
ovl: Add an inode flag OVL_CONST_INO
ovl: Treat metacopy dentries as type OVL_PATH_MERGE
ovl: Check redirects for metacopy files
ovl: Move some dir related ovl_lookup_single() code in else block
ovl: Do not expose metacopy only dentry from d_real()
ovl: Open file with data except for the case of fsync
ovl: Add helper ovl_inode_realdata()
ovl: Store lower data inode in ovl_inode
ovl: Fix ovl_getattr() to get number of blocks from lower
ovl: Add helper ovl_dentry_lowerdata() to get lower data dentry
ovl: Copy up meta inode data from lowest data inode
ovl: Modify ovl_lookup() and friends to lookup metacopy dentry
...

Linus Torvalds
2018-08-22 09:19:09 +0800
0214f46b3 Merge branch 'siginfo-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/eb… ... Browse Code »

…iederm/user-namespace

Pull core signal handling updates from Eric Biederman:
"It was observed that a periodic timer in combination with a
sufficiently expensive fork could prevent fork from every completing.
This contains the changes to remove the need for that restart.

This set of changes is split into several parts:

- The first part makes PIDTYPE_TGID a proper pid type instead
something only for very special cases. The part starts using
PIDTYPE_TGID enough so that in __send_signal where signals are
actually delivered we know if the signal is being sent to a a group
of processes or just a single process.

- With that prep work out of the way the logic in fork is modified so
that fork logically makes signals received while it is running
appear to be received after the fork completes"

* 'siginfo-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: (22 commits)
signal: Don't send signals to tasks that don't exist
signal: Don't restart fork when signals come in.
fork: Have new threads join on-going signal group stops
fork: Skip setting TIF_SIGPENDING in ptrace_init_task
signal: Add calculate_sigpending()
fork: Unconditionally exit if a fatal signal is pending
fork: Move and describe why the code examines PIDNS_ADDING
signal: Push pid type down into complete_signal.
signal: Push pid type down into __send_signal
signal: Push pid type down into send_signal
signal: Pass pid type into do_send_sig_info
signal: Pass pid type into send_sigio_to_task & send_sigurg_to_task
signal: Pass pid type into group_send_sig_info
signal: Pass pid and pid type into send_sigqueue
posix-timers: Noralize good_sigevent
signal: Use PIDTYPE_TGID to clearly store where file signals will be sent
pid: Implement PIDTYPE_TGID
pids: Move the pgrp and session pid pointers from task_struct to signal_struct
kvm: Don't open code task_pid in kvm_vcpu_ioctl
pids: Compute task_tgid using signal->leader_pid
...

Linus Torvalds
2018-08-22 04:47:29 +0800

14 Aug, 2018

1 commit

575b94386 Merge tag 'locks-v4.19-1' of git://git.kernel.org/pub/scm/linux/kernel/git/jlayton/linux ... Browse Code »

Pull file locking updates from Jeff Layton:
"Just a couple of patches from Konstantin to fix /proc/locks when the
process that set the lock has exited, and a new tracepoint for the
flock() codepath. Also threw in mailmap entries for my addresses and a
comment cleanup"

* tag 'locks-v4.19-1' of git://git.kernel.org/pub/scm/linux/kernel/git/jlayton/linux:
locks: remove misleading obsolete comment
mailmap: remap some of my email addresses to kernel.org address
locks: add tracepoint in flock codepath
fs/lock: show locks taken by processes from another pidns
fs/lock: skip lock owner pid translation in case we are in init_pid_ns

Linus Torvalds
2018-08-14 12:56:50 +0800

09 Aug, 2018

1 commit

da33a871b locks: remove misleading obsolete comment ... Browse Code »

The spinlock handling in this file has changed significantly since this
comment was written, and the file_lock_lock is no more. In addition,
this overall comment no longer applies. Deleting an entry now requires
both locks.

Signed-off-by: Jeff Layton

Jeff Layton
2018-08-09 00:59:06 +0800

07 Aug, 2018

1 commit

c883da313 locks: add tracepoint in flock codepath ... Browse Code »

Signed-off-by: Jeff Layton

Jeff Layton
2018-08-07 01:15:16 +0800

21 Jul, 2018

1 commit

019191342 signal: Use PIDTYPE_TGID to clearly store where file signals will be sent ... Browse Code »

When f_setown is called a pid and a pid type are stored. Replace the use
of PIDTYPE_PID with PIDTYPE_TGID as PIDTYPE_TGID goes to the entire thread
group. Replace the use of PIDTYPE_MAX with PIDTYPE_PID as PIDTYPE_PID now
is only for a thread.

Update the users of __f_setown to use PIDTYPE_TGID instead of
PIDTYPE_PID.

For now the code continues to capture task_pid (when task_tgid would
really be appropriate), and iterate on PIDTYPE_PID (even when type ==
PIDTYPE_TGID) out of an abundance of caution to preserve existing
behavior.

Oleg Nesterov suggested using the test to ensure we use PIDTYPE_PID
for tgid lookup also be used to avoid taking the tasklist lock.

Suggested-by: Oleg Nesterov
Signed-off-by: "Eric W. Biederman"

Eric W. Biederman
2018-07-21 23:43:12 +0800

18 Jul, 2018

2 commits

de2a4a501 Partially revert "locks: fix file locking on overlayfs" ... Browse Code »

This partially reverts commit c568d68341be7030f5647def68851e469b21ca11.

Overlayfs files will now automatically get the correct locks, no need to
hack overlay support in VFS.

It is a partial revert, because it leaves the locks_inode() calls in place
and defines locks_inode() to file_inode(). We could revert those as well,
but it would be unnecessary code churn and it makes sense to document that
we are getting the inode for locking purposes.

Don't revert MS_NOREMOTELOCK yet since that has been part of the userspace
API for some time (though not in a useful way). Will try to remove
internal flags later when the dust around the new mount API settles.

Signed-off-by: Miklos Szeredi
Acked-by: Jeff Layton

Miklos Szeredi
2018-07-18 21:44:43 +0800
8cf9ee506 Revert "vfs: do get_write_access() on upper layer of overlayfs" ... Browse Code »

This reverts commit 4d0c5ba2ff79ef9f5188998b29fd28fcb05f3667.

We now get write access on both overlay and underlying layers so this patch
is no longer needed for correct operation.

Signed-off-by: Miklos Szeredi

Miklos Szeredi
2018-07-18 21:44:43 +0800

15 Jun, 2018

1 commit

7a932516f Merge tag 'vfs-timespec64' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/playground ... Browse Code »

Pull inode timestamps conversion to timespec64 from Arnd Bergmann:
"This is a late set of changes from Deepa Dinamani doing an automated
treewide conversion of the inode and iattr structures from 'timespec'
to 'timespec64', to push the conversion from the VFS layer into the
individual file systems.

As Deepa writes:

'The series aims to switch vfs timestamps to use struct timespec64.
Currently vfs uses struct timespec, which is not y2038 safe.

The series involves the following:
1. Add vfs helper functions for supporting struct timepec64
timestamps.
2. Cast prints of vfs timestamps to avoid warnings after the switch.
3. Simplify code using vfs timestamps so that the actual replacement
becomes easy.
4. Convert vfs timestamps to use struct timespec64 using a script.
This is a flag day patch.

Next steps:
1. Convert APIs that can handle timespec64, instead of converting
timestamps at the boundaries.
2. Update internal data structures to avoid timestamp conversions'

Thomas Gleixner adds:

'I think there is no point to drag that out for the next merge
window. The whole thing needs to be done in one go for the core
changes which means that you're going to play that catchup game
forever. Let's get over with it towards the end of the merge window'"

* tag 'vfs-timespec64' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/playground:
pstore: Remove bogus format string definition
vfs: change inode times to use struct timespec64
pstore: Convert internal records to timespec64
udf: Simplify calls to udf_disk_stamp_to_time
fs: nfs: get rid of memcpys for inode times
ceph: make inode time prints to be long long
lustre: Use long long type to print inode time
fs: add timespec64_truncate()

Linus Torvalds
2018-06-15 06:31:07 +0800

14 Jun, 2018

2 commits

1cf8e5de4 fs/lock: show locks taken by processes from another pidns ... Browse Code »

Currently if we face a lock taken by a process invisible in the current
pidns we skip the lock completely, but this

1) makes the output not that nice
(root@vz7)/: cat /proc/${PID_A2}/fdinfo/3
pos: 4
flags: 02100002
mnt_id: 257
lock: (root@vz7)/:

2) makes it more difficult to debug issues with leaked flocks
if you get error on lock, but don't see any locks in /proc/$id/fdinfo/$file

Let's show information about such locks again as previously, but
show zero in the owner pid field.

After the patch:
===============
(root@vz7)/:cat /proc/${PID_A2}/fdinfo/3
pos: 4
flags: 02100002
mnt_id: 295
lock: 1: FLOCK ADVISORY WRITE 0 b6:f8a61:529946 0 EOF

Fixes: 9d5b86ac13c5 ("fs/locks: Remove fl_nspid and use fs-specific l_pid for remote locks")
Signed-off-by: Konstantin Khorenko
Acked-by: Andrey Vagin
Reviewed-by: Benjamin Coddington
Signed-off-by: Jeff Layton

Konstantin Khorenko
2018-06-14 19:37:50 +0800
826d7bc9f fs/lock: skip lock owner pid translation in case we are in init_pid_ns ... Browse Code »

If the flock owner process is dead and its pid has been already freed,
pid translation won't work, but we still want to show flock owner pid
number when expecting /proc/$PID/fdinfo/$FD in init pidns.

Reproducer:
process A process A1 process A2
fork()--------->
exit() open()
flock()
fork()--------->
exit() sleep()

Before the patch:
================
(root@vz7)/: cat /proc/${PID_A2}/fdinfo/3
pos: 4
flags: 02100002
mnt_id: 257
lock: (root@vz7)/:

After the patch:
===============
(root@vz7)/:cat /proc/${PID_A2}/fdinfo/3
pos: 4
flags: 02100002
mnt_id: 295
lock: 1: FLOCK ADVISORY WRITE ${PID_A1} b6:f8a61:529946 0 EOF

Fixes: 9d5b86ac13c5 ("fs/locks: Remove fl_nspid and use fs-specific l_pid for remote locks")
Signed-off-by: Konstantin Khorenko
Acked-by: Andrey Vagin
Reviewed-by: Benjamin Coddington
Signed-off-by: Jeff Layton

Konstantin Khorenko
2018-06-14 19:36:40 +0800

06 Jun, 2018

1 commit

95582b008 vfs: change inode times to use struct timespec64 ... Browse Code »

struct timespec is not y2038 safe. Transition vfs to use
y2038 safe struct timespec64 instead.

The change was made with the help of the following cocinelle
script. This catches about 80% of the changes.
All the header file and logic changes are included in the
first 5 rules. The rest are trivial substitutions.
I avoid changing any of the function signatures or any other
filesystem specific data structures to keep the patch simple
for review.

The script can be a little shorter by combining different cases.
But, this version was sufficient for my usecase.

virtual patch

@ depends on patch @
identifier now;
@@
- struct timespec
+ struct timespec64
current_time ( ... )
{
- struct timespec now = current_kernel_time();
+ struct timespec64 now = current_kernel_time64();
...
- return timespec_trunc(
+ return timespec64_trunc(
... );
}

@ depends on patch @
identifier xtime;
@@
struct $ iattr \| inode \| kstat $ {
...
- struct timespec xtime;
+ struct timespec64 xtime;
...
}

@ depends on patch @
identifier t;
@@
struct inode_operations {
...
int (*update_time) (...,
- struct timespec t,
+ struct timespec64 t,
...);
...
}

@ depends on patch @
identifier t;
identifier fn_update_time =~ "update_time$";
@@
fn_update_time (...,
- struct timespec *t,
+ struct timespec64 *t,
...) { ... }

@ depends on patch @
identifier t;
@@
lease_get_mtime( ... ,
- struct timespec *t
+ struct timespec64 *t
) { ... }

@te depends on patch forall@
identifier ts;
local idexpression struct inode *inode_node;
identifier i_xtime =~ "^i_[acm]time$";
identifier ia_xtime =~ "^ia_[acm]time$";
identifier fn_update_time =~ "update_time$";
identifier fn;
expression e, E3;
local idexpression struct inode *node1;
local idexpression struct inode *node2;
local idexpression struct iattr *attr1;
local idexpression struct iattr *attr2;
local idexpression struct iattr attr;
identifier i_xtime1 =~ "^i_[acm]time$";
identifier i_xtime2 =~ "^i_[acm]time$";
identifier ia_xtime1 =~ "^ia_[acm]time$";
identifier ia_xtime2 =~ "^ia_[acm]time$";
@@
(
(
- struct timespec ts;
+ struct timespec64 ts;
|
- struct timespec ts = current_time(inode_node);
+ struct timespec64 ts = current_time(inode_node);
)

i_xtime, &ts)
+ timespec64_equal(&inode_node->i_xtime, &ts)
|
- timespec_equal(&ts, &inode_node->i_xtime)
+ timespec64_equal(&ts, &inode_node->i_xtime)
|
- timespec_compare(&inode_node->i_xtime, &ts)
+ timespec64_compare(&inode_node->i_xtime, &ts)
|
- timespec_compare(&ts, &inode_node->i_xtime)
+ timespec64_compare(&ts, &inode_node->i_xtime)
|
ts = current_time(e)
|
fn_update_time(..., &ts,...)
|
inode_node->i_xtime = ts
|
node1->i_xtime = ts
|
ts = inode_node->i_xtime
|
ia_xtime ...+> = ts
|
ts = attr1->ia_xtime
|
ts.tv_sec
|
ts.tv_nsec
|
btrfs_set_stack_timespec_sec(..., ts.tv_sec)
|
btrfs_set_stack_timespec_nsec(..., ts.tv_nsec)
|
- ts = timespec64_to_timespec(
+ ts =
...
-)
|
- ts = ktime_to_timespec(
+ ts = ktime_to_timespec64(
...)
|
- ts = E3
+ ts = timespec_to_timespec64(E3)
|
- ktime_get_real_ts(&ts)
+ ktime_get_real_ts64(&ts)
|
fn(...,
- ts
+ timespec64_to_timespec(ts)
,...)
)
...+>
(

)
|
- timespec_equal(&node1->i_xtime1, &node2->i_xtime2)
+ timespec64_equal(&node1->i_xtime2, &node2->i_xtime2)
|
- timespec_equal(&node1->i_xtime1, &attr2->ia_xtime2)
+ timespec64_equal(&node1->i_xtime2, &attr2->ia_xtime2)
|
- timespec_compare(&node1->i_xtime1, &node2->i_xtime2)
+ timespec64_compare(&node1->i_xtime1, &node2->i_xtime2)
|
node1->i_xtime1 =
- timespec_trunc(attr1->ia_xtime1,
+ timespec64_trunc(attr1->ia_xtime1,
...)
|
- attr1->ia_xtime1 = timespec_trunc(attr2->ia_xtime2,
+ attr1->ia_xtime1 = timespec64_trunc(attr2->ia_xtime2,
...)
|
- ktime_get_real_ts(&attr1->ia_xtime1)
+ ktime_get_real_ts64(&attr1->ia_xtime1)
|
- ktime_get_real_ts(&attr.ia_xtime1)
+ ktime_get_real_ts64(&attr.ia_xtime1)
)

@ depends on patch @
struct inode *node;
struct iattr *attr;
identifier fn;
identifier i_xtime =~ "^i_[acm]time$";
identifier ia_xtime =~ "^ia_[acm]time$";
expression e;
@@
(
- fn(node->i_xtime);
+ fn(timespec64_to_timespec(node->i_xtime));
|
fn(...,
- node->i_xtime);
+ timespec64_to_timespec(node->i_xtime));
|
- e = fn(attr->ia_xtime);
+ e = fn(timespec64_to_timespec(attr->ia_xtime));
)

@ depends on patch forall @
struct inode *node;
struct iattr *attr;
identifier i_xtime =~ "^i_[acm]time$";
identifier ia_xtime =~ "^ia_[acm]time$";
identifier fn;
@@
{
+ struct timespec ts;
i_xtime);
fn (...,
- &node->i_xtime,
+ &ts,
...);
|
+ ts = timespec64_to_timespec(attr->ia_xtime);
fn (...,
- &attr->ia_xtime,
+ &ts,
...);
)
...+>
}

@ depends on patch forall @
struct inode *node;
struct iattr *attr;
struct kstat *stat;
identifier ia_xtime =~ "^ia_[acm]time$";
identifier i_xtime =~ "^i_[acm]time$";
identifier xtime =~ "^[acm]time$";
identifier fn, ret;
@@
{
+ struct timespec ts;
i_xtime);
ret = fn (...,
- &node->i_xtime,
+ &ts,
...);
|
+ ts = timespec64_to_timespec(node->i_xtime);
ret = fn (...,
- &node->i_xtime);
+ &ts);
|
+ ts = timespec64_to_timespec(attr->ia_xtime);
ret = fn (...,
- &attr->ia_xtime,
+ &ts,
...);
|
+ ts = timespec64_to_timespec(attr->ia_xtime);
ret = fn (...,
- &attr->ia_xtime);
+ &ts);
|
+ ts = timespec64_to_timespec(stat->xtime);
ret = fn (...,
- &stat->xtime);
+ &ts);
)
...+>
}

@ depends on patch @
struct inode *node;
struct inode *node2;
identifier i_xtime1 =~ "^i_[acm]time$";
identifier i_xtime2 =~ "^i_[acm]time$";
identifier i_xtime3 =~ "^i_[acm]time$";
struct iattr *attrp;
struct iattr *attrp2;
struct iattr attr ;
identifier ia_xtime1 =~ "^ia_[acm]time$";
identifier ia_xtime2 =~ "^ia_[acm]time$";
struct kstat *stat;
struct kstat stat1;
struct timespec64 ts;
identifier xtime =~ "^[acmb]time$";
expression e;
@@
(
( node->i_xtime2 \| attrp->ia_xtime2 \| attr.ia_xtime2 \) = node->i_xtime1 ;
|
node->i_xtime2 = $ node2->i_xtime1 \| timespec64_trunc(...) $;
|
node->i_xtime2 = node->i_xtime1 = node->i_xtime3 = $ts \| current_time(...) $;
|
node->i_xtime1 = node->i_xtime3 = $ts \| current_time(...) $;
|
stat->xtime = node2->i_xtime1;
|
stat1.xtime = node2->i_xtime1;
|
( node->i_xtime2 \| attrp->ia_xtime2 \) = attrp->ia_xtime1 ;
|
( attrp->ia_xtime1 \| attr.ia_xtime1 \) = attrp2->ia_xtime2;
|
- e = node->i_xtime1;
+ e = timespec64_to_timespec( node->i_xtime1 );
|
- e = attrp->ia_xtime1;
+ e = timespec64_to_timespec( attrp->ia_xtime1 );
|
node->i_xtime1 = current_time(...);
|
node->i_xtime2 = node->i_xtime1 = node->i_xtime3 =
- e;
+ timespec_to_timespec64(e);
|
node->i_xtime1 = node->i_xtime3 =
- e;
+ timespec_to_timespec64(e);
|
- node->i_xtime1 = e;
+ node->i_xtime1 = timespec_to_timespec64(e);
)

Signed-off-by: Deepa Dinamani
Cc:
Cc:
Cc:
Cc:
Cc:
Cc:
Cc:
Cc:
Cc:
Cc:
Cc:
Cc:
Cc:
Cc:
Cc:
Cc:
Cc:
Cc:
Cc:
Cc:
Cc:
Cc:
Cc:
Cc:
Cc:
Cc:
Cc:

Deepa Dinamani
2018-06-06 07:57:31 +0800

16 May, 2018

1 commit

44414d82c proc: introduce proc_create_seq_private ... Browse Code »

Variant of proc_create_data that directly take a struct seq_operations
argument + a private state size and drastically reduces the boilerplate
code in the callers.

All trivial callers converted over.

Signed-off-by: Christoph Hellwig

Christoph Hellwig
2018-05-16 13:23:35 +0800

26 Mar, 2018

1 commit

447a5647c treewide: Align function definition open/close braces ... Browse Code »

Some functions definitions have either the initial open brace and/or
the closing brace outside of column 1.

Move those braces to column 1.

This allows various function analyzers like gnu complexity to work
properly for these modified functions.

Signed-off-by: Joe Perches
Acked-by: Andy Shevchenko
Acked-by: Paul Moore
Acked-by: Alex Deucher
Acked-by: Dave Chinner
Reviewed-by: Darrick J. Wong
Acked-by: Alexandre Belloni
Acked-by: Martin K. Petersen
Acked-by: Takashi Iwai
Acked-by: Mauro Carvalho Chehab
Acked-by: Rafael J. Wysocki
Acked-by: Nicolin Chen
Acked-by: Martin K. Petersen
Acked-by: Steven Rostedt (VMware)
Signed-off-by: Jiri Kosina

Joe Perches
2018-03-26 17:13:09 +0800

09 Feb, 2018

2 commits

f1517df87 Merge tag 'nfsd-4.16' of git://linux-nfs.org/~bfields/linux ... Browse Code »

Pull nfsd update from Bruce Fields:
"A fairly small update this time around. Some cleanup, RDMA fixes,
overlayfs fixes, and a fix for an NFSv4 state bug.

The bigger deal for nfsd this time around was Jeff Layton's
already-merged i_version patches"

* tag 'nfsd-4.16' of git://linux-nfs.org/~bfields/linux:
svcrdma: Fix Read chunk round-up
NFSD: hide unused svcxdr_dupstr()
nfsd: store stat times in fill_pre_wcc() instead of inode times
nfsd: encode stat->mtime for getattr instead of inode->i_mtime
nfsd: return RESOURCE not GARBAGE_ARGS on too many ops
nfsd4: don't set lock stateid's sc_type to CLOSED
nfsd: Detect unhashed stids in nfsd4_verify_open_stid()
sunrpc: remove dead code in svc_sock_setbufsize
svcrdma: Post Receives in the Receive completion handler
nfsd4: permit layoutget of executable-only files
lockd: convert nlm_rqst.a_count from atomic_t to refcount_t
lockd: convert nlm_lockowner.count from atomic_t to refcount_t
lockd: convert nsm_handle.sm_count from atomic_t to refcount_t

Linus Torvalds
2018-02-09 07:18:32 +0800
76c479480 nfsd: encode stat->mtime for getattr instead of inode->i_mtime ... Browse Code »

The values of stat->mtime and inode->i_mtime may differ for overlayfs
and stat->mtime is the correct value to use when encoding getattr.
This is also consistent with the fact that other attr times are also
encoded from stat values.

Both callers of lease_get_mtime() already have the value of stat->mtime,
so the only needed change is that lease_get_mtime() will not overwrite
this value with inode->i_mtime in case the inode does not have an
exclusive lease.

Signed-off-by: Amir Goldstein
Reviewed-by: Jeff Layton
Signed-off-by: J. Bruce Fields

Amir Goldstein
2018-02-09 02:40:16 +0800

28 Nov, 2017

1 commit

1751e8a6c Rename superblock flags (MS_xyz -> SB_xyz) ... Browse Code »

This is a pure automated search-and-replace of the internal kernel
superblock flags.

The s_flags are now called SB_*, with the names and the values for the
moment mirroring the MS_* flags that they're equivalent to.

Note how the MS_xyz flags are the ones passed to the mount system call,
while the SB_xyz flags are what we then use in sb->s_flags.

The script to do this was:

# places to look in; re security/*: it generally should *not* be
# touched (that stuff parses mount(2) arguments directly), but
# there are two places where we really deal with superblock flags.
FILES="drivers/mtd drivers/staging/lustre fs ipc mm \
include/linux/fs.h include/uapi/linux/bfs_fs.h \
security/apparmor/apparmorfs.c security/apparmor/include/lib.h"
# the list of MS_... constants
SYMS="RDONLY NOSUID NODEV NOEXEC SYNCHRONOUS REMOUNT MANDLOCK \
DIRSYNC NOATIME NODIRATIME BIND MOVE REC VERBOSE SILENT \
POSIXACL UNBINDABLE PRIVATE SLAVE SHARED RELATIME KERNMOUNT \
I_VERSION STRICTATIME LAZYTIME SUBMOUNT NOREMOTELOCK NOSEC BORN \
ACTIVE NOUSER"

SED_PROG=
for i in $SYMS; do SED_PROG="$SED_PROG -e s/MS_$i/SB_$i/g"; done

# we want files that contain at least one of MS_...,
# with fs/namespace.c and fs/pnode.c excluded.
L=$(for i in $SYMS; do git grep -w -l MS_$i $FILES; done| sort|uniq|grep -v '^fs/namespace.c'|grep -v '^fs/pnode.c')

for f in $L; do sed -i $f $SED_PROG; done

Requested-by: Al Viro
Signed-off-by: Linus Torvalds

Linus Torvalds
2017-11-28 05:05:09 +0800

22 Jul, 2017

1 commit

3953704fd locks: restore a warn for leaked locks on close ... Browse Code »

When locks.c moved to using file_lock_context, the check for any locks that
were not released was moved from the __fput() to destroy_inode() path in
commit 8634b51f6ca2 ("locks: convert lease handling to file_lock_context").
This warning has been quite useful for catching bugs, particularly in NFS
where lock handling still sees some churn.

Let's bring back the warning for leaked locks on __fput, as this warning is
much more likely to be seen and reported by users.

Signed-off-by: Benjamin Coddington
Signed-off-by: Jeff Layton

Benjamin Coddington
2017-07-22 01:57:31 +0800

16 Jul, 2017

2 commits

9d5b86ac1 fs/locks: Remove fl_nspid and use fs-specific l_pid for remote locks ... Browse Code »

Since commit c69899a17ca4 "NFSv4: Update of VFS byte range lock must be
atomic with the stateid update", NFSv4 has been inserting locks in rpciod
worker context. The result is that the file_lock's fl_nspid is the
kworker's pid instead of the original userspace pid.

The fl_nspid is only used to represent the namespaced virtual pid number
when displaying locks or returning from F_GETLK. There's no reason to set
it for every inserted lock, since we can usually just look it up from
fl_pid. So, instead of looking up and holding struct pid for every lock,
let's just look up the virtual pid number from fl_pid when it is needed.
That means we can remove fl_nspid entirely.

The translaton and presentation of fl_pid should handle the following four
cases:

1 - F_GETLK on a remote file with a remote lock:
In this case, the filesystem should determine the l_pid to return here.
Filesystems should indicate that the fl_pid represents a non-local pid
value that should not be translated by returning an fl_pid
Signed-off-by: Jeff Layton

Benjamin Coddington
2017-07-16 22:28:22 +0800
52306e882 fs/locks: Use allocation rather than the stack in fcntl_getlk() ... Browse Code »

Struct file_lock is fairly large, so let's save some space on the stack by
using an allocation for struct file_lock in fcntl_getlk(), just as we do
for fcntl_setlk().

Signed-off-by: Benjamin Coddington
Signed-off-by: Jeff Layton

Benjamin Coddington
2017-07-16 22:28:21 +0800

27 May, 2017

2 commits

a75d30c77 fs/locks: pass kernel struct flock to fcntl_getlk/setlk ... Browse Code »

This will make it easier to implement a sane compat fcntl syscall.

[ jlayton: fix undeclared identifiers in 32-bit fcntl64 syscall handler ]

Signed-off-by: Christoph Hellwig
Reviewed-by: Jeff Layton
Signed-off-by: Jeff Layton

Christoph Hellwig
2017-05-27 18:07:19 +0800
80b79dd0e fs: locks: Fix some troubles at kernel-doc comments ... Browse Code »

There are a few syntax violations that cause outputs of
a few comments to not be properly parsed in ReST format.

No functional changes.

Signed-off-by: Mauro Carvalho Chehab
Signed-off-by: Jeff Layton

Mauro Carvalho Chehab
2017-05-27 18:07:18 +0800

21 Apr, 2017

1 commit

50f2112cf locks: Set FL_CLOSE when removing flock locks on close() ... Browse Code »

Set FL_CLOSE in fl_flags as in locks_remove_posix() when clearing locks.
NFS will check for this flag to ensure an unlock is sent in a following
patch.

Fuse handles flock and posix locks differently for FL_CLOSE, and so
requires a fixup to retain the existing behavior for flock.

Signed-off-by: Benjamin Coddington
Reviewed-by: Jeff Layton
Acked-by: Miklos Szeredi
Signed-off-by: Trond Myklebust

Benjamin Coddington
2017-04-21 22:45:01 +0800

25 Dec, 2016

1 commit

7c0f6ba68 Replace <asm/uaccess.h> with <linux/uaccess.h> globally ... Browse Code »

This was entirely automated, using the script by Al:

PATT='^[[:blank:]]*#[[:blank:]]*include[[:blank:]]*'
sed -i -e "s!$PATT!#include !" \
$(git grep -l "$PATT"|grep -v ^include/linux/uaccess.h)

to do the replacement at the end of the merge window.

Requested-by: Al Viro
Signed-off-by: Linus Torvalds

Linus Torvalds
2016-12-25 03:46:01 +0800

18 Oct, 2016

1 commit

5f43086bb locking, fs/locks: Add missing file_sem locks ... Browse Code »

I overlooked a few code-paths that can lead to
locks_delete_global_locks().

Reported-by: Dmitry Vyukov
Signed-off-by: Peter Zijlstra (Intel)
Reviewed-by: Jeff Layton
Cc: Al Viro
Cc: Bruce Fields
Cc: Linus Torvalds
Cc: Peter Zijlstra
Cc: Thomas Gleixner
Cc: linux-fsdevel@vger.kernel.org
Cc: syzkaller
Link: http://lkml.kernel.org/r/20161008081228.GF3142@twins.programming.kicks-ass.net
Signed-off-by: Ingo Molnar

Peter Zijlstra
2016-10-18 18:21:28 +0800

11 Oct, 2016

2 commits

101105b17 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs ... Browse Code »

Pull more vfs updates from Al Viro:
">rename2() work from Miklos + current_time() from Deepa"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
fs: Replace current_fs_time() with current_time()
fs: Replace CURRENT_TIME_SEC with current_time() for inode timestamps
fs: Replace CURRENT_TIME with current_time() for inode timestamps
fs: proc: Delete inode time initializations in proc_alloc_inode()
vfs: Add current_time() api
vfs: add note about i_op->rename changes to porting
fs: rename "rename2" i_op to "rename"
vfs: remove unused i_op->rename
fs: make remaining filesystems use .rename2
libfs: support RENAME_NOREPLACE in simple_rename()
fs: support RENAME_NOREPLACE for local filesystems
ncpfs: fix unused variable warning

Linus Torvalds
2016-10-11 11:16:43 +0800
abb5a14fa Merge branch 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs ... Browse Code »

Pull misc vfs updates from Al Viro:
"Assorted misc bits and pieces.

There are several single-topic branches left after this (rename2
series from Miklos, current_time series from Deepa Dinamani, xattr
series from Andreas, uaccess stuff from from me) and I'd prefer to
send those separately"

* 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (39 commits)
proc: switch auxv to use of __mem_open()
hpfs: support FIEMAP
cifs: get rid of unused arguments of CIFSSMBWrite()
posix_acl: uapi header split
posix_acl: xattr representation cleanups
fs/aio.c: eliminate redundant loads in put_aio_ring_file
fs/internal.h: add const to ns_dentry_operations declaration
compat: remove compat_printk()
fs/buffer.c: make __getblk_slow() static
proc: unsigned file descriptors
fs/file: more unsigned file descriptors
fs: compat: remove redundant check of nr_segs
cachefiles: Fix attempt to read i_blocks after deleting file [ver #2]
cifs: don't use memcpy() to copy struct iov_iter
get rid of separate multipage fault-in primitives
fs: Avoid premature clearing of capabilities
fs: Give dentry to inode_change_ok() instead of inode
fuse: Propagate dentry down to inode_change_ok()
ceph: Propagate dentry down to inode_change_ok()
xfs: Propagate dentry down to inode_change_ok()
...

Linus Torvalds
2016-10-11 04:04:49 +0800

05 Oct, 2016

1 commit

c35bcfd8e Merge tag 'locks-v4.9-1' of git://git.samba.org/jlayton/linux ... Browse Code »

Pull file locking updates from Jeff Layton:
"Only a single patch from Nikolay this cycle, with a small change to
better handle /proc/locks in a containerized host"

* tag 'locks-v4.9-1' of git://git.samba.org/jlayton/linux:
locks: Filter /proc/locks output on proc pid ns

Linus Torvalds
2016-10-05 04:36:19 +0800

28 Sep, 2016

1 commit

c2050a454 fs: Replace current_fs_time() with current_time() ... Browse Code »

current_fs_time() uses struct super_block* as an argument.
As per Linus's suggestion, this is changed to take struct
inode* as a parameter instead. This is because the function
is primarily meant for vfs inode timestamps.
Also the function was renamed as per Arnd's suggestion.

Change all calls to current_fs_time() to use the new
current_time() function instead. current_fs_time() will be
deleted.

Signed-off-by: Deepa Dinamani
Signed-off-by: Al Viro

Deepa Dinamani
2016-09-28 09:06:22 +0800

22 Sep, 2016

3 commits

87709e28d fs/locks: Use percpu_down_read_preempt_disable() ... Browse Code »

Avoid spurious preemption.

Signed-off-by: Peter Zijlstra (Intel)
Cc: Al Viro
Cc: Linus Torvalds
Cc: Oleg Nesterov
Cc: Peter Zijlstra
Cc: Thomas Gleixner
Cc: dave@stgolabs.net
Cc: der.herr@hofr.at
Cc: paulmck@linux.vnet.ibm.com
Cc: riel@redhat.com
Cc: tj@kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar

Peter Zijlstra
2016-09-22 21:25:54 +0800
7c3f654d8 fs/locks: Replace lg_local with a per-cpu spinlock ... Browse Code »

As Oleg suggested, replace file_lock_list with a structure containing
the hlist head and a spinlock.

This completely removes the lglock from fs/locks.

Suggested-by: Oleg Nesterov
Signed-off-by: Peter Zijlstra (Intel)
Cc: Al Viro
Cc: Linus Torvalds
Cc: Peter Zijlstra
Cc: Thomas Gleixner
Cc: dave@stgolabs.net
Cc: der.herr@hofr.at
Cc: paulmck@linux.vnet.ibm.com
Cc: riel@redhat.com
Cc: tj@kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar

Peter Zijlstra
2016-09-22 21:25:53 +0800
aba376607 fs/locks: Replace lg_global with a percpu-rwsem ... Browse Code »

Replace the global part of the lglock with a percpu-rwsem.

Since fcl_lock is a spinlock and itself nests under i_lock, which too
is a spinlock we cannot acquire sleeping locks at
locks_{insert,remove}_global_locks().

We can however wrap all fcl_lock acquisitions with percpu_down_read
such that all invocations of locks_{insert,remove}_global_locks() have
that read lock held.

This allows us to replace the lg_global part of the lglock with the
write side of the rwsem.

In the absense of writers, percpu_{down,up}_read() are free of atomic
instructions. This further avoids the very long preempt-disable
regions caused by lglock on larger machines.

Signed-off-by: Peter Zijlstra (Intel)
Cc: Al Viro
Cc: Linus Torvalds
Cc: Oleg Nesterov
Cc: Peter Zijlstra
Cc: Thomas Gleixner
Cc: dave@stgolabs.net
Cc: der.herr@hofr.at
Cc: paulmck@linux.vnet.ibm.com
Cc: riel@redhat.com
Cc: tj@kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar

Peter Zijlstra
2016-09-22 21:25:53 +0800

16 Sep, 2016

2 commits

4d0c5ba2f vfs: do get_write_access() on upper layer of overlayfs ... Browse Code »

The problem with writecount is: we want consistent handling of it for
underlying filesystems as well as overlayfs. Making sure i_writecount is
correct on all layers is difficult. Instead this patch makes sure that
when write access is acquired, it's always done on the underlying writable
layer (called the upper layer). We must also make sure to look at the
writecount on this layer when checking for conflicting leases.

Open for write already updates the upper layer's writecount. Leaving only
truncate.

For truncate copy up must happen before get_write_access() so that the
writecount is updated on the upper layer. Problem with this is if
something fails after that, then copy-up was done needlessly. E.g. if
break_lease() was interrupted. Probably not a big deal in practice.

Another interesting case is if there's a denywrite on a lower file that is
then opened for write or truncated. With this patch these will succeed,
which is somewhat counterintuitive. But I think it's still acceptable,
considering that the copy-up does actually create a different file, so the
old, denywrite mapping won't be touched.

On non-overlayfs d_real() is an identity function and d_real_inode() is
equivalent to d_inode() so this patch doesn't change behavior in that case.

Signed-off-by: Miklos Szeredi
Acked-by: Jeff Layton
Cc: "J. Bruce Fields"

Miklos Szeredi
2016-09-16 18:44:21 +0800
c568d6834 locks: fix file locking on overlayfs ... Browse Code »

This patch allows flock, posix locks, ofd locks and leases to work
correctly on overlayfs.

Instead of using the underlying inode for storing lock context use the
overlay inode. This allows locks to be persistent across copy-up.

This is done by introducing locks_inode() helper and using it instead of
file_inode() to get the inode in locking code. For non-overlayfs the two
are equivalent, except for an extra pointer dereference in locks_inode().

Since lock operations are in "struct file_operations" we must also make
sure not to call underlying filesystem's lock operations. Introcude a
super block flag MS_NOREMOTELOCK to this effect.

Signed-off-by: Miklos Szeredi
Acked-by: Jeff Layton
Cc: "J. Bruce Fields"

Miklos Szeredi
2016-09-16 18:44:20 +0800

19 Aug, 2016

1 commit

d67fd44f6 locks: Filter /proc/locks output on proc pid ns ... Browse Code »

On busy container servers reading /proc/locks shows all the locks
created by all clients. This can cause large latency spikes. In my
case I observed lsof taking up to 5-10 seconds while processing around
50k locks. Fix this by limiting the locks shown only to those created
in the same pidns as the one the proc fs was mounted in. When reading
/proc/locks from the init_pid_ns proc instance then perform no
filtering

[ jlayton: reformat comments for 80 columns ]

Signed-off-by: Nikolay Borisov
Suggested-by: Eric W. Biederman
Signed-off-by: Jeff Layton

Nikolay Borisov
2016-08-19 01:49:41 +0800

01 Jul, 2016

1 commit

6343a2120 locks: use file_inode() ... Browse Code »

(Another one for the f_path debacle.)

ltp fcntl33 testcase caused an Oops in selinux_file_send_sigiotask.

The reason is that generic_add_lease() used filp->f_path.dentry->inode
while all the others use file_inode(). This makes a difference for files
opened on overlayfs since the former will point to the overlay inode the
latter to the underlying inode.

So generic_add_lease() added the lease to the overlay inode and
generic_delete_lease() removed it from the underlying inode. When the file
was released the lease remained on the overlay inode's lock list, resulting
in use after free.

Reported-by: Eryu Guan
Fixes: 4bacc9c9234c ("overlayfs: Make f_path always point to the overlay and f_inode to the underlay")
Cc:
Signed-off-by: Miklos Szeredi
Reviewed-by: Jeff Layton
Signed-off-by: J. Bruce Fields

Miklos Szeredi
2016-07-01 22:24:18 +0800

23 Jan, 2016

1 commit

5955102c9 wrappers for ->i_mutex access ... Browse Code »

parallel to mutex_{lock,unlock,trylock,is_locked,lock_nested},
inode_foo(inode) being mutex_foo(&inode->i_mutex).

Please, use those for access to ->i_mutex; over the coming cycle
->i_mutex will become rwsem, with ->lookup() done with it held
only shared.

Signed-off-by: Al Viro

Al Viro
2016-01-23 07:04:28 +0800

13 Jan, 2016

1 commit

fce205e9d Merge branch 'work.copy_file_range' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs ... Browse Code »

Pull vfs copy_file_range updates from Al Viro:
"Several series around copy_file_range/CLONE"

* 'work.copy_file_range' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
btrfs: use new dedupe data function pointer
vfs: hoist the btrfs deduplication ioctl to the vfs
vfs: wire up compat ioctl for CLONE/CLONE_RANGE
cifs: avoid unused variable and label
nfsd: implement the NFSv4.2 CLONE operation
nfsd: Pass filehandle to nfs4_preprocess_stateid_op()
vfs: pull btrfs clone API to vfs layer
locks: new locks_mandatory_area calling convention
vfs: Add vfs_copy_file_range() support for pagecache copies
btrfs: add .copy_file_range file operation
x86: add sys_copy_file_range to syscall tables
vfs: add copy_file_range syscall and vfs helper

Linus Torvalds
2016-01-13 08:30:34 +0800

09 Jan, 2016

2 commits

b4d629a39 locks: rename __posix_lock_file to posix_lock_inode ... Browse Code »

...a more descriptive name and we can drop the double underscore prefix.

Signed-off-by: Jeff Layton
Acked-by: "J. Bruce Fields"

Jeff Layton
2016-01-09 00:38:30 +0800
e24dadab0 locks: prink more detail when there are leaked locks ... Browse Code »

Right now, we just get WARN_ON_ONCE, which is not particularly helpful.
Have it dump some info about the locks and the inode to make it easier
to track down leaked locks in the future.

Signed-off-by: Jeff Layton
Acked-by: "J. Bruce Fields"

Jeff Layton
2016-01-09 00:38:25 +0800