31 Mar, 2014

16 commits

  • Allow locks_mandatory_area() to handle file-private locks correctly.
    If there is a file-private lock set on an open file and we're doing I/O
    via that same file, then that should not cause anything to block.

    Handle this by first doing a non-blocking FL_ACCESS check for a
    file-private lock, and then fall back to checking for a classic POSIX
    lock (and possibly blocking).

    Note that this approach is subject to the same races that have always
    plagued mandatory locking on Linux.

    Reported-by: Trond Myklebust
    Signed-off-by: Jeff Layton

    Jeff Layton
     
  • As Trond pointed out, you can currently deadlock yourself by setting a
    file-private lock on a file that requires mandatory locking and then
    trying to do I/O on it.

    Avoid this problem by plumbing some knowledge of file-private locks into
    the mandatory locking code. In order to do this, we must pass down
    information about the struct file that's being used to
    locks_verify_locked.

    Reported-by: Trond Myklebust
    Signed-off-by: Jeff Layton
    Acked-by: J. Bruce Fields

    Jeff Layton
     
  • Neil Brown suggested potentially overloading the l_pid value as a "lock
    context" field for file-private locks. While we probably don't want to
    do that here, it's a good idea to ensure that we could extend this API
    in the future without breaking existing callers.

    Typically the l_pid value is ignored for incoming struct flock
    arguments, serving mainly as a place to return the pid of the owner if
    there is a conflicting lock. For file-private locks, require that it
    currently be set to 0 and return EINVAL if it isn't. If we eventually
    want to make a non-zero l_pid mean something, then this will help ensure
    that we don't break legacy programs that are using file-private locks.

    Cc: Neil Brown
    Signed-off-by: Jeff Layton

    Jeff Layton
     
  • Due to some unfortunate history, POSIX locks have very strange and
    unhelpful semantics. The thing that usually catches people by surprise
    is that they are dropped whenever the process closes any file descriptor
    associated with the inode.

    This is extremely problematic for people developing file servers that
    need to implement byte-range locks. Developers often need a "lock
    management" facility to ensure that file descriptors are not closed
    until all of the locks associated with the inode are finished.

    Additionally, "classic" POSIX locks are owned by the process. Locks
    taken between threads within the same process won't conflict with one
    another, which renders them useless for synchronization between threads.

    This patchset adds a new type of lock that attempts to address these
    issues. These locks conflict with classic POSIX read/write locks, but
    have semantics that are more like BSD locks with respect to inheritance
    and behavior on close.

    This is implemented primarily by changing how the fl_owner field is set for
    these locks. Instead of having them owned by the files_struct of the
    process, they are instead owned by the filp on which they were acquired.
    Thus, they are inherited across fork() and are only released when the
    last reference to a filp is put.

    These new semantics prevent them from being merged with classic POSIX
    locks, even if they are acquired by the same process. These locks will
    also conflict with classic POSIX locks even if they are acquired by
    the same process or on the same file descriptor.

    The new locks are managed using a new set of cmd values to the fcntl()
    syscall. The initial implementation of this converts these values to
    "classic" cmd values at a fairly high level, and the details are not
    exposed to the underlying filesystem. We may eventually want to push
    this handling down to the lower filesystem code, but for now I don't
    see any need for it.

    Also, note that with this implementation the new cmd values are only
    available via fcntl64() on 32-bit arches. There's little need to
    add support for legacy apps on a new interface like this.

    Signed-off-by: Jeff Layton

    Jeff Layton
     
  • It's not really feasible to do deadlock detection with FL_FILE_PVT
    locks since they aren't owned by a single task, per se. Deadlock
    detection also tends to be rather expensive, so just skip it for
    these sorts of locks.

    Also, add a FIXME comment about adding more limited deadlock detection
    that just applies to ro -> rw upgrades, per Andy's request.

    Cc: Andy Lutomirski
    Signed-off-by: Jeff Layton

    Jeff Layton
     
  • Once we introduce file private locks, we'll need to know what cmd value
    was used, as that affects the ownership and whether a conflict would
    arise.

    Signed-off-by: Jeff Layton

    Jeff Layton
     
  • FL_FILE_PVT locks are no longer tied to a particular pid, and are
    instead inheritable by child processes. Report an l_pid of '-1' for
    these sorts of locks since the pid is somewhat meaningless for them.

    This precedent comes from FreeBSD. There, POSIX and flock() locks can
    conflict with one another. If fcntl(F_GETLK, ...) returns a lock set
    with flock() then the l_pid member cannot be a process ID because the
    lock is not held by a process as such.

    Acked-by: J. Bruce Fields
    Signed-off-by: Jeff Layton

    Jeff Layton
     
  • In a later patch, we'll be adding a new type of lock that's owned by
    the struct file instead of the files_struct. Those sorts of locks
    will be flagged with a new FL_FILE_PVT flag.

    Report these types of locks as "FLPVT" in /proc/locks to distinguish
    them from "classic" POSIX locks.

    Acked-by: J. Bruce Fields
    Signed-off-by: Jeff Layton

    Jeff Layton
     
  • This function currently removes leases in addition to flock locks and in
    a later patch we'll have it deal with file-private locks too. Rename it
    to locks_remove_file to indicate that it removes locks that are
    associated with a particular struct file, and not just flock locks.

    Acked-by: J. Bruce Fields
    Signed-off-by: Jeff Layton

    Jeff Layton
     
  • Move this check into flock64_to_posix_lock instead of duplicating it in
    two places. This also fixes a minor wart in the code where we continue
    referring to the struct flock after converting it to struct file_lock.

    Acked-by: J. Bruce Fields
    Signed-off-by: Jeff Layton

    Jeff Layton
     
  • In the 32-bit case fcntl assigns the 64-bit f_pos and i_size to a 32-bit
    off_t.

    The existing range checks also seem to depend on signed arithmetic
    wrapping when it overflows. In practice maybe that works, but we can be
    more careful. That also allows us to make a more reliable distinction
    between -EINVAL and -EOVERFLOW.

    Note that in the 32-bit case SEEK_CUR or SEEK_END might allow the caller
    to set a lock with starting point no longer representable as a 32-bit
    value. We could return -EOVERFLOW in such cases, but the locks code is
    capable of handling such ranges, so we choose to be lenient here. The
    only problem is that subsequent GETLK calls on such a lock will fail
    with EOVERFLOW.

    While we're here, do some cleanup including consolidating code for the
    flock and flock64 cases.

    Signed-off-by: J. Bruce Fields
    Signed-off-by: Jeff Layton

    J. Bruce Fields
     
  • A leftover lock on the list is surely a sign of a problem of some sort,
    but it's not necessarily a reason to panic the box. Instead, just log a
    warning with some info about the lock, and then delete it like we would
    any other lock.

    In the event that the filesystem declares a ->lock f_op, we may end up
    leaking something, but that's generally preferable to an immediate
    panic.

    Acked-by: J. Bruce Fields
    Signed-off-by: Jeff Layton

    Jeff Layton
     
  • ...to make sparse happy.

    Acked-by: J. Bruce Fields
    Signed-off-by: Jeff Layton

    Jeff Layton
     
  • It's best to let the compiler decide that.

    Acked-by: J. Bruce Fields
    Reported-by: Stephen Rothwell
    Signed-off-by: Jeff Layton

    Jeff Layton
     
  • Acked-by: J. Bruce Fields
    Signed-off-by: Jeff Layton

    Jeff Layton
     
  • As Al Viro points out, there is an unlikely, but possible race between
    opening a file and setting a lease on it. generic_add_lease is done with
    the i_lock held, but the inode->i_flock check in break_lease is
    lockless. It's possible for another task doing an open to do the entire
    pathwalk and call break_lease between the point where generic_add_lease
    checks for a conflicting open and the point where it adds the lease to
    the list. If this occurs, we can end up with a lease set on the file
    with a conflicting open.

    To guard against that, check again for a conflicting open after adding
    the lease to the i_flock list. If the above race occurs, then we can
    simply unwind the lease setting and return -EAGAIN.

    Because we take dentry references and acquire write access on the file
    before calling break_lease, we know that if the i_flock list is empty
    when the open caller goes to check it then the necessary refcounts have
    already been incremented. Thus the additional check for a conflicting
    open will see that there is one and the setlease call will fail.

    Cc: Bruce Fields
    Cc: David Howells
    Cc: "Paul E. McKenney"
    Reported-by: Al Viro
    Signed-off-by: Jeff Layton
    Signed-off-by: J. Bruce Fields

    Jeff Layton
     

13 Nov, 2013

1 commit


09 Nov, 2013

2 commits

  • Implement NFSv4 delegations at the vfs level using the new FL_DELEG lock
    type.

    Note nfsd is the only delegation user and is only using read
    delegations. Warn on any attempt to set a write delegation for now.
    We'll come back to that case later.

    Acked-by: Jeff Layton
    Signed-off-by: J. Bruce Fields
    Signed-off-by: Al Viro

    J. Bruce Fields
     
  • For now FL_DELEG is just a synonym for FL_LEASE. So this patch doesn't
    change behavior.

    Next we'll modify break_lease to treat FL_DELEG leases differently, to
    account for the fact that NFSv4 delegations should be broken in more
    situations than Windows oplocks.

    Acked-by: Jeff Layton
    Signed-off-by: J. Bruce Fields
    Signed-off-by: Al Viro

    J. Bruce Fields
     


08 Jul, 2013

1 commit

  • The file_lock_list is only used for /proc/locks. By far the most common
    case is for locks to be put onto the list and come off again without
    ever being traversed.

    Help optimize for this use-case by moving to percpu hlist_head-s. At the
    same time, we can make the locking less contentious by moving to an
    lglock. When iterating over the lists for /proc/locks, we must take the
    global lock and then iterate over each CPU's list in turn.

    This change necessitates a new fl_link_cpu field to keep track of which
    CPU the entry is on. On x86_64 at least, this field is placed within an
    existing hole in the struct to avoid growing the size.

    Signed-off-by: Jeff Layton
    Acked-by: J. Bruce Fields
    Signed-off-by: Al Viro

    Jeff Layton
     


29 Jun, 2013

12 commits

  • There's no reason we have to protect the blocked_hash and file_lock_list
    with the same spinlock. With the tests I have, breaking it in two gives
    a barely measurable performance benefit, but it seems reasonable to make
    this locking as granular as possible.

    Signed-off-by: Jeff Layton
    Signed-off-by: Al Viro

    Jeff Layton
     
  • Currently, the hashing that the locking code uses to add these values
    to the blocked_hash is calculated from the fl_owner field alone. That's
    valid in most cases except for server-side lockd, which validates the
    owner of a lock based on fl_owner and fl_pid.

    In the case where you have a small number of NFS clients doing a lot
    of locking between different processes, you could end up with all
    the blocked requests sitting in a very small number of hash buckets.

    Add a new lm_owner_key operation to the lock_manager_operations that
    will generate an unsigned long to use as the key in the hashtable.
    That function is only implemented for server-side lockd, and simply
    XORs the fl_owner and fl_pid.

    Signed-off-by: Jeff Layton
    Acked-by: J. Bruce Fields
    Signed-off-by: Al Viro

    Jeff Layton
     
  • Break up the blocked_list into a hashtable, using the fl_owner as a key.
    This speeds up searching the hash chains, which is especially significant
    for deadlock detection.

    Note that the initial implementation assumes that hashing on fl_owner is
    sufficient. In most cases it should be, with the notable exception being
    server-side lockd, which compares ownership using a tuple of the
    nlm_host and the pid sent in the lock request. So, this may degrade to a
    single hash bucket when you only have a single NFS client. That will be
    addressed in a later patch.

    The careful observer may note that this patch leaves the file_lock_list
    alone. There's much less of a case for turning the file_lock_list into a
    hashtable. The only user of that list is the code that generates
    /proc/locks, and it always walks the entire list.

    Signed-off-by: Jeff Layton
    Acked-by: J. Bruce Fields
    Signed-off-by: Al Viro

    Jeff Layton
     
  • Testing has shown that iterating over the blocked_list for deadlock
    detection turns out to be a bottleneck. In order to alleviate that,
    begin the process of turning it into a hashtable. We start by turning
    the fl_link into a hlist_node and the global lists into hlists. A later
    patch will do the conversion of the blocked_list to a hashtable.

    Signed-off-by: Jeff Layton
    Acked-by: J. Bruce Fields
    Signed-off-by: Al Viro

    Jeff Layton
     
  • Since we always hold the i_lock when inserting a new waiter onto the
    fl_block list, we can avoid taking the global lock at all if we find
    that it's empty when we go to wake up blocked waiters.

    Signed-off-by: Jeff Layton
    Signed-off-by: Al Viro

    Jeff Layton
     
  • Having a global lock that protects all of this code is a clear
    scalability problem. Instead of doing that, move most of the code to be
    protected by the i_lock instead. The exceptions are the global lists
    that the ->fl_link sits on, and the ->fl_block list.

    ->fl_link is what connects these structures to the
    global lists, so we must ensure that we hold those locks when iterating
    over or updating these lists.

    Furthermore, sound deadlock detection requires that we hold the
    blocked_list state steady while checking for loops. We also must ensure
    that the search and update to the list are atomic.

    For the checking and insertion side of the blocked_list, push the
    acquisition of the global lock into __posix_lock_file and ensure that
    checking and update of the blocked_list is done without dropping the
    lock in between.

    On the removal side, when waking up blocked lock waiters, take the
    global lock before walking the blocked list and dequeue the waiters from
    the global list prior to removal from the fl_block list.

    With this, deadlock detection should be race free while we minimize
    excessive file_lock_lock thrashing.

    Finally, in order to avoid a lock inversion problem when handling
    /proc/locks output we must ensure that manipulations of the fl_block
    list are also protected by the file_lock_lock.

    Signed-off-by: Jeff Layton
    Signed-off-by: Al Viro

    Jeff Layton
     
  • Move the fl_link list handling routines into a separate set of helpers.
    Also ensure that locks and requests are always put on the global lists
    last (after they are fully initialized) and are taken off before they
    are uninitialized.

    Signed-off-by: Jeff Layton
    Signed-off-by: Al Viro

    Jeff Layton
     
  • Signed-off-by: Jeff Layton
    Acked-by: J. Bruce Fields
    Signed-off-by: Al Viro

    Jeff Layton
     
  • Signed-off-by: Jeff Layton
    Signed-off-by: Al Viro

    Jeff Layton
     
  • Signed-off-by: Jeff Layton
    Acked-by: J. Bruce Fields
    Signed-off-by: Al Viro

    Jeff Layton
     
  • commit 66189be74 (CIFS: Fix VFS lock usage for oplocked files) exported
    the locks_delete_block symbol. There's already an exported helper
    function that provides this capability however, so make cifs use that
    instead and turn locks_delete_block back into a static function.

    Note that if fl->fl_next == NULL then this lock has already been through
    locks_delete_block(), so we should be OK to ignore an ENOENT error here
    and simply not retry the lock.

    Cc: Pavel Shilovsky
    Signed-off-by: Jeff Layton
    Acked-by: J. Bruce Fields
    Signed-off-by: Al Viro

    Jeff Layton
     
  • Signed-off-by: Jeff Layton
    Signed-off-by: Al Viro

    Jeff Layton
     


10 Oct, 2012

1 commit

  • This is to complete part of the Userspace API (UAPI) disintegration for which
    the preparatory patches were pulled recently. After these patches, userspace
    headers will be segregated into:

    include/uapi/linux/.../foo.h

    for the userspace interface stuff, and:

    include/linux/.../foo.h

    for the strictly kernel internal stuff.

    Signed-off-by: J. Bruce Fields

    J. Bruce Fields
     


21 Aug, 2012

1 commit

  • The rules for fl_type are rather convoluted. Typically it's treated as
    holding specific values, except in the case of LOCK_MAND, in which case
    it can be or'ed with LOCK_READ|LOCK_WRITE.

    On some arches F_WRLCK == 2 and F_UNLCK == 3, so and'ing with F_WRLCK
    will also catch the F_UNLCK case. It's unlikely in either case here
    that we'd ever see F_UNLCK since those shouldn't end up on any lists,
    but it's still best to be consistent.

    Signed-off-by: Jeff Layton
    Signed-off-by: J. Bruce Fields

    Jeff Layton
     

02 Aug, 2012

1 commit

  • In commit 3b6e2723f32d ("locks: prevent side-effects of
    locks_release_private before file_lock is initialized") we removed the
    last user of lm_release_private without removing the field itself.

    Signed-off-by: J. Bruce Fields
    Signed-off-by: Linus Torvalds

    J. Bruce Fields
     

01 Aug, 2012

1 commit

  • Pull nfsd changes from J. Bruce Fields:
    "This has been an unusually quiet cycle--mostly bugfixes and cleanup.
    The one large piece is Stanislav's work to containerize the server's
    grace period--but that in itself is just one more step in a
    not-yet-complete project to allow fully containerized nfs service.

    There are a number of outstanding delegation, container, v4 state, and
    gss patches that aren't quite ready yet; 3.7 may be wilder."

    * 'nfsd-next' of git://linux-nfs.org/~bfields/linux: (35 commits)
    NFSd: make boot_time variable per network namespace
    NFSd: make grace end flag per network namespace
    Lockd: move grace period management from lockd() to per-net functions
    LockD: pass actual network namespace to grace period management functions
    LockD: manage grace list per network namespace
    SUNRPC: service request network namespace helper introduced
    NFSd: make nfsd4_manager allocated per network namespace context.
    LockD: make lockd manager allocated per network namespace
    LockD: manage grace period per network namespace
    Lockd: add more debug to host shutdown functions
    Lockd: host complaining function introduced
    LockD: manage used host count per networks namespace
    LockD: manage garbage collection timeout per networks namespace
    LockD: make garbage collector network namespace aware.
    LockD: mark host per network namespace on garbage collect
    nfsd4: fix missing fault_inject.h include
    locks: move lease-specific code out of locks_delete_lock
    locks: prevent side-effects of locks_release_private before file_lock is initialized
    NFSd: set nfsd_serv to NULL after service destruction
    NFSd: introduce nfsd_destroy() helper
    ...

    Linus Torvalds