Eric Lee / smarc-fsl-linux-kernel

09 Mar, 2013

1 commit

7b54c165a vfs: don't BUG_ON() if following a /proc fd pseudo-symlink results in a symlink

It's "normal" - it can happen if the file descriptor you followed was
opened with O_NOFOLLOW.
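
A minimal user-space sketch of how such a descriptor can arise. It assumes (the commit message does not say so) that the descriptor was opened with O_PATH | O_NOFOLLOW, which per open(2) yields an fd that refers to the symlink itself rather than its target. The helper name fd_refers_to_symlink is hypothetical and only illustrates the situation; it is not kernel code.

```c
#define _GNU_SOURCE          /* for O_PATH on glibc */
#include <assert.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Returns 1 if an fd opened on `path` with O_PATH | O_NOFOLLOW refers to a
 * symlink itself, 0 if it refers to something else, -1 on error.
 */
int fd_refers_to_symlink(const char *path)
{
    struct stat st;
    int fd, ret;

    assert(path != NULL);

    fd = open(path, O_PATH | O_NOFOLLOW);
    if (fd < 0)
        return -1;

    /* fstat() on an O_PATH descriptor describes the object the fd pins. */
    ret = fstat(fd, &st);
    close(fd);
    if (ret < 0)
        return -1;

    return S_ISLNK(st.st_mode) ? 1 : 0;
}
```

Following /proc/self/fd/N for a descriptor like this ends on a symlink, which is the case the commit stops treating as a bug.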

Reported-by: Dave Jones
Cc: Al Viro
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds

Linus Torvalds
2013-03-09 01:03:07 +0800

02 Mar, 2013

1 commit

dcf787f39 constify path_get/path_put and fs_struct.c stuff

Signed-off-by: Al Viro

Al Viro
2013-03-02 12:51:07 +0800

26 Feb, 2013

1 commit

ecf3d1f1a vfs: kill FS_REVAL_DOT by adding a d_weak_revalidate dentry op

The following set of operations on an NFS client and server will cause
an ESTALE error on the client:

    server# mkdir a
    client# cd a
    server# mv a a.bak
    client# sleep 30 # (or whatever the dir attrcache timeout is)
    client# stat .
    stat: cannot stat `.': Stale NFS file handle

Obviously, we should not be getting an ESTALE error back there since the
inode still exists on the server. The problem is that the lookup code
will call d_revalidate on the dentry that "." refers to, because NFS has
FS_REVAL_DOT set.

nfs_lookup_revalidate will see that the parent directory has changed and
will try to reverify the dentry by redoing a LOOKUP. That of course
fails, so the lookup code returns ESTALE.

The problem here is that d_revalidate is really a bad fit for this case.
What we really want to know at this point is whether the inode is still
good or not, but we don't really care what name it goes by or whether
the dcache is still valid.

Add a new d_op->d_weak_revalidate operation and have complete_walk call
that instead of d_revalidate. The intent there is to allow for a
"weaker" d_revalidate that just checks to see whether the inode is still
good. This also gives us an opportunity to kill off the FS_REVAL_DOT
special casing.

[AV: changed method name, added note in porting, fixed confusion re
having it possibly called from RCU mode (it won't be)]

Cc: NeilBrown
Signed-off-by: Jeff Layton
Signed-off-by: Al Viro

Jeff Layton
2013-02-26 15:46:09 +0800

23 Feb, 2013

6 commits

cc2a52711 lookup_slow: get rid of name argument

Signed-off-by: Al Viro

Al Viro
2013-02-23 12:31:35 +0800

e97cdc87b lookup_fast: get rid of name argument

Signed-off-by: Al Viro

Al Viro
2013-02-23 12:31:34 +0800

21b9b0739 get rid of name and type arguments of walk_component()

... always can be found in nameidata now.

Signed-off-by: Al Viro

Al Viro
2013-02-23 12:31:34 +0800

5f4a6a695 link_path_walk(): move assignments to nd->last/nd->last_type up

... and clean the main loop a bit

Signed-off-by: Al Viro

Al Viro
2013-02-23 12:31:34 +0800

1afc99bea propagate error from get_empty_filp() to its callers

Based on parts from Anatol's patch (the rest is the next commit).

Signed-off-by: Al Viro

Al Viro
2013-02-23 12:31:32 +0800

496ad9aa8 new helper: file_inode(file)

Signed-off-by: Al Viro

Al Viro
2013-02-23 12:31:31 +0800

21 Dec, 2012

12 commits

c6a942840 vfs: fix renameat to retry on ESTALE errors

...as always, rename is the messiest of the bunch. We have to track
whether to retry or not via a separate flag since the error handling
is already quite complex.
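
The retry-once pattern shared by this series can be sketched in user space as follows. This is an illustration under stated assumptions, not the kernel code: DEMO_LOOKUP_REVAL is a demo-only stand-in for the kernel's LOOKUP_REVAL bit with an arbitrary value, and every demo_* name is hypothetical. The sketch shows the simple goto-based form; as the message above notes, rename tracks the retry decision in a separate flag because its error handling is more involved.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Demo-only stand-in for the kernel's LOOKUP_REVAL bit; the value is arbitrary. */
#define DEMO_LOOKUP_REVAL 0x1u

/* A path-based operation that takes lookup flags and returns 0 or -errno. */
typedef int (*demo_path_op)(unsigned int lookup_flags, void *ctx);

/* Retry only for -ESTALE, and only if revalidation has not been forced yet. */
bool demo_should_retry(int error, unsigned int lookup_flags)
{
    return error == -ESTALE && !(lookup_flags & DEMO_LOOKUP_REVAL);
}

/* Run op once; on -ESTALE run it one more time with revalidation forced. */
int demo_run_with_estale_retry(demo_path_op op, void *ctx)
{
    unsigned int lookup_flags = 0;
    int error;

    assert(op != NULL);
retry:
    error = op(lookup_flags, ctx);
    if (demo_should_retry(error, lookup_flags)) {
        lookup_flags |= DEMO_LOOKUP_REVAL;
        goto retry;
    }
    return error;
}

/* Fake operation for the demo: fails with -ESTALE until revalidation is forced. */
struct demo_stale_state {
    int calls;
};

int demo_stale_until_reval(unsigned int lookup_flags, void *ctx)
{
    struct demo_stale_state *st = ctx;

    st->calls++;
    return (lookup_flags & DEMO_LOOKUP_REVAL) ? 0 : -ESTALE;
}
```

Because the second attempt already carries the revalidation flag, a second ESTALE is returned to the caller instead of looping.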

Signed-off-by: Jeff Layton
Signed-off-by: Al Viro

Jeff Layton
2012-12-21 07:50:05 +0800

5d18f8133 vfs: make do_unlinkat retry once on ESTALE errors

Signed-off-by: Jeff Layton
Signed-off-by: Al Viro

Jeff Layton
2012-12-21 07:50:04 +0800

c6ee92069 vfs: make do_rmdir retry once on ESTALE errors

Signed-off-by: Jeff Layton
Signed-off-by: Al Viro

Jeff Layton
2012-12-21 07:50:04 +0800

9e790bd65 vfs: add a flags argument to user_path_parent

...so we can pass in LOOKUP_REVAL. For now, nothing does yet.

Signed-off-by: Jeff Layton
Signed-off-by: Al Viro

Jeff Layton
2012-12-21 07:50:04 +0800

442e31ca5 vfs: fix linkat to retry once on ESTALE errors

Signed-off-by: Jeff Layton
Signed-off-by: Al Viro

Jeff Layton
2012-12-21 07:50:03 +0800

f46d3567b vfs: fix symlinkat to retry on ESTALE errors

Signed-off-by: Jeff Layton
Signed-off-by: Al Viro

Jeff Layton
2012-12-21 07:50:03 +0800

b76d8b822 vfs: fix mkdirat to retry once on an ESTALE error

Signed-off-by: Jeff Layton
Signed-off-by: Al Viro

Jeff Layton
2012-12-21 07:50:02 +0800

972567f14 vfs: fix mknodat to retry on ESTALE errors

Signed-off-by: Jeff Layton
Signed-off-by: Al Viro

Jeff Layton
2012-12-21 07:50:02 +0800

1ac12b4b6 vfs: turn is_dir argument to kern_path_create into a lookup_flags arg

Where we can pass in LOOKUP_DIRECTORY or LOOKUP_REVAL. Any other flags
passed in here are currently ignored.

Signed-off-by: Jeff Layton
Signed-off-by: Al Viro

Jeff Layton
2012-12-21 07:50:02 +0800

39e3c9553 vfs: remove DCACHE_NEED_LOOKUP

The code that relied on that flag was ripped out of btrfs quite some
time ago, and never added back. Josef indicated that he was going to
take a different approach to the problem in btrfs, and that we
could just eliminate this flag.

Cc: Josef Bacik
Signed-off-by: Jeff Layton
Signed-off-by: Al Viro

Jeff Layton
2012-12-21 02:57:36 +0800

741b7c3f7 path_init(): make -ENOTDIR failure exits consistent

Signed-off-by: Al Viro

Al Viro
2012-12-21 02:57:35 +0800

582aa64a0 vfs: remove unneeded permission check from path_init

When path_init is called with a valid dfd, that code checks permissions
on the open directory fd and returns an error if the check fails. This
permission check is redundant, however.

Both callers of path_init immediately call link_path_walk afterward. The
first thing that link_path_walk does for pathnames that do not consist
only of slashes is to check for exec permissions at the starting point of
the path walk. And this check in path_init() is on the path taken only
when *name != '/' && *name != '\0'.

In most cases, these checks are very quick, but when the dfd is for a
file on an NFS mount with actimeo=0, each permission check goes
out onto the wire. The result is 2 identical ACCESS calls.

Given that these codepaths are fairly "hot", I think it makes sense to
eliminate the permission check in path_init and simply assume that the
caller will eventually check the permissions before proceeding.

Reported-by: Dave Wysochanski
Signed-off-by: Jeff Layton
Signed-off-by: Al Viro

Jeff Layton
2012-12-21 02:57:04 +0800

30 Nov, 2012

1 commit

21d8a15ac lookup_one_len: don't accept . and ..

Signed-off-by: Al Viro

Al Viro
2012-11-30 11:17:21 +0800

27 Oct, 2012

1 commit

561ec64ae VFS: don't do protected {sym,hard}links by default

In commit 800179c9b8a1 ("This adds symlink and hardlink restrictions to
the Linux VFS"), the new link protections were enabled by default, in
the hope that no actual application would care, despite it being
technically against legacy UNIX (and documented POSIX) behavior.

However, it does turn out to break some applications. It's rare, and
it's unfortunate, but it's unacceptable to break existing systems, so
we'll have to default to legacy behavior.

In particular, it has broken the way AFD distributes files, see

http://www.dwd.de/AFD/

along with some legacy scripts.

Distributions can end up setting this at initrd time or in system
scripts: if you have security problems due to link attacks during your
early boot sequence, you have bigger problems than some kernel sysctl
setting. Do:

    echo 1 > /proc/sys/fs/protected_symlinks
    echo 1 > /proc/sys/fs/protected_hardlinks

to re-enable the link protections.

Alternatively, we may at some point introduce a kernel config option
that sets these kinds of "more secure but not traditional" behavioural
options automatically.

Reported-by: Nick Bowler
Reported-by: Holger Kiehl
Cc: Kees Cook
Cc: Ingo Molnar
Cc: Andrew Morton
Cc: Al Viro
Cc: Alan Cox
Cc: Theodore Ts'o
Cc: stable@kernel.org # v3.6
Signed-off-by: Linus Torvalds

Linus Torvalds
2012-10-27 01:05:07 +0800

13 Oct, 2012

6 commits

7950e3852 vfs: embed struct filename inside of names_cache allocation if possible

In the common case where a name is much smaller than PATH_MAX, an extra
allocation for struct filename is unnecessary. Before allocating a
separate one, try to embed the struct filename inside the buffer first. If
it turns out that that's not long enough, then fall back to allocating a
separate struct filename and redoing the copy.
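
The embed-or-fall-back idea can be sketched in user space roughly as follows. This is an illustration under stated assumptions, not the kernel implementation: DEMO_BUF_SIZE stands in for the PATH_MAX-sized names_cache buffer, malloc/free replace the kernel allocators, and struct demo_filename with its fields is hypothetical. Since the source string length is known up front here, the sketch decides before copying instead of copying and then redoing the copy.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define DEMO_BUF_SIZE 64    /* demo stand-in for the PATH_MAX-sized buffer */

struct demo_filename {
    const char *name;       /* points at the NUL-terminated string */
    bool separate;          /* true if the struct is its own allocation */
};

struct demo_filename *demo_getname(const char *src)
{
    const size_t room = DEMO_BUF_SIZE - sizeof(struct demo_filename);
    struct demo_filename *fn;
    size_t len;
    char *buf;

    assert(src != NULL);
    len = strlen(src);
    if (len >= DEMO_BUF_SIZE)
        return NULL;                    /* name does not fit at all */

    buf = malloc(DEMO_BUF_SIZE);
    if (!buf)
        return NULL;

    if (len < room) {
        /* Common case: struct at the front of the buffer, string after it. */
        fn = (struct demo_filename *)buf;
        memcpy(buf + sizeof(*fn), src, len + 1);
        fn->name = buf + sizeof(*fn);
        fn->separate = false;
        return fn;
    }

    /* Fallback: the string needs the whole buffer, so allocate the struct separately. */
    fn = malloc(sizeof(*fn));
    if (!fn) {
        free(buf);
        return NULL;
    }
    memcpy(buf, src, len + 1);
    fn->name = buf;
    fn->separate = true;
    return fn;
}

void demo_putname(struct demo_filename *fn)
{
    if (!fn)
        return;
    if (fn->separate)
        free((void *)fn->name);         /* string buffer is a second allocation */
    free(fn);                           /* embedded case: this frees the shared buffer */
}
```

Short names cost a single allocation; only names close to the buffer size pay for the second one.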

Signed-off-by: Jeff Layton
Signed-off-by: Al Viro

Jeff Layton
2012-10-13 08:15:10 +0800

adb5c2473 audit: make audit_inode take struct filename

Keep a pointer to the audit_names "slot" in struct filename.

Have all of the audit_inode callers pass a struct filename pointer to
audit_inode instead of a string pointer. If the aname field is already
populated, then we can skip walking the list altogether and just use it
directly.

Signed-off-by: Jeff Layton
Signed-off-by: Al Viro

Jeff Layton
2012-10-13 08:15:09 +0800

669abf4e5 vfs: make path_openat take a struct filename pointer

...and fix up the callers. For do_file_open_root, just declare a
struct filename on the stack and fill out the .name field. For
do_filp_open, make it also take a struct filename pointer, and fix up its
callers to call it appropriately.

For filp_open, add a variant that takes a struct filename pointer and turn
filp_open into a wrapper around it.

Signed-off-by: Jeff Layton
Signed-off-by: Al Viro

Jeff Layton
2012-10-13 08:15:09 +0800

873f1eedc vfs: turn do_path_lookup into wrapper around struct filename variant

...and make the user_path callers use that variant instead.

Signed-off-by: Jeff Layton
Signed-off-by: Al Viro

Jeff Layton
2012-10-13 08:15:08 +0800

7ac86265d audit: allow audit code to satisfy getname requests from its names_list

Currently, if we call getname() on a userland string more than once,
we'll get multiple copies of the string and multiple audit_names
records.

Add a function that will allow the audit_names code to satisfy getname
requests using info from the audit_names list, avoiding a new allocation
and audit_names records.

Signed-off-by: Jeff Layton
Signed-off-by: Al Viro

Jeff Layton
2012-10-13 08:15:08 +0800

91a27b2a7 vfs: define struct filename and have getname() return it

getname() is intended to copy pathname strings from userspace into a
kernel buffer. The result is just a string in kernel space. It would
however be quite helpful to be able to attach some ancillary info to
the string.

For instance, we could attach some audit-related info to reduce the
amount of audit-related processing needed. When auditing is enabled,
we could also call getname() on the string more than once and not
need to recopy it from userspace.

This patchset converts the getname()/putname() interfaces to return
a struct instead of a string. For now, the struct just tracks the
string in kernel space and the original userland pointer for it.

Later, we'll add other information to the struct as it becomes
convenient.

Signed-off-by: Jeff Layton
Signed-off-by: Al Viro

Jeff Layton
2012-10-13 08:14:55 +0800

12 Oct, 2012

6 commits

8e377d150 vfs: unexport getname and putname symbols

I see no callers in module code.

Signed-off-by: Jeff Layton
Signed-off-by: Al Viro

Jeff Layton
2012-10-12 12:32:09 +0800

4fa6b5ecb audit: overhaul __audit_inode_child to accomodate retrying

In order to accommodate retrying path-based syscalls, we need to add a
new "type" argument to audit_inode_child. This will tell us whether
we're looking for a child entry that represents a create or a delete.

If we find a parent, don't automatically assume that we need to create a
new entry. Instead, use the information we have to try to find an
existing entry first. Update it if one is found and create a new one if
not.

Signed-off-by: Jeff Layton
Signed-off-by: Al Viro

Jeff Layton
2012-10-12 12:32:03 +0800

bfcec7087 audit: set the name_len in audit_inode for parent lookups

Currently, this gets set mostly by happenstance when we call into
audit_inode_child. While that might be a little more efficient, it seems
wrong. If the syscall ends up failing before audit_inode_child ever gets
called, then you'll have an audit_names record that shows the full path
but has the parent inode info attached.

Fix this by passing in a parent flag when we call audit_inode that gets
set to the value of LOOKUP_PARENT. We can then fix up the pathname for
the audit entry correctly from the get-go.

While we're at it, clean up the no-op macro for audit_inode in the
!CONFIG_AUDITSYSCALL case.

Signed-off-by: Jeff Layton
Signed-off-by: Al Viro

Jeff Layton
2012-10-12 12:32:01 +0800

c43a25abb audit: reverse arguments to audit_inode_child

Most of the callers get called with an inode and dentry in the reverse
order. The compiler then has to reshuffle the arg registers and/or
stack in order to pass them on to audit_inode_child.

Reverse those arguments for a micro-optimization.

Reported-by: Eric Paris
Signed-off-by: Jeff Layton
Signed-off-by: Al Viro

Jeff Layton
2012-10-12 12:32:00 +0800

f78570dd6 audit: remove unnecessary NULL ptr checks from do_path_lookup

As best I can tell, whenever retval == 0, nd->path.dentry and nd->inode
are also non-NULL. Eliminate those checks and the superfluous
audit_context check.

Signed-off-by: Eric Paris
Signed-off-by: Jeff Layton
Signed-off-by: Al Viro

Jeff Layton
2012-10-12 12:31:59 +0800

98f6ef64b vfs: bogus warnings in fs/namei.c

The follow_link() function always initializes its *p argument,
or returns an error, but when building with 'gcc -Os', the compiler
gets confused by the __always_inline attribute to the function
and can no longer detect where the cookie was initialized.

The solution is to always initialize the pointer from follow_link,
even in the error path. When building with -O2, this has zero impact
on generated code and adds a single instruction in the error path
for a -Os build on ARM.

Without this patch, building with gcc-4.6 through gcc-4.8 and
CONFIG_CC_OPTIMIZE_FOR_SIZE results in:

fs/namei.c: In function 'link_path_walk':
fs/namei.c:649:24: warning: 'cookie' may be used uninitialized in this function [-Wuninitialized]
fs/namei.c:1544:9: note: 'cookie' was declared here
fs/namei.c: In function 'path_lookupat':
fs/namei.c:649:24: warning: 'cookie' may be used uninitialized in this function [-Wuninitialized]
fs/namei.c:1934:10: note: 'cookie' was declared here
fs/namei.c: In function 'path_openat':
fs/namei.c:649:24: warning: 'cookie' may be used uninitialized in this function [-Wuninitialized]
fs/namei.c:2899:9: note: 'cookie' was declared here
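
The shape of the fix described above can be sketched as follows. This is a user-space illustration, not the code from fs/namei.c: demo_follow and demo_walk are hypothetical stand-ins for a follow_link()-style helper and its caller, and -ELOOP is just an example error. The point is that the helper stores through its out-parameter on every path, including failure, so the caller's variable is never left indeterminate.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

static int demo_token = 42;   /* something for the cookie to point at */

/*
 * Hypothetical follow_link()-style helper: on success it stores a cookie
 * through *p; on failure it still stores NULL so the caller's variable is
 * initialized on every path.
 */
int demo_follow(int fail, void **p)
{
    assert(p != NULL);
    if (fail) {
        *p = NULL;      /* the extra store added on the error path */
        return -ELOOP;
    }
    *p = &demo_token;
    return 0;
}

/* Caller that declares the cookie without an initializer. */
int demo_walk(int fail)
{
    void *cookie;
    int err = demo_follow(fail, &cookie);

    if (err)
        return err;
    return *(int *)cookie;
}
```

The alternative of initializing cookie at each declaration would touch every caller; storing NULL in the helper keeps the change in one place.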

Signed-off-by: Arnd Bergmann
Signed-off-by: Al Viro

Arnd Bergmann
2012-10-12 08:02:16 +0800

10 Oct, 2012

1 commit

ffd8d101a fs: prevent use after free in auditing when symlink following was denied

Commit "fs: add link restriction audit reporting" added auditing of failed
attempts to follow symlinks. Unfortunately, the auditing was done after
the struct path had already been released.

Signed-off-by: Sasha Levin
Signed-off-by: Al Viro

Sasha Levin
2012-10-10 11:33:37 +0800

03 Oct, 2012

2 commits

aab174f0d Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs

Pull vfs update from Al Viro:

"- big one - consolidation of descriptor-related logics; almost all of
that is moved to fs/file.c

(BTW, I'm seriously tempted to rename the result to fd.c. As it is,
we have a situation when file_table.c is about handling of struct
file and file.c is about handling of descriptor tables; the reasons
are historical - file_table.c used to be about a static array of
struct file we used to have way back).

A lot of stray ends got cleaned up and converted to saner primitives,
disgusting mess in android/binder.c is still disgusting, but at least
doesn't poke so much in descriptor table guts anymore. A bunch of
relatively minor races got fixed in process, plus an ext4 struct file
leak.

- related thing - fget_light() partially unuglified; see fdget() in
there (and yes, it generates the code as good as we used to have).

- also related - bits of Cyrill's procfs stuff that got entangled into
that work; _not_ all of it, just the initial move to fs/proc/fd.c and
switch of fdinfo to seq_file.

- Alex's fs/coredump.c splitoff - the same story, had been easier to
take that commit than mess with conflicts. The rest is a separate
pile, this was just a mechanical code movement.

- a few misc patches all over the place. Not all for this cycle,
there'll be more (and quite a few currently sit in akpm's tree)."

Fix up trivial conflicts in the android binder driver, and some fairly
simple conflicts due to two different changes to the sock_alloc_file()
interface ("take descriptor handling from sock_alloc_file() to callers"
vs "net: Providing protocol type via system.sockprotoname xattr of
/proc/PID/fd entries" adding a dentry name to the socket)

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (72 commits)
MAX_LFS_FILESIZE should be a loff_t
compat: fs: Generic compat_sys_sendfile implementation
fs: push rcu_barrier() from deactivate_locked_super() to filesystems
btrfs: reada_extent doesn't need kref for refcount
coredump: move core dump functionality into its own file
coredump: prevent double-free on an error path in core dumper
usb/gadget: fix misannotations
fcntl: fix misannotations
ceph: don't abuse d_delete() on failure exits
hypfs: ->d_parent is never NULL or negative
vfs: delete surplus inode NULL check
switch simple cases of fget_light to fdget
new helpers: fdget()/fdput()
switch o2hb_region_dev_write() to fget_light()
proc_map_files_readdir(): don't bother with grabbing files
make get_file() return its argument
vhost_set_vring(): turn pollstart/pollstop into bool
switch prctl_set_mm_exe_file() to fget_light()
switch xfs_find_handle() to fget_light()
switch xfs_swapext() to fget_light()
...

Linus Torvalds
2012-10-03 11:25:04 +0800

437589a74 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace

Pull user namespace changes from Eric Biederman:
"This is a mostly modest set of changes to enable basic user namespace
support. This allows the code to compile with user namespaces
enabled and removes the assumption there is only the initial user
namespace. Everything is converted except for the most complex of the
filesystems: autofs4, 9p, afs, ceph, cifs, coda, fuse, gfs2, ncpfs,
nfs, ocfs2 and xfs as those patches need a bit more review.

The strategy is to push kuid_t and kgid_t values as far down into
subsystems and filesystems as reasonable. Leaving the make_kuid and
from_kuid operations to happen at the edge of userspace, as the values
come off the disk, and as the values come in from the network.
Letting type-incompatibility compile errors (present when user
namespaces are enabled) guide me to find the issues.

The most tricky areas have been the places where we had an implicit
union of uid and gid values and were storing them in an unsigned int.
Those places were converted into explicit unions. I made certain to
handle those places with simple trivial patches.

Out of that work I discovered we have generic interfaces for storing
quota by projid. I had never heard of the project identifiers before.
Adding full user namespace support for project identifiers accounts
for most of the code size growth in my git tree.

Ultimately there will be work to relax privilege checks from
"capable(FOO)" to "ns_capable(user_ns, FOO)" where it is safe allowing
root in a user namespace to do those things that today we only forbid to
non-root users because it will confuse suid root applications.

While I was pushing kuid_t and kgid_t changes deep into the audit code
I made a few other cleanups. I capitalized on the fact we process
netlink messages in the context of the message sender. I removed
usage of NETLINK_CRED, and started directly using current->tty.

Some of these patches have also made it into maintainer trees, with no
problems from identical code from different trees showing up in
linux-next.

After reading through all of this code I feel like I might be able to
win a game of kernel trivial pursuit."

Fix up some fairly trivial conflicts in netfilter uid/gid logging code.

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: (107 commits)
userns: Convert the ufs filesystem to use kuid/kgid where appropriate
userns: Convert the udf filesystem to use kuid/kgid where appropriate
userns: Convert ubifs to use kuid/kgid
userns: Convert squashfs to use kuid/kgid where appropriate
userns: Convert reiserfs to use kuid and kgid where appropriate
userns: Convert jfs to use kuid/kgid where appropriate
userns: Convert jffs2 to use kuid and kgid where appropriate
userns: Convert hpfs to use kuid and kgid where appropriate
userns: Convert btrfs to use kuid/kgid where appropriate
userns: Convert bfs to use kuid/kgid where appropriate
userns: Convert affs to use kuid/kgid where appropriate
userns: On alpha modify linux_to_osf_stat to use convert from kuids and kgids
userns: On ia64 deal with current_uid and current_gid being kuid and kgid
userns: On ppc convert current_uid from a kuid before printing.
userns: Convert s390 getting uid and gid system calls to use kuid and kgid
userns: Convert s390 hypfs to use kuid and kgid where appropriate
userns: Convert binder ipc to use kuids
userns: Teach security_path_chown to take kuids and kgids
userns: Add user namespace support to IMA
userns: Convert EVM to deal with kuids and kgids in its hmac computation
...

Linus Torvalds
2012-10-03 02:11:09 +0800

27 Sep, 2012

2 commits

2903ff019 switch simple cases of fget_light to fdget

Signed-off-by: Al Viro

Al Viro
2012-09-27 10:20:08 +0800

f6d2ac5ca namei.c: fix BS comment

get_write_access() is needed for nfsd, not binfmt_aout (the latter
has no business doing anything of that kind, of course)

Signed-off-by: Al Viro

Al Viro
2012-09-27 09:10:02 +0800