Eric Lee / smarc-fsl-linux-kernel

04 Mar, 2013

1 commit

7f78e0351 fs: Limit sys_mount to only request filesystem modules.

Modify the request_module() call to prefix the filesystem type with "fs-",
and add matching aliases to all of the filesystems that can be built as
modules.

A common practice is to build all of the kernel code and leave code
that is not commonly needed as modules, with the result that many
users are exposed to any bug anywhere in the kernel.

Looking for filesystems with an fs- prefix limits the pool of possible
modules that can be loaded by mount to just filesystems, trivially
making things safer at no real cost.

Using aliases means user space can control the policy of which
filesystem modules are auto-loaded by editing /etc/modprobe.d/*.conf
with blacklist and alias directives. This allows simple, safe,
well-understood work-arounds for known problematic software.

This also addresses a rare but unfortunate problem where the filesystem
name is not the same as its module name, so module auto-loading
would not work. While writing this patch I saw a handful of such
cases. The most significant is autofs, which lives in the module
autofs4.

This is relevant to user namespaces because we can reach
request_module() in get_fs_type() without having any special permissions,
and people get uncomfortable when a user-specified string (in this case
the filesystem type) goes all of the way to request_module().

After having looked at this issue I don't think there is any
particular reason to perform any filtering or permission checks beyond
making it clear in the module request that we want a filesystem
module. The common pattern in the kernel is to call request_module()
without regard to the user's permissions. In general, all a filesystem
module does once loaded is call register_filesystem() and go to sleep,
which means there is not much attack surface exposed by loading a
filesystem module unless the filesystem is mounted. In a user
namespace, filesystems are not mounted unless .fs_flags includes
FS_USERNS_MOUNT, which most filesystems do not set today.

Acked-by: Serge Hallyn
Acked-by: Kees Cook
Reported-by: Kees Cook
Signed-off-by: "Eric W. Biederman"

Eric W. Biederman
2013-03-04 11:36:31 +0800

23 Feb, 2013

1 commit

496ad9aa8 new helper: file_inode(file)

Signed-off-by: Al Viro

Al Viro
2013-02-23 12:31:31 +0800

21 Dec, 2012

1 commit

7fc7cd00f minix: drop vmtruncate

Removed vmtruncate

Signed-off-by: Marco Stornelli
Signed-off-by: Al Viro

Marco Stornelli
2012-12-21 07:40:53 +0800

03 Oct, 2012

2 commits

aab174f0d Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs

Pull vfs update from Al Viro:

"- big one - consolidation of descriptor-related logic; almost all of
that is moved to fs/file.c

(BTW, I'm seriously tempted to rename the result to fd.c. As it is,
we have a situation where file_table.c is about handling of struct
file and file.c is about handling of descriptor tables; the reasons
are historical - file_table.c used to be about a static array of
struct file we used to have way back).

A lot of stray ends got cleaned up and converted to saner primitives;
the disgusting mess in android/binder.c is still disgusting, but at least
it doesn't poke so much in descriptor table guts anymore. A bunch of
relatively minor races got fixed in the process, plus an ext4 struct file
leak.

- related thing - fget_light() partially unuglified; see fdget() in
there (and yes, it generates code as good as what we used to have).

- also related - bits of Cyrill's procfs stuff that got entangled with
that work; _not_ all of it, just the initial move to fs/proc/fd.c and
the switch of fdinfo to seq_file.

- Alex's fs/coredump.c split-off - the same story; it had been easier to
take that commit than mess with conflicts. The rest is a separate
pile; this was just a mechanical code movement.

- a few misc patches all over the place. Not all for this cycle,
there'll be more (and quite a few currently sit in akpm's tree)."

Fix up trivial conflicts in the android binder driver, and some fairly
simple conflicts due to two different changes to the sock_alloc_file()
interface ("take descriptor handling from sock_alloc_file() to callers"
vs "net: Providing protocol type via system.sockprotoname xattr of
/proc/PID/fd entries" adding a dentry name to the socket)

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (72 commits)
MAX_LFS_FILESIZE should be a loff_t
compat: fs: Generic compat_sys_sendfile implementation
fs: push rcu_barrier() from deactivate_locked_super() to filesystems
btrfs: reada_extent doesn't need kref for refcount
coredump: move core dump functionality into its own file
coredump: prevent double-free on an error path in core dumper
usb/gadget: fix misannotations
fcntl: fix misannotations
ceph: don't abuse d_delete() on failure exits
hypfs: ->d_parent is never NULL or negative
vfs: delete surplus inode NULL check
switch simple cases of fget_light to fdget
new helpers: fdget()/fdput()
switch o2hb_region_dev_write() to fget_light()
proc_map_files_readdir(): don't bother with grabbing files
make get_file() return its argument
vhost_set_vring(): turn pollstart/pollstop into bool
switch prctl_set_mm_exe_file() to fget_light()
switch xfs_find_handle() to fget_light()
switch xfs_swapext() to fget_light()
...

Linus Torvalds
2012-10-03 11:25:04 +0800
8c0a85377 fs: push rcu_barrier() from deactivate_locked_super() to filesystems

There's no reason to call rcu_barrier() on every
deactivate_locked_super(). We only need to make sure that all delayed
RCU-freed inodes are flushed before we destroy the related cache.

Having rcu_barrier() in deactivate_locked_super() slows down some fast
paths. E.g. on my machine, exit_group() of the last process in an IPC
namespace takes 0.07538s, of which rcu_barrier() takes 0.05188s.

Signed-off-by: Kirill A. Shutemov
Cc: Al Viro
Signed-off-by: Andrew Morton
Signed-off-by: Al Viro

Kirill A. Shutemov
2012-10-03 09:35:55 +0800

21 Sep, 2012

1 commit

f303bdc55 userns: Convert minix to use kuid/kgid where appropriate

Acked-by: Serge Hallyn
Signed-off-by: Eric W. Biederman

Eric W. Biederman
2012-09-21 18:13:14 +0800

31 Jul, 2012

1 commit

6ed6a722f minixfs: fix block limit check

On minix2 and minix3, max_size is usually 0x7fffffff, and the check in
question prohibits creation of the last block (the one covering the bytes
just below 0x7fffffff) due to downward rounding during the division.
Fix it by using multiplication instead.

[akpm@linux-foundation.org: fix up code layout, use local `sb']
Signed-off-by: Vladimir Serbinenko
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Vladimir Serbinenko
2012-07-31 08:25:19 +0800

14 Jul, 2012

2 commits

ebfc3b49a don't pass nameidata to ->create()

A boolean "does it have to be exclusive?" flag is passed instead;
local filesystems should just ignore it - the object is guaranteed
not to be there yet.

Signed-off-by: Al Viro

Al Viro
2012-07-14 20:34:47 +0800
00cd8dd3b stop passing nameidata to ->lookup()

Just the flags; only NFS cares even about that, but there are
legitimate uses for such argument. And getting rid of that
completely would require splitting ->lookup() into a couple
of methods (at least), so let's leave that alone for now...

Signed-off-by: Al Viro

Al Viro
2012-07-14 20:34:32 +0800

06 May, 2012

1 commit

dbd5768f8 vfs: Rename end_writeback() to clear_inode()

After we moved inode_sync_wait() out of end_writeback(), it no longer makes
sense to call the function end_writeback(). Rename it to clear_inode(),
which better describes what the function really does: set the I_CLEAR flag.

Signed-off-by: Jan Kara
Signed-off-by: Fengguang Wu

Jan Kara
2012-05-06 13:43:41 +0800

22 Mar, 2012

1 commit

e2a0883e4 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs

Pull vfs pile 1 from Al Viro:
"This is _not_ all; in particular, Miklos' and Jan's stuff is not there
yet."

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (64 commits)
ext4: initialization of ext4_li_mtx needs to be done earlier
debugfs-related mode_t whack-a-mole
hfsplus: add an ioctl to bless files
hfsplus: change finder_info to u32
hfsplus: initialise userflags
qnx4: new helper - try_extent()
qnx4: get rid of qnx4_bread/qnx4_getblk
take removal of PF_FORKNOEXEC to flush_old_exec()
trim includes in inode.c
um: uml_dup_mmap() relies on ->mmap_sem being held, but activate_mm() doesn't hold it
um: embed ->stub_pages[] into mmu_context
gadgetfs: list_for_each_safe() misuse
ocfs2: fix leaks on failure exits in module_init
ecryptfs: make register_filesystem() the last potential failure exit
ntfs: forgets to unregister sysctls on register_filesystem() failure
logfs: missing cleanup on register_filesystem() failure
jfs: mising cleanup on register_filesystem() failure
make configfs_pin_fs() return root dentry on success
configfs: configfs_create_dir() has parent dentry in dentry->d_parent
configfs: sanitize configfs_create()
...

Linus Torvalds
2012-03-22 04:36:41 +0800

21 Mar, 2012

2 commits

ca85c0780 minixfs: switch to d_make_root()

Signed-off-by: Al Viro

Al Viro
2012-03-21 09:29:36 +0800
8de527787 vfs: check i_nlink limits in vfs_{mkdir,rename_dir,link}

New field of struct super_block: ->s_max_links, the maximal allowed
value of ->i_nlink, or 0; in the latter case all checks still need
to be done in ->link/->mkdir/->rename instances. Note that this
limit applies both to directories and to non-directories.

Signed-off-by: Al Viro

Al Viro
2012-03-21 09:29:32 +0800

20 Mar, 2012

1 commit

27a6d5c74 minix: remove the second argument of k[un]map_atomic()

Signed-off-by: Cong Wang

Cong Wang
2012-03-20 21:48:24 +0800

09 Jan, 2012

1 commit

972b2c719 Merge branch 'for-linus2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs

* 'for-linus2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (165 commits)
reiserfs: Properly display mount options in /proc/mounts
vfs: prevent remount read-only if pending removes
vfs: count unlinked inodes
vfs: protect remounting superblock read-only
vfs: keep list of mounts for each superblock
vfs: switch ->show_options() to struct dentry *
vfs: switch ->show_path() to struct dentry *
vfs: switch ->show_devname() to struct dentry *
vfs: switch ->show_stats to struct dentry *
switch security_path_chmod() to struct path *
vfs: prefer ->dentry->d_sb to ->mnt->mnt_sb
vfs: trim includes a bit
switch mnt_namespace ->root to struct mount
vfs: take /proc/*/mounts and friends to fs/proc_namespace.c
vfs: opencode mntget() mnt_set_mountpoint()
vfs: spread struct mount - remaining argument of next_mnt()
vfs: move fsnotify junk to struct mount
vfs: move mnt_devname
vfs: move mnt_list to struct mount
vfs: switch pnode.h macros to struct mount *
...

Linus Torvalds
2012-01-09 04:19:57 +0800

05 Jan, 2012

1 commit

d6042eac4 minixfs: misplaced checks lead to dentry leak

Bitmap size sanity checks should be done *before* allocating ->s_root;
then their cleanup on failure would be correct. As it is, we do iput()
on the root inode, but leak the root dentry...

Signed-off-by: Al Viro
Acked-by: Josh Boyer
Signed-off-by: Linus Torvalds

Al Viro
2012-01-05 07:03:06 +0800

04 Jan, 2012

5 commits

4f45ba3d1 minix: propagate umode_t

Signed-off-by: Al Viro

Al Viro
2012-01-04 11:54:59 +0800
1a67aafb5 switch ->mknod() to umode_t

Signed-off-by: Al Viro

Al Viro
2012-01-04 11:54:54 +0800
4acdaf27e switch ->create() to umode_t

vfs_create() ignores everything outside of the 16-bit subset of its
mode argument; switching it to umode_t is obviously equivalent,
and it's the only caller of the method.

Signed-off-by: Al Viro

Al Viro
2012-01-04 11:54:53 +0800
18bb1db3e switch vfs_mkdir() and ->mkdir() to umode_t

vfs_mkdir() gets int, but immediately drops everything that might not
fit into umode_t and that's the only caller of ->mkdir()...

Signed-off-by: Al Viro

Al Viro
2012-01-04 11:54:53 +0800
6b520e056 vfs: fix the stupidity with i_dentry in inode destructors

Seeing that just about every destructor got that INIT_LIST_HEAD() copied into
it, there is no point whatsoever in keeping this INIT_LIST_HEAD in inode_init_once();
the cost of moving it into inode_init_always() will be negligible for pipes
and sockets and negative for everything else. Not to mention the removal of
boilerplate code from ->destroy_inode() instances...

Signed-off-by: Al Viro

Al Viro
2012-01-04 11:52:40 +0800

20 Nov, 2011

2 commits

f1fd306a9 minixfs: kill manual hweight(), simplify

Signed-off-by: Al Viro

Al Viro
2011-11-20 00:13:28 +0800
016e8d44b fs/minix: Verify bitmap block counts before mounting

Newer versions of MINIX can create filesystems that allocate an extra
bitmap block. Mounting such a filesystem succeeds, but a statfs call will
result in an oops in count_free because a negative number is used
as the bh index.

Avoid this by verifying the number of allocated bitmap blocks at mount time,
erroring out if there are not enough, and making statfs ignore the extras
if there are too many.

This fixes https://bugzilla.kernel.org/show_bug.cgi?id=18792

Signed-off-by: Josh Boyer
Signed-off-by: Al Viro

Josh Boyer
2011-11-20 00:13:26 +0800

02 Nov, 2011

1 commit

bfe868486 filesystems: add set_nlink()

Replace remaining direct i_nlink updates with a new set_nlink()
updater function.

Signed-off-by: Miklos Szeredi
Tested-by: Toshiyuki Okajima
Signed-off-by: Christoph Hellwig

Miklos Szeredi
2011-11-02 19:53:43 +0800

21 Jul, 2011

1 commit

2def9e4ec minix_getattr(): don't bother with ->d_parent

we can find superblock easier, TYVM...

Signed-off-by: Al Viro

Al Viro
2011-07-21 08:47:53 +0800

28 May, 2011

1 commit

b80d2c228 minix: remove unnecessary dentry_unhash on rmdir, dir rename

Minix has no issues with references to unlinked directories.

Signed-off-by: Sage Weil
Signed-off-by: Al Viro

Sage Weil
2011-05-28 13:02:54 +0800

26 May, 2011

2 commits

e4eaac06b vfs: push dentry_unhash on rename_dir into file systems

Only a few file systems need this. Start by pushing it down into each
rename method (except gfs2 and xfs) so that it can be dealt with on a
per-fs basis.

Acked-by: Christoph Hellwig
Signed-off-by: Sage Weil
Signed-off-by: Al Viro

Sage Weil
2011-05-26 19:26:48 +0800
79bf7c732 vfs: push dentry_unhash on rmdir into file systems

Only a few file systems need this. Start by pushing it down into each
fs rmdir method (except gfs2 and xfs) so it can be dealt with on a per-fs
basis.

This does not change behavior for any in-tree file systems.

Acked-by: Christoph Hellwig
Signed-off-by: Sage Weil
Signed-off-by: Al Viro

Sage Weil
2011-05-26 19:26:47 +0800

25 Mar, 2011

1 commit

6c5103890 Merge branch 'for-2.6.39/core' of git://git.kernel.dk/linux-2.6-block

* 'for-2.6.39/core' of git://git.kernel.dk/linux-2.6-block: (65 commits)
Documentation/iostats.txt: bit-size reference etc.
cfq-iosched: removing unnecessary think time checking
cfq-iosched: Don't clear queue stats when preempt.
blk-throttle: Reset group slice when limits are changed
blk-cgroup: Only give unaccounted_time under debug
cfq-iosched: Don't set active queue in preempt
block: fix non-atomic access to genhd inflight structures
block: attempt to merge with existing requests on plug flush
block: NULL dereference on error path in __blkdev_get()
cfq-iosched: Don't update group weights when on service tree
fs: assign sb->s_bdi to default_backing_dev_info if the bdi is going away
block: Require subsystems to explicitly allocate bio_set integrity mempool
jbd2: finish conversion from WRITE_SYNC_PLUG to WRITE_SYNC and explicit plugging
jbd: finish conversion from WRITE_SYNC_PLUG to WRITE_SYNC and explicit plugging
fs: make fsync_buffers_list() plug
mm: make generic_writepages() use plugging
blk-cgroup: Add unaccounted time to timeslice_used.
block: fixup plugging stubs for !CONFIG_BLOCK
block: remove obsolete comments for blkdev_issue_zeroout.
blktrace: Use rq->cmd_flags directly in blk_add_trace_rq.
...

Fix up conflicts in fs/{aio.c,super.c}

Linus Torvalds
2011-03-25 01:16:26 +0800

24 Mar, 2011

1 commit

61f2e7b0f bitops: remove minix bitops from asm/bitops.h

The minix bit operations are used only by the minix filesystem and are
useless to other modules. They have been kept in asm/bitops.h because the
byte order of the inode and block bitmaps differs between architectures,
as follows:

m68k:
big-endian 16bit indexed bitmaps

h8300, microblaze, s390, sparc, m68knommu:
big-endian 32 or 64bit indexed bitmaps

m32r, mips, sh, xtensa:
big-endian 32 or 64bit indexed bitmaps for big-endian mode
little-endian bitmaps for little-endian mode

Others:
little-endian bitmaps

In order to move minix bit operations from asm/bitops.h to architecture
independent code in minix filesystem, this provides two config options.

CONFIG_MINIX_FS_BIG_ENDIAN_16BIT_INDEXED is only selected by m68k.
CONFIG_MINIX_FS_NATIVE_ENDIAN is selected by the architectures which use
native byte order bitmaps (h8300, microblaze, s390, sparc, m68knommu,
m32r, mips, sh, xtensa). The architectures which always use little-endian
bitmaps do not select these options.

Finally, we can remove minix bit operations from asm/bitops.h for all
architectures.

Signed-off-by: Akinobu Mita
Acked-by: Arnd Bergmann
Acked-by: Greg Ungerer
Cc: Geert Uytterhoeven
Cc: Roman Zippel
Cc: Andreas Schwab
Cc: Martin Schwidefsky
Cc: Heiko Carstens
Cc: Yoshinori Sato
Cc: Michal Simek
Cc: "David S. Miller"
Cc: Hirokazu Takata
Acked-by: Ralf Baechle
Acked-by: Paul Mundt
Cc: Chris Zankel
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Akinobu Mita
2011-03-24 10:46:22 +0800

10 Mar, 2011

1 commit

7eaceacca block: remove per-queue plugging

Code has been converted over to the new explicit on-stack plugging,
and delay users have been converted to use the new API for that.
So let's kill off the old plugging along with aops->sync_page().

Signed-off-by: Jens Axboe

Jens Axboe
2011-03-10 15:52:07 +0800

03 Mar, 2011

1 commit

6f88049ca minix: i_nlink races in rename()

Signed-off-by: Al Viro

Al Viro
2011-03-03 14:28:16 +0800

13 Jan, 2011

1 commit

c6cb41236 minixfs: kill dead code

->d_op of root stays NULL these days on minixfs

Signed-off-by: Al Viro

Al Viro
2011-01-13 09:02:44 +0800

07 Jan, 2011

2 commits

fb045adb9 fs: dcache reduce branches in lookup path

Reduce some branches and memory accesses in dcache lookup by adding dentry
flags to indicate common d_ops are set, rather than having to check them.
This saves a pointer memory access (dentry->d_op) in common path lookup
situations, and saves another pointer load and branch in cases where we
have d_op but not the particular operation.

Patched with:

git grep -E '[.>]([[:space:]])*d_op([[:space:]])*=' | xargs sed -e 's/\([^\t ]*\)->d_op = \(.*\);/d_set_d_op(\1, \2);/' -e 's/\([^\t ]*\)\.d_op = \(.*\);/d_set_d_op(\&\1, \2);/' -i

Signed-off-by: Nick Piggin

Nick Piggin
2011-01-07 14:50:28 +0800
fa0d7e3de fs: icache RCU free inodes

RCU free the struct inode. This will allow:

- Subsequent store-free path walking patch. The inode must be consulted for
permissions when walking, so an RCU inode reference is a must.
- sb_inode_list_lock to be moved inside i_lock because sb list walkers who want
to take i_lock no longer need to take sb_inode_list_lock to walk the list in
the first place. This will simplify and optimize locking.
- Could remove some nested trylock loops in dcache code
- Could potentially simplify things a bit in VM land. Do not need to take the
page lock to follow page->mapping.

The downside of this is the performance cost of using RCU. In a simple
creat/unlink microbenchmark, performance drops by about 10% due to the inability to
reuse cache-hot slab objects. As iterations increase and RCU freeing starts
kicking in, this increases to about 20%.

In cases where inode lifetimes are longer (i.e., many inodes may be allocated
during the average life span of a single inode), a lot of this cache reuse is
not applicable, so the regression caused by this patch is smaller.

The cache-hot regression could largely be avoided by using SLAB_DESTROY_BY_RCU,
however this adds some complexity to list walking and store-free path walking,
so I prefer to implement this at a later date, if it is shown to be a win in
real situations. I haven't found a regression in any non-micro benchmark so I
doubt it will be a problem.

Signed-off-by: Nick Piggin

Nick Piggin
2011-01-07 14:50:26 +0800

29 Oct, 2010

1 commit

152a08366 new helper: mount_bdev()

... and switch of the obvious get_sb_bdev() users to ->mount()

Signed-off-by: Al Viro

Al Viro
2010-10-29 16:16:13 +0800

26 Oct, 2010

1 commit

7de9c6ee3 new helper: ihold()

Clones an existing reference to inode; caller must already hold one.

Signed-off-by: Al Viro

Al Viro
2010-10-26 09:26:11 +0800

10 Sep, 2010

1 commit

eee743fd7 minix: fix regression in minix_mkdir()

Commit 9eed1fb721c ("minix: replace inode uid,gid,mode init with helper")
broke directory creation on minix filesystems.

Fix it by passing the needed mode flag to inode init helper.

Signed-off-by: Jorge Boncompte [DTI2]
Cc: Dmitry Monakhov
Cc: Al Viro
Cc: [2.6.35.x]
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Jorge Boncompte [DTI2]
2010-09-10 09:57:25 +0800

10 Aug, 2010

2 commits

5ccb4a78d switch minix to ->evict_inode(), fix write_inode/delete_inode race

We need to wait for completion of possible writeback in progress
before we clear on-disk inode during deletion.

Signed-off-by: Al Viro

Al Viro
2010-08-10 04:47:53 +0800
1025774ce remove inode_setattr

Replace inode_setattr with opencoded variants of it in all callers. This
moves the remaining call to vmtruncate into the filesystem methods where it
can be replaced with the proper truncate sequence.

In a few cases it was obvious that we would never end up calling vmtruncate
so it was left out in the opencoded variant:

spufs: explicitly checks for ATTR_SIZE earlier
btrfs,hugetlbfs,logfs,dlmfs: explicitly clears ATTR_SIZE earlier
ufs: contains an opencoded simple_setattr + truncate that sets the filesize just above

In addition to that, ncpfs called inode_setattr with handcrafted iattrs,
which allowed trimming down the opencoded variant.

Signed-off-by: Christoph Hellwig
Signed-off-by: Al Viro

Christoph Hellwig
2010-08-10 04:47:37 +0800