Eric Lee / smarc-fsl-linux-kernel

25 Oct, 2013

1 commit

966c1f75f isofs: don't pass dentry to isofs_hash{i,}_common() ... Browse Code »

Signed-off-by: Al Viro

Al Viro
2013-10-25 11:34:59 +0800

01 Aug, 2013

1 commit

17b7f7cf5 isofs: Refuse RW mount of the filesystem instead of making it RO ... Browse Code »

Refuse RW mount of isofs filesystem. So far we just silently changed it
to RO mount but when the media is writeable, block layer won't notice
this change and thus will think device is used RW and will block eject
button of the drive. That is unexpected by users because for
non-writeable media eject button works just fine.

Userspace mount(8) command handles this just fine and retries mounting
with MS_RDONLY set so userspace shouldn't see any regression. Plus any
tool mounting isofs is likely confronted with the case of read-only
media where block layer already refuses to mount the filesystem without
MS_RDONLY set so our behavior shouldn't be anything new for it.

Reported-by: Hui Wang
Signed-off-by: Jan Kara

Jan Kara
2013-08-01 04:14:50 +0800

29 Jun, 2013

2 commits

da53be12b Don't pass inode to ->d_hash() and ->d_compare() ... Browse Code »

Instances either don't look at it at all (the majority of cases) or
only want it to find the superblock (which can be had as dentry->d_sb).
A few cases that want more are actually safe with dentry->d_inode -
the only precaution needed is the check that it hadn't been replaced with
NULL by rmdir() or by overwriting rename(), which case should be simply
treated as cache miss.

Signed-off-by: Linus Torvalds
Signed-off-by: Al Viro

Linus Torvalds
2013-06-29 16:57:36 +0800
bfee7169c [readdir] convert isofs ... Browse Code »

Signed-off-by: Al Viro

Al Viro
2013-06-29 16:56:47 +0800

13 Mar, 2013

1 commit

fa7614ddd fs: Readd the fs module aliases. ... Browse Code »

I had assumed that the only use of module aliases for filesystems
prior to "fs: Limit sys_mount to only request filesystem modules."
was in request_module. It turns out I was wrong. At least mkinitcpio
in Arch linux uses these aliases.

So readd the preexising aliases, to keep from breaking userspace.

Userspace eventually will have to follow and use the same aliases the
kernel does. So at some point we may be delete these aliases without
problems. However that day is not today.

Signed-off-by: "Eric W. Biederman"

Eric W. Biederman
2013-03-13 09:55:21 +0800

04 Mar, 2013

1 commit

7f78e0351 fs: Limit sys_mount to only request filesystem modules. ... Browse Code »

Modify the request_module to prefix the file system type with "fs-"
and add aliases to all of the filesystems that can be built as modules
to match.

A common practice is to build all of the kernel code and leave code
that is not commonly needed as modules, with the result that many
users are exposed to any bug anywhere in the kernel.

Looking for filesystems with a fs- prefix limits the pool of possible
modules that can be loaded by mount to just filesystems trivially
making things safer with no real cost.

Using aliases means user space can control the policy of which
filesystem modules are auto-loaded by editing /etc/modprobe.d/*.conf
with blacklist and alias directives. Allowing simple, safe,
well understood work-arounds to known problematic software.

This also addresses a rare but unfortunate problem where the filesystem
name is not the same as it's module name and module auto-loading
would not work. While writing this patch I saw a handful of such
cases. The most significant being autofs that lives in the module
autofs4.

This is relevant to user namespaces because we can reach the request
module in get_fs_type() without having any special permissions, and
people get uncomfortable when a user specified string (in this case
the filesystem type) goes all of the way to request_module.

After having looked at this issue I don't think there is any
particular reason to perform any filtering or permission checks beyond
making it clear in the module request that we want a filesystem
module. The common pattern in the kernel is to call request_module()
without regards to the users permissions. In general all a filesystem
module does once loaded is call register_filesystem() and go to sleep.
Which means there is not much attack surface exposed by loading a
filesytem module unless the filesystem is mounted. In a user
namespace filesystems are not mounted unless .fs_flags = FS_USERNS_MOUNT,
which most filesystems do not set today.

Acked-by: Serge Hallyn
Acked-by: Kees Cook
Reported-by: Kees Cook
Signed-off-by: "Eric W. Biederman"

Eric W. Biederman
2013-03-04 11:36:31 +0800

26 Feb, 2013

1 commit

94e07a759 fs: encode_fh: return FILEID_INVALID if invalid fid_type ... Browse Code »

This patch is a follow up on below patch:

[PATCH] exportfs: add FILEID_INVALID to indicate invalid fid_type
commit: 216b6cbdcbd86b1db0754d58886b466ae31f5a63

Signed-off-by: Namjae Jeon
Signed-off-by: Vivek Trivedi
Acked-by: Steven Whitehouse
Acked-by: Sage Weil
Signed-off-by: Al Viro

Namjae Jeon
2013-02-26 15:46:10 +0800

23 Feb, 2013

1 commit

496ad9aa8 new helper: file_inode(file) ... Browse Code »

Signed-off-by: Al Viro

Al Viro
2013-02-23 12:31:31 +0800

10 Oct, 2012

1 commit

35c2a7f49 tmpfs,ceph,gfs2,isofs,reiserfs,xfs: fix fh_len checking ... Browse Code »

Fuzzing with trinity oopsed on the 1st instruction of shmem_fh_to_dentry(),
u64 inum = fid->raw[2];
which is unhelpfully reported as at the end of shmem_alloc_inode():

BUG: unable to handle kernel paging request at ffff880061cd3000
IP: [] shmem_alloc_inode+0x40/0x40
Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
Call Trace:
[] ? exportfs_decode_fh+0x79/0x2d0
[] do_handle_open+0x163/0x2c0
[] sys_open_by_handle_at+0xc/0x10
[] tracesys+0xe1/0xe6

Right, tmpfs is being stupid to access fid->raw[2] before validating that
fh_len includes it: the buffer kmalloc'ed by do_sys_name_to_handle() may
fall at the end of a page, and the next page not be present.

But some other filesystems (ceph, gfs2, isofs, reiserfs, xfs) are being
careless about fh_len too, in fh_to_dentry() and/or fh_to_parent(), and
could oops in the same way: add the missing fh_len checks to those.

Reported-by: Sasha Levin
Signed-off-by: Hugh Dickins
Cc: Al Viro
Cc: Sage Weil
Cc: Steven Whitehouse
Cc: Christoph Hellwig
Cc: stable@vger.kernel.org
Signed-off-by: Al Viro

Hugh Dickins
2012-10-10 11:33:55 +0800

03 Oct, 2012

2 commits

aab174f0d Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs ... Browse Code »

Pull vfs update from Al Viro:

- big one - consolidation of descriptor-related logics; almost all of
that is moved to fs/file.c

(BTW, I'm seriously tempted to rename the result to fd.c. As it is,
we have a situation when file_table.c is about handling of struct
file and file.c is about handling of descriptor tables; the reasons
are historical - file_table.c used to be about a static array of
struct file we used to have way back).

A lot of stray ends got cleaned up and converted to saner primitives,
disgusting mess in android/binder.c is still disgusting, but at least
doesn't poke so much in descriptor table guts anymore. A bunch of
relatively minor races got fixed in process, plus an ext4 struct file
leak.

- related thing - fget_light() partially unuglified; see fdget() in
there (and yes, it generates the code as good as we used to have).

- also related - bits of Cyrill's procfs stuff that got entangled into
that work; _not_ all of it, just the initial move to fs/proc/fd.c and
switch of fdinfo to seq_file.

- Alex's fs/coredump.c spiltoff - the same story, had been easier to
take that commit than mess with conflicts. The rest is a separate
pile, this was just a mechanical code movement.

- a few misc patches all over the place. Not all for this cycle,
there'll be more (and quite a few currently sit in akpm's tree)."

Fix up trivial conflicts in the android binder driver, and some fairly
simple conflicts due to two different changes to the sock_alloc_file()
interface ("take descriptor handling from sock_alloc_file() to callers"
vs "net: Providing protocol type via system.sockprotoname xattr of
/proc/PID/fd entries" adding a dentry name to the socket)

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (72 commits)
MAX_LFS_FILESIZE should be a loff_t
compat: fs: Generic compat_sys_sendfile implementation
fs: push rcu_barrier() from deactivate_locked_super() to filesystems
btrfs: reada_extent doesn't need kref for refcount
coredump: move core dump functionality into its own file
coredump: prevent double-free on an error path in core dumper
usb/gadget: fix misannotations
fcntl: fix misannotations
ceph: don't abuse d_delete() on failure exits
hypfs: ->d_parent is never NULL or negative
vfs: delete surplus inode NULL check
switch simple cases of fget_light to fdget
new helpers: fdget()/fdput()
switch o2hb_region_dev_write() to fget_light()
proc_map_files_readdir(): don't bother with grabbing files
make get_file() return its argument
vhost_set_vring(): turn pollstart/pollstop into bool
switch prctl_set_mm_exe_file() to fget_light()
switch xfs_find_handle() to fget_light()
switch xfs_swapext() to fget_light()
...

Linus Torvalds
2012-10-03 11:25:04 +0800
8c0a85377 fs: push rcu_barrier() from deactivate_locked_super() to filesystems ... Browse Code »

There's no reason to call rcu_barrier() on every
deactivate_locked_super(). We only need to make sure that all delayed rcu
free inodes are flushed before we destroy related cache.

Removing rcu_barrier() from deactivate_locked_super() affects some fast
paths. E.g. on my machine exit_group() of a last process in IPC
namespace takes 0.07538s. rcu_barrier() takes 0.05188s of that time.

Signed-off-by: Kirill A. Shutemov
Cc: Al Viro
Signed-off-by: Andrew Morton
Signed-off-by: Al Viro

Kirill A. Shutemov
2012-10-03 09:35:55 +0800

21 Sep, 2012

1 commit

ba64e2b9e userns: Convert isofs to use kuid/kgid where appropriate ... Browse Code »

Signed-off-by: Eric W. Biederman

Eric W. Biederman
2012-09-21 18:13:12 +0800

25 Jul, 2012

1 commit

08d9329c2 Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs ... Browse Code »

Pull misc udf, ext2, ext3, and isofs fixes from Jan Kara:
"Assorted, mostly trivial, fixes for udf, ext2, ext3, and isofs. I'm
on vacation and scarcely checking email since we are expecting baby
any day now but these fixes should be safe to go in and I don't want
to delay them unnecessarily."

* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
udf: avoid info leak on export
isofs: avoid info leak on export
udf: Improve table length check to avoid possible overflow
ext3: Check return value of blkdev_issue_flush()
jbd: Check return value of blkdev_issue_flush()
udf: Do not decrement i_blocks when freeing indirect extent block
udf: Fix memory leak when mounting
ext2: cleanup the confused goto label
UDF: Remove unnecessary variable "offset" from udf_fill_inode
udf: stop using s_dirt
ext3: force ro mount if ext3_setup_super() fails
quota: fix checkpatch.pl warning by replacing with

Linus Torvalds
2012-07-25 08:40:44 +0800

14 Jul, 2012

1 commit

00cd8dd3b stop passing nameidata to ->lookup() ... Browse Code »

Just the flags; only NFS cares even about that, but there are
legitimate uses for such argument. And getting rid of that
completely would require splitting ->lookup() into a couple
of methods (at least), so let's leave that alone for now...

Signed-off-by: Al Viro

Al Viro
2012-07-14 20:34:32 +0800

13 Jul, 2012

1 commit

fe685aabf isofs: avoid info leak on export ... Browse Code »

For type 1 the parent_offset member in struct isofs_fid gets copied
uninitialized to userland. Fix this by initializing it to 0.

Signed-off-by: Mathias Krause
Signed-off-by: Jan Kara

Mathias Krause
2012-07-13 17:21:21 +0800

30 May, 2012

1 commit

b0b0382bb ->encode_fh() API change ... Browse Code »

pass inode + parent's inode or NULL instead of dentry + bool saying
whether we want the parent or not.

NOTE: that needs ceph fix folded in.

Signed-off-by: Al Viro

Al Viro
2012-05-30 11:28:33 +0800

21 Mar, 2012

1 commit

48fde701a switch open-coded instances of d_make_root() to new helper ... Browse Code »

Signed-off-by: Al Viro

Al Viro
2012-03-21 09:29:35 +0800

09 Jan, 2012

1 commit

8fdd8c49f isofs: inode leak on mount failure ... Browse Code »

d_alloc_root() failure leaves root inode leaked...

Signed-off-by: Al Viro

Al Viro
2012-01-09 23:48:11 +0800

04 Jan, 2012

2 commits

7328bdd6c isofs: propagate umode_t ... Browse Code »

situation with mount options is the same as for udf

Signed-off-by: Al Viro

Al Viro
2012-01-04 11:55:08 +0800
6b520e056 vfs: fix the stupidity with i_dentry in inode destructors ... Browse Code »

Seeing that just about every destructor got that INIT_LIST_HEAD() copied into
it, there is no point whatsoever keeping this INIT_LIST_HEAD in inode_init_once();
the cost of taking it into inode_init_always() will be negligible for pipes
and sockets and negative for everything else. Not to mention the removal of
boilerplate code from ->destroy_inode() instances...

Signed-off-by: Al Viro

Al Viro
2012-01-04 11:52:40 +0800

03 Nov, 2011

2 commits

092f4c56c Merge branch 'akpm' (Andrew's incoming - part two) ... Browse Code »

Says Andrew:

"60 patches. That's good enough for -rc1 I guess. I have quite a lot
of detritus to be rechecked, work through maintainers, etc.

- most of the remains of MM
- rtc
- various misc
- cgroups
- memcg
- cpusets
- procfs
- ipc
- rapidio
- sysctl
- pps
- w1
- drivers/misc
- aio"

* akpm: (60 commits)
memcg: replace ss->id_lock with a rwlock
aio: allocate kiocbs in batches
drivers/misc/vmw_balloon.c: fix typo in code comment
drivers/misc/vmw_balloon.c: determine page allocation flag can_sleep outside loop
w1: disable irqs in critical section
drivers/w1/w1_int.c: multiple masters used same init_name
drivers/power/ds2780_battery.c: fix deadlock upon insertion and removal
drivers/power/ds2780_battery.c: add a nolock function to w1 interface
drivers/power/ds2780_battery.c: create central point for calling w1 interface
w1: ds2760 and ds2780, use ida for id and ida_simple_get() to get it
pps gpio client: add missing dependency
pps: new client driver using GPIO
pps: default echo function
include/linux/dma-mapping.h: add dma_zalloc_coherent()
sysctl: make CONFIG_SYSCTL_SYSCALL default to n
sysctl: add support for poll()
RapidIO: documentation update
drivers/net/rionet.c: fix ethernet address macros for LE platforms
RapidIO: fix potential null deref in rio_setup_device()
RapidIO: add mport driver for Tsi721 bridge
...

Linus Torvalds
2011-11-03 07:07:27 +0800
3069083cc isofs: add readpages support ... Browse Code »

Use mpage_readpages() instead of multiple calls to isofs_readpage() to
reduce the CPU utilization and make performance higher.

Signed-off-by: Namjae Jeon
Cc: Al Viro
Cc: Jan Kara
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Namjae Jeon
2011-11-03 07:06:59 +0800

02 Nov, 2011

1 commit

bfe868486 filesystems: add set_nlink() ... Browse Code »

Replace remaining direct i_nlink updates with a new set_nlink()
updater function.

Signed-off-by: Miklos Szeredi
Tested-by: Toshiyuki Okajima
Signed-off-by: Christoph Hellwig

Miklos Szeredi
2011-11-02 19:53:43 +0800

23 Jul, 2011

1 commit

d769b3c2a isofs: Remove global fs lock ... Browse Code »

sbi->s_mutex isn't needed for isofs at all so we can just remove it. Generally,
since isofs is always mounted read-only, filesystem structure cannot change
under us. So buffer_head contents stays constant after it's filled in. That
leaves us with possible changes of global data structures. Superblock changes
only during filesystem mount (even remount does not change it), inodes are only
filled in during reading from disk. So there are no changes of these structures
to bother about.

Arguments why sbi->s_mutex can be removed at each place:
isofs_readdir: Accesses sb, inode, filp, local variables => s_mutex not needed
isofs_lookup: Protected by directory's i_mutex. Accesses sb, inode, dentry,
local variables => s_mutex not needed
rock_ridge_symlink_readpage: Protected by page lock. Accesses sb, inode,
local variables => s_mutex not needed.

Signed-off-by: Jan Kara
Signed-off-by: Al Viro

Jan Kara
2011-07-23 07:42:12 +0800

20 Jul, 2011

1 commit

a9049376e make d_splice_alias(ERR_PTR(err), dentry) = ERR_PTR(err) ... Browse Code »

... and simplify the living hell out of callers

Signed-off-by: Al Viro

Al Viro
2011-07-20 13:44:26 +0800

18 Jun, 2011

1 commit

c11760c6d isofs: fix bh leak in isofs_fill_super() error case ... Browse Code »

In isofs_fill_super(), when an iso_primary_descriptor is found, it is
kept in pri_bh. The error cases don't properly release it. Fix it.

Reported-and-tested-by: 김원석
Cc: Andrew Morton
Signed-off-by: Linus Torvalds

Linus Torvalds
2011-06-18 22:25:42 +0800

25 Mar, 2011

1 commit

6c5103890 Merge branch 'for-2.6.39/core' of git://git.kernel.dk/linux-2.6-block ... Browse Code »

* 'for-2.6.39/core' of git://git.kernel.dk/linux-2.6-block: (65 commits)
Documentation/iostats.txt: bit-size reference etc.
cfq-iosched: removing unnecessary think time checking
cfq-iosched: Don't clear queue stats when preempt.
blk-throttle: Reset group slice when limits are changed
blk-cgroup: Only give unaccounted_time under debug
cfq-iosched: Don't set active queue in preempt
block: fix non-atomic access to genhd inflight structures
block: attempt to merge with existing requests on plug flush
block: NULL dereference on error path in __blkdev_get()
cfq-iosched: Don't update group weights when on service tree
fs: assign sb->s_bdi to default_backing_dev_info if the bdi is going away
block: Require subsystems to explicitly allocate bio_set integrity mempool
jbd2: finish conversion from WRITE_SYNC_PLUG to WRITE_SYNC and explicit plugging
jbd: finish conversion from WRITE_SYNC_PLUG to WRITE_SYNC and explicit plugging
fs: make fsync_buffers_list() plug
mm: make generic_writepages() use plugging
blk-cgroup: Add unaccounted time to timeslice_used.
block: fixup plugging stubs for !CONFIG_BLOCK
block: remove obsolete comments for blkdev_issue_zeroout.
blktrace: Use rq->cmd_flags directly in blk_add_trace_rq.
...

Fix up conflicts in fs/{aio.c,super.c}

Linus Torvalds
2011-03-25 01:16:26 +0800

14 Mar, 2011

1 commit

5fe0c2378 exportfs: Return the minimum required handle size ... Browse Code »

The exportfs encode handle function should return the minimum required
handle size. This helps user to find out the handle size by passing 0
handle size in the first step and then redoing to the call again with
the returned handle size value.

Acked-by: Serge Hallyn
Signed-off-by: Aneesh Kumar K.V
Signed-off-by: Al Viro

Aneesh Kumar K.V
2011-03-14 21:15:28 +0800

10 Mar, 2011

1 commit

7eaceacca block: remove per-queue plugging ... Browse Code »
86

Code has been converted over to the new explicit on-stack plugging,
and delay users have been converted to use the new API for that.
So lets kill off the old plugging along with aops->sync_page().

Signed-off-by: Jens Axboe

Jens Axboe
2011-03-10 15:52:07 +0800

13 Jan, 2011

1 commit

6cc9c1d2c fix isofs d_op handling ... Browse Code »

switch to ->s_d_op; d_obtain_alias() will DTRT now

Signed-off-by: Al Viro

Al Viro
2011-01-13 09:02:43 +0800

07 Jan, 2011

4 commits

fb045adb9 fs: dcache reduce branches in lookup path ... Browse Code »

Reduce some branches and memory accesses in dcache lookup by adding dentry
flags to indicate common d_ops are set, rather than having to check them.
This saves a pointer memory access (dentry->d_op) in common path lookup
situations, and saves another pointer load and branch in cases where we
have d_op but not the particular operation.

Patched with:

git grep -E '[.>]([[:space:]])*d_op([[:space:]])*=' | xargs sed -e 's/\([^\t ]*\)->d_op = \(.*\);/d_set_d_op(\1, \2);/' -e 's/\([^\t ]*\)\.d_op = \(.*\);/d_set_d_op(\&\1, \2);/' -i

Signed-off-by: Nick Piggin

Nick Piggin
2011-01-07 14:50:28 +0800
fa0d7e3de fs: icache RCU free inodes ... Browse Code »

RCU free the struct inode. This will allow:

- Subsequent store-free path walking patch. The inode must be consulted for
permissions when walking, so an RCU inode reference is a must.
- sb_inode_list_lock to be moved inside i_lock because sb list walkers who want
to take i_lock no longer need to take sb_inode_list_lock to walk the list in
the first place. This will simplify and optimize locking.
- Could remove some nested trylock loops in dcache code
- Could potentially simplify things a bit in VM land. Do not need to take the
page lock to follow page->mapping.

The downsides of this is the performance cost of using RCU. In a simple
creat/unlink microbenchmark, performance drops by about 10% due to inability to
reuse cache-hot slab objects. As iterations increase and RCU freeing starts
kicking over, this increases to about 20%.

In cases where inode lifetimes are longer (ie. many inodes may be allocated
during the average life span of a single inode), a lot of this cache reuse is
not applicable, so the regression caused by this patch is smaller.

The cache-hot regression could largely be avoided by using SLAB_DESTROY_BY_RCU,
however this adds some complexity to list walking and store-free path walking,
so I prefer to implement this at a later date, if it is shown to be a win in
real situations. I haven't found a regression in any non-micro benchmark so I
doubt it will be a problem.

Signed-off-by: Nick Piggin

Nick Piggin
2011-01-07 14:50:26 +0800
b1e6a015a fs: change d_hash for rcu-walk ... Browse Code »

Change d_hash so it may be called from lock-free RCU lookups. See similar
patch for d_compare for details.

For in-tree filesystems, this is just a mechanical change.

Signed-off-by: Nick Piggin

Nick Piggin
2011-01-07 14:50:20 +0800
621e155a3 fs: change d_compare for rcu-walk ... Browse Code »

Change d_compare so it may be called from lock-free RCU lookups. This
does put significant restrictions on what may be done from the callback,
however there don't seem to have been any problems with in-tree fses.
If some strange use case pops up that _really_ cannot cope with the
rcu-walk rules, we can just add new rcu-unaware callbacks, which would
cause name lookup to drop out of rcu-walk mode.

For in-tree filesystems, this is just a mechanical change.

Signed-off-by: Nick Piggin

Nick Piggin
2011-01-07 14:50:19 +0800

29 Oct, 2010

1 commit

152a08366 new helper: mount_bdev() ... Browse Code »

... and switch of the obvious get_sb_bdev() users to ->mount()

Signed-off-by: Al Viro

Al Viro
2010-10-29 16:16:13 +0800

28 Oct, 2010

1 commit

e45c9effe isofs: work-around for Rock Ridge+Joliet CDs with empty ISO root directory ... Browse Code »

If a CD has both Rock Ridge and Joliet extensions and the ISO root
directory is empty, no files are visible. Disable Rock Ridge extensions
in this case and use Joliet root directory instead.

Signed-off-by: Ondrej Zary
Cc: Al Viro
Cc: Guenter Roeck
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Ondrej Zary
2010-10-28 09:03:08 +0800

26 Oct, 2010

1 commit

fde214d41 isofs: Fix isofs_get_blocks for 8TB files ... Browse Code »

Currently isofs_get_blocks() was limited to handle only 4TB files on 32-bit
architectures because of unnecessary use of iblock variable which was signed
long. Just remove the variable. The error messages that were using this
variable should have rather used b_off anyway because that is the block we
are currently mapping.

Signed-off-by: Jan Kara
Signed-off-by: Al Viro

Jan Kara
2010-10-26 09:18:20 +0800

05 Oct, 2010

2 commits

4f819a789 BKL: Remove BKL from isofs ... Browse Code »

As in other file systems, we can replace the big kernel lock
with a private mutex in isofs. This means we can now access
multiple file systems concurrently, but it also means that
we serialize readdir and lookup across sleeping operations
which previously released the big kernel lock. This should
not matter though, as these operations are in practice
serialized through the hardware access.

The isofs_get_blocks functions now does not take any lock
any more, it used to recursively get the BKL. After looking
at the code for hours, I convinced myself that it was never
needed here anyway, because it only reads constant fields
of the inode and writes to a buffer head array that is
at this time only visible to the caller.

The get_sb and fill_super operations do not need the locking
at all because they operate on a file system that is either
about to be created or to be destroyed but in either case
is not visible to other threads.

Signed-off-by: Arnd Bergmann

Arnd Bergmann
2010-10-05 03:10:45 +0800
db7192221 BKL: Explicitly add BKL around get_sb/fill_super ... Browse Code »

This patch is a preparation necessary to remove the BKL from do_new_mount().
It explicitly adds calls to lock_kernel()/unlock_kernel() around
get_sb/fill_super operations for filesystems that still uses the BKL.

I've read through all the code formerly covered by the BKL inside
do_kern_mount() and have satisfied myself that it doesn't need the BKL
any more.

do_kern_mount() is already called without the BKL when mounting the rootfs
and in nfsctl. do_kern_mount() calls vfs_kern_mount(), which is called
from various places without BKL: simple_pin_fs(), nfs_do_clone_mount()
through nfs_follow_mountpoint(), afs_mntpt_do_automount() through
afs_mntpt_follow_link(). Both later functions are actually the filesystems
follow_link inode operation. vfs_kern_mount() is calling the specified
get_sb function and lets the filesystem do its job by calling the given
fill_super function.

Therefore I think it is safe to push down the BKL from the VFS to the
low-level filesystems get_sb/fill_super operation.

[arnd: do not add the BKL to those file systems that already
don't use it elsewhere]

Signed-off-by: Jan Blunck
Signed-off-by: Arnd Bergmann
Cc: Matthew Wilcox
Cc: Christoph Hellwig

Jan Blunck
2010-10-05 03:10:10 +0800

11 Aug, 2010

1 commit

66a362a2a isofs: Fix lseek() to position beyond 4 GB ... Browse Code »

isofs supports files larger than 4 GB by using multi-extent files.
However an lseek() to a position beyond 4 GB in such a file will
fail with EINVAL, because s_maxbytes in the isofs superblock is
initialized to 2^32-1, and generic_file_llseek() checks against
that value.

I therefore suggest increasing the value of s_maxbytes to have
full support for large files in isofs. With multi-extent files, file
size is only limited by the maximum size of the file system (8 TB),
so this seems a reasonable value for s_maxbytes.

Signed-off-by: Jan Andres
Signed-off-by: Al Viro

Jan Andres
2010-08-11 12:29:47 +0800