28 Aug, 2009
3 commits
-
* 'for-linus' of git://git.infradead.org/users/eparis/notify:
inotify: Ensure we always write the terminating NULL.
inotify: fix locking around inotify watching in the idr
inotify: do not BUG on idr entries at inotify destruction
inotify: separate new watch creation from updating existing watches -
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs:
9p: update documentation pointers
9p: remove unnecessary v9fses->options which duplicates the mount string
net/9p: insulate the client against an invalid error code sent by a 9p server
9p: Add missing cast for the error return value in v9fs_get_inode
9p: Remove redundant inode uid/gid assignment
9p: Fix possible regressions when ->get_sb fails.
9p: Fix v9fs show_options
9p: Fix possible memleak in v9fs_inode_from_fid.
9p: minor comment fixes
9p: Fix possible inode leak in v9fs_get_inode.
9p: Check for error in return value of v9fs_fid_add -
kAFS crashes when asked to read a symbolic link because page_getlink()
passes a NULL file pointer to read_mapping_page(), but afs_readpage()
expects a file pointer from which to extract a key.

Modify afs_readpage() to request the appropriate key from the calling
process's keyrings if a file struct is not supplied with one attached.

Signed-off-by: David Howells
Acked-by: Anton Blanchard
Signed-off-by: Linus Torvalds
27 Aug, 2009
4 commits
-
Before the rewrite copy_event_to_user always wrote a terminating '\0'
byte to user space after the filename. Since the rewrite, that
terminating byte is skipped if the filename length is exactly a multiple of
event_size. Ouch!

So add one byte to name_size before we round up and use clear_user to
set userspace to zero like /dev/zero does instead of copying the
strange nul_inotify_event. I can't quite convince myself len_to_zero
will never exceed 16 and even if it doesn't clear_user should be more
efficient and a more accurate reflection of what the code is trying to
do.

Signed-off-by: Eric W. Biederman
Signed-off-by: Eric Paris -
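The rounding problem described in the copy_event_to_user fix can be illustrated with a small user-space sketch. This is not the kernel code: EVENT_SIZE is fixed at 16 here as an assumption standing in for event_size, and the function names are hypothetical.

```c
#include <stddef.h>

/* Assumed stand-in for event_size (the size of the event header). */
#define EVENT_SIZE 16u

/* Round n up to the next multiple of EVENT_SIZE. */
size_t round_up_event(size_t n)
{
    return (n + EVENT_SIZE - 1) / EVENT_SIZE * EVENT_SIZE;
}

/* Rounding the bare name length: when name_len is already a multiple of
 * EVENT_SIZE, no byte is left over for the terminating '\0'. */
size_t padded_len_without_nul(size_t name_len)
{
    return round_up_event(name_len);
}

/* Reserving one byte before rounding: there is always room for at least
 * one terminating '\0' after the name. */
size_t padded_len_with_nul(size_t name_len)
{
    return round_up_event(name_len + 1);
}
```

With a 16-byte name, the first variant yields 16 bytes (no room for the terminator) while the second yields 32; for a 15-byte name both yield 16.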
There are races around the idr storage of inotify watches. It's possible
that a watch could be found from sys_inotify_rm_watch() in the idr, but it
could be removed from the idr before that code does its removal. Move the
locking and the refcnt'ing so that these have to happen atomically.

Signed-off-by: Eric Paris
-
If an inotify watch is left in the idr when an fsnotify group is destroyed
this will lead to a BUG. This is not a dangerous situation and really
indicates a programming bug and leak of memory. This patch changes it to
use a WARN and a printk rather than killing people's boxes.

Signed-off-by: Eric Paris
-
There is nothing known wrong with the inotify watch addition/modification
but this patch separates the two code paths to make them each easy to
verify as correct.

Signed-off-by: Eric Paris
26 Aug, 2009
1 commit
-
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs-2.6:
ext3: Improve error message that changing journaling mode on remount is not possible
ext3: Update Kconfig description of EXT3_DEFAULTS_TO_ORDERED
25 Aug, 2009
3 commits
-
Commit 76db6d9500caeaa774a3e32a997eba30bbdc176b (nfs41: add session setup
to the state manager) introduces an infinite loop possibility in the NFSv4
state manager. By first checking nfs4_has_session() before clearing the
NFS4CLNT_SESSION_SETUP flag, it allows for a situation where someone sets
that flag, but it never gets cleared, and so the state manager loops.

In fact commit c3fad1b1aaf850bf692642642ace7cd0d64af0a3 (nfs41: add session
reset to state manager) causes this to happen every time we get a network
partition error.

Signed-off-by: Trond Myklebust
Tested-by: Daniel J Blueman
Signed-off-by: Linus Torvalds -
* 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jlbec/ocfs2:
ocfs2/dlm: Wait on lockres instead of erroring cancel requests
ocfs2: Add missing lock name
ocfs2: Don't oops in ocfs2_kill_sb on a failed mount
ocfs2: release the buffer head in ocfs2_do_truncate.
ocfs2: Handle quota file corruption more gracefully -
2.6.30's commit 8a0bdec194c21c8fdef840989d0d7b742bb5d4bc removed
user_shm_lock() calls in hugetlb_file_setup() but left the
user_shm_unlock() call in shm_destroy().

In detail:
Assume that can_do_hugetlb_shm() returns true and hence user_shm_lock()
is not called in hugetlb_file_setup(). However, user_shm_unlock() is
called in any case in shm_destroy(); subsequently
atomic_dec_and_lock(&up->__count) in free_uid() is executed and, if
up->__count reaches zero, cleanup_user_struct() is also scheduled.

Note that sched_destroy_user() is empty if CONFIG_USER_SCHED is not set.
However, the ref counter up->__count gets unexpectedly non-positive and
the corresponding structs are freed even though there are live
references to them, resulting in a kernel oops after lots of
shmget(SHM_HUGETLB)/shmctl(IPC_RMID) cycles with CONFIG_USER_SCHED set.

Hugh changed Stefan's suggested patch: can_do_hugetlb_shm() at the
time of shm_destroy() may give a different answer from at the time
of hugetlb_file_setup(). And fixed newseg()'s no_id error path,
which has missed user_shm_unlock() ever since it came in 2.6.9.

Reported-by: Stefan Huber
Signed-off-by: Hugh Dickins
Tested-by: Stefan Huber
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds
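The unbalanced lock/unlock described in the hugetlb shm entry above can be modelled in user space. The sketch below is a toy, not the kernel patch: the structs, the field name mlock_user and both functions are hypothetical, and the reference count only imitates the user reference that the message says is dropped via free_uid(). The idea shown is to record at setup time whether the lock was taken, instead of re-evaluating the privilege check at destroy time.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for the per-user accounting structure (hypothetical). */
struct user_acct {
    int refcount;
};

/* Toy stand-in for a shared memory segment (hypothetical). */
struct segment {
    struct user_acct *mlock_user; /* non-NULL only if setup took the lock */
};

/* Setup: privileged callers skip the lock, so nothing is recorded. */
void segment_setup(struct segment *seg, struct user_acct *user, bool privileged)
{
    seg->mlock_user = NULL;
    if (!privileged) {
        user->refcount++;          /* models the reference taken when locking */
        seg->mlock_user = user;    /* remember that an unlock is owed */
    }
}

/* Destroy: unlock only if setup actually locked. */
void segment_destroy(struct segment *seg)
{
    if (seg->mlock_user) {
        seg->mlock_user->refcount--;   /* models the matching unlock */
        seg->mlock_user = NULL;
    }
}
```

Because the decision is stored in the segment, a privilege change between setup and destroy cannot unbalance the count in this model.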
24 Aug, 2009
3 commits
-
This patch makes the error message about changing journaling mode on remount
more descriptive. Some people are going to hit this error now due to commit
bbae8bcc49bc4d002221dab52c79a50a82e7cd1f if they configure a kernel to default
to data=writeback mode. The problem happens if they have data=ordered set for
the root filesystem in /etc/fstab but not in the kernel command line (and they
don't use initrd). Their filesystem then gets mounted as data=writeback by
kernel but then their boot fails because init scripts won't be able to remount
the filesystem rw. Better error message will hopefully make it easier for them
to find the error in their setup and bother us less with error reports :).

Signed-off-by: Jan Kara
-
The old description for this configuration option was perhaps not
completely balanced in terms of describing the tradeoffs of using a
default of data=writeback vs. data=ordered. Despite the fact that old
description very strongly recommended disabling this feature, all of
the major distributions have elected to preserve the existing 'legacy'
default, which is a strong hint that it perhaps wasn't telling the
whole story.

This revised description has been vetted by a number of ext3
developers as being better at informing the user about the tradeoffs
of enabling or disabling this configuration feature.

Cc: linux-ext4@vger.kernel.org
Signed-off-by: "Theodore Ts'o"
Signed-off-by: Jan Kara -
vfs_read() offset is defined as loff_t, but kernel_read()
offset is only defined as unsigned long. Redefine
kernel_read() offset as loff_t.

Cc: stable@kernel.org
Signed-off-by: Mimi Zohar
Signed-off-by: James Morris
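Why the narrower offset type matters in the kernel_read() entry above can be seen in a short user-space sketch. It assumes a platform where unsigned long is 32 bits, modelled here with uint32_t, and uses int64_t in place of loff_t; the function names are hypothetical.

```c
#include <stdint.h>

/* Models passing a file offset through a 32-bit unsigned long parameter:
 * the upper 32 bits are silently discarded. */
uint32_t offset_via_ulong32(int64_t offset)
{
    return (uint32_t)offset;
}

/* Models passing the same offset through a 64-bit signed parameter,
 * as with loff_t: the value is preserved. */
int64_t offset_via_loff(int64_t offset)
{
    return offset;
}
```

An offset of 4 GiB plus 4096 bytes comes out as 4096 from the first function and unchanged from the second.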
22 Aug, 2009
2 commits
-
In commit a8e7d49aa7be728c4ae241a75a2a124cdcabc0c5 ("Fix race in
create_empty_buffers() vs __set_page_dirty_buffers()"), I removed a test
for a NULL page mapping unintentionally when some of the code inside
__set_page_dirty() was moved to the callers.

That removal generally didn't matter, since a filesystem would serialize
truncation (which clears the page mapping) against writing (which marks
the buffer dirty), so locking at a higher level (either per-page or an
inode at a time) should mean that the buffer page would be stable. And
indeed, nothing bad seemed to happen.

Except it turns out that apparently reiserfs does something odd when
under load and writing out the journal, and we have a number of bugzilla
entries that look similar:

http://bugzilla.kernel.org/show_bug.cgi?id=13556
http://bugzilla.kernel.org/show_bug.cgi?id=13756
http://bugzilla.kernel.org/show_bug.cgi?id=13876

and it looks like reiserfs depended on that check (the common theme
seems to be "data=journal", and a journal writeback during a truncate).

I suspect reiserfs should have some additional locking, but in the
meantime this should get us back to the pre-2.6.29 behavior.

Pattern-pointed-out-by: Roland Kletzing
Cc: stable@kernel.org (2.6.29 and 2.6.30)
Cc: Jeff Mahoney
Cc: Nick Piggin
Cc: Al Viro
Signed-off-by: Linus Torvalds -
* 'btrfs' of git://git.kernel.dk/linux-2.6-block:
btrfs: fix inode rbtree corruption
21 Aug, 2009
3 commits
-
Node may not be inserted over existing node. This causes inode tree
corruption and I was seeing crashes in inode_tree_del which I can not
reproduce after this patch.

The other way to fix this would be to tie inode lifetime in the rbtree
with inode while not in freeing state. I had a look at this but it is
not so trivial at this point. At least this patch gets things working again.

Signed-off-by: Nick Piggin
Cc: Chris Mason
Acked-by: Yan Zheng
Signed-off-by: Jens Axboe -
In case a downconvert is queued, and a flock receives a signal,
BUG_ON(lockres->l_action != OCFS2_AST_INVALID) is triggered
because a lock cancel triggers a dlmunlock while an AST is
scheduled.

To avoid this, allow a LKM_CANCEL to pass through, and let it
wait on __dlm_wait_on_lockres().

Signed-off-by: Goldwyn Rodrigues
Acked-off-by: Mark Fasheh
Signed-off-by: Joel Becker -
The NFSSync cluster lock is missing a name. This makes lockdep unhappy
because we end up passing NULL to lockdep when initializing the lock key. Fix it.

Signed-off-by: Jan Kara
Signed-off-by: Joel Becker
20 Aug, 2009
1 commit
-
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ryusuke/nilfs2:
nilfs2: fix oopses with doubly mounted snapshots
nilfs2: missing a read lock for segment writer in nilfs_attach_checkpoint()
19 Aug, 2009
3 commits
-
Commit 2ff05b2b (oom: move oom_adj value) moved the oom_adj value to
the mm_struct. It was a very good first step toward sanitizing the OOM killer.

However, Paul Menage reported that the commit causes a regression in his job
scheduler: the current OOM logic can kill a process marked OOM_DISABLE.

Why? His program contains code similar to the following.

	...
	set_oom_adj(OOM_DISABLE); /* The job scheduler is never killed by oom */
	...
	if (vfork() == 0) {
		set_oom_adj(0); /* Invoked child can be killed */
		execve("foo-bar-cmd");
	}
	....

The vfork() parent and child share the same mm_struct, so the
set_oom_adj(0) above changes oom_adj not only for the vfork() child but
also for the vfork() parent. The vfork() parent (the job scheduler) then
loses its OOM immunity and gets killed.

Actually, the fork-setting-exec idiom is very frequently used in userland programs.
We must not break this assumption.

Therefore, this patch reverts commit 2ff05b2b and the related commits.
Reverted commit list
---------------------
- commit 2ff05b2b4e (oom: move oom_adj value from task_struct to mm_struct)
- commit 4d8b9135c3 (oom: avoid unnecessary mm locking and scanning for OOM_DISABLE)
- commit 8123681022 (oom: only oom kill exiting tasks with attached memory)
- commit 933b787b57 (mm: copy over oom_adj value at fork time)

Signed-off-by: KOSAKI Motohiro
Cc: Paul Menage
Cc: David Rientjes
Cc: KAMEZAWA Hiroyuki
Cc: Rik van Riel
Cc: Linus Torvalds
Cc: Oleg Nesterov
Cc: Nick Piggin
Cc: Mel Gorman
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
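The sharing problem behind the oom_adj revert above can be reduced to a toy model. The sketch below is plain user-space C with hypothetical struct and function names; it only illustrates why a value stored in a structure shared by a vfork() parent and child behaves differently from a value stored per task. The value -17 for OOM_DISABLE is an assumption of the sketch.

```c
/* Assumed value standing in for the kernel's OOM_DISABLE. */
#define MODEL_OOM_DISABLE (-17)

/* Model 1: oom_adj lives in the (shared) memory descriptor. */
struct model_mm {
    int oom_adj;
};

struct task_shared_adj {
    struct model_mm *mm;
};

/* Model 2: oom_adj lives in each task. */
struct task_own_adj {
    struct model_mm *mm;
    int oom_adj;
};

/* vfork() in model 1: the child points at the very same mm. */
struct task_shared_adj model_vfork_shared(const struct task_shared_adj *parent)
{
    return *parent;
}

/* vfork() in model 2: mm is still shared, but oom_adj is copied. */
struct task_own_adj model_vfork_own(const struct task_own_adj *parent)
{
    return *parent;
}
```

In model 1, writing 0 through the child's mm pointer also changes what the parent sees; in model 2, writing 0 to the child's own field leaves the parent at MODEL_OOM_DISABLE.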
get_sb_pseudo sets s_maxbytes to ~0ULL which becomes negative when cast
to a signed value. Fix it to use MAX_LFS_FILESIZE which casts properly
to a positive signed value.

Signed-off-by: Jeff Layton
Reviewed-by: Johannes Weiner
Acked-by: Steve French
Reviewed-by: Christoph Hellwig
Cc: Al Viro
Cc: Robert Love
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
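The sign problem in the get_sb_pseudo entry above is easy to reproduce outside the kernel. The sketch below assumes the limit is consumed as a signed 64-bit value; MODEL_MAX_FILESIZE is a hypothetical stand-in for the 64-bit case of MAX_LFS_FILESIZE, not the kernel definition.

```c
#include <stdint.h>

/* Hypothetical stand-in for a maximum file size that fits a signed 64-bit value. */
#define MODEL_MAX_FILESIZE ((uint64_t)INT64_MAX)

/* Converts an unsigned limit to the signed type used for file sizes.
 * Converting ~0ULL is implementation-defined in C; on the usual
 * two's-complement compilers it yields -1. */
int64_t as_signed_size(uint64_t maxbytes)
{
    return (int64_t)maxbytes;
}
```

Passing ~0ULL gives a negative limit on common compilers, whereas MODEL_MAX_FILESIZE stays positive.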
This will fix kernel oopses like the following:
# mount -t nilfs2 -r -o cp=20 /dev/sdb1 /test1
# mount -t nilfs2 -r -o cp=20 /dev/sdb1 /test2
# umount /test1
# umount /test2

BUG: sleeping function called from invalid context at arch/x86/mm/fault.c:1069
in_atomic(): 0, irqs_disabled(): 1, pid: 3886, name: umount.nilfs2
1 lock held by umount.nilfs2/3886:
#0: (&type->s_umount_key#31){+.+...}, at: [] deactivate_super+0x52/0x6c
irq event stamp: 1219
hardirqs last enabled at (1219): [] __mutex_unlock_slowpath+0xf8/0x119
hardirqs last disabled at (1218): [] __mutex_unlock_slowpath+0x59/0x119
softirqs last enabled at (1214): [] __do_softirq+0x1a5/0x1ad
softirqs last disabled at (1205): [] do_softirq+0x36/0x5a
Pid: 3886, comm: umount.nilfs2 Not tainted 2.6.31-rc6 #55
Call Trace:
[] __might_sleep+0x107/0x10e
[] do_page_fault+0x246/0x397
[] ? do_page_fault+0x0/0x397
[] error_code+0x6b/0x70
[] ? do_page_fault+0x0/0x397
[] ? __lock_acquire+0x91/0x12fd
[] ? __lock_acquire+0x12ee/0x12fd
[] ? __lock_acquire+0x12ee/0x12fd
[] lock_acquire+0xba/0xdd
[] ? nilfs_detach_segment_constructor+0x2f/0x2fa [nilfs2]
[] down_write+0x2a/0x46
[] ? nilfs_detach_segment_constructor+0x2f/0x2fa [nilfs2]
[] nilfs_detach_segment_constructor+0x2f/0x2fa [nilfs2]
[] ? mark_held_locks+0x43/0x5b
[] ? trace_hardirqs_on_caller+0x10b/0x133
[] ? trace_hardirqs_on+0xb/0xd
[] nilfs_put_super+0x2f/0xca [nilfs2]
[] generic_shutdown_super+0x49/0xb8
[] kill_block_super+0x1d/0x31
[] ? vfs_quota_off+0x0/0x12
[] deactivate_super+0x57/0x6c
[] mntput_no_expire+0x8c/0xb4
[] sys_umount+0x27f/0x2a4
[] sys_oldumount+0xd/0xf
[] sysenter_do_call+0x12/0x38
...

This turns out to be a bug introduced by an -rc1 patch ("nilfs2: simplify
remaining sget() use").

In the patch, a new "put resource" function, nilfs_put_sbinfo(),
was introduced to delay freeing the nilfs_sb_info struct.

But nilfs_put_sbinfo() mistakenly used the atomic_dec_and_test()
function to check the reference count, which caused the nilfs_sb_info
to be freed when the user mounted a snapshot twice.

This bug also suggests there was an unseen memory leak in the usual
mount/umount operations for nilfs.

Signed-off-by: Ryusuke Konishi
18 Aug, 2009
17 commits
-
'ns_cno' of structure 'the_nilfs' must be protected from segment
writer, in other words, the caller of nilfs_get_checkpoint should hold
read lock for nilfs->ns_segctor_sem. This patch adds the lock/unlock
operations in nilfs_attach_checkpoint() when calling
nilfs_cpfile_get_checkpoint().

Signed-off-by: Zhang Qiang
Signed-off-by: Ryusuke Konishi -
The mount options string is saved in sb->s_options. This patch removes
the redundant duplicating of the mount options. Also, since we are not
displaying anything special in show options, we replace v9fs_show_options
with generic_show_options for now.

Signed-off-by: Abhishek Kulkarni
Signed-off-by: Eric Van Hensbergen -
Cast the error return value (ENOMEM) in v9fs_get_inode() to its
correct type using ERR_PTR.

Signed-off-by: Abhishek Kulkarni
Signed-off-by: Eric Van Hensbergen -
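The error-pointer convention mentioned in the v9fs_get_inode() entry above can be sketched in user space. The helpers below are hypothetical models of the kernel's ERR_PTR, PTR_ERR and IS_ERR, written for illustration only, and toy_get_inode() is not the 9p code.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Assumed upper bound on errno values encoded in a pointer. */
#define MODEL_MAX_ERRNO 4095

/* Encode a negative errno value as a pointer. */
void *model_err_ptr(long error)
{
    return (void *)(intptr_t)error;
}

/* Decode the errno value from an error pointer. */
long model_ptr_err(const void *ptr)
{
    return (long)(intptr_t)ptr;
}

/* True if ptr falls in the top MODEL_MAX_ERRNO addresses, i.e. encodes an error. */
int model_is_err(const void *ptr)
{
    return (uintptr_t)ptr >= (uintptr_t)(-MODEL_MAX_ERRNO);
}

struct toy_inode {
    int mode;
};

/* Returns a new inode, or an encoded -ENOMEM rather than a bare integer. */
struct toy_inode *toy_get_inode(int simulate_oom)
{
    struct toy_inode *inode;

    if (simulate_oom)
        return model_err_ptr(-ENOMEM);
    inode = calloc(1, sizeof(*inode));
    if (!inode)
        return model_err_ptr(-ENOMEM);
    return inode;
}
```

A caller checks the result with model_is_err() before dereferencing it and recovers the error code with model_ptr_err().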
If we fail to mount the filesystem, we have to be careful not to dereference
uninitialized structures in ocfs2_kill_sb.

Signed-off-by: Jan Kara
Signed-off-by: Joel Becker -
Remove a redundant update of inode's i_uid and i_gid
after v9fs_get_inode() since the latter already sets up
a new inode and sets the proper uid and gid values.

Signed-off-by: Abhishek Kulkarni
Signed-off-by: Eric Van Hensbergen -
->get_sb can fail, causing some badness. This patch fixes the following:
* Clear sb->s_fs_info in kill_sb.
* deactivate_locked_super() calls kill_sb (v9fs_kill_super), which
destroys the client, clunks all its fids and closes the v9fs session.
Attempting to do it twice will cause an oops.

Signed-off-by: Abhishek Kulkarni
Signed-off-by: Eric Van Hensbergen -
Add the delimiter ',' before the options when they are passed
and check if no option parameters are passed to prevent displaying
NULL in /proc/mounts.

Signed-off-by: Abhishek Kulkarni
Signed-off-by: Eric Van Hensbergen -
Add missing p9stat_free in v9fs_inode_from_fid to avoid
any possible leaks.

Signed-off-by: Abhishek Kulkarni
Signed-off-by: Eric Van Hensbergen -
Fix the comments -- mostly the improper and/or missing descriptions
of function parameters.

Signed-off-by: Abhishek Kulkarni
Signed-off-by: Eric Van Hensbergen -
Add a missing iput when cleaning up if v9fs_get_inode
fails after returning a valid inode.

Signed-off-by: Abhishek Kulkarni
Signed-off-by: Eric Van Hensbergen -
Check if v9fs_fid_add was successful or not based on its
return value.

Signed-off-by: Abhishek Kulkarni
Signed-off-by: Eric Van Hensbergen -
* 'for-linus' of git://oss.sgi.com/xfs/xfs:
xfs: fix locking in xfs_iget_cache_hit -
The inotify_add_watch man page specifies that inotify_add_watch() will
return a non-negative integer. However, historically the inotify
watches started at 1, not at 0.

Turns out that the inotifywait program provided by the inotify-tools
package doesn't properly handle a 0 watch descriptor. In 7e790dd5 we
changed from starting at 1 to starting at 0. This patch starts at 1,
just like in previous kernels, but also just like in previous kernels
it's possible for it to wrap back to 0. This preserves the kernel
functionality exactly like it was before the patch (neither method broke
the spec).

Signed-off-by: Eric Paris
Signed-off-by: Linus Torvalds -
In f44aebcc the tail drop logic of events with no file backing
(q_overflow and in_ignored) was reversed so IN_IGNORED events would
never be tail dropped. This now means that Q_OVERFLOW events are NOT
tail dropped. The fix is to not tail drop IN_IGNORED, but to tail drop
Q_OVERFLOW.

Signed-off-by: Eric Paris
Signed-off-by: Linus Torvalds -
inotify decides if private data it passed to get added to an event was
used by checking list_empty(). But it's possible that the event may
have been dequeued and the private event removed so it would look empty.

The fix is to use the return code from fsnotify_add_notify_event rather
than looking at the list.

Signed-off-by: Eric Paris
Signed-off-by: Linus Torvalds -
In ocfs2_do_truncate, we forget to release last_eb_bh, which
will cause a memory leak. So call brelse at the end.

Signed-off-by: Tao Ma
Signed-off-by: Joel Becker -
ocfs2_read_virt_blocks() does BUG when we try to read a block from a file
beyond its end. Since this can happen due to filesystem corruption, it
is not really an appropriate answer. Make ocfs2_read_quota_block() check
the condition and handle it by calling ocfs2_error() and returning EIO.

[ Modified to print ip_blkno in the error - Joel ]
Reported-by: Tristan Ye
Signed-off-by: Jan Kara
Signed-off-by: Joel Becker