Eric Lee / smarc-fsl-linux-kernel

29 Nov, 2012

6 commits

71613c3b8 get rid of pt_regs argument of ->load_binary() ... Browse Code »

Signed-off-by: Al Viro

Al Viro
2012-11-29 10:53:38 +0800
3c456bfc4 get rid of pt_regs argument of search_binary_handler() ... Browse Code »

Signed-off-by: Al Viro

Al Viro
2012-11-29 10:53:38 +0800
835ab32df get rid of pt_regs argument of do_execve_common() ... Browse Code »

Signed-off-by: Al Viro

Al Viro
2012-11-29 10:53:37 +0800
da3d4c5fa get rid of pt_regs argument of do_execve() ... Browse Code »

Signed-off-by: Al Viro

Al Viro
2012-11-29 10:53:37 +0800
d03d26e58 make compat_do_execve() static, lose pt_regs argument ... Browse Code »

Signed-off-by: Al Viro

Al Viro
2012-11-29 10:53:37 +0800
c4144670f kill daemonize() ... Browse Code »

Signed-off-by: Al Viro

Al Viro
2012-11-29 10:49:02 +0800

24 Nov, 2012

1 commit

35f95d228 Merge tag 'for-linus-20121123' of git://git.infradead.org/mtd-2.6 ... Browse Code »

Pull MTD fixes from David Woodhouse:
"The most important part of this is that it fixes a regression in
Samsung NAND chip detection, introduced by some rework which went into
3.7. The initial fix wasn't quite complete, so it's in two parts. In
fact the first part is committed twice (Artem committed his own copy
of the same patch) and I've merged Artem's tree into mine which
already had that fix.

I'd have recommitted that to make it somewhat cleaner, but figured by
this point in the release cycle it was better to merge *exactly* the
commits which have been in linux-next.

If I'd recommitted, I'd also omit the sparse warning fix. But it's
there, and it's harmless — just marking one function as 'static' in
onenand code.

This also includes a couple more fixes for stable: an AB-BA deadlock
in JFFS2, and an invalid range check in slram."

* tag 'for-linus-20121123' of git://git.infradead.org/mtd-2.6:
mtd: nand: fix Samsung SLC detection regression
mtd: nand: fix Samsung SLC NAND identification regression
jffs2: Fix lock acquisition order bug in jffs2_write_begin
mtd: onenand: Make flexonenand_set_boundary static
mtd: slram: invalid checking of absolute end address
mtd: ofpart: Fix incorrect NULL check in parse_ofoldpart_partitions()
mtd: nand: fix Samsung SLC NAND identification regression

Linus Torvalds
2012-11-24 09:12:17 +0800

21 Nov, 2012

1 commit

ca6215dfc Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs ... Browse Code »

Pull reiserfs and ext3 fixes from Jan Kara:
"Fixes of reiserfs deadlocks when quotas are enabled (locking there was
completely busted by BKL conversion) and also one small ext3 fix in
the trim interface."

* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
ext3: Avoid underflow of in ext3_trim_fs()
reiserfs: Move quota calls out of write lock
reiserfs: Protect reiserfs_quota_write() with write lock
reiserfs: Protect reiserfs_quota_on() with write lock
reiserfs: Fix lock ordering during remount

Linus Torvalds
2012-11-21 12:48:25 +0800

20 Nov, 2012

5 commits

ae49eeec7 ext3: Avoid underflow of in ext3_trim_fs() ... Browse Code »

Currently if len argument in ext3_trim_fs() is smaller than one block,
the 'end' variable underflow. Avoid that by returning EINVAL if len is
smaller than file system block.

Also remove useless unlikely().

Signed-off-by: Lukas Czerner
Signed-off-by: Jan Kara

Lukas Czerner
2012-11-20 04:36:12 +0800
7af116869 reiserfs: Move quota calls out of write lock ... Browse Code »

Calls into highlevel quota code cannot happen under the write lock. These
calls take dqio_mutex which ranks above write lock. So drop write lock
before calling back into quota code.

CC: stable@vger.kernel.org # >= 3.0
Signed-off-by: Jan Kara

Jan Kara
2012-11-20 04:34:33 +0800
361d94a33 reiserfs: Protect reiserfs_quota_write() with write lock ... Browse Code »

Calls into reiserfs journalling code and reiserfs_get_block() need to
be protected with write lock. We remove write lock around calls to high
level quota code in the next patch so these paths would suddently become
unprotected.

CC: stable@vger.kernel.org # >= 3.0
Signed-off-by: Jan Kara

Jan Kara
2012-11-20 04:34:33 +0800
b9e06ef2e reiserfs: Protect reiserfs_quota_on() with write lock ... Browse Code »

In reiserfs_quota_on() we do quite some work - for example unpacking
tail of a quota file. Thus we have to hold write lock until a moment
we call back into the quota code.

CC: stable@vger.kernel.org # >= 3.0
Signed-off-by: Jan Kara

Jan Kara
2012-11-20 04:34:32 +0800
3bb3e1fc4 reiserfs: Fix lock ordering during remount ... Browse Code »

When remounting reiserfs dquot_suspend() or dquot_resume() can be called.
These functions take dqonoff_mutex which ranks above write lock so we have
to drop it before calling into quota code.

CC: stable@vger.kernel.org # >= 3.0
Signed-off-by: Jan Kara

Jan Kara
2012-11-20 04:34:32 +0800

19 Nov, 2012

3 commits

3587b1b09 fanotify: fix FAN_Q_OVERFLOW case of fanotify_read() ... Browse Code »

If the FAN_Q_OVERFLOW bit set in event->mask, the fanotify event
metadata will not contain a valid file descriptor, but
copy_event_to_user() didn't check for that, and unconditionally does a
fd_install() on the file descriptor.

Which in turn will cause a BUG_ON() in __fd_install().

Introduced by commit 352e3b249284 ("fanotify: sanitize failure exits in
copy_event_to_user()")

Mea culpa - missed that path ;-/

Reported-by: Alex Shi
Signed-off-by: Al Viro
Signed-off-by: Linus Torvalds

Al Viro
2012-11-19 03:30:00 +0800
8d938105e Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs ... Browse Code »

Pull misc VFS fixes from Al Viro:
"Remove a bogus BUG_ON() that can trigger spuriously + alpha bits of
do_mount() constification I'd missed during the merge window."

This pull request came in a week ago, I missed it for some reason.

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
kill bogus BUG_ON() in do_close_on_exec()
missing const in alpha callers of do_mount()

Linus Torvalds
2012-11-19 03:13:48 +0800
d28d3730f Merge tag 'for-linus-v3.7-rc7' of git://oss.sgi.com/xfs/xfs ... Browse Code »

Pull xfs bugfixes from Ben Myers:

- fix attr tree double split corruption

- fix broken error handling in xfs_vm_writepage

- drop buffer io reference when a bad bio is built

* tag 'for-linus-v3.7-rc7' of git://oss.sgi.com/xfs/xfs:
xfs: drop buffer io reference when a bad bio is built
xfs: fix broken error handling in xfs_vm_writepage
xfs: fix attr tree double split corruption

Linus Torvalds
2012-11-19 02:29:34 +0800

17 Nov, 2012

4 commits

d69043c42 xfs: drop buffer io reference when a bad bio is built ... Browse Code »

Error handling in xfs_buf_ioapply_map() does not handle IO reference
counts correctly. We increment the b_io_remaining count before
building the bio, but then fail to decrement it in the failure case.
This leads to the buffer never running IO completion and releasing
the reference that the IO holds, so at unmount we can leak the
buffer. This leak is captured by this assert failure during unmount:

XFS: Assertion failed: atomic_read(&pag->pag_ref) == 0, file: fs/xfs/xfs_mount.c, line: 273

This is not a new bug - the b_io_remaining accounting has had this
problem for a long, long time - it's just very hard to get a
zero length bio being built by this code...

Further, the buffer IO error can be overwritten on a multi-segment
buffer by subsequent bio completions for partial sections of the
buffer. Hence we should only set the buffer error status if the
buffer is not already carrying an error status. This ensures that a
partial IO error on a multi-segment buffer will not be lost. This
part of the problem is a regression, however.

cc:
Signed-off-by: Dave Chinner
Reviewed-by: Mark Tinguely
Signed-off-by: Ben Myers

Dave Chinner
2012-11-17 23:36:57 +0800
3daed8bc3 xfs: fix broken error handling in xfs_vm_writepage ... Browse Code »

When we shut down the filesystem, it might first be detected in
writeback when we are allocating a inode size transaction. This
happens after we have moved all the pages into the writeback state
and unlocked them. Unfortunately, if we fail to set up the
transaction we then abort writeback and try to invalidate the
current page. This then triggers are BUG() in block_invalidatepage()
because we are trying to invalidate an unlocked page.

Fixing this is a bit of a chicken and egg problem - we can't
allocate the transaction until we've clustered all the pages into
the IO and we know the size of it (i.e. whether the last block of
the IO is beyond the current EOF or not). However, we don't want to
hold pages locked for long periods of time, especially while we lock
other pages to cluster them into the write.

To fix this, we need to make a clear delineation in writeback where
errors can only be handled by IO completion processing. That is,
once we have marked a page for writeback and unlocked it, we have to
report errors via IO completion because we've already started the
IO. We may not have submitted any IO, but we've changed the page
state to indicate that it is under IO so we must now use the IO
completion path to report errors.

To do this, add an error field to xfs_submit_ioend() to pass it the
error that occurred during the building on the ioend chain. When
this is non-zero, mark each ioend with the error and call
xfs_finish_ioend() directly rather than building bios. This will
immediately push the ioends through completion processing with the
error that has occurred.

Signed-off-by: Dave Chinner
Reviewed-by: Mark Tinguely
Signed-off-by: Ben Myers

Dave Chinner
2012-11-17 23:35:42 +0800
42e2976f1 xfs: fix attr tree double split corruption ... Browse Code »

In certain circumstances, a double split of an attribute tree is
needed to insert or replace an attribute. In rare situations, this
can go wrong, leaving the attribute tree corrupted. In this case,
the attr being replaced is the last attr in a leaf node, and the
replacement is larger so doesn't fit in the same leaf node.
When we have the initial condition of a node format attribute
btree with two leaves at index 1 and 2. Call them L1 and L2. The
leaf L1 is completely full, there is not a single byte of free space
in it. L2 is mostly empty. The attribute being replaced - call it X
- is the last attribute in L1.

The way an attribute replace is executed is that the replacement
attribute - call it Y - is first inserted into the tree, but has an
INCOMPLETE flag set on it so that list traversals ignore it. Once
this transaction is committed, a second transaction it run to
atomically mark Y as COMPLETE and X as INCOMPLETE, so that a
traversal will now find Y and skip X. Once that transaction is
committed, attribute X is then removed.

So, the initial condition is:

+--------+ +--------+
| L1 | | L2 |
| fwd: 2 |---->| fwd: 0 |
| bwd: 0 || fwd: 3 |---->| fwd: 2 |---->| fwd: 0 |
| bwd: 0 || fwd: 3 |---->| fwd: 2 |---->| fwd: 0 |
| bwd: 0 |
Reviewed-by: Mark Tinguely
Signed-off-by: Ben Myers

Dave Chinner
2012-11-17 23:34:13 +0800
fa0cbbf14 mm, oom: reintroduce /proc/pid/oom_adj ... Browse Code »

This is mostly a revert of 01dc52ebdf47 ("oom: remove deprecated oom_adj")
from Davidlohr Bueso.

It reintroduces /proc/pid/oom_adj for backwards compatibility with earlier
kernels. It simply scales the value linearly when /proc/pid/oom_score_adj
is written.

The major difference is that its scheduled removal is no longer included
in Documentation/feature-removal-schedule.txt. We do warn users with a
single printk, though, to suggest the more powerful and supported
/proc/pid/oom_score_adj interface.

Reported-by: Artem S. Tashkinov
Signed-off-by: David Rientjes
Signed-off-by: Linus Torvalds

David Rientjes
2012-11-17 02:15:35 +0800

16 Nov, 2012

1 commit

ce95a36bb Merge tag 'upstream-3.7-rc6' of git://git.infradead.org/linux-ubifs ... Browse Code »

Pull UBIFS fixes from Artem Bityutskiy:
"Two patches which fix a problem reported by several people in the
past, but only fixed now because no one gave enough material for
debugging.

Anyway, these fix the problem that sometimes after a power cut the
file-system is not mountable with the following symptom:

grab_empty_leb: could not find an empty LEB

The fixes make the file-system mountable again."

* tag 'upstream-3.7-rc6' of git://git.infradead.org/linux-ubifs:
UBIFS: fix mounting problems after power cuts
UBIFS: introduce categorized lprops counter

Linus Torvalds
2012-11-16 03:28:43 +0800

15 Nov, 2012

1 commit

70a6f46d7 pstore: Fix NULL pointer dereference in console writes ... Browse Code »

Passing a NULL id causes a NULL pointer deference in writers such as
erst_writer and efi_pstore_write because they expect to update this id.
Pass a dummy id instead.

This avoids a cascade of oopses caused when the initial
pstore_console_write passes a null which in turn causes writes to the
console causing further oopses in subsequent pstore_console_write calls.

Signed-off-by: Colin Ian King
Acked-by: Kees Cook
Cc: stable@vger.kernel.org
Signed-off-by: Anton Vorontsov

Colin Ian King
2012-11-15 10:30:21 +0800

12 Nov, 2012

1 commit

5a8477660 kill bogus BUG_ON() in do_close_on_exec() ... Browse Code »

It can be legitimately triggered via procfs access. Now, at least
2 of 3 of get_files_struct() callers in procfs are useless, but
when and if we get rid of those we can always add WARN_ON() here.
BUG_ON() at that spot is simply wrong.

Signed-off-by: Al Viro

Al Viro
2012-11-12 14:19:02 +0800

10 Nov, 2012

1 commit

affd9a8db Merge branch 'for-linus' of git://git.samba.org/sfrench/cifs-2.6 ... Browse Code »

Pull cifs fixes from Jeff Layton.

* 'for-linus' of git://git.samba.org/sfrench/cifs-2.6:
cifs: Do not lookup hashed negative dentry in cifs_atomic_open
cifs: fix potential buffer overrun in cifs.idmap handling code

Linus Torvalds
2012-11-10 13:59:35 +0800

09 Nov, 2012

15 commits

5ffd3412a jffs2: Fix lock acquisition order bug in jffs2_write_begin ... Browse Code »

jffs2_write_begin() first acquires the page lock, then f->sem. This
causes an AB-BA deadlock with jffs2_garbage_collect_live(), which first
acquires f->sem, then the page lock:

jffs2_garbage_collect_live
mutex_lock(&f->sem) (A)
jffs2_garbage_collect_dnode
jffs2_gc_fetch_page
read_cache_page_async
do_read_cache_page
lock_page(page) (B)

jffs2_write_begin
grab_cache_page_write_begin
find_lock_page
lock_page(page) (B)
mutex_lock(&f->sem) (A)

We fix this by restructuring jffs2_write_begin() to take f->sem before
the page lock. However, we make sure that f->sem is not held when
calling jffs2_reserve_space(), as this is not permitted by the locking
rules.

The deadlock above was observed multiple times on an SoC with a dual
ARMv7 (Cortex-A9), running the long-term 3.4.11 kernel; it occurred
when using scp to copy files from a host system to the ARM target
system. The fix was heavily tested on the same target system.

Cc: stable@vger.kernel.org
Signed-off-by: Thomas Betker
Acked-by: Joakim Tjernlund
Signed-off-by: Artem Bityutskiy

Thomas Betker
2012-11-09 23:02:50 +0800
ce6d841e9 Merge branch 'akpm' (Fixes from Andrew) ... Browse Code »

Merge misc fixes from Andrew Morton:
"Five fixes"

* emailed patches from Andrew Morton : (5 patches)
h8300: add missing L1_CACHE_SHIFT
mm: bugfix: set current->reclaim_state to NULL while returning from kswapd()
fanotify: fix missing break
revert "epoll: support for disabling items, and a self-test app"
checkpatch: improve network block comment style checking

Linus Torvalds
2012-11-09 13:53:02 +0800
a601e6371 Merge tag 'for-linus-v3.7-rc5' of git://oss.sgi.com/xfs/xfs ... Browse Code »

Pull xfs bugfixes from Ben Myers:

- fix for large transactions spanning multiple iclog buffers

- zero the allocation_args structure on the stack before using it to
determine whether to use a worker for allocation
- move allocation stack switch to xfs_bmapi_allocate in order to
prevent deadlock on AGF buffers

- growfs no longer reads in garbage for new secondary superblocks

- silence a build warning

- ensure that invalid buffers never get written to disk while on free
list

- don't vmap inode cluster buffers during free

- fix buffer shutdown reference count mismatch

- fix reading of wrapped log data

* tag 'for-linus-v3.7-rc5' of git://oss.sgi.com/xfs/xfs:
xfs: fix reading of wrapped log data
xfs: fix buffer shudown reference count mismatch
xfs: don't vmap inode cluster buffers during free
xfs: invalidate allocbt blocks moved to the free list
xfs: silence uninitialised f.file warning.
xfs: growfs: don't read garbage for new secondary superblocks
xfs: move allocation stack switch up to xfs_bmapi_allocate
xfs: introduce XFS_BMAPI_STACK_SWITCH
xfs: zero allocation_args on the kernel stack
xfs: only update the last_sync_lsn when a transaction completes

Linus Torvalds
2012-11-09 13:42:51 +0800
848561d36 fanotify: fix missing break ... Browse Code »

Anders Blomdell noted in 2010 that Fanotify lost events and provided a
test case. Eric Paris confirmed it was a bug and posted a fix to the
list

https://groups.google.com/forum/?fromgroups=#!topic/linux.kernel/RrJfTfyW2BE

but never applied it. Repeated attempts over time to actually get him
to apply it have never had a reply from anyone who has raised it

So apply it anyway

Signed-off-by: Alan Cox
Reported-by: Anders Blomdell
Cc: Eric Paris
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Eric Paris
2012-11-09 13:41:47 +0800
a80a6b85b revert "epoll: support for disabling items, and a self-test app" ... Browse Code »

Revert commit 03a7beb55b9f ("epoll: support for disabling items, and a
self-test app") pending resolution of the issues identified by Michael
Kerrisk, copied below.

We'll revisit this for 3.8.

: I've taken a look at this patch as it currently stands in 3.7-rc1, and
: done a bit of testing. (By the way, the test program
: tools/testing/selftests/epoll/test_epoll.c does not compile...)
:
: There are one or two places where the behavior seems a little strange,
: so I have a question or two at the end of this mail. But other than
: that, I want to check my understanding so that the interface can be
: correctly documented.
:
: Just to go though my understanding, the problem is the following
: scenario in a multithreaded application:
:
: 1. Multiple threads are performing epoll_wait() operations,
: and maintaining a user-space cache that contains information
: corresponding to each file descriptor being monitored by
: epoll_wait().
:
: 2. At some point, a thread wants to delete (EPOLL_CTL_DEL)
: a file descriptor from the epoll interest list, and
: delete the corresponding record from the user-space cache.
:
: 3. The problem with (2) is that some other thread may have
: previously done an epoll_wait() that retrieved information
: about the fd in question, and may be in the middle of using
: information in the cache that relates to that fd. Thus,
: there is a potential race.
:
: 4. The race can't solved purely in user space, because doing
: so would require applying a mutex across the epoll_wait()
: call, which would of course blow thread concurrency.
:
: Right?
:
: Your solution is the EPOLL_CTL_DISABLE operation. I want to
: confirm my understanding about how to use this flag, since
: the description that has accompanied the patches so far
: has been a bit sparse
:
: 0. In the scenario you're concerned about, deleting a file
: descriptor means (safely) doing the following:
: (a) Deleting the file descriptor from the epoll interest list
: using EPOLL_CTL_DEL
: (b) Deleting the corresponding record in the user-space cache
:
: 1. It's only meaningful to use this EPOLL_CTL_DISABLE in
: conjunction with EPOLLONESHOT.
:
: 2. Using EPOLL_CTL_DISABLE without using EPOLLONESHOT in
: conjunction is a logical error.
:
: 3. The correct way to code multithreaded applications using
: EPOLL_CTL_DISABLE and EPOLLONESHOT is as follows:
:
: a. All EPOLL_CTL_ADD and EPOLL_CTL_MOD operations should
: should EPOLLONESHOT.
:
: b. When a thread wants to delete a file descriptor, it
: should do the following:
:
: [1] Call epoll_ctl(EPOLL_CTL_DISABLE)
: [2] If the return status from epoll_ctl(EPOLL_CTL_DISABLE)
: was zero, then the file descriptor can be safely
: deleted by the thread that made this call.
: [3] If the epoll_ctl(EPOLL_CTL_DISABLE) fails with EBUSY,
: then the descriptor is in use. In this case, the calling
: thread should set a flag in the user-space cache to
: indicate that the thread that is using the descriptor
: should perform the deletion operation.
:
: Is all of the above correct?
:
: The implementation depends on checking on whether
: (events & ~EP_PRIVATE_BITS) == 0
: This replies on the fact that EPOLL_CTL_AD and EPOLL_CTL_MOD always
: set EPOLLHUP and EPOLLERR in the 'events' mask, and EPOLLONESHOT
: causes those flags (as well as all others in ~EP_PRIVATE_BITS) to be
: cleared.
:
: A corollary to the previous paragraph is that using EPOLL_CTL_DISABLE
: is only useful in conjunction with EPOLLONESHOT. However, as things
: stand, one can use EPOLL_CTL_DISABLE on a file descriptor that does
: not have EPOLLONESHOT set in 'events' This results in the following
: (slightly surprising) behavior:
:
: (a) The first call to epoll_ctl(EPOLL_CTL_DISABLE) returns 0
: (the indicator that the file descriptor can be safely deleted).
: (b) The next call to epoll_ctl(EPOLL_CTL_DISABLE) fails with EBUSY.
:
: This doesn't seem particularly useful, and in fact is probably an
: indication that the user made a logic error: they should only be using
: epoll_ctl(EPOLL_CTL_DISABLE) on a file descriptor for which
: EPOLLONESHOT was set in 'events'. If that is correct, then would it
: not make sense to return an error to user space for this case?

Cc: Michael Kerrisk
Cc: "Paton J. Lewis"
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Andrew Morton
2012-11-09 13:41:46 +0800
6ce377afd xfs: fix reading of wrapped log data ... Browse Code »

Commit 4439647 ("xfs: reset buffer pointers before freeing them") in
3.0-rc1 introduced a regression when recovering log buffers that
wrapped around the end of log. The second part of the log buffer at
the start of the physical log was being read into the header buffer
rather than the data buffer, and hence recovery was seeing garbage
in the data buffer when it got to the region of the log buffer that
was incorrectly read.

Cc: # 3.0.x, 3.2.x, 3.4.x 3.6.x
Reported-by: Torsten Kaiser
Signed-off-by: Dave Chinner
Reviewed-by: Christoph Hellwig
Reviewed-by: Mark Tinguely
Signed-off-by: Ben Myers

Dave Chinner
2012-11-09 01:10:51 +0800
03b1293ed xfs: fix buffer shudown reference count mismatch ... Browse Code »

When we shut down the filesystem, we have to unpin and free all the
buffers currently active in the CIL. To do this we unpin and remove
them in one operation as a result of a failed iclogbuf write. For
buffers, we do this removal via a simultated IO completion of after
marking the buffer stale.

At the time we do this, we have two references to the buffer - the
active LRU reference and the buf log item. The LRU reference is
removed by marking the buffer stale, and the active CIL reference is
by the xfs_buf_iodone() callback that is run by
xfs_buf_do_callbacks() during ioend processing (via the bp->b_iodone
callback).

However, ioend processing requires one more reference - that of the
IO that it is completing. We don't have this reference, so we free
the buffer prematurely and use it after it is freed. For buffers
marked with XBF_ASYNC, this leads to assert failures in
xfs_buf_rele() on debug kernels because the b_hold count is zero.

Fix this by making sure we take the necessary IO reference before
starting IO completion processing on the stale buffer, and set the
XBF_ASYNC flag to ensure that IO completion processing removes all
the active references from the buffer to ensure it is fully torn
down.

Cc:
Signed-off-by: Dave Chinner
Reviewed-by: Mark Tinguely
Signed-off-by: Ben Myers

Dave Chinner
2012-11-09 01:10:35 +0800
4b62acfe9 xfs: don't vmap inode cluster buffers during free ... Browse Code »

Inode buffers do not need to be mapped as inodes are read or written
directly from/to the pages underlying the buffer. This fixes a
regression introduced by commit 611c994 ("xfs: make XBF_MAPPED the
default behaviour").

Signed-off-by: Dave Chinner
Reviewed-by: Christoph Hellwig
Reviewed-by: Mark Tinguely
Signed-off-by: Ben Myers

Dave Chinner
2012-11-09 01:10:18 +0800
ca250b1b3 xfs: invalidate allocbt blocks moved to the free list ... Browse Code »

When we free a block from the alloc btree tree, we move it to the
freelist held in the AGFL and mark it busy in the busy extent tree.
This typically happens when we merge btree blocks.

Once the transaction is committed and checkpointed, the block can
remain on the free list for an indefinite amount of time. Now, this
isn't the end of the world at this point - if the free list is
shortened, the buffer is invalidated in the transaction that moves
it back to free space. If the buffer is allocated as metadata from
the free list, then all the modifications getted logged, and we have
no issues, either. And if it gets allocated as userdata direct from
the freelist, it gets invalidated and so will never get written.

However, during the time it sits on the free list, pressure on the
log can cause the AIL to be pushed and the buffer that covers the
block gets pushed for write. IOWs, we end up writing a freed
metadata block to disk. Again, this isn't the end of the world
because we know from the above we are only writing to free space.

The problem, however, is for validation callbacks. If the block was
on old btree root block, then the level of the block is going to be
higher than the current tree root, and so will fail validation.
There may be other inconsistencies in the block as well, and
currently we don't care because the block is in free space. Shutting
down the filesystem because a freed block doesn't pass write
validation, OTOH, is rather unfriendly.

So, make sure we always invalidate buffers as they move from the
free space trees to the free list so that we guarantee they never
get written to disk while on the free list.

Signed-off-by: Dave Chinner
Reviewed-by: Christoph Hellwig
Reviewed-by: Phil White
Reviewed-by: Mark Tinguely
Signed-off-by: Ben Myers

Dave Chinner
2012-11-09 01:09:44 +0800
1e7acbb7b xfs: silence uninitialised f.file warning. ... Browse Code »

Uninitialised variable build warning introduced by 2903ff0 ("switch
simple cases of fget_light to fdget"), gcc is not smart enough to
work out that the variable is not used uninitialised, and the commit
removed the initialisation at declaration that the old variable had.

Signed-off-by: Dave Chinner
Reviewed-by: Christoph Hellwig
Reviewed-by: Mark Tinguely
Signed-off-by: Ben Myers

Dave Chinner
2012-11-09 01:09:17 +0800
eaef85433 xfs: growfs: don't read garbage for new secondary superblocks ... Browse Code »

When updating new secondary superblocks in a growfs operation, the
superblock buffer is read from the newly grown region of the
underlying device. This is not guaranteed to be zero, so violates
the underlying assumption that the unused parts of superblocks are
zero filled. Get a new buffer for these secondary superblocks to
ensure that the unused regions are zero filled correctly.

Signed-off-by: Dave Chinner
Reviewed-by: Carlos Maiolino
Signed-off-by: Ben Myers

Dave Chinner
2012-11-09 01:08:57 +0800
1f3c785c3 xfs: move allocation stack switch up to xfs_bmapi_allocate ... Browse Code »

Switching stacks are xfs_alloc_vextent can cause deadlocks when we
run out of worker threads on the allocation workqueue. This can
occur because xfs_bmap_btalloc can make multiple calls to
xfs_alloc_vextent() and even if xfs_alloc_vextent() fails it can
return with the AGF locked in the current allocation transaction.

If we then need to make another allocation, and all the allocation
worker contexts are exhausted because the are blocked waiting for
the AGF lock, holder of the AGF cannot get it's xfs-alloc_vextent
work completed to release the AGF. Hence allocation effectively
deadlocks.

To avoid this, move the stack switch one layer up to
xfs_bmapi_allocate() so that all of the allocation attempts in a
single switched stack transaction occur in a single worker context.
This avoids the problem of an allocation being blocked waiting for
a worker thread whilst holding the AGF.

Signed-off-by: Dave Chinner
Reviewed-by: Mark Tinguely
Signed-off-by: Ben Myers

Dave Chinner
2012-11-09 01:08:46 +0800
326c03555 xfs: introduce XFS_BMAPI_STACK_SWITCH ... Browse Code »

Certain allocation paths through xfs_bmapi_write() are in situations
where we have limited stack available. These are almost always in
the buffered IO writeback path when convertion delayed allocation
extents to real extents.

The current stack switch occurs for userdata allocations, which
means we also do stack switches for preallocation, direct IO and
unwritten extent conversion, even those these call chains have never
been implicated in a stack overrun.

Hence, let's target just the single stack overun offended for stack
switches. To do that, introduce a XFS_BMAPI_STACK_SWITCH flag that
the caller can pass xfs_bmapi_write() to indicate it should switch
stacks if it needs to do allocation.

Signed-off-by: Dave Chinner
Reviewed-by: Mark Tinguely
Signed-off-by: Ben Myers

Dave Chinner
2012-11-09 01:08:27 +0800
408cc4e97 xfs: zero allocation_args on the kernel stack ... Browse Code »

Zero the kernel stack space that makes up the xfs_alloc_arg structures.

Signed-off-by: Mark Tinguely
Reviewed-by: Ben Myers
Signed-off-by: Ben Myers

Mark Tinguely
2012-11-09 01:08:10 +0800
7e9620f21 xfs: only update the last_sync_lsn when a transaction completes ... Browse Code »

The log write code stamps each iclog with the current tail LSN in
the iclog header so that recovery knows where to find the tail of
thelog once it has found the head. Normally this is taken from the
first item on the AIL - the log item that corresponds to the oldest
active item in the log.

The problem is that when the AIL is empty, the tail lsn is dervied
from the the l_last_sync_lsn, which is the LSN of the last iclog to
be written to the log. In most cases this doesn't happen, because
the AIL is rarely empty on an active filesystem. However, when it
does, it opens up an interesting case when the transaction being
committed to the iclog spans multiple iclogs.

That is, the first iclog is stamped with the l_last_sync_lsn, and IO
is issued. Then the next iclog is setup, the changes copied into the
iclog (takes some time), and then the l_last_sync_lsn is stamped
into the header and IO is issued. This is still the same
transaction, so the tail lsn of both iclogs must be the same for log
recovery to find the entire transaction to be able to replay it.

The problem arises in that the iclog buffer IO completion updates
the l_last_sync_lsn with it's own LSN. Therefore, If the first iclog
completes it's IO before the second iclog is filled and has the tail
lsn stamped in it, it will stamp the LSN of the first iclog into
it's tail lsn field. If the system fails at this point, log recovery
will not see a complete transaction, so the transaction will no be
replayed.

The fix is simple - the l_last_sync_lsn is updated when a iclog
buffer IO completes, and this is incorrect. The l_last_sync_lsn
shoul dbe updated when a transaction is completed by a iclog buffer
IO. That is, only iclog buffers that have transaction commit
callbacks attached to them should update the l_last_sync_lsn. This
means that the last_sync_lsn will only move forward when a commit
record it written, not in the middle of a large transaction that is
rolling through multiple iclog buffers.

Signed-off-by: Dave Chinner
Reviewed-by: Mark Tinguely
Reviewed-by: Christoph Hellwig
Signed-off-by: Ben Myers

Dave Chinner
2012-11-09 01:07:38 +0800

07 Nov, 2012

1 commit

0e4a43ed0 Merge git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-3.0-fixes ... Browse Code »

Pull gfs2 fixes from Steven Whitehouse:
"Here are a number of GFS2 bug fixes. There are three from Andy Price
which fix various issues spotted by automated code analysis. There
are two from Lukas Czerner fixing my mistaken assumptions as to how
FITRIM should work. Finally Ben Marzinski has fixed a bug relating to
mmap and atime and also a bug relating to a locking issue in the
transaction code."

* git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-3.0-fixes:
GFS2: Test bufdata with buffer locked and gfs2_log_lock held
GFS2: Don't call file_accessed() with a shared glock
GFS2: Fix FITRIM argument handling
GFS2: Require user to provide argument for FITRIM
GFS2: Clean up some unused assignments
GFS2: Fix possible null pointer deref in gfs2_rs_alloc
GFS2: Fix an unchecked error from gfs2_rs_alloc

Linus Torvalds
2012-11-07 20:38:56 +0800