Eric Lee / smarc-fsl-linux-kernel

02 Oct, 2013

2 commits

517bf8fc2 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs ... Browse Code »

Pull vfs lru leak fix from Al Viro:
"The fix in "super: fix for destroy lrus" didn't - they need to be
destroyed, all right, but that's the wrong place..."

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
fs/super.c: fix lru_list leak for real

Linus Torvalds
2013-10-02 01:28:11 +0800
c2d22ecd3 fs/super.c: fix lru_list leak for real ... Browse Code »

Freeing ->s_{inode,dentry}_lru in deactivate_locked_super() is wrong;
the right place is destroy_super(). As it is, we leak them if sget()
decides that new superblock it has allocated (and never shown to
anybody) isn't needed and should be freed.

Signed-off-by: Al Viro

Al Viro
2013-10-02 01:11:21 +0800

01 Oct, 2013

4 commits

f92731884 Merge tag 'nfs-for-3.12-4' of git://git.linux-nfs.org/projects/trondmy/linux-nfs ... Browse Code »

Pull NFS client bugfixes from Trond Myklebust:
- Stable fix for Oopses in the pNFS files layout driver
- Fix a regression when doing a non-exclusive file create on NFSv4.x
- NFSv4.1 security negotiation fixes when looking up the root
filesystem
- Fix a memory ordering issue in the pNFS files layout driver

* tag 'nfs-for-3.12-4' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
NFS: Give "flavor" an initial value to fix a compile warning
NFSv4.1: try SECINFO_NO_NAME flavs until one works
NFSv4.1: Ensure memory ordering between nfs4_ds_connect and nfs4_fl_prepare_ds
NFSv4.1: nfs4_fl_prepare_ds - fix bugs when the connect attempt fails
NFSv4: Honour the 'opened' parameter in the atomic_open() filesystem method

Linus Torvalds
2013-10-01 08:10:26 +0800
522d6d38f Merge branch 'akpm' (fixes from Andrew Morton) ... Browse Code »

Merge misc fixes from Andrew Morton.

* emailed patches from Andrew Morton : (22 commits)
pidns: fix free_pid() to handle the first fork failure
ipc,msg: prevent race with rmid in msgsnd,msgrcv
ipc/sem.c: update sem_otime for all operations
mm/hwpoison: fix the lack of one reference count against poisoned page
mm/hwpoison: fix false report on 2nd attempt at page recovery
mm/hwpoison: fix test for a transparent huge page
mm/hwpoison: fix traversal of hugetlbfs pages to avoid printk flood
block: change config option name for cmdline partition parsing
mm/mlock.c: prevent walking off the end of a pagetable in no-pmd configuration
mm: avoid reinserting isolated balloon pages into LRU lists
arch/parisc/mm/fault.c: fix uninitialized variable usage
include/asm-generic/vtime.h: avoid zero-length file
nilfs2: fix issue with race condition of competition between segments for dirty blocks
Documentation/kernel-parameters.txt: replace kernelcore with Movable
mm/bounce.c: fix a regression where MS_SNAP_STABLE (stable pages snapshotting) was ignored
kernel/kmod.c: check for NULL in call_usermodehelper_exec()
ipc/sem.c: synchronize the proc interface
ipc/sem.c: optimize sem_lock()
ipc/sem.c: fix race in sem_lock()
mm/compaction.c: periodically schedule when freeing pages
...

Linus Torvalds
2013-10-01 05:32:32 +0800
7f42ec394 nilfs2: fix issue with race condition of competition between segments for dirty blocks ... Browse Code »

Many NILFS2 users were reported about strange file system corruption
(for example):

NILFS: bad btree node (blocknr=185027): level = 0, flags = 0x0, nchildren = 768
NILFS error (device sda4): nilfs_bmap_last_key: broken bmap (inode number=11540)

But such error messages are consequence of file system's issue that takes
place more earlier. Fortunately, Jerome Poulin
and Anton Eliasson were reported about another
issue not so recently. These reports describe the issue with segctor
thread's crash:

BUG: unable to handle kernel paging request at 0000000000004c83
IP: nilfs_end_page_io+0x12/0xd0 [nilfs2]

Call Trace:
nilfs_segctor_do_construct+0xf25/0x1b20 [nilfs2]
nilfs_segctor_construct+0x17b/0x290 [nilfs2]
nilfs_segctor_thread+0x122/0x3b0 [nilfs2]
kthread+0xc0/0xd0
ret_from_fork+0x7c/0xb0

These two issues have one reason. This reason can raise third issue
too. Third issue results in hanging of segctor thread with eating of
100% CPU.

REPRODUCING PATH:

One of the possible way or the issue reproducing was described by
Jermoe me Poulin :

1. init S to get to single user mode.
2. sysrq+E to make sure only my shell is running
3. start network-manager to get my wifi connection up
4. login as root and launch "screen"
5. cd /boot/log/nilfs which is a ext3 mount point and can log when NILFS dies.
6. lscp | xz -9e > lscp.txt.xz
7. mount my snapshot using mount -o cp=3360839,ro /dev/vgUbuntu/root /mnt/nilfs
8. start a screen to dump /proc/kmsg to text file since rsyslog is killed
9. start a screen and launch strace -f -o find-cat.log -t find
/mnt/nilfs -type f -exec cat {} > /dev/null \;
10. start a screen and launch strace -f -o apt-get.log -t apt-get update
11. launch the last command again as it did not crash the first time
12. apt-get crashes
13. ps aux > ps-aux-crashed.log
13. sysrq+W
14. sysrq+E wait for everything to terminate
15. sysrq+SUSB

Simplified way of the issue reproducing is starting kernel compilation
task and "apt-get update" in parallel.

REPRODUCIBILITY:

The issue is reproduced not stable [60% - 80%]. It is very important to
have proper environment for the issue reproducing. The critical
conditions for successful reproducing:

(1) It should have big modified file by mmap() way.

(2) This file should have the count of dirty blocks are greater that
several segments in size (for example, two or three) from time to time
during processing.

(3) It should be intensive background activity of files modification
in another thread.

INVESTIGATION:

First of all, it is possible to see that the reason of crash is not valid
page address:

NILFS [nilfs_segctor_complete_write]:2100 bh->b_count 0, bh->b_blocknr 13895680, bh->b_size 13897727, bh->b_page 0000000000001a82
NILFS [nilfs_segctor_complete_write]:2101 segbuf->sb_segnum 6783

Moreover, value of b_page (0x1a82) is 6786. This value looks like segment
number. And b_blocknr with b_size values look like block numbers. So,
buffer_head's pointer points on not proper address value.

Detailed investigation of the issue is discovered such picture:

[-----------------------------SEGMENT 6783-------------------------------]
NILFS [nilfs_segctor_do_construct]:2310 nilfs_segctor_begin_construction
NILFS [nilfs_segctor_do_construct]:2321 nilfs_segctor_collect
NILFS [nilfs_segctor_do_construct]:2336 nilfs_segctor_assign
NILFS [nilfs_segctor_do_construct]:2367 nilfs_segctor_update_segusage
NILFS [nilfs_segctor_do_construct]:2371 nilfs_segctor_prepare_write
NILFS [nilfs_segctor_do_construct]:2376 nilfs_add_checksums_on_logs
NILFS [nilfs_segctor_do_construct]:2381 nilfs_segctor_write
NILFS [nilfs_segbuf_submit_bio]:464 bio->bi_sector 111149024, segbuf->sb_segnum 6783

[-----------------------------SEGMENT 6784-------------------------------]
NILFS [nilfs_segctor_do_construct]:2310 nilfs_segctor_begin_construction
NILFS [nilfs_segctor_do_construct]:2321 nilfs_segctor_collect
NILFS [nilfs_lookup_dirty_data_buffers]:782 bh->b_count 1, bh->b_page ffffea000709b000, page->index 0, i_ino 1033103, i_size 25165824
NILFS [nilfs_lookup_dirty_data_buffers]:783 bh->b_assoc_buffers.next ffff8802174a6798, bh->b_assoc_buffers.prev ffff880221cffee8
NILFS [nilfs_segctor_do_construct]:2336 nilfs_segctor_assign
NILFS [nilfs_segctor_do_construct]:2367 nilfs_segctor_update_segusage
NILFS [nilfs_segctor_do_construct]:2371 nilfs_segctor_prepare_write
NILFS [nilfs_segctor_do_construct]:2376 nilfs_add_checksums_on_logs
NILFS [nilfs_segctor_do_construct]:2381 nilfs_segctor_write
NILFS [nilfs_segbuf_submit_bh]:575 bh->b_count 1, bh->b_page ffffea000709b000, page->index 0, i_ino 1033103, i_size 25165824
NILFS [nilfs_segbuf_submit_bh]:576 segbuf->sb_segnum 6784
NILFS [nilfs_segbuf_submit_bh]:577 bh->b_assoc_buffers.next ffff880218a0d5f8, bh->b_assoc_buffers.prev ffff880218bcdf50
NILFS [nilfs_segbuf_submit_bio]:464 bio->bi_sector 111150080, segbuf->sb_segnum 6784, segbuf->sb_nbio 0
[----------] ditto
NILFS [nilfs_segbuf_submit_bio]:464 bio->bi_sector 111164416, segbuf->sb_segnum 6784, segbuf->sb_nbio 15

[-----------------------------SEGMENT 6785-------------------------------]
NILFS [nilfs_segctor_do_construct]:2310 nilfs_segctor_begin_construction
NILFS [nilfs_segctor_do_construct]:2321 nilfs_segctor_collect
NILFS [nilfs_lookup_dirty_data_buffers]:782 bh->b_count 2, bh->b_page ffffea000709b000, page->index 0, i_ino 1033103, i_size 25165824
NILFS [nilfs_lookup_dirty_data_buffers]:783 bh->b_assoc_buffers.next ffff880219277e80, bh->b_assoc_buffers.prev ffff880221cffc88
NILFS [nilfs_segctor_do_construct]:2367 nilfs_segctor_update_segusage
NILFS [nilfs_segctor_do_construct]:2371 nilfs_segctor_prepare_write
NILFS [nilfs_segctor_do_construct]:2376 nilfs_add_checksums_on_logs
NILFS [nilfs_segctor_do_construct]:2381 nilfs_segctor_write
NILFS [nilfs_segbuf_submit_bh]:575 bh->b_count 2, bh->b_page ffffea000709b000, page->index 0, i_ino 1033103, i_size 25165824
NILFS [nilfs_segbuf_submit_bh]:576 segbuf->sb_segnum 6785
NILFS [nilfs_segbuf_submit_bh]:577 bh->b_assoc_buffers.next ffff880218a0d5f8, bh->b_assoc_buffers.prev ffff880222cc7ee8
NILFS [nilfs_segbuf_submit_bio]:464 bio->bi_sector 111165440, segbuf->sb_segnum 6785, segbuf->sb_nbio 0
[----------] ditto
NILFS [nilfs_segbuf_submit_bio]:464 bio->bi_sector 111177728, segbuf->sb_segnum 6785, segbuf->sb_nbio 12

NILFS [nilfs_segctor_do_construct]:2399 nilfs_segctor_wait
NILFS [nilfs_segbuf_wait]:676 segbuf->sb_segnum 6783
NILFS [nilfs_segbuf_wait]:676 segbuf->sb_segnum 6784
NILFS [nilfs_segbuf_wait]:676 segbuf->sb_segnum 6785

NILFS [nilfs_segctor_complete_write]:2100 bh->b_count 0, bh->b_blocknr 13895680, bh->b_size 13897727, bh->b_page 0000000000001a82

BUG: unable to handle kernel paging request at 0000000000001a82
IP: [] nilfs_end_page_io+0x12/0xd0 [nilfs2]

Usually, for every segment we collect dirty files in list. Then, dirty
blocks are gathered for every dirty file, prepared for write and
submitted by means of nilfs_segbuf_submit_bh() call. Finally, it takes
place complete write phase after calling nilfs_end_bio_write() on the
block layer. Buffers/pages are marked as not dirty on final phase and
processed files removed from the list of dirty files.

It is possible to see that we had three prepare_write and submit_bio
phases before segbuf_wait and complete_write phase. Moreover, segments
compete between each other for dirty blocks because on every iteration
of segments processing dirty buffer_heads are added in several lists of
payload_buffers:

[SEGMENT 6784]: bh->b_assoc_buffers.next ffff880218a0d5f8, bh->b_assoc_buffers.prev ffff880218bcdf50
[SEGMENT 6785]: bh->b_assoc_buffers.next ffff880218a0d5f8, bh->b_assoc_buffers.prev ffff880222cc7ee8

The next pointer is the same but prev pointer has changed. It means
that buffer_head has next pointer from one list but prev pointer from
another. Such modification can be made several times. And, finally, it
can be resulted in various issues: (1) segctor hanging, (2) segctor
crashing, (3) file system metadata corruption.

FIX:
This patch adds:

(1) setting of BH_Async_Write flag in nilfs_segctor_prepare_write()
for every proccessed dirty block;

(2) checking of BH_Async_Write flag in
nilfs_lookup_dirty_data_buffers() and
nilfs_lookup_dirty_node_buffers();

(3) clearing of BH_Async_Write flag in nilfs_segctor_complete_write(),
nilfs_abort_logs(), nilfs_forget_buffer(), nilfs_clear_dirty_page().

Reported-by: Jerome Poulin
Reported-by: Anton Eliasson
Cc: Paul Fertser
Cc: ARAI Shun-ichi
Cc: Piotr Szymaniak
Cc: Juan Barry Manuel Canham
Cc: Zahid Chowdhury
Cc: Elmer Zhang
Cc: Kenneth Langga
Signed-off-by: Vyacheslav Dubeyko
Acked-by: Ryusuke Konishi
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Vyacheslav Dubeyko
2013-10-01 05:31:02 +0800
720236569 fs/binfmt_elf.c: prevent a coredump with a large vm_map_count from Oopsing ... Browse Code »

A high setting of max_map_count, and a process core-dumping with a large
enough vm_map_count could result in an NT_FILE note not being written,
and the kernel crashing immediately later because it has assumed
otherwise.

Reproduction of the oops-causing bug described here:

https://lkml.org/lkml/2013/8/30/50

Rge ussue originated in commit 2aa362c49c31 ("coredump: extend core dump
note section to contain file names of mapped file") from Oct 4, 2012.

This patch make that section optional in that case. fill_files_note()
should signify the error, and also let the info struct in
elf_core_dump() be zero-initialized so that we can check for the
optionally written note.

[akpm@linux-foundation.org: avoid abusing E2BIG, remove a couple of not-really-needed local variables]
[akpm@linux-foundation.org: fix sparse warning]
Signed-off-by: Dan Aloni
Cc: Al Viro
Cc: Denys Vlasenko
Reported-by: Martin MOKREJS
Tested-by: Martin MOKREJS
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Dan Aloni
2013-10-01 05:31:01 +0800

30 Sep, 2013

7 commits

13f358389 afs: dget_parent() can't return a negative dentry ... Browse Code »

Signed-off-by: Al Viro

Al Viro
2013-09-30 10:02:24 +0800
7b9a2378b ocfs2: needs ->d_lock to poke in ->d_parent->d_inode from ->d_revalidate() ... Browse Code »

Signed-off-by: Al Viro

Al Viro
2013-09-30 10:02:20 +0800
494755558 sysv: Add forgotten superblock lock init for v7 fs ... Browse Code »

Superblock lock was replaced with (un)lock_super() removal, but left
uninitialized for Seventh Edition UNIX filesystem in the following commit (3.7):
c07cb01 sysv: drop lock/unlock super

Signed-off-by: Lubomir Rintel
Signed-off-by: Christoph Hellwig
Signed-off-by: Al Viro

Lubomir Rintel
2013-09-30 10:02:02 +0800
367156d9a NFS: Give "flavor" an initial value to fix a compile warning ... Browse Code »

The previous patch introduces a compile warning by not assigning an initial
value to the "flavor" variable. This could only be a problem if the server
returns a supported secflavor list of length zero, but it's better to
fix this before it's ever hit.

Signed-off-by: Anna Schumaker
Acked-by: Weston Andros Adamson
Signed-off-by: Trond Myklebust

Anna Schumaker
2013-09-30 04:03:34 +0800
58a8cf121 NFSv4.1: try SECINFO_NO_NAME flavs until one works ... Browse Code »

Call nfs4_lookup_root_sec for each flavor returned by SECINFO_NO_NAME until
one works.

One example of a situation this fixes:

- server configured for krb5
- server principal somehow gets deleted from KDC
- server still thinking krb is good, sends krb5 as first entry in
SECINFO_NO_NAME response
- client tries krb5, but this fails without even sending an RPC because
gssd's requests to the KDC can't find the server's principal

Signed-off-by: Weston Andros Adamson
Signed-off-by: Trond Myklebust

Weston Andros Adamson
2013-09-30 04:03:34 +0800
acd65e5bc NFSv4.1: Ensure memory ordering between nfs4_ds_connect and nfs4_fl_prepare_ds ... Browse Code »

We need to ensure that the initialisation of the data server nfs_client
structure in nfs4_ds_connect is correctly ordered w.r.t. the read of
ds->ds_clp in nfs4_fl_prepare_ds.

Signed-off-by: Trond Myklebust

Trond Myklebust
2013-09-30 03:58:35 +0800
52b26a3e1 NFSv4.1: nfs4_fl_prepare_ds - fix bugs when the connect attempt fails ... Browse Code »

- Fix an Oops when nfs4_ds_connect() returns an error.
- Always check the device status after waiting for a connect to complete.

Reported-by: Andy Adamson
Reported-by: Jeff Layton
Signed-off-by: Trond Myklebust
Cc: # v3.10+

Trond Myklebust
2013-09-30 03:56:35 +0800

29 Sep, 2013

1 commit

ddd23eb18 Merge tag 'xfs-for-linus-v3.12-rc3' of git://oss.sgi.com/xfs/xfs ... Browse Code »

Pull xfs bugfixes from Ben Myers:
- fix for directory node collapse regression
- fix for recovery over stale on disk structures
- fix for eofblocks ioctl
- fix asserts in xfs_inode_free
- lock the ail before removing an item from it

* tag 'xfs-for-linus-v3.12-rc3' of git://oss.sgi.com/xfs/xfs:
xfs: fix node forward in xfs_node_toosmall
xfs: log recovery lsn ordering needs uuid check
xfs: fix XFS_IOC_FREE_EOFBLOCKS definition
xfs: asserting lock not held during freeing not valid
xfs: lock the AIL before removing the buffer item

Linus Torvalds
2013-09-29 04:52:05 +0800

28 Sep, 2013

1 commit

e1f8826f5 Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs ... Browse Code »

Pull reiserfs and UDF fixes from Jan Kara:
"The contains fix of an UDF oops when mounting corrupted media and a
fix of a race in reiserfs leading to oops"

* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
reiserfs: fix race with flush_used_journal_lists and flush_journal_list
reiserfs: remove useless flush_old_journal_lists
udf: Fortify LVID loading

Linus Torvalds
2013-09-28 00:31:09 +0800

26 Sep, 2013

2 commits

997def25e xfs: fix node forward in xfs_node_toosmall ... Browse Code »

Commit f5ea1100 cleans up the disk to host conversions for
node directory entries, but because a variable is reused in
xfs_node_toosmall() the next node is not correctly found.
If the original node is small enough (< BBTOB(bp->b_length),
file: /root/newest/xfs/fs/xfs/xfs_trans_buf.c, line: 569

Keep the original node header to get the correct forward node.

(When a node is considered for a merge with a sibling, it overwrites the
sibling pointers of the original incore nodehdr with the sibling's
pointers. This leads to loop considering the original node as a merge
candidate with itself in the second pass, and so it incorrectly
determines a merge should occur.)

Signed-off-by: Mark Tinguely
Reviewed-by: Ben Myers
Signed-off-by: Ben Myers

[v3: added Dave Chinner's (slightly modified) suggestion to the commit header,
cleaned up whitespace. -bpm]

Mark Tinguely
2013-09-26 23:38:17 +0800
5bc2afc2b NFSv4: Honour the 'opened' parameter in the atomic_open() filesystem method ... Browse Code »

Determine if we've created a new file by examining the directory change
attribute and/or the O_EXCL flag.

This fixes a regression when doing a non-exclusive create of a new file.
If the FILE_CREATED flag is not set, the atomic_open() command will
perform full file access permissions checks instead of just checking
for MAY_OPEN.

Signed-off-by: Trond Myklebust

Trond Myklebust
2013-09-26 22:20:18 +0800

25 Sep, 2013

7 commits

a153e67bd Merge branch 'akpm' (patches from Andrew Morton) ... Browse Code »

Merge fixes from Andrew Morton:
"Bunch of fixes.

And a reversion of mhocko's "Soft limit rework" patch series. This is
actually your fault for opening the merge window when I was off racing ;)

I didn't read the email thread before sending everything off.
Johannes Weiner raised significant issues:

http://www.spinics.net/lists/cgroups/msg08813.html

and we agreed to back it all out"

I clearly need to be more aware of Andrew's racing schedule.

* akpm:
MAINTAINERS: update mach-bcm related email address
checkpatch: make extern in .h prototypes quieter
cciss: fix info leak in cciss_ioctl32_passthru()
cpqarray: fix info leak in ida_locked_ioctl()
kernel/reboot.c: re-enable the function of variable reboot_default
audit: fix endless wait in audit_log_start()
revert "memcg, vmscan: integrate soft reclaim tighter with zone shrinking code"
revert "memcg: get rid of soft-limit tree infrastructure"
revert "vmscan, memcg: do softlimit reclaim also for targeted reclaim"
revert "memcg: enhance memcg iterator to support predicates"
revert "memcg: track children in soft limit excess to improve soft limit"
revert "memcg, vmscan: do not attempt soft limit reclaim if it would not scan anything"
revert "memcg: track all children over limit in the root"
revert "memcg, vmscan: do not fall into reclaim-all pass too quickly"
fs/ocfs2/super.c: use a bigger nodestr in ocfs2_dismount_volume
watchdog: update watchdog_thresh properly
watchdog: update watchdog attributes atomically

Linus Torvalds
2013-09-25 08:00:35 +0800
99d7a8824 fs/ocfs2/super.c: use a bigger nodestr in ocfs2_dismount_volume ... Browse Code »

While printing 32-bit node numbers, an 8-byte string is not enough.
Increase the size of the string to 12 chars.

This got left out in commit 49fa8140e487 ("fs/ocfs2/super.c: Use bigger
nodestr to accomodate 32-bit node numbers").

Signed-off-by: Goldwyn Rodrigues
Cc: Joel Becker
Cc: Mark Fasheh
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Goldwyn Rodrigues
2013-09-25 08:00:25 +0800
2f6cf0de0 block: Fix bio_copy_data() ... Browse Code »

The memcpy() in bio_copy_data() was using the wrong offset vars, leading
to data corruption in weird unusual setups.

Signed-off-by: Kent Overstreet
Cc: Jens Axboe
Cc: linux-stable # >= v3.9
Signed-off-by: Linus Torvalds

Kent Overstreet
2013-09-25 05:41:42 +0800
566055d33 xfs: log recovery lsn ordering needs uuid check ... Browse Code »

After a fair number of xfstests runs, xfs/182 started to fail
regularly with a corrupted directory - a directory read verifier was
failing after recovery because it found a block with a XARM magic
number (remote attribute block) rather than a directory data block.

The first time I saw this repeated failure I did /something/ and the
problem went away, so I was never able to find the underlying
problem. Test xfs/182 failed again today, and I found the root
cause before I did /something else/ that made it go away.

Tracing indicated that the block in question was being correctly
logged, the log was being flushed by sync, but the buffer was not
being written back before the shutdown occurred. Tracing also
indicated that log recovery was also reading the block, but then
never writing it before log recovery invalidated the cache,
indicating that it was not modified by log recovery.

More detailed analysis of the corpse indicated that the filesystem
had a uuid of "a4131074-1872-4cac-9323-2229adbcb886" but the XARM
block had a uuid of "8f32f043-c3c9-e7f8-f947-4e7f989c05d3", which
indicated it was a block from an older filesystem. The reason that
log recovery didn't replay it was that the LSN in the XARM block was
larger than the LSN of the transaction being replayed, and so the
block was not overwritten by log recovery.

Hence, log recovery cant blindly trust the magic number and LSN in
the block - it must verify that it belongs to the filesystem being
recovered before using the LSN. i.e. if the UUIDs don't match, we
need to unconditionally recovery the change held in the log.

This patch was first tested on a block device that was repeatedly
causing xfs/182 to fail with the same failure on the same block with
the same directory read corruption signature (i.e. XARM block). It
did not fail, and hasn't failed since.

Signed-off-by: Dave Chinner
Reviewed-by: Ben Myers
Signed-off-by: Ben Myers

Dave Chinner
2013-09-25 01:35:57 +0800
b771af2fc xfs: fix XFS_IOC_FREE_EOFBLOCKS definition ... Browse Code »

It uses a kernel internal structure in it's definition rather than
the user visible structure that is passed to the ioctl.

Signed-off-by: Dave Chinner
Reviewed-by: Mark Tinguely
Signed-off-by: Ben Myers

Dave Chinner
2013-09-25 01:35:08 +0800
b313a5f1c xfs: asserting lock not held during freeing not valid ... Browse Code »

When we free an inode, we do so via RCU. As an RCU lookup can occur
at any time before we free an inode, and that lookup takes the inode
flags lock, we cannot safely assert that the flags lock is not held
just before marking it dead and running call_rcu() to free the
inode.

We check on allocation of a new inode structre that the lock is not
held, so we still have protection against locks being leaked and
hence not correctly initialised when allocated out of the slab.
Hence just remove the assert...

Signed-off-by: Dave Chinner
Reviewed-by: Mark Tinguely
Signed-off-by: Ben Myers

Dave Chinner
2013-09-25 01:32:57 +0800
488523580 xfs: lock the AIL before removing the buffer item ... Browse Code »

Regression introduced by commit 46f9d2e ("xfs: aborted buf items can
be in the AIL") which fails to lock the AIL before removing the
item. Spinlock debugging throws a warning about this.

Signed-off-by: Dave Chinner
Reviewed-by: Mark Tinguely
Signed-off-by: Ben Myers

Dave Chinner
2013-09-25 01:31:41 +0800

24 Sep, 2013

3 commits

721a769c0 reiserfs: fix race with flush_used_journal_lists and flush_journal_list ... Browse Code »

There are two locks involved in managing the journal lists. The general
reiserfs_write_lock and the journal->j_flush_mutex.

While flush_journal_list is sleeping to acquire the j_flush_mutex or to
submit a block for write, it will drop the write lock. This allows
another thread to acquire the write lock and ultimately call
flush_used_journal_lists to traverse the list of journal lists and
select one for flushing. It can select the journal_list that has just
had flush_journal_list called on it in the original thread and call it
again with the same journal_list.

The second thread then drops the write lock to acquire j_flush_mutex and
the first thread reacquires it and continues execution and eventually
clears and frees the journal list before dropping j_flush_mutex and
returning.

The second thread acquires j_flush_mutex and ends up operating on a
journal_list that has already been released. If the memory hasn't
been reused, we'll soon after hit a BUG_ON because the transaction id
has already been cleared. If it's been reused, we'll crash in other
fun ways.

Since flush_journal_list will synchronize on j_flush_mutex, we can fix
the race by taking a proper reference in flush_used_journal_lists
and checking to see if it's still valid after the mutex is taken. It's
safe to iterate the list of journal lists and pick a list with
just the write lock as long as a reference is taken on the journal list
before we drop the lock. We already have code to handle whether a
transaction has been flushed already so we can use that to handle the
race and get rid of the trans_id BUG_ON.

Signed-off-by: Jeff Mahoney
Signed-off-by: Jan Kara

Jeff Mahoney
2013-09-24 17:24:21 +0800
7bc9cc07e reiserfs: remove useless flush_old_journal_lists ... Browse Code »

Commit a3172027 introduced test_transaction as a requirement for
flushing old lists -- but it can never return 1 unless the transaction
has already been flushed.

As a result, we have a routine that iterates the j_realblocks list but
doesn't actually do anything. Since it's been this way since 2006 and
the latency numbers were what Chris expected, let's just rip it out.

Signed-off-by: Jeff Mahoney
Signed-off-by: Jan Kara

Jeff Mahoney
2013-09-24 17:24:21 +0800
69d75671d udf: Fortify LVID loading ... Browse Code »

A user has reported an oops in udf_statfs() that was caused by
numOfPartitions entry in LVID structure being corrupted. Fix the problem
by verifying whether numOfPartitions makes sense at least to the extent
that LVID fits into a single block as it should.

Reported-by: Juergen Weigert
Signed-off-by: Jan Kara

Jan Kara
2013-09-24 17:23:33 +0800

23 Sep, 2013

2 commits

68cf8d0c7 Merge branch 'for-3.12/core' of git://git.kernel.dk/linux-block ... Browse Code »

Pull block IO fixes from Jens Axboe:
"After merge window, no new stuff this time only a collection of neatly
confined and simple fixes"

* 'for-3.12/core' of git://git.kernel.dk/linux-block:
cfq: explicitly use 64bit divide operation for 64bit arguments
block: Add nr_bios to block_rq_remap tracepoint
If the queue is dying then we only call the rq->end_io callout. This leaves bios setup on the request, because the caller assumes when the blk_execute_rq_nowait/blk_execute_rq call has completed that the rq->bios have been cleaned up.
bio-integrity: Fix use of bs->bio_integrity_pool after free
blkcg: relocate root_blkg setting and clearing
block: Convert kmalloc_node(...GFP_ZERO...) to kzalloc_node(...)
block: trace all devices plug operation

Linus Torvalds
2013-09-23 06:00:11 +0800
0fbf2cc98 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs ... Browse Code »

Pull btrfs fixes from Chris Mason:
"These are mostly bug fixes and a two small performance fixes. The
most important of the bunch are Josef's fix for a snapshotting
regression and Mark's update to fix compile problems on arm"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: (25 commits)
Btrfs: create the uuid tree on remount rw
btrfs: change extent-same to copy entire argument struct
Btrfs: dir_inode_operations should use btrfs_update_time also
btrfs: Add btrfs: prefix to kernel log output
btrfs: refuse to remount read-write after abort
Btrfs: btrfs_ioctl_default_subvol: Revert back to toplevel subvolume when arg is 0
Btrfs: don't leak transaction in btrfs_sync_file()
Btrfs: add the missing mutex unlock in write_all_supers()
Btrfs: iput inode on allocation failure
Btrfs: remove space_info->reservation_progress
Btrfs: kill delay_iput arg to the wait_ordered functions
Btrfs: fix worst case calculator for space usage
Revert "Btrfs: rework the overcommit logic to be based on the total size"
Btrfs: improve replacing nocow extents
Btrfs: drop dir i_size when adding new names on replay
Btrfs: replay dir_index items before other items
Btrfs: check roots last log commit when checking if an inode has been logged
Btrfs: actually log directory we are fsync()'ing
Btrfs: actually limit the size of delalloc range
Btrfs: allocate the free space by the existed max extent size when ENOSPC
...

Linus Torvalds
2013-09-23 05:58:49 +0800

21 Sep, 2013

11 commits

94aebfb2e Btrfs: create the uuid tree on remount rw ... Browse Code »

Users have been complaining of the uuid tree stuff warning that there is no uuid
root when trying to do snapshot operations. This is because if you mount -o ro
we will not create the uuid tree. But then if you mount -o rw,remount we will
still not create it and then any subsequent snapshot/subvol operations you try
to do will fail gloriously. Fix this by creating the uuid_root on remount rw if
it was not already there. Thanks,

Signed-off-by: Josef Bacik
Signed-off-by: Chris Mason

Josef Bacik
2013-09-21 23:50:43 +0800
cbf8b8ca3 btrfs: change extent-same to copy entire argument struct ... Browse Code »

btrfs_ioctl_file_extent_same() uses __put_user_unaligned() to copy some data
back to it's argument struct. Unfortunately, not all architectures provide
__put_user_unaligned(), so compiles break on them if btrfs is selected.

Instead, just copy the whole struct in / out at the start and end of
operations, respectively.

Signed-off-by: Mark Fasheh
Signed-off-by: Josef Bacik
Signed-off-by: Chris Mason

Mark Fasheh
2013-09-21 23:05:31 +0800
93fd63c2f Btrfs: dir_inode_operations should use btrfs_update_time also ... Browse Code »

Commit 2bc5565286121d2a77ccd728eb3484dff2035b58 (Btrfs: don't update atime on
RO subvolumes) ensures that the access time of an inode is not updated when
the inode lives in a read-only subvolume.
However, if a directory on a read-only subvolume is accessed, the atime is
updated. This results in a write operation to a read-only subvolume. I
believe that access times should never be updated on read-only subvolumes.

To reproduce:

# mkfs.btrfs -f /dev/dm-3
(...)
# mount /dev/dm-3 /mnt
# btrfs subvol create /mnt/sub
Create subvolume '/mnt/sub'
# mkdir /mnt/sub/dir
# echo "abc" > /mnt/sub/dir/file
# btrfs subvol snapshot -r /mnt/sub /mnt/rosnap
Create a readonly snapshot of '/mnt/sub' in '/mnt/rosnap'
# stat /mnt/rosnap/dir
File: `/mnt/rosnap/dir'
Size: 8 Blocks: 0 IO Block: 4096 directory
Device: 16h/22d Inode: 257 Links: 1
Access: (0755/drwxr-xr-x) Uid: ( 0/ root) Gid: ( 0/ root)
Access: 2013-09-11 07:21:49.389157126 -0400
Modify: 2013-09-11 07:22:02.330156079 -0400
Change: 2013-09-11 07:22:02.330156079 -0400
# ls /mnt/rosnap/dir
file
# stat /mnt/rosnap/dir
File: `/mnt/rosnap/dir'
Size: 8 Blocks: 0 IO Block: 4096 directory
Device: 16h/22d Inode: 257 Links: 1
Access: (0755/drwxr-xr-x) Uid: ( 0/ root) Gid: ( 0/ root)
Access: 2013-09-11 07:22:56.797151670 -0400
Modify: 2013-09-11 07:22:02.330156079 -0400
Change: 2013-09-11 07:22:02.330156079 -0400

Reported-by: Koen De Wit
Signed-off-by: Guangyu Sun
Signed-off-by: Josef Bacik
Signed-off-by: Chris Mason

Guangyu Sun
2013-09-21 23:05:30 +0800
5138cccf3 btrfs: Add btrfs: prefix to kernel log output ... Browse Code »

The kernel log entries for device label %s and device fsid %pU
are missing the btrfs: prefix. Add those here.

Signed-off-by: Frank Holton
Reviewed-by: David Sterba
Signed-off-by: Josef Bacik
Signed-off-by: Chris Mason

Frank Holton
2013-09-21 23:05:30 +0800
6ef3de9c9 btrfs: refuse to remount read-write after abort ... Browse Code »

It's still possible to flip the filesystem into RW mode after it's
remounted RO due to an abort. There are lots of places that check for
the superblock error bit and will not write data, but we should not let
the filesystem appear read-write.

Signed-off-by: David Sterba
Signed-off-by: Josef Bacik
Signed-off-by: Chris Mason

David Sterba
2013-09-21 23:05:30 +0800
1cecf579d Btrfs: btrfs_ioctl_default_subvol: Revert back to toplevel subvolume when arg is 0 ... Browse Code »

This patch makes it possible to set BTRFS_FS_TREE_OBJECTID as the default
subvolume by passing a subvolume id of 0.

Signed-off-by: chandan
Reviewed-by: David Sterba
Signed-off-by: Josef Bacik
Signed-off-by: Chris Mason

chandan
2013-09-21 23:05:29 +0800
a0634be56 Btrfs: don't leak transaction in btrfs_sync_file() ... Browse Code »

In btrfs_sync_file(), if the call to btrfs_log_dentry_safe() returns
a negative error (for e.g. -ENOMEM via btrfs_log_inode()), we would
return without ending/freeing the transaction.

Signed-off-by: Josef Bacik
Signed-off-by: Chris Mason

Filipe David Borba Manana
2013-09-21 23:05:29 +0800
a724b4369 Btrfs: add the missing mutex unlock in write_all_supers() ... Browse Code »

The BUG() was replaced by btrfs_error() and return -EIO with the
patch "get rid of one BUG() in write_all_supers()", but the missing
mutex_unlock() was overlooked.

The 0-DAY kernel build service from Intel reported the missing
unlock which was found by the coccinelle tool:

fs/btrfs/disk-io.c:3422:2-8: preceding lock on line 3374

Signed-off-by: Stefan Behrens
Signed-off-by: Josef Bacik
Signed-off-by: Chris Mason

Stefan Behrens
2013-09-21 23:05:28 +0800
f4ab9ea70 Btrfs: iput inode on allocation failure ... Browse Code »

We don't do the iput when we fail to allocate our delayed delalloc work in
__start_delalloc_inodes, fix this.

Signed-off-by: Josef Bacik
Signed-off-by: Chris Mason

Josef Bacik
2013-09-21 23:05:28 +0800
363e4d354 Btrfs: remove space_info->reservation_progress ... Browse Code »

This isn't used for anything anymore, just remove it.

Signed-off-by: Josef Bacik
Signed-off-by: Chris Mason

Josef Bacik
2013-09-21 23:05:27 +0800
f0de181c9 Btrfs: kill delay_iput arg to the wait_ordered functions ... Browse Code »

This is a left over of how we used to wait for ordered extents, which was to
grab the inode and then run filemap flush on it. However if we have an ordered
extent then we already are holding a ref on the inode, and we just use
btrfs_start_ordered_extent anyway, so there is no reason to have an extra ref on
the inode to start work on the ordered extent. Thanks,

Signed-off-by: Josef Bacik
Signed-off-by: Chris Mason

Josef Bacik
2013-09-21 23:05:27 +0800