Doug / smarc-fsl-linux-kernel | Embedian Git Server

12 May, 2013

1 commit

c4cc75c33 Merge git://git.infradead.org/users/eparis/audit ... Browse Code »

Pull audit changes from Eric Paris:
"Al used to send pull requests every couple of years but he told me to
just start pushing them to you directly.

Our touching outside of core audit code is pretty straight forward. A
couple of interface changes which hit net/. A simple argument bug
calling audit functions in namei.c and the removal of some assembly
branch prediction code on ppc"

* git://git.infradead.org/users/eparis/audit: (31 commits)
audit: fix message spacing printing auid
Revert "audit: move kaudit thread start from auditd registration to kaudit init"
audit: vfs: fix audit_inode call in O_CREAT case of do_last
audit: Make testing for a valid loginuid explicit.
audit: fix event coverage of AUDIT_ANOM_LINK
audit: use spin_lock in audit_receive_msg to process tty logging
audit: do not needlessly take a lock in tty_audit_exit
audit: do not needlessly take a spinlock in copy_signal
audit: add an option to control logging of passwords with pam_tty_audit
audit: use spin_lock_irqsave/restore in audit tty code
helper for some session id stuff
audit: use a consistent audit helper to log lsm information
audit: push loginuid and sessionid processing down
audit: stop pushing loginid, uid, sessionid as arguments
audit: remove the old depricated kernel interface
audit: make validity checking generic
audit: allow checking the type of audit message in the user filter
audit: fix build break when AUDIT_DEBUG == 2
audit: remove duplicate export of audit_enabled
Audit: do not print error when LSMs disabled
...

Linus Torvalds
2013-05-12 05:29:11 +0800

11 May, 2013

3 commits

2dbd3cac8 Merge branch 'for-3.10' of git://linux-nfs.org/~bfields/linux ... Browse Code »

Pull nfsd fixes from Bruce Fields:
"Small fixes for two bugs and two warnings"

* 'for-3.10' of git://linux-nfs.org/~bfields/linux:
nfsd: fix oops when legacy_recdir_name_error is passed a -ENOENT error
SUNRPC: fix decoding of optional gss-proxy xdr fields
SUNRPC: Refactor gssx_dec_option_array() to kill uninitialized warning
nfsd4: don't allow owner override on 4.1 CLAIM_FH opens

Linus Torvalds
2013-05-11 00:28:55 +0800
3644bc2ec Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal ... Browse Code »

Pull stray syscall bits from Al Viro:
"Several syscall-related commits that were missing from the original"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal:
switch compat_sys_sysctl to COMPAT_SYSCALL_DEFINE
unicore32: just use mmap_pgoff()...
unify compat fanotify_mark(2), switch to COMPAT_SYSCALL_DEFINE
x86, vm86: fix VM86 syscalls: use SYSCALL_DEFINEx(...)

Linus Torvalds
2013-05-11 00:21:05 +0800
6fad8d02e Merge tag 'ecryptfs-3.10-rc1-ablkcipher' of git://git.kernel.org/pub/scm/linux/k… ... Browse Code »

…ernel/git/tyhicks/ecryptfs

Pull eCryptfs update from Tyler Hicks:
"Improve performance when AES-NI (and most likely other crypto
accelerators) is available by moving to the ablkcipher crypto API.
The improvement is more apparent on faster storage devices.

There's no noticeable change when hardware crypto is not available"

* tag 'ecryptfs-3.10-rc1-ablkcipher' of git://git.kernel.org/pub/scm/linux/kernel/git/tyhicks/ecryptfs:
eCryptfs: Use the ablkcipher crypto API

Linus Torvalds
2013-05-11 00:20:01 +0800

10 May, 2013

10 commits

977b58e1d Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/gerg/m68knommu ... Browse Code »

Pull m68knommu updates from Greg Ungerer:
"The bulk of the changes are generalizing the ColdFire v3 core support
and adding in 537x CPU support. Also a couple of other bug fixes, one
to fix a reintroduction of a past bug in the romfs filesystem nommu
support."

* 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/gerg/m68knommu:
m68knommu: enable Timer on coldfire 532x
m68knommu: fix ColdFire 5373/5329 QSPI base address
m68knommu: add support for configuring a Freescale M5373EVB board
m68knommu: add support for the ColdFire 537x family of CPUs
m68knommu: make ColdFire M532x platform support more v3 generic
m68knommu: create and use a common M53xx ColdFire class of CPUs
m68k: remove unused asm/dbg.h
m68k: Set ColdFire ACR1 cache mode depending on kernel configuration
romfs: fix nommu map length to keep inside filesystem
m68k: clean up unused "config ROMVECSIZE"

Linus Torvalds
2013-05-10 22:22:35 +0800
4dfea4f0d eCryptfs: Use the ablkcipher crypto API ... Browse Code »

Make the switch from the blkcipher kernel crypto interface to the
ablkcipher interface.

encrypt_scatterlist() and decrypt_scatterlist() now use the ablkcipher
interface but, from the eCryptfs standpoint, still treat the crypto
operation as a synchronous operation. They submit the async request and
then wait until the operation is finished before they return. Most of
the changes are contained inside those two functions.

Despite waiting for the completion of the crypto operation, the
ablkcipher interface provides performance increases in most cases when
used on AES-NI capable hardware.

Signed-off-by: Tyler Hicks
Acked-by: Colin King
Reviewed-by: Zeev Zilberman
Cc: Dustin Kirkland
Cc: Tim Chen
Cc: Ying Huang
Cc: Thieu Le
Cc: Li Wang
Cc: Jarkko Sakkinen

Tyler Hicks
2013-05-10 07:55:07 +0800
70eba4226 Merge tag 'please-pull-pstore' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux ... Browse Code »

Pull trivial pstore update from Tony Luck:
"Couple of pstore cleanups"

It turns out that the kmemdup() conversion ends up being undone by the
fact that the memory block also needed the ecc information (see commit
bd08ec33b5c2: "pstore/ram: Restore ecc information block"), so all that
remains after merging is the error return code change.

* tag 'please-pull-pstore' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux:
pstore/ram: fix error return code in ramoops_probe()
fs: pstore: Replaced calls to kmalloc and memcpy with kmemdup

Linus Torvalds
2013-05-10 07:42:10 +0800
07e074503 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs ... Browse Code »

Pull more vfs fixes from Al Viro:
"Regression fix from Geert + yet another open-coded kernel_read()"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
ecryptfs: don't open-code kernel_read()
xtensa simdisk: Fix proc_create_data() conversion fallout

Linus Torvalds
2013-05-10 04:44:35 +0800
983a5f84a Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs ... Browse Code »

Pull btrfs update from Chris Mason:
"These are mostly fixes. The biggest exceptions are Josef's skinny
extents and Jan Schmidt's code to rebuild our quota indexes if they
get out of sync (or you enable quotas on an existing filesystem).

The skinny extents are off by default because they are a new variation
on the extent allocation tree format. btrfstune -x enables them, and
the new format makes the extent allocation tree about 30% smaller.

I rebased this a few days ago to rework Dave Sterba's crc checks on
the super block, but almost all of these go back to rc6, since I
though 3.9 was due any minute.

The biggest missing fix is the tracepoint bug that was hit late in
3.9. I ran into problems with that in overnight testing and I'm still
tracking it down. I'll definitely have that fixed for rc2."

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: (101 commits)
Btrfs: allow superblock mismatch from older mkfs
btrfs: enhance superblock checks
btrfs: fix misleading variable name for flags
btrfs: use unsigned long type for extent state bits
Btrfs: improve the loop of scrub_stripe
btrfs: read entire device info under lock
btrfs: remove unused gfp mask parameter from release_extent_buffer callchain
btrfs: handle errors returned from get_tree_block_key
btrfs: make static code static & remove dead code
Btrfs: deal with errors in write_dev_supers
Btrfs: remove almost all of the BUG()'s from tree-log.c
Btrfs: deal with free space cache errors while replaying log
Btrfs: automatic rescan after "quota enable" command
Btrfs: rescan for qgroups
Btrfs: split btrfs_qgroup_account_ref into four functions
Btrfs: allocate new chunks if the space is not enough for global rsv
Btrfs: separate sequence numbers for delayed ref tracking and tree mod log
btrfs: move leak debug code to functions
Btrfs: return free space in cow error path
Btrfs: set UUID in root_item for created trees
...

Linus Torvalds
2013-05-10 04:07:40 +0800
8769e078a Merge tag 'for-linus-v3.10-rc1-2' of git://oss.sgi.com/xfs/xfs ... Browse Code »

Pull xfs update (#2) from Ben Myers:

- add CONFIG_XFS_WARN, a step between zero debugging and
CONFIG_XFS_DEBUG.

- fix attrmulti and attrlist to fall back to vmalloc when kmalloc
fails.

* tag 'for-linus-v3.10-rc1-2' of git://oss.sgi.com/xfs/xfs:
xfs: fallback to vmalloc for large buffers in xfs_compat_attrlist_by_handle
xfs: fallback to vmalloc for large buffers in xfs_attrlist_by_handle
xfs: introduce CONFIG_XFS_WARN

Linus Torvalds
2013-05-10 04:06:20 +0800
91c2e0bca unify compat fanotify_mark(2), switch to COMPAT_SYSCALL_DEFINE ... Browse Code »

Signed-off-by: Al Viro

Al Viro
2013-05-10 01:46:38 +0800
39dfe6c6a ecryptfs: don't open-code kernel_read() ... Browse Code »

Signed-off-by: Al Viro

Al Viro
2013-05-10 01:39:58 +0800
8cbc95ee7 Merge tag 'nfs-for-3.10-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs ... Browse Code »

Pull more NFS client bugfixes from Trond Myklebust:

- Ensure that we match the 'sec=' mount flavour against the server list

- Fix the NFSv4 byte range locking in the presence of delegations

- Ensure that we conform to the NFSv4.1 spec w.r.t. freeing lock
stateids

- Fix a pNFS data server connection race

* tag 'nfs-for-3.10-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
NFS4.1 Fix data server connection race
NFSv3: match sec= flavor against server list
NFSv4.1: Ensure that we free the lock stateid on the server
NFSv4: Convert nfs41_free_stateid to use an asynchronous RPC call
SUNRPC: Don't spam syslog with "Pseudoflavor not found" messages
NFSv4.x: Fix handling of partially delegated locks

Linus Torvalds
2013-05-10 01:24:54 +0800
7255e716b nfsd: fix oops when legacy_recdir_name_error is passed a -ENOENT error ... Browse Code »

Toralf reported the following oops to the linux-nfs mailing list:

-----------------[snip]------------------
NFSD: unable to generate recoverydir name (-2).
NFSD: disabling legacy clientid tracking. Reboot recovery will not function correctly!
BUG: unable to handle kernel NULL pointer dereference at 000003c8
IP: [] nfsd4_client_tracking_exit+0x11/0x50 [nfsd]
*pdpt = 000000002ba33001 *pde = 0000000000000000
Oops: 0000 [#1] SMP
Modules linked in: loop nfsd auth_rpcgss ipt_MASQUERADE xt_owner xt_multiport ipt_REJECT xt_tcpudp xt_recent xt_conntrack nf_conntrack_ftp xt_limit xt_LOG iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_filter ip_tables x_tables af_packet pppoe pppox ppp_generic slhc bridge stp llc tun arc4 iwldvm mac80211 coretemp kvm_intel uvcvideo sdhci_pci sdhci mmc_core videobuf2_vmalloc videobuf2_memops usblp videobuf2_core i915 iwlwifi psmouse videodev cfg80211 kvm fbcon bitblit cfbfillrect acpi_cpufreq mperf evdev softcursor font cfbimgblt i2c_algo_bit cfbcopyarea intel_agp intel_gtt drm_kms_helper snd_hda_codec_conexant drm agpgart fb fbdev tpm_tis thinkpad_acpi tpm nvram e1000e rfkill thermal ptp wmi pps_core tpm_bios 8250_pci processor 8250 ac snd_hda_intel snd_hda_codec snd_pcm battery video i2c_i801 snd_page_alloc snd_timer button serial_core i2c_core snd soundcore thermal_sys hwmon aesni_intel ablk_helper cryp
td lrw aes_i586 xts gf128mul cbc fuse nfs lockd sunrpc dm_crypt dm_mod hid_monterey hid_microsoft hid_logitech hid_ezkey hid_cypress hid_chicony hid_cherry hid_belkin hid_apple hid_a4tech hid_generic usbhid hid sr_mod cdrom sg [last unloaded: microcode]
Pid: 6374, comm: nfsd Not tainted 3.9.1 #6 LENOVO 4180F65/4180F65
EIP: 0060:[] EFLAGS: 00010202 CPU: 0
EIP is at nfsd4_client_tracking_exit+0x11/0x50 [nfsd]
EAX: 00000000 EBX: fffffffe ECX: 00000007 EDX: 00000007
ESI: eb9dcb00 EDI: eb2991c0 EBP: eb2bde38 ESP: eb2bde34
DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
CR0: 80050033 CR2: 000003c8 CR3: 2ba80000 CR4: 000407f0
DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
DR6: ffff0ff0 DR7: 00000400
Process nfsd (pid: 6374, ti=eb2bc000 task=eb2711c0 task.ti=eb2bc000)
Stack:
fffffffe eb2bde4c f90a3e0c f90a7754 fffffffe eb0a9c00 eb2bdea0 f90a41ed
eb2991c0 1b270000 eb2991c0 eb2bde7c f9099ce9 eb2bde98 0129a020 eb29a020
eb2bdecc eb2991c0 eb2bdea8 f9099da5 00000000 eb9dcb00 00000001 67822f08
Call Trace:
[] legacy_recdir_name_error+0x3c/0x40 [nfsd]
[] nfsd4_create_clid_dir+0x15d/0x1c0 [nfsd]
[] ? nfsd4_lookup_stateid+0x99/0xd0 [nfsd]
[] ? nfs4_preprocess_seqid_op+0x85/0x100 [nfsd]
[] nfsd4_client_record_create+0x37/0x50 [nfsd]
[] nfsd4_open_confirm+0xfe/0x130 [nfsd]
[] ? nfsd4_encode_operation+0x61/0x90 [nfsd]
[] ? nfsd4_free_stateid+0xc0/0xc0 [nfsd]
[] nfsd4_proc_compound+0x41b/0x530 [nfsd]
[] nfsd_dispatch+0x8b/0x1a0 [nfsd]
[] svc_process+0x3dd/0x640 [sunrpc]
[] nfsd+0xad/0x110 [nfsd]
[] ? nfsd_destroy+0x70/0x70 [nfsd]
[] kthread+0x94/0xa0
[] ret_from_kernel_thread+0x1b/0x28
[] ? flush_kthread_work+0xd0/0xd0
Code: 86 b0 00 00 00 90 c5 0a f9 c7 04 24 70 76 0a f9 e8 74 a9 3d c8 eb ba 8d 76 00 55 89 e5 53 66 66 66 66 90 8b 15 68 c7 0a f9 85 d2 88 c8 03 00 00 74 2c 3b 11 77 28 8b 5c 91 08 85 db 74 22 8b
EIP: [] nfsd4_client_tracking_exit+0x11/0x50 [nfsd] SS:ESP 0068:eb2bde34
CR2: 00000000000003c8
---[ end trace 09e54015d145c9c6 ]---

The problem appears to be a regression that was introduced in commit
9a9c6478 "nfsd: make NFSv4 recovery client tracking options per net".
Prior to that commit, it was safe to pass a NULL net pointer to
nfsd4_client_tracking_exit in the legacy recdir case, and
legacy_recdir_name_error did so. After that comit, the net pointer must
be valid.

This patch just fixes legacy_recdir_name_error to pass in a valid net
pointer to that function.

Cc: # v3.8+
Cc: Stanislav Kinsbursky
Reported-and-tested-by: Toralf Förster
Signed-off-by: Jeff Layton
Signed-off-by: J. Bruce Fields

Jeff Layton
2013-05-10 00:58:36 +0800

09 May, 2013

4 commits

942d33da9 Merge tag 'f2fs-for-v3.10' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs ... Browse Code »

Pull f2fs updates from Jaegeuk Kim:
"This patch-set includes the following major enhancement patches.
- introduce a new gloabl lock scheme
- add tracepoints on several major functions
- fix the overall cleaning process focused on victim selection
- apply the block plugging to merge IOs as much as possible
- enhance management of free nids and its list
- enhance the readahead mode for node pages
- address several cretical deadlock conditions
- reduce lock_page calls

The other minor bug fixes and enhancements are as follows.
- calculation mistakes: overflow
- bio types: READ, READA, and READ_SYNC
- fix the recovery flow, data races, and null pointer errors"

* tag 'f2fs-for-v3.10' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (68 commits)
f2fs: cover free_nid management with spin_lock
f2fs: optimize scan_nat_page()
f2fs: code cleanup for scan_nat_page() and build_free_nids()
f2fs: bugfix for alloc_nid_failed()
f2fs: recover when journal contains deleted files
f2fs: continue to mount after failing recovery
f2fs: avoid deadlock during evict after f2fs_gc
f2fs: modify the number of issued pages to merge IOs
f2fs: remove useless #include as we're now using sysfs as debug entry.
f2fs: fix inconsistent using of NM_WOUT_THRESHOLD
f2fs: check truncation of mapping after lock_page
f2fs: enhance alloc_nid and build_free_nids flows
f2fs: add a tracepoint on f2fs_new_inode
f2fs: check nid == 0 in add_free_nid
f2fs: add REQ_META about metadata requests for submit
f2fs: give a chance to merge IOs by IO scheduler
f2fs: avoid frequent background GC
f2fs: add tracepoints to debug checkpoint request
f2fs: add tracepoints for write page operations
f2fs: add tracepoints to debug the block allocation
...

Linus Torvalds
2013-05-09 06:11:48 +0800
c23266d53 NFS4.1 Fix data server connection race ... Browse Code »

Unlike meta data server mounts which support multiple mount points to
the same server via struct nfs_server, data servers support a single connection.

Concurrent calls to setup the data server connection can race where the first
call allocates the nfs_client struct, and before the cache struct nfs_client
pointer can be set, a second call also tries to setup the connection, finds the
already allocated nfs_client, bumps the reference count, re-initializes the
session,etc. This results in a hanging data server session after umount.

Signed-off-by: Andy Adamson
Signed-off-by: Trond Myklebust

Andy Adamson
2013-05-09 05:19:32 +0800
47110b889 pstore/ram: fix error return code in ramoops_probe() ... Browse Code »

Fix to return a negative error code from the error handling
case instead of 0, as done elsewhere in this function.

Signed-off-by: Wei Yongjun
Acked-by: Kees Cook
Signed-off-by: Tony Luck

Wei Yongjun
2013-05-09 01:24:48 +0800
4de13d7aa Merge branch 'for-3.10/core' of git://git.kernel.dk/linux-block ... Browse Code »

Pull block core updates from Jens Axboe:

- Major bit is Kents prep work for immutable bio vecs.

- Stable candidate fix for a scheduling-while-atomic in the queue
bypass operation.

- Fix for the hang on exceeded rq->datalen 32-bit unsigned when merging
discard bios.

- Tejuns changes to convert the writeback thread pool to the generic
workqueue mechanism.

- Runtime PM framework, SCSI patches exists on top of these in James'
tree.

- A few random fixes.

* 'for-3.10/core' of git://git.kernel.dk/linux-block: (40 commits)
relay: move remove_buf_file inside relay_close_buf
partitions/efi.c: replace useless kzalloc's by kmalloc's
fs/block_dev.c: fix iov_shorten() criteria in blkdev_aio_read()
block: fix max discard sectors limit
blkcg: fix "scheduling while atomic" in blk_queue_bypass_start
Documentation: cfq-iosched: update documentation help for cfq tunables
writeback: expose the bdi_wq workqueue
writeback: replace custom worker pool implementation with unbound workqueue
writeback: remove unused bdi_pending_list
aoe: Fix unitialized var usage
bio-integrity: Add explicit field for owner of bip_buf
block: Add an explicit bio flag for bios that own their bvec
block: Add bio_alloc_pages()
block: Convert some code to bio_for_each_segment_all()
block: Add bio_for_each_segment_all()
bounce: Refactor __blk_queue_bounce to not use bi_io_vec
raid1: use bio_copy_data()
pktcdvd: Use bio_reset() in disabled code to kill bi_idx usage
pktcdvd: use bio_copy_data()
block: Add bio_copy_data()
...

Linus Torvalds
2013-05-09 01:13:35 +0800

08 May, 2013

22 commits

59bbd474a f2fs: cover free_nid management with spin_lock ... Browse Code »

After build_free_nids() searches free nid candidates from nat pages and
current journal blocks, it checks all the candidates if they are allocated
so that the nat cache has its nid with an allocated block address.

In this procedure, previously we used
list_for_each_entry_safe(fnid, next_fnid, &nm_i->free_nid_list, list).
But, this is not covered by free_nid_list_lock, resulting in null pointer bug.

This patch moves this checking routine inside add_free_nid() in order not to use
the spin_lock.

Signed-off-by: Jaegeuk Kim

Jaegeuk Kim
2013-05-08 18:54:22 +0800
23d388442 f2fs: optimize scan_nat_page() ... Browse Code »

When nm_i->fcnt > 2 * MAX_FREE_NIDS, stop scanning other NAT entries.

Signed-off-by: Haicheng Li
[Jaegeuk Kim: fix handling the return value of add_free_nid()]
Signed-off-by: Jaegeuk Kim

Haicheng Li
2013-05-08 18:54:22 +0800
8760952d9 f2fs: code cleanup for scan_nat_page() and build_free_nids() ... Browse Code »

This patch does two cleanups:
1. remove unused variable "fcnt" in build_free_nids().
2. make scan_nat_page() as void type and remove useless variable "fcnt".

Signed-off-by: Haicheng Li
Signed-off-by: Jaegeuk Kim

Haicheng Li
2013-05-08 18:54:21 +0800
95630cbad f2fs: bugfix for alloc_nid_failed() ... Browse Code »

Directly drop the free_nid cache when nm_i->fcnt > 2 * MAX_FREE_NIDS

Since there is NOT nmi->free_nid_list_lock spinlock protection between
a sequential calling of alloc_nid() and alloc_nid_failed(), some other
threads may already add new free_nid to the free_nid_list during this
period.

We need to make sure nmi->fcnt is never > 2 * MAX_FREE_NIDS.

Signed-off-by: Haicheng Li
[Jaegeuk Kim: fit the coding style]
Signed-off-by: Jaegeuk Kim

Haicheng Li
2013-05-08 18:54:20 +0800
047184b42 f2fs: recover when journal contains deleted files ... Browse Code »

When recovering a journal file with fsync data for files that have
been deleted, don't bail out on recovery.

Signed-off-by: Chris Fries
Reviewed-by: Russell Knize
Reviewed-by: Jason Hrycay
[Jaegeuk Kim: fit the coding style]
Signed-off-by: Jaegeuk Kim

Chris Fries
2013-05-08 18:54:20 +0800
bde582b22 f2fs: continue to mount after failing recovery ... Browse Code »

When unable to roll forward the journal, we shouldn't bail out and
not mount, we should continue to attempt the mount. Bad recovery data
is likely unrecoverable at this point, and requiring the user to try
to mount again doesn't solve any issues.

Signed-off-by: Chris Fries
Reviewed-by: Russell Knize
Reviewed-by: Jason Hrycay
Signed-off-by: Jaegeuk Kim

Chris Fries
2013-05-08 18:54:19 +0800
531ad7d58 f2fs: avoid deadlock during evict after f2fs_gc ... Browse Code »

o Deadlock case #1

Thread 1:
- writeback_sb_inodes
- do_writepages
- f2fs_write_data_pages
- write_cache_pages
- f2fs_write_data_page
- f2fs_balance_fs
- wait mutex_lock(gc_mutex)

Thread 2:
- f2fs_balance_fs
- mutex_lock(gc_mutex)
- f2fs_gc
- f2fs_iget
- wait iget_locked(inode->i_lock)

Thread 3:
- do_unlinkat
- iput
- lock(inode->i_lock)
- evict
- inode_wait_for_writeback

o Deadlock case #2

Thread 1:
- __writeback_single_inode
: set I_SYNC
- do_writepages
- f2fs_write_data_page
- f2fs_balance_fs
- f2fs_gc
- iput
- evict
- inode_wait_for_writeback(I_SYNC)

In order to avoid this, even though iput is called with the zero-reference
count, we need to stop the eviction procedure if the inode is on writeback.
So this patch links f2fs_drop_inode which checks the I_SYNC flag.

Signed-off-by: Jaegeuk Kim

Jaegeuk Kim
2013-05-08 18:54:08 +0800
5af43c24c Merge branch 'akpm' (incoming from Andrew) ... Browse Code »

Merge more incoming from Andrew Morton:

- Various fixes which were stalled or which I picked up recently

- A large rotorooting of the AIO code. Allegedly to improve
performance but I don't really have good performance numbers (I might
have lost the email) and I can't raise Kent today. I held this out
of 3.9 and we could give it another cycle if it's all too late/scary.

I ended up taking only the first two thirds of the AIO rotorooting. I
left the percpu parts and the batch completion for later. - Linus

* emailed patches from Andrew Morton : (33 commits)
aio: don't include aio.h in sched.h
aio: kill ki_retry
aio: kill ki_key
aio: give shared kioctx fields their own cachelines
aio: kill struct aio_ring_info
aio: kill batch allocation
aio: change reqs_active to include unreaped completions
aio: use cancellation list lazily
aio: use flush_dcache_page()
aio: make aio_read_evt() more efficient, convert to hrtimers
wait: add wait_event_hrtimeout()
aio: refcounting cleanup
aio: make aio_put_req() lockless
aio: do fget() after aio_get_req()
aio: dprintk() -> pr_debug()
aio: move private stuff out of aio.h
aio: add kiocb_cancel()
aio: kill return value of aio_complete()
char: add aio_{read,write} to /dev/{null,zero}
aio: remove retry-based AIO
...

Linus Torvalds
2013-05-08 11:49:51 +0800
a27bb332c aio: don't include aio.h in sched.h ... Browse Code »

Faster kernel compiles by way of fewer unnecessary includes.

[akpm@linux-foundation.org: fix fallout]
[akpm@linux-foundation.org: fix build]
Signed-off-by: Kent Overstreet
Cc: Zach Brown
Cc: Felipe Balbi
Cc: Greg Kroah-Hartman
Cc: Mark Fasheh
Cc: Joel Becker
Cc: Rusty Russell
Cc: Jens Axboe
Cc: Asai Thambi S P
Cc: Selvan Mani
Cc: Sam Bradshaw
Cc: Jeff Moyer
Cc: Al Viro
Cc: Benjamin LaHaise
Reviewed-by: "Theodore Ts'o"
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Kent Overstreet
2013-05-08 11:16:25 +0800
41ef4eb8e aio: kill ki_retry ... Browse Code »

Thanks to Zach Brown's work to rip out the retry infrastructure, we don't
need this anymore - ki_retry was only called right after the kiocb was
initialized.

This also refactors and trims some duplicated code, as well as cleaning up
the refcounting/error handling a bit.

[akpm@linux-foundation.org: use fmode_t in aio_run_iocb()]
[akpm@linux-foundation.org: fix file_start_write/file_end_write tests]
[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Kent Overstreet
Cc: Zach Brown
Cc: Felipe Balbi
Cc: Greg Kroah-Hartman
Cc: Mark Fasheh
Cc: Joel Becker
Cc: Rusty Russell
Cc: Jens Axboe
Cc: Asai Thambi S P
Cc: Selvan Mani
Cc: Sam Bradshaw
Cc: Jeff Moyer
Cc: Al Viro
Cc: Benjamin LaHaise
Reviewed-by: "Theodore Ts'o"
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Kent Overstreet
2013-05-08 10:46:02 +0800
8a6608907 aio: kill ki_key ... Browse Code »

ki_key wasn't actually used for anything previously - it was always 0.
Drop it to trim struct kiocb a bit.

Signed-off-by: Kent Overstreet
Cc: Zach Brown
Cc: Felipe Balbi
Cc: Greg Kroah-Hartman
Cc: Mark Fasheh
Cc: Joel Becker
Cc: Rusty Russell
Cc: Jens Axboe
Cc: Asai Thambi S P
Cc: Selvan Mani
Cc: Sam Bradshaw
Cc: Jeff Moyer
Cc: Al Viro
Cc: Benjamin LaHaise
Reviewed-by: "Theodore Ts'o"
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Kent Overstreet
2013-05-08 10:41:50 +0800
33e2208ac audit: vfs: fix audit_inode call in O_CREAT case of do_last ... Browse Code »

Jiri reported a regression in auditing of open(..., O_CREAT) syscalls.
In older kernels, creating a file with open(..., O_CREAT) created
audit_name records that looked like this:

type=PATH msg=audit(1360255720.628:64): item=1 name="/abc/foo" inode=138810 dev=fd:00 mode=0100640 ouid=0 ogid=0 rdev=00:00 obj=unconfined_u:object_r:default_t:s0
type=PATH msg=audit(1360255720.628:64): item=0 name="/abc/" inode=138635 dev=fd:00 mode=040750 ouid=0 ogid=0 rdev=00:00 obj=unconfined_u:object_r:default_t:s0

...in recent kernels though, they look like this:

type=PATH msg=audit(1360255402.886:12574): item=2 name=(null) inode=264599 dev=fd:00 mode=0100640 ouid=0 ogid=0 rdev=00:00 obj=unconfined_u:object_r:default_t:s0
type=PATH msg=audit(1360255402.886:12574): item=1 name=(null) inode=264598 dev=fd:00 mode=040750 ouid=0 ogid=0 rdev=00:00 obj=unconfined_u:object_r:default_t:s0
type=PATH msg=audit(1360255402.886:12574): item=0 name="/abc/foo" inode=264598 dev=fd:00 mode=040750 ouid=0 ogid=0 rdev=00:00 obj=unconfined_u:object_r:default_t:s0

Richard bisected to determine that the problems started with commit
bfcec708, but the log messages have changed with some later
audit-related patches.

The problem is that this audit_inode call is passing in the parent of
the dentry being opened, but audit_inode is being called with the parent
flag false. This causes later audit_inode and audit_inode_child calls to
match the wrong entry in the audit_names list.

This patch simply sets the flag to properly indicate that this inode
represents the parent. With this, the audit_names entries are back to
looking like they did before.

Cc: # v3.7+
Reported-by: Jiri Jaburek
Signed-off-by: Jeff Layton
Test By: Richard Guy Briggs
Signed-off-by: Eric Paris

Jeff Layton
2013-05-08 10:27:19 +0800
4e23bcaeb aio: give shared kioctx fields their own cachelines ... Browse Code »

[akpm@linux-foundation.org: make reqs_active __cacheline_aligned_in_smp]
Signed-off-by: Kent Overstreet
Cc: Zach Brown
Cc: Felipe Balbi
Cc: Greg Kroah-Hartman
Cc: Mark Fasheh
Cc: Joel Becker
Cc: Rusty Russell
Cc: Jens Axboe
Cc: Asai Thambi S P
Cc: Selvan Mani
Cc: Sam Bradshaw
Cc: Jeff Moyer
Cc: Al Viro
Cc: Benjamin LaHaise
Reviewed-by: "Theodore Ts'o"
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Kent Overstreet
2013-05-08 09:38:29 +0800
58c85dc20 aio: kill struct aio_ring_info ... Browse Code »

struct aio_ring_info was kind of odd, the only place it's used is where
it's embedded in struct kioctx - there's no real need for it.

The next patch rearranges struct kioctx and puts various things on their
own cachelines - getting rid of struct aio_ring_info now makes that
reordering a bit clearer.

Signed-off-by: Kent Overstreet
Cc: Zach Brown
Cc: Felipe Balbi
Cc: Greg Kroah-Hartman
Cc: Mark Fasheh
Cc: Joel Becker
Cc: Rusty Russell
Cc: Jens Axboe
Cc: Asai Thambi S P
Cc: Selvan Mani
Cc: Sam Bradshaw
Cc: Jeff Moyer
Cc: Al Viro
Cc: Benjamin LaHaise
Reviewed-by: "Theodore Ts'o"
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Kent Overstreet
2013-05-08 09:38:29 +0800
a1c8eae75 aio: kill batch allocation ... Browse Code »

Previously, allocating a kiocb required touching quite a few global
(well, per kioctx) cachelines... so batching up allocation to amortize
those was worthwhile. But we've gotten rid of some of those, and in
another couple of patches kiocb allocation won't require writing to any
shared cachelines, so that means we can just rip this code out.

Signed-off-by: Kent Overstreet
Cc: Zach Brown
Cc: Felipe Balbi
Cc: Greg Kroah-Hartman
Cc: Mark Fasheh
Cc: Joel Becker
Cc: Rusty Russell
Cc: Jens Axboe
Cc: Asai Thambi S P
Cc: Selvan Mani
Cc: Sam Bradshaw
Cc: Jeff Moyer
Cc: Al Viro
Cc: Benjamin LaHaise
Reviewed-by: "Theodore Ts'o"
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Kent Overstreet
2013-05-08 09:38:29 +0800
3e845ce01 aio: change reqs_active to include unreaped completions ... Browse Code »

The aio code tries really hard to avoid having to deal with the
completion ringbuffer overflowing. To do that, it has to keep track of
the number of outstanding kiocbs, and the number of completions
currently in the ringbuffer - and it's got to check that every time we
allocate a kiocb. Ouch.

But - we can improve this quite a bit if we just change reqs_active to
mean "number of outstanding requests and unreaped completions" - that
means kiocb allocation doesn't have to look at the ringbuffer, which is
a fairly significant win.

Signed-off-by: Kent Overstreet
Cc: Zach Brown
Cc: Felipe Balbi
Cc: Greg Kroah-Hartman
Cc: Mark Fasheh
Cc: Joel Becker
Cc: Rusty Russell
Cc: Jens Axboe
Cc: Asai Thambi S P
Cc: Selvan Mani
Cc: Sam Bradshaw
Cc: Jeff Moyer
Cc: Al Viro
Cc: Benjamin LaHaise
Signed-off-by: "Theodore Ts'o"
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Kent Overstreet
2013-05-08 09:38:29 +0800
0460fef2a aio: use cancellation list lazily ... Browse Code »

Cancelling kiocbs requires adding them to a per kioctx linked list,
which is one of the few things we need to take the kioctx lock for in
the fast path. But most kiocbs can't be cancelled - so if we just do
this lazily, we can avoid quite a bit of locking overhead.

While we're at it, instead of using a flag bit switch to using ki_cancel
itself to indicate that a kiocb has been cancelled/completed. This lets
us get rid of ki_flags entirely.

[akpm@linux-foundation.org: remove buggy BUG()]
Signed-off-by: Kent Overstreet
Cc: Zach Brown
Cc: Felipe Balbi
Cc: Greg Kroah-Hartman
Cc: Mark Fasheh
Cc: Joel Becker
Cc: Rusty Russell
Cc: Jens Axboe
Cc: Asai Thambi S P
Cc: Selvan Mani
Cc: Sam Bradshaw
Cc: Jeff Moyer
Cc: Al Viro
Cc: Benjamin LaHaise
Reviewed-by: "Theodore Ts'o"
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Kent Overstreet
2013-05-08 09:38:29 +0800
21b40200c aio: use flush_dcache_page() ... Browse Code »

This wasn't causing problems before because it's not needed on x86, but
it is needed on other architectures.

Signed-off-by: Kent Overstreet
Cc: Zach Brown
Cc: Felipe Balbi
Cc: Greg Kroah-Hartman
Cc: Mark Fasheh
Cc: Joel Becker
Cc: Rusty Russell
Cc: Jens Axboe
Cc: Asai Thambi S P
Cc: Selvan Mani
Cc: Sam Bradshaw
Cc: Jeff Moyer
Cc: Al Viro
Cc: Benjamin LaHaise
Cc: Theodore Ts'o
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Kent Overstreet
2013-05-08 09:38:29 +0800
a31ad380b aio: make aio_read_evt() more efficient, convert to hrtimers ... Browse Code »

Previously, aio_read_event() pulled a single completion off the
ringbuffer at a time, locking and unlocking each time. Change it to
pull off as many events as it can at a time, and copy them directly to
userspace.

This also fixes a bug where if copying the event to userspace failed,
we'd lose the event.

Also convert it to wait_event_interruptible_hrtimeout(), which
simplifies it quite a bit.

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Kent Overstreet
Cc: Zach Brown
Cc: Felipe Balbi
Cc: Greg Kroah-Hartman
Cc: Mark Fasheh
Cc: Joel Becker
Cc: Rusty Russell
Cc: Jens Axboe
Cc: Asai Thambi S P
Cc: Selvan Mani
Cc: Sam Bradshaw
Cc: Jeff Moyer
Cc: Al Viro
Cc: Benjamin LaHaise
Reviewed-by: "Theodore Ts'o"
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Kent Overstreet
2013-05-08 09:38:28 +0800
36f558890 aio: refcounting cleanup ... Browse Code »

The usage of ctx->dead was fubar - it makes no sense to explicitly check
it all over the place, especially when we're already using RCU.

Now, ctx->dead only indicates whether we've dropped the initial
refcount. The new teardown sequence is:

set ctx->dead
hlist_del_rcu();
synchronize_rcu();

Now we know no system calls can take a new ref, and it's safe to drop
the initial ref:

put_ioctx();

We also need to ensure there are no more outstanding kiocbs. This was
done incorrectly - it was being done in kill_ctx(), and before dropping
the initial refcount. At this point, other syscalls may still be
submitting kiocbs!

Now, we cancel and wait for outstanding kiocbs in free_ioctx(), after
kioctx->users has dropped to 0 and we know no more iocbs could be
submitted.

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Kent Overstreet
Cc: Zach Brown
Cc: Felipe Balbi
Cc: Greg Kroah-Hartman
Cc: Mark Fasheh
Cc: Joel Becker
Cc: Rusty Russell
Cc: Jens Axboe
Cc: Asai Thambi S P
Cc: Selvan Mani
Cc: Sam Bradshaw
Cc: Jeff Moyer
Cc: Al Viro
Cc: Benjamin LaHaise
Reviewed-by: "Theodore Ts'o"
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Kent Overstreet
2013-05-08 09:38:28 +0800
11599ebac aio: make aio_put_req() lockless ... Browse Code »

Freeing a kiocb needed to touch the kioctx for three things:

* Pull it off the reqs_active list
* Decrementing reqs_active
* Issuing a wakeup, if the kioctx was in the process of being freed.

This patch moves these to aio_complete(), for a couple reasons:

* aio_complete() already has to issue the wakeup, so if we drop the
kioctx refcount before aio_complete does its wakeup we don't have to
do it twice.
* aio_complete currently has to take the kioctx lock, so it makes sense
for it to pull the kiocb off the reqs_active list too.
* A later patch is going to change reqs_active to include unreaped
completions - this will mean allocating a kiocb doesn't have to look
at the ringbuffer. So taking the decrement of reqs_active out of
kiocb_free() is useful prep work for that patch.

This doesn't really affect cancellation, since existing (usb) code that
implements a cancel function still calls aio_complete() - we just have
to make sure that aio_complete does the necessary teardown for cancelled
kiocbs.

It does affect code paths where we free kiocbs that were never
submitted; they need to decrement reqs_active and pull the kiocb off the
reqs_active list. This occurs in two places: kiocb_batch_free(), which
is going away in a later patch, and the error path in io_submit_one.

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Kent Overstreet
Cc: Zach Brown
Cc: Felipe Balbi
Cc: Greg Kroah-Hartman
Cc: Mark Fasheh
Cc: Joel Becker
Cc: Rusty Russell
Cc: Jens Axboe
Cc: Asai Thambi S P
Cc: Selvan Mani
Cc: Sam Bradshaw
Acked-by: Jeff Moyer
Cc: Al Viro
Cc: Benjamin LaHaise
Reviewed-by: "Theodore Ts'o"
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Kent Overstreet
2013-05-08 09:38:28 +0800
1d98ebfcc aio: do fget() after aio_get_req() ... Browse Code »

aio_get_req() will fail if we have the maximum number of requests
outstanding, which depending on the application may not be uncommon. So
avoid doing an unnecessary fget().

Signed-off-by: Kent Overstreet
Cc: Zach Brown
Cc: Felipe Balbi
Cc: Greg Kroah-Hartman
Cc: Mark Fasheh
Cc: Joel Becker
Cc: Rusty Russell
Cc: Jens Axboe
Cc: Asai Thambi S P
Cc: Selvan Mani
Cc: Sam Bradshaw
Acked-by: Jeff Moyer
Cc: Al Viro
Cc: Benjamin LaHaise
Reviewed-by: "Theodore Ts'o"
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Kent Overstreet
2013-05-08 09:38:28 +0800