17 Apr, 2009
1 commit
-
splice: fix kernel-doc warnings
Warning(fs/splice.c:617): bad line:
Warning(fs/splice.c:722): No description found for parameter 'sd'
Warning(fs/splice.c:722): Excess function parameter 'pipe' description in 'splice_from_pipe_begin'
Signed-off-by: Randy Dunlap
Signed-off-by: Linus Torvalds
15 Apr, 2009
6 commits
-
There are lots of sequences like this, especially in splice code:
    if (pipe->inode)
        mutex_lock(&pipe->inode->i_mutex);
    /* do something */
    if (pipe->inode)
        mutex_unlock(&pipe->inode->i_mutex);
so introduce helpers which do the conditional locking and unlocking.
Also replace the inode_double_lock() call with a pipe_double_lock()
helper to avoid spreading the use of this functionality beyond the
pipe code.
This patch is just a cleanup, and should cause no behavioral changes.
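A minimal user-space sketch of such conditional helpers, assuming pthread mutexes in place of i_mutex. The demo_* names and the lock_depth counter are hypothetical and exist only for illustration; they are not the kernel's helpers.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical user-space stand-ins for the kernel's inode and pipe. */
struct demo_inode {
    pthread_mutex_t i_mutex;
    int lock_depth;             /* bookkeeping for the demo only */
};

struct demo_pipe {
    struct demo_inode *inode;   /* may be NULL, as in the sequence above */
};

/* Take the inode mutex only when the pipe actually has an inode. */
static void demo_pipe_lock(struct demo_pipe *pipe)
{
    if (pipe->inode) {
        pthread_mutex_lock(&pipe->inode->i_mutex);
        pipe->inode->lock_depth++;
    }
}

/* Mirror image of demo_pipe_lock(). */
static void demo_pipe_unlock(struct demo_pipe *pipe)
{
    if (pipe->inode) {
        pipe->inode->lock_depth--;
        pthread_mutex_unlock(&pipe->inode->i_mutex);
    }
}

/* Returns the lock depth observed while "doing something". */
static int demo_with_pipe_locked(struct demo_pipe *pipe)
{
    int depth;

    demo_pipe_lock(pipe);
    depth = pipe->inode ? pipe->inode->lock_depth : 0;
    demo_pipe_unlock(pipe);
    return depth;
}
```

Callers then write lock/work/unlock without repeating the NULL check at every site.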
Signed-off-by: Miklos Szeredi
Signed-off-by: Jens Axboe
-
Remove the now unused generic_file_splice_write_nolock() function.
It's conceptually broken anyway, because splice may need to wait for
pipe events so holding locks across the whole operation is wrong.
Signed-off-by: Miklos Szeredi
Signed-off-by: Jens Axboe
-
Rearrange locking of i_mutex on destination and call to
ocfs2_rw_lock() so locks are only held while buffers are copied with
the pipe_to_file() actor, and not while waiting for more data on the
pipe.
Signed-off-by: Miklos Szeredi
Signed-off-by: Jens Axboe
-
Rearrange locking of i_mutex on destination so it's only held while
buffers are copied with the pipe_to_file() actor, and not while
waiting for more data on the pipe.
Signed-off-by: Miklos Szeredi
Signed-off-by: Jens Axboe
-
splice_from_pipe() is only called from two places:
- generic_splice_sendpage()
- splice_write_null()
Neither of these requires i_mutex to be taken on the destination inode.
Signed-off-by: Miklos Szeredi
Signed-off-by: Jens Axboe
-
Split up __splice_from_pipe() into four helper functions:
splice_from_pipe_begin()
splice_from_pipe_next()
splice_from_pipe_feed()
splice_from_pipe_end()
splice_from_pipe_next() will wait (if necessary) for more buffers to
be added to the pipe. splice_from_pipe_feed() will feed the buffers
to the supplied actor and return when there's no more data available
(or if all of the requested data has been copied).
This is necessary so that implementations can do locking around the
non-waiting splice_from_pipe_feed().
This patch should not cause any change in behavior.
Signed-off-by: Miklos Szeredi
Signed-off-by: Jens Axboe
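The begin/next/feed/end split can be sketched in user space as follows. This is an illustration under stated assumptions, not the kernel code: the demo_* types and functions are hypothetical, the "pipe" is a flat array rather than a ring, and the waiting step only reports whether buffers are queued instead of sleeping.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

struct demo_pipe {
    size_t buf_len[8];   /* bytes left in each queued buffer */
    size_t nrbufs;       /* buffers currently queued */
    size_t curbuf;       /* index of the first queued buffer */
};

struct demo_desc {
    size_t total_len;    /* bytes still requested */
    size_t num_spliced;  /* bytes handed to the actor so far */
};

typedef size_t (*demo_actor)(size_t len, void *ctx);

static void demo_begin(struct demo_desc *sd) { sd->num_spliced = 0; }

/* The step that may wait in the kernel; here: 1 if data is queued, else 0. */
static int demo_next(struct demo_pipe *pipe) { return pipe->nrbufs > 0; }

/* Non-waiting: feed queued buffers to the actor. Returns 1 if more is wanted. */
static int demo_feed(struct demo_pipe *pipe, struct demo_desc *sd,
                     demo_actor actor, void *ctx)
{
    while (pipe->nrbufs > 0 && sd->total_len > 0) {
        size_t len = pipe->buf_len[pipe->curbuf];
        size_t done;

        if (len > sd->total_len)
            len = sd->total_len;
        done = actor(len, ctx);
        if (done == 0)
            return 0;                   /* actor made no progress: stop */
        sd->num_spliced += done;
        sd->total_len -= done;
        pipe->buf_len[pipe->curbuf] -= done;
        if (pipe->buf_len[pipe->curbuf] == 0) {
            pipe->curbuf++;             /* buffer fully consumed */
            pipe->nrbufs--;
        }
    }
    return sd->total_len > 0;
}

static void demo_end(struct demo_desc *sd) { (void)sd; }

/* The destination lock is held only around the non-waiting feed step. */
static size_t demo_splice_locked(struct demo_pipe *pipe, size_t len,
                                 pthread_mutex_t *dest_lock,
                                 demo_actor actor, void *ctx)
{
    struct demo_desc sd = { len, 0 };
    int more;

    demo_begin(&sd);
    do {
        more = demo_next(pipe);         /* no destination lock held here */
        if (more) {
            pthread_mutex_lock(dest_lock);
            more = demo_feed(pipe, &sd, actor, ctx);
            pthread_mutex_unlock(dest_lock);
        }
    } while (more);
    demo_end(&sd);
    return sd.num_spliced;
}

/* Example actor: counts the bytes it is given and consumes all of them. */
static size_t demo_count_actor(size_t len, void *ctx)
{
    *(size_t *)ctx += len;
    return len;
}
```

The point of the structure is that the lock scope covers only the copy, never the wait for more pipe data.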
07 Apr, 2009
1 commit
-
There's a possible deadlock in generic_file_splice_write(),
splice_from_pipe() and ocfs2_file_splice_write():
- task A calls generic_file_splice_write()
- this calls inode_double_lock(), which locks i_mutex on both
  pipe->inode and target inode
- ordering depends on inode pointers, can happen that pipe->inode is
  locked first
- __splice_from_pipe() needs more data, calls pipe_wait()
- this releases lock on pipe->inode, goes to interruptible sleep
- task B calls generic_file_splice_write(), similarly to the first
- this locks pipe->inode, then tries to lock inode, but that is
  already held by task A
- task A is interrupted, it tries to lock pipe->inode, but fails, as
  it is already held by task B
- ABBA deadlock
Fix this by explicitly ordering locks: the outer lock must be on
target inode and the inner lock (which is later unlocked and relocked)
must be on pipe->inode. This is OK, pipe inodes and target inodes
form two nonoverlapping sets, generic_file_splice_write() and friends
are not called with a target which is a pipe.
Signed-off-by: Miklos Szeredi
Acked-by: Mark Fasheh
Acked-by: Jens Axboe
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds
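The fixed ordering described in this commit can be sketched with two pthread mutexes. This is an assumption-laden illustration: demo_locks and its functions are hypothetical, and the order[] array exists only so the acquisition order can be inspected.

```c
#include <assert.h>
#include <pthread.h>

enum { DEMO_TARGET = 1, DEMO_PIPE = 2 };

struct demo_locks {
    pthread_mutex_t target;  /* stands in for the target inode's i_mutex */
    pthread_mutex_t pipe;    /* stands in for pipe->inode's i_mutex */
    int order[2];            /* records acquisition order, demo only */
    int taken;
};

/* Always outer (target) first, inner (pipe) second, regardless of addresses. */
static void demo_lock_both(struct demo_locks *l)
{
    pthread_mutex_lock(&l->target);
    l->order[l->taken++] = DEMO_TARGET;
    pthread_mutex_lock(&l->pipe);
    l->order[l->taken++] = DEMO_PIPE;
}

/*
 * Model of waiting for pipe data: only the inner lock is dropped and
 * retaken. The outer lock stays held, so no other task can slip in
 * between and acquire the two locks in the opposite order.
 */
static void demo_wait_for_data(struct demo_locks *l)
{
    pthread_mutex_unlock(&l->pipe);
    /* a real implementation would sleep here */
    pthread_mutex_lock(&l->pipe);
}

static void demo_unlock_both(struct demo_locks *l)
{
    pthread_mutex_unlock(&l->pipe);
    pthread_mutex_unlock(&l->target);
    l->taken = 0;
}
```

Because every task takes the target lock before the pipe lock, the relock after the wait can never be the second half of an ABBA pair.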
03 Apr, 2009
1 commit
-
Recruit a page flag to aid in cache management. The following extra flag is
defined:
(1) PG_fscache (PG_private_2)
    The marked page is backed by a local cache and is pinning resources in the
    cache driver.
If PG_fscache is set, then things that checked for PG_private will now also
check for that. This includes things like truncation and page invalidation.
The function page_has_private() has been added to make the checks for both
PG_private and PG_private_2 at the same time.
Signed-off-by: David Howells
Acked-by: Steve Dickson
Acked-by: Trond Myklebust
Acked-by: Rik van Riel
Acked-by: Al Viro
Tested-by: Daire Byrne
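A combined check of the kind page_has_private() performs might look like the sketch below. The bit positions and the demo_* names are assumptions made for this example; they are not the kernel's flag values or definitions.

```c
#include <assert.h>
#include <stdbool.h>

/* Bit values are assumptions for this sketch, not the kernel's. */
enum demo_pageflags {
    DEMO_PG_private   = 1 << 0,
    DEMO_PG_private_2 = 1 << 1,
    DEMO_PG_fscache   = DEMO_PG_private_2,  /* alias, as in the commit text */
    DEMO_PG_dirty     = 1 << 2
};

struct demo_page {
    unsigned long flags;
};

/* One test covers both private bits, so callers need a single check. */
static bool demo_page_has_private(const struct demo_page *page)
{
    return (page->flags & (DEMO_PG_private | DEMO_PG_private_2)) != 0;
}
```

Code paths such as truncation or invalidation would then call the combined test instead of checking PG_private alone.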
14 Jan, 2009
1 commit
-
Signed-off-by: Heiko Carstens
09 Jan, 2009
1 commit
-
A big patch for changing memcg's LRU semantics.
Now,
- page_cgroup is linked to mem_cgroup's own LRU (per zone).
- LRU of page_cgroup is not synchronous with global LRU.
- page and page_cgroup are one-to-one and statically allocated.
- To find which LRU a page_cgroup is on, you have to check pc->mem_cgroup as
  - lru = page_cgroup_zoneinfo(pc, nid_of_pc, zid_of_pc);
- SwapCache is handled.
And, when we handle the LRU list of page_cgroup, we do the following.
    pc = lookup_page_cgroup(page);
    lock_page_cgroup(pc); .....................(1)
    mz = page_cgroup_zoneinfo(pc);
    spin_lock(&mz->lru_lock);
    .....add to LRU
    spin_unlock(&mz->lru_lock);
    unlock_page_cgroup(pc);
But (1) is a spin_lock and we have to be afraid of deadlock with zone->lru_lock.
So, trylock() is used at (1), now. Without (1), we can't trust that "mz" is correct.
This is a trial to remove this dirty nesting of locks.
This patch changes mz->lru_lock to be zone->lru_lock.
Then, the above sequence will be written as
    spin_lock(&zone->lru_lock); # in vmscan.c or swap.c via global LRU
    mem_cgroup_add/remove/etc_lru() {
        pc = lookup_page_cgroup(page);
        mz = page_cgroup_zoneinfo(pc);
        if (PageCgroupUsed(pc)) {
            ....add to LRU
        }
    }
    spin_unlock(&zone->lru_lock); # in vmscan.c or swap.c via global LRU
This is much simpler.
(*) We're safe even if we don't take lock_page_cgroup(pc). Because..
1. When pc->mem_cgroup can be modified.
   - at charge.
   - at account_move().
2. at charge
   the PCG_USED bit is not set before pc->mem_cgroup is fixed.
3. at account_move()
   the page is isolated and not on LRU.
Pros.
- easy for maintenance.
- memcg can make use of laziness of pagevec.
- we don't have to duplicate the LRU/Active/Unevictable bit in page_cgroup.
- LRU status of memcg will be synchronized with global LRU's one.
- # of locks are reduced.
- account_move() is simplified very much.
Cons.
- may increase cost of LRU rotation.
  (no impact if memcg is not configured.)
Signed-off-by: KAMEZAWA Hiroyuki
Cc: Li Zefan
Cc: Balbir Singh
Cc: Pavel Emelyanov
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
31 Oct, 2008
1 commit
-
Nothing uses prepare_write or commit_write. Remove them from the tree
completely.
[akpm@linux-foundation.org: schedule simple_prepare_write() for unexporting]
Signed-off-by: Nick Piggin
Cc: Christoph Hellwig
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
10 Oct, 2008
1 commit
-
This is debatable, but while we're debating it, let's disallow the
combination of splice and an O_APPEND destination.
It's not entirely clear what the semantics of O_APPEND should be, and
POSIX apparently expects pwrite() to ignore O_APPEND, for example. So
we could make up any semantics we want, including the old ones.
But Miklos convinced me that we should at least give it some thought,
and that accepting writes at arbitrary offsets is wrong at least for
IS_APPEND() files (which always have O_APPEND set, even if the reverse
isn't true: you can obviously have O_APPEND set on a regular file).
So disallow O_APPEND entirely for now. I doubt anybody cares, and this
way we have one less gray area to worry about.
Reported-and-argued-for-by: Miklos Szeredi
Acked-by: Jens Axboe
Signed-off-by: Linus Torvalds
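A check of this shape could be sketched as below. The function name is hypothetical, and the commit message does not name the error code; -EINVAL is an assumption chosen for the example.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>

/*
 * Hypothetical pre-flight check for a splice destination.
 * f_flags stands for the destination file's open flags.
 * Returns 0 if splicing may proceed, a negative errno otherwise.
 */
static int demo_splice_dest_check(int f_flags)
{
    if (f_flags & O_APPEND)
        return -EINVAL;   /* assumed error code, see lead-in */
    return 0;
}
```

Rejecting the combination up front avoids having to define what a splice at an explicit offset means for an append-only destination.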
05 Aug, 2008
1 commit
-
Converting page lock to new locking bitops requires a change of page flag
operation naming, so we might as well convert it to something nicer
(!TestSetPageLocked_Lock => trylock_page, SetPageLocked => set_page_locked).
This also facilitates lockdeping of page lock.
Signed-off-by: Nick Piggin
Acked-by: KOSAKI Motohiro
Acked-by: Peter Zijlstra
Acked-by: Andrew Morton
Acked-by: Benjamin Herrenschmidt
Signed-off-by: Linus Torvalds
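The naming change is easiest to see side by side. The sketch below uses a C11 atomic_flag in place of the page lock bit; the demo_* names are hypothetical and the memory orderings are an assumption about what a lock-style bitop would want.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

struct demo_page {
    atomic_flag locked;
};

/* Old style: returns the previous bit, so "true" means the lock was NOT taken. */
static bool demo_test_set_locked(struct demo_page *p)
{
    return atomic_flag_test_and_set_explicit(&p->locked, memory_order_acquire);
}

/* New style: returns true when the caller now owns the lock. */
static bool demo_trylock_page(struct demo_page *p)
{
    return !demo_test_set_locked(p);
}

static void demo_unlock_page(struct demo_page *p)
{
    atomic_flag_clear_explicit(&p->locked, memory_order_release);
}
```

With the trylock form, call sites read as "if (trylock) { ... }" rather than testing a negated test-and-set.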
27 Jul, 2008
2 commits
-
All calls to remove_suid() are made with a file pointer, because
(similarly to file_update_time) it is called when the file is written.
Clean up callers by passing in a file instead of a dentry.
Signed-off-by: Miklos Szeredi
-
Use get_user_pages_fast in splice. This reverts some mmap_sem batching
there, however the biggest problem with mmap_sem tends to be hold times
blocking out other threads rather than cacheline bouncing. Further: on
architectures that implement get_user_pages_fast without locks, mmap_sem
can be avoided completely anyway.
Signed-off-by: Nick Piggin
Cc: Dave Kleikamp
Cc: Andy Whitcroft
Cc: Ingo Molnar
Cc: Thomas Gleixner
Cc: Andi Kleen
Cc: Dave Kleikamp
Cc: Badari Pulavarty
Cc: Zach Brown
Cc: Jens Axboe
Reviewed-by: Peter Zijlstra
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
04 Jul, 2008
1 commit
-
If a page was invalidated during splicing from file to a pipe, then
generic_file_splice_read() could return a short or zero count.
This manifested itself in rare I/O errors seen on nfs exported fuse
filesystems. This is because nfsd uses splice_direct_to_actor() to read
files, and fuse uses invalidate_inode_pages2() to invalidate stale data on
open.
Fix by redoing the page find/create if it was found to be truncated
(invalidated).
Signed-off-by: Miklos Szeredi
Signed-off-by: Andrew Morton
Signed-off-by: Jens Axboe
28 May, 2008
2 commits
-
splice currently assumes that try_to_release_page() always succeeds,
but it can return failure. If it does, we cannot steal the page.
Acked-by: Mingming Cao
-
Splice isn't always incrementing the ppos correctly, which broke
relay splice.
Signed-off-by: Tom Zanussi
Tested-by: Dan Williams
Signed-off-by: Jens Axboe
08 May, 2008
1 commit
-
This reverts commit c3270e577c18b3d0e984c3371493205a4807db9d.
07 May, 2008
1 commit
-
generic_file_splice_write() duplicates remove_suid() just because it
doesn't hold i_mutex. But it grabs i_mutex inside splice_from_pipe()
anyway, so this is rather pointless.
Move locking to generic_file_splice_write() and call remove_suid() and
__splice_from_pipe() instead.
Signed-off-by: Miklos Szeredi
Signed-off-by: Jens Axboe
29 Apr, 2008
1 commit
-
Splice isn't always incrementing the ppos correctly, which broke
relay splice.
Signed-off-by: Tom Zanussi
Signed-off-by: Jens Axboe
10 Apr, 2008
1 commit
-
There's a quirky loop in generic_file_splice_read() that could go
on indefinitely, if the file splice returns 0 permanently (and not
just as a temporary condition). Get rid of the loop and pass
back -EAGAIN correctly from __generic_file_splice_read(), so we
handle that condition properly as well.
Signed-off-by: Jens Axboe
04 Apr, 2008
1 commit
-
The loop block driver is careful to mask __GFP_IO|__GFP_FS out of its
mapping_gfp_mask, to avoid hangs under memory pressure. But nowadays
it uses splice, usually going through __generic_file_splice_read. That
must use mapping_gfp_mask instead of GFP_KERNEL to avoid those hangs.
Signed-off-by: Hugh Dickins
Cc: Jens Axboe
Cc: Andrew Morton
Signed-off-by: Linus Torvalds
04 Mar, 2008
1 commit
-
sys_tee() currently is a bit eager in returning -EAGAIN, it may do so
even if we don't have a chance of any more data becoming available. So
improve the logic and only return -EAGAIN if we have an attached writer
to the input pipe.
Reported by Johann Felix Soden and Patrick McManus.
Tested-by: Johann Felix Soden
Signed-off-by: Jens Axboe
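The improved decision can be sketched as a small pure function. The struct, the function name and the three-way return convention are hypothetical; the sketch only models the non-blocking case described above.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical snapshot of the input pipe's state. */
struct demo_pipe_state {
    unsigned int nrbufs;   /* buffers queued on the input pipe */
    unsigned int writers;  /* writers still attached to the input pipe */
};

/*
 * Returns 1 if data is available, 0 if no data can ever arrive
 * (no writer attached), and -EAGAIN only when a writer could still
 * produce data later.
 */
static int demo_tee_input_ready(const struct demo_pipe_state *in)
{
    if (in->nrbufs > 0)
        return 1;
    if (in->writers == 0)
        return 0;
    return -EAGAIN;
}
```

A caller polling on -EAGAIN therefore never spins on a pipe whose last writer has gone away.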
11 Feb, 2008
1 commit
-
Commit 8811930dc74a503415b35c4a79d14fb0b408a361 ("splice: missing user
pointer access verification") added the proper access_ok() calls to
copy_from_user_mmap_sem() which ensures we can copy the struct iovecs
from userspace to the kernel.
But we also must check whether we can access the actual memory region
pointed to by the struct iovec to fix the access checks properly.
Signed-off-by: Bastian Blank
Acked-by: Oliver Pinter
Cc: Jens Axboe
Cc: Andrew Morton
Signed-off-by: Pekka Enberg
Signed-off-by: Linus Torvalds
09 Feb, 2008
1 commit
-
vmsplice_to_user() must always check the user pointer and length
with access_ok() before copying. Likewise, for the slow path of
copy_from_user_mmap_sem() we need to check that we may read from
the user region.
Signed-off-by: Jens Axboe
Cc: Wojciech Purczynski
Signed-off-by: Greg Kroah-Hartman
Signed-off-by: Linus Torvalds
01 Feb, 2008
1 commit
-
Andre Majorel points out that if we only updated
the atime when we transfer some data, we deviate from the standard
of always updating the atime. So change splice to always call
file_accessed() even if splice_direct_to_actor() didn't transfer
any data.
Signed-off-by: Jens Axboe
30 Jan, 2008
1 commit
-
A bug report on nfsd states that since it was switched to use
splice instead of sendfile, the atime was no longer being updated
on the input file. do_generic_mapping_read() does this when accessing
the file, so make splice do it for the direct splice handler.
Signed-off-by: Jens Axboe
29 Jan, 2008
1 commit
-
Allow caller to pass in a release function, there might be
other resources that need releasing as well. Needed for
network receive.
Signed-off-by: Jens Axboe
Signed-off-by: David S. Miller
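A caller-supplied release hook of this kind might be sketched as follows. The demo_spd layout, the callback signature and the helper names are assumptions for illustration and do not reproduce the kernel's descriptor.

```c
#include <assert.h>
#include <stddef.h>

struct demo_spd;
typedef void (*demo_release_fn)(struct demo_spd *spd, unsigned int i);

/* Hypothetical descriptor: one resource per page-like slot. */
struct demo_spd {
    int *resources;
    unsigned int nr;
    demo_release_fn release;   /* supplied by the caller */
};

/*
 * Only the first `used` slots went into the pipe; hand the rest back
 * through the caller's release function. Returns the number released.
 */
static unsigned int demo_release_unused(struct demo_spd *spd, unsigned int used)
{
    unsigned int released = 0;

    for (unsigned int i = used; i < spd->nr; i++) {
        spd->release(spd, i);
        released++;
    }
    return released;
}

/* Example callback: marks the slot as freed. */
static void demo_release_slot(struct demo_spd *spd, unsigned int i)
{
    spd->resources[i] = 0;
}
```

Passing the hook in lets each caller free whatever is attached to a slot, not just a page reference.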
25 Jan, 2008
1 commit
-
All instances of rw_verify_area() are followed by a call to
security_file_permission(), so just call the latter from the former.
Acked-by: Eric Paris
Signed-off-by: James Morris
17 Oct, 2007
4 commits
-
Implement file posix capabilities. This allows programs to be given a
subset of root's powers regardless of who runs them, without having to use
setuid and giving the binary all of root's powers.
This version works with Kaigai Kohei's userspace tools, found at
http://www.kaigai.gr.jp/index.php. For more information on how to use this
patch, Chris Friedhoff has posted a nice page at
http://www.friedhoff.org/fscaps.html.
Changelog:
Nov 27:
    Incorporate fixes from Andrew Morton
    (security-introduce-file-caps-tweaks and
    security-introduce-file-caps-warning-fix)
    Fix Kconfig dependency.
    Fix change signaling behavior when file caps are not compiled in.
Nov 13:
    Integrate comments from Alexey: Remove CONFIG_ ifdef from
    capability.h, and use %zd for printing a size_t.
Nov 13:
    Fix endianness warnings by sparse as suggested by Alexey
    Dobriyan.
Nov 09:
    Address warnings of unused variables at cap_bprm_set_security
    when file capabilities are disabled, and simultaneously clean
    up the code a little, by pulling the new code into a helper
    function.
Nov 08:
    For pointers to required userspace tools and how to use
    them, see http://www.friedhoff.org/fscaps.html.
Nov 07:
    Fix the calculation of the highest bit checked in
    check_cap_sanity().
Nov 07:
    Allow file caps to be enabled without CONFIG_SECURITY, since
    capabilities are the default.
    Hook cap_task_setscheduler when !CONFIG_SECURITY.
    Move capable(TASK_KILL) to end of cap_task_kill to reduce
    audit messages.
Nov 05:
    Add secondary calls in selinux/hooks.c to task_setioprio and
    task_setscheduler so that selinux and capabilities with file
    cap support can be stacked.
Sep 05:
    As Seth Arnold points out, uid checks are out of place
    for capability code.
Sep 01:
    Define task_setscheduler, task_setioprio, cap_task_kill, and
    task_setnice to make sure a user cannot affect a process in which
    they called a program with some fscaps.
    One remaining question is the note under task_setscheduler: are we
    ok with CAP_SYS_NICE being sufficient to confine a process to a
    cpuset?
    It is a semantic change, as without fscaps, attach_task doesn't
    allow CAP_SYS_NICE to override the uid equivalence check. But since
    it uses security_task_setscheduler, which elsewhere is used where
    CAP_SYS_NICE can be used to override the uid equivalence check,
    fixing it might be tough.
    task_setscheduler
        note: this also controls cpuset:attach_task. Are we ok with
        CAP_SYS_NICE being used to confine to a cpuset?
    task_setioprio
    task_setnice
        sys_setpriority uses this (through set_one_prio) for another
        process. Need same checks as setrlimit
Aug 21:
    Updated secureexec implementation to reflect the fact that
    euid and uid might be the same and nonzero, but the process
    might still have elevated caps.
Aug 15:
    Handle endianness of xattrs.
    Enforce capability version match between kernel and disk.
    Enforce that no bits beyond the known max capability are
    set, else return -EPERM.
    With this extra processing, it may be worth reconsidering
    doing all the work at bprm_set_security rather than
    d_instantiate.
Aug 10:
    Always call getxattr at bprm_set_security, rather than
    caching it at d_instantiate.
[morgan@kernel.org: file-caps clean up for linux/capability.h]
[bunk@kernel.org: unexport cap_inode_killpriv]
Signed-off-by: Serge E. Hallyn
Cc: Stephen Smalley
Cc: James Morris
Cc: Chris Wright
Cc: Andrew Morgan
Signed-off-by: Andrew Morgan
Signed-off-by: Adrian Bunk
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
-
* 'for-linus' of git://git.kernel.dk/data/git/linux-2.6-block: (63 commits)
Fix memory leak in dm-crypt
SPARC64: sg chaining support
SPARC: sg chaining support
PPC: sg chaining support
PS3: sg chaining support
IA64: sg chaining support
x86-64: enable sg chaining
x86-64: update pci-gart iommu to sg helpers
x86-64: update nommu to sg helpers
x86-64: update calgary iommu to sg helpers
swiotlb: sg chaining support
i386: enable sg chaining
i386 dma_map_sg: convert to using sg helpers
mmc: need to zero sglist on init
Panic in blk_rq_map_sg() from CCISS driver
remove sglist_len
remove blk_queue_max_phys_segments in libata
revert sg segment size ifdefs
Fixup u14-34f ENABLE_SG_CHAINING
qla1280: enable use_sg_chaining option
...
-
These are intended to replace prepare_write and commit_write with more
flexible alternatives that are also able to avoid the buffered write
deadlock problems efficiently (which prepare_write is unable to do).
[mark.fasheh@oracle.com: API design contributions, code review and fixes]
[akpm@linux-foundation.org: various fixes]
[dmonakhov@sw.ru: new aop block_write_begin fix]
Signed-off-by: Nick Piggin
Signed-off-by: Mark Fasheh
Signed-off-by: Dmitriy Monakhov
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
-
Combine the file_ra_state members
    unsigned long prev_index
    unsigned int prev_offset
into
    loff_t prev_pos
It is more consistent and better supports huge files.
Thanks to Peter for the nice proposal!
[akpm@linux-foundation.org: fix shift overflow]
Cc: Peter Zijlstra
Signed-off-by: Fengguang Wu
Cc: Rusty Russell
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
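How an index/offset pair folds into a single byte position can be sketched as below. DEMO_PAGE_SHIFT of 12 (4 KiB pages), int64_t in place of loff_t, and the helper names are assumptions for this example. The widening cast before the shift is the kind of detail the "fix shift overflow" note refers to.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_PAGE_SHIFT 12                       /* assumption: 4 KiB pages */
#define DEMO_PAGE_SIZE  (1UL << DEMO_PAGE_SHIFT)

/* Combine page index and in-page offset (offset < page size) into a position. */
static int64_t demo_make_prev_pos(unsigned long prev_index,
                                  unsigned int prev_offset)
{
    /* Widen first so a large index cannot overflow a 32-bit long. */
    return ((int64_t)prev_index << DEMO_PAGE_SHIFT) | prev_offset;
}

/* Recover the page index from the combined position. */
static unsigned long demo_prev_index(int64_t prev_pos)
{
    return (unsigned long)(prev_pos >> DEMO_PAGE_SHIFT);
}

/* Recover the in-page offset from the combined position. */
static unsigned int demo_prev_offset(int64_t prev_pos)
{
    return (unsigned int)(prev_pos & (int64_t)(DEMO_PAGE_SIZE - 1));
}
```

One 64-bit field replaces two members and keeps working for positions beyond 4 GiB.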
16 Oct, 2007
1 commit
-
The out label should not include the unmap, the only way to jump
there already has unmapped the source.
00002000
f7c21a00 00000000 00000000 c0489036 00018e32 00000002 00000000
00001000
Call Trace:
[] pipe_to_user+0xca/0xd3
[] __splice_from_pipe+0x53/0x1bd
[] ------------[ cut here ]------------
filemap_fault+0x221/0x380
[] pipe_to_user+0x0/0xd3
[] sys_vmsplice+0x3b7/0x422
[] kernel BUG at mm/highmem.c:206!
handle_mm_fault+0x4d5/0x8eb
[] kmap_atomic+0x1c/0x20
[] unmap_vmas+0x3d1/0x584
[] free_pgtables+0x90/0xa0
[] pgd_dtor+0x0/0x1
[] audit_syscall_exit+0x2aa/0x2c6
[] do_syscall_trace+0x124/0x169
[] syscall_call+0x7/0xb
=======================
Code: 2d 00 d0 5b 00 25 00 00 e0 ff 29 invalid opcode: 0000 [#1]
c2 89 d0 c1 e8 0c 8b 14 85 a0 6c 7c c0 4a 85 d2 89 14 85 a0 6c 7c c0 74 07
31 c9 4a 75 15 eb 04 0b eb fe 31 c9 81 3d 78 38 6d c0 78 38 6d c0 0f
95 c1 b0 01
EIP: [] kunmap_high+0x51/0x8e SS:ESP 0068:f5960df0
SMP
Modules linked in: netconsole autofs4 hidp nfs lockd nfs_acl rfcomm l2cap
bluetooth sunrpc ipv6 ib_iser rdma_cm ib_cm iw_cmib_sa ib_mad ib_core
ib_addr iscsi_tcp libiscsi scsi_transport_iscsi dm_mirror dm_multipath
dm_mod video output sbs batteryac parport_pc lp parport sg i2c_piix4
i2c_core floppy cfi_probe gen_probe scb2_flash mtd chipreg tg3 e1000 button
ide_cd serio_raw cdrom aic7xxx scsi_transport_spi sd_mod scsi_mod ext3 jbd
ehci_hcd ohci_hcd uhci_hcd
CPU: 3
EIP: 0060:[] Not tainted VLI
EFLAGS: 00010246 (2.6.23 #1)
EIP is at kunmap_high+0x51/0x8e
Signed-off-by: Jens Axboe
02 Oct, 2007
1 commit
-
Nick Piggin points out that splice isn't being good about the mmap
semaphore: while two readers can nest inside each other, it does leave
a possible deadlock if a writer (ie a new mmap()) comes in during that
nesting.
Original "just move the locking" patch by Nick, replaced by one by me
based on an optimistic pagefault_disable(). And then Jens tested and
updated that patch.
Reported-by: Nick Piggin
Tested-by: Jens Axboe
Cc: Andrew Morton
Signed-off-by: Linus Torvalds
27 Jul, 2007
1 commit
-
Fix some typos in pipe.c and splice.c.
Add pipes API to kernel-api.tmpl.
Signed-off-by: Randy Dunlap
Signed-off-by: Jens Axboe
21 Jul, 2007
1 commit
-
If add_to_page_cache_lru() fails, the page will not be locked. But
splice jumps to an error path that does a page release and unlock,
causing a BUG() in unlock_page().
Fix this by adding one more label that just releases the page. This bug
was actually triggered on EL5 by gurudas pai using fio.
Signed-off-by: Jens Axboe
Signed-off-by: Linus Torvalds
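The two-label error path can be sketched in user space like this. Everything here is hypothetical (demo_page, the fail switches, the return codes); the assert in demo_unlock() stands in for the BUG() that fires when an unlocked page is unlocked.

```c
#include <assert.h>
#include <stdbool.h>

struct demo_page {
    int refcount;
    bool locked;
};

/* Stand-in for the cache insert: the page ends up locked only on success. */
static int demo_add_to_cache(struct demo_page *page, bool fail)
{
    if (fail)
        return -1;
    page->locked = true;
    return 0;
}

static void demo_unlock(struct demo_page *page)
{
    assert(page->locked);   /* models BUG() on unlocking an unlocked page */
    page->locked = false;
}

static int demo_prepare_page(struct demo_page *page, bool add_fails,
                             bool later_fails)
{
    int ret;

    page->refcount++;                   /* take a reference */
    ret = demo_add_to_cache(page, add_fails);
    if (ret)
        goto out_release;               /* never locked: release only */

    if (later_fails) {
        ret = -2;
        goto out_unlock;                /* locked: unlock, then release */
    }
    demo_unlock(page);                  /* success: keep the reference */
    return 0;

out_unlock:
    demo_unlock(page);
out_release:
    page->refcount--;
    return ret;
}
```

Each failure point jumps to the label that undoes exactly what has been done so far, and nothing more.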
20 Jul, 2007
1 commit
-
Split ondemand readahead interface into two functions. I think this makes it
a little clearer for non-readahead experts (like Rusty).
Internally they both call ondemand_readahead(), but the page argument is
changed to an obvious boolean flag.
Signed-off-by: Rusty Russell
Signed-off-by: Fengguang Wu
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
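The shape of that split can be sketched as two thin wrappers around one worker. The wrapper names, the sync/async reading of the two entry points, and the counters are assumptions made for this example; only the idea of a shared worker taking a boolean comes from the message above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical state: counts how each entry point was used. */
struct demo_ra {
    unsigned int sync_calls;
    unsigned int async_calls;
};

/* Common worker: the former page argument is reduced to a boolean. */
static void demo_ondemand_readahead(struct demo_ra *ra, bool hit_marker,
                                    size_t req_size)
{
    (void)req_size;             /* window sizing is out of scope here */
    if (hit_marker)
        ra->async_calls++;
    else
        ra->sync_calls++;
}

/* Entry point for a cache miss: the data is needed right now. */
static void demo_sync_readahead(struct demo_ra *ra, size_t req_size)
{
    demo_ondemand_readahead(ra, false, req_size);
}

/* Entry point for hitting a page that carries a readahead marker. */
static void demo_async_readahead(struct demo_ra *ra, size_t req_size)
{
    demo_ondemand_readahead(ra, true, req_size);
}
```

Callers pick the wrapper that matches their situation and never pass the flag themselves.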