08 Oct, 2016
1 commit
-
Allow some seq_puts removals by taking a string instead of a single
char.[akpm@linux-foundation.org: update vmstat_show(), per Joe]
Link: http://lkml.kernel.org/r/667e1cf3d436de91a5698170a1e98d882905e956.1470704995.git.joe@perches.com
Signed-off-by: Joe Perches
Cc: Joe Perches
Cc: Andi Kleen
Cc: Al Viro
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
27 Aug, 2016
1 commit
-
seq_read() is a nasty piece of work, not to mention buggy.
It has (I think) an old bug which allows unprivileged userspace to read
beyond the end of m->buf.I was getting these:
BUG: KASAN: slab-out-of-bounds in seq_read+0xcd2/0x1480 at addr ffff880116889880
Read of size 2713 by task trinity-c2/1329
CPU: 2 PID: 1329 Comm: trinity-c2 Not tainted 4.8.0-rc1+ #96
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.9.3-0-ge2fc41e-prebuilt.qemu-project.org 04/01/2014
Call Trace:
kasan_object_err+0x1c/0x80
kasan_report_error+0x2cb/0x7e0
kasan_report+0x4e/0x80
check_memory_region+0x13e/0x1a0
kasan_check_read+0x11/0x20
seq_read+0xcd2/0x1480
proc_reg_read+0x10b/0x260
do_loop_readv_writev.part.5+0x140/0x2c0
do_readv_writev+0x589/0x860
vfs_readv+0x7b/0xd0
do_readv+0xd8/0x2c0
SyS_readv+0xb/0x10
do_syscall_64+0x1b3/0x4b0
entry_SYSCALL64_slow_path+0x25/0x25
Object at ffff880116889100, in cache kmalloc-4096 size: 4096
Allocated:
PID = 1329
save_stack_trace+0x26/0x80
save_stack+0x46/0xd0
kasan_kmalloc+0xad/0xe0
__kmalloc+0x1aa/0x4a0
seq_buf_alloc+0x35/0x40
seq_read+0x7d8/0x1480
proc_reg_read+0x10b/0x260
do_loop_readv_writev.part.5+0x140/0x2c0
do_readv_writev+0x589/0x860
vfs_readv+0x7b/0xd0
do_readv+0xd8/0x2c0
SyS_readv+0xb/0x10
do_syscall_64+0x1b3/0x4b0
return_from_SYSCALL_64+0x0/0x6a
Freed:
PID = 0
(stack is not available)
Memory state around the buggy address:
ffff88011688a000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
ffff88011688a080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>ffff88011688a100: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
^
ffff88011688a180: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff88011688a200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
==================================================================
Disabling lock debugging due to kernel taintThis seems to be the same thing that Dave Jones was seeing here:
https://lkml.org/lkml/2016/8/12/334
There are multiple issues here:
1) If we enter the function with a non-empty buffer, there is an attempt
to flush it. But it was not clearing m->from after doing so, which
means that if we try to do this flush twice in a row without any call
to traverse() in between, we are going to be reading from the wrong
place -- the splat above, fixed by this patch.2) If there's a short write to userspace because of page faults, the
buffer may already contain multiple lines (i.e. pos has advanced by
more than 1), but we don't save the progress that was made so the
next call will output what we've already returned previously. Since
that is a much less serious issue (and I have a headache after
staring at seq_read() for the past 8 hours), I'll leave that for now.Link: http://lkml.kernel.org/r/1471447270-32093-1-git-send-email-vegard.nossum@oracle.com
Signed-off-by: Vegard Nossum
Reported-by: Dave Jones
Cc: Al Viro
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
15 Apr, 2016
1 commit
-
A lot of seqfile users seem to be using things like %pK that uses the
credentials of the current process, but that is actually completely
wrong for filesystem interfaces.The unix semantics for permission checking files is to check permissions
at _open_ time, not at read or write time, and that is not just a small
detail: passing off stdin/stdout/stderr to a suid application and making
the actual IO happen in privileged context is a classic exploit
technique.So if we want to be able to look at permissions at read time, we need to
use the file open credentials, not the current ones. Normal file
accesses can just use "f_cred" (or any of the helper functions that do
that, like file_ns_capable()), but the seqfile interfaces do not have
any such options.It turns out that seq_file _does_ save away the user_ns information of
the file, though. Since user_ns is just part of the full credential
information, replace that special case with saving off the cred pointer
instead, and suddenly seq_file has all the permission information it
needs.Signed-off-by: Linus Torvalds
07 Nov, 2015
3 commits
-
Since 5cec38ac866b ("fs, seq_file: fallback to vmalloc instead of oom kill
processes") seq_buf_alloc() avoids calling the oom killer for PAGE_SIZE or
smaller allocations; but larger allocations can use the oom killer via
vmalloc(). Thus reads of small files can return ENOMEM, but larger files
use the oom killer to avoid ENOMEM.The effect of this bug is that reads from /proc and other virtual
filesystems can return ENOMEM instead of the preferred behavior - oom
killing something (possibly the calling process). I don't know of anyone
except Google who has noticed the issue.I suspect the fix is more needed in smaller systems where there isn't any
reclaimable memory. But these seem like the kinds of systems which
probably don't use the oom killer for production situations.Memory overcommit requires use of the oom killer to select a victim
regardless of file size.Enable oom killer for small seq_buf_alloc() allocations.
Fixes: 5cec38ac866b ("fs, seq_file: fallback to vmalloc instead of oom kill processes")
Signed-off-by: David Rientjes
Signed-off-by: Greg Thelen
Acked-by: Eric Dumazet
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
strint_escape_str() escapes input string by given criteria. In case of
seq_escape() the criteria is to convert some characters to their octal
representation.Signed-off-by: Andy Shevchenko
Cc: Alexander Viro
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
This improves code readability.
Signed-off-by: Andy Shevchenko
Cc: Alexander Viro
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
12 Sep, 2015
1 commit
-
The seq_ function return values were frequently misused.
See: commit 1f33c41c03da ("seq_file: Rename seq_overflow() to
seq_has_overflowed() and make public")All uses of these return values have been removed, so convert the
return types to void.Miscellanea:
o Move seq_put_decimal_ and seq_escape prototypes closer the
other seq_vprintf prototypes
o Reorder seq_putc and seq_puts to return early on overflow
o Add argument names to seq_vprintf and seq_printf
o Update the seq_escape kernel-doc
o Convert a couple of leading spaces to tabs in seq_escapeSigned-off-by: Joe Perches
Cc: Al Viro
Cc: Steven Rostedt
Cc: Mark Brown
Cc: Stephen Rothwell
Cc: Joerg Roedel
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
11 Sep, 2015
1 commit
-
This introduces a new helper and switches current users to use it. All
patches are compiled tested. kmemleak is tested via its own test suite.This patch (of 6):
The new seq_hex_dump() is a complete analogue of print_hex_dump().
We have few users of this functionality already. It allows to reduce their
codebase.Signed-off-by: Andy Shevchenko
Cc: Alexander Viro
Cc: Joe Perches
Cc: Tadeusz Struk
Cc: Helge Deller
Cc: Ingo Tuchscherer
Cc: Catalin Marinas
Cc: Vladimir Kondratiev
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
05 Jul, 2015
1 commit
-
Pull more vfs updates from Al Viro:
"Assorted VFS fixes and related cleanups (IMO the most interesting in
that part are f_path-related things and Eric's descriptor-related
stuff). UFS regression fixes (it got broken last cycle). 9P fixes.
fs-cache series, DAX patches, Jan's file_remove_suid() work"[ I'd say this is much more than "fixes and related cleanups". The
file_table locking rule change by Eric Dumazet is a rather big and
fundamental update even if the patch isn't huge. - Linus ]* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (49 commits)
9p: cope with bogus responses from server in p9_client_{read,write}
p9_client_write(): avoid double p9_free_req()
9p: forgetting to cancel request on interrupted zero-copy RPC
dax: bdev_direct_access() may sleep
block: Add support for DAX reads/writes to block devices
dax: Use copy_from_iter_nocache
dax: Add block size note to documentation
fs/file.c: __fget() and dup2() atomicity rules
fs/file.c: don't acquire files->file_lock in fd_install()
fs:super:get_anon_bdev: fix race condition could cause dev exceed its upper limitation
vfs: avoid creation of inode number 0 in get_next_ino
namei: make set_root_rcu() return void
make simple_positive() public
ufs: use dir_pages instead of ufs_dir_pages()
pagemap.h: move dir_pages() over there
remove the pointless include of lglock.h
fs: cleanup slight list_entry abuse
xfs: Correctly lock inode when removing suid and file capabilities
fs: Call security_ops->inode_killpriv on truncate
fs: Provide function telling whether file_remove_privs() will do anything
...
02 Jul, 2015
1 commit
-
Merge third patchbomb from Andrew Morton:
- the rest of MM
- scripts/gdb updates
- ipc/ updates
- lib/ updates
- MAINTAINERS updates
- various other misc things
* emailed patches from Andrew Morton : (67 commits)
genalloc: rename of_get_named_gen_pool() to of_gen_pool_get()
genalloc: rename dev_get_gen_pool() to gen_pool_get()
x86: opt into HAVE_COPY_THREAD_TLS, for both 32-bit and 64-bit
MAINTAINERS: add zpool
MAINTAINERS: BCACHE: Kent Overstreet has changed email address
MAINTAINERS: move Jens Osterkamp to CREDITS
MAINTAINERS: remove unused nbd.h pattern
MAINTAINERS: update brcm gpio filename pattern
MAINTAINERS: update brcm dts pattern
MAINTAINERS: update sound soc intel patterns
MAINTAINERS: remove website for paride
MAINTAINERS: update Emulex ocrdma email addresses
bcache: use kvfree() in various places
libcxgbi: use kvfree() in cxgbi_free_big_mem()
target: use kvfree() in session alloc and free
IB/ehca: use kvfree() in ipz_queue_{cd}tor()
drm/nouveau/gem: use kvfree() in u_free()
drm: use kvfree() in drm_free_large()
cxgb4: use kvfree() in t4_free_mem()
cxgb3: use kvfree() in cxgb_free_mem()
...
01 Jul, 2015
2 commits
-
seq_open() stores its struct seq_file in file->private_data, thus it must
not be modified by user of seq_file.Link: http://lkml.kernel.org/r/cover.1433193673.git.ydroneaud@opteya.com
Signed-off-by: Yann Droneaud
Cc: Al Viro
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Since patch described below, from v2.6.15-rc1, seq_open() could use a
struct seq_file already allocated by the caller if the pointer to the
structure is stored in file->private_data before calling the function.Commit 1abe77b0fc4b485927f1f798ae81a752677e1d05
Author: Al Viro
Date: Mon Nov 7 17:15:34 2005 -0500[PATCH] allow callers of seq_open do allocation themselves
Allow caller of seq_open() to kmalloc() seq_file + whatever else they
want and set ->private_data to it. seq_open() will then abstain from
doing allocation itself.As there's no more use for such feature, as it could be easily replaced by
calls to seq_open_private() (see commit 39699037a5c9 ("[FS] seq_file:
Introduce the seq_open_private()")) and seq_release_private() (see
v2.6.0-test3), support for this uncommon feature can be removed from
seq_open().Link: http://lkml.kernel.org/r/cover.1433193673.git.ydroneaud@opteya.com
Signed-off-by: Yann Droneaud
Cc: Al Viro
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
24 Jun, 2015
1 commit
-
Turn
seq_path(..., &file->f_path, ...);
into
seq_file_path(..., file, ...);Signed-off-by: Miklos Szeredi
Signed-off-by: Al Viro
03 Jun, 2015
1 commit
-
Now that we're guaranteed to have a meaningful root dentry, we can just
export seq_dentry() and use it in btrfs_show_options(). The subvolume ID
is easy to get and can also be useful, so put that in there, too.Reviewed-by: David Sterba
Signed-off-by: Omar Sandoval
Signed-off-by: Chris Mason
14 Feb, 2015
1 commit
-
Now that all bitmap formatting usages have been converted to
'%*pb[l]', the separate formatting functions are unnecessary. The
following functions are removed.* bitmap_scn[list]printf()
* cpumask_scnprintf(), cpulist_scnprintf()
* [__]nodemask_scnprintf(), [__]nodelist_scnprintf()
* seq_bitmap[_list](), seq_cpumask[_list](), seq_nodemask[_list]()
* seq_buf_bitmask()Signed-off-by: Tejun Heo
Cc: Rusty Russell
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
14 Dec, 2014
1 commit
-
Since commit 058504edd026 ("fs/seq_file: fallback to vmalloc allocation"),
seq_buf_alloc() falls back to vmalloc() when the kmalloc() for contiguous
memory fails. This was done to address order-4 slab allocations for
reading /proc/stat on large machines and noticed because
PAGE_ALLOC_COSTLY_ORDER < 4, so there is no infinite loop in the page
allocator when allocating new slab for such high-order allocations.Contiguous memory isn't necessary for caller of seq_buf_alloc(), however.
Other GFP_KERNEL high-order allocations that are
Cc: Heiko Carstens
Cc: Christoph Hellwig
Cc: Al Viro
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
30 Oct, 2014
1 commit
-
The return values of seq_printf/puts/putc are frequently misused.
Start down a path to remove all the return value uses of these
functions.Move the seq_overflow() to a global inlined function called
seq_has_overflowed() that can be used by the users of seq_file() calls.Update the documentation to not show return types for seq_printf
et al. Add a description of seq_has_overflowed().Link: http://lkml.kernel.org/p/848ac7e3d1c31cddf638a8526fa3c59fa6fdeb8a.1412031505.git.joe@perches.com
Cc: Al Viro
Signed-off-by: Joe Perches
[ Reworked the original patch from Joe ]
Signed-off-by: Steven Rostedt
04 Jul, 2014
1 commit
-
There are a couple of seq_files which use the single_open() interface.
This interface requires that the whole output must fit into a single
buffer.E.g. for /proc/stat allocation failures have been observed because an
order-4 memory allocation failed due to memory fragmentation. In such
situations reading /proc/stat is not possible anymore.Therefore change the seq_file code to fallback to vmalloc allocations
which will usually result in a couple of order-0 allocations and hence
also work if memory is fragmented.For reference a call trace where reading from /proc/stat failed:
sadc: page allocation failure: order:4, mode:0x1040d0
CPU: 1 PID: 192063 Comm: sadc Not tainted 3.10.0-123.el7.s390x #1
[...]
Call Trace:
show_stack+0x6c/0xe8
warn_alloc_failed+0xd6/0x138
__alloc_pages_nodemask+0x9da/0xb68
__get_free_pages+0x2e/0x58
kmalloc_order_trace+0x44/0xc0
stat_open+0x5a/0xd8
proc_reg_open+0x8a/0x140
do_dentry_open+0x1bc/0x2c8
finish_open+0x46/0x60
do_last+0x382/0x10d0
path_openat+0xc8/0x4f8
do_filp_open+0x46/0xa8
do_sys_open+0x114/0x1f0
sysc_tracego+0x14/0x1aSigned-off-by: Heiko Carstens
Tested-by: David Rientjes
Cc: Ian Kent
Cc: Hendrik Brueckner
Cc: Thorsten Diehl
Cc: Andrea Righi
Cc: Christoph Hellwig
Cc: Al Viro
Cc: Stefan Bader
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
19 Nov, 2013
1 commit
-
Once we'd freed m->buf, m->count should become zero - we have no valid
contents reachable via m->buf.Reported-by: Charley (Hao Chuan) Chu
Signed-off-by: Al Viro
Signed-off-by: Linus Torvalds
15 Nov, 2013
1 commit
-
There are several users who want to know bytes written by seq_*() for
alignment purpose. Currently they are using %n format for knowing it
because seq_*() returns 0 on success.This patch introduces seq_setwidth() and seq_pad() for allowing them to
align without using %n format.Signed-off-by: Tetsuo Handa
Signed-off-by: Kees Cook
Cc: Joe Perches
Cc: David Miller
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
25 Oct, 2013
1 commit
-
This issue was first pointed out by Jiaxing Wang several months ago, but no
further comments:
https://lkml.org/lkml/2013/6/29/41As we know pread() does not change f_pos, so after pread(), file->f_pos
and m->read_pos become different. And seq_lseek() does not update file->f_pos
if offset equals to m->read_pos, so after pread() and seq_lseek()(lseek to
m->read_pos), then a subsequent read may read from a wrong position, the
following program produces the problem:char str1[32] = { 0 };
char str2[32] = { 0 };
int poffset = 10;
int count = 20;/*open any seq file*/
int fd = open("/proc/modules", O_RDONLY);pread(fd, str1, count, poffset);
printf("pread:%s\n", str1);/*seek to where m->read_pos is*/
lseek(fd, poffset+count, SEEK_SET);/*supposed to read from poffset+count, but this read from position 0*/
read(fd, str2, count);
printf("read:%s\n", str2);out put:
pread:
ck_netbios_ns 12665
read:
nf_conntrack_netbios/proc/modules:
nf_conntrack_netbios_ns 12665 0 - Live 0xffffffffa038b000
nf_conntrack_broadcast 12589 1 nf_conntrack_netbios_ns, Live 0xffffffffa0386000So we always update file->f_pos to offset in seq_lseek() to fix this issue.
Signed-off-by: Jiaxing Wang
Signed-off-by: Gu Zheng
Signed-off-by: Al Viro
08 Jul, 2013
1 commit
-
When we convert the file_lock_list to a set of percpu lists, we'll need
a way to iterate over them in order to output /proc/locks info. Add
some seq_list_*_percpu helpers to handle that.Signed-off-by: Jeff Layton
Acked-by: J. Bruce Fields
Signed-off-by: Al Viro
10 Apr, 2013
1 commit
-
Same as single_open(), but preallocates the buffer of given size.
Doesn't make any sense for sizes up to PAGE_SIZE and doesn't make
sense if output of show() exceeds PAGE_SIZE only rarely - seq_read()
will take care of growing the buffer and redoing show(). If you
_know_ that it will be large, it might make more sense to look into
saner iterator, rather than go with single-shot one. If that's
impossible, single_open_size() might be for you.Again, don't use that without a good reason; occasionally that's really
the best way to go, but very often there are better solutions.Signed-off-by: Al Viro
04 Mar, 2013
1 commit
-
Pull more VFS bits from Al Viro:
"Unfortunately, it looks like xattr series will have to wait until the
next cycle ;-/This pile contains 9p cleanups and fixes (races in v9fs_fid_add()
etc), fixup for nommu breakage in shmem.c, several cleanups and a bit
more file_inode() work"* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
constify path_get/path_put and fs_struct.c stuff
fix nommu breakage in shmem.c
cache the value of file_inode() in struct file
9p: if v9fs_fid_lookup() gets to asking server, it'd better have hashed dentry
9p: make sure ->lookup() adds fid to the right dentry
9p: untangle ->lookup() a bit
9p: double iput() in ->lookup() if d_materialise_unique() fails
9p: v9fs_fid_add() can't fail now
v9fs: get rid of v9fs_dentry
9p: turn fid->dlist into hlist
9p: don't bother with private lock in ->d_fsdata; dentry->d_lock will do just fine
more file_inode() open-coded instances
selinux: opened file can't have NULL or negative ->f_path.dentry(In the meantime, the hlist traversal macros have changed, so this
required a semantic conflict fixup for the newly hlistified fid->dlist)
28 Feb, 2013
3 commits
-
[akpm@linux-foundation.org: checkpatch fixes]
Cc: Cyrill Gorcunov
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Signed-off-by: Cyrill Gorcunov
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Signed-off-by: Al Viro
11 Jan, 2013
1 commit
-
Fix kernel-doc warnings in fs/seq_file.c:
Warning(fs/seq_file.c:304): No description found for parameter 'whence'
Warning(fs/seq_file.c:304): Excess function parameter 'origin' description in 'seq_lseek'Signed-off-by: Randy Dunlap
Cc: Alexander Viro
Signed-off-by: Linus Torvalds
18 Dec, 2012
1 commit
-
But the kernel decided to call it "origin" instead. Fix most of the
sites.Acked-by: Hugh Dickins
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
15 Aug, 2012
1 commit
-
struct file already has a user namespace associated with it
in file->f_cred->user_ns, unfortunately because struct
seq_file has no struct file backpointer associated with
it, it is difficult to get at the user namespace in seq_file
context. Therefore add a helper function seq_user_ns to return
the associated user namespace and a user_ns field to struct
seq_file to be used in implementing seq_user_ns.Cc: Al Viro
Cc: Eric Dumazet
Cc: KAMEZAWA Hiroyuki
Cc: Alexey Dobriyan
Acked-by: David S. Miller
Acked-by: Serge Hallyn
Signed-off-by: Eric W. Biederman
11 Jun, 2012
1 commit
-
The existing seq_printf function is rewritten in terms of the new
seq_vprintf which is also exported to modules. This allows GFS2
(and potentially other seq_file users) to have a vprintf based
interface and to avoid an extra copy into a temporary buffer in
some cases.Signed-off-by: Steven Whitehouse
Reported-by: Eric Dumazet
Acked-by: Al Viro
25 Mar, 2012
1 commit
-
Pull cleanup of fs/ and lib/ users of module.h from Paul Gortmaker:
"Fix up files in fs/ and lib/ dirs to only use module.h if they really
need it.These are trivial in scope vs the work done previously. We now have
things where any few remaining cleanups can be farmed out to arch or
subsystem maintainers, and I have done so when possible. What is
remaining here represents the bits that don't clearly lie within a
single arch/subsystem boundary, like the fs dir and the lib dir.Some duplicate includes arising from overlapping fixes from
independent subsystem maintainer submissions are also quashed."Fix up trivial conflicts due to clashes with other include file cleanups
(including some due to the previous bug.h cleanup pull).* tag 'module-for-3.4' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux:
lib: reduce the use of module.h wherever possible
fs: reduce the use of module.h wherever possible
includecheck: delete any duplicate instances of module.h
24 Mar, 2012
3 commits
-
It is undocumented but a seq_file's overflow state is indicated by
m->count == m->size. Add seq_set_overflow() and seq_overflow() to
set/check overflow status explicitly.Based on an idea from Eric Dumazet.
[akpm@linux-foundation.org: tweak code comment]
Signed-off-by: KAMEZAWA Hiroyuki
Cc: Eric Dumazet
Cc: Al Viro
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Process accounting applications as top, ps visit some files under
/proc/. With seq_put_decimal_ull(), we can optimize /proc//stat
and /proc//statm files.This patch adds
- seq_put_decimal_ll() for signed values.
- allow delimiter == 0.
- convert seq_printf() to seq_put_decimal_ull/ll in /proc/stat, statm.Test result on a system with 2000+ procs.
Before patch:
[kamezawa@bluextal test]$ top -b -n 1 | wc -l
2223
[kamezawa@bluextal test]$ time top -b -n 1 > /dev/nullreal 0m0.675s
user 0m0.044s
sys 0m0.121s[kamezawa@bluextal test]$ time ps -elf > /dev/null
real 0m0.236s
user 0m0.056s
sys 0m0.176sAfter patch:
kamezawa@bluextal ~]$ time top -b -n 1 > /dev/nullreal 0m0.657s
user 0m0.052s
sys 0m0.100s[kamezawa@bluextal ~]$ time ps -elf > /dev/null
real 0m0.198s
user 0m0.050s
sys 0m0.145sConsidering top, ps tend to scan /proc periodically, this will reduce cpu
consumption by top/ps to some extent.[akpm@linux-foundation.org: checkpatch fixes]
Signed-off-by: KAMEZAWA Hiroyuki
Cc: Alexey Dobriyan
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
== stat_check.py
num = 0
with open("/proc/stat") as f:
while num < 1000 :
data = f.read()
f.seek(0, 0)
num = num + 1
==perf shows
20.39% stat_check.py [kernel.kallsyms] [k] format_decode
13.41% stat_check.py [kernel.kallsyms] [k] number
12.61% stat_check.py [kernel.kallsyms] [k] vsnprintf
10.85% stat_check.py [kernel.kallsyms] [k] memcpy
4.85% stat_check.py [kernel.kallsyms] [k] radix_tree_lookup
4.43% stat_check.py [kernel.kallsyms] [k] seq_printfThis patch removes most of calls to vsnprintf() by adding num_to_str()
and seq_print_decimal_ull(), which prints decimal numbers without rich
functions provided by printf().On my 8cpu box.
== Before patch ==
[root@bluextal test]# time ./stat_check.pyreal 0m0.150s
user 0m0.026s
sys 0m0.121s== After patch ==
[root@bluextal test]# time ./stat_check.pyreal 0m0.055s
user 0m0.022s
sys 0m0.030s[akpm@linux-foundation.org: remove incorrect comment, use less statck in num_to_str(), move comment from .h to .c, simplify seq_put_decimal_ull()]
[andrea@betterlinux.com: avoid breaking the ABI in /proc/stat]
Signed-off-by: KAMEZAWA Hiroyuki
Signed-off-by: Andrea Righi
Cc: Eric Dumazet
Cc: Glauber Costa
Cc: Peter Zijlstra
Cc: Ingo Molnar
Cc: Paul Turner
Cc: Russell King
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
22 Mar, 2012
1 commit
-
The following program illustrates the problem:
char buf[8192];
int fd = open("/proc/self/maps", O_RDONLY);
n = pread(fd, buf, sizeof(buf), 0);
printf("%d\n", n);/* lseek(fd, 0, SEEK_CUR); */ /* Uncomment to work around */
n = pread(fd, buf, sizeof(buf), 0);
printf("%d\n", n);The second printf() prints zero, but uncommenting the lseek() corrects its
behaviour.To fix, make seq_read() mirror seq_lseek() when processing changes in
*ppos. Restore m->version first, then if required traverse and update
read_pos on success.Addresses https://bugzilla.kernel.org/show_bug.cgi?id=11856
Signed-off-by: Earl Chew
Cc: Alexey Dobriyan
Cc: Al Viro
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
29 Feb, 2012
1 commit
-
For files only using THIS_MODULE and/or EXPORT_SYMBOL, map
them onto including export.h -- or if the file isn't even
using those, then just delete the include. Fix up any implicit
include dependencies that were being masked by module.h along
the way.Signed-off-by: Paul Gortmaker
04 Jan, 2012
1 commit
-
Signed-off-by: Al Viro
07 Dec, 2011
1 commit
-
__d_path() API is asking for trouble and in case of apparmor d_namespace_path()
getting just that. The root cause is that when __d_path() misses the root
it had been told to look for, it stores the location of the most remote ancestor
in *root. Without grabbing references. Sure, at the moment of call it had
been pinned down by what we have in *path. And if we raced with umount -l, we
could have very well stopped at vfsmount/dentry that got freed as soon as
prepend_path() dropped vfsmount_lock.It is safe to compare these pointers with pre-existing (and known to be still
alive) vfsmount and dentry, as long as all we are asking is "is it the same
address?". Dereferencing is not safe and apparmor ended up stepping into
that. d_namespace_path() really wants to examine the place where we stopped,
even if it's not connected to our namespace. As the result, it looked
at ->d_sb->s_magic of a dentry that might've been already freed by that point.
All other callers had been careful enough to avoid that, but it's really
a bad interface - it invites that kind of trouble.The fix is fairly straightforward, even though it's bigger than I'd like:
* prepend_path() root argument becomes const.
* __d_path() is never called with NULL/NULL root. It was a kludge
to start with. Instead, we have an explicit function - d_absolute_root().
Same as __d_path(), except that it doesn't get root passed and stops where
it stops. apparmor and tomoyo are using it.
* __d_path() returns NULL on path outside of root. The main
caller is show_mountinfo() and that's precisely what we pass root for - to
skip those outside chroot jail. Those who don't want that can (and do)
use d_path().
* __d_path() root argument becomes const. Everyone agrees, I hope.
* apparmor does *NOT* try to use __d_path() or any of its variants
when it sees that path->mnt is an internal vfsmount. In that case it's
definitely not mounted anywhere and dentry_path() is exactly what we want
there. Handling of sysctl()-triggered weirdness is moved to that place.
* if apparmor is asked to do pathname relative to chroot jail
and __d_path() tells it we it's not in that jail, the sucker just calls
d_absolute_path() instead. That's the other remaining caller of __d_path(),
BTW.
* seq_path_root() does _NOT_ return -ENAMETOOLONG (it's stupid anyway -
the normal seq_file logics will take care of growing the buffer and redoing
the call of ->show() just fine). However, if it gets path not reachable
from root, it returns SEQ_SKIP. The only caller adjusted (i.e. stopped
ignoring the return value as it used to do).Reviewed-by: John Johansen
ACKed-by: John Johansen
Signed-off-by: Al Viro
Cc: stable@vger.kernel.org
26 Oct, 2010
1 commit
-
All callers take dcache_lock just around the call to __d_path, so
take the lock into it in preparation of getting rid of dcache_lock.Signed-off-by: Christoph Hellwig
Signed-off-by: Al Viro