09 Aug, 2014
40 commits
-
I'm working on the address sanitizer project for the kernel. Recently we
started experimenting with stack instrumentation, to detect out-of-bounds
read/write bugs on the stack.

Just after booting I hit an out-of-bounds read on the stack in idr_for_each
(and in __idr_remove_all as well):

	struct idr_layer **paa = &pa[0];
	while (id >= 0 && id < fls(id)) {
		n += IDR_BITS;
		p = *--paa;
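
The `*--paa` pattern quoted above can be reproduced in miniature. The sketch below is user-space C, not the kernel code and not necessarily the fix that was applied; `pop_saved` is a hypothetical helper. It shows one way to guard the decrement so that a cursor still pointing at `&pa[0]` never reads the slot before the array.

```c
#include <stddef.h>

/*
 * Pop one saved pointer from a stack that grows upward from pa[0].
 * *paa is the cursor (one past the most recently pushed slot).
 * Returns NULL instead of reading pa[-1] when the cursor is already
 * at the bottom of the stack.
 */
static int *pop_saved(int *pa[], int ***paa)
{
	if (*paa == &pa[0])
		return NULL;	/* nothing left: do not step before pa[0] */
	return *--(*paa);
}
```

With the check in place the cursor stays at `&pa[0]` once the stack is empty, so repeated calls remain well defined.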
Reviewed-by: Lai Jiangshan
Cc: Tejun Heo
Cc: Alexey Preobrazhensky
Cc: Dmitry Vyukov
Cc: Konstantin Khlebnikov
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
We have a special check in the read_vmcore() handler to check whether the
page was reported as ram by the hypervisor (pfn_is_ram()). However, when
vmcore is read with mmap() no such check is performed. That can lead to
unpredictable results: e.g. in a Xen PVHVM guest, memcpy() after mmap() on
/proc/vmcore will hang processing HVMMEM_mmio_dm pages, creating enormous
load in both DomU and Dom0.

Fix the issue by mapping each non-ram page to the zero page. Keep the direct
path with remap_oldmem_pfn_range() to avoid looping through all pages on
bare metal.

The issue can also be solved by overriding remap_oldmem_pfn_range() in
xen-specific code, as remap_oldmem_pfn_range() was designed for.
That, however, would involve a non-obvious xen code path for all x86 builds
with CONFIG_XEN_PVHVM=y and would prevent all other hypervisor-specific
code on x86 arch from doing the same override.

[fengguang.wu@intel.com: remap_oldmem_pfn_checked() can be static]
[akpm@linux-foundation.org: clean up layout]
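
The approach described above can be modelled in user space. The following is a rough sketch under stated assumptions, not the kernel implementation: `remap_checked`, `struct remap_stats` and `example_pfn_is_ram` are hypothetical names, and the function only counts what a real remap would do. Without a predicate it takes the direct path in one call; with one, it coalesces RAM pages into runs and substitutes the zero page for each non-RAM page.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical predicate: true when the page frame is backed by RAM. */
typedef bool (*pfn_is_ram_fn)(unsigned long pfn);

struct remap_stats {
	unsigned long direct_runs;	/* contiguous RAM runs remapped in one call */
	unsigned long zero_pages;	/* non-RAM pages replaced by the zero page */
};

/* Example predicate for the sketch: pfns 4 and 5 are treated as holes. */
static bool example_pfn_is_ram(unsigned long pfn)
{
	return pfn != 4 && pfn != 5;
}

/*
 * Walk [pfn, pfn + count). With no predicate registered (bare metal in
 * this sketch) the whole range is handled by a single direct remap.
 * Otherwise RAM pages are coalesced into runs and every non-RAM page is
 * counted as mapped to the zero page.
 */
static struct remap_stats remap_checked(unsigned long pfn, unsigned long count,
					pfn_is_ram_fn is_ram)
{
	struct remap_stats st = { 0, 0 };
	unsigned long run = 0;	/* length of the current RAM run */
	unsigned long i;

	if (!is_ram) {
		st.direct_runs = count ? 1 : 0;
		return st;
	}

	for (i = 0; i < count; i++) {
		if (is_ram(pfn + i)) {
			run++;
			continue;
		}
		if (run) {
			st.direct_runs++;	/* flush the run before the hole */
			run = 0;
		}
		st.zero_pages++;
	}
	if (run)
		st.direct_runs++;
	return st;
}
```

Coalescing runs keeps the number of remap calls proportional to the number of holes rather than to the number of pages.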
Signed-off-by: Vitaly Kuznetsov
Reviewed-by: Andrew Jones
Cc: Michael Holzheu
Acked-by: Vivek Goyal
Cc: David Vrabel
Signed-off-by: Fengguang Wu
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
It's only used in fork.c:mm_init().
Signed-off-by: Vladimir Davydov
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
If a forking process has a thread calling (un)mmap (silly but still),
the child process may have some of its mm's vm usage counters (total_vm
and friends) screwed up, because currently they are copied from oldmm
without holding any locks (memcpy in dup_mm).

This patch moves the counters initialization to dup_mmap() to be called
under oldmm->mmap_sem, which eliminates any possibility of a race.

Signed-off-by: Vladimir Davydov
Cc: Oleg Nesterov
Cc: David Rientjes
Cc: Christoph Lameter
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
mm->pinned_vm counts pages of mm's address space that were permanently
pinned in memory by increasing their reference counter. The counter was
introduced by commit bc3e53f682d9 ("mm: distinguish between mlocked and
pinned pages"), while before it locked_vm had been used for such pages.

Obviously, we should reset the counter on fork if !CLONE_VM, just like
we do with locked_vm, but currently we don't. Let's fix it.

This patch will fix the contents of /proc/pid/status:VmPin.
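
The intended behaviour can be sketched in a few lines of user-space C. This is an illustration only: `struct mm_counters` and `dup_counters` are hypothetical stand-ins for the relevant mm fields and for a fork without CLONE_VM, not the kernel code.

```c
/* Hypothetical subset of the per-mm usage counters. */
struct mm_counters {
	unsigned long total_vm;
	unsigned long locked_vm;
	unsigned long pinned_vm;
};

/*
 * Build the child's counters for a fork without CLONE_VM: the mapping
 * size is inherited, but locked and pinned pages belong to the parent
 * only, so both counters start from zero in the child.
 */
static struct mm_counters dup_counters(const struct mm_counters *parent)
{
	struct mm_counters child = *parent;

	child.locked_vm = 0;
	child.pinned_vm = 0;
	return child;
}
```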

ib_umem_get[infiniband] and perf_mmap still check pinned_vm against
RLIMIT_MEMLOCK. It's left from the times when pinned pages were accounted
under locked_vm, but today it looks wrong. It isn't clear how we should
deal with it.

We still have some drivers accounting pinned pages under mm->locked_vm -
this is what commit bc3e53f682d9 was fighting against. It's
infiniband/usnic and vfio.

Signed-off-by: Vladimir Davydov
Cc: Oleg Nesterov
Cc: David Rientjes
Cc: Christoph Lameter
Cc: Roland Dreier
Cc: Sean Hefty
Cc: Hal Rosenstock
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
mm initialization on fork/exec is spread all over the place, which makes
the code look inconsistent.

We have mm_init(), which is supposed to init/nullify mm's internals, but
it doesn't init all the fields it should:

 - on fork ->mmap,mm_rb,vmacache_seqnum,map_count,mm_cpumask,locked_vm
   are zeroed in dup_mmap();

 - on fork ->pmd_huge_pte is zeroed in dup_mm(), immediately before
   calling mm_init();

 - ->cpu_vm_mask_var ptr is initialized by mm_init_cpumask(), which is
   called before mm_init() on both fork and exec;

 - ->context is initialized by init_new_context(), which is called after
   mm_init() on both fork and exec;

Let's consolidate all the initializations in mm_init() to make the code
look cleaner.

Signed-off-by: Vladimir Davydov
Cc: Oleg Nesterov
Cc: David Rientjes
Cc: Christoph Lameter
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
If you're applying this patch, all /proc/$PID/* files were converted
to seq_file interface and this code became unused.

Signed-off-by: Alexey Dobriyan
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Signed-off-by: Alexey Dobriyan
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Signed-off-by: Alexey Dobriyan
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Signed-off-by: Alexey Dobriyan
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Signed-off-by: Alexey Dobriyan
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Signed-off-by: Alexey Dobriyan
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Signed-off-by: Alexey Dobriyan
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Signed-off-by: Alexey Dobriyan
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Signed-off-by: Alexey Dobriyan
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Signed-off-by: Alexey Dobriyan
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Signed-off-by: Alexey Dobriyan
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
/proc/tty/ldisc appears to be unused as a directory, and
it has always been that way.

But it is a userspace-visible thing.
Cowardly remove only the in-kernel variable holding it.
[akpm@linux-foundation.org: add comment]
Signed-off-by: Alexey Dobriyan
Cc: Al Viro
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Signed-off-by: Alexey Dobriyan
Cc: Al Viro
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Currently lookup for /proc/$PID first goes through spinlock and whole list
of misc /proc entries only to confirm that, yes, /proc/42 can not possibly
match random proc entry.

The list is several dozen entries long (52 entries on my setup).

None of this is necessary.
Try to convert dentry name to integer first.
If it works, it must be /proc/$PID.
If it doesn't, it must be random proc entry.

Based on patch from Al Viro.
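
The "convert the name to an integer first" idea can be sketched as follows. This is a user-space approximation, not the kernel's lookup code: `name_to_pid`, `classify` and `enum proc_kind` are hypothetical names, and rejecting leading zeros is an assumption made here so that every PID has exactly one spelling.

```c
#include <limits.h>
#include <stddef.h>

/*
 * Parse `len` bytes of `name` as a decimal number.
 * Returns UINT_MAX when the name is not a plain decimal number
 * (empty, leading zero, non-digit character, or overflow).
 */
static unsigned int name_to_pid(const char *name, size_t len)
{
	unsigned int n = 0;
	size_t i;

	if (len == 0 || (len > 1 && name[0] == '0'))
		return UINT_MAX;

	for (i = 0; i < len; i++) {
		unsigned int digit = (unsigned int)(name[i] - '0');

		if (digit > 9)
			return UINT_MAX;	/* not a digit */
		if (n > (UINT_MAX - digit) / 10)
			return UINT_MAX;	/* n * 10 + digit would overflow */
		n = n * 10 + digit;
	}
	return n;
}

enum proc_kind { PROC_PID, PROC_MISC };

/* Decide which namespace to search without walking the misc list. */
static enum proc_kind classify(const char *name, size_t len)
{
	return name_to_pid(name, len) != UINT_MAX ? PROC_PID : PROC_MISC;
}
```

Because the decision depends only on the name, the misc entry list and its lock are never touched for numeric names.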
Signed-off-by: Alexey Dobriyan
Cc: Al Viro
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
* remove proc_create(NULL, ...) check, let it oops

* warn about proc_create("", ...) and proc_create("very very long name", ...)
  proc code keeps length as u8, no 256+ name length possible

* warn about proc_create("123", ...)
  /proc/$PID and /proc/misc namespaces are separate things,
  but dumb module might create funky a-la $PID entry.

* remove post mortem strchr('/') check
  Triggering it implies either strchr() is buggy or memory corruption.
  It should be VFS check anyway.

In reality, none of these checks will ever trigger,
it is preparation for the next patch.

Based on patch from Al Viro.
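
The name checks listed above could look roughly like this in user space. It is a sketch only: `check_entry_name` and `enum name_check` are hypothetical, the function returns a reason instead of warning, and treating any all-digit name as PID-like is a simplification.

```c
#include <stddef.h>
#include <string.h>

/* Reasons a candidate entry name is flagged in this sketch. */
enum name_check {
	NAME_OK,
	NAME_EMPTY,
	NAME_TOO_LONG,		/* length does not fit in a u8 */
	NAME_LOOKS_LIKE_PID,	/* would collide with the /proc/$PID namespace */
};

static enum name_check check_entry_name(const char *name)
{
	size_t len = strlen(name);
	size_t i;

	if (len == 0)
		return NAME_EMPTY;
	if (len > 255)
		return NAME_TOO_LONG;

	/* Any non-digit character means the name cannot be mistaken for a PID. */
	for (i = 0; i < len; i++)
		if (name[i] < '0' || name[i] > '9')
			return NAME_OK;
	return NAME_LOOKS_LIKE_PID;
}
```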
Signed-off-by: Alexey Dobriyan
Cc: Al Viro
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
proc_uid_seq_operations, proc_gid_seq_operations and
proc_projid_seq_operations are only called in proc_id_map_open with
seq_open as const struct seq_operations so we can constify the 3
structures and update proc_id_map_open prototype.

   text    data     bss     dec     hex filename
   6817     404    1984    9205    23f5 kernel/user_namespace.o-before
   6913     308    1984    9205    23f5 kernel/user_namespace.o-after

Signed-off-by: Fabian Frederick
Cc: Joe Perches
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Use mm.h definition.
Signed-off-by: Fabian Frederick
Cc: Xishi Qiu
Acked-by: David Rientjes
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Fixed coding style warnings and errors.
Signed-off-by: Ionut Alexa
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Fix 2 checkpatch warnings:
WARNING: suspect code indent for conditional statements
Signed-off-by: Fabian Frederick
Cc: Mikulas Patocka
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Fix checkpatch warning:
WARNING: Missing a blank line after declarations
Signed-off-by: Fabian Frederick
Cc: Jeff Mahoney
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Fix checkpatch warning
WARNING: Use #include instead of
Signed-off-by: Fabian Frederick
Cc: Jeff Mahoney
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Fixes checkpatch warnings:
"WARNING: %Ld/%Lu are not-standard C, use %lld/%llu"
Signed-off-by: Fabian Frederick
Cc: Jeff Mahoney
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Signed-off-by: Fabian Frederick
Cc: Evgeniy Dushistov
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Convert no level printk to pr_debug in UFSD. DEBUG is defined with
CONFIG_UFS_DEBUG so pr_debug are emitted here.

Also fixing call to UFSD (add ;)
Signed-off-by: Fabian Frederick
Cc: Evgeniy Dushistov
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Remove error_buffer and use %pV
Signed-off-by: Fabian Frederick
Cc: Evgeniy Dushistov
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Replace approximate function name by __func__ using standard format
"function():"

Signed-off-by: Fabian Frederick
Cc: Evgeniy Dushistov
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Replace UFS-fs, UFS-fs: and UFS: by pr_fmt with module name "ufs: "
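
A user-space approximation of the prefixing mechanism, for illustration only. In the kernel, pr_fmt is defined before the printk headers are included and pr_err() applies it; here `sketch_pr_err` and `report_bad_inode` are hypothetical stand-ins that write into a buffer so the result can be inspected.

```c
#include <stddef.h>
#include <stdio.h>

/* Every message formatted through the macro below gets this prefix. */
#define pr_fmt(fmt) "ufs: " fmt

/*
 * Stand-in for pr_err(): formats into a caller-supplied buffer.
 * `fmt` must be a string literal so it can be concatenated with the
 * prefix, and at least one argument must follow it.
 */
#define sketch_pr_err(buf, size, fmt, ...) \
	snprintf((buf), (size), pr_fmt(fmt), __VA_ARGS__)

/* Example caller: the prefix is not repeated at the call site. */
static int report_bad_inode(char *buf, size_t size, unsigned long ino)
{
	return sketch_pr_err(buf, size, "bad inode number: %lu", ino);
}
```

Defining the prefix once keeps call sites free of hand-written variants such as "UFS-fs" or "UFS:".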
Signed-off-by: Fabian Frederick
Cc: Evgeniy Dushistov
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Use current logging functions.
- no level printk under CONFIG_UFS_DEBUG converted to pr_debug
- no level printk elsewhere converted to pr_err
- add DDEBUG flag in Makefile
- coalesce formats
Signed-off-by: Fabian Frederick
Cc: Evgeniy Dushistov
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
This patch integrates creation of sysfs groups and
attributes into the NILFS file system driver.

Michael L. Semon found an issue with the
nilfs_sysfs_{create/delete}_snapshot_group functions in the first
version of the patch:

  BUG: sleeping function called from invalid context at kernel/locking/mutex.c:579
  in_atomic(): 1, irqs_disabled(): 0, pid: 32676, name: umount.nilfs2
  2 locks held by umount.nilfs2/32676:
   #0: (&type->s_umount_key#21){++++..}, at: [] deactivate_super+0x37/0x58
   #1: (&(&nilfs->ns_cptree_lock)->rlock){+.+...}, at: [] nilfs_put_root+0x23/0x5a
  Preemption disabled at:[] nilfs_put_root+0x23/0x5a

  CPU: 0 PID: 32676 Comm: umount.nilfs2 Not tainted 3.14.0+ #2
  Hardware name: Dell Computer Corporation Dimension 2350/07W080, BIOS A01 12/17/2002
  Call Trace:
    dump_stack+0x4b/0x75
    __might_sleep+0x111/0x16f
    mutex_lock_nested+0x1e/0x3ad
    kernfs_remove+0x12/0x26
    sysfs_remove_dir+0x3d/0x62
    kobject_del+0x13/0x38
    nilfs_sysfs_delete_snapshot_group+0xb/0xd
    nilfs_put_root+0x2a/0x5a
    nilfs_detach_log_writer+0x1ab/0x2c1
    nilfs_put_super+0x13/0x68
    generic_shutdown_super+0x60/0xd1
    kill_block_super+0x1d/0x60
    deactivate_locked_super+0x22/0x3f
    deactivate_super+0x3e/0x58
    mntput_no_expire+0xe2/0x141
    SyS_oldumount+0x70/0xa5
    syscall_call+0x7/0xb

The cause of the issue was the placement of the
nilfs_sysfs_{create/delete}_snapshot_group() calls under
nilfs->ns_cptree_lock protection. But this protection is unnecessary and
the wrong solution. The second version of the patch fixes this issue.

[fengguang.wu@intel.com: nilfs_sysfs_create_mounted_snapshots_group can be static]
Reported-by: Michael L. Semon
Signed-off-by: Vyacheslav Dubeyko
Cc: Vyacheslav Dubeyko
Cc: Ryusuke Konishi
Tested-by: Michael L. Semon
Signed-off-by: Fengguang Wu
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
This patch adds creation of group for every mounted
snapshot in /sys/fs/nilfs2//mounted_snapshots group.

The group contains details about mounted snapshot:
(1) inodes_count - show number of inodes for snapshot.
(2) blocks_count - show number of blocks for snapshot.

Signed-off-by: Vyacheslav Dubeyko
Cc: Vyacheslav Dubeyko
Cc: Ryusuke Konishi
Cc: Michael L. Semon
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
This patch adds creation of /sys/fs/nilfs2//mounted_snapshots
group.

The mounted_snapshots group contains group for every
mounted snapshot.

Signed-off-by: Vyacheslav Dubeyko
Cc: Vyacheslav Dubeyko
Cc: Ryusuke Konishi
Cc: Michael L. Semon
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
This patch adds creation of /sys/fs/nilfs2//checkpoints
group.

The checkpoints group contains attributes that describe
details about volume's checkpoints:
(1) checkpoints_number - show number of checkpoints on volume.
(2) snapshots_number - show number of snapshots on volume.
(3) last_seg_checkpoint - show checkpoint number of the latest segment.
(4) next_checkpoint - show next checkpoint number.

Signed-off-by: Vyacheslav Dubeyko
Cc: Vyacheslav Dubeyko
Cc: Ryusuke Konishi
Cc: Michael L. Semon
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
This patch adds creation of /sys/fs/nilfs2//segments
group.

The segments group contains attributes that describe
details about volume's segments:
(1) segments_number - show number of segments on volume.
(2) blocks_per_segment - show number of blocks in segment.
(3) clean_segments - show count of clean segments.
(4) dirty_segments - show count of dirty segments.

Signed-off-by: Vyacheslav Dubeyko
Cc: Vyacheslav Dubeyko
Cc: Ryusuke Konishi
Cc: Michael L. Semon
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
This patch adds creation of /sys/fs/nilfs2//segctor
group.

The segctor group contains attributes that describe
segctor thread activity details:
(1) last_pseg_block - show start block number of the latest segment.
(2) last_seg_sequence - show sequence value of the latest segment.
(3) last_seg_checkpoint - show checkpoint number of the latest segment.
(4) current_seg_sequence - show segment sequence counter.
(5) current_last_full_seg - show index number of the latest full segment.
(6) next_full_seg - show index number of the full segment index
    to be used next.
(7) next_pseg_offset - show offset of next partial segment in
    the current full segment.
(8) next_checkpoint - show next checkpoint number.
(9) last_seg_write_time - show write time of the last segment
    in human-readable format.
(10) last_seg_write_time_secs - show write time of the last segment
     in seconds.
(11) last_nongc_write_time - show write time of the last segment
     not for cleaner operation in human-readable format.
(12) last_nongc_write_time_secs - show write time of the last segment
     not for cleaner operation in seconds.
(13) dirty_data_blocks_count - show number of dirty data blocks.

Signed-off-by: Vyacheslav Dubeyko
Cc: Vyacheslav Dubeyko
Cc: Ryusuke Konishi
Cc: Michael L. Semon
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds