09 Aug, 2014

40 commits

  • I'm working on the address sanitizer project for the kernel. Recently we
    started experimenting with stack instrumentation to detect out-of-bounds
    read/write bugs on the stack.

    Just after booting I hit an out-of-bounds read on the stack in
    idr_for_each (and in __idr_remove_all as well):

    struct idr_layer **paa = &pa[0];

    while (id >= 0 && id < fls(id)) {
    n += IDR_BITS;
    p = *--paa;
    Reviewed-by: Lai Jiangshan
    Cc: Tejun Heo
    Cc: Alexey Preobrazhensky
    Cc: Dmitry Vyukov
    Cc: Konstantin Khlebnikov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrey Ryabinin
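
The hazard in the entry above is a pre-decrement dereference that steps below the first element of an on-stack pointer array. The following is a minimal userspace sketch of one way to guard such a pop; it is an illustration only, not necessarily how the commit fixes idr. The helper name pop_checked() and the array layout are hypothetical.

```c
#include <stddef.h>

/*
 * Pop the previous entry from a pointer stack.
 * 'base' is the first slot of the array, '*top' points one past the
 * most recently pushed slot. Returns NULL instead of reading base[-1].
 */
static int *pop_checked(int **base, int ***top)
{
    if (*top == base)
        return NULL;        /* an unguarded *--top would read base[-1] */
    return *--(*top);
}
```

A caller would keep popping until pop_checked() returns NULL, so the slot below the array is never touched even on the last iteration.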
     
  • We have a special check in read_vmcore() handler to check if the page was
    reported as ram or not by the hypervisor (pfn_is_ram()). However, when
    vmcore is read with mmap() no such check is performed. That can lead to
    unpredictable results, e.g. when running Xen PVHVM guest memcpy() after
    mmap() on /proc/vmcore will hang processing HVMMEM_mmio_dm pages creating
    enormous load in both DomU and Dom0.

    Fix the issue by mapping each non-ram page to the zero page. Keep direct
    path with remap_oldmem_pfn_range() to avoid looping through all pages on
    bare metal.

    The issue can also be solved by overriding remap_oldmem_pfn_range() in
    xen-specific code, which is what remap_oldmem_pfn_range() was designed
    for. That, however, would involve a non-obvious xen code path for all
    x86 builds with CONFIG_XEN_PVHVM=y and would prevent all other
    hypervisor-specific code on x86 arch from doing the same override.

    [fengguang.wu@intel.com: remap_oldmem_pfn_checked() can be static]
    [akpm@linux-foundation.org: clean up layout]
    Signed-off-by: Vitaly Kuznetsov
    Reviewed-by: Andrew Jones
    Cc: Michael Holzheu
    Acked-by: Vivek Goyal
    Cc: David Vrabel
    Signed-off-by: Fengguang Wu
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vitaly Kuznetsov
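
The per-page decision described in the entry above can be pictured as a walk over the page frame range that keeps RAM pages as they are and substitutes the zero page for everything else. This is a rough userspace sketch of that loop only. pfn_is_ram_stub(), remap_checked_sketch() and the ZERO_PFN marker are hypothetical stand-ins, not the kernel helpers, and the MMIO hole at PFNs 100..103 is made up.

```c
#include <stdbool.h>
#include <stddef.h>

#define ZERO_PFN (~0UL)   /* hypothetical marker meaning "map the zero page" */

/* Hypothetical oracle: pretend PFNs 100..103 are an MMIO hole, not RAM. */
static bool pfn_is_ram_stub(unsigned long pfn)
{
    return pfn < 100 || pfn > 103;
}

/*
 * For each page of the mapping, record which PFN should back it:
 * the original PFN for RAM pages, ZERO_PFN for non-RAM pages.
 * Returns the number of contiguous RAM runs, i.e. how many batched
 * remap calls a real implementation could issue.
 */
static size_t remap_checked_sketch(unsigned long first_pfn, size_t npages,
                                   unsigned long *dst)
{
    size_t runs = 0;
    bool in_run = false;

    for (size_t i = 0; i < npages; i++) {
        unsigned long pfn = first_pfn + i;

        if (pfn_is_ram_stub(pfn)) {
            dst[i] = pfn;
            if (!in_run) {
                runs++;
                in_run = true;
            }
        } else {
            dst[i] = ZERO_PFN;
            in_run = false;
        }
    }
    return runs;
}
```

Counting runs rather than pages reflects the motivation stated in the commit: when every page is RAM there is a single run, so the direct path does not need a per-page loop.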
     
  • It's only used in fork.c:mm_init().

    Signed-off-by: Vladimir Davydov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vladimir Davydov
     
  • If a forking process has a thread calling (un)mmap (silly but still),
    the child process may have some of its mm's vm usage counters (total_vm
    and friends) screwed up, because currently they are copied from oldmm
    w/o holding any locks (memcpy in dup_mm).

    This patch moves the counters initialization to dup_mmap() to be called
    under oldmm->mmap_sem, which eliminates any possibility of race.

    Signed-off-by: Vladimir Davydov
    Cc: Oleg Nesterov
    Cc: David Rientjes
    Cc: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vladimir Davydov
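
The race in the entry above comes from snapshotting counters without the lock that writers hold. A minimal userspace sketch of the locked copy follows, with a pthread mutex playing the role of mmap_sem. struct mm_sketch and both helpers are hypothetical, and only two counters are shown.

```c
#include <pthread.h>

/* Hypothetical stand-in for an mm with two of its usage counters. */
struct mm_sketch {
    pthread_mutex_t lock;       /* plays the role of mmap_sem */
    unsigned long total_vm;
    unsigned long shared_vm;
};

/* Writer side: mmap()-like accounting, always done under the lock. */
static void account_map_sketch(struct mm_sketch *mm, unsigned long pages)
{
    pthread_mutex_lock(&mm->lock);
    mm->total_vm += pages;
    pthread_mutex_unlock(&mm->lock);
}

/* Fork side: copy the counters while holding the parent's lock. */
static void dup_counters_sketch(struct mm_sketch *child,
                                struct mm_sketch *parent)
{
    pthread_mutex_lock(&parent->lock);
    child->total_vm = parent->total_vm;
    child->shared_vm = parent->shared_vm;
    pthread_mutex_unlock(&parent->lock);
}
```

Because both sides take the same lock, the child sees the counters either entirely before or entirely after a concurrent update, never a mix.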
     
  • mm->pinned_vm counts pages of mm's address space that were permanently
    pinned in memory by increasing their reference counter. The counter was
    introduced by commit bc3e53f682d9 ("mm: distinguish between mlocked and
    pinned pages"), while before it locked_vm had been used for such pages.

    Obviously, we should reset the counter on fork if !CLONE_VM, just like
    we do with locked_vm, but currently we don't. Let's fix it.

    This patch will fix the contents of /proc/pid/status:VmPin.

    ib_umem_get[infiniband] and perf_mmap still check pinned_vm against
    RLIMIT_MEMLOCK. It's left from the times when pinned pages were accounted
    under locked_vm, but today it looks wrong. It isn't clear how we should
    deal with it.

    We still have some drivers accounting pinned pages under mm->locked_vm -
    this is what commit bc3e53f682d9 was fighting against. It's
    infiniband/usnic and vfio.

    Signed-off-by: Vladimir Davydov
    Cc: Oleg Nesterov
    Cc: David Rientjes
    Cc: Christoph Lameter
    Cc: Roland Dreier
    Cc: Sean Hefty
    Cc: Hal Rosenstock
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vladimir Davydov
     
  • mm initialization on fork/exec is spread all over the place, which makes
    the code look inconsistent.

    We have mm_init(), which is supposed to init/nullify mm's internals, but
    it doesn't init all the fields it should:

    - on fork ->mmap,mm_rb,vmacache_seqnum,map_count,mm_cpumask,locked_vm
    are zeroed in dup_mmap();

    - on fork ->pmd_huge_pte is zeroed in dup_mm(), immediately before
    calling mm_init();

    - ->cpu_vm_mask_var ptr is initialized by mm_init_cpumask(), which is
    called before mm_init() on both fork and exec;

    - ->context is initialized by init_new_context(), which is called after
    mm_init() on both fork and exec;

    Let's consolidate all the initializations in mm_init() to make the code
    look cleaner.

    Signed-off-by: Vladimir Davydov
    Cc: Oleg Nesterov
    Cc: David Rientjes
    Cc: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vladimir Davydov
     
  • If you're applying this patch, all /proc/$PID/* files were converted
    to seq_file interface and this code became unused.

    Signed-off-by: Alexey Dobriyan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
     
  • Signed-off-by: Alexey Dobriyan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
     
  • Signed-off-by: Alexey Dobriyan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
     
  • Signed-off-by: Alexey Dobriyan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
     
  • Signed-off-by: Alexey Dobriyan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
     
  • Signed-off-by: Alexey Dobriyan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
     
  • Signed-off-by: Alexey Dobriyan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
     
  • Signed-off-by: Alexey Dobriyan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
     
  • Signed-off-by: Alexey Dobriyan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
     
  • Signed-off-by: Alexey Dobriyan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
     
  • Signed-off-by: Alexey Dobriyan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
     
  • /proc/tty/ldisc appears to be unused as a directory, and it has
    always been that way.

    But it is a userspace-visible thing.

    Cowardly remove only the in-kernel variable holding it.

    [akpm@linux-foundation.org: add comment]
    Signed-off-by: Alexey Dobriyan
    Cc: Al Viro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
     
  • Signed-off-by: Alexey Dobriyan
    Cc: Al Viro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
     
  • Currently lookup for /proc/$PID first goes through spinlock and whole list
    of misc /proc entries only to confirm that, yes, /proc/42 can not possibly
    match random proc entry.

    The list is several dozen entries long (52 entries on my setup).

    None of this is necessary.

    Try to convert dentry name to integer first.
    If it works, it must be /proc/$PID.
    If it doesn't, it must be random proc entry.

    Based on patch from Al Viro.

    Signed-off-by: Alexey Dobriyan
    Cc: Al Viro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
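
The "try the integer conversion first" idea in the entry above can be sketched in userspace as follows. name_to_int_sketch() and classify_sketch() are hypothetical names, and the rejection of leading zeros and the conservative overflow bound are assumptions of this sketch rather than facts taken from the commit.

```c
#include <stddef.h>

/*
 * Parse a directory-entry name as a decimal number.
 * Returns the value, or ~0U if the name is not a plain decimal number.
 */
static unsigned int name_to_int_sketch(const char *name, size_t len)
{
    unsigned int n = 0;

    if (len == 0)
        return ~0U;
    if (len > 1 && name[0] == '0')
        return ~0U;                 /* reject leading zeros such as "042" */
    for (size_t i = 0; i < len; i++) {
        unsigned int c = (unsigned char)name[i] - '0';

        if (c > 9)
            return ~0U;             /* not a digit */
        if (n >= (~0U - 9) / 10)
            return ~0U;             /* conservative overflow guard */
        n = n * 10 + c;
    }
    return n;
}

enum lookup_kind { LOOKUP_PID, LOOKUP_MISC };

/* Decide which namespace to search without walking the misc list. */
static enum lookup_kind classify_sketch(const char *name, size_t len)
{
    return name_to_int_sketch(name, len) != ~0U ? LOOKUP_PID : LOOKUP_MISC;
}
```

With this split, a name such as "42" never touches the lock-protected list of misc entries, and a name such as "meminfo" never goes through the PID lookup.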
     
  • * remove proc_create(NULL, ...) check, let it oops

    * warn about proc_create("", ...) and proc_create("very very long name", ...)
    proc code keeps length as u8, no 256+ name length possible

    * warn about proc_create("123", ...)
    /proc/$PID and /proc/misc namespaces are separate things,
    but dumb module might create funky a-la $PID entry.

    * remove post mortem strchr('/') check
    Triggering it implies either strchr() is buggy or memory corruption.
    It should be VFS check anyway.

    In reality, none of these checks will ever trigger,
    it is preparation for the next patch.

    Based on patch from Al Viro.

    Signed-off-by: Alexey Dobriyan
    Cc: Al Viro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
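
The name checks listed in the entry above amount to a small predicate. The sketch below is a userspace illustration under stated assumptions: proc_name_ok_sketch() is a hypothetical name, it returns false where the kernel would warn, and it treats any all-digit name as PID-like, which may be stricter than the real check.

```c
#include <stdbool.h>
#include <string.h>

/* Returns true if the name passes the checks, false if it would warn. */
static bool proc_name_ok_sketch(const char *name)
{
    size_t len = strlen(name);

    if (len == 0 || len > 255)
        return false;               /* the length has to fit in a u8 */
    if (strspn(name, "0123456789") == len)
        return false;               /* looks like a /proc/$PID entry */
    return true;
}
```

For example, "meminfo" and "1abc" pass, while "", "123" and a 299-character name do not.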
     
  • proc_uid_seq_operations, proc_gid_seq_operations and
    proc_projid_seq_operations are only called in proc_id_map_open with
    seq_open as const struct seq_operations so we can constify the 3
    structures and update proc_id_map_open prototype.

       text    data     bss     dec     hex filename
       6817     404    1984    9205    23f5 kernel/user_namespace.o-before
       6913     308    1984    9205    23f5 kernel/user_namespace.o-after

    Signed-off-by: Fabian Frederick
    Cc: Joe Perches
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Fabian Frederick
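
The constification pattern in the entry above looks roughly like this in isolation. struct seq_ops_sketch mirrors the four callbacks of a seq_operations table, but it and the helpers are hypothetical userspace stand-ins.

```c
#include <stddef.h>

/* Hypothetical table of callbacks, shaped like a seq_operations table. */
struct seq_ops_sketch {
    void *(*start)(void *state);
    void  (*stop)(void *state);
    void *(*next)(void *state);
    int   (*show)(void *state);
};

static int show_count(void *state)
{
    ++*(int *)state;            /* record that show() ran */
    return 0;
}

/* const: the table is never written, so it can live in read-only data. */
static const struct seq_ops_sketch uid_ops_sketch = {
    .show = show_count,
};

/* The consumer only reads the table, so it takes a const pointer. */
static int open_sketch(const struct seq_ops_sketch *ops, void *state)
{
    return ops->show ? ops->show(state) : -1;
}
```

The size table above is consistent with this: 96 bytes leave the data column and 96 bytes appear in the text column, which on a 64-bit build would correspond to three tables of four 8-byte pointers each.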
     
  • Use mm.h definition.

    Signed-off-by: Fabian Frederick
    Cc: Xishi Qiu
    Acked-by: David Rientjes
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Fabian Frederick
     
  • Fixed coding style warnings and errors.

    Signed-off-by: Ionut Alexa
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ionut Alexa
     
  • Fix 2 checkpatch warnings:

    WARNING: suspect code indent for conditional statements

    Signed-off-by: Fabian Frederick
    Cc: Mikulas Patocka
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Fabian Frederick
     
  • Fix checkpatch warning:

    WARNING: Missing a blank line after declarations

    Signed-off-by: Fabian Frederick
    Cc: Jeff Mahoney
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Fabian Frederick
     
  • Fix checkpatch warning

    WARNING: Use #include instead of

    Signed-off-by: Fabian Frederick
    Cc: Jeff Mahoney
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Fabian Frederick
     
  • Fixes checkpatch warnings:

    "WARNING: %Ld/%Lu are not-standard C, use %lld/%llu"

    Signed-off-by: Fabian Frederick
    Cc: Jeff Mahoney
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Fabian Frederick
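
The warning quoted in the entry above is about length modifiers: standard C spells the long long conversions %lld and %llu. A short sketch with a hypothetical helper, format_block_sketch(), shows the standard forms.

```c
#include <stdio.h>

/* Format a signed and an unsigned 64-bit quantity with standard specifiers. */
static int format_block_sketch(char *buf, size_t len,
                               long long blk, unsigned long long cnt)
{
    return snprintf(buf, len, "block %lld, count %llu", blk, cnt);
}
```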
     
  • Signed-off-by: Fabian Frederick
    Cc: Evgeniy Dushistov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Fabian Frederick
     
  • Convert printk calls without a log level to pr_debug in UFSD. DEBUG is
    defined with CONFIG_UFS_DEBUG, so the pr_debug calls are emitted here.

    Also fix a call to UFSD (add the missing ';').

    Signed-off-by: Fabian Frederick
    Cc: Evgeniy Dushistov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Fabian Frederick
     
  • Remove error_buffer and use %pV

    Signed-off-by: Fabian Frederick
    Cc: Evgeniy Dushistov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Fabian Frederick
     
  • Replace approximate function names with __func__, using the standard
    format "function():"

    Signed-off-by: Fabian Frederick
    Cc: Evgeniy Dushistov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Fabian Frederick
     
  • Replace UFS-fs, UFS-fs: and UFS: by pr_fmt with module name "ufs: "

    Signed-off-by: Fabian Frederick
    Cc: Evgeniy Dushistov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Fabian Frederick
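
The prefix convention in the entry above, combined with the __func__ format from the entry before it, can be sketched in userspace like this. pr_err_sketch() is a hypothetical stand-in that formats into a buffer instead of the kernel log, and report_bad_inode() is a made-up caller.

```c
#include <stdio.h>

/* Every format string passed through pr_fmt() gets the module prefix. */
#define pr_fmt(fmt) "ufs: " fmt

/* Hypothetical stand-in for pr_err(): formats into buf, needs >= 1 argument. */
#define pr_err_sketch(buf, len, fmt, ...) \
    snprintf((buf), (len), pr_fmt(fmt), __VA_ARGS__)

static int report_bad_inode(char *buf, size_t len, unsigned long ino)
{
    /* __func__ expands to the enclosing function name, so it cannot go stale. */
    return pr_err_sketch(buf, len, "%s(): bad inode %lu\n", __func__, ino);
}
```

Since the prefix lives in one macro, individual messages no longer carry their own spelling of the module name.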
     
  • Use current logging functions.

    - no level printk under CONFIG_UFS_DEBUG converted to pr_debug

    - no level printk elsewhere converted to pr_err

    - add DDEBUG flag in Makefile

    - coalesce formats

    Signed-off-by: Fabian Frederick
    Cc: Evgeniy Dushistov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Fabian Frederick
     
  • This patch integrates creation of sysfs groups and
    attributes into NILFS file system driver.

    Michael L Semon found an issue with the
    nilfs_sysfs_{create/delete}_snapshot_group functions in the first
    version of the patch:

    BUG: sleeping function called from invalid context at kernel/locking/mutex.c:579
    in_atomic(): 1, irqs_disabled(): 0, pid: 32676, name: umount.nilfs2
    2 locks held by umount.nilfs2/32676:
    #0: (&type->s_umount_key#21){++++..}, at: [] deactivate_super+0x37/0x58
    #1: (&(&nilfs->ns_cptree_lock)->rlock){+.+...}, at: [] nilfs_put_root+0x23/0x5a
    Preemption disabled at:[] nilfs_put_root+0x23/0x5a

    CPU: 0 PID: 32676 Comm: umount.nilfs2 Not tainted 3.14.0+ #2
    Hardware name: Dell Computer Corporation Dimension 2350/07W080, BIOS A01 12/17/2002
    Call Trace:
    dump_stack+0x4b/0x75
    __might_sleep+0x111/0x16f
    mutex_lock_nested+0x1e/0x3ad
    kernfs_remove+0x12/0x26
    sysfs_remove_dir+0x3d/0x62
    kobject_del+0x13/0x38
    nilfs_sysfs_delete_snapshot_group+0xb/0xd
    nilfs_put_root+0x2a/0x5a
    nilfs_detach_log_writer+0x1ab/0x2c1
    nilfs_put_super+0x13/0x68
    generic_shutdown_super+0x60/0xd1
    kill_block_super+0x1d/0x60
    deactivate_locked_super+0x22/0x3f
    deactivate_super+0x3e/0x58
    mntput_no_expire+0xe2/0x141
    SyS_oldumount+0x70/0xa5
    syscall_call+0x7/0xb

    The cause of the issue was that the
    nilfs_sysfs_{create/delete}_snapshot_group() calls were placed under
    nilfs->ns_cptree_lock protection. That protection is unnecessary and
    the wrong solution. The second version of the patch fixes this issue.

    [fengguang.wu@intel.com: nilfs_sysfs_create_mounted_snapshots_group can be static]
    Reported-by: Michael L. Semon
    Signed-off-by: Vyacheslav Dubeyko
    Cc: Vyacheslav Dubeyko
    Cc: Ryusuke Konishi
    Tested-by: Michael L. Semon
    Signed-off-by: Fengguang Wu
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vyacheslav Dubeyko
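
The trace in the entry above shows a sleeping call made while a spinlock is held. A small userspace model of the corrected ordering follows: do the bookkeeping under the lock, release it, and only then make the call that may sleep. All names here are hypothetical and a plain flag stands in for the spinlock; the actual layout of the second patch version may differ.

```c
#include <assert.h>
#include <stdbool.h>

static bool in_atomic_sketch;       /* true while the fake spinlock is held */
static int groups_deleted;

static void spin_lock_sketch(void)   { in_atomic_sketch = true; }
static void spin_unlock_sketch(void) { in_atomic_sketch = false; }

/* Stand-in for a teardown that may sleep (it takes a mutex internally). */
static void delete_snapshot_group_sketch(void)
{
    assert(!in_atomic_sketch);      /* must not run in atomic context */
    groups_deleted++;
}

struct root_sketch {
    int refcount;
    bool in_tree;
};

static void put_root_sketch(struct root_sketch *root)
{
    bool last;

    spin_lock_sketch();
    last = (--root->refcount == 0);
    if (last)
        root->in_tree = false;      /* unlink only; nothing here can sleep */
    spin_unlock_sketch();

    if (last)
        delete_snapshot_group_sketch();   /* sleeping call, lock released */
}
```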
     
  • This patch adds creation of group for every mounted
    snapshot in /sys/fs/nilfs2//mounted_snapshots group.

    The group contains details about mounted snapshot:
    (1) inodes_count - show number of inodes for snapshot.
    (2) blocks_count - show number of blocks for snapshot.

    Signed-off-by: Vyacheslav Dubeyko
    Cc: Vyacheslav Dubeyko
    Cc: Ryusuke Konishi
    Cc: Michael L. Semon
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vyacheslav Dubeyko
     
  • This patch adds creation of /sys/fs/nilfs2//mounted_snapshots
    group.

    The mounted_snapshots group contains group for every
    mounted snapshot.

    Signed-off-by: Vyacheslav Dubeyko
    Cc: Vyacheslav Dubeyko
    Cc: Ryusuke Konishi
    Cc: Michael L. Semon
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vyacheslav Dubeyko
     
  • This patch adds creation of /sys/fs/nilfs2//checkpoints
    group.

    The checkpoints group contains attributes that describe
    details about volume's checkpoints:
    (1) checkpoints_number - show number of checkpoints on volume.
    (2) snapshots_number - show number of snapshots on volume.
    (3) last_seg_checkpoint - show checkpoint number of the latest segment.
    (4) next_checkpoint - show next checkpoint number.

    Signed-off-by: Vyacheslav Dubeyko
    Cc: Vyacheslav Dubeyko
    Cc: Ryusuke Konishi
    Cc: Michael L. Semon
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vyacheslav Dubeyko
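
Attributes like the ones listed in the entry above are read from userspace as small text files. The sketch below assumes that an attribute prints a single unsigned decimal number. The caller passes the group directory, because the full path is not spelled out in the text above, and read_attr_ulong_sketch() is a hypothetical helper.

```c
#include <stdio.h>

/*
 * Read one unsigned decimal value from <group_dir>/<attr>.
 * Returns 0 on success, -1 if the file cannot be opened or parsed.
 */
static int read_attr_ulong_sketch(const char *group_dir, const char *attr,
                                  unsigned long *out)
{
    char path[512];
    FILE *f;
    int n, ok;

    n = snprintf(path, sizeof(path), "%s/%s", group_dir, attr);
    if (n < 0 || n >= (int)sizeof(path))
        return -1;                  /* path did not fit */

    f = fopen(path, "r");
    if (!f)
        return -1;
    ok = (fscanf(f, "%lu", out) == 1);
    fclose(f);
    return ok ? 0 : -1;
}
```

A monitoring tool could call this once per attribute, for example with "checkpoints_number" or "snapshots_number" as the attribute name.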
     
  • This patch adds creation of /sys/fs/nilfs2//segments
    group.

    The segments group contains attributes that describe
    details about volume's segments:
    (1) segments_number - show number of segments on volume.
    (2) blocks_per_segment - show number of blocks in segment.
    (3) clean_segments - show count of clean segments.
    (4) dirty_segments - show count of dirty segments.

    Signed-off-by: Vyacheslav Dubeyko
    Cc: Vyacheslav Dubeyko
    Cc: Ryusuke Konishi
    Cc: Michael L. Semon
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vyacheslav Dubeyko
     
  • This patch adds creation of /sys/fs/nilfs2//segctor
    group.

    The segctor group contains attributes that describe
    segctor thread activity details:
    (1) last_pseg_block - show start block number of the latest segment.
    (2) last_seg_sequence - show sequence value of the latest segment.
    (3) last_seg_checkpoint - show checkpoint number of the latest segment.
    (4) current_seg_sequence - show segment sequence counter.
    (5) current_last_full_seg - show index number of the latest full segment.
    (6) next_full_seg - show index number of the full segment
    to be used next.
    (7) next_pseg_offset - show offset of next partial segment in
    the current full segment.
    (8) next_checkpoint - show next checkpoint number.
    (9) last_seg_write_time - show write time of the last segment
    in human-readable format.
    (10) last_seg_write_time_secs - show write time of the last segment
    in seconds.
    (11) last_nongc_write_time - show write time of the last segment
    not for cleaner operation in human-readable format.
    (12) last_nongc_write_time_secs - show write time of the last segment
    not for cleaner operation in seconds.
    (13) dirty_data_blocks_count - show number of dirty data blocks.

    Signed-off-by: Vyacheslav Dubeyko
    Cc: Vyacheslav Dubeyko
    Cc: Ryusuke Konishi
    Cc: Michael L. Semon
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vyacheslav Dubeyko