15 Apr, 2015

40 commits

  • A small cleanup. Seems in e3239ff9 ("memblock: Rename memblock_region to
    memblock_type and memblock_property to memblock_region") this one was
    missed.

    Signed-off-by: Baoquan He
    Cc: Benjamin Herrenschmidt
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Baoquan He
     
  • It's odd that we have populate_vma_page_range() and __mm_populate() in
    mm/mlock.c. They implement generic memory population; mlocking is only
    one possible side effect, applied if VM_LOCKED is set.

    __get_user_pages() is the core of the implementation. Let's move the code
    into mm/gup.c.

    Signed-off-by: Kirill A. Shutemov
    Acked-by: Linus Torvalds
    Acked-by: David Rientjes
    Cc: Michel Lespinasse
    Cc: Rik van Riel
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kirill A. Shutemov
     
  • This is preparation for moving mm_populate()-related code out of
    mm/mlock.c.

    Signed-off-by: Kirill A. Shutemov
    Acked-by: Linus Torvalds
    Acked-by: David Rientjes
    Cc: Michel Lespinasse
    Cc: Rik van Riel
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kirill A. Shutemov
     
  • __mlock_vma_pages_range() doesn't necessarily mlock pages. It depends on
    vma flags. The same codepath is used for MAP_POPULATE.

    Let's rename __mlock_vma_pages_range() to populate_vma_page_range().

    This patch also drops mlock_vma_pages_range() references from
    documentation. That function went away in cea10a19b797 ("mm: directly
    use __mlock_vma_pages_range() in find_extend_vma()").

    Signed-off-by: Kirill A. Shutemov
    Acked-by: Linus Torvalds
    Acked-by: David Rientjes
    Cc: Michel Lespinasse
    Cc: Rik van Riel
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kirill A. Shutemov
     
  • After commit a1fde08c74e9 ("VM: skip the stack guard page lookup in
    get_user_pages only for mlock") FOLL_MLOCK has lost its original
    meaning: we don't necessarily mlock the page if the flag is set -- we
    also take VM_LOCKED into consideration.

    Since we use the same codepath for __mm_populate(), let's rename
    FOLL_MLOCK to FOLL_POPULATE.

    Signed-off-by: Kirill A. Shutemov
    Acked-by: Linus Torvalds
    Acked-by: David Rientjes
    Cc: Michel Lespinasse
    Cc: Rik van Riel
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kirill A. Shutemov
     
  • slob_alloc_node() is only used in slob.c. Remove the EXPORT_SYMBOL and
    make slob_alloc_node() static.

    Signed-off-by: Fabian Frederick
    Cc: Christoph Lameter
    Cc: Pekka Enberg
    Cc: David Rientjes
    Cc: Joonsoo Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Fabian Frederick
     
  • Use the normal return values for bool functions

    Signed-off-by: Joe Perches
    Cc: Christoph Lameter
    Cc: Pekka Enberg
    Acked-by: David Rientjes
    Cc: Joonsoo Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Joe Perches
     
  • CONFIG_SLAB_DEBUG doesn't exist, CONFIG_DEBUG_SLAB does.

    Signed-off-by: David Rientjes
    Cc: Christoph Lameter
    Cc: Pekka Enberg
    Cc: Joonsoo Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Rientjes
     
  • By moving the O option detection into the switch statement, we allow this
    parameter to be combined with other options correctly. Previously options
    like slub_debug=OFZ would only detect the 'o' and use DEBUG_DEFAULT_FLAGS
    to fill in the rest of the flags.

    Signed-off-by: Chris J Arges
    Cc: Christoph Lameter
    Cc: Pekka Enberg
    Acked-by: David Rientjes
    Cc: Joonsoo Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Chris J Arges
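
The parsing change described above can be sketched in user space as follows. This is only an illustration: the flag names, their values, and the `parse_debug_opts()` helper are hypothetical and are not the kernel's SLAB_* definitions. The point is that 'O' is handled as one more case inside the per-character switch, so it combines with the other letters instead of short-circuiting the parse.

```c
#include <ctype.h>
#include <stdbool.h>

/* Hypothetical flag bits; the kernel's real debug flag values differ. */
enum {
    DBG_SANITY  = 1 << 0,   /* 'F' */
    DBG_REDZONE = 1 << 1,   /* 'Z' */
    DBG_POISON  = 1 << 2,   /* 'P' */
    DBG_USER    = 1 << 3    /* 'U' */
};

struct debug_opts {
    unsigned int flags;      /* accumulated debug flags */
    bool no_higher_order;    /* set when 'O' is seen */
};

/* Parse one option string such as "OFZ", one letter at a time. */
static struct debug_opts parse_debug_opts(const char *str)
{
    struct debug_opts o = { 0, false };

    for (; *str && *str != ','; str++) {
        switch (tolower((unsigned char)*str)) {
        case 'f': o.flags |= DBG_SANITY;  break;
        case 'z': o.flags |= DBG_REDZONE; break;
        case 'p': o.flags |= DBG_POISON;  break;
        case 'u': o.flags |= DBG_USER;    break;
        case 'o': o.no_higher_order = true; break;  /* just another case */
        default:  break;  /* unknown letters are ignored in this sketch */
        }
    }
    return o;
}
```

With this shape, parsing "OFZ" records the 'O' request and still picks up the flags for 'F' and 'Z'.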
     
  • With gcc version 4.7.3 (Ubuntu/Linaro 4.7.3-12ubuntu1) :

    mm/migrate.c: In function `migrate_pages':
    mm/migrate.c:1148:1: internal compiler error: in push_minipool_fix, at config/arm/arm.c:13500
    Please submit a full bug report,
    with preprocessed source if appropriate.
    See for instructions.
    Preprocessed source stored into /tmp/ccPoM1tr.out file, please attach this to your bugreport.
    make[1]: *** [mm/migrate.o] Error 1
    make: *** [mm/migrate.o] Error 2

    Mark unmap_and_move() (which is used in a single place only) "noinline"
    to work around this compiler bug.

    [akpm@linux-foundation.org: make it conditional on gcc-4.7.3 and arm]
    [khilman@kernel.org: fine-tune compiler versions]
    [akpm@linux-foundation.org: fix comment]
    Signed-off-by: Geert Uytterhoeven
    Reported-by: Kevin Hilman
    Cc: Marc Zyngier
    Tested-by: Kevin Hilman
    Tested-by: Lina Iyer
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Geert Uytterhoeven
     
  • Have kvm_guest_init() use hardlockup_detector_disable() instead of
    watchdog_enable_hardlockup_detector(false).

    Remove the watchdog_hardlockup_detector_is_enabled() and
    watchdog_enable_hardlockup_detector() functions, which are no longer
    needed.

    Signed-off-by: Ulrich Obergfell
    Signed-off-by: Don Zickus
    Cc: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ulrich Obergfell
     
  • Rename the update_timers*() functions to update_watchdog*().

    Remove the boolean argument from watchdog_enable_all_cpus() because
    update_watchdog_all_cpus() is now a generic function to change the run
    state of the lockup detectors and to have the lockup detectors use a new
    sample period.

    Signed-off-by: Ulrich Obergfell
    Signed-off-by: Don Zickus
    Cc: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ulrich Obergfell
     
  • With the current user interface of the watchdog mechanism it is only
    possible to disable or enable both lockup detectors at the same time.
    This series introduces new kernel parameters and changes the semantics of
    some existing kernel parameters, so that the hard lockup detector and the
    soft lockup detector can be disabled or enabled individually. With this
    series applied, the user interface is as follows.

    - parameters in /proc/sys/kernel

      . soft_watchdog
        This is a new parameter to control and examine the run state of
        the soft lockup detector.

      . nmi_watchdog
        The semantics of this parameter have changed. It can now be used
        to control and examine the run state of the hard lockup detector.

      . watchdog
        This parameter is still available to control the run state of both
        lockup detectors at the same time. If this parameter is examined,
        it shows the logical OR of soft_watchdog and nmi_watchdog.

      . watchdog_thresh
        The semantics of this parameter are not affected by the patch.

    - kernel command line parameters

      . nosoftlockup
        The semantics of this parameter have changed. It can now be used
        to disable the soft lockup detector at boot time.

      . nmi_watchdog=0 or nmi_watchdog=1
        Disable or enable the hard lockup detector at boot time. The patch
        introduces '=1' as a new option.

      . nowatchdog
        The semantics of this parameter are not affected by the patch. It
        is still available to disable both lockup detectors at boot time.

    Also, remove the proc_dowatchdog() function which is no longer needed.

    [dzickus@redhat.com: wrote changelog]
    [dzickus@redhat.com: update documentation for kernel params and sysctl]
    Signed-off-by: Ulrich Obergfell
    Signed-off-by: Don Zickus
    Cc: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ulrich Obergfell
     
  • If watchdog_nmi_enable() fails to set up the hardware perf event of one
    CPU, the entire hard lockup detector is deemed unreliable. Hence, disable
    the hard lockup detector and shut down the hardware perf events on all
    CPUs.

    [dzickus@redhat.com: update comments to explain some code]
    Signed-off-by: Ulrich Obergfell
    Signed-off-by: Don Zickus
    Cc: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ulrich Obergfell
     
  • Separate handlers for each watchdog parameter in /proc/sys/kernel replace
    the proc_dowatchdog() function. Three of those handlers merely call
    proc_watchdog_common() with one different argument.

    Signed-off-by: Ulrich Obergfell
    Signed-off-by: Don Zickus
    Cc: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ulrich Obergfell
     
  • Three of four handlers for the watchdog parameters in /proc/sys/kernel
    essentially have to do the same thing.

    if the parameter is being read {
        return the state of the corresponding bit(s) in 'watchdog_enabled'
    } else {
        set/clear the state of the corresponding bit(s) in 'watchdog_enabled'
        update the run state of the lockup detector(s)
    }

    Hence, introduce a common function that can be called by those handlers.
    The callers pass a 'bit mask' to this function to indicate which bit(s)
    should be set/cleared in 'watchdog_enabled'.

    This function handles an uncommon race with watchdog_nmi_enable() where a
    concurrent update of 'watchdog_enabled' is possible. We use 'cmpxchg' to
    detect the concurrency. [This avoids introducing a new spinlock or a
    mutex to synchronize updates of 'watchdog_enabled'. Using the same lock
    or mutex in watchdog thread context and in system call context needs to be
    considered carefully because it can make the code prone to deadlock
    situations in connection with parking/unparking the watchdog threads.]

    Signed-off-by: Ulrich Obergfell
    Signed-off-by: Don Zickus
    Cc: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ulrich Obergfell
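
The read/update logic and the compare-and-exchange retry described above might look roughly like this in portable C11. This is a user-space sketch, not the kernel function: `HARD_BIT`, `SOFT_BIT`, `watchdog_state`, and both helpers are hypothetical names, C11 atomics stand in for the kernel's 'cmpxchg', and the step that updates the run state of the detectors is left out.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical bit assignments for the two lockup detectors. */
#define HARD_BIT (1u << 0)
#define SOFT_BIT (1u << 1)

/* Stand-in for 'watchdog_enabled'; zero-initialized as a static object. */
static _Atomic unsigned int watchdog_state;

/* Read path: report whether any of the requested bits is set. */
static bool watchdog_bits_get(unsigned int which)
{
    return (atomic_load(&watchdog_state) & which) != 0;
}

/*
 * Write path: set or clear the requested bits without taking a lock.
 * If another thread changed the word between our load and our store,
 * the compare-exchange fails, refreshes 'old', and the loop retries.
 */
static void watchdog_bits_set(unsigned int which, bool enable)
{
    unsigned int old = atomic_load(&watchdog_state);
    unsigned int updated;

    do {
        updated = enable ? (old | which) : (old & ~which);
    } while (!atomic_compare_exchange_weak(&watchdog_state, &old, updated));
}
```

A handler for a single parameter would pass one bit as the mask, while the handler for the combined parameter would pass both bits.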
     
  • This series removes proc_dowatchdog(). Since multiple new functions need
    the 'watchdog_proc_mutex' to serialize access to the watchdog parameters
    in /proc/sys/kernel, move the mutex outside of any function.

    Signed-off-by: Ulrich Obergfell
    Signed-off-by: Don Zickus
    Cc: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ulrich Obergfell
     
  • This series introduces a separate handler for each watchdog parameter in
    /proc/sys/kernel. The separate handlers need a common function that they
    can call to update the run state of the lockup detectors, or to have the
    lockup detectors use a new sample period.

    Signed-off-by: Ulrich Obergfell
    Signed-off-by: Don Zickus
    Cc: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ulrich Obergfell
     
  • The hardlockup and softlockup detectors have always been tied together.
    The KVM folks asked for a way to have one enabled but not the other.
    Internally rework the code to split things apart more cleanly.

    There is a bunch of churn here, but the end result should be code that
    is easier to maintain and fix without knowing the internals of what is
    going on.

    This patch (of 9):

    Introduce new definitions and variables to separate the user interface in
    /proc/sys/kernel from the internal run state of the lockup detectors. The
    internal run state is represented by two bits in a new variable that is
    named 'watchdog_enabled'. This helps simplify the code, for example:

    - In order to check if either of the two lockup detectors is enabled,
      it is sufficient to check if 'watchdog_enabled' is not zero.

    - In order to enable/disable one or both lockup detectors,
      it is sufficient to set/clear one or both bits in 'watchdog_enabled'.

    - Concurrent updates of 'watchdog_enabled' need not be synchronized via
      a spinlock or a mutex. Updates can either be atomic or concurrency can
      be detected by using 'cmpxchg'.

    Signed-off-by: Ulrich Obergfell
    Signed-off-by: Don Zickus
    Cc: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ulrich Obergfell
     
  • ocfs2 does

    mlog_errno(v);
    return v;

    in many places. Change mlog_errno() so we can do

    return mlog_errno(v);

    For some weird reason this patch reduces the size of ocfs2 by 6k:

    akpm3:/usr/src/25> size fs/ocfs2/ocfs2.ko
       text    data     bss     dec     hex filename
    1146613   82767  832192 2061572  1f7504 fs/ocfs2/ocfs2.ko-before
    1140857   82767  832192 2055816  1f5e88 fs/ocfs2/ocfs2.ko-after

    [dan.carpenter@oracle.com: double evaluation concerns in mlog_errno()]
    Cc: Mark Fasheh
    Cc: Joel Becker
    Cc: alex chen
    Signed-off-by: Dan Carpenter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
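
The "log and return in one statement" pattern can be illustrated with a small user-space sketch. The names `log_errno()` and `log_errno_at()` are hypothetical stand-ins, not the ocfs2 macro. Routing the value through an inline function is one way to make sure the argument is evaluated only once, which is the concern noted in the changelog.

```c
#include <errno.h>
#include <stdio.h>

/* Log the status together with its call site, then hand the value back. */
static inline int log_errno_at(int status, const char *func, int line)
{
    fprintf(stderr, "%s:%d: status = %d\n", func, line, status);
    return status;
}

/* 'v' is evaluated exactly once because it becomes a function argument. */
#define log_errno(v) log_errno_at((v), __func__, __LINE__)

/* Hypothetical caller showing the shortened error path. */
static int do_work(int fail)
{
    int status = fail ? -ENOMEM : 0;

    if (status < 0)
        return log_errno(status);   /* was: log_errno(status); return status; */
    return 0;
}
```

Because the wrapper returns its argument, an expression with side effects such as `log_errno(++n)` increments `n` only once.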
     
  • If the ocfs2 lockres has not been initialized before calling
    ocfs2_dlm_lock, the lock won't be dropped, which then causes umount to
    hang. The case is described below:

    ocfs2_mknod
    ocfs2_mknod_locked
    __ocfs2_mknod_locked
    ocfs2_journal_access_di
    Failed because of -ENOMEM or other reasons, the inode lockres
    has not been initialized yet.

    iput(inode)
    ocfs2_evict_inode
    ocfs2_delete_inode
    ocfs2_inode_lock
    ocfs2_inode_lock_full_nested
    __ocfs2_cluster_lock
    Succeeds and allocates a new dlm lockres.
    ocfs2_clear_inode
    ocfs2_open_unlock
    ocfs2_drop_inode_locks
    ocfs2_drop_lock
    Since lockres has not been initialized, the lock
    can't be dropped and the lockres can't be
    migrated, thus umount will hang forever.

    Signed-off-by: Alex Chen
    Reviewed-by: Joseph Qi
    Reviewed-by: joyce.xue
    Cc: Mark Fasheh
    Cc: Joel Becker
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    alex chen
     
  • Use the vsprintf %pV extension to avoid using a static buffer and remove
    the now unnecessary buffer.

    Signed-off-by: Joe Perches
    Cc: Mark Fasheh
    Cc: Joel Becker
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Joe Perches
     
  • debugfs_create_dir and debugfs_create_file may return -ENODEV when debugfs
    is not configured, so the return value should be checked against
    ERROR_VALUE as well, otherwise the later dereference of the dentry pointer
    would crash the kernel.

    This patch tries to solve this problem by fixing certain checks. However,
    I have found that other call sites are protected by #ifdef CONFIG_DEBUG_FS.
    In the current implementation, if CONFIG_DEBUG_FS is defined, then the
    above two functions will never return any ERROR_VALUE. So another
    possibility to fix this is to surround all the buggy checks/functions with
    the same #ifdef CONFIG_DEBUG_FS. But I'm not sure if this would break any
    functionality, as only OCFS2_FS_STATS declares a dependency on DEBUG_FS.

    Signed-off-by: Chengyu Song
    Cc: Mark Fasheh
    Cc: Joel Becker
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Chengyu Song
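
To show why a NULL check alone is not enough when a function may return an encoded error value, here is a self-contained user-space sketch of the error-pointer convention. Every `sketch_*` name is hypothetical, and the 4095 limit is an assumption made for this illustration. The code mirrors the idea of testing for both NULL and an error value before dereferencing; it is not the kernel's own helpers.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MAX_ERRNO 4095   /* assumed upper bound on errno values */

/* Encode a negative errno as a pointer near the top of the address space. */
static inline void *sketch_err_ptr(long err)
{
    return (void *)(intptr_t)err;
}

/* True if 'p' lies in the range reserved for encoded errors. */
static inline bool sketch_is_err(const void *p)
{
    return (uintptr_t)p >= (uintptr_t)-SKETCH_MAX_ERRNO;
}

static inline bool sketch_is_err_or_null(const void *p)
{
    return p == NULL || sketch_is_err(p);
}

static int dummy_dir;   /* stands in for a successfully created entry */

/* Hypothetical creator: returns an encoded -ENODEV when unavailable. */
static void *sketch_create_dir(bool available)
{
    return available ? (void *)&dummy_dir : sketch_err_ptr(-ENODEV);
}

/* Caller that checks both failure forms before using the pointer. */
static int sketch_init(bool available)
{
    void *dir = sketch_create_dir(available);

    if (sketch_is_err_or_null(dir))
        return dir ? (int)(intptr_t)dir : -ENOMEM;
    return 0;
}
```

A caller that only tested `dir == NULL` would treat the encoded -ENODEV as a valid pointer and go on to dereference it.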
     
  • Signed-off-by: Jakub Wilk
    Reviewed-by: Eric Ren
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jakub Wilk
     
  • In ocfs2_local_alloc_find_clear_bits and ocfs2_get_dentry, the variables
    numfound and set may be used uninitialized in a tracepoint. In
    ocfs2_xattr_block_get and ocfs2_delete_xattr_in_bucket, the variables
    block_off and xv may be used uninitialized in the subsequent logic
    because a return value is not checked.

    This patch fixes these possible issues.

    Signed-off-by: Joseph Qi
    Cc: Mark Fasheh
    Cc: Joel Becker
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Joseph Qi
     
  • Signed-off-by: Daeseok Youn
    Reviewed-by: Joseph Qi
    Cc: Mark Fasheh
    Cc: Joel Becker
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Daeseok Youn
     
  • ocfs2_block_group_clear_bits clears bits in the block group bitmap.
    If it succeeds but a later step fails, the block group bitmap no longer
    matches the corresponding count recorded in the dinode.
    So roll back the cleared bits if an error occurs.

    Signed-off-by: Joseph Qi
    Cc: Mark Fasheh
    Cc: Joel Becker
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Joseph Qi
     
  • When ocfs2_get_system_file_inode fails, returning -EEXIST is misleading.
    So change the return value to -ENOENT.

    Signed-off-by: Joseph Qi
    Cc: Mark Fasheh
    Cc: Joel Becker
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Joseph Qi
     
  • If the namelen is 20 and the name has an actual length of only 16,
    ocfs2_find_entry will fail because of the mismatch. So use the actual
    name length when finding the entry.

    Signed-off-by: Joseph Qi
    Signed-off-by: Yiwen Jiang
    Cc: Mark Fasheh
    Cc: Joel Becker
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Joseph Qi
     
  • The code at the "out" label assumes that "default_acl" and "acl" are NULL,
    but actually the pointers can be NULL, uninitialized, or freed.

    Signed-off-by: Dan Carpenter
    Reviewed-by: Mark Fasheh
    Cc: Joel Becker
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dan Carpenter
     
  • ocfs2_reserve_local_alloc_bits calls ocfs2_error if the used bits of the
    local alloc inode bitmap mismatch, but the log message mistakenly
    reports them as free bits.

    Signed-off-by: Joseph Qi
    Cc: Mark Fasheh
    Cc: Joel Becker
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Joseph Qi
     
  • In ocfs2_direct_IO_write, we use ocfs2_zero_extend to zero allocated
    clusters when the write is not cluster aligned. But ocfs2_zero_extend
    uses the page cache, so it may clear data that blockdev_direct_IO has
    already written.

    We should use blkdev_issue_zeroout instead of ocfs2_zero_extend during
    direct IO.

    So fix this issue by introducing ocfs2_direct_IO_zero_extend and
    ocfs2_direct_IO_extend_no_holes.

    Reported-by: Yiwen Jiang
    Signed-off-by: Joseph Qi
    Tested-by: Yiwen Jiang
    Cc: Mark Fasheh
    Cc: Joel Becker
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Joseph Qi
     
  • We need to take the inode lock when calling ocfs2_get_clusters.
    Also use GFP_NOFS instead of GFP_KERNEL.

    Signed-off-by: Joseph Qi
    Cc: Mark Fasheh
    Cc: Joel Becker
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Joseph Qi
     
  • Since di_bh won't be used when zeroing extend, set it to NULL.

    Signed-off-by: Joseph Qi
    Cc: Mark Fasheh
    Cc: Joel Becker
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Joseph Qi
     
  • We need to consider zeroing out for a write that is not cluster aligned
    only when the direct IO succeeds.

    Signed-off-by: Joseph Qi
    Cc: Mark Fasheh
    Cc: Joel Becker
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Joseph Qi
     
  • Fix an off-by-one when attempting to avoid an msleep() on the final loop
    iteration.

    Signed-off-by: Daeseok Youn
    Cc: Mark Fasheh
    Cc: Joel Becker
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Daeseok Youn
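
The changelog does not show the loop itself, so the following is a generic sketch of the pattern, not the ocfs2 code. `MAX_TRIES`, `poll_until_ready()`, and the counting sleep stub are all hypothetical. The guard has to compare against the last valid index, `MAX_TRIES - 1`, for the sleep to be skipped after the final attempt.

```c
#include <stdbool.h>

#define MAX_TRIES 3   /* hypothetical number of attempts */

static int sleeps;    /* counts how often the sleep stand-in ran */

/* Stand-in for msleep(); only records that a delay would happen. */
static void sketch_sleep(void)
{
    sleeps++;
}

/* Poll 'ready' up to MAX_TRIES times, sleeping only between attempts. */
static bool poll_until_ready(bool (*ready)(void))
{
    for (int i = 0; i < MAX_TRIES; i++) {
        if (ready())
            return true;
        if (i < MAX_TRIES - 1)   /* no delay after the last attempt */
            sketch_sleep();
    }
    return false;
}

/* Test helper: a condition that never becomes true. */
static bool never_ready(void)
{
    return false;
}
```

A guard written as `i < MAX_TRIES` would always be true inside the loop, so the final iteration would still pay for a delay that nothing waits on.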
     
  • kfree() was called by user_cluster_connect() even if a previous call of
    the kzalloc() function failed.

    Return from this implementation directly after failure detection.

    Signed-off-by: Markus Elfring
    Cc: Mark Fasheh
    Cc: Joel Becker
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Markus Elfring
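
The control-flow cleanup described here, and in the three similar commits that follow, can be sketched in user space as below. The function, its parameters, and `struct conn` are hypothetical, with calloc() and free() standing in for kzalloc() and kfree(). When the very first allocation fails there is nothing to release yet, so the function returns directly and reserves the cleanup label for later failures.

```c
#include <errno.h>
#include <stdlib.h>

struct conn {
    int id;
};

/*
 * 'alloc' is injectable so the failure path can be exercised;
 * 'later_step_rc' stands in for the result of a later setup step.
 */
static int sketch_connect(struct conn **out,
                          void *(*alloc)(size_t, size_t),
                          int later_step_rc)
{
    struct conn *c = alloc(1, sizeof(*c));
    int rc;

    if (!c)
        return -ENOMEM;   /* nothing allocated yet: return directly */

    rc = later_step_rc;
    if (rc)
        goto out_free;    /* from here on there is something to free */

    *out = c;
    return 0;

out_free:
    free(c);
    return rc;
}

/* Test helper: an allocator that always fails. */
static void *failing_alloc(size_t n, size_t size)
{
    (void)n;
    (void)size;
    return NULL;
}
```

The same shape applies whenever the first step of a function is the only resource acquired so far.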
     
  • __ocfs2_free_slot_info() was called by ocfs2_init_slot_info() even if a
    call of the kzalloc() function failed.

    Return from this implementation directly after corresponding
    exception handling.

    Signed-off-by: Markus Elfring
    Cc: Mark Fasheh
    Cc: Joel Becker
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Markus Elfring
     
  • ocfs2_free_path() was called by ocfs2_merge_rec_right() even if a call of
    the ocfs2_get_right_path() function failed.

    Return from this implementation directly after corresponding
    exception handling.

    Signed-off-by: Markus Elfring
    Cc: Mark Fasheh
    Cc: Joel Becker
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Markus Elfring
     
  • ocfs2_free_path() was called by ocfs2_merge_rec_left() even if a call of
    the ocfs2_get_left_path() function failed.

    Return from this implementation directly after corresponding
    exception handling.

    Signed-off-by: Markus Elfring
    Cc: Mark Fasheh
    Cc: Joel Becker
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Markus Elfring