10 Nov, 2015

7 commits

  • The original documentation refers to cap_vm_enough_memory because
    cap_vm_enough_memory used to invoke __vm_enough_memory; it no longer
    does.

    Signed-off-by: Chun Chen
    Acked-by: Michal Hocko
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Chun Chen
     
  • Switch everything to the new and more capable implementation of abs().
    Mainly to give the new abs() a bit of a workout.

    Cc: Michal Nazarewicz
    Cc: John Stultz
    Cc: Ingo Molnar
    Cc: Steven Rostedt
    Cc: Peter Zijlstra
    Cc: Masami Hiramatsu
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
     
  • For 64-bit arguments, the abs macro casts the value to an int, which
    loses precision and may cause incorrect results. The abs64 macro was
    introduced to deal with 64-bit types, but there are still places where
    the abs macro is used incorrectly.

    To deal with the problem, expand abs macro such that it operates on s64
    type when dealing with 64-bit types while still returning long when
    dealing with smaller types.

    This fixes one known bug (per John):

    : The internal clocksteering done for fine-grained error correction uses a
    : logarithmic approximation, so any time adjtimex() adjusts the clock
    : steering, timekeeping_freqadjust() quickly approximates the correct clock
    : frequency over a series of ticks.
    :
    : Unfortunately, the logic in timekeeping_freqadjust(), introduced in commit
    : dc491596f639438 (Rework frequency adjustments to work better w/ nohz),
    : used the abs() function with a s64 error value to calculate the size of
    : the approximated adjustment to be made.
    :
    : Per include/linux/kernel.h: "abs() should not be used for 64-bit types
    : (s64, u64, long long) - use abs64()".
    :
    : Thus on 32-bit platforms, this caused the clocksteering to take a
    : quite dampened random walk trying to converge on the proper frequency,
    : which caused the adjustments to be made much slower than intended (most
    : easily observed when large adjustments are made).
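
The truncation described above can be reproduced in user space. The following is a minimal sketch, not the kernel macro: abs_via_int() is a hypothetical stand-in for the old int-casting behaviour, and abs_sketch() is an illustrative size-dispatching macro that, unlike the kernel version, always yields int64_t.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the old behaviour: the value is squeezed
 * through int, so the upper 32 bits of a 64-bit argument are lost. */
static inline long abs_via_int(int64_t x)
{
	int v = (int)x;	/* truncates when x does not fit in int */
	return v < 0 ? -(long)v : (long)v;
}

static inline int64_t abs_s64_sketch(int64_t x)
{
	return x < 0 ? -x : x;
}

static inline long abs_long_sketch(long x)
{
	return x < 0 ? -x : x;
}

/* Dispatch on operand size: 64-bit operands take the s64 path, smaller
 * ones the long path. The result type is always int64_t in this sketch. */
#define abs_sketch(x) \
	(sizeof(x) == sizeof(int64_t) ? abs_s64_sketch(x) \
				      : (int64_t)abs_long_sketch(x))

/* Usage: a 64-bit error value that does not fit in 32 bits. */
void abs_sketch_demo(void)
{
	int64_t err = -5000000000LL;

	assert(abs_sketch(err) == 5000000000LL);	/* magnitude preserved */
	assert(abs_via_int(err) != 5000000000LL);	/* magnitude lost */
}
```

With a value such as -5000000000, the int-casting variant cannot return the true magnitude, which is the kind of error that slowed the frequency convergence described in the quoted report.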

    Signed-off-by: Michal Nazarewicz
    Reported-by: John Stultz
    Tested-by: John Stultz
    Cc: Ingo Molnar
    Cc: Steven Rostedt
    Cc: Peter Zijlstra
    Cc: Masami Hiramatsu
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michal Nazarewicz
     
  • Signed-off-by: Mathieu Desnoyers
    Acked-by: "David S. Miller"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mathieu Desnoyers
     
  • A previous commit introduced the new mlock2 syscall; add entries for the
    MIPS architecture.

    Signed-off-by: Eric B Munson
    Acked-by: Ralf Baechle
    Cc: Catalin Marinas
    Cc: Geert Uytterhoeven
    Cc: Guenter Roeck
    Cc: Heiko Carstens
    Cc: Jonathan Corbet
    Cc: Kirill A. Shutemov
    Cc: Michael Kerrisk
    Cc: Michal Hocko
    Cc: Shuah Khan
    Cc: Stephen Rothwell
    Cc: Vlastimil Babka
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Eric B Munson
     
  • Fix kernel-doc warnings in fs/fs-writeback.c by moving a #define macro to
    after the function's opening brace. Also #undef this macro at the end of
    the function.

    ../fs/fs-writeback.c:1984: warning: Excess function parameter 'inode' description in 'I_DIRTY_INODE'
    ../fs/fs-writeback.c:1984: warning: Excess function parameter 'flags' description in 'I_DIRTY_INODE'
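
The pattern used by the fix can be sketched as follows. This is an illustrative example with hypothetical names and placeholder bit values, not the code from fs/fs-writeback.c: the helper macro is defined after the opening brace, so the kernel-doc comment is followed directly by the function it documents, and the macro is undefined again before the function ends.

```c
#include <assert.h>

/**
 * flags_are_dirty_sketch - report whether any dirty bit is set
 * @flags: bit mask to test
 *
 * Hypothetical function; the bit values below are placeholders.
 */
static inline int flags_are_dirty_sketch(unsigned int flags)
{
#define DIRTY_BITS_SKETCH (0x1u | 0x2u)	/* only needed inside this function */
	int dirty = (flags & DIRTY_BITS_SKETCH) != 0;
#undef DIRTY_BITS_SKETCH
	return dirty;
}

/* Usage: one flag inside the mask, one outside it. */
void kernel_doc_sketch_demo(void)
{
	assert(flags_are_dirty_sketch(0x2u) == 1);
	assert(flags_are_dirty_sketch(0x4u) == 0);
}
```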

    Signed-off-by: Randy Dunlap
    Cc: Al Viro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Randy Dunlap
     
  • Fix kernel-doc warning in fs/inode.c:

    ../fs/inode.c:1606: warning: No description found for parameter 'inode'

    Signed-off-by: Randy Dunlap
    Cc: Al Viro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Randy Dunlap
     

08 Nov, 2015

6 commits

  • __GFP_WAIT was renamed to __GFP_RECLAIM and the gfpflags_allow_blocking()
    helper was added.
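
A minimal user-space sketch of how such a helper can be used in place of testing the old flag directly. It assumes that the reclaim flag is made up of a direct-reclaim bit and a kswapd-reclaim bit and that the helper tests the direct-reclaim bit; all SKETCH_ names and bit values are placeholders, not the kernel's definitions.

```c
#include <assert.h>
#include <stdbool.h>

typedef unsigned int gfp_t;

/* Placeholder bit values chosen for this sketch only. */
#define SKETCH_GFP_DIRECT_RECLAIM	0x1u
#define SKETCH_GFP_KSWAPD_RECLAIM	0x2u
#define SKETCH_GFP_RECLAIM \
	(SKETCH_GFP_DIRECT_RECLAIM | SKETCH_GFP_KSWAPD_RECLAIM)

/* Assumption: the caller may sleep only if direct reclaim is allowed. */
static inline bool gfpflags_allow_blocking_sketch(gfp_t flags)
{
	return (flags & SKETCH_GFP_DIRECT_RECLAIM) != 0;
}

/* Usage: code that used to test the old flag asks the helper instead. */
void gfp_sketch_demo(void)
{
	assert(gfpflags_allow_blocking_sketch(SKETCH_GFP_RECLAIM));
	assert(!gfpflags_allow_blocking_sketch(SKETCH_GFP_KSWAPD_RECLAIM));
}
```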

    Cc: Stephen Rothwell
    Cc: Catalin Marinas
    Cc: Robin Murphy
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
     
  • Merge second patch-bomb from Andrew Morton:

    - most of the rest of MM

    - procfs

    - lib/ updates

    - printk updates

    - bitops infrastructure tweaks

    - checkpatch updates

    - nilfs2 update

    - signals

    - various other misc bits: coredump, seqfile, kexec, pidns, zlib, ipc,
    dma-debug, dma-mapping, ...

    * emailed patches from Andrew Morton: (102 commits)
    ipc,msg: drop dst nil validation in copy_msg
    include/linux/zutil.h: fix usage example of zlib_adler32()
    panic: release stale console lock to always get the logbuf printed out
    dma-debug: check nents in dma_sync_sg*
    dma-mapping: tidy up dma_parms default handling
    pidns: fix set/getpriority and ioprio_set/get in PRIO_USER mode
    kexec: use file name as the output message prefix
    fs, seqfile: always allow oom killer
    seq_file: reuse string_escape_str()
    fs/seq_file: use seq_* helpers in seq_hex_dump()
    coredump: change zap_threads() and zap_process() to use for_each_thread()
    coredump: ensure all coredumping tasks have SIGNAL_GROUP_COREDUMP
    signal: remove jffs2_garbage_collect_thread()->allow_signal(SIGCONT)
    signal: introduce kernel_signal_stop() to fix jffs2_garbage_collect_thread()
    signal: turn dequeue_signal_lock() into kernel_dequeue_signal()
    signals: kill block_all_signals() and unblock_all_signals()
    nilfs2: fix gcc uninitialized-variable warnings in powerpc build
    nilfs2: fix gcc unused-but-set-variable warnings
    MAINTAINERS: nilfs2: add header file for tracing
    nilfs2: add tracepoints for analyzing reading and writing metadata files
    ...

    Linus Torvalds
     
  • Pull rdma updates from Doug Ledford:
    "This is my initial round of 4.4 merge window patches. There are a few
    other things I wish to get in for 4.4 that aren't in this pull, as
    this represents what has gone through merge/build/run testing and not
    what is the last few items for which testing is not yet complete.

    - "Checksum offload support in user space" enablement
    - Misc cxgb4 fixes, add T6 support
    - Misc usnic fixes
    - 32 bit build warning fixes
    - Misc ocrdma fixes
    - Multicast loopback prevention extension
    - Extend the GID cache to store and return attributes of GIDs
    - Misc iSER updates
    - iSER clustering update
    - Network NameSpace support for rdma CM
    - Work Request cleanup series
    - New Memory Registration API"

    * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: (76 commits)
    IB/core, cma: Make __attribute_const__ declarations sparse-friendly
    IB/core: Remove old fast registration API
    IB/ipath: Remove fast registration from the code
    IB/hfi1: Remove fast registration from the code
    RDMA/nes: Remove old FRWR API
    IB/qib: Remove old FRWR API
    iw_cxgb4: Remove old FRWR API
    RDMA/cxgb3: Remove old FRWR API
    RDMA/ocrdma: Remove old FRWR API
    IB/mlx4: Remove old FRWR API support
    IB/mlx5: Remove old FRWR API support
    IB/srp: Dont allocate a page vector when using fast_reg
    IB/srp: Remove srp_finish_mapping
    IB/srp: Convert to new registration API
    IB/srp: Split srp_map_sg
    RDS/IW: Convert to new memory registration API
    svcrdma: Port to new memory registration API
    xprtrdma: Port to new memory registration API
    iser-target: Port to new memory registration API
    IB/iser: Port to new fast registration API
    ...

    Linus Torvalds
     
  • Pull trivial updates from Jiri Kosina:
    "Trivial stuff from trivial tree that can be trivially summed up as:

    - treewide drop of spurious unlikely() before IS_ERR() from Viresh
    Kumar

    - cosmetic fixes (that don't really affect basic functionality of the
    driver) for pktcdvd and bcache, from Julia Lawall and Petr Mladek

    - various comment / printk fixes and updates all over the place"

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial:
    bcache: Really show state of work pending bit
    hwmon: applesmc: fix comment typos
    Kconfig: remove comment about scsi_wait_scan module
    class_find_device: fix reference to argument "match"
    debugfs: document that debugfs_remove*() accepts NULL and error values
    net: Drop unlikely before IS_ERR(_OR_NULL)
    mm: Drop unlikely before IS_ERR(_OR_NULL)
    fs: Drop unlikely before IS_ERR(_OR_NULL)
    drivers: net: Drop unlikely before IS_ERR(_OR_NULL)
    drivers: misc: Drop unlikely before IS_ERR(_OR_NULL)
    UBI: Update comments to reflect UBI_METAONLY flag
    pktcdvd: drop null test before destroy functions

    Linus Torvalds
     
  • Pull HID updates from Jiri Kosina:
    "Highlights:

    - Intel Skylake Win8 precision touchpads support fixes/improvements
    from Mika Westerberg

    - Lenovo Yoga 2 quirk from Ritesh Raj Sarraf

    - potential uninitialized buffer access fix in HID core from Richard
    Purdie

    - Wacom Intuos and Wacom Cintiq 2 support improvements from Jason
    Gerecke and Ping Cheng

    - initiation of sysfs deprecation process for most of the roccat
    drivers, from the roccat support maintainer Stefan Achatz

    - quite a few device ID / quirk additions and small fixes"

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid: (30 commits)
    HID: logitech: Add support for G29
    HID: logitech: Simplify wheel detection scheme
    HID: wacom: Call 'wacom_query_tablet_data' only after 'hid_hw_start'
    HID: wacom: Fix ABS_MISC reporting for Cintiq Companion 2
    HID: wacom: Remove useless conditions from 'wacom_query_tablet_data'
    HID: wacom: fix Intuos wireless report id issue
    HID: fix some indenting issues
    HID: wacom: Expect 'touch_max' touches if HID_DG_CONTACTCOUNT not present
    HID: wacom: Tie cached HID_DG_CONTACTCOUNT indices to report ID
    HID: roccat: Fixed resubmit: Deprecating most Roccat sysfs attributes
    HID: wacom: Report full pressure range for Intuos, Cintiq 13HD Touch
    HID: wacom: Add support for Cintiq Companion 2
    HID: multitouch: Fetch feature reports on demand for Win8 devices
    HID: sensor-hub: Add quirk for Lenovo Yoga 2 with ITE Chips
    HID: usbhid: Fix for the WiiU adapter from Mayflash
    HID: corsair: boolify struct k90_led.removed
    HID: corsair: Add Corsair Vengeance K90 driver
    HID: hid-input: allow input_configured callback return errors
    HID: multitouch: Add suffix for HID_DG_TOUCHPAD
    HID: i2c-hid: Fill in physical device providing HID functionality
    ...

    Linus Torvalds
     
  • Pull livepatching fix from Jiri Kosina:
    "A fix for a kernel oops in case CONFIG_DEBUG_SET_MODULE_RONX is unset
    (as in such case it's possible for module struct to share a page with
    executable text, which is currently not being handled with grace) from
    Josh Poimboeuf"

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/livepatching:
    livepatch: Fix crash with !CONFIG_DEBUG_SET_MODULE_RONX

    Linus Torvalds
     

07 Nov, 2015

27 commits

  • d0edd8528362 ("ipc: convert invalid scenarios to use WARN_ON") relaxed the
    nil dst parameter check, originally being a full BUG_ON. However, this
    check seems quite unnecessary when the only purpose is for
    checkpoint/restore (MSG_COPY flag):

    o The copy variable is set initially to nil, apparently as a way of
    ensuring that prepare_copy is previously called. Which is in fact done,
    unconditionally at the beginning of do_msgrcv.

    o There is no concurrency with 'copy' (stack allocated in do_msgrcv).

    Furthermore, any errors in 'copy' (and thus prepare_copy/copy_msg) should
    always be handled by the IS_ERR() family. Therefore remove this check
    altogether as it can never occur with the current users.

    Signed-off-by: Davidlohr Bueso
    Cc: Stanislav Kinsbursky
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Davidlohr Bueso
     
  • adler32 was renamed to zlib_adler32 before 2.6.11.

    Signed-off-by: Anish Bhatt
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Anish Bhatt
     
  • In some cases we may end up killing the CPU holding the console lock
    while still having valuable data in logbuf. E.g. I'm observing the
    following:

    - A crash is happening on one CPU and console_unlock() is being called on
    some other.

    - console_unlock() tries to print out the buffer before releasing the lock
    and on slow console it takes time.

    - in the meanwhile crashing CPU does lots of printk()-s with valuable data
    (which go to the logbuf) and sends IPIs to all other CPUs.

    - console_unlock() finishes printing previous chunk and enables interrupts
    before trying to print out the rest, the CPU catches the IPI and never
    releases console lock.

    This is not the only possible case: in VT/fb subsystems we have many other
    console_lock()/console_unlock() users. Non-masked interrupts (or
    receiving NMI in case of extreme slowness) will have the same result.
    Getting the whole console buffer printed out on crash should be top
    priority.

    [akpm@linux-foundation.org: tweak comment text]
    Signed-off-by: Vitaly Kuznetsov
    Cc: HATAYAMA Daisuke
    Cc: Masami Hiramatsu
    Cc: Jiri Kosina
    Cc: Baoquan He
    Cc: Prarit Bhargava
    Cc: Xie XiuQi
    Cc: Seth Jennings
    Cc: "K. Y. Srinivasan"
    Cc: Jan Kara
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vitaly Kuznetsov
     
  • Like dma_unmap_sg, dma_sync_sg* should be called with the original number
    of entries passed to dma_map_sg, so do the same check in the sync path as
    we do in the unmap path.

    Signed-off-by: Robin Murphy
    Cc: Arnd Bergmann
    Cc: Marek Szyprowski
    Cc: Sumit Semwal
    Cc: Sakari Ailus
    Cc: Russell King
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Robin Murphy
     
  • Many DMA controllers and other devices set max_segment_size to
    indicate their scatter-gather capability, but have no interest in
    segment_boundary_mask. However, the existence of a dma_parms structure
    precludes the use of any default value, leaving them as zeros (assuming
    a properly kzalloc'ed structure). If a well-behaved IOMMU (or SWIOTLB)
    then tries to respect this by ensuring a mapped segment does not cross
    a zero-byte boundary, hilarity ensues.

    Since zero is a nonsensical value for either parameter, treat it as an
    indicator for "default", as might be expected. In the process, clean up
    a bit by replacing the bare constants with slightly more meaningful
    macros and removing the superfluous "else" statements.
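
The "zero means default" rule can be sketched in user space as below. The struct and function names are hypothetical stand-ins for the kernel's dma_parms accessors; the 64K default comes from the SZ_64K note in this commit, while the 32-bit boundary mask default is an assumption made for the sketch.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_SZ_64K		0x10000u
#define SKETCH_BOUNDARY_32BIT	0xffffffffUL	/* assumed default mask */

/* Hypothetical stand-in for the per-device DMA parameters. */
struct dma_parms_sketch {
	unsigned int max_segment_size;
	unsigned long segment_boundary_mask;
};

/* A missing structure or a zero field both select the default. */
static inline unsigned int
max_seg_size_sketch(const struct dma_parms_sketch *p)
{
	if (p && p->max_segment_size)
		return p->max_segment_size;
	return SKETCH_SZ_64K;
}

static inline unsigned long
seg_boundary_sketch(const struct dma_parms_sketch *p)
{
	if (p && p->segment_boundary_mask)
		return p->segment_boundary_mask;
	return SKETCH_BOUNDARY_32BIT;
}

/* Usage: a zeroed structure where only max_segment_size was set. */
void dma_parms_sketch_demo(void)
{
	struct dma_parms_sketch p = { .max_segment_size = 4096 };

	assert(max_seg_size_sketch(&p) == 4096);
	assert(seg_boundary_sketch(&p) == SKETCH_BOUNDARY_32BIT);
}
```

A driver that only sets max_segment_size therefore no longer ends up with a zero boundary mask.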

    [akpm@linux-foundation.org: dma-mapping.h needs sizes.h for SZ_64K]
    Signed-off-by: Robin Murphy
    Reviewed-by: Sumit Semwal
    Acked-by: Marek Szyprowski
    Cc: Arnd Bergmann
    Cc: Sakari Ailus
    Cc: Russell King
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Robin Murphy
     
  • setpriority(PRIO_USER, 0, x) will change the priority of tasks outside of
    the current pid namespace. This is in contrast to both the other modes of
    setpriority and the example of kill(-1). Fix this. getpriority and
    ioprio have the same failure mode, fix them too.

    Eric said:

    : After some more thinking about it this patch sounds justifiable.
    :
    : My goal with namespaces is not to build perfect isolation mechanisms
    : as that can get into ill defined territory, but to build well defined
    : mechanisms. And to handle the corner cases so you can use only
    : a single namespace with well defined results.
    :
    : In this case you have found the two interfaces I am aware of that
    : identify processes by uid instead of by pid. Which quite frankly is
    : weird. Unfortunately the weird unexpected cases are hard to handle
    : in the usual way.
    :
    : I was hoping for a little more information. Changes like this one we
    : have to be careful of because someone might be depending on the current
    : behavior. I don't think they are and I do think this make sense as part
    : of the pid namespace.

    Signed-off-by: Ben Segall
    Cc: Oleg Nesterov
    Cc: Al Viro
    Cc: Ambrose Feinstein
    Acked-by: "Eric W. Biederman"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ben Segall
     
  • kexec output messages lost the "kexec" prefix when Dave Young split the
    kexec code. Use the file name as the output message prefix instead.

    Currently, the format of output message:
    [ 140.290795] SYSC_kexec_load: hello, world
    [ 140.291534] kexec: sanity_check_segment_list: hello, world

    Ideally, the format of output message:
    [ 30.791503] kexec: SYSC_kexec_load, Hello, world
    [ 79.182752] kexec_core: sanity_check_segment_list, Hello, world

    Remove the custom prefix "kexec" in output message.
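
The prefixing scheme can be illustrated in user space with string-literal concatenation. This is a sketch under assumptions: SKETCH_MODNAME is a hypothetical stand-in for the per-file name the build system would supply, and sketch_format() stands in for the kernel's printing helpers.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the per-file module name. */
#define SKETCH_MODNAME "kexec_core"

/* Prepend "<file name>: " to every format string at compile time. */
#define sketch_fmt(fmt) SKETCH_MODNAME ": " fmt

#define sketch_format(buf, size, fmt, ...) \
	snprintf(buf, size, sketch_fmt(fmt), __VA_ARGS__)

/* Usage: the caller no longer spells out a prefix of its own. */
void kexec_prefix_sketch_demo(void)
{
	char buf[64];

	sketch_format(buf, sizeof(buf), "%s, Hello, world",
		      "sanity_check_segment_list");
	assert(strcmp(buf,
		      "kexec_core: sanity_check_segment_list, Hello, world") == 0);
}
```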

    Signed-off-by: Minfei Huang
    Acked-by: Dave Young
    Cc: "Eric W. Biederman"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Minfei Huang
     
  • Since 5cec38ac866b ("fs, seq_file: fallback to vmalloc instead of oom kill
    processes") seq_buf_alloc() avoids calling the oom killer for PAGE_SIZE or
    smaller allocations; but larger allocations can use the oom killer via
    vmalloc(). Thus reads of small files can return ENOMEM, but larger files
    use the oom killer to avoid ENOMEM.

    The effect of this bug is that reads from /proc and other virtual
    filesystems can return ENOMEM instead of the preferred behavior - oom
    killing something (possibly the calling process). I don't know of anyone
    except Google who has noticed the issue.

    I suspect the fix is more needed in smaller systems where there isn't any
    reclaimable memory. But these seem like the kinds of systems which
    probably don't use the oom killer for production situations.

    Memory overcommit requires use of the oom killer to select a victim
    regardless of file size.

    Enable oom killer for small seq_buf_alloc() allocations.

    Fixes: 5cec38ac866b ("fs, seq_file: fallback to vmalloc instead of oom kill processes")
    Signed-off-by: David Rientjes
    Signed-off-by: Greg Thelen
    Acked-by: Eric Dumazet
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Greg Thelen
     
  • string_escape_str() escapes an input string by given criteria. In case of
    seq_escape() the criteria is to convert some characters to their octal
    representation.
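
A minimal user-space sketch of the escaping rule, in which selected characters are converted to a backslash followed by three octal digits. escape_octal_sketch() is a hypothetical function written for illustration; it is not the signature of string_escape_str() or seq_escape().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copy src to dst, replacing every character found in esc with "\ooo".
 * Returns the number of bytes written (excluding the terminating NUL),
 * or (size_t)-1 if dst is too small. */
static inline size_t escape_octal_sketch(const char *src, const char *esc,
					 char *dst, size_t size)
{
	size_t n = 0;

	for (; *src; src++) {
		unsigned char c = (unsigned char)*src;

		if (strchr(esc, c)) {
			if (n + 4 >= size)	/* 4 bytes plus room for NUL */
				return (size_t)-1;
			dst[n++] = '\\';
			dst[n++] = (char)('0' + ((c >> 6) & 7));
			dst[n++] = (char)('0' + ((c >> 3) & 7));
			dst[n++] = (char)('0' + (c & 7));
		} else {
			if (n + 1 >= size)	/* 1 byte plus room for NUL */
				return (size_t)-1;
			dst[n++] = (char)c;
		}
	}
	if (n >= size)
		return (size_t)-1;
	dst[n] = '\0';
	return n;
}

/* Usage: escape spaces and newlines. */
void escape_sketch_demo(void)
{
	char out[32];

	assert(escape_octal_sketch("a b\n", " \n", out, sizeof(out)) == 10);
	assert(strcmp(out, "a\\040b\\012") == 0);
}
```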

    Signed-off-by: Andy Shevchenko
    Cc: Alexander Viro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andy Shevchenko
     
  • This improves code readability.

    Signed-off-by: Andy Shevchenko
    Cc: Alexander Viro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andy Shevchenko
     
  • Change zap_threads() paths to use for_each_thread() rather than
    while_each_thread().

    While at it, change zap_threads() to avoid the nested if's to make the
    code more readable and lessen the indentation.

    Signed-off-by: Oleg Nesterov
    Cc: David Rientjes
    Cc: Kyle Walker
    Cc: Michal Hocko
    Cc: Stanislav Kozina
    Cc: Tetsuo Handa
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • task_will_free_mem() is wrong in many ways, and in particular the
    SIGNAL_GROUP_COREDUMP check is not reliable: a task can participate in the
    coredumping without SIGNAL_GROUP_COREDUMP bit set.

    Change zap_threads() paths to always set SIGNAL_GROUP_COREDUMP even if
    other CLONE_VM processes can't react to SIGKILL. Fortunately, at least the
    oom-kill case is fine; it kills all tasks sharing the same mm, so it
    should also kill the process which actually dumps the core.

    The change in prepare_signal() is not strictly necessary, it just ensures
    that the patch does not bring another subtle behavioural change. But it
    reminds us that this SIGNAL_GROUP_EXIT/COREDUMP case needs more changes.

    Signed-off-by: Oleg Nesterov
    Cc: David Rientjes
    Cc: Kyle Walker
    Acked-by: Michal Hocko
    Cc: Stanislav Kozina
    Cc: Tetsuo Handa
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • jffs2_garbage_collect_thread() does allow_signal(SIGCONT) for no reason;
    SIGCONT will wake a stopped task up even if it is ignored.

    Signed-off-by: Oleg Nesterov
    Reviewed-by: Tejun Heo
    Cc: David Woodhouse
    Cc: Felipe Balbi
    Cc: Markus Pargmann
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • jffs2_garbage_collect_thread() can race with SIGCONT and sleep in
    TASK_STOPPED state after it was already sent. Add the new helper,
    kernel_signal_stop(), which does this correctly.

    Signed-off-by: Oleg Nesterov
    Reviewed-by: Tejun Heo
    Cc: David Woodhouse
    Cc: Felipe Balbi
    Cc: Markus Pargmann
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • 1. Rename dequeue_signal_lock() to kernel_dequeue_signal(). This
    matches another "for kthreads only" kernel_sigaction() helper.

    2. Remove the "tsk" and "mask" arguments, they are always current
    and current->blocked. And it is simply wrong if tsk != current.

    3. We could also remove the 3rd "siginfo_t *info" arg but it looks
    potentially useful. However we can simplify the callers if we
    change kernel_dequeue_signal() to accept info => NULL.

    4. Remove _irqsave, it is never called from atomic context.

    Signed-off-by: Oleg Nesterov
    Reviewed-by: Tejun Heo
    Cc: David Woodhouse
    Cc: Felipe Balbi
    Cc: Markus Pargmann
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • It is hardly possible to enumerate all problems with block_all_signals()
    and unblock_all_signals(). Just for example,

    1. block_all_signals(SIGSTOP/etc) simply can't help if the caller is
    multithreaded. Another thread can dequeue the signal and force the
    group stop.

    2. Even if the caller is single-threaded, it will "stop" anyway. It
    will not sleep, but it will spin in kernel space until SIGCONT or
    SIGKILL.

    And a lot more. In short, this interface doesn't work at all, at least
    the last 10+ years.

    Daniel said:

    Yeah the only times I played around with the DRM_LOCK stuff was when
    old drivers accidentally deadlocked - my impression is that the entire
    DRM_LOCK thing was never really tested properly ;-) Hence I'm all for
    purging where this leaks out of the drm subsystem.

    Signed-off-by: Oleg Nesterov
    Acked-by: Daniel Vetter
    Acked-by: Dave Airlie
    Cc: Richard Weinberger
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • Some false positive warnings are reported for powerpc build.

    The following warnings are reported in
    http://kisskb.ellerman.id.au/kisskb/buildresult/12519703/

    CC fs/nilfs2/super.o
    fs/nilfs2/super.c: In function 'nilfs_resize_fs':
    fs/nilfs2/super.c:376:2: warning: 'blocknr' may be used uninitialized in this function [-Wuninitialized]
    fs/nilfs2/super.c:362:11: note: 'blocknr' was declared here
    CC fs/nilfs2/recovery.o
    fs/nilfs2/recovery.c: In function 'nilfs_salvage_orphan_logs':
    fs/nilfs2/recovery.c:631:21: warning: 'sum' may be used uninitialized in this function [-Wuninitialized]
    fs/nilfs2/recovery.c:585:32: note: 'sum' was declared here
    fs/nilfs2/recovery.c: In function 'nilfs_search_super_root':
    fs/nilfs2/recovery.c:873:11: warning: 'sum' may be used uninitialized in this function [-Wuninitialized]

    Another similar warning is reported in
    http://kisskb.ellerman.id.au/kisskb/buildresult/12520079/

    CC fs/nilfs2/btree.o
    fs/nilfs2/btree.c: In function 'nilfs_btree_convert_and_insert':
    include/asm-generic/bitops/non-atomic.h:105:20: warning: 'bh' may be used uninitialized in this function [-Wuninitialized]
    fs/nilfs2/btree.c:1859:22: note: 'bh' was declared here

    This cleans out these warnings by forcing the variables to be initialized.

    Signed-off-by: Ryusuke Konishi
    Reported-by: Geert Uytterhoeven
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ryusuke Konishi
     
  • Fix the following build warnings:

    $ make W=1
    [...]
    CC [M] fs/nilfs2/btree.o
    fs/nilfs2/btree.c: In function 'nilfs_btree_split':
    fs/nilfs2/btree.c:923:8: warning: variable 'newptr' set but not used [-Wunused-but-set-variable]
    __u64 newptr;
    ^
    fs/nilfs2/btree.c:922:8: warning: variable 'newkey' set but not used [-Wunused-but-set-variable]
    __u64 newkey;
    ^
    CC [M] fs/nilfs2/dat.o
    fs/nilfs2/dat.c: In function 'nilfs_dat_prepare_end':
    fs/nilfs2/dat.c:158:8: warning: variable 'start' set but not used [-Wunused-but-set-variable]
    __u64 start;
    ^
    CC [M] fs/nilfs2/segment.o
    fs/nilfs2/segment.c: In function 'nilfs_segctor_do_immediate_flush':
    fs/nilfs2/segment.c:2433:6: warning: variable 'err' set but not used [-Wunused-but-set-variable]
    int err;
    ^
    CC [M] fs/nilfs2/sufile.o
    fs/nilfs2/sufile.c: In function 'nilfs_sufile_alloc':
    fs/nilfs2/sufile.c:320:27: warning: variable 'ncleansegs' set but not used [-Wunused-but-set-variable]
    unsigned long nsegments, ncleansegs, nsus, cnt;
    ^
    CC [M] fs/nilfs2/alloc.o
    fs/nilfs2/alloc.c: In function 'nilfs_palloc_prepare_alloc_entry':
    fs/nilfs2/alloc.c:478:38: warning: variable 'groups_per_desc_block' set but not used [-Wunused-but-set-variable]
    unsigned long n, entries_per_group, groups_per_desc_block;
    ^

    Signed-off-by: Ryusuke Konishi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ryusuke Konishi
     
  • This adds header file "include/trace/events/nilfs2.h" to maintainer-ship
    of nilfs2 so that updates to the nilfs2 header file go to the mailing list
    of nilfs2.

    Signed-off-by: Ryusuke Konishi
    Cc: Hitoshi Mitake
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ryusuke Konishi
     
  • This patch adds tracepoints for analyzing requests of reading and writing
    metadata files. The tracepoints cover all in-place mdt files (cpfile,
    sufile, and datfile).

    Example of tracing mdt_insert_new_block():
    cp-14635 [000] ...1 30598.199309: nilfs2_mdt_insert_new_block: inode = ffff88022a8d0178 ino = 3 block = 155
    cp-14635 [000] ...1 30598.199520: nilfs2_mdt_insert_new_block: inode = ffff88022a8d0178 ino = 3 block = 5
    cp-14635 [000] ...1 30598.200828: nilfs2_mdt_insert_new_block: inode = ffff88022a8d0178 ino = 3 block = 253

    Signed-off-by: Hitoshi Mitake
    Signed-off-by: Ryusuke Konishi
    Cc: Steven Rostedt
    Cc: TK Kato
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hitoshi Mitake
     
  • This patch adds tracepoints which would be useful for analyzing segment
    usage from a perspective of high level sufile manipulation (check, alloc,
    free). sufile is an important in-place updated metadata file, so
    analyzing the behavior would be useful for performance tuning.

    example of usage (a case of allocation):

    $ sudo bin/tpoint nilfs2:nilfs2_segment_usage_allocated
    Tracing nilfs2:nilfs2_segment_usage_allocated. Ctrl-C to end.
    segctord-17800 [002] ...1 10671.867294: nilfs2_segment_usage_allocated: sufile = ffff880054f908a8 segnum = 2
    segctord-17800 [002] ...1 10675.073477: nilfs2_segment_usage_allocated: sufile = ffff880054f908a8 segnum = 3

    Signed-off-by: Hitoshi Mitake
    Signed-off-by: Ryusuke Konishi
    Cc: Steven Rostedt
    Cc: Benixon Dhas
    Cc: TK Kato
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hitoshi Mitake
     
  • This patch adds a tracepoint for transaction events of nilfs. With the
    tracepoint, these events can be tracked: begin, abort, commit, trylock,
    lock, and unlock. Basically, these events have corresponding functions,
    e.g. the begin event corresponds to nilfs_transaction_begin(). The unlock
    event is an exception. It corresponds to the iteration in
    nilfs_transaction_lock().

    Only one tracepoint is introduced: nilfs2_transaction_transition. The
    above events are distinguished with a newly introduced enum. With this
    tracepoint, we can analyse a critical section of segment construction.

    Sample output by tpoint of perf-tools:
    cp-4457 [000] ...1 63.266220: nilfs2_transaction_transition: sb = ffff8802112b8800 ti = ffff8800bf5ccc58 count = 1 flags = 9 state = BEGIN
    cp-4457 [000] ...1 63.266221: nilfs2_transaction_transition: sb = ffff8802112b8800 ti = ffff8800bf5ccc58 count = 0 flags = 9 state = COMMIT
    cp-4457 [000] ...1 63.266221: nilfs2_transaction_transition: sb = ffff8802112b8800 ti = ffff8800bf5ccc58 count = 0 flags = 9 state = COMMIT
    segctord-4371 [001] ...1 68.261196: nilfs2_transaction_transition: sb = ffff8802112b8800 ti = ffff8800b889bdf8 count = 0 flags = 10 state = TRYLOCK
    segctord-4371 [001] ...1 68.261280: nilfs2_transaction_transition: sb = ffff8802112b8800 ti = ffff8800b889bdf8 count = 0 flags = 10 state = LOCK
    segctord-4371 [001] ...1 68.261877: nilfs2_transaction_transition: sb = ffff8802112b8800 ti = ffff8800b889bdf8 count = 1 flags = 10 state = BEGIN
    segctord-4371 [001] ...1 68.262116: nilfs2_transaction_transition: sb = ffff8802112b8800 ti = ffff8800b889bdf8 count = 0 flags = 18 state = COMMIT
    segctord-4371 [001] ...1 68.265032: nilfs2_transaction_transition: sb = ffff8802112b8800 ti = ffff8800b889bdf8 count = 0 flags = 18 state = UNLOCK
    segctord-4371 [001] ...1 132.376847: nilfs2_transaction_transition: sb = ffff8802112b8800 ti = ffff8800b889bdf8 count = 0 flags = 10 state = TRYLOCK

    This patch also does trivial cleaning of comma usage in collection stage
    transition event for consistent coding style.

    Signed-off-by: Hitoshi Mitake
    Signed-off-by: Ryusuke Konishi
    Cc: Steven Rostedt
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hitoshi Mitake
     
  • This patch adds a tracepoint for tracking stage transition of block
    collection in segment construction. With the tracepoint, we can analyze
    the behavior of segment construction in depth. It would be useful for
    bottleneck detection and debugging, etc.

    The tracepoint is created with the standard trace API of linux (like ext3,
    ext4, f2fs and btrfs). So we can analyze with existing tools easily. Of
    course, more detailed analysis will be possible if we can create nilfs
    specific analysis tools.

    Below is an example of event dump with Brendan Gregg's perf-tools
    (https://github.com/brendangregg/perf-tools). Time consumption between
    each stage can be obtained.

    $ sudo bin/tpoint nilfs2:nilfs2_collection_stage_transition
    Tracing nilfs2:nilfs2_collection_stage_transition. Ctrl-C to end.
    segctord-14875 [003] ...1 28311.067794: nilfs2_collection_stage_transition: sci = ffff8800ce6de000 stage = ST_INIT
    segctord-14875 [003] ...1 28311.068139: nilfs2_collection_stage_transition: sci = ffff8800ce6de000 stage = ST_GC
    segctord-14875 [003] ...1 28311.068139: nilfs2_collection_stage_transition: sci = ffff8800ce6de000 stage = ST_FILE
    segctord-14875 [003] ...1 28311.068486: nilfs2_collection_stage_transition: sci = ffff8800ce6de000 stage = ST_IFILE
    segctord-14875 [003] ...1 28311.068540: nilfs2_collection_stage_transition: sci = ffff8800ce6de000 stage = ST_CPFILE
    segctord-14875 [003] ...1 28311.068561: nilfs2_collection_stage_transition: sci = ffff8800ce6de000 stage = ST_SUFILE
    segctord-14875 [003] ...1 28311.068565: nilfs2_collection_stage_transition: sci = ffff8800ce6de000 stage = ST_DAT
    segctord-14875 [003] ...1 28311.068573: nilfs2_collection_stage_transition: sci = ffff8800ce6de000 stage = ST_SR
    segctord-14875 [003] ...1 28311.068574: nilfs2_collection_stage_transition: sci = ffff8800ce6de000 stage = ST_DONE

    For capturing transition correctly, this patch adds wrappers for the
    member scnt of nilfs_cstage. With this change, every transition of the
    stage can produce trace event in a correct manner.

    Signed-off-by: Hitoshi Mitake
    Signed-off-by: Ryusuke Konishi
    Cc: Steven Rostedt
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hitoshi Mitake
     
  • As a nilfs2 volume ages, the amount of available disk space decreases
    little by little due to bloat of DAT (disk address translation) metadata
    file. Even if we delete all files in a file system and free their block
    addresses from the DAT file through a garbage collection, empty DAT blocks
    are not freed.

    This fixes the issue by extending the deallocator of block addresses so
    that empty data blocks and empty bitmap blocks of DAT are deleted.

    The following comparison shows the effect of this patch. Each shows disk
    amount information of a nilfs2 volume that we cleaned out by deleting all
    files and running gc after having filled 90% of its capacity.

    Before:
    Filesystem 1K-blocks Used Available Use% Mounted on
    /dev/sda1 500105212 3022844 472072192 1% /test

    After:
    Filesystem 1K-blocks Used Available Use% Mounted on
    /dev/sda1 500105212 16380 475078656 1% /test

    Signed-off-by: Ryusuke Konishi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ryusuke Konishi
     
  • This adds delete functions for data blocks of metadata files using bitmap
    based allocator. nilfs_palloc_delete_entry_block() deletes an entry block
    (e.g. block storing dat entries), and nilfs_palloc_delete_bitmap_block()
    deletes a bitmap block, respectively.

    These helpers are intended to be used in the successive change on
    deallocator of block addresses ("nilfs2: free unused dat file blocks
    during garbage collection").

    Signed-off-by: Ryusuke Konishi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ryusuke Konishi
     
  • This unfolds nilfs_palloc_group_is_in() helper function into
    nilfs_palloc_freev() function to simplify a range check and an index
    calculation repeatedly performed in a loop of the function.

    Signed-off-by: Ryusuke Konishi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ryusuke Konishi
     
  • The current implementation of nilfs_palloc_find_available_slot() function
    is overkill. The underlying bit search routine is well optimized, so this
    uses it more simply in nilfs_palloc_find_available_slot().
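
The idea of leaning on a bit search routine can be sketched as follows. This is an illustration, not the nilfs2 code: all names are hypothetical, the bit-by-bit loop merely stands in for an optimized find-next-zero-bit routine, and the sketch does no locking or atomic bit operations.

```c
#include <assert.h>
#include <limits.h>

#define BITS_PER_WORD_SKETCH (sizeof(unsigned long) * CHAR_BIT)

static inline int test_bit_sketch(const unsigned long *map, unsigned int pos)
{
	return (int)((map[pos / BITS_PER_WORD_SKETCH] >>
		      (pos % BITS_PER_WORD_SKETCH)) & 1UL);
}

static inline void set_bit_sketch(unsigned long *map, unsigned int pos)
{
	map[pos / BITS_PER_WORD_SKETCH] |= 1UL << (pos % BITS_PER_WORD_SKETCH);
}

/* Naive stand-in for an optimized search: first clear bit in [pos, end),
 * or end if there is none. */
static inline unsigned int find_next_zero_sketch(const unsigned long *map,
						 unsigned int end,
						 unsigned int pos)
{
	while (pos < end && test_bit_sketch(map, pos))
		pos++;
	return pos;
}

/* Claim the first free slot at or after target, wrapping around to the
 * start of the bitmap. Returns the slot index, or -1 if the map is full. */
static inline int find_available_slot_sketch(unsigned long *map,
					     unsigned int target,
					     unsigned int nbits)
{
	unsigned int pos = nbits, end = nbits;

	if (target < nbits) {
		pos = find_next_zero_sketch(map, nbits, target);
		end = target;		/* wrapped search stops here */
	}
	if (pos >= nbits) {
		pos = find_next_zero_sketch(map, end, 0);
		if (pos >= end)
			return -1;
	}
	set_bit_sketch(map, pos);
	return (int)pos;
}

/* Usage: slots 3 and 4 are taken, so a search from 3 yields 5. */
void slot_sketch_demo(void)
{
	unsigned long map[1] = { 0 };

	set_bit_sketch(map, 3);
	set_bit_sketch(map, 4);
	assert(find_available_slot_sketch(map, 3, 8) == 5);
}
```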

    Signed-off-by: Ryusuke Konishi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ryusuke Konishi