10 Nov, 2015
7 commits
-
The original documentation references cap_vm_enough_memory because
cap_vm_enough_memory used to invoke __vm_enough_memory; it no longer
does.

Signed-off-by: Chun Chen
Acked-by: Michal Hocko
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Switch everything to the new and more capable implementation of abs().
Mainly to give the new abs() a bit of a workout.

Cc: Michal Nazarewicz
Cc: John Stultz
Cc: Ingo Molnar
Cc: Steven Rostedt
Cc: Peter Zijlstra
Cc: Masami Hiramatsu
Cc: Peter Zijlstra
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
For 64-bit arguments, the abs macro casts its argument to an int, which
leads to lost precision and may cause incorrect results. To deal with
64-bit types the abs64 macro has been introduced, but there are still
places where the abs macro is used incorrectly.

To deal with the problem, expand the abs macro such that it operates on
the s64 type when dealing with 64-bit types while still returning long
when dealing with smaller types.

This fixes one known bug (per John):

: The internal clocksteering done for fine-grained error correction uses a
: logarithmic approximation, so any time adjtimex() adjusts the clock
: steering, timekeeping_freqadjust() quickly approximates the correct clock
: frequency over a series of ticks.
:
: Unfortunately, the logic in timekeeping_freqadjust(), introduced in commit
: dc491596f639438 (Rework frequency adjustments to work better w/ nohz),
: used the abs() function with a s64 error value to calculate the size of
: the approximated adjustment to be made.
:
: Per include/linux/kernel.h: "abs() should not be used for 64-bit types
: (s64, u64, long long) - use abs64()".
:
: Thus on 32-bit platforms, this resulted in the clocksteering taking a
: quite dampened random walk trying to converge on the proper frequency,
: which caused the adjustments to be made much slower than intended (most
: easily observed when large adjustments are made).

Signed-off-by: Michal Nazarewicz
Reported-by: John Stultz
Tested-by: John Stultz
Cc: Ingo Molnar
Cc: Steven Rostedt
Cc: Peter Zijlstra
Cc: Masami Hiramatsu
Cc: Peter Zijlstra
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Signed-off-by: Mathieu Desnoyers
Acked-by: "David S. Miller"
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
A previous commit introduced the new mlock2 syscall; add entries for the
MIPS architecture.

Signed-off-by: Eric B Munson
Acked-by: Ralf Baechle
Cc: Catalin Marinas
Cc: Geert Uytterhoeven
Cc: Guenter Roeck
Cc: Heiko Carstens
Cc: Jonathan Corbet
Cc: Kirill A. Shutemov
Cc: Michael Kerrisk
Cc: Michal Hocko
Cc: Shuah Khan
Cc: Stephen Rothwell
Cc: Vlastimil Babka
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Fix kernel-doc warnings in fs/fs-writeback.c by moving a #define macro to
after the function's opening brace. Also #undef this macro at the end of
the function.

../fs/fs-writeback.c:1984: warning: Excess function parameter 'inode' description in 'I_DIRTY_INODE'
../fs/fs-writeback.c:1984: warning: Excess function parameter 'flags' description in 'I_DIRTY_INODE'

Signed-off-by: Randy Dunlap
Cc: Al Viro
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Fix kernel-doc warning in fs/inode.c:
../fs/inode.c:1606: warning: No description found for parameter 'inode'
Signed-off-by: Randy Dunlap
Cc: Al Viro
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
08 Nov, 2015
6 commits
-
__GFP_WAIT was renamed to __GFP_RECLAIM and the gfpflags_allow_blocking()
helper was added.

Cc: Stephen Rothwell
Cc: Catalin Marinas
Cc: Robin Murphy
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Merge second patch-bomb from Andrew Morton:
- most of the rest of MM
- procfs
- lib/ updates
- printk updates
- bitops infrastructure tweaks
- checkpatch updates
- nilfs2 update
- signals
- various other misc bits: coredump, seqfile, kexec, pidns, zlib, ipc,
  dma-debug, dma-mapping, ...

* emailed patches from Andrew Morton: (102 commits)
ipc,msg: drop dst nil validation in copy_msg
include/linux/zutil.h: fix usage example of zlib_adler32()
panic: release stale console lock to always get the logbuf printed out
dma-debug: check nents in dma_sync_sg*
dma-mapping: tidy up dma_parms default handling
pidns: fix set/getpriority and ioprio_set/get in PRIO_USER mode
kexec: use file name as the output message prefix
fs, seqfile: always allow oom killer
seq_file: reuse string_escape_str()
fs/seq_file: use seq_* helpers in seq_hex_dump()
coredump: change zap_threads() and zap_process() to use for_each_thread()
coredump: ensure all coredumping tasks have SIGNAL_GROUP_COREDUMP
signal: remove jffs2_garbage_collect_thread()->allow_signal(SIGCONT)
signal: introduce kernel_signal_stop() to fix jffs2_garbage_collect_thread()
signal: turn dequeue_signal_lock() into kernel_dequeue_signal()
signals: kill block_all_signals() and unblock_all_signals()
nilfs2: fix gcc uninitialized-variable warnings in powerpc build
nilfs2: fix gcc unused-but-set-variable warnings
MAINTAINERS: nilfs2: add header file for tracing
nilfs2: add tracepoints for analyzing reading and writing metadata files
... -
Pull rdma updates from Doug Ledford:
"This is my initial round of 4.4 merge window patches. There are a few
other things I wish to get in for 4.4 that aren't in this pull, as
this represents what has gone through merge/build/run testing and not
what is the last few items for which testing is not yet complete.

- "Checksum offload support in user space" enablement
- Misc cxgb4 fixes, add T6 support
- Misc usnic fixes
- 32 bit build warning fixes
- Misc ocrdma fixes
- Multicast loopback prevention extension
- Extend the GID cache to store and return attributes of GIDs
- Misc iSER updates
- iSER clustering update
- Network NameSpace support for rdma CM
- Work Request cleanup series
- New Memory Registration API"

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: (76 commits)
IB/core, cma: Make __attribute_const__ declarations sparse-friendly
IB/core: Remove old fast registration API
IB/ipath: Remove fast registration from the code
IB/hfi1: Remove fast registration from the code
RDMA/nes: Remove old FRWR API
IB/qib: Remove old FRWR API
iw_cxgb4: Remove old FRWR API
RDMA/cxgb3: Remove old FRWR API
RDMA/ocrdma: Remove old FRWR API
IB/mlx4: Remove old FRWR API support
IB/mlx5: Remove old FRWR API support
IB/srp: Dont allocate a page vector when using fast_reg
IB/srp: Remove srp_finish_mapping
IB/srp: Convert to new registration API
IB/srp: Split srp_map_sg
RDS/IW: Convert to new memory registration API
svcrdma: Port to new memory registration API
xprtrdma: Port to new memory registration API
iser-target: Port to new memory registration API
IB/iser: Port to new fast registration API
... -
Pull trivial updates from Jiri Kosina:
"Trivial stuff from trivial tree that can be trivially summed up as:

- treewide drop of spurious unlikely() before IS_ERR() from Viresh
  Kumar

- cosmetic fixes (that don't really affect basic functionality of the
  driver) for pktcdvd and bcache, from Julia Lawall and Petr Mladek

- various comment / printk fixes and updates all over the place"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial:
bcache: Really show state of work pending bit
hwmon: applesmc: fix comment typos
Kconfig: remove comment about scsi_wait_scan module
class_find_device: fix reference to argument "match"
debugfs: document that debugfs_remove*() accepts NULL and error values
net: Drop unlikely before IS_ERR(_OR_NULL)
mm: Drop unlikely before IS_ERR(_OR_NULL)
fs: Drop unlikely before IS_ERR(_OR_NULL)
drivers: net: Drop unlikely before IS_ERR(_OR_NULL)
drivers: misc: Drop unlikely before IS_ERR(_OR_NULL)
UBI: Update comments to reflect UBI_METAONLY flag
pktcdvd: drop null test before destroy functions -
Pull HID updates from Jiri Kosina:
"Highlights:

- Intel Skylake Win8 precision touchpads support fixes/improvements
  from Mika Westerberg

- Lenovo Yoga 2 quirk from Ritesh Raj Sarraf

- potential uninitialized buffer access fix in HID core from Richard
  Purdie

- Wacom Intuos and Wacom Cintiq 2 support improvements from Jason
  Gerecke and Ping Cheng

- initiation of sysfs deprecation process for most of the roccat
  drivers, from the roccat support maintainer Stefan Achatz

- quite a few device ID / quirk additions and small fixes"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid: (30 commits)
HID: logitech: Add support for G29
HID: logitech: Simplify wheel detection scheme
HID: wacom: Call 'wacom_query_tablet_data' only after 'hid_hw_start'
HID: wacom: Fix ABS_MISC reporting for Cintiq Companion 2
HID: wacom: Remove useless conditions from 'wacom_query_tablet_data'
HID: wacom: fix Intuos wireless report id issue
HID: fix some indenting issues
HID: wacom: Expect 'touch_max' touches if HID_DG_CONTACTCOUNT not present
HID: wacom: Tie cached HID_DG_CONTACTCOUNT indices to report ID
HID: roccat: Fixed resubmit: Deprecating most Roccat sysfs attributes
HID: wacom: Report full pressure range for Intuos, Cintiq 13HD Touch
HID: wacom: Add support for Cintiq Companion 2
HID: multitouch: Fetch feature reports on demand for Win8 devices
HID: sensor-hub: Add quirk for Lenovo Yoga 2 with ITE Chips
HID: usbhid: Fix for the WiiU adapter from Mayflash
HID: corsair: boolify struct k90_led.removed
HID: corsair: Add Corsair Vengeance K90 driver
HID: hid-input: allow input_configured callback return errors
HID: multitouch: Add suffix for HID_DG_TOUCHPAD
HID: i2c-hid: Fill in physical device providing HID functionality
... -
Pull livepatching fix from Jiri Kosina:
"A fix for a kernel oops in case CONFIG_DEBUG_SET_MODULE_RONX is unset
(as in such case it's possible for module struct to share a page with
executable text, which is currently not being handled with grace) from
Josh Poimboeuf"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/livepatching:
livepatch: Fix crash with !CONFIG_DEBUG_SET_MODULE_RONX
07 Nov, 2015
27 commits
-
d0edd8528362 ("ipc: convert invalid scenarios to use WARN_ON") relaxed the
nil dst parameter check, originally being a full BUG_ON. However, this
check seems quite unnecessary when the only purpose is for
checkpoint/restore (MSG_COPY flag):

o The copy variable is set initially to nil, apparently as a way of
  ensuring that prepare_copy is previously called. Which is in fact done,
  unconditionally at the beginning of do_msgrcv.

o There is no concurrency with 'copy' (stack allocated in do_msgrcv).

Furthermore, any errors in 'copy' (and thus prepare_copy/copy_msg) should
always be handled by the IS_ERR() family. Therefore remove this check
altogether as it can never occur with the current users.

Signed-off-by: Davidlohr Bueso
Cc: Stanislav Kinsbursky
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
adler32 has been named zlib_adler32 since before 2.6.11.
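
For readers unfamiliar with the checksum itself, the following is a minimal
userspace sketch of Adler-32 with the same (running value, buffer, length)
calling shape that the usage example refers to. It is not the kernel's
zlib_adler32() implementation; the name adler32_sketch() and the macro
ADLER_MOD are hypothetical, and the running value is assumed to start at 1.

```c
#include <stddef.h>
#include <stdint.h>

#define ADLER_MOD 65521u /* largest prime below 2^16 */

/*
 * Hypothetical helper, not the kernel API: fold len bytes from buf into
 * the running checksum adler and return the updated value.
 */
uint32_t adler32_sketch(uint32_t adler, const unsigned char *buf, size_t len)
{
	uint32_t a = adler & 0xffffu;         /* low half: byte sum */
	uint32_t b = (adler >> 16) & 0xffffu; /* high half: sum of sums */
	size_t i;

	for (i = 0; i < len; i++) {
		a = (a + buf[i]) % ADLER_MOD;
		b = (b + a) % ADLER_MOD;
	}
	return (b << 16) | a;
}
```

Because the state is carried in the return value, a caller can feed the data
in chunks: start from 1 and pass each result back in as the first argument.
Checksumming the nine bytes of "Wikipedia" this way yields 0x11E60398.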
Signed-off-by: Anish Bhatt
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
In some cases we may end up killing the CPU holding the console lock
while still having valuable data in logbuf. E.g. I'm observing the
following:

- A crash is happening on one CPU and console_unlock() is being called on
  some other.

- console_unlock() tries to print out the buffer before releasing the lock
  and on slow console it takes time.

- in the meanwhile crashing CPU does lots of printk()-s with valuable data
  (which go to the logbuf) and sends IPIs to all other CPUs.

- console_unlock() finishes printing previous chunk and enables interrupts
  before trying to print out the rest, the CPU catches the IPI and never
  releases console lock.

This is not the only possible case: in VT/fb subsystems we have many other
console_lock()/console_unlock() users. Non-masked interrupts (or
receiving NMI in case of extreme slowness) will have the same result.
Getting the whole console buffer printed out on crash should be top
priority.

[akpm@linux-foundation.org: tweak comment text]
Signed-off-by: Vitaly Kuznetsov
Cc: HATAYAMA Daisuke
Cc: Masami Hiramatsu
Cc: Jiri Kosina
Cc: Baoquan He
Cc: Prarit Bhargava
Cc: Xie XiuQi
Cc: Seth Jennings
Cc: "K. Y. Srinivasan"
Cc: Jan Kara
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Like dma_unmap_sg, dma_sync_sg* should be called with the original number
of entries passed to dma_map_sg, so do the same check in the sync path as
we do in the unmap path.

Signed-off-by: Robin Murphy
Cc: Arnd Bergmann
Cc: Marek Szyprowski
Cc: Sumit Semwal
Cc: Sakari Ailus
Cc: Russell King
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Many DMA controllers and other devices set max_segment_size to
indicate their scatter-gather capability, but have no interest in
segment_boundary_mask. However, the existence of a dma_parms structure
precludes the use of any default value, leaving them as zeros (assuming
a properly kzalloc'ed structure). If a well-behaved IOMMU (or SWIOTLB)
then tries to respect this by ensuring a mapped segment does not cross
a zero-byte boundary, hilarity ensues.

Since zero is a nonsensical value for either parameter, treat it as an
indicator for "default", as might be expected. In the process, clean up
a bit by replacing the bare constants with slightly more meaningful
macros and removing the superfluous "else" statements.

[akpm@linux-foundation.org: dma-mapping.h needs sizes.h for SZ_64K]
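
The "zero means default" rule can be sketched in plain userspace C as
follows. This is an illustration only, not the kernel code: the struct
dma_parms_sketch and both getter names are hypothetical stand-ins. SZ_64K
is named in the note above, but using it as the default maximum segment
size, and 0xffffffff as the default boundary mask, are assumptions here.

```c
#include <stddef.h>

#define SZ_64K 0x00010000u
/* Assumed default: segments may not cross a 4 GiB boundary. */
#define DEFAULT_SEG_BOUNDARY_MASK 0xffffffffUL

/* Hypothetical stand-in for a device's DMA parameter block. */
struct dma_parms_sketch {
	unsigned int max_segment_size;
	unsigned long segment_boundary_mask;
};

/* A missing structure or a zero field both fall back to the default. */
unsigned int get_max_seg_size(const struct dma_parms_sketch *p)
{
	if (p && p->max_segment_size)
		return p->max_segment_size;
	return SZ_64K;
}

unsigned long get_seg_boundary(const struct dma_parms_sketch *p)
{
	if (p && p->segment_boundary_mask)
		return p->segment_boundary_mask;
	return DEFAULT_SEG_BOUNDARY_MASK;
}
```

With this shape, a driver that fills in only max_segment_size in a zeroed
structure still gets a sane boundary mask, which is the case described
above.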
Signed-off-by: Robin Murphy
Reviewed-by: Sumit Semwal
Acked-by: Marek Szyprowski
Cc: Arnd Bergmann
Cc: Sakari Ailus
Cc: Russell King
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
setpriority(PRIO_USER, 0, x) will change the priority of tasks outside of
the current pid namespace. This is in contrast to both the other modes of
setpriority and the example of kill(-1). Fix this. getpriority and
ioprio have the same failure mode, fix them too.

Eric said:

: After some more thinking about it this patch sounds justifiable.
:
: My goal with namespaces is not to build perfect isolation mechanisms
: as that can get into ill defined territory, but to build well defined
: mechanisms. And to handle the corner cases so you can use only
: a single namespace with well defined results.
:
: In this case you have found the two interfaces I am aware of that
: identify processes by uid instead of by pid. Which quite frankly is
: weird. Unfortunately the weird unexpected cases are hard to handle
: in the usual way.
:
: I was hoping for a little more information. Changes like this one we
: have to be careful of because someone might be depending on the current
: behavior. I don't think they are and I do think this makes sense as part
: of the pid namespace.

Signed-off-by: Ben Segall
Cc: Oleg Nesterov
Cc: Al Viro
Cc: Ambrose Feinstein
Acked-by: "Eric W. Biederman"
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
The kexec output messages have been missing the "kexec" prefix since Dave
Young split the kexec code. Now, we use the file name as the output
message prefix.

Currently, the format of output message:
[ 140.290795] SYSC_kexec_load: hello, world
[ 140.291534] kexec: sanity_check_segment_list: hello, world

Ideally, the format of output message:
[ 30.791503] kexec: SYSC_kexec_load, Hello, world
[ 79.182752] kexec_core: sanity_check_segment_list, Hello, world

Remove the custom prefix "kexec" in output message.
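
One common way to get a per-file prefix is to define pr_fmt() before the
print helpers are used, so that the prefix is pasted onto every format
string at compile time. The userspace sketch below imitates that idiom and
is not taken from the patch: MODNAME is a hard-coded stand-in for the
file-derived name, and pr_sketch() is a hypothetical macro that formats into
a buffer instead of the kernel log.

```c
#include <stdio.h>

/* Stand-in for the name the build system would derive from the file. */
#define MODNAME "kexec_core"

/* Prefix every format string via string-literal concatenation. */
#define pr_fmt(fmt) MODNAME ": " fmt

/*
 * Hypothetical print helper: writes into buf so the result can be
 * inspected. Needs at least one argument after fmt (plain C99 varargs).
 */
#define pr_sketch(buf, size, fmt, ...) \
	snprintf((buf), (size), pr_fmt(fmt), __VA_ARGS__)
```

Calling pr_sketch(buf, sizeof(buf), "%s, %s", "sanity_check_segment_list",
"Hello, world") then produces the same shape as the second "ideal" line
above, without any call site spelling out the prefix itself.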
Signed-off-by: Minfei Huang
Acked-by: Dave Young
Cc: "Eric W. Biederman"
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Since 5cec38ac866b ("fs, seq_file: fallback to vmalloc instead of oom kill
processes") seq_buf_alloc() avoids calling the oom killer for PAGE_SIZE or
smaller allocations; but larger allocations can use the oom killer via
vmalloc(). Thus reads of small files can return ENOMEM, but larger files
use the oom killer to avoid ENOMEM.

The effect of this bug is that reads from /proc and other virtual
filesystems can return ENOMEM instead of the preferred behavior - oom
killing something (possibly the calling process). I don't know of anyone
except Google who has noticed the issue.

I suspect the fix is more needed in smaller systems where there isn't any
reclaimable memory. But these seem like the kinds of systems which
probably don't use the oom killer for production situations.

Memory overcommit requires use of the oom killer to select a victim
regardless of file size.

Enable oom killer for small seq_buf_alloc() allocations.

Fixes: 5cec38ac866b ("fs, seq_file: fallback to vmalloc instead of oom kill processes")
Signed-off-by: David Rientjes
Signed-off-by: Greg Thelen
Acked-by: Eric Dumazet
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
string_escape_str() escapes the input string by the given criteria. In the
case of seq_escape() the criteria is to convert some characters to their
octal representation.

Signed-off-by: Andy Shevchenko
Cc: Alexander Viro
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
This improves code readability.
Signed-off-by: Andy Shevchenko
Cc: Alexander Viro
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Change zap_threads() paths to use for_each_thread() rather than
while_each_thread().

While at it, change zap_threads() to avoid the nested if's to make the
code more readable and lessen the indentation.

Signed-off-by: Oleg Nesterov
Cc: David Rientjes
Cc: Kyle Walker
Cc: Michal Hocko
Cc: Stanislav Kozina
Cc: Tetsuo Handa
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
task_will_free_mem() is wrong in many ways, and in particular the
SIGNAL_GROUP_COREDUMP check is not reliable: a task can participate in the
coredumping without SIGNAL_GROUP_COREDUMP bit set.

Change zap_threads() paths to always set SIGNAL_GROUP_COREDUMP even if
other CLONE_VM processes can't react to SIGKILL. Fortunately, at least
the oom-kill case is fine; it kills all tasks sharing the same mm, so it
should also kill the process which actually dumps the core.

The change in prepare_signal() is not strictly necessary, it just ensures
that the patch does not bring another subtle behavioural change. But it
reminds us that this SIGNAL_GROUP_EXIT/COREDUMP case needs more changes.

Signed-off-by: Oleg Nesterov
Cc: David Rientjes
Cc: Kyle Walker
Acked-by: Michal Hocko
Cc: Stanislav Kozina
Cc: Tetsuo Handa
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
jffs2_garbage_collect_thread() does allow_signal(SIGCONT) for no reason;
SIGCONT will wake a stopped task up even if it is ignored.

Signed-off-by: Oleg Nesterov
Reviewed-by: Tejun Heo
Cc: David Woodhouse
Cc: Felipe Balbi
Cc: Markus Pargmann
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
jffs2_garbage_collect_thread() can race with SIGCONT and sleep in
TASK_STOPPED state after it was already sent. Add the new helper,
kernel_signal_stop(), which does this correctly.

Signed-off-by: Oleg Nesterov
Reviewed-by: Tejun Heo
Cc: David Woodhouse
Cc: Felipe Balbi
Cc: Markus Pargmann
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
1. Rename dequeue_signal_lock() to kernel_dequeue_signal(). This
   matches another "for kthreads only" kernel_sigaction() helper.

2. Remove the "tsk" and "mask" arguments, they are always current
   and current->blocked. And it is simply wrong if tsk != current.

3. We could also remove the 3rd "siginfo_t *info" arg but it looks
   potentially useful. However we can simplify the callers if we
   change kernel_dequeue_signal() to accept info => NULL.

4. Remove _irqsave, it is never called from atomic context.

Signed-off-by: Oleg Nesterov
Reviewed-by: Tejun Heo
Cc: David Woodhouse
Cc: Felipe Balbi
Cc: Markus Pargmann
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
It is hardly possible to enumerate all problems with block_all_signals()
and unblock_all_signals(). Just for example,

1. block_all_signals(SIGSTOP/etc) simply can't help if the caller is
   multithreaded. Another thread can dequeue the signal and force the
   group stop.

2. Even if the caller is single-threaded, it will "stop" anyway. It
   will not sleep, but it will spin in kernel space until SIGCONT or
   SIGKILL.

And a lot more. In short, this interface doesn't work at all, at least
for the last 10+ years.

Daniel said:

  Yeah the only times I played around with the DRM_LOCK stuff was when
  old drivers accidentally deadlocked - my impression is that the entire
  DRM_LOCK thing was never really tested properly ;-) Hence I'm all for
  purging where this leaks out of the drm subsystem.

Signed-off-by: Oleg Nesterov
Acked-by: Daniel Vetter
Acked-by: Dave Airlie
Cc: Richard Weinberger
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Some false positive warnings are reported for powerpc build.

The following warnings are reported in
http://kisskb.ellerman.id.au/kisskb/buildresult/12519703/

  CC fs/nilfs2/super.o
  fs/nilfs2/super.c: In function 'nilfs_resize_fs':
  fs/nilfs2/super.c:376:2: warning: 'blocknr' may be used uninitialized in this function [-Wuninitialized]
  fs/nilfs2/super.c:362:11: note: 'blocknr' was declared here
  CC fs/nilfs2/recovery.o
  fs/nilfs2/recovery.c: In function 'nilfs_salvage_orphan_logs':
  fs/nilfs2/recovery.c:631:21: warning: 'sum' may be used uninitialized in this function [-Wuninitialized]
  fs/nilfs2/recovery.c:585:32: note: 'sum' was declared here
  fs/nilfs2/recovery.c: In function 'nilfs_search_super_root':
  fs/nilfs2/recovery.c:873:11: warning: 'sum' may be used uninitialized in this function [-Wuninitialized]

Another similar warning is reported in
http://kisskb.ellerman.id.au/kisskb/buildresult/12520079/

  CC fs/nilfs2/btree.o
  fs/nilfs2/btree.c: In function 'nilfs_btree_convert_and_insert':
  include/asm-generic/bitops/non-atomic.h:105:20: warning: 'bh' may be used uninitialized in this function [-Wuninitialized]
  fs/nilfs2/btree.c:1859:22: note: 'bh' was declared here

This cleans out these warnings by forcing the variables to be initialized.

Signed-off-by: Ryusuke Konishi
Reported-by: Geert Uytterhoeven
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Fix the following build warnings:
$ make W=1
[...]
CC [M] fs/nilfs2/btree.o
fs/nilfs2/btree.c: In function 'nilfs_btree_split':
fs/nilfs2/btree.c:923:8: warning: variable 'newptr' set but not used [-Wunused-but-set-variable]
__u64 newptr;
^
fs/nilfs2/btree.c:922:8: warning: variable 'newkey' set but not used [-Wunused-but-set-variable]
__u64 newkey;
^
CC [M] fs/nilfs2/dat.o
fs/nilfs2/dat.c: In function 'nilfs_dat_prepare_end':
fs/nilfs2/dat.c:158:8: warning: variable 'start' set but not used [-Wunused-but-set-variable]
__u64 start;
^
CC [M] fs/nilfs2/segment.o
fs/nilfs2/segment.c: In function 'nilfs_segctor_do_immediate_flush':
fs/nilfs2/segment.c:2433:6: warning: variable 'err' set but not used [-Wunused-but-set-variable]
int err;
^
CC [M] fs/nilfs2/sufile.o
fs/nilfs2/sufile.c: In function 'nilfs_sufile_alloc':
fs/nilfs2/sufile.c:320:27: warning: variable 'ncleansegs' set but not used [-Wunused-but-set-variable]
unsigned long nsegments, ncleansegs, nsus, cnt;
^
CC [M] fs/nilfs2/alloc.o
fs/nilfs2/alloc.c: In function 'nilfs_palloc_prepare_alloc_entry':
fs/nilfs2/alloc.c:478:38: warning: variable 'groups_per_desc_block' set but not used [-Wunused-but-set-variable]
unsigned long n, entries_per_group, groups_per_desc_block;
^

Signed-off-by: Ryusuke Konishi
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
This adds the header file "include/trace/events/nilfs2.h" to the nilfs2
entry in MAINTAINERS so that updates to the nilfs2 header file go to the
nilfs2 mailing list.

Signed-off-by: Ryusuke Konishi
Cc: Hitoshi Mitake
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
This patch adds tracepoints for analyzing requests of reading and writing
metadata files. The tracepoints cover all in-place mdt files (cpfile,
sufile, and datfile).

Example of tracing mdt_insert_new_block():
cp-14635 [000] ...1 30598.199309: nilfs2_mdt_insert_new_block: inode = ffff88022a8d0178 ino = 3 block = 155
cp-14635 [000] ...1 30598.199520: nilfs2_mdt_insert_new_block: inode = ffff88022a8d0178 ino = 3 block = 5
cp-14635 [000] ...1 30598.200828: nilfs2_mdt_insert_new_block: inode = ffff88022a8d0178 ino = 3 block = 253

Signed-off-by: Hitoshi Mitake
Signed-off-by: Ryusuke Konishi
Cc: Steven Rostedt
Cc: TK Kato
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
This patch adds tracepoints which would be useful for analyzing segment
usage from a perspective of high level sufile manipulation (check, alloc,
free). sufile is an important in-place updated metadata file, so
analyzing the behavior would be useful for performance tuning.

Example of usage (a case of allocation):
$ sudo bin/tpoint nilfs2:nilfs2_segment_usage_allocated
Tracing nilfs2:nilfs2_segment_usage_allocated. Ctrl-C to end.
segctord-17800 [002] ...1 10671.867294: nilfs2_segment_usage_allocated: sufile = ffff880054f908a8 segnum = 2
segctord-17800 [002] ...1 10675.073477: nilfs2_segment_usage_allocated: sufile = ffff880054f908a8 segnum = 3

Signed-off-by: Hitoshi Mitake
Signed-off-by: Ryusuke Konishi
Cc: Steven Rostedt
Cc: Benixon Dhas
Cc: TK Kato
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
This patch adds a tracepoint for transaction events of nilfs. With the
tracepoint, these events can be tracked: begin, abort, commit, trylock,
lock, and unlock. Basically, these events have corresponding functions,
e.g. the begin event corresponds to nilfs_transaction_begin(). The unlock
event is an exception. It corresponds to the iteration in
nilfs_transaction_lock().

Only one tracepoint is introduced: nilfs2_transaction_transition. The
above events are distinguished with a newly introduced enum. With this
tracepoint, we can analyse a critical section of segment construction.

Sample output by tpoint of perf-tools:
cp-4457 [000] ...1 63.266220: nilfs2_transaction_transition: sb = ffff8802112b8800 ti = ffff8800bf5ccc58 count = 1 flags = 9 state = BEGIN
cp-4457 [000] ...1 63.266221: nilfs2_transaction_transition: sb = ffff8802112b8800 ti = ffff8800bf5ccc58 count = 0 flags = 9 state = COMMIT
cp-4457 [000] ...1 63.266221: nilfs2_transaction_transition: sb = ffff8802112b8800 ti = ffff8800bf5ccc58 count = 0 flags = 9 state = COMMIT
segctord-4371 [001] ...1 68.261196: nilfs2_transaction_transition: sb = ffff8802112b8800 ti = ffff8800b889bdf8 count = 0 flags = 10 state = TRYLOCK
segctord-4371 [001] ...1 68.261280: nilfs2_transaction_transition: sb = ffff8802112b8800 ti = ffff8800b889bdf8 count = 0 flags = 10 state = LOCK
segctord-4371 [001] ...1 68.261877: nilfs2_transaction_transition: sb = ffff8802112b8800 ti = ffff8800b889bdf8 count = 1 flags = 10 state = BEGIN
segctord-4371 [001] ...1 68.262116: nilfs2_transaction_transition: sb = ffff8802112b8800 ti = ffff8800b889bdf8 count = 0 flags = 18 state = COMMIT
segctord-4371 [001] ...1 68.265032: nilfs2_transaction_transition: sb = ffff8802112b8800 ti = ffff8800b889bdf8 count = 0 flags = 18 state = UNLOCK
segctord-4371 [001] ...1 132.376847: nilfs2_transaction_transition: sb = ffff8802112b8800 ti = ffff8800b889bdf8 count = 0 flags = 10 state = TRYLOCK

This patch also does trivial cleaning of comma usage in collection stage
transition event for consistent coding style.

Signed-off-by: Hitoshi Mitake
Signed-off-by: Ryusuke Konishi
Cc: Steven Rostedt
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
This patch adds a tracepoint for tracking stage transition of block
collection in segment construction. With the tracepoint, we can analyze
the behavior of segment construction in depth. It would be useful for
bottleneck detection and debugging, etc.

The tracepoint is created with the standard trace API of linux (like ext3,
ext4, f2fs and btrfs). So we can analyze it with existing tools easily. Of
course, more detailed analysis will be possible if we can create nilfs
specific analysis tools.

Below is an example of event dump with Brendan Gregg's perf-tools
(https://github.com/brendangregg/perf-tools). Time consumption between
each stage can be obtained.

$ sudo bin/tpoint nilfs2:nilfs2_collection_stage_transition
Tracing nilfs2:nilfs2_collection_stage_transition. Ctrl-C to end.
segctord-14875 [003] ...1 28311.067794: nilfs2_collection_stage_transition: sci = ffff8800ce6de000 stage = ST_INIT
segctord-14875 [003] ...1 28311.068139: nilfs2_collection_stage_transition: sci = ffff8800ce6de000 stage = ST_GC
segctord-14875 [003] ...1 28311.068139: nilfs2_collection_stage_transition: sci = ffff8800ce6de000 stage = ST_FILE
segctord-14875 [003] ...1 28311.068486: nilfs2_collection_stage_transition: sci = ffff8800ce6de000 stage = ST_IFILE
segctord-14875 [003] ...1 28311.068540: nilfs2_collection_stage_transition: sci = ffff8800ce6de000 stage = ST_CPFILE
segctord-14875 [003] ...1 28311.068561: nilfs2_collection_stage_transition: sci = ffff8800ce6de000 stage = ST_SUFILE
segctord-14875 [003] ...1 28311.068565: nilfs2_collection_stage_transition: sci = ffff8800ce6de000 stage = ST_DAT
segctord-14875 [003] ...1 28311.068573: nilfs2_collection_stage_transition: sci = ffff8800ce6de000 stage = ST_SR
segctord-14875 [003] ...1 28311.068574: nilfs2_collection_stage_transition: sci = ffff8800ce6de000 stage = ST_DONE

For capturing transition correctly, this patch adds wrappers for the
member scnt of nilfs_cstage. With this change, every transition of the
stage can produce trace event in a correct manner.

Signed-off-by: Hitoshi Mitake
Signed-off-by: Ryusuke Konishi
Cc: Steven Rostedt
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
As a nilfs2 volume ages, the amount of available disk space decreases
little by little due to bloat of DAT (disk address translation) metadata
file. Even if we delete all files in a file system and free their block
addresses from the DAT file through a garbage collection, empty DAT blocks
are not freed.

This fixes the issue by extending the deallocator of block addresses so
that empty data blocks and empty bitmap blocks of DAT are deleted.

The following comparison shows the effect of this patch. Each shows disk
amount information of a nilfs2 volume that we cleaned out by deleting all
files and running gc after having filled 90% of its capacity.

Before:
Filesystem     1K-blocks    Used Available Use% Mounted on
/dev/sda1      500105212 3022844 472072192   1% /test

After:
Filesystem     1K-blocks    Used Available Use% Mounted on
/dev/sda1      500105212   16380 475078656   1% /test

Signed-off-by: Ryusuke Konishi
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
This adds delete functions for data blocks of metadata files using bitmap
based allocator. nilfs_palloc_delete_entry_block() deletes an entry block
(e.g. block storing dat entries), and nilfs_palloc_delete_bitmap_block()
deletes a bitmap block, respectively.

These helpers are intended to be used in the successive change on
deallocator of block addresses ("nilfs2: free unused dat file blocks
during garbage collection").

Signed-off-by: Ryusuke Konishi
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
This unfolds the nilfs_palloc_group_is_in() helper function into the
nilfs_palloc_freev() function to simplify a range check and an index
calculation repeatedly performed in a loop of the function.

Signed-off-by: Ryusuke Konishi
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
The current implementation of nilfs_palloc_find_available_slot() function
is overkill. The underlying bit search routine is well optimized, so this
uses it more simply in nilfs_palloc_find_available_slot().

Signed-off-by: Ryusuke Konishi
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds