21 Jan, 2016
40 commits
-
Use kobj_to_dev() instead of open-coding it.
Signed-off-by: Geliang Tang
Acked-by: "Bounine, Alexandre"
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
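For the kobj_to_dev() change above, a minimal userspace sketch of what such a helper wraps may be useful. The struct layouts here are simplified stand-ins (assumptions for illustration, not the kernel definitions); only the container_of() pattern is the point.

```c
#include <stddef.h>

/* Simplified stand-ins for the kernel types; only the embedding matters. */
struct kobject { const char *name; };
struct device  { int id; struct kobject kobj; };

/* Recover a pointer to the enclosing struct from a pointer to a member. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Named helper, so callers do not repeat the container_of() expression. */
static inline struct device *kobj_to_dev(struct kobject *kobj)
{
	return container_of(kobj, struct device, kobj);
}
```

A caller that previously wrote container_of(kobj, struct device, kobj) by hand can call kobj_to_dev(kobj) instead.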
Move the stuff currently only used by the kexec file code within
CONFIG_KEXEC_FILE (and CONFIG_KEXEC_VERIFY_SIG).

Also move internal "struct kexec_sha_region" and "struct kexec_buf" into
"kexec_internal.h".

Signed-off-by: Xunlei Pang
Cc: "Eric W. Biederman"
Cc: Dave Young
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Use list_for_each_entry_safe() instead of list_for_each_safe() to
simplify the code.

Signed-off-by: Geliang Tang
Cc: Dave Young
Cc: Vivek Goyal
Acked-by: Baoquan He
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
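For the list_for_each_entry_safe() conversion above, the following userspace sketch shows why the entry-based "safe" iterator is convenient: the loop variable is already the containing struct, and the next entry is cached so the current one can be freed. The list helpers are re-implemented here in simplified form, and struct item and free_all() are hypothetical names used only for illustration.

```c
#include <stddef.h>
#include <stdlib.h>

struct list_head { struct list_head *next, *prev; };

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))
#define list_entry(ptr, type, member) container_of(ptr, type, member)

/* Iterate over entries; 'n' caches the next entry so 'pos' may be freed. */
#define list_for_each_entry_safe(pos, n, head, member)                     \
	for (pos = list_entry((head)->next, __typeof__(*pos), member),     \
	     n = list_entry(pos->member.next, __typeof__(*pos), member);   \
	     &pos->member != (head);                                       \
	     pos = n, n = list_entry(n->member.next, __typeof__(*n), member))

static void list_init(struct list_head *h) { h->next = h->prev = h; }

static void list_add_tail(struct list_head *entry, struct list_head *head)
{
	entry->prev = head->prev;
	entry->next = head;
	head->prev->next = entry;
	head->prev = entry;
}

static void list_del(struct list_head *e)
{
	e->prev->next = e->next;
	e->next->prev = e->prev;
}

/* Hypothetical payload type with an embedded list node. */
struct item { int value; struct list_head node; };

/* Free every item on the list; returns how many were freed. */
static int free_all(struct list_head *head)
{
	struct item *it, *tmp;
	int freed = 0;

	list_for_each_entry_safe(it, tmp, head, node) {
		list_del(&it->node);
		free(it);
		freed++;
	}
	return freed;
}
```

With list_for_each_safe() the loop body would additionally need a list_entry() call to get from the struct list_head cursor to the struct item.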
sanity_check_segment_list() checks KEXEC_TYPE_CRASH flag to ensure all the
segments of the loaded crash kernel are within the kernel crash resource
limits, so set the flag beforehand.

Signed-off-by: Xunlei Pang
Acked-by: Dave Young
Cc: Eric Biederman
Cc: Vivek Goyal
Acked-by: Baoquan He
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Almost all callers of the set_cpu_* functions pass an explicit true or
false. Making them static inline thus replaces the function calls with a
simple set_bit/clear_bit, saving some .text.

Signed-off-by: Rasmus Villemoes
Acked-by: Rusty Russell
Cc: Greg Kroah-Hartman
Cc: Michael Ellerman
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
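The set_cpu_* change above, together with the mask macros from the same series, can be sketched in userspace roughly as follows. This is an illustration under assumptions: NR_CPUS is an arbitrary value, only the "online" mask is shown, and plain non-atomic bit operations stand in for the kernel's set_bit/clear_bit.

```c
#include <limits.h>
#include <stdbool.h>

#define NR_CPUS 64	/* illustrative value */
#define BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)
#define BITS_TO_LONGS(n) (((n) + BITS_PER_LONG - 1) / BITS_PER_LONG)

struct cpumask { unsigned long bits[BITS_TO_LONGS(NR_CPUS)]; };

/* The real storage, named with a leading __ to discourage direct access. */
static struct cpumask __cpu_online_mask;

/* Read-only view: a macro with the type and value the old variable had. */
#define cpu_online_mask ((const struct cpumask *)&__cpu_online_mask)

/* Inline setter: with a literal true/false the branch can fold away. */
static inline void set_cpu_online(unsigned int cpu, bool online)
{
	unsigned long bit = 1UL << (cpu % BITS_PER_LONG);

	if (online)
		__cpu_online_mask.bits[cpu / BITS_PER_LONG] |= bit;
	else
		__cpu_online_mask.bits[cpu / BITS_PER_LONG] &= ~bit;
}

/* Reader going through the macro rather than a pointer variable. */
static inline bool cpu_online(unsigned int cpu)
{
	return (cpu_online_mask->bits[cpu / BITS_PER_LONG] >>
		(cpu % BITS_PER_LONG)) & 1UL;
}
```

Because cpu_online_mask expands to the address of a global, a reader needs no extra load of a pointer variable before touching the bitmap.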
Replace the variables cpu_possible_mask, cpu_online_mask, cpu_present_mask
and cpu_active_mask with macros expanding to expressions of the same type
and value, eliminating some indirection.

Signed-off-by: Rasmus Villemoes
Acked-by: Rusty Russell
Cc: Greg Kroah-Hartman
Cc: Michael Ellerman
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
The only user of the lvalue-ness of the cpu_*_mask variables is in
drivers/base/cpu.c, and that is mostly a work-around for the fact that not
even const variables can be used in static initialization. Now that the
underlying struct cpumasks are exposed we can take their address.

Signed-off-by: Rasmus Villemoes
Acked-by: Rusty Russell
Acked-by: Greg Kroah-Hartman
Cc: Michael Ellerman
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Exporting the cpumasks __cpu_possible_mask and friends will allow us to
remove the extra indirection through the cpu_*_mask variables. It will
also allow the set_cpu_* functions to become static inlines, which will
give a .text reduction.

Signed-off-by: Rasmus Villemoes
Acked-by: Rusty Russell
Cc: Greg Kroah-Hartman
Cc: Michael Ellerman
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Change cpu_possible_bits and friends (online, present, active) from being
bitmaps that happen to have the right size to actually being struct
cpumasks. Also rename them to __cpu_xyz_mask. This is mostly a small
cleanup in preparation for exporting them and, eventually, eliminating the
extra indirection through the cpu_xyz_mask variables.

Signed-off-by: Rasmus Villemoes
Acked-by: Rusty Russell
Cc: Greg Kroah-Hartman
Cc: Michael Ellerman
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
The four cpumasks cpu_{possible,online,present,active}_bits are exposed
readonly via the corresponding const variables cpu_xyz_mask. But they are
also accessible for arbitrary writing via the exposed functions
set_cpu_xyz. There's quite a bit of code throughout the kernel which
iterates over or otherwise accesses these bitmaps, and having the access
go via the cpu_xyz_mask variables is nowadays [1] simply a useless
indirection.

It may be that any problem in CS can be solved by an extra level of
indirection, but that doesn't mean every extra indirection solves a
problem. In this case, it even necessitates some minor ugliness (see
4/6).

Patch 1/6 is new in v2, and fixes a build failure on ppc by renaming a
struct member, to avoid problems when the identifier cpu_online_mask
becomes a macro later in the series. The next four patches eliminate the
cpu_xyz_mask variables by simply exposing the actual bitmaps, after
renaming them to discourage direct access - that still happens through
cpu_xyz_mask, which are now simply macros with the same type and value as
they used to have.

After that, there's no longer any reason to have the setter functions be
out-of-line: The boolean parameter is almost always a literal true or
false, so by making them static inlines they will usually compile to one
or two instructions.

For a defconfig build on x86_64, bloat-o-meter says we save ~3000 bytes.
We also save a little stack (stackdelta says 127 functions have a 16 byte
smaller stack frame, while two grow by that amount). Mostly because, when
iterating over the mask, gcc typically loads the value of cpu_xyz_mask
into a callee-saved register and from there into %rdi before each
find_next_bit call - now it can just load the appropriate immediate
address into %rdi before each call.

[1] See Rusty's kind explanation
http://thread.gmane.org/gmane.linux.kernel/2047078/focus=2047722 for
some historic context.

This patch (of 6):

As preparation for eliminating the indirect access to the various global
cpu_*_bits bitmaps via the pointer variables cpu_*_mask, rename the
cpu_online_mask member of struct fadump_crash_info_header to simply
online_mask, thus allowing cpu_online_mask to become a macro.

Signed-off-by: Rasmus Villemoes
Acked-by: Michael Ellerman
Cc: Greg Kroah-Hartman
Cc: Rusty Russell
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Signed-off-by: Dmitry Safonov
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Let %h and %e print empty values as "!", "." as "!" and
".." as "!.".

This prevents hostnames and comm values that are empty or consist of one
or two dots from changing the directory level at which the corefile will
be stored.

Consider the case where someone decides to sort coredumps by hostname
with a core pattern like "/cores/%h/core.%e.%p.%t" or so. In this
case, hostnames "" and "." would cause the coredump to land directly in
/cores, which is not what the intent behind the core pattern is, and
".." would cause the coredump to land in /.

Yeah, there probably aren't many people who do that, but I still don't
want this edge case to be kind of broken.

It seems very unlikely that this caused security issues anywhere, so I'm
not requesting a stable backport.

[akpm@linux-foundation.org: tweak code comment]
Signed-off-by: Jann Horn
Acked-by: Kees Cook
Cc: Alexander Viro
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
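The %h/%e mapping described in the coredump change above can be sketched as a small standalone function. sanitize_component() is a hypothetical name and this is not the kernel code; it only reproduces the three substitutions the message lists and leaves every other value unchanged.

```c
#include <stddef.h>
#include <string.h>

/*
 * Copy 'value' (a hostname or comm) into 'out' so that it cannot change
 * the directory level of a core pattern such as "/cores/%h/core.%e".
 * Returns 0 on success, -1 if 'out' is too small.
 */
static int sanitize_component(const char *value, char *out, size_t outsz)
{
	size_t len = strlen(value);

	if (len == 0) {			/* "" -> "!" */
		if (outsz < 2)
			return -1;
		strcpy(out, "!");
		return 0;
	}
	if (len + 1 > outsz)
		return -1;
	memcpy(out, value, len + 1);

	/* "." -> "!" and ".." -> "!." : replace only the first character. */
	if (strcmp(out, ".") == 0 || strcmp(out, "..") == 0)
		out[0] = '!';
	return 0;
}
```

With the pattern "/cores/%h/core.%e.%p.%t", a hostname of ".." then yields the directory "/cores/!." rather than "/".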
By checking the effective credentials instead of the real UID / permitted
capabilities, ensure that the calling process actually intended to use its
credentials.

To ensure that all ptrace checks use the correct caller credentials (e.g.
in case out-of-tree code or newly added code omits the PTRACE_MODE_*CREDS
flag), use two new flags and require one of them to be set.

The problem was that when a privileged task had temporarily dropped its
privileges, e.g. by calling setreuid(0, user_uid), with the intent to
perform following syscalls with the credentials of a user, it still passed
ptrace access checks that the user would not be able to pass.

While an attacker should not be able to convince the privileged task to
perform a ptrace() syscall, this is a problem because the ptrace access
check is reused for things in procfs.

In particular, the following somewhat interesting procfs entries only rely
on ptrace access checks:

 /proc/$pid/stat - uses the check for determining whether pointers
     should be visible, useful for bypassing ASLR
 /proc/$pid/maps - also useful for bypassing ASLR
 /proc/$pid/cwd - useful for gaining access to restricted
     directories that contain files with lax permissions, e.g. in
     this scenario:
     lrwxrwxrwx root root /proc/13020/cwd -> /root/foobar
     drwx------ root root /root
     drwxr-xr-x root root /root/foobar
     -rw-r--r-- root root /root/foobar/secret

Therefore, on a system where a root-owned mode 6755 binary changes its
effective credentials as described and then dumps a user-specified file,
this could be used by an attacker to reveal the memory layout of root's
processes or reveal the contents of files he is not allowed to access
(through /proc/$pid/cwd).

[akpm@linux-foundation.org: fix warning]
Signed-off-by: Jann Horn
Acked-by: Kees Cook
Cc: Casey Schaufler
Cc: Oleg Nesterov
Cc: Ingo Molnar
Cc: James Morris
Cc: "Serge E. Hallyn"
Cc: Andy Shevchenko
Cc: Andy Lutomirski
Cc: Al Viro
Cc: "Eric W. Biederman"
Cc: Willy Tarreau
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
It looks like smack and yama weren't aware that the ptrace mode
can have flags ORed into it - PTRACE_MODE_NOAUDIT until now, but
only for /proc/$pid/stat, and with the PTRACE_MODE_*CREDS patch,
all modes have flags ORed into them.

Signed-off-by: Jann Horn
Acked-by: Kees Cook
Acked-by: Casey Schaufler
Cc: Oleg Nesterov
Cc: Ingo Molnar
Cc: James Morris
Cc: "Serge E. Hallyn"
Cc: Andy Shevchenko
Cc: Andy Lutomirski
Cc: Al Viro
Cc: "Eric W. Biederman"
Cc: Willy Tarreau
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
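The smack/yama fix above concerns a common pitfall: comparing a mode value for equality once flags can be ORed into it. The sketch below illustrates the difference. The numeric values and the two *CREDS flag names are assumptions made for illustration (the message only says "PTRACE_MODE_*CREDS"), and the helper functions are hypothetical.

```c
/* Illustrative values only; the real definitions live in kernel headers. */
#define PTRACE_MODE_READ      0x01
#define PTRACE_MODE_ATTACH    0x02
#define PTRACE_MODE_NOAUDIT   0x04
#define PTRACE_MODE_FSCREDS   0x08	/* name assumed */
#define PTRACE_MODE_REALCREDS 0x10	/* name assumed */

/* Fragile: becomes false as soon as any flag is ORed into the mode. */
static int is_attach_by_equality(unsigned int mode)
{
	return mode == PTRACE_MODE_ATTACH;
}

/* Robust: test the bit of interest and ignore unrelated flags. */
static int is_attach_by_mask(unsigned int mode)
{
	return (mode & PTRACE_MODE_ATTACH) != 0;
}
```

Under these assumptions, a mode of PTRACE_MODE_ATTACH | PTRACE_MODE_REALCREDS is recognized as an attach request only by the masked test.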
task_stopped_code()->task_is_stopped_or_traced() doesn't look right, the
traced task must never be TASK_STOPPED.

We can not add WARN_ON(task_is_stopped(p)), but this is only because
do_wait() can race with PTRACE_ATTACH from another thread.

[akpm@linux-foundation.org: teeny cleanup]
Signed-off-by: Oleg Nesterov
Cc: Andrey Ryabinin
Cc: Roland McGrath
Acked-by: Tejun Heo
Cc: Pedro Alves
Cc: Jan Kratochvil
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
ptrace_attach() can hang waiting for STOPPED -> TRACED transition if the
tracee gets frozen in between, change wait_on_bit() to use TASK_KILLABLE.

This doesn't really solve the problem(s) and we probably need to fix the
freezer. In particular, note that this means that pm freezer will fail if
it races attach-to-stopped-task.

And otoh perhaps we can just remove JOBCTL_TRAPPING_BIT altogether, it is
not clear if we really need to hide this transition from debugger, WNOHANG
after PTRACE_ATTACH can fail anyway if it races with SIGCONT.

Signed-off-by: Oleg Nesterov
Reported-by: Andrey Ryabinin
Cc: Roland McGrath
Acked-by: Tejun Heo
Cc: Pedro Alves
Cc: Jan Kratochvil
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
The fatent_operations structures are never modified, so declare them as
const.

Done with the help of Coccinelle.
Signed-off-by: Julia Lawall
Acked-by: OGAWA Hirofumi
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Update the limitation for fat fallocate.
Signed-off-by: Namjae Jeon
Signed-off-by: Amit Sahrawat
Cc: OGAWA Hirofumi
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Make the fibmap call return the proper physical block number for any
offset request in the fallocated range.

Signed-off-by: Namjae Jeon
Signed-off-by: Amit Sahrawat
Cc: OGAWA Hirofumi
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Skip new cluster allocation after checking i_blocks limit in _fat_get_block,
because the blocks are already allocated in fallocated region.

Signed-off-by: Namjae Jeon
Signed-off-by: Amit Sahrawat
Cc: OGAWA Hirofumi
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Implement preallocation via the fallocate syscall on VFAT partitions.
This patch is based on an earlier patch of the same name which had some
issues detailed below and did not get accepted. Refer to
https://lkml.org/lkml/2007/12/22/130.

a) The preallocated space was not persistent when the
FALLOC_FL_KEEP_SIZE flag was set. It will deallocate clusters at evict
time.

b) There was no need to zero out the clusters when the flag was set.
Instead of doing an expanding truncate, just allocate clusters and add
them to the fat chain. This reduces preallocation time.

Compatibility with Windows:

There are no issues when FALLOC_FL_KEEP_SIZE is not set because it just
does an expanding truncate. Thus reading from the preallocated area on
Windows returns null until data is written to it.

When a file with a preallocated area using FALLOC_FL_KEEP_SIZE was
written to on Windows, the Windows driver freed up the preallocated
clusters and allocated new clusters for the new data. The freed-up
clusters get reflected in the free space available for the partition,
which can be seen from the Volume properties.

The Windows chkdsk tool also does not report any errors on a disk
containing files with preallocated space.

There is also no issue using Linux fat fsck, because it discards
preallocated clusters at repair time.

Signed-off-by: Namjae Jeon
Signed-off-by: Amit Sahrawat
Cc: OGAWA Hirofumi
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
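From userspace, the preallocation described in the VFAT fallocate change above would be requested through fallocate(2) with FALLOC_FL_KEEP_SIZE. The sketch below is a minimal illustration for Linux with glibc; preallocate_keep_size() and file_size() are hypothetical helper names, and whether the call succeeds depends on the filesystem backing the descriptor.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE	/* for fallocate() and FALLOC_FL_KEEP_SIZE */
#endif
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>

/*
 * Ask the filesystem to reserve 'len' bytes starting at offset 0 without
 * changing the file size. Returns 0 on success, -1 with errno set
 * otherwise (for example EOPNOTSUPP if fallocate is not supported).
 */
static int preallocate_keep_size(int fd, off_t len)
{
	return fallocate(fd, FALLOC_FL_KEEP_SIZE, 0, len);
}

/* File size as reported by fstat(), or -1 on error. */
static off_t file_size(int fd)
{
	struct stat st;

	return fstat(fd, &st) == 0 ? st.st_size : (off_t)-1;
}
```

With FALLOC_FL_KEEP_SIZE the size reported by fstat() stays the same after a successful call; only the allocated space may grow.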
This detects simple cases of directory corruption, and tries to avoid
further damage to user data.

The performance impact of this validation should be very low, or not
measurable.

Signed-off-by: OGAWA Hirofumi
Reported-by: Vegard Nossum
Tested-by: Vegard Nossum
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Currently we limit values of the time_offset mount option to be between -12
and 12 hours. However, e.g. zone GMT+12 can have a DST correction on top,
which makes the total time difference 13 hours. Update the checks in
mount option parsing to allow an offset of up to 24 hours, to allow for
unusual cases.

Signed-off-by: Jan Kara
Reported-by: Volker Kuhlmann
Acked-by: OGAWA Hirofumi
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
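The relaxed time_offset check above amounts to a wider range test. The sketch below assumes the option value is expressed in minutes (as the vfat documentation describes time_offset); the macro and function names are hypothetical.

```c
#include <stdbool.h>

/* Assumed unit: minutes. Allow up to 24 hours in either direction. */
#define TIME_OFFSET_LIMIT_MIN (24 * 60)

static bool time_offset_valid(int minutes)
{
	return minutes >= -TIME_OFFSET_LIMIT_MIN &&
	       minutes <= TIME_OFFSET_LIMIT_MIN;
}
```

A 13 hour offset (780 minutes), which the old 12 hour limit rejected, passes this check.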
Use list_for_each_entry() instead of list_for_each() to simplify the code.
Signed-off-by: Geliang Tang
Reviewed-by: Vyacheslav Dubeyko
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Make initrd_load() return bool due to this particular function only using
either one or zero as its return value.

No functional change.
Signed-off-by: Yaowei Bai
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Make obsolete_checksetup() return bool due to this particular function
only using either one or zero as its return value.

No functional change.
Signed-off-by: Yaowei Bai
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Currently, epoll file descriptors or epfds (the fd returned from
epoll_create[1]()) that are added to a shared wakeup source are always
added in a non-exclusive manner. This means that when we have multiple
epfds attached to a shared fd source they are all woken up. This creates
thundering herd type behavior.

Introduce a new 'EPOLLEXCLUSIVE' flag that can be passed as part of the
'event' argument during an epoll_ctl() EPOLL_CTL_ADD operation. This new
flag allows for exclusive wakeups when there are multiple epfds attached
to a shared fd event source.

The implementation walks the list of exclusive waiters, and queues an
event to each epfd, until it finds the first waiter that has threads
blocked on it via epoll_wait(). The idea is to search for threads which
are idle and ready to process the wakeup events. Thus, we queue an event
to at least 1 epfd, but may still potentially queue an event to all epfds
that are attached to the shared fd source.

Performance testing was done by Madars Vitolins using a modified version
of Enduro/X. The use of the 'EPOLLEXCLUSIVE' flag reduces the length of
this particular workload from 860s down to 24s.

Sample epoll_ctl text:

EPOLLEXCLUSIVE
    Sets an exclusive wakeup mode for the epfd file descriptor that is
    being attached to the target file descriptor, fd. Thus, when an event
    occurs and multiple epfd file descriptors are attached to the same
    target file using EPOLLEXCLUSIVE, one or more epfds will receive an
    event with epoll_wait(2). The default in this scenario (when
    EPOLLEXCLUSIVE is not set) is for all epfds to receive an event.
    EPOLLEXCLUSIVE may only be specified with the op EPOLL_CTL_ADD.

Signed-off-by: Jason Baron
Tested-by: Madars Vitolins
Cc: Ingo Molnar
Cc: Peter Zijlstra
Cc: Al Viro
Cc: Michael Kerrisk
Cc: Eric Wong
Cc: Jonathan Corbet
Cc: Andy Lutomirski
Cc: Hagen Paul Pfeifer
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
A simple search over the kernel source displays a number of correctly
defined multiline macros, which generally are used as an array element
initializer:

% find ../linux -type f | xargs grep -B1 -H '^[:space]*\[.*\\$'

However checkpatch.pl unexpectedly complains about all these macro
definitions:

% ./scripts/checkpatch.pl --types COMPLEX_MACRO -f include/linux/perf/arm_pmu.h
ERROR: Macros with complex values should be enclosed in parentheses
+#define PERF_MAP_ALL_UNSUPPORTED \
+ [0 ... PERF_COUNT_HW_MAX - 1] = HW_OP_UNSUPPORTED

The change intends to fix this type of false positive by flattening
only array members and skipping array element designators.

Signed-off-by: Vladimir Zapolskiy
Acked-by: Joe Perches
Cc: Andy Whitcroft
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
The current test excludes any macro with ## concatenation from being
reported with hidden flow control.

Some macros are used with return or goto statements along with ##args or
##__VA_ARGS__. A somewhat common case is a logging macro like
pr_info(fmt, ...) then a return or goto statement.

Check the concatenated variable for args or __VA_ARGS__ and allow those
macros to also be reported when they contain a return or goto.

Signed-off-by: Joe Perches
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Linus Torvalds wrote:
> I can't but help to react that this:
>   #define IOMMU_ERROR_CODE (~(unsigned long) 0)
> Not that this *matters*, but it's a bit odd to have to cast constants
> to perfectly regular C types.

So add a test that looks for constants that are cast to
standard C90 int or longer types and suggest using C90
"6.4.4.1 Integer constants" integer-suffixes instead.

Miscellanea:

o Add a --fix option too
Signed-off-by: Joe Perches
Suggested-by: Andrew Morton
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
The clz table (__clz_tab) in lib/clz_tab.c is also provided as part of
libgcc.a, and many architectures link against libgcc. To allow the
linker to avoid a multiple-definition link failure, clz_tab.o has to be
in lib/lib.a rather than lib/builtin.o. The specific issue is that
libgcc.a comes before lib/builtin.o on vmlinux.o's link command line, so
its _clz.o is pulled to satisfy __clz_tab, and then when the remainder
of lib/builtin.o is pulled in to satisfy all the other dependencies, the
__clz_tab symbols conflict. By putting clz_tab.o in lib.a, the linker
can simply avoid pulling it into vmlinux.o when this situation arises.

The definitions of __clz_tab are the same in libgcc.a and in the kernel;
arguably we could also simply rename the kernel version, but it's
unlikely the libgcc version will ever change to become incompatible, so
just using it seems reasonably safe.

Signed-off-by: Chris Metcalf
Acked-by: David S. Miller
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
This text refers to the "first 7 functions", which was correct when
written but became incorrect when Johannes Weiner added another function
to the list in 139e561660fe ("lib: radix_tree: tree node interface").

Change the text to correctly refer to the first 8 functions.
Signed-off-by: Adam Barth
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
As other tests do, print the gathered statistics after the test module
has finished. Set the module's return value based on the result.

Signed-off-by: Andy Shevchenko
Acked-by: Rasmus Villemoes
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Currently only one combination is tested for overflow, i.e. rowsize =
16, groupsize = 1, len = 1. Do various tests to go through all possible
branches.

Signed-off-by: Andy Shevchenko
Cc: Rasmus Villemoes
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
After processing by hex_dump_to_buffer(), check that all parts of the
result are as expected.
Part 1. The actual expected hex dump with or without ASCII part.
Part 2. Check that the buffer is not dirtied beyond what is needed.
Part 3. Return code should be as expected.
This is done by comparing the return code and by using memcmp() against
the test buffer. We fill the buffer with FILL_CHAR ('#') characters, so we
expect the tail of the buffer to be left untouched. The
terminating NUL is also checked by memcmp().

Signed-off-by: Andy Shevchenko
Cc: Rasmus Villemoes
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
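The three-part check described in the hex dump test change above can be sketched in userspace as follows. toy_hex_dump() is a hypothetical stand-in for the function under test (it is not hex_dump_to_buffer()), and BUF_SIZE is an arbitrary value; the point is the prefilled buffer and the single memcmp() over its whole length.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define FILL_CHAR '#'
#define BUF_SIZE 32	/* illustrative value */

/*
 * Stand-in for the function under test: writes "xx xx ..." for 'len'
 * bytes plus a terminating NUL, and returns the number of characters
 * written (excluding the NUL). Needs room for 3 * len bytes.
 */
static int toy_hex_dump(const unsigned char *src, size_t len, char *out)
{
	int n = 0;

	out[0] = '\0';
	for (size_t i = 0; i < len; i++)
		n += sprintf(out + n, i ? " %02x" : "%02x", (unsigned)src[i]);
	return n;
}

/* True if the dump, the return code and the untouched tail all match. */
static bool check_hex_dump(const unsigned char *src, size_t len,
			   const char *expected)
{
	char real[BUF_SIZE], test[BUF_SIZE];
	size_t want = strlen(expected);
	int got;

	if (3 * len > BUF_SIZE || want + 1 > BUF_SIZE)
		return false;

	/* Expected buffer: text, NUL, then fill characters to the end. */
	memset(test, FILL_CHAR, sizeof(test));
	memcpy(test, expected, want + 1);

	memset(real, FILL_CHAR, sizeof(real));
	got = toy_hex_dump(src, len, real);

	/* Return code plus every byte, including NUL and tail, must match. */
	return got == (int)want && memcmp(real, test, sizeof(real)) == 0;
}
```

Comparing the whole buffer rather than only the produced string means a stray write past the terminating NUL shows up as a mismatch in the fill characters.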
Better to use memcmp() against the entire buffer to check that nothing
has happened to the data in the tail.

Signed-off-by: Andy Shevchenko
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
The magic numbers of the length are converted to their actual meaning,
such as end of the buffer with and without ASCII part.

We don't touch the rest of the magic constants that will be removed in the
following commits.

Signed-off-by: Andy Shevchenko
Cc: Rasmus Villemoes
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
When testing for overflow, iterate the buffer length over the range 0 ..
BUF_SIZE.

Signed-off-by: Andy Shevchenko
Cc: Rasmus Villemoes
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Define a character to fill the test buffers. The character should
be printable, since it is used when errors are reported. It should be
neither a hex digit [a-fA-F0-9] nor a space. It is recommended not
to use one which is present in the ASCII part of the test data. Later on we
might switch to an unprintable character to make the test case more robust.

Signed-off-by: Andy Shevchenko
Suggested-by: Rasmus Villemoes
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
The function prepares the expected result in the provided buffer.
Signed-off-by: Andy Shevchenko
Acked-by: Rasmus Villemoes
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds