25 Dec, 2016
1 commit
-
This was entirely automated, using the script by Al:
PATT='^[[:blank:]]*#[[:blank:]]*include[[:blank:]]*'
sed -i -e "s!$PATT!#include !" \
$(git grep -l "$PATT"|grep -v ^include/linux/uaccess.h)to do the replacement at the end of the merge window.
Requested-by: Al Viro
Signed-off-by: Linus Torvalds
18 Dec, 2016
1 commit
-
…/linux/kernel/git/mszeredi/vfs
Pull partial readlink cleanups from Miklos Szeredi.
This is the uncontroversial part of the readlink cleanup patch-set that
simplifies the default readlink handling.Miklos and Al are still discussing the rest of the series.
* git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs:
vfs: make generic_readlink() static
vfs: remove ".readlink = generic_readlink" assignments
vfs: default to generic_readlink()
vfs: replace calling i_op->readlink with vfs_readlink()
proc/self: use generic_readlink
ecryptfs: use vfs_get_link()
bad_inode: add missing i_op initializers
15 Dec, 2016
2 commits
-
Pull audit updates from Paul Moore:
"After the small number of patches for v4.9, we've got a much bigger
pile for v4.10.The bulk of these patches involve a rework of the audit backlog queue
to enable us to move the netlink multicasting out of the task/thread
that generates the audit record and into the kernel thread that emits
the record (just like we do for the audit unicast to auditd).While we were playing with the backlog queue(s) we fixed a number of
other little problems with the code, and from all the testing so far
things look to be in much better shape now. Doing this also allowed us
to re-enable disabling IRQs for some netns operations ("netns: avoid
disabling irq for netns id").The remaining patches fix some small problems that are well documented
in the commit descriptions, as well as adding session ID filtering
support"* 'stable-4.10' of git://git.infradead.org/users/pcmoore/audit:
audit: use proper refcount locking on audit_sock
netns: avoid disabling irq for netns id
audit: don't ever sleep on a command record/message
audit: handle a clean auditd shutdown with grace
audit: wake up kauditd_thread after auditd registers
audit: rework audit_log_start()
audit: rework the audit queue handling
audit: rename the queues and kauditd related functions
audit: queue netlink multicast sends just like we do for unicast sends
audit: fixup audit_init()
audit: move kaudit thread start from auditd registration to kaudit init (#2)
audit: add support for session ID user filter
audit: fix formatting of AUDIT_CONFIG_CHANGE events
audit: skip sessionid sentinel value when auto-incrementing
audit: tame initialization warning len_abuf in audit_log_execve_info
audit: less stack usage for /proc/*/loginuid -
Pull security subsystem updates from James Morris:
"Generally pretty quiet for this release. Highlights:Yama:
- allow ptrace access for original parent after re-parentingTPM:
- add documentation
- many bugfixes & cleanups
- define a generic open() method for ascii & bios measurementsIntegrity:
- Harden against malformed xattrsSELinux:
- bugfixes & cleanupsSmack:
- Remove unnecessary smack_known_invalid label
- Do not apply star label in smack_setprocattr hook
- parse mnt opts after privileges check (fixes unpriv DoS vuln)"* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: (56 commits)
Yama: allow access for the current ptrace parent
tpm: adjust return value of tpm_read_log
tpm: vtpm_proxy: conditionally call tpm_chip_unregister
tpm: Fix handling of missing event log
tpm: Check the bios_dir entry for NULL before accessing it
tpm: return -ENODEV if np is not set
tpm: cleanup of printk error messages
tpm: replace of_find_node_by_name() with dev of_node property
tpm: redefine read_log() to handle ACPI/OF at runtime
tpm: fix the missing .owner in tpm_bios_measurements_ops
tpm: have event log use the tpm_chip
tpm: drop tpm1_chip_register(/unregister)
tpm: replace dynamically allocated bios_dir with a static array
tpm: replace symbolic permission with octal for securityfs files
char: tpm: fix kerneldoc tpm2_unseal_trusted name typo
tpm_tis: Allow tpm_tis to be bound using DT
tpm, tpm_vtpm_proxy: add kdoc comments for VTPM_PROXY_IOC_NEW_DEV
tpm: Only call pm_runtime_get_sync if device has a parent
tpm: define a generic open() method for ascii & bios measurements
Documentation: tpm: add the Physical TPM device tree binding documentation
...
14 Dec, 2016
1 commit
-
Pull xen updates from Juergen Gross:
"Xen features and fixes for 4.10These are some fixes, a move of some arm related headers to share them
between arm and arm64 and a series introducing a helper to make code
more readable.The most notable change is David stepping down as maintainer of the
Xen hypervisor interface. This results in me sending you the pull
requests for Xen related code from now on"* tag 'for-linus-4.10-rc0-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: (29 commits)
xen/balloon: Only mark a page as managed when it is released
xenbus: fix deadlock on writes to /proc/xen/xenbus
xen/scsifront: don't request a slot on the ring until request is ready
xen/x86: Increase xen_e820_map to E820_X_MAX possible entries
x86: Make E820_X_MAX unconditionally larger than E820MAX
xen/pci: Bubble up error and fix description.
xen: xenbus: set error code on failure
xen: set error code on failures
arm/xen: Use alloc_percpu rather than __alloc_percpu
arm/arm64: xen: Move shared architecture headers to include/xen/arm
xen/events: use xen_vcpu_id mapping for EVTCHNOP_status
xen/gntdev: Use VM_MIXEDMAP instead of VM_IO to avoid NUMA balancing
xen-scsifront: Add a missing call to kfree
MAINTAINERS: update XEN HYPERVISOR INTERFACE
xenfs: Use proc_create_mount_point() to create /proc/xen
xen-platform: use builtin_pci_driver
xen-netback: fix error handling output
xen: make use of xenbus_read_unsigned() in xenbus
xen: make use of xenbus_read_unsigned() in xen-pciback
xen: make use of xenbus_read_unsigned() in xen-fbfront
...
13 Dec, 2016
11 commits
-
Runtime nlink calculation works but meh. I don't know how to do it at
compile time, but I know how to do it at init time.Shift "2+" part into init time as a bonus.
Link: http://lkml.kernel.org/r/20161122195549.GB29812@avx2
Signed-off-by: Alexey Dobriyan
Reviewed-by: Vegard Nossum
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Comparison for "
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
format_decode and vsnprintf occasionally show up in perf top, so I went
looking for places that might not need the full printf power. With the
help of kprobes, I gathered some statistics on which format strings we
mostly pass to vsnprintf. On a trivial desktop workload, I hit "%x" 25%
of the time, so something apparently reads /proc/pid/status (which does
5*16 printf("%x") calls) a lot.With this patch, reading /proc/pid/status is 30% faster according to
this microbenchmark:char buf[4096];
int i, fd;
for (i = 0; i < 10000; ++i) {
fd = open("/proc/self/status", O_RDONLY);
read(fd, buf, sizeof(buf));
close(fd);
}Link: http://lkml.kernel.org/r/1474410485-1305-1-git-send-email-linux@rasmusvillemoes.dk
Signed-off-by: Rasmus Villemoes
Acked-by: Andrei Vagin
Acked-by: Kees Cook
Cc: Alexey Dobriyan
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Some comments were obsoleted since commit 05c0ae21c034 ("try a saner
locking for pde_opener...").Some new comments added.
Some confusing comments replaced with equally confusing ones.
Link: http://lkml.kernel.org/r/20161029160231.GD1246@avx2
Signed-off-by: Alexey Dobriyan
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
kzalloc is too much, half of the fields will be reinitialized anyway.
If proc file doesn't have ->release hook (some still do not), clearing
is unnecessary because it will be freed immediately.Link: http://lkml.kernel.org/r/20161029155747.GC1246@avx2
Signed-off-by: Alexey Dobriyan
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
struct pde_opener::closing is boolean.
Link: http://lkml.kernel.org/r/20161029155439.GB1246@avx2
Signed-off-by: Alexey Dobriyan
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
list_del_init() is too much, structure will be freed in three lines
anyway.Link: http://lkml.kernel.org/r/20161029155313.GA1246@avx2
Signed-off-by: Alexey Dobriyan
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Linux doesn't support 4GB+ filenames in /proc, so unsigned long is too
much.MOV r64, r/m64 is larger than MOV r32, r/m32.
Link: http://lkml.kernel.org/r/20161029161123.GG1246@avx2
Signed-off-by: Alexey Dobriyan
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
"unsigned int" is better on x86_64 because it most of the time it
autoexpands to 64-bit value while "int" requires MOVSX instruction.Link: http://lkml.kernel.org/r/20161029160810.GF1246@avx2
Signed-off-by: Alexey Dobriyan
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Similar to being able to examine if a process has been correctly
confined with seccomp, the state of no_new_privs is equally interesting,
so this adds it to /proc/$pid/status.Link: http://lkml.kernel.org/r/20161103214041.GA58566@beast
Signed-off-by: Kees Cook
Reviewed-by: Jann Horn
Cc: Jonathan Corbet
Cc: Vlastimil Babka
Cc: Michal Hocko
Cc: Konstantin Khlebnikov
Cc: Hugh Dickins
Cc: Naoya Horiguchi
Cc: Rodrigo Freire
Cc: John Stultz
Cc: Ross Zwisler
Cc: Robert Ho
Cc: Jerome Marchand
Cc: Andy Lutomirski
Cc: Johannes Weiner
Cc: Alexey Dobriyan
Cc: "Richard W.M. Jones"
Cc: Joe Perches
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
The other pagetable walks in task_mmu.c have a cond_resched() after
walking their ptes: add a cond_resched() in gather_pte_stats() too, for
reading /proc//numa_maps. Only pagemap_pmd_range() has a
cond_resched() in its (unusually expensive) pmd_trans_huge case: more
should probably be added, but leave them unchanged for now.Link: http://lkml.kernel.org/r/alpine.LSU.2.11.1612052157400.13021@eggly.anvils
Signed-off-by: Hugh Dickins
Acked-by: Michal Hocko
Cc: David Rientjes
Cc: Gerald Schaefer
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
09 Dec, 2016
2 commits
-
If .readlink == NULL implies generic_readlink().
Generated by:
to_del="\.readlink.*=.*generic_readlink"
for i in `git grep -l $to_del`; do sed -i "/$to_del"/d $i; doneSigned-off-by: Miklos Szeredi
-
The /proc/self and /proc/self-thread symlinks have separate but identical
functionality for reading and following. This cleanup utilizes
generic_readlink to remove the duplication.Signed-off-by: Miklos Szeredi
24 Nov, 2016
1 commit
17 Nov, 2016
1 commit
-
Mounting proc in user namespace containers fails if the xenbus
filesystem is mounted on /proc/xen because this directory fails
the "permanently empty" test. proc_create_mount_point() exists
specifically to create such mountpoints in proc but is currently
proc-internal. Export this interface to modules, then use it in
xenbus when creating /proc/xen.Signed-off-by: Seth Forshee
Signed-off-by: David Vrabel
Signed-off-by: Juergen Gross
15 Nov, 2016
1 commit
-
Pass the file mode of the proc inode to be created to
proc_pid_make_inode. In proc_pid_make_inode, initialize inode->i_mode
before calling security_task_to_inode. This allows selinux to set
isec->sclass right away without introducing "half-initialized" inode
security structs.Signed-off-by: Andreas Gruenbacher
Signed-off-by: Paul Moore
04 Nov, 2016
1 commit
-
%u requires 10 characters at most not 20.
Signed-off-by: Alexey Dobriyan
Acked-by: Richard Guy Briggs
Signed-off-by: Paul Moore
28 Oct, 2016
1 commit
-
Reading auxv of any kernel thread results in NULL pointer dereferencing
in auxv_read() where mm can be NULL. Fix that by checking for NULL mm
and bailing out early. This is also the original behavior changed by
recent commit c5317167854e ("proc: switch auxv to use of __mem_open()").# cat /proc/2/auxv
Unable to handle kernel NULL pointer dereference at virtual address 000000a8
Internal error: Oops: 17 [#1] PREEMPT SMP ARM
CPU: 3 PID: 113 Comm: cat Not tainted 4.9.0-rc1-ARCH+ #1
Hardware name: BCM2709
task: ea3b0b00 task.stack: e99b2000
PC is at auxv_read+0x24/0x4c
LR is at do_readv_writev+0x2fc/0x37c
Process cat (pid: 113, stack limit = 0xe99b2210)
Call chain:
auxv_read
do_readv_writev
vfs_readv
default_file_splice_read
splice_direct_to_actor
do_splice_direct
do_sendfile
SyS_sendfile64
ret_fast_syscallFixes: c5317167854e ("proc: switch auxv to use of __mem_open()")
Link: http://lkml.kernel.org/r/1476966200-14457-1-git-send-email-chianglungyu@gmail.com
Signed-off-by: Leon Yu
Acked-by: Oleg Nesterov
Acked-by: Michal Hocko
Cc: Al Viro
Cc: Kees Cook
Cc: John Stultz
Cc: Mateusz Guzik
Cc: Janis Danisevskis
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
25 Oct, 2016
1 commit
-
Now that Lorenzo cleaned things up and made the FOLL_FORCE users
explicit, it becomes obvious how some of them don't really need
FOLL_FORCE at all.So remove FOLL_FORCE from the proc code that reads the command line and
arguments from user space.The mem_rw() function actually does want FOLL_FORCE, because gdd (and
possibly many other debuggers) use it as a much more convenient version
of PTRACE_PEEKDATA, but we should consider making the FOLL_FORCE part
conditional on actually being a ptracer. This does not actually do
that, just moves adds a comment to that effect and moves the gup_flags
settings next to each other.Signed-off-by: Linus Torvalds
23 Oct, 2016
1 commit
-
Pull vmap stack fixes from Ingo Molnar:
"This is fallout from CONFIG_HAVE_ARCH_VMAP_STACK=y on x86: stack
accesses that used to be just somewhat questionable are now totally
buggy.These changes try to do it without breaking the ABI: the fields are
left there, they are just reporting zero, or reporting narrower
information (the maps file change)"* 'mm-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
mm: Change vm_is_stack_for_task() to vm_is_stack_for_current()
fs/proc: Stop trying to report thread stacks
fs/proc: Stop reporting eip and esp in /proc/PID/stat
mm/numa: Remove duplicated include from mprotect.c
20 Oct, 2016
2 commits
-
This reverts more of:
b76437579d13 ("procfs: mark thread stack correctly in proc//maps")
... which was partially reverted by:
65376df58217 ("proc: revert /proc//maps [stack:TID] annotation")
Originally, /proc/PID/task/TID/maps was the same as /proc/TID/maps.
In current kernels, /proc/PID/maps (or /proc/TID/maps even for
threads) shows "[stack]" for VMAs in the mm's stack address range.In contrast, /proc/PID/task/TID/maps uses KSTK_ESP to guess the
target thread's stack's VMA. This is racy, probably returns garbage
and, on arches with CONFIG_TASK_INFO_IN_THREAD=y, is also crash-prone:
KSTK_ESP is not safe to use on tasks that aren't known to be running
ordinary process-context kernel code.This patch removes the difference and just shows "[stack]" for VMAs
in the mm's stack range. This is IMO much more sensible -- the
actual "stack" address really is treated specially by the VM code,
and the current thread stack isn't even well-defined for programs
that frequently switch stacks on their own.Reported-by: Jann Horn
Signed-off-by: Andy Lutomirski
Acked-by: Thomas Gleixner
Cc: Al Viro
Cc: Andrew Morton
Cc: Borislav Petkov
Cc: Brian Gerst
Cc: Johannes Weiner
Cc: Kees Cook
Cc: Linus Torvalds
Cc: Linux API
Cc: Peter Zijlstra
Cc: Tycho Andersen
Link: http://lkml.kernel.org/r/3e678474ec14e0a0ec34c611016753eea2e1b8ba.1475257877.git.luto@kernel.org
Signed-off-by: Ingo Molnar -
Reporting these fields on a non-current task is dangerous. If the
task is in any state other than normal kernel code, they may contain
garbage or even kernel addresses on some architectures. (x86_64
used to do this. I bet lots of architectures still do.) With
CONFIG_THREAD_INFO_IN_TASK=y, it can OOPS, too.As far as I know, there are no use programs that make any material
use of these fields, so just get rid of them.Reported-by: Jann Horn
Signed-off-by: Andy Lutomirski
Acked-by: Thomas Gleixner
Cc: Al Viro
Cc: Andrew Morton
Cc: Borislav Petkov
Cc: Brian Gerst
Cc: Kees Cook
Cc: Linus Torvalds
Cc: Linux API
Cc: Peter Zijlstra
Cc: Tetsuo Handa
Cc: Tycho Andersen
Link: http://lkml.kernel.org/r/a5fed4c3f4e33ed25d4bb03567e329bc5a712bcc.1475257877.git.luto@kernel.org
Signed-off-by: Ingo Molnar
19 Oct, 2016
1 commit
-
This removes the 'write' argument from access_remote_vm() and replaces
it with 'gup_flags' as use of this function previously silently implied
FOLL_FORCE, whereas after this patch callers explicitly pass this flag.We make this explicit as use of FOLL_FORCE can result in surprising
behaviour (and hence bugs) within the mm subsystem.Signed-off-by: Lorenzo Stoakes
Acked-by: Michal Hocko
Signed-off-by: Linus Torvalds
11 Oct, 2016
3 commits
-
Pull more vfs updates from Al Viro:
">rename2() work from Miklos + current_time() from Deepa"* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
fs: Replace current_fs_time() with current_time()
fs: Replace CURRENT_TIME_SEC with current_time() for inode timestamps
fs: Replace CURRENT_TIME with current_time() for inode timestamps
fs: proc: Delete inode time initializations in proc_alloc_inode()
vfs: Add current_time() api
vfs: add note about i_op->rename changes to porting
fs: rename "rename2" i_op to "rename"
vfs: remove unused i_op->rename
fs: make remaining filesystems use .rename2
libfs: support RENAME_NOREPLACE in simple_rename()
fs: support RENAME_NOREPLACE for local filesystems
ncpfs: fix unused variable warning -
Pull misc vfs updates from Al Viro:
"Assorted misc bits and pieces.There are several single-topic branches left after this (rename2
series from Miklos, current_time series from Deepa Dinamani, xattr
series from Andreas, uaccess stuff from from me) and I'd prefer to
send those separately"* 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (39 commits)
proc: switch auxv to use of __mem_open()
hpfs: support FIEMAP
cifs: get rid of unused arguments of CIFSSMBWrite()
posix_acl: uapi header split
posix_acl: xattr representation cleanups
fs/aio.c: eliminate redundant loads in put_aio_ring_file
fs/internal.h: add const to ns_dentry_operations declaration
compat: remove compat_printk()
fs/buffer.c: make __getblk_slow() static
proc: unsigned file descriptors
fs/file: more unsigned file descriptors
fs: compat: remove redundant check of nr_segs
cachefiles: Fix attempt to read i_blocks after deleting file [ver #2]
cifs: don't use memcpy() to copy struct iov_iter
get rid of separate multipage fault-in primitives
fs: Avoid premature clearing of capabilities
fs: Give dentry to inode_change_ok() instead of inode
fuse: Propagate dentry down to inode_change_ok()
ceph: Propagate dentry down to inode_change_ok()
xfs: Propagate dentry down to inode_change_ok()
...
08 Oct, 2016
9 commits
-
Current supplementary groups code can massively overallocate memory and
is implemented in a way so that access to individual gid is done via 2D
array.If number of gids is
Cc: Vasily Kulikov
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Recently, Redhat reported that nvml test suite failed on QEMU/KVM,
more detailed info please refer to:https://bugzilla.redhat.com/show_bug.cgi?id=1365721
Actually, this bug is not only for NVDIMM/DAX but also for any other
file systems. This simple test case abstracted from nvml can easily
reproduce this bug in common environment:-------------------------- testcase.c -----------------------------
int
is_pmem_proc(const void *addr, size_t len)
{
const char *caddr = addr;FILE *fp;
if ((fp = fopen("/proc/self/smaps", "r")) == NULL) {
printf("!/proc/self/smaps");
return 0;
}int retval = 0; /* assume false until proven otherwise */
char line[PROCMAXLEN]; /* for fgets() */
char *lo = NULL; /* beginning of current range in smaps file */
char *hi = NULL; /* end of current range in smaps file */
int needmm = 0; /* looking for mm flag for current range */
while (fgets(line, PROCMAXLEN, fp) != NULL) {
static const char vmflags[] = "VmFlags:";
static const char mm[] = " wr";/* check for range line */
if (sscanf(line, "%p-%p", &lo, &hi) == 2) {
if (needmm) {
/* last range matched, but no mm flag found */
printf("never found mm flag.\n");
break;
} else if (caddr < lo) {
/* never found the range for caddr */
printf("#######no match for addr %p.\n", caddr);
break;
} else if (caddr < hi) {
/* start address is in this range */
size_t rangelen = (size_t)(hi - caddr);/* remember that matching has started */
needmm = 1;/* calculate remaining range to search for */
if (len > rangelen) {
len -= rangelen;
caddr += rangelen;
printf("matched %zu bytes in range "
"%p-%p, %zu left over.\n",
rangelen, lo, hi, len);
} else {
len = 0;
printf("matched all bytes in range "
"%p-%p.\n", lo, hi);
}
}
} else if (needmm && strncmp(line, vmflags,
sizeof(vmflags) - 1) == 0) {
if (strstr(&line[sizeof(vmflags) - 1], mm) != NULL) {
printf("mm flag found.\n");
if (len == 0) {
/* entire range matched */
retval = 1;
break;
}
needmm = 0; /* saw what was needed */
} else {
/* mm flag not set for some or all of range */
printf("range has no mm flag.\n");
break;
}
}
}fclose(fp);
printf("returning %d.\n", retval);
return retval;
}void *Addr;
size_t Size;/*
* worker -- the work each thread performs
*/
static void *
worker(void *arg)
{
int *ret = (int *)arg;
*ret = is_pmem_proc(Addr, Size);
return NULL;
}int main(int argc, char *argv[])
{
if (argc < 2 || argc > 3) {
printf("usage: %s file [env].\n", argv[0]);
return -1;
}int fd = open(argv[1], O_RDWR);
struct stat stbuf;
fstat(fd, &stbuf);Size = stbuf.st_size;
Addr = mmap(0, stbuf.st_size, PROT_READ|PROT_WRITE, MAP_PRIVATE, fd, 0);close(fd);
pthread_t threads[NTHREAD];
int ret[NTHREAD];/* kick off NTHREAD threads */
for (int i = 0; i < NTHREAD; i++)
pthread_create(&threads[i], NULL, worker, &ret[i]);/* wait for all the threads to complete */
for (int i = 0; i < NTHREAD; i++)
pthread_join(threads[i], NULL);/* verify that all the threads return the same value */
for (int i = 1; i < NTHREAD; i++) {
if (ret[0] != ret[i]) {
printf("Error i %d ret[0] = %d ret[i] = %d.\n", i,
ret[0], ret[i]);
}
}printf("%d", ret[0]);
return 0;
}It failed as some threads can not find the memory region in
"/proc/self/smaps" which is allocated in the main processIt is caused by proc fs which uses 'file->version' to indicate the VMA that
is the last one has already been handled by read() system call. When the
next read() issues, it uses the 'version' to find the VMA, then the next
VMA is what we want to handle, the related code is as follows:if (last_addr) {
vma = find_vma(mm, last_addr);
if (vma && (vma = m_next_vma(priv, vma)))
return vma;
}However, VMA will be lost if the last VMA is gone, e.g:
The process VMA list is A->B->C->D
CPU 0 CPU 1
read() system call
handle VMA B
version = B
return to userspaceunmap VMA B
issue read() again to continue to get
the region info
find_vma(version) will get VMA C
m_next_vma(C) will get VMA D
handle D
!!! VMA C is lost !!!In order to fix this bug, we make 'file->version' indicate the end address
of the current VMA. m_start will then look up a vma which with vma_start
< last_vm_end and moves on to the next vma if we found the same or an
overlapping vma. This will guarantee that we will not miss an exclusive
vma but we can still miss one if the previous vma was shrunk. This is
acceptable because guaranteeing "never miss a vma" is simply not feasible.
User has to cope with some inconsistencies if the file is not read in one
go.[mhocko@suse.com: changelog fixes]
Link: http://lkml.kernel.org/r/1475296958-27652-1-git-send-email-robert.hu@intel.com
Acked-by: Dave Hansen
Signed-off-by: Xiao Guangrong
Signed-off-by: Robert Hu
Acked-by: Michal Hocko
Acked-by: Oleg Nesterov
Cc: Paolo Bonzini
Cc: Dan Williams
Cc: Gleb Natapov
Cc: Marcelo Tosatti
Cc: Stefan Hajnoczi
Cc: Ross Zwisler
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
In changing from checking ptrace_may_access(p, PTRACE_MODE_ATTACH_FSCREDS)
to capable(CAP_SYS_NICE), I missed that ptrace_my_access succeeds when p
== current, but the CAP_SYS_NICE doesn't.Thus while the previous commit was intended to loosen the needed
privileges to modify a processes timerslack, it needlessly restricted a
task modifying its own timerslack via the proc//timerslack_ns
(which is permitted also via the PR_SET_TIMERSLACK method).This patch corrects this by checking if p == current before checking the
CAP_SYS_NICE value.This patch applies on top of my two previous patches currently in -mm
Link: http://lkml.kernel.org/r/1471906870-28624-1-git-send-email-john.stultz@linaro.org
Signed-off-by: John Stultz
Acked-by: Kees Cook
Cc: "Serge E. Hallyn"
Cc: Thomas Gleixner
Cc: Arjan van de Ven
Cc: Oren Laadan
Cc: Ruchi Kandoi
Cc: Rom Lemarchand
Cc: Todd Kjos
Cc: Colin Cross
Cc: Nick Kralevich
Cc: Dmitry Shmidt
Cc: Elliott Hughes
Cc: Android Kernel Team
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
As requested, this patch checks the existing LSM hooks
task_getscheduler/task_setscheduler when reading or modifying the task's
timerslack value.Previous versions added new get/settimerslack LSM hooks, but since they
checked the same PROCESS__SET/GETSCHED values as existing hooks, it was
suggested we just use the existing ones.Link: http://lkml.kernel.org/r/1469132667-17377-2-git-send-email-john.stultz@linaro.org
Signed-off-by: John Stultz
Cc: Kees Cook
Cc: "Serge E. Hallyn"
Cc: Thomas Gleixner
Cc: Arjan van de Ven
Cc: Oren Laadan
Cc: Ruchi Kandoi
Cc: Rom Lemarchand
Cc: Todd Kjos
Cc: Colin Cross
Cc: Nick Kralevich
Cc: Dmitry Shmidt
Cc: Elliott Hughes
Cc: James Morris
Cc: Android Kernel Team
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
When an interface to allow a task to change another tasks timerslack was
first proposed, it was suggested that something greater then
CAP_SYS_NICE would be needed, as a task could be delayed further then
what normally could be done with nice adjustments.So CAP_SYS_PTRACE was adopted instead for what became the
/proc//timerslack_ns interface. However, for Android (where this
feature originates), giving the system_server CAP_SYS_PTRACE would allow
it to observe and modify all tasks memory. This is considered too high
a privilege level for only needing to change the timerslack.After some discussion, it was realized that a CAP_SYS_NICE process can
set a task as SCHED_FIFO, so they could fork some spinning processes and
set them all SCHED_FIFO 99, in effect delaying all other tasks for an
infinite amount of time.So as a CAP_SYS_NICE task can already cause trouble for other tasks,
using it as a required capability for accessing and modifying
/proc//timerslack_ns seems sufficient.Thus, this patch loosens the capability requirements to CAP_SYS_NICE and
removes CAP_SYS_PTRACE, simplifying some of the code flow as well.This is technically an ABI change, but as the feature just landed in
4.6, I suspect no one is yet using it.Link: http://lkml.kernel.org/r/1469132667-17377-1-git-send-email-john.stultz@linaro.org
Signed-off-by: John Stultz
Reviewed-by: Nick Kralevich
Acked-by: Serge Hallyn
Acked-by: Kees Cook
Cc: Kees Cook
Cc: "Serge E. Hallyn"
Cc: Thomas Gleixner
Cc: Arjan van de Ven
Cc: Oren Laadan
Cc: Ruchi Kandoi
Cc: Rom Lemarchand
Cc: Todd Kjos
Cc: Colin Cross
Cc: Nick Kralevich
Cc: Dmitry Shmidt
Cc: Elliott Hughes
Cc: Android Kernel Team
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Use a specific routine to emit most lines so that the code is easier to
read and maintain.akpm:
text data bss dec hex filename
2976 8 0 2984 ba8 fs/proc/meminfo.o before
2669 8 0 2677 a75 fs/proc/meminfo.o afterLink: http://lkml.kernel.org/r/8fce7fdef2ba081a4ef531594e97da8a9feebb58.1470810406.git.joe@perches.com
Signed-off-by: Joe Perches
Cc: Andi Kleen
Cc: Alexey Dobriyan
Cc: Al Viro
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Allow some seq_puts removals by taking a string instead of a single
char.[akpm@linux-foundation.org: update vmstat_show(), per Joe]
Link: http://lkml.kernel.org/r/667e1cf3d436de91a5698170a1e98d882905e956.1470704995.git.joe@perches.com
Signed-off-by: Joe Perches
Cc: Joe Perches
Cc: Andi Kleen
Cc: Al Viro
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
top(1) opens the following files for every PID:
/proc/*/stat
/proc/*/statm
/proc/*/statusThis patch switches /proc/*/status away from seq_printf().
The result is 13.5% speedup.Benchmark is open("/proc/self/status")+read+close 1.000.000 million times.
BEFORE
$ perf stat -r 10 taskset -c 3 ./proc-self-statusPerformance counter stats for 'taskset -c 3 ./proc-self-status' (10 runs):
10748.474301 task-clock (msec) # 0.954 CPUs utilized ( +- 0.91% )
12 context-switches # 0.001 K/sec ( +- 1.09% )
1 cpu-migrations # 0.000 K/sec
104 page-faults # 0.010 K/sec ( +- 0.45% )
37,424,127,876 cycles # 3.482 GHz ( +- 0.04% )
8,453,010,029 stalled-cycles-frontend # 22.59% frontend cycles idle ( +- 0.12% )
3,747,609,427 stalled-cycles-backend # 10.01% backend cycles idle ( +- 0.68% )
65,632,764,147 instructions # 1.75 insn per cycle
# 0.13 stalled cycles per insn ( +- 0.00% )
13,981,324,775 branches # 1300.773 M/sec ( +- 0.00% )
138,967,110 branch-misses # 0.99% of all branches ( +- 0.18% )11.263885428 seconds time elapsed ( +- 0.04% )
^^^^^^^^^^^^AFTER
$ perf stat -r 10 taskset -c 3 ./proc-self-statusPerformance counter stats for 'taskset -c 3 ./proc-self-status' (10 runs):
9010.521776 task-clock (msec) # 0.925 CPUs utilized ( +- 1.54% )
11 context-switches # 0.001 K/sec ( +- 1.54% )
1 cpu-migrations # 0.000 K/sec ( +- 11.11% )
103 page-faults # 0.011 K/sec ( +- 0.60% )
32,352,310,603 cycles # 3.591 GHz ( +- 0.07% )
7,849,199,578 stalled-cycles-frontend # 24.26% frontend cycles idle ( +- 0.27% )
3,269,738,842 stalled-cycles-backend # 10.11% backend cycles idle ( +- 0.73% )
56,012,163,567 instructions # 1.73 insn per cycle
# 0.14 stalled cycles per insn ( +- 0.00% )
11,735,778,795 branches # 1302.453 M/sec ( +- 0.00% )
98,084,459 branch-misses # 0.84% of all branches ( +- 0.28% )9.741247736 seconds time elapsed ( +- 0.07% )
^^^^^^^^^^^Link: http://lkml.kernel.org/r/20160806125608.GB1187@p183.telecom.by
Signed-off-by: Alexey Dobriyan
Cc: Joe Perches
Cc: Andi Kleen
Cc: Al Viro
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds