15 Nov, 2013
4 commits
-
All seq_printf() users are using "%n" for calculating padding size,
convert them to use seq_setwidth() / seq_pad() pair.Signed-off-by: Tetsuo Handa
Signed-off-by: Kees Cook
Cc: Joe Perches
Cc: David Miller
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Hugetlb supports multiple page sizes. We use split lock only for PMD
level, but not for PUD.[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Naoya Horiguchi
Signed-off-by: Kirill A. Shutemov
Tested-by: Alex Thorlton
Cc: Ingo Molnar
Cc: "Eric W . Biederman"
Cc: "Paul E . McKenney"
Cc: Al Viro
Cc: Andi Kleen
Cc: Andrea Arcangeli
Cc: Dave Hansen
Cc: Dave Jones
Cc: David Howells
Cc: Frederic Weisbecker
Cc: Johannes Weiner
Cc: Kees Cook
Cc: Mel Gorman
Cc: Michael Kerrisk
Cc: Oleg Nesterov
Cc: Peter Zijlstra
Cc: Rik van Riel
Cc: Robin Holt
Cc: Sedat Dilek
Cc: Srikar Dronamraju
Cc: Thomas Gleixner
Cc: Hugh Dickins
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
With split ptlock it's important to know which lock
pmd_trans_huge_lock() took. This patch adds one more parameter to the
function to return the lock.In most places migration to new api is trivial. Exception is
move_huge_pmd(): we need to take two locks if pmd tables are different.Signed-off-by: Naoya Horiguchi
Signed-off-by: Kirill A. Shutemov
Tested-by: Alex Thorlton
Cc: Ingo Molnar
Cc: "Eric W . Biederman"
Cc: "Paul E . McKenney"
Cc: Al Viro
Cc: Andi Kleen
Cc: Andrea Arcangeli
Cc: Dave Hansen
Cc: Dave Jones
Cc: David Howells
Cc: Frederic Weisbecker
Cc: Johannes Weiner
Cc: Kees Cook
Cc: Mel Gorman
Cc: Michael Kerrisk
Cc: Oleg Nesterov
Cc: Peter Zijlstra
Cc: Rik van Riel
Cc: Robin Holt
Cc: Sedat Dilek
Cc: Srikar Dronamraju
Cc: Thomas Gleixner
Cc: Hugh Dickins
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
With split page table lock for PMD level we can't hold mm->page_table_lock
while updating nr_ptes.Let's convert it to atomic_long_t to avoid races.
Signed-off-by: Kirill A. Shutemov
Tested-by: Alex Thorlton
Cc: Ingo Molnar
Cc: Naoya Horiguchi
Cc: "Eric W . Biederman"
Cc: "Paul E . McKenney"
Cc: Al Viro
Cc: Andi Kleen
Cc: Andrea Arcangeli
Cc: Dave Hansen
Cc: Dave Jones
Cc: David Howells
Cc: Frederic Weisbecker
Cc: Johannes Weiner
Cc: Kees Cook
Cc: Mel Gorman
Cc: Michael Kerrisk
Cc: Oleg Nesterov
Cc: Peter Zijlstra
Cc: Rik van Riel
Cc: Robin Holt
Cc: Sedat Dilek
Cc: Srikar Dronamraju
Cc: Thomas Gleixner
Cc: Hugh Dickins
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
13 Nov, 2013
8 commits
-
Merge first patch-bomb from Andrew Morton:
"Quite a lot of other stuff is banked up awaiting further
next->mainline merging, but this batch contains:- Lots of random misc patches
- OCFS2
- Most of MM
- backlight updates
- lib/ updates
- printk updates
- checkpatch updates
- epoll tweaking
- rtc updates
- hfs
- hfsplus
- documentation
- procfs
- update gcov to gcc-4.7 format
- IPC"* emailed patches from Andrew Morton : (269 commits)
ipc, msg: fix message length check for negative values
ipc/util.c: remove unnecessary work pending test
devpts: plug the memory leak in kill_sb
./Makefile: export initial ramdisk compression config option
init/Kconfig: add option to disable kernel compression
drivers: w1: make w1_slave::flags long to avoid memory corruption
drivers/w1/masters/ds1wm.cuse dev_get_platdata()
drivers/memstick/core/ms_block.c: fix unreachable state in h_msb_read_page()
drivers/memstick/core/mspro_block.c: fix attributes array allocation
drivers/pps/clients/pps-gpio.c: remove redundant of_match_ptr
kernel/panic.c: reduce 1 byte usage for print tainted buffer
gcov: reuse kbasename helper
kernel/gcov/fs.c: use pr_warn()
kernel/module.c: use pr_foo()
gcov: compile specific gcov implementation based on gcc version
gcov: add support for gcc 4.7 gcov format
gcov: move gcov structs definitions to a gcc version specific file
kernel/taskstats.c: return -ENOMEM when alloc memory fails in add_del_listener()
kernel/taskstats.c: add nla_nest_cancel() for failure processing between nla_nest_start() and nla_nest_end()
kernel/sysctl_binary.c: use scnprintf() instead of snprintf()
... -
Pull vfs updates from Al Viro:
"All kinds of stuff this time around; some more notable parts:- RCU'd vfsmounts handling
- new primitives for coredump handling
- files_lock is gone
- Bruce's delegations handling series
- exportfs fixesplus misc stuff all over the place"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (101 commits)
ecryptfs: ->f_op is never NULL
locks: break delegations on any attribute modification
locks: break delegations on link
locks: break delegations on rename
locks: helper functions for delegation breaking
locks: break delegations on unlink
namei: minor vfs_unlink cleanup
locks: implement delegations
locks: introduce new FL_DELEG lock flag
vfs: take i_mutex on renamed file
vfs: rename I_MUTEX_QUOTA now that it's not used for quotas
vfs: don't use PARENT/CHILD lock classes for non-directories
vfs: pull ext4's double-i_mutex-locking into common code
exportfs: fix quadratic behavior in filehandle lookup
exportfs: better variable name
exportfs: move most of reconnect_path to helper function
exportfs: eliminate unused "noprogress" counter
exportfs: stop retrying once we race with rename/remove
exportfs: clear DISCONNECTED on all parents sooner
exportfs: more detailed comment for path_reconnect
... -
Under Pseudo filesystems, /proc/kcore support has no help.
Fixes a portion of kernel bugzilla #52671:
https://bugzilla.kernel.org/show_bug.cgi?id=52671Thanks for David Howells for the help text.
Signed-off-by: Randy Dunlap
Reported-by:
Signed-off-by: David Howells
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Clean up proc_reg_get_unmapped_area due to its 80-column limit
violation.Signed-off-by: HATAYAMA Daisuke
Tested-by: Michael Holzheu
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
The same calculation is currently done in three differents places.
Factor that code so future changes has to be made at only one place.[akpm@linux-foundation.org: uninline vm_commit_limit()]
Signed-off-by: Jerome Marchand
Cc: Dave Hansen
Cc: Michal Hocko
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
This flag shows that the VMA is "newly created" and thus represents
"dirty" in the task's VM.You can clear it by "echo 4 > /proc/pid/clear_refs."
Signed-off-by: Naoya Horiguchi
Cc: Wu Fengguang
Cc: Pavel Emelyanov
Acked-by: Cyrill Gorcunov
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
mpol_to_str() should not fail. Currently, it either fails because the
string buffer is too small or because a string hasn't been defined for a
mempolicy mode.If a new mempolicy mode is introduced and no string is defined for it,
just warn and return "unknown".If the buffer is too small, just truncate the string and return, the
same behavior as snprintf().This also fixes a bug where there was no NULL-byte termination when doing
*p++ = '=' and *p++ ':' and maxlen has been reached.Signed-off-by: David Rientjes
Cc: KOSAKI Motohiro
Cc: Chen Gang
Cc: Rik van Riel
Cc: Dave Jones
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Use "pgdat_end_pfn()" instead of "pgdat->node_start_pfn +
pgdat->node_spanned_pages". Simplify the code, no functional change.Signed-off-by: Xishi Qiu
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
12 Nov, 2013
1 commit
-
Pull devicetree updates from Rob Herring:
"DeviceTree updates for 3.13. This is a bit larger pull request than
usual for this cycle with lots of clean-up.- Cross arch clean-up and consolidation of early DT scanning code.
- Clean-up and removal of arch prom.h headers. Makes arch specific
prom.h optional on all but Sparc.
- Addition of interrupts-extended property for devices connected to
multiple interrupt controllers.
- Refactoring of DT interrupt parsing code in preparation for
deferred probe of interrupts.
- ARM cpu and cpu topology bindings documentation.
- Various DT vendor binding documentation updates"* tag 'devicetree-for-3.13' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux: (82 commits)
powerpc: add missing explicit OF includes for ppc
dt/irq: add empty of_irq_count for !OF_IRQ
dt: disable self-tests for !OF_IRQ
of: irq: Fix interrupt-map entry matching
MIPS: Netlogic: replace early_init_devtree() call
of: Add Panasonic Corporation vendor prefix
of: Add Chunghwa Picture Tubes Ltd. vendor prefix
of: Add AU Optronics Corporation vendor prefix
of/irq: Fix potential buffer overflow
of/irq: Fix bug in interrupt parsing refactor.
of: set dma_mask to point to coherent_dma_mask
of: add vendor prefix for PHYTEC Messtechnik GmbH
DT: sort vendor-prefixes.txt
of: Add vendor prefix for Cadence
of: Add empty for_each_available_child_of_node() macro definition
arm/versatile: Fix versatile irq specifications.
of/irq: create interrupts-extended property
microblaze/pci: Drop PowerPC-ism from irq parsing
of/irq: Create of_irq_parse_and_map_pci() to consolidate arch code.
of/irq: Use irq_of_parse_and_map()
...
01 Nov, 2013
1 commit
-
Resolve cherry-picking conflicts:
Conflicts:
mm/huge_memory.c
mm/memory.c
mm/mprotect.cSee this upstream merge commit for more details:
52469b4fcd4f Merge branch 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Signed-off-by: Ingo Molnar
25 Oct, 2013
1 commit
-
duplicated to hell and back...
Signed-off-by: Al Viro
17 Oct, 2013
3 commits
-
Commit c4fe24485729 ("sparc: fix PCI device proc file mmap(2)") added
proc_reg_get_unmapped_area in proc_reg_file_ops and
proc_reg_file_ops_no_compat, by which now mmap always returns EIO if
get_unmapped_area method is not defined for the target procfs file,
which causes regression of mmap on /proc/vmcore.To address this issue, like get_unmapped_area(), call default
current->mm->get_unmapped_area on MMU-present architectures if
pde->proc_fops->get_unmapped_area, i.e. the one in actual file
operation in the procfs file, is not defined.Reported-by: Michael Holzheu
Signed-off-by: HATAYAMA Daisuke
Cc: Alexey Dobriyan
Cc: David S. Miller
Tested-by: Michael Holzheu
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Currently, proc_reg_get_unmapped_area truncates upper 32-bit of the
mapped virtual address returned from get_unmapped_area method in
pde->proc_fops due to the variable rv of signed integer on x86_64. This
is too small to have vitual address of unsigned long on x86_64 since on
x86_64, signed integer is of 4 bytes while unsigned long is of 8 bytes.
To fix this issue, use unsigned long instead.Fixes a regression added in commit c4fe24485729 ("sparc: fix PCI device
proc file mmap(2)").Signed-off-by: HATAYAMA Daisuke
Cc: Alexey Dobriyan
Cc: David S. Miller
Tested-by: Michael Holzheu
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
If a page we are inspecting is in swap we may occasionally report it as
having soft dirty bit (even if it is clean). The pte_soft_dirty helper
should be called on present pte only.Signed-off-by: Cyrill Gorcunov
Cc: Pavel Emelyanov
Cc: Andy Lutomirski
Cc: Matt Mackall
Cc: Xiao Guangrong
Cc: Marcelo Tosatti
Cc: KOSAKI Motohiro
Cc: Stephen Rothwell
Cc: Peter Zijlstra
Cc: "Aneesh Kumar K.V"
Reviewed-by: Naoya Horiguchi
Cc: Mel Gorman
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
10 Oct, 2013
1 commit
-
HAVE_ARCH_DEVTREE_FIXUPS appears to always be needed except for sparc,
but it is only used for /proc/device-teee and sparc does not enable
/proc/device-tree. So this option is redundant. Remove the option and
always enable it. This has the side effect of fixing /proc/device-tree
on arches such as arm64 which failed to define this option.Signed-off-by: Rob Herring
Acked-by: Vineet Gupta
Acked-by: Grant Likely
Cc: Russell King
Cc: James Hogan
Cc: Michal Simek
Cc: Jonas Bonn
Cc: Benjamin Herrenschmidt
Cc: Paul Mackerras
Cc: Thomas Gleixner
Cc: Ingo Molnar
Cc: "H. Peter Anvin"
Cc: x86@kernel.org
Cc: Chris Zankel
Cc: Max Filippov
09 Oct, 2013
1 commit
-
It is desirable to model from userspace how the scheduler groups tasks
over time. This patch adds an ID to the numa_group and reports it via
/proc/PID/status.Signed-off-by: Mel Gorman
Reviewed-by: Rik van Riel
Cc: Andrea Arcangeli
Cc: Johannes Weiner
Cc: Srikar Dronamraju
Signed-off-by: Peter Zijlstra
Link: http://lkml.kernel.org/r/1381141781-10992-45-git-send-email-mgorman@suse.de
Signed-off-by: Ingo Molnar
13 Sep, 2013
1 commit
-
We use NR_ANON_PAGES as base for reporting AnonPages to user. There's
not much sense in not accounting transparent huge pages there, but add
them on printing to user.Let's account transparent huge pages in NR_ANON_PAGES in the first place.
Signed-off-by: Kirill A. Shutemov
Acked-by: Dave Hansen
Cc: Andrea Arcangeli
Cc: Al Viro
Cc: Hugh Dickins
Cc: Wu Fengguang
Cc: Jan Kara
Cc: Mel Gorman
Cc: Andi Kleen
Cc: Matthew Wilcox
Cc: Hillf Danton
Cc: Ning Qu
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
12 Sep, 2013
6 commits
-
The patch "s390/vmcore: Implement remap_oldmem_pfn_range for s390" allows
now to use mmap also on s390.So enable mmap for s390 again.
Signed-off-by: Michael Holzheu
Cc: HATAYAMA Daisuke
Cc: Jan Willeke
Cc: Vivek Goyal
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
For zfcpdump we can't map the HSA storage because it is only available via
a read interface. Therefore, for the new vmcore mmap feature we have
introduce a new mechanism to create mappings on demand.This patch introduces a new architecture function remap_oldmem_pfn_range()
that should be used to create mappings with remap_pfn_range() for oldmem
areas that can be directly mapped. For zfcpdump this is everything
besides of the HSA memory. For the areas that are not mapped by
remap_oldmem_pfn_range() a generic vmcore a new generic vmcore fault
handler mmap_vmcore_fault() is called.This handler works as follows:
* Get already available or new page from page cache (find_or_create_page)
* Check if /proc/vmcore page is filled with data (PageUptodate)
* If yes:
Return that page
* If no:
Fill page using __vmcore_read(), set PageUptodate, and return pageSigned-off-by: Michael Holzheu
Acked-by: Vivek Goyal
Cc: HATAYAMA Daisuke
Cc: Jan Willeke
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
For s390 we want to use /proc/vmcore for our SCSI stand-alone dump
(zfcpdump). We have support where the first HSA_SIZE bytes are saved into
a hypervisor owned memory area (HSA) before the kdump kernel is booted.
When the kdump kernel starts, it is restricted to use only HSA_SIZE bytes.The advantages of this mechanism are:
* No crashkernel memory has to be defined in the old kernel.
* Early boot problems (before kexec_load has been done) can be dumped
* Non-Linux systems can be dumped.We modify the s390 copy_oldmem_page() function to read from the HSA memory
if memory below HSA_SIZE bytes is requested.Since we cannot use the kexec tool to load the kernel in this scenario,
we have to build the ELF header in the 2nd (kdump/new) kernel.So with the following patch set we would like to introduce the new
function that the ELF header for /proc/vmcore can be created in the 2nd
kernel memory.The following steps are done during zfcpdump execution:
1. Production system crashes
2. User boots a SCSI disk that has been prepared with the zfcpdump tool
3. Hypervisor saves CPU state of boot CPU and HSA_SIZE bytes of memory into HSA
4. Boot loader loads kernel into low memory area
5. Kernel boots and uses only HSA_SIZE bytes of memory
6. Kernel saves registers of non-boot CPUs
7. Kernel does memory detection for dump memory map
8. Kernel creates ELF header for /proc/vmcore
9. /proc/vmcore uses this header for initialization
10. The zfcpdump user space reads /proc/vmcore to write dump to SCSI disk
- copy_oldmem_page() copies from HSA for memory below HSA_SIZE
- copy_oldmem_page() copies from real memory for memory above HSA_SIZECurrently for s390 we create the ELF core header in the 2nd kernel with a
small trick. We relocate the addresses in the ELF header in a way that
for the /proc/vmcore code it seems to be in the 1st kernel (old) memory
and the read_from_oldmem() returns the correct data. This allows the
/proc/vmcore code to use the ELF header in the 2nd kernel.This patch:
Exchange the old mechanism with the new and much cleaner function call
override feature that now offcially allows to create the ELF core header
in the 2nd kernel.To use the new feature the following function have to be defined
by the architecture backend code to read from new memory:* elfcorehdr_alloc: Allocate ELF header
* elfcorehdr_free: Free the memory of the ELF header
* elfcorehdr_read: Read from ELF header
* elfcorehdr_read_notes: Read from ELF notesSigned-off-by: Michael Holzheu
Acked-by: Vivek Goyal
Cc: HATAYAMA Daisuke
Cc: Jan Willeke
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
proc_fd_permission() says "process can still access /proc/self/fd after it
has executed a setuid()", but the "task_pid() = proc_pid() check only
helps if the task is group leader, /proc/self points to
/proc/.Change this check to use task_tgid() so that the whole thread group can
access its /proc/self/fd or /proc//fd.Notes:
- CLONE_THREAD does not require CLONE_FILES so task->files
can differ, but I don't think this can lead to any security
problem. And this matches same_thread_group() in
__ptrace_may_access().- /proc/self should probably point to /proc/, but
it is too late to change the rules. Perhaps it makes sense
to add /proc/thread though.Test-case:
void *tfunc(void *arg)
{
assert(opendir("/proc/self/fd"));
return NULL;
}int main(void)
{
pthread_t t;
pthread_create(&t, NULL, tfunc, NULL);
pthread_join(t, NULL);
return 0;
}fails if, say, this executable is not readable and suid_dumpable = 0.
Signed-off-by: Oleg Nesterov
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
mpol_to_str() may fail, and not fill the buffer (e.g. -EINVAL), so need
check about it, or buffer may not be zero based, and next seq_printf()
will cause issue.The failure return need after mpol_cond_put() to match get_vma_policy().
Signed-off-by: Chen Gang
Cc: Cyrill Gorcunov
Cc: Mel Gorman
Cc: Andi Kleen
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Pavel reported that in case if vma area get unmapped and then mapped (or
expanded) in-place, the soft dirty tracker won't be able to recognize this
situation since it works on pte level and ptes are get zapped on unmap,
loosing soft dirty bit of course.So to resolve this situation we need to track actions on vma level, there
VM_SOFTDIRTY flag comes in. When new vma area created (or old expanded)
we set this bit, and keep it here until application calls for clearing
soft dirty bit.Thus when user space application track memory changes now it can detect if
vma area is renewed.Reported-by: Pavel Emelyanov
Signed-off-by: Cyrill Gorcunov
Cc: Andy Lutomirski
Cc: Matt Mackall
Cc: Xiao Guangrong
Cc: Marcelo Tosatti
Cc: KOSAKI Motohiro
Cc: Stephen Rothwell
Cc: Peter Zijlstra
Cc: "Aneesh Kumar K.V"
Cc: Rob Landley
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
08 Sep, 2013
1 commit
-
Pull namespace changes from Eric Biederman:
"This is an assorted mishmash of small cleanups, enhancements and bug
fixes.The major theme is user namespace mount restrictions. nsown_capable
is killed as it encourages not thinking about details that need to be
considered. A very hard to hit pid namespace exiting bug was finally
tracked and fixed. A couple of cleanups to the basic namespace
infrastructure.Finally there is an enhancement that makes per user namespace
capabilities usable as capabilities, and an enhancement that allows
the per userns root to nice other processes in the user namespace"* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
userns: Kill nsown_capable it makes the wrong thing easy
capabilities: allow nice if we are privileged
pidns: Don't have unshare(CLONE_NEWPID) imply CLONE_THREAD
userns: Allow PR_CAPBSET_DROP in a user namespace.
namespaces: Simplify copy_namespaces so it is clear what is going on.
pidns: Fix hang in zap_pid_ns_processes by sending a potentially extra wakeup
sysfs: Restrict mounting sysfs
userns: Better restrictions on when proc and sysfs can be mounted
vfs: Don't copy mount bind mounts of /proc//ns/mnt between namespaces
kernel/nsproxy.c: Improving a snippet of code.
proc: Restrict mounting the proc filesystem
vfs: Lock in place mounts from more privileged users
06 Sep, 2013
2 commits
-
Pull sparc changes from David Miller:
"Several bug fixes (from Kirill Tkhai, Geery Uytterhoeven, and Alexey
Dobriyan) and some support for Fujitsu sparc64x chips (from Allen
Pais)"* git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc:
sparc64: Export flush_ptrace_access() (needed by lustre)
sparc: fix PCI device proc file mmap(2)
sparc64: Remove RWSEM export leftovers
sparc64: Fix off by one in trampoline TLB mapping installation loop.
sparc64: Fix ITLB handler of null page
esp_scsi: Fix tag state corruption when autosensing.
sparc64: Fix not SRA'ed %o5 in 32-bit traced syscall
sparc64: cleanup: Rename ret_from_syscall to ret_from_fork
sparc32: Fix exit flag passed from traced sys_sigreturn
sparc64: Fix wrong syscall return value passed to trace_sys_exit()
support sparc64x chip type in cpumap.c
cpu hw caps support for sparc64x -
Commit 786d7e1612f0b0adb6046f19b906609e4fe8b1ba "Fix rmmod/read/write races in /proc entries"
must have broken mmapping of PCI device proc files on Sparc.Notice how it adds wrapper around ->mmap but doesn't do it around ->get_unmapped_area.
Add wrapper around ->get_unmapped_area.Signed-off-by: Alexey Dobriyan
Signed-off-by: David S. Miller
27 Aug, 2013
2 commits
-
Rely on the fact that another flavor of the filesystem is already
mounted and do not rely on state in the user namespace.Verify that the mounted filesystem is not covered in any significant
way. I would love to verify that the previously mounted filesystem
has no mounts on top but there are at least the directories
/proc/sys/fs/binfmt_misc and /sys/fs/cgroup/ that exist explicitly
for other filesystems to mount on top of.Refactor the test into a function named fs_fully_visible and call that
function from the mount routines of proc and sysfs. This makes this
test local to the filesystems involved and the results current of when
the mounts take place, removing a weird threading of the user
namespace, the mount namespace and the filesystems themselves.Signed-off-by: "Eric W. Biederman"
-
Don't allow mounting the proc filesystem unless the caller has
CAP_SYS_ADMIN rights over the pid namespace. The principle here is if
you create or have capabilities over it you can mount it, otherwise
you get to live with what other people have mounted.Andy pointed out that this is needed to prevent users in a user
namespace from remounting proc and specifying different hidepid and gid
options on already existing proc mounts.Cc: stable@vger.kernel.org
Reported-by: Andy Lutomirski
Signed-off-by: "Eric W. Biederman"
26 Aug, 2013
1 commit
-
Pull vfs fixes from Al Viro:
"Assorted fixes from the last week or so"* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
VFS: collect_mounts() should return an ERR_PTR
bfs: iget_locked() doesn't return an ERR_PTR
efs: iget_locked() doesn't return an ERR_PTR()
proc: kill the extra proc_readfd_common()->dir_emit_dots()
cope with potentially long ->d_dname() output for shmem/hugetlb
25 Aug, 2013
1 commit
-
proc_readfd_common() does dir_emit_dots() twice in a row,
we need to do this only once.Signed-off-by: Oleg Nesterov
Signed-off-by: Al Viro
20 Aug, 2013
2 commits
-
In the previous commit, Richard Genoud fixed proc_root_readdir(), which
had lost the check for whether all of the non-process /proc entries had
been returned or not.But that in turn exposed _another_ bug, namely that the original readdir
conversion patch had yet another problem: it had lost the return value
of proc_readdir_de(), so now checking whether it had completed
successfully or not didn't actually work right anyway.This reinstates the non-zero return for the "end of base entries" that
had also gotten lost in commit f0c3b5093add ("[readdir] convert
procfs"). So now you get all the base entries *and* you get all the
process entries, regardless of getdents buffer size.(Side note: the Linux "getdents" manual page actually has a nice example
application for testing getdents, which can be easily modified to use
different buffers. Who knew? Man-pages can be useful)Reported-by: Emmanuel Benisty
Reported-by: Marc Dionne
Cc: Richard Genoud
Cc: Al Viro
Signed-off-by: Linus Torvalds -
Commit f0c3b5093add ("[readdir] convert procfs") introduced a bug on the
listing of the proc file-system. The return value of proc_readdir()
isn't tested anymore in the proc_root_readdir function.This lead to an "interesting" behaviour when we are using the getdents()
system call with a buffer too small: instead of failing, it returns the
first entries of /proc (enough to fill the given buffer), plus the PID
directories.This is not triggered on glibc (as getdents is called with a 32KB
buffer), but on uclibc, the buffer size is only 1KB, thus some proc
entries are missing.See https://lkml.org/lkml/2013/8/12/288 for more background.
Signed-off-by: Richard Genoud
Cc: Al Viro
Cc: Andrew Morton
Signed-off-by: Linus Torvalds
14 Aug, 2013
3 commits
-
Recently we met quite a lot of random kernel panic issues after enabling
CONFIG_PROC_PAGE_MONITOR. After debuggind we found this has something
to do with following bug in pagemap:In struct pagemapread:
struct pagemapread {
int pos, len;
pagemap_entry_t *buffer;
bool v2;
};pos is number of PM_ENTRY_BYTES in buffer, but len is the size of
buffer, it is a mistake to compare pos and len in add_page_map() for
checking buffer is full or not, and this can lead to buffer overflow and
random kernel panic issue.Correct len to be total number of PM_ENTRY_BYTES in buffer.
[akpm@linux-foundation.org: document pagemapread.pos and .len units, fix PM_ENTRY_BYTES definition]
Signed-off-by: Yonghua Zheng
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Andy reported that if file page get reclaimed we lose the soft-dirty bit
if it was there, so save _PAGE_BIT_SOFT_DIRTY bit when page address get
encoded into pte entry. Thus when #pf happens on such non-present pte
we can restore it back.Reported-by: Andy Lutomirski
Signed-off-by: Cyrill Gorcunov
Acked-by: Pavel Emelyanov
Cc: Matt Mackall
Cc: Xiao Guangrong
Cc: Marcelo Tosatti
Cc: KOSAKI Motohiro
Cc: Stephen Rothwell
Cc: Peter Zijlstra
Cc: "Aneesh Kumar K.V"
Cc: Minchan Kim
Cc: Wanpeng Li
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Andy Lutomirski reported that if a page with _PAGE_SOFT_DIRTY bit set
get swapped out, the bit is getting lost and no longer available when
pte read back.To resolve this we introduce _PTE_SWP_SOFT_DIRTY bit which is saved in
pte entry for the page being swapped out. When such page is to be read
back from a swap cache we check for bit presence and if it's there we
clear it and restore the former _PAGE_SOFT_DIRTY bit back.One of the problem was to find a place in pte entry where we can save
the _PTE_SWP_SOFT_DIRTY bit while page is in swap. The _PAGE_PSE was
chosen for that, it doesn't intersect with swap entry format stored in
pte.Reported-by: Andy Lutomirski
Signed-off-by: Cyrill Gorcunov
Acked-by: Pavel Emelyanov
Cc: Matt Mackall
Cc: Xiao Guangrong
Cc: Marcelo Tosatti
Cc: KOSAKI Motohiro
Cc: Stephen Rothwell
Cc: Peter Zijlstra
Cc: "Aneesh Kumar K.V"
Reviewed-by: Minchan Kim
Reviewed-by: Wanpeng Li
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
18 Jul, 2013
1 commit
-
The kdump mmap patch series (git commit 83086978c63afd7c73e1c) directly
map the PT_LOADs to memory. On s390 this does not work because the
copy_from_oldmem() function swaps [0,crashkernel size] with
[crashkernel base, crashkernel base+crashkernel size]. The swap
int copy_from_oldmem() was done in order correctly implement /dev/oldmem.See: http://marc.info/?l=kexec&m=136940802511603&w=2
Signed-off-by: Michael Holzheu
Signed-off-by: Martin Schwidefsky