Eric Lee / smarc-fsl-linux-kernel

04 Jul, 2013

14 commits

30bc30df1 fs/proc/kcore.c: using strlcpy() instead of strncpy()

For NUL terminated string, set '\0' at the end.
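
The difference matters because strncpy() leaves the destination unterminated when the source is at least as long as the limit, while strlcpy() always writes a terminating NUL. A rough Python model of the two C functions, for illustration only (the helpers below imitate the C semantics on a bytearray; the buffer contents and sizes are made up):

```python
def strncpy(dst: bytearray, src: bytes, n: int) -> None:
    """Model of C strncpy(): copy up to n bytes, pad with NULs, no guaranteed terminator."""
    copied = src[:n]
    dst[:n] = copied + b"\0" * (n - len(copied))


def strlcpy(dst: bytearray, src: bytes, size: int) -> int:
    """Model of strlcpy(): copy at most size - 1 bytes and always terminate."""
    if size:
        copied = src[: size - 1]
        dst[: len(copied) + 1] = copied + b"\0"
    return len(src)  # strlcpy() reports the length of the source string


buf = bytearray(b"\xff" * 4)
strncpy(buf, b"abcdef", 4)
print(bytes(buf))  # no NUL anywhere in the 4-byte buffer
strlcpy(buf, b"abcdef", 4)
print(bytes(buf))  # last byte is NUL
```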

Signed-off-by: Zhao Hongjiang
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Zhao Hongjiang
2013-07-04 07:08:02 +0800
1d98a5fa1 fs/proc/uptime.c:uptime_proc_show(): use get_monotonic_boottime()

Change uptime_proc_show() to use get_monotonic_boottime() instead of
do_posix_clock_monotonic_gettime() + monotonic_to_bootbased().

Signed-off-by: Oleg Nesterov
Cc: "Eric W. Biederman"
Acked-by: John Stultz
Cc: Tomas Janousek
Cc: Tomas Smetana
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Oleg Nesterov
2013-07-04 07:08:02 +0800
83086978c vmcore: support mmap() on /proc/vmcore

This patch introduces mmap_vmcore().

Don't permit writable or executable mappings, even via mprotect(),
because this mmap() is aimed at reading crash dump memory. A non-writable
mapping is also a requirement of remap_pfn_range() when mapping linear
pages onto non-consecutive physical pages; see is_cow_mapping().

Set VM_MIXEDMAP flag to remap memory by remap_pfn_range and by
remap_vmalloc_range_partial at the same time for a single vma.
do_munmap() can correctly clean partially remapped vma with two
functions in abnormal case. See zap_pte_range(), vm_normal_page() and
their comments for details.

On x86-32 PAE kernels, mmap() supports at most 16TB memory only. This
limitation comes from the fact that the third argument of
remap_pfn_range(), pfn, is of 32-bit length on x86-32: unsigned long.
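
The 16TB figure follows from the width of the pfn argument. A quick check of the arithmetic, assuming 4 KiB pages (PAGE_SHIFT of 12, which the message does not state explicitly):

```python
# Assumption: 4 KiB pages, i.e. PAGE_SHIFT == 12, the usual x86 page size.
PAGE_SHIFT = 12
PFN_BITS = 32  # unsigned long is 32 bits wide on x86-32

max_pfns = 1 << PFN_BITS              # number of distinct page frame numbers
max_bytes = max_pfns << PAGE_SHIFT    # bytes addressable through a 32-bit pfn

print(max_bytes // (1 << 40), "TiB")
```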

[akpm@linux-foundation.org: use min(), switch to conventional error-unwinding approach]
Signed-off-by: HATAYAMA Daisuke
Acked-by: Vivek Goyal
Cc: KOSAKI Motohiro
Cc: Atsushi Kumagai
Cc: Lisa Mitchell
Cc: Zhang Yanfei
Tested-by: Maxim Uvarov
Cc: Arnd Bergmann
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

HATAYAMA Daisuke
2013-07-04 07:07:30 +0800
591ff7166 vmcore: calculate vmcore file size from buffer size and total size of vmcore objects

The previous patches newly added holes before each chunk of memory and
the holes need to be counted in vmcore file size. There are two ways to
count file size in such a way:

1) suppose m is a pointer to the last vmcore object in vmcore_list.
Then file size is (m->offset + m->size), or

2) calculate sum of size of buffers for ELF header, program headers,
ELF note segments and objects in vmcore_list.

Although 1) is more direct and simpler than 2), 2) seems better in that
it reflects internal object structure of /proc/vmcore. Thus, this patch
changes get_vmcore_size_elf{64, 32} so that it calculates size in the
way of 2).

As a result, both get_vmcore_size_elf{64, 32} have the same definition.
Merge them as get_vmcore_size.
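
The two ways of counting can be compared with a small model. This is only a sketch: VmcoreObject and the sample sizes are hypothetical, and it assumes the objects are laid out back to back right after the ELF header and ELF note buffers, as the series describes.

```python
from dataclasses import dataclass


@dataclass
class VmcoreObject:
    offset: int  # file offset of the object inside /proc/vmcore
    size: int    # page-aligned size of the object


def size_from_last_object(vmcore_list):
    """Way 1: offset plus size of the last object in the list."""
    m = vmcore_list[-1]
    return m.offset + m.size


def size_from_sum(elfcorebuf_sz, elfnotes_sz, vmcore_list):
    """Way 2: sum of the header buffer, the note buffer and every object."""
    return elfcorebuf_sz + elfnotes_sz + sum(m.size for m in vmcore_list)


# Made-up layout: 1 page of ELF headers, 2 pages of ELF notes, 2 memory chunks.
elfcorebuf_sz, elfnotes_sz = 0x1000, 0x2000
objs = [VmcoreObject(0x3000, 0x10000), VmcoreObject(0x13000, 0x3000)]

print(hex(size_from_last_object(objs)))
print(hex(size_from_sum(elfcorebuf_sz, elfnotes_sz, objs)))
```

Both functions agree as long as the offsets are consistent with the buffer sizes; way 2 simply makes that internal structure explicit.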

Signed-off-by: HATAYAMA Daisuke
Acked-by: Vivek Goyal
Cc: KOSAKI Motohiro
Cc: Atsushi Kumagai
Cc: Lisa Mitchell
Cc: Zhang Yanfei
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

HATAYAMA Daisuke
2013-07-04 07:07:30 +0800
ef9e78fd2 vmcore: allow user process to remap ELF note segment buffer

Now ELF note segment has been copied in the buffer on vmalloc memory.
To allow user process to remap the ELF note segment buffer with
remap_vmalloc_page, the corresponding VM area object has to have
VM_USERMAP flag set.

[akpm@linux-foundation.org: use the conventional comment layout]
Signed-off-by: HATAYAMA Daisuke
Acked-by: Vivek Goyal
Cc: KOSAKI Motohiro
Cc: Atsushi Kumagai
Cc: Lisa Mitchell
Cc: Zhang Yanfei
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

HATAYAMA Daisuke
2013-07-04 07:07:30 +0800
087350c9d vmcore: allocate ELF note segment in the 2nd kernel vmalloc memory

We don't allocate the ELF note segment in the 1st kernel (old memory) on
a page boundary for two reasons: to keep backward compatibility with old
kernels, and because doing so would waste a considerable amount of memory
on rounding each buffer up to a page boundary, since most of the buffers
are in the per-cpu area.

ELF notes are per-cpu, so total size of ELF note segments depends on
number of CPUs. The current maximum number of CPUs on x86_64 is 5192,
and there's already system with 4192 CPUs in SGI, where total size
amounts to 1MB. This can be larger in the near future or possibly even
now on another architecture that has larger size of note per a single
cpu. Thus, to avoid the case where memory allocation for large block
fails, we allocate vmcore objects on vmalloc memory.

This patch adds elfnotes_buf and elfnotes_sz variables to keep pointer
to the ELF note segment buffer and its size. There's no longer the
vmcore object that corresponds to the ELF note segment in vmcore_list.
Accordingly, read_vmcore() has new case for ELF note segment and
set_vmcore_list_offsets_elf{64,32}() and other helper functions start
calculating offset from sum of size of ELF headers and size of ELF note
segment.

[akpm@linux-foundation.org: use min(), fix error-path vzalloc() leaks]
Signed-off-by: HATAYAMA Daisuke
Acked-by: Vivek Goyal
Cc: KOSAKI Motohiro
Cc: Atsushi Kumagai
Cc: Lisa Mitchell
Cc: Zhang Yanfei
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

HATAYAMA Daisuke
2013-07-04 07:07:30 +0800
7f614cd1e vmcore: treat memory chunks referenced by PT_LOAD program header entries in page-size boundary in vmcore_list

Treat memory chunks referenced by PT_LOAD program header entries in
page-size boundary in vmcore_list. Formally, for each range [start,
end], we set up the corresponding vmcore object in vmcore_list to
[rounddown(start, PAGE_SIZE), roundup(end, PAGE_SIZE)].

This change affects layout of /proc/vmcore. The gaps generated by the
rearrangement are newly made visible to applications as holes.
Concretely, they are two ranges [rounddown(start, PAGE_SIZE), start] and
[end, roundup(end, PAGE_SIZE)].

Suppose variable m points at a vmcore object in vmcore_list, and
variable phdr points at the program header of PT_LOAD type the variable
m corresponds to. Then, pictorially:

  m->offset                    +---------------+
                               |     hole      |
phdr->p_offset =               +---------------+
  m->offset + (paddr - start)  |               |\
                               | kernel memory | phdr->p_memsz
                               |               |/
                               +---------------+
                               |     hole      |
  m->offset + m->size          +---------------+

where m->offset and m->offset + m->size are always page-size aligned.
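
The rounding and the resulting holes can be sketched as follows. The helper names follow the rounddown()/roundup() used in the message; PAGE_SIZE of 4096 and the sample addresses are assumptions for illustration.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages


def rounddown(x, align):
    """Largest multiple of align that is <= x."""
    return x - (x % align)


def roundup(x, align):
    """Smallest multiple of align that is >= x."""
    return rounddown(x + align - 1, align)


def vmcore_ranges(start, end):
    """Return the page-aligned object range and the two holes around [start, end]."""
    obj_start = rounddown(start, PAGE_SIZE)
    obj_end = roundup(end, PAGE_SIZE)
    return {
        "object": (obj_start, obj_end),
        "head_hole": (obj_start, start),
        "tail_hole": (end, obj_end),
    }


for name, (lo, hi) in vmcore_ranges(0x1234, 0x5678).items():
    print(f"{name}: [{lo:#x}, {hi:#x}]")
```

When start and end are already page aligned, both holes collapse to empty ranges.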

Signed-off-by: HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com>
Acked-by: Vivek Goyal <vgoyal@redhat.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Atsushi Kumagai <kumagai-atsushi@mxc.nes.nec.co.jp>
Cc: Lisa Mitchell <lisa.mitchell@hp.com>
Cc: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

HATAYAMA Daisuke
2013-07-04 07:07:30 +0800
f2bdacdd5 vmcore: allocate buffer for ELF headers on page-size alignment

Allocate ELF headers on page-size boundary using __get_free_pages()
instead of kmalloc().

Later patch will merge PT_NOTE entries into a single unique one and
decrease the buffer size actually used. Keep original buffer size in
variable elfcorebuf_sz_orig to kfree the buffer later and actually used
buffer size with rounded up to page-size boundary in variable
elfcorebuf_sz separately.

The size of part of the ELF buffer exported from /proc/vmcore is
elfcorebuf_sz.

The merged, removed PT_NOTE entries, i.e. the range [elfcorebuf_sz,
elfcorebuf_sz_orig], is filled with 0.

Use size of the ELF headers as an initial offset value in
set_vmcore_list_offsets_elf{64,32} and
process_ptload_program_headers_elf{64,32} in order to indicate that the
offset includes the holes towards the page boundary.

As a result, both set_vmcore_list_offsets_elf{64,32} have the same
definition. Merge them as set_vmcore_list_offsets.

[akpm@linux-foundation.org: add free_elfcorebuf(), cleanups]
Signed-off-by: HATAYAMA Daisuke
Acked-by: Vivek Goyal
Cc: KOSAKI Motohiro
Cc: Atsushi Kumagai
Cc: Lisa Mitchell
Cc: Zhang Yanfei
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

HATAYAMA Daisuke
2013-07-04 07:07:30 +0800
b27eb1866 vmcore: clean up read_vmcore()

Rewrite the part of read_vmcore() that reads objects in vmcore_list in the
same way as the part reading ELF headers, by which some duplicated and
redundant code is removed.

Signed-off-by: HATAYAMA Daisuke
Acked-by: Vivek Goyal
Cc: KOSAKI Motohiro
Cc: Atsushi Kumagai
Cc: Lisa Mitchell
Cc: Zhang Yanfei
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

HATAYAMA Daisuke
2013-07-04 07:07:30 +0800
541c237c0 pagemap: prepare to reuse constant bits with page-shift

In order to reuse bits from pagemap entries gracefully, we leave the
entries as is but on pagemap open emit a warning in dmesg, that bits
55-60 are about to change in a couple of releases. Next, if a user
issues soft-dirty clear command via the clear_refs file (it was disabled
before v3.9) we assume that he's aware of the new pagemap format, note
that fact and report the bits in pagemap in the new manner.

The "migration strategy" looks like this then:

1. existing users are not affected -- they don't touch soft-dirty feature, thus
see old bits in pagemap, but are warned and have time to fix themselves
2. those who use soft-dirty know about new pagemap format
3. some time soon we get rid of any signs of page-shift in pagemap as well as
this trick with clear-soft-dirty affecting pagemap format.

Signed-off-by: Pavel Emelyanov
Cc: Matt Mackall
Cc: Xiao Guangrong
Cc: Glauber Costa
Cc: Marcelo Tosatti
Cc: KOSAKI Motohiro
Cc: Stephen Rothwell
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Pavel Emelyanov
2013-07-04 07:07:26 +0800
0f8975ec4 mm: soft-dirty bits for user memory changes tracking

The soft-dirty is a bit on a PTE which helps to track which pages a task
writes to. In order to do this tracking one should

1. Clear soft-dirty bits from PTEs ("echo 4 > /proc/PID/clear_refs")
2. Wait some time.
3. Read soft-dirty bits (55'th in /proc/PID/pagemap2 entries)

To do this tracking, the writable bit is cleared from PTEs when the
soft-dirty bit is. Thus, after this, when the task tries to modify a
page at some virtual address the #PF occurs and the kernel sets the
soft-dirty bit on the respective PTE.

Note, that although all the task's address space is marked as r/o after
the soft-dirty bits clear, the #PF-s that occur after that are processed
fast. This is so, since the pages are still mapped to physical memory,
and thus all the kernel does is finds this fact out and puts back
writable, dirty and soft-dirty bits on the PTE.

Another thing to note, is that when mremap moves PTEs they are marked
with soft-dirty as well, since from the user perspective mremap modifies
the virtual memory at mremap's new address.
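
Step 3 of the procedure amounts to testing one bit in a 64-bit entry. A minimal sketch of the decoding, assuming 8-byte entries in native byte order, one entry per virtual page, and 4 KiB pages (the helper names are hypothetical and the file itself is not opened here):

```python
import struct

SOFT_DIRTY_BIT = 55      # bit position named in the commit message
PAGEMAP_ENTRY_SIZE = 8   # assumption: one 64-bit entry per virtual page


def pagemap_offset(vaddr, page_size=4096):
    """File offset of the entry describing the page that contains vaddr."""
    return (vaddr // page_size) * PAGEMAP_ENTRY_SIZE


def is_soft_dirty(raw_entry: bytes) -> bool:
    """Decode one raw 8-byte entry and test the soft-dirty bit."""
    (entry,) = struct.unpack("=Q", raw_entry)
    return bool((entry >> SOFT_DIRTY_BIT) & 1)


# Synthetic entries, packed the same way they would be read from the file.
dirty = struct.pack("=Q", 1 << SOFT_DIRTY_BIT)
clean = struct.pack("=Q", 0)
print(is_soft_dirty(dirty), is_soft_dirty(clean))
```

A tracker would seek to pagemap_offset(vaddr) in the per-process pagemap file, read 8 bytes, and pass them to is_soft_dirty() after clearing the bits through clear_refs and waiting.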

Signed-off-by: Pavel Emelyanov
Cc: Matt Mackall
Cc: Xiao Guangrong
Cc: Glauber Costa
Cc: Marcelo Tosatti
Cc: KOSAKI Motohiro
Cc: Stephen Rothwell
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Pavel Emelyanov
2013-07-04 07:07:26 +0800
2b0a9f017 pagemap: introduce pagemap_entry_t without pmshift bits

These bits are always constant (== PAGE_SHIFT) and just occupy space in
the entry. Moreover, in next patch we will need to report one more bit
in the pagemap, but all bits are already busy on it.

That said, describe the pagemap entry that has 6 more free zero bits.

Signed-off-by: Pavel Emelyanov
Cc: Matt Mackall
Cc: Xiao Guangrong
Cc: Glauber Costa
Cc: Marcelo Tosatti
Cc: KOSAKI Motohiro
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Pavel Emelyanov
2013-07-04 07:07:25 +0800
af9de7eb1 clear_refs: introduce private struct for mm_walk

In the next patch the clear-refs-type will be required in
clear_refs_pte_range function, so prepare the walk->private to carry
this info.

Signed-off-by: Pavel Emelyanov
Cc: Matt Mackall
Cc: Xiao Guangrong
Cc: Glauber Costa
Cc: Marcelo Tosatti
Cc: KOSAKI Motohiro
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Pavel Emelyanov
2013-07-04 07:07:25 +0800
040fa0207 clear_refs: sanitize accepted commands declaration

This is the implementation of the soft-dirty bit concept that should
help keep track of changes in user memory, which in turn is very-very
required by the checkpoint-restore project (http://criu.org).

To create a dump of an application(s) we save all the information about
it to files, and the biggest part of such dump is the contents of tasks'
memory. However, there are usage scenarios where it's not required to
get _all_ the task memory while creating a dump. For example, when
doing periodical dumps, it's only required to take a full memory dump
at the first step and then take incremental changes of memory. Another
example is live migration. We copy all the memory to the destination
node without stopping all tasks, then stop them, check which pages
have changed, dump them and the rest of the state, then copy it to the
destination node. This decreases freeze time significantly.

That said, some help from kernel to watch how processes modify the
contents of their memory is required.

The proposal is to track changes with the help of new soft-dirty bit
this way:

1. First do "echo 4 > /proc/$pid/clear_refs".
At that point kernel clears the soft dirty _and_ the writable bits from all
ptes of process $pid. From now on every write to any page will result in #pf
and the subsequent call to pte_mkdirty/pmd_mkdirty, which in turn will set
the soft dirty flag.

2. Then read the /proc/$pid/pagemap2 and check the soft-dirty bit reported there
(the 55'th one). If set, the respective pte was written to since last call
to clear refs.

The soft-dirty bit is the _PAGE_BIT_HIDDEN one. Although it's used by
kmemcheck, the latter one marks kernel pages with it, while the former
bit is put on user pages so they do not conflict with each other.

This patch:

A new clear-refs type will be added in the next patch, so prepare
code for that.

[akpm@linux-foundation.org: don't assume that sizeof(enum clear_refs_types) == sizeof(int)]
Signed-off-by: Pavel Emelyanov
Cc: Matt Mackall
Cc: Xiao Guangrong
Cc: Glauber Costa
Cc: Marcelo Tosatti
Cc: KOSAKI Motohiro
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Pavel Emelyanov
2013-07-04 07:07:25 +0800

29 Jun, 2013

6 commits

da53be12b Don't pass inode to ->d_hash() and ->d_compare()

Instances either don't look at it at all (the majority of cases) or
only want it to find the superblock (which can be had as dentry->d_sb).
A few cases that want more are actually safe with dentry->d_inode -
the only precaution needed is the check that it hadn't been replaced with
NULL by rmdir() or by overwriting rename(), which case should be simply
treated as cache miss.

Signed-off-by: Linus Torvalds
Signed-off-by: Al Viro

Linus Torvalds
2013-06-29 16:57:36 +0800
1df98b8bb proc_fill_cache(): clean up, get rid of pointless find_inode_number() use

Signed-off-by: Al Viro

Al Viro
2013-06-29 16:57:19 +0800
c52a47ace proc_fill_cache(): just make instantiate_t return int

all instances always return ERR_PTR(-E...) or NULL, anyway

Signed-off-by: Al Viro

Al Viro
2013-06-29 16:57:18 +0800
db9631648 proc_pid_readdir(): stop wanking with proc_fill_cache() for /proc/self

Signed-off-by: Al Viro

Al Viro
2013-06-29 16:57:17 +0800
147ce6997 proc_fill_cache(): kill pointless check

we'd just checked that child->d_inode is non-NULL, for fuck sake!

Signed-off-by: Al Viro

Al Viro
2013-06-29 16:57:16 +0800
f0c3b5093 [readdir] convert procfs

Signed-off-by: Al Viro

Al Viro
2013-06-29 16:56:32 +0800

13 Jun, 2013

1 commit

637241a90 kmsg: honor dmesg_restrict sysctl on /dev/kmsg

The dmesg_restrict sysctl currently covers the syslog method for
accessing dmesg, however /dev/kmsg isn't covered by the same protections.
Most people haven't noticed because older versions of util-linux dmesg(1)
default to using the syslog method for access. Newer versions of
util-linux dmesg(1) default to reading directly from /dev/kmsg.

To fix /dev/kmsg, let's compare the existing interfaces and what they
allow:

- /proc/kmsg allows:
- open (SYSLOG_ACTION_OPEN) if CAP_SYSLOG since it uses a destructive
single-reader interface (SYSLOG_ACTION_READ).
- everything, after an open.

- syslog syscall allows:
- anything, if CAP_SYSLOG.
- SYSLOG_ACTION_READ_ALL and SYSLOG_ACTION_SIZE_BUFFER, if
dmesg_restrict==0.
- nothing else (EPERM).

The use-cases were:
- dmesg(1) needs to do non-destructive SYSLOG_ACTION_READ_ALLs.
- sysklog(1) needs to open /proc/kmsg, drop privs, and still issue the
destructive SYSLOG_ACTION_READs.

AIUI, dmesg(1) is moving to /dev/kmsg, and systemd-journald doesn't
clear the ring buffer.

Based on the comments in devkmsg_llseek, it sounds like actions besides
reading aren't going to be supported by /dev/kmsg (i.e.
SYSLOG_ACTION_CLEAR), so we have a strict subset of the non-destructive
syslog syscall actions.

To this end, move the check as Josh had done, but also rename the
constants to reflect their new uses (SYSLOG_FROM_CALL becomes
SYSLOG_FROM_READER, and SYSLOG_FROM_FILE becomes SYSLOG_FROM_PROC).
SYSLOG_FROM_READER allows non-destructive actions, and SYSLOG_FROM_PROC
allows destructive actions after a capabilities-constrained
SYSLOG_ACTION_OPEN check.

- /dev/kmsg allows:
- open if CAP_SYSLOG or dmesg_restrict==0
- reading/polling, after open
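
The access rules listed above can be restated as two small predicates. This is a model of the policy as described in this message, not the kernel code; the function names are hypothetical and actions are passed as plain strings.

```python
# Actions an unprivileged caller may issue when dmesg_restrict == 0.
NON_DESTRUCTIVE = {"SYSLOG_ACTION_READ_ALL", "SYSLOG_ACTION_SIZE_BUFFER"}


def syslog_call_allowed(action, has_cap_syslog, dmesg_restrict):
    """syslog syscall: anything with CAP_SYSLOG, else only non-destructive reads."""
    if has_cap_syslog:
        return True
    if action in NON_DESTRUCTIVE:
        return dmesg_restrict == 0
    return False  # everything else fails with EPERM


def devkmsg_open_allowed(has_cap_syslog, dmesg_restrict):
    """/dev/kmsg: open needs CAP_SYSLOG or dmesg_restrict == 0."""
    return has_cap_syslog or dmesg_restrict == 0


print(devkmsg_open_allowed(False, 1))                           # unprivileged, restricted
print(syslog_call_allowed("SYSLOG_ACTION_READ_ALL", False, 0))  # unprivileged, unrestricted
```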

Addresses https://bugzilla.redhat.com/show_bug.cgi?id=903192

[akpm@linux-foundation.org: use pr_warn_once()]
Signed-off-by: Kees Cook
Reported-by: Christian Kujau
Tested-by: Josh Boyer
Cc: Kay Sievers
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Kees Cook
2013-06-13 07:29:44 +0800

28 May, 2013

1 commit

15ef0298d posix-timers: Show clock ID in proc file

Expand information about posix-timers in /proc/<pid>/timers by adding
info about clock, with which the timer was created. I.e. in the fourth
line of timer info after "notify:" line go "ClockID: <clock_id>".
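
A reader of the file could pick up the new field with a simple line parser. The sketch below assumes each timer record starts with an "ID:" line and consists of "key: value" lines; the sample text is made up for illustration and the parse_timers() helper is hypothetical.

```python
# Illustrative sample only; values are invented, layout is an assumption.
SAMPLE = """\
ID: 0
signal: 60/00007fff86e452a8
notify: signal/pid.2634
ClockID: 0
ID: 1
signal: 14/0000000000000000
notify: signal/pid.2634
ClockID: 1
"""


def parse_timers(text):
    """Split the text into one dict per timer, keyed by the field names."""
    timers = []
    for line in text.splitlines():
        key, _, value = line.partition(":")
        key, value = key.strip(), value.strip()
        if key == "ID":
            timers.append({})  # an "ID:" line opens a new timer record
        if timers and key:
            timers[-1][key] = value
    return timers


for timer in parse_timers(SAMPLE):
    print(timer["ID"], timer["ClockID"])
```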

Signed-off-by: Pavel Tikhomirov
Cc: Michael Kerrisk
Cc: Matthew Helsley
Cc: Pavel Emelyanov
Link: http://lkml.kernel.org/r/1368742323-46949-2-git-send-email-snorcht@gmail.com
Signed-off-by: Thomas Gleixner

Pavel Tikhomirov
2013-05-28 17:41:14 +0800

07 May, 2013

2 commits

0f47c9423 Merge branch 'slab/for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/penberg/linux

Pull slab changes from Pekka Enberg:
"The bulk of the changes are more slab unification from Christoph.

There's also few fixes from Aaron, Glauber, and Joonsoo thrown into
the mix."

* 'slab/for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/penberg/linux: (24 commits)
mm, slab_common: Fix bootstrap creation of kmalloc caches
slab: Return NULL for oversized allocations
mm: slab: Verify the nodeid passed to ____cache_alloc_node
slub: tid must be retrieved from the percpu area of the current processor
slub: Do not dereference NULL pointer in node_match
slub: add 'likely' macro to inc_slabs_node()
slub: correct to calculate num of acquired objects in get_partial_node()
slub: correctly bootstrap boot caches
mm/sl[au]b: correct allocation type check in kmalloc_slab()
slab: Fixup CONFIG_PAGE_ALLOC/DEBUG_SLAB_LEAK sections
slab: Handle ARCH_DMA_MINALIGN correctly
slab: Common definition for kmem_cache_node
slab: Rename list3/l3 to node
slab: Common Kmalloc cache determination
stat: Use size_t for sizes instead of unsigned
slab: Common function to create the kmalloc array
slab: Common definition for the array of kmalloc caches
slab: Common constants for kmalloc boundaries
slab: Rename nodelists to node
slab: Common name for the per node structures
...

Linus Torvalds
2013-05-07 23:42:20 +0800
69df2ac12 Merge branch 'slab/next' into slab/for-linus

Pekka Enberg
2013-05-07 14:19:47 +0800

05 May, 2013

1 commit

75fc0cf6a proc_devtree: Replace include linux/module.h with linux/export.h

Since it uses only the THIS_MODULE macro, including <linux/export.h>
is the right way to go here.

Signed-off-by: Syam Sidhardhan
Signed-off-by: Al Viro

Syam Sidhardhan
2013-05-05 03:31:01 +0800

02 May, 2013

13 commits

20b4fb485 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs

Pull VFS updates from Al Viro,

Misc cleanups all over the place, mainly wrt /proc interfaces (switch
create_proc_entry to proc_create(), get rid of the deprecated
create_proc_read_entry() in favor of using proc_create_data() and
seq_file etc).

7kloc removed.

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (204 commits)
don't bother with deferred freeing of fdtables
proc: Move non-public stuff from linux/proc_fs.h to fs/proc/internal.h
proc: Make the PROC_I() and PDE() macros internal to procfs
proc: Supply a function to remove a proc entry by PDE
take cgroup_open() and cpuset_open() to fs/proc/base.c
ppc: Clean up scanlog
ppc: Clean up rtas_flash driver somewhat
hostap: proc: Use remove_proc_subtree()
drm: proc: Use remove_proc_subtree()
drm: proc: Use minor->index to label things, not PDE->name
drm: Constify drm_proc_list[]
zoran: Don't print proc_dir_entry data in debug
reiserfs: Don't access the proc_dir_entry in r_open(), r_start() r_show()
proc: Supply an accessor for getting the data from a PDE's parent
airo: Use remove_proc_subtree()
rtl8192u: Don't need to save device proc dir PDE
rtl8187se: Use a dir under /proc/net/r8180/
proc: Add proc_mkdir_data()
proc: Move some bits from linux/proc_fs.h to linux/{of.h,signal.h,tty.h}
proc: Move PDE_NET() to fs/proc/proc_net.c
...

Linus Torvalds
2013-05-02 08:51:54 +0800
59d8053f1 proc: Move non-public stuff from linux/proc_fs.h to fs/proc/internal.h

Move non-public declarations and definitions from linux/proc_fs.h to
fs/proc/internal.h.

Signed-off-by: David Howells
Signed-off-by: Al Viro

David Howells
2013-05-02 05:29:47 +0800
c30480b92 proc: Make the PROC_I() and PDE() macros internal to procfs

Make the PROC_I() and PDE() macros internal to procfs. This means making
PDE_DATA() out of line. This could be made more optimal by storing
PDE()->data into inode->i_private.

Also provide a __PDE_DATA() that is inline and internal to procfs.

Signed-off-by: David Howells
Signed-off-by: Al Viro

David Howells
2013-05-02 05:29:47 +0800
a8ca16ea7 proc: Supply a function to remove a proc entry by PDE

Supply a function (proc_remove()) to remove a proc entry (and any subtree
rooted there) by proc_dir_entry pointer rather than by name and (optionally)
root dir entry pointer. This allows us to eliminate all remaining pde->name
accesses outside of procfs.

Signed-off-by: David Howells
Acked-by: Grant Likely
cc: linux-acpi@vger.kernel.org
cc: openipmi-developer@lists.sourceforge.net
cc: devicetree-discuss@lists.ozlabs.org
cc: linux-pci@vger.kernel.org
cc: netdev@vger.kernel.org
cc: netfilter-devel@vger.kernel.org
cc: alsa-devel@alsa-project.org
Signed-off-by: Al Viro

David Howells
2013-05-02 05:29:46 +0800
8d8b97ba4 take cgroup_open() and cpuset_open() to fs/proc/base.c

Signed-off-by: Al Viro

Al Viro
2013-05-02 05:29:46 +0800
4a520d276 proc: Supply an accessor for getting the data from a PDE's parent

Supply an accessor function for getting the private data from the parent
proc_dir_entry struct of the proc_dir_entry struct associated with an inode.

ReiserFS, for instance, stores the super_block pointer in the proc directory
it makes for that super_block, and a pointer to the respective seq_file show
function in each of the proc files in that directory.

This allows a reduction in the number of file_operations structs, open
functions and seq_operations structs required. The problem otherwise is that
each show function requires two pieces of data but only has storage for one
per PDE (and this has no release function).

Signed-off-by: David Howells
Acked-by: Mauro Carvalho Chehab
Acked-by: Greg Kroah-Hartman
cc: Jerry Chuang
cc: Maxim Mikityanskiy
cc: YAMANE Toshiaki
cc: linux-wireless@vger.kernel.org
cc: linux-scsi@vger.kernel.org
cc: devel@driverdev.osuosl.org
Signed-off-by: Al Viro

David Howells
2013-05-02 05:29:42 +0800
270b5ac21 proc: Add proc_mkdir_data()

Add proc_mkdir_data() to allow procfs directories to be created that are
annotated at the time of creation with private data rather than doing this
post-creation. This means no access is then required to the proc_dir_entry
struct to set this.

Signed-off-by: David Howells
Acked-by: Mauro Carvalho Chehab
Acked-by: Greg Kroah-Hartman
cc: Neela Syam Kolli
cc: Jerry Chuang
cc: linux-scsi@vger.kernel.org
cc: devel@driverdev.osuosl.org
cc: linux-wireless@vger.kernel.org
Signed-off-by: Al Viro

David Howells
2013-05-02 05:29:41 +0800
34db8aaf0 proc: Move some bits from linux/proc_fs.h to linux/{of.h,signal.h,tty.h}

Move some bits from linux/proc_fs.h to linux/of.h, signal.h and tty.h.

Also move proc_tty_init() and proc_device_tree_init() to fs/proc/internal.h as
they're internal to procfs.

Signed-off-by: David Howells
Acked-by: Greg Kroah-Hartman
Acked-by: Grant Likely
cc: devicetree-discuss@lists.ozlabs.org
cc: linux-arch@vger.kernel.org
cc: Greg Kroah-Hartman
cc: Jiri Slaby
Signed-off-by: Al Viro

David Howells
2013-05-02 05:29:40 +0800
4abfd0298 proc: Move PDE_NET() to fs/proc/proc_net.c

Move PDE_NET() to fs/proc/proc_net.c as that's where the only user is.

Signed-off-by: David Howells
Signed-off-by: Al Viro

David Howells
2013-05-02 05:29:40 +0800
0bb80f240 proc: Split the namespace stuff out into linux/proc_ns.h

Split the proc namespace stuff out into linux/proc_ns.h.

Signed-off-by: David Howells
cc: netdev@vger.kernel.org
cc: Serge E. Hallyn
cc: Eric W. Biederman
Signed-off-by: Al Viro

David Howells
2013-05-02 05:29:39 +0800
c3bef7bca proc: Move proc_fd() to fs/proc/fd.h

Move proc_fd() to fs/proc/fd.h.

Signed-off-by: David Howells
Signed-off-by: Al Viro

David Howells
2013-05-02 05:29:39 +0800
1dd704b61 proc: Uninline pid_delete_dentry()

Uninline pid_delete_dentry() as it's only used by three function pointers.

Signed-off-by: David Howells
Signed-off-by: Al Viro

David Howells
2013-05-02 05:29:39 +0800
271a15eab proc: Supply PDE attribute setting accessor functions

Supply accessor functions to set attributes in proc_dir_entry structs.

The following are supplied: proc_set_size() and proc_set_user().

Signed-off-by: David Howells
Acked-by: Mauro Carvalho Chehab
cc: linuxppc-dev@lists.ozlabs.org
cc: linux-media@vger.kernel.org
cc: netdev@vger.kernel.org
cc: linux-wireless@vger.kernel.org
cc: linux-pci@vger.kernel.org
cc: netfilter-devel@vger.kernel.org
cc: alsa-devel@alsa-project.org
Signed-off-by: Al Viro

David Howells
2013-05-02 05:29:18 +0800

01 May, 2013

1 commit

830e0fc96 fs, proc: truncate /proc/pid/comm writes to first TASK_COMM_LEN bytes

Currently, a write to a procfs file will return the number of bytes
successfully written. If the actual string is longer than this, the
remainder of the string will not be written and userspace will
complete the operation by issuing additional write()s.

Hence

$ echo -n "abcdefghijklmnopqrs" > /proc/self/comm

results in

$ cat /proc/$$/comm
pqrs

because the final four bytes were written with a second write(), since
TASK_COMM_LEN == 16. This is obviously an undesired result and not
equivalent to prctl(PR_SET_NAME). The implementation should not need to
know the definition of TASK_COMM_LEN.

This patch truncates the string to the first TASK_COMM_LEN bytes and
returns the bytes written as the length of the string written so the
second write() is suppressed.

$ cat /proc/$$/comm
abcdefghijklmno
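
The before/after behaviour can be reproduced with a small simulation. This is a sketch of the semantics described above, not the kernel code: the function names are hypothetical, and it assumes the stored name holds at most TASK_COMM_LEN - 1 characters plus a terminating NUL, which matches the 15-character output shown.

```python
TASK_COMM_LEN = 16  # value stated in the commit message


def comm_write_old(buf):
    """Pre-patch: keep at most 15 bytes and report only those as written."""
    taken = buf[: TASK_COMM_LEN - 1]
    return taken, len(taken)


def comm_write_new(buf):
    """Post-patch: keep at most 15 bytes but report the whole buffer as written."""
    return buf[: TASK_COMM_LEN - 1], len(buf)


def write_all(write_fn, data):
    """Mimic userspace reissuing write() until every byte has been accepted."""
    comm = ""
    while data:
        comm, written = write_fn(data)
        data = data[written:]
    return comm


print(write_all(comm_write_old, "abcdefghijklmnopqrs"))
print(write_all(comm_write_new, "abcdefghijklmnopqrs"))
```

With the old behaviour the second write() replaces the name with the leftover bytes; with the new one the loop ends after a single call.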

Signed-off-by: David Rientjes
Acked-by: John Stultz
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

David Rientjes
2013-05-01 08:04:07 +0800

30 Apr, 2013

1 commit

ab86e974f Merge branch 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull core timer updates from Ingo Molnar:
"The main changes in this cycle's merge are:

- Implement shadow timekeeper to shorten in kernel reader side
blocking, by Thomas Gleixner.

- Posix timers enhancements by Pavel Emelyanov:

- allocate timer ID per process, so that exact timer ID allocations
can be re-created by checkpoint/restore code.

- debuggability and tooling (/proc/PID/timers, etc.) improvements.

- suspend/resume enhancements by Feng Tang: on certain new Intel Atom
processors (Penwell and Cloverview), there is a feature that the
TSC won't stop in S3 state, so the TSC value won't be reset to 0
after resume. This can be taken advantage of by the generic via
the CLOCK_SOURCE_SUSPEND_NONSTOP flag: instead of using the RTC to
recover/approximate sleep time, the main (and precise) clocksource
can be used.

- Fix /proc/timer_list for 4096 CPUs by Nathan Zimmer: on so many
CPUs the file goes beyond 4MB of size and thus the current
simplistic seqfile approach fails. Convert /proc/timer_list to a
proper seq_file with its own iterator.

- Cleanups and refactorings of the core timekeeping code by John
Stultz.

- International Atomic Clock time is managed by the NTP code
internally currently but not exposed externally. Separate the TAI
code out and add CLOCK_TAI support and TAI support to the hrtimer
and posix-timer code, by John Stultz.

- Add deep idle support enhancement to the broadcast clockevents core
timer code, by Daniel Lezcano: add an opt-in CLOCK_EVT_FEAT_DYNIRQ
clockevents feature (which will be utilized by future clockevents
driver updates), which allows the use of IRQ affinities to avoid
spurious wakeups of idle CPUs - the right CPU with an expiring
timer will be woken.

- Add new ARM bcm281xx clocksource driver, by Christian Daudt

- ... various other fixes and cleanups"

* 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (52 commits)
clockevents: Set dummy handler on CPU_DEAD shutdown
timekeeping: Update tk->cycle_last in resume
posix-timers: Remove unused variable
clockevents: Switch into oneshot mode even if broadcast registered late
timer_list: Convert timer list to be a proper seq_file
timer_list: Split timer_list_show_tickdevices
posix-timers: Show sigevent info in proc file
posix-timers: Introduce /proc/PID/timers file
posix timers: Allocate timer id per process (v2)
timekeeping: Make sure to notify hrtimers when TAI offset changes
hrtimer: Fix ktime_add_ns() overflow on 32bit architectures
hrtimer: Add expiry time overflow check in hrtimer_interrupt
timekeeping: Shorten seq_count region
timekeeping: Implement a shadow timekeeper
timekeeping: Delay update of clock->cycle_last
timekeeping: Store cycle_last value in timekeeper struct as well
ntp: Remove ntp_lock, using the timekeeping locks to protect ntp state
timekeeping: Simplify tai updating from do_adjtimex
timekeeping: Hold timekeepering locks in do_adjtimex and hardpps
timekeeping: Move ADJ_SETOFFSET to top level do_adjtimex()
...

Linus Torvalds
2013-04-30 23:15:40 +0800