04 Jul, 2013

14 commits

  • For a NUL-terminated string, set '\0' at the end.

    Signed-off-by: Zhao Hongjiang
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Zhao Hongjiang
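    The commit does not say which buffer it fixes, so the following is only a
    generic C sketch of the pattern. It copies at most dstsz - 1 bytes and
    always stores the terminating '\0'. The helper name copy_terminated() is
    hypothetical, not a kernel function.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy up to n bytes of src into dst (capacity dstsz)
 * and always leave dst NUL terminated. Returns the number of bytes copied. */
size_t copy_terminated(char *dst, size_t dstsz, const char *src, size_t n)
{
    size_t len;

    if (dstsz == 0)
        return 0;                 /* no room even for the terminator */
    len = n < dstsz - 1 ? n : dstsz - 1;
    memcpy(dst, src, len);
    dst[len] = '\0';              /* the step the commit adds */
    return len;
}
```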
     
  • Change uptime_proc_show() to use get_monotonic_boottime() instead of
    do_posix_clock_monotonic_gettime() + monotonic_to_bootbased().

    Signed-off-by: Oleg Nesterov
    Cc: "Eric W. Biederman"
    Acked-by: John Stultz
    Cc: Tomas Janousek
    Cc: Tomas Smetana
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
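    get_monotonic_boottime() returns monotonic time that also counts time
    spent suspended. The userspace sketch below is an illustrative assumption,
    not the kernel code. The analogous clock there is CLOCK_BOOTTIME, and the
    formatting mirrors how the first field of /proc/uptime is printed: whole
    seconds, a dot, then hundredths of a second. Both format_uptime() and
    read_boottime() are hypothetical helpers.

```c
#include <stdio.h>
#include <time.h>

/* Format a boot-based timestamp like the first field of /proc/uptime:
 * seconds, then hundredths (nanoseconds / 10,000,000). Returns the
 * snprintf() result. */
int format_uptime(const struct timespec *ts, char *buf, size_t len)
{
    return snprintf(buf, len, "%lu.%02lu",
                    (unsigned long)ts->tv_sec,
                    (unsigned long)(ts->tv_nsec / 10000000L));
}

/* Userspace analogue of get_monotonic_boottime(): monotonic time that
 * includes suspend. Returns 0 on success, -1 if unavailable or on error. */
int read_boottime(struct timespec *ts)
{
#ifdef CLOCK_BOOTTIME
    return clock_gettime(CLOCK_BOOTTIME, ts);
#else
    (void)ts;
    return -1;                    /* clock id not exposed by this libc */
#endif
}
```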
     
  • This patch introduces mmap_vmcore().

    Don't permit writable or executable mappings, even via mprotect(),
    because this mmap() is meant only for reading crash dump memory. A
    non-writable mapping is also a requirement of remap_pfn_range() when
    mapping linear pages onto non-consecutive physical pages; see
    is_cow_mapping().

    Set the VM_MIXEDMAP flag so that a single vma can be remapped by both
    remap_pfn_range() and remap_vmalloc_range_partial() at the same time. In
    the error case, do_munmap() can correctly clean up a vma that was only
    partially remapped by the two functions. See zap_pte_range(),
    vm_normal_page() and their comments for details.

    On x86-32 PAE kernels, mmap() supports at most 16TB of memory. This
    limitation comes from the fact that the third argument of
    remap_pfn_range(), pfn, is an unsigned long and therefore only 32 bits
    wide on x86-32: 2^32 page frames of 4KB each cover 2^44 bytes, i.e. 16TB.

    [akpm@linux-foundation.org: use min(), switch to conventional error-unwinding approach]
    Signed-off-by: HATAYAMA Daisuke
    Acked-by: Vivek Goyal
    Cc: KOSAKI Motohiro
    Cc: Atsushi Kumagai
    Cc: Lisa Mitchell
    Cc: Zhang Yanfei
    Tested-by: Maxim Uvarov
    Cc: Arnd Bergmann
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    HATAYAMA Daisuke
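    From userspace, the only sensible way to use the new mmap_vmcore() is a
    read-only mapping. The sketch below assumes a descriptor already opened
    with O_RDONLY, for example on /proc/vmcore inside the kdump kernel.
    map_dump_readonly() is a hypothetical helper. It does not reproduce the
    kernel-side checks that refuse writable or executable mappings.

```c
#include <stddef.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Map a whole crash-dump file read-only. The patch refuses writable and
 * executable mappings of /proc/vmcore, so PROT_READ is the only protection
 * worth requesting. Returns MAP_FAILED on error; on success *len holds the
 * mapped length and the caller must munmap() it. */
void *map_dump_readonly(int fd, size_t *len)
{
    struct stat st;

    if (fstat(fd, &st) < 0 || st.st_size <= 0)
        return MAP_FAILED;
    *len = (size_t)st.st_size;
    return mmap(NULL, *len, PROT_READ, MAP_PRIVATE, fd, 0);
}
```

    Per the description above, a later mprotect() that adds write access is
    expected to fail on /proc/vmcore. On an ordinary file, as used in a quick
    local test, it would not.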
     
  • The previous patches added holes before each chunk of memory, and these
    holes need to be counted in the vmcore file size. There are two ways to
    compute a file size that includes them:

    1) suppose m is a pointer to the last vmcore object in vmcore_list.
    Then the file size is (m->offset + m->size), or

    2) calculate the sum of the sizes of the buffers for the ELF header,
    program headers, ELF note segments and objects in vmcore_list.

    Although 1) is more direct and simpler than 2), 2) seems better in that
    it reflects the internal object structure of /proc/vmcore. Thus, this
    patch changes get_vmcore_size_elf{64, 32} so that they calculate the
    size in the way of 2).

    As a result, both get_vmcore_size_elf{64, 32} have the same definition.
    Merge them as get_vmcore_size.

    Signed-off-by: HATAYAMA Daisuke
    Acked-by: Vivek Goyal
    Cc: KOSAKI Motohiro
    Cc: Atsushi Kumagai
    Cc: Lisa Mitchell
    Cc: Zhang Yanfei
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    HATAYAMA Daisuke
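    Here is a minimal sketch of the two ways of computing the file size. It
    assumes an array stands in for vmcore_list, and the hypothetical struct
    vmcore_obj stands in for the kernel's per-chunk object. When the objects
    are laid out back to back after the ELF headers and notes, both
    functions agree. Way 2) is the one the patch adopts.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one vmcore object on vmcore_list. */
struct vmcore_obj {
    uint64_t offset;   /* file offset of the chunk in /proc/vmcore */
    uint64_t size;     /* chunk size, including any hole padding */
};

/* Way 1): the end of the last object (assumes at least one object). */
uint64_t size_from_last(const struct vmcore_obj *objs, size_t n)
{
    return n ? objs[n - 1].offset + objs[n - 1].size : 0;
}

/* Way 2): ELF headers + ELF note segment + every object. This mirrors how
 * /proc/vmcore is assembled rather than where it happens to end. */
uint64_t size_from_sum(uint64_t elfsz, uint64_t elfnotesz,
                       const struct vmcore_obj *objs, size_t n)
{
    uint64_t size = elfsz + elfnotesz;

    for (size_t i = 0; i < n; i++)
        size += objs[i].size;
    return size;
}
```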
     
  • The ELF note segment is now copied into a buffer in vmalloc memory. To
    allow a user process to remap the ELF note segment buffer with
    remap_vmalloc_range(), the corresponding VM area object must have the
    VM_USERMAP flag set.

    [akpm@linux-foundation.org: use the conventional comment layout]
    Signed-off-by: HATAYAMA Daisuke
    Acked-by: Vivek Goyal
    Cc: KOSAKI Motohiro
    Cc: Atsushi Kumagai
    Cc: Lisa Mitchell
    Cc: Zhang Yanfei
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    HATAYAMA Daisuke
     
  • We don't allocate the ELF note segment in the 1st kernel (old memory) on
    a page boundary for two reasons. The first is backward compatibility
    with old kernels. The second is that doing so would waste a fair amount
    of memory on rounding up to the page boundary, since most of the buffers
    are in the per-cpu area.

    ELF notes are per-cpu, so the total size of the ELF note segments depends
    on the number of CPUs. The current maximum number of CPUs on x86_64 is
    5192, and SGI already has a system with 4192 CPUs, where the total size
    amounts to 1MB. This can be larger in the near future, or possibly even
    now on another architecture with a larger note size per CPU. Thus, to
    avoid the case where memory allocation for a large block fails, we
    allocate vmcore objects on vmalloc memory.

    This patch adds the elfnotes_buf and elfnotes_sz variables to keep a
    pointer to the ELF note segment buffer and its size. vmcore_list no
    longer contains a vmcore object for the ELF note segment. Accordingly,
    read_vmcore() has a new case for the ELF note segment. Also,
    set_vmcore_list_offsets_elf{64,32}() and other helper functions now
    calculate offsets starting from the sum of the ELF header size and the
    ELF note segment size.

    [akpm@linux-foundation.org: use min(), fix error-path vzalloc() leaks]
    Signed-off-by: HATAYAMA Daisuke
    Acked-by: Vivek Goyal
    Cc: KOSAKI Motohiro
    Cc: Atsushi Kumagai
    Cc: Lisa Mitchell
    Cc: Zhang Yanfei
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    HATAYAMA Daisuke
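    With the note segment kept in its own buffer, a file position in
    /proc/vmcore falls into one of three regions. The sketch classifies a
    position accordingly. The enum and function names are hypothetical. Only
    the ordering comes from the description above: ELF headers, then the
    note segment, then the vmcore_list objects.

```c
#include <stdint.h>

/* Hypothetical labels for the three parts of /proc/vmcore. */
enum vmcore_region { REGION_ELF_HEADERS, REGION_ELF_NOTES, REGION_MEMORY };

/* Decide which part a read at file position pos starts in. */
enum vmcore_region classify_pos(uint64_t pos, uint64_t elfcorebuf_sz,
                                uint64_t elfnotes_sz)
{
    if (pos < elfcorebuf_sz)
        return REGION_ELF_HEADERS;
    if (pos < elfcorebuf_sz + elfnotes_sz)
        return REGION_ELF_NOTES;          /* served from elfnotes_buf */
    return REGION_MEMORY;                 /* served from vmcore_list */
}

/* The first memory object starts after the headers and the notes. */
uint64_t first_object_offset(uint64_t elfcorebuf_sz, uint64_t elfnotes_sz)
{
    return elfcorebuf_sz + elfnotes_sz;
}
```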
     
  • …-size boundary in vmcore_list

    Treat memory chunks referenced by PT_LOAD program header entries on
    page-size boundaries in vmcore_list. Formally, for each range [start,
    end], we set up the corresponding vmcore object in vmcore_list to
    [rounddown(start, PAGE_SIZE), roundup(end, PAGE_SIZE)].

    This change affects the layout of /proc/vmcore. The gaps generated by
    the rearrangement are newly made visible to applications as holes.
    Concretely, they are the two ranges [rounddown(start, PAGE_SIZE), start]
    and [end, roundup(end, PAGE_SIZE)].

    Suppose the variable m points at a vmcore object in vmcore_list, and the
    variable phdr points at the PT_LOAD program header that m corresponds
    to. Then, pictorially:

                       m->offset +---------------+
                                 | hole          |
                phdr->p_offset = +---------------+
     m->offset + (paddr - start) |               |\
                                 | kernel memory |  phdr->p_memsz
                                 |               |/
                                 +---------------+
                                 | hole          |
             m->offset + m->size +---------------+

    where m->offset and m->offset + m->size are always page-size aligned.

    Signed-off-by: HATAYAMA Daisuke <d.hatayama@jp.fujitsu.com>
    Acked-by: Vivek Goyal <vgoyal@redhat.com>
    Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
    Cc: Atsushi Kumagai <kumagai-atsushi@mxc.nes.nec.co.jp>
    Cc: Lisa Mitchell <lisa.mitchell@hp.com>
    Cc: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

    HATAYAMA Daisuke
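    The rounding rule can be sketched as follows. The sketch assumes a 4KB
    page size and treats end as exclusive (start plus the segment's memory
    size). align_chunk() and struct aligned_chunk are hypothetical.
    head_hole corresponds to (paddr - start) in the diagram above.

```c
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096ULL   /* assumed page size for the sketch */

static uint64_t round_down_page(uint64_t x)
{
    return x & ~(SKETCH_PAGE_SIZE - 1);
}

static uint64_t round_up_page(uint64_t x)
{
    return (x + SKETCH_PAGE_SIZE - 1) & ~(SKETCH_PAGE_SIZE - 1);
}

/* Page-aligned vmcore object for a PT_LOAD range, plus the two holes that
 * become visible in /proc/vmcore. */
struct aligned_chunk {
    uint64_t start;      /* rounddown(start, PAGE_SIZE) */
    uint64_t size;       /* roundup(end, PAGE_SIZE) - rounddown(start, ...) */
    uint64_t head_hole;  /* bytes before the real data */
    uint64_t tail_hole;  /* bytes after the real data */
};

struct aligned_chunk align_chunk(uint64_t start, uint64_t end)
{
    struct aligned_chunk c;

    c.start = round_down_page(start);
    c.size = round_up_page(end) - c.start;
    c.head_hole = start - c.start;
    c.tail_hole = round_up_page(end) - end;
    return c;
}
```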
     
  • Allocate ELF headers on a page-size boundary using __get_free_pages()
    instead of kmalloc().

    A later patch will merge the PT_NOTE entries into a single one and
    decrease the buffer size actually used. Keep the original buffer size in
    the variable elfcorebuf_sz_orig so that the buffer can be freed later.
    Separately, keep the size actually used, rounded up to the page-size
    boundary, in the variable elfcorebuf_sz.

    The part of the ELF buffer exported through /proc/vmcore is
    elfcorebuf_sz bytes long.

    The range left over by the merged and removed PT_NOTE entries, i.e.
    [elfcorebuf_sz, elfcorebuf_sz_orig], is filled with 0.

    Use the size of the ELF headers as the initial offset value in
    set_vmcore_list_offsets_elf{64,32} and
    process_ptload_program_headers_elf{64,32} to indicate that the offset
    includes the holes up to the page boundary.

    As a result, both set_vmcore_list_offsets_elf{64,32} have the same
    definition. Merge them as set_vmcore_list_offsets.

    [akpm@linux-foundation.org: add free_elfcorebuf(), cleanups]
    Signed-off-by: HATAYAMA Daisuke
    Acked-by: Vivek Goyal
    Cc: KOSAKI Motohiro
    Cc: Atsushi Kumagai
    Cc: Lisa Mitchell
    Cc: Zhang Yanfei
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    HATAYAMA Daisuke
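    Here is a sketch of what the merged set_vmcore_list_offsets does. It
    hands out consecutive file offsets starting from the page-rounded ELF
    header size. The struct and function names are hypothetical stand-ins
    for the kernel's, and an array replaces the linked list.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one vmcore object on vmcore_list. */
struct vmcore_obj {
    uint64_t offset;
    uint64_t size;
};

/* Assign consecutive offsets starting right after the ELF header buffer
 * (elfsz, assumed already rounded up to a page boundary). Returns the
 * offset just past the last object. */
uint64_t set_offsets(uint64_t elfsz, struct vmcore_obj *objs, size_t n)
{
    uint64_t off = elfsz;

    for (size_t i = 0; i < n; i++) {
        objs[i].offset = off;
        off += objs[i].size;
    }
    return off;
}
```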
     
  • Rewrite the part of read_vmcore() that reads objects in vmcore_list in
    the same way as the part that reads the ELF headers, which removes some
    duplicated and redundant code.

    Signed-off-by: HATAYAMA Daisuke
    Acked-by: Vivek Goyal
    Cc: KOSAKI Motohiro
    Cc: Atsushi Kumagai
    Cc: Lisa Mitchell
    Cc: Zhang Yanfei
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    HATAYAMA Daisuke
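    The common read step can be sketched as a loop applied to every piece:
    clamp the request to what is left in the current piece, copy, then
    advance the file position. This is an illustrative userspace model, not
    read_vmcore() itself. struct region and read_regions() are hypothetical.
    The regions are assumed to be sorted and contiguous, and to start at
    offset 0.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical piece of /proc/vmcore: ELF headers or one memory object. */
struct region {
    size_t offset;        /* file offset where this region starts */
    size_t size;          /* region length in bytes */
    const char *data;     /* backing bytes for the sketch */
};

/* Copy up to buflen bytes starting at *fpos, using one clamp/copy/advance
 * step for every region. Returns bytes copied; *fpos is advanced. */
size_t read_regions(const struct region *r, size_t n,
                    char *buf, size_t buflen, size_t *fpos)
{
    size_t done = 0;

    for (size_t i = 0; i < n && done < buflen; i++) {
        size_t end = r[i].offset + r[i].size;
        if (*fpos >= end)
            continue;                        /* already past this region */
        size_t skip = *fpos - r[i].offset;   /* where to start inside it */
        size_t tsz = end - *fpos;
        if (tsz > buflen - done)
            tsz = buflen - done;             /* the min() step */
        memcpy(buf + done, r[i].data + skip, tsz);
        done += tsz;
        *fpos += tsz;
    }
    return done;
}
```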
     
  • In order to reuse bits from pagemap entries gracefully, we leave the
    entries as is, but on pagemap open we emit a warning in dmesg that bits
    55-60 are about to change in a couple of releases. Next, if a user
    issues the soft-dirty clear command via the clear_refs file (it was
    disabled before v3.9), we assume that they are aware of the new pagemap
    format, note that fact, and report the bits in pagemap in the new manner.

    The "migration strategy" looks like this then:

    1. existing users are not affected -- they don't touch the soft-dirty
       feature, thus see old bits in pagemap, but are warned and have time
       to fix themselves
    2. those who use soft-dirty know about the new pagemap format
    3. some time soon we get rid of any signs of page-shift in pagemap as
       well as this trick with clear-soft-dirty affecting pagemap format.

    Signed-off-by: Pavel Emelyanov
    Cc: Matt Mackall
    Cc: Xiao Guangrong
    Cc: Glauber Costa
    Cc: Marcelo Tosatti
    Cc: KOSAKI Motohiro
    Cc: Stephen Rothwell
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pavel Emelyanov
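    Here is a sketch of how a reader might decode bits 55-60 under each
    layout. entry_soft_dirty() and its new_format flag are hypothetical. The
    example also shows why old entries cannot simply be reinterpreted. With
    4KB pages PAGE_SHIFT is 12 (binary 001100), so bit 55 happens to be 0.
    With 8KB pages it is 13 (binary 001101), and bit 55 would read as 1.

```c
#include <stdbool.h>
#include <stdint.h>

/* Old layout: bits 55-60 of every entry hold PAGE_SHIFT (6 bits). */
unsigned old_page_shift(uint64_t entry)
{
    return (unsigned)((entry >> 55) & 0x3f);
}

/* New layout: bit 55 is soft-dirty; bits 56-60 are the freed zero bits. */
bool new_soft_dirty(uint64_t entry)
{
    return ((entry >> 55) & 1) != 0;
}

/* Hypothetical reader helper: new_format would be true only for a process
 * that has issued the soft-dirty clear command through clear_refs. */
bool entry_soft_dirty(uint64_t entry, bool new_format)
{
    return new_format && new_soft_dirty(entry);
}
```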
     
  • The soft-dirty bit is a bit on a PTE which helps to track which pages a
    task writes to. In order to do this tracking one should

    1. Clear soft-dirty bits from PTEs ("echo 4 > /proc/PID/clear_refs")
    2. Wait some time.
    3. Read soft-dirty bits (bit 55 in /proc/PID/pagemap2 entries)

    To do this tracking, the writable bit is cleared from PTEs when the
    soft-dirty bit is. Thus, after this, when the task tries to modify a
    page at some virtual address, a #PF occurs and the kernel sets the
    soft-dirty bit on the respective PTE.

    Note that although all of the task's address space is marked as r/o
    after the soft-dirty bits are cleared, the #PFs that occur after that
    are processed quickly. This is because the pages are still mapped to
    physical memory, so all the kernel does is find this out and put back
    the writable, dirty and soft-dirty bits on the PTE.

    Another thing to note is that when mremap moves PTEs they are marked
    with soft-dirty as well, since from the user's perspective mremap
    modifies the virtual memory at mremap's new address.

    Signed-off-by: Pavel Emelyanov
    Cc: Matt Mackall
    Cc: Xiao Guangrong
    Cc: Glauber Costa
    Cc: Marcelo Tosatti
    Cc: KOSAKI Motohiro
    Cc: Stephen Rothwell
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pavel Emelyanov
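    The three tracking steps can be sketched from userspace as below, under
    two assumptions. The process inspects itself through /proc/self, and
    entries are read from /proc/self/pagemap, which is where mainline
    kernels report the soft-dirty bit. The pagemap2 name above is not used
    here. Each pagemap entry is 8 bytes, indexed by virtual page number. The
    helper names are hypothetical.

```c
#include <fcntl.h>
#include <stdint.h>
#include <unistd.h>

/* Step 1: same as "echo 4 > /proc/self/clear_refs". Returns 0 or -1. */
int clear_soft_dirty(void)
{
    int fd = open("/proc/self/clear_refs", O_WRONLY);
    if (fd < 0)
        return -1;
    ssize_t n = write(fd, "4", 1);
    close(fd);
    return n == 1 ? 0 : -1;
}

/* Byte offset of the 8-byte pagemap entry describing vaddr. */
off_t pagemap_offset(uintptr_t vaddr, long page_size)
{
    return (off_t)(vaddr / (uintptr_t)page_size) * 8;
}

/* Step 3: 1 if the page holding vaddr was written since the last clear,
 * 0 if not, -1 on error. */
int page_soft_dirty(uintptr_t vaddr)
{
    long psz = sysconf(_SC_PAGESIZE);
    uint64_t entry;

    if (psz <= 0)
        return -1;
    int fd = open("/proc/self/pagemap", O_RDONLY);
    if (fd < 0)
        return -1;
    ssize_t n = pread(fd, &entry, sizeof entry, pagemap_offset(vaddr, psz));
    close(fd);
    if (n != (ssize_t)sizeof entry)
        return -1;
    return (int)((entry >> 55) & 1);
}
```

    On a kernel built with soft-dirty support, the expected result is as
    follows. Call clear_soft_dirty(), write to a page, then call
    page_soft_dirty() on an address in that page, and it should return 1.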
     
  • These bits are always constant (== PAGE_SHIFT) and just occupy space in
    the entry. Moreover, in the next patch we will need to report one more
    bit in the pagemap, but all bits are already in use.

    So, describe a pagemap entry layout that has 6 more free (zero) bits.

    Signed-off-by: Pavel Emelyanov
    Cc: Matt Mackall
    Cc: Xiao Guangrong
    Cc: Glauber Costa
    Cc: Marcelo Tosatti
    Cc: KOSAKI Motohiro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pavel Emelyanov
     
  • In the next patch the clear-refs type will be required in the
    clear_refs_pte_range function, so prepare walk->private to carry this
    info.

    Signed-off-by: Pavel Emelyanov
    Cc: Matt Mackall
    Cc: Xiao Guangrong
    Cc: Glauber Costa
    Cc: Marcelo Tosatti
    Cc: KOSAKI Motohiro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pavel Emelyanov
     
  • This is the implementation of the soft-dirty bit concept that should
    help keep track of changes in user memory, which in turn is badly needed
    by the checkpoint-restore project (http://criu.org).

    To create a dump of an application(s) we save all the information about
    it to files, and the biggest part of such a dump is the contents of the
    tasks' memory. However, there are usage scenarios where it's not
    required to get _all_ the task memory while creating a dump. For
    example, when doing periodic dumps, it's only required to take a full
    memory dump at the first step and then take incremental changes of
    memory. Another example is live migration. We copy all the memory to
    the destination node without stopping the tasks, then stop them, check
    which pages have changed, dump them and the rest of the state, then copy
    it to the destination node. This decreases freeze time significantly.

    So some help from the kernel is required to watch how processes modify
    the contents of their memory.

    The proposal is to track changes with the help of new soft-dirty bit
    this way:

    1. First do "echo 4 > /proc/$pid/clear_refs".
       At that point the kernel clears the soft-dirty _and_ the writable bits
       from all ptes of process $pid. From now on every write to any page
       will result in a #pf and the subsequent call to
       pte_mkdirty/pmd_mkdirty, which in turn will set the soft-dirty flag.

    2. Then read /proc/$pid/pagemap2 and check the soft-dirty bit reported
       there (bit 55). If set, the respective pte was written to since the
       last call to clear_refs.

    The soft-dirty bit is the _PAGE_BIT_HIDDEN one. Although kmemcheck also
    uses this bit, it marks kernel pages with it, while soft-dirty is put on
    user pages, so the two do not conflict with each other.

    This patch:

    A new clear-refs type will be added in the next patch, so prepare the
    code for that.

    [akpm@linux-foundation.org: don't assume that sizeof(enum clear_refs_types) == sizeof(int)]
    Signed-off-by: Pavel Emelyanov
    Cc: Matt Mackall
    Cc: Xiao Guangrong
    Cc: Glauber Costa
    Cc: Marcelo Tosatti
    Cc: KOSAKI Motohiro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pavel Emelyanov
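    Here is a userspace sketch of validating a value written to clear_refs.
    The enumerators are modeled on the clear_refs types described in this
    series, with 4 as the new soft-dirty type. parse_clear_refs() itself is
    hypothetical. The value is parsed and range-checked as a long before it
    is stored in the enum. This follows the spirit of akpm's note about not
    assuming sizeof(enum clear_refs_types) == sizeof(int).

```c
#include <stdlib.h>

/* Modeled on the clear_refs types; 4 is the new soft-dirty type. */
enum clear_refs_types {
    CLEAR_REFS_ALL = 1,
    CLEAR_REFS_ANON,
    CLEAR_REFS_MAPPED,
    CLEAR_REFS_SOFT_DIRTY,
    CLEAR_REFS_LAST,
};

/* Parse the text written to clear_refs. Only a validated value is ever
 * stored into the enum. Returns 0 on success, -1 otherwise. */
int parse_clear_refs(const char *buf, enum clear_refs_types *type)
{
    char *end;
    long v = strtol(buf, &end, 10);

    if (end == buf)
        return -1;                             /* no digits at all */
    if (v < CLEAR_REFS_ALL || v >= CLEAR_REFS_LAST)
        return -1;                             /* unknown type */
    *type = (enum clear_refs_types)v;
    return 0;
}
```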
     

29 Jun, 2013

6 commits


13 Jun, 2013

1 commit

  • The dmesg_restrict sysctl currently covers the syslog method for
    accessing dmesg; however, /dev/kmsg isn't covered by the same
    protections. Most people haven't noticed because older versions of
    util-linux dmesg(1) default to using the syslog method for access.
    Newer versions of util-linux dmesg(1) default to reading directly from
    /dev/kmsg.

    To fix /dev/kmsg, let's compare the existing interfaces and what they
    allow:

    - /proc/kmsg allows:
      - open (SYSLOG_ACTION_OPEN) if CAP_SYSLOG, since it uses a destructive
        single-reader interface (SYSLOG_ACTION_READ).
      - everything, after an open.

    - syslog syscall allows:
      - anything, if CAP_SYSLOG.
      - SYSLOG_ACTION_READ_ALL and SYSLOG_ACTION_SIZE_BUFFER, if
        dmesg_restrict==0.
      - nothing else (EPERM).

    The use-cases were:
    - dmesg(1) needs to do non-destructive SYSLOG_ACTION_READ_ALLs.
    - sysklog(1) needs to open /proc/kmsg, drop privs, and still issue the
      destructive SYSLOG_ACTION_READs.

    AIUI, dmesg(1) is moving to /dev/kmsg, and systemd-journald doesn't
    clear the ring buffer.

    Based on the comments in devkmsg_llseek, it sounds like actions besides
    reading aren't going to be supported by /dev/kmsg (e.g.
    SYSLOG_ACTION_CLEAR), so we have a strict subset of the non-destructive
    syslog syscall actions.

    To this end, move the check as Josh had done, but also rename the
    constants to reflect their new uses (SYSLOG_FROM_CALL becomes
    SYSLOG_FROM_READER, and SYSLOG_FROM_FILE becomes SYSLOG_FROM_PROC).
    SYSLOG_FROM_READER allows non-destructive actions, and SYSLOG_FROM_PROC
    allows destructive actions after a capabilities-constrained
    SYSLOG_ACTION_OPEN check.

    - /dev/kmsg allows:
      - open if CAP_SYSLOG or dmesg_restrict==0
      - reading/polling, after open

    Addresses https://bugzilla.redhat.com/show_bug.cgi?id=903192

    [akpm@linux-foundation.org: use pr_warn_once()]
    Signed-off-by: Kees Cook
    Reported-by: Christian Kujau
    Tested-by: Josh Boyer
    Cc: Kay Sievers
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kees Cook
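    The access rules summarized above can be modeled as two small
    predicates. This is a hypothetical model for illustration only. enum
    kmsg_action and both functions are made up here. The kernel's real
    checks call capable(CAP_SYSLOG) rather than taking booleans.

```c
#include <stdbool.h>

/* Hypothetical subset of syslog actions, named after the text above. */
enum kmsg_action { ACT_READ, ACT_READ_ALL, ACT_SIZE_BUFFER, ACT_CLEAR };

/* /dev/kmsg: open only with CAP_SYSLOG or when dmesg_restrict == 0;
 * after that, only reading/polling is supported. */
bool devkmsg_open_allowed(bool cap_syslog, int dmesg_restrict)
{
    return cap_syslog || dmesg_restrict == 0;
}

/* syslog(2): anything with CAP_SYSLOG; otherwise only the non-destructive
 * READ_ALL and SIZE_BUFFER, and only while dmesg_restrict == 0. */
bool syslog_call_allowed(enum kmsg_action a, bool cap_syslog,
                         int dmesg_restrict)
{
    if (cap_syslog)
        return true;
    if (dmesg_restrict)
        return false;
    return a == ACT_READ_ALL || a == ACT_SIZE_BUFFER;
}
```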
     

28 May, 2013

1 commit

  • Expand the information about posix-timers in /proc/PID/timers by adding
    info about the clock with which each timer was created. That is, a
    "ClockID: " line now follows the "notify:" line as the fourth line of
    each timer's info.

    Signed-off-by: Pavel Tikhomirov
    Cc: Michael Kerrisk
    Cc: Matthew Helsley
    Cc: Pavel Emelyanov
    Link: http://lkml.kernel.org/r/1368742323-46949-2-git-send-email-snorcht@gmail.com
    Signed-off-by: Thomas Gleixner

    Pavel Tikhomirov
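    Here is a sketch of parsing the new line from /proc/PID/timers. The text
    above does not show the value after "ClockID: ". This sketch assumes it
    is a decimal clock id. parse_clockid_line() is hypothetical.

```c
#include <stdio.h>

/* Parse a "ClockID: N" line. Returns 0 and stores N on success, or -1 if
 * the line is not a ClockID line (assumes a decimal value). */
int parse_clockid_line(const char *line, int *clock_id)
{
    return sscanf(line, "ClockID: %d", clock_id) == 1 ? 0 : -1;
}
```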
     

07 May, 2013

2 commits

  • Pull slab changes from Pekka Enberg:
    "The bulk of the changes are more slab unification from Christoph.

    There are also a few fixes from Aaron, Glauber, and Joonsoo thrown into
    the mix."

    * 'slab/for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/penberg/linux: (24 commits)
    mm, slab_common: Fix bootstrap creation of kmalloc caches
    slab: Return NULL for oversized allocations
    mm: slab: Verify the nodeid passed to ____cache_alloc_node
    slub: tid must be retrieved from the percpu area of the current processor
    slub: Do not dereference NULL pointer in node_match
    slub: add 'likely' macro to inc_slabs_node()
    slub: correct to calculate num of acquired objects in get_partial_node()
    slub: correctly bootstrap boot caches
    mm/sl[au]b: correct allocation type check in kmalloc_slab()
    slab: Fixup CONFIG_PAGE_ALLOC/DEBUG_SLAB_LEAK sections
    slab: Handle ARCH_DMA_MINALIGN correctly
    slab: Common definition for kmem_cache_node
    slab: Rename list3/l3 to node
    slab: Common Kmalloc cache determination
    stat: Use size_t for sizes instead of unsigned
    slab: Common function to create the kmalloc array
    slab: Common definition for the array of kmalloc caches
    slab: Common constants for kmalloc boundaries
    slab: Rename nodelists to node
    slab: Common name for the per node structures
    ...

    Linus Torvalds
     
  • Pekka Enberg
     

05 May, 2013

1 commit


02 May, 2013

13 commits

  • Pull VFS updates from Al Viro:

    Misc cleanups all over the place, mainly wrt /proc interfaces (switch
    create_proc_entry to proc_create(), get rid of the deprecated
    create_proc_read_entry() in favor of using proc_create_data() and
    seq_file etc).

    7kloc removed.

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (204 commits)
    don't bother with deferred freeing of fdtables
    proc: Move non-public stuff from linux/proc_fs.h to fs/proc/internal.h
    proc: Make the PROC_I() and PDE() macros internal to procfs
    proc: Supply a function to remove a proc entry by PDE
    take cgroup_open() and cpuset_open() to fs/proc/base.c
    ppc: Clean up scanlog
    ppc: Clean up rtas_flash driver somewhat
    hostap: proc: Use remove_proc_subtree()
    drm: proc: Use remove_proc_subtree()
    drm: proc: Use minor->index to label things, not PDE->name
    drm: Constify drm_proc_list[]
    zoran: Don't print proc_dir_entry data in debug
    reiserfs: Don't access the proc_dir_entry in r_open(), r_start() r_show()
    proc: Supply an accessor for getting the data from a PDE's parent
    airo: Use remove_proc_subtree()
    rtl8192u: Don't need to save device proc dir PDE
    rtl8187se: Use a dir under /proc/net/r8180/
    proc: Add proc_mkdir_data()
    proc: Move some bits from linux/proc_fs.h to linux/{of.h,signal.h,tty.h}
    proc: Move PDE_NET() to fs/proc/proc_net.c
    ...

    Linus Torvalds
     
  • Move non-public declarations and definitions from linux/proc_fs.h to
    fs/proc/internal.h.

    Signed-off-by: David Howells
    Signed-off-by: Al Viro

    David Howells
     
  • Make the PROC_I() and PDE() macros internal to procfs. This means making
    PDE_DATA() out of line. This could be made more optimal by storing
    PDE()->data into inode->i_private.

    Also provide a __PDE_DATA() that is inline and internal to procfs.

    Signed-off-by: David Howells
    Signed-off-by: Al Viro

    David Howells
     
  • Supply a function (proc_remove()) to remove a proc entry (and any subtree
    rooted there) by proc_dir_entry pointer rather than by name and (optionally)
    root dir entry pointer. This allows us to eliminate all remaining pde->name
    accesses outside of procfs.

    Signed-off-by: David Howells
    Acked-by: Grant Likely
    cc: linux-acpi@vger.kernel.org
    cc: openipmi-developer@lists.sourceforge.net
    cc: devicetree-discuss@lists.ozlabs.org
    cc: linux-pci@vger.kernel.org
    cc: netdev@vger.kernel.org
    cc: netfilter-devel@vger.kernel.org
    cc: alsa-devel@alsa-project.org
    Signed-off-by: Al Viro

    David Howells
     
  • Signed-off-by: Al Viro

    Al Viro
     
  • Supply an accessor function for getting the private data from the parent
    proc_dir_entry struct of the proc_dir_entry struct associated with an inode.

    ReiserFS, for instance, stores the super_block pointer in the proc directory
    it makes for that super_block, and a pointer to the respective seq_file show
    function in each of the proc files in that directory.

    This allows a reduction in the number of file_operations structs, open
    functions and seq_operations structs required. The problem otherwise is that
    each show function requires two pieces of data but only has storage for one
    per PDE (and this has no release function).

    Signed-off-by: David Howells
    Acked-by: Mauro Carvalho Chehab
    Acked-by: Greg Kroah-Hartman
    cc: Jerry Chuang
    cc: Maxim Mikityanskiy
    cc: YAMANE Toshiaki
    cc: linux-wireless@vger.kernel.org
    cc: linux-scsi@vger.kernel.org
    cc: devel@driverdev.osuosl.org
    Signed-off-by: Al Viro

    David Howells
     
  • Add proc_mkdir_data() to allow procfs directories to be created that are
    annotated at the time of creation with private data rather than doing this
    post-creation. This means no access is then required to the proc_dir_entry
    struct to set this.

    Signed-off-by: David Howells
    Acked-by: Mauro Carvalho Chehab
    Acked-by: Greg Kroah-Hartman
    cc: Neela Syam Kolli
    cc: Jerry Chuang
    cc: linux-scsi@vger.kernel.org
    cc: devel@driverdev.osuosl.org
    cc: linux-wireless@vger.kernel.org
    Signed-off-by: Al Viro

    David Howells
     
  • Move some bits from linux/proc_fs.h to linux/of.h, signal.h and tty.h.

    Also move proc_tty_init() and proc_device_tree_init() to fs/proc/internal.h as
    they're internal to procfs.

    Signed-off-by: David Howells
    Acked-by: Greg Kroah-Hartman
    Acked-by: Grant Likely
    cc: devicetree-discuss@lists.ozlabs.org
    cc: linux-arch@vger.kernel.org
    cc: Greg Kroah-Hartman
    cc: Jiri Slaby
    Signed-off-by: Al Viro

    David Howells
     
  • Move PDE_NET() to fs/proc/proc_net.c as that's where the only user is.

    Signed-off-by: David Howells
    Signed-off-by: Al Viro

    David Howells
     
  • Split the proc namespace stuff out into linux/proc_ns.h.

    Signed-off-by: David Howells
    cc: netdev@vger.kernel.org
    cc: Serge E. Hallyn
    cc: Eric W. Biederman
    Signed-off-by: Al Viro

    David Howells
     
  • Move proc_fd() to fs/proc/fd.h.

    Signed-off-by: David Howells
    Signed-off-by: Al Viro

    David Howells
     
  • Uninline pid_delete_dentry() as it's only used by three function pointers.

    Signed-off-by: David Howells
    Signed-off-by: Al Viro

    David Howells
     
  • Supply accessor functions to set attributes in proc_dir_entry structs.

    The following are supplied: proc_set_size() and proc_set_user().

    Signed-off-by: David Howells
    Acked-by: Mauro Carvalho Chehab
    cc: linuxppc-dev@lists.ozlabs.org
    cc: linux-media@vger.kernel.org
    cc: netdev@vger.kernel.org
    cc: linux-wireless@vger.kernel.org
    cc: linux-pci@vger.kernel.org
    cc: netfilter-devel@vger.kernel.org
    cc: alsa-devel@alsa-project.org
    Signed-off-by: Al Viro

    David Howells
     

01 May, 2013

1 commit

  • Currently, a write to a procfs file will return the number of bytes
    successfully written. If the actual string is longer than this, the
    remainder of the string will not be written and userspace will
    complete the operation by issuing additional write()s.

    Hence

    $ echo -n "abcdefghijklmnopqrs" > /proc/self/comm

    results in

    $ cat /proc/$$/comm
    pqrs

    since the final four bytes were written with a second write(), because
    TASK_COMM_LEN == 16. This is obviously an undesired result and not
    equivalent to prctl(PR_SET_NAME). A userspace implementation should not
    need to know the value of TASK_COMM_LEN.

    This patch truncates the string to the first TASK_COMM_LEN - 1 bytes
    (leaving room for the terminating NUL) and returns the full length of
    the string as the number of bytes written, so the second write() is
    suppressed.

    $ cat /proc/$$/comm
    abcdefghijklmno

    Signed-off-by: David Rientjes
    Acked-by: John Stultz
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Rientjes
     

30 Apr, 2013

1 commit

  • Pull core timer updates from Ingo Molnar:
    "The main changes in this cycle's merge are:

     - Implement a shadow timekeeper to shorten in-kernel reader-side
       blocking, by Thomas Gleixner.

     - Posix timers enhancements by Pavel Emelyanov:

       - allocate timer ID per process, so that exact timer ID allocations
         can be re-created by checkpoint/restore code.

       - debuggability and tooling (/proc/PID/timers, etc.) improvements.

     - suspend/resume enhancements by Feng Tang: on certain new Intel Atom
       processors (Penwell and Cloverview), there is a feature that the
       TSC won't stop in S3 state, so the TSC value won't be reset to 0
       after resume. The generic timekeeping code can take advantage of
       this via the CLOCK_SOURCE_SUSPEND_NONSTOP flag: instead of using the
       RTC to recover/approximate sleep time, the main (and precise)
       clocksource can be used.

     - Fix /proc/timer_list for 4096 CPUs by Nathan Zimmer: on so many
       CPUs the file goes beyond 4MB in size and thus the current
       simplistic seqfile approach fails. Convert /proc/timer_list to a
       proper seq_file with its own iterator.

     - Cleanups and refactorings of the core timekeeping code by John
       Stultz.

     - International Atomic Time (TAI) is currently managed internally by
       the NTP code but not exposed externally. Separate the TAI code out
       and add CLOCK_TAI support and TAI support to the hrtimer and
       posix-timer code, by John Stultz.

     - Add a deep idle support enhancement to the broadcast clockevents
       core timer code, by Daniel Lezcano: add an opt-in
       CLOCK_EVT_FEAT_DYNIRQ clockevents feature (which will be utilized by
       future clockevents driver updates), which allows the use of IRQ
       affinities to avoid spurious wakeups of idle CPUs - the right CPU
       with an expiring timer will be woken.

     - Add new ARM bcm281xx clocksource driver, by Christian Daudt

     - ... various other fixes and cleanups"

    * 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (52 commits)
    clockevents: Set dummy handler on CPU_DEAD shutdown
    timekeeping: Update tk->cycle_last in resume
    posix-timers: Remove unused variable
    clockevents: Switch into oneshot mode even if broadcast registered late
    timer_list: Convert timer list to be a proper seq_file
    timer_list: Split timer_list_show_tickdevices
    posix-timers: Show sigevent info in proc file
    posix-timers: Introduce /proc/PID/timers file
    posix timers: Allocate timer id per process (v2)
    timekeeping: Make sure to notify hrtimers when TAI offset changes
    hrtimer: Fix ktime_add_ns() overflow on 32bit architectures
    hrtimer: Add expiry time overflow check in hrtimer_interrupt
    timekeeping: Shorten seq_count region
    timekeeping: Implement a shadow timekeeper
    timekeeping: Delay update of clock->cycle_last
    timekeeping: Store cycle_last value in timekeeper struct as well
    ntp: Remove ntp_lock, using the timekeeping locks to protect ntp state
    timekeeping: Simplify tai updating from do_adjtimex
    timekeeping: Hold timekeepering locks in do_adjtimex and hardpps
    timekeeping: Move ADJ_SETOFFSET to top level do_adjtimex()
    ...

    Linus Torvalds