28 Feb, 2013

1 commit

  • The existing SUID_DUMP_* defines duplicate the newer SUID_DUMPABLE_*
    defines introduced in 54b501992dd2 ("coredump: warn about unsafe
    suid_dumpable / core_pattern combo"). Remove the new ones, and use the
    prior values instead.

    Signed-off-by: Kees Cook
    Reported-by: Chen Gang
    Cc: Alexander Viro
    Cc: Alan Cox
    Cc: "Eric W. Biederman"
    Cc: Doug Ledford
    Cc: Serge Hallyn
    Cc: James Morris
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kees Cook
     

19 Nov, 2012

1 commit

  • I had visions at one point of splitting proc into two filesystems. If
    that had happened proc/self being the the part of proc that actually deals
    with pids would have been a nice cleanup. As it is proc/self requires
    a lot of unnecessary infrastructure for a single file.

    The only user visible change is that a mounted /proc for a pid namespace
    that is dead now shows a broken proc symlink, instead of being completely
    invisible. I don't think anyone will notice or care.

    Signed-off-by: Eric W. Biederman

    Eric W. Biederman
     

20 Oct, 2012

1 commit

  • /proc//numa_maps scans vma and show mempolicy under
    mmap_sem. It sometimes accesses task->mempolicy which can
    be freed without mmap_sem and numa_maps can show some
    garbage while scanning.

    This patch tries to take reference count of task->mempolicy at reading
    numa_maps before calling get_vma_policy(). By this, task->mempolicy
    will not be freed until numa_maps reaches its end.

    V2->v3
    - updated comments to be more verbose.
    - removed task_lock() in numa_maps code.
    V1->V2
    - access task->mempolicy only once and remember it. Becase kernel/exit.c
    can overwrite it.

    Signed-off-by: KAMEZAWA Hiroyuki
    Acked-by: David Rientjes
    Acked-by: KOSAKI Motohiro
    Signed-off-by: Linus Torvalds

    KAMEZAWA Hiroyuki
     

06 Oct, 2012

1 commit


27 Sep, 2012

1 commit

  • This patch prepares the ground for further extension of
    /proc/pid/fd[info] handling code by moving fdinfo handling
    code into fs/proc/fd.c.

    I think such move makes both fs/proc/base.c and fs/proc/fd.c
    easier to read.

    Signed-off-by: Cyrill Gorcunov
    Acked-by: Pavel Emelyanov
    CC: Al Viro
    CC: Alexey Dobriyan
    CC: Andrew Morton
    CC: James Bottomley
    CC: "Aneesh Kumar K.V"
    CC: Alexey Dobriyan
    CC: Matthew Helsley
    CC: "J. Bruce Fields"
    CC: "Aneesh Kumar K.V"
    Signed-off-by: Al Viro

    Cyrill Gorcunov
     

14 Jul, 2012

2 commits


01 Jun, 2012

2 commits

  • When we do checkpoint of a task we need to know the list of children the
    task, has but there is no easy and fast way to generate reverse
    parent->children chain from arbitrary (while a parent pid is
    provided in "PPid" field of /proc//status).

    So instead of walking over all pids in the system (creating one big
    process tree in memory, just to figure out which children a task has) --
    we add explicit /proc//task//children entry, because the kernel
    already has this kind of information but it is not yet exported.

    This is a first level children, not the whole process tree.

    Signed-off-by: Cyrill Gorcunov
    Reviewed-by: Oleg Nesterov
    Reviewed-by: Kees Cook
    Cc: Pavel Emelyanov
    Cc: Serge Hallyn
    Cc: KAMEZAWA Hiroyuki
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Cyrill Gorcunov
     
  • mm_for_maps() is a simple wrapper for mm_access(), and the name is
    misleading, so just remove it and use mm_access() directly.

    Signed-off-by: Cong Wang
    Cc: Oleg Nesterov
    Cc: Alexey Dobriyan
    Acked-by: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Cong Wang
     

24 Mar, 2012

1 commit

  • Pull sysctl updates from Eric Biederman:

    - Rewrite of sysctl for speed and clarity.

    Insert/remove/Lookup in sysctl are all now O(NlogN) operations, and
    are no longer bottlenecks in the process of adding and removing
    network devices.

    sysctl is now focused on being a filesystem instead of system call
    and the code can all be found in fs/proc/proc_sysctl.c. Hopefully
    this means the code is now approachable.

    Much thanks is owed to Lucian Grinjincu for keeping at this until
    something was found that was usable.

    - The recent proc_sys_poll oops found by the fuzzer during hibernation
    is fixed.

    * git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/sysctl: (36 commits)
    sysctl: protect poll() in entries that may go away
    sysctl: Don't call sysctl_follow_link unless we are a link.
    sysctl: Comments to make the code clearer.
    sysctl: Correct error return from get_subdir
    sysctl: An easier to read version of find_subdir
    sysctl: fix memset parameters in setup_sysctl_set()
    sysctl: remove an unused variable
    sysctl: Add register_sysctl for normal sysctl users
    sysctl: Index sysctl directories with rbtrees.
    sysctl: Make the header lists per directory.
    sysctl: Move sysctl_check_dups into insert_header
    sysctl: Modify __register_sysctl_paths to take a set instead of a root and an nsproxy
    sysctl: Replace root_list with links between sysctl_table_sets.
    sysctl: Add sysctl_print_dir and use it in get_subdir
    sysctl: Stop requiring explicit management of sysctl directories
    sysctl: Add a root pointer to ctl_table_set
    sysctl: Rewrite proc_sys_readdir in terms of first_entry and next_entry
    sysctl: Rewrite proc_sys_lookup introducing find_entry and lookup_entry.
    sysctl: Normalize the root_table data structure.
    sysctl: Factor out insert_header and erase_header
    ...

    Linus Torvalds
     

22 Mar, 2012

1 commit

  • Stack for a new thread is mapped by userspace code and passed via
    sys_clone. This memory is currently seen as anonymous in
    /proc//maps, which makes it difficult to ascertain which mappings
    are being used for thread stacks. This patch uses the individual task
    stack pointers to determine which vmas are actually thread stacks.

    For a multithreaded program like the following:

    #include

    void *thread_main(void *foo)
    {
    while(1);
    }

    int main()
    {
    pthread_t t;
    pthread_create(&t, NULL, thread_main, NULL);
    pthread_join(t, NULL);
    }

    proc/PID/maps looks like the following:

    00400000-00401000 r-xp 00000000 fd:0a 3671804 /home/siddhesh/a.out
    00600000-00601000 rw-p 00000000 fd:0a 3671804 /home/siddhesh/a.out
    019ef000-01a10000 rw-p 00000000 00:00 0 [heap]
    7f8a44491000-7f8a44492000 ---p 00000000 00:00 0
    7f8a44492000-7f8a44c92000 rw-p 00000000 00:00 0
    7f8a44c92000-7f8a44e3d000 r-xp 00000000 fd:00 2097482 /lib64/libc-2.14.90.so
    7f8a44e3d000-7f8a4503d000 ---p 001ab000 fd:00 2097482 /lib64/libc-2.14.90.so
    7f8a4503d000-7f8a45041000 r--p 001ab000 fd:00 2097482 /lib64/libc-2.14.90.so
    7f8a45041000-7f8a45043000 rw-p 001af000 fd:00 2097482 /lib64/libc-2.14.90.so
    7f8a45043000-7f8a45048000 rw-p 00000000 00:00 0
    7f8a45048000-7f8a4505f000 r-xp 00000000 fd:00 2099938 /lib64/libpthread-2.14.90.so
    7f8a4505f000-7f8a4525e000 ---p 00017000 fd:00 2099938 /lib64/libpthread-2.14.90.so
    7f8a4525e000-7f8a4525f000 r--p 00016000 fd:00 2099938 /lib64/libpthread-2.14.90.so
    7f8a4525f000-7f8a45260000 rw-p 00017000 fd:00 2099938 /lib64/libpthread-2.14.90.so
    7f8a45260000-7f8a45264000 rw-p 00000000 00:00 0
    7f8a45264000-7f8a45286000 r-xp 00000000 fd:00 2097348 /lib64/ld-2.14.90.so
    7f8a45457000-7f8a4545a000 rw-p 00000000 00:00 0
    7f8a45484000-7f8a45485000 rw-p 00000000 00:00 0
    7f8a45485000-7f8a45486000 r--p 00021000 fd:00 2097348 /lib64/ld-2.14.90.so
    7f8a45486000-7f8a45487000 rw-p 00022000 fd:00 2097348 /lib64/ld-2.14.90.so
    7f8a45487000-7f8a45488000 rw-p 00000000 00:00 0
    7fff6273b000-7fff6275c000 rw-p 00000000 00:00 0 [stack]
    7fff627ff000-7fff62800000 r-xp 00000000 00:00 0 [vdso]
    ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0 [vsyscall]

    Here, one could guess that 7f8a44492000-7f8a44c92000 is a stack since
    the earlier vma that has no permissions (7f8a44e3d000-7f8a4503d000) but
    that is not always a reliable way to find out which vma is a thread
    stack. Also, /proc/PID/maps and /proc/PID/task/TID/maps has the same
    content.

    With this patch in place, /proc/PID/task/TID/maps are treated as 'maps
    as the task would see it' and hence, only the vma that that task uses as
    stack is marked as [stack]. All other 'stack' vmas are marked as
    anonymous memory. /proc/PID/maps acts as a thread group level view,
    where all thread stack vmas are marked as [stack:TID] where TID is the
    process ID of the task that uses that vma as stack, while the process
    stack is marked as [stack].

    So /proc/PID/maps will look like this:

    00400000-00401000 r-xp 00000000 fd:0a 3671804 /home/siddhesh/a.out
    00600000-00601000 rw-p 00000000 fd:0a 3671804 /home/siddhesh/a.out
    019ef000-01a10000 rw-p 00000000 00:00 0 [heap]
    7f8a44491000-7f8a44492000 ---p 00000000 00:00 0
    7f8a44492000-7f8a44c92000 rw-p 00000000 00:00 0 [stack:1442]
    7f8a44c92000-7f8a44e3d000 r-xp 00000000 fd:00 2097482 /lib64/libc-2.14.90.so
    7f8a44e3d000-7f8a4503d000 ---p 001ab000 fd:00 2097482 /lib64/libc-2.14.90.so
    7f8a4503d000-7f8a45041000 r--p 001ab000 fd:00 2097482 /lib64/libc-2.14.90.so
    7f8a45041000-7f8a45043000 rw-p 001af000 fd:00 2097482 /lib64/libc-2.14.90.so
    7f8a45043000-7f8a45048000 rw-p 00000000 00:00 0
    7f8a45048000-7f8a4505f000 r-xp 00000000 fd:00 2099938 /lib64/libpthread-2.14.90.so
    7f8a4505f000-7f8a4525e000 ---p 00017000 fd:00 2099938 /lib64/libpthread-2.14.90.so
    7f8a4525e000-7f8a4525f000 r--p 00016000 fd:00 2099938 /lib64/libpthread-2.14.90.so
    7f8a4525f000-7f8a45260000 rw-p 00017000 fd:00 2099938 /lib64/libpthread-2.14.90.so
    7f8a45260000-7f8a45264000 rw-p 00000000 00:00 0
    7f8a45264000-7f8a45286000 r-xp 00000000 fd:00 2097348 /lib64/ld-2.14.90.so
    7f8a45457000-7f8a4545a000 rw-p 00000000 00:00 0
    7f8a45484000-7f8a45485000 rw-p 00000000 00:00 0
    7f8a45485000-7f8a45486000 r--p 00021000 fd:00 2097348 /lib64/ld-2.14.90.so
    7f8a45486000-7f8a45487000 rw-p 00022000 fd:00 2097348 /lib64/ld-2.14.90.so
    7f8a45487000-7f8a45488000 rw-p 00000000 00:00 0
    7fff6273b000-7fff6275c000 rw-p 00000000 00:00 0 [stack]
    7fff627ff000-7fff62800000 r-xp 00000000 00:00 0 [vdso]
    ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0 [vsyscall]

    Thus marking all vmas that are used as stacks by the threads in the
    thread group along with the process stack. The task level maps will
    however like this:

    00400000-00401000 r-xp 00000000 fd:0a 3671804 /home/siddhesh/a.out
    00600000-00601000 rw-p 00000000 fd:0a 3671804 /home/siddhesh/a.out
    019ef000-01a10000 rw-p 00000000 00:00 0 [heap]
    7f8a44491000-7f8a44492000 ---p 00000000 00:00 0
    7f8a44492000-7f8a44c92000 rw-p 00000000 00:00 0 [stack]
    7f8a44c92000-7f8a44e3d000 r-xp 00000000 fd:00 2097482 /lib64/libc-2.14.90.so
    7f8a44e3d000-7f8a4503d000 ---p 001ab000 fd:00 2097482 /lib64/libc-2.14.90.so
    7f8a4503d000-7f8a45041000 r--p 001ab000 fd:00 2097482 /lib64/libc-2.14.90.so
    7f8a45041000-7f8a45043000 rw-p 001af000 fd:00 2097482 /lib64/libc-2.14.90.so
    7f8a45043000-7f8a45048000 rw-p 00000000 00:00 0
    7f8a45048000-7f8a4505f000 r-xp 00000000 fd:00 2099938 /lib64/libpthread-2.14.90.so
    7f8a4505f000-7f8a4525e000 ---p 00017000 fd:00 2099938 /lib64/libpthread-2.14.90.so
    7f8a4525e000-7f8a4525f000 r--p 00016000 fd:00 2099938 /lib64/libpthread-2.14.90.so
    7f8a4525f000-7f8a45260000 rw-p 00017000 fd:00 2099938 /lib64/libpthread-2.14.90.so
    7f8a45260000-7f8a45264000 rw-p 00000000 00:00 0
    7f8a45264000-7f8a45286000 r-xp 00000000 fd:00 2097348 /lib64/ld-2.14.90.so
    7f8a45457000-7f8a4545a000 rw-p 00000000 00:00 0
    7f8a45484000-7f8a45485000 rw-p 00000000 00:00 0
    7f8a45485000-7f8a45486000 r--p 00021000 fd:00 2097348 /lib64/ld-2.14.90.so
    7f8a45486000-7f8a45487000 rw-p 00022000 fd:00 2097348 /lib64/ld-2.14.90.so
    7f8a45487000-7f8a45488000 rw-p 00000000 00:00 0
    7fff6273b000-7fff6275c000 rw-p 00000000 00:00 0
    7fff627ff000-7fff62800000 r-xp 00000000 00:00 0 [vdso]
    ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0 [vsyscall]

    where only the vma that is being used as a stack by *that* task is
    marked as [stack].

    Analogous changes have been made to /proc/PID/smaps,
    /proc/PID/numa_maps, /proc/PID/task/TID/smaps and
    /proc/PID/task/TID/numa_maps. Relevant snippets from smaps and
    numa_maps:

    [siddhesh@localhost ~ ]$ pgrep a.out
    1441
    [siddhesh@localhost ~ ]$ cat /proc/1441/smaps | grep "\[stack"
    7f8a44492000-7f8a44c92000 rw-p 00000000 00:00 0 [stack:1442]
    7fff6273b000-7fff6275c000 rw-p 00000000 00:00 0 [stack]
    [siddhesh@localhost ~ ]$ cat /proc/1441/task/1442/smaps | grep "\[stack"
    7f8a44492000-7f8a44c92000 rw-p 00000000 00:00 0 [stack]
    [siddhesh@localhost ~ ]$ cat /proc/1441/task/1441/smaps | grep "\[stack"
    7fff6273b000-7fff6275c000 rw-p 00000000 00:00 0 [stack]
    [siddhesh@localhost ~ ]$ cat /proc/1441/numa_maps | grep "stack"
    7f8a44492000 default stack:1442 anon=2 dirty=2 N0=2
    7fff6273a000 default stack anon=3 dirty=3 N0=3
    [siddhesh@localhost ~ ]$ cat /proc/1441/task/1442/numa_maps | grep "stack"
    7f8a44492000 default stack anon=2 dirty=2 N0=2
    [siddhesh@localhost ~ ]$ cat /proc/1441/task/1441/numa_maps | grep "stack"
    7fff6273a000 default stack anon=3 dirty=3 N0=3

    [akpm@linux-foundation.org: checkpatch fixes]
    [akpm@linux-foundation.org: fix build]
    Signed-off-by: Siddhesh Poyarekar
    Cc: KOSAKI Motohiro
    Cc: Alexander Viro
    Cc: Jamie Lokier
    Cc: Mike Frysinger
    Cc: Alexey Dobriyan
    Cc: Matt Mackall
    Cc: Oleg Nesterov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Siddhesh Poyarekar
     

25 Jan, 2012

1 commit

  • Move the core sysctl code from kernel/sysctl.c and kernel/sysctl_check.c
    into fs/proc/proc_sysctl.c.

    Currently sysctl maintenance is hampered by the sysctl implementation
    being split across 3 files with artificial layering between them.
    Consolidate the entire sysctl implementation into 1 file so that
    it is easier to see what is going on and hopefully allowing for
    simpler maintenance.

    For functions that are now only used in fs/proc/proc_sysctl.c remove
    their declarations from sysctl.h and make them static in fs/proc/proc_sysctl.c

    Signed-off-by: Eric W. Biederman

    Eric W. Biederman
     

11 Jan, 2012

1 commit

  • Add support for procfs mount options. Actual mount options are coming in
    the next patches.

    Signed-off-by: Vasiliy Kulikov
    Cc: Alexey Dobriyan
    Cc: Al Viro
    Cc: Randy Dunlap
    Cc: "H. Peter Anvin"
    Cc: Greg KH
    Cc: Theodore Tso
    Cc: Alan Cox
    Cc: James Morris
    Cc: Oleg Nesterov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vasiliy Kulikov
     

26 May, 2011

1 commit

  • * git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/linux-2.6-nsfd:
    net: fix get_net_ns_by_fd for !CONFIG_NET_NS
    ns proc: Return -ENOENT for a nonexistent /proc/self/ns/ entry.
    ns: Declare sys_setns in syscalls.h
    net: Allow setting the network namespace by fd
    ns proc: Add support for the ipc namespace
    ns proc: Add support for the uts namespace
    ns proc: Add support for the network namespace.
    ns: Introduce the setns syscall
    ns: proc files for namespace naming policy.

    Linus Torvalds
     

25 May, 2011

1 commit

  • Now that mm/mempolicy.c is no longer implementing /proc/pid/numa_maps
    there is no need to export struct proc_maps_private to the world. Move it
    to fs/proc/internal.h instead.

    Signed-off-by: Stephen Wilson
    Reviewed-by: KOSAKI Motohiro
    Cc: Hugh Dickins
    Cc: David Rientjes
    Cc: Lee Schermerhorn
    Cc: Alexey Dobriyan
    Cc: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Stephen Wilson
     

11 May, 2011

1 commit

  • Create files under /proc//ns/ to allow controlling the
    namespaces of a process.

    This addresses three specific problems that can make namespaces hard to
    work with.
    - Namespaces require a dedicated process to pin them in memory.
    - It is not possible to use a namespace unless you are the child
    of the original creator.
    - Namespaces don't have names that userspace can use to talk about
    them.

    The namespace files under /proc//ns/ can be opened and the
    file descriptor can be used to talk about a specific namespace, and
    to keep the specified namespace alive.

    A namespace can be kept alive by either holding the file descriptor
    open or bind mounting the file someplace else. aka:
    mount --bind /proc/self/ns/net /some/filesystem/path
    mount --bind /proc/self/fd/ /some/filesystem/path

    This allows namespaces to be named with userspace policy.

    It requires additional support to make use of these filedescriptors
    and that will be comming in the following patches.

    Acked-by: Daniel Lezcano
    Signed-off-by: Eric W. Biederman

    Eric W. Biederman
     

24 Mar, 2011

1 commit

  • After the previous cleanup in proc_get_sb() the global proc_mnt has no
    reasons to exists, kill it.

    Signed-off-by: Oleg Nesterov
    Signed-off-by: Eric W. Biederman
    Signed-off-by: Daniel Lezcano
    Cc: Alexey Dobriyan
    Acked-by: Serge E. Hallyn
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     

14 Jan, 2011

2 commits

  • - ->low_ino is write-once field -- reading it under locks is unnecessary.

    - /proc/$PID stuff never reaches pde_put()/free_proc_entry() --
    PROC_DYNAMIC_FIRST check never triggers.

    - in proc_get_inode(), inode number always matches proc dir entry, so
    save one parameter.

    Signed-off-by: Alexey Dobriyan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
     
  • /proc/*/statm code needlessly truncates data from unsigned long to int.
    One needs only 8+ TB of RAM to make truncation visible.

    Signed-off-by: Alexey Dobriyan
    Reviewed-by: WANG Cong
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
     

16 Dec, 2009

1 commit

  • * de_get() is trivial -- make inline, save a few bits of code, drop
    "refcount is 0" check -- it should be done in some generic refcount
    code, don't recall it's was helpful

    * rename GET and PUT functions to pde_get(), pde_put() for cool prefix!

    * remove obvious and incorrent comments

    * in remove_proc_entry() use pde_put(), when I fixed PDE refcounting to
    be normal one, remove_proc_entry() was supposed to do "-1" and code now
    reflects that.

    Signed-off-by: Alexey Dobriyan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
     

12 Jun, 2009

1 commit


31 Mar, 2009

1 commit

  • struct proc_dir_entry::owner is going to be removed. Now it's only necessary
    to protect PDEs which are using ->read_proc, ->write_proc hooks.

    However, ->owner assignments are racy and make it very easy for someone to switch
    ->owner on live PDE (as some subsystems do) without fixing refcounts and so on.

    http://bugzilla.kernel.org/show_bug.cgi?id=12454

    So, ->owner is on death row.

    Proxy file operations exist already (proc_file_operations), just bump usecount
    when necessary.

    Signed-off-by: Alexey Dobriyan

    Alexey Dobriyan
     

08 Jan, 2009

1 commit

  • Make VMAs per mm_struct as for MMU-mode linux. This solves two problems:

    (1) In SYSV SHM where nattch for a segment does not reflect the number of
    shmat's (and forks) done.

    (2) In mmap() where the VMA's vm_mm is set to point to the parent mm by an
    exec'ing process when VM_EXECUTABLE is specified, regardless of the fact
    that a VMA might be shared and already have its vm_mm assigned to another
    process or a dead process.

    A new struct (vm_region) is introduced to track a mapped region and to remember
    the circumstances under which it may be shared and the vm_list_struct structure
    is discarded as it's no longer required.

    This patch makes the following additional changes:

    (1) Regions are now allocated with alloc_pages() rather than kmalloc() and
    with no recourse to __GFP_COMP, so the pages are not composite. Instead,
    each page has a reference on it held by the region. Anything else that is
    interested in such a page will have to get a reference on it to retain it.
    When the pages are released due to unmapping, each page is passed to
    put_page() and will be freed when the page usage count reaches zero.

    (2) Excess pages are trimmed after an allocation as the allocation must be
    made as a power-of-2 quantity of pages.

    (3) VMAs are added to the parent MM's R/B tree and mmap lists. As an MM may
    end up with overlapping VMAs within the tree, the VMA struct address is
    appended to the sort key.

    (4) Non-anonymous VMAs are now added to the backing inode's prio list.

    (5) Holes may be punched in anonymous VMAs with munmap(), releasing parts of
    the backing region. The VMA and region structs will be split if
    necessary.

    (6) sys_shmdt() only releases one attachment to a SYSV IPC shared memory
    segment instead of all the attachments at that addresss. Multiple
    shmat()'s return the same address under NOMMU-mode instead of different
    virtual addresses as under MMU-mode.

    (7) Core dumping for ELF-FDPIC requires fewer exceptions for NOMMU-mode.

    (8) /proc/maps is now the global list of mapped regions, and may list bits
    that aren't actually mapped anywhere.

    (9) /proc/meminfo gains a line (tagged "MmapCopy") that indicates the amount
    of RAM currently allocated by mmap to hold mappable regions that can't be
    mapped directly. These are copies of the backing device or file if not
    anonymous.

    These changes make NOMMU mode more similar to MMU mode. The downside is that
    NOMMU mode requires some extra memory to track things over NOMMU without this
    patch (VMAs are no longer shared, and there are now region structs).

    Signed-off-by: David Howells
    Tested-by: Mike Frysinger
    Acked-by: Paul Mundt

    David Howells
     

23 Oct, 2008

2 commits


10 Oct, 2008

1 commit

  • After commit 831830b5a2b5d413407adf380ef62fe17d6fcbf2 aka
    "restrict reading from /proc//maps to those who share ->mm or can ptrace"
    sysctl stopped being relevant because commit moved security checks from ->show
    time to ->start time (mm_for_maps()).

    Signed-off-by: Alexey Dobriyan
    Acked-by: Kees Cook

    Alexey Dobriyan
     

26 Jul, 2008

2 commits

  • Current two-stage scheme of removing PDE emphasizes one bug in proc:

    open
    rmmod
    remove_proc_entry
    close

    ->release won't be called because ->proc_fops were cleared. In simple
    cases it's small memory leak.

    For every ->open, ->release has to be done. List of openers is introduced
    which is traversed at remove_proc_entry() if neeeded.

    Discussions with Al long ago (sigh).

    Signed-off-by: Alexey Dobriyan
    Cc: Al Viro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
     
  • This patch moves the extern of struct proc_kmsg_operations to
    fs/proc/internal.h and adds an #include "internal.h" to fs/proc/kmsg.c
    so that the latter sees the former.

    Signed-off-by: Adrian Bunk
    Cc: Alexey Dobriyan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Adrian Bunk
     

29 Apr, 2008

3 commits

  • Remove proc_root export. Creation and removal works well if parent PDE is
    supplied as NULL -- it worked always that way.

    So, one useless export removed and consistency added, some drivers created
    PDEs with &proc_root as parent but removed them as NULL and so on.

    Signed-off-by: Alexey Dobriyan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
     
  • Signed-off-by: Alexey Dobriyan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
     
  • The kernel implements readlink of /proc/pid/exe by getting the file from
    the first executable VMA. Then the path to the file is reconstructed and
    reported as the result.

    Because of the VMA walk the code is slightly different on nommu systems.
    This patch avoids separate /proc/pid/exe code on nommu systems. Instead of
    walking the VMAs to find the first executable file-backed VMA we store a
    reference to the exec'd file in the mm_struct.

    That reference would prevent the filesystem holding the executable file
    from being unmounted even after unmapping the VMAs. So we track the number
    of VM_EXECUTABLE VMAs and drop the new reference when the last one is
    unmapped. This avoids pinning the mounted filesystem.

    [akpm@linux-foundation.org: improve comments]
    [yamamoto@valinux.co.jp: fix dup_mmap]
    Signed-off-by: Matt Helsley
    Cc: Oleg Nesterov
    Cc: David Howells
    Cc:"Eric W. Biederman"
    Cc: Christoph Hellwig
    Cc: Al Viro
    Cc: Hugh Dickins
    Signed-off-by: YAMAMOTO Takashi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Matt Helsley
     

08 Mar, 2008

1 commit

  • Current /proc/net is done with so called "shadows", but current
    implementation is broken and has little chances to get fixed.

    The problem is that dentries subtree of /proc/net directory has
    fancy revalidation rules to make processes living in different
    net namespaces see different entries in /proc/net subtree, but
    currently, tasks see in the /proc/net subdir the contents of any
    other namespace, depending on who opened the file first.

    The proposed fix is to turn /proc/net into a symlink, which points
    to /proc/self/net, which in turn shows what previously was in
    /proc/net - the network-related info, from the net namespace the
    appropriate task lives in.

    # ls -l /proc/net
    lrwxrwxrwx 1 root root 8 Mar 5 15:17 /proc/net -> self/net

    In other words - this behaves like /proc/mounts, but unlike
    "mounts", "net" is not a file, but a directory.

    Changes from v2:
    * Fixed discrepancy of /proc/net nlink count and selinux labeling
    screwup pointed out by Stephen.

    To get the correct nlink count the ->getattr callback for /proc/net
    is overridden to read one from the net->proc_net entry.

    To make selinux still work the net->proc_net entry is initialized
    properly, i.e. with the "net" name and the proc_net parent.

    Selinux fixes are
    Acked-by: Stephen Smalley

    Changes from v1:
    * Fixed a task_struct leak in get_proc_task_net, pointed out by Paul.

    Signed-off-by: Pavel Emelyanov
    Acked-by: "Eric W. Biederman"
    Signed-off-by: David S. Miller

    Pavel Emelyanov
     

15 Feb, 2008

1 commit

  • proc_get_link() is always called with a dentry and a vfsmount from a struct
    path. Make proc_get_link() take it directly as an argument.

    Signed-off-by: Jan Blunck
    Acked-by: Christoph Hellwig
    Cc: Al Viro
    Cc: "J. Bruce Fields"
    Cc: Neil Brown
    Cc: "Eric W. Biederman"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jan Blunck
     

09 Feb, 2008

3 commits

  • Currently we possibly lookup the pid in the wrong pid namespace. So
    seq_file convert proc_pid_status which ensures the proper pid namespaces is
    passed in.

    [akpm@linux-foundation.org: coding-style fixes]
    [akpm@linux-foundation.org: build fix]
    [akpm@linux-foundation.org: another build fix]
    [akpm@linux-foundation.org: s390 build fix]
    [akpm@linux-foundation.org: fix task_name() output]
    [akpm@linux-foundation.org: fix nommu build]
    Signed-off-by: Eric W. Biederman
    Cc: Andrew Morgan
    Cc: Serge Hallyn
    Cc: Cedric Le Goater
    Cc: Pavel Emelyanov
    Cc: Martin Schwidefsky
    Cc: Heiko Carstens
    Cc: Paul Menage
    Cc: Paul Jackson
    Cc: David Rientjes
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Eric W. Biederman
     
  • This conversion is just for code cleanliness, uniformity, and general safety.

    Signed-off-by: Eric W. Biederman
    Cc: Oleg Nesterov
    Cc: Alexey Dobriyan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Eric W. Biederman
     
  • Currently (as pointed out by Oleg) do_task_stat has a race when calling
    task_pid_nr_ns with the task exiting. In addition do_task_stat is not
    currently displaying information in the context of the pid namespace that
    mounted the /proc filesystem. So "cut -d' ' -f 1 /proc//stat" may not
    equal .

    This patch fixes the problem by converting to a single_open seq_file show
    method. Getting the pid namespace from the filesystem superblock instead of
    current, and simply using the the struct pid from the inode instead of
    attempting to get that same pid from the task.

    Signed-off-by: Eric W. Biederman
    Cc: Oleg Nesterov
    Cc: Alexey Dobriyan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Eric W. Biederman
     

06 Feb, 2008

2 commits

  • This interface provides a mapping for each page in an address space to its
    physical page frame number, allowing precise determination of what pages are
    mapped and what pages are shared between processes.

    New in this version:

    - headers gone again (as recommended by Dave Hansen and Alan Cox)
    - 64-bit entries (as per discussion with Andi Kleen)
    - swap pte information exported (from Dave Hansen)
    - page walker callback for holes (from Dave Hansen)
    - direct put_user I/O (as suggested by Rusty Russell)

    This patch folds in cleanups and swap PTE support from Dave Hansen
    .

    Signed-off-by: Matt Mackall
    Cc: Dave Hansen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Matt Mackall
     
  • This puts all the clear_refs code where it belongs and probably lets things
    compile on MMU-less systems as well.

    Signed-off-by: Matt Mackall
    Cc: Jeremy Fitzhardinge
    Cc: David Rientjes
    Cc: Dave Hansen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Matt Mackall
     

03 Jan, 2008

1 commit

  • Contents of /proc/*/maps is sensitive and may become sensitive after
    open() (e.g. if target originally shares our ->mm and later does exec
    on suid-root binary).

    Check at read() (actually, ->start() of iterator) time that mm_struct
    we'd grabbed and locked is
    - still the ->mm of target
    - equal to reader's ->mm or the target is ptracable by reader.

    Signed-off-by: Al Viro
    Acked-by: Rik van Riel
    Signed-off-by: Linus Torvalds

    Al Viro
     

30 Nov, 2007

1 commit

  • proc_kill_inodes() can clear ->i_fop in the middle of vfs_readdir resulting in
    NULL dereference during "file->f_op->readdir(file, buf, filler)".

    The solution is to remove proc_kill_inodes() completely:

    a) we don't have tricky modules implementing their tricky readdir hooks which
    could keeping this revoke from hell.

    b) In a situation when module is gone but PDE still alive, standard
    readdir will return only "." and "..", because pde->next was cleared by
    remove_proc_entry().

    c) the race proc_kill_inode() destined to prevent is not completely
    fixed, just race window made smaller, because vfs_readdir() is run
    without sb_lock held and without file_list_lock held. Effectively,
    ->i_fop is cleared at random moment, which can't fix properly anything.

    BUG: unable to handle kernel NULL pointer dereference at virtual address 00000018
    printing eip: c1061205 *pdpt = 0000000005b22001 *pde = 0000000000000000
    Oops: 0000 [#1] PREEMPT SMP
    Modules linked in: foo af_packet ipv6 cpufreq_ondemand loop serio_raw sr_mod k8temp cdrom hwmon amd_rng
    Pid: 2033, comm: find Not tainted (2.6.24-rc1-b1d08ac064268d0ae2281e98bf5e82627e0f0c56 #2)
    EIP: 0060:[] EFLAGS: 00010246 CPU: 0
    EIP is at vfs_readdir+0x47/0x74
    EAX: c6b6a780 EBX: 00000000 ECX: c1061040 EDX: c5decf94
    ESI: c6b6a780 EDI: fffffffe EBP: c9797c54 ESP: c5decf78
    DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
    Process find (pid: 2033, ti=c5dec000 task=c64bba90 task.ti=c5dec000)
    Stack: c5decf94 c1061040 fffffff7 0805ffbc 00000000 c6b6a780 c1061295 0805ffbc
    00000000 00000400 00000000 00000004 0805ffbc 4588eff4 c5dec000 c10026ba
    00000004 0805ffbc 00000400 0805ffbc 4588eff4 bfdc6c70 000000dc 0000007b
    Call Trace:
    [] filldir64+0x0/0xc5
    [] sys_getdents64+0x63/0xa5
    [] sysenter_past_esp+0x5f/0x85
    =======================
    Code: 49 83 78 18 00 74 43 8d 6b 74 bf fe ff ff ff 89 e8 e8 b8 c0 12 00 f6 83 2c 01 00 00 10 75 22 8b 5e 10 8b 4c 24 04 89 f0 8b 14 24 53 18 f6 46 1a 04 89 c7 75 0b 8b 56 0c 8b 46 08 e8 c8 66 00
    EIP: [] vfs_readdir+0x47/0x74 SS:ESP 0068:c5decf78

    hch: "Nice, getting rid of this is a very good step formwards.
    Unfortunately we have another copy of this junk in
    security/selinux/selinuxfs.c:sel_remove_entries() which would need the
    same treatment."

    Signed-off-by: Alexey Dobriyan
    Acked-by: Christoph Hellwig
    Cc: Al Viro
    Cc: Stephen Smalley
    Cc: James Morris
    Cc: "Eric W. Biederman"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan