28 Feb, 2013

1 commit

  • - use pr_foo() throughout

    - remove a couple of duplicated KERN_WARNINGs, via WARN(KERN_WARNING "...")

    - nuke a few warnings which I've never seen happen, ever.

    Cc: Joe Perches
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
     

26 Feb, 2013

2 commits

  • Make it drop the pde in *all* cases when no new reference to it is
    put into an inode - both when an inode had already been set up
    (as we were already doing) and when inode allocation has failed.
    Makes for simpler logics in callers...

    Signed-off-by: Al Viro

    Al Viro
     
  • If proc_get_inode() succeeded, but d_make_root() failed, pde_put() for
    proc_root will be called twice: the first time due to iput() called from
    d_make_root() and the second time directly in the end of
    proc_fill_super().

    Signed-off-by: Maxim Patlasov
    Cc: Al Viro
    Signed-off-by: Andrew Morton
    Signed-off-by: Al Viro

    Maxim Patlasov
     

23 Feb, 2013

1 commit


20 Nov, 2012

1 commit

  • Change the proc namespace files into symlinks so that
    we won't cache the dentries for the namespace files
    which can bypass the ptrace_may_access checks.

    To support the symlinks create an additional namespace
    inode with it's own set of operations distinct from the
    proc pid inode and dentry methods as those no longer
    make sense.

    Signed-off-by: Eric W. Biederman

    Eric W. Biederman
     

06 Oct, 2012

1 commit

  • proc_get_inode() obtains the inode via a call to iget_locked().
    iget_locked() calls alloc_inode() which will call proc_alloc_inode() which
    clears proc_inode.fd, so there is no need to clear this field in
    proc_get_inode().

    If iget_locked() instead found the inode via find_inode_fast(), that inode
    will not have I_NEW set so this change has no effect.

    Signed-off-by: yan
    Cc: Al Viro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Yan
     

29 May, 2012

1 commit

  • Pull writeback tree from Wu Fengguang:
    "Mainly from Jan Kara to avoid iput() in the flusher threads."

    * tag 'writeback' of git://git.kernel.org/pub/scm/linux/kernel/git/wfg/linux:
    writeback: Avoid iput() from flusher thread
    vfs: Rename end_writeback() to clear_inode()
    vfs: Move waiting for inode writeback from end_writeback() to evict_inode()
    writeback: Refactor writeback_single_inode()
    writeback: Remove wb->list_lock from writeback_single_inode()
    writeback: Separate inode requeueing after writeback
    writeback: Move I_DIRTY_PAGES handling
    writeback: Move requeueing when I_SYNC set to writeback_sb_inodes()
    writeback: Move clearing of I_SYNC into inode_sync_complete()
    writeback: initialize global_dirty_limit
    fs: remove 8 bytes of padding from struct writeback_control on 64 bit builds
    mm: page-writeback.c: local functions should not be exposed globally

    Linus Torvalds
     

16 May, 2012

1 commit


06 May, 2012

1 commit

  • After we moved inode_sync_wait() from end_writeback() it doesn't make sense
    to call the function end_writeback() anymore. Rename it to clear_inode()
    which well says what the function really does - set I_CLEAR flag.

    Signed-off-by: Jan Kara
    Signed-off-by: Fengguang Wu

    Jan Kara
     

29 Mar, 2012

1 commit


21 Mar, 2012

2 commits


11 Jan, 2012

2 commits

  • Add support for mount options to restrict access to /proc/PID/
    directories. The default backward-compatible "relaxed" behaviour is left
    untouched.

    The first mount option is called "hidepid" and its value defines how much
    info about processes we want to be available for non-owners:

    hidepid=0 (default) means the old behavior - anybody may read all
    world-readable /proc/PID/* files.

    hidepid=1 means users may not access any /proc// directories, but
    their own. Sensitive files like cmdline, sched*, status are now protected
    against other users. As permission checking done in proc_pid_permission()
    and files' permissions are left untouched, programs expecting specific
    files' modes are not confused.

    hidepid=2 means hidepid=1 plus all /proc/PID/ will be invisible to other
    users. It doesn't mean that it hides whether a process exists (it can be
    learned by other means, e.g. by kill -0 $PID), but it hides process' euid
    and egid. It compicates intruder's task of gathering info about running
    processes, whether some daemon runs with elevated privileges, whether
    another user runs some sensitive program, whether other users run any
    program at all, etc.

    gid=XXX defines a group that will be able to gather all processes' info
    (as in hidepid=0 mode). This group should be used instead of putting
    nonroot user in sudoers file or something. However, untrusted users (like
    daemons, etc.) which are not supposed to monitor the tasks in the whole
    system should not be added to the group.

    hidepid=1 or higher is designed to restrict access to procfs files, which
    might reveal some sensitive private information like precise keystrokes
    timings:

    http://www.openwall.com/lists/oss-security/2011/11/05/3

    hidepid=1/2 doesn't break monitoring userspace tools. ps, top, pgrep, and
    conky gracefully handle EPERM/ENOENT and behave as if the current user is
    the only user running processes. pstree shows the process subtree which
    contains "pstree" process.

    Note: the patch doesn't deal with setuid/setgid issues of keeping
    preopened descriptors of procfs files (like
    https://lkml.org/lkml/2011/2/7/368). We rely on that the leaked
    information like the scheduling counters of setuid apps doesn't threaten
    anybody's privacy - only the user started the setuid program may read the
    counters.

    Signed-off-by: Vasiliy Kulikov
    Cc: Alexey Dobriyan
    Cc: Al Viro
    Cc: Randy Dunlap
    Cc: "H. Peter Anvin"
    Cc: Greg KH
    Cc: Theodore Tso
    Cc: Alan Cox
    Cc: James Morris
    Cc: Oleg Nesterov
    Cc: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vasiliy Kulikov
     
  • Add support for procfs mount options. Actual mount options are coming in
    the next patches.

    Signed-off-by: Vasiliy Kulikov
    Cc: Alexey Dobriyan
    Cc: Al Viro
    Cc: Randy Dunlap
    Cc: "H. Peter Anvin"
    Cc: Greg KH
    Cc: Theodore Tso
    Cc: Alan Cox
    Cc: James Morris
    Cc: Oleg Nesterov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vasiliy Kulikov
     

04 Jan, 2012

1 commit

  • Seeing that just about every destructor got that INIT_LIST_HEAD() copied into
    it, there is no point whatsoever keeping this INIT_LIST_HEAD in inode_init_once();
    the cost of taking it into inode_init_always() will be negligible for pipes
    and sockets and negative for everything else. Not to mention the removal of
    boilerplate code from ->destroy_inode() instances...

    Signed-off-by: Al Viro

    Al Viro
     

02 Nov, 2011

1 commit


27 Jul, 2011

1 commit

  • Change the return value to ENOENT. This return value is then returned
    when opening the proc entry that have been removed. For example,
    open("/proc/bus/pci/XX/YY") when the corresponding device is being
    hot-removed.

    Signed-off-by: Daisuke Ogino
    Cc: Jesse Barnes
    Acked-by: Alexey Dobriyan
    Cc: Christoph Hellwig
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Daisuke Ogino
     

11 May, 2011

1 commit

  • Create files under /proc//ns/ to allow controlling the
    namespaces of a process.

    This addresses three specific problems that can make namespaces hard to
    work with.
    - Namespaces require a dedicated process to pin them in memory.
    - It is not possible to use a namespace unless you are the child
    of the original creator.
    - Namespaces don't have names that userspace can use to talk about
    them.

    The namespace files under /proc//ns/ can be opened and the
    file descriptor can be used to talk about a specific namespace, and
    to keep the specified namespace alive.

    A namespace can be kept alive by either holding the file descriptor
    open or bind mounting the file someplace else. aka:
    mount --bind /proc/self/ns/net /some/filesystem/path
    mount --bind /proc/self/fd/ /some/filesystem/path

    This allows namespaces to be named with userspace policy.

    It requires additional support to make use of these filedescriptors
    and that will be comming in the following patches.

    Acked-by: Daniel Lezcano
    Signed-off-by: Eric W. Biederman

    Eric W. Biederman
     

24 Mar, 2011

1 commit

  • After the previous cleanup in proc_get_sb() the global proc_mnt has no
    reasons to exists, kill it.

    Signed-off-by: Oleg Nesterov
    Signed-off-by: Eric W. Biederman
    Signed-off-by: Daniel Lezcano
    Cc: Alexey Dobriyan
    Acked-by: Serge E. Hallyn
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     

08 Mar, 2011

1 commit

  • a) struct inode is not going to be freed under ->d_compare();
    however, the thing PROC_I(inode)->sysctl points to just might.
    Fortunately, it's enough to make freeing that sucker delayed,
    provided that we don't step on its ->unregistering, clear
    the pointer to it in PROC_I(inode) before dropping the reference
    and check if it's NULL in ->d_compare().

    b) I'm not sure that we *can* walk into NULL inode here (we recheck
    dentry->seq between verifying that it's still hashed / fetching
    dentry->d_inode and passing it to ->d_compare() and there's no
    negative hashed dentries in /proc/sys/*), but if we can walk into
    that, we really should not have ->d_compare() return 0 on it!
    Said that, I really suspect that this check can be simply killed.
    Nick?

    Signed-off-by: Al Viro

    Al Viro
     

14 Jan, 2011

1 commit

  • - ->low_ino is write-once field -- reading it under locks is unnecessary.

    - /proc/$PID stuff never reaches pde_put()/free_proc_entry() --
    PROC_DYNAMIC_FIRST check never triggers.

    - in proc_get_inode(), inode number always matches proc dir entry, so
    save one parameter.

    Signed-off-by: Alexey Dobriyan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
     

07 Jan, 2011

1 commit

  • RCU free the struct inode. This will allow:

    - Subsequent store-free path walking patch. The inode must be consulted for
    permissions when walking, so an RCU inode reference is a must.
    - sb_inode_list_lock to be moved inside i_lock because sb list walkers who want
    to take i_lock no longer need to take sb_inode_list_lock to walk the list in
    the first place. This will simplify and optimize locking.
    - Could remove some nested trylock loops in dcache code
    - Could potentially simplify things a bit in VM land. Do not need to take the
    page lock to follow page->mapping.

    The downsides of this is the performance cost of using RCU. In a simple
    creat/unlink microbenchmark, performance drops by about 10% due to inability to
    reuse cache-hot slab objects. As iterations increase and RCU freeing starts
    kicking over, this increases to about 20%.

    In cases where inode lifetimes are longer (ie. many inodes may be allocated
    during the average life span of a single inode), a lot of this cache reuse is
    not applicable, so the regression caused by this patch is smaller.

    The cache-hot regression could largely be avoided by using SLAB_DESTROY_BY_RCU,
    however this adds some complexity to list walking and store-free path walking,
    so I prefer to implement this at a later date, if it is shown to be a win in
    real situations. I haven't found a regression in any non-micro benchmark so I
    doubt it will be a problem.

    Signed-off-by: Nick Piggin

    Nick Piggin
     

18 Nov, 2010

1 commit


14 Aug, 2010

1 commit


10 Aug, 2010

1 commit


20 May, 2010

1 commit


17 May, 2010

1 commit

  • There are no more users of procfs that implement the ioctl
    callback. Drop the bkl from this path and warn on any use
    of this callback.

    Signed-off-by: Frederic Weisbecker
    Cc: Arnd Bergmann
    Cc: Thomas Gleixner
    Cc: Andrew Morton
    Cc: Ingo Molnar
    Cc: John Kacur
    Cc: KAMEZAWA Hiroyuki
    Cc: Al Viro

    Frederic Weisbecker
     

30 Mar, 2010

1 commit

  • …it slab.h inclusion from percpu.h

    percpu.h is included by sched.h and module.h and thus ends up being
    included when building most .c files. percpu.h includes slab.h which
    in turn includes gfp.h making everything defined by the two files
    universally available and complicating inclusion dependencies.

    percpu.h -> slab.h dependency is about to be removed. Prepare for
    this change by updating users of gfp and slab facilities include those
    headers directly instead of assuming availability. As this conversion
    needs to touch large number of source files, the following script is
    used as the basis of conversion.

    http://userweb.kernel.org/~tj/misc/slabh-sweep.py

    The script does the followings.

    * Scan files for gfp and slab usages and update includes such that
    only the necessary includes are there. ie. if only gfp is used,
    gfp.h, if slab is used, slab.h.

    * When the script inserts a new include, it looks at the include
    blocks and try to put the new include such that its order conforms
    to its surrounding. It's put in the include block which contains
    core kernel includes, in the same order that the rest are ordered -
    alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
    doesn't seem to be any matching order.

    * If the script can't find a place to put a new include (mostly
    because the file doesn't have fitting include block), it prints out
    an error message indicating which .h file needs to be added to the
    file.

    The conversion was done in the following steps.

    1. The initial automatic conversion of all .c files updated slightly
    over 4000 files, deleting around 700 includes and adding ~480 gfp.h
    and ~3000 slab.h inclusions. The script emitted errors for ~400
    files.

    2. Each error was manually checked. Some didn't need the inclusion,
    some needed manual addition while adding it to implementation .h or
    embedding .c file was more appropriate for others. This step added
    inclusions to around 150 files.

    3. The script was run again and the output was compared to the edits
    from #2 to make sure no file was left behind.

    4. Several build tests were done and a couple of problems were fixed.
    e.g. lib/decompress_*.c used malloc/free() wrappers around slab
    APIs requiring slab.h to be added manually.

    5. The script was run on all .h files but without automatically
    editing them as sprinkling gfp.h and slab.h inclusions around .h
    files could easily lead to inclusion dependency hell. Most gfp.h
    inclusion directives were ignored as stuff from gfp.h was usually
    wildly available and often used in preprocessor macros. Each
    slab.h inclusion directive was examined and added manually as
    necessary.

    6. percpu.h was updated not to include slab.h.

    7. Build test were done on the following configurations and failures
    were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my
    distributed build env didn't work with gcov compiles) and a few
    more options had to be turned off depending on archs to make things
    build (like ipr on powerpc/64 which failed due to missing writeq).

    * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
    * powerpc and powerpc64 SMP allmodconfig
    * sparc and sparc64 SMP allmodconfig
    * ia64 SMP allmodconfig
    * s390 SMP allmodconfig
    * alpha SMP allmodconfig
    * um on x86_64 SMP allmodconfig

    8. percpu.h modifications were reverted so that it could be applied as
    a separate patch and serve as bisection point.

    Given the fact that I had only a couple of failures from tests on step
    6, I'm fairly confident about the coverage of this conversion patch.
    If there is a breakage, it's likely to be something in one of the arch
    headers which should be easily discoverable easily on most builds of
    the specific arch.

    Signed-off-by: Tejun Heo <tj@kernel.org>
    Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
    Cc: Ingo Molnar <mingo@redhat.com>
    Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>

    Tejun Heo
     

16 Dec, 2009

1 commit

  • * de_get() is trivial -- make inline, save a few bits of code, drop
    "refcount is 0" check -- it should be done in some generic refcount
    code, don't recall it's was helpful

    * rename GET and PUT functions to pde_get(), pde_put() for cool prefix!

    * remove obvious and incorrent comments

    * in remove_proc_entry() use pde_put(), when I fixed PDE refcounting to
    be normal one, remove_proc_entry() was supposed to do "-1" and code now
    reflects that.

    Signed-off-by: Alexey Dobriyan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
     

31 Mar, 2009

2 commits

  • Setting ->owner as done currently (pde->owner = THIS_MODULE) is racy
    as correctly noted at bug #12454. Someone can lookup entry with NULL
    ->owner, thus not pinning enything, and release it later resulting
    in module refcount underflow.

    We can keep ->owner and supply it at registration time like ->proc_fops
    and ->data.

    But this leaves ->owner as easy-manipulative field (just one C assignment)
    and somebody will forget to unpin previous/pin current module when
    switching ->owner. ->proc_fops is declared as "const" which should give
    some thoughts.

    ->read_proc/->write_proc were just fixed to not require ->owner for
    protection.

    rmmod'ed directories will be empty and return "." and ".." -- no harm.
    And directories with tricky enough readdir and lookup shouldn't be modular.
    We definitely don't want such modular code.

    Removing ->owner will also make PDE smaller.

    So, let's nuke it.

    Kudos to Jeff Layton for reminding about this, let's say, oversight.

    http://bugzilla.kernel.org/show_bug.cgi?id=12454

    Signed-off-by: Alexey Dobriyan

    Alexey Dobriyan
     
  • struct proc_dir_entry::owner is going to be removed. Now it's only necessary
    to protect PDEs which are using ->read_proc, ->write_proc hooks.

    However, ->owner assignments are racy and make it very easy for someone to switch
    ->owner on live PDE (as some subsystems do) without fixing refcounts and so on.

    http://bugzilla.kernel.org/show_bug.cgi?id=12454

    So, ->owner is on death row.

    Proxy file operations exist already (proc_file_operations), just bump usecount
    when necessary.

    Signed-off-by: Alexey Dobriyan

    Alexey Dobriyan
     

24 Feb, 2009

1 commit

  • de_get is called before every proc_get_inode, but corresponding de_put is
    called only when dropping last reference to an inode. This might cause
    something like
    remove_proc_entry: /proc/stats busy, count=14496
    to be printed to the syslog.

    The fix is to call de_put in case of an already initialized inode in
    proc_get_inode.

    Signed-off-by: Krzysztof Sachanowicz
    Tested-by: Marcin Pilipczuk
    Acked-by: Al Viro
    Signed-off-by: Linus Torvalds

    Krzysztof Sachanowicz
     

05 Jan, 2009

1 commit

  • There are four BKL users in proc: de_put(), proc_lookup_de(),
    proc_readdir_de(), proc_root_readdir(),

    1) de_put()
    -----------
    de_put() is classic atomic_dec_and_test() refcount wrapper -- no BKL
    needed. BKL doesn't matter to possible refcount leak as well.

    2) proc_lookup_de()
    -------------------
    Walking PDE list is protected by proc_subdir_lock(), proc_get_inode() is
    potentially blocking, all callers of proc_lookup_de() eventually end up
    from ->lookup hooks which is protected by directory's ->i_mutex -- BKL
    doesn't protect anything.

    3) proc_readdir_de()
    --------------------
    "." and ".." part doesn't need BKL, walking PDE list is under
    proc_subdir_lock, calling filldir callback is potentially blocking
    because it writes to luserspace. All proc_readdir_de() callers
    eventually come from ->readdir hook which is under directory's
    ->i_mutex -- BKL doesn't protect anything.

    4) proc_root_readdir_de()
    -------------------------
    proc_root_readdir_de is ->readdir hook, see (3).

    Since readdir hooks doesn't use BKL anymore, switch to
    generic_file_llseek, since it also takes directory's i_mutex.

    Signed-off-by: Alexey Dobriyan

    Alexey Dobriyan
     

23 Oct, 2008

1 commit


10 Oct, 2008

1 commit


27 Jul, 2008

2 commits

  • * keep references to ctl_table_head and ctl_table in /proc/sys inodes
    * grab the former during operations, use the latter for access to
    entry if that succeeds
    * have ->d_compare() check if table should be seen for one who does lookup;
    that allows us to avoid flipping inodes - if we have the same name resolve
    to different things, we'll just keep several dentries and ->d_compare()
    will reject the wrong ones.
    * have ->lookup() and ->readdir() scan the table of our inode first, then
    walk all ctl_table_header and scan ->attached_by for those that are
    attached to our directory.
    * implement ->getattr().
    * get rid of insane amounts of tree-walking
    * get rid of the need to know dentry in ->permission() and of the contortions
    induced by that.

    Signed-off-by: Al Viro

    Al Viro
     
  • Kmem cache passed to constructor is only needed for constructors that are
    themselves multiplexeres. Nobody uses this "feature", nor does anybody uses
    passed kmem cache in non-trivial way, so pass only pointer to object.

    Non-trivial places are:
    arch/powerpc/mm/init_64.c
    arch/powerpc/mm/hugetlbpage.c

    This is flag day, yes.

    Signed-off-by: Alexey Dobriyan
    Acked-by: Pekka Enberg
    Acked-by: Christoph Lameter
    Cc: Jon Tollefson
    Cc: Nick Piggin
    Cc: Matt Mackall
    [akpm@linux-foundation.org: fix arch/powerpc/mm/hugetlbpage.c]
    [akpm@linux-foundation.org: fix mm/slab.c]
    [akpm@linux-foundation.org: fix ubifs]
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
     

26 Jul, 2008

2 commits

  • MS_RMT_MASK will unmask changes in do_remount_sb() anyway.

    Signed-off-by: Alexey Dobriyan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
     
  • Current two-stage scheme of removing PDE emphasizes one bug in proc:

    open
    rmmod
    remove_proc_entry
    close

    ->release won't be called because ->proc_fops were cleared. In simple
    cases it's small memory leak.

    For every ->open, ->release has to be done. List of openers is introduced
    which is traversed at remove_proc_entry() if neeeded.

    Discussions with Al long ago (sigh).

    Signed-off-by: Alexey Dobriyan
    Cc: Al Viro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
     

25 May, 2008

1 commit

  • Any file under /proc/net opened more than once leaked the refcounter
    on the module it belongs to.

    The problem is that module_get is called for each file opening while
    module_put is called only when /proc inode is destroyed. So, lets put
    module counter if we are dealing with already initialised inode.

    Addresses http://bugzilla.kernel.org/show_bug.cgi?id=10737

    Signed-off-by: Denis V. Lunev
    Cc: David Miller
    Cc: Patrick McHardy
    Acked-by: Pavel Emelyanov
    Acked-by: Robert Olsson
    Acked-by: Eric W. Biederman
    Reported-by: Roland Kletzing
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Denis V. Lunev