11 May, 2011

1 commit

  • Create files under /proc/<pid>/ns/ to allow controlling the
    namespaces of a process.

    This addresses three specific problems that can make namespaces hard to
    work with.
    - Namespaces require a dedicated process to pin them in memory.
    - It is not possible to use a namespace unless you are the child
    of the original creator.
    - Namespaces don't have names that userspace can use to talk about
    them.

    The namespace files under /proc/<pid>/ns/ can be opened and the
    file descriptor can be used to talk about a specific namespace, and
    to keep the specified namespace alive.

    A namespace can be kept alive by either holding the file descriptor
    open or bind mounting the file someplace else. aka:
    mount --bind /proc/self/ns/net /some/filesystem/path
    mount --bind /proc/self/fd/<N> /some/filesystem/path

    This allows namespaces to be named with userspace policy.

    It requires additional support to make use of these file descriptors
    and that will be coming in the following patches.
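
    As a rough userspace illustration of "holding the file descriptor
    open", the sketch below opens a namespace file and returns the
    descriptor. pin_namespace() and unpin_namespace() are hypothetical
    helper names used only for this example; the only real interfaces
    assumed are open(2) and close(2).

```c
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Open a namespace file such as /proc/self/ns/net.  While the returned
 * descriptor stays open, the namespace it refers to is kept alive.
 * Returns the descriptor, or -1 if the file cannot be opened (for
 * example on a kernel without the namespace files).
 * pin_namespace() is a hypothetical helper, not a kernel or libc API.
 */
int pin_namespace(const char *ns_path)
{
    int fd = open(ns_path, O_RDONLY);

    if (fd < 0)
        perror(ns_path);
    return fd;
}

/* Drop the pin; the namespace can go away once nothing else uses it. */
void unpin_namespace(int fd)
{
    if (fd >= 0)
        close(fd);
}
```

    A caller would keep the result of pin_namespace("/proc/self/ns/net")
    for as long as the namespace has to stay alive.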

    Acked-by: Daniel Lezcano
    Signed-off-by: Eric W. Biederman

    Eric W. Biederman
     

24 Mar, 2011

1 commit

  • After the previous cleanup in proc_get_sb() the global proc_mnt has no
    reason to exist; kill it.

    Signed-off-by: Oleg Nesterov
    Signed-off-by: Eric W. Biederman
    Signed-off-by: Daniel Lezcano
    Cc: Alexey Dobriyan
    Acked-by: Serge E. Hallyn
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     

08 Mar, 2011

1 commit

  • a) struct inode is not going to be freed under ->d_compare();
    however, the thing PROC_I(inode)->sysctl points to just might.
    Fortunately, it's enough to make freeing that sucker delayed,
    provided that we don't step on its ->unregistering, clear
    the pointer to it in PROC_I(inode) before dropping the reference
    and check if it's NULL in ->d_compare().

    b) I'm not sure that we *can* walk into NULL inode here (we recheck
    dentry->seq between verifying that it's still hashed / fetching
    dentry->d_inode and passing it to ->d_compare() and there's no
    negative hashed dentries in /proc/sys/*), but if we can walk into
    that, we really should not have ->d_compare() return 0 on it!
    Said that, I really suspect that this check can be simply killed.
    Nick?

    Signed-off-by: Al Viro

    Al Viro
     

14 Jan, 2011

1 commit

  • - ->low_ino is write-once field -- reading it under locks is unnecessary.

    - /proc/$PID stuff never reaches pde_put()/free_proc_entry() --
    PROC_DYNAMIC_FIRST check never triggers.

    - in proc_get_inode(), inode number always matches proc dir entry, so
    save one parameter.

    Signed-off-by: Alexey Dobriyan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
     

07 Jan, 2011

1 commit

  • RCU free the struct inode. This will allow:

    - Subsequent store-free path walking patch. The inode must be consulted for
    permissions when walking, so an RCU inode reference is a must.
    - sb_inode_list_lock to be moved inside i_lock because sb list walkers who want
    to take i_lock no longer need to take sb_inode_list_lock to walk the list in
    the first place. This will simplify and optimize locking.
    - Could remove some nested trylock loops in dcache code
    - Could potentially simplify things a bit in VM land. Do not need to take the
    page lock to follow page->mapping.

    The downside of this is the performance cost of using RCU. In a simple
    creat/unlink microbenchmark, performance drops by about 10% due to inability to
    reuse cache-hot slab objects. As iterations increase and RCU freeing starts
    kicking over, this increases to about 20%.

    In cases where inode lifetimes are longer (ie. many inodes may be allocated
    during the average life span of a single inode), a lot of this cache reuse is
    not applicable, so the regression caused by this patch is smaller.

    The cache-hot regression could largely be avoided by using SLAB_DESTROY_BY_RCU,
    however this adds some complexity to list walking and store-free path walking,
    so I prefer to implement this at a later date, if it is shown to be a win in
    real situations. I haven't found a regression in any non-micro benchmark so I
    doubt it will be a problem.

    Signed-off-by: Nick Piggin

    Nick Piggin
     

18 Nov, 2010

1 commit


14 Aug, 2010

1 commit


10 Aug, 2010

1 commit


20 May, 2010

1 commit


17 May, 2010

1 commit

  • There are no more users of procfs that implement the ioctl
    callback. Drop the bkl from this path and warn on any use
    of this callback.

    Signed-off-by: Frederic Weisbecker
    Cc: Arnd Bergmann
    Cc: Thomas Gleixner
    Cc: Andrew Morton
    Cc: Ingo Molnar
    Cc: John Kacur
    Cc: KAMEZAWA Hiroyuki
    Cc: Al Viro

    Frederic Weisbecker
     

30 Mar, 2010

1 commit

  • …it slab.h inclusion from percpu.h

    percpu.h is included by sched.h and module.h and thus ends up being
    included when building most .c files. percpu.h includes slab.h which
    in turn includes gfp.h making everything defined by the two files
    universally available and complicating inclusion dependencies.

    percpu.h -> slab.h dependency is about to be removed. Prepare for
    this change by updating users of gfp and slab facilities to include those
    headers directly instead of assuming availability. As this conversion
    needs to touch large number of source files, the following script is
    used as the basis of conversion.

    http://userweb.kernel.org/~tj/misc/slabh-sweep.py

    The script does the following.

    * Scan files for gfp and slab usages and update includes such that
    only the necessary includes are there. ie. if only gfp is used,
    gfp.h, if slab is used, slab.h.

    * When the script inserts a new include, it looks at the include
    blocks and tries to put the new include such that its order conforms
    to its surrounding. It's put in the include block which contains
    core kernel includes, in the same order that the rest are ordered -
    alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
    doesn't seem to be any matching order.

    * If the script can't find a place to put a new include (mostly
    because the file doesn't have fitting include block), it prints out
    an error message indicating which .h file needs to be added to the
    file.

    The conversion was done in the following steps.

    1. The initial automatic conversion of all .c files updated slightly
    over 4000 files, deleting around 700 includes and adding ~480 gfp.h
    and ~3000 slab.h inclusions. The script emitted errors for ~400
    files.

    2. Each error was manually checked. Some didn't need the inclusion,
    some needed manual addition while adding it to implementation .h or
    embedding .c file was more appropriate for others. This step added
    inclusions to around 150 files.

    3. The script was run again and the output was compared to the edits
    from #2 to make sure no file was left behind.

    4. Several build tests were done and a couple of problems were fixed.
    e.g. lib/decompress_*.c used malloc/free() wrappers around slab
    APIs requiring slab.h to be added manually.

    5. The script was run on all .h files but without automatically
    editing them as sprinkling gfp.h and slab.h inclusions around .h
    files could easily lead to inclusion dependency hell. Most gfp.h
    inclusion directives were ignored as stuff from gfp.h was usually
    widely available and often used in preprocessor macros. Each
    slab.h inclusion directive was examined and added manually as
    necessary.

    6. percpu.h was updated not to include slab.h.

    7. Build tests were done on the following configurations and failures
    were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my
    distributed build env didn't work with gcov compiles) and a few
    more options had to be turned off depending on archs to make things
    build (like ipr on powerpc/64 which failed due to missing writeq).

    * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
    * powerpc and powerpc64 SMP allmodconfig
    * sparc and sparc64 SMP allmodconfig
    * ia64 SMP allmodconfig
    * s390 SMP allmodconfig
    * alpha SMP allmodconfig
    * um on x86_64 SMP allmodconfig

    8. percpu.h modifications were reverted so that it could be applied as
    a separate patch and serve as bisection point.

    Given the fact that I had only a couple of failures from tests on step
    6, I'm fairly confident about the coverage of this conversion patch.
    If there is a breakage, it's likely to be something in one of the arch
    headers which should be easily discoverable on most builds of
    the specific arch.

    Signed-off-by: Tejun Heo <tj@kernel.org>
    Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
    Cc: Ingo Molnar <mingo@redhat.com>
    Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>

    Tejun Heo
     

16 Dec, 2009

1 commit

  • * de_get() is trivial -- make inline, save a few bits of code, drop
    "refcount is 0" check -- it should be done in some generic refcount
    code; I don't recall it ever being helpful

    * rename GET and PUT functions to pde_get(), pde_put() for cool prefix!

    * remove obvious and incorrect comments

    * in remove_proc_entry() use pde_put(), when I fixed PDE refcounting to
    be normal one, remove_proc_entry() was supposed to do "-1" and code now
    reflects that.

    Signed-off-by: Alexey Dobriyan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
     

31 Mar, 2009

2 commits

  • Setting ->owner as done currently (pde->owner = THIS_MODULE) is racy
    as correctly noted at bug #12454. Someone can lookup entry with NULL
    ->owner, thus not pinning anything, and release it later resulting
    in module refcount underflow.

    We can keep ->owner and supply it at registration time like ->proc_fops
    and ->data.

    But this leaves ->owner as an easily manipulated field (just one C assignment)
    and somebody will forget to unpin previous/pin current module when
    switching ->owner. ->proc_fops is declared as "const" which should give
    some thoughts.

    ->read_proc/->write_proc were just fixed to not require ->owner for
    protection.

    rmmod'ed directories will be empty and return "." and ".." -- no harm.
    And directories with tricky enough readdir and lookup shouldn't be modular.
    We definitely don't want such modular code.

    Removing ->owner will also make PDE smaller.

    So, let's nuke it.

    Kudos to Jeff Layton for reminding about this, let's say, oversight.

    http://bugzilla.kernel.org/show_bug.cgi?id=12454

    Signed-off-by: Alexey Dobriyan

    Alexey Dobriyan
     
  • struct proc_dir_entry::owner is going to be removed. Now it's only necessary
    to protect PDEs which are using ->read_proc, ->write_proc hooks.

    However, ->owner assignments are racy and make it very easy for someone to switch
    ->owner on live PDE (as some subsystems do) without fixing refcounts and so on.

    http://bugzilla.kernel.org/show_bug.cgi?id=12454

    So, ->owner is on death row.

    Proxy file operations exist already (proc_file_operations), just bump usecount
    when necessary.

    Signed-off-by: Alexey Dobriyan

    Alexey Dobriyan
     

24 Feb, 2009

1 commit

  • de_get is called before every proc_get_inode, but corresponding de_put is
    called only when dropping last reference to an inode. This might cause
    something like
    remove_proc_entry: /proc/stats busy, count=14496
    to be printed to the syslog.

    The fix is to call de_put in case of an already initialized inode in
    proc_get_inode.
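
    The imbalance and the fix can be modelled in a few lines of
    userspace C. This is only a single-threaded sketch under assumed
    names (entry_sketch, inode_sketch, get_inode_sketch); it is not the
    kernel code, which works on real inodes from the inode cache.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a proc entry: only the refcount matters. */
struct entry_sketch {
    int count;
};

/* Hypothetical stand-in for a cached inode. */
struct inode_sketch {
    struct entry_sketch *de;   /* NULL until the inode is initialized */
    int users;
};

/*
 * The caller has already taken one reference on de (count was bumped).
 * A fresh inode keeps that reference until it is destroyed.  An inode
 * that was already initialized still holds the reference from the
 * first call, so the extra one is dropped here instead of leaking.
 */
void get_inode_sketch(struct inode_sketch *inode, struct entry_sketch *de)
{
    if (inode->de == NULL) {
        inode->de = de;            /* new inode: keep the reference */
    } else {
        assert(inode->de == de);
        de->count--;               /* already initialized: undo the get */
    }
    inode->users++;
}
```

    Without the else branch the count grows by one on every lookup of
    an already cached inode, which is how a large "busy" count such as
    the one quoted above can build up.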

    Signed-off-by: Krzysztof Sachanowicz
    Tested-by: Marcin Pilipczuk
    Acked-by: Al Viro
    Signed-off-by: Linus Torvalds

    Krzysztof Sachanowicz
     

05 Jan, 2009

1 commit

  • There are four BKL users in proc: de_put(), proc_lookup_de(),
    proc_readdir_de(), proc_root_readdir().

    1) de_put()
    -----------
    de_put() is classic atomic_dec_and_test() refcount wrapper -- no BKL
    needed. BKL doesn't matter to possible refcount leak as well.

    2) proc_lookup_de()
    -------------------
    Walking PDE list is protected by proc_subdir_lock(), proc_get_inode() is
    potentially blocking, all callers of proc_lookup_de() eventually end up
    from ->lookup hooks which is protected by directory's ->i_mutex -- BKL
    doesn't protect anything.

    3) proc_readdir_de()
    --------------------
    "." and ".." part doesn't need BKL, walking PDE list is under
    proc_subdir_lock, calling filldir callback is potentially blocking
    because it writes to luserspace. All proc_readdir_de() callers
    eventually come from ->readdir hook which is under directory's
    ->i_mutex -- BKL doesn't protect anything.

    4) proc_root_readdir()
    ----------------------
    proc_root_readdir() is a ->readdir hook, see (3).

    Since readdir hooks don't use BKL anymore, switch to
    generic_file_llseek, since it also takes directory's i_mutex.

    Signed-off-by: Alexey Dobriyan

    Alexey Dobriyan
     

23 Oct, 2008

1 commit


10 Oct, 2008

1 commit


27 Jul, 2008

2 commits

  • * keep references to ctl_table_head and ctl_table in /proc/sys inodes
    * grab the former during operations, use the latter for access to
    entry if that succeeds
    * have ->d_compare() check if table should be seen for one who does lookup;
    that allows us to avoid flipping inodes - if we have the same name resolve
    to different things, we'll just keep several dentries and ->d_compare()
    will reject the wrong ones.
    * have ->lookup() and ->readdir() scan the table of our inode first, then
    walk all ctl_table_header and scan ->attached_by for those that are
    attached to our directory.
    * implement ->getattr().
    * get rid of insane amounts of tree-walking
    * get rid of the need to know dentry in ->permission() and of the contortions
    induced by that.

    Signed-off-by: Al Viro

    Al Viro
     
  • Kmem cache passed to constructor is only needed for constructors that are
    themselves multiplexers. Nobody uses this "feature", nor does anybody use
    passed kmem cache in non-trivial way, so pass only pointer to object.

    Non-trivial places are:
    arch/powerpc/mm/init_64.c
    arch/powerpc/mm/hugetlbpage.c

    This is flag day, yes.

    Signed-off-by: Alexey Dobriyan
    Acked-by: Pekka Enberg
    Acked-by: Christoph Lameter
    Cc: Jon Tollefson
    Cc: Nick Piggin
    Cc: Matt Mackall
    [akpm@linux-foundation.org: fix arch/powerpc/mm/hugetlbpage.c]
    [akpm@linux-foundation.org: fix mm/slab.c]
    [akpm@linux-foundation.org: fix ubifs]
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
     

26 Jul, 2008

2 commits

  • MS_RMT_MASK will unmask changes in do_remount_sb() anyway.

    Signed-off-by: Alexey Dobriyan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
     
  • Current two-stage scheme of removing PDE emphasizes one bug in proc:

    open
    rmmod
    remove_proc_entry
    close

    ->release won't be called because ->proc_fops were cleared. In simple
    cases it's small memory leak.

    For every ->open, ->release has to be done. List of openers is introduced
    which is traversed at remove_proc_entry() if needed.
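
    A toy model of such an opener list is sketched below. It uses a
    fixed-size array instead of a linked list and no locking, and every
    name in it (tracked_entry, tracked_open() and so on) is invented
    for the illustration rather than taken from the patch.

```c
#include <stddef.h>

#define MAX_OPENERS 8   /* arbitrary limit for the sketch */

struct opener_sketch {
    void *file;
    int in_use;
};

struct tracked_entry {
    void (*release)(void *file);   /* module-supplied release hook */
    struct opener_sketch openers[MAX_OPENERS];
};

/* Demo hook that only counts how often release ran. */
int release_calls_sketch;

void counting_release_sketch(void *file)
{
    (void)file;
    release_calls_sketch++;
}

/* On open: remember the opener.  Returns 0, or -1 if the table is full. */
int tracked_open(struct tracked_entry *e, void *file)
{
    for (size_t i = 0; i < MAX_OPENERS; i++) {
        if (!e->openers[i].in_use) {
            e->openers[i].file = file;
            e->openers[i].in_use = 1;
            return 0;
        }
    }
    return -1;
}

/* On close: release only if this opener was not already released. */
void tracked_close(struct tracked_entry *e, void *file)
{
    for (size_t i = 0; i < MAX_OPENERS; i++) {
        if (e->openers[i].in_use && e->openers[i].file == file) {
            e->openers[i].in_use = 0;
            if (e->release)
                e->release(file);
            return;
        }
    }
}

/* On removal: release every remaining opener, then clear the hook. */
void tracked_remove(struct tracked_entry *e)
{
    for (size_t i = 0; i < MAX_OPENERS; i++) {
        if (e->openers[i].in_use) {
            e->openers[i].in_use = 0;
            if (e->release)
                e->release(e->openers[i].file);
        }
    }
    e->release = NULL;
}
```

    With this bookkeeping each open is paired with exactly one release,
    whether the close happens before or after the entry is removed.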

    Discussions with Al long ago (sigh).

    Signed-off-by: Alexey Dobriyan
    Cc: Al Viro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
     

25 May, 2008

1 commit

  • Any file under /proc/net opened more than once leaked the refcounter
    on the module it belongs to.

    The problem is that module_get is called for each file opening while
    module_put is called only when /proc inode is destroyed. So, let's put
    module counter if we are dealing with already initialised inode.

    Addresses http://bugzilla.kernel.org/show_bug.cgi?id=10737

    Signed-off-by: Denis V. Lunev
    Cc: David Miller
    Cc: Patrick McHardy
    Acked-by: Pavel Emelyanov
    Acked-by: Robert Olsson
    Acked-by: Eric W. Biederman
    Reported-by: Roland Kletzing
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Denis V. Lunev
     

29 Apr, 2008

1 commit


09 Feb, 2008

1 commit


08 Feb, 2008

1 commit

  • Stop the PROCFS filesystem from using iget() and read_inode(). Merge
    procfs_read_inode() into procfs_get_inode(), and have that call iget_locked()
    instead of iget().

    [akpm@linux-foundation.org: coding-style fixes]
    Signed-off-by: David Howells
    Cc: "Eric W. Biederman"
    Acked-by: Christoph Hellwig
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Howells
     

06 Dec, 2007

1 commit

  • Creating PDEs with refcount 0 and "deleted" flag has problems (see below).
    Switch to usual scheme:
    * PDE is created with refcount 1
    * every de_get does +1
    * every de_put() and remove_proc_entry() do -1
    * once refcount reaches 0, PDE is freed.
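
    The four rules above can be sketched in userspace with C11 atomics.
    The struct and function names below (pde_sketch and friends) are
    assumptions made for the example; the kernel code uses its own
    atomic_t helpers and frees the entry with free_proc_entry().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct proc_dir_entry. */
struct pde_sketch {
    atomic_int count;
};

/* A new entry starts with refcount 1: the reference held by the tree. */
struct pde_sketch *pde_sketch_create(void)
{
    struct pde_sketch *de = malloc(sizeof(*de));

    if (de)
        atomic_init(&de->count, 1);
    return de;
}

/* Every get does +1. */
void pde_sketch_get(struct pde_sketch *de)
{
    atomic_fetch_add(&de->count, 1);
}

/*
 * Every put does -1.  Whoever drops the last reference frees the entry.
 * Returns true if this call freed it.
 */
bool pde_sketch_put(struct pde_sketch *de)
{
    int old = atomic_fetch_sub(&de->count, 1);

    assert(old > 0);
    if (old == 1) {
        free(de);
        return true;
    }
    return false;
}

/* Removal is just another -1: it drops the creation-time reference. */
bool pde_sketch_remove(struct pde_sketch *de)
{
    return pde_sketch_put(de);
}
```

    Because the decrement and the "was it the last one" decision are a
    single atomic operation, there is no window between checking the
    count and acting on it, and no separate "deleted" flag to race on.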

    This elegantly fixes at least two following races (both observed) without
    introducing new locks, without abusing old locks, without spreading
    lock_kernel():

    1) PDE leak

    remove_proc_entry                       de_put
    -----------------                       ------
                        [refcnt = 1]
    if (atomic_read(&de->count) == 0)
                                            if (atomic_dec_and_test(&de->count))
                                                if (de->deleted)
                                                    /* also not taken! */
                                                    free_proc_entry(de);
    else
        de->deleted = 1;
                        [refcount=0, deleted=1]

    2) use after free

    remove_proc_entry                       de_put
    -----------------                       ------
                        [refcnt = 1]

                                            if (atomic_dec_and_test(&de->count))
    if (atomic_read(&de->count) == 0)
        free_proc_entry(de);
                                                /* boom! */
                                                if (de->deleted)
                                                    free_proc_entry(de);

    BUG: unable to handle kernel paging request at virtual address 6b6b6b6b
    printing eip: c10acdda *pdpt = 00000000338f8001 *pde = 0000000000000000
    Oops: 0000 [#1] PREEMPT SMP
    Modules linked in: af_packet ipv6 cpufreq_ondemand loop serio_raw psmouse k8temp hwmon sr_mod cdrom
    Pid: 23161, comm: cat Not tainted (2.6.24-rc2-8c0863403f109a43d7000b4646da4818220d501f #4)
    EIP: 0060:[] EFLAGS: 00210097 CPU: 1
    EIP is at strnlen+0x6/0x18
    EAX: 6b6b6b6b EBX: 6b6b6b6b ECX: 6b6b6b6b EDX: fffffffe
    ESI: c128fa3b EDI: f380bf34 EBP: ffffffff ESP: f380be44
    DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
    Process cat (pid: 23161, ti=f380b000 task=f38f2570 task.ti=f380b000)
    Stack: c10ac4f0 00000278 c12ce000 f43cd2a8 00000163 00000000 7da86067 00000400
    c128fa20 00896b18 f38325a8 c128fe20 ffffffff 00000000 c11f291e 00000400
    f75be300 c128fa20 f769c9a0 c10ac779 f380bf34 f7bfee70 c1018e6b f380bf34
    Call Trace:
    [] vsnprintf+0x2ad/0x49b
    [] vscnprintf+0x14/0x1f
    [] vprintk+0xc5/0x2f9
    [] handle_fasteoi_irq+0x0/0xab
    [] do_IRQ+0x9f/0xb7
    [] preempt_schedule_irq+0x3f/0x5b
    [] need_resched+0x1f/0x21
    [] printk+0x1b/0x1f
    [] de_put+0x3d/0x50
    [] proc_delete_inode+0x38/0x41
    [] proc_delete_inode+0x0/0x41
    [] generic_delete_inode+0x5e/0xc6
    [] iput+0x60/0x62
    [] d_kill+0x2d/0x46
    [] dput+0xdc/0xe4
    [] __fput+0xb0/0xcd
    [] filp_close+0x48/0x4f
    [] sys_close+0x67/0xa5
    [] sysenter_past_esp+0x5f/0x85
    =======================
    Code: c9 74 0c f2 ae 74 05 bf 01 00 00 00 4f 89 fa 5f 89 d0 c3 85 c9 57 89 c7 89 d0 74 05 f2 ae 75 01 4f 89 f8 5f c3 89 c1 89 c8 eb 06 38 00 74 07 40 4a 83 fa ff 75 f4 29 c8 c3 90 90 90 57 83 c9
    EIP: [] strnlen+0x6/0x18 SS:ESP 0068:f380be44

    Also, remove broken usage of ->deleted from reiserfs: if sget() succeeds,
    module is already pinned and remove_proc_entry() can't happen => nobody
    can mark PDE deleted.

    Dummy proc root in netns code is not marked with refcount 1. AFAICS, we
    never get it, it's just for proper /proc/net removal. I double checked
    CLONE_NETNS continues to work.

    Patch survives many hours of modprobe/rmmod/cat loops without new bugs
    which can be attributed to refcounting.

    Signed-off-by: Alexey Dobriyan
    Cc: "Eric W. Biederman"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
     

20 Oct, 2007

1 commit

  • Each pid namespace has to be visible through its own proc mount. Thus we
    need to have per-namespace proc trees with their own superblocks.

    We cannot easily show different pid namespace via one global proc tree, since
    each pid refers to different tasks in different namespaces. E.g. pid 1
    refers to the init task in the initial namespace and to some other task when
    seen from another namespace. Moreover, a pid existing in one namespace may
    not exist in the other.

    This approach has one more advantage: tasks from the init namespace
    can see which tasks live in another namespace by reading entries from
    another proc tree.

    Signed-off-by: Pavel Emelyanov
    Cc: Oleg Nesterov
    Cc: Sukadev Bhattiprolu
    Cc: Paul Menage
    Cc: "Eric W. Biederman"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pavel Emelyanov
     

17 Oct, 2007

2 commits


12 Sep, 2007

1 commit

  • Taneli Vähäkangas reported that commit
    786d7e1612f0b0adb6046f19b906609e4fe8b1ba aka "Fix rmmod/read/write races
    in /proc entries" broke SBCL + SLIME combo.

    The old code in do_select() used DEFAULT_POLLMASK if it couldn't find a
    ->poll handler. The new code makes ->poll always there and returns 0 by
    default, which is not correct. Return DEFAULT_POLLMASK instead.
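
    The intended behaviour of the proxy can be sketched as below. The
    SK_* constants are stand-in values chosen for the example, and
    fops_sketch and proxy_poll_sketch() are hypothetical names; the
    kernel defines DEFAULT_POLLMASK itself in terms of its own POLL*
    flags.

```c
#include <stddef.h>

/* Stand-in flag values for the sketch only. */
enum {
    SK_POLLIN     = 0x001,
    SK_POLLOUT    = 0x004,
    SK_POLLRDNORM = 0x040,
    SK_POLLWRNORM = 0x100,
};

/* A file without its own poll handler is treated as always ready. */
#define SK_DEFAULT_POLLMASK \
    (SK_POLLIN | SK_POLLOUT | SK_POLLRDNORM | SK_POLLWRNORM)

/* Hypothetical stand-in for the module's file_operations. */
struct fops_sketch {
    unsigned int (*poll)(void *file);
};

/* Demo handler that reports "nothing ready". */
unsigned int never_ready_sketch(void *file)
{
    (void)file;
    return 0;
}

/*
 * Proxy poll: forward to the real handler when there is one, and fall
 * back to the default mask (not 0) when there is none.
 */
unsigned int proxy_poll_sketch(const struct fops_sketch *real, void *file)
{
    if (real && real->poll)
        return real->poll(file);
    return SK_DEFAULT_POLLMASK;
}
```

    Returning 0 in the fallback case would tell select() that the file
    is never readable or writable, which matches the hang described in
    the report.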

    Steps to reproduce:

    install emacs, SBCL, SLIME
    emacs
    M-x slime in *inferior-lisp* buffer
    [watch it doing "Connecting to Swank on port X.."]

    Please, apply before 2.6.23.

    P.S.: why SBCL can't just read(2) /proc/cpuinfo is a mystery.

    Signed-off-by: Alexey Dobriyan
    Cc: T Taneli Vahakangas
    Cc: Oleg Nesterov
    Cc: "Eric W. Biederman"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
     

29 Jul, 2007

1 commit

  • It is important to only provide the compat_ioctl method
    if the downstream de->proc_fops does too, otherwise this
    utterly confuses the logic in fs/compat_ioctl.c and we
    end up doing the wrong thing.

    Signed-off-by: David S. Miller
    Acked-by: Alexey Dobriyan
    Signed-off-by: Linus Torvalds

    David Miller
     

20 Jul, 2007

1 commit

  • Slab destructors were no longer supported after Christoph's
    c59def9f222d44bb7e2f0a559f2906191a0862d7 change. They've been
    BUGs for both slab and slub, and slob never supported them
    either.

    This rips out support for the dtor pointer from kmem_cache_create()
    completely and fixes up every single callsite in the kernel (there were
    about 224, not including the slab allocator definitions themselves,
    or the documentation references).

    Signed-off-by: Paul Mundt

    Paul Mundt
     

17 Jul, 2007

1 commit

  • Fix following races:
    ===========================================
    1. Write via ->write_proc sleeps in copy_from_user(). Module disappears
    meanwhile. Or, more generically, system call done on /proc file, method
    supplied by module is called, module disappears meanwhile.

    pde = create_proc_entry()
    if (!pde)
        return -ENOMEM;
    pde->write_proc = ...
                                    open
                                    write
                                    copy_from_user
    pde = create_proc_entry();
    if (!pde) {
        remove_proc_entry();
        return -ENOMEM;
        /* module unloaded */
    }
                                    *boom*
    ==========================================
    2. bogo-revoke aka proc_kill_inodes()

    remove_proc_entry               vfs_read
    proc_kill_inodes                [check ->f_op validness]
                                    [check ->f_op->read validness]
                                    [verify_area, security permissions checks]
        ->f_op = NULL;
                                    if (file->f_op->read)
                                        /* ->f_op dereference, boom */

    NOTE, NOTE, NOTE: file_operations are proxied for regular files only. Let's
    see how this scheme behaves, then extend if needed for directories.
    Directories creators in /proc only set ->owner for them, so proxying for
    directories may be unneeded.

    NOTE, NOTE, NOTE: methods being proxied are ->llseek, ->read, ->write,
    ->poll, ->unlocked_ioctl, ->ioctl, ->compat_ioctl, ->open, ->release.
    If your in-tree module uses something else, yell at me. Full audit pending.

    [akpm@linux-foundation.org: build fix]
    Signed-off-by: Alexey Dobriyan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
     

17 May, 2007

1 commit

  • SLAB_CTOR_CONSTRUCTOR is always specified. No point in checking it.

    Signed-off-by: Christoph Lameter
    Cc: David Howells
    Cc: Jens Axboe
    Cc: Steven French
    Cc: Michael Halcrow
    Cc: OGAWA Hirofumi
    Cc: Miklos Szeredi
    Cc: Steven Whitehouse
    Cc: Roman Zippel
    Cc: David Woodhouse
    Cc: Dave Kleikamp
    Cc: Trond Myklebust
    Cc: "J. Bruce Fields"
    Cc: Anton Altaparmakov
    Cc: Mark Fasheh
    Cc: Paul Mackerras
    Cc: Christoph Hellwig
    Cc: Jan Kara
    Cc: David Chinner
    Cc: "David S. Miller"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     

09 May, 2007

2 commits

  • WARN_ON(de && de->deleted); is sooo unreliable. Why?

    proc_lookup                             remove_proc_entry
    ===========                             =================
    lock_kernel();
    spin_lock(&proc_subdir_lock);
    [find proc entry]
    spin_unlock(&proc_subdir_lock);
                                            spin_lock(&proc_subdir_lock);
                                            [find proc entry]

    proc_get_inode
    ==============
    WARN_ON(de && de->deleted);             ...

                                            if (!atomic_read(&de->count))
                                                free_proc_entry(de);
                                            else
                                                de->deleted = 1;

    So, if you have some strange oops [1], and don't see this WARN_ON, it means
    nothing.

    [1] try_module_get() of module which doesn't exist, two lines below
    should suffice, or not?

    Signed-off-by: Alexey Dobriyan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
     
  • proc_lookup                             remove_proc_entry
    ===========                             =================

    lock_kernel();
    spin_lock(&proc_subdir_lock);
    [find PDE with refcount 0]
    spin_unlock(&proc_subdir_lock);
                                            spin_lock(&proc_subdir_lock);
                                            [find PDE with refcount 0]
                                            [check refcount and free PDE]
                                            spin_unlock(&proc_subdir_lock);
    proc_get_inode:
        de_get(de);     /* boom */

    Signed-off-by: Alexey Dobriyan
    Cc: "Eric W. Biederman"
    Cc: Oleg Nesterov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
     

08 May, 2007

1 commit

  • I have never seen a use of SLAB_DEBUG_INITIAL. It is only supported by
    SLAB.

    I think its purpose was to have a callback after an object has been freed
    to verify that the state is the constructor state again? The callback is
    performed before each freeing of an object.

    I would think that it is much easier to check the object state manually
    before the free. That also places the check near the code that
    manipulates the object.

    Also the SLAB_DEBUG_INITIAL callback is only performed if the kernel was
    compiled with SLAB debugging on. If there would be code in a constructor
    handling SLAB_DEBUG_INITIAL then it would have to be conditional on
    SLAB_DEBUG otherwise it would just be dead code. But there is no such code
    in the kernel. I think SLAB_DEBUG_INITIAL is too problematic to make real
    use of, difficult to understand and there are easier ways to accomplish the
    same effect (i.e. add debug code before kfree).

    There is a related flag SLAB_CTOR_VERIFY that is frequently checked to be
    clear in fs inode caches. Remove the pointless checks (they would even be
    pointless without removal of SLAB_DEBUG_INITIAL) from the fs constructors.

    This is the last slab flag that SLUB did not support. Remove the check for
    unimplemented flags from SLUB.

    Signed-off-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     

15 Feb, 2007

1 commit

  • With this change the sysctl inodes can be cached and nothing needs to be done
    when removing a sysctl table.

    For a cost of 2K code we will save about 4K of static tables (when we remove
    de from ctl_table) and 70K in proc_dir_entries that we will not allocate, or
    about half that on a 32bit arch.

    The speed feels about the same, even though we can now cache the sysctl
    dentries :(

    We get the core advantage that we don't need to have a 1 to 1 mapping between
    ctl table entries and proc files. Making it possible to have /proc/sys vary
    depending on the namespace you are in. The currently merged namespaces don't
    have an issue here but the network namespace under /proc/sys/net needs to have
    different directories depending on which network adapters are visible. By
    simply being a cache different directories being visible depending on who you
    are is trivial to implement.

    [akpm@osdl.org: fix uninitialised var]
    [akpm@osdl.org: fix ARM build]
    [bunk@stusta.de: make things static]
    Signed-off-by: Eric W. Biederman
    Cc: Russell King
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Eric W. Biederman
     

13 Feb, 2007

1 commit