20 Aug, 2013

1 commit

  • In the previous commit, Richard Genoud fixed proc_root_readdir(), which
    had lost the check for whether all of the non-process /proc entries had
    been returned or not.

    But that in turn exposed _another_ bug, namely that the original readdir
    conversion patch had yet another problem: it had lost the return value
    of proc_readdir_de(), so now checking whether it had completed
    successfully or not didn't actually work right anyway.

    This reinstates the non-zero return for the "end of base entries" that
    had also gotten lost in commit f0c3b5093add ("[readdir] convert
    procfs"). So now you get all the base entries *and* you get all the
    process entries, regardless of getdents buffer size.

    (Side note: the Linux "getdents" manual page actually has a nice example
    application for testing getdents, which can be easily modified to use
    different buffers. Who knew? Man-pages can be useful)

    Reported-by: Emmanuel Benisty
    Reported-by: Marc Dionne
    Cc: Richard Genoud
    Cc: Al Viro
    Signed-off-by: Linus Torvalds

    Linus Torvalds
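
The buffer-size dependence described in this commit can be probed from userspace. The following is a minimal sketch, not the man-page example itself: it assumes a Linux system, and `count_dirents` and `struct linux_dirent64` are names declared here for illustration. It counts directory entries by calling the raw `getdents64` system call with a caller-chosen buffer size.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE             /* for syscall() and O_DIRECTORY */
#endif
#include <fcntl.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Record layout returned by getdents64(2); declared locally. */
struct linux_dirent64 {
    uint64_t       d_ino;
    int64_t        d_off;
    unsigned short d_reclen;    /* length of this record in bytes */
    unsigned char  d_type;
    char           d_name[];
};

/*
 * Count the entries of `path`, reading them with a getdents64 buffer of
 * `bufsize` bytes. Returns the number of entries, or -1 on error.
 */
long count_dirents(const char *path, size_t bufsize)
{
    long count = 0;
    char *buf;
    int fd = open(path, O_RDONLY | O_DIRECTORY);

    if (fd < 0)
        return -1;
    buf = malloc(bufsize);
    if (!buf) {
        close(fd);
        return -1;
    }
    for (;;) {
        long n = syscall(SYS_getdents64, fd, buf, bufsize);

        if (n < 0) {            /* e.g. EINVAL if bufsize is too small */
            count = -1;
            break;
        }
        if (n == 0)             /* end of directory */
            break;
        /* Walk the variable-length records packed into the buffer. */
        for (long pos = 0; pos < n; ) {
            struct linux_dirent64 *d = (struct linux_dirent64 *)(buf + pos);

            pos += d->d_reclen;
            count++;
        }
    }
    free(buf);
    close(fd);
    return count;
}
```

Comparing `count_dirents("/proc", 512)` with `count_dirents("/proc", 65536)` on an otherwise idle system is one way to look for the symptom: with the fix applied the two counts should agree, although /proc can legitimately change between calls as processes start and exit.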
     

29 Jun, 2013

1 commit


02 May, 2013

5 commits

  • Make the PROC_I() and PDE() macros internal to procfs. This means making
    PDE_DATA() out of line. This could be made more optimal by storing
    PDE()->data into inode->i_private.

    Also provide a __PDE_DATA() that is inline and internal to procfs.

    Signed-off-by: David Howells
    Signed-off-by: Al Viro

    David Howells
     
  • Supply a function (proc_remove()) to remove a proc entry (and any subtree
    rooted there) by proc_dir_entry pointer rather than by name and (optionally)
    root dir entry pointer. This allows us to eliminate all remaining pde->name
    accesses outside of procfs.

    Signed-off-by: David Howells
    Acked-by: Grant Likely
    cc: linux-acpi@vger.kernel.org
    cc: openipmi-developer@lists.sourceforge.net
    cc: devicetree-discuss@lists.ozlabs.org
    cc: linux-pci@vger.kernel.org
    cc: netdev@vger.kernel.org
    cc: netfilter-devel@vger.kernel.org
    cc: alsa-devel@alsa-project.org
    Signed-off-by: Al Viro

    David Howells
     
  • Supply an accessor function for getting the private data from the parent
    proc_dir_entry struct of the proc_dir_entry struct associated with an inode.

    ReiserFS, for instance, stores the super_block pointer in the proc directory
    it makes for that super_block, and a pointer to the respective seq_file show
    function in each of the proc files in that directory.

    This allows a reduction in the number of file_operations structs, open
    functions and seq_operations structs required. The problem otherwise is that
    each show function requires two pieces of data but only has storage for one
    per PDE (and this has no release function).

    Signed-off-by: David Howells
    Acked-by: Mauro Carvalho Chehab
    Acked-by: Greg Kroah-Hartman
    cc: Jerry Chuang
    cc: Maxim Mikityanskiy
    cc: YAMANE Toshiaki
    cc: linux-wireless@vger.kernel.org
    cc: linux-scsi@vger.kernel.org
    cc: devel@driverdev.osuosl.org
    Signed-off-by: Al Viro

    David Howells
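
The "two pieces of data, one slot per PDE" problem can be illustrated with a small userspace model. This is only a sketch under stated assumptions: `model_pde`, `model_sb`, `model_show`, `model_parent_data` and `model_read` are hypothetical stand-ins, not the kernel's types or functions. The directory entry carries the shared object, each file entry carries its own show routine, and a single read path combines the two.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for a superblock: the per-directory data. */
struct model_sb {
    const char   *id;
    unsigned long blocks;
};

/* Per-file data: which show routine this file uses. */
struct model_show {
    int (*show)(const struct model_sb *sb, char *out, size_t len);
};

/* Hypothetical stand-in for proc_dir_entry: one private pointer each. */
struct model_pde {
    const struct model_pde *parent;
    const void             *data;
};

/* The accessor: fetch the private data of an entry's parent directory. */
const void *model_parent_data(const struct model_pde *pde)
{
    return pde->parent ? pde->parent->data : NULL;
}

static int show_id(const struct model_sb *sb, char *out, size_t len)
{
    return snprintf(out, len, "%s", sb->id);
}

static int show_blocks(const struct model_sb *sb, char *out, size_t len)
{
    return snprintf(out, len, "%lu", sb->blocks);
}

const struct model_show model_show_id     = { show_id };
const struct model_show model_show_blocks = { show_blocks };

/* One shared entry point serves every file in the directory. */
int model_read(const struct model_pde *file, char *out, size_t len)
{
    const struct model_show *s  = file->data;
    const struct model_sb   *sb = model_parent_data(file);

    if (!s || !sb)
        return -1;
    return s->show(sb, out, len);
}
```

Because the shared object is reached through the parent, adding another file to the directory only needs another `model_show` instance rather than another set of open and show wrappers.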
     
  • Add proc_mkdir_data() to allow procfs directories to be created that are
    annotated at the time of creation with private data rather than doing this
    post-creation. This means no access is then required to the proc_dir_entry
    struct to set this.

    Signed-off-by: David Howells
    Acked-by: Mauro Carvalho Chehab
    Acked-by: Greg Kroah-Hartman
    cc: Neela Syam Kolli
    cc: Jerry Chuang
    cc: linux-scsi@vger.kernel.org
    cc: devel@driverdev.osuosl.org
    cc: linux-wireless@vger.kernel.org
    Signed-off-by: Al Viro

    David Howells
     
  • Supply accessor functions to set attributes in proc_dir_entry structs.

    The following are supplied: proc_set_size() and proc_set_user().

    Signed-off-by: David Howells
    Acked-by: Mauro Carvalho Chehab
    cc: linuxppc-dev@lists.ozlabs.org
    cc: linux-media@vger.kernel.org
    cc: netdev@vger.kernel.org
    cc: linux-wireless@vger.kernel.org
    cc: linux-pci@vger.kernel.org
    cc: netfilter-devel@vger.kernel.org
    cc: alsa-devel@alsa-project.org
    Signed-off-by: Al Viro

    David Howells
     

30 Apr, 2013

1 commit

  • Delete create_proc_read_entry() as it no longer has any users.

    Also delete read_proc_t, write_proc_t, the read_proc member of the
    proc_dir_entry struct and the support functions that use them. This saves a
    pointer for every PDE allocated.

    Signed-off-by: David Howells
    Signed-off-by: Al Viro

    David Howells
     

10 Apr, 2013

7 commits


28 Feb, 2013

1 commit

  • - use pr_foo() throughout

    - remove a couple of duplicated KERN_WARNINGs, via WARN(KERN_WARNING "...")

    - nuke a few warnings which I've never seen happen, ever.

    Cc: Joe Perches
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
     

26 Feb, 2013

1 commit

  • Make it drop the pde in *all* cases when no new reference to it is
    put into an inode - both when an inode had already been set up
    (as we were already doing) and when inode allocation has failed.
    Makes for simpler logic in callers...

    Signed-off-by: Al Viro

    Al Viro
     

23 Feb, 2013

1 commit


26 Dec, 2012

1 commit

  • While testing the pid namespace code I hit this nasty warning.

    [ 176.262617] ------------[ cut here ]------------
    [ 176.263388] WARNING: at /home/eric/projects/linux/linux-userns-devel/kernel/softirq.c:160 local_bh_enable_ip+0x7a/0xa0()
    [ 176.265145] Hardware name: Bochs
    [ 176.265677] Modules linked in:
    [ 176.266341] Pid: 742, comm: bash Not tainted 3.7.0userns+ #18
    [ 176.266564] Call Trace:
    [ 176.266564] [] warn_slowpath_common+0x7f/0xc0
    [ 176.266564] [] warn_slowpath_null+0x1a/0x20
    [ 176.266564] [] local_bh_enable_ip+0x7a/0xa0
    [ 176.266564] [] _raw_spin_unlock_bh+0x19/0x20
    [ 176.266564] [] proc_free_inum+0x3a/0x50
    [ 176.266564] [] free_pid_ns+0x1c/0x80
    [ 176.266564] [] put_pid_ns+0x35/0x50
    [ 176.266564] [] put_pid+0x4a/0x60
    [ 176.266564] [] tty_ioctl+0x717/0xc10
    [ 176.266564] [] ? wait_consider_task+0x855/0xb90
    [ 176.266564] [] ? default_spin_lock_flags+0x9/0x10
    [ 176.266564] [] ? remove_wait_queue+0x5a/0x70
    [ 176.266564] [] do_vfs_ioctl+0x98/0x550
    [ 176.266564] [] ? recalc_sigpending+0x1f/0x60
    [ 176.266564] [] ? __set_task_blocked+0x37/0x80
    [ 176.266564] [] ? sys_wait4+0xab/0xf0
    [ 176.266564] [] sys_ioctl+0x91/0xb0
    [ 176.266564] [] ? task_stopped_code+0x50/0x50
    [ 176.266564] [] system_call_fastpath+0x16/0x1b
    [ 176.266564] ---[ end trace 387af88219ad6143 ]---

    It turns out that spin_unlock_bh(proc_inum_lock) is not safe when
    put_pid is called with another spinlock held and irqs disabled.

    For now take the easy path and use spin_lock_irqsave(proc_inum_lock)
    in proc_free_inum and spin_lock_irq(proc_inum_lock) in proc_alloc_inum.

    Signed-off-by: "Eric W. Biederman"

    Eric W. Biederman
     

21 Dec, 2012

3 commits

  • Merge the rest of Andrew's patches for -rc1:
    "A bunch of fixes and misc missed-out-on things.

    That'll do for -rc1. I still have a batch of IPC patches which still
    have a possible bug report which I'm chasing down."

    * emailed patches from Andrew Morton: (25 commits)
    keys: use keyring_alloc() to create module signing keyring
    keys: fix unreachable code
    sendfile: allows bypassing of notifier events
    SGI-XP: handle non-fatal traps
    fat: fix incorrect function comment
    Documentation: ABI: remove testing/sysfs-devices-node
    proc: fix inconsistent lock state
    linux/kernel.h: fix DIV_ROUND_CLOSEST with unsigned divisors
    memcg: don't register hotcpu notifier from ->css_alloc()
    checkpatch: warn on uapi #includes that #include
    mm: cma: WARN if freed memory is still in use
    exec: do not leave bprm->interp on stack
    ...

    Linus Torvalds
     
  • Lockdep found an inconsistent lock state when rcu is processing delayed
    work in softirq. Currently, the kernel uses spin_lock/spin_unlock to
    protect proc_inum_ida, but proc_free_inum is called by rcu in softirq
    context.

    Use spin_lock_bh/spin_unlock_bh to fix the following lockdep warning.

    =================================
    [ INFO: inconsistent lock state ]
    3.7.0 #36 Not tainted
    ---------------------------------
    inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage.
    swapper/1/0 [HC0[0]:SC1[1]:HE1:SE0] takes:
    (proc_inum_lock){+.?...}, at: proc_free_inum+0x1c/0x50
    {SOFTIRQ-ON-W} state was registered at:
    __lock_acquire+0x8ae/0xca0
    lock_acquire+0x199/0x200
    _raw_spin_lock+0x41/0x50
    proc_alloc_inum+0x4c/0xd0
    alloc_mnt_ns+0x49/0xc0
    create_mnt_ns+0x25/0x70
    mnt_init+0x161/0x1c7
    vfs_caches_init+0x107/0x11a
    start_kernel+0x348/0x38c
    x86_64_start_reservations+0x131/0x136
    x86_64_start_kernel+0x103/0x112
    irq event stamp: 2993422
    hardirqs last enabled at (2993422): _raw_spin_unlock_irqrestore+0x55/0x80
    hardirqs last disabled at (2993421): _raw_spin_lock_irqsave+0x29/0x70
    softirqs last enabled at (2993394): _local_bh_enable+0x13/0x20
    softirqs last disabled at (2993395): call_softirq+0x1c/0x30

    other info that might help us debug this:
    Possible unsafe locking scenario:

    CPU0
    ----
    lock(proc_inum_lock);
    <Interrupt>
    lock(proc_inum_lock);

    *** DEADLOCK ***

    no locks held by swapper/1/0.

    stack backtrace:
    Pid: 0, comm: swapper/1 Not tainted 3.7.0 #36
    Call Trace:
    [] ? vprintk_emit+0x471/0x510
    print_usage_bug+0x2a5/0x2c0
    mark_lock+0x33b/0x5e0
    __lock_acquire+0x813/0xca0
    lock_acquire+0x199/0x200
    _raw_spin_lock+0x41/0x50
    proc_free_inum+0x1c/0x50
    free_pid_ns+0x1c/0x50
    put_pid_ns+0x2e/0x50
    put_pid+0x4a/0x60
    delayed_put_pid+0x12/0x20
    rcu_process_callbacks+0x462/0x790
    __do_softirq+0x1b4/0x3b0
    call_softirq+0x1c/0x30
    do_softirq+0x59/0xd0
    irq_exit+0x54/0xd0
    smp_apic_timer_interrupt+0x95/0xa3
    apic_timer_interrupt+0x72/0x80
    cpuidle_enter_tk+0x10/0x20
    cpuidle_enter_state+0x17/0x50
    cpuidle_idle_call+0x287/0x520
    cpu_idle+0xba/0x130
    start_secondary+0x2b3/0x2bc

    Signed-off-by: Xiaotian Feng
    Cc: Al Viro
    Cc: "Eric W. Biederman"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Xiaotian Feng
     
  • Removed vmtruncate

    Signed-off-by: Marco Stornelli
    Signed-off-by: Al Viro

    Marco Stornelli
     

20 Nov, 2012

1 commit

  • Generalize the proc inode allocation so that it can be
    used without having to create a proc_dir_entry.

    This will allow namespace file descriptors to remain lightweight
    entities but still have the same inode number
    when the backing namespace is the same.

    Acked-by: Serge E. Hallyn
    Signed-off-by: Eric W. Biederman

    Eric W. Biederman
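
As a rough userspace model of what "inode allocation without a proc_dir_entry" means: the object itself stores the number it was given, so every descriptor that refers to the same object reports the same inode number. Everything below is an assumption made for illustration. The `model_*` names are hypothetical, the bitmap stands in for whatever allocator the kernel uses, and the base value 0xF0000000 is an assumed start of the dynamic range.

```c
#include <errno.h>
#include <stdint.h>

#define MODEL_DYNAMIC_FIRST 0xF0000000U  /* assumed base of the dynamic range */
#define MODEL_MAX_INUMS     64U          /* model limit: one bit per number */

static uint64_t model_inum_bitmap;       /* bit i set => FIRST + i is in use */

/* Hand out the lowest free number; 0 on success, -ENOSPC when exhausted. */
int model_alloc_inum(unsigned int *inum)
{
    for (unsigned int i = 0; i < MODEL_MAX_INUMS; i++) {
        if (!(model_inum_bitmap & (UINT64_C(1) << i))) {
            model_inum_bitmap |= UINT64_C(1) << i;
            *inum = MODEL_DYNAMIC_FIRST + i;
            return 0;
        }
    }
    return -ENOSPC;
}

/* Return a number obtained from model_alloc_inum() to the pool. */
void model_free_inum(unsigned int inum)
{
    model_inum_bitmap &= ~(UINT64_C(1) << (inum - MODEL_DYNAMIC_FIRST));
}

/* A namespace-like object that owns an inode number but no directory entry. */
struct model_ns {
    unsigned int inum;
};

int model_ns_init(struct model_ns *ns)
{
    return model_alloc_inum(&ns->inum);
}

void model_ns_exit(struct model_ns *ns)
{
    model_free_inum(ns->inum);
}

/* Any descriptor backed by `ns` reports the number stored in the object. */
unsigned int model_ns_fd_ino(const struct model_ns *ns)
{
    return ns->inum;
}
```

The model is single-threaded; the later commits in this log about proc_inum_lock concern exactly the locking that a real allocator of this kind needs.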
     

06 Oct, 2012

2 commits

  • Part of the memory will be written twice after this change, but that
    should be negligible.

    [akpm@linux-foundation.org: fix __proc_create() coding-style issues, remove unneeded zero-initialisations]
    Signed-off-by: yan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Yan
     
  • If proc_get_inode() returns NULL then presumably it encountered memory
    exhaustion. proc_lookup_de() should return -ENOMEM in this case, not
    -EINVAL.

    Signed-off-by: yan
    Cc: Ryan Mallon
    Cc: Cong Wang
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Yan
     

14 Jul, 2012

1 commit

  • Just the flags; only NFS cares even about that, but there are
    legitimate uses for such argument. And getting rid of that
    completely would require splitting ->lookup() into a couple
    of methods (at least), so let's leave that alone for now...

    Signed-off-by: Al Viro

    Al Viro
     

04 Jan, 2012

1 commit


02 Nov, 2011

1 commit


28 Jul, 2011

1 commit

  • Since __proc_create() appends the name it is given to the end of the PDE
    structure that it allocates, there isn't a need to store a name pointer.
    Instead we can just replace the name pointer with a terminal char array of
    _unspecified_ length. The compiler will simply append the string to statically
    defined variables of PDE type overlapping any hole at the end of the structure
    and, unlike specifying an explicitly _zero_ length array, won't give a warning
    if you try to statically initialise it with a string of more than zero length.

    Also, whilst we're at it:

    (1) Move namelen to end just prior to name and reduce it to a single byte
    (name shouldn't be longer than NAME_MAX).

    (2) Move pde_unload_lock two places further on so that if it's four bytes in
    size on a 64-bit machine, it won't cause an unused hole in the PDE struct.

    Signed-off-by: David Howells
    Signed-off-by: Alexey Dobriyan
    Signed-off-by: Linus Torvalds

    David Howells
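
The layout change described above corresponds to a C99 flexible array member. Here is a minimal userspace sketch; `named_pde`, `named_pde_create` and `named_root` are hypothetical names, and the struct is reduced to the fields the message mentions. Note that statically initialising a flexible array member with a string, as in `named_root`, is a GNU C extension rather than ISO C.

```c
#include <stdlib.h>
#include <string.h>

struct named_pde {
    unsigned int  low_ino;
    unsigned char namelen;  /* one byte is enough for names up to 255 bytes */
    char          name[];   /* flexible array member: storage follows the struct */
};

/*
 * Allocate an entry with the name stored directly after the fixed fields,
 * so no separate name pointer is needed. Returns NULL for an empty or
 * over-long name, or if allocation fails.
 */
struct named_pde *named_pde_create(const char *name)
{
    size_t len = strlen(name);
    struct named_pde *pde;

    if (len == 0 || len > 255)
        return NULL;
    pde = calloc(1, sizeof(*pde) + len + 1);
    if (!pde)
        return NULL;
    memcpy(pde->name, name, len + 1);   /* copy including the trailing NUL */
    pde->namelen = (unsigned char)len;
    return pde;
}

/* Static initialisation of the trailing array (GNU C extension). */
struct named_pde named_root = {
    .low_ino = 1,
    .namelen = 5,
    .name    = "/proc",
};
```

A dynamically created entry is released with a single `free()`, since the name lives in the same allocation as the struct.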
     

17 May, 2011

1 commit


24 Mar, 2011

1 commit

  • 1. namelen is declared "unsigned short" which hints for "maybe space savings".
    Indeed in 2.4 struct proc_dir_entry looked like:

    struct proc_dir_entry {
    unsigned short low_ino;
    unsigned short namelen;

    Now low_ino is "unsigned int", so any savings have been gone for a long
    time. There are not so many "struct proc_dir_entry" instances that its
    size is worth worrying about, anyway.

    2. converting from unsigned short to int/unsigned int can only create
    problems, we better play it safe.

    Space is not really conserved, because of natural alignment for the next
    field. sizeof(struct proc_dir_entry) remains the same.

    Signed-off-by: Alexey Dobriyan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
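
The alignment argument can be checked with two cut-down structs. This is a sketch under an assumption: the field that follows namelen is taken to be a pointer, and both struct names are hypothetical. With a 4-byte `unsigned int` and a pointer aligned to at least 4 bytes, the 2-byte field is followed by padding, so widening it changes neither the offset of the next field nor the total size.

```c
#include <stddef.h>

/* Before: 2-byte namelen, followed by padding up to the pointer. */
struct pde_short_namelen {
    unsigned int   low_ino;
    unsigned short namelen;
    const char    *name;    /* assumed next field; forces alignment */
};

/* After: 4-byte namelen occupies the bytes that used to be padding. */
struct pde_int_namelen {
    unsigned int low_ino;
    unsigned int namelen;
    const char  *name;
};

/* Bytes of padding between namelen and name in the "before" layout. */
size_t padding_after_short_namelen(void)
{
    return offsetof(struct pde_short_namelen, name)
         - (offsetof(struct pde_short_namelen, namelen)
            + sizeof(unsigned short));
}
```

On the common 32-bit and 64-bit ABIs `sizeof(struct pde_short_namelen)` equals `sizeof(struct pde_int_namelen)`, which matches the statement that sizeof(struct proc_dir_entry) remains the same.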
     

14 Jan, 2011

2 commits

  • For the common case where a proc entry is being removed and nobody is in
    the process of using it, save a LOCK/UNLOCK pair.

    Signed-off-by: Alexey Dobriyan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
     
  • - ->low_ino is write-once field -- reading it under locks is unnecessary.

    - /proc/$PID stuff never reaches pde_put()/free_proc_entry() --
    PROC_DYNAMIC_FIRST check never triggers.

    - in proc_get_inode(), inode number always matches proc dir entry, so
    save one parameter.

    Signed-off-by: Alexey Dobriyan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
     

07 Jan, 2011

2 commits

  • Reduce some branches and memory accesses in dcache lookup by adding dentry
    flags to indicate common d_ops are set, rather than having to check them.
    This saves a pointer memory access (dentry->d_op) in common path lookup
    situations, and saves another pointer load and branch in cases where we
    have d_op but not the particular operation.

    Patched with:

    git grep -E '[.>]([[:space:]])*d_op([[:space:]])*=' | xargs sed -e 's/\([^\t ]*\)->d_op = \(.*\);/d_set_d_op(\1, \2);/' -e 's/\([^\t ]*\)\.d_op = \(.*\);/d_set_d_op(\&\1, \2);/' -i

    Signed-off-by: Nick Piggin

    Nick Piggin
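
The idea of caching "which d_ops are set" in the dentry flags can be modelled in userspace as follows. This is an illustrative sketch only: the `model_*` types, the `MODEL_OP_*` flag values and `model_set_d_op` are hypothetical stand-ins for the kernel's structures and for the d_set_d_op() helper named in the sed command above.

```c
#include <stddef.h>

/* Hypothetical flag bits, one per operation that may be present. */
#define MODEL_OP_HASH       0x1u
#define MODEL_OP_COMPARE    0x2u
#define MODEL_OP_REVALIDATE 0x4u
#define MODEL_OP_DELETE     0x8u
#define MODEL_OP_MASK       0xfu

struct model_dentry;

struct model_dentry_ops {
    int (*d_hash)(const struct model_dentry *);
    int (*d_compare)(const struct model_dentry *, const char *);
    int (*d_revalidate)(struct model_dentry *);
    int (*d_delete)(const struct model_dentry *);
};

struct model_dentry {
    unsigned int                   d_flags;
    const struct model_dentry_ops *d_op;
};

/* Install the ops table and record which callbacks it provides. */
void model_set_d_op(struct model_dentry *dentry,
                    const struct model_dentry_ops *op)
{
    dentry->d_op = op;
    dentry->d_flags &= ~MODEL_OP_MASK;
    if (!op)
        return;
    if (op->d_hash)
        dentry->d_flags |= MODEL_OP_HASH;
    if (op->d_compare)
        dentry->d_flags |= MODEL_OP_COMPARE;
    if (op->d_revalidate)
        dentry->d_flags |= MODEL_OP_REVALIDATE;
    if (op->d_delete)
        dentry->d_flags |= MODEL_OP_DELETE;
}

/* Hot path: test one flag bit instead of loading d_op and then the slot. */
int model_needs_revalidate(const struct model_dentry *dentry)
{
    return (dentry->d_flags & MODEL_OP_REVALIDATE) != 0;
}

/* Example ops table providing only d_delete. */
static int model_always_delete(const struct model_dentry *dentry)
{
    (void)dentry;
    return 1;
}

const struct model_dentry_ops model_delete_only_ops = {
    .d_delete = model_always_delete,
};
```

Routing every assignment through one setter is what keeps the flags and the ops pointer consistent, which is why the conversion replaced direct `d_op =` assignments mechanically.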
     
  • Change d_delete from a dentry deletion notification to a dentry caching
    advice, more like ->drop_inode. Require it to be constant and idempotent,
    and not take d_lock. This is how all existing filesystems use the callback
    anyway.

    This makes fine grained dentry locking of dput and dentry lru scanning
    much simpler.

    Signed-off-by: Nick Piggin

    Nick Piggin
     

10 Aug, 2010

1 commit

  • Replace inode_setattr with opencoded variants of it in all callers. This
    moves the remaining call to vmtruncate into the filesystem methods where it
    can be replaced with the proper truncate sequence.

    In a few cases it was obvious that we would never end up calling vmtruncate
    so it was left out in the opencoded variant:

    spufs: explicitly checks for ATTR_SIZE earlier
    btrfs,hugetlbfs,logfs,dlmfs: explicitly clears ATTR_SIZE earlier
    ufs: contains an opencoded simple_setattr + truncate that sets the filesize just above

    In addition to that, ncpfs called inode_setattr with handcrafted iattrs,
    which allowed trimming down the opencoded variant.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Al Viro

    Christoph Hellwig
     

28 May, 2010

1 commit


30 Mar, 2010

1 commit

  • …it slab.h inclusion from percpu.h

    percpu.h is included by sched.h and module.h and thus ends up being
    included when building most .c files. percpu.h includes slab.h which
    in turn includes gfp.h making everything defined by the two files
    universally available and complicating inclusion dependencies.

    percpu.h -> slab.h dependency is about to be removed. Prepare for
    this change by updating users of gfp and slab facilities to include
    those headers directly instead of assuming availability. As this
    conversion needs to touch a large number of source files, the
    following script is used as the basis of conversion.

    http://userweb.kernel.org/~tj/misc/slabh-sweep.py

    The script does the following.

    * Scan files for gfp and slab usages and update includes such that
    only the necessary includes are there. ie. if only gfp is used,
    gfp.h, if slab is used, slab.h.

    * When the script inserts a new include, it looks at the include
    blocks and tries to put the new include such that its order conforms
    to its surroundings. It's put in the include block which contains
    core kernel includes, in the same order that the rest are ordered -
    alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
    doesn't seem to be any matching order.

    * If the script can't find a place to put a new include (mostly
    because the file doesn't have a fitting include block), it prints out
    an error message indicating which .h file needs to be added to the
    file.

    The conversion was done in the following steps.

    1. The initial automatic conversion of all .c files updated slightly
    over 4000 files, deleting around 700 includes and adding ~480 gfp.h
    and ~3000 slab.h inclusions. The script emitted errors for ~400
    files.

    2. Each error was manually checked. Some didn't need the inclusion,
    some needed manual addition while adding it to implementation .h or
    embedding .c file was more appropriate for others. This step added
    inclusions to around 150 files.

    3. The script was run again and the output was compared to the edits
    from #2 to make sure no file was left behind.

    4. Several build tests were done and a couple of problems were fixed.
    e.g. lib/decompress_*.c used malloc/free() wrappers around slab
    APIs requiring slab.h to be added manually.

    5. The script was run on all .h files but without automatically
    editing them as sprinkling gfp.h and slab.h inclusions around .h
    files could easily lead to inclusion dependency hell. Most gfp.h
    inclusion directives were ignored as stuff from gfp.h was usually
    widely available and often used in preprocessor macros. Each
    slab.h inclusion directive was examined and added manually as
    necessary.

    6. percpu.h was updated not to include slab.h.

    7. Build tests were done on the following configurations and failures
    were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my
    distributed build env didn't work with gcov compiles) and a few
    more options had to be turned off depending on archs to make things
    build (like ipr on powerpc/64 which failed due to missing writeq).

    * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
    * powerpc and powerpc64 SMP allmodconfig
    * sparc and sparc64 SMP allmodconfig
    * ia64 SMP allmodconfig
    * s390 SMP allmodconfig
    * alpha SMP allmodconfig
    * um on x86_64 SMP allmodconfig

    8. percpu.h modifications were reverted so that it could be applied as
    a separate patch and serve as bisection point.

    Given the fact that I had only a couple of failures from tests on step
    6, I'm fairly confident about the coverage of this conversion patch.
    If there is a breakage, it's likely to be something in one of the arch
    headers which should be easily discoverable on most builds of
    the specific arch.

    Signed-off-by: Tejun Heo <tj@kernel.org>
    Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
    Cc: Ingo Molnar <mingo@redhat.com>
    Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>

    Tejun Heo
     

07 Mar, 2010

2 commits