20 Apr, 2006

1 commit

  • There are places in the kernel where we look up files in fd tables and
    access the file structure without holding refereces to the file. So, we
    need special care to avoid the race between looking up files in the fd
    table and tearing down of the file in another CPU. Otherwise, one might
    see a NULL f_dentry or such torn down version of the file. This patch
    fixes those special places where such a race may happen.

    Signed-off-by: Dipankar Sarma
    Acked-by: "Paul E. McKenney"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dipankar Sarma
     

11 Apr, 2006

1 commit


01 Apr, 2006

2 commits

  • proc_check_chroot() does the check in a very unintuitive way (keeping a
    copy of the argument, then modifying the argument), and has uncommented
    sideeffects.

    Signed-off-by: Herbert Poetzl
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Herbert Poetzl
     
  • Make baby-simple the code for /proc/devices. Based on the proven design
    for /proc/interrupts.

    This also fixes the early-termination regression 2.6.16 introduced, as
    demonstrated by:

    # dd if=/proc/devices bs=1
    Character devices:
    1 mem
    27+0 records in
    27+0 records out

    This should also work (but is untested) when /proc/devices >4096 bytes,
    which I believe is what the original 2.6.16 rewrite fixed.

    [akpm@osdl.org: cleanups, simplifications]
    Signed-off-by: Joe Korty
    Cc: Neil Horman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Joe Korty
     

29 Mar, 2006

4 commits


28 Mar, 2006

1 commit

  • Various dodgy firmware might give us nodes and/or properties in the device
    tree with conflicting names. That's generally ok, except for when we export
    the device tree via /proc, so check when we're creating the proc device tree
    and munge names accordingly.

    Tested on a faked device tree with kexec, would be good if someone with
    actual bogus firmware could try it, but just for completeness.

    Signed-off-by: Michael Ellerman
    Signed-off-by: Paul Mackerras

    Michael Ellerman
     

27 Mar, 2006

2 commits

  • Remove the it_real_value from /proc/*/stat, during 1.2.x was the last time it
    returned useful data (as it was directly maintained by the scheduler), now
    it's only a waste of time to calculate it. Return 0 instead.

    Signed-off-by: Roman Zippel
    Acked-by: Ingo Molnar
    Acked-by: Thomas Gleixner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Roman Zippel
     
  • It has been discovered that the remove_proc_entry has a race in the removing
    of entries in the proc file system that are siblings. There's no protection
    around the traversing and removing of elements that belong in the same
    subdirectory.

    This subdirectory list is protected in other areas by the BKL. So the BKL was
    at first used to protect this area too, but unfortunately, remove_proc_entry
    may be called with spinlocks held. The BKL may schedule, so this was not a
    solution.

    The final solution was to add a new global spin lock to protect this list,
    called proc_subdir_lock. This lock now protects the list in
    remove_proc_entry, and I also went around looking for other areas that this
    list is modified and added this protection there too. Care must be taken
    since these locations call several functions that may also schedule.

    Since I don't see any location that these functions that modify the
    subdirectory list are called by interrupts, the irqsave/restore versions of
    the spin lock was _not_ used.

    Signed-off-by: Steven Rostedt
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Steven Rostedt
     

26 Mar, 2006

2 commits

  • * git://git.linux-nfs.org/pub/linux/nfs-2.6: (103 commits)
    SUNRPC,RPCSEC_GSS: spkm3--fix config dependencies
    SUNRPC,RPCSEC_GSS: spkm3: import contexts using NID_cast5_cbc
    LOCKD: Make nlmsvc_traverse_shares return void
    LOCKD: nlmsvc_traverse_blocks return is unused
    SUNRPC,RPCSEC_GSS: fix krb5 sequence numbers.
    NFSv4: Dont list system.nfs4_acl for filesystems that don't support it.
    SUNRPC,RPCSEC_GSS: remove unnecessary kmalloc of a checksum
    SUNRPC: Ensure rpc_call_async() always calls tk_ops->rpc_release()
    SUNRPC: Fix memory barriers for req->rq_received
    NFS: Fix a race in nfs_sync_inode()
    NFS: Clean up nfs_flush_list()
    NFS: Fix a race with PG_private and nfs_release_page()
    NFSv4: Ensure the callback daemon flushes signals
    SUNRPC: Fix a 'Busy inodes' error in rpc_pipefs
    NFS, NLM: Allow blocking locks to respect signals
    NFS: Make nfs_fhget() return appropriate error values
    NFSv4: Fix an oops in nfs4_fill_super
    lockd: blocks should hold a reference to the nlm_file
    NFSv4: SETCLIENTID_CONFIRM should handle NFS4ERR_DELAY/NFS4ERR_RESOURCE
    NFSv4: Send the delegation stateid for SETATTR calls
    ...

    Linus Torvalds
     
  • Implement /proc/slab_allocators. It produces output like:

    idr_layer_cache: 80 idr_pre_get+0x33/0x4e
    buffer_head: 2555 alloc_buffer_head+0x20/0x75
    mm_struct: 9 mm_alloc+0x1e/0x42
    mm_struct: 20 dup_mm+0x36/0x370
    vm_area_struct: 384 dup_mm+0x18f/0x370
    vm_area_struct: 151 do_mmap_pgoff+0x2e0/0x7c3
    vm_area_struct: 1 split_vma+0x5a/0x10e
    vm_area_struct: 11 do_brk+0x206/0x2e2
    vm_area_struct: 2 copy_vma+0xda/0x142
    vm_area_struct: 9 setup_arg_pages+0x99/0x214
    fs_cache: 8 copy_fs_struct+0x21/0x133
    fs_cache: 29 copy_process+0xf38/0x10e3
    files_cache: 30 alloc_files+0x1b/0xcf
    signal_cache: 81 copy_process+0xbaa/0x10e3
    sighand_cache: 77 copy_process+0xe65/0x10e3
    sighand_cache: 1 de_thread+0x4d/0x5f8
    anon_vma: 241 anon_vma_prepare+0xd9/0xf3
    size-2048: 1 add_sect_attrs+0x5f/0x145
    size-2048: 2 journal_init_revoke+0x99/0x302
    size-2048: 2 journal_init_revoke+0x137/0x302
    size-2048: 2 journal_init_inode+0xf9/0x1c4

    Cc: Manfred Spraul
    Cc: Alexander Nyberg
    Cc: Pekka Enberg
    Cc: Christoph Lameter
    Cc: Ravikiran Thirumalai
    Signed-off-by: Al Viro
    DESC
    slab-leaks3-locking-fix
    EDESC
    From: Andrew Morton

    Update for slab-remove-cachep-spinlock.patch

    Cc: Al Viro
    Cc: Manfred Spraul
    Cc: Alexander Nyberg
    Cc: Pekka Enberg
    Cc: Christoph Lameter
    Cc: Ravikiran Thirumalai
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Al Viro
     

24 Mar, 2006

3 commits

  • Rewrap the overly long source code lines resulting from the previous
    patch's addition of the slab cache flag SLAB_MEM_SPREAD. This patch
    contains only formatting changes, and no function change.

    Signed-off-by: Paul Jackson
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Paul Jackson
     
  • Mark file system inode and similar slab caches subject to SLAB_MEM_SPREAD
    memory spreading.

    If a slab cache is marked SLAB_MEM_SPREAD, then anytime that a task that's
    in a cpuset with the 'memory_spread_slab' option enabled goes to allocate
    from such a slab cache, the allocations are spread evenly over all the
    memory nodes (task->mems_allowed) allowed to that task, instead of favoring
    allocation on the node local to the current cpu.

    The following inode and similar caches are marked SLAB_MEM_SPREAD:

    file cache
    ==== =====
    fs/adfs/super.c adfs_inode_cache
    fs/affs/super.c affs_inode_cache
    fs/befs/linuxvfs.c befs_inode_cache
    fs/bfs/inode.c bfs_inode_cache
    fs/block_dev.c bdev_cache
    fs/cifs/cifsfs.c cifs_inode_cache
    fs/coda/inode.c coda_inode_cache
    fs/dquot.c dquot
    fs/efs/super.c efs_inode_cache
    fs/ext2/super.c ext2_inode_cache
    fs/ext2/xattr.c (fs/mbcache.c) ext2_xattr
    fs/ext3/super.c ext3_inode_cache
    fs/ext3/xattr.c (fs/mbcache.c) ext3_xattr
    fs/fat/cache.c fat_cache
    fs/fat/inode.c fat_inode_cache
    fs/freevxfs/vxfs_super.c vxfs_inode
    fs/hpfs/super.c hpfs_inode_cache
    fs/isofs/inode.c isofs_inode_cache
    fs/jffs/inode-v23.c jffs_fm
    fs/jffs2/super.c jffs2_i
    fs/jfs/super.c jfs_ip
    fs/minix/inode.c minix_inode_cache
    fs/ncpfs/inode.c ncp_inode_cache
    fs/nfs/direct.c nfs_direct_cache
    fs/nfs/inode.c nfs_inode_cache
    fs/ntfs/super.c ntfs_big_inode_cache_name
    fs/ntfs/super.c ntfs_inode_cache
    fs/ocfs2/dlm/dlmfs.c dlmfs_inode_cache
    fs/ocfs2/super.c ocfs2_inode_cache
    fs/proc/inode.c proc_inode_cache
    fs/qnx4/inode.c qnx4_inode_cache
    fs/reiserfs/super.c reiser_inode_cache
    fs/romfs/inode.c romfs_inode_cache
    fs/smbfs/inode.c smb_inode_cache
    fs/sysv/inode.c sysv_inode_cache
    fs/udf/super.c udf_inode_cache
    fs/ufs/super.c ufs_inode_cache
    net/socket.c sock_inode_cache
    net/sunrpc/rpc_pipe.c rpc_inode_cache

    The choice of which slab caches to so mark was quite simple. I marked
    those already marked SLAB_RECLAIM_ACCOUNT, except for fs/xfs, dentry_cache,
    inode_cache, and buffer_head, which were marked in a previous patch. Even
    though SLAB_RECLAIM_ACCOUNT is for a different purpose, it marks the same
    potentially large file system i/o related slab caches as we need for memory
    spreading.

    Given that the rule now becomes "wherever you would have used a
    SLAB_RECLAIM_ACCOUNT slab cache flag before (usually the inode cache), use
    the SLAB_MEM_SPREAD flag too", this should be easy enough to maintain.
    Future file system writers will just copy one of the existing file system
    slab cache setups and tend to get it right without thinking.

    Signed-off-by: Paul Jackson
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Paul Jackson
     
  • Trond Myklebust
     

23 Mar, 2006

1 commit


21 Mar, 2006

1 commit

  • Create a new file under /proc/self, called mountstats, where mounted file
    systems can export information (configuration options, performance counters,
    and so on). Use a mechanism similar to /proc/mounts and s_ops->show_options.

    This mechanism does not violate namespace security, and is safe to use while
    other processes are unmounting file systems.

    Thanks to Mike Waychison for his review and comments.

    Test-plan:
    Test concurrent mount/unmount operations while cat'ing /proc/self/mountstats.

    Signed-off-by: Chuck Lever
    Signed-off-by: Trond Myklebust

    Chuck Lever
     

07 Mar, 2006

2 commits

  • The point of the smaps "shared" is to count the number of pages that are
    mapped by more than one process, according to Mauricio Lin. However, smaps
    uses page_count for this, so it will return a false positive for every page
    that is mapped by just that one process, which is also in pagecache or
    swapcache. There are false positive situations for anonymous pages not in
    swapcache as well: - page reclaim, migration - get_user_pages (eg.
    direct-io, ptrace)

    Use page_mapcount instead, to count the number of mappings to the page.

    Use vm_normal_page so that weird things like /dev/mem aren't counted either.

    Signed-off-by: Nick Piggin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nick Piggin
     
  • smaps doesn't have a hugepage pagetable walker. Skip walking hugepage
    vmas.

    Signed-off-by: Nick Piggin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nick Piggin
     

19 Feb, 2006

1 commit

  • 1) it should use nr_processes(), not nr_threads; otherwise we are getting
    very confused find(1) and friends, among other things.
    2) better do that at stat() time than at every damn lookup in procfs root.

    Patch had been sitting in FC4 kernels for many months now...

    Signed-off-by: Al Viro

    Al Viro
     

04 Feb, 2006

1 commit


15 Jan, 2006

1 commit

  • A Christoph suggested that the /proc/devices file be converted to use the
    seq_file interface. This patch does that.

    I've obxerved one or two installation that had sufficiently large sans that
    they overran the 4k limit on /proc/devices.

    Signed-off-by: Neil Horman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Neil Horman
     

13 Jan, 2006

1 commit


12 Jan, 2006

2 commits


11 Jan, 2006

3 commits


09 Jan, 2006

3 commits

  • Function prototypes belong into header files.

    Signed-off-by: Adrian Bunk
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Adrian Bunk
     
  • configurable replacement for slab allocator

    This adds a CONFIG_SLAB option under CONFIG_EMBEDDED. When CONFIG_SLAB is
    disabled, the kernel falls back to using the 'SLOB' allocator.

    SLOB is a traditional K&R/UNIX allocator with a SLAB emulation layer,
    similar to the original Linux kmalloc allocator that SLAB replaced. It's
    signicantly smaller code and is more memory efficient. But like all
    similar allocators, it scales poorly and suffers from fragmentation more
    than SLAB, so it's only appropriate for small systems.

    It's been tested extensively in the Linux-tiny tree. I've also
    stress-tested it with make -j 8 compiles on a 3G SMP+PREEMPT box (not
    recommended).

    Here's a comparison for otherwise identical builds, showing SLOB saving
    nearly half a megabyte of RAM:

    $ size vmlinux*
    text data bss dec hex filename
    3336372 529360 190812 4056544 3de5e0 vmlinux-slab
    3323208 527948 190684 4041840 3dac70 vmlinux-slob

    $ size mm/{slab,slob}.o
    text data bss dec hex filename
    13221 752 48 14021 36c5 mm/slab.o
    1896 52 8 1956 7a4 mm/slob.o

    /proc/meminfo:
    SLAB SLOB delta
    MemTotal: 27964 kB 27980 kB +16 kB
    MemFree: 24596 kB 25092 kB +496 kB
    Buffers: 36 kB 36 kB 0 kB
    Cached: 1188 kB 1188 kB 0 kB
    SwapCached: 0 kB 0 kB 0 kB
    Active: 608 kB 600 kB -8 kB
    Inactive: 808 kB 812 kB +4 kB
    HighTotal: 0 kB 0 kB 0 kB
    HighFree: 0 kB 0 kB 0 kB
    LowTotal: 27964 kB 27980 kB +16 kB
    LowFree: 24596 kB 25092 kB +496 kB
    SwapTotal: 0 kB 0 kB 0 kB
    SwapFree: 0 kB 0 kB 0 kB
    Dirty: 4 kB 12 kB +8 kB
    Writeback: 0 kB 0 kB 0 kB
    Mapped: 560 kB 556 kB -4 kB
    Slab: 1756 kB 0 kB -1756 kB
    CommitLimit: 13980 kB 13988 kB +8 kB
    Committed_AS: 4208 kB 4208 kB 0 kB
    PageTables: 28 kB 28 kB 0 kB
    VmallocTotal: 1007312 kB 1007312 kB 0 kB
    VmallocUsed: 48 kB 48 kB 0 kB
    VmallocChunk: 1007264 kB 1007264 kB 0 kB

    (this work has been sponsored in part by CELF)

    From: Ingo Molnar

    Fix 32-bitness bugs in mm/slob.c.

    Signed-off-by: Matt Mackall
    Signed-off-by: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Matt Mackall
     
  • First discussed at http://marc.theaimsgroup.com/?t=113149255100001&r=1&w=2

    - Use the check_range() in mempolicy.c to gather statistics.

    - Improve the numa_maps code in general and fix some comments.

    Signed-off-by: Christoph Lameter
    Cc: Andi Kleen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     

07 Jan, 2006

1 commit

  • Sanitize some s390 Kconfig options. We have ARCH_S390, ARCH_S390X,
    ARCH_S390_31, 64BIT, S390_SUPPORT and COMPAT. Replace these 6 options by
    S390, 64BIT and COMPAT.

    Signed-off-by: Martin Schwidefsky
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Martin Schwidefsky
     

31 Dec, 2005

1 commit

  • The old /proc interfaces were never updated to use loff_t, and are just
    generally broken. Now, we should be using the seq_file interface for
    all of the proc files, but converting the legacy functions is more work
    than most people care for and has little upside..

    But at least we can make the non-LFS rules explicit, rather than just
    insanely wrapping the offset or something.

    Signed-off-by: Linus Torvalds

    Linus Torvalds
     

29 Nov, 2005

1 commit

  • This replaces the (in my opinion horrible) VM_UNMAPPED logic with very
    explicit support for a "remapped page range" aka VM_PFNMAP. It allows a
    VM area to contain an arbitrary range of page table entries that the VM
    never touches, and never considers to be normal pages.

    Any user of "remap_pfn_range()" automatically gets this new
    functionality, and doesn't even have to mark the pages reserved or
    indeed mark them any other way. It just works. As a side effect, doing
    mmap() on /dev/mem works for arbitrary ranges.

    Sparc update from David in the next commit.

    Signed-off-by: Linus Torvalds

    Linus Torvalds
     

14 Nov, 2005

1 commit


08 Nov, 2005

3 commits


31 Oct, 2005

1 commit