27 Sep, 2006

34 commits

  • Ingo Oeser pointed out that because current expands to an inline function,
    it is more space-efficient and somewhat faster to simply keep a cached copy
    of current in another variable. This patch does that for the de_thread
    function.

    (akpm: saves nearly 100 bytes of text on x86)

    Signed-off-by: Eric W. Biederman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Eric W. Biederman
     
  • Signed-off-by: Adrian Bunk
    Acked-by: Trond Myklebust
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Adrian Bunk
     
  • Add missing \n to dprintk

    Signed-off-by: Martin Bligh
    Acked-by: Trond Myklebust
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Martin Bligh
     
  • In de_thread we move pids from one process to another, a rather ugly case.
    The function transfer_pid makes it clear what we are doing, and makes the
    action atomic. This is useful if we ever want to atomically traverse the
    process group and session lists in an RCU-safe manner.

    Even ignoring the atomic properties, this change should be a win, as
    transfer_pid should be less code to execute than both attach_pid and
    detach_pid, and it should make de_thread slightly smaller since only a
    single function call needs to be emitted. The only downside is that the
    code might be slower to execute, as the odds are against transfer_pid being
    in cache.

    Signed-off-by: Eric W. Biederman
    Cc: Oleg Nesterov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Eric W. Biederman
     
  • Since sys_sysctl is deprecated, allow it to be compiled out. This
    should catch any remaining user-space code that cares, and paves the way
    for further sysctl cleanups.

    [akpm@osdl.org: If sys_sysctl() is not compiled-in, emit a warning]
    Signed-off-by: Eric W. Biederman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Eric W. Biederman
     
  • free_fdset(NULL, ...) is legal.

    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
     
  • Since the nolargeio option no longer has any effect, print a warning
    instead of setting a write-only variable.

    Signed-off-by: Adrian Bunk
    Cc: Jeff Mahoney
    Cc: Chris Mason
    Cc: Hans Reiser
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Adrian Bunk
     
  • This eliminates the i_blksize field from struct inode. Filesystems that want
    to provide a per-inode st_blksize can do so by providing their own getattr
    routine instead of using the generic_fillattr() function.

    Note that some filesystems were providing pretty much random (and incorrect)
    values for i_blksize.

    [bunk@stusta.de: cleanup]
    [akpm@osdl.org: generic_fillattr() fix]
    Signed-off-by: "Theodore Ts'o"
    Signed-off-by: Adrian Bunk
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Theodore Ts'o
     
  • Move the i_cdev pointer in struct inode into a union.

    Signed-off-by: "Theodore Ts'o"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Theodore Ts'o
     
  • Move the i_bdev pointer in struct inode into a union.

    Signed-off-by: "Theodore Ts'o"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Theodore Ts'o
     
  • The following patches reduce the size of the VFS inode structure by 28 bytes
    on a UP x86. (It would be more on an x86_64 system). This is a 10% reduction
    in the inode size on a UP kernel that is configured in a production mode
    (i.e., with no spinlock or other debugging functions enabled; if you want to
    save memory taken up by in-core inodes, the first thing you should do is
    disable the debugging options; they are responsible for a huge amount of bloat
    in the VFS inode structure).

    This patch:

    The filesystem or device-specific pointer in the inode is inside a union,
    which is pretty pointless given that all 30+ users of this field have been
    using the void pointer. Get rid of the union and rename it to i_private, with
    a comment to explain who is allowed to use the void pointer. This is just a
    cleanup, but it allows us to reuse the union 'u' for something where
    the union will actually be used.

    [judith@osdl.org: powerpc build fix]
    Signed-off-by: "Theodore Ts'o"
    Signed-off-by: Judith Lebzelter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Theodore Ts'o
     
  • get_blocks() was removed, so this removes it from fat as well, letting fat
    take advantage of multi-block mapping.

    Signed-off-by: OGAWA Hirofumi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    OGAWA Hirofumi
     
  • For a long time now I have had a problem with not being able to return a
    lookup failure on an existing directory. In autofs this corresponds to a
    mount failure on an autofs-managed mount entry that is browsable (and so the
    mount point directory exists).

    While this problem has been present for a long time I've avoided resolving
    it because it was not very visible. But now that autofs v5 has "mount and
    expire on demand" of nested multiple mounts, such as is found when mounting
    an export list from a server, solving the problem cannot be avoided any
    longer.

    I've tried very hard to find a way to do this entirely within the autofs4
    module but have not been able to find a satisfactory way to achieve it.

    So, I need to propose a change to the VFS.

    Signed-off-by: Ian Kent
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ian Kent
     
  • Move the fallback arch_vma_name() to a sensible place (kernel/signal.c).

    Currently it's in fs/proc/task_mmu.c, a file that is dependent on both
    CONFIG_PROC_FS and CONFIG_MMU being enabled, but it is called
    unconditionally from kernel/signal.c.

    [akpm@osdl.org: build fix]
    Signed-off-by: David Howells
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Howells
     
  • Implement /proc/pid/maps for NOMMU by reading the vm_area_list attached to
    current->mm->context.vmlist.

    Signed-off-by: David Howells
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Howells
     
  • Set the backing device info capabilities for /dev/mem and /dev/kmem to
    permit direct sharing under no-MMU conditions and full mapping capabilities
    under MMU conditions. Make the BDI used by these available to all directly
    mappable character devices.

    Also comment the capabilities for /dev/zero.

    [akpm@osdl.org: ifdef reductions]
    Signed-off-by: David Howells
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Howells
     
  • * Roughly half of callers already do it by not checking the return value
    * Code in drivers/acpi/osl.c does the following to be sure:

    (void)kmem_cache_destroy(cache);

    * Those that check it printk something; however, slab_error has already
    printed the name of the failed cache.
    * XFS BUGs on a failed kmem_cache_destroy, which is not a decision a
    low-level filesystem driver should make. Converted to ignore.

    Signed-off-by: Alexey Dobriyan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
     
  • * Removing useless casts
    * Removing useless wrapper
    * Conversion from kmalloc+memset to kzalloc

    Signed-off-by: Panagiotis Issaris
    Acked-by: Dave Kleikamp
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Panagiotis Issaris
     
  • Conversions from kmalloc+memset to kzalloc.

    Signed-off-by: Panagiotis Issaris
    Jffs2-bit-acked-by: David Woodhouse
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Panagiotis Issaris
     
  • Some of the changes in balloc.c are just cosmetic, as Andreas pointed out -
    if they overflow they'll then underflow and things are fine.

    5th hunk actually fixes an overflow problem.

    Also check for potential overflows in inode & block counts when resizing.

    Signed-off-by: Eric Sandeen
    Cc: Mingming Cao
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Eric Sandeen
     
  • Fix up some endianness warnings in preparation for cloning ext4 from ext3.

    Signed-off-by: Dave Kleikamp
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dave Kleikamp
     
  • More white space cleanups in preparation for cloning ext4 from ext3:
    remove spaces that precede a tab.

    Signed-off-by: Dave Kleikamp
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dave Kleikamp
     
  • The SWsoft Virtuozzo/OpenVZ Linux kernel team has discovered that ext3 error
    behavior has been broken in Linux kernels since the 2.5.x versions by the
    following patch:

    2002/10/31 02:15:26-05:00 tytso@snap.thunk.org
    Default mount options from superblock for ext2/3 filesystems
    http://linux.bkbits.net:8080/linux-2.6/gnupatch@3dc0d88eKbV9ivV4ptRNM8fBuA3JBQ

    If an ext3 filesystem is mounted with errors=continue
    (EXT3_ERRORS_CONTINUE), errors should be ignored when possible. However, at
    present the kernel aborts the journal and remounts the filesystem read-only
    on any error. This behavior has been hit a number of times and noted to
    differ from that of 2.4.x kernels.

    This patch fixes this:
    - do nothing in the case of EXT3_ERRORS_CONTINUE;
    - set EXT3_MOUNT_ABORT and call journal_abort() in all other cases;
    - call panic() only after ext3_commit_super(), so that the sb marked
    as EXT3_ERROR_FS is saved.

    Signed-off-by: Vasily Averin
    Acked-by: Kirill Korotaev
    Cc: Theodore Ts'o
    Cc: "Stephen C. Tweedie"
    Cc: Mingming Cao
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vasily Averin
     
  • Signed-off-by: Mingming Cao
    Acked-by: Randy Dunlap
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mingming Cao
     
  • In the past there were a few kernel panics related to failures of block
    reservation tree operations (insert/remove etc.). It would be very useful to
    get the block allocation reservation map info when such an error happens.

    Signed-off-by: Mingming Cao
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mingming Cao
     
  • These are a few places I've found in jbd that look like they may not be
    16T-safe, or consistent with the use of unsigned longs for block
    containers. Problems here would be somewhat hard to hit, as they would
    require journal blocks past the 8T boundary, which would not be terribly
    common. Still, they should be fixed.

    (some of these have come from the ext4 work on jbd as well).

    I think there's one more possibility that the wrap() function may not be
    safe IF your last block in the journal butts right up against the 2^32 block
    boundary, but that seems like a VERY remote possibility, and I'm not
    worrying about it at this point.
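
    Where the 8T and 16T figures come from, assuming 4 KiB blocks (an
    assumption; smaller block sizes move both limits down). Block numbers from
    2^31 upward no longer fit in a signed 32-bit integer, which is presumably
    where the 8T problems start, and numbers from 2^32 upward do not fit in 32
    bits at all:

```latex
\begin{aligned}
2^{31}\ \text{blocks} \times 2^{12}\ \tfrac{\text{bytes}}{\text{block}} &= 2^{43}\ \text{bytes} = 8\ \text{TiB} \\
2^{32}\ \text{blocks} \times 2^{12}\ \tfrac{\text{bytes}}{\text{block}} &= 2^{44}\ \text{bytes} = 16\ \text{TiB}
\end{aligned}
```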

    Signed-off-by: Eric Sandeen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Eric Sandeen
     
  • These are primarily format string fixes, plus changes to ialloc.c where large
    inode counts could overflow; journal_inum is also passed around as an
    unsigned long, just to be pedantic about it.

    Signed-off-by: Eric Sandeen
    Cc: Mingming Cao
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Eric Sandeen
     
  • Signed-off-by: Eric Sandeen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Eric Sandeen
     
  • I need to do some actual IO testing now, but this gets things mounting for
    a 16T ext3 filesystem. (A patched-up e2fsprogs is needed too; I'll send that
    off the kernel list.)

    This patch fixes these issues in the kernel:

    o sbi->s_groups_count overflows in ext3_fill_super()

    sbi->s_groups_count = (le32_to_cpu(es->s_blocks_count) -
            le32_to_cpu(es->s_first_data_block) +
            EXT3_BLOCKS_PER_GROUP(sb) - 1) /
            EXT3_BLOCKS_PER_GROUP(sb);

    at 16T, s_blocks_count is already maxed out; adding
    EXT3_BLOCKS_PER_GROUP(sb) overflows it and groups_count comes out to 0.
    Not really what we want, and causes a failed mount.

    Feel free to check my math (actually, please do!), but changing it this
    way should work & avoid the overflow:

    (A + B - 1)/B changed to: ((A - 1)/B) + 1

    o ext3_check_descriptors() overflows range checks

    ext3_check_descriptors() iterates over all block groups making sure
    that various bits are within the right block ranges... on the last pass
    through, it is checking the error case

    [item] >= block + EXT3_BLOCKS_PER_GROUP(sb)

    where "block" is the first block in the last block group. The last
    block in this group (and the last one that will fit in 32 bits) is block
    + EXT3_BLOCKS_PER_GROUP(sb)- 1. block + EXT3_BLOCKS_PER_GROUP(sb) wraps
    back around to 0.

    So, make things clearer with "first_block" and "last_block", where those
    are first and last, inclusive, and use > rather than >=.

    Finally, the last block group may be smaller than the rest, so account
    for this on the last pass through: last_block = sb->s_blocks_count - 1;

    (a similar patch could be done for ext2; does anyone in their right mind
    use ext2 at 16T? I'll send an ext2 patch doing the same thing if that's
    warranted)

    Signed-off-by: Eric Sandeen
    Cc: Mingming Cao
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Eric Sandeen
     
  • Signed-off-by: Alexey Dobriyan
    Acked-by: Stephen Tweedie
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
     
  • Remove whitespace from ext3 and jbd, before we clone ext4.

    Signed-off-by: Mingming Cao
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mingming Cao
     
  • jbd_sync_bh releases journal->j_list_lock. Add a lock annotation to this
    function so that sparse can check callers for lock pairing, and so that
    sparse will not complain about this function since it intentionally uses
    the lock in this manner.

    Signed-off-by: Josh Triplett
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Josh Triplett
     
  • * 'for-linus' of git://one.firstfloor.org/home/andi/git/linux-2.6: (225 commits)
    [PATCH] Don't set calgary iommu as default y
    [PATCH] i386/x86-64: New Intel feature flags
    [PATCH] x86: Add a cumulative thermal throttle event counter.
    [PATCH] i386: Make the jiffies compares use the 64bit safe macros.
    [PATCH] x86: Refactor thermal throttle processing
    [PATCH] Add 64bit jiffies compares (for use with get_jiffies_64)
    [PATCH] Fix unwinder warning in traps.c
    [PATCH] x86: Allow disabling early pci scans with pci=noearly or disallowing conf1
    [PATCH] x86: Move direct PCI scanning functions out of line
    [PATCH] i386/x86-64: Make all early PCI scans dependent on CONFIG_PCI
    [PATCH] Don't leak NT bit into next task
    [PATCH] i386/x86-64: Work around gcc bug with noreturn functions in unwinder
    [PATCH] Fix some broken white space in ia32_signal.c
    [PATCH] Initialize argument registers for 32bit signal handlers.
    [PATCH] Remove all traces of signal number conversion
    [PATCH] Don't synchronize time reading on single core AMD systems
    [PATCH] Remove outdated comment in x86-64 mmconfig code
    [PATCH] Use string instructions for Core2 copy/clear
    [PATCH] x86: - restore i8259A eoi status on resume
    [PATCH] i386: Split multi-line printk in oops output.
    ...

    Linus Torvalds
     
  • * master.kernel.org:/pub/scm/linux/kernel/git/gregkh/driver-2.6: (47 commits)
    Driver core: Don't call put methods while holding a spinlock
    Driver core: Remove unneeded routines from driver core
    Driver core: Fix potential deadlock in driver core
    PCI: enable driver multi-threaded probe
    Driver Core: add ability for drivers to do a threaded probe
    sysfs: add proper sysfs_init() prototype
    drivers/base: check errors
    drivers/base: Platform notify needs to occur before drivers attach to the device
    v4l-dev2: handle __must_check
    add CONFIG_ENABLE_MUST_CHECK
    add __must_check to device management code
    Driver core: fixed add_bind_files() definition
    Driver core: fix comments in drivers/base/power/resume.c
    sysfs_remove_bin_file: no return value, dump_stack on error
    kobject: must_check fixes
    Driver core: add ability for devices to create and remove bin files
    Class: add support for class interfaces for devices
    Driver core: create devices/virtual/ tree
    Driver core: add device_rename function
    Driver core: add ability for classes to handle devices properly
    ...

    Linus Torvalds
     

26 Sep, 2006

6 commits

  • As David Howells points out, binfmt_elf sometimes uses
    off_t, sometimes uses loff_t. Use loff_t throughout.

    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
     
  • Remove the atomic counter for slab_reclaim_pages and replace the counter
    and NR_SLAB with two ZVC counters that account for unreclaimable and
    reclaimable slab pages: NR_SLAB_RECLAIMABLE and NR_SLAB_UNRECLAIMABLE.

    Change the check in vmscan.c to refer to NR_SLAB_RECLAIMABLE. The
    intent seems to be to check for slab pages that could be freed.

    Signed-off-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     
  • Do not display HIGHMEM memory sizes if CONFIG_HIGHMEM is not set.

    Make HIGHMEM-dependent texts and the display of highmem counters optional.

    Some texts depend on CONFIG_HIGHMEM.

    Remove those strings and remove the display of highmem counter values if
    CONFIG_HIGHMEM is not set.

    [akpm@osdl.org: remove some ifdefs]
    Signed-off-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     
  • Tracking of dirty pages in shared writeable mmap()s.

    The idea is simple: write protect clean shared writeable pages, catch the
    write-fault, make writeable and set dirty. On page write-back clean all the
    PTE dirty bits and write protect them once again.

    The implementation is a tad harder, mainly because the default
    backing_dev_info capabilities were too loosely maintained. Hence it is not
    enough to test the backing_dev_info for cap_account_dirty.

    The current heuristic is as follows; a VMA is eligible when:
    - it is shared writeable
    (vm_flags & (VM_WRITE|VM_SHARED)) == (VM_WRITE|VM_SHARED)
    - it is not a 'special' mapping
    (vm_flags & (VM_PFNMAP|VM_INSERTPAGE)) == 0
    - the backing_dev_info is cap_account_dirty
    mapping_cap_account_dirty(vma->vm_file->f_mapping)
    - f_op->mmap() didn't change the default page protection

    Pages from remap_pfn_range() are explicitly excluded because their COW
    semantics are already horrid enough (see vm_normal_page() in do_wp_page()) and
    because they don't have a backing store anyway.

    mprotect() is taught about the new behaviour as well. However it overrides
    the last condition.

    Cleaning the pages on write-back is done with page_mkclean(), a new rmap
    call. It can be called on any page, but is currently only implemented for
    mapped pages; if the page is found to belong to a VMA that accounts dirty
    pages, it will also write-protect the PTE.

    Finally, in fs/buffer.c:try_to_free_buffers(), remove clear_page_dirty() from
    under ->private_lock. This seems to be safe, since ->private_lock is used to
    serialize access to the buffers, not the page itself. This is needed because
    clear_page_dirty() will call into page_mkclean() and would thereby violate
    locking order.

    [dhowells@redhat.com: Provide a page_mkclean() implementation for NOMMU]
    Signed-off-by: Peter Zijlstra
    Cc: Hugh Dickins
    Signed-off-by: David Howells
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Peter Zijlstra
     
  • The original commit code assumes that when a buffer on the BJ_SyncData list
    is locked, it is being written to disk. This is not true, and hence it can
    lead to potential data loss on a crash. The code also did not account for
    the fact that journal_dirty_data() can steal buffers from the committing
    transaction, and hence could write buffers that no longer belong to the
    committing transaction. Finally, it could happen that we tried writing out
    one buffer several times.

    The patch below tries to solve these problems by a complete rewrite of the
    data commit code. We go through buffers on t_sync_datalist, lock buffers
    needing write out and store them in an array. Buffers are also immediately
    refiled to BJ_Locked list or unfiled (if the write out is completed). When
    the array is full or we have to block on buffer lock, we submit all
    accumulated buffers for IO.

    [suitable for 2.6.18.x around the 2.6.19-rc2 timeframe]

    Signed-off-by: Jan Kara
    Cc: Badari Pulavarty
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jan Kara
     
  • Fix

    linux/fs/compat.c: In function 'compat_sys_pselect7':
    linux/fs/compat.c:1869: warning: ignoring return value of 'copy_to_user', declared with attribute warn_unused_result

    To make it easier to handle, I changed the semantics to not try to
    write out a timespec if an error occurred. I hope that's OK.

    Cc: dwmw2@infradead.org

    Signed-off-by: Andi Kleen

    Andi Kleen