22 Jun, 2005

7 commits

  • There is a memory leak during mount when CONFIG_SECURITY is enabled and
    mount options are specified.

    Signed-off-by: Gerald Schaefer
    Acked-by: James Morris
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Gerald Schaefer
     
  • try_to_free_pages accepts a third argument, order, but hasn't used it since
    before 2.6.0. The following patch removes the argument and updates all the
    calls to try_to_free_pages.

    Signed-off-by: Darren Hart
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Darren Hart
     
  • Ingo recently introduced a great speedup for allocating new mmaps using the
    free_area_cache pointer which boosts the specweb SSL benchmark by 4-5% and
    causes huge performance increases in thread creation.

    The downside of this patch is that it does lead to fragmentation in the
    mmap-ed areas (visible via /proc/self/maps), such that some applications
    that work fine under 2.4 kernels quickly run out of memory on any 2.6
    kernel.

    The problem is twofold:

    1) the free_area_cache is used to continue a search for memory where
    the last search ended. Before the change, new areas were always
    searched for starting from the base address.

    So now new small areas end up cluttering holes of all sizes
    throughout the whole mmap-able region, whereas before small requests
    tended to close holes near the base, leaving holes far from the base
    large and available for larger requests.

    2) the free_area_cache is also set to the location of the last
    munmap-ed area, so in a scenario where we allocate e.g. five regions of
    1K each and then free regions 4, 2, 3 in this order, the next request for
    1K will be placed in the position of the old region 3, whereas before it
    was appended to the still active region 1, i.e. placed at the location
    of the old region 2. Before we had one free region of 2K; now we only
    get two free regions of 1K -> fragmentation.

    The patch addresses these issues by introducing yet another cache
    descriptor, cached_hole_size, that contains the largest known hole size
    below the current free_area_cache. When a new request comes in, its size
    is compared against cached_hole_size; if the request could be satisfied
    by a hole below free_area_cache, the search is restarted from the base
    instead.
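
    Roughly, the idea can be sketched as the search helper below (an
    illustrative sketch only; the hypothetical sketch_get_unmapped_area() is
    a simplification of the real allocator and omits details such as the
    TASK_SIZE bound and the rescan from the base when the search wraps):

        static unsigned long
        sketch_get_unmapped_area(struct mm_struct *mm, unsigned long len)
        {
                struct vm_area_struct *vma;
                unsigned long addr;

                /* Reuse the cached position only if no known hole below it
                 * could hold this request. */
                if (len > mm->cached_hole_size) {
                        addr = mm->free_area_cache;
                } else {
                        addr = TASK_UNMAPPED_BASE;
                        mm->cached_hole_size = 0;
                }

                for (vma = find_vma(mm, addr); ; vma = vma->vm_next) {
                        if (!vma || addr + len <= vma->vm_start) {
                                /* Remember where the search stopped. */
                                mm->free_area_cache = addr + len;
                                return addr;
                        }
                        /* Track the largest hole skipped over so far. */
                        if (addr + mm->cached_hole_size < vma->vm_start)
                                mm->cached_hole_size = vma->vm_start - addr;
                        addr = vma->vm_end;
                }
        }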

    The results look promising: whereas 2.6.12-rc4 fragments quickly (my
    earlier-posted leakme.c test program terminates after 50000+ iterations
    with 96 distinct and fragmented maps in /proc/self/maps), it performs
    nicely (as expected) with thread creation: Ingo's test_str02 with 20000
    threads requires 0.7s system time.

    Taking out Ingo's patch (un-patch available per request) by basically
    deleting all mentions of free_area_cache from the kernel and starting the
    search for new memory always at the respective bases, we observe: leakme
    terminates successfully with 11 distinct, hardly fragmented areas in
    /proc/self/maps, but thread creation is grindingly slow: 30+s(!) system
    time for Ingo's test_str02 with 20000 threads.

    Now - drumroll ;-) the appended patch works fine with leakme: it ends with
    only 7 distinct areas in /proc/self/maps and also thread creation seems
    sufficiently fast with 0.71s for 20000 threads.

    Signed-off-by: Wolfgang Wander
    Credit-to: "Richard Purdie"
    Signed-off-by: Ken Chen
    Acked-by: Ingo Molnar (partly)
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Wolfgang Wander
     
  • Add /proc/zoneinfo file to display information about memory zones. Useful
    to analyze VM behaviour.

    Signed-off-by: Nikita Danilov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nikita Danilov
     
  • This patch implements a number of smp_processor_id() cleanup ideas that
    Arjan van de Ven and I came up with.

    The previous __smp_processor_id/_smp_processor_id/smp_processor_id API
    spaghetti was hard to follow, both on the implementation side and on the
    usage side.

    Some of the complexity arose from picking wrong names, some of the
    complexity comes from the fact that not all architectures defined
    __smp_processor_id.

    In the new code, there are two externally visible symbols:

    - smp_processor_id(): debug variant.

    - raw_smp_processor_id(): nondebug variant. Replaces all existing
    uses of _smp_processor_id() and __smp_processor_id(). Defined
    by every SMP architecture in include/asm-*/smp.h.

    There is one new internal symbol, dependent on DEBUG_PREEMPT:

    - debug_smp_processor_id(): internal debug variant, mapped to
    smp_processor_id().
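
    Roughly, the resulting mapping can be pictured like this (a simplified
    sketch; the raw definition shown is the i386 one, other architectures
    provide their own):

        /* per-architecture raw accessor, e.g. on i386: */
        #define raw_smp_processor_id() (current_thread_info()->cpu)

        #ifdef CONFIG_DEBUG_PREEMPT
          extern unsigned int debug_smp_processor_id(void);
        # define smp_processor_id() debug_smp_processor_id()
        #else
        # define smp_processor_id() raw_smp_processor_id()
        #endif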

    Also, I moved debug_smp_processor_id() from lib/kernel_lock.c into a new
    lib/smp_processor_id.c file. All related comments got updated and/or
    clarified.

    I have build/boot tested the following 8 .config combinations on x86:

    {SMP,UP} x {PREEMPT,!PREEMPT} x {DEBUG_PREEMPT,!DEBUG_PREEMPT}

    I have also build/boot tested x64 on UP/PREEMPT/DEBUG_PREEMPT. (Other
    architectures are untested, but should work just fine.)

    Signed-off-by: Ingo Molnar
    Signed-off-by: Arjan van de Ven
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ingo Molnar
     
  • Here's a much smaller patch to simply disable devfs from the build. If
    this goes well, and there are no complaints for a few weeks, I'll resend
    my big "devfs-die-die-die" series of patches that rip the whole thing
    out of the kernel tree.

    Signed-off-by: Greg Kroah-Hartman
    Signed-off-by: Linus Torvalds

    Greg KH
     
  • Linus Torvalds
     

21 Jun, 2005

8 commits

  • Without this change I can't set an attribute exactly PAGE_SIZE in
    length. There is no need for zero termination because the interface
    uses lengths.

    From: Jon Smirl
    Signed-off-by: Greg Kroah-Hartman

    Jon Smirl
     
  • o The following patch sets the attributes for newly allocated inodes of
    sysfs objects. If the object has non-default attributes, the inode
    attributes are set from those saved in sysfs_dirent->s_iattr, a pointer
    to struct iattr.

    Signed-off-by: Maneesh Soni
    Signed-off-by: Greg Kroah-Hartman

    Maneesh Soni
     
  • o This adds the ->i_op->setattr VFS method for sysfs inodes. The changed
    attributes are saved in the persistent sysfs_dirent structure as a pointer
    to struct iattr. The struct iattr is allocated only for those sysfs_dirent's
    for which the default attributes are being changed. Thanks to Jon Smirl for
    this suggestion.
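
    A minimal sketch of the lazy allocation described above (illustrative
    only; helper names such as inode_change_ok()/inode_setattr() are the
    generic VFS ones of that era, and the actual patch differs in detail):

        static int sysfs_setattr(struct dentry *dentry, struct iattr *iattr)
        {
                struct sysfs_dirent *sd = dentry->d_fsdata;
                struct inode *inode = dentry->d_inode;
                int error;

                error = inode_change_ok(inode, iattr);
                if (error)
                        return error;

                /* Allocate the persistent copy only when the defaults
                 * actually get changed. */
                if (!sd->s_iattr) {
                        sd->s_iattr = kmalloc(sizeof(struct iattr), GFP_KERNEL);
                        if (!sd->s_iattr)
                                return -ENOMEM;
                        memset(sd->s_iattr, 0, sizeof(struct iattr));
                }

                error = inode_setattr(inode, iattr);
                if (error)
                        return error;

                /* Remember the result for future inode instantiations. */
                sd->s_iattr->ia_mode = inode->i_mode;
                sd->s_iattr->ia_uid  = inode->i_uid;
                sd->s_iattr->ia_gid  = inode->i_gid;
                return 0;
        }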

    Signed-off-by: Maneesh Soni
    Signed-off-by: Greg Kroah-Hartman

    Maneesh Soni
     
  • o The following patch makes sure to attach the sysfs_dirent to the dentry
    before allocating a new inode through sysfs_create(). This change is done
    as preparatory work for implementing ->i_op->setattr() functionality for
    sysfs objects.

    Signed-off-by: Maneesh Soni
    Signed-off-by: Greg Kroah-Hartman

    Maneesh Soni
     
  • Based on the discussion about spufs attributes, this is my suggestion
    for a more generic attribute file support that can be used by both
    debugfs and spufs.

    Simple attribute files behave similarly to sequential files from a
    kernel programmer's perspective, in that a standard set of file
    operations is provided and only an open operation needs to be written
    that registers file-specific get() and set() functions.

    These operations are defined as

    void foo_set(void *data, u64 val); and
    u64 foo_get(void *data);

    where data is the inode->u.generic_ip pointer of the file, and the
    operations just need to make sense of that pointer. The infrastructure
    makes sure this works correctly with concurrent access and partial
    read calls.

    A macro named DEFINE_SIMPLE_ATTRIBUTE is provided to further simplify
    using the attributes.
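
    As a rough usage sketch (the foo_* names are made up for illustration;
    the get()/set() signatures follow the description above):

        static u64 foo_value;

        static u64 foo_get(void *data)
        {
                return *(u64 *)data;
        }

        static void foo_set(void *data, u64 val)
        {
                *(u64 *)data = val;
        }

        DEFINE_SIMPLE_ATTRIBUTE(foo_fops, foo_get, foo_set, "%llu\n");

        static int __init foo_init(void)
        {
                /* a NULL parent places the file in the debugfs root */
                debugfs_create_file("foo", 0644, NULL, &foo_value, &foo_fops);
                return 0;
        }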

    This patch already contains the changes for debugfs to use attributes
    for its internal file operations.

    Signed-off-by: Arnd Bergmann
    Signed-off-by: Greg Kroah-Hartman

    Arnd Bergmann
     
  • Signed-off-by: Greg Kroah-Hartman

    gregkh@suse.de
     
  • sysfs: if an attribute does not implement a show or store method,
    read/write should return -EIO instead of 0 or -EINVAL.
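
    Schematically, the read path gains a guard along these lines (a
    paraphrase of the behaviour, not the actual diff; the write path does
    the same for a missing store method):

        if (!ops->show)
                return -EIO;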

    Signed-off-by: Dmitry Torokhov
    Signed-off-by: Greg Kroah-Hartman

    Dmitry Torokhov
     
  • sysfs: make sysfs_{create|remove}_link take a const char * name.
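
    After the change the prototypes end up roughly as follows (paraphrased
    for reference, not a literal quote of the header):

        int  sysfs_create_link(struct kobject *kobj, struct kobject *target,
                               const char *name);
        void sysfs_remove_link(struct kobject *kobj, const char *name);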

    Signed-off-by: Dmitry Torokhov
    Signed-off-by: Greg Kroah-Hartman

    Dmitry Torokhov
     

20 Jun, 2005

1 commit


19 Jun, 2005

1 commit


17 Jun, 2005

1 commit

  • The ELF core dump code has one use of off_t when writing out segments.
    Some of the segments may be past the 2GB limit of an off_t, even on a
    32-bit system, so it's important to use loff_t instead. This fixes a
    corrupted core dump in the bigcore test in GDB's testsuite.
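
    For illustration only (not from the patch): on a 32-bit kernel off_t is
    a signed 32-bit long while loff_t is a signed 64-bit type, so only the
    wider type can represent offsets beyond 2GB:

        off_t  narrow = (off_t)(3ULL << 30);   /* 3GB: does not fit, ends up negative */
        loff_t wide   = (loff_t)(3ULL << 30);  /* 3GB: representable */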

    Signed-off-by: Daniel Jacobowitz
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Daniel Jacobowitz
     

14 Jun, 2005

1 commit


10 Jun, 2005

2 commits


08 Jun, 2005

1 commit

  • We should never apply a lookup intent to anything other than the last
    path component in an open(), create() or access() call.

    Introduce the helper nfs_lookup_check_intent() which always returns
    zero if LOOKUP_CONTINUE or LOOKUP_PARENT are set, and returns the
    intent flags if we're on the last component of the lookup.
    By doing so, we fix a bug in open(O_EXCL), where we may end up
    optimizing away a real lookup of the parent directory.
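
    A sketch of what such a helper can look like, based on the description
    above (the in-tree version may differ in detail):

        static inline unsigned int
        nfs_lookup_check_intent(struct nameidata *nd, unsigned int mask)
        {
                /* Intents only apply to the last component of the lookup. */
                if (nd->flags & (LOOKUP_CONTINUE | LOOKUP_PARENT))
                        return 0;
                return nd->flags & mask;
        }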

    Problem noticed by Linda Dunaphant
    Signed-off-by: Trond Myklebust
    Signed-off-by: Linus Torvalds

    Trond Myklebust
     

07 Jun, 2005

18 commits

  • Make sure that binfmt_flat passes the correct flags into do_mmap(). nommu's
    validate_mmap_request() will simply return -EINVAL if we try to pass it a
    flags value of zero.

    Signed-off-by: Yoshinori Sato
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Yoshinori Sato
     
  • __do_follow_link() passes a potentially wrong vfsmount to touch_atime(). It
    matters only in the (currently impossible) case of a symlink mounted on
    something, but it's trivial to fix and the fixed behaviour actually makes
    more sense.

    Signed-off-by: Al Viro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Al Viro
     
  • Cosmetic cleanup - the __follow_mount() calls in __link_path_walk() are
    absorbed into do_lookup().

    Obviously equivalent transformation.

    Signed-off-by: Al Viro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Al Viro
     
  • follow_mount() made void; dput()/mntput() reordered in it.

    follow_dotdot() switched from struct vfsmount ** + struct dentry ** to
    struct nameidata *; callers updated.

    Equivalent transformation + fix for too-early-mntput() race.

    Signed-off-by: Al Viro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Al Viro
     
  • Conditional mntput() moved into __do_follow_link(). There it collapses with
    unconditional mntget() on the same sucker, closing another too-early-mntput()
    race.

    Signed-off-by: Al Viro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Al Viro
     
  • Getting rid of sloppy logic:

    a) in do_follow_link() we have the wrong vfsmount dropped if our symlink
    had been mounted on something. Currently it works only because we never
    get into such a situation (modulo a filesystem playing dirty tricks on us).
    And it obfuscates already convoluted logic...

    b) same goes for open_namei().

    c) in __link_path_walk() we have another "it should never happen"
    sloppiness - the out_dput: path there does a double-free on the underlying
    vfsmount and leaks the covering one if we hit it just after crossing a
    mountpoint. Again, the wrong vfsmount gets dropped.

    d) another too-early-mntput() race - in do_follow_mount() we need to
    postpone the conditional mntput(path->mnt) until after dput(path->dentry).
    Again, this one happens only in the
    it-currently-never-happens-unless-some-fs-plays-dirty scenario...

    Signed-off-by: Al Viro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Al Viro
     
  • shifted conditional mntput() into do_follow_link() - all callers were doing
    the same thing.

    Obviously equivalent transformation.

    Signed-off-by: Al Viro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Al Viro
     
  • In open_namei(), at exit_dput: we have mntput() done in the wrong order -
    if nd->mnt != path.mnt we end up doing

        mntput(nd->mnt);
        nd->mnt = path.mnt;
        dput(nd->dentry);
        mntput(nd->mnt);

    which drops nd->dentry too late. Fixed by having path.mnt go first.
    That also allows us to switch the O_NOFOLLOW case under
    if (__follow_mount(...)) back to exit_dput, while we are at it.
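
    Per the description, the fixed exit path ends up roughly like this (a
    sketch of the intended ordering, not the literal diff):

        exit_dput:
                dput(path.dentry);
                if (nd->mnt != path.mnt)
                        mntput(path.mnt);
        exit:
                path_release(nd);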

    Fix for early-mntput() race + equivalent transformation.

    Signed-off-by: Al Viro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Al Viro
     
  • In open_namei() we take mntput(nd->mnt); nd->mnt = path.mnt; out of the
    if (__follow_mount(...)), making it conditional on nd->mnt != path.mnt
    instead.

    Then we shift the result downstream.

    Equivalent transformations.

    Signed-off-by: Al Viro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Al Viro
     
  • shifted conditional mntput() calls in __link_path_walk() downstream.

    Obviously equivalent transformation.

    Signed-off-by: Al Viro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Al Viro
     
  • In open_namei(), __follow_down() loop turned into __follow_mount().
    Instead of

        if we are on a mountpoint dentry
                if O_NOFOLLOW checks fail
                        drop path.dentry
                        drop nd
                        return
                do equivalent of follow_mount(&path.mnt, &path.dentry)
                nd->mnt = path.mnt

    we do

        if __follow_mount(path) had, indeed, traversed mountpoint
                /* now both nd->mnt and path.mnt are pinned down */
                if O_NOFOLLOW checks fail
                        drop path.dentry
                        drop path.mnt
                        drop nd
                        return
                mntput(nd->mnt)
                nd->mnt = path.mnt

    Now __follow_down() can be folded into follow_down() - no other callers left.
    We need to reorder dput()/mntput() there - same problem as in follow_mount().

    Equivalent transformation + fix for a bug in O_NOFOLLOW handling - we used
    to get -ELOOP if we had the same fs mounted on /foo and /bar, had something
    bound on /bar/baz and tried to open /foo/baz with O_NOFOLLOW - and a fix
    for the too-early-mntput() race in follow_down().

    Signed-off-by: Al Viro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Al Viro
     
  • New helper: __follow_mount(struct path *path). Same as follow_mount(), except
    that we do *not* do mntput() after the first lookup_mnt().

    IOW, original path->mnt stays pinned down. We also take care to do dput()
    before mntput() in the loop body (follow_mount() also needs that reordering,
    but that will be done later in the series).

    The following are equivalent, assuming that path.mnt == x:

    (1)
        follow_mount(&path.mnt, &path.dentry)

    (2)
        __follow_mount(&path);
        if (path.mnt != x)
                mntput(x);

    (3)
        if (__follow_mount(&path))
                mntput(x);

    Callers of follow_mount() in __link_path_walk() converted to (2).

    Equivalent transformation + fix for the too-early-mntput() race in the
    __follow_mount() loop.

    Signed-off-by: Al Viro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Al Viro
     
  • In open_namei() we never use path.mnt or path.dentry after exit: or ok:.
    Assignment of path.dentry in the LAST_BIND case is dead code and only
    obfuscates an already convoluted function; the assignment of path.mnt after
    __do_follow_link() can be moved down to the place where we set path.dentry.

    Obviously equivalent transformations, just to clean the air a bit in that
    region.

    Signed-off-by: Al Viro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Al Viro
     
  • The first argument of __do_follow_link() switched to struct path *
    (__do_follow_link(path->dentry, ...) -> __do_follow_link(path, ...)).

    All callers have the same calls of mntget() right before and dput()/mntput()
    right after __do_follow_link(); these calls have been moved inside.

    Obviously equivalent transformations.

    Signed-off-by: Al Viro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Al Viro
     
  • mntget(path->mnt) in do_follow_link() moved down to right before the
    __do_follow_link() call and right after loop:, respectively.

    dput()+mntput() on the non-ELOOP branch moved up to right after the
    __do_follow_link() call.

    The resulting

        loop:
                mntget(path->mnt);
                path_release(nd);
                dput(path->dentry);
                mntput(path->mnt);

    is replaced with the equivalent

        dput(path->dentry);
        path_release(nd);

    Equivalent transformations - the reason why we have that mntget() is that
    __do_follow_link() can drop a reference to nd->mnt, and that's what holds
    path->mnt. So that call can happen at any point prior to __do_follow_link()
    touching nd->mnt. The rest is obvious.

    NOTE: current tree relies on symlinks *never* being mounted on anything. It's
    not hard to get rid of that assumption (actually, that will come for free
    later in the series). For now we are just not making the situation worse than
    it is.

    Signed-off-by: Al Viro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Al Viro
     
  • fix for too early mntput() in open_namei() - we pin path.mnt down for the
    duration of __do_follow_link(). Otherwise we could get the fs where our
    symlink lived unmounted while we were in __do_follow_link(). That would end
    up with dentry of symlink staying pinned down through the fs shutdown.

    Signed-off-by: Al Viro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Al Viro
     
  • path.mnt in open_namei() set to mirror nd->mnt.

    nd->mnt is set in 3 places in that function - path_lookup() in the beginning,
    __follow_down() loop after do_last: and __do_follow_link() call after
    do_link:.

    We set path.mnt to nd->mnt after path_lookup() and __do_follow_link(). In
    __follow_down() loop we use &path.mnt instead of &nd->mnt and set nd->mnt to
    path.mnt immediately after that loop.

    Obviously equivalent transformation.

    Signed-off-by: Al Viro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Al Viro
     
  • Replaced struct dentry *dentry in namei with struct path path. All uses of
    dentry replaced with path.dentry there.

    Obviously equivalent transformation.

    Signed-off-by: Al Viro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Al Viro