22 Jun, 2005

7 commits

  • There is a memory leak during mount when CONFIG_SECURITY is enabled and
    mount options are specified.

    Signed-off-by: Gerald Schaefer
    Acked-by: James Morris
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Gerald Schaefer
     
  • try_to_free_pages accepts a third argument, order, but hasn't used it since
    before 2.6.0. The following patch removes the argument and updates all the
    calls to try_to_free_pages.

    Signed-off-by: Darren Hart
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Darren Hart
     
  • Ingo recently introduced a great speedup for allocating new mmaps using the
    free_area_cache pointer which boosts the specweb SSL benchmark by 4-5% and
    causes huge performance increases in thread creation.

    The downside of this patch is that it does lead to fragmentation in the
    mmap-ed areas (visible via /proc/self/maps), such that some applications
    that work fine under 2.4 kernels quickly run out of memory on any 2.6
    kernel.

    The problem is twofold:

    1) the free_area_cache is used to continue a search for memory where
    the last search ended. Before the change, new areas were always
    searched for starting from the base address.

    So now new small areas end up cluttering holes of all sizes
    throughout the whole mmap-able region, whereas before small requests
    tended to close holes near the base, leaving holes far from the base
    large and available for larger requests.

    2) the free_area_cache is also set to the location of the last
    munmap-ed area, so in a scenario where we allocate e.g. five regions of
    1K each and then free regions 4, 2, 3 in this order, the next request for
    1K will be placed in the position of the old region 3, whereas before it
    was appended to the still active region 1, i.e. placed at the location
    of the old region 2. Before we had one free region of 2K; now we only
    get two free regions of 1K -> fragmentation.

    The patch addresses these issues by introducing yet another cache
    descriptor, cached_hole_size, that contains the largest known hole size
    below the current free_area_cache. When a new request comes in, its size
    is compared against cached_hole_size; if the request could be satisfied
    by a hole below free_area_cache, the search is restarted from the base
    instead.
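
    Roughly, the idea can be sketched as the search helper below (an
    illustrative sketch only; the hypothetical sketch_get_unmapped_area() is
    a simplification of the real allocator and omits details such as the
    TASK_SIZE bound and the rescan from the base when the search wraps):

        static unsigned long
        sketch_get_unmapped_area(struct mm_struct *mm, unsigned long len)
        {
                struct vm_area_struct *vma;
                unsigned long addr;

                /* Reuse the cached position only if no known hole below it
                 * could hold this request. */
                if (len > mm->cached_hole_size) {
                        addr = mm->free_area_cache;
                } else {
                        addr = TASK_UNMAPPED_BASE;
                        mm->cached_hole_size = 0;
                }

                for (vma = find_vma(mm, addr); ; vma = vma->vm_next) {
                        if (!vma || addr + len <= vma->vm_start) {
                                /* Remember where the search stopped. */
                                mm->free_area_cache = addr + len;
                                return addr;
                        }
                        /* Track the largest hole skipped over so far. */
                        if (addr + mm->cached_hole_size < vma->vm_start)
                                mm->cached_hole_size = vma->vm_start - addr;
                        addr = vma->vm_end;
                }
        }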

    The results look promising: whereas 2.6.12-rc4 fragments quickly (my
    earlier-posted leakme.c test program terminates after 50000+ iterations
    with 96 distinct and fragmented maps in /proc/self/maps), it performs
    nicely (as expected) with thread creation: Ingo's test_str02 with 20000
    threads requires 0.7s system time.

    Taking out Ingo's patch (un-patch available per request) by basically
    deleting all mentions of free_area_cache from the kernel and starting the
    search for new memory always at the respective bases, we observe: leakme
    terminates successfully with 11 distinct, hardly fragmented areas in
    /proc/self/maps, but thread creation is grindingly slow: 30+s(!) system
    time for Ingo's test_str02 with 20000 threads.

    Now - drumroll ;-) the appended patch works fine with leakme: it ends with
    only 7 distinct areas in /proc/self/maps and also thread creation seems
    sufficiently fast with 0.71s for 20000 threads.

    Signed-off-by: Wolfgang Wander
    Credit-to: "Richard Purdie"
    Signed-off-by: Ken Chen
    Acked-by: Ingo Molnar (partly)
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Wolfgang Wander
     
  • Add /proc/zoneinfo file to display information about memory zones. Useful
    to analyze VM behaviour.

    Signed-off-by: Nikita Danilov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nikita Danilov
     
  • This patch implements a number of smp_processor_id() cleanup ideas that
    Arjan van de Ven and I came up with.

    The previous __smp_processor_id/_smp_processor_id/smp_processor_id API
    spaghetti was hard to follow, both on the implementation side and on the
    usage side.

    Some of the complexity arose from picking wrong names, some of the
    complexity comes from the fact that not all architectures defined
    __smp_processor_id.

    In the new code, there are two externally visible symbols:

    - smp_processor_id(): debug variant.

    - raw_smp_processor_id(): nondebug variant. Replaces all existing
    uses of _smp_processor_id() and __smp_processor_id(). Defined
    by every SMP architecture in include/asm-*/smp.h.

    There is one new internal symbol, dependent on DEBUG_PREEMPT:

    - debug_smp_processor_id(): internal debug variant, mapped to
    smp_processor_id().
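
    Roughly, the resulting mapping can be pictured like this (a simplified
    sketch; the raw definition shown is the i386 one, other architectures
    provide their own):

        /* per-architecture raw accessor, e.g. on i386: */
        #define raw_smp_processor_id() (current_thread_info()->cpu)

        #ifdef CONFIG_DEBUG_PREEMPT
          extern unsigned int debug_smp_processor_id(void);
        # define smp_processor_id() debug_smp_processor_id()
        #else
        # define smp_processor_id() raw_smp_processor_id()
        #endif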

    Also, I moved debug_smp_processor_id() from lib/kernel_lock.c into a new
    lib/smp_processor_id.c file. All related comments got updated and/or
    clarified.

    I have build/boot tested the following 8 .config combinations on x86:

    {SMP,UP} x {PREEMPT,!PREEMPT} x {DEBUG_PREEMPT,!DEBUG_PREEMPT}

    I have also build/boot tested x64 on UP/PREEMPT/DEBUG_PREEMPT. (Other
    architectures are untested, but should work just fine.)

    Signed-off-by: Ingo Molnar
    Signed-off-by: Arjan van de Ven
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ingo Molnar
     
  • Here's a much smaller patch to simply disable devfs from the build. If
    this goes well, and there are no complaints for a few weeks, I'll resend
    my big "devfs-die-die-die" series of patches that rip the whole thing
    out of the kernel tree.

    Signed-off-by: Greg Kroah-Hartman
    Signed-off-by: Linus Torvalds

    Greg KH
     
  • Linus Torvalds
     

21 Jun, 2005

8 commits

  • Without this change I can't set an attribute exactly PAGE_SIZE in
    length. There is no need for zero termination because the interface
    uses lengths.

    From: Jon Smirl
    Signed-off-by: Greg Kroah-Hartman

    Jon Smirl
     
  • o The following patch sets the attributes for newly allocated inodes of
    sysfs objects. If the object has non-default attributes, the inode
    attributes are set from those saved in sysfs_dirent->s_iattr, a pointer
    to struct iattr.

    Signed-off-by: Maneesh Soni
    Signed-off-by: Greg Kroah-Hartman

    Maneesh Soni
     
  • o This adds the ->i_op->setattr VFS method for sysfs inodes. The changed
    attributes are saved in the persistent sysfs_dirent structure as a pointer
    to struct iattr. The struct iattr is allocated only for those sysfs_dirent's
    for which the default attributes are being changed. Thanks to Jon Smirl for
    this suggestion.
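
    A minimal sketch of the lazy allocation described above (illustrative
    only; helper names such as inode_change_ok()/inode_setattr() are the
    generic VFS ones of that era, and the actual patch differs in detail):

        static int sysfs_setattr(struct dentry *dentry, struct iattr *iattr)
        {
                struct sysfs_dirent *sd = dentry->d_fsdata;
                struct inode *inode = dentry->d_inode;
                int error;

                error = inode_change_ok(inode, iattr);
                if (error)
                        return error;

                /* Allocate the persistent copy only when the defaults
                 * actually get changed. */
                if (!sd->s_iattr) {
                        sd->s_iattr = kmalloc(sizeof(struct iattr), GFP_KERNEL);
                        if (!sd->s_iattr)
                                return -ENOMEM;
                        memset(sd->s_iattr, 0, sizeof(struct iattr));
                }

                error = inode_setattr(inode, iattr);
                if (error)
                        return error;

                /* Remember the result for future inode instantiations. */
                sd->s_iattr->ia_mode = inode->i_mode;
                sd->s_iattr->ia_uid  = inode->i_uid;
                sd->s_iattr->ia_gid  = inode->i_gid;
                return 0;
        }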

    Signed-off-by: Maneesh Soni
    Signed-off-by: Greg Kroah-Hartman

    Maneesh Soni
     
  • o The following patch makes sure to attach the sysfs_dirent to the dentry
    before allocating a new inode through sysfs_create(). This change is done
    as preparatory work for implementing ->i_op->setattr() functionality for
    sysfs objects.

    Signed-off-by: Maneesh Soni
    Signed-off-by: Greg Kroah-Hartman

    Maneesh Soni
     
  • Based on the discussion about spufs attributes, this is my suggestion
    for a more generic attribute file support that can be used by both
    debugfs and spufs.

    Simple attribute files behave similarly to sequential files from a
    kernel programmer's perspective, in that a standard set of file
    operations is provided and only an open operation needs to be written
    that registers file-specific get() and set() functions.

    These operations are defined as

    void foo_set(void *data, u64 val); and
    u64 foo_get(void *data);

    where data is the inode->u.generic_ip pointer of the file, and the
    operations just need to make sense of that pointer. The infrastructure
    makes sure this works correctly with concurrent access and partial
    read calls.

    A macro named DEFINE_SIMPLE_ATTRIBUTE is provided to further simplify
    using the attributes.
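
    As a rough usage sketch (the foo_* names are made up for illustration;
    the get()/set() signatures follow the description above):

        static u64 foo_value;

        static u64 foo_get(void *data)
        {
                return *(u64 *)data;
        }

        static void foo_set(void *data, u64 val)
        {
                *(u64 *)data = val;
        }

        DEFINE_SIMPLE_ATTRIBUTE(foo_fops, foo_get, foo_set, "%llu\n");

        static int __init foo_init(void)
        {
                /* a NULL parent places the file in the debugfs root */
                debugfs_create_file("foo", 0644, NULL, &foo_value, &foo_fops);
                return 0;
        }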

    This patch already contains the changes for debugfs to use attributes
    for its internal file operations.

    Signed-off-by: Arnd Bergmann
    Signed-off-by: Greg Kroah-Hartman

    Arnd Bergmann
     
  • Signed-off-by: Greg Kroah-Hartman

    gregkh@suse.de
     
  • sysfs: if an attribute does not implement a show or store method,
    read/write should return -EIO instead of 0 or -EINVAL.
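
    Schematically, the read path gains a guard along these lines (a
    paraphrase of the behaviour, not the actual diff; the write path does
    the same for a missing store method):

        if (!ops->show)
                return -EIO;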

    Signed-off-by: Dmitry Torokhov
    Signed-off-by: Greg Kroah-Hartman

    Dmitry Torokhov
     
  • sysfs: make sysfs_{create|remove}_link take a const char * name.
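
    After the change the prototypes end up roughly as follows (paraphrased
    for reference, not a literal quote of the header):

        int  sysfs_create_link(struct kobject *kobj, struct kobject *target,
                               const char *name);
        void sysfs_remove_link(struct kobject *kobj, const char *name);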

    Signed-off-by: Dmitry Torokhov
    Signed-off-by: Greg Kroah-Hartman

    Dmitry Torokhov
     

20 Jun, 2005

1 commit


19 Jun, 2005

1 commit


17 Jun, 2005

1 commit

  • The ELF core dump code has one use of off_t when writing out segments.
    Some of the segments may be past the 2GB limit of an off_t, even on a
    32-bit system, so it's important to use loff_t instead. This fixes a
    corrupted core dump in the bigcore test in GDB's testsuite.
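
    For illustration only (not from the patch): on a 32-bit kernel off_t is
    a signed 32-bit long while loff_t is a signed 64-bit type, so only the
    wider type can represent offsets beyond 2GB:

        off_t  narrow = (off_t)(3ULL << 30);   /* 3GB: does not fit, ends up negative */
        loff_t wide   = (loff_t)(3ULL << 30);  /* 3GB: representable */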

    Signed-off-by: Daniel Jacobowitz
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Daniel Jacobowitz
     

14 Jun, 2005

1 commit


10 Jun, 2005

2 commits


08 Jun, 2005

1 commit

  • We should never apply a lookup intent to anything other than the last
    path component in an open(), create() or access() call.

    Introduce the helper nfs_lookup_check_intent() which always returns
    zero if LOOKUP_CONTINUE or LOOKUP_PARENT are set, and returns the
    intent flags if we're on the last component of the lookup.
    By doing so, we fix a bug in open(O_EXCL), where we may end up
    optimizing away a real lookup of the parent directory.
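
    A sketch of what such a helper can look like, based on the description
    above (the in-tree version may differ in detail):

        static inline unsigned int
        nfs_lookup_check_intent(struct nameidata *nd, unsigned int mask)
        {
                /* Intents only apply to the last component of the lookup. */
                if (nd->flags & (LOOKUP_CONTINUE | LOOKUP_PARENT))
                        return 0;
                return nd->flags & mask;
        }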

    Problem noticed by Linda Dunaphant
    Signed-off-by: Trond Myklebust
    Signed-off-by: Linus Torvalds

    Trond Myklebust
     

07 Jun, 2005

18 commits

  • Make sure that binfmt_flat passes the correct flags into do_mmap(). nommu's
    validate_mmap_request() will simply return -EINVAL if we try to pass it a
    flags value of zero.

    Signed-off-by: Yoshinori Sato
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Yoshinori Sato
     
  • __do_follow_link() passes a potentially wrong vfsmount to touch_atime(). It
    matters only in the (currently impossible) case of a symlink mounted on
    something, but it's trivial to fix and the fixed behaviour actually makes
    more sense.

    Signed-off-by: Al Viro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Al Viro
     
  • Cosmetic cleanup - the __follow_mount() calls in __link_path_walk() are
    absorbed into do_lookup().

    Obviously equivalent transformation.

    Signed-off-by: Al Viro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Al Viro
     
  • follow_mount() made void; dput()/mntput() reordered in it.

    follow_dotdot() switched from struct vfsmount ** + struct dentry ** to
    struct nameidata *; callers updated.

    Equivalent transformation + fix for too-early-mntput() race.

    Signed-off-by: Al Viro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Al Viro
     
  • Conditional mntput() moved into __do_follow_link(). There it collapses with
    unconditional mntget() on the same sucker, closing another too-early-mntput()
    race.

    Signed-off-by: Al Viro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Al Viro
     
  • Getting rid of sloppy logic:

    a) in do_follow_link() we have the wrong vfsmount dropped if our symlink
    had been mounted on something. Currently it works only because we never
    get into such a situation (modulo a filesystem playing dirty tricks on us).
    And it obfuscates already convoluted logic...

    b) same goes for open_namei().

    c) in __link_path_walk() we have another "it should never happen"
    sloppiness - the out_dput: path there does a double-free on the underlying
    vfsmount and leaks the covering one if we hit it just after crossing a
    mountpoint. Again, the wrong vfsmount gets dropped.

    d) another too-early-mntput() race - in do_follow_mount() we need to
    postpone the conditional mntput(path->mnt) until after dput(path->dentry).
    Again, this one happens only in the
    it-currently-never-happens-unless-some-fs-plays-dirty scenario...

    Signed-off-by: Al Viro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Al Viro
     
  • shifted conditional mntput() into do_follow_link() - all callers were doing
    the same thing.

    Obviously equivalent transformation.

    Signed-off-by: Al Viro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Al Viro
     
  • In open_namei(), at exit_dput: we have mntput() done in the wrong order -
    if nd->mnt != path.mnt we end up doing

        mntput(nd->mnt);
        nd->mnt = path.mnt;
        dput(nd->dentry);
        mntput(nd->mnt);

    which drops nd->dentry too late. Fixed by having path.mnt go first.
    That also allows us to switch the O_NOFOLLOW case under
    if (__follow_mount(...)) back to exit_dput, while we are at it.
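
    Per the description, the fixed exit path ends up roughly like this (a
    sketch of the intended ordering, not the literal diff):

        exit_dput:
                dput(path.dentry);
                if (nd->mnt != path.mnt)
                        mntput(path.mnt);
        exit:
                path_release(nd);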

    Fix for early-mntput() race + equivalent transformation.

    Signed-off-by: Al Viro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Al Viro
     
  • In open_namei() we take mntput(nd->mnt); nd->mnt = path.mnt; out of the
    if (__follow_mount(...)), making it conditional on nd->mnt != path.mnt
    instead.

    Then we shift the result downstream.

    Equivalent transformations.

    Signed-off-by: Al Viro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Al Viro
     
  • shifted conditional mntput() calls in __link_path_walk() downstream.

    Obviously equivalent transformation.

    Signed-off-by: Al Viro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Al Viro
     
  • In open_namei(), __follow_down() loop turned into __follow_mount().
    Instead of

        if we are on a mountpoint dentry
                if O_NOFOLLOW checks fail
                        drop path.dentry
                        drop nd
                        return
                do equivalent of follow_mount(&path.mnt, &path.dentry)
                nd->mnt = path.mnt

    we do

        if __follow_mount(path) had, indeed, traversed mountpoint
                /* now both nd->mnt and path.mnt are pinned down */
                if O_NOFOLLOW checks fail
                        drop path.dentry
                        drop path.mnt
                        drop nd
                        return
                mntput(nd->mnt)
                nd->mnt = path.mnt

    Now __follow_down() can be folded into follow_down() - no other callers left.
    We need to reorder dput()/mntput() there - same problem as in follow_mount().

    Equivalent transformation + fix for a bug in O_NOFOLLOW handling - we used
    to get -ELOOP if we had the same fs mounted on /foo and /bar, had something
    bound on /bar/baz and tried to open /foo/baz with O_NOFOLLOW - and a fix
    for the too-early-mntput() race in follow_down().

    Signed-off-by: Al Viro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Al Viro
     
  • New helper: __follow_mount(struct path *path). Same as follow_mount(), except
    that we do *not* do mntput() after the first lookup_mnt().

    IOW, original path->mnt stays pinned down. We also take care to do dput()
    before mntput() in the loop body (follow_mount() also needs that reordering,
    but that will be done later in the series).

    The following are equivalent, assuming that path.mnt == x:

    (1)
        follow_mount(&path.mnt, &path.dentry)

    (2)
        __follow_mount(&path);
        if (path.mnt != x)
                mntput(x);

    (3)
        if (__follow_mount(&path))
                mntput(x);

    Callers of follow_mount() in __link_path_walk() converted to (2).

    Equivalent transformation + fix for the too-early-mntput() race in the
    __follow_mount() loop.

    Signed-off-by: Al Viro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Al Viro
     
  • In open_namei() we never use path.mnt or path.dentry after exit: or ok:.
    Assignment of path.dentry in the LAST_BIND case is dead code and only
    obfuscates an already convoluted function; the assignment of path.mnt after
    __do_follow_link() can be moved down to the place where we set path.dentry.

    Obviously equivalent transformations, just to clean the air a bit in that
    region.

    Signed-off-by: Al Viro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Al Viro
     
  • The first argument of __do_follow_link() switched to struct path *
    (__do_follow_link(path->dentry, ...) -> __do_follow_link(path, ...)).

    All callers have the same calls of mntget() right before and dput()/mntput()
    right after __do_follow_link(); these calls have been moved inside.

    Obviously equivalent transformations.

    Signed-off-by: Al Viro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Al Viro
     
  • mntget(path->mnt) in do_follow_link() moved down to right before the
    __do_follow_link() call and right after loop:, respectively.

    dput()+mntput() on the non-ELOOP branch moved up to right after the
    __do_follow_link() call.

    The resulting

        loop:
                mntget(path->mnt);
                path_release(nd);
                dput(path->dentry);
                mntput(path->mnt);

    is replaced with the equivalent

        dput(path->dentry);
        path_release(nd);

    Equivalent transformations - the reason why we have that mntget() is that
    __do_follow_link() can drop a reference to nd->mnt, and that's what holds
    path->mnt. So that call can happen at any point prior to __do_follow_link()
    touching nd->mnt. The rest is obvious.

    NOTE: current tree relies on symlinks *never* being mounted on anything. It's
    not hard to get rid of that assumption (actually, that will come for free
    later in the series). For now we are just not making the situation worse than
    it is.

    Signed-off-by: Al Viro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Al Viro
     
  • fix for too early mntput() in open_namei() - we pin path.mnt down for the
    duration of __do_follow_link(). Otherwise we could get the fs where our
    symlink lived unmounted while we were in __do_follow_link(). That would end
    up with dentry of symlink staying pinned down through the fs shutdown.

    Signed-off-by: Al Viro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Al Viro
     
  • path.mnt in open_namei() set to mirror nd->mnt.

    nd->mnt is set in 3 places in that function - path_lookup() in the beginning,
    __follow_down() loop after do_last: and __do_follow_link() call after
    do_link:.

    We set path.mnt to nd->mnt after path_lookup() and __do_follow_link(). In
    __follow_down() loop we use &path.mnt instead of &nd->mnt and set nd->mnt to
    path.mnt immediately after that loop.

    Obviously equivalent transformation.

    Signed-off-by: Al Viro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Al Viro
     
  • Replaced struct dentry *dentry in namei with struct path path. All uses of
    dentry replaced with path.dentry there.

    Obviously equivalent transformation.

    Signed-off-by: Al Viro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Al Viro