Eric Lee / smarc-fsl-linux-kernel

13 Jul, 2009

1 commit

405f55712 headers: smp_lock.h redux ... Browse Code »

* Remove smp_lock.h from files which don't need it (including some headers!)
* Add smp_lock.h to files which do need it
* Make smp_lock.h include conditional in hardirq.h
It's needed only for one kernel_locked() usage which is under CONFIG_PREEMPT

This will make hardirq.h inclusion cheaper for every PREEMPT=n config
(which includes allmodconfig/allyesconfig, BTW)

Signed-off-by: Alexey Dobriyan
Signed-off-by: Linus Torvalds

Alexey Dobriyan
2009-07-13 03:22:34 +0800

17 Jun, 2009

2 commits

8eeee4e2f send_sigio_to_task: sanitize the usage of fown->signum ... Browse Code »

send_sigio_to_task() reads fown->signum several times, we can race with
F_SETSIG which changes ->signum lockless. In theory, this can fool
security checks or we can call group_send_sig_info() with the wrong
->si_signo which does not match "int sig".

Change the code to cache ->signum.

Signed-off-by: Oleg Nesterov
Signed-off-by: Linus Torvalds

Oleg Nesterov
2009-06-17 06:36:17 +0800
2f38d70fb shift current_cred() from __f_setown() to f_modown() ... Browse Code »

Shift current_cred() from __f_setown() to f_modown(). This reduces
the number of arguments and saves 48 bytes from fs/fcntl.o.

[ Note: this doesn't clear euid/uid when pid is set to NULL. But if
f_owner.pid == NULL we never use f_owner.uid/euid. Otherwise we'd
have a bug anyway: we must not send signals if pid was reset to NULL. ]

Signed-off-by: Oleg Nesterov
Acked-by: David Howells
Signed-off-by: Linus Torvalds

Oleg Nesterov
2009-06-17 05:19:00 +0800

12 May, 2009

1 commit

2b79bc4f7 dup2: Fix return value with oldfd == newfd and invalid fd ... Browse Code »

The return value of dup2 when oldfd == newfd and the fd isn't valid is
not getting properly sign extended. We end up with 4294967287 instead
of -EBADF.

I've reproduced this on SLE11 (2.6.27.21), openSUSE Factory
(2.6.29-rc5), and Ubuntu 9.04 (2.6.28).

This patch uses a signed int for the error value so it is properly
extended.

Commit 6c5d0512a091480c9f981162227fdb1c9d70e555 introduced this
regression.

Reported-by: Jiri Dluhos
Signed-off-by: Jeff Mahoney
Signed-off-by: Linus Torvalds

Jeff Mahoney
2009-05-12 03:18:06 +0800

30 Mar, 2009

1 commit

4a6a44996 Fix a lockdep warning in fasync_helper() ... Browse Code »

Lockdep gripes if file->f_lock is taken in a no-IRQ situation, since that
is not always the case. We don't really want to disable IRQs for every
acquisition of f_lock; instead, just move it outside of fasync_lock.

Reported-by: Bartlomiej Zolnierkiewicz
Reported-by: Larry Finger
Reported-by: Wu Fengguang
Signed-off-by: Jonathan Corbet

Jonathan Corbet
2009-03-30 22:00:24 +0800

16 Mar, 2009

3 commits

60aa49243 Rationalize fasync return values ... Browse Code »

Most fasync implementations do something like:

return fasync_helper(...);

But fasync_helper() will return a positive value at times - a feature used
in at least one place. Thus, a number of other drivers do:

err = fasync_helper(...);
if (err < 0)
return err;
return 0;

In the interests of consistency and more concise code, it makes sense to
map positive return values onto zero where ->fasync() is called.

Cc: Al Viro
Signed-off-by: Jonathan Corbet

Jonathan Corbet
2009-03-16 22:34:35 +0800
76398425b Move FASYNC bit handling to f_op->fasync() ... Browse Code »

Removing the BKL from FASYNC handling ran into the challenge of keeping the
setting of the FASYNC bit in filp->f_flags atomic with regard to calls to
the underlying fasync() function. Andi Kleen suggested moving the handling
of that bit into fasync(); this patch does exactly that. As a result, we
have a couple of internal API changes: fasync() must now manage the FASYNC
bit, and it will be called without the BKL held.

As it happens, every fasync() implementation in the kernel with one
exception calls fasync_helper(). So, if we make fasync_helper() set the
FASYNC bit, we can avoid making any changes to the other fasync()
functions - as long as those functions, themselves, have proper locking.
Most fasync() implementations do nothing but call fasync_helper() - which
has its own lock - so they are easily verified as correct. The BKL had
already been pushed down into the rest.

The networking code has its own version of fasync_helper(), so that code
has been augmented with explicit FASYNC bit handling.

Cc: Al Viro
Cc: David Miller
Reviewed-by: Christoph Hellwig
Signed-off-by: Jonathan Corbet

Jonathan Corbet
2009-03-16 22:32:27 +0800
db1dd4d37 Use f_lock to protect f_flags ... Browse Code »

Traditionally, changes to struct file->f_flags have been done under BKL
protection, or with no protection at all. This patch causes all f_flags
changes after file open/creation time to be done under protection of
f_lock. This allows the removal of some BKL usage and fixes a number of
longstanding (if microscopic) races.

Reviewed-by: Christoph Hellwig
Cc: Al Viro
Signed-off-by: Jonathan Corbet

Jonathan Corbet
2009-03-16 22:32:27 +0800

14 Jan, 2009

1 commit

a26eab240 [CVE-2009-0029] System call wrappers part 15 ... Browse Code »

Signed-off-by: Heiko Carstens

Heiko Carstens
2009-01-14 21:15:24 +0800

25 Dec, 2008

1 commit

cbacc2c7f Merge branch 'next' into for-linus Browse Code »

James Morris
2008-12-25 08:40:09 +0800

06 Dec, 2008

1 commit

218d11a8b Fix a race condition in FASYNC handling ... Browse Code »

Changeset a238b790d5f99c7832f9b73ac8847025815b85f7 (Call fasync()
functions without the BKL) introduced a race which could leave
file->f_flags in a state inconsistent with what the underlying
driver/filesystem believes. Revert that change, and also fix the same
races in ioctl_fioasync() and ioctl_fionbio().

This is a minimal, short-term fix; the real fix will not involve the
BKL.

Reported-by: Oleg Nesterov
Cc: Andi Kleen
Cc: Al Viro
Cc: stable@kernel.org
Signed-off-by: Jonathan Corbet
Signed-off-by: Linus Torvalds

Jonathan Corbet
2008-12-06 07:35:10 +0800

14 Nov, 2008

4 commits

c69e8d9c0 CRED: Use RCU to access another task's creds and to release a task's own creds ... Browse Code »

Use RCU to access another task's creds and to release a task's own creds.
This means that it will be possible for the credentials of a task to be
replaced without another task (a) requiring a full lock to read them, and (b)
seeing deallocated memory.

Signed-off-by: David Howells
Acked-by: James Morris
Acked-by: Serge Hallyn
Signed-off-by: James Morris

David Howells
2008-11-14 07:39:19 +0800
86a264abe CRED: Wrap current->cred and a few other accessors ... Browse Code »

Wrap current->cred and a few other accessors to hide their actual
implementation.

Signed-off-by: David Howells
Acked-by: James Morris
Acked-by: Serge Hallyn
Signed-off-by: James Morris

David Howells
2008-11-14 07:39:18 +0800
b6dff3ec5 CRED: Separate task security context from task_struct ... Browse Code »

Separate the task security context from task_struct. At this point, the
security data is temporarily embedded in the task_struct with two pointers
pointing to it.

Note that the Alpha arch is altered as it refers to (E)UID and (E)GID in
entry.S via asm-offsets.

With comment fixes Signed-off-by: Marc Dionne

Signed-off-by: David Howells
Acked-by: James Morris
Acked-by: Serge Hallyn
Signed-off-by: James Morris

David Howells
2008-11-14 07:39:16 +0800
da9592ede CRED: Wrap task credential accesses in the filesystem subsystem ... Browse Code »

Wrap access to task credentials so that they can be separated more easily from
the task_struct during the introduction of COW creds.

Change most current->(|e|s|fs)[ug]id to current_(|e|s|fs)[ug]id().

Change some task->e?[ug]id to task_e?[ug]id(). In some places it makes more
sense to use RCU directly rather than a convenient wrapper; these will be
addressed by later patches.

Signed-off-by: David Howells
Reviewed-by: James Morris
Acked-by: Serge Hallyn
Cc: Al Viro
Signed-off-by: James Morris

David Howells
2008-11-14 07:39:05 +0800

01 Aug, 2008

2 commits

1b7e190b4 [PATCH] clean dup2() up a bit ... Browse Code »

Signed-off-by: Al Viro

Al Viro
2008-08-01 23:25:24 +0800
1027abe88 [PATCH] merge locate_fd() and get_unused_fd() ... Browse Code »

New primitive: alloc_fd(start, flags). get_unused_fd() and
get_unused_fd_flags() become wrappers on top of it.

Signed-off-by: Al Viro

Al Viro
2008-08-01 23:25:23 +0800

27 Jul, 2008

3 commits

4e1e018ec [PATCH] fix RLIM_NOFILE handling ... Browse Code »

* dup2() should return -EBADF on exceeded sysctl_nr_open
* dup() should *not* return -EINVAL even if you have rlimit set to 0;
it should get -EMFILE instead.

Check for orig_start exceeding rlimit taken to sys_fcntl().
Failing expand_files() in dup{2,3}() now gets -EMFILE remapped to -EBADF.
Consequently, remaining checks for rlimit are taken to expand_files().

Signed-off-by: Al Viro

Al Viro
2008-07-27 08:53:45 +0800
6c5d0512a [PATCH] get rid of corner case in dup3() entirely ... Browse Code »

Since Ulrich is OK with getting rid of dup3(fd, fd, flags) completely,
to hell the damn thing goes. Corner case for dup2() is handled in
sys_dup2() (complete with -EBADF if dup2(fd, fd) is called with fd
that is not open), the rest is done in dup3().

Signed-off-by: Al Viro

Al Viro
2008-07-27 08:53:44 +0800
3c333937e [PATCH] dup3 fix ... Browse Code »

Al Viro notice one cornercase that the new dup3() code. The dup2()
function, as a special case, handles dup-ing to the same file
descriptor. In this case the current dup3() code does nothing at
all. I.e., it ingnores the flags parameter. This shouldn't happen,
the close-on-exec flag should be set if requested.

In case the O_CLOEXEC bit in the flags parameter is not set the
dup3() function should behave in this respect identical to dup2().
This means dup3(fd, fd, 0) should not actively reset the c-o-e
flag.

The patch below implements this minor change.

[AV: credits to Artur Grabowski for bringing that up as potential subtle point
in dup2() behaviour]

Signed-off-by: Ulrich Drepper
Signed-off-by: Al Viro

Ulrich Drepper
2008-07-27 08:53:39 +0800

25 Jul, 2008

1 commit

336dd1f70 flag parameters: dup2 ... Browse Code »

This patch adds the new dup3 syscall. It extends the old dup2 syscall by one
parameter which is meant to hold a flag value. Support for the O_CLOEXEC flag
is added in this patch.

The following test must be adjusted for architectures other than x86 and
x86-64 and in case the syscall numbers changed.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
#include
#include
#include
#include
#include

#ifndef __NR_dup3
# ifdef __x86_64__
# define __NR_dup3 292
# elif defined __i386__
# define __NR_dup3 330
# else
# error "need __NR_dup3"
# endif
#endif

int
main (void)
{
int fd = syscall (__NR_dup3, 1, 4, 0);
if (fd == -1)
{
puts ("dup3(0) failed");
return 1;
}
int coe = fcntl (fd, F_GETFD);
if (coe == -1)
{
puts ("fcntl failed");
return 1;
}
if (coe & FD_CLOEXEC)
{
puts ("dup3(0) set close-on-exec flag");
return 1;
}
close (fd);

fd = syscall (__NR_dup3, 1, 4, O_CLOEXEC);
if (fd == -1)
{
puts ("dup3(O_CLOEXEC) failed");
return 1;
}
coe = fcntl (fd, F_GETFD);
if (coe == -1)
{
puts ("fcntl failed");
return 1;
}
if ((coe & FD_CLOEXEC) == 0)
{
puts ("dup3(O_CLOEXEC) set close-on-exec flag");
return 1;
}
close (fd);

puts ("OK");

return 0;
}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Signed-off-by: Ulrich Drepper
Acked-by: Davide Libenzi
Cc: Michael Kerrisk
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Ulrich Drepper
2008-07-25 01:47:28 +0800

03 Jul, 2008

1 commit

a238b790d Call fasync() functions without the BKL ... Browse Code »

lock_kernel() calls have been pushed down into code which needs it, so
there is no need to take the BKL at this level anymore.

This work inspired and aided by Andi Kleen's unlocked_fasync() patches.

Acked-by: Andi Kleen
Signed-off-by: Jonathan Corbet

Jonathan Corbet
2008-07-03 05:06:28 +0800

02 May, 2008

1 commit

9f3acc314 [PATCH] split linux/file.h ... Browse Code »

Initial splitoff of the low-level stuff; taken to fdtable.h

Signed-off-by: Al Viro

Al Viro
2008-05-02 01:08:16 +0800

25 Apr, 2008

1 commit

f8f95702f [PATCH] sanitize locate_fd() ... Browse Code »

* 'file' argument is unused; lose it.
* move setting flags from the caller (dupfd()) to locate_fd();
pass cloexec flag as new argument. Note that files_fdtable()
that used to be in dupfd() isn't needed in the place in
locate_fd() where the moved code ends up - we know that ->file_lock
hadn't been dropped since the last time we calculated fdt because
we can get there only if expand_files() returns 0 and it doesn't
drop/reacquire in that case.
* move getting/dropping ->file_lock into locate_fd(). Now the caller
doesn't need to do anything with files_struct *files anymore and
we can move that inside locate_fd() as well, killing the
struct files_struct * argument.

At that point locate_fd() is extremely similar to get_unused_fd_flags()
and the next patches will merge those two.

Signed-off-by: Al Viro

Al Viro
2008-04-25 21:24:05 +0800

09 Feb, 2008

2 commits

fc9b52cd8 fs: remove fastcall, it is always empty ... Browse Code »

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Harvey Harrison
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Harvey Harrison
2008-02-09 01:22:31 +0800
6c5f3e7b4 Pidns: make full use of xxx_vnr() calls ... Browse Code »

Some time ago the xxx_vnr() calls (e.g. pid_vnr or find_task_by_vpid) were
_all_ converted to operate on the current pid namespace. After this each call
like xxx_nr_ns(foo, current->nsproxy->pid_ns) is nothing but a xxx_vnr(foo)
one.

Switch all the xxx_nr_ns() callers to use the xxx_vnr() calls where
appropriate.

Signed-off-by: Pavel Emelyanov
Reviewed-by: Oleg Nesterov
Cc: "Eric W. Biederman"
Cc: Balbir Singh
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Pavel Emelyanov
2008-02-09 01:22:29 +0800

20 Oct, 2007

1 commit

b488893a3 pid namespaces: changes to show virtual ids to user ... Browse Code »

This is the largest patch in the set. Make all (I hope) the places where
the pid is shown to or get from user operate on the virtual pids.

The idea is:
- all in-kernel data structures must store either struct pid itself
or the pid's global nr, obtained with pid_nr() call;
- when seeking the task from kernel code with the stored id one
should use find_task_by_pid() call that works with global pids;
- when showing pid's numerical value to the user the virtual one
should be used, but however when one shows task's pid outside this
task's namespace the global one is to be used;
- when getting the pid from userspace one need to consider this as
the virtual one and use appropriate task/pid-searching functions.

[akpm@linux-foundation.org: build fix]
[akpm@linux-foundation.org: nuther build fix]
[akpm@linux-foundation.org: yet nuther build fix]
[akpm@linux-foundation.org: remove unneeded casts]
Signed-off-by: Pavel Emelyanov
Signed-off-by: Alexey Dobriyan
Cc: Sukadev Bhattiprolu
Cc: Oleg Nesterov
Cc: Paul Menage
Cc: "Eric W. Biederman"
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Pavel Emelyanov
2007-10-20 02:53:40 +0800

17 Oct, 2007

1 commit

22d2b35b2 F_DUPFD_CLOEXEC implementation ... Browse Code »

One more small change to extend the availability of creation of file
descriptors with FD_CLOEXEC set. Adding a new command to fcntl() requires
no new system call and the overall impact on code size if minimal.

If this patch gets accepted we will also add this change to the next
revision of the POSIX spec.

To test the patch, use the following little program. Adjust the value of
F_DUPFD_CLOEXEC appropriately.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
#include
#include
#include
#include
#include

#ifndef F_DUPFD_CLOEXEC
# define F_DUPFD_CLOEXEC 12
#endif

int
main (int argc, char *argv[])
{
if (argc > 1)
{
if (fcntl (3, F_GETFD) == 0)
{
puts ("descriptor not closed");
exit (1);
}
if (errno != EBADF)
{
puts ("error not EBADF");
exit (1);
}

exit (0);
}
int fd = fcntl (STDOUT_FILENO, F_DUPFD_CLOEXEC, 0);
if (fd == -1 && errno == EINVAL)
{
puts ("F_DUPFD_CLOEXEC not supported");
return 0;
}
if (fd != 3)
{
puts ("program called with descriptors other than 0,1,2");
return 1;
}

execl ("/proc/self/exe", "/proc/self/exe", "1", NULL);
puts ("execl failed");
return 1;
}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Signed-off-by: Ulrich Drepper
Cc: Al Viro
Cc: Christoph Hellwig
Cc:
Cc: Kyle McMartin
Cc: Stephen Rothwell
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Ulrich Drepper
2007-10-17 23:43:01 +0800

20 Jul, 2007

1 commit

20c2df83d mm: Remove slab destructors from kmem_cache_create(). ... Browse Code »

Slab destructors were no longer supported after Christoph's
c59def9f222d44bb7e2f0a559f2906191a0862d7 change. They've been
BUGs for both slab and slub, and slob never supported them
either.

This rips out support for the dtor pointer from kmem_cache_create()
completely and fixes up every single callsite in the kernel (there were
about 224, not including the slab allocator definitions themselves,
or the documentation references).

Signed-off-by: Paul Mundt

Paul Mundt
2007-07-20 09:11:58 +0800

18 Jul, 2007

1 commit

3bd858ab1 Introduce is_owner_or_cap() to wrap CAP_FOWNER use with fsuid check ... Browse Code »

Introduce is_owner_or_cap() macro in fs.h, and convert over relevant
users to it. This is done because we want to avoid bugs in the future
where we check for only effective fsuid of the current task against a
file's owning uid, without simultaneously checking for CAP_FOWNER as
well, thus violating its semantics.
[ XFS uses special macros and structures, and in general looked ...
untouchable, so we leave it alone -- but it has been looked over. ]

The (current->fsuid != inode->i_uid) check in generic_permission() and
exec_permission_lite() is left alone, because those operations are
covered by CAP_DAC_OVERRIDE and CAP_DAC_READ_SEARCH. Similarly operations
falling under the purview of CAP_CHOWN and CAP_LEASE are also left alone.

Signed-off-by: Satyam Sharma
Cc: Al Viro
Acked-by: Serge E. Hallyn
Signed-off-by: Linus Torvalds

Satyam Sharma
2007-07-18 03:00:03 +0800

11 Dec, 2006

1 commit

bbea9f696 [PATCH] fdtable: Make fdarray and fdsets equal in size ... Browse Code »

Currently, each fdtable supports three dynamically-sized arrays of data: the
fdarray and two fdsets. The code allows the number of fds supported by the
fdarray (fdtable->max_fds) to differ from the number of fds supported by each
of the fdsets (fdtable->max_fdset).

In practice, it is wasteful for these two sizes to differ: whenever we hit a
limit on the smaller-capacity structure, we will reallocate the entire fdtable
and all the dynamic arrays within it, so any delta in the memory used by the
larger-capacity structure will never be touched at all.

Rather than hogging this excess, we shouldn't even allocate it in the first
place, and keep the capacities of the fdarray and the fdsets equal. This
patch removes fdtable->max_fdset. As an added bonus, most of the supporting
code becomes simpler.

Signed-off-by: Vadim Lobanov
Cc: Christoph Hellwig
Cc: Al Viro
Cc: Dipankar Sarma
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Vadim Lobanov
2006-12-11 01:57:22 +0800

09 Dec, 2006

1 commit

0f7fc9e4d [PATCH] VFS: change struct file to use struct path ... Browse Code »

This patch changes struct file to use struct path instead of having
independent pointers to struct dentry and struct vfsmount, and converts all
users of f_{dentry,vfsmnt} in fs/ to use f_path.{dentry,mnt}.

Additionally, it adds two #define's to make the transition easier for users of
the f_dentry and f_vfsmnt.

Signed-off-by: Josef "Jeff" Sipek
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Josef "Jeff" Sipek
2006-12-09 00:28:41 +0800

08 Dec, 2006

2 commits

e18b890bb [PATCH] slab: remove kmem_cache_t ... Browse Code »

Replace all uses of kmem_cache_t with struct kmem_cache.

The patch was generated using the following script:

#!/bin/sh
#
# Replace one string by another in all the kernel sources.
#

set -e

for file in `find * -name "*.c" -o -name "*.h"|xargs grep -l $1`; do
quilt add $file
sed -e "1,\$s/$1/$2/g" $file >/tmp/$$
mv /tmp/$$ $file
quilt refresh
done

The script was run like this

sh replace kmem_cache_t "struct kmem_cache"

Signed-off-by: Christoph Lameter
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Christoph Lameter
2006-12-08 00:39:25 +0800
e94b17660 [PATCH] slab: remove SLAB_KERNEL ... Browse Code »

SLAB_KERNEL is an alias of GFP_KERNEL.

Signed-off-by: Christoph Lameter
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Christoph Lameter
2006-12-08 00:39:24 +0800

02 Oct, 2006

2 commits

43fa1adb9 [PATCH] file: Add locking to f_getown ... Browse Code »

This has been needed for a long time, but now with the advent of a
reference counted struct pid there are real consequences for getting this
wrong.

Someone I think it was Oleg Nesterov pointed out that this construct was
missing locking, when I introduced struct pid. After taking time to review
the locking construct already present I figured out which lock needs to be
taken. The other paths that access f_owner.pid take either the f_owner
read or the write lock.

Signed-off-by: Eric W. Biederman
Cc: Oleg Nesterov
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Eric W. Biederman
2006-10-02 22:57:15 +0800
609d7fa95 [PATCH] file: modify struct fown_struct to use a struct pid ... Browse Code »

File handles can be requested to send sigio and sigurg to processes. By
tracking the destination processes using struct pid instead of pid_t we make
the interface safe from all potential pid wrap around problems.

Signed-off-by: Eric W. Biederman
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Eric W. Biederman
2006-10-02 22:57:14 +0800

02 Apr, 2006

1 commit

f6298aab2 BUG_ON() Conversion in fs/fcntl.c ... Browse Code »

this changes if() BUG(); constructs to BUG_ON() which is
cleaner and can better optimized away

Signed-off-by: Eric Sesterhenn
Signed-off-by: Adrian Bunk

Eric Sesterhenn
2006-04-02 19:37:19 +0800

27 Mar, 2006

1 commit

fa3536cc1 [PATCH] Use __read_mostly on some hot fs variables ... Browse Code »

I discovered on oprofile hunting on a SMP platform that dentry lookups were
slowed down because d_hash_mask, d_hash_shift and dentry_hashtable were in
a cache line that contained inodes_stat. So each time inodes_stats is
changed by a cpu, other cpus have to refill their cache line.

This patch moves some variables to the __read_mostly section, in order to
avoid false sharing. RCU dentry lookups can go full speed.

Signed-off-by: Eric Dumazet
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Eric Dumazet
2006-03-27 00:56:56 +0800

23 Mar, 2006

1 commit

0c9e63fd3 [PATCH] Shrinks sizeof(files_struct) and better layout ... Browse Code »

1) Reduce the size of (struct fdtable) to exactly 64 bytes on 32bits
platforms, lowering kmalloc() allocated space by 50%.

2) Reduce the size of (files_struct), using a special 32 bits (or
64bits) embedded_fd_set, instead of a 1024 bits fd_set for the
close_on_exec_init and open_fds_init fields. This save some ram (248
bytes per task) as most tasks dont open more than 32 files. D-Cache
footprint for such tasks is also reduced to the minimum.

3) Reduce size of allocated fdset. Currently two full pages are
allocated, that is 32768 bits on x86 for example, and way too much. The
minimum is now L1_CACHE_BYTES.

UP and SMP should benefit from this patch, because most tasks will touch
only one cache line when open()/close() stdin/stdout/stderr (0/1/2),
(next_fd, close_on_exec_init, open_fds_init, fd_array[0 .. 2] being in the
same cache line)

Signed-off-by: Eric Dumazet
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Eric Dumazet
2006-03-23 23:38:09 +0800

04 Feb, 2006

1 commit

7d95c8f27 [PATCH] fcntl F_SETFL and read-only IS_APPEND files ... Browse Code »

There is code in setfl() which attempts to preserve the O_APPEND flag on
IS_APPEND files... however IS_APPEND files could also be opened O_RDONLY
and in that case setfl() should not require O_APPEND...

coreutils 5.93 tail -f attempts to set O_NONBLOCK even on regular files...
unfortunately if you try this on an append-only log file the result is
this:

fcntl64(3, F_GETFL) = 0x8000 (flags O_RDONLY|O_LARGEFILE)
fcntl64(3, F_SETFL, O_RDONLY|O_NONBLOCK|O_LARGEFILE) = -1 EPERM (Operation not permitted)

I offer up the patch below as one way of fixing the problem... i've tested
it fixes the problem with tail -f but haven't really tested beyond that.

(I also reported the coreutils bug upstream... it shouldn't fail imho...
)

Signed-off-by: dean gaudet
Cc: Al Viro
Acked-by: Christoph Hellwig
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

dean gaudet
2006-02-04 00:32:07 +0800