Eric Lee / smarc-fsl-linux-kernel

29 Apr, 2011

1 commit

6d4831c28 vfs: avoid large kmalloc()s for the fdtable

Azurit reports large increases in system time after 2.6.36 when running
Apache. It was bisected down to a892e2d7dcdfa6c76e6 ("vfs: use kmalloc()
to allocate fdmem if possible").

That patch caused the vfs to use kmalloc() for very large allocations and
this is causing excessive work (and presumably excessive reclaim) within
the page allocator.

Fix it by falling back to vmalloc() earlier - when the allocation attempt
would have been considered "costly" by reclaim.
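The size test behind that fallback can be sketched in user space. This is only an illustration of the decision, not the patch itself: the helper name and the page size are assumptions, and SKETCH_COSTLY_ORDER stands in for the kernel's PAGE_ALLOC_COSTLY_ORDER.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed values for illustration only; the real ones come from the
 * kernel and vary by architecture. */
#define SKETCH_PAGE_SIZE    4096u
#define SKETCH_COSTLY_ORDER 3u   /* stand-in for PAGE_ALLOC_COSTLY_ORDER */

/* Hypothetical helper: returns true when a request of `size` bytes is
 * small enough to try kmalloc() first, and false when it should go
 * straight to vmalloc() because reclaim would treat it as "costly". */
static bool fdmem_try_kmalloc_first(size_t size)
{
    assert(size > 0);
    return size <= ((size_t)SKETCH_PAGE_SIZE << SKETCH_COSTLY_ORDER);
}
```

With these assumed values the cut-over sits at 32 KiB: a 32768-byte request still tries kmalloc() first, a 32769-byte request does not.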

Reported-by: azurIt
Tested-by: azurIt
Acked-by: Changli Gao
Cc: Americo Wang
Cc: Jiri Slaby
Acked-by: Eric Dumazet
Cc: Mel Gorman
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Andrew Morton
2011-04-29 02:28:20 +0800

11 Aug, 2010

1 commit

a892e2d7d vfs: use kmalloc() to allocate fdmem if possible

Use kmalloc() to allocate fdmem if possible.

vmalloc() is used as a fallback solution for fdmem allocation. A new
helper function __free_fdtable() is introduced to reduce the lines of
code.

A potential bug, calling vfree() on memory allocated by kmalloc(), is fixed.

[akpm@linux-foundation.org: use __GFP_NOWARN, uninline alloc_fdmem() and free_fdmem()]
Signed-off-by: Changli Gao
Cc: Alexander Viro
Cc: Jiri Slaby
Cc: "Paul E. McKenney"
Cc: Alexey Dobriyan
Cc: Ingo Molnar
Cc: Peter Zijlstra
Cc: Avi Kivity
Cc: Tetsuo Handa
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Changli Gao
2010-08-11 23:59:02 +0800

15 Jun, 2010

1 commit

b97181f24 fs: remove all rcu head initializations, except on_stack initializations

Remove all rcu head inits. We don't care about the RCU head state before passing
it to call_rcu() anyway. Only leave the "on_stack" variants so debugobjects can
keep track of objects on stack.

Signed-off-by: Alexey Dobriyan
Signed-off-by: Mathieu Desnoyers
Signed-off-by: Paul E. McKenney
Cc: Alexander Viro
Cc: Andries Brouwer

Paul E. McKenney
2010-06-15 07:37:26 +0800

07 Mar, 2010

1 commit

d554ed895 fs: use rlimit helpers

Make sure compiler won't do weird things with limits. E.g. fetching them
twice may return 2 different values after writable limits are implemented.

I.e. either use rlimit helpers added in commit 3e10e716abf3 ("resource:
add helpers for fetching rlimits") or ACCESS_ONCE if not applicable.
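The "fetch once" idea can be illustrated outside the kernel. In this sketch the struct and function names are hypothetical; the volatile read plays the role that ACCESS_ONCE or an rlimit helper plays in the kernel, forcing a single load whose result is then reused.

```c
#include <assert.h>

/* Hypothetical stand-in for a limit that another thread may rewrite. */
struct sketch_rlimit {
    unsigned long cur;
};

/* Clamp `want` to the current limit. The limit is loaded exactly once
 * through a volatile pointer, so the comparison and the returned value
 * are guaranteed to use the same snapshot. */
static unsigned long sketch_clamp_to_limit(struct sketch_rlimit *lim,
                                           unsigned long want)
{
    assert(lim != 0);
    const volatile unsigned long *p = &lim->cur;
    unsigned long limit = *p;   /* single load */

    if (want >= limit)
        return limit;
    return want;
}
```

Without the single load, a compiler would be free to read the limit once for the comparison and again for the return value, and the two reads could disagree once limits become writable.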

Signed-off-by: Jiri Slaby
Cc: Alexander Viro
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Jiri Slaby
2010-03-07 03:26:29 +0800

25 Feb, 2010

1 commit

7dc521579 vfs: Apply lockdep-based checking to rcu_dereference() uses

Add lockdep-ified RCU primitives to alloc_fd(), files_fdtable()
and fcheck_files().

Cc: Alexander Viro
Signed-off-by: Paul E. McKenney
Cc: laijs@cn.fujitsu.com
Cc: dipankar@in.ibm.com
Cc: mathieu.desnoyers@polymtl.ca
Cc: josh@joshtriplett.org
Cc: dvhltc@us.ibm.com
Cc: niv@us.ibm.com
Cc: peterz@infradead.org
Cc: rostedt@goodmis.org
Cc: Valdis.Kletnieks@vt.edu
Cc: dhowells@redhat.com
Cc: Alexander Viro
Signed-off-by: Ingo Molnar

Paul E. McKenney
2010-02-25 17:34:48 +0800

12 Oct, 2009

1 commit

d43c36dc6 headers: remove sched.h from interrupt.h

Now that m68k's task_thread_info() no longer refers to current,
it's possible to remove sched.h from interrupt.h and not break m68k!
Many thanks to Heiko Carstens for allowing this.

Signed-off-by: Alexey Dobriyan

Alexey Dobriyan
2009-10-12 02:20:58 +0800

01 Aug, 2008

1 commit

1027abe88 [PATCH] merge locate_fd() and get_unused_fd()

New primitive: alloc_fd(start, flags). get_unused_fd() and
get_unused_fd_flags() become wrappers on top of it.
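The layering can be sketched with a toy descriptor table. Everything here is an assumption made for illustration: the 64-entry bitmap, the sketch_ names, the SKETCH_CLOEXEC flag, and the -1 error return (the kernel returns a negative errno instead).

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_MAX_FDS 64u
#define SKETCH_CLOEXEC 1u

struct sketch_files {
    uint64_t open_fds;      /* bit n set: descriptor n is in use */
    uint64_t close_on_exec; /* bit n set: descriptor n is close-on-exec */
};

/* The single primitive: lowest free descriptor >= start, or -1. */
static int sketch_alloc_fd(struct sketch_files *f, unsigned start,
                           unsigned flags)
{
    assert(f != 0);
    for (unsigned fd = start; fd < SKETCH_MAX_FDS; fd++) {
        uint64_t bit = UINT64_C(1) << fd;
        if (f->open_fds & bit)
            continue;
        f->open_fds |= bit;
        if (flags & SKETCH_CLOEXEC)
            f->close_on_exec |= bit;
        else
            f->close_on_exec &= ~bit;
        return (int)fd;
    }
    return -1;
}

/* The two older entry points reduce to thin wrappers. */
static int sketch_get_unused_fd_flags(struct sketch_files *f, unsigned flags)
{
    return sketch_alloc_fd(f, 0, flags);
}

static int sketch_get_unused_fd(struct sketch_files *f)
{
    return sketch_get_unused_fd_flags(f, 0);
}
```

A caller that needs "lowest free descriptor at or above N" calls the primitive directly with start = N, while ordinary callers keep using the wrappers.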

Signed-off-by: Al Viro

Al Viro
2008-08-01 23:25:23 +0800

27 Jul, 2008

1 commit

4e1e018ec [PATCH] fix RLIM_NOFILE handling

* dup2() should return -EBADF on exceeded sysctl_nr_open
* dup() should *not* return -EINVAL even if you have rlimit set to 0;
it should get -EMFILE instead.

Check for orig_start exceeding rlimit taken to sys_fcntl().
Failing expand_files() in dup{2,3}() now gets -EMFILE remapped to -EBADF.
Consequently, remaining checks for rlimit are taken to expand_files().

Signed-off-by: Al Viro

Al Viro
2008-07-27 08:53:45 +0800

17 May, 2008

6 commits

eceea0b3d [PATCH] avoid multiplication overflows and signedness issues for max_fds

Limit sysctl_nr_open - we don't want ->max_fds to exceed MAX_INT and
we don't want size calculation for ->fd[] to overflow.
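One way to picture the two constraints is to compute a ceiling that satisfies both. This is a user-space sketch with a hypothetical function name. Rounding down to a multiple of the bits in a long is an extra assumption, added because the fd bitmaps are arrays of longs.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_BITS_PER_LONG ((size_t)(sizeof(long) * CHAR_BIT))

/* Largest descriptor count that (a) still fits in an int and (b) keeps
 * count * sizeof(pointer) from overflowing size_t, rounded down to a
 * whole number of longs' worth of bitmap bits. */
static size_t sketch_nr_open_ceiling(void)
{
    size_t by_int  = (size_t)INT_MAX;
    size_t by_size = SIZE_MAX / sizeof(void *);
    size_t cap     = by_int < by_size ? by_int : by_size;

    return cap - (cap % SKETCH_BITS_PER_LONG);
}
```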

Signed-off-by: Al Viro

Al Viro
2008-05-17 05:22:52 +0800

adbecb128 [PATCH] dup_fd() part 4 - race fix

Parent _can_ be a clone task, contrary to the comment. Moreover,
more files could be opened while we allocate a copy, in which case
we end up copying only part into new descriptor table. Since what
we get _is_ affected by all changes in the old range, we can get
rather weird effects - e.g.
dup2(0, 1024); close(0);
in parallel with fork() resulting in child that sees the effect of
close(), but not that of dup2() done just before that close().

What we need is to recalculate the open_count after having reacquired
->file_lock and if external fdtable we'd just allocated is too small for
it, free the sucker and redo allocation.

Signed-off-by: Al Viro

Al Viro
2008-05-17 05:22:46 +0800

afbec7fff [PATCH] dup_fd() - part 3

merge alloc_files() into dup_fd(), leave setting newf->fdt until the end

Signed-off-by: Al Viro

Al Viro
2008-05-17 05:22:39 +0800

9dec3c4d3 [PATCH] dup_fd() part 2

use alloc_fdtable() instead of expand_files(), get rid of pointless
grabbing newf->file_lock, kill magic in copy_fdtable() that used to
be there only to skip copying when called from dup_fd().

Signed-off-by: Al Viro

Al Viro
2008-05-17 05:22:33 +0800

02afc6267 [PATCH] dup_fd() fixes, part 1

Move the sucker to fs/file.c in preparation to the rest

Signed-off-by: Al Viro

Al Viro
2008-05-17 05:22:26 +0800

f52111b15 [PATCH] take init_files to fs/file.c

Signed-off-by: Al Viro

Al Viro
2008-05-17 05:22:20 +0800

02 May, 2008

2 commits

5c598b342 [PATCH] fix sysctl_nr_open bugs

* if luser with root sets it to something that is not a multiple of
BITS_PER_LONG, the system is screwed.
* if it gets decreased at the wrong time, we can get expand_files()
returning success and _not_ increasing the size of table as asked.

Signed-off-by: Al Viro

Al Viro
2008-05-02 01:08:57 +0800

9f3acc314 [PATCH] split linux/file.h

Initial splitoff of the low-level stuff; taken to fdtable.h

Signed-off-by: Al Viro

Al Viro
2008-05-02 01:08:16 +0800

07 Feb, 2008

1 commit

9cfe015aa get rid of NR_OPEN and introduce a sysctl_nr_open

NR_OPEN (historically set to 1024*1024) actually forbids processes to open
more than 1024*1024 handles.

Unfortunately some production servers hit the not so 'ridiculously high
value' of 1024*1024 file descriptors per process.

Changing NR_OPEN is not considered safe because it could exhaust vmalloc
space.

This patch introduces a new sysctl (/proc/sys/fs/nr_open) which defaults to
1024*1024, so that admins can decide to change this limit if their workload
needs it.
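For reference, the current value can be read from user space like any other procfs file. The helper below is a hypothetical sketch: it takes the path as a parameter and returns -1 if the file is missing or unparsable, so it also behaves sensibly on systems without this sysctl.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: read a single integer from a sysctl-style file.
 * Returns the value, or -1 if the file cannot be opened or parsed. */
static long sketch_read_nr_open(const char *path)
{
    assert(path != NULL);

    FILE *fp = fopen(path, "r");
    long value = -1;

    if (fp == NULL)
        return -1;
    if (fscanf(fp, "%ld", &value) != 1)
        value = -1;
    fclose(fp);
    return value;
}
```

Calling sketch_read_nr_open("/proc/sys/fs/nr_open") on a kernel with this patch should report the per-process ceiling currently in force.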

[akpm@linux-foundation.org: export it for sparc64]
Signed-off-by: Eric Dumazet
Cc: Alan Cox
Cc: Richard Henderson
Cc: Ivan Kokshaysky
Cc: "David S. Miller"
Cc: Ralf Baechle
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Eric Dumazet
2008-02-07 02:41:06 +0800

23 Dec, 2006

1 commit

01b2d93ca [PATCH] fdtable: Provide free_fdtable() wrapper

Christoph Hellwig has expressed concerns that the recent fdtable changes
expose the details of the RCU methodology used to release no-longer-used
fdtable structures to the rest of the kernel. The trivial patch below
addresses these concerns by introducing the appropriate free_fdtable()
calls, which simply wrap the release RCU usage. Since free_fdtable() is a
one-liner, it makes sense to promote it to an inline helper.

Signed-off-by: Vadim Lobanov
Cc: Christoph Hellwig
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Vadim Lobanov
2006-12-23 00:55:50 +0800

11 Dec, 2006

3 commits

5466b456e [PATCH] fdtable: Implement new pagesize-based fdtable allocator

This patch provides an improved fdtable allocation scheme, useful for
expanding fdtable file descriptor entries. The main focus is on the fdarray,
as its memory usage grows 128 times faster than that of an fdset.

The allocation algorithm sizes the fdarray in such a way that its memory usage
increases in easy page-sized chunks. The overall algorithm expands the allowed
size in powers of two, in order to amortize the cost of invoking vmalloc() for
larger allocation sizes. Namely, the following sizes for the fdarray are
considered, and the smallest that accommodates the requested fd count is
chosen:

pagesize / 4
pagesize / 2
pagesize

[...] open_fds is now used as the anchor for the fdset memory allocation.
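The sizing rule described above (start at a quarter of a page, then double until the requested fd count fits) can be sketched as follows. The page size, pointer size and function name are assumptions, and the loop is used for readability rather than as a claim about how the patch computes the result.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed values for illustration; both are platform-dependent. */
#define SKETCH_PAGE_SIZE 4096u
#define SKETCH_PTR_SIZE  8u

/* Smallest fdarray size in bytes, taken from the series
 * pagesize/4, pagesize/2, pagesize, 2*pagesize, ...
 * that can hold at least nr_fds pointer-sized slots. */
static size_t sketch_fdarray_bytes(size_t nr_fds)
{
    assert(nr_fds > 0 && nr_fds <= (1u << 20));

    size_t bytes = SKETCH_PAGE_SIZE / 4;
    while (bytes / SKETCH_PTR_SIZE < nr_fds)
        bytes *= 2;
    return bytes;
}
```

Under these assumptions a quarter page holds 128 descriptors, so a request for 129 descriptors moves up to half a page and a request for 513 moves up to two pages.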

Signed-off-by: Vadim Lobanov
Cc: Christoph Hellwig
Cc: Al Viro
Cc: Dipankar Sarma
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Vadim Lobanov
2006-12-11 01:57:22 +0800

4fd45812c [PATCH] fdtable: Remove the free_files field

An fdtable can either be embedded inside a files_struct or standalone (after
being expanded). When an fdtable is being discarded after all RCU references
to it have expired, we must either free it directly, in the standalone case,
or free the files_struct it is contained within, in the embedded case.

Currently the free_files field controls this behavior, but we can get rid of
it entirely, as all the necessary information is already recorded. We can
distinguish embedded and standalone fdtables using max_fds, and if it is
embedded we can divine the relevant files_struct using container_of().
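A user-space sketch of that test is shown below. The struct layouts, the sketch_ names and the threshold of 32 are simplifications assumed for illustration. Only the idea comes from the commit message: max_fds tells the two cases apart, and container_of() recovers the enclosing structure.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed capacity of the table embedded in the files structure. */
#define SKETCH_EMBEDDED_FDS 32u

struct sketch_fdtable {
    unsigned int max_fds;
};

struct sketch_files {
    int count;
    struct sketch_fdtable fdtab;   /* the embedded table */
};

/* Same idea as the kernel's container_of(): step back from a member
 * pointer to the structure that contains it. */
#define sketch_container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Returns the owning files structure when the table is the embedded
 * one, or NULL when it is a standalone (expanded) table. */
static struct sketch_files *sketch_owner(struct sketch_fdtable *fdt)
{
    assert(fdt != NULL);
    if (fdt->max_fds <= SKETCH_EMBEDDED_FDS)
        return sketch_container_of(fdt, struct sketch_files, fdtab);
    return NULL;
}
```

The release path can then free either the table itself or the structure returned by sketch_owner(), with no separate field recording which case applies.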

Signed-off-by: Vadim Lobanov
Cc: Christoph Hellwig
Cc: Al Viro
Cc: Dipankar Sarma
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Vadim Lobanov
2006-12-11 01:57:22 +0800

bbea9f696 [PATCH] fdtable: Make fdarray and fdsets equal in size

Currently, each fdtable supports three dynamically-sized arrays of data: the
fdarray and two fdsets. The code allows the number of fds supported by the
fdarray (fdtable->max_fds) to differ from the number of fds supported by each
of the fdsets (fdtable->max_fdset).

In practice, it is wasteful for these two sizes to differ: whenever we hit a
limit on the smaller-capacity structure, we will reallocate the entire fdtable
and all the dynamic arrays within it, so any delta in the memory used by the
larger-capacity structure will never be touched at all.

Rather than hogging this excess, we shouldn't even allocate it in the first
place, and keep the capacities of the fdarray and the fdsets equal. This
patch removes fdtable->max_fdset. As an added bonus, most of the supporting
code becomes simpler.

Signed-off-by: Vadim Lobanov
Cc: Christoph Hellwig
Cc: Al Viro
Cc: Dipankar Sarma
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Vadim Lobanov
2006-12-11 01:57:22 +0800

08 Dec, 2006

1 commit

593be07ae [PATCH] file: kill unnecessary timer in fdtable_defer

free_fdtable_rcu() schedules a timer to reschedule fddef->wq if
schedule_work() on it returns 0. However, schedule_work() guarantees that
the target work is executed at least once after the scheduling regardless
of its return value. 0 return simply means that the work was already
pending and thus no further action was required.

Another problem is that it used constant '5' as @expires argument to
mod_timer().

Kill unnecessary fddef->timer.

Signed-off-by: Tejun Heo
Cc: Dipankar Sarma
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Tejun Heo
2006-12-08 00:39:32 +0800

22 Nov, 2006

1 commit

65f27f384 WorkStruct: Pass the work_struct pointer instead of context data

Pass the work_struct pointer to the work function rather than context data.
The work function can use container_of() to work out the data.
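The calling convention can be sketched with a toy work item. All types and names below are hypothetical stand-ins, not the kernel's workqueue API. The point is only that the callback receives a pointer to the embedded work item and derives its own context from it.

```c
#include <assert.h>
#include <stddef.h>

struct sketch_work;
typedef void (*sketch_work_fn)(struct sketch_work *work);

/* Toy work item: just the callback, no queueing machinery. */
struct sketch_work {
    sketch_work_fn fn;
};

/* A driver-style object that embeds its work item. */
struct sketch_device {
    int events_handled;
    struct sketch_work work;
};

/* The work function gets the work item, not a context pointer, and
 * steps back to the containing device by the member offset. */
static void sketch_device_work(struct sketch_work *work)
{
    struct sketch_device *dev = (struct sketch_device *)
        ((char *)work - offsetof(struct sketch_device, work));

    dev->events_handled++;
}

/* Stand-in for the executor: invoke the callback on the item itself. */
static void sketch_run_work(struct sketch_work *work)
{
    assert(work != NULL && work->fn != NULL);
    work->fn(work);
}
```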

For the cases where the container of the work_struct may go away the moment the
pending bit is cleared, it is made possible to defer the release of the
structure by deferring the clearing of the pending bit.

To make this work, an extra flag is introduced into the management side of the
work_struct. This governs auto-release of the structure upon execution.

Ordinarily, the work queue executor would release the work_struct for further
scheduling or deallocation by clearing the pending bit prior to jumping to the
work function. This means that, unless the driver makes some guarantee itself
that the work_struct won't go away, the work function may not access anything
else in the work_struct or its container lest they be deallocated. This is a
problem if the auxiliary data is taken away (as done by the last patch).

However, if the pending bit is *not* cleared before jumping to the work
function, then the work function *may* access the work_struct and its container
with no problems. But then the work function must itself release the
work_struct by calling work_release().

In most cases, automatic release is fine, so this is the default. Special
initiators exist for the non-auto-release case (ending in _NAR).

Signed-Off-By: David Howells

David Howells
2006-11-22 22:55:48 +0800

30 Sep, 2006

2 commits

327dcaadc [PATCH] expand_fdtable(): remove pointless unlock+lock

This unlock/lock on a super-unlikely path isn't worth the kernel text.

Cc: Vadim Lobanov
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Andrew Morton
2006-09-30 00:18:25 +0800

74d392aaa [PATCH] Clean up expand_fdtable() and expand_files()

Perform a code cleanup against the expand_fdtable() and expand_files()
functions inside fs/file.c. It aims to make the flow of code within these
functions simpler and easier to understand, via added comments and modest
refactoring.

Signed-off-by: Vadim Lobanov
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Vadim Lobanov
2006-09-30 00:18:25 +0800

27 Sep, 2006

1 commit

8b0e330b7 [PATCH] alloc_fdtable() cleanup

free_fdset(NULL, ...) is legal.

Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Andrew Morton
2006-09-27 23:26:19 +0800

13 Jul, 2006

2 commits

a29b0b74e [PATCH] alloc_fdtable() expansion fix

We're supposed to go to the next power of two if nfds==nr.

Of `nr', not of `nfds'.

Spotted by Rene Scharfe

Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Andrew Morton
2006-07-13 03:52:55 +0800

d579091b4 [PATCH] fix fdset leakage

When found, it is obvious. nfds calculated when allocating fdsets is
rewritten by calculation of size of fdtable, and when we are unlucky, we
try to free fdsets of wrong size.

Found due to OpenVZ resource management (User Beancounters).

Signed-off-by: Alexey Kuznetsov
Signed-off-by: Kirill Korotaev
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Kirill Korotaev
2006-07-13 03:52:54 +0800

11 Jul, 2006

1 commit

92eb7a2f2 [PATCH] fix weird logic in alloc_fdtable()

There's a fairly obvious infinite loop in there.

Also, use roundup_pow_of_two() rather than open-coding stuff.
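For readers unfamiliar with the helper, a user-space stand-in for roundup_pow_of_two() might look like the sketch below. The kernel's own version is a macro and is implemented differently. The name and the input bound here are assumptions.

```c
#include <assert.h>

/* Smallest power of two that is >= n. The upper bound on n keeps the
 * shift from overflowing on platforms with a 32-bit unsigned long. */
static unsigned long sketch_roundup_pow_of_two(unsigned long n)
{
    assert(n > 0 && n <= (1ul << 30));

    unsigned long p = 1;
    while (p < n)
        p <<= 1;
    return p;
}
```

Note that an input that is already a power of two is returned unchanged, so a caller that needs a strictly larger size has to round up n + 1 instead of n.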

Cc: Eric Dumazet
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Andrew Morton
2006-07-11 04:24:25 +0800

29 Mar, 2006

1 commit

0a9450227 [PATCH] for_each_possible_cpu: fixes for generic part

Replace for_each_cpu() with for_each_possible_cpu().

Signed-off-by: KAMEZAWA Hiroyuki
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

KAMEZAWA Hiroyuki
2006-03-29 01:16:05 +0800

23 Mar, 2006

1 commit

0c9e63fd3 [PATCH] Shrinks sizeof(files_struct) and better layout

1) Reduce the size of (struct fdtable) to exactly 64 bytes on 32-bit
platforms, lowering kmalloc() allocated space by 50%.

2) Reduce the size of (files_struct), using a special 32-bit (or
64-bit) embedded_fd_set, instead of a 1024-bit fd_set for the
close_on_exec_init and open_fds_init fields. This saves some RAM (248
bytes per task) as most tasks don't open more than 32 files. D-Cache
footprint for such tasks is also reduced to the minimum.

3) Reduce size of allocated fdset. Currently two full pages are
allocated, that is 32768 bits on x86 for example, and way too much. The
minimum is now L1_CACHE_BYTES.
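The 248-byte figure in item 2 can be checked with a little arithmetic: two 1024-bit sets occupy 2 * 128 bytes, and they are replaced by two machine words. The helper below is a hypothetical sketch of that calculation.

```c
#include <assert.h>
#include <stddef.h>

/* Bytes saved per task when two 1024-bit fd_sets are replaced by two
 * embedded sets of one machine word each (word_bytes = 4 or 8). */
static size_t sketch_bytes_saved(size_t word_bytes)
{
    assert(word_bytes == 4 || word_bytes == 8);

    size_t old_bytes = 2 * (1024 / 8);   /* two full fd_sets */
    size_t new_bytes = 2 * word_bytes;   /* two embedded sets */
    return old_bytes - new_bytes;
}
```

With 4-byte words this gives 256 - 8 = 248 bytes, matching the number quoted above. With 8-byte words the same calculation gives 240.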

UP and SMP should benefit from this patch, because most tasks will touch
only one cache line when open()/close() stdin/stdout/stderr (0/1/2),
(next_fd, close_on_exec_init, open_fds_init, fd_array[0 .. 2] being in the
same cache line)

Signed-off-by: Eric Dumazet
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Eric Dumazet
2006-03-23 23:38:09 +0800

06 Feb, 2006

1 commit

88a2a4ac6 [PATCH] percpu data: only iterate over possible CPUs

percpu_data blindly allocates bootmem memory to store NR_CPUS instances of
cpudata, instead of allocating memory only for possible cpus.

As a preparation for changing that, we need to convert various 0 -> NR_CPUS
loops to use for_each_cpu().

(The above only applies to users of asm-generic/percpu.h. powerpc has gone it
alone and is presently only allocating memory for present CPUs, so it's
currently corrupting memory).

Signed-off-by: Eric Dumazet
Cc: "David S. Miller"
Cc: James Bottomley
Acked-by: Ingo Molnar
Cc: Jens Axboe
Cc: Anton Blanchard
Acked-by: William Irwin
Cc: Andi Kleen
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Eric Dumazet
2006-02-06 03:06:51 +0800

15 Sep, 2005

1 commit

0b175a7e6 [PATCH] Fix the fdtable freeing in the case of vmalloced fdset/arrays

Noted by David Miller:

"The bug is that free_fd_array() takes a "num" argument, but when
calling it from __free_fdtable() we're instead passing in the size in
bytes (ie. "num * sizeof(struct file *)")."

Yes it is a bug. I think I messed it up while merging newer
changes with an older version where I was using size in bytes
to optimize.

Signed-off-by: Dipankar Sarma
Signed-off-by: Linus Torvalds

Dipankar Sarma
2005-09-15 03:38:26 +0800

10 Sep, 2005

2 commits

ab2af1f50 [PATCH] files: files struct with RCU

Patch to eliminate struct files_struct.file_lock spinlock on the reader side
and use rcu refcounting rcuref_xxx api for the f_count refcounter. The
updates to the fdtable are done by allocating a new fdtable structure and
setting files->fdt to point to the new structure. The fdtable structure is
protected by RCU thereby allowing lock-free lookup. For fd arrays/sets that
are vmalloced, we use keventd to free them since RCU callbacks can't sleep. A
global list of fdtable to be freed is not scalable, so we use a per-cpu list.
If keventd is already handling the current cpu's work, we use a timer to defer
queueing of that work.

Since the last publication, this patch has been re-written to avoid using
explicit memory barriers and use rcu_assign_pointer(), rcu_dereference()
primitives instead. This required that the fd information is kept in a
separate structure (fdtable) and updated atomically.
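The copy-and-publish pattern can be approximated in user space with C11 atomics. This is an analogy under stated assumptions, not the kernel code: the release store and acquire load stand in for rcu_assign_pointer() and rcu_dereference(), all names are hypothetical, and the grace period that must elapse before the old table is freed is left out entirely.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>
#include <string.h>

struct sketch_fdtable {
    unsigned int max_fds;
    void **fd;                     /* max_fds slots */
};

struct sketch_files {
    _Atomic(struct sketch_fdtable *) fdt;
};

/* Reader side: one acquire load, no lock taken. */
static struct sketch_fdtable *sketch_files_fdtable(struct sketch_files *files)
{
    return atomic_load_explicit(&files->fdt, memory_order_acquire);
}

/* Writer side: build a larger copy, then publish it with one store.
 * Returns the old table, which the caller may free only after every
 * reader that could still hold it has finished (not modelled here).
 * Returns NULL if allocation fails, leaving the old table in place. */
static struct sketch_fdtable *sketch_expand(struct sketch_files *files,
                                            unsigned int new_max)
{
    struct sketch_fdtable *old = sketch_files_fdtable(files);
    assert(old != NULL && new_max > old->max_fds);

    struct sketch_fdtable *new_fdt = malloc(sizeof(*new_fdt));
    if (new_fdt == NULL)
        return NULL;
    new_fdt->fd = calloc(new_max, sizeof(void *));
    if (new_fdt->fd == NULL) {
        free(new_fdt);
        return NULL;
    }
    new_fdt->max_fds = new_max;
    memcpy(new_fdt->fd, old->fd, old->max_fds * sizeof(void *));

    /* Array and size become visible together, as one pointer update. */
    atomic_store_explicit(&files->fdt, new_fdt, memory_order_release);
    return old;
}
```

Because readers only ever follow one pointer, they see either the old array with the old size or the new array with the new size, never a mix of the two.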

Signed-off-by: Dipankar Sarma
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Dipankar Sarma
2005-09-10 04:57:55 +0800

badf16621 [PATCH] files: break up files struct

In order for the RCU to work, the file table array, sets and their sizes must
be updated atomically. Instead of ensuring this through too many memory
barriers, we put the arrays and their sizes in a separate structure. This
patch takes the first step of putting the file table elements in a separate
structure fdtable that is embedded within files_struct. It also changes all
the users to refer to the file table using files_fdtable() macro. Subsequent
application of RCU becomes easier after this.

Signed-off-by: Dipankar Sarma
Signed-Off-By: David Howells
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Dipankar Sarma
2005-09-10 04:57:55 +0800

17 Apr, 2005

1 commit

1da177e4c Linux-2.6.12-rc2

Initial git repository build. I'm not bothering with the full history,
even though we have it. We can create a separate "historical" git
archive of that later if we want to, and in the meantime it's about
3.2GB when imported into git - space that would just make the early
git days unnecessarily complicated, when we don't have a lot of good
infrastructure for it.

Let it rip!

Linus Torvalds
2005-04-17 06:20:36 +0800