Eric Lee / smarc-fsl-linux-kernel

27 May, 2011

1 commit

ca16d140a mm: don't access vm_flags as 'int' ... Browse Code »

The type of vma->vm_flags is 'unsigned long'. Neither 'int' nor
'unsigned int'. This patch fixes such misuse.

Signed-off-by: KOSAKI Motohiro
[ Changed to use a typedef - we'll extend it to cover more cases
later, since there has been discussion about making it a 64-bit
type.. - Linus ]
Signed-off-by: Linus Torvalds

KOSAKI Motohiro
2011-05-27 00:20:31 +0800

11 May, 2011

1 commit

a00eaf11a ns proc: Add support for the ipc namespace ... Browse Code »

Acked-by: Daniel Lezcano
Signed-off-by: Eric W. Biederman

Eric W. Biederman
2011-05-11 05:35:47 +0800

31 Mar, 2011

1 commit

25985edce Fix common misspellings ... Browse Code »

Fixes generated by 'codespell' and manually reviewed.

Signed-off-by: Lucas De Marchi

Lucas De Marchi
2011-03-31 22:26:23 +0800

28 Mar, 2011

1 commit

6213cfe82 ipc: fix util.c kernel-doc warnings ... Browse Code »

Fix ipc/util.c kernel-doc warnings:

Warning(ipc/util.c:336): No description found for parameter 'ns'
Warning(ipc/util.c:620): No description found for parameter 'ns'
Warning(ipc/util.c:790): No description found for parameter 'ns'

Signed-off-by: Randy Dunlap
Reviewed-by: Jesper Juhl
Signed-off-by: Linus Torvalds

Randy Dunlap
2011-03-28 10:30:19 +0800

26 Mar, 2011

1 commit

be4d250ab ipcns: fix use after free in free_ipc_ns() ... Browse Code »

commit b515498 ("userns: add a user namespace owner of ipc ns") added a
user namespace owner of ipc ns, but it also introduced a use after free in
free_ipc_ns().

Signed-off-by: Xiaotian Feng
Acked-by: "Serge E. Hallyn"
Acked-by: David Howells
Cc: "Eric W. Biederman"
Cc: Daniel Lezcano
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Xiaotian Feng
2011-03-26 08:45:16 +0800

24 Mar, 2011

2 commits

b0e77598f userns: user namespaces: convert several capable() calls ... Browse Code »

CAP_IPC_OWNER and CAP_IPC_LOCK can be checked against current_user_ns(),
because the resource comes from current's own ipc namespace.

setuid/setgid are to uids in own namespace, so again checks can be against
current_user_ns().

Changelog:
Jan 11: Use task_ns_capable() in place of sched_capable().
Jan 11: Use nsown_capable() as suggested by Bastian Blank.
Jan 11: Clarify (hopefully) some logic in futex and sched.c
Feb 15: use ns_capable for ipc, not nsown_capable
Feb 23: let copy_ipcs handle setting ipc_ns->user_ns
Feb 23: pass ns down rather than taking it from current

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Serge E. Hallyn
Acked-by: "Eric W. Biederman"
Acked-by: Daniel Lezcano
Acked-by: David Howells
Cc: James Morris
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Serge E. Hallyn
2011-03-24 10:47:08 +0800
b515498f5 userns: add a user namespace owner of ipc ns ... Browse Code »

Changelog:
Feb 15: Don't set new ipc->user_ns if we didn't create a new
ipc_ns.
Feb 23: Move extern declaration to ipc_namespace.h, and group
fwd declarations at top.

Signed-off-by: Serge E. Hallyn
Acked-by: "Eric W. Biederman"
Acked-by: Daniel Lezcano
Acked-by: David Howells
Cc: James Morris
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Serge E. Hallyn
2011-03-24 10:47:07 +0800

07 Jan, 2011

1 commit

fa0d7e3de fs: icache RCU free inodes ... Browse Code »

RCU free the struct inode. This will allow:

- Subsequent store-free path walking patch. The inode must be consulted for
permissions when walking, so an RCU inode reference is a must.
- sb_inode_list_lock to be moved inside i_lock because sb list walkers who want
to take i_lock no longer need to take sb_inode_list_lock to walk the list in
the first place. This will simplify and optimize locking.
- Could remove some nested trylock loops in dcache code
- Could potentially simplify things a bit in VM land. Do not need to take the
page lock to follow page->mapping.

The downsides of this is the performance cost of using RCU. In a simple
creat/unlink microbenchmark, performance drops by about 10% due to inability to
reuse cache-hot slab objects. As iterations increase and RCU freeing starts
kicking over, this increases to about 20%.

In cases where inode lifetimes are longer (ie. many inodes may be allocated
during the average life span of a single inode), a lot of this cache reuse is
not applicable, so the regression caused by this patch is smaller.

The cache-hot regression could largely be avoided by using SLAB_DESTROY_BY_RCU,
however this adds some complexity to list walking and store-free path walking,
so I prefer to implement this at a later date, if it is shown to be a win in
real situations. I haven't found a regression in any non-micro benchmark so I
doubt it will be a problem.

Signed-off-by: Nick Piggin

Nick Piggin
2011-01-07 14:50:26 +0800

30 Oct, 2010

1 commit

3af54c9bd ipc: shm: fix information leak to userland ... Browse Code »

The shmid_ds structure is copied to userland with shm_unused{,2,3}
fields unitialized. It leads to leaking of contents of kernel stack
memory.

Signed-off-by: Vasiliy Kulikov
Acked-by: Al Viro
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds

Vasiliy Kulikov
2010-10-30 23:25:51 +0800

29 Oct, 2010

1 commit

ceefda693 switch get_sb_ns() users ... Browse Code »

Signed-off-by: Al Viro

Al Viro
2010-10-29 16:17:03 +0800

28 Oct, 2010

2 commits

03145beb4 ipc: initialize structure memory to zero for compat functions ... Browse Code »

This takes care of leaking uninitialized kernel stack memory to
userspace from non-zeroed fields in structs in compat ipc functions.

Signed-off-by: Dan Rosenberg
Cc: Manfred Spraul
Cc: Arnd Bergmann
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Dan Rosenberg
2010-10-28 09:03:13 +0800
b79521807 ipc/shm.c: add RSS and swap size information to /proc/sysvipc/shm ... Browse Code »

The kernel currently provides no functionality to analyze the RSS and swap
space usage of each individual sysvipc shared memory segment.

This patch adds this info for each existing shm segment by extending the
output of /proc/sysvipc/shm by two columns for RSS and swap.

Since shmctl(SHM_INFO) already provides a similiar calculation (it
currently sums up all RSS/swap info for all segments), I did split out a
static function which is now used by the /proc/sysvipc/shm output and
shmctl(SHM_INFO).

SAP products (esp. the SAP Netweaver ABAP Kernel) uses lots of big shared
memory segments (we often have Linux systems with >= 16GB shm usage).
Sometimes we get customer reports about "slow" system responses and while
looking into their configurations we often find massive swapping activity
on the system. With this patch it's now easy to see from the command line
if and which shm segments gets swapped out (and how much) and can more
easily give recommendations for system tuning. Without the patch it's
currently not possible to do such shm analysis at all.

Also...

Add some spaces in front of the "size" field for 64bit kernels to get the
columns correct if you cat the contents of the file. In
sysvipc_shm_proc_show() the kernel prints the size value in "SPEC_SIZE"
format, which is defined like this:

#if BITS_PER_LONG
Cc: Manfred Spraul
Acked-by: Hugh Dickins
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Helge Deller
2010-10-28 09:03:13 +0800

26 Oct, 2010

2 commits

85fe4025c fs: do not assign default i_ino in new_inode ... Browse Code »
43

Instead of always assigning an increasing inode number in new_inode
move the call to assign it into those callers that actually need it.
For now callers that need it is estimated conservatively, that is
the call is added to all filesystems that do not assign an i_ino
by themselves. For a few more filesystems we can avoid assigning
any inode number given that they aren't user visible, and for others
it could be done lazily when an inode number is actually needed,
but that's left for later patches.

Signed-off-by: Christoph Hellwig
Signed-off-by: Dave Chinner
Signed-off-by: Al Viro

Christoph Hellwig
2010-10-26 09:26:11 +0800
7de9c6ee3 new helper: ihold() ... Browse Code »

Clones an existing reference to inode; caller must already hold one.

Signed-off-by: Al Viro

Al Viro
2010-10-26 09:26:11 +0800

23 Oct, 2010

1 commit

092e0e7e5 Merge branch 'llseek' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/bkl ... Browse Code »

* 'llseek' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/bkl:
vfs: make no_llseek the default
vfs: don't use BKL in default_llseek
llseek: automatically add .llseek fop
libfs: use generic_file_llseek for simple_attr
mac80211: disallow seeks in minstrel debug code
lirc: make chardev nonseekable
viotape: use noop_llseek
raw: use explicit llseek file operations
ibmasmfs: use generic_file_llseek
spufs: use llseek in all file operations
arm/omap: use generic_file_llseek in iommu_debug
lkdtm: use generic_file_llseek in debugfs
net/wireless: use generic_file_llseek in debugfs
drm: use noop_llseek

Linus Torvalds
2010-10-23 01:52:56 +0800

15 Oct, 2010

1 commit

6038f373a llseek: automatically add .llseek fop ... Browse Code »

All file_operations should get a .llseek operation so we can make
nonseekable_open the default for future file operations without a
.llseek pointer.

The three cases that we can automatically detect are no_llseek, seq_lseek
and default_llseek. For cases where we can we can automatically prove that
the file offset is always ignored, we use noop_llseek, which maintains
the current behavior of not returning an error from a seek.

New drivers should normally not use noop_llseek but instead use no_llseek
and call nonseekable_open at open time. Existing drivers can be converted
to do the same when the maintainer knows for certain that no user code
relies on calling seek on the device file.

The generated code is often incorrectly indented and right now contains
comments that clarify for each added line why a specific variant was
chosen. In the version that gets submitted upstream, the comments will
be gone and I will manually fix the indentation, because there does not
seem to be a way to do that using coccinelle.

Some amount of new code is currently sitting in linux-next that should get
the same modifications, which I will do at the end of the merge window.

Many thanks to Julia Lawall for helping me learn to write a semantic
patch that does all this.

===== begin semantic patch =====
// This adds an llseek= method to all file operations,
// as a preparation for making no_llseek the default.
//
// The rules are
// - use no_llseek explicitly if we do nonseekable_open
// - use seq_lseek for sequential files
// - use default_llseek if we know we access f_pos
// - use noop_llseek if we know we don't access f_pos,
// but we still want to allow users to call lseek
//
@ open1 exists @
identifier nested_open;
@@
nested_open(...)
{

}

@ open exists@
identifier open_f;
identifier i, f;
identifier open1.nested_open;
@@
int open_f(struct inode *i, struct file *f)
{

}

@ read disable optional_qualifier exists @
identifier read_f;
identifier f, p, s, off;
type ssize_t, size_t, loff_t;
expression E;
identifier func;
@@
ssize_t read_f(struct file *f, char *p, size_t s, loff_t *off)
{

}

@ read_no_fpos disable optional_qualifier exists @
identifier read_f;
identifier f, p, s, off;
type ssize_t, size_t, loff_t;
@@
ssize_t read_f(struct file *f, char *p, size_t s, loff_t *off)
{
... when != off
}

@ write @
identifier write_f;
identifier f, p, s, off;
type ssize_t, size_t, loff_t;
expression E;
identifier func;
@@
ssize_t write_f(struct file *f, const char *p, size_t s, loff_t *off)
{

}

@ write_no_fpos @
identifier write_f;
identifier f, p, s, off;
type ssize_t, size_t, loff_t;
@@
ssize_t write_f(struct file *f, const char *p, size_t s, loff_t *off)
{
... when != off
}

@ fops0 @
identifier fops;
@@
struct file_operations fops = {
...
};

@ has_llseek depends on fops0 @
identifier fops0.fops;
identifier llseek_f;
@@
struct file_operations fops = {
...
.llseek = llseek_f,
...
};

@ has_read depends on fops0 @
identifier fops0.fops;
identifier read_f;
@@
struct file_operations fops = {
...
.read = read_f,
...
};

@ has_write depends on fops0 @
identifier fops0.fops;
identifier write_f;
@@
struct file_operations fops = {
...
.write = write_f,
...
};

@ has_open depends on fops0 @
identifier fops0.fops;
identifier open_f;
@@
struct file_operations fops = {
...
.open = open_f,
...
};

// use no_llseek if we call nonseekable_open
////////////////////////////////////////////
@ nonseekable1 depends on !has_llseek && has_open @
identifier fops0.fops;
identifier nso ~= "nonseekable_open";
@@
struct file_operations fops = {
... .open = nso, ...
+.llseek = no_llseek, /* nonseekable */
};

@ nonseekable2 depends on !has_llseek @
identifier fops0.fops;
identifier open.open_f;
@@
struct file_operations fops = {
... .open = open_f, ...
+.llseek = no_llseek, /* open uses nonseekable */
};

// use seq_lseek for sequential files
/////////////////////////////////////
@ seq depends on !has_llseek @
identifier fops0.fops;
identifier sr ~= "seq_read";
@@
struct file_operations fops = {
... .read = sr, ...
+.llseek = seq_lseek, /* we have seq_read */
};

// use default_llseek if there is a readdir
///////////////////////////////////////////
@ fops1 depends on !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
identifier fops0.fops;
identifier readdir_e;
@@
// any other fop is used that changes pos
struct file_operations fops = {
... .readdir = readdir_e, ...
+.llseek = default_llseek, /* readdir is present */
};

// use default_llseek if at least one of read/write touches f_pos
/////////////////////////////////////////////////////////////////
@ fops2 depends on !fops1 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
identifier fops0.fops;
identifier read.read_f;
@@
// read fops use offset
struct file_operations fops = {
... .read = read_f, ...
+.llseek = default_llseek, /* read accesses f_pos */
};

@ fops3 depends on !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
identifier fops0.fops;
identifier write.write_f;
@@
// write fops use offset
struct file_operations fops = {
... .write = write_f, ...
+ .llseek = default_llseek, /* write accesses f_pos */
};

// Use noop_llseek if neither read nor write accesses f_pos
///////////////////////////////////////////////////////////

@ fops4 depends on !fops1 && !fops2 && !fops3 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
identifier fops0.fops;
identifier read_no_fpos.read_f;
identifier write_no_fpos.write_f;
@@
// write fops use offset
struct file_operations fops = {
...
.write = write_f,
.read = read_f,
...
+.llseek = noop_llseek, /* read and write both use no f_pos */
};

@ depends on has_write && !has_read && !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
identifier fops0.fops;
identifier write_no_fpos.write_f;
@@
struct file_operations fops = {
... .write = write_f, ...
+.llseek = noop_llseek, /* write uses no f_pos */
};

@ depends on has_read && !has_write && !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
identifier fops0.fops;
identifier read_no_fpos.read_f;
@@
struct file_operations fops = {
... .read = read_f, ...
+.llseek = noop_llseek, /* read uses no f_pos */
};

@ depends on !has_read && !has_write && !fops1 && !fops2 && !has_llseek && !nonseekable1 && !nonseekable2 && !seq @
identifier fops0.fops;
@@
struct file_operations fops = {
...
+.llseek = noop_llseek, /* no read or write fn */
};
===== End semantic patch =====

Signed-off-by: Arnd Bergmann
Cc: Julia Lawall
Cc: Christoph Hellwig

Arnd Bergmann
2010-10-15 21:53:27 +0800

02 Oct, 2010

1 commit

982f7c2b2 sys_semctl: fix kernel stack leakage ... Browse Code »

The semctl syscall has several code paths that lead to the leakage of
uninitialized kernel stack memory (namely the IPC_INFO, SEM_INFO,
IPC_STAT, and SEM_STAT commands) during the use of the older, obsolete
version of the semid_ds struct.

The copy_semid_to_user() function declares a semid_ds struct on the stack
and copies it back to the user without initializing or zeroing the
"sem_base", "sem_pending", "sem_pending_last", and "undo" pointers,
allowing the leakage of 16 bytes of kernel stack memory.

The code is still reachable on 32-bit systems - when calling semctl()
newer glibc's automatically OR the IPC command with the IPC_64 flag, but
invoking the syscall directly allows users to use the older versions of
the struct.

Signed-off-by: Dan Rosenberg
Cc: Manfred Spraul
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Dan Rosenberg
2010-10-02 01:50:58 +0800

10 Aug, 2010

1 commit

6d8af64c1 switch mqueue to ->evict_inode() ... Browse Code »

... and since the inodes are never hashed, we can use default ->drop_inode()
just fine.

Signed-off-by: Al Viro

Al Viro
2010-08-10 04:47:58 +0800

21 Jul, 2010

1 commit

c61284e99 ipc/sem.c: bugfix for semop() not reporting successful operation ... Browse Code »

The last change to improve the scalability moved the actual wake-up out of
the section that is protected by spin_lock(sma->sem_perm.lock).

This means that IN_WAKEUP can be in queue.status even when the spinlock is
acquired by the current task. Thus the same loop that is performed when
queue.status is read without the spinlock acquired must be performed when
the spinlock is acquired.

Thanks to kamezawa.hiroyu@jp.fujitsu.com for noticing lack of the memory
barrier.

Addresses https://bugzilla.kernel.org/show_bug.cgi?id=16255

[akpm@linux-foundation.org: clean up kerneldoc, checkpatch warning and whitespace]
Signed-off-by: Manfred Spraul
Reported-by: Luca Tettamanti
Tested-by: Luca Tettamanti
Reported-by: Christoph Lameter
Cc: Maciej Rutecki
Cc: KAMEZAWA Hiroyuki
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Manfred Spraul
2010-07-21 07:25:40 +0800

05 Jun, 2010

1 commit

0abbb609a mqueue doesn't need make_bad_inode() ... Browse Code »

It never hashes them anyway and does final iput() immediately
afterwards. With ->drop_inode() being generic_delete_inode()...

Signed-off-by: Al Viro

Al Viro
2010-06-05 05:16:27 +0800

28 May, 2010

5 commits

7ea808591 drop unused dentry argument to ->fsync ... Browse Code »

Signed-off-by: Christoph Hellwig
Signed-off-by: Al Viro

Christoph Hellwig
2010-05-28 10:05:02 +0800
4de85cd6d ipc/sem.c: use ERR_CAST ... Browse Code »

Use ERR_CAST(x) rather than ERR_PTR(PTR_ERR(x)). The former makes more
clear what is the purpose of the operation, which otherwise looks like a
no-op.

The semantic patch that makes this change is as follows:
(http://coccinelle.lip6.fr/)

//
@@
type T;
T x;
identifier f;
@@

T f (...) { }

@@
expression x;
@@

- ERR_PTR(PTR_ERR(x))
+ ERR_CAST(x)
//

Signed-off-by: Julia Lawall
Cc: Manfred Spraul
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Julia Lawall
2010-05-28 00:12:49 +0800
c5cf6359a ipc/sem.c: update description of the implementation ... Browse Code »

ipc/sem.c begins with a 15 year old description about bugs in the initial
implementation in Linux-1.0. The patch replaces that with a top level
description of the current code.

A TODO could be derived from this text:

The opengroup man page for semop() does not mandate FIFO. Thus there is
no need for a semaphore array list of pending operations.

If

- this list is removed
- the per-semaphore array spinlock is removed (possible if there is no
list to protect)
- sem_otime is moved into the semaphores and calculated on demand during
semctl()

then the array would be read-mostly - which would significantly improve
scaling for applications that use semaphore arrays with lots of entries.

The price would be expensive semctl() calls:

for(i=0;isem_nsems;i++) spin_lock(sma->sem_lock);

for(i=0;isem_nsems;i++) spin_unlock(sma->sem_lock);

I'm not sure if the complexity is worth the effort, thus here is the
documentation of the current behavior first.

Signed-off-by: Manfred Spraul
Cc: Chris Mason
Cc: Zach Brown
Cc: Jens Axboe
Cc: Nick Piggin
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Manfred Spraul
2010-05-28 00:12:49 +0800
0a2b9d4c7 ipc/sem.c: move wake_up_process out of the spinlock section ... Browse Code »

The wake-up part of semtimedop() consists out of two steps:

- the right tasks must be identified.
- they must be woken up.

Right now, both steps run while the array spinlock is held. This patch
reorders the code and moves the actual wake_up_process() behind the point
where the spinlock is dropped.

The code also moves setting sem->sem_otime to one place: It does not make
sense to set the last modify time multiple times.

[akpm@linux-foundation.org: repair kerneldoc]
[akpm@linux-foundation.org: fix uninitialised retval]
Signed-off-by: Manfred Spraul
Cc: Chris Mason
Cc: Zach Brown
Cc: Jens Axboe
Cc: Nick Piggin
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Manfred Spraul
2010-05-28 00:12:49 +0800
fd5db4225 ipc/sem.c: optimize update_queue() for bulk wakeup calls ... Browse Code »

The following series of patches tries to fix the spinlock contention
reported by Chris Mason - his benchmark exposes problems of the current
code:

- In the worst case, the algorithm used by update_queue() is O(N^2).
Bulk wake-up calls can enter this worst case. The patch series fix
that.

Note that the benchmark app doesn't expose the problem, it just should
be fixed: Real world apps might do the wake-ups in another order than
perfect FIFO.

- The part of the code that runs within the semaphore array spinlock is
significantly larger than necessary.

The patch series fixes that. This change is responsible for the main
improvement.

- The cacheline with the spinlock is also used for a variable that is
read in the hot path (sem_base) and for a variable that is unnecessarily
written to multiple times (sem_otime). The last step of the series
cacheline-aligns the spinlock.

This patch:

The SysV semaphore code allows to perform multiple operations on all
semaphores in the array as atomic operations. After a modification,
update_queue() checks which of the waiting tasks can complete.

The algorithm that is used to identify the tasks is O(N^2) in the worst
case. For some cases, it is simple to avoid the O(N^2).

The patch adds a detection logic for some cases, especially for the case
of an array where all sleeping tasks are single sembuf operations and a
multi-sembuf operation is used to wake up multiple tasks.

A big database application uses that approach.

The patch fixes wakeup due to semctl(,,SETALL,) - the initial version of
the patch breaks that.

[akpm@linux-foundation.org: make do_smart_update() static]
Signed-off-by: Manfred Spraul
Cc: Chris Mason
Cc: Zach Brown
Cc: Jens Axboe
Cc: Nick Piggin
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Manfred Spraul
2010-05-28 00:12:49 +0800

25 May, 2010

1 commit

4be929be3 kernel-wide: replace USHORT_MAX, SHORT_MAX and SHORT_MIN with USHRT_MAX, SHRT_MAX and SHRT_MIN ... Browse Code »

- C99 knows about USHRT_MAX/SHRT_MAX/SHRT_MIN, not
USHORT_MAX/SHORT_MAX/SHORT_MIN.

- Make SHRT_MIN of type s16, not int, for consistency.

[akpm@linux-foundation.org: fix drivers/dma/timb_dma.c]
[akpm@linux-foundation.org: fix security/keys/keyring.c]
Signed-off-by: Alexey Dobriyan
Acked-by: WANG Cong
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Alexey Dobriyan
2010-05-25 23:07:02 +0800

20 May, 2010

1 commit

164d44fd9 Merge branch 'timers-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip ... Browse Code »

* 'timers-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
clocksource: Add clocksource_register_hz/khz interface
posix-cpu-timers: Optimize run_posix_cpu_timers()
time: Remove xtime_cache
mqueue: Convert message queue timeout to use hrtimers
hrtimers: Provide schedule_hrtimeout for CLOCK_REALTIME
timers: Introduce the concept of timer slack for legacy timers
ntp: Remove tickadj
ntp: Make time_adjust static
time: Add xtime, wall_to_monotonic to feature-removal-schedule
timer: Try to survive timer callback preempt_count leak
timer: Split out timer function call
timer: Print function name for timer callbacks modifying preemption count
time: Clean up warp_clock()
cpu-timers: Avoid iterating over all threads in fastpath_timer_check()
cpu-timers: Change SIGEV_NONE timer implementation
cpu-timers: Return correct previous timer reload value
cpu-timers: Cleanup arm_timer()
cpu-timers: Simplify RLIMIT_CPU handling

Linus Torvalds
2010-05-20 08:11:10 +0800

12 May, 2010

1 commit

a3ed2a157 mqueue: fix kernel BUG caused by double free() on mq_open() ... Browse Code »

In case of aborting because we reach the maximum amount of memory which
can be allocated to message queues per user (RLIMIT_MSGQUEUE), we would
try to free the message area twice when bailing out: first by the error
handling code itself, and then later when cleaning up the inode through
delete_inode().

Signed-off-by: André Goddard Rosa
Cc: Alexey Dobriyan
Cc: Al Viro
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

André Goddard Rosa
2010-05-12 08:33:42 +0800

10 May, 2010

1 commit

dbb6be6d5 Merge branch 'linus' into timers/core ... Browse Code »

Reason: Further posix_cpu_timer patches depend on mainline changes

Signed-off-by: Thomas Gleixner

Thomas Gleixner
2010-05-10 20:20:42 +0800

07 Apr, 2010

1 commit

9ca7d8e68 mqueue: Convert message queue timeout to use hrtimers ... Browse Code »

The message queue functions mq_timedsend() and mq_timedreceive()
have not yet been converted to use the hrtimer interface.

This patch replaces the call to schedule_timeout() by a call to
schedule_hrtimeout() and transforms the expiration time from
timespec to ktime as required.

[ tglx: Fixed whitespace wreckage ]

Signed-off-by: Carsten Emde
Tested-by: Pradyumna Sampath
Cc: Arjan van de Veen
Cc: Andrew Morton
LKML-Reference:
Signed-off-by: Thomas Gleixner

Carsten Emde
2010-04-07 03:50:03 +0800

30 Mar, 2010

1 commit

5a0e3ad6a include cleanup: Update gfp.h and slab.h includes to prepare for breaking implic… ... Browse Code »

…it slab.h inclusion from percpu.h

percpu.h is included by sched.h and module.h and thus ends up being
included when building most .c files. percpu.h includes slab.h which
in turn includes gfp.h making everything defined by the two files
universally available and complicating inclusion dependencies.

percpu.h -> slab.h dependency is about to be removed. Prepare for
this change by updating users of gfp and slab facilities include those
headers directly instead of assuming availability. As this conversion
needs to touch large number of source files, the following script is
used as the basis of conversion.

http://userweb.kernel.org/~tj/misc/slabh-sweep.py

The script does the followings.

* Scan files for gfp and slab usages and update includes such that
only the necessary includes are there. ie. if only gfp is used,
gfp.h, if slab is used, slab.h.

* When the script inserts a new include, it looks at the include
blocks and try to put the new include such that its order conforms
to its surrounding. It's put in the include block which contains
core kernel includes, in the same order that the rest are ordered -
alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
doesn't seem to be any matching order.

* If the script can't find a place to put a new include (mostly
because the file doesn't have fitting include block), it prints out
an error message indicating which .h file needs to be added to the
file.

The conversion was done in the following steps.

1. The initial automatic conversion of all .c files updated slightly
over 4000 files, deleting around 700 includes and adding ~480 gfp.h
and ~3000 slab.h inclusions. The script emitted errors for ~400
files.

2. Each error was manually checked. Some didn't need the inclusion,
some needed manual addition while adding it to implementation .h or
embedding .c file was more appropriate for others. This step added
inclusions to around 150 files.

3. The script was run again and the output was compared to the edits
from #2 to make sure no file was left behind.

4. Several build tests were done and a couple of problems were fixed.
e.g. lib/decompress_*.c used malloc/free() wrappers around slab
APIs requiring slab.h to be added manually.

5. The script was run on all .h files but without automatically
editing them as sprinkling gfp.h and slab.h inclusions around .h
files could easily lead to inclusion dependency hell. Most gfp.h
inclusion directives were ignored as stuff from gfp.h was usually
wildly available and often used in preprocessor macros. Each
slab.h inclusion directive was examined and added manually as
necessary.

6. percpu.h was updated not to include slab.h.

7. Build test were done on the following configurations and failures
were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my
distributed build env didn't work with gcov compiles) and a few
more options had to be turned off depending on archs to make things
build (like ipr on powerpc/64 which failed due to missing writeq).

* x86 and x86_64 UP and SMP allmodconfig and a custom test config.
* powerpc and powerpc64 SMP allmodconfig
* sparc and sparc64 SMP allmodconfig
* ia64 SMP allmodconfig
* s390 SMP allmodconfig
* alpha SMP allmodconfig
* um on x86_64 SMP allmodconfig

8. percpu.h modifications were reverted so that it could be applied as
a separate patch and serve as bisection point.

Given the fact that I had only a couple of failures from tests on step
6, I'm fairly confident about the coverage of this conversion patch.
If there is a breakage, it's likely to be something in one of the arch
headers which should be easily discoverable easily on most builds of
the specific arch.

Signed-off-by: Tejun Heo <tj@kernel.org>
Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>

Tejun Heo
2010-03-30 21:02:32 +0800

23 Mar, 2010

1 commit

45575f5a4 ppc64 sys_ipc breakage in 2.6.34-rc2 ... Browse Code »

I chased down a fail on ppc64 on 2.6.34-rc2 where an application that
uses shared memory was getting a SEGV.

Commit baed7fc9b580bd3fb8252ff1d9b36eaf1f86b670 ("Add generic sys_ipc
wrapper") changed the second argument from an unsigned long to an int.
When we call shmget the system call wrappers for sys_ipc will sign
extend second (ie the size) which truncates it. It took a while to
track down because the call succeeds and strace shows the untruncated
size :)

The patch below changes second from an int to an unsigned long which
fixes shmget on ppc64 (and I assume s390, sparc64 and mips64).

Signed-off-by: Anton Blanchard
--

I assume the function prototypes for the other IPC methods would cause us
to sign or zero extend second where appropriate (avoiding any security
issues). Come to think of it, the syscall wrappers for each method should do
that for us as well.
Signed-off-by: Linus Torvalds

Anton Blanchard
2010-03-23 00:57:19 +0800

13 Mar, 2010

2 commits

f1eb1332b ipc: use rlimit helpers ... Browse Code »

Make sure compiler won't do weird things with limits. E.g. fetching them
twice may return 2 different values after writable limits are implemented.

I.e. either use rlimit helpers added in
3e10e716abf3c71bdb5d86b8f507f9e72236c9cd ("resource: add helpers for
fetching rlimits") or ACCESS_ONCE if not applicable.

Signed-off-by: Jiri Slaby
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Jiri Slaby
2010-03-13 07:52:39 +0800
baed7fc9b Add generic sys_ipc wrapper ... Browse Code »

Add a generic implementation of the ipc demultiplexer syscall. Except for
s390 and sparc64 all implementations of the sys_ipc are nearly identical.

There are slight differences in the types of the parameters, where mips
and powerpc as the only 64-bit architectures with sys_ipc use unsigned
long for the "third" argument as it gets casted to a pointer later, while
it traditionally is an "int" like most other paramters. frv goes even
further and uses unsigned long for all parameters execept for "ptr" which
is a pointer type everywhere. The change from int to unsigned long for
"third" and back to "int" for the others on frv should be fine due to the
in-register calling conventions for syscalls (we already had a similar
issue with the generic sys_ptrace), but I'd prefer to have the arch
maintainers looks over this in details.

Except for that h8300, m68k and m68knommu lack an impplementation of the
semtimedop sub call which this patch adds, and various architectures have
gets used - at least on i386 it seems superflous as the compat code on
x86-64 and ia64 doesn't even bother to implement it.

[akpm@linux-foundation.org: add sys_ipc to sys_ni.c]
Signed-off-by: Christoph Hellwig
Cc: Ralf Baechle
Cc: Benjamin Herrenschmidt
Cc: Paul Mundt
Cc: Jeff Dike
Cc: Hirokazu Takata
Cc: Thomas Gleixner
Cc: Ingo Molnar
Reviewed-by: H. Peter Anvin
Cc: Al Viro
Cc: Arnd Bergmann
Cc: Heiko Carstens
Cc: Martin Schwidefsky
Cc: "Luck, Tony"
Cc: James Morris
Cc: Andreas Schwab
Acked-by: Jesper Nilsson
Acked-by: Russell King
Acked-by: David Howells
Acked-by: Kyle McMartin
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Christoph Hellwig
2010-03-13 07:52:32 +0800

04 Mar, 2010

6 commits

2329e392a mqueue: fix typo "failues" -> "failures" ... Browse Code »

Signed-off-by: André Goddard Rosa
Signed-off-by: Al Viro

André Goddard Rosa
2010-03-04 03:48:00 +0800
8d8ffefaa mqueue: only set error codes if they are really necessary ... Browse Code »

... postponing assignments until they're needed. Doesn't change code size.

Signed-off-by: André Goddard Rosa
Signed-off-by: Al Viro

André Goddard Rosa
2010-03-04 03:48:00 +0800
04db0dde0 mqueue: simplify do_open() error handling ... Browse Code »

It reduces code size:
text data bss dec hex filename
9925 72 16 10013 271d ipc/mqueue-BEFORE.o
9885 72 16 9973 26f5 ipc/mqueue-AFTER.o

Signed-off-by: André Goddard Rosa
Signed-off-by: Al Viro

André Goddard Rosa
2010-03-04 03:48:00 +0800
8834cf796 mqueue: apply mathematics distributivity on mq_bytes calculation ... Browse Code »

Code size reduction:
text data bss dec hex filename
9941 72 16 10029 272d ipc/mqueue-BEFORE.o
9925 72 16 10013 271d ipc/mqueue-AFTER.o

Signed-off-by: André Goddard Rosa
Signed-off-by: Al Viro

André Goddard Rosa
2010-03-04 03:48:00 +0800
c8308b1c9 mqueue: remove unneeded info->messages initialization ... Browse Code »

... and abort earlier if we couldn't allocate the message pointers array,
avoiding the u->mq_bytes accounting logic.

It reduces code size:
text data bss dec hex filename
9949 72 16 10037 2735 ipc/mqueue-BEFORE.o
9941 72 16 10029 272d ipc/mqueue-AFTER.o

Signed-off-by: André Goddard Rosa
Signed-off-by: Al Viro

André Goddard Rosa
2010-03-04 03:47:59 +0800
4294a8eed mqueue: fix mq_open() file descriptor leak on user-space processes ... Browse Code »

We leak fd on lookup_one_len() failure

Signed-off-by: André Goddard Rosa
Signed-off-by: Al Viro

André Goddard Rosa
2010-03-04 03:46:05 +0800