05 Jun, 2010
1 commit
-
It never hashes them anyway and does final iput() immediately
afterwards. With ->drop_inode() being generic_delete_inode()...

Signed-off-by: Al Viro
28 May, 2010
5 commits
-
Signed-off-by: Christoph Hellwig
Signed-off-by: Al Viro
-
Use ERR_CAST(x) rather than ERR_PTR(PTR_ERR(x)). The former makes the
purpose of the operation clearer; the latter otherwise looks like a
no-op.

The semantic patch that makes this change is as follows:
(http://coccinelle.lip6.fr/)

//
@@
type T;
T x;
identifier f;
@@

T f (...) { }

@@
expression x;
@@

- ERR_PTR(PTR_ERR(x))
+ ERR_CAST(x)
//

Signed-off-by: Julia Lawall
Cc: Manfred Spraul
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
-
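For the ERR_CAST() conversion above, a minimal userspace sketch of the error-pointer idiom may help show why the old spelling looked like a no-op. The helpers below are simplified stand-ins for the kernel's err.h helpers, written from memory as an assumption; `struct inode_like`, `struct dentry_like`, `find_inode_like()` and `lookup_like()` are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Simplified userspace stand-ins for the kernel's error-pointer helpers. */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(long error)
{
    assert(error < 0 && error >= -MAX_ERRNO);   /* only valid errno values */
    return (void *)error;
}

static inline long PTR_ERR(const void *ptr)
{
    return (long)ptr;
}

static inline int IS_ERR(const void *ptr)
{
    return (unsigned long)ptr >= (unsigned long)-MAX_ERRNO;
}

/* Same value, different pointer type: this states the intent directly. */
static inline void *ERR_CAST(const void *ptr)
{
    return (void *)ptr;
}

/* Hypothetical types and functions, for illustration only. */
struct inode_like { int ino; };
struct dentry_like { struct inode_like *inode; };

static struct inode_like *find_inode_like(int ino)
{
    static struct inode_like found = { 1 };

    if (ino != 1)
        return ERR_PTR(-ENOENT);
    return &found;
}

static struct dentry_like *lookup_like(int ino)
{
    static struct dentry_like d;
    struct inode_like *inode = find_inode_like(ino);

    if (IS_ERR(inode))
        return ERR_CAST(inode);   /* was: ERR_PTR(PTR_ERR(inode)) */
    d.inode = inode;
    return &d;
}
```

A caller checks the result with IS_ERR() and extracts the code with PTR_ERR(), exactly as it would have before the conversion.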
ipc/sem.c begins with a 15-year-old description about bugs in the initial
implementation in Linux-1.0. The patch replaces that with a top-level
description of the current code.

A TODO could be derived from this text:
The opengroup man page for semop() does not mandate FIFO. Thus there is
no need for a semaphore array list of pending operations.

If
- this list is removed
- the per-semaphore array spinlock is removed (possible if there is no
list to protect)
- sem_otime is moved into the semaphores and calculated on demand during
semctl()

then the array would be read-mostly - which would significantly improve
scaling for applications that use semaphore arrays with lots of entries.

The price would be expensive semctl() calls:
for(i=0;i<sma->sem_nsems;i++) spin_lock(sma->sem_lock);
for(i=0;i<sma->sem_nsems;i++) spin_unlock(sma->sem_lock);

I'm not sure if the complexity is worth the effort, thus here is the
documentation of the current behavior first.

Signed-off-by: Manfred Spraul
Cc: Chris Mason
Cc: Zach Brown
Cc: Jens Axboe
Cc: Nick Piggin
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
-
The wake-up part of semtimedop() consists of two steps:
- the right tasks must be identified.
- they must be woken up.

Right now, both steps run while the array spinlock is held. This patch
reorders the code and moves the actual wake_up_process() after the point
where the spinlock is dropped.

The code also moves setting sem->sem_otime to one place: It does not make
sense to set the last modify time multiple times.

[akpm@linux-foundation.org: repair kerneldoc]
[akpm@linux-foundation.org: fix uninitialised retval]
Signed-off-by: Manfred Spraul
Cc: Chris Mason
Cc: Zach Brown
Cc: Jens Axboe
Cc: Nick Piggin
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
-
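The reordering described in the semtimedop() wake-up commit above (identify under the lock, wake after the lock is dropped) can be sketched in a single-threaded userspace model. Everything here is hypothetical: `struct waiter`, `struct sem_array_like`, the boolean `locked` flag standing in for the array spinlock, and `add_and_wake()` are illustration only and not the kernel's data structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_WAITERS 8

/* Hypothetical stand-ins for a sleeping task and the semaphore array. */
struct waiter {
    int need;        /* amount the task wants to decrement */
    bool woken;
};

struct sem_array_like {
    bool locked;     /* models the array spinlock */
    int value;
    struct waiter *pending[MAX_WAITERS];
    size_t npending;
};

static void wake_up(struct sem_array_like *sma, struct waiter *w)
{
    assert(!sma->locked);   /* the wake-up must run outside the lock */
    w->woken = true;
}

/* Step 1 runs under the lock and only collects tasks; step 2 runs after unlock. */
static size_t add_and_wake(struct sem_array_like *sma, int delta)
{
    struct waiter *to_wake[MAX_WAITERS];
    size_t nwake = 0, kept = 0, i;

    sma->locked = true;
    sma->value += delta;
    for (i = 0; i < sma->npending; i++) {
        struct waiter *w = sma->pending[i];

        if (w->need <= sma->value) {
            sma->value -= w->need;
            to_wake[nwake++] = w;       /* identify only */
        } else {
            sma->pending[kept++] = w;   /* still blocked */
        }
    }
    sma->npending = kept;
    sma->locked = false;

    for (i = 0; i < nwake; i++)
        wake_up(sma, to_wake[i]);
    return nwake;
}
```

The assertion in wake_up() is the point of the sketch: the expensive step never runs while the lock is held, so the critical section covers only the bookkeeping.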
The following series of patches tries to fix the spinlock contention
reported by Chris Mason - his benchmark exposes problems of the current
code:

- In the worst case, the algorithm used by update_queue() is O(N^2).
Bulk wake-up calls can enter this worst case. The patch series fixes
that.

Note that the benchmark app doesn't expose the problem, it just should
be fixed: Real world apps might do the wake-ups in another order than
perfect FIFO.

- The part of the code that runs within the semaphore array spinlock is
significantly larger than necessary.

The patch series fixes that. This change is responsible for the main
improvement.

- The cacheline with the spinlock is also used for a variable that is
read in the hot path (sem_base) and for a variable that is unnecessarily
written to multiple times (sem_otime). The last step of the series
cacheline-aligns the spinlock.

This patch:
The SysV semaphore code allows multiple operations on all
semaphores in the array to be performed as atomic operations. After a
modification, update_queue() checks which of the waiting tasks can complete.

The algorithm that is used to identify the tasks is O(N^2) in the worst
case. For some cases, it is simple to avoid the O(N^2).

The patch adds detection logic for some cases, especially for the case
of an array where all sleeping tasks are single sembuf operations and a
multi-sembuf operation is used to wake up multiple tasks.

A big database application uses that approach.
The patch fixes wakeup due to semctl(,,SETALL,) - the initial version of
the patch broke that.

[akpm@linux-foundation.org: make do_smart_update() static]
Signed-off-by: Manfred Spraul
Cc: Chris Mason
Cc: Zach Brown
Cc: Jens Axboe
Cc: Nick Piggin
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
25 May, 2010
1 commit
-
- C99 knows about USHRT_MAX/SHRT_MAX/SHRT_MIN, not
USHORT_MAX/SHORT_MAX/SHORT_MIN.

- Make SHRT_MIN of type s16, not int, for consistency.
[akpm@linux-foundation.org: fix drivers/dma/timb_dma.c]
[akpm@linux-foundation.org: fix security/keys/keyring.c]
Signed-off-by: Alexey Dobriyan
Acked-by: WANG Cong
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
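As a rough illustration of the rename above, the sketch below derives 16-bit limits with casts and compares them against the C99 names from <limits.h>. The `K_` prefixed macros are hypothetical names chosen so they do not clash with the standard ones, and the sketch assumes `short` is 16 bits wide, as it is on Linux targets.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/*
 * Hypothetical names (K_ prefix) so they do not clash with <limits.h>.
 * The casts give each limit a 16-bit type rather than plain int.
 */
#define K_USHRT_MAX ((uint16_t)(~0U))
#define K_SHRT_MAX  ((int16_t)(K_USHRT_MAX >> 1))
#define K_SHRT_MIN  ((int16_t)(-K_SHRT_MAX - 1))

/* The minimum has the same 16-bit type as the maximum, for consistency. */
static_assert(sizeof(K_SHRT_MIN) == sizeof(int16_t), "K_SHRT_MIN is 16 bits");

/* Returns 1 if the derived values agree with the C99 names. */
static int limits_match_c99(void)
{
    return K_USHRT_MAX == USHRT_MAX &&
           K_SHRT_MAX == SHRT_MAX &&
           K_SHRT_MIN == SHRT_MIN;
}
```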
20 May, 2010
1 commit
-
* 'timers-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
clocksource: Add clocksource_register_hz/khz interface
posix-cpu-timers: Optimize run_posix_cpu_timers()
time: Remove xtime_cache
mqueue: Convert message queue timeout to use hrtimers
hrtimers: Provide schedule_hrtimeout for CLOCK_REALTIME
timers: Introduce the concept of timer slack for legacy timers
ntp: Remove tickadj
ntp: Make time_adjust static
time: Add xtime, wall_to_monotonic to feature-removal-schedule
timer: Try to survive timer callback preempt_count leak
timer: Split out timer function call
timer: Print function name for timer callbacks modifying preemption count
time: Clean up warp_clock()
cpu-timers: Avoid iterating over all threads in fastpath_timer_check()
cpu-timers: Change SIGEV_NONE timer implementation
cpu-timers: Return correct previous timer reload value
cpu-timers: Cleanup arm_timer()
cpu-timers: Simplify RLIMIT_CPU handling
12 May, 2010
1 commit
-
In case of aborting because we reach the maximum amount of memory which
can be allocated to message queues per user (RLIMIT_MSGQUEUE), we would
try to free the message area twice when bailing out: first by the error
handling code itself, and then later when cleaning up the inode through
delete_inode().

Signed-off-by: André Goddard Rosa
Cc: Alexey Dobriyan
Cc: Al Viro
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
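The double free described above comes from two code paths both believing they own the message area. The sketch below shows one way to give the area a single owner; it is an assumption for illustration and not necessarily how the commit resolved it. `struct mq_info_like`, `cleanup_inode_like()` and `create_queue_like()` are hypothetical names, and the byte limit is a plain parameter rather than RLIMIT_MSGQUEUE.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for the per-queue inode info. */
struct mq_info_like {
    void **messages;     /* message pointer area */
    int frees;           /* counts releases, for demonstration only */
};

/* Models the inode cleanup path, which always releases the message area. */
static void cleanup_inode_like(struct mq_info_like *info)
{
    assert(info != NULL);
    if (info->messages) {
        free(info->messages);
        info->messages = NULL;
        info->frees++;
    }
}

/* Returns 0 on success, -1 if allocation fails or the byte limit is hit. */
static int create_queue_like(struct mq_info_like *info, size_t nmsgs,
                             size_t bytes_used, size_t limit)
{
    info->frees = 0;
    info->messages = calloc(nmsgs, sizeof(void *));
    if (!info->messages)
        return -1;
    if (bytes_used + nmsgs * sizeof(void *) > limit) {
        /* No free() here: the cleanup path owns the area. */
        cleanup_inode_like(info);
        return -1;
    }
    return 0;
}
```

Because the error path delegates to the cleanup function and the cleanup function clears the pointer, the area is released exactly once on every path.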
10 May, 2010
1 commit
-
Reason: Further posix_cpu_timer patches depend on mainline changes
Signed-off-by: Thomas Gleixner
07 Apr, 2010
1 commit
-
The message queue functions mq_timedsend() and mq_timedreceive()
have not yet been converted to use the hrtimer interface.

This patch replaces the call to schedule_timeout() with a call to
schedule_hrtimeout() and transforms the expiration time from
timespec to ktime as required.

[ tglx: Fixed whitespace wreckage ]
Signed-off-by: Carsten Emde
Tested-by: Pradyumna Sampath
Cc: Arjan van de Veen
Cc: Andrew Morton
LKML-Reference:
Signed-off-by: Thomas Gleixner
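The timespec-to-ktime step mentioned above amounts to turning seconds plus nanoseconds into one signed 64-bit nanosecond count. The userspace sketch below shows that arithmetic with explicit validation; `timespec_to_ns_like()` and `NSEC_PER_SEC_LIKE` are hypothetical names, and rejecting overflow with an error is an assumption of this sketch rather than a description of the kernel helpers.

```c
#include <assert.h>
#include <stdint.h>
#include <time.h>

#define NSEC_PER_SEC_LIKE INT64_C(1000000000)

/* Hypothetical helper: validate a timespec and convert it to signed 64-bit ns. */
static int timespec_to_ns_like(const struct timespec *ts, int64_t *ns)
{
    assert(ts != NULL && ns != NULL);

    if (ts->tv_sec < 0 || ts->tv_nsec < 0 ||
        ts->tv_nsec >= NSEC_PER_SEC_LIKE)
        return -1;      /* malformed timeout, similar to an -EINVAL check */
    if ((int64_t)ts->tv_sec > (INT64_MAX - ts->tv_nsec) / NSEC_PER_SEC_LIKE)
        return -1;      /* would overflow 64-bit nanoseconds */

    *ns = (int64_t)ts->tv_sec * NSEC_PER_SEC_LIKE + ts->tv_nsec;
    return 0;
}
```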
30 Mar, 2010
1 commit
-
…it slab.h inclusion from percpu.h
percpu.h is included by sched.h and module.h and thus ends up being
included when building most .c files. percpu.h includes slab.h which
in turn includes gfp.h making everything defined by the two files
universally available and complicating inclusion dependencies.

percpu.h -> slab.h dependency is about to be removed. Prepare for
this change by updating users of gfp and slab facilities to include those
headers directly instead of assuming availability. As this conversion
needs to touch a large number of source files, the following script is
used as the basis of conversion.

http://userweb.kernel.org/~tj/misc/slabh-sweep.py
The script does the following.

* Scan files for gfp and slab usages and update includes such that
only the necessary includes are there. ie. if only gfp is used,
gfp.h, if slab is used, slab.h.

* When the script inserts a new include, it looks at the include
blocks and tries to put the new include such that its order conforms
to its surroundings. It's put in the include block which contains
core kernel includes, in the same order that the rest are ordered -
alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
doesn't seem to be any matching order.

* If the script can't find a place to put a new include (mostly
because the file doesn't have a fitting include block), it prints out
an error message indicating which .h file needs to be added to the
file.

The conversion was done in the following steps.
1. The initial automatic conversion of all .c files updated slightly
over 4000 files, deleting around 700 includes and adding ~480 gfp.h
and ~3000 slab.h inclusions. The script emitted errors for ~400
files.

2. Each error was manually checked. Some didn't need the inclusion,
some needed manual addition while adding it to implementation .h or
embedding .c file was more appropriate for others. This step added
inclusions to around 150 files.

3. The script was run again and the output was compared to the edits
from #2 to make sure no file was left behind.

4. Several build tests were done and a couple of problems were fixed.
e.g. lib/decompress_*.c used malloc/free() wrappers around slab
APIs requiring slab.h to be added manually.

5. The script was run on all .h files but without automatically
editing them as sprinkling gfp.h and slab.h inclusions around .h
files could easily lead to inclusion dependency hell. Most gfp.h
inclusion directives were ignored as stuff from gfp.h was usually
widely available and often used in preprocessor macros. Each
slab.h inclusion directive was examined and added manually as
necessary.

6. percpu.h was updated not to include slab.h.

7. Build tests were done on the following configurations and failures
were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my
distributed build env didn't work with gcov compiles) and a few
more options had to be turned off depending on archs to make things
build (like ipr on powerpc/64 which failed due to missing writeq).

* x86 and x86_64 UP and SMP allmodconfig and a custom test config.
* powerpc and powerpc64 SMP allmodconfig
* sparc and sparc64 SMP allmodconfig
* ia64 SMP allmodconfig
* s390 SMP allmodconfig
* alpha SMP allmodconfig
* um on x86_64 SMP allmodconfig

8. percpu.h modifications were reverted so that it could be applied as
a separate patch and serve as bisection point.

Given the fact that I had only a couple of failures from tests on step
6, I'm fairly confident about the coverage of this conversion patch.
If there is a breakage, it's likely to be something in one of the arch
headers which should be easily discoverable on most builds of
the specific arch.

Signed-off-by: Tejun Heo <tj@kernel.org>
Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
23 Mar, 2010
1 commit
-
I chased down a failure on ppc64 on 2.6.34-rc2 where an application that
uses shared memory was getting a SEGV.

Commit baed7fc9b580bd3fb8252ff1d9b36eaf1f86b670 ("Add generic sys_ipc
wrapper") changed the second argument from an unsigned long to an int.
When we call shmget the system call wrappers for sys_ipc will sign
extend second (ie the size) which truncates it. It took a while to
track down because the call succeeds and strace shows the untruncated
size :)

The patch below changes second from an int to an unsigned long which
fixes shmget on ppc64 (and I assume s390, sparc64 and mips64).

Signed-off-by: Anton Blanchard
--

I assume the function prototypes for the other IPC methods would cause us
to sign or zero extend second where appropriate (avoiding any security
issues). Come to think of it, the syscall wrappers for each method should do
that for us as well.
Signed-off-by: Linus Torvalds
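The effect described in the shmget fix above can be reproduced with plain integer arithmetic. The sketch below models what happens to a 64-bit size when it travels through a 32-bit signed parameter and is widened again; `size_via_int()` and `size_via_unsigned_long()` are hypothetical helpers, and the widening is written out by hand so the result does not depend on implementation-defined conversions.

```c
#include <assert.h>
#include <stdint.h>

/* Models passing a 64-bit size through a 32-bit signed "second" argument. */
static uint64_t size_via_int(uint64_t size)
{
    uint32_t low = (uint32_t)size;              /* upper 32 bits are lost */
    int64_t widened = (low & UINT32_C(0x80000000))
        ? (int64_t)low - INT64_C(4294967296)    /* sign extension */
        : (int64_t)low;

    return (uint64_t)widened;
}

/* Models the fixed prototype: the size travels as an unsigned 64-bit value. */
static uint64_t size_via_unsigned_long(uint64_t size)
{
    return size;
}
```

With a request of 4 GiB plus 4 KiB, the first helper yields 4 KiB, which matches the symptom in the commit message: the call succeeds with a segment far smaller than the caller asked for.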
13 Mar, 2010
2 commits
-
Make sure compiler won't do weird things with limits. E.g. fetching them
twice may return 2 different values after writable limits are implemented.

I.e. either use rlimit helpers added in
3e10e716abf3c71bdb5d86b8f507f9e72236c9cd ("resource: add helpers for
fetching rlimits") or ACCESS_ONCE if not applicable.

Signed-off-by: Jiri Slaby
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
-
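For the rlimit commit above, the idea is to load the limit once and make every decision from that snapshot. The sketch below is a userspace imitation: `ACCESS_ONCE_LIKE()` copies the shape of the kernel macro as I recall it and relies on the GCC/Clang `__typeof__` extension, while `struct rlimit_like` and `fits_under_limit()` are hypothetical.

```c
#include <assert.h>

/* Userspace imitation of the kernel macro; __typeof__ is a GCC/Clang extension. */
#define ACCESS_ONCE_LIKE(x) (*(volatile __typeof__(x) *)&(x))

/* Hypothetical limit pair, modelled loosely on struct rlimit. */
struct rlimit_like {
    unsigned long cur;
    unsigned long max;
};

/* Returns 1 if used + want fits under the soft limit, else 0. */
static int fits_under_limit(struct rlimit_like *lim,
                            unsigned long used, unsigned long want)
{
    /* One load: both comparisons below see the same value. */
    unsigned long limit = ACCESS_ONCE_LIKE(lim->cur);

    if (used > limit)
        return 0;
    return want <= limit - used;
}
```

Had the function read `lim->cur` separately in each comparison, a concurrent writer could make the two reads disagree, which is the situation the commit guards against.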
Add a generic implementation of the ipc demultiplexer syscall. Except for
s390 and sparc64 all implementations of the sys_ipc are nearly identical.

There are slight differences in the types of the parameters, where mips
and powerpc as the only 64-bit architectures with sys_ipc use unsigned
long for the "third" argument as it gets cast to a pointer later, while
it traditionally is an "int" like most other parameters. frv goes even
further and uses unsigned long for all parameters except for "ptr" which
is a pointer type everywhere. The change from int to unsigned long for
"third" and back to "int" for the others on frv should be fine due to the
in-register calling conventions for syscalls (we already had a similar
issue with the generic sys_ptrace), but I'd prefer to have the arch
maintainers look over this in detail.

Apart from that, h8300, m68k and m68knommu lack an implementation of the
semtimedop sub call which this patch adds, and various architectures have
gets used - at least on i386 it seems superfluous as the compat code on
x86-64 and ia64 doesn't even bother to implement it.

[akpm@linux-foundation.org: add sys_ipc to sys_ni.c]
Signed-off-by: Christoph Hellwig
Cc: Ralf Baechle
Cc: Benjamin Herrenschmidt
Cc: Paul Mundt
Cc: Jeff Dike
Cc: Hirokazu Takata
Cc: Thomas Gleixner
Cc: Ingo Molnar
Reviewed-by: H. Peter Anvin
Cc: Al Viro
Cc: Arnd Bergmann
Cc: Heiko Carstens
Cc: Martin Schwidefsky
Cc: "Luck, Tony"
Cc: James Morris
Cc: Andreas Schwab
Acked-by: Jesper Nilsson
Acked-by: Russell King
Acked-by: David Howells
Acked-by: Kyle McMartin
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
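A demultiplexer of the kind the sys_ipc commit above describes takes one call number and forwards to the matching sub-call. The sketch below only reports which sub-call would be chosen. The call numbers and the split into a version (upper 16 bits) and a sub-call (lower 16 bits) are given as I recall them from the kernel's ipc.h and should be treated as assumptions; `ipc_demux()` itself is hypothetical.

```c
#include <assert.h>
#include <string.h>

/* Sub-call numbers, as recalled from the kernel's ipc.h (assumption). */
enum {
    SEMOP = 1, SEMGET = 2, SEMCTL = 3, SEMTIMEDOP = 4,
    MSGSND = 11, MSGRCV = 12, MSGGET = 13, MSGCTL = 14,
    SHMAT = 21, SHMDT = 22, SHMGET = 23, SHMCTL = 24
};

/*
 * Hypothetical demultiplexer: returns the name of the sub-call that a
 * sys_ipc-style entry point would forward to, or NULL if unknown.
 */
static const char *ipc_demux(unsigned int call, int *version)
{
    assert(version != NULL);
    *version = (int)(call >> 16);   /* upper 16 bits carry an ABI version */
    call &= 0xffff;                 /* lower 16 bits select the sub-call */

    switch (call) {
    case SEMOP:      return "semop";
    case SEMTIMEDOP: return "semtimedop";
    case SEMGET:     return "semget";
    case SEMCTL:     return "semctl";
    case MSGSND:     return "msgsnd";
    case MSGRCV:     return "msgrcv";
    case MSGGET:     return "msgget";
    case MSGCTL:     return "msgctl";
    case SHMAT:      return "shmat";
    case SHMDT:      return "shmdt";
    case SHMGET:     return "shmget";
    case SHMCTL:     return "shmctl";
    default:         return NULL;   /* a real entry point would return -ENOSYS */
    }
}
```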
04 Mar, 2010
6 commits
-
Signed-off-by: André Goddard Rosa
Signed-off-by: Al Viro
-
... postponing assignments until they're needed. Doesn't change code size.
Signed-off-by: André Goddard Rosa
Signed-off-by: Al Viro
-
It reduces code size:
   text    data     bss     dec     hex filename
   9925      72      16   10013    271d ipc/mqueue-BEFORE.o
   9885      72      16    9973    26f5 ipc/mqueue-AFTER.o

Signed-off-by: André Goddard Rosa
Signed-off-by: Al Viro
-
Code size reduction:
   text    data     bss     dec     hex filename
   9941      72      16   10029    272d ipc/mqueue-BEFORE.o
   9925      72      16   10013    271d ipc/mqueue-AFTER.o

Signed-off-by: André Goddard Rosa
Signed-off-by: Al Viro
-
... and abort earlier if we couldn't allocate the message pointers array,
avoiding the u->mq_bytes accounting logic.

It reduces code size:
   text    data     bss     dec     hex filename
   9949      72      16   10037    2735 ipc/mqueue-BEFORE.o
   9941      72      16   10029    272d ipc/mqueue-AFTER.o

Signed-off-by: André Goddard Rosa
Signed-off-by: Al Viro
-
We leak fd on lookup_one_len() failure
Signed-off-by: André Goddard Rosa
Signed-off-by: Al Viro
17 Jan, 2010
1 commit
-
Commit c4caa778157dbbf04116f0ac2111e389b5cd7a29 ("file
->get_unmapped_area() shouldn't duplicate work of get_unmapped_area()")
broke SYSV SHM for NOMMU by taking away the pointer to
shm_get_unmapped_area() from shm_file_operations.

Put it back conditionally on CONFIG_MMU=n.
file->f_op->get_unmapped_area() is used to find out the base address for a
mapping of a mappable chardev device or mappable memory-based file (such as a
ramfs file). It needs to be called prior to file->f_op->mmap() being called.

Signed-off-by: David Howells
Acked-by: Al Viro
Cc: Greg Ungerer
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
17 Dec, 2009
4 commits
-
* 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: (38 commits)
direct I/O fallback sync simplification
ocfs: stop using do_sync_mapping_range
cleanup blockdev_direct_IO locking
make generic_acl slightly more generic
sanitize xattr handler prototypes
libfs: move EXPORT_SYMBOL for d_alloc_name
vfs: force reval of target when following LAST_BIND symlinks (try #7)
ima: limit imbalance msg
Untangling ima mess, part 3: kill dead code in ima
Untangling ima mess, part 2: deal with counters
Untangling ima mess, part 1: alloc_file()
O_TRUNC open shouldn't fail after file truncation
ima: call ima_inode_free ima_inode_free
IMA: clean up the IMA counts updating code
ima: only insert at inode creation time
ima: valid return code from ima_inode_alloc
fs: move get_empty_filp() deffinition to internal.h
Sanitize exec_permission_lite()
Kill cached_lookup() and real_lookup()
Kill path_lookup_open()
...

Trivial conflicts in fs/direct-io.c
-
* do ima_get_count() in __dentry_open()
* stop doing that in followups
* move ima_path_check() to right after nameidata_to_filp()
* don't bump counters on it

Signed-off-by: Al Viro
-
There are 2 groups of alloc_file() callers:
* ones that are followed by ima_counts_get
* ones giving non-regular files
So let's pull that ima_counts_get() into alloc_file();
it's a no-op in case of non-regular files.

Signed-off-by: Al Viro
-
... and have the caller grab both mnt and dentry; kill
leak in infiniband, while we are at it.

Signed-off-by: Al Viro
16 Dec, 2009
9 commits
-
This line is unreachable, remove it.
[akpm@linux-foundation.org: remove unneeded initialisation of `err']
Signed-off-by: WANG Cong
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
-
If multiple simple decrements on the same semaphore are pending, then the
current code scans all decrement operations, even if the semaphore value
is already 0.

The patch optimizes that: if the semaphore value is 0, then there is no
need to scan the q->alter entries.

Note that this is a common case: It happens if 100 decrements by one are
pending and now an increment by one increases the semaphore value from 0
to 1. Without this patch, all 100 entries are scanned. With the patch,
only one entry is scanned, then woken up. Then the new rule triggers and
the scanning is aborted, without looking at the remaining 99 tasks.

With this patch, single sop increment/decrement by 1 are now O(1).
(same as with Nick's patch)

Signed-off-by: Manfred Spraul
Cc: Nick Piggin
Cc: Pierre Peiffer
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
-
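The early-abort rule from the commit above (stop scanning pending decrements once the semaphore value is 0) can be sketched with a plain array. This is a hypothetical model, not the kernel's update_queue(): `wake_decrements()` and the convention of marking a satisfied entry with 0 are illustration only.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical model: pending[] holds the amounts of queued simple
 * decrements on one semaphore, in FIFO order. Satisfied entries are
 * marked by setting them to 0. Returns how many entries were looked at.
 */
static size_t wake_decrements(int *semval, int *pending, size_t n)
{
    size_t scanned = 0, i;

    assert(semval != NULL && pending != NULL);
    for (i = 0; i < n; i++) {
        if (*semval == 0)
            break;              /* nothing left to hand out: stop scanning */
        scanned++;
        if (pending[i] > 0 && pending[i] <= *semval) {
            *semval -= pending[i];
            pending[i] = 0;     /* this waiter would be woken */
        }
    }
    return scanned;
}
```

With a value of 1 and several pending decrements by one, only the first entry is examined, which mirrors the "one entry is scanned, then woken up" case in the commit message.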
sysv sem has the concept of semaphore arrays that consist of multiple
semaphores. Atomic operations that affect multiple semaphores are
supported.

The patch optimizes single semaphore operation calls that affect only one
semaphore: It's not necessary to scan all pending operations, it is
sufficient to scan the per-semaphore list.

The idea is from Nick Piggin's version of an ipc sem improvement, the
implementation is different: The code tries to keep as much common code as
possible.

As a result, the patch is simpler, but optimizes fewer cases.
Signed-off-by: Manfred Spraul
Cc: Nick Piggin
Cc: Pierre Peiffer
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
-
Based on Nick's findings:
sysv sem has the concept of semaphore arrays that consist of multiple
semaphores. Atomic operations that affect multiple semaphores are
supported.

The patch is the first step for optimizing simple, single semaphore
operations: In addition to the global list of all pending operations, a
2nd, per-semaphore list with the simple operations is added.

Note: this patch does not make sense by itself, the new list is used
nowhere.

Signed-off-by: Manfred Spraul
Cc: Nick Piggin
Cc: Pierre Peiffer
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
-
Reduce the amount of scanning of the list of pending semaphore operations:
If try_atomic_semop failed, then no changes were applied. Thus no need to
restart.

Additionally, this patch corrects an incorrect comment: It's possible to
wait for arbitrary semaphore values (do a dec by x, wait-for-zero, inc
by x in one atomic operation)

Both changes are from Nick Piggin, the patch is the result of a different
split of the individual changes.

Signed-off-by: Manfred Spraul
Cc: Nick Piggin
Cc: Pierre Peiffer
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
-
The strange sysv semaphore wakeup scheme has a kind of busy-wait lock
involved, which could deadlock if preemption is enabled during the "lock".

It is an implementation detail (due to a spinlock being held) that this is
actually the case. However if "spinlocks" are made preemptible, or if the
sem lock is changed to a sleeping lock for example, then the wakeup would
become buggy. So this might be a bugfix for -rt kernels.

Imagine waker being preempted by wakee and never clearing IN_WAKEUP -- if
wakee has higher RT priority then there is a priority inversion deadlock.
Even if there is not a priority inversion to cause a deadlock, then there
is still time wasted spinning.

Signed-off-by: Nick Piggin
Signed-off-by: Manfred Spraul
Cc: Pierre Peiffer
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
-
Replace the handcoded list operations in update_queue() with the standard
list_for_each_entry macros.

list_for_each_entry_safe() must be used, because list entries can
disappear immediately upon the wakeup event.

Signed-off-by: Nick Piggin
Signed-off-by: Manfred Spraul
Cc: Pierre Peiffer
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
-
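The reason the update_queue() commit above needs the "safe" iterator is that an entry may be gone by the time the loop wants its successor. The userspace sketch below shows the same idea on a hand-written singly linked list: the next pointer is saved before the current entry can be freed. `struct pending_op`, `push_op()` and `complete_ops()` are hypothetical; the kernel uses struct list_head and list_for_each_entry_safe() instead.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical queue entry; the kernel uses struct list_head instead. */
struct pending_op {
    int need;
    struct pending_op *next;
};

/* Pushes a new entry at the head of the list and returns the new head. */
static struct pending_op *push_op(struct pending_op *head, int need)
{
    struct pending_op *op = malloc(sizeof(*op));

    assert(op != NULL);
    op->need = need;
    op->next = head;
    return op;
}

/*
 * Removes and frees every entry whose need fits into *value. The next
 * pointer is saved before the entry is freed, so the walk never touches
 * memory that has already been released.
 */
static int complete_ops(struct pending_op **head, int *value)
{
    struct pending_op **link = head, *op, *next;
    int completed = 0;

    for (op = *head; op; op = next) {
        next = op->next;            /* saved first: op may go away */
        if (op->need <= *value) {
            *value -= op->need;
            *link = next;           /* unlink */
            free(op);
            completed++;
        } else {
            link = &op->next;
        }
    }
    return completed;
}
```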
Around a month ago, there was some discussion about an improvement of the
sysv sem algorithm: Most (at least: some important) users only use simple
semaphore operations, therefore it's worthwhile to optimize this use case.

This patch:
Move last looked up sem_undo struct to the head of the task's undo list.
Attempt to move common entries to the front of the list so search time is
reduced. This reduces lookup_undo on oprofile of problematic SAP workload
by 30% (see patch 4 for a description of SAP workload).

Signed-off-by: Nick Piggin
Signed-off-by: Manfred Spraul
Cc: Pierre Peiffer
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
-
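The move-to-front heuristic from the sem_undo commit above is easy to show on a small linked list. The sketch below is a hypothetical userspace model: `struct undo_like` and `lookup_undo_mtf()` are illustration only and do not reproduce the kernel's locking or RCU handling.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical undo entry, keyed by semaphore set id. */
struct undo_like {
    int semid;
    struct undo_like *next;
};

/*
 * Looks up semid. A hit that is not already first is unlinked and
 * re-inserted at the head, so repeated lookups of it become short.
 */
static struct undo_like *lookup_undo_mtf(struct undo_like **head, int semid)
{
    struct undo_like **link = head, *un;

    assert(head != NULL);
    for (un = *head; un; link = &un->next, un = un->next) {
        if (un->semid != semid)
            continue;
        if (un != *head) {
            *link = un->next;   /* unlink from current position */
            un->next = *head;   /* move to the front */
            *head = un;
        }
        return un;
    }
    return NULL;
}
```

After one lookup of the last entry, that entry is the first one visited the next time, which is where the reduced search time in the commit message comes from.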
We have apparently had a memory leak since
7ca7e564e049d8b350ec9d958ff25eaa24226352 "ipc: store ipcs into IDRs" in
2007. The idrs, of which 3 exist for each ipc namespace, are never freed.

This patch simply frees them when the ipcns is freed. I don't believe any
idr_remove() are done from rcu (and could therefore be delayed until after
this idr_destroy()), so the patch should be safe. Some quick testing
showed no harm, and the memory leak fixed.

Caught by kmemleak.
Signed-off-by: Serge E. Hallyn
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
11 Dec, 2009
1 commit
-
... we should call mm ->get_unmapped_area() instead and let our caller
do the final checks.

Acked-by: David S. Miller
Signed-off-by: Al Viro
10 Dec, 2009
1 commit
-
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (42 commits)
tree-wide: fix misspelling of "definition" in comments
reiserfs: fix misspelling of "journaled"
doc: Fix a typo in slub.txt.
inotify: remove superfluous return code check
hdlc: spelling fix in find_pvc() comment
doc: fix regulator docs cut-and-pasteism
mtd: Fix comment in Kconfig
doc: Fix IRQ chip docs
tree-wide: fix assorted typos all over the place
drivers/ata/libata-sff.c: comment spelling fixes
fix typos/grammos in Documentation/edac.txt
sysctl: add missing comments
fs/debugfs/inode.c: fix comment typos
sgivwfb: Make use of ARRAY_SIZE.
sky2: fix sky2_link_down copy/paste comment error
tree-wide: fix typos "couter" -> "counter"
tree-wide: fix typos "offest" -> "offset"
fix kerneldoc for set_irq_msi()
spidev: fix double "of of" in comment
comment typo fix: sybsystem -> subsystem
...
04 Dec, 2009
1 commit
-
Commit a0d092f introduced the following warning:
ipc/msg.c: In function 'msgctl_down':
ipc/msg.c:415: warning: 'msqid64' may be used uninitialized in this function

The gcc warning in this case is actually bogus, as msqid64 is touched only
iff cmd == IPC_SET, and in such case, copy_msqid_from_user() initializes
it properly.

Signed-off-by: Felipe Contreras
Signed-off-by: Jiri Kosina
12 Nov, 2009
1 commit
-
Now that sys_sysctl is a generic wrapper around /proc/sys, the .ctl_name
and .strategy members of sysctl tables are dead code. Remove them.

Signed-off-by: Eric W. Biederman
28 Sep, 2009
1 commit
-
* mark struct vm_area_struct::vm_ops as const
* mark vm_ops in AGP code

But leave TTM code alone, something is fishy there with global vm_ops
being used.

Signed-off-by: Alexey Dobriyan
Signed-off-by: Linus Torvalds