03 Nov, 2011
14 commits
-
* 'for-3.2' of git://linux-nfs.org/~bfields/linux:
nfsd4: typo logical vs bitwise negate in nfsd4_decode_share_access -
Says Andrew:
"60 patches. That's good enough for -rc1 I guess. I have quite a lot
of detritus to be rechecked, work through maintainers, etc.- most of the remains of MM
- rtc
- various misc
- cgroups
- memcg
- cpusets
- procfs
- ipc
- rapidio
- sysctl
- pps
- w1
- drivers/misc
- aio"* akpm: (60 commits)
memcg: replace ss->id_lock with a rwlock
aio: allocate kiocbs in batches
drivers/misc/vmw_balloon.c: fix typo in code comment
drivers/misc/vmw_balloon.c: determine page allocation flag can_sleep outside loop
w1: disable irqs in critical section
drivers/w1/w1_int.c: multiple masters used same init_name
drivers/power/ds2780_battery.c: fix deadlock upon insertion and removal
drivers/power/ds2780_battery.c: add a nolock function to w1 interface
drivers/power/ds2780_battery.c: create central point for calling w1 interface
w1: ds2760 and ds2780, use ida for id and ida_simple_get() to get it
pps gpio client: add missing dependency
pps: new client driver using GPIO
pps: default echo function
include/linux/dma-mapping.h: add dma_zalloc_coherent()
sysctl: make CONFIG_SYSCTL_SYSCALL default to n
sysctl: add support for poll()
RapidIO: documentation update
drivers/net/rionet.c: fix ethernet address macros for LE platforms
RapidIO: fix potential null deref in rio_setup_device()
RapidIO: add mport driver for Tsi721 bridge
... -
In testing aio on a fast storage device, I found that the context lock
takes up a fair amount of cpu time in the I/O submission path. The reason
is that we take it for every I/O submitted (see __aio_get_req). Since we
know how many I/Os are passed to io_submit, we can preallocate the kiocbs
in batches, reducing the number of times we take and release the lock.In my testing, I was able to reduce the amount of time spent in
_raw_spin_lock_irq by .56% (average of 3 runs). The command I used to
test this was:aio-stress -O -o 2 -o 3 -r 8 -d 128 -b 32 -i 32 -s 16384
I also tested the patch with various numbers of events passed to
io_submit, and I ran the xfstests aio group of tests to ensure I didn't
break anything.Signed-off-by: Jeff Moyer
Cc: Daniel Ehrenberg
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Adding support for poll() in sysctl fs allows userspace to receive
notifications of changes in sysctl entries. This adds a infrastructure to
allow files in sysctl fs to be pollable and implements it for hostname and
domainname.[akpm@linux-foundation.org: s/declare/define/ for definitions]
Signed-off-by: Lucas De Marchi
Cc: Greg KH
Cc: Kay Sievers
Cc: Al Viro
Cc: "Eric W. Biederman"
Cc: Alexey Dobriyan
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
fd* files are restricted to the task's owner, and other users may not get
direct access to them. But one may open any of these files and run any
setuid program, keeping opened file descriptors. As there are permission
checks on open(), but not on readdir() and read(), operations on the kept
file descriptors will not be checked. It makes it possible to violate
procfs permission model.Reading fdinfo/* may disclosure current fds' position and flags, reading
directory contents of fdinfo/ and fd/ may disclosure the number of opened
files by the target task. This information is not sensible per se, but it
can reveal some private information (like length of a password stored in a
file) under certain conditions.Used existing (un)lock_trace functions to check for ptrace_may_access(),
but instead of using EPERM return code from it use EACCES to be consistent
with existing proc_pid_follow_link()/proc_pid_readlink() return code. If
they differ, attacker can guess what fds exist by analyzing stat() return
code. Patched handlers: stat() for fd/*, stat() and read() for fdindo/*,
readdir() and lookup() for fd/ and fdinfo/.Signed-off-by: Vasiliy Kulikov
Cc: Cyrill Gorcunov
Cc:
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
On reading sysctl dirs we should return -EISDIR instead of -EINVAL.
Signed-off-by: Pavel Emelyanov
Signed-off-by: Cyrill Gorcunov
Cc: Alexey Dobriyan
Cc: Al Viro
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Clement Lecigne reports a filesystem which causes a kernel oops in
hfs_find_init() trying to dereference sb->ext_tree which is NULL.This proves to be because the filesystem has a corrupted MDB extent
record, where the extents file does not fit into the first three extents
in the file record (the first blocks).In hfs_get_block() when looking up the blocks for the extent file
(HFS_EXT_CNID), it fails the first blocks special case, and falls
through to the extent code (which ultimately calls hfs_find_init())
which is in the process of being initialised.Hfs avoids this scenario by always having the extents b-tree fitting
into the first blocks (the extents B-tree can't have overflow extents).The fix is to check at mount time that the B-tree fits into first
blocks, i.e. fail if HFS_I(inode)->alloc_blocks >=
HFS_I(inode)->first_blocksNote, the existing commit 47f365eb57573 ("hfs: fix oops on mount with
corrupted btree extent records") becomes subsumed into this as a special
case, but only for the extents B-tree (HFS_EXT_CNID), it is perfectly
acceptable for the catalog B-Tree file to grow beyond three extents,
with the remaining extent descriptors in the extents overfow.This fixes CVE-2011-2203
Reported-by: Clement LECIGNE
Signed-off-by: Phillip Lougher
Cc: Jeff Mahoney
Cc: Christoph Hellwig
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Use mpage_readpages() instead of multiple calls to isofs_readpage() to
reduce the CPU utilization and make performance higher.Signed-off-by: Namjae Jeon
Cc: Al Viro
Cc: Jan Kara
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Since ramfs is hard-selected to "y", the module leftovers make no sense.
Signed-off-by: Richard Weinberger
Reviewed-by: WANG Cong
Cc: Al Viro
Cc: Christoph Hellwig
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
The case of address space randomization being disabled in runtime through
randomize_va_space sysctl is not treated properly in load_elf_binary(),
resulting in SIGKILL coming at exec() time for certain PIE-linked binaries
in case the randomization has been disabled at runtime prior to calling
exec().Handle the randomize_va_space == 0 case the same way as if we were not
supporting .text randomization at all.Based on original patch by H.J. Lu and Josh Boyer.
Signed-off-by: Jiri Kosina
Cc: Ingo Molnar
Cc: Russell King
Cc: H.J. Lu
Cc:
Tested-by: Josh Boyer
Acked-by: Nicolas Pitre
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
* 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/hch/vfs-queue:
vfs: add d_prune dentry operation
vfs: protect i_nlink
filesystems: add set_nlink()
filesystems: add missing nlink wrappers
logfs: remove unnecessary nlink setting
ocfs2: remove unnecessary nlink setting
jfs: remove unnecessary nlink setting
hypfs: remove unnecessary nlink setting
vfs: ignore error on forced remount
readlinkat: ensure we return ENOENT for the empty pathname for normal lookups
vfs: fix dentry leak in simple_fill_super() -
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (97 commits)
jbd2: Unify log messages in jbd2 code
jbd/jbd2: validate sb->s_first in journal_get_superblock()
ext4: let ext4_ext_rm_leaf work with EXT_DEBUG defined
ext4: fix a syntax error in ext4_ext_insert_extent when debugging enabled
ext4: fix a typo in struct ext4_allocation_context
ext4: Don't normalize an falloc request if it can fit in 1 extent.
ext4: remove comments about extent mount option in ext4_new_inode()
ext4: let ext4_discard_partial_buffers handle unaligned range correctly
ext4: return ENOMEM if find_or_create_pages fails
ext4: move vars to local scope in ext4_discard_partial_page_buffers_no_lock()
ext4: Create helper function for EXT4_IO_END_UNWRITTEN and i_aiodio_unwritten
ext4: optimize locking for end_io extent conversion
ext4: remove unnecessary call to waitqueue_active()
ext4: Use correct locking for ext4_end_io_nolock()
ext4: fix race in xattr block allocation path
ext4: trace punch_hole correctly in ext4_ext_map_blocks
ext4: clean up AGGRESSIVE_TEST code
ext4: move variables to their scope
ext4: fix quota accounting during migration
ext4: migrate cleanup
... -
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
udf: Cleanup metadata flags handling
udf: Skip mirror metadata FE loading when metadata FE is ok
ext3: Allow quota file use root reservation
udf: Remove web reference from UDF MAINTAINERS entry
quota: Drop path reference on error exit from quotactl
udf: Neaten udf_debug uses
udf: Neaten logging output, use vsprintf extension %pV
udf: Convert printks to pr_
udf: Rename udf_warning to udf_warn
udf: Rename udf_error to udf_err
udf: Promote some debugging messages to udf_error
ext3: Remove the obsolete broken EXT3_IOC32_WAIT_FOR_READONLY.
udf: Add readpages support for udf.
ext3/balloc.c: local functions should be static
ext2: fix the outdated comment in ext2_nfs_get_inode()
ext3: remove deprecated oldalloc
fs/ext3/balloc.c: delete useless initialization
fs/ext2/balloc.c: delete useless initialization
ext3: fix message in ext3_remount for rw-remount case
ext3: Remove i_mutex from ext3_sync_file()Fix up trivial (printf format cleanup) conflicts in fs/udf/udfdecl.h
-
* 'for-linus' of git://github.com/richardweinberger/linux: (90 commits)
um: fix ubd cow size
um: Fix kmalloc argument order in um/vdso/vma.c
um: switch to use of drivers/Kconfig
UserModeLinux-HOWTO.txt: fix a typo
UserModeLinux-HOWTO.txt: remove ^H characters
um: we need sys/user.h only on i386
um: merge delay_{32,64}.c
um: distribute exports to where exported stuff is defined
um: kill system-um.h
um: generic ftrace.h will do...
um: segment.h is x86-only and needed only there
um: asm/pda.h is not needed anymore
um: hw_irq.h can go generic as well
um: switch to generic-y
um: clean Kconfig up a bit
um: a couple of missing dependencies...
um: kill useless argument of free_chan() and free_one_chan()
um: unify ptrace_user.h
um: unify KSTK_...
um: fix gcov build breakage
...
02 Nov, 2011
18 commits
-
everything in USER_OBJ gets it via -include user.h
Signed-off-by: Al Viro
Signed-off-by: Richard Weinberger -
This adds a d_prune dentry operation that is called by the VFS prior to
pruning (i.e. unhashing and killing) a hashed dentry from the dcache.
Wrap dentry_lru_del() and use the new _prune() helper in the cases where we
are about to unhash and kill the dentry.This will be used by Ceph to maintain a flag indicating whether the
complete contents of a directory are contained in the dcache, allowing it
to satisfy lookups and readdir without addition server communication.Renumber a few DCACHE_* #defines to group DCACHE_OP_PRUNE with the other
DCACHE_OP_ bits.Signed-off-by: Sage Weil
Signed-off-by: Christoph Hellwig -
Prevent direct modification of i_nlink by making it const and adding a
non-const __i_nlink alias.Signed-off-by: Miklos Szeredi
Tested-by: Toshiyuki Okajima
Signed-off-by: Christoph Hellwig -
Replace remaining direct i_nlink updates with a new set_nlink()
updater function.Signed-off-by: Miklos Szeredi
Tested-by: Toshiyuki Okajima
Signed-off-by: Christoph Hellwig -
Replace direct i_nlink updates with the respective updater function
(inc_nlink, drop_nlink, clear_nlink, inode_dec_link_count).Signed-off-by: Miklos Szeredi
-
alloc_inode() initializes i_nlink to 1. Remove unnecessary
re-initialization.Signed-off-by: Miklos Szeredi
CC: Joern Engel
CC: Prasad Joshi
Signed-off-by: Christoph Hellwig -
alloc_inode() initializes i_nlink to 1. Remove unnecessary
re-initialization.Signed-off-by: Miklos Szeredi
CC: Joel Becker
CC: Mark Fasheh
Signed-off-by: Christoph Hellwig -
alloc_inode() initializes i_nlink to 1. Remove unnecessary
re-initialization.Signed-off-by: Miklos Szeredi
Acked-by: Dave Kleikamp
Signed-off-by: Christoph Hellwig -
On emergency remount we want to force MS_RDONLY on the super block
even if ->remount_fs() failed for some reason.Signed-off-by: Miklos Szeredi
Tested-by: Toshiyuki Okajima
Signed-off-by: Christoph Hellwig -
Since the commit below which added O_PATH support to the *at() calls, the
error return for readlink/readlinkat for the empty pathname has switched
from ENOENT to EINVAL:commit 65cfc6722361570bfe255698d9cd4dccaf47570d
Author: Al Viro
Date: Sun Mar 13 15:56:26 2011 -0400readlinkat(), fchownat() and fstatat() with empty relative pathnames
This is both unexpected for userspace and makes readlink/readlinkat
inconsistant with all other interfaces; and inconsistant with our stated
return for these pathnames.As the readlinkat call does not have a flags parameter we cannot use the
AT_EMPTY_PATH approach used in the other calls. Therefore expose whether
the original path is infact entry via a new user_path_at_empty() path
lookup function. Use this to determine whether to default to EINVAL or
ENOENT for failures.Addresses http://bugs.launchpad.net/bugs/817187
[akpm@linux-foundation.org: remove unused getname_flags()]
Signed-off-by: Andy Whitcroft
Cc: Christoph Hellwig
Cc: Al Viro
Cc:
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Christoph Hellwig -
put dentry if inode allocation failed, d_genocide() cannot release it
Signed-off-by: Konstantin Khlebnikov
Signed-off-by: Christoph Hellwig -
Some jbd2 code prints out kernel messages with "JBD2: " prefix, at the
same time other jbd2 code prints with "JBD: " prefix. Unify the prefix
to "JBD2: ".Signed-off-by: Eryu Guan
Signed-off-by: "Theodore Ts'o" -
I hit a J_ASSERT(blocknr != 0) failure in cleanup_journal_tail() when
mounting a fsfuzzed ext3 image. It turns out that the corrupted ext3
image has s_first = 0 in journal superblock, and the 0 is passed to
journal->j_head in journal_reset(), then to blocknr in
cleanup_journal_tail(), in the end the J_ASSERT failed.So validate s_first after reading journal superblock from disk in
journal_get_superblock() to ensure s_first is valid.The following script could reproduce it:
fstype=ext3
blocksize=1024
img=$fstype.img
offset=0
found=0
magic="c0 3b 39 98"dd if=/dev/zero of=$img bs=1M count=8
mkfs -t $fstype -b $blocksize -F $img
filesize=`stat -c %s $img`
while [ $offset -lt $filesize ]
do
if od -j $offset -N 4 -t x1 $img | grep -i "$magic";then
echo "Found journal: $offset"
found=1
break
fi
offset=`echo "$offset+$blocksize" | bc`
doneif [ $found -ne 1 ];then
echo "Magic \"$magic\" not found"
exit 1
fidd if=/dev/zero of=$img seek=$(($offset+23)) conv=notrunc bs=1 count=1
mkdir -p ./mnt
mount -o loop $img ./mntCc: Jan Kara
Signed-off-by: Eryu Guan
Signed-off-by: "Theodore Ts'o" -
The variable 'block' is removed by commit 750c9c47, so use the
replacement ex_ee_block instead.Signed-off-by: Yongqiang Yang
Signed-off-by: "Theodore Ts'o" -
This patch fixes a syntax error which omits a comma. Besides this,
logical block number is unsigend 32 bits, so printk should use %u
instead %d.Signed-off-by: Yongqiang Yang
Signed-off-by: "Theodore Ts'o" -
Signed-off-by: Benny Halevy
Signed-off-by: J. Bruce Fields -
* 'pstore' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux:
pstore: make pstore write function return normal success/fail value
pstore: change mutex locking to spin_locks
pstore: defer inserting OOPS entries into pstore -
In sysfs_rename we need to remove the optimization of not calling
sysfs_unlink_sibling and sysfs_link_sibling if the renamed parent
directory is not changing. This optimization is no longer valid now
that sysfs dirents are stored in an rbtree sorted by name.Move the assignment of s_ns before the call of sysfs_link_sibling. With
no sysfs_dirent fields changing after the call of sysfs_link_sibling
this allows sysfs_link_sibling to take any of the directory entries into
account when it builds the rbtrees, and s_ns looks like a prime canidate
to be used in the rbtree in the future.Signed-off-by: Eric W. Biederman
Cc: Jiri Slaby
Cc: Greg KH
Cc: David Miller
Cc: Mikulas Patocka
Cc: Andrew Morton
Signed-off-by: Linus Torvalds
01 Nov, 2011
8 commits
-
epoll can acquire recursively acquire ep->mtx on multiple "struct
eventpoll"s at once in the case where one epoll fd is monitoring another
epoll fd. This is perfectly OK, since we're careful about the lock
ordering, but it causes spurious lockdep warnings. Annotate the recursion
using mutex_lock_nested, and add a comment explaining the nesting rules
for good measure.Recent versions of systemd are triggering this, and it can also be
demonstrated with the following trivial test program:--------------------8
Tested-by: Paul Bolle
Signed-off-by: Nelson Elhage
Acked-by: Jason Baron
Cc: Dave Jones
Cc: Davide Libenzi
Cc:
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
There is no functional change.
Signed-off-by: Andy Shevchenko
Acked-by: OGAWA Hirofumi
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Standardize the style for compiler based printf format verification.
Standardized the location of __printf too.Done via script and a little typing.
$ grep -rPl --include=*.[ch] -w "__attribute__" * | \
grep -vP "^(tools|scripts|include/linux/compiler-gcc.h)" | \
xargs perl -n -i -e 'local $/; while (<>) { s/\b__attribute__\s*\(\s*\(\s*format\s*\(\s*printf\s*,\s*(.+)\s*,\s*(.+)\s*\)\s*\)\s*\)/__printf($1, $2)/g ; print; }'[akpm@linux-foundation.org: revert arch bits]
Signed-off-by: Joe Perches
Cc: "Kirill A. Shutemov"
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Currently a statfs on a pipe's /proc//fd/ link returns -ENOSYS. Wire
pipfs up so that the statfs succeeds.This is required by checkpoint-restart in the userspace to make it
possible to distinguish pipes from fifos.When we dump information about task's open files we use the /proc/pid/fd
directoy's symlinks and the fact that opening any of them gives us exactly
the same dentry->inode pair as the original process has. Now if a task
we're dumping has opened pipe and fifo we need to detect this and act
accordingly. Knowing that an fd with type S_ISFIFO resides on a pipefs is
the most precise way.Signed-off-by: Pavel Emelyanov
Reviewed-by: Tejun Heo
Acked-by: Serge Hallyn
Signed-off-by: Cyrill Gorcunov
Cc: Al Viro
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
On the ext4 mailing list[1], we got some report about errors in
__find_get_block_slow(), but the information is very limited.If the device information is given, we can know the name of the sick
volume. Futhermore, we can get the corresponding status of that
block(group, inode block etc) by analyzing the disk layout.[1] http://marc.info/?l=linux-ext4&m=131379831421147&w=2
Signed-off-by: Tao Ma
Cc: Al Viro
Cc: Theodore Ts'o
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
The callback must not return -1 when nr_to_scan is zero. Fix the bug in
fs/super.c and add this requirement to the callback specification.Signed-off-by: Mikulas Patocka
Cc: Dave Chinner
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
memchr_inv() is mainly used to check whether the whole buffer is filled
with just a specified byte.The function name and prototype are stolen from logfs and the
implementation is from SLUB.Signed-off-by: Akinobu Mita
Acked-by: Christoph Lameter
Acked-by: Pekka Enberg
Cc: Matt Mackall
Acked-by: Joern Engel
Cc: Marcin Slusarz
Cc: Eric Dumazet
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Direct reclaim should never writeback pages. Warn if an attempt is made.
Signed-off-by: Mel Gorman
Cc: Dave Chinner
Cc: Christoph Hellwig
Cc: Johannes Weiner
Cc: Wu Fengguang
Cc: Jan Kara
Cc: Minchan Kim
Cc: Rik van Riel
Cc: Mel Gorman
Cc: Alex Elder
Cc: Theodore Ts'o
Cc: Chris Mason
Cc: Dave Hansen
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds