Eric Lee / smarc-fsl-linux-kernel

26 Oct, 2010

40 commits

85fe4025c fs: do not assign default i_ino in new_inode

Instead of always assigning an increasing inode number in new_inode
move the call to assign it into those callers that actually need it.
For now the set of callers that need it is estimated conservatively;
that is, the call is added to all filesystems that do not assign an i_ino
by themselves. For a few more filesystems we can avoid assigning
any inode number given that they aren't user visible, and for others
it could be done lazily when an inode number is actually needed,
but that's left for later patches.

Signed-off-by: Christoph Hellwig
Signed-off-by: Dave Chinner
Signed-off-by: Al Viro

Christoph Hellwig
2010-10-26 09:26:11 +0800
f991bd2e1 fs: introduce a per-cpu last_ino allocator

new_inode() dirties a contended cache line to get increasing
inode numbers. This limits performance on workloads that cause
significant parallel inode allocation.

Solve this problem by using a per_cpu variable fed by the shared
last_ino in batches of 1024 allocations. This reduces contention on
the shared last_ino, and gives the same spread of inode numbers as
before (i.e. the same wraparound after 2^32 allocations).
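
The batching idea can be sketched in userspace C. This is an illustration only, not the kernel code: a thread-local variable stands in for the per-cpu variable, and the names get_next_ino_sketch, shared_last_ino, local_last_ino and LAST_INO_BATCH are assumptions made for the example.

```c
#include <stdatomic.h>

#define LAST_INO_BATCH 1024u   /* batch size quoted in the message */

/* Shared counter: touched once per batch, not once per allocation. */
static atomic_uint shared_last_ino;

/* Thread-local stand-in for the per-cpu variable (an assumption). */
static _Thread_local unsigned int local_last_ino;

/* Return the next inode number, refilling the local range when it is used up. */
unsigned int get_next_ino_sketch(void)
{
    unsigned int res = local_last_ino;

    /* A multiple of the batch size means the local range is exhausted. */
    if ((res & (LAST_INO_BATCH - 1)) == 0)
        res = atomic_fetch_add(&shared_last_ino, LAST_INO_BATCH);

    local_last_ino = ++res;   /* unsigned arithmetic wraps after 2^32 */
    return res;
}
```

Only one call in every 1024 writes to the shared counter, which is where the reduction in cache line contention would come from.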

Signed-off-by: Eric Dumazet
Signed-off-by: Nick Piggin
Signed-off-by: Dave Chinner
Reviewed-by: Christoph Hellwig
Signed-off-by: Al Viro

Eric Dumazet
2010-10-26 09:26:11 +0800
7de9c6ee3 new helper: ihold()

Clones an existing reference to inode; caller must already hold one.

Signed-off-by: Al Viro

Al Viro
2010-10-26 09:26:11 +0800
646ec4615 fs: remove inode_add_to_list/__inode_add_to_list

Split up inode_add_to_list/__inode_add_to_list. Locking for the two
lists will be split soon so these helpers really don't buy us much
anymore.

The __ prefixes for the sb list helpers will go away soon, but until
inode_lock is gone we'll need them to distinguish between the locked
and unlocked variants.

Signed-off-by: Christoph Hellwig
Signed-off-by: Al Viro

Christoph Hellwig
2010-10-26 09:26:10 +0800
f7899bd54 fs: move i_count increments into find_inode/find_inode_fast

Now that iunique is not abusing find_inode anymore we can move the i_count
increment back to where it belongs.

Signed-off-by: Christoph Hellwig
Signed-off-by: Al Viro

Christoph Hellwig
2010-10-26 09:26:10 +0800
ad5e195ac fs: Stop abusing find_inode_fast in iunique

Stop abusing find_inode_fast for iunique and opencode the inode hash walk.
Introduce a new iunique_lock to protect the iunique counters once inode_lock
is removed.

Based on a patch originally from Nick Piggin.

Signed-off-by: Christoph Hellwig
Signed-off-by: Al Viro

Christoph Hellwig
2010-10-26 09:26:10 +0800
4c51acbc6 fs: Factor inode hash operations into functions

Before replacing the inode hash locking with a more scalable
mechanism, factor the removal of the inode from the hashes rather
than open coding it in several places.

Based on a patch originally from Nick Piggin.

Signed-off-by: Dave Chinner
Reviewed-by: Christoph Hellwig
Signed-off-by: Al Viro

Dave Chinner
2010-10-26 09:26:10 +0800
9e38d86ff fs: Implement lazy LRU updates for inodes

Convert the inode LRU to use lazy updates to reduce lock and
cacheline traffic. We avoid moving inodes around in the LRU list
during iget/iput operations so these frequent operations don't need
to access the LRUs. Instead, we defer the refcount checks to
reclaim-time and use a per-inode state flag, I_REFERENCED, to tell
reclaim that iget has touched the inode in the past. This means that
only reclaim should be touching the LRU with any frequency, hence
significantly reducing lock acquisitions and the amount of contention
on LRU updates.
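
A rough userspace sketch of this kind of deferred, second-chance check follows. It is not the kernel implementation: the struct, the flag value and the function names are hypothetical, and the real reclaim path also deals with locking and dirty state.

```c
#define I_REFERENCED_SKETCH 0x1u   /* hypothetical flag value */

struct inode_sketch {
    unsigned int count;   /* active references */
    unsigned int state;   /* state flags */
};

enum lru_action { LRU_DROP, LRU_ROTATE, LRU_RECLAIM };

/* Hot path: record the use in the inode itself; no LRU lock, no list move. */
void mark_referenced(struct inode_sketch *inode)
{
    inode->state |= I_REFERENCED_SKETCH;
}

/* Reclaim path: the only place that looks at the count and the flag. */
enum lru_action reclaim_check(struct inode_sketch *inode)
{
    if (inode->count > 0)
        return LRU_DROP;              /* in use: take it off the LRU now */
    if (inode->state & I_REFERENCED_SKETCH) {
        inode->state &= ~I_REFERENCED_SKETCH;
        return LRU_ROTATE;            /* touched since last scan: one more pass */
    }
    return LRU_RECLAIM;               /* unused and untouched: free it */
}
```

The point of the design is that the frequent operation only sets a bit, while the list manipulation is concentrated in the comparatively rare reclaim scan.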

This also removes the inode_in_use list, which means we now only
have one list for tracking the inode LRU status. This makes it much
simpler to split out the LRU list operations under its own lock.

Signed-off-by: Nick Piggin
Signed-off-by: Dave Chinner
Reviewed-by: Christoph Hellwig
Signed-off-by: Al Viro

Nick Piggin
2010-10-26 09:26:09 +0800
cffbc8aa3 fs: Convert nr_inodes and nr_unused to per-cpu counters

The number of inodes allocated does not need to be tied to the
addition or removal of an inode to/from a list. If we are not tied
to a list lock, we could update the counters when inodes are
initialised or destroyed, but to do that we need to convert the
counters to be per-cpu (i.e. independent of a lock). This means that
we have the freedom to change the list/locking implementation
without needing to care about the counters.

Based on a patch originally from Eric Dumazet.

[AV: cleaned up a bit, fixed build breakage on weird configs]

Signed-off-by: Dave Chinner
Reviewed-by: Christoph Hellwig
Signed-off-by: Al Viro

Dave Chinner
2010-10-26 09:26:09 +0800
be1a16a0a vfs: fix infinite loop caused by clone_mnt race

If clone_mnt() happens while mnt_make_readonly() is running, the
cloned mount might have MNT_WRITE_HOLD flag set, which results in
mnt_want_write() spinning forever on this mount.

Needs CAP_SYS_ADMIN to trigger deliberately and unlikely to happen
accidentally. But if it does happen it can hang the machine.

Signed-off-by: Miklos Szeredi
Signed-off-by: Al Viro

Miklos Szeredi
2010-10-26 09:24:16 +0800
89b0fc38c switch hfs to hlist_add_fake()

Signed-off-by: Al Viro

Al Viro
2010-10-26 09:24:16 +0800
756acc2d6 list.h: new helper - hlist_add_fake()

Make node look as if it was on hlist, with hlist_del()
working correctly. Usable without any locking...

Convert a couple of places where we want to do that to
inode->i_hash.
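
A minimal userspace sketch of the trick, assuming the usual next/pprev node layout. The _sketch names are made up for the example, and the delete helper clears the pointers where a kernel helper might poison them instead.

```c
#include <assert.h>
#include <stddef.h>

struct hlist_node {
    struct hlist_node *next;
    struct hlist_node **pprev;
};

/* Point pprev at the node's own next field so the node looks hashed. */
void hlist_add_fake_sketch(struct hlist_node *n)
{
    n->pprev = &n->next;
}

int hlist_unhashed_sketch(const struct hlist_node *n)
{
    return n->pprev == NULL;
}

/* Unlink a node; for a fake node this only rewrites the node's own fields. */
void hlist_del_sketch(struct hlist_node *n)
{
    struct hlist_node *next = n->next;
    struct hlist_node **pprev = n->pprev;

    assert(pprev != NULL);   /* caller must not delete an unhashed node */
    *pprev = next;
    if (next)
        next->pprev = pprev;
    n->next = NULL;
    n->pprev = NULL;
}
```

Because the fake node refers only to itself, deleting it touches no shared list head, which is why no locking is needed in this sketch.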

Signed-off-by: Al Viro

Al Viro
2010-10-26 09:24:15 +0800
1d3382cbf new helper: inode_unhashed()

note: for race-free uses you need inode_lock held

Signed-off-by: Al Viro

Al Viro
2010-10-26 09:24:15 +0800
a8dade34e unexport invalidate_inodes

Signed-off-by: Al Viro

Al Viro
2010-10-26 09:23:32 +0800
61ebdb425 smbfs never retains inodes with zero refcount in the first place

Signed-off-by: Al Viro

Al Viro
2010-10-26 09:23:01 +0800
70fd136ec ntfs: don't call invalidate_inodes()

We are in fill_super(); again, no inodes with zero i_count could
be around until we set MS_ACTIVE.

Signed-off-by: Al Viro

Al Viro
2010-10-26 09:23:01 +0800
9dcefee50 gfs2: invalidate_inodes() is no-op there

In fill_super() we hadn't MS_ACTIVE set yet, so there won't
be any inodes with zero i_count sitting around.

In put_super() we already have MS_ACTIVE removed *and* we
had called invalidate_inodes() since then. So again there
won't be any inodes with zero i_count...

Signed-off-by: Al Viro

Al Viro
2010-10-26 09:23:01 +0800
8e3b9a072 ext2_remount: don't bother with invalidate_inodes()

It's pointless - we *do* have busy inodes (root directory,
for one), so that call will fail and attempt to change
XIP flag will be ignored.

Signed-off-by: Al Viro

Al Viro
2010-10-26 09:23:00 +0800
309f77ad9 fs/buffer.c: call __block_write_begin() if we have page

If we have the appropriate page already, call __block_write_begin()
directly instead of releasing and regrabbing it inside of
block_write_begin().

Signed-off-by: Namhyung Kim
Signed-off-by: Al Viro

Namhyung Kim
2010-10-26 09:18:23 +0800
a3314a0ed lockdep: fixup checking of dir inode annotation

Since inode->i_mode shares its bits for S_IFMT, S_ISDIR should be
used to distinguish whether it is a dir or not.
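
To illustrate why a plain bit test is not enough, here is a small userspace sketch. It assumes the check being replaced was a bitwise test against S_IFDIR, which the message does not spell out, and the function names are made up for the example.

```c
#define _XOPEN_SOURCE 700   /* expose the S_IF* constants under strict C modes */
#include <sys/stat.h>

/* Bitwise test: S_IFDIR is one value of the S_IFMT field, not a lone flag. */
int is_dir_bitmask(mode_t mode)
{
    return (mode & S_IFDIR) != 0;
}

/* Field test: compares the whole S_IFMT field against S_IFDIR. */
int is_dir_macro(mode_t mode)
{
    return S_ISDIR(mode) != 0;
}
```

On Linux, where S_IFBLK is 0060000 and S_IFDIR is 0040000, a block device mode passes the bitwise test but is rejected by S_ISDIR.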

Signed-off-by: Namhyung Kim
Signed-off-by: Al Viro

Namhyung Kim
2010-10-26 09:18:23 +0800
306fb0979 aio: bump i_count instead of using igrab

The aio batching code is using igrab to get an extra reference on the
inode so it can safely batch. igrab will go ahead and take the global
inode spinlock, which can be a bottleneck on large machines doing lots
of AIO.

In this case, igrab isn't required because we already have a reference
on the file handle. It is safe to just bump the i_count directly
on the inode.

Benchmarking shows this patch brings IOP/s on tons of flash up by about
2.5X.

Signed-off-by: Chris Mason

Chris Mason
2010-10-26 09:18:23 +0800
e1455d1bd update block_device_operations documentation

Updated Documentation/filesystems/Locking to match the code.

Signed-off-by: Christoph Hellwig

Christoph Hellwig
2010-10-26 09:18:22 +0800
8358e7d71 fs/buffer.c: remove duplicated assignment on b_private

bh->b_private is initialized within init_buffer(), thus the
assignment should be redundant. Remove it.

Signed-off-by: Namhyung Kim
Signed-off-by: Al Viro

Namhyung Kim
2010-10-26 09:18:22 +0800
bb1e5f8c0 fs: move exportfs since it is not a networking filesystem

Move the EXPORTFS kconfig symbol out of the NETWORK_FILESYSTEMS block
since it provides a library function that can be (and is) used by other
(non-network) filesystems.

This also eliminates a kconfig dependency warning:

warning: (XFS_FS && BLOCK || NFSD && NETWORK_FILESYSTEMS && INET && FILE_LOCKING && BKL) selects EXPORTFS which has unmet direct dependencies (NETWORK_FILESYSTEMS)

Signed-off-by: Randy Dunlap
Cc: Dave Chinner
Cc: Christoph Hellwig
Cc: Alex Elder
Cc: xfs-masters@oss.sgi.com
Signed-off-by: Al Viro

Randy Dunlap
2010-10-26 09:18:22 +0800
3072b90c4 hfs: use sync_dirty_buffer

Use sync_dirty_buffer instead of incorrectly open-coding it.

Signed-off-by: Christoph Hellwig
Signed-off-by: Al Viro

Christoph Hellwig
2010-10-26 09:18:21 +0800
4a3956c79 vfs: introduce FMODE_UNSIGNED_OFFSET for allowing negative f_pos

Currently, rw_verify_area() checks whether f_pos is negative and, if it
is, returns -EINVAL.

But some special files, such as /dev/(k)mem and /proc/<pid>/mem, have
negative offsets, so we can't do any access via read/write to such a
file (device).

So introduce FMODE_UNSIGNED_OFFSET to allow negative file offsets.

Signed-off-by: Wu Fengguang
Signed-off-by: KAMEZAWA Hiroyuki
Cc: Al Viro
Cc: Heiko Carstens
Signed-off-by: Andrew Morton
Signed-off-by: Al Viro

KAMEZAWA Hiroyuki
2010-10-26 09:18:21 +0800
ba10f4866 hostfs: fix UML crash: remove f_spare from hostfs

365b1818 ("add f_flags to struct statfs(64)") resized f_spare within
struct statfs which caused a UML crash. There is no need to copy f_spare.

Signed-off-by: Richard Weinberger
Reported-by: Toralf Förster
Tested-by: Toralf Förster
Cc: Christoph Hellwig
Cc: Al Viro
Cc: Jeff Dike
Signed-off-by: Andrew Morton
Signed-off-by: Al Viro

Richard Weinberger
2010-10-26 09:18:21 +0800
d9d1dc802 Documentation: Fix trivial typo in filesystems/sharedsubtree.txt

This typo is easy to ignore unless you have spent a great deal of time
thinking about how to eliminate duplicate dentries in unions.

Signed-off-by: Valerie Aurora
Signed-off-by: Al Viro

Valerie Aurora
2010-10-26 09:18:21 +0800
0e45b67d5 affs: testing the wrong variable

The intent was to verify that bh = affs_bread_ino(...) returned a valid
pointer. We checked "ext_bh" earlier in the function and it's valid
here.

Signed-off-by: Dan Carpenter
Signed-off-by: Al Viro

Dan Carpenter
2010-10-26 09:18:20 +0800
7e360c38a fs: allow for more than 2^31 files

Andrew,

Could you please review this patch, you probably are the right guy to
take it, because it crosses fs and net trees.

Note: /proc/sys/fs/file-nr is a read-only file, so this patch doesn't
depend on previous patch (sysctl: fix min/max handling in
__do_proc_doulongvec_minmax())

Thanks !

[PATCH V4] fs: allow for more than 2^31 files

Robin Holt tried to boot a 16TB system and found af_unix was overflowing
a 32bit value :

We were seeing a failure which prevented boot. The kernel was incapable
of creating either a named pipe or unix domain socket. This comes down
to a common kernel function called unix_create1() which does:

    atomic_inc(&unix_nr_socks);
    if (atomic_read(&unix_nr_socks) > 2 * get_max_files())
        goto out;

The function get_max_files() is a simple return of files_stat.max_files.
files_stat.max_files is a signed integer and is computed in
fs/file_table.c's files_init().

    n = (mempages * (PAGE_SIZE / 1024)) / 10;
    files_stat.max_files = n;

In our case, mempages (total_ram_pages) is approx 3,758,096,384
(0xe0000000). That leaves max_files at approximately 1,503,238,553.
This causes 2 * get_max_files() to integer overflow.
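
The arithmetic can be checked with a short sketch. It assumes a 4096-byte page size, which the message does not state but which matches the figures quoted; the function and macro names are made up for the example.

```c
#include <stdint.h>

#define PAGE_SIZE_SKETCH 4096ULL   /* assumption: 4 KiB pages */

/* The files_init() formula from the message, evaluated in 64 bits. */
uint64_t max_files_for(uint64_t mempages)
{
    return (mempages * (PAGE_SIZE_SKETCH / 1024)) / 10;
}

/* Nonzero when 2 * max_files no longer fits in a signed 32-bit int. */
int doubled_limit_overflows_int32(uint64_t max_files)
{
    return 2 * max_files > (uint64_t)INT32_MAX;
}
```

With mempages = 0xe0000000 the formula gives 1,503,238,553, and twice that is 3,006,477,106, which is above the signed 32-bit maximum of 2,147,483,647.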

Fix is to let /proc/sys/fs/file-nr & /proc/sys/fs/file-max use long
integers, and change af_unix to use an atomic_long_t instead of
atomic_t.

get_max_files() is changed to return an unsigned long.
get_nr_files() is changed to return a long.

unix_nr_socks is changed from atomic_t to atomic_long_t, although this is
not strictly needed to address Robin's problem.

Before patch (on a 64bit kernel):
    # echo 2147483648 >/proc/sys/fs/file-max
    # cat /proc/sys/fs/file-max
    -18446744071562067968

After patch:
    # echo 2147483648 >/proc/sys/fs/file-max
    # cat /proc/sys/fs/file-max
    2147483648
    # cat /proc/sys/fs/file-nr
    704 0 2147483648

Reported-by: Robin Holt
Signed-off-by: Eric Dumazet
Acked-by: David Miller
Reviewed-by: Robin Holt
Tested-by: Robin Holt
Signed-off-by: Al Viro

Eric Dumazet
2010-10-26 09:18:20 +0800
fde214d41 isofs: Fix isofs_get_blocks for 8TB files

Currently isofs_get_blocks() is limited to handling only 4TB files on 32-bit
architectures because of the unnecessary use of the iblock variable, which is
a signed long. Just remove the variable. The error messages that were using this
variable should have rather used b_off anyway because that is the block we
are currently mapping.

Signed-off-by: Jan Kara
Signed-off-by: Al Viro

Jan Kara
2010-10-26 09:18:20 +0800
ebdec241d fs: kill block_prepare_write

__block_write_begin and block_prepare_write are identical except for slightly
different calling conventions. Convert all callers to the __block_write_begin
calling conventions and drop block_prepare_write.

Signed-off-by: Christoph Hellwig
Signed-off-by: Al Viro

Christoph Hellwig
2010-10-26 09:18:20 +0800
56b0dacfa fs: mark destroy_inode static

Hugetlbfs used to need it, but after the destroy_inode and evict_inode
changes it's not required anymore.

Signed-off-by: Christoph Hellwig
Signed-off-by: Al Viro

Christoph Hellwig
2010-10-26 09:18:19 +0800
c37650161 fs: add sync_inode_metadata

Add a new helper to write out the inode using the writeback code,
that is, including the correct dirty bit and list manipulation. A few
filesystems already open-code this, and a lot of others should be
using it instead of using write_inode_now which also writes out the
data.

Signed-off-by: Christoph Hellwig
Signed-off-by: Al Viro

Christoph Hellwig
2010-10-26 09:18:19 +0800
81fca4440 fs: move permission check back into __lookup_hash

The caller that didn't need it is gone.

Signed-off-by: Christoph Hellwig
Signed-off-by: Al Viro

Christoph Hellwig
2010-10-26 09:18:19 +0800
72e58063d Merge branch 'davinci-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/khilman/linux-davinci

* 'davinci-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/khilman/linux-davinci: (50 commits)
davinci: fix remaining board support after io_pgoffst removal
davinci: mityomapl138: make file local data static
arm/davinci: remove duplicated include
davinci: Initial support for Omapl138-Hawkboard
davinci: MityDSP-L138/MityARM-1808 read MAC address from I2C Prom
davinci: add tnetv107x touchscreen platform device
input: add driver for tnetv107x touchscreen controller
davinci: add keypad config for tnetv107x evm board
davinci: add tnetv107x keypad platform device
input: add driver for tnetv107x on-chip keypad controller
net: davinci_emac: cleanup unused cpdma code
net: davinci_emac: switch to new cpdma layer
net: davinci_emac: separate out cpdma code
net: davinci_emac: cleanup unused mdio emac code
omap: cleanup unused davinci mdio arch code
davinci: cleanup mdio arch code and switch to phy_id
net: davinci_emac: switch to new mdio
omap: add mdio platform devices
davinci: add mdio platform devices
net: davinci_emac: separate out davinci mdio
...

Fix up trivial conflict in drivers/input/keyboard/Kconfig (two entries
added next to each other - one from the davinci merge, one from the
input merge)

Linus Torvalds
2010-10-26 01:59:31 +0800
57c155d51 Merge branch 'for-linus' of git://git.open-osd.org/linux-open-osd

* 'for-linus' of git://git.open-osd.org/linux-open-osd:
exofs: Remove inode->i_count manipulation in exofs_new_inode
fs/exofs: typo fix of faild to failed
exofs: Set i_mapping->backing_dev_info anyway
exofs: Cleaup read path in regard with read_for_write

Linus Torvalds
2010-10-26 01:08:21 +0800
9afd281a1 x86-32, mm: Remove duplicated include

Commit b40827fa7268 ("x86-32, mm: Add an initial page table for core
bootstrapping") added an include directive which is needless and is
taken care of by a previous one. Remove it.

Caught-by: Jaswinder Singh Rajput
Signed-off-by: Borislav Petkov
Signed-off-by: Linus Torvalds

Borislav Petkov
2010-10-26 01:05:13 +0800
fe2fd9ed5 exofs: Remove inode->i_count manipulation in exofs_new_inode

exofs_new_inode() was incrementing the inode->i_count and
decrementing it in create_done(), in a bad attempt to make sure
the inode will still be there when the asynchronous create_done()
finally arrives. This was very stupid because iput() was not called,
and if it was actually needed, it would leak the inode.

However all this is not needed, because at exofs_evict_inode()
we already wait for create_done() by waiting for the
object_created event. Therefore remove the superfluous ref counting
and just thicken the comment at exofs_evict_inode() a bit.

While at it change places that open coded wait_obj_created()
to call the already available wrapper.

CC: Dave Chinner
CC: Christoph Hellwig
CC: Nick Piggin
Signed-off-by: Boaz Harrosh

Boaz Harrosh
2010-10-26 00:03:07 +0800
571f7f46b fs/exofs: typo fix of faild to failed

Signed-off-by: Joe Perches
Signed-off-by: Boaz Harrosh

Joe Perches
2010-10-26 00:02:49 +0800