Eric Lee / smarc-fsl-linux-kernel

29 Oct, 2011

5 commits

97d2eb13a Merge branch 'for-linus' of git://ceph.newdream.net/git/ceph-client

* 'for-linus' of git://ceph.newdream.net/git/ceph-client:
libceph: fix double-free of page vector
ceph: fix 32-bit ino numbers
libceph: force resend of osd requests if we skip an osdmap
ceph: use kernel DNS resolver
ceph: fix ceph_monc_init memory leak
ceph: let the set_layout ioctl set single traits
Revert "ceph: don't truncate dirty pages in invalidate work thread"
ceph: replace leading spaces with tabs
libceph: warn on msg allocation failures
libceph: don't complain on msgpool alloc failures
libceph: always preallocate mon connection
libceph: create messenger with client
ceph: document ioctls
ceph: implement (optional) max read size
ceph: rename rsize -> rasize
ceph: make readpages fully async

Linus Torvalds
2011-10-29 07:42:18 +0800
f362f98e7 Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/hch/vfs-queue

* 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/hch/vfs-queue: (21 commits)
leases: fix write-open/read-lease race
nfs: drop unnecessary locking in llseek
ext4: replace cut'n'pasted llseek code with generic_file_llseek_size
vfs: add generic_file_llseek_size
vfs: do (nearly) lockless generic_file_llseek
direct-io: merge direct_io_walker into __blockdev_direct_IO
direct-io: inline the complete submission path
direct-io: separate map_bh from dio
direct-io: use a slab cache for struct dio
direct-io: rearrange fields in dio/dio_submit to avoid holes
direct-io: fix a wrong comment
direct-io: separate fields only used in the submission path from struct dio
vfs: fix spinning prevention in prune_icache_sb
vfs: add a comment to inode_permission()
vfs: pass all mask flags check_acl and posix_acl_permission
vfs: add hex format for MAY_* flag values
vfs: indicate that the permission functions take all the MAY_* flags
compat: sync compat_stats with statfs.
vfs: add "device" tag to /proc/self/mountstats
cleanup: vfs: small comment fix for block_invalidatepage
...

Fix up trivial conflict in fs/gfs2/file.c (llseek changes)

Linus Torvalds
2011-10-29 01:49:34 +0800
f793f2961 Merge http://sucs.org/~rohan/git/gfs2-3.0-nmw

* http://sucs.org/~rohan/git/gfs2-3.0-nmw: (24 commits)
GFS2: Move readahead of metadata during deallocation into its own function
GFS2: Remove two unused variables
GFS2: Misc fixes
GFS2: rewrite fallocate code to write blocks directly
GFS2: speed up delete/unlink performance for large files
GFS2: Fix off-by-one in gfs2_blk2rgrpd
GFS2: Clean up ->page_mkwrite
GFS2: Correctly set goal block after allocation
GFS2: Fix AIL flush issue during fsync
GFS2: Use cached rgrp in gfs2_rlist_add()
GFS2: Call do_strip() directly from recursive_scan()
GFS2: Remove obsolete assert
GFS2: Cache the most recently used resource group in the inode
GFS2: Make resource groups "append only" during life of fs
GFS2: Use rbtree for resource groups and clean up bitmap buffer ref count scheme
GFS2: Fix lseek after SEEK_DATA, SEEK_HOLE have been added
GFS2: Clean up gfs2_create
GFS2: Use ->dirty_inode()
GFS2: Fix bug trap and journaled data fsync
GFS2: Fix inode allocation error path
...

Linus Torvalds
2011-10-29 01:44:50 +0800
dabcbb1ba Merge branch '3.2-without-smb2' of git://git.samba.org/sfrench/cifs-2.6

* '3.2-without-smb2' of git://git.samba.org/sfrench/cifs-2.6: (52 commits)
Fix build break when freezer not configured
Add definition for share encryption
CIFS: Make cifs_push_locks send as many locks at once as possible
CIFS: Send as many mandatory unlock ranges at once as possible
CIFS: Implement caching mechanism for posix brlocks
CIFS: Implement caching mechanism for mandatory brlocks
CIFS: Fix DFS handling in cifs_get_file_info
CIFS: Fix error handling in cifs_readv_complete
[CIFS] Fixup trivial checkpatch warning
[CIFS] Show nostrictsync and noperm mount options in /proc/mounts
cifs, freezer: add wait_event_freezekillable and have cifs use it
cifs: allow cifs_max_pending to be readable under /sys/module/cifs/parameters
cifs: tune bdi.ra_pages in accordance with the rsize
cifs: allow for larger rsize= options and change defaults
cifs: convert cifs_readpages to use async reads
cifs: add cifs_async_readv
cifs: fix protocol definition for READ_RSP
cifs: add a callback function to receive the rest of the frame
cifs: break out 3rd receive phase into separate function
cifs: find mid earlier in receive codepath
...

Linus Torvalds
2011-10-29 01:43:32 +0800
5619a6939 Merge branch 'for-linus' of git://oss.sgi.com/xfs/xfs

* 'for-linus' of git://oss.sgi.com/xfs/xfs: (69 commits)
xfs: add AIL pushing tracepoints
xfs: put in missed fix for merge problem
xfs: do not flush data workqueues in xfs_flush_buftarg
xfs: remove XFS_bflush
xfs: remove xfs_buf_target_name
xfs: use xfs_ioerror_alert in xfs_buf_iodone_callbacks
xfs: clean up xfs_ioerror_alert
xfs: clean up buffer allocation
xfs: remove buffers from the delwri list in xfs_buf_stale
xfs: remove XFS_BUF_STALE and XFS_BUF_SUPER_STALE
xfs: remove XFS_BUF_SET_VTYPE and XFS_BUF_SET_VTYPE_REF
xfs: remove XFS_BUF_FINISH_IOWAIT
xfs: remove xfs_get_buftarg_list
xfs: fix buffer flushing during unmount
xfs: optimize fsync on directories
xfs: reduce the number of log forces from tail pushing
xfs: Don't allocate new buffers on every call to _xfs_buf_find
xfs: simplify xfs_trans_ijoin* again
xfs: unlock the inode before log force in xfs_change_file_space
xfs: unlock the inode before log force in xfs_fs_nfs_commit_metadata
...

Linus Torvalds
2011-10-29 01:31:42 +0800

28 Oct, 2011

20 commits

f3c7691e8 leases: fix write-open/read-lease race

In setlease, we use i_writecount to decide whether we can give out a
read lease.

In open, we break leases before incrementing i_writecount.

There is therefore a window between the break lease and the i_writecount
increment when setlease could add a new read lease.

This would leave us with a simultaneous write open and read lease, which
shouldn't happen.

Signed-off-by: J. Bruce Fields
Signed-off-by: Christoph Hellwig

J. Bruce Fields
2011-10-28 20:59:00 +0800
79835a710 nfs: drop unnecessary locking in llseek

This makes NFS follow the standard generic_file_llseek locking scheme.

Cc: Trond.Myklebust@netapp.com
Signed-off-by: Andi Kleen
Signed-off-by: Christoph Hellwig

Andi Kleen
2011-10-28 20:59:00 +0800
4cce0e28b ext4: replace cut'n'pasted llseek code with generic_file_llseek_size

This gives ext4 the benefits of unlocked llseek.

Cc: tytso@mit.edu
Signed-off-by: Andi Kleen
Signed-off-by: Christoph Hellwig

Andi Kleen
2011-10-28 20:58:59 +0800
5760495a8 vfs: add generic_file_llseek_size

Add a generic_file_llseek variant to the VFS that allows passing in
the maximum file size of the file system, instead of always
using maxbytes from the superblock.

This can be used to eliminate some cut'n'paste seek code in ext4.
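
As a rough userspace illustration of the idea (not the kernel implementation), the sketch below takes the upper bound as a parameter instead of reading a fixed per-filesystem limit. `struct toy_file` and `toy_llseek_size` are hypothetical names invented for this example, and the error handling is an assumption.

```c
#include <errno.h>
#include <stdint.h>
#include <stdio.h>   /* SEEK_SET, SEEK_CUR, SEEK_END */

/* Hypothetical stand-in for an open file: only a position and a size. */
struct toy_file {
    int64_t pos;   /* current file position, never negative */
    int64_t size;  /* current file size, never negative */
};

/*
 * Seek with a caller-supplied maximum offset. Returns the new position,
 * or a negative errno value if the target is out of range.
 */
static int64_t toy_llseek_size(struct toy_file *f, int64_t offset,
                               int whence, int64_t maxsize)
{
    int64_t base;

    switch (whence) {
    case SEEK_SET: base = 0;       break;
    case SEEK_CUR: base = f->pos;  break;
    case SEEK_END: base = f->size; break;
    default:       return -EINVAL;
    }

    /* Reject offsets that would overflow the signed 64-bit position. */
    if (offset > 0 && base > INT64_MAX - offset)
        return -EINVAL;

    int64_t target = base + offset;
    if (target < 0 || target > maxsize)
        return -EINVAL;

    f->pos = target;
    return target;
}
```

A caller with a smaller per-file limit would pass that limit as `maxsize`; a caller without one would pass the filesystem-wide maximum.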

Signed-off-by: Andi Kleen
Signed-off-by: Christoph Hellwig

Andi Kleen
2011-10-28 20:58:59 +0800
ef3d0fd27 vfs: do (nearly) lockless generic_file_llseek

The i_mutex lock use of generic_file_llseek hurts. Independent processes
accessing the same file synchronize over a single lock, even though
they have no need for synchronization at all.

Under high utilization this can cause llseek to scale very poorly on larger
systems.

This patch does some rethinking of the llseek locking model:

First the 64bit f_pos is not necessarily atomic without locks
on 32bit systems. This can already cause races with read() today.
This was discussed on linux-kernel in the past and deemed acceptable.
The patch does not change that.

Let's look at the different seek variants:

SEEK_SET: Doesn't really need any locking.
If there's a race one writer wins, the other loses.

For 32bit the non atomic update races against read()
stay the same. Without a lock they can also happen
against write() now. The read() race was deemed
acceptable in past discussions, and I think if it's
ok for read it's ok for write too.

=> Don't need a lock.

SEEK_END: This behaves like SEEK_SET plus it reads
the maximum size too. Reading the maximum size would have the
32bit atomic problem. But luckily we already have a way to read
the maximum size without locking (i_size_read), so we
can just use that instead.

Without i_mutex there is no synchronization with write() anymore,
however since the write() update is atomic on 64bit it just behaves
like another racy SEEK_SET. On non atomic 32bit it's the same
as SEEK_SET.

=> Don't need a lock, but need to use i_size_read()

SEEK_CUR: This has a read-modify-write race window
on the same file. One could argue that any application
doing unsynchronized seeks on the same file is already broken.
But for the sake of not adding a regression here I'm
using the file->f_lock to synchronize this. Using this
lock is much better than the inode mutex because it doesn't
synchronize between processes.

=> So still need a lock, but can use f_lock.

This patch implements this new scheme in generic_file_llseek.
I dropped generic_file_llseek_unlocked and changed all callers.
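
The scheme described above can be sketched in userspace C as follows. This is only an illustration of the locking decisions, not the kernel code: `toy_file`, `toy_inode`, `toy_llseek` and the spinlock helpers are hypothetical, the per-file lock is modelled with a C11 `atomic_flag`, overflow checks are omitted, and the plain unlocked stores to `pos` simply mirror the races the message deems acceptable (a strictly conforming multithreaded C program would need atomics there).

```c
#include <errno.h>
#include <stdatomic.h>
#include <stdint.h>
#include <stdio.h>   /* SEEK_SET, SEEK_CUR, SEEK_END */

struct toy_inode {
    int64_t size;            /* shared by every open of the same file */
};

struct toy_file {
    struct toy_inode *inode;
    int64_t pos;             /* per-open file position */
    atomic_flag f_lock;      /* per-open lock, not shared between opens */
};

static void toy_lock(atomic_flag *l)
{
    while (atomic_flag_test_and_set_explicit(l, memory_order_acquire))
        ;                    /* spin until the flag was previously clear */
}

static void toy_unlock(atomic_flag *l)
{
    atomic_flag_clear_explicit(l, memory_order_release);
}

static int64_t toy_llseek(struct toy_file *f, int64_t offset, int whence)
{
    int64_t target;

    switch (whence) {
    case SEEK_SET:
        target = offset;                    /* no lock: last writer wins */
        break;
    case SEEK_END:
        target = f->inode->size + offset;   /* no lock: one read of the size */
        break;
    case SEEK_CUR:
        /* read-modify-write of pos: serialize on the per-file lock only */
        toy_lock(&f->f_lock);
        target = f->pos + offset;
        if (target >= 0)
            f->pos = target;
        toy_unlock(&f->f_lock);
        return target < 0 ? -EINVAL : target;
    default:
        return -EINVAL;
    }

    if (target < 0)
        return -EINVAL;
    f->pos = target;
    return target;
}
```

Because the lock lives in the open file rather than in the inode, two processes that opened the same file independently never contend on it.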

Signed-off-by: Andi Kleen
Signed-off-by: Christoph Hellwig

Andi Kleen
2011-10-28 20:58:58 +0800
847cc6371 direct-io: merge direct_io_walker into __blockdev_direct_IO

This doesn't change anything for the compiler, but hch thought it would
make the code clearer.

I moved the reference counting into its own little inline.

Signed-off-by: Andi Kleen
Acked-by: Jeff Moyer
Signed-off-by: Christoph Hellwig

Andi Kleen
2011-10-28 20:58:58 +0800
ba253fbf6 direct-io: inline the complete submission path

Add inlines to all the submission path functions. While this increases
code size it also gives gcc a lot of optimization opportunities
in this critical hotpath.

In particular -- together with some other changes -- this
allows gcc to get rid of the unnecessary clearing of
sdio at the beginning and optimize the messy parameter passing.
Any non-inlining of a function which takes a sdio parameter
would break this optimization because it cannot be done if the
address of a structure is taken.

Note that benefits are only seen with CONFIG_OPTIMIZE_INLINING
and CONFIG_CC_OPTIMIZE_FOR_SIZE both set to off.

This gives about 2.2% improvement on a large database benchmark
with a high IOPS rate.

Signed-off-by: Andi Kleen
Signed-off-by: Christoph Hellwig

Andi Kleen
2011-10-28 20:58:58 +0800
18772641d direct-io: separate map_bh from dio

Only a single b_private field in the map_bh buffer head is needed after
the submission path. Move map_bh separately to avoid storing
this information in the long term slab.

This avoids the weird 104 byte hole in struct dio_submit which also needed
to be memset early.

Signed-off-by: Andi Kleen
Signed-off-by: Christoph Hellwig

Andi Kleen
2011-10-28 20:58:57 +0800
6e8267f53 direct-io: use a slab cache for struct dio

A direct slab call is slightly faster than kmalloc and can be better cached
per CPU. It also avoids rounding to the next kmalloc slab.

In addition this enforces cache line alignment for struct dio to avoid
any false sharing.

Signed-off-by: Andi Kleen
Acked-by: Jeff Moyer
Signed-off-by: Christoph Hellwig

Andi Kleen
2011-10-28 20:58:57 +0800
0dc2bc49b direct-io: rearrange fields in dio/dio_submit to avoid holes

Fix most problems reported by pahole.

There is still a weird 104 byte hole after map_bh. I'm not sure what
causes this.

Signed-off-by: Andi Kleen
Acked-by: Jeff Moyer
Signed-off-by: Christoph Hellwig

Andi Kleen
2011-10-28 20:58:56 +0800
cde1ecb32 direct-io: fix a wrong comment

There's nothing on the stack, even before my changes.

Signed-off-by: Andi Kleen
Acked-by: Jeff Moyer
Signed-off-by: Christoph Hellwig

Andi Kleen
2011-10-28 20:58:56 +0800
eb28be2b4 direct-io: separate fields only used in the submission path from struct dio

This large, but largely mechanical, patch moves all fields in struct dio
that are only used in the submission path into a separate on stack
data structure. This has the advantage that the memory is very likely
cache hot, which is not guaranteed for memory fresh out of kmalloc.

This also gives gcc more optimization potential because it can more easily
determine that there are no external aliases for these variables.

The sdio initialization is now an initializer instead of a memset.
This allows gcc to break sdio into individual fields and optimize
away unnecessary zeroing (after all the functions are inlined)
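
A small sketch of the difference, using a made-up structure: `struct toy_submit` and its fields are hypothetical and do not match the real submission structure. With a designated initializer on a local variable, every unnamed member is zeroed by the language rules, and a compiler that never sees the variable's address escape is free to keep the members in registers and drop stores that are never read.

```c
#include <stddef.h>

/* Hypothetical submission-path state; field names are illustrative only. */
struct toy_submit {
    unsigned long cur_page_offset;
    unsigned long cur_page_len;
    int           pages_in_io;
    int           boundary;
};

static unsigned long toy_submit_len(unsigned long len)
{
    /*
     * Initializer instead of memset(): members not named here
     * (cur_page_offset, pages_in_io, boundary) start out as zero.
     */
    struct toy_submit sdio = { .cur_page_len = len };

    sdio.pages_in_io += 1;
    return sdio.cur_page_len + (unsigned long)sdio.pages_in_io
           + (unsigned long)sdio.boundary;
}
```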

Signed-off-by: Andi Kleen
Acked-by: Jeff Moyer
Signed-off-by: Christoph Hellwig

Andi Kleen
2011-10-28 20:58:56 +0800
62a3ddef6 vfs: fix spinning prevention in prune_icache_sb

We need to move the inode to the end of the list to actually make the
spinning prevention explained in the comment above it work. With a
plain list_move it will simply stay in place as we're always reclaiming
from the head of the list.
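
The general effect can be shown with a toy list, independent of the kernel's list API and of the actual patch. All names below are hypothetical. If a scanner always examines the front entry, putting a skipped entry back at the front hands the scanner the same entry again, while putting it at the back lets the scan make progress.

```c
#include <stddef.h>

/* Minimal circular doubly linked list with a sentinel head. */
struct toy_node {
    struct toy_node *prev, *next;
    int id;
};

static void toy_init(struct toy_node *head)
{
    head->prev = head->next = head;
}

static void toy_unlink(struct toy_node *n)
{
    n->prev->next = n->next;
    n->next->prev = n->prev;
}

static void toy_push_front(struct toy_node *n, struct toy_node *head)
{
    n->prev = head;
    n->next = head->next;
    head->next->prev = n;
    head->next = n;
}

static void toy_push_back(struct toy_node *n, struct toy_node *head)
{
    n->next = head;
    n->prev = head->prev;
    head->prev->next = n;
    head->prev = n;
}

/* The scanner in this sketch always looks at the front entry. */
static struct toy_node *toy_front(struct toy_node *head)
{
    return head->next == head ? NULL : head->next;
}

/* Re-queue at the front: the scanner will see the same entry again. */
static void toy_requeue_front(struct toy_node *n, struct toy_node *head)
{
    toy_unlink(n);
    toy_push_front(n, head);
}

/* Re-queue at the back: the scanner moves on to the next entry. */
static void toy_requeue_back(struct toy_node *n, struct toy_node *head)
{
    toy_unlink(n);
    toy_push_back(n, head);
}
```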

Signed-off-by: Christoph Hellwig

Christoph Hellwig
2011-10-28 20:58:55 +0800
948409c74 vfs: add a comment to inode_permission()

Acked-by: J. Bruce Fields
Acked-by: David Howells
Signed-off-by: Andreas Gruenbacher
Signed-off-by: Aneesh Kumar K.V
Signed-off-by: Christoph Hellwig

Andreas Gruenbacher
2011-10-28 20:58:55 +0800
d124b60a8 vfs: pass all mask flags check_acl and posix_acl_permission

Acked-by: J. Bruce Fields
Acked-by: David Howells
Signed-off-by: Andreas Gruenbacher
Signed-off-by: Aneesh Kumar K.V
Signed-off-by: Christoph Hellwig

Andreas Gruenbacher
2011-10-28 20:58:54 +0800
8fd90c8d1 vfs: indicate that the permission functions take all the MAY_* flags

Acked-by: J. Bruce Fields
Acked-by: David Howells
Signed-off-by: Andreas Gruenbacher
Signed-off-by: Aneesh Kumar K.V
Signed-off-by: Christoph Hellwig

Andreas Gruenbacher
2011-10-28 20:58:54 +0800
1448c721e compat: sync compat_stats with statfs.

This was found by inspection while tracking a similar
bug in compat_statfs64, that has been fixed in mainline
since December.

- This fixes a bug where not all of the f_spare fields
were cleared on mips and s390.
- Add the f_flags field to struct compat_statfs
- Copy f_flags to userspace in case someone cares.
- Use __clear_user to copy the f_spare field to userspace
to ensure that all of the elements of f_spare are cleared.
On some architectures f_spare has 5 ints and on some
architectures f_spare only has 4 ints, which makes
the previous technique of clearing each int individually
broken.
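
The last point can be illustrated in plain userspace C. The structure below is hypothetical and `memset` stands in for the user-space clearing helper named above; the point is only that sizing the clear from the array itself stays correct whether the array has 4 or 5 elements, whereas a hand-written list of element assignments does not.

```c
#include <string.h>

/* Assumed for illustration: 4 on some layouts, 5 on others. */
#define TOY_SPARE_LEN 5

struct toy_statfs {
    int f_type;
    int f_flags;
    int f_spare[TOY_SPARE_LEN];
};

/* Clear the whole spare array, however many elements it has. */
static void toy_clear_spare(struct toy_statfs *s)
{
    memset(s->f_spare, 0, sizeof(s->f_spare));
}
```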

I don't expect anyone actually uses the old statfs system
call anymore but if they do let them benefit from having
the compat and the native version working the same.

Signed-off-by: Eric W. Biederman
Signed-off-by: Christoph Hellwig

Eric W. Biederman
2011-10-28 20:58:53 +0800
a877ee03a vfs: add "device" tag to /proc/self/mountstats

nfsiostat was failing to find mounted filesystems on kernels after
2.6.38 because of changes to show_vfsstat() by commit
c7f404b40a3665d9f4e9a927cc5c1ee0479ed8f9. This patch adds back the
"device" tag before the nfs server entry so scripts can parse the
mountstats file correctly.
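
To show why the tag matters to a parser, here is a minimal sketch of how a tool might match such a line. The exact line layout ("device ... mounted on ... with fstype ...") is an assumption made for this example, and `toy_parse_mountstats` is a hypothetical helper, not code from nfsiostat.

```c
#include <stdio.h>

/*
 * Parse one mountstats-style line of the assumed form
 *   device <source> mounted on <mountpoint> with fstype <type> ...
 * Returns 1 on a match, 0 otherwise. Without the leading "device"
 * tag the pattern does not match at all.
 */
static int toy_parse_mountstats(const char *line, char dev[256],
                                char mnt[256], char fstype[64])
{
    return sscanf(line, "device %255s mounted on %255s with fstype %63s",
                  dev, mnt, fstype) == 3;
}
```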

Signed-off-by: Bryan Schumaker
CC: stable@kernel.org [>=2.6.39]
Signed-off-by: Christoph Hellwig

Bryan Schumaker
2011-10-28 19:55:08 +0800
814e1d25a cleanup: vfs: small comment fix for block_invalidatepage

The patch is against 3.1-rc3.

Signed-off-by: Wang Sheng-Hui
Signed-off-by: Christoph Hellwig

Wang Sheng-Hui
2011-10-28 19:55:08 +0800
96814ecb4 Add definition for share encryption

Samba supports a setfs info level to negotiate encrypted
shares. This patch adds the defines so we recognize
this info level. Later patches will add the enablement
for it.

Acked-by: Jeremy Allison
Signed-off-by: Steve French

Steve French
2011-10-28 05:53:31 +0800

27 Oct, 2011

2 commits

60325f0c6 fs/Makefile: Stupid typo breakage of exofs inclusion

In my last patch I made a stupid mistake and broke the exofs
compilation completely. Fix it ASAP.

Instead of obj-y I did obj-$(y)

Really Really sorry. Me totally blushing :-{|

Signed-off-by: Boaz Harrosh
Signed-off-by: Linus Torvalds

Boaz Harrosh
2011-10-27 14:36:51 +0800
c28cfd60e Merge branch 'for-linus' of git://git.open-osd.org/linux-open-osd

* 'for-linus' of git://git.open-osd.org/linux-open-osd: (21 commits)
ore: Enable RAID5 mounts
exofs: Support for RAID5 read-4-write interface.
ore: RAID5 Write
ore: RAID5 read
fs/Makefile: Always inspect exofs/
ore: Make ore_calc_stripe_info EXPORT_SYMBOL
ore/exofs: Change ore_check_io API
ore/exofs: Define new ore_verify_layout
ore: Support for partial component table
ore: Support for short read/writes
exofs: Support for short read/writes
ore: Remove check for ios->kern_buff in _prepare_for_striping to later
ore: cleanup: Embed an ore_striping_info inside ore_io_state
ore: Only IO one group at a time (API change)
ore/exofs: Change the type of the devices array (API change)
ore: Make ore_striping_info and ore_calc_stripe_info public
exofs: Remove unused data_map member from exofs_sb_info
exofs: Rename struct ore_components comps => oc
exofs/super.c: local functions should be static
exofs/ore.c: local functions should be static
...

Linus Torvalds
2011-10-27 03:33:50 +0800

26 Oct, 2011

13 commits

39adff5f6 Merge branch 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

* 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (23 commits)
time, s390: Get rid of compile warning
dw_apb_timer: constify clocksource name
time: Cleanup old CONFIG_GENERIC_TIME references that snuck in
time: Change jiffies_to_clock_t() argument type to unsigned long
alarmtimers: Fix error handling
clocksource: Make watchdog reset lockless
posix-cpu-timers: Cure SMP accounting oddities
s390: Use direct ktime path for s390 clockevent device
clockevents: Add direct ktime programming function
clockevents: Make minimum delay adjustments configurable
nohz: Remove "Switched to NOHz mode" debugging messages
proc: Consider NO_HZ when printing idle and iowait times
nohz: Make idle/iowait counter update conditional
nohz: Fix update_ts_time_stat idle accounting
cputime: Clean up cputime_to_usecs and usecs_to_cputime macros
alarmtimers: Rework RTC device selection using class interface
alarmtimers: Add try_to_cancel functionality
alarmtimers: Add more refined alarm state tracking
alarmtimers: Remove period from alarm structure
alarmtimers: Remove interval cap limit hack
...

Linus Torvalds
2011-10-26 23:15:03 +0800
e33bae14f Merge branch 'for-linus' of git://github.com/ericvh/linux

* 'for-linus' of git://github.com/ericvh/linux:
9p: fix 9p.txt to advertise msize instead of maxdata
net/9p: Convert net/9p protocol dumps to tracepoints
fs/9p: change an int to unsigned int
fs/9p: Cleanup option parsing in 9p
9p: move dereference after NULL check
fs/9p: inode file operation is properly initialized init_special_inode
fs/9p: Update zero-copy implementation in 9p

Linus Torvalds
2011-10-26 20:20:53 +0800
339573406 libceph: fix double-free of page vector

ceph_release_page_vector() kfrees the vector; we shouldn't do it here too.

Reported-by: Jeff Wu
Signed-off-by: Sage Weil

Sage Weil
2011-10-26 07:10:17 +0800
3310f7541 ceph: fix 32-bit ino numbers

Fix 32-bit ino generation to not always be 1.
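
The message does not show the code, so the following is only a sketch of one common way to derive a 32-bit inode number from a 64-bit one: XOR the two halves together and substitute a fixed nonzero value when the result is zero. The function name `toy_ino64_to_ino32` and the replacement value 2 are assumptions for illustration.

```c
#include <stdint.h>

/*
 * Fold a 64-bit inode number into 32 bits by XORing the high and low
 * halves. Zero is avoided because it is not a usable inode number;
 * the substitute value here is an arbitrary choice for the sketch.
 */
static uint32_t toy_ino64_to_ino32(uint64_t ino64)
{
    uint32_t ino32 = (uint32_t)(ino64 & 0xffffffffu);

    ino32 ^= (uint32_t)(ino64 >> 32);
    if (ino32 == 0)
        ino32 = 2;
    return ino32;
}
```

Distinct 64-bit numbers can still collide after folding, so this kind of mapping only suits callers that can tolerate duplicate 32-bit values.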

Signed-off-by: Amon Ott

Amon Ott
2011-10-26 07:10:17 +0800
a35eca958 ceph: let the set_layout ioctl set single traits

Previously we were validating the passed-in stripe unit, object size,
and stripe count against each other (and not testing most other stuff).
Instead, make sure that the composed previous layout and new values are valid,
and only send the new values to the MDS. This lets users change the
pool without setting the whole layout, for instance.

Signed-off-by: Greg Farnum

Greg Farnum
2011-10-26 07:10:16 +0800
83eaea22b Revert "ceph: don't truncate dirty pages in invalidate work thread"

This reverts commit c9af9fb68e01eb2c2165e1bc45cfeeed510c64e6.

We need to block and truncate all pages in order to reliably invalidate
them. Otherwise, we could:

- have some uptodate pages in the cache
- queue an invalidate
- write(2) locks some pages
- invalidate_work skips them
- write(2) only overwrites part of the page
- page now dirty and uptodate
-> partial leakage of invalidated data

It's not entirely clear why we started skipping locked pages in the first
place. I just ran this through fsx and didn't see any problems.

Signed-off-by: Sage Weil

Sage Weil
2011-10-26 07:10:16 +0800
80db8bea6 ceph: replace leading spaces with tabs

Trivial formatting fix.

Signed-off-by: Noah Watkins
Signed-off-by: Sage Weil

Noah Watkins
2011-10-26 07:10:16 +0800
b61c27636 libceph: don't complain on msgpool alloc failures

The pool allocation failures are masked by the pool; there is no need to
spam the console about them. (That's the whole point of having the pool
in the first place.)

Mark msg allocations whose failure is safely handled as such.

Signed-off-by: Sage Weil

Sage Weil
2011-10-26 07:10:15 +0800
6ab00d465 libceph: create messenger with client

This simplifies the init/shutdown paths, and makes client->msgr available
during the rest of the setup process.

Signed-off-by: Sage Weil

Sage Weil
2011-10-26 07:10:15 +0800
6a8ea4706 ceph: document ioctls

...after some prodding by Christoph.

Signed-off-by: Sage Weil

Sage Weil
2011-10-26 07:10:15 +0800
0d66a487c ceph: implement (optional) max read size

The 'rsize' mount option limits the maximum size of an individual
read(ahead) operation that is sent off to an OSD. This is distinct from
'rasize', which controls the size of the readahead window.
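
The relationship between the two options can be sketched as follows: a readahead window of some size is cut into individual reads, each no larger than the per-read limit. `toy_split_reads` and `struct toy_read` are hypothetical names, and treating a limit of 0 as "no limit" is an assumption of this sketch rather than something stated above.

```c
#include <stddef.h>
#include <stdint.h>

struct toy_read {
    uint64_t off;  /* starting byte offset of this read */
    uint64_t len;  /* length of this read in bytes */
};

/*
 * Split the window [off, off + window) into reads of at most max_read
 * bytes each. max_read == 0 is treated as "no limit" (an assumption).
 * Writes at most cap entries to out and returns how many were written.
 */
static size_t toy_split_reads(uint64_t off, uint64_t window,
                              uint64_t max_read,
                              struct toy_read *out, size_t cap)
{
    size_t n = 0;

    while (window > 0 && n < cap) {
        uint64_t len = (max_read != 0 && window > max_read)
                           ? max_read : window;

        out[n].off = off;
        out[n].len = len;
        n++;

        off += len;
        window -= len;
    }
    return n;
}
```

With a 10-byte window and a 4-byte limit this yields three reads of 4, 4 and 2 bytes; with no limit it yields a single 10-byte read.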

Signed-off-by: Sage Weil

Sage Weil
2011-10-26 07:10:15 +0800
83817e35c ceph: rename rsize -> rasize

It controls readahead.

Signed-off-by: Sage Weil

Sage Weil
2011-10-26 07:10:15 +0800
7c272194e ceph: make readpages fully async

When we get a ->readpages() aop, submit async reads for all page ranges
in the provided page list. Lock the pages immediately, so that VFS/MM
will block until the reads complete.

Signed-off-by: Sage Weil

Sage Weil
2011-10-26 07:10:14 +0800