Eric Lee / smarc-fsl-linux-kernel

04 Jun, 2011

2 commits

4f1ba49ef Merge branch 'for-linus' of git://git.kernel.dk/linux-block

* 'for-linus' of git://git.kernel.dk/linux-block:
block: Use hlist_entry() for io_context.cic_list.first
cfq-iosched: Remove bogus check in queue_fail path
xen/blkback: potential null dereference in error handling
xen/blkback: don't call vbd_size() if bd_disk is NULL
block: blkdev_get() should access ->bd_disk only after success
CFQ: Fix typo and remove unnecessary semicolon
block: remove unwanted semicolons
Revert "block: Remove extra discard_alignment from hd_struct."
nbd: adjust 'max_part' according to part_shift
nbd: limit module parameters to a sane value
nbd: pass MSG_* flags to kernel_recvmsg()
block: improve the bio_add_page() and bio_add_pc_page() descriptions

Linus Torvalds
2011-06-04 07:11:26 +0800
3af91a125 Merge branch 'linux-next' of git://git.infradead.org/ubifs-2.6

* 'linux-next' of git://git.infradead.org/ubifs-2.6:
UBIFS: fix-up free space earlier
UBIFS: intialize LPT earlier
UBIFS: assert no fixup when writing a node
UBIFS: fix clean znode counter corruption in error cases
UBIFS: fix memory leak on error path
UBIFS: fix shrinker object count reports
UBIFS: fix recovery broken by the previous recovery fix
UBIFS: amend ubifs_recover_leb interface
UBIFS: introduce a "grouped" journal head flag
UBIFS: supress false error messages

Linus Torvalds
2011-06-04 06:59:32 +0800

03 Jun, 2011

6 commits

098011940 UBIFS: fix-up free space earlier

The free space fixup is currently initiated during mount after the call to
ubifs_write_master() which results in a write to PEBs; this has been observed
with the patch 'assert no fixup when writing a node' applied:

Move the free space fixup on mount to before the calls to
ubifs_recover_inl_heads() and ubifs_write_master(). This results in no
assertions with the previously mentioned patch applied.

Artem: tweaked the patch a bit

Signed-off-by: Ben Gardiner
Reviewed-by: Matthew L. Creech
Signed-off-by: Artem Bityutskiy

Ben Gardiner
2011-06-03 23:12:31 +0800
781c5717a UBIFS: intialize LPT earlier

The current 'mount_ubifs()' implementation does not initialize the LPT until the
master node is marked dirty. Move the LPT initialization to before marking
the master node dirty. This is a preparation for the next patch which will move
the free-space-fixup check to before marking the master node dirty, because we
have to fix-up the free space before doing any writes.

Artem: massaged the patch and commit message.

Signed-off-by: Ben Gardiner
Reviewed-by: Matthew L. Creech
Signed-off-by: Artem Bityutskiy

Ben Gardiner
2011-06-03 23:12:31 +0800
4f1ab9b01 UBIFS: assert no fixup when writing a node

The current free space fixup can result in some writing to the UBI volume
when the space_fixup flag is set.

To catch instances where UBIFS is writing to the NAND while the space_fixup
flag is set, add an assert to ubifs_write_node().
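
The idea of guarding the write path with an assertion can be sketched in plain C. This is only an illustration under stated assumptions: 'struct fs_info', its 'writes' counter and 'write_node()' are hypothetical stand-ins, not the UBIFS structures or functions.

```c
#include <assert.h>

/* Hypothetical, simplified file-system state (not the real UBIFS structure). */
struct fs_info {
    unsigned int space_fixup:1; /* set while free space still has to be fixed up */
    int writes;                 /* counts writes, for illustration only */
};

/* Hypothetical write helper: refuses (via assert) to write during fixup. */
static int write_node(struct fs_info *c)
{
    /* Any write while the fixup flag is set is a bug, so trap it early. */
    assert(!c->space_fixup);

    c->writes++;
    return 0;
}
```

With the flag clear, the call goes through and returns 0; with the flag set, the assertion fires, which is how unexpected writes would be caught.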

Artem: tweaked the patch, added similar assertion to the write buffer
write path.

Signed-off-by: Ben Gardiner
Reviewed-by: Matthew L. Creech
Signed-off-by: Artem Bityutskiy

Ben Gardiner
2011-06-03 23:12:31 +0800
837072377 UBIFS: fix clean znode counter corruption in error cases

UBIFS maintains per-filesystem and global clean znode counters
('c->clean_zn_cnt' and 'ubifs_clean_zn_cnt'). It is important to maintain
correct values there since the shrinker relies on 'ubifs_clean_zn_cnt'.

However, if a commit failed, the counters were corrupted. E.g., if a failure
happens in the middle of 'write_index()', then some nodes in the commit list
('c->cnext') are marked as clean, and some are marked as dirty. Then
'ubifs_destroy_tnc_subtree()' does not return the correct count when it frees
the znodes, and we end up with a non-zero 'c->clean_zn_cnt' when unmounting.
This means that if we have 2 file-systems and one of them fails, and we
unmount it, 'ubifs_clean_zn_cnt' stays incorrect and confuses the shrinker.

Signed-off-by: Artem Bityutskiy

Artem Bityutskiy
2011-06-03 23:12:31 +0800
812eb2583 UBIFS: fix memory leak on error path

UBIFS leaks memory on error path in 'ubifs_jnl_update()' in case of write
failure because it forgets to free the 'struct ubifs_dent_node *dent' object.
Although the object is small, the alignment can make it large - e.g., 2KiB
if the min. I/O unit is 2KiB.
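
A rough userspace sketch of both points: how rounding up to the min. I/O unit inflates a small object, and how a single exit label frees the buffer on the error path as well. The names 'align_up()' and 'journal_update()' and the 'write_ok' parameter are hypothetical and exist only for this illustration; this is not the kernel code.

```c
#include <errno.h>
#include <stdlib.h>

/* Round len up to a multiple of align (align must be a power of two). */
static size_t align_up(size_t len, size_t align)
{
    return (len + align - 1) & ~(align - 1);
}

/*
 * Hypothetical update routine. 'write_ok' simulates whether the write
 * succeeds; the buffer is released on every path.
 */
static int journal_update(size_t node_len, size_t min_io, int write_ok)
{
    int err = 0;
    void *dent = calloc(1, align_up(node_len, min_io));

    if (!dent)
        return -ENOMEM;

    if (!write_ok) {
        err = -EIO;     /* simulated write failure */
        goto out_free;
    }

    /* ... on success the node would have been written to the journal ... */

out_free:
    free(dent);         /* reached on both success and failure */
    return err;
}
```

For example, a 60-byte node with a 2048-byte min. I/O unit occupies a 2048-byte buffer, so leaking it on every failed write costs far more than the node itself.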

Signed-off-by: Artem Bityutskiy
Cc: stable@kernel.org

Artem Bityutskiy
2011-06-03 23:12:31 +0800
cf610bf41 UBIFS: fix shrinker object count reports

Sometimes the VM asks the shrinker to return the number of objects it can
shrink, and we return 'ubifs_clean_zn_cnt' in that case. However, this counter
can be negative for a short period of time, due to the way the UBIFS TNC code
updates it, and I sometimes observe the following warning:

shrink_slab: ubifs_shrinker+0x0/0x2b7 [ubifs] negative objects to delete nr=-8541616642706119788

This patch makes sure UBIFS never returns a negative count of objects.
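
A minimal sketch of the clamping idea, assuming a plain global counter. The names 'clean_zn_cnt' and 'shrinker_object_count()' are hypothetical, and returning 0 for a negative counter is an assumption of this sketch; the value the actual patch reports in that case is not stated here.

```c
/* Hypothetical stand-in for the global clean znode counter. */
static long clean_zn_cnt;

/* Report how many objects could be freed; never a negative number. */
static long shrinker_object_count(void)
{
    long cnt = clean_zn_cnt;

    /* The counter may dip below zero transiently; clamp it (assumed: to 0). */
    return cnt >= 0 ? cnt : 0;
}
```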

Signed-off-by: Artem Bityutskiy
Cc: stable@kernel.org

Artem Bityutskiy
2011-06-03 23:12:24 +0800

01 Jun, 2011

5 commits

da8b94ea6 UBIFS: fix recovery broken by the previous recovery fix

Unfortunately, the recovery fix d1606a59b6be4ea392eabd40d1250aa1eeb19efb
(UBIFS: fix extremely rare mount failure) broke recovery. That commit makes
UBIFS drop the last min. I/O unit in all journal heads, but this is needed only
for the GC head, and it does not work for non-GC heads. For example, suppose
we have min. I/O units A and B, and A contains a valid node X, which
was fsynced, and then a group of nodes Y which spans the rest of A and B. In
this case we'll drop not only Y, but also X, which is obviously incorrect.

This patch fixes the issue and additionally makes recovery drop the last min.
I/O unit only for the GC head, and leaves things as they have been for ages for
the other heads - this is safer.

Signed-off-by: Artem Bityutskiy

Artem Bityutskiy
2011-06-01 17:29:06 +0800
efcfde54c UBIFS: amend ubifs_recover_leb interface

Instead of passing "grouped" parameter to 'ubifs_recover_leb()' which tells
whether the nodes are grouped in the LEB to recover, pass the journal head
number and let 'ubifs_recover_leb()' look at the journal head's 'grouped' flag.
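
The interface change can be sketched as follows: the caller passes a journal head number, and the callee looks up that head's 'grouped' flag itself. The head names, 'struct jhead', 'init_jheads()' and 'leb_is_grouped()' are hypothetical simplifications for illustration, not the UBIFS definitions.

```c
/* Hypothetical journal head numbers for this sketch. */
enum { HEAD_GC, HEAD_BASE, HEAD_DATA, HEAD_CNT };

/* Simplified journal head: only the property discussed here. */
struct jhead {
    unsigned int grouped:1; /* 1 if nodes written to this head are grouped */
};

static struct jhead jheads[HEAD_CNT];

/* Normal heads receive grouped nodes, the GC head does not. */
static void init_jheads(void)
{
    for (int i = 0; i < HEAD_CNT; i++)
        jheads[i].grouped = (i != HEAD_GC);
}

/* The callee derives "grouped" from the head number it is given. */
static int leb_is_grouped(int jhead)
{
    return jheads[jhead].grouped;
}
```

Passing the head number rather than a boolean keeps the head identity available to the callee for any later per-head decisions.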

This patch is preparation for a further fix where we'll need to know the
journal head number for other purposes.

Signed-off-by: Artem Bityutskiy

Artem Bityutskiy
2011-06-01 17:29:06 +0800
1a0b06997 UBIFS: introduce a "grouped" journal head flag

Journal heads differ in how UBIFS writes nodes to them. All normal
journal heads receive grouped nodes, while the GC journal head receives
ungrouped nodes. This patch adds a 'grouped' flag to 'struct ubifs_jhead' which
describes this property.

This patch is preparation for a further recovery fix.

Signed-off-by: Artem Bityutskiy

Artem Bityutskiy
2011-06-01 17:29:06 +0800
ab75950b1 UBIFS: supress false error messages

Commit ab51afe05273741f72383529ef488aa1ea598ec6 was a good clean-up, but
it introduced a regression - now UBIFS prints scary error messages during
recovery on all corrupted nodes, even though the corruptions are expected
(due to a power cut). This patch fixes the issue.

Additionally fix a typo in a comment introduced by the same commit.

Signed-off-by: Artem Bityutskiy

Artem Bityutskiy
2011-06-01 17:29:05 +0800
4c49ff3fe block: blkdev_get() should access ->bd_disk only after success

d4dc210f69 (block: don't block events on excl write for non-optical
devices) added dereferencing of bdev->bd_disk to test
GENHD_FL_BLOCK_EVENTS_ON_EXCL_WRITE; however, bdev->bd_disk can be
%NULL if open failed which can lead to an oops.

Test the flag after testing open was successful, not before.
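
The ordering issue can be illustrated with a small userspace sketch: a pointer that is only valid after a successful open must not be dereferenced before the result is checked. All names here ('struct disk', 'struct bdev', 'open_dev()', 'should_block_events()', and the flag value) are hypothetical stand-ins, not the block layer's definitions.

```c
#include <stddef.h>

#define FL_BLOCK_EVENTS_ON_EXCL_WRITE 0x1u /* hypothetical flag value */

struct disk { unsigned int flags; };
struct bdev { struct disk *bd_disk; }; /* stays NULL if open fails */

/* Hypothetical open: fails and leaves bd_disk NULL when no disk is given. */
static int open_dev(struct bdev *b, struct disk *d)
{
    if (!d)
        return -1;
    b->bd_disk = d;
    return 0;
}

/* Returns 1 if events should be blocked, 0 otherwise (also on open failure). */
static int should_block_events(struct bdev *b, struct disk *d)
{
    int res = open_dev(b, d);

    /* Check success first; only then is b->bd_disk safe to dereference. */
    if (res == 0 && (b->bd_disk->flags & FL_BLOCK_EVENTS_ON_EXCL_WRITE))
        return 1;
    return 0;
}
```

Reversing the two conditions in the 'if' would dereference a NULL 'bd_disk' whenever the open fails, which is the kind of oops described above.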

Signed-off-by: Tejun Heo
Reported-by: David Miller
Tested-by: David Miller
Cc: stable@kernel.org
Signed-off-by: Jens Axboe

Tejun Heo
2011-06-01 14:28:47 +0800

30 May, 2011

27 commits

c7427d23f autofs4: bogus dentry_unhash() added in ->unlink()

Signed-off-by: Al Viro

Al Viro
2011-05-30 13:50:53 +0800
3cebde241 vfs: shrink_dcache_parent before rmdir, dir rename

The dentry_unhash push-down series missed that shrink_dcache_parent needs to
be called prior to rmdir or dir rename to clear DCACHE_REFERENCED and
allow efficient dentry reclaim.

Reported-by: Dave Chinner
Signed-off-by: Sage Weil
Signed-off-by: Al Viro

Sage Weil
2011-05-30 13:48:27 +0800
a1706ac4c Revert "block: Remove extra discard_alignment from hd_struct."

It was not a good idea to start dereferencing disk->queue from
the fs sysfs strategy for displaying discard alignment. We first ran
into a NULL pointer deref, and after fixing that we sometimes
see invalid disk->queue pointer values.

Since discard is the only one of the bunch actually looking into
the queue, just revert the change.

This reverts commit 23ceb5b7719e9276d4fa72a3ecf94dd396755276.

Conflicts:
fs/partitions/check.c

Jens Axboe
2011-05-30 13:42:51 +0800
bd1bfe40a Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ecryptfs/ecryptfs-2.6

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ecryptfs/ecryptfs-2.6:
eCryptfs: Remove ecryptfs_header_cache_2
eCryptfs: Cleanup and optimize ecryptfs_lookup_interpose()
eCryptfs: Return useful code from contains_ecryptfs_marker
eCryptfs: Fix new inode race condition
eCryptfs: Cleanup inode initialization code
eCryptfs: Consolidate inode functions into inode.c

Linus Torvalds
2011-05-30 05:13:25 +0800
cd1acdf17 Merge branch 'pnfs-submit' of git://git.open-osd.org/linux-open-osd

* 'pnfs-submit' of git://git.open-osd.org/linux-open-osd: (32 commits)
pnfs-obj: pg_test check for max_io_size
NFSv4.1: define nfs_generic_pg_test
NFSv4.1: use pnfs_generic_pg_test directly by layout driver
NFSv4.1: change pg_test return type to bool
NFSv4.1: unify pnfs_pageio_init functions
pnfs-obj: objlayout_encode_layoutcommit implementation
pnfs: encode_layoutcommit
pnfs-obj: report errors and .encode_layoutreturn Implementation.
pnfs: encode_layoutreturn
pnfs: layoutret_on_setattr
pnfs: layoutreturn
pnfs-obj: osd raid engine read/write implementation
pnfs: support for non-rpc layout drivers
pnfs-obj: define per-inode private structure
pnfs: alloc and free layout_hdr layoutdriver methods
pnfs-obj: objio_osd device information retrieval and caching
pnfs-obj: decode layout, alloc/free lseg
pnfs-obj: pnfs_osd XDR client implementation
pnfs-obj: pnfs_osd XDR definitions
pnfs-obj: objlayoutdriver module skeleton
...

Linus Torvalds
2011-05-30 05:10:13 +0800
306328705 eCryptfs: Remove ecryptfs_header_cache_2

Now that ecryptfs_lookup_interpose() is no longer using
ecryptfs_header_cache_2 to read in metadata, the kmem_cache can be
removed and the ecryptfs_header_cache_1 kmem_cache can be renamed to
ecryptfs_header_cache.

Signed-off-by: Tyler Hicks

Tyler Hicks
2011-05-30 03:24:25 +0800
778aeb42a eCryptfs: Cleanup and optimize ecryptfs_lookup_interpose()

ecryptfs_lookup_interpose() has turned into spaghetti code over the
years. This is an effort to clean it up.

- Shorten overly descriptive variable names such as ecryptfs_dentry
- Simplify gotos and error paths
- Create helper function for reading plaintext i_size from metadata

It also includes an optimization when reading i_size from the metadata.
A complete page-sized kmem_cache_alloc() was being done to read in 16
bytes of metadata. The buffer for that is now statically declared.

Signed-off-by: Tyler Hicks

Tyler Hicks
2011-05-30 03:24:24 +0800
7a86617e5 eCryptfs: Return useful code from contains_ecryptfs_marker

Instead of having the calling functions translate the true/false return
code to either 0 or -EINVAL, have contains_ecryptfs_marker() return 0 or
-EINVAL so that the calling functions can just reuse the return code.

Also, rename the function to ecryptfs_validate_marker() to avoid callers
mistakenly thinking that it returns true/false codes.
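
The calling convention can be sketched like this: the validator returns 0 or -EINVAL, and the caller propagates the code unchanged instead of translating a boolean. The marker constant, 'validate_marker()' and 'read_header()' are illustrative assumptions for this sketch, not the eCryptfs functions.

```c
#include <errno.h>
#include <stdint.h>

#define SKETCH_MARKER_MAGIC 0x3c81b7f5u /* illustrative constant for this sketch */

/* Returns 0 if the two marker words match, -EINVAL otherwise. */
static int validate_marker(uint32_t m1, uint32_t m2)
{
    return ((m1 ^ SKETCH_MARKER_MAGIC) == m2) ? 0 : -EINVAL;
}

/* Hypothetical caller: reuses the return code rather than mapping true/false. */
static int read_header(uint32_t m1, uint32_t m2)
{
    int rc = validate_marker(m1, m2);

    if (rc)
        return rc; /* propagate -EINVAL as-is */

    /* ... the rest of the header would be parsed here ... */
    return 0;
}
```

Naming the function "validate" rather than "contains" signals the 0-on-success convention, so a caller is less likely to write 'if (validate_marker(...))' expecting a truth value.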

Signed-off-by: Tyler Hicks

Tyler Hicks
2011-05-30 03:24:24 +0800
3b06b3ebf eCryptfs: Fix new inode race condition

Only unlock and d_add() new inodes after the plaintext inode size has
been read from the lower filesystem. This fixes a race condition that
was sometimes seen during a multi-job kernel build in an eCryptfs mount.

https://bugzilla.kernel.org/show_bug.cgi?id=36002

Signed-off-by: Tyler Hicks
Reported-by: David
Tested-by: David

Tyler Hicks
2011-05-30 03:23:39 +0800
57ed609d4 Merge git://git.kernel.org/pub/scm/linux/kernel/git/cmetcalf/linux-tile

* git://git.kernel.org/pub/scm/linux/kernel/git/cmetcalf/linux-tile:
arch/tile: more /proc and /sys file support

Linus Torvalds
2011-05-30 02:29:28 +0800
a74d70b63 Merge branch 'for-2.6.40' of git://linux-nfs.org/~bfields/linux

* 'for-2.6.40' of git://linux-nfs.org/~bfields/linux: (22 commits)
nfsd: make local functions static
NFSD: Remove unused variable from nfsd4_decode_bind_conn_to_session()
NFSD: Check status from nfsd4_map_bcts_dir()
NFSD: Remove setting unused variable in nfsd_vfs_read()
nfsd41: error out on repeated RECLAIM_COMPLETE
nfsd41: compare request's opcnt with session's maxops at nfsd4_sequence
nfsd v4.1 lOCKT clientid field must be ignored
nfsd41: add flag checking for create_session
nfsd41: make sure nfs server process OPEN with EXCLUSIVE4_1 correctly
nfsd4: fix wrongsec handling for PUTFH + op cases
nfsd4: make fh_verify responsibility of nfsd_lookup_dentry caller
nfsd4: introduce OPDESC helper
nfsd4: allow fh_verify caller to skip pseudoflavor checks
nfsd: distinguish functions of NFSD_MAY_* flags
svcrpc: complete svsk processing on cb receive failure
svcrpc: take advantage of tcp autotuning
SUNRPC: Don't wait for full record to receive tcp data
svcrpc: copy cb reply instead of pages
svcrpc: close connection if client sends short packet
svcrpc: note network-order types in svc_process_calldir
...

Linus Torvalds
2011-05-30 02:21:12 +0800
f1d1c9fa8 Merge branch 'nfs-for-2.6.40' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6

* 'nfs-for-2.6.40' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6:
SUNRPC: Support for RPC over AF_LOCAL transports
SUNRPC: Remove obsolete comment
SUNRPC: Use AF_LOCAL for rpcbind upcalls
SUNRPC: Clean up use of curly braces in switch cases
NFS: Revert NFSROOT default mount options
SUNRPC: Rename xs_encode_tcp_fragment_header()
nfs,rcu: convert call_rcu(nfs_free_delegation_callback) to kfree_rcu()
nfs41: Correct offset for LAYOUTCOMMIT
NFS: nfs_update_inode: print current and new inode size in debug output
NFSv4.1: Fix the handling of NFS4ERR_SEQ_MISORDERED errors
NFSv4: Handle expired stateids when the lease is still valid
SUNRPC: Deal with the lack of a SYN_SENT sk->sk_state_change callback...

Linus Torvalds
2011-05-30 02:20:02 +0800
2ff55e98d Merge git://git.kernel.org/pub/scm/linux/kernel/git/pkl/squashfs-linus

* git://git.kernel.org/pub/scm/linux/kernel/git/pkl/squashfs-linus:
Squashfs: Fix sanity check patches on big-endian systems

Linus Torvalds
2011-05-30 02:19:45 +0800
ef1d57599 cifs/ubifs: Fix shrinker API change fallout

Commit 1495f230fa77 ("vmscan: change shrinker API by passing
shrink_control struct") changed the API of ->shrink(), but missed ubifs
and cifs instances.

Signed-off-by: Al Viro
Signed-off-by: Linus Torvalds

Al Viro
2011-05-30 02:17:34 +0800
934207701 pnfs-obj: pg_test check for max_io_size

Implement pg_test vector to test for max IO sizes. We calculate
a max_io_size member only once, and cache it in lseg so as not to
recalculate it on every page insert.

Signed-off-by: Boaz Harrosh
[simplify logic]
Signed-off-by: Benny Halevy

Boaz Harrosh
2011-05-30 02:03:08 +0800
5b36c7dc4 NFSv4.1: define nfs_generic_pg_test

By default, unless pnfs is used, coalesce pages until pg_bsize
(rsize or wsize) is reached.

pnfs layout drivers define their own pg_test methods that use
pnfs_generic_pg_test and need to define their own I/O size
limits (e.g. based on the file stripe size).
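
A minimal sketch of the default coalescing test, assuming a descriptor that tracks the bytes gathered so far and the size limit. 'struct pageio' and 'generic_pg_test()' are simplified, hypothetical stand-ins for the NFS page I/O descriptor and test function.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified page I/O descriptor. */
struct pageio {
    size_t pg_count; /* bytes already coalesced into the current request */
    size_t pg_bsize; /* limit: rsize or wsize */
};

/* Returns true if another req_bytes may be coalesced without exceeding pg_bsize. */
static bool generic_pg_test(const struct pageio *pgio, size_t req_bytes)
{
    return pgio->pg_count + req_bytes <= pgio->pg_bsize;
}
```

A layout driver with its own I/O size limit would wrap a test like this and add its extra condition, such as staying within one stripe.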

[Move a check from nfs_pageio_do_add_request to nfs_generic_pg_test]
Signed-off-by: Boaz Harrosh
Signed-off-by: Benny Halevy

Boaz Harrosh
2011-05-30 02:02:42 +0800
89a58e32d NFSv4.1: use pnfs_generic_pg_test directly by layout driver

Signed-off-by: Benny Halevy

Benny Halevy
2011-05-30 01:56:55 +0800
18ad0a9f2 NFSv4.1: change pg_test return type to bool

Signed-off-by: Benny Halevy

Benny Halevy
2011-05-30 01:56:54 +0800
dfed206b8 NFSv4.1: unify pnfs_pageio_init functions

Use common code for pnfs_pageio_init_{read,write} and use
a common generic pg_test function.

Note that this function always assumes the layout driver's
pg_test method is implemented.

[Fix BUG]
Signed-off-by: Boaz Harrosh
Signed-off-by: Benny Halevy

Benny Halevy
2011-05-30 01:56:43 +0800
a0fe8bf42 pnfs-obj: objlayout_encode_layoutcommit implementation

* Define API for io-engines to report delta_space_used in IOs
* Encode the osd-layout specific information of the layoutcommit
XDR buffer.

Signed-off-by: Boaz Harrosh
Signed-off-by: Benny Halevy

Boaz Harrosh
2011-05-30 01:55:00 +0800
ac7db7264 pnfs: encode_layoutcommit

Add a layout driver method to encode the layout type specific
opaque part of layout commit in-line in the xdr stream.

Currently, the pnfs-objects layout driver uses it to encode metadata hints
to the MDS and the blocks layout driver to commit provisionally allocated
extents to the file.

Signed-off-by: Benny Halevy

Benny Halevy
2011-05-30 01:55:00 +0800
adb58535e pnfs-obj: report errors and .encode_layoutreturn Implementation.

An io_state pre-allocates an error information structure for each
possible osd-device that might error during IO. When IO is done, if all
was well, the io_state is freed (as today). If the I/O has ended with an
error, the io_state is queued on a per-layout err_list. When eventually
encode_layoutreturn() is called, each error is properly encoded on the
XDR buffer and only then is the io_state removed from err_list and
de-allocated.

It is up to the io_engine to fill in the segment that faulted and the type
of osd_error that occurred, by calling objlayout_io_set_result() for
each failing device.

In objio_osd:
* Allocate io-error descriptors space as part of io_state
* Use generic objlayout error reporting at end of io.

Signed-off-by: Boaz Harrosh
Signed-off-by: Benny Halevy

Boaz Harrosh
2011-05-30 01:54:45 +0800
04a555498 pnfs: encode_layoutreturn

Add a layout driver method to encode the layout type specific
opaque part of layout return in-line in the xdr stream.

Currently the pnfs-objects layout driver uses it to encode i/o error
information on LAYOUTRETURN.

Signed-off-by: Andy Adamson
[fixup layout header pointer for encode_layoutreturn]
Signed-off-by: Benny Halevy

Andy Adamson
2011-05-30 01:54:37 +0800
8a1636c45 pnfs: layoutret_on_setattr

With the objects layout security model, we have object capabilities
that are associated with the layout and we anticipate that the server
will issue a cb_layoutrecall for any setattr that changes security
related attributes (user/group/mode/acl) or truncates the file.

Therefore, the layout is returned before issuing the setattr to avoid
the anticipated cb_layoutrecall.

Signed-off-by: Benny Halevy

Benny Halevy
2011-05-30 01:54:36 +0800
cbe826036 pnfs: layoutreturn

NFSv4.1 LAYOUTRETURN implementation

Currently, does not support layout-type payload encoding.

Signed-off-by: Alexandros Batsakis
Signed-off-by: Andy Adamson
Signed-off-by: Andy Adamson
Signed-off-by: Dean Hildebrand
Signed-off-by: Fred Isaman
Signed-off-by: Fred Isaman
Signed-off-by: Marc Eshel
Signed-off-by: Zhang Jingwang
[call pnfs_return_layout right before pnfs_destroy_layout]
[remove assert_spin_locked from pnfs_clear_lseg_list]
[remove wait parameter from the layoutreturn path.]
[remove return_type field from nfs4_layoutreturn_args]
[remove range from nfs4_layoutreturn_args]
[no need to send layoutcommit from _pnfs_return_layout]
[don't wait on sync layoutreturn]
[fix layout stateid in layoutreturn args]
[fixed NULL deref in _pnfs_return_layout]
[removed reclaim member of nfs4_layoutreturn_args]
Signed-off-by: Benny Halevy

Benny Halevy
2011-05-30 01:54:36 +0800
04f834503 pnfs-obj: osd raid engine read/write implementation

Using the in-kernel osd library, implement read/write
of data from/to osd-objects according to information specified
in the objects-layout.

Support for striping over mirrors with a received stripe_unit.
There are however a few constraints; layouts that violate them are
not supported:
1. Stripe unit must be a multiple of PAGE_SIZE
2. Stripe length (stripe_unit * number_of_stripes) cannot be
larger than 32 bits.
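
The two constraints can be expressed as a small validity check. This is a sketch under assumptions: the page size of 4096 is assumed for illustration, and 'stripe_layout_supported()' is a hypothetical helper, not a function from the driver.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u /* assumed page size for this sketch */

/* Returns true if the layout satisfies both constraints listed above. */
static bool stripe_layout_supported(uint64_t stripe_unit,
                                    uint32_t number_of_stripes)
{
    if (stripe_unit == 0 || number_of_stripes == 0)
        return false;

    /* 1. The stripe unit must be a multiple of the page size. */
    if (stripe_unit % SKETCH_PAGE_SIZE != 0)
        return false;

    /* 2. stripe_unit * number_of_stripes must fit in 32 bits
     *    (checked by division to avoid overflow in the product). */
    if (stripe_unit > UINT32_MAX / number_of_stripes)
        return false;

    return true;
}
```

For instance, a 4096-byte stripe unit over 4 stripes passes, while a 1 GiB stripe unit over 4 stripes fails because the stripe length would be exactly 2^32 bytes.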

Also support raid-groups and partial-layout. Partial-layout is
when not all the groups are received on the line, addressing
only a partial range of the file.

TODO:
Only raid0! raid 4/5/6 support will come at a later stage

An unsupported layout will send IO through the MDS

[Important fallout from the last rebase]
Signed-off-by: Boaz Harrosh
[gfp_flags]
Signed-off-by: Benny Halevy

Boaz Harrosh
2011-05-30 01:54:15 +0800
d20581aa4 pnfs: support for non-rpc layout drivers

Non-rpc layout drivers, such as those for objects and blocks,
implement their own I/O path and error handling logic.
Therefore bypass NFS-based error handling for these layout drivers.

[fix lseg ref-count bugs, and null de-refs]
[Fall out from: non-rpc layout drivers]
Signed-off-by: Boaz Harrosh
[get rid of PNFS_USE_RPC_CODE]
[get rid of __nfs4_write_done_cb]
[revert useless change in nfs4_write_done_cb]
Signed-off-by: Benny Halevy

Benny Halevy
2011-05-30 01:53:51 +0800