Eric Lee / smarc-fsl-linux-kernel

20 Jul, 2007

5 commits

20c2df83d mm: Remove slab destructors from kmem_cache_create(). ... Browse Code »

Slab destructors were no longer supported after Christoph's
c59def9f222d44bb7e2f0a559f2906191a0862d7 change. They've been
BUGs for both slab and slub, and slob never supported them
either.

This rips out support for the dtor pointer from kmem_cache_create()
completely and fixes up every single callsite in the kernel (there were
about 224, not including the slab allocator definitions themselves,
or the documentation references).

Signed-off-by: Paul Mundt

Paul Mundt
2007-07-20 09:11:58 +0800
f745bb1c7 Merge branch 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mfasheh/ocfs2 ... Browse Code »

* 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mfasheh/ocfs2:
ocfs2: ->fallocate() support

Linus Torvalds
2007-07-20 05:16:44 +0800
d0217ac04 mm: fault feedback #1 ... Browse Code »

Change ->fault prototype. We now return an int, which contains
VM_FAULT_xxx code in the low byte, and FAULT_RET_xxx code in the next byte.
FAULT_RET_ code tells the VM whether a page was found, whether it has been
locked, and potentially other things. This is not quite the way he wanted
it yet, but that's changed in the next patch (which requires changes to
arch code).

This means we no longer set VM_CAN_INVALIDATE in the vma in order to say
that a page is locked which requires filemap_nopage to go away (because we
can no longer remain backward compatible without that flag), but we were
going to do that anyway.

struct fault_data is renamed to struct vm_fault as Linus asked. address
is now a void __user * that we should firmly encourage drivers not to use
without really good reason.

The page is now returned via a page pointer in the vm_fault struct.

Signed-off-by: Nick Piggin
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Nick Piggin
2007-07-20 01:04:41 +0800
54cb8821d mm: merge populate and nopage into fault (fixes nonlinear) ... Browse Code »

Nonlinear mappings are (AFAIKS) simply a virtual memory concept that encodes
the virtual address -> file offset differently from linear mappings.

->populate is a layering violation because the filesystem/pagecache code
should need to know anything about the virtual memory mapping. The hitch here
is that the ->nopage handler didn't pass down enough information (ie. pgoff).
But it is more logical to pass pgoff rather than have the ->nopage function
calculate it itself anyway (because that's a similar layering violation).

Having the populate handler install the pte itself is likewise a nasty thing
to be doing.

This patch introduces a new fault handler that replaces ->nopage and
->populate and (later) ->nopfn. Most of the old mechanism is still in place
so there is a lot of duplication and nice cleanups that can be removed if
everyone switches over.

The rationale for doing this in the first place is that nonlinear mappings are
subject to the pagefault vs invalidate/truncate race too, and it seemed stupid
to duplicate the synchronisation logic rather than just consolidate the two.

After this patch, MAP_NONBLOCK no longer sets up ptes for pages present in
pagecache. Seems like a fringe functionality anyway.

NOPAGE_REFAULT is removed. This should be implemented with ->fault, and no
users have hit mainline yet.

[akpm@linux-foundation.org: cleanup]
[randy.dunlap@oracle.com: doc. fixes for readahead]
[akpm@linux-foundation.org: build fix]
Signed-off-by: Nick Piggin
Signed-off-by: Randy Dunlap
Cc: Mark Fasheh
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Nick Piggin
2007-07-20 01:04:41 +0800
d00806b18 mm: fix fault vs invalidate race for linear mappings ... Browse Code »

Fix the race between invalidate_inode_pages and do_no_page.

Andrea Arcangeli identified a subtle race between invalidation of pages from
pagecache with userspace mappings, and do_no_page.

The issue is that invalidation has to shoot down all mappings to the page,
before it can be discarded from the pagecache. Between shooting down ptes to
a particular page, and actually dropping the struct page from the pagecache,
do_no_page from any process might fault on that page and establish a new
mapping to the page just before it gets discarded from the pagecache.

The most common case where such invalidation is used is in file truncation.
This case was catered for by doing a sort of open-coded seqlock between the
file's i_size, and its truncate_count.

Truncation will decrease i_size, then increment truncate_count before
unmapping userspace pages; do_no_page will read truncate_count, then find the
page if it is within i_size, and then check truncate_count under the page
table lock and back out and retry if it had subsequently been changed (ptl
will serialise against unmapping, and ensure a potentially updated
truncate_count is actually visible).

Complexity and documentation issues aside, the locking protocol fails in the
case where we would like to invalidate pagecache inside i_size. do_no_page
can come in anytime and filemap_nopage is not aware of the invalidation in
progress (as it is when it is outside i_size). The end result is that
dangling (->mapping == NULL) pages that appear to be from a particular file
may be mapped into userspace with nonsense data. Valid mappings to the same
place will see a different page.

Andrea implemented two working fixes, one using a real seqlock, another using
a page->flags bit. He also proposed using the page lock in do_no_page, but
that was initially considered too heavyweight. However, it is not a global or
per-file lock, and the page cacheline is modified in do_no_page to increment
_count and _mapcount anyway, so a further modification should not be a large
performance hit. Scalability is not an issue.

This patch implements this latter approach. ->nopage implementations return
with the page locked if it is possible for their underlying file to be
invalidated (in that case, they must set a special vm_flags bit to indicate
so). do_no_page only unlocks the page after setting up the mapping
completely. invalidation is excluded because it holds the page lock during
invalidation of each page (and ensures that the page is not mapped while
holding the lock).

This also allows significant simplifications in do_no_page, because we have
the page locked in the right place in the pagecache from the start.

Signed-off-by: Nick Piggin
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Nick Piggin
2007-07-20 01:04:41 +0800

19 Jul, 2007

1 commit

385820a38 ocfs2: ->fallocate() support ... Browse Code »

Plug ocfs2 into the ->fallocate() callback. This just re-uses the existing
preallocation code.

Signed-off-by: Mark Fasheh

Mark Fasheh
2007-07-19 15:23:55 +0800

18 Jul, 2007

5 commits

86313c488 usermodehelper: Tidy up waiting ... Browse Code »

Rather than using a tri-state integer for the wait flag in
call_usermodehelper_exec, define a proper enum, and use that. I've
preserved the integer values so that any callers I've missed should
still work OK.

Signed-off-by: Jeremy Fitzhardinge
Cc: James Bottomley
Cc: Randy Dunlap
Cc: Christoph Hellwig
Cc: Andi Kleen
Cc: Paul Mackerras
Cc: Johannes Berg
Cc: Ralf Baechle
Cc: Bjorn Helgaas
Cc: Joel Becker
Cc: Tony Luck
Cc: Kay Sievers
Cc: Srivatsa Vaddagiri
Cc: Oleg Nesterov
Cc: David Howells

Jeremy Fitzhardinge
2007-07-18 23:47:40 +0800
b8c638aca Merge branch 'uninit-var' of master.kernel.org:/pub/scm/linux/kernel/git/jgarzik/misc-2.6 ... Browse Code »

* 'uninit-var' of master.kernel.org:/pub/scm/linux/kernel/git/jgarzik/misc-2.6:
arch/i386/* fs/* ipc/*: mark variables with uninitialized_var()
drivers/*: mark variables with uninitialized_var()

Linus Torvalds
2007-07-18 06:19:06 +0800
8e1c091cc arch/i386/* fs/* ipc/*: mark variables with uninitialized_var() ... Browse Code »

Mark variables with uninitialized_var() if such a warning appears,
and analysis proves that the var is initialized properly on all paths
it is used.

Signed-off-by: Jeff Garzik

Jeff Garzik
2007-07-18 04:23:19 +0800
3bd858ab1 Introduce is_owner_or_cap() to wrap CAP_FOWNER use with fsuid check ... Browse Code »

Introduce is_owner_or_cap() macro in fs.h, and convert over relevant
users to it. This is done because we want to avoid bugs in the future
where we check for only effective fsuid of the current task against a
file's owning uid, without simultaneously checking for CAP_FOWNER as
well, thus violating its semantics.
[ XFS uses special macros and structures, and in general looked ...
untouchable, so we leave it alone -- but it has been looked over. ]

The (current->fsuid != inode->i_uid) check in generic_permission() and
exec_permission_lite() is left alone, because those operations are
covered by CAP_DAC_OVERRIDE and CAP_DAC_READ_SEARCH. Similarly operations
falling under the purview of CAP_CHOWN and CAP_LEASE are also left alone.

Signed-off-by: Satyam Sharma
Cc: Al Viro
Acked-by: Serge E. Hallyn
Signed-off-by: Linus Torvalds

Satyam Sharma
2007-07-18 03:00:03 +0800
a56942551 knfsd: exportfs: add exportfs.h header ... Browse Code »

currently the export_operation structure and helpers related to it are in
fs.h. fs.h is already far too large and there are very few places needing the
export bits, so split them off into a separate header.

[akpm@linux-foundation.org: fix cifs build]
Signed-off-by: Christoph Hellwig
Signed-off-by: Neil Brown
Cc: Steven French
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Christoph Hellwig
2007-07-18 01:23:06 +0800

17 Jul, 2007

1 commit

add096909 Merge branch 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mfasheh/ocfs2 ... Browse Code »

* 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mfasheh/ocfs2: (32 commits)
[PATCH] ocfs2: zero_user_page conversion
ocfs2: Support xfs style space reservation ioctls
ocfs2: support for removing file regions
ocfs2: update truncate handling of partial clusters
ocfs2: btree support for removal of arbirtrary extents
ocfs2: Support creation of unwritten extents
ocfs2: support writing of unwritten extents
ocfs2: small cleanup of ocfs2_write_begin_nolock()
ocfs2: btree changes for unwritten extents
ocfs2: abstract btree growing calls
ocfs2: use all extent block suballocators
ocfs2: plug truncate into cached dealloc routines
ocfs2: simplify deallocation locking
ocfs2: harden buffer check during mapping of page blocks
ocfs2: shared writeable mmap
ocfs2: factor out write aops into nolock variants
ocfs2: rework ocfs2_buffered_write_cluster()
ocfs2: take ip_alloc_sem during entire truncate
ocfs2: Add "preferred slot" mount option
[KJ PATCH] Replacing memset(,0,PAGE_SIZE) with clear_page() in fs/ocfs2/dlm/dlmrecovery.c
...

Linus Torvalds
2007-07-17 01:52:55 +0800

12 Jul, 2007

1 commit

7b595756e sysfs: kill unnecessary attribute->owner ... Browse Code »

sysfs is now completely out of driver/module lifetime game. After
deletion, a sysfs node doesn't access anything outside sysfs proper,
so there's no reason to hold onto the attribute owners. Note that
often the wrong modules were accounted for as owners leading to
accessing removed modules.

This patch kills now unnecessary attribute->owner. Note that with
this change, userland holding a sysfs node does not prevent the
backing module from being unloaded.

For more info regarding lifetime rule cleanup, please read the
following message.

http://article.gmane.org/gmane.linux.kernel/510293

(tweaked by Greg to not delete the field just yet, to make it easier to
merge things properly.)

Signed-off-by: Tejun Heo
Cc: Cornelia Huck
Cc: Andrew Morton
Signed-off-by: Greg Kroah-Hartman

Tejun Heo
2007-07-12 07:09:06 +0800

11 Jul, 2007

25 commits

54c57dc3b [PATCH] ocfs2: zero_user_page conversion ... Browse Code »

Signed-off-by: Eric Sandeen
Signed-off-by: Mark Fasheh

Eric Sandeen
2007-07-11 08:32:10 +0800
b25801038 ocfs2: Support xfs style space reservation ioctls ... Browse Code »

We re-use the RESVSP/UNRESVSP ioctls from xfs which allow the user to
allocate and deallocate regions to a file without zeroing data or changing
i_size.

Though renamed, the structure passed in from user is identical to struct
xfs_flock64. The three fields that are actually used right now are l_whence,
l_start and l_len.

This should get ocfs2 immediate compatibility with userspace software using
the pre-existing xfs ioctls.

Signed-off-by: Mark Fasheh

Mark Fasheh
2007-07-11 08:32:09 +0800
063c4561f ocfs2: support for removing file regions ... Browse Code »

Provide an internal interface for the removal of arbitrary file regions.

ocfs2_remove_inode_range() takes a byte range within a file and will remove
existing extents within that range. Partial clusters will be zeroed so that
any read from within the region will return zeros.

Signed-off-by: Mark Fasheh

Mark Fasheh
2007-07-11 08:32:08 +0800
35edec1d5 ocfs2: update truncate handling of partial clusters ... Browse Code »

The partial cluster zeroing code used during truncate usually assumes that
the rightmost byte in the range to be zeroed lies on a cluster boundary.
This makes sense for truncate, but punching holes might require zeroing on
non-aligned rightmost boundaries.

Signed-off-by: Mark Fasheh

Mark Fasheh
2007-07-11 08:32:07 +0800
d0c7d7082 ocfs2: btree support for removal of arbirtrary extents ... Browse Code »

Add code to the btree paths to support the removal of arbitrary regions
within an existing extent. With proper higher level support this can be used
to "punch holes" in a file. Truncate (a special case of hole punching) could
also be converted to use these methods.

Signed-off-by: Mark Fasheh

Mark Fasheh
2007-07-11 08:32:05 +0800
2ae99a603 ocfs2: Support creation of unwritten extents ... Browse Code »

This can now be trivially supported with re-use of our existing extend code.

ocfs2_allocate_unwritten_extents() takes a start offset and a byte length
and iterates over the inode, adding extents (marked as unwritten) until len
is reached. Existing extents are skipped over.

Signed-off-by: Mark Fasheh

Mark Fasheh
2007-07-11 08:32:04 +0800
b27b7cbcf ocfs2: support writing of unwritten extents ... Browse Code »

Update the write code to detect when the user is asking to write to an
unwritten extent. Like writing to a hole, we must zero the region between
the write and the cluster boundaries. Most of the existing cluster zeroing
logic can be re-used with some additional checks for the unwritten flag on
extent records.

Signed-off-by: Mark Fasheh

Mark Fasheh
2007-07-11 08:32:03 +0800
0d172baa5 ocfs2: small cleanup of ocfs2_write_begin_nolock() ... Browse Code »

We can easily seperate out the write descriptor setup and manipulation
into helper functions.

Signed-off-by: Mark Fasheh

Mark Fasheh
2007-07-11 08:32:01 +0800
328d5752e ocfs2: btree changes for unwritten extents ... Browse Code »

Writes to a region marked as unwritten might result in a record split or
merge. We can support splits by making minor changes to the existing insert
code. Merges require left rotations which mostly re-use right rotation
support functions.

Signed-off-by: Mark Fasheh

Mark Fasheh
2007-07-11 08:32:00 +0800
c3afcbb34 ocfs2: abstract btree growing calls ... Browse Code »

The top level calls and logic for growing a tree can easily be abstracted
out of ocfs2_insert_extent() into a seperate function - ocfs2_grow_tree().

This allows future code to easily grow btrees when needed.

Signed-off-by: Mark Fasheh

Mark Fasheh
2007-07-11 08:31:58 +0800
1f6697d07 ocfs2: use all extent block suballocators ... Browse Code »

Now that we have a method to deallocate blocks from them, each node should
allocate extent blocks from their local suballocator file.

Signed-off-by: Mark Fasheh

Mark Fasheh
2007-07-11 08:31:56 +0800
59a5e416d ocfs2: plug truncate into cached dealloc routines ... Browse Code »

Signed-off-by: Mark Fasheh

Mark Fasheh
2007-07-11 08:31:55 +0800
2b604351b ocfs2: simplify deallocation locking ... Browse Code »

Deallocation of suballocator blocks, most notably extent blocks, might
involve multiple suballocator inodes.

The locking for this can get extremely complicated, especially when the
suballocator inodes to delete from aren't known until deep within an
unrelated codepath.

Implement a simple scheme for recording the blocks to be unlinked so that
the actual deallocation can be done in a context which won't deadlock.

Signed-off-by: Mark Fasheh

Mark Fasheh
2007-07-11 08:31:54 +0800
bce997682 ocfs2: harden buffer check during mapping of page blocks ... Browse Code »

We don't want to submit buffer_new blocks for read i/o. This actually won't
happen right now because those requests during an allocating write are all nicely
aligned. It's probably a good idea to provide an explicit check though.

Signed-off-by: Mark Fasheh

Mark Fasheh
2007-07-11 08:31:52 +0800
7307de805 ocfs2: shared writeable mmap ... Browse Code »

Implement cluster consistent shared writeable mappings using the
->page_mkwrite() callback.

Signed-off-by: Mark Fasheh

Mark Fasheh
2007-07-11 08:31:51 +0800
607d44aa3 ocfs2: factor out write aops into nolock variants ... Browse Code »

ocfs2_mkwrite() will want this so that it can add some mmap specific checks
before asking for a write.

Signed-off-by: Mark Fasheh

Mark Fasheh
2007-07-11 08:31:49 +0800
3a307ffc2 ocfs2: rework ocfs2_buffered_write_cluster() ... Browse Code »

Use some ideas from the new-aops patch series and turn
ocfs2_buffered_write_cluster() into a 2 stage operation with the caller
copying data in between. The code now understands multiple cluster writes as
a result of having to deal with a full page write for greater than 4k pages.

This sets us up to easily call into the write path during ->page_mkwrite().

Signed-off-by: Mark Fasheh

Mark Fasheh
2007-07-11 08:31:46 +0800
2e89b2e48 ocfs2: take ip_alloc_sem during entire truncate ... Browse Code »

Use of the alloc sem during truncate was too narrow - we want to protect
the i_size change and page truncation against mmap now.

Signed-off-by: Mark Fasheh

Mark Fasheh
2007-07-11 08:19:57 +0800
baf4661a8 ocfs2: Add "preferred slot" mount option ... Browse Code »

ocfs2 will attempt to assign the node the slot# provided in the mount
option. Failure to assign the preferred slot is not an error. This small
feature can be useful for automated testing.

Signed-off-by: Sunil Mushran
Signed-off-by: Mark Fasheh

Sunil Mushran
2007-07-11 08:19:54 +0800
5fb0f7f01 [KJ PATCH] Replacing memset(<addr>,0,PAGE_SIZE) with clear_page() in fs/ocfs2/dlm/dlmrecovery.c ... Browse Code »

Replacing memset(,0,PAGE_SIZE) with clear_page() in
fs/ocfs2/dlm/dlmrecovery.c

Signed-off-by: Shani Moideen
Signed-off-by: Mark Fasheh

Shani Moideen
2007-07-11 08:19:52 +0800
800deef3f [PATCH] ocfs2: use list_for_each_entry where benefical ... Browse Code »

Signed-off-by: Christoph Hellwig
Signed-off-by: Mark Fasheh

Christoph Hellwig
2007-07-11 08:19:49 +0800
e6df3a663 ocfs2: Wake up a starting region if it gets killed in the background. ... Browse Code »

Tell o2cb_region_dev_write() to wake up if rmdir(2) happens on the
heartbeat region while it is starting up. Then o2hb_region_dev_write()
can check to see if it is alive and act accordingly. This prevents a hang
(not being woken) and a crash (if it's woken by a signal).

Signed-off-by: Joel Becker
Signed-off-by: Mark Fasheh

Joel Becker
2007-07-11 08:19:46 +0800
16c6a4f24 ocfs2: live heartbeat depends on the local node configuration ... Browse Code »

Removing the local node configuration out from underneath a running
heartbeat is "bad". Provide an API in the ocfs2 nodemanager to request
a configfs dependancy on the local node, then use it in heartbeat.

Signed-off-by: Joel Becker
Signed-off-by: Mark Fasheh

Joel Becker
2007-07-11 08:19:43 +0800
14829422b ocfs2: Depend on configfs heartbeat items. ... Browse Code »

ocfs2 mounts require a heartbeat region. Use the new configfs_depend_item()
facility to actually depend on them so they can't go away from under us.

First, teach cluster/nodemanager.c to depend an item on the o2cb subsystem.
Then teach o2hb_register_callbacks to take a UUID and depend on the
appropriate region. Finally, teach all users of o2hb to pass a UUID or
NULL if they don't require a pin.

Signed-off-by: Joel Becker
Signed-off-by: Mark Fasheh

Joel Becker
2007-07-11 08:19:40 +0800
e6bd07aee configfs: Convert subsystem semaphore to mutex ... Browse Code »

Convert the su_sem member of struct configfs_subsystem to a struct
mutex, as that's what it is. Also convert all the users and update
Documentation/configfs.txt and Documentation/configfs_example.c
accordingly.

[ Conflict in fs/dlm/config.c with commit
3168b0780d06ace875696f8a648d04d6089654e5 manually resolved. --Mark ]

Inspired-by: Satyam Sharma
Signed-off-by: Joel Becker
Signed-off-by: Mark Fasheh

Joel Becker
2007-07-11 08:10:56 +0800

10 Jul, 2007

2 commits

cac36bb06 pipe: change the ->pin() operation to ->confirm() ... Browse Code »

The name 'pin' was badly chosen, it doesn't pin a pipe buffer
in the most commonly used sense in the kernel. So change the
name to 'confirm', after debating this issue with Hugh
Dickins a bit.

A good return from ->confirm() means that the buffer is really
there, and that the contents are good.

Signed-off-by: Jens Axboe

Jens Axboe
2007-07-10 14:04:15 +0800
d6b29d7ce splice: divorce the splice structure/function definitions from the pipe header ... Browse Code »

We need to move even more stuff into the header so that folks can use
the splice_to_pipe() implementation instead of open-coding a lot of
pipe knowledge (see relay implementation), so move to our own header
file finally.

Signed-off-by: Jens Axboe

Jens Axboe
2007-07-10 14:04:14 +0800