Eric Lee / smarc-fsl-linux-kernel

04 Apr, 2009

6 commits

3cc50ac0d Merge git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-2.6-fscache

* git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-2.6-fscache: (41 commits)
NFS: Add mount options to enable local caching on NFS
NFS: Display local caching state
NFS: Store pages from an NFS inode into a local cache
NFS: Read pages from FS-Cache into an NFS inode
NFS: nfs_readpage_async() needs to be accessible as a fallback for local caching
NFS: Add read context retention for FS-Cache to call back with
NFS: FS-Cache page management
NFS: Add some new I/O counters for FS-Cache doing things for NFS
NFS: Invalidate FsCache page flags when cache removed
NFS: Use local disk inode cache
NFS: Define and create inode-level cache objects
NFS: Define and create superblock-level objects
NFS: Define and create server-level objects
NFS: Register NFS for caching and retrieve the top-level index
NFS: Permit local filesystem caching to be enabled for NFS
NFS: Add FS-Cache option bit and debug bit
NFS: Add comment banners to some NFS functions
FS-Cache: Make kAFS use FS-Cache
CacheFiles: A cache that backs onto a mounted filesystem
CacheFiles: Export things for CacheFiles
...

Linus Torvalds
2009-04-04 01:07:43 +0800
d9b9be024 Merge git://git.kernel.org/pub/scm/linux/kernel/git/agk/linux-2.6-dm

* git://git.kernel.org/pub/scm/linux/kernel/git/agk/linux-2.6-dm: (36 commits)
dm: set queue ordered mode
dm: move wait queue declaration
dm: merge pushback and deferred bio lists
dm: allow uninterruptible wait for pending io
dm: merge __flush_deferred_io into caller
dm: move bio_io_error into __split_and_process_bio
dm: rename __split_bio
dm: remove unnecessary struct dm_wq_req
dm: remove unnecessary work queue context field
dm: remove unnecessary work queue type field
dm: bio list add bio_list_add_head
dm snapshot: persistent fix dtr cleanup
dm snapshot: move status to exception store
dm snapshot: move ctr parsing to exception store
dm snapshot: use DMEMIT macro for status
dm snapshot: remove dm_snap header
dm snapshot: remove dm_snap header use
dm exception store: move cow pointer
dm exception store: move chunk_fields
dm exception store: move dm_target pointer
...

Linus Torvalds
2009-04-04 01:02:45 +0800
3688e07f8 Fix highmem PPC build failure

Commit f4112de6b679d84bd9b9681c7504be7bdfb7c7d5 ("mm: introduce
debug_kmap_atomic") broke PPC builds with CONFIG_HIGHMEM=y:

CC init/main.o
In file included from include/linux/highmem.h:25,
from include/linux/pagemap.h:11,
from include/linux/mempolicy.h:63,
from init/main.c:53:
arch/powerpc/include/asm/highmem.h: In function 'kmap_atomic_prot':
arch/powerpc/include/asm/highmem.h:98: error: implicit declaration of function 'debug_kmap_atomic'
In file included from include/linux/pagemap.h:11,
from include/linux/mempolicy.h:63,
from init/main.c:53:
include/linux/highmem.h: At top level:
include/linux/highmem.h:196: warning: conflicting types for 'debug_kmap_atomic'
include/linux/highmem.h:196: error: static declaration of 'debug_kmap_atomic' follows non-static declaration
include/asm/highmem.h:98: error: previous implicit declaration of 'debug_kmap_atomic' was here
make[1]: *** [init/main.o] Error 1
make: *** [init] Error 2

Signed-off-by: Kumar Gala
Acked-by: Akinobu Mita
Signed-off-by: Linus Torvalds

Kumar Gala
2009-04-04 00:48:29 +0800
c54c4dec6 Merge git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6

* git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
crypto: ixp4xx - Fix handling of chained sg buffers
crypto: shash - Fix unaligned calculation with short length
hwrng: timeriomem - Use phys address rather than virt

Linus Torvalds
2009-04-04 00:45:53 +0800
223cdea4c Merge branch 'for-linus' of git://neil.brown.name/md

* 'for-linus' of git://neil.brown.name/md: (53 commits)
md/raid5 revise rules for when to update metadata during reshape
md/raid5: minor code cleanups in make_request.
md: remove CONFIG_MD_RAID_RESHAPE config option.
md/raid5: be more careful about write ordering when reshaping.
md: don't display meaningless values in sysfs files resync_start and sync_speed
md/raid5: allow layout and chunksize to be changed on active array.
md/raid5: reshape using largest of old and new chunk size
md/raid5: prepare for allowing reshape to change layout
md/raid5: prepare for allowing reshape to change chunksize.
md/raid5: clearly differentiate 'before' and 'after' stripes during reshape.
Documentation/md.txt update
md: allow number of drives in raid5 to be reduced
md/raid5: change reshape-progress measurement to cope with reshaping backwards.
md: add explicit method to signal the end of a reshape.
md/raid5: enhance raid5_size to work correctly with negative delta_disks
md/raid5: drop qd_idx from r6_state
md/raid6: move raid6 data processing to raid6_pq.ko
md: raid5 run(): Fix max_degraded for raid level 4.
md: 'array_size' sysfs attribute
md: centralize ->array_sectors modifications
...

Linus Torvalds
2009-04-04 00:08:19 +0800
ea02259fd Merge git://git.kernel.org/pub/scm/linux/kernel/git/bart/linux-hdreg-h-cleanup

* git://git.kernel.org/pub/scm/linux/kernel/git/bart/linux-hdreg-h-cleanup:
remove <linux/ata.h> include from <linux/hdreg.h>
include/linux/hdreg.h: remove unused defines
isd200: use ATA_* defines instead of *_STAT and *_ERR ones
include/linux/hdreg.h: cover WIN_* and friends with #ifndef/#endif __KERNEL__
aoe: WIN_* -> ATA_CMD_*
isd200: WIN_* -> ATA_CMD_*
include/linux/hdreg.h: cover struct hd_driveid with #ifndef/#endif __KERNEL__
xsysace: make it 'struct hd_driveid'-free
ubd_kern: make it 'struct hd_driveid'-free
isd200: make it 'struct hd_driveid'-free

Linus Torvalds
2009-04-04 00:02:32 +0800

03 Apr, 2009

34 commits

f42b293d6 NFS: nfs_readpage_async() needs to be accessible as a fallback for local caching

nfs_readpage_async() needs to be non-static so that it can be used as a
fallback for the local on-disk caching should an EIO crop up when reading the
cache.

Signed-off-by: David Howells
Acked-by: Steve Dickson
Acked-by: Trond Myklebust
Acked-by: Al Viro
Tested-by: Daire Byrne

David Howells
2009-04-03 23:42:44 +0800
6a51091d0 NFS: Add some new I/O counters for FS-Cache doing things for NFS

Add some new NFS I/O counters for FS-Cache doing things for NFS. A new line is
emitted into /proc/pid/mountstats if caching is enabled that looks like:

fsc: <rok> <rfl> <wok> <wfl> <unc>

Where <rok> is the number of pages read successfully from the cache, <rfl> is
the number of failed page reads against the cache, <wok> is the number of
successful page writes to the cache, <wfl> is the number of failed page writes
to the cache, and <unc> is the number of NFS pages that have been disconnected
from the cache.
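
As a rough illustration, a userspace tool could pick these counters out of a
mountstats line as sketched here. The sketch assumes the line is the "fsc:" tag
followed by five whitespace-separated integers in the order described above.
The names in FSC_FIELDS and the function parse_fsc_line() are made up for the
example and are not part of any kernel interface.

```python
# Hypothetical field names for the five counters, in the order the commit
# message lists them; the kernel does not define these names.
FSC_FIELDS = ("read_ok", "read_fail", "write_ok", "write_fail", "uncached")


def parse_fsc_line(line):
    """Return a dict of counters from an 'fsc:' line, or None if it is not one."""
    parts = line.split()
    if len(parts) != 1 + len(FSC_FIELDS) or parts[0] != "fsc:":
        return None
    try:
        values = [int(p) for p in parts[1:]]
    except ValueError:
        # A non-numeric field means this is not the line we expect.
        return None
    return dict(zip(FSC_FIELDS, values))


if __name__ == "__main__":
    # Sample input invented for the demonstration.
    print(parse_fsc_line("fsc: 120 3 98 0 7"))
```

Lines that do not match the assumed shape yield None rather than raising an
error, so the function can be applied to every line of the file.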

Signed-off-by: David Howells
Acked-by: Steve Dickson
Acked-by: Trond Myklebust
Acked-by: Al Viro
Tested-by: Daire Byrne

David Howells
2009-04-03 23:42:43 +0800
ef79c097b NFS: Use local disk inode cache

Bind data storage objects in the local cache to NFS inodes.

Signed-off-by: David Howells
Acked-by: Steve Dickson
Acked-by: Trond Myklebust
Acked-by: Al Viro
Tested-by: Daire Byrne

David Howells
2009-04-03 23:42:43 +0800
08734048b NFS: Define and create superblock-level objects

Define and create superblock-level cache index objects (as managed by
nfs_server structs).

Each superblock object is created in a server level index object and is itself
an index into which inode-level objects are inserted.

Ideally there would be one superblock-level object per server, and the former
would be folded into the latter; however, since the "nosharecache" option
exists this isn't possible.

The superblock object key is a sequence consisting of:

(1) Certain superblock s_flags.

(2) Various connection parameters that serve to distinguish superblocks for
sget().

(3) The volume FSID.

(4) The security flavour.

(5) The uniquifier length.

(6) The uniquifier text. This is normally an empty string, unless the fsc=xyz
mount option was used to explicitly specify a uniquifier.

The key blob is of variable length, depending on the length of (6).

The superblock object is given no coherency data to carry in the auxiliary data
permitted by the cache. It is assumed that the superblock is always coherent.

This patch also adds uniquification handling such that two otherwise identical
superblocks, at least one of which is marked "nosharecache", won't end up
trying to share the on-disk cache. It will be possible to manually provide a
uniquifier through a mount option with a later patch to avoid the error
otherwise produced.

Signed-off-by: David Howells
Acked-by: Steve Dickson
Acked-by: Trond Myklebust
Acked-by: Al Viro
Tested-by: Daire Byrne

David Howells
2009-04-03 23:42:42 +0800
147272813 NFS: Define and create server-level objects

Define and create server-level cache index objects (as managed by nfs_client
structs).

Each server object is created in the NFS top-level index object and is itself
an index into which superblock-level objects are inserted.

Ideally there would be one superblock-level object per server, and the former
would be folded into the latter; however, since the "nosharecache" option
exists this isn't possible.

The server object key is a sequence consisting of:

(1) NFS version

(2) Server address family (eg: AF_INET or AF_INET6)

(3) Server port.

(4) Server IP address.

The key blob is of variable length, depending on the length of (4).
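
To show why the key length varies with the address, here is a minimal sketch
that packs the four components into a single blob. The byte layout (three
big-endian 16-bit fields followed by the raw address bytes) and the function
build_server_key() are assumptions made for the example; the layout actually
used by NFS is not stated in this message.

```python
import socket
import struct


def build_server_key(nfs_version, family, port, address):
    """Pack (version, family, port, address) into one variable-length blob.

    The layout used here is hypothetical and chosen only for illustration.
    """
    # 4 raw bytes for AF_INET, 16 raw bytes for AF_INET6.
    addr_bytes = socket.inet_pton(family, address)
    header = struct.pack("!HHH", nfs_version, int(family), port)
    return header + addr_bytes


if __name__ == "__main__":
    # Documentation-range addresses, invented for the demonstration.
    print(len(build_server_key(3, socket.AF_INET, 2049, "192.0.2.1")))
    print(len(build_server_key(4, socket.AF_INET6, 2049, "2001:db8::1")))
```

With this layout the fixed part is always 6 bytes, so only the address family
changes the total length.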

The server object is given no coherency data to carry in the auxiliary data
permitted by the cache.

Signed-off-by: David Howells
Acked-by: Steve Dickson
Acked-by: Trond Myklebust
Acked-by: Al Viro
Tested-by: Daire Byrne

David Howells
2009-04-03 23:42:42 +0800
c6a6f19e2 NFS: Add FS-Cache option bit and debug bit

Add FS-Cache option bit to nfs_server struct. This is set to indicate local
on-disk caching is enabled for a particular superblock.

Also add debug bit for local caching operations.

Signed-off-by: David Howells
Acked-by: Steve Dickson
Acked-by: Trond Myklebust
Acked-by: Al Viro
Tested-by: Daire Byrne

David Howells
2009-04-03 23:42:42 +0800
385e1ca5f CacheFiles: Permit the page lock state to be monitored

Add a function to install a monitor on the page lock waitqueue for a particular
page, thus allowing the page being unlocked to be detected.

This is used by CacheFiles to detect read completion on a page in the backing
filesystem so that it can then copy the data to the waiting netfs page.

Signed-off-by: David Howells
Acked-by: Steve Dickson
Acked-by: Trond Myklebust
Acked-by: Rik van Riel
Acked-by: Al Viro
Tested-by: Daire Byrne

David Howells
2009-04-03 23:42:39 +0800
b51088228 FS-Cache: Implement data I/O part of netfs API

Implement the data I/O part of the FS-Cache netfs API. The documentation and
API header file were added in a previous patch.

This patch implements the following functions for the netfs to call:

(*) fscache_attr_changed().

Indicate that the object has changed its attributes. The only attribute
currently recorded is the file size. Only pages within the set file size
will be stored in the cache.

This operation is submitted for asynchronous processing, and will return
immediately. It will return -ENOMEM if an out of memory error is
encountered, -ENOBUFS if the object is not actually cached, or 0 if the
operation is successfully queued.

(*) fscache_read_or_alloc_page().
(*) fscache_read_or_alloc_pages().

Request data be fetched from the disk, and allocate internal metadata to
track the netfs pages and reserve disk space for unknown pages.

These operations perform semi-asynchronous data reads. Upon returning
they will indicate which pages they think can be retrieved from disk, and
will have set in progress attempts to retrieve those pages.

These will return, in order of preference, -ENOMEM on memory allocation
error, -ERESTARTSYS if a signal interrupted proceedings, -ENODATA if one
or more requested pages are not yet cached, -ENOBUFS if the object is not
actually cached or if there isn't space for future pages to be cached on
this object, or 0 if successful.

In the case of the multipage function, the pages for which reads are set
in progress will be removed from the list and the page count decreased
appropriately.

If any read operations should fail, the completion function will be given
an error, and will also be passed contextual information to allow the
netfs to fall back to querying the server for the absent pages.

For each successful read, the page completion function will also be
called.

Any pages subsequently tracked by the cache will have PG_fscache set upon
them on return. fscache_uncache_page() must be called for such pages.

If supplied by the netfs, the mark_pages_cached() cookie op will be
invoked for any pages now tracked.

(*) fscache_alloc_page().

Allocate internal metadata to track a netfs page and reserve disk space.

This will return -ENOMEM on memory allocation error, -ERESTARTSYS on
signal, -ENOBUFS if the object isn't cached, or there isn't enough space
in the cache, or 0 if successful.

Any pages subsequently tracked by the cache will have PG_fscache set upon
them on return. fscache_uncache_page() must be called for such pages.

If supplied by the netfs, the mark_pages_cached() cookie op will be
invoked for any pages now tracked.

(*) fscache_write_page().

Request data be stored to disk. This may only be called on pages that
have been read or alloc'd by the above three functions and have not yet
been uncached.

This will return -ENOMEM on memory allocation error, -ERESTARTSYS on
signal, -ENOBUFS if the object isn't cached, or there isn't immediately
enough space in the cache, or 0 if successful.

On a successful return, this operation will have queued the page for
asynchronous writing to the cache. The page will be returned with
PG_fscache_write set until the write completes one way or another. The
caller will not be notified if the write fails due to an I/O error. If
that happens, the object will become unavailable and all pending writes will
be aborted.

Note that the cache may batch up page writes, and so it may take a while
to get around to writing them out.

The caller must assume that until PG_fscache_write is cleared the page is
in use by the cache. Any changes made to the page may be reflected on disk.
The page may even be under DMA.

(*) fscache_uncache_page().

Indicate that the cache should stop tracking a page previously read or
alloc'd from the cache. If the page was alloc'd only, but unwritten, it
will not appear on disk.
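
The return conventions of fscache_read_or_alloc_page() listed above can be
summarised as a small decision table. What follows is a hedged model written
in Python purely for illustration; real callers are kernel C code. The function
next_step() and the action strings are hypothetical, and the policy shown
(store the page after fetching it when -ENODATA is returned, give up on
-ENOMEM or -ERESTARTSYS) is one plausible reading of the text, not the NFS
implementation.

```python
import errno

# ERESTARTSYS is internal to the kernel and has no entry in Python's errno
# module; 512 is its value in the kernel headers.
ERESTARTSYS = 512


def next_step(ret):
    """Map a return value to a hypothetical follow-up action for the netfs."""
    if ret == 0:
        # The read was set in progress; the completion function will be called.
        return "wait-for-completion"
    if ret == -errno.ENODATA:
        # The page is tracked by the cache but holds no data yet.
        return "read-from-server-then-write-to-cache"
    if ret == -errno.ENOBUFS:
        # The object is not cached, or there is no space for the page.
        return "read-from-server"
    if ret in (-errno.ENOMEM, -ERESTARTSYS):
        return "report-error"
    raise ValueError(f"unexpected return value {ret}")


if __name__ == "__main__":
    for value in (0, -errno.ENODATA, -errno.ENOBUFS, -errno.ENOMEM):
        print(value, next_step(value))
```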

Signed-off-by: David Howells
Acked-by: Steve Dickson
Acked-by: Trond Myklebust
Acked-by: Al Viro
Tested-by: Daire Byrne

David Howells
2009-04-03 23:42:39 +0800
ccc4fc3d1 FS-Cache: Implement the cookie management part of the netfs API

Implement the cookie management part of the FS-Cache netfs client API. The
documentation and API header file were added in a previous patch.

This patch implements the following three functions:

(1) fscache_acquire_cookie().

Acquire a cookie to represent an object to the netfs. If the object in
question is a non-index object, then that object and its parent indices
will be created on disk at this point if they don't already exist. Index
creation is deferred because an index may reside in multiple caches.

(2) fscache_relinquish_cookie().

Retire or release a cookie previously acquired. At this point, the
object on disk may be destroyed.

(3) fscache_update_cookie().

Update the in-cache representation of a cookie. This is used to update
the auxiliary data for coherency management purposes.

With this patch it is possible to have a netfs instruct a cache backend to
look up, validate and create metadata on disk and to destroy it again.
The ability to actually store and retrieve data in the objects so created is
added in later patches.

Note that these functions will never return an error. _All_ errors are
handled internally to FS-Cache.

The worst that can happen is that fscache_acquire_cookie() may return a NULL
pointer - which is considered a negative cookie pointer and can be passed back
to any function that takes a cookie without harm. A negative cookie pointer
merely suppresses caching at that level.

The stub in linux/fscache.h will detect inline the negative cookie pointer and
abort the operation as fast as possible. This means that the compiler doesn't
have to set up for a call in that case.

See the documentation in Documentation/filesystems/caching/netfs-api.txt for
more information.

Signed-off-by: David Howells
Acked-by: Steve Dickson
Acked-by: Trond Myklebust
Acked-by: Al Viro
Tested-by: Daire Byrne

David Howells
2009-04-03 23:42:38 +0800
726dd7ff1 FS-Cache: Add netfs registration

Add functions to register and unregister a network filesystem or other client
of the FS-Cache service. This allocates and releases the cookie representing
the top-level index for a netfs, and makes it available to the netfs.

If the FS-Cache facility is disabled, then the calls are optimised away at
compile time.

Note that whilst this patch may appear to work with FS-Cache enabled and a
netfs attempting to use it, it will leak the cookie it allocates for the netfs
as fscache_relinquish_cookie() is implemented in a later patch. This will
cause the slab code to emit a warning when the module is removed.

Signed-off-by: David Howells
Acked-by: Steve Dickson
Acked-by: Trond Myklebust
Acked-by: Al Viro
Tested-by: Daire Byrne

David Howells
2009-04-03 23:42:38 +0800
0e04d4cef FS-Cache: Add cache tag handling

Implement two features of FS-Cache:

(1) The ability to request and release cache tags - names by which a cache may
be known to a netfs, and thus selected for use.

(2) An internal function by which a cache is selected by consulting the netfs,
if the netfs wishes to be consulted.

Signed-off-by: David Howells
Acked-by: Steve Dickson
Acked-by: Trond Myklebust
Acked-by: Al Viro
Tested-by: Daire Byrne

David Howells
2009-04-03 23:42:37 +0800
7394daa8c FS-Cache: Add use of /proc and presentation of statistics

Make FS-Cache create its /proc interface and present various statistical
information through it. Also provide the functions for updating this
information.

These features are enabled by:

CONFIG_FSCACHE_PROC
CONFIG_FSCACHE_STATS
CONFIG_FSCACHE_HISTOGRAM

The /proc directory for FS-Cache is also exported so that caching modules can
add their own statistics there too.

The FS-Cache module is loadable at this point, and the statistics files can be
examined by userspace:

cat /proc/fs/fscache/stats
cat /proc/fs/fscache/histogram

Signed-off-by: David Howells
Acked-by: Steve Dickson
Acked-by: Trond Myklebust
Acked-by: Al Viro
Tested-by: Daire Byrne

David Howells
2009-04-03 23:42:37 +0800
0dfc41d1e FS-Cache: Add the FS-Cache cache backend API and documentation

Add the API for a generic facility (FS-Cache) by which caches may declare
themselves open for business, and may obtain work to be done from network
filesystems. The header file is included by:

#include <linux/fscache-cache.h>

Documentation for the API is also added to:

Documentation/filesystems/caching/backend-api.txt

This API is not usable without the implementation of the utility functions
which will be added in further patches.

Signed-off-by: David Howells
Acked-by: Steve Dickson
Acked-by: Trond Myklebust
Acked-by: Al Viro
Tested-by: Daire Byrne

David Howells
2009-04-03 23:42:36 +0800
2d6fff637 FS-Cache: Add the FS-Cache netfs API and documentation

Add the API for a generic facility (FS-Cache) by which filesystems (such as AFS
or NFS) may call on local caching capabilities without having to know anything
about how the cache works, or even if there is a cache:

+---------+
| | +--------------+
| NFS |--+ | |
| | | +-->| CacheFS |
+---------+ | +----------+ | | /dev/hda5 |
| | | | +--------------+
+---------+ +-->| | |
| | | |--+
| AFS |----->| FS-Cache |
| | | |--+
+---------+ +-->| | |
| | | | +--------------+
+---------+ | +----------+ | | |
| | | +-->| CacheFiles |
| ISOFS |--+ | /var/cache |
| | +--------------+
+---------+

General documentation and documentation of the netfs specific API are provided
in addition to the header files.

As this patch stands, it is possible to build a filesystem against the facility
and attempt to use it. All that will happen is that all requests will be
immediately denied as if no cache is present.

Further patches will implement the core of the facility. The facility will
transfer requests from networking filesystems to appropriate caches if
possible, or else gracefully deny them.

If this facility is disabled in the kernel configuration, then all its
operations will trivially reduce to nothing during compilation.

WHY NOT I_MAPPING?
==================

I have added my own API to implement caching rather than using i_mapping to do
this for a number of reasons. These have been discussed a lot on the LKML and
CacheFS mailing lists, but to summarise the basics:

(1) Most filesystems don't do hole reportage. Holes in files are treated as
blocks of zeros and can't be distinguished otherwise, making it difficult
to distinguish blocks that have been read from the network and cached from
those that haven't.

(2) The backing inode must be fully populated before being exposed to
userspace through the main inode because the VM/VFS goes directly to the
backing inode and does not interrogate the front inode's VM ops.

Therefore:

(a) The backing inode must fit entirely within the cache.

(b) All backed files currently open must fit entirely within the cache at
the same time.

(c) A working set of files in total larger than the cache may not be
cached.

(d) A file may not grow larger than the available space in the cache.

(e) A file that's open and cached, and remotely grows larger than the
cache is potentially stuffed.

(3) Writes go to the backing filesystem, and can only be transferred to the
network when the file is closed.

(4) There's no record of what changes have been made, so the whole file must
be written back.

(5) The pages belong to the backing filesystem, and all metadata associated
with that page are relevant only to the backing filesystem, and not
anything stacked atop it.

OVERVIEW
========

FS-Cache provides (or will provide) the following facilities:

(1) Caches can be added / removed at any time, even whilst in use.

(2) Adds a facility by which tags can be used to refer to caches, even if
they're not available yet.

(3) More than one cache can be used at once. Caches can be selected
explicitly by use of tags.

(4) The netfs is provided with an interface that allows either party to
withdraw caching facilities from a file (required for (1)).

(5) A netfs may annotate cache objects that belong to it. This permits the
storage of coherency maintenance data.

(6) Cache objects will be pinnable and space reservations will be possible.

(7) The interface to the netfs returns as few errors as possible, preferring
rather to let the netfs remain oblivious.

(8) Cookies are used to represent indices, files and other objects to the
netfs. The simplest cookie is just a NULL pointer - indicating nothing
cached there.

(9) The netfs is allowed to propose - dynamically - any index hierarchy it
desires, though it must be aware that the index search function is
recursive, stack space is limited, and indices can only be children of
indices.

(10) Indices can be used to group files together to reduce key size and to make
group invalidation easier. The use of indices may make lookup quicker,
but that's cache dependent.

(11) Data I/O is effectively done directly to and from the netfs's pages. The
netfs indicates that page A is at index B of the data-file represented by
cookie C, and that it should be read or written. The cache backend may or
may not start I/O on that page, but if it does, a netfs callback will be
invoked to indicate completion. The I/O may be either synchronous or
asynchronous.

(12) Cookies can be "retired" upon release. At this point FS-Cache will mark
them as obsolete and the index hierarchy rooted at that point will get
recycled.

(13) The netfs provides a "match" function for index searches. In addition to
saying whether a match was made or not, this can also specify that an
entry should be updated or deleted.

FS-Cache maintains a virtual index tree in which all indices, files, objects
and pages are kept. Bits of this tree may actually reside in one or more
caches.

FSDEF
|
+------------------------------------+
| |
NFS AFS
| |
+--------------------------+ +-----------+
| | | |
homedir mirror afs.org redhat.com
| | |
+------------+ +---------------+ +----------+
| | | | | |
00001 00002 00007 00125 vol00001 vol00002
| | | | |
+---+---+ +-----+ +---+ +------+------+ +-----+----+
| | | | | | | | | | | | |
PG0 PG1 PG2 PG0 XATTR PG0 PG1 DIRENT DIRENT DIRENT R/W R/O Bak
| |
PG0 +-------+
| |
00001 00003
|
+---+---+
| | |
PG0 PG1 PG2

In the example above, two netfs's can be seen to be backed: NFS and AFS. These
have different index hierarchies:

(*) The NFS primary index will probably contain per-server indices. Each
server index is indexed by NFS file handles to get data file objects.
Each data file object can have an array of pages, but may also have
further child objects, such as extended attributes and directory entries.
Extended attribute objects themselves have page-array contents.

(*) The AFS primary index contains per-cell indices. Each cell index contains
per-logical-volume indices. Each volume index contains up to three
indices for the read-write, read-only and backup mirrors of those volumes.
Each of these contains vnode data file objects, each of which contains an
array of pages.

The very top index is the FS-Cache master index in which individual netfs's
have entries.

Any index object may reside in more than one cache, provided it only has index
children. Any index with non-index object children will be assumed to only
reside in one cache.
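
The two structural rules stated in this message (indices can only be children
of indices, and an index may reside in more than one cache only while all of
its children are indices) can be modelled with a toy tree. This is a sketch
for illustration only; the class CacheObject and its methods are hypothetical
and do not correspond to FS-Cache's internal data structures.

```python
class CacheObject:
    """A node in a toy model of the FS-Cache index tree."""

    def __init__(self, name, is_index):
        self.name = name
        self.is_index = is_index
        self.children = []

    def add_child(self, child):
        # Only an index may have children that are themselves indices.
        if child.is_index and not self.is_index:
            raise ValueError("an index can only be the child of an index")
        self.children.append(child)
        return child

    def may_span_caches(self):
        # An index whose children are all indices may live in several caches.
        return self.is_index and all(c.is_index for c in self.children)


if __name__ == "__main__":
    # Rebuild a small corner of the example tree shown earlier.
    fsdef = CacheObject("FSDEF", True)
    nfs = fsdef.add_child(CacheObject("NFS", True))
    homedir = nfs.add_child(CacheObject("homedir", True))
    homedir.add_child(CacheObject("00001", False))
    print(fsdef.may_span_caches(), homedir.may_span_caches())
```

In this model "homedir" is pinned to a single cache as soon as it gains a data
file child, while FSDEF and NFS remain free to span caches.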

The FS-Cache overview can be found in:

Documentation/filesystems/caching/fscache.txt

The netfs API to FS-Cache can be found in:

Documentation/filesystems/caching/netfs-api.txt

Signed-off-by: David Howells
Acked-by: Steve Dickson
Acked-by: Trond Myklebust
Acked-by: Al Viro
Tested-by: Daire Byrne

David Howells
2009-04-03 23:42:36 +0800
266cf658e FS-Cache: Recruit a page flags for cache management

Recruit a page flag to aid in cache management. The following extra flag is
defined:

(1) PG_fscache (PG_private_2)

The marked page is backed by a local cache and is pinning resources in the
cache driver.

If PG_fscache is set, then things that checked for PG_private will now also
check for that. This includes things like truncation and page invalidation.
The function page_has_private() has been added to make the checks for both
PG_private and PG_private_2 at the same time.

Signed-off-by: David Howells
Acked-by: Steve Dickson
Acked-by: Trond Myklebust
Acked-by: Rik van Riel
Acked-by: Al Viro
Tested-by: Daire Byrne

David Howells
2009-04-03 23:42:36 +0800
03fb3d2af FS-Cache: Release page->private after failed readahead

The attached patch causes read_cache_pages() to release page-private data on a
page for which add_to_page_cache() fails. If the filler function fails, then
the problematic page is left attached to the pagecache (with appropriate flags
set, one presumes) and the remaining to-be-attached pages are invalidated and
discarded. This permits pages with caching references associated with them to
be cleaned up.

The invalidatepage() address space op is called (indirectly) to do the honours.

Signed-off-by: David Howells
Acked-by: Steve Dickson
Acked-by: Trond Myklebust
Acked-by: Rik van Riel
Acked-by: Al Viro
Tested-by: Daire Byrne

David Howells
2009-04-03 23:42:35 +0800
8f0aa2f25 Document the slow work thread pool

Document the slow work thread pool.

Signed-off-by: David Howells
Acked-by: Steve Dickson
Acked-by: Trond Myklebust
Acked-by: Al Viro
Tested-by: Daire Byrne

David Howells
2009-04-03 23:42:35 +0800
12e22c5e4 Make the slow work pool configurable

Make the slow work pool configurable through /proc/sys/kernel/slow-work.

(*) /proc/sys/kernel/slow-work/min-threads

The minimum number of threads that should be in the pool as long as it is
in use. This may be anywhere between 2 and max-threads.

(*) /proc/sys/kernel/slow-work/max-threads

The maximum number of threads that should be in the pool. This may be
anywhere between min-threads and 255 or NR_CPUS * 2, whichever is greater.

(*) /proc/sys/kernel/slow-work/vslow-percentage

The percentage of active threads in the pool that may be used to execute
very slow work items. This may be between 1 and 99. The resultant number
is bounded to between 1 and one fewer than the number of active threads.
This ensures there is always at least one thread that can process very
slow work items, and always at least one thread that won't.
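
The bounding rule for vslow-percentage can be worked through with a short
calculation. The sketch below assumes the percentage is applied with integer
(floor) division and that the pool holds at least two active threads; both are
assumptions of the example, and the function vslow_limit() is hypothetical
rather than the kernel's own helper.

```python
def vslow_limit(active_threads, vslow_percentage):
    """Return how many active threads may run very slow work items."""
    if active_threads < 2:
        raise ValueError("the pool is assumed to hold at least two threads")
    if not 1 <= vslow_percentage <= 99:
        raise ValueError("vslow-percentage must be between 1 and 99")
    # Integer (floor) division is an assumption of this sketch.
    share = active_threads * vslow_percentage // 100
    # Clamp to [1, active_threads - 1] as the text describes.
    return max(1, min(share, active_threads - 1))


if __name__ == "__main__":
    for threads, pct in ((4, 50), (2, 1), (10, 99)):
        print(threads, pct, vslow_limit(threads, pct))
```

The clamp is what guarantees both halves of the stated property: the lower
bound keeps one thread available for very slow items, and the upper bound
keeps one thread free of them.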

Signed-off-by: David Howells
Acked-by: Serge Hallyn
Acked-by: Steve Dickson
Acked-by: Trond Myklebust
Acked-by: Al Viro
Tested-by: Daire Byrne

David Howells
2009-04-03 23:42:35 +0800
07fe7cb7c Create a dynamically sized pool of threads for doing very slow work items

Create a dynamically sized pool of threads for doing very slow work items, such
as invoking mkdir() or rmdir() - things that may take a long time and may
sleep, holding mutexes/semaphores and hogging a thread, and are thus unsuitable
for workqueues.

The number of threads is always at least a settable minimum, but more are
started when there's more work to do, up to a limit. Because of the nature of
the load, it's not suitable for a 1-thread-per-CPU type pool. A system with
one CPU may well want several threads.

This is used by FS-Cache to do slow caching operations in the background, such
as looking up, creating or deleting cache objects.

Signed-off-by: David Howells
Acked-by: Serge Hallyn
Acked-by: Steve Dickson
Acked-by: Trond Myklebust
Acked-by: Al Viro
Tested-by: Daire Byrne

David Howells
2009-04-03 23:42:35 +0800
8fe74cf05 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6:
Remove two unneeded exports and make two symbols static in fs/mpage.c
Cleanup after commit 585d3bc06f4ca57f975a5a1f698f65a45ea66225
Trim includes of fdtable.h
Don't crap into descriptor table in binfmt_som
Trim includes in binfmt_elf
Don't mess with descriptor table in load_elf_binary()
Get rid of indirect include of fs_struct.h
New helper - current_umask()
check_unsafe_exec() doesn't care about signal handlers sharing
New locking/refcounting for fs_struct
Take fs_struct handling to new file (fs/fs_struct.c)
Get rid of bumping fs_struct refcount in pivot_root(2)
Kill unsharing fs_struct in __set_personality()

Linus Torvalds
2009-04-03 12:09:10 +0800
f5f7eac41 Allow rwlocks to re-enable interrupts

Pass the original flags to rwlock arch-code, so that it can re-enable
interrupts if implemented for that architecture.

Initially, make __raw_read_lock_flags and __raw_write_lock_flags stubs
which just do the same thing as non-flags variants.

Signed-off-by: Petr Tesarik
Signed-off-by: Robin Holt
Acked-by: Peter Zijlstra
Cc:
Acked-by: Ingo Molnar
Cc: "Luck, Tony"
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Robin Holt
2009-04-03 10:05:11 +0800
e8c158bb3 Factor out #ifdefs from kernel/spinlock.c to LOCK_CONTENDED_FLAGS

SGI has observed that on large systems, interrupts are not serviced for a
long period of time when waiting for a rwlock. The following patch series
re-enables irqs while waiting for the lock, resembling the code which is
already there for spinlocks.

I only made the ia64 version, because the patch adds some overhead to the
fast path. I assume there is currently no demand to have this for other
architectures, because the systems are not so large. Of course, the
possibility to implement raw_{read|write}_lock_flags for any architecture
is still there.

This patch:

The new macro LOCK_CONTENDED_FLAGS expands to the correct implementation
depending on the config options, so that IRQs are re-enabled when
possible, but they remain disabled if CONFIG_LOCKDEP is set.
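
The selection described above can be illustrated in user space. This is only
a sketch under stated assumptions: every mock_* and MOCK_* name is
hypothetical, MOCK_LOCKDEP stands in for CONFIG_LOCKDEP, and no real kernel
locking or interrupt handling is performed.

```c
#include <assert.h>

/* Mock lock type and call counters; none of this is kernel code. */
struct mock_rwlock { int held; };
static int plain_calls, flags_calls;

/* Plain variant: would spin with interrupts left disabled. */
static void mock_read_lock(struct mock_rwlock *l)
{
    plain_calls++;
    l->held = 1;
}

/* Flags variant: an arch could use `flags` to re-enable IRQs while waiting. */
static void mock_read_lock_flags(struct mock_rwlock *l, unsigned long flags)
{
    (void)flags;            /* the mock does not touch interrupt state */
    flags_calls++;
    l->held = 1;
}

#ifdef MOCK_LOCKDEP
/* Debugging build (assumption): ignore the flags variant, IRQs stay off. */
#define MOCK_LOCK_CONTENDED_FLAGS(l, lock, lockfl, flags) lock(l)
#else
/* Normal build (assumption): hand the saved flags to the flags variant. */
#define MOCK_LOCK_CONTENDED_FLAGS(l, lock, lockfl, flags) lockfl((l), (flags))
#endif

/* Hypothetical caller: one call site, implementation picked at build time. */
static void mock_read_lock_irqsave(struct mock_rwlock *l, unsigned long flags)
{
    MOCK_LOCK_CONTENDED_FLAGS(l, mock_read_lock, mock_read_lock_flags, flags);
}
```

Built without MOCK_LOCKDEP, mock_read_lock_irqsave() reaches the flags
variant; built with -DMOCK_LOCKDEP, the same call site uses the plain one.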

Signed-off-by: Petr Tesarik
Signed-off-by: Robin Holt
Cc: Ingo Molnar
Cc: Peter Zijlstra
Cc: "Luck, Tony"
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Robin Holt
2009-04-03 10:05:10 +0800
f3554f4bc preadv/pwritev: Add preadv and pwritev system calls.

This patch adds preadv and pwritev system calls. These syscalls are a
pretty straightforward combination of pread and readv (same for write).
They are quite useful for doing vectored I/O in threaded applications.
Using lseek+readv instead opens race windows you'll have to plug with
locking.
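
As a rough illustration of the point above, a positioned vectored read might
look like the sketch below. It assumes the glibc preadv() wrapper is
available; read_record_at() and demo_preadv() are hypothetical names used
only for this example.

```c
#define _DEFAULT_SOURCE         /* exposes preadv() and mkstemp() in glibc */
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/*
 * Hypothetical helper: fill `hdr` and then `body` from `fd`, starting at
 * `offset`, with a single call. The file position of `fd` is not used or
 * changed, so threads sharing `fd` need no lock around a seek.
 */
static ssize_t read_record_at(int fd, off_t offset,
                              void *hdr, size_t hdr_len,
                              void *body, size_t body_len)
{
    struct iovec iov[2];

    assert(hdr != NULL && body != NULL);
    iov[0].iov_base = hdr;
    iov[0].iov_len = hdr_len;
    iov[1].iov_base = body;
    iov[1].iov_len = body_len;
    return preadv(fd, iov, 2, offset);
}

/* Self-check on a scratch file; returns 0 on success, -1 on any failure. */
static int demo_preadv(void)
{
    char path[] = "/tmp/preadv-demo-XXXXXX";
    char hdr[2], body[3];
    off_t before;
    ssize_t n;
    int ok;
    int fd = mkstemp(path);

    if (fd < 0)
        return -1;
    unlink(path);                       /* file vanishes once fd is closed */
    if (write(fd, "0123456789", 10) != 10) {
        close(fd);
        return -1;
    }
    before = lseek(fd, 0, SEEK_CUR);    /* position left behind by write() */
    n = read_record_at(fd, 4, hdr, sizeof hdr, body, sizeof body);
    ok = n == 5 &&
         memcmp(hdr, "45", 2) == 0 &&
         memcmp(body, "678", 3) == 0 &&
         lseek(fd, 0, SEEK_CUR) == before;   /* position was not moved */
    close(fd);
    return ok ? 0 : -1;
}
```

Calling demo_preadv() from a test program exercises the helper; the
lseek+readv equivalent would need a lock held across both calls.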

Other systems, for example NetBSD, have such system calls too; see
http://www.daemon-systems.org/man/preadv.2.html

The application-visible interface provided by glibc should look like
this to be compatible with the existing implementations in the *BSD family:

ssize_t preadv(int d, const struct iovec *iov, int iovcnt, off_t offset);
ssize_t pwritev(int d, const struct iovec *iov, int iovcnt, off_t offset);

This prototype has one problem though: on 32-bit archs the (64-bit)
offset argument is unaligned, which the syscall ABI of several archs
doesn't allow. At least s390 needs a wrapper in glibc to handle this. As
we'll need wrappers in glibc anyway, I've decided to push the problem to
glibc entirely and use a syscall prototype which works without
arch-specific wrappers inside the kernel: the offset argument is
explicitly split into two 32-bit values.

The patch contains the actual system call implementation and wires it up
in the x86 system call tables. Other archs follow as separate patches.

Signed-off-by: Gerd Hoffmann
Cc: Arnd Bergmann
Cc: Al Viro
Cc: Ralf Baechle
Cc: Ingo Molnar
Cc: Thomas Gleixner
Cc: "H. Peter Anvin"
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Gerd Hoffmann
2009-04-03 10:05:08 +0800
04d491ab2 kexec: add dmesg log symbols to /proc/vmcoreinfo lists

It would be nice to be able to extract the dmesg log from a vmcore file
without needing to keep the debug symbols for the running kernel handy all
the time. We have a facility to do this in /proc/vmcore. This patch adds
the log_buf and log_end symbols to the vmcoreinfo area so that tools (like
makedumpfile) can easily extract the dmesg logs from a vmcore image.
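
How a tool might pick these symbols out of the vmcoreinfo text can be
sketched as below. This assumes entries appear as "SYMBOL(name)=hexaddress"
lines; vmcoreinfo_symbol() is a hypothetical helper, not part of
makedumpfile or the kernel.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Look up a "SYMBOL(<name>)=<hex>" line in a vmcoreinfo-style text blob
 * (line format is an assumption). Returns 0 and stores the address on
 * success, -1 if the symbol is missing or malformed.
 */
static int vmcoreinfo_symbol(const char *text, const char *name,
                             unsigned long long *addr)
{
    char key[128];
    const char *line;
    int n;

    assert(text != NULL && name != NULL && addr != NULL);
    n = snprintf(key, sizeof key, "SYMBOL(%s)=", name);
    if (n < 0 || (size_t)n >= sizeof key)
        return -1;                      /* name too long for the key buffer */

    for (line = text; line != NULL && *line != '\0'; ) {
        if (strncmp(line, key, (size_t)n) == 0) {
            char *end;
            unsigned long long v = strtoull(line + n, &end, 16);

            if (end == line + n)
                return -1;              /* no hex digits after '=' */
            *addr = v;
            return 0;
        }
        line = strchr(line, '\n');      /* advance to the next line */
        if (line != NULL)
            line++;
    }
    return -1;
}
```

With log_buf and log_end located this way, a tool would still have to read
the buffer contents from the vmcore image itself.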

[akpm@linux-foundation.org: several fixes and cleanups]
[akpm@linux-foundation.org: fix unused log_buf_kexec_setup()]
[akpm@linux-foundation.org: build fix]
Signed-off-by: Neil Horman
Cc: Simon Horman
Acked-by: Vivek Goyal
Cc: Neil Horman
Cc: Simon Horman
Cc: Vivek Goyal
Cc: Randy Dunlap
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Neil Horman
2009-04-03 10:05:04 +0800
7c5ff4f92 pci: Add AMD8111 PCI Bridge PCI Device ID

Add the PCI Device ID of the PCI Bridge Controller on AMD8111 chip.

Signed-off-by: Harry Ciao
Cc: Doug Thompson
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Harry Ciao
2009-04-03 10:05:03 +0800
1b0f7ffd0 pids: kill signal_struct->__pgrp/__session and friends

We are wasting 2 words in signal_struct, for no good reason, just to
implement task_pgrp_nr() and task_session_nr().

task_session_nr() has no callers since
2e2ba22ea4fd4bb85f0fa37c521066db6775cbef, we can remove it.

task_pgrp_nr() is still (I believe wrongly) used in fs/autofsX and
fs/coda.

This patch reimplements task_pgrp_nr() via task_pgrp_nr_ns(), and kills
__pgrp/__session and the related helpers.

The change in drivers/char/tty_io.c is cosmetic, but hopefully makes sense
anyway.

Signed-off-by: Oleg Nesterov
Acked-by: Alan Cox [tty parts]
Cc: Cedric Le Goater
Cc: Dave Hansen
Cc: Eric Biederman
Cc: Pavel Emelyanov
Cc: Serge Hallyn
Cc: Sukadev Bhattiprolu
Cc: Roland McGrath
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Oleg Nesterov
2009-04-03 10:05:02 +0800
52ee2dfdd pids: refactor vnr/nr_ns helpers to make them safe

IMHO, the safety rules for vnr/nr_ns helpers are horrible and buggy.

task_pid_nr_ns(task) needs rcu/tasklist depending on task == current.

As for "special" pids, vnr/nr_ns helpers always need rcu. However, if
task != current, they are unsafe even under rcu lock, because we can't
trust task->group_leader without the special checks.

And almost every helper has a callsite which needs a fix.

Also, it is a bit annoying that the implementations of, say,
task_pgrp_vnr() and task_pgrp_nr_ns() are not "symmetrical".

This patch introduces the new helper, __task_pid_nr_ns(), which is always
safe to use, and turns all other helpers into the trivial wrappers.
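
The shape of this refactoring, one checked helper with thin wrappers on top,
can be sketched in user space. All mock_* names are hypothetical; namespaces
and the RCU locking the real helper needs are deliberately left out.

```c
#include <assert.h>
#include <stddef.h>

/* Mock pid types and task; a stand-in for illustration only. */
enum mock_pid_type { MOCK_PID, MOCK_PGID, MOCK_SID, MOCK_TYPE_MAX };

struct mock_task {
    int alive;                  /* 0 once the task has gone away */
    int nr[MOCK_TYPE_MAX];      /* numeric id for each pid type */
};

/*
 * The single place that performs the safety checks. Returns 0 when the
 * task is missing or no longer alive, otherwise the requested id.
 */
static int mock_task_pid_nr_ns(const struct mock_task *t,
                               enum mock_pid_type type)
{
    assert(type < MOCK_TYPE_MAX);
    if (t == NULL || !t->alive)
        return 0;
    return t->nr[type];
}

/* Trivial, symmetrical wrappers: no checks of their own. */
static int mock_task_pid_nr(const struct mock_task *t)
{
    return mock_task_pid_nr_ns(t, MOCK_PID);
}

static int mock_task_pgrp_nr(const struct mock_task *t)
{
    return mock_task_pid_nr_ns(t, MOCK_PGID);
}

static int mock_task_session_nr(const struct mock_task *t)
{
    return mock_task_pid_nr_ns(t, MOCK_SID);
}
```

Keeping the checks in one function means a fix there covers every wrapper.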

After this I'll send another patch which converts task_tgid_xxx() as well;
they are a bit special.

Signed-off-by: Oleg Nesterov
Cc: Louis Rilling
Cc: "Eric W. Biederman"
Cc: Pavel Emelyanov
Cc: Sukadev Bhattiprolu
Cc: Roland McGrath
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Oleg Nesterov
2009-04-03 10:05:02 +0800
6dda81f43 pids: document task_pgrp/task_session is not safe without tasklist/rcu

Even if task == current, it is not safe to dereference the result of
task_pgrp/task_session. We can race with another thread which changes the
special pid via setpgid/setsid.

Document this. The next 2 patches give an example of the unsafe usage; we
have more bad users.

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Oleg Nesterov
Cc: Louis Rilling
Cc: "Eric W. Biederman"
Cc: Pavel Emelyanov
Cc: Sukadev Bhattiprolu
Cc: Roland McGrath
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Oleg Nesterov
2009-04-03 10:05:02 +0800
1f80769ff synclink_gt: add clock options

Add support for x8 asynchronous sample rate and ability to specify base
clock frequency.

Signed-off-by: Paul Fulghum
Acked-by: Alan Cox
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Paul Fulghum
2009-04-03 10:05:01 +0800
a50b0aa4b struct linux_binprm: drop unused fields

Signed-off-by: Kirill A. Shutemov
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Kirill A. Shutemov
2009-04-03 10:05:01 +0800
40e8a10de cpu hotplug: remove unused cpuhotplug_mutex_lock()

cpuhotplug_mutex_lock() is not used, remove it.

Signed-off-by: Lai Jiangshan
Cc: Ingo Molnar
Cc: Rusty Russell
Acked-by: Gautham R Shenoy
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Lai Jiangshan
2009-04-03 10:05:00 +0800
bb24c679a tracehook_notify_death: use task_detached() helper

Now that task_detached() is exported, change tracehook_notify_death() to
use this helper; nobody else checks ->exit_signal == -1 by hand.

Signed-off-by: Oleg Nesterov
Cc: "Eric W. Biederman"
Cc: "Metzger, Markus T"
Acked-by: Roland McGrath
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Oleg Nesterov
2009-04-03 10:05:00 +0800
39c626ae4 forget_original_parent: split out the un-ptrace part

By discussion with Roland.

- Rename ptrace_exit() to exit_ptrace(), and change it to do all the
necessary work with the ->ptraced list on its own.

- Move this code from exit.c to ptrace.c

- Update the comment in ptrace_detach() to explain the rechecking of
the child->ptrace.

Signed-off-by: Oleg Nesterov
Cc: "Eric W. Biederman"
Cc: "Metzger, Markus T"
Cc: Roland McGrath
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Oleg Nesterov
2009-04-03 10:05:00 +0800
4576145c1 ptrace: fix possible zombie leak on PTRACE_DETACH

When ptrace_detach() takes tasklist, the tracee can be SIGKILL'ed. If it
has already passed exit_notify() we can leak a zombie, because a) ptracing
disables the auto-reaping logic, and b) ->real_parent was not notified
about the child's death.

ptrace_detach() should follow ptrace_exit()'s logic; change the code
accordingly.

Signed-off-by: Oleg Nesterov
Cc: Jerome Marchand
Cc: Roland McGrath
Tested-by: Denys Vlasenko
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Oleg Nesterov
2009-04-03 10:04:59 +0800