Eric Lee / linux-smarc-t335x-v3.2

12 Aug, 2010

1 commit

bf25db365 Merge branch 'for-linus' of git://git.open-osd.org/linux-open-osd ... Browse Code »

* 'for-linus' of git://git.open-osd.org/linux-open-osd:
exofs: Fix groups code when num_devices is not divisible by group_width
exofs: Remove useless optimization
exofs: exofs_file_fsync and exofs_file_flush correctness
exofs: Remove superfluous dependency on buffer_head and writeback

Linus Torvalds
2010-08-12 00:19:43 +0800

11 Aug, 2010

1 commit

2f9e825d3 Merge branch 'for-2.6.36' of git://git.kernel.dk/linux-2.6-block ... Browse Code »

* 'for-2.6.36' of git://git.kernel.dk/linux-2.6-block: (149 commits)
block: make sure that REQ_* types are seen even with CONFIG_BLOCK=n
xen-blkfront: fix missing out label
blkdev: fix blkdev_issue_zeroout return value
block: update request stacking methods to support discards
block: fix missing export of blk_types.h
writeback: fix bad _bh spinlock nesting
drbd: revert "delay probes", feature is being re-implemented differently
drbd: Initialize all members of sync_conf to their defaults [Bugz 315]
drbd: Disable delay probes for the upcomming release
writeback: cleanup bdi_register
writeback: add new tracepoints
writeback: remove unnecessary init_timer call
writeback: optimize periodic bdi thread wakeups
writeback: prevent unnecessary bdi threads wakeups
writeback: move bdi threads exiting logic to the forker thread
writeback: restructure bdi forker loop a little
writeback: move last_active to bdi
writeback: do not remove bdi from bdi_list
writeback: simplify bdi code a little
writeback: do not lose wake-ups in bdi threads
...

Fixed up pretty trivial conflicts in drivers/block/virtio_blk.c and
drivers/scsi/scsi_error.c as per Jens.

Linus Torvalds
2010-08-11 06:22:42 +0800

10 Aug, 2010

3 commits

4ec70c9b4 convert exofs to ->evict_inode() ... Browse Code »

Signed-off-by: Al Viro

Al Viro
2010-08-10 04:48:24 +0800
2f246fd0f exofs: New truncate sequence ... Browse Code »

These changes are crafted based on the similar
conversion done to ext2 by Nick Piggin.

* Remove the deprecated ->truncate vector. Let exofs_setattr
take care of on-disk size updates.
* Call truncate_pagecache on the unused pages if
write_begin/end fails.
* Cleanup exofs_delete_inode that did stupid inode
writes and updates on an inode that will be
removed.
* And finally get rid of exofs_get_block. We never
had any blocks it was all for calling nobh_truncate_page.
nobh_truncate_page is not actually needed in exofs since
the last page is complete and gone, just like all the other
pages. There is no partial blocks in exofs.

I've tested with this patch, and there are no apparent
failures, so far.

CC: Nick Piggin
CC: Christoph Hellwig
Signed-off-by: Boaz Harrosh
Signed-off-by: Al Viro

Boaz Harrosh
2010-08-10 04:47:41 +0800
1025774ce remove inode_setattr ... Browse Code »

Replace inode_setattr with opencoded variants of it in all callers. This
moves the remaining call to vmtruncate into the filesystem methods where it
can be replaced with the proper truncate sequence.

In a few cases it was obvious that we would never end up calling vmtruncate
so it was left out in the opencoded variant:

spufs: explicitly checks for ATTR_SIZE earlier
btrfs,hugetlbfs,logfs,dlmfs: explicitly clears ATTR_SIZE earlier
ufs: contains an opencoded simple_seattr + truncate that sets the filesize just above

In addition to that ncpfs called inode_setattr with handcrafted iattrs,
which allowed to trim down the opencoded variant.

Signed-off-by: Christoph Hellwig
Signed-off-by: Al Viro

Christoph Hellwig
2010-08-10 04:47:37 +0800

08 Aug, 2010

1 commit

7b6d91dae block: unify flags for struct bio and struct request ... Browse Code »

Remove the current bio flags and reuse the request flags for the bio, too.
This allows to more easily trace the type of I/O from the filesystem
down to the block driver. There were two flags in the bio that were
missing in the requests: BIO_RW_UNPLUG and BIO_RW_AHEAD. Also I've
renamed two request flags that had a superflous RW in them.

Note that the flags are in bio.h despite having the REQ_ name - as
blkdev.h includes bio.h that is the only way to go for now.

Signed-off-by: Christoph Hellwig
Signed-off-by: Jens Axboe

Christoph Hellwig
2010-08-08 00:20:39 +0800

04 Aug, 2010

4 commits

5002dd18c exofs: Fix groups code when num_devices is not divisible by group_width ... Browse Code »

There is a bug when num_devices is not divisible by group_width * mirrors.
We would not return to the proper device and offset when looping on to the
next group.

The fix makes code simpler actually.

Signed-off-by: Boaz Harrosh

Boaz Harrosh
2010-08-04 18:17:58 +0800
6e31609b1 exofs: Remove useless optimization ... Browse Code »

We used to compact all used devices in an IO to the beginning
of the device array in an io_state. And keep a last device used
so in later loops we don't iterate on all device slots. This
does not prevent us from checking if slots are empty since in
reads we only read from a single mirror and jump to the next
mirror-set.

This optimization is marginal, and needlessly complicates the
code. Specially when we will later want to support raid/456
with same abstract code. So remove the distinction between
"dev" and "comp". Only "dev" is used both as the device used
and as the index (component) in the device array.

[Note that now the io_state->dev member is redundant but I
keep it because I might want to optimize by only IOing a
single group, though keeping a group_width*mirrors devices
in io_state, we now keep num-devices in each io_state]

Signed-off-by: Boaz Harrosh

Boaz Harrosh
2010-08-04 18:17:57 +0800
b28483492 exofs: exofs_file_fsync and exofs_file_flush correctness ... Browse Code »

As per Christoph advise: no need to call filemap_write_and_wait().
In exofs all metadata is at the inode so just writing the inode is
all is needed. ->fsync implies this must be done synchronously.

But now exofs_file_fsync can not be used by exofs_file_flush.
vfs_fsync() should do that job correctly.

FIXME: remove the sb_sync and fix that sb_update better.

Signed-off-by: Boaz Harrosh

Boaz Harrosh
2010-08-04 18:17:56 +0800
85dc7878c exofs: Remove superfluous dependency on buffer_head and writeback ... Browse Code »

exofs_releasepage && exofs_invalidatepage are never called.
Leave the WARN_ONs but remove any code. Remove the

cleanup other stale #includes.

Signed-off-by: Boaz Harrosh

Boaz Harrosh
2010-08-04 18:17:55 +0800

28 May, 2010

1 commit

7ea808591 drop unused dentry argument to ->fsync ... Browse Code »

Signed-off-by: Christoph Hellwig
Signed-off-by: Al Viro

Christoph Hellwig
2010-05-28 10:05:02 +0800

24 May, 2010

1 commit

0163916f1 Merge branch 'for-linus' of git://git.open-osd.org/linux-open-osd ... Browse Code »

* 'for-linus' of git://git.open-osd.org/linux-open-osd:
exofs: confusion between kmap() and kmap_atomic() api
exofs: Add default address_space_operations

Linus Torvalds
2010-05-24 22:57:41 +0800

22 May, 2010

1 commit

e00117f14 exofs: replace inode uid,gid,mode initialization with helper function ... Browse Code »

Ack-by: Boaz Harrosh
Signed-off-by: Dmitry Monakhov
Signed-off-by: Al Viro

Dmitry Monakhov
2010-05-22 06:31:23 +0800

17 May, 2010

2 commits

ddf08f4b9 exofs: confusion between kmap() and kmap_atomic() api ... Browse Code »

For kmap_atomic() we call kunmap_atomic() on the returned pointer.
That's different from kmap() and kunmap() and so it's easy to get them
backwards.

Cc: Stable
Signed-off-by: Dan Carpenter
Signed-off-by: Boaz Harrosh

Dan Carpenter
2010-05-17 18:50:58 +0800
200b07004 exofs: Add default address_space_operations ... Browse Code »

All vectors of address_space_operations should be initialized
by the filesystem. Add the missing parts.

This is actually an optimization, by using
__set_page_dirty_nobuffers. The default, in case of NULL,
would be __set_page_dirty_buffers which has these extar if(s).

.releasepage && .invalidatepage should both not be called
because page_private() is NULL in exofs. Put a WARN_ON if
they are called, to indicate the Kernel has changed in this
regard, if when it does.

Signed-off-by: Boaz Harrosh

Boaz Harrosh
2010-05-17 18:50:50 +0800

29 Apr, 2010

1 commit

a36fed12a exofs: Fix "add bdi backing to mount session" fall out ... Browse Code »

Commit b3d0ab7e60d1865bb6f6a79a77aaba22f2543236 ("exofs: add bdi backing
to mount session") has a bug in the placement of the bdi member at
struct exofs_sb_info. The layout member must be kept last.

Signed-off-by: Boaz Harrosh
Acked-by: Jens Axboe
Signed-off-by: Linus Torvalds

Boaz Harrosh
2010-04-29 22:59:16 +0800

22 Apr, 2010

1 commit

b3d0ab7e6 exofs: add bdi backing to mount session ... Browse Code »

This ensures that dirty data gets flushed properly.

Signed-off-by: Jens Axboe

Jens Axboe
2010-04-22 18:26:04 +0800

30 Mar, 2010

1 commit

5a0e3ad6a include cleanup: Update gfp.h and slab.h includes to prepare for breaking implic… ... Browse Code »

…it slab.h inclusion from percpu.h

percpu.h is included by sched.h and module.h and thus ends up being
included when building most .c files. percpu.h includes slab.h which
in turn includes gfp.h making everything defined by the two files
universally available and complicating inclusion dependencies.

percpu.h -> slab.h dependency is about to be removed. Prepare for
this change by updating users of gfp and slab facilities include those
headers directly instead of assuming availability. As this conversion
needs to touch large number of source files, the following script is
used as the basis of conversion.

http://userweb.kernel.org/~tj/misc/slabh-sweep.py

The script does the followings.

* Scan files for gfp and slab usages and update includes such that
only the necessary includes are there. ie. if only gfp is used,
gfp.h, if slab is used, slab.h.

* When the script inserts a new include, it looks at the include
blocks and try to put the new include such that its order conforms
to its surrounding. It's put in the include block which contains
core kernel includes, in the same order that the rest are ordered -
alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
doesn't seem to be any matching order.

* If the script can't find a place to put a new include (mostly
because the file doesn't have fitting include block), it prints out
an error message indicating which .h file needs to be added to the
file.

The conversion was done in the following steps.

1. The initial automatic conversion of all .c files updated slightly
over 4000 files, deleting around 700 includes and adding ~480 gfp.h
and ~3000 slab.h inclusions. The script emitted errors for ~400
files.

2. Each error was manually checked. Some didn't need the inclusion,
some needed manual addition while adding it to implementation .h or
embedding .c file was more appropriate for others. This step added
inclusions to around 150 files.

3. The script was run again and the output was compared to the edits
from #2 to make sure no file was left behind.

4. Several build tests were done and a couple of problems were fixed.
e.g. lib/decompress_*.c used malloc/free() wrappers around slab
APIs requiring slab.h to be added manually.

5. The script was run on all .h files but without automatically
editing them as sprinkling gfp.h and slab.h inclusions around .h
files could easily lead to inclusion dependency hell. Most gfp.h
inclusion directives were ignored as stuff from gfp.h was usually
wildly available and often used in preprocessor macros. Each
slab.h inclusion directive was examined and added manually as
necessary.

6. percpu.h was updated not to include slab.h.

7. Build test were done on the following configurations and failures
were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my
distributed build env didn't work with gcov compiles) and a few
more options had to be turned off depending on archs to make things
build (like ipr on powerpc/64 which failed due to missing writeq).

* x86 and x86_64 UP and SMP allmodconfig and a custom test config.
* powerpc and powerpc64 SMP allmodconfig
* sparc and sparc64 SMP allmodconfig
* ia64 SMP allmodconfig
* s390 SMP allmodconfig
* alpha SMP allmodconfig
* um on x86_64 SMP allmodconfig

8. percpu.h modifications were reverted so that it could be applied as
a separate patch and serve as bisection point.

Given the fact that I had only a couple of failures from tests on step
6, I'm fairly confident about the coverage of this conversion patch.
If there is a breakage, it's likely to be something in one of the arch
headers which should be easily discoverable easily on most builds of
the specific arch.

Signed-off-by: Tejun Heo <tj@kernel.org>
Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>

Tejun Heo
2010-03-30 21:02:32 +0800

06 Mar, 2010

1 commit

a9185b41a pass writeback_control to ->write_inode ... Browse Code »

This gives the filesystem more information about the writeback that
is happening. Trond requested this for the NFS unstable write handling,
and other filesystems might benefit from this too by beeing able to
distinguish between the different callers in more detail.

Signed-off-by: Christoph Hellwig
Signed-off-by: Al Viro

Christoph Hellwig
2010-03-06 02:25:52 +0800

28 Feb, 2010

11 commits

50a76fd3c exofs: groups support ... Browse Code »

* _calc_stripe_info() changes to accommodate for grouping
calculations. Returns additional information

* old _prepare_pages() becomes _prepare_one_group()
which stores pages belonging to one device group.

* New _prepare_for_striping iterates on all groups calling
_prepare_one_group().

* Enable mounting of groups data_maps (group_width != 0)

[QUESTION]
what is faster A or B;
A. x += stride;
x = x % width + first_x;

B x += stride
if (x < last_x)
x = first_x;

Signed-off-by: Boaz Harrosh

Boaz Harrosh
2010-02-28 19:55:53 +0800
b367e78bd exofs: Prepare for groups ... Browse Code »

* Rename _offset_dev_unit_off() to _calc_stripe_info()
and recieve a struct for the output params

* In _prepare_for_striping we only need to call
_calc_stripe_info() once. The other componets
are easy to calculate from that. This code
was inspired by what's done in truncate.

* Some code shifts that make sense now but will make
more sense when group support is added.

Signed-off-by: Boaz Harrosh

Boaz Harrosh
2010-02-28 19:44:44 +0800
96391e2ba exofs: Error recovery if object is missing from storage ... Browse Code »

If an object is referenced by a directory but does not
exist on a target, it is a very serious corruption that
means:
1. Either a power failure with very slim chance of it
happening. Because the directory update is always submitted
much after object creation, but if a directory is written
to one device and the object creation to another it might
theoretically happen.
2. It only ever happened to me while developing with BUGs
causing file corruption. Crashes could also cause it but
they are more like case 1.

In any way the object does not exist, so data is surely lost.
If there is a mix-up in the obj-id or data-map, then lost objects
can be salvaged by off-line fsck. The only recoverable information
is the directory name. By letting it appear as a regular empty file,
with date==0 (1970 Jan 1st) ownership to root, we enable recovery
of the only useful information. And also enable deletion or over-write.
I can see how this can hurt.

Signed-off-by: Boaz Harrosh

Boaz Harrosh
2010-02-28 19:44:43 +0800
86093aaff exofs: convert io_state to use pages array instead of bio at input ... Browse Code »

* inode.c operations are full-pages based, and not actually
true scatter-gather
* Lets us use more pages at once upto 512 (from 249) in 64 bit
* Brings us much much closer to be able to use exofs's io_state engine
from objlayout driver. (Once I decide where to put the common code)

After RAID0 patch the outer (input) bio was never used as a bio, but
was simply a page carrier into the raid engine. Even in the simple
mirror/single-dev arrangement pages info was copied into a second bio.
It is now easer to just pass a pages array into the io_state and prepare
bio(s) once.

Signed-off-by: Boaz Harrosh

Boaz Harrosh
2010-02-28 19:44:42 +0800
5d952b839 exofs: RAID0 support ... Browse Code »

We now support striping over mirror devices. Including variable sized
stripe_unit.

Some limits:
* stripe_unit must be a multiple of PAGE_SIZE
* stripe_unit * stripe_count is maximum upto 32-bit (4Gb)

Tested RAID0 over mirrors, RAID0 only, mirrors only. All check.

Design notes:
* I'm not using a vectored raid-engine mechanism yet. Following the
pnfs-objects-layout data-map structure, "Mirror" is just a private
case of "group_width" == 1, and RAID0 is a private case of
"Mirrors" == 1. The performance lose of the general case over the
particular special case optimization is totally negligible, also
considering the extra code size.

* In general I added a prepare_stripes() stage that divides the
to-be-io pages to the participating devices, the previous
exofs_ios_write/read, now becomes _write/read_mirrors and a new
write/read upper layer loops on all devices calling
_write/read_mirrors. Effectively the prepare_stripes stage is the all
secret.
Also truncate need fixing to accommodate for striping.

* In a RAID0 arrangement, in a regular usage scenario, if all inode
layouts will start at the same device, the small files fill up the
first device and the later devices stay empty, the farther the device
the emptier it is.

To fix that, each inode will start at a different stripe_unit,
according to it's obj_id modulus number-of-stripe-units. And
will then span all stripe-units in the same incrementing order
wrapping back to the beginning of the device table. We call it
a stripe-units moving window.

Special consideration was taken to keep all devices in a mirror
arrangement identical. So a broken osd-device could just be cloned
from one of the mirrors and no FS scrubbing is needed. (We do that
by rotating stripe-unit at a time and not a single device at a time.)

TODO:
We no longer verify object_length == inode->i_size in exofs_iget.
(since i_size is stripped on multiple objects now).
I should introduce a multiple-device attribute reading, and use
it in exofs_iget.

Signed-off-by: Boaz Harrosh

Boaz Harrosh
2010-02-28 19:43:08 +0800
d9c740d22 exofs: Define on-disk per-inode optional layout attribute ... Browse Code »

* Layouts describe the way a file is spread on multiple devices.
The layout information is stored in the objects attribute introduced
in this patch.

* There can be multiple generating function for the layout.
Currently defined:
- No attribute present - use below moving-window on global
device table, all devices.
(This is the only one currently used in exofs)
- an obj_id generated moving window - the obj_id is a randomizing
factor in the otherwise global map layout.
- An explicit layout stored, including a data_map and a device
index list.
- More might be defined in future ...

* There are two attributes defined of the same structure:
A-data-files-layout - This layout is used by data-files. If present
at a directory, all files of that directory will
be created with this layout.
A-meta-data-layout - This layout is used by a directory and other
meta-data information. Also inherited at creation
of subdirectories.

* At creation time inodes are created with the layout specified above.
A usermode utility may change the creation layout on a give directory
or file. Which in the case of directories, will also apply to newly
created files/subdirectories, children of that directory.
In the simple unaltered case of a newly created exofs, no layout
attributes are present, and all layouts adhere to the layout specified
at the device-table.

* In case of a future file system loaded in an old exofs-driver.
At iget(), the generating_function is inspected and if not supported
will return an IO error to the application and the inode will not
be loaded. So not to damage any data.
Note: After this patch we do not yet support any type of layout
only the RAID0 patch that enables striping at the super-block
level will add support for RAID0 layouts above. This way we
are past and future compatible and fully bisectable.

* Access to the device table is done by an accessor since
it will change according to above information.

Signed-off-by: Boaz Harrosh

Boaz Harrosh
2010-02-28 19:35:28 +0800
46f4d973f exofs: unindent exofs_sbi_read ... Browse Code »

The original idea was that a mirror read can be sub-divided
to multiple devices. But this has very little gain and only
at very large IOes so it's not going to be implemented soon.

Signed-off-by: Boaz Harrosh

Boaz Harrosh
2010-02-28 19:35:27 +0800
45d3abcb1 exofs: Move layout related members to a layout structure ... Browse Code »

* Abstract away those members in exofs_sb_info that are related/needed
by a layout into a new exofs_layout structure. Embed it in exofs_sb_info.

* At exofs_io_state receive/keep a pointer to an exofs_layout. No need for
an exofs_sb_info pointer, all we need is at exofs_layout.

* Change any usage of above exofs_sb_info members to their new name.

Signed-off-by: Boaz Harrosh

Boaz Harrosh
2010-02-28 19:35:27 +0800
22ddc5563 exofs: Recover in the case of read-passed-end-of-file ... Browse Code »

In check_io, implement the case of reading passed end of
file, by clearing the pages and recover with no error. In
a raid arrangement this can become a legitimate situation
in case of holes in the file.

Signed-off-by: Boaz Harrosh

Boaz Harrosh
2010-02-28 19:35:26 +0800
518f167a3 exofs: Micro-optimize exofs_i_info ... Browse Code »

optimize the exofs_i_info struct usage by moving the embedded
vfs_inode to be first. A compiler might optimize away an "add"
operation with constant zero. (Which it cannot with other constants)

Signed-off-by: Boaz Harrosh

Boaz Harrosh
2010-02-28 19:35:25 +0800
34ce4e7c2 exofs: debug print even less ... Browse Code »

* Last debug trimming left in some stupid print, remove them.
Fixup some other prints
* Shift printing from inode.c to ios.c
* Add couple of prints when memory allocation fails.

Signed-off-by: Boaz Harrosh

Boaz Harrosh
2010-02-28 19:35:25 +0800

05 Jan, 2010

2 commits

efd124b99 exofs: simple_write_end does not mark_inode_dirty ... Browse Code »

exofs uses simple_write_end() for it's .write_end handler. But
it is not enough because simple_write_end() does not call
mark_inode_dirty() when it extends i_size. So even if we do
call mark_inode_dirty at beginning of write out, with a very
long IO and a saturated system we might get the .write_inode()
called while still extend-writing to file and miss out on the last
i_size updates.

So override .write_end, call simple_write_end(), and afterwords if
i_size was changed call mark_inode_dirty().

It stands to logic that since simple_write_end() was the one extending
i_size it should also call mark_inode_dirty(). But it looks like all
users of simple_write_end() are memory-bound pseudo filesystems, who
could careless about mark_inode_dirty(). I might submit a
warning-comment patch to simple_write_end() in future.

CC: Stable
Signed-off-by: Boaz Harrosh

Boaz Harrosh
2010-01-05 15:14:32 +0800
89be50302 exofs: fix pnfs_osd re-definitions in pre-pnfs trees ... Browse Code »

Some on disk exofs constants and types are defined in the pnfs_osd_xdr.h
file. Since we needed these types before the pnfs-objects code was
accepted to mainline we duplicated the minimal needed definitions into
an exofs local header. The definitions where conditionally included
depending on !CONFIG_PNFS defined. So if PNFS was present in the tree
definitions are taken from there and if not they are defined locally.

That was all good but, the CONFIG_PNFS is planed to be included upstream
before the pnfs-objects is also included. (The first pnfs batch might be
pnfs-files only)

So condition exofs local definitions on the absence of pnfs_osd_xdr.h
inclusion (__PNFS_OSD_XDR_H__ not defined). User code must make sure
that in future pnfs_osd_xdr.h will be included before fs/exofs/pnfs.h,
which happens to be so in current code.

Once pnfs-objects hits mainline, exofs's local header will be removed.

Signed-off-by: Boaz Harrosh

Boaz Harrosh
2010-01-05 15:14:32 +0800

10 Dec, 2009

8 commits

04dc1e88a exofs: Multi-device mirror support ... Browse Code »

This patch changes on-disk format, it is accompanied with a parallel
patch to mkfs.exofs that enables multi-device capabilities.

After this patch, old exofs will refuse to mount a new formatted FS and
new exofs will refuse an old format. This is done by moving the magic
field offset inside the FSCB. A new FSCB *version* field was added. In
the future, exofs will refuse to mount unmatched FSCB version. To
up-grade or down-grade an exofs one must use mkfs.exofs --upgrade option
before mounting.

Introduced, a new object that contains a *device-table*. This object
contains the default *data-map* and a linear array of devices
information, which identifies the devices used in the filesystem. This
object is only written to offline by mkfs.exofs. This is why it is kept
separate from the FSCB, since the later is written to while mounted.

Same partition number, same object number is used on all devices only
the device varies.

* define the new format, then load the device table on mount time make
sure every thing is supported.

* Change I/O engine to now support Mirror IO, .i.e write same data
to multiple devices, read from a random device to spread the
read-load from multiple clients (TODO: stripe read)

Implementation notes:
A few points introduced in previous patch should be mentioned here:

* Special care was made so absolutlly all operation that have any chance
of failing are done before any osd-request is executed. This is to
minimize the need for a data consistency recovery, to only real IO
errors.

* Each IO state has a kref. It starts at 1, any osd-request executed
will increment the kref, finally when all are executed the first ref
is dropped. At IO-done, each request completion decrements the kref,
the last one to return executes the internal _last_io() routine.
_last_io() will call the registered io_state_done. On sync mode a
caller does not supply a done method, indicating a synchronous
request, the caller is put to sleep and a special io_state_done is
registered that will awaken the caller. Though also in sync mode all
operations are executed in parallel.

Signed-off-by: Boaz Harrosh

Boaz Harrosh
2009-12-10 15:59:23 +0800
06886a5a3 exofs: Move all operations to an io_engine ... Browse Code »

In anticipation for multi-device operations, we separate osd operations
into an abstract I/O API. Currently only one device is used but later
when adding more devices, we will drive all devices in parallel according
to a "data_map" that describes how data is arranged on multiple devices.
The file system level operates, like before, as if there is one object
(inode-number) and an i_size. The io engine will split this to the same
object-number but on multiple device.

At first we introduce Mirror (raid 1) layout. But at the final outcome
we intend to fully implement the pNFS-Objects data-map, including
raid 0,4,5,6 over mirrored devices, over multiple device-groups. And
more. See: http://tools.ietf.org/html/draft-ietf-nfsv4-pnfs-obj-12

* Define an io_state based API for accessing osd storage devices
in an abstract way.
Usage:
First a caller allocates an io state with:
exofs_get_io_state(struct exofs_sb_info *sbi,
struct exofs_io_state** ios);

Then calles one of:
exofs_sbi_create(struct exofs_io_state *ios);
exofs_sbi_remove(struct exofs_io_state *ios);
exofs_sbi_write(struct exofs_io_state *ios);
exofs_sbi_read(struct exofs_io_state *ios);
exofs_oi_truncate(struct exofs_i_info *oi, u64 new_len);

And when done
exofs_put_io_state(struct exofs_io_state *ios);

* Convert all source files to use this new API
* Convert from bio_alloc to bio_kmalloc
* In io engine we make use of the now fixed osd_req_decode_sense

There are no functional changes or on disk additions after this patch.

Signed-off-by: Boaz Harrosh

Boaz Harrosh
2009-12-10 15:59:22 +0800
8ce9bdd1f exofs: move osd.c to ios.c ... Browse Code »

If I do a "git mv" together with a massive code change
and commit in one patch, git looses the rename and
records a delete/new instead. This is bad because I want
a rename recorded so later rebased/cherry-picked patches
to the old name will work. Also the --follow is lost.

Signed-off-by: Boaz Harrosh

Boaz Harrosh
2009-12-10 15:59:21 +0800
cae012d85 exofs: statfs blocks is sectors not FS blocks ... Browse Code »

Even though exofs has a 4k block size, statfs blocks
is in sectors (512 bytes).

Also if target returns 0 for capacity then make it
ULLONG_MAX. df does not like zero-size filesystems

Signed-off-by: Boaz Harrosh

Boaz Harrosh
2009-12-10 15:59:21 +0800
19fe294f2 exofs: Prints on mount and unmout ... Browse Code »

It is important to print in the logs when a filesystem was
mounted and eventually unmounted.

Print the osd-device's osd_name and pid the FS was
mounted/unmounted on.

TODO: How to also print the namespace path the filesystem was
mounted on?

Signed-off-by: Boaz Harrosh

Boaz Harrosh
2009-12-10 15:59:20 +0800
9cfdc7aa9 exofs: refactor exofs_i_info initialization into common helper ... Browse Code »

There are two places that initialize inodes: exofs_iget() and
exofs_new_inode()

As more members of exofs_i_info that need initialization are
added this code will grow. (soon)

Signed-off-by: Boaz Harrosh

Boaz Harrosh
2009-12-10 15:59:19 +0800
fe33cc1ee exofs: dbg-print less ... Browse Code »

Iner-loops printing is converted to EXOFS_DBG2 which is #defined
to nothing.

It is now almost bareable to just leave debug-on. Every operation
is printed once, with most relevant info (I hope).

Signed-off-by: Boaz Harrosh

Boaz Harrosh
2009-12-10 15:59:18 +0800
58311c43d exofs: More sane debug print ... Browse Code »

debug prints should be somewhat useful without actually
reading the source code

Signed-off-by: Boaz Harrosh

Boaz Harrosh
2009-12-10 15:59:17 +0800