06 Nov, 2011

1 commit

  • fs_info has grown to ~9kb, more than fits into one page. This can cause
    a mount failure when memory is too fragmented. The top space consumers
    are the super block structures super_copy and super_for_commit, ~2.8kb
    each. Allocate them dynamically; fs_info then shrinks to ~3.5kb
    (measured on x86_64).

    Add a wrapper for freeing fs_info and all of its dynamically allocated
    members.
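
    A minimal sketch of the idea (not the verbatim patch; the GFP flag and
    error handling are illustrative): the two large super block copies become
    separate allocations, and one helper frees them together with fs_info
    itself.

    static void free_fs_info(struct btrfs_fs_info *fs_info)
    {
            kfree(fs_info->super_copy);
            kfree(fs_info->super_for_commit);
            kfree(fs_info);
    }

    /* at mount time */
    fs_info->super_copy = kzalloc(sizeof(struct btrfs_super_block), GFP_NOFS);
    fs_info->super_for_commit = kzalloc(sizeof(struct btrfs_super_block),
                                        GFP_NOFS);
    if (!fs_info->super_copy || !fs_info->super_for_commit) {
            free_fs_info(fs_info);
            return -ENOMEM;
    }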

    Signed-off-by: David Sterba

    David Sterba
     

02 Aug, 2011

1 commit


23 May, 2011

1 commit


02 May, 2011

1 commit


25 Apr, 2011

1 commit

  • There's a potential problem on 32bit systems when we exhaust 32bit inode
    numbers and start to allocate big inode numbers, because btrfs uses
    inode->i_ino in many places.

    So here we always use BTRFS_I(inode)->location.objectid, which is a
    u64 variable.

    There are two exceptions where BTRFS_I(inode)->location.objectid !=
    inode->i_ino: the btree inode (0 vs 1) and empty subvol dirs (256 vs 2),
    and inode->i_ino will still be used in those cases.

    Another reason to make this change is that I'm going to use a special
    inode to save the free ino cache, and its inode number must be > (u64)-256.
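
    A sketch of the accessor this change is built around (the real helper is
    btrfs_ino(); the exact fallback test in the tree may differ from this
    simplified form):

    static inline u64 btrfs_ino(struct inode *inode)
    {
            u64 ino = BTRFS_I(inode)->location.objectid;

            /* the two exceptions noted above keep using i_ino */
            if (ino == 0 || ino == BTRFS_FIRST_FREE_OBJECTID)
                    ino = inode->i_ino;
            return ino;
    }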

    Signed-off-by: Li Zefan

    Li Zefan
     

28 Mar, 2011

2 commits


06 Feb, 2011

1 commit


29 Jan, 2011

1 commit


22 Dec, 2010

3 commits

  • Add a common function to copy decompressed data from working buffer
    to bio pages.

    Signed-off-by: Li Zefan

    Li Zefan
     
  • Lzo is a much faster compression algorithm than zlib, so it would allow
    more users to enable transparent compression, and users can choose
    between compression ratio and speed for different applications.

    Usage:

    # mount -t btrfs -o compress[=] dev /mnt
    or
    # mount -t btrfs -o compress-force[=] dev /mnt

    "-o compress" without argument is still allowed for compatability.

    Compatibility:

    If we mount a filesystem with lzo compression, it will not be mountable
    on old kernels. One reason is that otherwise an old btrfs would hand
    compressed data, which sits in an inline extent, directly to userspace.
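
    On disk, the incompatibility is signalled with a feature flag, roughly
    like this (a simplified sketch; the flag and the accessor names follow
    the mainline tree, the surrounding code is illustrative):

    if (fs_info->compress_type == BTRFS_COMPRESS_LZO) {
            u64 features = btrfs_super_incompat_flags(disk_super);

            features |= BTRFS_FEATURE_INCOMPAT_COMPRESS_LZO;
            btrfs_set_super_incompat_flags(disk_super, features);
    }

    An old kernel that doesn't recognize the flag refuses the mount instead
    of misinterpreting the inline lzo data.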

    Performance:

    The test copied a linux source tarball (~400M) from an ext4 partition
    to the btrfs partition, and then extracted it.

    (time in seconds)
                lzo     zlib    nocompress
    copy:       10.6    21.7    14.9
    extract:    70.1    94.4    66.6

    (data size in MB)
                lzo     zlib    nocompress
    copy:       185.87  108.69  394.49
    extract:    193.80  132.36  381.21

    Changelog:

    v1 -> v2:
    - Select LZO_COMPRESS and LZO_DECOMPRESS in btrfs Kconfig.
    - Add incompatibility flag.
    - Fix error handling in compress code.

    Signed-off-by: Li Zefan

    Li Zefan
     
  • Make the code aware of compression type, instead of always assuming
    zlib compression.

    Also make the zlib workspace code serve as common code for all
    compression types.
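
    A sketch of the resulting per-type dispatch (the struct and array names
    follow the mainline layout; the listing is abridged):

    /* indexed by compression type, e.g. BTRFS_COMPRESS_ZLIB */
    static struct btrfs_compress_op *btrfs_compress_op[] = {
            &btrfs_zlib_compress,
            /* further types, such as lzo, slot in here */
    };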

    Signed-off-by: Li Zefan

    Li Zefan
     

22 Nov, 2010

1 commit


30 Oct, 2010

1 commit

  • These are all the cases where a variable is set but not read, which are
    not bugs as far as I can see, but simply leftovers.

    Still needs more review.

    Found by gcc 4.6's new warnings.
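
    The warning class involved is gcc 4.6's -Wunused-but-set-variable; a
    made-up fragment of the pattern being cleaned up (do_something() is a
    placeholder):

    static int example(void)
    {
            int ret;                /* leftover: set below but never read */

            ret = do_something();   /* -Wunused-but-set-variable fires here */
            return 0;
    }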

    Signed-off-by: Andi Kleen
    Cc: Chris Mason
    Signed-off-by: Andrew Morton
    Signed-off-by: Chris Mason

    Andi Kleen
     

06 Apr, 2010

2 commits

  • * git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable:
    Btrfs: add check for changed leaves in setup_leaf_for_split
    Btrfs: create snapshot references in same commit as snapshot
    Btrfs: fix small race with delalloc flushing waitqueue's
    Btrfs: use add_to_page_cache_lru, use __page_cache_alloc
    Btrfs: fix chunk allocate size calculation
    Btrfs: kill max_extent mount option
    Btrfs: fail to mount if we have problems reading the block groups
    Btrfs: check btrfs_get_extent return for IS_ERR()
    Btrfs: handle kmalloc() failure in inode lookup ioctl
    Btrfs: dereferencing freed memory
    Btrfs: Simplify num_stripes's calculation logical for __btrfs_alloc_chunk()
    Btrfs: Add error handle for btrfs_search_slot() in btrfs_read_chunk_tree()
    Btrfs: Remove unnecessary finish_wait() in wait_current_trans()
    Btrfs: add NULL check for do_walk_down()
    Btrfs: remove duplicate include in ioctl.c

    Fix trivial conflict in fs/btrfs/compression.c due to slab.h include
    cleanups.

    Linus Torvalds
     
  • Pagecache pages should be allocated with __page_cache_alloc, so they
    obey pagecache memory policies.

    add_to_page_cache_lru is exported, so it should be used. Benefits over
    using a private pagevec: neater code, 128 bytes less stack used, percpu
    lru ordering is preserved, and there is no need to flush the pagevec
    before returning, so batching may be shared with other LRU insertions.
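
    The readahead path ends up roughly like this (simplified; pg_index and
    mapping come from the surrounding loop, error handling trimmed):

    page = __page_cache_alloc(mapping_gfp_mask(mapping) & ~__GFP_FS);
    if (!page)
            break;
    if (add_to_page_cache_lru(page, mapping, pg_index, GFP_NOFS)) {
            page_cache_release(page);
            goto next;
    }
    /* the page is now in the pagecache and on the percpu LRU batch */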

    Signed-off-by: Nick Piggin
    Signed-off-by: Chris Mason

    Nick Piggin
     

30 Mar, 2010

1 commit

  • …it slab.h inclusion from percpu.h

    percpu.h is included by sched.h and module.h and thus ends up being
    included when building most .c files. percpu.h includes slab.h which
    in turn includes gfp.h making everything defined by the two files
    universally available and complicating inclusion dependencies.

    The percpu.h -> slab.h dependency is about to be removed. Prepare for
    this change by updating users of gfp and slab facilities to include those
    headers directly instead of assuming availability. As this conversion
    needs to touch a large number of source files, the following script is
    used as the basis of the conversion.

    http://userweb.kernel.org/~tj/misc/slabh-sweep.py

    The script does the following:

    * Scan files for gfp and slab usages and update includes such that
    only the necessary includes are there, i.e. if only gfp is used,
    gfp.h; if slab is used, slab.h (see the example after this list).

    * When the script inserts a new include, it looks at the include
    blocks and tries to put the new include such that its order conforms
    to its surroundings. It's put in the include block which contains
    core kernel includes, in the same order that the rest are ordered -
    alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
    doesn't seem to be any matching order.

    * If the script can't find a place to put a new include (mostly
    because the file doesn't have a fitting include block), it prints out
    an error message indicating which .h file needs to be added to the
    file.
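
    An example of the first rule above: a .c file that calls kmalloc() and
    kfree() but only picked up slab.h indirectly through percpu.h gains the
    direct include (the file name is only an example; fs/btrfs/compression.c
    was among the files touched, per the btrfs merge note above):

    /* fs/btrfs/compression.c */
    #include <linux/slab.h>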

    The conversion was done in the following steps.

    1. The initial automatic conversion of all .c files updated slightly
    over 4000 files, deleting around 700 includes and adding ~480 gfp.h
    and ~3000 slab.h inclusions. The script emitted errors for ~400
    files.

    2. Each error was manually checked. Some didn't need the inclusion,
    some needed manual addition, and for others adding it to an
    implementation .h or embedding .c file was more appropriate. This
    step added inclusions to around 150 files.

    3. The script was run again and the output was compared to the edits
    from #2 to make sure no file was left behind.

    4. Several build tests were done and a couple of problems were fixed.
    e.g. lib/decompress_*.c used malloc/free() wrappers around slab
    APIs requiring slab.h to be added manually.

    5. The script was run on all .h files but without automatically
    editing them as sprinkling gfp.h and slab.h inclusions around .h
    files could easily lead to inclusion dependency hell. Most gfp.h
    inclusion directives were ignored as stuff from gfp.h was usually
    widely available and often used in preprocessor macros. Each
    slab.h inclusion directive was examined and added manually as
    necessary.

    6. percpu.h was updated not to include slab.h.

    7. Build tests were done on the following configurations and failures
    were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my
    distributed build env didn't work with gcov compiles) and a few
    more options had to be turned off depending on archs to make things
    build (like ipr on powerpc/64 which failed due to missing writeq).

    * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
    * powerpc and powerpc64 SMP allmodconfig
    * sparc and sparc64 SMP allmodconfig
    * ia64 SMP allmodconfig
    * s390 SMP allmodconfig
    * alpha SMP allmodconfig
    * um on x86_64 SMP allmodconfig

    8. percpu.h modifications were reverted so that it could be applied as
    a separate patch and serve as bisection point.

    Given the fact that I had only a couple of failures from tests in step
    6, I'm fairly confident about the coverage of this conversion patch.
    If there is a breakage, it's likely to be something in one of the arch
    headers which should be easily discoverable on most builds of the
    specific arch.

    Signed-off-by: Tejun Heo <tj@kernel.org>
    Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
    Cc: Ingo Molnar <mingo@redhat.com>
    Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>

    Tejun Heo
     

15 Mar, 2010

1 commit


12 Sep, 2009

2 commits


13 Jul, 2009

1 commit

  • * Remove smp_lock.h from files which don't need it (including some headers!)
    * Add smp_lock.h to files which do need it
    * Make smp_lock.h include conditional in hardirq.h
    It's needed only for one kernel_locked() usage which is under CONFIG_PREEMPT

    This will make hardirq.h inclusion cheaper for every PREEMPT=n config
    (which includes allmodconfig/allyesconfig, BTW)
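
    The hardirq.h part looks roughly like this (simplified; the macro name
    is the one used by the header at the time):

    #if defined(CONFIG_PREEMPT)
    # include <linux/smp_lock.h>
    # define PREEMPT_INATOMIC_BASE kernel_locked()
    #else
    # define PREEMPT_INATOMIC_BASE 0
    #endif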

    Signed-off-by: Alexey Dobriyan
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
     

10 Jun, 2009

1 commit

  • Add support for the standard attributes set via chattr and read via
    lsattr. Currently we store the attributes in the flags value in
    the btrfs inode, but I wonder whether we should split it into two so
    that we don't have to keep converting between the two formats.

    Remove the btrfs_clear_flag/btrfs_set_flag/btrfs_test_flag macros
    as they were confusing the existing code and got in the way of the
    new additions.

    Also add the FS_IOC_GETVERSION ioctl for getting i_generation as it's
    trivial.
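
    The userspace view of these interfaces, for illustration (standard
    ioctls and flags from linux/fs.h; the file name is a placeholder):

    #include <fcntl.h>
    #include <sys/ioctl.h>
    #include <linux/fs.h>

    int main(void)
    {
            int fd = open("somefile", O_RDONLY);
            long flags, generation;

            ioctl(fd, FS_IOC_GETFLAGS, &flags);        /* what lsattr reads */
            ioctl(fd, FS_IOC_GETVERSION, &generation); /* new: i_generation */
            return 0;
    }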

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Chris Mason

    Christoph Hellwig
     

21 Jan, 2009

1 commit


06 Jan, 2009

1 commit


12 Dec, 2008

1 commit

  • Checksums on data can be disabled by mount option, so it's
    possible some data extents don't have checksums or have
    invalid checksums. This causes trouble for data relocation.
    This patch contains the following changes to make data relocation
    work:

    1) Make the nodatasum/nodatacow mount options only affect new
    files. Checksums and COW on data are only controlled by the
    inode flags.

    2) Check for the existence of checksums in the nodatacow checker.
    If checksums exist, force COW of the data extent (see the sketch
    after this list). This ensures that the checksum for a given block
    is either valid or does not exist.

    3) Update the data relocation code to properly handle the case
    of missing checksums.
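
    A sketch of point 2 (the check lives in the nodatacow path; the helper
    name matches fs/btrfs/inode.c, the surrounding code is simplified):

    /* csums exist for this on-disk range, so writing in place is unsafe */
    if (csum_exist_in_range(root, disk_bytenr, num_bytes))
            goto out_check;         /* fall back to COW of the data extent */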

    Signed-off-by: Yan Zheng

    Yan Zheng
     

09 Dec, 2008

1 commit

  • Btrfs stores checksums for each data block. Until now, they have
    been stored in the subvolume trees, indexed by the inode that is
    referencing the data block. This means that when we read the inode,
    we've probably read in at least some checksums as well.

    But, this has a few problems:

    * The checksums are indexed by logical offset in the file. When
    compression is on, this means we have to do the expensive checksumming
    on the uncompressed data. It would be faster if we could checksum
    the compressed data instead.

    * If we implement encryption, we'll be checksumming the plain text and
    storing that on disk. This is significantly less secure.

    * For either compression or encryption, we have to get the plain text
    back before we can verify the checksum as correct. This makes the raid
    layer balancing and extent moving much more expensive.

    * It makes the front end caching code more complex, as we have to touch
    the subvolume and inodes as we cache extents.

    * There is potentially one copy of the checksum in each subvolume
    referencing an extent.

    The solution used here is to store the extent checksums in a dedicated
    tree. This allows us to index the checksums by physical extent
    start and length. It means:

    * The checksum is against the data stored on disk, after any compression
    or encryption is done.

    * The checksum is stored in a central location, and can be verified without
    following back references, or reading inodes.

    This makes compression significantly faster by reducing the amount of
    data that needs to be checksummed. It will also allow much faster
    raid management code in general.

    The checksums are indexed by a key with a fixed objectid (a magic value
    in ctree.h) and offset set to the starting byte of the extent. This
    allows us to copy the checksum items into the fsync log tree directly (or
    any other tree), without having to invent a second format for them.
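
    How a checksum item is addressed in the dedicated tree (a sketch; the
    constant names are the ones from ctree.h):

    struct btrfs_key key;

    key.objectid = BTRFS_EXTENT_CSUM_OBJECTID;  /* the fixed magic objectid */
    key.type = BTRFS_EXTENT_CSUM_KEY;
    key.offset = disk_bytenr;                   /* starting byte of the extent */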

    Signed-off-by: Chris Mason

    Chris Mason
     

20 Nov, 2008

2 commits


11 Nov, 2008

2 commits


10 Nov, 2008

1 commit

  • The decompress code doesn't take the logical offset in the extent
    pointer into account. If the logical offset isn't zero, data
    will be decompressed into the wrong pages.

    The solution used here is to record the starting offset of the extent
    in the file separately from the logical start of the extent_map struct.
    This allows us to avoid problems inserting overlapping extents.

    Signed-off-by: Yan Zheng

    Yan Zheng
     

08 Nov, 2008

1 commit

  • When writing a compressed extent, a number of bios are created that
    point to a single struct compressed_bio. At end_io time an atomic counter in
    the compressed_bio struct makes sure that all of the bios have finished
    before final end_io processing is done.

    But when multiple bios are needed to write a compressed extent, the
    counter was being incremented after the first bio was sent to submit_bio.
    It is possible the bio will complete before the counter is incremented,
    making the end_io handler free the compressed_bio struct before
    processing is finished.

    The fix is to increment the atomic counter before bio submission,
    both for compressed reads and writes.
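
    The essence of the fix, simplified (field and helper names as in the
    compressed write path; return handling trimmed):

    /* take the reference before the bio can possibly complete */
    atomic_inc(&cb->pending_bios);
    ret = btrfs_map_bio(root, WRITE, bio, 0, 1);
    BUG_ON(ret);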

    Signed-off-by: Chris Mason

    Chris Mason
     

07 Nov, 2008

1 commit

  • When reading compressed extents, try to put pages into the page cache
    for any pages covered by the compressed extent that readpages didn't already
    preload.

    Add an async work queue to handle transformations at delayed allocation processing
    time. Right now this is just compression. The workflow is:

    1) Find offsets in the file marked for delayed allocation
    2) Lock the pages
    3) Lock the state bits
    4) Call the async delalloc code

    The async delalloc code clears the state lock bits and delalloc bits. It is
    important this happens before the range goes into the work queue because
    otherwise it might deadlock with other work queue items that try to lock
    those extent bits.
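
    The ordering that matters here, as a rough sketch (extent bit and worker
    names as in the btrfs tree; the argument lists are simplified and may
    not match the exact signatures):

    /* drop the bits first, then queue: queued work may lock the same range */
    clear_extent_bit(&BTRFS_I(inode)->io_tree, start, end,
                     EXTENT_LOCKED | EXTENT_DELALLOC, 1, 0, GFP_NOFS);
    btrfs_queue_worker(&root->fs_info->delalloc_workers, &async_cow->work);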

    The file pages are compressed, and if the compression doesn't work the
    pages are written back directly.

    An ordered work queue is used to make sure the inodes are written in the same
    order that pdflush or writepages sent them down.

    This changes extent_write_cache_pages to let the writepage function
    update the wbc nr_written count.

    Signed-off-by: Chris Mason

    Chris Mason
     

01 Nov, 2008

1 commit

  • Make sure we keep page->mapping NULL on the pages we're getting
    via alloc_page. It gets set so a few of the callbacks can do the right
    thing, but in general these pages don't have a mapping.

    Don't try to truncate compressed inline items in btrfs_drop_extents.
    The whole compressed item must be preserved.

    Don't try to create multipage inline compressed items. When we try to
    overwrite just the first page of the file, we would have to read in and recow
    all the pages after it in the same compressed inline items. For now, only
    create single page inline items.

    Make sure we lock pages in the correct order during delalloc. The
    search into the state tree for delalloc bytes can return bytes before
    the page we already have locked.

    Signed-off-by: Chris Mason

    Chris Mason
     

31 Oct, 2008

1 commit


30 Oct, 2008

1 commit

  • This is a large change for adding compression on reading and writing,
    both for inline and regular extents. It does some fairly large
    surgery to the writeback paths.

    Compression is off by default and enabled by mount -o compress. Even
    when the -o compress mount option is not used, it is possible to read
    compressed extents off the disk.

    If compression for a given set of pages fails to make them smaller, the
    file is flagged to avoid future compression attempts.

    * While finding delalloc extents, the pages are locked before being sent down
    to the delalloc handler. This allows the delalloc handler to do complex things
    such as cleaning the pages, marking them writeback and starting IO on their
    behalf.

    * Inline extents are inserted at delalloc time now. This allows us to compress
    the data before inserting the inline extent, and it allows us to insert
    an inline extent that spans multiple pages.

    * All of the in-memory extent representations (extent_map.c, ordered-data.c etc)
    are changed to record both an in-memory size and an on disk size, as well
    as a flag for compression.

    From a disk format point of view, the extent pointers in the file are changed
    to record the on disk size of a given extent and some encoding flags.
    Space in the disk format is allocated for compression encoding, as well
    as encryption and a generic 'other' field. Neither the encryption nor the
    'other' field is currently used.
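
    The reworked extent pointer, abridged and annotated from ctree.h (field
    names as in the mainline tree):

    struct btrfs_file_extent_item {
            __le64 generation;
            __le64 ram_bytes;       /* uncompressed, in-memory size */
            u8 compression;         /* encoding flags described above */
            u8 encryption;
            __le16 other_encoding;  /* spare for later use */
            u8 type;                /* inline data or a real extent */
            __le64 disk_bytenr;
            __le64 disk_num_bytes;  /* on-disk (possibly compressed) size */
            __le64 offset;
            __le64 num_bytes;
    } __attribute__ ((__packed__));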

    In order to limit the amount of data read for a single random read in the
    file, the size of a compressed extent is limited to 128k. This is a
    software only limit; the disk format supports u64 sized compressed extents.

    In order to limit the ram consumed while processing extents, the uncompressed
    size of a compressed extent is limited to 256k. This is a software only limit
    and will be subject to tuning later.

    Checksumming is still done on compressed extents, and it is done on the
    uncompressed version of the data. This way additional encodings can be
    layered on without having to figure out which encoding to checksum.

    Compression happens at delalloc time, which is basically single threaded
    because it is usually done by a single pdflush thread. This makes it
    tricky to spread the compression load across all the cpus on the box.
    We'll have to look at parallel pdflush walks of dirty inodes at a later
    time.

    Decompression is hooked into readpages and it does spread across CPUs nicely.

    Signed-off-by: Chris Mason

    Chris Mason