Eric Lee / smarc-fsl-linux-kernel

20 Oct, 2008

40 commits

3b274f44d edac cell: fix incorrect edac_mode

The cell_edac driver sets the edac_mode field of the csrows to an
incorrect value, causing the sysfs show routine for that field to index
past the end of an array and oops the kernel when used.

Signed-off-by: Benjamin Herrenschmidt
Signed-off-by: Doug Thompson
Cc: [2.6.27.x, 2.6.26.x, 2.6.25.x]
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Benjamin Herrenschmidt
2008-10-20 23:52:40 +0800
b64fd291a pc8736x_gpio: add support for PC87365 chips

This is only compile tested, because I do not own appropriate hardware.

Signed-off-by: Andre Haupt
Cc: Jim Cromie
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Andre Haupt
2008-10-20 23:52:40 +0800
b231cca43 message queues: increase range limits

Increase the range of various POSIX message queue limits.

POSIX gives the message queue user the ability to trade off the maximum
size of messages against the number of messages that can be 'in flight'.
Linux currently makes this trade-off more restrictive than it needs to
be.

In particular, the maximum message size today can be made no smaller than
8192. This greatly restricts those applications that would like to have
the ability to post large numbers of very small messages.

So this patch lowers the floor to which the maximum message size can be
set, from 8192 to 128. It also lowers the floor to which the maximum
number of messages in flight can be set, from 10 to 1.

With these changes the message queue user can make better trade-offs
between the number of messages and message size, in order to get
everything to fit within the setrlimit(RLIMIT_MSGQUEUE) limit for that
particular user.

This patch also applies the values in

/proc/sys/fs/mqueue/msg_max
/proc/sys/fs/mqueue/msgsize_max

as the defaults for the maximum number of messages allowed and the
maximum message size allowed, respectively, for those applications that
do not supply these. Previously, the defaults were hardwired to 10 and
8192, respectively.
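
The trade-off described above can be sketched with a little arithmetic. This is only an illustration under a simplifying assumption: the cost of a queue is taken to be (number of messages) x (message size), and any per-message bookkeeping the kernel may charge is ignored. The helper name mq_max_in_flight() is hypothetical and is not a kernel or libc function.

```c
#include <assert.h>

/*
 * Hypothetical helper (not a kernel function): how many messages of
 * msgsize bytes fit within a byte budget such as RLIMIT_MSGQUEUE.
 * Assumption: a queue costs maxmsg * msgsize bytes; real accounting may
 * add per-message overhead, which this sketch ignores.
 */
static unsigned long mq_max_in_flight(unsigned long budget_bytes,
                                      unsigned long msgsize)
{
    assert(msgsize != 0);   /* a zero message size makes no sense here */
    return budget_bytes / msgsize;
}
```

With an example budget of 819200 bytes, a message size of 8192 leaves room for 100 messages, while a message size of 128 leaves room for 6400, which is the kind of "many small messages" configuration the patch is meant to allow.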

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Joe Korty
Cc: Al Viro
Cc: Manfred Spraul
Cc: Nadia Derbey
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Joe Korty
2008-10-20 23:52:40 +0800
acd99dbf5 kdump: add vmlist.addr to vmcoreinfo for x86 vmalloc translation.

Add the symbol 'vmlist' and the offset 'vm_struct.addr' to the
vmcoreinfo[1] data for i386 vmalloc translation.

makedumpfile[2] needs the VMALLOC_START value to tell whether an address
is a vmalloc address, so that it can choose a suitable translation
method. With this patch applied, makedumpfile can obtain the
VMALLOC_START value from 'vmlist.addr'.

vmcoreinfo[1]:
The vmcoreinfo data contains only the minimum debugging information
needed for dump filtering. makedumpfile[2] uses it to identify
unnecessary pages and create a small dumpfile.

makedumpfile[2]:
dump filtering command
https://sourceforge.net/projects/makedumpfile/

Signed-off-by: Ken'ichi Ohmichi
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Ken'ichi Ohmichi
2008-10-20 23:52:40 +0800
d9a9855d0 always reserve elfcore header memory in crash kernel

elfcore header memory needs to be reserved in a crash kernel. This means
that the relevant code should be protected by CONFIG_CRASH_DUMP rather
than CONFIG_PROC_VMCORE.

Signed-off-by: Simon Horman
Cc: Vivek Goyal
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Simon Horman
2008-10-20 23:52:40 +0800
85a0ee342 kdump: add is_vmcore_usable() and vmcore_unusable()

The usage of elfcorehdr_addr has changed recently such that
is_kdump_kernel() compares it against ELFCORE_ADDR_MAX to determine
whether the code is executing in a kernel booted as a crash kernel.

However, arch/ia64/kernel/setup.c:reserve_elfcorehdr will reset
elfcorehdr_addr to ELFCORE_ADDR_MAX on error, which means any subsequent
calls to is_kdump_kernel() will return 0, even though they should return
1.

OK, at this point in time there are no subsequent calls, but I think it's
fair to say that there is ample scope for error or at the very least
confusion.

This patch adds an extra state, ELFCORE_ADDR_ERR, which indicates that
elfcorehdr_addr was passed on the command line, and thus execution is
taking place in a crashdump kernel, but vmcore can't be used for some
reason. This is tested for using is_vmcore_usable() and set using
vmcore_unusable(). A subsequent patch makes use of this new code.

To summarise, the states that elfcorehdr_addr can now be in are as follows:

ELFCORE_ADDR_MAX: not a crashdump kernel
ELFCORE_ADDR_ERR: crashdump kernel but vmcore is unusable
any other value: crashdump kernel and vmcore is usable
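
The three states can be modelled in a few lines of userspace C. This is a sketch, not the kernel source: the sentinel values chosen for ELFCORE_ADDR_MAX and ELFCORE_ADDR_ERR are assumptions for illustration, and the helpers only mirror the behaviour described in the commit message.

```c
#include <stdbool.h>

/* Assumed sentinel values; only their distinctness matters here. */
#define ELFCORE_ADDR_MAX (-1ULL)   /* not a crashdump kernel */
#define ELFCORE_ADDR_ERR (-2ULL)   /* crashdump kernel, vmcore unusable */

/* Model of the global; starts out as "not a crashdump kernel". */
static unsigned long long elfcorehdr_addr = ELFCORE_ADDR_MAX;

/* True if an elfcorehdr address was passed, usable or not. */
static bool is_kdump_kernel(void)
{
    return elfcorehdr_addr != ELFCORE_ADDR_MAX;
}

/* True only in the third state: crashdump kernel with a usable vmcore. */
static bool is_vmcore_usable(void)
{
    return is_kdump_kernel() && elfcorehdr_addr != ELFCORE_ADDR_ERR;
}

/* Record an error without forgetting that this is a crashdump kernel. */
static void vmcore_unusable(void)
{
    if (is_kdump_kernel())
        elfcorehdr_addr = ELFCORE_ADDR_ERR;
}
```

The point of the extra state is visible in vmcore_unusable(): after an error, is_kdump_kernel() still returns true, whereas resetting to ELFCORE_ADDR_MAX would make it return false.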

Signed-off-by: Simon Horman
Cc: Vivek Goyal
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Simon Horman
2008-10-20 23:52:40 +0800
630bf2074 kdump: use is_kdump_kernel() in sba_init()

o Make use of is_kdump_kernel() rather than checking elfcorehdr_addr directly.

o Remove CONFIG_CRASH_DUMP as is_kdump_kernel() is safe to call anywhere

o Remove CONFIG_PROC_FS as it is bogus; the check
should occur regardless of whether CONFIG_PROC_FS is set.

Signed-off-by: Simon Horman
Acked-by: Vivek Goyal
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Simon Horman
2008-10-20 23:52:40 +0800
e515a0d60 kdump: update elfcorehdr documentation to reflect supported architectures

IA64, PPC and SH also support the elfcorehdr command line.

Signed-off-by: Simon Horman
Acked-by: Vivek Goyal
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Simon Horman
2008-10-20 23:52:40 +0800
57cac4d18 kdump: make elfcorehdr_addr independent of CONFIG_PROC_VMCORE

o elfcorehdr_addr is used not only by the code under CONFIG_PROC_VMCORE
but also by code outside CONFIG_PROC_VMCORE. For example,
is_kdump_kernel() is used by powerpc code to determine whether the
kernel is booting after a panic, in which case it uses the previous
kernel's TCE table. So even if CONFIG_PROC_VMCORE is not set in the
second kernel, one should be able to correctly determine that we are
booting after a panic and set up the calgary iommu accordingly.

o So remove the assumption that elfcorehdr_addr is under
CONFIG_PROC_VMCORE.

o Move definition of elfcorehdr_addr to arch dependent crash files.
(Unfortunately crash dump does not have an arch independent file;
otherwise that would have been the best place.)

o kexec.c is not the right place, as one can have CRASH_DUMP enabled in
the second kernel without KEXEC being enabled.

o I don't see the sh setup code parsing the command line for
elfcorehdr_addr. I am wondering how the vmcore interface works on sh.
Anyway, I am at least defining elfcorehdr_addr so that compilation is
not broken on sh.

Signed-off-by: Vivek Goyal
Acked-by: "Eric W. Biederman"
Acked-by: Simon Horman
Acked-by: Paul Mundt
Cc: Ingo Molnar
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Vivek Goyal
2008-10-20 23:52:39 +0800
293adee60 kthread_bind: use wait_task_inactive(TASK_UNINTERRUPTIBLE)

Now that wait_task_inactive(task, state) checks task->state == state,
we can simplify the code and make this debugging check more robust.

Signed-off-by: Oleg Nesterov
Cc: Roland McGrath
Cc: Ingo Molnar
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Oleg Nesterov
2008-10-20 23:52:39 +0800
656eb2cd5 add CONFIG_CORE_DUMP_DEFAULT_ELF_HEADERS

This adds a kconfig option to change the /proc/PID/coredump_filter default.
Fedora has been carrying a trivial patch to change the hard-wired value for
this default, since Fedora 8. The default default can't change safely
because there are old GDB versions out there (all before 6.7) that are
confused by the core dump files created by the MMF_DUMP_ELF_HEADERS setting.

Signed-off-by: Roland McGrath
Cc: Michael Kerrisk
Cc: Oleg Nesterov
Cc: Alan Cox
Cc: Andi Kleen
Cc: KOSAKI Motohiro
Cc: Kawai Hidehiro
Cc: Ingo Molnar
Cc: David Jones
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Roland McGrath
2008-10-20 23:52:39 +0800
6409324b3 coredump: format_corename: don't append .%pid if multi-threaded

If the coredumping process is multi-threaded, format_corename() appends
.%pid to the corename. This was needed before proper multi-thread core
dump support; now all the threads in the mm go into a single unified core
file.

Remove this special case; it is not even documented, and we have "%p"
and core_uses_pid.

Signed-off-by: Oleg Nesterov
Cc: Michael Kerrisk
Cc: Oleg Nesterov
Cc: Alan Cox
Cc: Roland McGrath
Cc: Andi Kleen
Cc: La Monte Yarroll
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Oleg Nesterov
2008-10-20 23:52:39 +0800
b747c8c10 make ptrace_untrace() static

ptrace_untrace() can now become static.

Signed-off-by: Adrian Bunk
Cc: Oleg Nesterov
Cc: Roland McGrath
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Adrian Bunk
2008-10-20 23:52:39 +0800
c45964354 bitmask: remove bitmap_scnprintf_len()

bitmap_scnprintf_len() is not used now, so we remove it.

Otherwise we have to maintain it and make its return
value always equal to bitmap_scnprintf()'s return value.

Signed-off-by: Lai Jiangshan
Cc: Alexey Dobriyan
Cc: Paul Menage
Cc: Paul Jackson
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Lai Jiangshan
2008-10-20 23:52:39 +0800
30e8e1360 cpuset: use seq_*mask_* to print masks

1) seq_file expects m->count == m->size when its buffer is full,
so the current code causes bugs when the buffer overflows.

2) It is not good for cpuset to access struct seq_file's
fields directly.

Signed-off-by: Lai Jiangshan
Cc: Alexey Dobriyan
Acked-by: Paul Menage
Cc: Paul Jackson
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Lai Jiangshan
2008-10-20 23:52:39 +0800
3eda20118 seq_file: add seq_cpumask_list(), seq_nodemask_list()

seq_cpumask_list() and seq_nodemask_list() are much like seq_cpumask()
and seq_nodemask(), but they print a human-readable string.
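
To make "human readable" concrete, here is a userspace sketch of list-style formatting, where set bits are printed as comma-separated ranges such as "0-3,8". It assumes that this is the kind of output the _list variants produce; mask_to_list() is a hypothetical helper, not the seq_file API, and it handles only a single unsigned long.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper: write the set bits of mask into buf as a range
 * list, e.g. 0x10f -> "0-3,8". Returns the number of characters written
 * (excluding the terminating NUL). Ranges that do not fit are dropped.
 */
static int mask_to_list(unsigned long mask, char *buf, size_t size)
{
    const int nbits = (int)(sizeof(mask) * 8);
    size_t len = 0;
    int bit = 0;

    assert(buf != NULL);
    if (size == 0)
        return 0;
    buf[0] = '\0';

    while (bit < nbits) {
        int start, end, n;

        if (!((mask >> bit) & 1UL)) {
            bit++;
            continue;
        }
        /* Found the start of a run of set bits; find where it ends. */
        start = bit;
        while (bit < nbits && ((mask >> bit) & 1UL))
            bit++;
        end = bit - 1;

        if (start == end)
            n = snprintf(buf + len, size - len, "%s%d",
                         len ? "," : "", start);
        else
            n = snprintf(buf + len, size - len, "%s%d-%d",
                         len ? "," : "", start, end);
        if (n < 0 || (size_t)n >= size - len) {
            buf[len] = '\0';    /* discard the partially written range */
            break;
        }
        len += (size_t)n;
    }
    return (int)len;
}
```

For the mask 0x10f (bits 0 to 3 and bit 8 set), the helper writes "0-3,8", which is easier to read than the raw hexadecimal value.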

Signed-off-by: Lai Jiangshan
Cc: Alexey Dobriyan
Cc: Paul Menage
Cc: Paul Jackson
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Lai Jiangshan
2008-10-20 23:52:39 +0800
85dd030ed seq_file: don't call bitmap_scnprintf_len()

"m->count + len < m->size" is commonly true, so bitmap_scnprintf()
is commonly called. This fix saves a call to bitmap_scnprintf_len().

Signed-off-by: Lai Jiangshan
Cc: Alexey Dobriyan
Cc: Paul Menage
Cc: Paul Jackson
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Lai Jiangshan
2008-10-20 23:52:39 +0800
40b6a7623 cpuset.c: remove extra variable

Remove the int cpus_nonempty variable from the update_flag() function.

Signed-off-by: Md.Rakib H. Mullick
Acked-by: Paul Jackson
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Rakib Mullick
2008-10-20 23:52:39 +0800
52d4b9ac0 memcg: allocate all page_cgroup at boot

Allocate all page_cgroup at boot and remove the page_cgroup pointer from
struct page. This patch adds the interface

struct page_cgroup *lookup_page_cgroup(struct page*)

All of FLATMEM/DISCONTIGMEM/SPARSEMEM and MEMORY_HOTPLUG are supported.

Removing the page_cgroup pointer reduces memory usage by
- 4 bytes per PAGE_SIZE.
- 8 bytes per PAGE_SIZE
if the memory controller is disabled. (even if configured.)

On a typical 8GB x86-32 server, this saves 8MB of NORMAL_ZONE memory.
On my x86-64 server with 48GB of memory, this saves 96MB of memory.
I think this reduction makes sense.
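
The two figures above can be checked with simple arithmetic, assuming 4096-byte pages and one pointer per page (4 bytes on the 32-bit machine, 8 bytes on the 64-bit one). The page size and the helper name are assumptions made for this sketch.

```c
#include <assert.h>

/*
 * Bytes saved by dropping one pointer of ptr_size bytes from the
 * per-page metadata, for mem_bytes of RAM split into page_size pages.
 */
static unsigned long long pointer_savings(unsigned long long mem_bytes,
                                          unsigned long long page_size,
                                          unsigned long long ptr_size)
{
    assert(page_size != 0);
    return mem_bytes / page_size * ptr_size;
}
```

8GB / 4096 is 2M pages, and 2M x 4 bytes is 8MB; 48GB / 4096 is 12M pages, and 12M x 8 bytes is 96MB, matching the numbers quoted in the message.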

With pre-allocation, kmalloc/kfree in charge/uncharge are removed.
This means
- we no longer need to worry about kmalloc failure.
(this can happen because of gfp_mask type.)
- we can avoid calling kmalloc/kfree.
- we can avoid allocating tons of small objects which can be fragmented.
- we know how much memory will be used for this extra-lru handling.

I added printk messages:

"allocated %ld bytes of page_cgroup"
"please try cgroup_disable=memory option if you don't want"

which should be informative enough for users.

Signed-off-by: KAMEZAWA Hiroyuki
Reviewed-by: Balbir Singh
Cc: Daisuke Nishimura
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

KAMEZAWA Hiroyuki
2008-10-20 23:52:39 +0800
c05555b57 memcg: atomic ops for page_cgroup->flags

This patch makes updates to page_cgroup->flags atomic and defines
functions (and macros) to access it.

This atomic operation on flags is necessary before we modify the memory
resource controller further. Most of the flags in this patch are for the
LRU and are modified under mz->lru_lock, but we'll soon add other flags
which are not for the LRU. For example, we'll place a LOCK bit in the
flags field. We need atomic operations to modify the LRU bit without
LOCK.
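
A userspace analogue of atomic flag accessors, using C11 atomics rather than the kernel's bit operations, might look like the sketch below. The structure, bit numbers, and function names are hypothetical; the commit log does not show the real definitions.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical bit numbers, chosen only for this example. */
enum { PCG_EXAMPLE_LRU = 0, PCG_EXAMPLE_LOCK = 1 };

/* Stand-in for a structure carrying an atomically updated flags word. */
struct page_cgroup_model {
    atomic_ulong flags;
};

static bool valid_bit(int bit)
{
    return bit >= 0 && bit < (int)(sizeof(unsigned long) * 8);
}

/* Atomically set one bit without disturbing the others. */
static void pcg_set_flag(struct page_cgroup_model *pc, int bit)
{
    assert(valid_bit(bit));
    atomic_fetch_or(&pc->flags, 1UL << bit);
}

/* Atomically clear one bit without disturbing the others. */
static void pcg_clear_flag(struct page_cgroup_model *pc, int bit)
{
    assert(valid_bit(bit));
    atomic_fetch_and(&pc->flags, ~(1UL << bit));
}

/* Read the current value of one bit. */
static bool pcg_test_flag(struct page_cgroup_model *pc, int bit)
{
    assert(valid_bit(bit));
    return (atomic_load(&pc->flags) & (1UL << bit)) != 0;
}
```

Because each update is a single atomic read-modify-write, one thread can change the LRU bit while another changes the LOCK bit without either update being lost, which is the property the message asks for.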

Signed-off-by: KAMEZAWA Hiroyuki
Acked-by: Balbir Singh
Cc: Daisuke Nishimura
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

KAMEZAWA Hiroyuki
2008-10-20 23:52:39 +0800
addb9efeb memcg: optimize per-cpu statistics

Some obvious optimizations to memcg.

I found that mem_cgroup_charge_statistics() is a little big (in object
size) and does unnecessary address calculation. This patch reduces the
size of this function.

Also, res_counter_charge() is 'likely' to succeed.

Signed-off-by: KAMEZAWA Hiroyuki
Acked-by: Balbir Singh
Cc: Daisuke Nishimura
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

KAMEZAWA Hiroyuki
2008-10-20 23:52:39 +0800
5b4e655e9 memcg: avoid accounting special pages

There are not-on-LRU pages which can be mapped, and they are not worth
accounting (because we can't shrink them and would need dirty code to
handle the special case). We'd like to make use of the usual
objrmap/radix-tree protocol and don't want to account pages outside the
VM's control.

When special_mapping_fault() is called, page->mapping tends to be NULL
and the page is charged as an anonymous page. insert_page() also handles
some special pages from drivers.

This patch avoids accounting special pages.

Signed-off-by: KAMEZAWA Hiroyuki
Cc: Daisuke Nishimura
Cc: Balbir Singh
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

KAMEZAWA Hiroyuki
2008-10-20 23:52:38 +0800
b7abea963 memcg: make page->mapping NULL before uncharge

This patch tries to make page->mapping NULL before
mem_cgroup_uncharge_cache_page() is called.

"page->mapping == NULL" is a good check for "whether the page is still
in the radix-tree or not". This patch also adds a BUG_ON() to
mem_cgroup_uncharge_cache_page().

Signed-off-by: KAMEZAWA Hiroyuki
Reviewed-by: Daisuke Nishimura
Cc: Balbir Singh
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

KAMEZAWA Hiroyuki
2008-10-20 23:52:38 +0800
073e587ec memcg: move charge swapin under lock

While page-cache's charge/uncharge is done under page_lock(), swap-cache's
isn't. (An anonymous page is charged when it's newly allocated.)

This patch moves do_swap_page()'s charge() call under the lock. I don't
see any serious problem *now*, but this fix will help avoid unnecessary
racy states in the future.

Signed-off-by: KAMEZAWA Hiroyuki
Reviewed-by: Daisuke Nishimura
Acked-by: Balbir Singh
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

KAMEZAWA Hiroyuki
2008-10-20 23:52:38 +0800
47c59803b devcgroup: remove spin_lock()

Since we introduced RCU for the read side, spin_lock is used only for
updates. But we always hold cgroup_lock() when updating, so spin_lock()
is not needed.

Additional cleanup:
1) include linux/rcupdate.h explicitly
2) remove unused variable cur_devcgroup in devcgroup_update_access()

Signed-off-by: Lai Jiangshan
Acked-by: "Serge E. Hallyn"
Cc: Paul Menage
Cc: James Morris
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Lai Jiangshan
2008-10-20 23:52:38 +0800
c012a54ae devcgroup: remove unused variable

Signed-off-by: Li Zefan
Acked-by: Serge Hallyn
Cc: Paul Menage
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Li Zefan
2008-10-20 23:52:38 +0800
2cdc7241a devcgroup: use kmemdup()

This saves 40 bytes on my x86_32 box.

Signed-off-by: Li Zefan
Acked-by: Serge Hallyn
Cc: Paul Menage
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Li Zefan
2008-10-20 23:52:38 +0800
886465f40 cgroups: fix declaration of cgroup_mm_owner_callbacks

The choice of real/dummy declaration for cgroup_mm_owner_callbacks()
shouldn't be based on CONFIG_MM_OWNER, but on CONFIG_CGROUPS. Otherwise
kernel/exit.c fails to compile when something other than a cgroups
controller selects CONFIG_MM_OWNER.

Signed-off-by: Paul Menage
Acked-by: Pekka Enberg
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Paul Menage
2008-10-20 23:52:38 +0800
cc31edcee cgroups: convert tasks file to use a seq_file with shared pid array

Rather than pre-generating the entire text for the "tasks" file each
time the file is opened, we instead just generate/update the array of
process ids and use a seq_file to report these to userspace. All open
file handles on the same "tasks" file can share a pid array, which may
be updated any time that no thread is actively reading the array. By
sharing the array, the potential for userspace to DoS the system by
opening many handles on the same "tasks" file is removed.

[Based on a patch by Lai Jiangshan, extended to use seq_file]

Signed-off-by: Paul Menage
Reviewed-by: Lai Jiangshan
Cc: Serge Hallyn
Cc: Balbir Singh
Cc: KAMEZAWA Hiroyuki
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Paul Menage
2008-10-20 23:52:38 +0800
146aa1bd0 cgroups: fix probable race with put_css_set[_taskexit] and find_css_set

put_css_set_taskexit may be called while find_css_set is being called on
another cpu, and the following race will occur:

put_css_set_taskexit side                 find_css_set side

                                        |
atomic_dec_and_test(&kref->refcount)    |
    /* kref->refcount = 0 */            |
....................................................................
                                        | read_lock(&css_set_lock)
                                        | find_existing_css_set
                                        | get_css_set
                                        | read_unlock(&css_set_lock);
....................................................................
__release_css_set                       |
....................................................................
                                        | /* use a released css_set */
                                        |

[put_css_set is the same. But in the current code, all put_css_set calls
are inside the cgroup mutex critical region, the same as find_css_set.]

[akpm@linux-foundation.org: repair comments]
[menage@google.com: eliminate race in css_set refcounting]
Signed-off-by: Lai Jiangshan
Cc: Balbir Singh
Cc: KAMEZAWA Hiroyuki
Signed-off-by: Paul Menage
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Lai Jiangshan
2008-10-20 23:52:38 +0800
248736c2a hfsplus: fix possible deadlock when handling corrupted extents

A corrupted extent for the extent file itself may try to get an impossible
extent, causing a deadlock if I see it correctly.

Check the inode number after the first_blocks checks and fail if it's the
extent file, as according to the spec the extent file should have no
extent for itself.

Signed-off-by: Eric Sesterhenn
Cc: Roman Zippel
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Eric Sesterhenn
2008-10-20 23:52:38 +0800
6e7152944 hfsplus: missing O_LARGEFILE check

hfsplus: O_LARGEFILE checking is missing

Addresses http://bugzilla.kernel.org/show_bug.cgi?id=8490

From: Alan Cox
Reported-by: didier
Cc: Roman Zippel
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Alan Cox
2008-10-20 23:52:38 +0800
cdbf6dba2 ext3: avoid printk floods in the face of directory corruption

A very large directory with many read failures (either due to storage
problems, or due to invalid size & blocks from corruption) will generate a
printk storm as the filesystem continues to try to read all the blocks.
This flood of messages can tie up the box until it is complete - which may
be a very long time, especially for very large corrupted values.

This is fixed by only reporting the corruption once each time we try to
read the directory.
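
The "report once per read" idea can be sketched as a flag that is reset for each pass over the directory. This is a simplified model, not the ext3 code: scan_dir_blocks(), the block_ok array, and the message text are all hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical model: walk the blocks of one directory and warn about
 * unreadable blocks only once per pass. block_ok[i] is false when
 * block i could not be read. Returns the number of warnings printed,
 * which is 0 or 1 regardless of how many blocks are bad.
 */
static int scan_dir_blocks(const bool *block_ok, size_t nblocks)
{
    bool reported = false;  /* reset on every pass over the directory */
    int warnings = 0;

    assert(block_ok != NULL || nblocks == 0);
    for (size_t i = 0; i < nblocks; i++) {
        if (block_ok[i])
            continue;
        if (!reported) {
            /* Made-up message text, for illustration only. */
            fprintf(stderr, "read error at block %zu "
                    "(further errors suppressed)\n", i);
            reported = true;
            warnings++;
        }
        /* Later bad blocks are skipped silently. */
    }
    return warnings;
}
```

A directory with millions of bad blocks then produces one line per read attempt instead of one line per block.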

Signed-off-by: Eric Sandeen
Signed-off-by: "Theodore Ts'o"
Cc: Eugene Teo
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Eric Sandeen
2008-10-20 23:52:38 +0800
5ec8b75e3 ext3: truncate block allocated on a failed ext3_write_begin

For blocksize < pagesize we need to remove blocks that got allocated in
block_write_begin() if we fail with ENOSPC for later blocks.
block_write_begin() internally does this if it allocated the page locally.
This makes sure we don't have blocks outside inode.i_size during ENOSPC.

Signed-off-by: Aneesh Kumar K.V
Signed-off-by: "Theodore Ts'o"
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Aneesh Kumar K.V
2008-10-20 23:52:38 +0800
6a897cf44 ext3: fix ext3_dx_readdir hash collision handling

This fixes a bug where readdir() would return a directory entry twice
if there was a hash collision in a hash-tree-indexed directory.

[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Eugene Dashevsky
Signed-off-by: Mike Snitzer
Signed-off-by: "Theodore Ts'o"
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Eugene Dashevsky
2008-10-20 23:52:38 +0800
960a22ae6 jbd: ordered data integrity fix

In ordered mode, if a file data buffer being dirtied exists in the
committing transaction, we write the buffer to the disk, move it from the
committing transaction to the running transaction, then dirty it. But we
must not remove the buffer from the committing transaction when it could
not be written out; otherwise the error would be missed and the
committing transaction would not abort.

This patch adds an error check before removing the buffer from the
committing transaction.

Signed-off-by: Hidehiro Kawai
Acked-by: Jan Kara
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Hidehiro Kawai
2008-10-20 23:52:37 +0800
0e4fb5e28 ext3: add an option to control error handling on file data

If the journal doesn't abort when it gets an IO error in file data blocks,
the file data corruption will spread silently. Because most applications
and commands do buffered writes without fsync(), they don't notice the IO
error. That is scary for mission-critical systems. On the other hand, if
the journal aborts whenever it gets an IO error in file data blocks, the
system will easily become inoperable. So this patch introduces a
filesystem option to determine whether it aborts the journal or just
calls printk() when it gets an IO error in file data.

If you mount an ext3 fs with the data_err=abort option, it aborts on a
file data write error. If you mount it with data_err=ignore, it doesn't
abort and just calls printk(). data_err=ignore is the default.
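
The option semantics described above (two accepted values, with ignore as the default when the option is absent) can be sketched as a small parser. This is a userspace illustration only: parse_data_err() and the enum are hypothetical names and do not reflect how ext3 actually parses its mount options.

```c
#include <stddef.h>
#include <string.h>

enum data_err_mode {
    DATA_ERR_IGNORE,    /* just log the error (the default) */
    DATA_ERR_ABORT,     /* abort the journal on file data write error */
    DATA_ERR_INVALID    /* unrecognised option string */
};

/*
 * Hypothetical parser for a single "data_err=..." option string.
 * A NULL argument models "option not given" and yields the default.
 */
static enum data_err_mode parse_data_err(const char *opt)
{
    static const char prefix[] = "data_err=";
    const size_t plen = sizeof(prefix) - 1;

    if (opt == NULL)
        return DATA_ERR_IGNORE;
    if (strncmp(opt, prefix, plen) != 0)
        return DATA_ERR_INVALID;

    opt += plen;
    if (strcmp(opt, "abort") == 0)
        return DATA_ERR_ABORT;
    if (strcmp(opt, "ignore") == 0)
        return DATA_ERR_IGNORE;
    return DATA_ERR_INVALID;
}
```

Keeping ignore as the default means existing mounts behave as before, and only administrators who opt in with data_err=abort get the stricter behaviour.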

Signed-off-by: Hidehiro Kawai
Cc: Jan Kara
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Hidehiro Kawai
2008-10-20 23:52:37 +0800
46d01a225 ext3: fix ext3 block reservation early ENOSPC issue

We could run into an ENOSPC error on ext3 even when there are free blocks
on the filesystem.

The problem is triggered when the goal block group has 0 free blocks and
the remaining block groups are skipped due to the check of
"free_blocks < windowsz/2". The current code can fall back to
non-reservation allocation to prevent early ENOSPC after examining all
the block groups with reservation on, but this code was bypassed if the
reservation window was already turned off, which is true in this case.

This patch fixes two issues:
1) We don't need to turn off block reservation if the goal block group has
0 free blocks left; we should continue searching the rest of the block
groups.

The intention of the current code is to turn off block reservation if the
goal allocation group has a few (some) free blocks left (not enough to
make the desired reservation window), and to try to allocate in the goal
block group to get better locality. But if the goal block group has 0
free blocks, it should leave block reservation on and continue searching
the next block groups, rather than turn off block reservation completely.

2) We don't need to check the window size if block reservation is off.

The problem was originally found and fixed in ext4.

Signed-off-by: Mingming Cao
Cc: Theodore Ts'o
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Mingming Cao
2008-10-20 23:52:37 +0800
972fbf779 ext3: don't try to resize if there are no reserved gdt blocks left

When you try to resize an ext3 fs and run out of reserved gdt blocks,
you get an error that doesn't actually tell you what went wrong; it just
says that the gdb it picked is not correct, which is the case since you
don't have any reserved gdt blocks left. This patch adds a check to make
sure you have reserved gdt blocks to use, and if not prints out a more
relevant error.

Signed-off-by: Josef Bacik
Cc:
Cc: Andreas Dilger
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Josef Bacik
2008-10-20 23:52:37 +0800
885e353c7 jbd: don't dirty original metadata buffer on abort

Currently, original metadata buffers are dirtied when they are unfiled
whether the journal has aborted or not. Eventually these buffers will be
written back to the filesystem by pdflush. This means some metadata
buffers are written to the filesystem without journaling if the journal
aborts. So if a journal abort and a system crash happen at the same
time, the filesystem would be left in an inconsistent state.
Additionally, replaying journaled metadata can partly overwrite the
latest metadata on the filesystem, because if the journal aborts,
journaled metadata are preserved and replayed during the next mount so
as not to lose uncheckpointed metadata. This would also break the
consistency of the filesystem.

This patch prevents original metadata buffers from being dirtied on abort
by clearing the BH_JBDDirty flag from those buffers. Thus, no metadata
buffers are written to the filesystem without journaling.

Signed-off-by: Hidehiro Kawai
Acked-by: Jan Kara
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Hidehiro Kawai
2008-10-20 23:52:37 +0800