20 Oct, 2008
40 commits
-
There are not-on-LRU pages which can be mapped and they are not worth to
be accounted. (becasue we can't shrink them and need dirty codes to
handle specical case) We'd like to make use of usual objrmap/radix-tree's
protcol and don't want to account out-of-vm's control pages.When special_mapping_fault() is called, page->mapping is tend to be NULL
and it's charged as Anonymous page. insert_page() also handles some
special pages from drivers.This patch is for avoiding to account special pages.
Signed-off-by: KAMEZAWA Hiroyuki
Cc: Daisuke Nishimura
Cc: Balbir Singh
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
This patch tries to make page->mapping to be NULL before
mem_cgroup_uncharge_cache_page() is called."page->mapping == NULL" is a good check for "whether the page is still
radix-tree or not". This patch also adds BUG_ON() to
mem_cgroup_uncharge_cache_page();Signed-off-by: KAMEZAWA Hiroyuki
Reviewed-by: Daisuke Nishimura
Cc: Balbir Singh
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
While page-cache's charge/uncharge is done under page_lock(), swap-cache
isn't. (anonymous page is charged when it's newly allocated.)This patch moves do_swap_page()'s charge() call under lock. I don't see
any bad problem *now* but this fix will be good for future for avoiding
unnecessary racy state.Signed-off-by: KAMEZAWA Hiroyuki
Reviewed-by: Daisuke Nishimura
Acked-by: Balbir Singh
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Since we introduced rcu for read side, spin_lock is used only for update.
But we always hold cgroup_lock() when update, so spin_lock() is not need.Additional cleanup:
1) include linux/rcupdate.h explicitly
2) remove unused variable cur_devcgroup in devcgroup_update_access()Signed-off-by: Lai Jiangshan
Acked-by: "Serge E. Hallyn"
Cc: Paul Menage
Cc: James Morris
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Signed-off-by: Li Zefan
Acked-by: Serge Hallyn
Cc: Paul Menage
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
This saves 40 bytes on my x86_32 box.
Signed-off-by: Li Zefan
Acked-by: Serge Hallyn
Cc: Paul Menage
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
The choice of real/dummy declaration for cgroup_mm_owner_callbacks()
shouldn't be based on CONFIG_MM_OWNER, but on CONFIG_CGROUPS. Otherwise
kernel/exit.c fails to compile when something other than a cgroups
controller selects CONFIG_MM_OWNERSigned-off-by: Paul Menage
Acked-by: Pekka Enberg
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Rather than pre-generating the entire text for the "tasks" file each
time the file is opened, we instead just generate/update the array of
process ids and use a seq_file to report these to userspace. All open
file handles on the same "tasks" file can share a pid array, which may
be updated any time that no thread is actively reading the array. By
sharing the array, the potential for userspace to DoS the system by
opening many handles on the same "tasks" file is removed.[Based on a patch by Lai Jiangshan, extended to use seq_file]
Signed-off-by: Paul Menage
Reviewed-by: Lai Jiangshan
Cc: Serge Hallyn
Cc: Balbir Singh
Cc: KAMEZAWA Hiroyuki
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
put_css_set_taskexit may be called when find_css_set is called on other
cpu. And the race will occur:put_css_set_taskexit side find_css_set side
|
atomic_dec_and_test(&kref->refcount) |
/* kref->refcount = 0 */ |
....................................................................
| read_lock(&css_set_lock)
| find_existing_css_set
| get_css_set
| read_unlock(&css_set_lock);
....................................................................
__release_css_set |
....................................................................
| /* use a released css_set */
|[put_css_set is the same. But in the current code, all put_css_set are
put into cgroup mutex critical region as the same as find_css_set.][akpm@linux-foundation.org: repair comments]
[menage@google.com: eliminate race in css_set refcounting]
Signed-off-by: Lai Jiangshan
Cc: Balbir Singh
Cc: KAMEZAWA Hiroyuki
Signed-off-by: Paul Menage
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
A corrupted extent for the extent file itself may try to get an impossible
extent, causing a deadlock if I see it correctly.Check the inode number after the first_blocks checks and fail if it's the
extent file, as according to the spec the extent file should have no
extent for itself.Signed-off-by: Eric Sesterhenn
Cc: Roman Zippel
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
hfsplus: O_LARGEFILE checking is missing
Addresses http://bugzilla.kernel.org/show_bug.cgi?id=8490
From: Alan Cox
Reported-by: didier
Cc: Roman Zippel
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
A very large directory with many read failures (either due to storage
problems, or due to invalid size & blocks from corruption) will generate a
printk storm as the filesystem continues to try to read all the blocks.
This flood of messages can tie up the box until it is complete - which may
be a very long time, especially for very large corrupted values.This is fixed by only reporting the corruption once each time we try to
read the directory.Signed-off-by: Eric Sandeen
Signed-off-by: "Theodore Ts'o"
Cc: Eugene Teo
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
For blocksize < pagesize we need to remove blocks that got allocated in
block_write_begin() if we fail with ENOSPC for later blocks.
block_write_begin() internally does this if it allocated page locally.
This makes sure we don't have blocks outside inode.i_size during ENOSPC.Signed-off-by: Aneesh Kumar K.V
Signed-off-by: "Theodore Ts'o"
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
This fixes a bug where readdir() would return a directory entry twice
if there was a hash collision in an hash tree indexed directory.[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Eugene Dashevsky
Signed-off-by: Mike Snitzer
Signed-off-by: "Theodore Ts'o"
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
In ordered mode, if a file data buffer being dirtied exists in the
committing transaction, we write the buffer to the disk, move it from the
committing transaction to the running transaction, then dirty it. But we
don't have to remove the buffer from the committing transaction when the
buffer couldn't be written out, otherwise it would miss the error and the
committing transaction would not abort.This patch adds an error check before removing the buffer from the
committing transaction.Signed-off-by: Hidehiro Kawai
Acked-by: Jan Kara
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
If the journal doesn't abort when it gets an IO error in file data blocks,
the file data corruption will spread silently. Because most of
applications and commands do buffered writes without fsync(), they don't
notice the IO error. It's scary for mission critical systems. On the
other hand, if the journal aborts whenever it gets an IO error in file
data blocks, the system will easily become inoperable. So this patch
introduces a filesystem option to determine whether it aborts the journal
or just call printk() when it gets an IO error in file data.If you mount a ext3 fs with data_err=abort option, it aborts on file data
write error. If you mount it with data_err=ignore, it doesn't abort, just
call printk(). data_err=ignore is the default.Signed-off-by: Hidehiro Kawai
Cc: Jan Kara
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
We could run into ENOSPC error on ext3, even when there is free blocks on
the filesystem.The problem is triggered in the case the goal block group has 0 free
blocks , and the rest block groups are skipped due to the check of
"free_blocks < windowsz/2". Current code could fall back to non
reservation allocation to prevent early ENOSPC after examing all the block
groups with reservation on , but this code was bypassed if the reservation
window is turned off already, which is true in this case.This patch fixed two issues:
1) We don't need to turn off block reservation if the goal block group has
0 free blocks left and continue search for the rest of block groups.Current code the intention is to turn off the block reservation if the
goal allocation group has a few (some) free blocks left (not enough for
make the desired reservation window),to try to allocation in the goal
block group, to get better locality. But if the goal blocks have 0 free
blocks, it should leave the block reservation on, and continues search for
the next block groups,rather than turn off block reservation completely.2) we don't need to check the window size if the block reservation is off.
The problem was originally found and fixed in ext4.
Signed-off-by: Mingming Cao
Cc: Theodore Ts'o
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
When trying to resize a ext3 fs and you run out of reserved gdt blocks,
you get an error that doesn't actually tell you what went wrong, it just
says that the gdb it picked is not correct, which is the case since you
don't have any reserved gdt blocks left. This patch adds a check to make
sure you have reserved gdt blocks to use, and if not prints out a more
relevant error.Signed-off-by: Josef Bacik
Cc:
Cc: Andreas Dilger
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Currently, original metadata buffers are dirtied when they are unfiled
whether the journal has aborted or not. Eventually these buffers will be
written-back to the filesystem by pdflush. This means some metadata
buffers are written to the filesystem without journaling if the journal
aborts. So if both journal abort and system crash happen at the same
time, the filesystem would become inconsistent state. Additionally,
replaying journaled metadata can overwrite the latest metadata on the
filesystem partly. Because, if the journal aborts, journaled metadata are
preserved and replayed during the next mount not to lose uncheckpointed
metadata. This would also break the consistency of the filesystem.This patch prevents original metadata buffers from being dirtied on abort
by clearing BH_JBDDirty flag from those buffers. Thus, no metadata
buffers are written to the filesystem without journaling.Signed-off-by: Hidehiro Kawai
Acked-by: Jan Kara
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
If we failed to write metadata buffers to the journal space and succeeded
to write the commit record, stale data can be written back to the
filesystem as metadata in the recovery phase.To avoid this, when we failed to write out metadata buffers, abort the
journal before writing the commit record.Signed-off-by: Hidehiro Kawai
Acked-by: Jan Kara
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
The phone_device array is covered by the phone_lock mutex in all cases and
request_module no longer needs the BKL so we can remove the only remaining
instance of the BKL from phonedev.Signed-off-by: Richard Holden
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Change lock_kernel()/unlock_kernel() to local fb mutex. Each frame buffer
instance has its own mutex.The one line try_to_load() function is unrolled to request_module() in two
places for readability.[righi.andrea@gmail.com: fb: fix NULL pointer BUG dereference in fb_open()]
Signed-off-by: Krzysztof Helt
Signed-off-by: Andrea Righi
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Framebuffer is heavily BKL dependant at the moment so just wrap the ioctl
handler in the driver as we push down.[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Alan Cox
Cc: Krzysztof Helt
Cc: "Antonino A. Daplas"
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
We can get the following oops from gpio_get_value_cansleep() when a GPIO
controller doesn't provide a get() callback:Unable to handle kernel paging request for instruction fetch
Faulting instruction address: 0x00000000
Oops: Kernel access of bad area, sig: 11 [#1]
[...]
NIP [00000000] 0x0
LR [c0182fb0] gpio_get_value_cansleep+0x40/0x50
Call Trace:
[c7b79e80] [c0183f28] gpio_value_show+0x5c/0x94
[c7b79ea0] [c01a584c] dev_attr_show+0x30/0x7c
[c7b79eb0] [c00d6b48] fill_read_buffer+0x68/0xe0
[c7b79ed0] [c00d6c54] sysfs_read_file+0x94/0xbc
[c7b79ef0] [c008f24c] vfs_read+0xb4/0x16c
[c7b79f10] [c008f580] sys_read+0x4c/0x90
[c7b79f40] [c0013a14] ret_from_syscall+0x0/0x38It's OK to request the value of *any* GPIO; most GPIOs are bidirectional,
so configuring them as outputs just enables an output driver and doesn't
disable the input logic.So the problem is that gpio_get_value_cansleep() isn't making the same
sanity check that gpio_get_value() does: making sure this GPIO isn't one
of the atypical "no input logic" cases.Reported-by: Anton Vorontsov
Signed-off-by: David Brownell
Cc: [2.6.27.x, 2.6.26.x, 2.6.25.x]
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
gpiolib can export GPIOs to userspace via sysfs. This patch modifies the
gpio_value_show() so that any non-zero value is explicitly printed as "1",
rather than whatever numerical value the lower-level driver returns.Signed-off-by: Steve Falco
Signed-off-by: David Brownell
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Teach rtc-cmos about the second bank of registers found on most modern x86
systems, giving access to 128 bytes more NVRAM.This version only sees that extra NVRAM when both register banks are
provided as part of *one* PNP resource. Since BIOS on some systems
presents them using two IO resources, and nothing merges them, this can't
always show all the NVRAM. (We're supposed to be able to use PNP id
PNP0b01 too, but BIOS tables doesn't often seem to use that particular
option.)Signed-off-by: David Brownell
Cc: Ingo Molnar
Cc: Thomas Gleixner
Cc: Bjorn Helgaas
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
In function sn_sal_switch_to_asynch(): drivers/serial/sn_console.c:713:
HZ * SN_SAL_UART_FIFO_DEPTH / SN_SAL_UART_FIFO_SPEED_CPS;
After preprocessing (see defines in patch) this becomes HZ * 16 / 9600 / 10
(associativity from left to right), not equivalent to HZ * 16 / 960.Looks-obviously-right-to: Tony Luck
Cc: Jes Sorensen
Acked-by: Pat Gefre
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
This patch makes the needlessly global probe_serial_gsc() static.
Signed-off-by: Adrian Bunk
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
readl/writel are not expected to accept iomap return value. Replace
bogus mapping by standard ioremap.Signed-off-by: Jiri Slaby
Cc:
Acked-by: Alan Cox
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
The read fail ratio is sensitive to the delay between the first byte
written and the first byte read; apparently the sensors cannot be rushed.
Increasing the minimum wait time, without changing the total wait time,
improves the fail ratio from a 8% chance that any of the sensors fails in
one read, down to 0.4%, on a Macbook Air. On a Macbook Pro 3,1, the
effect is even more apparent. By reducing the number of status polls, the
ratio is further improved to below 0.1%. Finally, increasing the total
wait time brings the fail ratio down to virtually zero.Signed-off-by: Henrik Rydberg
Tested-by: Bob McElrath
Cc: Nicolas Boichat
Cc: "Mark M. Hoffman"
Cc: Jean Delvare
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Add temperature sensor support for Macbook Pro 3.
Signed-off-by: Henrik Rydberg
Cc: Nicolas Boichat
Cc: Riki Oktarianto
Cc: Mark M. Hoffman
Cc: Jean Delvare
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Adds temperature sensor support for the Macbook Pro 4.
Signed-off-by: Henrik Rydberg
Cc: Nicolas Boichat
Cc: Riki Oktarianto
Cc: Mark M. Hoffman
Cc: Jean Delvare
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
dmi_system_id.driver_data is already void*.
Cc: Henrik Rydberg
Cc: Nicolas Boichat
Cc: Riki Oktarianto
Cc: Mark M. Hoffman
Cc: Jean Delvare
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
This patch adds accelerometer, backlight and temperature sensor support
for the Macbook Air.Signed-off-by: Henrik Rydberg
Cc: Nicolas Boichat
Cc: Riki Oktarianto
Cc: Mark M. Hoffman
Cc: Jean Delvare
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
On some recent Macbooks, the package length for the light sensors ALV0 and
ALV1 has changed from 6 to 10. This patch allows for a variable package
length encompassing both variants.Signed-off-by: Henrik Rydberg
Cc: Nicolas Boichat
Cc: Riki Oktarianto
Cc: Mark M. Hoffman
Cc: Jean Delvare
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
The time to wait for a status change while reading or writing to the SMC
ports is a balance between read reliability and system performance. The
current setting yields rougly three errors in a thousand when
simultaneously reading three different temperature values on a Macbook
Air. This patch increases the setting to a value yielding roughly one
error in ten thousand, with no noticable system performance degradation.Signed-off-by: Henrik Rydberg
Cc: Nicolas Boichat
Cc: Riki Oktarianto
Cc: Mark M. Hoffman
Cc: Jean Delvare
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
On many Macbooks since mid 2007, the Pro, C2D and Air models, applesmc
fails to read some or all SMC ports. This problem has various effects,
such as flooded logfiles, malfunctioning temperature sensors,
accelerometers failing to initialize, and difficulties getting backlight
functionality to work properly.The root of the problem seems to be the command protocol. The current
code sends out a command byte, then repeatedly polls for an ack before
continuing to send or recieve data. From experiments leading to this
patch, it seems the command protocol never quite worked or changed so that
one now sends a command byte, waits a little bit, polls for an ack, and if
it fails, repeats the whole thing by sending the command byte again.This patch implements a send_command function according to the new
interpretation of the protocol, and should work also for earlier models.Signed-off-by: Henrik Rydberg
Cc: Nicolas Boichat
Cc: Riki Oktarianto
Cc: Mark M. Hoffman
Cc: Jean Delvare
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
At one single place in the code, the specified number of bytes to read and
the actual number of bytes read differ by one. This one-liner patch fixes
that inconsistency.Signed-off-by: Henrik Rydberg
Cc: Nicolas Boichat
Cc: Riki Oktarianto
Cc: Mark M. Hoffman
Cc: Jean Delvare
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Adds therm-min/max/crit-alarm callbacks, sensor-device-attribute
declarations, and refs to those new decls in the macro used to initialize
the therm_group (of sysfs files)The thermistors use voltage channels to measure; so they don't have a
fault-alarm, but unlike the other voltages, they do have an overtemp,
which we call crit (by convention).[akpm@linux-foundation.org: cleanup]
Signed-off-by: Jim Cromie
Cc: Jean Delvare
Cc: "Mark M. Hoffman"
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
temp and vin status register values may be set by chip specifications, set
again by bios, or by this previously loaded driver. Debug output nicely
displays modprobe init=\d actions.Signed-off-by: Jim Cromie
Cc: Jean Delvare
Cc: "Mark M. Hoffman"
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds