15 Nov, 2007
40 commits
-
Fix http://bugzilla.kernel.org/show_bug.cgi?id=9247
Allow SIGCONT to be sent to a process with greater capabilities if it is in
the same session. Otherwise, a shell from which I've started a root shell
and done 'suspend' can't be restarted by the parent shell.
Also don't do file-capabilities signaling checks when uids for the
processes don't match, since the standard check_kill_permission will have
done those checks.
[akpm@linux-foundation.org: coding-style cleanups]
Signed-off-by: Serge E. Hallyn
Acked-by: Andrew Morgan
Cc: Chris Wright
Tested-by: "Theodore Ts'o"
Cc: Stephen Smalley
Cc: "Rafael J. Wysocki"
Cc: James Morris
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
The delay incurred in lock_page() should also be accounted in swap delay
accounting.
Reported-by: Nick Piggin
Signed-off-by: Balbir Singh
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Handle the case of CONFIG_PRINTK being disabled. This requires a do-nothing
stub to be present in arch/um/include/user.h so that we don't get references
to printk from libc code.
Signed-off-by: Jeff Dike
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Make UML build in the absence of CONFIG_INET by making the inetaddr_notifier
registration depend on it.
Signed-off-by: Jeff Dike
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
asm/page.h is disappearing from the libc headers and we don't need it anyway.
Signed-off-by: Jeff Dike
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
The spurious IRQ testing in request_irq is mishandled in um_request_irq, which
sets the incoming file descriptors non-blocking only after request_irq
succeeds. This results in the spurious irq calling read on a blocking
descriptor, and a hang.
Fixed by reversing the order of the O_NONBLOCK setting and the request_irq
call, so the descriptor is already non-blocking when request_irq runs.
Signed-off-by: Jeff Dike
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
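The ordering problem in the um_request_irq commit above can be illustrated in user space. This is a minimal sketch, not UML code: `set_nonblocking` and `read_would_block` are hypothetical helpers, and any ordinary descriptor (a pipe, for instance) stands in for the one handed to um_request_irq. Once O_NONBLOCK is set, a read with no data pending fails with EAGAIN instead of hanging, which is why the flag has to be in place before anything may call read.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: set O_NONBLOCK on fd, preserving other status flags.
 * Returns 0 on success, -1 on failure (errno set by fcntl). */
static int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL);

    if (flags == -1)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK) == -1 ? -1 : 0;
}

/* Hypothetical helper: returns 1 if a read on fd would block (no data
 * available), 0 if the read completed, -1 on any other error.
 * Only safe to call on a descriptor that is already non-blocking;
 * on a blocking descriptor the read itself would hang. */
static int read_would_block(int fd)
{
    char c;
    ssize_t n = read(fd, &c, 1);

    if (n == -1)
        return (errno == EAGAIN || errno == EWOULDBLOCK) ? 1 : -1;
    return 0;
}
```

Calling `read_would_block` before `set_nonblocking` on an empty pipe reproduces the hang in miniature, so the safe order is always flag first, read second.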
With a 64KB blocksize, a directory entry can have a size of 64KB, which does
not fit into the 16 bits we have for the entry length. So we store 0xffff
instead and convert the value when it is read from / written to disk. The
patch also converts some places to use ext3_next_entry() when we are changing
them anyway.
[akpm@linux-foundation.org: coding-style cleanups]
Signed-off-by: Jan Kara
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
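The record-length conversion described in the ext3 commit above can be sketched as follows. This is a simplified illustration under stated assumptions, not the patch itself: the names `rec_len_from_disk`, `rec_len_to_disk` and `MAX_REC_LEN_ON_DISK` are made up for the example, and byte-order conversion of the on-disk field is left out.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative constant: the on-disk value that stands for a 64KB entry. */
#define MAX_REC_LEN_ON_DISK 0xFFFFu

/* Convert the 16-bit on-disk record length to the in-memory length. */
static unsigned int rec_len_from_disk(uint16_t dlen)
{
    if (dlen == MAX_REC_LEN_ON_DISK)
        return 1u << 16;          /* 0xffff encodes 65536 */
    return dlen;
}

/* Convert an in-memory record length (at most 65536) to its on-disk form. */
static uint16_t rec_len_to_disk(unsigned int len)
{
    assert(len <= (1u << 16));
    if (len == (1u << 16))
        return MAX_REC_LEN_ON_DISK;
    return (uint16_t)len;
}
```

The scheme relies on 0xffff never being a legitimate length; ext3 directory entry lengths are multiples of 4, so the value is free to act as a marker.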
Fix some warnings with SMBFS_DEBUG_* builds. This patch makes it so that
builds with -Werror don't fail.
Signed-off-by: Jeff Layton
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Lockdep reports a circular locking dependency in the hibernate code
because
- during system boot hibernate code (from an initcall) locks pm_mutex
and then a sysfs buffer mutex via name_to_dev_t
- during regular operation hibernate code locks pm_mutex under a
sysfs buffer mutex because it's called from sysfs methods.
The deadlock can never happen because during initcall invocation nothing
can write to sysfs yet. This removes the lockdep report by marking the
initcall locking as being in a different class.
Signed-off-by: Johannes Berg
Cc: "Rafael J. Wysocki"
Cc: Alan Stern
Acked-by: Peter Zijlstra
Cc: Pavel Machek
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
In __do_IRQ(), the normal case is that IRQ_DISABLED is checked and if set
the handler (handle_IRQ_event()) is not called.
Earlier in __do_IRQ(), if IRQ_PER_CPU is set the code does not check
IRQ_DISABLED and calls the handler even though IRQ_DISABLED is set. This
behavior seems unintentional.
One user encountering this behavior is the CPE handler (in
arch/ia64/kernel/mca.c). When the CPE handler encounters too many CPEs
(such as a solid single bit error), it sets up a polling timer and disables
the CPE interrupt (to avoid excessive overhead logging the stream of single
bit errors). disable_irq_nosync() is called which sets IRQ_DISABLED. The
IRQ_PER_CPU flag was previously set (in ia64_mca_late_init()). The net
result is the CPE handler gets called even though it is marked disabled.
If the behavior of not checking IRQ_DISABLED when IRQ_PER_CPU is set is
intentional, it would be worthy of a comment describing the intended
behavior. disable_irq_nosync() does call chip->disable() to provide a
chipset-specific interface for disabling the interrupt, which avoids this
issue when used.
Signed-off-by: Russ Anderson
Cc: "Luck, Tony"
Cc: Thomas Gleixner
Cc: Ingo Molnar
Cc: Bjorn Helgaas
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
This is my trivial patch to swat innumerable little bugs with a single
blow.
After some intensive review (my apologies for not having gotten to this
sooner) what we have looks like a good base to build on with the current
pid namespace code but it is not complete, and it is still much too simple
to find issues where the kernel does the wrong thing outside of the initial
pid namespace.
Until the dust settles and we are certain we have the ABI and the
implementation is as correct as humanly possible let's keep process ID
namespaces behind CONFIG_EXPERIMENTAL.
This allows us the option of fixing any ABI or other bugs we find as long as
they are minor.
It also allows users of the kernel to avoid those bugs simply by ensuring
their kernel does not have support for multiple pid namespaces.
[akpm@linux-foundation.org: coding-style cleanups]
Signed-off-by: Eric W. Biederman
Cc: Cedric Le Goater
Cc: Adrian Bunk
Cc: Jeremy Fitzhardinge
Cc: Kir Kolyshkin
Cc: Kirill Korotaev
Cc: Pavel Emelyanov
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Mark start_cpu_timer() as __cpuinit instead of __devinit.
Fixes this section warning:
WARNING: vmlinux.o(.text+0x60e53): Section mismatch: reference to .init.text:start_cpu_timer (between 'vmstat_cpuup_callback' and 'vmstat_show')
Signed-off-by: Randy Dunlap
Acked-by: Christoph Lameter
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Make 'default_mode' and 'default_var' be __initdata.
Fixes these section warnings:
WARNING: vmlinux.o(.data+0x128e0): Section mismatch: reference to .init.data:default_mode_CRT (between 'default_mode' and 'default_var')
WARNING: vmlinux.o(.data+0x128e4): Section mismatch: reference to .init.data:default_var_CRT (between 'default_var' and 'dev_attr_size')
Signed-off-by: Randy Dunlap
Cc: "Antonino A. Daplas"
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
sys_open / sys_read were used in the early 1.2 days to load firmware from
disk inside drivers. Since 2.0 or so this has been deprecated behavior, but
several drivers were still using it. For a few years now we have had a
request_firmware() API that implements this in a nice, consistent way.
Only some old ISA sound drivers (pre-ALSA) still straggled along for some
time.... however with commit c2b1239a9f22f19c53543b460b24507d0e21ea0c the
last user is now gone.
This is a good thing, since using sys_open / sys_read etc for firmware is a
very buggy to dangerous thing to do; these operations put an fd in the
process file descriptor table.... which then can be tampered with from
other threads for example. For those who don't want the firmware loader,
filp_open()/vfs_read are the better APIs to use, without this security
issue.
The patch below marks sys_open and sys_read unused now that they're
really not used anymore, and for deletion in the 2.6.25 timeframe.
Signed-off-by: Arjan van de Ven
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Commit faf8c714f4508207a9c81cc94dafc76ed6680b44 caused a regression:
parameter names longer than MAX_KBUILD_MODNAME will now be rejected,
although we just need to keep the module name part that short. This patch
restores the old behaviour while still avoiding that memchr is called with
its length parameter larger than the total string length.
Signed-off-by: Jan Kiszka
Cc: Dave Young
Cc: Greg KH
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
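The length check described in the module parameter commit above can be sketched like this. It is a hedged illustration only: `modname_prefix_len` is a hypothetical helper, `MODNAME_MAX` is an illustrative limit rather than the kernel's actual MAX_KBUILD_MODNAME, and the real patch may bound the search differently. The point is that only the module-name part before the '.' is limited, and the search never reads past the end of the string.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Illustrative limit; the kernel's MAX_KBUILD_MODNAME may differ. */
#define MODNAME_MAX 64

/* Hypothetical helper: return the length of the module-name prefix of a
 * "module.param" string, 0 if there is no '.', or -1 if the prefix does
 * not fit in MODNAME_MAX. The parameter part may be arbitrarily long. */
static int modname_prefix_len(const char *name)
{
    const char *dot = strchr(name, '.');  /* stops at the terminating NUL */

    if (dot == NULL)
        return 0;
    if ((size_t)(dot - name) >= MODNAME_MAX)
        return -1;
    return (int)(dot - name);
}
```

With this shape, a short module name followed by a very long parameter name is accepted, while an over-long module name is still rejected.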
Currently we special case when we have only the initial pid namespace.
Unfortunately in doing so the copied case for the other namespaces was
broken so we don't properly flush the thread directories :(
So this patch removes the unnecessary special case (removing a usage of
proc_mnt) and corrects the flushing of the thread directories.
Signed-off-by: Eric W. Biederman
Cc: Al Viro
Cc: Pavel Emelyanov
Cc: Sukadev Bhattiprolu
Cc: Kirill Korotaev
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
[akpm@linux-foundation.org: coding-style cleanups]
Signed-off-by: Roel Kluin
Cc: Ralf Baechle
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Signed-off-by: Roel Kluin
Cc: Mikael Starvik
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
We have seen ramdisk based install systems, where some pages of mapped
libraries and programs were suddenly zeroed under memory pressure. This
should not happen, as the ramdisk avoids freeing its pages by keeping them
dirty all the time.
It turns out that there is a case where the VM makes a ramdisk page clean
without telling the ramdisk driver. On memory pressure shrink_zone runs
and it starts to run shrink_active_list. There is a check for
buffer_heads_over_limit, and if true, pagevec_strip is called.
pagevec_strip calls try_to_release_page. If the mapping has no releasepage
callback, try_to_free_buffers is called. try_to_free_buffers now has
special logic for some file systems to make a dirty page clean, if all
buffers are clean. That's what happened in our test case.
The simplest solution is to provide a noop-releasepage callback for the
ramdisk driver. This avoids try_to_free_buffers for ramdisk pages.
Signed-off-by: Christian Borntraeger
Acked-by: Nick Piggin
Cc: "Eric W. Biederman"
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
The tle62x0 driver was ignoring all read errors. This patch makes it
pass such errors up the stack, instead of returning bogus data.
Signed-off-by: David Brownell
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Fix obvious NULL dereferences spotted by the Coverity checker.
Signed-off-by: Adrian Bunk
Acked-by: Miklos Szeredi
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Commit ef8b4520bd9f8294ffce9abd6158085bde5dc902 added one NULL check for
"p" in krealloc(), but that doesn't seem to be enough since there
doesn't seem to be any guarantee that memcpy(ret, NULL, 0) works
(spotted by the Coverity checker).
To make it clearer what happens, this patch also removes the pointless
min().
Signed-off-by: Adrian Bunk
Acked-by: Christoph Lameter
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
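The NULL handling discussed in the krealloc() commit above can be illustrated with a user-space analogue. This is a sketch under stated assumptions, not the kernel function: `grow_buffer` is a hypothetical name, malloc/free stand in for the kernel allocator, and the caller passes the old size explicitly because user space has no ksize() equivalent here.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical user-space analogue of a grow-only reallocation.
 * old_size is the usable size of p (0 when p is NULL). */
static void *grow_buffer(void *p, size_t old_size, size_t new_size)
{
    void *ret;

    if (p != NULL && old_size >= new_size)
        return p;                 /* existing allocation is big enough */

    ret = malloc(new_size);
    if (ret != NULL && p != NULL) {
        /* old_size < new_size here, so no min() is needed, and memcpy
         * is never called with a NULL source. */
        memcpy(ret, p, old_size);
        free(p);
    }
    return ret;
}
```

Guarding the copy with `p != NULL` avoids relying on memcpy accepting a NULL source with a zero length, which the C standard does not guarantee.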
Fix an obvious use-after-free spotted by the Coverity checker.
Signed-off-by: Adrian Bunk
Cc: Trond Myklebust
Cc: "J. Bruce Fields"
Cc: Neil Brown
Cc: "David S. Miller"
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
"Luming Yu" says:
There is a "ttyS1 irq is -1" problem observed on tiger4 which cause the
serial port broken.
It is because that there is __no__ ACPI IRQ resource assigned for the
serial port. So the value of the IRQ for the port is never changed since it
got initialized to -1.
If PNP supplies a valid IRQ, use it. Otherwise, leave port.irq == 0, which
means "no IRQ" to the serial core.
Signed-off-by: Bjorn Helgaas
Cc: Yu Luming
Acked-by: Matthew Wilcox
Cc: Alan Cox
Cc: Russell King
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
The i5000_edac driver's PCI registration structure has the name
""i5000_edac"" (with extra set of double-quotes) which is probably not
intentional. Get rid of __stringify.
Signed-off-by: Darrick J. Wong
Cc: Doug Thompson
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
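The doubled quotes mentioned in the i5000_edac commit above are what stringification produces when its argument already expands to a string literal. The sketch below assumes that is what happened in the driver; `STRINGIFY`, `DRIVER_NAME`, `quoted_name` and `plain_name` are illustrative names, with `STRINGIFY` written in the two-level style of the kernel's __stringify().

```c
#include <assert.h>
#include <string.h>

/* Two-level stringification, in the style of the kernel's __stringify(). */
#define STRINGIFY_1(x) #x
#define STRINGIFY(x)   STRINGIFY_1(x)

/* Illustrative driver name, already a string literal. */
#define DRIVER_NAME "i5000_edac"

/* Stringifying a string literal keeps its quotes as characters. */
static const char *quoted_name = STRINGIFY(DRIVER_NAME);

/* Using the literal directly gives the name without extra quotes. */
static const char *plain_name = DRIVER_NAME;
```

Here `quoted_name` is two characters longer than `plain_name`, because the surrounding double quotes have become part of the string.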
Firmware like PNPBIOS or ACPI can report the address space consumed by the
RTC. The actual space consumed may be less than the size (RTC_IO_EXTENT)
assumed by the RTC driver.
The PNP core doesn't request resources yet, but I'd like to make it do so.
If/when it does, the RTC_IO_EXTENT request may fail, which prevents the RTC
driver from loading.
Since we only use the RTC index and data registers at RTC_PORT(0) and
RTC_PORT(1), we can fall back to requesting just enough space for those.
If the PNP core requests resources, this results in typical I/O port usage
like this:
0070-0073 : 00:06
Cc: Alessandro Zummo
Cc: David Brownell
Cc: Ralf Baechle
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
The misc_register() error path always released an I/O port region,
even if the region was memory-mapped (only mips uses memory-mapped RTC,
as far as I can see).
Signed-off-by: Bjorn Helgaas
Cc: Alessandro Zummo
Cc: David Brownell
Acked-by: Ralf Baechle
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
This is not a new problem in 2.6.23-git17. 2.6.22/2.6.23 is buggy in the
same way.
Reiserfs could accumulate dirty sub-page-size files until umount time.
They cannot be synced to disk by pdflush routines or explicit `sync'
commands. Only `umount' can do the trick.
The direct cause is: the dirty page's PG_dirty is wrongly _cleared_.
Call trace:
[] cancel_dirty_page+0xd0/0xf0
[] :reiserfs:reiserfs_cut_from_item+0x660/0x710
[] :reiserfs:reiserfs_do_truncate+0x271/0x530
[] :reiserfs:reiserfs_truncate_file+0xfd/0x3b0
[] :reiserfs:reiserfs_file_release+0x1e0/0x340
[] __fput+0xcc/0x1b0
[] fput+0x16/0x20
[] filp_close+0x56/0x90
[] sys_close+0xad/0x110
[] system_call+0x7e/0x83
Fix the bug by removing the cancel_dirty_page() call. Tests show that
it causes no bad behaviors on various write sizes.
=== for the patient ===
Here are more detailed demonstrations of the problem.
1) the page has both PG_dirty(D)/PAGECACHE_TAG_DIRTY(d) after being written to;
and then only PAGECACHE_TAG_DIRTY(d) remains after the file is closed.
------------------------------ screen 0 ------------------------------
[T0] root /home/wfg# cat > /test/tiny
[T1] hi
[T2] root /home/wfg#
------------------------------ screen 1 ------------------------------
[T1] root /home/wfg# echo /test/tiny > /proc/filecache
[T1] root /home/wfg# cat /proc/filecache
# file /test/tiny
# flags R:referenced A:active M:mmap U:uptodate D:dirty W:writeback O:owner B:buffer d:dirty w:writeback
# idx len state refcnt
0 1 ___UD__Bd_ 2
[T2] root /home/wfg# cat /proc/filecache
# file /test/tiny
# flags R:referenced A:active M:mmap U:uptodate D:dirty W:writeback O:owner B:buffer d:dirty w:writeback
# idx len state refcnt
0 1 ___U___Bd_ 2
2) note the non-zero 'cancelled_write_bytes' after /tmp/hi is copied.
------------------------------ screen 0 ------------------------------
[T0] root /home/wfg# echo hi > /tmp/hi
[T1] root /home/wfg# cp /tmp/hi /dev/stdin /test
[T2] hi
[T3] root /home/wfg#
------------------------------ screen 1 ------------------------------
[T1] root /proc/4397# cd /proc/`pidof cp`
[T1] root /proc/4713# cat io
rchar: 8396
wchar: 3
syscr: 20
syscw: 1
read_bytes: 0
write_bytes: 20480
cancelled_write_bytes: 4096
[T2] root /proc/4713# cat io
rchar: 8399
wchar: 6
syscr: 21
syscw: 2
read_bytes: 0
write_bytes: 24576
cancelled_write_bytes: 4096
//Question: the 'write_bytes' is a bit more than expected ;-)
Tested-by: Maxim Levitsky
Cc: Peter Zijlstra
Cc: Jeff Mahoney
Signed-off-by: Fengguang Wu
Reviewed-by: Chris Mason
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Add support for version 2 of the ioatdma device. This device handles
the descriptor chain and DCA services slightly differently:
- Instead of moving the dma descriptors between a busy and an idle chain,
this new version uses a single circular chain so that we don't have to
rewrite the next_descriptor pointers as we add new requests, and the
device doesn't need to re-read the last descriptor.
- The new device has the DCA tags defined internally instead of needing
them defined statically.
Signed-off-by: Shannon Nelson
Cc: "Williams, Dan J"
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Add the field names to marker example format string.
Signed-off-by: Mathieu Desnoyers
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Describe the format string standard further: use of field names before the
type specifiers.
Signed-off-by: Mathieu Desnoyers
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Upon module load, we must take the markers mutex. It implies that the marker
mutex must be nested inside the module mutex.
This means changing the nesting order: the marker mutex now nests inside the
module mutex. Make the necessary changes to reverse the order in which the
mutexes are taken.
Includes some cleanup from Dave Hansen.
Signed-off-by: Mathieu Desnoyers
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
I found a few bugs in the BFS driver. Detailed descriptions of the bugs as
well as the steps to reproduce the errors are given in the kernel bugzilla.
Please follow these links for more information:
http://bugzilla.kernel.org/show_bug.cgi?id=9363
http://bugzilla.kernel.org/show_bug.cgi?id=9364
http://bugzilla.kernel.org/show_bug.cgi?id=9365
http://bugzilla.kernel.org/show_bug.cgi?id=9366
This patch fixes the bugs described above. Besides, the patch introduces
coding style changes to make the BFS driver conform to the requirements
specified for Linux kernel code. Finally, I made a few cosmetic changes
such as removal of trivial debug output.
Also, the patch removes the fields `si_lf_ioff' and `si_lf_sblk' of the
in-core superblock structure. These fields are initialized but never
actually used.
If you are wondering why I need BFS, here is the answer: I am using this
driver in the context of Linux kernel classes I am teaching at the Moscow
State University and at the International Institute of Information
Technology in Pune, India.
Signed-off-by: Dmitri Vorobiev
Cc: Tigran Aivazian
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Revert 62d0df64065e7c135d0002f069444fbdfc64768f.
This was originally intended as a simple initial example of how to create a
control groups subsystem; it wasn't intended for mainline, but I didn't make
this clear enough to Andrew.
The CFS cgroup subsystem now has better functionality for the per-cgroup usage
accounting (based directly on CFS stats) than the "usage" status file in this
patch, and the "load" status file is rather simplistic - although having a
per-cgroup load average report would be a useful feature, I don't believe this
patch actually provides it. If it gets into the final 2.6.24 we'd probably
have to support this interface forever.
Cc: Paul Menage
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
For administrative purposes, we want to query the actual block usage of a
hugetlbfs file via fstat. Currently, hugetlbfs always returns 0. Fix that
up, since the kernel already has all the information to track it properly.
Signed-off-by: Ken Chen
Acked-by: Adam Litke
Cc: Badari Pulavarty
Cc: David Gibson
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
return_unused_surplus_pages() can become static.
Signed-off-by: Adrian Bunk
Acked-by: Adam Litke
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
When a MAP_SHARED mmap of a hugetlbfs file succeeds, huge pages are reserved
to guarantee no problems will occur later when instantiating pages. If quotas
are in force, page instantiation could fail due to a race with another process
or an oversized (but approved) shared mapping.
To prevent these scenarios, debit the quota for the full reservation amount up
front and credit the unused quota when the reservation is released.
Signed-off-by: Adam Litke
Cc: Ken Chen
Cc: Andy Whitcroft
Cc: Dave Hansen
Cc: David Gibson
Cc: William Lee Irwin III
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Add a second parameter 'delta' to hugetlb_get_quota and hugetlb_put_quota to
allow bulk updating of the sbinfo->free_blocks counter. This will be used by
the next patch in the series.
Signed-off-by: Adam Litke
Cc: Ken Chen
Cc: Andy Whitcroft
Cc: Dave Hansen
Cc: David Gibson
Cc: William Lee Irwin III
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
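The bulk counter update described in the hugetlb_get_quota/hugetlb_put_quota commit above can be modelled in a few lines. This is a simplified sketch, not the kernel code: `struct quota_info`, `quota_get` and `quota_put` are hypothetical stand-ins, locking is omitted, and treating a negative counter as "no limit" is an assumption of the example.

```c
#include <assert.h>

/* Simplified stand-in for the per-mount quota state; a negative
 * free_blocks is treated as "no quota limit" in this sketch. */
struct quota_info {
    long free_blocks;
};

/* Debit delta blocks in one step. Returns 0 on success, -1 if the
 * quota would be exceeded (the counter is then left unchanged). */
static int quota_get(struct quota_info *q, long delta)
{
    if (q->free_blocks < 0)
        return 0;                 /* unlimited */
    if (q->free_blocks - delta < 0)
        return -1;
    q->free_blocks -= delta;
    return 0;
}

/* Credit delta blocks back in one step. */
static void quota_put(struct quota_info *q, long delta)
{
    if (q->free_blocks >= 0)
        q->free_blocks += delta;
}
```

Taking a delta lets a caller debit or credit a whole reservation with one call instead of looping one block at a time, and a failed debit leaves the counter untouched.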
Now that quota is credited by free_huge_page(), calls to hugetlb_get_quota()
seem out of place. The alloc/free API is unbalanced because we handle the
hugetlb_put_quota() but expect the caller to open-code hugetlb_get_quota().
Move the get inside alloc_huge_page to clean up this disparity.
This patch has been kept apart from the previous patch because of the somewhat
dodgy ERR_PTR() use herein. Moving the quota logic means that
alloc_huge_page() has two failure modes. Quota failure must result in a
SIGBUS while a standard allocation failure is OOM. Unfortunately, ERR_PTR()
doesn't like the small positive errnos we have in VM_FAULT_* so they must be
negated before they are used.
Does anyone take issue with the way I am using PTR_ERR? If so, what are your
thoughts on how to clean this up (without needing an if,else if,else block at
each alloc_huge_page() callsite)?
Signed-off-by: Adam Litke
Cc: Ken Chen
Cc: Andy Whitcroft
Cc: Dave Hansen
Cc: David Gibson
Cc: William Lee Irwin III
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
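The negation trick mentioned in the alloc_huge_page() commit above can be demonstrated with a user-space re-creation of the error-pointer idiom. This is a sketch under stated assumptions: `err_ptr`, `ptr_err`, `is_err`, `fault_to_ptr` and `ptr_to_fault` are illustrative names, and `FAULT_OOM` / `FAULT_SIGBUS` are small positive values chosen for the example rather than the kernel's VM_FAULT_* definitions.

```c
#include <assert.h>
#include <stdint.h>

/* User-space re-creation of the kernel's error-pointer idiom. */
#define MAX_ERRNO 4095

static void *err_ptr(long error)      { return (void *)(intptr_t)error; }
static long  ptr_err(const void *ptr) { return (long)(intptr_t)ptr; }

static int is_err(const void *ptr)
{
    /* Only the top MAX_ERRNO addresses, i.e. values -1..-4095, count. */
    return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Illustrative fault codes: small positive values, as the message says. */
#define FAULT_OOM    0x0001
#define FAULT_SIGBUS 0x0002

/* Encode a positive fault code by negating it first. */
static void *fault_to_ptr(int fault) { return err_ptr(-(long)fault); }

/* Decode: negate again to recover the original positive code. */
static int ptr_to_fault(const void *ptr) { return (int)-ptr_err(ptr); }
```

A positive code passed straight to `err_ptr` looks like an ordinary low address, so `is_err` does not recognise it; negating it first places it in the range reserved for errors.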
The hugetlbfs quota management system was never taught to handle MAP_PRIVATE
mappings when that support was added. Currently, quota is debited at page
instantiation and credited at file truncation. This approach works correctly
for shared pages but is incomplete for private pages. In addition to
hugetlb_no_page(), private pages can be instantiated by hugetlb_cow(); but
this function does not respect quotas.
Private huge pages are treated very much like normal, anonymous pages. They
are not "backed" by the hugetlbfs file and are not stored in the mapping's
radix tree. This means that private pages are invisible to
truncate_hugepages() so that function will not credit the quota.
This patch (based on a prototype provided by Ken Chen) moves quota crediting
for all pages into free_huge_page(). page->private is used to store a pointer
to the mapping to which this page belongs. This is used to credit quota on
the appropriate hugetlbfs instance.
Signed-off-by: Adam Litke
Cc: Ken Chen
Cc: Andy Whitcroft
Cc: Dave Hansen
Cc: David Gibson
Cc: William Lee Irwin III
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds