28 Aug, 2006
28 commits
-
Change the list of cpus allowed to tasks in the top (root) cpuset to
dynamically track what cpus are online, using a CPU hotplug notifier. Make
this top cpus file read-only.On systems that have cpusets configured in their kernel, but that aren't
actively using cpusets (for some distros, this covers the majority of
systems) all tasks end up in the top cpuset.If that system does support CPU hotplug, then these tasks cannot make use
of CPUs that are added after system boot, because the CPUs are not allowed
in the top cpuset. This is a surprising regression over earlier kernels
that didn't have cpusets enabled.In order to keep the behaviour of cpusets consistent between systems
actively making use of them and systems not using them, this patch changes
the behaviour of the 'cpus' file in the top (root) cpuset, making it read
only, and making it automatically track the value of cpu_online_map. Thus
tasks in the top cpuset will have automatic use of hot plugged CPUs allowed
by their cpuset.Thanks to Anton Blanchard and Nathan Lynch for reporting this problem,
driving the fix, and earlier versions of this patch.Signed-off-by: Paul Jackson
Cc: Nathan Lynch
Cc: Anton Blanchard
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
A recent patch broke the ability to do a user-request check of a raid1.
This patch fixes the breakage and also moves a comment that was dislocated
by the same patch.Signed-off-by: Neil Brown
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
If we
- shut down a clean array,
- restart with one (or more) drive(s) missing
- make some changes
- pause, so that they array gets marked 'clean',
the event count on the superblock of included drives
will be the same as that of the removed drives.
So adding the removed drive back in will cause it
to be included with no resync.To avoid this, we only update the eventcount backwards when the array
is not degraded. In this case there can (should) be no non-connected
drives that we can get confused with, and this is the particular case
where updating-backwards is valuable.Signed-off-by: Neil Brown
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Fix two compile failures in eventpoll.c code which would happen if
DEBUG_EPOLL is bigger than zero.Signed-off-by: Masoud Sharbiani
Cc: Davide Libenzi
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Here's updated documentation for the relay interface, rewritten to match
the relayfs->relay changes. It also moves relayfs.txt to relay.txt in the
process.It includes the changes to relayfs.txt previously posted by Randy Dunlap,
thanks for those.The relay-apps examples have also been updated to match, and can be found
on the sourceforge relayfs website.Signed-off-by: Tom Zanussi
Cc: "Randy.Dunlap"
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
An up() is called in kernel/stop_machine.c on failure, and also in the
caller (unconditionally).Signed-off-by: Zhou Yingchao
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
1) When we allocated last fragment in ufs_truncate, we read page, check
if block mapped to address, and if not trying to allocate it. This is
wrong behaviour, fragment may be NOT allocated, but mapped, this
happened because of "block map" function not checked allocated fragment
or not, it just take address of the first fragment in the block, add
offset of fragment and return result, this is correct behaviour in
almost all situation except call from ufs_truncate.2) Almost all implementation of UFS, which I can investigate have such
"defect": if you have full disk, and try truncate file, for example 3GB
to 2MB, and have hole in this region, truncate return -ENOSPC. I tried
evade from this problem, but "block allocation" algorithm is tied to
right value of i_lastfrag, and fix of this corner case may slow down of
ordinaries scenarios, so this patch makes behavior of "truncate"
operations similar to what other UFS implementations do.Signed-off-by: Evgeniy Dushistov
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
On UFS, this scenario:
open(O_TRUNC)
lseek(1024 * 1024 * 80)
write("A")
lseek(1024 * 2)
write("A")may cause access to invalid address.
This happened because of "goal" is calculated in wrong way in block
allocation path, as I see this problem exists also in 2.4.We use construction like this i_data[lastfrag], i_data array of pointers to
direct blocks, indirect and so on, it has ceratain size ~20 elements, and
lastfrag may have value for example 40000.Also this patch fixes related to handling such scenario issues, wrong
zeroing metadata, in case of block(not fragment) allocation, and wrong goal
calculation, when we allocate blockSigned-off-by: Evgeniy Dushistov
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
To handle the earlier bogus ENOSPC error caused by filesystem full of block
reservation, current code falls back to non block reservation, starts to
allocate block(s) from the goal allocation block group as if there is no
block reservation.Current code needs to re-load the corresponding block group descriptor for
the initial goal block group in this case. The patch fixes this.Signed-off-by: Mingming Cao
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Mounting an ext2 filesystem with zero s_inodes_per_group will cause a
divide error.Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Mounting a (corrupt) minix filesystem with zero s_zmap_blocks
gives a spectacular crash on my 2.6.17.8 system, no doubt
because minix/inode.c does an unconditional
minix_set_bit(0,sbi->s_zmap[0]->b_data);[akpm@osdl.org: make labels conistent while we're there]
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
The recent hwctrl core conversion for MTD NAND devices broke the Amstrad
Delta driver. This fixes it up and uses the existing control line defines
rather than unclear magic numbers.Signed-off-by: Jonathan McDowell
Acked-by: David Woodhouse
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
futex_find_get_task:
if (p->state == EXIT_ZOMBIE || p->exit_state == EXIT_ZOMBIE)
return NULL;I can't understand this. First, p->state can't be EXIT_ZOMBIE. The
->exit_state check looks strange too. Sub-threads or tasks whose ->parent
ignores SIGCHLD go directly to EXIT_DEAD state (I am ignoring a ptrace
case). Why EXIT_DEAD tasks should be ok? Yes, EXIT_ZOMBIE is more
important (a task may stay zombie for a long time), but this doesn't mean
we should explicitely ignore other EXIT_XXX states.Signed-off-by: Oleg Nesterov
Acked-by: Ingo Molnar
Cc: Thomas Gleixner
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
When reading /dev/vcsa while a font with more than 256 characters is
loaded, one of the attribute bits records the 9th bit of the character.
But depending on the console driver (vgacon or fbcon for instance), that's
bit 3 or bit 0. And there is no way for userland to know that, thus no way
for userland to safely grab the screen content. So here is a (tested)
patch:Add a VT_GETHIFONTMASK ioctl for knowing which bit is the 9th bit for VC
text (vc_hi_font_mask field of the vc_data structure).Signed-off-by: Samuel Thibault
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
I wish I was happier about this patch. It'll serve as a placeholder for
the moment. I'm still trying to get a G550 working in order to even
reproduce the problem this patch introduces. I find that the G450 has
jitter even without this patch, so it won't show me what the patch changed.
At this point, I'll continue trying to get the G550 to work, and in
parallel work with the G450 to work out the kinks.The patch is below.
Set XDVICLKCTRL only on PPC, as doing this apparently introduces jitter on
the G550, at least on x86 architectures.Signed-off-by: Paul A. Clarke
Signed-off-by: Petr Vandrovec
Cc: "Antonino A. Daplas"
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
While testing Moxa C218T/PCI on PowerPC 405EP I found that loading firmware
using the linux kernel driver fails because calculation of the checksum is
not endianess independent in the original code.After I fixed this I found that uploading firmware in a system with
multiple cards causes a kernel oops. I had a look in the recent moxa
sources and found that they do some kind of locking there. Applying this
lock fixed the problem.Alan sayeth:
Checksum changes are clearly correct. Other changes is an improvement but
not I think enough to handle malicious firmware attacks. That said such an
attacker has CAP_SYS_RAWIO anyway so that part is irrelevant except for
neatness.[akpm@osdl.org: cleanups]
Signed-off-by: Dirk Eibach
Acked-by: Alan Cox
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Ignore the return value of early_init_acpi(), as it can give false error
messages. If there is something really wrong, then register_driver will
fail cleanly with EINVAL later.[ background: modprobe acpi-cpufreq on systems not capable of speed-scaling
started failing with 'invalid argument', where previously it would only
ever -ENODEVI'm not 100% happy with the solution. It'd be better to handle
failure properly, but this is a low-impact change for 2.6.18
We can always revisit doing this better in .19 --davej.]Signed-off-by: Venkatesh Pallipadi
Signed-off-by: Dave Jones
Cc: Greg KH
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
sched_setscheduler() looks at ->signal->rlim[]. It is unsafe do
dereference ->signal unless tasklist_lock or ->siglock is held (or p ==
current). We pin the task structure, but this can't prevent from
release_task()->__exit_signal() which sets ->signal = NULL.Restore tasklist_lock across the setscheduler call.
Signed-off-by: Oleg Nesterov
Cc: Greg KH
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Commit b64ef8afa58f397e1eaba2bd9ecaa6812064d464 ("[PATCH] add imacfb
documentation and detection") contained a wrong DMI_MATCH.Signed-off-by: Thomas Meyer
Cc: Greg KH
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Read the return value before we release the nand device otherwise the
value can become corrupted by another user of chip->ops, ultimately
resulting in filesystem corruption.Signed-off-by: Richard Purdie
Cc: David Woodhouse
Acked-by: Josh Boyer
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
On Wed, 2006-08-09 at 07:57 +0200, Rolf Eike Beer wrote:
> =============================================
> [ INFO: possible recursive locking detected ]
> ---------------------------------------------
> parted/7929 is trying to acquire lock:
> (&bdev->bd_mutex){--..}, at: [] __blkdev_put+0x1e/0x13c
>
> but task is already holding lock:
> (&bdev->bd_mutex){--..}, at: [] do_open+0x72/0x3a8
>
> other info that might help us debug this:
> 1 lock held by parted/7929:
> #0: (&bdev->bd_mutex){--..}, at: [] do_open+0x72/0x3a8
> stack backtrace:
> [] show_trace_log_lvl+0x58/0x15b
> [] show_trace+0xd/0x10
> [] dump_stack+0x17/0x1a
> [] __lock_acquire+0x753/0x99c
> [] lock_acquire+0x4a/0x6a
> [] mutex_lock_nested+0xc8/0x20c
> [] __blkdev_put+0x1e/0x13c
> [] blkdev_put+0xa/0xc
> [] do_open+0x336/0x3a8
> [] blkdev_open+0x1f/0x4c
> [] __dentry_open+0xc7/0x1aa
> [] nameidata_to_filp+0x1c/0x2e
> [] do_filp_open+0x2e/0x35
> [] do_sys_open+0x38/0x68
> [] sys_open+0x16/0x18
> [] sysenter_past_esp+0x56/0x8dOK, I'm having a look here; its all new to me so bear with me.
blkdev_open() calls
do_open(bdev, ...,BD_MUTEX_NORMAL) and takes
mutex_lock_nested(&bdev->bd_mutex, BD_MUTEX_NORMAL)then something fails, and we're thrown to:
out_first: where
if (bdev != bdev->bd_contains)
blkdev_put(bdev->bd_contains) which is
__blkdev_put(bdev->bd_contains, BD_MUTEX_NORMAL) which does
mutex_lock_nested(&bdev->bd_contains->bd_mutex, BD_MUTEX_NORMAL) bd_contains is either bdev or whole, and
since we take the branch it must be whole. So it seems to me the
following patch would be the right one:[akpm@osdl.org: compile fix]
Signed-off-by: Peter Zijlstra
Cc: Arjan van de Ven
Acked-by: NeilBrown
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Recently a patch was added for preliminary suspend/resume handling on
!PPC_PMAC. However, this broke both suspend and firewire on powerpc
because it saves the pci state after the device has already been disabled.This moves the save state to before the pmac specific code.
Signed-off-by: Danny Tholen
Cc: Stefan Richter
Acked-by: Benjamin Herrenschmidt
Cc: Ben Collins
Cc: Jody McIntyre
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Sergey Vlasov noticed that there is not kernel.suid_dumpable, but
fs.suid_dumpable.How KERN_SETUID_DUMPABLE ended up in fs_table[]? Hell knows...
Signed-off-by: Alexey Dobriyan
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
When cdev_add() failed there is no reason to call cdev_del().
Signed-off-by: Rolf Eike Beer
Cc: Alan Cox
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Fix the year check on setting the time with the S3C24XX RTC driver. Also
move the debug to before the set to see what is going on if it does fail.Signed-off-by: Ben Dooks
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
There is a bug in mm/swapfile.c#swap_type_of() that makes swsusp only be
able to use the first active swap partition as the resume device. Fix it.Signed-off-by: Rafael J. Wysocki
Cc: Hugh Dickins
Acked-by: Pavel Machek
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
On an nForce4-equipped machine with two SATA disk in raid1 setup using dmraid,
we experienced frequent deadlock of the system under high i/o load. 'cat
/dev/zero > ~/zero' was the most reliable way to reproduce them: Randomly
after a few GB, 'cp' would be left in 'D' state along with kjournald and
kmirrord. The functions cp and kjournald were blocked in did vary, but
kmirrord's wchan always pointed to 'mempool_alloc()'. We've seen this pattern
on 2.6.15 and 2.6.17 kernels. http://lkml.org/lkml/2005/4/20/142 indicates
that this problem has been around even before.So much for the facts, here's my interpretation: mempool_alloc() first tries
to atomically allocate the requested memory, or falls back to hand out
preallocated chunks from the mempool. If both fail, it puts the calling
process (kmirrord in this case) on a private waitqueue until somebody refills
the pool. Where the only 'somebody' is kmirrord itself, so we have a
deadlock.I worked around this problem by falling back to a (blocking) kmalloc when
before kmirrord would have ended up on the waitqueue. This defeats part of
the benefits of using the mempool, but at least keeps the system running. And
it could be done with a two-line change. Note that mempool_alloc() clears the
GFP_NOIO flag internally, and only uses it to decide whether to wait or return
an error if immediate allocation fails, so the attached patch doesn't change
behaviour in the non-deadlocking case. Path is against current git
(2.6.18-rc4), but should apply to earlier versions as well. I've tested on
2.6.15, where this patch makes the difference between random lockup and a
stable system.Signed-off-by: Daniel Kobras
Acked-by: Alasdair G Kergon
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
In the cleanups of drivers/rtc/s3c-rtc.c, the base address for the
registers got broken. This patch fixes that by ensuring the readb/writeb
are all prefixed with the base returned from ioremap()ing the registers.Also fix check for valid year range, which was the wrong way around.
Signed-off-by: Ben Dooks
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
27 Aug, 2006
12 commits
-
This fixes CCID3 to give much closer performance to RFC4342.
CCID3 is meant to alter sending rate based on RTT and loss.
The performance was verified against:
http://wand.net.nz/~perry/max_download.phpFor example I tested with netem and had the following parameters:
Delayed Acks 1, MSS 256 bytes, RTT 105 ms, packet loss 5%.This gives a theoretical speed of 71.9 Kbits/s. I measured across three
runs with this patch set and got 70.1 Kbits/s. Without this patchset the
average was 232 Kbits/s which means Linux can't be used for CCID3 research
properly.I also tested with netem turned off so box just acting as router with 1.2
msec RTT. The performance with this is the same with or without the patch
at around 30 Mbit/s.Signed off by: Ian McDonald
Signed-off-by: David S. Miller -
The bridge-netfilter code will overwrite memory if there is not
headroom in the skb to save the header. This first showed up when
using Xen with sky2 driver that doesn't allocate the extra space.Signed-off-by: Stephen Hemminger
Signed-off-by: David S. Miller -
* master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6:
[DCCP]: Introduce dccp_rx_hist_find_entry
[DCCP]: Introduces follows48 function
[DCCP]: Update contact details and copyright
[DCCP]: Fix typo
[IPV6]: Segmentation offload not set correctly on TCP children
[CONNECTOR]: Add userspace example code into Documentation/connector/ -
This adds a new function dccp_rx_hist_find_entry.
Signed off by: Ian McDonald
Signed-off-by: David S. Miller -
This adds a new function to see if two sequence numbers follow each
other.Signed off by: Ian McDonald
Signed-off-by: David S. Miller -
Just updating copyright and contacts
Signed off by: Ian McDonald
Signed-off-by: David S. Miller -
This fixes a small typo in net/dccp/libs/packet_history.c
Signed off by: Ian McDonald
Signed-off-by: David S. Miller -
TCP over IPV6 would incorrectly inherit the GSO settings.
This would cause kernel to send Tcp Segmentation Offload packets for
IPV6 data to devices that can't handle it. It caused the sky2 driver
to lock http://bugzilla.kernel.org/show_bug.cgi?id=7050
and the e1000 would generate bogus packets. I can't blame the
hardware for gagging if the upper layers feed it garbage.This was a new bug in 2.6.18 introduced with GSO support.
Signed-off-by: Stephen Hemminger
Signed-off-by: David S. Miller -
I was asked several times to include userspace example code into
Documentation, so if there is no policy against it, consider attached patch
for 2.6.18. This program works with included Documentation/connector/cn_test.c
connector module.Signed-off-by: Evgeniy Polyakov
Signed-off-by: David S. Miller -
The current sun disklabel code uses a signed int for the sector count.
When partitions larger than 1 TB are used, the cast to a sector_t causes
the partition sizes to be invalid:# cat /proc/paritions | grep sdan
66 112 2146435072 sdan
66 115 9223372036853660736 sdan3
66 120 9223372036853660736 sdan8This patch switches the sector count to an unsigned int to fix this.
Signed-off-by: Jeff Mahoney
Signed-off-by: Andrew Morton
Signed-off-by: David S. Miller -
It moves the smp_procesors_ready variable to sun4d_smp.c only.
Signed-off-by: Krzysztof Helt (krzysztof.h1@wp.pl)
Signed-off-by: David S. Miller -
smp_setup_cpu_possible_map() needs to run after paging_init()
so that the in-kernel device tree is setup.Signed-off-by: Krzysztof Helt
Signed-off-by: David S. Miller