28 Aug, 2006

28 commits

  • Change the list of cpus allowed to tasks in the top (root) cpuset to
    dynamically track what cpus are online, using a CPU hotplug notifier. Make
    this top cpus file read-only.

    On systems that have cpusets configured in their kernel, but that aren't
    actively using cpusets (for some distros, this covers the majority of
    systems) all tasks end up in the top cpuset.

    If that system does support CPU hotplug, then these tasks cannot make use
    of CPUs that are added after system boot, because the CPUs are not allowed
    in the top cpuset. This is a surprising regression over earlier kernels
    that didn't have cpusets enabled.

    In order to keep the behaviour of cpusets consistent between systems
    actively making use of them and systems not using them, this patch changes
    the behaviour of the 'cpus' file in the top (root) cpuset, making it read
    only, and making it automatically track the value of cpu_online_map. Thus
    tasks in the top cpuset will have automatic use of hot plugged CPUs allowed
    by their cpuset.

    Thanks to Anton Blanchard and Nathan Lynch for reporting this problem,
    driving the fix, and earlier versions of this patch.

    Signed-off-by: Paul Jackson
    Cc: Nathan Lynch
    Cc: Anton Blanchard
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Paul Jackson
     
  • A recent patch broke the ability to do a user-request check of a raid1.
    This patch fixes the breakage and also moves a comment that was dislocated
    by the same patch.

    Signed-off-by: Neil Brown
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    NeilBrown
     
  • If we
    - shut down a clean array,
    - restart with one (or more) drive(s) missing
    - make some changes
    - pause, so that the array gets marked 'clean',
    the event count on the superblock of included drives
    will be the same as that of the removed drives.
    So adding the removed drive back in will cause it
    to be included with no resync.

    To avoid this, we only update the eventcount backwards when the array
    is not degraded. In this case there can (should) be no non-connected
    drives that we can get confused with, and this is the particular case
    where updating-backwards is valuable.

    Signed-off-by: Neil Brown
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    NeilBrown
     
  • Fix two compile failures in eventpoll.c code which would happen if
    DEBUG_EPOLL is bigger than zero.

    Signed-off-by: Masoud Sharbiani
    Cc: Davide Libenzi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Masoud Asgharifard Sharbiani
     
  • Here's updated documentation for the relay interface, rewritten to match
    the relayfs->relay changes. It also moves relayfs.txt to relay.txt in the
    process.

    It includes the changes to relayfs.txt previously posted by Randy Dunlap,
    thanks for those.

    The relay-apps examples have also been updated to match, and can be found
    on the sourceforge relayfs website.

    Signed-off-by: Tom Zanussi
    Cc: "Randy.Dunlap"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Tom Zanussi
     
  • An up() is called in kernel/stop_machine.c on failure, and also in the
    caller (unconditionally).

    Signed-off-by: Zhou Yingchao
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Yingchao Zhou
     
  • 1) When we allocate the last fragment in ufs_truncate, we read the page,
    check whether the block is mapped to an address, and if not, try to
    allocate it. This is wrong: a fragment may be mapped but NOT allocated.
    This happens because the "block map" function does not check whether the
    fragment is allocated; it just takes the address of the first fragment
    in the block, adds the offset of the fragment and returns the result.
    That is correct behaviour in almost all situations except the call from
    ufs_truncate.

    2) Almost all UFS implementations that I could investigate have the
    following "defect": if the disk is full and you try to truncate a file,
    for example from 3GB to 2MB, and there is a hole in this region,
    truncate returns -ENOSPC. I tried to avoid this problem, but the "block
    allocation" algorithm is tied to the right value of i_lastfrag, and
    fixing this corner case may slow down ordinary scenarios, so this patch
    makes the behaviour of "truncate" operations similar to what other UFS
    implementations do.

    Signed-off-by: Evgeniy Dushistov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Evgeniy Dushistov
     
  • On UFS, this scenario:
    open(O_TRUNC)
    lseek(1024 * 1024 * 80)
    write("A")
    lseek(1024 * 2)
    write("A")

    may cause access to an invalid address.

    This happens because "goal" is calculated the wrong way in the block
    allocation path. As far as I can see, this problem also exists in 2.4.

    We use a construction like i_data[lastfrag], where i_data is an array of
    pointers to direct blocks, indirect blocks and so on. It has a certain
    size (~20 elements), while lastfrag may have a value of, for example,
    40000.

    This patch also fixes issues related to handling such a scenario: wrong
    zeroing of metadata in the case of block (not fragment) allocation, and
    wrong goal calculation when we allocate a block.

    Signed-off-by: Evgeniy Dushistov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Evgeniy Dushistov
     
  • To handle the earlier bogus ENOSPC error caused by a filesystem full of
    block reservations, the current code falls back to allocating without
    block reservation, starting from the goal allocation block group as if
    there were no block reservation.

    The current code needs to re-load the corresponding block group descriptor
    for the initial goal block group in this case. The patch fixes this.

    Signed-off-by: Mingming Cao
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mingming Cao
     
  • Mounting an ext2 filesystem with zero s_inodes_per_group will cause a
    divide error.

    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andries Brouwer
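
The divide error above comes from using an on-disk field as a divisor without validating it first. A minimal sketch of that kind of mount-time sanity check follows. The names `toy_super` and `toy_check_geometry` are hypothetical and are not the kernel's; the actual ext2 fix may be structured differently.

```c
#include <stdint.h>

/* Hypothetical, simplified stand-in for two on-disk superblock fields. */
struct toy_super {
    uint32_t s_inodes_count;
    uint32_t s_inodes_per_group;
};

/*
 * Returns 0 and stores the group count in *groups if the geometry is
 * usable, or -1 if s_inodes_per_group is zero. Without the check, the
 * division below would trap with a divide error.
 */
int toy_check_geometry(const struct toy_super *sb, uint32_t *groups)
{
    if (sb->s_inodes_per_group == 0)
        return -1;

    /* Round up so a partially filled last group is counted. */
    *groups = (uint32_t)(((uint64_t)sb->s_inodes_count
                          + sb->s_inodes_per_group - 1)
                         / sb->s_inodes_per_group);
    return 0;
}
```

A caller would refuse the mount when the function returns -1 instead of continuing with a corrupt superblock.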
     
  • Mounting a (corrupt) minix filesystem with zero s_zmap_blocks
    gives a spectacular crash on my 2.6.17.8 system, no doubt
    because minix/inode.c does an unconditional
    minix_set_bit(0,sbi->s_zmap[0]->b_data);

    [akpm@osdl.org: make labels consistent while we're there]

    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andries Brouwer
     
  • The recent hwctrl core conversion for MTD NAND devices broke the Amstrad
    Delta driver. This fixes it up and uses the existing control line defines
    rather than unclear magic numbers.

    Signed-off-by: Jonathan McDowell
    Acked-by: David Woodhouse
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jonathan McDowell
     
  • futex_find_get_task:

    if (p->state == EXIT_ZOMBIE || p->exit_state == EXIT_ZOMBIE)
            return NULL;

    I can't understand this. First, p->state can't be EXIT_ZOMBIE. The
    ->exit_state check looks strange too. Sub-threads or tasks whose ->parent
    ignores SIGCHLD go directly to EXIT_DEAD state (I am ignoring a ptrace
    case). Why should EXIT_DEAD tasks be ok? Yes, EXIT_ZOMBIE is more
    important (a task may stay zombie for a long time), but this doesn't mean
    we should explicitly ignore other EXIT_XXX states.

    Signed-off-by: Oleg Nesterov
    Acked-by: Ingo Molnar
    Cc: Thomas Gleixner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • When reading /dev/vcsa while a font with more than 256 characters is
    loaded, one of the attribute bits records the 9th bit of the character.
    But depending on the console driver (vgacon or fbcon for instance), that's
    bit 3 or bit 0. And there is no way for userland to know that, thus no way
    for userland to safely grab the screen content. So here is a (tested)
    patch:

    Add a VT_GETHIFONTMASK ioctl for knowing which bit is the 9th bit for VC
    text (vc_hi_font_mask field of the vc_data structure).

    Signed-off-by: Samuel Thibault
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Samuel Thibault
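
Once userland knows the mask, decoding a screen cell becomes unambiguous. The sketch below assumes each cell read from /dev/vcsa is handled as a 16-bit value with the character in the low byte and the attribute in the high byte, and that `hi_font_mask` is the value obtained from the VT_GETHIFONTMASK ioctl. The function name `vcsa_glyph_index` is hypothetical.

```c
#include <stdint.h>

/*
 * Decode one 16-bit text cell: low byte is the character, high byte is
 * the attribute. hi_font_mask selects the attribute bit that carries the
 * 9th character bit; 0 means no font with more than 256 glyphs is loaded.
 */
unsigned int vcsa_glyph_index(uint16_t cell, uint16_t hi_font_mask)
{
    unsigned int glyph = cell & 0xff;

    if (hi_font_mask && (cell & hi_font_mask))
        glyph |= 0x100;   /* 9th bit of the character */
    return glyph;
}
```

With a mask of 0x0800 (attribute bit 3) the cell 0x0841 decodes to glyph 0x141, while with a mask of 0x0100 (attribute bit 0) the same cell decodes to glyph 0x41, which is exactly the ambiguity the ioctl removes.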
     
  • I wish I was happier about this patch. It'll serve as a placeholder for
    the moment. I'm still trying to get a G550 working in order to even
    reproduce the problem this patch introduces. I find that the G450 has
    jitter even without this patch, so it won't show me what the patch changed.
    At this point, I'll continue trying to get the G550 to work, and in
    parallel work with the G450 to work out the kinks.

    The patch is below.

    Set XDVICLKCTRL only on PPC, as doing this apparently introduces jitter on
    the G550, at least on x86 architectures.

    Signed-off-by: Paul A. Clarke
    Signed-off-by: Petr Vandrovec
    Cc: "Antonino A. Daplas"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Paul A. Clarke
     
  • While testing Moxa C218T/PCI on PowerPC 405EP I found that loading firmware
    using the Linux kernel driver fails because the checksum calculation in
    the original code is not endianness independent.

    After I fixed this I found that uploading firmware in a system with
    multiple cards causes a kernel oops. I had a look in the recent moxa
    sources and found that they do some kind of locking there. Applying this
    lock fixed the problem.

    Alan sayeth:

    Checksum changes are clearly correct. Other changes are an improvement but
    not, I think, enough to handle malicious firmware attacks. That said such an
    attacker has CAP_SYS_RAWIO anyway so that part is irrelevant except for
    neatness.

    [akpm@osdl.org: cleanups]
    Signed-off-by: Dirk Eibach
    Acked-by: Alan Cox
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dirk Eibach
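
To illustrate what an endianness-independent checksum looks like, here is a minimal sketch. It assumes a plain 16-bit sum over little-endian words, which is an assumption for illustration only: the commit message does not describe the actual Moxa checksum algorithm, and `toy_fw_checksum` is a hypothetical name.

```c
#include <stddef.h>
#include <stdint.h>

/* Assemble a little-endian 16-bit word from bytes, on any host CPU. */
static uint16_t le16_at(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

/*
 * Sum the buffer as little-endian 16-bit words, wrapping at 16 bits.
 * A trailing odd byte is ignored in this sketch.
 */
uint16_t toy_fw_checksum(const uint8_t *buf, size_t len)
{
    uint16_t sum = 0;

    for (size_t i = 0; i + 1 < len; i += 2)
        sum = (uint16_t)(sum + le16_at(buf + i));
    return sum;
}
```

Reading the words byte by byte, rather than casting the buffer to a 16-bit pointer, gives the same result on a big-endian PowerPC as on a little-endian x86.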
     
  • Ignore the return value of early_init_acpi(), as it can give false error
    messages. If there is something really wrong, then register_driver will
    fail cleanly with EINVAL later.

    [ background: modprobe acpi-cpufreq on systems not capable of speed-scaling
    started failing with 'invalid argument', where previously it would only
    ever fail with -ENODEV.

    I'm not 100% happy with the solution. It'd be better to handle
    failure properly, but this is a low-impact change for 2.6.18.
    We can always revisit doing this better in .19 --davej.]

    Signed-off-by: Venkatesh Pallipadi
    Signed-off-by: Dave Jones
    Cc: Greg KH
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dave Jones
     
  • sched_setscheduler() looks at ->signal->rlim[]. It is unsafe to
    dereference ->signal unless tasklist_lock or ->siglock is held (or p ==
    current). We pin the task structure, but this can't prevent
    release_task()->__exit_signal() from setting ->signal = NULL.

    Restore tasklist_lock across the setscheduler call.

    Signed-off-by: Oleg Nesterov
    Cc: Greg KH
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • Commit b64ef8afa58f397e1eaba2bd9ecaa6812064d464 ("[PATCH] add imacfb
    documentation and detection") contained a wrong DMI_MATCH.

    Signed-off-by: Thomas Meyer
    Cc: Greg KH
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Thomas Meyer
     
  • Read the return value before we release the nand device, otherwise the
    value can become corrupted by another user of chip->ops, ultimately
    resulting in filesystem corruption.

    Signed-off-by: Richard Purdie
    Cc: David Woodhouse
    Acked-by: Josh Boyer
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Richard Purdie
     
  • On Wed, 2006-08-09 at 07:57 +0200, Rolf Eike Beer wrote:
    > =============================================
    > [ INFO: possible recursive locking detected ]
    > ---------------------------------------------
    > parted/7929 is trying to acquire lock:
    > (&bdev->bd_mutex){--..}, at: [] __blkdev_put+0x1e/0x13c
    >
    > but task is already holding lock:
    > (&bdev->bd_mutex){--..}, at: [] do_open+0x72/0x3a8
    >
    > other info that might help us debug this:
    > 1 lock held by parted/7929:
    > #0: (&bdev->bd_mutex){--..}, at: [] do_open+0x72/0x3a8
    > stack backtrace:
    > [] show_trace_log_lvl+0x58/0x15b
    > [] show_trace+0xd/0x10
    > [] dump_stack+0x17/0x1a
    > [] __lock_acquire+0x753/0x99c
    > [] lock_acquire+0x4a/0x6a
    > [] mutex_lock_nested+0xc8/0x20c
    > [] __blkdev_put+0x1e/0x13c
    > [] blkdev_put+0xa/0xc
    > [] do_open+0x336/0x3a8
    > [] blkdev_open+0x1f/0x4c
    > [] __dentry_open+0xc7/0x1aa
    > [] nameidata_to_filp+0x1c/0x2e
    > [] do_filp_open+0x2e/0x35
    > [] do_sys_open+0x38/0x68
    > [] sys_open+0x16/0x18
    > [] sysenter_past_esp+0x56/0x8d

    OK, I'm having a look here; it's all new to me so bear with me.

    blkdev_open() calls
      do_open(bdev, ..., BD_MUTEX_NORMAL) and takes
        mutex_lock_nested(&bdev->bd_mutex, BD_MUTEX_NORMAL)

    then something fails, and we're thrown to:

    out_first: where
      if (bdev != bdev->bd_contains)
        blkdev_put(bdev->bd_contains) which is
          __blkdev_put(bdev->bd_contains, BD_MUTEX_NORMAL) which does
            mutex_lock_nested(&bdev->bd_contains->bd_mutex, BD_MUTEX_NORMAL)

    bd_contains is either bdev or whole, and since we take the branch it
    must be whole. So it seems to me the following patch would be the right
    one:

    [akpm@osdl.org: compile fix]
    Signed-off-by: Peter Zijlstra
    Cc: Arjan van de Ven
    Acked-by: NeilBrown
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Peter Zijlstra
     
  • Recently a patch was added for preliminary suspend/resume handling on
    !PPC_PMAC. However, this broke both suspend and firewire on powerpc
    because it saves the pci state after the device has already been disabled.

    This moves the save state to before the pmac specific code.

    Signed-off-by: Danny Tholen
    Cc: Stefan Richter
    Acked-by: Benjamin Herrenschmidt
    Cc: Ben Collins
    Cc: Jody McIntyre
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Danny Tholen
     
  • Sergey Vlasov noticed that there is no kernel.suid_dumpable, only
    fs.suid_dumpable.

    How did KERN_SETUID_DUMPABLE end up in fs_table[]? Hell knows...

    Signed-off-by: Alexey Dobriyan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
     
  • When cdev_add() fails there is no reason to call cdev_del().

    Signed-off-by: Rolf Eike Beer
    Cc: Alan Cox
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Rolf Eike Beer
     
  • Fix the year check on setting the time with the S3C24XX RTC driver. Also
    move the debug to before the set to see what is going on if it does fail.

    Signed-off-by: Ben Dooks
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ben Dooks
     
  • There is a bug in mm/swapfile.c#swap_type_of() that restricts swsusp to
    using only the first active swap partition as the resume device. Fix it.

    Signed-off-by: Rafael J. Wysocki
    Cc: Hugh Dickins
    Acked-by: Pavel Machek
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Rafael J. Wysocki
     
  • On an nForce4-equipped machine with two SATA disks in a raid1 setup using
    dmraid, we experienced frequent deadlocks of the system under high i/o
    load. 'cat /dev/zero > ~/zero' was the most reliable way to reproduce
    them: randomly after a few GB, 'cp' would be left in 'D' state along with
    kjournald and kmirrord. The functions in which cp and kjournald were
    blocked varied, but kmirrord's wchan always pointed to 'mempool_alloc()'.
    We've seen this pattern on 2.6.15 and 2.6.17 kernels.
    http://lkml.org/lkml/2005/4/20/142 indicates that this problem has been
    around even before.

    So much for the facts, here's my interpretation: mempool_alloc() first tries
    to atomically allocate the requested memory, or falls back to hand out
    preallocated chunks from the mempool. If both fail, it puts the calling
    process (kmirrord in this case) on a private waitqueue until somebody refills
    the pool. Where the only 'somebody' is kmirrord itself, so we have a
    deadlock.

    I worked around this problem by falling back to a (blocking) kmalloc in
    the case where kmirrord would previously have ended up on the waitqueue.
    This defeats part of the benefits of using the mempool, but at least keeps
    the system running. And it could be done with a two-line change. Note that
    mempool_alloc() clears the GFP_NOIO flag internally, and only uses it to
    decide whether to wait or return an error if immediate allocation fails,
    so the attached patch doesn't change behaviour in the non-deadlocking
    case. The patch is against current git (2.6.18-rc4), but should apply to
    earlier versions as well. I've tested on 2.6.15, where this patch makes
    the difference between random lockup and a stable system.

    Signed-off-by: Daniel Kobras
    Acked-by: Alasdair G Kergon
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Daniel Kobras
     
  • In the cleanups of drivers/rtc/s3c-rtc.c, the base address for the
    registers got broken. This patch fixes that by ensuring the readb/writeb
    are all prefixed with the base returned from ioremap()ing the registers.

    Also fix check for valid year range, which was the wrong way around.

    Signed-off-by: Ben Dooks
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ben Dooks
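
The year-range check mentioned above can be sketched as follows. This assumes the RTC stores a two-digit year relative to 2000, so that only the years 2000 to 2099 are representable; that range is an assumption for illustration, not something the commit message states, and `toy_rtc_year_ok` is a hypothetical name.

```c
/*
 * tm_year counts years since 1900, as in struct tm.
 * Assumption: the hardware holds a two-digit year relative to 2000.
 * Returns nonzero if the year can be stored, zero otherwise.
 */
int toy_rtc_year_ok(int tm_year)
{
    int year = tm_year - 100;   /* years since 2000 */

    return year >= 0 && year < 100;
}
```

A check that is "the wrong way around" would reject exactly the values this function accepts, so every valid set-time request would fail.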
     

27 Aug, 2006

12 commits

  • This fixes CCID3 so that its performance is much closer to what RFC4342
    specifies.

    CCID3 is meant to alter sending rate based on RTT and loss.

    The performance was verified against:
    http://wand.net.nz/~perry/max_download.php

    For example I tested with netem and had the following parameters:
    Delayed Acks 1, MSS 256 bytes, RTT 105 ms, packet loss 5%.

    This gives a theoretical speed of 71.9 Kbits/s. I measured across three
    runs with this patch set and got 70.1 Kbits/s. Without this patchset the
    average was 232 Kbits/s which means Linux can't be used for CCID3 research
    properly.

    I also tested with netem turned off, so the box was just acting as a
    router with a 1.2 msec RTT. Performance in this case is the same with or
    without the patch, at around 30 Mbit/s.

    Signed-off-by: Ian McDonald
    Signed-off-by: David S. Miller

    Ian McDonald
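
The 71.9 Kbits/s theoretical figure quoted above is consistent with the TFRC throughput equation from RFC 3448, which CCID3 (RFC 4342) builds on. The worked calculation below is a reconstruction, not part of the commit message. It assumes b = 1 (from "Delayed Acks 1"), the usual simplification t_RTO = 4R, and s taken as the 256-byte MSS.

```latex
X = \frac{s}{R\sqrt{\tfrac{2bp}{3}} + t_{RTO}\left(3\sqrt{\tfrac{3bp}{8}}\right)p\,(1 + 32p^2)}

s = 256\ \text{bytes},\quad R = 0.105\ \text{s},\quad p = 0.05,\quad b = 1,\quad t_{RTO} = 4R = 0.42\ \text{s}

R\sqrt{\tfrac{2bp}{3}} \approx 0.105 \times 0.182574 \approx 0.019170

t_{RTO}\left(3\sqrt{\tfrac{3bp}{8}}\right)p\,(1 + 32p^2) \approx 0.42 \times 0.410792 \times 0.05 \times 1.08 \approx 0.009317

X \approx \frac{256}{0.028487} \approx 8987\ \text{bytes/s} \approx 71.9\ \text{kbit/s}
```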
     
  • The bridge-netfilter code will overwrite memory if there is not enough
    headroom in the skb to save the header. This first showed up when
    using Xen with the sky2 driver, which doesn't allocate the extra space.

    Signed-off-by: Stephen Hemminger
    Signed-off-by: David S. Miller

    Stephen Hemminger
     
  • * master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6:
    [DCCP]: Introduce dccp_rx_hist_find_entry
    [DCCP]: Introduces follows48 function
    [DCCP]: Update contact details and copyright
    [DCCP]: Fix typo
    [IPV6]: Segmentation offload not set correctly on TCP children
    [CONNECTOR]: Add userspace example code into Documentation/connector/

    Linus Torvalds
     
  • This adds a new function dccp_rx_hist_find_entry.

    Signed-off-by: Ian McDonald
    Signed-off-by: David S. Miller

    Ian McDonald
     
  • This adds a new function to see if two sequence numbers follow each
    other.

    Signed-off-by: Ian McDonald
    Signed-off-by: David S. Miller

    Ian McDonald
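
DCCP sequence numbers are 48 bits wide, so "follows" has to account for wraparound at 2^48. The sketch below shows one way to express that test. It is not the kernel's follows48 implementation, whose body is not shown in this log; `toy_follows48` and `TOY_SEQ48_MASK` are hypothetical names.

```c
#include <stdint.h>

#define TOY_SEQ48_MASK ((UINT64_C(1) << 48) - 1)

/*
 * Nonzero if seq1 is the sequence number immediately after seq2,
 * with arithmetic taken modulo 2^48.
 */
int toy_follows48(uint64_t seq1, uint64_t seq2)
{
    return ((seq2 + 1) & TOY_SEQ48_MASK) == (seq1 & TOY_SEQ48_MASK);
}
```

With this definition, sequence number 0 follows 0xFFFFFFFFFFFF, which is the wraparound case a plain `seq1 == seq2 + 1` comparison would miss.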
     
  • Just updating copyright and contacts

    Signed-off-by: Ian McDonald
    Signed-off-by: David S. Miller

    Ian McDonald
     
  • This fixes a small typo in net/dccp/libs/packet_history.c

    Signed-off-by: Ian McDonald
    Signed-off-by: David S. Miller

    Ian McDonald
     
  • TCP over IPV6 would incorrectly inherit the GSO settings.
    This would cause the kernel to send TCP Segmentation Offload packets for
    IPV6 data to devices that can't handle it. It caused the sky2 driver
    to lock up (http://bugzilla.kernel.org/show_bug.cgi?id=7050)
    and the e1000 would generate bogus packets. I can't blame the
    hardware for gagging if the upper layers feed it garbage.

    This was a new bug in 2.6.18 introduced with GSO support.

    Signed-off-by: Stephen Hemminger
    Signed-off-by: David S. Miller

    Stephen Hemminger
     
  • I was asked several times to include userspace example code into
    Documentation, so if there is no policy against it, consider attached patch
    for 2.6.18. This program works with the included
    Documentation/connector/cn_test.c connector module.

    Signed-off-by: Evgeniy Polyakov
    Signed-off-by: David S. Miller

    Evgeniy Polyakov
     
  • The current sun disklabel code uses a signed int for the sector count.
    When partitions larger than 1 TB are used, the cast to a sector_t causes
    the partition sizes to be invalid:

    # cat /proc/partitions | grep sdan
    66 112 2146435072 sdan
    66 115 9223372036853660736 sdan3
    66 120 9223372036853660736 sdan8

    This patch switches the sector count to an unsigned int to fix this.

    Signed-off-by: Jeff Mahoney
    Signed-off-by: Andrew Morton
    Signed-off-by: David S. Miller

    Jeff Mahoney
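
The huge sizes in the listing are what sign extension produces when a 32-bit count above 2^31 passes through a signed int before being widened to 64 bits. The sketch below reproduces the effect in user space. The raw count 4292737152 is an assumption, back-computed from the 9223372036853660736 figure in the listing (which is in 1 KiB blocks, so two 512-byte sectors each); the function names are hypothetical.

```c
#include <stdint.h>

/* Old behaviour: the 32-bit on-disk count is held in a signed int first. */
uint64_t sectors_via_signed(uint32_t raw)
{
    /*
     * Converting a value above INT32_MAX to int32_t is
     * implementation-defined; common compilers wrap it to a negative number.
     */
    int32_t n = (int32_t)raw;

    return (uint64_t)n;   /* a negative n becomes 2^64 - |n| */
}

/* Fixed behaviour: keep the count unsigned all the way through. */
uint64_t sectors_via_unsigned(uint32_t raw)
{
    return (uint64_t)raw;
}
```

Dividing the signed result for 4292737152 sectors by two gives 9223372036853660736 blocks, matching the listing, while the unsigned path gives 2146368576 blocks, which fits inside the 2146435072-block disk.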
     
  • This moves the smp_processors_ready variable into sun4d_smp.c only.

    Signed-off-by: Krzysztof Helt (krzysztof.h1@wp.pl)
    Signed-off-by: David S. Miller

    Krzysztof Helt
     
  • smp_setup_cpu_possible_map() needs to run after paging_init()
    so that the in-kernel device tree is set up.

    Signed-off-by: Krzysztof Helt
    Signed-off-by: David S. Miller

    Krzysztof Helt