23 Jun, 2006

40 commits

  • Do a safer check for when to enable DMA. Currently we enable ISA DMA
    for cases that do not need it, resulting in OOM conditions when ZONE_DMA
    runs out of space.

    Signed-off-by: Jens Axboe

    Andi Kleen
     
  • They all duplicate macros to check for empty root and/or node, and
    clearing a node. So put those in rbtree.h.
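    A minimal userspace sketch of such shared helpers follows. It assumes a simplified node with a plain parent pointer; the real struct rb_node packs parent and colour into one word. It also assumes the convention that a node outside any tree points to itself. The SKETCH_* names are hypothetical and are not the definitions in rbtree.h:

    ```c
    #include <assert.h>
    #include <stddef.h>

    /* Simplified node (hypothetical): real rb_node also stores the colour. */
    struct sketch_rb_node {
        struct sketch_rb_node *parent;
        struct sketch_rb_node *left, *right;
    };

    struct sketch_rb_root {
        struct sketch_rb_node *rb_node;
    };

    /* An empty tree has no root node. */
    #define SKETCH_RB_EMPTY_ROOT(root)  ((root)->rb_node == NULL)
    /* Assumed convention: a node that is in no tree points to itself. */
    #define SKETCH_RB_CLEAR_NODE(node)  ((node)->parent = (node))
    #define SKETCH_RB_EMPTY_NODE(node)  ((node)->parent == (node))
    ```

    Defining these once means every user tests emptiness the same way instead of carrying its own copy.
    
    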

    Signed-off-by: Jens Axboe

    Jens Axboe
     
  • Otherwise we could be racing with truncate/mapping removal.

    Problem found/fixed by Nick Piggin, logic rewritten
    by me.

    Signed-off-by: Jens Axboe

    Jens Axboe
     
  • - Remember to set ->last_sector so that the cfq_choose_req() logic
    works correctly.

    - Remove redundant call to cfq_choose_req()

    Signed-off-by: Jens Axboe

    Jens Axboe
     
  • This is a collection of patches that greatly improve CFQ performance
    in some circumstances.

    - Change the idling logic to only kick in after a request is done and we
    are deciding what to do. Previously the idling included the request service
    time, so it was hard to adjust. Now it's true think/idle time.

    - Take advantage of TCQ/NCQ/queueing for seeky sync workloads, but keep
    it in control for sync and sequential (or close to) workloads.

    - Expire queues immediately and move on to other busy queues, if we are
    not going to idle after the current one finishes.

    - Don't rearm idle timer if there are no busy queues. Just leave the
    system idle.

    Signed-off-by: Jens Axboe

    Jens Axboe
     
  • Patch originally from Vasily Tarasov

    If you set the io-priority of process 1 using the sys_ioprio_set system
    call from another process 2 (as ionice does), then cfq_init_prio_data()
    sets the priority of process 2 (current) on the queue of process 1 and
    clears the flag that marks an ioprio change. So process 1 ends up running
    with the priority of process 2.

    I propose not to call cfq_init_prio_data() on an io-priority change, but
    only to mark the queue as having a changed priority. Every time a new
    request comes in, the cfq scheduler checks this flag and automatically
    changes the priority of the queue to the new value.
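    The deferred update described above can be sketched as follows. The struct and function names are hypothetical stand-ins, not CFQ's real types. The setter only records the new value and flags the queue. The queue then picks up the priority from its own owner when the next request arrives, never from whichever task made the change:

    ```c
    #include <assert.h>

    /* Hypothetical, heavily simplified stand-ins for a task and its queue. */
    struct sketch_task {
        int ioprio;
    };

    struct sketch_queue {
        struct sketch_task *owner;   /* task the queue belongs to */
        int ioprio;                  /* priority currently applied */
        int prio_changed;            /* set when owner's ioprio changed */
    };

    /* ioprio_set path, possibly run by ANOTHER task: record the value on
     * the owner and flag the queue, but do not recompute anything here. */
    static void sketch_set_ioprio(struct sketch_queue *q, int new_prio)
    {
        q->owner->ioprio = new_prio;
        q->prio_changed = 1;
    }

    /* New-request path: apply the owner's priority if it was changed. */
    static void sketch_on_new_request(struct sketch_queue *q)
    {
        if (q->prio_changed) {
            q->ioprio = q->owner->ioprio;
            q->prio_changed = 0;
        }
    }
    ```
    
    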

    Signed-off-by: Jens Axboe

    Jens Axboe
     
  • Signed-off-by: Jens Axboe

    Jens Axboe
     
  • This saves 8 bytes of data on 64-bit archs.

    Signed-off-by: Jens Axboe

    Jens Axboe
     
  • The IDE power management can just use the ->end_io_data member to store
    its data.

    Signed-off-by: Jens Axboe

    Jens Axboe
     
  • A process flag to indicate whether we are doing sync io is incredibly
    ugly. It also causes performance problems when one does a lot of async
    io and then proceeds to sync it. Part of the io will go out as async,
    and the other part as sync. This causes a disconnect between the
    previously submitted io and the synced io. For io schedulers such as CFQ,
    this causes lost merges and suboptimal scheduling behaviour.

    Remove PF_SYNCWRITE completely from the fsync/msync paths, and let
    the O_DIRECT path just directly indicate that the writes are sync
    by using WRITE_SYNC instead.

    Signed-off-by: Jens Axboe

    Jens Axboe
     
  • We cannot update them if the user changes nr_requests, so don't
    set it in the first place. The gains are pretty questionable as
    well. The batching loss has been shown to decrease throughput.

    Signed-off-by: Jens Axboe

    Jens Axboe
     
  • We already drop the refcount in elevator_exit(), and as
    we're setting 'e' to NULL, we'll never take that branch anyway.
    Finally, as 'e' is a local var that isn't referenced afterwards,
    setting it to NULL is pointless.

    Signed-off-by: Dave Jones
    Signed-off-by: Jens Axboe

    Dave Jones
     
  • Signed-off-by: Alexey Dobriyan
    Signed-off-by: Andrew Morton
    Signed-off-by: Jens Axboe

    Alexey Dobriyan
     
  • The queue lock can be taken from interrupts so it must always be taken with
    irq disabling primitives. Some primitives already verify this.
    blk_start_queue() is called under this lock, so interrupts must be
    disabled.

    Also document this requirement clearly in blk_init_queue(), where the queue
    spinlock is set.

    Signed-off-by: Paolo 'Blaisorblade' Giarrusso
    Signed-off-by: Andrew Morton
    Signed-off-by: Jens Axboe

    Paolo 'Blaisorblade' Giarrusso
     
  • Use hlist instead of list_head for the request hashtable in deadline-iosched
    and as-iosched. This also makes the flag that tracks whether a request is
    hashed unnecessary.
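    A userspace sketch shows why hlist makes the flag unnecessary. A node's pprev pointer is NULL exactly when the node is on no chain, so hlist_unhashed() answers the question directly. Each bucket head is also a single pointer instead of two. The functions below mirror names in include/linux/list.h but are a simplified re-implementation for illustration only:

    ```c
    #include <assert.h>
    #include <stddef.h>

    struct hlist_node {
        struct hlist_node *next, **pprev;
    };

    struct hlist_head {
        struct hlist_node *first;   /* one pointer per hash bucket */
    };

    static void INIT_HLIST_NODE(struct hlist_node *n)
    {
        n->next = NULL;
        n->pprev = NULL;
    }

    /* "Is it hashed?" comes from pprev, so no separate flag is needed. */
    static int hlist_unhashed(const struct hlist_node *n)
    {
        return n->pprev == NULL;
    }

    static void hlist_add_head(struct hlist_node *n, struct hlist_head *h)
    {
        struct hlist_node *first = h->first;

        n->next = first;
        if (first)
            first->pprev = &n->next;
        h->first = n;
        n->pprev = &h->first;
    }

    /* Unlink and re-initialise, so hlist_unhashed() is true again. */
    static void hlist_del_init(struct hlist_node *n)
    {
        if (hlist_unhashed(n))
            return;
        *n->pprev = n->next;
        if (n->next)
            n->next->pprev = n->pprev;
        INIT_HLIST_NODE(n);
    }
    ```
    
    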

    Signed-off-by: Akinobu Mita
    Signed-off-by: Jens Axboe

    block/as-iosched.c | 45 +++++++++++++++++++--------------------------
    block/deadline-iosched.c | 39 ++++++++++++++++-----------------------
    2 files changed, 35 insertions(+), 49 deletions(-)

    Akinobu Mita
     
  • * master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6:
    [NET]: Require CAP_NET_ADMIN to create tuntap devices.
    [NET]: fix net-core kernel-doc
    [TCP]: Move inclusion of to correct place in
    [IPSEC]: Handle GSO packets
    [NET]: Added GSO toggle
    [NET]: Add software TSOv4
    [NET]: Add generic segmentation offload
    [NET]: Merge TSO/UFO fields in sk_buff
    [NET]: Prevent transmission after dev_deactivate
    [IPV6] ADDRCONF: Fix default source address selection without CONFIG_IPV6_PRIVACY
    [IPV6]: Fix source address selection.
    [NET]: Avoid allocating skb in skb_pad

    Linus Torvalds
     
  • * 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6: (65 commits)
    ACPI: suppress power button event on S3 resume
    ACPI: resolve merge conflict between sem2mutex and processor_perflib.c
    ACPI: use for_each_possible_cpu() instead of for_each_cpu()
    ACPI: delete newly added debugging macros in processor_perflib.c
    ACPI: UP build fix for bugzilla-5737
    Enable P-state software coordination via _PDC
    P-state software coordination for speedstep-centrino
    P-state software coordination for acpi-cpufreq
    P-state software coordination for ACPI core
    ACPI: create acpi_thermal_resume()
    ACPI: create acpi_fan_suspend()/acpi_fan_resume()
    ACPI: pass pm_message_t from acpi_device_suspend() to root_suspend()
    ACPI: create acpi_device_suspend()/acpi_device_resume()
    ACPI: replace spin_lock_irq with mutex for ec poll mode
    ACPI: Allow a WAN module enable/disable on a Thinkpad X60.
    sem2mutex: acpi, acpi_link_lock
    ACPI: delete unused acpi_bus_drivers_lock
    sem2mutex: drivers/acpi/processor_perflib.c
    ACPI add ia64 exports to build acpi_memhotplug as a module
    ACPI: asus_acpi_init(): propagate correct return value
    ...

    Manual resolve of conflicts in:

    arch/i386/kernel/cpu/cpufreq/acpi-cpufreq.c
    arch/i386/kernel/cpu/cpufreq/speedstep-centrino.c
    include/acpi/processor.h

    Linus Torvalds
     
  • Update the sparse documentation to omit the -Wbitwise flag example (as it
    is now passed by default), and document the kernel defines to enable
    endianness checking.

    Signed-off-by: Bob Copeland
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Bob Copeland
     
  • While writing a version of losetup, I ran into the problem that the loop
    device was returning total garbage.

    It turns out the problem was that this losetup was only issuing the
    LOOP_SET_FD ioctl and not issuing a subsequent LOOP_SET_STATUS ioctl. This
    losetup didn't have any special status to set, so it left out the call.

    The deeper cause is that loop_set_fd sets the transfer function to NULL,
    which causes no transfer to happen in lo_do_transfer.

    This patch fixes the problem by setting transfer to transfer_none in
    loop_set_fd.
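    Below is a reduced model of the failure, using hypothetical names. When the transfer hook is NULL, lo_do_transfer returns without copying anything, so the destination keeps whatever bytes it already held. The patch installs transfer_none when the file descriptor is set, and the sketch mirrors that with an identity hook, so the device works without a LOOP_SET_STATUS call:

    ```c
    #include <assert.h>
    #include <string.h>

    /* Hypothetical simplified transfer hook: copy/transform size bytes. */
    typedef int (*sketch_transfer_fn)(char *dst, const char *src, size_t size);

    /* Identity transfer: a plain copy, standing in for transfer_none. */
    static int sketch_transfer_none(char *dst, const char *src, size_t size)
    {
        memcpy(dst, src, size);
        return 0;
    }

    struct sketch_loop_device {
        sketch_transfer_fn transfer;
    };

    /* With the fix, setting the backing file installs a usable default. */
    static void sketch_set_fd(struct sketch_loop_device *lo)
    {
        lo->transfer = sketch_transfer_none;
    }

    static int sketch_do_transfer(struct sketch_loop_device *lo, char *dst,
                                  const char *src, size_t size)
    {
        if (!lo->transfer)
            return 0;   /* the bug: nothing copied, dst keeps stale bytes */
        return lo->transfer(dst, src, size);
    }
    ```
    
    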

    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Constantine Sapuntzakis
     
  • Sometimes partitions claim to be larger than the reported capacity of a
    disk device. This patch makes the kernel warn about those partitions.

    We still permit these partitions to be used. Quoting Andries Brouwer:

    Case 1: The kernel is mistaken about the size of the disk. (There are
    commands to clip a disk to a certain capacity, there are jumpers to tell a
    disk that it should report a certain capacity etc. Usually this is because
    of BIOS bugs. In bad cases the machine will crash in the BIOS and hence fail
    to boot if the disk reports full capacity.) In such cases actually accessing
    the blocks of the partition may work fine, or may work fine after running an
    unclip utility. I wrote "setmax" some years ago precisely for this reason.

    Case 2: There was a messy partition table (maybe just a rounding error) but
    the actual filesystem on the partition is contained in the physical disk.
    Now using the filesystem goes without problem.

    Case 3: Both partition and filesystem extend beyond the end of the disk. In
    forensic or debugging situations one often uses a copy of the start of a
    disk. Now access beyond the end gives an expected I/O error.
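    The check itself amounts to comparing the end of the partition with the disk capacity. The sketch below uses hypothetical names and is written so that start + size cannot overflow. It warns but still leaves the partition usable, as described above:

    ```c
    #include <assert.h>
    #include <stdbool.h>
    #include <stdio.h>

    typedef unsigned long long sketch_sector;   /* hypothetical sector type */

    /* True if [start, start + nr_sects) runs past the reported capacity. */
    static bool partition_exceeds_capacity(sketch_sector start,
                                           sketch_sector nr_sects,
                                           sketch_sector capacity)
    {
        return start > capacity || nr_sects > capacity - start;
    }

    /* Warn only; the caller keeps the partition either way.
     * Returns whether a warning was printed. */
    static bool warn_if_oversized(const char *disk, int partno,
                                  sketch_sector start, sketch_sector nr_sects,
                                  sketch_sector capacity)
    {
        if (!partition_exceeds_capacity(start, nr_sects, capacity))
            return false;
        fprintf(stderr, "%s: p%d exceeds device capacity\n", disk, partno);
        return true;
    }
    ```
    
    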

    Signed-off-by: Mike Miller
    Signed-off-by: Stephen Cameron
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mike Miller
     
  • Signed-off-by: Eric Sesterhenn
    Signed-off-by: Alexey Dobriyan
    Cc: Bartlomiej Zolnierkiewicz
    Cc: Alan Cox
    Cc: James Bottomley
    Acked-by: "Salyzyn, Mark"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Eric Sesterhenn
     
  • Split the checkpoint list of the transaction into two lists. In the first
    list we keep the buffers that need to be submitted for IO. In the second
    list are kept buffers that were already submitted and we just have to wait
    for the IO to complete. This should simplify the handling of checkpoint
    lists a bit and may eventually also bring a performance gain.

    Signed-off-by: Jan Kara
    Cc: Mark Fasheh
    Cc: "Stephen C. Tweedie"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jan Kara
     
  • Mark a few non-exported functions static.

    Signed-off-by: Peter Hagervall
    Cc: Paul Fulghum
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Peter Hagervall
     
  • Correct the return type of handle_IRQ_event() (inconsistency noticed during
    Xen development), and remove redundant declarations. The return type
    adjustment required breaking out the definition of irqreturn_t into a
    separate header, in order to satisfy current include order dependencies.

    Signed-off-by: Jan Beulich

    Cc: Richard Henderson
    Cc: Ivan Kokshaysky
    Cc: Russell King
    Cc: Ian Molton
    Cc: Mikael Starvik
    Cc: Yoshinori Sato
    Cc: Hirokazu Takata
    Cc: Heiko Carstens
    Cc: Martin Schwidefsky
    Cc: William Lee Irwin III
    Cc: "David S. Miller"
    Cc: Miles Bader
    Cc: Geert Uytterhoeven
    Cc: Roman Zippel
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jan Beulich
     
  • This patch fixes a NULL dereference spotted by the Coverity checker.

    Signed-off-by: Adrian Bunk
    Cc: "H. Peter Anvin"
    Cc: Neil Brown
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Adrian Bunk
     
  • Add a chapter on typedefs, copied from an email from Linus to lkml on Feb.
    3, 2006. (Subject: Re: [RFC][PATCH 1/5] Virtualization/containers:
    startup)

    Signed-off-by: Randy Dunlap
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Randy Dunlap
     
  • When CONFIG_BASE_SMALL=1, cascade() in may enter an infinite loop.
    With CONFIG_BASE_SMALL=1 (TVR_BITS=6 and TVN_BITS=4), the list
    base->tv5 may cascade into base->tv5 itself, so the kernel loops
    forever in cascade().

    I created a test module to verify this bug, and a patch to fix it.

    #include
    #include
    #include
    #include
    #if 0
    #include
    #else
    #define kdb_printf printk
    #endif

    #define TVN_BITS (CONFIG_BASE_SMALL ? 4 : 6)
    #define TVR_BITS (CONFIG_BASE_SMALL ? 6 : 8)
    #define TVN_SIZE (1 << TVN_BITS)
    #define TVR_SIZE (1 << TVR_BITS)
    #define TVN_MASK (TVN_SIZE - 1)
    #define TVR_MASK (TVR_SIZE - 1)

    #define TV_SIZE(N) (N*TVN_BITS + TVR_BITS)

    struct timer_list timer0;
    struct timer_list dummy_timer1;
    struct timer_list dummy_timer2;

    void dummy_timer_fun(unsigned long data)
    {
    }

    unsigned long j = 0;

    void check_timer_base(unsigned long data)
    {
        kdb_printf("check_timer_base %08lx\n", jiffies);
        mod_timer(&timer0, (jiffies & (~0xFFF)) + 0x1FFF);
    }

    int init_module(void)
    {
        init_timer(&timer0);
        timer0.data = (unsigned long)0;
        timer0.function = check_timer_base;
        mod_timer(&timer0, jiffies + 1);

        init_timer(&dummy_timer1);
        dummy_timer1.data = (unsigned long)0;
        dummy_timer1.function = dummy_timer_fun;

        init_timer(&dummy_timer2);
        dummy_timer2.data = (unsigned long)0;
        dummy_timer2.function = dummy_timer_fun;

        j = jiffies;
        j&=(~((1<<<
    Cc: Matt Mackall
    Signed-off-by: Oleg Nesterov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
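    The arithmetic behind the report can be checked in userspace. With the TVR_BITS/TVN_BITS values from the test module, the five wheel levels together cover TVR_BITS + 4 * TVN_BITS bits of expiry. That is 22 bits with CONFIG_BASE_SMALL=1 and 32 bits otherwise. The sketch assumes that timeouts are clamped to 0xffffffff jiffies, so a timeout can be far longer than the small wheel spans. The helper names are hypothetical, and the level selection is simplified from what the macros imply. A timeout of 2^23 jiffies lands in level 5 in the small configuration but only in level 4 in the normal one:

    ```c
    #include <assert.h>

    /* Total expiry bits the five levels represent (tvr/tvn = bits per level). */
    static int wheel_span_bits(int tvr, int tvn)
    {
        return tvr + 4 * tvn;
    }

    /* Largest timeout (jiffies from now) the wheel can represent exactly. */
    static unsigned long long max_tval(int tvr, int tvn)
    {
        return (1ULL << wheel_span_bits(tvr, tvn)) - 1;
    }

    /* Level (1..5) a timer idx jiffies ahead is filed under. Everything
     * past tv4's range goes to tv5, even if it exceeds the wheel's span. */
    static int level_for(unsigned long long idx, int tvr, int tvn)
    {
        if (idx < (1ULL << tvr))
            return 1;
        if (idx < (1ULL << (tvr + tvn)))
            return 2;
        if (idx < (1ULL << (tvr + 2 * tvn)))
            return 3;
        if (idx < (1ULL << (tvr + 3 * tvn)))
            return 4;
        return 5;
    }
    ```
    
    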

    Porpoise
     
  • list_splice_init(list, head) does unneeded job if it is known that
    list_empty(head) == 1. We can use list_replace_init() instead.

    Signed-off-by: Oleg Nesterov
    Acked-by: David S. Miller
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • list_replace() is similar to list_replace_rcu(), but unlike
    list_replace_rcu() it:

    - can be used when list_empty(old) == 1
    - doesn't use barriers
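    The userspace sketch below follows the shape of the helpers in include/linux/list.h. It shows why list_replace() is safe on an empty list: with old->next == old->prev == old, the four stores in this order leave new pointing at itself, i.e. empty. list_replace_init() then resets old so that it can be reused. This is an illustration, not the kernel source:

    ```c
    #include <assert.h>

    struct list_head {
        struct list_head *next, *prev;
    };

    static void INIT_LIST_HEAD(struct list_head *list)
    {
        list->next = list;
        list->prev = list;
    }

    static int list_empty(const struct list_head *head)
    {
        return head->next == head;
    }

    static void list_add_tail(struct list_head *entry, struct list_head *head)
    {
        entry->prev = head->prev;
        entry->next = head;
        head->prev->next = entry;
        head->prev = entry;
    }

    /* new takes over old's position; no memory barriers are involved. */
    static void list_replace(struct list_head *old, struct list_head *new)
    {
        new->next = old->next;
        new->next->prev = new;
        new->prev = old->prev;
        new->prev->next = new;
    }

    /* Same, then leave old as a valid empty list. */
    static void list_replace_init(struct list_head *old, struct list_head *new)
    {
        list_replace(old, new);
        INIT_LIST_HEAD(old);
    }
    ```
    
    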

    Signed-off-by: Oleg Nesterov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • Cc: Alexey Dobriyan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    bjdouma
     
  • There are three different IO cards which an SGI IOC4 controller may find
    itself on. One of these variants does not bring out the IDE and serial
    signals, so we need to disable attaching the corresponding IOC4 subdrivers
    to such cards.

    Cleans up message clutter emitted during device probing.

    Signed-off-by: Brent Casavant
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Brent Casavant
     
  • Fix one audit kernel-doc description (one parameter was missing).
    Add audit*.c interfaces to DocBook.
    Add BSD accounting interfaces to DocBook.

    Signed-off-by: Randy Dunlap
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Randy Dunlap
     
  • Remove synchronize_kernel() (deprecated 2-APR-2005 in
    http://lkml.org/lkml/2005/4/3/11) and make the RCU API inaccessible to
    non-GPL Linux kernel modules (as was announced more than one year ago in
    http://lkml.org/lkml/2005/4/3/8). Tested on x86 and ppc64.

    Signed-off-by: "Paul E. McKenney"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Paul E. McKenney
     
  • kernel/sys.c doesn't have anything in it relying on linux/init.h -
    remove the include.

    Signed-off-by: Jes Sorensen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jes Sorensen
     
  • Provide a checklist of techniques to aid kernel patch submitters in
    producing healthy patches and in lessening the burden on maintainers.

    Signed-off-by: Randy Dunlap
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Randy Dunlap
     
  • If invalidate_mapping_pages is called to invalidate a very large mapping
    (e.g. a very large block device) and if the only active page in that
    device is near the end (or at least, at a very large index), such as, say,
    the superblock of an md array, and if that page happens to be locked when
    invalidate_mapping_pages is called, then

    pagevec_lookup will return this page and,
    as it is locked, 'next' will be incremented and pagevec_lookup
    will be called again, and again, and again,
    while we count from 0 up to a very large number.

    We should really always set 'next' to 'page->index+1' before going around
    the loop again, not just if the page isn't locked.
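    The loop can be modelled with hypothetical names and a one-page-at-a-time stand-in for pagevec_lookup. Under the old logic, a single locked page at index 1000 costs 1001 lookups. Always advancing next to page->index + 1 costs one lookup:

    ```c
    #include <assert.h>
    #include <stddef.h>

    /* Hypothetical model of a mapping: cached pages sorted by index. */
    struct sketch_page {
        unsigned long index;
        int locked;            /* nonzero if someone else holds the lock */
    };

    /* Stand-in for pagevec_lookup(): first cached page with index >= start. */
    static const struct sketch_page *sketch_lookup(const struct sketch_page *pages,
                                                   size_t n, unsigned long start)
    {
        for (size_t i = 0; i < n; i++)
            if (pages[i].index >= start)
                return &pages[i];
        return NULL;
    }

    /* Scan [0, end] and return how many lookups were needed.
     * fixed == 0: a locked page only bumps next by one (old behaviour).
     * fixed != 0: next always moves to page->index + 1. */
    static unsigned long sketch_invalidate(const struct sketch_page *pages,
                                           size_t n, unsigned long end, int fixed)
    {
        unsigned long next = 0, lookups = 0;
        const struct sketch_page *page;

        while (next <= end && (page = sketch_lookup(pages, n, next)) != NULL) {
            lookups++;
            if (page->locked && !fixed) {
                next++;
                continue;
            }
            if (page->index > next)
                next = page->index;
            next++;
            if (page->locked)
                continue;   /* skipped, but we have still moved past it */
            /* an unlocked page would be invalidated here */
        }
        return lookups;
    }
    ```
    
    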

    Cc: "Steinar H. Gunderson"
    Signed-off-by: Neil Brown
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    NeilBrown
     
  • Cc: Greg KH
    Cc: Russell King
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
     
  • Put the connector exports at the functions so people can see them in context.

    Cc: Evgeniy Polyakov
    Cc: Greg KH
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
     
  • Switch an open-coded strstrip() to use the new API.

    Acked-by: Corey Minyard
    Signed-off-by: Pekka Enberg
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pekka Enberg
     
  • Add a new strstrip() function to lib/string.c for removing leading and
    trailing whitespace from a string.
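    The sketch below shows what such a helper does. It uses a hypothetical name to avoid confusion with the kernel's own implementation. Trailing whitespace is cut by writing a NUL in place, and the return value points past any leading whitespace. Callers therefore have to use the returned pointer rather than the original one:

    ```c
    #include <assert.h>
    #include <ctype.h>
    #include <string.h>

    /* Trim trailing whitespace in place; return a pointer past leading
     * whitespace. The result points into the caller's buffer. */
    static char *strstrip_sketch(char *s)
    {
        char *end = s + strlen(s);

        /* Cast to unsigned char: isspace() is undefined for negative chars. */
        while (end > s && isspace((unsigned char)end[-1]))
            end--;
        *end = '\0';

        while (*s && isspace((unsigned char)*s))
            s++;

        return s;
    }
    ```
    
    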

    Cc: Michael Holzheu
    Acked-by: Ingo Oeser
    Acked-by: Joern Engel
    Cc: Corey Minyard
    Signed-off-by: Pekka Enberg
    Acked-by: Michael Holzheu
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pekka Enberg