27 Mar, 2006

36 commits

  • The nanosleep cleanup makes it possible to remove the data field of hrtimer.
    The callback function can use container_of() to get its own data. Since the
    hrtimer structure is embedded in other structures anyway, this adds no
    overhead.

    Signed-off-by: Roman Zippel
    Signed-off-by: Thomas Gleixner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Roman Zippel
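
The container_of() pattern described in this commit can be sketched in userspace as follows. This is only an illustration: toy_timer, toy_sleeper, toy_wakeup() and toy_fire() are hypothetical names, and the macro is a simplified stand-in for the kernel's container_of().

```c
#include <stddef.h>

/* Userspace stand-in for the kernel's container_of() macro. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Simplified stand-in for struct hrtimer: note there is no data field. */
struct toy_timer {
	int (*function)(struct toy_timer *timer);
};

/* Hypothetical owner structure that embeds the timer. */
struct toy_sleeper {
	int woken;
	struct toy_timer timer;
};

/* Callback: recover the enclosing structure from the embedded timer. */
int toy_wakeup(struct toy_timer *timer)
{
	struct toy_sleeper *s = container_of(timer, struct toy_sleeper, timer);

	s->woken = 1;
	return 0;
}

/* Fire the timer the way a timer core would: it only sees the timer. */
int toy_fire(struct toy_sleeper *s)
{
	s->timer.function = toy_wakeup;
	s->timer.function(&s->timer);
	return s->woken;
}
```

Because the callback derives its context from the timer's own address, no per-timer data pointer has to be stored or passed around.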
     
  • nsec_t predates ktime_t and has mostly been superseded by it. In the few
    places that are left it's better to make it explicit that we're dealing with
    64 bit values here.

    Signed-off-by: Roman Zippel
    Acked-by: Thomas Gleixner
    Acked-by: John Stultz
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Roman Zippel
     
  • Now that it_real_value is gone, the last users of DEFINE_KTIME and
    ktime_to_clock_t are also gone, so remove them before someone starts using
    them again.

    Signed-off-by: Roman Zippel
    Acked-by: Ingo Molnar
    Acked-by: Thomas Gleixner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Roman Zippel
     
  • Remove the state field and encode this information in the rb_node, similar
    to normal timers.

    Signed-off-by: Roman Zippel
    Acked-by: Ingo Molnar
    Signed-off-by: Thomas Gleixner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Roman Zippel
     
  • nanosleep is the only user of the expired state, so let it manage this itself,
    which makes the hrtimer code a bit simpler. The remaining time is also only
    calculated if requested.

    Signed-off-by: Roman Zippel
    Acked-by: Ingo Molnar
    Acked-by: Thomas Gleixner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Roman Zippel
     
  • Pass the current time to hrtimer_forward(). This allows the softirq time in
    the timer base to be used when the forward function is called from the timer
    callback. Other places pass the current time with a call to
    timer->base->get_time().

    Signed-off-by: Roman Zippel
    Signed-off-by: Thomas Gleixner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Roman Zippel
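
A rough userspace sketch of a forward helper that takes the current time from its caller is shown below. It is not the kernel implementation: toy_forward() is a hypothetical name, times are plain nanosecond counts rather than ktime_t, and the handling of a non-positive interval is an assumption.

```c
#include <stdint.h>

/*
 * Toy "forward" helper: push *expires past `now` in whole multiples of
 * `interval` and return how many periods were added. The caller decides
 * where `now` comes from (a cached softirq time or a fresh clock read).
 */
uint64_t toy_forward(int64_t *expires, int64_t now, int64_t interval)
{
	int64_t delta = now - *expires;
	uint64_t overruns;

	if (delta < 0 || interval <= 0)
		return 0;	/* not expired yet, or nothing sensible to do */

	/* Whole periods already missed, plus one to land after `now`. */
	overruns = (uint64_t)(delta / interval) + 1;
	*expires += (int64_t)overruns * interval;
	return overruns;
}
```

A callback could pass the time already stored in its base, while other callers would read the clock first and pass that value in.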
     
  • The hrtimer softirq is called from the timer softirq every tick. Retrieve the
    current time from xtime and wall_to_monotonic instead of calling
    base->get_time() for each timer base. Store the time in the base structure
    and provide a hook for use once clock source abstractions are in place,
    which also keeps the code open for new base clocks.

    Based on a patch from: Roman Zippel

    Signed-off-by: Thomas Gleixner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Thomas Gleixner
     
  • Now that get_block() can handle mapping multiple disk blocks, there is no
    need to have ->get_blocks(). This patch removes the fs specific
    ->get_blocks() added for DIO and makes its users use get_block() instead.

    Signed-off-by: Badari Pulavarty
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Badari Pulavarty
     
  • Pass the amount of disk that needs to be mapped to get_block(). This way one
    can modify the fs ->get_block() functions to map multiple blocks at the same
    time.

    [akpm@osdl.org: performance tweak]
    [akpm@osdl.org: remove unneeded assignments]
    Signed-off-by: Badari Pulavarty
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Badari Pulavarty
     
  • Increase the size of the buffer_head b_size field (only) for 64 bit platforms.
    Update some old and moldy comments in and around the structure as well.

    The b_size increase allows us to perform larger mappings and allocations for
    large I/O requests from userspace, which tie in with other changes allowing
    the get_block_t() interface to map multiple blocks at once.

    Signed-off-by: Nathan Scott
    Signed-off-by: Badari Pulavarty
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Badari Pulavarty
     
  • Change ext3_try_to_allocate() (called via ext3_new_blocks()) to try to
    allocate the requested number of blocks on a best effort basis: after
    allocating the first block, it will always attempt to allocate the next few
    adjacent blocks (up to the requested size and not beyond the reservation
    window) at the same time.

    Signed-off-by: Mingming Cao
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mingming Cao
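
The best-effort idea can be illustrated with a toy bitmap allocator. This is a sketch under stated assumptions, not the ext3 code: toy_try_to_allocate() is a hypothetical function, the bitmap is a plain bool array, and the half-open range [start, end) plays the role of the reservation window.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Toy best-effort allocator over a block bitmap (true = in use).
 * Claims the first free block in [start, end), then keeps claiming
 * adjacent free blocks until `want` blocks are taken, a used block is
 * hit, or the window ends. Returns the first block, or -1 if none;
 * *got receives the number of blocks actually allocated.
 */
long toy_try_to_allocate(bool *bitmap, size_t start, size_t end,
			 size_t want, size_t *got)
{
	size_t first = start;
	size_t n = 0;

	*got = 0;
	while (first < end && bitmap[first])
		first++;
	if (first >= end || want == 0)
		return -1;

	while (n < want && first + n < end && !bitmap[first + n]) {
		bitmap[first + n] = true;
		n++;
	}
	*got = n;
	return (long)first;
}
```

The caller asks for `want` blocks but must check `*got`, since fewer blocks may come back when a used block or the window boundary is reached.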
     
  • Currently ext3_get_block() only maps or allocates one block at a time. This
    is quite inefficient for sequential IO workload.

    I have posted an early implementation of simple multiple block mapping and
    allocation with current ext3. The basic idea is allocating the 1st block in
    the existing way, and attempting to allocate the next adjacent blocks on a
    best effort basis. More description of the implementation can be found here:
    http://marc.theaimsgroup.com/?l=ext2-devel&m=112162230003522&w=2

    The following is the latest version of the patch: the original patch is
    broken into 5 patches, with some logic reworked and some bugs fixed. The
    breakdown is:

    [patch 1] Adding map multiple blocks at a time in ext3_get_blocks()
    [patch 2] Extend ext3_get_blocks() to support multiple block allocation
    [patch 3] Implement multiple block allocation in ext3_try_to_allocate()
              (called via ext3_new_block()).
    [patch 4] Proper accounting updates in ext3_new_blocks()
    [patch 5] Adjust reservation window size properly (by the given number
              of blocks to allocate) before block allocation to increase the
              possibility of allocating multiple blocks in a single call.

    Tests done so far include fsx, tiobench and dbench. The following numbers,
    collected from Direct IO tests (1G file creation/read), show that the system
    time has been greatly reduced (more than 50% on my 8 cpu system) with the
    patches.

    1G file DIO write:
                2.6.15       2.6.15+patches
    real        0m31.275s    0m31.161s
    user        0m0.000s     0m0.000s
    sys         0m3.384s     0m0.564s

    1G file DIO read:
                2.6.15       2.6.15+patches
    real        0m30.733s    0m30.624s
    user        0m0.000s     0m0.004s
    sys         0m0.748s     0m0.380s

    Some previous tests we did on buffered IO using multiple block allocation
    and delayed allocation showed a noticeable improvement in throughput and
    system time.

    This patch:

    Add support for mapping multiple blocks in one call.

    This is useful for DIO reads and re-writes (where blocks are already
    allocated), and is also in line with Christoph's proposal of using
    getblocks() in mpage_readpage() or mpage_readpages().

    Signed-off-by: Mingming Cao
    Cc: Badari Pulavarty
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mingming Cao
     
  • This fix was proposed by Trond Myklebust. He says: The type "sector_t" is
    heavily tied in to the block layer interface as an offset/handle to a block,
    and is subject to a supposedly block-specific configuration option:
    CONFIG_LBD. Despite this, it is used in struct kstatfs to save a couple of
    bytes on the stack whenever we call the filesystems' ->statfs().

    So kstatfs's entries related to blocks are invalid on statfs64 for a network
    filesystem which has more than 2^32-1 blocks when CONFIG_LBD is disabled.

    - struct kstatfs
      Change the type of the following entries from sector_t to u64:
        f_blocks
        f_bfree
        f_bavail
        f_files
        f_ffree

    Signed-off-by: Trond Myklebust
    Signed-off-by: Takashi Sato
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Takashi Sato
     
  • Add blkcnt_t as the type of inode.i_blocks. This enables you to make the size
    of blkcnt_t either 4 bytes or 8 bytes on 32-bit architectures with
    CONFIG_LSF.

    - CONFIG_LSF
      Add new configuration parameter.
    - blkcnt_t
      On h8300, i386, mips, powerpc, s390 and sh that define sector_t,
      blkcnt_t is defined as u64 if CONFIG_LSF is enabled; otherwise it is
      defined as unsigned long.
      On other architectures, it is defined as unsigned long.
    - inode.i_blocks
      Change the type from sector_t to blkcnt_t.

    Signed-off-by: Takashi Sato
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Takashi Sato
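
The conditional type selection described above might look roughly like the following. The names toy_blkcnt_t and toy_inode are hypothetical (chosen to avoid clashing with the POSIX blkcnt_t from <sys/types.h>), uint64_t stands in for the kernel's u64, and the architecture check is left out for brevity.

```c
#include <stdint.h>

/* Uncomment to model a build with CONFIG_LSF enabled. */
/* #define CONFIG_LSF 1 */

#ifdef CONFIG_LSF
typedef uint64_t toy_blkcnt_t;		/* always 8 bytes */
#else
typedef unsigned long toy_blkcnt_t;	/* 4 bytes on 32-bit, 8 on 64-bit */
#endif

/* Simplified inode carrying only the field discussed here. */
struct toy_inode {
	toy_blkcnt_t i_blocks;
};
```

With the option disabled, a 32-bit build keeps the smaller field; enabling it trades a few bytes per inode for a full 64-bit block count.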
     
  • This patch series fixes the following problems on 32-bit architectures.

    o stat64 returns the lower 32 bits of blocks, although userland st_blocks
    has 64 bits, because i_blocks has only 32 bits. The ioctl with FIOQSIZE has
    the same problem.

    o As Dave Kleikamp said, creating a file larger than 2TB on JFS results in
    an invalid block number being written to the disk inode. The cause is the
    same as above too.

    o In generic quota code dquot_transfer(), the file usage is calculated from
    i_blocks via inode_get_bytes(). If the file is over 2TB, the change of
    usage is less than expected. The cause is the same as above too.

    o As Trond Myklebust said, statfs64's entries related to blocks are invalid
    on statfs64 for a network filesystem which has more than 2^32-1 blocks with
    CONFIG_LBD disabled. [PATCH 3/3]

    We made patches to fix problems that occur when handling a large filesystem
    and a large file. They were discussed in the mail thread titled "stat64 for
    over 2TB file returned invalid st_blocks".

    Signed-off-by: Takashi Sato
    Cc: Dave Kleikamp
    Cc: Jan Kara
    Cc: Trond Myklebust
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Takashi Sato
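
The 2TB boundary follows from the arithmetic: assuming the block count is kept in 512-byte units, a 32-bit field wraps at 2^32 * 512 bytes = 2^41 bytes. The helper names below are hypothetical and only illustrate the truncation.

```c
#include <stdint.h>

/* Number of 512-byte units needed to hold `bytes` bytes. */
uint64_t blocks_512(uint64_t bytes)
{
	return (bytes + 511) / 512;
}

/* What a 32-bit block counter would keep: only the low 32 bits. */
uint32_t truncated_blocks(uint64_t bytes)
{
	return (uint32_t)blocks_512(bytes);
}
```

For a file of exactly 2^41 bytes the true count is 2^32 units, which a 32-bit field stores as 0; anything derived from that field, such as reported st_blocks or quota usage, is then too small.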
     
  • Modify well over a dozen mempool users to call mempool_create_slab_pool()
    rather than calling mempool_create() with extra arguments, saving about 30
    lines of code and increasing readability.

    Signed-off-by: Matthew Dobson
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Matthew Dobson
     
  • Create a simple wrapper function for the common case of creating a slab-based
    mempool.

    Signed-off-by: Matthew Dobson
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Matthew Dobson
     
  • Add another allocator to the common mempool code: a kzalloc/kfree allocator

    This will be used by the next patch in the series to replace a mempool-backed
    kzalloc allocator. It is also very likely that there will be more users in
    the future.

    Signed-off-by: Matthew Dobson
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Matthew Dobson
     
  • Add another allocator to the common mempool code: a kmalloc/kfree allocator

    This will be used by the next patch in the series to replace duplicate
    mempool-backed kmalloc allocators in several places in the kernel. It is also
    very likely that there will be more users in the future.

    Signed-off-by: Matthew Dobson
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Matthew Dobson
     
  • This will be used by the next patch in the series to replace duplicate
    mempool-backed page allocators in 2 places in the kernel. It is also likely
    that there will be more users in the future.

    Signed-off-by: Matthew Dobson
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Matthew Dobson
     
  • Change proc_dir_entry->size to be loff_t to represent files like
    /proc/vmcore for 32bit systems with more than 4G memory.

    Needed for seeing correct size for /proc/vmcore for 32-bit systems with >
    4G RAM.

    Signed-off-by: Maneesh Soni
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Maneesh Soni
     
  • Create compat_sys_adjtimex and use it in all appropriate places.

    Signed-off-by: Stephen Rothwell
    Cc: Arnd Bergmann
    Acked-by: Paul Mackerras
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Stephen Rothwell
     
  • We had a copy of the compatibility version of struct timex in each 64 bit
    architecture. This patch just creates a global one and replaces all the
    usages of the old ones.

    Signed-off-by: Stephen Rothwell
    Cc: Arnd Bergmann
    Acked-by: Kyle McMartin
    Acked-by: Tony Luck
    Acked-by: Paul Mackerras
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Stephen Rothwell
     
  • Lockd and the NFSv4 server both exercise a race condition where
    posix_test_lock() is called either before or after posix_lock_file() to
    deal with a denied lock request due to a conflicting lock.

    Remove the race condition for the NFSv4 server by adding a new conflicting
    lock parameter to __posix_lock_file(), changing the name to
    __posix_lock_file_conf().

    Keep posix_lock_file() interface, add posix_lock_conf() interface, both
    call __posix_lock_file_conf().

    [akpm@osdl.org: Put the EXPORT_SYMBOL() where it belongs]
    Signed-off-by: Andy Adamson
    Signed-off-by: J. Bruce Fields
    Signed-off-by: Trond Myklebust
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andy Adamson
     
  • Add __KERNEL__ block.
    Use __KERNEL__ to allow ioctl interface to be usable.

    Signed-off-by: Randy Dunlap
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Randy Dunlap
     
  • Add full driver model support for the IPMI driver. It links in the proper
    bus and device support.

    It adds an "ipmi" driver interface that has each BMC discovered by the
    driver (as a device). These BMCs appear in the devices/platform directory.
    If there are multiple interfaces to the same BMC, the driver should
    discover this and will only have one BMC entry. The BMC entry will have
    pointers to each interface device that connects to it.

    The device information (statistics and config information) has not yet been
    ported over to the driver model from proc; that will come later.

    This work was based on work by Yani Ioannou. I basically rewrote it using
    that code as a guide, but he still deserves credit :).

    [bunk@stusta.de: make ipmi_find_bmc_guid() static]
    Signed-off-by: Corey Minyard
    Signed-off-by: Yani Ioannou
    Cc: Greg KH
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Corey Minyard
     
  • net/core/flow.c: In function 'flow_cache_flush':
    net/core/flow.c:299: warning: statement with no effect

    Signed-off-by: Con Kolivas
    Cc: "David S. Miller"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Con Kolivas
     
  • The return value of this function is never used, so let's be honest and
    declare it as void.

    Some places where invalidatepage returned 0, I have inserted comments
    suggesting a BUG_ON.

    [akpm@osdl.org: JBD BUG fix]
    [akpm@osdl.org: rework for git-nfs]
    [akpm@osdl.org: don't go BUG in block_invalidate_page()]
    Signed-off-by: Neil Brown
    Acked-by: Dave Kleikamp
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    NeilBrown
     
  • The only user ignores the return value, and the only instance
    (block_sync_page) always returns 0...

    Signed-off-by: Neil Brown
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    NeilBrown
     
  • Almost all users of the table addresses from the EFI system table want
    physical addresses. So rather than doing the pa->va->pa conversion, just keep
    physical addresses in struct efi.

    This fixes a DMI bug: the efi structure contained the physical SMBIOS address
    on x86 but the virtual address on ia64, so dmi_scan_machine() used ioremap()
    on a virtual address on ia64.

    This is essentially the same as an earlier patch by Matt Tolentino:
    http://marc.theaimsgroup.com/?l=linux-kernel&m=112130292316281&w=2
    except that this changes all table addresses, not just ACPI addresses.

    Matt's original patch was backed out because it caused MCAs on HP sx1000
    systems. That problem is resolved by the ioremap() attribute checking added
    for ia64.

    Signed-off-by: Bjorn Helgaas
    Cc: Matt Domsch
    Cc: "Tolentino, Matthew E"
    Cc: "Brown, Len"
    Cc: Andi Kleen
    Acked-by: "Luck, Tony"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Bjorn Helgaas
     
  • Check the EFI memory map so we can use the correct memory attributes for
    ioremap(). Previously, we always used uncacheable access, which blows up on
    some machines for regular system memory.

    Signed-off-by: Bjorn Helgaas
    Cc: Matt Domsch
    Cc: "Tolentino, Matthew E"
    Cc: "Brown, Len"
    Cc: Andi Kleen
    Acked-by: "Luck, Tony"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Bjorn Helgaas
     
  • Pass the size, not a pointer to the size, to efi_mem_attribute_range().

    This function validates memory regions for the /dev/mem read/write/mmap paths.
    The pointer allows arches to reduce the size of the range, but I think that's
    unnecessary complexity. Simplifying it will let me use
    efi_mem_attribute_range() to improve the ia64 ioremap() implementation.

    Signed-off-by: Bjorn Helgaas
    Cc: Matt Domsch
    Cc: "Tolentino, Matthew E"
    Cc: "Brown, Len"
    Cc: Andi Kleen
    Acked-by: "Luck, Tony"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Bjorn Helgaas
     
  • Enable DMI table parsing on ia64.

    Andi Kleen has a patch in his x86_64 tree which enables the use of i386
    dmi_scan.c on x86_64. dmi_scan.c functions are being used by the
    drivers/char/ipmi/ipmi_si_intf.c driver for autodetecting the ports or
    memory spaces where the IPMI controllers may be found.

    This patch adds equivalent changes for ia64 as to what is in the x86_64
    tree. In addition, I reworked the DMI detection, such that on EFI-capable
    systems, it uses the efi.smbios pointer to find the table, rather than
    brute-force searching from 0xF0000. On non-EFI systems, it continues the
    brute-force search.

    My test system, an Intel S870BN4 'Tiger4', aka Dell PowerEdge 7250, with
    latest BIOS, does not list the IPMI controller in the ACPI namespace, nor
    does it have an ACPI SPMI table. Also note, currently shipping Dell x8xx
    EM64T servers don't have these either, so DMI is the only method for
    obtaining the address of the IPMI controller.

    Signed-off-by: Matt Domsch
    Acked-by: "Luck, Tony"
    Cc: Andi Kleen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Matt Domsch
     
  • We have a problem in a lot of emulated storage in that it takes a page from
    get_user_pages() and does something like

    kmap_atomic(page)
    modify page
    kunmap_atomic(page)

    However, nothing has flushed the kernel cache view of the page before the
    kunmap. We need a lightweight API to do this, so this new API would
    specifically be for flushing the kernel cache view of a user page which the
    kernel has modified. The driver would need to add
    flush_kernel_dcache_page(page) before the final kunmap.

    Signed-off-by: James Bottomley
    Cc: Russell King
    Cc: "David S. Miller"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    James Bottomley
     
  • Currently, get_user_pages() returns fully coherent pages to the kernel for
    anything other than anonymous pages. This is a problem for things like
    fuse and the SCSI generic ioctl SG_IO which can potentially wish to do DMA
    to anonymous pages passed in by users.

    The fix is to add a new memory management API: flush_anon_page() which
    is used in get_user_pages() to make anonymous pages coherent.

    Signed-off-by: James Bottomley
    Cc: Russell King
    Cc: "David S. Miller"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    James Bottomley
     
  • It has been discovered that the remove_proc_entry has a race in the removing
    of entries in the proc file system that are siblings. There's no protection
    around the traversing and removing of elements that belong in the same
    subdirectory.

    This subdirectory list is protected in other areas by the BKL. So the BKL was
    at first used to protect this area too, but unfortunately, remove_proc_entry
    may be called with spinlocks held. The BKL may schedule, so this was not a
    solution.

    The final solution was to add a new global spin lock to protect this list,
    called proc_subdir_lock. This lock now protects the list in
    remove_proc_entry, and I also went around looking for other areas where this
    list is modified and added this protection there too. Care must be taken
    since these locations call several functions that may also schedule.

    Since I don't see any place where the functions that modify the
    subdirectory list are called from interrupts, the irqsave/restore versions
    of the spin lock were _not_ used.

    Signed-off-by: Steven Rostedt
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Steven Rostedt
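
A minimal userspace sketch of guarding a sibling list with a single global spin lock is shown below. It is an illustration only: the C11 atomic_flag lock stands in for a kernel spinlock, and toy_entry, toy_subdir_lock and toy_remove_entry() are hypothetical names rather than the procfs code.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Minimal test-and-set spinlock standing in for the kernel spinlock. */
static atomic_flag toy_subdir_lock = ATOMIC_FLAG_INIT;

static void toy_lock(void)
{
	while (atomic_flag_test_and_set_explicit(&toy_subdir_lock,
						 memory_order_acquire))
		;	/* spin until the holder clears the flag */
}

static void toy_unlock(void)
{
	atomic_flag_clear_explicit(&toy_subdir_lock, memory_order_release);
}

/* Simplified directory entry: siblings are chained through ->next. */
struct toy_entry {
	const char *name;
	struct toy_entry *next;
	struct toy_entry *subdir;	/* head of the child list */
};

/* Unlink `victim` from parent's child list; returns 1 if it was found. */
int toy_remove_entry(struct toy_entry *parent, struct toy_entry *victim)
{
	struct toy_entry **p;
	int found = 0;

	toy_lock();
	for (p = &parent->subdir; *p; p = &(*p)->next) {
		if (*p == victim) {
			*p = victim->next;
			victim->next = NULL;
			found = 1;
			break;
		}
	}
	toy_unlock();
	/* Work that may sleep (freeing, etc.) belongs after the unlock. */
	return found;
}
```

Only the list walk and unlink sit inside the locked region, which mirrors the concern in the commit message about not calling functions that may schedule while the lock is held.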
     

26 Mar, 2006

4 commits

  • * master.kernel.org:/home/rmk/linux-2.6-serial:
    [ARM] 3383/3: ixp2000: ixdp2x01 platform serial conversion
    [SERIAL] amba-pl010: Remove accessor macros
    [SERIAL] remove 8250_acpi (replaced by 8250_pnp and PNPACPI)
    [SERIAL] icom: select FW_LOADER

    Linus Torvalds
     
  • * master.kernel.org:/home/rmk/linux-2.6-arm:
    [ARM] 3030/2: fix permission check in the obscure cmpxchg syscall
    [ARM] nommu: rename compressed/head.S symbols to a new style
    [ARM] select TLS_REG_EMUL and NEEDS_SYSCALL_FOR_CMPXCHG
    [ARM] nommu: Move hardware page table definitions to pgtable-hwdef.h
    [ARM] Move read of processor ID out of lookup_processor_type()
    [ARM] Fix typo in tlbflush.h
    [ARM] noMMU: removes TLB codes in nommu mode
    [ARM] noMMU: block sys_fork in nommu mode
    [ARM] 3399/1: Fix link problem when CONFIG_PRINTK is disabled
    [ARM] 3398/1: Fix the VFP registers loading/storing base address
    [ARM] 3397/1: AT91RM9200 Header update
    [ARM] 3385/1: Battery support for sharp zaurus sl-5500 (collie)
    [ARM] SMP: don't set cpu_*_map in smp_prepare_boot_cpu
    include/linux/clk.h is betraying its ARM origins
    [ARM] Move enable_irq and disable_irq to assembler.h
    [ARM] 3391/1: use PLAT8250_DEV_PLATFORM{,1} for platform device id instead of 0/1

    Linus Torvalds
     
  • Patch from Lennert Buytenhek

    Add a PLAT8250_DEV_PLATFORM2, and convert the two ixdp2x01 CPLD serial
    ports to use platform serial devices with ids PLAT8250_DEV_PLATFORM[12].
    (The on-chip xscale UART is PLAT8250_DEV_PLATFORM, id #0.)

    Signed-off-by: Lennert Buytenhek
    Signed-off-by: Russell King

    Lennert Buytenhek
     
  • Fix merge conflict in arch/arm/mm/proc-xscale.S

    Signed-off-by: Russell King

    Russell King