02 Mar, 2007

40 commits

  • Describes how/when the information exported to `/proc/stat' is calculated,
    and possible problems with this approach.

    Signed-off-by: Vassili Karpov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vassili Karpov
     
  • The SMT scheduler incorrectly skips kernel threads even if they are
    runnable (but they are preempted by a higher-prio user-space task which got
    SMT-delayed by an even higher-priority task running on a sibling CPU).

    Fix this for now by only doing the SMT-nice optimization if the
    to-be-delayed task is the only runnable task. (This should cover most of
    the real-life cases anyway.)

    This bug has been in the SMT scheduler since 2.6.17 or so, but has only
    been noticed now by the active check in the dynticks code.

    Signed-off-by: Ingo Molnar
    Cc: Michal Piotrowski
    Cc: Nick Piggin
    Cc: Thomas Gleixner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ingo Molnar
     
  • ps3: Introduce CONFIG_PS3_ADVANCED, as suggested by Roman Zippel, and use
    it to control questions about PS3 subsystems that may not be obvious for
    the casual user.

    This gets rid of the following warning on non-powerpc platforms:
    drivers/video/Kconfig:1604:warning: 'select' used by config symbol 'FB_PS3'
    refer to undefined symbol 'PS3_PS3AV'

    Signed-off-by: Geert Uytterhoeven
    Acked-by: Geoff Levand
    Cc: Paul Mackerras
    Cc: Roman Zippel
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Geert Uytterhoeven
     
  • There are race issues around the ext[34] xattr block release code.

    ext[34]_xattr_release_block() checks the reference count of the xattr block
    (h_refcount) and frees the block if the caller holds the last reference to
    it. Unlike ext2, the check of this counter is not protected by any lock.
    ext[34]_xattr_release_block() frees the mb_cache entry before freeing the
    xattr block. There is a small window between the check for
    h_refcount == 1 and the call to mb_cache_entry_free(). During this window
    another inode might find the xattr block in the mbcache and reuse it,
    racing with the refcount update. The first inode will then free the xattr
    block without noticing that the other inode is still using it. If the
    block is later reallocated as a data block for another file, more serious
    problems might follow.

    We need to put a lock around the places that check the refcount as well,
    to avoid the race. Another place that needs this kind of protection is
    ext3_xattr_block_set(), which modifies the xattr block content on the fly
    if the refcount is 1 (meaning it is the only inode referencing the block).

    This also fixes another issue: the xattr block may never be freed if no
    lock protects the refcount check at release time. The last two inodes
    could release the shared xattr block at the same time, each believing it
    is not the last one, so both only decrement h_refcount and neither frees
    the xattr block.

    We need to call lock_buffer() after ext3_journal_get_write_access() to
    avoid deadlock (because the latter calls lock_buffer()/unlock_buffer()
    as well).

    Signed-off-by: Mingming Cao
    Cc: Andreas Gruenbacher
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mingming Cao
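
    The locking rule described in this entry can be illustrated with a small
    userspace sketch. It is not the ext3 code: `struct xblock`,
    `xblock_release()` and the pthread mutex are hypothetical stand-ins for
    the xattr block, its release path and the buffer lock. The point is only
    that the "am I the last reference?" check and the refcount update sit
    under one lock.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for a shared xattr block; not the ext3 structures. */
struct xblock {
    pthread_mutex_t lock;   /* plays the role of the buffer lock */
    unsigned int refcount;  /* plays the role of h_refcount */
    bool freed;             /* set once the block has been released */
};

/*
 * Drop one reference. The check and the update happen under the same lock,
 * so two releasers cannot both conclude that they are not the last one, and
 * nobody can take a new reference between the check and the free.
 * Returns true if this call freed the block.
 */
static bool xblock_release(struct xblock *b)
{
    bool last;

    pthread_mutex_lock(&b->lock);
    assert(b->refcount > 0 && !b->freed);
    last = (b->refcount == 1);
    if (last)
        b->freed = true;    /* last reference: free the block */
    else
        b->refcount--;      /* still shared: only drop our reference */
    pthread_mutex_unlock(&b->lock);
    return last;
}
```

    With two references held, the first call only decrements the counter and
    the second call frees the block.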
     
  • Fix the fact that pte_mkread sets _PAGE_RW instead of _PAGE_USER (the logic is
    copied from i386 in most places, so it is really as bad as you're thinking).

    Thus page tables are currently more permissive than they should be.

    Such a change may trigger other latent bugs, so be careful with this.

    Signed-off-by: Paolo 'Blaisorblade' Giarrusso
    Signed-off-by: Jeff Dike
    Cc: Paolo 'Blaisorblade' Giarrusso
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jeff Dike
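
    A minimal sketch of the difference between the buggy and the fixed helper
    follows. The flag names mirror `_PAGE_RW` and `_PAGE_USER` from the entry
    above, but the bit values (taken from the i386 layout), the `pte_t`
    wrapper and the `_buggy` variant are assumptions made for illustration,
    not the UML definitions.

```c
#include <assert.h>

/* Illustrative flag values following the i386 layout (assumed here). */
#define PAGE_PRESENT 0x001UL
#define PAGE_RW      0x002UL   /* stands in for _PAGE_RW */
#define PAGE_USER    0x004UL   /* stands in for _PAGE_USER */

typedef struct { unsigned long pte; } pte_t;

/* Behaviour described as the bug: "make readable" grants write access. */
static pte_t pte_mkread_buggy(pte_t p)
{
    p.pte |= PAGE_RW;
    return p;
}

/* Behaviour described as the fix: "make readable" grants user access only. */
static pte_t pte_mkread(pte_t p)
{
    p.pte |= PAGE_USER;
    return p;
}

/* Small self-check: the fixed helper must not make the page writable. */
static void check_mkread(void)
{
    pte_t p = { PAGE_PRESENT };

    assert((pte_mkread(p).pte & PAGE_RW) == 0);
    assert((pte_mkread(p).pte & PAGE_USER) != 0);
}
```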
     
  • This fixes a problem seen by a number of people running UML on newer host
    kernels. init would hang with an infinite segfault loop.

    It turns out that the host kernel was providing an AT_SYSINFO_EHDR of
    0xffffe000, which fooled UML into believing that the host VDSO page could be
    reused. However, AT_SYSINFO pointed into the middle of the address space, and
    was unmapped as a result. Because UML was providing AT_SYSINFO_EHDR and
    AT_SYSINFO to its own processes, these would branch to nowhere when trying to
    use the VDSO.

    The fix is to also check the location of AT_SYSINFO when deciding whether to
    use the host's VDSO.

    Signed-off-by: Jeff Dike
    Cc: Paolo 'Blaisorblade' Giarrusso
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jeff Dike
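
    The decision described above might look roughly like the sketch below.
    The function name, the `task_size` boundary and the exact comparison are
    assumptions; the entry only states that the location of AT_SYSINFO is now
    checked as well as that of AT_SYSINFO_EHDR.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical check: reuse the host VDSO only if both the VDSO header
 * (AT_SYSINFO_EHDR) and the entry point (AT_SYSINFO) lie at or above
 * task_size, i.e. outside the range assumed to be remapped for UML
 * processes. Checking the header alone is the situation described as the bug.
 */
static bool can_reuse_host_vdso(unsigned long sysinfo_ehdr,
                                unsigned long sysinfo,
                                unsigned long task_size)
{
    assert(task_size != 0);

    if (sysinfo_ehdr == 0 || sysinfo == 0)
        return false;               /* host provided no VDSO information */

    return sysinfo_ehdr >= task_size && sysinfo >= task_size;
}
```

    With a header at 0xffffe000 and an entry point in the middle of the
    address space, this check refuses to reuse the host VDSO.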
     
  • Add the RAW device driver options to the UML Kconfig.char file so that you may
    use them in UML.

    Signed-off-by: Allan Graves
    Signed-off-by: Jeff Dike
    Cc: Paolo 'Blaisorblade' Giarrusso
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Allan Graves
     
  • linux/irq.h uses EINVAL but does not #include linux/errno.h. This results in
    the compiler spitting out errors on some files.

    Signed-off-by: David Howells
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Howells
     
  • throttle_vm_writeout() is designed to wait for the dirty levels to subside.
    But if the caller holds IO or FS locks, we might be holding up that writeout.

    So change it to take a single nap to give other devices a chance to clean some
    memory, then return.

    Cc: Nick Piggin
    Cc: OGAWA Hirofumi
    Cc: Kumar Gala
    Cc: Pete Zaitcev
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
     
  • Since the bay driver depends on the dock driver for proper notification,
    make this driver depend on the dock driver.

    Signed-off-by: Kristen Carlson Accardi
    Acked-by: Len Brown
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kristen Carlson Accardi
     
  • The code is seemingly trying to make sure that rb_next() brings us to
    successive increasing vma entries.

    But the two variables, prev and pend, used to perform these checks, are
    never advanced.

    Signed-off-by: David S. Miller
    Cc: Andrea Arcangeli
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Miller
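
    The kind of check this entry refers to can be sketched over a plain array
    instead of an rbtree walk. `struct range` and `ranges_increasing()` are
    hypothetical; only the names `prev` and `pend` come from the entry. The
    two assignments at the end of the loop body are the step that was
    reported missing.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a vma: the half-open range [start, end). */
struct range {
    unsigned long start;
    unsigned long end;
};

/* Return true if the entries are successive, increasing and non-overlapping. */
static bool ranges_increasing(const struct range *r, size_t n)
{
    unsigned long prev = 0;   /* start of the previous entry */
    unsigned long pend = 0;   /* end of the previous entry */
    size_t i;

    assert(r != NULL || n == 0);

    for (i = 0; i < n; i++) {
        if (r[i].start < prev)
            return false;     /* start address went backwards */
        if (r[i].start < pend)
            return false;     /* overlaps the previous entry */
        if (r[i].start > r[i].end)
            return false;     /* malformed entry */

        /* Advance the trackers; without this the checks compare against 0. */
        prev = r[i].start;
        pend = r[i].end;
    }
    return true;
}
```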
     
  • In the 2.6.20 hang patch, I accidentally threw out an error message.
    This puts it back.

    Signed-off-by: Jeff Dike
    Cc: Paolo 'Blaisorblade' Giarrusso
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jeff Dike
     
  • Add some locking to host_ldt_entries to prevent racing when reading LDT
    information from the host.

    The locking is somewhat more careful than my previous attempt. Now, only
    the check of host_ldt_entries is locked. The lock is dropped immediately
    afterwards, and if the LDT needs initializing, that (and the memory
    allocations needed) proceeds outside the lock.

    Also fixed some style violations.

    Signed-off-by: Jeff Dike
    Cc: Paolo 'Blaisorblade' Giarrusso
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jeff Dike
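
    A userspace sketch of this "lock only the check" pattern follows. Claiming
    the slot with a placeholder while the lock is held, and retaking the lock
    to publish the result, are assumptions of this sketch rather than details
    stated in the entry; the pthread mutex, `dummy_list` and `init_runs` are
    likewise hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

static pthread_mutex_t host_ldt_lock = PTHREAD_MUTEX_INITIALIZER;
static short dummy_list[] = { -1 };  /* placeholder: "initialization claimed" */
static short *host_ldt_entries;      /* NULL until a caller claims the work */
static int init_runs;                /* counts how often the slow path ran */

static void ldt_get_host_info_sketch(void)
{
    short *list;

    /* Only the check (plus the claim) is done under the lock. */
    pthread_mutex_lock(&host_ldt_lock);
    if (host_ldt_entries != NULL) {
        pthread_mutex_unlock(&host_ldt_lock);
        return;                      /* done, or being done, by someone else */
    }
    host_ldt_entries = dummy_list;
    pthread_mutex_unlock(&host_ldt_lock);

    /* The slow part, including the allocation, runs without the lock. */
    list = malloc(2 * sizeof(*list));
    if (list == NULL)
        return;                      /* the placeholder stays in place */
    list[0] = 0;
    list[1] = -1;
    init_runs++;                     /* only the claiming caller gets here */

    /* Publish the result; the lock is retaken briefly for this (assumption). */
    pthread_mutex_lock(&host_ldt_lock);
    assert(host_ldt_entries == dummy_list);
    host_ldt_entries = list;
    pthread_mutex_unlock(&host_ldt_lock);
}
```

    Calling the function a second time returns early from the locked check,
    so the slow path runs once.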
     
  • a) Remove #define acrobatics that have become unnecessary with the move of
    asyncdata.o into the common part.

    b) Correct the rule for building the common part into the kernel when
    some or all hardware-specific parts are built as modules.

    Signed-off-by: Tilman Schmidt
    Cc: Adrian Bunk
    Cc: Karsten Keil
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Tilman Schmidt
     
  • Dmitriy Monakhov wrote:
    > if path_lookup() return non zero code we don't have to worry about
    > 'nd' parameter, but ecryptfs_read_super does path_release(&nd) after
    > path_lookup has failed, and dentry counter becomes negative

    Do not do a path_release after a path_lookup error.

    Signed-off-by: Michael Halcrow
    Cc: Dmitriy Monakhov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michael Halcrow
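
    The error-handling rule can be shown with a small model. None of these
    functions are the VFS API: `lookup_sketch()`, `release_sketch()` and the
    `refs` counter are hypothetical stand-ins for path_lookup(),
    path_release() and the dentry reference count.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for struct nameidata: only tracks held references. */
struct nd_sketch {
    int refs;
};

/* On failure nothing is acquired; on success one reference is taken. */
static int lookup_sketch(const char *path, struct nd_sketch *nd)
{
    if (path == NULL || path[0] != '/')
        return -ENOENT;
    nd->refs++;
    return 0;
}

static void release_sketch(struct nd_sketch *nd)
{
    assert(nd->refs > 0);   /* releasing without a reference is the bug */
    nd->refs--;
}

static int read_super_sketch(const char *dev_name, struct nd_sketch *nd)
{
    int rc = lookup_sketch(dev_name, nd);

    if (rc)
        return rc;          /* lookup failed: there is nothing to release */

    /* ... the looked-up path would be used here ... */
    release_sketch(nd);
    return 0;
}
```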
     
  • Remove unnecessary flush_dcache_page() call. Thanks to Dmitriy
    Monakhov for pointing this out.

    Signed-off-by: Michael Halcrow
    Cc: Dmitriy Monakhov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michael Halcrow
     
  • O_LARGEFILE should be set here when opening the lower file.

    Signed-off-by: Michael Halcrow
    Cc: Dmitriy Monakhov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michael Halcrow
     
  • Fix an oops on the rtc_device_unregister() path by waiting until the last
    moment before nulling the rtc->ops vector. Fix some potential oopses by
    having the rtc_class_open()/rtc_class_close() interface increase the RTC's
    reference count while an RTC handle is available outside the RTC framework.

    Signed-off-by: David Brownell
    Cc: Alessandro Zummo
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Brownell
     
  • adaplas@pol.net is still alive, but is choking on the traffic.

    Signed-off-by: Antonino Daplas
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Antonino A. Daplas
     
  • Add -mm testing to SubmitChecklist.

    Signed-off-by: Randy Dunlap
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Randy Dunlap
     
  • lockdep_init() is marked __init but used in several places
    outside __init code. This causes the following warnings:
    $ scripts/mod/modpost kernel/lockdep.o
    WARNING: kernel/built-in.o - Section mismatch: reference to .init.text:lockdep_init from .text.lockdep_init_map after 'lockdep_init_map' (at offset 0x105)
    WARNING: kernel/built-in.o - Section mismatch: reference to .init.text:lockdep_init from .text.lockdep_reset_lock after 'lockdep_reset_lock' (at offset 0x35)
    WARNING: kernel/built-in.o - Section mismatch: reference to .init.text:lockdep_init from .text.__lock_acquire after '__lock_acquire' (at offset 0xb2)

    The warnings are less obvious due to heavy inlining by gcc; this is not
    altered.

    Fix the section mismatch warnings by removing the __init marking, which
    seems obviously wrong.

    Signed-off-by: Sam Ravnborg
    Acked-by: Ingo Molnar
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Sam Ravnborg
     
  • Rename PG_checked to PG_owner_priv_1 to reflect its availability as a
    private flag for use by the owner/allocator of the page. In the case of
    pagecache pages (which might be considered to be owned by the mm),
    filesystems may use the flag.

    Signed-off-by: Jeremy Fitzhardinge
    Signed-off-by: Nick Piggin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nick Piggin
     
  • /home/bunk/linux/kernel-2.6/linux-2.6.20-mm2/kernel/sysctl.c:1411: error: conflicting types for 'register_sysctl_table'
    /home/bunk/linux/kernel-2.6/linux-2.6.20-mm2/include/linux/sysctl.h:1042: error: previous declaration of 'register_sysctl_table' was here
    make[2]: *** [kernel/sysctl.o] Error 1

    Caused by commit 0b4d414714f0d2f922d39424b0c5c82ad900a381.

    Signed-off-by: Adrian Bunk
    Cc: "Eric W. Biederman"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Adrian Bunk
     
  • Problem description at:
    http://bugzilla.kernel.org/show_bug.cgi?id=8048

    Commit b18ec80396834497933d77b81ec0918519f4e2a7
    [PATCH] sched: improve migration accuracy
    optimized the scheduler time calculations, but broke posix-cpu-timers.

    The problem is that the p->last_ran value is not updated after a context
    switch. So a subsequent call to current_sched_time() calculates with a
    stale p->last_ran value, i.e. it accounts for the full time during which
    the task was scheduled away.

    Signed-off-by: Thomas Gleixner
    Acked-by: Ingo Molnar
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Thomas Gleixner
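
    The accounting problem can be reproduced with a toy model. The structure
    and the helpers below are hypothetical simplifications; only the names
    `last_ran` and `current_sched_time` are taken from the entry, and time is
    passed in explicitly instead of being read from a clock.

```c
#include <assert.h>

/* Hypothetical, simplified task: accumulated run time and last timestamp. */
struct task_sketch {
    unsigned long long sched_time;  /* total time spent running */
    unsigned long long last_ran;    /* when the current run period began */
};

/* Task is switched out: account the period that just ended. */
static void switch_out(struct task_sketch *p, unsigned long long now)
{
    assert(now >= p->last_ran);
    p->sched_time += now - p->last_ran;
    p->last_ran = now;
}

/* Task is switched in: refresh last_ran. This models the missing update. */
static void switch_in(struct task_sketch *p, unsigned long long now)
{
    p->last_ran = now;
}

/* Run time so far, including the period that is still in progress. */
static unsigned long long current_sched_time_sketch(const struct task_sketch *p,
                                                    unsigned long long now)
{
    assert(now >= p->last_ran);
    return p->sched_time + (now - p->last_ran);
}
```

    If `switch_in()` did not refresh `last_ran`, a task that ran for 100
    units, slept for 900 and then ran for another 50 would be charged for the
    sleep as well.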
     
  • eCryptfs lower file handling code has several issues:
    - The return value of prepare_write()/commit_write() wasn't checked for
    equality with AOP_TRUNCATED_PAGE.
    - In some places the page wasn't unmapped and unlocked after an error.

    Signed-off-by: Michael Halcrow
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michael Halcrow
     
  • Fix kernel-doc warnings in 2.6.20-git15 (lib/, mm/, kernel/, include/).

    Signed-off-by: Randy Dunlap
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Randy Dunlap
     
  • Allow space(s) between "__attribute__" and "((blah))" so that
    kernel-doc does not complain like:

    Warning(/tester/linsrc/linux-2.6.20-git15//kernel/timer.c:939): No description found for parameter 'read_persistent_clock(void'

    Signed-off-by: Randy Dunlap
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Randy Dunlap
     
  • Signed-off-by: Daniel Walker
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Daniel Walker
     
  • Add some missing lazy MMU hooks for NOMMU mode.

    Signed-off-by: David Howells
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Howells
     
  • FRV does not require a ZONE_DMA, so all DMA'able pages that aren't highmem
    should be in ZONE_NORMAL.

    Signed-off-by: David Howells
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Howells
     
  • i.e. one or more drives can be added and the array will re-stripe
    while on-line.

    Most of the interesting work was already done for raid5. This just extends it
    to raid6.

    mdadm newer than 2.6 is needed for complete safety; however, any version of
    mdadm which supports raid5 reshape will do a good enough job in almost all
    cases (an 'echo repair > /sys/block/mdX/md/sync_action' is recommended after a
    reshape that was aborted and had to be restarted with such a version of
    mdadm).

    Signed-off-by: Neil Brown
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    NeilBrown
     
  • An error always aborts any resync/recovery/reshape on the understanding that
    it will immediately be restarted if that still makes sense. However a reshape
    currently doesn't get restarted. With this patch it does.

    To avoid restarting when it is not possible to do work, we call into the
    personality to check that a reshape is ok, and strengthen raid5_check_reshape
    to fail if there are too many failed devices.

    We also break some code out into a separate function: remove_and_add_spares as
    the indent level for that code was getting crazy.

    Signed-off-by: Neil Brown
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    NeilBrown
     
  • The mddev and queue might be used for another array which does not set these,
    so they need to be cleared.

    Signed-off-by: NeilBrown
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    NeilBrown
     
  • md tries to warn the user if they e.g. create a raid1 using two partitions of
    the same device, as this does not provide true redundancy.

    However it also warns if a raid0 is created like this, and there is nothing
    wrong with that.

    At the place where the warning is currently printed, we don't necessarily know
    what level the array will be, so move the warning from the point where the
    device is added to the point where the array is started.

    Signed-off-by: Neil Brown
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    NeilBrown
     
  • - Use kernel_fpu_begin() and kernel_fpu_end()
    - Use boot_cpu_has() for feature testing even in userspace

    Signed-off-by: H. Peter Anvin
    Signed-off-by: Neil Brown
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    H. Peter Anvin
     
  • There are two errors that can lead to recovery problems with raid10
    when used in 'far' mode (not the default).

    Due to a '>' instead of '>=' the wrong block is located, which would result in
    garbage being written to some random location, quite possibly outside the
    range of the device, causing the newly reconstructed device to fail.

    The device size calculation had some rounding errors (it didn't round when it
    should) and so recovery would go a few blocks too far which would again cause
    a write to a random block address and probably a device error.

    The code for working with device sizes was fairly confused and spread out, so
    this has been tidied up a bit.

    Signed-off-by: Neil Brown
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    NeilBrown
     
  • If register_blkdev() or alloc_disk() fails in mm_init() after
    pci_register_driver() succeeds, then mm_pci_driver is not unregistered
    properly:

    Cc: Philip Guo
    Signed-off-by: Neil Brown
    Cc: Jens Axboe
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    NeilBrown
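
    The usual shape of such an unwind path is sketched below with fake
    registration helpers. Everything here is hypothetical (the `fake_*`
    functions, the `fail_step` knob and the chosen error codes); it only
    illustrates that each later failure must undo the steps that already
    succeeded, in reverse order.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

static bool pci_registered;
static bool blkdev_registered;
static int fail_step;   /* test knob: 0 = no failure, 1 = pci, 2 = blkdev, 3 = disk */

static int fake_pci_register(void)
{
    if (fail_step == 1)
        return -ENODEV;
    pci_registered = true;
    return 0;
}

static void fake_pci_unregister(void)
{
    assert(pci_registered);
    pci_registered = false;
}

static int fake_register_blkdev(void)
{
    if (fail_step == 2)
        return -EBUSY;
    blkdev_registered = true;
    return 0;
}

static void fake_unregister_blkdev(void)
{
    assert(blkdev_registered);
    blkdev_registered = false;
}

static int fake_alloc_disk(void)
{
    return fail_step == 3 ? -ENOMEM : 0;
}

/* Init routine with an unwind path that releases what was already set up. */
static int init_sketch(void)
{
    int err;

    err = fake_pci_register();
    if (err)
        return err;

    err = fake_register_blkdev();
    if (err)
        goto out_pci;

    err = fake_alloc_disk();
    if (err)
        goto out_blkdev;

    return 0;

out_blkdev:
    fake_unregister_blkdev();
out_pci:
    fake_pci_unregister();
    return err;
}
```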
     
  • Signed-off-by: Adrian Bunk
    Cc: Ben Dooks
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Adrian Bunk
     
  • shmem_{nopage,mmap} are no longer used in ipc/shm.c

    Signed-off-by: Adrian Bunk
    Cc: "Eric W. Biederman"
    Cc: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Adrian Bunk
     
  • shm_nopage() can become static.

    Signed-off-by: Adrian Bunk
    Acked-by: Eric W. Biederman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Adrian Bunk