14 Dec, 2006

40 commits

  • Signed-off-by: Al Viro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Al Viro
     
  • Signed-off-by: Al Viro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Al Viro
     
  • I don't see why there is a memory barrier in copy_from_read_buf() at all.
    Even if it were useful, spin_unlock_irqrestore() implies a barrier anyway.

    Signed-off-by: Ralf Baechle
    Cc: Alan Cox
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ralf Baechle
     
  • All kcalloc() calls of the form "kcalloc(1,...)" are converted to the
    equivalent kzalloc() calls, and a few kcalloc() calls with the incorrect
    ordering of the first two arguments are fixed.
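
    The conversion is mechanical; a minimal sketch with a hypothetical call
    site (struct foo and the GFP_KERNEL context are illustrative, not taken
    from the patch):

    /* before: kcalloc() of a single element is just a zeroed allocation */
    ptr = kcalloc(1, sizeof(struct foo), GFP_KERNEL);

    /* after: kzalloc() says the same thing directly */
    ptr = kzalloc(sizeof(struct foo), GFP_KERNEL);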

    Signed-off-by: Robert P. J. Day
    Cc: Jeff Garzik
    Cc: Alan Cox
    Cc: Dominik Brodowski
    Cc: Adam Belay
    Cc: James Bottomley
    Cc: Greg KH
    Cc: Mark Fasheh
    Cc: Trond Myklebust
    Cc: Neil Brown
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Robert P. J. Day
     
  • activate_mm() is not the right thing to be using in use_mm(). It should be
    switch_mm().

    On normal x86, they're synonymous, but for the Xen patches I'm adding a
    hook which assumes that activate_mm is only used the first time a new mm
    is used after creation (I have another hook for dealing with dup_mm). I
    think this use of activate_mm() is the only place where it could be used
    a second time on an mm.

    From a quick look at the other architectures I think this is OK (most
    simply implement one in terms of the other), but some are doing some
    subtly different stuff between the two.
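
    Roughly, the fixed use_mm() looks like this (a simplified sketch of the
    fs/aio.c helper, not the verbatim patch):

    void use_mm(struct mm_struct *mm)
    {
            struct mm_struct *active_mm;
            struct task_struct *tsk = current;

            task_lock(tsk);
            active_mm = tsk->active_mm;
            atomic_inc(&mm->mm_count);
            tsk->mm = mm;
            tsk->active_mm = mm;
            /* was activate_mm(); switch_mm() is correct because this mm
               may already have been activated once before */
            switch_mm(active_mm, mm, tsk);
            task_unlock(tsk);
            mmdrop(active_mm);
    }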

    Acked-by: David Miller
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jeremy Fitzhardinge
     
  • Jarek Poplawski noticed that lockdep global state could be accessed in a
    racy way if one CPU did a lockdep assert (shutting lockdep down), while the
    other CPU would try to do something that changes its global state.

    This patch fixes those races and cleans up lockdep's internal locking by
    adding graph_lock()/graph_unlock()/debug_locks_off_graph_unlock() helpers.

    (Also note that as we all know the Linux kernel is, by definition, bug-free
    and perfect, so this code never triggers, so these fixes are highly
    theoretical. I wrote this patch for aesthetic reasons alone.)
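
    The helpers are small; roughly (simplified from kernel/lockdep.c):

    static int graph_lock(void)
    {
            __raw_spin_lock(&lockdep_lock);
            /* another CPU may have shut lockdep down while we waited;
               in that case the caller must back off */
            if (!debug_locks) {
                    __raw_spin_unlock(&lockdep_lock);
                    return 0;
            }
            return 1;
    }

    static inline int debug_locks_off_graph_unlock(void)
    {
            int ret = debug_locks_off();

            __raw_spin_unlock(&lockdep_lock);
            return ret;
    }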

    [akpm@osdl.org: build fix]
    [jarkao2@o2.pl: build fix's refix]
    Signed-off-by: Ingo Molnar
    Signed-off-by: Jarek Poplawski
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ingo Molnar
     
  • When we print an assert due to scheduling-in-atomic bugs, and if lockdep
    is enabled, then the IRQ tracing information of lockdep can be printed
    to pinpoint the code location that disabled interrupts. This saved me
    quite a bit of debugging time in cases where the backtrace did not
    identify the irq-disabling site well enough.
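
    The hook itself is tiny; roughly (a sketch of the assert path, using
    lockdep's existing print_irqtrace_events() helper):

    if (irqs_disabled())
            print_irqtrace_events(current);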

    Signed-off-by: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ingo Molnar
     
  • CONFIG_DEBUG_LOCKDEP is unacceptably slow because it does not utilize
    the chain-hash. Turn the chain-hash back on in this case too.

    Signed-off-by: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ingo Molnar
     
  • Cleanup: the VERY_VERBOSE define was unnecessarily dependent on #ifdef
    VERBOSE, while the VERBOSE switch is always defined (as 0 or 1).
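
    A sketch of the before/after shape of the define:

    /* before: only defined when VERBOSE itself was #defined */
    #ifdef VERBOSE
    # define VERY_VERBOSE 0
    #endif

    /* after: VERBOSE is always defined as 0 or 1, so define it plainly */
    #define VERY_VERBOSE 0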

    Signed-off-by: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ingo Molnar
     
  • Clear all the chains during lockdep_reset(). This fixes some
    locking-selftest false positives I saw on -rt. (I never saw those on
    mainline, but it could happen there too.)

    Signed-off-by: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ingo Molnar
     
  • Make verbose lockdep messages (off by default) more informative by printing
    out the hash chain key. (This patch was what helped me catch the earlier
    lockdep hash-collision bug.)

    Signed-off-by: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ingo Molnar
     
  • Fix typo in the class_filter() function. (filtering is not used by default so
    this only affects lockdep-internal debugging cases)

    Signed-off-by: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ingo Molnar
     
  • Most distributions enable sysrq support but set it to 0 by default. Add a
    sysrq_always_enabled boot option to always enable sysrq keys. Useful for
    debugging - without having to modify the distribution's config files (which
    might not be possible if the kernel is on a live CD, etc.).

    Also, while at it, clean up the sysrq interfaces.
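
    The boot option hooks in via __setup(); a sketch of its shape (close to
    the patch, body simplified):

    static int __init sysrq_always_enabled_setup(char *str)
    {
            sysrq_always_enabled = 1;
            printk(KERN_INFO "debug: sysrq always enabled.\n");
            return 1;
    }
    __setup("sysrq_always_enabled", sysrq_always_enabled_setup);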

    [bunk@stusta.de: make sysrq_always_enabled_setup() static]
    Signed-off-by: Ingo Molnar
    Signed-off-by: Adrian Bunk
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ingo Molnar
     
  • Implement block device specific .direct_IO method instead of going through
    generic direct_io_worker for block device.

    direct_io_worker() is fairly complex because it needs to handle O_DIRECT on
    a file system, where it needs to perform block allocation, hole detection,
    extending the file on write, and tons of other corner cases. The end result
    is that it takes tons of CPU time to submit an I/O.

    For a block device, block allocation is much simpler and a tight triple
    loop can be written to iterate over each iovec and each page within the
    iovec in order to construct/prepare a bio structure and then subsequently
    submit it to the block layer. This significantly speeds up O_DIRECT on
    block devices.
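
    In outline, the tight loop packs each iovec's pages into bios and sends
    them straight to the block layer (a much-simplified sketch, not the
    actual blkdev_direct_IO(); the page-pinning step is elided):

    for (seg = 0; seg < nr_segs; seg++) {
            unsigned long addr = (unsigned long)iov[seg].iov_base;
            size_t size = iov[seg].iov_len;

            while (size) {          /* each page within this iovec */
                    unsigned offset = addr & ~PAGE_MASK;
                    unsigned bytes = min_t(size_t, size, PAGE_SIZE - offset);

                    /* 'page' would come from the elided get_user_pages()
                       pinning step */
                    if (!bio_add_page(bio, page, bytes, offset)) {
                            submit_bio(rw, bio);    /* bio full: send it */
                            bio = bio_alloc(GFP_KERNEL, nr_pages);
                            bio_add_page(bio, page, bytes, offset);
                    }
                    addr += bytes;
                    size -= bytes;
            }
    }
    submit_bio(rw, bio);            /* the final, possibly partial, bio */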

    [akpm@osdl.org: small speedup]
    Signed-off-by: Ken Chen
    Cc: Christoph Hellwig
    Cc: Zach Brown
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Chen, Kenneth W
     
  • Update ocfs2_should_update_atime() to understand the MNT_RELATIME flag and
    to test against mtime / ctime accordingly.

    [akpm@osdl.org: cleanups]
    Signed-off-by: Mark Fasheh
    Cc: Valerie Henson
    Cc: Al Viro
    Cc: Christoph Hellwig
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mark Fasheh
     
  • Add "relatime" (relative atime) support. Relative atime only updates the
    atime if the previous atime is older than the mtime or ctime. Like
    noatime, but useful for applications like mutt that need to know when a
    file has been read since it was last modified.
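
    The core test reduces to two timestamp comparisons; a sketch mirroring
    the patch's logic in touch_atime():

    if (mnt->mnt_flags & MNT_RELATIME) {
            /* skip the update unless atime is older than mtime or ctime */
            if (timespec_compare(&inode->i_mtime, &inode->i_atime) < 0 &&
                timespec_compare(&inode->i_ctime, &inode->i_atime) < 0)
                    return;
    }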

    A corresponding patch against mount(8) is available at
    http://userweb.kernel.org/~akpm/mount-relative-atime.txt

    Signed-off-by: Valerie Henson
    Cc: Mark Fasheh
    Cc: Al Viro
    Cc: Christoph Hellwig
    Cc: Karel Zak
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Valerie Henson
     
  • Simplify touch_atime() layout.

    Cc: Valerie Henson
    Cc: Mark Fasheh
    Cc: Al Viro
    Cc: Christoph Hellwig
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
     
  • The kernel termios (ktermios) changes were somehow missed for Xtensa. This
    patch adds the ktermios structure and also includes a minor file name fix
    that was missed in the syscall patch.

    Signed-off-by: Chris Zankel
    Acked-by: Alan Cox
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Chris Zankel
     
  • Currently, to tell a task that it should go to the refrigerator, we set the
    PF_FREEZE flag for it and send a fake signal to it. Unfortunately there
    are two SMP-related problems with this approach. First, a task running on
    another CPU may be updating its flags while the freezer attempts to set
    PF_FREEZE for it and this may leave the task's flags in an inconsistent
    state. Second, there is a potential race between freeze_process() and
    refrigerator() in which freeze_process() running on one CPU is reading a
    task's PF_FREEZE flag while refrigerator() running on another CPU has just
    set PF_FROZEN for the same task and attempts to reset PF_FREEZE for it. If
    the refrigerator wins the race, freeze_process() will state that PF_FREEZE
    hasn't been set for the task and will set it unnecessarily, so the task
    will go to the refrigerator once again after it's been thawed.

    To solve the first of these problems we need to stop using PF_FREEZE to
    tell tasks that they should go to the refrigerator. Instead, we can
    introduce a special TIF_*** flag and use it for this purpose, since
    changing another task's TIF_*** flags is allowed and there are special
    calls for doing it.

    To avoid the freeze_process()-refrigerator() race we can make
    freeze_process() always check the task's PF_FROZEN flag after it's read
    its "freeze" flag. We should also make sure that refrigerator() will
    always reset the task's "freeze" flag after it's set PF_FROZEN for it.
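
    With the TIF flag in place, the per-task tests become ordinary
    thread-flag operations; a sketch (names follow the patch's scheme):

    static inline int freezing(struct task_struct *p)
    {
            /* reading another task's TIF flags is allowed */
            return test_tsk_thread_flag(p, TIF_FREEZE);
    }

    static inline void do_not_freeze(struct task_struct *p)
    {
            clear_tsk_thread_flag(p, TIF_FREEZE);
    }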

    Signed-off-by: Rafael J. Wysocki
    Acked-by: Pavel Machek
    Cc: Russell King
    Cc: David Howells
    Cc: Andi Kleen
    Cc: "Luck, Tony"
    Cc: Benjamin Herrenschmidt
    Cc: Paul Mackerras
    Cc: Paul Mundt
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Rafael J. Wysocki
     
  • Currently, if a task is stopped (ie. it's in the TASK_STOPPED state), it
    is considered by the freezer as unfreezeable. However, there may be a race
    between the freezer and the delivery of the continuation signal to the task
    resulting in the task running after we have finished freezing the other
    tasks. This, in turn, may lead to undesirable effects up to and including
    data corruption.

    To prevent this from happening we first need to make the freezer consider
    stopped tasks as freezeable. For this purpose we need to make freezeable()
    stop returning 0 for these tasks and we need to force them to enter the
    refrigerator. However, if there's no continuation signal in the meantime,
    the stopped tasks should remain stopped after all processes have been
    thawed, so we need to send an additional SIGSTOP to each of them before
    waking it up.

    Also, a stopped task that has just been woken up should first check if
    there's a freezing request for it and go to the refrigerator if that's the
    case.
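
    A sketch of the thaw-side idea (the was_stopped bookkeeping here is
    hypothetical; the real patch tracks this differently):

    if (was_stopped)
            send_sig(SIGSTOP, p, 1);        /* stay stopped once thawed */
    wake_up_process(p);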

    Signed-off-by: Rafael J. Wysocki
    Acked-by: Pavel Machek
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Rafael J. Wysocki
     
  • When some objects are allocated by one CPU but freed by another CPU we can
    consume a lot of cycles doing divides in obj_to_index().

    (Typical load on a dual processor machine where network interrupts are
    handled by one particular CPU (allocating skbufs), and the other CPU is
    running the application (consuming and freeing skbufs))

    Here on one production server (dual-core AMD Opteron 285), I noticed this
    divide took 1.20% of CPU_CLK_UNHALTED events in the kernel. But Opterons
    are quite modern CPUs and the divide is much more expensive on older
    architectures:

    On a 200 MHz sparcv9 machine, the division takes 64 cycles instead of 1
    cycle for a multiply.

    Doing some math, we can use a reciprocal multiplication instead of a divide.

    If we want to compute V = (A / B) (A and B being u32 quantities), we can
    instead use:

    V = ((u64)A * RECIPROCAL(B)) >> 32 ;

    where RECIPROCAL(B) is precalculated to ((1LL << 32) + (B - 1)) / B

    Note:

    I wrote pure C code for clarity. gcc output for i386 is not optimal but
    acceptable:

    mull 0x14(%ebx)
    mov %edx,%eax // part of the >> 32
    xor %edx,%edx // useless
    mov %eax,(%esp) // could be avoided
    mov %edx,0x4(%esp) // useless
    mov (%esp),%ebx
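
    In C, the helper pair looks roughly like the reciprocal_value() /
    reciprocal_divide() functions the patch introduces:

    u32 reciprocal_value(u32 b)
    {
            u64 r = (1LL << 32) + (b - 1);

            do_div(r, b);           /* r = ((1 << 32) + b - 1) / b */
            return (u32)r;
    }

    static inline u32 reciprocal_divide(u32 a, u32 r)
    {
            return (u32)(((u64)a * r) >> 32);
    }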

    [akpm@osdl.org: small cleanups]
    Signed-off-by: Eric Dumazet
    Cc: Christoph Lameter
    Cc: David Miller
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Eric Dumazet
     
  • Elaborate the API for calling cpuset_zone_allowed(), so that users have to
    explicitly choose between the two variants:

    cpuset_zone_allowed_hardwall()
    cpuset_zone_allowed_softwall()

    Until now, whether or not you got the hardwall flavor depended solely on
    whether or not you or'd in the __GFP_HARDWALL gfp flag to the gfp_mask
    argument.

    If you didn't specify __GFP_HARDWALL, you implicitly got the softwall
    version.

    Unfortunately, this meant that users would end up with the softwall version
    without thinking about it. Since only the softwall version might sleep,
    this led to bugs with possible sleeping in interrupt context on more than
    one occasion.

    The hardwall version requires that the current task's mems_allowed allows
    the node of the specified zone (or that you're in interrupt, that
    __GFP_THISNODE is set, or that you're on a one-cpuset system.)

    The softwall version, depending on the gfp_mask, might allow a node if it
    was allowed in the nearest enclosing cpuset marked mem_exclusive (which
    requires taking the cpuset lock 'callback_mutex' to evaluate.)

    This patch removes the cpuset_zone_allowed() call, and forces the caller to
    explicitly choose between the hardwall and the softwall case.

    If the caller wants the gfp_mask to determine this choice, they should (1)
    be sure they can sleep or that __GFP_HARDWALL is set, and (2) invoke the
    cpuset_zone_allowed_softwall() routine.
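
    For example (illustrative call sites, not from the patch):

    /* in a path that may sleep: softwall, may take callback_mutex */
    allowed = cpuset_zone_allowed_softwall(zone, gfp_mask);

    /* in interrupt or other atomic context: hardwall, never sleeps */
    allowed = cpuset_zone_allowed_hardwall(zone, gfp_mask);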

    This adds another 100 or 200 bytes to the kernel text space, due to the few
    lines of nearly duplicate code at the top of both cpuset_zone_allowed_*
    routines. It should save a few instructions executed for the calls that
    turned into calls of cpuset_zone_allowed_hardwall, thanks to not having to
    set (before the call) then check (within the call) the __GFP_HARDWALL flag.

    For the most critical call, from get_page_from_freelist(), the same
    instructions are executed as before -- the old cpuset_zone_allowed()
    routine it used to call is the same code as the
    cpuset_zone_allowed_softwall() routine that it calls now.

    Not a perfect win, but it seems worth it, to reduce the chance of hitting a
    sleeping-with-irqs-off complaint again.

    Signed-off-by: Paul Jackson
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Paul Jackson
     
  • More cleanups for slab.h

    1. Remove tabs from weird locations as suggested by Pekka

    2. Drop the check for NUMA and SLAB_DEBUG from the fallback section
    as suggested by Pekka.

    3. Use static inline for the fallback defs, as also suggested by Pekka.

    4. Make kmem_ptr_valid take a const * argument.

    5. Separate the NUMA fallback definitions from the kmalloc_track fallback
    definitions.

    Signed-off-by: Christoph Lameter
    Cc: Pekka Enberg
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     
  • This is a response to an earlier discussion on linux-mm about splitting
    slab.h components per allocator. Patch is against 2.6.19-git11. See
    http://marc.theaimsgroup.com/?l=linux-mm&m=116469577431008&w=2

    This patch cleans up the slab header definitions. We define the common
    functions of slob and slab in slab.h and put the extra definitions needed
    for slab's kmalloc implementations in <linux/slab_def.h>. In order to get
    a greater set of common functions we add several empty functions to slob.c
    and also rename slob's kmalloc to __kmalloc.

    Slob does not need any special definitions since we introduce a fallback
    case. If there is no need for a slab implementation to provide its own
    kmalloc mess^H^H^Hacros then we simply fall back to __kmalloc functions.
    That is sufficient for SLOB.

    Sort the functions in slab.h according to their functionality: first the
    functions operating on struct kmem_cache *, then the kmalloc-related
    functions, followed by special debug and fallback definitions.

    Also redo a lot of comments.
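
    The fallback mechanism is simple; a sketch of the shape slab.h takes
    (simplified):

    #ifdef CONFIG_SLAB
    #include <linux/slab_def.h>     /* slab's optimized kmalloc() */
    #else
    /* fallback: sufficient for SLOB, which only provides __kmalloc() */
    static inline void *kmalloc(size_t size, gfp_t flags)
    {
            return __kmalloc(size, flags);
    }
    #endif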

    Signed-off-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     
  • Remove calls to pci_disable_device except in fail_all_cmds. The
    pci_disable_device function does something nasty to Smart Array controllers
    that pci_enable_device does not undo. So if the driver is unloaded it
    cannot be reloaded.

    Also, customers can disable any pci device via the ROM Based Setup Utility
    (RBSU). If the customer has disabled the controller we should not try to
    blindly enable the card from the driver. Please consider this for
    inclusion.

    Signed-off-by: Mike Miller
    Acked-by: Alan Cox
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mike Miller
     
  • Map out more memory for our config table. It's required to reach offset
    0x214 to disable DMA on the P600. I'm not sure how I lost this hunk.
    Please consider this for inclusion.

    Signed-off-by: Mike Miller
    Cc: Jens Axboe
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mike Miller
     
  • When CONFIG_PCI is not defined (i.e. PCI bus is disabled), the sx driver
    fails to link, since some pci functions are not available. Fix this
    behaviour to be able to compile this driver on machines with no PCI bus
    (but with ISA bus support).

    Signed-off-by: Jiri Slaby
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jiri Slaby
     
  • When CONFIG_PCI is not defined (i.e. PCI bus is disabled), the mxser_new
    driver fails to link, since some pci functions are not available. Fix this
    behaviour to be able to compile this driver on machines with no PCI bus
    (but with ISA bus support).

    Signed-off-by: Jiri Slaby
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jiri Slaby
     
  • With CONFIG_PCI=n:
    drivers/char/isicom.c: In function 'isicom_probe':
    drivers/char/isicom.c:1793: warning: implicit declaration of function
    'pci_request_region'
    drivers/char/isicom.c:1827: warning: implicit declaration of function
    'pci_release_region'

    Let's make CONFIG_ISI depend on CONFIG_PCI.

    Signed-off-by: Randy Dunlap
    Signed-off-by: Jiri Slaby
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Randy Dunlap
     
  • Based on patch from Alexander Rigbo

    Signed-off-by: Evgeniy Polyakov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Evgeniy Polyakov
     
  • It seems macbooks set bit 2 but not bit 0, which is an "enabled but vmxon will
    fault" setting.

    Signed-off-by: Avi Kivity
    Tested-by: Alex Larsson (sometimes testing helps)
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Avi Kivity
     
  • Signed-off-by: Avi Kivity
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michael Riepe
     
  • They're not on speaking terms.

    Signed-off-by: Avi Kivity
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Avi Kivity
     
  • Thanks Jens for alerting me to this.

    Cc: Jens Axboe
    Signed-off-by: Neil Brown
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    NeilBrown
     
  • The previous checkstack fix for UML, which needs to use the host's tools,
    was wrong in the crossbuilding case. It would use the build host's, rather
    than the target's, toolchain.

    This patch removes the old fix and adds an explicit special case for UML,
    leaving everyone else alone.

    Signed-off-by: Jeff Dike
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jeff Dike
     
  • fallback_alloc() does not do the check for __GFP_WAIT that cache_grow()
    does. Thus interrupts are disabled when we call kmem_getpages(), which
    results in the failure.

    Duplicate cache_grow()'s handling of __GFP_WAIT.
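
    The fix mirrors cache_grow(): re-enable interrupts around the page
    allocation when the caller is allowed to wait (sketch):

    local_flags = (flags & GFP_LEVEL_MASK);
    if (local_flags & __GFP_WAIT)
            local_irq_enable();
    obj = kmem_getpages(cache, flags, -1);
    if (local_flags & __GFP_WAIT)
            local_irq_disable();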

    Signed-off-by: Christoph Lameter
    Cc: Jay Cliburn
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     
  • The fields of struct pipe_inode_info do not have a carefully chosen layout
    (ie not optimized to fit cache lines nor to reduce cache line ping-pong).

    The bufs[] array is *large* and is placed near the beginning of the
    structure, so all following fields have a large offset. This is
    unfortunate because many archs have smaller instructions when using small
    offsets relative to a base register. On x86 for example, 7-bit offsets
    have smaller instruction lengths.

    Moving bufs[] to the end of struct pipe_inode_info permits all other fields
    to have small offsets, and reduces text size and icache pressure.

    # size vmlinux.pre vmlinux
    text data bss dec hex filename
    3268989 664356 492196 4425541 438745 vmlinux.pre
    3268765 664356 492196 4425317 438665 vmlinux

    So this patch reduces text size by 224 bytes on my x86_64 machine. Similar
    results on ia32.
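
    Schematically (fields abridged; the real struct has more members):

    struct pipe_inode_info {
            wait_queue_head_t wait;         /* hot fields keep small offsets */
            unsigned int nrbufs, curbuf;
            /* ... remaining small fields ... */
            struct pipe_buffer bufs[PIPE_BUFFERS];  /* large array, now last */
    };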

    Signed-off-by: Eric Dumazet
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Eric Dumazet
     
  • Clean up a little.

    Signed-off-by: Karsten Wiese
    Cc: Sam Ravnborg
    Cc: Roman Zippel
    Acked-by: Randy Dunlap
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Karsten Wiese
     
  • Add a function that sets "void (*conf_changed_callback)(void)". Call it
    if the .config's changed state changes. Use the above in qconf.cc to set
    the GUI's save-widget's sensitivity.

    Signed-off-by: Karsten Wiese
    Cc: Sam Ravnborg
    Cc: Roman Zippel
    Acked-by: Randy Dunlap
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Karsten Wiese
     
  • Those two functions are
    void sym_set_change_count(int count)
    and
    void sym_add_change_count(int count)

    All write accesses to sym_change_count are replaced by calls to the above
    functions.

    The variable and the changer functions are moved to confdata.c. IMO that's
    OK, as sym_change_count is an attribute of the .config's change state.
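
    A sketch of the pair (matching the intent: fire the callback only when
    the boolean changed-state actually flips):

    void sym_set_change_count(int count)
    {
            int prev = sym_change_count;

            sym_change_count = count;
            if (conf_changed_callback && (bool)prev != (bool)count)
                    conf_changed_callback();
    }

    void sym_add_change_count(int count)
    {
            sym_set_change_count(count + sym_change_count);
    }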

    Signed-off-by: Karsten Wiese
    Cc: Sam Ravnborg
    Cc: Roman Zippel
    Acked-by: Randy Dunlap
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Karsten Wiese