26 Jun, 2005

26 commits

  • This patch contains the following cleanups:
    - make needlessly global code static
    - remove the following unused global functions:
      - blkdev_scsi_issue_flush_fn
      - __blk_attempt_remerge
    - remove the following unused EXPORT_SYMBOL's:
      - blk_phys_contig_segment
      - blk_hw_contig_segment
      - blkdev_scsi_issue_flush_fn
      - __blk_attempt_remerge

    Signed-off-by: Adrian Bunk
    Acked-by: Jens Axboe
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Adrian Bunk
     
  • This patch contains the following possible cleanups:
    - make the needlessly global function __nvram_set_checksum static
    - #if 0 the unused global function nvram_set_checksum
    - remove the EXPORT_SYMBOL's for both functions

    Signed-off-by: Adrian Bunk
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Adrian Bunk
     
  • This patch makes use of ALIGN() to remove duplicate round-up code.

    Signed-off-by: Nick Wilson
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nick Wilson
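
As an illustration of the round-up idiom that ALIGN() captures, here is a minimal user-space sketch. The names ALIGN_UP and round_up_open_coded are hypothetical and chosen for this example only; the sketch assumes the alignment is a power of two.

```c
#include <stddef.h>

/*
 * Round x up to the next multiple of a. Only valid when a is a power of
 * two. ALIGN_UP is a hypothetical user-space name for this sketch.
 */
#define ALIGN_UP(x, a) (((x) + ((a) - 1)) & ~((size_t)(a) - 1))

/* The kind of open-coded round-up that a shared macro would replace. */
static size_t round_up_open_coded(size_t x, size_t a)
{
    return (x + a - 1) & ~(a - 1);
}
```

Both forms compute the same value; using one macro simply keeps the bit trick in a single place.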
     
  • o The following patch makes purely cosmetic changes, correcting CodingStyle
    guideline violations such as the following in kexec-related files:

    o braces for one-line "if" statements and "for" loops,
    o lines more than 80 columns wide,
    o no space after the "while", "for" and "switch" keywords

    o Changes:
    o take-2: Removed the extra tab before "case" key words.
    o take-3: Put operator at the end of line and space before "*/"

    Signed-off-by: Maneesh Soni
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Maneesh Soni
     
  • Makes kexec_crashdump() take a pt_regs * as an argument. This makes it
    possible to capture the exact register state at the point of the crash.
    If we come from a direct panic assertion, NULL is passed and the current
    registers are saved before the crash dump.

    This hooks into two places:
    die(): check the conditions under which we will panic when calling
    do_exit and go there directly with the pt_regs that caused the fatal
    fault.

    die_nmi(): If we receive an NMI lockup while in the kernel use the
    pt_regs and go directly to crash_kexec(). We're probably nested up badly
    at this point so this might be the only chance to escape with proper
    information.

    Signed-off-by: Alexander Nyberg
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexander Nyberg
     
  • From: "Vivek Goyal"

    o Support for /proc/vmcore interface. This interface exports elf core image
    either in ELF32 or ELF64 format, depending on the format in which elf headers
    have been stored by crashed kernel.
    o Added support for CONFIG_VMCORE config option.
    o Removed the dependency on /proc/kcore.

    From: "Eric W. Biederman"

    This patch has been refactored to more closely match the prevailing style in
    the affected files, and to clearly indicate the dependency between
    /proc/kcore and proc/vmcore.c

    From: Hariprasad Nellitheertha

    This patch contains the code that provides an ELF format interface to the
    previous kernel's memory post kexec reboot.

    Signed-off-by: Hariprasad Nellitheertha
    Signed-off-by: Eric Biederman
    Signed-off-by: Vivek Goyal
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vivek Goyal
     
  • This patch adds support for retrieving the address of the ELF core header
    if one is passed on the command line.

    Signed-off-by: Vivek Goyal
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vivek Goyal
     
  • This patch provides the interfaces necessary to read the dump contents,
    treating it as a high memory device.

    Signed-off-by: Hariprasad Nellitheertha
    Signed-off-by: Eric Biederman
    Signed-off-by: Vivek Goyal
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vivek Goyal
     
  • This patch retrieves the max_pfn being used by previous kernel and stores it
    in a safe location (saved_max_pfn) before it is overwritten due to user
    defined memory map. This pfn is used to make sure that user does not try to
    read the physical memory beyond saved_max_pfn.

    Signed-off-by: Vivek Goyal
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vivek Goyal
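
The bounds check described above can be sketched in user space as follows. This is a simplified model, not the kernel code: the value assigned to saved_max_pfn, the 4 KiB page size, and the helper name oldmem_read_allowed are assumptions made for the example, as is treating the boundary as inclusive.

```c
#include <stdbool.h>

#define PAGE_SHIFT_MODEL 12 /* assumes 4 KiB pages */

/* Hypothetical stand-in for the max_pfn saved from the previous kernel. */
static unsigned long saved_max_pfn = 0x20000;

/* Allow a read only if the page frame lies at or below saved_max_pfn. */
static bool oldmem_read_allowed(unsigned long long phys_addr)
{
    unsigned long pfn = (unsigned long)(phys_addr >> PAGE_SHIFT_MODEL);

    return pfn <= saved_max_pfn;
}
```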
     
  • Add kexec support for s390 architecture.

    From: Milton Miller

    - Fix passing of first argument to relocate_kernel assembly.
    - Fix Kconfig description.
    - Remove wrong comment and comments that describe obvious things.
    - Allow only KEXEC_TYPE_DEFAULT as image type -> dump not supported.

    Acked-by: Martin Schwidefsky
    Signed-off-by: Heiko Carstens
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Heiko Carstens
     
  • This patch introduces the architecture independent implementation of the
    sys_kexec_load and compat_sys_kexec_load system calls.

    Kexec on panic support has been integrated into the core patch and is
    relatively clean.

    In addition, the hopefully architecture independent option
    crashkernel=size@location has been documented. Its purpose is to reserve
    space for the panic kernel to live in, where no DMA transfer will ever be
    set up to access.

    Signed-off-by: Eric Biederman
    Signed-off-by: Alexander Nyberg
    Signed-off-by: Adrian Bunk
    Signed-off-by: Vivek Goyal
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Eric W. Biederman
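
To make the size@location syntax concrete, here is a minimal user-space sketch of a parser for such a value. It is not the kernel's parser: the function names are hypothetical, and the support for K, M and G suffixes is an assumption made for this example.

```c
#include <stdlib.h>

/*
 * Parse a number with an optional K, M or G suffix (an assumption of this
 * sketch). On success, store the value and a pointer past the consumed text.
 */
static int parse_mem_model(const char *s, const char **end,
                           unsigned long long *out)
{
    char *p;
    unsigned long long v = strtoull(s, &p, 0);

    if (p == s)
        return -1; /* no digits at all */

    switch (*p) {
    case 'G': case 'g': v <<= 30; p++; break;
    case 'M': case 'm': v <<= 20; p++; break;
    case 'K': case 'k': v <<= 10; p++; break;
    default: break;
    }
    *end = p;
    *out = v;
    return 0;
}

/* Parse "size@location". Returns 0 on success, -1 on malformed input. */
static int parse_crashkernel_model(const char *arg,
                                   unsigned long long *size,
                                   unsigned long long *base)
{
    const char *p;

    if (parse_mem_model(arg, &p, size) != 0 || *p != '@')
        return -1;
    if (parse_mem_model(p + 1, &p, base) != 0 || *p != '\0')
        return -1;
    return 0;
}
```

For example, under these assumptions "64M@16M" would yield a 64 MiB region starting at the 16 MiB mark.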
     
  • This patch adds a new preemption model: 'Voluntary Kernel Preemption'. The
    3 models can be selected from a new menu:

    (X) No Forced Preemption (Server)
    ( ) Voluntary Kernel Preemption (Desktop)
    ( ) Preemptible Kernel (Low-Latency Desktop)

    We still default to the stock (Server) preemption model.

    Voluntary preemption works by adding a cond_resched()
    (reschedule-if-needed) call to every might_sleep() check. It is lighter
    than CONFIG_PREEMPT - at the cost of not having as tight latencies. It
    represents a different latency/complexity/overhead tradeoff.

    It has no runtime impact at all if disabled. Here are size stats that show
    how the various preemption models impact the kernel's size:

       text    data     bss     dec     hex filename
    3618774  547184  179896 4345854  424ffe vmlinux.stock
    3626406  547184  179896 4353486  426dce vmlinux.voluntary  +0.2%
    3748414  548640  179896 4476950  445016 vmlinux.preempt    +3.5%

    voluntary-preempt is +0.2% of .text, preempt is +3.5%.

    This feature has been tested for many months by lots of people (and it's
    also included in the RHEL4 distribution and earlier variants were in Fedora
    as well), and it's intended for users and distributions who don't want to
    use full-blown CONFIG_PREEMPT for one reason or another.

    Signed-off-by: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ingo Molnar
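
The mechanism described in this commit, a reschedule-if-needed call inside every might_sleep() check, can be modelled in user space roughly as follows. This is a toy sketch only: all names carry a _model or _stub suffix to mark them as hypothetical, and the "scheduler" is just a counter.

```c
#include <stdbool.h>

static bool need_resched_flag; /* set when another task should run */
static int reschedule_count;   /* how many times this model yielded */

/* Stand-in for the scheduler: clear the flag and count the yield. */
static void schedule_stub(void)
{
    need_resched_flag = false;
    reschedule_count++;
}

/* Reschedule only if it is needed; returns 1 if the model yielded. */
static int cond_resched_model(void)
{
    if (need_resched_flag) {
        schedule_stub();
        return 1;
    }
    return 0;
}

/*
 * Under the voluntary model, every "might sleep" annotation doubles as a
 * preemption point. When the model is disabled, this would be a no-op.
 */
static void might_sleep_model(void)
{
    cond_resched_model();
}
```

The point of the design is that the preemption points already exist wherever the code is known to be allowed to sleep, so no extra locking or preempt counters are needed.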
     
  • The following patches add dynamic sched domains functionality that was
    extensively discussed on lkml and lse-tech. I would like to see this added to
    -mm

    o The main advantage with this feature is that it ensures that the scheduler
    load balancing code only balances against the cpus that are in the sched
    domain as defined by an exclusive cpuset and not all of the cpus in the
    system. This removes any overhead due to load balancing code trying to
    pull tasks outside of the cpu exclusive cpuset only to be prevented by
    the tasks' cpus_allowed mask.
    o cpu exclusive cpusets are useful for servers running orthogonal
    workloads such as RT applications requiring low latency and HPC
    applications that are throughput sensitive

    o It provides a new API partition_sched_domains in sched.c
    that makes dynamic sched domains possible.
    o cpu_exclusive cpusets are now associated with a sched domain,
    which means that users can dynamically modify the sched domains
    through the cpuset file system interface
    o ia64 sched domain code has been updated to support this feature as well
    o Currently, this does not support hotplug. (However some of my tests
    indicate hotplug+preempt is currently broken)
    o I have tested it extensively on x86.
    o This should have very minimal impact on performance as none of
    the fast paths are affected

    Signed-off-by: Dinakar Guniguntala
    Acked-by: Paul Jackson
    Acked-by: Nick Piggin
    Acked-by: Matthew Dobson
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dinakar Guniguntala
     
  • Consolidate balance-on-exec with balance-on-fork. This is made easy by the
    sched-domains RCU patches.

    As well as the general goodness of code reduction, this allows the runqueues
    to be unlocked during balance-on-fork.

    schedstats is a problem. Maybe just have balance-on-event instead of
    distinguishing fork and exec?

    Signed-off-by: Nick Piggin
    Acked-by: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nick Piggin
     
  • Instead of requiring architecture code to interact with the scheduler's
    locking implementation, provide a couple of defines that can be used by the
    architecture to request runqueue unlocked context switches, and ask for
    interrupts to be enabled over the context switch.

    Also replaces the "switch_lock" used by these architectures with an oncpu
    flag (note, not a potentially slow bitflag). This eliminates one bus
    locked memory operation when context switching, and simplifies the
    task_running function.

    Signed-off-by: Nick Piggin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nick Piggin
     
  • Do some basic initial tuning.

    Signed-off-by: Nick Piggin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nick Piggin
     
  • Add SCHEDSTAT statistics for sched-balance-fork.

    Signed-off-by: Nick Piggin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nick Piggin
     
  • Reimplement the balance on exec balancing to be sched-domains aware. Use this
    to also do balance on fork balancing. Make x86_64 do balance on fork over the
    NUMA domain.

    The problem with the non-sched-domains-aware balancing became apparent on
    dual-core, multi-socket Opterons. What we want is for the new tasks to be
    sent to a different socket, but more often than not, we would first load up
    our sibling core, or fill two cores of a single remote socket before
    selecting a new one.

    This gives large improvements to STREAM on such systems.

    Signed-off-by: Nick Piggin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nick Piggin
     
  • Remove the very aggressive idle stuff that has recently gone into 2.6 - it is
    going against the direction we are trying to go. Hopefully we can regain
    performance through other methods.

    Signed-off-by: Nick Piggin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nick Piggin
     
  • Do CPU load averaging over a number of different intervals. Allow each
    interval to be chosen by sending a parameter to source_load and target_load.
    0 is instantaneous; idx > 0 returns a decaying average with the most recent
    sample weighted at 2^(idx-1), up to a maximum of 3 (could be easily
    increased).

    So generally a higher number will result in more conservative balancing.

    Signed-off-by: Nick Piggin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nick Piggin
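
One way to read the indexed load averages described above is sketched below. This is an interpretation, not the scheduler's code: it assumes that "weighted at 2^(idx-1)" means the newest sample contributes 1/2^(idx-1) of the average, and the struct and function names are hypothetical.

```c
#define LOAD_IDX_MAX 3

/* Hypothetical per-runqueue state: one decaying average per index. */
struct rq_model {
    unsigned long cpu_load[LOAD_IDX_MAX];
};

/* Fold a new load sample into each average; slot i uses scale 2^i. */
static void update_cpu_load_model(struct rq_model *rq, unsigned long sample)
{
    for (int i = 0; i < LOAD_IDX_MAX; i++) {
        unsigned long scale = 1UL << i;

        rq->cpu_load[i] = (rq->cpu_load[i] * (scale - 1) + sample) / scale;
    }
}

/* idx 0 returns the instantaneous load; idx > 0 a decaying average. */
static unsigned long load_at_model(const struct rq_model *rq,
                                   unsigned long now, int idx)
{
    if (idx <= 0)
        return now;
    if (idx > LOAD_IDX_MAX)
        idx = LOAD_IDX_MAX;
    return rq->cpu_load[idx - 1];
}
```

Under this reading, a larger idx reacts more slowly to a sudden change in load, which matches the note that higher numbers give more conservative balancing.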
     
  • 2.6.12-rc6-mm1 has a few remaining synchronize_kernel()s, some (but not
    all) in comments. This patch changes these synchronize_kernel() calls (and
    comments) to synchronize_rcu() or synchronize_sched() as follows:

    - arch/x86_64/kernel/mce.c mce_read(): change to synchronize_sched() to
    handle races with machine-check exceptions (synchronize_rcu() would not cut
    it given RCU implementations intended for hardcore realtime use).

    - drivers/input/serio/i8042.c i8042_stop(): change to synchronize_sched() to
    handle races with i8042_interrupt() interrupt handler. Again,
    synchronize_rcu() would not cut it given RCU implementations intended for
    hardcore realtime use.

    - include/*/kdebug.h comments: change to synchronize_sched() to handle races
    with NMIs. As before, synchronize_rcu() would not cut it...

    - include/linux/list.h comment: change to synchronize_rcu(), since this
    comment is for list_del_rcu().

    - security/keys/key.c unregister_key_type(): change to synchronize_rcu(),
    since this is interacting with RCU read side.

    - security/keys/process_keys.c install_session_keyring(): change to
    synchronize_rcu(), since this is interacting with RCU read side.

    Signed-off-by: "Paul E. McKenney"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Paul E. McKenney
     
  • Without this patch, Linux provokes emergency disk shutdowns and
    similar nastiness. It was in SuSE kernels for some time, IIRC.

    Signed-off-by: Pavel Machek
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pavel Machek
     
  • Using CPU hotplug to support suspend/resume SMP. Both S3 and S4 use
    disable/enable_nonboot_cpus API. The S4 part is based on Pavel's original S4
    SMP patch.

    Signed-off-by: Li Shaohua
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Li Shaohua
     
  • This patch adds __cpuinit and __cpuinitdata sections that need to exist past
    boot to support cpu hotplug.

    Caveat: This is done *only* for EM64T CPU hotplug support, on request from
    Andi Kleen. Much of the generic hotplug code in the kernel, and all of the
    other archs that support CPU hotplug today (i386, ia64, ppc64, s390 and
    parisc), don't mark sections with __cpuinit, but only mark them as
    __devinit and __devinitdata.

    If someone is motivated to change generic code, we need to make sure all
    existing hotplug code does not break on other archs that don't use
    __cpuinit and __cpudevinit.

    Signed-off-by: Ashok Raj
    Acked-by: Andi Kleen
    Acked-by: Zwane Mwaikambo
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ashok Raj
     
  • I really wish smp_prepare_cpu() would disappear eventually. In the interim
    this is ideally a weak function, so we don't end up changing several places
    to define this dummy in headers.

    Today, since the dummy declaration is done only in drivers/base/cpu.c but
    the function is called in kernel/power/smp.c, I get an undefined reference
    in my cpu hotplug code for x86_64 under development.

    Signed-off-by: Ashok Raj
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ashok Raj
     
  • I8K: Change to use stock dmi infrastructure instead of homegrown
    parsing code. The driver now requires box's DMI data to match
    list of supported models so driver can be safely compiled-in
    by default without fear of it poking into random SMM BIOS
    code. DMI checks can be ignored with i8k.ignore_dmi option.

    Signed-off-by: Dmitry Torokhov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dmitry Torokhov
     

25 Jun, 2005

1 commit


24 Jun, 2005

13 commits

  • Linus Torvalds
     
  • Another rollup of patches which give various symbols static scope

    Signed-off-by: Adrian Bunk
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Adrian Bunk
     
  • This patch reworks filemap_xip.c with the goal to reduce code duplication
    from mm/filemap.c. It applies against 2.6.12-rc6-mm1. Instead of
    implementing the aio functions, this one implements the synchronous
    read/write functions only. For readv and writev, the generic fallback is
    used. For aio, we rely on the application doing the fallback. Since our
    "synchronous" function does memcpy immediately anyway, there is no
    performance difference between using the fallbacks or implementing each
    operation.

    Signed-off-by: Carsten Otte
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Carsten Otte
     
  • These are the ext2 related parts. Ext2 now uses the xip_* file operations
    along with the get_xip_page aop when mounted with -o xip.

    Signed-off-by: Carsten Otte
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Carsten Otte
     
  • - generic_file* file operations no longer have a xip/non-xip split
    - filemap_xip.c implements a new set of fops that require the get_xip_page
    aop to work properly. All new fops are exported GPL-only (don't like to
    see whatever code use those except GPL modules)
    - __xip_unmap now uses page_check_address, which is no longer static
    in rmap.c, and defined in linux/rmap.h
    - mm/filemap.h is now much more clean, plainly having just Linus'
    inline funcs moved here from filemap.c
    - fix includes in filemap_xip to make it build cleanly on i386

    Signed-off-by: Carsten Otte
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Carsten Otte
     
  • This is the block device related part. The block device operation
    direct_access now has a struct block_device as first parameter.

    Signed-off-by: Carsten Otte
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Carsten Otte
     
  • This patch adds version and srcversion files to
    /sys/module/${modulename} containing the version and srcversion fields
    of the module's modinfo section (if present).

    /sys/module/e1000
    |-- srcversion
    `-- version

    This patch differs slightly from the version posted in January, as it
    now uses the new kstrdup() call in -mm.

    Why put this in sysfs?

    a) Tools like DKMS, which deal with changing out individual kernel
    modules without replacing the whole kernel, can behave smarter if they
    can tell the version of a given module. The autoinstaller feature, for
    example, determines whether your system has a "good" version of a
    driver (i.e. whether the one provided by DKMS has a newer version than
    that provided by the installed kernel package), and automatically
    compiles and installs a newer version if DKMS has it but your kernel
    doesn't yet have that version.

    b) Because sysadmins manually, or with tools like DKMS, can switch out
    modules on the file system, you can't count on 'modinfo foo.ko', which
    looks at /lib/modules/${kernelver}/... actually matching what is loaded
    into the kernel already. Hence asking sysfs for this.

    c) as the unbind-driver-from-device work takes shape, it will be
    possible to rebind a driver that's built-in (no .ko to modinfo for the
    version) to a newly loaded module. sysfs will have the
    currently-built-in version info, for comparison.

    d) tech support scripts can then easily grab the version info for what's
    running presently - a question I get often.

    There has been renewed interest in this patch on linux-scsi by driver
    authors.

    As the idea originated from GregKH, I leave his Signed-off-by: intact,
    though the implementation is nearly completely new. Compiled and run on
    x86 and x86_64.

    From: Matthew Dobson

    build fix

    From: Thierry Vignaud

    build fix

    From: Matthew Dobson

    warning fix

    Signed-off-by: Greg Kroah-Hartman
    Signed-off-by: Matt Domsch
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Matt Domsch
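
A support script or tool could read these sysfs files with ordinary file I/O. The sketch below shows one way to do that in C; the helper names are hypothetical, and it assumes the file holds a single line of text. If the module has no version field, the file is absent and the lookup simply fails.

```c
#include <stdio.h>
#include <string.h>

/* Read the first line of path into buf without the trailing newline. */
static int read_first_line(const char *path, char *buf, size_t len)
{
    FILE *f = fopen(path, "r");

    if (!f)
        return -1;
    if (!fgets(buf, (int)len, f)) {
        fclose(f);
        return -1;
    }
    fclose(f);
    buf[strcspn(buf, "\n")] = '\0';
    return 0;
}

/* Look up /sys/module/<module>/version; returns -1 if it is not there. */
static int module_version(const char *module, char *buf, size_t len)
{
    char path[256];
    int n = snprintf(path, sizeof(path), "/sys/module/%s/version", module);

    if (n < 0 || (size_t)n >= sizeof(path))
        return -1;
    return read_first_line(path, buf, len);
}
```

The same pattern would apply to the srcversion file by changing the last path component.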
     
  • Set the recovery directory via /proc/fs/nfsd/nfs4recoverydir.

    It may be changed any time, but is used only on startup.

    Signed-off-by: Andy Adamson
    Signed-off-by: J. Bruce Fields
    Signed-off-by: Neil Brown
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    NeilBrown
     
  • This patch adds the code to create and remove client subdirectories from the
    recovery directory, as described in the previous patch comment.

    Signed-off-by: Andy Adamson
    Signed-off-by: J. Bruce Fields
    Signed-off-by: Neil Brown
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    NeilBrown
     
  • NFSv4 clients are required to know what state they have on the server so that
    they can reclaim it on server reboot. However, it is possible for
    pathological combinations of server reboots and network partitions to leave a
    client in a state where it cannot know whether it has lost its state on the
    server.

    For this reason, rfc3530 requires that we store some information about clients
    to stable storage.

    So we maintain a directory /var/lib/nfs/v4recovery with a subdirectory for
    each client with active state. We leave open the possibility of including
    files underneath each such subdirectory with information about the client, but
    for now the subdirectories are empty.

    We create a client subdirectory whenever a client makes its first non-reclaim
    open_confirm.

    We remove a client subdirectory whenever either
    a) its lease expires, or
    b) the grace period ends without it reclaiming anything.
    When handling reclaims, we allow the reclaim if and only if the client doing
    the reclaim has a subdirectory.

    This patch adds just the code to scan the recovery directory on nfsd startup.

    Signed-off-by: Andy Adamson
    Signed-off-by: J. Bruce Fields
    Signed-off-by: Neil Brown
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    NeilBrown
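
The rule described above (a reclaim is allowed if and only if the client has a subdirectory) can be modelled in user space as follows. This is a sketch under assumptions, not nfsd code: the function names are hypothetical, and the client is identified by a plain string used directly as the subdirectory name.

```c
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

/* Build "<dir>/<client>" into out; returns -1 if it does not fit. */
static int client_path(char *out, size_t len, const char *dir,
                       const char *client)
{
    int n = snprintf(out, len, "%s/%s", dir, client);

    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Record a client, e.g. on its first non-reclaim open_confirm. */
static int record_client(const char *dir, const char *client)
{
    char path[512];

    if (client_path(path, sizeof(path), dir, client) != 0)
        return -1;
    return mkdir(path, 0700);
}

/* Forget a client, e.g. when its lease expires. */
static int forget_client(const char *dir, const char *client)
{
    char path[512];

    if (client_path(path, sizeof(path), dir, client) != 0)
        return -1;
    return rmdir(path);
}

/* A reclaim is allowed only if the client's subdirectory exists. */
static int reclaim_allowed(const char *dir, const char *client)
{
    char path[512];
    struct stat st;

    if (client_path(path, sizeof(path), dir, client) != 0)
        return 0;
    return stat(path, &st) == 0 && S_ISDIR(st.st_mode);
}
```

Because the record is a directory on stable storage, it survives a server reboot, which is what lets the server decide after a restart which clients may reclaim.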
     
  • The cb_parsed field is only used by probe_callback, to determine whether the
    callback information has been filled in by setclientid. But there is no way
    that probe_callback() can be called without that having already happened, so
    that check is superfluous, as is cb_parsed.

    Signed-off-by: J. Bruce Fields
    Signed-off-by: Neil Brown
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    NeilBrown
     
  • Trivial renaming patch:

    I can never remember, while looking at various lists relating the nfsd4 state
    structures, which are the "heads" and which are items on other lists, or which
    structures are actually on the various lists. The following convention helps
    me: given structures foo and bar, with foo containing the head of a list of
    bars, use "bars" for the name of the head of the list contained in the struct
    foo, and use "per_foo" for the entries in the struct bars.

    Already done for struct nfs4_file; go ahead and do it for the other nfsd4
    state structures.

    Signed-off-by: J. Bruce Fields
    Signed-off-by: Neil Brown
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    NeilBrown
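
The naming convention can be illustrated with the generic foo and bar structures from the description. The list helpers below are a minimal user-space stand-in written for this sketch (the _model names are hypothetical); only the field names "bars" and "per_foo" come from the convention itself.

```c
#include <stddef.h>

/* Minimal doubly linked list, standing in for the kernel's list type. */
struct list_head {
    struct list_head *next, *prev;
};

static void list_init_model(struct list_head *h)
{
    h->next = h->prev = h;
}

static void list_add_tail_model(struct list_head *item,
                                struct list_head *head)
{
    item->prev = head->prev;
    item->next = head;
    head->prev->next = item;
    head->prev = item;
}

#define container_of_model(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* foo holds the head of a list of bars, so the head is named "bars". */
struct foo {
    struct list_head bars;
};

/* Each bar sits on one foo's list, so its entry is named "per_foo". */
struct bar {
    int id;
    struct list_head per_foo;
};

/* Count the bars hanging off a foo by walking foo->bars. */
static int count_bars(const struct foo *f)
{
    int n = 0;

    for (const struct list_head *p = f->bars.next; p != &f->bars;
         p = p->next)
        n++;
    return n;
}
```

With these names, a line such as list_add_tail_model(&b->per_foo, &f->bars) states on its face which side is the entry and which is the head.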
     
  • This patch contains the following possible cleanups:

    - make needlessly global code static

    Signed-off-by: Adrian Bunk
    Signed-off-by: J. Bruce Fields
    Signed-off-by: Neil Brown
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    NeilBrown