08 Jan, 2009

2 commits

  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (24 commits)
    trivial: chack -> check typo fix in main Makefile
    trivial: Add a space (and a comma) to a printk in 8250 driver
    trivial: Fix misspelling of "firmware" in docs for ncr53c8xx/sym53c8xx
    trivial: Fix misspelling of "firmware" in powerpc Makefile
    trivial: Fix misspelling of "firmware" in usb.c
    trivial: Fix misspelling of "firmware" in qla1280.c
    trivial: Fix misspelling of "firmware" in a100u2w.c
    trivial: Fix misspelling of "firmware" in megaraid.c
    trivial: Fix misspelling of "firmware" in ql4_mbx.c
    trivial: Fix misspelling of "firmware" in acpi_memhotplug.c
    trivial: Fix misspelling of "firmware" in ipw2100.c
    trivial: Fix misspelling of "firmware" in atmel.c
    trivial: Fix misspelled firmware in Kconfig
    trivial: fix an -> a typos in documentation and comments
    trivial: fix then -> than typos in comments and documentation
    trivial: update Jesper Juhl CREDITS entry with new email
    trivial: fix singal -> signal typo
    trivial: Fix incorrect use of "loose" in event.c
    trivial: printk: fix indentation of new_text_line declaration
    trivial: rtc-stk17ta8: fix sparse warning
    ...

    Linus Torvalds
     
  • The typedefs for __u64 and __s64 were fixed to be available for other
    compilers on May 2, 2008 by H. Peter Anvin (in commit edfa5cfa3dc5)

    Acked-by: H. Peter Anvin
    Signed-off-by: Detlef Riekenberg
    Signed-off-by: Linus Torvalds

    Detlef Riekenberg
     

07 Jan, 2009

38 commits

  • The __SWAB_64_THRU_32__ case of a 64-bit byte swap was depending on the
    no-longer-existent ___swab32() method (three underscores). We got rid
    of some of the worst indirection and complexity, and now it should just
    use the 32-bit swab function that was defined right above it.

    Reported-and-tested-by: Nicolas Pitre
    Reported-by: Benjamin Herrenschmidt
    Cc: Harvey Harrison
    Signed-off-by: Linus Torvalds

    Linus Torvalds
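
    The 64-bit swap built from 32-bit swaps, as described in the entry
    above, can be sketched roughly as follows. This is a user-space
    illustration, not the kernel's code: the names sketch_swab32() and
    sketch_swab64_thru_32() are hypothetical stand-ins for the real swab
    helpers.

```c
#include <assert.h>
#include <stdint.h>

/* Byte-swap a 32-bit value (hypothetical stand-in for the 32-bit swab helper). */
static inline uint32_t sketch_swab32(uint32_t x)
{
	return ((x & 0x000000ffu) << 24) |
	       ((x & 0x0000ff00u) <<  8) |
	       ((x & 0x00ff0000u) >>  8) |
	       ((x & 0xff000000u) >> 24);
}

/*
 * 64-bit swap expressed through the 32-bit one: swap the bytes of each
 * half, then exchange the two halves.
 */
static inline uint64_t sketch_swab64_thru_32(uint64_t x)
{
	uint32_t hi = (uint32_t)(x >> 32);
	uint32_t lo = (uint32_t)(x & 0xffffffffu);

	return ((uint64_t)sketch_swab32(lo) << 32) | (uint64_t)sketch_swab32(hi);
}
```

    Swapping twice should return the original value, which makes a cheap
    sanity check for any variant of this helper.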
     
  • This implementation caused problems in userspace, which can, and does,
    define _both_ __LITTLE_ENDIAN and __BIG_ENDIAN.

    Signed-off-by: Harvey Harrison
    Signed-off-by: Linus Torvalds

    Harvey Harrison
     
  • The first step to make swab.h a regular header that will
    include an asm/swab.h with arch overrides.

    Avoid the gratuitous differences introduced in the new
    linux/swab.h by naming the ___constant_swabXX bits and
    __fswabXX bits exactly as found in the old implementation
    in byteorder/swab[b].h

    Use this new swab.h in byteorder/[big|little]_endian.h and
    remove the two old swab headers.

    Although the inclusion of asm/byteorder.h looks strange in
    linux/swab.h, this will allow each arch to move the actual
    arch overrides for the swab bits in an asm file and then
    the includes can be cleaned up without requiring a flag day
    for all arches at once.

    Keep providing __fswabXX in case some userspace was using them
    directly, but the revised __swabXX should be used instead in
    any new code and will always do constant folding not dependent
    on the optimization level, which means the __constant versions
    can be phased out in-kernel.

    Arches that use the old-style arch macros will lose their
    optimized versions until they move to the new style, but at
    least they will still compile. Many arches have already moved
    and the patches to move the remaining arches are trivial.

    Signed-off-by: Harvey Harrison
    Signed-off-by: Linus Torvalds

    Harvey Harrison
     
  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input: (29 commits)
    Input: i8042 - add Dell Vostro 1510 to nomux list
    Input: gtco - use USB endpoint API
    Input: add support for Maple controller as a joystick
    Input: atkbd - broaden the Dell DMI signatures
    Input: HIL drivers - add MODULE_ALIAS()
    Input: map_to_7segment.h - convert to __inline__ for userspace
    Input: add support for enhanced rotary controller on pxa930 and pxa935
    Input: add support for trackball on pxa930 and pxa935
    Input: add da9034 touchscreen support
    Input: ads7846 - strict_strtoul takes unsigned long
    Input: make some variables and functions static
    Input: add tsc2007 based touchscreen driver
    Input: psmouse - add module parameters to control OLPC touchpad delays
    Input: i8042 - add Gigabyte M912 netbook to noloop exception table
    Input: atkbd - Samsung NC10 key repeat fix
    Input: atkbd - add keyboard quirk for HP Pavilion ZV6100 laptop
    Input: libps2 - handle 0xfc responses from devices
    Input: add support for Wacom W8001 penabled serial touchscreen
    Input: synaptics - report multi-taps only if supported by the device
    Input: add joystick driver for Walkera WK-0701 RC transmitter
    ...

    Linus Torvalds
     
  • …s/security-testing-2.6

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/security-testing-2.6:
    CRED: Fix regression in cap_capable() as shown up by sys_faccessat() [ver #3]
    Revert "CRED: Fix regression in cap_capable() as shown up by sys_faccessat() [ver #2]"
    SELinux: shrink sizeof av_inhert selinux_class_perm and context
    CRED: Fix regression in cap_capable() as shown up by sys_faccessat() [ver #2]
    keys: fix sparse warning by adding __user annotation to cast
    smack: Add support for unlabeled network hosts and networks
    selinux: Deprecate and schedule the removal of the the compat_net functionality
    netlabel: Update kernel configuration API

    Linus Torvalds
     
  • …el/git/tip/linux-2.6-tip

    * 'timers-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    hrtimer: splitout peek ahead functionality, fix
    hrtimer: fixup comments
    hrtimer: fix recursion deadlock by re-introducing the softirq
    hrtimer: simplify hotplug migration
    hrtimer: fix HOTPLUG_CPU=n compile warning
    hrtimer: splitout peek ahead functionality

    Linus Torvalds
     
  • …l/git/tip/linux-2.6-tip

    * 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    sched: fix section mismatch
    sched: fix double kfree in failure path
    sched: clean up arch_reinit_sched_domains()
    sched: mark sched_create_sysfs_power_savings_entries() as __init
    getrusage: RUSAGE_THREAD should return ru_utime and ru_stime
    sched: fix sched_slice()
    sched_clock: prevent scd->clock from moving backwards, take #2
    sched: sched.c declare variables before they get used

    Linus Torvalds
     
  • …git/tip/linux-2.6-tip

    * 'irq-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    genirq: provide irq_to_desc() to non-genirq architectures too

    Linus Torvalds
     
  • …/git/tip/linux-2.6-tip

    * 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    rcu: fix rcutorture bug
    rcu: eliminate synchronize_rcu_xxx macro
    rcu: make treercu safe for suspend and resume
    rcu: fix rcutree grace-period-latency bug on small systems
    futex: catch certain assymetric (get|put)_futex_key calls
    futex: make futex_(get|put)_key() calls symmetric
    locking, percpu counters: introduce separate lock classes
    swiotlb: clean up EXPORT_SYMBOL usage
    swiotlb: remove unnecessary declaration
    swiotlb: replace architecture-specific swiotlb.h with linux/swiotlb.h
    swiotlb: add support for systems with highmem
    swiotlb: store phys address in io_tlb_orig_addr array
    swiotlb: add hwdev to swiotlb_phys_to_bus() / swiotlb_sg_to_bus()

    Linus Torvalds
     
  • * git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core-2.6: (60 commits)
    uio: make uio_info's name and version const
    UIO: Documentation for UIO ioport info handling
    UIO: Pass information about ioports to userspace (V2)
    UIO: uio_pdrv_genirq: allow custom irq_flags
    UIO: use pci_ioremap_bar() in drivers/uio
    arm: struct device - replace bus_id with dev_name(), dev_set_name()
    libata: struct device - replace bus_id with dev_name(), dev_set_name()
    avr: struct device - replace bus_id with dev_name(), dev_set_name()
    block: struct device - replace bus_id with dev_name(), dev_set_name()
    chris: struct device - replace bus_id with dev_name(), dev_set_name()
    dmi: struct device - replace bus_id with dev_name(), dev_set_name()
    gadget: struct device - replace bus_id with dev_name(), dev_set_name()
    gpio: struct device - replace bus_id with dev_name(), dev_set_name()
    gpu: struct device - replace bus_id with dev_name(), dev_set_name()
    hwmon: struct device - replace bus_id with dev_name(), dev_set_name()
    i2o: struct device - replace bus_id with dev_name(), dev_set_name()
    IA64: struct device - replace bus_id with dev_name(), dev_set_name()
    i7300_idle: struct device - replace bus_id with dev_name(), dev_set_name()
    infiniband: struct device - replace bus_id with dev_name(), dev_set_name()
    ISDN: struct device - replace bus_id with dev_name(), dev_set_name()
    ...

    Linus Torvalds
     
  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse:
    fuse: clean up annotations of fc->lock
    fuse: fix sparse warning in ioctl
    fuse: update interface version
    fuse: add fuse_conn->release()
    fuse: separate out fuse_conn_init() from new_conn()
    fuse: add fuse_ prefix to several functions
    fuse: implement poll support
    fuse: implement unsolicited notification
    fuse: add file kernel handle
    fuse: implement ioctl support
    fuse: don't let fuse_req->end() put the base reference
    fuse: move FUSE_MINOR to miscdevice.h
    fuse: style fixes

    Linus Torvalds
     
  • * git://git.kernel.org/pub/scm/linux/kernel/git/bart/ide-2.6: (41 commits)
    scc_pata: make use of scc_dma_sff_read_status()
    ide-dma-sff: factor out ide_dma_sff_write_status()
    ide: move read_sff_dma_status() method to 'struct ide_dma_ops'
    ide: don't set hwif->dma_ops in init_dma() method
    Resurrect IT8172 IDE controller driver
    piix: sync ich_laptop[] with ata_piix.c
    ide: update warm-plug HOWTO
    ide: fix ide_port_scan() to do ACPI setup after initializing request queues
    ide: remove now redundant ->cur_dev checks
    ide: remove unused ide_hwif_t.sg_mapped field
    ide: struct ide_atapi_pc - remove unused fields and update documentation
    ide: remove superfluous hwif variable assignment from ide_timer_expiry()
    ide: use ide_pci_is_in_compatibility_mode() helper in setup-pci.c
    ide: make "paranoia" ->handler check in ide_intr() more strict
    ide-cd: convert to ide-atapi facilities
    ide-cd: start DMA before sending the actual packet command
    ide-cd: wait for DRQ to get set per default
    ide: Fix drive's DWORD-IO handling
    ide: add port and host iterators
    ide: dynamic allocation of device structures
    ...

    Linus Torvalds
     
  • No one cares about do_coredump()'s return value, and it does not seem
    to be necessary either. So make it void.

    [akpm@linux-foundation.org: coding-style fixes]
    Signed-off-by: WANG Cong
    Cc: Alexander Viro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    WANG Cong
     
  • Remove excess kernel-doc notation from rio header and driver:

    Warning(include/linux/rio_drv.h:399): Excess function parameter or struct member 'buffer' description in 'rio_get_inb_message'

    Signed-off-by: Randy Dunlap
    Cc: Matt Porter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Randy Dunlap
     
  • Provide a static debounce configuration mechanism for twl4030 GPIOs,
    replacing the previous dynamic one. The single user of that mechanism was
    for MMC card detect debouncing.

    Boards can provide a bitmask saying which GPIOs to debounce (30 msec).
    It's always enabled for pins with the MMC card-detect/VMMCx link active,
    so most boards won't need to set the debounce mask.

    This is a net code shrink, including runtime footprint.

    Signed-off-by: David Brownell
    Signed-off-by: Tony Lindgren
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Brownell
     
  • - the type assigned at mount when no type is given is changed
    from 0 to AUTOFS_TYPE_INDIRECT. This was done because 0 and
    AUTOFS_TYPE_INDIRECT were being treated implicitly as the same
    type.

    - previously, an offset mount had its type set to
    AUTOFS_TYPE_DIRECT|AUTOFS_TYPE_OFFSET but the mount control
    re-implementation needs to be able to distinguish all three types.
    So this was changed to make the type setting explicit.

    - a type AUTOFS_TYPE_ANY was added for use by the re-implementation
    when checking if a given path is a mountpoint. It's not really a
    type as we use this to ask if a given path is a mountpoint in the
    autofs_dev_ioctl_ismountpoint() function.

    - functions to set and test the autofs mount types have been added to
    improve readability and make the type usage explicit.

    - the mount type is used from user space for the mount control
    re-implementation so, for consistency, all the definitions have
    been moved to the user space include file include/linux/auto_fs4.h.

    Signed-off-by: Ian Kent
    Signed-off-by: Jeff Moyer
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ian Kent
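
    The explicit set/test helpers described in the entry above might look
    something like the sketch below. The flag values and every sketch_*
    name are assumptions made for illustration; the authoritative
    definitions are the ones in include/linux/auto_fs4.h.

```c
#include <assert.h>

/* Assumed flag values, for illustration only. */
#define SKETCH_AUTOFS_TYPE_ANY      0u
#define SKETCH_AUTOFS_TYPE_INDIRECT 1u
#define SKETCH_AUTOFS_TYPE_DIRECT   2u
#define SKETCH_AUTOFS_TYPE_OFFSET   4u

/* Setters assign exactly one type, so an offset mount is no longer DIRECT|OFFSET. */
static inline void sketch_set_type_indirect(unsigned int *type)
{
	*type = SKETCH_AUTOFS_TYPE_INDIRECT;
}

static inline void sketch_set_type_direct(unsigned int *type)
{
	*type = SKETCH_AUTOFS_TYPE_DIRECT;
}

static inline void sketch_set_type_offset(unsigned int *type)
{
	*type = SKETCH_AUTOFS_TYPE_OFFSET;
}

/* Testers compare against a single explicit type. */
static inline int sketch_type_indirect(unsigned int type)
{
	return type == SKETCH_AUTOFS_TYPE_INDIRECT;
}

static inline int sketch_type_direct(unsigned int type)
{
	return type == SKETCH_AUTOFS_TYPE_DIRECT;
}

static inline int sketch_type_offset(unsigned int type)
{
	return type == SKETCH_AUTOFS_TYPE_OFFSET;
}

/* Code that treats direct and offset mounts alike can ask for either. */
static inline int sketch_type_trigger(unsigned int type)
{
	return sketch_type_direct(type) || sketch_type_offset(type);
}
```

    With one type per mount, all three kinds can be told apart, while
    callers that do not care about the difference between direct and
    offset mounts can use the combined test.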
     
  • The parameter usage in the device node ioctl code uses arg1 and arg2 as
    parameter names. This patch redefines the parameter names to reflect what
    they actually are in an effort to make the code more readable.

    Signed-off-by: Ian Kent
    Signed-off-by: Jeff Moyer
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ian Kent
     
  • Allow kprobes to probe __exit routines. This adds a flags member to
    struct kprobe. When a module is freed (kprobes hooks module_notifier to
    get this event), the kprobes that probe functions in that module have
    the "Gone" flag set in their flags member. These "Gone" probes are never
    enabled. Users can check the GONE flag through debugfs.

    This also removes mod_refcounted, because we couldn't free a module if
    kprobe incremented the refcount of that module.

    [akpm@linux-foundation.org: document some locking]
    [mhiramat@redhat.com: bugfix: pass aggr_kprobe to arch_remove_kprobe]
    [mhiramat@redhat.com: bugfix: release old_p's insn_slot before error return]
    Signed-off-by: Masami Hiramatsu
    Acked-by: Ananth N Mavinakayanahalli
    Cc: Anil S Keshavamurthy
    Signed-off-by: Masami Hiramatsu
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Masami Hiramatsu
     
  • Add kprobe_insn_mutex for protecting kprobe_insn_pages hlist, and remove
    kprobe_mutex from architecture dependent code.

    This allows us to call arch_remove_kprobe() (and free_insn_slot) while
    holding kprobe_mutex.

    Signed-off-by: Masami Hiramatsu
    Acked-by: Ananth N Mavinakayanahalli
    Cc: Anil S Keshavamurthy
    Cc: Russell King
    Cc: "Luck, Tony"
    Cc: Paul Mackerras
    Cc: Benjamin Herrenschmidt
    Cc: Martin Schwidefsky
    Cc: Heiko Carstens
    Cc: Ingo Molnar
    Cc: Thomas Gleixner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Masami Hiramatsu
     
  • This series of patches allows kprobes to probe a module's __init and
    __exit functions. This means you can probe driver initialization and
    termination.

    Currently, kprobes can't probe __init functions because these functions
    are freed after module initialization. It also can't probe module __exit
    functions because kprobes increments the reference count of the target
    module, so the user can't unload it. This means __exit functions are
    never called unless the probes are removed from the module.

    To solve both cases, this series of patches introduces a GONE flag and
    sets it when the target code is freed (for this purpose, kprobes hooks
    MODULE_STATE_* events). It also removes the refcount increment so that
    the user can unload the target module. Users can check which probes are
    GONE through the debugfs interface. To catch the moment when a module's
    .init text is freed, the series also includes a patch that adds a module
    notifier call for the MODULE_STATE_LIVE event.

    This patch:

    Add within_module_core() and within_module_init() for checking whether an
    address is in the module .init.text section or .text section, and replace
    within() local inline functions in kernel/module.c with them.

    kprobes uses these functions to check where the kprobe is inserted.

    Signed-off-by: Masami Hiramatsu
    Cc: Ananth N Mavinakayanahalli
    Cc: Anil S Keshavamurthy
    Acked-by: Rusty Russell
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Masami Hiramatsu
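
    The address checks described in the entry above amount to simple range
    tests. The sketch below is a user-space approximation: struct
    sketch_module and its field names are hypothetical and do not mirror
    the real struct module layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified view of a module's two text regions. */
struct sketch_module {
	uintptr_t core_base;	/* start of the region that stays loaded */
	size_t    core_size;
	uintptr_t init_base;	/* start of the region freed after init */
	size_t    init_size;
};

/* True if addr lies in [base, base + size); written to avoid overflow. */
static inline int sketch_within(uintptr_t addr, uintptr_t base, size_t size)
{
	return addr >= base && addr - base < size;
}

static inline int sketch_within_module_core(uintptr_t addr,
					    const struct sketch_module *mod)
{
	return sketch_within(addr, mod->core_base, mod->core_size);
}

static inline int sketch_within_module_init(uintptr_t addr,
					    const struct sketch_module *mod)
{
	return sketch_within(addr, mod->init_base, mod->init_size);
}
```

    A probe whose address passes the init test is the kind that would need
    to be marked GONE once the init region is freed.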
     
  • Generalize the old at91rm9200 "bootstrap" bitbanging SPI master driver as
    "spi_gpio", so it works with arbitrary GPIOs and can be configured through
    platform_data. Such SPI masters support:

    - any number of bus instances (bus_num is the platform_device.id)
    - any number of chipselects (one GPIO per spi_device)
    - all four SPI_MODE values, and SPI_CS_HIGH
    - i/o word sizes from 1 to 32 bits
    - devices configured as with any other spi_master controller

    When configured using platform_data, this provides relatively low clock
    rates. On platforms that support inlined GPIO calls, significantly
    improved transfer speeds are also possible with a semi-custom driver.
    (It's still painful when accessing flash memory, but less so.)

    Sanity checked by using this version to replace both native controllers on
    a board with six different SPI slaves, relying on three different
    SPI_MODE_* values and both SPI_CS_HIGH settings for correct operation.

    [akpm@linux-foundation.org: cleanups]
    Signed-off-by: David Brownell
    Acked-by: Magnus Damm
    Tested-by: Magnus Damm
    Cc: Torgil Svensson
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Brownell
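
    For readers unfamiliar with bitbanged SPI, the core of such a master
    is a loop that shifts one bit per clock pulse. The sketch below shows
    only SPI mode 0 with word sizes from 1 to 32 bits; it is not the
    spi_gpio driver's code. The struct sketch_spi_pins callbacks and the
    loopback wiring are hypothetical and exist only to make the example
    self-contained.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical pin accessors; a real board would drive GPIO lines here. */
struct sketch_spi_pins {
	void (*set_sck)(int level);
	void (*set_mosi)(int level);
	int  (*get_miso)(void);
};

/*
 * Shift one word, MSB first, in SPI mode 0: the clock idles low, data is
 * set up while the clock is low and sampled after the rising edge.
 */
static uint32_t sketch_txrx_mode0(const struct sketch_spi_pins *pins,
				  uint32_t word, unsigned int bits)
{
	uint32_t in = 0;
	unsigned int i;

	assert(bits >= 1 && bits <= 32);
	for (i = bits; i-- > 0; ) {
		pins->set_mosi((int)((word >> i) & 1u));
		pins->set_sck(1);
		in = (in << 1) | (uint32_t)(pins->get_miso() & 1);
		pins->set_sck(0);
	}
	return in;
}

/* Loopback wiring for demonstration: MISO simply reflects MOSI. */
static int sketch_wire;

static void sketch_loop_set_sck(int level)  { (void)level; }
static void sketch_loop_set_mosi(int level) { sketch_wire = level; }
static int  sketch_loop_get_miso(void)      { return sketch_wire; }

static const struct sketch_spi_pins sketch_loopback = {
	sketch_loop_set_sck, sketch_loop_set_mosi, sketch_loop_get_miso
};
```

    Every bit costs several pin accesses, which is one way to see why the
    entry above describes the resulting clock rates as relatively low.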
     
  • linux_binfmt uses list_head, so list.h is needed.

    [akpm@linux-foundation.org: fix `make headerscheck']
    Signed-off-by: Hiroshi Shimamoto
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hiroshi Shimamoto
     
  • Signed-off-by: Jesper Juhl
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jesper Juhl
     
  • For NR_CPUS >= 16 values, FBC_BATCH is 2*NR_CPUS

    Considering more and more distros are using high NR_CPUS values, it makes
    sense to use a more sensible value for FBC_BATCH, and get rid of NR_CPUS.

    A sensible value is 2*num_online_cpus(), with a minimum value of 32 (This
    minimum value helps branch prediction in __percpu_counter_add())

    We already have a hotcpu notifier, so we can adjust FBC_BATCH dynamically.

    We rename FBC_BATCH to percpu_counter_batch since it's not a constant
    anymore.

    Signed-off-by: Eric Dumazet
    Acked-by: David S. Miller
    Acked-by: Peter Zijlstra
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Eric Dumazet
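
    The batch computation described in the entry above (twice the number
    of online CPUs, with a floor of 32) can be sketched as below. The
    function takes the CPU count as a parameter so that it runs in user
    space; the sketch_* names are hypothetical, and in the kernel the
    value would be recomputed from the hotcpu notifier.

```c
#include <assert.h>

/* Hypothetical stand-in for the runtime-adjustable batch value. */
static int sketch_percpu_counter_batch = 32;

/* Recompute the batch as max(32, 2 * online_cpus) and store it. */
static int sketch_compute_batch(int online_cpus)
{
	int batch;

	assert(online_cpus > 0);
	batch = 2 * online_cpus;
	if (batch < 32)
		batch = 32;
	sketch_percpu_counter_batch = batch;
	return batch;
}
```

    With this rule, machines with up to 16 online CPUs all share the
    minimum batch of 32, and larger machines scale linearly.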
     
  • f_op->poll is the only vfs operation which is not allowed to sleep. It's
    because poll and select implementation used task state to synchronize
    against wake ups, which doesn't have to be the case anymore as wait/wake
    interface can now use custom wake up functions. The non-sleep restriction
    can be a bit tricky because ->poll is not called from an atomic context
    and the result of accidentally sleeping in ->poll only shows up as
    temporary busy looping when the timing is right or rather wrong.

    This patch converts poll/select to use custom wake up function and use
    separate triggered variable to synchronize against wake up events. The
    only added overhead is an extra function call during wake up, which is
    negligible.

    This patch removes the one non-sleep exception from vfs locking rules and
    is beneficial to userland filesystem implementations like FUSE, 9p or
    peculiar fs like spufs as it's very difficult for those to implement
    non-sleeping poll method.

    While at it, make the following cosmetic changes to make poll.h and
    select.c checkpatch friendly.

    * s/type * symbol/type *symbol/ : three places in poll.h
    * remove blank line before EXPORT_SYMBOL() : two places in select.c

    Oleg: spotted missing barrier in poll_schedule_timeout()
    Davide: spotted missing write barrier in pollwake()

    Signed-off-by: Tejun Heo
    Cc: Eric Van Hensbergen
    Cc: Ron Minnich
    Cc: Ingo Molnar
    Cc: Christoph Hellwig
    Signed-off-by: Miklos Szeredi
    Cc: Davide Libenzi
    Cc: Brad Boyer
    Cc: Al Viro
    Cc: Roland McGrath
    Cc: Mauro Carvalho Chehab
    Signed-off-by: Andrew Morton
    Cc: Davide Libenzi
    Cc: Oleg Nesterov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Tejun Heo
     
  • Create a helper macro to divide two numbers and round the result to the
    nearest whole number. This is a helper macro for hwmon drivers that want
    to convert incoming sysfs values per standard hwmon practice, though the
    macro itself can be used by anyone.

    Signed-off-by: Darrick J. Wong
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Darrick J. Wong
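
    Division with rounding to the nearest whole number, as described in
    the entry above, is commonly done by adding half the divisor before
    dividing. The sketch below uses an inline function for unsigned
    operands rather than a macro; sketch_div_round_closest() is a
    hypothetical name, and it assumes x + divisor / 2 does not overflow.

```c
#include <assert.h>

/*
 * Divide x by divisor and round to the nearest whole number; exact
 * halves round up. Assumes x + divisor / 2 fits in unsigned long.
 */
static inline unsigned long sketch_div_round_closest(unsigned long x,
						     unsigned long divisor)
{
	assert(divisor != 0);
	return (x + divisor / 2) / divisor;
}
```

    For example, converting 1499 millidegrees to whole degrees with a
    divisor of 1000 gives 1, while 1500 gives 2.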
     
  • Signed-off-by: Alexey Dobriyan
    Cc: Gabor Gombas
    Cc: Jan Beulich
    Cc: Andi Kleen
    Cc: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
     
  • The atomic_t type cannot currently be used in some header files because it
    would create an include loop with asm/atomic.h. Move the type definition
    to linux/types.h to break the loop.

    Signed-off-by: Matthew Wilcox
    Cc: Huang Ying
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Matthew Wilcox
     
  • xacct_add_tsk() relies on do_exit()->update_hiwater_xxx() and uses
    mm->hiwater_xxx directly. This leads to two problems:

    - taskstats_user_cmd() can call fill_pid()->xacct_add_tsk() at any
    moment before the task exits, so we should check the current values of
    rss/vm anyway.

    - do_exit()->update_hiwater_xxx() calls are racy. An exiting thread can
    be preempted right before mm->hiwater_xxx = new_val, and another thread
    can use A_LOT of memory and exit in between. When the first thread
    resumes it can be the last thread in the thread group, in that case we
    report the wrong hiwater_xxx values which do not take A_LOT into
    account.

    Introduce get_mm_hiwater_rss() and get_mm_hiwater_vm() helpers and change
    xacct_add_tsk() to use them. The first helper will also be used by
    rusage->ru_maxrss accounting.

    Kill do_exit()->update_hiwater_xxx() calls. Unless we are going to
    decrease rss/vm there is no point in updating mm->hiwater_xxx, and nobody
    can look at this mm_struct when exit_mmap() actually unmaps the memory.

    Signed-off-by: Oleg Nesterov
    Acked-by: Hugh Dickins
    Reviewed-by: KOSAKI Motohiro
    Acked-by: Balbir Singh
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
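
    The idea behind the helpers in the entry above is that a reader should
    take the larger of the recorded high-water mark and the current value,
    instead of trusting a mark that may not have been updated yet. A
    user-space sketch, with a hypothetical struct sketch_mm standing in
    for mm_struct:

```c
#include <assert.h>

/* Hypothetical, simplified subset of the memory accounting fields. */
struct sketch_mm {
	unsigned long hiwater_rss;	/* recorded peak resident set size */
	unsigned long rss;		/* current resident set size */
	unsigned long hiwater_vm;	/* recorded peak virtual size */
	unsigned long total_vm;		/* current virtual size */
};

static inline unsigned long sketch_max(unsigned long a, unsigned long b)
{
	return a > b ? a : b;
}

/* Peak RSS as seen now, even if the recorded mark is stale. */
static inline unsigned long sketch_get_mm_hiwater_rss(const struct sketch_mm *mm)
{
	return sketch_max(mm->hiwater_rss, mm->rss);
}

/* Peak virtual size as seen now, even if the recorded mark is stale. */
static inline unsigned long sketch_get_mm_hiwater_vm(const struct sketch_mm *mm)
{
	return sketch_max(mm->hiwater_vm, mm->total_vm);
}
```

    Because the current value is consulted on every read, the reported
    peak stays correct whether or not the stored mark was refreshed at
    exit.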
     
  • s_syncing livelock avoidance was breaking the data integrity guarantee of
    sys_sync, by allowing sys_sync to skip writing or waiting for superblocks
    if there is a concurrent sys_sync happening.

    This livelock avoidance is much less important now that we don't have the
    get_super_to_sync() call after every sb that we sync. This was replaced
    by __put_super_and_need_restart.

    Signed-off-by: Nick Piggin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nick Piggin
     
  • Remove WB_SYNC_HOLD. The primary motivation is the design of my
    anti-starvation code for fsync. It requires taking an inode lock over the
    sync operation, so we could run into lock ordering problems with multiple
    inodes. It is possible to take a single global lock to solve the ordering
    problem, but then that would prevent a future nice implementation of "sync
    multiple inodes" based on lock order via inode address.

    Seems like a backward step to remove this, but actually it is busted
    anyway: we can't use the inode lists for data integrity wait: an inode can
    be taken off the dirty lists but still be under writeback. In order to
    satisfy data integrity semantics, we should wait for it to finish
    writeback, but if we only search the dirty lists, we'll miss it.

    It would be possible to have a "writeback" list, for sys_sync, I suppose.
    But why complicate things by prematurely optimising? For unmounting, we
    could avoid the "livelock avoidance" code, which would be easier, but
    again premature IMO.

    Fixing the existing data integrity problem will come next.

    Signed-off-by: Nick Piggin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nick Piggin
     
  • Remove page_remove_rmap()'s vma arg, which was only for the Eeek message.
    And remove the BUG_ON(page_mapcount(page) == 0) from CONFIG_DEBUG_VM's
    page_dup_rmap(): we're trying to be more resilient about that than BUGs.

    Signed-off-by: Hugh Dickins
    Cc: Nick Piggin
    Cc: Christoph Lameter
    Cc: Mel Gorman
    Cc: Rik van Riel
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     
  • Complete zap_pte_range()'s coverage of bad pagetable entries by calling
    print_bad_pte() on a pte_file in a linear vma and on a bad swap entry.
    That needs free_swap_and_cache() to tell it, which will also have shown
    one of those "swap_free" errors (but with much less information).

    Similar checks in fork's copy_one_pte()? No, that would be more noisy
    than helpful: we'll see them when parent and child exec or exit.

    Where do_nonlinear_fault() calls print_bad_pte(): omit !VM_CAN_NONLINEAR
    case, that could only be a bug in sys_remap_file_pages(), not a bad pte.
    VM_FAULT_OOM rather than VM_FAULT_SIGBUS? Well, okay, that is consistent
    with what happens if do_swap_page() operates a bad swap entry; but don't
    we have patches to be more careful about killing when VM_FAULT_OOM?

    Signed-off-by: Hugh Dickins
    Cc: Nick Piggin
    Cc: Christoph Lameter
    Cc: Mel Gorman
    Cc: Rik van Riel
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     
  • Simplify the PAGE_FLAGS checking and clearing when freeing and allocating
    a page: check the same flags as before when freeing, clear ALL the flags
    (unless PageReserved) when freeing, check ALL flags off when allocating.

    Signed-off-by: Hugh Dickins
    Cc: Nick Piggin
    Cc: Christoph Lameter
    Cc: Mel Gorman
    Cc: Rik van Riel
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     
  • Swap allocation has always started from the beginning of the swap area;
    but if we're dealing with a solidstate swap device which can only remap
    blocks within limited zones, that would sooner wear out the first zone.

    Therefore sys_swapon() tests whether blk_queue is non-rotational, and if
    so randomizes the cluster_next starting position for allocation.

    If blk_queue is nonrot, note SWP_SOLIDSTATE for later use, and report it
    with an "SS" at the right end of the kernel's "Adding ... swap" message
    (so that if it's both nonrot and discardable, "SSD" will be shown there).
    Perhaps something should be shown in /proc/swaps (swapon -s), but we have
    to be more cautious before making any addition to that format.

    Signed-off-by: Hugh Dickins
    Cc: KAMEZAWA Hiroyuki
    Cc: Nick Piggin
    Cc: David Woodhouse
    Cc: Jens Axboe
    Cc: Matthew Wilcox
    Cc: Joern Engel
    Cc: James Bottomley
    Cc: Donjun Shin
    Cc: Tejun Heo
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
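
    One simple way to randomize the starting position described in the
    entry above is to reduce a random number modulo the size of the
    usable area. The sketch below is an assumption about the general
    shape of such a computation, not the kernel's code: the function name
    is hypothetical, the random number is passed in by the caller, and
    slot 0 is assumed to be reserved for the swap header.

```c
#include <assert.h>

/*
 * Map a random number onto a starting slot in [1, highest_slot].
 * Slot 0 is assumed to hold the swap header and is never returned.
 */
static inline unsigned long sketch_random_cluster_start(unsigned long rnd,
							 unsigned long highest_slot)
{
	assert(highest_slot >= 1);
	return 1 + rnd % highest_slot;
}
```

    Starting each swapon at a different offset spreads early allocations
    across the device instead of always wearing the first zone.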
     
  • When scan_swap_map() finds a free cluster of swap pages to allocate,
    discard the old contents of the cluster if the device supports discard.
    But don't bother when swap is so fragmented that we allocate single pages.

    Be careful about racing allocations made while we're scanning for a
    cluster; and hold up allocations made while we're discarding.

    Signed-off-by: Hugh Dickins
    Cc: KAMEZAWA Hiroyuki
    Cc: Nick Piggin
    Cc: David Woodhouse
    Cc: Jens Axboe
    Cc: Matthew Wilcox
    Cc: Joern Engel
    Cc: James Bottomley
    Cc: Donjun Shin
    Cc: Tejun Heo
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     
  • When adding swap, all the old data on swap can be forgotten: sys_swapon()
    discards all but the header page of the swap partition (or every extent but
    the header of the swap file), to give a solidstate swap device the
    opportunity to optimize its wear-levelling.

    If that succeeds, note SWP_DISCARDABLE for later use, and report it with a
    "D" at the right end of the kernel's "Adding ... swap" message. Perhaps
    something should be shown in /proc/swaps (swapon -s), but we have to be
    more cautious before making any addition to that format.

    Signed-off-by: Hugh Dickins
    Cc: KAMEZAWA Hiroyuki
    Cc: Nick Piggin
    Cc: David Woodhouse
    Cc: Jens Axboe
    Cc: Matthew Wilcox
    Cc: Joern Engel
    Cc: James Bottomley
    Cc: Donjun Shin
    Cc: Tejun Heo
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     
  • Before making functional changes, rearrange scan_swap_map() to simplify
    subsequent diffs. Actually, there is one functional change in there:
    leave cluster_nr negative while scanning for a new cluster - resetting it
    early increased the likelihood that when we have difficulty finding a free
    cluster, another task may come in and try doing exactly the same - just a
    waste of cpu.

    Before making functional changes, rearrange struct swap_info_struct
    slightly: flags will be needed as an unsigned long (for wait_on_bit), next
    is a good int to pair with prio, old_block_size is uninteresting so shift
    it to the end.

    Signed-off-by: Hugh Dickins
    Cc: KAMEZAWA Hiroyuki
    Cc: Nick Piggin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins