07 Jan, 2009

9 commits

  • When cpusets are enabled, it's necessary to print the triggering task's
    set of allowable nodes so the subsequently printed meminfo can be
    interpreted correctly.

    We also print the task's cpuset name for informational purposes.

    [rientjes@google.com: task lock current before dereferencing cpuset]
    Cc: Paul Menage
    Cc: Li Zefan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Rientjes
     
  • zone_scan_mutex is actually a spinlock, so name it appropriately.

    Signed-off-by: David Rientjes
    Reviewed-by: KOSAKI Motohiro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Rientjes
     
  • Rather than have the pagefault handler kill a process directly if it gets
    a VM_FAULT_OOM, have it call into the OOM killer.

    With increasingly sophisticated oom behaviour (cpusets, memory cgroups,
    oom killing throttling, oom priority adjustment or selective disabling,
    panic on oom, etc), it's silly to unconditionally kill the faulting
    process at page fault time. Create a hook for pagefault oom path to call
    into instead.

    Only converted x86 and uml so far.

    [akpm@linux-foundation.org: make __out_of_memory() static]
    [akpm@linux-foundation.org: fix comment]
    Signed-off-by: Nick Piggin
    Cc: Jeff Dike
    Acked-by: Ingo Molnar
    Cc: Thomas Gleixner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nick Piggin
     
  • pp->page is never used when not set to the right page, so there is no need
    to set it to ZERO_PAGE(0) by default.

    Signed-off-by: Brice Goglin
    Acked-by: Christoph Lameter
    Cc: Nick Piggin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Brice Goglin
     
  • Rework do_pages_move() to work by page-sized chunks of struct page_to_node
    that are passed to do_move_page_to_node_array(). We now only have to
    allocate a single page instead of a possibly very large vmalloc area to
    store all page_to_node entries.

    As a result, new_page_node() will now have a very small lookup, hiding
    much of the overall sys_move_pages() overhead.

    Signed-off-by: Brice Goglin
    Signed-off-by: Nathalie Furmento
    Acked-by: Christoph Lameter
    Cc: Nick Piggin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Brice Goglin
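
The chunking idea in this commit can be sketched in user space. This is a toy model under stated assumptions, not the kernel code: `PAGE_SIZE`, `ENTRY_SIZE`, `move_chunk` and `move_pages_chunked` are hypothetical names, and the 24-byte entry size is an assumed value chosen only for illustration.

```python
from typing import Callable, List, Sequence, Tuple

PAGE_SIZE = 4096   # assumed page size in bytes
ENTRY_SIZE = 24    # assumed size of one page_to_node-like entry (hypothetical)

# Number of (address, node) entries that fit in one page-sized buffer.
ENTRIES_PER_CHUNK = PAGE_SIZE // ENTRY_SIZE


def move_pages_chunked(
    requests: Sequence[Tuple[int, int]],
    move_chunk: Callable[[List[Tuple[int, int]]], List[int]],
) -> List[int]:
    """Process (address, node) requests one page-sized chunk at a time.

    Only one chunk-sized buffer is alive at any moment, instead of a
    buffer proportional to len(requests).
    """
    status: List[int] = []
    for start in range(0, len(requests), ENTRIES_PER_CHUNK):
        # Copy just this slice; it stands in for the single allocated page.
        chunk = list(requests[start:start + ENTRIES_PER_CHUNK])
        status.extend(move_chunk(chunk))
    return status
```

Because each call to `move_chunk` sees at most `ENTRIES_PER_CHUNK` entries, any per-page lookup inside it scans a small, bounded array, which is the effect the commit message describes for new_page_node().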
     
  • Following "mm: don't mark_page_accessed in fault path", which now
    places a mark_page_accessed() in zap_pte_range(), we should remove
    the mark_page_accessed() from shmem_fault().

    Signed-off-by: Hugh Dickins
    Cc: Nick Piggin
    Cc: Johannes Weiner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     
  • Doing a mark_page_accessed at fault-time, then doing SetPageReferenced at
    unmap-time if the pte is young has a number of problems.

    mark_page_accessed is supposed to be roughly the equivalent of a young pte
    for unmapped references. Unfortunately it doesn't come with any context:
    after being called, reclaim doesn't know who or why the page was touched.

    So calling mark_page_accessed not only adds extra lru or PG_referenced
    manipulations for pages that are already going to have pte_young ptes anyway,
    but it also adds these references which are difficult to work with from the
    context of vma specific references (eg. MADV_SEQUENTIAL pte_young may not
    wish to contribute to the page being referenced).

    Then, simply doing SetPageReferenced when zapping a pte and finding it is
    young, is not a really good solution either. SetPageReferenced does not
    correctly promote the page to the active list for example. So after removing
    mark_page_accessed from the fault path, several mmap()+touch+munmap() would
    have a very different result from several read(2) calls for example, which
    is not really desirable.

    Signed-off-by: Nick Piggin
    Acked-by: Johannes Weiner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nick Piggin
     
  • The KernelPageSize entry in /proc/pid/smaps is the pagesize used by the
    kernel to back a VMA. This matches the size used by the MMU in the
    majority of cases. However, one counter-example occurs on PPC64 kernels
    whereby a kernel using 64K as a base pagesize may still use 4K pages for
    the MMU on older processors. To distinguish, this patch reports
    MMUPageSize as the pagesize used by the MMU in /proc/pid/smaps.

    Signed-off-by: Mel Gorman
    Cc: "KOSAKI Motohiro"
    Cc: Alexey Dobriyan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mel Gorman
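
A user-space check of the two fields might look like the sketch below. It assumes the smaps layout of one header line per VMA followed by `Name: value kB` lines; `page_sizes` and `mismatched` are hypothetical helper names, not part of any existing tool.

```python
import re
from typing import Dict, List

# A VMA header line starts with an address range such as "00400000-0048a000".
_VMA_HEADER = re.compile(r"^([0-9a-f]+)-([0-9a-f]+)\s")
# Field lines such as "KernelPageSize:        4 kB".
_SIZE_FIELD = re.compile(r"^(KernelPageSize|MMUPageSize):\s+(\d+)\s+kB$")


def page_sizes(smaps_text: str) -> List[Dict[str, object]]:
    """Return one dict per VMA with its address range and page-size fields.

    Lines that are not recognised are skipped rather than treated as errors.
    """
    vmas: List[Dict[str, object]] = []
    for line in smaps_text.splitlines():
        header = _VMA_HEADER.match(line)
        if header:
            vmas.append({"range": f"{header.group(1)}-{header.group(2)}"})
            continue
        field = _SIZE_FIELD.match(line.strip())
        if field and vmas:
            vmas[-1][field.group(1)] = int(field.group(2))
    return vmas


def mismatched(vmas: List[Dict[str, object]]) -> List[str]:
    """Ranges where both sizes are reported and they differ."""
    return [
        str(v["range"])
        for v in vmas
        if "KernelPageSize" in v
        and "MMUPageSize" in v
        and v["KernelPageSize"] != v["MMUPageSize"]
    ]
```

On a kernel that exports both fields, `mismatched(page_sizes(open("/proc/self/smaps").read()))` would list the VMAs where the kernel and the MMU use different page sizes.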
     
  • It is useful to verify a hugepage-aware application is using the expected
    pagesizes for its memory regions. This patch creates an entry called
    KernelPageSize in /proc/pid/smaps that is the size of page used by the
    kernel to back a VMA. The entry is not called PageSize as it is possible
    the MMU uses a different size. This extension should not break any sensible
    parser that skips lines containing unrecognised information.

    Signed-off-by: Mel Gorman
    Acked-by: "KOSAKI Motohiro"
    Cc: Alexey Dobriyan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mel Gorman
     

06 Jan, 2009

31 commits

  • * git://git.kernel.org/pub/scm/linux/kernel/git/agk/linux-2.6-dm:
    dm snapshot: extend exception store functions
    dm snapshot: split out exception store implementations
    dm snapshot: rename struct exception_store
    dm snapshot: separate out exception store interface
    dm mpath: move trigger_event to system workqueue
    dm: add name and uuid to sysfs
    dm table: rework reference counting
    dm: support barriers on simple devices
    dm request: extend target interface
    dm request: add caches
    dm ioctl: allow dm_copy_name_and_uuid to return only one field
    dm log: ensure log bitmap fits on log device
    dm log: move region_size validation
    dm log: avoid reinitialising io_req on every operation
    dm: consolidate target deregistration error handling
    dm raid1: fix error count
    dm log: fix dm_io_client leak on error paths
    dm snapshot: change yield to msleep
    dm table: drop reference at unbind

    Linus Torvalds
     
  • Supply dm_add_exception as a callback to the read_metadata function.
    Add a status function ready for a later patch and name the functions
    consistently.

    Signed-off-by: Jonathan Brassow
    Signed-off-by: Alasdair G Kergon

    Jonathan Brassow
     
  • Move the existing snapshot exception store implementations out into
    separate files. Later patches will place these behind a new
    interface in preparation for alternative implementations.

    Signed-off-by: Alasdair G Kergon

    Alasdair G Kergon
     
  • Rename struct exception_store to dm_exception_store.

    Signed-off-by: Jonathan Brassow
    Signed-off-by: Alasdair G Kergon

    Jonathan Brassow
     
  • Pull structures that bridge the gap between snapshot and
    exception store out of dm-snap.h and put them in a new
    .h file - dm-exception-store.h. This file will define the
    API for new exception stores.

    Ultimately, dm-snap.h is unnecessary, since only dm-snap.c
    should be using it.

    Signed-off-by: Jonathan Brassow
    Signed-off-by: Alasdair G Kergon

    Jonathan Brassow
     
  • The same workqueue is used both for sending uevents and processing queued I/O.
    Deadlock has been reported in RHEL5 when sending a uevent was blocked waiting
    for the queued I/O to be processed. Use schedule_work() for the asynchronous
    uevents instead.

    Signed-off-by: Alasdair G Kergon

    Alasdair G Kergon
     
  • Implement simple read-only sysfs entry for device-mapper block device.

    This patch adds a simple sysfs directory named "dm" under block device
    properties and implements
    - name attribute (string containing mapped device name)
    - uuid attribute (string containing UUID, or empty string if not set)

    The kobject is embedded in mapped_device struct, so no additional
    memory allocation is needed for initializing sysfs entry.

    During the processing of sysfs attribute we need to lock mapped device
    which is done by a new function dm_get_from_kobj, which returns the md
    associated with kobject and increases the usage count.

    Each 'show attribute' function is responsible for its own locking.

    Signed-off-by: Milan Broz
    Signed-off-by: Alasdair G Kergon

    Milan Broz
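
Reading the two attributes from user space could look like the following sketch. The `/sys/block/<dev>/dm/` location is an assumption derived from the description above ("dm" directory under the block device), and `read_dm_attrs` is a hypothetical helper name.

```python
from pathlib import Path
from typing import Dict


def read_dm_attrs(block_dev: str, sysfs_root: str = "/sys") -> Dict[str, str]:
    """Read the "name" and "uuid" attributes from <sysfs_root>/block/<dev>/dm/.

    An unset UUID is expected to read back as an empty string.
    """
    dm_dir = Path(sysfs_root) / "block" / block_dev / "dm"
    return {
        # Strip the trailing newline that sysfs attributes normally end with.
        attr: (dm_dir / attr).read_text().rstrip("\n")
        for attr in ("name", "uuid")
    }
```

The `sysfs_root` parameter exists only so the function can be pointed at a scratch directory when testing; on a real system the default would be used, for example `read_dm_attrs("dm-0")`.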
     
  • Rework table reference counting.

    The existing code uses a reference counter. When the last reference is
    dropped and the counter reaches zero, the table destructor is called.
    Table reference counters are acquired/released from upcalls from other
    kernel code (dm_any_congested, dm_merge_bvec, dm_unplug_all).
    If the reference counter reaches zero in one of the upcalls, the table
    destructor is called from almost random kernel code.

    This leads to various problems:
    * dm_any_congested being called under a spinlock, which calls the
    destructor, which calls some sleeping function.
    * the destructor attempting to take a lock that is already taken by the
    same process.
    * stale reference from some other kernel code keeps the table
    constructed, which keeps some devices open, even after successful
    return from "dmsetup remove". This can confuse lvm and prevent closing
    of underlying devices or reusing device minor numbers.

    The patch changes reference counting so that the table destructor can be
    called only at predetermined places.

    The table always has exactly one reference from either mapped_device->map
    or hash_cell->new_map. After this patch, this reference is not counted
    in table->holders. A pair of dm_create_table/dm_destroy_table functions
    is used for table creation/destruction.

    Temporary references from the other code increase table->holders. A pair
    of dm_table_get/dm_table_put functions is used to manipulate it.

    When the table is about to be destroyed, we wait for table->holders to
    reach 0. Then, we call the table destructor. We use active waiting with
    msleep(1), because the situation happens rarely (to one user in 5 years)
    and removing the device isn't a performance-critical task: the user doesn't
    care if it takes one tick more or not.

    This way, the destructor is called only at specific points
    (dm_table_destroy function) and the above problems associated with lazy
    destruction can't happen.

    Finally remove the temporary protection added to dm_any_congested().

    Signed-off-by: Mikulas Patocka
    Signed-off-by: Alasdair G Kergon

    Mikulas Patocka
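
The scheme described in this commit (temporary holders are counted, and only the destroy path waits and runs the destructor) can be modelled in user space. This is a toy sketch, not the kernel implementation: the `Table` class and its method names are hypothetical, and `time.sleep(0.001)` merely stands in for msleep(1).

```python
import threading
import time


class Table:
    """Toy stand-in for a dm table with a count of temporary holders."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self.holders = 0          # temporary references only
        self.destroyed = False

    def get(self) -> None:
        """Take a temporary reference (models dm_table_get)."""
        with self._lock:
            self.holders += 1

    def put(self) -> None:
        """Drop a temporary reference; this never runs the destructor."""
        with self._lock:
            assert self.holders > 0
            self.holders -= 1

    def _idle(self) -> bool:
        with self._lock:
            return self.holders == 0

    def destroy(self) -> None:
        """Wait for holders to drain, then destroy the table."""
        while not self._idle():
            time.sleep(0.001)     # stands in for msleep(1)
        self.destroyed = True
```

The point of the design is visible in `put()`: dropping the last temporary reference has no side effects, so a caller holding a spinlock or another lock can never end up running the destructor by accident.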
     
  • Implement barrier support for single device DM devices

    This patch implements barrier support in DM for the common case of dm linear
    just remapping a single underlying device. In this case we can safely
    pass the barrier through because there can be no reordering between
    devices.

    NB. Any DM device might cease to support barriers if it gets
    reconfigured so code must continue to allow for a possible
    -EOPNOTSUPP on every barrier bio submitted. - agk

    Signed-off-by: Andi Kleen
    Signed-off-by: Mikulas Patocka
    Signed-off-by: Alasdair G Kergon

    Andi Kleen
     
  • This patch adds the following target interfaces for request-based dm.

    map_rq : for mapping a request

    rq_end_io : for finishing a request

    busy : for avoiding performance regression from bio-based dm.
    Target can tell dm core not to map requests now, and
    that may help requests in the block layer queue to be
    bigger by I/O merging.
    In bio-based dm, this behavior is done by device
    drivers managing the block layer queue.
    But in request-based dm, dm core has to do that
    since dm core manages the block layer queue.

    Signed-off-by: Kiyoshi Ueda
    Signed-off-by: Jun'ichi Nomura
    Signed-off-by: Alasdair G Kergon

    Kiyoshi Ueda
     
  • This patch prepares some kmem_caches for request-based dm.

    Signed-off-by: Kiyoshi Ueda
    Signed-off-by: Jun'ichi Nomura
    Signed-off-by: Alasdair G Kergon

    Kiyoshi Ueda
     
  • Allow NULL buffer in dm_copy_name_and_uuid if you only want to return one of
    the fields.

    (Required by a following patch that adds these fields to sysfs.)

    Signed-off-by: Milan Broz
    Reviewed-by: Alasdair G Kergon
    Signed-off-by: Alasdair G Kergon

    Milan Broz
     
  • Check that the log bitmap will fit within the log device.

    Signed-off-by: Milan Broz
    Signed-off-by: Alasdair G Kergon

    Milan Broz
     
  • Move log size validation from mirror target to log constructor.

    Removed PAGE_SIZE restriction we no longer think necessary.

    Signed-off-by: Milan Broz
    Signed-off-by: Alasdair G Kergon

    Milan Broz
     
  • The rw_header function updates three members of io_req data every time
    I/O is processed. bi_rw and notify.fn are never modified once
    they get initialized, and so they can be set in advance.

    header_to_disk() can also be pulled out of write_header() since only one
    caller needs it and write_header() can be replaced by rw_header()
    directly.

    Signed-off-by: Takahiro Yasui
    Signed-off-by: Alasdair G Kergon

    Takahiro Yasui
     
  • Change dm_unregister_target to return void and use BUG() for error
    reporting.

    dm_unregister_target can only fail because of a programming bug in the
    target driver. It can't fail because of user's behavior or disk errors.

    This patch changes unregister_target to return void and use BUG if
    someone tries to unregister a non-registered target or a target
    that is in use.

    This patch removes code duplication (testing of error codes in all dm
    targets) and reports bugs in just one place, in dm_unregister_target. In
    some target drivers, these return codes were ignored, which could lead
    to a situation where bugs could be missed.

    Signed-off-by: Mikulas Patocka
    Signed-off-by: Alasdair G Kergon

    Mikulas Patocka
     
  • Always increase the error count when I/O on a leg of a mirror fails.

    The error count is used to decide whether to select an alternative
    mirror leg. If the target doesn't use the "handle_errors" feature, the
    error count is not updated and the bio can get requeued forever by the
    read callback.

    Fix it by increasing error_count before the handle_errors feature
    checking.

    Cc: stable@kernel.org
    Signed-off-by: Milan Broz
    Signed-off-by: Jonathan Brassow
    Signed-off-by: Alasdair G Kergon

    Jonathan Brassow
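
The ordering fix can be illustrated with a small model. Everything here is hypothetical (`MirrorLeg`, `fail_leg`, `leg_usable`); it only shows why counting the error before the feature check matters, and assumes that a leg with a non-zero error count is no longer picked for reads.

```python
from dataclasses import dataclass


@dataclass
class MirrorLeg:
    """Toy model of one mirror leg; field names are illustrative."""
    error_count: int = 0


def fail_leg(leg: MirrorLeg, handle_errors: bool) -> bool:
    """Record an I/O failure on a leg.

    The count is bumped before the feature check, so it is updated even
    when "handle_errors" is off. Returns True if further error handling
    would run.
    """
    leg.error_count += 1
    return handle_errors


def leg_usable(leg: MirrorLeg) -> bool:
    """Assumed policy: a leg with recorded errors is not chosen again."""
    return leg.error_count == 0
```

With the increment placed after an early return on `handle_errors`, `leg_usable()` would keep returning True for a failing leg, which corresponds to the endless requeue described in the commit message.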
     
  • In the create_log_context function, dm_io_client_destroy needs to be
    called when memory allocation of disk_header, sync_bits or
    recovering_bits fails, but it is not called.

    Cc: stable@kernel.org
    Signed-off-by: Takahiro Yasui
    Acked-by: Jonathan Brassow
    Signed-off-by: Alasdair G Kergon

    Takahiro Yasui
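
The class of bug fixed here (a resource created early and not released when a later allocation fails) can be sketched as follows. This is a user-space analogy, not the kernel code: `IoClient`, its `live` counter and the `alloc` callback are hypothetical stand-ins.

```python
from typing import Callable, Dict


class IoClient:
    """Hypothetical resource standing in for a dm_io client."""

    live = 0  # number of clients created and not yet destroyed

    def __init__(self) -> None:
        IoClient.live += 1

    def destroy(self) -> None:
        IoClient.live -= 1


def create_log_context(alloc: Callable[[str], bytearray]) -> Dict[str, object]:
    """Create the client, then allocate three buffers.

    If any allocation raises MemoryError, the client is destroyed before
    the error propagates, so no client is leaked.
    """
    client = IoClient()
    try:
        buffers = {
            name: alloc(name)
            for name in ("disk_header", "sync_bits", "recovering_bits")
        }
    except MemoryError:
        client.destroy()
        raise
    return {"client": client, **buffers}
```

Dropping the `except` branch reproduces the leak: the error still propagates, but `IoClient.live` stays at 1 after the failed call.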
     
  • Change yield() to msleep(1). If the thread has realtime priority,
    yield() doesn't really yield, so the yielding process would loop
    indefinitely and cause machine lockup.

    Signed-off-by: Mikulas Patocka
    Signed-off-by: Alasdair G Kergon

    Mikulas Patocka
     
  • Move one dm_table_put() so that the last reference in the thread
    gets dropped in __unbind().

    This is required for a following patch,
    dm-table-rework-reference-counting.patch, which will change the logic in
    such a way that table destructor is called only at specific points in
    the code.

    Signed-off-by: Mikulas Patocka
    Signed-off-by: Alasdair G Kergon

    Mikulas Patocka
     
  • * 'for-next' of git://git.o-hand.com/linux-mfd: (30 commits)
    mfd: Fix section mismatch in da903x
    mfd: move drivers/i2c/chips/menelaus.c to drivers/mfd
    mfd: move drivers/i2c/chips/tps65010.c to drivers/mfd
    mfd: dm355evm msp430 driver
    mfd: Add missing break from wm3850-core
    mfd: Add WM8351 support
    mfd: Support configurable numbers of DCDCs and ISINKs on WM8350
    mfd: Handle missing WM8350 platform data
    mfd: Add WM8352 support
    mfd: Use irq_to_desc in twl4030 code
    power_supply: Add Dialog DA9030 battery charger driver
    mfd: Dialog DA9030 battery charger MFD driver
    mfd: Register WM8400 codec device
    mfd: Pass driver_data onto child devices
    mfd: Fix twl4030-core.c build error
    mfd: twl4030 regulator bug fixes
    mfd: twl4030: create some regulator devices
    mfd: twl4030: cleanup symbols and OMAP dependency
    mfd: twl4030: simplified child creation code
    power_supply: Add battery health reporting for WM8350
    ...

    Linus Torvalds
     
  • * git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-for-linus:
    module: convert to stop_machine_create/destroy.
    stop_machine: introduce stop_machine_create/destroy.
    parisc: fix module loading failure of large kernel modules
    module: fix module loading failure of large kernel modules for parisc
    module: fix warning of unused function when !CONFIG_PROC_FS
    kernel/module.c: compare symbol values when marking symbols as exported in /proc/kallsyms.
    remove CONFIG_KMOD

    Linus Torvalds
     
  • …/git/tip/linux-2.6-tip

    * 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
    swiotlb: Don't include linux/swiotlb.h twice in lib/swiotlb.c
    intel-iommu: fix build error with INTR_REMAP=y and DMAR=n
    swiotlb: add missing __init annotations

    Linus Torvalds
     
  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/dlm:
    dlm: fs/dlm/ast.c: fix warning
    dlm: add new debugfs entry
    dlm: add time stamp of blocking callback
    dlm: change lock time stamping
    dlm: improve how bast mode handling
    dlm: remove extra blocking callback check
    dlm: replace schedule with cond_resched
    dlm: remove kmap/kunmap
    dlm: trivial annotation of be16 value
    dlm: fix up memory allocation flags

    Linus Torvalds
     
  • * 'i2c-next' of git://aeryn.fluff.org.uk/bjdooks/linux:
    i2c-omap: fix type of irq handler function
    i2c-s3c2410: Change IRQ to be plain integer.
    i2c-s3c2410: Allow more than one i2c-s3c2410 adapter
    i2c-s3c2410: Remove default platform data.
    i2c-s3c2410: Use platform data for gpio configuration
    i2c-s3c2410: Fixup style problems from checkpatch.pl
    i2c-omap: Enable I2C wakeups for 34xx
    i2c-omap: reprogram OCP_SYSCONFIG register after reset
    i2c-omap: convert 'rev1' flag to generic 'rev' u8
    i2c-omap: fix I2C timeouts due to recursive omap_i2c_{un,}idle()
    i2c-omap: Clean-up i2c-omap
    i2c-omap: Don't compile in OMAP15xx I2C ISR for non-OMAP15xx builds
    i2c-omap: Mark init-only functions as __init
    i2c-omap: Add support for omap34xx
    i2c-omap: FIFO handling support and broken hw workaround for i2c-omap
    i2c-omap: Add high-speed support to omap-i2c
    i2c-omap: Close suspected race between omap_i2c_idle() and omap_i2c_isr()
    i2c-omap: Do not use interruptible wait call in omap_i2c_xfer_msg

    Fix up apparently-trivial conflict in drivers/i2c/busses/i2c-s3c2410.c

    Linus Torvalds
     
  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid: (22 commits)
    HID: fix error condition propagation in hid-sony driver
    HID: fix reference count leak hidraw
    HID: add proper support for pensketch 12x9 tablet
    HID: don't allow DealExtreme usb-radio be handled by usb hid driver
    HID: fix default Kconfig setting for TopSpeed driver
    HID: driver for TopSeed Cyberlink quirky remote
    HID: make boot protocol drivers depend on EMBEDDED
    HID: avoid sparse warning in HID_COMPAT_LOAD_DRIVER
    HID: hiddev cleanup -- handle all error conditions properly
    HID: force feedback driver for GreenAsia 0x12 PID
    HID: switch specialized drivers from "default y" to !EMBEDDED
    HID: set proper dev.parent in hidraw
    HID: add dynids facility
    HID: use GFP_KERNEL in hid_alloc_buffers
    HID: usbhid, use usb_endpoint_xfer_int
    HID: move usbhid flags to usbhid.h
    HID: add n-trig digitizer support
    HID: add phys and name ioctls to hidraw
    HID: struct device - replace bus_id with dev_name(), dev_set_name()
    HID: automatically call usbhid_set_leds in usbhid driver
    ...

    Linus Torvalds
     
  • * git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-2.6-nmw: (27 commits)
    GFS2: Use DEFINE_SPINLOCK
    GFS2: Fix use-after-free bug on umount (try #2)
    Revert "GFS2: Fix use-after-free bug on umount"
    GFS2: Streamline alloc calculations for writes
    GFS2: Send useful information with uevent messages
    GFS2: Fix use-after-free bug on umount
    GFS2: Remove ancient, unused code
    GFS2: Move four functions from super.c
    GFS2: Fix bug in gfs2_lock_fs_check_clean()
    GFS2: Send some sensible sysfs stuff
    GFS2: Kill two daemons with one patch
    GFS2: Move gfs2_recoverd into recovery.c
    GFS2: Fix "truncate in progress" hang
    GFS2: Clean up & move gfs2_quotad
    GFS2: Add more detail to debugfs glock dumps
    GFS2: Banish struct gfs2_rgrpd_host
    GFS2: Move rg_free from gfs2_rgrpd_host to gfs2_rgrpd
    GFS2: Move rg_igeneration into struct gfs2_rgrpd
    GFS2: Banish struct gfs2_dinode_host
    GFS2: Move i_size from gfs2_dinode_host and rename it to i_disksize
    ...

    Linus Torvalds
     
  • When using "min()", the types of both sides should match. With the cpu
    mask changes, the type of num_online_cpus() will now depend on config
    options. Use "min_t()" with an explicit type instead.

    And make the rx/tx case look the same too, just for sanity.

    Signed-off-by: Linus Torvalds

    Linus Torvalds
     
  • * git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc-2.6: (30 commits)
    sparc: Fix minor SPARC32 compile error
    sparc: Remove reg*.h from Kbuild
    sparc: Clean arch-specific code in prom_common.c
    sparc: Kill asm/reg*.h
    sparc: Use 64BIT config entry
    MAINTAINERS: update sparc maintainer
    sparc: unify ipcbuf.h
    sparc: Update 64-bit defconfig.
    sparc: remove NO_PROC_ID - it is no longer used
    sparc: drop get_tbr() in traps.h
    sparc: fix warning in userspace header traps.h
    sparc: fix warnings in userspace header byteorder.h
    sparc: fix warning in userspace header jsflash.h
    sparc: unify openprom.h
    sparc64: delete unused linux_prom64_ranges from openprom_64.h
    sparc: prepare openprom for unification
    sparc: remove linux_prom_pci_assigned_addresses from openprom_32.h
    sparc: remove ebus definitions from openprom*.h
    sparc: unify siginfo.h
    sparc: unify ptrace.h
    ...

    Linus Torvalds
     
  • * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (44 commits)
    qlge: Fix sparse warnings for tx ring indexes.
    qlge: Fix sparse warning regarding rx buffer queues.
    qlge: Fix sparse endian warning in ql_hw_csum_setup().
    qlge: Fix sparse endian warning for inbound packet control block flags.
    qlge: Fix sparse warnings for byte swapping in qlge_ethool.c
    myri10ge: print MAC and serial number on probe failure
    pkt_sched: cls_u32: Fix locking in u32_change()
    iucv: fix cpu hotplug
    af_iucv: Free iucv path/socket in path_pending callback
    af_iucv: avoid left over IUCV connections from failing connects
    af_iucv: New error return codes for connect()
    net/ehea: bitops work on unsigned longs
    Revert "net: Fix for initial link state in 2.6.28"
    tcp: Kill extraneous SPLICE_F_NONBLOCK checks.
    tcp: don't mask EOF and socket errors on nonblocking splice receive
    dccp: Integrate the TFRC library with DCCP
    dccp: Clean up ccid.c after integration of CCID plugins
    dccp: Lockless integration of CCID congestion-control plugins
    qeth: get rid of extra argument after printk to dev_* conversion
    qeth: No large send using EDDP for HiperSockets.
    ...

    Linus Torvalds
     
  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound-2.6:
    ALSA: ice1724 - Fix a typo in IEC958 PCM name
    ASoC: fix davinci-sffsdr buglet
    ALSA: sound/usb: Use negated usb_endpoint_xfer_control, etc
    ALSA: hda - cxt5051 report jack state
    ALSA: hda - add basic jack reporting functions to patch_conexant.c
    ALSA: Use usb_set/get_intfdata
    ASoC: Clean up kerneldoc warnings
    ASoC: Fix pxa2xx-pcm checks for invalid DMA channels
    LSA: hda - Add HP Acacia detection
    ALSA: hda - fix name for ALC1200
    ALSA: sound/usb: use USB API functions rather than constants
    ASoC: TWL4030: DAPM based capture implementation
    ASoC: TWL4030: Make the enum filter generic for twl4030

    Linus Torvalds