26 Sep, 2009

40 commits

  • Len Brown
     
  • Len Brown
     
  • Minor code cleanup, no functional change. Instead of remembering
    what HIDs & CIDs to add later, just add them immediately.

    Signed-off-by: Bjorn Helgaas
    Signed-off-by: Len Brown

    Bjorn Helgaas
     
  • Nobody uses acpi_device_uid(), so this patch removes it.

    Signed-off-by: Bjorn Helgaas
    Signed-off-by: Len Brown

    Bjorn Helgaas
     
  • Every acpi_device has at least one ID (if there's no _HID or _CID, we
    give it a synthetic or default ID). So there's no longer a need to
    check whether an ID exists; we can just use it.

    Signed-off-by: Bjorn Helgaas
    Signed-off-by: Len Brown

    Bjorn Helgaas
     
  • We now keep a single list of IDs that includes both the _HID and any
    _CIDs. We no longer need to keep track of whether the device has a _CID.

    Signed-off-by: Bjorn Helgaas
    Signed-off-by: Len Brown

    Bjorn Helgaas
     
  • There's no need to treat _HID and _CID differently. Keeping them in
    a single list makes code that uses the IDs a little simpler because it
    can just traverse the list rather than checking "do we have a HID?",
    "do we have any CIDs?"

    Signed-off-by: Bjorn Helgaas
    Reviewed-by: Alex Chiang
    Signed-off-by: Len Brown

    Bjorn Helgaas
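
    A minimal user-space sketch of why one list is simpler: a match check
    becomes a single traversal. The struct and function names here
    (demo_hw_id, demo_device, demo_device_has_id) are hypothetical and do
    not mirror the kernel's actual struct acpi_device layout.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for illustration only. */
struct demo_hw_id {
    const char *id;              /* e.g. "PNP0A03" */
    struct demo_hw_id *next;     /* next ID in the list, or NULL */
};

struct demo_device {
    struct demo_hw_id *ids;      /* the _HID (if any) followed by any _CIDs */
};

/* Return 1 if any ID in the device's single list equals `wanted`, else 0. */
int demo_device_has_id(const struct demo_device *dev, const char *wanted)
{
    assert(dev != NULL && wanted != NULL);
    for (const struct demo_hw_id *p = dev->ids; p != NULL; p = p->next)
        if (strcmp(p->id, wanted) == 0)
            return 1;
    return 0;
}
```

    The caller never asks whether a given entry came from _HID or _CID; it
    only walks the list.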
     
  • This makes sure every acpi_device has at least one ID. If we build an
    acpi_device for a namespace node with no _HID or _CID, we sometimes
    synthesize an ID like "LNXCPU" or "LNXVIDEO". If we don't even have
    that, give it a default "device" ID.

    Note that this means things like:
    /sys/devices/LNXSYSTM:00/LNXSYBUS:00/HWP0001:00/HWP0002:04/device:00
    (a PCI slot SxFy device) will have "hid" and "modprobe" entries, where
    they didn't before. These aren't very useful (a HID of "device" doesn't
    tell you what *kind* of device it is, so it doesn't help find a driver),
    but I don't think they're harmful.

    Signed-off-by: Bjorn Helgaas
    Signed-off-by: Len Brown

    Bjorn Helgaas
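
    The fallback order described above can be sketched as follows. The
    function name demo_pick_id and its two-argument shape are assumptions
    made for illustration; only the "device" default string comes from the
    commit message.

```c
#include <stddef.h>

/*
 * Pick the ID a device ends up with.
 * firmware_id:  ID from _HID/_CID, or NULL if the node has neither
 * synthetic_id: synthesized ID such as "LNXCPU", or NULL if none applies
 * Never returns NULL: the last resort is the default "device" ID.
 */
const char *demo_pick_id(const char *firmware_id, const char *synthetic_id)
{
    if (firmware_id != NULL)
        return firmware_id;
    if (synthetic_id != NULL)
        return synthetic_id;
    return "device";
}
```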
     
  • Use acpi_device_hid() rather than accessing acpi_device.pnp.hardware_id
    directly.

    Signed-off-by: Bjorn Helgaas
    Signed-off-by: Len Brown

    Bjorn Helgaas
     
  • This makes \_SB_ show up as /sys/devices/LNXSYSTM:00/LNXSYBUS:00
    rather than "device:00". This has been broken for a loooong time
    (at least since 2.6.13) because device->parent is an acpi_device
    pointer, not a handle.

    Signed-off-by: Bjorn Helgaas
    Signed-off-by: Len Brown

    Bjorn Helgaas
     
  • acpi_bus_scan() traverses the namespace to enumerate devices and uses
    acpi_add_single_object() to create acpi_devices. When the platform
    notifies us of a hot-plug event, we need to traverse part of the namespace
    again to figure out what appeared or disappeared. (We don't yet call
    acpi_bus_scan() during hot-plug, but I plan to do that in the future.)

    This patch makes acpi_add_single_object() notice when we already have
    an acpi_device, so we don't need to make a new one.

    Signed-off-by: Bjorn Helgaas
    Signed-off-by: Len Brown

    Bjorn Helgaas
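
    The "reuse if it already exists" idea can be sketched in user space as a
    find-or-create helper. All names below (demo_node, demo_device,
    demo_add_single_object) are hypothetical; the real function takes
    different arguments and does much more.

```c
#include <stddef.h>
#include <stdlib.h>

struct demo_device;

struct demo_node {                  /* hypothetical namespace node */
    struct demo_device *device;     /* device already built here, or NULL */
};

struct demo_device {
    struct demo_node *node;         /* back-pointer to the namespace node */
};

/*
 * Return the device attached to `node`, creating one only if none exists.
 * Returns NULL if allocation fails.
 */
struct demo_device *demo_add_single_object(struct demo_node *node)
{
    if (node->device != NULL)
        return node->device;        /* seen on an earlier scan: reuse it */

    struct demo_device *dev = calloc(1, sizeof(*dev));
    if (dev == NULL)
        return NULL;
    dev->node = node;
    node->device = dev;
    return dev;
}
```

    Rescanning the same node is then idempotent, which is what a hot-plug
    rescan of part of the namespace would need.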
     
  • This patch adds acpi_bus_type_and_status(), which determines the type
    of the object and whether we want to build an acpi_device for it. If
    it is acpi_device-worthy, it returns the type and the device's current
    status.

    Signed-off-by: Bjorn Helgaas
    Signed-off-by: Len Brown

    Bjorn Helgaas
     
  • Add acpi_bus_get_status_handle() so we can get the status of a namespace
    object before building a struct acpi_device.

    This removes a use of "device->flags.dynamic_status", a cached indicator of
    whether _STA exists. It seems simpler and more reliable to just evaluate
    _STA and catch AE_NOT_FOUND errors.

    Signed-off-by: Bjorn Helgaas
    Signed-off-by: Len Brown

    Bjorn Helgaas
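
    A sketch of the "evaluate _STA and catch not-found" pattern. The result
    codes and the function demo_status_from_sta are hypothetical stand-ins,
    not the ACPICA API. The 0x0F default (present, enabled, shown in UI,
    functioning) is an assumption taken from the ACPI specification's rule
    for devices without _STA; the commit message itself does not state it.

```c
#include <stdint.h>

/* Hypothetical result codes; the kernel uses acpi_status values instead. */
enum demo_status { DEMO_OK, DEMO_NOT_FOUND, DEMO_ERROR };

/* Assumed default when _STA is absent. */
#define DEMO_STA_DEFAULT 0x0Fu

/*
 * Turn the outcome of evaluating _STA into a status word.
 * eval_result: what the (hypothetical) evaluation returned
 * evaluated:   the value it produced, meaningful only for DEMO_OK
 * Fills *sta and returns DEMO_OK, or passes any other error through
 * without touching *sta.
 */
enum demo_status demo_status_from_sta(enum demo_status eval_result,
                                      uint32_t evaluated, uint32_t *sta)
{
    if (eval_result == DEMO_OK) {
        *sta = evaluated;
        return DEMO_OK;
    }
    if (eval_result == DEMO_NOT_FOUND) {
        *sta = DEMO_STA_DEFAULT;    /* no _STA method: use the default */
        return DEMO_OK;
    }
    return eval_result;
}
```

    No cached "does _STA exist?" flag is needed; the not-found result
    carries that information at the moment it matters.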
     
  • acpi_bus_scan() currently walks the namespace manually. This patch changes
    it to use acpi_walk_namespace() instead.

    Besides removing some complicated code, this means we take advantage of the
    namespace locking done by acpi_walk_namespace(). The locking isn't so
    important at boot-time, but I hope to eventually use this same path to
    handle hot-addition of devices, when it will be important.

    Note that acpi_walk_namespace() does not actually visit the starting node
    first, so we need to do that by hand first.

    Signed-off-by: Bjorn Helgaas
    Signed-off-by: Len Brown

    Bjorn Helgaas
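
    The note about the starting node can be illustrated with a toy tree. The
    walker below is a hypothetical stand-in that, like the behavior described
    above, visits only descendants; the wrapper handles the starting node by
    hand. None of these names are kernel APIs.

```c
#include <stddef.h>

struct demo_node {
    struct demo_node *child;     /* first child, or NULL */
    struct demo_node *sibling;   /* next sibling, or NULL */
    int visited;                 /* set to 1 when the scan reaches the node */
};

/* Mark every descendant of `start` depth-first, but not `start` itself.
 * Returns the number of nodes marked. */
int demo_mark_descendants(struct demo_node *start)
{
    int count = 0;
    for (struct demo_node *c = start->child; c != NULL; c = c->sibling) {
        c->visited = 1;
        count += 1 + demo_mark_descendants(c);
    }
    return count;
}

/* Scan a subtree including its root: handle the starting node by hand,
 * then let the walker cover everything below it. */
int demo_scan_subtree(struct demo_node *start)
{
    start->visited = 1;
    return 1 + demo_mark_descendants(start);
}
```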
     
  • We can identify the root of the ACPI device tree by the fact that it
    has no parent. This is simpler than passing around ACPI_BUS_TYPE_SYSTEM
    and will help remove special treatment of the device tree root.

    Currently, we add the root by hand with ACPI_BUS_TYPE_SYSTEM. If we
    traverse the tree treating the root as just another device and use
    acpi_get_type(), the root shows up as ACPI_TYPE_DEVICE.

    Signed-off-by: Bjorn Helgaas
    Signed-off-by: Len Brown

    Bjorn Helgaas
     
  • This patch changes the order so we enumerate in the "root, namespace,
    functional fixed" order instead of the "root, functional fixed, namespace"
    order. When I change acpi_bus_scan() to use acpi_walk_namespace(), it
    will use the former order, so this patch isolates the order change for
    bisectability.

    Signed-off-by: Bjorn Helgaas
    Signed-off-by: Len Brown

    Bjorn Helgaas
     
  • This patch changes acpi_bus_scan() to take an acpi_handle rather than an
    acpi_device pointer. I plan to use acpi_bus_scan() in the hotplug path,
    and I'd rather not assume that notifications only go to nodes that already
    have acpi_devices.

    This will also help remove the special case for adding the root node. We
    currently add the root by hand before acpi_bus_scan(), but using a handle
    here means we can start the acpi_bus_scan() directly with the root even
    though it doesn't have an acpi_device yet.

    Note that acpi_bus_scan() currently adds and/or starts the *children* of
    its device argument. It doesn't do anything with the device itself.

    Signed-off-by: Bjorn Helgaas
    Signed-off-by: Len Brown

    Bjorn Helgaas
     
  • This patch adds acpi_bus_get_parent(), which ascends the namespace until
    it finds a parent with an acpi_device.

    Then we use acpi_bus_get_parent() in acpi_add_single_object(), so callers
    don't have to figure out or keep track of the parent acpi_device.

    Signed-off-by: Bjorn Helgaas
    Signed-off-by: Len Brown

    Bjorn Helgaas
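
    Ascending until a node with a device is found can be sketched like this.
    The types and the name demo_get_parent_device are hypothetical; the real
    helper works on ACPI handles rather than plain pointers.

```c
#include <stddef.h>

struct demo_device {
    int id;
};

struct demo_node {
    struct demo_node *parent;     /* NULL at the namespace root */
    struct demo_device *device;   /* NULL if no device was built for this node */
};

/* Return the device of the nearest ancestor that has one, or NULL if no
 * ancestor does. The node's own device is deliberately not considered. */
struct demo_device *demo_get_parent_device(const struct demo_node *node)
{
    for (const struct demo_node *p = node->parent; p != NULL; p = p->parent)
        if (p->device != NULL)
            return p->device;
    return NULL;
}
```

    Intermediate namespace nodes without a device are skipped, so the caller
    does not have to track the parent itself.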
     
  • acpi_add_single_object() is static, and all callers supply a valid "child"
    argument, so we don't need to check it. This patch also removes some
    unnecessary initializations.

    Signed-off-by: Bjorn Helgaas
    Signed-off-by: Len Brown

    Bjorn Helgaas
     
  • We now save the ACPI bus "device_type" in the acpi_device structure, so
    we don't need to pass it around explicitly anymore.

    Signed-off-by: Bjorn Helgaas
    Signed-off-by: Len Brown

    Bjorn Helgaas
     
  • We only pass the "type" to acpi_device_set_context() so we know whether
    the device has a handle to which we can attach the acpi_device pointer.
    But it's safer to just check for the handle directly, since it's in the
    acpi_device already.

    Signed-off-by: Bjorn Helgaas
    Signed-off-by: Len Brown

    Bjorn Helgaas
     
  • Check the acpi_device device_type rather than the HID.

    Signed-off-by: Bjorn Helgaas
    Signed-off-by: Len Brown

    Bjorn Helgaas
     
  • Most uses of the ACPI bus device_type (ACPI_BUS_TYPE_DEVICE,
    ACPI_BUS_TYPE_POWER, etc) are during device initialization, but
    we do need it later for notify handler installation, since that
    is different for fixed hardware devices vs. namespace devices.

    This patch saves the device_type in the acpi_device structure,
    so we can check that rather than comparing against the _HID string.

    Signed-off-by: Bjorn Helgaas
    Signed-off-by: Len Brown

    Bjorn Helgaas
     
  • In several cases, functions take handle and parent device pointers in
    addition to acpi_device pointers. But the acpi_device structure contains
    both the handle and the parent pointer, so it's pointless and error-prone
    to pass them all. This patch removes the unnecessary "handle" and "parent"
    arguments.

    Signed-off-by: Bjorn Helgaas
    Signed-off-by: Len Brown

    Bjorn Helgaas
     
  • We never use the "root" argument, so just remove it.

    Signed-off-by: Bjorn Helgaas
    Signed-off-by: Len Brown

    Bjorn Helgaas
     
  • Add debug output for adding an ACPI device. Enable this with
    "acpi.debug_layer=0x00010000" (ACPI_BUS_COMPONENT).

    Signed-off-by: Bjorn Helgaas
    Signed-off-by: Len Brown

    Bjorn Helgaas
     
  • Commit 15b8dd53f5ffa changed info->hardware_id from a static array to
    a pointer. If hardware_id is non-NULL, it points to a NULL-terminated
    string, so we don't need to terminate it explicitly. However, it may
    be NULL; in that case, we *can't* add a NULL terminator.

    This causes a NULL pointer dereference oops for devices without _HID.

    Signed-off-by: Bjorn Helgaas
    CC: Lin Ming
    CC: Bob Moore
    CC: Gary Hade
    Signed-off-by: Len Brown

    Bjorn Helgaas
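
    The class of bug described above, writing through a pointer that may be
    NULL, can be avoided with a guard like the one below. This is a generic
    user-space sketch; demo_copy_optional_id is a hypothetical helper, not
    the code changed by the patch.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Copy an optional ID string into `dst`.
 * `src` may be NULL (the device has no _HID); in that case `dst` becomes
 * the empty string and 0 is returned. Otherwise `src` is already
 * NUL-terminated, so it is copied (truncated if needed) and 1 is returned.
 */
int demo_copy_optional_id(char *dst, size_t dst_size, const char *src)
{
    if (dst_size == 0)
        return 0;
    if (src == NULL) {
        dst[0] = '\0';            /* never touch src when it is NULL */
        return 0;
    }
    snprintf(dst, dst_size, "%s", src);
    return 1;
}
```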
     
  • * 'writeback' of git://git.kernel.dk/linux-2.6-block:
    writeback: writeback_inodes_sb() should use bdi_start_writeback()
    writeback: don't delay inodes redirtied by a fast dirtier
    writeback: make the super_block pinning more efficient
    writeback: don't resort for a single super_block in move_expired_inodes()
    writeback: move inodes from one super_block together
    writeback: get rid to incorrect references to pdflush in comments
    writeback: improve readability of the wb_writeback() continue/break logic
    writeback: cleanup writeback_single_inode()
    writeback: kupdate writeback shall not stop when more io is possible
    writeback: stop background writeback when below background threshold
    writeback: balance_dirty_pages() shall write more than dirtied pages
    fs: Fix busyloop in wb_writeback()

    Linus Torvalds
     
  • It is pointless to iterate over other devices looking for a super_block
    when we already have a bdi mapping.

    Signed-off-by: Jens Axboe

    Jens Axboe
     
  • Debug traces show that in per-bdi writeback, the inode under writeback
    almost always gets redirtied by a busy dirtier. We used to call
    redirty_tail() in this case, which could delay the inode for up to 30s.

    This is unacceptable because it now happens so frequently for plain cp/dd
    that the accumulated delays could make writeback of big files very slow.

    So let's distinguish between data redirty and metadata-only redirty.
    The first is caused by a busy dirtier, while the second can happen in
    XFS, NFS, etc. when they are doing delalloc or updating isize.

    The inode being busy dirtied will now be requeued for next io, while
    the inode being redirtied by fs will continue to be delayed to avoid
    repeated IO.

    CC: Jan Kara
    CC: Theodore Ts'o
    CC: Dave Chinner
    CC: Chris Mason
    CC: Christoph Hellwig
    Signed-off-by: Wu Fengguang
    Signed-off-by: Jens Axboe

    Wu Fengguang
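
    The distinction between the two kinds of redirty can be sketched as a
    small decision function. The flag bits and action names are hypothetical
    simplifications; the kernel's inode state flags and queueing helpers
    involve more conditions than shown here.

```c
/* Hypothetical flag bits for illustration only. */
#define DEMO_DIRTY_PAGES 0x1u   /* data pages were dirtied again (busy dirtier) */
#define DEMO_DIRTY_META  0x2u   /* inode metadata was dirtied by the filesystem */

enum demo_action {
    DEMO_DONE,          /* inode is clean, nothing to queue */
    DEMO_REQUEUE_IO,    /* put back for the next round of io */
    DEMO_REDIRTY_TAIL   /* delay to avoid repeated IO */
};

/* Decide what to do with an inode after a writeback pass, given which
 * kinds of dirtiness it picked up while it was being written. */
enum demo_action demo_after_writeback(unsigned int state)
{
    if (state & DEMO_DIRTY_PAGES)
        return DEMO_REQUEUE_IO;     /* data redirty: do not delay */
    if (state & DEMO_DIRTY_META)
        return DEMO_REDIRTY_TAIL;   /* metadata-only redirty: keep delaying */
    return DEMO_DONE;
}
```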
     
  • Currently we pin the inode->i_sb for every single inode. This
    increases cache traffic on the sb->s_umount sem. Let's instead
    cache the inode sb pin state and keep the super_block pinned
    for as long as we keep writing out inodes from the same
    super_block.

    Signed-off-by: Jens Axboe

    Jens Axboe
     
  • If we only moved inodes from a single super_block to the temporary
    list, there's no point in doing a resort for multiple super_blocks.

    Signed-off-by: Jens Axboe

    Jens Axboe
     
  • __mark_inode_dirty() adds inodes to the wb dirty list in random order. If a
    disk has several partitions, writeback might keep the spindle moving
    between partitions. To reduce that movement, it is better to write a big
    chunk of one partition and then move to another. Inodes from one fs are
    usually in one partition, so ideally moving inodes from one fs together
    should reduce spindle movement. This patch tries to address this. Before
    per-bdi writeback was added, the behavior was to write inodes from one fs
    first and then another, so the patch restores the previous behavior.
    The loop in the patch is a bit ugly; should we add a dirty list for each
    superblock in bdi_writeback?

    A test on a two-partition disk with the attached fio script shows about a
    3% to 6% improvement.

    Signed-off-by: Shaohua Li
    Reviewed-by: Wu Fengguang
    Signed-off-by: Jens Axboe

    Shaohua Li
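
    The grouping idea can be sketched on a plain array. This is not the
    patch's list-based loop; demo_inode and demo_group_by_sb are hypothetical,
    and the quadratic scan is chosen for brevity rather than speed.

```c
#include <stddef.h>

struct demo_inode {
    int sb;     /* which super_block (filesystem) the inode belongs to */
    int ino;    /* inode number, used here only to check ordering */
};

/*
 * Stable regroup: copy in[0..n) to out[0..n) so that all inodes sharing a
 * super_block are adjacent. Groups appear in order of each super_block's
 * first occurrence, and order within a group is preserved.
 * `out` must have room for n elements and must not alias `in`.
 */
void demo_group_by_sb(const struct demo_inode *in, struct demo_inode *out,
                      size_t n)
{
    size_t k = 0;
    for (size_t i = 0; i < n; i++) {
        int seen = 0;
        /* Skip this sb if an earlier element already emitted its group. */
        for (size_t j = 0; j < i; j++) {
            if (in[j].sb == in[i].sb) {
                seen = 1;
                break;
            }
        }
        if (seen)
            continue;
        /* Emit every inode of this sb, starting from its first occurrence. */
        for (size_t j = i; j < n; j++)
            if (in[j].sb == in[i].sb)
                out[k++] = in[j];
    }
}
```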
     
  • Signed-off-by: Jens Axboe

    Jens Axboe
     
  • And throw some comments in there, too.

    Reviewed-by: Wu Fengguang
    Signed-off-by: Jens Axboe

    Jens Axboe
     
  • Make the if-else straight in writeback_single_inode().
    No behavior change.

    Cc: Jan Kara
    Cc: Michael Rubin
    Cc: Peter Zijlstra
    Signed-off-by: Fengguang Wu
    Signed-off-by: Jens Axboe

    Wu Fengguang
     
  • Fix the kupdate case, which disregards wbc.more_io and stops writeback
    prematurely even when there are more inodes to be synced.

    wbc.more_io should always be respected.

    Also remove the pages_skipped check. It is set when some page(s) of some
    inode(s) cannot be written for now. Such inodes will be delayed for a while.
    This variable has nothing to do with whether there are other writeable inodes.

    CC: Jan Kara
    CC: Dave Chinner
    CC: Peter Zijlstra
    Signed-off-by: Wu Fengguang
    Signed-off-by: Jens Axboe

    Wu Fengguang
     
  • Treat bdi_start_writeback(0) as a special request to do background write,
    and stop such work when we are below the background dirty threshold.

    Also simplify the (nr_pages

    CC: Jan Kara
    Acked-by: Peter Zijlstra
    Signed-off-by: Wu Fengguang
    Signed-off-by: Jens Axboe

    Wu Fengguang
     
  • Some filesystems may choose to write much more than ratelimit_pages
    before calling balance_dirty_pages_ratelimited_nr(). So it is safer to
    determine the number to write based on the real number of dirtied pages.

    Otherwise it is possible that

        loop {
            btrfs_file_write():    dirty 1024 pages
            balance_dirty_pages(): write up to 48 pages (= ratelimit_pages * 1.5)
        }

    in which the writeback rate cannot keep up with dirty rate, and the
    dirty pages go all the way beyond dirty_thresh.

    The increased write_chunk may make the dirtier more bumpy.
    So filesystems should take care not to dirty too much at
    a time (e.g. > 4MB) without checking the ratelimit.

    Signed-off-by: Wu Fengguang
    Acked-by: Peter Zijlstra
    Signed-off-by: Jens Axboe

    Wu Fengguang
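
    The arithmetic in the example above can be checked with a short sketch.
    The old chunk is 1.5 times ratelimit_pages, as the message states. The
    new formula below (1.5 times the dirtied pages, never less than the old
    chunk) is an assumption made for illustration; the message only says the
    number is based on the real number of dirtied pages. Both function names
    are hypothetical.

```c
/* Old scheme: a fixed chunk of 1.5 * ratelimit_pages per call,
 * regardless of how many pages the caller actually dirtied. */
long demo_old_write_chunk(long ratelimit_pages)
{
    return ratelimit_pages + ratelimit_pages / 2;
}

/* Assumed new scheme: scale the chunk from the pages actually dirtied,
 * using ratelimit_pages as a floor. */
long demo_new_write_chunk(long pages_dirtied, long ratelimit_pages)
{
    long base = pages_dirtied > ratelimit_pages ? pages_dirtied
                                                : ratelimit_pages;
    return base + base / 2;
}
```

    With ratelimit_pages = 32 (implied by the 48-page figure), the old chunk
    is 48 pages against 1024 dirtied, so writeback falls behind on every
    iteration. Under the assumed formula the chunk becomes 1536 pages, which
    keeps pace.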
     
  • If all inodes are under writeback (e.g. when there's only one inode
    with dirty pages), wb_writeback() with WB_SYNC_NONE work basically degrades
    to busylooping until the I_SYNC flag of the inode is cleared. Fix the
    problem by waiting on the I_SYNC flag of an inode on the b_more_io list in
    case we failed to write anything.

    Tested-by: Wu Fengguang
    Signed-off-by: Jan Kara
    Signed-off-by: Jens Axboe

    Jan Kara