01 Mar, 2013

3 commits

  • Pull Ceph updates from Sage Weil:
    "A few groups of patches here. Alex has been hard at work improving
    the RBD code, laying groundwork for understanding the new formats and
    doing layering. Most of the infrastructure is now in place for the
    final bits that will come with the next window.

    There are a few changes to the data layout. Jim Schutt's patch fixes
    some non-ideal CRUSH behavior, and a set of patches from me updates
    the client to speak a newer version of the protocol and implement an
    improved hashing strategy across storage nodes (when the server side
    supports it too).

    A pair of patches from Sam Lang fix the atomicity of open+create
    operations. Several patches from Yan, Zheng fix various mds/client
    issues that turned up during multi-mds torture tests.

    A final set of patches expose file layouts via virtual xattrs, and
    allow the policies to be set on directories via xattrs as well
    (avoiding the awkward ioctl interface and providing a consistent
    interface for both kernel mount and ceph-fuse users)."

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: (143 commits)
    libceph: add support for HASHPSPOOL pool flag
    libceph: update osd request/reply encoding
    libceph: calculate placement based on the internal data types
    ceph: update support for PGID64, PGPOOL3, OSDENC protocol features
    ceph: update "ceph_features.h"
    libceph: decode into cpu-native ceph_pg type
    libceph: rename ceph_pg -> ceph_pg_v1
    rbd: pass length, not op for osd completions
    rbd: move rbd_osd_trivial_callback()
    libceph: use a do..while loop in con_work()
    libceph: use a flag to indicate a fault has occurred
    libceph: separate non-locked fault handling
    libceph: encapsulate connection backoff
    libceph: eliminate sparse warnings
    ceph: eliminate sparse warnings in fs code
    rbd: eliminate sparse warnings
    libceph: define connection flag helpers
    rbd: normalize dout() calls
    rbd: barriers are hard
    rbd: ignore zero-length requests
    ...

    Linus Torvalds
     
  • Pull block driver bits from Jens Axboe:
    "After the block IO core bits are in, please grab the driver updates
    from below as well. It contains:

    - Fix ancient regression in dac960. Nobody must be using that
    anymore...

    - Some good fixes from Guo Chao for loop, fixing both potential
    oopses and deadlocks.

    - Improve mtip32xx for NUMA systems, by being a bit more clever in
    distributing work.

    - Add IBM RamSan 70/80 driver. A second round of fixes for that is
    pending, that will come in through for-linus during the 3.9 cycle
    as per usual.

    - A few xen-blk{back,front} fixes from Konrad and Roger.

    - Other minor fixes and improvements."

    * 'for-3.9/drivers' of git://git.kernel.dk/linux-block:
    loopdev: ignore negative offset when calculate loop device size
    loopdev: remove an user triggerable oops
    loopdev: move common code into loop_figure_size()
    loopdev: update block device size in loop_set_status()
    loopdev: fix a deadlock
    xen-blkback: use balloon pages for persistent grants
    xen-blkfront: drop the use of llist_for_each_entry_safe
    xen/blkback: Don't trust the handle from the frontend.
    xen-blkback: do not leak mode property
    block: IBM RamSan 70/80 driver fixes
    rsxx: add slab.h include to dma.c
    drivers/block/mtip32xx: add missing GENERIC_HARDIRQS dependency
    block: remove new __devinit/exit annotations on ramsam driver
    block: IBM RamSan 70/80 device driver
    drivers/block/mtip32xx/mtip32xx.c:1726:5: sparse: symbol 'mtip_send_trim' was not declared. Should it be static?
    drivers/block/mtip32xx/mtip32xx.c:4029:1: sparse: symbol 'mtip_workq_sdbf0' was not declared. Should it be static?
    dac960: return success instead of -ENOTTY
    mtip32xx: add trim support
    mtip32xx: Add workqueue and NUMA support
    block: delete super ancient PC-XT driver for 1980's hardware

    Linus Torvalds
     
  • Pull block IO core bits from Jens Axboe:
    "Below are the core block IO bits for 3.9. It was delayed a few days
    since my workstation kept crashing every 2-8h after pulling it into
    current -git, but turns out it is a bug in the new pstate code (divide
    by zero, will report separately). In any case, it contains:

    - The big cfq/blkcg update from Tejun and Vivek.

    - Additional block and writeback tracepoints from Tejun.

    - Improvement of the should sort (based on queues) logic in the plug
    flushing.

    - _io() variants of the wait_for_completion() interface, using
    io_schedule() instead of schedule() to contribute to io wait
    properly.

    - Various little fixes.

    You'll get two trivial merge conflicts, which should be easy enough to
    fix up"

    Fix up the trivial conflicts due to hlist traversal cleanups (commit
    b67bfe0d42ca: "hlist: drop the node parameter from iterators").

    * 'for-3.9/core' of git://git.kernel.dk/linux-block: (39 commits)
    block: remove redundant check to bd_openers()
    block: use i_size_write() in bd_set_size()
    cfq: fix lock imbalance with failed allocations
    drivers/block/swim3.c: fix null pointer dereference
    block: don't select PERCPU_RWSEM
    block: account iowait time when waiting for completion of IO request
    sched: add wait_for_completion_io[_timeout]
    writeback: add more tracepoints
    block: add block_{touch|dirty}_buffer tracepoint
    buffer: make touch_buffer() an exported function
    block: add @req to bio_{front|back}_merge tracepoints
    block: add missing block_bio_complete() tracepoint
    block: Remove should_sort judgement when flush blk_plug
    block,elevator: use new hashtable implementation
    cfq-iosched: add hierarchical cfq_group statistics
    cfq-iosched: collect stats from dead cfqgs
    cfq-iosched: separate out cfqg_stats_reset() from cfq_pd_reset_stats()
    blkcg: make blkcg_print_blkgs() grab q locks instead of blkcg lock
    block: RCU free request_queue
    blkcg: implement blkg_[rw]stat_recursive_sum() and blkg_[rw]stat_merge()
    ...

    Linus Torvalds
     

28 Feb, 2013

7 commits

  • I just fixed this in "drivers/block/rbd.c" and I noticed that
    "drivers/block/nbd.c" has the same problem. Fix a warning issued by
    sparse by adding some lockdep annotations to indicate the queue lock gets
    dropped (because it's held when do_nbd_request() is called) and
    re-acquired within the function.

    Signed-off-by: Alex Elder
    Cc: Paul Clements
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alex Elder
     
  • Pass the read-only flag to set_device_ro, so that it will be visible to
    the block layer and in sysfs.

    Signed-off-by: Paolo Bonzini
    Cc: Paul Clements
    Cc: Alex Bligh
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Paolo Bonzini
     
  • There are two problems with shutdown in the NBD driver.

    1: Receiving the NBD_DISCONNECT ioctl does not sync the filesystem.

    This patch adds the sync operation into __nbd_ioctl()'s
    NBD_DISCONNECT handler. This is useful because BLKFLSBUF is restricted
    to processes that have CAP_SYS_ADMIN, and the NBD client may not
    possess it (fsync of the block device does not sync the filesystem,
    either).

    2: Once we clear the socket we have no guarantee that later reads will
    come from the same backing storage.

    The patch adds calls to kill_bdev() in __nbd_ioctl()'s socket
    clearing code so the page cache is cleaned, lest reads that hit on the
    page cache will return stale data from the previously-accessible disk.

    Example:

    # qemu-nbd -r -c/dev/nbd0 /dev/sr0
    # file -s /dev/nbd0
    /dev/stdin: # UDF filesystem data (version 1.5) etc.
    # qemu-nbd -d /dev/nbd0
    # qemu-nbd -r -c/dev/nbd0 /dev/sda
    # file -s /dev/nbd0
    /dev/stdin: # UDF filesystem data (version 1.5) etc.

    While /dev/sda has:

    # file -s /dev/sda
    /dev/sda: x86 boot sector; etc.

    Signed-off-by: Paolo Bonzini
    Acked-by: Paul Clements
    Cc: Alex Bligh
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Paolo Bonzini
     
  • Currently, the NBD device does not accept flush requests from the Linux
    block layer. If the NBD server opened the target with neither O_SYNC nor
    O_DSYNC, however, the device will be effectively backed by a writeback
    cache. Without issuing flushes properly, operation of the NBD device will
    not be safe against power losses.

    The NBD protocol has support for both a cache flush command and a FUA
    command flag; the server will also pass a flag to note its support for
    these features. This patch adds support for the cache flush command and
    flag. In the kernel, we receive the flags via the NBD_SET_FLAGS ioctl,
    and map NBD_FLAG_SEND_FLUSH to the argument of blk_queue_flush. When the
    flag is active the block layer will send REQ_FLUSH requests, which we
    translate to NBD_CMD_FLUSH commands.

    FUA support is not included in this patch because all free software
    servers implement it with a full fdatasync; thus it has no advantage over
    supporting flush only. Because I [Paolo] cannot really benchmark it in a
    realistic scenario, I cannot tell if it is a good idea or not. It is also
    not clear if it is valid for an NBD server to support FUA but not flush.
    The Linux block layer gives a warning for this combination, but the NBD
    protocol documentation says nothing about it.

    The patch also fixes a small problem in the handling of flags: nbd->flags
    must be cleared at the end of NBD_DO_IT, but the driver was not doing
    that. The bug manifests itself as follows. Suppose you use two different
    client/server pairs to start the NBD device. Suppose also that the first
    client supports NBD_SET_FLAGS, and the first server sends
    NBD_FLAG_SEND_FLUSH; the second pair instead does neither of these two
    things. Before this patch, the second invocation of NBD_DO_IT will use a
    stale value of nbd->flags, and the second server will issue an error every
    time it receives an NBD_CMD_FLUSH command.
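
    The flag handling described above can be sketched in userspace C. This
    is only an illustration, not the driver code: every sketch_ and SKETCH_
    name is hypothetical, the constant values are placeholders rather than
    the real header definitions, and queue_flush_on merely models the
    argument that would be handed to blk_queue_flush().

```c
#include <assert.h>
#include <stdint.h>

/* Placeholder values for illustration; the real ones live in the NBD headers. */
#define SKETCH_NBD_FLAG_SEND_FLUSH (1u << 2)
#define SKETCH_NBD_CMD_WRITE 1
#define SKETCH_NBD_CMD_FLUSH 3

/* Hypothetical stand-in for the request types the block layer sends. */
enum sketch_req_kind { SKETCH_REQ_WRITE, SKETCH_REQ_FLUSH };

struct sketch_nbd {
	uint32_t flags;      /* flags received from the client via ioctl */
	int queue_flush_on;  /* models the argument given to blk_queue_flush() */
};

/* Record the server flags and derive the queue's flush capability. */
static void sketch_set_flags(struct sketch_nbd *nbd, uint32_t flags)
{
	nbd->flags = flags;
	nbd->queue_flush_on = (flags & SKETCH_NBD_FLAG_SEND_FLUSH) != 0;
}

/* Translate a block-layer request into the wire command to send. */
static int sketch_wire_cmd(const struct sketch_nbd *nbd,
			   enum sketch_req_kind kind)
{
	if (kind == SKETCH_REQ_FLUSH) {
		/* Flushes are only expected when the capability is set. */
		assert(nbd->queue_flush_on);
		return SKETCH_NBD_CMD_FLUSH;
	}
	return SKETCH_NBD_CMD_WRITE;
}

/* Clear per-connection state when the connection ends, so that a later
 * client/server pair does not inherit stale flags. */
static void sketch_disconnect(struct sketch_nbd *nbd)
{
	nbd->flags = 0;
	nbd->queue_flush_on = 0;
}
```

    Without the reset in sketch_disconnect(), a second connection whose
    server never advertised flush support would still be sent flush
    commands, which is the stale-flags failure mode described above.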

    This bug is pre-existing, but it becomes much more important after this
    patch; flush failures make the device pretty much unusable, unlike

    Signed-off-by: Paolo Bonzini
    Signed-off-by: Alex Bligh
    Acked-by: Paul Clements
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alex Bligh
     
  • Convert to the much saner new idr interface.

    Signed-off-by: Tejun Heo
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Tejun Heo
     
  • Convert to the much saner new idr interface.

    Signed-off-by: Tejun Heo
    Acked-by: Jens Axboe
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Tejun Heo
     
  • idr_destroy() can destroy idr by itself and idr_remove_all() is being
    deprecated. Drop its usage.

    Signed-off-by: Tejun Heo
    Cc: Jens Axboe
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Tejun Heo
     

27 Feb, 2013

4 commits

  • Pull vfs pile (part one) from Al Viro:
    "Assorted stuff - cleaning namei.c up a bit, fixing ->d_name/->d_parent
    locking violations, etc.

    The most visible changes here are death of FS_REVAL_DOT (replaced with
    "has ->d_weak_revalidate()") and a new helper getting from struct file
    to inode. Some bits of preparation to xattr method interface changes.

    Misc patches by various people sent this cycle *and* ocfs2 fixes from
    several cycles ago that should've been upstream right then.

    PS: the next vfs pile will be xattr stuff."

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (46 commits)
    saner proc_get_inode() calling conventions
    proc: avoid extra pde_put() in proc_fill_super()
    fs: change return values from -EACCES to -EPERM
    fs/exec.c: make bprm_mm_init() static
    ocfs2/dlm: use GFP_ATOMIC inside a spin_lock
    ocfs2: fix possible use-after-free with AIO
    ocfs2: Fix oops in ocfs2_fast_symlink_readpage() code path
    get_empty_filp()/alloc_file() leave both ->f_pos and ->f_version zero
    target: writev() on single-element vector is pointless
    export kernel_write(), convert open-coded instances
    fs: encode_fh: return FILEID_INVALID if invalid fid_type
    kill f_vfsmnt
    vfs: kill FS_REVAL_DOT by adding a d_weak_revalidate dentry op
    nfsd: handle vfs_getattr errors in acl protocol
    switch vfs_getattr() to struct path
    default SET_PERSONALITY() in linux/elf.h
    ceph: prepopulate inodes only when request is aborted
    d_hash_and_lookup(): export, switch open-coded instances
    9p: switch v9fs_set_create_acl() to inode+fid, do it before d_instantiate()
    9p: split dropping the acls from v9fs_set_create_acl()
    ...

    Linus Torvalds
     
  • Use the new version of the encoding for osd requests and replies. In the
    process, update the way we are tracking request ops and reply lengths and
    results in the struct ceph_osd_request. Update the rbd and fs/ceph users
    appropriately.

    The main changes are:
    - we keep pointers into the request memory for fields we need to update
    each time the request is sent out over the wire
    - we keep information about the result in an array in the request struct
    where the users can easily get at it.

    Signed-off-by: Sage Weil
    Reviewed-by: Alex Elder

    Sage Weil
     
  • The only thing type-specific osd completion functions do with their
    osd op parameter is (in some cases) extract the number of bytes
    transferred from it. In the other cases, the xferred bytes field
    is not used, and total message data transfer byte count (which may
    well be zero) is used.

    Just set the object request transfer count in the main osd request
    callback function and provide that to the other routines. There is
    then no longer any need to pass the op pointer to the type-specific
    completion routines, so drop those parameters.

    Stop doing anything with the total message data length.

    Signed-off-by: Alex Elder
    Reviewed-by: Sage Weil

    Alex Elder
     
  • This function is slightly out of place, probably the result
    of an errant automatic merge or something.

    Signed-off-by: Alex Elder
    Reviewed-by: Sage Weil

    Alex Elder
     

26 Feb, 2013

5 commits

  • Signed-off-by: Al Viro

    Al Viro
     
  • Fengguang Wu reminded me that there were outstanding sparse reports
    in the ceph and rbd code. This patch fixes these problems in rbd
    that lead to those reports:
    - Convert functions that are never referenced externally to have
    static scope.
    - Add a lockdep annotation to rbd_request_fn(), because it
    releases a lock before acquiring it again.

    This partially resolves:
    http://tracker.ceph.com/issues/4184

    Reported-by: Fengguang Wu
    Signed-off-by: Alex Elder
    Reviewed-by: Josh Durgin

    Alex Elder
     
  • Add dout() calls to facilitate tracing of image and object requests.
    Change a few existing calls so they use __func__ rather than the
    hard-coded function name. Have calls always add ":" after the name
    of the function, and prefix pointer values with a consistent tag
    indicating what it represents. (Note that there remain some older
    dout() calls that are left untouched by this patch.)

    Issue a warning if rbd_osd_write_callback() ever gets a short write.

    This resolves:
    http://tracker.ceph.com/issues/4235

    Signed-off-by: Alex Elder
    Reviewed-by: Josh Durgin

    Alex Elder
     
  • Let's go shopping!

    I'm afraid this may not have gotten it right:
    07741308 rbd: add barriers near done flag operations

    The smp_wmb() should have been done *before* setting the done flag,
    to ensure all other data was valid before marking the object request
    done.

    Switch to use atomic_inc_return() here to set the done flag, which
    allows us to verify we don't mark something done more than once.
    Doing this also implies general barriers before and after the call.

    And although a read memory barrier might have been sufficient before
    reading the done flag, convert this to a full memory barrier just
    to put this issue to bed.
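
    A rough userspace analogue of this ordering, using C11 atomics rather
    than the kernel primitives, might look as follows. The struct and the
    sketch_ functions are hypothetical; atomic_fetch_add() plus one stands
    in for atomic_inc_return(), and the default sequentially consistent
    ordering is assumed to play the role of the implied general barriers.

```c
#include <assert.h>
#include <stdatomic.h>

struct sketch_obj_request {
	int result;       /* data that must be valid before "done" is seen */
	atomic_int done;  /* 0 = in flight, 1 = done */
};

/* Store the result first, then mark the request done.
 * Returns 1 if this call was the first to mark it done, 0 otherwise,
 * so a caller can detect a request being completed twice. */
static int sketch_mark_done(struct sketch_obj_request *req, int result)
{
	req->result = result;
	/* seq_cst read-modify-write: orders the store above before the flag. */
	int count = atomic_fetch_add(&req->done, 1) + 1;
	return count == 1;
}

/* Full fence before reading the flag, mirroring the choice of a full
 * memory barrier over a read barrier. */
static int sketch_test_done(struct sketch_obj_request *req)
{
	atomic_thread_fence(memory_order_seq_cst);
	return atomic_load(&req->done) != 0;
}
```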

    This resolves:
    http://tracker.ceph.com/issues/4238

    Signed-off-by: Alex Elder
    Reviewed-by: Josh Durgin

    Alex Elder
     
  • The old request code simply ignored zero-length requests. We should
    still operate that same way to avoid any changes in behavior. We
    can implement handling for special zero-length requests separately
    (see http://tracker.ceph.com/issues/4236).

    Add some assertions based on this new constraint.

    This resolves:
    http://tracker.ceph.com/issues/4237

    Signed-off-by: Alex Elder
    Reviewed-by: Josh Durgin

    Alex Elder
     

23 Feb, 2013

1 commit


22 Feb, 2013

7 commits

  • A negative offset may make the loop device size larger than the
    backing file size.

    $ fallocate -l 1M a
    $ losetup --offset 0xffffffffffff0000 /dev/loop0 a
    $ blockdev --getsize64 /dev/loop0
    1114112
    $ ls -l a
    -rw-r--r-- 1 root root 1048576 Jan 23 12:46 a
    $ cat /dev/loop0
    cat: /dev/loop0: Input/output error

    It makes no sense to do that. Only apply offset when it's positive.
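
    The arithmetic behind the numbers above can be checked with a small
    sketch. The helper below is hypothetical and is not the driver's size
    function; it assumes the offset is held as a signed 64-bit value, in
    which case 0xffffffffffff0000 reads as -65536, and that sizes are
    rounded down to whole 512-byte sectors.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: number of bytes a loop device would expose.
 * ignore_negative_offset = 0 models the old behavior, 1 the fixed one. */
static int64_t sketch_loop_bytes(int64_t file_size, int64_t offset,
				 int ignore_negative_offset)
{
	int64_t size = file_size;

	if (offset > 0 || !ignore_negative_offset)
		size -= offset;  /* a negative offset turns this into an addition */
	if (size < 0)
		return 0;
	return size - (size % 512);  /* whole 512-byte sectors only */
}
```

    With a 1048576-byte file and an offset of -65536, the old behavior
    gives 1048576 + 65536 = 1114112 bytes, matching the blockdev output
    above, while ignoring the negative offset gives 1048576.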

    Fix a typo in the comment by the way.

    Signed-off-by: Guo Chao
    Cc: Alexander Viro
    Cc: Guo Chao
    Cc: M. Hindess
    Cc: Nikanth Karthikesan
    Cc: Jens Axboe
    Signed-off-by: Andrew Morton
    Signed-off-by: Jens Axboe

    Guo Chao
     
  • When loopdev is built as a module and we pass an invalid parameter,
    loop_init() returns directly without deregistering the misc device, which
    causes an oops the next time the loop module is inserted, because we left
    some garbage in the misc device list.

    Test case:
    sudo modprobe loop max_part=1024
    (failed due to invalid parameter)
    sudo modprobe loop
    (oops)

    Clean up nicely to avoid such oops.
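
    The cleanup pattern involved is the usual goto-based unwind. The
    sketch below is a userspace model only: the sketch_ functions, the
    counter standing in for the misc device list, and the 256 limit are
    all assumptions made for illustration.

```c
#include <assert.h>
#include <errno.h>

#define SKETCH_MAX_PARTS 256  /* assumed upper bound, for illustration */

static int sketch_misc_registered;  /* models entries in the misc device list */

static int sketch_misc_register(void)
{
	sketch_misc_registered++;
	return 0;
}

static void sketch_misc_deregister(void)
{
	sketch_misc_registered--;
}

/* Register first, validate afterwards, and undo the registration on
 * every failure path instead of returning directly. */
static int sketch_loop_init(int max_part)
{
	int err = sketch_misc_register();

	if (err < 0)
		return err;

	if (max_part > SKETCH_MAX_PARTS) {
		err = -EINVAL;
		goto misc_out;
	}
	return 0;

misc_out:
	sketch_misc_deregister();
	return err;
}
```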

    Signed-off-by: Guo Chao
    Cc: Alexander Viro
    Cc: Guo Chao
    Cc: M. Hindess
    Cc: Nikanth Karthikesan
    Cc: Jens Axboe
    Signed-off-by: Andrew Morton
    Signed-off-by: Jens Axboe

    Guo Chao
     
  • Update block device size in accord with gendisk size and let userspace
    know the change in loop_figure_size(). This is a clean up to remove
    common code of loop_figure_size()'s two callers.

    Signed-off-by: Guo Chao
    Cc: Alexander Viro
    Cc: Guo Chao
    Cc: M. Hindess
    Cc: Nikanth Karthikesan
    Cc: Jens Axboe
    Signed-off-by: Andrew Morton
    Signed-off-by: Jens Axboe

    Guo Chao
     
  • The loop device driver sometimes fails to impose the size limit on the
    device. Repeatedly issue the following two commands:

    losetup --offset 7517244416 --sizelimit 3224971264 /dev/loop0 backed_file
    blockdev --getsize64 /dev/loop0

    blockdev reports the file size instead of the sizelimit several times
    out of 100.

    The problems are:

    - losetup sets up the device with two ioctls:
    LOOP_SET_FD and LOOP_SET_STATUS64.

    - LOOP_SET_STATUS64 only updates the size of the gendisk.

    The block device size is updated lazily when the device comes into use.
    If udev rushes in between the two ioctls, it brings in a block device
    whose size is the backing file size. If the device is not released after
    the LOOP_SET_STATUS64 ioctl, blockdev will not see the updated size.

    Update the block device size in the LOOP_SET_STATUS64 ioctl.

    Signed-off-by: Guo Chao
    Reported-by: M. Hindess
    Cc: Alexander Viro
    Cc: Guo Chao
    Cc: Nikanth Karthikesan
    Cc: Jens Axboe
    Signed-off-by: Andrew Morton
    Signed-off-by: Jens Axboe

    Guo Chao
     
  • bd_mutex and lo_ctl_mutex can be held in different order.

    Path #1:

    blkdev_open
    blkdev_get
    __blkdev_get (hold bd_mutex)
    lo_open (hold lo_ctl_mutex)

    Path #2:

    blkdev_ioctl
    lo_ioctl (hold lo_ctl_mutex)
    lo_set_capacity (hold bd_mutex)

    Lockdep does not report it, because path #2 actually holds a subclass of
    lo_ctl_mutex. This subclass seems to have crept into the code by mistake.
    The patch author actually just mentioned it in the changelog, see commit
    f028f3b2 ("loop: fix circular locking in loop_clr_fd()"), also see:

    http://marc.info/?l=linux-kernel&m=123806169129727&w=2

    Path #2 holds bd_mutex to call bd_set_size(). I've protected it
    with i_mutex in a previous patch, so drop bd_mutex at this site.

    Signed-off-by: Guo Chao
    Cc: Alexander Viro
    Cc: Guo Chao
    Cc: M. Hindess
    Cc: Nikanth Karthikesan
    Cc: Jens Axboe
    Signed-off-by: Andrew Morton
    Signed-off-by: Jens Axboe

    Guo Chao
     
  • The use of pointer fs should be after the null check.

    Signed-off-by: Cong Ding
    Cc: Jens Axboe
    Signed-off-by: Andrew Morton
    Signed-off-by: Jens Axboe

    Cong Ding
     
  • Pull driver core patches from Greg Kroah-Hartman:
    "Here is the big driver core merge for 3.9-rc1

    There are two major series here, both of which touch lots of drivers
    all over the kernel, and will cause you some merge conflicts:

    - add a new function called devm_ioremap_resource() to properly be
    able to check return values.

    - remove CONFIG_EXPERIMENTAL

    Other than those patches, there's not much here, some minor fixes and
    updates"

    Fix up trivial conflicts

    * tag 'driver-core-3.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (221 commits)
    base: memory: fix soft/hard_offline_page permissions
    drivercore: Fix ordering between deferred_probe and exiting initcalls
    backlight: fix class_find_device() arguments
    TTY: mark tty_get_device call with the proper const values
    driver-core: constify data for class_find_device()
    firmware: Ignore abort check when no user-helper is used
    firmware: Reduce ifdef CONFIG_FW_LOADER_USER_HELPER
    firmware: Make user-mode helper optional
    firmware: Refactoring for splitting user-mode helper code
    Driver core: treat unregistered bus_types as having no devices
    watchdog: Convert to devm_ioremap_resource()
    thermal: Convert to devm_ioremap_resource()
    spi: Convert to devm_ioremap_resource()
    power: Convert to devm_ioremap_resource()
    mtd: Convert to devm_ioremap_resource()
    mmc: Convert to devm_ioremap_resource()
    mfd: Convert to devm_ioremap_resource()
    media: Convert to devm_ioremap_resource()
    iommu: Convert to devm_ioremap_resource()
    drm: Convert to devm_ioremap_resource()
    ...

    Linus Torvalds
     

21 Feb, 2013

1 commit


20 Feb, 2013

11 commits

  • …git/konrad/xen into for-3.9/drivers

    Konrad writes:

    Please git pull the following branch:

    git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen.git stable/for-jens-3.9

    which has bug-fixes that did not make it in v3.8. They all are marked as
    material for the stable tree as well. There are two bug-fixes for
    the code that has been in there for some time (that is Jan's fix
    and one of mine). And there are two bug-fixes for the persistent grant
    feature that debuted in v3.8 for xen blk[back|front]end.

    Jens Axboe
     
  • Alex Elder
     
  • The return values provided for ceph_copy_to_page_vector() and
    ceph_copy_from_page_vector() serve no purpose, so get rid of them.

    Signed-off-by: Alex Elder
    Reviewed-by: Josh Durgin

    Alex Elder
     
  • The result of ceph_copy_from_page_vector() is simply the length
    argument it is provided.

    This is called by rbd_obj_method_sync(), which returns the result if
    it's non-negative. But we always either ignore or overwrite that
    return value. So explicitly ignore what's returned by the copy
    function, and have rbd_obj_method_sync() always return either a
    negative errno or 0.

    We also return the result of ceph_copy_from_page_vector() in
    rbd_obj_read_sync(). There we still want to return the number of
    bytes transferred, but we can use the value we already have in hand
    rather than what ceph_copy_from_page_vector() provides.

    Signed-off-by: Alex Elder
    Reviewed-by: Josh Durgin

    Alex Elder
     
  • In rbd_obj_read_sync(), verify the number of bytes transferred won't
    exceed what can be represented by a size_t before using it to
    indicate the number of bytes to copy to the result buffer.

    (The real motivation for this is to prepare for the next patch.)

    Signed-off-by: Alex Elder
    Reviewed-by: Josh Durgin

    Alex Elder
     
  • Add support for CEPH_OSD_OP_STAT operations in the osd client
    and in rbd.

    This operation sends no data to the osd; everything required is
    encoded in identity of the target object.

    The result will be ENOENT if the object doesn't exist. If it does
    exist and no other error occurs, the server returns the size and last
    modification time of the target object as output data (in little
    endian format). The size is a 64-bit unsigned integer and the time is
    a ceph_timespec structure (two unsigned 32-bit integers, representing
    a seconds and a nanoseconds value).
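
    Decoding that output layout could look roughly like the sketch below.
    The struct and function names are hypothetical and this is not the
    osd client code; it only assumes the 16-byte layout described above
    (a little-endian 64-bit size followed by two little-endian 32-bit
    values).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct sketch_stat {
	uint64_t size;     /* object size in bytes */
	uint32_t tv_sec;   /* modification time, seconds */
	uint32_t tv_nsec;  /* modification time, nanoseconds */
};

/* Read a little-endian 32-bit value regardless of host byte order. */
static uint32_t sketch_get_le32(const unsigned char *p)
{
	return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
	       ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Read a little-endian 64-bit value as two 32-bit halves. */
static uint64_t sketch_get_le64(const unsigned char *p)
{
	return (uint64_t)sketch_get_le32(p) |
	       ((uint64_t)sketch_get_le32(p + 4) << 32);
}

/* Returns 0 on success, -1 if the buffer is too short to hold the reply. */
static int sketch_decode_stat(const unsigned char *buf, size_t len,
			      struct sketch_stat *out)
{
	if (len < 16)
		return -1;
	out->size = sketch_get_le64(buf);
	out->tv_sec = sketch_get_le32(buf + 8);
	out->tv_nsec = sketch_get_le32(buf + 12);
	return 0;
}
```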

    This resolves:
    http://tracker.ceph.com/issues/4007

    Signed-off-by: Alex Elder
    Reviewed-by: Josh Durgin

    Alex Elder
     
  • The for_each_obj_request*() macros should parenthesize their uses of
    the ireq parameter.
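
    Why the parentheses matter can be shown with a simplified iteration
    macro. This is a sketch only: the struct and macro below are
    hypothetical and much simpler than the real for_each_obj_request*()
    macros, which walk a list rather than an array.

```c
#include <assert.h>

struct sketch_img_request {
	int nobjs;
	int obj_len[4];
};

/* Parenthesizing (ireq) and (i) lets callers pass arbitrary expressions.
 * Written as "ireq->nobjs", the argument "imgs + 1" would expand to
 * "imgs + 1->nobjs", which does not compile. */
#define sketch_for_each_obj(ireq, i) \
	for ((i) = 0; (i) < (ireq)->nobjs; (i)++)

/* Sum the object lengths of the second image request in an array. */
static int sketch_second_total(const struct sketch_img_request *imgs)
{
	int i, total = 0;

	sketch_for_each_obj(imgs + 1, i)
		total += (imgs + 1)->obj_len[i];
	return total;
}
```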

    Signed-off-by: Alex Elder
    Reviewed-by: Josh Durgin

    Alex Elder
     
  • With the current persistent grants implementation we are not freeing the
    persistent grants after we disconnect the device. Since grant map
    operations change the mfn of the allocated page, and we can no longer
    pass it to __free_page without setting the mfn to a sane value, use
    balloon grant pages instead, as the gntdev device does.

    Signed-off-by: Roger Pau Monné
    Cc: stable@vger.kernel.org
    Cc: Konrad Rzeszutek Wilk
    Signed-off-by: Konrad Rzeszutek Wilk

    Roger Pau Monne
     
  • Replace llist_for_each_entry_safe with a while loop.

    llist_for_each_entry_safe can trigger a bug in GCC 4.1, so it's best
    to remove it and use a while loop and do the deletion manually.

    Specifically this bug can be triggered by hot-unplugging a disk, either
    by doing xm block-detach or by save/restore cycle.

    BUG: unable to handle kernel paging request at fffffffffffffff0
    IP: [] blkif_free+0x63/0x130 [xen_blkfront]
    The crash call trace is:
    ...
    bad_area_nosemaphore+0x13/0x20
    do_page_fault+0x25e/0x4b0
    page_fault+0x25/0x30
    ? blkif_free+0x63/0x130 [xen_blkfront]
    blkfront_resume+0x46/0xa0 [xen_blkfront]
    xenbus_dev_resume+0x6c/0x140
    pm_op+0x192/0x1b0
    device_resume+0x82/0x1e0
    dpm_resume+0xc9/0x1a0
    dpm_resume_end+0x15/0x30
    do_suspend+0x117/0x1e0

    When drilling down to the assembler code, on newer GCC it does
    .L29:
    cmpq $-16, %r12 #, persistent_gnt check
    je .L30 #, out of the loop
    .L25:
    ... code in the loop
    testq %r13, %r13 # n
    je .L29 #, back to the top of the loop
    cmpq $-16, %r12 #, persistent_gnt check
    movq 16(%r12), %r13 # .node.next, n
    jne .L25 #, back to the top of the loop
    .L30:

    While on GCC 4.1, it is:
    .L78:
    ... code in the loop
    testq %r13, %r13 # n
    je .L78 #, back to the top of the loop
    movq 16(%rbx), %r13 # .node.next, n
    jmp .L78 #, back to the top of the loop

    Which basically means that the loop exit condition, instead of
    being:

    &(pos)->member != NULL;

    is:
    ;

    which makes the loop unbounded.

    Since xen-blkfront is the only user of the llist_for_each_entry_safe
    macro remove it from llist.h.
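
    The manual replacement loop can be sketched in userspace as follows.
    All sketch_ names are hypothetical and this is not the xen-blkfront
    code. The point is that the loop tests the node pointer itself against
    NULL, rather than the address of a member inside the containing entry,
    and reads the next link before the entry is freed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct sketch_llist_node {
	struct sketch_llist_node *next;
};

struct sketch_grant {
	unsigned long pfn;
	int gref;
	struct sketch_llist_node node;  /* deliberately not the first member */
};

/* Recover the containing entry from a pointer to its embedded node. */
#define sketch_entry(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Allocate an entry and link it in front of head; returns the new head. */
static struct sketch_llist_node *sketch_push(struct sketch_llist_node *head,
					     int gref)
{
	struct sketch_grant *g = malloc(sizeof(*g));

	assert(g != NULL);
	g->pfn = 0;
	g->gref = gref;
	g->node.next = head;
	return &g->node;
}

/* Free every entry; returns how many were freed. */
static int sketch_free_all(struct sketch_llist_node *head)
{
	int freed = 0;

	while (head != NULL) {
		struct sketch_grant *g =
			sketch_entry(head, struct sketch_grant, node);

		head = head->next;  /* read the link before freeing */
		free(g);
		freed++;
	}
	return freed;
}
```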

    Orabug: 16263164
    CC: stable@vger.kernel.org
    Signed-off-by: Konrad Rzeszutek Wilk

    Konrad Rzeszutek Wilk
     
  • The 'handle' is the device that the request is from. For the life-time
    of the ring we copy it from a request to a response so that the frontend
    is not surprised by it. But we do not need it - when we start processing
    I/Os we have our own 'struct phys_req' which has only the most essential
    information about the request. In fact the 'vbd_translate' ends up
    over-writing the preq.dev with a value from the backend.

    This assignment of preq.dev with the 'handle' value is superfluous
    so let's not do it.

    Cc: stable@vger.kernel.org
    Acked-by: Jan Beulich
    Acked-by: Ian Campbell
    Signed-off-by: Konrad Rzeszutek Wilk

    Konrad Rzeszutek Wilk
     
  • "be->mode" is obtained from xenbus_read(), which does a kmalloc() for
    the message body. The short string is never released, so do it along
    with freeing "be" itself, and make sure the string isn't kept when
    backend_changed() doesn't complete successfully (which made it
    desirable to slightly re-structure that function, so that the error
    cleanup can be done in one place).

    Reported-by: Olaf Hering
    CC: stable@vger.kernel.org
    Signed-off-by: Jan Beulich
    Signed-off-by: Konrad Rzeszutek Wilk

    Jan Beulich
     

19 Feb, 2013

1 commit

  • This patch includes the following driver fixes for the
    IBM RamSan 70/80 driver:

    o Changed the creg_ctrl lock from a mutex to a spinlock.
    o Added a count check for ioctl calls.
    o Removed unnecessary casting of void pointers.
    o Made every function static that needed to be.
    o Added comments to explain things more thoroughly.

    Signed-off-by: Philip J Kelleher
    Signed-off-by: Jens Axboe

    Philip J Kelleher