05 Oct, 2009

1 commit

  • It was briefly introduced to allow CFQ to do delayed scheduling,
    but we ended up removing that feature again. So let's kill the
    function and export, and just switch CFQ back to the normal work
    schedule since it is now passing in a '0' delay from all call
    sites.

    Signed-off-by: Jens Axboe

    Jens Axboe
     

04 Oct, 2009

1 commit

  • Not all users of the topology information want to use libblkid. Provide
    the topology information through bdev ioctls.

    Also clarify sector size comments for existing BLK ioctls.

    Signed-off-by: Martin K. Petersen
    Signed-off-by: Jens Axboe

    Martin K. Petersen
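
    As a quick illustration, here is a minimal userspace sketch of querying
    the new interface. The ioctl names (BLKPBSZGET, BLKIOMIN, BLKIOOPT and
    BLKALIGNOFF, alongside the pre-existing BLKSSZGET) are the ones mainline
    ended up with; treat the details as assumptions, not a reference.

        /* Sketch: read block device topology via the bdev ioctls. */
        #include <stdio.h>
        #include <fcntl.h>
        #include <unistd.h>
        #include <sys/ioctl.h>
        #include <linux/fs.h>

        int main(int argc, char **argv)
        {
                unsigned int lbs = 0, pbs = 0, io_min = 0, io_opt = 0;
                int align = 0, fd;

                if (argc < 2 || (fd = open(argv[1], O_RDONLY)) < 0)
                        return 1;

                ioctl(fd, BLKSSZGET, &lbs);     /* logical block size */
                ioctl(fd, BLKPBSZGET, &pbs);    /* physical block size */
                ioctl(fd, BLKIOMIN, &io_min);   /* minimum I/O size */
                ioctl(fd, BLKIOOPT, &io_opt);   /* optimal I/O size */
                ioctl(fd, BLKALIGNOFF, &align); /* alignment offset */

                printf("logical=%u physical=%u io_min=%u io_opt=%u align=%d\n",
                       lbs, pbs, io_min, io_opt, align);
                close(fd);
                return 0;
        }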
     

03 Oct, 2009

1 commit


02 Oct, 2009

2 commits

  • Currently we set the bio size to the byte equivalent of the blocks to
    be trimmed when submitting the initial DISCARD ioctl. That means it
    is subject to the max_hw_sectors limitation of the HBA which is
    much lower than the size of a DISCARD request we can support.
    Add a separate max_discard_sectors tunable to limit the size for discard
    requests.

    We limit the max discard request size in bytes to 32 bits as that is the
    limit for bio->bi_size. This could be much larger if we had a way to pass
    that information through the block layer.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     
  • prepare_discard_fn() was being called in a place where memory allocation
    was effectively impossible. This makes it inappropriate for all but
    the most trivial translations of Linux's DISCARD operation to the block
    command set. Additionally, adding a payload there makes the ownership
    of the bio's backing unclear, as it's now allocated by the device driver
    and not by the submitter as usual.

    It is replaced with QUEUE_FLAG_DISCARD which is used to indicate whether
    the queue supports discard operations or not. blkdev_issue_discard now
    allocates a one-page, sector-length payload which is the right thing
    for the common ATA and SCSI implementations.

    The mtd implementation of prepare_discard_fn() is replaced with simply
    checking for the request being a discard.

    Largely based on a previous patch from Matthew Wilcox,
    which handled the prepare_discard_fn change but not yet the different
    payload allocation.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Christoph Hellwig
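
    A hedged sketch of how a driver might advertise discard support after
    the two commits above; the helper and flag names follow the description
    (blk_queue_max_discard_sectors(), QUEUE_FLAG_DISCARD), and the chosen
    limit is only illustrative.

        /* Sketch: enable discard support on a driver's request queue. */
        #include <linux/kernel.h>
        #include <linux/blkdev.h>

        static void mydrv_setup_discard(struct request_queue *q)
        {
                /* Tell the block layer the device handles discards. */
                queue_flag_set_unlocked(QUEUE_FLAG_DISCARD, q);

                /* Allow discards larger than max_hw_sectors; expressed in
                 * bytes the limit still has to fit the 32-bit bio->bi_size,
                 * hence at most UINT_MAX >> 9 sectors. */
                blk_queue_max_discard_sectors(q, UINT_MAX >> 9);
        }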
     

14 Sep, 2009

2 commits

  • blk_ioctl_discard duplicates large amounts of code from blkdev_issue_discard,
    the only difference between the two is that blkdev_issue_discard needs to
    send a barrier discard request and blk_ioctl_discard a non-barrier one,
    and blk_ioctl_discard needs to wait on the request. To facilitate this,
    add a flags argument to blkdev_issue_discard to control both aspects of the
    behaviour. This will be very useful later on for making the waiting
    functionality available to other callers.

    Based on an earlier patch from Matthew Wilcox.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     
  • Implement blk_limits_io_opt() and make blk_queue_io_opt() a wrapper
    around it. DM needs this to avoid poking at the queue_limits directly.

    Signed-off-by: Martin K. Petersen
    Signed-off-by: Mike Snitzer
    Signed-off-by: Jens Axboe

    Martin K. Petersen
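
    The split is small enough to sketch in full; roughly, the wrapper
    arrangement described above looks like this (the io_opt field name is
    taken from the earlier topology work, so consider it an assumption):

        /* Sketch: DM can set io_opt on a bare struct queue_limits, while
         * drivers that own a request queue keep using the wrapper. */
        void blk_limits_io_opt(struct queue_limits *limits, unsigned int opt)
        {
                limits->io_opt = opt;
        }

        void blk_queue_io_opt(struct request_queue *q, unsigned int opt)
        {
                blk_limits_io_opt(&q->limits, opt);
        }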
     

11 Sep, 2009

5 commits

  • Test results here look good, and on big OLTP runs it has also been
    shown to significantly increase cycles attributed to the database and
    to provide a performance boost.

    Signed-off-by: Jens Axboe

    Jens Axboe
     
  • Instead of just checking whether this device uses block layer
    tagging, we can improve the detection by looking at the maximum
    queue depth it has reached. If that crosses 4, then deem it a
    queuing device.

    This is important on high IOPS devices, since plugging hurts
    the performance there (it can be as much as 10-15% of the sys
    time).

    Signed-off-by: Jens Axboe

    Jens Axboe
     
  • Get rid of any functions that test for these bits and make callers
    use bio_rw_flagged() directly. Then it is at least directly apparent
    what variable and flag they check.

    Signed-off-by: Jens Axboe

    Jens Axboe
     
  • Failfast has different characteristics from other attributes. When
    issuing, executing and successfully completing requests, failfast
    doesn't make any difference. It only affects how a request is handled
    on failure. Allowing requests with different failfast settings to be
    merged causes normal IOs to fail prematurely, while not allowing it
    has performance penalties, as failfast is used for read aheads which
    are likely to be located near in-flight or to-be-issued normal IOs.

    This patch introduces the concept of 'mixed merge'. A request is a
    mixed merge if it is a merge of segments which require different
    handling on failure. Currently the only mixable attributes are
    failfast ones (or lack thereof).

    When a bio with different failfast settings is added to an existing
    request or requests of different failfast settings are merged, the
    merged request is marked mixed. Each bio carries failfast settings
    and the request always tracks failfast state of the first bio. When
    the request fails, blk_rq_err_bytes() can be used to determine how
    many bytes can be safely failed without crossing into an area which
    requires further retries.

    This allows request merging regardless of failfast settings while
    keeping the failure handling correct.

    This patch only implements mixed merge but doesn't enable it. The
    next one will update SCSI to make use of mixed merge.

    Signed-off-by: Tejun Heo
    Cc: Niel Lambrechts
    Signed-off-by: Jens Axboe

    Tejun Heo
     
  • bio and request use the same set of failfast bits. This patch makes
    the following changes to simplify things.

    * enumify BIO_RW* bits and reorder bits such that BIO_RW_FAILFAST_*
    bits coincide with __REQ_FAILFAST_* bits.

    * The above pushes BIO_RW_AHEAD out of sync with __REQ_FAILFAST_DEV
    but the matching is useless anyway. init_request_from_bio() is
    responsible for setting FAILFAST bits on FS requests and non-FS
    requests never use BIO_RW_AHEAD. Drop the code and comment from
    blk_rq_bio_prep().

    * Define REQ_FAILFAST_MASK which is OR of all FAILFAST bits and
    simplify FAILFAST flags handling in init_request_from_bio().

    Signed-off-by: Tejun Heo
    Signed-off-by: Jens Axboe

    Tejun Heo
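
    With the bio and request failfast bits lined up, the FAILFAST handling
    in init_request_from_bio() reduces to a single masked OR. A hedged
    sketch (the helper name is hypothetical; REQ_FAILFAST_MASK is the mask
    this commit adds):

        #include <linux/bio.h>
        #include <linux/blkdev.h>

        /* Sketch: BIO_RW_FAILFAST_* and __REQ_FAILFAST_* now occupy the
         * same bit positions, so propagating the flags from a bio to its
         * request is one AND/OR against REQ_FAILFAST_MASK. */
        static void propagate_failfast(struct request *req, struct bio *bio)
        {
                req->cmd_flags |= bio->bi_rw & REQ_FAILFAST_MASK;
        }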
     

01 Aug, 2009

1 commit


12 Jul, 2009

1 commit

  • Move the definition of BLK_RW_ASYNC/BLK_RW_SYNC into linux/backing-dev.h
    so that it is available to all callers of set/clear_bdi_congested().

    This replaces commit 097041e576ee3a50d92dd643ee8ca65bf6a62e21 ("fuse:
    Fix build error"), which will be reverted.

    Signed-off-by: Trond Myklebust
    Acked-by: Larry Finger
    Cc: Jens Axboe
    Cc: Miklos Szeredi
    Signed-off-by: Linus Torvalds

    Trond Myklebust
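
    After the move, callers of the congestion helpers only need
    linux/backing-dev.h; a minimal hedged sketch (the helper name is
    hypothetical):

        #include <linux/backing-dev.h>

        /* Sketch: clear both congestion states on a backing device using
         * the BLK_RW_* constants now visible from backing-dev.h alone. */
        static void mydrv_uncongest(struct backing_dev_info *bdi)
        {
                clear_bdi_congested(bdi, BLK_RW_SYNC);
                clear_bdi_congested(bdi, BLK_RW_ASYNC);
        }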
     

11 Jul, 2009

2 commits

  • I overlooked SG_DXFER_TO_FROM_DEV support when I converted sg to use
    the block layer mapping API (2.6.28).

    Douglas Gilbert explained SG_DXFER_TO_FROM_DEV:

    http://www.spinics.net/lists/linux-scsi/msg37135.html

    =
    The semantics of SG_DXFER_TO_FROM_DEV were:
    - copy user space buffer to kernel (LLD) buffer
    - do SCSI command which is assumed to be of the DATA_IN
    (data from device) variety. This would overwrite
    some or all of the kernel buffer
    - copy kernel (LLD) buffer back to the user space.

    The idea was to detect short reads by filling the original
    user space buffer with some marker bytes ("0xec" it would
    seem in this report). The "resid" value is a better way
    of detecting short reads but that was only added this century
    and requires co-operation from the LLD.
    =

    This patch changes the block layer mapping API to support this
    semantics. This simply adds another field to struct rq_map_data and
    enables __bio_copy_iov() to copy data from user space even with READ
    requests.

    It would be better to add a flags field and kill the null_mapped and
    the new from_user fields in struct rq_map_data, but that approach makes
    it difficult to send this patch to stable trees because the st and osst
    drivers use struct rq_map_data (they were converted to use the block
    layer in 2.6.29 and 2.6.30). Well, I should clean up the block layer
    mapping API.

    zhou sf reported this regression and tested this patch:

    http://www.spinics.net/lists/linux-scsi/msg37128.html
    http://www.spinics.net/lists/linux-scsi/msg37168.html

    Reported-by: zhou sf
    Tested-by: zhou sf
    Cc: stable@kernel.org
    Signed-off-by: FUJITA Tomonori
    Signed-off-by: Jens Axboe

    FUJITA Tomonori
     
  • Commit 1faa16d22877f4839bd433547d770c676d1d964c accidentally broke
    the bdi congestion wait queue logic, causing us to wait on congestion
    for WRITE (== 1) when we really wanted BLK_RW_ASYNC (== 0) instead.

    Signed-off-by: Jens Axboe

    Jens Axboe
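
    The confusion is easy to reproduce in callers of congestion_wait() as
    well; a hedged sketch of the kind of change this implies (the timeout
    value is illustrative):

        #include <linux/jiffies.h>
        #include <linux/backing-dev.h>

        static void wait_for_async_congestion(void)
        {
                /* before (wrong): waited keyed on WRITE (== 1)
                 *     congestion_wait(WRITE, HZ / 50);
                 * after: wait on the async congestion state (== 0) */
                congestion_wait(BLK_RW_ASYNC, HZ / 50);
        }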
     

01 Jul, 2009

1 commit

  • The initial patches to support this through sysfs export were broken
    and have been #if 0'ed out in every release. So let's just kill the
    code and reclaim some space in struct request_queue. If anyone later
    wants to fix up the sysfs bits, the git history can easily restore the
    removed bits.

    Signed-off-by: Jens Axboe

    Jens Axboe
     

16 Jun, 2009

1 commit


13 Jun, 2009

1 commit

  • * 'for-2.6.31' of git://git.kernel.org/pub/scm/linux/kernel/git/bart/ide-2.6: (29 commits)
    ide: re-implement ide_pci_init_one() on top of ide_pci_init_two()
    ide: unexport ide_find_dma_mode()
    ide: fix PowerMac bootup oops
    ide: skip probe if there are no devices on the port (v2)
    sl82c105: add printk() logging facility
    ide-tape: fix proc warning
    ide: add IDE_DFLAG_NIEN_QUIRK device flag
    ide: respect quirk_drives[] list on all controllers
    hpt366: enable all quirks for devices on quirk_drives[] list
    hpt366: sync quirk_drives[] list with pdc202xx_{new,old}.c
    ide: remove superfluous SELECT_MASK() call from do_rw_taskfile()
    ide: remove superfluous SELECT_MASK() call from ide_driveid_update()
    icside: remove superfluous ->maskproc method
    ide-tape: fix IDE_AFLAG_* atomic accesses
    ide-tape: change IDE_AFLAG_IGNORE_DSC non-atomically
    pdc202xx_old: kill resetproc() method
    pdc202xx_old: don't call pdc202xx_reset() on IRQ timeout
    pdc202xx_old: use ide_dma_test_irq()
    ide: preserve Host Protected Area by default (v2)
    ide-gd: implement block device ->set_capacity method (v2)
    ...

    Linus Torvalds
     

11 Jun, 2009

1 commit

  • This patch adds the following 2 interfaces for request-stacking drivers:

    - blk_rq_prep_clone(struct request *clone, struct request *orig,
                        struct bio_set *bs, gfp_t gfp_mask,
                        int (*bio_ctr)(struct bio *, struct bio *, void *),
                        void *data)
      * Clones bios in the original request to the clone request
        (bio_ctr is called for each cloned bio.)
      * Copies attributes of the original request to the clone request.
        The actual data parts (e.g. ->cmd, ->buffer, ->sense) are not
        copied.

    - blk_rq_unprep_clone(struct request *clone)
      * Frees cloned bios from the clone request.

    Request stacking drivers (e.g. request-based dm) need to make a clone
    request for a submitted request and dispatch it to other devices.

    To allocate a request for the clone, request stacking drivers may not
    be able to use blk_get_request() because the allocation may be done
    in an irq-disabled context.
    So blk_rq_prep_clone() takes a request allocated by the caller
    as an argument.

    For each clone bio in the clone request, request stacking drivers
    should be able to set up their own completion handler.
    So blk_rq_prep_clone() takes a callback function which is called
    for each clone bio, and a pointer for private data which is passed
    to the callback.

    NOTE:
    blk_rq_prep_clone() doesn't copy any actual data of the original
    request. Pages are shared between original bios and cloned bios.
    So the caller must not complete the original request before the clone
    request.

    Signed-off-by: Kiyoshi Ueda
    Signed-off-by: Jun'ichi Nomura
    Cc: Boaz Harrosh
    Signed-off-by: Jens Axboe

    Kiyoshi Ueda
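
    A hedged sketch of how a request-stacking driver (request-based dm is
    the in-tree example) might use the pair; the mydrv_* names are
    illustrative, only blk_rq_prep_clone()/blk_rq_unprep_clone() come from
    this commit.

        #include <linux/bio.h>
        #include <linux/blkdev.h>

        /* Per-clone-bio completion handler installed via bio_ctr. */
        static void mydrv_clone_bio_endio(struct bio *clone, int error)
        {
                /* Driver-specific handling of the cloned bio would go here. */
        }

        static int mydrv_bio_ctr(struct bio *clone, struct bio *orig, void *data)
        {
                clone->bi_end_io  = mydrv_clone_bio_endio;
                clone->bi_private = data;       /* driver context */
                return 0;
        }

        /* 'clone' is allocated by the caller, since blk_get_request() may
         * not be usable in an irq-disabled context. */
        static int mydrv_setup_clone(struct request *clone, struct request *orig,
                                     struct bio_set *bs, void *ctx)
        {
                return blk_rq_prep_clone(clone, orig, bs, GFP_ATOMIC,
                                         mydrv_bio_ctr, ctx);
        }

    When the clone finishes, blk_rq_unprep_clone(clone) frees the cloned
    bios; because pages are shared, the original request is only completed
    after that.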
     

09 Jun, 2009

1 commit


07 Jun, 2009

1 commit

  • * Add a ->set_capacity block device method and use it in rescan_partitions()
    to attempt enabling the native capacity of the device upon detecting a
    partition which exceeds the device capacity.

    * Add a GENHD_FL_NATIVE_CAPACITY flag to limit attempts at enabling the
    native capacity during the partition scan.

    Together with the following patch implementing the ->set_capacity method in
    the ide-gd device driver, this allows automatic disabling of the Host
    Protected Area (HPA) if any partitions overlapping the HPA are detected.

    Cc: Robert Hancock
    Cc: Frans Pop
    Cc: "Andries E. Brouwer"
    Acked-by: Al Viro
    Emphatically-Acked-by: Alan Cox
    Signed-off-by: Bartlomiej Zolnierkiewicz

    Bartlomiej Zolnierkiewicz
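
    A hedged sketch of the driver side; the method signature follows the
    description above and the driver name is made up. A real implementation
    (ide-gd in the follow-up patch) would try to disable the HPA here and
    report the native capacity it obtained.

        #include <linux/module.h>
        #include <linux/fs.h>
        #include <linux/genhd.h>

        static unsigned long long mydrv_set_capacity(struct gendisk *disk,
                                                     unsigned long long capacity)
        {
                /* Placeholder: attempt to expose the native capacity and
                 * return whatever the device ends up reporting.  Here we
                 * just return the current capacity unchanged. */
                return get_capacity(disk);
        }

        static struct block_device_operations mydrv_fops = {
                .owner          = THIS_MODULE,
                .set_capacity   = mydrv_set_capacity,
        };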
     

03 Jun, 2009

1 commit

  • blk_queue_bounce_limit() is more than a wrapper around the request queue
    limits.bounce_pfn variable. Introduce blk_queue_bounce_pfn() which can
    be called by stacking drivers that wish to set the bounce limit
    explicitly.

    Signed-off-by: Martin K. Petersen
    Signed-off-by: Jens Axboe

    Martin K. Petersen
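
    As a sketch, a stacking driver would use the new call to mirror a lower
    device's bounce limit onto its own queue, bypassing the DMA heuristics
    in blk_queue_bounce_limit(); the helper name here is hypothetical.

        #include <linux/blkdev.h>

        /* Sketch: propagate a lower queue's bounce_pfn when stacking. */
        static void mydrv_stack_bounce(struct request_queue *q,
                                       struct request_queue *lower)
        {
                blk_queue_bounce_pfn(q, lower->limits.bounce_pfn);
        }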
     

23 May, 2009

5 commits

  • To support devices with physical block sizes bigger than 512 bytes we
    need to ensure proper alignment. This patch adds support for exposing
    I/O topology characteristics as devices are stacked.

    logical_block_size is the smallest unit the device can address.

    physical_block_size indicates the smallest I/O the device can write
    without incurring a read-modify-write penalty.

    The io_min parameter is the smallest preferred I/O size reported by
    the device. In many cases this is the same as the physical block
    size. However, the io_min parameter can be scaled up when stacking
    (RAID5 chunk size > physical block size).

    The io_opt characteristic indicates the optimal I/O size reported by
    the device. This is usually the stripe width for arrays.

    The alignment_offset parameter indicates the number of bytes the start
    of the device/partition is offset from the device's natural alignment.
    Partition tools and MD/DM utilities can use this to pad their offsets
    so filesystems start on proper boundaries.

    Signed-off-by: Martin K. Petersen
    Signed-off-by: Jens Axboe

    Martin K. Petersen
     
  • To accommodate stacking drivers that do not have an associated request
    queue we're moving the limits to a separate, embedded structure.

    Signed-off-by: Martin K. Petersen
    Signed-off-by: Jens Axboe

    Martin K. Petersen
     
  • Convert all external users of queue limits to using wrapper functions
    instead of poking the request queue variables directly.

    Signed-off-by: Martin K. Petersen
    Signed-off-by: Jens Axboe

    Martin K. Petersen
     
  • Until now we have had a 1:1 mapping between the storage device's
    physical block size and the logical block size used when addressing
    the device. With SATA 4KB drives coming out, that will no longer be
    the case. The physical sector size will be 4KB but the logical block
    size will remain 512 bytes. Hence we need to distinguish between the
    physical block size and its logical counterpart.

    This patch renames hardsect_size to logical_block_size.

    Signed-off-by: Martin K. Petersen
    Signed-off-by: Jens Axboe

    Martin K. Petersen
     
  • Conflicts:
    drivers/block/hd.c
    drivers/block/mg_disk.c

    Signed-off-by: Jens Axboe

    Jens Axboe
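
    Tying the topology commits above together, a hedged sketch of a driver
    registering these characteristics through the new wrappers; the values
    describe a hypothetical 4KB-physical-sector device sitting on a RAID5
    array with a 64KB chunk and a 256KB stripe width.

        #include <linux/blkdev.h>

        static void mydrv_set_topology(struct request_queue *q)
        {
                blk_queue_logical_block_size(q, 512);    /* addressing unit   */
                blk_queue_physical_block_size(q, 4096);  /* smallest w/o RMW  */
                blk_queue_alignment_offset(q, 0);        /* starts aligned    */
                blk_queue_io_min(q, 65536);              /* preferred minimum */
                blk_queue_io_opt(q, 262144);             /* stripe width      */
        }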
     

20 May, 2009

1 commit


19 May, 2009

2 commits

  • OSD was the last in-tree user of blk_rq_append_bio(). Now that it is
    fixed, blk_rq_append_bio() is un-exported and is only used internally
    by the block layer.

    Signed-off-by: Boaz Harrosh
    Signed-off-by: Jens Axboe

    Boaz Harrosh
     
  • New block API:
    given a struct bio, allocate a new request. This is the parallel of
    generic_make_request() for users of BLOCK_PC commands.

    The passed bio may be a chained bio. The bio is bounced if needed
    inside the call to this member.

    This is in the effort of un-exporting blk_rq_append_bio().

    Signed-off-by: Boaz Harrosh
    CC: Jeff Garzik
    Signed-off-by: Jens Axboe

    Boaz Harrosh
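
    In mainline this API landed as blk_make_request(); the name and the
    ERR_PTR-style return are assumptions here, so treat the sketch as
    illustrative only.

        #include <linux/err.h>
        #include <linux/bio.h>
        #include <linux/blkdev.h>

        /* Sketch: turn a (possibly chained) bio into a BLOCK_PC request.
         * Bouncing of the bio happens inside the call, per the text. */
        static struct request *mydrv_bio_to_pc_req(struct request_queue *q,
                                                   struct bio *bio)
        {
                struct request *rq = blk_make_request(q, bio, GFP_KERNEL);

                if (IS_ERR(rq))
                        return rq;

                rq->cmd_type = REQ_TYPE_BLOCK_PC;
                return rq;
        }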
     

11 May, 2009

7 commits

  • Let's put the completion-related functions back in block/blk-core.c
    where they have lived. We can also unexport blk_end_bidi_request() and
    __blk_end_bidi_request(), which nobody uses.

    Signed-off-by: FUJITA Tomonori
    Signed-off-by: Jens Axboe

    FUJITA Tomonori
     
  • blk_end_request_all() and __blk_end_request_all() should finish all
    bytes including bidi, by definition. That's what all bidi users need;
    bidi requests must be completed as a whole (partial completion is
    impossible).

    Signed-off-by: FUJITA Tomonori
    Signed-off-by: Jens Axboe

    FUJITA Tomonori
     
  • Till now the block layer allowed two separate modes of request execution.
    A request is always acquired from the request queue via
    elv_next_request(). After that, drivers are free to either dequeue it
    or process it without dequeueing. Dequeue allows elv_next_request()
    to return the next request so that multiple requests can be in flight.

    Executing requests without dequeueing has its merits, mostly in
    allowing drivers for simpler devices which can't do sg to deal only
    with segments without considering request boundaries. However, the
    benefit this brings is dubious and declining while the cost of the API
    ambiguity is increasing. Segment-based drivers are usually for very
    old or limited devices, and as converting them to the dequeueing model
    isn't difficult, it doesn't justify the API overhead it puts on the
    block layer and its more modern users.

    Previous patches converted all block low level drivers to dequeueing
    model. This patch completes the API transition by...

    * renaming elv_next_request() to blk_peek_request()

    * renaming blkdev_dequeue_request() to blk_start_request()

    * adding blk_fetch_request() which is combination of peek and start

    * disallowing completion of queued (not started) requests

    * applying new API to all LLDs

    Renamings are for consistency and to break out of tree code so that
    it's apparent that out of tree drivers need updating.

    [ Impact: block request issue API cleanup, no functional change ]

    Signed-off-by: Tejun Heo
    Cc: Rusty Russell
    Cc: James Bottomley
    Cc: Mike Miller
    Cc: unsik Kim
    Cc: Paul Clements
    Cc: Tim Waugh
    Cc: Geert Uytterhoeven
    Cc: David S. Miller
    Cc: Laurent Vivier
    Cc: Jeff Garzik
    Cc: Jeremy Fitzhardinge
    Cc: Grant Likely
    Cc: Adrian McMenamin
    Cc: Stephen Rothwell
    Cc: Bartlomiej Zolnierkiewicz
    Cc: Borislav Petkov
    Cc: Sergei Shtylyov
    Cc: Alex Dubov
    Cc: Pierre Ossman
    Cc: David Woodhouse
    Cc: Markus Lidel
    Cc: Stefan Weinhuber
    Cc: Martin Schwidefsky
    Cc: Pete Zaitcev
    Cc: FUJITA Tomonori
    Signed-off-by: Jens Axboe

    Tejun Heo
     
  • Block low level drivers have for some reason been pretty good at
    abusing the block layer API. Especially struct request's fields tend to
    get violated in all possible ways. Make it clear that low level
    drivers MUST NOT access or manipulate rq->sector and rq->data_len
    directly by prefixing them with double underscores.

    This change is also necessary to break the build of out-of-tree code
    which assumes the previous block API, where internal fields can be
    manipulated and rq->data_len carries the residual count on completion.

    [ Impact: hide internal fields, block API change ]

    Signed-off-by: Tejun Heo
    Signed-off-by: Jens Axboe

    Tejun Heo
     
  • struct request has had a few different ways to represent some
    properties of a request. ->hard_* represent the block layer's view of
    the request progress (completion cursor) and the ones without the
    prefix are supposed to represent the issue cursor and are allowed to be
    updated as necessary by the low level drivers. The thing is that as
    the block layer supports partial completion, the two cursors really
    aren't necessary and only cause confusion. In addition, manual
    management of request details from low level drivers is cumbersome and
    error-prone at the very least.

    Another interesting set of duplicate fields is rq->[hard_]nr_sectors
    and rq->{hard_cur|current}_nr_sectors versus rq->data_len and
    rq->bio->bi_size. This is more convoluted than the hard_ case.

    rq->[hard_]nr_sectors are initialized for requests with a bio but
    blk_rq_bytes() uses them only for !pc requests. rq->data_len is
    initialized for all requests but blk_rq_bytes() uses it only for pc
    requests. This causes a good amount of confusion throughout the block
    layer and its drivers, and determining the request length has been a
    bit of black magic which may or may not work depending on circumstances
    and what the specific LLD is actually doing.

    rq->{hard_cur|current}_nr_sectors represent the number of sectors in
    the contiguous data area at the front. This is mainly used by drivers
    which transfer data by walking the request segment-by-segment. This
    value always equals rq->bio->bi_size >> 9. However, the data length
    for pc requests may not be a multiple of 512 bytes and using this field
    becomes a bit confusing.

    In general, having multiple fields to represent the same property
    leads only to confusion and subtle bugs. With recent block low level
    driver cleanups, no driver is accessing or manipulating these
    duplicate fields directly. Drop all the duplicates. Now rq->sector
    means the current sector, rq->data_len the current total length and
    rq->bio->bi_size the current segment length. Everything else is
    defined in terms of these three and available only through accessors.

    * blk_recalc_rq_sectors() is collapsed into blk_update_request() and
    now handles pc and fs requests equally other than rq->sector update.
    This means that now pc requests can use partial completion too (no
    in-kernel user yet, though).

    * bio_cur_sectors() is replaced with bio_cur_bytes() as block layer
    now uses byte count as the primary data length.

    * blk_rq_pos() is now guaranteed to be always correct. In-block users
    converted.

    * blk_rq_bytes() is now guaranteed to be always valid as is
    blk_rq_sectors(). In-block users converted.

    * blk_rq_sectors() is now guaranteed to equal blk_rq_bytes() >> 9.
    Whichever is more convenient is used.

    * blk_rq_bytes() and blk_rq_cur_bytes() are now inlined and take const
    pointer to request.

    [ Impact: API cleanup, single way to represent one property of a request ]

    Signed-off-by: Tejun Heo
    Cc: Boaz Harrosh
    Signed-off-by: Jens Axboe

    Tejun Heo
     
  • Implement accessors - blk_rq_pos(), blk_rq_sectors() and
    blk_rq_cur_sectors() which return rq->hard_sector, rq->hard_nr_sectors
    and rq->hard_cur_sectors respectively and convert direct references of
    the said fields to the accessors.

    This is in preparation of request data length handling cleanup.

    Geert : suggested adding const to struct request * parameter to accessors
    Sergei : spotted error in patch description

    [ Impact: cleanup ]

    Signed-off-by: Tejun Heo
    Acked-by: Geert Uytterhoeven
    Acked-by: Stephen Rothwell
    Tested-by: Grant Likely
    Acked-by: Grant Likely
    Acked-by: Sergei Shtylyov
    Cc: Bartlomiej Zolnierkiewicz
    Cc: Borislav Petkov
    Cc: James Bottomley
    Signed-off-by: Jens Axboe

    Tejun Heo
     
  • rq->data_len served two purposes - the length of data buffer on issue
    and the residual count on completion. This duality creates some
    headaches.

    First of all, the block layer and low level drivers can't really
    determine what rq->data_len contains while a request is executing. It
    could be the total request length or it could be anything else one of
    the lower layers is using to keep track of the residual count. This
    complicates things because blk_rq_bytes() and thus
    [__]blk_end_request_all() rely on rq->data_len for PC commands.
    Drivers which want to report residual count should first cache the
    total request length, update rq->data_len and then complete the
    request with the cached data length.

    Secondly, it makes requests default to reporting the full residual
    count, ie. reporting that no data transfer occurred. The residual
    count is the exception, not the norm; however, the driver must clear
    rq->data_len to zero to signify the normal case, while leaving it
    alone means that no data transfer occurred at all. This reverse
    default behavior complicates code unnecessarily and renders block PC
    on some drivers (ide-tape/floppy) unusable.

    This patch adds rq->resid_len which is used only for residual count.

    While at it, remove the now unnecessary blk_rq_bytes() caching in
    ide_pc_intr() as rq->data_len is not changed anymore.

    Boaz : spotted missing conversion in osd
    Sergei : spotted too early conversion to blk_rq_bytes() in ide-tape

    [ Impact: cleanup residual count handling, report 0 resid by default ]

    Signed-off-by: Tejun Heo
    Cc: James Bottomley
    Cc: Bartlomiej Zolnierkiewicz
    Cc: Borislav Petkov
    Cc: Sergei Shtylyov
    Cc: Mike Miller
    Cc: Eric Moore
    Cc: Alan Stern
    Cc: FUJITA Tomonori
    Cc: Doug Gilbert
    Cc: Mike Miller
    Cc: Eric Moore
    Cc: Darrick J. Wong
    Cc: Pete Zaitcev
    Cc: Boaz Harrosh
    Signed-off-by: Jens Axboe

    Tejun Heo
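
    Pulling the 11 May changes together, a hedged sketch of a simple LLD's
    request function under the new API (blk_fetch_request(), the
    blk_rq_pos()/blk_rq_cur_bytes() accessors, and rq->resid_len); the
    transfer routine is a placeholder.

        #include <linux/blkdev.h>

        /* Placeholder for the real data transfer; returns bytes moved. */
        static unsigned int mydrv_xfer(struct request *rq, sector_t pos,
                                       unsigned int bytes)
        {
                return bytes;
        }

        /* Called with the queue lock held, hence __blk_end_request_all(). */
        static void mydrv_request_fn(struct request_queue *q)
        {
                struct request *rq;

                while ((rq = blk_fetch_request(q)) != NULL) { /* peek + start */
                        unsigned int done;

                        done = mydrv_xfer(rq, blk_rq_pos(rq),    /* current sector  */
                                          blk_rq_cur_bytes(rq)); /* current segment */

                        /* Residual count is explicit now and defaults to 0. */
                        rq->resid_len = blk_rq_bytes(rq) - done;

                        __blk_end_request_all(rq, done ? 0 : -EIO);
                }
        }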
     

03 May, 2009

1 commit


01 May, 2009

1 commit

  • The original patch (dfa4411cc3a690011cab90e9a536938795366cf9) was
    buggy. This is a more proper fix which introduces a blk_rq_quiet()
    macro, alleviating the need for dumb, too-short caching variables.

    Thanks to Helge Deller and Bart for debugging this.

    Signed-off-by: Borislav Petkov
    Cc: Jens Axboe
    Cc: Sergei Shtylyov
    Reported-and-tested-by: Helge Deller
    Signed-off-by: Bartlomiej Zolnierkiewicz

    Borislav Petkov
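
    The bug class being fixed is caching rq->cmd_flags in a variable too
    narrow to hold REQ_QUIET; a hedged sketch of the failure mode and the
    macro (its exact body is an assumption consistent with the description):

        #include <linux/kernel.h>
        #include <linux/blkdev.h>

        /* Assumed definition: test the flag on the request directly
         * instead of on a narrow cached copy of cmd_flags. */
        #define blk_rq_quiet(rq)        ((rq)->cmd_flags & REQ_QUIET)

        static void mydrv_report_error(struct request *rq, int err)
        {
                /* Broken pattern: caching cmd_flags in a too-short local
                 * (e.g. a u8) silently drops the REQ_QUIET bit. */
                if (!blk_rq_quiet(rq))
                        printk(KERN_ERR "mydrv: request failed: %d\n", err);
        }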