15 Oct, 2010

1 commit


14 Oct, 2010

1 commit


02 Oct, 2010

3 commits


01 Oct, 2010

7 commits

  • o Randy Dunlap reported the following linux-next failure. This patch fixes it.

    on i386:

    blk-throttle.c:(.text+0x1abb8): undefined reference to `__udivdi3'
    blk-throttle.c:(.text+0x1b1dc): undefined reference to `__udivdi3'

    o The bytes_per_second interface is 64-bit, and I was still doing 64-bit
    division on 32-bit platforms without the help of the special
    macros/functions, hence the failure.

    Signed-off-by: Vivek Goyal
    Reported-by: Randy Dunlap
    Signed-off-by: Jens Axboe

    Vivek Goyal
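As a rough illustration of the failure above: on i386 a plain `/` on 64-bit operands is lowered to a libgcc call (`__udivdi3`) that the kernel does not link, which is why kernel code goes through division helpers such as do_div(). The userspace sketch below is not the patch itself. `div_u64_by_u32` and `bytes_allowed` are hypothetical names, and the shift-and-subtract loop only stands in for what such a helper does when a 64-bit value is divided by a 32-bit one.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: divide a 64-bit value by a 32-bit divisor using
 * shift-and-subtract long division, so no '/' or '%' is applied to a
 * 64-bit operand.
 */
static uint64_t div_u64_by_u32(uint64_t n, uint32_t d, uint32_t *rem)
{
    uint64_t q = 0;
    uint64_t r = 0;

    assert(d != 0);
    for (int i = 63; i >= 0; i--) {
        r = (r << 1) | ((n >> i) & 1);   /* bring down the next bit */
        if (r >= d) {
            r -= d;
            q |= (uint64_t)1 << i;
        }
    }
    if (rem)
        *rem = (uint32_t)r;
    return q;
}

/*
 * Bytes a group may dispatch in `elapsed` ticks at `bps` bytes per second.
 * Assumption: bps * elapsed does not overflow 64 bits.
 */
static uint64_t bytes_allowed(uint64_t bps, uint32_t elapsed, uint32_t hz)
{
    return div_u64_by_u32(bps * elapsed, hz, NULL);
}
```

For example, with a 1 MiB/s limit, 50 elapsed ticks and 100 ticks per second, `bytes_allowed(1048576, 50, 100)` yields half a second's worth of budget.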
     
  • o Currently any cgroup throttle limit changes are processed asynchronously and
    the change does not take effect until a new bio is dispatched from the same group.

    o It might happen that a user sets a ridiculously low limit on throttling,
    say 1 byte per second on reads. In such cases simple operations like mounting
    a disk can wait for a very long time.

    o Once a bio is throttled, there is no easy way to come out of that wait even if
    the user increases the read limit later.

    o This patch fixes it. Now if a user changes the cgroup limits, we recalculate
    the bio dispatch time according to new limits.

    o We can't take the queue lock under blkcg_lock, hence after the change I wake
    up the dispatch thread again, which recalculates the time. So there are some
    variables being synchronized across two threads without a lock and I had to
    make use of barriers. Hoping I have used barriers correctly. Any review of
    the memory barrier code especially will help.

    Signed-off-by: Vivek Goyal
    Signed-off-by: Jens Axboe

    Vivek Goyal
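The lock-free handoff described above can be sketched in userspace with C11 atomics. This is an analogue only: the kernel uses its own barrier primitives rather than `<stdatomic.h>`, and every name here (`throtl_group_sketch`, `update_limit`, `process_limit_change`, `calc_wait_ms`) is hypothetical. The writer publishes the new limit and then sets a flag with release ordering. The dispatcher clears the flag with acquire ordering and recomputes the wait time from the new limit.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

struct throtl_group_sketch {
    _Atomic uint64_t bps;        /* current limit, bytes per second */
    atomic_bool limits_changed;  /* set by the writer, cleared by the dispatcher */
    uint64_t queued_bytes;       /* size of the throttled bio */
    uint64_t wait_ms;            /* time until the bio may be dispatched */
};

static uint64_t calc_wait_ms(uint64_t bytes, uint64_t bps)
{
    assert(bps != 0);
    return bytes * 1000 / bps;
}

/* Writer side (limit change): store the limit first, then publish the flag. */
static void update_limit(struct throtl_group_sketch *tg, uint64_t new_bps)
{
    atomic_store_explicit(&tg->bps, new_bps, memory_order_relaxed);
    atomic_store_explicit(&tg->limits_changed, true, memory_order_release);
}

/* Dispatcher side: returns true if the wait time was recalculated. */
static bool process_limit_change(struct throtl_group_sketch *tg)
{
    if (!atomic_exchange_explicit(&tg->limits_changed, false,
                                  memory_order_acquire))
        return false;
    tg->wait_ms = calc_wait_ms(tg->queued_bytes,
                               atomic_load_explicit(&tg->bps,
                                                    memory_order_relaxed));
    return true;
}
```

With a 4096-byte bio throttled at 1 byte per second, the initial wait is over an hour. After `update_limit(&tg, 1048576)` the next `process_limit_change(&tg)` brings it down to a few milliseconds.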
     
  • o Currently all the dynamically allocated groups, except the root group, are
    added to td->tg_list. This was not a problem so far, but in the next patch I
    will traverse td->tg_list to process any updates of limits on the group.
    If the root group is not in tg_list, then the root group's updates are not
    processed.

    o It is better to add the root group to tg_list as well instead of doing
    special processing for it during limit updates.

    Signed-off-by: Vivek Goyal
    Signed-off-by: Jens Axboe

    Vivek Goyal
     
  • o Now a cgroup list of blkg elements can contain blkg from multiple policies.
    Before sending an unlink event, make sure blkg belongs to the policy. If
    policy does not own the blkg, do not send update for this blkg.

    Signed-off-by: Vivek Goyal
    Signed-off-by: Jens Axboe

    Vivek Goyal
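A minimal sketch of the ownership check described above, assuming each group records the id of the policy that owns it. The types and names (`blkg_sketch`, `policy_id_sketch`, `send_unlink_events`) are hypothetical stand-ins, not the kernel's structures, and the counter stands in for invoking the policy's unlink callback.

```c
#include <assert.h>
#include <stddef.h>

enum policy_id_sketch { POLICY_PROP = 0, POLICY_THROTL = 1 };

struct blkg_sketch {
    enum policy_id_sketch plid;   /* policy that owns this group */
    struct blkg_sketch *next;
};

/* Walk a mixed list and notify `plid` only about the groups it owns. */
static size_t send_unlink_events(const struct blkg_sketch *head,
                                 enum policy_id_sketch plid)
{
    size_t sent = 0;

    for (const struct blkg_sketch *g = head; g; g = g->next) {
        if (g->plid != plid)
            continue;   /* owned by another policy: no event */
        sent++;         /* stand-in for the policy's unlink callback */
    }
    return sent;
}
```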
     
  • Currently throttling-related files are visible even if the user has disabled
    throttling using config options. Disabling it switched off background
    throttling of bios but not the cgroup files. This patch fixes it.

    Signed-off-by: Vivek Goyal
    Signed-off-by: Jens Axboe

    Vivek Goyal
     
  • The bounce_pfn of the request queue on 64-bit systems is set to the
    current max_low_pfn. Adding more memory later makes this incorrect.
    Memory allocated beyond this boot-time max_low_pfn appears to require
    bounce buffers (bounce buffers are not actually allocated, but they are used
    in calculating segments, which may result in "over max segments limit"
    errors).

    Signed-off-by: Malahal Naineni
    Signed-off-by: Jens Axboe

    Malahal Naineni
     
  • Revert "block: set the bounce_pfn to the actual DMA limit rather than to max memory"

    This reverts commit c49825facfd4969585224a896a5e717f88450cad.

    Signed-off-by: Jens Axboe

    Jens Axboe
     

25 Sep, 2010

2 commits


24 Sep, 2010

1 commit

  • During long I/O operations, the hang_check timer may fire and
    trigger stack dumps that unnecessarily alarm the user.

    E.g. hdparm --security-erase NULL /dev/sdb ## can take *hours* to complete

    So, if hang_check is armed, we should wake up periodically
    to prevent it from triggering. This patch uses a wake-up interval
    equal to half the hang_check timer period, which keeps overhead low enough.

    Signed-off-by: Mark Lord
    Signed-off-by: Jens Axboe

    Mark Lord
     

20 Sep, 2010

1 commit

  • Fsync performance for small files achieved by cfq on high-end disks is
    lower than what deadline can achieve, due to idling introduced between
    the sync write happening in process context and the journal commit.

    Moreover, when competing with a sequential reader, a process writing
    small files and fsync-ing them is starved.

    This patch fixes the two problems by:
    - marking journal commits as WRITE_SYNC, so that they get the REQ_NOIDLE
    flag set,
    - force all queues that have REQ_NOIDLE requests to be put in the noidle
    tree.

    Having the queue associated to the fsync-ing process and the one associated
    to journal commits in the noidle tree allows:
    - switching between them without idling,
    - fairness vs. competing idling queues, since they will be serviced only
    after the noidle tree expires its slice.

    Acked-by: Vivek Goyal
    Reviewed-by: Jeff Moyer
    Tested-by: Jeff Moyer
    Signed-off-by: Corrado Zoccolo
    Signed-off-by: Jens Axboe

    Corrado Zoccolo
     

17 Sep, 2010

2 commits

  • When CONFIG_BLOCK is not enabled:

    init/do_mounts.c:71: error: implicit declaration of function 'dev_to_part'
    init/do_mounts.c:71: warning: initialization makes pointer from integer without a cast
    init/do_mounts.c:73: error: dereferencing pointer to incomplete type
    init/do_mounts.c:76: error: dereferencing pointer to incomplete type
    init/do_mounts.c:76: error: dereferencing pointer to incomplete type
    init/do_mounts.c:102: error: implicit declaration of function 'part_pack_uuid'
    init/do_mounts.c:104: error: 'block_class' undeclared (first use in this function)

    Reported-by: Randy Dunlap
    Signed-off-by: Jens Axboe

    Jens Axboe
     
  • When a new disk is being discovered, add_disk() first ties the bdev to gendisk
    (via register_disk()->blkdev_get()) and only after that calls
    bdi_register_bdev(). Because register_disk() also creates disk's kobject, it
    can happen that userspace manages to open and modify the device's data (or
    inode) before its BDI is properly initialized leading to a warning in
    __mark_inode_dirty().

    Fix the problem by registering BDI early enough.

    This patch addresses https://bugzilla.kernel.org/show_bug.cgi?id=16312

    Cc: stable@kernel.org
    Reported-by: Larry Finger
    Signed-off-by: Jan Kara
    Signed-off-by: Jens Axboe

    Jan Kara
     

16 Sep, 2010

9 commits


15 Sep, 2010

4 commits

  • This is the third patch in a series which adds support for
    storing partition metadata, optionally, off of the hd_struct.

    One major use for that data is being able to resolve a partition
    by identities other than just its index on a block device. Device
    enumeration varies by platform and there's a benefit to being able
    to use something like EFI GPT's GUIDs to determine the correct
    block device and partition to mount as the root.

    This change adds that support to root= by adding support for
    the following syntax:

    root=PARTUUID=hex-uuid

    Signed-off-by: Will Drewry
    Signed-off-by: Jens Axboe

    Will Drewry
     
  • This change extends the partition_meta_info structure to
    support EFI GPT-specific metadata and ensures that data
    is copied in on partition scanning.

    Signed-off-by: Will Drewry
    Signed-off-by: Jens Axboe

    Will Drewry
     
  • I'm reposting this patch series as v4 since there have been no additional
    comments, and I cleaned up one extra bit of unneeded code (in 3/3). The patches
    are against Linus's tree: 2bfc96a127bc1cc94d26bfaa40159966064f9c8c
    (2.6.36-rc3).

    Would this patchset be suitable for inclusion in an mm branch?

    This change adds a partition_meta_info struct which itself contains a
    union of structures that provide partition table specific metadata.

    This change leaves the union empty. The subsequent patch includes an
    implementation for CONFIG_EFI_PARTITION-based metadata.

    Signed-off-by: Will Drewry
    Signed-off-by: Jens Axboe

    Will Drewry
     
  • Change the type of the 2nd parameter of blk_rq_aligned() to unsigned long
    and remove unnecessary casting. Now we can call it with 'uaddr'
    instead of 'ubuf' in __blk_rq_map_user(), which removes the
    following warnings from sparse:

    block/blk-map.c:57:31: warning: incorrect type in argument 2 (different address spaces)
    block/blk-map.c:57:31: expected void *addr
    block/blk-map.c:57:31: got void [noderef] *ubuf

    However blk_rq_map_kern() needs one more local variable to handle it.

    Signed-off-by: Namhyung Kim
    Signed-off-by: Jens Axboe

    Namhyung Kim
     

14 Sep, 2010

1 commit


11 Sep, 2010

3 commits

  • When sending DIX integrity segments with an I/O request, the
    restriction for the maximum number of segments is still the same for
    the zfcp hardware. Report the new sg_prot_tablesize for the SCSI host,
    so that the number of integrity segments plus the number of data
    segments is not larger than the hardware limit. This results in using
    half of the hardware segments for integrity data and the other half
    for regular data.

    Reviewed-by: Swen Schillig
    Signed-off-by: Christof Schmitt
    Signed-off-by: Jens Axboe

    Christof Schmitt
     
  • Some controllers have a hardware limit on the number of protection
    information scatter-gather list segments they can handle.

    Introduce a max_integrity_segments limit in the block layer and provide
    a new scsi_host_template setting that allows HBA drivers to provide a
    value suitable for the hardware.

    Add support for honoring the integrity segment limit when merging both
    bios and requests.

    Signed-off-by: Martin K. Petersen
    Signed-off-by: Jens Axboe

    Martin K. Petersen
     
  • We have several users of min_not_zero, each of them using their own
    definition. Move the define to kernel.h.

    Signed-off-by: Martin K. Petersen
    Signed-off-by: Jens Axboe

    Martin K. Petersen
     

23 Aug, 2010

5 commits