29 Mar, 2009

2 commits

  • …nux/kernel/git/tip/linux-2.6-tip

    * 'percpu-cpumask-x86-for-linus-2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (682 commits)
    percpu: fix spurious alignment WARN in legacy SMP percpu allocator
    percpu: generalize embedding first chunk setup helper
    percpu: more flexibility for @dyn_size of pcpu_setup_first_chunk()
    percpu: make x86 addr <-> pcpu ptr conversion macros generic
    linker script: define __per_cpu_load on all SMP capable archs
    x86: UV: remove uv_flush_tlb_others() WARN_ON
    percpu: finer grained locking to break deadlock and allow atomic free
    percpu: move fully free chunk reclamation into a work
    percpu: move chunk area map extension out of area allocation
    percpu: replace pcpu_realloc() with pcpu_mem_alloc() and pcpu_mem_free()
    x86, percpu: setup reserved percpu area for x86_64
    percpu, module: implement reserved allocation and use it for module percpu variables
    percpu: add an indirection ptr for chunk page map access
    x86: make embedding percpu allocator return excessive free space
    percpu: use negative for auto for pcpu_setup_first_chunk() arguments
    percpu: improve first chunk initial area map handling
    percpu: cosmetic renames in pcpu_setup_first_chunk()
    percpu: clean up percpu constants
    x86: un-__init fill_pud/pmd/pte
    x86: remove vestigial fix_ioremap prototypes
    ...

    Manually merge conflicts in arch/ia64/kernel/irq_ia64.c

    Linus Torvalds
     
  • * git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6: (119 commits)
    [SCSI] scsi_dh_rdac: Retry for NOT_READY check condition
    [SCSI] mpt2sas: make global symbols unique
    [SCSI] sd: Make revalidate less chatty
    [SCSI] sd: Try READ CAPACITY 16 first for SBC-2 devices
    [SCSI] sd: Refactor sd_read_capacity()
    [SCSI] mpt2sas v00.100.11.15
    [SCSI] mpt2sas: add MPT2SAS_MINOR(221) to miscdevice.h
    [SCSI] ch: Add scsi type modalias
    [SCSI] 3w-9xxx: add power management support
    [SCSI] bsg: add linux/types.h include to bsg.h
    [SCSI] cxgb3i: fix function descriptions
    [SCSI] libiscsi: fix possible null ptr session command cleanup
    [SCSI] iscsi class: remove host no argument from session creation callout
    [SCSI] libiscsi: pass session failure a session struct
    [SCSI] iscsi lib: remove qdepth param from iscsi host allocation
    [SCSI] iscsi lib: have lib create work queue for transmitting IO
    [SCSI] iscsi class: fix lock dep warning on logout
    [SCSI] libiscsi: don't cap queue depth in iscsi modules
    [SCSI] iscsi_tcp: replace scsi_debug/tcp_debug logging with iscsi conn logging
    [SCSI] libiscsi_tcp: replace tcp_debug/scsi_debug logging with session/conn logging
    ...

    Linus Torvalds
     

28 Mar, 2009

1 commit


26 Mar, 2009

2 commits


24 Mar, 2009

3 commits

  • Currently, as inherited from sg.c, bsg submits asynchronous requests
    at the head of the queue (using "at_head" set in the call to
    blk_execute_rq_nowait()). This is bad when the queues are full:
    requests execute out of order, and the first submitted requests
    can starve.

    The sg_io_v4->flags member is used, and a bit is allocated to denote
    Q_AT_TAIL. Zero means queue at head as before, for compatibility with
    old code on the write/read path. The SG_IO code path was changed to
    behave the same as the write/read path. SG_IO was very rarely used,
    so breaking compatibility with it is OK at this stage.

    sg_io_hdr in sg.h also has a flags member and uses 3 bits from the
    first nibble and one bit from the last nibble. Even though none of
    these bits are supported by bsg, the second nibble is allocated for
    use by bsg, just in case.

    Signed-off-by: Boaz Harrosh
    CC: Douglas Gilbert
    Signed-off-by: Jens Axboe

    Boaz Harrosh
     
  • Signed-off-by: Jens Axboe

    Jens Axboe
     
  • It calls blk_queue_make_request(), which sets the identical set of limits.

    Signed-off-by: Jens Axboe

    Jens Axboe
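
The head-versus-tail choice in the bsg commit can be modelled in a few lines. This is a hypothetical Python sketch, not kernel code: the constant name mirrors BSG_FLAG_Q_AT_TAIL, and its value 0x10 (the lowest bit of the second nibble of sg_io_v4->flags) is an assumption for illustration. submit_all() is a made-up helper that inserts each request the way the at_head argument of blk_execute_rq_nowait() is described as doing.

```python
# Hypothetical model of the bsg queue-placement flag; not kernel code.
# The name mirrors BSG_FLAG_Q_AT_TAIL; the value 0x10 (lowest bit of the
# second nibble of sg_io_v4->flags) is an assumption for illustration.
BSG_FLAG_Q_AT_TAIL = 0x10


def at_head(flags: int) -> bool:
    """Zero keeps the old default (queue at head); the tail bit asks for FIFO."""
    return not (flags & BSG_FLAG_Q_AT_TAIL)


def submit_all(requests, flags):
    """Submit requests in order and return the resulting queue order,
    inserting at the head or the tail as an at_head argument would."""
    queue = []
    for rq in requests:
        if at_head(flags):
            queue.insert(0, rq)   # head insertion: newest request runs first
        else:
            queue.append(rq)      # tail insertion: submission order is kept
    return queue
```

With flags set to 0, submit_all(["r1", "r2", "r3"], 0) returns ["r3", "r2", "r1"], the out-of-order behaviour the commit describes; setting the tail bit keeps submission order.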
     

13 Mar, 2009

1 commit


06 Mar, 2009

1 commit

  • Commit 1e42807918d17e8c93bf14fbb74be84b141334c1 introduced a bug where we
    don't get the front/back segment sizes in the bio in blk_recount_segments().
    Fix this by tracking the back bio as well as the front bio in
    __blk_recalc_rq_segments(). This also cleans up the interface by getting
    rid of the segment-size pointer passing.

    Tested-by: Thomas Gleixner
    Tested-by: Ingo Molnar
    Signed-off-by: Jens Axboe

    Jens Axboe
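
The front/back bookkeeping can be sketched as follows. This is a simplified, hypothetical Python model rather than the kernel's __blk_recalc_rq_segments(): each bio is assumed to be a list of (phys_addr, length) vectors, and "mergeable" is reduced to "physically contiguous and within max_segment_size". The point it shows is that the front size comes from the first segment of the front bio and the back size from the last segment of the back bio, so both ends of the chain have to be tracked.

```python
def recalc_segments(bios, max_segment_size=65536):
    """Return (nr_segments, front_size, back_size) for a chain of bios.

    Each bio is a list of (phys_addr, length) vectors. Vectors are merged
    into one physical segment when they are physically contiguous and the
    merged size stays within max_segment_size (a simplified stand-in for
    the kernel's real mergeability checks).
    """
    sizes = []        # size of each physical segment, in order
    prev_end = None   # physical address just past the previous vector
    for bio in bios:  # walk from the front bio to the back bio
        for addr, length in bio:
            if (prev_end == addr and sizes
                    and sizes[-1] + length <= max_segment_size):
                sizes[-1] += length        # extend the current segment
            else:
                sizes.append(length)       # start a new segment
            prev_end = addr + length
    if not sizes:
        return 0, 0, 0
    # front size comes from the front bio, back size from the back bio
    return len(sizes), sizes[0], sizes[-1]
```

For two contiguous 4096-byte vectors followed by a separate 512-byte bio, the model reports two segments with a front size of 8192 and a back size of 512.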
     

04 Mar, 2009

1 commit


26 Feb, 2009

2 commits

  • blk_recalc_rq_segments() requires a request structure to be passed in,
    which blk_recount_segments() does not have. So the latter allocates one
    on the stack, using more than 400 bytes of stack for it. From ext4, at
    least, this can push us over one page of stack:

            Depth  Size  Location
         0)  4560   400  blk_recount_segments+0x43/0x62
         1)  4160    32  bio_phys_segments+0x1c/0x24
         2)  4128    32  blk_rq_bio_prep+0x2a/0xf9
         3)  4096    32  init_request_from_bio+0xf9/0xfe
         4)  4064   112  __make_request+0x33c/0x3f6
         5)  3952   144  generic_make_request+0x2d1/0x321
         6)  3808    64  submit_bio+0xb9/0xc3
         7)  3744    48  submit_bh+0xea/0x10e
         8)  3696   368  ext4_mb_init_cache+0x257/0xa6a [ext4]
         9)  3328   288  ext4_mb_regular_allocator+0x421/0xcd9 [ext4]
        10)  3040   160  ext4_mb_new_blocks+0x211/0x4b4 [ext4]
        11)  2880   336  ext4_ext_get_blocks+0xb61/0xd45 [ext4]
        12)  2544    96  ext4_get_blocks_wrap+0xf2/0x200 [ext4]
        13)  2448    80  ext4_da_get_block_write+0x6e/0x16b [ext4]
        14)  2368   352  mpage_da_map_blocks+0x7e/0x4b3 [ext4]
        15)  2016   352  ext4_da_writepages+0x2ce/0x43c [ext4]
        16)  1664    32  do_writepages+0x2d/0x3c
        17)  1632   144  __writeback_single_inode+0x162/0x2cd
        18)  1488    96  generic_sync_sb_inodes+0x1e3/0x32b
        19)  1392    16  sync_sb_inodes+0xe/0x10
        20)  1376    48  writeback_inodes+0x69/0xb3
        21)  1328   208  balance_dirty_pages_ratelimited_nr+0x187/0x2f9
        22)  1120   224  generic_file_buffered_write+0x1d4/0x2c4
        23)   896   176  __generic_file_aio_write_nolock+0x35f/0x393
        24)   720    80  generic_file_aio_write+0x6c/0xc8
        25)   640    80  ext4_file_write+0xa9/0x137 [ext4]
        26)   560   320  do_sync_write+0xf0/0x137
        27)   240    48  vfs_write+0xb3/0x13c
        28)   192    64  sys_write+0x4c/0x74
        29)   128   128  system_call_fastpath+0x16/0x1b

    Split the segment counting out into a __blk_recalc_rq_segments() helper
    to avoid allocating an on-stack request just to check the physical
    segment count.

    Signed-off-by: Jens Axboe

    Jens Axboe
     
  • Add documentation for the register_blkdev() function and its parameters.

    Signed-off-by: Márton Németh
    Cc: Greg Kroah-Hartman
    Signed-off-by: Jens Axboe

    Márton Németh
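
In the ext4 stack trace in the blk_recount_segments() commit, each Depth value is the running total of the Size column counted from the outermost frame (system_call_fastpath) inward. The short Python check below reproduces that column from the sizes in the trace and lists the frames whose depth exceeds one page. The 4096-byte page size is an assumption about the machine the trace came from; the frame names and sizes are copied from the trace.

```python
# Frame sizes in bytes, copied from the stack trace above, innermost first.
FRAMES = [
    ("blk_recount_segments", 400),
    ("bio_phys_segments", 32),
    ("blk_rq_bio_prep", 32),
    ("init_request_from_bio", 32),
    ("__make_request", 112),
    ("generic_make_request", 144),
    ("submit_bio", 64),
    ("submit_bh", 48),
    ("ext4_mb_init_cache", 368),
    ("ext4_mb_regular_allocator", 288),
    ("ext4_mb_new_blocks", 160),
    ("ext4_ext_get_blocks", 336),
    ("ext4_get_blocks_wrap", 96),
    ("ext4_da_get_block_write", 80),
    ("mpage_da_map_blocks", 352),
    ("ext4_da_writepages", 352),
    ("do_writepages", 32),
    ("__writeback_single_inode", 144),
    ("generic_sync_sb_inodes", 96),
    ("sync_sb_inodes", 16),
    ("writeback_inodes", 48),
    ("balance_dirty_pages_ratelimited_nr", 208),
    ("generic_file_buffered_write", 224),
    ("__generic_file_aio_write_nolock", 176),
    ("generic_file_aio_write", 80),
    ("ext4_file_write", 80),
    ("do_sync_write", 320),
    ("vfs_write", 48),
    ("sys_write", 64),
    ("system_call_fastpath", 128),
]

PAGE_SIZE = 4096  # assumption: 4 KiB pages


def depths(frames):
    """Return (function, depth) pairs, innermost first, where depth is the
    frame's own size plus every frame outside it (the trace's Depth column)."""
    total = 0
    result = []
    for name, size in reversed(frames):  # accumulate from the outermost frame
        total += size
        result.append((name, total))
    result.reverse()
    return result


def over_limit(frames, limit=PAGE_SIZE):
    """Names of the frames whose depth exceeds limit."""
    return [name for name, depth in depths(frames) if depth > limit]
```

Only the three innermost frames end up beyond 4096 bytes, and the 400-byte blk_recount_segments() frame alone takes the depth from 4160 to 4560 bytes, which is the on-stack request the commit removes.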
     

25 Feb, 2009

1 commit


20 Feb, 2009

1 commit


18 Feb, 2009

4 commits

  • blk_abort_queue() iterates the timeout list and aborts each request on the
    list, but if the driver's error handling re-adds a request to the timeout
    list during this processing, we could loop forever. Fix this by splicing
    the current entries onto a local list and walking that list instead.

    Signed-off-by: Jens Axboe

    Hannes Reinecke
     
  • Hi Tejun,

    it looks like your commit:

    block: don't depend on consecutive minor space
    f331c0296f2a9fee0d396a70598b954062603015

    broke a particular case for booting from partitioned md/raid devices.
    That is the second time this has been broken recently. The previous
    time was fixed by

    block: do_mounts - accept root=
    30f2f0eb4bd2c43d10a8b0d872c6e5ad8f31c9a0

    Because the data isn't available when an md device is first created
    (we add disks and set it up after creation), the initial partition
    scan finds nothing. It is not until the device is opened that
    another partition scan happens and finds something.

    So at the point where the kernel parameter "root=/dev/md_d0p1" is
    being parsed, md_d0 exists, but md_d0p1 does not.
    However if we let blk_lookup_devt return the correct device number
    even though the device doesn't exist, then the attempt to mount it
    will successfully find the partition.

    I have tried in the past to find a way to get the partition table to
    be read as soon as the array is assembled but that proved impossible
    (at the time). I don't remember the details, and could possibly
    revisit it. However, it would be really nice if blk_lookup_devt
    could be adjusted to again accept non-existent partitions.

    Signed-off-by: Jens Axboe

    Neil Brown
     
  • We can't OR bit-shift values (bit numbers) together, so get rid of
    BIO_RW_SYNC and use BIO_RW_SYNCIO and BIO_RW_UNPLUG explicitly. This
    brings back the behaviour from before
    213d9417fec62ef4c3675621b9364a667954d4dd.

    Signed-off-by: Jens Axboe

    Jens Axboe
     
  • When requests are submitted via SG_IO, which does synchronous I/O, no
    bsg_command is allocated, so no in-kernel sense_buffer was set. However,
    when blk_execute_rq() is called with no sense buffer, it provides one on
    the stack. Later, in blk_complete_sgv4_hdr_rq(), bsg checks
    rq->sense_len and, if sense data was requested by sg_io_v4, copies
    rq->sense back to user space, but by then it points to already-mangled
    stack memory.

    I have fixed that by always providing a sense_buffer when calling
    bsg_map_hdr(). The bsg_command->sense is provided in the write/read
    path as before, and an on-stack buffer is provided when doing SG_IO.

    I have also fixed a dprintk message to print rq->errors in hex because
    of the scsi bit-field use of this member. For other block devices it
    does not matter anyway.

    Signed-off-by: Boaz Harrosh
    Acked-by: FUJITA Tomonori
    Signed-off-by: Jens Axboe

    Boaz Harrosh
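
The blk_abort_queue() fix follows a common pattern: detach the current entries onto a private list before walking them, so anything the callback puts back lands on the shared list and is not revisited in the same pass. The Python sketch below is a hypothetical model of that pattern, not the kernel's list_head code; requeue_always() stands in for a driver error handler that re-adds every request, the case that made the old loop spin.

```python
from collections import deque


def requeue_always(rq, timeout_list):
    """Hypothetical driver error handler that puts the request back on the
    timeout list, the case that made the old loop spin forever."""
    timeout_list.append(rq)


def abort_queue(timeout_list, abort):
    """Abort each request currently on timeout_list exactly once.

    Like the fix, first splice the current entries onto a local list and
    walk that copy, so requests re-added to timeout_list by the handler
    are not revisited in this pass.
    """
    local = deque(timeout_list)  # splice: take every current entry...
    timeout_list.clear()         # ...and leave the shared list empty
    aborted = []
    while local:
        rq = local.popleft()
        aborted.append(rq)
        abort(rq, timeout_list)  # may re-add rq to timeout_list
    return aborted
```

Even when every request is re-added, the pass terminates after visiting each original entry once, and the re-added requests are left on the shared list for later handling.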
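
The "we can't OR shift values" pitfall is easy to show with made-up numbers. In this Python sketch BIO_RW_SYNCIO and BIO_RW_UNPLUG are assumed to be bit numbers 3 and 4 (the real values may differ), and the broken form models the general mistake the commit names, OR-ing two bit numbers and then shifting, rather than reproducing the removed kernel code. flag_set() is a hypothetical helper.

```python
# Hypothetical bit numbers for illustration; the real BIO_RW_* values
# may differ.
BIO_RW_SYNCIO = 3
BIO_RW_UNPLUG = 4

# Wrong: OR-ing two bit *numbers* produces a third bit number
# (3 | 4 == 7), so shifting by it sets bit 7, neither intended flag.
combined_bit = BIO_RW_SYNCIO | BIO_RW_UNPLUG
broken_rw = 1 << combined_bit

# Right: turn each bit number into a mask first, then OR the masks.
fixed_rw = (1 << BIO_RW_SYNCIO) | (1 << BIO_RW_UNPLUG)


def flag_set(rw, bit):
    """True if flag number `bit` is set in the rw mask."""
    return bool(rw & (1 << bit))
```

Here broken_rw is 128 with neither flag set, while fixed_rw is 24 with both set, which is why the commit switches to using the two flags explicitly.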
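
Neil Brown's request for blk_lookup_devt amounts to computing the partition's device number from the disk's own numbers whenever the partition index fits in the disk's minor range, whether or not a partition scan has found it yet. The Python sketch below is a hypothetical model: the disk dictionary, its major number 254 and its 64 minors are illustrative assumptions, and lookup_devt_strict() only models "must already exist" behaviour for contrast.

```python
# Hypothetical model of one disk; the numbers are illustrative only.
MD_D0 = {
    "major": 254,         # assumed major number
    "first_minor": 0,
    "minors": 64,         # minor numbers reserved for the disk and its partitions
    "partitions": set(),  # empty: no partition scan has found anything yet
}


def lookup_devt_strict(disk, partno):
    """'Must already exist' behaviour in this model."""
    if partno == 0 or partno in disk["partitions"]:
        return (disk["major"], disk["first_minor"] + partno)
    return None


def lookup_devt(disk, partno):
    """Requested behaviour: return the right (major, minor) whenever partno
    fits in the disk's minor range, even if the partition does not exist
    yet, so the later open and rescan can find it."""
    if partno < disk["minors"]:
        return (disk["major"], disk["first_minor"] + partno)
    return None
```

With root=/dev/md_d0p1 parsed before md_d0 has been opened, the strict model finds nothing, while the relaxed one still yields a usable device number for partition 1.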
     

02 Feb, 2009

1 commit


30 Jan, 2009

8 commits


09 Jan, 2009

2 commits

  • * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (57 commits)
    jbd2: Fix oops in jbd2_journal_init_inode() on corrupted fs
    ext4: Remove "extents" mount option
    block: Add Kconfig help which notes that ext4 needs CONFIG_LBD
    ext4: Make printk's consistently prefixed with "EXT4-fs: "
    ext4: Add sanity checks for the superblock before mounting the filesystem
    ext4: Add mount option to set kjournald's I/O priority
    jbd2: Submit writes to the journal using WRITE_SYNC
    jbd2: Add pid and journal device name to the "kjournald2 starting" message
    ext4: Add markers for better debuggability
    ext4: Remove code to create the journal inode
    ext4: provide function to release metadata pages under memory pressure
    ext3: provide function to release metadata pages under memory pressure
    add releasepage hooks to block devices which can be used by file systems
    ext4: Fix s_dirty_blocks_counter if block allocation failed with nodelalloc
    ext4: Init the complete page while building buddy cache
    ext4: Don't allow new groups to be added during block allocation
    ext4: mark the blocks/inode bitmap beyond end of group as used
    ext4: Use new buffer_head flag to check uninit group bitmaps initialization
    ext4: Fix the race between read_inode_bitmap() and ext4_new_inode()
    ext4: code cleanup
    ...

    Linus Torvalds
     
  • * git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6: (45 commits)
    [SCSI] qla2xxx: Update version number to 8.03.00-k1.
    [SCSI] qla2xxx: Add ISP81XX support.
    [SCSI] qla2xxx: Use proper request/response queues with MQ instantiations.
    [SCSI] qla2xxx: Correct MQ-chain information retrieval during a firmware dump.
    [SCSI] qla2xxx: Collapse EFT/FCE copy procedures during a firmware dump.
    [SCSI] qla2xxx: Don't pollute kernel logs with ZIO/RIO status messages.
    [SCSI] qla2xxx: Don't fallback to interrupt-polling during re-initialization with MSI-X enabled.
    [SCSI] qla2xxx: Remove support for reading/writing HW-event-log.
    [SCSI] cxgb3i: add missing include
    [SCSI] scsi_lib: fix DID_RESET status problems
    [SCSI] fc transport: restore missing dev_loss_tmo callback to LLDD
    [SCSI] aha152x_cs: Fix regression that keeps driver from using shared interrupts
    [SCSI] sd: Correctly handle 6-byte commands with DIX
    [SCSI] sd: DIF: Fix tagging on platforms with signed char
    [SCSI] sd: DIF: Show app tag on error
    [SCSI] Fix error handling for DIF/DIX
    [SCSI] scsi_lib: don't decrement busy counters when inserting commands
    [SCSI] libsas: fix test for negative unsigned and typos
    [SCSI] a2091, gvp11: kill warn_unused_result warnings
    [SCSI] fusion: Move a dereference below a NULL test
    ...

    Fixed up trivial conflict due to moving the async part of sd_probe
    around in the async probes vs using dev_set_name() in naming.

    Linus Torvalds
     

07 Jan, 2009

2 commits


03 Jan, 2009

2 commits

  • Commit 818827669d85b84241696ffef2de485db46b0b5e (block: make
    blk_rq_map_user take a NULL user-space buffer) extended
    blk_rq_map_user to accept a NULL user-space buffer with a READ
    command. That was needed to convert sg to the block layer mapping
    API.

    This patch extends blk_rq_map_user again to accept one with a WRITE
    command, which is needed to convert the st and osst drivers to the
    block layer mapping API.

    Signed-off-by: FUJITA Tomonori
    Acked-by: Jens Axboe
    Signed-off-by: James Bottomley

    FUJITA Tomonori
     
  • This fixes bio_copy_user_iov to properly handle the partial mappings
    with struct rq_map_data (which only sg uses for now but st and osst
    will shortly). It adds the offset member to struct rq_map_data and
    changes blk_rq_map_user to update it so that bio_copy_user_iov can add
    an appropriate page frame via bio_add_pc_page().

    Signed-off-by: FUJITA Tomonori
    Acked-by: Jens Axboe
    Signed-off-by: James Bottomley

    FUJITA Tomonori
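
The rq_map_data offset change can be pictured as a running byte offset into a pool of reserved pages: each partial mapping starts where the previous one ended, and the offset decides which page frame, and where within it, the next chunk begins. The Python sketch below is a hypothetical model, not bio_copy_user_iov() or blk_rq_map_user(); the 4096-byte page size and the map_chunk()/map_user() helpers are assumptions for illustration, and lengths are assumed to be positive.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages


def map_chunk(offset, length):
    """Return (first_page, offset_in_page, pages_touched) for a chunk of
    `length` bytes (length > 0) starting `offset` bytes into the pool."""
    first = offset // PAGE_SIZE
    last = (offset + length - 1) // PAGE_SIZE
    return first, offset % PAGE_SIZE, last - first + 1


def map_user(lengths):
    """Map several partial chunks back to back, advancing a running offset
    the way blk_rq_map_user is described as updating rq_map_data->offset."""
    offset = 0
    result = []
    for length in lengths:
        result.append(map_chunk(offset, length))
        offset += length  # the next partial mapping continues from here
    return result
```

For chunks of 6000 and 3000 bytes, the second chunk starts 1904 bytes into page 1 rather than at the start of page 0, which is the information a missing offset would lose.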
     

31 Dec, 2008

1 commit


30 Dec, 2008

1 commit


29 Dec, 2008

4 commits

  • Original patch from Nikanth Karthikesan

    When a queue exits, the queue lock is taken and cfq_exit_queue() frees
    all the cics associated with the queue.

    But when a task exits, cfq_exit_io_context() gets the cics one by one and
    then locks the associated queue to call __cfq_exit_single_io_context().
    It looks like, between getting a cic from the ioc and locking the queue,
    the queue might have exited on another CPU.

    Fix this by rechecking the cfq_io_context queue key inside the queue
    lock, and not calling into __cfq_exit_single_io_context() if somebody
    beat us to it.

    Signed-off-by: Jens Axboe

    Jens Axboe
     
  • We have two separate config entries for large devices/files. One
    is CONFIG_LBD, which guards just the devices; the other is CONFIG_LSF,
    which handles large files. This doesn't make a lot of sense, since you
    typically want both or neither. So get rid of CONFIG_LSF and change the
    CONFIG_LBD wording to indicate that it covers both.

    Acked-by: Jean Delvare
    Signed-off-by: Jens Axboe

    Jens Axboe
     
  • Sparse asked whether these could be static.

    Signed-off-by: Roel Kluin
    Signed-off-by: Jens Axboe

    Roel Kluin
     
  • Zero is invalid for max_phys_segments, max_hw_segments, and
    max_segment_size, so it's better to use min_not_zero instead of
    min. min() does work, though, because commit 0e435ac makes sure that
    these values are set to their non-zero defaults if a queue is
    initialized properly.

    With this patch, blk_queue_stack_limits does almost the same thing
    as dm's combine_restrictions_low(), so I think it's easy to
    remove dm's combine_restrictions_low.

    Signed-off-by: FUJITA Tomonori
    Signed-off-by: Jens Axboe

    FUJITA Tomonori
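
The difference between min and min_not_zero when stacking queue limits is easiest to see with numbers. The Python sketch below mirrors the behaviour of the kernel's min_not_zero() macro (0 is treated as "unset", so the other value wins). stack_limits() is a hypothetical, simplified model of blk_queue_stack_limits() restricted to the three limits named in the commit, and the limit values in the usage note are made up.

```python
def min_not_zero(x, y):
    """Smaller of x and y, where 0 means "no limit set" (mirrors the
    behaviour of the kernel's min_not_zero() macro)."""
    if x == 0:
        return y
    if y == 0:
        return x
    return min(x, y)


LIMITS = ("max_phys_segments", "max_hw_segments", "max_segment_size")


def stack_limits(top, bottom):
    """Hypothetical model of blk_queue_stack_limits() for the three limits
    named in the commit: the stacked device gets the stricter non-zero
    value of each."""
    return {key: min_not_zero(top[key], bottom[key]) for key in LIMITS}
```

If the top device has max_phys_segments = 0 and the bottom one has 128, plain min() would yield the invalid value 0, while min_not_zero() yields 128.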
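
The cfq fix is an instance of "recheck under the lock": the task-exit path finds the queue through the cic without holding the queue lock, so once it does take the lock it must confirm the cic still belongs to that queue. The Python sketch below is a hypothetical, single-threaded model of that check; CfqQueue, IoContext and the two functions are stand-ins loosely named after the commit, with threading.Lock playing the queue lock. It shows the decision made under the lock rather than reproducing a real race.

```python
import threading


class CfqQueue:
    """Hypothetical stand-in for a queue and its cfq data."""

    def __init__(self):
        self.lock = threading.Lock()
        self.cics = []


class IoContext:
    """Hypothetical stand-in for a cfq_io_context; key points at its queue."""

    def __init__(self, queue):
        self.key = queue
        queue.cics.append(self)


def exit_queue(queue):
    """Queue teardown: under the queue lock, detach every cic."""
    with queue.lock:
        for cic in queue.cics:
            cic.key = None
        queue.cics.clear()


def exit_single_io_context(cic, queue):
    """Task exit path. queue was looked up from the cic without holding
    queue.lock, so recheck the key under the lock before touching it."""
    with queue.lock:
        if cic.key is not queue:  # queue exit beat us to it
            return False
        queue.cics.remove(cic)
        cic.key = None
        return True
```

If exit_queue() runs between the lookup and the lock, the recheck sees a cleared key and skips the teardown instead of touching a freed cic.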