02 Apr, 2014

1 commit


24 Nov, 2013

2 commits

  • This adds a mechanism by which we can advance a bio by an arbitrary
    number of bytes without modifying the biovec: bio->bi_iter.bi_bvec_done
    indicates the number of bytes completed in the current bvec.

    Various driver code still needs to be updated to not refer to the bvec
    directly before we can use this for interesting things, like efficient
    bio splitting.

    Signed-off-by: Kent Overstreet
    Cc: Jens Axboe
    Cc: Lars Ellenberg
    Cc: Paul Clements
    Cc: drbd-user@lists.linbit.com
    Cc: nbd-general@lists.sourceforge.net

    Kent Overstreet
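
A minimal user-space sketch of the idea described above, not the kernel code: the iterator records how many bytes of the current segment are done, so advancing by an arbitrary byte count never modifies the segment array. The names struct seg, struct seg_iter and seg_iter_advance are hypothetical stand-ins for the bio_vec and bi_iter machinery, and the sketch assumes `remaining` never exceeds the bytes actually left in the array.

```c
#include <stddef.h>

/* Hypothetical stand-in for a bio_vec: only the length matters here. */
struct seg {
    size_t len;         /* length of this segment in bytes */
};

/* Hypothetical stand-in for the iterator state kept in bio->bi_iter. */
struct seg_iter {
    size_t idx;         /* index of the current segment */
    size_t done;        /* bytes completed in the current segment (cf. bi_bvec_done) */
    size_t remaining;   /* bytes left in the whole I/O */
};

/* Advance the iterator by `bytes`; the segment array is read-only. */
static void seg_iter_advance(const struct seg *segs, struct seg_iter *it,
                             size_t bytes)
{
    if (bytes > it->remaining)
        bytes = it->remaining;
    it->remaining -= bytes;

    while (bytes > 0) {
        size_t left = segs[it->idx].len - it->done;
        size_t step = bytes < left ? bytes : left;

        it->done += step;
        bytes -= step;

        /* Segment fully consumed: move on to the next one. */
        if (it->done == segs[it->idx].len) {
            it->idx++;
            it->done = 0;
        }
    }
}
```

Because all progress lives in the iterator, two iterators can walk the same array independently, which is what makes splitting cheap in this model.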
     
  • More prep work for immutable biovecs - with immutable bvecs drivers
    won't be able to use the biovec directly, they'll need to use helpers
    that take into account bio->bi_iter.bi_bvec_done.

    This updates callers for the new usage without changing the
    implementation yet.

    Signed-off-by: Kent Overstreet
    Cc: Jens Axboe
    Cc: Geert Uytterhoeven
    Cc: Benjamin Herrenschmidt
    Cc: Paul Mackerras
    Cc: "Ed L. Cashin"
    Cc: Nick Piggin
    Cc: Lars Ellenberg
    Cc: Jiri Kosina
    Cc: Paul Clements
    Cc: Jim Paris
    Cc: Geoff Levand
    Cc: Yehuda Sadeh
    Cc: Sage Weil
    Cc: Alex Elder
    Cc: ceph-devel@vger.kernel.org
    Cc: Joshua Morris
    Cc: Philip Kelleher
    Cc: Konrad Rzeszutek Wilk
    Cc: Jeremy Fitzhardinge
    Cc: Neil Brown
    Cc: Martin Schwidefsky
    Cc: Heiko Carstens
    Cc: linux390@de.ibm.com
    Cc: Nagalakshmi Nandigama
    Cc: Sreekanth Reddy
    Cc: support@lsi.com
    Cc: "James E.J. Bottomley"
    Cc: Greg Kroah-Hartman
    Cc: Alexander Viro
    Cc: Steven Whitehouse
    Cc: Herton Ronaldo Krzesinski
    Cc: Tejun Heo
    Cc: Andrew Morton
    Cc: Guo Chao
    Cc: Asai Thambi S P
    Cc: Selvan Mani
    Cc: Sam Bradshaw
    Cc: Matthew Wilcox
    Cc: Keith Busch
    Cc: Stephen Hemminger
    Cc: Quoc-Son Anh
    Cc: Sebastian Ott
    Cc: Nitin Gupta
    Cc: Minchan Kim
    Cc: Jerome Marchand
    Cc: Seth Jennings
    Cc: "Martin K. Petersen"
    Cc: Mike Snitzer
    Cc: Vivek Goyal
    Cc: "Darrick J. Wong"
    Cc: Chris Metcalf
    Cc: Jan Kara
    Cc: linux-m68k@lists.linux-m68k.org
    Cc: linuxppc-dev@lists.ozlabs.org
    Cc: drbd-user@lists.linbit.com
    Cc: nbd-general@lists.sourceforge.net
    Cc: cbe-oss-dev@lists.ozlabs.org
    Cc: xen-devel@lists.xensource.com
    Cc: virtualization@lists.linux-foundation.org
    Cc: linux-raid@vger.kernel.org
    Cc: linux-s390@vger.kernel.org
    Cc: DL-MPTFusionLinux@lsi.com
    Cc: linux-scsi@vger.kernel.org
    Cc: devel@driverdev.osuosl.org
    Cc: linux-fsdevel@vger.kernel.org
    Cc: cluster-devel@redhat.com
    Cc: linux-mm@kvack.org
    Acked-by: Geoff Levand

    Kent Overstreet
     

04 Jul, 2013

3 commits

  • Currently, when a disconnect is requested by the user (via NBD_DISCONNECT
    ioctl) the return from NBD_DO_IT is undefined (it is usually one of
    several error codes). This means that nbd-client does not know if a
    manual disconnect was performed or whether a network error occurred.
    Because of this, nbd-client's persist mode (which tries to reconnect after
    error, but not after manual disconnect) does not always work correctly.

    This change fixes this by causing NBD_DO_IT to always return 0 if a user
    requests a disconnect. This means that nbd-client can correctly either
    persist the connection (if an error occurred) or disconnect (if the user
    requested it).

    Signed-off-by: Paul Clements
    Acked-by: Rob Landley
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Paul Clements
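
The decision this change enables on the client side could look roughly like the sketch below. It is an illustration only: should_reconnect is a hypothetical helper, and how nbd-client actually structures its persist logic is not shown in this log.

```c
#include <stdbool.h>

/*
 * Decide whether a client should reconnect after NBD_DO_IT returns.
 * do_it_ret is the ioctl's return value: with this change, 0 means the
 * user requested the disconnect; anything else is treated as an error.
 */
static bool should_reconnect(int do_it_ret, bool persist)
{
    if (do_it_ret == 0)
        return false;   /* manual disconnect: honor it */
    return persist;     /* error: retry only in persist mode */
}
```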
     
  • The NBD_CLEAR_QUE ioctl has been deprecated for quite some time (its job
    is now done by two other ioctls). We should stop trying to make bogus
    assertions in it. Also, user-level code should remove calls to
    NBD_CLEAR_QUE, ASAP.

    Signed-off-by: Michal Belczyk
    Signed-off-by: Paul Clements
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michal Belczyk
     
  • Disk names may contain arbitrary strings, so they must not be
    interpreted as format strings. It seems that only md allows arbitrary
    strings to be used for disk names, but this could allow for a local
    memory corruption from uid 0 into ring 0.

    CVE-2013-2851

    Signed-off-by: Kees Cook
    Cc: Jens Axboe
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kees Cook
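
The general pattern behind this class of fix, shown in user space with snprintf rather than the kernel call sites this commit touches: pass the untrusted name as an argument to a constant "%s" format instead of passing the name itself as the format. format_name_safely is a hypothetical helper name.

```c
#include <stdio.h>

/*
 * Copy an untrusted name into buf. The name is an argument to a constant
 * "%s" format, so any '%' sequences it contains are copied literally
 * instead of being interpreted as conversion specifiers.
 */
static int format_name_safely(char *buf, size_t size, const char *name)
{
    return snprintf(buf, size, "%s", name);
}
```

The unsafe variant would call snprintf(buf, size, name), letting a name such as "md%x%n" drive the formatter.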
     

01 May, 2013

1 commit

  • Raise the default max request size for nbd to 128KB (from 127KB) to get it
    4KB aligned. This patch also allows the max request size to be increased
    (via /sys/block/nbd/queue/max_sectors_kb) to 32MB.

    The patch makes nbd network traffic more efficient by:
    - reducing request fragmentation (4KB alignment)
    - reducing the number of requests (fewer round trips, less network overhead)

    Especially in high latency networks, larger request size can make a dramatic
    difference in performance.

    Signed-off-by: Paul Clements
    Signed-off-by: Michal Belczyk
    Cc: Jens Axboe
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michal Belczyk
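
The arithmetic behind the two bullet points above can be sketched as follows. The helper names are hypothetical, and the sketch ignores protocol headers and any splitting the block layer may do for other reasons.

```c
#include <stdint.h>

#define KB 1024ULL

/* Requests needed to move `total` bytes when each request carries at
 * most `max_req` bytes (ceiling division). */
static uint64_t requests_needed(uint64_t total, uint64_t max_req)
{
    return (total + max_req - 1) / max_req;
}

/* Nonzero if a request size is a whole number of 4KB pages. */
static int is_4k_aligned(uint64_t size)
{
    return size % (4 * KB) == 0;
}
```

Under these assumptions a 1MB transfer takes 9 requests at 127KB and 8 at 128KB, and a 32MB transfer drops from 256 requests at 128KB to a single request at the new 32MB ceiling.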
     

28 Feb, 2013

4 commits

  • I just fixed this in "drivers/block/rbd.c" and I noticed that
    "drivers/block/nbd.c" has the same problem. Fix a warning issued by
    sparse by adding some lockdep annotations to indicate the queue lock gets
    dropped (because it's held when do_nbd_request() is called) and
    re-acquired within the function.

    Signed-off-by: Alex Elder
    Cc: Paul Clements
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alex Elder
     
  • Pass the read-only flag to set_device_ro, so that it will be visible to
    the block layer and in sysfs.

    Signed-off-by: Paolo Bonzini
    Cc: Paul Clements
    Cc: Alex Bligh
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Paolo Bonzini
     
  • There are two problems with shutdown in the NBD driver.

    1: Receiving the NBD_DISCONNECT ioctl does not sync the filesystem.

    This patch adds the sync operation into __nbd_ioctl()'s
    NBD_DISCONNECT handler. This is useful because BLKFLSBUF is restricted
    to processes that have CAP_SYS_ADMIN, and the NBD client may not
    possess it (fsync of the block device does not sync the filesystem,
    either).

    2: Once we clear the socket we have no guarantee that later reads will
    come from the same backing storage.

    The patch adds calls to kill_bdev() in __nbd_ioctl()'s socket
    clearing code so the page cache is cleaned, lest reads that hit on the
    page cache will return stale data from the previously-accessible disk.

    Example:

    # qemu-nbd -r -c/dev/nbd0 /dev/sr0
    # file -s /dev/nbd0
    /dev/stdin: # UDF filesystem data (version 1.5) etc.
    # qemu-nbd -d /dev/nbd0
    # qemu-nbd -r -c/dev/nbd0 /dev/sda
    # file -s /dev/nbd0
    /dev/stdin: # UDF filesystem data (version 1.5) etc.

    While /dev/sda has:

    # file -s /dev/sda
    /dev/sda: x86 boot sector; etc.

    Signed-off-by: Paolo Bonzini
    Acked-by: Paul Clements
    Cc: Alex Bligh
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Paolo Bonzini
     
  • Currently, the NBD device does not accept flush requests from the Linux
    block layer. If the NBD server opened the target with neither O_SYNC nor
    O_DSYNC, however, the device will be effectively backed by a writeback
    cache. Without issuing flushes properly, operation of the NBD device will
    not be safe against power losses.

    The NBD protocol has support for both a cache flush command and a FUA
    command flag; the server will also pass a flag to note its support for
    these features. This patch adds support for the cache flush command and
    flag. In the kernel, we receive the flags via the NBD_SET_FLAGS ioctl,
    and map NBD_FLAG_SEND_FLUSH to the argument of blk_queue_flush. When the
    flag is active the block layer will send REQ_FLUSH requests, which we
    translate to NBD_CMD_FLUSH commands.

    FUA support is not included in this patch because all free software
    servers implement it with a full fdatasync; thus it has no advantage over
    supporting flush only. Because I [Paolo] cannot really benchmark it in a
    realistic scenario, I cannot tell if it is a good idea or not. It is also
    not clear if it is valid for an NBD server to support FUA but not flush.
    The Linux block layer gives a warning for this combination; the NBD
    protocol documentation says nothing about it.

    The patch also fixes a small problem in the handling of flags: nbd->flags
    must be cleared at the end of NBD_DO_IT, but the driver was not doing
    that. The bug manifests itself as follows. Suppose you use two different
    client/server pairs to start the NBD device. Suppose also that the first
    client supports NBD_SET_FLAGS, and the first server sends
    NBD_FLAG_SEND_FLUSH; the second pair instead does neither of these two
    things. Before this patch, the second invocation of NBD_DO_IT will use a
    stale value of nbd->flags, and the second server will issue an error every
    time it receives an NBD_CMD_FLUSH command.

    This bug is pre-existing, but it becomes much more important after this
    patch; flush failures make the device pretty much unusable, unlike
    discard failures.

    Signed-off-by: Paolo Bonzini
    Signed-off-by: Alex Bligh
    Acked-by: Paul Clements
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alex Bligh
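
The stale-flags scenario described above can be modeled in a few lines. This is a toy model, not the driver: struct dev_state, set_flags, end_session and will_send_flush are hypothetical names, and the flag value is assumed to match the NBD_FLAG_SEND_FLUSH definition in the kernel's nbd header.

```c
#include <stdint.h>

#define NBD_FLAG_SEND_FLUSH (1u << 2)   /* assumed value, see lead-in */

/* Hypothetical stand-in for the per-device state. */
struct dev_state {
    uint32_t flags;
};

/* Models NBD_SET_FLAGS: store the flags the server advertised. */
static void set_flags(struct dev_state *d, uint32_t flags)
{
    d->flags = flags;
}

/* Models the end of NBD_DO_IT with the fix: the next session starts clean. */
static void end_session(struct dev_state *d)
{
    d->flags = 0;
}

/* Nonzero if flush commands would be sent to the server. */
static int will_send_flush(const struct dev_state *d)
{
    return (d->flags & NBD_FLAG_SEND_FLUSH) != 0;
}
```

Without the reset in end_session, a second client that never calls set_flags would inherit the first session's flush flag.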
     

23 Feb, 2013

1 commit


06 Oct, 2012

2 commits

  • Add discard support to nbd. If the nbd-server supports discard, it will
    send NBD_FLAG_SEND_TRIM to the client. The client will then set the flag
    in the kernel via NBD_SET_FLAGS, which tells the kernel to enable discards
    for the device (QUEUE_FLAG_DISCARD).

    If discard support is enabled, then when the nbd client system receives a
    discard request, this will be passed along to the nbd-server. When the
    discard request is received by the nbd-server, it will perform:

    fallocate(.. FALLOC_FL_PUNCH_HOLE ..)

    to punch a hole in the backend storage that is no longer needed.

    Signed-off-by: Paul Clements
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Paul Clements
     
  • Add a set-flags ioctl, allowing various option flags to be set on an nbd
    device. This allows the nbd-client to set the device flags (to enable
    read-only mode, or enable discard support, etc.).

    Flags are typically specified by the nbd-server. During the negotiation
    phase of the nbd connection, the server sends its flags to the client.
    The client then uses NBD_SET_FLAGS to inform the kernel of the options.

    Also included is a one-line fix to debug output for the set-timeout ioctl.

    Signed-off-by: Paul Clements
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Paul Clements
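
How a flags word passed through NBD_SET_FLAGS might be turned into device options, as a sketch only. struct dev_options and decode_flags are hypothetical, and the bit values are assumed to match the kernel's nbd header; the driver's real handling is not shown in this log.

```c
#include <stdbool.h>
#include <stdint.h>

#define NBD_FLAG_READ_ONLY (1u << 1)   /* assumed value, see lead-in */
#define NBD_FLAG_SEND_TRIM (1u << 5)   /* assumed value, see lead-in */

/* Hypothetical summary of the options a flags word enables. */
struct dev_options {
    bool read_only;   /* device should be marked read-only */
    bool discard;     /* discard requests should be forwarded */
};

static struct dev_options decode_flags(uint32_t flags)
{
    struct dev_options o;

    o.read_only = (flags & NBD_FLAG_READ_ONLY) != 0;
    o.discard = (flags & NBD_FLAG_SEND_TRIM) != 0;
    return o;
}
```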
     

18 Sep, 2012

1 commit

  • Fix a serious but uncommon bug in nbd which occurs when there is heavy
    I/O going to the nbd device while, at the same time, a failure (server,
    network) or manual disconnect of the nbd connection occurs.

    There is a small window between the time that the nbd_thread is stopped
    and the socket is shut down where requests can continue to be queued to
    nbd's internal waiting_queue. When this happens, those requests are
    never completed or freed.

    The fix is to clear the waiting_queue on shutdown of the nbd device, in
    the same way that the nbd request queue (queue_head) is already being
    cleared.

    Signed-off-by: Paul Clements
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Paul Clements
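
The shape of the fix, draining a queue so that every leftover request is failed and completed, can be illustrated with a plain singly linked list. This is a user-space sketch with hypothetical types (struct req, struct req_queue); the driver's own list handling and locking are not reproduced here.

```c
#include <stddef.h>

/* Hypothetical request: just enough state to observe the outcome. */
struct req {
    struct req *next;
    int error;       /* nonzero once the request has been failed */
    int completed;   /* nonzero once the request has been ended */
};

struct req_queue {
    struct req *head;
};

/* Fail and complete every request still queued; return how many. */
static size_t drain_queue(struct req_queue *q)
{
    size_t n = 0;

    while (q->head) {
        struct req *r = q->head;

        q->head = r->next;
        r->next = NULL;
        r->error = 1;
        r->completed = 1;
        n++;
    }
    return n;
}
```

On shutdown the same drain would be applied to both queues, so nothing queued during the race window is left behind.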
     

02 Aug, 2012

1 commit

  • Pull block driver changes from Jens Axboe:

    - Making the plugging support for drivers a bit more sane from Neil.
    This supersedes the plugging change from Shaohua as well.

    - The usual round of drbd updates.

    - Using a tail add instead of a head add in the request completion for
    nbd, making us find the most completed request more quickly.

    - A few floppy changes, getting rid of a duplicated flag and also
    running the floppy init async (since it takes forever in boot terms)
    from Andi.

    * 'for-3.6/drivers' of git://git.kernel.dk/linux-block:
    floppy: remove duplicated flag FD_RAW_NEED_DISK
    blk: pass from_schedule to non-request unplug functions.
    block: stack unplug
    blk: centralize non-request unplug handling.
    md: remove plug_cnt feature of plugging.
    block/nbd: micro-optimization in nbd request completion
    drbd: announce FLUSH/FUA capability to upper layers
    drbd: fix max_bio_size to be unsigned
    drbd: flush drbd work queue before invalidate/invalidate remote
    drbd: fix potential access after free
    drbd: call local-io-error handler early
    drbd: do not reset rs_pending_cnt too early
    drbd: reset congestion information before reporting it in /proc/drbd
    drbd: report congestion if we are waiting for some userland callback
    drbd: differentiate between normal and forced detach
    drbd: cleanup, remove two unused global flags
    floppy: Run floppy initialization asynchronous

    Linus Torvalds
     

01 Aug, 2012

1 commit

  • Set SOCK_MEMALLOC on the NBD socket to allow access to PFMEMALLOC reserves
    so pages backed by NBD, particularly if swap related, can be cleaned to
    prevent the machine being deadlocked. It is still possible that the
    PFMEMALLOC reserves get depleted resulting in deadlock but this can be
    resolved by the administrator by increasing min_free_kbytes.

    Signed-off-by: Mel Gorman
    Cc: David Miller
    Cc: Neil Brown
    Cc: Peter Zijlstra
    Cc: Mike Christie
    Cc: Eric B Munson
    Cc: Eric Dumazet
    Cc: Sebastian Andrzej Siewior
    Cc: Mel Gorman
    Cc: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mel Gorman
     

31 Jul, 2012

1 commit


29 Mar, 2012

3 commits

  • Merge third batch of patches from Andrew Morton:
    - Some MM stragglers
    - core SMP library cleanups (on_each_cpu_mask)
    - Some IPI optimisations
    - kexec
    - kdump
    - IPMI
    - the radix-tree iterator work
    - various other misc bits.

    "That'll do for -rc1. I still have ~10 patches for 3.4, will send
    those along when they've baked a little more."

    * emailed from Andrew Morton : (35 commits)
    backlight: fix typo in tosa_lcd.c
    crc32: add help text for the algorithm select option
    mm: move hugepage test examples to tools/testing/selftests/vm
    mm: move slabinfo.c to tools/vm
    mm: move page-types.c from Documentation to tools/vm
    selftests/Makefile: make `run_tests' depend on `all'
    selftests: launch individual selftests from the main Makefile
    radix-tree: use iterators in find_get_pages* functions
    radix-tree: rewrite gang lookup using iterator
    radix-tree: introduce bit-optimized iterator
    fs/proc/namespaces.c: prevent crash when ns_entries[] is empty
    nbd: rename the nbd_device variable from lo to nbd
    pidns: add reboot_pid_ns() to handle the reboot syscall
    sysctl: use bitmap library functions
    ipmi: use locks on watchdog timeout set on reboot
    ipmi: simplify locking
    ipmi: fix message handling during panics
    ipmi: use a tasklet for handling received messages
    ipmi: increase KCS timeouts
    ipmi: decrease the IPMI message transaction time in interrupt mode
    ...

    Linus Torvalds
     
  • Rename the nbd_device variable from "lo" to "nbd", since "lo" is just a name
    copied from loop.c.

    Signed-off-by: Wanlong Gao
    Cc: Paul Clements
    Cc: Jens Axboe
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Wanlong Gao
     
  • Remove all #inclusions of asm/system.h preparatory to splitting and killing
    it. Performed with the following command:

    perl -p -i -e 's!^#\s*include\s*<asm/system[.]h>.*\n!!' `grep -Irl '^#\s*include\s*<asm/system[.]h>' *`

    Signed-off-by: David Howells

    David Howells
     

19 Aug, 2011

6 commits


28 May, 2011

3 commits

  • The 'max_part' parameter determines how many partitions are supported
    on each nbd device. However, the actual number can be rounded up to a
    value of the form 2^n - 1 during module initialization, as
    alloc_disk() is called with (1 << part_shift) for some reason.

    So adjust 'max_part' as well, at least for consistency with loop and brd.
    It is already exported via sysfs, and a user should check this value
    after loading the module if they want to use that number correctly
    (e.g. with fdisk or something similar).

    Signed-off-by: Namhyung Kim
    Cc: Laurent Vivier
    Cc: Paul Clements
    Signed-off-by: Jens Axboe

    Namhyung Kim
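
A sketch of the rounding described above, assuming it is derived from the position of the highest set bit in max_part. fls_u is a portable stand-in written for this example, and round_max_part is a hypothetical helper, not a function from the driver.

```c
/* 1-based index of the highest set bit; 0 when x is 0. */
static int fls_u(unsigned int x)
{
    int bit = 0;

    while (x) {
        bit++;
        x >>= 1;
    }
    return bit;
}

/* Round max_part up to the nearest value of the form 2^n - 1. */
static unsigned long long round_max_part(unsigned int max_part)
{
    if (max_part == 0)
        return 0;
    return (1ULL << fls_u(max_part)) - 1;
}
```

For example, a requested value of 5 becomes 7 and 16 becomes 31, while 15 is already of the right form and stays 15.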
     
  • The 'max_part' parameter controls the maximum number of partitions
    an nbd device can have. However, if a user specifies a very large
    value, it exceeds the limit of the device minor number and
    can cause a kernel oops (or, at least, produce invalid device
    nodes in some cases).

    In addition, specifying a large 'nbds_max' value causes the same
    problem for the same reason.

    On my desktop, the following command triggers a kernel BUG:

    $ sudo modprobe nbd max_part=100000
    kernel BUG at /media/Linux_Data/project/linux/fs/sysfs/group.c:65!
    invalid opcode: 0000 [#1] SMP
    last sysfs file: /sys/devices/virtual/block/nbd4/range
    CPU 1
    Modules linked in: nbd(+) bridge stp llc kvm_intel kvm asus_atk0110 sg sr_mod cdrom

    Pid: 2522, comm: modprobe Tainted: G W 2.6.39-leonard+ #159 System manufacturer System Product Name/P5G41TD-M PRO
    RIP: 0010:[] [] internal_create_group+0x2f/0x166
    RSP: 0018:ffff8801009f1de8 EFLAGS: 00010246
    RAX: 00000000ffffffef RBX: ffff880103920478 RCX: 00000000000a7bd3
    RDX: ffffffff81a2dbe0 RSI: 0000000000000000 RDI: ffff880103920478
    RBP: ffff8801009f1e38 R08: ffff880103920468 R09: ffff880103920478
    R10: ffff8801009f1de8 R11: ffff88011eccbb68 R12: ffffffff81a2dbe0
    R13: ffff880103920468 R14: 0000000000000000 R15: ffff880103920400
    FS: 00007f3c49de9700(0000) GS:ffff88011f800000(0000) knlGS:0000000000000000
    CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
    CR2: 00007f3b7fe7c000 CR3: 00000000cd58d000 CR4: 00000000000406e0
    DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
    Process modprobe (pid: 2522, threadinfo ffff8801009f0000, task ffff8801009a93a0)
    Stack:
    ffff8801009f1e58 ffffffff812e8f6e ffff8801009f1e58 ffffffff812e7a80
    ffff880000000010 ffff880103920400 ffff8801002fd0c0 ffff880103920468
    0000000000000011 ffff880103920400 ffff8801009f1e48 ffffffff8115ab6a
    Call Trace:
    [] ? device_add+0x4f1/0x5e4
    [] ? dev_set_name+0x41/0x43
    [] sysfs_create_group+0x13/0x15
    [] blk_trace_init_sysfs+0x14/0x16
    [] blk_register_queue+0x4c/0xfd
    [] add_disk+0xe4/0x29c
    [] nbd_init+0x2ab/0x30d [nbd]
    [] ? 0xffffffffa007dfff
    [] do_one_initcall+0x7f/0x13e
    [] sys_init_module+0xa1/0x1e3
    [] system_call_fastpath+0x16/0x1b
    Code: 41 57 41 56 41 55 41 54 53 48 83 ec 28 0f 1f 44 00 00 48 89 fb 41 89 f6 49 89 d4 48 85 ff 74 0b 85 f6 75 0b 48 83
    7f 30 00 75 14 0b eb fe b9 ea ff ff ff 48 83 7f 30 00 0f 84 09 01 00 00 49
    RIP [] internal_create_group+0x2f/0x166
    RSP
    ---[ end trace 753285ffbf72c57c ]---

    Signed-off-by: Namhyung Kim
    Cc: Laurent Vivier
    Cc: Paul Clements
    Cc: stable@kernel.org
    Signed-off-by: Jens Axboe

    Namhyung Kim
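
The kind of range check that prevents this can be sketched in user space as below. check_nbd_params is a hypothetical helper; the constants assume 20 minor bits and at most 256 partitions per disk, which are the values I understand the kernel to use, but treat them as assumptions here.

```c
#include <errno.h>

#define MINORBITS      20    /* assumed: bits available for minor numbers */
#define DISK_MAX_PARTS 256   /* assumed: maximum partitions per disk */

/* Return 0 if the module parameters fit the minor number space,
 * -EINVAL otherwise. */
static int check_nbd_params(unsigned int max_part, unsigned long nbds_max)
{
    unsigned int part_shift = 0;

    /* part_shift = number of bits needed to represent max_part. */
    for (unsigned int v = max_part; v; v >>= 1)
        part_shift++;

    /* Too many partitions per device. */
    if (part_shift >= MINORBITS || (1UL << part_shift) > DISK_MAX_PARTS)
        return -EINVAL;

    /* Too many devices for the minor bits left over. */
    if (nbds_max > (1UL << (MINORBITS - part_shift)))
        return -EINVAL;

    return 0;
}
```

With these limits, max_part=100000 from the report above is rejected up front instead of reaching add_disk().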
     
  • Unlike kernel_sendmsg(), kernel_recvmsg() requires the flags to be passed
    explicitly via its last parameter instead of struct msghdr.msg_flags.
    Therefore calls to sock_xmit(lo, 0, ..., MSG_WAITALL) have not been
    processed properly by the TCP layer with respect to that flag. Fix it.

    Signed-off-by: Namhyung Kim
    Cc: Paul Clements
    Signed-off-by: Jens Axboe

    Namhyung Kim
     

12 Feb, 2011

1 commit

  • Commit 2a48fc0ab242417 ("block: autoconvert trivial BKL users to private
    mutex") replaced uses of the BKL in the nbd driver with mutex
    operations. Since then, I've been seeing these lock ups:

    INFO: task qemu-nbd:16115 blocked for more than 120 seconds.
    "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
    qemu-nbd D 0000000000000001 0 16115 16114 0x00000004
    ffff88007d775d98 0000000000000082 ffff88007d775fd8 ffff88007d774000
    0000000000013a80 ffff8800020347e0 ffff88007d775fd8 0000000000013a80
    ffff880133730000 ffff880002034440 ffffea0004333db8 ffffffffa071c020
    Call Trace:
    [] __mutex_lock_slowpath+0xf7/0x180
    [] mutex_lock+0x2b/0x50
    [] nbd_ioctl+0x6c/0x1c0 [nbd]
    [] blkdev_ioctl+0x230/0x730
    [] block_ioctl+0x41/0x50
    [] do_vfs_ioctl+0x93/0x370
    [] sys_ioctl+0x81/0xa0
    [] system_call_fastpath+0x16/0x1b

    Instrumenting the nbd module's ioctl handler with some extra logging
    clearly shows the NBD_DO_IT ioctl being invoked which is a long-lived
    ioctl in the sense that it doesn't return until another ioctl asks the
    driver to disconnect. However, that other ioctl blocks, waiting for the
    module-level mutex that replaced the BKL, and then we're stuck.

    This patch removes the module-level mutex altogether. It's clearly
    wrong, and as far as I can see, it's entirely unnecessary, since the nbd
    driver maintains per-device mutexes, and I don't see anything that would
    require a module-level (or kernel-level, for that matter) mutex.

    Signed-off-by: Soren Hansen
    Acked-by: Serge Hallyn
    Acked-by: Paul Clements
    Cc: Arnd Bergmann
    Cc: Jens Axboe
    Cc: [2.6.37.x]
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Soren Hansen
     

05 Oct, 2010

1 commit

  • The block device drivers have all gained new lock_kernel
    calls from a recent pushdown, and some of the drivers
    were already using the BKL before.

    This turns the BKL into a set of per-driver mutexes.
    Still need to check whether this is safe to do.

    file=$1
    name=$2
    if grep -q lock_kernel ${file} ; then
        if grep -q 'include.*linux.mutex.h' ${file} ; then
            sed -i '/include.*<linux\/smp_lock.h>/d' ${file}
        else
            sed -i 's/include.*<linux\/smp_lock.h>.*$/include <linux\/mutex.h>/g' ${file}
        fi
        sed -i ${file} \
            -e "/^#include.*linux.mutex.h/,$ {
                    1,/^\(static\|int\|long\)/ {
                         /^\(static\|int\|long\)/istatic DEFINE_MUTEX(${name}_mutex);

    } }" \
            -e "s/\(un\)*lock_kernel\>[ ]*()/mutex_\1lock(\&${name}_mutex)/g" \
            -e '/[ ]*cycle_kernel_lock();/d'
    else
        sed -i -e '/include.*\<smp_lock.h\>/d' ${file} \
            -e '/cycle_kernel_lock()/d'
    fi

    Signed-off-by: Arnd Bergmann

    Arnd Bergmann
     

11 Aug, 2010

1 commit

  • * 'for-2.6.36' of git://git.kernel.dk/linux-2.6-block: (149 commits)
    block: make sure that REQ_* types are seen even with CONFIG_BLOCK=n
    xen-blkfront: fix missing out label
    blkdev: fix blkdev_issue_zeroout return value
    block: update request stacking methods to support discards
    block: fix missing export of blk_types.h
    writeback: fix bad _bh spinlock nesting
    drbd: revert "delay probes", feature is being re-implemented differently
    drbd: Initialize all members of sync_conf to their defaults [Bugz 315]
    drbd: Disable delay probes for the upcomming release
    writeback: cleanup bdi_register
    writeback: add new tracepoints
    writeback: remove unnecessary init_timer call
    writeback: optimize periodic bdi thread wakeups
    writeback: prevent unnecessary bdi threads wakeups
    writeback: move bdi threads exiting logic to the forker thread
    writeback: restructure bdi forker loop a little
    writeback: move last_active to bdi
    writeback: do not remove bdi from bdi_list
    writeback: simplify bdi code a little
    writeback: do not lose wake-ups in bdi threads
    ...

    Fixed up pretty trivial conflicts in drivers/block/virtio_blk.c and
    drivers/scsi/scsi_error.c as per Jens.

    Linus Torvalds
     

08 Aug, 2010

2 commits


19 Jul, 2010

1 commit


30 Mar, 2010

1 commit

  • …it slab.h inclusion from percpu.h

    percpu.h is included by sched.h and module.h and thus ends up being
    included when building most .c files. percpu.h includes slab.h which
    in turn includes gfp.h making everything defined by the two files
    universally available and complicating inclusion dependencies.

    The percpu.h -> slab.h dependency is about to be removed. Prepare for
    this change by updating users of gfp and slab facilities to include those
    headers directly instead of assuming availability. As this conversion
    needs to touch a large number of source files, the following script is
    used as the basis of the conversion.

    http://userweb.kernel.org/~tj/misc/slabh-sweep.py

    The script does the following.

    * Scan files for gfp and slab usages and update includes such that
    only the necessary includes are there. ie. if only gfp is used,
    gfp.h, if slab is used, slab.h.

    * When the script inserts a new include, it looks at the include
    blocks and tries to place the new include so that its order conforms
    to its surroundings. It's put in the include block which contains
    core kernel includes, in the same order that the rest are ordered -
    alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
    doesn't seem to be any matching order.

    * If the script can't find a place to put a new include (mostly
    because the file doesn't have fitting include block), it prints out
    an error message indicating which .h file needs to be added to the
    file.

    The conversion was done in the following steps.

    1. The initial automatic conversion of all .c files updated slightly
    over 4000 files, deleting around 700 includes and adding ~480 gfp.h
    and ~3000 slab.h inclusions. The script emitted errors for ~400
    files.

    2. Each error was manually checked. Some didn't need the inclusion,
    some needed manual addition while adding it to implementation .h or
    embedding .c file was more appropriate for others. This step added
    inclusions to around 150 files.

    3. The script was run again and the output was compared to the edits
    from #2 to make sure no file was left behind.

    4. Several build tests were done and a couple of problems were fixed.
    e.g. lib/decompress_*.c used malloc/free() wrappers around slab
    APIs requiring slab.h to be added manually.

    5. The script was run on all .h files but without automatically
    editing them as sprinkling gfp.h and slab.h inclusions around .h
    files could easily lead to inclusion dependency hell. Most gfp.h
    inclusion directives were ignored as stuff from gfp.h was usually
    widely available and often used in preprocessor macros. Each
    slab.h inclusion directive was examined and added manually as
    necessary.

    6. percpu.h was updated not to include slab.h.

    7. Build tests were done on the following configurations and failures
    were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my
    distributed build env didn't work with gcov compiles) and a few
    more options had to be turned off depending on archs to make things
    build (like ipr on powerpc/64 which failed due to missing writeq).

    * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
    * powerpc and powerpc64 SMP allmodconfig
    * sparc and sparc64 SMP allmodconfig
    * ia64 SMP allmodconfig
    * s390 SMP allmodconfig
    * alpha SMP allmodconfig
    * um on x86_64 SMP allmodconfig

    8. percpu.h modifications were reverted so that it could be applied as
    a separate patch and serve as bisection point.

    Given the fact that I had only a couple of failures from tests on step
    6, I'm fairly confident about the coverage of this conversion patch.
    If there is a breakage, it's likely to be something in one of the arch
    headers which should be easily discoverable on most builds of
    the specific arch.

    Signed-off-by: Tejun Heo <tj@kernel.org>
    Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
    Cc: Ingo Molnar <mingo@redhat.com>
    Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>

    Tejun Heo
     

22 Sep, 2009

1 commit


11 May, 2009

2 commits

  • Until now, the block layer allowed two separate modes of request execution.
    A request is always acquired from the request queue via
    elv_next_request(). After that, drivers are free to either dequeue it
    or process it without dequeueing. Dequeue allows elv_next_request()
    to return the next request so that multiple requests can be in flight.

    Executing requests without dequeueing has its merits mostly in
    allowing drivers for simpler devices which can't do sg to deal with
    segments only without considering request boundary. However, the
    benefit this brings is dubious and declining while the cost of the API
    ambiguity is increasing. Segment based drivers are usually for very
    old or limited devices and as converting to dequeueing model isn't
    difficult, it doesn't justify the API overhead it puts on block layer
    and its more modern users.

    Previous patches converted all block low level drivers to dequeueing
    model. This patch completes the API transition by...

    * renaming elv_next_request() to blk_peek_request()

    * renaming blkdev_dequeue_request() to blk_start_request()

    * adding blk_fetch_request() which is a combination of peek and start

    * disallowing completion of queued (not started) requests

    * applying new API to all LLDs

    Renamings are for consistency and to break out of tree code so that
    it's apparent that out of tree drivers need updating.

    [ Impact: block request issue API cleanup, no functional change ]

    Signed-off-by: Tejun Heo
    Cc: Rusty Russell
    Cc: James Bottomley
    Cc: Mike Miller
    Cc: unsik Kim
    Cc: Paul Clements
    Cc: Tim Waugh
    Cc: Geert Uytterhoeven
    Cc: David S. Miller
    Cc: Laurent Vivier
    Cc: Jeff Garzik
    Cc: Jeremy Fitzhardinge
    Cc: Grant Likely
    Cc: Adrian McMenamin
    Cc: Stephen Rothwell
    Cc: Bartlomiej Zolnierkiewicz
    Cc: Borislav Petkov
    Cc: Sergei Shtylyov
    Cc: Alex Dubov
    Cc: Pierre Ossman
    Cc: David Woodhouse
    Cc: Markus Lidel
    Cc: Stefan Weinhuber
    Cc: Martin Schwidefsky
    Cc: Pete Zaitcev
    Cc: FUJITA Tomonori
    Signed-off-by: Jens Axboe

    Tejun Heo
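
The relationship between the three operations named above can be shown with a toy queue. This is a user-space model with hypothetical names (struct toy_queue, toy_peek, toy_start, toy_fetch); it mirrors only the peek/start/fetch semantics described in the message, not the block layer's data structures or locking.

```c
#include <stddef.h>

#define QCAP 8

/* Toy request queue holding request ids in a ring buffer. */
struct toy_queue {
    int pending[QCAP];   /* ids waiting to be started */
    size_t head;         /* index of the next pending id */
    size_t count;        /* number of pending ids */
    size_t in_flight;    /* started but not yet completed */
};

/* Look at the next request without dequeueing it (cf. blk_peek_request()).
 * Returns -1 when the queue is empty. */
static int toy_peek(const struct toy_queue *q)
{
    return q->count ? q->pending[q->head] : -1;
}

/* Dequeue the head request and count it as in flight
 * (cf. blk_start_request()). Does nothing on an empty queue. */
static void toy_start(struct toy_queue *q)
{
    if (!q->count)
        return;
    q->head = (q->head + 1) % QCAP;
    q->count--;
    q->in_flight++;
}

/* Peek and start in one step (cf. blk_fetch_request()). */
static int toy_fetch(struct toy_queue *q)
{
    int id = toy_peek(q);

    if (id >= 0)
        toy_start(q);
    return id;
}
```

Peeking twice returns the same request; only starting (or fetching) lets the next one become visible, which is what allows several requests to be in flight.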
     
  • With the previous changes, the following are now guaranteed for all
    requests in any valid state.

    * blk_rq_sectors() == blk_rq_bytes() >> 9
    * blk_rq_cur_sectors() == blk_rq_cur_bytes() >> 9

    Clean up accessor usages. Notable changes are

    * nbd,i2o_block: end_all used instead of explicit byte count
    * scsi_lib: unnecessary conditional on request type removed

    [ Impact: cleanup ]

    Signed-off-by: Tejun Heo
    Cc: Paul Clements
    Cc: Pete Zaitcev
    Cc: Alex Dubov
    Cc: Markus Lidel
    Cc: David Woodhouse
    Cc: James Bottomley
    Cc: Boaz Harrosh
    Signed-off-by: Jens Axboe

    Tejun Heo
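
The invariant above is the usual 512-byte sector conversion: shifting a byte count right by 9 divides it by 512. A minimal sketch, with bytes_to_sectors as a hypothetical helper name:

```c
#include <stdint.h>

#define SECTOR_SHIFT 9   /* 2^9 = 512 bytes per sector */

/* Convert a byte count to whole 512-byte sectors. */
static uint64_t bytes_to_sectors(uint64_t bytes)
{
    return bytes >> SECTOR_SHIFT;
}
```

Once both accessors are guaranteed to agree, a driver can use whichever unit is convenient instead of carrying an explicit byte count alongside the sector count.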