22 Aug, 2014

1 commit

  • Currently the ram disk is not visible in /proc/partitions. This was
    done for compatibility reasons in 53978d0a7a27, but some utilities
    expect the disk to be present in /proc/partitions.
    Add a module option that lets the administrator choose the visibility
    behaviour. By default, the old behaviour is preserved.
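
A minimal sketch of how a utility might check whether ram disks are listed, assuming the usual four-column /proc/partitions layout (major, minor, #blocks, name). The sample text and the helper names `visible_devices` and `ram_disks_visible` are hypothetical and only illustrate the visibility question this commit addresses.

```python
def visible_devices(partitions_text):
    """Return the device names listed in /proc/partitions-style text."""
    names = []
    for line in partitions_text.splitlines():
        fields = line.split()
        # Data rows have four fields and start with a numeric major number;
        # this skips the header row and the blank line that follows it.
        if len(fields) == 4 and fields[0].isdigit():
            names.append(fields[3])
    return names


def ram_disks_visible(partitions_text):
    """True if at least one ramN device is listed."""
    return any(name.startswith("ram") for name in visible_devices(partitions_text))


# Hypothetical sample, not captured from a real system.
SAMPLE = """major minor  #blocks  name

   1        0      65536 ram0
   8        0  488386584 sda
   8        1  488385560 sda1
"""

if __name__ == "__main__":
    print(visible_devices(SAMPLE))
    print(ram_disks_visible(SAMPLE))
```

On a live system the same functions could be fed the contents of /proc/partitions instead of the sample string.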

    Signed-off-by: Dmitry Monakhov
    Signed-off-by: Jens Axboe

    Dmitry Monakhov
     

05 Jun, 2014

2 commits


24 Nov, 2013

2 commits

  • More prep work for immutable biovecs - with immutable bvecs drivers
    won't be able to use the biovec directly, they'll need to use helpers
    that take into account bio->bi_iter.bi_bvec_done.

    This updates callers for the new usage without changing the
    implementation yet.

    Signed-off-by: Kent Overstreet
    Cc: Jens Axboe
    Cc: Geert Uytterhoeven
    Cc: Benjamin Herrenschmidt
    Cc: Paul Mackerras
    Cc: "Ed L. Cashin"
    Cc: Nick Piggin
    Cc: Lars Ellenberg
    Cc: Jiri Kosina
    Cc: Paul Clements
    Cc: Jim Paris
    Cc: Geoff Levand
    Cc: Yehuda Sadeh
    Cc: Sage Weil
    Cc: Alex Elder
    Cc: ceph-devel@vger.kernel.org
    Cc: Joshua Morris
    Cc: Philip Kelleher
    Cc: Konrad Rzeszutek Wilk
    Cc: Jeremy Fitzhardinge
    Cc: Neil Brown
    Cc: Martin Schwidefsky
    Cc: Heiko Carstens
    Cc: linux390@de.ibm.com
    Cc: Nagalakshmi Nandigama
    Cc: Sreekanth Reddy
    Cc: support@lsi.com
    Cc: "James E.J. Bottomley"
    Cc: Greg Kroah-Hartman
    Cc: Alexander Viro
    Cc: Steven Whitehouse
    Cc: Herton Ronaldo Krzesinski
    Cc: Tejun Heo
    Cc: Andrew Morton
    Cc: Guo Chao
    Cc: Asai Thambi S P
    Cc: Selvan Mani
    Cc: Sam Bradshaw
    Cc: Matthew Wilcox
    Cc: Keith Busch
    Cc: Stephen Hemminger
    Cc: Quoc-Son Anh
    Cc: Sebastian Ott
    Cc: Nitin Gupta
    Cc: Minchan Kim
    Cc: Jerome Marchand
    Cc: Seth Jennings
    Cc: "Martin K. Petersen"
    Cc: Mike Snitzer
    Cc: Vivek Goyal
    Cc: "Darrick J. Wong"
    Cc: Chris Metcalf
    Cc: Jan Kara
    Cc: linux-m68k@lists.linux-m68k.org
    Cc: linuxppc-dev@lists.ozlabs.org
    Cc: drbd-user@lists.linbit.com
    Cc: nbd-general@lists.sourceforge.net
    Cc: cbe-oss-dev@lists.ozlabs.org
    Cc: xen-devel@lists.xensource.com
    Cc: virtualization@lists.linux-foundation.org
    Cc: linux-raid@vger.kernel.org
    Cc: linux-s390@vger.kernel.org
    Cc: DL-MPTFusionLinux@lsi.com
    Cc: linux-scsi@vger.kernel.org
    Cc: devel@driverdev.osuosl.org
    Cc: linux-fsdevel@vger.kernel.org
    Cc: cluster-devel@redhat.com
    Cc: linux-mm@kvack.org
    Acked-by: Geoff Levand

    Kent Overstreet
     
  • Immutable biovecs are going to require an explicit iterator. To
    implement immutable bvecs, a later patch is going to add a bi_bvec_done
    member to this struct; for now, this patch effectively just renames
    things.

    Signed-off-by: Kent Overstreet
    Cc: Jens Axboe
    Cc: Geert Uytterhoeven
    Cc: Benjamin Herrenschmidt
    Cc: Paul Mackerras
    Cc: "Ed L. Cashin"
    Cc: Nick Piggin
    Cc: Lars Ellenberg
    Cc: Jiri Kosina
    Cc: Matthew Wilcox
    Cc: Geoff Levand
    Cc: Yehuda Sadeh
    Cc: Sage Weil
    Cc: Alex Elder
    Cc: ceph-devel@vger.kernel.org
    Cc: Joshua Morris
    Cc: Philip Kelleher
    Cc: Rusty Russell
    Cc: "Michael S. Tsirkin"
    Cc: Konrad Rzeszutek Wilk
    Cc: Jeremy Fitzhardinge
    Cc: Neil Brown
    Cc: Alasdair Kergon
    Cc: Mike Snitzer
    Cc: dm-devel@redhat.com
    Cc: Martin Schwidefsky
    Cc: Heiko Carstens
    Cc: linux390@de.ibm.com
    Cc: Boaz Harrosh
    Cc: Benny Halevy
    Cc: "James E.J. Bottomley"
    Cc: Greg Kroah-Hartman
    Cc: "Nicholas A. Bellinger"
    Cc: Alexander Viro
    Cc: Chris Mason
    Cc: "Theodore Ts'o"
    Cc: Andreas Dilger
    Cc: Jaegeuk Kim
    Cc: Steven Whitehouse
    Cc: Dave Kleikamp
    Cc: Joern Engel
    Cc: Prasad Joshi
    Cc: Trond Myklebust
    Cc: KONISHI Ryusuke
    Cc: Mark Fasheh
    Cc: Joel Becker
    Cc: Ben Myers
    Cc: xfs@oss.sgi.com
    Cc: Steven Rostedt
    Cc: Frederic Weisbecker
    Cc: Ingo Molnar
    Cc: Len Brown
    Cc: Pavel Machek
    Cc: "Rafael J. Wysocki"
    Cc: Herton Ronaldo Krzesinski
    Cc: Ben Hutchings
    Cc: Andrew Morton
    Cc: Guo Chao
    Cc: Tejun Heo
    Cc: Asai Thambi S P
    Cc: Selvan Mani
    Cc: Sam Bradshaw
    Cc: Wei Yongjun
    Cc: "Roger Pau Monné"
    Cc: Jan Beulich
    Cc: Stefano Stabellini
    Cc: Ian Campbell
    Cc: Sebastian Ott
    Cc: Christian Borntraeger
    Cc: Minchan Kim
    Cc: Jiang Liu
    Cc: Nitin Gupta
    Cc: Jerome Marchand
    Cc: Joe Perches
    Cc: Peng Tao
    Cc: Andy Adamson
    Cc: fanchaoting
    Cc: Jie Liu
    Cc: Sunil Mushran
    Cc: "Martin K. Petersen"
    Cc: Namjae Jeon
    Cc: Pankaj Kumar
    Cc: Dan Magenheimer
    Cc: Mel Gorman

    Kent Overstreet
     

08 Nov, 2013

1 commit

  • The probe function is supposed to return NULL on failure, as we can see in
    kobj_lookup: kobj = probe(dev, index, data); ... if (kobj) return kobj;

    However, in loop and brd, it returns a negative error code via ERR_PTR.

    This causes a crash if we simulate a disk allocation failure and run
    less -f /dev/loop0, because the negative number is interpreted as a pointer:

    BUG: unable to handle kernel NULL pointer dereference at 00000000000002b4
    IP: [] __blkdev_get+0x28/0x450
    PGD 23c677067 PUD 23d6d1067 PMD 0
    Oops: 0000 [#1] PREEMPT SMP
    Modules linked in: loop hpfs nvidia(PO) ip6table_filter ip6_tables uvesafb cfbcopyarea cfbimgblt cfbfillrect fbcon font bitblit fbcon_rotate fbcon_cw fbcon_ud fbcon_ccw softcursor fb fbdev msr ipt_MASQUERADE iptable_nat nf_nat_ipv4 nf_conntrack_ipv4 nf_defrag_ipv4 xt_state ipt_REJECT xt_tcpudp iptable_filter ip_tables x_tables bridge stp llc tun ipv6 cpufreq_stats cpufreq_ondemand cpufreq_userspace cpufreq_powersave cpufreq_conservative hid_generic spadfs usbhid hid fuse raid0 snd_usb_audio snd_pcm_oss snd_mixer_oss md_mod snd_pcm snd_timer snd_page_alloc snd_hwdep snd_usbmidi_lib dmi_sysfs snd_rawmidi nf_nat_ftp nf_nat nf_conntrack_ftp nf_conntrack snd soundcore lm85 hwmon_vid ohci_hcd ehci_pci ehci_hcd serverworks sata_svw libata acpi_cpufreq freq_table mperf ide_core usbcore kvm_amd kvm tg3 i2c_piix4 libphy microcode e100 usb_common ptp skge i2c_core pcspkr k10temp evdev floppy hwmon pps_core mii rtc_cmos button processor unix [last unloaded: nvidia]
    CPU: 1 PID: 6831 Comm: less Tainted: P W O 3.10.15-devel #18
    Hardware name: empty empty/S3992-E, BIOS 'V1.06 ' 06/09/2009
    task: ffff880203cc6bc0 ti: ffff88023e47c000 task.ti: ffff88023e47c000
    RIP: 0010:[] [] __blkdev_get+0x28/0x450
    RSP: 0018:ffff88023e47dbd8 EFLAGS: 00010286
    RAX: ffffffffffffff74 RBX: ffffffffffffff74 RCX: 0000000000000000
    RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000001
    RBP: ffff88023e47dc18 R08: 0000000000000002 R09: 0000000000000000
    R10: 0000000000000000 R11: 0000000000000000 R12: ffff88023f519658
    R13: ffffffff8118c300 R14: 0000000000000000 R15: ffff88023f519640
    FS: 00007f2070bf7700(0000) GS:ffff880247400000(0000) knlGS:0000000000000000
    CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 00000000000002b4 CR3: 000000023da1d000 CR4: 00000000000007e0
    DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
    Stack:
    0000000000000002 0000001d00000000 000000003e47dc50 ffff88023f519640
    ffff88043d5bb668 ffffffff8118c300 ffff88023d683550 ffff88023e47de60
    ffff88023e47dc98 ffffffff8118c10d 0000001d81605698 0000000000000292
    Call Trace:
    [] ? blkdev_get_by_dev+0x60/0x60
    [] blkdev_get+0x1dd/0x370
    [] ? blkdev_get_by_dev+0x60/0x60
    [] ? _raw_spin_unlock+0x2c/0x50
    [] ? blkdev_get_by_dev+0x60/0x60
    [] blkdev_open+0x65/0x80
    [] do_dentry_open.isra.18+0x23e/0x2f0
    [] finish_open+0x34/0x50
    [] do_last.isra.62+0x2d2/0xc50
    [] path_openat.isra.63+0xb8/0x4d0
    [] ? might_fault+0x4e/0xa0
    [] do_filp_open+0x40/0x90
    [] ? _raw_spin_unlock+0x2c/0x50
    [] ? __alloc_fd+0xa5/0x1f0
    [] do_sys_open+0xef/0x1d0
    [] SyS_open+0x19/0x20
    [] system_call_fastpath+0x1a/0x1f
    Code: 44 00 00 55 48 89 e5 41 57 49 89 ff 41 56 41 89 d6 41 55 41 54 4c 8d 67 18 53 48 83 ec 18 89 75 cc e9 f2 00 00 00 0f 1f 44 00 00 8b 80 40 03 00 00 48 89 df 4c 8b 68 58 e8 d5
    a4 07 00 44 89
    RIP [] __blkdev_get+0x28/0x450
    RSP
    CR2: 00000000000002b4
    ---[ end trace bb7f32dbf02398dc ]---

    The brd change should be backported to stable kernels starting with 2.6.25.
    The loop change should be backported to stable kernels starting with 2.6.22.
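
The arithmetic behind the oops can be checked with a short sketch. It uses only values visible in the trace above: RAX, CR2, and the 0x340 displacement that appears in the Code bytes (`8b 80 40 03 00 00`). The `MAX_ERRNO` value of 4095 is quoted from memory of the kernel's err.h and should be treated as an assumption; the helper names are hypothetical.

```python
MASK64 = (1 << 64) - 1
MAX_ERRNO = 4095  # assumption: the kernel's ERR_PTR range is the top 4095 addresses


def to_signed64(value):
    """Interpret a 64-bit register value as a signed integer."""
    return value - (1 << 64) if value & (1 << 63) else value


def to_unsigned64(value):
    """Wrap an integer into the 64-bit unsigned range, as pointer math would."""
    return value & MASK64


def looks_like_err_ptr(value):
    """Mirror the idea of IS_ERR(): is the value in the top MAX_ERRNO addresses?"""
    return to_unsigned64(value) >= to_unsigned64(-MAX_ERRNO)


RAX = 0xFFFFFFFFFFFFFF74   # from the register dump
LOAD_OFFSET = 0x340        # displacement in the faulting instruction
CR2 = 0x2B4                # faulting address from the oops

if __name__ == "__main__":
    print(to_signed64(RAX))                       # a small negative number
    print(looks_like_err_ptr(RAX))                # still inside the error range
    print(hex(to_unsigned64(RAX + LOAD_OFFSET)))  # wraps around to a low address
```

Adding the displacement to the "pointer" wraps past zero and lands on the low address reported in CR2, which is why the failure shows up as a NULL pointer dereference rather than an obviously bad pointer.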

    Signed-off-by: Mikulas Patocka
    Acked-by: Tejun Heo
    Cc: stable@kernel.org # 2.6.22+
    Signed-off-by: Jens Axboe

    Mikulas Patocka
     

25 May, 2013

1 commit

  • The index on the page must be set before it is inserted in the radix
    tree. Otherwise there is a small race which can occur during lookup
    where the page can be found with the incorrect index. This will trigger
    the BUG_ON() in brd_lookup_page().
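
The ordering issue can be sketched without real threads by calling a "racing" lookup at the moment the page becomes visible. This is only an illustration under stated assumptions: a dict stands in for the radix tree, and `Page`, `insert_page` and `lookup_sees_correct_index` are hypothetical names, not the driver's code.

```python
class Page:
    def __init__(self):
        self.index = None  # not yet initialised


def lookup_sees_correct_index(tree, idx):
    """Stand-in for a concurrent lookup that checks page.index == idx."""
    page = tree.get(idx)
    return page is not None and page.index == idx


def insert_page(tree, idx, set_index_first, racing_lookup):
    """Insert a page and report what a lookup racing with the insert observes."""
    page = Page()
    if set_index_first:
        page.index = idx           # initialise before publishing
    tree[idx] = page               # the page is visible to lookups from here on
    observed = racing_lookup(tree, idx)
    if not set_index_first:
        page.index = idx           # too late: a lookup may already have run
    return observed


if __name__ == "__main__":
    print(insert_page({}, 7, False, lookup_sees_correct_index))
    print(insert_page({}, 7, True, lookup_sees_correct_index))
```

With the index assigned only after insertion, the simulated lookup finds the page with a mismatched index, which corresponds to the condition the BUG_ON() checks for.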

    Signed-off-by: Brian Behlendorf
    Reported-by: Chris Wedgwood
    Cc: Jens Axboe
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Brian Behlendorf
     

24 Mar, 2013

1 commit

  • Just a little convenience macro - main reason to add it now is preparing
    for immutable bio vecs, it'll reduce the size of the patch that puts
    bi_sector/bi_size/bi_idx into a struct bvec_iter.

    Signed-off-by: Kent Overstreet
    CC: Jens Axboe
    CC: Lars Ellenberg
    CC: Jiri Kosina
    CC: Alasdair Kergon
    CC: dm-devel@redhat.com
    CC: Neil Brown
    CC: Martin Schwidefsky
    CC: Heiko Carstens
    CC: linux-s390@vger.kernel.org
    CC: Chris Mason
    CC: Steven Whitehouse
    Acked-by: Steven Whitehouse

    Kent Overstreet
     

20 Mar, 2012

1 commit


04 Jan, 2012

1 commit

  • Move invalidate_bdev, block_sync_page into fs/block_dev.c. Export
    kill_bdev as well, so brd doesn't have to open code it. Reduce
    buffer_head.h requirement accordingly.

    Removed a rather large comment from invalidate_bdev, as it looked a bit
    obsolete to bother moving. The small comment replacing it says enough.

    Signed-off-by: Nick Piggin
    Cc: Al Viro
    Cc: Christoph Hellwig
    Signed-off-by: Andrew Morton
    Signed-off-by: Al Viro

    Al Viro
     

12 Sep, 2011

1 commit

  • There is very little benefit in letting a ->make_request
    instance update the bio's device and sector and loop around it in
    __generic_make_request when we can achieve the same by calling
    generic_make_request from the driver and letting the loop in
    generic_make_request handle it.

    Note that various drivers got the return value from ->make_request wrong
    and returned non-zero values for errors.

    Signed-off-by: Christoph Hellwig
    Acked-by: NeilBrown
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     

27 May, 2011

5 commits

  • Export the 'rd_nr', 'rd_size' and 'max_part' parameters to sysfs so the
    user can see how many devices are allowed, how big each device is and how
    many partitions are supported. If 'max_part' is 0, the device simply
    doesn't support partitioning.

    Also note that 'max_part' can be adjusted to the form of a power of 2
    minus 1 if needed. Users should check this value after the module is
    loaded if they want to use that number correctly (e.g. with fdisk, mknod, etc.).
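
A minimal sketch of the "power of 2 minus 1" adjustment, assuming it is derived from the position of the highest set bit (the kernel's fls()). The function names `fls` and `adjust_max_part` are chosen here for illustration.

```python
def fls(value):
    """Position of the highest set bit, counting from 1; 0 for value 0."""
    return value.bit_length()


def adjust_max_part(max_part):
    """Round max_part up to the nearest 2**n - 1; 0 means no partitions."""
    if max_part <= 0:
        return 0
    return (1 << fls(max_part)) - 1


if __name__ == "__main__":
    for requested in (0, 1, 10, 15, 16, 63):
        print(requested, "->", adjust_max_part(requested))
```

A request that is already of the form 2**n - 1 is left unchanged, while any other value grows to the next such number, which is why the effective value should be read back after loading the module.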

    Signed-off-by: Namhyung Kim
    Cc: Laurent Vivier
    Signed-off-by: Jens Axboe

    Namhyung Kim
     
  • If 'rd_nr' param was not specified, 16 (can be adjusted via
    CONFIG_BLK_DEV_RAM_COUNT) devices would be created by default
    but comment said 1. Fix it.

    Signed-off-by: Namhyung Kim
    Signed-off-by: Jens Axboe

    Namhyung Kim
     
  • When finding or allocating a ram disk device, brd_probe() did not take
    partition numbers into account, so it could resolve to a different
    device. Consider the following example (I set CONFIG_BLK_DEV_RAM_COUNT=4
    for simplicity):

    $ sudo modprobe brd max_part=15
    $ ls -l /dev/ram*
    brw-rw---- 1 root disk 1, 0 2011-05-25 15:41 /dev/ram0
    brw-rw---- 1 root disk 1, 16 2011-05-25 15:41 /dev/ram1
    brw-rw---- 1 root disk 1, 32 2011-05-25 15:41 /dev/ram2
    brw-rw---- 1 root disk 1, 48 2011-05-25 15:41 /dev/ram3
    $ sudo mknod /dev/ram4 b 1 64
    $ sudo dd if=/dev/zero of=/dev/ram4 bs=4k count=256
    256+0 records in
    256+0 records out
    1048576 bytes (1.0 MB) copied, 0.00215578 s, 486 MB/s
    $ ls -l /dev/ram*
    brw-rw---- 1 root disk 1, 0 2011-05-25 15:41 /dev/ram0
    brw-rw---- 1 root disk 1, 16 2011-05-25 15:41 /dev/ram1
    brw-rw---- 1 root disk 1, 32 2011-05-25 15:41 /dev/ram2
    brw-rw---- 1 root disk 1, 48 2011-05-25 15:41 /dev/ram3
    brw-r--r-- 1 root root 1, 64 2011-05-25 15:45 /dev/ram4
    brw-rw---- 1 root disk 1, 1024 2011-05-25 15:44 /dev/ram64

    After this patch, /dev/ram4 - instead of /dev/ram64 - was
    accessed correctly.

    In addition, 'range' passed to blk_register_region() should
    include all range of dev_t that RAMDISK_MAJOR can address.
    It does not need to be limited by partition numbers unless
    'rd_nr' param was specified.
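
The mix-up in the example above can be reproduced with a little arithmetic. This sketch assumes each disk owns a block of 2**shift minors, with shift derived from max_part via the highest set bit; `probe_index` and `first_minor` are hypothetical helpers, not the driver's functions.

```python
def part_shift(max_part):
    """Number of minor bits reserved for partitions (0 if partitioning is off)."""
    return max_part.bit_length() if max_part > 0 else 0


def probe_index(minor, max_part, account_for_partitions):
    """Map a device minor number to a ram disk index."""
    if account_for_partitions:
        return minor >> part_shift(max_part)
    return minor  # old behaviour: the minor is used as the index directly


def first_minor(index, max_part):
    """First minor number owned by the ram disk with the given index."""
    return index << part_shift(max_part)


if __name__ == "__main__":
    minor, max_part = 64, 15  # mknod /dev/ram4 b 1 64 with max_part=15
    for fixed in (False, True):
        index = probe_index(minor, max_part, fixed)
        print("ram%d" % index, "first minor", first_minor(index, max_part))
```

Without the shift, minor 64 is treated as disk index 64, whose first minor is 1024, matching the unexpected /dev/ram64 node in the listing. With the shift, the same minor resolves to index 4.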

    Signed-off-by: Namhyung Kim
    Cc: Laurent Vivier
    Cc: stable@kernel.org
    Signed-off-by: Jens Axboe

    Namhyung Kim
     
  • The 'max_part' parameter controls the maximum number of partitions
    a brd device can have. However, if a user specifies a very large
    value, it would exceed the limit of the device minor number and
    can cause a kernel panic (or, at least, produce invalid device
    nodes in some cases).

    On my desktop system, the following command kills the kernel. On qemu,
    it triggers a similar oops but the kernel stays alive:

    $ sudo modprobe brd max_part=100000
    BUG: unable to handle kernel NULL pointer dereference at 0000000000000058
    IP: [] sysfs_create_dir+0x2d/0xae
    PGD 7af1067 PUD 7b19067 PMD 0
    Oops: 0000 [#1] SMP
    last sysfs file:
    CPU 0
    Modules linked in: brd(+)

    Pid: 44, comm: insmod Tainted: G W 2.6.39-qemu+ #158 Bochs Bochs
    RIP: 0010:[] [] sysfs_create_dir+0x2d/0xae
    RSP: 0018:ffff880007b15d78 EFLAGS: 00000286
    RAX: ffff880007b05478 RBX: ffff880007a52760 RCX: ffff880007b15dc8
    RDX: ffff880007a4f900 RSI: ffff880007b15e48 RDI: ffff880007a52760
    RBP: ffff880007b15da8 R08: 0000000000000002 R09: 0000000000000000
    R10: ffff880007b15e48 R11: ffff880007b05478 R12: 0000000000000000
    R13: ffff880007b05478 R14: 0000000000400920 R15: 0000000000000063
    FS: 0000000002160880(0063) GS:ffff880007c00000(0000) knlGS:0000000000000000
    CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 0000000000000058 CR3: 0000000007b1c000 CR4: 00000000000006b0
    DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    DR3: 0000000000000000 DR6: 0000000000000000 DR7: 0000000000000000
    Process insmod (pid: 44, threadinfo ffff880007b14000, task ffff880007acb980)
    Stack:
    ffff880007b15dc8 ffff880007b05478 ffff880007b15da8 00000000fffffffe
    ffff880007a52760 ffff880007b05478 ffff880007b15de8 ffffffff81143c0a
    0000000000400920 ffff880007a52760 ffff880007b05478 0000000000000000
    Call Trace:
    [] kobject_add_internal+0xdf/0x1a0
    [] kobject_add_varg+0x41/0x50
    [] kobject_add+0x64/0x66
    [] blk_register_queue+0x5f/0xb8
    [] add_disk+0xdf/0x289
    [] brd_init+0xdf/0x1aa [brd]
    [] ? 0xffffffffa0003fff
    [] ? 0xffffffffa0003fff
    [] do_one_initcall+0x7a/0x12e
    [] sys_init_module+0x9c/0x1dc
    [] system_call_fastpath+0x16/0x1b
    Code: 89 e5 41 55 41 54 53 48 89 fb 48 83 ec 18 48 85 ff 75 04 0f 0b eb fe 48 8b 47 18 49 c7 c4 70 1e 4d 81 48 85 c0 74 04 4c 8b 60 30
    8b 44 24 58 45 31 ed 0f b6 c4 85 c0 74 0d 48 8b 43 28 48 89
    RIP [] sysfs_create_dir+0x2d/0xae
    RSP
    CR2: 0000000000000058
    ---[ end trace aebb1175ce1f6739 ]---
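
A rough sketch of why max_part=100000 cannot fit, assuming a 20-bit minor number space (MINORBITS in the kernel headers, quoted from memory) and the default of 16 ram disks mentioned in this log. The helper `fits_in_minor_space` is hypothetical and is not a description of the check the patch actually adds.

```python
MINORBITS = 20          # assumption: width of the minor number field in dev_t
DEFAULT_RD_NR = 16      # default number of ram disks noted elsewhere in this log


def minors_needed(rd_nr, max_part):
    """Minor numbers consumed when each disk reserves 2**fls(max_part) minors."""
    shift = max_part.bit_length() if max_part > 0 else 0
    return rd_nr << shift


def fits_in_minor_space(rd_nr, max_part):
    """True if all disks and their partitions fit into the minor number space."""
    return minors_needed(rd_nr, max_part) <= (1 << MINORBITS)


if __name__ == "__main__":
    for max_part in (15, 63, 100000):
        print(max_part,
              minors_needed(DEFAULT_RD_NR, max_part),
              fits_in_minor_space(DEFAULT_RD_NR, max_part))
```

Under these assumptions a request of 100000 partitions reserves 131072 minors per disk, so 16 disks need twice as many minors as exist.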

    Signed-off-by: Namhyung Kim
    Cc: Laurent Vivier
    Cc: stable@kernel.org
    Signed-off-by: Jens Axboe

    Namhyung Kim
     
  • brd_refcnt, brd_offset, brd_sizelimit and brd_blocksize in struct
    brd_device seem to be copied from struct loop_device but they're
    not used anywhere. Let's get rid of them.

    Signed-off-by: Namhyung Kim
    Signed-off-by: Jens Axboe

    Namhyung Kim
     

23 Oct, 2010

1 commit

  • * 'for-2.6.37/barrier' of git://git.kernel.dk/linux-2.6-block: (46 commits)
    xen-blkfront: disable barrier/flush write support
    Added blk-lib.c and blk-barrier.c was renamed to blk-flush.c
    block: remove BLKDEV_IFL_WAIT
    aic7xxx_old: removed unused 'req' variable
    block: remove the BH_Eopnotsupp flag
    block: remove the BLKDEV_IFL_BARRIER flag
    block: remove the WRITE_BARRIER flag
    swap: do not send discards as barriers
    fat: do not send discards as barriers
    ext4: do not send discards as barriers
    jbd2: replace barriers with explicit flush / FUA usage
    jbd2: Modify ASYNC_COMMIT code to not rely on queue draining on barrier
    jbd: replace barriers with explicit flush / FUA usage
    nilfs2: replace barriers with explicit flush / FUA usage
    reiserfs: replace barriers with explicit flush / FUA usage
    gfs2: replace barriers with explicit flush / FUA usage
    btrfs: replace barriers with explicit flush / FUA usage
    xfs: replace barriers with explicit flush / FUA usage
    block: pass gfp_mask and flags to sb_issue_discard
    dm: convey that all flushes are processed as empty
    ...

    Linus Torvalds
     

05 Oct, 2010

1 commit

  • The block device drivers have all gained new lock_kernel
    calls from a recent pushdown, and some of the drivers
    were already using the BKL before.

    This turns the BKL into a set of per-driver mutexes.
    Still need to check whether this is safe to do.

    file=$1
    name=$2
    if grep -q lock_kernel ${file} ; then
        if grep -q 'include.*linux.mutex.h' ${file} ; then
            sed -i '/include.*<linux\/smp_lock.h>/d' ${file}
        else
            sed -i 's/include.*<linux\/smp_lock.h>.*$/include <linux\/mutex.h>/g' ${file}
        fi
        sed -i ${file} \
            -e "/^#include.*linux.mutex.h/,$ {
                1,/^\(static\|int\|long\)/ {
                    /^\(static\|int\|long\)/istatic DEFINE_MUTEX(${name}_mutex);

    } }" \
            -e "s/\(un\)*lock_kernel\>[ ]*()/mutex_\1lock(\&${name}_mutex)/g" \
            -e '/[ ]*cycle_kernel_lock();/d'
    else
        sed -i -e '/include.*\<smp_lock.h\>/d' ${file} \
            -e '/cycle_kernel_lock()/d'
    fi
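
The core substitution in the script above can be sketched in Python for readers less familiar with sed. This covers only the call rewriting and the removal of cycle_kernel_lock() lines, not the include handling or the DEFINE_MUTEX insertion; `convert_bkl_calls` is a hypothetical name, and the leading word boundary is slightly stricter than the sed pattern.

```python
import re

# Matches lock_kernel() and unlock_kernel(), capturing the optional "un".
BKL_CALL = re.compile(r"\b(un)?lock_kernel\b[ \t]*\(\)")


def convert_bkl_calls(source, name):
    """Rewrite BKL calls to use a per-driver mutex called <name>_mutex."""
    def replace(match):
        prefix = match.group(1) or ""
        return "mutex_%slock(&%s_mutex)" % (prefix, name)

    converted = BKL_CALL.sub(replace, source)
    # Drop any line that calls cycle_kernel_lock(), as the sed 'd' command does.
    kept = [line for line in converted.splitlines(keepends=True)
            if "cycle_kernel_lock();" not in line]
    return "".join(kept)


if __name__ == "__main__":
    example = (
        "lock_kernel();\n"
        "do_work();\n"
        "cycle_kernel_lock();\n"
        "unlock_kernel();\n"
    )
    print(convert_bkl_calls(example, "brd"))
```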

    Signed-off-by: Arnd Bergmann

    Arnd Bergmann
     

10 Sep, 2010

2 commits

  • Barrier is deemed too heavy and will soon be replaced by FLUSH/FUA
    requests. Deprecate barrier. All REQ_HARDBARRIERs are failed with
    -EOPNOTSUPP and blk_queue_ordered() is replaced with simpler
    blk_queue_flush().

    blk_queue_flush() takes combinations of REQ_FLUSH and FUA. If a
    device has write cache and can flush it, it should set REQ_FLUSH. If
    the device can handle FUA writes, it should also set REQ_FUA.

    All blk_queue_ordered() users are converted.

    * ORDERED_DRAIN is mapped to 0 which is the default value.
    * ORDERED_DRAIN_FLUSH is mapped to REQ_FLUSH.
    * ORDERED_DRAIN_FLUSH_FUA is mapped to REQ_FLUSH | REQ_FUA.
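
The mapping listed above can be written down as a small table. The bit values below are placeholders chosen for the sketch, not the kernel's actual REQ_* values, and `flush_flags` is a hypothetical helper that follows the rule stated in the message: FUA is only set in addition to FLUSH.

```python
from enum import IntFlag


class Req(IntFlag):
    NONE = 0
    FLUSH = 1   # placeholder bit, not the kernel's REQ_FLUSH value
    FUA = 2     # placeholder bit, not the kernel's REQ_FUA value


# Old blk_queue_ordered() modes and the flags passed to blk_queue_flush().
ORDERED_TO_FLUSH = {
    "ORDERED_DRAIN": Req.NONE,
    "ORDERED_DRAIN_FLUSH": Req.FLUSH,
    "ORDERED_DRAIN_FLUSH_FUA": Req.FLUSH | Req.FUA,
}


def flush_flags(has_write_cache, supports_fua):
    """Pick flags for a device: FLUSH if it has a cache, FUA only on top of that."""
    if not has_write_cache:
        return Req.NONE
    flags = Req.FLUSH
    if supports_fua:
        flags |= Req.FUA
    return flags


if __name__ == "__main__":
    for mode, flags in ORDERED_TO_FLUSH.items():
        print(mode, int(flags))
    print(int(flush_flags(True, True)))
```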

    Signed-off-by: Tejun Heo
    Acked-by: Boaz Harrosh
    Cc: Christoph Hellwig
    Cc: Nick Piggin
    Cc: Michael S. Tsirkin
    Cc: Jeremy Fitzhardinge
    Cc: Chris Wright
    Cc: FUJITA Tomonori
    Cc: Geert Uytterhoeven
    Cc: David S. Miller
    Cc: Alasdair G Kergon
    Cc: Pierre Ossman
    Cc: Stefan Weinhuber
    Signed-off-by: Jens Axboe

    Tejun Heo
     
  • Nobody is making meaningful use of ORDERED_BY_TAG now and queue
    draining for barrier requests will be removed soon which will render
    the advantage of tag ordering moot. Kill ORDERED_BY_TAG. The
    following users are affected.

    * brd: converted to ORDERED_DRAIN.
    * virtio_blk: ORDERED_TAG path was already marked deprecated. Removed.
    * xen-blkfront: ORDERED_TAG case dropped.

    Signed-off-by: Tejun Heo
    Cc: Christoph Hellwig
    Cc: Nick Piggin
    Cc: Michael S. Tsirkin
    Cc: Jeremy Fitzhardinge
    Cc: Chris Wright
    Signed-off-by: Jens Axboe

    Tejun Heo
     

08 Aug, 2010

3 commits

  • As a preparation for the removal of the big kernel
    lock in the block layer, this removes the BKL
    from the common ioctl handling code, moving it
    into every single driver still using it.

    Signed-off-by: Arnd Bergmann
    Acked-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Arnd Bergmann
     
  • This removes q->prepare_flush_fn completely (changes the
    blk_queue_ordered API).

    Signed-off-by: FUJITA Tomonori
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    FUJITA Tomonori
     
  • Remove the current bio flags and reuse the request flags for the bio, too.
    This makes it easier to trace the type of I/O from the filesystem
    down to the block driver. There were two flags in the bio that were
    missing in the requests: BIO_RW_UNPLUG and BIO_RW_AHEAD. Also I've
    renamed two request flags that had a superfluous RW in them.

    Note that the flags are in bio.h despite having the REQ_ name - as
    blkdev.h includes bio.h that is the only way to go for now.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     

01 Jun, 2010

1 commit

  • Support discard requests in brd by zeroing or deleting the underlying backing
    pages. This is simply to support the testing and documentation role of
    the brd code.
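
A toy model of the two discard strategies, assuming a sparse store keyed by page index and a 4096-byte page. The class `RamDisk` and its methods are hypothetical and far simpler than the driver. Only pages fully covered by the discarded range are touched in this sketch.

```python
PAGE_SIZE = 4096  # assumption for the sketch


class RamDisk:
    def __init__(self):
        self.pages = {}  # page index -> bytearray(PAGE_SIZE), allocated on write

    def write(self, offset, data):
        for i, byte in enumerate(data):
            pos = offset + i
            page = self.pages.setdefault(pos // PAGE_SIZE, bytearray(PAGE_SIZE))
            page[pos % PAGE_SIZE] = byte

    def read(self, offset, length):
        out = bytearray()
        for pos in range(offset, offset + length):
            page = self.pages.get(pos // PAGE_SIZE)
            # Pages that were never written (or were freed) read back as zeros.
            out.append(page[pos % PAGE_SIZE] if page is not None else 0)
        return bytes(out)

    def discard(self, offset, length, free=True):
        first = -(-offset // PAGE_SIZE)          # first fully covered page
        end = (offset + length) // PAGE_SIZE     # one past the last full page
        for idx in range(first, end):
            if idx not in self.pages:
                continue
            if free:
                del self.pages[idx]              # delete the backing page
            else:
                self.pages[idx][:] = bytes(PAGE_SIZE)  # keep it, but zero it


if __name__ == "__main__":
    disk = RamDisk()
    disk.write(0, b"\xff" * (2 * PAGE_SIZE))
    disk.discard(0, PAGE_SIZE, free=True)
    disk.discard(PAGE_SIZE, PAGE_SIZE, free=False)
    print(sorted(disk.pages), disk.read(0, 4), disk.read(PAGE_SIZE, 4))
```

Either way a reader sees zeros afterwards; the difference is whether the memory behind the page is released.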

    Signed-off-by: Nick Piggin
    Signed-off-by: Jens Axboe

    Nick Piggin
     

30 Mar, 2010

1 commit

  • …it slab.h inclusion from percpu.h

    percpu.h is included by sched.h and module.h and thus ends up being
    included when building most .c files. percpu.h includes slab.h which
    in turn includes gfp.h making everything defined by the two files
    universally available and complicating inclusion dependencies.

    percpu.h -> slab.h dependency is about to be removed. Prepare for
    this change by updating users of gfp and slab facilities include those
    headers directly instead of assuming availability. As this conversion
    needs to touch large number of source files, the following script is
    used as the basis of conversion.

    http://userweb.kernel.org/~tj/misc/slabh-sweep.py

    The script does the following.

    * Scan files for gfp and slab usages and update includes such that
    only the necessary includes are there. ie. if only gfp is used,
    gfp.h, if slab is used, slab.h.

    * When the script inserts a new include, it looks at the include
    blocks and tries to place the new include so that its order conforms
    to its surroundings. It's put in the include block which contains
    core kernel includes, in the same order that the rest are ordered -
    alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
    doesn't seem to be any matching order.

    * If the script can't find a place to put a new include (mostly
    because the file doesn't have fitting include block), it prints out
    an error message indicating which .h file needs to be added to the
    file.

    The conversion was done in the following steps.

    1. The initial automatic conversion of all .c files updated slightly
    over 4000 files, deleting around 700 includes and adding ~480 gfp.h
    and ~3000 slab.h inclusions. The script emitted errors for ~400
    files.

    2. Each error was manually checked. Some didn't need the inclusion,
    some needed manual addition while adding it to implementation .h or
    embedding .c file was more appropriate for others. This step added
    inclusions to around 150 files.

    3. The script was run again and the output was compared to the edits
    from #2 to make sure no file was left behind.

    4. Several build tests were done and a couple of problems were fixed.
    e.g. lib/decompress_*.c used malloc/free() wrappers around slab
    APIs requiring slab.h to be added manually.

    5. The script was run on all .h files but without automatically
    editing them as sprinkling gfp.h and slab.h inclusions around .h
    files could easily lead to inclusion dependency hell. Most gfp.h
    inclusion directives were ignored as stuff from gfp.h was usually
    widely available and often used in preprocessor macros. Each
    slab.h inclusion directive was examined and added manually as
    necessary.

    6. percpu.h was updated not to include slab.h.

    7. Build tests were done on the following configurations and failures
    were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my
    distributed build env didn't work with gcov compiles) and a few
    more options had to be turned off depending on archs to make things
    build (like ipr on powerpc/64 which failed due to missing writeq).

    * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
    * powerpc and powerpc64 SMP allmodconfig
    * sparc and sparc64 SMP allmodconfig
    * ia64 SMP allmodconfig
    * s390 SMP allmodconfig
    * alpha SMP allmodconfig
    * um on x86_64 SMP allmodconfig

    8. percpu.h modifications were reverted so that it could be applied as
    a separate patch and serve as bisection point.

    Given the fact that I had only a couple of failures from tests on step
    6, I'm fairly confident about the coverage of this conversion patch.
    If there is a breakage, it's likely to be something in one of the arch
    headers, which should be easily discoverable on most builds of
    the specific arch.

    Signed-off-by: Tejun Heo <tj@kernel.org>
    Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
    Cc: Ingo Molnar <mingo@redhat.com>
    Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>

    Tejun Heo
     

26 Feb, 2010

1 commit

  • The block layer calling convention is blk_queue_<limit name>.
    blk_queue_max_sectors predates this practice, leading to some confusion.
    Rename the function to appropriately reflect that its intended use is to
    set max_hw_sectors.

    Also introduce a temporary wrapper for backwards compatibility. This can
    be removed after the merge window is closed.

    Signed-off-by: Martin K. Petersen
    Signed-off-by: Jens Axboe

    Martin K. Petersen
     

22 Sep, 2009

1 commit


11 Jun, 2009

1 commit


15 Apr, 2009

2 commits

  • brd is missing a flush_dcache_page. On 2nd thoughts, perhaps it is the
    pagecache's responsibility to flush user virtual aliases (the driver of
    course should flush kernel virtual mappings)... but anyway, there
    already exists cache flushing for one direction of transfer, so we
    should add the other.

    Signed-off-by: Nick Piggin
    Signed-off-by: Jens Axboe

    Nick Piggin
     
  • brd is always ordered (not that it matters, as it is defined not to
    survive when the system goes down). So tell the block layer it is
    ordered, which might be of help with testing filesystems.

    Signed-off-by: Nick Piggin
    Signed-off-by: Jens Axboe

    Nick Piggin
     

21 Oct, 2008

2 commits

  • Signed-off-by: Al Viro

    Al Viro
     
  • To keep the size of changesets sane we split the switch by drivers;
    to keep the damn thing bisectable we do the following:
    1) rename the affected methods, add ones with correct
    prototypes, make (few) callers handle both. That's this changeset.
    2) for each driver convert to new methods. *ALL* drivers
    are converted in this series.
    3) kill the old (renamed) methods.

    Note that it _is_ a flagday; all in-tree drivers are converted and by the
    end of this series no trace of old methods remain. The only reason why
    we do that this way is to keep the damn thing bisectable and allow per-driver
    debugging if anything goes wrong.

    New methods:
    open(bdev, mode)
    release(disk, mode)
    ioctl(bdev, mode, cmd, arg) /* Called without BKL */
    compat_ioctl(bdev, mode, cmd, arg)
    locked_ioctl(bdev, mode, cmd, arg) /* Called with BKL, legacy */

    Signed-off-by: Al Viro

    Al Viro
     

21 Aug, 2008

1 commit

  • The name of the brd block device is "ramdisk", not "brd".
    The block device is registered by register_blkdev(RAMDISK_MAJOR, "ramdisk"),
    so it should be unregistered by unregister_blkdev(RAMDISK_MAJOR, "ramdisk").

    Signed-off-by: Akinobu Mita
    Acked-by: Nick Piggin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Akinobu Mita
     

06 Jun, 2008

1 commit


25 May, 2008

1 commit

  • In 2.6.25, ramdisk devices show up in /proc/partitions, which is a
    behaviour change from the old rd.c. Add GENHD_FL_SUPPRESS_PARTITION_INFO,
    which was present in rd.c.

    All kernels prior to 2.6.25 weren't displaying ramdisks in
    /proc/partitions. Since there are many userspace tools using information
    from /proc/partitions, some of them may now behave incorrectly (I haven't
    tested any, though). For example, before 2.6.25 /proc/partitions was empty
    if no block devices like hard disks and such were detected by the kernel. Now
    all 16 ramdisks are always visible there. Some software may rely on such
    information (I mean, on an empty /proc/partitions).

    There was quite a similar situation back in 2004, and ramdisks were again
    excluded from being displayed. That's why I called this a regression (maybe a bit
    unfortunate). See this patch for info:
    http://kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.3-rc2/2.6.3-rc2-mm1/broken-out/nbd-proc-partitions-fix.patch

    I also think that someone somewhere (a long time ago) excluded ramdisks from
    /proc/partitions for good reasons. It is possible that such a new
    "feature" is now harmless, but I think it is more likely that someone
    will say "hey, /proc/partitions has changed, now my software doesn't work"
    than "hey, where did my new 2.6.25 feature go". nbd devices are also
    excluded, maybe for the very same (unknown to me) reasons.

    Signed-off-by: Marcin Krol
    Signed-off-by: Nick Piggin
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Marcin Krol
     

30 Apr, 2008

1 commit

  • This patch adds partition management for Block RAM Device (BRD).

    This patch keeps the BRD and loop device drivers in sync.

    This patch adds a parameter to the module, max_part, to specify
    the maximum number of partitions per RAM device.

    Example:

    # modprobe brd max_part=63
    # ls -l /dev/ram*
    brw-rw---- 1 root disk 1, 0 2008-04-03 13:39 /dev/ram0
    brw-rw---- 1 root disk 1, 64 2008-04-03 13:39 /dev/ram1
    brw-rw---- 1 root disk 1, 640 2008-04-03 13:39 /dev/ram10
    brw-rw---- 1 root disk 1, 704 2008-04-03 13:39 /dev/ram11
    brw-rw---- 1 root disk 1, 768 2008-04-03 13:39 /dev/ram12
    brw-rw---- 1 root disk 1, 832 2008-04-03 13:39 /dev/ram13
    brw-rw---- 1 root disk 1, 896 2008-04-03 13:39 /dev/ram14
    brw-rw---- 1 root disk 1, 960 2008-04-03 13:39 /dev/ram15
    brw-rw---- 1 root disk 1, 128 2008-04-03 13:39 /dev/ram2
    brw-rw---- 1 root disk 1, 192 2008-04-03 13:39 /dev/ram3
    brw-rw---- 1 root disk 1, 256 2008-04-03 13:39 /dev/ram4
    brw-rw---- 1 root disk 1, 320 2008-04-03 13:39 /dev/ram5
    brw-rw---- 1 root disk 1, 384 2008-04-03 13:39 /dev/ram6
    brw-rw---- 1 root disk 1, 448 2008-04-03 13:39 /dev/ram7
    brw-rw---- 1 root disk 1, 512 2008-04-03 13:39 /dev/ram8
    brw-rw---- 1 root disk 1, 576 2008-04-03 13:39 /dev/ram9
    # fdisk /dev/ram0
    Device contains neither a valid DOS partition table, nor Sun, SGI or OSF disklabel
    Building a new DOS disklabel. Changes will remain in memory only,
    until you decide to write them. After that, of course, the previous
    content won't be recoverable.

    Warning: invalid flag 0x0000 of partition table 4 will be corrected by w(rite)

    Command (m for help): o
    Building a new DOS disklabel. Changes will remain in memory only,
    until you decide to write them. After that, of course, the previous
    content won't be recoverable.

    Warning: invalid flag 0x0000 of partition table 4 will be corrected by w(rite)

    Command (m for help): n
    Command action
    e extended
    p primary partition (1-4)
    p
    Partition number (1-4): 1
    First cylinder (1-2, default 1): 1
    Last cylinder or +size or +sizeM or +sizeK (1-2, default 2): 2

    Command (m for help): w
    The partition table has been altered!

    Calling ioctl() to re-read partition table.
    Syncing disks.
    # ls -l /dev/ram0*
    brw-rw---- 1 root disk 1, 0 2008-04-03 13:40 /dev/ram0
    brw-rw---- 1 root disk 1, 1 2008-04-03 13:40 /dev/ram0p1
    # mkfs /dev/ram0p1
    mke2fs 1.40-WIP (14-Nov-2006)
    Filesystem label=
    OS type: Linux
    Block size=1024 (log=0)
    Fragment size=1024 (log=0)
    4016 inodes, 16032 blocks
    801 blocks (5.00%) reserved for the super user
    First data block=1
    Maximum filesystem blocks=16515072
    2 block groups
    8192 blocks per group, 8192 fragments per group
    2008 inodes per group
    Superblock backups stored on blocks:
    8193

    Writing inode tables: done
    Writing superblocks and filesystem accounting information: done

    This filesystem will be automatically checked every 26 mounts or
    180 days, whichever comes first. Use tune2fs -c or -i to override.
    # mount /dev/ram0p1 /mnt
    # df /mnt
    Filesystem 1K-blocks Used Available Use% Mounted on
    /dev/ram0p1 15521 138 14582 1% /mnt
    # ls -l /mnt
    total 12
    drwx------ 2 root root 12288 2008-04-03 13:41 lost+found
    # umount /mnt
    # rmmod brd
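
    The minor numbers in the listing above advance by 64 per device (0, 64,
    128, ...), and /dev/ram0p1 gets minor 1. A minimal sketch of that
    arithmetic in Python follows. It assumes that each disk reserves a
    power-of-two block of minors sized by the bit length of max_part; the
    helper names are hypothetical and are not taken from the driver source.

```python
def minors_per_disk(max_part: int) -> int:
    """Minors reserved per RAM disk: the whole-disk node plus partitions.

    Assumption: the count is rounded up to a power of two,
    1 << bit_length(max_part), which gives 64 for max_part=63.
    """
    if max_part <= 0:
        return 1  # no partition support: one minor per disk
    return 1 << max_part.bit_length()


def first_minor(disk_index: int, max_part: int) -> int:
    """Minor number of the whole-disk node /dev/ram<disk_index>."""
    return disk_index * minors_per_disk(max_part)


def partition_minor(disk_index: int, part: int, max_part: int) -> int:
    """Minor number of partition <part> (1-based) on a RAM disk."""
    if not 1 <= part <= max_part:
        raise ValueError("partition number out of range")
    return first_minor(disk_index, max_part) + part


if __name__ == "__main__":
    # Reproduce a few of the minors shown by `ls -l /dev/ram*` above.
    for i in (0, 1, 2, 10, 15):
        print(f"/dev/ram{i}: minor {first_minor(i, 63)}")
```

    With max_part=63 this reproduces the values in the listing, for example
    minor 640 for /dev/ram10 and minor 1 for /dev/ram0p1.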

    Signed-off-by: Laurent Vivier
    Acked-by: Nick Piggin
    Cc: Al Viro
    Cc: Jens Axboe
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Laurent Vivier
     

28 Apr, 2008

1 commit

  • Alter the block device ->direct_access() API to work with the new
    get_xip_mem() API (which requires that both kaddr and pfn be returned).

    Some architectures will not do the right thing in their virt_to_page() for use
    by XIP (to translate from the kernel virtual address returned by
    direct_access() to a user-mappable pfn in XIP's page fault handler).

    However, we can't switch it to just return the pfn and not the kaddr, because
    we have no good way to get a kva from a pfn, and XIP requires the kva for its
    read(2) and write(2) handlers. So we have to return both.

    Signed-off-by: Jared Hulbert
    Signed-off-by: Nick Piggin
    Cc: Carsten Otte
    Cc: Heiko Carstens
    Cc: linux-mm@kvack.org
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jared Hulbert
     

23 Apr, 2008

1 commit

  • While looking at the implementation of the RAM-backed block device
    driver, I stumbled across a write-only local variable, which makes
    little sense, so I assume it should actually work like this:

    Signed-off-by: Petr Tesarik
    Signed-off-by: Nick Piggin
    Signed-off-by: Linus Torvalds

    Petr Tesarik
     

09 Feb, 2008

2 commits

  • Support direct_access XIP method with brd.

    Signed-off-by: Nick Piggin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nick Piggin
     
  • This is a rewrite of the ramdisk block device driver.

    The old one is really difficult because it effectively implements a block
    device which serves data out of its own buffer cache. It relies on the dirty
    bit being set to pin its backing store in cache. However, there are
    non-trivial paths which can clear the dirty bit (e.g. try_to_free_buffers()),
    which recently led to data corruption. And in general it is completely
    wrong for a block device driver to do this.

    The new one is more like a regular block device driver. It has no idea about
    vm/vfs stuff. Its backing store is similar to the buffer cache (a simple
    radix-tree of pages), but it doesn't know anything about page cache (the pages
    in the radix tree are not pagecache pages).
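
    The idea of a backing store indexed by page number can be illustrated
    with a small Python sketch. This is not the driver's code: a plain dict
    stands in for the radix tree, the 4096-byte page size is an assumption,
    and the class and method names are hypothetical.

```python
PAGE_SIZE = 4096  # assumed page size for this sketch


class SparseRamDisk:
    """Toy RAM disk: pages are allocated lazily on first write."""

    def __init__(self, size_bytes):
        self.size = size_bytes
        self.pages = {}  # page index -> bytearray(PAGE_SIZE)

    def write(self, offset, data):
        if offset < 0 or offset + len(data) > self.size:
            raise ValueError("write outside device")
        pos = 0
        while pos < len(data):
            # Split the request at page boundaries.
            index, in_page = divmod(offset + pos, PAGE_SIZE)
            chunk = min(PAGE_SIZE - in_page, len(data) - pos)
            page = self.pages.setdefault(index, bytearray(PAGE_SIZE))
            page[in_page:in_page + chunk] = data[pos:pos + chunk]
            pos += chunk

    def read(self, offset, length):
        if offset < 0 or length < 0 or offset + length > self.size:
            raise ValueError("read outside device")
        out = bytearray()
        pos = 0
        while pos < length:
            index, in_page = divmod(offset + pos, PAGE_SIZE)
            chunk = min(PAGE_SIZE - in_page, length - pos)
            page = self.pages.get(index)
            if page is None:
                out += bytes(chunk)  # never-written pages read as zeros
            else:
                out += page[in_page:in_page + chunk]
            pos += chunk
        return bytes(out)
```

    Only pages that have been written consume memory, and the store itself
    knows nothing about any cache layered above it.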

    There is one slight downside -- direct block device access and filesystem
    metadata access go through an extra copy and get stored in RAM twice.
    However, this downside is only slight, because the real buffercache of the
    device is now reclaimable (because we're not playing crazy games with it), so
    under memory intensive situations, footprint should effectively be the same --
    maybe even a slight advantage to the new driver because it can also reclaim
    buffer heads.

    The fact that it now goes through all the regular vm/fs paths makes it
    much more useful for testing, too.

    text data bss dec hex filename
    2837 849 384 4070 fe6 drivers/block/rd.o
    3528 371 12 3911 f47 drivers/block/brd.o

    Text is larger, but data and bss are smaller, making total size smaller.
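
    The totals can be checked with a few lines of Python. The figures are
    copied from the table above; the dictionary layout and the helper name
    are only illustrative.

```python
# Section sizes in bytes, copied from the size table above.
sizes = {
    "drivers/block/rd.o": {"text": 2837, "data": 849, "bss": 384},
    "drivers/block/brd.o": {"text": 3528, "data": 371, "bss": 12},
}


def total(name):
    """Sum of text, data and bss: the 'dec' column of the table."""
    s = sizes[name]
    return s["text"] + s["data"] + s["bss"]


old = total("drivers/block/rd.o")
new = total("drivers/block/brd.o")

if __name__ == "__main__":
    for name in sizes:
        # Print the decimal and hex totals for comparison with the table.
        print(name, total(name), format(total(name), "x"))
    print("bytes saved by brd.o:", old - new)
```

    Text grows by 691 bytes while data and bss together shrink by 850 bytes,
    so the new object is 159 bytes smaller overall.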

    A few other nice things about it:
    - Similar structure and layout to the new loop device handling.
    - Dynamic ramdisk creation.
    - Runtime flexible buffer head size (because it is no longer part of the
      ramdisk code).
    - Boot / load time flexible ramdisk size, which could easily be extended
      to a per-ramdisk runtime changeable size (eg. with an ioctl).
    - Can use highmem for the backing store.

    [akpm@linux-foundation.org: fix build]
    [byron.bbradley@gmail.com: make rd_size non-static]
    Signed-off-by: Nick Piggin
    Signed-off-by: Byron Bradley
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nick Piggin