25 Feb, 2011
1 commit
-
Adam Kovari and others reported that disconnecting an USB drive with
an ntfs-3g filesystem would cause "kernel BUG at fs/inode.c:1421!" to
be triggered.The BUG could be traced back to ioctl(BLKBSZSET), which would
erroneously decrement the refcount on the bdev. This is because
blkdev_get() expects the refcount to be already incremented and either
returns success or decrements the refcount and returns an error.The bug was introduced by e525fd89 (block: make blkdev_get/put()
handle exclusive access), which didn't take into account this behavior
of blkdev_get().This fixes
https://bugzilla.kernel.org/show_bug.cgi?id=29202
(and likely 29792 too)Reported-by: Adam Kovari
Acked-by: Tejun Heo
Signed-off-by: Miklos Szeredi
Signed-off-by: Linus Torvalds
14 Jan, 2011
1 commit
-
* 'for-2.6.38/core' of git://git.kernel.dk/linux-2.6-block: (43 commits)
block: ensure that completion error gets properly traced
blktrace: add missing probe argument to block_bio_complete
block cfq: don't use atomic_t for cfq_group
block cfq: don't use atomic_t for cfq_queue
block: trace event block fix unassigned field
block: add internal hd part table references
block: fix accounting bug on cross partition merges
kref: add kref_test_and_get
bio-integrity: mark kintegrityd_wq highpri and CPU intensive
block: make kblockd_workqueue smarter
Revert "sd: implement sd_check_events()"
block: Clean up exit_io_context() source code.
Fix compile warnings due to missing removal of a 'ret' variable
fs/block: type signature of major_to_index(int) to major_to_index(unsigned)
block: convert !IS_ERR(p) && p to !IS_ERR_NOR_NULL(p)
cfq-iosched: don't check cfqg in choose_service_tree()
fs/splice: Pull buf->ops->confirm() from splice_from_pipe actors
cdrom: export cdrom_check_events()
sd: implement sd_check_events()
sr: implement sr_check_events()
...
28 Nov, 2010
1 commit
-
…/tj/misc into for-2.6.38/core
18 Nov, 2010
1 commit
-
The big kernel lock has been removed from all these files at some point,
leaving only the #include.Remove this too as a cleanup.
Signed-off-by: Arnd Bergmann
Signed-off-by: Linus Torvalds
13 Nov, 2010
1 commit
-
Over time, block layer has accumulated a set of APIs dealing with bdev
open, close, claim and release.* blkdev_get/put() are the primary open and close functions.
* bd_claim/release() deal with exclusive open.
* open/close_bdev_exclusive() are combination of open and claim and
the other way around, respectively.* bd_link/unlink_disk_holder() to create and remove holder/slave
symlinks.* open_by_devnum() wraps bdget() + blkdev_get().
The interface is a bit confusing and the decoupling of open and claim
makes it impossible to properly guarantee exclusive access as
in-kernel open + claim sequence can disturb the existing exclusive
open even before the block layer knows the current open if for another
exclusive access. Reorganize the interface such that,* blkdev_get() is extended to include exclusive access management.
@holder argument is added and, if is @FMODE_EXCL specified, it will
gain exclusive access atomically w.r.t. other exclusive accesses.* blkdev_put() is similarly extended. It now takes @mode argument and
if @FMODE_EXCL is set, it releases an exclusive access. Also, when
the last exclusive claim is released, the holder/slave symlinks are
removed automatically.* bd_claim/release() and close_bdev_exclusive() are no longer
necessary and either made static or removed.* bd_link_disk_holder() remains the same but bd_unlink_disk_holder()
is no longer necessary and removed.* open_bdev_exclusive() becomes a simple wrapper around lookup_bdev()
and blkdev_get(). It also has an unexpected extra bdev_read_only()
test which probably should be moved into blkdev_get().* open_by_devnum() is modified to take @holder argument and pass it to
blkdev_get().Most of bdev open/close operations are unified into blkdev_get/put()
and most exclusive accesses are tested atomically at the open time (as
it should). This cleans up code and removes some, both valid and
invalid, but unnecessary all the same, corner cases.open_bdev_exclusive() and open_by_devnum() can use further cleanup -
rename to blkdev_get_by_path() and blkdev_get_by_devt() and drop
special features. Well, let's leave them for another day.Most conversions are straight-forward. drbd conversion is a bit more
involved as there was some reordering, but the logic should stay the
same.Signed-off-by: Tejun Heo
Acked-by: Neil Brown
Acked-by: Ryusuke Konishi
Acked-by: Mike Snitzer
Acked-by: Philipp Reisner
Cc: Peter Osterlund
Cc: Martin Schwidefsky
Cc: Heiko Carstens
Cc: Jan Kara
Cc: Andrew Morton
Cc: Andreas Dilger
Cc: "Theodore Ts'o"
Cc: Mark Fasheh
Cc: Joel Becker
Cc: Alex Elder
Cc: Christoph Hellwig
Cc: dm-devel@redhat.com
Cc: drbd-dev@lists.linbit.com
Cc: Leo Chen
Cc: Scott Branden
Cc: Chris Mason
Cc: Steven Whitehouse
Cc: Dave Kleikamp
Cc: Joern Engel
Cc: reiserfs-devel@vger.kernel.org
Cc: Alexander Viro
10 Nov, 2010
2 commits
-
Structure hd_geometry is copied to userland with 4 padding bytes
between cylinders and start fields uninitialized on 64-bit platforms.
It leads to leaking of contents of kernel stack memory.Currently there is no memset() in real implementations of getgeo()
in drivers/block/, so it makes sense to have memset() in blkdev_ioctl().Signed-off-by: Vasiliy Kulikov
Signed-off-by: Jens Axboe -
Convert direct reads of an inode's i_size to using i_size_read().
i_size_{read,write} use a seqcount to protect reads from accessing
incomple writes. Concurrent i_size_write()s require mutual exclussion
to protect the seqcount that is used by i_size_{read,write}. But
i_size_read() callers do not need to use additional locking.Signed-off-by: Mike Snitzer
Acked-by: NeilBrown
Acked-by: Lars Ellenberg
Signed-off-by: Jens Axboe
23 Oct, 2010
1 commit
-
* 'for-2.6.37/barrier' of git://git.kernel.dk/linux-2.6-block: (46 commits)
xen-blkfront: disable barrier/flush write support
Added blk-lib.c and blk-barrier.c was renamed to blk-flush.c
block: remove BLKDEV_IFL_WAIT
aic7xxx_old: removed unused 'req' variable
block: remove the BH_Eopnotsupp flag
block: remove the BLKDEV_IFL_BARRIER flag
block: remove the WRITE_BARRIER flag
swap: do not send discards as barriers
fat: do not send discards as barriers
ext4: do not send discards as barriers
jbd2: replace barriers with explicit flush / FUA usage
jbd2: Modify ASYNC_COMMIT code to not rely on queue draining on barrier
jbd: replace barriers with explicit flush / FUA usage
nilfs2: replace barriers with explicit flush / FUA usage
reiserfs: replace barriers with explicit flush / FUA usage
gfs2: replace barriers with explicit flush / FUA usage
btrfs: replace barriers with explicit flush / FUA usage
xfs: replace barriers with explicit flush / FUA usage
block: pass gfp_mask and flags to sb_issue_discard
dm: convey that all flushes are processed as empty
...
17 Sep, 2010
1 commit
-
All the blkdev_issue_* helpers can only sanely be used for synchronous
caller. To issue cache flushes or barriers asynchronously the caller needs
to set up a bio by itself with a completion callback to move the asynchronous
state machine ahead. So drop the BLKDEV_IFL_WAIT flag that is always
specified when calling blkdev_issue_* and also remove the now unused flags
argument to blkdev_issue_flush and blkdev_issue_zeroout. For
blkdev_issue_discard we need to keep it for the secure discard flag, which
gains a more descriptive name and loses the bitops vs flag confusion.Signed-off-by: Christoph Hellwig
Signed-off-by: Jens Axboe
15 Sep, 2010
1 commit
-
I'm reposting this patch series as v4 since there have been no additional
comments, and I cleaned up one extra bit of unneeded code (in 3/3). The patches
are against Linus's tree: 2bfc96a127bc1cc94d26bfaa40159966064f9c8c
(2.6.36-rc3).Would this patchset be suitable for inclusion in an mm branch?
This changes adds a partition_meta_info struct which itself contains a
union of structures that provide partition table specific metadata.This change leaves the union empty. The subsequent patch includes an
implementation for CONFIG_EFI_PARTITION-based metadata.Signed-off-by: Will Drewry
Signed-off-by: Jens Axboe
12 Aug, 2010
1 commit
-
Secure discard is the same as discard except that all copies of the
discarded sectors (perhaps created by garbage collection) must also be
erased.Signed-off-by: Adrian Hunter
Acked-by: Jens Axboe
Cc: Kyungmin Park
Cc: Madhusudhan Chikkature
Cc: Christoph Hellwig
Cc: Ben Gardiner
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
08 Aug, 2010
4 commits
-
The blkpg_ioctl and blkdev_reread_part access fields of
the bdev and gendisk structures, yet they always do so
under the protection of bdev->bd_mutex, which seems
sufficient.Signed-off-by: Arnd Bergmann
cked-by: Christoph Hellwig
Signed-off-by: Jens Axboe -
We only call the functions set_device_ro(),
invalidate_bdev(), sync_filesystem() and sync_blockdev()
while holding the BKL in these commands. All
of these are also done in other code paths without
the BKL, which leads me to the conclusion that
the BKL is not needed here either.The reason we hold it here is that it was originally
pushed down into the ioctl function from vfs_ioctl.Signed-off-by: Arnd Bergmann
Reviewed-by: Christoph Hellwig
Signed-off-by: Jens Axboe -
The blktrace driver currently needs the BKL, but
we should not need to take that in the block layer,
so just push it down into the driver itself.It is quite likely that the BKL is not actually
required in blktrace code and could be removed
in a follow-on patch.Signed-off-by: Arnd Bergmann
Acked-by: Christoph Hellwig
Signed-off-by: Jens Axboe -
As a preparation for the removal of the big kernel
lock in the block layer, this removes the BKL
from the common ioctl handling code, moving it
into every single driver still using it.Signed-off-by: Arnd Bergmann
Acked-by: Christoph Hellwig
Signed-off-by: Jens Axboe
29 Apr, 2010
1 commit
-
The patch just convert all blkdev_issue_xxx function to common
set of flags. Wait/allocation semantics preserved.Signed-off-by: Dmitry Monakhov
Signed-off-by: Jens Axboe
30 Mar, 2010
1 commit
-
…it slab.h inclusion from percpu.h
percpu.h is included by sched.h and module.h and thus ends up being
included when building most .c files. percpu.h includes slab.h which
in turn includes gfp.h making everything defined by the two files
universally available and complicating inclusion dependencies.percpu.h -> slab.h dependency is about to be removed. Prepare for
this change by updating users of gfp and slab facilities include those
headers directly instead of assuming availability. As this conversion
needs to touch large number of source files, the following script is
used as the basis of conversion.http://userweb.kernel.org/~tj/misc/slabh-sweep.py
The script does the followings.
* Scan files for gfp and slab usages and update includes such that
only the necessary includes are there. ie. if only gfp is used,
gfp.h, if slab is used, slab.h.* When the script inserts a new include, it looks at the include
blocks and try to put the new include such that its order conforms
to its surrounding. It's put in the include block which contains
core kernel includes, in the same order that the rest are ordered -
alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
doesn't seem to be any matching order.* If the script can't find a place to put a new include (mostly
because the file doesn't have fitting include block), it prints out
an error message indicating which .h file needs to be added to the
file.The conversion was done in the following steps.
1. The initial automatic conversion of all .c files updated slightly
over 4000 files, deleting around 700 includes and adding ~480 gfp.h
and ~3000 slab.h inclusions. The script emitted errors for ~400
files.2. Each error was manually checked. Some didn't need the inclusion,
some needed manual addition while adding it to implementation .h or
embedding .c file was more appropriate for others. This step added
inclusions to around 150 files.3. The script was run again and the output was compared to the edits
from #2 to make sure no file was left behind.4. Several build tests were done and a couple of problems were fixed.
e.g. lib/decompress_*.c used malloc/free() wrappers around slab
APIs requiring slab.h to be added manually.5. The script was run on all .h files but without automatically
editing them as sprinkling gfp.h and slab.h inclusions around .h
files could easily lead to inclusion dependency hell. Most gfp.h
inclusion directives were ignored as stuff from gfp.h was usually
wildly available and often used in preprocessor macros. Each
slab.h inclusion directive was examined and added manually as
necessary.6. percpu.h was updated not to include slab.h.
7. Build test were done on the following configurations and failures
were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my
distributed build env didn't work with gcov compiles) and a few
more options had to be turned off depending on archs to make things
build (like ipr on powerpc/64 which failed due to missing writeq).* x86 and x86_64 UP and SMP allmodconfig and a custom test config.
* powerpc and powerpc64 SMP allmodconfig
* sparc and sparc64 SMP allmodconfig
* ia64 SMP allmodconfig
* s390 SMP allmodconfig
* alpha SMP allmodconfig
* um on x86_64 SMP allmodconfig8. percpu.h modifications were reverted so that it could be applied as
a separate patch and serve as bisection point.Given the fact that I had only a couple of failures from tests on step
6, I'm fairly confident about the coverage of this conversion patch.
If there is a breakage, it's likely to be something in one of the arch
headers which should be easily discoverable easily on most builds of
the specific arch.Signed-off-by: Tejun Heo <tj@kernel.org>
Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
03 Dec, 2009
1 commit
-
The discard ioctl is used by mkfs utilities to clear a block device
prior to putting metadata down. However, not all devices return zeroed
blocks after a discard. Some drives return stale data, potentially
containing old superblocks. It is therefore important to know whether
discarded blocks are properly zeroed.Both ATA and SCSI drives have configuration bits that indicate whether
zeroes are returned after a discard operation. Implement a block level
interface that allows this information to be bubbled up the stack and
queried via a new block device ioctl.Signed-off-by: Martin K. Petersen
Signed-off-by: Jens Axboe
04 Oct, 2009
1 commit
-
Not all users of the topology information want to use libblkid. Provide
the topology information through bdev ioctls.Also clarify sector size comments for existing BLK ioctls.
Signed-off-by: Martin K. Petersen
Signed-off-by: Jens Axboe
14 Sep, 2009
1 commit
-
blk_ioctl_discard duplicates large amounts of code from blkdev_issue_discard,
the only difference between the two is that blkdev_issue_discard needs to
send a barrier discard request and blk_ioctl_discard a non-barrier one,
and blk_ioctl_discard needs to wait on the request. To facilitates this
add a flags argument to blkdev_issue_discard to control both aspects of the
behaviour. This will be very useful later on for using the waiting
funcitonality for other callers.Based on an earlier patch from Matthew Wilcox .
Signed-off-by: Christoph Hellwig
Signed-off-by: Jens Axboe
23 May, 2009
2 commits
-
Convert all external users of queue limits to using wrapper functions
instead of poking the request queue variables directly.Signed-off-by: Martin K. Petersen
Signed-off-by: Jens Axboe -
Until now we have had a 1:1 mapping between storage device physical
block size and the logical block sized used when addressing the device.
With SATA 4KB drives coming out that will no longer be the case. The
sector size will be 4KB but the logical block size will remain
512-bytes. Hence we need to distinguish between the physical block size
and the logical ditto.This patch renames hardsect_size to logical_block_size.
Signed-off-by: Martin K. Petersen
Signed-off-by: Jens Axboe
15 Apr, 2009
1 commit
-
Remove code handling bio_alloc failure with __GFP_WAIT.
GFP_KERNEL implies __GFP_WAIT.Signed-off-by: Nikanth Karthikesan
Signed-off-by: Jens Axboe
29 Dec, 2008
1 commit
-
There's no need to take queue_lock or kernel_lock when modifying
bdi->ra_pages. So remove them. Also remove out of date comment for
queue_max_sectors_store().Signed-off-by: Wu Fengguang
Signed-off-by: Jens Axboe
18 Nov, 2008
1 commit
-
Make add_partition() return pointer to the new hd_struct on success
and ERR_PTR() value on failure. This change will be used to fix md
autodetection bug.Signed-off-by: Tejun Heo
Cc: Neil Brown
Signed-off-by: Jens Axboe
21 Oct, 2008
7 commits
-
Now we can switch blkdev_ioctl() block_device/mode
Signed-off-by: Al Viro
-
We need to do bd_claim() only if file hadn't been opened with O_EXCL
and then we have no need to use file itself as owner.Signed-off-by: Al Viro
-
Most of that stuff doesn't need BKL at all; expand in the (only) caller,
merge the switch into one there and leave BKL only around the stuff that
might actually need it.Signed-off-by: Al Viro
-
convert remaining callers to __blkdev_driver_ioctl()
Signed-off-by: Al Viro
-
Signed-off-by: Al Viro
-
To keep the size of changesets sane we split the switch by drivers;
to keep the damn thing bisectable we do the following:
1) rename the affected methods, add ones with correct
prototypes, make (few) callers handle both. That's this changeset.
2) for each driver convert to new methods. *ALL* drivers
are converted in this series.
3) kill the old (renamed) methods.Note that it _is_ a flagday; all in-tree drivers are converted and by the
end of this series no trace of old methods remain. The only reason why
we do that this way is to keep the damn thing bisectable and allow per-driver
debugging if anything goes wrong.New methods:
open(bdev, mode)
release(disk, mode)
ioctl(bdev, mode, cmd, arg) /* Called without BKL */
compat_ioctl(bdev, mode, cmd, arg)
locked_ioctl(bdev, mode, cmd, arg) /* Called with BKL, legacy */Signed-off-by: Al Viro
-
Analog of blkdev_driver_ioctl() with sane arguments. For
now uses fake struct file, by the end of the series it won't
and blkdev_driver_ioctl() will become a wrapper around it.Signed-off-by: Al Viro
09 Oct, 2008
8 commits
-
disk->__part used to be statically allocated to the maximum possible
number of partitions. This patch makes partition array allocation
dynamic. The added overhead is minimal as only real change is one
memory dereference changed to RCU one. This saves both a bit of
memory and cpu cycles iterating through unoccupied slots and makes
increasing partition limit easier.Signed-off-by: Tejun Heo
Signed-off-by: Jens Axboe -
genhd and partition code handled disk and partitions separately. All
information about the whole disk was in struct genhd and partitions in
struct hd_struct. However, the whole disk (part0) and other
partitions have a lot in common and the data structures end up having
good number of common fields and thus separate code paths doing the
same thing. Also, the partition array was indexed by partno - 1 which
gets pretty confusing at times.This patch introduces partition 0 and makes the partition array
indexed by partno. Following patches will unify the handling of disk
and parts piece-by-piece.This patch also implements disk_partitionable() which tests whether a
disk is partitionable. With coming dynamic partition array change,
the most common usage of disk_max_parts() will be testing whether a
disk is partitionable and the number of max partitions will become
much less important.Signed-off-by: Tejun Heo
Signed-off-by: Jens Axboe -
disk->part[] is protected by its matching bdev's lock. However,
non-critical accesses like collecting stats and printing out sysfs and
proc information used to be performed without any locking. As
partitions can come and go dynamically, partitions can go away
underneath those non-critical accesses. As some of those accesses are
writes, this theoretically can lead to silent corruption.This patch fixes the race by using RCU for the partition array and dev
reference counter to hold partitions.* Rename disk->part[] to disk->__part[] to make sure no one outside
genhd layer proper accesses it directly.* Use RCU for disk->__part[] dereferencing.
* Implement disk_{get|put}_part() which can be used to get and put
partitions from gendisk respectively.* Iterators are implemented to help iterate through all partitions
safely.* Functions which require RCU readlock are marked with _rcu suffix.
* Use disk_put_part() in __blkdev_put() instead of directly putting
the contained kobject.Signed-off-by: Tejun Heo
Signed-off-by: Jens Axboe -
* Implement disk_devt() and part_devt() and use them to directly
access devt instead of computing it from ->major and ->first_minor.Note that all references to ->major and ->first_minor outside of
block layer is used to determine devt of the disk (the part0) and as
->major and ->first_minor will continue to represent devt for the
disk, converting these users aren't strictly necessary. However,
convert them for consistency.* Implement disk_max_parts() to avoid directly deferencing
genhd->minors.* Update bdget_disk() such that it doesn't assume consecutive minor
space.* Move devt computation from register_disk() to add_disk() and make it
the only one (all other usages use the initially determined value).These changes clean up the code and will help disk->part dereference
fix and extended block device numbers.Signed-off-by: Tejun Heo
Signed-off-by: Jens Axboe -
In hd_struct, @partno is used to denote partition number and a number
of other places use @part to denote hd_struct. Functions use @part
and @index instead. This causes confusion and makes it difficult to
use consistent variable names for hd_struct. Always use @partno if a
variable represents partition number.Also, print out functions use @f or @part for seq_file argument. Use
@seqf uniformly instead.Signed-off-by: Tejun Heo
Signed-off-by: Jens Axboe -
d805dda4 tried to fix error case handling in add_partition() but had a
few problems.* disk->part[] entry is set early and left dangling if operation
fails.* Once device initialized, the last put_device() is responsible for
freeing all the resources. The failure path freed part_stats and p
regardless of put_device() causing double free.* holders subdir holds reference to the disk device, so failure path
should remove it to release resources properly which was missing.This patch fixes the above problems and while at it move partition
slot busy check into add_partition() for completeness and inlines
holders subdirectory creation. Using separate function for it just
obfuscates the code.Signed-off-by: Tejun Heo
Cc: Abdel Benamrouche
Signed-off-by: Jens Axboe -
delete_partition() was noop for zero length partition. As the
addition code allows creating zero lenght partition and deletion is
assumed to always succeed, this causes memory leak for zero length
partitions. Allow zero length partitions to end their meaningless
lives.While at it, allow deleting zero lenght partition via
BLKPG_DEL_PARTITION ioctl too.Signed-off-by: Tejun Heo
Signed-off-by: Jens Axboe -
But blkdev_issue_discard() still emits requests which are interpreted as
soft barriers, because naïve callers might otherwise issue subsequent
writes to those same sectors, which might cross on the queue (if they're
reallocated quickly enough).Callers still _can_ issue non-barrier discard requests, but they have to
take care of queue ordering for themselves.Signed-off-by: David Woodhouse
Signed-off-by: Jens Axboe