05 Oct, 2014

1 commit

  • Clear QUEUE_FLAG_ADD_RANDOM in all block drivers that set
    QUEUE_FLAG_NONROT.

    Historically, all block devices have automatically made entropy
    contributions. But as previously stated in commit e2e1a148 ("block: add
    sysfs knob for turning off disk entropy contributions"):
    - On SSD disks, the completion times aren't as random as they are for
      rotational drives, so it's questionable whether they should contribute
      to the random pool in the first place.
    - Calling add_disk_randomness() has a lot of overhead.

    There are more reliable sources for randomness than non-rotational block
    devices. From a security perspective it is better to err on the side of
    caution than to allow entropy contributions from unreliable "random"
    sources.
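
    A minimal sketch of the pattern, assuming a driver's queue setup path (the
    function name and queue variable are illustrative, not taken from any
    particular driver):

        #include <linux/blkdev.h>

        /* Mark the queue non-rotational and, at the same time, opt it out of
         * entropy contributions. */
        static void example_setup_queue(struct request_queue *q)
        {
                queue_flag_set_unlocked(QUEUE_FLAG_NONROT, q);
                queue_flag_clear_unlocked(QUEUE_FLAG_ADD_RANDOM, q);
        }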

    Signed-off-by: Mike Snitzer
    Signed-off-by: Jens Axboe

    Mike Snitzer
     

05 Aug, 2014

22 commits


18 Apr, 2014

1 commit

  • Mostly scripted conversion of the smp_mb__* barriers.
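
    As an illustration of what the script changes (the refcount helper below is
    hypothetical), the per-operation barrier names are replaced by the generic
    pair smp_mb__before_atomic()/smp_mb__after_atomic(); the atomic operation
    itself and the ordering semantics stay the same:

        #include <linux/atomic.h>

        /* Old spelling: a barrier named after the specific atomic op. */
        static void put_ref_old(atomic_t *refs)
        {
                smp_mb__before_atomic_dec();
                atomic_dec(refs);
        }

        /* New spelling: one generic barrier for all RMW atomics and bitops. */
        static void put_ref_new(atomic_t *refs)
        {
                smp_mb__before_atomic();
                atomic_dec(refs);
        }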

    Signed-off-by: Peter Zijlstra
    Acked-by: Paul E. McKenney
    Link: http://lkml.kernel.org/n/tip-55dhyhocezdw1dg7u19hmh1u@git.kernel.org
    Cc: Linus Torvalds
    Cc: linux-arch@vger.kernel.org
    Signed-off-by: Ingo Molnar

    Peter Zijlstra
     

19 Mar, 2014

16 commits

  • Uninlined nested functions can cause crashes when using ftrace, as they don't
    follow the normal calling convention and confuse the ftrace function graph
    tracer as it examines the stack.

    Also, nested functions are supported as a gcc extension, but may fail on other
    compilers (e.g. llvm).
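
    A hypothetical illustration of the pattern being removed (the helpers and
    data below are made up, not bcache code): gcc implements the capture of a
    local variable with a hidden static-chain argument, and with a stack
    trampoline if the nested function's address is taken, which is the
    calling-convention departure that trips up the graph tracer. The fix is to
    hoist the helper to an ordinary static function and pass the captured state
    explicitly.

        /* gcc extension: nested function capturing a local variable */
        static int count_dirty_nested(const int *marks, int n)
        {
                int total = 0;
                int i;

                void tally(int v)
                {
                        total += v;
                }

                for (i = 0; i < n; i++)
                        tally(marks[i]);
                return total;
        }

        /* Conventional replacement: file-scope helper, state passed explicitly */
        static void tally_one(int *total, int v)
        {
                *total += v;
        }

        static int count_dirty(const int *marks, int n)
        {
                int total = 0;
                int i;

                for (i = 0; i < n; i++)
                        tally_one(&total, marks[i]);
                return total;
        }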

    Signed-off-by: John Sheu

    John Sheu
     
  • gc_gen was a temporary used to recalculate last_gc, but since we only need
    bucket->last_gc when gc isn't running (gc_mark_valid = 1), we can just update
    last_gc directly.

    Signed-off-by: Kent Overstreet

    Kent Overstreet
     
  • This was originally added as an optimization that for various reasons isn't
    needed anymore, but it does add a lot of nasty corner cases (and it was
    responsible for some recently fixed bugs). Just get rid of it now.

    Signed-off-by: Kent Overstreet

    Kent Overstreet
     
  • This changes the bucket allocation reserves to use _real_ reserves - separate
    freelists - instead of watermarks, which if nothing else makes the current code
    saner to reason about and is going to be important in the future when we add
    support for multiple btrees.

    It also adds btree_check_reserve(), which checks (and locks) the reserves for
    both bucket allocation and memory allocation for btree nodes. The old code
    just assumed that since (e.g. for btree node splits) it had the root locked,
    no other thread could try to use the same reserve. That was technically ok
    for memory allocation, since we always have a reserve for it (the btree node
    cache is used as a reserve and we preallocate it), but multiple btrees will
    mean that locking the root is no longer sufficient, and for the bucket
    allocation reserve it was technically possible for the old code to deadlock.
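
    A rough sketch of the idea, with deliberately simplified, hypothetical names
    (the real code uses bcache's own fifo machinery): each reserve gets its own
    small freelist, so a caller can only consume buckets that were set aside for
    its purpose instead of racing everyone else down to a watermark.

        /* One fixed-size ring of free bucket indices per reserve. */
        enum example_reserve {
                EX_RESERVE_BTREE,       /* btree node allocation */
                EX_RESERVE_MOVINGGC,    /* moving garbage collection */
                EX_RESERVE_NONE,        /* ordinary foreground writes */
                EX_RESERVE_NR,
        };

        struct example_cache {
                long    free[EX_RESERVE_NR][8];
                int     free_used[EX_RESERVE_NR];
        };

        /* Pop a bucket from the caller's reserve; -1 means that reserve is
         * empty and the caller must wait - it never dips into another
         * reserve's buckets. */
        static long example_bucket_alloc(struct example_cache *ca,
                                         enum example_reserve r)
        {
                if (!ca->free_used[r])
                        return -1;
                return ca->free[r][--ca->free_used[r]];
        }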

    Signed-off-by: Kent Overstreet

    Kent Overstreet
     
  • With the locking rework in the last patch, this shouldn't be needed anymore -
    btree_node_write_work() only takes b->write_lock which is never held for very
    long.

    Signed-off-by: Kent Overstreet

    Kent Overstreet
     
  • Add a new lock, b->write_lock, which is required to actually modify - or write -
    a btree node; this lock is only held for short durations.

    This means we can write out a btree node without taking b->lock, which _is_ held
    for long durations - solving a deadlock when btree_flush_write() (from the
    journalling code) is called with a btree node locked.

    Right now this just occurs in bch_btree_set_root(), but with an upcoming
    journalling rework it is going to happen a lot more.

    This also means b->lock is now more of a read/intent lock than a read/write
    lock - but not completely, since it still blocks readers. We may turn it
    into a real intent lock at some point in the future.
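
    A minimal sketch of the split, with hypothetical names (the real struct
    btree carries more state and the lock types may differ in detail):

        #include <linux/mutex.h>
        #include <linux/rwsem.h>

        struct example_btree_node {
                struct rw_semaphore     lock;           /* long-held read/intent lock */
                struct mutex            write_lock;     /* short-held; required to
                                                         * modify or write the node */
        };

        /* Writing the node out no longer needs the long-held lock, so the
         * journalling code can flush a node another thread has locked. */
        static void example_node_write(struct example_btree_node *b)
        {
                mutex_lock(&b->write_lock);
                /* ... snapshot the node's contents and submit the write ... */
                mutex_unlock(&b->write_lock);
        }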

    Signed-off-by: Kent Overstreet

    Kent Overstreet
     
  • This isn't a bulletproof fix; btree_node_free() -> bch_bucket_free() puts the
    bucket on the unused freelist, where it can be reused right away without any
    ordering requirements. It would be better to wait on at least a journal write to
    go down before reusing the bucket. bch_btree_set_root() does this, and inserting
    into non-leaf nodes is completely synchronous so we should be ok, but future
    patches are just going to get rid of the unused freelist - it was needed in the
    past for various reasons but shouldn't be anymore.

    Signed-off-by: Kent Overstreet

    Kent Overstreet
     
  • This means the garbage collection code can better check for data and metadata
    pointers to the same buckets.

    Signed-off-by: Kent Overstreet

    Kent Overstreet
     
  • This will potentially save us an allocation when we've got inode/dirent bkeys
    that don't fit in the keylist's inline keys.

    Signed-off-by: Kent Overstreet

    Kent Overstreet
     
  • Break down data into clean data/dirty data/metadata.

    Signed-off-by: Kent Overstreet

    Kent Overstreet
     
  • Change the invalidate tracepoint to indicate how much data we're invalidating,
    and change the alloc tracepoints to indicate what offset they're for.

    Signed-off-by: Kent Overstreet

    Kent Overstreet
     
  • This hasn't been used or even enabled in ages.

    Signed-off-by: Kent Overstreet

    Kent Overstreet
     
  • Signed-off-by: Nicholas Swenson

    Nicholas Swenson
     
  • Avoid a potential null pointer deref (e.g. from the check keys inserted for
    cache misses).

    Signed-off-by: Kent Overstreet

    Kent Overstreet
     
  • The deadlock happened because a foreground write slept waiting for a bucket
    to be allocated. Normally gc would mark buckets available for invalidation,
    but moving_gc was stuck waiting for outstanding writes to complete, and
    those writes used bcache_wq, the same workqueue the foreground writes used.

    This fix gives moving_gc its own workqueue, so it can still finish moving
    even if foreground writes are stuck waiting for allocation. It also makes
    the workqueue a parameter to the data_insert path, so moving_gc can use its
    own workqueue for writes.
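
    A hedged sketch of the shape of the fix (identifiers hypothetical):
    moving_gc allocates a workqueue of its own, and the insert path takes the
    workqueue to use as a parameter, so moving_gc's writes never queue behind
    stalled foreground work on the shared bcache_wq.

        #include <linux/errno.h>
        #include <linux/workqueue.h>

        static struct workqueue_struct *example_moving_gc_wq;

        static int example_moving_init(void)
        {
                /* Dedicated queue; WQ_MEM_RECLAIM because this work may be
                 * needed to make forward progress under memory pressure. */
                example_moving_gc_wq = alloc_workqueue("example_moving_gc",
                                                       WQ_MEM_RECLAIM, 0);
                if (!example_moving_gc_wq)
                        return -ENOMEM;
                return 0;
        }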

    Signed-off-by: Nicholas Swenson
    Signed-off-by: Kent Overstreet

    Nicholas Swenson
     
  • blk_stack_limits() doesn't like a discard granularity of 0.
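
    Presumably the shape of the change is just to advertise a non-zero
    granularity before the limits are stacked; a minimal, hypothetical sketch
    (the value actually chosen by the fix may differ):

        #include <linux/blkdev.h>

        static void example_set_discard_limits(struct request_queue *q,
                                               unsigned int block_bytes)
        {
                /* Never leave discard_granularity at 0 when discards are
                 * advertised; use the logical block size as a floor. */
                q->limits.discard_granularity = block_bytes;
        }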

    Signed-off-by: Kent Overstreet

    Kent Overstreet