20 Sep, 2015

1 commit

  • Pull block updates from Jens Axboe:
    "This is a bit bigger than it should be, but I could (did) not want to
    send it off last week due to both wanting extra testing, and expecting
    a fix for the bounce regression as well. In any case, this contains:

    - Fix for the blk-merge.c compilation warning on gcc 5.x from me.

    - A set of back/front SG gap merge fixes, from me and from Sagi.
    This ensures that we honor SG gapping for integrity payloads as
    well.

    - Two small fixes for null_blk from Matias, fixing a leak and a
    capacity propagation issue.

    - A blkcg fix from Tejun, fixing a NULL dereference.

    - A fast clone optimization from Ming, fixing a performance
    regression since the arbitrarily sized bio's were introduced.

    - Also from Ming, a regression fix for bouncing IOs"

    * 'for-linus' of git://git.kernel.dk/linux-block:
    block: fix bounce_end_io
    block: blk-merge: fast-clone bio when splitting rw bios
    block: blkg_destroy_all() should clear q->root_blkg and ->root_rl.blkg
    block: Copy a user iovec if it includes gaps
    block: Refuse adding appending a gapped integrity page to a bio
    block: Refuse request/bio merges with gaps in the integrity payload
    block: Check for gaps on front and back merges
    null_blk: fix wrong capacity when bs is not 512 bytes
    null_blk: fix memory leak on cleanup
    block: fix bogus compiler warnings in blk-merge.c

    Linus Torvalds
     

18 Sep, 2015

1 commit

  • When bio bounce is involved, one new bio and its biovecs are
    cloned from the comming bio, which can be one fast-cloned bio
    from upper layer(such as dm).

    So it is obviously wrong to assume the start index of the coming(
    original) bio's io vector is zero, which can be any value between
    0 and (bi_max_vecs - 1), especially in case of bio split.

    This patch fixes Fedora's booting oops on i386, often with the
    following kernel log together:

    > [ 9.026738] systemd[1]: Switching root.
    > [ 9.036467] systemd-journald[149]: Received SIGTERM from PID 1
    > (systemd).
    > [ 9.082262] BUG: Bad page state in process kworker/u5:1 pfn:372ac
    > [ 9.083989] page:f3d32ae0 count:0 mapcount:0 mapping:f2252178
    > index:0x16a
    > [ 9.085755] flags: 0x40020021(locked|lru|mappedtodisk)
    > [ 9.087284] page dumped because: page still charged to cgroup
    > [ 9.088772] bad because of flags:
    > [ 9.089731] flags: 0x21(locked|lru)
    > [ 9.090818] page->mem_cgroup:f2c3e400

    Reported-by: Josh Boyer
    Tested-by: Adam Williamson
    Cc: Ming Lin
    Cc: Mike Snitzer
    Signed-off-by: Ming Lei
    Signed-off-by: Jens Axboe

    Ming Lei
     

04 Sep, 2015

1 commit

  • Pull ext3 removal, quota & udf fixes from Jan Kara:
    "The biggest change in the pull is the removal of ext3 filesystem
    driver (~28k lines removed). Ext4 driver is a full featured
    replacement these days and both RH and SUSE use it for several years
    without issues. Also there are some workarounds in VM & block layer
    mainly for ext3 which we could eventually get rid of.

    Other larger change is addition of proper error handling for
    dquot_initialize(). The rest is small fixes and cleanups"

    [ I wasn't convinced about the ext3 removal and worried about things
    falling through the cracks for legacy users, but ext4 maintainers
    piped up and were all unanimously in favor of removal, and maintaining
    all legacy ext3 support inside ext4. - Linus ]

    * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
    udf: Don't modify filesystem for read-only mounts
    quota: remove an unneeded condition
    ext4: memory leak on error in ext4_symlink()
    mm/Kconfig: NEED_BOUNCE_POOL: clean-up condition
    ext4: Improve ext4 Kconfig test
    block: Remove forced page bouncing under IO
    fs: Remove ext3 filesystem driver
    doc: Update doc about journalling layer
    jfs: Handle error from dquot_initialize()
    reiserfs: Handle error from dquot_initialize()
    ocfs2: Handle error from dquot_initialize()
    ext4: Handle error from dquot_initialize()
    ext2: Handle error from dquot_initalize()
    quota: Propagate error from ->acquire_dquot()

    Linus Torvalds
     

29 Jul, 2015

2 commits

  • Some places use helpers now, others don't. We only have the 'is set'
    helper, add helpers for setting and clearing flags too.

    It was a bit of a mess of atomic vs non-atomic access. With
    BIO_UPTODATE gone, we don't have any risk of concurrent access to the
    flags. So relax the restriction and don't make any of them atomic. The
    flags that do have serialization issues (reffed and chained), we
    already handle those separately.

    Signed-off-by: Jens Axboe

    Jens Axboe
     
  • Currently we have two different ways to signal an I/O error on a BIO:

    (1) by clearing the BIO_UPTODATE flag
    (2) by returning a Linux errno value to the bi_end_io callback

    The first one has the drawback of only communicating a single possible
    error (-EIO), and the second one has the drawback of not beeing persistent
    when bios are queued up, and are not passed along from child to parent
    bio in the ever more popular chaining scenario. Having both mechanisms
    available has the additional drawback of utterly confusing driver authors
    and introducing bugs where various I/O submitters only deal with one of
    them, and the others have to add boilerplate code to deal with both kinds
    of error returns.

    So add a new bi_error field to store an errno value directly in struct
    bio and remove the existing mechanisms to clean all this up.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Hannes Reinecke
    Reviewed-by: NeilBrown
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     

24 Jul, 2015

1 commit

  • JBD layer wrote back data buffers without setting PageWriteback bit.
    Thus standard mechanism for guaranteeing stable pages under IO did not
    work. Since JBD is gone now and there is no other user of the
    functionality, just remove it.

    Acked-by: Jens Axboe
    Signed-off-by: Jan Kara

    Jan Kara
     

26 Jun, 2015

2 commits

  • Pull cgroup writeback support from Jens Axboe:
    "This is the big pull request for adding cgroup writeback support.

    This code has been in development for a long time, and it has been
    simmering in for-next for a good chunk of this cycle too. This is one
    of those problems that has been talked about for at least half a
    decade, finally there's a solution and code to go with it.

    Also see last weeks writeup on LWN:

    http://lwn.net/Articles/648292/"

    * 'for-4.2/writeback' of git://git.kernel.dk/linux-block: (85 commits)
    writeback, blkio: add documentation for cgroup writeback support
    vfs, writeback: replace FS_CGROUP_WRITEBACK with SB_I_CGROUPWB
    writeback: do foreign inode detection iff cgroup writeback is enabled
    v9fs: fix error handling in v9fs_session_init()
    bdi: fix wrong error return value in cgwb_create()
    buffer: remove unusued 'ret' variable
    writeback: disassociate inodes from dying bdi_writebacks
    writeback: implement foreign cgroup inode bdi_writeback switching
    writeback: add lockdep annotation to inode_to_wb()
    writeback: use unlocked_inode_to_wb transaction in inode_congested()
    writeback: implement unlocked_inode_to_wb transaction and use it for stat updates
    writeback: implement [locked_]inode_to_wb_and_lock_list()
    writeback: implement foreign cgroup inode detection
    writeback: make writeback_control track the inode being written back
    writeback: relocate wb[_try]_get(), wb_put(), inode_{attach|detach}_wb()
    mm: vmscan: disable memcg direct reclaim stalling if cgroup writeback support is in use
    writeback: implement memcg writeback domain based throttling
    writeback: reset wb_domain->dirty_limit[_tstmp] when memcg domain size changes
    writeback: implement memcg wb_domain
    writeback: update wb_over_bg_thresh() to use wb_domain aware operations
    ...

    Linus Torvalds
     
  • Pull core block IO update from Jens Axboe:
    "Nothing really major in here, mostly a collection of smaller
    optimizations and cleanups, mixed with various fixes. In more detail,
    this contains:

    - Addition of policy specific data to blkcg for block cgroups. From
    Arianna Avanzini.

    - Various cleanups around command types from Christoph.

    - Cleanup of the suspend block I/O path from Christoph.

    - Plugging updates from Shaohua and Jeff Moyer, for blk-mq.

    - Eliminating atomic inc/dec of both remaining IO count and reference
    count in a bio. From me.

    - Fixes for SG gap and chunk size support for data-less (discards)
    IO, so we can merge these better. From me.

    - Small restructuring of blk-mq shared tag support, freeing drivers
    from iterating hardware queues. From Keith Busch.

    - A few cfq-iosched tweaks, from Tahsin Erdogan and me. Makes the
    IOPS mode the default for non-rotational storage"

    * 'for-4.2/core' of git://git.kernel.dk/linux-block: (35 commits)
    cfq-iosched: fix other locations where blkcg_to_cfqgd() can return NULL
    cfq-iosched: fix sysfs oops when attempting to read unconfigured weights
    cfq-iosched: move group scheduling functions under ifdef
    cfq-iosched: fix the setting of IOPS mode on SSDs
    blktrace: Add blktrace.c to BLOCK LAYER in MAINTAINERS file
    block, cgroup: implement policy-specific per-blkcg data
    block: Make CFQ default to IOPS mode on SSDs
    block: add blk_set_queue_dying() to blkdev.h
    blk-mq: Shared tag enhancements
    block: don't honor chunk sizes for data-less IO
    block: only honor SG gap prevention for merges that contain data
    block: fix returnvar.cocci warnings
    block, dm: don't copy bios for request clones
    block: remove management of bi_remaining when restoring original bi_end_io
    block: replace trylock with mutex_lock in blkdev_reread_part()
    block: export blkdev_reread_part() and __blkdev_reread_part()
    suspend: simplify block I/O handling
    block: collapse bio bit space
    block: remove unused BIO_RW_BLOCK and BIO_EOF flags
    block: remove BIO_EOPNOTSUPP
    ...

    Linus Torvalds
     

02 Jun, 2015

1 commit

  • With the planned cgroup writeback support, backing-dev related
    declarations will be more widely used across block and cgroup;
    unfortunately, including backing-dev.h from include/linux/blkdev.h
    makes cyclic include dependency quite likely.

    This patch separates out backing-dev-defs.h which only has the
    essential definitions and updates blkdev.h to include it. c files
    which need access to more backing-dev details now include
    backing-dev.h directly. This takes backing-dev.h off the common
    include dependency chain making it a lot easier to use it across block
    and cgroup.

    v2: fs/fat build failure fixed.

    Signed-off-by: Tejun Heo
    Reviewed-by: Jan Kara
    Cc: Jens Axboe
    Signed-off-by: Jens Axboe

    Tejun Heo
     

19 May, 2015

1 commit

  • Since the big barrier rewrite/removal in 2007 we never fail FLUSH or
    FUA requests, which means we can remove the magic BIO_EOPNOTSUPP flag
    to help propagating those to the buffer_head layer.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Jeff Moyer
    Acked-by: Tejun Heo
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     

27 Apr, 2015

1 commit

  • Commit d2c5e30c9a1420902262aa923794d2ae4e0bc391
    ("[PATCH] zoned vm counters: conversion of nr_bounce to per zone counter")
    convert statistic of nr_bounce to per zone and one global value in vm_stat,
    but it call inc_|dec_zone_page_state on different pages, then different
    zones, and cause us to get unexpected value of NR_BOUNCE.

    Below is the result on my machine:
    Mar 2 09:26:08 udknight kernel: [144766.778265] Mem-Info:
    Mar 2 09:26:08 udknight kernel: [144766.778266] DMA per-cpu:
    Mar 2 09:26:08 udknight kernel: [144766.778268] CPU 0: hi: 0, btch: 1 usd: 0
    Mar 2 09:26:08 udknight kernel: [144766.778269] CPU 1: hi: 0, btch: 1 usd: 0
    Mar 2 09:26:08 udknight kernel: [144766.778270] Normal per-cpu:
    Mar 2 09:26:08 udknight kernel: [144766.778271] CPU 0: hi: 186, btch: 31 usd: 0
    Mar 2 09:26:08 udknight kernel: [144766.778273] CPU 1: hi: 186, btch: 31 usd: 0
    Mar 2 09:26:08 udknight kernel: [144766.778274] HighMem per-cpu:
    Mar 2 09:26:08 udknight kernel: [144766.778275] CPU 0: hi: 186, btch: 31 usd: 0
    Mar 2 09:26:08 udknight kernel: [144766.778276] CPU 1: hi: 186, btch: 31 usd: 0
    Mar 2 09:26:08 udknight kernel: [144766.778279] active_anon:46926 inactive_anon:287406 isolated_anon:0
    Mar 2 09:26:08 udknight kernel: [144766.778279] active_file:105085 inactive_file:139432 isolated_file:0
    Mar 2 09:26:08 udknight kernel: [144766.778279] unevictable:653 dirty:0 writeback:0 unstable:0
    Mar 2 09:26:08 udknight kernel: [144766.778279] free:178957 slab_reclaimable:6419 slab_unreclaimable:9966
    Mar 2 09:26:08 udknight kernel: [144766.778279] mapped:4426 shmem:305277 pagetables:784 bounce:0
    Mar 2 09:26:08 udknight kernel: [144766.778279] free_cma:0
    Mar 2 09:26:08 udknight kernel: [144766.778286] DMA free:3324kB min:68kB low:84kB high:100kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15976kB managed:15900kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
    Mar 2 09:26:08 udknight kernel: [144766.778287] lowmem_reserve[]: 0 822 3754 3754
    Mar 2 09:26:08 udknight kernel: [144766.778293] Normal free:26828kB min:3632kB low:4540kB high:5448kB active_anon:4872kB inactive_anon:68kB active_file:1796kB inactive_file:1796kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:892920kB managed:842560kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:4144kB slab_reclaimable:25676kB slab_unreclaimable:39864kB kernel_stack:1944kB pagetables:3136kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:2412612 all_unreclaimable? yes
    Mar 2 09:26:08 udknight kernel: [144766.778294] lowmem_reserve[]: 0 0 23451 23451
    Mar 2 09:26:08 udknight kernel: [144766.778299] HighMem free:685676kB min:512kB low:3748kB high:6984kB active_anon:182832kB inactive_anon:1149556kB active_file:418544kB inactive_file:555932kB unevictable:2612kB isolated(anon):0kB isolated(file):0kB present:3001732kB managed:3001732kB mlocked:0kB dirty:0kB writeback:0kB mapped:17704kB shmem:1216964kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:75771152kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
    Mar 2 09:26:08 udknight kernel: [144766.778300] lowmem_reserve[]: 0 0 0 0

    You can see bounce:75771152kB for HighMem, but bounce:0 for lowmem and global.

    This patch fix it.

    Signed-off-by: Wang YanQing
    Signed-off-by: Jens Axboe

    Wang YanQing
     

07 Jun, 2014

1 commit

  • printk is meant to be used with an associated log level. There are some
    instances of printk scattered around the mm code where the log level is
    missing. Add a log level and adhere to suggestions by
    scripts/checkpatch.pl by moving to the pr_* macros.

    Also add the typical pr_fmt definition so that print statements can be
    easily traced back to the modules where they occur, correlated one with
    another, etc. This will require the removal of some (now redundant)
    prefixes on a few print statements.

    Signed-off-by: Mitchel Humpherys
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mitchel Humpherys
     

20 May, 2014

1 commit

  • Continue moving some of the block files that are scattered around.
    bounce.c contains only code for bouncing the contents of a bio.
    It's block proper code, not mm code.

    Suggested-by: Ming Lei
    Signed-off-by: Jens Axboe

    Jens Axboe