02 Nov, 2017

1 commit

  • Many source files in the tree are missing licensing information, which
    makes it harder for compliance tools to determine the correct license.

    By default all files without license information are under the default
    license of the kernel, which is GPL version 2.

    Update the files which contain no license information with the 'GPL-2.0'
    SPDX license identifier. The SPDX identifier is a legally binding
    shorthand, which can be used instead of the full boiler plate text.

    This patch is based on work done by Thomas Gleixner and Kate Stewart and
    Philippe Ombredanne.

    How this work was done:

    Patches were generated and checked against linux-4.14-rc6 for a subset of
    the use cases:
    - file had no licensing information it it.
    - file was a */uapi/* one with no licensing information in it,
    - file was a */uapi/* one with existing licensing information,

    Further patches will be generated in subsequent months to fix up cases
    where non-standard license headers were used, and references to license
    had to be inferred by heuristics based on keywords.

    The analysis to determine which SPDX License Identifier to be applied to
    a file was done in a spreadsheet of side by side results from of the
    output of two independent scanners (ScanCode & Windriver) producing SPDX
    tag:value files created by Philippe Ombredanne. Philippe prepared the
    base worksheet, and did an initial spot review of a few 1000 files.

    The 4.13 kernel was the starting point of the analysis with 60,537 files
    assessed. Kate Stewart did a file by file comparison of the scanner
    results in the spreadsheet to determine which SPDX license identifier(s)
    to be applied to the file. She confirmed any determination that was not
    immediately clear with lawyers working with the Linux Foundation.

    Criteria used to select files for SPDX license identifier tagging was:
    - Files considered eligible had to be source code files.
    - Make and config files were included as candidates if they contained >5
    lines of source
    - File already had some variant of a license header in it (even if
    Reviewed-by: Philippe Ombredanne
    Reviewed-by: Thomas Gleixner
    Signed-off-by: Greg Kroah-Hartman

    Greg Kroah-Hartman
     

15 Sep, 2017

1 commit

  • Pull mount flag updates from Al Viro:
    "Another chunk of fmount preparations from dhowells; only trivial
    conflicts for that part. It separates MS_... bits (very grotty
    mount(2) ABI) from the struct super_block ->s_flags (kernel-internal,
    only a small subset of MS_... stuff).

    This does *not* convert the filesystems to new constants; only the
    infrastructure is done here. The next step in that series is where the
    conflicts would be; that's the conversion of filesystems. It's purely
    mechanical and it's better done after the merge, so if you could run
    something like

    list=$(for i in MS_RDONLY MS_NOSUID MS_NODEV MS_NOEXEC MS_SYNCHRONOUS MS_MANDLOCK MS_DIRSYNC MS_NOATIME MS_NODIRATIME MS_SILENT MS_POSIXACL MS_KERNMOUNT MS_I_VERSION MS_LAZYTIME; do git grep -l $i fs drivers/staging/lustre drivers/mtd ipc mm include/linux; done|sort|uniq|grep -v '^fs/namespace.c$')

    sed -i -e 's/\/SB_RDONLY/g' \
    -e 's/\/SB_NOSUID/g' \
    -e 's/\/SB_NODEV/g' \
    -e 's/\/SB_NOEXEC/g' \
    -e 's/\/SB_SYNCHRONOUS/g' \
    -e 's/\/SB_MANDLOCK/g' \
    -e 's/\/SB_DIRSYNC/g' \
    -e 's/\/SB_NOATIME/g' \
    -e 's/\/SB_NODIRATIME/g' \
    -e 's/\/SB_SILENT/g' \
    -e 's/\/SB_POSIXACL/g' \
    -e 's/\/SB_KERNMOUNT/g' \
    -e 's/\/SB_I_VERSION/g' \
    -e 's/\/SB_LAZYTIME/g' \
    $list

    and commit it with something along the lines of 'convert filesystems
    away from use of MS_... constants' as commit message, it would save a
    quite a bit of headache next cycle"

    * 'work.mount' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
    VFS: Differentiate mount flags (MS_*) from internal superblock flags
    VFS: Convert sb->s_flags & MS_RDONLY to sb_rdonly(sb)
    vfs: Add sb_rdonly(sb) to query the MS_RDONLY flag on s_flags

    Linus Torvalds
     

07 Sep, 2017

2 commits

  • Merge updates from Andrew Morton:

    - various misc bits

    - DAX updates

    - OCFS2

    - most of MM

    * emailed patches from Andrew Morton : (119 commits)
    mm,fork: introduce MADV_WIPEONFORK
    x86,mpx: make mpx depend on x86-64 to free up VMA flag
    mm: add /proc/pid/smaps_rollup
    mm: hugetlb: clear target sub-page last when clearing huge page
    mm: oom: let oom_reap_task and exit_mmap run concurrently
    swap: choose swap device according to numa node
    mm: replace TIF_MEMDIE checks by tsk_is_oom_victim
    mm, oom: do not rely on TIF_MEMDIE for memory reserves access
    z3fold: use per-cpu unbuddied lists
    mm, swap: don't use VMA based swap readahead if HDD is used as swap
    mm, swap: add sysfs interface for VMA based swap readahead
    mm, swap: VMA based swap readahead
    mm, swap: fix swap readahead marking
    mm, swap: add swap readahead hit statistics
    mm/vmalloc.c: don't reinvent the wheel but use existing llist API
    mm/vmstat.c: fix wrong comment
    selftests/memfd: add memfd_create hugetlbfs selftest
    mm/shmem: add hugetlbfs support to memfd_create()
    mm, devm_memremap_pages: use multi-order radix for ZONE_DEVICE lookups
    mm/vmalloc.c: halve the number of comparisons performed in pcpu_get_vm_areas()
    ...

    Linus Torvalds
     
  • fsync codepath assumes that f_mapping can never be NULL, but
    sync_file_range has a check for that.

    Remove the one from sync_file_range as I don't see how you'd ever get a
    NULL pointer in here.

    Link: http://lkml.kernel.org/r/20170525110509.9434-1-jlayton@redhat.com
    Signed-off-by: Jeff Layton
    Cc: Alexander Viro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jeff Layton
     

01 Aug, 2017

1 commit

  • sync_file_range doesn't call down into the filesystem directly at all.
    It only kicks off writeback of pagecache pages and optionally waits
    on the result.

    Convert sync_file_range to use errseq_t based error tracking, under the
    assumption that most users will prefer this behavior when errors occur.

    Reviewed-by: Jan Kara
    Signed-off-by: Jeff Layton

    Jeff Layton
     

17 Jul, 2017

1 commit

  • Firstly by applying the following with coccinelle's spatch:

    @@ expression SB; @@
    -SB->s_flags & MS_RDONLY
    +sb_rdonly(SB)

    to effect the conversion to sb_rdonly(sb), then by applying:

    @@ expression A, SB; @@
    (
    -(!sb_rdonly(SB)) && A
    +!sb_rdonly(SB) && A
    |
    -A != (sb_rdonly(SB))
    +A != sb_rdonly(SB)
    |
    -A == (sb_rdonly(SB))
    +A == sb_rdonly(SB)
    |
    -!(sb_rdonly(SB))
    +!sb_rdonly(SB)
    |
    -A && (sb_rdonly(SB))
    +A && sb_rdonly(SB)
    |
    -A || (sb_rdonly(SB))
    +A || sb_rdonly(SB)
    |
    -(sb_rdonly(SB)) != A
    +sb_rdonly(SB) != A
    |
    -(sb_rdonly(SB)) == A
    +sb_rdonly(SB) == A
    |
    -(sb_rdonly(SB)) && A
    +sb_rdonly(SB) && A
    |
    -(sb_rdonly(SB)) || A
    +sb_rdonly(SB) || A
    )

    @@ expression A, B, SB; @@
    (
    -(sb_rdonly(SB)) ? 1 : 0
    +sb_rdonly(SB)
    |
    -(sb_rdonly(SB)) ? A : B
    +sb_rdonly(SB) ? A : B
    )

    to remove left over excess bracketage and finally by applying:

    @@ expression A, SB; @@
    (
    -(A & MS_RDONLY) != sb_rdonly(SB)
    +(bool)(A & MS_RDONLY) != sb_rdonly(SB)
    |
    -(A & MS_RDONLY) == sb_rdonly(SB)
    +(bool)(A & MS_RDONLY) == sb_rdonly(SB)
    )

    to make comparisons against the result of sb_rdonly() (which is a bool)
    work correctly.

    Signed-off-by: David Howells

    David Howells
     

06 Jul, 2017

1 commit


20 Feb, 2017

1 commit


05 Apr, 2016

1 commit

  • PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} macros were introduced *long* time
    ago with promise that one day it will be possible to implement page
    cache with bigger chunks than PAGE_SIZE.

    This promise never materialized. And unlikely will.

    We have many places where PAGE_CACHE_SIZE assumed to be equal to
    PAGE_SIZE. And it's constant source of confusion on whether
    PAGE_CACHE_* or PAGE_* constant should be used in a particular case,
    especially on the border between fs and mm.

    Global switching to PAGE_CACHE_SIZE != PAGE_SIZE would cause to much
    breakage to be doable.

    Let's stop pretending that pages in page cache are special. They are
    not.

    The changes are pretty straight-forward:

    - << (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> ;

    - >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> ;

    - PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} -> PAGE_{SIZE,SHIFT,MASK,ALIGN};

    - page_cache_get() -> get_page();

    - page_cache_release() -> put_page();

    This patch contains automated changes generated with coccinelle using
    script below. For some reason, coccinelle doesn't patch header files.
    I've called spatch for them manually.

    The only adjustment after coccinelle is revert of changes to
    PAGE_CAHCE_ALIGN definition: we are going to drop it later.

    There are few places in the code where coccinelle didn't reach. I'll
    fix them manually in a separate patch. Comments and documentation also
    will be addressed with the separate patch.

    virtual patch

    @@
    expression E;
    @@
    - E << (PAGE_CACHE_SHIFT - PAGE_SHIFT)
    + E

    @@
    expression E;
    @@
    - E >> (PAGE_CACHE_SHIFT - PAGE_SHIFT)
    + E

    @@
    @@
    - PAGE_CACHE_SHIFT
    + PAGE_SHIFT

    @@
    @@
    - PAGE_CACHE_SIZE
    + PAGE_SIZE

    @@
    @@
    - PAGE_CACHE_MASK
    + PAGE_MASK

    @@
    expression E;
    @@
    - PAGE_CACHE_ALIGN(E)
    + PAGE_ALIGN(E)

    @@
    expression E;
    @@
    - page_cache_get(E)
    + get_page(E)

    @@
    expression E;
    @@
    - page_cache_release(E)
    + put_page(E)

    Signed-off-by: Kirill A. Shutemov
    Acked-by: Michal Hocko
    Signed-off-by: Linus Torvalds

    Kirill A. Shutemov
     

07 Nov, 2015

1 commit

  • sync_file_range(2) is documented to issue writeback only for pages that
    are not currently being written. After all the system call has been
    created for userspace to be able to issue background writeout and so
    waiting for in-flight IO is undesirable there. However commit
    ee53a891f474 ("mm: do_sync_mapping_range integrity fix") switched
    do_sync_mapping_range() and thus sync_file_range() to issue writeback in
    WB_SYNC_ALL mode since do_sync_mapping_range() was used by other code
    relying on WB_SYNC_ALL semantics.

    These days do_sync_mapping_range() went away and we can switch
    sync_file_range(2) back to issuing WB_SYNC_NONE writeback. That should
    help PostgreSQL avoid large latency spikes when flushing data in the
    background.

    Andres measured a 20% increase in transactions per second on an SSD disk.

    Signed-off-by: Jan Kara
    Reported-by: Andres Freund
    Tested-By: Andres Freund
    Cc: Al Viro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jan Kara
     

06 Nov, 2015

1 commit

  • filemap_fdatawait() is a function to wait for on-going writeback to
    complete but also consume and clear error status of the mapping set during
    writeback.

    The latter functionality is critical for applications to detect writeback
    error with system calls like fsync(2)/fdatasync(2).

    However filemap_fdatawait() is also used by sync(2) or FIFREEZE ioctl,
    which don't check error status of individual mappings.

    As a result, fsync() may not be able to detect writeback error if events
    happen in the following order:

    Application System admin
    ----------------------------------------------------------
    write data on page cache
    Run sync command
    writeback completes with error
    filemap_fdatawait() clears error
    fsync returns success
    (but the data is not on disk)

    This patch adds filemap_fdatawait_keep_errors() for call sites where
    writeback error is not handled so that they don't clear error status.

    Signed-off-by: Jun'ichi Nomura
    Acked-by: Andi Kleen
    Reviewed-by: Tejun Heo
    Cc: Fengguang Wu
    Cc: Dave Chinner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Junichi Nomura
     

05 Feb, 2015

1 commit

  • Add a new mount option which enables a new "lazytime" mode. This mode
    causes atime, mtime, and ctime updates to only be made to the
    in-memory version of the inode. The on-disk times will only get
    updated when (a) if the inode needs to be updated for some non-time
    related change, (b) if userspace calls fsync(), syncfs() or sync(), or
    (c) just before an undeleted inode is evicted from memory.

    This is OK according to POSIX because there are no guarantees after a
    crash unless userspace explicitly requests via a fsync(2) call.

    For workloads which feature a large number of random write to a
    preallocated file, the lazytime mount option significantly reduces
    writes to the inode table. The repeated 4k writes to a single block
    will result in undesirable stress on flash devices and SMR disk
    drives. Even on conventional HDD's, the repeated writes to the inode
    table block will trigger Adjacent Track Interference (ATI) remediation
    latencies, which very negatively impact long tail latencies --- which
    is a very big deal for web serving tiers (for example).

    Google-Bug-Id: 18297052

    Signed-off-by: Theodore Ts'o
    Signed-off-by: Al Viro

    Theodore Ts'o
     

20 Nov, 2014

1 commit


05 Sep, 2014

1 commit

  • This patch changes sync_filesystem() to be EXPORT_SYMBOL().

    The reason this is needed is that starting with 3.15 kernel, due to
    Theodore Ts'o's commit 02b9984d6408 ("fs: push sync_filesystem() down to
    the file system's remount_fs()"), all file systems that have dirty data
    to be written out need to call sync_filesystem() from their
    ->remount_fs() method when remounting read-only.

    As this is now a generically required function rather than an internal
    only function it should be EXPORT_SYMBOL() so that all file systems can
    call it.

    Signed-off-by: Anton Altaparmakov
    Acked-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Anton Altaparmakov
     

22 Feb, 2014

1 commit

  • This reverts commit c4a391b53a72d2df4ee97f96f78c1d5971b47489. Dave
    Chinner has reported the commit may cause some
    inodes to be left out from sync(2). This is because we can call
    redirty_tail() for some inode (which sets i_dirtied_when to current time)
    after sync(2) has started or similarly requeue_inode() can set
    i_dirtied_when to current time if writeback had to skip some pages. The
    real problem is in the functions clobbering i_dirtied_when but fixing
    that isn't trivial so revert is a safer choice for now.

    CC: stable@vger.kernel.org # >= 3.13
    Signed-off-by: Jan Kara

    Jan Kara
     

10 Feb, 2014

1 commit

  • It actually goes back to 2004 ([PATCH] Concurrent O_SYNC write support)
    when sync_page_range() had been introduced; generic_file_write{,v}() correctly
    synced
    pos_after_write - written .. pos_after_write - 1
    but generic_file_aio_write() synced
    pos_before_write .. pos_before_write + written - 1
    instead. Which is not the same thing with O_APPEND, obviously.
    A couple of years later correct variant had been killed off when
    everything switched to use of generic_file_aio_write().

    All users of generic_file_aio_write() are affected, and the same bug
    has been copied into other instances of ->aio_write().

    The fix is trivial; the only subtle point is that generic_write_sync()
    ought to be inlined to avoid calculations useless for the majority of
    calls.

    Signed-off-by: Al Viro

    Al Viro
     

13 Nov, 2013

2 commits

  • Merge first patch-bomb from Andrew Morton:
    "Quite a lot of other stuff is banked up awaiting further
    next->mainline merging, but this batch contains:

    - Lots of random misc patches
    - OCFS2
    - Most of MM
    - backlight updates
    - lib/ updates
    - printk updates
    - checkpatch updates
    - epoll tweaking
    - rtc updates
    - hfs
    - hfsplus
    - documentation
    - procfs
    - update gcov to gcc-4.7 format
    - IPC"

    * emailed patches from Andrew Morton : (269 commits)
    ipc, msg: fix message length check for negative values
    ipc/util.c: remove unnecessary work pending test
    devpts: plug the memory leak in kill_sb
    ./Makefile: export initial ramdisk compression config option
    init/Kconfig: add option to disable kernel compression
    drivers: w1: make w1_slave::flags long to avoid memory corruption
    drivers/w1/masters/ds1wm.cuse dev_get_platdata()
    drivers/memstick/core/ms_block.c: fix unreachable state in h_msb_read_page()
    drivers/memstick/core/mspro_block.c: fix attributes array allocation
    drivers/pps/clients/pps-gpio.c: remove redundant of_match_ptr
    kernel/panic.c: reduce 1 byte usage for print tainted buffer
    gcov: reuse kbasename helper
    kernel/gcov/fs.c: use pr_warn()
    kernel/module.c: use pr_foo()
    gcov: compile specific gcov implementation based on gcc version
    gcov: add support for gcc 4.7 gcov format
    gcov: move gcov structs definitions to a gcc version specific file
    kernel/taskstats.c: return -ENOMEM when alloc memory fails in add_del_listener()
    kernel/taskstats.c: add nla_nest_cancel() for failure processing between nla_nest_start() and nla_nest_end()
    kernel/sysctl_binary.c: use scnprintf() instead of snprintf()
    ...

    Linus Torvalds
     
  • When there are processes heavily creating small files while sync(2) is
    running, it can easily happen that quite some new files are created
    between WB_SYNC_NONE and WB_SYNC_ALL pass of sync(2). That can happen
    especially if there are several busy filesystems (remember that sync
    traverses filesystems sequentially and waits in WB_SYNC_ALL phase on one
    fs before starting it on another fs). Because WB_SYNC_ALL pass is slow
    (e.g. causes a transaction commit and cache flush for each inode in
    ext3), resulting sync(2) times are rather large.

    The following script reproduces the problem:

    function run_writers
    {
    for (( i = 0; i < 10; i++ )); do
    mkdir $1/dir$i
    for (( j = 0; j < 40000; j++ )); do
    dd if=/dev/zero of=$1/dir$i/$j bs=4k count=4 &>/dev/null
    done &
    done
    }

    for dir in "$@"; do
    run_writers $dir
    done

    sleep 40
    time sync

    Fix the problem by disregarding inodes dirtied after sync(2) was called
    in the WB_SYNC_ALL pass. To allow for this, sync_inodes_sb() now takes
    a time stamp when sync has started which is used for setting up work for
    flusher threads.

    To give some numbers, when above script is run on two ext4 filesystems
    on simple SATA drive, the average sync time from 10 runs is 267.549
    seconds with standard deviation 104.799426. With the patched kernel,
    the average sync time from 10 runs is 2.995 seconds with standard
    deviation 0.096.

    Signed-off-by: Jan Kara
    Reviewed-by: Fengguang Wu
    Reviewed-by: Dave Chinner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jan Kara
     

25 Oct, 2013

1 commit


04 Mar, 2013

1 commit


23 Feb, 2013

1 commit


27 Sep, 2012

1 commit


23 Jul, 2012

7 commits

  • wakeup_flusher_threads(0) will queue work doing complete writeback for each
    flusher thread. Thus there is not much point in submitting another work doing
    full inode WB_SYNC_NONE writeback by writeback_inodes_sb().

    After this change it does not make sense to call nonblocking ->sync_fs and
    block device flush before calling sync_inodes_sb() because
    wakeup_flusher_threads() is completely asynchronous and thus these functions
    would be called in parallel with inode writeback running which will effectively
    void any work they do. So we move sync_inodes_sb() call before these two
    functions.

    Signed-off-by: Jan Kara
    Signed-off-by: Al Viro

    Jan Kara
     
  • It is not necessary to write block devices twice. The reason why we first did
    flush and then proper sync is that
    for_each_bdev() {
    write_bdev()
    wait_for_completion()
    }
    is much slower than
    for_each_bdev()
    write_bdev()
    for_each_bdev()
    wait_for_completion()
    when there is bigger amount of data. But as is seen in the above, there's no real
    need to scan pages and submit them twice. We just need to separate the submission
    and waiting part. This patch does that.

    Signed-off-by: Jan Kara
    Signed-off-by: Al Viro

    Jan Kara
     
  • In case block device does not have filesystem mounted on it, sys_sync will just
    ignore it and doesn't writeout its dirty pages. This is because writeback code
    avoids writing inodes from superblock without backing device and
    blockdev_superblock is such a superblock. Since it's unexpected that sync
    doesn't writeout dirty data for block devices be nice to users and change the
    behavior to do so. So now we iterate over all block devices on blockdev_super
    instead of iterating over all superblocks when syncing block devices.

    Reviewed-by: Christoph Hellwig
    Signed-off-by: Jan Kara
    Signed-off-by: Al Viro

    Jan Kara
     
  • Change the order of operations during sync from

    for_each_sb {
    writeback_inodes_sb();
    sync_fs(nowait);
    __sync_blockdev(nowait);
    }
    for_each_sb {
    sync_inodes_sb();
    sync_fs(wait);
    __sync_blockdev(wait);
    }

    to

    for_each_sb
    writeback_inodes_sb();
    for_each_sb
    sync_fs(nowait);
    for_each_sb
    __sync_blockdev(nowait);
    for_each_sb
    sync_inodes_sb();
    for_each_sb
    sync_fs(wait);
    for_each_sb
    __sync_blockdev(wait);

    This is a preparation for the following patches in this series.

    Reviewed-by: Christoph Hellwig
    Signed-off-by: Jan Kara
    Signed-off-by: Al Viro

    Jan Kara
     
  • Since the moment writes to quota files are using block device page cache and
    space for quota structures is reserved at the moment they are first accessed we
    have no reason to sync quota before inode writeback. In fact this order is now
    only harmful since quota information can easily change during inode writeback
    (either because conversion of delayed-allocated extents or simply because of
    allocation of new blocks for simple filesystems not using page_mkwrite).

    So move syncing of quota information after writeback of inodes into ->sync_fs
    method. This way we do not have to use ->quota_sync callback which is primarily
    intended for use by quotactl syscall anyway and we get rid of calling
    ->sync_fs() twice unnecessarily. We skip quota syncing for OCFS2 since it does
    proper quota journalling in all cases (unlike ext3, ext4, and reiserfs which
    also support legacy non-journalled quotas) and thus there are no dirty quota
    structures.

    CC: "Theodore Ts'o"
    CC: Joel Becker
    CC: reiserfs-devel@vger.kernel.org
    Acked-by: Steven Whitehouse
    Acked-by: Dave Kleikamp
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Jan Kara
    Signed-off-by: Al Viro

    Jan Kara
     
  • Split off part of dquot_quota_sync() which writes dquots into a quota file
    to a separate function. In the next patch we will use the function from
    filesystems and we do not want to abuse ->quota_sync quotactl callback more
    than necessary.

    Acked-by: Steven Whitehouse
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Jan Kara
    Signed-off-by: Al Viro

    Jan Kara
     
  • In principle, a filesystem may want to have ->sync_fs() called during sync(1)
    although it does not have a bdi (i.e. s_bdi is set to noop_backing_dev_info).
    Only writeback code really needs bdi set to something reasonable. So move the
    checks where they are more logical.

    Reviewed-by: Christoph Hellwig
    Signed-off-by: Jan Kara
    Signed-off-by: Al Viro

    Jan Kara
     

30 May, 2012

1 commit


29 Feb, 2012

1 commit


04 Jan, 2012

1 commit

  • Move invalidate_bdev, block_sync_page into fs/block_dev.c. Export
    kill_bdev as well, so brd doesn't have to open code it. Reduce
    buffer_head.h requirement accordingly.

    Removed a rather large comment from invalidate_bdev, as it looked a bit
    obsolete to bother moving. The small comment replacing it says enough.

    Signed-off-by: Nick Piggin
    Cc: Al Viro
    Cc: Christoph Hellwig
    Signed-off-by: Andrew Morton
    Signed-off-by: Al Viro

    Al Viro
     

31 Oct, 2011

1 commit

  • This creates a new 'reason' field in a wb_writeback_work
    structure, which unambiguously identifies who initiates
    writeback activity. A 'wb_reason' enumeration has been
    added to writeback.h, to enumerate the possible reasons.

    The 'writeback_work_class' and tracepoint event class and
    'writeback_queue_io' tracepoints are updated to include the
    symbolic 'reason' in all trace events.

    And the 'writeback_inodes_sbXXX' family of routines has had
    a wb_stats parameter added to them, so callers can specify
    why writeback is being started.

    Acked-by: Jan Kara
    Signed-off-by: Curt Wohlgemuth
    Signed-off-by: Wu Fengguang

    Curt Wohlgemuth
     

21 Jul, 2011

1 commit

  • Btrfs needs to be able to control how filemap_write_and_wait_range() is called
    in fsync to make it less of a painful operation, so push down taking i_mutex and
    the calling of filemap_write_and_wait() down into the ->fsync() handlers. Some
    file systems can drop taking the i_mutex altogether it seems, like ext3 and
    ocfs2. For correctness sake I just pushed everything down in all cases to make
    sure that we keep the current behavior the same for everybody, and then each
    individual fs maintainer can make up their mind about what to do from there.
    Thanks,

    Acked-by: Jan Kara
    Signed-off-by: Josef Bacik
    Signed-off-by: Al Viro

    Josef Bacik
     

25 Mar, 2011

1 commit

  • * 'for-2.6.39/core' of git://git.kernel.dk/linux-2.6-block: (65 commits)
    Documentation/iostats.txt: bit-size reference etc.
    cfq-iosched: removing unnecessary think time checking
    cfq-iosched: Don't clear queue stats when preempt.
    blk-throttle: Reset group slice when limits are changed
    blk-cgroup: Only give unaccounted_time under debug
    cfq-iosched: Don't set active queue in preempt
    block: fix non-atomic access to genhd inflight structures
    block: attempt to merge with existing requests on plug flush
    block: NULL dereference on error path in __blkdev_get()
    cfq-iosched: Don't update group weights when on service tree
    fs: assign sb->s_bdi to default_backing_dev_info if the bdi is going away
    block: Require subsystems to explicitly allocate bio_set integrity mempool
    jbd2: finish conversion from WRITE_SYNC_PLUG to WRITE_SYNC and explicit plugging
    jbd: finish conversion from WRITE_SYNC_PLUG to WRITE_SYNC and explicit plugging
    fs: make fsync_buffers_list() plug
    mm: make generic_writepages() use plugging
    blk-cgroup: Add unaccounted time to timeslice_used.
    block: fixup plugging stubs for !CONFIG_BLOCK
    block: remove obsolete comments for blkdev_issue_zeroout.
    blktrace: Use rq->cmd_flags directly in blk_add_trace_rq.
    ...

    Fix up conflicts in fs/{aio.c,super.c}

    Linus Torvalds
     

21 Mar, 2011

1 commit

  • It is frequently useful to sync a single file system, instead of all
    mounted file systems via sync(2):

    - On machines with many mounts, it is not at all uncommon for some of
    them to hang (e.g. unresponsive NFS server). sync(2) will get stuck on
    those and may never get to the one you do care about (e.g., /).
    - Some applications write lots of data to the file system and then
    want to make sure it is flushed to disk. Calling fsync(2) on each
    file introduces unnecessary ordering constraints that result in a large
    amount of sub-optimal writeback/flush/commit behavior by the file
    system.

    There are currently two ways (that I know of) to sync a single super_block:

    - BLKFLSBUF ioctl on the block device: That also invalidates the bdev
    mapping, which isn't usually desirable, and doesn't work for non-block
    file systems.
    - 'mount -o remount,rw' will call sync_filesystem as an artifact of the
    current implemention. Relying on this little-known side effect for
    something like data safety sounds foolish.

    Both of these approaches require root privileges, which some applications
    do not have (nor should they need?) given that sync(2) is an unprivileged
    operation.

    This patch introduces a new system call syncfs(2) that takes an fd and
    syncs only the file system it references. Maybe someday we can

    $ sync /some/path

    and not get

    sync: ignoring all arguments

    The syscall is motivated by comments by Al and Christoph at the last LSF.
    syncfs(2) seems like an appropriate name given statfs(2).

    A similar ioctl was also proposed a while back, see
    http://marc.info/?l=linux-fsdevel&m=127970513829285&w=2

    Signed-off-by: Sage Weil
    Signed-off-by: Al Viro

    Sage Weil
     

17 Mar, 2011

1 commit


10 Aug, 2010

1 commit


01 Jun, 2010

2 commits