08 Jun, 2016

1 commit

  • This patch converts the simple bi_rw use cases in the block,
    drivers, mm and fs code to set/get the bio operation using
    bio_set_op_attrs/bio_op

    These should be simple one or two liner cases, so I just did them
    in one patch. The next patches handle the more complicated
    cases in a module per patch.

    Signed-off-by: Mike Christie
    Reviewed-by: Hannes Reinecke
    Signed-off-by: Jens Axboe

    Mike Christie
     

18 May, 2016

2 commits

  • Pull trivial tree updates from Jiri Kosina.

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (21 commits)
    gitignore: fix wording
    mfd: ab8500-debugfs: fix "between" in printk
    memstick: trivial fix of spelling mistake on management
    cpupowerutils: bench: fix "average"
    treewide: Fix typos in printk
    IB/mlx4: printk fix
    pinctrl: sirf/atlas7: fix printk spelling
    serial: mctrl_gpio: Grammar s/lines GPIOs/line GPIOs/, /sets/set/
    w1: comment spelling s/minmum/minimum/
    Blackfin: comment spelling s/divsor/divisor/
    metag: Fix misspellings in comments.
    ia64: Fix misspellings in comments.
    hexagon: Fix misspellings in comments.
    tools/perf: Fix misspellings in comments.
    cris: Fix misspellings in comments.
    c6x: Fix misspellings in comments.
    blackfin: Fix misspelling of 'register' in comment.
    avr32: Fix misspelling of 'definitions' in comment.
    treewide: Fix typos in printk
    Doc: treewide : Fix typos in DocBook/filesystem.xml
    ...

    Linus Torvalds
     
  • Pull vfs cleanups from Al Viro:
    "More cleanups from Christoph"

    * 'work.preadv2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
    nfsd: use RWF_SYNC
    fs: add RWF_DSYNC aand RWF_SYNC
    ceph: use generic_write_sync
    fs: simplify the generic_write_sync prototype
    fs: add IOCB_SYNC and IOCB_DSYNC
    direct-io: remove the offset argument to dio_complete
    direct-io: eliminate the offset argument to ->direct_IO
    xfs: eliminate the pos variable in xfs_file_dio_aio_write
    filemap: remove the pos argument to generic_file_direct_write
    filemap: remove pos variables in generic_file_read_iter

    Linus Torvalds
     

09 May, 2016

1 commit


03 May, 2016

2 commits

  • Right now ext2_get_page() (and its analogues in a bunch of other filesystems)
    relies upon the directory being locked - the way it sets and tests Checked and
    Error bits would be racy without that. Switch to a slightly different scheme,
    _not_ setting Checked in case of failure. That way the logics becomes
    if Checked => OK
    else if Error => fail
    else if !validate => fail
    else => OK
    with validation setting Checked or Error on success and failure resp. and
    returning which one had happened. Equivalent to the current logics, but unlike
    the current logics not sensitive to the order of set_bit, test_bit getting
    reordered by CPU, etc.

    Signed-off-by: Al Viro

    Al Viro
     
  • The rest of work.xattr stuff isn't needed for this branch

    Al Viro
     

02 May, 2016

1 commit


18 Apr, 2016

1 commit


11 Apr, 2016

1 commit


05 Apr, 2016

1 commit

  • PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} macros were introduced *long* time
    ago with promise that one day it will be possible to implement page
    cache with bigger chunks than PAGE_SIZE.

    This promise never materialized. And unlikely will.

    We have many places where PAGE_CACHE_SIZE assumed to be equal to
    PAGE_SIZE. And it's constant source of confusion on whether
    PAGE_CACHE_* or PAGE_* constant should be used in a particular case,
    especially on the border between fs and mm.

    Global switching to PAGE_CACHE_SIZE != PAGE_SIZE would cause to much
    breakage to be doable.

    Let's stop pretending that pages in page cache are special. They are
    not.

    The changes are pretty straight-forward:

    - << (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> ;

    - >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> ;

    - PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} -> PAGE_{SIZE,SHIFT,MASK,ALIGN};

    - page_cache_get() -> get_page();

    - page_cache_release() -> put_page();

    This patch contains automated changes generated with coccinelle using
    script below. For some reason, coccinelle doesn't patch header files.
    I've called spatch for them manually.

    The only adjustment after coccinelle is revert of changes to
    PAGE_CAHCE_ALIGN definition: we are going to drop it later.

    There are few places in the code where coccinelle didn't reach. I'll
    fix them manually in a separate patch. Comments and documentation also
    will be addressed with the separate patch.

    virtual patch

    @@
    expression E;
    @@
    - E << (PAGE_CACHE_SHIFT - PAGE_SHIFT)
    + E

    @@
    expression E;
    @@
    - E >> (PAGE_CACHE_SHIFT - PAGE_SHIFT)
    + E

    @@
    @@
    - PAGE_CACHE_SHIFT
    + PAGE_SHIFT

    @@
    @@
    - PAGE_CACHE_SIZE
    + PAGE_SIZE

    @@
    @@
    - PAGE_CACHE_MASK
    + PAGE_MASK

    @@
    expression E;
    @@
    - PAGE_CACHE_ALIGN(E)
    + PAGE_ALIGN(E)

    @@
    expression E;
    @@
    - page_cache_get(E)
    + get_page(E)

    @@
    expression E;
    @@
    - page_cache_release(E)
    + put_page(E)

    Signed-off-by: Kirill A. Shutemov
    Acked-by: Michal Hocko
    Signed-off-by: Linus Torvalds

    Kirill A. Shutemov
     

23 Jan, 2016

1 commit

  • parallel to mutex_{lock,unlock,trylock,is_locked,lock_nested},
    inode_foo(inode) being mutex_foo(&inode->i_mutex).

    Please, use those for access to ->i_mutex; over the coming cycle
    ->i_mutex will become rwsem, with ->lookup() done with it held
    only shared.

    Signed-off-by: Al Viro

    Al Viro
     

15 Jan, 2016

1 commit

  • Mark those kmem allocations that are known to be easily triggered from
    userspace as __GFP_ACCOUNT/SLAB_ACCOUNT, which makes them accounted to
    memcg. For the list, see below:

    - threadinfo
    - task_struct
    - task_delay_info
    - pid
    - cred
    - mm_struct
    - vm_area_struct and vm_region (nommu)
    - anon_vma and anon_vma_chain
    - signal_struct
    - sighand_struct
    - fs_struct
    - files_struct
    - fdtable and fdtable->full_fds_bits
    - dentry and external_name
    - inode for all filesystems. This is the most tedious part, because
    most filesystems overwrite the alloc_inode method.

    The list is far from complete, so feel free to add more objects.
    Nevertheless, it should be close to "account everything" approach and
    keep most workloads within bounds. Malevolent users will be able to
    breach the limit, but this was possible even with the former "account
    everything" approach (simply because it did not account everything in
    fact).

    [akpm@linux-foundation.org: coding-style fixes]
    Signed-off-by: Vladimir Davydov
    Acked-by: Johannes Weiner
    Acked-by: Michal Hocko
    Cc: Tejun Heo
    Cc: Greg Thelen
    Cc: Christoph Lameter
    Cc: Pekka Enberg
    Cc: David Rientjes
    Cc: Joonsoo Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vladimir Davydov
     

12 Jan, 2016

1 commit

  • Pull vfs RCU symlink updates from Al Viro:
    "Replacement of ->follow_link/->put_link, allowing to stay in RCU mode
    even if the symlink is not an embedded one.

    No changes since the mailbomb on Jan 1"

    * 'work.symlinks' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
    switch ->get_link() to delayed_call, kill ->put_link()
    kill free_page_put_link()
    teach nfs_get_link() to work in RCU mode
    teach proc_self_get_link()/proc_thread_self_get_link() to work in RCU mode
    teach shmem_get_link() to work in RCU mode
    teach page_get_link() to work in RCU mode
    replace ->follow_link() with new method that could stay in RCU mode
    don't put symlink bodies in pagecache into highmem
    namei: page_getlink() and page_follow_link_light() are the same thing
    ufs: get rid of ->setattr() for symlinks
    udf: don't duplicate page_symlink_inode_operations
    logfs: don't duplicate page_symlink_inode_operations
    switch befs long symlinks to page_symlink_operations

    Linus Torvalds
     

13 Dec, 2015

1 commit

  • Commit 42cb14b110a5 ("mm: migrate dirty page without
    clear_page_dirty_for_io etc") simplified the migration of a PageDirty
    pagecache page: one stat needs moving from zone to zone and that's about
    all.

    It's convenient and safest for it to shift the PageDirty bit from old
    page to new, just before updating the zone stats: before copying data
    and marking the new PageUptodate. This is all done while both pages are
    isolated and locked, just as before; and just as before, there's a
    moment when the new page is visible in the radix_tree, but not yet
    PageUptodate. What's new is that it may now be briefly visible as
    PageDirty before it is PageUptodate.

    When I scoured the tree to see if this could cause a problem anywhere,
    the only places I found were in two similar functions __r4w_get_page():
    which look up a page with find_get_page() (not using page lock), then
    claim it's uptodate if it's PageDirty or PageWriteback or PageUptodate.

    I'm not sure whether that was right before, but now it might be wrong
    (on rare occasions): only claim the page is uptodate if PageUptodate.
    Or perhaps the page in question could never be migratable anyway?

    Signed-off-by: Hugh Dickins
    Tested-by: Boaz Harrosh
    Cc: Benny Halevy
    Cc: Trond Myklebust
    Cc: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     

09 Dec, 2015

1 commit

  • kmap() in page_follow_link_light() needed to go - allowing to hold
    an arbitrary number of kmaps for long is a great way to deadlocking
    the system.

    new helper (inode_nohighmem(inode)) needs to be used for pagecache
    symlinks inodes; done for all in-tree cases. page_follow_link_light()
    instrumented to yell about anything missed.

    Signed-off-by: Al Viro

    Al Viro
     

10 Nov, 2015

1 commit


24 Jun, 2015

1 commit


11 May, 2015

1 commit


16 Apr, 2015

1 commit


12 Apr, 2015

2 commits


17 Feb, 2015

1 commit

  • All callers of get_xip_mem() are now gone. Remove checks for it,
    initialisers of it, documentation of it and the only implementation of it.
    Also remove mm/filemap_xip.c as it is now empty. Also remove
    documentation of the long-gone get_xip_page().

    Signed-off-by: Matthew Wilcox
    Cc: Andreas Dilger
    Cc: Boaz Harrosh
    Cc: Christoph Hellwig
    Cc: Dave Chinner
    Cc: Jan Kara
    Cc: Jens Axboe
    Cc: Kirill A. Shutemov
    Cc: Mathieu Desnoyers
    Cc: Randy Dunlap
    Cc: Ross Zwisler
    Cc: Theodore Ts'o
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Matthew Wilcox
     

21 Jan, 2015

2 commits

  • Now that we never use the backing_dev_info pointer in struct address_space
    we can simply remove it and save 4 to 8 bytes in every inode.

    Signed-off-by: Christoph Hellwig
    Acked-by: Ryusuke Konishi
    Reviewed-by: Tejun Heo
    Reviewed-by: Jan Kara
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     
  • Since "BDI: Provide backing device capability information [try #3]" the
    backing_dev_info structure also provides flags for the kind of mmap
    operation available in a nommu environment, which is entirely unrelated
    to it's original purpose.

    Introduce a new nommu-only file operation to provide this information to
    the nommu mmap code instead. Splitting this from the backing_dev_info
    structure allows to remove lots of backing_dev_info instance that aren't
    otherwise needed, and entirely gets rid of the concept of providing a
    backing_dev_info for a character device. It also removes the need for
    the mtd_inodefs filesystem.

    Signed-off-by: Christoph Hellwig
    Reviewed-by: Tejun Heo
    Acked-by: Brian Norris
    Signed-off-by: Jens Axboe

    Christoph Hellwig
     

20 Oct, 2014

1 commit


09 Aug, 2014

1 commit


13 Jun, 2014

1 commit

  • Pull vfs updates from Al Viro:
    "This the bunch that sat in -next + lock_parent() fix. This is the
    minimal set; there's more pending stuff.

    In particular, I really hope to get acct.c fixes merged this cycle -
    we need that to deal sanely with delayed-mntput stuff. In the next
    pile, hopefully - that series is fairly short and localized
    (kernel/acct.c, fs/super.c and fs/namespace.c). In this pile: more
    iov_iter work. Most of prereqs for ->splice_write with sane locking
    order are there and Kent's dio rewrite would also fit nicely on top of
    this pile"

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (70 commits)
    lock_parent: don't step on stale ->d_parent of all-but-freed one
    kill generic_file_splice_write()
    ceph: switch to iter_file_splice_write()
    shmem: switch to iter_file_splice_write()
    nfs: switch to iter_splice_write_file()
    fs/splice.c: remove unneeded exports
    ocfs2: switch to iter_file_splice_write()
    ->splice_write() via ->write_iter()
    bio_vec-backed iov_iter
    optimize copy_page_{to,from}_iter()
    bury generic_file_aio_{read,write}
    lustre: get rid of messing with iovecs
    ceph: switch to ->write_iter()
    ceph_sync_direct_write: stop poking into iov_iter guts
    ceph_sync_read: stop poking into iov_iter guts
    new helper: copy_page_from_iter()
    fuse: switch to ->write_iter()
    btrfs: switch to ->write_iter()
    ocfs2: switch to ->write_iter()
    xfs: switch to ->write_iter()
    ...

    Linus Torvalds
     

12 Jun, 2014

1 commit

  • iter_file_splice_write() - a ->splice_write() instance that gathers the
    pipe buffers, builds a bio_vec-based iov_iter covering those and feeds
    it to ->write_iter(). A bunch of simple cases coverted to that...

    [AV: fixed the braino spotted by Cyrill]

    Signed-off-by: Al Viro

    Al Viro
     

22 May, 2014

3 commits

  • This simple patch adds support for raid6 to the ORE.
    Most operations and calculations where already for the general
    case. Only things left:
    * call async_gen_syndrome() in the case of raid6
    (NOTE that the raid6 math is the one supported by the Linux Kernel
    see: crypto/async_tx/async_pq.c)
    * call _ore_add_parity_unit() twice with only last call generating
    the redundancy pages.

    * Fix couple BUGS in old code
    a. In reads when parity==2 it can happen that per_dev->length=0
    but per_dev->offset was set and adjusted by _ore_add_sg_seg().
    Don't let it be overwritten.
    b. The all 'cur_comp > starting_dev' thing to determine if:
    "per_dev->offset is in the current stripe number or the
    next one."
    Was a complete raid5/4 accident. When parity==2 this is not
    at all true usually. All we need to do is increment si->ob_offset
    once we pass by the first parity device.
    (This also greatly simplifies the code, amen)
    c. Calculation of si->dev rotation can overflow when parity==2.

    * Then last enable raid6 in ore_verify_layout()

    I want to deeply thank Daniel Gryniewicz who found first all the
    bugs in the old raid code, and inspired these patches:
    Inspired-by Daniel Gryniewicz

    Signed-off-by: Boaz Harrosh

    Boaz Harrosh
     
  • Two cleanups:
    * si->cur_comp, si->cur_pg where always calculated after
    the call to ore_calc_stripe_info() with the help of
    _dev_order(...). But these are already calculated by
    ore_calc_stripe_info() and can be just set there.
    (This is left over from the time that si->cur_comp, si->cur_pg
    were only used by raid code, but now the main loop manages
    them anyway even though they are ultimately not used in
    none raid code)

    * si->cur_comp - For the very last stripe case, was set inside
    _ore_add_parity_unit(). This is not clear and will be wrong
    for coming raid6 so move this to only caller. Now si->cur_comp
    is only manipulated within _prepare_for_striping(), always next
    to the manipulation of cur_dev.
    Which is much easier to understand and follow.

    Signed-off-by: Boaz Harrosh

    Boaz Harrosh
     
  • rearrange some source lines. Nothing changed.

    Signed-off-by: Boaz Harrosh

    Boaz Harrosh
     

07 May, 2014

3 commits


11 Apr, 2014

1 commit


04 Apr, 2014

1 commit

  • Reclaim will be leaving shadow entries in the page cache radix tree upon
    evicting the real page. As those pages are found from the LRU, an
    iput() can lead to the inode being freed concurrently. At this point,
    reclaim must no longer install shadow pages because the inode freeing
    code needs to ensure the page tree is really empty.

    Add an address_space flag, AS_EXITING, that the inode freeing code sets
    under the tree lock before doing the final truncate. Reclaim will check
    for this flag before installing shadow pages.

    Signed-off-by: Johannes Weiner
    Reviewed-by: Rik van Riel
    Reviewed-by: Minchan Kim
    Cc: Andrea Arcangeli
    Cc: Bob Liu
    Cc: Christoph Hellwig
    Cc: Dave Chinner
    Cc: Greg Thelen
    Cc: Hugh Dickins
    Cc: Jan Kara
    Cc: KOSAKI Motohiro
    Cc: Luigi Semenzato
    Cc: Mel Gorman
    Cc: Metin Doslu
    Cc: Michel Lespinasse
    Cc: Ozgun Erdogan
    Cc: Peter Zijlstra
    Cc: Roman Gushchin
    Cc: Ryan Mallon
    Cc: Tejun Heo
    Cc: Vlastimil Babka
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Johannes Weiner
     

03 Apr, 2014

2 commits

  • Mark functions as static in exofs/ore_raid.c because they are not used
    outside this file.

    This also eliminates the following warning in exofs/ore_raid.c:
    fs/exofs/ore_raid.c:24:14: warning: no previous prototype for _raid_page_alloc [-Wmissing-prototypes]
    fs/exofs/ore_raid.c:29:6: warning: no previous prototype for _raid_page_free [-Wmissing-prototypes]

    Signed-off-by: Rashika Kheria
    Reviewed-by: Josh Triplett
    Signed-off-by: Boaz Harrosh

    Rashika Kheria
     
  • Mark function as static in exofs/super.c because it is not used outside
    this file.

    This also eliminates the following warning in exofs/super.c:
    fs/exofs/super.c:546:5: warning: no previous prototype \
    for __alloc_dev_table[-Wmissing-prototypes]

    Signed-off-by: Rashika Kheria
    Reviewed-by: Josh Triplett
    Signed-off-by: Boaz Harrosh

    Rashika Kheria
     

24 Jan, 2014

2 commits

  • In debug mode exofs is too verbose. Hiding the real problems
    remove some trivial stuff.

    Also fix some other prints.

    Signed-off-by: Boaz Harrosh

    Boaz Harrosh
     
  • If there was an error in fetching an object or extracting
    inode info from attributes. Which means corrupted storage.
    Let it be an empty ZERO dated directory entry so it can be
    deleted. Otherwise the all directory will be inaccessible.

    This does not loose data, because if there is an orphan object
    somewhere it will be recovered by fschk. But usually this only
    means corrupted dir entry. The object was never generated and
    only its link exist. This way we can delete the bad entry.

    Signed-off-by: Boaz Harrosh

    Boaz Harrosh