27 Oct, 2020

1 commit


08 Jul, 2020

1 commit


02 Jun, 2020

1 commit

  • Pull documentation updates from Jonathan Corbet:
    "A fair amount of stuff this time around, dominated by yet another
    massive set from Mauro toward the completion of the RST conversion. I
    *really* hope we are getting close to the end of this. Meanwhile,
    those patches reach pretty far afield to update document references
    around the tree; there should be no actual code changes there. There
    will be, alas, more of the usual trivial merge conflicts.

    Beyond that we have more translations, improvements to the sphinx
    scripting, a number of additions to the sysctl documentation, and lots
    of fixes"

    * tag 'docs-5.8' of git://git.lwn.net/linux: (130 commits)
    Documentation: fixes to the maintainer-entry-profile template
    zswap: docs/vm: Fix typo accept_threshold_percent in zswap.rst
    tracing: Fix events.rst section numbering
    docs: acpi: fix old http link and improve document format
    docs: filesystems: add info about efivars content
    Documentation: LSM: Correct the basic LSM description
    mailmap: change email for Ricardo Ribalda
    docs: sysctl/kernel: document unaligned controls
    Documentation: admin-guide: update bug-hunting.rst
    docs: sysctl/kernel: document ngroups_max
    nvdimm: fixes to maintainter-entry-profile
    Documentation/features: Correct RISC-V kprobes support entry
    Documentation/features: Refresh the arch support status files
    Revert "docs: sysctl/kernel: document ngroups_max"
    docs: move locking-specific documents to locking/
    docs: move digsig docs to the security book
    docs: move the kref doc into the core-api book
    docs: add IRQ documentation at the core-api book
    docs: debugging-via-ohci1394.txt: add it to the core-api book
    docs: fix references for ipmi.rst file
    ...

    Linus Torvalds
     

09 May, 2020

1 commit

  • There is a potential race in fscache operation enqueuing for reading and
    copying multiple pages from cachefiles to netfs. The problem can be seen
    easily on a heavily loaded system (for example many processes reading files
    continually on an NFS share covered by fscache triggered this problem within
    a few minutes).

    The race is due to cachefiles_read_waiter() adding the monitor to the op's
    to_do list and then dropping the object->work_lock spinlock before
    completing fscache_enqueue_operation(). Once the lock is dropped,
    cachefiles_read_copier() grabs the op, completes processing it, and
    makes it through fscache_retrieval_complete() which sets the op->state to
    the final state of FSCACHE_OP_ST_COMPLETE(4). When cachefiles_read_waiter()
    finally gets through the remainder of fscache_enqueue_operation()
    it sees the invalid state, and hits the ASSERTCMP and the following
    oops is seen:
    [ 2259.612361] FS-Cache:
    [ 2259.614785] FS-Cache: Assertion failed
    [ 2259.618639] FS-Cache: 4 == 5 is false
    [ 2259.622456] ------------[ cut here ]------------
    [ 2259.627190] kernel BUG at fs/fscache/operation.c:70!
    ...
    [ 2259.791675] RIP: 0010:[] [] fscache_enqueue_operation+0xff/0x170 [fscache]
    [ 2259.802059] RSP: 0000:ffffa0263d543be0 EFLAGS: 00010046
    [ 2259.807521] RAX: 0000000000000019 RBX: ffffa01a4d390480 RCX: 0000000000000006
    [ 2259.814847] RDX: 0000000000000000 RSI: 0000000000000046 RDI: ffffa0263d553890
    [ 2259.822176] RBP: ffffa0263d543be8 R08: 0000000000000000 R09: ffffa0263c2d8708
    [ 2259.829502] R10: 0000000000001e7f R11: 0000000000000000 R12: ffffa01a4d390480
    [ 2259.844483] R13: ffff9fa9546c5920 R14: ffffa0263d543c80 R15: ffffa0293ff9bf10
    [ 2259.859554] FS: 00007f4b6efbd700(0000) GS:ffffa0263d540000(0000) knlGS:0000000000000000
    [ 2259.875571] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    [ 2259.889117] CR2: 00007f49e1624ff0 CR3: 0000012b38b38000 CR4: 00000000007607e0
    [ 2259.904015] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    [ 2259.918764] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
    [ 2259.933449] PKRU: 55555554
    [ 2259.943654] Call Trace:
    [ 2259.953592]
    [ 2259.955577] [] cachefiles_read_waiter+0x92/0xf0 [cachefiles]
    [ 2259.978039] [] __wake_up_common+0x82/0x120
    [ 2259.991392] [] __wake_up_common_lock+0x83/0xc0
    [ 2260.004930] [] ? task_rq_unlock+0x20/0x20
    [ 2260.017863] [] __wake_up+0x13/0x20
    [ 2260.030230] [] __wake_up_bit+0x50/0x70
    [ 2260.042535] [] unlock_page+0x2b/0x30
    [ 2260.054495] [] page_endio+0x29/0x90
    [ 2260.066184] [] mpage_end_io+0x51/0x80

    CPU1
    cachefiles_read_waiter()
    20 static int cachefiles_read_waiter(wait_queue_entry_t *wait, unsigned mode,
    21 int sync, void *_key)
    22 {
    ...
    61 spin_lock(&object->work_lock);
    62 list_add_tail(&monitor->op_link, &op->to_do);
    63 spin_unlock(&object->work_lock);

    64
    65 fscache_enqueue_retrieval(op);
    182 static inline void fscache_enqueue_retrieval(struct fscache_retrieval *op)
    183 {
    184 fscache_enqueue_operation(&op->op);
    185 }
    58 void fscache_enqueue_operation(struct fscache_operation *op)
    59 {
    60 struct fscache_cookie *cookie = op->object->cookie;
    61
    62 _enter("{OBJ%x OP%x,%u}",
    63 op->object->debug_id, op->debug_id, atomic_read(&op->usage));
    64
    65 ASSERT(list_empty(&op->pend_link));
    66 ASSERT(op->processor != NULL);
    67 ASSERT(fscache_object_is_available(op->object));
    68 ASSERTCMP(atomic_read(&op->usage), >, 0);

    CPU2
    cachefiles_read_copier()
    168 while (!list_empty(&op->to_do)) {
    ...
    202 fscache_end_io(op, monitor->netfs_page, error);
    203 put_page(monitor->netfs_page);
    204 fscache_retrieval_complete(op, 1);

    CPU1
    58 void fscache_enqueue_operation(struct fscache_operation *op)
    59 {
    ...
    69 ASSERTIFCMP(op->state != FSCACHE_OP_ST_IN_PROGRESS,
    70 op->state, ==, FSCACHE_OP_ST_CANCELLED);
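    The interleaving above can be replayed deterministically in user space.
    The following is a minimal sketch, not kernel code: the enum and both
    helper names are hypothetical, the values 4 (COMPLETE) and 5 (CANCELLED)
    are taken from the oops above, and the value 3 for IN_PROGRESS is an
    assumption.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the fscache operation states. */
enum op_state {
	OP_ST_IN_PROGRESS = 3,	/* assumed value */
	OP_ST_COMPLETE    = 4,	/* value reported in the oops */
	OP_ST_CANCELLED   = 5,	/* value reported in the oops */
};

/* Mirrors the ASSERTIFCMP(): any state other than IN_PROGRESS must be CANCELLED. */
static bool enqueue_state_ok(enum op_state state)
{
	return state == OP_ST_IN_PROGRESS || state == OP_ST_CANCELLED;
}

/*
 * Replay one page completion. If copier_runs_first is true, the copier
 * completes the op in the window between spin_unlock() and the enqueue
 * check; otherwise the enqueue check happens while the op is in progress.
 * Returns whether the enqueue-time check passes.
 */
static bool replay(bool copier_runs_first)
{
	enum op_state state = OP_ST_IN_PROGRESS;
	bool ok;

	if (copier_runs_first) {
		state = OP_ST_COMPLETE;		/* CPU2: fscache_retrieval_complete() */
		ok = enqueue_state_ok(state);	/* CPU1: late check sees state 4 */
	} else {
		ok = enqueue_state_ok(state);	/* CPU1: check while still in progress */
		state = OP_ST_COMPLETE;		/* CPU2: completes afterwards */
	}
	(void)state;
	return ok;
}
```

    Calling replay(true) models the failing order (the check sees 4 where
    only 5 is tolerated); replay(false) models an order in which the check
    cannot observe the completed state.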

    Signed-off-by: Lei Xue
    Signed-off-by: Dave Wysochanski
    Signed-off-by: David Howells

    Lei Xue
     

05 May, 2020

1 commit

  • - Add a SPDX header;
    - Adjust document title;
    - Mark literal blocks as such;
    - Add table markups;
    - Comment out text ToC for html/pdf output;
    - Add lists markups;
    - Add it to filesystems/caching/index.rst.

    Signed-off-by: Mauro Carvalho Chehab
    Link: https://lore.kernel.org/r/eec0cfc268e8dca348f760224685100c9c2caba6.1588021877.git.mchehab+huawei@kernel.org
    Signed-off-by: Jonathan Corbet

    Mauro Carvalho Chehab
     

04 May, 2020

1 commit

  • The patch which changed cachefiles from calling ->bmap() to using the
    bmap() wrapper overwrote the running return value with the result of
    calling bmap(). This causes an assertion failure elsewhere in the code.

    Fix this by using ret2 rather than ret to hold the return value.
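    The failure mode is easy to reproduce outside the kernel. A minimal
    sketch, assuming hypothetical names throughout: lookup_block() merely
    stands in for the bmap() wrapper and does not reproduce its signature,
    and PENDING_STATUS is an arbitrary placeholder for the running return
    value.

```c
#include <assert.h>

#define PENDING_STATUS (-1)	/* hypothetical running return value */

/* Hypothetical helper standing in for the bmap() wrapper: 0 on success. */
static int lookup_block(long *block)
{
	*block = 42;
	return 0;
}

/* Buggy shape: the helper's status overwrites the running return value. */
static int read_page_buggy(void)
{
	int ret = PENDING_STATUS;
	long block = 0;

	ret = lookup_block(&block);	/* PENDING_STATUS is lost here */
	return ret;
}

/* Fixed shape: a separate ret2 holds the helper's status. */
static int read_page_fixed(void)
{
	int ret = PENDING_STATUS;
	long block = 0;
	int ret2 = lookup_block(&block);

	if (ret2 < 0)
		return ret2;	/* propagate a real lookup failure */
	return ret;		/* running value preserved */
}
```

    With these stand-ins, read_page_buggy() returns 0 while
    read_page_fixed() still returns PENDING_STATUS, so a caller that
    depends on the running value sees different results.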

    The oops looks like:

    kernel BUG at fs/nfs/fscache.c:468!
    ...
    RIP: 0010:__nfs_readpages_from_fscache+0x18b/0x190 [nfs]
    ...
    Call Trace:
    nfs_readpages+0xbf/0x1c0 [nfs]
    ? __alloc_pages_nodemask+0x16c/0x320
    read_pages+0x67/0x1a0
    __do_page_cache_readahead+0x1cf/0x1f0
    ondemand_readahead+0x172/0x2b0
    page_cache_async_readahead+0xaa/0xe0
    generic_file_buffered_read+0x852/0xd50
    ? mem_cgroup_commit_charge+0x6e/0x140
    ? nfs4_have_delegation+0x19/0x30 [nfsv4]
    generic_file_read_iter+0x100/0x140
    ? nfs_revalidate_mapping+0x176/0x2b0 [nfs]
    nfs_file_read+0x6d/0xc0 [nfs]
    new_sync_read+0x11a/0x1c0
    __vfs_read+0x29/0x40
    vfs_read+0x8e/0x140
    ksys_read+0x61/0xd0
    __x64_sys_read+0x1a/0x20
    do_syscall_64+0x60/0x1e0
    entry_SYSCALL_64_after_hwframe+0x44/0xa9
    RIP: 0033:0x7f5d148267e0

    Fixes: 10d83e11a582 ("cachefiles: drop direct usage of ->bmap method.")
    Reported-by: David Wysochanski
    Signed-off-by: David Howells
    Tested-by: David Wysochanski
    cc: Carlos Maiolino

    David Howells
     

03 Feb, 2020

1 commit


24 May, 2019

1 commit

  • Based on 1 normalized pattern(s):

    this program is free software you can redistribute it and or modify
    it under the terms of the gnu general public licence as published by
    the free software foundation either version 2 of the licence or at
    your option any later version

    extracted by the scancode license scanner the SPDX license identifier

    GPL-2.0-or-later

    has been chosen to replace the boilerplate/reference in 114 file(s).

    Signed-off-by: Thomas Gleixner
    Reviewed-by: Allison Randal
    Reviewed-by: Kate Stewart
    Cc: linux-spdx@vger.kernel.org
    Link: https://lkml.kernel.org/r/20190520170857.552531963@linutronix.de
    Signed-off-by: Greg Kroah-Hartman

    Thomas Gleixner
     

21 May, 2019

1 commit


15 May, 2019

1 commit


01 Dec, 2018

3 commits

  • Variable 'cache' is being assigned but is never used hence it is
    redundant and can be removed.

    Cleans up clang warning:
    warning: variable 'cache' set but not used [-Wunused-but-set-variable]

    Signed-off-by: Colin Ian King
    Signed-off-by: David Howells

    Colin Ian King
     
  • get_seconds() returns an unsigned long that can overflow on some architectures
    and is deprecated because of that. In cachefs, we cast that number to
    a 32-bit integer, which will overflow in year 2106 on all architectures.

    As confirmed by David Howells, the overflow probably isn't harmful
    in the end, since the timestamps are only used to make the file names
    unique, but they don't strictly have to be in monotonically increasing
    order since the files only exist in order to be deleted as quickly
    as possible.

    Moving to ktime_get_real_seconds() avoids the deprecated interface.
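    The 2106 figure can be checked with a little arithmetic. The sketch
    below is user-space C with hypothetical helper names; it uses the
    average Gregorian year length, so it yields the year only, not the
    exact date.

```c
#include <assert.h>
#include <stdint.h>

/* Average Gregorian year: 365.2425 days * 86400 seconds. */
#define SECONDS_PER_YEAR 31556952LL

/* Truncate a 64-bit seconds count to 32 bits, as a cast to a u32 would. */
static uint32_t truncate_seconds(int64_t seconds)
{
	return (uint32_t)seconds;
}

/* Approximate year in which an unsigned 32-bit seconds-since-1970 counter wraps. */
static int u32_wrap_year(void)
{
	return 1970 + (int)((int64_t)UINT32_MAX / SECONDS_PER_YEAR);
}
```

    One second past the largest 32-bit value, truncate_seconds() wraps
    back to 0, which is the overflow the message describes.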

    Signed-off-by: Arnd Bergmann
    Signed-off-by: David Howells

    Arnd Bergmann
     
  • Clang warns when one enumerated type is implicitly converted to another.

    fs/cachefiles/namei.c:247:50: warning: implicit conversion from
    enumeration type 'enum cachefiles_obj_ref_trace' to different
    enumeration type 'enum fscache_obj_ref_trace' [-Wenum-conversion]
    cache->cache.ops->put_object(&xobject->fscache,
    cachefiles_obj_put_wait_retry);

    Silence this warning by explicitly casting to fscache_obj_ref_trace,
    which is also done in put_object.
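    A reduced illustration of the warning and the cast follows. The two
    enums and the function names are hypothetical stand-ins, and the sketch
    assumes that corresponding enumerators share the same numeric value,
    which is what makes the cast meaningful.

```c
#include <assert.h>

/* Hypothetical backend-side and core-side trace enums with matching values. */
enum backend_ref_trace { backend_ref_get = 0, backend_ref_put_wait_retry = 1 };
enum core_ref_trace    { core_ref_get = 0,    core_ref_put_wait_retry = 1 };

static int last_why = -1;

/* The core-side hook only knows about its own enum type. */
static void put_object(enum core_ref_trace why)
{
	last_why = (int)why;
}

static void backend_put(void)
{
	/*
	 * Passing backend_ref_put_wait_retry directly would trigger
	 * -Wenum-conversion under clang; the explicit cast states that
	 * the cross-enum conversion is intended.
	 */
	put_object((enum core_ref_trace)backend_ref_put_wait_retry);
}
```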

    Reported-by: Nick Desaulniers
    Signed-off-by: Nathan Chancellor
    Signed-off-by: David Howells

    Nathan Chancellor
     

28 Nov, 2018

2 commits

  • [Description]

    In a heavily loaded system where the system pagecache is nearing memory
    limits and fscache is enabled, pages can be leaked by fscache while trying
    to read pages from the cachefiles backend. This can happen because two
    applications can be reading the same page from a single mount, so two
    threads can be trying to read the backing page at the same time. This
    results in one of the threads finding that a page for the backing file or
    netfs file is already in the radix tree. During the error handling,
    cachefiles does not clean up the reference on the backing page, leading
    to a page leak.

    [Fix]
    The fix is straightforward: decrement the reference when an error is
    encountered.

    [dhowells: Note that I've removed the clearance and put of newpage as
    they aren't attested in the commit message and don't appear to actually
    achieve anything since a new page is only allocated is newpage!=NULL and
    any residual new page is cleared before returning.]

    [Testing]
    I have tested the fix using the following method for 12+ hrs.

    1) mkdir -p /mnt/nfs ; mount -o vers=3,fsc :/export /mnt/nfs
    2) create 10000 files of 2.8MB in an NFS mount.
    3) start a thread to simulate heavy VM pressure
    (while true ; do echo 3 > /proc/sys/vm/drop_caches ; sleep 1 ; done)&
    4) start multiple parallel readers for the data set at the same time
    find /mnt/nfs -type f | xargs -P 80 cat > /dev/null &
    find /mnt/nfs -type f | xargs -P 80 cat > /dev/null &
    find /mnt/nfs -type f | xargs -P 80 cat > /dev/null &
    ..
    ..
    find /mnt/nfs -type f | xargs -P 80 cat > /dev/null &
    find /mnt/nfs -type f | xargs -P 80 cat > /dev/null &
    5) finally check using cat /proc/fs/fscache/stats | grep -i pages ;
    free -h , cat /proc/meminfo and page-types -r -b lru
    to ensure all pages are freed.

    Reviewed-by: Daniel Axtens
    Signed-off-by: Shantanu Goel
    Signed-off-by: Kiran Kumar Modukuri
    [dja: forward ported to current upstream]
    Signed-off-by: Daniel Axtens
    Signed-off-by: David Howells

    Kiran Kumar Modukuri
     
  • If cachefiles gets an error other than ENOENT when trying to look up an
    object in the cache (in this case, EACCES), the object state machine will
    eventually transition to the DROP_OBJECT state.

    This state invokes fscache_drop_object() which tries to sync the auxiliary
    data with the cache (this is done lazily since commit 402cb8dda949d) on an
    incomplete cache object struct.

    The problem comes when cachefiles_update_object_xattr() is called to
    rewrite the xattr holding the data. There's an assertion there that the
    cache object points to a dentry as we're going to update its xattr. The
    assertion trips, however, as dentry didn't get set.

    Fix the problem by skipping the update in cachefiles if the object doesn't
    refer to a dentry. A better way to do it could be to skip the update from
    the DROP_OBJECT state handler in fscache, but that might deny the cache the
    opportunity to update intermediate state.
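    The shape of that guard can be sketched in a few lines. The structs
    and the function below are simplified, hypothetical stand-ins rather
    than the cachefiles definitions; the return convention (1 for
    "rewritten", 0 for "skipped") is also an assumption of the sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins: a dentry that counts xattr rewrites. */
struct dentry_stub { int xattr_writes; };
struct cache_object { struct dentry_stub *dentry; };

/* Returns 1 if the xattr was rewritten, 0 if the update was skipped. */
static int update_object_xattr(struct cache_object *object)
{
	if (!object->dentry)
		return 0;	/* lookup never completed: nothing to update */

	object->dentry->xattr_writes++;
	return 1;
}
```

    An object whose lookup failed has no dentry, so the update is skipped
    rather than tripping an assertion.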

    If this error occurs, the kernel log includes lines that look like the
    following:

    CacheFiles: Lookup failed error -13
    CacheFiles:
    CacheFiles: Assertion failed
    ------------[ cut here ]------------
    kernel BUG at fs/cachefiles/xattr.c:138!
    ...
    Workqueue: fscache_object fscache_object_work_func [fscache]
    RIP: 0010:cachefiles_update_object_xattr.cold.4+0x18/0x1a [cachefiles]
    ...
    Call Trace:
    cachefiles_update_object+0xdd/0x1c0 [cachefiles]
    fscache_update_aux_data+0x23/0x30 [fscache]
    fscache_drop_object+0x18e/0x1c0 [fscache]
    fscache_object_work_func+0x74/0x2b0 [fscache]
    process_one_work+0x18d/0x340
    worker_thread+0x2e/0x390
    ? pwq_unbound_release_workfn+0xd0/0xd0
    kthread+0x112/0x130
    ? kthread_bind+0x30/0x30
    ret_from_fork+0x35/0x40

    Note that there are actually two issues here: (1) EACCES happened on a
    cache object and (2) an oops occurred. I think that the second is a
    consequence of the first (it certainly looks like it ought to be). This
    patch only deals with the second.

    Fixes: 402cb8dda949 ("fscache: Attach the index key and aux data to the cookie")
    Reported-by: Zhibin Li
    Signed-off-by: David Howells

    David Howells
     

18 Oct, 2018

1 commit

  • the victim might've been rmdir'ed just before the lock_rename();
    unlike the normal callers, we do not look the source up after the
    parents are locked - we know it beforehand and just recheck that it's
    still the child of what used to be its parent. Unfortunately,
    the check is too weak - we don't spot a dead directory since its
    ->d_parent is unchanged, dentry is positive, etc. So we sail all
    the way to ->rename(), with hosting filesystems _not_ expecting
    to be asked to rename an rmdir'ed subdirectory.

    The fix is easy, fortunately - the lock on parent is sufficient for
    making IS_DEADDIR() on child safe.

    Cc: stable@vger.kernel.org
    Fixes: 9ae326a69004 (CacheFiles: A cache that backs onto a mounted filesystem)
    Signed-off-by: Al Viro
    Signed-off-by: David Howells
    Signed-off-by: Greg Kroah-Hartman

    Al Viro
     

25 Jul, 2018

4 commits

  • If we meet a conflicting object that is marked FSCACHE_OBJECT_IS_LIVE in
    the active object tree, we have been emitting a BUG after logging
    information about it and the new object.

    Instead, we should wait for the CACHEFILES_OBJECT_ACTIVE flag to be cleared
    on the old object (or return an error). The ACTIVE flag should be cleared
    after it has been removed from the active object tree. A timeout of 60s is
    used in the wait, so we shouldn't be able to get stuck there.

    Fixes: 9ae326a69004 ("CacheFiles: A cache that backs onto a mounted filesystem")
    Signed-off-by: Kiran Kumar Modukuri
    Signed-off-by: David Howells

    Kiran Kumar Modukuri
     
  • In cachefiles_mark_object_active(), the new object is marked active and
    then we try to add it to the active object tree. If a conflicting object
    is already present, we want to wait for that to go away. After the wait,
    we go round again and try to re-mark the object as being active - but it's
    already marked active from the first time we went through and a BUG is
    issued.

    Fix this by clearing the CACHEFILES_OBJECT_ACTIVE flag before we try again.
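    The retry loop can be modelled in user space. This is a single-threaded
    sketch with hypothetical names: tree_insert() only pretends that a
    conflicting old object goes away after a number of waits, and a return
    value of -1 marks the point where the kernel would issue the BUG.

```c
#include <assert.h>
#include <stdbool.h>

struct cache_object { bool active; };

/* Stand-in for the active-tree insert: fails while an old object remains. */
static bool tree_insert(int *old_object_waits)
{
	if (*old_object_waits > 0) {
		(*old_object_waits)--;	/* pretend we waited for the old object */
		return false;
	}
	return true;
}

/* Returns the number of attempts on success, or -1 on "already active". */
static int mark_object_active(struct cache_object *object, int conflicts,
			      bool clear_before_retry)
{
	int attempts = 0;

	for (;;) {
		attempts++;
		if (object->active)
			return -1;	/* the kernel issues a BUG here */
		object->active = true;
		if (tree_insert(&conflicts))
			return attempts;
		if (clear_before_retry)
			object->active = false;	/* the fix: clear before retrying */
	}
}
```

    With one conflict and no clearing, the second pass finds the flag still
    set; with clearing, the second pass succeeds.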

    Analysis from Kiran Kumar Modukuri:

    [Impact]
    Oops during heavy NFS + FSCache + Cachefiles

    CacheFiles: Error: Overlong wait for old active object to go away.

    BUG: unable to handle kernel NULL pointer dereference at 0000000000000002

    CacheFiles: Error: Object already active kernel BUG at
    fs/cachefiles/namei.c:163!

    [Cause]
    In a heavily loaded system with big files being read and truncated, an
    fscache object for a cookie is being dropped and a new object is being
    looked up. The new object being looked up has to wait for the old object
    to go away before the new object is moved to active state.

    [Fix]
    Clear the flag 'CACHEFILES_OBJECT_ACTIVE' for the new object when
    retrying the object lookup.

    [Testcase]
    Have run ~100 hours of NFS stress tests and have not seen this bug recur.

    [Regression Potential]
    - Limited to fscache/cachefiles.

    Fixes: 9ae326a69004 ("CacheFiles: A cache that backs onto a mounted filesystem")
    Signed-off-by: Kiran Kumar Modukuri
    Signed-off-by: David Howells

    Kiran Kumar Modukuri
     
  • When a cookie is allocated that causes fscache_object structs to be
    allocated, those objects are initialised with the cookie pointer, but
    aren't blessed with a ref on that cookie unless the attachment is
    successfully completed in fscache_attach_object().

    If attachment fails because the parent object was dying or there was a
    collision, fscache_attach_object() returns without incrementing the cookie
    counter - but upon failure of this function, the object is released which
    then puts the cookie, whether or not a ref was taken on the cookie.

    Fix this by taking a ref on the cookie when it is assigned in
    fscache_object_init(), even when we're creating a root object.
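    The imbalance is a plain reference-counting mismatch and can be shown
    with ordinary integers. The types and functions below are hypothetical
    simplifications, not the fscache structures; take_ref selects between
    the old behaviour (0) and the fixed behaviour (1).

```c
#include <assert.h>
#include <stddef.h>

struct cookie { int usage; };
struct object { struct cookie *cookie; };

/* Assign the cookie pointer; the fixed version also takes a ref here. */
static void object_init(struct object *obj, struct cookie *cookie, int take_ref)
{
	obj->cookie = cookie;
	if (take_ref)
		cookie->usage++;
}

/* Releasing the object always puts the cookie. */
static void object_destroy(struct object *obj)
{
	obj->cookie->usage--;
	obj->cookie = NULL;
}

/* Simulate a failed attach; returns the cookie's usage count afterwards. */
static int failed_attach(struct cookie *cookie, int take_ref_at_init)
{
	struct object obj;

	object_init(&obj, cookie, take_ref_at_init);
	/* attachment fails here, so the attach step takes no ref */
	object_destroy(&obj);
	return cookie->usage;
}
```

    Starting from a usage count of 1 held by the owner, the old behaviour
    leaves the count at 0 after a failed attach, while taking the ref at
    init leaves it at 1.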

    Analysis from Kiran Kumar:

    This bug has been seen in 4.4.0-124-generic #148-Ubuntu kernel

    BugLink: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1776277

    fscache cookie ref count updated incorrectly during fscache object
    allocation resulting in following Oops.

    kernel BUG at /build/linux-Y09MKI/linux-4.4.0/fs/fscache/internal.h:321!
    kernel BUG at /build/linux-Y09MKI/linux-4.4.0/fs/fscache/cookie.c:639!

    [Cause]
    Two threads are trying to operate on a cookie and two objects.

    (1) One thread tries to unmount the filesystem and in process goes over a
    huge list of objects marking them dead and deleting the objects.
    cookie->usage is also decremented in following path:

    nfs_fscache_release_super_cookie
    -> __fscache_relinquish_cookie
    ->__fscache_cookie_put
    ->BUG_ON(atomic_read(&cookie->usage) fscache_object_init
    -> assign cookie, but usage not bumped.
    2) fscache_attach_object -> fails in cant_attach_object because the
    cookie's backing object or cookie's->parent object are going away
    3) fscache_put_object
    -> cachefiles_put_object
    ->fscache_object_destroy
    ->fscache_cookie_put
    ->BUG_ON(atomic_read(&cookie->usage)
    Signed-off-by: David Howells

    Kiran Kumar Modukuri
     
  • cachefiles_read_waiter() has the right to access a 'monitor' object by
    virtue of being called under the waitqueue lock for one of the pages in its
    purview. However, it has no ref on that monitor object or on the
    associated operation.

    What it is allowed to do is to move the monitor object to the operation's
    to_do list, but once it drops the work_lock, it's actually no longer
    permitted to access that object. However, it is trying to enqueue the
    retrieval operation for processing - but it can only do this via a pointer
    in the monitor object, something it shouldn't be doing.

    If it doesn't enqueue the operation, the operation may not get processed.
    If the order is flipped so that the enqueue is first, then it's possible
    for the work processor to look at the to_do list before the monitor is
    enqueued upon it.

    Fix this by getting a ref on the operation so that we can trust that it
    will still be there once we've added the monitor to the to_do list and
    dropped the work_lock. The op can then be enqueued after the lock is
    dropped.
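    The ordering described above can be written out as a single-threaded
    sketch. The structs are hypothetical reductions of the real ones, the
    lock is represented only by comments, and setting the local monitor
    pointer to NULL stands in for "must not be touched again".

```c
#include <assert.h>
#include <stddef.h>

struct op { int usage; int enqueued; };
struct monitor { struct op *op; int on_to_do_list; };

static void read_waiter(struct monitor *monitor)
{
	struct op *op = monitor->op;

	op->usage++;			/* pin the op before touching the list */

	/* spin_lock(work_lock) would be taken here */
	monitor->on_to_do_list = 1;	/* move the monitor to the to_do list */
	/* spin_unlock(work_lock): the monitor may now be consumed elsewhere */
	monitor = NULL;

	op->enqueued++;			/* enqueue via the pinned op, not monitor->op */
	op->usage--;			/* drop the temporary ref */
}
```

    The op is reached only through the local pointer after the unlock, and
    the temporary ref is balanced by the final put.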

    The bug can manifest in one of a couple of ways. The first manifestation
    looks like:

    FS-Cache:
    FS-Cache: Assertion failed
    FS-Cache: 6 == 5 is false
    ------------[ cut here ]------------
    kernel BUG at fs/fscache/operation.c:494!
    RIP: 0010:fscache_put_operation+0x1e3/0x1f0
    ...
    fscache_op_work_func+0x26/0x50
    process_one_work+0x131/0x290
    worker_thread+0x45/0x360
    kthread+0xf8/0x130
    ? create_worker+0x190/0x190
    ? kthread_cancel_work_sync+0x10/0x10
    ret_from_fork+0x1f/0x30

    This is due to the operation being in the DEAD state (6) rather than
    INITIALISED, COMPLETE or CANCELLED (5) because it's already passed through
    fscache_put_operation().

    The bug can also manifest like the following:

    kernel BUG at fs/fscache/operation.c:69!
    ...
    [exception RIP: fscache_enqueue_operation+246]
    ...
    #7 [ffff883fff083c10] fscache_enqueue_operation at ffffffffa0b793c6
    #8 [ffff883fff083c28] cachefiles_read_waiter at ffffffffa0b15a48
    #9 [ffff883fff083c48] __wake_up_common at ffffffff810af028

    I'm not entirely certain as to which is line 69 in Lei's kernel, so I'm not
    entirely clear which assertion failed.

    Fixes: 9ae326a69004 ("CacheFiles: A cache that backs onto a mounted filesystem")
    Reported-by: Lei Xue
    Reported-by: Vegard Nossum
    Reported-by: Anthony DeRobertis
    Reported-by: NeilBrown
    Reported-by: Daniel Axtens
    Reported-by: Kiran Kumar Modukuri
    Signed-off-by: David Howells
    Reviewed-by: Daniel Axtens

    Kiran Kumar Modukuri
     

05 Jun, 2018

1 commit

  • Pull procfs updates from Al Viro:
    "Christoph's proc_create_... cleanups series"

    * 'hch.procfs' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (44 commits)
    xfs, proc: hide unused xfs procfs helpers
    isdn/gigaset: add back gigaset_procinfo assignment
    proc: update SIZEOF_PDE_INLINE_NAME for the new pde fields
    tty: replace ->proc_fops with ->proc_show
    ide: replace ->proc_fops with ->proc_show
    ide: remove ide_driver_proc_write
    isdn: replace ->proc_fops with ->proc_show
    atm: switch to proc_create_seq_private
    atm: simplify procfs code
    bluetooth: switch to proc_create_seq_data
    netfilter/x_tables: switch to proc_create_seq_private
    netfilter/xt_hashlimit: switch to proc_create_{seq,single}_data
    neigh: switch to proc_create_seq_data
    hostap: switch to proc_create_{seq,single}_data
    bonding: switch to proc_create_seq_data
    rtc/proc: switch to proc_create_single_data
    drbd: switch to proc_create_single
    resource: switch to proc_create_seq_data
    staging/rtl8192u: simplify procfs code
    jfs: simplify procfs code
    ...

    Linus Torvalds
     

22 May, 2018

1 commit


16 May, 2018

1 commit


06 Apr, 2018

1 commit

  • Pass the object size in to fscache_acquire_cookie() and
    fscache_write_page() rather than the netfs providing a callback by which it
    can be received. This makes it easier to update the size of the object
    when a new page is written that extends the object.

    The current object size is also passed by fscache to the check_aux
    function, obviating the need to store it in the aux data.

    Signed-off-by: David Howells
    Acked-by: Anna Schumaker
    Tested-by: Steve Dickson

    David Howells
     

04 Apr, 2018

3 commits

  • Attach copies of the index key and auxiliary data to the fscache cookie so
    that:

    (1) The callbacks to the netfs for this stuff can be eliminated. This
    can simplify things in the cache as the information is still
    available, even after the cache has relinquished the cookie.

    (2) Simplifies the locking requirements of accessing the information as we
    don't have to worry about the netfs object going away on us.

    (3) The cache can do lazy updating of the coherency information on disk.
    As long as the cache is flushed before reboot/poweroff, there's no
    need to update the coherency info on disk every time it changes.

    (4) Cookies can be hashed or put in a tree as the index key is easily
    available. This allows:

    (a) Checks for duplicate cookies can be made at the top fscache layer
    rather than down in the bowels of the cache backend.

    (b) Caching can be added to a netfs object that has a cookie if the
    cache is brought online after the netfs object is allocated.

    A certain amount of space is made in the cookie for inline copies of the
    data, but if it won't fit there, extra memory will be allocated for it.
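    The inline-or-allocated storage described in the last paragraph is a
    common pattern. A minimal user-space sketch follows; the struct layout,
    the 16-byte inline capacity and all function names are assumptions made
    for illustration, not the fscache cookie definition.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define INLINE_KEY_MAX 16	/* hypothetical inline capacity */

struct cookie {
	size_t key_len;
	union {
		unsigned char inline_key[INLINE_KEY_MAX];
		unsigned char *key;	/* used only when key_len > INLINE_KEY_MAX */
	} u;
};

/* Copy the key into the cookie; returns 0 on success, -1 on allocation failure. */
static int cookie_set_key(struct cookie *c, const void *key, size_t len)
{
	if (len <= INLINE_KEY_MAX) {
		memcpy(c->u.inline_key, key, len);
	} else {
		unsigned char *buf = malloc(len);

		if (!buf) {
			c->key_len = 0;
			return -1;
		}
		memcpy(buf, key, len);
		c->u.key = buf;
	}
	c->key_len = len;
	return 0;
}

/* Return a pointer to the stored key, wherever it lives. */
static const unsigned char *cookie_key(const struct cookie *c)
{
	return c->key_len <= INLINE_KEY_MAX ? c->u.inline_key : c->u.key;
}

/* Free the out-of-line copy, if there is one. */
static void cookie_free_key(struct cookie *c)
{
	if (c->key_len > INLINE_KEY_MAX)
		free(c->u.key);
	c->key_len = 0;
}
```

    Short keys cost no extra allocation; only keys longer than the inline
    capacity pay for one, which matches the memory trade-off noted above.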

    The downside of this is that live cache operation requires more memory.

    Signed-off-by: David Howells
    Acked-by: Anna Schumaker
    Tested-by: Steve Dickson

    David Howells
     
  • Add some tracepoints to fscache:

    (*) fscache_cookie - Tracks a cookie's usage count.

    (*) fscache_netfs - Logs registration of a network filesystem, including
    the pointer to the cookie allocated.

    (*) fscache_acquire - Logs cookie acquisition.

    (*) fscache_relinquish - Logs cookie relinquishment.

    (*) fscache_enable - Logs enablement of a cookie.

    (*) fscache_disable - Logs disablement of a cookie.

    (*) fscache_osm - Tracks execution of states in the object state machine.

    and cachefiles:

    (*) cachefiles_ref - Tracks a cachefiles object's usage count.

    (*) cachefiles_lookup - Logs result of lookup_one_len().

    (*) cachefiles_mkdir - Logs result of vfs_mkdir().

    (*) cachefiles_create - Logs result of vfs_create().

    (*) cachefiles_unlink - Logs calls to vfs_unlink().

    (*) cachefiles_rename - Logs calls to vfs_rename().

    (*) cachefiles_mark_active - Logs an object becoming active.

    (*) cachefiles_wait_active - Logs a wait for an old object to be
    destroyed.

    (*) cachefiles_mark_inactive - Logs an object becoming inactive.

    (*) cachefiles_mark_buried - Logs the burial of an object.

    Signed-off-by: David Howells

    David Howells
     
  • Fix a couple of checker warnings in fscache and cachefiles:

    (1) fscache_n_op_requeue is never used, so get rid of it.

    (2) cachefiles_uncache_page() is passed in a lock that it releases, so
    this needs annotating.

    Signed-off-by: David Howells

    David Howells
     

12 Feb, 2018

1 commit

  • This is the mindless scripted replacement of kernel use of POLL*
    variables as described by Al, done by this script:

    for V in IN OUT PRI ERR RDNORM RDBAND WRNORM WRBAND HUP RDHUP NVAL MSG; do
    L=`git grep -l -w POLL$V | grep -v '^t' | grep -v /um/ | grep -v '^sa' | grep -v '/poll.h$'|grep -v '^D'`
    for f in $L; do sed -i "-es/^\([^\"]*\)\(\<POLL$V\>\)/\\1E\\2/" $f; done
    done

    with de-mangling cleanups yet to come.

    NOTE! On almost all architectures, the EPOLL* constants have the same
    values as the POLL* constants do. But the keyword here is "almost".
    For various bad reasons they aren't the same, and epoll() doesn't
    actually work quite correctly in some cases due to this on Sparc et al.

    The next patch from Al will sort out the final differences, and we
    should be all done.

    Scripted-by: Al Viro
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     

28 Nov, 2017

1 commit


16 Nov, 2017

2 commits

  • As the page free path makes no distinction between cache hot and cold
    pages, there is no real useful ordering of pages in the free list that
    allocation requests can take advantage of. Judging from the users of
    __GFP_COLD, it is likely that a number of them are the result of copying
    other sites instead of actually measuring the impact. Remove the
    __GFP_COLD parameter which simplifies a number of paths in the page
    allocator.

    This is potentially controversial but bear in mind that the size of the
    per-cpu pagelists versus modern cache sizes means that the whole per-cpu
    list can often fit in the L3 cache. Hence, there is only a potential
    benefit for microbenchmarks that alloc/free pages in a tight loop. It's
    even worse when THP is taken into account which has little or no chance
    of getting a cache-hot page as the per-cpu list is bypassed and the
    zeroing of multiple pages will thrash the cache anyway.

    The truncate microbenchmarks are not shown as this patch affects the
    allocation path and not the free path. A page fault microbenchmark was
    tested but it showed no significant difference which is not surprising
    given that the __GFP_COLD branches are a minuscule percentage of the
    fault path.

    Link: http://lkml.kernel.org/r/20171018075952.10627-9-mgorman@techsingularity.net
    Signed-off-by: Mel Gorman
    Acked-by: Vlastimil Babka
    Cc: Andi Kleen
    Cc: Dave Chinner
    Cc: Dave Hansen
    Cc: Jan Kara
    Cc: Johannes Weiner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mel Gorman
     
  • Every pagevec_init user claims the pages being released are hot even in
    cases where it is unlikely the pages are hot. As no one cares about the
    hotness of pages being released to the allocator, just ditch the
    parameter.

    No performance impact is expected as the overhead is marginal. The
    parameter is removed simply because it is a bit stupid to have a useless
    parameter copied everywhere.

    Link: http://lkml.kernel.org/r/20171018075952.10627-6-mgorman@techsingularity.net
    Signed-off-by: Mel Gorman
    Acked-by: Vlastimil Babka
    Cc: Andi Kleen
    Cc: Dave Chinner
    Cc: Dave Hansen
    Cc: Jan Kara
    Cc: Johannes Weiner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mel Gorman
     

02 Nov, 2017

1 commit

  • Many source files in the tree are missing licensing information, which
    makes it harder for compliance tools to determine the correct license.

    By default all files without license information are under the default
    license of the kernel, which is GPL version 2.

    Update the files which contain no license information with the 'GPL-2.0'
    SPDX license identifier. The SPDX identifier is a legally binding
    shorthand, which can be used instead of the full boiler plate text.

    This patch is based on work done by Thomas Gleixner and Kate Stewart and
    Philippe Ombredanne.

    How this work was done:

    Patches were generated and checked against linux-4.14-rc6 for a subset of
    the use cases:
    - file had no licensing information in it.
    - file was a */uapi/* one with no licensing information in it,
    - file was a */uapi/* one with existing licensing information,

    Further patches will be generated in subsequent months to fix up cases
    where non-standard license headers were used, and references to license
    had to be inferred by heuristics based on keywords.

    The analysis to determine which SPDX License Identifier to be applied to
    a file was done in a spreadsheet of side by side results from the
    output of two independent scanners (ScanCode & Windriver) producing SPDX
    tag:value files created by Philippe Ombredanne. Philippe prepared the
    base worksheet, and did an initial spot review of a few 1000 files.

    The 4.13 kernel was the starting point of the analysis with 60,537 files
    assessed. Kate Stewart did a file by file comparison of the scanner
    results in the spreadsheet to determine which SPDX license identifier(s)
    to be applied to the file. She confirmed any determination that was not
    immediately clear with lawyers working with the Linux Foundation.

    Criteria used to select files for SPDX license identifier tagging were:
    - Files considered eligible had to be source code files.
    - Make and config files were included as candidates if they contained >5
    lines of source
    - File already had some variant of a license header in it (even if
    Reviewed-by: Philippe Ombredanne
    Reviewed-by: Thomas Gleixner
    Signed-off-by: Greg Kroah-Hartman

    Greg Kroah-Hartman
     

17 Jul, 2017

1 commit

  • Firstly by applying the following with coccinelle's spatch:

    @@ expression SB; @@
    -SB->s_flags & MS_RDONLY
    +sb_rdonly(SB)

    to effect the conversion to sb_rdonly(sb), then by applying:

    @@ expression A, SB; @@
    (
    -(!sb_rdonly(SB)) && A
    +!sb_rdonly(SB) && A
    |
    -A != (sb_rdonly(SB))
    +A != sb_rdonly(SB)
    |
    -A == (sb_rdonly(SB))
    +A == sb_rdonly(SB)
    |
    -!(sb_rdonly(SB))
    +!sb_rdonly(SB)
    |
    -A && (sb_rdonly(SB))
    +A && sb_rdonly(SB)
    |
    -A || (sb_rdonly(SB))
    +A || sb_rdonly(SB)
    |
    -(sb_rdonly(SB)) != A
    +sb_rdonly(SB) != A
    |
    -(sb_rdonly(SB)) == A
    +sb_rdonly(SB) == A
    |
    -(sb_rdonly(SB)) && A
    +sb_rdonly(SB) && A
    |
    -(sb_rdonly(SB)) || A
    +sb_rdonly(SB) || A
    )

    @@ expression A, B, SB; @@
    (
    -(sb_rdonly(SB)) ? 1 : 0
    +sb_rdonly(SB)
    |
    -(sb_rdonly(SB)) ? A : B
    +sb_rdonly(SB) ? A : B
    )

    to remove left over excess bracketage and finally by applying:

    @@ expression A, SB; @@
    (
    -(A & MS_RDONLY) != sb_rdonly(SB)
    +(bool)(A & MS_RDONLY) != sb_rdonly(SB)
    |
    -(A & MS_RDONLY) == sb_rdonly(SB)
    +(bool)(A & MS_RDONLY) == sb_rdonly(SB)
    )

    to make comparisons against the result of sb_rdonly() (which is a bool)
    work correctly.
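
    The reason for the (bool) cast can be sketched in user space as follows.
    This is an illustration under stated assumptions: struct super_block is
    a one-field mock, DEMO_FLAG is a hypothetical flag that is not bit 0,
    and the differs_*() helpers exist only for this example.

```c
#include <stdbool.h>

#define MS_RDONLY 1UL		/* read-only mount flag (bit 0) */
#define DEMO_FLAG 4UL		/* hypothetical flag, chosen so it is not bit 0 */

/* One-field mock of the superblock (assumption). */
struct super_block {
	unsigned long s_flags;
};

/* The conversion to bool collapses any non-zero mask result to true. */
static inline bool sb_rdonly(const struct super_block *sb)
{
	return sb->s_flags & MS_RDONLY;
}

/* Raw comparison: the masked value (0 or 'mask') is compared with 0 or 1. */
static bool differs_raw(unsigned long a, unsigned long mask, bool state)
{
	return (a & mask) != state;
}

/* Cast comparison: both sides are normalised to 0 or 1 first. */
static bool differs_cast(unsigned long a, unsigned long mask, bool state)
{
	return (bool)(a & mask) != state;
}
```

    With a mask that is bit 0 both helpers agree. With DEMO_FLAG the raw
    form compares 4 against 1 and reports a difference even though the flag
    is set and the state is true; the cast form does not.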

    Signed-off-by: David Howells

    David Howells
     

20 Jun, 2017

3 commits

  • So I've noticed a number of instances where it was not obvious from the
    code whether ->task_list was for a wait-queue head or a wait-queue entry.

    Furthermore, there are a number of wait-queue users where the lists are
    not for 'tasks' but other entities (poll tables, etc.), in which case
    the 'task_list' name is actively confusing.

    To clear this all up, name the wait-queue head and entry list structure
    fields unambiguously:

    struct wait_queue_head::task_list => ::head
    struct wait_queue_entry::task_list => ::entry

    For example, this code:

    rqw->wait.task_list.next != &wait->task_list

    ... it was pretty unclear (to me) what it's doing, while now it's written this way:

    rqw->wait.head.next != &wait->entry

    ... which makes it pretty clear that we are iterating a list until we see the head.

    Other examples are:

    list_for_each_entry_safe(pos, next, &x->task_list, task_list) {
    list_for_each_entry(wq, &fence->wait.task_list, task_list) {

    ... where it's unclear (to me) what we are iterating, and during review it's
    hard to tell whether it's trying to walk a wait-queue entry (which would be
    a bug), while now it's written as:

    list_for_each_entry_safe(pos, next, &x->head, entry) {
    list_for_each_entry(wq, &fence->wait.head, entry) {
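
    The head/entry distinction can be sketched with a tiny user-space
    circular list. The structures below are heavily reduced stand-ins (the
    'id' member and the wq_*() helpers are hypothetical), meant only to show
    that the walk starts at ::head and stops when it is back at ::head.

```c
#include <stddef.h>

struct list_head {
	struct list_head *next, *prev;
};

/* Reduced stand-ins for the renamed fields (assumption: other members omitted). */
struct wait_queue_head {
	struct list_head head;
};

struct wait_queue_entry {
	int id;				/* hypothetical payload for the demo */
	struct list_head entry;
};

/* An empty queue is a head that points at itself. */
static void wq_init(struct wait_queue_head *wq)
{
	wq->head.next = wq->head.prev = &wq->head;
}

/* Link the entry in just before the head, i.e. at the tail. */
static void wq_add_tail(struct wait_queue_head *wq, struct wait_queue_entry *e)
{
	e->entry.prev = wq->head.prev;
	e->entry.next = &wq->head;
	wq->head.prev->next = &e->entry;
	wq->head.prev = &e->entry;
}

/* Walk the entries until the head is seen again, summing their ids. */
static int wq_sum_ids(const struct wait_queue_head *wq)
{
	int sum = 0;

	for (const struct list_head *p = wq->head.next; p != &wq->head; p = p->next) {
		/* Recover the enclosing entry from its embedded list node. */
		const struct wait_queue_entry *e = (const struct wait_queue_entry *)
			((const char *)p - offsetof(struct wait_queue_entry, entry));
		sum += e->id;
	}
	return sum;
}
```

    The loop condition p != &wq->head reads the same way as the renamed
    kernel expression: iteration ends at the head, never at an entry.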

    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: linux-kernel@vger.kernel.org
    Signed-off-by: Ingo Molnar

    Ingo Molnar
     
  • The wait_bit*() types and APIs are mixed into wait.h, but they
    are a pretty orthogonal extension of wait-queues.

    Furthermore, only about 50 kernel files use these APIs, while
    over 1000 use the regular wait-queue functionality.

    So clean up the main wait.h by moving the wait-bit functionality
    out of it, into a separate .h and .c file:

    include/linux/wait_bit.h for types and APIs
    kernel/sched/wait_bit.c for the implementation

    Update all header dependencies.

    This reduces the size of wait.h rather significantly, by about 30%.

    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: linux-kernel@vger.kernel.org
    Signed-off-by: Ingo Molnar

    Ingo Molnar
     
  • Rename:

    wait_queue_t => wait_queue_entry_t

    'wait_queue_t' was always a slight misnomer: its name implies that it's a "queue",
    but in reality it's a queue *entry*. The 'real' queue is the wait queue head,
    which had to carry the name.

    Start sorting this out by renaming it to 'wait_queue_entry_t'.

    This also allows the real structure name 'struct __wait_queue' to
    lose its double underscore and become 'struct wait_queue_entry',
    which is the more canonical nomenclature for such data types.

    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: linux-kernel@vger.kernel.org
    Signed-off-by: Ingo Molnar

    Ingo Molnar
     

02 Mar, 2017

1 commit


11 Oct, 2016

2 commits

  • Pull more vfs updates from Al Viro:
    "->rename2() work from Miklos + current_time() from Deepa"

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
    fs: Replace current_fs_time() with current_time()
    fs: Replace CURRENT_TIME_SEC with current_time() for inode timestamps
    fs: Replace CURRENT_TIME with current_time() for inode timestamps
    fs: proc: Delete inode time initializations in proc_alloc_inode()
    vfs: Add current_time() api
    vfs: add note about i_op->rename changes to porting
    fs: rename "rename2" i_op to "rename"
    vfs: remove unused i_op->rename
    fs: make remaining filesystems use .rename2
    libfs: support RENAME_NOREPLACE in simple_rename()
    fs: support RENAME_NOREPLACE for local filesystems
    ncpfs: fix unused variable warning

    Linus Torvalds
     
  • Pull vfs xattr updates from Al Viro:
    "xattr stuff from Andreas

    This completes the switch to xattr_handler ->get()/->set() from
    ->getxattr/->setxattr/->removexattr"

    * 'work.xattr' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
    vfs: Remove {get,set,remove}xattr inode operations
    xattr: Stop calling {get,set,remove}xattr inode operations
    vfs: Check for the IOP_XATTR flag in listxattr
    xattr: Add __vfs_{get,set,remove}xattr helpers
    libfs: Use IOP_XATTR flag for empty directory handling
    vfs: Use IOP_XATTR flag for bad-inode handling
    vfs: Add IOP_XATTR inode operations flag
    vfs: Move xattr_resolve_name to the front of fs/xattr.c
    ecryptfs: Switch to generic xattr handlers
    sockfs: Get rid of getxattr iop
    sockfs: getxattr: Fail with -EOPNOTSUPP for invalid attribute names
    kernfs: Switch to generic xattr handlers
    hfs: Switch to generic xattr handlers
    jffs2: Remove jffs2_{get,set,remove}xattr macros
    xattr: Remove unnecessary NULL attribute name check

    Linus Torvalds
     

08 Oct, 2016

1 commit

  • Right now, various places in the kernel check for the existence of
    getxattr, setxattr, and removexattr inode operations and directly call
    those operations. Switch to helper functions and test for the IOP_XATTR
    flag instead.
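
    The pattern can be sketched in user space as below. Everything here is
    a mock: struct demo_inode, demo_handler_get() and demo_getxattr() are
    hypothetical names, and the numeric value given to IOP_XATTR is assumed
    for the example. The point is only that callers test one flag through a
    helper instead of probing for individual operation pointers.

```c
#include <errno.h>
#include <string.h>

#define IOP_XATTR 0x0008	/* flag value assumed for this sketch */

/* Mock inode: an ops-flag word plus a stand-in for handler-based lookup. */
struct demo_inode {
	unsigned short i_opflags;
	int (*get)(const char *name);
};

/* Stand-in handler: pretends the attribute value is as long as its name. */
static int demo_handler_get(const char *name)
{
	return (int)strlen(name);
}

/* Helper: callers go through this instead of checking for an operation. */
static int demo_getxattr(const struct demo_inode *inode, const char *name)
{
	if (!(inode->i_opflags & IOP_XATTR))
		return -EOPNOTSUPP;
	return inode->get(name);
}
```

    An inode without the flag gets -EOPNOTSUPP from the helper regardless
    of what its operation pointers contain.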

    Signed-off-by: Andreas Gruenbacher
    Acked-by: James Morris
    Signed-off-by: Al Viro

    Andreas Gruenbacher