29 Jul, 2020

1 commit

  • A sequence counter write side critical section must be protected by some
    form of locking to serialize writers. A plain seqcount_t does not
    contain the information of which lock must be held when entering a write
    side critical section.

    Use the new seqcount_spinlock_t data type, which allows associating a
    spinlock with the sequence counter. This enables lockdep to verify that
    the spinlock used for writer serialization is held when the write side
    critical section is entered.

    If lockdep is disabled this lock association is compiled out and has
    neither storage size nor runtime overhead.
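
    The association described above can be sketched in userspace C. This is
    only an illustration of the idea, not the kernel's seqcount_spinlock_t
    implementation: every toy_* name and the TOY_LOCKDEP switch are
    hypothetical, and the "lock" merely records whether it is held. The
    write-begin helper refuses to enter the write side unless the associated
    lock is held, and the lock pointer disappears from the structure when
    checking is compiled out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#ifndef TOY_LOCKDEP
#define TOY_LOCKDEP 1   /* hypothetical switch standing in for lockdep */
#endif

/* Toy lock: only tracks whether it is held (not a real spinlock). */
struct toy_lock {
    bool held;
};

/* Sequence counter that remembers which lock serializes its writers. */
struct toy_seqcount_locked {
    unsigned int sequence;
#if TOY_LOCKDEP
    struct toy_lock *lock;  /* compiled out when checking is disabled */
#endif
};

void toy_seqcount_init(struct toy_seqcount_locked *s, struct toy_lock *lock)
{
    s->sequence = 0;
#if TOY_LOCKDEP
    assert(lock != NULL);
    s->lock = lock;
#else
    (void)lock;
#endif
}

/* Returns false (where lockdep would warn) if the associated lock is not held. */
bool toy_write_begin(struct toy_seqcount_locked *s)
{
#if TOY_LOCKDEP
    if (!s->lock->held)
        return false;
#endif
    s->sequence++;          /* odd: write side in progress */
    return true;
}

void toy_write_end(struct toy_seqcount_locked *s)
{
    s->sequence++;          /* even: write side complete */
}
```

    With TOY_LOCKDEP set to 0 the structure shrinks to the bare counter,
    which mirrors the "no storage size, no runtime overhead" claim.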

    Signed-off-by: Ahmed S. Darwish
    Signed-off-by: Peter Zijlstra (Intel)
    Link: https://lkml.kernel.org/r/20200720155530.1173732-19-a.darwish@linutronix.de

    Ahmed S. Darwish
     

04 Jun, 2020

1 commit

  • Pull networking updates from David Miller:

    1) Allow setting bluetooth L2CAP modes via socket option, from Luiz
    Augusto von Dentz.

    2) Add GSO partial support to igc, from Sasha Neftin.

    3) Several cleanups and improvements to r8169 from Heiner Kallweit.

    4) Add IF_OPER_TESTING link state and use it when ethtool triggers a
    device self-test. From Andrew Lunn.

    5) Start moving away from custom driver versions, use the globally
    defined kernel version instead, from Leon Romanovsky.

    6) Support GRO via gro_cells in DSA layer, from Alexander Lobakin.

    7) Allow hard IRQ deferral during NAPI, from Eric Dumazet.

    8) Add sriov and vf support to hinic, from Luo bin.

    9) Support Media Redundancy Protocol (MRP) in the bridging code, from
    Horatiu Vultur.

    10) Support netmap in the nft_nat code, from Pablo Neira Ayuso.

    11) Allow UDPv6 encapsulation of ESP in the ipsec code, from Sabrina
    Dubroca. Also add ipv6 support for espintcp.

    12) Lots of ReST conversions of the networking documentation, from Mauro
    Carvalho Chehab.

    13) Support configuration of ethtool rxnfc flows in bcmgenet driver,
    from Doug Berger.

    14) Allow dumping the cgroup id and filtering by it in inet_diag code, from
    Dmitry Yakunin.

    15) Add infrastructure to export netlink attribute policies to
    userspace, from Johannes Berg.

    16) Several optimizations to sch_fq scheduler, from Eric Dumazet.

    17) Fallback to the default qdisc if qdisc init fails because otherwise
    a packet scheduler init failure will make a device inoperative. From
    Jesper Dangaard Brouer.

    18) Several RISCV bpf jit optimizations, from Luke Nelson.

    19) Correct the return type of the ->ndo_start_xmit() method in several
    drivers, it's netdev_tx_t but many drivers were using
    'int'. From Yunjian Wang.

    20) Add an ethtool interface for PHY master/slave config, from Oleksij
    Rempel.

    21) Add BPF iterators, from Yonghang Song.

    22) Add cable test infrastructure, including ethtool interfaces, from
    Andrew Lunn. Marvell PHY driver is the first to support this
    facility.

    23) Remove zero-length arrays all over, from Gustavo A. R. Silva.

    24) Calculate and maintain an explicit frame size in XDP, from Jesper
    Dangaard Brouer.

    25) Add CAP_BPF, from Alexei Starovoitov.

    26) Support terse dumps in the packet scheduler, from Vlad Buslov.

    27) Support XDP_TX bulking in dpaa2 driver, from Ioana Ciornei.

    28) Add devm_register_netdev(), from Bartosz Golaszewski.

    29) Minimize qdisc resets, from Cong Wang.

    30) Get rid of kernel_getsockopt and kernel_setsockopt in order to
    eliminate set_fs/get_fs calls. From Christoph Hellwig.

    * git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (2517 commits)
    selftests: net: ip_defrag: ignore EPERM
    net_failover: fixed rollback in net_failover_open()
    Revert "tipc: Fix potential tipc_aead refcnt leak in tipc_crypto_rcv"
    Revert "tipc: Fix potential tipc_node refcnt leak in tipc_rcv"
    vmxnet3: allow rx flow hash ops only when rss is enabled
    hinic: add set_channels ethtool_ops support
    selftests/bpf: Add a default $(CXX) value
    tools/bpf: Don't use $(COMPILE.c)
    bpf, selftests: Use bpf_probe_read_kernel
    s390/bpf: Use bcr 0,%0 as tail call nop filler
    s390/bpf: Maintain 8-byte stack alignment
    selftests/bpf: Fix verifier test
    selftests/bpf: Fix sample_cnt shared between two threads
    bpf, selftests: Adapt cls_redirect to call csum_level helper
    bpf: Add csum_level helper for fixing up csum levels
    bpf: Fix up bpf_skb_adjust_room helper's skb csum setting
    sfc: add missing annotation for efx_ef10_try_update_nic_stats_vf()
    crypto/chtls: IPv6 support for inline TLS
    Crypto/chcr: Fixes a coccinile check error
    Crypto/chcr: Fixes compilations warnings
    ...

    Linus Torvalds
     

13 May, 2020

1 commit

  • DCACHE_DONTCACHE indicates a dentry should not be cached on final
    dput().

    Also add a helper function to mark DCACHE_DONTCACHE on all dentries
    pointing to a specific inode when that inode is being set I_DONTCACHE.

    This facilitates dropping dentry references sooner for inodes which
    require eviction to swap S_DAX mode.
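
    A minimal userspace sketch of the two pieces described above, assuming a
    fixed-size alias array and made-up names (toy_dentry, toy_inode,
    TOY_DONTCACHE, toy_put); it is not the kernel code. Marking the inode
    propagates the flag to every alias, and the final put then reports that
    the dentry should not be kept in the cache.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_DONTCACHE   0x1u    /* hypothetical flag value */
#define TOY_MAX_ALIASES 4

struct toy_dentry {
    unsigned int flags;
    int refcount;
};

struct toy_inode {
    bool dontcache;
    struct toy_dentry *aliases[TOY_MAX_ALIASES];
    size_t nr_aliases;
};

/* Mark the inode and every dentry currently pointing at it. */
void toy_mark_inode_dontcache(struct toy_inode *inode)
{
    inode->dontcache = true;
    for (size_t i = 0; i < inode->nr_aliases; i++)
        inode->aliases[i]->flags |= TOY_DONTCACHE;
}

/* Drop one reference; returns true if the dentry remains cached afterwards. */
bool toy_put(struct toy_dentry *d)
{
    assert(d->refcount > 0);
    if (--d->refcount > 0)
        return true;                        /* still in use */
    return !(d->flags & TOY_DONTCACHE);     /* final put: keep unless marked */
}
```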

    Cc: Al Viro
    Signed-off-by: Ira Weiny
    Reviewed-by: Jan Kara
    Reviewed-by: Darrick J. Wong
    Signed-off-by: Darrick J. Wong

    Ira Weiny
     

27 Apr, 2020

1 commit

  • Instead of having all the sysctl handlers deal with user pointers, which
    is rather hairy in terms of the BPF interaction, copy the input to and
    from userspace in common code. This also means that the strings are
    always NUL-terminated by the common code, making the API a little bit
    safer.
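
    The shape of that change can be sketched in plain C, with malloc() and
    memcpy() standing in for the kernel's allocation and user-copy helpers.
    The names toy_sysctl_write, toy_handler_t and toy_int_handler are
    hypothetical. The common entry point copies the caller's bytes into a
    private buffer and NUL-terminates it, so the handler only ever sees a
    proper C string.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Handlers receive a private, NUL-terminated buffer, never the raw input. */
typedef int (*toy_handler_t)(char *buf, size_t len);

/* Common code: copy in, terminate, dispatch, free. Returns -1 on failure. */
int toy_sysctl_write(toy_handler_t handler, const char *user_buf, size_t count)
{
    char *kbuf;
    int ret;

    assert(handler != NULL);
    if (count == SIZE_MAX)
        return -1;              /* count + 1 would overflow */
    kbuf = malloc(count + 1);
    if (!kbuf)
        return -1;
    memcpy(kbuf, user_buf, count);
    kbuf[count] = '\0';         /* termination guaranteed in one place */
    ret = handler(kbuf, count);
    free(kbuf);
    return ret;
}

/* Example handler: safe to use string functions because buf is terminated. */
int toy_int_handler(char *buf, size_t len)
{
    (void)len;
    return atoi(buf);
}
```

    Calling toy_sysctl_write() with an input that has no terminator of its
    own is safe here, because the handler never touches the original bytes.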

    As most handlers just pass through the data to one of the common handlers
    a lot of the changes are mechanical.

    Signed-off-by: Christoph Hellwig
    Acked-by: Andrey Ignatov
    Signed-off-by: Al Viro

    Christoph Hellwig
     

09 Dec, 2019

1 commit


07 Dec, 2019

1 commit

  • Pull vfs d_inode/d_flags memory ordering fixes from Al Viro:
    "Fallout from tree-wide audit for ->d_inode/->d_flags barriers use.
    Basically, the problem is that negative pinned dentries require
    careful treatment - unless ->d_lock is locked or parent is held at
    least shared, another thread can make them positive right under us.

    Most of the uses turned out to be safe - the main surprises as far as
    filesystems are concerned were

    - race in dget_parent() fastpath, that might end up with the caller
    observing the returned dentry _negative_, due to insufficient
    barriers. It is positive in memory, but we could end up seeing the
    wrong value of ->d_inode in CPU cache. Fixed.

    - manual checks that result of lookup_one_len_unlocked() is positive
    (and rejection of negatives). Again, insufficient barriers (we
    might end up with inconsistent observed values of ->d_inode and
    ->d_flags). Fixed by switching to a new primitive that does the
    checks itself and returns ERR_PTR(-ENOENT) instead of a negative
    dentry. That way we get rid of boilerplate converting negatives
    into ERR_PTR(-ENOENT) in the callers and have a single place to
    deal with the barrier-related mess - inside fs/namei.c rather than
    in every caller out there.

    The guts of pathname resolution *do* need to be careful - the race
    found by Ritesh is real, as well as several similar races.
    Fortunately, it turns out that we can take care of that with fairly
    local changes in there.

    The tree-wide audit had not been fun, and I hate the idea of repeating
    it. I think the right approach would be to annotate the places where
    we are _not_ guaranteed ->d_inode/->d_flags stability and have sparse
    catch regressions. But I'm still not sure what would be the least
    invasive way of doing that and it's clearly the next cycle fodder"

    * 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
    fs/namei.c: fix missing barriers when checking positivity
    fix dget_parent() fastpath race
    new helper: lookup_positive_unlocked()
    fs/namei.c: pull positivity check into follow_managed()

    Linus Torvalds
     

16 Nov, 2019

2 commits

  • Pinned negative dentries can, generally, be made positive
    by another thread. Conditions that prevent that are
    * ->d_lock on dentry in question
    * parent directory held at least shared
    * nobody else could have observed the address of dentry
    Most of the places working with those fall into one of those
    categories; however, d_lookup() and friends need to be used
    with some care. Fortunately, there's not a lot of call sites,
    and with few exceptions all of those fall under one of the
    cases above.

    Exceptions are all in fs/namei.c - in lookup_fast(), lookup_dcache()
    and mountpoint_last(). Another one is lookup_slow() - there
    dcache lookup is done with parent held shared, but the result
    is used after we'd drop the lock. The same happens in do_last() -
    the lookup (in lookup_one()) is done with parent locked, but
    result is used after unlocking.

    lookup_fast(), do_last() and mountpoint_last() flat-out reject
    negatives.

    Most of lookup_dcache() calls are made with parent locked at least
    shared; the only exception is lookup_one_len_unlocked(). It might
    return pinned negative, needs serious care from callers. Fortunately,
    almost nobody calls it directly anymore; all but two callers have
    converted to lookup_positive_unlocked(), which rejects negatives.

    lookup_slow() is called by the same lookup_one_len_unlocked() (see
    above), mountpoint_last() and walk_component(). In the latter two,
    negatives are rejected.

    In other words, there is a small set of places where we need to
    check carefully if a pinned potentially negative dentry is, in
    fact, positive. After that check we want to be sure that both
    ->d_inode and type bits in ->d_flags are stable and observed.
    The set consists of follow_managed() (where the rejection happens
    for lookup_fast(), walk_component() and do_last()), mountpoint_last()
    and lookup_positive_unlocked().

    Solution:
    1) transition from negative to positive (in __d_set_inode_and_type())
    stores ->d_inode, then uses smp_store_release() to set ->d_flags type bits.
    2) aforementioned 3 places in fs/namei.c fetch ->d_flags with
    smp_load_acquire() and bugger off if its type bits say "negative".
    That way anyone downstream of those checks has the dentry known to be
    positive and pinned,
    with ->d_inode and type bits of ->d_flags stable and observed.
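
    The two-step solution above maps onto a release/acquire pair. The
    following is a userspace sketch using C11 atomics in place of
    smp_store_release()/smp_load_acquire(); the structure layout, the
    TOY_TYPE_* values and the function names are assumptions made for the
    illustration, and writers are assumed to be serialized elsewhere (in the
    kernel, by ->d_lock).

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define TOY_TYPE_MASK 0x7u  /* hypothetical type bits; 0 means "negative" */
#define TOY_TYPE_REG  0x1u

struct toy_inode {
    int ino;
};

struct toy_dentry {
    struct toy_inode *inode;    /* plain store, published by the flags store */
    atomic_uint flags;
};

/* Writer: negative -> positive. Store the inode first, then release the type bits. */
void toy_set_inode_and_type(struct toy_dentry *d, struct toy_inode *inode,
                            unsigned int type)
{
    unsigned int flags = atomic_load_explicit(&d->flags, memory_order_relaxed);

    assert(type != 0 && (type & ~TOY_TYPE_MASK) == 0);
    d->inode = inode;
    flags = (flags & ~TOY_TYPE_MASK) | type;
    atomic_store_explicit(&d->flags, flags, memory_order_release);
}

/* Reader: acquire the flags; only look at ->inode if the type bits say "positive". */
struct toy_inode *toy_positive_inode(struct toy_dentry *d)
{
    unsigned int flags = atomic_load_explicit(&d->flags, memory_order_acquire);

    if ((flags & TOY_TYPE_MASK) == 0)
        return NULL;            /* negative: reject */
    return d->inode;            /* ordered after the writer's inode store */
}
```

    A reader that observes non-zero type bits is guaranteed to also observe
    the inode pointer stored before them, which is the stability property the
    commit message asks for.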

    I considered splitting off d_lookup_positive(), so that the checks could
    be done right there, under ->d_lock. However, that leads to massive
    duplication of rather subtle code in fs/namei.c and fs/dcache.c. It's
    worse than it might seem, thanks to autofs ->d_manage() getting involved ;-/
    No matter what, autofs_d_manage()/autofs_d_automount() must live with
    the possibility of pinned negative dentry passed their way, becoming
    positive under them - that's the intended behaviour when lookup comes
    in the middle of automount in progress, so we can't keep them out of
    the area that has to deal with those, more's the pity...

    Reported-by: Ritesh Harjani
    Signed-off-by: Al Viro

    Al Viro
     
  • We are overoptimistic about taking the fast path there; seeing
    the same value in ->d_parent after having grabbed a reference
    to that parent does *not* mean that it has remained our parent
    all along.

    That wouldn't be a big deal (in the end it is our parent and
    we have grabbed the reference we are about to return), but...
    the situation with barriers is messed up.

    We might have hit the following sequence:

    d is a dentry of /tmp/a/b
    CPU1:                                   CPU2:
    parent = d->d_parent (i.e. dentry of /tmp/a)
                                            rename /tmp/a/b to /tmp/b
                                            rmdir /tmp/a, making its dentry negative
    grab reference to parent,
    end up with cached parent->d_inode (NULL)
                                            mkdir /tmp/a, rename /tmp/b to /tmp/a/b
    recheck d->d_parent, which is back to original
    decide that everything's fine and return the reference we'd got.

    The trouble is, caller (on CPU1) will observe dget_parent()
    returning an apparently negative dentry. It actually is positive,
    but CPU1 has stale ->d_inode cached.

    Use d->d_seq to see if it has been moved instead of rechecking ->d_parent.
    NOTE: we are *NOT* going to retry on any kind of ->d_seq mismatch;
    we just go into the slow path in such case. We don't wait for ->d_seq
    to become even either - again, if we are racing with renames, we
    can bloody well go to slow path anyway.
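
    A rough userspace sketch of that fast path, under simplifying
    assumptions: C11 sequentially consistent atomics replace the kernel's
    seqcount and lockref primitives, there is no RCU, and the toy_* names are
    invented here. The reader samples the sequence number, takes a reference
    on the parent, and returns it only if the sequence was even and has not
    changed; in every other case it drops the reference and returns NULL so
    the caller can take the slow path. It never retries and never waits for
    the counter to become even.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

struct toy_node {
    atomic_uint seq;                    /* odd while a move is in progress */
    struct toy_node *_Atomic parent;
    atomic_int refcount;
};

void toy_node_init(struct toy_node *n, struct toy_node *parent)
{
    atomic_init(&n->seq, 0);
    atomic_init(&n->parent, parent);
    atomic_init(&n->refcount, 0);
}

/* Writer side: bump seq around the parent change (writers assumed serialized). */
void toy_move(struct toy_node *n, struct toy_node *new_parent)
{
    atomic_fetch_add(&n->seq, 1);
    atomic_store(&n->parent, new_parent);
    atomic_fetch_add(&n->seq, 1);
}

/* Fast path: returns a referenced parent, or NULL meaning "use the slow path". */
struct toy_node *toy_get_parent_fast(struct toy_node *n)
{
    unsigned int seq = atomic_load(&n->seq);
    struct toy_node *parent = atomic_load(&n->parent);

    assert(parent != NULL);
    atomic_fetch_add(&parent->refcount, 1);
    if (!(seq & 1) && atomic_load(&n->seq) == seq)
        return parent;                  /* node was not moved meanwhile */
    atomic_fetch_sub(&parent->refcount, 1);
    return NULL;                        /* no retry, no waiting */
}
```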

    Signed-off-by: Al Viro

    Al Viro
     

26 Oct, 2019

1 commit


09 Oct, 2019

1 commit

  • Since the following commit:

    b4adfe8e05f1 ("locking/lockdep: Remove unused argument in __lock_release")

    @nested is no longer used in lock_release(), so remove it from all
    lock_release() calls and friends.

    Signed-off-by: Qian Cai
    Signed-off-by: Peter Zijlstra (Intel)
    Acked-by: Will Deacon
    Acked-by: Daniel Vetter
    Cc: Linus Torvalds
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: airlied@linux.ie
    Cc: akpm@linux-foundation.org
    Cc: alexander.levin@microsoft.com
    Cc: daniel@iogearbox.net
    Cc: davem@davemloft.net
    Cc: dri-devel@lists.freedesktop.org
    Cc: duyuyang@gmail.com
    Cc: gregkh@linuxfoundation.org
    Cc: hannes@cmpxchg.org
    Cc: intel-gfx@lists.freedesktop.org
    Cc: jack@suse.com
    Cc: jlbec@evilplan.or
    Cc: joonas.lahtinen@linux.intel.com
    Cc: joseph.qi@linux.alibaba.com
    Cc: jslaby@suse.com
    Cc: juri.lelli@redhat.com
    Cc: maarten.lankhorst@linux.intel.com
    Cc: mark@fasheh.com
    Cc: mhocko@kernel.org
    Cc: mripard@kernel.org
    Cc: ocfs2-devel@oss.oracle.com
    Cc: rodrigo.vivi@intel.com
    Cc: sean@poorly.run
    Cc: st@kernel.org
    Cc: tj@kernel.org
    Cc: tytso@mit.edu
    Cc: vdavydov.dev@gmail.com
    Cc: vincent.guittot@linaro.org
    Cc: viro@zeniv.linux.org.uk
    Link: https://lkml.kernel.org/r/1568909380-32199-1-git-send-email-cai@lca.pw
    Signed-off-by: Ingo Molnar

    Qian Cai
     

21 Jul, 2019

1 commit

  • Pull dcache and mountpoint updates from Al Viro:
    "Saner handling of refcounts to mountpoints.

    Transfer the counting reference from struct mount ->mnt_mountpoint
    over to struct mountpoint ->m_dentry. That allows us to get rid of the
    convoluted games with ordering of mount shutdowns.

    The cost is in teaching shrink_dcache_{parent,for_umount} to cope with
    mixed-filesystem shrink lists, which we'll also need for the Slab
    Movable Objects patchset"

    * 'work.dcache2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
    switch the remnants of releasing the mountpoint away from fs_pin
    get rid of detach_mnt()
    make struct mountpoint bear the dentry reference to mountpoint, not struct mount
    Teach shrink_dcache_parent() to cope with mixed-filesystem shrink lists
    fs/namespace.c: shift put_mountpoint() to callers of unhash_mnt()
    __detach_mounts(): lookup_mountpoint() can't return ERR_PTR() anymore
    nfs: dget_parent() never returns NULL
    ceph: don't open-code the check for dead lockref

    Linus Torvalds
     

10 Jul, 2019

1 commit

  • Currently, running into a shrink list that contains dentries from different
    filesystems can cause several unpleasant things for shrink_dcache_parent()
    and for umount(2).

    The first problem is that there's a window during shrink_dentry_list() between
    __dentry_kill() taking a victim out and dropping the reference to its parent. During
    that window the parent looks like a genuine busy dentry. shrink_dcache_parent()
    (or, worse yet, shrink_dcache_for_umount()) coming at that time will see no
    eviction candidates and no indication that it needs to wait for some
    shrink_dentry_list() to proceed further.

    That applies for any shrink list that might intersect with the subtree we are
    trying to shrink; the only reason it does not blow on umount(2) in the mainline
    is that we unregister the memory shrinker before hitting shrink_dcache_for_umount().

    Another problem happens if something in a mixed-filesystem shrink list gets
    stuck in e.g. iput(), getting umount of unrelated fs to spin waiting for
    the stuck shrinker to get around to our dentries.

    Solution:
    1) have shrink_dentry_list() decrement the parent's refcount and
    make sure it's on a shrink list (ours unless it already had been on some
    other) before calling __dentry_kill(). That eliminates the window when
    shrink_dcache_parent() would've blown past the entire subtree without
    noticing anything with zero refcount not on shrink lists.
    2) when shrink_dcache_parent() has found no eviction candidates,
    but some dentries are still sitting on shrink lists, rather than
    repeating the scan in hope that shrinkers have progressed, scan looking
    for something on shrink lists with zero refcount. If such a thing is
    found, grab rcu_read_lock() and stop the scan, with caller locking
    it for eviction, dropping out of RCU and doing __dentry_kill(), with
    the same treatment for parent as shrink_dentry_list() would do.

    Note that right now mixed-filesystem shrink lists do not occur, so this
    is not a mainline bug. However, there's a bunch of uses for such
    beasts (e.g. the "try and evict everything we can out of given page"
    patches; there are potential uses in mount-related code, considerably
    simplifying the life in fs/namespace.c, etc.)

    Signed-off-by: Al Viro

    Al Viro
     

20 Jun, 2019

1 commit

  • d_delete() was piggy backed for the fsnotify_nameremove() hook when
    in fact not all callers of d_delete() care about fsnotify events.

    For all callers of d_delete() that may be interested in fsnotify events,
    we made sure to call one of fsnotify_{unlink,rmdir}() hooks before
    calling d_delete().

    Now we can move the fsnotify_nameremove() call from d_delete() to the
    fsnotify_{unlink,rmdir}() hooks.

    Two explicit calls to fsnotify_nameremove() from nfs/afs sillyrename
    are also removed. This will cause a change of behavior - nfs/afs will
    NOT generate an fsnotify delete event when renaming over a positive
    dentry. This change is desirable, because it is consistent with the
    behavior of all other filesystems.

    Signed-off-by: Amir Goldstein
    Signed-off-by: Jan Kara

    Amir Goldstein
     

21 May, 2019

1 commit

  • Add SPDX license identifiers to all files which:

    - Have no license information of any form

    - Have EXPORT_.*_SYMBOL_GPL inside which was used in the
    initial scan/conversion to ignore the file

    These files fall under the project license, GPL v2 only. The resulting SPDX
    license identifier is:

    GPL-2.0-only

    Signed-off-by: Thomas Gleixner
    Signed-off-by: Greg Kroah-Hartman

    Thomas Gleixner
     

08 May, 2019

1 commit

  • Pull fscrypt updates from Ted Ts'o:
    "Clean up fscrypt's dcache revalidation support, and other
    miscellaneous cleanups"

    * tag 'fscrypt_for_linus' of git://git.kernel.org/pub/scm/fs/fscrypt/fscrypt:
    fscrypt: cache decrypted symlink target in ->i_link
    vfs: use READ_ONCE() to access ->i_link
    fscrypt: fix race where ->lookup() marks plaintext dentry as ciphertext
    fscrypt: only set dentry_operations on ciphertext dentries
    fs, fscrypt: clear DCACHE_ENCRYPTED_NAME when unaliasing directory
    fscrypt: fix race allowing rename() and link() of ciphertext dentries
    fscrypt: clean up and improve dentry revalidation
    fscrypt: use READ_ONCE() to access ->i_crypt_info
    fscrypt: remove WARN_ON_ONCE() when decryption fails
    fscrypt: drop inode argument from fscrypt_get_ctx()

    Linus Torvalds
     

27 Apr, 2019

1 commit


17 Apr, 2019

1 commit

  • Make __d_move() clear DCACHE_ENCRYPTED_NAME on the source dentry. This
    is needed for when d_splice_alias() moves a directory's encrypted alias
    to its decrypted alias as a result of the encryption key being added.

    Otherwise, the decrypted alias will incorrectly be invalidated on the
    next lookup, causing problems such as unmounting a mount the user just
    mount()ed there.

    Note that we don't have to support arbitrary moves of this flag because
    fscrypt doesn't allow dentries with DCACHE_ENCRYPTED_NAME to be the
    source or target of a rename().

    Fixes: 28b4c263961c ("ext4 crypto: revalidate dentry after adding or removing the key")
    Reported-by: Sarthak Kukreti
    Signed-off-by: Eric Biggers
    Signed-off-by: Theodore Ts'o

    Eric Biggers
     

10 Apr, 2019

2 commits

  • No modular uses since introduction of alloc_file_pseudo(),
    and the only non-modular user not in alloc_file_pseudo()
    had actually been wrong - should've been d_alloc_anon().

    Signed-off-by: Al Viro

    Al Viro
     
  • For lockless accesses to dentries we don't have pinned we rely
    (among other things) upon having an RCU delay between dropping
    the last reference and actually freeing the memory.

    On the other hand, for things like pipes and sockets we neither
    do that kind of lockless access, nor want to deal with the
    overhead of an RCU delay every time a socket gets closed.

    So delay was made optional - setting DCACHE_RCUACCESS in ->d_flags
    made sure it would happen. We tried to avoid setting it unless
    we knew we needed it. Unfortunately, that had led to a recurring
    class of bugs, in which we missed the need to set it.

    We only really need it for dentries that are created by
    d_alloc_pseudo(), so let's not bother with trying to be smart -
    just make having an RCU delay the default. The ones that do
    *not* get it set the replacement flag (DCACHE_NORCU) and we'd
    better use that sparingly. d_alloc_pseudo() is the only
    such user right now.
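
    The inversion of the default can be illustrated with a small userspace
    sketch. It assumes a simple deferred-free list as a stand-in for an RCU
    grace period, and the names toy_obj, TOY_NORCU, toy_release and
    toy_grace_period_end are hypothetical. Objects are queued for delayed
    freeing unless they explicitly opt out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

#define TOY_NORCU 0x1u  /* hypothetical bit: opt out of the delayed free */

struct toy_obj {
    unsigned int flags;
    struct toy_obj *next_deferred;
};

/* Objects waiting for the (simulated) grace period to end. */
struct toy_obj *toy_deferred_list;

/* Returns true if freed immediately, false if the free was deferred. */
bool toy_release(struct toy_obj *obj)
{
    assert(obj != NULL);
    if (obj->flags & TOY_NORCU) {
        free(obj);              /* opted out: nobody accesses it locklessly */
        return true;
    }
    obj->next_deferred = toy_deferred_list;     /* default: delay the free */
    toy_deferred_list = obj;
    return false;
}

/* Stand-in for the end of a grace period: free everything that was deferred. */
unsigned int toy_grace_period_end(void)
{
    unsigned int freed = 0;

    while (toy_deferred_list) {
        struct toy_obj *obj = toy_deferred_list;

        toy_deferred_list = obj->next_deferred;
        free(obj);
        freed++;
    }
    return freed;
}
```

    Forgetting to set a flag now errs on the side of the delayed free, which
    is the safe direction.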

    FWIW, the race that finally prompted that switch had been
    between __lock_parent() of immediate subdirectory of what's
    currently the root of a disconnected tree (e.g. from
    open-by-handle in progress) racing with d_splice_alias()
    elsewhere picking another alias for the same inode, either
    on outright corrupted fs image, or (in case of open-by-handle
    on NFS) that subdirectory having been just moved on server.
    It's not easy to hit, so the sky is not falling, but that's
    not the first race on similar missed cases and the logic
    for setting DCACHE_RCUACCESS has gotten ridiculously
    convoluted.

    Cc: stable@vger.kernel.org
    Signed-off-by: Al Viro

    Al Viro
     

31 Jan, 2019

2 commits

  • The current dentry number tracking code doesn't distinguish between
    positive & negative dentries. It just reports the total number of
    dentries in the LRU lists.

    As an excessive number of negative dentries can have an impact on system
    performance, it will be wise to track the number of positive and
    negative dentries separately.

    This patch adds tracking for the total number of negative dentries in
    the system LRU lists and reports it in the 5th field in the
    /proc/sys/fs/dentry-state file. The number, however, does not include
    negative dentries that are in flight but not in the LRU yet as well as
    those in the shrinker lists which are on the way out anyway.

    The number of positive dentries in the LRU lists can be roughly found by
    subtracting the number of negative dentries from the unused count.
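
    As a sketch of that arithmetic, the helper below parses one line in the
    format of /proc/sys/fs/dentry-state and derives the rough positive count.
    The commit text only fixes the fifth field as the negative count; the
    other field names follow my understanding of the kernel's dentry_stat
    layout and should be treated as assumptions, as should the toy_* names.

```c
#include <assert.h>
#include <stdio.h>

/* Six whitespace-separated numbers; the 5th is the negative dentry count. */
struct toy_dentry_state {
    long nr_dentry;
    long nr_unused;
    long age_limit;
    long want_pages;
    long nr_negative;
    long dummy;
};

/* Returns 0 on success, -1 if the line does not contain six numbers. */
int toy_parse_dentry_state(const char *line, struct toy_dentry_state *st)
{
    int n;

    assert(line != NULL && st != NULL);
    n = sscanf(line, "%ld %ld %ld %ld %ld %ld",
               &st->nr_dentry, &st->nr_unused, &st->age_limit,
               &st->want_pages, &st->nr_negative, &st->dummy);
    return n == 6 ? 0 : -1;
}

/* Rough number of positive dentries on the LRU lists. */
long toy_positive_unused(const struct toy_dentry_state *st)
{
    return st->nr_unused - st->nr_negative;
}
```

    The result is only approximate, for the reasons given above: in-flight
    negative dentries and those on shrinker lists are not counted.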

    Matthew Wilcox had confirmed that since the introduction of the
    dentry_stat structure in 2.1.60, the dummy array was there, probably for
    future extension. They were not replacements of pre-existing fields.
    So no sane applications that read the value of /proc/sys/fs/dentry-state
    will do dummy thing if the last 2 fields of the sysctl parameter are not
    zero. IOW, it will be safe to use one of the dummy array entries for
    negative dentry count.

    Signed-off-by: Waiman Long
    Signed-off-by: Linus Torvalds

    Waiman Long
     
  • The nr_dentry_unused per-cpu counter tracks dentries in both the LRU
    lists and the shrink lists where the DCACHE_LRU_LIST bit is set.

    The shrink_dcache_sb() function moves dentries from the LRU list to a
    shrink list and subtracts the dentry count from nr_dentry_unused. This
    is incorrect as the nr_dentry_unused count will also be decremented in
    shrink_dentry_list() via d_shrink_del().

    To fix this double decrement, the decrement in the shrink_dcache_sb()
    function is taken out.
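
    The accounting rule can be shown with a toy counter, assuming made-up
    names (toy_entry, toy_nr_unused and the helpers below). An entry is
    counted once while it sits on either list, so moving it from the LRU list
    to a shrink list must leave the counter alone; the single decrement
    happens when it leaves the shrink list.

```c
#include <assert.h>

enum toy_list { TOY_NO_LIST, TOY_LRU_LIST, TOY_SHRINK_LIST };

struct toy_entry {
    enum toy_list on;
};

/* Counts entries on either the LRU list or a shrink list. */
long toy_nr_unused;

void toy_lru_add(struct toy_entry *e)
{
    e->on = TOY_LRU_LIST;
    toy_nr_unused++;
}

/* Moving between lists: the entry is still "unused", so no decrement here. */
void toy_lru_to_shrink(struct toy_entry *e)
{
    assert(e->on == TOY_LRU_LIST);
    e->on = TOY_SHRINK_LIST;
}

/* Leaving the shrink list is the one place the count goes down. */
void toy_shrink_del(struct toy_entry *e)
{
    assert(e->on == TOY_SHRINK_LIST);
    e->on = TOY_NO_LIST;
    toy_nr_unused--;
}
```

    Had toy_lru_to_shrink() also decremented, the counter would end up
    negative after the entries were finally removed.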

    Fixes: 4e717f5c1083 ("list_lru: remove special case function list_lru_dispose_all.")
    Cc: stable@kernel.org
    Signed-off-by: Waiman Long
    Reviewed-by: Dave Chinner
    Signed-off-by: Linus Torvalds

    Waiman Long
     

31 Oct, 2018

1 commit

  • Move remaining definitions and declarations from include/linux/bootmem.h
    into include/linux/memblock.h and remove the redundant header.

    The includes were replaced with the semantic patch below and then
    semi-automated removal of duplicated '#include <linux/memblock.h>'

    @@
    @@
    - #include <linux/bootmem.h>
    + #include <linux/memblock.h>

    [sfr@canb.auug.org.au: dma-direct: fix up for the removal of linux/bootmem.h]
    Link: http://lkml.kernel.org/r/20181002185342.133d1680@canb.auug.org.au
    [sfr@canb.auug.org.au: powerpc: fix up for removal of linux/bootmem.h]
    Link: http://lkml.kernel.org/r/20181005161406.73ef8727@canb.auug.org.au
    [sfr@canb.auug.org.au: x86/kaslr, ACPI/NUMA: fix for linux/bootmem.h removal]
    Link: http://lkml.kernel.org/r/20181008190341.5e396491@canb.auug.org.au
    Link: http://lkml.kernel.org/r/1536927045-23536-30-git-send-email-rppt@linux.vnet.ibm.com
    Signed-off-by: Mike Rapoport
    Signed-off-by: Stephen Rothwell
    Acked-by: Michal Hocko
    Cc: Catalin Marinas
    Cc: Chris Zankel
    Cc: "David S. Miller"
    Cc: Geert Uytterhoeven
    Cc: Greentime Hu
    Cc: Greg Kroah-Hartman
    Cc: Guan Xuetao
    Cc: Ingo Molnar
    Cc: "James E.J. Bottomley"
    Cc: Jonas Bonn
    Cc: Jonathan Corbet
    Cc: Ley Foon Tan
    Cc: Mark Salter
    Cc: Martin Schwidefsky
    Cc: Matt Turner
    Cc: Michael Ellerman
    Cc: Michal Simek
    Cc: Palmer Dabbelt
    Cc: Paul Burton
    Cc: Richard Kuo
    Cc: Richard Weinberger
    Cc: Rich Felker
    Cc: Russell King
    Cc: Serge Semin
    Cc: Thomas Gleixner
    Cc: Tony Luck
    Cc: Vineet Gupta
    Cc: Yoshinori Sato
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mike Rapoport
     

27 Oct, 2018

1 commit

  • We can use the newly introduced kmalloc-reclaimable-X caches, to allocate
    external names in dcache, which will take care of the proper accounting
    automatically, and also improve anti-fragmentation page grouping.

    This effectively reverts commit f1782c9bc547 ("dcache: account external
    names as indirectly reclaimable memory") and instead passes
    __GFP_RECLAIMABLE to kmalloc(). The accounting thus moves from
    NR_INDIRECTLY_RECLAIMABLE_BYTES to NR_SLAB_RECLAIMABLE, which is also
    considered in MemAvailable calculation and overcommit decisions.

    Link: http://lkml.kernel.org/r/20180731090649.16028-4-vbabka@suse.cz
    Signed-off-by: Vlastimil Babka
    Acked-by: Mel Gorman
    Acked-by: Roman Gushchin
    Cc: Christoph Lameter
    Cc: David Rientjes
    Cc: Johannes Weiner
    Cc: Joonsoo Kim
    Cc: Laura Abbott
    Cc: Matthew Wilcox
    Cc: Michal Hocko
    Cc: Sumit Semwal
    Cc: Vijayanand Jitta
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vlastimil Babka
     

18 Aug, 2018

1 commit

  • Since only dentry->d_name.len + 1 bytes out of DNAME_INLINE_LEN bytes
    are initialized at __d_alloc(), we can't copy the whole size
    unconditionally.

    WARNING: kmemcheck: Caught 32-bit read from uninitialized memory (ffff8fa27465ac50)
    636f6e66696766732e746d70000000000010000000000000020000000188ffff
    i i i i i i i i i i i i i u u u u u u u u u u i i i i i u u u u
    ^
    RIP: 0010:take_dentry_name_snapshot+0x28/0x50
    RSP: 0018:ffffa83000f5bdf8 EFLAGS: 00010246
    RAX: 0000000000000020 RBX: ffff8fa274b20550 RCX: 0000000000000002
    RDX: ffffa83000f5be40 RSI: ffff8fa27465ac50 RDI: ffffa83000f5be60
    RBP: ffffa83000f5bdf8 R08: ffffa83000f5be48 R09: 0000000000000001
    R10: ffff8fa27465ac00 R11: ffff8fa27465acc0 R12: ffff8fa27465ac00
    R13: ffff8fa27465acc0 R14: 0000000000000000 R15: 0000000000000000
    FS: 00007f79737ac8c0(0000) GS:ffffffff8fc30000(0000) knlGS:0000000000000000
    CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: ffff8fa274c0b000 CR3: 0000000134aa7002 CR4: 00000000000606f0
    take_dentry_name_snapshot+0x28/0x50
    vfs_rename+0x128/0x870
    SyS_rename+0x3b2/0x3d0
    entry_SYSCALL_64_fastpath+0x1a/0xa4
    0xffffffffffffffff
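
    The idea behind the fix, copying only the initialized prefix of the
    inline name, can be sketched as follows. TOY_INLINE_LEN is an arbitrary
    stand-in for DNAME_INLINE_LEN, and the structure and function names are
    hypothetical; this is not the kernel's take_dentry_name_snapshot().

```c
#include <assert.h>
#include <string.h>

#define TOY_INLINE_LEN 32   /* stand-in for DNAME_INLINE_LEN */

struct toy_name {
    unsigned int len;                   /* length without the trailing NUL */
    char inline_name[TOY_INLINE_LEN];   /* only len + 1 bytes are initialized */
};

/* Copy the name and its NUL terminator, never the uninitialized tail. */
void toy_snapshot_name(char dst[TOY_INLINE_LEN], const struct toy_name *src)
{
    assert(src->len < TOY_INLINE_LEN);
    memcpy(dst, src->inline_name, src->len + 1);
}
```

    Bytes of the destination beyond len + 1 are left untouched, so nothing
    uninitialized is ever read from the source.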

    Link: http://lkml.kernel.org/r/201709131912.GBG39012.QMJLOVFSFFOOtH@I-love.SAKURA.ne.jp
    Signed-off-by: Tetsuo Handa
    Cc: Vegard Nossum
    Cc: Al Viro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Tetsuo Handa
     

14 Aug, 2018

2 commits

  • …ux/kernel/git/viro/vfs

    Pull misc vfs updates from Al Viro:
    "Misc cleanups from various folks all over the place

    I expected more fs/dcache.c cleanups this cycle, so that went into a
    separate branch. Said cleanups have missed the window, so in the
    hindsight it could've gone into work.misc instead. Decided not to
    cherry-pick, thus the 'work.dcache' branch"

    * 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
    fs: dcache: Use true and false for boolean values
    fold generic_readlink() into its only caller
    fs: shave 8 bytes off of struct inode
    fs: Add more kernel-doc to the produced documentation
    fs: Fix attr.c kernel-doc
    removed extra extern file_fdatawait_range

    * 'work.dcache' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
    kill dentry_update_name_case()

    Linus Torvalds
     
  • Pull vfs icache updates from Al Viro:

    - NFS mkdir/open_by_handle race fix

    - analogous solution for FUSE, replacing the one currently in mainline

    - new primitive to be used when discarding halfway set up inodes on
    failed object creation; gives sane warranties re icache lookups not
    returning such doomed but still not freed inodes. A bunch of
    filesystems switched to that animal.

    - Miklos' fix for last cycle regression in iget5_locked(); -stable will
    need a slightly different variant, unfortunately.

    - misc bits and pieces around things icache-related (in adfs and jfs).

    * 'work.mkdir' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
    jfs: don't bother with make_bad_inode() in ialloc()
    adfs: don't put inodes into icache
    new helper: inode_fake_hash()
    vfs: don't evict uninitialized inode
    jfs: switch to discard_new_inode()
    ext2: make sure that partially set up inodes won't be returned by ext2_iget()
    udf: switch to discard_new_inode()
    ufs: switch to discard_new_inode()
    btrfs: switch to discard_new_inode()
    new primitive: discard_new_inode()
    kill d_instantiate_no_diralias()
    nfs_instantiate(): prevent multiple aliases for directory inode

    Linus Torvalds
     

10 Aug, 2018

1 commit

  • RCU pathwalk relies upon the assumption that anything that changes
    ->d_inode of a dentry will invalidate its ->d_seq. That's almost
    true - the one exception is that the final dput() of already unhashed
    dentry does *not* touch ->d_seq at all. Unhashing does, though,
    so for anything we'd found by RCU dcache lookup we are fine.
    Unfortunately, we can *start* with an unhashed dentry or jump into
    it.

    We could try and be careful in the (few) places where that could
    happen. Or we could just make the final dput() invalidate the damn
    thing, unhashed or not. The latter is much simpler and easier to
    backport, so let's do it that way.
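
The reasoning above can be illustrated with a toy, single-threaded model. This is only a sketch: `struct toy_dentry` and every `toy_*` function are hypothetical stand-ins invented for illustration, not kernel code, and the counter is a plain integer rather than a real seqcount. It shows why a reader that sampled the counter of an already unhashed dentry cannot notice the final dput() unless that dput() bumps the counter unconditionally.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a dentry; NOT the kernel's struct dentry. */
struct toy_dentry {
    unsigned seq;   /* models ->d_seq: bumped on every invalidation */
    void *inode;    /* models ->d_inode */
    bool hashed;
};

/* Reader side: remember the counter before looking at ->inode. */
static unsigned toy_read_begin(const struct toy_dentry *d)
{
    return d->seq;
}

/* Reader side: true if the dentry was invalidated since toy_read_begin(). */
static bool toy_read_retry(const struct toy_dentry *d, unsigned start)
{
    return d->seq != start;
}

static void toy_invalidate(struct toy_dentry *d)
{
    d->seq += 2;
}

/*
 * Final dput() in the toy model.  With always_invalidate == false only the
 * unhashing step bumps the counter, so an already unhashed dentry loses its
 * inode silently.  With always_invalidate == true the counter is bumped
 * whether or not the dentry was hashed.
 */
static void toy_final_dput(struct toy_dentry *d, bool always_invalidate)
{
    assert(d != NULL);
    if (d->hashed) {
        d->hashed = false;
        toy_invalidate(d);
    } else if (always_invalidate) {
        toy_invalidate(d);
    }
    d->inode = NULL;
}
```

In this model a reader that starts on an unhashed dentry and calls toy_read_retry() after toy_final_dput(d, false) is told everything is fine even though the inode pointer is gone; with toy_final_dput(d, true) the same reader is told to retry.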

    Reported-by: "Dae R. Jeong"
    Cc: stable@vger.kernel.org
    Signed-off-by: Al Viro

    Al Viro
     

06 Aug, 2018

2 commits

  • Since mountpoint crossing can happen without leaving lazy mode,
    root dentries do need the same protection against having their
    memory freed without RCU delay as everything else in the tree.

    It's partially hidden by RCU delay between detaching from the
    mount tree and dropping the vfsmount reference, but the starting
    point of pathwalk can be on an already detached mount, in which
    case umount-caused RCU delay has already passed by the time the
    lazy pathwalk grabs rcu_read_lock(). If the starting point
    happens to be at the root of that vfsmount *and* that vfsmount
    covers the entire filesystem, we get trouble.

    Fixes: 48a066e72d97 ("RCU'd vfsmounts")
    Cc: stable@vger.kernel.org
    Signed-off-by: Al Viro

    Al Viro
     
  • Return statements in functions returning bool should use true or false
    instead of an integer value.

    This issue was detected with the help of Coccinelle.
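
As a small illustration of the convention, here is a sketch using a made-up helper; `toy_name_matches()` is hypothetical and is not the function touched by this commit.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: do two names of the expected length match? */
static bool toy_name_matches(const char *a, const char *b, size_t len)
{
    if (strlen(a) != len || strlen(b) != len)
        return false;   /* rather than "return 0;" */
    if (memcmp(a, b, len) != 0)
        return false;
    return true;        /* rather than "return 1;" */
}
```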

    Signed-off-by: Gustavo A. R. Silva
    Signed-off-by: Al Viro

    Gustavo A. R. Silva
     

04 Aug, 2018

1 commit

  • We don't want open-by-handle picking up a half-set-up in-core
    struct inode from e.g. mkdir() having failed halfway through.
    In other words, we don't want such inodes returned by iget_locked()
    on their way to extinction. However, we can't just have them
    unhashed - otherwise open-by-handle immediately *after* that would've
    ended up creating a new in-core inode over the on-disk one that
    is in the process of being freed right under us.

    Solution: new flag (I_CREATING) set by insert_inode_locked() and
    removed by unlock_new_inode() and a new primitive (discard_new_inode())
    to be used by such halfway-through-setup failure exits instead of
    unlock_new_inode() / iput() combinations. That primitive unlocks the
    new inode, but leaves I_CREATING in place.

    iget_locked() treats finding an I_CREATING inode as failure
    (-ESTALE, once we sort out the error propagation).
    insert_inode_locked() treats the same as instant -EBUSY.
    ilookup() treats those as an icache miss.
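
The flag protocol described above can be sketched as a toy user-space model. Everything here is an assumption made for illustration: the `toy_*` names, the `TOY_I_*` flag values and the fixed-size table are hypothetical, there is no locking or refcounting, and the toy lookup reports a plain miss as -ENOENT instead of allocating a new inode the way a real icache lookup would.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define TOY_I_NEW      0x1u   /* still locked by its creator */
#define TOY_I_CREATING 0x2u   /* set up not (or never) completed */
#define TOY_SLOTS      8

struct toy_inode {
    unsigned long ino;
    unsigned state;
};

static struct toy_inode *toy_icache[TOY_SLOTS];

static struct toy_inode *toy_find(unsigned long ino)
{
    for (size_t i = 0; i < TOY_SLOTS; i++)
        if (toy_icache[i] && toy_icache[i]->ino == ino)
            return toy_icache[i];
    return NULL;
}

/* Hash a fresh inode; it starts out both new and under creation. */
static int toy_insert_inode_locked(struct toy_inode *inode)
{
    if (toy_find(inode->ino))
        return -EBUSY;  /* toy: any existing entry, doomed or not, conflicts */
    for (size_t i = 0; i < TOY_SLOTS; i++) {
        if (!toy_icache[i]) {
            inode->state = TOY_I_NEW | TOY_I_CREATING;
            toy_icache[i] = inode;
            return 0;
        }
    }
    return -ENOMEM;
}

/* Successful set up: both flags go away and lookups may return the inode. */
static void toy_unlock_new_inode(struct toy_inode *inode)
{
    inode->state &= ~(TOY_I_NEW | TOY_I_CREATING);
}

/* Failure exit: unlock, but leave the inode marked as never completed. */
static void toy_discard_new_inode(struct toy_inode *inode)
{
    inode->state &= ~TOY_I_NEW;
}

/* Lookup that reports a doomed inode as a failure (-ESTALE). */
static int toy_iget_locked(unsigned long ino, struct toy_inode **out)
{
    struct toy_inode *inode = toy_find(ino);

    *out = NULL;
    if (!inode)
        return -ENOENT;
    if (inode->state & TOY_I_CREATING)
        return -ESTALE;
    *out = inode;
    return 0;
}

/* Lookup that treats a doomed inode as a plain miss. */
static struct toy_inode *toy_ilookup(unsigned long ino)
{
    struct toy_inode *inode = toy_find(ino);

    return (inode && !(inode->state & TOY_I_CREATING)) ? inode : NULL;
}
```

The point of keeping the doomed inode hashed in this model is that a second creator of the same inode number gets -EBUSY instead of building a new in-core object next to the one still being torn down.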

    [Fix by Dan Carpenter folded in]

    Signed-off-by: Al Viro

    Al Viro
     

02 Aug, 2018

1 commit

  • The only user is fuse_create_new_entry(), and there it's used to
    mitigate the same mkdir/open-by-handle race as in nfs_mkdir().
    The same solution applies - unhash the mkdir argument, then
    call d_splice_alias() and if that returns a reference to preexisting
    alias, dput() and report success. ->mkdir() argument left unhashed
    negative with the preexisting alias moved in the right place is just
    fine from the ->mkdir() callers' point of view.

    Cc: Miklos Szeredi
    Signed-off-by: Al Viro

    Al Viro
     

24 Jun, 2018

1 commit


05 Jun, 2018

1 commit

  • Pull misc vfs updates from Al Viro:
    "Misc bits and pieces not fitting into anything more specific"

    * 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
    vfs: delete unnecessary assignment in vfs_listxattr
    Documentation: filesystems: update filesystem locking documentation
    vfs: namei: use path_equal() in follow_dotdot()
    fs.h: fix outdated comment about file flags
    __inode_security_revalidate() never gets NULL opt_dentry
    make xattr_getsecurity() static
    vfat: simplify checks in vfat_lookup()
    get rid of dead code in d_find_alias()
    it's SB_BORN, not MS_BORN...
    msdos_rmdir(): kill BS comment
    remove rpc_rmdir()
    fs: avoid fdput() after failed fdget() in vfs_dedupe_file_range()

    Linus Torvalds
     

04 Jun, 2018

1 commit


14 May, 2018

1 commit

  • All "try disconnected alias if nothing else fits" logic in d_find_alias()
    got accidentally disabled by Neil a while ago; for most of the callers it
    was the right thing to do, so fixes belong in the few callers that *do*
    want disconnected aliases. This just takes the now-dead code in
    d_find_alias() out.

    Signed-off-by: Al Viro

    Al Viro
     

12 May, 2018

1 commit

  • For anything NFS-exported we do _not_ want to unlock a new inode
    before it has grown an alias; original set of fixes got the
    ordering right, but missed the nasty complication in case of
    lockdep being enabled - unlock_new_inode() does
    lockdep_annotate_inode_mutex_key(inode)
    which can only be done before anyone gets a chance to touch
    ->i_mutex. Unfortunately, flipping the order and doing
    unlock_new_inode() before d_instantiate() opens a window when
    mkdir can race with open-by-fhandle on a guessed fhandle, leading
    to multiple aliases for a directory inode and all the breakage
    that follows from that.

    Correct solution: a new primitive (d_instantiate_new())
    combining these two in the right order - lockdep annotate, then
    d_instantiate(), then the rest of unlock_new_inode(). All
    combinations of d_instantiate() with unlock_new_inode() should
    be converted to that.
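
The ordering argument can be sketched with a toy, single-threaded model in which the racing open-by-fhandle is simulated by a callback invoked between steps. All names here (`struct toy_dir_inode`, the `toy_*` functions) are hypothetical and only model the three steps named above; real lockdep annotation, locking and dentry handling are not represented.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a new directory inode; not the kernel's struct inode. */
struct toy_dir_inode {
    bool is_new;      /* models the inode still being locked as new */
    bool annotated;   /* models the lockdep annotation having been done */
    int aliases;      /* number of dentries attached to the inode */
};

typedef void (*toy_racer)(struct toy_dir_inode *);

/*
 * Simulated open-by-fhandle: it cannot get at an inode that is still new;
 * once the inode is unlocked and has no alias, it attaches one of its own.
 */
static void toy_open_by_handle(struct toy_dir_inode *i)
{
    if (!i->is_new && i->aliases == 0)
        i->aliases++;
}

static void toy_annotate(struct toy_dir_inode *i)      { i->annotated = true; }
static void toy_d_instantiate(struct toy_dir_inode *i) { i->aliases++; }
static void toy_unlock_rest(struct toy_dir_inode *i)   { i->is_new = false; }

/* Racy order: full unlock (annotate + unlock) first, instantiate second. */
static int toy_unlock_then_instantiate(struct toy_dir_inode *i, toy_racer r)
{
    toy_annotate(i);
    toy_unlock_rest(i);
    if (r)
        r(i);               /* window: unlocked, but no alias yet */
    toy_d_instantiate(i);
    return i->aliases;
}

/* Combined order: annotate, instantiate, then the rest of the unlock. */
static int toy_d_instantiate_new(struct toy_dir_inode *i, toy_racer r)
{
    toy_annotate(i);
    if (r)
        r(i);               /* inode still new: the racer cannot use it */
    toy_d_instantiate(i);
    toy_unlock_rest(i);
    if (r)
        r(i);               /* racer now finds the existing alias */
    return i->aliases;
}
```

In this model the racy order ends with two aliases for one directory inode, while the combined order ends with exactly one, and the annotation still happens before anything else can reach the inode.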

    Cc: stable@kernel.org # 2.6.29 and later
    Tested-by: Mike Marshall
    Reviewed-by: Andreas Dilger
    Signed-off-by: Al Viro

    Al Viro
     

20 Apr, 2018

1 commit


16 Apr, 2018

3 commits