18 Oct, 2020

1 commit

  • A previous commit changed the notification mode from true/false to an
    int, allowing notify-no, notify-yes, or signal-notify. This was
    backwards compatible in the sense that any existing true/false user
    would translate to either 0 (no notification sent) or 1, the latter of
    which mapped to TWA_RESUME. TWA_SIGNAL was assigned a value of 2.

    Clean this up properly, and define a proper enum for the notification
    mode. Now we have:

    - TWA_NONE. This is 0, same as before the original change, meaning no
    notification requested.
    - TWA_RESUME. This is 1, same as before the original change, meaning
    that we use TIF_NOTIFY_RESUME.
    - TWA_SIGNAL. This uses TIF_SIGPENDING/JOBCTL_TASK_WORK for the
    notification.

    Clean up all the callers, switching their 0/1/false/true to using the
    appropriate TWA_* mode for notifications.
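
    The mode values described above can be sketched as a plain C enum. This is
    a userspace illustration only: the enum name and the helper
    twa_from_legacy_bool() are assumptions made for the example, not names
    taken from the commit.

```c
#include <assert.h>
#include <stdbool.h>

/* Notification modes as listed in the commit message. */
enum task_work_notify_mode {
	TWA_NONE,	/* 0: no notification requested */
	TWA_RESUME,	/* 1: notify via TIF_NOTIFY_RESUME */
	TWA_SIGNAL,	/* 2: notify via TIF_SIGPENDING/JOBCTL_TASK_WORK */
};

/* The old true/false callers relied on exactly these numeric values. */
static_assert(TWA_NONE == 0 && TWA_RESUME == 1 && TWA_SIGNAL == 2,
	      "TWA_* values must stay compatible with the legacy bool/int");

/* Hypothetical helper: what a legacy true/false argument translates to. */
static inline enum task_work_notify_mode twa_from_legacy_bool(bool notify)
{
	return notify ? TWA_RESUME : TWA_NONE;
}
```

    With named modes a caller that previously passed "true" reads as
    TWA_RESUME at the call site, which is the cleanup the commit describes.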

    Fixes: e91b48162332 ("task_work: teach task_work_add() to do signal_wake_up()")
    Reviewed-by: Thomas Gleixner
    Signed-off-by: Jens Axboe

    Jens Axboe
     

30 Jun, 2020

1 commit

  • This reverts commit e9c15badbb7b ("fs: Do not check if there is a
    fsnotify watcher on pseudo inodes"). The commit intended to eliminate
    fsnotify-related overhead for pseudo inodes but it is broken in
    concept. inotify can receive events of pipe files under /proc/X/fd and
    chromium relies on close and open events for sandboxing. Maxim Levitsky
    reported the following:

    Chromium starts as a white rectangle, shows few white rectangles that
    resemble its notifications and then crashes.

    The stdout output from chromium:

    [mlevitsk@starship ~]$chromium-freeworld
    mesa: for the --simplifycfg-sink-common option: may only occur zero or one times!
    mesa: for the --global-isel-abort option: may only occur zero or one times!
    [3379:3379:0628/135151.440930:ERROR:browser_switcher_service.cc(238)] XXX Init()
    ../../sandbox/linux/seccomp-bpf-helpers/sigsys_handlers.cc:**CRASHING**:seccomp-bpf failure in syscall 0072
    Received signal 11 SEGV_MAPERR 0000004a9048

    Crashes are not universal but even if chromium does not crash, it certainly
    does not work properly. While filtering just modify and access might be
    safe, the benefit is not worth the risk hence the revert.

    Reported-by: Maxim Levitsky
    Fixes: e9c15badbb7b ("fs: Do not check if there is a fsnotify watcher on pseudo inodes")
    Signed-off-by: Mel Gorman
    Signed-off-by: Linus Torvalds

    Mel Gorman
     

16 Jun, 2020

1 commit

  • The kernel uses internal mounts created by kern_mount() and populated
    with files with no lookup path by alloc_file_pseudo() for a variety of
    reasons. An example of such a mount is for anonymous pipes. For pipes,
    every vfs_write(), regardless of filesystem, calls fsnotify_modify()
    to notify of any changes, which incurs a small amount of overhead in
    fsnotify even when there are no watchers. It can also trigger for reads,
    readv and writev; it was simply vfs_write() that was noticed first.

    A patch is pending that reduces, but does not eliminate, the overhead of
    fsnotify but for files that cannot be looked up via a path, even that
    small overhead is unnecessary. The user API for all notification
    subsystems (inotify, fanotify, ...) is based on the pathname and a dirfd
    and proc entries appear to be the only visible representation of the
    files. Proc does not have the same pathname as the internal entry and
    the proc inode is not the same as the internal inode so even if fanotify
    is used on a file under /proc/XX/fd, no useful events are notified.

    This patch changes alloc_file_pseudo() to always opt out of fsnotify by
    setting FMODE_NONOTIFY flag so that no check is made for fsnotify
    watchers on pseudo files. This should be safe as the underlying helper
    for the dentry is d_alloc_pseudo() which explicitly states that no
    lookups are ever performed meaning that fanotify should have nothing
    useful to attach to.
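
    A toy model of the opt-out may make the mechanism easier to follow. All
    names and the flag value below are hypothetical stand-ins (the kernel's
    real FMODE_NONOTIFY value and fsnotify hooks are not reproduced); the
    sketch only shows how a per-file flag set at allocation time lets the
    write path skip the watcher check entirely.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative bit only, not the kernel's FMODE_NONOTIFY value. */
#define TOY_FMODE_NONOTIFY (1u << 0)

struct toy_file {
	unsigned int f_mode;
	unsigned int watcher_checks;	/* times we looked for watchers */
};

/* Stand-in for alloc_file_pseudo(): opt out of notification up front. */
static inline void toy_alloc_file_pseudo(struct toy_file *f)
{
	f->f_mode = TOY_FMODE_NONOTIFY;
	f->watcher_checks = 0;
}

/*
 * Stand-in for the notification hook on the write path.
 * Returns true if a watcher lookup was performed.
 */
static inline bool toy_fsnotify_modify(struct toy_file *f)
{
	if (f->f_mode & TOY_FMODE_NONOTIFY)
		return false;		/* skipped: no watcher check at all */
	f->watcher_checks++;
	return true;
}
```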

    The test motivating this was "perf bench sched messaging --pipe". On
    a single-socket machine using threads the difference of the patch was
    as follows.

                                5.7.0                 5.7.0
                              vanilla       nofsnotify-v1r1
    Amean     1     1.3837 (   0.00%)     1.3547 (   2.10%)
    Amean     3     3.7360 (   0.00%)     3.6543 (   2.19%)
    Amean     5     5.8130 (   0.00%)     5.7233 *   1.54%*
    Amean     7     8.1490 (   0.00%)     7.9730 *   2.16%*
    Amean     12   14.6843 (   0.00%)    14.1820 (   3.42%)
    Amean     18   21.8840 (   0.00%)    21.7460 (   0.63%)
    Amean     24   28.8697 (   0.00%)    29.1680 (  -1.03%)
    Amean     30   36.0787 (   0.00%)    35.2640 *   2.26%*
    Amean     32   38.0527 (   0.00%)    38.1223 (  -0.18%)

    The difference is small but in some cases it's outside the noise so
    while marginal, there is still some small benefit to ignoring fsnotify
    for files allocated via alloc_file_pseudo() in some cases.
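
    The percentages in the table appear to be the relative reduction of the
    patched mean against the vanilla mean. A small sketch of that arithmetic,
    assuming lower Amean is better (pct_gain() is a name chosen for this
    example):

```c
#include <assert.h>

/*
 * Relative improvement of `patched` over `baseline`, in percent.
 * Positive means the patched run took less time.
 */
static inline double pct_gain(double baseline, double patched)
{
	assert(baseline > 0.0);
	return (baseline - patched) / baseline * 100.0;
}
```

    For the first row, (1.3837 - 1.3547) / 1.3837 is roughly 2.10%; for the
    24-group row the patched mean is higher, so the result is negative.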

    Link: https://lore.kernel.org/r/20200615121358.GF3183@techsingularity.net
    Signed-off-by: Mel Gorman
    Reviewed-by: Amir Goldstein
    Signed-off-by: Jan Kara

    Mel Gorman
     

04 Jun, 2020

1 commit

  • Pull networking updates from David Miller:

    1) Allow setting bluetooth L2CAP modes via socket option, from Luiz
    Augusto von Dentz.

    2) Add GSO partial support to igc, from Sasha Neftin.

    3) Several cleanups and improvements to r8169 from Heiner Kallweit.

    4) Add IF_OPER_TESTING link state and use it when ethtool triggers a
    device self-test. From Andrew Lunn.

    5) Start moving away from custom driver versions, use the globally
    defined kernel version instead, from Leon Romanovsky.

    6) Support GRO via gro_cells in DSA layer, from Alexander Lobakin.

    7) Allow hard IRQ deferral during NAPI, from Eric Dumazet.

    8) Add sriov and vf support to hinic, from Luo bin.

    9) Support Media Redundancy Protocol (MRP) in the bridging code, from
    Horatiu Vultur.

    10) Support netmap in the nft_nat code, from Pablo Neira Ayuso.

    11) Allow UDPv6 encapsulation of ESP in the ipsec code, from Sabrina
    Dubroca. Also add ipv6 support for espintcp.

    12) Lots of ReST conversions of the networking documentation, from Mauro
    Carvalho Chehab.

    13) Support configuration of ethtool rxnfc flows in bcmgenet driver,
    from Doug Berger.

    14) Allow to dump cgroup id and filter by it in inet_diag code, from
    Dmitry Yakunin.

    15) Add infrastructure to export netlink attribute policies to
    userspace, from Johannes Berg.

    16) Several optimizations to sch_fq scheduler, from Eric Dumazet.

    17) Fallback to the default qdisc if qdisc init fails because otherwise
    a packet scheduler init failure will make a device inoperative. From
    Jesper Dangaard Brouer.

    18) Several RISCV bpf jit optimizations, from Luke Nelson.

    19) Correct the return type of the ->ndo_start_xmit() method in several
    drivers, it's netdev_tx_t but many drivers were using
    'int'. From Yunjian Wang.

    20) Add an ethtool interface for PHY master/slave config, from Oleksij
    Rempel.

    21) Add BPF iterators, from Yonghong Song.

    22) Add cable test infrastructure, including ethtool interfaces, from
    Andrew Lunn. Marvell PHY driver is the first to support this
    facility.

    23) Remove zero-length arrays all over, from Gustavo A. R. Silva.

    24) Calculate and maintain an explicit frame size in XDP, from Jesper
    Dangaard Brouer.

    25) Add CAP_BPF, from Alexei Starovoitov.

    26) Support terse dumps in the packet scheduler, from Vlad Buslov.

    27) Support XDP_TX bulking in dpaa2 driver, from Ioana Ciornei.

    28) Add devm_register_netdev(), from Bartosz Golaszewski.

    29) Minimize qdisc resets, from Cong Wang.

    30) Get rid of kernel_getsockopt and kernel_setsockopt in order to
    eliminate set_fs/get_fs calls. From Christoph Hellwig.

    * git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (2517 commits)
    selftests: net: ip_defrag: ignore EPERM
    net_failover: fixed rollback in net_failover_open()
    Revert "tipc: Fix potential tipc_aead refcnt leak in tipc_crypto_rcv"
    Revert "tipc: Fix potential tipc_node refcnt leak in tipc_rcv"
    vmxnet3: allow rx flow hash ops only when rss is enabled
    hinic: add set_channels ethtool_ops support
    selftests/bpf: Add a default $(CXX) value
    tools/bpf: Don't use $(COMPILE.c)
    bpf, selftests: Use bpf_probe_read_kernel
    s390/bpf: Use bcr 0,%0 as tail call nop filler
    s390/bpf: Maintain 8-byte stack alignment
    selftests/bpf: Fix verifier test
    selftests/bpf: Fix sample_cnt shared between two threads
    bpf, selftests: Adapt cls_redirect to call csum_level helper
    bpf: Add csum_level helper for fixing up csum levels
    bpf: Fix up bpf_skb_adjust_room helper's skb csum setting
    sfc: add missing annotation for efx_ef10_try_update_nic_stats_vf()
    crypto/chtls: IPv6 support for inline TLS
    Crypto/chcr: Fixes a coccinile check error
    Crypto/chcr: Fixes compilations warnings
    ...

    Linus Torvalds
     

03 Jun, 2020

1 commit

  • Patch series "vfs: have syncfs() return error when there are writeback
    errors", v6.

    Currently, syncfs does not return errors when one of the inodes fails to
    be written back. It will return errors based on the legacy AS_EIO and
    AS_ENOSPC flags when syncing out the block device fails, but that's not
    particularly helpful for filesystems that aren't backed by a blockdev.
    It's also possible for a stray sync to lose those errors.

    The basic idea in this set is to track writeback errors at the
    superblock level, so that we can quickly and easily check whether
    something bad happened without having to fsync each file individually.
    syncfs is then changed to reliably report writeback errors after they
    occur, much in the same fashion as fsync does now.

    This patch (of 2):

    Usually we suggest that applications call fsync when they want to ensure
    that all data written to the file has made it to the backing store, but
    that can be inefficient when there are a lot of open files.

    Calling syncfs on the filesystem can be more efficient in some
    situations, but the error reporting doesn't currently work the way most
    people expect. If a single inode on a filesystem reports a writeback
    error, syncfs won't necessarily return an error. syncfs only returns an
    error if __sync_blockdev fails, and on some filesystems that's a no-op.

    It would be better if syncfs reported an error if there were any
    writeback failures. Then applications could call syncfs to see if there
    are any errors on any open files, and could then call fsync on all of
    the other descriptors to figure out which one failed.
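
    The application-side flow suggested above could look roughly like the
    following. It uses the real syncfs(2) and fsync(2) calls; the function
    name and the assumption that all descriptors live on the same filesystem
    are illustrative choices, not part of the patch.

```c
#define _GNU_SOURCE	/* for syncfs() in <unistd.h> */
#include <assert.h>
#include <stddef.h>
#include <unistd.h>

/*
 * Flush the filesystem containing fds[0]. If that reports an error,
 * fsync each descriptor to find out which ones failed.
 * Assumes every descriptor is on the same filesystem.
 * Returns the number of descriptors stored in failed[] (0 means clean).
 */
static inline size_t sync_and_find_failures(const int *fds, size_t n,
					    int *failed)
{
	size_t nfailed = 0;

	if (n == 0)
		return 0;
	assert(fds != NULL && failed != NULL);

	if (syncfs(fds[0]) == 0)
		return 0;	/* no writeback error reported */

	for (size_t i = 0; i < n; i++) {
		if (fsync(fds[i]) != 0)
			failed[nfailed++] = fds[i];
	}
	return nfailed;
}
```

    The point of the design is that the common, error-free case costs one
    syncfs call instead of one fsync per open file.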

    This patch adds a new errseq_t to struct super_block, and has
    mapping_set_error also record writeback errors there.

    To report those errors, we also need to keep an errseq_t in struct file
    to act as a cursor. This patch adds a dedicated field for that purpose,
    which slots nicely into 4 bytes of padding at the end of struct file on
    x86_64.

    An earlier version of this patch used an O_PATH file descriptor to cue
    the kernel that the open file should track the superblock error and not
    the inode's writeback error.

    I think that API is just too weird though. This is simpler and should
    make syncfs error reporting "just work" even if someone is multiplexing
    fsync and syncfs on the same fds.

    Signed-off-by: Jeff Layton
    Signed-off-by: Andrew Morton
    Reviewed-by: Jan Kara
    Cc: Andres Freund
    Cc: Matthew Wilcox
    Cc: Al Viro
    Cc: Christoph Hellwig
    Cc: Dave Chinner
    Cc: David Howells
    Link: http://lkml.kernel.org/r/20200428135155.19223-1-jlayton@kernel.org
    Link: http://lkml.kernel.org/r/20200428135155.19223-2-jlayton@kernel.org
    Signed-off-by: Linus Torvalds

    Jeff Layton
     

27 Apr, 2020

1 commit

  • Instead of having all the sysctl handlers deal with user pointers, which
    is rather hairy in terms of the BPF interaction, copy the input to and
    from userspace in common code. This also means that the strings are
    always NUL-terminated by the common code, making the API a little bit
    safer.
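
    As a rough userspace analogy, the common code makes a private,
    NUL-terminated copy and only then calls the handler. Everything here is
    hypothetical: memcpy() stands in for the copy from userspace, and the
    toy_* names do not correspond to kernel functions.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical handler type: always sees a terminated private buffer. */
typedef int (*toy_handler_t)(char *buf, size_t len);

/* Common entry point: copy, terminate, dispatch, release. */
static inline int toy_sysctl_write(const char *ubuf, size_t len,
				   toy_handler_t handler)
{
	char *kbuf;
	int ret;

	assert(ubuf != NULL && handler != NULL);
	kbuf = malloc(len + 1);
	if (!kbuf)
		return -1;
	memcpy(kbuf, ubuf, len);	/* stand-in for the user copy */
	kbuf[len] = '\0';		/* handlers may rely on this */
	ret = handler(kbuf, len);
	free(kbuf);
	return ret;
}

/* Example handler: safe to parse because buf is NUL-terminated. */
static inline int toy_parse_int(char *buf, size_t len)
{
	(void)len;
	return atoi(buf);
}
```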

    As most handlers just pass through the data to one of the common handlers,
    a lot of the changes are mechanical.

    Signed-off-by: Christoph Hellwig
    Acked-by: Andrey Ignatov
    Signed-off-by: Al Viro

    Christoph Hellwig
     

19 Aug, 2019

1 commit


21 May, 2019

1 commit

  • Add SPDX license identifiers to all files which:

    - Have no license information of any form

    - Have EXPORT_.*_SYMBOL_GPL inside which was used in the
    initial scan/conversion to ignore the file

    These files fall under the project license, GPL v2 only. The resulting SPDX
    license identifier is:

    GPL-2.0-only

    Signed-off-by: Thomas Gleixner
    Signed-off-by: Greg Kroah-Hartman

    Thomas Gleixner
     

21 Mar, 2019

1 commit

  • open_tree(dfd, pathname, flags)

    Returns an O_PATH-opened file descriptor or an error.
    dfd and pathname specify the location to open, in usual
    fashion (see e.g. fstatat(2)). flags should be an OR of
    some of the following:
    * AT_EMPTY_PATH, AT_NO_AUTOMOUNT, AT_SYMLINK_NOFOLLOW -
    same meanings as usual
    * OPEN_TREE_CLOEXEC - make the resulting descriptor
    close-on-exec
    * OPEN_TREE_CLONE or OPEN_TREE_CLONE | AT_RECURSIVE -
    instead of opening the location in question, create a detached
    mount tree matching the subtree rooted at location specified by
    dfd/pathname. With AT_RECURSIVE the entire subtree is cloned,
    without it - only the part within the mount containing the
    location in question. In other words, the same as mount --rbind
    or mount --bind would've taken. The detached tree will be
    dissolved on the final close of obtained file. Creation of such
    detached trees requires the same capabilities as doing mount --bind.
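
    A minimal way to try the call from userspace is to go through the raw
    syscall. The wrapper name toy_open_tree() is made up for this sketch, and
    the fallback constant values are assumptions taken from the Linux UAPI
    headers for systems whose libc headers predate the syscall.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Fallbacks for older headers (assumed to match the UAPI definitions). */
#ifndef OPEN_TREE_CLONE
#define OPEN_TREE_CLONE 1
#endif
#ifndef OPEN_TREE_CLOEXEC
#define OPEN_TREE_CLOEXEC O_CLOEXEC
#endif
#ifndef AT_RECURSIVE
#define AT_RECURSIVE 0x8000
#endif

/* Thin wrapper: returns an O_PATH-style fd, or -1 with errno set. */
static inline int toy_open_tree(int dfd, const char *pathname,
				unsigned int flags)
{
#ifdef SYS_open_tree
	return (int)syscall(SYS_open_tree, dfd, pathname, flags);
#else
	(void)dfd; (void)pathname; (void)flags;
	errno = ENOSYS;		/* headers too old to know the syscall */
	return -1;
#endif
}
```

    Passing OPEN_TREE_CLONE | AT_RECURSIVE would request a detached copy of
    the whole subtree, which needs the same privileges as mount --rbind;
    without OPEN_TREE_CLONE the call simply opens the location.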

    Signed-off-by: Al Viro
    Signed-off-by: David Howells
    cc: linux-api@vger.kernel.org
    Signed-off-by: Al Viro

    Al Viro
     

28 Feb, 2019

1 commit

  • Some use cases repeatedly get and put references to the same file, but
    the only exposed interface does these one at a time. As each of
    these entails an atomic inc or dec on a shared structure, that cost can
    add up.

    Add fget_many(), which works just like fget(), except it takes an
    argument for how many references to get on the file. Ditto fput_many(),
    which can drop an arbitrary number of references to a file.
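
    The saving comes from replacing N atomic operations with one. A userspace
    sketch with C11 atomics follows; the toy_* names are hypothetical, and
    the rule that a zero count cannot be revived is an assumption about
    typical refcount behaviour rather than something stated in the commit.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

struct toy_ref_file {
	atomic_long f_count;
};

/* Take `refs` references with one atomic update; fail if already dead. */
static inline bool toy_get_many(struct toy_ref_file *f, long refs)
{
	long old = atomic_load(&f->f_count);

	assert(refs > 0);
	do {
		if (old == 0)
			return false;
	} while (!atomic_compare_exchange_weak(&f->f_count, &old, old + refs));
	return true;
}

/* Drop `refs` references at once; true when the last one went away. */
static inline bool toy_put_many(struct toy_ref_file *f, long refs)
{
	assert(refs > 0);
	return atomic_fetch_sub(&f->f_count, refs) == refs;
}
```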

    Reviewed-by: Hannes Reinecke
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Jens Axboe

    Jens Axboe
     

29 Dec, 2018

2 commits

  • totalram_pages and totalhigh_pages are made static inline functions.

    The main motivation was that managed_page_count_lock handling was
    complicating things. It was discussed at length here:
    https://lore.kernel.org/patchwork/patch/995739/#1181785
    It seems better to remove the lock and convert the variables to atomic,
    with prevention of potential store-to-read tearing as a bonus.

    [akpm@linux-foundation.org: coding style fixes]
    Link: http://lkml.kernel.org/r/1542090790-21750-4-git-send-email-arunks@codeaurora.org
    Signed-off-by: Arun KS
    Suggested-by: Michal Hocko
    Suggested-by: Vlastimil Babka
    Reviewed-by: Konstantin Khlebnikov
    Reviewed-by: Pavel Tatashin
    Acked-by: Michal Hocko
    Acked-by: Vlastimil Babka
    Cc: David Hildenbrand
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Arun KS
     
  • Patch series "mm: convert totalram_pages, totalhigh_pages and managed
    pages to atomic", v5.

    This series converts totalram_pages, totalhigh_pages and
    zone->managed_pages to atomic variables.

    totalram_pages, zone->managed_pages and totalhigh_pages updates are
    protected by managed_page_count_lock, but readers never care about it.
    Convert these variables to atomic to avoid readers potentially seeing a
    store tear.

    The main motivation was that managed_page_count_lock handling was
    complicating things. It was discussed at length here:
    https://lore.kernel.org/patchwork/patch/995739/#1181785
    It seems better to remove the lock and convert the variables to atomic.
    With the change, preventing potential store-to-read tearing comes as a
    bonus.

    This patch (of 4):

    This is in preparation for a later patch which converts totalram_pages and
    zone->managed_pages to atomic variables. Please note that re-reading the
    value might lead to a different value and as such it could lead to
    unexpected behavior. There are no known bugs as a result of the current
    code but it is better to prevent them in principle.
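
    The read-once idea can be shown in userspace with C11 atomics. The names
    below are hypothetical analogues, not the kernel's accessors: the counter
    is hidden behind an inline reader, and a caller takes one snapshot and
    uses it for both the computation and the comparison.

```c
#include <assert.h>
#include <stdatomic.h>

static atomic_ulong toy_totalram_counter;

/* Accessor: readers never touch the variable directly. */
static inline unsigned long toy_totalram_pages(void)
{
	return atomic_load(&toy_totalram_counter);
}

/* Writers adjust the counter atomically instead of taking a lock. */
static inline void toy_totalram_pages_add(long delta)
{
	atomic_fetch_add(&toy_totalram_counter, (unsigned long)delta);
}

/*
 * Read once: the value may change between two reads, so take a single
 * snapshot and derive everything from it.
 */
static inline unsigned long toy_tenth_of_ram_clamped(unsigned long limit)
{
	unsigned long pages = toy_totalram_pages();	/* single read */
	unsigned long tenth = pages / 10;

	return tenth > limit ? limit : tenth;
}
```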

    Link: http://lkml.kernel.org/r/1542090790-21750-2-git-send-email-arunks@codeaurora.org
    Signed-off-by: Arun KS
    Reviewed-by: Konstantin Khlebnikov
    Reviewed-by: David Hildenbrand
    Acked-by: Michal Hocko
    Acked-by: Vlastimil Babka
    Reviewed-by: Pavel Tatashin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Arun KS
     

22 Aug, 2018

1 commit

  • Pull overlayfs updates from Miklos Szeredi:
    "This contains two new features:

    - Stack file operations: this allows removal of several hacks from
    the VFS, proper interaction of read-only open files with copy-up,
    possibility to implement fs modifying ioctls properly, and others.

    - Metadata only copy-up: when file is on lower layer and only
    metadata is modified (except size) then only copy up the metadata
    and continue to use the data from the lower file"

    * tag 'ovl-update-4.19' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs: (66 commits)
    ovl: Enable metadata only feature
    ovl: Do not do metacopy only for ioctl modifying file attr
    ovl: Do not do metadata only copy-up for truncate operation
    ovl: add helper to force data copy-up
    ovl: Check redirect on index as well
    ovl: Set redirect on upper inode when it is linked
    ovl: Set redirect on metacopy files upon rename
    ovl: Do not set dentry type ORIGIN for broken hardlinks
    ovl: Add an inode flag OVL_CONST_INO
    ovl: Treat metacopy dentries as type OVL_PATH_MERGE
    ovl: Check redirects for metacopy files
    ovl: Move some dir related ovl_lookup_single() code in else block
    ovl: Do not expose metacopy only dentry from d_real()
    ovl: Open file with data except for the case of fsync
    ovl: Add helper ovl_inode_realdata()
    ovl: Store lower data inode in ovl_inode
    ovl: Fix ovl_getattr() to get number of blocks from lower
    ovl: Add helper ovl_dentry_lowerdata() to get lower data dentry
    ovl: Copy up meta inode data from lowest data inode
    ovl: Modify ovl_lookup() and friends to lookup metacopy dentry
    ...

    Linus Torvalds
     

18 Jul, 2018

1 commit

  • Stacking file operations in overlay will store an extra open file for each
    overlay file opened.

    The overhead is just that of "struct file" which is about 256 bytes, because
    overlay already pins an extra dentry and inode when the file is open, which
    add up to a much larger overhead.

    For fear of breaking working setups, don't start accounting the extra file.

    Signed-off-by: Miklos Szeredi

    Miklos Szeredi
     

12 Jul, 2018

8 commits


11 Jul, 2018

1 commit


08 Dec, 2017

1 commit

  • Preempt counter APIs have been split out, currently, hardirq.h just
    includes irq_enter/exit APIs which are not used by vfs at all.

    So, remove the unused hardirq.h.

    Signed-off-by: Yang Shi
    Cc: Alexander Viro
    Signed-off-by: Al Viro

    Yang Shi
     

16 Nov, 2017

1 commit

  • The allocations from filp cache can be directly triggered by userspace
    applications. A buggy application can consume a significant amount of
    unaccounted system memory. Though we have not noticed such buggy
    applications in production, upon close inspection we found that
    a lot of machines spend a very significant amount of memory on these
    caches.

    One way to limit allocations from filp cache is to set a system-level
    limit on the maximum number of open files. However, this limit is shared
    between different users on the system and one user can hog this
    resource. To cater for that, we can charge filp to kmemcg, set the
    maximum limit very high, and let the memory limit of each user limit the
    number of files they can open, indirectly limiting their allocations
    from filp cache.

    One side effect of this change is that it will allow _sysctl() to return
    ENOMEM and the man page of _sysctl() does not specify that. However, the
    man page also discourages using _sysctl() at all.

    Link: http://lkml.kernel.org/r/20171011190359.34926-1-shakeelb@google.com
    Signed-off-by: Shakeel Butt
    Cc: Alexander Viro
    Cc: Vladimir Davydov
    Cc: Michal Hocko
    Cc: Greg Thelen
    Cc: Johannes Weiner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Shakeel Butt
     

09 Nov, 2017

1 commit

  • The file hash is calculated and written out as an xattr after
    calling fasync(). In order for the file data and metadata to be
    written out to disk at the same time, this patch calculates the
    file hash and stores it as an xattr before calling fasync.

    Signed-off-by: Mimi Zohar

    Mimi Zohar
     

28 Aug, 2017

1 commit


06 Jul, 2017

1 commit

  • Most filesystems currently use mapping_set_error and
    filemap_check_errors for setting and reporting/clearing writeback errors
    at the mapping level. filemap_check_errors is indirectly called from
    most of the filemap_fdatawait_* functions and from
    filemap_write_and_wait*. These functions are called from all sorts of
    contexts to wait on writeback to finish -- e.g. mostly in fsync, but
    also in truncate calls, getattr, etc.

    The non-fsync callers are problematic. We should be reporting writeback
    errors during fsync, but many places spread over the tree clear out
    errors before they can be properly reported, or report errors at
    nonsensical times.

    If I get -EIO on a stat() call, there is no reason for me to assume that
    it is because some previous writeback failed. The fact that it also
    clears out the error such that a subsequent fsync returns 0 is a bug,
    and a nasty one since that's potentially silent data corruption.

    This patch adds a small bit of new infrastructure for setting and
    reporting errors during address_space writeback. While the above was my
    original impetus for adding this, I think it's also the case that
    current fsync semantics are just problematic for userland. Most
    applications that call fsync do so to ensure that the data they wrote
    has hit the backing store.

    In the case where there are multiple writers to the file at the same
    time, this is really hard to determine. The first one to call fsync will
    see any stored error, and the rest get back 0. The processes with open
    fds may not be associated with one another in any way. They could even
    be in different containers, so ensuring coordination between all fsync
    callers is not really an option.

    One way to remedy this would be to track what file descriptor was used
    to dirty the file, but that's rather cumbersome and would likely be
    slow. However, there is a simpler way to improve the semantics here
    without incurring too much overhead.

    This set adds an errseq_t to struct address_space, and a corresponding
    one is added to struct file. Writeback errors are recorded in the
    mapping's errseq_t, and the one in struct file is used as the "since"
    value.

    This changes the semantics of the Linux fsync implementation such that
    applications can now use it to determine whether there were any
    writeback errors since fsync(fd) was last called (or since the file was
    opened in the case of fsync having never been called).
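
    A deliberately simplified model of the "since" cursor is sketched below.
    It is not the kernel's errseq_t layout or API: the toy_* names are
    invented, and the error and sequence are kept in separate fields purely
    for readability.

```c
#include <assert.h>

/* Simplified stand-in for the mapping's error state. */
struct toy_errseq {
	int err;		/* most recent error (negative), 0 if none */
	unsigned int seq;	/* bumped each time an error is recorded */
};

/* Writeback side: record an error against the mapping. */
static inline void toy_errseq_set(struct toy_errseq *mapping, int err)
{
	assert(err < 0);
	mapping->err = err;
	mapping->seq++;
}

/* Open side: sample the current state as this file's cursor. */
static inline unsigned int toy_errseq_sample(const struct toy_errseq *mapping)
{
	return mapping->seq;
}

/* fsync side: report an error recorded since *since, then advance. */
static inline int toy_errseq_check_and_advance(const struct toy_errseq *mapping,
						unsigned int *since)
{
	if (*since == mapping->seq)
		return 0;
	*since = mapping->seq;
	return mapping->err;
}
```

    Because each open file carries its own cursor, two independent fsync
    callers both see the same error once, instead of only the first caller
    seeing it.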

    Note that those writeback errors may have occurred when writing data
    that was dirtied via an entirely different fd, but that's the case now
    with the current mapping_set_error/filemap_check_errors infrastructure.
    This will at least prevent you from getting a false report of success.

    The new behavior is still consistent with the POSIX spec, and is more
    reliable for application developers. This patch just adds some basic
    infrastructure for doing this, and ensures that the f_wb_err "cursor"
    is properly set when a file is opened. Later patches will change the
    existing code to use this new infrastructure for reporting errors at
    fsync time.

    Signed-off-by: Jeff Layton
    Reviewed-by: Jan Kara

    Jeff Layton
     

02 Mar, 2017

1 commit


06 Dec, 2016

1 commit


07 Aug, 2015

1 commit

  • Dave Hansen reported the following:

    My laptop has been behaving strangely with 4.2-rc2. Once I log
    in to my X session, I start getting all kinds of strange errors
    from applications and see this in my dmesg:

    VFS: file-max limit 8192 reached

    The problem is that the file-max is calculated before memory is fully
    initialised and miscalculates how much memory the kernel is using. This
    patch recalculates file-max after deferred memory initialisation. Note
    that using memory hotplug infrastructure would not have avoided this
    problem as the value is not recalculated after memory hot-add.

    4.1: files_stat.max_files = 6582781
    4.2-rc2: files_stat.max_files = 8192
    4.2-rc2 patched: files_stat.max_files = 6562467

    Small differences with the patch applied and 4.1 but not enough to matter.
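
    To see why an early calculation can collapse to 8192, consider a sketch
    of a file-max heuristic. The formula below is an assumption for
    illustration (roughly 10% of non-reserved memory at about 1 KiB per
    file, with a floor of 8192); it is not quoted from the patch, and the
    toy_* names are hypothetical.

```c
#include <assert.h>

#define TOY_NR_FILE 8192UL	/* floor, as in the dmesg line above */

/*
 * Assumed heuristic: reserve 1.5x the memory already in use, then allow
 * about one file per 10 KiB of what remains, never going below the floor.
 */
static inline unsigned long toy_max_files(unsigned long total_pages,
					  unsigned long free_pages,
					  unsigned long page_size)
{
	unsigned long used, reserve, n;

	assert(total_pages > 0 && free_pages <= total_pages);
	used = total_pages - free_pages;
	reserve = used * 3 / 2;
	if (reserve > total_pages - 1)
		reserve = total_pages - 1;
	n = (total_pages - reserve) * (page_size / 1024) / 10;
	return n > TOY_NR_FILE ? n : TOY_NR_FILE;
}
```

    If the inputs are sampled before most memory has been initialised, the
    computed value falls below the floor and the limit sticks at 8192 until
    it is recalculated.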

    Signed-off-by: Mel Gorman
    Reported-by: Dave Hansen
    Cc: Nicolai Stange
    Cc: Dave Hansen
    Cc: Alex Ng
    Cc: Fengguang Wu
    Cc: Peter Zijlstra (Intel)
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mel Gorman
     

24 Jun, 2015

1 commit


12 Apr, 2015

1 commit


13 Oct, 2014

2 commits

  • Pull vfs updates from Al Viro:
    "The big thing in this pile is Eric's unmount-on-rmdir series; we
    finally have everything we need for that. The final piece of prereqs
    is delayed mntput() - now filesystem shutdown always happens on
    shallow stack.

    Other than that, we have several new primitives for iov_iter (Matt
    Wilcox, culled from his XIP-related series) pushing the conversion to
    ->read_iter()/ ->write_iter() a bit more, a bunch of fs/dcache.c
    cleanups and fixes (including the external name refcounting, which
    gives consistent behaviour of d_move() wrt procfs symlinks for long
    and short names alike) and assorted cleanups and fixes all over the
    place.

    This is just the first pile; there's a lot of stuff from various
    people that ought to go in this window. Starting with
    unionmount/overlayfs mess... ;-/"

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (60 commits)
    fs/file_table.c: Update alloc_file() comment
    vfs: Deduplicate code shared by xattr system calls operating on paths
    reiserfs: remove pointless forward declaration of struct nameidata
    don't need that forward declaration of struct nameidata in dcache.h anymore
    take dname_external() into fs/dcache.c
    let path_init() failures treated the same way as subsequent link_path_walk()
    fix misuses of f_count() in ppp and netlink
    ncpfs: use list_for_each_entry() for d_subdirs walk
    vfs: move getname() from callers to do_mount()
    gfs2_atomic_open(): skip lookups on hashed dentry
    [infiniband] remove pointless assignments
    gadgetfs: saner API for gadgetfs_create_file()
    f_fs: saner API for ffs_sb_create_file()
    jfs: don't hash direct inode
    [s390] remove pointless assignment of ->f_op in vmlogrdr ->open()
    ecryptfs: ->f_op is never NULL
    android: ->f_op is never NULL
    nouveau: __iomem misannotations
    missing annotation in fs/file.c
    fs: namespace: suppress 'may be used uninitialized' warnings
    ...

    Linus Torvalds
     
  • This comment is 5 years outdated; init_file() no longer exists.

    Signed-off-by: Eric Biggers
    Signed-off-by: Al Viro

    Eric Biggers
     

08 Sep, 2014

1 commit

  • Percpu allocator now supports allocation mask. Add @gfp to
    percpu_counter_init() so that !GFP_KERNEL allocation masks can be used
    with percpu_counters too.

    We could have left percpu_counter_init() alone and added
    percpu_counter_init_gfp(); however, the number of users isn't that
    high and introducing _gfp variants to all percpu data structures would
    be quite ugly, so let's just do the conversion. This is the one with
    the most users. Other percpu data structures are a lot easier to
    convert.

    This patch doesn't make any functional difference.

    Signed-off-by: Tejun Heo
    Acked-by: Jan Kara
    Acked-by: "David S. Miller"
    Cc: x86@kernel.org
    Cc: Jens Axboe
    Cc: "Theodore Ts'o"
    Cc: Alexander Viro
    Cc: Andrew Morton

    Tejun Heo
     

13 Jun, 2014

1 commit

  • Pull vfs updates from Al Viro:
    "This is the bunch that sat in -next + lock_parent() fix. This is the
    minimal set; there's more pending stuff.

    In particular, I really hope to get acct.c fixes merged this cycle -
    we need that to deal sanely with delayed-mntput stuff. In the next
    pile, hopefully - that series is fairly short and localized
    (kernel/acct.c, fs/super.c and fs/namespace.c). In this pile: more
    iov_iter work. Most of prereqs for ->splice_write with sane locking
    order are there and Kent's dio rewrite would also fit nicely on top of
    this pile"

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (70 commits)
    lock_parent: don't step on stale ->d_parent of all-but-freed one
    kill generic_file_splice_write()
    ceph: switch to iter_file_splice_write()
    shmem: switch to iter_file_splice_write()
    nfs: switch to iter_splice_write_file()
    fs/splice.c: remove unneeded exports
    ocfs2: switch to iter_file_splice_write()
    ->splice_write() via ->write_iter()
    bio_vec-backed iov_iter
    optimize copy_page_{to,from}_iter()
    bury generic_file_aio_{read,write}
    lustre: get rid of messing with iovecs
    ceph: switch to ->write_iter()
    ceph_sync_direct_write: stop poking into iov_iter guts
    ceph_sync_read: stop poking into iov_iter guts
    new helper: copy_page_from_iter()
    fuse: switch to ->write_iter()
    btrfs: switch to ->write_iter()
    ocfs2: switch to ->write_iter()
    xfs: switch to ->write_iter()
    ...

    Linus Torvalds
     

07 Jun, 2014

1 commit


07 May, 2014

2 commits

  • Beginning to introduce those. Just the callers for now, and it's
    clumsier than it'll eventually become; once we finish converting
    aio_read and aio_write instances, the things will get nicer.

    For now, these guys are in parallel to ->aio_read() and ->aio_write();
    they take iocb and iov_iter, with everything in iov_iter already
    validated. File offset is passed in iocb->ki_pos, iov/nr_segs -
    in iov_iter.

    Main concerns in that series are stack footprint and ability to
    split the damn thing cleanly.

    [fix from Peter Ujfalusi folded]

    Signed-off-by: Al Viro

    Al Viro
     
  • Since we are about to introduce new methods (read_iter/write_iter), the
    tests in a bunch of places would have to grow inconveniently. Check
    once (at open() time) and store results in ->f_mode as FMODE_CAN_READ
    and FMODE_CAN_WRITE resp. It might end up being a temporary measure -
    once everything switches from ->aio_{read,write} to ->{read,write}_iter
    it might make sense to return to open-coded checks. We'll see...
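
    The check-once-at-open idea can be modelled as follows. Bit values and
    toy_* names are illustrative assumptions; the sketch only shows how the
    presence of any read or write method is folded into two mode bits so
    later code tests a flag instead of several pointers.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative bit values, not the kernel's. */
#define TOY_FMODE_READ      (1u << 0)
#define TOY_FMODE_WRITE     (1u << 1)
#define TOY_FMODE_CAN_READ  (1u << 2)
#define TOY_FMODE_CAN_WRITE (1u << 3)

typedef long (*toy_io_fn)(void);

struct toy_file_operations {
	toy_io_fn read, aio_read, read_iter;
	toy_io_fn write, aio_write, write_iter;
};

/* Run once at open time; returns f_mode with the CAN_* bits filled in. */
static inline unsigned int toy_open_mode(unsigned int f_mode,
					 const struct toy_file_operations *fop)
{
	if ((f_mode & TOY_FMODE_READ) &&
	    (fop->read || fop->aio_read || fop->read_iter))
		f_mode |= TOY_FMODE_CAN_READ;
	if ((f_mode & TOY_FMODE_WRITE) &&
	    (fop->write || fop->aio_write || fop->write_iter))
		f_mode |= TOY_FMODE_CAN_WRITE;
	return f_mode;
}

/* Placeholder method used only to populate a table in examples. */
static inline long toy_dummy_io(void)
{
	return 0;
}
```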

    Signed-off-by: Al Viro

    Al Viro