08 Jul, 2018

1 commit

  • [ Upstream commit 378831e4daec75fbba6d3612bcf3b4dd00ddbf08 ]

    Doing faccessat("/afs/some/directory", 0) triggers a BUG in the permissions
    check code.

    Fix this by just removing the BUG section. If no permissions are asked
    for, just return okay if the file exists.

    Also:

    (1) Split up the directory check so that it has separate if-statements
    rather than if-else-if (e.g. checking for MAY_EXEC shouldn't skip the
    check for MAY_READ and MAY_WRITE).

    (2) Check for MAY_CHDIR as MAY_EXEC.
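    The shape of the restructured directory check can be sketched in
    user-space C. This is only an illustration, not the kafs code: the EX_MAY_*
    and EX_ACE_* constants and dir_access_ok() are hypothetical stand-ins, and
    the mapping of request bits to ACL rights is an assumption. It shows two
    things: each class of request gets its own if-statement, and an empty mask
    falls through to success.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel's MAY_* request bits. */
enum {
	EX_MAY_EXEC  = 1 << 0,
	EX_MAY_WRITE = 1 << 1,
	EX_MAY_READ  = 1 << 2,
	EX_MAY_CHDIR = 1 << 3,
};

/* Hypothetical stand-ins for the rights granted by a directory ACL. */
enum {
	EX_ACE_LOOKUP = 1 << 0,
	EX_ACE_INSERT = 1 << 1,
	EX_ACE_DELETE = 1 << 2,
};

/* Return true if every request bit in @mask is satisfied by @access. */
static bool dir_access_ok(unsigned int mask, unsigned int access)
{
	/* Only the illustrative request bits are expected here. */
	assert((mask & ~(EX_MAY_EXEC | EX_MAY_WRITE |
			 EX_MAY_READ | EX_MAY_CHDIR)) == 0);

	/* Search, read and chdir are all treated alike (assumed: lookup). */
	if (mask & (EX_MAY_EXEC | EX_MAY_READ | EX_MAY_CHDIR)) {
		if (!(access & EX_ACE_LOOKUP))
			return false;
	}

	/* A separate if, so asking for exec does not skip the write check. */
	if (mask & EX_MAY_WRITE) {
		if (!(access & (EX_ACE_INSERT | EX_ACE_DELETE)))
			return false;
	}

	/* No permissions asked for: nothing to refuse. */
	return true;
}
```

    With this shape, dir_access_ok(0, 0) succeeds, while a request for
    EX_MAY_EXEC | EX_MAY_WRITE against lookup-only rights is refused.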

    Without the main fix, the following BUG may occur:

    kernel BUG at fs/afs/security.c:386!
    invalid opcode: 0000 [#1] SMP PTI
    ...
    RIP: 0010:afs_permission+0x19d/0x1a0 [kafs]
    ...
    Call Trace:
    ? inode_permission+0xbe/0x180
    ? do_faccessat+0xdc/0x270
    ? do_syscall_64+0x60/0x1f0
    ? entry_SYSCALL_64_after_hwframe+0x49/0xbe

    Fixes: 00d3b7a4533e ("[AFS]: Add security support.")
    Reported-by: Jonathan Billings
    Signed-off-by: David Howells
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    David Howells
     

21 Jun, 2018

1 commit

  • [ Upstream commit 4776cab43fd3111618112737a257dc3ef368eddd ]

    Some AFS servers refuse to accept unencrypted traffic, so can't be accessed
    with kAFS. Set the AF_RXRPC security level to encrypt client calls to deal
    with this.

    Note that incoming service calls are set by the remote client and so aren't
    affected by this.

    This requires an AF_RXRPC patch to pass the value set by setsockopt to calls
    begun by the kernel.

    Signed-off-by: David Howells
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    David Howells
     

03 Mar, 2018

1 commit


14 Dec, 2017

2 commits

  • [ Upstream commit f4b3526d83c40dd8bf5948b9d7a1b2c340f0dcc8 ]

    The handler for the CB.ProbeUuid operation in the cache manager is
    implemented, but isn't listed in the switch-statement of operation
    selection, so won't be used. Fix this by adding it.

    Signed-off-by: David Howells
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    David Howells
     
  • [ Upstream commit 1199db603511d7463d9d3840f96f61967affc766 ]

    Fix the total-length calculation in afs_make_call() when the operation
    being dispatched has data from a series of pages attached.

    Although the patched code looks as though it should reduce mathematically
    to the current code, it doesn't, because the 32-bit unsigned arithmetic
    used to calculate the page-offset difference doesn't correctly extend
    to a 64-bit value when the result is effectively negative.

    Without this, some FS.StoreData operations that span multiple pages fail,
    reporting too little or too much data.
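    The arithmetic pitfall can be reproduced outside the kernel. The sketch
    below does not reproduce afs_make_call(); total_len_buggy() and
    total_len_fixed() are hypothetical helpers, and it assumes the usual case
    where unsigned int is 32 bits wide. It contrasts a difference taken in
    32-bit unsigned arithmetic with one taken after widening the operands.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Buggy form: "to - offset" is evaluated in 32-bit unsigned arithmetic, so
 * when to < offset the result wraps to a large positive number before it is
 * widened to 64 bits and added to the running total.
 */
static int64_t total_len_buggy(int64_t base, uint32_t to, uint32_t offset)
{
	assert(base >= 0);	/* base is a byte count */
	return base + (to - offset);
}

/*
 * Fixed form: widen both operands first, so the difference is computed in
 * signed 64-bit arithmetic and can legitimately be negative.
 */
static int64_t total_len_fixed(int64_t base, uint32_t to, uint32_t offset)
{
	assert(base >= 0);
	return base + ((int64_t)to - (int64_t)offset);
}
```

    The two forms agree whenever to >= offset and differ by exactly 2^32 when
    to < offset, which is consistent with a wrongly reported length.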

    Signed-off-by: David Howells
    Signed-off-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    David Howells
     

02 Nov, 2017

1 commit

  • Many source files in the tree are missing licensing information, which
    makes it harder for compliance tools to determine the correct license.

    By default all files without license information are under the default
    license of the kernel, which is GPL version 2.

    Update the files which contain no license information with the 'GPL-2.0'
    SPDX license identifier. The SPDX identifier is a legally binding
    shorthand, which can be used instead of the full boiler plate text.

    This patch is based on work done by Thomas Gleixner and Kate Stewart and
    Philippe Ombredanne.

    How this work was done:

    Patches were generated and checked against linux-4.14-rc6 for a subset of
    the use cases:
    - file had no licensing information in it.
    - file was a */uapi/* one with no licensing information in it,
    - file was a */uapi/* one with existing licensing information,

    Further patches will be generated in subsequent months to fix up cases
    where non-standard license headers were used, and references to license
    had to be inferred by heuristics based on keywords.

    The analysis to determine which SPDX License Identifier to be applied to
    a file was done in a spreadsheet of side by side results from the
    output of two independent scanners (ScanCode & Windriver) producing SPDX
    tag:value files created by Philippe Ombredanne. Philippe prepared the
    base worksheet, and did an initial spot review of a few 1000 files.

    The 4.13 kernel was the starting point of the analysis with 60,537 files
    assessed. Kate Stewart did a file by file comparison of the scanner
    results in the spreadsheet to determine which SPDX license identifier(s)
    to be applied to the file. She confirmed any determination that was not
    immediately clear with lawyers working with the Linux Foundation.

    Criteria used to select files for SPDX license identifier tagging were:
    - Files considered eligible had to be source code files.
    - Make and config files were included as candidates if they contained >5
    lines of source
    - File already had some variant of a license header in it (even if
    Reviewed-by: Philippe Ombredanne
    Reviewed-by: Thomas Gleixner
    Signed-off-by: Greg Kroah-Hartman

    Greg Kroah-Hartman
     

07 Sep, 2017

3 commits

  • Merge updates from Andrew Morton:

    - various misc bits

    - DAX updates

    - OCFS2

    - most of MM

    * emailed patches from Andrew Morton : (119 commits)
    mm,fork: introduce MADV_WIPEONFORK
    x86,mpx: make mpx depend on x86-64 to free up VMA flag
    mm: add /proc/pid/smaps_rollup
    mm: hugetlb: clear target sub-page last when clearing huge page
    mm: oom: let oom_reap_task and exit_mmap run concurrently
    swap: choose swap device according to numa node
    mm: replace TIF_MEMDIE checks by tsk_is_oom_victim
    mm, oom: do not rely on TIF_MEMDIE for memory reserves access
    z3fold: use per-cpu unbuddied lists
    mm, swap: don't use VMA based swap readahead if HDD is used as swap
    mm, swap: add sysfs interface for VMA based swap readahead
    mm, swap: VMA based swap readahead
    mm, swap: fix swap readahead marking
    mm, swap: add swap readahead hit statistics
    mm/vmalloc.c: don't reinvent the wheel but use existing llist API
    mm/vmstat.c: fix wrong comment
    selftests/memfd: add memfd_create hugetlbfs selftest
    mm/shmem: add hugetlbfs support to memfd_create()
    mm, devm_memremap_pages: use multi-order radix for ZONE_DEVICE lookups
    mm/vmalloc.c: halve the number of comparisons performed in pcpu_get_vm_areas()
    ...

    Linus Torvalds
     
  • Patch series "Ranged pagevec lookup", v2.

    In this series I make pagevec_lookup() update the index (to be
    consistent with pagevec_lookup_tag() and also as a preparation for
    ranged lookups), provide ranged variant of pagevec_lookup() and use it
    in places where it makes sense. This not only removes some common code
    but is also a measurable performance win for some use cases (see patch
    4/10) where radix tree is sparse and searching & grabbing of a page after
    the end of the range has measurable overhead.

    This patch (of 10):

    The callback doesn't ever get called. Remove it.

    Link: http://lkml.kernel.org/r/20170726114704.7626-2-jack@suse.cz
    Signed-off-by: Jan Kara
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jan Kara
     
  • Pull networking updates from David Miller:

    1) Support ipv6 checksum offload in sunvnet driver, from Shannon
    Nelson.

    2) Move to RB-tree instead of custom AVL code in inetpeer, from Eric
    Dumazet.

    3) Allow generic XDP to work on virtual devices, from John Fastabend.

    4) Add bpf device maps and XDP_REDIRECT, which can be used to build
    arbitrary switching frameworks using XDP. From John Fastabend.

    5) Remove UFO offloads from the tree, gave us little other than bugs.

    6) Remove the IPSEC flow cache, from Florian Westphal.

    7) Support ipv6 route offload in mlxsw driver.

    8) Support VF representors in bnxt_en, from Sathya Perla.

    9) Add support for forward error correction modes to ethtool, from
    Vidya Sagar Ravipati.

    10) Add time filter for packet scheduler action dumping, from Jamal Hadi
    Salim.

    11) Extend the zerocopy sendmsg() used by virtio and tap to regular
    sockets via MSG_ZEROCOPY. From Willem de Bruijn.

    12) Significantly rework value tracking in the BPF verifier, from Edward
    Cree.

    13) Add new jump instructions to eBPF, from Daniel Borkmann.

    14) Rework rtnetlink plumbing so that operations can be run without
    taking the RTNL semaphore. From Florian Westphal.

    15) Support XDP in tap driver, from Jason Wang.

    16) Add 32-bit eBPF JIT for ARM, from Shubham Bansal.

    17) Add Huawei hinic ethernet driver.

    18) Allow to report MD5 keys in TCP inet_diag dumps, from Ivan
    Delalande.

    * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1780 commits)
    i40e: point wb_desc at the nvm_wb_desc during i40e_read_nvm_aq
    i40e: avoid NVM acquire deadlock during NVM update
    drivers: net: xgene: Remove return statement from void function
    drivers: net: xgene: Configure tx/rx delay for ACPI
    drivers: net: xgene: Read tx/rx delay for ACPI
    rocker: fix kcalloc parameter order
    rds: Fix non-atomic operation on shared flag variable
    net: sched: don't use GFP_KERNEL under spin lock
    vhost_net: correctly check tx avail during rx busy polling
    net: mdio-mux: add mdio_mux parameter to mdio_mux_init()
    rxrpc: Make service connection lookup always check for retry
    net: stmmac: Delete dead code for MDIO registration
    gianfar: Fix Tx flow control deactivation
    cxgb4: Ignore MPS_TX_INT_CAUSE[Bubble] for T6
    cxgb4: Fix pause frame count in t4_get_port_stats
    cxgb4: fix memory leak
    tun: rename generic_xdp to skb_xdp
    tun: reserve extra headroom only when XDP is set
    net: dsa: bcm_sf2: Configure IMP port TC2QOS mapping
    net: dsa: bcm_sf2: Advertise number of egress queues
    ...

    Linus Torvalds
     

29 Aug, 2017

1 commit

  • Add a callback to rxrpc_kernel_send_data() so that a kernel service can get
    a notification that the AF_RXRPC call has transitioned out of the Tx phase and
    is now waiting for a reply or a final ACK.

    This is called from AF_RXRPC with the call state lock held so the
    notification is guaranteed to come before any reply is passed back.

    Further, modify the AFS filesystem to make use of this so that we don't have
    to change the afs_call state before sending the last bit of data.

    Signed-off-by: David Howells

    David Howells
     

01 Aug, 2017

1 commit

  • This patch converts most of the in-kernel filesystems that do writeback
    out of the pagecache to report errors using the errseq_t-based
    infrastructure that was recently added. This allows them to report
    errors once for each open file description.

    Most filesystems have a fairly straightforward fsync operation. They
    call filemap_write_and_wait_range to write back all of the data and
    wait on it, and then (sometimes) sync out the metadata.

    For those filesystems this is a straightforward conversion from calling
    filemap_write_and_wait_range in their fsync operation to calling
    file_write_and_wait_range.

    Acked-by: Jan Kara
    Acked-by: Dave Kleikamp
    Signed-off-by: Jeff Layton

    Jeff Layton
     

21 Jul, 2017

1 commit

  • Move the protocol description header file into net/rxrpc/ and rename it to
    protocol.h. It's no longer necessary to expose it as packets are no longer
    exposed to kernel services (such as AFS) that use the facility.

    The abort codes are transferred to the UAPI header instead as we pass these
    back to userspace and also to kernel services.

    Signed-off-by: David Howells

    David Howells
     

16 Jul, 2017

1 commit

  • Pull ->s_options removal from Al Viro:
    "Preparations for fsmount/fsopen stuff (coming next cycle). Everything
    gets moved to explicit ->show_options(), killing ->s_options off +
    some cosmetic bits around fs/namespace.c and friends. Basically, the
    stuff needed to work with fsmount series with minimum of conflicts
    with other work.

    It's not strictly required for this merge window, but it would reduce
    the PITA during the coming cycle, so it would be nice to have those
    bits and pieces out of the way"

    * 'work.mount' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
    isofs: Fix isofs_show_options()
    VFS: Kill off s_options and helpers
    orangefs: Implement show_options
    9p: Implement show_options
    isofs: Implement show_options
    afs: Implement show_options
    affs: Implement show_options
    befs: Implement show_options
    spufs: Implement show_options
    bpf: Implement show_options
    ramfs: Implement show_options
    pstore: Implement show_options
    omfs: Implement show_options
    hugetlbfs: Implement show_options
    VFS: Don't use save/replace_mount_options if not using generic_show_options
    VFS: Provide empty name qstr
    VFS: Make get_filesystem() return the affected filesystem
    VFS: Clean up whitespace in fs/namespace.c and fs/super.c
    Provide a function to create a NUL-terminated string from unterminated data

    Linus Torvalds
     

11 Jul, 2017

1 commit

  • Implement the show_options superblock op for afs as part of a bid to get
    rid of s_options and generic_show_options() to make it easier to implement
    a context-based mount where the mount options can be passed individually
    over a file descriptor.

    Also implement the show_devname op to display the correct device name and thus
    avoid the need to display the cell= and volume= options.

    Signed-off-by: David Howells
    cc: linux-afs@lists.infradead.org
    Signed-off-by: Al Viro

    David Howells
     

10 Jul, 2017

2 commits

  • Add xattrs to allow the user to get/set metadata in lieu of having pioctl()
    available. The following xattrs are now available:

    - "afs.cell"

    The name of the cell in which the vnode's volume resides.

    - "afs.fid"

    The volume ID, vnode ID and vnode uniquifier of the file as three hex
    numbers separated by colons.

    - "afs.volume"

    The name of the volume in which the vnode resides.

    For example:

    # getfattr -d -m ".*" /mnt/scratch
    getfattr: Removing leading '/' from absolute path names
    # file: mnt/scratch
    afs.cell="mycell.myorg.org"
    afs.fid="10000b:1:1"
    afs.volume="scratch"
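    A consumer of the "afs.fid" value might split it into its three parts as
    follows. This is a minimal sketch: struct ex_fid and ex_parse_fid() are
    hypothetical names, the three fields are assumed to fit in an unsigned
    int, and the string itself would be obtained separately, for instance
    with getxattr(2).

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical holder for the three components of an "afs.fid" value. */
struct ex_fid {
	unsigned int volume;	/* volume ID */
	unsigned int vnode;	/* vnode ID */
	unsigned int unique;	/* vnode uniquifier */
};

/*
 * Parse "<volume>:<vnode>:<unique>", each part in hex.
 * Returns 0 on success, -1 if the string does not have exactly that form.
 */
static int ex_parse_fid(const char *s, struct ex_fid *fid)
{
	char trailing;

	assert(s && fid);

	/* The extra %c only converts if unexpected text follows the fid. */
	if (sscanf(s, "%x:%x:%x%c", &fid->volume, &fid->vnode,
		   &fid->unique, &trailing) != 3)
		return -1;
	return 0;
}
```

    For the value shown in the example output, "10000b:1:1", this yields
    volume 0x10000b, vnode 1 and uniquifier 1.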

    Signed-off-by: David Howells
    Signed-off-by: Linus Torvalds

    David Howells
     
  • The AFS_ACE_READ and AFS_ACE_WRITE permission bits should not
    be used to make access decisions for the directory itself. They
    are meant to control access for the objects contained in that
    directory.

    Reading a directory is allowed if the AFS_ACE_LOOKUP bit is set.
    This would cause an incorrect access denied error for a directory
    with AFS_ACE_LOOKUP but not AFS_ACE_READ.

    The AFS_ACE_WRITE bit does not allow operations that modify the
    directory. For a directory with AFS_ACE_WRITE but neither
    AFS_ACE_INSERT nor AFS_ACE_DELETE, this would result in trying
    operations that would ultimately be denied by the server.

    Signed-off-by: Marc Dionne
    Signed-off-by: David Howells
    Signed-off-by: Linus Torvalds

    Marc Dionne
     

06 Jul, 2017

1 commit

  • Pull networking updates from David Miller:
    "Reasonably busy this cycle, but perhaps not as busy as in the 4.12
    merge window:

    1) Several optimizations for UDP processing under high load from
    Paolo Abeni.

    2) Support pacing internally in TCP when using the sch_fq packet
    scheduler for this is not practical. From Eric Dumazet.

    3) Support multiple filter chains per qdisc, from Jiri Pirko.

    4) Move to 1ms TCP timestamp clock, from Eric Dumazet.

    5) Add batch dequeueing to vhost_net, from Jason Wang.

    6) Flesh out more completely SCTP checksum offload support, from
    Davide Caratti.

    7) More plumbing of extended netlink ACKs, from David Ahern, Pablo
    Neira Ayuso, and Matthias Schiffer.

    8) Add devlink support to nfp driver, from Simon Horman.

    9) Add RTM_F_FIB_MATCH flag to RTM_GETROUTE queries, from Roopa
    Prabhu.

    10) Add stack depth tracking to BPF verifier and use this information
    in the various eBPF JITs. From Alexei Starovoitov.

    11) Support XDP on qed device VFs, from Yuval Mintz.

    12) Introduce BPF PROG ID for better introspection of installed BPF
    programs. From Martin KaFai Lau.

    13) Add bpf_set_hash helper for TC bpf programs, from Daniel Borkmann.

    14) For loads, allow narrower accesses in bpf verifier checking, from
    Yonghong Song.

    15) Support MIPS in the BPF selftests and samples infrastructure, the
    MIPS eBPF JIT will be merged in via the MIPS GIT tree. From David
    Daney.

    16) Support kernel based TLS, from Dave Watson and others.

    17) Remove completely DST garbage collection, from Wei Wang.

    18) Allow installing TCP MD5 rules using prefixes, from Ivan
    Delalande.

    19) Add XDP support to Intel i40e driver, from Björn Töpel

    20) Add support for TC flower offload in nfp driver, from Simon
    Horman, Pieter Jansen van Vuuren, Benjamin LaHaise, Jakub
    Kicinski, and Bert van Leeuwen.

    21) IPSEC offloading support in mlx5, from Ilan Tayari.

    22) Add HW PTP support to macb driver, from Rafal Ozieblo.

    23) Networking refcount_t conversions, From Elena Reshetova.

    24) Add sock_ops support to BPF, from Lawrence Brako. This is useful
    for tuning the TCP sockopt settings of a group of applications,
    currently via CGROUPs"

    * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1899 commits)
    net: phy: dp83867: add workaround for incorrect RX_CTRL pin strap
    dt-bindings: phy: dp83867: provide a workaround for incorrect RX_CTRL pin strap
    cxgb4: Support for get_ts_info ethtool method
    cxgb4: Add PTP Hardware Clock (PHC) support
    cxgb4: time stamping interface for PTP
    nfp: default to chained metadata prepend format
    nfp: remove legacy MAC address lookup
    nfp: improve order of interfaces in breakout mode
    net: macb: remove extraneous return when MACB_EXT_DESC is defined
    bpf: add missing break in for the TCP_BPF_SNDCWND_CLAMP case
    bpf: fix return in load_bpf_file
    mpls: fix rtm policy in mpls_getroute
    net, ax25: convert ax25_cb.refcount from atomic_t to refcount_t
    net, ax25: convert ax25_route.refcount from atomic_t to refcount_t
    net, ax25: convert ax25_uid_assoc.refcount from atomic_t to refcount_t
    net, sctp: convert sctp_ep_common.refcnt from atomic_t to refcount_t
    net, sctp: convert sctp_transport.refcnt from atomic_t to refcount_t
    net, sctp: convert sctp_chunk.refcnt from atomic_t to refcount_t
    net, sctp: convert sctp_datamsg.refcnt from atomic_t to refcount_t
    net, sctp: convert sctp_auth_bytes.refcnt from atomic_t to refcount_t
    ...

    Linus Torvalds
     

08 Jun, 2017

1 commit

  • Provide a control message that can be specified on the first sendmsg() of a
    client call or the first sendmsg() of a service response to indicate the
    total length of the data to be transmitted for that call.

    Currently, because the length of the payload of an encrypted DATA packet is
    encrypted in front of the data, the packet cannot be encrypted until we
    know how much data it will hold.

    By specifying the length at the beginning of the transmit phase, each DATA
    packet length can be set before we start loading data from userspace (where
    several sendmsg() calls may contribute to a particular packet).

    An error will be returned if too little or too much data is presented in
    the Tx phase.
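    From userspace, such a control message would be built with the standard
    CMSG_*() macros. The sketch below only fills in the ancillary-data buffer
    and sends nothing. The two constants are assumptions that mirror what
    the author of this example believes the UAPI headers define (SOL_RXRPC as
    272 and RXRPC_TX_LENGTH as 12, carrying a signed 64-bit length), and
    ex_build_tx_length() is a hypothetical helper; check linux/rxrpc.h before
    relying on any of them.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>

/* Assumed values; the authoritative definitions live in the UAPI headers. */
#define EX_SOL_RXRPC		272
#define EX_RXRPC_TX_LENGTH	12

/*
 * Fill @buf with a single control message announcing @total_len bytes of
 * call data. @buf must be suitably aligned for struct cmsghdr. Returns the
 * number of bytes of @buf to pass as msg_controllen.
 */
static size_t ex_build_tx_length(void *buf, size_t buflen, int64_t total_len)
{
	struct msghdr msg;
	struct cmsghdr *cmsg;

	assert(buflen >= CMSG_SPACE(sizeof(total_len)));

	memset(buf, 0, buflen);
	memset(&msg, 0, sizeof(msg));
	msg.msg_control = buf;
	msg.msg_controllen = buflen;

	cmsg = CMSG_FIRSTHDR(&msg);
	cmsg->cmsg_level = EX_SOL_RXRPC;
	cmsg->cmsg_type = EX_RXRPC_TX_LENGTH;
	cmsg->cmsg_len = CMSG_LEN(sizeof(total_len));
	memcpy(CMSG_DATA(cmsg), &total_len, sizeof(total_len));

	return CMSG_SPACE(sizeof(total_len));
}
```

    The buffer would be attached to the msghdr of the first sendmsg() of the
    call; later sendmsg() calls for the same call would not repeat it.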

    Signed-off-by: David Howells

    David Howells
     

05 Jun, 2017

1 commit

  • This essentially is a partial revert of commit ff548773
    ("afs: Move UUID struct to linux/uuid.h") and moves struct uuid_v1 back into
    fs/afs as struct afs_uuid. It however keeps it as big endian structure
    so that we can use the normal uuid generation helpers when casting to/from
    struct afs_uuid.

    The V1 uuid interpretation in struct form isn't really useful to the
    rest of the kernel, and not really compatible with it either, so move it
    back to AFS instead of polluting the global uuid.h.

    Signed-off-by: Christoph Hellwig
    Acked-by: David Howells

    Christoph Hellwig
     

03 May, 2017

1 commit

  • Pull networking updates from David Miller:
    "Here are some highlights from the 2065 networking commits that
    happened this development cycle:

    1) XDP support for IXGBE (John Fastabend) and thunderx (Sunil Kowuri)

    2) Add a generic XDP driver, so that anyone can test XDP even if they
    lack a networking device whose driver has explicit XDP support
    (me).

    3) Sparc64 now has an eBPF JIT too (me)

    4) Add a BPF program testing framework via BPF_PROG_TEST_RUN (Alexei
    Starovoitov)

    5) Make netfilter network namespace teardown less expensive (Florian
    Westphal)

    6) Add symmetric hashing support to nft_hash (Laura Garcia Liebana)

    7) Implement NAPI and GRO in netvsc driver (Stephen Hemminger)

    8) Support TC flower offload statistics in mlxsw (Arkadi Sharshevsky)

    9) Multiqueue support in stmmac driver (Joao Pinto)

    10) Remove TCP timewait recycling, it never really could possibly work
    well in the real world and timestamp randomization really zaps any
    hint of usability this feature had (Soheil Hassas Yeganeh)

    11) Support level3 vs level4 ECMP route hashing in ipv4 (Nikolay
    Aleksandrov)

    12) Add socket busy poll support to epoll (Sridhar Samudrala)

    13) Netlink extended ACK support (Johannes Berg, Pablo Neira Ayuso,
    and several others)

    14) IPSEC hw offload infrastructure (Steffen Klassert)"

    * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (2065 commits)
    tipc: refactor function tipc_sk_recv_stream()
    tipc: refactor function tipc_sk_recvmsg()
    net: thunderx: Optimize page recycling for XDP
    net: thunderx: Support for XDP header adjustment
    net: thunderx: Add support for XDP_TX
    net: thunderx: Add support for XDP_DROP
    net: thunderx: Add basic XDP support
    net: thunderx: Cleanup receive buffer allocation
    net: thunderx: Optimize CQE_TX handling
    net: thunderx: Optimize RBDR descriptor handling
    net: thunderx: Support for page recycling
    ipx: call ipxitf_put() in ioctl error path
    net: sched: add helpers to handle extended actions
    qed*: Fix issues in the ptp filter config implementation.
    qede: Fix concurrency issue in PTP Tx path processing.
    stmmac: Add support for SIMATIC IOT2000 platform
    net: hns: fix ethtool_get_strings overflow in hns driver
    tcp: fix wraparound issue in tcp_lp
    bpf, arm64: fix jit branch offset related to ldimm64
    bpf, arm64: implement jiting of BPF_XADD
    ...

    Linus Torvalds
     

21 Apr, 2017

1 commit

  • Allocate struct backing_dev_info separately instead of embedding it
    inside the superblock. This unifies handling of bdi among users.

    CC: David Howells
    CC: linux-afs@lists.infradead.org
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Jan Kara
    Signed-off-by: Jens Axboe

    Jan Kara
     

06 Apr, 2017

1 commit


17 Mar, 2017

18 commits

  • Drop the page lock before waiting for page writeback.

    Signed-off-by: David Howells

    David Howells
     
  • The ->writepage() op shouldn't call clear_page_dirty_for_io() as that has
    already been called by the caller.

    Fix afs_writepage() by moving the call out of
    afs_write_back_from_locked_page() to afs_writepages_region() where it is
    needed.

    Signed-off-by: David Howells

    David Howells
     
  • Fix the way in which a call that's in progress and being waited for is
    aborted in the case that EINTR is detected. We should be sending
    RX_USER_ABORT rather than RX_CALL_DEAD as the abort code.

    Note that since the only two ways out of the loop are if the call completes
    or if a signal happens, the kill-the-call clause after the loop has
    finished can only happen in the case of EINTR. This means that we only
    have one abort case to deal with, not two, and the "KWC" case can never
    happen and so can be deleted.

    Note further that simply aborting the call isn't necessarily the best thing
    here since, at this point, the request has been entirely sent and it's
    likely the server will do the operation anyway - whether we abort it or
    not. In future, we should punt the handling of the remainder of the call
    off to a background thread.

    Reported-by: Marc Dionne
    Signed-off-by: David Howells

    David Howells
     
  • afs_send_pages() should only put the call into the AFS_CALL_AWAIT_REPLY
    state if it has sent all the pages - but the check it makes is incorrect
    and sometimes it will finish the loop early.

    Signed-off-by: David Howells

    David Howells
     
  • Fix afs_kill_pages() in two ways:

    (1) If a writeback has been partially flushed, then if we try and kill the
    pages it contains, some of them may no longer be undergoing writeback
    and end_page_writeback() will assert.

    Fix this by checking to see whether the page in question is actually
    undergoing writeback before ending that writeback.

    (2) The loop that scans for pages to kill doesn't increase the first page
    index, and so the loop may not terminate, but it will try to process
    the same pages over and over again.

    Fix this by increasing the first page index to one after the last page
    we processed.
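    The second fix comes down to advancing the start of the scan after each
    batch. A user-space sketch of that loop shape follows; ex_lookup() and
    ex_kill_range() are hypothetical, the batch lookup is simulated rather
    than backed by a page cache, and index ranges near ULONG_MAX are not
    handled.

```c
#include <assert.h>

#define EX_BATCH 4	/* illustrative batch size */

/*
 * Simulated batch lookup: report up to EX_BATCH consecutive page indices
 * in [first, last] and return how many were found.
 */
static unsigned int ex_lookup(unsigned long first, unsigned long last,
			      unsigned long idx[EX_BATCH])
{
	unsigned int n = 0;

	while (n < EX_BATCH && first + n <= last) {
		idx[n] = first + n;
		n++;
	}
	return n;
}

/* Visit every page in [first, last] once; return the number visited. */
static unsigned long ex_kill_range(unsigned long first, unsigned long last)
{
	unsigned long idx[EX_BATCH];
	unsigned long visited = 0;
	unsigned int n, i;

	assert(first <= last);

	do {
		n = ex_lookup(first, last, idx);
		if (n == 0)
			break;

		for (i = 0; i < n; i++)
			visited++;	/* per-page work would go here */

		/* Without this step the same batch is found again forever. */
		first = idx[n - 1] + 1;
	} while (first <= last);

	return visited;
}
```

    Scanning indices 0 to 9 in batches of four visits ten pages and then
    stops, because first has moved past last.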

    Signed-off-by: David Howells

    David Howells
     
  • afs_write_begin() leaks a ref and a lock on a page if afs_fill_page()
    fails. Fix the leak by unlocking and releasing the page in the error path.

    Signed-off-by: David Howells

    David Howells
     
  • Don't set PG_error on a page if we get local EINTR or ENOMEM when filling a
    page for writing.

    Signed-off-by: David Howells

    David Howells
     
  • The inode timestamps should be set from the client time
    in the status received from the server, rather than the
    server time which is meant for internal server use.

    Set AFS_SET_MTIME and populate the mtime for operations
    that take an input status, such as file/dir creation
    and StoreData. If an input time is not provided the
    server will set the vnode times based on the current server
    time.

    In a situation where the server has some skew with the
    client, this could lead to the client seeing a timestamp
    in the future for a file that it just created or wrote.

    Signed-off-by: Marc Dionne
    Signed-off-by: David Howells

    Marc Dionne
     
  • If we receive a network error, a remote abort or a protocol error whilst
    we're still transmitting data, make sure we return an appropriate error to
    the caller rather than ESHUTDOWN or ECONNABORTED.

    Signed-off-by: David Howells

    David Howells
     
  • When we are given an invalid operation ID, we should abort that with
    RXGEN_OPCODE rather than RX_INVALID_OPERATION.

    Also map RXGEN_OPCODE to -ENOTSUPP.

    Signed-off-by: David Howells

    David Howells
     
  • afs_fs_store_data() works out the size of the write it's going to make,
    but it uses 32-bit unsigned subtraction in one place that gets
    automatically cast to loff_t.

    However, if to < offset, then the number goes negative, but as the result
    isn't signed, this doesn't get sign-extended to 64-bits when placed in a
    loff_t.

    Fix by casting the operands to loff_t.

    Signed-off-by: David Howells

    David Howells
     
  • Use a bvec rather than a kvec in afs_send_pages() as we don't then have to
    call kmap() in advance. This allows us to pass the array of contiguous
    pages that we extracted through to rxrpc in one go rather than passing a
    single page at a time.

    Signed-off-by: David Howells

    David Howells
     
  • Make struct afs_read::remain 64-bit so that it can handle huge transfers if
    we ever request them or the server decides to give us a bit extra data (the
    other fields there are already 64-bit).

    Signed-off-by: David Howells
    Tested-by: Marc Dionne

    David Howells
     
  • Fix a bug in AFS read whereby the request page afs_read::index isn't
    incremented after calling ->page_done() if ->remain reaches 0, indicating
    that the data read is complete.

    Without this a NULL pointer exception happens when ->page_done() is called
    twice for the last page because the page clearing loop will call it also
    and afs_readpages_page_done() clears the current entry in the page list.

    BUG: unable to handle kernel NULL pointer dereference at (null)
    IP: afs_readpages_page_done+0x21/0xa4 [kafs]
    PGD 0
    Oops: 0002 [#1] SMP
    Modules linked in: kafs(E)
    CPU: 2 PID: 3002 Comm: md5sum Tainted: G E 4.10.0-fscache #485
    Hardware name: ASUS All Series/H97-PLUS, BIOS 2306 10/09/2014
    task: ffff8804017d86c0 task.stack: ffff8803fc1d8000
    RIP: 0010:afs_readpages_page_done+0x21/0xa4 [kafs]
    RSP: 0018:ffff8803fc1db978 EFLAGS: 00010282
    RAX: ffff880405d39af8 RBX: 0000000000000000 RCX: ffff880407d83ed4
    RDX: 0000000000000000 RSI: ffff880405d39a00 RDI: ffff880405c6f400
    RBP: ffff8803fc1db988 R08: 0000000000000000 R09: 0000000000000001
    R10: ffff8803fc1db820 R11: ffff88040cf56000 R12: ffff8804088f1780
    R13: ffff8804017d86c0 R14: ffff8804088f1780 R15: 0000000000003840
    FS: 00007f8154469700(0000) GS:ffff88041fb00000(0000) knlGS:0000000000000000
    CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 0000000000000000 CR3: 00000004016ec000 CR4: 00000000001406e0
    Call Trace:
    afs_deliver_fs_fetch_data+0x5b9/0x60e [kafs]
    ? afs_make_call+0x316/0x4e8 [kafs]
    ? afs_make_call+0x359/0x4e8 [kafs]
    afs_deliver_to_call+0x173/0x2e8 [kafs]
    ? afs_make_call+0x316/0x4e8 [kafs]
    afs_make_call+0x37a/0x4e8 [kafs]
    ? wake_up_q+0x4f/0x4f
    ? __init_waitqueue_head+0x36/0x49
    afs_fs_fetch_data+0x21c/0x227 [kafs]
    ? afs_fs_fetch_data+0x21c/0x227 [kafs]
    afs_vnode_fetch_data+0xf3/0x1d2 [kafs]
    afs_readpages+0x314/0x3fd [kafs]
    __do_page_cache_readahead+0x208/0x2c5
    ondemand_readahead+0x3a2/0x3b7
    ? ondemand_readahead+0x3a2/0x3b7
    page_cache_async_readahead+0x5e/0x67
    generic_file_read_iter+0x23b/0x70c
    ? __inode_security_revalidate+0x2f/0x62
    __vfs_read+0xc4/0xe8
    vfs_read+0xd1/0x15a
    SyS_read+0x4c/0x89
    do_syscall_64+0x80/0x191
    entry_SYSCALL64_slow_path+0x25/0x25

    Reported-by: Marc Dionne
    Signed-off-by: David Howells
    Tested-by: Marc Dionne

    David Howells
     
  • get_seconds() returns real wall-clock seconds. On 32-bit systems
    this value will overflow in year 2038 and beyond. This patch changes
    afs_vnode record to use ktime_get_real_seconds() instead, for the
    fields cb_expires and cb_expires_at.
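    The limit being avoided can be shown with plain integer arithmetic. This
    sketch assumes a signed 32-bit seconds count as the narrow representation;
    ex_fits_in_32bit_time() and ex_expiry() are hypothetical helpers and do
    not correspond to any kernel function.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Can this seconds-since-epoch value be held in a signed 32-bit count? */
static bool ex_fits_in_32bit_time(int64_t secs)
{
	return secs >= INT32_MIN && secs <= INT32_MAX;
}

/* Compute an expiry time in 64-bit arithmetic so it cannot wrap in 2038. */
static int64_t ex_expiry(int64_t now, uint32_t lifetime)
{
	assert(now >= 0);
	return now + (int64_t)lifetime;
}
```

    An expiry one second after 2147483647 (03:14:07 UTC on 19 January 2038)
    no longer fits in the 32-bit form, but is still an ordinary value in the
    64-bit one.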

    Signed-off-by: Tina Ruchandani
    Signed-off-by: David Howells

    Tina Ruchandani
     
  • get_seconds() returns real wall-clock seconds. On 32-bit systems
    this value will overflow in year 2038 and beyond. This patch changes
    afs's vlocation record to use ktime_get_real_seconds() instead, for the
    fields time_of_death and update_at.

    Signed-off-by: Tina Ruchandani
    Signed-off-by: David Howells

    Tina Ruchandani
     
  • The use of "rcu_assign_pointer()" is NULLing out the pointer.
    According to RCU_INIT_POINTER()'s block comment:
    "1. This use of RCU_INIT_POINTER() is NULLing out the pointer"
    it is better to use it instead of rcu_assign_pointer() because it has a
    smaller overhead.

    The following Coccinelle semantic patch was used:
    @@
    @@

    - rcu_assign_pointer
    + RCU_INIT_POINTER
    (..., NULL)

    Signed-off-by: Andreea-Cristina Bernat
    Signed-off-by: David Howells

    Andreea-Cristina Bernat
     
  • The use of "rcu_assign_pointer()" is NULLing out the pointer.
    According to RCU_INIT_POINTER()'s block comment:
    "1. This use of RCU_INIT_POINTER() is NULLing out the pointer"
    it is better to use it instead of rcu_assign_pointer() because it has a
    smaller overhead.

    The following Coccinelle semantic patch was used:
    @@
    @@

    - rcu_assign_pointer
    + RCU_INIT_POINTER
    (..., NULL)

    Signed-off-by: Andreea-Cristina Bernat
    Signed-off-by: David Howells

    Andreea-Cristina Bernat