07 Jun, 2018

1 commit

  • blk_partition_remap() will only clear bi_partno if an actual remapping
    has happened. But flush requests et al. don't have an actual size, so
    the remapping doesn't happen and bi_partno is never cleared.
    So for stacked devices blk_partition_remap() will be called on each level.
    If (as is the case for native nvme multipathing) one of the lower-level
    devices does _not_ support partitioning, a spurious I/O error is generated.

    Reviewed-by: Johannes Thumshirn
    Reviewed-by: Sagi Grimberg
    Reviewed-by: Christoph Hellwig
    Signed-off-by: Hannes Reinecke
    Signed-off-by: Jens Axboe

    Hannes Reinecke
     

06 Jun, 2018

1 commit


05 Jun, 2018

11 commits

  • Technically we should be able to get away with 0 as the
    discard_alignment, but there's no way currently for the protocol to
    indicate different alignments, and in real life most disks have
    discard_alignment == discard_granularity. Just set our alignment to our
    blocksize to make sure discards will actually work properly with 4k
    drives.
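
    As a worked illustration (a sketch, not the block layer's actual split
    code): if the alignment offset is taken modulo the granularity, then
    setting discard_alignment equal to a 4096-byte blocksize leaves an
    offset of 0, so usable discard boundaries are plain multiples of the
    block size. The helper toy_align_discard() below is hypothetical; it
    rounds a byte range inward to such boundaries.

    ```c
    #include <assert.h>
    #include <stdint.h>

    /* Hypothetical helper: shrink the byte range [start, start + len) to the
     * largest sub-range whose ends sit on boundaries b with
     * b % granularity == alignment % granularity. Stores the new start in
     * *out_start and returns the new length, or 0 if no aligned piece fits. */
    static uint64_t toy_align_discard(uint64_t start, uint64_t len,
                                      uint64_t granularity, uint64_t alignment,
                                      uint64_t *out_start)
    {
        uint64_t off = alignment % granularity;  /* 0 when alignment == granularity */
        uint64_t end = start + len;

        /* Move start forward to the next boundary (no-op if already on one). */
        uint64_t s = start + (granularity + off - start % granularity) % granularity;

        /* Distance from end back to the previous boundary. */
        uint64_t back = (end % granularity + granularity - off) % granularity;
        if (back > end)
            return 0;  /* no boundary at or below end */
        uint64_t e = end - back;

        if (e <= s)
            return 0;
        *out_start = s;
        return e - s;
    }
    ```

    For example, a 10000-byte discard starting at byte 1000 with a 4096-byte
    granularity and alignment shrinks to the single block [4096, 8192).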

    Signed-off-by: Josef Bacik
    Signed-off-by: Jens Axboe

    Josef Bacik
     
  • Existing dev_dbg messages sometimes identify a request by its request
    pointer and sometimes by its nbd_cmd pointer. This makes it hard to
    follow the request flow. Consistently use the request pointer instead.

    Reviewed-by: Josef Bacik
    Signed-off-by: Kevin Vigor
    Signed-off-by: Jens Axboe

    Kevin Vigor
     
  • I ran into a strange filesystem corruption issue recently; the reason
    was overlapping partitions in the cmdline partition argument.

    This patch adds a verifier for cmdline partitions: if there are
    overlapping partitions, cmdline_partition will log a warning. We don't
    treat overlapping partitions as an error:
    "
    Caizhiyong said:
    Partition overlap was intentionally designed in this cmdline partition.
    reference http://lists.infradead.org/pipermail/linux-mtd/2013-August/048092.html
    "

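    A minimal sketch of such an overlap check, assuming a hypothetical
    struct part_desc with byte offsets (the real cmdline parser's
    structures and message text differ): every pair is compared and each
    overlap is reported, but the table is still accepted, matching the
    warn-not-fail policy above.

    ```c
    #include <assert.h>
    #include <stdint.h>
    #include <stdio.h>

    /* Hypothetical partition descriptor: start and length in bytes. */
    struct part_desc {
        const char *name;
        uint64_t from;
        uint64_t size;
    };

    /* Half-open ranges [a, a + alen) and [b, b + blen) overlap when each
     * starts before the other ends; empty ranges never overlap. */
    static int ranges_overlap(uint64_t a, uint64_t alen, uint64_t b, uint64_t blen)
    {
        return alen && blen && a < b + blen && b < a + alen;
    }

    /* Warn about every overlapping pair and return how many were found.
     * Overlap is allowed by design, so the caller keeps the table as-is. */
    static int count_overlaps(const struct part_desc *parts, int n)
    {
        int overlaps = 0;

        for (int i = 0; i < n; i++) {
            for (int j = i + 1; j < n; j++) {
                if (ranges_overlap(parts[i].from, parts[i].size,
                                   parts[j].from, parts[j].size)) {
                    fprintf(stderr, "warning: partitions %s and %s overlap\n",
                            parts[i].name, parts[j].name);
                    overlaps++;
                }
            }
        }
        return overlaps;
    }
    ```

    The pairwise loop is quadratic, which is harmless for the handful of
    partitions a command line describes.
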
    Signed-off-by: Wang YanQing
    Signed-off-by: Jens Axboe

    Wang YanQing
     
  • Currently the error exit path when the emeta could not be
    interpreted is via fail_free_ws and this fails to free
    invalid_bitmap. Fix this by adding another exit label and
    exiting via this to kfree invalid_bitmap.

    Detected by CoverityScan, CID#1469659 ("Resource leak")
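
    The fix follows the usual kernel goto-unwind pattern. A userspace sketch
    of it, with hypothetical names (gc_read_line(), tracked_alloc()) and a
    counter standing in for a leak checker: each label releases only what
    was allocated before the jump, in reverse order, so the metadata-error
    path now frees the bitmap as well.

    ```c
    #include <assert.h>
    #include <stdlib.h>

    /* Counts live allocations so a leak is observable in a test. */
    static int live_allocs;

    static void *tracked_alloc(size_t n)
    {
        void *p = malloc(n);
        if (p)
            live_allocs++;
        return p;
    }

    static void tracked_free(void *p)
    {
        if (p) {
            live_allocs--;
            free(p);
        }
    }

    /* Hypothetical stand-in for the GC read path: allocate a work struct and
     * an invalid bitmap, then fail if the metadata cannot be interpreted. */
    static int gc_read_line(int emeta_ok)
    {
        void *ws, *invalid_bitmap;

        ws = tracked_alloc(64);
        if (!ws)
            return -1;

        invalid_bitmap = tracked_alloc(128);
        if (!invalid_bitmap)
            goto fail_free_ws;

        if (!emeta_ok)
            goto fail_free_invalid_bitmap;  /* jumping to fail_free_ws leaked */

        /* Success: the sketch simply releases everything again. */
        tracked_free(invalid_bitmap);
        tracked_free(ws);
        return 0;

    fail_free_invalid_bitmap:
        tracked_free(invalid_bitmap);
    fail_free_ws:
        tracked_free(ws);
        return -1;
    }
    ```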

    Fixes: 48b8d20895f8 ("lightnvm: pblk: garbage collect lines with failed writes")
    Signed-off-by: Colin Ian King
    Signed-off-by: Jens Axboe

    Colin Ian King
     
  • Fixes the following sparse warning:

    drivers/lightnvm/pblk-init.c:23:14: warning:
    symbol 'write_buffer_size' was not declared. Should it be static?

    Signed-off-by: Wei Yongjun
    Signed-off-by: Jens Axboe

    Wei Yongjun
     
  • Refactor the validation code used in LOOP_SET_FD so it is also used in
    LOOP_CHANGE_FD. Otherwise it is possible to construct a set of loop
    devices that all refer to each other. This can lead to an infinite
    loop starting at "while (is_loop_device(f)) .." in loop_set_fd().

    Fix this by refactoring out the validation code and using it for
    LOOP_CHANGE_FD as well as LOOP_SET_FD.
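
    A simplified model of the recursion check that both ioctls now share.
    The types toy_file/toy_loop and the helper names are hypothetical; the
    walk mirrors the "while (is_loop_device(f))" loop: follow each stacked
    loop device's backing file and refuse the new backing file if the chain
    leads back to the device being configured, or through an unbound one.
    Because every set or change goes through the check, no cycle can form,
    so the walk always terminates.

    ```c
    #include <assert.h>
    #include <stdbool.h>
    #include <stddef.h>

    struct toy_loop;

    /* A "file" is either a regular file (loop == NULL) or a loop device node. */
    struct toy_file {
        struct toy_loop *loop;
    };

    struct toy_loop {
        bool bound;
        struct toy_file *backing;  /* meaningful only when bound */
    };

    /* Walk the chain of stacked loop devices starting at f. */
    static bool toy_validate_backing(struct toy_loop *lo, struct toy_file *f)
    {
        while (f->loop) {
            struct toy_loop *l = f->loop;

            if (l == lo)
                return false;  /* would make lo (indirectly) back itself */
            if (!l->bound)
                return false;  /* refuse to stack on an unbound device */
            f = l->backing;
        }
        return true;
    }

    /* Used for both the "set" and the "change" operation. */
    static int toy_set_backing(struct toy_loop *lo, struct toy_file *f)
    {
        if (!toy_validate_backing(lo, f))
            return -1;
        lo->backing = f;
        lo->bound = true;
        return 0;
    }
    ```

    Usage: bind a to a regular file, stack b on a, and then try to point a
    at b; the last step is rejected because the chain b -> a reaches a.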

    Reported-by: syzbot+4349872271ece473a7c91190b68b4bac7c5dbc87@syzkaller.appspotmail.com
    Reported-by: syzbot+40bd32c4d9a3cc12a339@syzkaller.appspotmail.com
    Reported-by: syzbot+769c54e66f994b041be7@syzkaller.appspotmail.com
    Reported-by: syzbot+0a89a9ce473936c57065@syzkaller.appspotmail.com
    Signed-off-by: Theodore Ts'o
    Signed-off-by: Jens Axboe

    Theodore Ts'o
     
  • mempool_init()/bioset_init() require that the mempools/biosets be zeroed
    first; they probably should not _require_ this, but not allocating those
    structs with kzalloc is a fairly nonsensical thing to do (calling
    mempool_exit()/bioset_exit() on an uninitialized mempool/bioset is legal
    and safe, but only works if said memory was zeroed.)
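
    A toy illustration of why zeroing matters, using a hypothetical struct
    toy_pool rather than the kernel's mempool: the exit routine frees
    whatever the struct records, so on zeroed memory it sees nr == 0 and
    elements == NULL and does nothing, while on uninitialized memory it
    would free garbage pointers.

    ```c
    #include <assert.h>
    #include <stdlib.h>
    #include <string.h>

    /* Hypothetical embedded pool; not the kernel's struct mempool. */
    struct toy_pool {
        int nr;           /* number of reserved elements allocated so far */
        void **elements;  /* NULL until toy_pool_init() allocates it */
    };

    static int toy_pool_init(struct toy_pool *p, int nr, size_t size)
    {
        p->nr = 0;
        p->elements = calloc((size_t)nr, sizeof(*p->elements));
        if (!p->elements)
            return -1;
        while (p->nr < nr) {
            p->elements[p->nr] = malloc(size);
            if (!p->elements[p->nr])
                return -1;  /* partially filled: toy_pool_exit() still cleans up */
            p->nr++;
        }
        return 0;
    }

    /* Safe on a zeroed, never-initialized pool: the loop does not run and
     * free(NULL) is a no-op. Leaves the pool zeroed again afterwards. */
    static void toy_pool_exit(struct toy_pool *p)
    {
        for (int i = 0; i < p->nr; i++)
            free(p->elements[i]);
        free(p->elements);
        memset(p, 0, sizeof(*p));
    }
    ```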

    Acked-by: Mike Snitzer
    Signed-off-by: Kent Overstreet
    Signed-off-by: Jens Axboe

    Kent Overstreet
     
  • If a hardware queue is stopped, it should not be run again before
    explicitly started. Ignore stopped queues in blk_mq_run_work_fn(),
    fixing a regression recently introduced when the START_ON_RUN bit
    was removed.
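
    In outline, with hypothetical toy types standing in for struct
    blk_mq_hw_ctx: the deferred work function returns early for a stopped
    queue, and only an explicit start clears the flag and runs it.

    ```c
    #include <assert.h>
    #include <stdbool.h>

    /* Hypothetical stand-in for a hardware queue context. */
    struct toy_hctx {
        bool stopped;    /* set when the queue is stopped */
        int dispatched;  /* how many times the queue actually ran */
    };

    static void toy_run_hw_queue(struct toy_hctx *h)
    {
        h->dispatched++;
    }

    /* Deferred-run work item: leave a stopped queue alone instead of
     * implicitly restarting it. */
    static void toy_run_work_fn(struct toy_hctx *h)
    {
        if (h->stopped)
            return;
        toy_run_hw_queue(h);
    }

    /* Explicit start: the only place that clears the stopped flag. */
    static void toy_start_hw_queue(struct toy_hctx *h)
    {
        h->stopped = false;
        toy_run_hw_queue(h);
    }
    ```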

    Fixes: 15fe8a90bb45 ("blk-mq: remove blk_mq_delay_queue()")
    Reviewed-by: Ming Lei
    Reviewed-by: Bart Van Assche
    Signed-off-by: Jianchao Wang
    Signed-off-by: Jens Axboe

    Jianchao Wang
     
  • Pull misc vfs updates from Al Viro:
    "Misc bits and pieces not fitting into anything more specific"

    * 'work.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
    vfs: delete unnecessary assignment in vfs_listxattr
    Documentation: filesystems: update filesystem locking documentation
    vfs: namei: use path_equal() in follow_dotdot()
    fs.h: fix outdated comment about file flags
    __inode_security_revalidate() never gets NULL opt_dentry
    make xattr_getsecurity() static
    vfat: simplify checks in vfat_lookup()
    get rid of dead code in d_find_alias()
    it's SB_BORN, not MS_BORN...
    msdos_rmdir(): kill BS comment
    remove rpc_rmdir()
    fs: avoid fdput() after failed fdget() in vfs_dedupe_file_range()

    Linus Torvalds
     
  • Pull procfs updates from Al Viro:
    "Christoph's proc_create_... cleanups series"

    * 'hch.procfs' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (44 commits)
    xfs, proc: hide unused xfs procfs helpers
    isdn/gigaset: add back gigaset_procinfo assignment
    proc: update SIZEOF_PDE_INLINE_NAME for the new pde fields
    tty: replace ->proc_fops with ->proc_show
    ide: replace ->proc_fops with ->proc_show
    ide: remove ide_driver_proc_write
    isdn: replace ->proc_fops with ->proc_show
    atm: switch to proc_create_seq_private
    atm: simplify procfs code
    bluetooth: switch to proc_create_seq_data
    netfilter/x_tables: switch to proc_create_seq_private
    netfilter/xt_hashlimit: switch to proc_create_{seq,single}_data
    neigh: switch to proc_create_seq_data
    hostap: switch to proc_create_{seq,single}_data
    bonding: switch to proc_create_seq_data
    rtc/proc: switch to proc_create_single_data
    drbd: switch to proc_create_single
    resource: switch to proc_create_seq_data
    staging/rtl8192u: simplify procfs code
    jfs: simplify procfs code
    ...

    Linus Torvalds
     
  • Pull rmdir update from Al Viro:
    "More shrink_dcache_parent()-related stuff - killing the main source of
    potentially contended calls of that on large subtrees"

    * 'work.rmdir' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
    rmdir(),rename(): do shrink_dcache_parent() only on success

    Linus Torvalds
     

04 Jun, 2018

6 commits

  • Pull dcache updates from Al Viro:
    "This is the first part of dealing with livelocks etc around
    shrink_dcache_parent()."

    * 'work.dcache' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
    restore cond_resched() in shrink_dcache_parent()
    dput(): turn into explicit while() loop
    dcache: move cond_resched() into the end of __dentry_kill()
    d_walk(): kill 'finish' callback
    d_invalidate(): unhash immediately

    Linus Torvalds
     
  • Pull block updates from Jens Axboe:

    - clean up how we pass around gfp_t and
    blk_mq_req_flags_t (Christoph)

    - prepare us to defer scheduler attach (Christoph)

    - clean up drivers handling of bounce buffers (Christoph)

    - fix timeout handling corner cases (Christoph/Bart/Keith)

    - bcache fixes (Coly)

    - prep work for bcachefs and some block layer optimizations (Kent).

    - convert users of bio_sets to using embedded structs (Kent).

    - fixes for the BFQ io scheduler (Paolo/Davide/Filippo)

    - lightnvm fixes and improvements (Matias, with contributions from Hans
    and Javier)

    - adding discard throttling to blk-wbt (me)

    - sbitmap blk-mq-tag handling (me/Omar/Ming).

    - remove the sparc jsflash block driver, acked by DaveM.

    - Kyber scheduler improvement from Jianchao, making it more friendly
    wrt merging.

    - conversion of symbolic proc permissions to octal, from Joe Perches.
    Previously the block parts were a mix of both.

    - nbd fixes (Josef and Kevin Vigor)

    - unify how we handle the various kinds of timestamps that the block
    core and utility code uses (Omar)

    - three NVMe pull requests from Keith and Christoph, bringing AEN to
    feature completeness, file backed namespaces, cq/sq lock split, and
    various fixes

    - various little fixes and improvements all over the map

    * tag 'for-4.18/block-20180603' of git://git.kernel.dk/linux-block: (196 commits)
    blk-mq: update nr_requests when switching to 'none' scheduler
    block: don't use blocking queue entered for recursive bio submits
    dm-crypt: fix warning in shutdown path
    lightnvm: pblk: take bitmap alloc. out of critical section
    lightnvm: pblk: kick writer on new flush points
    lightnvm: pblk: only try to recover lines with written smeta
    lightnvm: pblk: remove unnecessary bio_get/put
    lightnvm: pblk: add possibility to set write buffer size manually
    lightnvm: fix partial read error path
    lightnvm: proper error handling for pblk_bio_add_pages
    lightnvm: pblk: fix smeta write error path
    lightnvm: pblk: garbage collect lines with failed writes
    lightnvm: pblk: rework write error recovery path
    lightnvm: pblk: remove dead function
    lightnvm: pass flag on graceful teardown to targets
    lightnvm: pblk: check for chunk size before allocating it
    lightnvm: pblk: remove unnecessary argument
    lightnvm: pblk: remove unnecessary indirection
    lightnvm: pblk: return NVM_ error on failed submission
    lightnvm: pblk: warn in case of corrupted write buffer
    ...

    Linus Torvalds
     
  • Linus Torvalds
     
  • Pull vfs fixes from Al Viro.

    - fix io_destroy()/aio_complete() race

    - the vfs_open() change to get rid of open_check_o_direct() boilerplate
    was nice, but buggy. Al has a patch avoiding a revert, but that's
    definitely not a last-day fodder, so for now revert it is...

    * 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
    Revert "fs: fold open_check_o_direct into do_dentry_open"
    fix io_destroy()/aio_complete() race

    Linus Torvalds
     
  • This reverts commit cab64df194667dc5d9d786f0a895f647f5501c0d.

    Having vfs_open() in some cases drop the reference to
    struct file combined with

        error = vfs_open(path, f, cred);
        if (error) {
                put_filp(f);
                return ERR_PTR(error);
        }
        return f;

    is flat-out wrong. It used to be

        error = vfs_open(path, f, cred);
        if (!error) {
                /* from now on we need fput() to dispose of f */
                error = open_check_o_direct(f);
                if (error) {
                        fput(f);
                        f = ERR_PTR(error);
                }
        } else {
                put_filp(f);
                f = ERR_PTR(error);
        }

    and sure, having that open_check_o_direct() boilerplate gotten rid of is
    nice, but not that way...

    Worse, another call chain (via finish_open()) is FUBAR now wrt
    FILE_OPENED handling - in that case we get error returned, with file
    already hit by fput() *AND* FILE_OPENED not set. Guess what happens in
    path_openat(), when it hits

        if (!(opened & FILE_OPENED)) {
                BUG_ON(!error);
                put_filp(file);
        }

    The root cause of all that crap is that the callers of do_dentry_open()
    have no way to tell which way did it fail; while that could be fixed up
    (by passing something like int *opened to do_dentry_open() and have it
    marked if we'd called ->open()), it's probably much too late in the
    cycle to do so right now.

    Signed-off-by: Al Viro
    Signed-off-by: Linus Torvalds

    Al Viro
     
  • Pull scheduler fixes from Thomas Gleixner:

    - two patches addressing the problem that, under certain conditions,
    the scheduler allows user space tasks to be scheduled on CPUs which are
    not yet fully booted, which causes a few subtle and hard-to-debug
    issues

    - add a missing runqueue clock update in the deadline scheduler which
    triggers a warning under certain circumstances

    - fix a silly typo in the scheduler header file

    * 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    sched/headers: Fix typo
    sched/deadline: Fix missing clock update
    sched/core: Require cpu_active() in select_task_rq(), for user tasks
    sched/core: Fix rules for running on online && !active CPUs

    Linus Torvalds
     

03 Jun, 2018

17 commits

  • Pull perf tooling fixes from Thomas Gleixner:

    - fix 'perf test Session topology' segfault on s390 (Thomas Richter)

    - fix NULL return handling in bpf__prepare_load() (YueHaibing)

    - fix indexing on Coresight ETM packet queue decoder (Mathieu Poirier)

    - fix perf.data format description of NRCPUS header (Arnaldo Carvalho
    de Melo)

    - update perf.data documentation section on cpu topology

    - handle uncore event aliases in small groups properly (Kan Liang)

    - add missing perf_sample.addr into python sample dictionary (Leo Yan)

    * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    perf tools: Fix perf.data format description of NRCPUS header
    perf script python: Add addr into perf sample dict
    perf data: Update documentation section on cpu topology
    perf cs-etm: Fix indexing for decoder packet queue
    perf bpf: Fix NULL return handling in bpf__prepare_load()
    perf test: "Session topology" dumps core on s390
    perf parse-events: Handle uncore event aliases in small groups properly

    Linus Torvalds
     
  • Currently we set up q->nr_requests when switching to a new scheduler,
    but not when switching to 'none', so q->nr_requests may not be correct
    for 'none'.

    This patch fixes this issue by always updating 'nr_requests' when
    switching to 'none'.
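
    A sketch of the intended bookkeeping, under stated assumptions: the toy
    below assumes that attaching a scheduler sets nr_requests to twice the
    hardware tag depth capped at 128 (mirroring BLKDEV_MAX_RQ), and that
    'none' falls back to the tag depth itself. The struct and function
    names are hypothetical.

    ```c
    #include <assert.h>
    #include <stdbool.h>

    #define TOY_MAX_RQ 128u  /* assumed cap, mirroring BLKDEV_MAX_RQ */

    /* Hypothetical queue: only the fields needed for the sketch. */
    struct toy_queue {
        unsigned int tag_depth;    /* hardware tag-set queue depth */
        unsigned int nr_requests;  /* current request limit */
        bool has_sched;
    };

    static unsigned int toy_min(unsigned int a, unsigned int b)
    {
        return a < b ? a : b;
    }

    /* Update nr_requests on every switch, including the switch to 'none'. */
    static void toy_switch_sched(struct toy_queue *q, bool to_sched)
    {
        q->has_sched = to_sched;
        if (to_sched)
            q->nr_requests = 2 * toy_min(q->tag_depth, TOY_MAX_RQ);
        else
            q->nr_requests = q->tag_depth;  /* the previously missing update */
    }
    ```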

    Cc: Marco Patalano
    Cc: "Ewan D. Milne"
    Signed-off-by: Ming Lei
    Signed-off-by: Jens Axboe

    Ming Lei
     
  • If we end up splitting a bio and the queue goes away between
    the initial submission and the later split submission, then we
    can block forever in blk_queue_enter() waiting for the reference
    to drop to zero. This will never happen, since we already hold
    a reference.

    Mark a split bio as already having entered the queue, so we can
    just use the live non-blocking queue enter variant.
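
    A userspace model of the idea, with hypothetical names
    (TOY_BIO_QUEUE_ENTERED, toy_queue_enter_live()): the blocking enter
    refuses while a freeze is pending, which is where the real code would
    sleep forever, whereas a bio flagged as already entered just takes
    another reference on the queue its submitter already holds.

    ```c
    #include <assert.h>
    #include <stdbool.h>

    #define TOY_BIO_QUEUE_ENTERED 0x1u  /* hypothetical bio flag bit */

    struct toy_queue {
        int refs;       /* outstanding usage references */
        bool freezing;  /* teardown is waiting for refs to drop to 0 */
    };

    struct toy_bio {
        unsigned int flags;
    };

    /* Blocking enter: while a freeze is pending the real code would wait for
     * refs to reach 0. The toy returns -1 instead of sleeping. */
    static int toy_queue_enter(struct toy_queue *q)
    {
        if (q->freezing)
            return -1;
        q->refs++;
        return 0;
    }

    /* Live enter: the caller already holds a reference, so take another. */
    static void toy_queue_enter_live(struct toy_queue *q)
    {
        q->refs++;
    }

    static int toy_submit(struct toy_queue *q, struct toy_bio *bio)
    {
        if (bio->flags & TOY_BIO_QUEUE_ENTERED) {
            toy_queue_enter_live(q);
            return 0;
        }
        return toy_queue_enter(q);
    }
    ```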

    Thanks to Tetsuo Handa for the analysis.

    Reported-by: syzbot+c4f9cebf9d651f6e54de@syzkaller.appspotmail.com
    Signed-off-by: Jens Axboe

    Jens Axboe
     
  • The counter for the number of allocated pages includes pages in the
    mempool's reserve, so checking that the number of allocated pages is 0
    needs to happen after we exit the mempool.
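
    The ordering issue reduced to a toy with hypothetical names: the page
    counter still includes the reserve pages until the pool is torn down,
    so the leak check only passes after the exit step.

    ```c
    #include <assert.h>
    #include <stdbool.h>

    /* Hypothetical target: `allocated` counts every page it owns,
     * including pages parked in the pool's reserve. */
    struct toy_target {
        long allocated;
        long reserve;
    };

    static void toy_pool_exit(struct toy_target *t)
    {
        t->allocated -= t->reserve;  /* releasing the reserve frees its pages */
        t->reserve = 0;
    }

    /* Returns true if the leak check passes. Checking before
     * toy_pool_exit() would count the reserve and report a false leak. */
    static bool toy_destroy(struct toy_target *t)
    {
        toy_pool_exit(t);
        return t->allocated == 0;
    }
    ```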

    Fixes: 6f1c819c219f ("dm: convert to bioset_init()/mempool_init()")
    Signed-off-by: Kent Overstreet
    Reported-by: Krzysztof Kozlowski
    Acked-by: Mike Snitzer

    Fixed to always just use percpu_counter_sum()

    Signed-off-by: Jens Axboe

    Kent Overstreet
     
  • Pull networking fixes from David Miller:

    1) Infinite loop in _decode_session6(), from Eric Dumazet.

    2) Pass correct argument to nla_strlcpy() in netfilter, also from Eric
    Dumazet.

    3) Out of bounds memory access in ipv6 srh code, from Mathieu Xhonneux.

    4) NULL deref in XDP_REDIRECT handling of tun driver, from Toshiaki
    Makita.

    5) Incorrect idr release in cls_flower, from Paul Blakey.

    6) Probe error handling fix in davinci_emac, from Dan Carpenter.

    7) Memory leak in XPS configuration, from Alexander Duyck.

    8) Use after free with cloned sockets in kcm, from Kirill Tkhai.

    9) MTU handling fixes for ip_tunnel and ip6_tunnel, from Nicolas
    Dichtel.

    10) Fix UAPI hole in bpf data structure for 32-bit compat applications,
    from Daniel Borkmann.

    * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (33 commits)
    bpf: fix uapi hole for 32 bit compat applications
    net: usb: cdc_mbim: add flag FLAG_SEND_ZLP
    ip6_tunnel: remove magic mtu value 0xFFF8
    ip_tunnel: restore binding to ifaces with a large mtu
    net: dsa: b53: Add BCM5389 support
    kcm: Fix use-after-free caused by clonned sockets
    net-sysfs: Fix memory leak in XPS configuration
    ixgbe: fix parsing of TC actions for HW offload
    net: ethernet: davinci_emac: fix error handling in probe()
    net/ncsi: Fix array size in dumpit handler
    cls_flower: Fix incorrect idr release when failing to modify rule
    net/sonic: Use dma_mapping_error()
    xfrm Fix potential error pointer dereference in xfrm_bundle_create.
    vhost_net: flush batched heads before trying to busy polling
    tun: Fix NULL pointer dereference in XDP redirect
    be2net: Fix error detection logic for BE3
    net: qmi_wwan: Add Netgear Aircard 779S
    mlxsw: spectrum: Forbid creation of VLAN 1 over port/LAG
    atm: zatm: fix memcmp casting
    iwlwifi: pcie: compare with number of IRQs requested for, not number of CPUs
    ...

    Linus Torvalds
     
  • Pull SCSI fix from James Bottomley:
    "Eve of merge window fix: The original code was so bogus as to be
    casting the wrong generic device to an rport and proceeding to take
    actions based on the bogus values it found.

    Fortunately it seems the location that is dereferenced always exists,
    so the code hasn't oopsed yet, but it certainly annoys the memory
    checkers"

    * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
    scsi: scsi_transport_srp: Fix shost to rport translation

    Linus Torvalds
     
  • Pull drm fixes from Dave Airlie:
    "A few final fixes:

    i915:
    - fix for potential Spectre vector in the new query uAPI
    - fix NULL pointer deref (FDO #106559)
    - DMI fix to hide LVDS for Radiant P845 (FDO #105468)

    amdgpu:
    - suspend/resume DC regression fix
    - underscan flicker fix on fiji
    - gamma setting fix after dpms

    omap:
    - fix oops regression

    core:
    - fix PSR timing

    dw-hdmi:
    - fix oops regression"

    * tag 'drm-fixes-for-v4.17-rc8' of git://people.freedesktop.org/~airlied/linux:
    drm/amd/display: Update color props when modeset is required
    drm/amd/display: Make atomic-check validate underscan changes
    drm/bridge/synopsys: dw-hdmi: fix dw_hdmi_setup_rx_sense
    drm/amd/display: Fix BUG_ON during CRTC atomic check update
    drm/i915/query: nospec expects no more than an unsigned long
    drm/i915/query: Protect tainted function pointer lookup
    drm/i915/lvds: Move acpi lid notification registration to registration phase
    drm/i915: Disable LVDS on Radiant P845
    drm/omap: fix NULL deref crash with SDI displays
    drm/psr: Fix missed entry in PSR setup time table.

    Linus Torvalds
     
  • Two last minute DC fixes for 4.17: a fix for underscan on fiji and
    a fix for gamma settings after dpms.

    * 'drm-fixes-4.17' of git://people.freedesktop.org/~agd5f/linux:
    drm/amd/display: Update color props when modeset is required
    drm/amd/display: Make atomic-check validate underscan changes

    Dave Airlie
     
  • Pull MIPS fixes from James Hogan:
    "A final few MIPS fixes for 4.17:

    - drop Lantiq gphy reboot/remove reset (4.14)

    - prctl(PR_SET_FP_MODE): Disallow FRE without FR (4.0)

    - ptrace(PTRACE_PEEKUSR): Fix 64-bit FGRs (3.15)"

    * tag 'mips_fixes_4.17_3' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux:
    MIPS: ptrace: Fix PTRACE_PEEKUSR requests for 64-bit FGRs
    MIPS: prctl: Disallow FRE without FR with PR_SET_FP_MODE requests
    MIPS: lantiq: gphy: Drop reboot/remove reset asserts

    Linus Torvalds
     
  • Pull VFIO fix from Alex Williamson:
    "Revert a pfn page mapping optimization identified as introducing a bad
    page state regression (Alex Williamson)"

    * tag 'vfio-v4.17' of git://github.com/awilliam/linux-vfio:
    Revert "vfio/type1: Improve memory pinning process for raw PFN mapping"

    Linus Torvalds
     
  • Pull char/misc driver fixes from Greg KH:
    "Here are four small bugfixes for some char/misc drivers. Well, really
    three fixes and one fix for one of those fixes due to problems found
    by 0-day.

    This resolves some reported issues with the hwtracing drivers, and a
    reported regression for the thunderbolt subsystem. All of these have
    been in linux-next for a while now with no reported problems"

    * tag 'char-misc-4.17-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
    hwtracing: stm: fix build error on some arches
    intel_th: Use correct device when freeing buffers
    stm class: Use vmalloc for the master map
    thunderbolt: Handle NULL boot ACL entries properly

    Linus Torvalds
     
  • Pull IIO driver fixes from Greg KH:
    "Here are some old IIO driver fixes that were sitting in my tree for a
    few weeks. Sorry about not getting them to you sooner. They fix a
    number of small IIO driver issues that have been reported.

    All of these have been in linux-next for a while with no reported
    problems"

    * tag 'staging-4.17-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging:
    iio: adc: select buffer for at91-sama5d2_adc
    iio: hid-sensor-trigger: Fix sometimes not powering up the sensor after resume
    iio: adc: at91-sama5d2_adc: fix channel configuration for differential channels
    iio:kfifo_buf: check for uint overflow
    iio:buffer: make length types match kfifo types
    iio: adc: stm32-dfsdm: fix sample rate for div2 spi clock
    iio: adc: stm32-dfsdm: fix successive oversampling settings
    iio: ad7793: implement IIO_CHAN_INFO_SAMP_FREQ

    Linus Torvalds
     
  • Pull rdma fixes from Jason Gunthorpe:
    "Just three small last minute regressions that were found in the last
    week. The Broadcom fix is a bit big for rc7, but since it is fixing
    driver crash regressions that were merged via netdev into rc1, I am
    sending it.

    - bnxt netdev changes merged this cycle caused the bnxt RDMA driver
    to crash under certain situations

    - Arnd found (several, unfortunately) kconfig problems with the
    patches adding INFINIBAND_ADDR_TRANS. Revert this last part for now;
    it will be fixed more fully outside -rc.

    - Subtle change in error code for a uapi function caused breakage in
    userspace. This bug was subtly introduced this cycle"

    * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma:
    IB/core: Fix error code for invalid GID entry
    IB: Revert "remove redundant INFINIBAND kconfig dependencies"
    RDMA/bnxt_re: Fix broken RoCE driver due to recent L2 driver changes

    Linus Torvalds
     
  • Pull i2c fixes from Wolfram Sang:
    "A documentation bugfix and a MAINTAINERS addition"

    * 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
    i2c: ocores: update HDL sources URL
    i2c: xlp9xx: Add MAINTAINERS entry

    Linus Torvalds
     
  • Merge two fixes from Andrew Morton.

    * emailed patches from Andrew Morton:
    mm: fix the NULL mapping case in __isolate_lru_page()
    mm/huge_memory.c: __split_huge_page() use atomic ClearPageDirty()

    Linus Torvalds
     
  • George Boole would have noticed a slight error in 4.16 commit
    69d763fc6d3a ("mm: pin address_space before dereferencing it while
    isolating an LRU page"). Fix it, to match both the comment above it,
    and the original behaviour.

    Although anonymous pages are not marked PageDirty at first, we have an
    old habit of calling SetPageDirty when a page is removed from swap
    cache: so there's a category of ex-swap pages that are easily
    migratable, but were inadvertently excluded from compaction's async
    migration in 4.16.

    Link: http://lkml.kernel.org/r/alpine.LSU.2.11.1805302014001.12558@eggly.anvils
    Fixes: 69d763fc6d3a ("mm: pin address_space before dereferencing it while isolating an LRU page")
    Signed-off-by: Hugh Dickins
    Acked-by: Minchan Kim
    Acked-by: Mel Gorman
    Reported-by: Ivan Kalvachev
    Cc: "Huang, Ying"
    Cc: Jan Kara
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     
  • Swapping load on huge=always tmpfs (with khugepaged tuned up to be very
    eager, but I'm not sure that is relevant) soon hung uninterruptibly,
    waiting for page lock in shmem_getpage_gfp()'s find_lock_entry(), most
    often when "cp -a" was trying to write to a smallish file. Debug showed
    that the page in question was not locked, and page->mapping NULL by now,
    but page->index consistent with having been in a huge page before.

    Reproduced in minutes on a 4.15 kernel, even with 4.17's 605ca5ede764
    ("mm/huge_memory.c: reorder operations in __split_huge_page_tail()") added
    in; but took hours to reproduce on a 4.17 kernel (no idea why).

    The culprit proved to be the __ClearPageDirty() on tails beyond i_size in
    __split_huge_page(): the non-atomic __bitoperation may have been safe when
    4.8's baa355fd3314 ("thp: file pages support for split_huge_page()")
    introduced it, but liable to erase PageWaiters after 4.10's 62906027091f
    ("mm: add PageWaiters indicating tasks are waiting for a page bit").
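
    Why the non-atomic clear is dangerous, shown with C11 atomics and
    hypothetical flag bits (the kernel's page-flag helpers and bit numbers
    differ): a plain load/modify/store can write back a stale word and
    silently drop a bit another CPU set in between, while a single atomic
    AND cannot. The race is replayed step by step rather than with real
    threads, so the result is deterministic.

    ```c
    #include <assert.h>
    #include <stdatomic.h>

    /* Hypothetical flag bits; the kernel's page-flag layout differs. */
    #define TOY_PG_DIRTY   0x1u
    #define TOY_PG_WAITERS 0x2u

    /* Atomic clear: one indivisible read-modify-write, so a bit set
     * concurrently by another CPU cannot be overwritten. */
    static void toy_clear_dirty_atomic(_Atomic unsigned int *flags)
    {
        atomic_fetch_and(flags, ~TOY_PG_DIRTY);
    }

    /* Replays, without real threads, the interleaving that breaks a
     * non-atomic clear. Returns the final flags word. */
    static unsigned int toy_replay_nonatomic_race(void)
    {
        _Atomic unsigned int flags = TOY_PG_DIRTY;
        unsigned int old = atomic_load(&flags);     /* CPU A: load */
        atomic_fetch_or(&flags, TOY_PG_WAITERS);    /* CPU B: waiter arrives */
        atomic_store(&flags, old & ~TOY_PG_DIRTY);  /* CPU A: stale store */
        return atomic_load(&flags);                 /* WAITERS has been lost */
    }
    ```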

    Link: http://lkml.kernel.org/r/alpine.LSU.2.11.1805291841070.3197@eggly.anvils
    Fixes: 62906027091f ("mm: add PageWaiters indicating tasks are waiting for a page bit")
    Signed-off-by: Hugh Dickins
    Acked-by: Kirill A. Shutemov
    Cc: Konstantin Khlebnikov
    Cc: Nicholas Piggin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     

02 Jun, 2018

4 commits

  • Bisection by Amadeusz Sławiński implicates this commit leading to bad
    page state issues after VM shutdown, likely due to unbalanced page
    references. The original commit was intended only as a performance
    improvement, therefore revert for offline rework.

    Link: https://lkml.org/lkml/2018/6/2/97
    Fixes: 356e88ebe447 ("vfio/type1: Improve memory pinning process for raw PFN mapping")
    Cc: Jason Cai (Xiang Feng)
    Reported-by: Amadeusz Sławiński
    Signed-off-by: Alex Williamson

    Alex Williamson
     
  • Daniel Borkmann says:

    ====================
    pull-request: bpf 2018-06-02

    The following pull-request contains BPF updates for your *net* tree.

    The main changes are:

    1) BPF uapi fix in struct bpf_prog_info and struct bpf_map_info in
    order to fix offsets on 32 bit archs.

    This will have a minor merge conflict with net-next which has the
    __u32 gpl_compatible:1 bitfield in struct bpf_prog_info at this
    location. Resolution is to use the gpl_compatible member.
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     
  • On 64-bit, there is a 4-byte hole between ifindex and netns_dev in both
    struct bpf_map_info and struct bpf_prog_info. In net-next, commit
    b85fab0e67b ("bpf: Add gpl_compatible flag to struct bpf_prog_info")
    added a bitfield there to expose some flags related to programs. Thus,
    add an unnamed __u32 bitfield to both so that the layout is the same
    in the 32- and 64-bit cases, and can be naturally extended from there
    as in b85fab0e67b.

    Before:

    # file test.o
    test.o: ELF 32-bit LSB relocatable, Intel 80386, version 1 (SYSV), not stripped
    # pahole test.o
    struct bpf_map_info {
    __u32 type; /* 0 4 */
    __u32 id; /* 4 4 */
    __u32 key_size; /* 8 4 */
    __u32 value_size; /* 12 4 */
    __u32 max_entries; /* 16 4 */
    __u32 map_flags; /* 20 4 */
    char name[16]; /* 24 16 */
    __u32 ifindex; /* 40 4 */
    __u64 netns_dev; /* 44 8 */
    __u64 netns_ino; /* 52 8 */

    /* size: 64, cachelines: 1, members: 10 */
    /* padding: 4 */
    };

    After (same as on 64 bit):

    # file test.o
    test.o: ELF 32-bit LSB relocatable, Intel 80386, version 1 (SYSV), not stripped
    # pahole test.o
    struct bpf_map_info {
    __u32 type; /* 0 4 */
    __u32 id; /* 4 4 */
    __u32 key_size; /* 8 4 */
    __u32 value_size; /* 12 4 */
    __u32 max_entries; /* 16 4 */
    __u32 map_flags; /* 20 4 */
    char name[16]; /* 24 16 */
    __u32 ifindex; /* 40 4 */

    /* XXX 4 bytes hole, try to pack */

    __u64 netns_dev; /* 48 8 */
    __u64 netns_ino; /* 56 8 */
    /* --- cacheline 1 boundary (64 bytes) --- */

    /* size: 64, cachelines: 1, members: 10 */
    /* sum members: 60, holes: 1, sum holes: 4 */
    };
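
    The layout can also be checked from C. This sketch mirrors struct
    bpf_map_info after the fix, with uint32_t/uint64_t standing in for
    __u32/__u64; the struct name is hypothetical. Building it natively on
    x86-64 and with -m32 should give the same offsets, because the unnamed
    bit-field fills the 4-byte gap that i386 alignment rules would
    otherwise close.

    ```c
    #include <assert.h>
    #include <stddef.h>
    #include <stdint.h>

    struct map_info_padded {
        uint32_t type;
        uint32_t id;
        uint32_t key_size;
        uint32_t value_size;
        uint32_t max_entries;
        uint32_t map_flags;
        char name[16];
        uint32_t ifindex;
        uint32_t :32;  /* explicit padding, as in the fix */
        uint64_t netns_dev;
        uint64_t netns_ino;
    };

    /* Compile-time checks of the layout shown in the pahole output above. */
    _Static_assert(offsetof(struct map_info_padded, netns_dev) == 48,
                   "netns_dev must start at offset 48");
    _Static_assert(sizeof(struct map_info_padded) == 64,
                   "struct must be 64 bytes");
    ```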

    Reported-by: Dmitry V. Levin
    Reported-by: Eugene Syromiatnikov
    Fixes: 52775b33bb507 ("bpf: offload: report device information about offloaded maps")
    Fixes: 675fc275a3a2d ("bpf: offload: report device information for offloaded programs")
    Signed-off-by: Daniel Borkmann
    Acked-by: Alexei Starovoitov
    Signed-off-by: Alexei Starovoitov

    Daniel Borkmann
     
  • Testing Telit LM940 with ICMP packets > 14552 bytes revealed that
    the modem needs FLAG_SEND_ZLP to work properly; otherwise the cdc_mbim
    data interface stops responding.

    Signed-off-by: Daniele Palmas
    Acked-by: Bjørn Mork
    Signed-off-by: David S. Miller

    Daniele Palmas