06 Apr, 2016

1 commit


29 Mar, 2016

1 commit

  • Most Maxim PMIC regulator drivers are for sub-devices of Multi-Function
    Devices with drivers under drivers/mfd. But for many of these, the same
    object file name was used for both the MFD and the regulator drivers.

    Having 2 different drivers with the same name causes a lot of confusion
    to Kbuild, especially if these are built as modules, since only one module
    will be installed and exported symbols will be undefined due to being
    overwritten by the other module during modpost.

    For example, it fixes the following issue when both drivers are built as
    modules:

    $ make M=drivers/regulator/
    ...
    CC [M] drivers/regulator//max14577.o
    Building modules, stage 2.
    MODPOST 1 modules
    WARNING: "maxim_charger_calc_reg_current" [drivers/regulator//max14577.ko] undefined!
    WARNING: "maxim_charger_currents" [drivers/regulator//max14577.ko] undefined!

    Reported-by: Chanwoo Choi
    Signed-off-by: Javier Martinez Canillas
    Reviewed-by: Chanwoo Choi
    Signed-off-by: Mark Brown

    Javier Martinez Canillas
     

27 Mar, 2016

6 commits

  • Linus Torvalds
     
  • Pull Ceph updates from Sage Weil:
    "There is quite a bit here, including some overdue refactoring and
    cleanup on the mon_client and osd_client code from Ilya, scattered
    writeback support for CephFS and a pile of bug fixes from Zheng, and a
    few random cleanups and fixes from others"

    [ I already decided not to pull this because of it having been rebased
    recently, but ended up changing my mind after all. Next time I'll
    really hold people to it. Oh well. - Linus ]

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: (34 commits)
    libceph: use KMEM_CACHE macro
    ceph: use kmem_cache_zalloc
    rbd: use KMEM_CACHE macro
    ceph: use lookup request to revalidate dentry
    ceph: kill ceph_get_dentry_parent_inode()
    ceph: fix security xattr deadlock
    ceph: don't request vxattrs from MDS
    ceph: fix mounting same fs multiple times
    ceph: remove unnecessary NULL check
    ceph: avoid updating directory inode's i_size accidentally
    ceph: fix race during filling readdir cache
    libceph: use sizeof_footer() more
    ceph: kill ceph_empty_snapc
    ceph: fix a wrong comparison
    ceph: replace CURRENT_TIME by current_fs_time()
    ceph: scattered page writeback
    libceph: add helper that duplicates last extent operation
    libceph: enable large, variable-sized OSD requests
    libceph: osdc->req_mempool should be backed by a slab pool
    libceph: make r_request msg_size calculation clearer
    ...

    Linus Torvalds
     
  • Pull orangefs filesystem from Mike Marshall.

    This finally merges the long-pending orangefs filesystem, which has been
    much cleaned up with input from Al Viro over the last six months. From
    the documentation file:

    "OrangeFS is an LGPL userspace scale-out parallel storage system. It
    is ideal for large storage problems faced by HPC, BigData, Streaming
    Video, Genomics, Bioinformatics.

    Orangefs, originally called PVFS, was first developed in 1993 by Walt
    Ligon and Eric Blumer as a parallel file system for Parallel Virtual
    Machine (PVM) as part of a NASA grant to study the I/O patterns of
    parallel programs.

    Orangefs features include:

    - Distributes file data among multiple file servers
    - Supports simultaneous access by multiple clients
    - Stores file data and metadata on servers using local file system
    and access methods
    - Userspace implementation is easy to install and maintain
    - Direct MPI support
    - Stateless"

    see Documentation/filesystems/orangefs.txt for more in-depth details.

    * tag 'ofs-pull-tag-1' of git://git.kernel.org/pub/scm/linux/kernel/git/hubcap/linux: (174 commits)
    orangefs: fix orangefs_superblock locking
    orangefs: fix do_readv_writev() handling of error halfway through
    orangefs: have ->kill_sb() evict the VFS side of things first
    orangefs: sanitize ->llseek()
    orangefs-bufmap.h: trim unused junk
    orangefs: saner calling conventions for getting a slot
    orangefs_copy_{to,from}_bufmap(): don't pass bufmap pointer
    orangefs: get rid of readdir_handle_s
    ornagefs: ensure that truncate has an up to date inode size
    orangefs: move code which sets i_link to orangefs_inode_getattr
    orangefs: remove needless wrapper around GFP_KERNEL
    orangefs: remove wrapper around mutex_lock(&inode->i_mutex)
    orangefs: refactor inode type or link_target change detection
    orangefs: use new getattr for revalidate and remove old getattr
    orangefs: use new getattr in inode getattr and permission
    orangefs: use new orangefs_inode_getattr to get size in write and llseek
    orangefs: use new orangefs_inode_getattr to create new inodes
    orangefs: rename orangefs_inode_getattr to orangefs_inode_old_getattr
    orangefs: remove inode->i_lock wrapper
    orangefs: put register_chrdev immediately before register_filesystem
    ...

    Linus Torvalds
     
  • Pull NTB bug fixes from Jon Mason:
    "NTB bug fixes for tasklet from spinning forever, link errors,
    translation window setup, NULL ptr dereference, and ntb-perf errors.

    Also, a modification to the driver API that makes _addr functions
    optional"

    * tag 'ntb-4.6' of git://github.com/jonmason/ntb:
    NTB: Remove _addr functions from ntb_hw_amd
    NTB: Make _addr functions optional in the API
    NTB: Fix incorrect clean up routine in ntb_perf
    NTB: Fix incorrect return check in ntb_perf
    ntb: fix possible NULL dereference
    ntb: add missing setup of translation window
    ntb: stop link work when we do not have memory
    ntb: stop tasklet from spinning forever during shutdown.
    ntb: perf test: fix address space confusion

    Linus Torvalds
     
  • Pull more SCSI updates from James Bottomley:
    "The only new stuff which missed the first pull request is an update to
    the UFS driver.

    The rest is an assortment of bug fixes and minor tweaks which appeared
    recently (some are fixes for recent code and some are stuff spotted
    recently by the checkers or the new gcc-6 compiler [most of Arnd's
    stuff])"

    * tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (32 commits)
    scsi_common: do not clobber fixed sense information
    scsi: ufs: select CONFIG_NLS
    scsi: fc: use get/put_unaligned64 for wwn access
    fnic: move printk()s outside of the critical code section.
    qla2xxx: avoid maybe_uninitialized warning
    megaraid_sas: add missing curly braces in ioctl handler
    lpfc: fix misleading indentation
    scsi_transport_sas: add 'scsi_target_id' sysfs attribute
    scsi_dh_alua: uninitialized variable in alua_check_vpd()
    scsi: ufs-qcom: add printouts of testbus debug registers
    scsi: ufs-qcom: enable/disable the device ref clock
    scsi: ufs-qcom: set PA_Local_TX_LCC_Enable before link startup
    scsi: ufs: add device quirk delay before putting UFS rails in LPM
    scsi: ufs: fix leakage during link off state
    scsi: ufs: tune UniPro parameters to optimize hibern8 exit time
    scsi: ufs: handle non spec compliant bkops behaviour by device
    scsi: ufs: add retry for query descriptors
    scsi: ufs: add error recovery after DL NAC error
    scsi: ufs: make error handling bit faster
    scsi: ufs: disable vccq if it's not needed by UFS device
    ...

    Linus Torvalds
     
  • Commit 0b81d07790726 ("fs crypto: move per-file encryption from f2fs
    tree to fs/crypto") moved the f2fs crypto files to fs/crypto/ and
    renamed the symbol prefixes from "f2fs_" to "fscrypt_" (and from "F2FS_"
    to just "FS" for preprocessor symbols).

    Because of the symbol renaming, it's a bit hard to see it as a file
    move: use

    git show -M30 0b81d07790726

    to lower the rename detection to just 30% similarity and make git show
    the files as renamed (the header file won't be shown as a rename even
    then - since all it contains is symbol definitions, it looks almost
    completely different).

    Even with the renames showing as renames, the diffs are not all that
    easy to read, since so much is just the renames. But Eric Biggers
    noticed that it's not just all renames: the initialization of the
    xts_tweak had been broken too, using the inode number rather than the
    page offset.

    That's not right - it makes the xts_tweak the same for all pages of each
    inode. It _might_ make sense to make the xts_tweak contain both the
    offset _and_ the inode number, but not just the inode number.
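
A minimal sketch of why the inode-only tweak is a problem. The 16-byte tweak size and the helper names (tweak_from_inode, tweak_from_page_index, tweaks_collide) are assumptions made for illustration and are not taken from fs/crypto:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define XTS_TWEAK_SIZE 16  /* assumed tweak size for this sketch */

/* Broken variant: the tweak depends only on the inode number, so every
 * page of the same file gets an identical tweak. */
void tweak_from_inode(uint8_t tweak[XTS_TWEAK_SIZE], uint64_t ino,
                      uint64_t page_index)
{
    (void)page_index;  /* ignored, which is the bug */
    memset(tweak, 0, XTS_TWEAK_SIZE);
    memcpy(tweak, &ino, sizeof(ino));
}

/* Fixed variant: the tweak is derived from the page offset within the
 * file and zero-padded to the full tweak size. */
void tweak_from_page_index(uint8_t tweak[XTS_TWEAK_SIZE], uint64_t page_index)
{
    memset(tweak, 0, XTS_TWEAK_SIZE);
    memcpy(tweak, &page_index, sizeof(page_index));
}

/* Returns 1 if two different pages of one file end up with the same tweak. */
int tweaks_collide(int use_inode, uint64_t ino, uint64_t page_a,
                   uint64_t page_b)
{
    uint8_t a[XTS_TWEAK_SIZE], b[XTS_TWEAK_SIZE];

    assert(page_a != page_b);
    if (use_inode) {
        tweak_from_inode(a, ino, page_a);
        tweak_from_inode(b, ino, page_b);
    } else {
        tweak_from_page_index(a, page_a);
        tweak_from_page_index(b, page_b);
    }
    return memcmp(a, b, XTS_TWEAK_SIZE) == 0;
}
```

With the inode-based variant, any two pages of the same file collide; with the page-index variant they do not.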

    Reported-by: Eric Biggers
    Cc: Jaegeuk Kim
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     

26 Mar, 2016

32 commits

  • Kernel zero day testing warned about address space confusion. A virtual
    iomem address was used where a physical address is expected. The
    offending functions implement an optional part of the API, so they are
    removed. They can be added later, after testing.

    Fixes: a1b3695820aa490e58915d720a1438069813008b

    Signed-off-by: Allen Hubbe
    Acked-by: Xiangliang Yu
    Signed-off-by: Jon Mason

    Allen Hubbe
     
  • * switch orangefs_remount() to taking ORANGEFS_SB(sb) instead of sb
    * remove from the list _before_ orangefs_unmount() - request_mutex
    in the latter will make sure that nothing observed in the loop in
    ORANGEFS_DEV_REMOUNT_ALL handling will get freed until the end
    of loop
    * on removal, keep the forward pointer and zero the back one. That
    way we can drop and regain the spinlock in the loop body (again,
    ORANGEFS_DEV_REMOUNT_ALL one) and still be able to get to the
    rest of the list.

    Signed-off-by: Al Viro
    Signed-off-by: Mike Marshall

    Al Viro
     
  • Error should only be returned if nothing had been read/written.
    Otherwise we need to report a short read/write instead.
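
The rule can be sketched as follows. This is an illustrative user-space model, not the orangefs code: transfer_all, chunk_fn and fail_at are hypothetical names, and -5 stands in for -EIO:

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical per-chunk transfer callback: returns bytes moved or a
 * negative error code. */
typedef ssize_t (*chunk_fn)(size_t offset, size_t len, void *ctx);

/* Transfers up to total bytes in chunks. An error is reported only if
 * nothing was transferred; otherwise the short count is returned. */
ssize_t transfer_all(size_t total, size_t chunk, chunk_fn fn, void *ctx)
{
    size_t done = 0;

    assert(chunk > 0);
    while (done < total) {
        size_t len = total - done < chunk ? total - done : chunk;
        ssize_t ret = fn(done, len, ctx);

        if (ret < 0)
            return done ? (ssize_t)done : ret;  /* short transfer wins */
        if (ret == 0)
            break;
        done += (size_t)ret;
    }
    return (ssize_t)done;
}

/* Demo callback: fails with -5 once the offset reaches the limit in ctx. */
ssize_t fail_at(size_t offset, size_t len, void *ctx)
{
    size_t limit = *(size_t *)ctx;

    if (offset >= limit)
        return -5;
    return (ssize_t)len;
}
```

A failure halfway through a 16-byte transfer therefore yields 8, while a failure on the very first chunk yields the error code.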

    Signed-off-by: Al Viro
    Signed-off-by: Mike Marshall

    Al Viro
     
  • Signed-off-by: Al Viro
    Signed-off-by: Mike Marshall

    Al Viro
     
  • a) open files can't have NULL inodes
    b) it's SEEK_END, not ORANGEFS_SEEK_END; no need to get cute.
    c) make_bad_inode() on lseek()?

    Signed-off-by: Al Viro
    Signed-off-by: Mike Marshall

    Al Viro
     
  • Signed-off-by: Al Viro
    Signed-off-by: Mike Marshall

    Al Viro
     
  • just have it return the slot number or -E... - the caller checks
    the sign anyway
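
A small sketch of this calling convention, assuming a fixed table of 4 slots and hypothetical helpers get_slot() and put_slot(). The real bufmap code may wait for a slot instead of failing; this model simply returns -EBUSY:

```c
#include <assert.h>
#include <errno.h>

#define NSLOTS 4  /* hypothetical slot count for this sketch */

static unsigned char slot_busy[NSLOTS];

/* Returns the index of a free slot (>= 0) or -EBUSY if none is left.
 * One return value carries both the result and the error, so the caller
 * only needs to check the sign. */
int get_slot(void)
{
    for (int i = 0; i < NSLOTS; i++) {
        if (!slot_busy[i]) {
            slot_busy[i] = 1;
            return i;
        }
    }
    return -EBUSY;
}

/* Releases a slot previously returned by get_slot(). */
void put_slot(int slot)
{
    assert(slot >= 0 && slot < NSLOTS);
    slot_busy[slot] = 0;
}
```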

    Signed-off-by: Al Viro
    Signed-off-by: Mike Marshall

    Al Viro
     
  • it's always __orangefs_bufmap

    Signed-off-by: Al Viro
    Signed-off-by: Mike Marshall

    Al Viro
     
  • no point, really - we couldn't keep those across the calls of
    getdents(); it would be too easy to DoS, having all slots exhausted.

    Signed-off-by: Al Viro
    Signed-off-by: Mike Marshall

    Al Viro
     
  • Merge fourth patch-bomb from Andrew Morton:
    "A lot more stuff than expected, sorry. A bunch of ocfs2 reviewing was
    finished off.

    - mhocko's oom-reaper out-of-memory-handler changes

    - ocfs2 fixes and features

    - KASAN feature work

    - various fixes"

    * emailed patches from Andrew Morton: (42 commits)
    thp: fix typo in khugepaged_scan_pmd()
    MAINTAINERS: fill entries for KASAN
    mm/filemap: generic_file_read_iter(): check for zero reads unconditionally
    kasan: test fix: warn if the UAF could not be detected in kmalloc_uaf2
    mm, kasan: stackdepot implementation. Enable stackdepot for SLAB
    arch, ftrace: for KASAN put hard/soft IRQ entries into separate sections
    mm, kasan: add GFP flags to KASAN API
    mm, kasan: SLAB support
    kasan: modify kmalloc_large_oob_right(), add kmalloc_pagealloc_oob_right()
    include/linux/oom.h: remove undefined oom_kills_count()/note_oom_kill()
    mm/page_alloc: prevent merging between isolated and other pageblocks
    drivers/memstick/host/r592.c: avoid gcc-6 warning
    ocfs2: extend enough credits for freeing one truncate record while replaying truncate records
    ocfs2: extend transaction for ocfs2_remove_rightmost_path() and ocfs2_update_edge_lengths() before to avoid inconsistency between inode and et
    ocfs2/dlm: move lock to the tail of grant queue while doing in-place convert
    ocfs2: solve a problem of crossing the boundary in updating backups
    ocfs2: fix occurring deadlock by changing ocfs2_wq from global to local
    ocfs2/dlm: fix BUG in dlm_move_lockres_to_recovery_list
    ocfs2/dlm: fix race between convert and recovery
    ocfs2: fix a deadlock issue in ocfs2_dio_end_io_write()
    ...

    Linus Torvalds
     
  • Pull power management fixlet from Rafael Wysocki:
    "One of commits in my previous pull request changed the permissions of
    drivers/power/avs/rockchip-io-domain.c to executable by mistake"

    * tag 'pm+acpi-4.6-rc1-3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
    Fix permissions of drivers/power/avs/rockchip-io-domain.c

    Linus Torvalds
     
  • Pull ia64 update from Tony Luck:
    "Wire up new system calls p{read,write}v2 for ia64"

    * tag 'please-pull-preadv2' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux:
    [IA64] Enable preadv2 and pwritev2 syscalls for ia64

    Linus Torvalds
     
  • Pull more input updates from Dmitry Torokhov:
    "Second round of updates for the input subsystem.

    The BYD PS/2 protocol driver now uses absolute reporting mode and
    should behave more like other touchpads; Synaptics driver needed to
    extend one of its quirks to a newer firmware version, and a few USB
    drivers got tightened up checks for the contents of their descriptors"

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
    Input: sur40 - fix DMA on stack
    Input: ati_remote2 - fix crashes on detecting device with invalid descriptor
    Input: synaptics - handle spurious release of trackstick buttons, again
    Input: synaptics-rmi4 - remove check of Non-NULL array
    Input: byd - enable absolute mode
    Input: ims-pcu - sanity check against missing interfaces
    Input: melfas_mip4 - add hw_version sysfs attribute

    Linus Torvalds
     
  • !PageLRU should lead to SCAN_PAGE_LRU, not SCAN_SCAN_ABORT result.

    Signed-off-by: Kirill A. Shutemov
    Cc: Ebru Akagunduz
    Cc: Rik van Riel
    Cc: Vlastimil Babka
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kirill A. Shutemov
     
  • Signed-off-by: Andrey Ryabinin
    Cc: Alexander Potapenko
    Acked-by: Dmitry Vyukov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrey Ryabinin
     
  • If
    - generic_file_read_iter() gets called with a zero read length,
    - the read offset is at a page boundary,
    - IOCB_DIRECT is not set
    - and the page in question hasn't made it into the page cache yet,
    then do_generic_file_read() will trigger a readahead with a req_size hint
    of zero.

    Since roundup_pow_of_two(0) is undefined, UBSAN reports

    UBSAN: Undefined behaviour in include/linux/log2.h:63:13
    shift exponent 64 is too large for 64-bit type 'long unsigned int'
    CPU: 3 PID: 1017 Comm: sa1 Tainted: G L 4.5.0-next-20160318+ #14
    [...]
    Call Trace:
    [...]
    [] ondemand_readahead+0x3aa/0x3d0
    [] ? ondemand_readahead+0x3aa/0x3d0
    [] ? find_get_entry+0x2d/0x210
    [] page_cache_sync_readahead+0x63/0xa0
    [] do_generic_file_read+0x80d/0xf90
    [] generic_file_read_iter+0x185/0x420
    [...]
    [] __vfs_read+0x256/0x3d0
    [...]

    when get_init_ra_size() gets called from ondemand_readahead().

    The net effect is that the initial readahead size is arch dependent for
    requested read lengths of zero: for example, since

    1UL << (sizeof(unsigned long) * 8)

    evaluates to 1 on x86 while its result is 0 on ARMv7, the initial readahead
    size becomes 4 on the former and 0 on the latter.

    What's more, whether or not the file access timestamp is updated for zero
    length reads is decided differently for the two cases of IOCB_DIRECT
    being set or cleared: in the first case, generic_file_read_iter()
    explicitly skips updating that timestamp while in the latter case, it is
    always updated through the call to do_generic_file_read().

    According to POSIX, zero length reads "do not modify the last data access
    timestamp" and thus, the IOCB_DIRECT behaviour is POSIXly correct.

    Let generic_file_read_iter() unconditionally check the requested read
    length at its entry and return immediately with success if it is zero.
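
The effect of the early check can be modelled in user space. This is only a sketch: read_iter_sketch, roundup_pow2 and the readahead counter are hypothetical stand-ins, not the kernel functions:

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <sys/types.h>

static int readahead_calls;  /* counts how often readahead was "triggered" */

/* Rounds n up to the next power of two. Zero is rejected up front, since
 * a shift-based implementation has no defined result for it. */
unsigned long roundup_pow2(unsigned long n)
{
    unsigned long p = 1;

    assert(n != 0 && n <= (ULONG_MAX >> 1) + 1);
    while (p < n)
        p <<= 1;
    return p;
}

/* Hypothetical stand-in for the read path: a zero-length read returns
 * success immediately, before any readahead sizing or timestamp update. */
ssize_t read_iter_sketch(size_t count)
{
    if (count == 0)
        return 0;
    readahead_calls++;
    (void)roundup_pow2(count);
    return (ssize_t)count;
}

int readahead_call_count(void)
{
    return readahead_calls;
}
```

Because the zero check sits at the entry of the read function, the readahead sizing code never sees a request size of zero, regardless of architecture.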

    Signed-off-by: Nicolai Stange
    Cc: Al Viro
    Reviewed-by: Jan Kara
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nicolai Stange
     
  • Signed-off-by: Alexander Potapenko
    Acked-by: Andrey Ryabinin
    Cc: Christoph Lameter
    Cc: Pekka Enberg
    Cc: David Rientjes
    Cc: Joonsoo Kim
    Cc: Andrey Konovalov
    Cc: Dmitry Vyukov
    Cc: Steven Rostedt
    Cc: Konstantin Serebryany
    Cc: Dmitry Chernenkov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexander Potapenko
     
  • Implement the stack depot and provide CONFIG_STACKDEPOT. Stack depot
    will allow KASAN to store allocation/deallocation stack traces for memory
    chunks. The stack traces are stored in a hash table and referenced by
    handles which reside in the kasan_alloc_meta and kasan_free_meta
    structures in the allocated memory chunks.

    IRQ stack traces are cut below the IRQ entry point to avoid unnecessary
    duplication.

    Right now stackdepot support is only enabled in SLAB allocator. Once
    KASAN features in SLAB are on par with those in SLUB we can switch SLUB
    to stackdepot as well, thus removing the dependency on SLUB stack
    bookkeeping, which wastes a lot of memory.

    This patch is based on the "mm: kasan: stack depots" patch originally
    prepared by Dmitry Chernenkov.

    Joonsoo has said that he plans to reuse the stackdepot code for the
    mm/page_owner.c debugging facility.
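
The idea of a hash table of stack traces referenced by small handles can be sketched like this. It is a simplified user-space model with fixed-size tables and assumed names (depot_save, depot_fetch, depot_handle_t), not the kernel implementation:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define DEPOT_BUCKETS     64
#define DEPOT_MAX_RECORDS 128
#define DEPOT_MAX_FRAMES  16

typedef uint32_t depot_handle_t;  /* 0 means "no handle" */

struct depot_record {
    uint32_t hash;
    uint32_t nr_frames;
    uintptr_t frames[DEPOT_MAX_FRAMES];
    depot_handle_t next;  /* next record in the same hash bucket */
};

static struct depot_record records[DEPOT_MAX_RECORDS + 1];  /* index 0 unused */
static depot_handle_t buckets[DEPOT_BUCKETS];
static depot_handle_t nr_records;

/* FNV-1a style mixing, chosen only for brevity. */
static uint32_t hash_frames(const uintptr_t *frames, uint32_t n)
{
    uint32_t h = 2166136261u;

    for (uint32_t i = 0; i < n; i++) {
        h ^= (uint32_t)frames[i];
        h *= 16777619u;
    }
    return h;
}

/* Stores a trace once and returns its handle; an identical trace saved
 * again yields the same handle. Returns 0 if the depot is full. */
depot_handle_t depot_save(const uintptr_t *frames, uint32_t n)
{
    assert(n > 0 && n <= DEPOT_MAX_FRAMES);
    uint32_t h = hash_frames(frames, n);
    depot_handle_t *bucket = &buckets[h % DEPOT_BUCKETS];

    for (depot_handle_t i = *bucket; i; i = records[i].next) {
        struct depot_record *r = &records[i];

        if (r->hash == h && r->nr_frames == n &&
            memcmp(r->frames, frames, n * sizeof(*frames)) == 0)
            return i;
    }
    if (nr_records == DEPOT_MAX_RECORDS)
        return 0;

    depot_handle_t id = ++nr_records;
    records[id].hash = h;
    records[id].nr_frames = n;
    memcpy(records[id].frames, frames, n * sizeof(*frames));
    records[id].next = *bucket;
    *bucket = id;
    return id;
}

/* Looks up a trace by handle; returns the number of frames. */
uint32_t depot_fetch(depot_handle_t handle, const uintptr_t **frames)
{
    assert(handle > 0 && handle <= nr_records);
    *frames = records[handle].frames;
    return records[handle].nr_frames;
}
```

Each allocated object then only needs to hold a 4-byte handle rather than a full copy of the trace, which is where the memory saving comes from.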

    [akpm@linux-foundation.org: s/depot_stack_handle/depot_stack_handle_t]
    [aryabinin@virtuozzo.com: comment style fixes]
    Signed-off-by: Alexander Potapenko
    Signed-off-by: Andrey Ryabinin
    Cc: Christoph Lameter
    Cc: Pekka Enberg
    Cc: David Rientjes
    Cc: Joonsoo Kim
    Cc: Andrey Konovalov
    Cc: Dmitry Vyukov
    Cc: Steven Rostedt
    Cc: Konstantin Serebryany
    Cc: Dmitry Chernenkov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexander Potapenko
     
  • KASAN needs to know whether the allocation happens in an IRQ handler.
    This lets us strip everything below the IRQ entry point to reduce the
    number of unique stack traces needed to be stored.

    Move the definition of __irq_entry to <linux/interrupt.h> so that the
    users don't need to pull in <linux/ftrace.h>. Also introduce the
    __softirq_entry macro which is similar to __irq_entry, but puts the
    corresponding functions to the .softirqentry.text section.

    Signed-off-by: Alexander Potapenko
    Acked-by: Steven Rostedt
    Cc: Christoph Lameter
    Cc: Pekka Enberg
    Cc: David Rientjes
    Cc: Joonsoo Kim
    Cc: Andrey Konovalov
    Cc: Dmitry Vyukov
    Cc: Andrey Ryabinin
    Cc: Konstantin Serebryany
    Cc: Dmitry Chernenkov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexander Potapenko
     
  • Add GFP flags to KASAN hooks for future patches to use.

    This patch is based on the "mm: kasan: unified support for SLUB and SLAB
    allocators" patch originally prepared by Dmitry Chernenkov.

    Signed-off-by: Alexander Potapenko
    Cc: Christoph Lameter
    Cc: Pekka Enberg
    Cc: David Rientjes
    Cc: Joonsoo Kim
    Cc: Andrey Konovalov
    Cc: Dmitry Vyukov
    Cc: Andrey Ryabinin
    Cc: Steven Rostedt
    Cc: Konstantin Serebryany
    Cc: Dmitry Chernenkov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexander Potapenko
     
  • Add KASAN hooks to SLAB allocator.

    This patch is based on the "mm: kasan: unified support for SLUB and SLAB
    allocators" patch originally prepared by Dmitry Chernenkov.

    Signed-off-by: Alexander Potapenko
    Cc: Christoph Lameter
    Cc: Pekka Enberg
    Cc: David Rientjes
    Cc: Joonsoo Kim
    Cc: Andrey Konovalov
    Cc: Dmitry Vyukov
    Cc: Andrey Ryabinin
    Cc: Steven Rostedt
    Cc: Konstantin Serebryany
    Cc: Dmitry Chernenkov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexander Potapenko
     
  • This patchset implements SLAB support for KASAN.

    Unlike SLUB, SLAB doesn't store allocation/deallocation stacks for heap
    objects, therefore we reimplement this feature in mm/kasan/stackdepot.c.
    The intention is to ultimately switch SLUB to use this implementation as
    well, which will save a lot of memory (right now SLUB bloats each object
    by 256 bytes to store the allocation/deallocation stacks).

    Also, neither SLUB nor SLAB delays the reuse of freed memory chunks, which
    is necessary for better detection of use-after-free errors. We
    introduce memory quarantine (mm/kasan/quarantine.c), which allows
    delayed reuse of deallocated memory.
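
Delayed reuse can be illustrated with a tiny FIFO quarantine. This is a sketch under simplifying assumptions (a fixed number of slots, no locking, hypothetical names such as quarantine_free); the kernel quarantine is organised differently:

```c
#include <assert.h>
#include <stdlib.h>

#define QUARANTINE_SLOTS 4  /* hypothetical capacity */

static void *quarantine[QUARANTINE_SLOTS];
static unsigned int q_head, q_count;
static unsigned int really_freed;

/* Parks ptr in a FIFO instead of freeing it right away. Only when the
 * FIFO is full is the oldest entry handed back to the allocator, so a
 * stale pointer keeps pointing at unreused memory for a while. */
void quarantine_free(void *ptr)
{
    assert(ptr != NULL);
    if (q_count == QUARANTINE_SLOTS) {
        free(quarantine[q_head]);
        really_freed++;
        q_head = (q_head + 1) % QUARANTINE_SLOTS;
        q_count--;
    }
    quarantine[(q_head + q_count) % QUARANTINE_SLOTS] = ptr;
    q_count++;
}

unsigned int quarantine_size(void)
{
    return q_count;
}

unsigned int quarantine_released(void)
{
    return really_freed;
}
```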

    This patch (of 7):

    Rename kmalloc_large_oob_right() to kmalloc_pagealloc_oob_right(), as
    the test only checks the page allocator functionality. Also reimplement
    kmalloc_large_oob_right() so that the test allocates a large enough
    chunk of memory that still does not trigger the page allocator fallback.

    Signed-off-by: Alexander Potapenko
    Cc: Christoph Lameter
    Cc: Pekka Enberg
    Cc: David Rientjes
    Cc: Joonsoo Kim
    Cc: Andrey Konovalov
    Cc: Dmitry Vyukov
    Cc: Andrey Ryabinin
    Cc: Steven Rostedt
    Cc: Konstantin Serebryany
    Cc: Dmitry Chernenkov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexander Potapenko
     
  • A leftover from commit c32b3cbe0d06 ("oom, PM: make OOM detection in the
    freezer path raceless").

    Signed-off-by: Tetsuo Handa
    Acked-by: Michal Hocko
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Tetsuo Handa
     
  • Hanjun Guo has reported that a CMA stress test causes broken accounting of
    CMA and free pages:

    > Before the test, I got:
    > -bash-4.3# cat /proc/meminfo | grep Cma
    > CmaTotal: 204800 kB
    > CmaFree: 195044 kB
    >
    >
    > After running the test:
    > -bash-4.3# cat /proc/meminfo | grep Cma
    > CmaTotal: 204800 kB
    > CmaFree: 6602584 kB
    >
    > So the freed CMA memory is more than total..
    >
    > Also the MemFree is more than mem total:
    >
    > -bash-4.3# cat /proc/meminfo
    > MemTotal: 16342016 kB
    > MemFree: 22367268 kB
    > MemAvailable: 22370528 kB

    Laura Abbott has confirmed the issue and suspected the freepage accounting
    rewrite around 3.18/4.0 by Joonsoo Kim. Joonsoo had a theory that this is
    caused by unexpected merging between MIGRATE_ISOLATE and MIGRATE_CMA
    pageblocks:

    > CMA isolates MAX_ORDER aligned blocks, but, during the process,
    > partially isolated block exists. If MAX_ORDER is 11 and
    > pageblock_order is 9, two pageblocks make up MAX_ORDER
    > aligned block and I can think following scenario because pageblock
    > (un)isolation would be done one by one.
    >
    > (each character means one pageblock. 'C', 'I' means MIGRATE_CMA,
    > MIGRATE_ISOLATE, respectively.
    >
    > CC -> IC -> II (Isolation)
    > II -> CI -> CC (Un-isolation)
    >
    > If some pages are freed at this intermediate state such as IC or CI,
    > that page could be merged to the other page that is resident on
    > different type of pageblock and it will cause wrong freepage count.

    This was supposed to be prevented by CMA operating on MAX_ORDER blocks,
    but since it doesn't hold the zone->lock between pageblocks, a race
    window does exist.

    It's also likely that unexpected merging can occur between
    MIGRATE_ISOLATE and non-CMA pageblocks. This should be prevented in
    __free_one_page() since commit 3c605096d315 ("mm/page_alloc: restrict
    max order of merging on isolated pageblock"). However, we only check
    the migratetype of the pageblock where buddy merging has been initiated,
    not the migratetype of the buddy pageblock (or group of pageblocks)
    which can be MIGRATE_ISOLATE.

    Joonsoo has suggested checking for buddy migratetype as part of
    page_is_buddy(), but that would add extra checks in allocator hotpath
    and bloat-o-meter has shown significant code bloat (the function is
    inline).

    This patch reduces the bloat at some expense of more complicated code.
    The buddy-merging while-loop in __free_one_page() is initially bounded
    to pageblock_border and without any migratetype checks. The checks are
    placed outside, bumping the max_order if merging is allowed, and
    returning to the while-loop with a statement which can't be possibly
    considered harmful.

    This fixes the accounting bug and also removes the arguably weird state
    in the original commit 3c605096d315 where buddies could be left
    unmerged.
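
The control flow described above can be modelled on a toy buddy allocator. The sketch below assumes tiny values (MAX_ORDER of 4, 4 pages per pageblock) and hypothetical names (toy_free_one_page, free_order, block_mt); it mirrors the shape of the loop, not the kernel code itself:

```c
#include <assert.h>

#define MAX_ORDER       4  /* toy value: orders 0..3 */
#define PAGEBLOCK_ORDER 2  /* toy value: 4 pages per pageblock */
#define NR_PAGES        (1 << (MAX_ORDER - 1))

enum migratetype { MT_MOVABLE, MT_CMA, MT_ISOLATE };

/* Order of the free block starting at each page, or -1 if none. */
int free_order[NR_PAGES];
/* Migratetype of each pageblock. */
enum migratetype block_mt[NR_PAGES >> PAGEBLOCK_ORDER];

void toy_init(void)
{
    for (int i = 0; i < NR_PAGES; i++)
        free_order[i] = -1;
    for (int i = 0; i < (NR_PAGES >> PAGEBLOCK_ORDER); i++)
        block_mt[i] = MT_MOVABLE;
}

/* Frees a block and merges it with free buddies. Returns the final order. */
int toy_free_one_page(unsigned int idx, int order, enum migratetype mt)
{
    int max_order = MAX_ORDER < PAGEBLOCK_ORDER + 1 ? MAX_ORDER
                                                    : PAGEBLOCK_ORDER + 1;
    unsigned int buddy;

    assert(idx < NR_PAGES && order >= 0 && order < max_order);
    assert((idx & ((1u << order) - 1)) == 0);

continue_merging:
    /* Merging inside one pageblock needs no migratetype checks. */
    while (order < max_order - 1) {
        buddy = idx ^ (1u << order);
        if (free_order[buddy] != order)
            goto done_merging;
        free_order[buddy] = -1;
        idx &= buddy;
        order++;
    }
    if (max_order < MAX_ORDER) {
        /* Crossing a pageblock border: refuse to merge if exactly one
         * side is isolated, otherwise raise the bound and keep going. */
        buddy = idx ^ (1u << order);
        enum migratetype buddy_mt = block_mt[buddy >> PAGEBLOCK_ORDER];

        if (mt != buddy_mt && (mt == MT_ISOLATE || buddy_mt == MT_ISOLATE))
            goto done_merging;
        max_order++;
        goto continue_merging;
    }
done_merging:
    free_order[idx] = order;
    return order;
}
```

In the 'IC' intermediate state from Joonsoo's scenario, the CMA block stays at pageblock order; once both pageblocks have the same type, the two blocks merge into one MAX_ORDER-1 block.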

    Fixes: 3c605096d315 ("mm/page_alloc: restrict max order of merging on isolated pageblock")
    Link: https://lkml.org/lkml/2016/3/2/280
    Signed-off-by: Vlastimil Babka
    Reported-by: Hanjun Guo
    Tested-by: Hanjun Guo
    Acked-by: Joonsoo Kim
    Debugged-by: Laura Abbott
    Debugged-by: Joonsoo Kim
    Cc: Mel Gorman
    Cc: "Kirill A. Shutemov"
    Cc: Johannes Weiner
    Cc: Minchan Kim
    Cc: Yasuaki Ishimatsu
    Cc: Zhang Yanfei
    Cc: Michal Nazarewicz
    Cc: Naoya Horiguchi
    Cc: "Aneesh Kumar K.V"
    Cc: [3.18+]
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vlastimil Babka
     
  • The r592 driver relies on behavior of the DMA mapping API that is
    normally observed but not guaranteed by the API. Instead it uses a
    runtime check to fail transfers if the API ever behaves differently.

    When CONFIG_NEED_SG_DMA_LENGTH is not set, one of the checks turns into a
    comparison of a variable with itself, which gcc-6.0 now warns about:

    drivers/memstick/host/r592.c: In function 'r592_transfer_fifo_dma':
    drivers/memstick/host/r592.c:302:31: error: self-comparison always evaluates to false [-Werror=tautological-compare]
    (sg_dma_len(&dev->req->sg) < dev->req->sg.length)) {
    ^

    The check itself is not a problem, so this patch just rephrases the
    condition in a way that gcc does not consider an indication of a mistake.
    We already know that dev->req->sg.length was initially R592_LFIFO_SIZE, so
    we can compare it to that constant again.
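
A sketch of the rephrased condition. The struct, the macro and the value 512 for R592_LFIFO_SIZE are assumptions for illustration; the old form is shown only in a comment because compiling it is what produces the warning:

```c
#include <assert.h>
#include <stddef.h>

#define R592_LFIFO_SIZE 512  /* value assumed for this sketch */

/* Toy scatterlist entry without a separate DMA length field. */
struct toy_sg {
    unsigned int length;
};

/* Without a separate DMA length, the DMA length is just the length. */
#define toy_sg_dma_len(sg) ((sg)->length)

/*
 * Old form (not compiled here):
 *     toy_sg_dma_len(sg) < sg->length
 * which expands to a comparison of sg->length with itself.
 *
 * New form: compare against the constant the length was set from.
 */
int dma_too_short(const struct toy_sg *sg)
{
    assert(sg != NULL);
    return toy_sg_dma_len(sg) < R592_LFIFO_SIZE;
}
```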

    Signed-off-by: Arnd Bergmann
    Cc: Maxim Levitsky
    Cc: Quentin Lambert
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Arnd Bergmann
     
  • Currently, ocfs2_replay_truncate_records() first modifies tl_used and
    then calls ocfs2_extend_trans() to extend the transaction for the gd and
    alloc inode used for freeing clusters. jbd2_journal_restart() may be
    called, and it may happen that tl_used in the truncate log is decreased
    but the clusters are not freed, which means these clusters are lost. So
    we should avoid extending transactions in these two operations.

    Signed-off-by: joyce.xue
    Reviewed-by: Mark Fasheh
    Acked-by: Joseph Qi
    Cc: Joel Becker
    Cc: Junxiao Bi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Xue jiufei
     
  • …e_lengths() before to avoid inconsistency between inode and et

    I found that jbd2_journal_restart() is called in some places without
    first making sure that things are consistent. However,
    jbd2_journal_restart() may commit the handle's transaction and start
    another one. If the first transaction is committed successfully while
    the other is not, the filesystem may become inconsistent or read-only.
    This is an effort to fix this kind of problem.

    This patch (of 3):

    The following functions will be called while truncating an extent:
    ocfs2_remove_btree_range
    -> ocfs2_start_trans
    -> ocfs2_remove_extent
    -> ocfs2_truncate_rec
    -> ocfs2_extend_rotate_transaction
    -> jbd2_journal_restart if jbd2_journal_extend fail
    -> ocfs2_rotate_tree_left
    -> ocfs2_remove_rightmost_path
    -> ocfs2_extend_rotate_transaction
    -> ocfs2_unlink_subtree
    -> ocfs2_update_edge_lengths
    -> ocfs2_extend_trans
    -> jbd2_journal_restart if jbd2_journal_extend fail
    -> ocfs2_et_update_clusters
    -> ocfs2_commit_trans

    jbd2_journal_restart() may be called, and it may happen that the buffers
    dirtied in ocfs2_truncate_rec() are committed while the buffers dirtied in
    ocfs2_et_update_clusters() are not, so the total clusters in the extent
    tree and i_clusters in ocfs2_dinode become inconsistent. The cluster
    count read from ocfs2_dinode is then incorrect, and it also causes a
    read-only problem when ocfs2_commit_truncate() is called, with the error
    message: "Inode %llu has empty extent block at %llu".

    We should extend enough credits for ocfs2_remove_rightmost_path() and
    ocfs2_update_edge_lengths() to avoid this inconsistency.

    Signed-off-by: joyce.xue <xuejiufei@huawei.com>
    Acked-by: Joseph Qi <joseph.qi@huawei.com>
    Cc: Mark Fasheh <mfasheh@suse.com>
    Cc: Joel Becker <jlbec@evilplan.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

    Xue jiufei
     
  • We have found a bug that occurs when two nodes unmount one after another.

    1) Node 1 migrates a lockres that has 3 locks in its grant queue, such as
    N2(PR)N3(NL)N4(PR), to N2. After migration, the lvbs of locks N3(NL) and
    N4(PR) are empty on node 2 because the migration target does not copy
    the lvb to these two locks.

    2) Node 3 wants to convert to PR. It can be granted in
    __dlmconvert_master(), and the order of these locks is unchanged. The
    lvb of lock N3(PR) on node 2 is copied from the lockres in
    dlm_update_lvb(), while the lvb of lock N4(PR) is still empty.

    3) Node 2 wants to leave the domain, so it will migrate this lockres to
    node 3. Node 2 will then trigger the BUG in
    dlm_prepare_lvb_for_migration() when adding lock N4(PR) to mres, with the
    following message, because the lvb of mres has already been copied from
    lock N3(PR) but the lvb of lock N4(PR) is empty.

    "Mismatched lvb in lock cookie=%u:%llu, name=%.*s, node=%u"

    [akpm@linux-foundation.org: tweak comment]
    Signed-off-by: xuejiufei
    Acked-by: Joseph Qi
    Cc: Mark Fasheh
    Cc: Joel Becker
    Cc: Junxiao Bi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    xuejiufei
     
  • In update_backups() there is a boundary-crossing problem, as follows:

    Assume that the lun will be resized to 1TB (cluster_size is 32kb). It
    will then contain clusters 0~33554431. update_backups() will back up the
    super block at the 1TB location, which is cluster 33554432, so the
    backup crosses the volume boundary.
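
The arithmetic can be checked with a short sketch. The helper names cluster_of and backup_fits are hypothetical and only model the bounds check, not the ocfs2 code:

```c
#include <assert.h>
#include <stdint.h>

/* Index of the cluster that contains the given byte offset. */
uint64_t cluster_of(uint64_t byte_offset, uint32_t cluster_size)
{
    assert(cluster_size != 0);
    return byte_offset / cluster_size;
}

/* A backup at byte_offset fits only if its cluster index is below the
 * total number of clusters, i.e. within 0..clusters-1. */
int backup_fits(uint64_t byte_offset, uint64_t volume_bytes,
                uint32_t cluster_size)
{
    assert(cluster_size != 0);
    uint64_t clusters = volume_bytes / cluster_size;

    return cluster_of(byte_offset, cluster_size) < clusters;
}
```

A 1TB volume with 32kb clusters has 2^40 / 2^15 = 2^25 = 33554432 clusters, numbered 0 to 33554431, so a backup placed at the 1TB offset lands on cluster 33554432, one past the end.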

    Signed-off-by: Yiwen Jiang
    Reviewed-by: Joseph Qi
    Cc: Xue jiufei
    Cc: Mark Fasheh
    Cc: Joel Becker
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    jiangyiwen
     
  • This patch fixes a deadlock, as follows:

    1) Node 1: volumes a and b are mounted.
       Node 2: only vol a is mounted.
       Node 3: only vol b is mounted.

    2) Node 2: starts to mount b.
       Node 3: starts to mount a.

    3) Node 2: checks hb of Node 3 in vol a, qs_holds++.
       Node 3: checks hb of Node 2 in vol b, qs_holds++.

    4) All nodes' network goes down.

    5) Node 2: the mount of b fails and ocfs2_dismount_volume() is called,
       but the process hangs because there is a work item in ocfs2_wq that
       cannot be completed. This work item is about vol a, because ocfs2_wq
       is a global wq. The work scheduled in ocfs2_wq is
       ocfs2_orphan_scan_work, which needs to take the inode lock of
       orphan_dir. Because the lockres owner is Node 1 and all nodes'
       network went down at the same time, it can't get the inode lock.
       Node 3: the same situation as Node 2.

    6) Why can't this node be fenced when the network is disconnected?
       Because the mount process is hung, which keeps qs_holds from
       reaching 0.

    All work items in ocfs2_wq relate to a specific super block, so the
    solution is to change ocfs2_wq from global to local. In other words,
    move it into struct ocfs2_super.

    Signed-off-by: Yiwen Jiang
    Reviewed-by: Joseph Qi
    Cc: Xue jiufei
    Cc: Mark Fasheh
    Cc: Joel Becker
    Cc: Junxiao Bi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    jiangyiwen
     
  • When the master handles a convert request, it queues the ast first and
    then returns the status. The ast may be sent before the request status
    because the two messages are sent by two threads. If the master goes
    down right after the ast is sent, it may trigger the BUG in
    dlm_move_lockres_to_recovery_list on the requesting node, because the
    ast handler moves the lock to the grant list without clearing
    lock->convert_pending. So remove the BUG_ON statement and check whether
    the ast has been processed in dlmconvert_remote.

    Signed-off-by: Joseph Qi
    Reported-by: Yiwen Jiang
    Cc: Junxiao Bi
    Cc: Mark Fasheh
    Cc: Joel Becker
    Cc: Tariq Saeed
    Cc: Junxiao Bi
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Joseph Qi
     
  • There is a race window between dlmconvert_remote and
    dlm_move_lockres_to_recovery_list, which can leave a lock with
    OCFS2_LOCK_BUSY set on the grant list, so the system hangs.

    dlmconvert_remote
    {
            spin_lock(&res->spinlock);
            list_move_tail(&lock->list, &res->converting);
            lock->convert_pending = 1;
            spin_unlock(&res->spinlock);

            status = dlm_send_remote_convert_request();
            >>>>>> race window: the master has queued the ast and returned
                   DLM_NORMAL, and then goes down before sending the ast.
                   This node detects that the master is down and calls
                   dlm_move_lockres_to_recovery_list, which will revert the
                   lock to the grant list.
                   Then OCFS2_LOCK_BUSY won't be cleared, as the new master
                   won't send the ast any more because it thinks the
                   conversion has already been authorized.

            spin_lock(&res->spinlock);
            lock->convert_pending = 0;
            if (status != DLM_NORMAL)
                    dlm_revert_pending_convert(res, lock);
            spin_unlock(&res->spinlock);
    }

    In this case, check whether res->state has the DLM_LOCK_RES_RECOVERING
    bit set (res is still recovering) or the res master has changed (the new
    master has finished recovery). If so, reset the status to DLM_RECOVERING
    so that the convert is retried.
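
The decision can be sketched as a small pure function. All names here (toy_status, toy_lockres, convert_result) are hypothetical; the sketch only models the check described in the text, not the actual dlmconvert_remote code:

```c
#include <assert.h>
#include <stddef.h>

enum toy_status { TOY_NORMAL, TOY_RECOVERING, TOY_DENIED };

#define TOY_RES_RECOVERING 0x1u  /* stands in for DLM_LOCK_RES_RECOVERING */

struct toy_lockres {
    unsigned int state;
    int owner;  /* node number of the current master */
};

/* Decides the final status after the remote convert request returned.
 * old_owner is the master that the request was sent to. */
enum toy_status convert_result(const struct toy_lockres *res, int old_owner,
                               enum toy_status status)
{
    assert(res != NULL);
    if (status != TOY_NORMAL)
        return status;  /* real failure: the caller reverts the convert */
    /* The request "succeeded", but the master died or changed in the
     * meantime, so the ast will never arrive: ask the caller to retry. */
    if ((res->state & TOY_RES_RECOVERING) || res->owner != old_owner)
        return TOY_RECOVERING;
    return TOY_NORMAL;
}
```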

    Signed-off-by: Joseph Qi
    Reported-by: Yiwen Jiang
    Reviewed-by: Junxiao Bi
    Cc: Mark Fasheh
    Cc: Joel Becker
    Cc: Tariq Saeed
    Cc: Junxiao Bi
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Joseph Qi