Eric Lee / smarc-fsl-linux-kernel

10 Sep, 2015

3 commits

bd2843fe1 fix ufs write vs readpage race when writing into a hole ... Browse Code »

Followup to the UFS series - with the way we clear the new blocks (via
buffer cache, possibly on more than a page worth of file) we really
should not insert a reference to new block into inode block tree until
after we'd cleared it.

Signed-off-by: Al Viro
Signed-off-by: Linus Torvalds

Al Viro
2015-09-10 01:43:12 +0800
384989b58 Merge branch 'for-next' of git://git.samba.org/sfrench/cifs-2.6 ... Browse Code »

Pull cifs updates from Steve French:
"Small cifs fix and a patch for improved debugging"

* 'for-next' of git://git.samba.org/sfrench/cifs-2.6:
cifs: Fix use-after-free on mid_q_entry
Update cifs version number
Add way to query server fs info for smb3

Linus Torvalds
2015-09-10 00:59:35 +0800
d77e92e27 dax: update PMD fault handler with PMEM API ... Browse Code »

As part of the v4.3 merge window the DAX code was updated by Matthew and
Kirill to handle PMD pages. Also as part of the v4.3 merge window we
updated the DAX code to do proper PMEM flushing (commit 2765cfbb342c:
"dax: update I/O path to do proper PMEM flushing").

The additional code added by the DAX PMD patches also needs to be
updated to properly use the PMEM API. This ensures that after a PMD
fault is handled the zeros written to the newly allocated pages are
durable on the DIMMs.

linux/dax.h is included to get rid of a bunch of sparse warnings.

Signed-off-by: Ross Zwisler
Cc: Matthew Wilcox ,
Cc: Dan Williams
Cc: Kirill Shutemov
Signed-off-by: Linus Torvalds

Ross Zwisler
2015-09-10 00:47:57 +0800

09 Sep, 2015

28 commits

f6f7a6369 Merge branch 'akpm' (patches from Andrew) ... Browse Code »

Merge second patch-bomb from Andrew Morton:
"Almost all of the rest of MM. There was an unusually large amount of
MM material this time"

* emailed patches from Andrew Morton : (141 commits)
zpool: remove no-op module init/exit
mm: zbud: constify the zbud_ops
mm: zpool: constify the zpool_ops
mm: swap: zswap: maybe_preload & refactoring
zram: unify error reporting
zsmalloc: remove null check from destroy_handle_cache()
zsmalloc: do not take class lock in zs_shrinker_count()
zsmalloc: use class->pages_per_zspage
zsmalloc: consider ZS_ALMOST_FULL as migrate source
zsmalloc: partial page ordering within a fullness_list
zsmalloc: use shrinker to trigger auto-compaction
zsmalloc: account the number of compacted pages
zsmalloc/zram: introduce zs_pool_stats api
zsmalloc: cosmetic compaction code adjustments
zsmalloc: introduce zs_can_compact() function
zsmalloc: always keep per-class stats
zsmalloc: drop unused variable `nr_to_migrate'
mm/memblock.c: fix comment in __next_mem_range()
mm/page_alloc.c: fix type information of memoryless node
memory-hotplug: fix comments in zone_spanned_pages_in_node() and zone_spanned_pages_in_node()
...

Linus Torvalds
2015-09-09 08:52:23 +0800
e81b594cd Merge tag 'regmap-v4.3' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap ... Browse Code »

Pull regmap updates from Mark Brown:
"This has been a busy release for regmap.

By far the biggest set of changes here are those from Markus Pargmann
which implement support for block transfers in smbus devices. This
required quite a bit of refactoring but leaves us better able to
handle odd restrictions that controllers may have and with better
performance on smbus.

Other new features include:

- Fix interactions with lockdep for nested regmaps (eg, when a device
using regmap is connected to a bus where the bus controller has a
separate regmap). Lockdep's default class identification is too
crude to work without help.

- Support for must write bitfield operations, useful for operations
which require writing a bit to trigger them from Kuniori Morimoto.

- Support for delaying during register patch application from Nariman
Poushin.

- Support for overriding cache state via the debugfs implementation
from Richard Fitzgerald"

* tag 'regmap-v4.3' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap: (25 commits)
regmap: fix a NULL pointer dereference in __regmap_init
regmap: Support bulk reads for devices without raw formatting
regmap-i2c: Add smbus i2c block support
regmap: Add raw_write/read checks for max_raw_write/read sizes
regmap: regmap max_raw_read/write getter functions
regmap: Introduce max_raw_read/write for regmap_bulk_read/write
regmap: Add missing comments about struct regmap_bus
regmap: No multi_write support if bus->write does not exist
regmap: Split use_single_rw internally into use_single_read/write
regmap: Fix regmap_bulk_write for bus writes
regmap: regmap_raw_read return error on !bus->read
regulator: core: Print at debug level on debugfs creation failure
regmap: Fix regmap_can_raw_write check
regmap: fix typos in regmap.c
regmap: Fix integertypes for register address and value
regmap: Move documentation to regmap.h
regmap: Use different lockdep class for each regmap init call
thermal: sti: Add parentheses around bridge->ops->regmap_init call
mfd: vexpress: Add parentheses around bridge->ops->regmap_init call
regmap: debugfs: Fix misuse of IS_ENABLED
...

Linus Torvalds
2015-09-09 07:48:55 +0800
70c3547e3 hugetlbfs: add hugetlbfs_fallocate() ... Browse Code »

This is based on the shmem version, but it has diverged quite a bit. We
have no swap to worry about, nor the new file sealing. Add
synchronication via the fault mutex table to coordinate page faults,
fallocate allocation and fallocate hole punch.

What this allows us to do is move physical memory in and out of a
hugetlbfs file without having it mapped. This also gives us the ability
to support MADV_REMOVE since it is currently implemented using
fallocate(). MADV_REMOVE lets madvise() remove pages from the middle of
a hugetlbfs file, which wasn't possible before.

hugetlbfs fallocate only operates on whole huge pages.

Based on code by Dave Hansen.

Signed-off-by: Mike Kravetz
Reviewed-by: Naoya Horiguchi
Acked-by: Hillf Danton
Cc: Dave Hansen
Cc: David Rientjes
Cc: Hugh Dickins
Cc: Davidlohr Bueso
Cc: Aneesh Kumar
Cc: Christoph Hellwig
Cc: Michal Hocko
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Mike Kravetz
2015-09-09 06:35:28 +0800
b5cec28d3 hugetlbfs: truncate_hugepages() takes a range of pages ... Browse Code »

Modify truncate_hugepages() to take a range of pages (start, end)
instead of simply start. If an end value of LLONG_MAX is passed, the
current "truncate" functionality is maintained. Existing callers are
modified to pass LLONG_MAX as end of range. By keying off end ==
LLONG_MAX, the routine behaves differently for truncate and hole punch.
Page removal is now synchronized with page allocation via faults by
using the fault mutex table. The hole punch case can experience the
rare region_del error and must handle accordingly.

Add the routine hugetlb_fix_reserve_counts to fix up reserve counts in
the case where region_del returns an error.

Since the routine handles more than just the truncate case, it is
renamed to remove_inode_hugepages(). To be consistent, the routine
truncate_huge_page() is renamed remove_huge_page().

Downstream of remove_inode_hugepages(), the routine
hugetlb_unreserve_pages() is also modified to take a range of pages.
hugetlb_unreserve_pages is modified to detect an error from region_del and
pass it back to the caller.

Signed-off-by: Mike Kravetz
Reviewed-by: Naoya Horiguchi
Acked-by: Hillf Danton
Cc: Dave Hansen
Cc: David Rientjes
Cc: Hugh Dickins
Cc: Davidlohr Bueso
Cc: Aneesh Kumar
Cc: Christoph Hellwig
Cc: Michal Hocko
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Mike Kravetz
2015-09-09 06:35:28 +0800
1bfad99ab hugetlbfs: hugetlb_vmtruncate_list() needs to take a range to delete ... Browse Code »

fallocate hole punch will want to unmap a specific range of pages.
Modify the existing hugetlb_vmtruncate_list() routine to take a
start/end range. If end is 0, this indicates all pages after start
should be unmapped. This is the same as the existing truncate
functionality. Modify existing callers to add 0 as end of range.

Since the routine will be used in hole punch as well as truncate
operations, it is more appropriately renamed to hugetlb_vmdelete_list().

Signed-off-by: Mike Kravetz
Reviewed-by: Naoya Horiguchi
Acked-by: Hillf Danton
Cc: Dave Hansen
Cc: David Rientjes
Cc: Hugh Dickins
Cc: Davidlohr Bueso
Cc: Aneesh Kumar
Cc: Christoph Hellwig
Cc: Michal Hocko
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Mike Kravetz
2015-09-09 06:35:28 +0800
8334b9622 mm: /proc/pid/smaps:: show proportional swap share of the mapping ... Browse Code »

We want to know per-process workingset size for smart memory management
on userland and we use swap(ex, zram) heavily to maximize memory
efficiency so workingset includes swap as well as RSS.

On such system, if there are lots of shared anonymous pages, it's really
hard to figure out exactly how many each process consumes memory(ie, rss
+ wap) if the system has lots of shared anonymous memory(e.g, android).

This patch introduces SwapPss field on /proc//smaps so we can get
more exact workingset size per process.

Bongkyu tested it. Result is below.

1. 50M used swap
SwapTotal: 461976 kB
SwapFree: 411192 kB

$ adb shell cat /proc/*/smaps | grep "SwapPss:" | awk '{sum += $2} END {print sum}';
48236
$ adb shell cat /proc/*/smaps | grep "Swap:" | awk '{sum += $2} END {print sum}';
141184

2. 240M used swap
SwapTotal: 461976 kB
SwapFree: 216808 kB

$ adb shell cat /proc/*/smaps | grep "SwapPss:" | awk '{sum += $2} END {print sum}';
230315
$ adb shell cat /proc/*/smaps | grep "Swap:" | awk '{sum += $2} END {print sum}';
1387744

[akpm@linux-foundation.org: simplify kunmap_atomic() call]
Signed-off-by: Minchan Kim
Reported-by: Bongkyu Kim
Tested-by: Bongkyu Kim
Cc: Hugh Dickins
Cc: Sergey Senozhatsky
Cc: Jonathan Corbet
Cc: Jerome Marchand
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Minchan Kim
2015-09-09 06:35:28 +0800
77bb499bb pagemap: add mmap-exclusive bit for marking pages mapped only here ... Browse Code »

This patch sets bit 56 in pagemap if this page is mapped only once. It
allows to detect exclusively used pages without exposing PFN:

present file exclusive state
0 0 0 non-present
1 1 0 file page mapped somewhere else
1 1 1 file page mapped only here
1 0 0 anon non-CoWed page (shared with parent/child)
1 0 1 anon CoWed page (or never forked)

CoWed pages in (MAP_FILE | MAP_PRIVATE) areas are anon in this context.

MMap-exclusive bit doesn't reflect potential page-sharing via swapcache:
page could be mapped once but has several swap-ptes which point to it.
Application could detect that by swap bit in pagemap entry and touch that
pte via /proc/pid/mem to get real information.

See http://lkml.kernel.org/r/CAEVpBa+_RyACkhODZrRvQLs80iy0sqpdrd0AaP_-tgnX3Y9yNQ@mail.gmail.com

Requested by Mark Williamson.

[akpm@linux-foundation.org: fix spello]
Signed-off-by: Konstantin Khlebnikov
Reviewed-by: Mark Williamson
Tested-by: Mark Williamson
Reviewed-by: Naoya Horiguchi
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Konstantin Khlebnikov
2015-09-09 06:35:28 +0800
1c90308e7 pagemap: hide physical addresses from non-privileged users ... Browse Code »

This patch makes pagemap readable for normal users and hides physical
addresses from them. For some use-cases PFN isn't required at all.

See http://lkml.kernel.org/r/1425935472-17949-1-git-send-email-kirill@shutemov.name

Fixes: ab676b7d6fbf ("pagemap: do not leak physical addresses to non-privileged userspace")
Signed-off-by: Konstantin Khlebnikov
Cc: Naoya Horiguchi
Reviewed-by: Mark Williamson
Tested-by: Mark Williamson
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Konstantin Khlebnikov
2015-09-09 06:35:28 +0800
356515e7b pagemap: rework hugetlb and thp report ... Browse Code »

This patch moves pmd dissection out of reporting loop: huge pages are
reported as bunch of normal pages with contiguous PFNs.

Add missing "FILE" bit in hugetlb vmas.

Signed-off-by: Konstantin Khlebnikov
Reviewed-by: Naoya Horiguchi
Reviewed-by: Mark Williamson
Tested-by: Mark Williamson
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Konstantin Khlebnikov
2015-09-09 06:35:28 +0800
deb945441 pagemap: switch to the new format and do some cleanup ... Browse Code »

This patch removes page-shift bits (scheduled to remove since 3.11) and
completes migration to the new bit layout. Also it cleans messy macro.

Signed-off-by: Konstantin Khlebnikov
Reviewed-by: Naoya Horiguchi
Cc: Mark Williamson
Tested-by: Mark Williamson
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Konstantin Khlebnikov
2015-09-09 06:35:28 +0800
a06db751c pagemap: check permissions and capabilities at open time ... Browse Code »

This patchset makes pagemap useable again in the safe way (after row
hammer bug it was made CAP_SYS_ADMIN-only). This patchset restores access
for non-privileged users but hides PFNs from them.

Also it adds bit 'map-exclusive' which is set if page is mapped only here:
it helps in estimation of working set without exposing pfns and allows to
distinguish CoWed and non-CoWed private anonymous pages.

Second patch removes page-shift bits and completes migration to the new
pagemap format: flags soft-dirty and mmap-exclusive are available only in
the new format.

This patch (of 5):

This patch moves permission checks from pagemap_read() into pagemap_open().

Pointer to mm is saved in file->private_data. This reference pins only
mm_struct itself. /proc/*/mem, maps, smaps already work in the same way.

See http://lkml.kernel.org/r/CA+55aFyKpWrt_Ajzh1rzp_GcwZ4=6Y=kOv8hBz172CFJp6L8Tg@mail.gmail.com

Signed-off-by: Konstantin Khlebnikov
Reviewed-by: Naoya Horiguchi
Reviewed-by: Mark Williamson
Tested-by: Mark Williamson
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Konstantin Khlebnikov
2015-09-09 06:35:28 +0800
46c043ede mm: take i_mmap_lock in unmap_mapping_range() for DAX ... Browse Code »

DAX is not so special: we need i_mmap_lock to protect mapping->i_mmap.

__dax_pmd_fault() uses unmap_mapping_range() shoot out zero page from
all mappings. We need to drop i_mmap_lock there to avoid lock deadlock.

Re-aquiring the lock should be fine since we check i_size after the
point.

Signed-off-by: Kirill A. Shutemov
Cc: Matthew Wilcox
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Kirill A. Shutemov
2015-09-09 06:35:28 +0800
3fdd1b479 dax: use linear_page_index() ... Browse Code »

I was basically open-coding it (thanks to copying code from do_fault()
which probably also needs to be fixed).

Signed-off-by: Matthew Wilcox
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Matthew Wilcox
2015-09-09 06:35:28 +0800
73a6ec47f dax: ensure that zero pages are removed from other processes ... Browse Code »

If the first access to a huge page was a store, there would be no existing
zero pmd in this process's page tables. There could be a zero pmd in
another process's page tables, if it had done a load. We can detect this
case by noticing that the buffer_head returned from the filesystem is New,
and ensure that other processes mapping this huge page have their page
tables flushed.

Signed-off-by: Matthew Wilcox
Reported-by: Kirill A. Shutemov
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Matthew Wilcox
2015-09-09 06:35:28 +0800
d295e3415 dax: don't use set_huge_zero_page() ... Browse Code »

This is another place where DAX assumed that pgtable_t was a pointer.
Open code the important parts of set_huge_zero_page() in DAX and make
set_huge_zero_page() static again.

Signed-off-by: Kirill A. Shutemov
Signed-off-by: Matthew Wilcox
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Kirill A. Shutemov
2015-09-09 06:35:28 +0800
843172978 dax: fix race between simultaneous faults ... Browse Code »

If two threads write-fault on the same hole at the same time, the winner
of the race will return to userspace and complete their store, only to
have the loser overwrite their store with zeroes. Fix this for now by
taking the i_mmap_sem for write instead of read, and do so outside the
call to get_block(). Now the loser of the race will see the block has
already been zeroed, and will not zero it again.

This severely limits our scalability. I have ideas for improving it, but
those can wait for a later patch.

Signed-off-by: Matthew Wilcox
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Matthew Wilcox
2015-09-09 06:35:28 +0800
01a33b4ac ext4: start transaction before calling into DAX ... Browse Code »

Jan Kara pointed out that in the case where we are writing to a hole, we
can end up with a lock inversion between the page lock and the journal
lock. We can avoid this by starting the transaction in ext4 before
calling into DAX. The journal lock nests inside the superblock
pagefault lock, so we have to duplicate that code from dax_fault, like
XFS does.

Signed-off-by: Matthew Wilcox
Cc: Jan Kara
Cc: Theodore Ts'o
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Matthew Wilcox
2015-09-09 06:35:28 +0800
ed923b577 ext4: add ext4_get_block_dax() ... Browse Code »

DAX wants different semantics from any currently-existing ext4 get_block
callback. Unlike ext4_get_block_write(), it needs to honour the
'create' flag, and unlike ext4_get_block(), it needs to be able to
return unwritten extents. So introduce a new ext4_get_block_dax() which
has those semantics.

We could also change ext4_get_block_write() to honour the 'create' flag,
but that might have consequences on other users that I do not currently
understand.

Signed-off-by: Matthew Wilcox
Cc: Theodore Ts'o
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Matthew Wilcox
2015-09-09 06:35:28 +0800
84c4e5e67 dax: improve comment about truncate race ... Browse Code »

Jan Kara pointed out I should be more explicit here about the perils of
racing against truncate. The comment is mostly the same as for the PTE
case.

Signed-off-by: Matthew Wilcox
Cc: Jan Kara
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Matthew Wilcox
2015-09-09 06:35:28 +0800
e676a4c19 ext4: use ext4_get_block_write() for DAX ... Browse Code »

DAX relies on the get_block function either zeroing newly allocated
blocks before they're findable by subsequent calls to get_block, or
marking newly allocated blocks as unwritten. ext4_get_block() cannot
create unwritten extents, but ext4_get_block_write() can.

Signed-off-by: Matthew Wilcox
Reported-by: Andy Rudoff
Cc: Theodore Ts'o
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Matthew Wilcox
2015-09-09 06:35:28 +0800
dd8a2b6c2 fs/dax.c: fix typo in #endif comment ... Browse Code »

Fix typo s/CONFIG_TRANSPARENT_HUGEPAGES/CONFIG_TRANSPARENT_HUGEPAGE/ in
#endif comment introduced by commit 2b26a9206d6a ("dax: add huge page
fault support").

Signed-off-by: Valentin Rothberg
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Valentin Rothberg
2015-09-09 06:35:28 +0800
acd76e74d xfs: huge page fault support ... Browse Code »

Use DAX to provide support for huge pages.

Signed-off-by: Matthew Wilcox
Cc: Hillf Danton
Cc: "Kirill A. Shutemov"
Cc: Theodore Ts'o
Cc: Jan Kara
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Matthew Wilcox
2015-09-09 06:35:28 +0800
11bd1a9ec ext4: huge page fault support ... Browse Code »

Use DAX to provide support for huge pages.

Signed-off-by: Matthew Wilcox
Cc: Hillf Danton
Cc: "Kirill A. Shutemov"
Cc: Theodore Ts'o
Cc: Jan Kara
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Matthew Wilcox
2015-09-09 06:35:28 +0800
e7b1ea2ad ext2: huge page fault support ... Browse Code »

Use DAX to provide support for huge pages.

Signed-off-by: Matthew Wilcox
Cc: Hillf Danton
Cc: "Kirill A. Shutemov"
Cc: Theodore Ts'o
Cc: Jan Kara
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Matthew Wilcox
2015-09-09 06:35:28 +0800
844f35db1 dax: add huge page fault support ... Browse Code »

This is the support code for DAX-enabled filesystems to allow them to
provide huge pages in response to faults.

Signed-off-by: Matthew Wilcox
Cc: Hillf Danton
Cc: "Kirill A. Shutemov"
Cc: Theodore Ts'o
Cc: Jan Kara
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Matthew Wilcox
2015-09-09 06:35:28 +0800
c94c2acf8 dax: move DAX-related functions to a new header ... Browse Code »

In order to handle the !CONFIG_TRANSPARENT_HUGEPAGES case, we need to
return VM_FAULT_FALLBACK from the inlined dax_pmd_fault(), which is
defined in linux/mm.h. Given that we don't want to include
in , the easiest solution is to move the DAX-related
functions to a new header, . We could also have moved
VM_FAULT_* definitions to a new header, or a different header that isn't
quite such a boil-the-ocean header as , but this felt like
the best option.

Signed-off-by: Matthew Wilcox
Cc: Hillf Danton
Cc: "Kirill A. Shutemov"
Cc: Theodore Ts'o
Cc: Jan Kara
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Matthew Wilcox
2015-09-09 06:35:28 +0800
12f03ee60 Merge tag 'libnvdimm-for-4.3' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm ... Browse Code »

Pull libnvdimm updates from Dan Williams:
"This update has successfully completed a 0day-kbuild run and has
appeared in a linux-next release. The changes outside of the typical
drivers/nvdimm/ and drivers/acpi/nfit.[ch] paths are related to the
removal of IORESOURCE_CACHEABLE, the introduction of memremap(), and
the introduction of ZONE_DEVICE + devm_memremap_pages().

Summary:

- Introduce ZONE_DEVICE and devm_memremap_pages() as a generic
mechanism for adding device-driver-discovered memory regions to the
kernel's direct map.

This facility is used by the pmem driver to enable pfn_to_page()
operations on the page frames returned by DAX ('direct_access' in
'struct block_device_operations').

For now, the 'memmap' allocation for these "device" pages comes
from "System RAM". Support for allocating the memmap from device
memory will arrive in a later kernel.

- Introduce memremap() to replace usages of ioremap_cache() and
ioremap_wt(). memremap() drops the __iomem annotation for these
mappings to memory that do not have i/o side effects. The
replacement of ioremap_cache() with memremap() is limited to the
pmem driver to ease merging the api change in v4.3.

Completion of the conversion is targeted for v4.4.

- Similar to the usage of memcpy_to_pmem() + wmb_pmem() in the pmem
driver, update the VFS DAX implementation and PMEM api to provide
persistence guarantees for kernel operations on a DAX mapping.

- Convert the ACPI NFIT 'BLK' driver to map the block apertures as
cacheable to improve performance.

- Miscellaneous updates and fixes to libnvdimm including support for
issuing "address range scrub" commands, clarifying the optimal
'sector size' of pmem devices, a clarification of the usage of the
ACPI '_STA' (status) property for DIMM devices, and other minor
fixes"

* tag 'libnvdimm-for-4.3' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: (34 commits)
libnvdimm, pmem: direct map legacy pmem by default
libnvdimm, pmem: 'struct page' for pmem
libnvdimm, pfn: 'struct page' provider infrastructure
x86, pmem: clarify that ARCH_HAS_PMEM_API implies PMEM mapped WB
add devm_memremap_pages
mm: ZONE_DEVICE for "device memory"
mm: move __phys_to_pfn and __pfn_to_phys to asm/generic/memory_model.h
dax: drop size parameter to ->direct_access()
nd_blk: change aperture mapping from WC to WB
nvdimm: change to use generic kvfree()
pmem, dax: have direct_access use __pmem annotation
dax: update I/O path to do proper PMEM flushing
pmem: add copy_from_iter_pmem() and clear_pmem()
pmem, x86: clean up conditional pmem includes
pmem: remove layer when calling arch_has_wmb_pmem()
pmem, x86: move x86 PMEM API to new pmem.h header
libnvdimm, e820: make CONFIG_X86_PMEM_LEGACY a tristate option
pmem: switch to devm_ allocations
devres: add devm_memremap
libnvdimm, btt: write and validate parent_uuid
...

Linus Torvalds
2015-09-09 05:35:59 +0800
b9ffce9ae Merge tag 'ecryptfs-4.3-rc1-stale-dcache' of git://git.kernel.org/pub/scm/linux/… ... Browse Code »

…kernel/git/tyhicks/ecryptfs

Pull ecryptfs fixes from Tyler Hicks:
"Invalidate stale eCryptfs dcache entries caused by unlinked lower
inodes"

* tag 'ecryptfs-4.3-rc1-stale-dcache' of git://git.kernel.org/pub/scm/linux/kernel/git/tyhicks/ecryptfs:
eCryptfs: Delete a check before the function call "key_put"
eCryptfs: Invalidate dcache entries when lower i_nlink is zero

Linus Torvalds
2015-09-09 02:26:17 +0800

08 Sep, 2015

4 commits

4e4adb2f4 Merge tag 'nfs-for-4.3-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs ... Browse Code »

Pull NFS client updates from Trond Myklebust:
"Highlights include:

Stable patches:
- Fix atomicity of pNFS commit list updates
- Fix NFSv4 handling of open(O_CREAT|O_EXCL|O_RDONLY)
- nfs_set_pgio_error sometimes misses errors
- Fix a thinko in xs_connect()
- Fix borkage in _same_data_server_addrs_locked()
- Fix a NULL pointer dereference of migration recovery ops for v4.2
client
- Don't let the ctime override attribute barriers.
- Revert "NFSv4: Remove incorrect check in can_open_delegated()"
- Ensure flexfiles pNFS driver updates the inode after write finishes
- flexfiles must not pollute the attribute cache with attrbutes from
the DS
- Fix a protocol error in layoutreturn
- Fix a protocol issue with NFSv4.1 CLOSE stateids

Bugfixes + cleanups
- pNFS blocks bugfixes from Christoph
- Various cleanups from Anna
- More fixes for delegation corner cases
- Don't fsync twice for O_SYNC/IS_SYNC files
- Fix pNFS and flexfiles layoutstats bugs
- pnfs/flexfiles: avoid duplicate tracking of mirror data
- pnfs: Fix layoutget/layoutreturn/return-on-close serialisation
issues
- pnfs/flexfiles: error handling retries a layoutget before fallback
to MDS

Features:
- Full support for the OPEN NFS4_CREATE_EXCLUSIVE4_1 mode from
Kinglong
- More RDMA client transport improvements from Chuck
- Removal of the deprecated ib_reg_phys_mr() and ib_rereg_phys_mr()
verbs from the SUNRPC, Lustre and core infiniband tree.
- Optimise away the close-to-open getattr if there is no cached data"

* tag 'nfs-for-4.3-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (108 commits)
NFSv4: Respect the server imposed limit on how many changes we may cache
NFSv4: Express delegation limit in units of pages
Revert "NFS: Make close(2) asynchronous when closing NFS O_DIRECT files"
NFS: Optimise away the close-to-open getattr if there is no cached data
NFSv4.1/flexfiles: Clean up ff_layout_write_done_cb/ff_layout_commit_done_cb
NFSv4.1/flexfiles: Mark the layout for return in ff_layout_io_track_ds_error()
nfs: Remove unneeded checking of the return value from scnprintf
nfs: Fix truncated client owner id without proto type
NFSv4.1/flexfiles: Mark layout for return if the mirrors are invalid
NFSv4.1/flexfiles: RW layouts are valid only if all mirrors are valid
NFSv4.1/flexfiles: Fix incorrect usage of pnfs_generic_mark_devid_invalid()
NFSv4.1/flexfiles: Fix freeing of mirrors
NFSv4.1/pNFS: Don't request a minimal read layout beyond the end of file
NFSv4.1/pnfs: Handle LAYOUTGET return values correctly
NFSv4.1/pnfs: Don't ask for a read layout for an empty file.
NFSv4.1: Fix a protocol issue with CLOSE stateids
NFSv4.1/flexfiles: Don't mark the entire deviceid as bad for file errors
SUNRPC: Prevent SYN+SYNACK+RST storms
SUNRPC: xs_reset_transport must mark the connection as disconnected
NFSv4.1/pnfs: Ensure layoutreturn reserves space for the opaque payload
...

Linus Torvalds
2015-09-08 05:02:24 +0800
77a78806c Merge tag 'xfs-for-linus-4.3' of git://git.kernel.org/pub/scm/linux/kernel/git/dgc/linux-xfs ... Browse Code »

Pull xfs updates from Dave Chinner:
"There isn't a whole lot to this update - it's mostly bug fixes and
they are spread pretty much all over XFS. There are some corruption
fixes, some fixes for log recovery, some fixes that prevent unount
from hanging, a lockdep annotation rework for inode locking to prevent
false positives and the usual random bunch of cleanups and minor
improvements.

Deatils:

- large rework of EFI/EFD lifecycle handling to fix log recovery
corruption issues, crashes and unmount hangs

- separate metadata UUID on disk to enable changing boot label UUID
for v5 filesystems

- fixes for gcc miscompilation on certain platforms and optimisation
levels

- remote attribute allocation and recovery corruption fixes

- inode lockdep annotation rework to fix bugs with too many
subclasses

- directory inode locking changes to prevent lockdep false positives

- a handful of minor corruption fixes

- various other small cleanups and bug fixes"

* tag 'xfs-for-linus-4.3' of git://git.kernel.org/pub/scm/linux/kernel/git/dgc/linux-xfs: (42 commits)
xfs: fix error gotos in xfs_setattr_nonsize
xfs: add mssing inode cache attempts counter increment
xfs: return errors from partial I/O failures to files
libxfs: bad magic number should set da block buffer error
xfs: fix non-debug build warnings
xfs: collapse allocsize and biosize mount option handling
xfs: Fix file type directory corruption for btree directories
xfs: lockdep annotations throw warnings on non-debug builds
xfs: Fix uninitialized return value in xfs_alloc_fix_freelist()
xfs: inode lockdep annotations broke non-lockdep build
xfs: flush entire file on dio read/write to cached file
xfs: Fix xfs_attr_leafblock definition
libxfs: readahead of dir3 data blocks should use the read verifier
xfs: stop holding ILOCK over filldir callbacks
xfs: clean up inode lockdep annotations
xfs: swap leaf buffer into path struct atomically during path shift
xfs: relocate sparse inode mount warning
xfs: dquots should be stamped with sb_meta_uuid
xfs: log recovery needs to validate against sb_meta_uuid
xfs: growfs not aware of sb_meta_uuid
...

Linus Torvalds
2015-09-08 04:28:32 +0800
5445b1fbd NFSv4: Respect the server imposed limit on how many changes we may cache ... Browse Code »

The NFSv4 delegation spec allows the server to tell a client to limit how
much data it cache after the file is closed. In return, the server
guarantees enough free space to avoid ENOSPC situations, etc.
Prior to this patch, we assumed we could always cache aggressively after
close. Unfortunately, this causes problems with servers that set the
limit to 0 and therefore do not offer any ENOSPC guarantees.

Signed-off-by: Trond Myklebust

Trond Myklebust
2015-09-08 00:36:17 +0800
7d160a6c4 NFSv4: Express delegation limit in units of pages ... Browse Code »

Since we're tracking modifications to the page cache on a per-page
basis, it makes sense to express the limit to how much we may cache
in units of pages.

Signed-off-by: Trond Myklebust

Trond Myklebust
2015-09-08 00:36:13 +0800

06 Sep, 2015

4 commits

7d9071a09 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs ... Browse Code »

Pull vfs updates from Al Viro:
"In this one:

- d_move fixes (Eric Biederman)

- UFS fixes (me; locking is mostly sane now, a bunch of bugs in error
handling ought to be fixed)

- switch of sb_writers to percpu rwsem (Oleg Nesterov)

- superblock scalability (Josef Bacik and Dave Chinner)

- swapon(2) race fix (Hugh Dickins)"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (65 commits)
vfs: Test for and handle paths that are unreachable from their mnt_root
dcache: Reduce the scope of i_lock in d_splice_alias
dcache: Handle escaped paths in prepend_path
mm: fix potential data race in SyS_swapon
inode: don't softlockup when evicting inodes
inode: rename i_wb_list to i_io_list
sync: serialise per-superblock sync operations
inode: convert inode_sb_list_lock to per-sb
inode: add hlist_fake to avoid the inode hash lock in evict
writeback: plug writeback at a high level
change sb_writers to use percpu_rw_semaphore
shift percpu_counter_destroy() into destroy_super_work()
percpu-rwsem: kill CONFIG_PERCPU_RWSEM
percpu-rwsem: introduce percpu_rwsem_release() and percpu_rwsem_acquire()
percpu-rwsem: introduce percpu_down_read_trylock()
document rwsem_release() in sb_wait_write()
fix the broken lockdep logic in __sb_start_write()
introduce __sb_writers_{acquired,release}() helpers
ufs_inode_get{frag,block}(): get rid of 'phys' argument
ufs_getfrag_block(): tidy up a bit
...

Linus Torvalds
2015-09-06 11:34:28 +0800
bd7796699 Merge tag 'for-linus-4.3-merge-window-part-1' of git://git.kernel.org/pub/scm/li… ... Browse Code »

…nux/kernel/git/ericvh/v9fs

Pull 9p updates from Eric Van Hensbergen:
"Just a few cleanups for 4.3 merge window for the 9p file system. I've
gotten several more over the past week, but this group has been in
for-next for at least a couple of weeks so I figured I'd push them
first while I test the rest.

Most of the ones not in this set are bug-fixes anyways so I could hold
them for rc1"

* tag 'for-linus-4.3-merge-window-part-1' of git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs:
9p: fix return code of read() when count is 0
9p: remove unused option Opt_trans

Linus Torvalds
2015-09-06 11:33:10 +0800
17447717a Merge tag 'nfsd-4.3' of git://linux-nfs.org/~bfields/linux ... Browse Code »

Pull nfsd updates from Bruce Fields:
"Nothing major, but:

- Add Jeff Layton as an nfsd co-maintainer: no change to existing
practice, just an acknowledgement of the status quo.

- Two patches ("nfsd: ensure that...") for a race overlooked by the
state locking rewrite, causing a crash noticed by multiple users.

- Lots of smaller bugfixes all over from Kinglong Mee.

- From Jeff, some cleanup of server rpc code in preparation for
possible shift of nfsd threads to workqueues"

* tag 'nfsd-4.3' of git://linux-nfs.org/~bfields/linux: (52 commits)
nfsd: deal with DELEGRETURN racing with CB_RECALL
nfsd: return CLID_INUSE for unexpected SETCLIENTID_CONFIRM case
nfsd: ensure that delegation stateid hash references are only put once
nfsd: ensure that the ol stateid hash reference is only put once
net: sunrpc: fix tracepoint Warning: unknown op '->'
nfsd: allow more than one laundry job to run at a time
nfsd: don't WARN/backtrace for invalid container deployment.
fs: fix fs/locks.c kernel-doc warning
nfsd: Add Jeff Layton as co-maintainer
NFSD: Return word2 bitmask if setting security label in OPEN/CREATE
NFSD: Set the attributes used to store the verifier for EXCLUSIVE4_1
nfsd: SUPPATTR_EXCLCREAT must be encoded before SECURITY_LABEL.
nfsd: Fix an FS_LAYOUT_TYPES/LAYOUT_TYPES encode bug
NFSD: Store parent's stat in a separate value
nfsd: Fix two typos in comments
lockd: NLM grace period shouldn't block NFSv4 opens
nfsd: include linux/nfs4.h in export.h
sunrpc: Switch to using hash list instead single list
sunrpc/nfsd: Remove redundant code by exports seq_operations functions
sunrpc: Store cache_detail in seq_file's private directly
...

Linus Torvalds
2015-09-06 08:26:24 +0800
22365979a Merge branch 'for-linus-4.3' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs ... Browse Code »

Pull btrfs updates from Chris Mason:
"This has Jeff Mahoney's long standing trim patch that fixes corners
where trims were missing. Omar has some raid5/6 fixes, especially for
using scrub and device replace when devices are missing.

Zhao Lie continues cleaning and fixing things, this series fixes some
really hard to hit corners in xfstests. I had to pull it last merge
window due to some deadlocks, but those are now resolved.

I added support for Tejun's new blkio controllers. It seems to work
well for single devices, we'll expand to multi-device as well"

* 'for-linus-4.3' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: (47 commits)
btrfs: fix compile when block cgroups are not enabled
Btrfs: fix file read corruption after extent cloning and fsync
Btrfs: check if previous transaction aborted to avoid fs corruption
btrfs: use __GFP_NOFAIL in alloc_btrfs_bio
btrfs: Prevent from early transaction abort
btrfs: Remove unused arguments in tree-log.c
btrfs: Remove useless condition in start_log_trans()
Btrfs: add support for blkio controllers
Btrfs: remove unused mutex from struct 'btrfs_fs_info'
Btrfs: fix parity scrub of RAID 5/6 with missing device
Btrfs: fix device replace of a missing RAID 5/6 device
Btrfs: add RAID 5/6 BTRFS_RBIO_REBUILD_MISSING operation
Btrfs: count devices correctly in readahead during RAID 5/6 replace
Btrfs: remove misleading handling of missing device scrub
btrfs: fix clone / extent-same deadlocks
Btrfs: fix defrag to merge tail file extent
Btrfs: fix warning in backref walking
btrfs: Add WARN_ON() for double lock in btrfs_tree_lock()
btrfs: Remove root argument in extent_data_ref_count()
btrfs: Fix wrong comment of btrfs_alloc_tree_block()
...

Linus Torvalds
2015-09-06 06:14:43 +0800

05 Sep, 2015

1 commit

5477e70a6 mm: move ->mremap() from file_operations to vm_operations_struct ... Browse Code »

vma->vm_ops->mremap() looks more natural and clean in move_vma(), and this
way ->mremap() can have more users. Say, vdso.

While at it, s/aio_ring_remap/aio_ring_mremap/.

Note: this is the minimal change before ->mremap() finds another user in
file_operations; this method should have more arguments, and it can be
used to kill arch_remap().

Signed-off-by: Oleg Nesterov
Acked-by: Pavel Emelyanov
Acked-by: Kirill A. Shutemov
Cc: David Rientjes
Cc: Benjamin LaHaise
Cc: Hugh Dickins
Cc: Jeff Moyer
Cc: Laurent Dufour
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Oleg Nesterov
2015-09-05 07:54:41 +0800