Eric Lee / smarc-fsl-linux-kernel

22 Feb, 2018

1 commit

ef7fd28b1 ext4: correct documentation for grpid mount option

commit 9f0372488cc9243018a812e8cfbf27de650b187b upstream.

The grpid option is currently described as being the same as nogrpid.

Signed-off-by: Ernesto A. Fernández
Signed-off-by: Theodore Ts'o
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman

Ernesto A. Fernández
2018-02-22 22:42:26 +0800

06 Oct, 2017

1 commit

8d4ef4e15 Merge branch 'overlayfs-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs

Pull overlayfs fixes from Miklos Szeredi:
"Fix a regression in 4.14 and one in 4.13. The latter is a case when
Docker is doing something it really shouldn't and gets away with it.
We now print a warning instead of erroring out.

There are also fixes to several error paths"

* 'overlayfs-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs:
ovl: fix regression caused by exclusive upper/work dir protection
ovl: fix missing unlock_rename() in ovl_do_copy_up()
ovl: fix dentry leak in ovl_indexdir_cleanup()
ovl: fix dput() of ERR_PTR in ovl_cleanup_index()
ovl: fix error value printed in ovl_lookup_index()
ovl: fix may_write_real() for overlayfs directories

Linus Torvalds
2017-10-06 23:52:53 +0800

05 Oct, 2017

1 commit

85fdee1ee ovl: fix regression caused by exclusive upper/work dir protection

Enforcing exclusive ownership on upper/work dirs caused a docker
regression: https://github.com/moby/moby/issues/34672.

Euan spotted the regression and pointed to the offending commit.
Vivek has brought the regression to my attention and provided this
reproducer:

Terminal 1:

mount -t overlay -o workdir=work,lowerdir=lower,upperdir=upper none merged/

Terminal 2:

unshare -m

Terminal 1:

umount merged
mount -t overlay -o workdir=work,lowerdir=lower,upperdir=upper none merged/
mount: /root/overlay-testing/merged: none already mounted or mount point busy

To fix the regression, I replaced the error with an alarming warning.
With index feature enabled, mount does fail, but logs a suggestion to
override exclusive dir protection by disabling index.
Note that index=off mount does take the inuse locks, so a concurrent
index=off will issue the warning and a concurrent index=on mount will fail.

Documentation was updated to reflect this change.

Fixes: 2cac0c00a6cd ("ovl: get exclusive ownership on upper/work dirs")
Cc: # v4.13
Reported-by: Euan Kemp
Reported-by: Vivek Goyal
Signed-off-by: Amir Goldstein
Signed-off-by: Miklos Szeredi

Amir Goldstein
2017-10-05 21:53:18 +0800

03 Oct, 2017

1 commit

c4142ed60 Merge tag 'driver-core-4.14-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core

Pull driver core fixes from Greg KH:
"Here are a few small fixes for 4.14-rc4.

The removal of DRIVER_ATTR() was almost completed by 4.14-rc1, but one
straggler made it in through some other tree (odds are, one of
mine...) So there's a simple removal of the last user, and then
finally the macro is removed from the tree.

There's a fix for old crazy udev instances that insist on reloading a
module when it is removed from the kernel due to the new uevents for
bind/unbind. This fixes the reported regression, hopefully some year
in the future we can drop the workaround, once users update to the
latest version, but I'm not holding my breath.

And then there's a build fix for a linker warning, and a buffer
overflow fix to match the PCI fixes you took through the PCI tree in
the same area.

All of these have been in linux-next for a few weeks while I've been
traveling, sorry for the delay"

* tag 'driver-core-4.14-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
driver core: remove DRIVER_ATTR
fpga: altera-cvp: remove DRIVER_ATTR() usage
driver core: platform: Don't read past the end of "driver_override" buffer
base: arch_topology: fix section mismatch build warnings
driver core: suppress sending MODALIAS in UNBIND uevents

Linus Torvalds
2017-10-03 23:57:07 +0800

19 Sep, 2017

2 commits

24420862b Merge tag '4.14-smb3-multidialect-support-and-fixes-for-stable' of git://git.samba.org/sfrench/cifs-2.6

Pull cifs fixes from Steve French:
"Convert default dialect to smb2.1 or later to allow connecting to
Windows 7 for example, also includes some fixes for stable"

* tag '4.14-smb3-multidialect-support-and-fixes-for-stable' of git://git.samba.org/sfrench/cifs-2.6:
Update version of cifs module
cifs: hide unused functions
SMB3: Add support for multidialect negotiate (SMB2.1 and later)
CIFS/SMB3: Update documentation to reflect SMB3 and various changes
cifs: check rsp for NULL before dereferencing in SMB2_open

Linus Torvalds
2017-09-19 23:35:42 +0800

850fdec8d driver core: remove DRIVER_ATTR

DRIVER_ATTR is no longer in use, and driver authors should be using
DRIVER_ATTR_RW() or DRIVER_ATTR_RO() or DRIVER_ATTR_WO() instead in
order to always get the permissions correct. So remove it so that no
one can use it anymore.

Acked-by: Alan Tull
Reviewed-by: Moritz Fischer
Signed-off-by: Greg Kroah-Hartman

Greg Kroah-Hartman
2017-09-19 15:20:33 +0800

17 Sep, 2017

1 commit

ec11653b5 CIFS/SMB3: Update documentation to reflect SMB3 and various changes

Signed-off-by: Steve French
Reviewed-by: Aurelien Aptel
Reviewed-by: Pavel Shilovsky

Steve French
2017-09-17 23:48:00 +0800

16 Sep, 2017

1 commit

30db202e5 Merge tag 'for-linus-4.14-ofs2' of git://git.kernel.org/pub/scm/linux/kernel/git/hubcap/linux

Pull orangefs updates from Mike Marshall:
"Some cleanups and a big bug fix for ACLs.

When I was reviewing Jan Kara's ACL patch, I realized that Orangefs
ACL code was busted, not just in the kernel module, but in the server
as well. I've been working on the code in the server mostly, but
here's one kernel patch, there will be more"

* tag 'for-linus-4.14-ofs2' of git://git.kernel.org/pub/scm/linux/kernel/git/hubcap/linux:
orangefs: Adjust three checks for null pointers
orangefs: Use kcalloc() in orangefs_prepare_cdm_array()
orangefs: Delete error messages for a failed memory allocation in five functions
orangefs: constify xattr_handler structure
orangefs: don't call filemap_write_and_wait from fsync
orangefs: off by ones in xattr size checks
orangefs: documentation clean up
orangefs: react properly to posix_acl_update_mode's aftermath.
orangefs: Don't clear SGID when inheriting ACLs

Linus Torvalds
2017-09-16 03:16:18 +0800

15 Sep, 2017

2 commits

0f0d12728 Merge branch 'work.mount' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs

Pull mount flag updates from Al Viro:
"Another chunk of fmount preparations from dhowells; only trivial
conflicts for that part. It separates MS_... bits (very grotty
mount(2) ABI) from the struct super_block ->s_flags (kernel-internal,
only a small subset of MS_... stuff).

This does *not* convert the filesystems to new constants; only the
infrastructure is done here. The next step in that series is where the
conflicts would be; that's the conversion of filesystems. It's purely
mechanical and it's better done after the merge, so if you could run
something like

list=$(for i in MS_RDONLY MS_NOSUID MS_NODEV MS_NOEXEC MS_SYNCHRONOUS MS_MANDLOCK MS_DIRSYNC MS_NOATIME MS_NODIRATIME MS_SILENT MS_POSIXACL MS_KERNMOUNT MS_I_VERSION MS_LAZYTIME; do git grep -l $i fs drivers/staging/lustre drivers/mtd ipc mm include/linux; done|sort|uniq|grep -v '^fs/namespace.c$')

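sed -i -e 's/\<MS_RDONLY\>/SB_RDONLY/g' \
-e 's/\<MS_NOSUID\>/SB_NOSUID/g' \
-e 's/\<MS_NODEV\>/SB_NODEV/g' \
-e 's/\<MS_NOEXEC\>/SB_NOEXEC/g' \
-e 's/\<MS_SYNCHRONOUS\>/SB_SYNCHRONOUS/g' \
-e 's/\<MS_MANDLOCK\>/SB_MANDLOCK/g' \
-e 's/\<MS_DIRSYNC\>/SB_DIRSYNC/g' \
-e 's/\<MS_NOATIME\>/SB_NOATIME/g' \
-e 's/\<MS_NODIRATIME\>/SB_NODIRATIME/g' \
-e 's/\<MS_SILENT\>/SB_SILENT/g' \
-e 's/\<MS_POSIXACL\>/SB_POSIXACL/g' \
-e 's/\<MS_KERNMOUNT\>/SB_KERNMOUNT/g' \
-e 's/\<MS_I_VERSION\>/SB_I_VERSION/g' \
-e 's/\<MS_LAZYTIME\>/SB_LAZYTIME/g' \
$list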

and commit it with something along the lines of 'convert filesystems
away from use of MS_... constants' as commit message, it would save
quite a bit of headache next cycle"
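The mechanical conversion requested in the pull message is a whole-word rename of each listed MS_ constant to its SB_ counterpart. A minimal Python sketch of that idea follows, assuming whole-word matching is what is wanted; `convert_flags` and `PATTERN` are hypothetical names for illustration, not part of the kernel tree or of the script in the message.

```python
import re

# Flag suffixes copied from the list in the pull message.
FLAGS = [
    "RDONLY", "NOSUID", "NODEV", "NOEXEC", "SYNCHRONOUS", "MANDLOCK",
    "DIRSYNC", "NOATIME", "NODIRATIME", "SILENT", "POSIXACL",
    "KERNMOUNT", "I_VERSION", "LAZYTIME",
]

# \b on both sides restricts the match to complete identifiers, so
# names such as MS_RMT_MASK or MS_RDONLY_FOO are left untouched.
PATTERN = re.compile(r"\bMS_(" + "|".join(FLAGS) + r")\b")

def convert_flags(source: str) -> str:
    """Rename listed MS_* identifiers in a source string to SB_*."""
    return PATTERN.sub(r"SB_\1", source)

print(convert_flags("if (sb->s_flags & MS_RDONLY)"))
```

Matching whole identifiers only is the important design choice: a plain substring replacement would also rewrite unrelated names that merely start with one of the flag names.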

* 'work.mount' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
VFS: Differentiate mount flags (MS_*) from internal superblock flags
VFS: Convert sb->s_flags & MS_RDONLY to sb_rdonly(sb)
vfs: Add sb_rdonly(sb) to query the MS_RDONLY flag on s_flags

Linus Torvalds
2017-09-15 09:54:01 +0800

ba5e79ea1 orangefs: documentation clean up

Signed-off-by: Mike Marshall

Mike Marshall
2017-09-15 02:54:39 +0800

14 Sep, 2017

1 commit

c353f88f3 Merge branch 'overlayfs-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs

Pull overlayfs updates from Miklos Szeredi:
"This fixes d_ino correctness in readdir, which brings overlayfs on par
with normal filesystems regarding inode number semantics, as long as
all layers are on the same filesystem.

There are also some bug fixes, one in particular (random ioctl's
shouldn't be able to modify lower layers) that touches some vfs code,
but of course no-op for non-overlay fs"

* 'overlayfs-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs:
ovl: fix false positive ESTALE on lookup
ovl: don't allow writing ioctl on lower layer
ovl: fix relatime for directories
vfs: add flags to d_real()
ovl: cleanup d_real for negative
ovl: constant d_ino for non-merge dirs
ovl: constant d_ino across copy up
ovl: fix readdir error value
ovl: check snprintf return

Linus Torvalds
2017-09-14 00:11:44 +0800

13 Sep, 2017

1 commit

6d8ef53e8 Merge tag 'f2fs-for-4.14' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs

Pull f2fs updates from Jaegeuk Kim:
"In this round, we've mostly tuned f2fs to provide better user
experience for Android. Especially, we've worked on atomic write
feature again with SQLite community in order to support it officially.
And we added or modified several facilities to analyze and enhance IO
behaviors.

Major changes include:
- add app/fs io stat
- add inode checksum feature
- support project/journalled quota
- enhance atomic write with new ioctl() which exposes feature set
- enhance background gc/discard/fstrim flows with new gc_urgent mode
- add F2FS_IOC_FS{GET,SET}XATTR
- fix some quota flows"

* tag 'f2fs-for-4.14' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (63 commits)
f2fs: hurry up to issue discard after io interruption
f2fs: fix to show correct discard_granularity in sysfs
f2fs: detect dirty inode in evict_inode
f2fs: clear radix tree dirty tag of pages whose dirty flag is cleared
f2fs: speed up gc_urgent mode with SSR
f2fs: better to wait for fstrim completion
f2fs: avoid race in between read xattr & write xattr
f2fs: make get_lock_data_page to handle encrypted inode
f2fs: use generic terms used for encrypted block management
f2fs: introduce f2fs_encrypted_file for clean-up
Revert "f2fs: add a new function get_ssr_cost"
f2fs: constify super_operations
f2fs: fix to wake up all sleeping flusher
f2fs: avoid race in between atomic_read & atomic_inc
f2fs: remove unneeded parameter of change_curseg
f2fs: update i_flags correctly
f2fs: don't check inode's checksum if it was dirtied or writebacked
f2fs: don't need to update inode checksum for recovery
f2fs: trigger fdatasync for non-atomic_write file
f2fs: fix to avoid race in between aio and gc
...

Linus Torvalds
2017-09-13 11:05:58 +0800

07 Sep, 2017

2 commits

26b433d0d fscache: remove unused ->now_uncached callback

Patch series "Ranged pagevec lookup", v2.

In this series I make pagevec_lookup() update the index (to be
consistent with pagevec_lookup_tag() and also as a preparation for
ranged lookups), provide ranged variant of pagevec_lookup() and use it
in places where it makes sense. This not only removes some common code
but is also a measurable performance win for some use cases (see patch
4/10) where radix tree is sparse and searching & grabbing of a page after
the end of the range has measurable overhead.

This patch (of 10):

The callback doesn't ever get called. Remove it.

Link: http://lkml.kernel.org/r/20170726114704.7626-2-jack@suse.cz
Signed-off-by: Jan Kara
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Jan Kara
2017-09-07 08:27:26 +0800

91d25ba8a dax: use common 4k zero page for dax mmap reads

When servicing mmap() reads from file holes the current DAX code
allocates a page cache page of all zeroes and places the struct page
pointer in the mapping->page_tree radix tree.

This has three major drawbacks:

1) It consumes memory unnecessarily. For every 4k page that is read via
a DAX mmap() over a hole, we allocate a new page cache page. This
means that if you read 1GiB worth of pages, you end up using 1GiB of
zeroed memory. This is easily visible by looking at the overall
memory consumption of the system or by looking at /proc/[pid]/smaps:

7f62e72b3000-7f63272b3000 rw-s 00000000 103:00 12 /root/dax/data
Size: 1048576 kB
Rss: 1048576 kB
Pss: 1048576 kB
Shared_Clean: 0 kB
Shared_Dirty: 0 kB
Private_Clean: 1048576 kB
Private_Dirty: 0 kB
Referenced: 1048576 kB
Anonymous: 0 kB
LazyFree: 0 kB
AnonHugePages: 0 kB
ShmemPmdMapped: 0 kB
Shared_Hugetlb: 0 kB
Private_Hugetlb: 0 kB
Swap: 0 kB
SwapPss: 0 kB
KernelPageSize: 4 kB
MMUPageSize: 4 kB
Locked: 0 kB

2) It is slower than using a common zero page because each page fault
has more work to do. Instead of just inserting a common zero page we
have to allocate a page cache page, zero it, and then insert it. Here
are the average latencies of dax_load_hole() as measured by ftrace on
a random test box:

Old method, using zeroed page cache pages: 3.4 us
New method, using the common 4k zero page: 0.8 us

This was the average latency over 1 GiB of sequential reads done by
this simple fio script:

[global]
size=1G
filename=/root/dax/data
fallocate=none
[io]
rw=read
ioengine=mmap

3) The fact that we had to check for both DAX exceptional entries and
for page cache pages in the radix tree made the DAX code more
complex.

Solve these issues by following the lead of the DAX PMD code and using a
common 4k zero page instead. As with the PMD code we will now insert a
DAX exceptional entry into the radix tree instead of a struct page
pointer which allows us to remove all the special casing in the DAX
code.

Note that we do still pretty aggressively check for regular pages in the
DAX radix tree, especially where we take action based on the bits set in
the page. If we ever find a regular page in our radix tree now that
most likely means that someone besides DAX is inserting pages (which has
happened lots of times in the past), and we want to find that out early
and fail loudly.

This solution also removes the extra memory consumption. Here is that
same /proc/[pid]/smaps after 1GiB of reading from a hole with the new
code:

7f2054a74000-7f2094a74000 rw-s 00000000 103:00 12 /root/dax/data
Size: 1048576 kB
Rss: 0 kB
Pss: 0 kB
Shared_Clean: 0 kB
Shared_Dirty: 0 kB
Private_Clean: 0 kB
Private_Dirty: 0 kB
Referenced: 0 kB
Anonymous: 0 kB
LazyFree: 0 kB
AnonHugePages: 0 kB
ShmemPmdMapped: 0 kB
Shared_Hugetlb: 0 kB
Private_Hugetlb: 0 kB
Swap: 0 kB
SwapPss: 0 kB
KernelPageSize: 4 kB
MMUPageSize: 4 kB
Locked: 0 kB

Overall system memory consumption is similarly improved.
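
The difference between the two smaps listings can be checked mechanically. A minimal Python sketch, assuming the `Name: N kB` layout shown in the listings; `parse_smaps_kb` is a hypothetical helper, not a tool referenced by the commit, and the two input strings are abbreviated copies of the listings above.

```python
import re

# Matches lines such as "Rss: 1048576 kB" from /proc/[pid]/smaps.
FIELD = re.compile(r"^(\w+):\s+(\d+) kB$")

def parse_smaps_kb(text):
    """Return {field_name: size_in_kB} for every 'Name: N kB' line."""
    fields = {}
    for line in text.splitlines():
        m = FIELD.match(line.strip())
        if m:
            fields[m.group(1)] = int(m.group(2))
    return fields

# Abbreviated copies of the old and new listings.
OLD = "Size: 1048576 kB\nRss: 1048576 kB\nKernelPageSize: 4 kB\n"
NEW = "Size: 1048576 kB\nRss: 0 kB\nKernelPageSize: 4 kB\n"

old, new = parse_smaps_kb(OLD), parse_smaps_kb(NEW)

# Number of 4k page cache pages no longer allocated for the 1GiB read.
pages_saved = (old["Rss"] - new["Rss"]) // old["KernelPageSize"]
print(pages_saved)
```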

Another major change is that we remove dax_pfn_mkwrite() from our fault
flow, and instead rely on the page fault itself to make the PTE dirty
and writeable. The following description from the patch adding the
vm_insert_mixed_mkwrite() call explains this a little more:

"To be able to use the common 4k zero page in DAX we need to have our
PTE fault path look more like our PMD fault path where a PTE entry
can be marked as dirty and writeable as it is first inserted rather
than waiting for a follow-up dax_pfn_mkwrite() =>
finish_mkwrite_fault() call.

Right now we can rely on having a dax_pfn_mkwrite() call because we
can distinguish between these two cases in do_wp_page():

case 1: 4k zero page => writable DAX storage
case 2: read-only DAX storage => writeable DAX storage

This distinction is made via vm_normal_page(). vm_normal_page()
returns false for the common 4k zero page, though, just as it does
for DAX ptes. Instead of special casing the DAX + 4k zero page case
we will simplify our DAX PTE page fault sequence so that it matches
our DAX PMD sequence, and get rid of the dax_pfn_mkwrite() helper.
We will instead use dax_iomap_fault() to handle write-protection
faults.

This means that insert_pfn() needs to follow the lead of
insert_pfn_pmd() and allow us to pass in a 'mkwrite' flag. If
'mkwrite' is set insert_pfn() will do the work that was previously
done by wp_page_reuse() as part of the dax_pfn_mkwrite() call path"

Link: http://lkml.kernel.org/r/20170724170616.25810-4-ross.zwisler@linux.intel.com
Signed-off-by: Ross Zwisler
Reviewed-by: Jan Kara
Cc: "Darrick J. Wong"
Cc: "Theodore Ts'o"
Cc: Alexander Viro
Cc: Andreas Dilger
Cc: Christoph Hellwig
Cc: Dan Williams
Cc: Dave Chinner
Cc: Ingo Molnar
Cc: Jonathan Corbet
Cc: Matthew Wilcox
Cc: Steven Rostedt
Cc: Kirill A. Shutemov
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Ross Zwisler
2017-09-07 08:27:24 +0800

05 Sep, 2017

1 commit

495e64293 vfs: add flags to d_real()

Add a separate flags argument (in addition to the open flags) to control
the behavior of d_real().

Signed-off-by: Miklos Szeredi

Miklos Szeredi
2017-09-05 03:42:22 +0800

27 Aug, 2017

1 commit

cc4bbaae5 swap: Remove obsolete sentence

Currently there are no ->swap_{in,out} methods in the address_space_operations
structure definition, so the statement that anything is going to be proxied
through them is wrong.

Signed-off-by: Nikolay Borisov
Signed-off-by: Jonathan Corbet

Nikolay Borisov
2017-08-27 05:55:38 +0800

22 Aug, 2017

1 commit

4b2414d04 f2fs: support journalled quota

This patch enables f2fs to accept quota information through
mount options:
- {usr,grp,prj}jquota=
- jqfmt=

Then, in the ->mount flow, quota files can be recovered during log replay,
which makes it possible to support journalled quota.

Signed-off-by: Chao Yu
[Jaegeuk Kim: Fix wrong return values.]
Signed-off-by: Jaegeuk Kim

Chao Yu
2017-08-22 06:54:48 +0800

16 Aug, 2017

1 commit

d9872a698 f2fs: introduce gc_urgent mode for background GC

This patch adds a sysfs entry to control urgent mode for background GC.
If this is set, background GC thread conducts GC with gc_urgent_sleep_time
all the time.

Reviewed-by: Chao Yu
Signed-off-by: Jaegeuk Kim

Jaegeuk Kim
2017-08-16 01:40:12 +0800

01 Aug, 2017

1 commit

5c57132ea f2fs: support project quota

This patch adds support for plain project quota.

Signed-off-by: Chao Yu
Signed-off-by: Jaegeuk Kim

Chao Yu
2017-08-01 07:48:32 +0800

17 Jul, 2017

1 commit

e462ec50c VFS: Differentiate mount flags (MS_*) from internal superblock flags

Differentiate the MS_* flags passed to mount(2) from the internal flags set
in the super_block's s_flags. s_flags are now called SB_*, with the names
and the values for the moment mirroring the MS_* flags that they're
equivalent to.

In this patch, just the headers are altered and some kernel code where
blind automated conversion isn't necessarily correct.

Note that this shows up some interesting issues:

(1) Some MS_* flags get translated to MNT_* flags (such as MS_NODEV ->
MNT_NODEV) without passing this on to the filesystem, but some
filesystems set such flags anyway.

(2) The ->remount_fs() methods of some filesystems adjust the *flags
argument by setting MS_* flags in it, such as MS_NOATIME - but these
flags are then scrubbed by do_remount_sb() (only the occupants of
MS_RMT_MASK are permitted: MS_RDONLY, MS_SYNCHRONOUS, MS_MANDLOCK,
MS_I_VERSION and MS_LAZYTIME)

I'm not sure what's the best way to solve all these cases.
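
Issue (2) can be illustrated with a small bit-mask model. This is a Python sketch, not kernel code: the numeric values are the MS_* bit values as I understand them from the uapi header and should be treated as assumptions, and `remount_flags` is a hypothetical stand-in for the masking step the message attributes to do_remount_sb().

```python
# Assumed MS_* bit values (mirroring the uapi header); illustration only.
MS_RDONLY = 1
MS_SYNCHRONOUS = 16
MS_MANDLOCK = 64
MS_NOATIME = 1024
MS_I_VERSION = 1 << 23
MS_LAZYTIME = 1 << 25

# Only these bits may be changed by a remount, per the commit message.
MS_RMT_MASK = (MS_RDONLY | MS_SYNCHRONOUS | MS_MANDLOCK
               | MS_I_VERSION | MS_LAZYTIME)

def remount_flags(current: int, requested: int) -> int:
    """Keep bits outside the mask, take masked bits from the request."""
    return (current & ~MS_RMT_MASK) | (requested & MS_RMT_MASK)

# A ->remount_fs() that sets MS_NOATIME in *flags has no effect here:
# the bit is outside MS_RMT_MASK, so it is dropped.
result = remount_flags(0, MS_RDONLY | MS_NOATIME)
print(result == MS_RDONLY)
```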

Suggested-by: Al Viro
Signed-off-by: David Howells

David Howells
2017-07-17 15:45:35 +0800

16 Jul, 2017

1 commit

78dcf7342 Merge branch 'work.mount' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs

Pull ->s_options removal from Al Viro:
"Preparations for fsmount/fsopen stuff (coming next cycle). Everything
gets moved to explicit ->show_options(), killing ->s_options off +
some cosmetic bits around fs/namespace.c and friends. Basically, the
stuff needed to work with fsmount series with minimum of conflicts
with other work.

It's not strictly required for this merge window, but it would reduce
the PITA during the coming cycle, so it would be nice to have those
bits and pieces out of the way"

* 'work.mount' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
isofs: Fix isofs_show_options()
VFS: Kill off s_options and helpers
orangefs: Implement show_options
9p: Implement show_options
isofs: Implement show_options
afs: Implement show_options
affs: Implement show_options
befs: Implement show_options
spufs: Implement show_options
bpf: Implement show_options
ramfs: Implement show_options
pstore: Implement show_options
omfs: Implement show_options
hugetlbfs: Implement show_options
VFS: Don't use save/replace_mount_options if not using generic_show_options
VFS: Provide empty name qstr
VFS: Make get_filesystem() return the affected filesystem
VFS: Clean up whitespace in fs/namespace.c and fs/super.c
Provide a function to create a NUL-terminated string from unterminated data

Linus Torvalds
2017-07-16 03:00:42 +0800

13 Jul, 2017

2 commits

77493f04b procfs: fdinfo: extend information about epoll target files

Since it is possible to have the same number in the tfd field (say a file is
added, closed, then another file is dup'ed to the same number and added back),
it is impossible to distinguish such target files solely by their numbers.

Strictly speaking, regular applications don't need to recognize these
targets at all, but for the sake of checkpoint/restore we need to collect
targets to be able to push them back at the restore stage in the proper order.

Thus let's add the file position, inode and device number where this target
lies. These three fields can be used as a primary key for sorting, and
together with kcmp's help CRIU can find the exact file target (from the
whole set of processes being checkpointed).
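
As an illustration of how a checkpoint tool might use the new fields, here is a Python sketch that parses one epoll target line and builds the (pos, ino, sdev) sort key. The exact line layout, including hexadecimal `ino` and `sdev`, is an assumption based on my reading of the fdinfo format and is not stated in the commit message; `parse_target` and `target_key` are hypothetical helpers.

```python
import re

# Assumed layout of one epoll target line in /proc/[pid]/fdinfo/[fd].
TARGET = re.compile(
    r"tfd:\s*(?P<tfd>\d+)\s+events:\s*(?P<events>[0-9a-f]+)\s+"
    r"data:\s*(?P<data>[0-9a-f]+)\s+pos:(?P<pos>-?\d+)\s+"
    r"ino:(?P<ino>[0-9a-f]+)\s+sdev:(?P<sdev>[0-9a-f]+)"
)

def parse_target(line):
    """Parse one target line into a dict of integers, or return None."""
    m = TARGET.search(line)
    if m is None:
        return None
    return {
        "tfd": int(m.group("tfd")),
        "events": int(m.group("events"), 16),
        "data": int(m.group("data"), 16),
        "pos": int(m.group("pos")),
        "ino": int(m.group("ino"), 16),
        "sdev": int(m.group("sdev"), 16),
    }

def target_key(target):
    """Primary sort key suggested by the commit: position, inode, device."""
    return (target["pos"], target["ino"], target["sdev"])

# Two targets that reuse descriptor number 5 but refer to different files.
a = parse_target("tfd:        5 events:       1d data: ffffffffffffffff pos:0 ino:61af sdev:7")
b = parse_target("tfd:        5 events:       1d data: ffffffffffffffff pos:0 ino:61b0 sdev:7")
print(a["tfd"] == b["tfd"], target_key(a) != target_key(b))
```

The key only narrows down candidates; as the message notes, kcmp is still needed to identify the exact file.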

Link: http://lkml.kernel.org/r/20170424154423.436491881@gmail.com
Signed-off-by: Cyrill Gorcunov
Acked-by: Andrei Vagin
Cc: Al Viro
Cc: Pavel Emelyanov
Cc: Michael Kerrisk
Cc: Jason Baron
Cc: Andy Lutomirski
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Cyrill Gorcunov
2017-07-13 07:26:01 +0800

6b1c776d3 Merge branch 'overlayfs-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs

Pull overlayfs updates from Miklos Szeredi:
"This work from Amir introduces the inodes index feature, which
provides:

- hardlinks are not broken on copy up

- infrastructure for overlayfs NFS export

This also fixes constant st_ino for samefs case for lower hardlinks"

* 'overlayfs-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs: (33 commits)
ovl: mark parent impure and restore timestamp on ovl_link_up()
ovl: document copying layers restrictions with inodes index
ovl: cleanup orphan index entries
ovl: persistent overlay inode nlink for indexed inodes
ovl: implement index dir copy up
ovl: move copy up lock out
ovl: rearrange copy up
ovl: add flag for upper in ovl_entry
ovl: use struct copy_up_ctx as function argument
ovl: base tmpfile in workdir too
ovl: factor out ovl_copy_up_inode() helper
ovl: extract helper to get temp file in copy up
ovl: defer upper dir lock to tempfile link
ovl: hash overlay non-dir inodes by copy up origin
ovl: cleanup bad and stale index entries on mount
ovl: lookup index entry for copy up origin
ovl: verify index dir matches upper dir
ovl: verify upper root dir matches lower root dir
ovl: introduce the inodes index dir feature
ovl: generalize ovl_create_workdir()
...

Linus Torvalds
2017-07-13 00:28:55 +0800

11 Jul, 2017

3 commits

1d278a879 VFS: Kill off s_options and helpers

Kill off s_options, save/replace_mount_options() and generic_show_options()
as all filesystems now implement ->show_options() for themselves. This
should make it easier to implement a context-based mount where the mount
options can be passed individually over a file descriptor.

Signed-off-by: David Howells
Signed-off-by: Al Viro

David Howells
2017-07-11 18:09:21 +0800

5cdd4c046 Merge tag 'for-f2fs-4.13' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs

Pull f2fs updates from Jaegeuk Kim:
"In this round, we've added new features such as disk quota and statx,
and modified internal bio management flow to merge more IOs depending
on block types. We've also made internal threads freezeable for
Android battery life. In addition to them, there are some patches to
avoid lock contention as well as a couple of deadlock conditions.

Enhancements:
- support usrquota, grpquota, and statx
- manage DATA/NODE typed bios separately to serialize more IOs
- modify f2fs_lock_op/wio_mutex to avoid lock contention
- prevent lock contention in migratepage

Bug fixes:
- fix missing load of written inode flag
- fix worst case victim selection in GC
- freezeable GC and discard threads for Android battery life
- sanitize f2fs metadata to deal with security hole
- clean up sysfs-related code and docs"

* tag 'for-f2fs-4.13' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (59 commits)
f2fs: support plain user/group quota
f2fs: avoid deadlock caused by lock order of page and lock_op
f2fs: use spin_{,un}lock_irq{save,restore}
f2fs: relax migratepage for atomic written page
f2fs: don't count inode block in in-memory inode.i_blocks
Revert "f2fs: fix to clean previous mount option when remount_fs"
f2fs: do not set LOST_PINO for renamed dir
f2fs: do not set LOST_PINO for newly created dir
f2fs: skip ->writepages for {mete,node}_inode during recovery
f2fs: introduce __check_sit_bitmap
f2fs: stop gc/discard thread in prior during umount
f2fs: introduce reserved_blocks in sysfs
f2fs: avoid redundant f2fs_flush after remount
f2fs: report # of free inodes more precisely
f2fs: add ioctl to do gc with target block address
f2fs: don't need to check encrypted inode for partial truncation
f2fs: measure inode.i_blocks as generic filesystem
f2fs: set CP_TRIMMED_FLAG correctly
f2fs: require key for truncate(2) of encrypted file
f2fs: move sysfs code from super.c to fs/f2fs/sysfs.c
...

Linus Torvalds
2017-07-11 05:29:45 +0800

7cee9384c Fix up over-eager 'wait_queue_t' renaming

Commit ac6424b981bc ("sched/wait: Rename wait_queue_t =>
wait_queue_entry_t") had scripted the renaming incorrectly, and didn't
actually check that the 'wait_queue_t' was a full token.

As a result, it also triggered on 'wait_queue_token', and renamed that
to 'wait_queue_entry_token' entry in the autofs4 packet structure
definition too. That was entirely incorrect, and not intended.

The end result built fine when building just the kernel - because
everything had been renamed consistently there - but caused problems in
user space because the "struct autofs_packet_missing" type is exported
as part of the uapi.

This scripts it all back again:

git grep -lw wait_queue_entry_token |
xargs sed -i 's/wait_queue_entry_token/wait_queue_token/g'

and checks the end result.
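
The failure mode described here, a rename that does not check for a full token, is easy to reproduce. A minimal Python sketch follows; `naive_rename` and `token_rename` are hypothetical names, and the sample declaration is made up for illustration rather than copied from the autofs4 header.

```python
import re

def naive_rename(source: str) -> str:
    """Substring replacement: also hits identifiers that merely start
    with 'wait_queue_t', such as 'wait_queue_token'."""
    return source.replace("wait_queue_t", "wait_queue_entry_t")

def token_rename(source: str) -> str:
    """Whole-token replacement: \\b anchors keep longer identifiers intact."""
    return re.sub(r"\bwait_queue_t\b", "wait_queue_entry_t", source)

# Made-up sample: one real use of the type, one unrelated field name.
sample = "wait_queue_t wq; int wait_queue_token;"

print(naive_rename(sample))
print(token_rename(sample))
```

The naive version produces `wait_queue_entry_token`, which is exactly the unintended rename this commit reverts.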

Reported-by: Florian Fainelli
Acked-by: Ingo Molnar
Fixes: ac6424b981bc ("sched/wait: Rename wait_queue_t => wait_queue_entry_t")
Signed-off-by: Linus Torvalds

Linus Torvalds
2017-07-11 02:40:19 +0800

09 Jul, 2017

1 commit

0abd675e9 f2fs: support plain user/group quota

This patch adds support for plain user/group quota.

Change Note by Jaegeuk Kim.

- Use the f2fs page cache for quota files so that garbage collection is taken
into account. As a result, quota files do not tolerate sudden power cuts, so
the user needs to run quotacheck.

- setattr() calls dquot_transfer which will transfer inode->i_blocks.
We can't reclaim that during f2fs_evict_inode(). So, we need to count
node blocks as well in order to match i_blocks with dquot's space.

Note that, Chao wrote a patch to count inode->i_blocks without inode block.
(f2fs: don't count inode block in in-memory inode.i_blocks)

- in f2fs_remount, we need to make the filesystem RW prior to dquot_resume.

- handle fault_injection case during f2fs_quota_off_umount

- TODO: Project quota

Signed-off-by: Chao Yu
Signed-off-by: Jaegeuk Kim

Chao Yu
2017-07-09 14:12:27 +0800

08 Jul, 2017

1 commit

088737f44 Merge tag 'for-linus-v4.13-2' of git://git.kernel.org/pub/scm/linux/kernel/git/jlayton/linux

Pull Writeback error handling updates from Jeff Layton:
"This pile represents the bulk of the writeback error handling fixes
that I have for this cycle. Some of the earlier patches in this pile
may look trivial but they are prerequisites for later patches in the
series.

The aim of this set is to improve how we track and report writeback
errors to userland. Most applications that care about data integrity
will periodically call fsync/fdatasync/msync to ensure that their
writes have made it to the backing store.

For a very long time, we have tracked writeback errors using two flags
in the address_space: AS_EIO and AS_ENOSPC. Those flags are set when a
writeback error occurs (via mapping_set_error) and are cleared as a
side-effect of filemap_check_errors (as you noted yesterday). This
model really sucks for userland.

Only the first task to call fsync (or msync or fdatasync) will see the
error. Any subsequent task calling fsync on a file will get back 0
(unless another writeback error occurs in the interim). If I have
several tasks writing to a file and calling fsync to ensure that their
writes got stored, then I need to have them coordinate with one
another. That's difficult enough, but in a world of containerized
setups that coordination may even not be possible.

But wait...it gets worse!

The calls to filemap_check_errors can be buried pretty far down in the
call stack, and there are internal callers of filemap_write_and_wait
and the like that also end up clearing those errors. Many of those
callers ignore the error return from that function or return it to
userland at nonsensical times (e.g. truncate() or stat()). If I get
back -EIO on a truncate, there is no reason to think that it was
because some previous writeback failed, and a subsequent fsync() will
(incorrectly) return 0.

This pile aims to do three things:

1) ensure that when a writeback error occurs that that error will be
reported to userland on a subsequent fsync/fdatasync/msync call,
regardless of what internal callers are doing

2) report writeback errors on all file descriptions that were open at
the time that the error occurred. This is a user-visible change,
but I think most applications are written to assume this behavior
anyway. Those that aren't are unlikely to be hurt by it.

3) document what filesystems should do when there is a writeback
error. Today, there is very little consistency between them, and a
lot of cargo-cult copying. We need to make it very clear what
filesystems should do in this situation.

To achieve this, the set adds a new data type (errseq_t) and then
builds new writeback error tracking infrastructure around that. Once
all of that is in place, we change the filesystems to use the new
infrastructure for reporting wb errors to userland.

Note that this is just the initial foray into cleaning up this mess.
There is a lot of work remaining here:

1) convert the rest of the filesystems in a similar fashion. Once the
initial set is in, then I think most other fs' will be fairly
simple to convert. Hopefully most of those can go in via individual
filesystem trees.

2) convert internal waiters on writeback to use errseq_t for
detecting errors instead of relying on the AS_* flags. I have some
draft patches for this for ext4, but they are not quite ready for
prime time yet.

This was a discussion topic this year at LSF/MM too. If you're
interested in the gory details, LWN has some good articles about this:

https://lwn.net/Articles/718734/
https://lwn.net/Articles/724307/"
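
The error-tracking idea in the pull message above can be sketched in a few
lines. This is a simplified userspace model, not the kernel's errseq_t code:
the Mapping and OpenFile classes and the demo() helper are hypothetical names
invented for illustration. It shows only the behavior the message describes,
namely that each file description samples a counter at open time and fsync
reports an error once to every description that was open when it occurred.

```python
import errno
from dataclasses import dataclass


@dataclass
class Mapping:
    """Hypothetical stand-in for the per-file error state (not a kernel struct)."""
    seq: int = 0   # bumped each time a writeback error is recorded
    err: int = 0   # most recent errno value, 0 if none was ever recorded

    def set_error(self, err: int) -> None:
        # Record the error and advance the counter so that every
        # existing cursor becomes stale.
        self.err = err
        self.seq += 1


class OpenFile:
    """Hypothetical stand-in for one open file description."""

    def __init__(self, mapping: Mapping) -> None:
        self.mapping = mapping
        self.since = mapping.seq  # sample the counter at open time

    def fsync(self) -> int:
        m = self.mapping
        if self.since == m.seq:
            return 0              # nothing new since we last looked
        self.since = m.seq        # advance our cursor: report only once
        return -m.err


def demo() -> list:
    m = Mapping()
    a, b = OpenFile(m), OpenFile(m)
    m.set_error(errno.EIO)        # a writeback failure while a and b are open
    c = OpenFile(m)               # opened after the failure
    return [a.fsync(), a.fsync(), b.fsync(), c.fsync()]
```

In this model, demo() yields -EIO for the first fsync on a and on b, and 0 for
the repeated call on a and for c, which was opened after the failure. Unlike
this sketch, the real errseq_t packs the counter, a "seen" flag and the errno
into a single word.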

* tag 'for-linus-v4.13-2' of git://git.kernel.org/pub/scm/linux/kernel/git/jlayton/linux:
btrfs: minimal conversion to errseq_t writeback error reporting on fsync
xfs: minimal conversion to errseq_t writeback error reporting
ext4: use errseq_t based error handling for reporting data writeback errors
fs: convert __generic_file_fsync to use errseq_t based reporting
block: convert to errseq_t based writeback error tracking
dax: set errors in mapping when writeback fails
Documentation: flesh out the section in vfs.txt on storing and reporting writeback errors
mm: set both AS_EIO/AS_ENOSPC and errseq_t in mapping_set_error
fs: new infrastructure for writeback error handling and reporting
lib: add errseq_t type and infrastructure for handling it
mm: don't TestClearPageError in __filemap_fdatawait_range
mm: clear AS_EIO/AS_ENOSPC when writeback initiation fails
jbd2: don't clear and reset errors after waiting on writeback
buffer: set errors in mapping at the time that the error occurs
fs: check for writeback errors after syncing out buffers in generic_file_fsync
buffer: use mapping_set_error instead of setting the flag
mm: fix mapping_set_error call in me_pagecache_dirty

Linus Torvalds
2017-07-08 10:38:17 +0800

06 Jul, 2017

1 commit

acbf3c345 Documentation: flesh out the section in vfs.txt on storing and reporting writeback errors

Let's try to make this extra clear for fs authors.

Cc: Jan Kara
Signed-off-by: Jeff Layton

Jeff Layton
2017-07-06 19:02:27 +0800

05 Jul, 2017

1 commit

9412812ef ovl: document copying layers restrictions with inodes index

The inodes index feature introduces a behavior change - on mount,
upper root origin file handle is verified to match the lower root dir.
This implies that copied layers cannot be mounted with the inodes index
feature enabled, without explicitly removing the upper dir origin xattr
and the index dir.

The inodes index feature is required to support:
- Prevent breaking hardlinks on copy up
- NFS export support (upcoming)
- Overlayfs snapshots (POC)

Signed-off-by: Amir Goldstein
Signed-off-by: Miklos Szeredi

Amir Goldstein
2017-07-05 04:03:19 +0800

04 Jul, 2017

2 commits

56412894b f2fs: fix to document fault injection option and sysfs file

Commit 73faec4d9935 ("f2fs: add mount option to select fault injection
ratio") and commit 087968974fcd ("f2fs: add fault injection to sysfs")
forgot to document the mount option and the sysfs file.

This patch documents them.

Signed-off-by: Chao Yu
Signed-off-by: Jaegeuk Kim

Chao Yu
2017-07-04 17:11:44 +0800

650fc870a Merge tag 'docs-4.13' of git://git.lwn.net/linux

Pull documentation updates from Jonathan Corbet:
"There has been a fair amount of activity in the docs tree this time
around. Highlights include:

- Conversion of a bunch of security documentation into RST

- The conversion of the remaining DocBook templates by The Amazing
Mauro Machine. We can now drop the entire DocBook build chain.

- The usual collection of fixes and minor updates"

* tag 'docs-4.13' of git://git.lwn.net/linux: (90 commits)
scripts/kernel-doc: handle DECLARE_HASHTABLE
Documentation: atomic_ops.txt is core-api/atomic_ops.rst
Docs: clean up some DocBook loose ends
Make the main documentation title less Geocities
Docs: Use kernel-figure in vidioc-g-selection.rst
Docs: fix table problems in ras.rst
Docs: Fix breakage with Sphinx 1.5 and upper
Docs: Include the Latex "ifthen" package
doc/kokr/howto: Only send regression fixes after -rc1
docs-rst: fix broken links to dynamic-debug-howto in kernel-parameters
doc: Document suitability of IBM Verse for kernel development
Doc: fix a markup error in coding-style.rst
docs: driver-api: i2c: remove some outdated information
Documentation: DMA API: fix a typo in a function name
Docs: Insert missing space to separate link from text
doc/ko_KR/memory-barriers: Update control-dependencies example
Documentation, kbuild: fix typo "minimun" -> "minimum"
docs: Fix some formatting issues in request-key.rst
doc: ReSTify keys-trusted-encrypted.txt
doc: ReSTify keys-request-key.txt
...

Linus Torvalds
2017-07-04 12:13:25 +0800

20 Jun, 2017

1 commit

ac6424b98 sched/wait: Rename wait_queue_t => wait_queue_entry_t

Rename:

wait_queue_t => wait_queue_entry_t

'wait_queue_t' was always a slight misnomer: its name implies that it's a "queue",
but in reality it's a queue *entry*. The 'real' queue is the wait queue head,
which had to carry the name.

Start sorting this out by renaming it to 'wait_queue_entry_t'.

This also allows the real structure name 'struct __wait_queue' to
lose its double underscore and become 'struct wait_queue_entry',
which is the more canonical nomenclature for such data types.

Cc: Linus Torvalds
Cc: Peter Zijlstra
Cc: Thomas Gleixner
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar

Ingo Molnar
2017-06-20 18:18:27 +0800

19 May, 2017

2 commits

6312811be Merge remote-tracking branch 'mauro-exp/docbook3' into death-to-docbook

Mauro says:

This patch series converts the remaining DocBooks to ReST.

The first version was originally
sent as 3 patch series:

[PATCH 00/36] Convert DocBook documents to ReST
[PATCH 0/5] Convert more books to ReST
[PATCH 00/13] Get rid of DocBook

The lsm book was added as if it were a text file under
Documentation. The plan is to merge it with another file
under Documentation/security, after both this series and
a security Documentation patch series get merged.

It also adjusts some Sphinx-pedantic errors/warnings on
some kernel-doc markups.

I also added some patches here to add PDF output for all
existing ReST books.

Jonathan Corbet
2017-05-19 01:03:08 +0800

3db38ed76 doc: ReSTify keys-request-key.txt

Adjusts for ReST markup and moves under keys security devel index.

Cc: David Howells
Signed-off-by: Kees Cook
Signed-off-by: Jonathan Corbet

Kees Cook
2017-05-19 00:33:51 +0800

16 May, 2017

3 commits

76d0d5d31 docs-rst: don't ignore internal functions for jbd2 docs

Those functions are currently ignored, causing references in
the documentation to be lost. Don't ignore them.

Signed-off-by: Mauro Carvalho Chehab

Mauro Carvalho Chehab
2017-05-16 19:44:09 +0800

7a2208f63 docs-rst: filesystems: use c domain references where needed

Instead of just mentioning the function names, use cross-references
to the kernel-doc tags where pertinent.

While not all function documentation is included here, I
double-checked that all functions mentioned there still
exist.

Signed-off-by: Mauro Carvalho Chehab

Mauro Carvalho Chehab
2017-05-16 19:44:08 +0800

90f9f118b docs-rst: convert filesystems book to ReST

Use pandoc to convert documentation to ReST by calling
Documentation/sphinx/tmplcvt script.

Signed-off-by: Mauro Carvalho Chehab

Mauro Carvalho Chehab
2017-05-16 19:44:08 +0800

13 May, 2017

1 commit

cea582247 Tigran has moved

Cc: Tigran Aivazian
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Andrew Morton
2017-05-13 06:57:15 +0800

11 May, 2017

1 commit

73ccb023a Merge tag 'nfs-for-4.12-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs

Pull NFS client updates from Trond Myklebust:
"Highlights include:

Stable bugfixes:
- Fix use after free in write error path
- Use GFP_NOIO for two allocations in writeback
- Fix a hang in OPEN related to server reboot
- Check the result of nfs4_pnfs_ds_connect
- Fix an rcu lock leak

Features:
- Removal of the unmaintained and unused OSD pNFS layout
- Cleanup and removal of lots of unnecessary dprintk()s
- Cleanup and removal of some memory failure paths now that GFP_NOFS
is guaranteed to never fail.
- Remove the v3-only data server limitation on pNFS/flexfiles

Bugfixes:
- RPC/RDMA connection handling bugfixes
- Copy offload: fixes to ensure the copied data is COMMITed to disk.
- Readdir: switch back to using the ->iterate VFS interface
- File locking fixes from Ben Coddington
- Various use-after-free and deadlock issues in pNFS
- Write path bugfixes"

* tag 'nfs-for-4.12-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (89 commits)
pNFS/flexfiles: Always attempt to call layoutstats when flexfiles is enabled
NFSv4.1: Work around a Linux server bug...
NFS append COMMIT after synchronous COPY
NFSv4: Fix exclusive create attributes encoding
NFSv4: Fix an rcu lock leak
nfs: use kmap/kunmap directly
NFS: always treat the invocation of nfs_getattr as cache hit when noac is on
Fix nfs_client refcounting if kmalloc fails in nfs4_proc_exchange_id and nfs4_proc_async_renew
NFSv4.1: RECLAIM_COMPLETE must handle NFS4ERR_CONN_NOT_BOUND_TO_SESSION
pNFS: Fix NULL dereference in pnfs_generic_alloc_ds_commits
pNFS: Fix a typo in pnfs_generic_alloc_ds_commits
pNFS: Fix a deadlock when coalescing writes and returning the layout
pNFS: Don't clear the layout return info if there are segments to return
pNFS: Ensure we commit the layout if it has been invalidated
pNFS: Don't send COMMITs to the DSes if the server invalidated our layout
pNFS/flexfiles: Fix up the ff_layout_write_pagelist failure path
pNFS: Ensure we check layout validity before marking it for return
NFS4.1 handle interrupted slot reuse from ERR_DELAY
NFSv4: check return value of xdr_inline_decode
nfs/filelayout: fix NULL pointer dereference in fl_pnfs_update_layout()
...

Linus Torvalds
2017-05-11 04:03:38 +0800