Eric Lee / smarc-fsl-linux-kernel

28 May, 2016

1 commit

3767e255b switch ->setxattr() to passing dentry and inode separately ... Browse Code »

smack ->d_instantiate() uses ->setxattr(), so to be able to call it before
we'd hashed the new dentry and attached it to inode, we need ->setxattr()
instances getting the inode as an explicit argument rather than obtaining
it from dentry.

Similar change for ->getxattr() had been done in commit ce23e64. Unlike
->getxattr() (which is used by both selinux and smack instances of
->d_instantiate()) ->setxattr() is used only by smack one and unfortunately
it got missed back then.

Reported-by: Seung-Woo Kim
Tested-by: Casey Schaufler
Signed-off-by: Al Viro

Al Viro
2016-05-28 08:09:16 +0800

21 May, 2016

1 commit

3aa2fc166 Merge tag 'driver-core-4.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core ... Browse Code »

Pull driver core updates from Greg KH:
"Here's the "big" driver core update for 4.7-rc1.

Mostly just debugfs changes, the long-known and messy races with
removing debugfs files should be fixed thanks to the great work of
Nicolai Stange. We also have some isa updates in here (the x86
maintainers told me to take it through this tree), a new warning when
we run out of dynamic char major numbers, and a few other assorted
changes, details in the shortlog.

All have been in linux-next for some time with no reported issues"

* tag 'driver-core-4.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (32 commits)
Revert "base: dd: don't remove driver_data in -EPROBE_DEFER case"
gpio: ws16c48: Utilize the ISA bus driver
gpio: 104-idio-16: Utilize the ISA bus driver
gpio: 104-idi-48: Utilize the ISA bus driver
gpio: 104-dio-48e: Utilize the ISA bus driver
watchdog: ebc-c384_wdt: Utilize the ISA bus driver
iio: stx104: Utilize the module_isa_driver and max_num_isa_dev macros
iio: stx104: Add X86 dependency to STX104 Kconfig option
Documentation: Add ISA bus driver documentation
isa: Implement the max_num_isa_dev macro
isa: Implement the module_isa_driver macro
pnp: pnpbios: Add explicit X86_32 dependency to PNPBIOS
isa: Decouple X86_32 dependency from the ISA Kconfig option
driver-core: use 'dev' argument in dev_dbg_ratelimited stub
base: dd: don't remove driver_data in -EPROBE_DEFER case
kernfs: Move faulting copy_user operations outside of the mutex
devcoredump: add scatterlist support
debugfs: unproxify files created through debugfs_create_u32_array()
debugfs: unproxify files created through debugfs_create_blob()
debugfs: unproxify files created through debugfs_create_bool()
...

Linus Torvalds
2016-05-21 12:26:15 +0800

18 May, 2016

1 commit

7f427d3a6 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs ... Browse Code »

Pull parallel filesystem directory handling update from Al Viro.

This is the main parallel directory work by Al that makes the vfs layer
able to do lookup and readdir in parallel within a single directory.
That's a big change, since this used to be all protected by the
directory inode mutex.

The inode mutex is replaced by an rwsem, and serialization of lookups of
a single name is done by a "in-progress" dentry marker.

The series begins with xattr cleanups, and then ends with switching
filesystems over to actually doing the readdir in parallel (switching to
the "iterate_shared()" that only takes the read lock).

A more detailed explanation of the process from Al Viro:
"The xattr work starts with some acl fixes, then switches ->getxattr to
passing inode and dentry separately. This is the point where the
things start to get tricky - that got merged into the very beginning
of the -rc3-based #work.lookups, to allow untangling the
security_d_instantiate() mess. The xattr work itself proceeds to
switch a lot of filesystems to generic_...xattr(); no complications
there.

After that initial xattr work, the series then does the following:

- untangle security_d_instantiate()

- convert a bunch of open-coded lookup_one_len_unlocked() to calls of
that thing; one such place (in overlayfs) actually yields a trivial
conflict with overlayfs fixes later in the cycle - overlayfs ended
up switching to a variant of lookup_one_len_unlocked() sans the
permission checks. I would've dropped that commit (it gets
overridden on merge from #ovl-fixes in #for-next; proper resolution
is to use the variant in mainline fs/overlayfs/super.c), but I
didn't want to rebase the damn thing - it was fairly late in the
cycle...

- some filesystems had managed to depend on lookup/lookup exclusion
for *fs-internal* data structures in a way that would break if we
relaxed the VFS exclusion. Fixing hadn't been hard, fortunately.

- core of that series - parallel lookup machinery, replacing
->i_mutex with rwsem, making lookup_slow() take it only shared. At
that point lookups happen in parallel; lookups on the same name
wait for the in-progress one to be done with that dentry.

Surprisingly little code, at that - almost all of it is in
fs/dcache.c, with fs/namei.c changes limited to lookup_slow() -
making it use the new primitive and actually switching to locking
shared.

- parallel readdir stuff - first of all, we provide the exclusion on
per-struct file basis, same as we do for read() vs lseek() for
regular files. That takes care of most of the needed exclusion in
readdir/readdir; however, these guys are trickier than lookups, so
I went for switching them one-by-one. To do that, a new method
'->iterate_shared()' is added and filesystems are switched to it
as they are either confirmed to be OK with shared lock on directory
or fixed to be OK with that. I hope to kill the original method
come next cycle (almost all in-tree filesystems are switched
already), but it's still not quite finished.

- several filesystems get switched to parallel readdir. The
interesting part here is dealing with dcache preseeding by readdir;
that needs minor adjustment to be safe with directory locked only
shared.

Most of the filesystems doing that got switched to in those
commits. Important exception: NFS. Turns out that NFS folks, with
their, er, insistence on VFS getting the fuck out of the way of the
Smart Filesystem Code That Knows How And What To Lock(tm) have
grown the locking of their own. They had their own homegrown
rwsem, with lookup/readdir/atomic_open being *writers* (sillyunlink
is the reader there). Of course, with VFS getting the fuck out of
the way, as requested, the actual smarts of the smart filesystem
code etc. had become exposed...

- do_last/lookup_open/atomic_open cleanups. As the result, open()
without O_CREAT locks the directory only shared. Including the
->atomic_open() case. Backmerge from #for-linus in the middle of
that - atomic_open() fix got brought in.

- then comes NFS switch to saner (VFS-based ;-) locking, killing the
homegrown "lookup and readdir are writers" kinda-sorta rwsem. All
exclusion for sillyunlink/lookup is done by the parallel lookups
mechanism. Exclusion between sillyunlink and rmdir is a real rwsem
now - rmdir being the writer.

Result: NFS lookups/readdirs/O_CREAT-less opens happen in parallel
now.

- the rest of the series consists of switching a lot of filesystems
to parallel readdir; in a lot of cases ->llseek() gets simplified
as well. One backmerge in there (again, #for-linus - rockridge
fix)"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (74 commits)
ext4: switch to ->iterate_shared()
hfs: switch to ->iterate_shared()
hfsplus: switch to ->iterate_shared()
hostfs: switch to ->iterate_shared()
hpfs: switch to ->iterate_shared()
hpfs: handle allocation failures in hpfs_add_pos()
gfs2: switch to ->iterate_shared()
f2fs: switch to ->iterate_shared()
afs: switch to ->iterate_shared()
befs: switch to ->iterate_shared()
befs: constify stuff a bit
isofs: switch to ->iterate_shared()
get_acorn_filename(): deobfuscate a bit
btrfs: switch to ->iterate_shared()
logfs: no need to lock directory in lseek
switch ecryptfs to ->iterate_shared
9p: switch to ->iterate_shared()
fat: switch to ->iterate_shared()
romfs, squashfs: switch to ->iterate_shared()
more trivial ->iterate_shared conversions
...

Linus Torvalds
2016-05-18 02:01:31 +0800

12 May, 2016

1 commit

3cc9b23c8 kernfs: kernfs_sop_show_path: don't return 0 after seq_dentry call ... Browse Code »

Our caller expects 0 on success, not >0.

This fixes a bug in the patch

cgroup, kernfs: make mountinfo show properly scoped path for cgroup namespaces

where /sys does not show up in mountinfo, breaking criu.

Thanks for catching this, Andrei.

Reported-by: Andrei Vagin
Signed-off-by: Serge Hallyn
Signed-off-by: Tejun Heo

Serge E. Hallyn
2016-05-12 23:03:51 +0800

10 May, 2016

1 commit

4f41fc596 cgroup, kernfs: make mountinfo show properly scoped path for cgroup namespaces ... Browse Code »

Patch summary:

When showing a cgroupfs entry in mountinfo, show the path of the mount
root dentry relative to the reader's cgroup namespace root.

Short explanation (courtesy of mkerrisk):

If we create a new cgroup namespace, then we want both /proc/self/cgroup
and /proc/self/mountinfo to show cgroup paths that are correctly
virtualized with respect to the cgroup mount point. Previous to this
patch, /proc/self/cgroup shows the right info, but /proc/self/mountinfo
does not.

Long version:

When a uid 0 task which is in freezer cgroup /a/b, unshares a new cgroup
namespace, and then mounts a new instance of the freezer cgroup, the new
mount will be rooted at /a/b. The root dentry field of the mountinfo
entry will show '/a/b'.

cat > /tmp/do1 << EOF
mount -t cgroup -o freezer freezer /mnt
grep freezer /proc/self/mountinfo
EOF

unshare -Gm bash /tmp/do1
> 330 160 0:34 / /sys/fs/cgroup/freezer rw,nosuid,nodev,noexec,relatime - cgroup cgroup rw,freezer
> 355 133 0:34 /a/b /mnt rw,relatime - cgroup freezer rw,freezer

The task's freezer cgroup entry in /proc/self/cgroup will simply show
'/':

grep freezer /proc/self/cgroup
9:freezer:/

If instead the same task simply bind mounts the /a/b cgroup directory,
the resulting mountinfo entry will again show /a/b for the dentry root.
However in this case the task will find its own cgroup at /mnt/a/b,
not at /mnt:

mount --bind /sys/fs/cgroup/freezer/a/b /mnt
130 25 0:34 /a/b /mnt rw,nosuid,nodev,noexec,relatime shared:21 - cgroup cgroup rw,freezer

In other words, there is no way for the task to know, based on what is
in mountinfo, which cgroup directory is its own.

Example (by mkerrisk):

First, a little script to save some typing and verbiage:

echo -e "\t/proc/self/cgroup:\t$(cat /proc/self/cgroup | grep freezer)"
cat /proc/self/mountinfo | grep freezer |
awk '{print "\tmountinfo:\t\t" $4 "\t" $5}'

Create cgroup, place this shell into the cgroup, and look at the state
of the /proc files:

2653
2653 # Our shell
14254 # cat(1)
/proc/self/cgroup: 10:freezer:/a/b
mountinfo: / /sys/fs/cgroup/freezer

Create a shell in new cgroup and mount namespaces. The act of creating
a new cgroup namespace causes the process's current cgroups directories
to become its cgroup root directories. (Here, I'm using my own version
of the "unshare" utility, which takes the same options as the util-linux
version):

Look at the state of the /proc files:

/proc/self/cgroup: 10:freezer:/
mountinfo: / /sys/fs/cgroup/freezer

The third entry in /proc/self/cgroup (the pathname of the cgroup inside
the hierarchy) is correctly virtualized w.r.t. the cgroup namespace, which
is rooted at /a/b in the outer namespace.

However, the info in /proc/self/mountinfo is not for this cgroup
namespace, since we are seeing a duplicate of the mount from the
old mount namespace, and the info there does not correspond to the
new cgroup namespace. However, trying to create a new mount still
doesn't show us the right information in mountinfo:

# propagating to other mountns
/proc/self/cgroup: 7:freezer:/
mountinfo: /a/b /mnt/freezer

The act of creating a new cgroup namespace caused the process's
current freezer directory, "/a/b", to become its cgroup freezer root
directory. In other words, the pathname directory of the directory
within the newly mounted cgroup filesystem should be "/",
but mountinfo wrongly shows us "/a/b". The consequence of this is
that the process in the cgroup namespace cannot correctly construct
the pathname of its cgroup root directory from the information in
/proc/PID/mountinfo.

With this patch, the dentry root field in mountinfo is shown relative
to the reader's cgroup namespace. So the same steps as above:

/proc/self/cgroup: 10:freezer:/a/b
mountinfo: / /sys/fs/cgroup/freezer
/proc/self/cgroup: 10:freezer:/
mountinfo: /../.. /sys/fs/cgroup/freezer
/proc/self/cgroup: 10:freezer:/
mountinfo: / /mnt/freezer

cgroup.clone_children freezer.parent_freezing freezer.state tasks
cgroup.procs freezer.self_freezing notify_on_release
3164
2653 # First shell that placed in this cgroup
3164 # Shell started by 'unshare'
14197 # cat(1)

Signed-off-by: Serge Hallyn
Tested-by: Michael Kerrisk
Acked-by: Michael Kerrisk
Signed-off-by: Tejun Heo

Serge E. Hallyn
2016-05-10 00:15:03 +0800

09 May, 2016

1 commit

8cb0d2c1c kernfs: no point locking directory around that generic_file_llseek() ... Browse Code »

Signed-off-by: Al Viro

Al Viro
2016-05-09 23:41:13 +0800

03 May, 2016

3 commits

779b83913 kernfs: use lookup_one_len_unlocked() ... Browse Code »

Signed-off-by: Al Viro

Al Viro
2016-05-03 07:47:21 +0800
84695ffee Merge getxattr prototype change into work.lookups ... Browse Code »

The rest of work.xattr stuff isn't needed for this branch

Al Viro
2016-05-03 07:45:47 +0800
e99ed4de1 kernfs_path_from_node_locked: don't overwrite nlen ... Browse Code »

We've calculated @len to be the bytes we need for '/..' entries from
@kn_from to the common ancestor, and calculated @nlen to be the extra
bytes we need to get from the common ancestor to @kn_to. We use them
as such at the end. But in the loop copying the actual entries, we
overwrite @nlen. Use a temporary variable for that instead.

Without this, the return length, when the buffer is large enough, is
wrong. (When the buffer is NULL or too small, the returned value is
correct. The buffer contents are also correct.)

Interestingly, no callers of this function are affected by this as of
yet. However the upcoming cgroup_show_path() will be.

Signed-off-by: Serge Hallyn

Serge Hallyn
2016-05-03 00:36:00 +0800

01 May, 2016

1 commit

e4234a1fc kernfs: Move faulting copy_user operations outside of the mutex ... Browse Code »

A fault in a user provided buffer may lead anywhere, and lockdep warns
that we have a potential deadlock between the mm->mmap_sem and the
kernfs file mutex:

[ 82.811702] ======================================================
[ 82.811705] [ INFO: possible circular locking dependency detected ]
[ 82.811709] 4.5.0-rc4-gfxbench+ #1 Not tainted
[ 82.811711] -------------------------------------------------------
[ 82.811714] kms_setmode/5859 is trying to acquire lock:
[ 82.811717] (&dev->struct_mutex){+.+.+.}, at: [] drm_gem_mmap+0x1a1/0x270
[ 82.811731]
but task is already holding lock:
[ 82.811734] (&mm->mmap_sem){++++++}, at: [] vm_mmap_pgoff+0x44/0xa0
[ 82.811745]
which lock already depends on the new lock.

[ 82.811749]
the existing dependency chain (in reverse order) is:
[ 82.811752]
-> #3 (&mm->mmap_sem){++++++}:
[ 82.811761] [] lock_acquire+0xc3/0x1d0
[ 82.811766] [] __might_fault+0x75/0xa0
[ 82.811771] [] kernfs_fop_write+0x8a/0x180
[ 82.811787] [] __vfs_write+0x23/0xe0
[ 82.811792] [] vfs_write+0xa4/0x190
[ 82.811797] [] SyS_write+0x44/0xb0
[ 82.811801] [] entry_SYSCALL_64_fastpath+0x16/0x73
[ 82.811807]
-> #2 (s_active#6){++++.+}:
[ 82.811814] [] lock_acquire+0xc3/0x1d0
[ 82.811819] [] __kernfs_remove+0x210/0x2f0
[ 82.811823] [] kernfs_remove_by_name_ns+0x40/0xa0
[ 82.811828] [] sysfs_remove_file_ns+0x10/0x20
[ 82.811832] [] device_del+0x124/0x250
[ 82.811837] [] device_unregister+0x19/0x60
[ 82.811841] [] cpu_cache_sysfs_exit+0x51/0xb0
[ 82.811846] [] cacheinfo_cpu_callback+0x38/0x70
[ 82.811851] [] notifier_call_chain+0x39/0xa0
[ 82.811856] [] __raw_notifier_call_chain+0x9/0x10
[ 82.811860] [] cpu_notify+0x1e/0x40
[ 82.811865] [] cpu_notify_nofail+0x9/0x20
[ 82.811869] [] _cpu_down+0x233/0x340
[ 82.811874] [] disable_nonboot_cpus+0xc9/0x350
[ 82.811878] [] suspend_devices_and_enter+0x5a1/0xb50
[ 82.811883] [] pm_suspend+0x543/0x8d0
[ 82.811888] [] state_store+0x77/0xe0
[ 82.811892] [] kobj_attr_store+0xf/0x20
[ 82.811897] [] sysfs_kf_write+0x40/0x50
[ 82.811902] [] kernfs_fop_write+0x13c/0x180
[ 82.811906] [] __vfs_write+0x23/0xe0
[ 82.811910] [] vfs_write+0xa4/0x190
[ 82.811914] [] SyS_write+0x44/0xb0
[ 82.811918] [] entry_SYSCALL_64_fastpath+0x16/0x73
[ 82.811923]
-> #1 (cpu_hotplug.lock){+.+.+.}:
[ 82.811929] [] lock_acquire+0xc3/0x1d0
[ 82.811933] [] mutex_lock_nested+0x62/0x3b0
[ 82.811940] [] get_online_cpus+0x61/0x80
[ 82.811944] [] stop_machine+0x1b/0xe0
[ 82.811949] [] gen8_ggtt_insert_entries__BKL+0x2d/0x30 [i915]
[ 82.812009] [] ggtt_bind_vma+0x46/0x70 [i915]
[ 82.812045] [] i915_vma_bind+0x140/0x290 [i915]
[ 82.812081] [] i915_gem_object_do_pin+0x899/0xb00 [i915]
[ 82.812117] [] i915_gem_object_pin+0x35/0x40 [i915]
[ 82.812154] [] intel_init_pipe_control+0xbe/0x210 [i915]
[ 82.812192] [] intel_logical_rings_init+0xe2/0xde0 [i915]
[ 82.812232] [] i915_gem_init+0xf3/0x130 [i915]
[ 82.812278] [] i915_driver_load+0xf2d/0x1770 [i915]
[ 82.812318] [] drm_dev_register+0xa4/0xb0
[ 82.812323] [] drm_get_pci_dev+0xce/0x1e0
[ 82.812328] [] i915_pci_probe+0x2f/0x50 [i915]
[ 82.812360] [] pci_device_probe+0x87/0xf0
[ 82.812366] [] driver_probe_device+0x229/0x450
[ 82.812371] [] __driver_attach+0x83/0x90
[ 82.812375] [] bus_for_each_dev+0x61/0xa0
[ 82.812380] [] driver_attach+0x19/0x20
[ 82.812384] [] bus_add_driver+0x1ef/0x290
[ 82.812388] [] driver_register+0x5b/0xe0
[ 82.812393] [] __pci_register_driver+0x5b/0x60
[ 82.812398] [] drm_pci_init+0xd6/0x100
[ 82.812402] [] 0xffffffffa027c094
[ 82.812406] [] do_one_initcall+0xae/0x1d0
[ 82.812412] [] do_init_module+0x5b/0x1cb
[ 82.812417] [] load_module+0x1c20/0x2480
[ 82.812422] [] SyS_finit_module+0x7e/0xa0
[ 82.812428] [] entry_SYSCALL_64_fastpath+0x16/0x73
[ 82.812433]
-> #0 (&dev->struct_mutex){+.+.+.}:
[ 82.812439] [] __lock_acquire+0x1fc9/0x20f0
[ 82.812443] [] lock_acquire+0xc3/0x1d0
[ 82.812456] [] drm_gem_mmap+0x1c7/0x270
[ 82.812460] [] mmap_region+0x334/0x580
[ 82.812466] [] do_mmap+0x364/0x410
[ 82.812470] [] vm_mmap_pgoff+0x6d/0xa0
[ 82.812474] [] SyS_mmap_pgoff+0x184/0x220
[ 82.812479] [] SyS_mmap+0x1d/0x20
[ 82.812484] [] entry_SYSCALL_64_fastpath+0x16/0x73
[ 82.812489]
other info that might help us debug this:

[ 82.812493] Chain exists of:
&dev->struct_mutex --> s_active#6 --> &mm->mmap_sem

[ 82.812502] Possible unsafe locking scenario:

[ 82.812506] CPU0 CPU1
[ 82.812508] ---- ----
[ 82.812510] lock(&mm->mmap_sem);
[ 82.812514] lock(s_active#6);
[ 82.812519] lock(&mm->mmap_sem);
[ 82.812522] lock(&dev->struct_mutex);
[ 82.812526]
*** DEADLOCK ***

[ 82.812531] 1 lock held by kms_setmode/5859:
[ 82.812533] #0: (&mm->mmap_sem){++++++}, at: [] vm_mmap_pgoff+0x44/0xa0
[ 82.812541]
stack backtrace:
[ 82.812547] CPU: 0 PID: 5859 Comm: kms_setmode Not tainted 4.5.0-rc4-gfxbench+ #1
[ 82.812550] Hardware name: /NUC5CPYB, BIOS PYBSWCEL.86A.0040.2015.0814.1353 08/14/2015
[ 82.812553] 0000000000000000 ffff880079407bf0 ffffffff813f8505 ffffffff825fb270
[ 82.812560] ffffffff825c4190 ffff880079407c30 ffffffff810c84ac ffff880079407c90
[ 82.812566] ffff8800797ed328 ffff8800797ecb00 0000000000000001 ffff8800797ed350
[ 82.812573] Call Trace:
[ 82.812578] [] dump_stack+0x67/0x92
[ 82.812582] [] print_circular_bug+0x1fc/0x310
[ 82.812586] [] __lock_acquire+0x1fc9/0x20f0
[ 82.812590] [] lock_acquire+0xc3/0x1d0
[ 82.812594] [] ? drm_gem_mmap+0x1a1/0x270
[ 82.812599] [] drm_gem_mmap+0x1c7/0x270
[ 82.812603] [] ? drm_gem_mmap+0x1a1/0x270
[ 82.812608] [] mmap_region+0x334/0x580
[ 82.812612] [] do_mmap+0x364/0x410
[ 82.812616] [] vm_mmap_pgoff+0x6d/0xa0
[ 82.812629] [] SyS_mmap_pgoff+0x184/0x220
[ 82.812633] [] SyS_mmap+0x1d/0x20
[ 82.812637] [] entry_SYSCALL_64_fastpath+0x16/0x73

Highly unlikely though this scenario is, we can avoid the issue entirely
by moving the copy operation from out under the kernfs_get_active()
tracking by assigning the preallocated buffer its own mutex. The
temporary buffer allocation doesn't require mutex locking as it is
entirely local.

The locked section was extended by the addition of the preallocated buf
to speed up md user operations in

commit 2b75869bba676c248d8d25ae6d2bd9221dfffdb6
Author: NeilBrown
Date: Mon Oct 13 16:41:28 2014 +1100

sysfs/kernfs: allow attributes to request write buffer be pre-allocated.

Reported-by: Ville Syrjälä
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=94350
Signed-off-by: Chris Wilson
Reviewed-by: Joonas Lahtinen
Cc: Ville Syrjälä
Cc: Joonas Lahtinen
Cc: NeilBrown
Acked-by: Tejun Heo
Signed-off-by: Greg Kroah-Hartman

Chris Wilson
2016-05-01 01:05:05 +0800

19 Apr, 2016

1 commit

5614e7725 Merge 4.6-rc4 into driver-core-next ... Browse Code »

We want those fixes in here as well.

Signed-off-by: Greg Kroah-Hartman

Greg Kroah-Hartman
2016-04-19 03:28:28 +0800

11 Apr, 2016

1 commit

ce23e6401 ->getxattr(): pass dentry and inode as separate arguments ... Browse Code »

Signed-off-by: Al Viro

Al Viro
2016-04-11 12:48:00 +0800

05 Apr, 2016

1 commit

09cbfeaf1 mm, fs: get rid of PAGE_CACHE_* and page_cache_{get,release} macros ... Browse Code »

PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} macros were introduced *long* time
ago with promise that one day it will be possible to implement page
cache with bigger chunks than PAGE_SIZE.

This promise never materialized. And unlikely will.

We have many places where PAGE_CACHE_SIZE assumed to be equal to
PAGE_SIZE. And it's constant source of confusion on whether
PAGE_CACHE_* or PAGE_* constant should be used in a particular case,
especially on the border between fs and mm.

Global switching to PAGE_CACHE_SIZE != PAGE_SIZE would cause to much
breakage to be doable.

Let's stop pretending that pages in page cache are special. They are
not.

The changes are pretty straight-forward:

- << (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> ;

- >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> ;

- PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} -> PAGE_{SIZE,SHIFT,MASK,ALIGN};

- page_cache_get() -> get_page();

- page_cache_release() -> put_page();

This patch contains automated changes generated with coccinelle using
script below. For some reason, coccinelle doesn't patch header files.
I've called spatch for them manually.

The only adjustment after coccinelle is revert of changes to
PAGE_CAHCE_ALIGN definition: we are going to drop it later.

There are few places in the code where coccinelle didn't reach. I'll
fix them manually in a separate patch. Comments and documentation also
will be addressed with the separate patch.

virtual patch

@@
expression E;
@@
- E << (PAGE_CACHE_SHIFT - PAGE_SHIFT)
+ E

@@
expression E;
@@
- E >> (PAGE_CACHE_SHIFT - PAGE_SHIFT)
+ E

@@
@@
- PAGE_CACHE_SHIFT
+ PAGE_SHIFT

@@
@@
- PAGE_CACHE_SIZE
+ PAGE_SIZE

@@
@@
- PAGE_CACHE_MASK
+ PAGE_MASK

@@
expression E;
@@
- PAGE_CACHE_ALIGN(E)
+ PAGE_ALIGN(E)

@@
expression E;
@@
- page_cache_get(E)
+ get_page(E)

@@
expression E;
@@
- page_cache_release(E)
+ put_page(E)

Signed-off-by: Kirill A. Shutemov
Acked-by: Michal Hocko
Signed-off-by: Linus Torvalds

Kirill A. Shutemov
2016-04-05 01:41:08 +0800

30 Mar, 2016

1 commit

3a3a5fece fs: kernfs: Replace CURRENT_TIME by current_fs_time() ... Browse Code »

This is in preparation for the series that transitions
filesystem timestamps to use 64 bit time and hence make
them y2038 safe.

CURRENT_TIME macro will be deleted before merging the
aforementioned series.

Use current_fs_time() instead of CURRENT_TIME for inode
timestamps.

struct kernfs_node is associated with a sysfs file/ directory.
Truncate the values to appropriate time granularity when
writing to inode timestamps of the files.

ktime_get_real_ts() is used to obtain times for
struct kernfs_iattrs. Since these times are later assigned to
inode times using timespec_truncate() for all filesystem based
operations, we can save the supers list traversal time here by
using ktime_get_real_ts() directly.

Signed-off-by: Deepa Dinamani
Signed-off-by: Greg Kroah-Hartman

Deepa Dinamani
2016-03-30 01:11:44 +0800

22 Mar, 2016

1 commit

5518f66b5 Merge branch 'for-4.6-ns' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup ... Browse Code »

Pull cgroup namespace support from Tejun Heo:
"These are changes to implement namespace support for cgroup which has
been pending for quite some time now. It is very straight-forward and
only affects what part of cgroup hierarchies are visible.

After unsharing, mounting a cgroup fs will be scoped to the cgroups
the task belonged to at the time of unsharing and the cgroup paths
exposed to userland would be adjusted accordingly"

* 'for-4.6-ns' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
cgroup: fix and restructure error handling in copy_cgroup_ns()
cgroup: fix alloc_cgroup_ns() error handling in copy_cgroup_ns()
Add FS_USERNS_FLAG to cgroup fs
cgroup: Add documentation for cgroup namespaces
cgroup: mount cgroupns-root when inside non-init cgroupns
kernfs: define kernfs_node_dentry
cgroup: cgroup namespace setns support
cgroup: introduce cgroup namespaces
sched: new clone flag CLONE_NEWCGROUP for cgroup namespace
kernfs: Add API to generate relative kernfs path

Linus Torvalds
2016-03-22 01:05:13 +0800

17 Feb, 2016

2 commits

fb3c83156 kernfs: define kernfs_node_dentry ... Browse Code »

Add a new kernfs api is added to lookup the dentry for a particular
kernfs path.

Signed-off-by: Aditya Kali
Signed-off-by: Serge E. Hallyn
Acked-by: Greg Kroah-Hartman
Signed-off-by: Tejun Heo

Aditya Kali
2016-02-17 02:04:58 +0800
9f6df573a kernfs: Add API to generate relative kernfs path ... Browse Code »

The new function kernfs_path_from_node() generates and returns kernfs
path of a given kernfs_node relative to a given parent kernfs_node.

Signed-off-by: Aditya Kali
Signed-off-by: Serge E. Hallyn
Acked-by: Greg Kroah-Hartman
Signed-off-by: Tejun Heo

Aditya Kali
2016-02-17 02:04:58 +0800

08 Feb, 2016

1 commit

e56ed358a kernfs: make kernfs_walk_ns() use kernfs_pr_cont_buf[] ... Browse Code »

kernfs_walk_ns() uses a static path_buf[PATH_MAX] to separate out path
components. Keeping around the 4k buffer just for kernfs_walk_ns() is
wasteful. This patch makes it piggyback on kernfs_pr_cont_buf[]
instead. This requires kernfs_walk_ns() to hold kernfs_rename_lock.

Signed-off-by: Tejun Heo
Reported-by: Geert Uytterhoeven
Signed-off-by: Greg Kroah-Hartman

Tejun Heo
2016-02-08 12:21:35 +0800

23 Jan, 2016

1 commit

5955102c9 wrappers for ->i_mutex access ... Browse Code »

parallel to mutex_{lock,unlock,trylock,is_locked,lock_nested},
inode_foo(inode) being mutex_foo(&inode->i_mutex).

Please, use those for access to ->i_mutex; over the coming cycle
->i_mutex will become rwsem, with ->lookup() done with it held
only shared.

Signed-off-by: Al Viro

Al Viro
2016-01-23 07:04:28 +0800

15 Jan, 2016

1 commit

b2a209ffa Revert "kernfs: do not account ino_ida allocations to memcg" ... Browse Code »

Currently, all kmem allocations (namely every kmem_cache_alloc, kmalloc,
alloc_kmem_pages call) are accounted to memory cgroup automatically.
Callers have to explicitly opt out if they don't want/need accounting
for some reason. Such a design decision leads to several problems:

- kmalloc users are highly sensitive to failures, many of them
implicitly rely on the fact that kmalloc never fails, while memcg
makes failures quite plausible.

- A lot of objects are shared among different containers by design.
Accounting such objects to one of containers is just unfair.
Moreover, it might lead to pinning a dead memcg along with its kmem
caches, which aren't tiny, which might result in noticeable increase
in memory consumption for no apparent reason in the long run.

- There are tons of short-lived objects. Accounting them to memcg will
only result in slight noise and won't change the overall picture, but
we still have to pay accounting overhead.

For more info, see

- http://lkml.kernel.org/r/20151105144002.GB15111%40dhcp22.suse.cz
- http://lkml.kernel.org/r/20151106090555.GK29259@esperanza

Therefore this patchset switches to the white list policy. Now kmalloc
users have to explicitly opt in by passing __GFP_ACCOUNT flag.

Currently, the list of accounted objects is quite limited and only
includes those allocations that (1) are known to be easily triggered
from userspace and (2) can fail gracefully (for the full list see patch
no. 6) and it still misses many object types. However, accounting only
those objects should be a satisfactory approximation of the behavior we
used to have for most sane workloads.

This patch (of 6):

Revert 499611ed451508a42d1d7d ("kernfs: do not account ino_ida allocations
to memcg").

Black-list kmem accounting policy (aka __GFP_NOACCOUNT) turned out to be
fragile and difficult to maintain, because there seem to be many more
allocations that should not be accounted than those that should be.
Besides, false accounting an allocation might result in much worse
consequences than not accounting at all, namely increased memory
consumption due to pinned dead kmem caches.

So it was decided to switch to the white-list policy. This patch reverts
bits introducing the black-list policy. The white-list policy will be
introduced later in the series.

Signed-off-by: Vladimir Davydov
Acked-by: Johannes Weiner
Cc: Michal Hocko
Cc: Tejun Heo
Cc: Greg Thelen
Cc: Christoph Lameter
Cc: Pekka Enberg
Cc: David Rientjes
Cc: Joonsoo Kim
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Vladimir Davydov
2016-01-15 08:00:49 +0800

13 Jan, 2016

1 commit

aee3bfa33 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next ... Browse Code »

Pull networking updates from Davic Miller:

1) Support busy polling generically, for all NAPI drivers. From Eric
Dumazet.

2) Add byte/packet counter support to nft_ct, from Floriani Westphal.

3) Add RSS/XPS support to mvneta driver, from Gregory Clement.

4) Implement IPV6_HDRINCL socket option for raw sockets, from Hannes
Frederic Sowa.

5) Add support for T6 adapter to cxgb4 driver, from Hariprasad Shenai.

6) Add support for VLAN device bridging to mlxsw switch driver, from
Ido Schimmel.

7) Add driver for Netronome NFP4000/NFP6000, from Jakub Kicinski.

8) Provide hwmon interface to mlxsw switch driver, from Jiri Pirko.

9) Reorganize wireless drivers into per-vendor directories just like we
do for ethernet drivers. From Kalle Valo.

10) Provide a way for administrators "destroy" connected sockets via the
SOCK_DESTROY socket netlink diag operation. From Lorenzo Colitti.

11) Add support to add/remove multicast routes via netlink, from Nikolay
Aleksandrov.

12) Make TCP keepalive settings per-namespace, from Nikolay Borisov.

13) Add forwarding and packet duplication facilities to nf_tables, from
Pablo Neira Ayuso.

14) Dead route support in MPLS, from Roopa Prabhu.

15) TSO support for thunderx chips, from Sunil Goutham.

16) Add driver for IBM's System i/p VNIC protocol, from Thomas Falcon.

17) Rationalize, consolidate, and more completely document the checksum
offloading facilities in the networking stack. From Tom Herbert.

18) Support aborting an ongoing scan in mac80211/cfg80211, from
Vidyullatha Kanchanapally.

19) Use per-bucket spinlock for bpf hash facility, from Tom Leiming.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1375 commits)
net: bnxt: always return values from _bnxt_get_max_rings
net: bpf: reject invalid shifts
phonet: properly unshare skbs in phonet_rcv()
dwc_eth_qos: Fix dma address for multi-fragment skbs
phy: remove an unneeded condition
mdio: remove an unneed condition
mdio_bus: NULL dereference on allocation error
net: Fix typo in netdev_intersect_features
net: freescale: mac-fec: Fix build error from phy_device API change
net: freescale: ucc_geth: Fix build error from phy_device API change
bonding: Prevent IPv6 link local address on enslaved devices
IB/mlx5: Add flow steering support
net/mlx5_core: Export flow steering API
net/mlx5_core: Make ipv4/ipv6 location more clear
net/mlx5_core: Enable flow steering support for the IB driver
net/mlx5_core: Initialize namespaces only when supported by device
net/mlx5_core: Set priority attributes
net/mlx5_core: Connect flow tables
net/mlx5_core: Introduce modify flow table command
net/mlx5_core: Managing root flow table
...

Linus Torvalds
2016-01-13 10:57:02 +0800

12 Jan, 2016

1 commit

ddf1d6238 Merge branch 'work.xattr' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs ... Browse Code »

Pull vfs xattr updates from Al Viro:
"Andreas' xattr cleanup series.

It's a followup to his xattr work that went in last cycle; -0.5KLoC"

* 'work.xattr' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
xattr handlers: Simplify list operation
ocfs2: Replace list xattr handler operations
nfs: Move call to security_inode_listsecurity into nfs_listxattr
xfs: Change how listxattr generates synthetic attributes
tmpfs: listxattr should include POSIX ACL xattrs
tmpfs: Use xattr handler infrastructure
btrfs: Use xattr handler infrastructure
vfs: Distinguish between full xattr names and proper prefixes
posix acls: Remove duplicate xattr name definitions
gfs2: Remove gfs2_xattr_acl_chmod
vfs: Remove vfs_xattr_cmp

Linus Torvalds
2016-01-12 05:32:10 +0800

31 Dec, 2015

1 commit

fceef393a switch ->get_link() to delayed_call, kill ->put_link() ... Browse Code »

Signed-off-by: Al Viro

Al Viro
2015-12-31 02:01:03 +0800

30 Dec, 2015

1 commit

cd3417c8f kill free_page_put_link() ... Browse Code »

all callers are better off with kfree_put_link()

Signed-off-by: Al Viro

Al Viro
2015-12-30 05:03:53 +0800

09 Dec, 2015

1 commit

6b2553918 replace ->follow_link() with new method that could stay in RCU mode ... Browse Code »

new method: ->get_link(); replacement of ->follow_link(). The differences
are:
* inode and dentry are passed separately
* might be called both in RCU and non-RCU mode;
the former is indicated by passing it a NULL dentry.
* when called that way it isn't allowed to block
and should return ERR_PTR(-ECHILD) if it needs to be called
in non-RCU mode.

It's a flagday change - the old method is gone, all in-tree instances
converted. Conversion isn't hard; said that, so far very few instances
do not immediately bail out when called in RCU mode. That'll change
in the next commits.

Signed-off-by: Al Viro

Al Viro
2015-12-09 11:41:54 +0800

07 Dec, 2015

2 commits

786534b92 tmpfs: listxattr should include POSIX ACL xattrs ... Browse Code »

When a file on tmpfs has an ACL or a Default ACL, listxattr should include the
corresponding xattr name.

Signed-off-by: Andreas Gruenbacher
Reviewed-by: James Morris
Cc: Hugh Dickins
Cc: linux-mm@kvack.org
Signed-off-by: Al Viro

Andreas Gruenbacher
2015-12-07 10:34:15 +0800
aa7c5241c tmpfs: Use xattr handler infrastructure ... Browse Code »

Use the VFS xattr handler infrastructure and get rid of similar code in
the filesystem. For implementing shmem_xattr_handler_set, we need a
version of simple_xattr_set which removes the attribute when value is
NULL. Use this to implement kernfs_iop_removexattr as well.

Signed-off-by: Andreas Gruenbacher
Reviewed-by: James Morris
Cc: Hugh Dickins
Cc: linux-mm@kvack.org
Signed-off-by: Al Viro

Andreas Gruenbacher
2015-12-07 10:34:15 +0800

21 Nov, 2015

1 commit

bd96f76a2 kernfs: implement kernfs_walk_and_get() ... Browse Code »

Implement kernfs_walk_and_get() which is similar to
kernfs_find_and_get() but can walk a path instead of just a name.

v2: Use strlcpy() instead of strlen() + memcpy() as suggested by
David.

Signed-off-by: Tejun Heo
Acked-by: Greg Kroah-Hartman
Cc: David Miller

Tejun Heo
2015-11-21 04:55:52 +0800

19 Aug, 2015

1 commit

9acee9c55 kernfs: implement kernfs_path_len() ... Browse Code »

Add a function to determine the path length of a kernfs node. This
for now will be used by writeback tracepoint updates.

Signed-off-by: Tejun Heo
Acked-by: Greg Kroah-Hartman
Signed-off-by: Jens Axboe

Tejun Heo
2015-08-19 06:49:15 +0800

04 Jul, 2015

1 commit

0cbee9926 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace ... Browse Code »

Pull user namespace updates from Eric Biederman:
"Long ago and far away when user namespaces where young it was realized
that allowing fresh mounts of proc and sysfs with only user namespace
permissions could violate the basic rule that only root gets to decide
if proc or sysfs should be mounted at all.

Some hacks were put in place to reduce the worst of the damage could
be done, and the common sense rule was adopted that fresh mounts of
proc and sysfs should allow no more than bind mounts of proc and
sysfs. Unfortunately that rule has not been fully enforced.

There are two kinds of gaps in that enforcement. Only filesystems
mounted on empty directories of proc and sysfs should be ignored but
the test for empty directories was insufficient. So in my tree
directories on proc, sysctl and sysfs that will always be empty are
created specially. Every other technique is imperfect as an ordinary
directory can have entries added even after a readdir returns and
shows that the directory is empty. Special creation of directories
for mount points makes the code in the kernel a smidge clearer about
it's purpose. I asked container developers from the various container
projects to help test this and no holes were found in the set of mount
points on proc and sysfs that are created specially.

This set of changes also starts enforcing the mount flags of fresh
mounts of proc and sysfs are consistent with the existing mount of
proc and sysfs. I expected this to be the boring part of the work but
unfortunately unprivileged userspace winds up mounting fresh copies of
proc and sysfs with noexec and nosuid clear when root set those flags
on the previous mount of proc and sysfs. So for now only the atime,
read-only and nodev attributes which userspace happens to keep
consistent are enforced. Dealing with the noexec and nosuid
attributes remains for another time.

This set of changes also addresses an issue with how open file
descriptors from /proc//ns/* are displayed. Recently readlink of
/proc//fd has been triggering a WARN_ON that has not been
meaningful since it was added (as all of the code in the kernel was
converted) and is not now actively wrong.

There is also a short list of issues that have not been fixed yet that
I will mention briefly.

It is possible to rename a directory from below to above a bind mount.
At which point any directory pointers below the renamed directory can
be walked up to the root directory of the filesystem. With user
namespaces enabled a bind mount of the bind mount can be created
allowing the user to pick a directory whose children they can rename
to outside of the bind mount. This is challenging to fix and doubly
so because all obvious solutions must touch code that is in the
performance part of pathname resolution.

As mentioned above there is also a question of how to ensure that
developers by accident or with purpose do not introduce exectuable
files on sysfs and proc and in doing so introduce security regressions
in the current userspace that will not be immediately obvious and as
such are likely to require breaking userspace in painful ways once
they are recognized"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
vfs: Remove incorrect debugging WARN in prepend_path
mnt: Update fs_fully_visible to test for permanently empty directories
sysfs: Create mountpoints with sysfs_create_mount_point
sysfs: Add support for permanently empty directories to serve as mount points.
kernfs: Add support for always empty directories.
proc: Allow creating permanently empty directories that serve as mount points
sysctl: Allow creating permanently empty directories that serve as mountpoints.
fs: Add helper functions for permanently empty directories.
vfs: Ignore unlocked mounts in fs_fully_visible
mnt: Modify fs_fully_visible to deal with locked ro nodev and atime
mnt: Refactor the logic for mounting sysfs and proc in a user namespace

Linus Torvalds
2015-07-04 06:20:57 +0800

01 Jul, 2015

1 commit

ea015218f kernfs: Add support for always empty directories. ... Browse Code »

Add a new function kernfs_create_empty_dir that can be used to create
directory that can not be modified.

Update the code to use make_empty_dir_inode when reporting a
permanently empty directory to the vfs.

Update the code to not allow adding to permanently empty directories.

Cc: stable@vger.kernel.org
Signed-off-by: "Eric W. Biederman"

Eric W. Biederman
2015-07-01 23:36:43 +0800

27 Jun, 2015

2 commits

bbe179f88 Merge branch 'for-4.2' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup ... Browse Code »

Pull cgroup updates from Tejun Heo:

- threadgroup_lock got reorganized so that its users can pick the
actual locking mechanism to use. Its only user - cgroups - is
updated to use a percpu_rwsem instead of per-process rwsem.

This makes things a bit lighter on hot paths and allows cgroups to
perform and fail multi-task (a process) migrations atomically.
Multi-task migrations are used in several places including the
unified hierarchy.

- Delegation rule and documentation added to unified hierarchy. This
will likely be the last interface update from the cgroup core side
for unified hierarchy before lifting the devel mask.

- Some groundwork for the pids controller which is scheduled to be
merged in the coming devel cycle.

* 'for-4.2' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
cgroup: add delegation section to unified hierarchy documentation
cgroup: require write perm on common ancestor when moving processes on the default hierarchy
cgroup: separate out cgroup_procs_write_permission() from __cgroup_procs_write()
kernfs: make kernfs_get_inode() public
MAINTAINERS: add a cgroup core co-maintainer
cgroup: fix uninitialised iterator in for_each_subsys_which
cgroup: replace explicit ss_mask checking with for_each_subsys_which
cgroup: use bitmask to filter for_each_subsys
cgroup: add seq_file forward declaration for struct cftype
cgroup: simplify threadgroup locking
sched, cgroup: replace signal_struct->group_rwsem with a global percpu_rwsem
sched, cgroup: reorganize threadgroup locking
cgroup: switch to unsigned long for bitmasks
cgroup: reorganize include/linux/cgroup.h
cgroup: separate out include/linux/cgroup-defs.h
cgroup: fix some comment typos

Linus Torvalds
2015-06-27 10:50:04 +0800
8d7804a2f Merge tag 'driver-core-4.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core ... Browse Code »

Pull driver core updates from Greg KH:
"Here is the driver core / firmware changes for 4.2-rc1.

A number of small changes all over the place in the driver core, and
in the firmware subsystem. Nothing really major, full details in the
shortlog. Some of it is a bit of churn, given that the platform
driver probing changes was found to not work well, so they were
reverted.

All of these have been in linux-next for a while with no reported
issues"

* tag 'driver-core-4.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (31 commits)
Revert "base/platform: Only insert MEM and IO resources"
Revert "base/platform: Continue on insert_resource() error"
Revert "of/platform: Use platform_device interface"
Revert "base/platform: Remove code duplication"
firmware: add missing kfree for work on async call
fs: sysfs: don't pass count == 0 to bin file readers
base:dd - Fix for typo in comment to function driver_deferred_probe_trigger().
base/platform: Remove code duplication
of/platform: Use platform_device interface
base/platform: Continue on insert_resource() error
base/platform: Only insert MEM and IO resources
firmware: use const for remaining firmware names
firmware: fix possible use after free on name on asynchronous request
firmware: check for file truncation on direct firmware loading
firmware: fix __getname() missing failure check
drivers: of/base: move of_init to driver_init
drivers/base: cacheinfo: fix annoying typo when DT nodes are absent
sysfs: disambiguate between "error code" and "failure" in comments
driver-core: fix build for !CONFIG_MODULES
driver-core: make __device_attach() static
...

Linus Torvalds
2015-06-27 06:07:37 +0800

23 Jun, 2015

1 commit

052b398a4 Merge branch 'for-linus-1' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs ... Browse Code »

Pull vfs updates from Al Viro:
"In this pile: pathname resolution rewrite.

- recursion in link_path_walk() is gone.

- nesting limits on symlinks are gone (the only limit remaining is
that the total amount of symlinks is no more than 40, no matter how
nested).

- "fast" (inline) symlinks are handled without leaving rcuwalk mode.

- stack footprint (independent of the nesting) is below kilobyte now,
about on par with what it used to be with one level of nested
symlinks and ~2.8 times lower than it used to be in the worst case.

- struct nameidata is entirely private to fs/namei.c now (not even
opaque pointers are being passed around).

- ->follow_link() and ->put_link() calling conventions had been
changed; all in-tree filesystems converted, out-of-tree should be
able to follow reasonably easily.

For out-of-tree conversions, see Documentation/filesystems/porting
for details (and in-tree filesystems for examples of conversion).

That has sat in -next since mid-May, seems to survive all testing
without regressions and merges clean with v4.1"

* 'for-linus-1' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (131 commits)
turn user_{path_at,path,lpath,path_dir}() into static inlines
namei: move saved_nd pointer into struct nameidata
inline user_path_create()
inline user_path_parent()
namei: trim do_last() arguments
namei: stash dfd and name into nameidata
namei: fold path_cleanup() into terminate_walk()
namei: saner calling conventions for filename_parentat()
namei: saner calling conventions for filename_create()
namei: shift nameidata down into filename_parentat()
namei: make filename_lookup() reject ERR_PTR() passed as name
namei: shift nameidata inside filename_lookup()
namei: move putname() call into filename_lookup()
namei: pass the struct path to store the result down into path_lookupat()
namei: uninline set_root{,_rcu}()
namei: be careful with mountpoint crossings in follow_dotdot_rcu()
Documentation: remove outdated information from automount-support.txt
get rid of assorted nameidata-related debris
lustre: kill unused helper
lustre: kill unused macro (LOOKUP_CONTINUE)
...

Linus Torvalds
2015-06-23 03:51:21 +0800

19 Jun, 2015

1 commit

fb02915f4 kernfs: make kernfs_get_inode() public ... Browse Code »

Move kernfs_get_inode() prototype from fs/kernfs/kernfs-internal.h to
include/linux/kernfs.h. It obtains the matching inode for a
kernfs_node.

It will be used by cgroup for inode based permission checks for now
but is generally useful.

Signed-off-by: Tejun Heo
Acked-by: Greg Kroah-Hartman

Tejun Heo
2015-06-19 04:54:28 +0800

25 May, 2015

1 commit

ba50150e8 kernfs: remove outdated and confusing comment ... Browse Code »

Grabbing the parent is not happening anymore since 2010 (e72ceb8ccac5f7
"sysfs: Remove sysfs_get/put_active_two"). Remove this confusing
comment.

Signed-off-by: Wolfram Sang
Acked-by: Tejun Heo
Signed-off-by: Greg Kroah-Hartman

Wolfram Sang
2015-05-25 03:28:29 +0800

15 May, 2015

1 commit

499611ed4 kernfs: do not account ino_ida allocations to memcg ... Browse Code »

root->ino_ida is used for kernfs inode number allocations. Since IDA has
a layered structure, different IDs can reside on the same layer, which
is currently accounted to some memory cgroup. The problem is that each
kmem cache of a memory cgroup has its own directory on sysfs (under
/sys/fs/kernel//cgroup). If the inode number of such a
directory or any file in it gets allocated from a layer accounted to the
cgroup which the cache is created for, the cgroup will get pinned for
good, because one has to free all kmem allocations accounted to a cgroup
in order to release it and destroy all its kmem caches. That said we
must not account layers of ino_ida to any memory cgroup.

Since per net init operations may create new sysfs entries directly
(e.g. lo device) or indirectly (nf_conntrack creates a new kmem cache
per each namespace, which, in turn, creates new sysfs entries), an easy
way to reproduce this issue is by creating network namespace(s) from
inside a kmem-active memory cgroup.

Signed-off-by: Vladimir Davydov
Acked-by: Tejun Heo
Cc: Johannes Weiner
Cc: Michal Hocko
Cc: Christoph Lameter
Cc: Pekka Enberg
Cc: David Rientjes
Cc: Joonsoo Kim
Cc: Greg Thelen
Cc: Greg Kroah-Hartman
Cc: [4.0.x]
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Vladimir Davydov
2015-05-15 08:55:51 +0800

11 May, 2015

3 commits

ecc087ff1 new helper: free_page_put_link() ... Browse Code »

similar to kfree_put_link()

Signed-off-by: Al Viro

Al Viro
2015-05-11 20:13:13 +0800
5f2c4179e switch ->put_link() from dentry to inode ... Browse Code »

only one instance looks at that argument at all; that sole
exception wants inode rather than dentry.

Signed-off-by: Al Viro

Al Viro
2015-05-11 20:13:12 +0800
6e77137b3 don't pass nameidata to ->follow_link() ... Browse Code »

its only use is getting passed to nd_jump_link(), which can obtain
it from current->nameidata

Signed-off-by: Al Viro

Al Viro
2015-05-11 10:20:15 +0800