Eric Lee / smarc-fsl-linux-kernel

15 Oct, 2016

1 commit

f34d3606f Merge branch 'for-4.9' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup

Pull cgroup updates from Tejun Heo:

- tracepoints for basic cgroup management operations added

- kernfs and cgroup path formatting functions updated to behave in the
style of strlcpy()

- non-critical bug fixes

* 'for-4.9' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
blkcg: Unlock blkcg_pol_mutex only once when cpd == NULL
cgroup: fix error handling regressions in proc_cgroup_show() and cgroup_release_agent()
cpuset: fix error handling regression in proc_cpuset_show()
cgroup: add tracepoints for basic operations
cgroup: make cgroup_path() and friends behave in the style of strlcpy()
kernfs: remove kernfs_path_len()
kernfs: make kernfs_path*() behave in the style of strlcpy()
kernfs: add dummy implementation of kernfs_path_from_node()

Linus Torvalds
2016-10-15 03:18:50 +0800

27 Sep, 2016

1 commit

78618d395 sysfs print name of undiscoverable attribute group

Print the name of an undiscoverable attribute group and not the
pointer's address.

Signed-off-by: Johannes Thumshirn
Signed-off-by: Greg Kroah-Hartman

Johannes Thumshirn
2016-09-27 18:24:29 +0800

31 Aug, 2016

1 commit

17d0774f8 sysfs: correctly handle read offset on PREALLOC attrs

Attributes declared with __ATTR_PREALLOC use sysfs_kf_read(), which returns
zero bytes for a non-zero offset. This breaks the checkarray script in the
mdadm tool on Debian, where /bin/sh is 'dash', because its builtin 'read'
reads only one byte at a time. The script gets 'i' instead of 'idle' when it
reads the current action from /sys/block/$dev/md/sync_action and as a result
does nothing.

This patch adds a trivial implementation of partial read: generate the whole
string and move the required part to the head of the buffer.

Signed-off-by: Konstantin Khlebnikov
Fixes: 4ef67a8c95f3 ("sysfs/kernfs: make read requests on pre-alloc files use the buffer.")
Link: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=787950
Cc: Stable # v3.19+
Acked-by: Tejun Heo
Signed-off-by: Greg Kroah-Hartman

Konstantin Khlebnikov
2016-08-31 21:14:44 +0800
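
The partial read described in 17d0774f8 can be sketched in user-space C. This is an illustration only, not the kernel's sysfs_kf_read(): the function partial_read() and its parameters are hypothetical, and `value` stands in for whatever string the attribute's show callback would generate.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical user-space model of a partial read.
 * `value` stands in for the full string an attribute would produce.
 * Returns the number of bytes left at the head of `buf`.
 */
size_t partial_read(const char *value, char *buf, size_t bufsize,
                    size_t pos, size_t count)
{
    size_t len;

    assert(value != NULL && buf != NULL);

    /* Step 1: generate the whole string into the buffer. */
    len = strlen(value);
    if (len > bufsize)
        len = bufsize;
    memcpy(buf, value, len);

    /* Step 2: a read at or past the end returns nothing. */
    if (pos >= len)
        return 0;

    /* Step 3: move the requested part to the head of the buffer. */
    len -= pos;
    if (len > count)
        len = count;
    memmove(buf, buf + pos, len);
    return len;
}
```

With the value "idle\n", a one-byte read at offset 1 leaves 'd' at the head of the buffer, which is what a byte-at-a-time reader such as the dash 'read' builtin needs in order to reassemble "idle".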

10 Aug, 2016

1 commit

3abb1d90f kernfs: make kernfs_path*() behave in the style of strlcpy()

kernfs_path*() functions always return the length of the full path but
the path content is undefined if the length is larger than the
provided buffer. This makes its behavior different from strlcpy() and
requires error handling in all its users even when they don't care
about truncation. In addition, the implementation can actually be
simplified by making it behave properly in strlcpy() style.

* Update kernfs_path_from_node_locked() to always fill up the buffer
with path. If the buffer is not large enough, the output is
truncated and terminated.

* kernfs_path() no longer needs error handling. Make it a simple
inline wrapper around kernfs_path_from_node().

* sysfs_warn_dup()'s use of kernfs_path() doesn't need error handling.
Updated accordingly.

* cgroup_path()'s use of kernfs_path() updated to retain the old
behavior.

Signed-off-by: Tejun Heo
Acked-by: Greg Kroah-Hartman
Acked-by: Serge Hallyn

Tejun Heo
2016-08-10 23:23:44 +0800
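
The strlcpy()-style contract from 3abb1d90f (always fill the buffer, truncate and terminate if it is too small, and still return the full length) can be sketched as follows. This is a user-space illustration under stated assumptions, not the kernel implementation: path_join() and its parameters are hypothetical names, and the path is built from a plain array of component names rather than from kernfs nodes.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: joins `n` components with '/' into `buf`.
 * Like strlcpy(), it returns the length of the full path, and the
 * output is truncated and NUL-terminated when `buflen` is too small.
 */
size_t path_join(const char *const *names, size_t n, char *buf, size_t buflen)
{
    size_t total = 0;

    assert(names != NULL && (buf != NULL || buflen == 0));

    for (size_t i = 0; i < n; i++) {
        const char *parts[2] = { "/", names[i] };

        for (size_t j = 0; j < 2; j++) {
            size_t plen = strlen(parts[j]);

            /* Copy only what still fits, leaving room for the NUL. */
            if (buflen > 0 && total < buflen - 1) {
                size_t room = buflen - 1 - total;

                memcpy(buf + total, parts[j], plen < room ? plen : room);
            }
            /* Keep counting even after the buffer is full. */
            total += plen;
        }
    }

    if (buflen > 0)
        buf[total < buflen ? total : buflen - 1] = '\0';
    return total;
}
```

A caller that does not care about truncation can ignore the return value and use the buffer directly; a caller that does care compares the return value against the buffer size, exactly as with strlcpy().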

24 Jun, 2016

2 commits

29a517c23 kernfs: The cgroup filesystem also benefits from SB_I_NOEXEC

The cgroup filesystem is in the same boat as sysfs. No one ever
permits executables of any kind on the cgroup filesystem, and there is
no reasonable case for supporting executables in the future.

Therefore move the setting of SB_I_NOEXEC, which makes the code proof
against future mistakes of accidentally creating executables, from
sysfs to kernfs itself. This makes the code simpler and covers the
sysfs, cgroup, and cgroup2 filesystems.

Acked-by: Seth Forshee
Signed-off-by: "Eric W. Biederman"

Eric W. Biederman
2016-06-24 04:41:56 +0800
8654df4e2 mnt: Refactor fs_fully_visible into mount_too_revealing

Replace the call of fs_fully_visible in do_new_mount from before the
new superblock is allocated with a call of mount_too_revealing after
the superblock is allocated. This winds up being a much better location
for maintainability of the code.

The first change this enables is the replacement of FS_USERNS_VISIBLE
with SB_I_USERNS_VISIBLE, moving the flag from struct file_system_type
to s_iflags on the superblock.

Unfortunately mount_too_revealing fundamentally needs to touch
mnt_flags adding several MNT_LOCKED_XXX flags at the appropriate
times. If the mnt_flags did not need to be touched the code
could be easily moved into the filesystem specific mount code.

Acked-by: Seth Forshee
Signed-off-by: "Eric W. Biederman"

Eric W. Biederman
2016-06-24 04:41:46 +0800

14 Nov, 2015

1 commit

63f4f7e8d Merge tag 'chrome-platform-4.4' of git://git.kernel.org/pub/scm/linux/kernel/git/olof/chrome-platform

Pull chrome platform updates from Olof Johansson:
"Here's the branch of chrome platform changes for v4.4. Some have been
queued up for the full 4.3 release cycle since I forgot to send them
in for that round (rebased early on to deal with fixes conflicts).

Most of these enable EC communication stuff -- Pixel 2015 support,
enabling building for ARM64 platforms, and a few fixes for memory
leaks.

There's also a patch in here to allow reading/writing the verified
boot context, which depends on a sysfs patch acked by Greg"

* tag 'chrome-platform-4.4' of git://git.kernel.org/pub/scm/linux/kernel/git/olof/chrome-platform:
platform/chrome: Fix i2c-designware adapter name
platform/chrome: Support reading/writing the vboot context
sysfs: Support is_visible() on binary attributes
platform/chrome: cros_ec: Fix possible leak in led_rgb_store()
platform/chrome: cros_ec: Fix leak in sequence_store()
platform/chrome: Enable Chrome platforms on 64-bit ARM
platform/chrome: cros_ec_dev - Add a platform device ID table
platform/chrome: cros_ec_lpc - Add support for Google Pixel 2
platform/chrome: cros_ec_lpc - Use existing function to check EC result
platform/chrome: Make depends on MFD_CROS_EC instead CROS_EC_PROTO
Revert "platform/chrome: Don't make CHROME_PLATFORMS depends on X86 || ARM"

Linus Torvalds
2015-11-14 13:53:18 +0800

06 Nov, 2015

1 commit

1873499e1 Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security

Pull security subsystem update from James Morris:
"This is mostly maintenance updates across the subsystem, with a
notable update for TPM 2.0, and addition of Jarkko Sakkinen as a
maintainer of that"

* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: (40 commits)
apparmor: clarify CRYPTO dependency
selinux: Use a kmem_cache for allocation struct file_security_struct
selinux: ioctl_has_perm should be static
selinux: use sprintf return value
selinux: use kstrdup() in security_get_bools()
selinux: use kmemdup in security_sid_to_context_core()
selinux: remove pointless cast in selinux_inode_setsecurity()
selinux: introduce security_context_str_to_sid
selinux: do not check open perm on ftruncate call
selinux: change CONFIG_SECURITY_SELINUX_CHECKREQPROT_VALUE default
KEYS: Merge the type-specific data with the payload data
KEYS: Provide a script to extract a module signature
KEYS: Provide a script to extract the sys cert list from a vmlinux file
keys: Be more consistent in selection of union members used
certs: add .gitignore to stop git nagging about x509_certificate_list
KEYS: use kvfree() in add_key
Smack: limited capability for changing process label
TPM: remove unnecessary little endian conversion
vTPM: support little endian guests
char: Drop owner assignment from i2c_driver
...

Linus Torvalds
2015-11-06 07:32:38 +0800

19 Oct, 2015

1 commit

37c1c04cc sysfs: added __compat_only_sysfs_link_entry_to_kobj()

Added a new function __compat_only_sysfs_link_entry_to_kobj() that adds
a symlink from an attribute or group to a kobject. This is needed for
maintaining backwards compatibility with PPI attributes in the TPM
driver.

Signed-off-by: Jarkko Sakkinen
Signed-off-by: Peter Huewe

Jarkko Sakkinen
2015-10-19 07:01:19 +0800

08 Oct, 2015

1 commit

7f5028cf6 sysfs: Support is_visible() on binary attributes

According to the sysfs header file:

"The returned value will replace static permissions defined in
struct attribute or struct bin_attribute."

but this isn't the case, as is_visible is only called on struct attribute.
This patch introduces a new is_bin_visible() function to implement
the same functionality for binary attributes, and updates documentation
accordingly.

Note that to keep functionality and code similar to that of normal
attributes, the mode is now checked as well to ensure it contains only
read/write permissions or SYSFS_PREALLOC.

Reviewed-by: Guenter Roeck
Signed-off-by: Emilio López
Acked-by: Greg Kroah-Hartman
Signed-off-by: Olof Johansson

Emilio López
2015-10-08 06:05:31 +0800

05 Oct, 2015

1 commit

65da3484d sysfs: correctly handle short reads on PREALLOC attrs.

Attributes declared with __ATTR_PREALLOC use sysfs_kf_read()
which ignores the 'count' arg.
So a 1-byte read request can return more bytes than that.

This is seen with the 'dash' shell when 'read' is used on
some 'md' sysfs attributes.

So only return the 'min' of count and the attribute length.

Signed-off-by: NeilBrown
Acked-by: Tejun Heo
Signed-off-by: Greg Kroah-Hartman

NeilBrown
2015-10-05 02:42:22 +0800

10 Jul, 2015

1 commit

90f8572b0 vfs: Commit to never having executables on proc and sysfs.

Today proc and sysfs do not contain any executable files. Several
applications today mount proc or sysfs without noexec and nosuid and
then depend on there being no executable files on proc or sysfs.
Having any executable files show on proc or sysfs would cause
a user space visible regression, and most likely security problems.

Therefore commit to never allowing executables on proc and sysfs by
adding a new flag to mark them as filesystems without executables and
enforce that flag.

Test the flag where MNT_NOEXEC is tested today, so that the only user
visible effect will be that executables will be treated as if the
execute bit is cleared.

The filesystems proc and sysfs do not currently incorporate any
executable files so this does not result in any user visible effects.

This makes it unnecessary to vet changes to proc and sysfs tightly for
adding executable files or changes to chattr that would modify
existing files, as no matter what the individual files say they will
not be treated as executable files by the vfs.

Not having to vet changes too closely is important as without this we
are only one proc_create call (or another goof up in the
implementation of notify_change) from having problematic executables
on proc. Those mistakes are all too easy to make and would create
a situation where there are security issues or the assumptions of
some program having to be broken (and cause userspace regressions).

Signed-off-by: "Eric W. Biederman"

Eric W. Biederman
2015-07-10 23:39:25 +0800

04 Jul, 2015

1 commit

0cbee9926 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace

Pull user namespace updates from Eric Biederman:
"Long ago and far away when user namespaces were young it was realized
that allowing fresh mounts of proc and sysfs with only user namespace
permissions could violate the basic rule that only root gets to decide
if proc or sysfs should be mounted at all.

Some hacks were put in place to reduce the worst of the damage that could
be done, and the common sense rule was adopted that fresh mounts of
proc and sysfs should allow no more than bind mounts of proc and
sysfs. Unfortunately that rule has not been fully enforced.

There are two kinds of gaps in that enforcement. Only filesystems
mounted on empty directories of proc and sysfs should be ignored but
the test for empty directories was insufficient. So in my tree
directories on proc, sysctl and sysfs that will always be empty are
created specially. Every other technique is imperfect as an ordinary
directory can have entries added even after a readdir returns and
shows that the directory is empty. Special creation of directories
for mount points makes the code in the kernel a smidge clearer about
its purpose. I asked container developers from the various container
projects to help test this and no holes were found in the set of mount
points on proc and sysfs that are created specially.

This set of changes also starts enforcing the mount flags of fresh
mounts of proc and sysfs are consistent with the existing mount of
proc and sysfs. I expected this to be the boring part of the work but
unfortunately unprivileged userspace winds up mounting fresh copies of
proc and sysfs with noexec and nosuid clear when root set those flags
on the previous mount of proc and sysfs. So for now only the atime,
read-only and nodev attributes which userspace happens to keep
consistent are enforced. Dealing with the noexec and nosuid
attributes remains for another time.

This set of changes also addresses an issue with how open file
descriptors from /proc/<pid>/ns/* are displayed. Recently readlink of
/proc/<pid>/fd has been triggering a WARN_ON that has not been
meaningful since it was added (as all of the code in the kernel was
converted) and is now actively wrong.

There is also a short list of issues that have not been fixed yet that
I will mention briefly.

It is possible to rename a directory from below to above a bind mount.
At which point any directory pointers below the renamed directory can
be walked up to the root directory of the filesystem. With user
namespaces enabled a bind mount of the bind mount can be created
allowing the user to pick a directory whose children they can rename
to outside of the bind mount. This is challenging to fix and doubly
so because all obvious solutions must touch code that is in the
performance part of pathname resolution.

As mentioned above there is also a question of how to ensure that
developers by accident or with purpose do not introduce executable
files on sysfs and proc and in doing so introduce security regressions
in the current userspace that will not be immediately obvious and as
such are likely to require breaking userspace in painful ways once
they are recognized"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
vfs: Remove incorrect debugging WARN in prepend_path
mnt: Update fs_fully_visible to test for permanently empty directories
sysfs: Create mountpoints with sysfs_create_mount_point
sysfs: Add support for permanently empty directories to serve as mount points.
kernfs: Add support for always empty directories.
proc: Allow creating permanently empty directories that serve as mount points
sysctl: Allow creating permanently empty directories that serve as mountpoints.
fs: Add helper functions for permanently empty directories.
vfs: Ignore unlocked mounts in fs_fully_visible
mnt: Modify fs_fully_visible to deal with locked ro nodev and atime
mnt: Refactor the logic for mounting sysfs and proc in a user namespace

Linus Torvalds
2015-07-04 06:20:57 +0800

01 Jul, 2015

1 commit

87d2846fc sysfs: Add support for permanently empty directories to serve as mount points.

Add two functions sysfs_create_mount_point and
sysfs_remove_mount_point that hang a permanently empty directory off
of a kobject or remove a permanently empty directory hanging from a
kobject. Export these new functions so modular filesystems can use
them.

Cc: stable@vger.kernel.org
Acked-by: Greg Kroah-Hartman
Signed-off-by: "Eric W. Biederman"

Eric W. Biederman
2015-07-01 23:36:45 +0800

01 Jun, 2015

1 commit

eaa5cd926 fs: sysfs: don't pass count == 0 to bin file readers

If count == 0 bytes are requested by a reader, sysfs_kf_bin_read()
deliberately returns 0 without passing a potentially harmful value to
some externally defined underlying battr->read() function.

However in case of (pos == size && count) the next clause always sets
count to 0 and this value is handed over to battr->read().

The change intends to make obsolete (and remove later) a redundant
sanity check in battr->read(), if it is present, or to add more
protection for struct bin_attribute users who do not care about
input arguments.

Signed-off-by: Vladimir Zapolskiy
Acked-by: Tejun Heo
Signed-off-by: Greg Kroah-Hartman

Vladimir Zapolskiy
2015-06-01 09:17:17 +0800
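
The clamping logic that eaa5cd926 describes can be sketched in plain C. This is a hedged user-space model, not the body of sysfs_kf_bin_read(): clamp_bin_read() is a hypothetical name, and the sketch assumes that a file size of 0 means the attribute has no fixed size and is therefore not clamped.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical model of the checks made before calling the attribute's
 * read callback. A return value of 0 means "return 0 to the caller
 * without invoking the callback"; any other value is the byte count
 * that would be handed to the callback.
 */
size_t clamp_bin_read(size_t size, size_t pos, size_t count)
{
    /* A zero-length request never reaches the callback. */
    if (count == 0)
        return 0;

    /* Assumption: size == 0 means "no fixed size", so no clamping. */
    if (size != 0) {
        /*
         * Reading at the end (pos == size) is rejected here as well;
         * otherwise the clamp below would reduce count to 0 and that
         * zero would be passed on to the callback.
         */
        if (pos >= size)
            return 0;
        if (count > size - pos)
            count = size - pos;
    }
    return count;
}
```

For an 8-byte attribute, a 4-byte request at offset 6 is trimmed to 2 bytes, and a request at offset 8 returns 0 without the callback being called.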

25 May, 2015

1 commit

ed1dc8a89 sysfs: disambiguate between "error code" and "failure" in comments

The sentence "Returns 0 on success or error" might be misinterpreted as
"the function will always return 0". Make it less ambiguous.

Also, use the word "failure" as the contrary of "success".

Signed-off-by: Antonio Ospite
Cc: Greg Kroah-Hartman
Cc: Jonathan Corbet
Cc: linux-doc@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman

Antonio Ospite
2015-05-25 03:31:33 +0800

14 May, 2015

1 commit

1b852bceb mnt: Refactor the logic for mounting sysfs and proc in a user namespace

Fresh mounts of proc and sysfs are a very special case that works very
much like a bind mount. Unfortunately the current structure can not
preserve the MNT_LOCK... mount flags. Therefore refactor the logic
into a form that can be modified to preserve those lock bits.

Add a new filesystem flag FS_USERNS_VISIBLE that requires some mount
of the filesystem be fully visible in the current mount namespace,
before the filesystem may be mounted.

Move the logic for calling fs_fully_visible from proc and sysfs into
fs/namespace.c where it has greater access to mount namespace state.

Cc: stable@vger.kernel.org
Signed-off-by: "Eric W. Biederman"

Eric W. Biederman
2015-05-14 10:44:11 +0800

25 Mar, 2015

2 commits

d8bf8c92e sysfs: Only accept read/write permissions for file attributes

For sysfs file attributes, only read and write permissions make sense.
Mask provided attribute permissions accordingly and send a warning
to the console if invalid permission bits are set.

This patch is originally from Guenter [1] and includes the fixup
explained in the thread, that is printing permissions in octal format
and limiting the scope of attributes to SYSFS_PREALLOC | 0664.

[1] https://lkml.org/lkml/2015/1/19/599

Signed-off-by: Vivien Didelot
Reviewed-by: Guenter Roeck
Signed-off-by: Greg Kroah-Hartman

Vivien Didelot
2015-03-25 20:27:57 +0800
da4759c73 sysfs: Use only return value from is_visible for the file mode

Up to now, is_visible can only be used to either remove visibility
of a file entirely or to add permissions, but not to reduce permissions.
This makes it impossible, for example, to use DEVICE_ATTR_RW to define
file attributes and reduce permissions to read-only.

This behavior is undesirable and unnecessarily complicates code which
needs to reduce permissions; instead of just returning the desired
permissions, it has to ensure that the permissions in the attribute
variable declaration only reflect the minimal permissions ever needed.

Change semantics of is_visible to only use the permissions returned
from it instead of oring the returned value with the hard-coded
permissions.

Signed-off-by: Guenter Roeck
Signed-off-by: Vivien Didelot
Signed-off-by: Greg Kroah-Hartman

Guenter Roeck
2015-03-25 20:27:57 +0800
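
The change in is_visible semantics from da4759c73 can be illustrated by comparing the two ways of computing the final file mode. This is a simplified sketch: effective_mode_old() and effective_mode_new() are hypothetical names, and plain unsigned integers stand in for the kernel's mode type.

```c
#include <assert.h>

/*
 * Old behaviour (sketch): the value returned by is_visible is ORed
 * with the static mode, so the callback can hide a file (return 0)
 * or add permission bits, but can never take bits away.
 */
unsigned int effective_mode_old(unsigned int static_mode,
                                unsigned int visible_mode)
{
    if (visible_mode == 0)
        return 0;               /* file is not created */
    return static_mode | visible_mode;
}

/*
 * New behaviour (sketch): only the value returned by is_visible is
 * used, so the callback can also reduce permissions.
 */
unsigned int effective_mode_new(unsigned int static_mode,
                                unsigned int visible_mode)
{
    (void)static_mode;          /* static mode no longer contributes */
    return visible_mode;
}
```

With a static mode of 0644 and a callback that returns 0444, the old rule still yields 0644, while the new rule yields the read-only 0444 that the callback asked for.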

16 Feb, 2015

1 commit

9682ec969 Merge tag 'driver-core-3.20-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core

Pull driver core patches from Greg KH:
"Really tiny set of patches for this kernel. Nothing major, all
described in the shortlog and have been in linux-next for a while"

* tag 'driver-core-3.20-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
sysfs: fix warning when creating a sysfs group without attributes
firmware_loader: handle timeout via wait_for_completion_interruptible_timeout()
firmware_loader: abort request if wait_for_completion is interrupted
firmware: Correct function name in comment
device: Change dev_ logging functions to return void
device: Fix dev_dbg_once macro

Linus Torvalds
2015-02-16 03:11:47 +0800

14 Feb, 2015

1 commit

dfeb0750b kernfs: remove KERNFS_STATIC_NAME

When a new kernfs node is created, KERNFS_STATIC_NAME is used to avoid
making a separate copy of its name. It's currently only used for sysfs
attributes whose filenames are required to stay accessible and unchanged.
There are rare exceptions where these names are allocated and formatted
dynamically but for the vast majority of cases they're consts in the
rodata section.

Now that kernfs is converted to use kstrdup_const() and kfree_const(),
there's little point in keeping KERNFS_STATIC_NAME around. Remove it.

Signed-off-by: Tejun Heo
Cc: Andrzej Hajda
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds

Tejun Heo
2015-02-14 13:21:36 +0800

04 Feb, 2015

1 commit

adf305f77 sysfs: fix warning when creating a sysfs group without attributes

When attempting to create a group without attrs, the warning prints the
name of the group. However, the check for name being a NULL pointer is
wrong: it uses the pointer to the name when it's NULL. Fix it to use
the name if present, otherwise just put an empty string.

Cc: Bruno Prémont
Cc: Greg Kroah-Hartman
Signed-off-by: Javi Merino
Signed-off-by: Greg Kroah-Hartman

Javi Merino
2015-02-04 07:50:31 +0800
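
The fix in adf305f77 amounts to substituting an empty string when the group name is NULL before formatting the warning. A minimal user-space sketch follows; format_group_warning() is a hypothetical helper and the message text is illustrative, not the kernel's actual warning string.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper: formats a warning for a group that has no
 * attributes. Passing NULL to a "%s" conversion is undefined in
 * standard C, so a missing name is replaced by "" first.
 */
int format_group_warning(const char *name, char *buf, size_t buflen)
{
    assert(buf != NULL && buflen > 0);

    /* Illustrative message text, not the kernel's wording. */
    return snprintf(buf, buflen, "group '%s' has no attributes",
                    name ? name : "");
}
```

Called with a NULL name, the helper produces "group '' has no attributes" instead of dereferencing the NULL pointer.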

08 Nov, 2014

3 commits

4ef67a8c9 sysfs/kernfs: make read requests on pre-alloc files use the buffer.

To match the previous patch which used the pre-alloc buffer for
writes, this patch causes reads to use the same buffer.
This is not strictly necessary as the current seq_read() will allocate
on first read, so user-space can trigger the required pre-alloc. But
consistency is valuable.

The read function is somewhat simpler than seq_read() and, for example,
does not support reading from an offset into the file: reads must be
at the start of the file.

As seq_read() does not use the prealloc buffer, ->seq_show is
incompatible with ->prealloc and caused an EINVAL return from open().
sysfs code which calls into kernfs always chooses the correct function.

As the buffer is shared with writes and other reads, the mutex is
extended to cover the copy_to_user.

Signed-off-by: NeilBrown
Reviewed-by: Tejun Heo
Signed-off-by: Greg Kroah-Hartman

NeilBrown
2014-11-08 02:54:38 +0800
2b75869bb sysfs/kernfs: allow attributes to request write buffer be pre-allocated.

md/raid allows metadata management to be performed in user-space.
At various times, particularly on device failure, the metadata needs
to be updated before further writes can be permitted.
This means that the user-space program which updates metadata must
not block on writeout, and so must not allocate memory.

mlockall(MCL_CURRENT|MCL_FUTURE) and pre-allocation can avoid all
memory allocation issues for user-memory, but that does not help
kernel memory.
Several kernel objects can be pre-allocated. e.g. files opened before
any writes to the array are permitted.
However some kernel allocation happens in places that cannot be
pre-allocated.
In particular, writes to sysfs files (to tell md that it can now
allow writes to the array) allocate a buffer using GFP_KERNEL.

This patch allows attributes to be marked as "PREALLOC". In that case
the maximal buffer is allocated when the file is opened, and then used
on each write instead of allocating a new buffer.

As the same buffer is now shared for all writes on the same file
description, the mutex is extended to cover full use of the buffer
including the copy_from_user().

The new __ATTR_PREALLOC() 'or's a new flag in to the 'mode', which is
inspected by sysfs_add_file_mode_ns() to determine if the file should be
marked as requiring prealloc.

Despite the comment, we *do* use ->seq_show together with ->prealloc
in this patch. The next patch fixes that.

Signed-off-by: NeilBrown
Reviewed-by: Tejun Heo
Signed-off-by: Greg Kroah-Hartman

NeilBrown
2014-11-08 02:53:25 +0800
093689605 fs: sysfs: return EFBIG on write if offset is larger than file size

According to user expectations, common utilities like dd or the sh
redirection operator > should work correctly over binary files from
sysfs. At the moment an excessive write cannot be completed:

write(1, "\0\0\0\0\0\0\0\0", 8) = 4
write(1, "\0\0\0\0", 4) = 0
write(1, "\0\0\0\0", 4) = 0
write(1, "\0\0\0\0", 4) = 0
...

Fix the problem by returning EFBIG described in man 2 write.

Signed-off-by: Vladimir Zapolskiy
Cc: Greg Kroah-Hartman
Signed-off-by: Greg Kroah-Hartman

Vladimir Zapolskiy
2014-11-08 02:52:20 +0800
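
The behaviour that 093689605 introduces can be sketched as a small bounds check. This is a user-space model under stated assumptions, not the kernel's write path: clamp_bin_write() is a hypothetical name, and a size of 0 is assumed to mean that the attribute has no fixed size.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/*
 * Hypothetical model: returns the number of bytes that may be written
 * at `pos`, or -EFBIG when the offset is already at or past the end of
 * a fixed-size file. Returning an error instead of 0 lets tools such
 * as dd stop rather than retry forever.
 */
long clamp_bin_write(size_t size, size_t pos, size_t count)
{
    /* Assumption: size == 0 means "no fixed size", so no clamping. */
    if (size != 0) {
        if (pos >= size)
            return -EFBIG;
        if (count > size - pos)
            count = size - pos;
    }
    return (long)count;
}
```

For a 4-byte file this mirrors the trace in the commit message: the first 8-byte write at offset 0 is shortened to 4 bytes, and the follow-up write at offset 4 now fails with EFBIG instead of returning 0 repeatedly.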

28 May, 2014

2 commits

26fc9cd20 kernfs: move the last knowledge of sysfs out from kernfs

There is still one residue of sysfs remaining: the sb_magic
SYSFS_MAGIC. However, this should be kernfs user specific,
so this patch moves it out. Kernfs users should specify their
magic number while mounting.

Signed-off-by: Jianyu Zhan
Acked-by: Tejun Heo
Signed-off-by: Greg Kroah-Hartman

Jianyu Zhan
2014-05-28 05:33:17 +0800
9f70a4012 sysfs: fix attribute_group bin file path on removal

Cody Schafer already fixed binary file creation for attribute groups, see [1].
This patch makes the appropriate changes for binary file removal
of attribute groups.
[1]: http://lkml.org/lkml/2014/2/27/832

Signed-off-by: Robert ABEL
Signed-off-by: Greg Kroah-Hartman

Robert ABEL
2014-05-28 05:33:17 +0800

20 May, 2014

1 commit

f5c16f29b sysfs: make sure read buffer is zeroed

13c589d5b0ac ("sysfs: use seq_file when reading regular files")
switched sysfs from custom read implementation to seq_file to enable
later transition to kernfs. After the change, the buffer passed to
->show() is acquired through seq_get_buf(); unfortunately, this
introduces a subtle behavior change. Before the commit, the buffer
passed to ->show() was always zero as it was allocated using
get_zeroed_page(). Because seq_file doesn't clear buffers on
allocation and neither does seq_get_buf(), after the commit, depending
on the behavior of ->show(), we may end up exposing uninitialized data
to userland thus possibly altering userland visible behavior and
leaking information.

Fix it by explicitly clearing the buffer.

Signed-off-by: Tejun Heo
Reported-by: Ron
Fixes: 13c589d5b0ac ("sysfs: use seq_file when reading regular files")
Cc: stable # 3.13+
Signed-off-by: Greg Kroah-Hartman

Tejun Heo
2014-05-20 09:15:53 +0800

13 May, 2014

1 commit

555724a83 kernfs, sysfs, cgroup: restrict extra perm check on open to sysfs

The kernfs open method - kernfs_fop_open() - inherited extra
permission checks from sysfs. While the vfs layer allows ignoring the
read/write permissions checks if the issuer has CAP_DAC_OVERRIDE,
sysfs explicitly denied open regardless of the cap if the file doesn't
have any of the UGO perms of the requested access or doesn't implement
the requested operation. It can be debated whether this was a good
idea or not but the behavior is too subtle and dangerous to change at
this point.

After cgroup got converted to kernfs, this extra perm check also got
applied to cgroup breaking libcgroup which opens write-only files with
O_RDWR as root. This patch gates the extra open permission check with
a new flag KERNFS_ROOT_EXTRA_OPEN_PERM_CHECK and enables it for sysfs.
For sysfs, nothing changes. For cgroup, root now can perform any
operation regardless of the permissions as it was before kernfs
conversion. Note that kernfs still fails unimplemented operations
with -EINVAL.

While at it, add comments explaining KERNFS_ROOT flags.

Signed-off-by: Tejun Heo
Reported-by: Andrey Wagin
Tested-by: Andrey Wagin
Cc: Li Zefan
References: http://lkml.kernel.org/g/CANaxB-xUm3rJ-Cbp72q-rQJO5mZe1qK6qXsQM=vh0U8upJ44+A@mail.gmail.com
Fixes: 2bd59d48ebfb ("cgroup: convert to kernfs")
Signed-off-by: Greg Kroah-Hartman

Tejun Heo
2014-05-13 19:21:40 +0800

17 Apr, 2014

1 commit

33ac1257f sysfs, driver-core: remove unused {sysfs|device}_schedule_callback_owner()

All device_schedule_callback_owner() users are converted to use
device_remove_file_self(). Remove now unused
{sysfs|device}_schedule_callback_owner().

Signed-off-by: Tejun Heo
Signed-off-by: Greg Kroah-Hartman

Tejun Heo
2014-04-17 02:56:33 +0800

26 Mar, 2014

1 commit

72099304e Revert "sysfs, driver-core: remove unused {sysfs|device}_schedule_callback_owner()"

This reverts commit d1ba277e79889085a2faec3b68b91ce89c63f888.

As reported by Stephen, this patch breaks linux-next as a ppc patch
suddenly (after 2 years) started using this old api call. So revert it
for now, it will go away in 3.15-rc2 when we can change the PPC call to
the new api.

Reported-by: Stephen Rothwell
Cc: Tejun Heo
Cc: Stewart Smith
Cc: Benjamin Herrenschmidt
Signed-off-by: Greg Kroah-Hartman

Greg Kroah-Hartman
2014-03-26 11:54:57 +0800

03 Mar, 2014

1 commit

13df79774 Merge 3.14-rc5 into driver-core-next

We want the fixes in here.

Greg Kroah-Hartman
2014-03-03 12:09:08 +0800

25 Feb, 2014

1 commit

fed95bab8 sysfs: fix namespace refcnt leak

As mount() and kill_sb() are not a one-to-one match, we shouldn't get
ns refcnt unconditionally in sysfs_mount(), and instead we should
get the refcnt only when kernfs_mount() allocated a new superblock.

v2:
- Changed the name of the new argument, suggested by Tejun.
- Made the argument optional, suggested by Tejun.

v3:
- Make the new argument as second-to-last arg, suggested by Tejun.

Signed-off-by: Li Zefan
Acked-by: Tejun Heo
---
fs/kernfs/mount.c | 8 +++++++-
fs/sysfs/mount.c | 5 +++--
include/linux/kernfs.h | 9 +++++----
3 files changed, 15 insertions(+), 7 deletions(-)
Signed-off-by: Greg Kroah-Hartman

Li Zefan
2014-02-25 23:37:52 +0800

16 Feb, 2014

1 commit

aabaf4c20 sysfs: create bin_attributes under the requested group

bin_attributes created/updated in create_files() (such as those listed
via (struct device).attribute_groups) were not placed under the
specified group, and instead appeared in the base kobj directory.

Fix this by making bin_attributes use creating code similar to normal
attributes.

A quick grep shows that no one is using bin_attrs in a named attribute
group yet, so we can do this without breaking anything in userspace.

Note that I do not add is_visible() support to
bin_attributes, though that could be done as well.

Signed-off-by: Cody P Schafer
Signed-off-by: Greg Kroah-Hartman

Cody P Schafer
2014-02-16 04:14:55 +0800

08 Feb, 2014

5 commits

ba341d55a kernfs: add CONFIG_KERNFS

As sysfs was kernfs's only user, kernfs has been piggybacking on
CONFIG_SYSFS; however, kernfs is scheduled to grow a new user very
soon. Introduce a separate config option CONFIG_KERNFS which is to be
selected by kernfs users.

Signed-off-by: Tejun Heo
Cc: linux-fsdevel@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman

Tejun Heo
2014-02-08 08:08:57 +0800
3eef34ad7 kernfs: implement kernfs_get_parent(), kernfs_name/path() and friends

kernfs_node->parent and ->name are currently marked as "published"
indicating that kernfs users may access them directly; however, those
fields may get updated by kernfs_rename[_ns]() and unrestricted access
may lead to erroneous values or oops.

Protect ->parent and ->name updates with an irq-safe spinlock
kernfs_rename_lock and implement the following accessors for these
fields.

* kernfs_name() - format the node's name into the specified buffer
* kernfs_path() - format the node's path into the specified buffer
* pr_cont_kernfs_name() - pr_cont a node's name (doesn't need buffer)
* pr_cont_kernfs_path() - pr_cont a node's path (doesn't need buffer)
* kernfs_get_parent() - pin and return a node's parent

All can be called under any context. The recursive sysfs_pathname()
in fs/sysfs/dir.c is replaced with kernfs_path() and
sysfs_rename_dir_ns() is updated to use kernfs_get_parent() instead of
dereferencing parent directly.
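
As a rough illustration of what "format the node's path into the specified buffer" involves, here is a minimal userspace sketch, not the kernel implementation. `struct model_node` and `model_path()` are hypothetical names invented for this example, and the locking the commit describes (kernfs_rename_lock) is deliberately left out. The sketch walks the parent links from the leaf upward and fills the caller's buffer from its end, which avoids recursion.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a kernfs node: a name and a parent link. */
struct model_node {
	const char *name;
	const struct model_node *parent;	/* NULL for the root */
};

/*
 * Build the path of @node in @buf by walking parent links from the
 * leaf up and filling the buffer from its end. Returns a pointer to
 * the start of the path inside @buf, or NULL if @buf is too small.
 * The root itself is rendered as "/".
 */
static char *model_path(const struct model_node *node, char *buf, size_t buflen)
{
	char *p;

	assert(node && buf);
	if (buflen < 2)
		return NULL;

	p = buf + buflen;
	*--p = '\0';

	if (!node->parent) {
		*--p = '/';
		return p;
	}

	while (node->parent) {
		size_t len = strlen(node->name);

		/* Need room for the name plus its leading '/'. */
		if ((size_t)(p - buf) < len + 1)
			return NULL;
		p -= len;
		memcpy(p, node->name, len);
		*--p = '/';
		node = node->parent;
	}
	return p;
}
```

With a root, a child named "devices" and a grandchild named "cpu0", calling `model_path()` on the grandchild with a 32-byte buffer yields "/devices/cpu0"; with an 8-byte buffer it returns NULL instead of truncating.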

v2: Dummy definition of kernfs_path() for !CONFIG_KERNFS was missing
static inline making it cause a lot of build warnings. Add it.

Signed-off-by: Tejun Heo
Signed-off-by: Greg Kroah-Hartman

Tejun Heo
2014-02-08 08:05:35 +0800

d35258ef7 kernfs: allow nodes to be created in the deactivated state

Currently, kernfs_nodes are made visible to userland on creation,
which makes it difficult for kernfs users to atomically succeed or
fail creation of multiple nodes. In addition, if something fails
after creating some nodes, the created nodes might already be in use
and their active refs need to be drained for removal, which has the
potential to introduce tricky reverse locking dependency on active_ref
depending on how the error path is synchronized.

This patch introduces per-root flag KERNFS_ROOT_CREATE_DEACTIVATED.
If set, all nodes under the root are created in the deactivated state
and stay invisible to userland until explicitly enabled by the new
kernfs_activate() API. Also, nodes which have never been activated
are guaranteed to bypass draining on removal thus allowing error paths
to not worry about locking dependency on active_ref draining.
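
The create-then-activate pattern can be sketched with a small userspace model. This is not kernel code: `struct model_root` and the `model_*()` helpers are hypothetical names made up for illustration, and the model activates every node under the root at once. It only shows the two properties described above: nodes stay invisible until activation, and nodes that were never activated need no draining when an error path removes them.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_MAX_NODES 8

/* Hypothetical stand-in for a kernfs root and the nodes created under it. */
struct model_root {
	bool create_deactivated;	/* models KERNFS_ROOT_CREATE_DEACTIVATED */
	bool active[MODEL_MAX_NODES];	/* true once a node is visible */
	size_t nr;			/* number of nodes created so far */
};

/* Create a node; it is visible at once unless the root flag is set. */
static int model_create(struct model_root *root)
{
	assert(root);
	if (root->nr == MODEL_MAX_NODES)
		return -1;
	root->active[root->nr] = !root->create_deactivated;
	return (int)root->nr++;
}

/* Models kernfs_activate(): make every node created so far visible. */
static void model_activate(struct model_root *root)
{
	for (size_t i = 0; i < root->nr; i++)
		root->active[i] = true;
}

/* Number of nodes userland could currently see. */
static size_t model_visible(const struct model_root *root)
{
	size_t n = 0;

	for (size_t i = 0; i < root->nr; i++)
		if (root->active[i])
			n++;
	return n;
}

/*
 * Error path: remove everything. Returns how many nodes had been
 * activated and would therefore have needed their refs drained.
 */
static size_t model_remove_all(struct model_root *root)
{
	size_t drained = model_visible(root);

	root->nr = 0;
	return drained;
}
```

In this model a caller creates all of its nodes first and calls `model_activate()` only after the last creation succeeds. If a creation fails, `model_remove_all()` reports zero nodes to drain.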

Signed-off-by: Tejun Heo
Signed-off-by: Greg Kroah-Hartman

Tejun Heo
2014-02-08 07:52:48 +0800

ce8b04aa6 sysfs, driver-core: remove unused {sysfs|device}_schedule_callback_owner()

All device_schedule_callback_owner() users are converted to use
device_remove_file_self(). Remove now unused
{sysfs|device}_schedule_callback_owner().

Signed-off-by: Tejun Heo
Signed-off-by: Greg Kroah-Hartman

Tejun Heo
2014-02-08 07:42:41 +0800

6b0afc2a2 kernfs, sysfs, driver-core: implement kernfs_remove_self() and its wrappers

Sometimes it's necessary to implement a node which wants to delete
nodes including itself. This isn't straightforward because of kernfs
active reference. While a file operation is in progress, an active
reference is held and kernfs_remove() waits for all such references to
drain before completing. For a self-deleting node, this is a deadlock
as kernfs_remove() ends up waiting for an active reference that it is
itself sitting on top of.

This currently is worked around in the sysfs layer using
sysfs_schedule_callback() which makes such removals asynchronous.
While it works, it's rather cumbersome and inherently breaks
synchronicity of the operation - the file operation which triggered
the operation may complete before the removal is finished (or even
started) and the removal may fail asynchronously. If a removal
operation is immediately followed by another operation which expects
the specific name to be available (e.g. removal followed by rename
onto the same name), there's no way to make the latter operation
reliable.

The thing is there's no inherent reason for this to be asynchronous.
All that's necessary to make this synchronous is a dedicated operation
which drops its own active ref and deactivates self. This patch
implements kernfs_remove_self() and its wrappers in sysfs and driver
core. kernfs_remove_self() is to be called from one of the file
operations, drops the active ref the task is holding, removes the self
node, and restores active ref to the dead node so that the ref is
balanced afterwards. __kernfs_remove() is updated so that it takes an
early exit if the target node is already fully removed so that the
active ref restored by kernfs_remove_self() after removal doesn't
confuse the deactivation path.

This makes implementing self-deleting nodes very easy. The normal
removal path doesn't even need to be changed to use
kernfs_remove_self() for the self-deleting node. The method can
invoke kernfs_remove_self() on itself before proceeding with the normal
removal path. kernfs_remove() invoked on the node by the normal
deletion path will simply be ignored.

This will replace sysfs_schedule_callback(). A subtle feature of
sysfs_schedule_callback() is that it collapses multiple invocations -
even if multiple removals are triggered, the removal callback is run
only once. An equivalent effect can be achieved by testing the return
value of kernfs_remove_self() - only the one which gets %true return
value should proceed with actual deletion. All other instances of
kernfs_remove_self() will wait till the enclosing kernfs operation
which invoked the winning instance of kernfs_remove_self() finishes
and then return %false. This trivially makes all users of
kernfs_remove_self() automatically show correct synchronous behavior
even when there are multiple concurrent operations - all "echo 1 >
delete" instances will finish only after the whole operation is
completed by one of the instances.
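
The "only the winner proceeds" rule can be sketched in userspace with a C11 atomic flag. This is a simplified model, not the kernel implementation: `struct self_node`, `model_remove_self()` and `model_delete_store()` are hypothetical names, and the part where losing callers wait for the winning operation to finish before returning false is omitted.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a self-deleting node. */
struct self_node {
	atomic_flag removing;	/* set by the first caller to request removal */
	int deletions;		/* how many times the deletion body actually ran */
};

/*
 * Models the return value of kernfs_remove_self(): true for exactly
 * one caller, false for every later one. atomic_flag_test_and_set()
 * returns the previous value, which is false only for the first caller.
 */
static bool model_remove_self(struct self_node *n)
{
	assert(n);
	return !atomic_flag_test_and_set(&n->removing);
}

/* Hypothetical "delete" file handler: only the winner does the work. */
static void model_delete_store(struct self_node *n)
{
	if (model_remove_self(n))
		n->deletions++;
}
```

Calling `model_delete_store()` several times on a node initialized with `ATOMIC_FLAG_INIT` runs the deletion body once, which mirrors how multiple "echo 1 > delete" writers collapse into a single removal.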

Note that manipulation of active ref is implemented in separate public
functions - kernfs_[un]break_active_protection().
kernfs_remove_self() is the only user at the moment but this will be
used to cater to more complex cases.

v2: For !CONFIG_SYSFS, the dummy version of kernfs_remove_self() was missing
and sysfs_remove_file_self() had incorrect return type. Fix it.
Reported by kbuild test bot.

v3: kernfs_[un]break_active_protection() separated out from
kernfs_remove_self() and exposed as public API.

Signed-off-by: Tejun Heo
Cc: Alan Stern
Cc: kbuild test robot
Signed-off-by: Greg Kroah-Hartman

Tejun Heo
2014-02-08 07:42:41 +0800

14 Jan, 2014

1 commit

a9f138b0e Revert "kernfs, sysfs, driver-core: implement kernfs_remove_self() and its wrappers"

This reverts commit 1ae06819c77cff1ea2833c94f8c093fe8a5c79db.

Tejun writes:
I'm sorry but can you please revert the whole series?
get_active() waiting while a node is deactivated has potential
to lead to deadlock and that deactivate/reactivate interface is
something fundamentally flawed and that cgroup will have to work
with the remove_self() like everybody else. IOW, I think the
first posting was correct.

Cc: Tejun Heo
Cc: Alan Stern
Cc: kbuild test robot
Signed-off-by: Greg Kroah-Hartman

Greg Kroah-Hartman
2014-01-14 06:05:13 +0800