02 Nov, 2017

1 commit

  • Many source files in the tree are missing licensing information, which
    makes it harder for compliance tools to determine the correct license.

    By default all files without license information are under the default
    license of the kernel, which is GPL version 2.

    Update the files which contain no license information with the 'GPL-2.0'
    SPDX license identifier. The SPDX identifier is a legally binding
    shorthand, which can be used instead of the full boiler plate text.

    This patch is based on work done by Thomas Gleixner and Kate Stewart and
    Philippe Ombredanne.

    How this work was done:

    Patches were generated and checked against linux-4.14-rc6 for a subset of
    the use cases:
    - file had no licensing information it it.
    - file was a */uapi/* one with no licensing information in it,
    - file was a */uapi/* one with existing licensing information,

    Further patches will be generated in subsequent months to fix up cases
    where non-standard license headers were used, and references to license
    had to be inferred by heuristics based on keywords.

    The analysis to determine which SPDX License Identifier to be applied to
    a file was done in a spreadsheet of side by side results from of the
    output of two independent scanners (ScanCode & Windriver) producing SPDX
    tag:value files created by Philippe Ombredanne. Philippe prepared the
    base worksheet, and did an initial spot review of a few 1000 files.

    The 4.13 kernel was the starting point of the analysis with 60,537 files
    assessed. Kate Stewart did a file by file comparison of the scanner
    results in the spreadsheet to determine which SPDX license identifier(s)
    to be applied to the file. She confirmed any determination that was not
    immediately clear with lawyers working with the Linux Foundation.

    Criteria used to select files for SPDX license identifier tagging was:
    - Files considered eligible had to be source code files.
    - Make and config files were included as candidates if they contained >5
    lines of source
    - File already had some variant of a license header in it (even if
    Reviewed-by: Philippe Ombredanne
    Reviewed-by: Thomas Gleixner
    Signed-off-by: Greg Kroah-Hartman

    Greg Kroah-Hartman
     

09 May, 2017

1 commit

  • Coccinelle emits this warning:

    WARNING: casting value returned by memory allocation function to (struct proc_inode *) is useless.

    Remove unnecessary cast.

    Link: http://lkml.kernel.org/r/1487745720-16967-1-git-send-email-me@tobin.cc
    Signed-off-by: Tobin C. Harding
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Tobin C. Harding
     

25 Feb, 2017

1 commit

  • Previously, the hidepid parameter was checked by comparing literal
    integers 0, 1, 2. Let's add a proper enum for this, to make the
    checking more expressive:

    0 → HIDEPID_OFF
    1 → HIDEPID_NO_ACCESS
    2 → HIDEPID_INVISIBLE

    This changes the internal labelling only, the userspace-facing interface
    remains unmodified, and still works with literal integers 0, 1, 2.

    No functional changes.

    Link: http://lkml.kernel.org/r/1484572984-13388-2-git-send-email-djalal@gmail.com
    Signed-off-by: Lafcadio Wluiki
    Signed-off-by: Djalal Harouni
    Acked-by: Kees Cook
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Lafcadio Wluiki
     

13 Feb, 2017

1 commit

  • Currently unregistering sysctl table does not prune its dentries.
    Stale dentries could slowdown sysctl operations significantly.

    For example, command:

    # for i in {1..100000} ; do unshare -n -- sysctl -a &> /dev/null ; done
    creates a millions of stale denties around sysctls of loopback interface:

    # sysctl fs.dentry-state
    fs.dentry-state = 25812579 24724135 45 0 0 0

    All of them have matching names thus lookup have to scan though whole
    hash chain and call d_compare (proc_sys_compare) which checks them
    under system-wide spinlock (sysctl_lock).

    # time sysctl -a > /dev/null
    real 1m12.806s
    user 0m0.016s
    sys 1m12.400s

    Currently only memory reclaimer could remove this garbage.
    But without significant memory pressure this never happens.

    This patch collects sysctl inodes into list on sysctl table header and
    prunes all their dentries once that table unregisters.

    Konstantin Khlebnikov writes:
    > On 10.02.2017 10:47, Al Viro wrote:
    >> how about >> the matching stats *after* that patch?
    >
    > dcache size doesn't grow endlessly, so stats are fine
    >
    > # sysctl fs.dentry-state
    > fs.dentry-state = 92712 58376 45 0 0 0
    >
    > # time sysctl -a &>/dev/null
    >
    > real 0m0.013s
    > user 0m0.004s
    > sys 0m0.008s

    Signed-off-by: Konstantin Khlebnikov
    Suggested-by: Al Viro
    Signed-off-by: Eric W. Biederman

    Konstantin Khlebnikov
     

25 Dec, 2016

1 commit


18 Dec, 2016

1 commit

  • …/linux/kernel/git/mszeredi/vfs

    Pull partial readlink cleanups from Miklos Szeredi.

    This is the uncontroversial part of the readlink cleanup patch-set that
    simplifies the default readlink handling.

    Miklos and Al are still discussing the rest of the series.

    * git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs:
    vfs: make generic_readlink() static
    vfs: remove ".readlink = generic_readlink" assignments
    vfs: default to generic_readlink()
    vfs: replace calling i_op->readlink with vfs_readlink()
    proc/self: use generic_readlink
    ecryptfs: use vfs_get_link()
    bad_inode: add missing i_op initializers

    Linus Torvalds
     

13 Dec, 2016

4 commits

  • Some comments were obsoleted since commit 05c0ae21c034 ("try a saner
    locking for pde_opener...").

    Some new comments added.

    Some confusing comments replaced with equally confusing ones.

    Link: http://lkml.kernel.org/r/20161029160231.GD1246@avx2
    Signed-off-by: Alexey Dobriyan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
     
  • kzalloc is too much, half of the fields will be reinitialized anyway.

    If proc file doesn't have ->release hook (some still do not), clearing
    is unnecessary because it will be freed immediately.

    Link: http://lkml.kernel.org/r/20161029155747.GC1246@avx2
    Signed-off-by: Alexey Dobriyan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
     
  • struct pde_opener::closing is boolean.

    Link: http://lkml.kernel.org/r/20161029155439.GB1246@avx2
    Signed-off-by: Alexey Dobriyan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
     
  • list_del_init() is too much, structure will be freed in three lines
    anyway.

    Link: http://lkml.kernel.org/r/20161029155313.GA1246@avx2
    Signed-off-by: Alexey Dobriyan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
     

09 Dec, 2016

1 commit


28 Sep, 2016

2 commits

  • CURRENT_TIME macro is not appropriate for filesystems as it
    doesn't use the right granularity for filesystem timestamps.
    Use current_time() instead.

    CURRENT_TIME is also not y2038 safe.

    This is also in preparation for the patch that transitions
    vfs timestamps to use 64 bit time and hence make them
    y2038 safe. As part of the effort current_time() will be
    extended to do range checks. Hence, it is necessary for all
    file system timestamps to use current_time(). Also,
    current_time() will be transitioned along with vfs to be
    y2038 safe.

    Note that whenever a single call to current_time() is used
    to change timestamps in different inodes, it is because they
    share the same time granularity.

    Signed-off-by: Deepa Dinamani
    Reviewed-by: Arnd Bergmann
    Acked-by: Felipe Balbi
    Acked-by: Steven Whitehouse
    Acked-by: Ryusuke Konishi
    Acked-by: David Sterba
    Signed-off-by: Al Viro

    Deepa Dinamani
     
  • proc uses new_inode_pseudo() to allocate a new inode.
    This in turn calls the proc_inode_alloc() callback.
    But, at this point, inode is still not initialized
    with the super_block pointer which only happens just
    before alloc_inode() returns after the call to
    inode_init_always().

    Also, the inode times are initialized again after the
    call to new_inode_pseudo() in proc_inode_alloc().
    The assignemet in proc_alloc_inode() is redundant and
    also doesn't work after the current_time() api is
    changed to take struct inode* instead of
    struct *super_block.

    This bug was reported after current_time() was used to
    assign times in proc_alloc_inode().

    Signed-off-by: Deepa Dinamani
    Reported-by: Fengguang Wu [0-day test robot]
    Reviewed-by: Arnd Bergmann
    Signed-off-by: Al Viro

    Deepa Dinamani
     

30 Jul, 2016

1 commit

  • Pull userns vfs updates from Eric Biederman:
    "This tree contains some very long awaited work on generalizing the
    user namespace support for mounting filesystems to include filesystems
    with a backing store. The real world target is fuse but the goal is
    to update the vfs to allow any filesystem to be supported. This
    patchset is based on a lot of code review and testing to approach that
    goal.

    While looking at what is needed to support the fuse filesystem it
    became clear that there were things like xattrs for security modules
    that needed special treatment. That the resolution of those concerns
    would not be fuse specific. That sorting out these general issues
    made most sense at the generic level, where the right people could be
    drawn into the conversation, and the issues could be solved for
    everyone.

    At a high level what this patchset does a couple of simple things:

    - Add a user namespace owner (s_user_ns) to struct super_block.

    - Teach the vfs to handle filesystem uids and gids not mapping into
    to kuids and kgids and being reported as INVALID_UID and
    INVALID_GID in vfs data structures.

    By assigning a user namespace owner filesystems that are mounted with
    only user namespace privilege can be detected. This allows security
    modules and the like to know which mounts may not be trusted. This
    also allows the set of uids and gids that are communicated to the
    filesystem to be capped at the set of kuids and kgids that are in the
    owning user namespace of the filesystem.

    One of the crazier corner casees this handles is the case of inodes
    whose i_uid or i_gid are not mapped into the vfs. Most of the code
    simply doesn't care but it is easy to confuse the inode writeback path
    so no operation that could cause an inode write-back is permitted for
    such inodes (aka only reads are allowed).

    This set of changes starts out by cleaning up the code paths involved
    in user namespace permirted mounts. Then when things are clean enough
    adds code that cleanly sets s_user_ns. Then additional restrictions
    are added that are possible now that the filesystem superblock
    contains owner information.

    These changes should not affect anyone in practice, but there are some
    parts of these restrictions that are changes in behavior.

    - Andy's restriction on suid executables that does not honor the
    suid bit when the path is from another mount namespace (think
    /proc/[pid]/fd/) or when the filesystem was mounted by a less
    privileged user.

    - The replacement of the user namespace implicit setting of MNT_NODEV
    with implicitly setting SB_I_NODEV on the filesystem superblock
    instead.

    Using SB_I_NODEV is a stronger form that happens to make this state
    user invisible. The user visibility can be managed but it caused
    problems when it was introduced from applications reasonably
    expecting mount flags to be what they were set to.

    There is a little bit of work remaining before it is safe to support
    mounting filesystems with backing store in user namespaces, beyond
    what is in this set of changes.

    - Verifying the mounter has permission to read/write the block device
    during mount.

    - Teaching the integrity modules IMA and EVM to handle filesystems
    mounted with only user namespace root and to reduce trust in their
    security xattrs accordingly.

    - Capturing the mounters credentials and using that for permission
    checks in d_automount and the like. (Given that overlayfs already
    does this, and we need the work in d_automount it make sense to
    generalize this case).

    Furthermore there are a few changes that are on the wishlist:

    - Get all filesystems supporting posix acls using the generic posix
    acls so that posix_acl_fix_xattr_from_user and
    posix_acl_fix_xattr_to_user may be removed. [Maintainability]

    - Reducing the permission checks in places such as remount to allow
    the superblock owner to perform them.

    - Allowing the superblock owner to chown files with unmapped uids and
    gids to something that is mapped so the files may be treated
    normally.

    I am not considering even obvious relaxations of permission checks
    until it is clear there are no more corner cases that need to be
    locked down and handled generically.

    Many thanks to Seth Forshee who kept this code alive, and putting up
    with me rewriting substantial portions of what he did to handle more
    corner cases, and for his diligent testing and reviewing of my
    changes"

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: (30 commits)
    fs: Call d_automount with the filesystems creds
    fs: Update i_[ug]id_(read|write) to translate relative to s_user_ns
    evm: Translate user/group ids relative to s_user_ns when computing HMAC
    dquot: For now explicitly don't support filesystems outside of init_user_ns
    quota: Handle quota data stored in s_user_ns in quota_setxquota
    quota: Ensure qids map to the filesystem
    vfs: Don't create inodes with a uid or gid unknown to the vfs
    vfs: Don't modify inodes with a uid or gid unknown to the vfs
    cred: Reject inodes with invalid ids in set_create_file_as()
    fs: Check for invalid i_uid in may_follow_link()
    vfs: Verify acls are valid within superblock's s_user_ns.
    userns: Handle -1 in k[ug]id_has_mapping when !CONFIG_USER_NS
    fs: Refuse uid/gid changes which don't map into s_user_ns
    selinux: Add support for unprivileged mounts from user namespaces
    Smack: Handle labels consistently in untrusted mounts
    Smack: Add support for unprivileged mounts from user namespaces
    fs: Treat foreign mounts as nosuid
    fs: Limit file caps to the user namespace of the super block
    userns: Remove the now unnecessary FS_USERNS_DEV_MOUNT flag
    userns: Remove implicit MNT_NODEV fragility.
    ...

    Linus Torvalds
     

24 Jun, 2016

3 commits

  • Introduce a function may_open_dev that tests MNT_NODEV and a new
    superblock flab SB_I_NODEV. Use this new function in all of the
    places where MNT_NODEV was previously tested.

    Add the new SB_I_NODEV s_iflag to proc, sysfs, and mqueuefs as those
    filesystems should never support device nodes, and a simple superblock
    flags makes that very hard to get wrong. With SB_I_NODEV set if any
    device nodes somehow manage to show up on on a filesystem those
    device nodes will be unopenable.

    Acked-by: Seth Forshee
    Signed-off-by: "Eric W. Biederman"

    Eric W. Biederman
     
  • Move the call of get_pid_ns, the call of proc_parse_options, and
    the setting of s_iflags into proc_fill_super so that mount_ns
    can be used.

    Convert proc_mount to call mount_ns and remove the now unnecessary
    code.

    Acked-by: Seth Forshee
    Reviewed-by: Djalal Harouni
    Signed-off-by: "Eric W. Biederman"

    Eric W. Biederman
     
  • Replace the call of fs_fully_visible in do_new_mount from before the
    new superblock is allocated with a call of mount_too_revealing after
    the superblock is allocated. This winds up being a much better location
    for maintainability of the code.

    The first change this enables is the replacement of FS_USERNS_VISIBLE
    with SB_I_USERNS_VISIBLE. Moving the flag from struct filesystem_type
    to sb_iflags on the superblock.

    Unfortunately mount_too_revealing fundamentally needs to touch
    mnt_flags adding several MNT_LOCKED_XXX flags at the appropriate
    times. If the mnt_flags did not need to be touched the code
    could be easily moved into the filesystem specific mount code.

    Acked-by: Seth Forshee
    Signed-off-by: "Eric W. Biederman"

    Eric W. Biederman
     

15 Jan, 2016

1 commit

  • Mark those kmem allocations that are known to be easily triggered from
    userspace as __GFP_ACCOUNT/SLAB_ACCOUNT, which makes them accounted to
    memcg. For the list, see below:

    - threadinfo
    - task_struct
    - task_delay_info
    - pid
    - cred
    - mm_struct
    - vm_area_struct and vm_region (nommu)
    - anon_vma and anon_vma_chain
    - signal_struct
    - sighand_struct
    - fs_struct
    - files_struct
    - fdtable and fdtable->full_fds_bits
    - dentry and external_name
    - inode for all filesystems. This is the most tedious part, because
    most filesystems overwrite the alloc_inode method.

    The list is far from complete, so feel free to add more objects.
    Nevertheless, it should be close to "account everything" approach and
    keep most workloads within bounds. Malevolent users will be able to
    breach the limit, but this was possible even with the former "account
    everything" approach (simply because it did not account everything in
    fact).

    [akpm@linux-foundation.org: coding-style fixes]
    Signed-off-by: Vladimir Davydov
    Acked-by: Johannes Weiner
    Acked-by: Michal Hocko
    Cc: Tejun Heo
    Cc: Greg Thelen
    Cc: Christoph Lameter
    Cc: Pekka Enberg
    Cc: David Rientjes
    Cc: Joonsoo Kim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vladimir Davydov
     

31 Dec, 2015

1 commit


09 Dec, 2015

1 commit

  • new method: ->get_link(); replacement of ->follow_link(). The differences
    are:
    * inode and dentry are passed separately
    * might be called both in RCU and non-RCU mode;
    the former is indicated by passing it a NULL dentry.
    * when called that way it isn't allowed to block
    and should return ERR_PTR(-ECHILD) if it needs to be called
    in non-RCU mode.

    It's a flagday change - the old method is gone, all in-tree instances
    converted. Conversion isn't hard; said that, so far very few instances
    do not immediately bail out when called in RCU mode. That'll change
    in the next commits.

    Signed-off-by: Al Viro

    Al Viro
     

04 Jul, 2015

1 commit

  • Pull user namespace updates from Eric Biederman:
    "Long ago and far away when user namespaces where young it was realized
    that allowing fresh mounts of proc and sysfs with only user namespace
    permissions could violate the basic rule that only root gets to decide
    if proc or sysfs should be mounted at all.

    Some hacks were put in place to reduce the worst of the damage could
    be done, and the common sense rule was adopted that fresh mounts of
    proc and sysfs should allow no more than bind mounts of proc and
    sysfs. Unfortunately that rule has not been fully enforced.

    There are two kinds of gaps in that enforcement. Only filesystems
    mounted on empty directories of proc and sysfs should be ignored but
    the test for empty directories was insufficient. So in my tree
    directories on proc, sysctl and sysfs that will always be empty are
    created specially. Every other technique is imperfect as an ordinary
    directory can have entries added even after a readdir returns and
    shows that the directory is empty. Special creation of directories
    for mount points makes the code in the kernel a smidge clearer about
    it's purpose. I asked container developers from the various container
    projects to help test this and no holes were found in the set of mount
    points on proc and sysfs that are created specially.

    This set of changes also starts enforcing the mount flags of fresh
    mounts of proc and sysfs are consistent with the existing mount of
    proc and sysfs. I expected this to be the boring part of the work but
    unfortunately unprivileged userspace winds up mounting fresh copies of
    proc and sysfs with noexec and nosuid clear when root set those flags
    on the previous mount of proc and sysfs. So for now only the atime,
    read-only and nodev attributes which userspace happens to keep
    consistent are enforced. Dealing with the noexec and nosuid
    attributes remains for another time.

    This set of changes also addresses an issue with how open file
    descriptors from /proc//ns/* are displayed. Recently readlink of
    /proc//fd has been triggering a WARN_ON that has not been
    meaningful since it was added (as all of the code in the kernel was
    converted) and is not now actively wrong.

    There is also a short list of issues that have not been fixed yet that
    I will mention briefly.

    It is possible to rename a directory from below to above a bind mount.
    At which point any directory pointers below the renamed directory can
    be walked up to the root directory of the filesystem. With user
    namespaces enabled a bind mount of the bind mount can be created
    allowing the user to pick a directory whose children they can rename
    to outside of the bind mount. This is challenging to fix and doubly
    so because all obvious solutions must touch code that is in the
    performance part of pathname resolution.

    As mentioned above there is also a question of how to ensure that
    developers by accident or with purpose do not introduce exectuable
    files on sysfs and proc and in doing so introduce security regressions
    in the current userspace that will not be immediately obvious and as
    such are likely to require breaking userspace in painful ways once
    they are recognized"

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
    vfs: Remove incorrect debugging WARN in prepend_path
    mnt: Update fs_fully_visible to test for permanently empty directories
    sysfs: Create mountpoints with sysfs_create_mount_point
    sysfs: Add support for permanently empty directories to serve as mount points.
    kernfs: Add support for always empty directories.
    proc: Allow creating permanently empty directories that serve as mount points
    sysctl: Allow creating permanently empty directories that serve as mountpoints.
    fs: Add helper functions for permanently empty directories.
    vfs: Ignore unlocked mounts in fs_fully_visible
    mnt: Modify fs_fully_visible to deal with locked ro nodev and atime
    mnt: Refactor the logic for mounting sysfs and proc in a user namespace

    Linus Torvalds
     

01 Jul, 2015

1 commit

  • Add a new function proc_create_mount_point that when used to creates a
    directory that can not be added to.

    Add a new function is_empty_pde to test if a function is a mount
    point.

    Update the code to use make_empty_dir_inode when reporting
    a permanently empty directory to the vfs.

    Update the code to not allow adding to permanently empty directories.

    Update /proc/openprom and /proc/fs/nfsd to be permanently empty directories.

    Cc: stable@vger.kernel.org
    Signed-off-by: "Eric W. Biederman"

    Eric W. Biederman
     

11 May, 2015

3 commits

  • only one instance looks at that argument at all; that sole
    exception wants inode rather than dentry.

    Signed-off-by: Al Viro

    Al Viro
     
  • its only use is getting passed to nd_jump_link(), which can obtain
    it from current->nameidata

    Signed-off-by: Al Viro

    Al Viro
     
  • a) instead of storing the symlink body (via nd_set_link()) and returning
    an opaque pointer later passed to ->put_link(), ->follow_link() _stores_
    that opaque pointer (into void * passed by address by caller) and returns
    the symlink body. Returning ERR_PTR() on error, NULL on jump (procfs magic
    symlinks) and pointer to symlink body for normal symlinks. Stored pointer
    is ignored in all cases except the last one.

    Storing NULL for opaque pointer (or not storing it at all) means no call
    of ->put_link().

    b) the body used to be passed to ->put_link() implicitly (via nameidata).
    Now only the opaque pointer is. In the cases when we used the symlink body
    to free stuff, ->follow_link() now should store it as opaque pointer in addition
    to returning it.

    Signed-off-by: Al Viro

    Al Viro
     

16 Apr, 2015

1 commit


23 Feb, 2015

1 commit


13 Feb, 2015

1 commit


11 Dec, 2014

2 commits

  • procfs inodes need only the ns_ops part; nsfs inodes don't need it at all

    Signed-off-by: Al Viro

    Al Viro
     
  • New pseudo-filesystem: nsfs. Targets of /proc/*/ns/* live there now.
    It's not mountable (not even registered, so it's not in /proc/filesystems,
    etc.). Files on it *are* bindable - we explicitly permit that in do_loopback().

    This stuff lives in fs/nsfs.c now; proc_ns_fget() moved there as well.
    get_proc_ns() is a macro now (it's simply returning ->i_private; would
    have been an inline, if not for header ordering headache).
    proc_ns_inode() is an ex-parrot. The interface used in procfs is
    ns_get_path(path, task, ops) and ns_get_name(buf, size, task, ops).

    Dentries and inodes are never hashed; a non-counting reference to dentry
    is stashed in ns_common (removed by ->d_prune()) and reused by ns_get_path()
    if present. See ns_get_path()/ns_prune_dentry/nsfs_evict() for details
    of that mechanism.

    As the result, proc_ns_follow_link() has stopped poking in nd->path.mnt;
    it does nd_jump_link() on a consistent pair it gets
    from ns_get_path().

    Signed-off-by: Al Viro

    Al Viro
     

05 Dec, 2014

2 commits


05 Aug, 2014

1 commit

  • /proc/thread-self is derived from /proc/self. /proc/thread-self
    points to the directory in proc containing information about the
    current thread.

    This funtionality has been missing for a long time, and is tricky to
    implement in userspace as gettid() is not exported by glibc. More
    importantly this allows fixing defects in /proc/mounts and /proc/net
    where in a threaded application today they wind up being empty files
    when only the initial pthread has exited, causing problems for other
    threads.

    Signed-off-by: "Eric W. Biederman"

    Eric W. Biederman
     

08 Apr, 2014

1 commit

  • Replace rcu_assign_pointer(x, NULL) with RCU_INIT_POINTER(x, NULL)

    The rcu_assign_pointer() ensures that the initialization of a structure
    is carried out before storing a pointer to that structure. And in the
    case of the NULL pointer, there is no structure to initialize. So,
    rcu_assign_pointer(p, NULL) can be safely converted to
    RCU_INIT_POINTER(p, NULL)

    Signed-off-by: Monam Agarwal
    Cc: "Paul E. McKenney"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Monam Agarwal
     

04 Apr, 2014

1 commit

  • Reclaim will be leaving shadow entries in the page cache radix tree upon
    evicting the real page. As those pages are found from the LRU, an
    iput() can lead to the inode being freed concurrently. At this point,
    reclaim must no longer install shadow pages because the inode freeing
    code needs to ensure the page tree is really empty.

    Add an address_space flag, AS_EXITING, that the inode freeing code sets
    under the tree lock before doing the final truncate. Reclaim will check
    for this flag before installing shadow pages.

    Signed-off-by: Johannes Weiner
    Reviewed-by: Rik van Riel
    Reviewed-by: Minchan Kim
    Cc: Andrea Arcangeli
    Cc: Bob Liu
    Cc: Christoph Hellwig
    Cc: Dave Chinner
    Cc: Greg Thelen
    Cc: Hugh Dickins
    Cc: Jan Kara
    Cc: KOSAKI Motohiro
    Cc: Luigi Semenzato
    Cc: Mel Gorman
    Cc: Metin Doslu
    Cc: Michel Lespinasse
    Cc: Ozgun Erdogan
    Cc: Peter Zijlstra
    Cc: Roman Gushchin
    Cc: Ryan Mallon
    Cc: Tejun Heo
    Cc: Vlastimil Babka
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Johannes Weiner
     

13 Dec, 2013

1 commit

  • Commit fad1a86e25e0 ("procfs: call default get_unmapped_area on
    MMU-present architectures"), as its title says, took care of only the
    MMU case, leaving the !MMU side still in the regressed state (returning
    -EIO in all cases where pde->proc_fops->get_unmapped_area is NULL).

    From the fad1a86e25e0 changelog:

    "Commit c4fe24485729 ("sparc: fix PCI device proc file mmap(2)") added
    proc_reg_get_unmapped_area in proc_reg_file_ops and
    proc_reg_file_ops_no_compat, by which now mmap always returns EIO if
    get_unmapped_area method is not defined for the target procfs file, which
    causes regression of mmap on /proc/vmcore.

    To address this issue, like get_unmapped_area(), call default
    current->mm->get_unmapped_area on MMU-present architectures if
    pde->proc_fops->get_unmapped_area, i.e. the one in actual file operation
    in the procfs file, is not defined"

    Signed-off-by: Jan Beulich
    Cc: HATAYAMA Daisuke
    Cc: Alexey Dobriyan
    Cc: David S. Miller
    Cc: [3.12.x]
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jan Beulich
     

13 Nov, 2013

1 commit


17 Oct, 2013

2 commits

  • Commit c4fe24485729 ("sparc: fix PCI device proc file mmap(2)") added
    proc_reg_get_unmapped_area in proc_reg_file_ops and
    proc_reg_file_ops_no_compat, by which now mmap always returns EIO if
    get_unmapped_area method is not defined for the target procfs file,
    which causes regression of mmap on /proc/vmcore.

    To address this issue, like get_unmapped_area(), call default
    current->mm->get_unmapped_area on MMU-present architectures if
    pde->proc_fops->get_unmapped_area, i.e. the one in actual file
    operation in the procfs file, is not defined.

    Reported-by: Michael Holzheu
    Signed-off-by: HATAYAMA Daisuke
    Cc: Alexey Dobriyan
    Cc: David S. Miller
    Tested-by: Michael Holzheu
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    HATAYAMA Daisuke
     
  • Currently, proc_reg_get_unmapped_area truncates upper 32-bit of the
    mapped virtual address returned from get_unmapped_area method in
    pde->proc_fops due to the variable rv of signed integer on x86_64. This
    is too small to have vitual address of unsigned long on x86_64 since on
    x86_64, signed integer is of 4 bytes while unsigned long is of 8 bytes.
    To fix this issue, use unsigned long instead.

    Fixes a regression added in commit c4fe24485729 ("sparc: fix PCI device
    proc file mmap(2)").

    Signed-off-by: HATAYAMA Daisuke
    Cc: Alexey Dobriyan
    Cc: David S. Miller
    Tested-by: Michael Holzheu
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    HATAYAMA Daisuke
     

06 Sep, 2013

1 commit

  • Commit 786d7e1612f0b0adb6046f19b906609e4fe8b1ba "Fix rmmod/read/write races in /proc entries"
    must have broken mmapping of PCI device proc files on Sparc.

    Notice how it adds wrapper around ->mmap but doesn't do it around ->get_unmapped_area.
    Add wrapper around ->get_unmapped_area.

    Signed-off-by: Alexey Dobriyan
    Signed-off-by: David S. Miller

    Alexey Dobriyan