09 Jan, 2015

40 commits

  • Greg Kroah-Hartman
     
  • commit 678886bdc6378c1cbd5072da2c5a3035000214e3 upstream.

    When we abort a transaction we iterate over all the ranges marked as dirty
    in fs_info->freed_extents[0] and fs_info->freed_extents[1], clear them
    from those trees, add them back (unpin) to the free space caches and, if
    the fs was mounted with "-o discard", perform a discard on those regions.
    Also, after adding the regions to the free space caches, a fitrim ioctl call
    can see those ranges in a block group's free space cache and perform a discard
    on the ranges, so the same issue can happen without "-o discard" as well.

    This causes corruption, affecting one or multiple btree nodes (in the worst
    case leaving the fs unmountable) because some of those ranges (the ones in
    the fs_info->pinned_extents tree) correspond to btree nodes/leaves that are
    referred to by the last committed super block - breaking the rule that anything
    that was committed by a transaction is untouched until the next transaction
    commits successfully.

    I ran into this while running in a loop (for several hours) the fstest that
    I recently submitted:

    [PATCH] fstests: add btrfs test to stress chunk allocation/removal and fstrim

    The corruption always happened when a transaction aborted and then fsck complained
    like this:

    _check_btrfs_filesystem: filesystem on /dev/sdc is inconsistent
    *** fsck.btrfs output ***
    Check tree block failed, want=94945280, have=0
    Check tree block failed, want=94945280, have=0
    Check tree block failed, want=94945280, have=0
    Check tree block failed, want=94945280, have=0
    Check tree block failed, want=94945280, have=0
    read block failed check_tree_block
    Couldn't open file system

    In this case 94945280 corresponded to the root of a tree.
    Using ftrace, I observed the following sequence of steps:

    1) transaction N started, fs_info->pinned_extents pointed to
    fs_info->freed_extents[0];

    2) node/eb 94945280 is created;

    3) eb is persisted to disk;

    4) transaction N commit starts, fs_info->pinned_extents now points to
    fs_info->freed_extents[1], and transaction N completes;

    5) transaction N + 1 starts;

    6) eb is COWed, and btrfs_free_tree_block() called for this eb;

    7) eb range (94945280 to 94945280 + 16Kb) is added to
    fs_info->pinned_extents (fs_info->freed_extents[1]);

    8) Something goes wrong in transaction N + 1, like hitting ENOSPC
    for example, and the transaction is aborted, turning the fs into
    readonly mode. The stack trace I got for example:

    [112065.253935] [] dump_stack+0x4d/0x66
    [112065.254271] [] warn_slowpath_common+0x7f/0x98
    [112065.254567] [] ? __btrfs_abort_transaction+0x50/0x10b [btrfs]
    [112065.261674] [] warn_slowpath_fmt+0x48/0x50
    [112065.261922] [] ? btrfs_free_path+0x26/0x29 [btrfs]
    [112065.262211] [] __btrfs_abort_transaction+0x50/0x10b [btrfs]
    [112065.262545] [] btrfs_remove_chunk+0x537/0x58b [btrfs]
    [112065.262771] [] btrfs_delete_unused_bgs+0x1de/0x21b [btrfs]
    [112065.263105] [] cleaner_kthread+0x100/0x12f [btrfs]
    (...)
    [112065.264493] ---[ end trace dd7903a975a31a08 ]---
    [112065.264673] BTRFS: error (device sdc) in btrfs_remove_chunk:2625: errno=-28 No space left
    [112065.264997] BTRFS info (device sdc): forced readonly

    9) The cleaner kthread sees that the BTRFS_FS_STATE_ERROR bit is set in
    fs_info->fs_state and calls btrfs_cleanup_transaction(), which in
    turn calls btrfs_destroy_pinned_extent();

    10) Then btrfs_destroy_pinned_extent() iterates over all the ranges
    marked as dirty in fs_info->freed_extents[], and for each one
    it calls discard, if the fs was mounted with "-o discard", and
    adds the range to the free space cache of the respective block
    group;

    11) btrfs_trim_block_group(), invoked from the fitrim ioctl code path,
    sees the free space entries and performs a discard;

    12) After an umount and mount (or fsck), our eb's location on disk was full
    of zeroes, and it should have been untouched, because it was marked as
    dirty in the fs_info->pinned_extents tree, and therefore used by the
    trees that the last committed superblock points to.

    Fix this by not performing a discard and not adding the ranges to the free space
    caches - it's useless from this point since the fs is now in readonly mode and
    we won't write free space caches to disk anymore (otherwise we would leak space)
    nor any new superblock. By not adding the ranges to the free space caches, it
    prevents other code paths from allocating that space and writing to it as well,
    therefore being safer and simpler.

    This isn't a new problem, as it's been present since 2011 (git commit
    acce952b0263825da32cf10489413dec78053347).

    Signed-off-by: Filipe Manana
    Signed-off-by: Chris Mason
    Signed-off-by: Greg Kroah-Hartman

    Filipe Manana
     
  • commit a28046956c71985046474283fa3bcd256915fb72 upstream.

    We use the modified list to keep track of which extents have been modified so we
    know which ones are candidates for logging at fsync() time. Newly modified
    extents are added to the list at modification time, around the same time the
    ordered extent is created. We do this so that we don't have to wait for ordered
    extents to complete before we know what we need to log. The problem is when
    something like this happens:

    log extent 0-4k on inode 1
    copy csum for 0-4k from ordered extent into log
    sync log
    commit transaction
    log some other extent on inode 1
    ordered extent for 0-4k completes and adds itself onto modified list again
    log changed extents
    see ordered extent for 0-4k has already been logged
    at this point we assume the csum has been copied
    sync log
    crash

    On replay we will see the extent 0-4k in the log, drop the original 0-4k extent
    which is the same one that we are replaying which also drops the csum, and then
    we won't find the csum in the log for that bytenr. This of course causes us to
    have errors about not having csums for certain ranges of our inode. So remove
    the modified list manipulation in unpin_extent_cache, any modified extents
    should have been added well before now, and we don't want them re-logged. This
    fixes my test that I could reliably reproduce this problem with. Thanks,

    Signed-off-by: Josef Bacik
    Signed-off-by: Chris Mason
    Signed-off-by: Greg Kroah-Hartman

    Josef Bacik
     
  • commit 942080643bce061c3dd9d5718d3b745dcb39a8bc upstream.

    Dmitry Chernenkov used KASAN to discover that eCryptfs writes past the
    end of the allocated buffer during encrypted filename decoding. This
    fix corrects the issue by getting rid of the unnecessary 0 write when
    the current bit offset is 2.

    Signed-off-by: Michael Halcrow
    Reported-by: Dmitry Chernenkov
    Suggested-by: Kees Cook
    Signed-off-by: Tyler Hicks
    Signed-off-by: Greg Kroah-Hartman

    Michael Halcrow
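
    The overrun described above can be illustrated with a minimal userspace
    sketch of 6-bit to 8-bit decoding. This is not the eCryptfs source: the
    function name decode_sextets() is hypothetical, and the input is assumed
    to be already mapped to values 0..63. The point is the last case, which
    finishes a byte without pre-zeroing the following one.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical model of the decoder. dst is assumed to have room for
 * ((src_size + 1) * 3) / 4 bytes. Returns the number of complete bytes.
 */
size_t decode_sextets(unsigned char *dst, const unsigned char *src,
                      size_t src_size)
{
    unsigned bit_offset = 0;
    size_t di = 0;

    assert(dst != NULL && src != NULL);
    for (size_t si = 0; si < src_size; si++) {
        unsigned char v = src[si] & 0x3F;

        switch (bit_offset) {
        case 0:
            dst[di] = (unsigned char)(v << 2);
            bit_offset = 6;
            break;
        case 6:
            dst[di++] |= (unsigned char)(v >> 4);
            dst[di] = (unsigned char)((v & 0xF) << 4);
            bit_offset = 4;
            break;
        case 4:
            dst[di++] |= (unsigned char)(v >> 2);
            dst[di] = (unsigned char)(v << 6);
            bit_offset = 2;
            break;
        case 2:
            dst[di++] |= v;
            /* No "dst[di] = 0" here: with a full group of four
             * input values that store would land one byte past
             * the end of the buffer. */
            bit_offset = 0;
            break;
        }
    }
    return di;
}
```

    Four input values fill exactly three output bytes, so a fourth byte
    placed after the buffer stays untouched.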
     
  • commit 332b122d39c9cbff8b799007a825d94b2e7c12f2 upstream.

    The ecryptfs_encrypted_view mount option greatly changes the
    functionality of an eCryptfs mount. Instead of encrypting and decrypting
    lower files, it provides a unified view of the encrypted files in the
    lower filesystem. The presence of the ecryptfs_encrypted_view mount
    option is intended to force a read-only mount and modifying files is not
    supported when the feature is in use. See the following commit for more
    information:

    e77a56d [PATCH] eCryptfs: Encrypted passthrough

    This patch forces the mount to be read-only when the
    ecryptfs_encrypted_view mount option is specified by setting the
    MS_RDONLY flag on the superblock. Additionally, this patch removes some
    broken logic in ecryptfs_open() that attempted to prevent modifications
    of files when the encrypted view feature was in use. The check in
    ecryptfs_open() was not sufficient to prevent file modifications using
    system calls that do not operate on a file descriptor.

    Signed-off-by: Tyler Hicks
    Reported-by: Priya Bansal
    Signed-off-by: Greg Kroah-Hartman

    Tyler Hicks
     
  • commit a1d47b262952a45aae62bd49cfaf33dd76c11a2c upstream.

    The UDF specification allows arbitrarily large symlinks. However, we support
    only symlinks at most one block large. Check the length of the symlink
    so that we don't access memory beyond the end of the symlink block.

    Reported-by: Carl Henrik Lunde
    Signed-off-by: Jan Kara
    Signed-off-by: Greg Kroah-Hartman

    Jan Kara
     
  • commit 24c037ebf5723d4d9ab0996433cee4f96c292a4d upstream.

    alloc_pid() does get_pid_ns() beforehand but forgets to put_pid_ns() if it
    fails because disable_pid_allocation() was called by the exiting
    child_reaper.

    We could simply move get_pid_ns() down to successful return, but this fix
    tries to be as trivial as possible.

    Signed-off-by: Oleg Nesterov
    Reviewed-by: "Eric W. Biederman"
    Cc: Aaron Tomlin
    Cc: Pavel Emelyanov
    Cc: Serge Hallyn
    Cc: Sterling Alexander
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    Signed-off-by: Greg Kroah-Hartman

    Oleg Nesterov
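
    The reference imbalance can be sketched with a small userspace model.
    The types and the names alloc_pid_model(), get_ns() and put_ns() are
    hypothetical stand-ins, not the kernel's structures; the sketch only
    shows the failure path dropping the reference it took earlier.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the namespace and the pid object. */
struct pid_ns { int refcount; bool allocation_disabled; };
struct pid { struct pid_ns *ns; };

static void get_ns(struct pid_ns *ns) { ns->refcount++; }

static void put_ns(struct pid_ns *ns)
{
    assert(ns->refcount > 0);
    ns->refcount--;
}

/* Returns 0 on success, -1 when allocation has been disabled. */
int alloc_pid_model(struct pid_ns *ns, struct pid *out)
{
    get_ns(ns);                 /* reference taken up front */
    if (ns->allocation_disabled) {
        put_ns(ns);             /* the put that the failure path forgot */
        return -1;
    }
    out->ns = ns;
    return 0;
}
```

    After a failed call the reference count is back at its starting value;
    after a successful call it is one higher.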
     
  • commit a682e9c28cac152e6e54c39efcf046e0c8cfcf63 upstream.

    If some error happens in the NCP_IOC_SETROOT ioctl, the appropriate error
    return value is then (in most cases) just overwritten before we return.
    This can result in reporting success to userspace although an error happened.

    This bug was introduced by commit 2e54eb96e2c8 ("BKL: Remove BKL from
    ncpfs"). Propagate the errors correctly.

    Coverity id: 1226925.

    Fixes: 2e54eb96e2c80 ("BKL: Remove BKL from ncpfs")
    Signed-off-by: Jan Kara
    Cc: Petr Vandrovec
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    Signed-off-by: Greg Kroah-Hartman

    Jan Kara
     
  • commit 7e77bdebff5cb1e9876c561f69710b9ab8fa1f7e upstream.

    If a request is backlogged, its complete() handler will get called
    twice: once with -EINPROGRESS, and once with the final error code.

    af_alg's complete handler, unlike other users, does not handle the
    -EINPROGRESS but instead always completes the completion that recvmsg()
    is waiting on. This can lead to a return to user space while the
    request is still pending in the driver. If userspace closes the sockets
    before the requests are handled by the driver, this will lead to
    use-after-frees (and potential crashes) in the kernel due to the tfm
    having been freed.

    The crashes can be easily reproduced (for example) by reducing the max
    queue length in cryptd.c and running the following (from
    http://www.chronox.de/libkcapi.html) on AES-NI capable hardware:

    $ while true; do kcapi -x 1 -e -c '__ecb-aes-aesni' \
    -k 00000000000000000000000000000000 \
    -p 00000000000000000000000000000000 >/dev/null & done

    Signed-off-by: Rabin Vincent
    Signed-off-by: Herbert Xu
    Signed-off-by: Greg Kroah-Hartman

    Rabin Vincent
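
    A minimal sketch of the handler behaviour described above, using a
    hypothetical struct wait_state in place of the kernel's completion
    object: the backlog notification is ignored and only the final status
    wakes the waiter.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for what recvmsg() waits on. */
struct wait_state { bool done; int err; };

/* Called once with -EINPROGRESS for a backlogged request, then once
 * more with the final status. */
void request_complete(struct wait_state *w, int err)
{
    assert(w != NULL);
    if (err == -EINPROGRESS)
        return;         /* request is still pending in the driver */
    w->err = err;
    w->done = true;     /* only now may the waiter return to user space */
}
```

    With this shape, the waiter cannot return (and the socket cannot be
    torn down by it) until the driver has reported the final result.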
     
  • commit 041d7b98ffe59c59fdd639931dea7d74f9aa9a59 upstream.

    A regression was caused by commit 780a7654cee8:
    audit: Make testing for a valid loginuid explicit.
    (which in turn attempted to fix a regression caused by e1760bd)

    When audit_krule_to_data() fills in the rules to get a listing, there was a
    missing clause to convert back from AUDIT_LOGINUID_SET to AUDIT_LOGINUID.

    This broke userspace by not returning the same information that was sent and
    expected.

    The rule:
    auditctl -a exit,never -F auid=-1
    gives:
    auditctl -l
    LIST_RULES: exit,never f24=0 syscall=all
    when it should give:
    LIST_RULES: exit,never auid=-1 (0xffffffff) syscall=all

    Tag it so that it is reported the same way it was set. Create a new
    private flags audit_krule field (pflags) to store it that won't interact with
    the public one from the API.

    Signed-off-by: Richard Guy Briggs
    Signed-off-by: Paul Moore
    Signed-off-by: Greg Kroah-Hartman

    Richard Guy Briggs
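
    The round trip can be sketched as follows. The struct and function
    names are hypothetical, and the constants are local stand-ins: 24
    matches the "f24" shown in the listing above, while 9 for the legacy
    field is an assumption of this sketch. A private flag records that the
    rule arrived in the legacy form so the listing path can convert it back.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FIELD_LOGINUID        9u          /* assumed legacy field id */
#define FIELD_LOGINUID_SET    24u         /* the "f24" in the listing */
#define UID_UNSET             0xffffffffu /* auid=-1 */
#define PFLAG_LOGINUID_LEGACY 0x1u        /* private, never sent to userspace */

struct rule { uint32_t type; uint32_t val; uint32_t pflags; };

/* Incoming direction: legacy "auid=-1" becomes "loginuid set == 0". */
void rule_from_user(struct rule *r, uint32_t type, uint32_t val)
{
    assert(r != NULL);
    r->type = type;
    r->val = val;
    r->pflags = 0;
    if (type == FIELD_LOGINUID && val == UID_UNSET) {
        r->type = FIELD_LOGINUID_SET;
        r->val = 0;
        r->pflags |= PFLAG_LOGINUID_LEGACY;
    }
}

/* Listing direction: report the rule the same way it was set. */
void rule_to_user(const struct rule *r, uint32_t *type, uint32_t *val)
{
    assert(r != NULL && type != NULL && val != NULL);
    *type = r->type;
    *val = r->val;
    if (r->type == FIELD_LOGINUID_SET &&
        (r->pflags & PFLAG_LOGINUID_LEGACY) && r->val == 0) {
        *type = FIELD_LOGINUID;
        *val = UID_UNSET;
    }
}
```

    A rule that was submitted in the newer form carries no private flag
    and is listed unchanged.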
     
  • commit db86da7cb76f797a1a8b445166a15cb922c6ff85 upstream.

    A security fix caused the way the unprivileged remount tests were
    using user namespaces to break. Tweak the way user namespaces are
    being used so the test works again.

    Signed-off-by: "Eric W. Biederman"
    Signed-off-by: Greg Kroah-Hartman

    Eric W. Biederman
     
  • commit 66d2f338ee4c449396b6f99f5e75cd18eb6df272 upstream.

    Now that setgroups can be disabled and not reenabled, setting gid_map
    without privilege can now be enabled when setgroups is disabled.

    This restores most of the functionality that was lost when unprivileged
    setting of gid_map was removed. Applications that use this functionality
    will need to check to see if they use setgroups or init_groups, and if they
    don't they can be fixed by simply disabling setgroups before writing to
    gid_map.

    Reviewed-by: Andy Lutomirski
    Signed-off-by: "Eric W. Biederman"
    Signed-off-by: Greg Kroah-Hartman

    Eric W. Biederman
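
    For an application that does not call setgroups, the ordering described
    above might look like the sketch below. It assumes the caller has
    already entered a new user namespace; the helper names are hypothetical,
    and the single-line map format "inside outside count" follows the
    documented gid_map layout.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Write a short string to an existing file; 0 on success, -1 on error. */
static int write_file(const char *path, const char *text)
{
    int fd = open(path, O_WRONLY);
    ssize_t n;

    if (fd < 0)
        return -1;
    n = write(fd, text, strlen(text));
    close(fd);
    return n == (ssize_t)strlen(text) ? 0 : -1;
}

/* Format a map line covering exactly one id. */
int format_single_id_map(char *buf, size_t len,
                         unsigned inside, unsigned outside)
{
    int n;

    assert(buf != NULL);
    n = snprintf(buf, len, "%u %u 1\n", inside, outside);
    return (n > 0 && (size_t)n < len) ? 0 : -1;
}

/* Assumes the process is already inside its new user namespace. */
int deny_setgroups_then_map_gid(unsigned inside, unsigned outside)
{
    char map[64];

    if (write_file("/proc/self/setgroups", "deny") != 0)
        return -1;      /* must happen before gid_map is written */
    if (format_single_id_map(map, sizeof(map), inside, outside) != 0)
        return -1;
    return write_file("/proc/self/gid_map", map);
}
```

    The outside gid has to be read (for example with getegid()) before the
    namespace is entered, since it is interpreted in the parent namespace.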
     
  • commit 9cc46516ddf497ea16e8d7cb986ae03a0f6b92f8 upstream.

    - Expose the knob to user space through a proc file /proc/<pid>/setgroups

    A value of "deny" means the setgroups system call is disabled in the
    current process's user namespace and can not be enabled in the
    future in this user namespace.

    A value of "allow" means the setgroups system call is enabled.

    - Descendant user namespaces inherit the value of setgroups from
    their parents.

    - A proc file is used (instead of a sysctl) as sysctls currently do
    not allow checking the permissions at open time.

    - Writing to the proc file is restricted to before the gid_map
    for the user namespace is set.

    This ensures that disabling setgroups at a user namespace
    level will never remove the ability to call setgroups
    from a process that already has that ability.

    A process may opt in to the setgroups disable for itself by
    creating, entering and configuring a user namespace or by calling
    setns on an existing user namespace with setgroups disabled.
    Processes without privileges already can not call setgroups so this
    is a no-op. Processes with privilege become processes without
    privilege when entering a user namespace and as with any other path
    to dropping privilege they would not have the ability to call
    setgroups. So this remains within the bounds of what is possible
    without a knob to disable setgroups permanently in a user namespace.

    Signed-off-by: "Eric W. Biederman"
    Signed-off-by: Greg Kroah-Hartman

    Eric W. Biederman
     
  • commit f0d62aec931e4ae3333c797d346dc4f188f454ba upstream.

    Generalize id_map_mutex so it can be used for more state of a user namespace.

    Reviewed-by: Andy Lutomirski
    Signed-off-by: "Eric W. Biederman"
    Signed-off-by: Greg Kroah-Hartman

    Eric W. Biederman
     
  • commit f95d7918bd1e724675de4940039f2865e5eec5fe upstream.

    If you did not create the user namespace and are allowed
    to write to uid_map or gid_map you should already have the necessary
    privilege in the parent user namespace to establish any mapping
    you want so this will not affect userspace in practice.

    Limiting unprivileged uid mapping establishment to the creator of the
    user namespace makes it easier to verify all credentials obtained with
    the uid mapping can be obtained without the uid mapping without
    privilege.

    Limiting unprivileged gid mapping establishment (which is temporarily
    absent) to the creator of the user namespace also ensures that the
    combination of uid and gid can already be obtained without privilege.

    This is part of the fix for CVE-2014-8989.

    Reviewed-by: Andy Lutomirski
    Signed-off-by: "Eric W. Biederman"
    Signed-off-by: Greg Kroah-Hartman

    Eric W. Biederman
     
  • commit 80dd00a23784b384ccea049bfb3f259d3f973b9d upstream.

    setresuid allows the euid to be set to any of uid, euid, suid, and
    fsuid. Therefore it is safe to allow an unprivileged user to map
    their euid and use the CAP_SETUID privilege with exactly that uid,
    as no new credentials can be obtained.

    I can not find a combination of existing system calls that allows setting
    uid, euid, suid, and fsuid from the fsuid, making the previous use
    of fsuid for allowing unprivileged mappings a bug.

    This is part of a fix for CVE-2014-8989.

    Reviewed-by: Andy Lutomirski
    Signed-off-by: "Eric W. Biederman"
    Signed-off-by: Greg Kroah-Hartman

    Eric W. Biederman
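
    The unprivileged uid mapping rule from this commit and the creator
    restriction above can be sketched as one predicate. The struct and the
    function name are hypothetical and plain integers stand in for kernel
    uid types; this is an illustration of the rule, not the kernel's check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of one map line: "first lower_first count". */
struct id_extent { uint32_t first; uint32_t lower_first; uint32_t count; };

/*
 * An unprivileged writer may establish a uid map only if it created the
 * namespace and maps exactly one id, and that id is its own euid.
 */
bool unpriv_uid_map_ok(const struct id_extent *e, size_t nr,
                       uint32_t ns_owner_uid, uint32_t caller_euid)
{
    assert(e != NULL);
    if (nr != 1 || e[0].count != 1)
        return false;
    if (ns_owner_uid != caller_euid)
        return false;               /* only the namespace creator */
    return e[0].lower_first == caller_euid;
}
```

    Mapping the euid grants nothing new because setresuid already lets a
    process set its uid, euid and suid to the euid.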
     
  • commit be7c6dba2332cef0677fbabb606e279ae76652c3 upstream.

    As any gid mapping will allow, and must allow for backwards
    compatibility, dropping groups, don't allow any gid mappings to be
    established without CAP_SETGID in the parent user namespace.

    For a small class of applications this change breaks userspace
    and removes useful functionality. This small class of applications
    includes tools/testing/selftests/mount/unprivileged-remount-test.c

    Most of the removed functionality will be added back with the addition
    of a one way knob to disable setgroups. Once setgroups is disabled
    setting the gid_map becomes as safe as setting the uid_map.

    For more common applications that set the uid_map and the gid_map
    with privilege this change will have no effect.

    This is part of a fix for CVE-2014-8989.

    Reviewed-by: Andy Lutomirski
    Signed-off-by: "Eric W. Biederman"
    Signed-off-by: Greg Kroah-Hartman

    Eric W. Biederman
     
  • commit 273d2c67c3e179adb1e74f403d1e9a06e3f841b5 upstream.

    setgroups is unique in not needing a valid mapping before it can be called,
    in the case of setgroups(0, NULL) which drops all supplemental groups.

    The design of the user namespace assumes that CAP_SETGID can not actually
    be used until a gid mapping is established. Therefore add a helper function
    to see if the user namespace gid mapping has been established and call
    that function in the setgroups permission check.

    This is part of the fix for CVE-2014-8989, being able to drop groups
    without privilege using user namespaces.

    Reviewed-by: Andy Lutomirski
    Signed-off-by: "Eric W. Biederman"
    Signed-off-by: Greg Kroah-Hartman

    Eric W. Biederman
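
    The permission check described above can be sketched as a predicate
    over a hypothetical namespace model. The names struct userns_model and
    may_setgroups_model() are illustrative only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model: has a gid mapping been written yet? */
struct userns_model { bool gid_map_established; };

/* CAP_SETGID alone is not enough; the gid map must exist first, which
 * also covers setgroups(0, NULL). */
bool may_setgroups_model(const struct userns_model *ns, bool has_cap_setgid)
{
    assert(ns != NULL);
    return has_cap_setgid && ns->gid_map_established;
}
```

    Keeping the check in one helper lets every setgroups entry point share
    the same rule.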
     
  • commit 0542f17bf2c1f2430d368f44c8fcf2f82ec9e53e upstream.

    The rule is simple. Don't allow anything that wouldn't be allowed
    without unprivileged mappings.

    It was previously overlooked that establishing gid mappings would
    allow dropping groups and potentially gaining permission to files and
    directories that had lesser permissions for a specific group than for
    all other users.

    This is the rule needed to fix CVE-2014-8989 and prevent any other
    security issues with new_idmap_permitted.

    The reason for this rule is that the unix permission model is old and
    there are programs out there somewhere that take advantage of every
    little corner of it. So allowing a uid or gid mapping to be
    established without privilege that would allow anything that would not
    be allowed without that mapping will result in expectations from some
    code somewhere being violated. Violated expectations about the
    behavior of the OS is a long way of saying a security issue.

    Signed-off-by: "Eric W. Biederman"
    Signed-off-by: Greg Kroah-Hartman

    Eric W. Biederman
     
  • commit 7ff4d90b4c24a03666f296c3d4878cd39001e81e upstream.

    Today there are 3 instances of setgroups and due to an oversight their
    permission checking has diverged. Add a common function so that
    they may all share the same permission checking code.

    This corrects the current oversight in the current permission checks
    and adds a helper to avoid this in the future.

    A user namespace security fix will update this new helper, shortly.

    Signed-off-by: "Eric W. Biederman"
    Signed-off-by: Greg Kroah-Hartman

    Eric W. Biederman
     
  • commit b2f5d4dc38e034eecb7987e513255265ff9aa1cf upstream.

    Forced unmount affects not just the mount namespace but the underlying
    superblock as well. Restrict forced unmount to the global root user
    for now. Otherwise it becomes possible for a user in a less privileged
    mount namespace to force the shutdown of a superblock of a filesystem
    in a more privileged mount namespace, allowing a DOS attack on root.

    Signed-off-by: "Eric W. Biederman"
    Signed-off-by: Greg Kroah-Hartman

    Eric W. Biederman
     
  • commit 4a44a19b470a886997d6647a77bb3e38dcbfa8c5 upstream.

    - MNT_NODEV should be irrelevant except when reading back mount flags,
    no longer specify MNT_NODEV on remount.

    - Test MNT_NODEV on devpts where it is meaningful even for unprivileged mounts.

    - Add a test to verify that remount of a preexisting mount with the same flags
    is allowed and does not change those flags.

    - Clean up the definitions of MS_REC, MS_RELATIME, MS_STRICTATIME that are used
    when the code is built in an environment without them.

    - Correct the test error messages when tests fail. There were not 5 tests
    that tested MS_RELATIME.

    Signed-off-by: Eric W. Biederman
    Signed-off-by: Greg Kroah-Hartman

    Eric W. Biederman
     
  • commit 3e1866410f11356a9fd869beb3e95983dc79c067 upstream.

    Now that remount is properly enforcing the rule that you can't remove
    nodev, at least sandstorm.io is breaking when performing a remount.

    It turns out that there is an easy, intuitive solution: implicitly
    add nodev on remount when nodev was implicitly added on mount.

    Tested-by: Cedric Bosdonnat
    Tested-by: Richard Weinberger
    Signed-off-by: "Eric W. Biederman"
    Signed-off-by: Greg Kroah-Hartman

    Eric W. Biederman
     
  • commit 9d367e5e7b05c71a8c1ac4e9b6e00ba45a79f2fc upstream.

    thermal_unregister_governors() and class_unregister() were being called in
    the wrong order.

    Fixes: 80a26a5c22b9 ("Thermal: build thermal governors into thermal_sys module")
    Signed-off-by: Luis Henriques
    Signed-off-by: Zhang Rui
    Signed-off-by: Greg Kroah-Hartman

    Luis Henriques
     
  • commit c297abfdf15b4480704d6b566ca5ca9438b12456 upstream.

    While reviewing the code of umount_tree I realized that when we append
    to a preexisting unmounted list we do not change pprev of the former
    first item in the list.

    Which means later in namespace_unlock hlist_del_init(&mnt->mnt_hash) on
    the former first item of the list will stomp unmounted.first leaving
    it set to some random mount point which we are likely to free soon.

    This isn't likely to hit, but if it does I don't know how anyone could
    track it down.

    [ This happened because we don't have all the same operations for
    hlist's as we do for normal doubly-linked lists. In particular,
    list_splice() is easy on our standard doubly-linked lists, while
    hlist_splice() doesn't exist and needs both start/end entries of the
    hlist. And commit 38129a13e6e7 incorrectly open-coded that missing
    hlist_splice().

    We should think about making these kinds of "mindless" conversions
    easier to get right by adding the missing hlist helpers - Linus ]

    Fixes: 38129a13e6e71f666e0468e99fdd932a687b4d7e switch mnt_hash to hlist
    Signed-off-by: "Eric W. Biederman"
    Signed-off-by: Linus Torvalds
    Signed-off-by: Greg Kroah-Hartman

    Eric W. Biederman
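
    The missing pprev update can be shown with a userspace model of an
    hlist. The node layout mirrors the usual next/pprev scheme, but
    hlist_prepend_chain() and hlist_del_node() are hypothetical helpers
    written for this sketch, not kernel functions.

```c
#include <assert.h>
#include <stddef.h>

struct hlist_node { struct hlist_node *next, **pprev; };
struct hlist_head { struct hlist_node *first; };

/*
 * Put an already linked chain first..last in front of whatever the
 * head currently holds.
 */
void hlist_prepend_chain(struct hlist_head *head,
                         struct hlist_node *first, struct hlist_node *last)
{
    assert(head != NULL && first != NULL && last != NULL);
    last->next = head->first;
    if (head->first)
        /* The step the open-coded splice missed: the former first
         * item must stop pointing back at head->first. */
        head->first->pprev = &last->next;
    head->first = first;
    first->pprev = &head->first;
}

/* Unlink one node through its pprev back-pointer. */
void hlist_del_node(struct hlist_node *n)
{
    *n->pprev = n->next;
    if (n->next)
        n->next->pprev = n->pprev;
    n->next = NULL;
    n->pprev = NULL;
}
```

    Without the pprev update, deleting the former first item would write
    through its stale back-pointer and overwrite head->first.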
     
  • commit 28a9bc68124c319b2b3dc861e80828a8865fd1ba upstream.

    When writing the code to allow per-station GTKs, I neglected to
    take into account the management frame keys (index 4 and 5) when
    freeing the station and only added code to free the first four
    data frame keys.

    Fix this by iterating the array of keys over the right length.

    Fixes: e31b82136d1a ("cfg80211/mac80211: allow per-station GTKs")
    Signed-off-by: Johannes Berg
    Signed-off-by: Greg Kroah-Hartman

    Johannes Berg
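
    A small sketch of iterating over the right length: the loop bound is
    derived from the array itself rather than from the number of data frame
    keys. The struct, the constants and station_free_keys() are hypothetical
    stand-ins for the driver's types.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define DATA_KEYS 4     /* data frame key indices 0..3 */
#define MGMT_KEYS 2     /* management frame key indices 4 and 5 */
#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Hypothetical station with one slot per key index. */
struct station { int *gtk[DATA_KEYS + MGMT_KEYS]; };

/* Free every key slot; returns how many were in use. */
size_t station_free_keys(struct station *sta)
{
    size_t freed = 0;

    assert(sta != NULL);
    for (size_t i = 0; i < ARRAY_SIZE(sta->gtk); i++) {
        if (sta->gtk[i]) {
            free(sta->gtk[i]);
            sta->gtk[i] = NULL;
            freed++;
        }
    }
    return freed;
}
```

    A loop bounded by DATA_KEYS would leave slots 4 and 5 allocated.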
     
  • commit d025933e29872cb1fe19fc54d80e4dfa4ee5779c upstream.

    As multicast-frames can't be fragmented, "dot11MulticastReceivedFrameCount"
    stopped being incremented after the use-after-free fix. Furthermore, the
    RX-LED will be triggered by every multicast frame (which wouldn't happen
    before) which wouldn't allow the LED to rest at all.

    Fixes https://bugzilla.kernel.org/show_bug.cgi?id=89431 which also had the
    patch.

    Fixes: b8fff407a180 ("mac80211: fix use-after-free in defragmentation")
    Signed-off-by: Andreas Müller
    [rewrite commit message]
    Signed-off-by: Johannes Berg
    Signed-off-by: Greg Kroah-Hartman

    Andreas Müller
     
  • commit b26bdde5bb27f3f900e25a95e33a0c476c8c2c48 upstream.

    When loading encrypted-keys module, if the last check of
    aes_get_sizes() in init_encrypted() fails, the driver just returns an
    error without unregistering its key type. This results in a stale
    entry in the list. In addition to memory leaks, this leads to a kernel
    crash when registering a new key type later.

    This patch fixes the problem by swapping the calls of aes_get_sizes()
    and register_key_type(), and releasing resources properly at the error
    paths.

    Bugzilla: https://bugzilla.opensuse.org/show_bug.cgi?id=908163
    Signed-off-by: Takashi Iwai
    Signed-off-by: Mimi Zohar
    Signed-off-by: Greg Kroah-Hartman

    Takashi Iwai
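
    The reordering can be sketched as an init function that runs the
    fallible size check before registering anything and unwinds earlier
    setup on every error path. The state struct and module_init_model() are
    hypothetical; the boolean parameters stand in for the outcome of the
    real calls.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical view of what the module has set up so far. */
struct module_state { bool hash_ready; bool type_registered; };

/* Returns 0 on success, -1 on failure with nothing left behind. */
int module_init_model(struct module_state *s, bool sizes_ok, bool register_ok)
{
    assert(s != NULL);
    s->hash_ready = true;       /* earlier setup that must be undone */
    s->type_registered = false;

    if (!sizes_ok)              /* size check runs before registration */
        goto out;
    if (!register_ok)
        goto out;
    s->type_registered = true;  /* registration is the last step */
    return 0;
out:
    s->hash_ready = false;      /* release resources on the error path */
    return -1;
}
```

    Because registration is the last step, no failure after it can leave a
    registered type behind.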
     
  • commit 4e2024624e678f0ebb916e6192bd23c1f9fdf696 upstream.

    We didn't check length of rock ridge ER records before printing them.
    Thus a corrupted isofs image can cause us to access and print some memory
    behind the buffer with obvious consequences.

    Reported-and-tested-by: Carl Henrik Lunde
    Signed-off-by: Jan Kara
    Signed-off-by: Greg Kroah-Hartman

    Jan Kara
     
  • commit 3fb2f4237bb452eb4e98f6a5dbd5a445b4fed9d0 upstream.

    It turns out that there's a lurking ABI issue. GCC, when
    compiling this in a 32-bit program:

    struct user_desc desc = {
        .entry_number = idx,
        .base_addr = base,
        .limit = 0xfffff,
        .seg_32bit = 1,
        .contents = 0, /* Data, grow-up */
        .read_exec_only = 0,
        .limit_in_pages = 1,
        .seg_not_present = 0,
        .useable = 0,
    };

    will leave .lm uninitialized. This means that anything in the
    kernel that reads user_desc.lm for 32-bit tasks is unreliable.

    Revert the .lm check in set_thread_area(). The value never did
    anything in the first place.

    Fixes: 0e58af4e1d21 ("x86/tls: Disallow unusual TLS segments")
    Signed-off-by: Andy Lutomirski
    Acked-by: Thomas Gleixner
    Cc: Linus Torvalds
    Link: http://lkml.kernel.org/r/d7875b60e28c512f6a6fc0baf5714d58e7eaadbb.1418856405.git.luto@amacapital.net
    Signed-off-by: Ingo Molnar
    Signed-off-by: Greg Kroah-Hartman

    Andy Lutomirski
     
  • commit ab1e85372168892387dd1ac171158fc8c3119be4 upstream.

    Commit a095b1c78a35 ("ARM: mvebu: sort DT nodes by address")
    missed placing the system-controller in the correct order.

    Fixes: a095b1c78a35 ("ARM: mvebu: sort DT nodes by address")
    Signed-off-by: Uwe Kleine-König
    Acked-by: Andrew Lunn
    Link: https://lkml.kernel.org/r/20141114204333.GS27002@pengutronix.de
    Signed-off-by: Jason Cooper
    Signed-off-by: Greg Kroah-Hartman

    Uwe Kleine-König
     
  • commit e4a680099a6e97ecdbb81081cff9e4a489a4dc44 upstream.

    Commit d127e9c ("ARM: tegra: make tegra_resume can work with current and later
    chips") removed the tegra_get_soc_id macro, leaving the CPU register in use
    corrupted after branching to v7_invalidate_l1() and, as a result, causing
    execution of unintended code on tegra20. Possibly it was expected that r6 would
    be the SoC id func argument, since the common CPU reset handler sets r6 before
    branching to tegra_resume(), but neither tegra20_lp1_reset() nor
    tegra30_lp1_reset() sets the r6 register before jumping to the resume function.
    Fix it by re-adding the macro.

    Fixes: d127e9c (ARM: tegra: make tegra_resume can work with current and later chips)
    Reviewed-by: Felipe Balbi
    Signed-off-by: Dmitry Osipenko
    Signed-off-by: Thierry Reding
    Signed-off-by: Greg Kroah-Hartman

    Dmitry Osipenko
     
  • commit 7d57511d2dba03a8046c8b428dd9192a4bfc1e73 upstream.

    Commit a469abd0f868 (ARM: elf: add new hwcap for identifying atomic
    ldrd/strd instructions) introduces HWCAP_LPAE for 32-bit ARM
    applications. As LPAE is always present on arm64, report the
    corresponding compat HWCAP to user space.

    Signed-off-by: Catalin Marinas
    Signed-off-by: Will Deacon
    Signed-off-by: Greg Kroah-Hartman

    Catalin Marinas
     
  • commit 2c43fd26e46734430122b8d2ad3024bb532df3ef upstream.

    Discard bios and thin device deletion have the potential to release data
    blocks. If the thin-pool is in out-of-data-space mode, and blocks were
    released, transition the thin-pool back to full write mode.

    The correct time to do this is just after the thin-pool metadata commit.
    It cannot be done before the commit because the space maps will not
    allow immediate reuse of the data blocks in case there's a rollback
    following power failure.

    Signed-off-by: Joe Thornber
    Signed-off-by: Mike Snitzer
    Signed-off-by: Greg Kroah-Hartman

    Joe Thornber
     
  • commit 45ec9bd0fd7abf8705e7cf12205ff69fe9d51181 upstream.

    When the pool was in PM_OUT_OF_SPACE mode its process_prepared_discard
    function pointer was incorrectly being set to
    process_prepared_discard_passdown rather than process_prepared_discard.

    This incorrect function pointer meant the discard was being passed down,
    but not affecting the mapping. As such any discard that was issued, in
    an attempt to reclaim blocks, would not successfully free data space.

    Reported-by: Eric Sandeen
    Signed-off-by: Joe Thornber
    Signed-off-by: Mike Snitzer
    Signed-off-by: Greg Kroah-Hartman

    Joe Thornber
     
  • commit c1c6156fe4d4577444b769d7edd5dd503e57bbc9 upstream.

    This function isn't right and it causes a static checker warning:

    drivers/md/dm-thin.c:3016 maybe_resize_data_dev()
    error: potentially using uninitialized 'sb_data_size'.

    It should set "*count" and return zero on success the same as the
    sm_metadata_get_nr_blocks() function does earlier.

    Fixes: 3241b1d3e0aa ('dm: add persistent data library')
    Signed-off-by: Dan Carpenter
    Acked-by: Joe Thornber
    Signed-off-by: Mike Snitzer
    Signed-off-by: Greg Kroah-Hartman

    Dan Carpenter
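
    The calling convention the message asks for can be sketched like this:
    the count goes out through the pointer and the return value carries only
    the status. The types and the name get_nr_blocks_model() are
    hypothetical.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned long long block_t;

/* Hypothetical stand-in for the space map. */
struct space_map_model { block_t nr_blocks; };

/* Stores the block count in *count and returns 0 on success. */
int get_nr_blocks_model(const struct space_map_model *sm, block_t *count)
{
    assert(sm != NULL && count != NULL);
    *count = sm->nr_blocks; /* result goes through the out parameter */
    return 0;               /* return value is status only */
}
```

    A version that returned the count and never wrote *count would leave
    the caller's variable uninitialized, which is what the checker flagged.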
     
  • commit 1e32134a5a404e80bfb47fad8a94e9bbfcbdacc5 upstream.

    If the incoming bio is a WRITE and completely covers a block then we
    don't bother to do any copying for a promotion operation. Once this is
    done the cache block and origin block will be different, so we need to
    set it to 'dirty'.

    Signed-off-by: Joe Thornber
    Signed-off-by: Mike Snitzer
    Signed-off-by: Greg Kroah-Hartman

    Joe Thornber
     
  • commit f29a3147e251d7ae20d3194ff67f109d71e501b4 upstream.

    Overwrite causes the cache block and origin blocks to diverge, which
    is only allowed in writeback mode.

    Signed-off-by: Joe Thornber
    Signed-off-by: Mike Snitzer
    Signed-off-by: Greg Kroah-Hartman

    Joe Thornber
     
  • commit 1a71d6ffe18c0d0f03fc8531949cc8ed41d702ee upstream.

    Use memzero_explicit to clean up sensitive data allocated on stack
    to prevent the compiler from optimizing and removing memset() calls.

    Signed-off-by: Milan Broz
    Signed-off-by: Mike Snitzer
    Signed-off-by: Greg Kroah-Hartman

    Milan Broz
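
    For comparison, a userspace analogue of the same idea is sketched below.
    It is not the kernel's memzero_explicit(); it swaps in a different
    technique, storing through a volatile pointer, which the compiler is not
    allowed to elide. The name wipe_explicit() is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Zero a buffer in a way the optimizer must keep, even if the buffer
 * is never read again before going out of scope.
 */
void wipe_explicit(void *buf, size_t len)
{
    volatile unsigned char *p = buf;

    assert(buf != NULL);
    while (len--)
        *p++ = 0;   /* each volatile store is an observable side effect */
}
```

    A plain memset() on a stack buffer that is about to go out of scope
    may be removed as a dead store, which is the case this commit avoids.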
     
  • commit 445559cdcb98a141f5de415b94fd6eaccab87e6d upstream.

    When dm-bufio sets out to use the bio built into a struct dm_buffer to
    issue an IO, it needs to call bio_reset after it's done with the bio
    so that we can free things attached to the bio such as the integrity
    payload. Therefore, inject our own endio callback to take care of
    the bio_reset after calling submit_io's end_io callback.

    Test case:
    1. modprobe scsi_debug delay=0 dif=1 dix=199 ato=1 dev_size_mb=300
    2. Set up a dm-bufio client, e.g. dm-verity, on the scsi_debug device
    3. Repeatedly read metadata and watch kmalloc-192 leak!

    Signed-off-by: Darrick J. Wong
    Signed-off-by: Mikulas Patocka
    Signed-off-by: Mike Snitzer
    Signed-off-by: Greg Kroah-Hartman

    Darrick J. Wong