27 Apr, 2014

34 commits

  • Greg Kroah-Hartman
     
  • commit c39df5fa37b0623589508c95515b4aa1531c524e upstream.

    Commit 8aac62706ada ("move exit_task_namespaces() outside of
    exit_notify()") breaks pppd and the exiting service crashes the kernel:

    BUG: unable to handle kernel NULL pointer dereference at 0000000000000028
    IP: ppp_register_channel+0x13/0x20 [ppp_generic]
    Call Trace:
    ppp_asynctty_open+0x12b/0x170 [ppp_async]
    tty_ldisc_open.isra.2+0x27/0x60
    tty_ldisc_hangup+0x1e3/0x220
    __tty_hangup+0x2c4/0x440
    disassociate_ctty+0x61/0x270
    do_exit+0x7f2/0xa50

    ppp_register_channel() needs ->net_ns, but current->nsproxy == NULL.

    Move disassociate_ctty() before exit_task_namespaces(), it doesn't make
    sense to delay it after perf_event_exit_task() or cgroup_exit().

    This also allows us to use task_work_add() inside the (nontrivial) code
    paths in disassociate_ctty().
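    A userspace C sketch of the ordering problem follows. It is a hypothetical, heavily simplified model: the structures and the functions named after kernel ones are stand-ins, not the kernel's definitions, and the NULL dereference is replaced by an error return.

```c
#include <stddef.h>

/* Hypothetical stand-ins; NOT the kernel's real structures. */
struct net { int id; };
struct nsproxy { struct net *net_ns; };
struct task { struct nsproxy *nsproxy; int has_ctty; };

/* Models ppp_register_channel(): it needs nsproxy->net_ns. The real code
 * dereferences the pointer unconditionally; here we report -1 instead. */
static int register_channel(const struct task *t)
{
	if (t->nsproxy == NULL)
		return -1;
	return t->nsproxy->net_ns->id;
}

/* Models the hangup done by disassociate_ctty(), which can reopen the
 * PPP line discipline and therefore register a channel. */
static int disassociate_ctty(struct task *t)
{
	int ret = t->has_ctty ? register_channel(t) : 0;

	t->has_ctty = 0;
	return ret;
}

static void exit_task_namespaces(struct task *t)
{
	t->nsproxy = NULL;
}

/* Order after the fix: drop the controlling tty first. */
static int exit_fixed(struct task *t)
{
	int ret = disassociate_ctty(t);

	exit_task_namespaces(t);
	return ret;
}

/* Order that triggered the oops: namespaces are gone before the hangup. */
static int exit_broken(struct task *t)
{
	exit_task_namespaces(t);
	return disassociate_ctty(t);
}
```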

    Investigated by Peter Hurley.

    Signed-off-by: Oleg Nesterov
    Reported-by: Sree Harsha Totakura
    Cc: Peter Hurley
    Cc: Sree Harsha Totakura
    Cc: "Eric W. Biederman"
    Cc: Jeff Dike
    Cc: Ingo Molnar
    Cc: Andrey Vagin
    Cc: Al Viro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    Signed-off-by: Greg Kroah-Hartman

    Oleg Nesterov
     
  • commit dfccbb5e49a621c1b21a62527d61fc4305617aca upstream.

    wait_task_zombie() first does EXIT_ZOMBIE->EXIT_DEAD transition and
    drops tasklist_lock. If this task is not the natural child and it is
    traced, we change its state back to EXIT_ZOMBIE for ->real_parent.

    The last transition is racy, this is even documented in 50b8d257486a
    "ptrace: partially fix the do_wait(WEXITED) vs EXIT_DEAD->EXIT_ZOMBIE
    race". wait_consider_task() tries to detect this transition and clear
    ->notask_error but we can't rely on ptrace_reparented(), debugger can
    exit and do ptrace_unlink() before its sub-thread sets EXIT_ZOMBIE.

    And there is another problem which was missed before: this transition
    can also race with reparent_leader(), which doesn't reset ->exit_signal
    if EXIT_DEAD, assuming that this task must be reaped by someone else. So
    the tracee can be re-parented with ->exit_signal != SIGCHLD, and if
    /sbin/init doesn't use __WALL it becomes unreapable.

    Change reparent_leader() to update ->exit_signal even if EXIT_DEAD.
    Note: this is the simple temporary hack for -stable, it doesn't try to
    solve all problems, it will be reverted by the next changes.
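    A simplified model of the reparent_leader() change, assuming the fix amounts to setting ->exit_signal before the EXIT_DEAD early return. The struct and enum below are hypothetical stand-ins for task_struct, not kernel code.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>

/* Hypothetical, simplified task state; NOT the kernel's task_struct. */
enum exit_state { TASK_ALIVE, EXIT_ZOMBIE, EXIT_DEAD };

struct task {
	enum exit_state exit_state;
	int exit_signal;
};

/* Before: EXIT_DEAD tasks returned early, so a tracee that later went
 * back to EXIT_ZOMBIE kept whatever exit_signal it had. */
static void reparent_leader_old(struct task *p)
{
	if (p->exit_state == EXIT_DEAD)
		return;
	p->exit_signal = SIGCHLD;
}

/* After: reset exit_signal for the new parent first, then bail out. */
static void reparent_leader_new(struct task *p)
{
	p->exit_signal = SIGCHLD;
	if (p->exit_state == EXIT_DEAD)
		return;
	/* ... the remaining reparenting work would go here ... */
}
```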

    Signed-off-by: Oleg Nesterov
    Reported-by: Jan Kratochvil
    Reported-by: Michal Schmidt
    Tested-by: Michal Schmidt
    Cc: Al Viro
    Cc: Lennart Poettering
    Cc: Roland McGrath
    Cc: Tejun Heo
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    Signed-off-by: Greg Kroah-Hartman

    Oleg Nesterov
     
  • commit 3ead9578443b66ddb3d50ed4f53af8a0c0298ec5 upstream.

    @wait is a local variable, so if we don't remove it from the wait queue
    list, a later wake_up() may end up accessing invalid memory.

    This was spotted by code inspection.
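    The underlying rule (an on-stack wait entry must be unlinked before its stack frame goes away) can be sketched with a minimal intrusive list in userspace C. The list helpers and wait_for_event() are hypothetical simplifications loosely modeled on <linux/list.h>, not the JFFS2 code.

```c
#include <stddef.h>

/* Minimal intrusive doubly linked list, loosely modeled on <linux/list.h>. */
struct list_head { struct list_head *next, *prev; };

static void list_init(struct list_head *h) { h->next = h->prev = h; }

static void list_add_tail(struct list_head *n, struct list_head *h)
{
	n->prev = h->prev;
	n->next = h;
	h->prev->next = n;
	h->prev = n;
}

static void list_del(struct list_head *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
	n->next = n->prev = NULL;
}

static int list_empty(const struct list_head *h) { return h->next == h; }

/* Hypothetical waiter: the entry lives in this function's stack frame, so
 * it must be unlinked before returning. Otherwise the queue keeps a pointer
 * into a dead frame that a later wake-up would walk. */
static void wait_for_event(struct list_head *queue)
{
	struct list_head wait;		/* local, like @wait in the commit */

	list_add_tail(&wait, queue);
	/* ... sleep until woken ... */
	list_del(&wait);		/* the step the fix adds */
}
```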

    Signed-off-by: Li Zefan
    Cc: David Woodhouse
    Cc: Artem Bityutskiy
    Signed-off-by: Andrew Morton
    Signed-off-by: Brian Norris
    Signed-off-by: Greg Kroah-Hartman

    Li Zefan
     
  • commit 13b546d96207c131eeae15dc7b26c6e7d0f1cad7 upstream.

    We triggered a soft lockup under a stress test on a 2.6.34 kernel.

    BUG: soft lockup - CPU#1 stuck for 60009ms! [lockf2.test:14488]
    ...
    [] (jffs2_do_reserve_space+0x420/0x440 [jffs2])
    [] (jffs2_reserve_space_gc+0x34/0x78 [jffs2])
    [] (jffs2_garbage_collect_dnode.isra.3+0x264/0x478 [jffs2])
    [] (jffs2_garbage_collect_pass+0x9c0/0xe4c [jffs2])
    [] (jffs2_reserve_space+0x104/0x2a8 [jffs2])
    [] (jffs2_write_inode_range+0x5c/0x4d4 [jffs2])
    [] (jffs2_write_end+0x198/0x2c0 [jffs2])
    [] (generic_file_buffered_write+0x158/0x200)
    [] (__generic_file_aio_write+0x3a4/0x414)
    [] (generic_file_aio_write+0x5c/0xbc)
    [] (do_sync_write+0x98/0xd4)
    [] (vfs_write+0xa8/0x150)
    [] (sys_write+0x3c/0xc0)]

    Fix this by adding a cond_resched() in the while loop.
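    The shape of the fix, sketched in userspace C under stated assumptions: try_reserve() and reserve_space() are hypothetical, and sched_yield() is only a rough userspace analogue of cond_resched(), which yields solely when a reschedule is pending.

```c
#define _POSIX_C_SOURCE 200809L
#include <sched.h>

/* Hypothetical allocator state: reserve attempts fail until garbage
 * collection (not modeled) has freed enough space. */
static int attempts_until_success;

static int try_reserve(void)
{
	return --attempts_until_success <= 0 ? 0 : -1;
}

static int reserve_space(void)
{
	int ret;

	while ((ret = try_reserve()) != 0) {
		/* Userspace stand-in for cond_resched(): let other runnable
		 * tasks in instead of monopolizing the CPU in this loop. */
		sched_yield();
	}
	return ret;
}
```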

    [akpm@linux-foundation.org: don't initialize `ret']
    Signed-off-by: Li Zefan
    Cc: David Woodhouse
    Cc: Artem Bityutskiy
    Signed-off-by: Andrew Morton
    Signed-off-by: Brian Norris
    Signed-off-by: Greg Kroah-Hartman

    Li Zefan
     
  • commit 41bf1a24c1001f4d0d41a78e1ac575d2f14789d7 upstream.

    Mounting a JFFS2 partition sometimes crashes with this call trace:

    [ 1322.240000] Kernel bug detected[#1]:
    [ 1322.244000] Cpu 2
    [ 1322.244000] $ 0 : 0000000000000000 0000000000000018 000000003ff00070 0000000000000001
    [ 1322.252000] $ 4 : 0000000000000000 c0000000f3980150 0000000000000000 0000000000010000
    [ 1322.260000] $ 8 : ffffffffc09cd5f8 0000000000000001 0000000000000088 c0000000ed300de8
    [ 1322.268000] $12 : e5e19d9c5f613a45 ffffffffc046d464 0000000000000000 66227ba5ea67b74e
    [ 1322.276000] $16 : c0000000f1769c00 c0000000ed1e0200 c0000000f3980150 0000000000000000
    [ 1322.284000] $20 : c0000000f3a80000 00000000fffffffc c0000000ed2cfbd8 c0000000f39818f0
    [ 1322.292000] $24 : 0000000000000004 0000000000000000
    [ 1322.300000] $28 : c0000000ed2c0000 c0000000ed2cfab8 0000000000010000 ffffffffc039c0b0
    [ 1322.308000] Hi : 000000000000023c
    [ 1322.312000] Lo : 000000000003f802
    [ 1322.316000] epc : ffffffffc039a9f8 check_tn_node+0x88/0x3b0
    [ 1322.320000] Not tainted
    [ 1322.324000] ra : ffffffffc039c0b0 jffs2_do_read_inode_internal+0x1250/0x1e48
    [ 1322.332000] Status: 5400f8e3 KX SX UX KERNEL EXL IE
    [ 1322.336000] Cause : 00800034
    [ 1322.340000] PrId : 000c1004 (Netlogic XLP)
    [ 1322.344000] Modules linked in:
    [ 1322.348000] Process jffs2_gcd_mtd7 (pid: 264, threadinfo=c0000000ed2c0000, task=c0000000f0e68dd8, tls=0000000000000000)
    [ 1322.356000] Stack : c0000000f1769e30 c0000000ed010780 c0000000ed010780 c0000000ed300000
    c0000000f1769c00 c0000000f3980150 c0000000f3a80000 00000000fffffffc
    c0000000ed2cfbd8 ffffffffc039c0b0 ffffffffc09c6340 0000000000001000
    0000000000000dec ffffffffc016c9d8 c0000000f39805a0 c0000000f3980180
    0000008600000000 0000000000000000 0000000000000000 0000000000000000
    0001000000000dec c0000000f1769d98 c0000000ed2cfb18 0000000000010000
    0000000000010000 0000000000000044 c0000000f3a80000 c0000000f1769c00
    c0000000f3d207a8 c0000000f1769d98 c0000000f1769de0 ffffffffc076f9c0
    0000000000000009 0000000000000000 0000000000000000 ffffffffc039cf90
    0000000000000017 ffffffffc013fbdc 0000000000000001 000000010003e61c
    ...
    [ 1322.424000] Call Trace:
    [ 1322.428000] [] check_tn_node+0x88/0x3b0
    [ 1322.432000] [] jffs2_do_read_inode_internal+0x1250/0x1e48
    [ 1322.440000] [] jffs2_do_crccheck_inode+0x70/0xd0
    [ 1322.448000] [] jffs2_garbage_collect_pass+0x160/0x870
    [ 1322.452000] [] jffs2_garbage_collect_thread+0xdc/0x1f0
    [ 1322.460000] [] kthread+0xb8/0xc0
    [ 1322.464000] [] kernel_thread_helper+0x10/0x18
    [ 1322.472000]
    [ 1322.472000]
    Code: 67bd0050 94a4002c 2c830001 de050218 2403fffc 0080a82d 00431824 24630044
    [ 1322.480000] ---[ end trace b052bb90e97dfbf5 ]---

    The csize field in struct jffs2_tmp_dnode_info is of type uint16_t, but
    it is used to hold the compressed data length (csize), which is declared
    as uint32_t. So when the value of csize exceeds 16 bits, it is truncated
    when assigned to tn->csize, which triggers a kernel BUG. Changing the
    definition of csize in jffs2_tmp_dnode_info to uint32_t fixes the issue.
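    The truncation is easy to reproduce in plain C. The structs below are hypothetical stand-ins that keep only the csize field of jffs2_tmp_dnode_info.

```c
#include <stdint.h>

/* Hypothetical stand-ins: only the csize field is modeled. */
struct tmp_dnode_old { uint16_t csize; };	/* before the fix */
struct tmp_dnode_new { uint32_t csize; };	/* after the fix */

static uint32_t store_old(uint32_t csize)
{
	/* Values above 65535 silently lose their upper 16 bits. */
	struct tmp_dnode_old tn = { .csize = (uint16_t)csize };
	return tn.csize;
}

static uint32_t store_new(uint32_t csize)
{
	struct tmp_dnode_new tn = { .csize = csize };
	return tn.csize;
}
```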

    Signed-off-by: Ajesh Kunhipurayil Vijayan
    Signed-off-by: Kamlakant Patel
    Signed-off-by: Brian Norris
    Signed-off-by: Greg Kroah-Hartman

    Ajesh Kunhipurayil Vijayan
     
  • commit 3367da5610c50e6b83f86d366d72b41b350b06a2 upstream.

    Creating a large file on a JFFS2 partition sometimes crashes with this call
    trace:

    [ 306.476000] CPU 13 Unable to handle kernel paging request at virtual address c0000000dfff8002, epc == ffffffffc03a80a8, ra == ffffffffc03a8044
    [ 306.488000] Oops[#1]:
    [ 306.488000] Cpu 13
    [ 306.492000] $ 0 : 0000000000000000 0000000000000000 0000000000008008 0000000000008007
    [ 306.500000] $ 4 : c0000000dfff8002 000000000000009f c0000000e0007cde c0000000ee95fa58
    [ 306.508000] $ 8 : 0000000000000001 0000000000008008 0000000000010000 ffffffffffff8002
    [ 306.516000] $12 : 0000000000007fa9 000000000000ff0e 000000000000ff0f 80e55930aebb92bb
    [ 306.524000] $16 : c0000000e0000000 c0000000ee95fa5c c0000000efc80000 ffffffffc09edd70
    [ 306.532000] $20 : ffffffffc2b60000 c0000000ee95fa58 0000000000000000 c0000000efc80000
    [ 306.540000] $24 : 0000000000000000 0000000000000004
    [ 306.548000] $28 : c0000000ee950000 c0000000ee95f738 0000000000000000 ffffffffc03a8044
    [ 306.556000] Hi : 00000000000574a5
    [ 306.560000] Lo : 6193b7a7e903d8c9
    [ 306.564000] epc : ffffffffc03a80a8 jffs2_rtime_compress+0x98/0x198
    [ 306.568000] Tainted: G W
    [ 306.572000] ra : ffffffffc03a8044 jffs2_rtime_compress+0x34/0x198
    [ 306.580000] Status: 5000f8e3 KX SX UX KERNEL EXL IE
    [ 306.584000] Cause : 00800008
    [ 306.588000] BadVA : c0000000dfff8002
    [ 306.592000] PrId : 000c1100 (Netlogic XLP)
    [ 306.596000] Modules linked in:
    [ 306.596000] Process dd (pid: 170, threadinfo=c0000000ee950000, task=c0000000ee6e0858, tls=0000000000c47490)
    [ 306.608000] Stack : 7c547f377ddc7ee4 7ffc7f967f5d7fae 7f617f507fc37ff4 7e7d7f817f487f5f
    7d8e7fec7ee87eb3 7e977ff27eec7f9e 7d677ec67f917f67 7f3d7e457f017ed7
    7fd37f517f867eb2 7fed7fd17ca57e1d 7e5f7fe87f257f77 7fd77f0d7ede7fdb
    7fba7fef7e197f99 7fde7fe07ee37eb5 7f5c7f8c7fc67f65 7f457fb87f847e93
    7f737f3e7d137cd9 7f8e7e9c7fc47d25 7dbb7fac7fb67e52 7ff17f627da97f64
    7f6b7df77ffa7ec5 80057ef17f357fb3 7f767fa27dfc7fd5 7fe37e8e7fd07e53
    7e227fcf7efb7fa1 7f547e787fa87fcc 7fcb7fc57f5a7ffb 7fc07f6c7ea97e80
    7e2d7ed17e587ee0 7fb17f9d7feb7f31 7f607e797e887faa 7f757fdd7c607ff3
    7e877e657ef37fbd 7ec17fd67fe67ff7 7ff67f797ff87dc4 7eef7f3a7c337fa6
    7fe57fc97ed87f4b 7ebe7f097f0b8003 7fe97e2a7d997cba 7f587f987f3c7fa9
    ...
    [ 306.676000] Call Trace:
    [ 306.680000] [] jffs2_rtime_compress+0x98/0x198
    [ 306.684000] [] jffs2_selected_compress+0x110/0x230
    [ 306.692000] [] jffs2_compress+0x5c/0x388
    [ 306.696000] [] jffs2_write_inode_range+0xd8/0x388
    [ 306.704000] [] jffs2_write_end+0x16c/0x2d0
    [ 306.708000] [] generic_file_buffered_write+0xf8/0x2b8
    [ 306.716000] [] __generic_file_aio_write+0x1ac/0x350
    [ 306.720000] [] generic_file_aio_write+0x80/0x168
    [ 306.728000] [] do_sync_write+0x94/0xf8
    [ 306.732000] [] vfs_write+0xa4/0x1a0
    [ 306.736000] [] SyS_write+0x50/0x90
    [ 306.744000] [] handle_sys+0x180/0x1a0
    [ 306.748000]
    [ 306.748000]
    Code: 020b202d 0205282d 90a50000 14a40038 00000000 0060602d 0000282d 016c5823
    [ 306.760000] ---[ end trace 79dd088435be02d0 ]---
    Segmentation fault

    This crash happens because 'positions' is declared as an array of signed
    short. A position is in the range 0..65535, so any position greater than
    32767 is converted to a negative number, causing memory corruption and a
    crash. Changing the definition to 'unsigned short' fixes this issue.
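    A minimal C illustration of the sign problem. The function names are hypothetical; note that converting an out-of-range value to signed short is implementation-defined in C, and common compilers wrap it to a negative number.

```c
/* Hypothetical position tables indexed by byte value, as in an rtime-style
 * compressor; each entry records the last offset (0..65535) a byte was seen. */
static long stored_position_signed(unsigned int pos)
{
	short positions[256] = { 0 };

	positions['a'] = (short)pos;	/* > 32767 typically becomes negative */
	return positions['a'];
}

static long stored_position_unsigned(unsigned int pos)
{
	unsigned short positions[256] = { 0 };

	positions['a'] = (unsigned short)pos;	/* full 0..65535 range kept */
	return positions['a'];
}
```

    Used as a back-reference index, the negative value from the signed table points before the start of the buffer.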

    Signed-off-by: Jayachandran C
    Signed-off-by: Kamlakant Patel
    Signed-off-by: Brian Norris
    Signed-off-by: Greg Kroah-Hartman

    Kamlakant Patel
     
  • commit 47ba9734403770a4c5e685b01f0a72b835dd4fff upstream.

    This patch moves the dereference of "buffer" after the check for NULL.
    The only place which passes a NULL parameter is gfs2_set_acl().

    Signed-off-by: Dan Carpenter
    Signed-off-by: Steven Whitehouse
    Signed-off-by: Greg Kroah-Hartman

    Dan Carpenter
     
  • commit ad6599ab3ac98a4474544086e048ce86ec15a4d1 upstream.

    Xfstests generic/311 and shared/298 fail when run on a bigalloc file
    system. Kernel error messages produced during the tests report that
    blocks to be freed are already on the to-be-freed list. When e2fsck
    is run at the end of the tests, it typically reports bad i_blocks and
    bad free blocks counts.

    The bug that causes these failures is located in ext4_ext_rm_leaf().
    Code at the end of the function frees a partial cluster if it's not
    shared with an extent remaining in the leaf. However, if all the
    extents in the leaf have been removed, the code dereferences an
    invalid extent pointer (off the front of the leaf) when the check for
    sharing is made. This generally has the effect of unconditionally
    freeing the partial cluster, which leads to the observed failures
    when the partial cluster is shared with the last extent in the next
    leaf.

    Fix this by attempting to free the cluster only if extents remain in
    the leaf. Any remaining partial cluster will be freed if possible
    when the next leaf is processed or when leaf removal is complete.
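    The corrected rule can be sketched as follows: only compare the partial cluster against the last remaining extent when one exists. The extent layout, the cluster arithmetic, and may_free_partial() are hypothetical simplifications, not ext4's actual structures.

```c
#include <stdbool.h>

/* Hypothetical extent: a run of len physical blocks starting at pblk. */
struct extent { unsigned long long pblk; unsigned int len; };

/* A cluster is (1 << bits) blocks. */
static unsigned long long cluster_of(unsigned long long blk, unsigned int bits)
{
	return blk >> bits;
}

/* nr_left: extents still in the leaf after removal. */
static bool may_free_partial(const struct extent *leaf, int nr_left,
			     unsigned long long partial_cluster,
			     unsigned int bits)
{
	if (nr_left == 0)
		return false;	/* nothing to compare with: defer the decision
				   to the next leaf or the end of removal */

	const struct extent *last = &leaf[nr_left - 1];
	unsigned long long last_blk = last->pblk + last->len - 1;

	/* Free only if the remaining extent does not share the cluster. */
	return cluster_of(last_blk, bits) != partial_cluster;
}
```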

    Signed-off-by: Eric Whitney
    Signed-off-by: "Theodore Ts'o"
    Signed-off-by: Greg Kroah-Hartman

    Eric Whitney
     
  • commit c06344939422bbd032ac967223a7863de57496b5 upstream.

    Commit 9cb00419fa, which enables hole punching for bigalloc file
    systems, exposed a bug introduced by commit 6ae06ff51e in an earlier
    release. When run on a bigalloc file system, xfstests generic/013, 068,
    075, 083, 091, 100, 112, 127, 263, 269, and 270 fail with e2fsck errors
    or cause kernel error messages indicating that previously freed blocks
    are being freed again.

    The latter commit optimizes the selection of the starting extent in
    ext4_ext_rm_leaf() when hole punching by beginning with the extent
    supplied in the path argument rather than with the last extent in the
    leaf node (as is still done when truncating). However, the code in
    rm_leaf that initially sets partial_cluster to track cluster sharing on
    extent boundaries is only guaranteed to run if rm_leaf starts with the
    last node in the leaf. Consequently, partial_cluster is not correctly
    initialized when hole punching, and a cluster on the boundary of a
    punched region that should be retained may instead be deallocated.

    Signed-off-by: Eric Whitney
    Signed-off-by: "Theodore Ts'o"
    Signed-off-by: Greg Kroah-Hartman

    Eric Whitney
     
  • commit ce37c42919608e96ade3748fe23c3062a0a966c5 upstream.

    Commit 3779473246 breaks the return of error codes from
    ext4_ext_handle_uninitialized_extents() in ext4_ext_map_blocks(). A
    portion of the patch assigns that function's signed integer return
    value to an unsigned int. Consequently, negatively valued error codes
    are lost and can be treated as a bogus allocated block count.
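    A short C sketch of how the error code gets lost. handle_extents() is a hypothetical stand-in that returns either a block count or a negative errno.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical helper: a block count on success, -EIO on failure. */
static int handle_extents(bool fail)
{
	return fail ? -EIO : 8;
}

/* Buggy pattern: the signed result lands in an unsigned variable, so the
 * error check can never be true and -EIO looks like a huge block count. */
static bool error_seen_buggy(bool fail)
{
	unsigned int allocated = handle_extents(fail);

	return allocated < 0;	/* always false for an unsigned type */
}

/* Fixed pattern: keep the signed type until the error is handled. */
static bool error_seen_fixed(bool fail)
{
	int ret = handle_extents(fail);

	return ret < 0;
}
```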

    Signed-off-by: Eric Whitney
    Signed-off-by: "Theodore Ts'o"
    Signed-off-by: Greg Kroah-Hartman

    Eric Whitney
     
  • commit 573a075567f0174551e2fad2a3164afd2af788f2 upstream.

    We could have possibly added an extent_op to the locked_ref while we dropped
    locked_ref->lock, so check for this case as well and loop around. Otherwise we
    could lose flag updates which would lead to extent tree corruption. Thanks,
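    The re-check-after-relocking pattern, sketched with POSIX threads. The ref_head structure and helpers are hypothetical; add_op_once() only simulates another thread attaching an extent_op while the lock is dropped.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical delayed-ref head; another thread may attach an extent_op
 * whenever "lock" is not held. */
struct ref_head {
	pthread_mutex_t lock;
	bool has_extent_op;
	int ops_run;
};

static void run_extent_op(struct ref_head *h)
{
	h->has_extent_op = false;
	h->ops_run++;
}

static void process_head(struct ref_head *h,
			 void (*unlocked_work)(struct ref_head *))
{
	pthread_mutex_lock(&h->lock);
again:
	if (h->has_extent_op)
		run_extent_op(h);

	pthread_mutex_unlock(&h->lock);
	unlocked_work(h);		/* window in which an op can be added */
	pthread_mutex_lock(&h->lock);

	/* The extra check: loop around instead of assuming nothing changed. */
	if (h->has_extent_op)
		goto again;
	pthread_mutex_unlock(&h->lock);
}

/* Demo helper simulating a concurrent update during the first window. */
static int injections;

static void add_op_once(struct ref_head *h)
{
	pthread_mutex_lock(&h->lock);
	if (injections++ == 0)
		h->has_extent_op = true;
	pthread_mutex_unlock(&h->lock);
}
```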

    Signed-off-by: Josef Bacik
    Signed-off-by: Chris Mason
    Signed-off-by: Greg Kroah-Hartman

    Josef Bacik
     
  • commit 3bbb24b20a8800158c33eca8564f432dd14d0bf3 upstream.

    Zach found this deadlock, which would happen like this:

    btrfs_end_transaction use_count to 0
      btrfs_run_delayed_refs
        btrfs_cow_block
          find_free_extent
            btrfs_start_transaction use_count to 1
              allocate chunk
            btrfs_end_transaction use_count to 0
              btrfs_run_delayed_refs
                lock tree block we are cowing above ^^

    We need to decrease trans->use_count only if it is above 1; otherwise,
    leave it alone. This makes nested transactions the only ones that drop
    their added ref, and lets us get rid of the trans->use_count++ hack if we
    have to commit the transaction. Thanks,
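    A simplified model of the use_count rule, assuming a nested end only drops its own reference and returns without running delayed refs. The struct and function names are hypothetical, not btrfs code.

```c
#include <stdbool.h>

/* Hypothetical, simplified transaction handle. */
struct trans_handle {
	int use_count;
	int delayed_ref_runs;
};

/* Starting a transaction while one is already open just nests. */
static void start_transaction(struct trans_handle *t)
{
	t->use_count++;
}

/* Returns true when this call ended the outermost transaction. */
static bool end_transaction(struct trans_handle *t)
{
	if (t->use_count > 1) {
		t->use_count--;		/* nested: drop only our own ref */
		return false;
	}
	/* Outermost: leave use_count alone and do the real work. */
	t->delayed_ref_runs++;
	return true;
}
```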

    Reported-by: Zach Brown
    Signed-off-by: Josef Bacik
    Tested-by: Zach Brown
    Signed-off-by: Chris Mason
    Signed-off-by: Greg Kroah-Hartman

    Josef Bacik
     
  • commit f88ba6a2a44ee98e8d59654463dc157bb6d13c43 upstream.

    I got an error on v3.13:
    BTRFS error (device sdf1) in write_all_supers:3378: errno=-5 IO failure (errors while submitting device barriers.)

    how to reproduce:
    > mkfs.btrfs -f -d raid1 /dev/sdf1 /dev/sdf2
    > wipefs -a /dev/sdf2
    > mount -o degraded /dev/sdf1 /mnt
    > btrfs balance start -f -sconvert=single -mconvert=single -dconvert=single /mnt

    The error occurs because barrier_all_devices() fails to submit a
    barrier to the missing device. However, we clearly cannot do anything
    with a missing device, and there is no need to care about chunks on it.

    This patch stops sending/waiting for barriers if a device is missing.
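    The check reduces to skipping missing devices in the barrier loop. A hypothetical sketch, with a made-up device struct standing in for btrfs's device list:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical device state. */
struct device {
	bool missing;
	bool barrier_ok;	/* would a barrier submission succeed? */
};

/* Returns the number of failed barriers; missing devices are skipped
 * instead of being counted as submission errors. */
static int barrier_all_devices(const struct device *devs, size_t n)
{
	int errors = 0;

	for (size_t i = 0; i < n; i++) {
		if (devs[i].missing)
			continue;	/* nothing can be sent to it */
		if (!devs[i].barrier_ok)
			errors++;
	}
	return errors;
}
```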

    Signed-off-by: Hidetoshi Seto
    Signed-off-by: Josef Bacik
    Signed-off-by: Greg Kroah-Hartman

    Hidetoshi Seto
     
  • commit c88547a8119e3b581318ab65e9b72f27f23e641d upstream.

    Commit f5ea1100 ("xfs: add CRCs to dir2/da node blocks") introduced
    in 3.10 incorrectly converted the btree hash index array pointer in
    xfs_da3_fixhashpath(). It resulted in the current hash always
    being compared against the first entry in the btree rather than the
    current block index into the btree block's hash entry array. As a
    result, it was comparing the wrong hashes, and so could misorder the
    entries in the btree.
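    A hypothetical model of the comparison bug. The entry struct is simplified, and treating a matching hash as the signal to stop propagating is an assumption of this sketch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplified node entry: hash value plus child block number. */
struct node_entry { uint32_t hashval; uint32_t before; };

/* Buggy check: btree points at entry 0, so this compares against the
 * first hash no matter which entry the path actually went through. */
static bool hash_unchanged_buggy(const struct node_entry *btree, int index,
				 uint32_t lasthash)
{
	(void)index;
	return btree->hashval == lasthash;
}

/* Fixed check: compare against the entry at the current index. */
static bool hash_unchanged_fixed(const struct node_entry *btree, int index,
				 uint32_t lasthash)
{
	return btree[index].hashval == lasthash;
}
```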

    For most cases, this doesn't cause any problems as it requires hash
    collisions to expose the ordering problem. However, when there are
    hash collisions within a directory there is a very good probability
    that the entries will be ordered incorrectly and that actually
    matters when duplicate hashes are placed into or removed from the
    btree block hash entry array.

    This bug results in an on-disk directory corruption and that results
    in directory verifier functions throwing corruption warnings into
    the logs. While no data or directory entries are lost, access to
    them may be compromised, and attempts to remove entries from a
    directory that has suffered from this corruption may result in a
    filesystem shutdown. xfs_repair will fix the directory hash
    ordering without data loss occurring.

    [dchinner: wrote a useful commit message]

    Reported-by: Hannes Frederic Sowa
    Signed-off-by: Mark Tinguely
    Reviewed-by: Ben Myers
    Signed-off-by: Dave Chinner
    Signed-off-by: Greg Kroah-Hartman

    Mark Tinguely
     
  • commit 5acda9d12dcf1ad0d9a5a2a7c646de3472fa7555 upstream.

    After commit 839a8e8660b6 ("writeback: replace custom worker pool
    implementation with unbound workqueue"), when a device is removed while
    we are writing to it, we crash in bdi_writeback_workfn() ->
    set_worker_desc() because bdi->dev is NULL.

    This can happen because even though bdi_unregister() cancels all pending
    flushing work, nothing really prevents new ones from being queued from
    balance_dirty_pages() or other places.

    Fix the problem by clearing BDI_registered bit in bdi_unregister() and
    checking it before scheduling of any flushing work.
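    The flag-plus-lock idea, sketched with POSIX threads under stated assumptions: the struct and functions are hypothetical, and the mutex plays the role of whatever lock serializes the flag test against queueing (the text does not name it).

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical model: "registered" plays the role of BDI_registered. */
struct bdi {
	pthread_mutex_t wb_lock;
	bool registered;
	int queued;
};

static void bdi_unregister(struct bdi *b)
{
	pthread_mutex_lock(&b->wb_lock);
	b->registered = false;		/* no new work after this point */
	pthread_mutex_unlock(&b->wb_lock);
	/* ... then cancel/flush whatever was already queued ... */
}

static bool bdi_queue_flush(struct bdi *b)
{
	bool queued = false;

	pthread_mutex_lock(&b->wb_lock);
	if (b->registered) {
		b->queued++;		/* stand-in for queueing the work item */
		queued = true;
	}
	pthread_mutex_unlock(&b->wb_lock);
	return queued;
}
```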

    Fixes: 839a8e8660b6777e7fe4e80af1a048aebe2b5977

    Reviewed-by: Tejun Heo
    Signed-off-by: Jan Kara
    Cc: Derek Basehore
    Cc: Jens Axboe
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    Signed-off-by: Greg Kroah-Hartman

    Jan Kara
     
  • commit 6ca738d60c563d5c6cf6253ee4b8e76fa77b2b9e upstream.

    bdi_wakeup_thread_delayed() used the mod_delayed_work() function to
    schedule work to write back dirty inodes. The problem with this is that
    it can delay work that is scheduled for immediate execution, such as the
    work from sync_inodes_sb(). This can happen since mod_delayed_work()
    can now steal work from a work_queue. This fixes the problem by using
    queue_delayed_work() instead. This is a regression caused by commit
    839a8e8660b6 ("writeback: replace custom worker pool implementation with
    unbound workqueue").

    The reason that this causes a problem is that laptop-mode will change
    the delay, dirty_writeback_centisecs, to 60000 (10 minutes) by default.
    In the case that bdi_wakeup_thread_delayed() races with
    sync_inodes_sb(), sync will be stopped for 10 minutes and trigger a hung
    task. Even if dirty_writeback_centisecs is not long enough to cause a
    hung task, we still don't want to delay sync for that long.

    We fix the problem by using queue_delayed_work() when we want to
    schedule writeback sometime in future. This function doesn't change the
    timer if it is already armed.
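    The difference between the two calls can be modeled with a single timer. This is a hypothetical sketch of the semantics described above, not the workqueue API; times are in centiseconds to match dirty_writeback_centisecs.

```c
#include <stdbool.h>

/* Hypothetical model of one delayed work item. */
struct delayed_work {
	bool pending;		/* timer armed / work queued */
	unsigned long expires;	/* when it will run */
};

/* mod_delayed_work()-like: re-arms unconditionally, even if the work
 * was already due to run sooner. */
static void mod_delayed(struct delayed_work *dw, unsigned long now,
			unsigned long delay)
{
	dw->pending = true;
	dw->expires = now + delay;
}

/* queue_delayed_work()-like: leaves an already-armed timer alone. */
static bool queue_delayed(struct delayed_work *dw, unsigned long now,
			  unsigned long delay)
{
	if (dw->pending)
		return false;
	dw->pending = true;
	dw->expires = now + delay;
	return true;
}
```

    With sync work queued for immediate execution at t=100, a later mod_delayed() with a 60000-centisecond delay pushes it out to 60100, while queue_delayed() leaves it at 100.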

    For the same reason, we also change bdi_writeback_workfn() to
    immediately queue the work again in the case that the work_list is not
    empty. The same problem can happen if the sync work is run on the
    rescue worker.

    [jack@suse.cz: update changelog, add comment, use bdi_wakeup_thread_delayed()]
    Signed-off-by: Derek Basehore
    Reviewed-by: Jan Kara
    Cc: Alexander Viro
    Reviewed-by: Tejun Heo
    Cc: Greg Kroah-Hartman
    Cc: "Darrick J. Wong"
    Cc: Derek Basehore
    Cc: Kees Cook
    Cc: Benson Leung
    Cc: Sonny Rao
    Cc: Luigi Semenzato
    Cc: Jens Axboe
    Cc: Dave Chinner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    Signed-off-by: Greg Kroah-Hartman

    Derek Basehore
     
  • commit c019e307ad82a8ee652b8ccbacf69ae94263b07b upstream.

    With the new template mechanism introduced in IMA in kernel 3.13, the
    format of data sent through the binary_runtime_measurements interface
    has changed slightly. Now, for a generic measurement, the format of
    template data (after the template name) is:

    template_len | field1_len | field1 | ... | fieldN_len | fieldN

    In addition, fields containing a string now include the '\0' termination
    character.

    Instead, the format for the 'ima' template should be:

    SHA1 digest | event name length | event name

    Note that in the IMA 3.13 code, 'event name length' is
    'IMA_EVENT_NAME_LEN_MAX + 1' (256 bytes), so that the template digest
    is calculated correctly, and 'event name' contains the '\0'. In the
    pre-3.13 code, 'event name length' is exactly the string length and
    'event name' does not contain the termination character.

    The patch restores the pre-3.13 behavior of the IMA code for the 'ima'
    template so that legacy userspace tools get consistent behavior
    when receiving data from the binary_runtime_measurements interface,
    regardless of which kernel version is used.
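    A sketch of serializing the 'ima' template data with the pre-3.13 convention (length excludes the terminator, no '\0' written). The function name is hypothetical, and the 32-bit host-endian length field is an assumption of this sketch.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SHA1_DIGEST_SIZE 20

/* Writes SHA1 digest | event name length | event name into out.
 * Returns the number of bytes written, or 0 if out is too small. */
static size_t ima_template_serialize(const uint8_t digest[SHA1_DIGEST_SIZE],
				     const char *name, uint8_t *out,
				     size_t outlen)
{
	uint32_t len = (uint32_t)strlen(name);	/* no '\0' counted */
	size_t total = SHA1_DIGEST_SIZE + sizeof(len) + len;

	if (outlen < total)
		return 0;
	memcpy(out, digest, SHA1_DIGEST_SIZE);
	memcpy(out + SHA1_DIGEST_SIZE, &len, sizeof(len));
	memcpy(out + SHA1_DIGEST_SIZE + sizeof(len), name, len);
	return total;
}
```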

    Signed-off-by: Roberto Sassu
    Signed-off-by: Mimi Zohar
    Signed-off-by: Greg Kroah-Hartman

    Roberto Sassu
     
  • commit 5981a8821b774ada0be512fd9bad7c241e17657e upstream.

    This patch fixes an authentication failure on LE link re-connection when
    BlueZ acts as slave (peripheral). The LTK is removed from the internal
    list after its first use, causing a PIN or Key Missing reply when
    re-connecting the link. The LE Long Term Key Request event indicates that
    the master is attempting to encrypt or re-encrypt the link.

    Pre-condition: BlueZ host paired and running as slave.
    How to reproduce (master):

    1) Establish an ACL LE encrypted link
    2) Disconnect the link
    3) Try to re-establish the ACL LE encrypted link (fails)

    > HCI Event: LE Meta Event (0x3e) plen 19
    LE Connection Complete (0x01)
    Status: Success (0x00)
    Handle: 64
    Role: Slave (0x01)
    ...
    @ Device Connected: 00:02:72:DC:29:C9 (1) flags 0x0000
    > HCI Event: LE Meta Event (0x3e) plen 13
    LE Long Term Key Request (0x05)
    Handle: 64
    Random number: 875be18439d9aa37
    Encryption diversifier: 0x76ed
    < HCI Command: LE Long Term Key Request Reply (0x08|0x001a) plen 18
    Handle: 64
    Long term key: 2aa531db2fce9f00a0569c7d23d17409
    > HCI Event: Command Complete (0x0e) plen 6
    LE Long Term Key Request Reply (0x08|0x001a) ncmd 1
    Status: Success (0x00)
    Handle: 64
    > HCI Event: Encryption Change (0x08) plen 4
    Status: Success (0x00)
    Handle: 64
    Encryption: Enabled with AES-CCM (0x01)
    ...
    @ Device Disconnected: 00:02:72:DC:29:C9 (1) reason 3
    < HCI Command: LE Set Advertise Enable (0x08|0x000a) plen 1
    Advertising: Enabled (0x01)
    > HCI Event: Command Complete (0x0e) plen 4
    LE Set Advertise Enable (0x08|0x000a) ncmd 1
    Status: Success (0x00)
    > HCI Event: LE Meta Event (0x3e) plen 19
    LE Connection Complete (0x01)
    Status: Success (0x00)
    Handle: 64
    Role: Slave (0x01)
    ...
    @ Device Connected: 00:02:72:DC:29:C9 (1) flags 0x0000
    > HCI Event: LE Meta Event (0x3e) plen 13
    LE Long Term Key Request (0x05)
    Handle: 64
    Random number: 875be18439d9aa37
    Encryption diversifier: 0x76ed
    < HCI Command: LE Long Term Key Request Neg Reply (0x08|0x001b) plen 2
    Handle: 64
    > HCI Event: Command Complete (0x0e) plen 6
    LE Long Term Key Request Neg Reply (0x08|0x001b) ncmd 1
    Status: Success (0x00)
    Handle: 64
    > HCI Event: Disconnect Complete (0x05) plen 4
    Status: Success (0x00)
    Handle: 64
    Reason: Authentication Failure (0x05)
    @ Device Disconnected: 00:02:72:DC:29:C9 (1) reason 0

    Signed-off-by: Claudio Takahasi
    Signed-off-by: Johan Hedberg
    Signed-off-by: Greg Kroah-Hartman

    Claudio Takahasi
     
  • commit d23082257d83e4bc89727d5aedee197e907999d2 upstream.

    pidns_get()->get_pid_ns() can hit ns == NULL. This task_struct can't
    go away, but task_active_pid_ns(task) is NULL if release_task(task)
    was already called. Alternatively we could change get_pid_ns(ns) to
    check ns != NULL, but it seems that other callers are fine.
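    The fix keeps get_pid_ns() unchanged and checks in the caller. A hypothetical, simplified sketch:

```c
#include <stddef.h>

/* Hypothetical refcounted namespace. Like the kernel helper it is named
 * after, get_pid_ns() here assumes its argument is non-NULL. */
struct pid_namespace { int count; };

static struct pid_namespace *get_pid_ns(struct pid_namespace *ns)
{
	ns->count++;
	return ns;
}

/* active: what task_active_pid_ns(task) returned; it is NULL once
 * release_task(task) has run, so the caller checks before taking a ref. */
static struct pid_namespace *pidns_get_sketch(struct pid_namespace *active)
{
	if (active)
		get_pid_ns(active);
	return active;
}
```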

    Signed-off-by: Oleg Nesterov
    Cc: Eric W. Biederman <ebiederm@xmission.com>
    Signed-off-by: Linus Torvalds
    Signed-off-by: Greg Kroah-Hartman

    Oleg Nesterov
     
  • commit 7aae51347b21eb738dc1981df1365b57a6c5ee4e upstream.

    Evidently some wacky USB-ATA bridges don't recognize the SYNCHRONIZE
    CACHE command, as shown in this email thread:

    http://marc.info/?t=138978356200002&r=1&w=2

    The fact that we can't tell them to drain their caches shouldn't
    prevent the system from going into suspend. Therefore sd_sync_cache()
    shouldn't return an error if the device replies with an Invalid
    Command ASC.
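    A sketch of the error filtering, assuming "Invalid Command ASC" means additional sense code 0x20 (INVALID COMMAND OPERATION CODE) with sense key ILLEGAL REQUEST (0x05). The struct and function are hypothetical, not the sd driver's code.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SENSE_ILLEGAL_REQUEST	0x05	/* SCSI sense key */
#define ASC_INVALID_OPCODE	0x20	/* INVALID COMMAND OPERATION CODE */

/* Hypothetical decoded sense data. */
struct sense { uint8_t key; uint8_t asc; };

/* Result of a SYNCHRONIZE CACHE attempt: a device that simply does not
 * implement the command is treated as success, so suspend can proceed. */
static int sync_cache_result(bool failed, const struct sense *s)
{
	if (!failed)
		return 0;
	if (s != NULL && s->key == SENSE_ILLEGAL_REQUEST &&
	    s->asc == ASC_INVALID_OPCODE)
		return 0;
	return -EIO;
}
```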

    Signed-off-by: Alan Stern
    Reported-by: Sven Neumann
    Tested-by: Daniel Mack
    Signed-off-by: James Bottomley
    Signed-off-by: Greg Kroah-Hartman

    Alan Stern
     
  • commit a9c3f68f3cd8d55f809fbdb0c138ed061ea1bd25 upstream.

    The user-settable knob, low_latency, has been the source of
    several BUG reports which stem from flush_to_ldisc() running
    in interrupt context. Since 3.12, which added several sleeping
    locks (termios_rwsem and buf->lock) to the input processing path,
    the frequency of these BUG reports has increased.

    Note that changes in 3.12 did not introduce this regression;
    sleeping locks were first added to the input processing path
    with the removal of the BKL from N_TTY in commit
    a88a69c91256418c5907c2f1f8a0ec0a36f9e6cc,
    'n_tty: Fix loss of echoed characters and remove bkl from n_tty'
    and later in commit 38db89799bdf11625a831c5af33938dcb11908b6,
    'tty: throttling race fix'. Since those changes, executing
    flush_to_ldisc() in interrupt context (i.e., with low_latency set) is unsafe.

    However, since most devices do not validate if the low_latency
    setting is appropriate for the context (process or interrupt) in
    which they receive data, some reports are due to misconfiguration.
    Further, serial DMA devices for which DMA fails resort to
    interrupt receiving as a backup without resetting low_latency.

    Historically, low_latency was used to force wake-up the reading
    process rather than wait for the next scheduler tick. The
    effect was to trim multiple milliseconds of latency from
    when the process would receive new data.

    Recent tests [1] have shown that the reading process now receives
    data with only tens of microseconds of latency without low_latency set.

    Remove the low_latency rx steering from tty_flip_buffer_push();
    however, leave the knob as an optional hint to drivers that can
    tune their rx FIFOs and the like. Clean up stale code comments
    regarding low_latency.
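    The behavioral change in tty_flip_buffer_push() can be modeled with two counters. This is a hypothetical sketch; the real functions do far more.

```c
#include <stdbool.h>

/* Hypothetical counters standing in for the two rx paths. */
struct port {
	bool low_latency;
	int direct_flushes;	/* flush_to_ldisc() run in the caller's context */
	int queued_flushes;	/* work item scheduled for process context */
};

/* Before: low_latency ran the flush inline, which is unsafe when the
 * caller is an interrupt handler and the flush may sleep. */
static void flip_buffer_push_old(struct port *p)
{
	if (p->low_latency)
		p->direct_flushes++;
	else
		p->queued_flushes++;
}

/* After: always defer to the workqueue; low_latency remains only a hint
 * that drivers may use for their own FIFO tuning. */
static void flip_buffer_push_new(struct port *p)
{
	p->queued_flushes++;
}
```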

    [1] https://lkml.org/lkml/2014/2/20/434

    "Yay.. thats an annoying historical pain in the butt gone."
    -- Alan Cox

    Reported-by: Beat Bolli
    Reported-by: Pavel Roskin
    Acked-by: David Sterba
    Cc: Grant Edwards
    Cc: Stanislaw Gruszka
    Cc: Hal Murray
    Signed-off-by: Peter Hurley
    Signed-off-by: Alan Cox
    Signed-off-by: Greg Kroah-Hartman

    Peter Hurley
     
  • commit 723abd87f6e536f1353c8f64f621520bc29523a3 upstream.

    The 'active' sysfs attribute should refer to the currently active tty
    devices the console is running on, not the currently active console. The
    console structure doesn't refer to any device in sysfs; only the tty the
    console is running on does. So we need to print out the tty names in
    'active', not the console names.

    There is one special case: tty0. If the console is directed to it, we
    want 'tty0' to show up in the file, so user-space knows that the
    messages get forwarded to the active VT. The ->device() callback would
    resolve tty0, though. Hence, treat it specially and don't call into the
    VT layer to resolve it (plymouth is known to depend on it).

    Cc: Lennart Poettering
    Cc: Kay Sievers
    Cc: Jiri Slaby
    Signed-off-by: Werner Fink
    Signed-off-by: Hannes Reinecke
    Signed-off-by: David Herrmann
    Signed-off-by: Greg Kroah-Hartman

    Hannes Reinecke
     
  • commit 4afddd60a770560d370d6f85c5aef57c16bf7502 upstream.

    kernfs_iattrs is allocated lazily when operations which require it
    take place; unfortunately, the lazy allocation and returning weren't
    properly synchronized and when there are multiple concurrent
    operations, it might end up returning kernfs_iattrs which hasn't
    finished initialization yet or different copies to different callers.

    Fix it by synchronizing with a mutex. This can be smarter with memory
    barriers but let's go there if it actually turns out to be necessary.
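    Lazy allocation serialized by a mutex, sketched with POSIX threads. The struct names and the initial mode value are hypothetical; the point is that the check, the initialization and the publish all happen under one lock.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical lazily allocated attributes. */
struct iattrs { int mode; };
struct node { struct iattrs *iattr; };	/* NULL until first use */

static pthread_mutex_t iattr_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Every caller gets the same, fully initialized object (or NULL if the
 * allocation failed). */
static struct iattrs *node_iattrs(struct node *n)
{
	struct iattrs *ret;

	pthread_mutex_lock(&iattr_mutex);
	if (!n->iattr) {
		struct iattrs *ia = calloc(1, sizeof(*ia));

		if (ia) {
			ia->mode = 0644;	/* finish initializing ... */
			n->iattr = ia;		/* ... then publish */
		}
	}
	ret = n->iattr;
	pthread_mutex_unlock(&iattr_mutex);
	return ret;
}
```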

    Signed-off-by: Tejun Heo
    Link: http://lkml.kernel.org/g/533ABA32.9080602@oracle.com
    Reported-by: Sasha Levin
    Signed-off-by: Greg Kroah-Hartman

    Tejun Heo
     
  • commit 88391d49abb7d8dee91d405f96bd9e003cb6798d upstream.

    The hash values 0 and 1 are reserved for magic directory entries, but
    the code only prevents names hashing to 0. This patch fixes the test
    to also prevent hash value 1.
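    The fix amounts to widening a bounds check. A hypothetical sketch with made-up function names; only the reservation of 0 and 1 comes from the text.

```c
/* Hypothetical helpers that move a raw name hash away from the values
 * reserved for magic directory entries. */
static unsigned int adjust_hash_buggy(unsigned int hash)
{
	if (hash < 1)		/* only 0 is avoided; 1 slips through */
		hash += 2;
	return hash;
}

static unsigned int adjust_hash_fixed(unsigned int hash)
{
	if (hash < 2)		/* both 0 and 1 are moved out of the way */
		hash += 2;
	return hash;
}
```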

    Signed-off-by: Richard Cochran
    Reviewed-by: "Eric W. Biederman"
    Signed-off-by: Greg Kroah-Hartman

    Richard Cochran
     
  • commit b34aa86f12e8848ba453215602c8c50fa63c4cb3 upstream.

    Mmapping a comedi data buffer with lockdep checking enabled produced the
    following kernel debug messages:

    ======================================================
    [ INFO: possible circular locking dependency detected ]
    3.5.0-rc3-ija1+ #9 Tainted: G C
    -------------------------------------------------------
    comedi_test/4160 is trying to acquire lock:
    (&dev->mutex#2){+.+.+.}, at: [] comedi_mmap+0x57/0x1d9 [comedi]

    but task is already holding lock:
    (&mm->mmap_sem){++++++}, at: [] vm_mmap_pgoff+0x41/0x76

    which lock already depends on the new lock.

    the existing dependency chain (in reverse order) is:

    -> #1 (&mm->mmap_sem){++++++}:
    [] lock_acquire+0x97/0x105
    [] might_fault+0x6d/0x90
    [] do_devinfo_ioctl.isra.7+0x11e/0x14c [comedi]
    [] comedi_unlocked_ioctl+0x256/0xe48 [comedi]
    [] vfs_ioctl+0x18/0x34
    [] do_vfs_ioctl+0x382/0x43c
    [] sys_ioctl+0x42/0x65
    [] system_call_fastpath+0x16/0x1b

    -> #0 (&dev->mutex#2){+.+.+.}:
    [] __lock_acquire+0x101d/0x1591
    [] lock_acquire+0x97/0x105
    [] mutex_lock_nested+0x46/0x2a4
    [] comedi_mmap+0x57/0x1d9 [comedi]
    [] mmap_region+0x281/0x492
    [] do_mmap_pgoff+0x26b/0x2a7
    [] vm_mmap_pgoff+0x5d/0x76
    [] sys_mmap_pgoff+0xc7/0x10d
    [] sys_mmap+0x16/0x20
    [] system_call_fastpath+0x16/0x1b

    other info that might help us debug this:

    Possible unsafe locking scenario:

           CPU0                    CPU1
           ----                    ----
      lock(&mm->mmap_sem);
                                   lock(&dev->mutex#2);
                                   lock(&mm->mmap_sem);
      lock(&dev->mutex#2);

    *** DEADLOCK ***

    To avoid the circular dependency, just try to get the lock in
    `comedi_mmap()` instead of blocking. Since the comedi device's main mutex
    is heavily used, do a down-read of its `attach_lock` rwsemaphore
    instead. Trying to down-read `attach_lock` should only fail if
    some task has down-write locked it, and that is only done while the
    comedi device is being attached to or detached from a low-level hardware
    device.

    Unfortunately, acquiring the `attach_lock` doesn't prevent another
    task replacing the comedi data buffer we are trying to mmap. The
    details of the buffer are held in a `struct comedi_buf_map` and pointed
    to by `s->async->buf_map` where `s` is the comedi subdevice whose buffer
    we are trying to map. The `struct comedi_buf_map` is already reference
    counted with a `struct kref`, so we can stop it being freed prematurely.

    Modify `comedi_mmap()` to call a new function
    `comedi_buf_map_from_subdev_get()` to read the subdevice's current
    buffer map pointer and increment its reference instead of accessing
    `async->buf_map` directly. Call `comedi_buf_map_put()` to decrement the
    reference once the buffer map structure has been dealt with. (Note that
    `comedi_buf_map_put()` does nothing if passed a NULL pointer.)

    `comedi_buf_map_from_subdev_get()` checks that the subdevice's buffer map
    pointer has been set and the buffer map has been initialized enough for
    `comedi_mmap()` to deal with it (specifically, check the `n_pages`
    member has been set to a non-zero value). If all is well, the buffer
    map's reference is incremented and a pointer to it is returned. The
    comedi subdevice's spin-lock is used to protect the checks. Also use
    the spin-lock in `__comedi_buf_alloc()` and `__comedi_buf_free()` to
    protect changes to the subdevice's buffer map structure pointer and the
    buffer map structure's `n_pages` member. (This checking of `n_pages` is
    a bit clunky and I [Ian Abbott] plan to deal with it in the future.)
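    A simplified user-space sketch of the get/put pattern described above. The structures are hypothetical reductions of `struct comedi_buf_map` and the subdevice. A C11 atomic counter stands in for `struct kref`, and a pthread mutex stands in for the subdevice spin-lock:

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical, reduced stand-ins for the comedi structures. */
struct buf_map {
	atomic_uint refcount;	/* plays the role of struct kref */
	unsigned int n_pages;	/* non-zero once the map is usable */
};

struct subdev {
	pthread_mutex_t spin_lock;	/* stands in for the subdevice spin-lock */
	struct buf_map *buf_map;
};

/* Return the current buffer map with an extra reference, or NULL if it is
 * missing or not yet initialized (n_pages == 0). */
static struct buf_map *buf_map_from_subdev_get(struct subdev *s)
{
	struct buf_map *bm = NULL;

	pthread_mutex_lock(&s->spin_lock);
	if (s->buf_map && s->buf_map->n_pages) {
		bm = s->buf_map;
		atomic_fetch_add(&bm->refcount, 1);
	}
	pthread_mutex_unlock(&s->spin_lock);
	return bm;
}

/* Drop a reference and free on the last one. NULL is a no-op, as the
 * text says of comedi_buf_map_put(). */
static void buf_map_put(struct buf_map *bm)
{
	if (bm && atomic_fetch_sub(&bm->refcount, 1) == 1)
		free(bm);
}
```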

    Signed-off-by: Ian Abbott
    Signed-off-by: Greg Kroah-Hartman

    Ian Abbott
     
  • commit 268d1e799663b795cba15c64f5d29407786a9dd4 upstream.

    According to National Instruments' PCI-DIO-96/PXI-6508/PCI-6503 User
    Manual, the physical address in PCI BAR1 needs to be OR'ed with 0x80 and
    written to register offset 0xC0 in the "MITE" registers (BAR0). Do so
    during initialization of the National Instruments boards handled by the
    "8255_pci" driver. The boards were previously handled by the
    "ni_pcidio" driver, where the initialization was done by `mite_setup()`
    in the "mite" module. The "mite" module comes with too much extra
    baggage for the "8255_pci" driver to deal with, so use a local, simpler
    initialization function.
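    The arithmetic of that register write can be sketched as follows. The macro names, the function name and the array-based register model are hypothetical. A real driver would map BAR0 and use `writel()`. Only the 0x80 bit and the 0xC0 offset come from the text above, and a 32-bit BAR1 address is assumed:

```c
#include <stdint.h>

#define MITE_BAR1_WINDOW_REG	0xC0	/* hypothetical name for offset 0xC0 */
#define MITE_WINDOW_ENABLE	0x80u	/* hypothetical name for the 0x80 bit */

/* Write (BAR1 physical address | 0x80) to offset 0xC0 of the MITE
 * registers. Offsets are in bytes, the array is indexed in 32-bit words. */
static void mite_init_sketch(volatile uint32_t *mite_regs, uint32_t bar1_phys)
{
	mite_regs[MITE_BAR1_WINDOW_REG / sizeof(uint32_t)] =
		bar1_phys | MITE_WINDOW_ENABLE;
}
```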

    Signed-off-by: Ian Abbott
    Signed-off-by: Greg Kroah-Hartman

    Ian Abbott
     
  • commit 0bf6368ee8f25826d0645c0f7a4f17c8845356a4 upstream.

    Commit 1696d9d (ACPI: Remove the old /proc/acpi/event interface)
    removed the ACPI button event, which was originally sent to userspace
    via /proc/acpi/event. This caused an ACPI shutdown regression on Gentoo
    in VirtualBox. ACPI events are now sent to userspace via netlink, so
    add the ACPI button event back using the netlink routine.

    References: https://bugzilla.kernel.org/show_bug.cgi?id=71721
    Reported-and-tested-by: Richard Musil
    Signed-off-by: Lan Tianyu
    Signed-off-by: Rafael J. Wysocki
    Signed-off-by: Greg Kroah-Hartman

    Lan Tianyu
     
  • commit 017fcdc30cdae18c0946eef1ece1f14b4c7897ba upstream.

    This patch corrects iATU programming for the cfg1, io and mem viewports,
    and enables the ATU only after configuring it.
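    A toy model of the "configure first, enable last" ordering. The register names, the enable bit and the write log are all hypothetical and do not reproduce the real DesignWare iATU layout. The idea is that a viewport should never go live while its base, limit or target is still half-programmed:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical viewport registers (not the real dw iATU layout). */
enum atu_reg { ATU_LOWER_BASE, ATU_UPPER_BASE, ATU_LIMIT,
	       ATU_LOWER_TARGET, ATU_UPPER_TARGET, ATU_CR1, ATU_CR2 };

#define ATU_ENABLE (1u << 31)	/* hypothetical enable bit in CR2 */

/* Records register writes in order so the sequence can be inspected. */
struct write_log { enum atu_reg reg[16]; uint32_t val[16]; size_t n; };

static void atu_write(struct write_log *wl, enum atu_reg reg, uint32_t val)
{
	if (wl->n < 16) {
		wl->reg[wl->n] = reg;
		wl->val[wl->n] = val;
		wl->n++;
	}
}

/* Program one outbound viewport, setting the enable bit only at the end. */
static void program_viewport(struct write_log *wl, uint64_t cpu_addr,
			     uint64_t pci_addr, uint32_t size, uint32_t type)
{
	atu_write(wl, ATU_LOWER_BASE, (uint32_t)cpu_addr);
	atu_write(wl, ATU_UPPER_BASE, (uint32_t)(cpu_addr >> 32));
	atu_write(wl, ATU_LIMIT, (uint32_t)cpu_addr + size - 1);
	atu_write(wl, ATU_LOWER_TARGET, (uint32_t)pci_addr);
	atu_write(wl, ATU_UPPER_TARGET, (uint32_t)(pci_addr >> 32));
	atu_write(wl, ATU_CR1, type);
	atu_write(wl, ATU_CR2, ATU_ENABLE);	/* enable last */
}
```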

    Signed-off-by: Mohit Kumar
    Signed-off-by: Ajay Khandelwal
    Signed-off-by: Bjorn Helgaas
    Acked-by: Jingoo Han
    Signed-off-by: Greg Kroah-Hartman

    Mohit Kumar
     
  • commit dbffdd6862e67d60703f2df66c558bf448f81d6e upstream.

    The Synopsys PCIe core provides one pair of 32-bit BARs (BAR 0 and BAR 1).
    The BARs can be configured as follows:

    - One 64-bit BAR: BARs 0 and 1 are combined to form a single 64-bit BAR
    - Two 32-bit BARs: BARs 0 and 1 are two independent 32-bit BARs

    This patch corrects the 64-bit, non-prefetchable memory BAR configuration
    implemented in the dw driver.
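    For reference, this is how a BAR0/BAR1 pair encodes a 64-bit, non-prefetchable memory BAR under the PCI specification's flag bits. These are the same values as Linux's `PCI_BASE_ADDRESS_MEM_*` constants. The helper is a hypothetical illustration, not the dw driver's code:

```c
#include <stdint.h>

/* Memory BAR low-dword flag bits from the PCI specification. */
#define BAR_MEM_TYPE_64		0x04u	/* bits [2:1] = 10b: 64-bit decoder */
#define BAR_MEM_PREFETCH	0x08u	/* bit 3: prefetchable */

/* BAR0 carries the flags and address bits [31:4], BAR1 the upper 32 bits.
 * Bit 0 stays 0 (memory space) and bit 3 stays 0 (non-prefetchable).
 * addr is assumed to be aligned to the BAR size. */
static void encode_bar64_nonprefetch(uint64_t addr, uint32_t *bar0, uint32_t *bar1)
{
	*bar0 = ((uint32_t)addr & ~0xfu) | BAR_MEM_TYPE_64;
	*bar1 = (uint32_t)(addr >> 32);
}
```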

    Signed-off-by: Mohit Kumar
    Signed-off-by: Bjorn Helgaas
    Cc: Pratyush Anand
    Cc: Jingoo Han
    Cc: Arnd Bergmann
    Signed-off-by: Greg Kroah-Hartman

    Mohit Kumar
     
  • commit 6f8a1b335fde143b7407036e2368d3cd6eb55674 upstream.

    Commit 03bbcb2e7e2 (iommu/vt-d: add quirk for broken interrupt
    remapping on 55XX chipsets) properly disables irq remapping on the
    5500/5520 chipsets that don't correctly perform that feature.

    However, when I wrote it, I followed the errata sheet linked in that
    commit too closely, and explicitly tied the activation of the quirk to
    revision 0x13 of the chip, under the assumption that earlier revisions
    were not in the field. Recently a system was reported to be suffering
    from this remap bug and the quirk hadn't triggered, because the
    revision ID register read at a lower value than 0x13, so the quirk
    test failed improperly. Given this, it seems only prudent to adjust
    this quirk so that any revision less than 0x13 has the quirk asserted.
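    The change in the test amounts to widening an equality check to a range check. Here is a hypothetical helper (not the actual quirk function) that fires for revision 0x13 and every earlier revision:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: previously the quirk fired only for revision 0x13,
 * now it also covers all earlier revisions. */
static bool needs_intr_remap_quirk(uint8_t revision)
{
	return revision <= 0x13;	/* was effectively: revision == 0x13 */
}
```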

    [ tglx: Removed the 0x12 comparison of pci id 3405 as this is covered
    by the
    Cc: Ingo Molnar
    Cc: "H. Peter Anvin"
    Cc: x86@kernel.org
    Link: http://lkml.kernel.org/r/1394649873-14913-1-git-send-email-nhorman@tuxdriver.com
    Signed-off-by: Thomas Gleixner
    Signed-off-by: Greg Kroah-Hartman

    Neil Horman
     
  • commit ca3ba2a2f4a49a308e7d78c784d51b2332064f15 upstream.

    This patch bypasses the timer_irq_works() check for Hyper-V guests since:

    - It is guaranteed to work.
    - timer_irq_works() may sometimes fail because the lpj calibration can be
    inaccurate in a Hyper-V guest or on a buggy host.

    In the future, we should get the TSC frequency from the hypervisor and use
    a preset lpj instead.

    [ hpa: I would prefer to not defer things to "the future" in the future... ]

    Cc: K. Y. Srinivasan
    Cc: Haiyang Zhang
    Acked-by: K. Y. Srinivasan
    Signed-off-by: Jason Wang
    Link: http://lkml.kernel.org/r/1393558229-14755-1-git-send-email-jasowang@redhat.com
    Signed-off-by: H. Peter Anvin
    Signed-off-by: Greg Kroah-Hartman

    Jason Wang
     
  • commit a94cdd1f4d30f12904ab528152731fb13a812a16 upstream.

    In read_all_bytes, we do

    unsigned char i;
    ...
    bt->read_data[0] = BMC2HOST;
    bt->read_count = bt->read_data[0];
    ...
    for (i = 1; i <= bt->read_count; i++)
        bt->read_data[i] = BMC2HOST;

    If bt->read_data[0] == bt->read_count == 255, we loop infinitely in the
    'for' loop. Make 'i' an 'int' instead of 'char' to get rid of the
    overflow and finish the loop after 255 iterations every time.
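    A small stand-alone sketch of the wraparound, with a counter standing in for the `BMC2HOST` reads. With an `unsigned char` index the condition `i <= 255` can never become false, because `i` wraps from 255 back to 0. With an `int` index the loop stops after 255 iterations:

```c
/* Count the iterations of the fixed loop for a given length byte. */
static int bytes_read(unsigned char read_count)
{
	int i;		/* int, not unsigned char: can reach 256 and exit */
	int n = 0;

	for (i = 1; i <= read_count; i++)
		n++;	/* stands in for bt->read_data[i] = BMC2HOST */
	return n;
}

/* The increment that an unsigned char index performs: 255 wraps to 0. */
static unsigned char next_index(unsigned char i)
{
	return (unsigned char)(i + 1);
}
```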

    Signed-off-by: Jiri Slaby
    Reported-and-debugged-by: Rui Hui Dian
    Cc: Tomas Cech
    Cc: Corey Minyard
    Cc:
    Signed-off-by: Corey Minyard
    Signed-off-by: Linus Torvalds
    Signed-off-by: Greg Kroah-Hartman

    Jiri Slaby
     
  • commit e79323bd87808fdfbc68ce6c5371bd224d9672ee upstream.

    smp_read_barrier_depends() can be used if there is a data dependency between
    the reads, i.e. if the read operation after the barrier uses an address
    that was obtained from the read operation before the barrier.

    In this file, there is only a control dependency, no data dependency, so the
    use of smp_read_barrier_depends() is incorrect. The code could fail in the
    following way:
    * the CPU predicts that idx < extents is true and starts executing the
      body of the for loop
    * the CPU fetches map->extent[0].first and map->extent[0].count
    * the CPU fetches map->nr_extents
    * the CPU verifies that idx < extents is true, so it commits the
      instructions in the body of the for loop

    The problem is that in this scenario, the CPU read map->extent[0].first
    and map->nr_extents in the wrong order. We need a full read memory barrier
    to prevent it.
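    A user-space analogue in C11 atomics, not the kernel code. It assumes a hypothetical single-writer `struct extent_map`. The writer publishes `nr_extents` with release ordering. The reader loads it and then issues an acquire fence, the C11 counterpart of a read barrier, before touching `extent[]`. A control dependency alone would not order those loads:

```c
#include <stdatomic.h>

#define MAX_EXTENTS 5	/* hypothetical capacity */

struct extent { unsigned int first, count; };

struct extent_map {
	atomic_uint nr_extents;		/* published last by the writer */
	struct extent extent[MAX_EXTENTS];
};

/* Single writer: fill the new extent, then publish the count (release). */
static void map_add(struct extent_map *m, unsigned int first, unsigned int count)
{
	unsigned int n = atomic_load_explicit(&m->nr_extents, memory_order_relaxed);

	if (n >= MAX_EXTENTS)
		return;
	m->extent[n].first = first;
	m->extent[n].count = count;
	atomic_store_explicit(&m->nr_extents, n + 1, memory_order_release);
}

/* Reader: the fence keeps the extent[] loads after the nr_extents load. */
static int map_lookup(struct extent_map *m, unsigned int id)
{
	unsigned int idx, extents;

	extents = atomic_load_explicit(&m->nr_extents, memory_order_relaxed);
	atomic_thread_fence(memory_order_acquire);
	for (idx = 0; idx < extents; idx++) {
		/* unsigned subtraction: ids below first wrap to large values */
		if (id - m->extent[idx].first < m->extent[idx].count)
			return (int)idx;
	}
	return -1;
}
```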

    Signed-off-by: Mikulas Patocka
    Signed-off-by: Linus Torvalds
    Signed-off-by: Greg Kroah-Hartman

    Mikulas Patocka
     

14 Apr, 2014

6 commits

  • Greg Kroah-Hartman
     
  • commit 8ceee72808d1ae3fb191284afc2257a2be964725 upstream.

    The GHASH setkey() function uses SSE registers but fails to call
    kernel_fpu_begin()/kernel_fpu_end(). Instead of adding these calls, and
    then having to deal with the restriction that they cannot be called from
    interrupt context, move the setkey() implementation to the C domain.

    Note that setkey() does not use any particular SSE features and is not
    expected to become a performance bottleneck.
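    To show why no SSE state is needed, here is a sketch of the kind of key preprocessing involved. It multiplies the hash key by x in GF(2^128) in the bit-reflected representation used with PCLMULQDQ: a one-bit shift across two 64-bit halves plus a conditional XOR with the reduction constant. The struct and the split of the key into `hi`/`lo` halves are assumptions, and the exact byte order the assembly expects is not reproduced:

```c
#include <stdint.h>

/* Hypothetical key layout: two 64-bit halves. */
struct ghash_key { uint64_t a, b; };

/* Multiply H = (hi, lo) by x: shift left by one across both halves, then,
 * if a bit was shifted out of the top, fold it back in with 0xc2 << 56.
 * Plain integer arithmetic only, no SSE registers. */
static struct ghash_key ghash_key_times_x(uint64_t hi, uint64_t lo)
{
	struct ghash_key k;

	k.a = (lo << 1) | (hi >> 63);
	k.b = (hi << 1) | (lo >> 63);
	if (hi >> 63)
		k.b ^= (uint64_t)0xc2 << 56;
	return k;
}
```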

    Signed-off-by: Ard Biesheuvel
    Acked-by: H. Peter Anvin
    Fixes: 0e1227d356e9b (crypto: ghash - Add PCLMULQDQ accelerated implementation)
    Signed-off-by: Herbert Xu
    Signed-off-by: Greg Kroah-Hartman

    Ard Biesheuvel
     
  • commit e571c58f313d35c56e0018470e3375ddd1fd320e upstream.

    Skip the futex_atomic_cmpxchg_inatomic() test in futex_init(). It causes a
    fatal exception on 68030 (and presumably 68020 also).

    Signed-off-by: Finn Thain
    Acked-by: Geert Uytterhoeven
    Link: http://lkml.kernel.org/r/alpine.LNX.2.00.1403061006440.5525@nippy.intranet
    Signed-off-by: Thomas Gleixner
    Signed-off-by: Greg Kroah-Hartman

    Finn Thain
     
  • commit 03b8c7b623c80af264c4c8d6111e5c6289933666 upstream.

    If an architecture has futex_atomic_cmpxchg_inatomic() implemented and there
    is no runtime check necessary, allow skipping the test within futex_init().

    This allows getting rid of some code that would always give the same result,
    and also allows the compiler to optimize a couple of if statements away.
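    A hypothetical sketch of the compile-time switch. The macro `ARCH_FUTEX_CMPXCHG_ALWAYS_WORKS`, the probe and the operation function are invented names, not the kernel's. When the architecture declares that no runtime check is needed, the flag becomes a constant, the probe disappears, and `if (!futex_cmpxchg_enabled)` folds away at compile time:

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical config switch set by an architecture whose
 * futex_atomic_cmpxchg_inatomic() needs no runtime check. */
#define ARCH_FUTEX_CMPXCHG_ALWAYS_WORKS 1

#if ARCH_FUTEX_CMPXCHG_ALWAYS_WORKS
/* Compile-time constant: the check in futex_pi_op() is optimized away. */
#define futex_cmpxchg_enabled true
static void futex_init_sketch(void) { }	/* no probe needed */
#else
static bool futex_cmpxchg_enabled;

/* Hypothetical stand-in for the runtime probe that futex_init() performs. */
static bool probe_futex_cmpxchg(void) { return true; }

static void futex_init_sketch(void)
{
	futex_cmpxchg_enabled = probe_futex_cmpxchg();
}
#endif

static int futex_pi_op(void)
{
	if (!futex_cmpxchg_enabled)
		return -ENOSYS;
	return 0;	/* ... the real operation would go here ... */
}
```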

    Signed-off-by: Heiko Carstens
    Cc: Finn Thain
    Cc: Geert Uytterhoeven
    Link: http://lkml.kernel.org/r/20140302120947.GA3641@osiris
    Signed-off-by: Thomas Gleixner
    Signed-off-by: Greg Kroah-Hartman

    Heiko Carstens
     
  • commit 61fb4bfc010b0d2940f7fd87acbce6a0f03217cb upstream.

    Despite the switch to the right UART driver (previous patch), the serial
    console still doesn't work due to the missing CONFIG_SERIAL_OF_PLATFORM.

    Also fix the default cmdline in DT so it no longer refers to the
    out-of-tree ARC framebuffer driver for console.

    Signed-off-by: Vineet Gupta
    Cc: Francois Bedard
    Signed-off-by: Greg Kroah-Hartman

    Vineet Gupta
     
  • commit 6eda477b3c54b8236868c8784e5e042ff14244f0 upstream.

    The Synopsys APB DW UART has a couple of special features that are not
    in the System C model. In 3.8, the 8250_dw driver didn't really use these
    features, but from 3.9 onwards, the 8250_dw driver has become incompatible
    with our model.

    Signed-off-by: Mischa Jonker
    Signed-off-by: Vineet Gupta
    Cc: Francois Bedard
    Signed-off-by: Greg Kroah-Hartman

    Mischa Jonker