23 Oct, 2015

40 commits

  • commit 03a2d2a3eafe4015412cf4e9675ca0e2d9204074 upstream.

    Commit description is copied from the original post of this bug:

    http://comments.gmane.org/gmane.linux.kernel.mm/135349

    Kernels after v3.9 use kmalloc_size(INDEX_NODE + 1) to get the next
    larger cache size than the one that index INDEX_NODE maps to. In kernels
    3.9 and earlier we used malloc_sizes[INDEX_L3 + 1].cs_size.

    However, sometimes we can't get the right output we expected via
    kmalloc_size(INDEX_NODE + 1), causing a BUG().

    The mapping table in the latest kernel is like:
    index = {0,  1,   2,   3,   4,   5,   6,   n}
    size  = {0,  96,  192, 8,   16,  32,  64,  2^n}
    The mapping table before 3.10 is like this:
    index = {0,  1,   2,   3,   4,   5,   6,   n}
    size  = {32, 64,  96,  128, 192, 256, 512, 2^(n+3)}

    The problem on my mips64 machine is as follows:

    (1) When configured with DEBUG_SLAB && DEBUG_PAGEALLOC &&
    DEBUG_LOCK_ALLOC && DEBUG_SPINLOCK, sizeof(struct kmem_cache_node) is
    150, so the macro INDEX_NODE evaluates to 2:
    #define INDEX_NODE kmalloc_index(sizeof(struct kmem_cache_node))

    (2) Then the result of kmalloc_size(INDEX_NODE + 1) is 8.

    (3) Then the "if (size >= kmalloc_size(INDEX_NODE + 1)" test leads to
    "size = PAGE_SIZE".

    (4) Then the "if ((size >= (PAGE_SIZE >> 3))" test is satisfied and
    "flags |= CFLGS_OFF_SLAB" is executed.

    (5) The "if (flags & CFLGS_OFF_SLAB)" test is satisfied and execution
    reaches "cachep->slabp_cache = kmalloc_slab(slab_size, 0u)", whose
    result may be NULL during kernel boot.

    (6) Finally, "BUG_ON(ZERO_OR_NULL_PTR(cachep->slabp_cache));" triggers
    the BUG (possibly only mips64 has this problem).

    This patch fixes the problem with kmalloc_size(INDEX_NODE + 1) and removes
    the BUG by adding a 'size >= 256' check to guarantee that all necessary
    small-sized slabs are initialized regardless of the sequence of slab sizes
    in the mapping table.
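
    As an illustration only, steps (1) and (2) can be reproduced with a
    small user-space model of the post-3.9 index and size tables quoted in
    this message. The model_* helpers are hypothetical names for this
    sketch, not the kernel's kmalloc_index()/kmalloc_size(), and they
    ignore configuration-dependent details.

```c
#include <stddef.h>

/*
 * Simplified model of the post-3.9 mapping quoted above:
 * index 1 -> 96, index 2 -> 192, index n >= 3 -> 2^n.
 * Hypothetical helper, not the kernel implementation.
 */
static int model_kmalloc_size(int index)
{
	if (index > 2)
		return 1 << index;
	if (index == 1)
		return 96;
	if (index == 2)
		return 192;
	return 0;
}

/* Index of the smallest modelled cache that can hold `size` bytes. */
static int model_kmalloc_index(size_t size)
{
	int i;

	if (size == 0)
		return 0;
	if (size > 64 && size <= 96)
		return 1;
	if (size > 128 && size <= 192)
		return 2;
	for (i = 3; i < 31; i++)
		if (size <= ((size_t)1 << i))
			return i;
	return -1;	/* larger than this model handles */
}
```

    With a 150-byte structure the model returns index 2, and the "next"
    index 3 maps to 8 bytes rather than to something larger than 192,
    which is the mismatch the steps above describe.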

    Fixes: e33660165c90 ("slab: Use common kmalloc_index/kmalloc_size...")
    Signed-off-by: Joonsoo Kim
    Reported-by: Liuhailong
    Acked-by: Christoph Lameter
    Cc: Pekka Enberg
    Cc: David Rientjes
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds
    Signed-off-by: Greg Kroah-Hartman

    Joonsoo Kim
     
  • commit 7180dddf7c32c49975c7e7babf2b60ed450cb760 upstream.

    The kernel may delay interrupts for a long time which can result in timers
    being delayed. If this occurs the intel_pstate driver will crash with a
    divide by zero error:

    divide error: 0000 [#1] SMP
    Modules linked in: btrfs zlib_deflate raid6_pq xor msdos ext4 mbcache jbd2 binfmt_misc arc4 md4 nls_utf8 cifs dns_resolver tcp_lp bnep bluetooth rfkill fuse dm_service_time iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi nf_conntrack_netbios_ns nf_conntrack_broadcast nf_conntrack_ftp ip6t_rpfilter ip6t_REJECT ipt_REJECT xt_conntrack ebtable_nat ebtable_broute bridge stp llc ebtable_filter ebtables ip6table_nat nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_ipv6 ip6table_mangle ip6table_security ip6table_raw ip6table_filter ip6_tables iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_security iptable_raw iptable_filter ip_tables intel_powerclamp coretemp vfat fat kvm_intel iTCO_wdt iTCO_vendor_support ipmi_devintf sr_mod kvm crct10dif_pclmul
    crc32_pclmul crc32c_intel ghash_clmulni_intel aesni_intel cdc_ether lrw usbnet cdrom mii gf128mul glue_helper ablk_helper cryptd lpc_ich mfd_core pcspkr sb_edac edac_core ipmi_si ipmi_msghandler ioatdma wmi shpchp acpi_pad nfsd auth_rpcgss nfs_acl lockd uinput dm_multipath sunrpc xfs libcrc32c usb_storage sd_mod crc_t10dif crct10dif_common ixgbe mgag200 syscopyarea sysfillrect sysimgblt mdio drm_kms_helper ttm igb drm ptp pps_core dca i2c_algo_bit megaraid_sas i2c_core dm_mirror dm_region_hash dm_log dm_mod
    CPU: 113 PID: 0 Comm: swapper/113 Tainted: G W -------------- 3.10.0-229.1.2.el7.x86_64 #1
    Hardware name: IBM x3950 X6 -[3837AC2]-/00FN827, BIOS -[A8E112BUS-1.00]- 08/27/2014
    task: ffff880fe8abe660 ti: ffff880fe8ae4000 task.ti: ffff880fe8ae4000
    RIP: 0010:[] [] intel_pstate_timer_func+0x179/0x3d0
    RSP: 0018:ffff883fff4e3db8 EFLAGS: 00010206
    RAX: 0000000027100000 RBX: ffff883fe6965100 RCX: 0000000000000000
    RDX: 0000000000000000 RSI: 0000000000000010 RDI: 000000002e53632d
    RBP: ffff883fff4e3e20 R08: 000e6f69a5a125c0 R09: ffff883fe84ec001
    R10: 0000000000000002 R11: 0000000000000005 R12: 00000000000049f5
    R13: 0000000000271000 R14: 00000000000049f5 R15: 0000000000000246
    FS: 0000000000000000(0000) GS:ffff883fff4e0000(0000) knlGS:0000000000000000
    CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 00007f7668601000 CR3: 000000000190a000 CR4: 00000000001407e0
    DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
    Stack:
    ffff883fff4e3e58 ffffffff81099dc1 0000000000000086 0000000000000071
    ffff883fff4f3680 0000000000000071 fbdc8a965e33afee ffffffff810b69dd
    ffff883fe84ec000 ffff883fe6965108 0000000000000100 ffffffff814a9100
    Call Trace:

    [] ? run_posix_cpu_timers+0x51/0x840
    [] ? trigger_load_balance+0x5d/0x200
    [] ? pid_param_set+0x130/0x130
    [] call_timer_fn+0x36/0x110
    [] ? pid_param_set+0x130/0x130
    [] run_timer_softirq+0x21f/0x320
    [] __do_softirq+0xef/0x280
    [] call_softirq+0x1c/0x30
    [] do_softirq+0x65/0xa0
    [] irq_exit+0x115/0x120
    [] smp_apic_timer_interrupt+0x45/0x60
    [] apic_timer_interrupt+0x6d/0x80

    [] ? cpuidle_enter_state+0x52/0xc0
    [] ? cpuidle_enter_state+0x48/0xc0
    [] cpuidle_idle_call+0xc5/0x200
    [] arch_cpu_idle+0xe/0x30
    [] cpu_startup_entry+0xf1/0x290
    [] start_secondary+0x1ba/0x230
    Code: 42 0f 00 45 89 e6 48 01 c2 43 8d 44 6d 00 39 d0 73 26 49 c1 e5 08 89 d2 4d 63 f4 49 63 c5 48 c1 e2 08 48 c1 e0 08 48 63 ca 48 99 f7 f9 48 98 4c 0f af f0 49 c1 ee 08 8b 43 78 c1 e0 08 44 29
    RIP [] intel_pstate_timer_func+0x179/0x3d0
    RSP

    The kernel values for cpudata for CPU 113 were:

    struct cpudata {
    cpu = 113,
    timer = {
    entry = {
    next = 0x0,
    prev = 0xdead000000200200
    },
    expires = 8357799745,
    base = 0xffff883fe84ec001,
    function = 0xffffffff814a9100 ,
    data = 18446612406765768960,

    i_gain = 0,
    d_gain = 0,
    deadband = 0,
    last_err = 22489
    },
    last_sample_time = {
    tv64 = 4063132438017305
    },
    prev_aperf = 287326796397463,
    prev_mperf = 251427432090198,
    sample = {
    core_pct_busy = 23081,
    aperf = 2937407,
    mperf = 3257884,
    freq = 2524484,
    time = {
    tv64 = 4063149215234118
    }
    }
    }

    which results in the time between samples = sample.time - last_sample_time
    = 4063149215234118 - 4063132438017305 = 16777216813 ns, which is 16.777 seconds.

    The duration between reads of the APERF and MPERF registers overflowed an
    s32-sized integer in intel_pstate_get_scaled_busy()'s call to div_fp(). The
    result is that int_tofp(duration_us) == 0, and the kernel attempts to divide
    by 0.

    While the kernel shouldn't be delaying for a long time, it can and does
    happen and the intel_pstate driver should not panic in this situation. This
    patch changes the div_fp() function to use div64_s64() to allow for "long"
    division. This will avoid the overflow condition on long delays.
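
    The arithmetic can be checked with a user-space sketch. It assumes an
    8-bit fixed-point fraction (FRAC_BITS is an assumption of this sketch)
    and uses hypothetical helper names; it is not the driver code. The
    sampled gap of 16777216813 ns is 16777216 us, which is exactly 2^24, so
    shifting it left by 8 bits gives 2^32 and the low 32 bits are all zero.

```c
#include <stdint.h>

#define FRAC_BITS 8	/* assumed fixed-point fraction width */

/* Convert an integer to fixed point using 64-bit arithmetic. */
static int64_t to_fp64(int64_t x)
{
	return x * ((int64_t)1 << FRAC_BITS);
}

/* Models passing the value through a 32-bit parameter: keeps the low 32 bits. */
static int32_t narrow_s32(int64_t v)
{
	return (int32_t)(uint32_t)v;
}

/* Fixed-point division with 64-bit operands, so a long duration keeps its value. */
static int64_t div_fp64(int64_t x, int64_t y)
{
	return to_fp64(x) / y;
}
```

    Keeping both operands 64 bits wide, as the switch to div64_s64() does,
    means the divisor stays non-zero for delays of this length.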

    [v2]: use div64_s64() in div_fp()

    Signed-off-by: Prarit Bhargava
    Signed-off-by: Rafael J. Wysocki
    Cc: Thomas Renninger
    Signed-off-by: Greg Kroah-Hartman

    Prarit Bhargava
     
  • commit 8f1bd8f2ad2358d6a88c115481ff3e69817d1bde upstream.

    If atmel_init_gpios fails the port has already been marked as busy (in
    line 2629), so this must be undone in the error path.

    This bug was introduced because I created the patch that finally
    became 722ccf416ac2 ("serial: atmel: fix error handling when
    mctrl_gpio_init fails") on top of 3.19 which didn't have commit
    6fbb9bdf0f3f ("tty/serial: at91: fix error handling in
    atmel_serial_probe()") yet.

    Signed-off-by: Uwe Kleine-König
    Fixes: 722ccf416ac2 ("serial: atmel: fix error handling when mctrl_gpio_init fails")
    Acked-by: Nicolas Ferre
    Signed-off-by: Greg Kroah-Hartman

    Uwe Kleine-König
     
  • commit 3c5a0357fdb3a9116a48dbdb0abb91fd23fbff80 upstream.

    This adds an entry to the uart_config table for PORT_RT2880
    enabling rx/tx FIFOs. The UART is actually a Palmchip BK-3103
    which is found in several devices from Alchemy/RMI, Ralink, and
    Sigma Designs.

    Signed-off-by: Mans Rullgard
    Signed-off-by: Greg Kroah-Hartman

    Mans Rullgard
     
  • commit 0c55627167870255158db1cde0d28366f91c8872 upstream.

    This is mostly a hardening fix, given that write-only access to other
    users' ttys is usually only given through setgid tty executables.

    Signed-off-by: Jann Horn
    Signed-off-by: Greg Kroah-Hartman

    Jann Horn
     
  • commit e81107d4c6bd098878af9796b24edc8d4a9524fd upstream.

    My colleague ran into a program stall on an x86_64 server, where
    n_tty_read() was waiting for data even though there was data in the
    buffer in the pty. The kernel stack for the stuck process looks like this:
    #0 [ffff88303d107b58] __schedule at ffffffff815c4b20
    #1 [ffff88303d107bd0] schedule at ffffffff815c513e
    #2 [ffff88303d107bf0] schedule_timeout at ffffffff815c7818
    #3 [ffff88303d107ca0] wait_woken at ffffffff81096bd2
    #4 [ffff88303d107ce0] n_tty_read at ffffffff8136fa23
    #5 [ffff88303d107dd0] tty_read at ffffffff81368013
    #6 [ffff88303d107e20] __vfs_read at ffffffff811a3704
    #7 [ffff88303d107ec0] vfs_read at ffffffff811a3a57
    #8 [ffff88303d107f00] sys_read at ffffffff811a4306
    #9 [ffff88303d107f50] entry_SYSCALL_64_fastpath at ffffffff815c86d7

    There seem to be two problems causing this issue.

    First, in drivers/tty/n_tty.c, __receive_buf() stores the data and
    updates ldata->commit_head using smp_store_release() and then checks
    the wait queue using waitqueue_active(). However, since there is no
    memory barrier, __receive_buf() could return without calling
    wake_up_interruptible_poll(), and at the same time, n_tty_read() could
    start to wait in wait_woken() as in the following chart.

    __receive_buf()                         n_tty_read()
    ------------------------------------------------------------------------
    if (waitqueue_active(&tty->read_wait))
    /* Memory operations issued after the
       RELEASE may be completed before the
       RELEASE operation has completed */
                                            add_wait_queue(&tty->read_wait, &wait);
                                            ...
                                            if (!input_available_p(tty, 0)) {
    smp_store_release(&ldata->commit_head,
                      ldata->read_head);
                                            ...
                                            timeout = wait_woken(&wait,
                                              TASK_INTERRUPTIBLE, timeout);
    ------------------------------------------------------------------------

    The second problem is that n_tty_read() also lacks a memory barrier
    call, which could likewise cause __receive_buf() to return without
    calling wake_up_interruptible_poll(), and n_tty_read() to wait in
    wait_woken(), as in the chart below.

    __receive_buf()                         n_tty_read()
    ------------------------------------------------------------------------
                                            spin_lock_irqsave(&q->lock, flags);
                                            /* from add_wait_queue() */
                                            ...
                                            if (!input_available_p(tty, 0)) {
                                            /* Memory operations issued after the
                                               RELEASE may be completed before the
                                               RELEASE operation has completed */
    smp_store_release(&ldata->commit_head,
                      ldata->read_head);
    if (waitqueue_active(&tty->read_wait))
                                            __add_wait_queue(q, wait);
                                            spin_unlock_irqrestore(&q->lock,flags);
                                            /* from add_wait_queue() */
                                            ...
                                            timeout = wait_woken(&wait,
                                              TASK_INTERRUPTIBLE, timeout);
    ------------------------------------------------------------------------

    There are also other places in drivers/tty/n_tty.c which have similar
    calls to waitqueue_active(), so instead of adding many memory barrier
    calls, this patch simply removes the call to waitqueue_active(),
    leaving just wake_up*() behind.

    This fixes both problems because, even though the memory access before
    or after the spinlocks in both wake_up*() and add_wait_queue() can
    sneak into the critical section, it cannot go past it and the critical
    section assures that they will be serialized (please see "INTER-CPU
    ACQUIRING BARRIER EFFECTS" in Documentation/memory-barriers.txt for a
    better explanation). Moreover, the resulting code is much simpler.

    Latency measurement using a ping-pong test over a pty doesn't show any
    visible performance drop.

    Signed-off-by: Kosuke Tatsukawa
    Signed-off-by: Greg Kroah-Hartman

    Kosuke Tatsukawa
     
  • commit b1d562acc78f0af46de0dfe447410bc40bdb7ece upstream.

    Here is a patch to make speakup-r work again.

    It broke in 3.6 due to commit 4369c64c79a22b98d3b7eff9d089196cd878a10a
    ("Input: Send events one packet at a time").

    The problem was that the fakekey.c routine to fake a down arrow no
    longer functioned properly, and adding an input_sync() call fixed it.

    Fixes: 4369c64c79a22b98d3b7eff9d089196cd878a10a
    Acked-by: Samuel Thibault
    Signed-off-by: John Covici
    Signed-off-by: Greg Kroah-Hartman

    covici@ccs.covici.com
     
  • commit 2bffa1503c5c06192eb1459180fac4416575a966 upstream.

    The cleaner policy doesn't make use of the per cache block hint space in
    the metadata (unlike the other policies). When switching from the
    cleaner policy to mq or smq a NULL pointer crash (in dm_tm_new_block)
    was observed. The crash was caused by bugs in dm-cache-metadata.c
    when trying to skip creation of the hint btree.

    The minimal fix is to change the hint size for the cleaner policy to
    4 bytes (the only hint size supported).

    Signed-off-by: Joe Thornber
    Signed-off-by: Mike Snitzer
    Signed-off-by: Greg Kroah-Hartman

    Joe Thornber
     
  • commit 2a708cff93f1845b9239bc7d6310aef54e716c6a upstream.

    __dm_destroy() takes io_barrier SRCU lock (dm_get_live_table) and
    suspend_lock in reverse order. Doing so can cause AB-BA deadlock:

    __dm_destroy                        dm_swap_table
    ---------------------------------------------------
                                        mutex_lock(suspend_lock)
    dm_get_live_table()
      srcu_read_lock(io_barrier)
                                        dm_sync_table()
                                          synchronize_srcu(io_barrier)
                                          .. waiting for dm_put_live_table()
    mutex_lock(suspend_lock)
      .. waiting for suspend_lock

    Fix this by taking the locks in proper order.

    Signed-off-by: Jun'ichi Nomura
    Fixes: ab7c7bb6f4ab ("dm: hold suspend_lock while suspending device during device deletion")
    Acked-by: Mikulas Patocka
    Signed-off-by: Mike Snitzer
    Signed-off-by: Greg Kroah-Hartman

    Junichi Nomura
     
  • commit daf3761c9fcde0f4ca64321cbed6c1c86d304193 upstream.

    Leandro Awa writes:
    "After switching to version 4.1.6, our parallelized and distributed
    workflows now fail consistently with errors of the form:

    T34: ./regex.c:39:22: error: config.h: No such file or directory

    From our 'git bisect' testing, the following commit appears to be the
    possible cause of the behavior we've been seeing: commit 766c4cbfacd8"

    Al Viro says:
    "What happens is that 766c4cbfacd8 got the things subtly wrong.

    We used to treat d_is_negative() after lookup_fast() as "fail with
    ENOENT". That was wrong - checking ->d_flags outside of ->d_seq
    protection is unreliable and failing with hard error on what should've
    fallen back to non-RCU pathname resolution is a bug.

    Unfortunately, we'd pulled the test too far up and ran afoul of
    another kind of staleness. The dentry might have been absolutely
    stable from the RCU point of view (and we might be on UP, etc), but
    stale from the remote fs point of view. If ->d_revalidate() returns
    "it's actually stale", dentry gets thrown away and the original code
    wouldn't even have looked at its ->d_flags.

    What we need is to check ->d_flags where 766c4cbfacd8 does (prior to
    ->d_seq validation) but only use the result in cases where we do not
    discard this dentry outright"

    Reported-by: Leandro Awa
    Link: https://bugzilla.kernel.org/show_bug.cgi?id=104911
    Fixes: 766c4cbfacd8 ("namei: d_is_negative() should be checked...")
    Tested-by: Leandro Awa
    Signed-off-by: Trond Myklebust
    Acked-by: Al Viro
    Signed-off-by: Linus Torvalds
    Signed-off-by: Greg Kroah-Hartman

    Trond Myklebust
     
  • commit 19e79687de22f23bcfb5e79cce3daba20af228d1 upstream.

    On the OMAP AM3517 platform the uart4_ick gets registered
    twice, causing any power management to /dev/ttyO3 to fail
    when trying to wake the device up.

    This solves the following oops:

    [] Unhandled fault: external abort on non-linefetch (0x1028) at 0xfa09e008
    [] PC is at serial_omap_pm+0x48/0x15c
    [] LR is at _raw_spin_unlock_irqrestore+0x30/0x5c

    Fixes: aafd900cab87 ("CLK: TI: add omap3 clock init file")
    Cc: mturquette@baylibre.com
    Cc: sboyd@codeaurora.org
    Cc: linux-clk@vger.kernel.org
    Cc: linux-omap@vger.kernel.org
    Cc: linux-kernel@lists.codethink.co.uk
    Signed-off-by: Ben Dooks
    Signed-off-by: Tero Kristo
    Signed-off-by: Greg Kroah-Hartman

    Ben Dooks
     
  • commit 3ec0c97959abff33a42db9081c22132bcff5b4f2 upstream.

    If filelayout_decode_layout() fails, _filelayout_free_lseg() will cause
    a double free of fh_array.

    [ 1179.279800] BUG: unable to handle kernel NULL pointer dereference at (null)
    [ 1179.280198] IP: [] filelayout_free_fh_array.isra.11+0x1d/0x70 [nfs_layout_nfsv41_files]
    [ 1179.281010] PGD 0
    [ 1179.281443] Oops: 0000 [#1]
    [ 1179.281831] Modules linked in: nfs_layout_nfsv41_files(OE) nfsv4(OE) nfs(OE) fscache(E) xfs libcrc32c coretemp nfsd crct10dif_pclmul ppdev crc32_pclmul crc32c_intel auth_rpcgss ghash_clmulni_intel nfs_acl lockd vmw_balloon grace sunrpc parport_pc vmw_vmci parport shpchp i2c_piix4 vmwgfx drm_kms_helper ttm drm serio_raw mptspi scsi_transport_spi mptscsih e1000 mptbase ata_generic pata_acpi [last unloaded: fscache]
    [ 1179.283891] CPU: 0 PID: 13336 Comm: cat Tainted: G OE 4.3.0-rc1-pnfs+ #244
    [ 1179.284323] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 05/20/2014
    [ 1179.285206] task: ffff8800501d48c0 ti: ffff88003e3c4000 task.ti: ffff88003e3c4000
    [ 1179.285668] RIP: 0010:[] [] filelayout_free_fh_array.isra.11+0x1d/0x70 [nfs_layout_nfsv41_files]
    [ 1179.286612] RSP: 0018:ffff88003e3c77f8 EFLAGS: 00010202
    [ 1179.287092] RAX: 0000000000000000 RBX: ffff88001fe78900 RCX: 0000000000000000
    [ 1179.287731] RDX: ffffea0000f40760 RSI: ffff88001fe789c8 RDI: ffff88001fe789c0
    [ 1179.288383] RBP: ffff88003e3c7810 R08: ffffea0000f40760 R09: 0000000000000000
    [ 1179.289170] R10: 0000000000000000 R11: 0000000000000001 R12: ffff88001fe789c8
    [ 1179.289959] R13: ffff88001fe789c0 R14: ffff88004ec05a80 R15: ffff88004f935b88
    [ 1179.290791] FS: 00007f4e66bb5700(0000) GS:ffffffff81c29000(0000) knlGS:0000000000000000
    [ 1179.291580] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    [ 1179.292209] CR2: 0000000000000000 CR3: 00000000203f8000 CR4: 00000000001406f0
    [ 1179.292731] Stack:
    [ 1179.293195] ffff88001fe78900 00000000000000d0 ffff88001fe78178 ffff88003e3c7868
    [ 1179.293676] ffffffffa0272737 0000000000000001 0000000000000001 ffff88001fe78800
    [ 1179.294151] 00000000614fffce ffffffff81727671 ffff88001fe78100 ffff88001fe78100
    [ 1179.294623] Call Trace:
    [ 1179.295092] [] filelayout_alloc_lseg+0xa7/0x2d0 [nfs_layout_nfsv41_files]
    [ 1179.295625] [] ? out_of_line_wait_on_bit+0x81/0xb0
    [ 1179.296133] [] pnfs_layout_process+0xae/0x320 [nfsv4]
    [ 1179.296632] [] nfs4_proc_layoutget+0x2b1/0x360 [nfsv4]
    [ 1179.297134] [] pnfs_update_layout+0x853/0xb30 [nfsv4]
    [ 1179.297632] [] ? nfs_get_lock_context+0x74/0x170 [nfs]
    [ 1179.298158] [] filelayout_pg_init_read+0x37/0x50 [nfs_layout_nfsv41_files]
    [ 1179.298834] [] __nfs_pageio_add_request+0x119/0x460 [nfs]
    [ 1179.299385] [] ? nfs_create_request.part.9+0x37/0x2e0 [nfs]
    [ 1179.299872] [] nfs_pageio_add_request+0xa3/0x1b0 [nfs]
    [ 1179.300362] [] readpage_async_filler+0x85/0x260 [nfs]
    [ 1179.300907] [] read_cache_pages+0x91/0xd0
    [ 1179.301391] [] ? nfs_read_completion+0x220/0x220 [nfs]
    [ 1179.301867] [] nfs_readpages+0x128/0x200 [nfs]
    [ 1179.302330] [] __do_page_cache_readahead+0x203/0x280
    [ 1179.302784] [] ? __do_page_cache_readahead+0xd8/0x280
    [ 1179.303413] [] ondemand_readahead+0x1a6/0x2f0
    [ 1179.303855] [] page_cache_sync_readahead+0x31/0x50
    [ 1179.304286] [] generic_file_read_iter+0x4a6/0x5c0
    [ 1179.304711] [] ? __nfs_revalidate_mapping+0x1f6/0x240 [nfs]
    [ 1179.305132] [] nfs_file_read+0x52/0xa0 [nfs]
    [ 1179.305540] [] __vfs_read+0xcc/0x100
    [ 1179.305936] [] vfs_read+0x85/0x130
    [ 1179.306326] [] SyS_read+0x58/0xd0
    [ 1179.306708] [] entry_SYSCALL_64_fastpath+0x12/0x76
    [ 1179.307094] Code: c4 66 66 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 48 89 e5 41 55 41 54 53 8b 07 49 89 f4 85 c0 74 47 48 8b 06 49 89 fd 8b 38 48 85 ff 74 22 31 db eb 0c 48 63 d3 48 8b 3c d0 48 85
    [ 1179.308357] RIP [] filelayout_free_fh_array.isra.11+0x1d/0x70 [nfs_layout_nfsv41_files]
    [ 1179.309177] RSP
    [ 1179.309582] CR2: 0000000000000000

    Signed-off-by: Kinglong Mee
    Signed-off-by: Trond Myklebust
    Cc: William Dauchy
    Signed-off-by: Greg Kroah-Hartman

    Kinglong Mee
     
  • commit 9391dd00d13c853ab4f2a85435288ae2202e0e43 upstream.

    When opening a directory we want the overlayfs inode, not one from
    the topmost layer.

    Reported-By: Andrey Jr. Melnikov
    Tested-By: Andrey Jr. Melnikov
    Signed-off-by: Al Viro
    Cc: "Kamata, Munehisa"
    Signed-off-by: Greg Kroah-Hartman

    Al Viro
     
  • commit 4bacc9c9234c7c8eec44f5ed4e960d9f96fa0f01 upstream.

    Make file->f_path always point to the overlay dentry so that the path in
    /proc/pid/fd is correct and to ensure that label-based LSMs have access to the
    overlay as well as the underlay (path-based LSMs probably don't need it).

    Using my union testsuite to set things up, before the patch I see:

    [root@andromeda union-testsuite]# bash 5 /a/foo107
    [root@andromeda union-testsuite]# stat /mnt/a/foo107
    ...
    Device: 23h/35d Inode: 13381 Links: 1
    ...
    [root@andromeda union-testsuite]# stat -L /proc/$$/fd/5
    ...
    Device: 23h/35d Inode: 13381 Links: 1
    ...

    After the patch:

    [root@andromeda union-testsuite]# bash 5 /mnt/a/foo107
    [root@andromeda union-testsuite]# stat /mnt/a/foo107
    ...
    Device: 23h/35d Inode: 40346 Links: 1
    ...
    [root@andromeda union-testsuite]# stat -L /proc/$$/fd/5
    ...
    Device: 23h/35d Inode: 40346 Links: 1
    ...

    Note the change in where /proc/$$/fd/5 points to in the ls command. It was
    pointing to /a/foo107 (which doesn't exist) and now points to /mnt/a/foo107
    (which is correct).

    The inode accessed, however, is the lower layer. The union layer is on device
    25h/37d and the upper layer on 24h/36d.

    Signed-off-by: David Howells
    Signed-off-by: Al Viro
    Cc: "Kamata, Munehisa"
    Signed-off-by: Greg Kroah-Hartman

    David Howells
     
  • commit f25801ee4680ef1db21e15c112e6e5fe3ffe8da5 upstream.

    Call ovl_drop_write() earlier in ovl_dentry_open() before we call vfs_open()
    as we've done the copy up for which we needed the freeze-write lock by that
    point.

    Signed-off-by: David Howells
    Signed-off-by: Al Viro
    Cc: "Kamata, Munehisa"
    Signed-off-by: Greg Kroah-Hartman

    David Howells
     
  • commit da6fb7a9e5bd6f04f7e15070f630bdf1ea502841 upstream.

    Passing -1 to bitmap_storage_alloc() causes page->index to be set to
    -1, which is quite problematic.

    So only pass ->cluster_slot if mddev_is_clustered().
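
    A short sketch of why -1 is a problem here, assuming the page index is
    an unsigned page offset (as the kernel's pgoff_t is). The type and
    helper names are hypothetical, and the fallback to slot 0 when
    clustering is off is an assumption of this sketch rather than
    something stated in this message.

```c
#include <limits.h>

/* Models an unsigned page index such as the kernel's pgoff_t. */
typedef unsigned long model_pgoff_t;

/* Storing a negative slot in an unsigned index wraps to a huge value. */
static model_pgoff_t index_from_slot(int slot)
{
	return (model_pgoff_t)slot;
}

/* Hypothetical guard: use the cluster slot only when clustering is active. */
static int slot_to_pass(int clustered, int cluster_slot)
{
	return clustered ? cluster_slot : 0;
}
```

    In this model, index_from_slot(-1) yields ULONG_MAX, while routing the
    value through slot_to_pass() for a non-clustered array yields 0.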

    Fixes: b97e92574c0b ("Use separate bitmaps for each nodes in the cluster")
    Signed-off-by: NeilBrown
    Signed-off-by: Greg Kroah-Hartman

    NeilBrown
     
  • commit 95c2b17534654829db428f11bcf4297c059a2a7e upstream.

    Per-IRQ directories in procfs are created only when a handler is first
    added to the irqdesc, not when the irqdesc is created. In the case of
    a shared IRQ, multiple tasks can race to create a directory. This
    race condition seems to have been present forever, but is easier to
    hit with async probing.

    Signed-off-by: Ben Hutchings
    Link: http://lkml.kernel.org/r/1443266636.2004.2.camel@decadent.org.uk
    Signed-off-by: Thomas Gleixner
    Signed-off-by: Greg Kroah-Hartman

    Ben Hutchings
     
  • commit 6423fc34160939142d72ffeaa2db6408317f54df upstream.

    During driver probing the following code path is triggered.
    igb_probe
    ->igb_sw_init
    ->igb_probe_vfs
    ->igb_pci_enable_sriov
    ->igb_sriov_reinit

    Doing the SR-IOV re-init is not necessary during probing since we're
    starting from scratch. Here we can call igb_enable_sriov() right away.

    Running igb_sriov_reinit() during igb_probe() also seems to cause
    occasional packet loss on some onboard 82576 NICs. Reproduced on
    Dell and HP servers with onboard 82576 NICs.
    Example:
    Intel Corporation 82576 Gigabit Network Connection [8086:10c9] (rev 01)
    Subsystem: Dell Device [1028:0481]

    Signed-off-by: Stefan Assmann
    Tested-by: Aaron Brown
    Signed-off-by: Jeff Kirsher
    Cc: Daniel J Blueman
    Signed-off-by: Greg Kroah-Hartman

    Stefan Assmann
     
  • commit 274b045509175db0405c784be85e8cce116e6f7d upstream.

    If an interface isn't running, napi_synchronize() will hang forever.

    [ 392.248403] rmmod R running task 0 359 343 0x00000000
    [ 392.257671] ffff88003760fc88 ffff880037193b40 ffff880037193160 ffff88003760fc88
    [ 392.267644] ffff880037610000 ffff88003760fcd8 0000000100014c22 ffffffff81f75c40
    [ 392.277524] 0000000000bc7010 ffff88003760fca8 ffffffff81796927 ffffffff81f75c40
    [ 392.287323] Call Trace:
    [ 392.291599] [] schedule+0x37/0x90
    [ 392.298553] [] schedule_timeout+0x14b/0x280
    [ 392.306421] [] ? irq_free_descs+0x69/0x80
    [ 392.314006] [] ? internal_add_timer+0xb0/0xb0
    [ 392.322125] [] msleep+0x37/0x50
    [ 392.329037] [] xennet_disconnect_backend.isra.24+0xda/0x390 [xen_netfront]
    [ 392.339658] [] xennet_remove+0x2c/0x80 [xen_netfront]
    [ 392.348516] [] xenbus_dev_remove+0x59/0xc0
    [ 392.356257] [] __device_release_driver+0x87/0x120
    [ 392.364645] [] driver_detach+0xb8/0xc0
    [ 392.371989] [] bus_remove_driver+0x59/0xe0
    [ 392.379883] [] driver_unregister+0x30/0x70
    [ 392.387495] [] xenbus_unregister_driver+0x12/0x20
    [ 392.395908] [] netif_exit+0x10/0x775 [xen_netfront]
    [ 392.404877] [] SyS_delete_module+0x1d8/0x230
    [ 392.412804] [] system_call_fastpath+0x12/0x71

    Signed-off-by: Chas Williams
    Signed-off-by: David S. Miller
    Cc: "Kamata, Munehisa"
    Signed-off-by: Greg Kroah-Hartman

    Chas Williams
     
  • commit 8474ba74193d302e8340dddd1e16c85cc4b98caf upstream.

    Make sure the compiler does not modify arguments of syscall functions.
    This can happen if the compiler generates a tailcall to another
    function. For example, without asmlinkage_protect sys_openat is compiled
    into this function:

    sys_openat:
    clr.l %d0
    move.w 18(%sp),%d0
    move.l %d0,16(%sp)
    jbra do_sys_open

    Note how the fourth argument is modified in place, modifying the register
    %d4 that gets restored from this stack slot when the function returns to
    user-space. The caller may expect the register to be unmodified across
    system calls.

    Signed-off-by: Andreas Schwab
    Signed-off-by: Geert Uytterhoeven
    Signed-off-by: Greg Kroah-Hartman

    Andreas Schwab
     
  • commit 569ba74a7ba69f46ce2950bf085b37fea2408385 upstream.

    This is the arm64 portion of commit 45cac65b0fcd ("readahead: fault
    retry breaks mmap file read random detection"), which was absent from
    the initial port and has since gone unnoticed. The original commit says:

    > .fault now can retry. The retry can break state machine of .fault. In
    > filemap_fault, if page is miss, ra->mmap_miss is increased. In the second
    > try, since the page is in page cache now, ra->mmap_miss is decreased. And
    > these are done in one fault, so we can't detect random mmap file access.
    >
    > Add a new flag to indicate .fault is tried once. In the second try, skip
    > ra->mmap_miss decreasing. The filemap_fault state machine is ok with it.

    With this change, Mark reports that:

    > Random read improves by 250%, sequential read improves by 40%, and
    > random write by 400% to an eMMC device with dm crypto wrapped around it.
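
    The state machine described in the quoted commit can be modelled in a
    few lines of user-space C. This is a deliberately simplified sketch:
    the struct, the function names, and the honor_tried switch are
    hypothetical, and only the miss counter is modelled.

```c
#include <stdbool.h>

struct model_ra {
	unsigned int mmap_miss;	/* running count of page cache misses */
};

/*
 * One pass through a simplified fault handler. Returns true when the
 * fault must be retried because the page was not in the cache yet.
 */
static bool model_fault(struct model_ra *ra, bool page_cached, bool tried,
			bool honor_tried)
{
	if (!page_cached) {
		ra->mmap_miss++;	/* miss: count it, then retry */
		return true;
	}
	/* hit: decrement unless this is a retry and the flag is honored */
	if (!(honor_tried && tried) && ra->mmap_miss > 0)
		ra->mmap_miss--;
	return false;
}

/* A single random-access fault: the first try misses, the retry hits. */
static unsigned int model_random_fault(struct model_ra *ra, bool honor_tried)
{
	if (model_fault(ra, false, false, honor_tried))
		model_fault(ra, true, true, honor_tried);
	return ra->mmap_miss;
}
```

    Without the flag, one random fault leaves the counter where it started,
    so random access is never detected. With the flag honored, the miss
    survives the retry.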

    Cc: Shaohua Li
    Cc: Rik van Riel
    Cc: Wu Fengguang
    Signed-off-by: Mark Salyzyn
    Signed-off-by: Riley Andrews
    Signed-off-by: Will Deacon
    Signed-off-by: Greg Kroah-Hartman

    Mark Salyzyn
     
  • commit ee556d00cf20012e889344a0adbbf809ab5015a3 upstream.

    When the function graph tracer is enabled, the following sequence
    triggers a panic:

    mount -t debugfs nodev /sys/kernel
    echo next_tgid > /sys/kernel/tracing/set_ftrace_filter
    echo function_graph > /sys/kernel/tracing/current_tracer
    ls /proc/

    ------------[ cut here ]------------
    [ 198.501417] Unable to handle kernel paging request at virtual address cb88537fdc8ba316
    [ 198.506126] pgd = ffffffc008f79000
    [ 198.509363] [cb88537fdc8ba316] *pgd=00000000488c6003, *pud=00000000488c6003, *pmd=0000000000000000
    [ 198.517726] Internal error: Oops: 94000005 [#1] SMP
    [ 198.518798] Modules linked in:
    [ 198.520582] CPU: 1 PID: 1388 Comm: ls Tainted: G
    [ 198.521800] Hardware name: linux,dummy-virt (DT)
    [ 198.522852] task: ffffffc0fa9e8000 ti: ffffffc0f9ab0000 task.ti: ffffffc0f9ab0000
    [ 198.524306] PC is at next_tgid+0x30/0x100
    [ 198.525205] LR is at return_to_handler+0x0/0x20
    [ 198.526090] pc : [] lr : [] pstate: 60000145
    [ 198.527392] sp : ffffffc0f9ab3d40
    [ 198.528084] x29: ffffffc0f9ab3d40 x28: ffffffc0f9ab0000
    [ 198.529406] x27: ffffffc000d6a000 x26: ffffffc000b786e8
    [ 198.530659] x25: ffffffc0002a1900 x24: ffffffc0faf16c00
    [ 198.531942] x23: ffffffc0f9ab3ea0 x22: 0000000000000002
    [ 198.533202] x21: ffffffc000d85050 x20: 0000000000000002
    [ 198.534446] x19: 0000000000000002 x18: 0000000000000000
    [ 198.535719] x17: 000000000049fa08 x16: ffffffc000242efc
    [ 198.537030] x15: 0000007fa472b54c x14: ffffffffff000000
    [ 198.538347] x13: ffffffc0fada84a0 x12: 0000000000000001
    [ 198.539634] x11: ffffffc0f9ab3d70 x10: ffffffc0f9ab3d70
    [ 198.540915] x9 : ffffffc0000907c0 x8 : ffffffc0f9ab3d40
    [ 198.542215] x7 : 0000002e330f08f0 x6 : 0000000000000015
    [ 198.543508] x5 : 0000000000000f08 x4 : ffffffc0f9835ec0
    [ 198.544792] x3 : cb88537fdc8ba316 x2 : cb88537fdc8ba306
    [ 198.546108] x1 : 0000000000000002 x0 : ffffffc000d85050
    [ 198.547432]
    [ 198.547920] Process ls (pid: 1388, stack limit = 0xffffffc0f9ab0020)
    [ 198.549170] Stack: (0xffffffc0f9ab3d40 to 0xffffffc0f9ab4000)
    [ 198.582568] Call trace:
    [ 198.583313] [] next_tgid+0x30/0x100
    [ 198.584359] [] ftrace_graph_caller+0x6c/0x70
    [ 198.585503] [] ftrace_graph_caller+0x6c/0x70
    [ 198.586574] [] ftrace_graph_caller+0x6c/0x70
    [ 198.587660] [] ftrace_graph_caller+0x6c/0x70
    [ 198.588896] Code: aa0003f5 2a0103f4 b4000102 91004043 (885f7c60)
    [ 198.591092] ---[ end trace 6a346f8f20949ac8 ]---

    This is because, when using the function graph tracer, if the traced
    function's return value spans multiple registers (x0-x7),
    return_to_handler may corrupt them. So in return_to_handler, these
    registers should be saved and restored properly.

    Signed-off-by: Li Bin
    Acked-by: AKASHI Takahiro
    Signed-off-by: Catalin Marinas
    Signed-off-by: Greg Kroah-Hartman

    Li Bin
     
  • commit 0ce3cc008ec04258b6a6314b09f1a6012810881a upstream.

    The new Properties Table feature introduced in UEFIv2.5 may
    split memory regions that cover PE/COFF memory images into
    separate code and data regions. Since these regions only differ
    in the type (runtime code vs runtime data) and the permission
    bits, but not in the memory type attributes (UC/WC/WT/WB), the
    spec does not require them to be aligned to 64 KB.

    Since the relative offset of PE/COFF .text and .data segments
    cannot be changed on the fly, this means that we can no longer
    pad out those regions to be mappable using 64 KB pages.
    Unfortunately, there is no annotation in the UEFI memory map
    that identifies data regions that were split off from a code
    region, so we must apply this logic to all adjacent runtime
    regions whose attributes only differ in the permission bits.

    So instead of rounding each memory region to 64 KB alignment at
    both ends, only round down regions that are not directly
    preceded by another runtime region with the same type
    attributes. Since the UEFI spec does not mandate that the memory
    map be sorted, this means we also need to sort it first.

    Note that this change will result in all EFI_MEMORY_RUNTIME
    regions whose start addresses are not aligned to the OS page
    size to be mapped with executable permissions (i.e., on kernels
    compiled with 64 KB pages). However, since these mappings are
    only active during the time that UEFI Runtime Services are being
    invoked, the window for abuse is rather small.
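
    The rounding rule described above can be sketched in plain C. This is
    a simplified illustration, not the code from the patch: struct region
    and the helper names are hypothetical, and only the attribute bit
    values are taken from the UEFI specification.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define EFI_PAGE_SIZE      4096ULL
#define SZ_64K             0x10000ULL
#define EFI_MEMORY_UC      0x1ULL
#define EFI_MEMORY_WC      0x2ULL
#define EFI_MEMORY_WT      0x4ULL
#define EFI_MEMORY_WB      0x8ULL
#define EFI_MEMORY_RUNTIME 0x8000000000000000ULL

/* Hypothetical, simplified descriptor (not the kernel's efi_memory_desc_t). */
struct region {
    uint64_t phys_addr;
    uint64_t num_pages;
    uint64_t attribute;
};

static int cmp_region(const void *a, const void *b)
{
    const struct region *l = a, *r = b;
    return (l->phys_addr > r->phys_addr) - (l->phys_addr < r->phys_addr);
}

/* The UEFI spec does not mandate a sorted map, so sort by start address. */
void sort_regions(struct region *map, size_t n)
{
    qsort(map, n, sizeof(*map), cmp_region);
}

/* True if l ends exactly where r begins. */
static bool regions_adjacent(const struct region *l, const struct region *r)
{
    return l->phys_addr + l->num_pages * EFI_PAGE_SIZE == r->phys_addr;
}

/* True if both regions agree on the memory type bits and the runtime bit. */
static bool same_type_attrs(const struct region *l, const struct region *r)
{
    const uint64_t mask = EFI_MEMORY_UC | EFI_MEMORY_WC | EFI_MEMORY_WT |
                          EFI_MEMORY_WB | EFI_MEMORY_RUNTIME;
    return ((l->attribute ^ r->attribute) & mask) == 0;
}

/*
 * Start address to map for runtime region map[i] in a sorted map.
 * The start is rounded down to 64 KB unless the region directly follows
 * another runtime region with the same type attributes.
 */
uint64_t mapping_start(const struct region *map, size_t i)
{
    const struct region *cur = &map[i];

    if (i > 0 && regions_adjacent(&map[i - 1], cur) &&
        same_type_attrs(&map[i - 1], cur))
        return cur->phys_addr;
    return cur->phys_addr & ~(SZ_64K - 1);
}
```

    In this sketch a data region split off from a preceding code region
    keeps its unaligned start, so the relative offset between the two
    is preserved.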

    Tested-by: Mark Salter
    Tested-by: Mark Rutland [UEFI 2.4 only]
    Signed-off-by: Ard Biesheuvel
    Signed-off-by: Matt Fleming
    Reviewed-by: Mark Salter
    Reviewed-by: Mark Rutland
    Cc: Catalin Marinas
    Cc: Leif Lindholm
    Cc: Linus Torvalds
    Cc: Mike Galbraith
    Cc: Peter Zijlstra
    Cc: Thomas Gleixner
    Cc: Will Deacon
    Cc: linux-kernel@vger.kernel.org
    Link: http://lkml.kernel.org/r/1443218539-7610-3-git-send-email-matt@codeblueprint.co.uk
    Signed-off-by: Ingo Molnar
    Signed-off-by: Greg Kroah-Hartman

    Ard Biesheuvel
     
  • commit 397d425dc26da728396e66d392d5dcb8dac30c37 upstream.

    In rare cases a directory can be renamed out from under a bind mount.
    In those cases without special handling it becomes possible to walk up
    the directory tree to the root dentry of the filesystem and down
    from the root dentry to every other file or directory on the filesystem.

    Like division by zero .. from an unconnected path can not be given
    a useful semantic as there is no predicting at which path component
    the code will realize it is unconnected. We certainly can not match
    the current behavior as the current behavior is a security hole.

    Therefore, when encountering .. while following an unconnected path,
    return -ENOENT.

    - Add a function path_connected to verify path->dentry is reachable
    from path->mnt.mnt_root. AKA to validate that rename did not do
    something nasty to the bind mount.

    To avoid races, path_connected must be called after following a path
    component to its next path component.
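
    The reachability check can be illustrated with a toy model in which
    each dentry only knows its parent. The struct and function below are
    hypothetical stand-ins and are not the kernel's struct dentry or the
    path_connected added by this commit; the sketch assumes the
    filesystem root is its own parent.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy dentry: the filesystem root points to itself as parent. */
struct toy_dentry {
    struct toy_dentry *parent;
    const char *name;
};

/*
 * Walk up the parent chain from dentry. The path is "connected" if the
 * walk reaches mnt_root before running into the filesystem root.
 */
bool toy_path_connected(const struct toy_dentry *dentry,
                        const struct toy_dentry *mnt_root)
{
    for (;;) {
        if (dentry == mnt_root)
            return true;
        if (dentry->parent == dentry)   /* hit the filesystem root */
            return false;
        dentry = dentry->parent;
    }
}
```

    With /a as the root of a bind mount, a directory /a/b is connected;
    after b is renamed to /b, the same dentry no longer reaches /a when
    walking up, which is the situation the commit guards against.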

    Signed-off-by: "Eric W. Biederman"
    Signed-off-by: Al Viro
    Signed-off-by: Greg Kroah-Hartman

    Eric W. Biederman
     
  • commit cde93be45a8a90d8c264c776fab63487b5038a65 upstream.

    A rename can result in a dentry that by walking up d_parent
    will never reach its mnt_root. For lack of a better term
    I call this an escaped path.

    prepend_path is called by four different functions __d_path,
    d_absolute_path, d_path, and getcwd.

    __d_path only wants to see paths that are connected to the root it
    passes in. So __d_path needs prepend_path to return an error.

    d_absolute_path similarly wants to see paths that are connected to
    some root. Escaped paths are not connected to any mnt_root so
    d_absolute_path needs prepend_path to return an error greater
    than 1. So escaped paths will be treated like paths on lazily
    unmounted mounts.

    getcwd needs to prepend "(unreachable)" so getcwd also needs
    prepend_path to return an error.

    d_path is the interesting holdout. d_path just wants to print
    something, and does not care about the weird cases. This raises
    the question: what should be printed?

    Given that <escaped_path>/<anything> should result in -ENOENT, I
    believe it is desirable for escaped paths to be printed as empty
    paths, as there are not really any meaningful path components when
    considered from the perspective of a mount tree.

    So tweak prepend_path to return an empty path with a new error
    code of 3 when it encounters an escaped path.

    Signed-off-by: "Eric W. Biederman"
    Signed-off-by: Al Viro
    Signed-off-by: Greg Kroah-Hartman

    Eric W. Biederman
     
  • commit 43934ece2ea72c1dd279c0b0478c1a036d5d77ee upstream.

    When CONFIG_GPIOLIB is unset, its stubs will return -ENOSYS. That means
    when the mmc core parses DT for CD/WP GPIOs via mmc_of_parse(), -ENOSYS
    becomes propagated to the caller. Typically this means that the mmc host
    driver fails to probe.

    As the CD/WP GPIOs are already treated as optional, let's extend that to
    cover the case when CONFIG_GPIOLIB is unset.
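
    A minimal sketch of the idea, in userspace C: an error code that
    only says "GPIO support is not there" is folded into success so that
    an optional GPIO does not fail the probe. The helper name is
    hypothetical, and treating -ENOENT as the existing "no GPIO
    described" case is an assumption of this sketch, not something the
    commit message states.

```c
#include <errno.h>

/*
 * Hypothetical helper: map the result of an optional GPIO lookup to the
 * value the caller should propagate. -ENOSYS (stubbed-out GPIOLIB) and,
 * as an assumption here, -ENOENT (no GPIO described) count as "absent".
 */
int toy_optional_gpio_result(int ret)
{
    if (ret == -ENOENT || ret == -ENOSYS)
        return 0;
    return ret;     /* success or a real error */
}
```

    Any other negative value is still passed through unchanged, so
    genuine failures remain visible to the host driver.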

    Reported-by: Michal Simek
    Fixes: 16b23787fc70 ("mmc: sdhci-of-arasan: Call OF parsing for MMC")
    Signed-off-by: Ulf Hansson
    Tested-by: Michal Simek
    Acked-by: Venu Byravarasu
    Signed-off-by: Greg Kroah-Hartman

    Ulf Hansson
     
  • commit d31911b9374a76560d2c8ea4aa6ce5781621e81d upstream.

    Currently dma_map_sg() may be executed twice for one mrq->data
    when the mmc subsystem prepares more than one new request, and the
    following log shows up:
    sdhci[sdhci_pre_dma_transfer] invalid cookie: 24, next-cookie 25

    In this condition, mrq->data maps a DMA memory region (1) in
    sdhci_pre_req the first time, and maps another DMA memory region (2)
    in sdhci_prepare_data the second time. But the driver only unmaps
    region (2); region (1) is never unmapped, which causes a DMA memory
    leak.

    This patch uses another method to map the DMA memory for mrq->data,
    which fixes this DMA memory leak.

    Fixes: 348487cb28e6 ("mmc: sdhci: use pipeline mmc requests to improve performance")
    Reported-and-tested-by: Jiri Slaby
    Signed-off-by: Haibo Chen
    Signed-off-by: Ulf Hansson
    Signed-off-by: Jiri Slaby
    Signed-off-by: Greg Kroah-Hartman

    Haibo Chen
     
  • commit 7c7feb2ebfc9c0552c51f0c050db1d1a004faac5 upstream.

    UBI: attaching mtd1 to ubi0
    UBI: scanning is finished
    UBI error: init_volumes: not enough PEBs, required 706, available 686
    UBI error: ubi_wl_init: no enough physical eraseblocks (-20, need 1)
    UBI error: ubi_attach_mtd_dev: failed to attach mtd1, error -12

    Signed-off-by: Richard Weinberger
    Reviewed-by: David Gstir
    Signed-off-by: Greg Kroah-Hartman

    shengyong
     
  • commit 281fda27673f833a01d516658a64d22a32c8e072 upstream.

    Make sure that data_size is less than LEB size.
    Otherwise a handcrafted UBI image is able to trigger
    an out of bounds memory access in ubi_compare_lebs().
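
    The kind of check described above can be sketched as follows. The
    function is a hypothetical illustration, not UBI code: it refuses to
    touch the buffer when data_size exceeds the LEB size, and accepting
    a data_size equal to the LEB size is an assumption of this sketch.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical example: sum the first data_size bytes of a LEB buffer.
 * Returns 0 and stores the sum on success, or -1 without reading
 * anything when data_size would run past the end of the LEB.
 */
int toy_checked_sum(const uint8_t *leb, size_t leb_size,
                    size_t data_size, uint32_t *sum)
{
    uint32_t s = 0;

    if (data_size > leb_size)
        return -1;          /* value from the image is not trusted */

    for (size_t i = 0; i < data_size; i++)
        s += leb[i];
    *sum = s;
    return 0;
}
```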

    Signed-off-by: Richard Weinberger
    Reviewed-by: David Gstir
    Signed-off-by: Greg Kroah-Hartman

    Richard Weinberger
     
  • commit cf6f54e3f133229f02a90c04fe0ff9dd9d3264b4 upstream.

    Fixes the following lockdep splat:
    [ 1.244527] =============================================
    [ 1.245193] [ INFO: possible recursive locking detected ]
    [ 1.245193] 4.2.0-rc1+ #37 Not tainted
    [ 1.245193] ---------------------------------------------
    [ 1.245193] cp/742 is trying to acquire lock:
    [ 1.245193] (&sb->s_type->i_mutex_key#9){+.+.+.}, at: [] ubifs_init_security+0x29/0xb0
    [ 1.245193]
    [ 1.245193] but task is already holding lock:
    [ 1.245193] (&sb->s_type->i_mutex_key#9){+.+.+.}, at: [] path_openat+0x3af/0x1280
    [ 1.245193]
    [ 1.245193] other info that might help us debug this:
    [ 1.245193] Possible unsafe locking scenario:
    [ 1.245193]
    [ 1.245193] CPU0
    [ 1.245193] ----
    [ 1.245193] lock(&sb->s_type->i_mutex_key#9);
    [ 1.245193] lock(&sb->s_type->i_mutex_key#9);
    [ 1.245193]
    [ 1.245193] *** DEADLOCK ***
    [ 1.245193]
    [ 1.245193] May be due to missing lock nesting notation
    [ 1.245193]
    [ 1.245193] 2 locks held by cp/742:
    [ 1.245193] #0: (sb_writers#5){.+.+.+}, at: [] mnt_want_write+0x1f/0x50
    [ 1.245193] #1: (&sb->s_type->i_mutex_key#9){+.+.+.}, at: [] path_openat+0x3af/0x1280
    [ 1.245193]
    [ 1.245193] stack backtrace:
    [ 1.245193] CPU: 2 PID: 742 Comm: cp Not tainted 4.2.0-rc1+ #37
    [ 1.245193] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.7.5-0-ge51488c-20140816_022509-build35 04/01/2014
    [ 1.245193] ffffffff8252d530 ffff88007b023a38 ffffffff814f6f49 ffffffff810b56c5
    [ 1.245193] ffff88007c30cc80 ffff88007b023af8 ffffffff810a150d ffff88007b023a68
    [ 1.245193] 000000008101302a ffff880000000000 00000008f447e23f ffffffff8252d500
    [ 1.245193] Call Trace:
    [ 1.245193] [] dump_stack+0x4c/0x65
    [ 1.245193] [] ? console_unlock+0x1c5/0x510
    [ 1.245193] [] __lock_acquire+0x1a6d/0x1ea0
    [ 1.245193] [] ? __lock_is_held+0x58/0x80
    [ 1.245193] [] lock_acquire+0xd3/0x270
    [ 1.245193] [] ? ubifs_init_security+0x29/0xb0
    [ 1.245193] [] mutex_lock_nested+0x6b/0x3a0
    [ 1.245193] [] ? ubifs_init_security+0x29/0xb0
    [ 1.245193] [] ? ubifs_init_security+0x29/0xb0
    [ 1.245193] [] ubifs_init_security+0x29/0xb0
    [ 1.245193] [] ubifs_create+0xa6/0x1f0
    [ 1.245193] [] ? path_openat+0x3af/0x1280
    [ 1.245193] [] vfs_create+0x95/0xc0
    [ 1.245193] [] path_openat+0x7cc/0x1280
    [ 1.245193] [] ? __lock_acquire+0x543/0x1ea0
    [ 1.245193] [] ? sched_clock_cpu+0x90/0xc0
    [ 1.245193] [] ? calc_global_load_tick+0x60/0x90
    [ 1.245193] [] ? sched_clock_cpu+0x90/0xc0
    [ 1.245193] [] ? __alloc_fd+0xaf/0x180
    [ 1.245193] [] do_filp_open+0x75/0xd0
    [ 1.245193] [] ? _raw_spin_unlock+0x26/0x40
    [ 1.245193] [] ? __alloc_fd+0xaf/0x180
    [ 1.245193] [] do_sys_open+0x129/0x200
    [ 1.245193] [] SyS_open+0x19/0x20
    [ 1.245193] [] entry_SYSCALL_64_fastpath+0x12/0x6f

    While the lockdep splat is a false positive, because path_openat holds
    the i_mutex of the parent directory and ubifs_init_security() tries to
    acquire the i_mutex of a new inode, it reveals that taking i_mutex in
    ubifs_init_security() is pointless: the function is only called in the
    inode allocation path, so nobody else can see the inode yet.

    Reported-and-tested-by: Boris Brezillon
    Reviewed-and-tested-by: Dongsheng Yang
    Signed-off-by: Richard Weinberger
    Signed-off-by: dedekind1@gmail.com
    Signed-off-by: Greg Kroah-Hartman

    Richard Weinberger
     
  • commit 83fccfc3940c4a2db90fd7e7079f5b465cd8c6af upstream.

    When replacing del_timer() with del_timer_sync(), I introduced
    a deadlock condition:

    reqsk_queue_unlink() is called from inet_csk_reqsk_queue_drop()

    inet_csk_reqsk_queue_drop() can be called from many contexts,
    one being the timer handler itself (reqsk_timer_handler()).

    In this case, del_timer_sync() loops forever.

    Simple fix is to test if timer is pending.

    Fixes: 2235f2ac75fd ("inet: fix races with reqsk timers")
    Signed-off-by: Eric Dumazet
    Signed-off-by: David S. Miller
    Cc: Holger Hoffstätte
    Cc: Andre Tomt
    Cc: Chris Caputo
    Signed-off-by: Greg Kroah-Hartman

    Eric Dumazet
     
  • commit a8b9774571d46506a0774b1ced3493b1245cf893 upstream.

    Commit 5d5cd85ff441 ("rsi: Fix failure to load firmware after memory
    leak fix and fix the leak") also added a check on the allocation of
    DMA-accessible memory that may directly return. In that case the
    already allocated firmware data is leaked. Make sure the data is
    always freed correctly. Detected by Coverity CID 1316519.

    Fixes: 5d5cd85ff441 ("rsi: Fix failure to load firmware after memory leak fix and fix the leak")
    Signed-off-by: Christian Engelmayer
    Signed-off-by: Kalle Valo
    Signed-off-by: Greg Kroah-Hartman

    Christian Engelmayer
     
  • commit e297c939b745e420ef0b9dc989cb87bda617b399 upstream.

    This fixes a race which can result in the same virtual IRQ number
    being assigned to two different MSI interrupts. The most visible
    consequence of that is usually a warning and stack trace from the
    sysfs code about an attempt to create a duplicate entry in sysfs.

    The race happens when one CPU (say CPU 0) is disposing of an MSI
    while another CPU (say CPU 1) is setting up an MSI. CPU 0 calls
    (for example) pnv_teardown_msi_irqs(), which calls
    msi_bitmap_free_hwirqs() to indicate that the MSI (i.e. its
    hardware IRQ number) is no longer in use. Then, before CPU 0 gets
    to calling irq_dispose_mapping() to free up the virtual IRQ number,
    CPU 1 comes in and calls msi_bitmap_alloc_hwirqs() to allocate an
    MSI, and gets the same hardware IRQ number that CPU 0 just freed.
    CPU 1 then calls irq_create_mapping() to get a virtual IRQ number,
    which sees that there is currently a mapping for that hardware IRQ
    number and returns the corresponding virtual IRQ number (which is
    the same virtual IRQ number that CPU 0 was using). CPU 0 then
    calls irq_dispose_mapping() and frees that virtual IRQ number.
    Now, if another CPU comes along and calls irq_create_mapping(), it
    is likely to get the virtual IRQ number that was just freed,
    resulting in the same virtual IRQ number apparently being used for
    two different hardware interrupts.

    To fix this race, we just move the call to msi_bitmap_free_hwirqs()
    to after the call to irq_dispose_mapping(). Since virq_to_hw()
    doesn't work for the virtual IRQ number after irq_dispose_mapping()
    has been called, we need to call it before irq_dispose_mapping() and
    remember the result for the msi_bitmap_free_hwirqs() call.

    The pattern of calling msi_bitmap_free_hwirqs() before
    irq_dispose_mapping() appears in 5 places under arch/powerpc, and
    appears to have originated in commit 05af7bd2d75e ("[POWERPC] MPIC
    U3/U4 MSI backend") from 2007.

    Fixes: 05af7bd2d75e ("[POWERPC] MPIC U3/U4 MSI backend")
    Reported-by: Alexey Kardashevskiy
    Signed-off-by: Paul Mackerras
    Signed-off-by: Michael Ellerman
    Signed-off-by: Greg Kroah-Hartman

    Paul Mackerras
     
  • commit c2e4b24ff848bb180f9b9cd873a38327cd219ad2 upstream.

    When a trace recorded on a 32-bit device is processed with a 64-bit
    binary, the upper 32 bits of the address need to be ignored.

    Without this, the 32-bit address lookup in find_printk() fails and
    the raw 64-bit pointer value is written to the trace output instead.

    Before:

    burn-1778 [003] 548.600305: bputs: 0xc0046db2s: 2cec5c058d98c

    After:

    burn-1778 [003] 548.600305: bputs: 0xc0046db2s: RT throttling activated

    The problem occurs in PRINT_FIELD when the field is recognized as a
    pointer to a string (of the type const char *).

    The heterogeneous-architecture cases below can arise and should be handled:

    * Traces recorded using 32-bit addresses processed on a 64-bit machine
    * Traces recorded using 64-bit addresses processed on a 32-bit machine
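
    One way to picture the handling of both cases is the sketch below.
    It is an assumption-laden illustration rather than the patch itself:
    the function name and the long_size parameter (the pointer width in
    bytes of the machine that recorded the trace) are hypothetical.

```c
#include <stdint.h>

/*
 * Hypothetical helper: normalize a pointer value read from a trace.
 * The value is always carried in a 64-bit type, so 64-bit traces survive
 * on a 32-bit host; for 32-bit traces the upper 32 bits are discarded
 * before the address is used for a lookup.
 */
uint64_t toy_normalize_addr(uint64_t addr, int long_size)
{
    if (long_size == 4)
        addr &= 0xffffffffULL;
    return addr;
}
```

    A lookup keyed on the normalized value can then match the 32-bit
    address 0xc0046db2 from the example above even if stray upper bits
    were read along with it.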

    Reported-by: Juri Lelli
    Signed-off-by: Kapileshwar Singh
    Reviewed-by: Steven Rostedt
    Cc: David Ahern
    Cc: Javi Merino
    Cc: Jiri Olsa
    Cc: Namhyung Kim
    Link: http://lkml.kernel.org/r/1442928123-13824-1-git-send-email-kapileshwar.singh@arm.com
    Signed-off-by: Arnaldo Carvalho de Melo
    Signed-off-by: Greg Kroah-Hartman

    Kapileshwar Singh
     
  • commit 53cf037bf846417fd92dc92ddf97267f69b110f4 upstream.

    The two commits noted below added calls to ip_hdr() and ipv6_hdr(). They
    need a correctly set skb network header.

    Unfortunately we cannot rely on the device drivers to set it for us.
    Therefore set it at the beginning of the corresponding ndo_start_xmit
    handler.

    Fixes: 1d8ab8d3c176 ("batman-adv: Modified forwarding behaviour for multicast packets")
    Fixes: ab49886e3da7 ("batman-adv: Add IPv4 link-local/IPv6-ll-all-nodes multicast support")
    Signed-off-by: Linus Lüssing
    Signed-off-by: Marek Lindner
    Signed-off-by: Antonio Quartulli
    Signed-off-by: Greg Kroah-Hartman

    Linus Lüssing
     
  • commit 8a4023c5b5e30b11f1f383186f4a7222b3b823cf upstream.

    So far the mcast tvlv handler did not anticipate the processing of
    multiple incoming OGMs from the same originator at the same time. This
    can lead to various issues:

    * Broken refcounting: For instance two mcast handlers might both assume
    that an originator just got multicast capabilities and will together
    wrongly decrease mcast.num_disabled by two, potentially leading to
    an integer underflow.

    * Potential kernel panic on hlist_del_rcu(): Two mcast handlers might
    one after another try to do an
    hlist_del_rcu(&orig->mcast_want_all_*_node). The second one will
    cause memory corruption / crashes.
    (Reported by: Sven Eckelmann )

    Right at the beginning, the code path makes assumptions about the current
    multicast-related state of an originator and bases all updates on that. The
    easiest and least error-prone way to fix the issues in this case is to
    serialize multiple mcast handler invocations with a spinlock.

    Fixes: 60432d756cf0 ("batman-adv: Announce new capability via multicast TVLV")
    Signed-off-by: Linus Lüssing
    Signed-off-by: Marek Lindner
    Signed-off-by: Antonio Quartulli
    Signed-off-by: Greg Kroah-Hartman

    Linus Lüssing
     
  • commit 9c936e3f4c4fad07abb6c082a89508b8f724c88f upstream.

    Bitwise OR/AND assignments in C aren't guaranteed to be atomic. One
    OGM handler might undo the set/clear of a specific bit from another
    handler run in between.

    Fix this by using the atomic set_bit()/clear_bit()/test_bit() functions.
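
    The difference can be shown with a userspace analogue built on C11
    atomics. The toy_* functions are hypothetical stand-ins that mimic
    the argument order of the kernel helpers; they are not the kernel
    implementation. A plain "flags |= BIT" is a separate load, OR and
    store, so a concurrent update between the load and the store can be
    lost, whereas each call below is a single atomic read-modify-write.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* nr must be smaller than the number of bits in unsigned long. */
void toy_set_bit(unsigned int nr, atomic_ulong *word)
{
    atomic_fetch_or(word, 1UL << nr);
}

void toy_clear_bit(unsigned int nr, atomic_ulong *word)
{
    atomic_fetch_and(word, ~(1UL << nr));
}

bool toy_test_bit(unsigned int nr, atomic_ulong *word)
{
    return (atomic_load(word) >> nr) & 1UL;
}
```

    Two handlers that set or clear different bits of the same word
    through these helpers cannot undo each other's update.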

    Fixes: 60432d756cf0 ("batman-adv: Announce new capability via multicast TVLV")
    Signed-off-by: Linus Lüssing
    Signed-off-by: Marek Lindner
    Signed-off-by: Antonio Quartulli
    Signed-off-by: Greg Kroah-Hartman

    Linus Lüssing
     
  • commit ac4eebd48461ec993e7cb614d5afe7df8c72e6b7 upstream.

    Bitwise OR/AND assignments in C aren't guaranteed to be atomic. One
    OGM handler might undo the set/clear of a specific bit from another
    handler run in between.

    Fix this by using the atomic set_bit()/clear_bit()/test_bit() functions.

    Fixes: e17931d1a61d ("batman-adv: introduce capability initialization bitfield")
    Signed-off-by: Linus Lüssing
    Signed-off-by: Marek Lindner
    Signed-off-by: Antonio Quartulli
    Signed-off-by: Greg Kroah-Hartman

    Linus Lüssing
     
  • commit 4635469f5c617282f18c69643af36cd8c0acf707 upstream.

    Bitwise OR/AND assignments in C aren't guaranteed to be atomic. One
    OGM handler might undo the set/clear of a specific bit from another
    handler run in between.

    Fix this by using the atomic set_bit()/clear_bit()/test_bit() functions.

    Fixes: 3f4841ffb336 ("batman-adv: tvlv - add network coding container")
    Signed-off-by: Linus Lüssing
    Signed-off-by: Marek Lindner
    Signed-off-by: Antonio Quartulli
    Signed-off-by: Greg Kroah-Hartman

    Linus Lüssing
     
  • commit 53960059d56ecef67d4ddd546731623641a3d2d1 upstream.

    If there is a DMA zone (usually 24bit = 16MB I believe), but no DMA32
    zone, as is the case for some 32-bit kernels, then massage_gfp_flags()
    will cause DMA memory allocated for devices with a 32..63-bit
    coherent_dma_mask to fall back to using __GFP_DMA, even though there may
    only be 32-bits of physical address available anyway.

    Correct that case to compare against a mask the size of phys_addr_t
    instead of always using a 64-bit mask.
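
    The effect of the changed comparison can be sketched as below for
    the "DMA zone but no DMA32 zone" configuration. Both function names
    are hypothetical; toy_bit_mask only mirrors the shape of the
    kernel's DMA_BIT_MASK(n), and phys_addr_bytes stands in for
    sizeof(phys_addr_t).

```c
#include <stdbool.h>
#include <stdint.h>

/* Low n bits set, like DMA_BIT_MASK(n); n >= 64 yields all ones. */
static uint64_t toy_bit_mask(unsigned int n)
{
    return n >= 64 ? ~0ULL : (1ULL << n) - 1;
}

/*
 * Hypothetical decision helper: fall back to the DMA zone only if the
 * device cannot address everything a physical address can express.
 */
bool toy_needs_dma_zone(uint64_t coherent_dma_mask,
                        unsigned int phys_addr_bytes)
{
    return coherent_dma_mask < toy_bit_mask(phys_addr_bytes * 8);
}
```

    With a 4-byte physical address, a device with a 32-bit
    coherent_dma_mask no longer falls back to the DMA zone in this
    model, while comparing against a 64-bit mask (phys_addr_bytes of 8)
    would still force the fallback.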

    Signed-off-by: James Hogan
    Fixes: a2e715a86c6d ("MIPS: DMA: Fix computation of DMA flags from device's coherent_dma_mask.")
    Cc: Ralf Baechle
    Cc: linux-mips@linux-mips.org
    Patchwork: https://patchwork.linux-mips.org/patch/9610/
    Signed-off-by: Ralf Baechle
    Signed-off-by: Greg Kroah-Hartman

    James Hogan