06 Jul, 2014

4 commits


05 Jul, 2014

1 commit


04 Jul, 2014

30 commits

  • Pull sound fixes from Takashi Iwai:
    "This contains a few fixes for HD-audio: yet another Dell headset pin
    quirk, a fixup for Thinkpad T540P, and an improved fix for
    Haswell/Broadwell HDMI clock setup"

    * tag 'sound-3.16-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
    ALSA: hda - restore BCLK M/N value as per CDCLK for HSW/BDW display HDA controller
    drm/i915: provide interface for audio driver to query cdclk
    ALSA: hda - Add a fixup for Thinkpad T540p
    ALSA: hda - Add another headset pin quirk for some Dell machines

    Linus Torvalds
     
  • Pull btrfs fixes from Chris Mason:
    "We've queued up a few fixes in my for-linus branch"

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
    Btrfs: fix crash when starting transaction
    Btrfs: fix btrfs_print_leaf for skinny metadata
    Btrfs: fix race of using total_bytes_pinned
    btrfs: use E2BIG instead of EIO if compression does not help
    btrfs: remove stale comment from btrfs_flush_all_pending_stuffs
    Btrfs: fix use-after-free when cloning a trailing file hole
    btrfs: fix null pointer dereference in btrfs_show_devname when name is null
    btrfs: fix null pointer dereference in clone_fs_devices when name is null
    btrfs: fix nossd and ssd_spread mount option regression
    Btrfs: fix race between balance recovery and root deletion
    Btrfs: atomically set inode->i_flags in btrfs_update_iflags
    btrfs: only unlock block in verify_parent_transid if we locked it
    Btrfs: assert send doesn't attempt to start transactions
    btrfs compression: reuse recently used workspace
    Btrfs: fix crash when mounting raid5 btrfs with missing disks
    btrfs: create sprout should rename fsid on the sysfs as well
    btrfs: dev replace should replace the sysfs entry
    btrfs: dev add should add its sysfs entry
    btrfs: dev delete should remove sysfs entry
    btrfs: rename add_device_membership to btrfs_kobj_add_device

    Linus Torvalds
     
  • The CurrentEL system register reports the Current Exception Level
    of the CPU. It doesn't say anything about the stack handling, and
    yet we compare it to PSR_MODE_EL2t and PSR_MODE_EL2h.

    It works by chance because PSR_MODE_EL2t happens to match the right
    bits, but that's otherwise a very bad idea. Just check for the EL
    value instead.

    Signed-off-by: Marc Zyngier
    [catalin.marinas@arm.com: fixed arch/arm64/kernel/efi-entry.S]
    Signed-off-by: Catalin Marinas

    Marc Zyngier
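
The check described in the CurrentEL entry above comes down to a little bit arithmetic. What follows is a minimal model, not the kernel's code. It assumes the ARMv8 layout in which CurrentEL carries the exception level in bits [3:2], and the arm64 encodings 0x8 and 0x9 for PSR_MODE_EL2t and PSR_MODE_EL2h. The helper name current_el() is hypothetical.

```python
# Model of the CurrentEL comparison; current_el() is a hypothetical helper.
CURRENTEL_EL_SHIFT = 2                      # EL lives in bits [3:2] of CurrentEL
CURRENTEL_EL_MASK = 0x3 << CURRENTEL_EL_SHIFT

PSR_MODE_EL2t = 0x8                         # SPSR mode encoding: EL2 using SP_EL0
PSR_MODE_EL2h = 0x9                         # SPSR mode encoding: EL2 using SP_EL2


def current_el(currentel_reg: int) -> int:
    """Return the exception level encoded in a CurrentEL register value."""
    return (currentel_reg & CURRENTEL_EL_MASK) >> CURRENTEL_EL_SHIFT


# At EL2 the register reads 2 << 2, whichever stack pointer is selected.
value_at_el2 = 2 << CURRENTEL_EL_SHIFT
print(value_at_el2 == PSR_MODE_EL2t)        # matches only because the bits coincide
print(value_at_el2 == PSR_MODE_EL2h)        # this comparison cannot match
print(current_el(value_at_el2))             # the robust check: compare the EL field
```

Comparing the extracted EL field keeps the test independent of the stack-pointer selection bit that the PSR_MODE_* constants also encode.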
     
  • The __sync_icache_dcache routine will only flush the dcache for the
    first page of a compound page, potentially leading to stale icache
    data residing further on in a hugetlb page.

    This patch addresses this issue by taking into consideration the
    order of the page when flushing the dcache.

    Reported-by: Mark Brown
    Tested-by: Mark Brown
    Signed-off-by: Steve Capper
    Acked-by: Will Deacon
    Signed-off-by: Catalin Marinas
    Cc: # v3.11+

    Steve Capper
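
The idea of taking the page order into account when flushing, as in the __sync_icache_dcache entry above, can be sketched as a length calculation. This is an illustrative model only. The 4 KiB base page size and the helper name dcache_flush_length() are assumptions, not taken from the patch.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB base pages


def dcache_flush_length(compound_order: int, page_size: int = PAGE_SIZE) -> int:
    """Bytes to flush for a compound page made of 2**compound_order base pages."""
    return page_size << compound_order


# With 4 KiB base pages, a 2 MiB huge page has order 9.
print(dcache_flush_length(0))   # an ordinary page: one base page
print(dcache_flush_length(9))   # the whole huge page, not only its first page
```

Flushing only PAGE_SIZE bytes corresponds to always passing order 0, which is the behaviour the entry describes as leaving stale data further into a hugetlb page.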
     
  • The define ARM64_64K_PAGES is tested for rather than
    CONFIG_ARM64_64K_PAGES. Correct that typo here.

    Signed-off-by: Steve Capper
    Signed-off-by: Catalin Marinas

    Steve Capper
     
  • For the HSW/BDW display HD-A controller, hda_set_bclk() is defined to set BCLK
    by programming the M/N values as per the core display clock (CDCLK) queried from
    the i915 display driver.

    The audio driver also sets BCLK in azx_first_init(), since the display
    driver can turn off the shared power in the boot phase if only eDP is connected,
    in which case the M/N values are lost and must be reprogrammed.

    Signed-off-by: Mengdong Lin
    Signed-off-by: Takashi Iwai

    Mengdong Lin
     
  • For Haswell and Broadwell, if the display power well has been disabled,
    the display audio controller divider values EM4 M VALUE and EM5 N VALUE
    will have been lost. The CDCLK frequency is required for reprogramming them
    to generate 24MHz HD-A link BCLK. So provide a private interface for the
    audio driver to query CDCLK.

    This is a stopgap solution until a more generic interface between audio
    and display drivers has been implemented.

    Signed-off-by: Jani Nikula
    Reviewed-by: Damien Lespiau
    Signed-off-by: Mengdong Lin
    Signed-off-by: Takashi Iwai

    Jani Nikula
     
  • Pull USB bugfixes from Greg KH:
    "Here's a round of USB bugfixes, quirk additions, and new device ids
    for 3.16-rc4. Nothing major in here at all, just a bunch of tiny
    changes. All have been in linux-next with no reported issues"

    * tag 'usb-3.16-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: (33 commits)
    usb: chipidea: udc: delete td from req's td list at ep_dequeue
    usb: Kconfig: make EHCI_MSM selectable for QCOM SOCs
    usb-storage/SCSI: Add broken_fua blacklist flag
    usb: musb: dsps: fix the base address for accessing the mode register
    tools: ffs-test: fix header values endianess
    usb: phy: msm: Do not do runtime pm if the phy is not idle
    usb: musb: Ensure that cppi41 timer gets armed on premature DMA TX irq
    usb: gadget: gr_udc: Fix check for invalid number of microframes
    usb: musb: Fix panic upon musb_am335x module removal
    usb: gadget: f_fs: resurect usb_functionfs_descs_head structure
    Revert "tools: ffs-test: convert to new descriptor format fixing compilation error"
    xhci: Fix runtime suspended xhci from blocking system suspend.
    xhci: clear root port wake on bits if controller isn't wake-up capable
    xhci: correct burst count field for isoc transfers on 1.0 xhci hosts
    xhci: Use correct SLOT ID when handling a reset device command
    MAINTAINERS: update e-mail address
    usb: option: add/modify Olivetti Olicard modems
    USB: ftdi_sio: fix null deref at port probe
    MAINTAINERS: drop two usb-serial subdriver entries
    USB: option: add device ID for SpeedUp SU9800 usb 3g modem
    ...

    Linus Torvalds
     
  • Pull staging driver bugfixes from Greg KH:
    "Nothing major here, just 4 small bugfixes that resolve some issues
    reported for the IIO (staging and non-staging) and the tidspbridge
    driver"

    * tag 'staging-3.16-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging:
    staging: tidspbridge: fix an erroneous removal of parentheses
    iio: of_iio_channel_get_by_name() returns non-null pointers for error legs
    staging: iio/ad7291: fix error code in ad7291_probe()
    iio:adc:ad799x: Fix reading and writing of event values, apply shift

    Linus Torvalds
     
  • Pull driver core fixes from Greg KH:
    "Well, one drivercore fix for kernfs to resolve a reported issue with
    sysfs files being updated from atomic contexts, and another lz4 bugfix
    for testing potential buffer overflows"

    * tag 'driver-core-3.16-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
    lz4: add overrun checks to lz4_uncompress_unknownoutputsize()
    kernfs: kernfs_notify() must be useable from non-sleepable contexts

    Linus Torvalds
     
  • …it/rostedt/linux-trace

    Pull tracing fixes from Steven Rostedt:
    "Oleg Nesterov found and fixed a bug in the perf/ftrace/uprobes code
    where running:

    # perf probe -x /lib/libc.so.6 syscall
    # echo 1 >> /sys/kernel/debug/tracing/events/probe_libc/enable
    # perf record -e probe_libc:syscall whatever

    kills the uprobe. Along the way he found some other minor bugs and
    clean ups that he fixed up making it a total of 4 patches.

    Doing unrelated work, I found that the reading of the ftrace trace
    file disables all function tracer callbacks. This was fine when
    ftrace was the only user, but now that it's used by perf and kprobes,
    this is a bug where reading trace can disable kprobes and perf. A
    very unexpected side effect and should be fixed"

    * tag 'trace-fixes-v3.16-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
    tracing: Remove ftrace_stop/start() from reading the trace file
    tracing/uprobes: Fix the usage of uprobe_buffer_enable() in probe_event_enable()
    tracing/uprobes: Kill the bogus UPROBE_HANDLER_REMOVE code in uprobe_dispatcher()
    uprobes: Change unregister/apply to WARN() if uprobe/consumer is gone
    tracing/uprobes: Revert "Support mix of ftrace and perf"

    Linus Torvalds
     
  • Pull kbuild fix from Michal Marek:
    "There is one more fix for the relative paths series from -rc1: Print
    the path to the build directory at the start of the build, so that
    editors and IDEs can match the relative paths to source files"

    * 'rc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild:
    kbuild: Print the name of the build directory

    Linus Torvalds
     
  • Pull nfsd bugfixes from Bruce Fields:
    "By coincidence, two NFSv4 symlink bugs, one introduced in the 3.16 xdr
    encoding rewrite, the other a decoding bug that I think we've had
    since the start but that just doesn't trigger very often"

    * 'for-3.16' of git://linux-nfs.org/~bfields/linux:
    nfs: fix nfs4d readlink truncated packet
    nfsd: fix rare symlink decoding bug

    Linus Torvalds
     
  • The 'sysret' fastpath does not correctly restore even all regular
    registers, much less any segment registers or rflags values. That is
    very much part of why it's faster than 'iret'.

    Normally that isn't a problem, because the normal ptrace() interface
    catches the process using the signal handler infrastructure, which
    always returns with an iret.

    However, some paths can get caught using ptrace_event() instead of the
    signal path, and for those we need to make sure that we aren't going to
    return to user space using 'sysret'. Otherwise the modifications that
    may have been done to the register set by the tracer wouldn't
    necessarily take effect.

    Fix it by forcing IRET path by setting TIF_NOTIFY_RESUME from
    arch_ptrace_stop_needed() which is invoked from ptrace_stop().

    Signed-off-by: Tejun Heo
    Reported-by: Andy Lutomirski
    Acked-by: Oleg Nesterov
    Suggested-by: Linus Torvalds
    Cc: stable@vger.kernel.org
    Signed-off-by: Linus Torvalds

    Tejun Heo
     
  • Jan points out that I forgot to make the needed fixes to the
    lz4_uncompress_unknownoutputsize() function to mirror the changes done
    in lz4_decompress() with regards to potential pointer overflows.

    The only in-kernel user of this function is the zram code, which only
    takes data from a valid compressed buffer that it made itself, so it's
    not a big issue. But due to external kernel modules using this
    function, it's better to be safe here.

    Reported-by: Jan Beulich
    Cc: "Don A. Bailey"
    Cc: stable
    Signed-off-by: Greg Kroah-Hartman

    Greg Kroah-Hartman
     
  • Merge fixes from Andrew Morton:
    "14 fixes"

    * emailed patches from Andrew Morton :
    shmem: fix init_page_accessed use to stop !PageLRU bug
    kernel/printk/printk.c: revert "printk: enable interrupts before calling console_trylock_for_printk()"
    tools/testing/selftests/ipc/msgque.c: improve error handling when not running as root
    fs/seq_file: fallback to vmalloc allocation
    /proc/stat: convert to single_open_size()
    hwpoison: fix the handling path of the victimized page frame that belong to non-LRU
    mm:vmscan: update the trace-vmscan-postprocess.pl for event vmscan/mm_vmscan_lru_isolate
    msync: fix incorrect fstart calculation
    zram: revalidate disk after capacity change
    tools: memory-hotplug fix unexpected operator error
    tools: cpu-hotplug fix unexpected operator error
    autofs4: fix false positive compile error
    slub: fix off by one in number of slab tests
    mm: page_alloc: fix CMA area initialisation when pageblock > MAX_ORDER

    Linus Torvalds
     
  • Under shmem swapping load, I sometimes hit the VM_BUG_ON_PAGE(!PageLRU)
    in isolate_lru_pages() at mm/vmscan.c:1281!

    Commit 2457aec63745 ("mm: non-atomically mark page accessed during page
    cache allocation where possible") looks like interrupted work-in-progress.

    mm/filemap.c's call to init_page_accessed() is fine, but not mm/shmem.c's
    - shmem_write_begin() is clearly wrong to use it after shmem_getpage(),
    when the page is always visible in radix_tree, and often already on LRU.

    Revert change to shmem_write_begin(), and use init_page_accessed() or
    mark_page_accessed() appropriately for SGP_WRITE in shmem_getpage_gfp().

    SGP_WRITE also covers shmem_symlink(), which did not mark_page_accessed()
    before; but since many other filesystems use [__]page_symlink(), which did
    and does mark the page accessed, consider this as rectifying an oversight.

    Signed-off-by: Hugh Dickins
    Acked-by: Mel Gorman
    Cc: Johannes Weiner
    Cc: Vlastimil Babka
    Cc: Michal Hocko
    Cc: Dave Hansen
    Cc: Prabhakar Lad
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     
  • …_trylock_for_printk()"

    Revert commit 939f04bec1a4 ("printk: enable interrupts before calling
    console_trylock_for_printk()").

    Andreas reported:

    : None of the post 3.15 kernel boot for me. They all hang at the GRUB
    : screen telling me it loaded and started the kernel, but the kernel
    : itself stops before it prints anything (or even replaces the GRUB
    : background graphics).

    939f04bec1a4 is a modest latency reduction. Revert it until we understand
    the reason for these failures.

    Reported-by: Andreas Bombe <aeb@debian.org>
    Cc: Jan Kara <jack@suse.cz>
    Cc: Steven Rostedt <rostedt@goodmis.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

    Andrew Morton
     
  • The test fails in the middle when it is not run as root while accessing
    /proc/sys/kernel/msg_next_id. Changed it to check for root at the
    beginning of the test and exit if not root.

    Signed-off-by: Shuah Khan
    Cc: Greg Kroah-Hartman
    Cc: Davidlohr Bueso
    Cc: Colin Ian King
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Shuah Khan
     
  • There are a couple of seq_files which use the single_open() interface.
    This interface requires that the whole output must fit into a single
    buffer.

    E.g. for /proc/stat allocation failures have been observed because an
    order-4 memory allocation failed due to memory fragmentation. In such
    situations reading /proc/stat is not possible anymore.

    Therefore change the seq_file code to fall back to vmalloc allocations,
    which will usually result in a couple of order-0 allocations and hence
    also work if memory is fragmented.

    For reference a call trace where reading from /proc/stat failed:

    sadc: page allocation failure: order:4, mode:0x1040d0
    CPU: 1 PID: 192063 Comm: sadc Not tainted 3.10.0-123.el7.s390x #1
    [...]
    Call Trace:
    show_stack+0x6c/0xe8
    warn_alloc_failed+0xd6/0x138
    __alloc_pages_nodemask+0x9da/0xb68
    __get_free_pages+0x2e/0x58
    kmalloc_order_trace+0x44/0xc0
    stat_open+0x5a/0xd8
    proc_reg_open+0x8a/0x140
    do_dentry_open+0x1bc/0x2c8
    finish_open+0x46/0x60
    do_last+0x382/0x10d0
    path_openat+0xc8/0x4f8
    do_filp_open+0x46/0xa8
    do_sys_open+0x114/0x1f0
    sysc_tracego+0x14/0x1a

    Signed-off-by: Heiko Carstens
    Tested-by: David Rientjes
    Cc: Ian Kent
    Cc: Hendrik Brueckner
    Cc: Thorsten Diehl
    Cc: Andrea Righi
    Cc: Christoph Hellwig
    Cc: Al Viro
    Cc: Stefan Bader
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Heiko Carstens
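
To see why the order-4 request in the seq_file entry above is fragile while a vmalloc-style fallback is not, it helps to count pages. The sketch below is a model under stated assumptions: 4 KiB pages, a hypothetical 40 KiB buffer, and the helper names allocation_order() and order0_pages(), none of which come from the patch.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages


def allocation_order(size: int, page_size: int = PAGE_SIZE) -> int:
    """Smallest order such that 2**order contiguous pages hold `size` bytes."""
    pages = -(-size // page_size)          # ceiling division
    order = 0
    while (1 << order) < pages:
        order += 1
    return order


def order0_pages(size: int, page_size: int = PAGE_SIZE) -> int:
    """Number of independent single pages that cover `size` bytes."""
    return -(-size // page_size)


size = 40 * 1024                           # hypothetical buffer size
print(allocation_order(size))              # order of the contiguous request
print(1 << allocation_order(size))         # pages that must be free side by side
print(order0_pages(size))                  # scattered pages that would suffice
```

A physically contiguous request rounds up to a power-of-two run of pages, so it can fail on a fragmented system even when plenty of single pages are free.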
     
  • These two patches are supposed to "fix" failed order-4 memory
    allocations which have been observed when reading /proc/stat. The
    problem has been observed on s390 as well as on x86.

    To address the problem, change the seq_file memory allocations to
    fall back to vmalloc, so that allocations also work if memory is
    fragmented.

    This approach seems to be simpler and less intrusive than changing
    /proc/stat to use an iterator. Also it "fixes" other users as well,
    which use seq_file's single_open() interface.

    This patch (of 2):

    Use seq_file's single_open_size() to preallocate a buffer that is large
    enough to hold the whole output, instead of open coding it. Also
    calculate the requested size using the number of online cpus instead of
    possible cpus, since the size of the output only depends on the number
    of online cpus.

    Signed-off-by: Heiko Carstens
    Acked-by: David Rientjes
    Cc: Ian Kent
    Cc: Hendrik Brueckner
    Cc: Thorsten Diehl
    Cc: Andrea Righi
    Cc: Christoph Hellwig
    Cc: Al Viro
    Cc: Stefan Bader
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Heiko Carstens
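
The sizing change in the single_open_size() entry above, online CPUs instead of possible CPUs, can be illustrated with a simple estimate. The constants below (a 1024-byte base, 128 bytes per CPU, 2 bytes per interrupt) and the function name stat_buffer_size() are illustrative assumptions, not values quoted in the commit message.

```python
def stat_buffer_size(cpus: int, nr_irqs: int, base: int = 1024,
                     per_cpu: int = 128, per_irq: int = 2) -> int:
    """Estimate the buffer needed for one read of a /proc/stat-like file."""
    return base + per_cpu * cpus + per_irq * nr_irqs


# Hypothetical machine: 256 possible CPUs, of which only 4 are online.
print(stat_buffer_size(cpus=256, nr_irqs=512))   # sized by possible CPUs
print(stat_buffer_size(cpus=4, nr_irqs=512))     # sized by online CPUs
```

Because the output only contains a line per online CPU, sizing by the online count asks for a smaller preallocated buffer on machines where many CPUs are merely possible.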
     
  • Until now, the kernel has had the same policy for handling victimized page
    frames that belong to kernel space (reserved/slab subsystem) or are
    non-LRU (unknown page state). In other words, the result of handling
    either of these victimized page frames is (IGNORED | FAILED), and the
    return value of memory_failure() is -EBUSY.

    This patch keeps memory_failure() from returning early merely because
    (!PageLRU(p)) is true, and it also ensures that action_result() can
    report more precise information ("reserved kernel", "kernel slab", and
    "unknown page state") instead of "non LRU", especially for memory
    errors which are detected by memory scrubbing.

    Andi said:

    : While running the mcelog test suite on 3.14 I hit the following VM_BUG_ON:
    :
    : soft_offline: 0x56d4: unknown non LRU page type 3ffff800008000
    : page:ffffea000015b400 count:3 mapcount:2097169 mapping: (null) index:0xffff8800056d7000
    : page flags: 0x3ffff800004081(locked|slab|head)
    : ------------[ cut here ]------------
    : kernel BUG at mm/rmap.c:1495!
    :
    : I think what happened is that a LRU page turned into a slab page in
    : parallel with offlining. memory_failure initially tests for this case,
    : but doesn't retest later after the page has been locked.
    :
    : ...
    :
    : I ran this patch in a loop over night with some stress plus
    : the mcelog test suite running in a loop. I cannot guarantee it hit it,
    : but it should have given it a good beating.
    :
    : The kernel survived with no messages, although the mcelog test suite
    : got killed at some point because it couldn't fork anymore. Probably
    : some unrelated problem.
    :
    : So the patch is ok for me for .16.

    Signed-off-by: Chen Yucong
    Acked-by: Naoya Horiguchi
    Reported-by: Andi Kleen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Chen Yucong
     
  • When using trace-vmscan-postprocess.pl to check the file/anon rate
    of scanning, the script cannot do so. Instead, the following message
    is reported:

    WARNING: Format not as expected for event vmscan/mm_vmscan_lru_isolate
    'file' != 'contig_taken' Fewer fields than expected in format at
    ./trace-vmscan-postprocess.pl line 171, line 76.

    In trace-vmscan-postprocess.pl, contig_taken, contig_dirty, and
    contig_failed are associated respectively with nr_lumpy_taken,
    nr_lumpy_dirty, and nr_lumpy_failed for lumpy reclaim. Lumpy reclaim was
    already removed by Mel in commit c53919adc045 ("mm: vmscan: remove lumpy
    reclaim"), but the update for trace-vmscan-postprocess.pl was missed.

    Signed-off-by: Chen Yucong
    Cc: Mel Gorman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Chen Yucong
     
  • Fix a regression caused by 7fc34a62ca44 ("mm/msync.c: sync only the
    requested range in msync()").

    xfstests generic/075 failed on ext4 in data=journal mode because the
    intended range was not synced, owing to a wrong fstart calculation.

    Signed-off-by: Namjae Jeon
    Signed-off-by: Ashish Sangwan
    Reported-by: Eric Whitney
    Tested-by: Eric Whitney
    Acked-by: Matthew Wilcox
    Reviewed-by: Lukas Czerner
    Tested-by: Lukas Czerner
    Reviewed-by: Theodore Ts'o
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Namjae Jeon
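
The fstart calculation mentioned in the msync entry above maps a virtual address inside a mapping to an offset in the backing file. The sketch below models that mapping. It assumes 4 KiB pages and assumes the mistake was a failure to subtract the start of the mapping, which the commit message does not spell out. The names file_offset() and file_offset_unadjusted() are hypothetical.

```python
PAGE_SHIFT = 12  # assumption: 4 KiB pages


def file_offset(start: int, vm_start: int, vm_pgoff: int) -> int:
    """File offset backing virtual address `start` within one mapping."""
    return (start - vm_start) + (vm_pgoff << PAGE_SHIFT)


def file_offset_unadjusted(start: int, vm_pgoff: int) -> int:
    """Assumed form of the mistake: the mapping base is never subtracted."""
    return start + (vm_pgoff << PAGE_SHIFT)


# Hypothetical mapping: starts at 0x7f0000000000 and maps the file from page 3.
vm_start, vm_pgoff = 0x7F0000000000, 3
start = vm_start + 0x2000                      # sync begins two pages in
print(hex(file_offset(start, vm_start, vm_pgoff)))
print(hex(file_offset_unadjusted(start, vm_pgoff)))
```

With the unadjusted form the computed range lands far beyond the end of the file, so the pages the caller asked for are never written back.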
     
  • Alexander reported that mkswap on /dev/zram0 fails if another process
    has the block device file open.

    The steps are as follows:

    0. Reset the unused zram device.
    1. Use a program that opens /dev/zram0 with O_RDWR and sleeps
    until killed.
    2. While that program sleeps, echo the correct value to
    /sys/block/zram0/disksize.
    3. Verify (e.g. in /proc/partitions) that the disk size is applied
    correctly. It is.
    4. While that program still sleeps, attempt to mkswap /dev/zram0.
    This fails: mkswap: error: swap area needs to be at least 40 KiB

    When I investigated, the size that mkswap obtained via
    ioctl(fd, BLKGETSIZE64, xxx) was zero, although zram0 had the right
    size after step 2.

    The reason is that zram did not revalidate the disk after changing its
    capacity, so the size in the block device's inode is not up to date until
    every open file on the device is closed.

    This patch should fix the BUG.

    Signed-off-by: Minchan Kim
    Reported-by: Alexander E. Patrakov
    Tested-by: Alexander E. Patrakov
    Reviewed-by: Sergey Senozhatsky
    Cc: Nitin Gupta
    Acked-by: Jerome Marchand
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Minchan Kim
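
The size query mentioned in the zram entry above, ioctl(fd, BLKGETSIZE64, ...), can be reproduced from user space to observe the stale value. This is a hedged sketch. It builds the request number from the common Linux ioctl encoding with an 8-byte size_t, which is an assumption that holds on x86 and arm but not on every architecture. The function name block_device_size() is hypothetical.

```python
import fcntl
import os
import struct

# _IOR(0x12, 114, size_t) under the common Linux encoding, 8-byte size_t.
# Assumption: x86/arm-style direction bits; some architectures differ.
_IOC_READ = 2
BLKGETSIZE64 = (_IOC_READ << 30) | (8 << 16) | (0x12 << 8) | 114


def block_device_size(path: str) -> int:
    """Ask the kernel for a block device's size in bytes."""
    fd = os.open(path, os.O_RDONLY)
    try:
        # ioctl() fills an 8-byte buffer with the size as an unsigned integer.
        buf = fcntl.ioctl(fd, BLKGETSIZE64, b"\0" * 8)
        return struct.unpack("Q", buf)[0]
    finally:
        os.close(fd)
```

Calling block_device_size("/dev/zram0") as root at step 4 of the reproduction would be expected to show the zero size that made mkswap fail before the fix.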
     
  • on-off-test uses "$UID != 0" to test for root, but $UID is a construct
    specific to bash. Using /bin/sh that isn't bash results in the
    following error (due to the "$UID" part expanding to nothing):

    ./on-off-test.sh: 9: [: !=: unexpected operator

    Change Makefile to use bash instead.

    Signed-off-by: Shuah Khan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Shuah Khan
     
  • on-off-test uses "$UID != 0" to test for root, but $UID is a construct
    specific to bash. Using /bin/sh that isn't bash results in the
    following error (due to the "$UID" part expanding to nothing):

    ./on-off-test.sh: 9: [: !=: unexpected operator

    Change Makefile to use bash instead.

    Signed-off-by: Shuah Khan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Shuah Khan
     
  • On strict build environments we can see:

    fs/autofs4/inode.c: In function 'autofs4_fill_super':
    fs/autofs4/inode.c:312: error: 'pgrp' may be used uninitialized in this function
    make[2]: *** [fs/autofs4/inode.o] Error 1
    make[1]: *** [fs/autofs4] Error 2
    make: *** [fs] Error 2
    make: *** Waiting for unfinished jobs....

    This is due to pgrp_set being used to indicate that pgrp has been set,
    rather than pgrp itself being initialized.

    Signed-off-by: Ian Kent
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ian Kent
     
  • min_partial is the minimum number of slabs cached on the node partial list.
    So, if nr_partial is less than min_partial, we keep a newly empty slab on the
    node partial list rather than freeing it. But if nr_partial is equal to or
    greater than min_partial, we have enough partial slabs and should free the
    newly empty slab. The current implementation misses the equal case, so
    if min_partial is set to 0, at least one slab can still be cached.
    This is a critical problem for the kmemcg destroying logic, because it does
    not work properly if some slabs are cached. This patch fixes this problem.

    Fixes 91cb69620284 ("slub: make dead memcg caches discard free slabs
    immediately").

    Signed-off-by: Joonsoo Kim
    Acked-by: Vladimir Davydov
    Cc: Christoph Lameter
    Cc: Pekka Enberg
    Acked-by: David Rientjes
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Joonsoo Kim
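
The off-by-one in the slub entry above is the difference between a strict and a non-strict comparison. The sketch below models the decision for a newly empty slab. The function names are hypothetical, and the two rules are written from the description in the message, not copied from the patch.

```python
def should_free_empty_slab(nr_partial: int, min_partial: int) -> bool:
    """Rule as described after the fix: free once min_partial slabs are cached."""
    return nr_partial >= min_partial


def should_free_empty_slab_old(nr_partial: int, min_partial: int) -> bool:
    """Rule as described before the fix: the equal case is missed."""
    return nr_partial > min_partial


# With min_partial set to 0 and nothing cached yet:
print(should_free_empty_slab_old(0, 0))   # old rule decides to keep the slab
print(should_free_empty_slab(0, 0))       # fixed rule decides to free it
```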
     
  • With a kernel configured with ARM64_64K_PAGES && !TRANSPARENT_HUGEPAGE,
    the following is triggered at early boot:

    SMP: Total of 8 processors activated.
    devtmpfs: initialized
    Unable to handle kernel NULL pointer dereference at virtual address 00000008
    pgd = fffffe0000050000
    [00000008] *pgd=00000043fba00003, *pmd=00000043fba00003, *pte=00e0000078010407
    Internal error: Oops: 96000006 [#1] SMP
    Modules linked in:
    CPU: 0 PID: 1 Comm: swapper/0 Not tainted 3.15.0-rc864k+ #44
    task: fffffe03bc040000 ti: fffffe03bc080000 task.ti: fffffe03bc080000
    PC is at __list_add+0x10/0xd4
    LR is at free_one_page+0x270/0x638
    ...
    Call trace:
    __list_add+0x10/0xd4
    free_one_page+0x26c/0x638
    __free_pages_ok.part.52+0x84/0xbc
    __free_pages+0x74/0xbc
    init_cma_reserved_pageblock+0xe8/0x104
    cma_init_reserved_areas+0x190/0x1e4
    do_one_initcall+0xc4/0x154
    kernel_init_freeable+0x204/0x2a8
    kernel_init+0xc/0xd4

    This happens because init_cma_reserved_pageblock() calls
    __free_one_page() with pageblock_order as page order but it is bigger
    than MAX_ORDER. This in turn causes accesses past zone->free_list[].

    Fix the problem by changing init_cma_reserved_pageblock() such that it
    splits pageblock into individual MAX_ORDER pages if pageblock is bigger
    than a MAX_ORDER page.

    In cases where !CONFIG_HUGETLB_PAGE_SIZE_VARIABLE, which is all
    architectures except for ia64, powerpc and tile at the moment, the
    “pageblock_order > MAX_ORDER” condition will be optimised out since both
    sides of the operator are constants. In cases where pageblock size is
    variable, the performance degradation should not be significant anyway
    since init_cma_reserved_pageblock() is called only at boot time at most
    MAX_CMA_AREAS times which by default is eight.

    Signed-off-by: Michal Nazarewicz
    Reported-by: Mark Salter
    Tested-by: Mark Salter
    Tested-by: Christopher Covington
    Cc: Mel Gorman
    Cc: David Rientjes
    Cc: Marek Szyprowski
    Cc: Catalin Marinas
    Cc: [3.5+]
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michal Nazarewicz
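
The splitting described in the CMA entry above can be modelled as cutting one pageblock into chunks the buddy allocator accepts. This is a sketch under an explicit assumption: the largest order that can be freed in one call is max_order - 1, so MAX_ORDER is treated as an exclusive bound. The function name split_pageblock() and the example orders are hypothetical.

```python
def split_pageblock(pageblock_order: int, max_order: int):
    """Return (page_offset, order) chunks used to free one pageblock.

    Assumption: the largest chunk the allocator accepts has order
    max_order - 1, i.e. max_order is an exclusive upper bound.
    """
    largest = max_order - 1
    if pageblock_order <= largest:
        # Small enough to hand over in a single call.
        return [(0, pageblock_order)]
    chunk_pages = 1 << largest
    total_pages = 1 << pageblock_order
    return [(offset, largest) for offset in range(0, total_pages, chunk_pages)]


# Hypothetical 64K-page configuration: pageblock order 13, max_order 11.
chunks = split_pageblock(13, 11)
print(len(chunks))      # number of separate frees needed
print(chunks[:2])       # first two (offset, order) pairs
```

Every chunk stays within the range of orders that the free lists index, which is what avoids the access past zone->free_list[] described in the entry.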
     

03 Jul, 2014

5 commits

  • Often when starting a transaction we commit the currently running transaction,
    which can end up writing block group caches when the current process has its
    journal_info set to NULL (and not to a transaction). This makes our assertion
    at btrfs_check_data_free_space() (current_journal != NULL) fail, resulting
    in a crash/hang. Therefore fix it by setting journal_info.

    Two different traces of this issue follow below.

    1)

    [51502.241936] BTRFS: assertion failed: current->journal_info, file: fs/btrfs/extent-tree.c, line: 3670
    [51502.242213] ------------[ cut here ]------------
    [51502.242493] kernel BUG at fs/btrfs/ctree.h:3964!
    [51502.242669] invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC
    (...)
    [51502.244010] Call Trace:
    [51502.244010] [] btrfs_check_data_free_space+0x395/0x3a0 [btrfs]
    [51502.244010] [] btrfs_write_dirty_block_groups+0x4ac/0x640 [btrfs]
    [51502.244010] [] commit_cowonly_roots+0x164/0x226 [btrfs]
    [51502.244010] [] btrfs_commit_transaction+0x4ed/0xab0 [btrfs]
    [51502.244010] [] ? _raw_spin_unlock+0x2b/0x40
    [51502.244010] [] start_transaction+0x459/0x620 [btrfs]
    [51502.244010] [] btrfs_start_transaction+0x1b/0x20 [btrfs]
    [51502.244010] [] __unlink_start_trans+0x31/0xe0 [btrfs]
    [51502.244010] [] btrfs_unlink+0x37/0xc0 [btrfs]
    [51502.244010] [] ? do_unlinkat+0x114/0x2a0
    [51502.244010] [] vfs_unlink+0xcc/0x150
    [51502.244010] [] do_unlinkat+0x260/0x2a0
    [51502.244010] [] ? filp_close+0x64/0x90
    [51502.244010] [] ? trace_hardirqs_on_caller+0x16/0x1e0
    [51502.244010] [] ? trace_hardirqs_on_thunk+0x3a/0x3f
    [51502.244010] [] SyS_unlinkat+0x1b/0x40
    [51502.244010] [] system_call_fastpath+0x16/0x1b
    [51502.244010] Code: 0b 55 48 89 e5 0f 0b 55 48 89 e5 0f 0b 55 89 f1 48 c7 c2 71 13 36 a0 48 89 fe 31 c0 48 c7 c7 b8 43 36 a0 48 89 e5 e8 5d b0 32 e1 0b 0f 1f 44 00 00 55 b9 11 00 00 00 48 89 e5 41 55 49 89 f5
    [51502.244010] RIP [] assfail.constprop.88+0x1e/0x20 [btrfs]

    2)

    [25405.097230] BTRFS: assertion failed: current->journal_info, file: fs/btrfs/extent-tree.c, line: 3670
    [25405.097488] ------------[ cut here ]------------
    [25405.097767] kernel BUG at fs/btrfs/ctree.h:3964!
    [25405.097940] invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC
    (...)
    [25405.100008] Call Trace:
    [25405.100008] [] btrfs_check_data_free_space+0x395/0x3a0 [btrfs]
    [25405.100008] [] btrfs_write_dirty_block_groups+0x4ac/0x640 [btrfs]
    [25405.100008] [] commit_cowonly_roots+0x164/0x226 [btrfs]
    [25405.100008] [] btrfs_commit_transaction+0x4ed/0xab0 [btrfs]
    [25405.100008] [] ? bit_waitqueue+0xc0/0xc0
    [25405.100008] [] start_transaction+0x459/0x620 [btrfs]
    [25405.100008] [] btrfs_start_transaction+0x1b/0x20 [btrfs]
    [25405.100008] [] btrfs_create+0x47/0x210 [btrfs]
    [25405.100008] [] ? btrfs_permission+0x3c/0x80 [btrfs]
    [25405.100008] [] vfs_create+0x9b/0x130
    [25405.100008] [] do_last+0x849/0xe20
    [25405.100008] [] ? link_path_walk+0x79/0x820
    [25405.100008] [] path_openat+0xc5/0x690
    [25405.100008] [] ? trace_hardirqs_on+0xd/0x10
    [25405.100008] [] ? __alloc_fd+0x32/0x1d0
    [25405.100008] [] do_filp_open+0x43/0xa0
    [25405.100008] [] ? __alloc_fd+0x151/0x1d0
    [25405.100008] [] do_sys_open+0x13c/0x230
    [25405.100008] [] ? trace_hardirqs_on_caller+0x16/0x1e0
    [25405.100008] [] SyS_open+0x22/0x30
    [25405.100008] [] system_call_fastpath+0x16/0x1b
    [25405.100008] Code: 0b 55 48 89 e5 0f 0b 55 48 89 e5 0f 0b 55 89 f1 48 c7 c2 51 13 36 a0 48 89 fe 31 c0 48 c7 c7 d0 43 36 a0 48 89 e5 e8 6d b5 32 e1 0b 0f 1f 44 00 00 55 b9 11 00 00 00 48 89 e5 41 55 49 89 f5
    [25405.100008] RIP [] assfail.constprop.88+0x1e/0x20 [btrfs]

    Signed-off-by: Filipe David Borba Manana
    Signed-off-by: Chris Mason

    Filipe Manana
     
  • We wouldn't actually print the extent information if we had a skinny metadata
    item; this fixes that. Thanks,

    Signed-off-by: Josef Bacik
    Signed-off-by: Chris Mason

    Josef Bacik
     
  • The percpu counter @total_bytes_pinned was introduced to skip unnecessary
    'commit transaction' operations; it accounts for space that we may free
    but that is stuck in delayed refs.

    And we zero out @space_info->total_bytes_pinned every transaction period so
    we have a better idea of how much space we'll actually free up by committing
    this transaction. However, we do the 'zero out' part a little earlier, before
    we actually unpin space, so we end up returning ENOSPC when we actually have
    free space that's just unpinned from committing transaction.

    xfstests/generic/074 complained then.

    This fixes it by actually accounting the percpu pinned number when 'unpin',
    and since it's protected by space_info->lock, the race is gone now.

    Signed-off-by: Liu Bo
    Reviewed-by: Miao Xie
    Signed-off-by: Chris Mason

    Liu Bo
     
  • Return codes got updated in 60e1975acb48fc3d74a3422b21dde74c977ac3d5
    (btrfs: return errno instead of -1 from compression).
    The lzo wrapper returns E2BIG in this case; do the same for zlib.

    Signed-off-by: David Sterba

    David Sterba
     
  • Commit fcebe4562dec83b3f8d3088d77584727b09130b2 (Btrfs: rework qgroup
    accounting) removed the qgroup accounting after delayed refs.

    Signed-off-by: David Sterba

    David Sterba