28 Apr, 2014

1 commit

  • A race exists between module loading and enabling of function tracer.

    CPU 1                                   CPU 2
    -----                                   -----
    load_module()
     module->state = MODULE_STATE_COMING

                                            register_ftrace_function()
                                             mutex_lock(&ftrace_lock);
                                             ftrace_startup()
                                              update_ftrace_function();
                                               ftrace_arch_code_modify_prepare()
                                                set_all_module_text_rw();

                                               ftrace_arch_code_modify_post_process()
                                                set_all_module_text_ro();

                                            [ here all module text is set to RO,
                                              including the module that is
                                              loading!! ]

     blocking_notifier_call_chain(MODULE_STATE_COMING);
      ftrace_init_module()

       [ tries to modify code, but it's RO, and fails!
         ftrace_bug() is called]

    When this race happens, ftrace_bug() will produce a nasty warning and
    all of the function tracing features will be disabled until reboot.

    The simple solution is to treat module load the same way the core
    kernel is treated at boot: hardcode the ftrace function modification
    that converts calls to mcount into nops. This is done in init/main.c
    for the core kernel, and there's no reason it could not be done in
    load_module(). This gives better control of the changes and doesn't
    tie the state of the module to its notifiers as much. Ftrace is
    special; it needs to be treated as such.

    The reason this would work, is that the ftrace_module_init() would be
    called while the module is in MODULE_STATE_UNFORMED, which is ignored
    by the set_all_module_text_ro() call.
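
    A minimal userspace sketch of why running the ftrace setup while the
    module is still unformed avoids the race. Every name here (the
    sketch_* types and function) is hypothetical and only models the
    behaviour described above, namely that the "set all module text RO"
    pass skips modules in the unformed state; it is not the kernel code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the module states named in the text. */
enum sketch_module_state {
	SKETCH_STATE_LIVE,
	SKETCH_STATE_COMING,
	SKETCH_STATE_GOING,
	SKETCH_STATE_UNFORMED,
};

struct sketch_module {
	enum sketch_module_state state;
	bool text_ro;	/* true once the module text has been made read-only */
};

/*
 * Models a "make all module text read-only" pass that ignores modules
 * which are still unformed, so code patching on such a module cannot be
 * undercut by a concurrent tracer enable on another CPU.
 */
static void sketch_set_all_text_ro(struct sketch_module *mods, size_t count)
{
	assert(mods != NULL || count == 0);

	for (size_t i = 0; i < count; i++) {
		if (mods[i].state == SKETCH_STATE_UNFORMED)
			continue;
		mods[i].text_ro = true;
	}
}
```

    In this model a module in the COMING state gets its text flipped to
    read-only by the pass, while an UNFORMED one is left writable, which
    is the window the fix relies on.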

    Link: http://lkml.kernel.org/r/1395637826-3312-1-git-send-email-indou.takao@jp.fujitsu.com

    Reported-by: Takao Indoh
    Acked-by: Rusty Russell
    Cc: stable@vger.kernel.org # 2.6.38+
    Signed-off-by: Steven Rostedt

    Steven Rostedt (Red Hat)
     

21 Apr, 2014

4 commits


20 Apr, 2014

9 commits

  • …it/jolsa/perf into perf/urgent

    Pull perf/urgent fixes from Jiri Olsa:

    User visible changes:

    * Adjust symbols in VDSO to properly resolve its function names (Vladimir Nikulichev)

    * Improve error reporting for record session failure (Adrien BAK)

    * Fix 'Min time' counting in report command (Alexander Yarygin)

    Signed-off-by: Jiri Olsa <jolsa@redhat.com>
    Signed-off-by: Ingo Molnar <mingo@kernel.org>

    Ingo Molnar
     
  • In the current version, when using perf record, if something goes
    wrong in tools/perf/builtin-record.c:375
    session = perf_session__new(file, false, NULL);

    The error message:
    "Not enough memory for reading per file header"

    is issued. This error message seems to be outdated and is not very
    helpful. This patch proposes to replace this error message by
    "Perf session creation failed"

    I believe this issue has been brought to lkml:
    https://lkml.org/lkml/2014/2/24/458
    although this patch only tackles a (small) part of the issue.

    Additionally, this patch improves error reporting in
    tools/perf/util/data.c open_file_write.

    Currently, if the call to open fails, the user is unaware of it.
    This patch logs the error, before returning the error code to
    the caller.
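
    The "log, then return the error code" pattern described above can be
    sketched in plain C as follows. The function name
    open_for_write_logged() and the message wording are assumptions made
    for illustration; the sketch uses fprintf() where the perf tool would
    use its own logging helpers.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>

/*
 * Hypothetical helper: open a file for writing and report a failure to
 * the user before handing the negative descriptor back to the caller.
 */
static int open_for_write_logged(const char *path)
{
	int fd;

	assert(path != NULL);

	fd = open(path, O_CREAT | O_RDWR | O_TRUNC, S_IRUSR | S_IWUSR);
	if (fd < 0) {
		/* Save errno: fprintf() is allowed to overwrite it. */
		int saved_errno = errno;

		fprintf(stderr, "failed to open %s: %s\n",
			path, strerror(saved_errno));
		errno = saved_errno;
	}

	return fd;
}
```

    The caller still receives the failing return value and an intact
    errno, so existing error handling keeps working; the only change is
    that the user now sees why the open failed.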

    Reported-by: Will Deacon
    Signed-off-by: Adrien BAK
    Link: http://lkml.kernel.org/r/1397786443.3093.4.camel@beast
    [ Reorganize the changelog into paragraphs ]
    [ Added empty line after fd declaration in open_file_write ]
    Signed-off-by: Jiri Olsa

    Adrien BAK
     
  • perf-report doesn't resolve function names in VDSO:

    $ perf report --stdio -g flat,0.0,15,callee --sort pid
    ...
    8.76%
    0x7fff6b1fe861
    __gettimeofday
    ACE_OS::gettimeofday()
    ...

    In this case symbol values should be adjusted the same way as for executables,
    relocatable objects and prelinked libraries.

    After fix:

    $ perf report --stdio -g flat,0.0,15,callee --sort pid
    ...
    8.76%
    __vdso_gettimeofday
    __gettimeofday
    ACE_OS::gettimeofday()

    Signed-off-by: Vladimir Nikulichev
    Tested-by: Namhyung Kim
    Reviewed-by: Adrian Hunter
    Link: http://lkml.kernel.org/r/969812.163009436-sendEmail@nvs
    Signed-off-by: Jiri Olsa

    Vladimir Nikulichev
     
  • Every event in the perf-kvm has a 'stats' structure, which contains
    max/min/average/etc times of handling this event.
    The problem is that the 'perf-kvm stat report' command always shows
    that 'min time' is 0us for every event. Example:

    # perf kvm stat report

    Analyze events for all VCPUs:

    VM-EXIT Samples Samples% Time% Min Time Max Time Avg time
    [..]
    0xB2 MSCH 12 0.07% 0.00% 0us 8us 7.31us ( +- 2.11% )
    0xB2 CHSC 12 0.07% 0.00% 0us 18us 9.39us ( +- 9.49% )
    0xB2 STPX 8 0.05% 0.00% 0us 2us 1.88us ( +- 7.18% )
    0xB2 STSI 7 0.04% 0.00% 0us 44us 16.49us ( +- 38.20% )
    [..]

    This happens because the 'stats' structure is not initialized and
    stats->min equals 0. Let's initialize the structure for every
    event after its allocation using the init_stats() function. This
    initializes stats->min to -1 and makes 'Min time' statistics counting work:

    # perf kvm stat report

    Analyze events for all VCPUs:

    VM-EXIT Samples Samples% Time% Min Time Max Time Avg time
    [..]
    0xB2 MSCH 12 0.07% 0.00% 6us 8us 7.31us ( +- 2.11% )
    0xB2 CHSC 12 0.07% 0.00% 7us 18us 9.39us ( +- 9.49% )
    0xB2 STPX 8 0.05% 0.00% 1us 2us 1.88us ( +- 7.18% )
    0xB2 STSI 7 0.04% 0.00% 1us 44us 16.49us ( +- 38.20% )
    [..]

    Signed-off-by: Alexander Yarygin
    Signed-off-by: Christian Borntraeger
    Reviewed-by: David Ahern
    Link: http://lkml.kernel.org/r/1397053319-2130-3-git-send-email-borntraeger@de.ibm.com
    [ Fixing the perf examples changelog output ]
    Signed-off-by: Jiri Olsa

    Alexander Yarygin
     
  • A va_list needs to be copied in case it needs to be used twice.

    Thanks to Hugh for debugging this issue, leading to various panics.
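
    As an illustration of the rule, here is a userspace sketch of a
    growable-buffer printf that may need to format the same arguments
    twice. The names strbuf, strbuf_vprintf and strbuf_printf are
    hypothetical; the sketch assumes only the standard va_copy() and
    vsnprintf() behaviour and is not the kernel's coredump code.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable string buffer. */
struct strbuf {
	char *data;
	size_t size;	/* allocated bytes */
	size_t used;	/* bytes in use, excluding the trailing NUL */
};

/*
 * Append formatted text, growing the buffer and retrying when it is too
 * small. Each vsnprintf() call consumes a fresh copy of 'arg', because a
 * va_list is indeterminate once it has been traversed.
 */
static int strbuf_vprintf(struct strbuf *sb, const char *fmt, va_list arg)
{
	for (;;) {
		size_t avail = sb->size - sb->used;
		size_t new_size;
		va_list arg_copy;
		char *p;
		int need;

		assert(sb->used <= sb->size);

		va_copy(arg_copy, arg);
		need = vsnprintf(sb->data ? sb->data + sb->used : NULL,
				 avail, fmt, arg_copy);
		va_end(arg_copy);

		if (need < 0)
			return -1;
		if ((size_t)need < avail) {
			sb->used += (size_t)need;
			return 0;
		}

		/* Too small: grow to fit the text plus NUL, then retry. */
		new_size = sb->used + (size_t)need + 1;
		p = realloc(sb->data, new_size);
		if (!p)
			return -1;
		sb->data = p;
		sb->size = new_size;
	}
}

static int strbuf_printf(struct strbuf *sb, const char *fmt, ...)
{
	va_list arg;
	int ret;

	va_start(arg, fmt);
	ret = strbuf_vprintf(sb, fmt, arg);
	va_end(arg);
	return ret;
}
```

    Passing 'arg' itself to both vsnprintf() calls would leave the second
    call reading arguments from wherever the first one stopped, which is
    the kind of corruption the (null) in the test output below shows.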

    Tested:

    lpq84:~# echo "|/foobar12345 %h %h %h %h %h %h %h %h %h %h %h %h %h %h %h %h %h %h %h %h" >/proc/sys/kernel/core_pattern

    'produce_core' is simply : main() { *(int *)0 = 1;}

    lpq84:~# ./produce_core
    Segmentation fault (core dumped)
    lpq84:~# dmesg | tail -1
    [ 614.352947] Core dump to |/foobar12345 lpq84 lpq84 lpq84 lpq84 lpq84 lpq84 lpq84 lpq84 lpq84 lpq84 lpq84 lpq84 lpq84 lpq84 lpq84 lpq84 lpq84 lpq84 lpq84 (null) pipe failed

    Notice the last argument was replaced by a NULL (we were lucky enough to
    not crash, but do not try this on your production machine !)

    After fix :

    lpq83:~# echo "|/foobar12345 %h %h %h %h %h %h %h %h %h %h %h %h %h %h %h %h %h %h %h %h" >/proc/sys/kernel/core_pattern
    lpq83:~# ./produce_core
    Segmentation fault
    lpq83:~# dmesg | tail -1
    [ 740.800441] Core dump to |/foobar12345 lpq83 lpq83 lpq83 lpq83 lpq83 lpq83 lpq83 lpq83 lpq83 lpq83 lpq83 lpq83 lpq83 lpq83 lpq83 lpq83 lpq83 lpq83 lpq83 lpq83 pipe failed

    Fixes: 5fe9d8ca21cc ("coredump: cn_vprintf() has no reason to call vsnprintf() twice")
    Signed-off-by: Eric Dumazet
    Diagnosed-by: Hugh Dickins
    Acked-by: Oleg Nesterov
    Cc: Neil Horman
    Cc: Andrew Morton
    Cc: stable@vger.kernel.org # 3.11+
    Signed-off-by: Linus Torvalds

    Eric Dumazet
     
  • Pull x86 fix from Ingo Molnar:
    "This fixes the preemption-count imbalance crash reported by Owen
    Kibel"

    * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    x86/mce: Fix CMCI preemption bugs

    Linus Torvalds
     
  • Pull scheduler fixes from Ingo Molnar:
    "Two fixes:

    - a SCHED_DEADLINE task selection fix
    - a sched/numa related lockdep splat fix"

    * 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    sched: Check for stop task appearance when balancing happens
    sched/numa: Fix task_numa_free() lockdep splat

    Linus Torvalds
     
  • Pull perf fixes from Ingo Molnar:
    "Two kernel side fixes:

    - an Intel uncore PMU driver potential crash fix
    - a kprobes/perf-call-graph interaction fix"

    * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    perf/x86/intel: Use rdmsrl_safe() when initializing RAPL PMU
    kprobes/x86: Fix page-fault handling logic

    Linus Torvalds
     
  • Pull drm fixes from Dave Airlie:
    "Unfortunately this contains no easter eggs. It's a bit larger than I'd
    like, but I included a patch that just moves code from one file to
    another and I'd like to avoid merge conflicts with that later, so it
    makes it seem worse than it is.

    Otherwise:
    - radeon: fixes to use new microcode to stabilise some cards, use
    some common displayport code, some runtime pm fixes, pll regression
    fixes
    - i915: fix for some context oopses, a warn in a used path, backlight
    fixes
    - nouveau: regression fix
    - omap: a bunch of fixes"

    * 'drm-fixes' of git://people.freedesktop.org/~airlied/linux: (51 commits)
    drm: bochs: drop unused struct fields
    drm: bochs: add power management support
    drm: cirrus: add power management support
    drm: Split out drm_probe_helper.c from drm_crtc_helper.c
    drm/plane-helper: Don't fake-implement primary plane disabling
    drm/ast: fix value check in cbr_scan2
    drm/nouveau/bios: fix a bit shift error introduced by 457e77b
    drm/radeon/ci: make sure mc ucode is loaded before checking the size
    drm/radeon/si: make sure mc ucode is loaded before checking the size
    drm/radeon: improve PLL params if we don't match exactly v2
    drm/radeon: memory leak on bo reservation failure. v2
    drm/radeon: fix VCE fence command
    drm/radeon: re-enable mclk dpm on R7 260X asics
    drm/radeon: add support for newer mc ucode on CI (v2)
    drm/radeon: add support for newer mc ucode on SI (v2)
    drm/radeon: apply more strict limits for PLL params v2
    drm/radeon: update CI DPM powertune settings
    drm/radeon: fix runpm handling on APUs (v4)
    drm/radeon: disable mclk dpm on R7 260X
    drm/tegra: Remove gratuitous pad field
    ...

    Linus Torvalds
     

19 Apr, 2014

26 commits

  • Some i2c fixes over DisplayPort.

    * 'drm-next-3.15-wip' of git://people.freedesktop.org/~deathsimple/linux:
    drm/radeon: Improve vramlimit module param documentation
    drm/radeon: fix audio pin counts for DCE6+ (v2)
    drm/radeon/dp: switch to the common i2c over aux code
    drm/dp/i2c: Update comments about common i2c over dp assumptions (v3)
    drm/dp/i2c: send bare addresses to properly reset i2c connections (v4)
    drm/radeon/dp: handle zero sized i2c over aux transactions (v2)
    drm/i915: support address only i2c-over-aux transactions
    drm/tegra: dp: Support address-only I2C-over-AUX transactions

    Dave Airlie
     
  • Pull more networking fixes from David Miller:

    1) Fix mlx4_en_netpoll implementation, it needs to schedule a NAPI
    context, not synchronize it. From Chris Mason.

    2) Ipv4 flow input interface should never be zero, it should be
    LOOPBACK_IFINDEX instead. From Cong Wang and Julian Anastasov.

    3) Properly configure MAC to PHY connection in mvneta devices, from
    Thomas Petazzoni.

    4) sys_recv should use SYSCALL_DEFINE. From Jan Glauber.

    5) Tunnel driver ioctls do not use the correct namespace, fix from
    Nicolas Dichtel.

    6) Fix memory leak on seccomp filter attach, from Kees Cook.

    7) Fix lockdep warning for nested vlans, from Ding Tianhong.

    8) Crashes can happen in SCTP due to how the auth_enable value is
    managed, fix from Vlad Yasevich.

    9) Wireless fixes from John W Linville and co.

    * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (45 commits)
    net: sctp: cache auth_enable per endpoint
    tg3: update rx_jumbo_pending ring param only when jumbo frames are enabled
    vlan: Fix lockdep warning when vlan dev handle notification
    seccomp: fix memory leak on filter attach
    isdn: icn: buffer overflow in icn_command()
    ip6_tunnel: use the right netns in ioctl handler
    sit: use the right netns in ioctl handler
    ip_tunnel: use the right netns in ioctl handler
    net: use SYSCALL_DEFINEx for sys_recv
    net: mdio-gpio: Add support for separate MDI and MDO gpio pins
    net: mdio-gpio: Add support for active low gpio pins
    net: mdio-gpio: Use devm_ functions where possible
    ipv4, route: pass 0 instead of LOOPBACK_IFINDEX to fib_validate_source()
    ipv4, fib: pass LOOPBACK_IFINDEX instead of 0 to flowi4_iif
    mlx4_en: don't use napi_synchronize inside mlx4_en_netpoll
    net: mvneta: properly configure the MAC PHY connection in all situations
    net: phy: add minimal support for QSGMII PHY
    sfc:On MCDI timeout, issue an FLR (and mark MCDI to fail-fast)
    mwifiex: fix hung task on command timeout
    mwifiex: process event before command response
    ...

    Linus Torvalds
     
  • Pull cifs fixes from Steve French:
    "A set of 5 small cifs fixes"

    * 'for-next' of git://git.samba.org/sfrench/cifs-2.6:
    cif: fix dead code
    cifs: fix error handling cifs_user_readv
    fs: cifs: remove unused variable.
    Return correct error on query of xattr on file with empty xattrs
    cifs: Wait for writebacks to complete before attempting write.

    Linus Torvalds
     
  • Pull char/misc driver fixes from Greg KH:
    "Here are a few driver fixes for char/misc drivers that resolve
    reported issues.

    All have been in linux-next successfully for a few days"

    * tag 'char-misc-3.15-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
    Drivers: hv: vmbus: Negotiate version 3.0 when running on ws2012r2 hosts
    Tools: hv: Handle the case when the target file exists correctly
    vme_tsi148: Utilize to_pci_dev() macro
    vme_tsi148: Fix PCI address mapping assumption
    vme_tsi148: Fix typo in tsi148_slave_get()
    w1: avoid recursive device_add
    w1: fix netlink refcnt leak on error path
    misc: Grammar s/addition/additional/
    drivers: mcb: fix memory leak in chameleon_parse_cells() error path
    mei: ignore client writing state during cb completion
    mei: me: do not load the driver if the FW doesn't support MEI interface
    GenWQE: Increase driver version number
    GenWQE: Fix multithreading problems
    GenWQE: Ensure rc is not returning an uninitialized value
    GenWQE: Add wmb before DDCB is started
    GenWQE: Enable access to VPD flash area

    Linus Torvalds
     
  • Pull driver core fixes from Greg KH:
    "Here are some driver core fixes for 3.15-rc2. Also in here are some
    documentation updates, as well as an API removal that had to wait for
    after -rc1 due to the cleanups coming into you from multiple developer
    trees (this one and the PPC tree.)

    All have been in linux next successfully"

    * tag 'driver-core-3.15-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
    drivers/base/dd.c incorrect pr_debug() parameters
    Documentation: Update stable address in Chinese and Japanese translations
    topology: Fix compilation warning when not in SMP
    Chinese: add translation of io_ordering.txt
    stable_kernel_rules: spelling/word usage
    sysfs, driver-core: remove unused {sysfs|device}_schedule_callback_owner()
    kernfs: protect lazy kernfs_iattrs allocation with mutex
    fs: Don't return 0 from get_anon_bdev

    Linus Torvalds
     
  • Pull staging driver fixes from Greg KH:
    "Here are a few staging driver fixes for issues that have been reported
    for 3.15-rc2.

    Also dominating the diffstat for the pull request is the removal of
    the rtl8187se driver. It's no longer needed in staging as a "real"
    driver for this hardware is now merged in the tree in the "correct"
    location in drivers/net/

    All of these patches have been tested in linux-next"

    * tag 'staging-3.15-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging:
    staging: r8188eu: Fix case where ethtype was never obtained and always be checked against 0
    staging: r8712u: Fix case where ethtype was never obtained and always be checked against 0
    staging: r8188eu: Calling rtw_get_stainfo() with a NULL sta_addr will return NULL
    staging: comedi: fix circular locking dependency in comedi_mmap()
    staging: r8723au: Add missing initialization of change_inx in sort algorithm
    Staging: unisys: use after free in list_for_each()
    staging: unisys: use after free in error messages
    staging: speakup: fix misuse of kstrtol() in handle_goto()
    staging: goldfish: Call free_irq in error path
    staging: delete rtl8187se wireless driver
    staging: rtl8723au: Fix buffer overflow in rtw_get_wfd_ie()
    staging: gs_fpgaboot: remove __TIMESTAMP__ macro
    staging: vme: fix memory leak in vme_user_probe()
    staging: fpgaboot: clean up Makefile
    staging/usbip: fix store_attach() sscanf return value check
    staging/usbip: userspace - fix usbipd SIGSEGV from refresh_exported_devices()
    staging: rtl8188eu: remove spaces, correct counts to unbreak P2P ioctls
    staging/rtl8821ae: Fix OOM handling in _rtl_init_deferred_work()

    Linus Torvalds
     
  • Pull tty/serial driver fixes from Greg KH:
    "Here are a number of small tty/serial driver fixes for 3.15-rc2. Also
    in here are some Documentation file removals for drivers that we
    removed a long time ago, no need to keep it around any longer.

    All of these have been in linux-next for a bit"

    * tag 'tty-3.15-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
    Revert "serial: 8250, disable "too much work" messages"
    serial: amba-pl011: fix regression, causing an Oops on rmmod
    tty: Fix help text of SYNCLINK_CS
    tty: fix memleak in alloc_pid
    ttyprintk: Allow built as a module
    ttyprintk: Fix wrong tty_unregister_driver() call in the error path
    serial: 8250, disable "too much work" messages
    Documentation/serial: Delete obsolete driver documentation
    serial: omap: Fix missing pm_runtime_resume handling by simplifying code
    serial_core: Fix pm imbalance on unbind
    serial: pl011: change Rx burst size to half of trigger level
    serial: timberdale: Depend on X86_32
    serial: st-asc: Fix SysRq char handling
    Revert "serial: clps711x: Give a chance to perform useful tasks during wait loop"
    serial_core: Fix conditional start_tx on ring buffer not empty
    serial: efm32: use $vendor,$device scheme for compatible string
    serial: omap: free the wakeup settings in remove

    Linus Torvalds
     
  • Pull USB fixes from Greg KH:
    "Here are a number of tiny USB fixes and new device ids for 3.15-rc2.
    Nothing major, just issues some people have reported.

    All of these have been in linux-next"

    * tag 'usb-3.15-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
    uas: fix deadlocky memory allocations
    uas: fix error handling during scsi_scan()
    uas: fix GFP_NOIO under spinlock
    uwb: adds missing error handling
    USB: cdc-acm: Remove Motorola/Telit H24 serial interfaces from ACM driver
    USB: ohci-jz4740: FEAT_POWER is a port feature, not a hub feature
    USB: ohci-jz4740: Fix uninitialized variable warning
    USB: EHCI: tegra: set txfill_tuning
    usb: ehci-platform: Return immediately from suspend if ehci_suspend fails
    usb: ehci-exynos: Return immediately from suspend if ehci_suspend fails
    USB: fix crash during hotplug of PCI USB controller card
    USB: cdc-acm: fix double usb_autopm_put_interface() in acm_port_activate()
    usb: usb-common: fix typo for usb_state_string
    USB: usb_wwan: fix handling of missing bulk endpoints
    USB: pl2303: add ids for Hewlett-Packard HP POS pole displays
    USB: cp210x: Add 8281 (Nanotec Plug & Drive)
    usb: option driver, add support for Telit UE910v2
    Revert "USB: serial: add usbid for dell wwan card to sierra.c"
    USB: serial: ftdi_sio: add id for Brainboxes serial cards

    Linus Torvalds
     
  • Merge misc fixes from Andrew Morton:
    "13 fixes"

    * emailed patches from Andrew Morton :
    thp: close race between split and zap huge pages
    mm: fix new kernel-doc warning in filemap.c
    mm: fix CONFIG_DEBUG_VM_RB description
    mm: use paravirt friendly ops for NUMA hinting ptes
    mips: export flush_icache_range
    mm/hugetlb.c: add cond_resched_lock() in return_unused_surplus_pages()
    wait: explain the shadowing and type inconsistencies
    Shiraz has moved
    Documentation/vm/numa_memory_policy.txt: fix wrong document in numa_memory_policy.txt
    powerpc/mm: fix ".__node_distance" undefined
    kernel/watchdog.c:touch_softlockup_watchdog(): use raw_cpu_write()
    init/Kconfig: move the trusted keyring config option to general setup
    vmscan: reclaim_clean_pages_from_list() must use mod_zone_page_state()

    Linus Torvalds
     
  • Sasha Levin has reported two THP BUGs[1][2]. I believe both of them
    have the same root cause. Let's look at them one by one.

    The first bug[1] is "kernel BUG at mm/huge_memory.c:1829!". It's
    BUG_ON(mapcount != page_mapcount(page)) in __split_huge_page(). From my
    testing I see that page_mapcount() is higher than mapcount here.

    I think it happens due to race between zap_huge_pmd() and
    page_check_address_pmd(). page_check_address_pmd() misses PMD which is
    under zap:

    CPU0                                    CPU1
                                            zap_huge_pmd()
                                              pmdp_get_and_clear()
    __split_huge_page()
      anon_vma_interval_tree_foreach()
        __split_huge_page_splitting()
          page_check_address_pmd()
            mm_find_pmd()
              /*
               * We check if PMD present without taking ptl: no
               * serialization against zap_huge_pmd(). We miss this PMD,
               * it's not accounted to 'mapcount' in __split_huge_page().
               */
              pmd_present(pmd) == 0

      BUG_ON(mapcount != page_mapcount(page)) // CRASH!!!

                                              page_remove_rmap(page)
                                                atomic_add_negative(-1, &page->_mapcount)

    The second bug[2] is "kernel BUG at mm/huge_memory.c:1371!".
    It's VM_BUG_ON_PAGE(!PageHead(page), page) in zap_huge_pmd().

    This happens in a similar way:

    CPU0                                    CPU1
                                            zap_huge_pmd()
                                              pmdp_get_and_clear()
                                              page_remove_rmap(page)
                                                atomic_add_negative(-1, &page->_mapcount)
    __split_huge_page()
      anon_vma_interval_tree_foreach()
        __split_huge_page_splitting()
          page_check_address_pmd()
            mm_find_pmd()
              pmd_present(pmd) == 0 /* The same comment as above */
      /*
       * No crash this time since we already decremented page->_mapcount in
       * zap_huge_pmd().
       */
      BUG_ON(mapcount != page_mapcount(page))

      /*
       * We split the compound page here into small pages without
       * serialization against zap_huge_pmd()
       */
      __split_huge_page_refcount()
                                              VM_BUG_ON_PAGE(!PageHead(page), page); // CRASH!!!

    So my understanding is that the problem is the pmd_present() check in
    mm_find_pmd() without taking the page table lock.

    The bug was introduced by me with commit 117b0791ac42. Sorry for
    that. :(

    Let's open code mm_find_pmd() in page_check_address_pmd() and do the
    check under page table lock.

    Note that __page_check_address() does the same for PTE entries
    if sync != 0.

    I've stress tested split and zap code paths for 36+ hours by now and
    don't see crashes with the patch applied. Before it took
    [2] https://lkml.kernel.org/g/

    Signed-off-by: Kirill A. Shutemov
    Reported-by: Sasha Levin
    Tested-by: Sasha Levin
    Cc: Bob Liu
    Cc: Andrea Arcangeli
    Cc: Rik van Riel
    Cc: Mel Gorman
    Cc: Michel Lespinasse
    Cc: Dave Jones
    Cc: Vlastimil Babka
    Cc: [3.13+]

    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kirill A. Shutemov
     
  • Fix new kernel-doc warning in mm/filemap.c:

    Warning(mm/filemap.c:2600): Excess function parameter 'ppos' description in '__generic_file_aio_write'

    Signed-off-by: Randy Dunlap
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Randy Dunlap
     
  • This appears to be a copy/paste error. Update the description to
    reflect extra rbtree debug and checks for the config option instead of
    duplicating CONFIG_DEBUG_VM.

    Signed-off-by: Davidlohr Bueso
    Cc: Aswin Chandramouleeswaran
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Davidlohr Bueso
     
  • David Vrabel identified a regression when using automatic NUMA balancing
    under Xen whereby page table entries were getting corrupted due to the
    use of native PTE operations. Quoting him

    Xen PV guest page tables require that their entries use machine
    addresses if the present bit (_PAGE_PRESENT) is set, and (for
    successful migration) non-present PTEs must use pseudo-physical
    addresses. This is because on migration MFNs in present PTEs are
    translated to PFNs (canonicalised) so they may be translated back
    to the new MFN in the destination domain (uncanonicalised).

    pte_mknonnuma(), pmd_mknonnuma(), pte_mknuma() and pmd_mknuma()
    set and clear the _PAGE_PRESENT bit using pte_set_flags(),
    pte_clear_flags(), etc.

    In a Xen PV guest, these functions must translate MFNs to PFNs
    when clearing _PAGE_PRESENT and translate PFNs to MFNs when setting
    _PAGE_PRESENT.

    His suggested fix converted p[te|md]_[set|clear]_flags to using
    paravirt-friendly ops but this is overkill. He suggested an alternative
    of using p[te|md]_modify in the NUMA page table operations but this
    does more work than necessary and would require looking up a VMA for
    protections.

    This patch modifies the NUMA page table operations to use paravirt
    friendly operations to set/clear the flags of interest. Unfortunately
    this will take a performance hit when updating the PTEs on
    CONFIG_PARAVIRT but I do not see a way around it that does not break
    Xen.

    Signed-off-by: Mel Gorman
    Acked-by: David Vrabel
    Tested-by: David Vrabel
    Cc: Ingo Molnar
    Cc: Peter Anvin
    Cc: Fengguang Wu
    Cc: Linus Torvalds
    Cc: Steven Noonan
    Cc: Rik van Riel
    Cc: Peter Zijlstra
    Cc: Andrea Arcangeli
    Cc: Dave Hansen
    Cc: Srikar Dronamraju
    Cc: Cyrill Gorcunov
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mel Gorman
     
  • The lkdtm module performs tests against executable memory ranges, so it
    needs to flush the icache for proper behaviors. Other architectures
    already export this, so do the same for MIPS.

    [akpm@linux-foundation.org: relocate export sites]
    Signed-off-by: Kees Cook
    Cc: Paul Gortmaker
    Cc: Ralf Baechle
    Cc: Sanjay Lal
    Cc: John Crispin
    Cc: Sergei Shtylyov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kees Cook
     
  • The soft lockup when freeing gigantic hugepages, fixed in commit
    55f67141a892 "mm: hugetlb: fix softlockup when a large number of
    hugepages are freed.", can also happen in
    return_unused_surplus_pages(), so let's fix it there as well.

    Signed-off-by: Masayoshi Mizuma
    Signed-off-by: Naoya Horiguchi
    Cc: Joonsoo Kim
    Cc: Michal Hocko
    Cc: Aneesh Kumar
    Cc: KOSAKI Motohiro
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mizuma, Masayoshi
     
  • Stick in a comment before someone else tries to fix the sparse warning
    this generates.

    Suggested-by: Andrew Morton
    Signed-off-by: Peter Zijlstra
    Link: http://lkml.kernel.org/n/tip-o2ro6f3vkxklni0bc8f7m68s@git.kernel.org
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Peter Zijlstra
     
  • shiraz.hashim@st.com email-id doesn't exist anymore as he has left the
    company. Replace ST's id with shiraz.linux.kernel@gmail.com.

    It also updates .mailmap file to fix address for 'git shortlog'.

    Signed-off-by: Viresh Kumar
    Cc: Shiraz Hashim
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Viresh Kumar
     
  • In document numa_memory_policy.txt, the following examples for flag
    MPOL_F_RELATIVE_NODES are incorrect.

    For example, consider a task that is attached to a cpuset with
    mems 2-5 that sets an Interleave policy over the same set with
    MPOL_F_RELATIVE_NODES. If the cpuset's mems change to 3-7, the
    interleave now occurs over nodes 3,5-6. If the cpuset's mems
    then change to 0,2-3,5, then the interleave occurs over nodes
    0,3,5.

    According to the comment of the patch adding flag MPOL_F_RELATIVE_NODES,
    the nodemasks the user specifies should be considered relative to the
    current task's mems_allowed.

    (https://lkml.org/lkml/2008/2/29/428)

    And according to numa_memory_policy.txt, if the user's nodemask includes
    nodes that are outside the range of the new set of allowed nodes, then
    the remap wraps around to the beginning of the nodemask and, if not
    already set, sets the node in the mempolicy nodemask.

    So in the example, if the user specifies 2-5, for a task whose
    mems_allowed is 3-7, the nodemasks should be remapped to the third,
    fourth, fifth and sixth nodes in mems_allowed, wrapping around past
    the end, like the following:

    mems_allowed:       3  4  5  6  7

    relative index:     0  1  2  3  4
                        5

    So the nodemasks should be remapped to 3,5-7, but not 3,5-6.

    And for a task whose mems_allowed is 0,2-3,5, the nodemasks should be
    remapped to 0,2-3,5, but not 0,3,5.

    mems_allowed:       0  2  3  5

    relative index:     0  1  2  3
                        4  5
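
    The wrap-around remapping worked through above can be checked with a
    short sketch. It assumes at most 64 nodes packed into a uint64_t
    bitmask, and the helper name remap_relative_nodes() is hypothetical;
    it models only the rule quoted from the documentation (relative index
    i maps to the (i mod N)-th allowed node, where N is the number of
    allowed nodes) and is not the kernel implementation.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: remap a relative node mask onto the set of
 * allowed nodes. Bit i of 'relative' selects the (i mod N)-th set bit
 * of 'allowed', where N is the number of allowed nodes.
 */
static uint64_t remap_relative_nodes(uint64_t relative, uint64_t allowed)
{
	int nodes[64];
	int count = 0;
	uint64_t result = 0;

	/* Collect the allowed node ids in ascending order. */
	for (int i = 0; i < 64; i++) {
		if (allowed & (UINT64_C(1) << i))
			nodes[count++] = i;
	}
	assert(count <= 64);
	if (count == 0)
		return 0;

	/* Map every relative index onto an allowed node, wrapping around. */
	for (int i = 0; i < 64; i++) {
		if (relative & (UINT64_C(1) << i))
			result |= UINT64_C(1) << nodes[i % count];
	}

	return result;
}
```

    For the relative mask 2-5 (0x3c), mems_allowed 3-7 (0xf8) gives nodes
    3,5-7 (0xe8), and mems_allowed 0,2-3,5 (0x2d) gives 0,2-3,5 (0x2d),
    matching the corrected examples.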

    Signed-off-by: Tang Chen
    Cc: Randy Dunlap
    Acked-by: David Rientjes
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Tang Chen
     
  • CHK include/config/kernel.release
    CHK include/generated/uapi/linux/version.h
    CHK include/generated/utsrelease.h
    ...
    Building modules, stage 2.
    WARNING: 1 bad relocations
    c0000000013d6a30 R_PPC64_ADDR64 uprobes_fetch_type_table
    WRAP arch/powerpc/boot/zImage.pseries
    WRAP arch/powerpc/boot/zImage.epapr
    MODPOST 1849 modules
    ERROR: ".__node_distance" [drivers/block/nvme.ko] undefined!
    make[1]: *** [__modpost] Error 1
    make: *** [modules] Error 2
    make: *** Waiting for unfinished jobs....

    The reason is that the symbol "__node_distance" is not exported on powerpc.

    Signed-off-by: Mike Qiu
    Acked-by: Benjamin Herrenschmidt
    Cc: Paul Mackerras
    Cc: Nathan Fontenot
    Cc: Stephen Rothwell
    Cc: Srivatsa S. Bhat
    Cc: Jesse Larrew
    Cc: Robert Jennings
    Cc: Alistair Popple
    Cc: Mike Qiu
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mike Qiu
     
  • Fix:

    BUG: using __this_cpu_write() in preemptible [00000000] code: systemd-udevd/497
    caller is __this_cpu_preempt_check+0x13/0x20
    CPU: 3 PID: 497 Comm: systemd-udevd Tainted: G W 3.15.0-rc1 #9
    Hardware name: Hewlett-Packard HP EliteBook 8470p/179B, BIOS 68ICF Ver. F.02 04/27/2012
    Call Trace:
    check_preemption_disabled+0xe1/0xf0
    __this_cpu_preempt_check+0x13/0x20
    touch_nmi_watchdog+0x28/0x40

    Reported-by: Luis Henriques
    Tested-by: Luis Henriques
    Cc: Eric Piel
    Cc: Robert Moore
    Cc: Lv Zheng
    Cc: "Rafael J. Wysocki"
    Cc: Len Brown
    Cc: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
     
  • The SYSTEM_TRUSTED_KEYRING config option is not in any menu, causing it
    to show up in the toplevel of the kernel configuration. Fix this by
    moving it under the General Setup menu.

    Signed-off-by: Peter Foley
    Cc: David Howells
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Peter Foley
     
  • Seems to be called with preemption enabled. Therefore it must use
    mod_zone_page_state instead.

    Signed-off-by: Christoph Lameter
    Reported-by: Grygorii Strashko
    Tested-by: Grygorii Strashko
    Cc: Tejun Heo
    Cc: Santosh Shilimkar
    Cc: Ingo Molnar
    Cc: Mel Gorman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     
  • Currently, it is possible to create an SCTP socket, then switch
    auth_enable via sysctl setting to 1 and crash the system on connect:

    Oops[#1]:
    CPU: 0 PID: 0 Comm: swapper Not tainted 3.14.1-mipsgit-20140415 #1
    task: ffffffff8056ce80 ti: ffffffff8055c000 task.ti: ffffffff8055c000
    [...]
    Call Trace:
    sctp_auth_asoc_set_default_hmac+0x68/0x80
    sctp_process_init+0x5e0/0x8a4
    sctp_sf_do_5_1B_init+0x234/0x34c
    sctp_do_sm+0xb4/0x1e8
    sctp_endpoint_bh_rcv+0x1c4/0x214
    sctp_rcv+0x588/0x630
    sctp6_rcv+0x10/0x24
    ip6_input+0x2c0/0x440
    __netif_receive_skb_core+0x4a8/0x564
    process_backlog+0xb4/0x18c
    net_rx_action+0x12c/0x210
    __do_softirq+0x17c/0x2ac
    irq_exit+0x54/0xb0
    ret_from_irq+0x0/0x4
    rm7k_wait_irqoff+0x24/0x48
    cpu_startup_entry+0xc0/0x148
    start_kernel+0x37c/0x398
    Code: dd0900b8 000330f8 0126302d 50c0fff1 0047182a a48306a0
    03e00008 00000000
    ---[ end trace b530b0551467f2fd ]---
    Kernel panic - not syncing: Fatal exception in interrupt

    What happens in that case, while auth_enable=0, is that
    ep->auth_hmacs is initialized to NULL in sctp_auth_init_hmacs()
    when the endpoint is created.

    After that point, if an admin switches over to auth_enable=1,
    the machine can crash due to NULL pointer dereference during
    reception of an INIT chunk. When we enter sctp_process_init()
    via sctp_sf_do_5_1B_init() in order to respond to an INIT chunk,
    the INIT verification succeeds and while we walk and process
    all INIT params via sctp_process_param() we find that
    net->sctp.auth_enable is set, therefore do not fall through,
    but invoke sctp_auth_asoc_set_default_hmac() instead, and thus,
    dereference what we have set to NULL during endpoint
    initialization phase.

    The fix is to make auth_enable immutable by caching its value
    during endpoint initialization, so that its original value is
    being carried along until destruction. The bug seems to originate
    from the very first days.
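
    The caching idea described above can be illustrated with a minimal
    userspace sketch. It is not the actual patch: the struct and function
    names below (netns_sketch, endpoint_sketch, endpoint_init,
    endpoint_default_hmac) and the HMAC id values are hypothetical
    stand-ins chosen for illustration only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the per-namespace sysctl (net->sctp.auth_enable). */
struct netns_sketch {
    bool auth_enable;
};

/* Hypothetical stand-in for an SCTP endpoint. */
struct endpoint_sketch {
    bool auth_enable;      /* snapshot taken at creation, never updated */
    const int *auth_hmacs; /* NULL when auth was disabled at creation */
};

/* Placeholder HMAC identifiers, for illustration only. */
static const int default_hmacs[] = { 1, 3 };

/* Snapshot the sysctl once; set up HMAC state only if auth is enabled. */
static void endpoint_init(struct endpoint_sketch *ep,
                          const struct netns_sketch *net)
{
    assert(ep != NULL && net != NULL);
    ep->auth_enable = net->auth_enable;
    ep->auth_hmacs = ep->auth_enable ? default_hmacs : NULL;
}

/*
 * Return the first HMAC id, or -1 if auth is off for this endpoint.
 * The cached flag is consulted instead of the live sysctl, so a later
 * sysctl flip cannot lead to dereferencing the NULL auth_hmacs pointer.
 */
static int endpoint_default_hmac(const struct endpoint_sketch *ep)
{
    assert(ep != NULL);
    if (!ep->auth_enable)
        return -1;
    return ep->auth_hmacs[0];
}
```

    In this sketch, an endpoint created while the sysctl is 0 keeps
    reporting "auth off" even after the sysctl is switched to 1; only
    endpoints created afterwards pick up the new value.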

    Fix in joint work with Daniel Borkmann.

    Reported-by: Joshua Kinard
    Signed-off-by: Vlad Yasevich
    Signed-off-by: Daniel Borkmann
    Acked-by: Neil Horman
    Tested-by: Joshua Kinard
    Signed-off-by: David S. Miller

    Vlad Yasevich
     
  • John W. Linville says:

    ====================
    pull request: wireless 2014-04-17

    Please pull this batch of fixes intended for the 3.15 stream...

    For the mac80211 bits, Johannes says:

    "We have a fix from Chun-Yeow to not look at management frame bitrates
    that are typically really low, two fixes from Felix for AP_VLAN
    interfaces, a fix from Ido to disable SMPS settings when a monitor
    interface is enabled, a radar detection fix from Michał and a fix from
    myself for a very old remain-on-channel bug."

    For the iwlwifi bits, Emmanuel says:

    "I have new device IDs and a new firmware API. These are the trivial
    ones. The less trivial ones are Johannes's fix that delays the
    enablement of an interrupt coalescing hardware until after association
    - this fixes a few connection problems seen in the field. Eyal has a
    bunch of rate control fixes. I decided to add these for 3.15 because
    they fix some disconnection and packet loss scenarios which were
    reported by the field. I also have a fix for a memory leak that
    happens only with a very new NIC."

    Along with those...

    Amitkumar Karwar fixes a couple of problems relating to driver/firmware
    interactions in mwifiex.

    Christian Engelmayer avoids a couple of potential memory leaks in
    the new rsi driver.

    Eliad Peller provides a wl18xx mailbox alignment fix for problems
    when using new firmware.

    Frederic Danis adds a couple of missing debugging strings to the
    cw1200 driver.

    Geert Uytterhoeven adds a variable initialization inside of the
    rsi driver.

    Luciano Coelho patches the wlcore code to ignore dummy packet events
    in PLT mode in order to work around a firmware bug.
    ====================

    Signed-off-by: David S. Miller

    David S. Miller
     
  • The patch fixes a problem with dropped jumbo frames after usage of
    'ethtool -G ... rx'.

    Scenario:
    1. ip link set eth0 up
    2. ethtool -G eth0 rx N # 1500).

    Signed-off-by: Ivan Vecera
    Acked-by: Michael Chan
    Signed-off-by: David S. Miller

    Ivan Vecera
     
  • When I enable the LOCKDEP config option and run these steps:

    modprobe 8021q
    vconfig add eth2 20
    vconfig add eth2.20 30
    ifconfig eth2 xx.xx.xx.xx

    the following call trace appears:

    [32524.386288] =============================================
    [32524.386293] [ INFO: possible recursive locking detected ]
    [32524.386298] 3.14.0-rc2-0.7-default+ #35 Tainted: G O
    [32524.386302] ---------------------------------------------
    [32524.386306] ifconfig/3103 is trying to acquire lock:
    [32524.386310] (&vlan_netdev_addr_lock_key/1){+.....}, at: dev_mc_sync+0x64/0xb0
    [32524.386326]
    [32524.386326] but task is already holding lock:
    [32524.386330] (&vlan_netdev_addr_lock_key/1){+.....}, at: dev_set_rx_mode+0x23/0x40
    [32524.386341]
    [32524.386341] other info that might help us debug this:
    [32524.386345] Possible unsafe locking scenario:
    [32524.386345]
    [32524.386350] CPU0
    [32524.386352] ----
    [32524.386354] lock(&vlan_netdev_addr_lock_key/1);
    [32524.386359] lock(&vlan_netdev_addr_lock_key/1);
    [32524.386364]
    [32524.386364] *** DEADLOCK ***
    [32524.386364]
    [32524.386368] May be due to missing lock nesting notation
    [32524.386368]
    [32524.386373] 2 locks held by ifconfig/3103:
    [32524.386376] #0: (rtnl_mutex){+.+.+.}, at: rtnl_lock+0x12/0x20
    [32524.386387] #1: (&vlan_netdev_addr_lock_key/1){+.....}, at: dev_set_rx_mode+0x23/0x40
    [32524.386398]
    [32524.386398] stack backtrace:
    [32524.386403] CPU: 1 PID: 3103 Comm: ifconfig Tainted: G O 3.14.0-rc2-0.7-default+ #35
    [32524.386409] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2007
    [32524.386414] ffffffff81ffae40 ffff8800d9625ae8 ffffffff814f68a2 ffff8800d9625bc8
    [32524.386421] ffffffff810a35fb ffff8800d8a8d9d0 00000000d9625b28 ffff8800d8a8e5d0
    [32524.386428] 000003cc00000000 0000000000000002 ffff8800d8a8e5f8 0000000000000000
    [32524.386435] Call Trace:
    [32524.386441] dump_stack+0x6a/0x78
    [32524.386448] __lock_acquire+0x7ab/0x1940
    [32524.386454] ? __lock_acquire+0x3ea/0x1940
    [32524.386459] lock_acquire+0xe4/0x110
    [32524.386464] ? dev_mc_sync+0x64/0xb0
    [32524.386471] _raw_spin_lock_nested+0x2a/0x40
    [32524.386476] ? dev_mc_sync+0x64/0xb0
    [32524.386481] dev_mc_sync+0x64/0xb0
    [32524.386489] vlan_dev_set_rx_mode+0x2b/0x50 [8021q]
    [32524.386495] __dev_set_rx_mode+0x5f/0xb0
    [32524.386500] dev_set_rx_mode+0x2b/0x40
    [32524.386506] __dev_open+0xef/0x150
    [32524.386511] __dev_change_flags+0xa7/0x190
    [32524.386516] dev_change_flags+0x32/0x80
    [32524.386524] devinet_ioctl+0x7d6/0x830
    [32524.386532] ? dev_ioctl+0x34b/0x660
    [32524.386540] inet_ioctl+0x80/0xa0
    [32524.386550] sock_do_ioctl+0x2d/0x60
    [32524.386558] sock_ioctl+0x82/0x2a0
    [32524.386568] do_vfs_ioctl+0x93/0x590
    [32524.386578] ? rcu_read_lock_held+0x45/0x50
    [32524.386586] ? __fget_light+0x105/0x110
    [32524.386594] SyS_ioctl+0x91/0xb0
    [32524.386604] system_call_fastpath+0x16/0x1b

    ========================================================================

    The reason is that the addr_lock_key of every vlan dev belongs to the same
    lock class, so when the status of a vlan dev changes, the vlan dev and its
    real dev both hold a lock of the same addr_lock_key class, which triggers
    the warning.

    We should distinguish the lock depth of a vlan dev from that of its real dev.

    v1->v2: Convert vlan_netdev_addr_lock_key to an array of eight elements, which
    supports stacking 8 vlan ids on the same vlan dev. I think that is enough for
    the current scenario, because a netdev's name is limited to IFNAMSIZ, which
    cannot hold 8 vlan ids, and the vlan dev would not get the same class key as
    its real dev.

    The new function vlan_dev_get_lockdep_subkey() returns the subkey so that the
    vlan dev gets a suitable class key.

    v2->v3: Following David's suggestion, I used the subclass to distinguish the
    lock key of a vlan dev from that of its real dev, but that makes no sense,
    because a different subclass in the lock_class_key does not mean a different
    class for the lock_key. So I use lock_depth to distinguish the depth of every
    vlan dev; vlan devs at the same depth can share the same lock_class_key. I
    import MAX_LOCK_DEPTH from include/linux/sched.h; I think it is enough here,
    as the lock depth should never exceed that value.

    v3->v4: Adding a huge array of locking keys wastes static kernel memory and is
    not an appropriate method. We can use the _nested() variants to fix the
    problem: calculate the depth of every vlan dev and use that depth as the
    subclass for addr_lock_key.
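
    The depth calculation mentioned in v4 can be sketched in plain
    userspace C. This is an illustration under stated assumptions, not the
    code from the patch: netdev_sketch and vlan_depth_sketch() are
    hypothetical names, and the sketch only shows how a stacking depth
    could be derived by walking from a vlan dev down to its physical
    device. In the kernel, such a value would be passed as the subclass
    argument of a _nested() locking call.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical stand-in for a net_device. real_dev points at the device
 * this vlan dev is stacked on and is NULL for a physical device.
 */
struct netdev_sketch {
    const char *name;
    const struct netdev_sketch *real_dev;
};

/*
 * Count the vlan layers below dev: 0 for a physical device,
 * 1 for a vlan on it, 2 for a vlan stacked on that vlan, and so on.
 */
static int vlan_depth_sketch(const struct netdev_sketch *dev)
{
    int depth = 0;

    assert(dev != NULL);
    while (dev->real_dev != NULL) {
        depth++;
        dev = dev->real_dev;
    }
    return depth;
}
```

    With the devices from the reproduction steps (eth2, eth2.20 and
    eth2.20.30), each level gets a distinct depth, so a vlan dev and its
    real dev would never use the same subclass.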

    Signed-off-by: Ding Tianhong
    Signed-off-by: David S. Miller

    dingtianhong