23 Aug, 2013

2 commits

  • IP sends device configuration (see inet_fill_link_af) as an array
    in the netlink information, but the indices in that array are not
    exposed to userspace through any current sanitized header file.

    It was available back in 2.6.32 (in /usr/include/linux/sysctl.h)
    but was broken by:
    commit 02291680ffba92e5b5865bc0c5e7d1f3056b80ec
    Author: Eric W. Biederman
    Date: Sun Feb 14 03:25:51 2010 +0000

    net ipv4: Decouple ipv4 interface parameters from binary sysctl numbers

    Eric was solving the sysctl problem but then the indices were re-exposed
    by a later addition of devconf support for IPV4

    commit 9f0f7272ac9506f4c8c05cc597b7e376b0b9f3e4
    Author: Thomas Graf
    Date: Tue Nov 16 04:32:48 2010 +0000

    ipv4: AF_INET link address family

    Putting them in /usr/include/linux/ip.h seemed the logical match
    for the DEVCONF_ definitions for IPV6 in /usr/include/linux/ip6.h

    Signed-off-by: Stephen Hemminger
    Signed-off-by: David S. Miller

    stephen hemminger
     
  • RFC 4861 says the Redirected Header option is optional, so
    the kernel should not drop the Redirect Message that has no
    Redirected Header option. In this patch, the function
    ip6_redirect_no_header() is introduced to deal with that
    condition.

    Signed-off-by: Duan Jiong
    Acked-by: Hannes Frederic Sowa

    Duan Jiong
     

20 Aug, 2013

1 commit

  • It is not allowed for an ipv6 packet to contain multiple fragmentation
    headers. So discard packets which were already reassembled by
    fragmentation logic and send back a parameter problem icmp.

    The updates for RFC 6980 will come in later, I have to do a bit more
    research here.

    Cc: YOSHIFUJI Hideaki
    Signed-off-by: Hannes Frederic Sowa
    Signed-off-by: David S. Miller

    Hannes Frederic Sowa
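
The rule in the commit above (no more than one Fragment header per packet) can be illustrated with a small userspace sketch. This is not the kernel change itself: the function name `count_fragment_headers` and the flat byte buffer are assumptions made for illustration, and the sketch rescans the extension header chain rather than tracking whether a packet was already reassembled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Next Header values for the extension headers handled here. */
#define NEXTHDR_HOP       0
#define NEXTHDR_ROUTING  43
#define NEXTHDR_FRAGMENT 44
#define NEXTHDR_DEST     60

/*
 * Count Fragment headers in an extension header chain (hypothetical helper).
 * `first` is the Next Header value taken from the fixed IPv6 header;
 * `buf`/`len` cover the bytes that follow the fixed header.
 * Returns the number of Fragment headers, or -1 if the chain is truncated.
 */
int count_fragment_headers(uint8_t first, const uint8_t *buf, size_t len)
{
    uint8_t nh = first;
    size_t off = 0;
    int frags = 0;

    assert(buf != NULL || len == 0);
    for (;;) {
        size_t hlen;

        if (nh == NEXTHDR_FRAGMENT) {
            hlen = 8;                   /* Fragment header is always 8 bytes */
            frags++;
        } else if (nh == NEXTHDR_HOP || nh == NEXTHDR_ROUTING ||
                   nh == NEXTHDR_DEST) {
            if (off + 2 > len)
                return -1;
            /* Hdr Ext Len counts 8-byte units beyond the first 8 bytes. */
            hlen = ((size_t)buf[off + 1] + 1) * 8;
        } else {
            return frags;               /* upper-layer protocol reached */
        }
        if (off + hlen > len)
            return -1;
        nh = buf[off];                  /* byte 0 of each header is Next Header */
        off += hlen;
    }
}
```

A caller following the commit's policy would treat any result greater than 1 as grounds to drop the packet and send an ICMPv6 parameter problem.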
     

17 Aug, 2013

1 commit

  • Pull networking fixes from David Miller:

    1) Fix SKB leak in 8139cp, from Dave Jones.

    2) Fix use of *_PAGES interfaces with mlx5 firmware, from Moshe Lazer.

    3) RCU conversion of macvtap introduced two races, fixes by Eric
    Dumazet

    4) Synchronize statistic flows in bnx2x driver to prevent corruption,
    from Dmitry Kravkov

    5) Undo optimization in IP tunneling, we were using the inner IP header
    in some cases to inherit the IP ID, but that isn't correct in some
    circumstances. From Pravin B Shelar

    6) Use correct struct size when parsing netlink attributes in
    rtnl_bridge_getlink(). From Asbjoern Sloth Toennesen

    7) Length verifications in tun_get_user() are bogus, from Weiping Pan
    and Dan Carpenter

    8) Fix bad merge resolution during 3.11 networking development in
    openvswitch, albeit a harmless one which added some unreachable
    code. From Jesse Gross

    9) Wrong size used in flexible array allocation in openvswitch, from
    Pravin B Shelar

    10) Clear out firmware capability flags the be2net driver isn't ready to
    handle yet, from Sarveshwar Bandi

    11) Revert DMA mapping error checking addition to cxgb3 driver, it's
    buggy. From Alexey Kardashevskiy

    12) Fix regression in packet scheduler rate limiting when working with a
    link layer of ATM. From Jesper Dangaard Brouer

    13) Fix several errors in TCP Cubic congestion control, in particular
    overflow errors in timestamp calculations. From Eric Dumazet and
    Van Jacobson

    14) In ipv6 routing lookups, we need to backtrack if subtree traversal
    doesn't result in a match. From Hannes Frederic Sowa

    15) ipgre_header() returns incorrect packet offset. Fix from Timo Teräs

    16) Get "low latency" out of the new MIB counter names. From Eliezer
    Tamir

    17) State check in ndo_dflt_fdb_del() is inverted, from Sridhar
    Samudrala

    18) Handle TCP Fast Open properly in netfilter conntrack, from Yuchung
    Cheng

    19) Wrong memcpy length in pcan_usb driver, from Stephane Grosjean

    20) Fix deadlock in TIPC, from Wang Weidong and Ding Tianhong

    21) call_rcu() call to destroy SCTP transport is done too early and
    might result in an oops. From Daniel Borkmann

    22) Fix races in genetlink family dumps, from Johannes Berg

    23) Flags passed into macvlan by the user need to be validated properly,
    from Michael S Tsirkin

    24) Fix skge build on 32-bit, from Stephen Hemminger

    25) Handle malformed TCP headers properly in xt_TCPMSS, from Pablo Neira
    Ayuso

    26) Fix handling of stacked vlans in vlan_dev_real_dev(), from Nikolay
    Aleksandrov

    27) Eliminate MTU calculation overflows in esp{4,6}, from Daniel
    Borkmann

    28) neigh_parms need to be setup before calling the ->ndo_neigh_setup()
    method. From Veaceslav Falico

    29) Kill out-of-bounds prefetch in fib_trie, from Eric Dumazet

    30) Don't dereference MLD query message if the length isn't valid in the
    bridge multicast code, from Linus Lüssing

    31) Fix VXLAN IGMP join regression due to an inverted check, from Cong
    Wang

    * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (70 commits)
    net/mlx5_core: Support MANAGE_PAGES and QUERY_PAGES firmware command changes
    tun: signedness bug in tun_get_user()
    qlcnic: Fix diagnostic interrupt test for 83xx adapters
    qlcnic: Fix beacon state return status handling
    qlcnic: Fix set driver version command
    net: tg3: fix NULL pointer dereference in tg3_io_error_detected and tg3_io_slot_reset
    net_sched: restore "linklayer atm" handling
    drivers/net/ethernet/via/via-velocity.c: update napi implementation
    Revert "cxgb3: Check and handle the dma mapping errors"
    be2net: Clear any capability flags that driver is not interested in.
    openvswitch: Reset tunnel key between input and output.
    openvswitch: Use correct type while allocating flex array.
    openvswitch: Fix bad merge resolution.
    tun: compare with 0 instead of total_len
    rtnetlink: rtnl_bridge_getlink: Call nlmsg_find_attr() with ifinfomsg header
    ethernet/arc/arc_emac - fix NAPI "work > weight" warning
    ip_tunnel: Do not use inner ip-header-id for tunnel ip-header-id.
    bnx2x: prevent crash in shutdown flow with CNIC
    bnx2x: fix PTE write access error
    bnx2x: fix memory leak in VF
    ...

    Linus Torvalds
     

16 Aug, 2013

2 commits

  • Ben Tebulin reported:

    "Since v3.7.2 on two independent machines a very specific Git
    repository fails in 9/10 cases on git-fsck due to an SHA1/memory
    failures. This only occurs on a very specific repository and can be
    reproduced stably on two independent laptops. Git mailing list ran
    out of ideas and for me this looks like some very exotic kernel issue"

    and bisected the failure to the backport of commit 53a59fc67f97 ("mm:
    limit mmu_gather batching to fix soft lockups on !CONFIG_PREEMPT").

    That commit itself is not actually buggy, but what it does is to make it
    much more likely to hit the partial TLB invalidation case, since it
    introduces a new case in tlb_next_batch() that previously only ever
    happened when running out of memory.

    The real bug is that the TLB gather virtual memory range setup is subtly
    buggered. It was introduced in commit 597e1c3580b7 ("mm/mmu_gather:
    enable tlb flush range in generic mmu_gather"), and the range handling
    was already fixed at least once in commit e6c495a96ce0 ("mm: fix the TLB
    range flushed when __tlb_remove_page() runs out of slots"), but that fix
    was not complete.

    The problem with the TLB gather virtual address range is that it isn't
    set up by the initial tlb_gather_mmu() initialization (which didn't get
    the TLB range information), but it is set up ad-hoc later by the
    functions that actually flush the TLB. And so any such case that forgot
    to update the TLB range entries would potentially miss TLB invalidates.

    Rather than try to figure out exactly which particular ad-hoc range
    setup was missing (I personally suspect it's the hugetlb case in
    zap_huge_pmd(), which didn't have the same logic as zap_pte_range()
    did), this patch just gets rid of the problem at the source: make the
    TLB range information available to tlb_gather_mmu(), and initialize it
    when initializing all the other tlb gather fields.

    This makes the patch larger, but conceptually much simpler. And the end
    result is much more understandable; even if you want to play games with
    partial ranges when invalidating the TLB contents in chunks, now the
    range information is always there, and anybody who doesn't want to
    bother with it won't introduce subtle bugs.

    Ben verified that this fixes his problem.

    Reported-bisected-and-tested-by: Ben Tebulin
    Build-testing-by: Stephen Rothwell
    Build-testing-by: Richard Weinberger
    Reviewed-by: Michal Hocko
    Acked-by: Peter Zijlstra
    Cc: stable@vger.kernel.org
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     
  • In the previous QUERY_PAGES command version we used one command to get the
    required amount of boot, init and post init pages. The new version uses the
    op_mod field to specify whether the query is for the required amount of boot,
    init or post init pages. In addition the output field size for the required
    amount of pages increased from 16 to 32 bits.

    In MANAGE_PAGES command the input_num_entries and output_num_entries fields
    sizes changed from 16 to 32 bits and the PAS tables offset changed to 0x10.

    In the pages request event the num_pages field also changed to 32 bits.

    In the HCA-capabilities-layout the size and location of max_qp_mcg field has
    been changed to support 24 bits.

    This patch isn't compatible with firmware versions < 5; however, it turns out that the
    first GA firmware we will publish will not support previous versions so this should be OK.

    Signed-off-by: Moshe Lazer
    Signed-off-by: Eli Cohen
    Signed-off-by: David S. Miller

    Moshe Lazer
     

15 Aug, 2013

2 commits

  • commit 56b765b79 ("htb: improved accuracy at high rates")
    broke the "linklayer atm" handling.

    tc class add ... htb rate X ceil Y linklayer atm

    The linklayer setting is implemented by modifying the rate table
    which is sent to the kernel. No direct parameter was
    transferred to the kernel indicating the linklayer setting.

    The commit 56b765b79 ("htb: improved accuracy at high rates")
    removed the use of the rate table system.

    To stay compatible with older iproute2 utils, this patch detects
    the linklayer by parsing the rate table. It also supports future
    versions of iproute2 to send this linklayer parameter to the
    kernel directly. This is done by using the __reserved field in
    struct tc_ratespec, to convey the chosen linklayer option, but
    only using the lower 4 bits of this field.

    Linklayer detection is limited to speeds below 100Mbit/s, because
    at high rates the rtab gets too inaccurate, so bad that
    several fields contain the same values, thus resembling the ATM
    detect. Fields even start to contain "0" time to send, e.g. at
    1000Mbit/s sending a 96-byte packet costs "0", thus the rtab has
    been more broken than we first realized.

    Signed-off-by: Jesper Dangaard Brouer
    Signed-off-by: David S. Miller

    Jesper Dangaard Brouer
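
Two ideas from the commit above can be sketched in plain C: ATM cell alignment of packet sizes, and carrying a link layer value in the low 4 bits of a reserved byte. The constant values and helper names below are assumptions for illustration; only the "lower 4 bits" rule comes from the commit text.

```c
#include <assert.h>
#include <stdint.h>

#define ATM_CELL_SIZE    53  /* bytes on the wire per ATM cell */
#define ATM_CELL_PAYLOAD 48  /* payload bytes carried by each cell */

/* Illustrative encoding; the real values may differ. */
#define LINKLAYER_MASK     0x0F
#define LINKLAYER_ETHERNET 1
#define LINKLAYER_ATM      2

/* Wire size of a packet once it is split into fixed-size ATM cells. */
unsigned int atm_wire_size(unsigned int len)
{
    unsigned int cells = (len + ATM_CELL_PAYLOAD - 1) / ATM_CELL_PAYLOAD;

    return cells * ATM_CELL_SIZE;
}

/* Store the link layer in the low 4 bits, preserving the high 4 bits. */
uint8_t set_linklayer(uint8_t reserved, uint8_t linklayer)
{
    assert(linklayer <= LINKLAYER_MASK);
    return (uint8_t)((reserved & ~LINKLAYER_MASK) | linklayer);
}

/* Read the link layer back from the low 4 bits. */
uint8_t get_linklayer(uint8_t reserved)
{
    return reserved & LINKLAYER_MASK;
}
```

Because every length from 49 to 96 bytes occupies the same two cells, a rate table built with ATM alignment contains runs of identical entries whenever its slots are narrower than a cell payload. That repetition is the kind of pattern a table-based detection can look for, and it is also why an inaccurate high-rate table can be mistaken for an ATM one.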
     
  • Merge a bunch of fixes from Andrew Morton.

    * emailed patches from Andrew Morton :
    fs/proc/task_mmu.c: fix buffer overflow in add_page_map()
    arch: *: Kconfig: add "kernel/Kconfig.freezer" to "arch/*/Kconfig"
    ocfs2: fix null pointer dereference in ocfs2_dir_foreach_blk_id()
    x86 get_unmapped_area(): use proper mmap base for bottom-up direction
    ocfs2: fix NULL pointer dereference in ocfs2_duplicate_clusters_by_page
    ocfs2: Revert 40bd62e to avoid regression in extended allocation
    drivers/rtc/rtc-stmp3xxx.c: provide timeout for potentially endless loop polling a HW bit
    hugetlb: fix lockdep splat caused by pmd sharing
    aoe: adjust ref of head for compound page tails
    microblaze: fix clone syscall
    mm: save soft-dirty bits on file pages
    mm: save soft-dirty bits on swapped pages
    memcg: don't initialize kmem-cache destroying work for root caches

    Linus Torvalds
     

14 Aug, 2013

6 commits

  • When the stack is set to unlimited, the bottom-up direction is used for
    mmap-ings but the mmap_base is not used, which effectively renders
    ASLR for mmap-ings along with PIE useless.

    Cc: Michel Lespinasse
    Cc: Oleg Nesterov
    Reviewed-by: Rik van Riel
    Acked-by: Ingo Molnar
    Cc: Adrian Sendroiu
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Radu Caragea
     
  • Fix inadvertent breakage in the clone syscall ABI for Microblaze that
    was introduced in commit f3268edbe6fe ("microblaze: switch to generic
    fork/vfork/clone").

    The Microblaze syscall ABI for clone takes the parent tid address in the
    4th argument; the third argument slot is used for the stack size. The
    incorrectly-used CLONE_BACKWARDS type assigned parent tid to the 3rd
    slot.

    This commit restores the original ABI so that existing userspace libc
    code will work correctly.

    All kernel versions from v3.8-rc1 were affected.

    Signed-off-by: Michal Simek
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michal Simek
     
  • Andy reported that if a file page gets reclaimed we lose the soft-dirty
    bit if it was set, so save the _PAGE_BIT_SOFT_DIRTY bit when the page
    address gets encoded into the pte entry. Thus when a #pf happens on such
    a non-present pte we can restore it.

    Reported-by: Andy Lutomirski
    Signed-off-by: Cyrill Gorcunov
    Acked-by: Pavel Emelyanov
    Cc: Matt Mackall
    Cc: Xiao Guangrong
    Cc: Marcelo Tosatti
    Cc: KOSAKI Motohiro
    Cc: Stephen Rothwell
    Cc: Peter Zijlstra
    Cc: "Aneesh Kumar K.V"
    Cc: Minchan Kim
    Cc: Wanpeng Li
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Cyrill Gorcunov
     
  • Andy Lutomirski reported that if a page with the _PAGE_SOFT_DIRTY bit set
    gets swapped out, the bit is lost and no longer available when the
    pte is read back.

    To resolve this we introduce the _PTE_SWP_SOFT_DIRTY bit, which is saved in
    the pte entry for the page being swapped out. When such a page is to be read
    back from a swap cache we check for the bit's presence, and if it's there we
    clear it and restore the former _PAGE_SOFT_DIRTY bit.

    One of the problems was to find a place in the pte entry where we can save
    the _PTE_SWP_SOFT_DIRTY bit while the page is in swap. The _PAGE_PSE bit was
    chosen for that; it doesn't intersect with the swap entry format stored in
    the pte.

    Reported-by: Andy Lutomirski
    Signed-off-by: Cyrill Gorcunov
    Acked-by: Pavel Emelyanov
    Cc: Matt Mackall
    Cc: Xiao Guangrong
    Cc: Marcelo Tosatti
    Cc: KOSAKI Motohiro
    Cc: Stephen Rothwell
    Cc: Peter Zijlstra
    Cc: "Aneesh Kumar K.V"
    Reviewed-by: Minchan Kim
    Reviewed-by: Wanpeng Li
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Cyrill Gorcunov
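
The save-and-restore scheme described in the commit above can be modelled with two helpers operating on 64-bit page table entries. This is a simplified sketch: the bit positions, macro names and function names are assumptions for illustration and do not reproduce the kernel's pte layout or helpers.

```c
#include <assert.h>
#include <stdint.h>

/* Bit positions are illustrative only. */
#define PTE_PRESENT        (1ULL << 0)
#define PTE_SOFT_DIRTY     (1ULL << 11) /* meaningful while the page is present */
#define PTE_SWP_SOFT_DIRTY (1ULL << 7)  /* meaningful while the pte is a swap entry */

/* Build a non-present swap pte, carrying the soft-dirty state across. */
uint64_t make_swap_pte(uint64_t present_pte, uint64_t swap_entry)
{
    assert(present_pte & PTE_PRESENT);
    /* The swap entry encoding must not overlap the bits used here. */
    assert(!(swap_entry & (PTE_PRESENT | PTE_SWP_SOFT_DIRTY)));
    if (present_pte & PTE_SOFT_DIRTY)
        swap_entry |= PTE_SWP_SOFT_DIRTY;
    return swap_entry;
}

/* On swap-in, move the saved bit back to its "present" position. */
uint64_t restore_present_pte(uint64_t swap_pte, uint64_t new_pte)
{
    assert(new_pte & PTE_PRESENT);
    if (swap_pte & PTE_SWP_SOFT_DIRTY)
        new_pte |= PTE_SOFT_DIRTY;
    return new_pte;
}
```

The second assertion in `make_swap_pte` reflects the constraint the commit mentions: the bit chosen to hold the saved state must not intersect with the swap entry format.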
     
  • Pull scheduler fixes from Ingo Molnar:
    "Docbook fixes that make 99% of the diffstat, plus a oneliner fix"

    * 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
    sched: Ensure update_cfs_shares() is called for parents of continuously-running tasks
    sched: Fix some kernel-doc warnings

    Linus Torvalds
     
  • Using inner-id for tunnel id is not safe in some rare cases.
    E.g. packets coming from multiple sources entering the same tunnel
    can have the same id. Therefore on tunnel packet receive we
    could have packets from two different streams but with the same
    source and dst IP and the same ip-id, which could confuse ip packet
    reassembly.

    Following patch reverts optimization from commit
    490ab08127 (IP_GRE: Fix IP-Identification.)

    CC: Jarno Rajahalme
    CC: Ansis Atteka
    Signed-off-by: Pravin B Shelar
    Signed-off-by: David S. Miller

    Pravin B Shelar
     

13 Aug, 2013

1 commit

  • This is only theoretical, but after try_to_wake_up(p) was changed
    to check p->state under p->pi_lock the code like

    __set_current_state(TASK_INTERRUPTIBLE);
    schedule();

    can miss a signal. This is the special case of wait-for-condition,
    it relies on try_to_wake_up/schedule interaction and thus it does
    not need mb() between __set_current_state() and if(signal_pending).

    However, this __set_current_state() can move into the critical
    section protected by rq->lock, now that try_to_wake_up() takes
    another lock we need to ensure that it can't be reordered with
    "if (signal_pending(current))" check inside that section.

    The patch is actually a one-liner, it simply adds smp_wmb() before
    spin_lock_irq(rq->lock). This is what try_to_wake_up() already
    does for the same reason.

    We turn this wmb() into the new helper, smp_mb__before_spinlock(),
    for better documentation and to allow the architectures to change
    the default implementation.

    While at it, kill smp_mb__after_lock(), it has no callers.

    Perhaps we can also add smp_mb__before/after_spinunlock() for
    prepare_to_wait().

    Signed-off-by: Oleg Nesterov
    Acked-by: Peter Zijlstra
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     

11 Aug, 2013

2 commits

  • Pull NFS client bugfixes from Trond Myklebust:

    - Stable patch for lockd to fix Oopses due to inappropriate calls to
    utsname()->nodename

    - Stable patches for sunrpc to fix Oopses on shutdown when using
    AF_LOCAL sockets with rpcbind

    - Fix memory leak and error checking issues in nfs4_proc_lookup_mountpoint

    - Fix a regression with the sync mount option failing to work for nfs4
    mounts

    - Fix a writeback performance issue when doing cache invalidation

    - Remove an incorrect call to nfs_setsecurity in nfs_fhget

    * tag 'nfs-for-3.11-4' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
    NFSv4: Fix up nfs4_proc_lookup_mountpoint
    NFS: Remove unnecessary call to nfs_setsecurity in nfs_fhget()
    NFSv4: Fix the sync mount option for nfs4 mounts
    NFS: Fix writeback performance issue on cache invalidation
    SUNRPC: If the rpcbind channel is disconnected, fail the call to unregister
    SUNRPC: Don't auto-disconnect from the local rpcbind socket
    LOCKD: Don't call utsname()->nodename from nlmclnt_setlockargs

    Linus Torvalds
     
  • Pull staging driver fixes from Greg KH:
    "Here are 3 small fixes for staging/IIO drivers for 3.11-rc5. Nothing
    huge, two IIO driver fixes, and a zcache fix. All of these have been
    in linux-next for a while"

    * tag 'staging-3.11-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging:
    staging: zcache: fix "zcache=" kernel parameter
    iio: ti_am335x_adc: Fix wrong samples received on 1st read
    iio:trigger: Fix use_count race condition

    Linus Torvalds
     

10 Aug, 2013

3 commits

  • Pull ACPI and power management fixes from Rafael Wysocki:

    - ACPI-based memory hotplug stopped working after a recent change,
    because it's not possible to associate sufficiently many "physical"
    devices with one ACPI device object due to an artificial limit. Fix
    from Rafael J Wysocki removes that limit and makes memory hotplug
    work again.

    - A change made in 3.9 uncovered a bug in the ACPI processor driver
    preventing NUMA nodes from being put offline due to an ordering
    issue. Fix from Yasuaki Ishimatsu changes the ordering to make
    things work again.

    - One of the recent ACPI video commits (that hasn't been reverted so
    far) uncovered a bug in the code handling quirky BIOSes that caused
    some Asus machines to boot with backlight completely off which made
    it quite difficult to use them afterward. Fix from Felipe Contreras
    improves the quirk to cover this particular case correctly.

    - A cpufreq user space interface change made in 3.10 inadvertently
    renamed the ignore_nice_load sysfs attribute to ignore_nice which
    resulted in some confusion. Fix from Viresh Kumar changes the name
    back to ignore_nice_load.

    - An initialization ordering change made in 3.9 broke cpufreq on
    loongson2 boards. Fix from Aaro Koskinen restores the correct
    initialization ordering there.

    - Fix breakage resulting from a mistake made in 3.9 and causing the
    detection of some graphics adapters (that were detected correctly
    before) to fail. There are two objects representing the same PCIe
    port in the affected systems' ACPI tables and both appear as
    "enabled" and we are expected to guess which one to use. We used to
    choose the right one before by pure luck, but when we tried to
    address another similar corner case, the luck went away. This time
    we try to make our guessing a bit more educated which is reported to
    work on those systems.

    - The /proc/acpi/wakeup interface code is missing some locking which
    may lead to breakage if that file is written or read during hotplug
    of wakeup devices. That should be rare but still possible, so it's
    better to start using the appropriate locking there.

    * tag 'pm+acpi-3.11-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
    ACPI: Try harder to resolve _ADR collisions for bridges
    cpufreq: rename ignore_nice as ignore_nice_load
    cpufreq: loongson2: fix regression related to clock management
    ACPI / processor: move try_offline_node() after acpi_unmap_lsapic()
    ACPI: Drop physical_node_id_bitmap from struct acpi_device
    ACPI / PM: Walk physical_node_list under physical_node_lock
    ACPI / video: improve quirk check in acpi_video_bqc_quirk()

    Linus Torvalds
     
  • Pull media fixes from Mauro Carvalho Chehab:
    "Some driver fixes (em28xx, coda, usbtv, s5p, hdpvr and ml86v7667) and
    a fix for media DocBook"

    * 'v4l_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media:
    [media] em28xx: fix assignment of the eeprom data
    [media] hdpvr: fix iteration over uninitialized lists in hdpvr_probe()
    [media] usbtv: fix dependency
    [media] usbtv: Throw corrupted frames away
    [media] usbtv: Fix deinterlacing
    [media] v4l2: added missing mutex.h include to v4l2-ctrls.h
    [media] DocBook: upgrade media_api DocBook version to 4.2
    [media] ml86v7667: fix compile warning: 'ret' set but not used
    [media] s5p-g2d: Fix registration failure
    [media] media: coda: Fix DT driver data pointer for i.MX27
    [media] s5p-mfc: Fix input/output format reporting

    Linus Torvalds
     
  • Rename mib counter from "low latency" to "busy poll"

    v1 also moved the counter to the ip MIB (suggested by Shawn Bohrer)
    Eric Dumazet suggested that the current location is better.

    So v2 just renames the counter to fit the new naming convention.

    Signed-off-by: Eliezer Tamir
    Signed-off-by: David S. Miller

    Eliezer Tamir
     

09 Aug, 2013

2 commits


08 Aug, 2013

3 commits

  • If rpcbind causes our connection to the AF_LOCAL socket to close after
    we've registered a service, then we want to be careful about reconnecting
    since the mount namespace may have changed.

    By simply refusing to reconnect the AF_LOCAL socket in the case of
    unregister, we avoid the need to somehow save the mount namespace. While
    this may lead to some services not unregistering properly, it should
    be safe.

    Signed-off-by: Trond Myklebust
    Cc: Nix
    Cc: Jeff Layton
    Cc: stable@vger.kernel.org # 3.9.x

    Trond Myklebust
     
  • In theory, under a given ACPI namespace node there should be only
    one child device object with _ADR whose value matches a given bus
    address exactly. In practice, however, there are systems in which
    multiple child device objects under a given parent have _ADR matching
    exactly the same address. In those cases we use _STA to determine
    which of the multiple matching devices is enabled, since some systems
    are known to indicate which ACPI device object to associate with the
    given physical (usually PCI) device this way.

    Unfortunately, as it turns out, there are systems in which many
    device objects under the same parent have _ADR matching exactly the
    same bus address and none of them has _STA, in which case they all
    should be regarded as enabled according to the spec. Still, if
    those device objects are supposed to represent bridges (e.g. this
    is the case for device objects corresponding to PCIe ports), we can
    try harder and skip the ones that have no child device objects in the
    ACPI namespace. With luck, we can avoid using device objects that we
    are not expected to use this way.

    Although this only works for bridges whose children also have ACPI
    namespace representation, it is sufficient to address graphics
    adapter detection issues on some systems, so rework the code finding
    a matching device ACPI handle for a given bus address to implement
    this idea.

    Introduce a new function, acpi_find_child(), taking three arguments:
    the ACPI handle of the device's parent, a bus address suitable for
    the device's bus type and a bool indicating if the device is a
    bridge and make it work as outlined above. Reimplement the function
    currently used for this purpose, acpi_get_child(), as a call to
    acpi_find_child() with the last argument set to 'false' and make
    the PCI subsystem use acpi_find_child() with the bridge information
    passed as the last argument to it. [Lan Tianyu notices that it is
    not sufficient to use pci_is_bridge() for that, because the device's
    subordinate pointer hasn't been set yet at this point, so use
    hdr_type instead.]

    This change fixes a regression introduced inadvertently by commit
    33f767d (ACPI: Rework acpi_get_child() to be more efficient) which
    overlooked the fact that for acpi_walk_namespace() "post-order" means
    "after all children have been visited" rather than "on the way back",
    so for device objects without children and for namespace walks of
    depth 1, as in the acpi_get_child() case, the "post-order" callbacks
    ordering is actually the same as the ordering of "pre-order" ones.
    Since that commit changed the namespace walk in acpi_get_child() to
    terminate after finding the first matching object instead of going
    through all of them and returning the last one, it effectively
    changed the result returned by that function in some rare cases and
    that led to problems (the switch from a "pre-order" to a "post-order"
    callback was supposed to prevent that from happening, but it was
    ineffective).

    As it turns out, the systems where the change made by commit
    33f767d actually matters are those where there are multiple ACPI
    device objects representing the same PCIe port (which effectively
    is a bridge). Moreover, only one of them, and the one we are
    expected to use, has child device objects in the ACPI namespace,
    so the regression can be addressed as described above.

    References: https://bugzilla.kernel.org/show_bug.cgi?id=60561
    Reported-by: Peter Wu
    Tested-by: Vladimir Lalov
    Signed-off-by: Rafael J. Wysocki
    Acked-by: Bjorn Helgaas
    Cc: 3.9+ # 3.9+

    Rafael J. Wysocki
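
The selection rule described in the commit above can be modelled outside the kernel as a scan over candidate child objects. This is a simplified sketch, not the actual acpi_find_child() code: `struct child`, its fields and `find_child` are hypothetical stand-ins for ACPI namespace objects and their _ADR/_STA evaluation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a child device object in the namespace. */
struct child {
    uint64_t adr;       /* value of _ADR */
    bool has_sta;       /* does the object provide _STA? */
    bool sta_enabled;   /* _STA result, meaningful only if has_sta */
    bool has_children;  /* any child objects of its own? */
};

/* Returns the index of the chosen child, or -1 if nothing matches. */
int find_child(const struct child *c, size_t n, uint64_t adr, bool is_bridge)
{
    int fallback = -1;

    assert(c != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (c[i].adr != adr)
            continue;
        /* For bridges, skip objects that have no children of their own. */
        if (is_bridge && !c[i].has_children)
            continue;
        /* An object without _STA is regarded as enabled. */
        if (!c[i].has_sta || c[i].sta_enabled)
            return (int)i;
        if (fallback < 0)
            fallback = (int)i;  /* disabled match, used only as a last resort */
    }
    return fallback;
}
```

With two objects sharing one address and neither providing _STA, a non-bridge lookup returns the first, while a bridge lookup returns the one that has children, which is the case the commit is trying to get right.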
     
  • …t/rostedt/linux-trace

    Pull tracing fixes from Steven Rostedt:
    "Oleg Nesterov has been working hard in closing all the holes that can
    lead to race conditions between deleting an event and accessing an
    event debugfs file. This included a fix to the debugfs system (acked
    by Greg Kroah-Hartman). We think that all the holes have been patched
    and hopefully we don't find more. I haven't marked all of them for
    stable because I need to examine them more to figure out how far back
    some of the changes need to go.

    Along the way, some other fixes have been made. Alexander Z Lam fixed
    some logic where the wrong buffer was being modified.

    Andrew Vagin found a possible corruption for machines that actually
    allocate cpumask, as a reference to one was being zeroed out by
    mistake.

    Dhaval Giani found a bad prototype when tracing is not configured.

    And I not only had some changes to help Oleg, but also finally fixed a
    long standing bug that Dave Jones and others have been hitting, where
    a module unload and reload can cause the function tracing accounting
    to get screwed up"

    * tag 'trace-fixes-3.11-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
    tracing: Fix reset of time stamps during trace_clock changes
    tracing: Make TRACE_ITER_STOP_ON_FREE stop the correct buffer
    tracing: Fix trace_dump_stack() proto when CONFIG_TRACING is not set
    tracing: Fix fields of struct trace_iterator that are zeroed by mistake
    tracing/uprobes: Fail to unregister if probe event files are in use
    tracing/kprobes: Fail to unregister if probe event files are in use
    tracing: Add comment to describe special break case in probe_remove_event_call()
    tracing: trace_remove_event_call() should fail if call/file is in use
    debugfs: debugfs_remove_recursive() must not rely on list_empty(d_subdirs)
    ftrace: Check module functions being traced on reload
    ftrace: Consolidate some duplicate code for updating ftrace ops
    tracing: Change remove_event_file_dir() to clear "d_subdirs"->i_private
    tracing: Introduce remove_event_file_dir()
    tracing: Change f_start() to take event_mutex and verify i_private != NULL
    tracing: Change event_filter_read/write to verify i_private != NULL
    tracing: Change event_enable/disable_read() to verify i_private != NULL
    tracing: Turn event/id->i_private into call->event.type

    Linus Torvalds
     

07 Aug, 2013

1 commit

  • regmap.h requires linux/err.h if CONFIG_REGMAP is not defined. Without it I get
    an error:
    CC drivers/media/platform/exynos4-is/fimc-reg.o
    In file included from drivers/media/platform/exynos4-is/fimc-reg.c:14:0:
    include/linux/regmap.h: In function ‘regmap_write’:
    include/linux/regmap.h:525:10: error: ‘EINVAL’ undeclared (first use in this function)
    include/linux/regmap.h:525:10: note: each undeclared identifier is reported only once for each function it appears in

    Signed-off-by: Mateusz Krawczuk
    Signed-off-by: Kyungmin Park
    Signed-off-by: Mark Brown
    Cc: stable@kernel.org

    Mateusz Krawczuk
     

06 Aug, 2013

2 commits

  • The physical_node_id_bitmap in struct acpi_device is only used for
    looking up the first currently unused dependent physical node ID
    by acpi_bind_one(). It is not really necessary, however, because
    acpi_bind_one() walks the entire physical_node_list of the given
    device object for sanity checking anyway and if that list is always
    sorted by node_id, it is straightforward to find the first gap
    between the currently used node IDs and use that number as the ID
    of the new list node.

    This also removes the artificial limit of the maximum number of
    dependent physical devices per ACPI device object, which now depends
    only on the capacity of unsigned int. As a result, it fixes a
    regression introduced by commit e2ff394 (ACPI / memhotplug: Bind
    removable memory blocks to ACPI device nodes) that caused
    acpi_memory_enable_device() to fail when the number of 128 MB blocks
    within one removable memory module was greater than 32.

    Reported-and-tested-by: Yasuaki Ishimatsu
    Signed-off-by: Rafael J. Wysocki
    Acked-by: Toshi Kani
    Reviewed-by: Yasuaki Ishimatsu

    Rafael J. Wysocki
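
The "first gap in a sorted list" idea from the commit above can be shown with a short sketch. It uses a plain sorted array instead of the kernel's linked list, and the function name `first_free_id` is an assumption for illustration.

```c
#include <assert.h>
#include <stddef.h>

/*
 * `ids` holds the node IDs currently in use, sorted in ascending order
 * with no duplicates. Returns the smallest ID that is not in use.
 */
unsigned int first_free_id(const unsigned int *ids, size_t n)
{
    unsigned int next = 0;

    assert(ids != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (ids[i] != next)
            break;          /* gap found just before ids[i] */
        next++;
    }
    return next;
}
```

Since the walk over the list is needed anyway for sanity checking, finding the gap this way costs nothing extra and removes the fixed-size bitmap along with its upper bound on IDs.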
     
  • Remove this code, per Dave Miller's request, since it is not being used
    anywhere in the kernel.

    Signed-off-by: Eli Cohen
    Signed-off-by: David S. Miller

    Eli Cohen
     

05 Aug, 2013

2 commits


04 Aug, 2013

1 commit

  • Pull networking fixes from David Miller:

    1) Don't ignore user initiated wireless regulatory settings on cards
    with custom regulatory domains, from Arik Nemtsov.

    2) Fix length check of bluetooth information responses, from Jaganath
    Kanakkassery.

    3) Fix misuse of PTR_ERR in btusb, from Adam Lee.

    4) Handle rfkill properly while iwlwifi devices are offline, from
    Emmanuel Grumbach.

    5) Fix r815x devices DMA'ing to stack buffers, from Hayes Wang.

    6) Kernel info leak in ATM packet scheduler, from Dan Carpenter.

    7) 8139cp doesn't check for DMA mapping errors, from Neil Horman.

    8) Fix bridge multicast code to not snoop when no querier exists,
    otherwise multicast traffic is lost. From Linus Lüssing.

    9) Avoid soft lockups in fib6_run_gc(), from Michal Kubecek.

    10) Fix races in automatic address assignment on ipv6, which can result
    in incorrect lifetime assignments. From Jiri Benc.

    11) Cure build bustage when CONFIG_NET_LL_RX_POLL is not set and rename
    it CONFIG_NET_RX_BUSY_POLL to eliminate the last reference to the
    original naming of this feature. From Cong Wang.

    12) Fix crash in TIPC when server socket creation fails, from Ying Xue.

    13) macvlan_changelink() silently succeeds when it shouldn't, from
    Michael S Tsirkin.

    14) HTB packet scheduler can crash due to sign extension, fix from
    Stephen Hemminger.

    15) With the cable unplugged, r8169 prints out a message every 10
    seconds, make it netif_dbg() instead of netif_warn(). From Peter
    Wu.

    16) Fix memory leak in rtm_to_ifaddr(), from Daniel Borkmann.

    17) sis900 gets spurious TX queue timeouts due to mismanagement of link
    carrier state, from Denis Kirjanov.

    18) Validate somaxconn sysctl to make sure it fits inside of a u16.
    From Roman Gushchin.

    19) Fix MAC address filtering on qlcnic, from Shahed Shaikh.

    * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (68 commits)
    qlcnic: Fix for flash update failure on 83xx adapter
    qlcnic: Fix link speed and duplex display for 83xx adapter
    qlcnic: Fix link speed display for 82xx adapter
    qlcnic: Fix external loopback test.
    qlcnic: Removed adapter series name from warning messages.
    qlcnic: Free up memory in error path.
    qlcnic: Fix ingress MAC learning
    qlcnic: Fix MAC address filter issue on 82xx adapter
    net: ethernet: davinci_emac: drop IRQF_DISABLED
    netlabel: use domain based selectors when address based selectors are not available
    net: check net.core.somaxconn sysctl values
    sis900: Fix the tx queue timeout issue
    net: rtm_to_ifaddr: free ifa if ifa_cacheinfo processing fails
    r8169: remove "PHY reset until link up" log spam
    net: ethernet: cpsw: drop IRQF_DISABLED
    htb: fix sign extension bug
    macvlan: handle set_promiscuity failures
    macvlan: better mode validation
    tipc: fix oops when creating server socket fails
    net: rename CONFIG_NET_LL_RX_POLL to CONFIG_NET_RX_BUSY_POLL
    ...

    Linus Torvalds
     

03 Aug, 2013

4 commits

  • When CONFIG_TRACING is not enabled, the stub prototype for trace_dump_stack()
    is incorrect. It has (void) when it should be (int).

    Link: http://lkml.kernel.org/r/CAPhKKr_H=ukFnBL4WgDOVT5ay2xeF-Ho+CA0DWZX0E2JW-=vSQ@mail.gmail.com

    Signed-off-by: Dhaval Giani
    Signed-off-by: Steven Rostedt

    Dhaval Giani
     
  • tracing_read_pipe zeros all fields below "seq". The declaration contains
    a comment about that, but it doesn't help.

    The first field is "snapshot". It is true when the currently open file
    is a snapshot, so it obviously should not be zeroed.

    The second field is "started". It was converted from cpumask_t to
    cpumask_var_t (v2.6.28-4983-g4462344), in other words it was
    converted from a cpumask to a pointer to a cpumask.

    Currently the reference to the "started" memory is lost after the first
    read from tracing_read_pipe and the object is never freed.

    The "started" is never dereferenced for trace_pipe, because trace_pipe
    can't have the TRACE_FILE_ANNOTATE options.
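
    A minimal sketch of the "zero everything below seq" pattern follows.
    struct iter_sketch and reset_below_seq() are hypothetical names, and
    the layout is an assumption: the log does not say how the fix was
    made, so the sketch simply places the fields that must survive above
    the marker member.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified iterator; not the kernel's struct. */
struct iter_sketch {
    int cpu;
    /* Fields that must survive a reset live above "seq". */
    int snapshot;
    void *started;      /* stands in for a pointer-typed cpumask */
    /* Everything from "seq" to the end of the struct is wiped. */
    char seq[32];
    long pos;
    long idx;
};

/* Zero the tail of the struct, starting at the "seq" member. */
static void reset_below_seq(struct iter_sketch *it)
{
    memset(&it->seq, 0,
           sizeof(*it) - offsetof(struct iter_sketch, seq));
}
```

    If "started" sat below "seq" instead, the memset would overwrite the
    pointer and the allocation behind it could no longer be freed, which
    is the leak described above.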

    Link: http://lkml.kernel.org/r/1375463803-3085183-1-git-send-email-avagin@openvz.org

    Cc: stable@vger.kernel.org # 2.6.30
    Signed-off-by: Andrew Vagin
    Signed-off-by: Steven Rostedt

    Andrew Vagin
     
  • Pull infiniband/rdma fixes from Roland Dreier:
    - Fixes for the newly merged mlx5 hardware driver
    - Stack info leak fixes from Dan Carpenter
    - Fixes for pkey table handling with SR-IOV
    - A few other small things

    * tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband:
    IPoIB: Fix pkey change flow for virtualization environments
    IPoIB: Make sure child devices use valid/proper pkeys
    IB/core: Create QP1 using the pkey index which contains the default pkey
    mlx5_core: Variable may be used uninitialized
    mlx5_core: Implement new initialization sequence
    mlx5_core: Fix use after free in mlx5_cmd_comp_handler()
    IB/mlx5: Fix stack info leak in mlx5_ib_alloc_ucontext()
    IB/mlx5: Fix error return code in init_one()
    IB/mlx4: Use default pkey when creating tunnel QPs
    RDMA/cma: Only call cma_save_ib_info() for CM REQs
    RDMA/cma: Fix accessing invalid private data for UD
    RDMA/cma: Fix gcc warning
    Revert "RDMA/nes: Fix compilation error when nes_debug is enabled"
    IB/qib: Add err_decode() call for ring dump
    RDMA/cxgb3: Fix stack info leak in iwch_create_cq()
    RDMA/nes: Fix info leaks in nes_create_qp() and nes_create_cq()
    RDMA/ocrdma: Fix several stack info leaks
    RDMA/cxgb4: Fix stack info leak in c4iw_create_qp()
    RDMA/ocrdma: Remove unused include

    Linus Torvalds
     
  • Pull ACPI and power management fixes from Rafael Wysocki:

    - Revert two cpuidle commits added during the 3.8 development cycle
    that turn out to have introduced a significant performance regression
    as requested by Jeremy Eder.

    - The recent patches that made the freezer less heavy-weight introduced
    a regression causing user-space-driven hibernation using the ioctl()
    interface to block indefinitely when the hibernate process executes
    try_to_freeze(). Fix from Colin Cross addresses this by adding a
    process flag to mark the hibernate/suspend process to inform the
    freezer that that process should be ignored.

    - One of the recent cpufreq reverts uncovered a problem in the core
    causing the cpufreq driver module refcount to become negative after a
    system suspend-resume cycle. Fix from Rafael J Wysocki.

    - The evaluation of the ACPI battery _BIX method has never worked
    correctly, because the commit that added support for it forgot to
    take the "Revision" field in the return package into account. As a
    result, the reading of battery info doesn't work at all on some
    systems, which is addressed by a fix from Lan Tianyu.

    * tag 'pm+acpi-3.11-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
    freezer: set PF_SUSPEND_TASK flag on tasks that call freeze_processes
    ACPI / battery: Fix parsing _BIX return value
    cpufreq: Fix cpufreq driver module refcount balance after suspend/resume
    Revert "cpuidle: Quickly notice prediction failure for repeat mode"
    Revert "cpuidle: Quickly notice prediction failure in general case"

    Linus Torvalds
     

02 Aug, 2013

4 commits

  • Eliezer renamed several *ll_poll to *busy_poll, but forgot
    CONFIG_NET_LL_RX_POLL, so to avoid confusion, rename it too.

    Cc: Eliezer Tamir
    Cc: David S. Miller
    Signed-off-by: Cong Wang
    Signed-off-by: David S. Miller

    Cong Wang
     
  • When CONFIG_NET_LL_RX_POLL is not set, I got:

    net/socket.c: In function ‘sock_poll’:
    net/socket.c:1165:4: error: implicit declaration of function ‘sk_busy_loop’ [-Werror=implicit-function-declaration]

    Fix this by adding a nop when !CONFIG_NET_LL_RX_POLL.
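
    The usual shape of such a nop is a static inline stub in the header's
    #else branch. The sketch below uses hypothetical names
    (sock_sketch, sk_busy_loop_sketch) and an assumed signature; it is
    not the code from this commit.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the socket type. */
struct sock_sketch {
    int unused;
};

#ifdef CONFIG_NET_LL_RX_POLL
/* Real implementation is provided elsewhere when the option is set. */
bool sk_busy_loop_sketch(struct sock_sketch *sk, int nonblock);
#else
/* Nop stub: callers still compile, and busy polling never happens. */
static inline bool sk_busy_loop_sketch(struct sock_sketch *sk, int nonblock)
{
    (void)sk;
    (void)nonblock;
    return false;
}
#endif
```

    With the stub in place, callers need no #ifdef of their own, and the
    compiler can drop the call entirely when the option is off.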

    Cc: Eliezer Tamir
    Cc: David S. Miller
    Signed-off-by: Cong Wang
    Signed-off-by: David S. Miller

    Cong Wang
     
  • On a high-traffic router with many processors and many IPv6 dst
    entries, a soft lockup in fib6_run_gc() can occur when the number of
    entries reaches gc_thresh.

    This happens because fib6_run_gc() uses fib6_gc_lock to allow
    only one thread to run the garbage collector but ip6_dst_gc()
    doesn't update net->ipv6.ip6_rt_last_gc until fib6_run_gc()
    returns. On a system with many entries, this can take some time
    so that in the meantime, other threads pass the tests in
    ip6_dst_gc() (ip6_rt_last_gc is still not updated) and wait for
    the lock. They then have to run the garbage collector one after
    another, which blocks them for quite a long time.

    Resolve this by replacing the special value ~0UL of the expire
    parameter to fib6_run_gc() with an explicit "force" parameter that
    chooses between spin_lock_bh() and spin_trylock_bh(), and call
    fib6_run_gc() with force=false if gc_thresh is reached but not max_size.
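
    The lock-or-trylock choice can be illustrated in userspace with a C11
    atomic_flag standing in for the kernel spinlock. All names here
    (gc_lock, gc_runs, run_gc_sketch) are hypothetical, and the sketch
    only models the control flow, not the kernel's actual function.

```c
#include <stdatomic.h>
#include <stdbool.h>

static atomic_flag gc_lock = ATOMIC_FLAG_INIT;  /* stand-in for the GC lock */
static unsigned int gc_runs;                    /* completed GC passes */

/*
 * force == true:  wait for the lock, the caller must run the collector.
 * force == false: try once and back off if another thread is already
 *                 collecting, instead of queueing up behind it.
 * Returns true if this call ran a GC pass.
 */
static bool run_gc_sketch(bool force)
{
    if (force) {
        while (atomic_flag_test_and_set_explicit(&gc_lock,
                                                 memory_order_acquire))
            ;   /* spin until the lock is free */
    } else if (atomic_flag_test_and_set_explicit(&gc_lock,
                                                 memory_order_acquire)) {
        return false;   /* lock is held, skip this pass */
    }

    gc_runs++;          /* the garbage collection work would go here */

    atomic_flag_clear_explicit(&gc_lock, memory_order_release);
    return true;
}
```

    With force=false, threads that arrive while a pass is in progress
    return immediately rather than serializing behind the lock, which is
    the behaviour the commit wants between gc_thresh and max_size.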

    Signed-off-by: Michal Kubecek
    Signed-off-by: David S. Miller

    Michal Kubeček
     
  • …wireless into for-davem

    John W. Linville
     

01 Aug, 2013

1 commit

  • Pull drm fixes from Dave Airlie:
    "Radeon, nouveau, exynos, intel, mgag200..

    Not all strictly regressions but there was probably only one patch I'd
    have really left out and it didn't seem worth respinning exynos to
    avoid it, the line change count is quite low.

    radeon: regressions + more dynamic powermanagement fixes, since DPM
    is a new feature, and off by default I'd prefer to keep merging
    fixes since it has a large userbase already and I'd like to keep
    them on mainline

    nouveau: is mostly regression fixes

    i915: is a regression fix since Daniel is on holidays I've merged it.

    mgag200: I've picked a bunch of targeted fixes from a big bunch of
    distro patches,

    exynos: build fixes mostly, one regression fix

    I expect things will slow right down now, I may send on the intel
    early quirk from Jesse separately, since I think the x86 maintainers
    acked it"

    * 'drm-fixes' of git://people.freedesktop.org/~airlied/linux: (37 commits)
    drm/i915: fix missed hunk after GT access breakage
    drm/radeon/dpm: re-enable cac control on SI
    drm/radeon/dpm: fix calculations in si_calculate_leakage_for_v_and_t_formula
    drm: fix 64 bit drm fixed point helpers
    drm/radeon/atom: initialize more atom interpretor elements to 0
    drm/nouveau: fix semaphore dmabuf obj
    drm/nouveau/vm: make vm refcount into a kref
    drm/nv31/mpeg: don't recognize nv3x cards as having nv44 graph class
    drm/nv40/mpeg: write magic value to channel object to make it work
    drm/nouveau: fix size check for cards without vm
    drm/nv50-/disp: remove dcb_outp_match call, and related variables
    drm/nva3-/disp: fix hda eld writing, needs to be padded
    drm/nv31/mpeg: fix mpeg engine initialization
    drm/nv50/mc: include vp in the fb error reporting mask
    drm/nouveau: fix null pointer dereference in poll_changed
    drm/nv50/gpio: post-nv92 cards have 32 interrupt lines
    drm/nvc0/fb: take lock in nvc0_ram_put()
    drm/nouveau/core: xtensa firmware size needs to be 0x40000 no matter what
    drm/mgag200: Fix LUT programming for 16bpp
    drm/mgag200: Fix framebuffer pitch calculation
    ...

    Linus Torvalds