19 Jun, 2013

4 commits

  • This is selected sections of the current manual for fmc-bus, as
    developed outside of the kernel before submission.

    Like the other patches in this set, it corresponds to commit ab23167f of
    the repository at ohwr.org

    Signed-off-by: Alessandro Rubini
    Acked-by: Juan David Gonzalez Cobas
    Acked-by: Emilio G. Cota
    Acked-by: Samuel Iglesias Gonsalvez
    Acked-by: Rob Landley
    Signed-off-by: Greg Kroah-Hartman

    Alessandro Rubini
     
  • This module offers registration services for both carriers
    (i.e. devices) and mezzanines (i.e. drivers). The matching for devices
    and drivers is performed according to the IPMI standard for FRU
    devices (Field Replaceable Units).

    The code includes support for parsing an SDB tree if present in the FPGA,
    and dumping it for diagnostics. SDB is not mandatory.

    Files in this commit correspond to commit ab23167f in the master branch
    of the project hosted on ohwr.org.

    Signed-off-by: Alessandro Rubini
    Acked-by: Juan David Gonzalez Cobas
    Acked-by: Emilio G. Cota
    Acked-by: Samuel Iglesias Gonsalvez
    Signed-off-by: Greg Kroah-Hartman

    Alessandro Rubini
     
  • This hopefully will help point developers to the proper way that patches
    should be submitted for inclusion in the stable kernel releases.

    Reported-by: David Howells
    Acked-by: David Howells
    Signed-off-by: Greg Kroah-Hartman

    Greg Kroah-Hartman
     
  • Even if the guest was compiled without SMP support, it cannot assume that
    the host was not. So switch to mb() instead of smp_mb() to force memory
    barriers on UP guests.

    Signed-off-by: Jason Wang
    Cc: Haiyang Zhang
    Cc: stable
    Signed-off-by: K. Y. Srinivasan
    Signed-off-by: Greg Kroah-Hartman

    Jason Wang
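
The difference that the commit above relies on can be sketched in userspace C. This is only an illustration under stated assumptions: the macros below imitate how a uniprocessor kernel build reduces smp_mb() to a compiler-only barrier while mb() stays a full barrier, and `struct ring` and `ring_publish()` are hypothetical names, not the Hyper-V ring buffer code.

```c
#include <assert.h>
#include <stdint.h>

/* Compiler-only barrier: stops compiler reordering, emits no fence. */
#define barrier() __asm__ __volatile__("" ::: "memory")
/* Full hardware barrier (GCC/Clang builtin used as a stand-in). */
#define mb()      __sync_synchronize()

/* Imitation of the kernel convention: without CONFIG_SMP, smp_mb()
 * degrades to barrier(), which is not enough when the other side
 * (here, the host) runs on a different physical CPU. */
#ifdef CONFIG_SMP
#define smp_mb()  mb()
#else
#define smp_mb()  barrier()
#endif

/* Hypothetical ring shared between a guest and its host. */
struct ring {
    uint8_t data[16];
    volatile uint32_t write_index;
};

/* Store the payload, then publish the new index. The unconditional
 * mb() orders the two stores even in a build without CONFIG_SMP. */
static void ring_publish(struct ring *r, uint8_t byte)
{
    uint32_t idx = r->write_index;

    r->data[idx % sizeof(r->data)] = byte;
    mb();
    r->write_index = idx + 1;
}
```

The sketch cannot demonstrate the reordering itself; it only shows where the unconditional barrier sits relative to the two stores.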
     

18 Jun, 2013

16 commits


16 Jun, 2013

3 commits

  • Linus Torvalds
     
  • Pull ARM SoC fixes from Olof Johansson:
    "These are a little later than I planned on since I got caught up with
    handling merges for 3.11 most of the week.

    Another week, another batch of fixes for arm-soc platforms.

    Again, nothing controversial. A few more than would be ideal, but all
    are valid fixes. In particular the prima2 panic patch is critical
    since it fixes a problem where multiplatform kernels panic on all but
    prima2 hardware."

    * tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc:
    ARM: SAMSUNG: pm: Adjust for pinctrl- and DT-enabled platforms
    ARM: prima2: fix incorrect panic usage
    arm: mvebu: armada-xp-{gp,openblocks-ax3-4}: specify PCIe range
    ARM: Kirkwood: handle mv88f6282 cpu in __kirkwood_variant().
    ARM: omap3: clock: fix wrong container_of in clock36xx.c
    ARM: dts: OMAP5: Fix missing PWM capability to timer nodes
    ARM: dts: omap4-panda|sdp: Fix mux for twl6030 IRQ pin and msecure line
    ARM: dts: AM33xx: Fix properties on gpmc node
    arm: omap2: fix AM33xx hwmod infos for UART2
    ARM: OMAP3: Fix iva2_pwrdm settings for 3703

    Linus Torvalds
     
  • Pull networking fixes from David Miller:

    1) Fix RTNL locking in batman-adv, from Matthias Schiffer.

    2) Don't allow non-passthrough macvlan devices to set NOPROMISC via
    netlink, otherwise we can end up with corrupted promisc counter
    values on the device. From Michael S Tsirkin.

    3) Fix stmmac driver build with debugging defines enabled, from Dinh
    Nguyen.

    4) Make sure name string we give in socket address in AF_PACKET is NULL
    terminated, from Daniel Borkmann.

    5) Fix leaking of two uninitialized bytes of memory to userspace in
    l2tp, from Guillaume Nault.

    6) Clear IPCB(skb) before tunneling otherwise we touch dangling IP
    options state and crash. From Saurabh Mohan.

    7) Fix suspend/resume for davinci_mdio by using suspend_late and
    resume_early. From Mugunthan V N.

    8) Don't tag ip_tunnel_init_net and ip_tunnel_delete_net with
    __net_{init,exit}, they can be called outside of those contexts.
    From Eric Dumazet.

    9) Fix RX length error in sh_eth driver, from Yoshihiro Shimoda.

    10) Fix missing sctp_outq initialization in some code paths of SCTP
    stack, from Neil Horman.

    * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (21 commits)
    sctp: fully initialize sctp_outq in sctp_outq_init
    netiucv: Hold rtnl between name allocation and device registration.
    tulip: Properly check dma mapping result
    net: sh_eth: fix incorrect RX length error if R8A7740
    ip_tunnel: remove __net_init/exit from exported functions
    drivers: net: davinci_mdio: restore mdio clk divider in mdio resume
    drivers: net: davinci_mdio: moving mdio resume earlier than cpsw ethernet driver
    net/ipv4: ip_vti clear skb cb before tunneling.
    tg3: Wait for boot code to finish after power on
    l2tp: Fix sendmsg() return value
    l2tp: Fix PPP header erasure and memory leak
    bonding: fix igmp_retrans type and two related races
    bonding: reset master mac on first enslave failure
    packet: packet_getname_spkt: make sure string is always 0-terminated
    net: ethernet: stmicro: stmmac: Fix compile error when STMMAC_XMIT_DEBUG used
    be2net: Fix 32-bit DMA Mask handling
    xen-netback: don't de-reference vif pointer after having called xenvif_put()
    macvlan: don't touch promisc without passthrough
    batman-adv: Don't handle address updates when bla is disabled
    batman-adv: forward late OGMs from best next hop
    ...

    Linus Torvalds
     

15 Jun, 2013

17 commits

  • Pull powerpc fixes from Benjamin Herrenschmidt:
    "So here are 3 fixes still for 3.10. Fixes are simple, bugs are nasty
    (though not recent regressions, nasty enough) and all targeted at
    stable"

    * 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc:
    powerpc: Fix missing/delayed calls to irq_work
    powerpc: Fix emulation of illegal instructions on PowerNV platform
    powerpc: Fix stack overflow crash in resume_kernel when ftracing

    Linus Torvalds
     
  • Thanks to commit f91eb62f71b3 ("init: scream bloody murder if interrupts
    are enabled too early"), "bloody murder" is now being screamed.

    With a MIPS OCTEON config, we use on_each_cpu() in our
    irq_chip.irq_bus_sync_unlock() function. This gets called early as a
    result of the time_init() call. Because the !SMP version of
    on_each_cpu() unconditionally enables irqs, we get:

    WARNING: at init/main.c:560 start_kernel+0x250/0x410()
    Interrupts were enabled early
    CPU: 0 PID: 0 Comm: swapper Not tainted 3.10.0-rc5-Cavium-Octeon+ #801
    Call Trace:
    show_stack+0x68/0x80
    warn_slowpath_common+0x78/0xb0
    warn_slowpath_fmt+0x38/0x48
    start_kernel+0x250/0x410

    Suggested fix: Do what we already do in the SMP version of
    on_each_cpu(), and use local_irq_save/local_irq_restore. Because we
    need a flags variable, make it a static inline to avoid name space
    issues.

    [ Change from v1: Convert on_each_cpu to a static inline function, add
    #include to avoid build breakage on some files.

    on_each_cpu_mask() and on_each_cpu_cond() suffer the same problem as
    on_each_cpu(), but they are not causing !SMP bugs for me, so I will
    defer changing them to a less urgent patch. ]

    Signed-off-by: David Daney
    Cc: Ralf Baechle
    Cc: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Daney
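
The suggested fix in the commit above can be modelled in userspace C. This is a minimal sketch, not the kernel code: the interrupt state is a plain boolean, and `mock_local_irq_save()`, `mock_local_irq_restore()`, `on_each_cpu_old()` and `on_each_cpu_fixed()` are hypothetical stand-ins for the real primitives and for the !SMP on_each_cpu() before and after the change.

```c
#include <assert.h>
#include <stdbool.h>

/* Simulated per-CPU interrupt-enable state (assumption for the demo). */
static bool irqs_enabled = false;

/* Stand-in for local_irq_save(): remember the state, then disable. */
static inline unsigned long mock_local_irq_save(void)
{
    unsigned long flags = irqs_enabled ? 1UL : 0UL;

    irqs_enabled = false;
    return flags;
}

/* Stand-in for local_irq_restore(): put back whatever was saved. */
static inline void mock_local_irq_restore(unsigned long flags)
{
    irqs_enabled = (flags != 0);
}

/* Behaviour described as buggy: interrupts are enabled on the way
 * out even if the caller entered with them disabled. */
static inline int on_each_cpu_old(void (*func)(void *), void *info)
{
    irqs_enabled = false;
    func(info);
    irqs_enabled = true;
    return 0;
}

/* Behaviour after the fix: a static inline with a local flags
 * variable that restores the caller's interrupt state. */
static inline int on_each_cpu_fixed(void (*func)(void *), void *info)
{
    unsigned long flags = mock_local_irq_save();

    func(info);
    mock_local_irq_restore(flags);
    return 0;
}

/* Trivial callback used to observe that func() really runs. */
static void count_call(void *info)
{
    ++*(int *)info;
}
```

Making the helper a function rather than a macro is what lets the `flags` variable live in its own scope, which is the name space concern the commit message mentions.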
     
  • Pull VFS fixes from Al Viro:
    "Several fixes + obvious cleanup (you've missed a couple of open-coded
    can_lookup() back then)"

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
    snd_pcm_link(): fix a leak...
    use can_lookup() instead of direct checks of ->i_op->lookup
    move exit_task_namespaces() outside of exit_notify()
    fput: task_work_add() can fail if the caller has passed exit_task_work()
    ncpfs: fix rmdir returns Device or resource busy

    Linus Torvalds
     
  • Pull xfs fixes from Ben Myers:
    - Remove noisy warnings about experimental support which spam the logs
    - Add padding to align directory and attr structures correctly
    - Set block number on child buffer on a root btree split
    - Disable verifiers during log recovery for non-CRC filesystems

    * tag 'for-linus-v3.10-rc6' of git://oss.sgi.com/xfs/xfs:
    xfs: don't shutdown log recovery on validation errors
    xfs: ensure btree root split sets blkno correctly
    xfs: fix implicit padding in directory and attr CRC formats
    xfs: don't emit v5 superblock warnings on write

    Linus Torvalds
     
  • Pull char / misc fixes from Greg Kroah-Hartman:
    "Here are some small mei driver fixes for 3.10-rc6 that fix some
    reported problems"

    * tag 'char-misc-3.10-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
    mei: me: clear interrupts on the resume path
    mei: nfc: fix nfc device freeing
    mei: init: Flush scheduled work before resetting the device

    Linus Torvalds
     
  • Pull USB fixes from Greg Kroah-Hartman:
    "Here are some small USB driver fixes that resolve some reported
    problems for 3.10-rc6

    Nothing major, just 3 USB serial driver fixes, and two chipidea fixes"

    * tag 'usb-3.10-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
    usb: chipidea: fix id change handling
    usb: chipidea: fix no transceiver case
    USB: pl2303: fix device initialisation at open
    USB: spcp8x5: fix device initialisation at open
    USB: f81232: fix device initialisation at open

    Linus Torvalds
     
  • When replaying interrupts (as a result of the interrupt occurring
    while soft-disabled), in the case of the decrementer, we are exclusively
    testing for a pending timer target. However we also use decrementer
    interrupts to trigger the new "irq_work", which in this case would
    be missed.

    This changes the logic to force a replay in both cases of a timer
    boundary reached and a decrementer interrupt having actually occurred
    while disabled. The former test is still useful to catch cases where
    a CPU having been hard-disabled for a long time completely misses the
    interrupt due to a decrementer rollover.

    CC: [v3.4+]
    Signed-off-by: Benjamin Herrenschmidt
    Tested-by: Steven Rostedt

    Benjamin Herrenschmidt
     
  • Normally, the kernel emulates a few instructions that are unimplemented
    on some processors (e.g. the old dcba instruction), or privileged (e.g.
    mfpvr). The emulation of unimplemented instructions is currently not
    working on the PowerNV platform. The reason is that on these machines,
    unimplemented and illegal instructions cause a hypervisor emulation
    assist interrupt, rather than a program interrupt as on older CPUs.
    Our vector for the emulation assist interrupt just calls
    program_check_exception() directly, without setting the bit in SRR1
    that indicates an illegal instruction interrupt. This fixes it by
    making the emulation assist interrupt set that bit before calling
    program_check_exception(). With this, old programs that use no-longer
    implemented instructions such as dcba now work again.

    CC:
    Signed-off-by: Paul Mackerras
    Signed-off-by: Benjamin Herrenschmidt

    Paul Mackerras
     
  • It's possible for us to crash when running with ftrace enabled, eg:

    Bad kernel stack pointer bffffd12 at c00000000000a454
    cpu 0x3: Vector: 300 (Data Access) at [c00000000ffe3d40]
    pc: c00000000000a454: resume_kernel+0x34/0x60
    lr: c00000000000335c: performance_monitor_common+0x15c/0x180
    sp: bffffd12
    msr: 8000000000001032
    dar: bffffd12
    dsisr: 42000000

    If we look at current's stack (paca->__current->stack) we see it is
    equal to c0000002ecab0000. Our stack is 16K, and comparing to
    paca->kstack (c0000002ecab3e30) we can see that we have overflowed our
    kernel stack. This leads to us writing over our struct thread_info, and
    in this case we have corrupted thread_info->flags and set
    _TIF_EMULATE_STACK_STORE.

    Dumping the stack we see:

    3:mon> t c0000002ecab0000
    [c0000002ecab0000] c00000000002131c .performance_monitor_exception+0x5c/0x70
    [c0000002ecab0080] c00000000000335c performance_monitor_common+0x15c/0x180
    --- Exception: f01 (Performance Monitor) at c0000000000fb2ec .trace_hardirqs_off+0x1c/0x30
    [c0000002ecab0370] c00000000016fdb0 .trace_graph_entry+0xb0/0x280 (unreliable)
    [c0000002ecab0410] c00000000003d038 .prepare_ftrace_return+0x98/0x130
    [c0000002ecab04b0] c00000000000a920 .ftrace_graph_caller+0x14/0x28
    [c0000002ecab0520] c0000000000d6b58 .idle_cpu+0x18/0x90
    [c0000002ecab05a0] c00000000000a934 .return_to_handler+0x0/0x34
    [c0000002ecab0620] c00000000001e660 .timer_interrupt+0x160/0x300
    [c0000002ecab06d0] c0000000000025dc decrementer_common+0x15c/0x180
    --- Exception: 901 (Decrementer) at c0000000000104d4 .arch_local_irq_restore+0x74/0xa0
    [c0000002ecab09c0] c0000000000fe044 .trace_hardirqs_on+0x14/0x30 (unreliable)
    [c0000002ecab0fb0] c00000000016fe3c .trace_graph_entry+0x13c/0x280
    [c0000002ecab1050] c00000000003d038 .prepare_ftrace_return+0x98/0x130
    [c0000002ecab10f0] c00000000000a920 .ftrace_graph_caller+0x14/0x28
    [c0000002ecab1160] c0000000000161f0 .__ppc64_runlatch_on+0x10/0x40
    [c0000002ecab11d0] c00000000000a934 .return_to_handler+0x0/0x34
    --- Exception: 901 (Decrementer) at c0000000000104d4 .arch_local_irq_restore+0x74/0xa0

    ... and so on

    __ppc64_runlatch_on() is called from RUNLATCH_ON in the exception entry
    path. At that point the irq state is not consistent, ie. interrupts are
    hard disabled (by the exception entry), but the paca soft-enabled flag
    may be out of sync.

    This leads to the local_irq_restore() in trace_graph_entry() actually
    enabling interrupts, which we do not want. Because we have not yet
    reprogrammed the decrementer we immediately take another decrementer
    exception, and recurse.

    The fix is twofold. Firstly make sure we call DISABLE_INTS before
    calling RUNLATCH_ON. The badly named DISABLE_INTS actually reconciles
    the irq state in the paca with the hardware, making it safe again to
    call local_irq_save/restore().

    Although that should be sufficient to fix the bug, we also mark the
    runlatch routines as notrace. They are called very early in the
    exception entry and we are asking for trouble tracing them. They are
    also fairly uninteresting and tracing them just adds unnecessary
    overhead.

    [ This regression was introduced by fe1952fc0afb9a2e4c79f103c08aef5d13db1873
    "powerpc: Rework runlatch code" by myself --BenH
    ]

    CC: [v3.4+]
    Signed-off-by: Michael Ellerman
    Signed-off-by: Benjamin Herrenschmidt

    Michael Ellerman
     
  • In the case where snd_pcm_stream_linked(substream) is true, we end up
    leaking group.

    Signed-off-by: Al Viro

    Al Viro
     
  • A couple of places got missed back when Linus introduced that one...

    Signed-off-by: Al Viro

    Al Viro
     
  • exit_notify() does exit_task_namespaces() after
    forget_original_parent(). This was needed to ensure that ->nsproxy
    can't be cleared prematurely, an exiting child we are going to
    reparent can do do_notify_parent() and use the parent's (ours) pid_ns.

    However, after 32084504 "pidns: use task_active_pid_ns in
    do_notify_parent" ->nsproxy != NULL is no longer needed, we rely
    on task_active_pid_ns().

    Move exit_task_namespaces() from exit_notify() to do_exit(), after
    exit_fs() and before exit_task_work().

    This solves the problem reported by Andrey, free_ipc_ns()->shm_destroy()
    does fput() which needs task_work_add().

    Note: this particular problem can be fixed if we change fput(), and
    that change makes sense anyway. But there is another reason to move
    the callsite. The original reason for exit_task_namespaces() from
    the middle of exit_notify() was subtle and it has already gone away,
    now this looks confusing. And this allows us to simplify exit_notify(),
    we can avoid unlock/lock(tasklist) and we can use ->exit_state instead
    of PF_EXITING in forget_original_parent().

    Reported-by: Andrey Vagin
    Signed-off-by: Oleg Nesterov
    Acked-by: "Eric W. Biederman"
    Acked-by: Andrey Vagin
    Signed-off-by: Al Viro

    Oleg Nesterov
     
  • fput() assumes that it can't be called after exit_task_work() but
    this is not true, for example free_ipc_ns()->shm_destroy() can do
    this. In this case fput() silently leaks the file.

    Change it to fall back to delayed_fput_work if task_work_add() fails.
    The patch looks complicated but it is not, it changes the code from

        if (PF_KTHREAD) {
                schedule_work(...);
                return;
        }
        task_work_add(...)

    to

        if (!PF_KTHREAD) {
                if (!task_work_add(...))
                        return;
                /* fallback */
        }
        schedule_work(...);

    As for shm_destroy() in particular, we could make another fix but I
    think this change makes sense anyway. There could be another similar
    user, it is not safe to assume that task_work_add() can't fail.

    Reported-by: Andrey Vagin
    Signed-off-by: Oleg Nesterov
    Signed-off-by: Al Viro

    Oleg Nesterov
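
The control-flow change described in the commit above can be exercised with a small userspace model. It is a sketch under stated assumptions: `mock_task_work_add()`, `mock_schedule_work()` and the global flags are hypothetical stand-ins, and the only property modelled is that the release is always queued somewhere, even when task_work_add() fails.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical state standing in for the current task and the queues. */
static bool task_is_kthread;      /* models the PF_KTHREAD check       */
static bool task_work_available;  /* false once past exit_task_work()  */
static int  task_work_queued;     /* releases queued as task work      */
static int  delayed_work_queued;  /* releases queued as delayed work   */

/* Returns 0 on success and non-zero on failure. */
static int mock_task_work_add(void)
{
    if (!task_work_available)
        return -1;
    task_work_queued++;
    return 0;
}

static void mock_schedule_work(void)
{
    delayed_work_queued++;
}

/* New shape of the logic: try task work first for normal tasks and
 * fall back to the delayed work path if that fails. */
static void fput_sketch(void)
{
    if (!task_is_kthread) {
        if (!mock_task_work_add())
            return;
        /* fallback */
    }
    mock_schedule_work();
}
```

With the old shape, the third case (a normal task whose task work can no longer be added) would have queued nothing, which corresponds to the silent leak the message describes.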
     
  • Unfortunately, we cannot guarantee that items logged multiple times
    and replayed by log recovery do not take objects back in time. When
    they are taken back in time, they go into an intermediate state which
    is corrupt, and hence verification that occurs on this intermediate
    state causes log recovery to abort with a corruption shutdown.

    Instead of causing a shutdown and unmountable filesystem, don't
    verify post-recovery items before they are written to disk. This is
    less than optimal, but there is no way to detect this issue for
    non-CRC filesystems. If log recovery successfully completes, this
    will be undone and the object will be consistent by subsequent
    transactions that are replayed, so in most cases we don't need to
    take drastic action.

    For CRC enabled filesystems, leave the verifiers in place - we need
    to call them to recalculate the CRCs on the objects anyway. This
    recovery problem can be solved for such filesystems - we have a LSN
    stamped in all metadata at writeback time that we can use to determine
    whether the item should be replayed or not. This is a separate piece
    of work, so is not addressed by this patch.

    Signed-off-by: Dave Chinner
    Reviewed-by: Ben Myers
    Signed-off-by: Ben Myers

    (cherry picked from commit 9222a9cf86c0d64ffbedf567412b55da18763aa3)

    Dave Chinner
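
The commit above leaves LSN-based replay filtering for CRC enabled filesystems as separate work. One way such a check could look is sketched below; this is an assumption about the idea, not code from the patch, and `lsn_t` and `should_replay()` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical log sequence number type; larger means more recent. */
typedef uint64_t lsn_t;

/* Replay a logged change only if it is newer than what the on-disk
 * object already reflects. Replaying an older change would take the
 * object back in time, which is the intermediate state described in
 * the commit message. */
static bool should_replay(lsn_t disk_lsn, lsn_t log_item_lsn)
{
    return log_item_lsn > disk_lsn;
}
```

Non-CRC filesystems carry no such stamp in their metadata, which is why the commit disables the verifiers during recovery for them instead.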
     
  • For CRC enabled filesystems, the BMBT is rooted in an inode, so it
    passes through a different code path on root splits than the
    freespace and inode btrees. This is much less traversed by xfstests
    than the other trees. When testing on a 1k block size filesystem,
    I've been seeing ASSERT failures in generic/234 like:

    XFS: Assertion failed: cur->bc_btnum != XFS_BTNUM_BMAP || cur->bc_private.b.allocated == 0, file: fs/xfs/xfs_btree.c, line: 317

    which are generally preceded by a lblock check failure. I noticed
    this in the bmbt stats:

    $ pminfo -f xfs.btree.block_map

    xfs.btree.block_map.lookup
    value 39135

    xfs.btree.block_map.compare
    value 268432

    xfs.btree.block_map.insrec
    value 15786

    xfs.btree.block_map.delrec
    value 13884

    xfs.btree.block_map.newroot
    value 2

    xfs.btree.block_map.killroot
    value 0
    .....

    Very little coverage of root splits and merges. Indeed, on a 4k
    filesystem, block_map.newroot and block_map.killroot are both zero.
    i.e. the code is not exercised at all, and it's the only generic
    btree infrastructure operation that is not exercised by a default run
    of xfstests.

    Turns out that on a 1k filesystem, generic/234 accounts for one of
    those two root splits, and that is somewhat of a smoking gun. In
    fact, it's the same problem we saw in the directory/attr code where
    headers are memcpy()d from one block to another without updating the
    self describing metadata.

    Simple fix - when copying the header out of the root block, make
    sure the block number is updated correctly.

    Signed-off-by: Dave Chinner
    Reviewed-by: Ben Myers
    Signed-off-by: Ben Myers

    (cherry picked from commit ade1335afef556df6538eb02e8c0dc91fbd9cc37)

    Dave Chinner
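
The class of bug fixed by the commit above, a header copied with memcpy() that still carries the source block's number, can be shown with a small sketch. The structures and the `copy_root_to_child()` helper are hypothetical and far simpler than the real btree code.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical self-describing block header. */
struct blk_hdr {
    uint32_t magic;
    uint16_t level;
    uint16_t numrecs;
    uint64_t blkno;    /* block number this header claims to live at */
};

struct block {
    struct blk_hdr hdr;
    uint8_t payload[32];
};

/* Copy the root block into a newly allocated child. The memcpy()
 * drags the root's blkno along, so the header is re-stamped with the
 * child's own block number afterwards. */
static void copy_root_to_child(const struct block *root,
                               struct block *child,
                               uint64_t child_blkno)
{
    memcpy(child, root, sizeof(*child));
    child->hdr.blkno = child_blkno;
}
```

Without the second statement the child would claim to live at the root's location, and a verifier comparing the stamped block number against the real one would reject it.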
     
  • Michael L. Semon has been testing CRC patches on a 32 bit system and
    been seeing assert failures in the directory code from xfs/080.
    Thanks to Michael's heroic efforts with printk debugging, we found
    that the problem was that the last free space being left in the
    directory structure was too small to fit an unused tag structure and
    it was being corrupted and attempting to log a region out of bounds.
    Hence the assert failure looked something like:

    .....
    #5 calling xfs_dir2_data_log_unused() 36 32
    #1 4092 4095 4096
    #2 8182 8183 4096
    XFS: Assertion failed: first < BBTOB(bp->b_length), file: fs/xfs/xfs_trans_buf.c, line: 568

    Where #1 showed the first region of the dup being logged (i.e. the
    last 4 bytes of a directory buffer) and #2 shows the corrupt values
    being calculated from the length of the dup entry which overflowed
    the size of the buffer.

    It turns out that the problem was not in the logging code, nor in
    the freespace handling code. It is an initial condition bug that
    only shows up on 32 bit systems. When a new buffer is initialised,
    here's the freespace that is set up:

    [ 172.316249] calling xfs_dir2_leaf_addname() from xfs_dir_createname()
    [ 172.316346] #9 calling xfs_dir2_data_log_unused()
    [ 172.316351] #1 calling xfs_trans_log_buf() 60 63 4096
    [ 172.316353] #2 calling xfs_trans_log_buf() 4094 4095 4096

    Note the offset of the first region being logged? It's 60 bytes into
    the buffer. Once I saw that, I pretty much knew that the bug was
    going to be caused by this.

    Essentially, all directory entries are rounded to 8 bytes in length,
    and all entries start with an 8 byte alignment. This means that we
    can decode inplace as variables are naturally aligned. With the
    directory data supposedly starting on an 8 byte boundary, and all
    entries padded to 8 bytes, the minimum freespace in a directory
    block is supposed to be 8 bytes, which is large enough to fit an
    unused data entry structure (6 bytes in size). The fact we only have
    4 bytes of free space indicates a directory data block alignment
    problem.

    And what do you know - there's an implicit hole in the directory
    data block header for the CRC format, which means the header is 60
    bytes on 32 bit Intel systems and 64 bytes on 64 bit systems. Needs
    padding. And while looking at the structures, I found the same
    problem in the attr leaf header. Fix them both.

    Note that this only affects 32 bit systems with CRCs enabled.
    Everything else is just fine. Note that CRC enabled filesystems created
    before this fix on such systems will not be readable with this fix
    applied.

    Reported-by: Michael L. Semon
    Debugged-by: Michael L. Semon
    Signed-off-by: Dave Chinner
    Reviewed-by: Ben Myers
    Signed-off-by: Ben Myers

    (cherry picked from commit 8a1fd2950e1fe267e11fc8c85dcaa6b023b51b60)

    Dave Chinner
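
The 60 versus 64 byte difference described in the commit above can be reproduced on any GCC or Clang host by forcing the alignment of the 64 bit fields. The sketch below is illustrative only: the field names are assumptions loosely modelled on a CRC style header, and the `aligned` typedefs emulate the 4 byte (32 bit Intel) and 8 byte (64 bit) alignment of 64 bit integers.

```c
#include <assert.h>
#include <stdint.h>

/* 64 bit integers with the alignment each ABI would give them. */
typedef uint64_t u64_a4 __attribute__((aligned(4)));  /* 32 bit Intel */
typedef uint64_t u64_a8 __attribute__((aligned(8)));  /* 64 bit       */

struct free_ent { uint16_t offset; uint16_t length; };

/* Unpadded header under 4 byte alignment: 48 + 12 = 60 bytes. */
struct hdr_a4 {
    uint32_t magic; uint32_t crc;
    u64_a4 blkno; u64_a4 lsn;
    uint8_t uuid[16];
    u64_a4 owner;
    struct free_ent best_free[3];
};

/* Same fields under 8 byte alignment: rounded up to 64 bytes. */
struct hdr_a8 {
    uint32_t magic; uint32_t crc;
    u64_a8 blkno; u64_a8 lsn;
    uint8_t uuid[16];
    u64_a8 owner;
    struct free_ent best_free[3];
};

/* With an explicit pad both layouts are 64 bytes. */
struct hdr_pad_a4 {
    uint32_t magic; uint32_t crc;
    u64_a4 blkno; u64_a4 lsn;
    uint8_t uuid[16];
    u64_a4 owner;
    struct free_ent best_free[3];
    uint32_t pad;
};

struct hdr_pad_a8 {
    uint32_t magic; uint32_t crc;
    u64_a8 blkno; u64_a8 lsn;
    uint8_t uuid[16];
    u64_a8 owner;
    struct free_ent best_free[3];
    uint32_t pad;
};

/* Compile-time checks of the sizes discussed above. */
_Static_assert(sizeof(struct hdr_a4) == 60, "implicit hole missing");
_Static_assert(sizeof(struct hdr_a8) == 64, "tail padding expected");
_Static_assert(sizeof(struct hdr_pad_a4) == sizeof(struct hdr_pad_a8),
               "explicit pad makes the layouts agree");
```

Because the data following the header starts at `sizeof` of the header, the unpadded 32 bit layout leaves that data 4 bytes short of an 8 byte boundary, which matches the 4 byte free region seen in the debug output.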
     
  • We write the superblock every 30s or so which results in the
    verifier being called. Right now that results in this output
    every 30s:

    XFS (vda): Version 5 superblock detected. This kernel has EXPERIMENTAL support enabled!
    Use of these features in this kernel is at your own risk!

    And spamming the logs.

    We don't need to check for whether we support v5 superblocks or
    whether there are feature bits we don't support set as these are
    only relevant when we first mount the filesystem. i.e. on superblock
    read. Hence for the write verification we can just skip all the
    checks (and hence verbose output) altogether.

    Signed-off-by: Dave Chinner
    Reviewed-by: Brian Foster
    Signed-off-by: Ben Myers

    (cherry picked from commit 34510185abeaa5be9b178a41c0a03d30aec3db7e)

    Dave Chinner