05 Jul, 2015

5 commits

  • Pull more vfs updates from Al Viro:
    "Assorted VFS fixes and related cleanups (IMO the most interesting in
    that part are f_path-related things and Eric's descriptor-related
    stuff). UFS regression fixes (it got broken last cycle). 9P fixes.
    fs-cache series, DAX patches, Jan's file_remove_suid() work"

    [ I'd say this is much more than "fixes and related cleanups". The
    file_table locking rule change by Eric Dumazet is a rather big and
    fundamental update even if the patch isn't huge. - Linus ]

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (49 commits)
    9p: cope with bogus responses from server in p9_client_{read,write}
    p9_client_write(): avoid double p9_free_req()
    9p: forgetting to cancel request on interrupted zero-copy RPC
    dax: bdev_direct_access() may sleep
    block: Add support for DAX reads/writes to block devices
    dax: Use copy_from_iter_nocache
    dax: Add block size note to documentation
    fs/file.c: __fget() and dup2() atomicity rules
    fs/file.c: don't acquire files->file_lock in fd_install()
    fs:super:get_anon_bdev: fix race condition could cause dev exceed its upper limitation
    vfs: avoid creation of inode number 0 in get_next_ino
    namei: make set_root_rcu() return void
    make simple_positive() public
    ufs: use dir_pages instead of ufs_dir_pages()
    pagemap.h: move dir_pages() over there
    remove the pointless include of lglock.h
    fs: cleanup slight list_entry abuse
    xfs: Correctly lock inode when removing suid and file capabilities
    fs: Call security_ops->inode_killpriv on truncate
    fs: Provide function telling whether file_remove_privs() will do anything
    ...

    Linus Torvalds
     
  • Commit 835a6a2f8603 ("Bluetooth: Stop sabotaging list poisoning")
    thought that the code was sabotaging the list poisoning when NULL'ing
    out the list pointers and removed it.

    But what was going on was that the bluetooth code was using NULL
    pointers for the list as a way to mark it empty, and that commit just
    broke it (and replaced the test against NULL with a "list_empty()" test on
    an uninitialized list instead, breaking things even further).

    So fix it all up to use the regular and real list_empty() handling
    (which does not use NULL, but a pointer to itself), also making sure to
    initialize the list properly (the previous NULL case was initialized
    implicitly by the session being allocated with kzalloc()).
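
    A minimal userspace sketch of why a zeroed list head is not an empty
    list. The demo_* names and struct demo_session are hypothetical, and
    memset() stands in for kzalloc(); only the "points to itself"
    convention is taken from the text above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Userspace stand-in for the kernel's struct list_head (illustration only). */
struct list_head {
	struct list_head *next, *prev;
};

/* An empty list points at itself, never at NULL. */
void demo_init_list_head(struct list_head *head)
{
	assert(head != NULL);
	head->next = head;
	head->prev = head;
}

/* The "pointer to itself" emptiness test described above. */
bool demo_list_empty(const struct list_head *head)
{
	return head->next == head;
}

/* Hypothetical session object; memset() mimics a kzalloc() allocation. */
struct demo_session {
	struct list_head list;
};

/* Zeroed but never initialized: next is NULL, so the list looks non-empty. */
bool demo_zeroed_list_looks_empty(void)
{
	struct demo_session s;

	memset(&s, 0, sizeof(s));
	return demo_list_empty(&s.list);
}

/* Zeroed, then initialized: the list is correctly reported as empty. */
bool demo_initialized_list_looks_empty(void)
{
	struct demo_session s;

	memset(&s, 0, sizeof(s));
	demo_init_list_head(&s.list);
	return demo_list_empty(&s.list);
}
```

    In this sketch the zeroed session reports a non-empty list, which is
    the failure mode described above; the explicit initialization is what
    makes the emptiness test meaningful.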

    This is a combination of patches by Marcel Holtmann and Tedd Ho-Jeong
    An.

    [ I would normally expect to get this through the bt tree, but I'm going
    to release -rc1, so I'm just committing this directly - Linus ]

    Reported-and-tested-by: Jörg Otte
    Cc: Alexey Dobriyan
    Original-by: Tedd Ho-Jeong An
    Original-by: Marcel Holtmann
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     
  • if server claims to have written/read more than we'd told it to,
    warn and cap the claimed byte count to avoid advancing more than
    we are ready to.
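
    A rough sketch of the capping idea, not the actual 9p client code.
    struct demo_iter and demo_advance_capped() are hypothetical names
    used only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical cursor over a caller-owned buffer. */
struct demo_iter {
	size_t pos;   /* bytes already consumed */
	size_t len;   /* total bytes available */
};

/*
 * Advance the cursor by the byte count the server reported, but never by
 * more than was requested. Returns the number of bytes actually consumed.
 */
size_t demo_advance_capped(struct demo_iter *it, size_t requested, size_t reported)
{
	assert(it != NULL && it->pos <= it->len);
	assert(requested <= it->len - it->pos);

	if (reported > requested) {
		/* Warn and cap, as the commit message describes. */
		fprintf(stderr, "bogus RPC response: %zu > %zu\n", reported, requested);
		reported = requested;
	}
	it->pos += reported;
	return reported;
}
```

    With a request for 10 bytes and a claimed 50, the cursor in this
    sketch moves by 10 only, so it can never run past the caller's buffer.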

    Al Viro
     
  • Braino in "9p: switch p9_client_write() to passing it struct iov_iter *";
    if response is impossible to parse and we discard the request, get
    out of the loop right there.

    Cc: stable@vger.kernel.org
    Signed-off-by: Al Viro

    Al Viro
     
  • If we'd already sent a request and decide to abort it, we *must*
    issue TFLUSH properly and not just blindly reuse the tag, or
    we'll get seriously screwed when response eventually arrives
    and we confuse it for response to later request that had reused
    the same tag.

    Cc: stable@vger.kernel.org # v3.2 and later
    Signed-off-by: Al Viro

    Al Viro
     

03 Jul, 2015

3 commits

  • Pull Ceph updates from Sage Weil:
    "We have a pile of bug fixes from Ilya, including a few patches that
    sync up the CRUSH code with the latest from userspace.

    There is also a long series from Zheng that fixes various issues with
    snapshots, inline data, and directory fsync, some simplification and
    improvement in the cap release code, and a rework of the caching of
    directory contents.

    To top it off there are a few small fixes and cleanups from Benoit and
    Hong"

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: (40 commits)
    rbd: use GFP_NOIO in rbd_obj_request_create()
    crush: fix a bug in tree bucket decode
    libceph: Fix ceph_tcp_sendpage()'s more boolean usage
    libceph: Remove spurious kunmap() of the zero page
    rbd: queue_depth map option
    rbd: store rbd_options in rbd_device
    rbd: terminate rbd_opts_tokens with Opt_err
    ceph: fix ceph_writepages_start()
    rbd: bump queue_max_segments
    ceph: rework dcache readdir
    crush: sync up with userspace
    crush: fix crash from invalid 'take' argument
    ceph: switch some GFP_NOFS memory allocation to GFP_KERNEL
    ceph: pre-allocate data structure that tracks caps flushing
    ceph: re-send flushing caps (which are revoked) in reconnect stage
    ceph: send TID of the oldest pending caps flush to MDS
    ceph: track pending caps flushing globally
    ceph: track pending caps flushing accurately
    libceph: fix wrong name "Ceph filesystem for Linux"
    ceph: fix directory fsync
    ...

    Linus Torvalds
     
  • Pull NFS client updates from Trond Myklebust:
    "Highlights include:

    Stable patches:
    - Fix a crash in the NFSv4 file locking code.
    - Fix an fsync() regression, where we were failing to retry I/O in
    some circumstances.
    - Fix an infinite loop in NFSv4.0 OPEN stateid recovery
    - Fix a memory leak when an attempted pnfs fails.
    - Fix a memory leak in the backchannel code
    - Large hostnames were not supported correctly in NFSv4.1
    - Fix a pNFS/flexfiles bug that was impeding error reporting on I/O.
    - Fix a couple of credential issues in pNFS/flexfiles

    Bugfixes + cleanups:
    - Open flag sanity checks in the NFSv4 atomic open codepath
    - More NFSv4 delegation related bugfixes
    - Various NFSv4.1 backchannel bugfixes and cleanups
    - Fix the NFS swap socket code
    - Various cleanups of the NFSv4 SETCLIENTID and EXCHANGE_ID code
    - Fix a UDP transport deadlock issue

    Features:
    - More RDMA client transport improvements
    - NFSv4.2 LAYOUTSTATS functionality for pnfs flexfiles"

    * tag 'nfs-for-4.2-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (87 commits)
    nfs: Remove invalid tk_pid from debug message
    nfs: Remove invalid NFS_ATTR_FATTR_V4_REFERRAL checking in nfs4_get_rootfh
    nfs: Drop bad comment in nfs41_walk_client_list()
    nfs: Remove unneeded micro checking of CONFIG_PROC_FS
    nfs: Don't setting FILE_CREATED flags always
    nfs: Use remove_proc_subtree() instead remove_proc_entry()
    nfs: Remove unused argument in nfs_server_set_fsinfo()
    nfs: Fix a memory leak when meeting an unsupported state protect
    nfs: take extra reference to fl->fl_file when running a LOCKU operation
    NFSv4: When returning a delegation, don't reclaim an incompatible open mode.
    NFSv4.2: LAYOUTSTATS is optional to implement
    NFSv4.2: Fix up a decoding error in layoutstats
    pNFS/flexfiles: Fix the reset of struct pgio_header when resending
    pNFS/flexfiles: Turn off layoutcommit for servers that don't need it
    pnfs/flexfiles: protect ktime manipulation with mirror lock
    nfs: provide pnfs_report_layoutstat when NFS42 is disabled
    nfs: verify open flags before allowing open
    nfs: always update creds in mirror, even when we have an already connected ds
    nfs: fix potential credential leak in ff_layout_update_mirror_cred
    pnfs/flexfiles: report layoutstat regularly
    ...

    Linus Torvalds
     
  • …scm/linux/kernel/git/paulg/linux

    Pull module_init replacement part two from Paul Gortmaker:
    "Replace module_init with appropriate alternate initcall in non
    modules.

    This series converts non-modular code that is using the module_init()
    call to hook itself into the system to instead use one of our
    alternate priority initcalls.

    Unlike the previous series that used device_initcall and hence was a
    runtime no-op, these commits change to one of the alternate initcalls,
    because (a) we have them and (b) it seems like the right thing to do.

    For example, it would seem logical to use arch_initcall for arch
    specific setup code and fs_initcall for filesystem setup code.

    This does mean however, that changes in the init ordering will be
    taking place, and so there is a small risk that some kind of implicit
    init ordering issue may lie uncovered. But I think it is still better
    to give these ones sensible priorities than to just assign them all to
    device_initcall in order to exactly preserve the old ordering.
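
    The ordering effect can be sketched in userspace as follows. This is
    an illustration only: the demo_* names are hypothetical, and the
    numeric values are assumed to mirror the relative order of the
    kernel's initcall levels (arch before subsys before fs before device
    before late).

```c
#include <assert.h>
#include <stddef.h>

/* Assumed relative ordering of a few initcall levels (lower runs first). */
enum demo_level {
	DEMO_ARCH   = 3,
	DEMO_SUBSYS = 4,
	DEMO_FS     = 5,
	DEMO_DEVICE = 6,   /* where built-in module_init() users end up */
	DEMO_LATE   = 7
};

struct demo_initcall {
	enum demo_level level;
	const char *name;
};

/* Stable insertion sort by level: same-level entries keep their link order. */
void demo_sort_initcalls(struct demo_initcall *calls, size_t n)
{
	assert(calls != NULL || n == 0);

	for (size_t i = 1; i < n; i++) {
		struct demo_initcall key = calls[i];
		size_t j = i;

		while (j > 0 && calls[j - 1].level > key.level) {
			calls[j] = calls[j - 1];
			j--;
		}
		calls[j] = key;
	}
}
```

    Moving an entry from DEMO_DEVICE to DEMO_ARCH or DEMO_FS makes it
    run earlier relative to everything left at the device level, which is
    the reordering risk the message refers to.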

    That said, we have already made similar changes in core kernel code in
    commit c96d6660dc65 ("kernel: audit/fix non-modular users of
    module_init in core code") without any regressions reported, so this
    type of change isn't without precedent. It has also got the same
    local testing and linux-next coverage as all the other pull requests
    that I'm sending for this merge window have got.

    Once again, there is an unused module_exit function removal that shows
    up as an outlier upon casual inspection of the diffstat"

    * tag 'module_init-alternate_initcall-v4.1-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux:
    x86: perf_event_intel_pt.c: use arch_initcall to hook in enabling
    x86: perf_event_intel_bts.c: use arch_initcall to hook in enabling
    mm/page_owner.c: use late_initcall to hook in enabling
    lib/list_sort: use late_initcall to hook in self tests
    arm: use subsys_initcall in non-modular pl320 IPC code
    powerpc: don't use module_init for non-modular core hugetlb code
    powerpc: use subsys_initcall for Freescale Local Bus
    x86: don't use module_init for non-modular core bootflag code
    netfilter: don't use module_init/exit in core IPV4 code
    fs/notify: don't use module_init for non-modular inotify_user code
    mm: replace module_init usages with subsys_initcall in nommu.c

    Linus Torvalds
     

02 Jul, 2015

2 commits

  • Pull networking fixes from David Miller:

    1) mlx4 driver bug fixes (TX queue wakeups, csum complete indications)
    from Ido Shamay, Eran Ben Elisha, and Or Gerlitz.

    2) Missing unlock in error path of PTP support in renesas driver, from
    Dan Carpenter.

    3) Add Vitesse 8641 phy IDs to vitesse PHY driver, from Shaohui Xie.

    4) Bnx2x driver bug fixes (linearization of encap packets, scratchpad
    parity error notifications, flow-control and speed settings) from
    Yuval Mintz, Manish Chopra, Shahed Shaikh, and Ariel Elior.

    5) ipv6 extension header parsing in the igb chip has a HW errata,
    disable it. From Todd Fujinaka.

    6) Fix PCI link state locking issue in e1000e driver, from Yanir
    Lubetkin.

    7) Cure panics during MTU change in i40e, from Mitch Williams.

    8) Don't leak promisc refs in DSA slave driver, from Gilad Ben-Yossef.

    9) Add missing HAS_DMA dep to VIA Rhine driver, from Geert
    Uytterhoeven.

    10) Make sure DMA map/unmap calls are symmetric in bnx2x driver, from
    Michal Schmidt.

    11) Workaround for MDIO access problems in bcm7xxx devices, from Florian
    Fainelli.

    12) Fix races in SCTP protocol between OOTB responses and route
    removals, from Alexander Sverdlin.

    13) Fix jumbo frame checksum issue with some mvneta devices, from Simon
    Guinot.

    * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (58 commits)
    sock_diag: don't broadcast kernel sockets
    net: mvneta: disable IP checksum with jumbo frames for Armada 370
    ARM: mvebu: update Ethernet compatible string for Armada XP
    net: mvneta: introduce compatible string "marvell, armada-xp-neta"
    api: fix compatibility of linux/in.h with netinet/in.h
    net: icplus: fix typo in constant name
    sis900: Trivial: Fix typos in enums
    stmmac: Trivial: fix typo in constant name
    sctp: Fix race between OOTB responce and route removal
    net-Liquidio: Delete unnecessary checks before the function call "vfree"
    vmxnet3: Bump up driver version number
    amd-xgbe: Add the __GFP_NOWARN flag to Rx buffer allocation
    net: phy: mdio-bcm-unimac: workaround initial read failures for integrated PHYs
    net: bcmgenet: workaround initial read failures for integrated PHYs
    net: phy: bcm7xxx: workaround MDIO management controller initial read
    bnx2x: fix DMA API usage
    net: via: VIA_RHINE and VIA_VELOCITY should depend on HAS_DMA
    net/phy: tune get_phy_c45_ids to support more c45 phy
    bnx2x: fix lockdep splat
    net: fec: don't access RACC register when not available
    ...

    Linus Torvalds
     
  • Pull module updates from Rusty Russell:
    "Main excitement here is Peter Zijlstra's lockless rbtree optimization
    to speed module address lookup. He found some abusers of the module
    lock doing that too.

    A little bit of parameter work here too; including Dan Streetman's
    breaking up the big param mutex so writing a parameter can load
    another module (yeah, really). Unfortunately that broke the usual
    suspects, !CONFIG_MODULES and !CONFIG_SYSFS, so those fixes were
    appended too"

    * tag 'modules-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux: (26 commits)
    modules: only use mod->param_lock if CONFIG_MODULES
    param: fix module param locks when !CONFIG_SYSFS.
    rcu: merge fix for Convert ACCESS_ONCE() to READ_ONCE() and WRITE_ONCE()
    module: add per-module param_lock
    module: make perm const
    params: suppress unused variable error, warn once just in case code changes.
    modules: clarify CONFIG_MODULE_COMPRESS help, suggest 'N'.
    kernel/module.c: avoid ifdefs for sig_enforce declaration
    kernel/workqueue.c: remove ifdefs over wq_power_efficient
    kernel/params.c: export param_ops_bool_enable_only
    kernel/params.c: generalize bool_enable_only
    kernel/module.c: use generic module param operaters for sig_enforce
    kernel/params: constify struct kernel_param_ops uses
    sysfs: tightened sysfs permission checks
    module: Rework module_addr_{min,max}
    module: Use __module_address() for module_address_lookup()
    module: Make the mod_tree stuff conditional on PERF_EVENTS || TRACING
    module: Optimize __module_address() using a latched RB-tree
    rbtree: Implement generic latch_tree
    seqlock: Introduce raw_read_seqcount_latch()
    ...

    Linus Torvalds
     

01 Jul, 2015

2 commits

  • struct crush_bucket_tree::num_nodes is u8, so ceph_decode_8_safe()
    should be used. -Wconversion catches this, but I guess it went
    unnoticed in all the noise it spews. The actual problem (at least for
    common crushmaps) isn't the u32 -> u8 truncation though - it's the
    advancement by 4 bytes instead of 1 in the crushmap buffer.
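
    A small userspace sketch of the cursor problem, assuming a
    little-endian wire format. The demo_* helpers are hypothetical
    stand-ins, not the ceph_decode_*_safe() macros themselves.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read one byte and advance the cursor by 1 (what a u8 field needs). */
uint8_t demo_decode_8(const uint8_t **p, const uint8_t *end)
{
	assert(end - *p >= 1);
	return *(*p)++;
}

/* Read a little-endian 32-bit value and advance the cursor by 4. */
uint32_t demo_decode_32(const uint8_t **p, const uint8_t *end)
{
	uint32_t v;

	assert(end - *p >= 4);
	v = (uint32_t)(*p)[0] | (uint32_t)(*p)[1] << 8 |
	    (uint32_t)(*p)[2] << 16 | (uint32_t)(*p)[3] << 24;
	*p += 4;
	return v;
}

/*
 * Decode a one-byte field using the given width (1 or 4), then return the
 * byte the cursor lands on. Only width 1 lands on the real next field.
 */
uint8_t demo_next_field_after(int width)
{
	static const uint8_t buf[] = { 0x08, 0xAA, 0xBB, 0xCC, 0xDD };
	const uint8_t *p = buf;
	const uint8_t *end = buf + sizeof(buf);

	if (width == 1)
		(void)demo_decode_8(&p, end);
	else
		(void)demo_decode_32(&p, end);
	return demo_decode_8(&p, end);
}
```

    With the 4-byte decode, the next read in this sketch lands on 0xDD
    instead of 0xAA, so every following field is misparsed even when the
    truncated value itself happens to be right.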

    Fixes: http://tracker.ceph.com/issues/2759

    Cc: stable@vger.kernel.org
    Signed-off-by: Ilya Dryomov
    Reviewed-by: Josh Durgin

    Ilya Dryomov
     
  • Kernel sockets do not hold a reference for the network namespace to
    which they point. Socket destruction broadcasting relies on the
    network namespace and will cause the splat below when a kernel socket
    is destroyed.

    This fix simply ignores kernel sockets when they are destroyed.

    Reported as:
    general protection fault: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
    CPU: 1 PID: 9130 Comm: kworker/1:1 Not tainted 4.1.0-gelk-debug+ #1
    Workqueue: sock_diag_events sock_diag_broadcast_destroy_work
    Stack:
    ffff8800b9c586c0 ffff8800b9c586c0 ffff8800ac4692c0 ffff8800936d4a90
    ffff8800352efd38 ffffffff8469a93e ffff8800352efd98 ffffffffc09b9b90
    ffff8800352efd78 ffff8800ac4692c0 ffff8800b9c586c0 ffff8800831b6ab8
    Call Trace:
    [] ? mutex_unlock+0xe/0x10
    [] ? inet_diag_handler_get_info+0x110/0x1fb [inet_diag]
    [] netlink_broadcast+0x1d/0x20
    [] ? mutex_unlock+0xe/0x10
    [] sock_diag_broadcast_destroy_work+0xd5/0x160
    [] process_one_work+0x147/0x420
    [] worker_thread+0x69/0x470
    [] ? preempt_count_sub+0xa3/0xf0
    [] ? rescuer_thread+0x320/0x320
    [] kthread+0x107/0x120
    [] ? kthread_create_on_node+0x1b0/0x1b0
    [] ret_from_fork+0x3f/0x70
    [] ? kthread_create_on_node+0x1b0/0x1b0

    Tested:
    Using a debug kernel while 'ss -E' is running:
    ip netns add test-ns
    ip netns delete test-ns

    Fixes: eb4cb008529c sock_diag: define destruction multicast groups
    Fixes: 26abe14379f8 net: Modify sk_alloc to not reference count the
    netns of kernel sockets.
    Reported-by: Dave Jones
    Suggested-by: Eric Dumazet

    Signed-off-by: Craig Gallek
    Signed-off-by: David S. Miller

    Craig Gallek
     

30 Jun, 2015

2 commits

  • From struct ceph_msg_data_cursor in include/linux/ceph/messenger.h:

    bool last_piece; /* current is last piece */

    In ceph_msg_data_next():

    *last_piece = cursor->last_piece;

    A call to ceph_msg_data_next() is followed by:

    ret = ceph_tcp_sendpage(con->sock, page, page_offset,
    length, last_piece);

    while ceph_tcp_sendpage() is:

    static int ceph_tcp_sendpage(struct socket *sock, struct page *page,
    int offset, size_t size, bool more)

    The logic is inverted: correct it.
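
    A minimal sketch of the inversion, assuming the callee turns its
    "more" argument into a "more data follows" hint. DEMO_FLAG_MORE and
    the demo_* functions are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define DEMO_FLAG_MORE 0x1   /* hypothetical "more data follows" flag */

/* Callee: takes 'more', like the ceph_tcp_sendpage() prototype above. */
int demo_send_flags(bool more)
{
	return more ? DEMO_FLAG_MORE : 0;
}

/* Caller: holds 'last_piece', so it has to negate it before passing it on. */
int demo_flags_for_piece(bool last_piece)
{
	return demo_send_flags(!last_piece);
}

/* Count how many of 'pieces' sends carry the hint: all but the last one. */
int demo_count_more_hints(int pieces)
{
	int count = 0;

	assert(pieces > 0);
	for (int i = 0; i < pieces; i++) {
		bool last_piece = (i == pieces - 1);

		if (demo_flags_for_piece(last_piece) & DEMO_FLAG_MORE)
			count++;
	}
	return count;
}
```

    Passing last_piece through unchanged would set the hint only on the
    final piece, the opposite of what the parameter name promises.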

    Signed-off-by: Benoît Canet
    Reviewed-by: Alex Elder
    Signed-off-by: Ilya Dryomov

    Benoît Canet
     
    A NULL pointer dereference is possible during statistics update if the route
    used for the OOTB response is removed at an unfortunate time. If the route exists
    when we receive the OOTB packet and we jump into sctp_packet_transmit() to send
    ABORT, but in the meantime the route is removed from under our feet, we take the
    "no_route" path and try to update stats with
    IP_INC_STATS(sock_net(asoc->base.sk), ...).

    But sctp_ootb_pkt_new(), used to prepare the response packet, doesn't call
    sctp_transport_set_owner(), and therefore there is no asoc associated with this
    packet. A temporary asoc just for OOTB responses is probably overkill, so just
    introduce a check like in all other places in sctp_packet_transmit() where
    "asoc" is dereferenced.

    To reproduce this, one needs to
    0. ensure that sctp module is loaded (otherwise ABORT is not generated)
    1. remove default route on the machine
    2. while true; do
    ip route del [interface-specific route]
    ip route add [interface-specific route]
    done
    3. send enough OOTB packets (i.e. HB REQs) from another host to trigger ABORT
    response

    On x86_64 the crash looks like this:

    BUG: unable to handle kernel NULL pointer dereference at 0000000000000020
    IP: [] sctp_packet_transmit+0x63c/0x730 [sctp]
    PGD 0
    Oops: 0000 [#1] PREEMPT SMP
    Modules linked in: ...
    CPU: 0 PID: 0 Comm: swapper/0 Tainted: G O 4.0.5-1-ARCH #1
    Hardware name: ...
    task: ffffffff818124c0 ti: ffffffff81800000 task.ti: ffffffff81800000
    RIP: 0010:[] [] sctp_packet_transmit+0x63c/0x730 [sctp]
    RSP: 0018:ffff880127c037b8 EFLAGS: 00010296
    RAX: 0000000000000000 RBX: 0000000000000000 RCX: 00000015ff66b480
    RDX: 00000015ff66b400 RSI: ffff880127c17200 RDI: ffff880123403700
    RBP: ffff880127c03888 R08: 0000000000017200 R09: ffffffff814625af
    R10: ffffea00047e4680 R11: 00000000ffffff80 R12: ffff8800b0d38a28
    R13: ffff8800b0d38a28 R14: ffff8800b3e88000 R15: ffffffffa05f24e0
    FS: 0000000000000000(0000) GS:ffff880127c00000(0000) knlGS:0000000000000000
    CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
    CR2: 0000000000000020 CR3: 00000000c855b000 CR4: 00000000000007f0
    Stack:
    ffff880127c03910 ffff8800b0d38a28 ffffffff8189d240 ffff88011f91b400
    ffff880127c03828 ffffffffa05c94c5 0000000000000000 ffff8800baa1c520
    0000000000000000 0000000000000001 0000000000000000 0000000000000000
    Call Trace:

    [] ? sctp_sf_tabort_8_4_8.isra.20+0x85/0x140 [sctp]
    [] ? sctp_transport_put+0x52/0x80 [sctp]
    [] sctp_do_sm+0xb8c/0x19a0 [sctp]
    [] ? trigger_load_balance+0x90/0x210
    [] ? update_process_times+0x59/0x60
    [] ? timerqueue_add+0x60/0xb0
    [] ? enqueue_hrtimer+0x29/0xa0
    [] ? read_tsc+0x9/0x10
    [] ? put_page+0x55/0x60
    [] ? clockevents_program_event+0x6d/0x100
    [] ? skb_free_head+0x58/0x80
    [] ? chksum_update+0x1b/0x27 [crc32c_generic]
    [] ? crypto_shash_update+0xce/0xf0
    [] sctp_endpoint_bh_rcv+0x113/0x280 [sctp]
    [] sctp_inq_push+0x46/0x60 [sctp]
    [] sctp_rcv+0x880/0x910 [sctp]
    [] ? sctp_packet_transmit_chunk+0xb0/0xb0 [sctp]
    [] ? sctp_csum_update+0x20/0x20 [sctp]
    [] ? ip_route_input_noref+0x235/0xd30
    [] ? ack_ioapic_level+0x7b/0x150
    [] ip_local_deliver_finish+0xae/0x210
    [] ip_local_deliver+0x35/0x90
    [] ip_rcv_finish+0xf5/0x370
    [] ip_rcv+0x2b8/0x3a0
    [] __netif_receive_skb_core+0x763/0xa50
    [] __netif_receive_skb+0x18/0x60
    [] netif_receive_skb_internal+0x40/0xd0
    [] napi_gro_receive+0xe8/0x120
    [] rtl8169_poll+0x2da/0x660 [r8169]
    [] net_rx_action+0x21a/0x360
    [] __do_softirq+0xe1/0x2d0
    [] irq_exit+0xad/0xb0
    [] do_IRQ+0x58/0xf0
    [] common_interrupt+0x6d/0x6d

    [] ? hrtimer_start+0x18/0x20
    [] ? sctp_transport_destroy_rcu+0x29/0x30 [sctp]
    [] ? mwait_idle+0x60/0xa0
    [] arch_cpu_idle+0xf/0x20
    [] cpu_startup_entry+0x3ec/0x480
    [] rest_init+0x85/0x90
    [] start_kernel+0x48b/0x4ac
    [] ? early_idt_handlers+0x120/0x120
    [] x86_64_start_reservations+0x2a/0x2c
    [] x86_64_start_kernel+0x161/0x184
    Code: 90 48 8b 80 b8 00 00 00 48 89 85 70 ff ff ff 48 83 bd 70 ff ff ff 00 0f 85 cd fa ff ff 48 89 df 31 db e8 18 63 e7 e0 48 8b 45 80 8b 40 20 48 8b 40 30 48 8b 80 68 01 00 00 65 48 ff 40 78 e9
    RIP [] sctp_packet_transmit+0x63c/0x730 [sctp]
    RSP
    CR2: 0000000000000020
    ---[ end trace 5aec7fd2dc983574 ]---
    Kernel panic - not syncing: Fatal exception in interrupt
    Kernel Offset: 0x0 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffff9fffffff)
    drm_kms_helper: panic occurred, switching back to text console
    ---[ end Kernel panic - not syncing: Fatal exception in interrupt

    Signed-off-by: Alexander Sverdlin
    Acked-by: Neil Horman
    Acked-by: Marcelo Ricardo Leitner
    Acked-by: Vlad Yasevich
    Signed-off-by: David S. Miller

    Alexander Sverdlin
     

29 Jun, 2015

6 commits

  • DSA master netdev promiscuity counter was not being properly
    decremented on slave device open error path.

    Signed-off-by: Gilad Ben-Yossef
    CC: Gilad Ben-Yossef
    CC: David S. Miller
    CC: Florian Fainelli
    CC: Guenter Roeck
    CC: Andrew Lunn
    CC: Scott Feldman
    Acked-by: Andrew Lunn
    Acked-by: Florian Fainelli
    Signed-off-by: David S. Miller

    Gilad Ben-Yossef
     
  • No more users, so it can now be removed.

    Signed-off-by: David S. Miller

    David Miller
     
  • Just make a ax25_sock structure that provides the ax25_cb pointer.

    Signed-off-by: David S. Miller

    David Miller
     
  • net/core/flow_dissector.c: In function ‘__skb_flow_dissect’:
    net/core/flow_dissector.c:132: warning: ‘ip_proto’ may be used uninitialized in this function

    Signed-off-by: Geert Uytterhoeven
    Acked-by: Tom Herbert
    Signed-off-by: David S. Miller

    Geert Uytterhoeven
     
  • The following lockdep splat was seen due to the wrong context for
    grabbing in_dev.

    ===============================
    [ INFO: suspicious RCU usage. ]
    4.1.0-next-20150626-dbg-00020-g54a6d91-dirty #244 Not tainted
    -------------------------------
    include/linux/inetdevice.h:205 suspicious rcu_dereference_check() usage!

    other info that might help us debug this:

    rcu_scheduler_active = 1, debug_locks = 0
    2 locks held by ip/403:
    #0: (rtnl_mutex){+.+.+.}, at: [] rtnl_lock+0x17/0x19
    #1: ((inetaddr_chain).rwsem){.+.+.+}, at: [] __blocking_notifier_call_chain+0x35/0x6a

    stack backtrace:
    CPU: 2 PID: 403 Comm: ip Not tainted 4.1.0-next-20150626-dbg-00020-g54a6d91-dirty #244
    0000000000000001 ffff8800b189b728 ffffffff8150a542 ffffffff8107a8b3
    ffff880037bbea40 ffff8800b189b758 ffffffff8107cb74 ffff8800379dbd00
    ffff8800bec85800 ffff8800bf9e13c0 00000000000000ff ffff8800b189b7d8
    Call Trace:
    [] dump_stack+0x4c/0x6e
    [] ? up+0x39/0x3e
    [] lockdep_rcu_suspicious+0xf7/0x100
    [] fib_dump_info+0x227/0x3e2
    [] rtmsg_fib+0xa6/0x116
    [] fib_table_insert+0x316/0x355
    [] fib_magic+0xb7/0xc7
    [] fib_add_ifaddr+0xb1/0x13b
    [] fib_inetaddr_event+0x36/0x90
    [] notifier_call_chain+0x4c/0x71
    [] __blocking_notifier_call_chain+0x4e/0x6a
    [] blocking_notifier_call_chain+0x14/0x16
    [] __inet_insert_ifa+0x1a5/0x1b3
    [] inet_rtm_newaddr+0x350/0x35f
    [] rtnetlink_rcv_msg+0x17b/0x18a
    [] ? trace_hardirqs_on+0xd/0xf
    [] ? netlink_deliver_tap+0x1cb/0x1f7
    [] ? rtnl_newlink+0x72a/0x72a
    ...

    This patch resolves that splat.

    Signed-off-by: Andy Gospodarek
    Reported-by: Sergey Senozhatsky
    Signed-off-by: David S. Miller

    Andy Gospodarek
     
  • In commit 1f66d161ab3d8b518903fa6c3f9c1f48d6919e74
    ("tipc: introduce starvation free send algorithm")
    we introduced a counter per priority level for buffers
    in the link backlog queue. We also introduced a new
    function tipc_link_purge_backlog(), to reset these
    counters to zero when the link is reset.

    Unfortunately, we failed to call this function when
    the broadcast link is reset, with the result that the
    values of these counters might be permanently skewed
    when new nodes are attached. This may in the worst case
    lead to permanent, but spurious, broadcast link
    congestion, where no broadcast packets can be sent at
    all.

    We fix this bug with this commit.

    Signed-off-by: Jon Maloy
    Signed-off-by: David S. Miller

    Jon Paul Maloy
     

28 Jun, 2015

1 commit

  • Pull nfsd updates from Bruce Fields:
    "A relatively quiet cycle, with a mix of cleanup and smaller bugfixes"

    * 'for-4.2' of git://linux-nfs.org/~bfields/linux: (24 commits)
    sunrpc: use sg_init_one() in krb5_rc4_setup_enc/seq_key()
    nfsd: wrap too long lines in nfsd4_encode_read
    nfsd: fput rd_file from XDR encode context
    nfsd: take struct file setup fully into nfs4_preprocess_stateid_op
    nfsd: refactor nfs4_preprocess_stateid_op
    nfsd: clean up raparams handling
    nfsd: use swap() in sort_pacl_range()
    rpcrdma: Merge svcrdma and xprtrdma modules into one
    svcrdma: Add a separate "max data segs macro for svcrdma
    svcrdma: Replace GFP_KERNEL in a loop with GFP_NOFAIL
    svcrdma: Keep rpcrdma_msg fields in network byte-order
    svcrdma: Fix byte-swapping in svc_rdma_sendto.c
    nfsd: Update callback sequnce id only CB_SEQUENCE success
    nfsd: Reset cb_status in nfsd4_cb_prepare() at retrying
    svcrdma: Remove svc_rdma_xdr_decode_deferred_req()
    SUNRPC: Move EXPORT_SYMBOL for svc_process
    uapi/nfs: Add NFSv4.1 ACL definitions
    nfsd: Remove dead declarations
    nfsd: work around a gcc-5.1 warning
    nfsd: Checking for acl support does not require fetching any acls
    ...

    Linus Torvalds
     

25 Jun, 2015

11 commits

  • ceph_tcp_sendpage already does the work of mapping/unmapping
    the zero page if needed.

    Signed-off-by: Benoît Canet
    Reviewed-by: Alex Elder
    Signed-off-by: Ilya Dryomov

    Benoît Canet
     
  • Fix typo in the validation rules for flower's attributes

    Signed-off-by: Jamal Hadi Salim
    Signed-off-by: David S. Miller

    Jamal Hadi Salim
     
  • .. up to ceph.git commit 1db1abc8328d ("crush: eliminate ad hoc diff
    between kernel and userspace"). This fixes a bunch of recently pulled
    coding style issues and makes includes a bit cleaner.

    A patch "crush:Make the function crush_ln static" from Nicholas Krause
    is folded in as crush_ln() has been made static
    in userspace as well.

    Signed-off-by: Ilya Dryomov

    Ilya Dryomov
     
  • Verify that the 'take' argument is a valid device or bucket.
    Otherwise ignore it (do not add the value to the working vector).

    Reflects ceph.git commit 9324d0a1af61e1c234cc48e2175b4e6320fff8f4.

    Signed-off-by: Ilya Dryomov

    Ilya Dryomov
     
    modinfo libceph prints the module name "Ceph filesystem for Linux",
    which is the same as that of the real fs module, ceph. This is confusing.

    Signed-off-by: Hong Zhiguo
    Signed-off-by: Ilya Dryomov

    Hong Zhiguo
     
  • - return -ETIMEDOUT instead of -EIO in case of timeout
    - wait_event_interruptible_timeout() returns time left until timeout
    and since it can be almost LONG_MAX we had better assign it to long
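
    A sketch of the return-value handling, assuming the wait yields 0 on
    timeout, a negative errno if interrupted, and the remaining time
    otherwise. demo_wait_result() is a hypothetical helper, not the
    function touched by this commit.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>

/*
 * Map the 'long' result of a timed, interruptible wait to an int status.
 * Keeping the parameter 'long' avoids truncating a huge "time left" value
 * into something that looks like 0 or a negative error.
 */
int demo_wait_result(long timeleft)
{
	if (timeleft == 0)
		return -ETIMEDOUT;          /* timed out: not a generic -EIO */
	if (timeleft < 0) {
		assert(timeleft >= INT_MIN); /* errno values fit in an int */
		return (int)timeleft;        /* interrupted: pass the error on */
	}
	return 0;                        /* condition met before the timeout */
}
```

    Even a remaining time close to LONG_MAX maps to success here, because
    the value is only narrowed on the error path.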

    Signed-off-by: Ilya Dryomov
    Reviewed-by: Alex Elder

    Ilya Dryomov
     
  • There are currently three libceph-level timeouts that the user can
    specify on mount: mount_timeout, osd_idle_ttl and osdkeepalive. All of
    these are in seconds and no checking is done on user input: negative
    values are accepted, we multiply them all by HZ which may or may not
    overflow, arbitrarily large jiffies then get added together, etc.

    There is also a bug in the way mount_timeout=0 is handled. It's
    supposed to mean "infinite timeout", but that's not how wait.h APIs
    treat it and so __ceph_open_session() for example will busy loop
    without much chance of being interrupted if none of ceph-mons are
    there.

    Fix all this by verifying user input, storing timeouts capped by
    msecs_to_jiffies() in jiffies and using the new ceph_timeout_jiffies()
    helper for all user-specified waits to handle infinite timeouts
    correctly.
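
    A userspace sketch of the two ideas, input validation and "0 means
    wait forever". DEMO_HZ, DEMO_MAX_TIMEOUT and the demo_* functions are
    assumptions for illustration; the real helper is only named, not
    shown, in this message.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define DEMO_HZ          250UL      /* assumed tick rate */
#define DEMO_MAX_TIMEOUT LONG_MAX   /* stand-in for an "infinite" wait */

/*
 * Validate a user-supplied timeout in seconds and convert it to ticks.
 * Negative values and values whose millisecond form would overflow an int
 * are rejected. Returns 0 on success, -1 on invalid input.
 */
int demo_parse_timeout(long seconds, unsigned long *ticks_out)
{
	assert(ticks_out != NULL);

	if (seconds < 0 || seconds > INT_MAX / 1000)
		return -1;
	*ticks_out = (unsigned long)seconds * DEMO_HZ;
	return 0;
}

/* Turn a stored timeout into a wait argument: 0 means "no timeout". */
unsigned long demo_timeout_ticks(unsigned long timeout)
{
	return timeout ? timeout : (unsigned long)DEMO_MAX_TIMEOUT;
}
```

    Routing every user-specified wait through one helper like this keeps
    the "0 is infinite" rule in a single place instead of relying on each
    call site to special-case it.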

    Signed-off-by: Ilya Dryomov
    Reviewed-by: Alex Elder

    Ilya Dryomov
     
  • This one sneaked in through vfs tree with commit 2b777c9dd9eb
    ("ceph_sync_read: stop poking into iov_iter guts").

    Signed-off-by: Ilya Dryomov

    Ilya Dryomov
     
  • Signed-off-by: Yan, Zheng
    Reviewed-by: Alex Elder

    Yan, Zheng
     
  • Signed-off-by: Yan, Zheng
    Reviewed-by: Alex Elder

    Yan, Zheng
     
  • Pull networking updates from David Miller:

    1) Add TX fast path in mac80211, from Johannes Berg.

    2) Add TSO/GRO support to ibmveth, from Thomas Falcon

    3) Move away from cached routes in ipv6, just like ipv4, from Martin
    KaFai Lau.

    4) Lots of new rhashtable tests, from Thomas Graf.

    5) Run ingress qdisc lockless, from Alexei Starovoitov.

    6) Allow servers to fetch TCP packet headers for SYN packets of new
    connections, for fingerprinting. From Eric Dumazet.

    7) Add mode parameter to pktgen, for testing receive. From Alexei
    Starovoitov.

    8) Cache access optimizations via simplifications of build_skb(), from
    Alexander Duyck.

    9) Move page frag allocator under mm/, also from Alexander.

    10) Add xmit_more support to hv_netvsc, from KY Srinivasan.

    11) Add a counter guard in case we try to perform endless reclassify
    loops in the packet scheduler.

    12) Extend flow dissector to be programmable and use it in new "Flower"
    classifier. From Jiri Pirko.

    13) AF_PACKET fanout rollover fixes, performance improvements, and new
    statistics. From Willem de Bruijn.

    14) Add netdev driver for GENEVE tunnels, from John W Linville.

    15) Add ingress netfilter hooks and filtering, from Pablo Neira Ayuso.

    16) Fix handling of epoll edge triggers in TCP, from Eric Dumazet.

    17) Add an ECN retry fallback for the initial TCP handshake, from Daniel
    Borkmann.

    18) Add tail call support to BPF, from Alexei Starovoitov.

    19) Add several pktgen helper scripts, from Jesper Dangaard Brouer.

    20) Add zerocopy support to AF_UNIX, from Hannes Frederic Sowa.

    21) Favor even port numbers for allocation to connect() requests, and
    odd port numbers for bind(0), in an effort to help avoid
    ip_local_port_range exhaustion. From Eric Dumazet.

    22) Add Cavium ThunderX driver, from Sunil Goutham.

    23) Allow bpf programs to access skb_iif and dev->ifindex SKB metadata,
    from Alexei Starovoitov.

    24) Add support for T6 chips in cxgb4vf driver, from Hariprasad Shenai.

    25) Double TCP Small Queues default to 256K to accommodate situations
    like the XEN driver and wireless aggregation. From Wei Liu.

    26) Add more entropy inputs to flow dissector, from Tom Herbert.

    27) Add CDG congestion control algorithm to TCP, from Kenneth Klette
    Jonassen.

    28) Convert ipset over to RCU locking, from Jozsef Kadlecsik.

    29) Track and act upon link status of ipv4 route nexthops, from Andy
    Gospodarek.

    * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1670 commits)
    bridge: vlan: flush the dynamically learned entries on port vlan delete
    bridge: multicast: add a comment to br_port_state_selection about blocking state
    net: inet_diag: export IPV6_V6ONLY sockopt
    stmmac: troubleshoot unexpected bits in des0 & des1
    net: ipv4 sysctl option to ignore routes when nexthop link is down
    net: track link-status of ipv4 nexthops
    net: switchdev: ignore unsupported bridge flags
    net: Cavium: Fix MAC address setting in shutdown state
    drivers: net: xgene: fix for ACPI support without ACPI
    ip: report the original address of ICMP messages
    net/mlx5e: Prefetch skb data on RX
    net/mlx5e: Pop cq outside mlx5e_get_cqe
    net/mlx5e: Remove mlx5e_cq.sqrq back-pointer
    net/mlx5e: Remove extra spaces
    net/mlx5e: Avoid TX CQE generation if more xmit packets expected
    net/mlx5e: Avoid redundant dev_kfree_skb() upon NOP completion
    net/mlx5e: Remove re-assignment of wq type in mlx5e_enable_rq()
    net/mlx5e: Use skb_shinfo(skb)->gso_segs rather than counting them
    net/mlx5e: Static mapping of netdev priv resources to/from netdev TX queues
    net/mlx4_en: Use HW counters for rx/tx bytes/packets in PF device
    ...

    Linus Torvalds
     

24 Jun, 2015

8 commits

  • Add a new argument to br_fdb_delete_by_port which allows specifying a
    vid to match when flushing entries and use it in nbp_vlan_delete() to
    flush the dynamically learned entries of the vlan/port pair when removing
    a vlan from a port. Before this patch only the local mac was being
    removed and the dynamically learned ones were left to expire.
    Note that the do_all argument is still respected and if specified, the
    vid will be ignored.
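
    The matching rule described above can be sketched in userspace C. This is
    only an illustration, not the bridge code itself: struct fdb_entry and both
    function names are hypothetical, and treating vid 0 as "match any VLAN" is
    an assumption made for the sketch.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for a bridge FDB entry. */
struct fdb_entry {
    int port;          /* port the address was learned on */
    uint16_t vlan_id;  /* VLAN the entry belongs to */
    bool is_static;    /* true for local/static entries */
    bool deleted;      /* set once the entry has been flushed */
};

/* Decide whether a single entry should be flushed for (port, vid). */
bool fdb_should_flush(const struct fdb_entry *f, int port,
                      uint16_t vid, bool do_all)
{
    if (f->port != port)
        return false;
    if (do_all)
        return true;        /* do_all wins: vid is ignored */
    if (f->is_static)
        return false;       /* only dynamically learned entries */
    /* Assumption: vid == 0 means "any VLAN on this port". */
    return vid == 0 || f->vlan_id == vid;
}

/* Flush matching entries and return how many were removed. */
size_t fdb_flush_port(struct fdb_entry *tbl, size_t n, int port,
                      uint16_t vid, bool do_all)
{
    size_t flushed = 0;

    for (size_t i = 0; i < n; i++) {
        if (!tbl[i].deleted &&
            fdb_should_flush(&tbl[i], port, vid, do_all)) {
            tbl[i].deleted = true;
            flushed++;
        }
    }
    return flushed;
}
```

    Calling fdb_flush_port(tbl, n, port, vid, false) models removing a vlan
    from a port; passing true for do_all models flushing the whole port.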

    Signed-off-by: Nikolay Aleksandrov
    Signed-off-by: David S. Miller

    Nikolay Aleksandrov
     
  • Add a comment to explain why we're not disabling the port's multicast when it
    goes into blocking state. Since there's a check in the timer's function which
    bypasses the timer if the port's in blocking/disabled state, the timer will
    simply expire and stop without sending more queries.

    Suggested-by: Herbert Xu
    Signed-off-by: Nikolay Aleksandrov
    Acked-by: Herbert Xu
    Signed-off-by: David S. Miller

    Nikolay Aleksandrov
     
  • Conflicts:
    drivers/net/ethernet/mellanox/mlx4/main.c
    net/packet/af_packet.c

    Both conflicts were cases of simple overlapping changes.

    Signed-off-by: David S. Miller

    David S. Miller
     
  • For AF_INET6 sockets, the value of struct ipv6_pinfo.ipv6only is
    exported to userspace. It indicates whether a socket bound to in6addr_any
    listens on IPv4 as well as IPv6. Since the socket is natively IPv6, it is not
    listed by e.g. 'ss -l -4'.

    This patch is accompanied by an appropriate one for iproute2 to enable
    the additional information in 'ss -e'.
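
    For context, the flag being exported is the one an application controls
    with the IPV6_V6ONLY socket option. A minimal userspace sketch of setting
    it and reading it back follows; the helper name v6only_roundtrip is
    hypothetical, and the function simply returns -1 on hosts without IPv6.

```c
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Create an AF_INET6 TCP socket, set IPV6_V6ONLY to `enable`, and read
 * the option back. Returns the value read (0 or 1), or -1 on any error,
 * for example when the host has no IPv6 support.
 */
int v6only_roundtrip(int enable)
{
    int val = enable ? 1 : 0;
    int out = -1;
    socklen_t len = sizeof(out);
    int fd = socket(AF_INET6, SOCK_STREAM, 0);

    if (fd < 0)
        return -1;

    if (setsockopt(fd, IPPROTO_IPV6, IPV6_V6ONLY, &val, sizeof(val)) < 0) {
        close(fd);
        return -1;
    }

    if (getsockopt(fd, IPPROTO_IPV6, IPV6_V6ONLY, &out, &len) < 0)
        out = -1;

    close(fd);
    return out;
}
```

    With the option cleared, a socket bound to in6addr_any also accepts IPv4
    connections, which is the case the exported value lets tools distinguish.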

    Signed-off-by: Phil Sutter
    Signed-off-by: David S. Miller

    Phil Sutter
     
  • This feature is only enabled with the new per-interface or ipv4 global
    sysctls called 'ignore_routes_with_linkdown'.

    net.ipv4.conf.all.ignore_routes_with_linkdown = 0
    net.ipv4.conf.default.ignore_routes_with_linkdown = 0
    net.ipv4.conf.lo.ignore_routes_with_linkdown = 0
    ...

    When the above sysctls are set, the kernel will report to userspace that a
    route is dead and will no longer resolve to this nexthop when performing a fib
    lookup. This will signal to userspace that the route will not be
    selected. The signalling of a RTNH_F_DEAD is only passed to userspace
    if the sysctl is enabled and link is down. This was done as without it
    the netlink listeners would have no idea whether or not a nexthop would
    be selected. The kernel only sets RTNH_F_DEAD internally if the
    interface has IFF_UP cleared.

    With the new sysctl set, the following behavior can be observed
    (interface p8p1 is link-down):

    default via 10.0.5.2 dev p9p1
    10.0.5.0/24 dev p9p1 proto kernel scope link src 10.0.5.15
    70.0.0.0/24 dev p7p1 proto kernel scope link src 70.0.0.1
    80.0.0.0/24 dev p8p1 proto kernel scope link src 80.0.0.1 dead linkdown
    90.0.0.0/24 via 80.0.0.2 dev p8p1 metric 1 dead linkdown
    90.0.0.0/24 via 70.0.0.2 dev p7p1 metric 2
    90.0.0.1 via 70.0.0.2 dev p7p1 src 70.0.0.1
    cache
    local 80.0.0.1 dev lo src 80.0.0.1
    cache
    80.0.0.2 via 10.0.5.2 dev p9p1 src 10.0.5.15
    cache

    While the route does remain in the table (so it can be modified if
    needed rather than being wiped away as it would be if IFF_UP was
    cleared), the proper next-hop is chosen automatically when the link is
    down. Now interface p8p1 is linked-up:

    default via 10.0.5.2 dev p9p1
    10.0.5.0/24 dev p9p1 proto kernel scope link src 10.0.5.15
    70.0.0.0/24 dev p7p1 proto kernel scope link src 70.0.0.1
    80.0.0.0/24 dev p8p1 proto kernel scope link src 80.0.0.1
    90.0.0.0/24 via 80.0.0.2 dev p8p1 metric 1
    90.0.0.0/24 via 70.0.0.2 dev p7p1 metric 2
    192.168.56.0/24 dev p2p1 proto kernel scope link src 192.168.56.2
    90.0.0.1 via 80.0.0.2 dev p8p1 src 80.0.0.1
    cache
    local 80.0.0.1 dev lo src 80.0.0.1
    cache
    80.0.0.2 dev p8p1 src 80.0.0.1
    cache

    and the output changes to what one would expect.

    If the sysctl is not set, the following output would be expected when
    p8p1 is down:

    default via 10.0.5.2 dev p9p1
    10.0.5.0/24 dev p9p1 proto kernel scope link src 10.0.5.15
    70.0.0.0/24 dev p7p1 proto kernel scope link src 70.0.0.1
    80.0.0.0/24 dev p8p1 proto kernel scope link src 80.0.0.1 linkdown
    90.0.0.0/24 via 80.0.0.2 dev p8p1 metric 1 linkdown
    90.0.0.0/24 via 70.0.0.2 dev p7p1 metric 2

    Since the dead flag does not appear, there should be no expectation that
    the kernel would skip using this route due to link being down.
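
    The interaction between the two flags and the sysctl, as described in this
    message, can be summarised in a small sketch. It is not the fib code: the
    NH_F_* constants and both function names are hypothetical stand-ins for
    RTNH_F_DEAD and RTNH_F_LINKDOWN, defined locally so the example compiles
    on its own.

```c
#include <stdbool.h>

/* Local illustrative flags standing in for RTNH_F_DEAD / RTNH_F_LINKDOWN. */
enum {
    NH_F_DEAD     = 1u << 0,  /* set internally when IFF_UP is cleared */
    NH_F_LINKDOWN = 1u << 4   /* set when carrier is off */
};

/* Would a lookup consider this nexthop, given the sysctl setting? */
bool nexthop_usable(unsigned int flags, bool ignore_linkdown)
{
    if (flags & NH_F_DEAD)
        return false;
    if ((flags & NH_F_LINKDOWN) && ignore_linkdown)
        return false;
    return true;
}

/*
 * Flags as reported to userspace: "dead" is added for a link-down
 * nexthop only when the sysctl is enabled, so listeners can tell
 * whether the nexthop would be selected.
 */
unsigned int nexthop_reported_flags(unsigned int flags, bool ignore_linkdown)
{
    if ((flags & NH_F_LINKDOWN) && ignore_linkdown)
        flags |= NH_F_DEAD;
    return flags;
}
```

    With the sysctl off, a link-down nexthop stays usable and is reported as
    "linkdown" only; with it on, the same nexthop is skipped and reported as
    "dead linkdown", matching the listings above.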

    v2: Split kernel changes into 2 patches; this one actually makes a
    behavioral change if the sysctl is set. Also took suggestion from Alex
    to simplify code by only checking sysctl during fib lookup and
    suggestion from Scott to add a per-interface sysctl.

    v3: Code clean-ups to make it more readable and efficient as well as a
    reverse path check fix.

    v4: Drop binary sysctl

    v5: Whitespace fixups from Dave

    v6: Style changes from Dave and checkpatch suggestions

    v7: One more checkpatch fixup

    Signed-off-by: Andy Gospodarek
    Signed-off-by: Dinesh Dutt
    Acked-by: Scott Feldman
    Signed-off-by: David S. Miller

    Andy Gospodarek
     
  • Add a fib flag called RTNH_F_LINKDOWN to any ipv4 nexthops that are
    reachable via an interface where carrier is off. No action is taken,
    but additional flags are passed to userspace to indicate carrier status.

    This also includes a cleanup to fib_disable_ip to more clearly indicate
    what event made the function call to replace the more cryptic force
    option previously used.

    v2: Split out kernel functionality into 2 patches, this patch simply
    sets and clears new nexthop flag RTNH_F_LINKDOWN.

    v3: Cleanups suggested by Alex as well as a bug noticed in
    fib_sync_down_dev and fib_sync_up when multipath was not enabled.

    v5: Whitespace and variable declaration fixups suggested by Dave.

    v6: Style fixups noticed by Dave; ran checkpatch to be sure I got them
    all.

    Signed-off-by: Andy Gospodarek
    Signed-off-by: Dinesh Dutt
    Acked-by: Scott Feldman
    Signed-off-by: David S. Miller

    Andy Gospodarek
     
  • switchdev_port_bridge_getlink() queries SWITCHDEV_ATTR_PORT_BRIDGE_FLAGS
    attributes, but a driver doesn't need to implement this in order to get
    bridge link information.

    So error out only on errors different than -EOPNOTSUPP.
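
    The pattern is the usual one for optional driver callbacks: treat
    -EOPNOTSUPP as "nothing to report" and propagate everything else. A
    self-contained sketch, in which attr_get_fn, query_bridge_flags and the
    three sample callbacks are all hypothetical names used for illustration:

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical callback type: fills *flags, returns 0 or a negative errno. */
typedef int (*attr_get_fn)(unsigned long *flags);

/* Sample callbacks modelling three kinds of driver. */
int attr_get_unsupported(unsigned long *flags)
{
    (void)flags;
    return -EOPNOTSUPP;     /* driver does not implement the attribute */
}

int attr_get_failing(unsigned long *flags)
{
    (void)flags;
    return -EIO;            /* a real failure */
}

int attr_get_ok(unsigned long *flags)
{
    *flags = 0x5;
    return 0;
}

/*
 * Query an optional attribute. A missing implementation is not an
 * error: flags stay 0 and the caller carries on. Any other error is
 * returned unchanged.
 */
int query_bridge_flags(attr_get_fn get, unsigned long *flags)
{
    int err;

    *flags = 0;
    err = get(flags);
    if (err && err != -EOPNOTSUPP)
        return err;
    return 0;
}
```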

    (This is a follow-up patch for 7d4f8d8.)

    Fixes: 8793d0a664a8 ("switchdev: add new switchdev_port_bridge_getlink")
    Signed-off-by: Vivien Didelot
    Acked-by: Jiri Pirko
    Acked-by: Scott Feldman
    Signed-off-by: David S. Miller

    Vivien Didelot
     
  • ICMP messages can trigger ICMP and local errors. In this case
    serr->port is 0 and starting from Linux 4.0 we do not return
    the original target address to the error queue readers.
    Add function to define which errors provide addr_offset.
    With this fix my ping command is not silent anymore.

    Fixes: c247f0534cc5 ("ip: fix error queue empty skb handling")
    Signed-off-by: Julian Anastasov
    Acked-by: Willem de Bruijn
    Signed-off-by: David S. Miller

    Julian Anastasov