01 Aug, 2012

5 commits

  • Pull perf updates from Ingo Molnar:
    "The biggest changes are Intel Nehalem-EX PMU uncore support, uprobes
    updates/cleanups/fixes from Oleg and diverse tooling updates (mostly
    fixes) now that Arnaldo is back from vacation."

    * 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (40 commits)
    uprobes: __replace_page() needs munlock_vma_page()
    uprobes: Rename vma_address() and make it return "unsigned long"
    uprobes: Fix register_for_each_vma()->vma_address() check
    uprobes: Introduce vaddr_to_offset(vma, vaddr)
    uprobes: Teach build_probe_list() to consider the range
    uprobes: Remove insert_vm_struct()->uprobe_mmap()
    uprobes: Remove copy_vma()->uprobe_mmap()
    uprobes: Fix overflow in vma_address()/find_active_uprobe()
    uprobes: Suppress uprobe_munmap() from mmput()
    uprobes: Uprobe_mmap/munmap needs list_for_each_entry_safe()
    uprobes: Clean up and document write_opcode()->lock_page(old_page)
    uprobes: Kill write_opcode()->lock_page(new_page)
    uprobes: __replace_page() should not use page_address_in_vma()
    uprobes: Don't recheck vma/f_mapping in write_opcode()
    perf/x86: Fix missing struct before structure name
    perf/x86: Fix format definition of SNB-EP uncore QPI box
    perf/x86: Make bitfield unsigned
    perf/x86: Fix LLC-* and node-* events on Intel SandyBridge
    perf/x86: Add Intel Nehalem-EX uncore support
    perf/x86: Fix typo in format definition of uncore PCU filter
    ...

    Linus Torvalds
     
  • Pull powerpc updates from Benjamin Herrenschmidt:
    "Kumar sent me a handful of Freescale related fixes and I added another
    regression fix to the pile.

    PS. I -will- eventually learn about that signed tag business :-)"

    * 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc:
    powerpc/kvm/book3s_32: Fix MTMSR_EERI macro
    powerpc/85xx: p1022ds: fix DIU/LBC switching with NAND enabled
    powerpc/85xx: p1022ds: disable the NAND flash node if video is enabled
    powerpc/85xx: Fix sram_offset parameter type
    powerpc/85xx: P3041DS - change espi input-clock from 40MHz to 35MHz
    powerpc/85xx: Fix pci base address error for p2020rdb-pc in dts

    Linus Torvalds
     
  • Pull s390 updates from Martin Schwidefsky:
    "This is the second batch of s390 patches for the 3.6 merge window.
    Included is enablement for two common code changes, killable page
    faults and sorted exception tables. And the regular set of cleanup
    and bug fix patches."

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
    s390: make use of user_mode() macro where possible
    s390/mm: rename user_mode variable to addressing_mode
    s390/mm: fix fault handling for page table walk case
    s390/mm: make page faults killable
    s390: update defconfig
    s390/mm: downgrade page table after fork of a 31 bit process
    s390/ipl: Use diagnose 8 command separation
    s390/linker script: use RO_DATA_SECTION
    s390/exceptions: sort exception table at build time
    s390/debug: remove module_exit function / move EXPORT_SYMBOLs

    Linus Torvalds
     
  • Pull nfsd changes from J. Bruce Fields:
    "This has been an unusually quiet cycle--mostly bugfixes and cleanup.
    The one large piece is Stanislav's work to containerize the server's
    grace period--but that in itself is just one more step in a
    not-yet-complete project to allow fully containerized nfs service.

    There are a number of outstanding delegation, container, v4 state, and
    gss patches that aren't quite ready yet; 3.7 may be wilder."

    * 'nfsd-next' of git://linux-nfs.org/~bfields/linux: (35 commits)
    NFSd: make boot_time variable per network namespace
    NFSd: make grace end flag per network namespace
    Lockd: move grace period management from lockd() to per-net functions
    LockD: pass actual network namespace to grace period management functions
    LockD: manage grace list per network namespace
    SUNRPC: service request network namespace helper introduced
    NFSd: make nfsd4_manager allocated per network namespace context.
    LockD: make lockd manager allocated per network namespace
    LockD: manage grace period per network namespace
    Lockd: add more debug to host shutdown functions
    Lockd: host complaining function introduced
    LockD: manage used host count per networks namespace
    LockD: manage garbage collection timeout per networks namespace
    LockD: make garbage collector network namespace aware.
    LockD: mark host per network namespace on garbage collect
    nfsd4: fix missing fault_inject.h include
    locks: move lease-specific code out of locks_delete_lock
    locks: prevent side-effects of locks_release_private before file_lock is initialized
    NFSd: set nfsd_serv to NULL after service destruction
    NFSd: introduce nfsd_destroy() helper
    ...

    Linus Torvalds
     
  • Pull Ceph changes from Sage Weil:
    "Lots of stuff this time around:

    - lots of cleanup and refactoring in the libceph messenger code, and
    many hard to hit races and bugs closed as a result.
    - lots of cleanup and refactoring in the rbd code from Alex Elder,
    mostly in preparation for the layering functionality that will be
    coming in 3.7.
    - some misc rbd cleanups from Josh Durgin that are finally going
    upstream
    - support for CRUSH tunables (used by newer clusters to improve the
    data placement)
    - some cleanup in our use of d_parent that Al brought up a while back
    - a random collection of fixes across the tree

    There is another patch coming that fixes up our ->atomic_open()
    behavior, but I'm going to hammer on it a bit more before sending it."

    Fix up conflicts due to commits that were already committed earlier in
    drivers/block/rbd.c, net/ceph/{messenger.c, osd_client.c}

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: (132 commits)
    rbd: create rbd_refresh_helper()
    rbd: return obj version in __rbd_refresh_header()
    rbd: fixes in rbd_header_from_disk()
    rbd: always pass ops array to rbd_req_sync_op()
    rbd: pass null version pointer in add_snap()
    rbd: make rbd_create_rw_ops() return a pointer
    rbd: have __rbd_add_snap_dev() return a pointer
    libceph: recheck con state after allocating incoming message
    libceph: change ceph_con_in_msg_alloc convention to be less weird
    libceph: avoid dropping con mutex before fault
    libceph: verify state after retaking con lock after dispatch
    libceph: revoke mon_client messages on session restart
    libceph: fix handling of immediate socket connect failure
    ceph: update MAINTAINERS file
    libceph: be less chatty about stray replies
    libceph: clear all flags on con_close
    libceph: clean up con flags
    libceph: replace connection state bits with states
    libceph: drop unnecessary CLOSED check in socket state change callback
    libceph: close socket directly from ceph_con_close()
    ...

    Linus Torvalds
     

31 Jul, 2012

35 commits

  • Commit b38c77d82e4 moved the MTMSR_EERI macro from the KVM code to generic
    ppc_asm.h code. However, while adding it to the headers for the ppc32 case,
    it failed to remove the former definition from the KVM code.

    This patch fixes compilation on server type PPC32 targets with CONFIG_KVM
    enabled.

    Signed-off-by: Alexander Graf
    Signed-off-by: Benjamin Herrenschmidt

    Alexander Graf
     
  • Kumar says:

    "A few patches that missed the initial 3.6 window. These are bug fixes at
    this point."

    Benjamin Herrenschmidt
     
  • Pull writeback updates from Wu Fengguang:
    "Use time based periods to age the writeback proportions, which can
    adapt equally well to fast/slow devices."

    Fix up trivial conflict in comment in fs/sync.c

    * tag 'writeback-proportions' of git://git.kernel.org/pub/scm/linux/kernel/git/wfg/linux:
    writeback: Fix some comment errors
    block: Convert BDI proportion calculations to flexible proportions
    lib: Fix possible deadlock in flexible proportion code
    lib: Proportions with flexible period

    Linus Torvalds
     
  • Pull NFS client updates from Trond Myklebust:
    "Features include:
    - More preparatory patches for modularising NFSv2/v3/v4. Split out
    the various NFSv2/v3/v4-specific code into separate files
    - More preparation for the NFSv4 migration code
    - Ensure that OPEN(O_CREAT) observes the pNFS mds threshold
    parameters
    - pNFS fast failover when the data servers are down
    - Various cleanups and debugging patches"

    * tag 'nfs-for-3.6-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (67 commits)
    nfs: fix fl_type tests in NFSv4 code
    NFS: fix pnfs regression with directio writes
    NFS: fix pnfs regression with directio reads
    sunrpc: clnt: Add missing braces
    nfs: fix stub return type warnings
    NFS: exit_nfs_v4() shouldn't be an __exit function
    SUNRPC: Add a missing spin_unlock to gss_mech_list_pseudoflavors
    NFS: Split out NFS v4 client functions
    NFS: Split out the NFS v4 filesystem types
    NFS: Create a single nfs_clone_super() function
    NFS: Split out NFS v4 server creating code
    NFS: Initialize the NFS v4 client from init_nfs_v4()
    NFS: Move the v4 getroot code to nfs4getroot.c
    NFS: Split out NFS v4 file operations
    NFS: Initialize v4 sysctls from nfs_init_v4()
    NFS: Create an init_nfs_v4() function
    NFS: Split out NFS v4 inode operations
    NFS: Split out NFS v3 inode operations
    NFS: Split out NFS v2 inode operations
    NFS: Clean up nfs4_proc_setclientid() and friends
    ...

    Linus Torvalds
     
  • Pull MFD fix from Samuel Ortiz:
    "This one fixes an s5m8767 regulator build breakage due to a merge
    conflict caused by the MFD s5m API changes."

    * tag 'mfd-for-linus-3.6-1' of git://git.kernel.org/pub/scm/linux/kernel/git/sameo/mfd-2.6:
    regulator: Fix an s5m8767 build failure

    Linus Torvalds
     
  • Pull media updates from Mauro Carvalho Chehab:
    "This is the first part of the media patches for v3.6.

    This patch series contains:
    - new DVB frontend: rtl2832
    - new video drivers: adv7393
    - some unused files got removed
    - a selection API cleanup between V4L2 and V4L2 subdev APIs
    - a major redesign of v4l2-ioctl, in order to clean it up
    - several driver fixes and improvements."

    * 'v4l_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media: (174 commits)
    v4l: Export v4l2-common.h in include/linux/Kbuild
    media: Revert "[media] Terratec Cinergy S2 USB HD Rev.2"
    [media] media: Use pr_info not homegrown pr_reg macro
    [media] Terratec Cinergy S2 USB HD Rev.2
    [media] v4l: Correct conflicting V4L2 subdev selection API documentation
    [media] Feature removal: V4L2 selections API target and flag definitions
    [media] v4l: Unify selection flags documentation
    [media] v4l: Unify selection flags
    [media] v4l: Common documentation for selection targets
    [media] v4l: Unify selection targets across V4L2 and V4L2 subdev interfaces
    [media] v4l: Remove "_ACTUAL" from subdev selection API target definition names
    [media] V4L: Remove "_ACTIVE" from the selection target name definitions
    [media] media: dvb-usb: print mac address via native %pM
    [media] s5p-tv: Use module_i2c_driver in sii9234_drv.c file
    [media] media: gpio-ir-recv: add allowed_protos for platform data
    [media] s5p-jpeg: Use module_platform_driver in jpeg-core.c file
    [media] saa7134: fix spelling of detach in label
    [media] cx88-blackbird: replace ioctl by unlocked_ioctl
    [media] cx88: don't use current_norm
    [media] cx88: fix a number of v4l2-compliance violations
    ...

    Linus Torvalds
     
  • Create a simple helper that handles the common case of calling
    __rbd_refresh_header() while holding the ctl_mutex.

    Signed-off-by: Alex Elder
    Reviewed-by: Josh Durgin

    Alex Elder
     
  • Add a new parameter to __rbd_refresh_header() through which the
    version of the header object is passed back to the caller. In most
    cases this isn't needed. The main motivation is to normalize
    (almost) all calls to __rbd_refresh_header() so they are all
    wrapped immediately by mutex_lock()/mutex_unlock().

    Signed-off-by: Alex Elder
    Reviewed-by: Josh Durgin

    Alex Elder
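
    The two entries above (a locked wrapper, plus a version value handed
    back through an extra parameter) can be sketched in user-space C. This
    is only an illustration under stated assumptions: struct dev,
    refresh_header_locked() and refresh_helper() are hypothetical stand-ins,
    a pthread mutex stands in for ctl_mutex, and none of this is the rbd
    code itself.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical device state; not the real rbd_device structure. */
struct dev {
    uint64_t header_version;
};

/* Stand-in for the control mutex that serializes header refreshes. */
static pthread_mutex_t ctl_mutex = PTHREAD_MUTEX_INITIALIZER;

/*
 * Caller must hold ctl_mutex. The new header version is passed back
 * through hver; callers that do not need it pass NULL.
 */
static int refresh_header_locked(struct dev *dev, uint64_t *hver)
{
    dev->header_version++;          /* stand-in for re-reading the header */
    if (hver)
        *hver = dev->header_version;
    return 0;
}

/* Common case: take the mutex, refresh, drop the mutex. */
static int refresh_helper(struct dev *dev, uint64_t *hver)
{
    int ret;

    assert(dev != NULL);
    pthread_mutex_lock(&ctl_mutex);
    ret = refresh_header_locked(dev, hver);
    pthread_mutex_unlock(&ctl_mutex);
    return ret;
}
```

    Keeping the lock and unlock in one helper means each call site shows
    the same pattern, which is the normalization the entry describes.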
     
  • This fixes a few issues in rbd_header_from_disk():
    - There is a check intended to catch overflow, but it's wrong in
    two ways.
    - First, the type we don't want to overflow is size_t, not
    unsigned int, and there is now a SIZE_MAX we can use
    with that type.
    - Second, we're allocating the snapshot ids and snapshot
    image sizes separately (each has type u64; on disk they
    are grouped together as a rbd_image_header_ondisk structure).
    So we can use the size of u64 in this overflow check.
    - If there are no snapshots, then there should be no snapshot
    names. Enforce this, and issue a warning if we encounter a
    header with no snapshots but a non-zero snap_names_len.
    - When saving the snapshot names into the header, be more direct
    in defining the offset in the on-disk structure from which
    they're being copied by using "snap_count" rather than "i"
    in the array index.
    - If an error occurs, the "snapc" and "snap_names" fields are
    freed at the end of the function. Make those fields be null
    pointers after they're freed, to be explicit that they are
    no longer valid.
    - Finally, move the definition of the local variable "i" to the
    innermost scope in which it's needed.

    Signed-off-by: Alex Elder
    Reviewed-by: Josh Durgin

    Alex Elder
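
    The overflow check described in the entry above can be sketched as
    follows. This is a minimal user-space illustration, not the code from
    rbd_header_from_disk(): array_size_fits() is a hypothetical helper, and
    uint64_t stands in for the kernel's u64.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: returns 1 if an array of `count` elements of
 * `elem_size` bytes can be described by a size_t, 0 if the byte count
 * would overflow. Dividing SIZE_MAX by the element size avoids doing
 * the multiplication that might overflow.
 */
static int array_size_fits(uint64_t count, size_t elem_size)
{
    assert(elem_size != 0);
    return count <= SIZE_MAX / elem_size;
}
```

    A caller would test array_size_fits(snap_count, sizeof(uint64_t))
    before allocating the snapshot id array, and reject the header if the
    check fails.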
     
  • All of the callers of rbd_req_sync_op() except one pass a non-null
    "ops" pointer. The only one that does not is rbd_req_sync_read(),
    which passes CEPH_OSD_OP_READ as its "opcode" and CEPH_OSD_FLAG_READ
    for "flags".

    By allocating the ops array in rbd_req_sync_read() and moving the
    special case code for the null ops pointer into it, it becomes
    clear that much of that code is not even necessary.

    In addition, the "opcode" argument to rbd_req_sync_op() is never
    actually used, so get rid of that.

    Signed-off-by: Alex Elder
    Reviewed-by: Josh Durgin

    Alex Elder
     
  • rbd_header_add_snap() passes the address of a version variable to
    rbd_req_sync_exec(), but it ignores the result. Just pass a null
    pointer instead.

    Signed-off-by: Alex Elder
    Reviewed-by: Josh Durgin

    Alex Elder
     
  • Either rbd_create_rw_ops() will succeed, or it will fail because a
    memory allocation failed. Have it just return a valid pointer or
    null rather than stuffing a pointer into a provided address and
    returning an errno.

    Signed-off-by: Alex Elder
    Reviewed-by: Josh Durgin

    Alex Elder
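
    The calling convention change in the entry above (return the pointer
    or null, instead of an errno plus an output pointer) might look like
    this in outline. struct op and create_ops() are hypothetical names used
    only for illustration; the real rbd_create_rw_ops() signature is not
    shown in this log.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical op descriptor; not the real ceph structure. */
struct op {
    int opcode;
    unsigned int payload_len;
};

/*
 * Allocate a zeroed array of num_ops ops and set the first opcode.
 * The only possible failure is a failed allocation, so a NULL return
 * tells the caller everything an errno would have.
 */
static struct op *create_ops(size_t num_ops, int opcode)
{
    struct op *ops;

    assert(num_ops > 0);
    ops = calloc(num_ops, sizeof(*ops));
    if (!ops)
        return NULL;
    ops[0].opcode = opcode;
    return ops;
}
```

    The caller checks the result for NULL and maps that to its own
    out-of-memory error path.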
     
  • It's not obvious whether the snapshot pointer whose address is
    provided to __rbd_add_snap_dev() will be assigned by that function.
    Change it to return the snapshot, or a pointer-coded errno in the
    event of a failure.

    Signed-off-by: Alex Elder
    Reviewed-by: Josh Durgin

    Alex Elder
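
    A "pointer-coded errno", as mentioned in the entry above, packs a
    negative error number into the pointer value itself. The kernel provides
    ERR_PTR(), IS_ERR() and PTR_ERR() for this; the sketch below is a
    user-space imitation with lowercase names, and struct snap and
    add_snap() are hypothetical examples rather than the rbd functions.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

#define MAX_ERRNO 4095

/* User-space imitations of the kernel's ERR_PTR/PTR_ERR/IS_ERR. */
static inline void *err_ptr(long error)
{
    return (void *)(intptr_t)error;
}

static inline long ptr_err(const void *ptr)
{
    return (long)(intptr_t)ptr;
}

static inline int is_err(const void *ptr)
{
    /* The top MAX_ERRNO addresses are reserved for encoded errors. */
    return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical snapshot record. */
struct snap {
    int id;
};

/* Returns the new snapshot, or an encoded errno on failure. */
static struct snap *add_snap(int id)
{
    struct snap *snap;

    if (id < 0)
        return err_ptr(-EINVAL);
    snap = malloc(sizeof(*snap));
    if (!snap)
        return err_ptr(-ENOMEM);
    snap->id = id;
    return snap;
}
```

    With this convention the return value alone says whether a snapshot
    was produced, which removes the ambiguity about whether an output
    pointer was assigned.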
     
  • We drop the lock when calling the ->alloc_msg() con op, which means
    we need to (a) not clobber con->in_msg without the mutex held, and
    (b) verify that we are still in the OPEN state when we retake the
    lock, to avoid causing any mayhem. If the state does change, -EAGAIN
    will get us back to con_work() and loop.

    Signed-off-by: Sage Weil
    Reviewed-by: Alex Elder

    Sage Weil
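
    The pattern in the entry above (drop the lock around a callback, retake
    it, then re-verify the state before touching shared fields) can be
    sketched in user-space C. Everything here is an assumption made for
    illustration: struct con, the two-value state enum, the malloc-based
    callbacks and con_in_msg_alloc() are simplified stand-ins, and a
    pthread mutex replaces the kernel mutex.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stddef.h>
#include <stdlib.h>

enum con_state { CON_CLOSED, CON_OPEN };

struct con {
    pthread_mutex_t mutex;
    enum con_state state;
    void *in_msg;
    /* Called without the mutex held; returns malloc'd memory or NULL. */
    void *(*alloc_msg)(struct con *con);
};

/* Caller holds con->mutex on entry; it is held again on return. */
static int con_in_msg_alloc(struct con *con)
{
    void *msg;

    assert(con->in_msg == NULL);

    pthread_mutex_unlock(&con->mutex);
    msg = con->alloc_msg(con);      /* keep the result in a local */
    pthread_mutex_lock(&con->mutex);

    if (con->state != CON_OPEN) {
        /* State changed while unlocked: discard and ask for a retry. */
        free(msg);
        return -EAGAIN;
    }
    if (!msg)
        return -ENOMEM;
    con->in_msg = msg;              /* assigned only under the mutex */
    return 0;
}

/* Demo callback: plain allocation. */
static void *alloc_ok(struct con *con)
{
    (void)con;
    return malloc(16);
}

/* Demo callback: simulates another thread closing the connection. */
static void *alloc_then_close(struct con *con)
{
    con->state = CON_CLOSED;
    return malloc(16);
}
```

    The message pointer lives in a local variable until the state has been
    rechecked, so con->in_msg is never written while the mutex is dropped.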
     
  • This function's calling convention is very limiting. In particular,
    we can't return any error other than ENOMEM (and only implicitly),
    which is a problem (see next patch).

    Instead, return a normal 0 or error code, and make the skip a pointer
    output parameter. Drop the useless in_hdr argument (we have the con
    pointer).

    Signed-off-by: Sage Weil
    Reviewed-by: Alex Elder

    Sage Weil
     
  • The ceph_fault() function takes the con mutex, so we should avoid
    dropping it before calling it. This fixes a potential race with
    another thread calling ceph_con_close(), or _open(), or similar (we
    don't reverify con->state after retaking the lock).

    Add annotation so that lockdep realizes we will drop the mutex before
    returning.

    Signed-off-by: Sage Weil
    Reviewed-by: Alex Elder

    Sage Weil
     
  • We drop the con mutex when delivering a message. When we retake the
    lock, we need to verify we are still in the OPEN state before
    preparing to read the next tag, or else we risk stepping on a
    connection that has been closed.

    Signed-off-by: Sage Weil
    Reviewed-by: Alex Elder

    Sage Weil
     
  • Revoke all mon_client messages when we shut down the old connection.
    This is mostly moot since we are re-using the same ceph_connection,
    but it is cleaner.

    Signed-off-by: Sage Weil
    Reviewed-by: Alex Elder

    Sage Weil
     
  • If the connect() call immediately fails such that sock == NULL, we
    still need con_close_socket() to reset our socket state to CLOSED.

    Signed-off-by: Sage Weil
    Reviewed-by: Alex Elder

    Sage Weil
     
  • * shiny new inktank.com email addresses
    * add include/linux/crush directory (previous oversight)

    Signed-off-by: Sage Weil
    Reviewed-by: Alex Elder

    Sage Weil
     
  • There are many (normal) conditions that can lead to us getting
    unexpected replies, including cluster topology changes, osd failures,
    and timeouts. There's no need to spam the console about it.

    Signed-off-by: Sage Weil
    Reviewed-by: Alex Elder

    Sage Weil
     
  • Signed-off-by: Sage Weil

    Sage Weil
     
  • Rename flags with CON_FLAG prefix, move the definitions into the c file,
    and (better) document their meaning.

    Signed-off-by: Sage Weil

    Sage Weil
     
  • Use a simple set of 6 enumerated values for the socket states (CON_STATE_*)
    and use those instead of the state bits. All of the con->state checks are
    now under the protection of the con mutex, so this is safe. It also
    simplifies many of the state checks because we can check for anything other
    than the expected state instead of various bits for races we can think of.

    This appears to hold up well to stress testing both with and without socket
    failure injection on the server side.

    Signed-off-by: Sage Weil

    Sage Weil
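
    The simplification described in the entry above, checking for "anything
    other than the expected state", can be sketched as below. Only the
    CON_STATE_ prefix and the count of six come from the entry; the
    individual state names and check_still_open() are assumptions chosen
    for illustration.

```c
#include <assert.h>

/* Six illustrative states; the names after the prefix are assumptions. */
enum con_state {
    CON_STATE_CLOSED,
    CON_STATE_PREOPEN,
    CON_STATE_CONNECTING,
    CON_STATE_NEGOTIATING,
    CON_STATE_OPEN,
    CON_STATE_STANDBY
};

/*
 * Returns 0 if the connection may keep going, -1 otherwise.
 * With a single state value there is one comparison against the
 * expected state, rather than a test per bit for each race that
 * someone thought of.
 */
static int check_still_open(enum con_state state)
{
    assert(state <= CON_STATE_STANDBY);
    return state == CON_STATE_OPEN ? 0 : -1;
}
```

    Because a connection holds exactly one of these values at a time, any
    unexpected state is rejected by the same comparison.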
     
  • If we are CLOSED, the socket is closed and we won't get these.

    Signed-off-by: Sage Weil

    Sage Weil
     
  • It is simpler to do this immediately, since we already hold the con mutex.
    It also avoids the need to deal with a not-quite-CLOSED socket in con_work.

    Signed-off-by: Sage Weil

    Sage Weil
     
  • If the state is CLOSED or OPENING, we shouldn't have a socket.

    Signed-off-by: Sage Weil

    Sage Weil
     
  • Take the con mutex before checking whether the connection is closed to
    avoid racing with someone else closing it.

    Signed-off-by: Sage Weil

    Sage Weil
     
  • Avoid dropping and retaking con->mutex in the ceph_con_send() case by
    leaving locking up to the caller.

    Signed-off-by: Sage Weil

    Sage Weil
     
  • If we fault on a lossy connection, we should still close the socket
    immediately, and do so under the con mutex.

    We should also take the con mutex before printing out the state bits in
    the debug output.

    Signed-off-by: Sage Weil

    Sage Weil
     
  • rbd_req_sync_unwatch() only ever uses rbd_dev->header_name as the
    value of its "object_name" parameter, and that value is available
    within the function already. So get rid of the parameter.

    Signed-off-by: Alex Elder
    Reviewed-by: Josh Durgin

    Alex Elder
     
  • rbd_req_sync_notify_ack() only ever uses rbd_dev->header_name as the
    value of its "object_name" parameter, and that value is available
    within the function already. So get rid of the parameter.

    Signed-off-by: Alex Elder
    Reviewed-by: Josh Durgin

    Alex Elder
     
  • rbd_req_sync_notify() only ever uses rbd_dev->header_name as the
    value of its "object_name" parameter, and that value is available
    within the function already. So get rid of the parameter.

    Signed-off-by: Alex Elder
    Reviewed-by: Josh Durgin

    Alex Elder
     
  • rbd_req_sync_watch() is only called in one place, and in that place
    it passes rbd_dev->header_name as the value of the "object_name"
    parameter. This value is available within the function already.

    Having the extra parameter leaves the impression the object name
    could take on different values, but it does not.

    So get rid of the parameter. We can always add it back again if
    we find we want to watch some other object in the future.

    Signed-off-by: Alex Elder
    Reviewed-by: Josh Durgin

    Alex Elder
     
  • Both rbd_register_snap_dev() and __rbd_remove_snap_dev() have
    rbd_dev parameters that are unused. Remove them.

    Signed-off-by: Alex Elder
    Reviewed-by: Josh Durgin

    Alex Elder