30 Oct, 2011

22 commits

  • The currentsd[] array in hpsa_update_scsi_devices had room for
    256 devices. The code was iterating over however many physical
    and logical devices plus an additional number of possible external
    MSA2XXX controllers, which together could potentially exceed 256.

    We increased the size of the currentsd array to 1024 + 1024 + 32 + 1
    elements to reflect a reasonable maximum possible number of devices
    which might be encountered. We also no longer walk off the end
    of the array if the array controller reports more devices than we
    are prepared to handle; we simply ignore the excess devices.

    Signed-off-by: Scott Teel
    Signed-off-by: Stephen M. Cameron
    Signed-off-by: James Bottomley

    Scott Teel
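
The bounds handling described in this entry can be sketched in userspace C. This is only an illustration under stated assumptions: the macro and function names (`MAX_DEVICES`, `clamp_device_count`, `fill_device_table`) are hypothetical and are not taken from the hpsa driver; only the 1024 + 1024 + 32 + 1 sizing comes from the commit message.

```c
#include <assert.h>
#include <stddef.h>

/* Capacity mirroring the commit message: 1024 + 1024 + 32 + 1 entries.
 * The macro names are hypothetical, not the driver's. */
#define MAX_PHYS     1024
#define MAX_LOGICAL  1024
#define MAX_EXTERNAL 32
#define MAX_DEVICES  (MAX_PHYS + MAX_LOGICAL + MAX_EXTERNAL + 1)

/* Return how many of the `reported` devices fit in a table of `capacity`
 * entries; anything beyond the capacity is ignored rather than stored. */
static size_t clamp_device_count(size_t reported, size_t capacity)
{
    return reported < capacity ? reported : capacity;
}

/* Copy device ids into a fixed-size table without overrunning it.
 * Returns the number of entries actually stored. */
static size_t fill_device_table(int *table, size_t capacity,
                                const int *reported_ids, size_t reported)
{
    assert(table != NULL && reported_ids != NULL);
    size_t n = clamp_device_count(reported, capacity);

    for (size_t i = 0; i < n; i++)
        table[i] = reported_ids[i];
    return n;
}
```

The point of the sketch is that the loop bound is derived from the table capacity, not from whatever count the controller reports.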
     
  • Rename HPSA_MAX_SCSI_DEVS_PER_HBA to HPSA_MAX_DEVICES

    Signed-off-by: Scott Teel
    Signed-off-by: Stephen M. Cameron
    Signed-off-by: James Bottomley

    Scott Teel
     
  • Signed-off-by: Stephen M. Cameron
    Signed-off-by: James Bottomley

    Stephen M. Cameron
     
  • Set the max hardware sectors in the SCSI host template to 8192
    to allow for larger i/o's (8192 is the same limit the cciss
    driver currently has.)

    Signed-off-by: Stephen M. Cameron
    Signed-off-by: James Bottomley

    Stephen M. Cameron
     
  • During heavy I/O (CPU-affinity mode enabled) and CLI/Agent
    interactions, the driver would report periodic mailbox command
    timeout statuses. Within the CPU-affinity ISR handler, the
    driver should check the 'disable-msix-handshake' flag in deciding
    whether or not to clear HCCRX_CLR_RISC_INT. The mode is not
    specific to a dedicated queue; instead, it applies to the current
    'ha' context.

    Signed-off-by: Andrew Vasquez
    Signed-off-by: Chad Dupuis
    Signed-off-by: James Bottomley

    Andrew Vasquez
     
  • Size is 1st arg, not second.

    Signed-off-by: Dave Jones
    Signed-off-by: James Bottomley

    Dave Jones
     
  • Signed-off-by: Bhanu Prakash Gollapudi
    Signed-off-by: James Bottomley

    Bhanu Prakash Gollapudi
     
  • When SRR LS_ACC is dropped, the driver was not issuing ABTS for SRR when it
    timed out. Since the target received the SRR, it was able to send the XFER_RDY and
    the original IO request completed successfully. In this condition ABTS was
    not sent during bnx2fc_srr_compl(). Fix this by first checking for ELS timeout
    and issuing ABTS before checking whether the original IO request is complete.

    Signed-off-by: Bhanu Prakash Gollapudi
    Signed-off-by: James Bottomley

    Bhanu Prakash Gollapudi
     
  • If the IO and the corresponding ABTS are not responded to by a target, clean up the
    IO and issue an explicit logout when the ulp timer expires while waiting for ABTS to
    complete. Wait for the session to be ready before returning to the SCSI layer.
    If the session is not ready, let the SCSI-ml escalate the error recovery.

    Signed-off-by: Bhanu Prakash Gollapudi
    Signed-off-by: James Bottomley

    Bhanu Prakash Gollapudi
     
  • The call to complete() in st_scsi_execute_end() wakes up the sleeping thread
    in write_behind_check(), which frees the st_request, thus invalidating
    the pointer to the associated bio structure, which is then passed to
    blk_rq_unmap_user(). Fix by storing the pointer to the bio structure in a
    temporary local variable.

    This bug has been present since at least linux-2.6.32.

    CC: stable@kernel.org
    Signed-off-by: Petr Uzel
    Reported-by: Juergen Groß
    Reviewed-by: Jan Kara
    Acked-by: Kai Mäkisara
    Signed-off-by: James Bottomley

    Petr Uzel
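
The ordering problem in this entry can be illustrated with a small userspace sketch. All names here (`st_request_sketch`, `waiter_frees_request`, `unmap_user`, `execute_end`) are hypothetical stand-ins, not the st driver's code; the waiter is modelled as a direct call that frees the request, which is the worst case the commit message describes.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the kernel structures. */
struct bio { int unmapped; };
struct st_request_sketch { struct bio *bio; };

/* Models the thread woken by complete(): it frees the request. */
static void waiter_frees_request(struct st_request_sketch **req)
{
    free(*req);
    *req = NULL;
}

/* Models the unmap step, which only needs the bio pointer. */
static void unmap_user(struct bio *bio)
{
    bio->unmapped = 1;
}

/* Completion path: copy the bio pointer into a local first, because the
 * request may no longer exist once the waiter has been woken. */
static void execute_end(struct st_request_sketch **req)
{
    assert(req != NULL && *req != NULL);
    struct bio *tmp = (*req)->bio;   /* saved while the request is valid */

    waiter_frees_request(req);       /* stands in for complete() */
    unmap_user(tmp);                 /* uses the local, never req->bio */
}
```

Reading `(*req)->bio` after the wake-up would touch freed memory in this model, which is the failure the local variable avoids.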
     
  • Make sure that SCSI device removal via scsi_remove_host() does finish
    all pending SCSI commands. Currently that's not the case and hence
    removal of a SCSI host during I/O can cause a deadlock. See also
    "blkdev_issue_discard() hangs forever if underlying storage device is
    removed" (http://bugzilla.kernel.org/show_bug.cgi?id=40472). See also
    http://lkml.org/lkml/2011/8/27/6.

    Signed-off-by: Bart Van Assche
    Cc:
    Signed-off-by: James Bottomley

    Bart Van Assche
     
  • There is no reason to limit the SCSI disk namespace to sdXXX.

    Add new error messages to sd_probe() in the unlikely event that either
    ida_get_new() or sd_format_disk_name() fail.

    Signed-off-by: Dave Kleikamp
    Signed-off-by: James Bottomley

    Dave Kleikamp
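
For readers unfamiliar with how disk names grow past `sdz`, the sketch below shows one way to format an index as a prefix plus a letter suffix of arbitrary length (`sda` ... `sdz`, `sdaa` ... `sdzz`, `sdaaa`, ...). It assumes that naming sequence; the function name `format_disk_name` and its return convention are hypothetical and are not the kernel's sd_format_disk_name().

```c
#include <assert.h>
#include <string.h>

/* Write prefix + letter suffix into buf: index 0 -> "a", 25 -> "z",
 * 26 -> "aa", 701 -> "zz", 702 -> "aaa".
 * Returns 0 on success, -1 if buf is too small. */
static int format_disk_name(const char *prefix, int index,
                            char *buf, size_t buflen)
{
    assert(prefix != NULL && buf != NULL && index >= 0);
    const int base = 26;
    size_t plen = strlen(prefix);
    char tmp[16];                 /* ample for any 32-bit index */
    size_t pos = sizeof(tmp);

    /* Build the suffix right to left. The "- 1" makes the numbering
     * bijective, so "aa" follows "z" instead of "ba". */
    tmp[--pos] = '\0';
    do {
        tmp[--pos] = (char)('a' + index % base);
        index = index / base - 1;
    } while (index >= 0);

    size_t slen = sizeof(tmp) - pos;   /* suffix length including NUL */
    if (plen + slen > buflen)
        return -1;
    memcpy(buf, prefix, plen);
    memcpy(buf + plen, tmp + pos, slen);
    return 0;
}
```

Because the suffix has no fixed width, the only failure mode in this sketch is a destination buffer that is too small, which is the kind of unlikely error the entry says is now reported.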
     
  • Bump driver version to 10.100.00.00

    Signed-off-by: Nagalakshmi Nandigama
    Signed-off-by: James Bottomley

    nagalakshmi.nandigama@lsi.com
     
  • The driver was setting the action to MPI2_CONFIG_ACTION_PAGE_READ_CURRENT,
    which only returns active volumes. In order to get info on inactive volumes,
    the driver needs to change the action to
    MPI2_RAID_PGAD_FORM_GET_NEXT_CONFIGNUM, and traverse each config until the
    iocstatus MPI2_IOCSTATUS_CONFIG_INVALID_PAGE is returned.
    Added a change in the driver to remove the instance of the
    sas_device object when the driver returns "1" from the slave_configure callback.
    Also fixed code to report the hot spares to the operating system with a /dev/sg
    assigned.

    Signed-off-by: Nagalakshmi Nandigama
    Cc: stable@kernel.org
    Signed-off-by: James Bottomley

    nagalakshmi.nandigama@lsi.com
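
The "get next until invalid page" traversal in this entry follows a common iteration pattern, sketched below against a made-up in-memory table. Everything here is hypothetical (`cfg_entry`, `get_next_config`, `count_inactive`, the status enum); it does not use or imitate the real MPI2 request structures, and it assumes the table is sorted by ascending config number.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical status codes; CFG_INVALID_PAGE marks the end of the walk. */
enum cfg_status { CFG_OK, CFG_INVALID_PAGE };

/* Hypothetical config record: a number plus an active/inactive flag. */
struct cfg_entry { int config_num; int active; };

/* "Get next" lookup over a table sorted by config_num: returns the first
 * entry whose number is greater than prev, or CFG_INVALID_PAGE if none. */
static enum cfg_status get_next_config(const struct cfg_entry *tbl, size_t n,
                                       int prev, struct cfg_entry *out)
{
    assert(tbl != NULL && out != NULL);
    for (size_t i = 0; i < n; i++) {
        if (tbl[i].config_num > prev) {
            *out = tbl[i];
            return CFG_OK;
        }
    }
    return CFG_INVALID_PAGE;
}

/* Visit every config, active or not, stopping at the invalid-page
 * status, and count the inactive ones. */
static size_t count_inactive(const struct cfg_entry *tbl, size_t n)
{
    struct cfg_entry e;
    size_t inactive = 0;
    int prev = -1;   /* start before the first config number */

    while (get_next_config(tbl, n, prev, &e) == CFG_OK) {
        if (!e.active)
            inactive++;
        prev = e.config_num;
    }
    return inactive;
}
```

A "read current" style lookup would only ever see the active entry; the walk is what makes the inactive ones visible.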
     
  • …lete while issued during creating a volume

    This happens because the slave_configure routine is called while a
    host reset is active: config page reads fail, and the driver
    attempts to add the device with stale config data.

    To fix the issue, added error checking in slave_configure to check
    for failing configuration pages, and return "1" so the device is
    not configured. The config pages fail if a raid volume is
    configured while a host reset is being issued, so the driver reads stale
    data and proceeds to attempt the add. The fix is to return an error
    so the volume is not configured.

    Signed-off-by: Nagalakshmi Nandigama <nagalakshmi.nandigama@lsi.com>
    Signed-off-by: James Bottomley <JBottomley@Parallels.com>

    nagalakshmi.nandigama@lsi.com
     
  • This is due to the driver reporting a device missing to the OS, and the OS then
    sending a SYNC_CACHE request to the driver while the IO queues are locked due to
    host reset.

    To fix the issue, the driver now wakes up the port enable context
    immediately when it receives the reply message, instead of waiting
    on the hot plug worker threads.

    Signed-off-by: Nagalakshmi Nandigama
    Signed-off-by: James Bottomley

    nagalakshmi.nandigama@lsi.com
     
  • Fix for deadlock occurring between host_lock and sas_device_lock.

    The deadlock is between two spin locks: the shost->host_lock
    and the driver's ioc->sas_device_lock.

    The fix is to rearrange the code in the FW/Driver device removal
    handshake so the ioc->sas_device_lock is not held when the
    shost->host_lock is taken.

    [jejb: zero initialise sas_address to fix spurious compiler warning]
    Signed-off-by: Nagalakshmi Nandigama
    Signed-off-by: James Bottomley

    nagalakshmi.nandigama@lsi.com
     
  • …while host reset is active

    The fix is in the driver-firmware handshake device removal code. We
    need to read the controller ioc_state to see if the controller is OPERATIONAL
    prior to sending target reset and OP_REMOVE. Previously it was checking
    the ioc->shost_recovery flag, which is always set when host reset is
    active, thus preventing drives from getting properly deleted.

    Signed-off-by: Nagalakshmi Nandigama <nagalakshmi.nandigama@lsi.com>
    Signed-off-by: James Bottomley <JBottomley@Parallels.com>

    nagalakshmi.nandigama@lsi.com
     
  • The fix is to inhibit the warning message in _scsih_get_sas_address
    when the MPI2_IOCSTATUS_CONFIG_INVALID_PAGE ioc status is returned.

    Signed-off-by: Nagalakshmi Nandigama
    Signed-off-by: James Bottomley

    nagalakshmi.nandigama@lsi.com
     
  • Fix for issue: while discovery is in progress, hot unplug and hot plug of
    an enclosure connected to the controller card causes the system to hang.

    If a device is removed while it is being detected at driver load time,
    the device that is no longer present will not be added
    to the list. The code in _scsih_probe_sas() is rearranged so that
    devices that failed to be detected are not added to the list.

    Signed-off-by: Nagalakshmi Nandigama
    Cc: stable@kernel.org
    Signed-off-by: James Bottomley

    nagalakshmi.nandigama@lsi.com
     
  • New feature Fast Load Support.

    (1) Asynchronous SCSI scanning: This will allow the drivers to scan
    for devices in parallel while other device drivers are loading at
    the same time. This will reduce the amount of time it takes for the
    OS to load.

    (2) Reporting Devices while port enable is active: This feature will
    allow devices to be reported to the OS immediately while port enable is
    active. The previous implementation waits for port enable to complete,
    and then reports devices. This feature is only enabled on IT firmware
    configurations when there is no boot device configured in the BIOS Configuration
    Utility; otherwise the driver will wait until port enable completes before
    reporting devices. For IR firmware, this feature is turned off. This feature is to
    address large SAS topologies (>100 drives) when the boot OS is using an onboard
    SATA device; in other words, the boot device is not
    connected to our controller.

    (3) Scanning for devices after diagnostic reset completes: A new routine
    _scsih_scan_start is added. This will scan the expander pages, IR pages,
    and sas device pages, then report new devices to the SCSI mid layer. It
    seems the driver is not supporting adding devices while diagnostic reset
    is active. Apparently this is due to the sanity checks on
    ioc->shost_recovery flag throughout the context of kernel work thread FIFO,
    and the mpt2sas_fw_work.

    Signed-off-by: Nagalakshmi Nandigama
    Signed-off-by: James Bottomley

    nagalakshmi.nandigama@lsi.com
     
  • 1) Added ProxyVF_ID field to Configuration Request message.
    2) Added IO Unit Page 8, IO Unit Page 9, and IO Unit Page 10.
    3) Added SASNotifyPrimitiveMasks field to IOC Page 7.
    4) Added SAS NOTIFY Primitive event.
    5) Added Temperature Threshold Event.
    6) Added Host Message Event.
    7) Added Send Host Message request and reply.

    Signed-off-by: Nagalakshmi Nandigama
    Signed-off-by: James Bottomley

    nagalakshmi.nandigama@lsi.com
     

29 Oct, 2011

10 commits

  • * git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6: (204 commits)
    [SCSI] qla4xxx: export address/port of connection (fix udev disk names)
    [SCSI] ipr: Fix BUG on adapter dump timeout
    [SCSI] megaraid_sas: Fix instance access in megasas_reset_timer
    [SCSI] hpsa: change confusing message to be more clear
    [SCSI] iscsi class: fix vlan configuration
    [SCSI] qla4xxx: fix data alignment and use nl helpers
    [SCSI] iscsi class: fix link local mispelling
    [SCSI] iscsi class: Replace iscsi_get_next_target_id with IDA
    [SCSI] aacraid: use lower snprintf() limit
    [SCSI] lpfc 8.3.27: Change driver version to 8.3.27
    [SCSI] lpfc 8.3.27: T10 additions for SLI4
    [SCSI] lpfc 8.3.27: Fix queue allocation failure recovery
    [SCSI] lpfc 8.3.27: Change algorithm for getting physical port name
    [SCSI] lpfc 8.3.27: Changed worst case mailbox timeout
    [SCSI] lpfc 8.3.27: Miscellanous logic and interface fixes
    [SCSI] megaraid_sas: Changelog and version update
    [SCSI] megaraid_sas: Add driver workaround for PERC5/1068 kdump kernel panic
    [SCSI] megaraid_sas: Add multiple MSI-X vector/multiple reply queue support
    [SCSI] megaraid_sas: Add support for MegaRAID 9360/9380 12GB/s controllers
    [SCSI] megaraid_sas: Clear FUSION_IN_RESET before enabling interrupts
    ...

    Linus Torvalds
     
  • * 'for-linus' of git://ceph.newdream.net/git/ceph-client:
    libceph: fix double-free of page vector
    ceph: fix 32-bit ino numbers
    libceph: force resend of osd requests if we skip an osdmap
    ceph: use kernel DNS resolver
    ceph: fix ceph_monc_init memory leak
    ceph: let the set_layout ioctl set single traits
    Revert "ceph: don't truncate dirty pages in invalidate work thread"
    ceph: replace leading spaces with tabs
    libceph: warn on msg allocation failures
    libceph: don't complain on msgpool alloc failures
    libceph: always preallocate mon connection
    libceph: create messenger with client
    ceph: document ioctls
    ceph: implement (optional) max read size
    ceph: rename rsize -> rasize
    ceph: make readpages fully async

    Linus Torvalds
     
  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (549 commits)
    ALSA: hda - Fix ADC input-amp handling for Cx20549 codec
    ALSA: hda - Keep EAPD turned on for old Conexant chips
    ALSA: hda/realtek - Fix missing volume controls with ALC260
    ASoC: wm8940: Properly set codec->dapm.bias_level
    ALSA: hda - Fix pin-config for ASUS W90V
    ALSA: hda - Fix surround/CLFE headphone and speaker pins order
    ALSA: hda - Fix typo
    ALSA: Update the sound git tree URL
    ALSA: HDA: Add new revision for ALC662
    ASoC: max98095: Convert codec->hw_write to snd_soc_write
    ASoC: keep pointer to resource so it can be freed
    ASoC: sgtl5000: Fix wrong mask in some snd_soc_update_bits calls
    ASoC: wm8996: Fix wrong mask for setting WM8996_AIF_CLOCKING_2
    ASoC: da7210: Add support for line out and DAC
    ASoC: da7210: Add support for DAPM
    ALSA: hda/realtek - Fix DAC assignments of multiple speakers
    ASoC: Use SGTL5000_LINREG_VDDD_MASK instead of hardcoded mask value
    ASoC: Set sgtl5000->ldo in ldo_regulator_register
    ASoC: wm8996: Use SND_SOC_DAPM_AIF_OUT for AIF2 Capture
    ASoC: wm8994: Use SND_SOC_DAPM_AIF_OUT for AIF3 Capture
    ...

    Linus Torvalds
     
  • * 'next-rebase' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci:
    PCI: Clean-up MPS debug output
    pci: Clamp pcie_set_readrq() when using "performance" settings
    PCI: enable MPS "performance" setting to properly handle bridge MPS
    PCI: Workaround for Intel MPS errata
    PCI: Add support for PASID capability
    PCI: Add implementation for PRI capability
    PCI: Export ATS functions to modules
    PCI: Move ATS implementation into own file
    PCI / PM: Remove unnecessary error variable from acpi_dev_run_wake()
    PCI hotplug: acpiphp: Prevent deadlock on PCI-to-PCI bridge remove
    PCI / PM: Extend PME polling to all PCI devices
    PCI quirk: mmc: Always check for lower base frequency quirk for Ricoh 1180:e823
    PCI: Make pci_setup_bridge() non-static for use by arch code
    x86: constify PCI raw ops structures
    PCI: Add quirk for known incorrect MPSS
    PCI: Add Solarflare vendor ID and SFC4000 device IDs

    Linus Torvalds
     
  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/cjb/mmc: (83 commits)
    mmc: fix compile error when CONFIG_BLOCK is not enabled
    mmc: core: Cleanup eMMC4.5 conditionals
    mmc: omap_hsmmc: if multiblock reads are broken, disable them
    mmc: core: add workaround for controllers with broken multiblock reads
    mmc: core: Prevent too long response times for suspend
    mmc: recognise SDIO cards with SDIO_CCCR_REV 3.00
    mmc: sd: Handle SD3.0 cards not supporting UHS-I bus speed mode
    mmc: core: support HPI send command
    mmc: core: Add cache control for eMMC4.5 device
    mmc: core: Modify the timeout value for writing power class
    mmc: core: new discard feature support at eMMC v4.5
    mmc: core: mmc sanitize feature support for v4.5
    mmc: dw_mmc: modify DATA register offset
    mmc: sdhci-pci: add flag for devices that can support runtime PM
    mmc: omap_hsmmc: ensure pbias configuration is always done
    mmc: core: Add Power Off Notify Feature eMMC 4.5
    mmc: sdhci-s3c: fix potential NULL dereference
    mmc: replace printk with appropriate display macro
    mmc: core: Add default timeout value for CMD6
    mmc: sdhci-pci: add runtime pm support
    ...

    Linus Torvalds
     
  • …git-cur/linux-2.6-arm

    * 'devel-stable' of http://ftp.arm.linux.org.uk/pub/linux/arm/kernel/git-cur/linux-2.6-arm: (178 commits)
    ARM: 7139/1: fix compilation with CONFIG_ARM_ATAG_DTB_COMPAT and large TEXT_OFFSET
    ARM: gic, local timers: use the request_percpu_irq() interface
    ARM: gic: consolidate PPI handling
    ARM: switch from NO_MACH_MEMORY_H to NEED_MACH_MEMORY_H
    ARM: mach-s5p64x0: remove mach/memory.h
    ARM: mach-s3c64xx: remove mach/memory.h
    ARM: plat-mxc: remove mach/memory.h
    ARM: mach-prima2: remove mach/memory.h
    ARM: mach-zynq: remove mach/memory.h
    ARM: mach-bcmring: remove mach/memory.h
    ARM: mach-davinci: remove mach/memory.h
    ARM: mach-pxa: remove mach/memory.h
    ARM: mach-ixp4xx: remove mach/memory.h
    ARM: mach-h720x: remove mach/memory.h
    ARM: mach-vt8500: remove mach/memory.h
    ARM: mach-s5pc100: remove mach/memory.h
    ARM: mach-tegra: remove mach/memory.h
    ARM: plat-tcc: remove mach/memory.h
    ARM: mach-mmp: remove mach/memory.h
    ARM: mach-cns3xxx: remove mach/memory.h
    ...

    Fix up mostly pretty trivial conflicts in:
    - arch/arm/Kconfig
    - arch/arm/include/asm/localtimer.h
    - arch/arm/kernel/Makefile
    - arch/arm/mach-shmobile/board-ap4evb.c
    - arch/arm/mach-u300/core.c
    - arch/arm/mm/dma-mapping.c
    - arch/arm/mm/proc-v7.S
    - arch/arm/plat-omap/Kconfig
    largely due to some CONFIG option renaming (ie CONFIG_PM_SLEEP ->
    CONFIG_ARM_CPU_SUSPEND for the arm-specific suspend code etc) and
    addition of NEED_MACH_MEMORY_H next to HAVE_IDE.

    Linus Torvalds
     
  • * 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/hch/vfs-queue: (21 commits)
    leases: fix write-open/read-lease race
    nfs: drop unnecessary locking in llseek
    ext4: replace cut'n'pasted llseek code with generic_file_llseek_size
    vfs: add generic_file_llseek_size
    vfs: do (nearly) lockless generic_file_llseek
    direct-io: merge direct_io_walker into __blockdev_direct_IO
    direct-io: inline the complete submission path
    direct-io: separate map_bh from dio
    direct-io: use a slab cache for struct dio
    direct-io: rearrange fields in dio/dio_submit to avoid holes
    direct-io: fix a wrong comment
    direct-io: separate fields only used in the submission path from struct dio
    vfs: fix spinning prevention in prune_icache_sb
    vfs: add a comment to inode_permission()
    vfs: pass all mask flags check_acl and posix_acl_permission
    vfs: add hex format for MAY_* flag values
    vfs: indicate that the permission functions take all the MAY_* flags
    compat: sync compat_stats with statfs.
    vfs: add "device" tag to /proc/self/mountstats
    cleanup: vfs: small comment fix for block_invalidatepage
    ...

    Fix up trivial conflict in fs/gfs2/file.c (llseek changes)

    Linus Torvalds
     
  • * http://sucs.org/~rohan/git/gfs2-3.0-nmw: (24 commits)
    GFS2: Move readahead of metadata during deallocation into its own function
    GFS2: Remove two unused variables
    GFS2: Misc fixes
    GFS2: rewrite fallocate code to write blocks directly
    GFS2: speed up delete/unlink performance for large files
    GFS2: Fix off-by-one in gfs2_blk2rgrpd
    GFS2: Clean up ->page_mkwrite
    GFS2: Correctly set goal block after allocation
    GFS2: Fix AIL flush issue during fsync
    GFS2: Use cached rgrp in gfs2_rlist_add()
    GFS2: Call do_strip() directly from recursive_scan()
    GFS2: Remove obsolete assert
    GFS2: Cache the most recently used resource group in the inode
    GFS2: Make resource groups "append only" during life of fs
    GFS2: Use rbtree for resource groups and clean up bitmap buffer ref count scheme
    GFS2: Fix lseek after SEEK_DATA, SEEK_HOLE have been added
    GFS2: Clean up gfs2_create
    GFS2: Use ->dirty_inode()
    GFS2: Fix bug trap and journaled data fsync
    GFS2: Fix inode allocation error path
    ...

    Linus Torvalds
     
  • * '3.2-without-smb2' of git://git.samba.org/sfrench/cifs-2.6: (52 commits)
    Fix build break when freezer not configured
    Add definition for share encryption
    CIFS: Make cifs_push_locks send as many locks at once as possible
    CIFS: Send as many mandatory unlock ranges at once as possible
    CIFS: Implement caching mechanism for posix brlocks
    CIFS: Implement caching mechanism for mandatory brlocks
    CIFS: Fix DFS handling in cifs_get_file_info
    CIFS: Fix error handling in cifs_readv_complete
    [CIFS] Fixup trivial checkpatch warning
    [CIFS] Show nostrictsync and noperm mount options in /proc/mounts
    cifs, freezer: add wait_event_freezekillable and have cifs use it
    cifs: allow cifs_max_pending to be readable under /sys/module/cifs/parameters
    cifs: tune bdi.ra_pages in accordance with the rsize
    cifs: allow for larger rsize= options and change defaults
    cifs: convert cifs_readpages to use async reads
    cifs: add cifs_async_readv
    cifs: fix protocol definition for READ_RSP
    cifs: add a callback function to receive the rest of the frame
    cifs: break out 3rd receive phase into separate function
    cifs: find mid earlier in receive codepath
    ...

    Linus Torvalds
     
  • * 'for-linus' of git://oss.sgi.com/xfs/xfs: (69 commits)
    xfs: add AIL pushing tracepoints
    xfs: put in missed fix for merge problem
    xfs: do not flush data workqueues in xfs_flush_buftarg
    xfs: remove XFS_bflush
    xfs: remove xfs_buf_target_name
    xfs: use xfs_ioerror_alert in xfs_buf_iodone_callbacks
    xfs: clean up xfs_ioerror_alert
    xfs: clean up buffer allocation
    xfs: remove buffers from the delwri list in xfs_buf_stale
    xfs: remove XFS_BUF_STALE and XFS_BUF_SUPER_STALE
    xfs: remove XFS_BUF_SET_VTYPE and XFS_BUF_SET_VTYPE_REF
    xfs: remove XFS_BUF_FINISH_IOWAIT
    xfs: remove xfs_get_buftarg_list
    xfs: fix buffer flushing during unmount
    xfs: optimize fsync on directories
    xfs: reduce the number of log forces from tail pushing
    xfs: Don't allocate new buffers on every call to _xfs_buf_find
    xfs: simplify xfs_trans_ijoin* again
    xfs: unlock the inode before log force in xfs_change_file_space
    xfs: unlock the inode before log force in xfs_fs_nfs_commit_metadata
    ...

    Linus Torvalds
     

28 Oct, 2011

8 commits

  • In setlease, we use i_writecount to decide whether we can give out a
    read lease.

    In open, we break leases before incrementing i_writecount.

    There is therefore a window between the break lease and the i_writecount
    increment when setlease could add a new read lease.

    This would leave us with a simultaneous write open and read lease, which
    shouldn't happen.

    Signed-off-by: J. Bruce Fields
    Signed-off-by: Christoph Hellwig

    J. Bruce Fields
     
  • This makes NFS follow the standard generic_file_llseek locking scheme.

    Cc: Trond.Myklebust@netapp.com
    Signed-off-by: Andi Kleen
    Signed-off-by: Christoph Hellwig

    Andi Kleen
     
  • This gives ext4 the benefits of unlocked llseek.

    Cc: tytso@mit.edu
    Signed-off-by: Andi Kleen
    Signed-off-by: Christoph Hellwig

    Andi Kleen
     
  • Add a generic_file_llseek variant to the VFS that allows passing in
    the maximum file size of the file system, instead of always
    using maxbytes from the superblock.

    This can be used to eliminate some cut'n'paste seek code in ext4.

    Signed-off-by: Andi Kleen
    Signed-off-by: Christoph Hellwig

    Andi Kleen
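
The idea of passing the size limit in as a parameter can be sketched in userspace C as follows. The names (`seek_file`, `llseek_size_sketch`) and the `-1` error convention are hypothetical and do not reproduce the kernel's generic_file_llseek_size(); overflow of the offset arithmetic is ignored for brevity.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>   /* SEEK_SET, SEEK_CUR, SEEK_END */

/* Hypothetical stand-in for an open file: current position and size. */
struct seek_file {
    int64_t pos;
    int64_t size;
};

/* Seek with a caller-supplied upper bound instead of a limit looked up
 * from the file system. Returns the new position, or -1 on a bad request;
 * the position is left untouched on failure. */
static int64_t llseek_size_sketch(struct seek_file *f, int64_t offset,
                                  int whence, int64_t maxsize)
{
    assert(f != NULL);
    int64_t target;

    switch (whence) {
    case SEEK_SET: target = offset;           break;
    case SEEK_CUR: target = f->pos + offset;  break;
    case SEEK_END: target = f->size + offset; break;
    default:       return -1;
    }
    if (target < 0 || target > maxsize)
        return -1;
    f->pos = target;
    return target;
}
```

A file system with a per-file limit can then reuse the shared logic by passing its own `maxsize`, instead of carrying a private copy of the seek code.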
     
  • The i_mutex lock use of generic_file_llseek hurts. Independent processes
    accessing the same file synchronize over a single lock, even though
    they have no need for synchronization at all.

    Under high utilization this can cause llseek to scale very poorly on larger
    systems.

    This patch does some rethinking of the llseek locking model:

    First the 64bit f_pos is not necessarily atomic without locks
    on 32bit systems. This can already cause races with read() today.
    This was discussed on linux-kernel in the past and deemed acceptable.
    The patch does not change that.

    Let's look at the different seek variants:

    SEEK_SET: Doesn't really need any locking.
    If there's a race one writer wins, the other loses.

    For 32bit the non atomic update races against read()
    stay the same. Without a lock they can also happen
    against write() now. The read() race was deemed
    acceptable in past discussions, and I think if it's
    ok for read it's ok for write too.

    => Don't need a lock.

    SEEK_END: This behaves like SEEK_SET plus it reads
    the maximum size too. Reading the maximum size would have the
    32bit atomic problem. But luckily we already have a way to read
    the maximum size without locking (i_size_read), so we
    can just use that instead.

    Without i_mutex there is no synchronization with write() anymore,
    however since the write() update is atomic on 64bit it just behaves
    like another racy SEEK_SET. On non atomic 32bit it's the same
    as SEEK_SET.

    => Don't need a lock, but need to use i_size_read()

    SEEK_CUR: This has a read-modify-write race window
    on the same file. One could argue that any application
    doing unsynchronized seeks on the same file is already broken.
    But for the sake of not adding a regression here I'm
    using the file->f_lock to synchronize this. Using this
    lock is much better than the inode mutex because it doesn't
    synchronize between processes.

    => So still need a lock, but can use a f_lock.

    This patch implements this new scheme in generic_file_llseek.
    I dropped generic_file_llseek_unlocked and changed all callers.

    Signed-off-by: Andi Kleen
    Signed-off-by: Christoph Hellwig

    Andi Kleen
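
The locking scheme laid out in this entry (no lock for SEEK_SET and SEEK_END, a per-file lock only around the SEEK_CUR read-modify-write) can be sketched in userspace C11. This is an assumption-laden illustration, not the kernel code: `locked_file`, `llseek_sketch` and the spinlock helpers are hypothetical, a C11 `atomic_flag` stands in for `f_lock`, and a plain field read stands in for i_size_read().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>
#include <stdio.h>   /* SEEK_SET, SEEK_CUR, SEEK_END */

/* Hypothetical stand-in for an open file. */
struct locked_file {
    atomic_flag f_lock;   /* per-open-file lock, not shared via the inode */
    int64_t f_pos;
    int64_t i_size;
};

static void spin_lock_sketch(atomic_flag *l)
{
    while (atomic_flag_test_and_set_explicit(l, memory_order_acquire))
        ;   /* busy-wait until the flag is released */
}

static void spin_unlock_sketch(atomic_flag *l)
{
    atomic_flag_clear_explicit(l, memory_order_release);
}

/* Returns the new position, or -1 for an invalid request. */
static int64_t llseek_sketch(struct locked_file *f, int64_t offset, int whence)
{
    assert(f != NULL);
    int64_t target;

    switch (whence) {
    case SEEK_SET:
        target = offset;              /* plain store: last writer wins */
        break;
    case SEEK_END:
        target = f->i_size + offset;  /* size read without the inode lock */
        break;
    case SEEK_CUR:
        /* Read-modify-write on f_pos: the only case that takes a lock. */
        spin_lock_sketch(&f->f_lock);
        target = f->f_pos + offset;
        if (target >= 0)
            f->f_pos = target;
        spin_unlock_sketch(&f->f_lock);
        return target >= 0 ? target : -1;
    default:
        return -1;
    }
    if (target < 0)
        return -1;
    f->f_pos = target;
    return target;
}
```

A file would be set up as `struct locked_file f = { ATOMIC_FLAG_INIT, 0, 100 };`. Since the lock lives in the file object, two processes with separate opens of the same inode never contend on it, which is the scaling benefit the entry describes.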
     
  • This doesn't change anything for the compiler, but hch thought it would
    make the code clearer.

    I moved the reference counting into its own little inline.

    Signed-off-by: Andi Kleen
    Acked-by: Jeff Moyer
    Signed-off-by: Christoph Hellwig

    Andi Kleen
     
  • Add inlines to all the submission path functions. While this increases
    code size it also gives gcc a lot of optimization opportunities
    in this critical hotpath.

    In particular -- together with some other changes -- this
    allows gcc to get rid of the unnecessary clearing of
    sdio at the beginning and optimize the messy parameter passing.
    Any non inlining of a function which takes a sdio parameter
    would break this optimization because it cannot be done if the
    address of a structure is taken.

    Note that benefits are only seen with CONFIG_OPTIMIZE_INLINING
    and CONFIG_CC_OPTIMIZE_FOR_SIZE both set to off.

    This gives about 2.2% improvement on a large database benchmark
    with a high IOPS rate.

    Signed-off-by: Andi Kleen
    Signed-off-by: Christoph Hellwig

    Andi Kleen
     
  • Only a single b_private field in the map_bh buffer head is needed after
    the submission path. Move map_bh separately to avoid storing
    this information in the long term slab.

    This avoids the weird 104 byte hole in struct dio_submit, which also needed
    to be cleared with memset early.

    Signed-off-by: Andi Kleen
    Signed-off-by: Christoph Hellwig

    Andi Kleen