15 Jan, 2012

1 commit

  • Autogenerated GPG tag for Rusty D1ADB8F1: 15EE 8D6C AB0E 7F0C F999 BFCB D920 0E6C D1AD B8F1

    * tag 'for-linus' of git://github.com/rustyrussell/linux:
    module_param: check that bool parameters really are bool.
    intelfbdrv.c: bailearly is an int module_param
    paride/pcd: fix bool verbose module parameter.
    module_param: make bool parameters really bool (drivers & misc)
    module_param: make bool parameters really bool (arch)
    module_param: make bool parameters really bool (core code)
    kernel/async: remove redundant declaration.
    printk: fix unnecessary module_param_name.
    lirc_parallel: fix module parameter description.
    module_param: avoid bool abuse, add bint for special cases.
    module_param: check type correctness for module_param_array
    modpost: use linker section to generate table.
    modpost: use a table rather than a giant if/else statement.
    modules: sysfs - export: taint, coresize, initsize
    kernel/params: replace DEBUGP with pr_debug
    module: replace DEBUGP with pr_debug
    module: struct module_ref should contains long fields
    module: Fix performance regression on modules with large symbol tables
    module: Add comments describing how the "strmap" logic works

    Fix up conflicts in scripts/mod/file2alias.c due to the new linker-
    generated table approach to adding __mod_*_device_table entries. The
    ARM sa11x0 mcp bus needed to be converted to that too.

    Linus Torvalds
     

13 Jan, 2012

3 commits

  • This patch adds a lightweight sync migrate operation MIGRATE_SYNC_LIGHT
    mode that avoids writing back pages to backing storage. Async compaction
    maps to MIGRATE_ASYNC while sync compaction maps to MIGRATE_SYNC_LIGHT.
    For other migrate_pages users such as memory hotplug, MIGRATE_SYNC is
    used.

    This avoids sync compaction stalling for an excessive length of time,
    particularly when copying files to a USB stick where there might be a
    large number of dirty pages backed by a filesystem that does not support
    ->writepages.

    [aarcange@redhat.com: This patch is heavily based on Andrea's work]
    [akpm@linux-foundation.org: fix fs/nfs/write.c build]
    [akpm@linux-foundation.org: fix fs/btrfs/disk-io.c build]
    Signed-off-by: Mel Gorman
    Reviewed-by: Rik van Riel
    Cc: Andrea Arcangeli
    Cc: Minchan Kim
    Cc: Dave Jones
    Cc: Jan Kara
    Cc: Andy Isaacson
    Cc: Nai Xia
    Cc: Johannes Weiner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mel Gorman
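
The mode selection described in this commit can be sketched in userspace C. The three mode names come from the commit message; their ordering in the enum and both helper functions (compaction_migrate_mode, may_writeback) are assumptions made for illustration, not the kernel's code.

```c
#include <assert.h>
#include <stdbool.h>

/* Mode names are from the commit message; the ordering is an assumption. */
enum migrate_mode {
    MIGRATE_ASYNC,       /* never block */
    MIGRATE_SYNC_LIGHT,  /* may block, but does not write pages back */
    MIGRATE_SYNC,        /* may block and may write pages back */
};

/* Hypothetical helper: map compaction's sync flag to a migrate mode. */
static enum migrate_mode compaction_migrate_mode(bool sync)
{
    return sync ? MIGRATE_SYNC_LIGHT : MIGRATE_ASYNC;
}

/* Hypothetical helper: only full sync migration may write pages back. */
static bool may_writeback(enum migrate_mode mode)
{
    assert(mode <= MIGRATE_SYNC);
    return mode == MIGRATE_SYNC;
}
```

Under these assumptions, sync compaction gets MIGRATE_SYNC_LIGHT and so never waits on writeback, while a caller such as memory hotplug would pass MIGRATE_SYNC directly.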
     
  • Asynchronous compaction is used when allocating transparent hugepages to
    avoid blocking for long periods of time. Due to reports of stalling,
    there was a debate on disabling synchronous compaction but this severely
    impacted allocation success rates. Part of the reason was that many dirty
    pages are skipped in asynchronous compaction by the following check:

        if (PageDirty(page) && !sync &&
            mapping->a_ops->migratepage != migrate_page)
                rc = -EBUSY;

    This skips over all mapping aops using buffer_migrate_page() even though
    it is possible to migrate some of these pages without blocking. This
    patch updates the ->migratepage callback with a "sync" parameter. It is
    the responsibility of the callback to fail gracefully if migration would
    block.

    Signed-off-by: Mel Gorman
    Reviewed-by: Rik van Riel
    Cc: Andrea Arcangeli
    Cc: Minchan Kim
    Cc: Dave Jones
    Cc: Jan Kara
    Cc: Andy Isaacson
    Cc: Nai Xia
    Cc: Johannes Weiner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mel Gorman
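
A rough userspace model of a callback that "fails gracefully if migration would block" might look like the following. The struct buffer type and both functions are hypothetical stand-ins; the real callback works on buffer heads and page locks, which are not modelled here.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a buffer that someone else may hold locked. */
struct buffer {
    bool locked;
};

/* Take the lock only if it is free; never waits. */
static bool buffer_trylock(struct buffer *b)
{
    if (b->locked)
        return false;
    b->locked = true;
    return true;
}

/*
 * Hypothetical callback: lock every buffer of a page before migrating.
 * With sync == false, back out and return -EBUSY rather than wait.
 */
static int migrate_buffers(struct buffer *bufs, size_t n, bool sync)
{
    assert(bufs != NULL);
    for (size_t i = 0; i < n; i++) {
        if (buffer_trylock(&bufs[i]))
            continue;
        if (!sync) {
            /* Undo the locks taken so far and fail gracefully. */
            while (i-- > 0)
                bufs[i].locked = false;
            return -EBUSY;
        }
        /* Model only: the holder releases the lock while we sleep. */
        bufs[i].locked = false;
        buffer_trylock(&bufs[i]);
    }
    /* ... the page would be copied here ... */
    for (size_t i = 0; i < n; i++)
        bufs[i].locked = false;
    return 0;
}
```

The point of the design is that the decision to skip a page moves from a blanket check in the caller into the callback, which knows whether this particular page can be moved without sleeping.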
     
  • module_param(bool) used to counter-intuitively take an int. In
    fddd5201 (mid-2009) we allowed bool or int/unsigned int using a messy
    trick.

    It's time to remove the int/unsigned int option. For this version
    it'll simply give a warning, but it'll break next kernel version.

    Acked-by: Mauro Carvalho Chehab
    Signed-off-by: Rusty Russell

    Rusty Russell
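
One way to picture a "bool parameters really are bool" check is a compile-time type test. The sketch below uses C11 _Generic and static_assert in userspace; the macro names IS_BOOL_VAR and CHECK_BOOL_PARAM are invented for this example and this is not how module_param itself is implemented.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stdbool.h>

/* Hypothetical macro: evaluates to 1 only if var is declared bool. */
#define IS_BOOL_VAR(var) _Generic((var), bool: 1, default: 0)

/* Hypothetical macro: refuses to compile for non-bool variables. */
#define CHECK_BOOL_PARAM(var) \
    static_assert(IS_BOOL_VAR(var), #var " must be declared bool")

static bool verbose;    /* accepted by the check below */
static int bailearly;   /* CHECK_BOOL_PARAM(bailearly) would not compile */

CHECK_BOOL_PARAM(verbose);
```

A transitional version, like the one this commit describes, would warn instead of failing the build for the int case.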
     

11 Jan, 2012

1 commit

  • * 'nfs-for-3.3' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
    NFSv4: Change the default setting of the nfs4_disable_idmapping parameter
    NFSv4: Save the owner/group name string when doing open
    NFS: Remove pNFS bloat from the generic write path
    pnfs-obj: Must return layout on IO error
    pnfs-obj: pNFS errors are communicated on iodata->pnfs_error
    NFS: Cache state owners after files are closed
    NFS: Clean up nfs4_find_state_owners_locked()
    NFSv4: include bitmap in nfsv4 get acl data
    nfs: fix a minor do_div portability issue
    NFSv4.1: cleanup comment and debug printk
    NFSv4.1: change nfs4_free_slot parameters for dynamic slots
    NFSv4.1: cleanup init and reset of session slot tables
    NFSv4.1: fix backchannel slotid off-by-one bug
    nfs: fix regression in handling of context= option in NFSv4
    NFS - fix recent breakage to NFS error handling.
    NFS: Retry mounting NFSROOT
    SUNRPC: Clean up the RPCSEC_GSS service ticket requests

    Linus Torvalds
     

10 Jan, 2012

1 commit

  • Now that the use of numeric uids/gids is officially sanctioned in
    RFC3530bis, it is time to change the default here to 'enabled'.

    By doing so, we ensure that NFSv4 copies the behaviour of NFSv3 when we're
    using the default AUTH_SYS authentication (i.e. when the client uses the
    numeric uids/gids as authentication tokens), so that when new files are
    created, they will appear to have the correct user/group.
    It also fixes a number of backward compatibility issues when migrating
    from NFSv3 to NFSv4 on a platform where the server uses different uid/gid
    mappings than the client.

    Note also that this setting has been successfully tested against servers
    that do not support numeric uids/gids at several Connectathon/Bakeathon
    events at this point, and the fall back to using string names/groups has
    been shown to work well in all those test cases.

    Signed-off-by: Trond Myklebust

    Trond Myklebust
     

09 Jan, 2012

1 commit

  • * 'pm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (76 commits)
    PM / Hibernate: Implement compat_ioctl for /dev/snapshot
    PM / Freezer: fix return value of freezable_schedule_timeout_killable()
    PM / shmobile: Allow the A4R domain to be turned off at run time
    PM / input / touchscreen: Make st1232 use device PM QoS constraints
    PM / QoS: Introduce dev_pm_qos_add_ancestor_request()
    PM / shmobile: Remove the stay_on flag from SH7372's PM domains
    PM / shmobile: Don't include SH7372's INTCS in syscore suspend/resume
    PM / shmobile: Add support for the sh7372 A4S power domain / sleep mode
    PM: Drop generic_subsys_pm_ops
    PM / Sleep: Remove forward-only callbacks from AMBA bus type
    PM / Sleep: Remove forward-only callbacks from platform bus type
    PM: Run the driver callback directly if the subsystem one is not there
    PM / Sleep: Make pm_op() and pm_noirq_op() return callback pointers
    PM/Devfreq: Add Exynos4-bus device DVFS driver for Exynos4210/4212/4412.
    PM / Sleep: Merge internal functions in generic_ops.c
    PM / Sleep: Simplify generic system suspend callbacks
    PM / Hibernate: Remove deprecated hibernation snapshot ioctls
    PM / Sleep: Fix freezer failures due to racy usermodehelper_is_disabled()
    ARM: S3C64XX: Implement basic power domain support
    PM / shmobile: Use common always on power domain governor
    ...

    Fix up trivial conflict in fs/xfs/xfs_buf.c due to removal of unused
    XBT_FORCE_SLEEP bit

    Linus Torvalds
     

08 Jan, 2012

1 commit

  • ...so that we can do the uid/gid mapping outside the asynchronous RPC
    context.
    This fixes a bug in the current NFSv4 atomic open code where the client
    isn't able to determine what the true uid/gid fields of the file are,
    (because the asynchronous nature of the OPEN call denies it the ability
    to do an upcall) and so fills them with default values, marking the
    inode as needing revalidation.
    Unfortunately, in some cases, the VFS will do some additional sanity
    checks on the file, and may override the server's decision to allow
    the open because it sees the wrong owner/group fields.

    Signed-off-by: Trond Myklebust

    Trond Myklebust
     

07 Jan, 2012

4 commits


06 Jan, 2012

4 commits

  • We have no business doing any of this in the standard write release path.
    Get rid of it, and put it in the pNFS layer.

    Also, while we're at it, get rid of the completely bogus unlock/relock
    semantics that were present in nfs_writeback_release_full(). It is
    not only unnecessary, but actually dangerous to release the write lock
    just in order to take it again in nfs_page_async_flush(). Better just
    to open code the pgio operations in a pnfs helper.

    Signed-off-by: Trond Myklebust

    Trond Myklebust
     
  • As mandated by the standard, in case of an IO error a pNFS
    objects layout driver must return its layout. This is because
    all device errors are reported to the server as part of the
    layout return buffer.

    This is implemented the same way PNFS_LAYOUTRET_ON_SETATTR
    is done, through a bit flag on the pnfs_layoutdriver_type->flags
    member. The flag is set by the layout driver that wants a
    layout_return performed at pnfs_ld_{write,read}_done in case
    of an error.
    (Though I have not defined a wrapper like pnfs_ld_layoutret_on_setattr
    because this code is never called outside of pnfs.c and pnfs IO
    paths)

    Without this patch 3.[0-2] Kernels leak memory and have an annoying
    WARN_ON after every IO error utilizing the pnfs-obj driver.

    [This patch is for 3.2 Kernel. 3.1/0 Kernels need a different patch]
    CC: Stable Tree
    Signed-off-by: Boaz Harrosh
    Signed-off-by: Trond Myklebust

    Boaz Harrosh
     
  • Some time along the way pNFS IO errors were switched to
    communicate with a special iodata->pnfs_error member instead
    of the regular RPC members. But objlayout was not switched
    over.

    Fix that!
    Without this fix, any IO error causes a hang, because IO is not
    switched to the MDS and pages are never cleared or read.

    [Applies to 3.2.0. Same bug different patch for 3.1/0 Kernels]
    CC: Stable Tree
    Signed-off-by: Boaz Harrosh
    Signed-off-by: Trond Myklebust

    Boaz Harrosh
     
  • Servers have a finite amount of memory to store NFSv4 open and lock
    owners. Moreover, servers may have a difficult time determining when
    they can reap their state owner table, thanks to gray areas in the
    NFSv4 protocol specification. Thus clients should be careful to reuse
    state owners when possible.

    Currently Linux is not too careful. When a user has closed all her
    files on one mount point, the state owner's reference count goes to
    zero, and it is released. The next OPEN allocates a new one. A
    workload that serially opens and closes files can run through a large
    number of open owners this way.

    When a state owner's reference count goes to zero, slap it onto a free
    list for that nfs_server, with an expiry time. Garbage collect before
    looking for a state owner. This makes state owners for active users
    available for re-use.

    Now that there can be unused state owners remaining at umount time,
    purge the state owner free list when a server is destroyed. Also be
    sure not to reclaim unused state owners during state recovery.

    This change has benefits for the client as well. For some workloads,
    this approach drops the number of OPEN_CONFIRM calls from the same as
    the number of OPEN calls, down to just one. This reduces wire traffic
    and thus open(2) latency. Before this patch, untarring a kernel
    source tarball shows the OPEN_CONFIRM call counter steadily increasing
    through the test. With the patch, the OPEN_CONFIRM count remains at 1
    throughout the entire untar.

    As long as the expiry time is kept short, I don't think garbage
    collection should be terribly expensive, although it does bounce the
    clp->cl_lock around a bit.

    [ At some point we should rationalize the use of the nfs_server
    ->destroy method. ]

    Signed-off-by: Chuck Lever
    [Trond: Fixed a garbage collection race and a few efficiency issues]
    Signed-off-by: Trond Myklebust

    Chuck Lever
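
The free-list-with-expiry idea can be sketched in a few lines of userspace C. Every name below (struct state_owner, struct server, owner_release, owner_gc, owner_get) is hypothetical, locking is omitted, and time is passed in explicitly so the behaviour is easy to follow.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* All names are hypothetical; no locking is modelled. */
struct state_owner {
    int uid;                   /* identifies the user */
    long expires;              /* after this time an idle owner may be freed */
    struct state_owner *next;  /* link in the server's free list */
};

struct server {
    struct state_owner *free_list;  /* idle owners (reference count zero) */
    long lease;                     /* how long idle owners are kept */
};

/* Called when the reference count hits zero: park instead of freeing. */
static void owner_release(struct server *srv, struct state_owner *so, long now)
{
    so->expires = now + srv->lease;
    so->next = srv->free_list;
    srv->free_list = so;
}

/* Free every parked owner whose expiry time has passed. */
static void owner_gc(struct server *srv, long now)
{
    struct state_owner **pp = &srv->free_list;
    while (*pp) {
        struct state_owner *so = *pp;
        if (now > so->expires) {
            *pp = so->next;
            free(so);
        } else {
            pp = &so->next;
        }
    }
}

/* Garbage collect first, then reuse a parked owner or allocate a new one. */
static struct state_owner *owner_get(struct server *srv, int uid, long now)
{
    assert(srv != NULL);
    owner_gc(srv, now);
    for (struct state_owner **pp = &srv->free_list; *pp; pp = &(*pp)->next) {
        if ((*pp)->uid == uid) {
            struct state_owner *so = *pp;
            *pp = so->next;
            return so;
        }
    }
    struct state_owner *so = calloc(1, sizeof(*so));
    if (so)
        so->uid = uid;
    return so;
}
```

A user who reopens a file within the lease gets the same owner back, so the server sees no new open owner; an owner left idle past its expiry is freed on the next lookup.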
     

05 Jan, 2012

10 commits

  • There's no longer a need to check the so_server field in the state
    owner, because nowadays the RB tree we search for state owners
    contains owners for only that server.

    Make nfs4_find_state_owners_locked() use the same tree searching logic
    as nfs4_insert_state_owner_locked().

    Signed-off-by: Chuck Lever
    Signed-off-by: Trond Myklebust

    Chuck Lever
     
  • The NFSv4 bitmap size is unbounded: a server can return an arbitrarily
    sized bitmap in an FATTR4_WORD0_ACL request. Instead of using
    nfs4_fattr_bitmap_maxsz as a guess at the maximum bitmask returned by a
    server, include the bitmap (xdr length plus bitmasks) and the acl data
    xdr length in the (cached) acl page data.

    This is a general solution to commit e5012d1f "NFSv4.1: update
    nfs4_fattr_bitmap_maxsz" and fixes hitting a BUG_ON in xdr_shrink_bufhead
    when getting ACLs.

    Fix a bug in decode_getacl that returned -EINVAL on ACLs > page when getxattr
    was called with a NULL buffer, preventing ACL > PAGE_SIZE from being retrieved.

    Cc: stable@kernel.org
    Signed-off-by: Andy Adamson
    Signed-off-by: Trond Myklebust

    Andy Adamson
     
  • This change modifies filelayout_get_dense_offset() to use the functions
    in math64.h and thus avoid a 32-bit platform compile error trying to
    use do_div() on an s64 type.

    Signed-off-by: Chris Metcalf
    Reviewed-by: Boaz Harrosh
    Signed-off-by: Trond Myklebust

    Chris Metcalf
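
For orientation, a dense-offset calculation of this kind can be written in userspace C as below. The function name, its parameters, and the formula are an illustrative assumption about what such a helper computes, not a copy of filelayout_get_dense_offset(). In userspace, plain / and % work on 64-bit values; on 32-bit kernel builds the same steps need helpers such as div_u64_rem() from math64.h.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical userspace version: whole stripes before the offset,
 * times the stripe unit, plus the position inside the stripe unit.
 */
static int64_t dense_offset(int64_t offset, int64_t pattern_offset,
                            uint32_t stripe_unit, uint32_t stripe_count)
{
    assert(stripe_unit != 0 && stripe_count != 0);
    uint64_t off = (uint64_t)(offset - pattern_offset);
    uint64_t stripe_width = (uint64_t)stripe_unit * stripe_count;
    uint64_t stripes = off / stripe_width;  /* 64-bit division */
    uint64_t rem = off % stripe_unit;       /* 64-bit remainder */
    return (int64_t)(stripes * stripe_unit + rem);
}
```

With a stripe unit of 4 and three stripes, offset 29 lies in the third full stripe (index 2) at position 1 within its unit, giving 2 * 4 + 1 = 9.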
     
  • Signed-off-by: Andy Adamson
    Signed-off-by: Trond Myklebust

    Andy Adamson
     
  • Signed-off-by: Andy Adamson
    Signed-off-by: Trond Myklebust

    Andy Adamson
     
  • We are either initializing or resetting a session. Initialize or reset
    the session slot tables accordingly.

    Signed-off-by: Andy Adamson
    Signed-off-by: Trond Myklebust

    Andy Adamson
     
  • Cc:stable@kernel.org
    Signed-off-by: Andy Adamson
    Signed-off-by: Trond Myklebust

    Andy Adamson
     
  • Setting the security context of a NFSv4 mount via the context= mount
    option is currently broken. The NFSv4 codepath allocates a parsed
    options struct, and then parses the mount options to fill it. It
    eventually calls nfs4_remote_mount which calls security_init_mnt_opts.
    That clobbers the lsm_opts struct that was populated earlier. This bug
    also looks like it causes a small memory leak on each v4 mount where
    context= is used.

    Fix this by moving the initialization of the lsm_opts into
    nfs_alloc_parsed_mount_data. Also, add a destructor for
    nfs_parsed_mount_data to make it easier to free all of the allocations
    hanging off of it, and to ensure that the security_free_mnt_opts is
    called whenever security_init_mnt_opts is.

    I believe this regression was introduced quite some time ago, probably
    by commit c02d7adf.

    Cc: stable@vger.kernel.org
    Signed-off-by: Jeff Layton
    Signed-off-by: Trond Myklebust

    Jeff Layton
     
  • From c6d615d2b97fe305cbf123a8751ced859dca1d5e Mon Sep 17 00:00:00 2001
    From: NeilBrown
    Date: Wed, 16 Nov 2011 09:39:05 +1100
    Subject: [PATCH] NFS - fix recent breakage to NFS error handling.

    commit 02c24a82187d5a628c68edfe71ae60dc135cd178 made a small and
    presumably unintended change to write error handling in NFS.

    Previously an error from filemap_write_and_wait_range would only be of
    interest if nfs_file_fsync did not return an error. After this commit,
    an error from filemap_write_and_wait_range would mean that (the rest of)
    nfs_file_fsync would not even be called.

    This means that:
    1/ you are more likely to see EIO than e.g. EDQUOT or ENOSPC.
    2/ NFS_CONTEXT_ERROR_WRITE remains set for longer so more writes are
    synchronous.

    This patch restores previous behaviour.

    Cc: stable@kernel.org
    Cc: Josef Bacik
    Cc: Jan Kara
    Cc: Al Viro
    Signed-off-by: NeilBrown
    Signed-off-by: Trond Myklebust

    NeilBrown
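
The difference between the two behaviours described above can be modelled with two small functions. The struct and function names are hypothetical; they stand in for filemap_write_and_wait_range and the rest of nfs_file_fsync, and only the order of the checks is meant to match the commit message.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins for the two steps of an fsync. */
struct fsync_steps {
    int flush_err;     /* result of flushing the dirty range */
    int commit_err;    /* result of the NFS-specific step, e.g. -ENOSPC */
    int commit_calls;  /* how often the second step actually ran */
};

static int flush_range(struct fsync_steps *s)
{
    return s->flush_err;
}

static int nfs_commit(struct fsync_steps *s)
{
    s->commit_calls++;
    return s->commit_err;
}

/* Restored behaviour: always run the second step; its error wins. */
static int fsync_restored(struct fsync_steps *s)
{
    assert(s != NULL);
    int flush = flush_range(s);
    int commit = nfs_commit(s);
    return commit ? commit : flush;
}

/* Regressed behaviour: a flush error skips the second step entirely. */
static int fsync_regressed(struct fsync_steps *s)
{
    assert(s != NULL);
    int flush = flush_range(s);
    if (flush)
        return flush;
    return nfs_commit(s);
}
```

In this model, a flush failing with -EIO while the second step would report -ENOSPC yields -ENOSPC from the restored version and -EIO from the regressed one, which matches point 1/ in the message.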
     
  • Instead of hacking specific service names into gss_encode_v1_msg, we should
    just allow the caller to specify the service name explicitly.

    Signed-off-by: Trond Myklebust
    Acked-by: J. Bruce Fields

    Trond Myklebust
     

04 Jan, 2012

7 commits


22 Dec, 2011

1 commit

  • * master: (848 commits)
    SELinux: Fix RCU deref check warning in sel_netport_insert()
    binary_sysctl(): fix memory leak
    mm/vmalloc.c: remove static declaration of va from __get_vm_area_node
    ipmi_watchdog: restore settings when BMC reset
    oom: fix integer overflow of points in oom_badness
    memcg: keep root group unchanged if creation fails
    nilfs2: potential integer overflow in nilfs_ioctl_clean_segments()
    nilfs2: unbreak compat ioctl
    cpusets: stall when updating mems_allowed for mempolicy or disjoint nodemask
    evm: prevent racing during tfm allocation
    evm: key must be set once during initialization
    mmc: vub300: fix type of firmware_rom_wait_states module parameter
    Revert "mmc: enable runtime PM by default"
    mmc: sdhci: remove "state" argument from sdhci_suspend_host
    x86, dumpstack: Fix code bytes breakage due to missing KERN_CONT
    IB/qib: Correct sense on freectxts increment and decrement
    RDMA/cma: Verify private data length
    cgroups: fix a css_set not found bug in cgroup_attach_proc
    oprofile: Fix uninitialized memory access when writing to writing to oprofilefs
    Revert "xen/pv-on-hvm kexec: add xs_reset_watches to shutdown watches from old kernel"
    ...

    Conflicts:
    kernel/cgroup_freezer.c

    Rafael J. Wysocki
     

16 Dec, 2011

1 commit

  • After commit 06222e491e663dac939f04b125c9dc52126a75c4 (fs: handle
    SEEK_HOLE/SEEK_DATA properly in all fs's that define their own llseek)
    the behaviour of llseek() was changed so that it always revalidates
    the file size. The bug appears to be due to a logic error in the
    aforementioned commit, which always evaluates to 'true'.

    Reported-by: Roel Kluin
    Signed-off-by: Trond Myklebust
    Cc: stable@vger.kernel.org [>=3.1]

    Trond Myklebust
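
The commit message does not quote the faulty condition. A typical logic error that "always evaluates to true" looks like the first function below, shown next to a corrected form; both function names and the exact expressions are assumptions for illustration only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>   /* SEEK_SET, SEEK_CUR, SEEK_END */

/* Assumed shape of the bug: no value equals both constants, so this is always true. */
static bool needs_revalidate_buggy(int origin)
{
    return origin != SEEK_SET || origin != SEEK_CUR;
}

/* Assumed fix: revalidate only for origins other than SEEK_SET and SEEK_CUR. */
static bool needs_revalidate_fixed(int origin)
{
    return origin != SEEK_SET && origin != SEEK_CUR;
}
```

With the || form every llseek() would revalidate the file size; with the && form only origins such as SEEK_END do.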
     

10 Dec, 2011

2 commits


07 Dec, 2011

1 commit

  • Allow the freezer to skip wait_on_bit_killable sleeps in the sunrpc
    layer. This should allow suspend and hibernate events to proceed, even
    when there are RPCs pending on the wire.

    Also, wrap the TASK_KILLABLE sleeps in NFS layer in freezer_do_not_count
    and freezer_count calls. This allows the freezer to skip tasks that are
    sleeping while looping on EJUKEBOX or NFS4ERR_DELAY sorts of errors.

    Signed-off-by: Jeff Layton
    Signed-off-by: Rafael J. Wysocki

    Jeff Layton
     

02 Dec, 2011

2 commits