01 Aug, 2012

9 commits

  • Merge Andrew's second set of patches:
    - MM
    - a few random fixes
    - a couple of RTC leftovers

    * emailed patches from Andrew Morton: (120 commits)
    rtc/rtc-88pm80x: remove unneed devm_kfree
    rtc/rtc-88pm80x: assign ret only when rtc_register_driver fails
    mm: hugetlbfs: close race during teardown of hugetlbfs shared page tables
    tmpfs: distribute interleave better across nodes
    mm: remove redundant initialization
    mm: warn if pg_data_t isn't initialized with zero
    mips: zero out pg_data_t when it's allocated
    memcg: fix memory accounting scalability in shrink_page_list
    mm/sparse: remove index_init_lock
    mm/sparse: more checks on mem_section number
    mm/sparse: optimize sparse_index_alloc
    memcg: add mem_cgroup_from_css() helper
    memcg: further prevent OOM with too many dirty pages
    memcg: prevent OOM with too many dirty pages
    mm: mmu_notifier: fix freed page still mapped in secondary MMU
    mm: memcg: only check anon swapin page charges for swap cache
    mm: memcg: only check swap cache pages for repeated charging
    mm: memcg: split swapin charge function into private and public part
    mm: memcg: remove needless !mm fixup to init_mm when charging
    mm: memcg: remove unneeded shmem charge type
    ...

    Linus Torvalds
     
  • Pull second wave of NFS client updates from Trond Myklebust:

    - Patches from Bryan to allow splitting of the NFSv2/v3/v4 code into
    separate modules.

    - Fix Oopses in the NFSv4 idmapper

    - Fix a deadlock whereby rpciod tries to allocate a new socket and ends
    up recursing into the NFS code due to memory reclaim.

    - Increase the number of permitted callback connections.

    * tag 'nfs-for-3.6-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
    nfs: explicitly reject LOCK_MAND flock() requests
    nfs: increase number of permitted callback connections.
    SUNRPC: return negative value in case rpcbind client creation error
    NFS: Convert v4 into a module
    NFS: Convert v3 into a module
    NFS: Convert v2 into a module
    NFS: Keep module parameters in the generic NFS client
    NFS: Split out remaining NFS v4 inode functions
    NFS: Pass super operations and xattr handlers in the nfs_subversion
    NFS: Only initialize the ACL client in the v3 case
    NFS: Create a try_mount rpc op
    NFS: Remove the NFS v4 xdev mount function
    NFS: Add version registering framework
    NFS: Fix a number of bugs in the idmapper
    nfs: skip commit in releasepage if we're freeing memory for fs-related reasons
    sunrpc: clarify comments on rpc_make_runnable
    pnfsblock: bail out partial page IO

    Linus Torvalds
     
  • GFP_NOFS is _more_ permissive than GFP_NOIO in that it will initiate IO,
    just not of any filesystem data.

    Previously GFP_NOFS was sufficient here, because it avoids recursion
    into the NFS code. With swap-over-NFS this is no longer true, as swap
    IO can itself lead to this recursion.
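
    A minimal userspace model of why the mask matters, assuming the 3.x-era
    layout in which GFP_NOIO is just __GFP_WAIT, GFP_NOFS adds __GFP_IO, and
    GFP_KERNEL adds __GFP_FS on top. The bit values below are illustrative,
    not copied from gfp.h; only the subset relationships matter. With swap
    over NFS, reclaim that is allowed to start I/O may start swap I/O, and
    that swap I/O is NFS traffic.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative bit values; only which bits each mask contains matters. */
#define __GFP_WAIT 0x10u /* allocator may sleep */
#define __GFP_IO   0x40u /* reclaim may start physical I/O, e.g. swap-out */
#define __GFP_FS   0x80u /* reclaim may call into filesystem code */

#define GFP_NOIO   (__GFP_WAIT)
#define GFP_NOFS   (__GFP_WAIT | __GFP_IO)
#define GFP_KERNEL (__GFP_WAIT | __GFP_IO | __GFP_FS)

/* May reclaim under this mask start I/O? With swap over NFS, that I/O
 * can be swap I/O that re-enters the NFS/SUNRPC code. */
static bool reclaim_may_start_io(unsigned int gfp)
{
    return (gfp & __GFP_IO) != 0;
}

/* May reclaim call into filesystem paths such as ->releasepage? */
static bool reclaim_may_enter_fs(unsigned int gfp)
{
    return (gfp & __GFP_FS) != 0;
}
```

    Under this model, GFP_NOIO is the only one of the three masks that rules
    out both kinds of recursion.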

    Signed-off-by: Peter Zijlstra
    Signed-off-by: Mel Gorman
    Acked-by: Rik van Riel
    Cc: Christoph Hellwig
    Cc: David S. Miller
    Cc: Eric B Munson
    Cc: Eric Paris
    Cc: James Morris
    Cc: Mel Gorman
    Cc: Mike Christie
    Cc: Neil Brown
    Cc: Sebastian Andrzej Siewior
    Cc: Trond Myklebust
    Cc: Xiaotian Feng
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mel Gorman
     
  • Implement the new swapfile a_ops for NFS and hook up ->direct_IO. This
    will set the NFS socket to SOCK_MEMALLOC and run socket reconnect under
    PF_MEMALLOC as well as reset SOCK_MEMALLOC before engaging the protocol
    ->connect() method.

    PF_MEMALLOC should allow the allocation of struct socket and related
    objects and the early (re)setting of SOCK_MEMALLOC should allow us to
    receive the packets required for the TCP connection buildup.
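
    The reconnect path sets PF_MEMALLOC temporarily, and the follow-up fix
    credited to jlayton restores the task flags in all cases. A sketch of
    that save/restore pattern, using a hypothetical demo_task in place of
    task_struct and an illustrative PF_MEMALLOC value. It restores only the
    PF_MEMALLOC bit to its saved state rather than clearing it, so a caller
    that was already running under PF_MEMALLOC keeps it.

```c
#include <assert.h>

#define PF_MEMALLOC 0x00000800UL /* illustrative value */

/* Hypothetical stand-in for task_struct. */
struct demo_task {
    unsigned long flags;
};

/* Put the bits in 'mask' back to their saved state; leave other bits alone. */
static void demo_restore_flags(struct demo_task *t, unsigned long saved,
                               unsigned long mask)
{
    t->flags &= ~mask;
    t->flags |= saved & mask;
}

/* Hypothetical reconnect: runs under PF_MEMALLOC and restores the
 * caller's previous PF_MEMALLOC state on every exit path. */
static int demo_reconnect(struct demo_task *t, int fail)
{
    unsigned long saved = t->flags;
    int ret = 0;

    t->flags |= PF_MEMALLOC;
    if (fail) {
        ret = -1; /* e.g. socket allocation failed */
        goto out;
    }
    /* ... allocate struct socket, set SOCK_MEMALLOC, call ->connect() ... */
out:
    demo_restore_flags(t, saved, PF_MEMALLOC);
    return ret;
}
```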

    [jlayton@redhat.com: Restore PF_MEMALLOC task flags in all cases]
    [dfeng@redhat.com: Fix handling of multiple swap files]
    [a.p.zijlstra@chello.nl: Original patch]
    Signed-off-by: Mel Gorman
    Acked-by: Rik van Riel
    Cc: Christoph Hellwig
    Cc: David S. Miller
    Cc: Eric B Munson
    Cc: Eric Paris
    Cc: James Morris
    Cc: Mel Gorman
    Cc: Mike Christie
    Cc: Neil Brown
    Cc: Sebastian Andrzej Siewior
    Cc: Trond Myklebust
    Cc: Xiaotian Feng
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mel Gorman
     
  • The VM does not like PG_private set on PG_swapcache pages. As suggested
    by Trond in http://lkml.org/lkml/2006/8/25/348, this patch disables NFS
    data cache revalidation on swap files, as it does not make sense to have
    other clients change the file while it is being used as swap. This avoids
    setting PG_private on swap pages, since there ought to be no further races
    with invalidate_inode_pages2() to deal with.

    Since we cannot set PG_private, we cannot use page->private (which
    PG_swapcache pages already use) to store the nfs_page. Thus augment
    the new nfs_page_find_request logic.

    Signed-off-by: Peter Zijlstra
    Signed-off-by: Mel Gorman
    Acked-by: Rik van Riel
    Cc: Christoph Hellwig
    Cc: David S. Miller
    Cc: Eric B Munson
    Cc: Eric Paris
    Cc: James Morris
    Cc: Mel Gorman
    Cc: Mike Christie
    Cc: Neil Brown
    Cc: Sebastian Andrzej Siewior
    Cc: Trond Myklebust
    Cc: Xiaotian Feng
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mel Gorman
     
  • Replace all relevant occurrences of page->index and page->mapping in the
    NFS client with the new page_file_index() and page_file_mapping()
    functions.

    Signed-off-by: Peter Zijlstra
    Signed-off-by: Mel Gorman
    Acked-by: Rik van Riel
    Cc: Christoph Hellwig
    Cc: David S. Miller
    Cc: Eric B Munson
    Cc: Eric Paris
    Cc: James Morris
    Cc: Mel Gorman
    Cc: Mike Christie
    Cc: Neil Brown
    Cc: Sebastian Andrzej Siewior
    Cc: Trond Myklebust
    Cc: Xiaotian Feng
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mel Gorman
     
  • Pull nfsd changes from J. Bruce Fields:
    "This has been an unusually quiet cycle--mostly bugfixes and cleanup.
    The one large piece is Stanislav's work to containerize the server's
    grace period--but that in itself is just one more step in a
    not-yet-complete project to allow fully containerized nfs service.

    There are a number of outstanding delegation, container, v4 state, and
    gss patches that aren't quite ready yet; 3.7 may be wilder."

    * 'nfsd-next' of git://linux-nfs.org/~bfields/linux: (35 commits)
    NFSd: make boot_time variable per network namespace
    NFSd: make grace end flag per network namespace
    Lockd: move grace period management from lockd() to per-net functions
    LockD: pass actual network namespace to grace period management functions
    LockD: manage grace list per network namespace
    SUNRPC: service request network namespace helper introduced
    NFSd: make nfsd4_manager allocated per network namespace context.
    LockD: make lockd manager allocated per network namespace
    LockD: manage grace period per network namespace
    Lockd: add more debug to host shutdown functions
    Lockd: host complaining function introduced
    LockD: manage used host count per networks namespace
    LockD: manage garbage collection timeout per networks namespace
    LockD: make garbage collector network namespace aware.
    LockD: mark host per network namespace on garbage collect
    nfsd4: fix missing fault_inject.h include
    locks: move lease-specific code out of locks_delete_lock
    locks: prevent side-effects of locks_release_private before file_lock is initialized
    NFSd: set nfsd_serv to NULL after service destruction
    NFSd: introduce nfsd_destroy() helper
    ...

    Linus Torvalds
     
  • We have no mechanism to emulate LOCK_MAND locks on NFSv4, so explicitly
    return -EINVAL if someone requests it.
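
    A sketch of the early rejection, with a hypothetical demo_nfs_flock() in
    place of the real flock handler. LOCK_MAND is a flag bit that may be ORed
    with other bits, hence the bitwise test; the value 32 matches the Linux
    uapi fcntl headers but is defined locally here as an assumption.

```c
#include <assert.h>
#include <errno.h>

/* Defined locally for the sketch; Linux's uapi fcntl headers use 32. */
#define DEMO_LOCK_MAND 32

/* Hypothetical flock entry point: LOCK_MAND cannot be emulated over
 * NFSv4, so refuse it explicitly instead of mis-handling it. */
static int demo_nfs_flock(int fl_type)
{
    if (fl_type & DEMO_LOCK_MAND)
        return -EINVAL;
    /* ... otherwise map the request onto an ordinary lock ... */
    return 0;
}
```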

    Signed-off-by: Jeff Layton
    Signed-off-by: Trond Myklebust

    Jeff Layton
     
  • By default a sunrpc service is limited to (N+3)*20 connections
    where N is the number of threads. This is 80 when N==1.
    If this number is exceeded a warning is printed suggesting that
    the number of threads be increased. However with services which
    run a single thread, this is impossible.

    For such services there is a ->sv_maxconn setting that can be
    used to forcibly increase the limit, and silence the message.
    This is used by lockd.

    The nfs client uses a sunrpc service to handle callbacks and
    it too is single-threaded, so to avoid the useless messages,
    and to allow a reasonable number of concurrent connections,
    we need to set ->sv_maxconn. 1024 seems like a good number.
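
    The limit rule can be sketched as follows. svc_serv_model is a
    hypothetical stand-in for struct svc_serv, and the sketch assumes that a
    zero sv_maxconn means "use the (N+3)*20 default".

```c
#include <assert.h>

/* Hypothetical stand-in for the relevant fields of struct svc_serv. */
struct svc_serv_model {
    unsigned int nrthreads;  /* N: number of service threads */
    unsigned int sv_maxconn; /* assumed: 0 means "use the default" */
};

/* Default limit from the description above: (N + 3) * 20. */
static unsigned int default_maxconn(unsigned int nrthreads)
{
    return (nrthreads + 3) * 20;
}

/* An explicit sv_maxconn overrides the thread-based default. */
static unsigned int effective_maxconn(const struct svc_serv_model *serv)
{
    return serv->sv_maxconn ? serv->sv_maxconn
                            : default_maxconn(serv->nrthreads);
}
```

    A single-threaded callback service thus goes from 80 to 1024 permitted
    connections without adding threads.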

    Signed-off-by: NeilBrown
    Signed-off-by: Trond Myklebust

    NeilBrown
     

31 Jul, 2012

18 commits

  • Pull NFS client updates from Trond Myklebust:
    "Features include:
    - More preparatory patches for modularising NFSv2/v3/v4. Split out
    the various NFSv2/v3/v4-specific code into separate files
    - More preparation for the NFSv4 migration code
    - Ensure that OPEN(O_CREATE) observes the pNFS mds threshold
    parameters
    - pNFS fast failover when the data servers are down
    - Various cleanups and debugging patches"

    * tag 'nfs-for-3.6-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (67 commits)
    nfs: fix fl_type tests in NFSv4 code
    NFS: fix pnfs regression with directio writes
    NFS: fix pnfs regression with directio reads
    sunrpc: clnt: Add missing braces
    nfs: fix stub return type warnings
    NFS: exit_nfs_v4() shouldn't be an __exit function
    SUNRPC: Add a missing spin_unlock to gss_mech_list_pseudoflavors
    NFS: Split out NFS v4 client functions
    NFS: Split out the NFS v4 filesystem types
    NFS: Create a single nfs_clone_super() function
    NFS: Split out NFS v4 server creating code
    NFS: Initialize the NFS v4 client from init_nfs_v4()
    NFS: Move the v4 getroot code to nfs4getroot.c
    NFS: Split out NFS v4 file operations
    NFS: Initialize v4 sysctls from nfs_init_v4()
    NFS: Create an init_nfs_v4() function
    NFS: Split out NFS v4 inode operations
    NFS: Split out NFS v3 inode operations
    NFS: Split out NFS v2 inode operations
    NFS: Clean up nfs4_proc_setclientid() and friends
    ...

    Linus Torvalds
     
  • This patch exports symbols needed by the v4 module. In addition, I also
    switch over to using IS_ENABLED() to check if CONFIG_NFS_V4 or
    CONFIG_NFS_V4_MODULE are set.

    The module (nfs4.ko) will be created in the same directory as nfs.ko and
    will be automatically loaded the first time you try to mount over NFS v4.
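
    IS_ENABLED(CONFIG_FOO) is true when the option is built in (CONFIG_FOO)
    or built as a module (CONFIG_FOO_MODULE), so a single test covers both a
    built-in v4 and nfs4.ko. The sketch below re-creates in userspace the
    placeholder-argument preprocessor trick used by include/linux/kconfig.h;
    details differ between kernel versions, the last helper here takes an
    extra padding argument to stay valid C99, and the CONFIG_ settings are
    made up for the example.

```c
#include <assert.h>

/* If 'value' is 1, __ARG_PLACEHOLDER_1 expands to "0," and shifts a 1
 * into the 'val' position; otherwise the junk token absorbs the 1. */
#define __ARG_PLACEHOLDER_1 0,
#define config_enabled(cfg) _config_enabled(cfg)
#define _config_enabled(value) __config_enabled(__ARG_PLACEHOLDER_##value)
#define __config_enabled(arg1_or_junk) ___config_enabled(arg1_or_junk 1, 0, 0)
#define ___config_enabled(ignored, val, ...) val

/* True if the option is built in (=y) or built as a module (=m). */
#define IS_ENABLED(option) \
    (config_enabled(option) || config_enabled(option##_MODULE))

/* Pretend Kconfig output: v3 built in, v4 modular, v2 disabled. */
#define CONFIG_NFS_V3 1
#define CONFIG_NFS_V4_MODULE 1

static int have_v2 = IS_ENABLED(CONFIG_NFS_V2);
static int have_v3 = IS_ENABLED(CONFIG_NFS_V3);
static int have_v4 = IS_ENABLED(CONFIG_NFS_V4);
```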

    Signed-off-by: Bryan Schumaker
    Signed-off-by: Trond Myklebust

    Bryan Schumaker
     
  • This patch exports symbols and moves over the final structures needed by
    the v3 module. In addition, I also switch over to using IS_ENABLED() to
    check if CONFIG_NFS_V3 or CONFIG_NFS_V3_MODULE are set.

    The module (nfs3.ko) will be created in the same directory as nfs.ko and
    will be automatically loaded the first time you try to mount over NFS v3.

    Signed-off-by: Bryan Schumaker
    Signed-off-by: Trond Myklebust

    Bryan Schumaker
     
  • The module (nfs2.ko) will be created in the same directory as nfs.ko and
    will be automatically loaded the first time you try to mount over NFS v2.

    Signed-off-by: Bryan Schumaker
    Signed-off-by: Trond Myklebust

    Bryan Schumaker
     
  • Otherwise we break backwards compatibility when v4 becomes a module.

    Signed-off-by: Trond Myklebust

    Bryan Schumaker
     
  • Somehow I missed this in my previous patch series, but these functions
    are only needed by the v4 code and should be moved to a v4-only file. I
    wasn't exactly sure where I should put these functions, so I moved them
    into nfs4super.c where I could make them static.

    Signed-off-by: Bryan Schumaker
    Signed-off-by: Trond Myklebust

    Bryan Schumaker
     
  • I can set all variables in the nfs_fill_super() function, allowing me to
    remove the nfs4_fill_super() function.

    Signed-off-by: Bryan Schumaker
    Signed-off-by: Trond Myklebust

    Bryan Schumaker
     
  • v2 and v4 don't use the ACL client, so I created two new nfs_rpc_ops
    functions to initialize it only when we are using v3.

    Signed-off-by: Bryan Schumaker
    Signed-off-by: Trond Myklebust

    Bryan Schumaker
     
  • I'm already looking up the nfs subversion in nfs_fs_mount(), so I have
    easy access to rpc_ops that used to be difficult to reach. This allows
    me to set up a different mount path for NFS v2/3 and NFS v4.

    Signed-off-by: Bryan Schumaker
    Signed-off-by: Trond Myklebust

    Bryan Schumaker
     
  • I can now share this code with the v2 and v3 code by using the NFS
    subversion structure.

    Signed-off-by: Bryan Schumaker
    Signed-off-by: Trond Myklebust

    Bryan Schumaker
     
  • This patch adds in the code to track multiple versions of the NFS
    protocol. I created default structures for v2, v3 and v4 so that each
    version can continue to work while I convert them into kernel modules.
    I also removed the const parameter from the rpc_version array so that I
    can change it at runtime.
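
    A much-simplified model of such a registry: each version registers a
    descriptor in a table that is writable at runtime, and the mount path
    looks the version up by number. demo_subversion and the demo_ functions
    are hypothetical; the real struct nfs_subversion also carries things
    like the rpc_ops and super operations.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_MAX_VERSION 4

/* Hypothetical, much-simplified stand-in for struct nfs_subversion. */
struct demo_subversion {
    unsigned int version; /* 2, 3 or 4 */
    const char *name;     /* e.g. "nfs3" */
};

/* Writable table indexed by version, filled in as versions register. */
static const struct demo_subversion *demo_versions[DEMO_MAX_VERSION + 1];

static int demo_register_version(const struct demo_subversion *sv)
{
    if (sv->version > DEMO_MAX_VERSION || demo_versions[sv->version])
        return -1; /* out of range or already registered */
    demo_versions[sv->version] = sv;
    return 0;
}

static void demo_unregister_version(const struct demo_subversion *sv)
{
    if (sv->version <= DEMO_MAX_VERSION && demo_versions[sv->version] == sv)
        demo_versions[sv->version] = NULL;
}

/* Mount-time lookup; NULL means the version is not (yet) available. */
static const struct demo_subversion *demo_get_version(unsigned int version)
{
    return version <= DEMO_MAX_VERSION ? demo_versions[version] : NULL;
}
```

    In the modular client, a lookup miss is where loading nfs2.ko, nfs3.ko
    or nfs4.ko on demand would fit; that step is omitted here.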

    Signed-off-by: Bryan Schumaker
    Signed-off-by: Trond Myklebust

    Bryan Schumaker
     
  • Fix a number of bugs in the NFS idmapper code:

    (1) Only registered key types can be passed to the core keys code, so
    register the legacy idmapper key type.

    This is a requirement because the unregister function cleans up keys
    belonging to that key type so that there aren't dangling pointers to the
    module left behind - including the key->type pointer.

    (2) Rename the legacy key type. You can't have two key types with the same
    name, and (1) would otherwise require that.

    (3) complete_request_key() must be called in the error path of
    nfs_idmap_legacy_upcall().

    (4) There is one idmap struct for each nfs_client struct. This means that
    idmap->idmap_key_cons is shared without the use of a lock. This is a
    problem because key_instantiate_and_link() - as called indirectly by
    idmap_pipe_downcall() - releases anyone waiting for the key to be
    instantiated.

    What happens is that idmap_pipe_downcall(), running in the rpc.idmapd
    thread, releases the NFS filesystem to continue in whatever thread it
    is running in. That thread may then make another idmapper call,
    overwriting idmap_key_cons before idmap_pipe_downcall() gets the
    chance to call complete_request_key().

    I *think* that reading idmap_key_cons only once, before
    key_instantiate_and_link() is called, and then caching the result in a
    variable is sufficient.
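
    The read-once ordering can be sketched single-threaded with hypothetical
    demo_ types: demo_instantiate() stands in for key_instantiate_and_link()
    and overwrites the shared pointer the way a newly released NFS thread
    could. This shows the ordering only; it does not model the concurrency
    and is not a proof that the race is closed.

```c
#include <assert.h>

/* Hypothetical stand-ins for the idmapper structures. */
struct demo_key_cons {
    int completed;
};

struct demo_idmap {
    struct demo_key_cons *idmap_key_cons; /* shared, unlocked */
};

/* Simulates key_instantiate_and_link(): the released thread starts a new
 * upcall and overwrites idmap_key_cons. */
static void demo_instantiate(struct demo_idmap *idmap,
                             struct demo_key_cons *next)
{
    idmap->idmap_key_cons = next;
}

static void demo_complete_request(struct demo_key_cons *cons)
{
    cons->completed = 1;
}

/* Read the shared pointer once, before the call that can release the
 * waiter, and use only the cached copy afterwards. */
static void demo_pipe_downcall(struct demo_idmap *idmap,
                               struct demo_key_cons *next)
{
    struct demo_key_cons *cons = idmap->idmap_key_cons;

    demo_instantiate(idmap, next);
    demo_complete_request(cons); /* not idmap->idmap_key_cons */
}
```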

    Bug (4) is the cause of:

    BUG: unable to handle kernel NULL pointer dereference at (null)
    IP: [< (null)>] (null)
    PGD 0
    Oops: 0010 [#1] SMP
    CPU 1
    Modules linked in: ppdev parport_pc lp parport ip6table_filter ip6_tables ebtable_nat ebtables ipt_MASQUERADE iptable_nat nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack nfs fscache xt_CHECKSUM auth_rpcgss iptable_mangle nfs_acl bridge stp llc lockd be2iscsi iscsi_boot_sysfs bnx2i cnic uio cxgb4i cxgb4 cxgb3i libcxgbi cxgb3 mdio ib_iser rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi snd_hda_codec_realtek snd_usb_audio snd_hda_intel snd_hda_codec snd_seq snd_pcm snd_hwdep snd_usbmidi_lib snd_rawmidi snd_timer uvcvideo videobuf2_core videodev media videobuf2_vmalloc snd_seq_device videobuf2_memops e1000e vhost_net iTCO_wdt joydev coretemp snd soundcore macvtap macvlan i2c_i801 snd_page_alloc tun iTCO_vendor_support microcode kvm_intel kvm sunrpc hid_logitech_dj usb_storage i915 drm_kms_helper drm i2c_algo_bit i2c_core video [last unloaded: scsi_wait_scan]
    Pid: 1229, comm: rpc.idmapd Not tainted 3.4.2-1.fc16.x86_64 #1 Gateway DX4710-UB801A/G33M05G1
    RIP: 0010:[] [< (null)>] (null)
    RSP: 0018:ffff8801a3645d40 EFLAGS: 00010246
    RAX: ffff880077707e30 RBX: ffff880077707f50 RCX: ffff8801a18ccd80
    RDX: 0000000000000006 RSI: ffff8801a3645e75 RDI: ffff880077707f50
    RBP: ffff8801a3645d88 R08: ffff8801a430f9c0 R09: ffff8801a3645db0
    R10: 000000000000000a R11: 0000000000000246 R12: ffff8801a18ccd80
    R13: ffff8801a3645e75 R14: ffff8801a430f9c0 R15: 0000000000000006
    FS: 00007fb6fb51a700(0000) GS:ffff8801afc80000(0000) knlGS:0000000000000000
    CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: 0000000000000000 CR3: 00000001a49b0000 CR4: 00000000000027e0
    DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
    Process rpc.idmapd (pid: 1229, threadinfo ffff8801a3644000, task ffff8801a3bf9710)
    Stack:
    ffffffff81260878 ffff8801a3645db0 ffff8801a3645db0 ffff880077707a90
    ffff880077707f50 ffff8801a18ccd80 0000000000000006 ffff8801a3645e75
    ffff8801a430f9c0 ffff8801a3645dd8 ffffffff81260983 ffff8801a3645de8
    Call Trace:
    [] ? __key_instantiate_and_link+0x58/0x100
    [] key_instantiate_and_link+0x63/0xa0
    [] idmap_pipe_downcall+0x1cb/0x1e0 [nfs]
    [] rpc_pipe_write+0x67/0x90 [sunrpc]
    [] vfs_write+0xb3/0x180
    [] sys_write+0x4a/0x90
    [] system_call_fastpath+0x16/0x1b
    Code: Bad RIP value.
    RIP [< (null)>] (null)
    RSP
    CR2: 0000000000000000

    Signed-off-by: David Howells
    Reviewed-by: Steve Dickson
    Signed-off-by: Trond Myklebust
    Cc: stable@vger.kernel.org [>= 3.4]

    David Howells
     
  • We've had some reports of a deadlock where rpciod ends up with a stack
    trace like this:

    PID: 2507 TASK: ffff88103691ab40 CPU: 14 COMMAND: "rpciod/14"
    #0 [ffff8810343bf2f0] schedule at ffffffff814dabd9
    #1 [ffff8810343bf3b8] nfs_wait_bit_killable at ffffffffa038fc04 [nfs]
    #2 [ffff8810343bf3c8] __wait_on_bit at ffffffff814dbc2f
    #3 [ffff8810343bf418] out_of_line_wait_on_bit at ffffffff814dbcd8
    #4 [ffff8810343bf488] nfs_commit_inode at ffffffffa039e0c1 [nfs]
    #5 [ffff8810343bf4f8] nfs_release_page at ffffffffa038bef6 [nfs]
    #6 [ffff8810343bf528] try_to_release_page at ffffffff8110c670
    #7 [ffff8810343bf538] shrink_page_list.clone.0 at ffffffff81126271
    #8 [ffff8810343bf668] shrink_inactive_list at ffffffff81126638
    #9 [ffff8810343bf818] shrink_zone at ffffffff8112788f
    #10 [ffff8810343bf8c8] do_try_to_free_pages at ffffffff81127b1e
    #11 [ffff8810343bf958] try_to_free_pages at ffffffff8112812f
    #12 [ffff8810343bfa08] __alloc_pages_nodemask at ffffffff8111fdad
    #13 [ffff8810343bfb28] kmem_getpages at ffffffff81159942
    #14 [ffff8810343bfb58] fallback_alloc at ffffffff8115a55a
    #15 [ffff8810343bfbd8] ____cache_alloc_node at ffffffff8115a2d9
    #16 [ffff8810343bfc38] kmem_cache_alloc at ffffffff8115b09b
    #17 [ffff8810343bfc78] sk_prot_alloc at ffffffff81411808
    #18 [ffff8810343bfcb8] sk_alloc at ffffffff8141197c
    #19 [ffff8810343bfce8] inet_create at ffffffff81483ba6
    #20 [ffff8810343bfd38] __sock_create at ffffffff8140b4a7
    #21 [ffff8810343bfd98] xs_create_sock at ffffffffa01f649b [sunrpc]
    #22 [ffff8810343bfdd8] xs_tcp_setup_socket at ffffffffa01f6965 [sunrpc]
    #23 [ffff8810343bfe38] worker_thread at ffffffff810887d0
    #24 [ffff8810343bfee8] kthread at ffffffff8108dd96
    #25 [ffff8810343bff48] kernel_thread at ffffffff8100c1ca

    rpciod is trying to allocate memory for a new socket to talk to the
    server. The VM ends up calling ->releasepage to get more memory, and it
    tries to do a blocking commit. That commit can't succeed however without
    a connected socket, so we deadlock.

    Fix this by setting PF_FSTRANS on the workqueue task prior to doing the
    socket allocation, and having nfs_release_page check for that flag when
    deciding whether to do a commit call. Also, set PF_FSTRANS
    unconditionally in rpc_async_schedule since that function can also do
    allocations sometimes.
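
    A sketch of the decision, using a hypothetical demo_rpc_task with an
    illustrative DEMO_PF_FSTRANS bit: the socket-setup path marks itself
    before allocating, and the releasepage model refuses to start a blocking
    commit for a marked task. Clearing the flag at the end of setup is this
    sketch's assumption.

```c
#include <assert.h>
#include <stdbool.h>

#define DEMO_PF_FSTRANS 0x1u /* illustrative flag bit */

/* Hypothetical model of the task running rpciod work. */
struct demo_rpc_task {
    unsigned int flags;
};

static int demo_commits_started;

/* Never start a blocking COMMIT from a task that is itself setting up
 * the transport the COMMIT would need. */
static bool demo_may_commit(const struct demo_rpc_task *t)
{
    return !(t->flags & DEMO_PF_FSTRANS);
}

/* Called when an allocation falls into reclaim and hits an NFS page. */
static void demo_release_page(const struct demo_rpc_task *t)
{
    if (demo_may_commit(t))
        demo_commits_started++; /* would wait for the server to reply */
}

/* Socket setup: mark the task before allocating, unmark afterwards. */
static void demo_tcp_setup_socket(struct demo_rpc_task *t)
{
    t->flags |= DEMO_PF_FSTRANS;
    demo_release_page(t); /* the socket allocation enters reclaim */
    t->flags &= ~DEMO_PF_FSTRANS;
}
```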

    Signed-off-by: Jeff Layton
    Signed-off-by: Trond Myklebust
    Cc: stable@vger.kernel.org

    Jeff Layton
     
  • The current block layout driver read/write code assumes page-aligned
    IO in many places. Add a checker to validate that assumption;
    otherwise data can be corrupted, for example when an application does
    open(O_WRONLY) followed by a write that is not page aligned.
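
    One possible shape for such a checker, assuming 4096-byte pages and a
    hypothetical demo_is_page_aligned_io(); a request that fails it would be
    kept out of the page-aligned block-layout code rather than handled there.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define DEMO_PAGE_SIZE 4096ULL /* assumed page size */

/* True only if the request starts on a page boundary and covers whole
 * pages, which is what the block-layout read/write paths assume. */
static bool demo_is_page_aligned_io(unsigned long long offset, size_t count)
{
    return (offset % DEMO_PAGE_SIZE) == 0 && (count % DEMO_PAGE_SIZE) == 0;
}
```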

    Signed-off-by: Peng Tao
    Signed-off-by: Trond Myklebust

    Peng Tao
     
  • fl_type is not a bitmap.
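
    To illustrate: fl_type holds exactly one of F_RDLCK, F_WRLCK or F_UNLCK,
    so it must be compared with ==, not masked. On Linux F_RDLCK is 0, which
    makes a test like (fl_type & F_RDLCK) always false. The helpers below are
    hypothetical; the constants come from <fcntl.h>.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdbool.h>

/* fl_type is an enumerated value, so compare it for equality. */
static bool demo_is_read_lock(short fl_type)
{
    return fl_type == F_RDLCK;
}

static bool demo_is_write_lock(short fl_type)
{
    return fl_type == F_WRLCK;
}
```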

    Reported-by: Al Viro
    Signed-off-by: Jeff Layton
    Signed-off-by: Trond Myklebust

    Jeff Layton
     
  • Commit 57208fa7e51 "NFS: Create an write_pageio_init() function"
    did not modify the calls in direct.c, preventing direct io from
    using pnfs. This reintroduces that capability.

    Signed-off-by: Fred Isaman
    Signed-off-by: Trond Myklebust

    Fred Isaman
     
  • Commit 1abb50886af "NFS: Create an read_pageio_init() function"
    did not modify the call in direct.c, preventing direct io from
    using pnfs. This reintroduces that capability.

    Signed-off-by: Fred Isaman
    Signed-off-by: Trond Myklebust

    Fred Isaman
     
  • Fix numerous repeated warnings by making the stub function
    void instead of non-void:

    fs/nfs/nfs4_fs.h: In function 'nfs4_unregister_sysctl':
    fs/nfs/nfs4_fs.h:385:1: warning: no return statement in function returning non-void
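
    The pattern behind the fix, sketched with hypothetical names and a
    made-up CONFIG_DEMO_SYSCTL option: when the feature is configured out,
    the header provides inline stubs whose return types match what they do.
    A stub declared to return int with an empty body is what triggers a
    warning like the one above.

```c
#include <assert.h>

#ifdef CONFIG_DEMO_SYSCTL
int demo_register_sysctl(void);
void demo_unregister_sysctl(void);
#else
/* Configured out: registering "succeeds", unregistering does nothing. */
static inline int demo_register_sysctl(void) { return 0; }
static inline void demo_unregister_sysctl(void) { }
#endif
```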

    Signed-off-by: Randy Dunlap
    Cc: Trond Myklebust
    Signed-off-by: Trond Myklebust

    Randy Dunlap
     

28 Jul, 2012

1 commit


24 Jul, 2012

1 commit

  • Pull the big VFS changes from Al Viro:
    "This one is *big* and changes quite a few things around VFS. What's in there:

    - the first of two really major architecture changes - death to open
    intents.

    The former is finally there; it was very long in making, but with
    Miklos getting through really hard and messy final push in
    fs/namei.c, we finally have it. Unlike his variant, this one
    doesn't introduce struct opendata; what we have instead is
    ->atomic_open() taking preallocated struct file * and passing
    everything via its fields.

    Instead of returning struct file *, it returns -E... on error, 0
    on success and 1 in "deal with it yourself" case (e.g. symlink
    found on server, etc.).

    See comments before fs/namei.c:atomic_open(). That made a lot of
    goodies finally possible and quite a few are in that pile:
    ->lookup(), ->d_revalidate() and ->create() do not get struct
    nameidata * anymore; ->lookup() and ->d_revalidate() get lookup
    flags instead, ->create() gets "do we want it exclusive" flag.

    With the introduction of new helper (kern_path_locked()) we are rid
    of all struct nameidata instances outside of fs/namei.c; it's still
    visible in namei.h, but not for long. Come the next cycle,
    declaration will move either to fs/internal.h or to fs/namei.c
    itself. [me, miklos, hch]

    - The second major change: behaviour of final fput(). Now we have
    __fput() done without any locks held by caller *and* not from deep
    in call stack.

    That obviously lifts a lot of constraints on the locking in there.
    Moreover, it's legal now to call fput() from atomic contexts (which
    has immediately simplified life for aio.c). We also don't need
    anti-recursion logics in __scm_destroy() anymore.

    There is a price, though - the damn thing has become partially
    asynchronous. For fput() from normal process we are guaranteed
    that pending __fput() will be done before the caller returns to
    userland, exits or gets stopped for ptrace.

    For kernel threads and atomic contexts it's done via
    schedule_work(), so theoretically we might need a way to make sure
    it's finished; so far only one such place had been found, but there
    might be more.

    There's flush_delayed_fput() (do all pending __fput()) and there's
    __fput_sync() (fput() analog doing __fput() immediately). I hope
    we won't need them often; see warnings in fs/file_table.c for
    details. [me, based on task_work series from Oleg merged last
    cycle]

    - sync series from Jan

    - large part of "death to sync_supers()" work from Artem; the only
    bits missing here are exofs and ext4 ones. As far as I understand,
    those are going via the exofs and ext4 trees resp.; once they are
    in, we can put ->write_super() to the rest, along with the thread
    calling it.

    - preparatory bits from unionmount series (from dhowells).

    - assorted cleanups and fixes all over the place, as usual.

    This is not the last pile for this cycle; there's at least jlayton's
    ESTALE work and fsfreeze series (the latter - in dire need of fixes,
    so I'm not sure it'll make the cut this cycle). I'll probably throw
    symlink/hardlink restrictions stuff from Kees into the next pile, too.
    Plus there's a lot of misc patches I hadn't thrown into that one -
    it's large enough as it is..."
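
    The three-way ->atomic_open() return convention described above can be
    modeled as follows. Everything here is hypothetical (the demo_ names,
    the enum of server replies, -EACCES as the sample error); the real
    method takes a preallocated struct file and other arguments that this
    sketch omits.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical server replies driving the model. */
enum demo_reply { DEMO_REPLY_OPENED, DEMO_REPLY_SYMLINK, DEMO_REPLY_DENIED };

/* Outcomes seen by the caller. */
enum demo_outcome { DEMO_OUT_ERROR, DEMO_OUT_FALLBACK, DEMO_OUT_OPENED };

/* -errno on error, 0 on success, 1 for "deal with it yourself". */
static int demo_atomic_open(enum demo_reply reply)
{
    switch (reply) {
    case DEMO_REPLY_OPENED:
        return 0;
    case DEMO_REPLY_SYMLINK:
        return 1; /* e.g. symlink found on server */
    default:
        return -EACCES;
    }
}

/* Caller side: dispatch on the three-way result. */
static enum demo_outcome demo_do_open(enum demo_reply reply)
{
    int ret = demo_atomic_open(reply);

    if (ret < 0)
        return DEMO_OUT_ERROR;
    if (ret == 1)
        return DEMO_OUT_FALLBACK; /* the VFS finishes the open itself */
    return DEMO_OUT_OPENED;
}
```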

    * 'for-linus-2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (127 commits)
    ext4: switch EXT4_IOC_RESIZE_FS to mnt_want_write_file()
    btrfs: switch btrfs_ioctl_balance() to mnt_want_write_file()
    switch dentry_open() to struct path, make it grab references itself
    spufs: shift dget/mntget towards dentry_open()
    zoran: don't bother with struct file * in zoran_map
    ecryptfs: don't reinvent the wheels, please - use struct completion
    don't expose I_NEW inodes via dentry->d_inode
    tidy up namei.c a bit
    unobfuscate follow_up() a bit
    ext3: pass custom EOF to generic_file_llseek_size()
    ext4: use core vfs llseek code for dir seeks
    vfs: allow custom EOF in generic_file_llseek code
    vfs: Avoid unnecessary WB_SYNC_NONE writeback during sys_sync and reorder sync passes
    vfs: Remove unnecessary flushing of block devices
    vfs: Make sys_sync writeout also block device inodes
    vfs: Create function for iterating over block devices
    vfs: Reorder operations during sys_sync
    quota: Move quota syncing to ->sync_fs method
    quota: Split dquot_quota_sync() to writeback and cache flushing part
    vfs: Move noop_backing_dev_info check from sync into writeback
    ...
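
    The deferred final fput() described in the pull message can be modeled
    as a pending list, with a flush function playing the role of
    flush_delayed_fput(). The demo_ names are hypothetical and the sketch is
    single-threaded: it shows only that dropping the last reference queues
    the teardown instead of running it in the caller's context.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical file object with a reference count. */
struct demo_file {
    int refcount;
    int released;           /* set once the final teardown has run */
    struct demo_file *next; /* link in the pending list */
};

static struct demo_file *demo_pending;

/* Stand-in for __fput(): the expensive final teardown. */
static void demo_final_put(struct demo_file *f)
{
    f->released = 1;
}

/* Drop a reference; the last put only queues the file. */
static void demo_fput(struct demo_file *f)
{
    if (--f->refcount == 0) {
        f->next = demo_pending;
        demo_pending = f;
    }
}

/* Analogue of flush_delayed_fput(): run every pending teardown now. */
static void demo_flush_delayed_fput(void)
{
    while (demo_pending) {
        struct demo_file *f = demo_pending;

        demo_pending = f->next;
        demo_final_put(f);
    }
}
```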

    Linus Torvalds
     

21 Jul, 2012

1 commit

  • Pull pnfs/ore fixes from Boaz Harrosh:
    "These are catastrophic fixes to the pnfs objects-layout that were just
    discovered. They are also destined for @stable.

    I have found these and worked on them at around RC1 time but
    unfortunately went to the hospital for kidney stones and had a very
    slow recovery. I refrained from sending them as is, before proper
    testing, and sure enough I found a bug just yesterday.

    So now they are all well tested, and have my sign-off. Other than
    fixing the problem at hand, and assuming there are no bugs in the new
    code, there is low risk to any surrounding code. And in any case they
    affect only these paths that are now broken. That is RAID5 in pnfs
    objects-layout code. It does also affect exofs (which was not broken)
    but I have tested exofs and it is lower priority than objects-layout
    because no one is using exofs, but objects-layout has lots of users."

    * 'for-linus' of git://git.open-osd.org/linux-open-osd:
    pnfs-obj: Fix __r4w_get_page when offset is beyond i_size
    pnfs-obj: don't leak objio_state if ore_write/read fails
    ore: Unlock r4w pages in exact reverse order of locking
    ore: Remove support of partial IO request (NFS crash)
    ore: Fix NFS crash by supporting any unaligned RAID IO

    Linus Torvalds
     

20 Jul, 2012

2 commits

  • It is very common for the end of the file to be unaligned on
    stripe size. But since we know it's beyond the file's end, the
    XOR should be performed with all zeros.

    Old code used to just read zeros out of the OSD devices, which is a great
    waste. But what scares me more about this situation is that we now have
    pages attached to the file's mapping that are beyond i_size. I don't
    like the kind of bugs this calls for.

    Kill both birds by returning a global zero_page if offset is beyond
    i_size.
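
    A sketch of the idea, assuming 4096-byte pages and hypothetical demo_
    helpers: for any offset at or beyond i_size the lookup returns one
    shared zero-filled page, and XOR-ing it into the parity leaves the
    parity unchanged. That is the same result as reading zeros from the
    devices, without the I/O and without attaching pages past i_size to
    the mapping.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_PAGE_SIZE 4096

/* One shared page of zeros (static storage is zero-initialized). */
static const unsigned char demo_zero_page[DEMO_PAGE_SIZE];

/* Hypothetical read-for-write lookup: a page at or beyond i_size holds
 * no file data, so return the shared zero page. */
static const unsigned char *demo_r4w_get_page(unsigned long long offset,
                                              unsigned long long i_size,
                                              const unsigned char *cached)
{
    if (offset >= i_size)
        return demo_zero_page;
    return cached; /* a real implementation would look up or read it */
}

/* XOR one data page into the parity page. */
static void demo_xor_into(unsigned char *parity, const unsigned char *data)
{
    for (size_t i = 0; i < DEMO_PAGE_SIZE; i++)
        parity[i] ^= data[i];
}
```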

    TODO:
    Change the API to ->__r4w_get_page() so a NULL can be
    returned without being considered as error, since XOR API
    treats NULL entries as zero_pages.

    [Bug since 3.2. Should apply the same way to all Kernels since]
    CC: Stable Tree
    Signed-off-by: Boaz Harrosh

    Boaz Harrosh
     
  • [Bug since 3.2 Kernel]
    CC: Stable Tree
    Signed-off-by: Boaz Harrosh

    Boaz Harrosh
     

18 Jul, 2012

8 commits