01 Jul, 2006

22 commits

  • Add a rq_sendfile_ok flag to svc_rqst which will be cleared in the privacy
    case so that the wrapping code will get copies of the read data instead of
    real page cache pages. This makes life simpler when we encrypt the response.

    Signed-off-by: J. Bruce Fields
    Signed-off-by: Neil Brown
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    J. Bruce Fields
     
  • Since nfsv4 actually keeps around the file descriptors it gets from open
    (instead of just using them for a single read or write operation), we need to
    make sure that we can do RDWR opens and not just RDONLY/WRONLY.

    Signed-off-by: J. Bruce Fields
    Signed-off-by: Neil Brown
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    J. Bruce Fields
     
  • These tests always returned true; clearly that wasn't what was intended.

    In keeping with kernel style, make them functions instead of macros while
    we're at it.

    Signed-off-by: J. Bruce Fields
    Signed-off-by: Neil Brown
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    J. Bruce Fields
     
  • In the event that lookup_one_len() fails in nfsd_link(), fh_unlock() is
    skipped and locks are held overlong.

    Patch was tested on 2.6.17-rc2 by causing lookup_one_len() to fail and
    verifying that fh_unlock() gets called appropriately.

    Signed-off-by: David M. Richter
    Signed-off-by: J. Bruce Fields
    Signed-off-by: Neil Brown
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David M. Richter
     
  • We're checking nfs_in_grace here a few times when there isn't really any
    reason to--bad_stateid is probably the more sensible return value anyway.

    Signed-off-by: J. Bruce Fields
    Signed-off-by: Neil Brown
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    J. Bruce Fields
     
  • In the typical v2/v3 case the only new filehandles used as arguments to
    operations are filehandles taken directly off the wire, which don't get
    dentries until fh_verify() is called.

    But in v4 the filehandles that are arguments to operations were often created
    by previous operations (putrootfh, lookup, etc.) using fh_compose, which sets
    the dentry in the filehandle without calling nfsd_setuser().

    This also means that, for example, if filesystem B is mounted on filesystem A,
    and filesystem A is exported without root-squashing, then a client can bypass
    the rootsquashing on B using a compound that starts at a filehandle in A,
    crosses into B using lookups, and then does stuff in B.

    Signed-off-by: J. Bruce Fields
    Signed-off-by: Neil Brown
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    J. Bruce Fields
     
  • Fix an improper unlock in an error path.

    Signed-off-by: J. Bruce Fields
    Signed-off-by: Neil Brown
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    J. Bruce Fields
     
  • nfsd tries to return to a client the same sort of filehandle as was used by
    the client. This removes some filehandle aliasing issues and means that a
    server upgrade followed by a downgrade will not confused clients not restarted
    during that time.

    However when crossing a mountpoint, the filehandle used for one filesystem
    doesn't provide any useful information on what sort of filehandle should be
    used on the other, and can provide misleading information. So if the
    reference filehandle is on a different filesystem to the one being generated,
    ignore it.

    Signed-off-by: Neil Brown
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    NeilBrown
     
  • There is a perfectly valid situation where fh_update gets called on an already
    uptodate filehandle - in nfsd_create_v3 where a CREATE_UNCHECKED finds an
    existing file and wants to just set the size.

    We could possible optimise out the call in that case, but the only harm
    involved is that fh_update prints a warning, so it is easier to remove the
    warning.

    Signed-off-by: Neil Brown
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    NeilBrown
     
  • Type '3' is used for the fsid in filehandles when the device number of the
    device holding the filesystem has more than 8 bits in either major or minor.
    Unfortunately expkey_parse doesn't recognise type 3. Fix this.

    (Slighty modified from Frank's original)

    Signed-off-by: Frank Filz
    Signed-off-by: Neil Brown
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Frank Filz
     
  • Just testing the i_sb isn't really enough, at least the vfsmnt must be the
    same. Thanks Al.

    Cc: Al Viro
    Signed-off-by: Neil Brown
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    NeilBrown
     
  • Add a new security hook definition for the sys_ioprio_get operation. At
    present, the SELinux hook function implementation for this hook is
    identical to the getscheduler implementation but a separate hook is
    introduced to allow this check to be specialized in the future if
    necessary.

    This patch also creates a helper function get_task_ioprio which handles the
    access check in addition to retrieving the ioprio value for the task.

    Signed-off-by: David Quigley
    Acked-by: Stephen Smalley
    Signed-off-by: James Morris
    Cc: Jens Axboe
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Quigley
     
  • The remaining counters in page_state after the zoned VM counter patches
    have been applied are all just for show in /proc/vmstat. They have no
    essential function for the VM.

    We use a simple increment of per cpu variables. In order to avoid the most
    severe races we disable preempt. Preempt does not prevent the race between
    an increment and an interrupt handler incrementing the same statistics
    counter. However, that race is exceedingly rare, we may only loose one
    increment or so and there is no requirement (at least not in kernel) that
    the vm event counters have to be accurate.

    In the non preempt case this results in a simple increment for each
    counter. For many architectures this will be reduced by the compiler to a
    single instruction. This single instruction is atomic for i386 and x86_64.
    And therefore even the rare race condition in an interrupt is avoided for
    both architectures in most cases.

    The patchset also adds an off switch for embedded systems that allows a
    building of linux kernels without these counters.

    The implementation of these counters is through inline code that hopefully
    results in only a single instruction increment instruction being emitted
    (i386, x86_64) or in the increment being hidden though instruction
    concurrency (EPIC architectures such as ia64 can get that done).

    Benefits:
    - VM event counter operations usually reduce to a single inline instruction
    on i386 and x86_64.
    - No interrupt disable, only preempt disable for the preempt case.
    Preempt disable can also be avoided by moving the counter into a spinlock.
    - Handling is similar to zoned VM counters.
    - Simple and easily extendable.
    - Can be omitted to reduce memory use for embedded use.

    References:

    RFC http://marc.theaimsgroup.com/?l=linux-kernel&m=113512330605497&w=2
    RFC http://marc.theaimsgroup.com/?l=linux-kernel&m=114988082814934&w=2
    local_t http://marc.theaimsgroup.com/?l=linux-kernel&m=114991748606690&w=2
    V2 http://marc.theaimsgroup.com/?t=115014808400007&r=1&w=2
    V3 http://marc.theaimsgroup.com/?l=linux-kernel&m=115024767022346&w=2
    V4 http://marc.theaimsgroup.com/?l=linux-kernel&m=115047968808926&w=2

    Signed-off-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     
  • Conversion of nr_bounce to a per zone counter

    nr_bounce is only used for proc output. So it could be left as an event
    counter. However, the event counters may not be accurate and nr_bounce is
    categorizing types of pages in a zone. So we really need this to also be a
    per zone counter.

    [akpm@osdl.org: bugfix]
    Signed-off-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     
  • Conversion of nr_unstable to a per zone counter

    We need to do some special modifications to the nfs code since there are
    multiple cases of disposition and we need to have a page ref for proper
    accounting.

    This converts the last critical page state of the VM and therefore we need to
    remove several functions that were depending on GET_PAGE_STATE_LAST in order
    to make the kernel compile again. We are only left with event type counters
    in page state.

    [akpm@osdl.org: bugfixes]
    Signed-off-by: Christoph Lameter
    Cc: Trond Myklebust
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     
  • Conversion of nr_writeback to per zone counter.

    This removes the last page_state counter from arch/i386/mm/pgtable.c so we
    drop the page_state from there.

    [akpm@osdl.org: bugfix]
    Signed-off-by: Christoph Lameter
    Cc: Trond Myklebust
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     
  • This makes nr_dirty a per zone counter. Looping over all processors is
    avoided during writeback state determination.

    The counter aggregation for nr_dirty had to be undone in the NFS layer since
    we summed up the page counts from multiple zones. Someone more familiar with
    NFS should probably review what I have done.

    [akpm@osdl.org: bugfix]
    Signed-off-by: Christoph Lameter
    Cc: Trond Myklebust
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     
  • Conversion of nr_page_table_pages to a per zone counter

    [akpm@osdl.org: bugfix]
    Signed-off-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     
  • - Allows reclaim to access counter without looping over processor counts.

    - Allows accurate statistics on how many pages are used in a zone by
    the slab. This may become useful to balance slab allocations over
    various zones.

    [akpm@osdl.org: bugfix]
    Signed-off-by: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     
  • The current NR_FILE_MAPPED is used by zone reclaim and the dirty load
    calculation as the number of mapped pagecache pages. However, that is not
    true. NR_FILE_MAPPED includes the mapped anonymous pages. This patch
    separates those and therefore allows an accurate tracking of the anonymous
    pages per zone.

    It then becomes possible to determine the number of unmapped pages per zone
    and we can avoid scanning for unmapped pages if there are none.

    Also it may now be possible to determine the mapped/unmapped ratio in
    get_dirty_limit. Isnt the number of anonymous pages irrelevant in that
    calculation?

    Note that this will change the meaning of the number of mapped pages reported
    in /proc/vmstat /proc/meminfo and in the per node statistics. This may affect
    user space tools that monitor these counters! NR_FILE_MAPPED works like
    NR_FILE_DIRTY. It is only valid for pagecache pages.

    Signed-off-by: Christoph Lameter
    Cc: Trond Myklebust
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     
  • Currently a single atomic variable is used to establish the size of the page
    cache in the whole machine. The zoned VM counters have the same method of
    implementation as the nr_pagecache code but also allow the determination of
    the pagecache size per zone.

    Remove the special implementation for nr_pagecache and make it a zoned counter
    named NR_FILE_PAGES.

    Updates of the page cache counters are always performed with interrupts off.
    We can therefore use the __ variant here.

    Signed-off-by: Christoph Lameter
    Cc: Trond Myklebust
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     
  • nr_mapped is important because it allows a determination of how many pages of
    a zone are not mapped, which would allow a more efficient means of determining
    when we need to reclaim memory in a zone.

    We take the nr_mapped field out of the page state structure and define a new
    per zone counter named NR_FILE_MAPPED (the anonymous pages will be split off
    from NR_MAPPED in the next patch).

    We replace the use of nr_mapped in various kernel locations. This avoids the
    looping over all processors in try_to_free_pages(), writeback, reclaim (swap +
    zone reclaim).

    [akpm@osdl.org: bugfix]
    Signed-off-by: Christoph Lameter
    Cc: Trond Myklebust
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Lameter
     

30 Jun, 2006

15 commits


29 Jun, 2006

3 commits