30 Oct, 2005

3 commits

  • Christoph Lameter demonstrated very poor scalability on the SGI 512-way, with
    a many-threaded application which concurrently initializes different parts of
    a large anonymous area.

    This patch corrects that, by using a separate spinlock per page table page, to
    guard the page table entries in that page, instead of using the mm's single
    page_table_lock. (But even then, page_table_lock is still used to guard page
    table allocation, and anon_vma allocation.)

    In this implementation, the spinlock is tucked inside the struct page of the
    page table page: with a BUILD_BUG_ON in case it overflows - which it would in
    the case of 32-bit PA-RISC with spinlock debugging enabled.

    Splitting the lock is not quite for free: another cacheline access. Ideally,
    I suppose we would use split ptlock only for multi-threaded processes on
    multi-cpu machines; but deciding that dynamically would have its own costs.
    So for now enable it by config, at some number of cpus - since the Kconfig
    language doesn't support inequalities, let preprocessor compare that with
    NR_CPUS. But I don't think it's worth being user-configurable: for good
    testing of both split and unsplit configs, split now at 4 cpus, and perhaps
    change that to 8 later.

    There is a benefit even for singly threaded processes: kswapd can be attacking
    one part of the mm while another part is busy faulting.

    Signed-off-by: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     
  • Remove PageReserved() calls from core code by tightening VM_RESERVED
    handling in mm/ to cover PageReserved functionality.

    PageReserved special casing is removed from get_page and put_page.

    All setting and clearing of PageReserved is retained, and it is now flagged
    in the page_alloc checks to help ensure we don't introduce any refcount
    based freeing of Reserved pages.

    MAP_PRIVATE, PROT_WRITE of VM_RESERVED regions is tentatively being
    deprecated. We never completely handled it correctly anyway, and is be
    reintroduced in future if required (Hugh has a proof of concept).

    Once PageReserved() calls are removed from kernel/power/swsusp.c, and all
    arch/ and driver code, the Set and Clear calls, and the PG_reserved bit can
    be trivially removed.

    Last real user of PageReserved is swsusp, which uses PageReserved to
    determine whether a struct page points to valid memory or not. This still
    needs to be addressed (a generic page_is_ram() should work).

    A last caveat: the ZERO_PAGE is now refcounted and managed with rmap (and
    thus mapcounted and count towards shared rss). These writes to the struct
    page could cause excessive cacheline bouncing on big systems. There are a
    number of ways this could be addressed if it is an issue.

    Signed-off-by: Nick Piggin

    Refcount bug fix for filemap_xip.c

    Signed-off-by: Carsten Otte
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nick Piggin
     
  • Impose a little more consistency on the page fault handlers do_wp_page,
    do_swap_page, do_anonymous_page, do_no_page, do_file_page: why not pass their
    arguments in the same order, called the same names?

    break_cow is all very well, but what it did was inlined elsewhere: easier to
    compare if it's brought back into do_wp_page.

    do_file_page's fallback to do_no_page dates from a time when we were testing
    pte_file by using it wherever possible: currently it's peculiar to nonlinear
    vmas, so just check that. BUG_ON if not? Better not, it's probably page
    table corruption, so just show the pte: hmm, there's a pte_ERROR macro, let's
    use that for do_wp_page's invalid pfn too.

    Hah! Someone in the ppc64 world noticed pte_ERROR was unused so removed it:
    restored (and say "pud" not "pmd" in its pud_ERROR).

    Signed-off-by: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     

28 Oct, 2005

1 commit


09 Oct, 2005

1 commit

  • - added typedef unsigned int __nocast gfp_t;

    - replaced __nocast uses for gfp flags with gfp_t - it gives exactly
    the same warnings as far as sparse is concerned, doesn't change
    generated code (from gcc point of view we replaced unsigned int with
    typedef) and documents what's going on far better.

    Signed-off-by: Al Viro
    Signed-off-by: Linus Torvalds

    Al Viro
     

10 Sep, 2005

2 commits

  • This patch modifies tmpfs to call the inode_init_security LSM hook to set
    up the incore inode security state for new inodes before the inode becomes
    accessible via the dcache.

    As there is no underlying storage of security xattrs in this case, it is
    not necessary for the hook to return the (name, value, len) triple to the
    tmpfs code, so this patch also modifies the SELinux hook function to
    correctly handle the case where the (name, value, len) pointers are NULL.

    The hook call is needed in tmpfs in order to support proper security
    labeling of tmpfs inodes (e.g. for udev with tmpfs /dev in Fedora). With
    this change in place, we should then be able to remove the
    security_inode_post_create/mkdir/... hooks safely.

    Signed-off-by: Stephen Smalley
    Cc: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Stephen Smalley
     
  • Update the file systems in fs/ implementing a delete_inode() callback to
    call truncate_inode_pages(). One implementation note: In developing this
    patch I put the calls to truncate_inode_pages() at the very top of those
    filesystems delete_inode() callbacks in order to retain the previous
    behavior. I'm guessing that some of those could probably be optimized.

    Signed-off-by: Mark Fasheh
    Acked-by: Christoph Hellwig
    Signed-off-by: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mark Fasheh
     

08 Sep, 2005

1 commit


05 Sep, 2005

2 commits

  • This patch modifies the VFS setxattr, getxattr, and listxattr code to fall
    back to the security module for security xattrs if the filesystem does not
    support xattrs natively. This allows security modules to export the incore
    inode security label information to userspace even if the filesystem does
    not provide xattr storage, and eliminates the need to individually patch
    various pseudo filesystem types to provide such access. The patch removes
    the existing xattr code from devpts and tmpfs as it is then no longer
    needed.

    The patch restructures the code flow slightly to reduce duplication between
    the normal path and the fallback path, but this should only have one
    user-visible side effect - a program may get -EACCES rather than
    -EOPNOTSUPP if policy denied access but the filesystem didn't support the
    operation anyway. Note that the post_setxattr hook call is not needed in
    the fallback case, as the inode_setsecurity hook call handles the incore
    inode security state update directly. In contrast, we do call fsnotify in
    both cases.

    Signed-off-by: Stephen Smalley
    Acked-by: James Morris
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Stephen Smalley
     
  • Either shmem_getpage returns a failure, or it found a page, or it was told
    it couldn't do any I/O. So it's useless to check nonblock in the else
    branch. We could add a BUG() there but I preferred to comment the
    offending function.

    This was taken out from one Ingo Molnar's old patch I'm resurrecting.

    Signed-off-by: Paolo 'Blaisorblade' Giarrusso
    Cc: Ingo Molnar
    Cc: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Paolo 'Blaisorblade' Giarrusso
     

20 Aug, 2005

1 commit

  • This bug could cause oopses and page state corruption, because ncpfs
    used the generic page-cache symlink handlign functions. But those
    functions only work if the page cache is guaranteed to be "stable", ie a
    page that was installed when the symlink walk was started has to still
    be installed in the page cache at the end of the walk.

    We could have fixed ncpfs to not use the generic helper routines, but it
    is in many ways much cleaner to instead improve on the symlink walking
    helper routines so that they don't require that absolute stability.

    We do this by allowing "follow_link()" to return a error-pointer as a
    cookie, which is fed back to the cleanup "put_link()" routine. This
    also simplifies NFS symlink handling.

    Signed-off-by: Linus Torvalds

    Linus Torvalds
     

22 Jun, 2005

1 commit

  • To improve shmem scalability, we allowed tmpfs instances which don't need
    their blocks or inodes limited not to count them, and not to allocate any
    sbinfo. Which was okay when the only use for the sbinfo was accounting
    blocks and inodes; but since then a couple of unrelated projects extending
    tmpfs want to store other data in the sbinfo. Whether either extension
    reaches mainline is beside the point: I'm guilty of a bad design decision,
    and should restore sbinfo to make any such future extensions easier.

    So, once again allocate a shmem_sb_info for every shmem/tmpfs instance, and
    now let max_blocks 0 indicate unlimited blocks, and max_inodes 0 unlimited
    inodes. Brent Casavant verified (many months ago) that this does not
    perceptibly impact the scalability (since the unlimited sbinfo cacheline is
    repeatedly accessed but only once dirtied).

    And merge shmem_set_size into its sole caller shmem_remount_fs.

    Signed-off-by: Hugh Dickins
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hugh Dickins
     

17 Apr, 2005

1 commit

  • Initial git repository build. I'm not bothering with the full history,
    even though we have it. We can create a separate "historical" git
    archive of that later if we want to, and in the meantime it's about
    3.2GB when imported into git - space that would just make the early
    git days unnecessarily complicated, when we don't have a lot of good
    infrastructure for it.

    Let it rip!

    Linus Torvalds