18 Jul, 2007

6 commits

  • KSYM_NAME_LEN is peculiar in that it does not include the space for the
    trailing '\0', forcing all users to use KSYM_NAME_LEN + 1 when allocating
    buffer. This is nonsense and error-prone. Moreover, when the caller
    forgets that it's very likely to subtly bite back by corrupting the stack
    because the last position of the buffer is always cleared to zero.

    This patch increments KSYM_NAME_LEN by one and updates code accordingly.

    * off-by-one bug in asm-powerpc/kprobes.h::kprobe_lookup_name() macro
    is fixed.

    * Where MODULE_NAME_LEN and KSYM_NAME_LEN were used together,
    MODULE_NAME_LEN was treated as if it didn't include space for the
    trailing '\0'. Fix it.

    Signed-off-by: Tejun Heo
    Acked-by: Paulo Marques
    Cc: Benjamin Herrenschmidt
    Cc: Paul Mackerras
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Tejun Heo
     
  • Currently, the freezer treats all tasks as freezable, except for the kernel
    threads that explicitly set the PF_NOFREEZE flag for themselves. This
    approach is problematic, since it requires every kernel thread to either
    set PF_NOFREEZE explicitly, or call try_to_freeze(), even if it doesn't
    care for the freezing of tasks at all.

    It seems better to only require the kernel threads that want to or need to
    be frozen to use some freezer-related code and to remove any
    freezer-related code from the other (nonfreezable) kernel threads, which is
    done in this patch.

    The patch causes all kernel threads to be nonfreezable by default (ie. to
    have PF_NOFREEZE set by default) and introduces the set_freezable()
    function that should be called by the freezable kernel threads in order to
    unset PF_NOFREEZE. It also makes all of the currently freezable kernel
    threads call set_freezable(), so it shouldn't cause any (intentional)
    change of behaviour to appear. Additionally, it updates documentation to
    describe the freezing of tasks more accurately.

    [akpm@linux-foundation.org: build fixes]
    Signed-off-by: Rafael J. Wysocki
    Acked-by: Nigel Cunningham
    Cc: Pavel Machek
    Cc: Oleg Nesterov
    Cc: Gautham R Shenoy
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Rafael J. Wysocki
     
  • It is a bug to set a page dirty if it is not uptodate unless it has
    buffers. If the page has buffers, then the page may be dirty (some buffers
    dirty) but not uptodate (some buffers not uptodate). The exception to this
    rule is if the set_page_dirty caller is racing with truncate or invalidate.

    A buffer can not be set dirty if it is not uptodate.

    If either of these situations occurs, it indicates there could be some data
    loss problem. Some of these warnings could be a harmless one where the
    page or buffer is set uptodate immediately after it is dirtied, however we
    should fix those up, and enforce this ordering.

    Bring the order of operations for truncate into line with those of
    invalidate. This will prevent a page from being able to go !uptodate while
    we're holding the tree_lock, which is probably a good thing anyway.

    Signed-off-by: Nick Piggin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nick Piggin
     
  • I can never remember what the function to register to receive VM pressure
    is called. I have to trace down from __alloc_pages() to find it.

    It's called "set_shrinker()", and it needs Your Help.

    1) Don't hide struct shrinker. It contains no magic.
    2) Don't allocate "struct shrinker". It's not helpful.
    3) Call them "register_shrinker" and "unregister_shrinker".
    4) Call the function "shrink" not "shrinker".
    5) Reduce the 17 lines of waffly comments to 13, but document it properly.

    Signed-off-by: Rusty Russell
    Cc: David Chinner
    Cc: Trond Myklebust
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Rusty Russell
     
  • When we are out of memory of a suitable size we enter reclaim. The current
    reclaim algorithm targets pages in LRU order, which is great for fairness at
    order-0 but highly unsuitable if you desire pages at higher orders. To get
    pages of higher order we must shoot down a very high proportion of memory;
    >95% in a lot of cases.

    This patch set adds a lumpy reclaim algorithm to the allocator. It targets
    groups of pages at the specified order anchored at the end of the active and
    inactive lists. This encourages groups of pages at the requested orders to
    move from active to inactive, and active to free lists. This behaviour is
    only triggered out of direct reclaim when higher order pages have been
    requested.

    This patch set is particularly effective when utilised with an
    anti-fragmentation scheme which groups pages of similar reclaimability
    together.

    This patch set is based on Peter Zijlstra's lumpy reclaim V2 patch which forms
    the foundation. Credit to Mel Gorman for sanitity checking.

    Mel said:

    The patches have an application with hugepage pool resizing.

    When lumpy-reclaim is used used with ZONE_MOVABLE, the hugepages pool can
    be resized with greater reliability. Testing on a desktop machine with 2GB
    of RAM showed that growing the hugepage pool with ZONE_MOVABLE on it's own
    was very slow as the success rate was quite low. Without lumpy-reclaim,
    each attempt to grow the pool by 100 pages would yield 1 or 2 hugepages.
    With lumpy-reclaim, getting 40 to 70 hugepages on each attempt was typical.

    [akpm@osdl.org: ia64 pfn_to_nid fixes and loop cleanup]
    [bunk@stusta.de: static declarations for internal functions]
    [a.p.zijlstra@chello.nl: initial lumpy V2 implementation]
    Signed-off-by: Andy Whitcroft
    Acked-by: Peter Zijlstra
    Acked-by: Mel Gorman
    Acked-by: Mel Gorman
    Cc: Bob Picco
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andy Whitcroft
     
  • It is often known at allocation time whether a page may be migrated or not.
    This patch adds a flag called __GFP_MOVABLE and a new mask called
    GFP_HIGH_MOVABLE. Allocations using the __GFP_MOVABLE can be either migrated
    using the page migration mechanism or reclaimed by syncing with backing
    storage and discarding.

    An API function very similar to alloc_zeroed_user_highpage() is added for
    __GFP_MOVABLE allocations called alloc_zeroed_user_highpage_movable(). The
    flags used by alloc_zeroed_user_highpage() are not changed because it would
    change the semantics of an existing API. After this patch is applied there
    are no in-kernel users of alloc_zeroed_user_highpage() so it probably should
    be marked deprecated if this patch is merged.

    Note that this patch includes a minor cleanup to the use of __GFP_ZERO in
    shmem.c to keep all flag modifications to inode->mapping in the
    shmem_dir_alloc() helper function. This clean-up suggestion is courtesy of
    Hugh Dickens.

    Additional credit goes to Christoph Lameter and Linus Torvalds for shaping the
    concept. Credit to Hugh Dickens for catching issues with shmem swap vector
    and ramfs allocations.

    [akpm@linux-foundation.org: build fix]
    [hugh@veritas.com: __GFP_ZERO cleanup]
    Signed-off-by: Mel Gorman
    Cc: Andy Whitcroft
    Cc: Christoph Lameter
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mel Gorman
     

17 Jul, 2007

34 commits

  • do_utimes() does not honour CAP_FOWNER when times==NULL.
    Trivial and obvious one-line fix.

    Signed-off-by: Satyam Sharma
    Signed-off-by: Linus Torvalds

    Satyam Sharma
     
  • Teach LDM about a new field encountered with Windows Vista.

    This fixes LDM for people using Vista who have disabled drive letter
    assignment from one or more volumes. Doing this introduces a so far
    unknown field in the LDM database in the VOL5 VBLK structure which
    causes the LDM driver to fail to parse the VBLK structure and hence LDM
    fails to parse the disk altogether. This patch teaches the driver about
    this field.

    Thanks got to Ashton Mills for reporting the
    problem and working with me on getting it fixed. It is now working for
    him.

    Signed-off-by: Anton Altaparmakov
    CC: Richard Russon
    Signed-off-by: Linus Torvalds

    Anton Altaparmakov
     
  • * git://git.kernel.org/pub/scm/linux/kernel/git/mingo/linux-2.6-sched:
    [PATCH] sched: fix up fs/proc/array.c whitespace problems
    [PATCH] sched: prettify prio_to_wmult[]
    [PATCH] sched: document prio_to_wmult[]
    [PATCH] sched: improve weight-array comments
    [PATCH] sched: remove dead code from task_stime()

    Fixed up trivial conflict in fs/proc/array.c

    Linus Torvalds
     
  • * 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mfasheh/ocfs2: (32 commits)
    [PATCH] ocfs2: zero_user_page conversion
    ocfs2: Support xfs style space reservation ioctls
    ocfs2: support for removing file regions
    ocfs2: update truncate handling of partial clusters
    ocfs2: btree support for removal of arbirtrary extents
    ocfs2: Support creation of unwritten extents
    ocfs2: support writing of unwritten extents
    ocfs2: small cleanup of ocfs2_write_begin_nolock()
    ocfs2: btree changes for unwritten extents
    ocfs2: abstract btree growing calls
    ocfs2: use all extent block suballocators
    ocfs2: plug truncate into cached dealloc routines
    ocfs2: simplify deallocation locking
    ocfs2: harden buffer check during mapping of page blocks
    ocfs2: shared writeable mmap
    ocfs2: factor out write aops into nolock variants
    ocfs2: rework ocfs2_buffered_write_cluster()
    ocfs2: take ip_alloc_sem during entire truncate
    ocfs2: Add "preferred slot" mount option
    [KJ PATCH] Replacing memset(,0,PAGE_SIZE) with clear_page() in fs/ocfs2/dlm/dlmrecovery.c
    ...

    Linus Torvalds
     
  • * 'for-linus' of git://git.kernel.dk/data/git/linux-2.6-block:
    splice: direct splicing updates ppos twice
    more ACSI removal
    umem: Fix match of pci_ids in umem driver
    umem: Remove references to dead CONFIG_MM_MAP_MEMORY variable
    remove the documentation for the legacy CDROM drivers

    Linus Torvalds
     
  • * master.kernel.org:/pub/scm/linux/kernel/git/lethal/sh-2.6: (68 commits)
    sh: sh-rtc support for SH7709.
    sh: Revert __xdiv64_32 size change.
    sh: Update r7785rp defconfig.
    sh: Export div symbols for GCC 4.2 and ST GCC.
    sh: fix race in parallel out-of-tree build
    sh: Kill off dead mach.c for hp6xx.
    sh: hd64461.h cleanup and added comments.
    sh: Update the alignment when 4K stacks are used.
    sh: Add a .bss.page_aligned section for 4K stacks.
    sh: Don't let SH-4A clobber SH-4 CFLAGS.
    sh: Add parport stub for SuperIO ports.
    sh: Drop -Wa,-dsp for DSP tuning.
    sh: Update dreamcast defconfig.
    fb: pvr2fb: A few more __devinit annotations for PCI.
    fb: pvr2fb: Fix up section mismatch warnings.
    sh: Select IPR-IRQ for SH7091.
    sh: Correct __xdiv64_32/div64_32 return value size.
    sh: Fix timer-tmu build for SH-3.
    sh: Add cpu and mach links to CLEAN_FILES.
    sh: Preliminary support for the SH-X3 CPU.
    ...

    Linus Torvalds
     
  • FAT12 entry is 12bits, so it needs 2 phase to update the value. And
    writer and reader access it without any lock, so reader can get the
    half updated value.

    This fixes the long standing race condition by adding a global
    spinlock to only FAT12 for avoiding any impact against FAT16/32.

    Signed-off-by: OGAWA Hirofumi
    Signed-off-by: Linus Torvalds

    OGAWA Hirofumi
     
  • compat32: Ignore the LOOP_CLR_FD ioctl for the loop block device, to kill an
    annoying kernel message when e.g. busybox umount is used.

    Signed-off-by: Geert Uytterhoeven
    Acked-by: Arnd Bergmann
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Geert Uytterhoeven
     
  • Since the corresponding source files no longer exist, remove the
    irrelevant Makefile entries for them.

    Signed-off-by: Robert P. J. Day
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Robert P. J. Day
     
  • This is a patch that speeds up statfs. It is very simple - the "overhead"
    calculation, which takes a huge amount of time for large filesystems, never
    changes unless the size of the filesystem itself changes. That means we can
    store it in memory and only recalculate if the filesystem has been resized
    (almost never).

    It also fixes a minor problem that we never update the on-disk superblock free
    blocks/inodes counts until the filesystem is unmounted. While not fatal, we
    may as well update that on disk when we have the information, and it makes
    things like debugfs and dumpe2fs report a bit more accurate info.

    Signed-off-by: Badari Pulavarty
    Signed-off-by: Andreas Dilger
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Badari Pulavarty
     
  • This is a patch that speeds up statfs. It is very simple - the "overhead"
    calculation, which takes a huge amount of time for large filesystems, never
    changes unless the size of the filesystem itself changes. That means we can
    store it in memory and only recalculate if the filesystem has been resized
    (almost never).

    It also fixes a minor problem that we never update the on-disk superblock free
    blocks/inodes counts until the filesystem is unmounted. While not fatal, we
    may as well update that on disk when we have the information, and it makes
    things like debugfs and dumpe2fs report a bit more accurate info.

    Signed-off-by: Badari Pulavarty
    Signed-off-by: Andreas Dilger
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Badari Pulavarty
     
  • This is a patch that speeds up statfs. It is very simple - the "overhead"
    calculation, which takes a huge amount of time for large filesystems, never
    changes unless the size of the filesystem itself changes. That means we can
    store it in memory and only recalculate if the filesystem has been resized
    (almost never).

    It also fixes a minor problem that we never update the on-disk superblock free
    blocks/inodes counts until the filesystem is unmounted. While not fatal, we
    may as well update that on disk when we have the information, and it makes
    things like debugfs and dumpe2fs report a bit more accurate info.

    Signed-off-by: Badari Pulavarty
    Signed-off-by: Andreas Dilger
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Badari Pulavarty
     
  • Trivial typo and grammar fixes.

    Signed-off-by: "J. Bruce Fields"
    Cc: Davide Libenzi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    J. Bruce Fields
     
  • Fix error handling in ext4_create_journal according to kernel conventions.

    Signed-off-by: Borislav Petkov
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Borislav Petkov
     
  • Fix error handling in ext3_create_journal according to kernel conventions.

    Signed-off-by: Borislav Petkov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Borislav Petkov
     
  • We have to change udf_crc16() name to udf_crc() to be able to play with CRC
    test.

    Signed-off-by: Cyrill Gorcunov
    Cc: Jan Kara
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Cyrill Gorcunov
     
  • Replace (n & (n-1)) with is_power_of_2

    Signed-off-by: vignesh babu
    Acked-by: Evgeniy Dushistov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    vignesh babu
     
  • This reduces the memory footprint and it enforces that only the current
    task can enable seccomp on itself (this is a requirement for a
    strightforward [modulo preempt ;) ] TIF_NOTSC implementation).

    Signed-off-by: Andrea Arcangeli
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrea Arcangeli
     
  • Fix this:

    fs/block_dev.c: In function 'bd_claim_by_disk':
    fs/block_dev.c:970: warning: 'found' may be used uninitialized in this function

    and given that free_bd_holder() now needs free(NULL)-is-legal behaviour, we
    can simplify bd_release_from_kobject().

    Cc: Bjorn Steinbrink
    Cc: Johannes Weiner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
     
  • Replace some funky codepaths in fs/block_dev.c with cleaner versions of the
    affected places.

    [akpm@linux-foundation.org: fix return value]
    Signed-off-by: Johannes Weiner
    Cc: Bjorn Steinbrink
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Johannes Weiner
     
  • Every file should include the headers containing the prototypes for
    its global functions.

    Signed-off-by: Adrian Bunk
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Adrian Bunk
     
  • Add custom dentry hash and comparison operations for HFS+ filesystems that are
    case-insensitive and/or do automatic unicode decomposition. The new
    operations reuse the existing HFS+ ASCII to unicode conversion, unicode
    decomposition and case folding functionality.

    Signed-off-by: Duane Griffin
    Signed-off-by: Roman Zippel

    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Duane Griffin
     
  • The HFS+ filesystem is case-insensitive and does automatic unicode
    decomposition by default, but does not provide custom dentry operations. This
    can lead to multiple dentries being cached for lookups on a filename with
    varying case and/or character (de)composition.

    These patches add custom dentry hash and comparison operations for
    case-sensitive and/or automatically decomposing HFS+ filesystems. Unicode
    decomposition and case-folding are performed as required to ensure equivalent
    filenames are hashed to the same values and compare as equal.

    This patch:

    Refactor existing HFS+ ASCII to unicode string conversion routine to split out
    character conversion functionality. This will be reused by the custom dentry
    hash and comparison routines. This approach avoids unnecessary memory
    allocation compared to using the string conversion routine directly in the new
    functions.

    [akpm@linux-foundation.org: avoid use-of-uninitialised]
    Signed-off-by: Duane Griffin
    Signed-off-by: Roman Zippel
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Duane Griffin
     
  • In ext4_new_blocks(), one of two ext4_block_bitmap() calls should be
    ext4_inode_bitmap() call. It is not harmful in normal processing, but it
    should be fixed.

    Signed-off-by: Toshiyuki Okajima
    Cc: Theodore Ts'o
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Toshiyuki Okajima
     
  • Replace (n & (n-1)) in the context of power of 2 checks with
    is_power_of_2().

    Signed-off-by: vignesh babu
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    vignesh babu
     
  • Replace (n & (n-1)) in the context of power of 2 checks with is_power_of_2()

    Signed-off-by: vignesh babu
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    vignesh babu
     
  • sys_ioctl() was only exported for our first version of compat ioctl
    handling. Now that the whole compat ioctl handling mess is more or less
    sorted out there are no more modular users left and we can kill it.

    There's one exception and that's sparc64's solaris compat module, but
    sparc64 has it's own export predating the generic one by years for that
    which this patch leaves untouched.

    Signed-off-by: Christoph Hellwig
    Acked-by: David S. Miller
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Hellwig
     
  • While working on unshare support for the network namespace I noticed we
    were putting clone flags in an int. Which is weird because the syscall
    uses unsigned long and we at least need an unsigned to properly hold all of
    the unshare flags.

    So to make the code consistent, this patch updates the code to use
    unsigned long instead of int for the clone flags in those places
    where we get it wrong today.

    Signed-off-by: Eric W. Biederman
    Acked-by: Cedric Le Goater
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Eric W. Biederman
     
  • ext3_change_inode_journal_flag() is only called from one location:
    ext3_ioctl(EXT3_IOC_SETFLAGS). That ioctl case already has a IS_RDONLY()
    call in it so this one is superfluous.

    Signed-off-by: Dave Hansen
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dave Hansen
     
  • OpenVZ Linux kernel team has discovered the problem with 32bit quota tools
    working on 64bit architectures. In 2.6.10 kernel sys32_quotactl() function
    was replaced by sys_quotactl() with the comment "sys_quotactl seems to be
    32/64bit clean, enable it for 32bit" However this isn't right. Look at
    if_dqblk structure:

    struct if_dqblk {
    __u64 dqb_bhardlimit;
    __u64 dqb_bsoftlimit;
    __u64 dqb_curspace;
    __u64 dqb_ihardlimit;
    __u64 dqb_isoftlimit;
    __u64 dqb_curinodes;
    __u64 dqb_btime;
    __u64 dqb_itime;
    __u32 dqb_valid;
    };

    For 32 bit quota tools sizeof(if_dqblk) == 0x44.
    But for 64 bit kernel its size is 0x48, 'cause of alignment!
    Thus we got a problem. Attached patch reintroduce sys32_quotactl() function,
    that handles this and related situations.

    [michal.k.k.piotrowski@gmail.com: build fix]
    [akpm@linux-foundation.org: Make it link with CONFIG_QUOTA=n]
    Signed-off-by: Vasily Tarasov
    Cc: Andi Kleen
    Cc: "Luck, Tony"
    Cc: Jan Kara
    Cc:
    Signed-off-by: Michal Piotrowski
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vasily Tarasov
     
  • ext4_orphan_add() and ext4_orphan_del() functions lock sb->s_lock with a
    transaction started with ext4_mark_recovery_complete() waits for a transaction
    holding sb->s_lock, thus leading to a possible deadlock. At the moment we
    call ext4_mark_recovery_complete() from ext4_remount() we have done all the
    work needed for remounting and thus we are safe to drop sb->s_lock before we
    wait for transactions to commit. Note that at this moment we are still
    guarded by s_umount lock against other remounts/umounts.

    Signed-off-by: Jan Kara
    Cc: Eric Sandeen
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jan Kara
     
  • ext3_orphan_add() and ext3_orphan_del() functions lock sb->s_lock with a
    transaction started with ext3_mark_recovery_complete() waits for a transaction
    holding sb->s_lock, thus leading to a possible deadlock. At the moment we
    call ext3_mark_recovery_complete() from ext3_remount() we have done all the
    work needed for remounting and thus we are safe to drop sb->s_lock before we
    wait for transactions to commit. Note that at this moment we are still
    guarded by s_umount lock against other remounts/umounts.

    Signed-off-by: Jan Kara
    Cc: Eric Sandeen
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jan Kara
     
  • dup_mnt_ns() and clone_uts_ns() return NULL on failure. This is wrong,
    create_new_namespaces() uses ERR_PTR() to catch an error. This means that the
    subsequent create_new_namespaces() will hit BUG_ON() in copy_mnt_ns() or
    copy_utsname().

    Modify create_new_namespaces() to also use the errors returned by the
    copy_*_ns routines and not to systematically return ENOMEM.

    [oleg@tv-sign.ru: better changelog]
    Signed-off-by: Cedric Le Goater
    Cc: Serge E. Hallyn
    Cc: Badari Pulavarty
    Cc: Pavel Emelianov
    Cc: Herbert Poetzl
    Cc: Eric W. Biederman
    Cc: Oleg Nesterov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Cedric Le Goater
     
  • fs/binfmt_elf.c: In function 'load_elf_binary':
    fs/binfmt_elf.c:1002: warning: 'interp_map_addr' may be used uninitialized in this function

    The compiler (gcc-4.1.0) is correct, but it failed to notice that we didn't
    use the resulting value.

    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton