17 May, 2007
1 commit
-
SLAB_CTOR_CONSTRUCTOR is always specified. No point in checking it.
Signed-off-by: Christoph Lameter
Cc: David Howells
Cc: Jens Axboe
Cc: Steven French
Cc: Michael Halcrow
Cc: OGAWA Hirofumi
Cc: Miklos Szeredi
Cc: Steven Whitehouse
Cc: Roman Zippel
Cc: David Woodhouse
Cc: Dave Kleikamp
Cc: Trond Myklebust
Cc: "J. Bruce Fields"
Cc: Anton Altaparmakov
Cc: Mark Fasheh
Cc: Paul Mackerras
Cc: Christoph Hellwig
Cc: Jan Kara
Cc: David Chinner
Cc: "David S. Miller"
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
09 May, 2007
4 commits
-
The problems are:
- on filesystems w/o permanent inode numbers, i_ino values can be larger
than 32 bits, which can cause problems for some 32 bit userspace programs on
a 64 bit kernel. We can't do anything for filesystems that have actual
>32-bit inode numbers, but on filesystems that generate i_ino values on the
fly, we should try to have them fit in 32 bits. We could trivially fix this
by making the static counters in new_inode and iunique 32 bits, but...- many filesystems call new_inode and assume that the i_ino values they are
given are unique. They are not guaranteed to be so, since the static
counter can wrap. This problem is exacerbated by the fix for #1.- after allocating a new inode, some filesystems call iunique to try to get
a unique i_ino value, but they don't actually add their inodes to the
hashtable, and so they're still not guaranteed to be unique if that counter
wraps.This patch set takes the simpler approach of simply using iunique and hashing
the inodes afterward. Christoph H. previously mentioned that he thought that
this approach may slow down lookups for filesystems that currently hash their
inodes.The questions are:
1) how much would this slow down lookups for these filesystems?
2) is it enough to justify adding more infrastructure to avoid it?What might be best is to start with this approach and then only move to using
IDR or some other scheme if these extra inodes in the hashtable prove to be
problematic.I've done some cursory testing with this patch and the overhead of hashing and
unhashing the inodes with pipefs is pretty low -- just a few seconds of system
time added on to the creation and destruction of 10 million pipes (very
similar to the overhead that the IDR approach would add).The hard thing to measure is what effect this has on other filesystems. I'm
open to ways to try and gauge this.Again, I've only converted pipefs as an example. If this approach is
acceptable then I'll start work on patches to convert other filesystems.With a pretty-much-worst-case microbenchmark provided by Eric Dumazet
:hashing patch (pipebench):
sys 1m15.329s
sys 1m16.249s
sys 1m17.169sunpatched (pipebench):
sys 1m9.836s
sys 1m12.541s
sys 1m14.153sWhich works out to 1.05642174294555027017. So ~5-6% slowdown.
This patch:
When a 32-bit program that was not compiled with large file offsets does a
stat and gets a st_ino value back that won't fit in the 32 bit field, glibc
(correctly) generates an EOVERFLOW error. We can't do anything about fs's
with larger permanent inode numbers, but when we generate them on the fly, we
ought to try and have them fit within a 32 bit field.This patch takes the first step toward this by making the static counters in
these two functions be 32 bits.[jlayton@redhat.com: mention that it's only the case for 32bit, non-LFS stat]
Signed-off-by: Jeff Layton
Cc: Christoph Hellwig
Cc: Al Viro
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
There are many places in the kernel where the construction like
foo = list_entry(head->next, struct foo_struct, list);
are used.
The code might look more descriptive and neat if using the macrolist_first_entry(head, type, member) \
list_entry((head)->next, type, member)Here is the macro itself and the examples of its usage in the generic code.
If it will turn out to be useful, I can prepare the set of patches to
inject in into arch-specific code, drivers, networking, etc.Signed-off-by: Pavel Emelianov
Signed-off-by: Kirill Korotaev
Cc: Randy Dunlap
Cc: Andi Kleen
Cc: Zach Brown
Cc: Davide Libenzi
Cc: John McCutchan
Cc: Thomas Gleixner
Cc: Ingo Molnar
Cc: john stultz
Cc: Ram Pai
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
A while back, Christoph mentioned that he thought that iunique ought to be
cleaned up to use a more conventional loop construct. This patch does that,
turning the strange goto loop into a do/while.Signed-off-by: Jeff Layton
Cc: Christoph Hellwig
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
inode->i_sb is always set, not need to check for it.
Signed-off-by: Christoph Hellwig
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
08 May, 2007
1 commit
-
I have never seen a use of SLAB_DEBUG_INITIAL. It is only supported by
SLAB.I think its purpose was to have a callback after an object has been freed
to verify that the state is the constructor state again? The callback is
performed before each freeing of an object.I would think that it is much easier to check the object state manually
before the free. That also places the check near the code object
manipulation of the object.Also the SLAB_DEBUG_INITIAL callback is only performed if the kernel was
compiled with SLAB debugging on. If there would be code in a constructor
handling SLAB_DEBUG_INITIAL then it would have to be conditional on
SLAB_DEBUG otherwise it would just be dead code. But there is no such code
in the kernel. I think SLUB_DEBUG_INITIAL is too problematic to make real
use of, difficult to understand and there are easier ways to accomplish the
same effect (i.e. add debug code before kfree).There is a related flag SLAB_CTOR_VERIFY that is frequently checked to be
clear in fs inode caches. Remove the pointless checks (they would even be
pointless without removeal of SLAB_DEBUG_INITIAL) from the fs constructors.This is the last slab flag that SLUB did not support. Remove the check for
unimplemented flags from SLUB.Signed-off-by: Christoph Lameter
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
13 Feb, 2007
2 commits
-
This patch is inspired by Arjan's "Patch series to mark struct
file_operations and struct inode_operations const".Compile tested with gcc & sparse.
Signed-off-by: Josef 'Jeff' Sipek
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Remove_dquot_ref can move to dqout.c instead of beeing in inode.c under
#ifdef CONFIG_QUOTA. Also clean the resulting code up a tiny little bit by
testing sb->dq_op earlier - it's constant over a filesystems lifetime.Signed-off-by: Christoph Hellwig
Cc: Jan Kara
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
12 Feb, 2007
3 commits
-
Convert all calls to invalidate_inode_pages() into open-coded calls to
invalidate_mapping_pages().Leave the invalidate_inode_pages() wrapper in place for now, marked as
deprecated.Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
When igrab() is calling __iget() on an inode it should check if
clear_inode() has been called on the inode already. Otherwise there is a
race window between clear_inode() and destroy_inode() where igrab() calls
__iget() which leads to already free inodes on the inode lists.Signed-off-by: Vandana Rungta
Signed-off-by: Jan Blunck
Cc: Al Viro
Cc: Neil Brown
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
I added IS_NOATIME(inode) macro definition in include/linux/fs.h, true if
the inode superblock is marked readonly or noatime.This new macro is then used in touch_atime() instead of separatly testing
MS_RDONLY and MS_NOATIMESigned-off-by: Eric Dumazet
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
14 Dec, 2006
2 commits
-
Add "relatime" (relative atime) support. Relative atime only updates the
atime if the previous atime is older than the mtime or ctime. Like
noatime, but useful for applications like mutt that need to know when a
file has been read since it was last modified.A corresponding patch against mount(8) is available at
http://userweb.kernel.org/~akpm/mount-relative-atime.txtSigned-off-by: Valerie Henson
Cc: Mark Fasheh
Cc: Al Viro
Cc: Christoph Hellwig
Cc: Karel Zak
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Simplify touch_atime() layout.
Cc: Valerie Henson
Cc: Mark Fasheh
Cc: Al Viro
Cc: Christoph Hellwig
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
09 Dec, 2006
1 commit
-
This patch changes struct file to use struct path instead of having
independent pointers to struct dentry and struct vfsmount, and converts all
users of f_{dentry,vfsmnt} in fs/ to use f_path.{dentry,mnt}.Additionally, it adds two #define's to make the transition easier for users of
the f_dentry and f_vfsmnt.Signed-off-by: Josef "Jeff" Sipek
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
08 Dec, 2006
3 commits
-
Add a proper prototype for remove_inode_dquot_ref() in
include/linux/quotaops.hSigned-off-by: Adrian Bunk
Acked-by: Jan Kara
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Replace all uses of kmem_cache_t with struct kmem_cache.
The patch was generated using the following script:
#!/bin/sh
#
# Replace one string by another in all the kernel sources.
#set -e
for file in `find * -name "*.c" -o -name "*.h"|xargs grep -l $1`; do
quilt add $file
sed -e "1,\$s/$1/$2/g" $file >/tmp/$$
mv /tmp/$$ $file
quilt refresh
doneThe script was run like this
sh replace kmem_cache_t "struct kmem_cache"
Signed-off-by: Christoph Lameter
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
SLAB_KERNEL is an alias of GFP_KERNEL.
Signed-off-by: Christoph Lameter
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
20 Oct, 2006
1 commit
-
The splice_actor may be calling ->prepare_write() and ->commit_write(). We
want i_mutex on the inode being written to before calling those so that we
don't race i_size changes.The double locking behavior is done elsewhere in splice.c, and if we
eventually want _nolock variants of generic_file_splice_write(), fs modules
might have to replicate the nasty locking code. We introduce
inode_double_lock() and inode_double_unlock() to consolidate the locking
rules into one set of functions.Signed-off-by: Mark Fasheh
Signed-off-by: Jens Axboe
11 Oct, 2006
1 commit
-
Signed-off-by: Al Viro
Signed-off-by: Linus Torvalds
02 Oct, 2006
1 commit
-
Only touch inode's i_mtime and i_ctime to make them equal to "now" in case
they aren't yet (don't just update timestamp unconditionally). Uninline
the hash function to save 259 Bytes.This tiny inode change which may improve cache behaviour also shaves off 8
Bytes from file_update_time() on i386.Included a tiny codestyle cleanup, too.
Signed-off-by: Andreas Mohr
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
01 Oct, 2006
1 commit
-
Move __invalidate_device() from fs/inode.c to fs/block_dev.c so that it can
more easily be disabled when the block layer is disabled.Signed-Off-By: David Howells
Signed-off-by: Jens Axboe
30 Sep, 2006
1 commit
-
[assuming BSD security levels are deleted]
The only user of i_security, f_security, s_security fields is SELinux,
however, quite a few security modules are trying to get into kernel.
So, wrap them under CONFIG_SECURITY. Adding config option for each
security field is likely an overkill.Following Stephen Smalley's suggestion, i_security initialization is
moved to security_inode_alloc() to not clutter core code with ifdefs
and make alloc_inode() codepath tiny little bit smaller and faster.The user of (highly greppable) struct fown_struct::security field is
still to be found. I've checked every "fown_struct" and every "f_owner"
occurence. Additionally it's removal doesn't break i386 allmodconfig
build.struct inode, struct file, struct super_block, struct fown_struct
become smaller.P.S. Combined with two reiserfs inode shrinking patches sent to
linux-fsdevel, I can finally suck 12 reiserfs inodes into one page./proc/slabinfo
-ext2_inode_cache 388 10
+ext2_inode_cache 384 10
-inode_cache 280 14
+inode_cache 276 14
-proc_inode_cache 296 13
+proc_inode_cache 292 13
-reiser_inode_cache 336 11
+reiser_inode_cache 332 12
Cc: Stephen Smalley
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
27 Sep, 2006
3 commits
-
Move the i_cdev pointer in struct inode into a union.
Signed-off-by: "Theodore Ts'o"
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Move the i_bdev pointer in struct inode into a union.
Signed-off-by: "Theodore Ts'o"
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
The following patches reduce the size of the VFS inode structure by 28 bytes
on a UP x86. (It would be more on an x86_64 system). This is a 10% reduction
in the inode size on a UP kernel that is configured in a production mode
(i.e., with no spinlock or other debugging functions enabled; if you want to
save memory taken up by in-core inodes, the first thing you should do is
disable the debugging options; they are responsible for a huge amount of bloat
in the VFS inode structure).This patch:
The filesystem or device-specific pointer in the inode is inside a union,
which is pretty pointless given that all 30+ users of this field have been
using the void pointer. Get rid of the union and rename it to i_private, with
a comment to explain who is allowed to use the void pointer. This is just a
cleanup, but it allows us to reuse the union 'u' for something something where
the union will actually be used.[judith@osdl.org: powerpc build fix]
Signed-off-by: "Theodore Ts'o"
Signed-off-by: Judith Lebzelter
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
01 Jul, 2006
3 commits
-
* git://git.kernel.org/pub/scm/linux/kernel/git/bunk/trivial:
Remove obsolete #include
remove obsolete swsusp_encrypt
arch/arm26/Kconfig typos
Documentation/IPMI typos
Kconfig: Typos in net/sched/Kconfig
v9fs: do not include linux/version.h
Documentation/DocBook/mtdnand.tmpl: typo fixes
typo fixes: specfic -> specific
typo fixes in Documentation/networking/pktgen.txt
typo fixes: occuring -> occurring
typo fixes: infomation -> information
typo fixes: disadvantadge -> disadvantage
typo fixes: aquire -> acquire
typo fixes: mecanism -> mechanism
typo fixes: bandwith -> bandwidth
fix a typo in the RTC_CLASS help text
smb is no longer maintainedManually merged trivial conflict in arch/um/kernel/vmlinux.lds.S
-
The remaining counters in page_state after the zoned VM counter patches
have been applied are all just for show in /proc/vmstat. They have no
essential function for the VM.We use a simple increment of per cpu variables. In order to avoid the most
severe races we disable preempt. Preempt does not prevent the race between
an increment and an interrupt handler incrementing the same statistics
counter. However, that race is exceedingly rare, we may only loose one
increment or so and there is no requirement (at least not in kernel) that
the vm event counters have to be accurate.In the non preempt case this results in a simple increment for each
counter. For many architectures this will be reduced by the compiler to a
single instruction. This single instruction is atomic for i386 and x86_64.
And therefore even the rare race condition in an interrupt is avoided for
both architectures in most cases.The patchset also adds an off switch for embedded systems that allows a
building of linux kernels without these counters.The implementation of these counters is through inline code that hopefully
results in only a single instruction increment instruction being emitted
(i386, x86_64) or in the increment being hidden though instruction
concurrency (EPIC architectures such as ia64 can get that done).Benefits:
- VM event counter operations usually reduce to a single inline instruction
on i386 and x86_64.
- No interrupt disable, only preempt disable for the preempt case.
Preempt disable can also be avoided by moving the counter into a spinlock.
- Handling is similar to zoned VM counters.
- Simple and easily extendable.
- Can be omitted to reduce memory use for embedded use.References:
RFC http://marc.theaimsgroup.com/?l=linux-kernel&m=113512330605497&w=2
RFC http://marc.theaimsgroup.com/?l=linux-kernel&m=114988082814934&w=2
local_t http://marc.theaimsgroup.com/?l=linux-kernel&m=114991748606690&w=2
V2 http://marc.theaimsgroup.com/?t=115014808400007&r=1&w=2
V3 http://marc.theaimsgroup.com/?l=linux-kernel&m=115024767022346&w=2
V4 http://marc.theaimsgroup.com/?l=linux-kernel&m=115047968808926&w=2Signed-off-by: Christoph Lameter
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Signed-off-by: Jörn Engel
Signed-off-by: Adrian Bunk
29 Jun, 2006
1 commit
-
Same as with already do with the file operations: keep them in .rodata and
prevents people from doing runtime patching.Signed-off-by: Christoph Hellwig
Cc: Steven French
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
02 Apr, 2006
1 commit
-
this changes if() BUG(); constructs to BUG_ON() which is
cleaner and can better optimized awaySigned-off-by: Eric Sesterhenn
Signed-off-by: Adrian Bunk
29 Mar, 2006
1 commit
-
Mark the f_ops members of inodes as const, as well as fix the
ripple-through this causes by places that copy this f_ops and then "do
stuff" with it.Signed-off-by: Arjan van de Ven
Signed-off-by: Alexey Dobriyan
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
27 Mar, 2006
1 commit
-
I discovered on oprofile hunting on a SMP platform that dentry lookups were
slowed down because d_hash_mask, d_hash_shift and dentry_hashtable were in
a cache line that contained inodes_stat. So each time inodes_stats is
changed by a cpu, other cpus have to refill their cache line.This patch moves some variables to the __read_mostly section, in order to
avoid false sharing. RCU dentry lookups can go full speed.Signed-off-by: Eric Dumazet
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
26 Mar, 2006
1 commit
-
There's no reason for iprune_mutex being global.
Signed-off-by: Adrian Bunk
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
24 Mar, 2006
1 commit
-
Change the kmem_cache_create calls for certain slab caches to support cpuset
memory spreading.See the previous patches, cpuset_mem_spread, for an explanation of cpuset
memory spreading, and cpuset_mem_spread_slab_cache for the slab cache support
for memory spreading.The slab caches marked for now are: dentry_cache, inode_cache, some xfs slab
caches, and buffer_head. This list may change over time. In particular,
other file system types that are used extensively on large NUMA systems may
want to allow for spreading their directory and inode slab cache entries.Signed-off-by: Paul Jackson
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
23 Mar, 2006
2 commits
-
Semaphore to mutex conversion.
The conversion was generated via scripts, and the result was validated
automatically via a script as well.Signed-off-by: Ingo Molnar
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Semaphore to mutex conversion.
The conversion was generated via scripts, and the result was validated
automatically via a script as well.Signed-off-by: Ingo Molnar
Cc: John McCutchan
Signed-off-by: Andrew Morton
Acked-by: Robert Love
Signed-off-by: Linus Torvalds
02 Feb, 2006
1 commit
-
Update some parameter descriptions to actually match the code.
Signed-off-by: Martin Waitz
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
11 Jan, 2006
3 commits
-
Turn noatime and nodiratime into per-mount instead of per-sb flags.
After all the preparations this is a rather trivial patch. The mount code
needs to treat the two options as per-mount instead of per-superblock, and
touch_atime needs to be changed to check the new MNT_ flags in addition to
the MS_ flags that are kept for filesystems that are always
noatime/nodiratime but not user settable anymore. Besides that core code
only nfs needed an update because it's leaving atime updates to the server
and thus sets the S_NOATIME flag on every inode, but needs to know whether
it's a real noatime mount for an getattr optimization.While we're at it I've killed the IS_NOATIME/IS_NODIRATIME macros that were
only used by touch_atime.Signed-off-by: Christoph Hellwig
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
All callers use touch_atime now which takes a vfsmount and allows us to
implement per-mount noatime.Signed-off-by: Christoph Hellwig
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
To allow various options to work per-mount instead of per-sb we need a
struct vfsmount when updating ctime and mtime. This preparation patch
replaces the inode_update_time routine with a file_update_atime routine so
we can easily get at the vfsmount. (and the file makes more sense in this
context anyway). Also get rid of the unused second argument - we always
want to update the ctime when calling this routine.Signed-off-by: Christoph Hellwig
Cc: Al Viro
Cc: Anton Altaparmakov
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds