11 Dec, 2006
1 commit
-
Currently, each fdtable supports three dynamically-sized arrays of data: the
fdarray and two fdsets. The code allows the number of fds supported by the
fdarray (fdtable->max_fds) to differ from the number of fds supported by each
of the fdsets (fdtable->max_fdset).

In practice, it is wasteful for these two sizes to differ: whenever we hit a
limit on the smaller-capacity structure, we will reallocate the entire fdtable
and all the dynamic arrays within it, so any delta in the memory used by the
larger-capacity structure will never be touched at all.

Rather than hogging this excess, we shouldn't even allocate it in the first
place, and keep the capacities of the fdarray and the fdsets equal. This
patch removes fdtable->max_fdset. As an added bonus, most of the supporting
code becomes simpler.

Signed-off-by: Vadim Lobanov
Cc: Christoph Hellwig
Cc: Al Viro
Cc: Dipankar Sarma
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
09 Dec, 2006
1 commit
-
This patch changes struct file to use struct path instead of having
independent pointers to struct dentry and struct vfsmount, and converts all
users of f_{dentry,vfsmnt} in fs/ to use f_path.{dentry,mnt}.

Additionally, it adds two #define's to make the transition easier for users of
the f_dentry and f_vfsmnt.

Signed-off-by: Josef "Jeff" Sipek
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
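The layout change described above can be sketched in plain C. This is a minimal illustration, not the kernel's definitions: the struct bodies are stand-ins, the two accessor functions are hypothetical, and the two alias macros are assumed to take the form suggested by the f_path.{dentry,mnt} naming in the message.

```c
#include <stddef.h>

/* Stand-in types; the real kernel structures carry far more state. */
struct dentry   { const char *name; };
struct vfsmount { const char *devname; };

/* The pair that used to be two independent pointers in struct file. */
struct path {
    struct vfsmount *mnt;
    struct dentry   *dentry;
};

struct file {
    struct path f_path;
};

/* Transitional aliases (assumed form) so old field names keep compiling. */
#define f_dentry f_path.dentry
#define f_vfsmnt f_path.mnt

/* Hypothetical caller written against the old field name. */
static const char *file_name_old_style(const struct file *f)
{
    return f->f_dentry ? f->f_dentry->name : NULL;
}

/* The same caller after conversion to the new field path. */
static const char *file_name_new_style(const struct file *f)
{
    return f->f_path.dentry ? f->f_path.dentry->name : NULL;
}
```

Both accessors read the same storage, which is why unconverted users can be migrated gradually.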
08 Dec, 2006
2 commits
-
Signed-off-by: Heiko Carstens
Cc: Arnd Bergmann
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
-
OpenVZ Linux kernel team has found a problem with mounting in compat mode.
Simple command "mount -t smbfs ..." on Fedora Core 5 distro in 32-bit mode
leads to oops:

Unable to handle kernel NULL pointer dereference at 0000000000000000 RIP: compat_sys_mount+0xd6/0x290
Process mount (pid: 14656, veid=300, threadinfo ffff810034d30000, task ffff810034c86bc0)
Call Trace: ia32_sysret+0x0/0xa

The problem is that data_page pointer can be NULL, so we should skip data
conversion in this case.

Signed-off-by: Andrey Mirkin
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
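The shape of the fix can be sketched as a guard in front of the conversion step. Everything below is hypothetical scaffolding (convert_mount_data() and compat_mount_fixup() are illustrative names, not functions from fs/compat.c); only the idea of skipping conversion when data_page is NULL comes from the message above.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a filesystem-specific conversion of the
 * 32-bit mount data; returns 1 to signal that a conversion happened. */
static int convert_mount_data(char *data_page)
{
    data_page[0] = '\0';   /* touching the page is what oopsed on NULL */
    return 1;
}

/* Convert only when there is actually a data page to convert. */
static int compat_mount_fixup(const char *fstype, char *data_page)
{
    if (fstype != NULL && data_page != NULL &&
        strcmp(fstype, "smbfs") == 0)
        return convert_mount_data(data_page);
    return 0;              /* nothing to convert, not an error */
}
```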
04 Dec, 2006
2 commits
-
Signed-off-by: Al Viro
-
Signed-off-by: Al Viro
04 Nov, 2006
1 commit
-
758333458aa719bfc26ec16eafd4ad3a9e96014d fixes the not checked copy_to_user
return value of compat_sys_pselect7. I ran into this too because of an old
source tree, but my fix would look quite a bit different to Andi's fix.

The reason is that the compat function IMHO should behave the very same as
the non-compat function if possible. Since sys_pselect7 does not return
-EFAULT in this specific case, change the compat code so it behaves like
sys_pselect7.

Cc: David Woodhouse
Cc: Andi Kleen
Signed-off-by: Heiko Carstens
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
11 Oct, 2006
1 commit
-
Signed-off-by: Al Viro
Signed-off-by: Linus Torvalds
03 Oct, 2006
1 commit
-
These patches make the kernel pass 64-bit inode numbers internally when
communicating to userspace, even on a 32-bit system. They are required
because some filesystems have intrinsic 64-bit inode numbers: NFS3+ and XFS
for example. The 64-bit inode numbers are then propagated to userspace
automatically where the arch supports it.

Problems have been seen with userspace (eg: ld.so) using the 64-bit inode
number returned by stat64() or getdents64() to differentiate files, and
failing because the 64-bit inode number space was compressed to 32-bits, and
so overlaps occur.

This patch:

Make filldir_t take a 64-bit inode number and struct kstat carry a 64-bit
inode number so that 64-bit inode numbers can be passed back to userspace.

The stat functions then return the full 64-bit inode number where
available and where possible. If it is not possible to represent the inode
number supplied by the filesystem in the field provided by userspace, then
error EOVERFLOW will be issued.

Similarly, the getdents/readdir functions now pass the full 64-bit inode
number to userspace where possible, returning EOVERFLOW instead when a
directory entry is encountered that can't be properly represented.

Note that this means that some inodes will not be stat'able on a 32-bit
system with old libraries where they were before - but it does mean that
there will be no ambiguity over what a 32-bit inode number refers to.

Note similarly that directory scans may be cut short with an error on a
32-bit system with old libraries where the scan would work before for the
same reasons.

It is judged unlikely that this situation will occur because modern glibc
uses 64-bit capable versions of stat and getdents class functions
exclusively, and that older systems are unlikely to encounter
unrepresentable inode numbers anyway.

[akpm: alpha build fix]
Signed-off-by: David Howells
Cc: Trond Myklebust
Cc: Al Viro
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
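The EOVERFLOW rule above amounts to a store-and-compare check. The sketch below assumes a hypothetical 32-bit userspace record (struct old_stat32) and a hypothetical helper name; it only illustrates the "does the value survive the narrower field" test, not the kernel's actual stat code.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical userspace-visible record with a 32-bit inode field. */
struct old_stat32 {
    uint32_t st_ino;
};

/* Store a 64-bit inode number; fail if the narrow field cannot hold it. */
static int fill_old_stat32(struct old_stat32 *out, uint64_t ino)
{
    out->st_ino = (uint32_t)ino;          /* may truncate */
    if ((uint64_t)out->st_ino != ino)     /* truncation detected */
        return -EOVERFLOW;
    return 0;
}
```

Reporting an error instead of a truncated number is what removes the ambiguity the message mentions: two distinct 64-bit inodes can no longer appear to userspace as the same 32-bit value.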
02 Oct, 2006
1 commit
-
Revert Andrew Morton's patch to temporarily hack around the lack of a
declaration of sigset_t in linux/compat.h to make the block-disablement
patches build on IA64. This got accidentally pushed to Linus and should
be fixed in a different manner.

Also make linux/compat.h #include asm/signal.h to gain a definition of
sigset_t so that it can externally declare sigset_from_compat().

This has been compile-tested for i386, x86_64, ia64, mips, mips64, frv, ppc and
ppc64 and run-tested on frv.

Signed-off-by: David Howells
Signed-off-by: Linus Torvalds
01 Oct, 2006
3 commits
-
There were a few accounting data/macros that are used in CSA but are #ifdef'ed
inside CONFIG_BSD_PROCESS_ACCT. This patch is to change those ifdef's from
CONFIG_BSD_PROCESS_ACCT to CONFIG_TASK_XACCT. A few defines are moved from
kernel/acct.c and include/linux/acct.h to kernel/tsacct.c and
include/linux/tsacct_kern.h.

Signed-off-by: Jay Lan
Cc: Shailabh Nagar
Cc: Balbir Singh
Cc: Jes Sorensen
Cc: Chris Sturtivant
Cc: Tony Ernst
Cc: Guillaume Thouvenin
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
-
This patch removes readv() and writev() methods and replaces them with
aio_read()/aio_write() methods.

Signed-off-by: Badari Pulavarty
Signed-off-by: Christoph Hellwig
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
-
Create a new header file, fs/internal.h, for common definitions local to the
sources in the fs/ directory.

Move extern definitions that should be in header files from fs/*.c to
fs/internal.h or other main header files where they span directories.

Signed-Off-By: David Howells
Signed-off-by: Jens Axboe
26 Sep, 2006
1 commit
-
Fix

linux/fs/compat.c: In function compat_sys_pselect7
linux/fs/compat.c:1869: warning: ignoring return value of copy_to_user, declared with attribute warn_unused_result

To make it easier to handle I changed the semantics to not try to
write out a timespec if an error occurred. I hope that's ok.

Cc: dwmw2@infradead.org
Signed-off-by: Andi Kleen
27 Jun, 2006
1 commit
-
Sometimes e.g. with crashme the compat layer warnings can be noisy.
Add a way to turn them off by gating all output through compat_printk
that checks a global sysctl. The default is not changed.

Signed-off-by: Andi Kleen
Signed-off-by: Linus Torvalds
23 Jun, 2006
1 commit
-
Give the statfs superblock operation a dentry pointer rather than a superblock
pointer.

This complements the get_sb() patch. That reduced the significance of
sb->s_root, allowing NFS to place a fake root there. However, NFS does
require a dentry to use as a target for the statfs operation. This permits
the root in the vfsmount to be used instead.

linux/mount.h has been added where necessary to make allyesconfig build
successfully.

Interest has also been expressed for use with the FUSE and XFS filesystems.

Signed-off-by: David Howells
Acked-by: Al Viro
Cc: Nathan Scott
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
22 May, 2006
1 commit
-
Functions compat_nfs_svc_trans, compat_nfs_clnt_trans,
compat_nfs_exp_trans, compat_nfs_getfd_trans and compat_nfs_getfs_trans,
which are called by compat_sys_nfsservctl (fs/compat.c), don't handle the
return value of access_ok properly. access_ok returns 1 when the addr is
valid, and 0 when it's not, but these functions have the reversed
understanding. When the address is valid, they always return -EFAULT to
compat_sys_nfsservctl.

An example is to run /usr/sbin/rpc.nfsd (a 32bit program on Power5). It
doesn't function as expected. strace shows that nfsservctl returns
-EFAULT.

The patch fixes this by correcting the error handling on the return value
of access_ok in the five functions.

Signed-off-by: Lin Feng Shen
Cc: Trond Myklebust
Acked-by: Neil Brown
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
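The inverted test described above is easy to see side by side. The sketch uses a mock with the convention stated in the message (1 for a valid address, 0 otherwise); mock_access_ok(), trans_buggy() and trans_fixed() are hypothetical names, not the kernel functions.

```c
#include <errno.h>
#include <stddef.h>

/* Mock following the stated convention: 1 = address valid, 0 = invalid. */
static int mock_access_ok(const void *addr, size_t size)
{
    return addr != NULL && size > 0;
}

/* Reversed understanding: a valid address is reported as a fault. */
static int trans_buggy(const void *uarg, size_t size)
{
    if (mock_access_ok(uarg, size))
        return -EFAULT;
    return 0;
}

/* Corrected check: fault only when the address is NOT accessible. */
static int trans_fixed(const void *uarg, size_t size)
{
    if (!mock_access_ok(uarg, size))
        return -EFAULT;
    return 0;
}
```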
16 May, 2006
1 commit
-
Mentioned by Mark Armbrust somewhere on Usenet.
Signed-off-by: Alexey Dobriyan
Cc: David Woodhouse
Cc: Ulrich Drepper
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
04 May, 2006
1 commit
-
nr_segs may not be > UIO_MAXIOV, however it may be equal to. This makes
the behaviour identical to the real sys_vmsplice(). The other foov
syscalls also agree that this is the way to go.

Signed-off-by: Jens Axboe
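The boundary condition above is a plain off-by-one in the limit test. In this sketch SKETCH_UIO_MAXIOV is a local stand-in set to 1024 (an assumption for illustration; the kernel's UIO_MAXIOV is defined in its own headers), the helper name is hypothetical, and -EINVAL is assumed as the rejection code.

```c
#include <errno.h>

/* Local stand-in for the kernel's UIO_MAXIOV (assumed value). */
#define SKETCH_UIO_MAXIOV 1024UL

/* nr_segs equal to the limit is allowed; only larger values are rejected. */
static int check_nr_segs(unsigned long nr_segs)
{
    if (nr_segs > SKETCH_UIO_MAXIOV)
        return -EINVAL;
    return 0;
}
```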
02 May, 2006
1 commit
-
Signed-off-by: Andi Kleen
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
26 Apr, 2006
1 commit
-
This patch addresses a flaw in LSM, where there is no mediation of readv()
and writev() for 32-bit compatible apps using a 64-bit kernel.

This bug was discovered and fixed initially in the native readv/writev
code [1], but was not fixed in the compat code. Thanks to Al for spotting
this one.

[1] http://lwn.net/Articles/154282/
Signed-off-by: James Morris
Signed-off-by: Al Viro
Signed-off-by: Linus Torvalds
29 Mar, 2006
1 commit
-
Remove an unnecessary level of indirection in allocating and freeing select
bits, as per the select_bits_alloc() and select_bits_free() functions.
Both select.c and compat.c are updated.

Signed-off-by: Vadim Lobanov
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
26 Mar, 2006
2 commits
-
Signed-off-by: Oliver Neukum
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
-
Correct some error handling on the compat version of the nfsservctl()
system call. It was detecting errors while copying in the arguments from user
space, but then attempting to use the arguments anyway. This didn't seem
so good.

Signed-off-by: Peter Staubach
Cc: Trond Myklebust
Cc: Neil Brown
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
24 Mar, 2006
1 commit
-
If we don't want sys_newfstatat because __ARCH_WANT_STAT64 is defined, then
we certainly don't want compat_sys_newfstatat either.

Signed-off-by: Grant Grundler
Signed-off-by: Kyle McMartin
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
18 Feb, 2006
1 commit
-
I got all of these backwards. We want to return
min(input timeout, new timeout)
to userspace to prevent increasing the time-remaining value.
Thanks to Ernst Herzberg for reporting and diagnosing.
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
12 Feb, 2006
1 commit
-
With David Woodhouse
select() presently has a habit of increasing the value of the user's
`timeout' argument on return.

We were writing back a timeout larger than the original. We _deliberately_
round up, since we know we must wait at _least_ as long as the caller asks
us to.

The patch adds a couple of helper functions for magnitude comparison of
timespecs and of timevals, and uses them to prevent the various poll and
select functions from returning a timeout which is larger than the one which
was passed in.

The patch also fixes a bug in compat_sys_pselect7(): it was adding the new
timeout value to the old one and was returning that. It should just return
the new timeout value.

(We have various handy timespec/timeval-to-from-nsec conversion functions in
time.h. But this code open-codes it all).

Cc: "David S. Miller"
Cc: Andi Kleen
Cc: Ulrich Drepper
Cc: Thomas Gleixner
Cc: george anzinger
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
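A magnitude comparison of the kind described above, and its use to cap the value written back, might look like the following. The struct and both function names are hypothetical stand-ins (a local struct is used instead of the system timespec to keep the sketch header-free), and normalized values (0 <= tv_nsec < 1000000000) are assumed.

```c
/* Local stand-in for a timespec; assumes normalized values. */
struct sketch_timespec {
    long tv_sec;
    long tv_nsec;
};

/* Returns <0, 0 or >0 when a is smaller than, equal to or larger than b. */
static int sketch_timespec_compare(const struct sketch_timespec *a,
                                   const struct sketch_timespec *b)
{
    if (a->tv_sec != b->tv_sec)
        return a->tv_sec < b->tv_sec ? -1 : 1;
    if (a->tv_nsec != b->tv_nsec)
        return a->tv_nsec < b->tv_nsec ? -1 : 1;
    return 0;
}

/* Never report more time remaining than the caller originally asked for:
 * the result is min(requested, remaining). */
static struct sketch_timespec
timeout_to_report(struct sketch_timespec requested,
                  struct sketch_timespec remaining)
{
    if (sketch_timespec_compare(&remaining, &requested) > 0)
        return requested;
    return remaining;
}
```

Because the wait itself is rounded up internally, the remaining time computed afterwards can exceed the request; taking the minimum hides that rounding from userspace.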
02 Feb, 2006
2 commits
-
Most of the 64 bit architectures will zero extend the first argument to
compat_sys_{openat,newfstatat,futimesat} which will fail if the 32 bit
syscall was passed AT_FDCWD (which is a small negative number). Declare
the first argument to be an unsigned int which will force the correct
sign extension when the internal functions are called in each case.

Also, do some small white space cleanups in fs/compat.c.
Signed-off-by: Stephen Rothwell
Acked-by: David S. Miller
Signed-off-by: Linus Torvalds
-
fs/compat.c: In function `compat_sys_pselect7':
fs/compat.c:1820: warning: passing arg 5 of `compat_core_sys_select' from incompatible pointer type

Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
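The AT_FDCWD problem from the compat_sys_{openat,newfstatat,futimesat} change above can be reproduced with ordinary integer conversions. This is a userspace illustration only: the function names are hypothetical, SKETCH_AT_FDCWD mirrors the Linux value of -100, and the unsigned-to-int conversion assumes the usual two's-complement behaviour.

```c
#include <stdint.h>

/* Linux defines AT_FDCWD as -100; mirrored here under a sketch name. */
#define SKETCH_AT_FDCWD (-100)

/* What a 64-bit callee sees when the 32-bit register value is merely
 * zero-extended: a large positive number instead of -100. */
static int64_t seen_zero_extended(uint32_t reg32)
{
    return (int64_t)(uint64_t)reg32;
}

/* Taking the argument as unsigned int and handing it on as int keeps only
 * the low 32 bits and reinterprets them, which recovers the negative value
 * (two's-complement conversion assumed). */
static int64_t seen_via_unsigned_int(unsigned int dfd)
{
    int as_int = (int)dfd;
    return (int64_t)as_int;
}
```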
20 Jan, 2006
1 commit
-
The compat layer timeout handling changes in:
9f72949f679df06021c9e43886c9191494fdb007
are busted. This is most easily seen with an X application
that uses sub-second select/poll timeout such as emacs. You
hit a key and it takes a second or so before the app responds.

The two ROUND_UP() calls upon entry are using {tv,ts}_sec where it
should instead be using {tv_usec,ts_nsec}, which perfectly explains
the observed incorrect behavior.

Another bug shot down with git bisect.
Signed-off-by: David S. Miller
Signed-off-by: Linus Torvalds
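The field mix-up can be illustrated with a simplified seconds-plus-microseconds to ticks conversion. This is only a sketch of the class of bug, not the code from fs/compat.c: SKETCH_HZ, the macro name, the struct and both functions are assumptions made for the example.

```c
/* Assumed tick rate for the sketch. */
#define SKETCH_HZ 250
#define SKETCH_ROUND_UP(x, y) (((x) + (y) - 1) / (y))

struct sketch_timeval {
    long tv_sec;
    long tv_usec;
};

/* Wrong field: the sub-second part is never looked at. */
static long timeout_ticks_buggy(struct sketch_timeval tv)
{
    long t = SKETCH_ROUND_UP(tv.tv_sec, 1000000 / SKETCH_HZ);
    return t + tv.tv_sec * SKETCH_HZ;
}

/* Round the microseconds up to whole ticks, then add the whole seconds. */
static long timeout_ticks_fixed(struct sketch_timeval tv)
{
    long t = SKETCH_ROUND_UP(tv.tv_usec, 1000000 / SKETCH_HZ);
    return t + tv.tv_sec * SKETCH_HZ;
}
```

With a 100 ms request the fixed version yields 25 ticks at the assumed rate, while the buggy version ignores the microseconds entirely, so any sub-second timeout is computed incorrectly.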
19 Jan, 2006
2 commits
-
The following implementation of ppoll() and pselect() system calls
depends on the architecture providing a TIF_RESTORE_SIGMASK flag in the
thread_info.

These system calls have to change the signal mask during their
operation, and signal handlers must be invoked using the new, temporary
signal mask. The old signal mask must be restored either upon successful
exit from the system call, or upon returning from the invoked signal
handler if the system call is interrupted. We can't simply restore the
original signal mask and return to userspace, since the restored signal
mask may actually block the signal which interrupted the system call.

The TIF_RESTORE_SIGMASK flag deals with this by causing the syscall exit
path to trap into do_signal() just as TIF_SIGPENDING does, and by
causing do_signal() to use the saved signal mask instead of the current
signal mask when setting up the stack frame for the signal handler -- or
by causing do_signal() to simply restore the saved signal mask in the
case where there is no handler to be invoked.

The first patch implements the sys_pselect() and sys_ppoll() system
calls, which are present only if TIF_RESTORE_SIGMASK is defined. That
#ifdef should go away in time when all architectures have implemented
it. The second patch implements TIF_RESTORE_SIGMASK for the PowerPC
kernel (in the -mm tree), and the third patch then removes the
arch-specific implementations of sys_rt_sigsuspend() and replaces them
with generic versions using the same trick.

The fourth and fifth patches, provided by David Howells, implement
TIF_RESTORE_SIGMASK for FR-V and i386 respectively, and the sixth patch
adds the syscalls to the i386 syscall table.

This patch:

Add the pselect() and ppoll() system calls, providing core routines usable by
the original select() and poll() system calls and also the new calls (with
their semantics w.r.t timeouts).

Signed-off-by: David Woodhouse
Cc: Michael Kerrisk
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
-
Here is a series of patches which introduce in total 13 new system calls
which take a file descriptor/filename pair instead of a single file
name. These functions, openat etc, have been discussed on numerous
occasions. They are needed to implement race-free filesystem traversal,
they are necessary to implement a virtual per-thread current working
directory (think multi-threaded backup software), etc.

We have in glibc today implementations of the interfaces which use the
/proc/self/fd magic. But this code is rather expensive. Here are some
results (similar to what Jim Meyering posted before).

The test creates a deep directory hierarchy on a tmpfs filesystem. Then
rm -fr is used to remove all directories. Without syscall support I get
this:

real 0m31.921s
user 0m0.688s
sys 0m31.234s

With syscall support the results are much better:

real 0m20.699s
user 0m0.536s
sys 0m20.149s

The interfaces are for obvious reasons currently not much used. But they'll
be used. coreutils (and Jeff's posixutils) are already using them.
Furthermore, code like ftw/fts in libc (maybe even glob) will also start using
them. I expect a patch to make follow soon. Every program which is walking
the filesystem tree will benefit.

Signed-off-by: Ulrich Drepper
Signed-off-by: Alexey Dobriyan
Cc: Christoph Hellwig
Cc: Al Viro
Acked-by: Ingo Molnar
Cc: Michael Kerrisk
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
15 Jan, 2006
1 commit
-
Remove the "inline" keyword from a bunch of big functions in the kernel with
the goal of shrinking it by 30kb to 40kb

Signed-off-by: Arjan van de Ven
Signed-off-by: Ingo Molnar
Acked-by: Jeff Garzik
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
09 Jan, 2006
1 commit
-
When making an fcntl locking call through compat_sys_fcntl64 (i.e. a 32bit
app on a 64bit kernel), the syscall can return a locking range that is in
conflict with the queried lock.

If some aspect of this range does not fit in the 32bit structure, something
needs to be done.

The current code is wrong in several respects:
- It returns data to userspace even if no conflict was found
  i.e. it should check l_type for F_UNLCK
- It returns -EOVERFLOW too aggressively. A lock range covering
  the last possible byte of the file (start = COMPAT_OFF_T_MAX,
  len = 1) should be possible, but is rejected with the current test.
- An extra-long 'len' should not be a problem. Only that part
  of the conflicting lock that would be visible to the 32bit
  app needs to be reported to the 32bit app anyway.

This patch addresses those three issues and adds a comment to (hopefully)
record it for posterity.

Note: this patch mainly affects test-cases. Real applications rarely if
ever see the problems.

This patch has been tested (LSB test suite), and works.
Signed-off-by: Neil Brown
Cc: Arnd Bergmann
Cc: Christoph Hellwig
Cc: Matthew Wilcox
Cc: Trond Myklebust
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
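The three points listed above can be sketched as one small fix-up routine applied to the conflicting lock before it is copied to the 32bit caller. All names and constants below are hypothetical stand-ins (including the SKETCH_ values for the lock types and the offset limit); the kernel's actual code may be structured differently.

```c
#include <errno.h>
#include <stdint.h>

/* Sketch-local stand-ins for COMPAT_OFF_T_MAX and the lock types. */
#define SKETCH_COMPAT_OFF_T_MAX INT32_MAX
#define SKETCH_F_RDLCK 0
#define SKETCH_F_UNLCK 2

struct sketch_flock64 {
    short   l_type;
    int64_t l_start;
    int64_t l_len;
};

/* Returns 0 if the (possibly trimmed) range can be reported to a 32bit
 * caller, or -EOVERFLOW if even its start is out of reach. */
static int fit_conflict_for_compat(struct sketch_flock64 *f)
{
    if (f->l_type == SKETCH_F_UNLCK)
        return 0;                       /* no conflict: no range to check */
    if (f->l_start > SKETCH_COMPAT_OFF_T_MAX)
        return -EOVERFLOW;              /* start itself is unrepresentable */
    if (f->l_len > SKETCH_COMPAT_OFF_T_MAX)
        f->l_len = SKETCH_COMPAT_OFF_T_MAX;  /* report only the visible part */
    return 0;
}
```

A lock starting at the last representable byte with length 1 passes, and an over-long length is trimmed rather than rejected.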
05 Jan, 2006
1 commit
-
In particular, allow over-large read- or write-requests to be downgraded
to a more reasonable range, rather than considering them outright errors.

We want to protect lower layers from (the sadly all too common) overflow
conditions, but prefer to do so by chopping the requests up, rather than
just refusing them outright.

Cc: Peter Anvin
Cc: Ulrich Drepper
Cc: Andi Kleen
Cc: Al Viro
Signed-off-by: Linus Torvalds
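Downgrading rather than refusing can be sketched as a clamp on the requested byte count. The limit used here (the largest page-aligned value that fits in an int, with an assumed 4096-byte page) and both names are illustrative assumptions, not necessarily what the patch uses.

```c
#include <limits.h>
#include <stddef.h>

/* Assumed page size and an assumed page-aligned upper bound. */
#define SKETCH_PAGE_SIZE    4096UL
#define SKETCH_MAX_RW_COUNT ((size_t)INT_MAX & ~(SKETCH_PAGE_SIZE - 1))

/* Shrink an over-large request instead of failing it. The caller then
 * sees a short read or write and can issue the remainder itself. */
static size_t clamp_rw_count(size_t count)
{
    return count > SKETCH_MAX_RW_COUNT ? SKETCH_MAX_RW_COUNT : count;
}
```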
23 Nov, 2005
1 commit
-
In fs/compat.c, whenever put_compat_statfs() returns an error, the
containing syscall returns -EFAULT. This is presumably by analogy with the
non-compat case, where any non-zero code from copy_to_user() should be
translated into an EFAULT. However, put_compat_statfs() can also return
-EOVERFLOW. The same applies for put_compat_statfs64().

This bug can be observed with a statfs() on a hugetlbfs directory.
hugetlbfs, when mounted without limits reports available, free and total
blocks as -1 (itself a bug, another patch coming). statfs() will
mysteriously return EFAULT although its parameters are perfectly valid
addresses.

This patch causes the compat versions of statfs() and statfs64() to
correctly propagate the return values from put_compat_statfs() and
put_compat_statfs64().

Signed-off-by: David Gibson
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
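The difference between collapsing every failure into -EFAULT and propagating the helper's own code can be sketched as follows. mock_put_compat_statfs() is a hypothetical stand-in that fails with -EOVERFLOW for block counts that do not fit in 32 bits and with -EFAULT for a bad user buffer; the two callers are likewise illustrative.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical stand-in for put_compat_statfs(). */
static int mock_put_compat_statfs(uint64_t blocks, int user_buffer_ok)
{
    if (blocks > 0xffffffffULL)
        return -EOVERFLOW;      /* value does not fit the 32-bit field */
    if (!user_buffer_ok)
        return -EFAULT;         /* copy to userspace failed */
    return 0;
}

/* Old behaviour: any failure is reported as -EFAULT. */
static int compat_statfs_buggy(uint64_t blocks, int user_buffer_ok)
{
    if (mock_put_compat_statfs(blocks, user_buffer_ok))
        return -EFAULT;
    return 0;
}

/* Fixed behaviour: hand the helper's return value straight back. */
static int compat_statfs_fixed(uint64_t blocks, int user_buffer_ok)
{
    return mock_put_compat_statfs(blocks, user_buffer_ok);
}
```

A block count of -1, as in the hugetlbfs case above, is all ones when viewed as an unsigned 64-bit value, so it takes the overflow path.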
21 Nov, 2005
1 commit
-
Originally for 2.6.16, but the semaphore causes problems for some
people so get rid of it now.

It's not needed anymore because the ioctl hash table is never changed
at run time now.

Signed-off-by: Andi Kleen
Signed-off-by: Linus Torvalds
30 Oct, 2005
1 commit
-
update_mem_hiwater has attracted various criticisms, in particular from those
concerned with mm scalability. Originally it was called whenever rss or
total_vm got raised. Then many of those callsites were replaced by a timer
tick call from account_system_time. Now Frank van Maarseveen reports that to
be found inadequate. How about this? Works for Frank.

Replace update_mem_hiwater, a poor combination of two unrelated ops, by macros
update_hiwater_rss and update_hiwater_vm. Don't attempt to keep
mm->hiwater_rss up to date at timer tick, nor every time we raise rss (usually
by 1): those are hot paths. Do the opposite, update only when about to lower
rss (usually by many), or just before final accounting in do_exit. Handle
mm->hiwater_vm in the same way, though it's much less of an issue. Demand
that whoever collects these hiwater statistics do the work of taking the
maximum with rss or total_vm.

And there has been no collector of these hiwater statistics in the tree. The
new convention needs an example, so match Frank's usage by adding a VmPeak
line above VmSize to /proc/<pid>/status, and also a VmHWM line above VmRSS
(High-Water-Mark or High-Water-Memory).

There was a particular anomaly during mremap move, that hiwater_vm might be
captured too high. A fleeting such anomaly remains, but it's quickly
corrected now, whereas before it would stick.

What locking? None: if the app is racy then these statistics will be racy,
it's not worth any overhead to make them exact. But whenever it suits,
hiwater_vm is updated under exclusive mmap_sem, and hiwater_rss under
page_table_lock (for now) or with preemption disabled (later on): without
going to any trouble, minimize the time between reading current values and
updating, to minimize those occasions when a racing thread bumps a count up
and back down in between.

Signed-off-by: Hugh Dickins
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds
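The "update only when about to lower" convention can be sketched with a toy counter. The struct and the raise/lower/peak helpers are hypothetical; only the name update_hiwater_rss and the rule that readers take the maximum with the current rss come from the message above (where it is described as a macro rather than a function).

```c
/* Toy stand-in for the relevant mm fields. */
struct sketch_mm {
    unsigned long rss;
    unsigned long hiwater_rss;
};

/* Record the current rss as the high-water mark if it is a new peak. */
static void update_hiwater_rss(struct sketch_mm *mm)
{
    if (mm->hiwater_rss < mm->rss)
        mm->hiwater_rss = mm->rss;
}

/* Hot path: raising rss does no high-water bookkeeping at all. */
static void raise_rss(struct sketch_mm *mm, unsigned long pages)
{
    mm->rss += pages;
}

/* Cold path: capture the peak just before the value goes down. */
static void lower_rss(struct sketch_mm *mm, unsigned long pages)
{
    update_hiwater_rss(mm);
    mm->rss -= pages;
}

/* Collectors must take the maximum themselves, since hiwater_rss can lag. */
static unsigned long peak_rss(const struct sketch_mm *mm)
{
    return mm->hiwater_rss > mm->rss ? mm->hiwater_rss : mm->rss;
}
```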
15 Sep, 2005
1 commit
-
Missing acct_update_integrals() and update_mem_hiwater() calls
compared to its native counterpart.

Signed-off-by: David S. Miller
10 Sep, 2005
1 commit
-
Fix up fs/compat.c fixes.