17 Jul, 2007
40 commits
-
The recent PRIVATE and REQUEUE_PI changes to the futex code made it hard to
read. Tidy it up.

Signed-off-by: Thomas Gleixner
Cc: Ingo Molnar
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Update the description of struct file_system_type and get_sb() in
Documentation/filesystems/vfs.txt to match the current code.

Signed-off-by: Borislav Petkov
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Signed-off-by: Robert P. J. Day
Cc: Jens Axboe
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Improve performance of sys_time(). sys_time() returns time in seconds, but
it does so by calling do_gettimeofday() and then returning the tv_sec
portion of the GTOD time. But the data structure "xtime", which is updated
by every timer/scheduler tick, already offers HZ granularity time.

The patch improves the sysbench OLTP macrobenchmark significantly:

2.6.22-rc6:
#threads
1: transactions: 3733 (373.21 per sec.)
2: transactions: 6676 (667.46 per sec.)
3: transactions: 6957 (695.50 per sec.)
4: transactions: 7055 (705.48 per sec.)
5: transactions: 6596 (659.33 per sec.)

2.6.22-rc6 + sys_time.patch:
1: transactions: 4005 (400.47 per sec.)
2: transactions: 7379 (737.77 per sec.)
3: transactions: 7347 (734.49 per sec.)
4: transactions: 7468 (746.65 per sec.)
5: transactions: 7428 (742.47 per sec.)

Mixed API uses of gettimeofday() and time() are guaranteed to be coherent
via the use of an at-most-once-per-second slowpath that updates xtime.

[akpm@linux-foundation.org: build fixes]
Signed-off-by: Ingo Molnar
Cc: John Stultz
Cc: Thomas Gleixner
Cc: Roman Zippel
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Replace (n & (n-1)) in the context of power of 2 checks with
is_power_of_2().

Signed-off-by: vignesh babu
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Replace (n & (n-1)) in the context of power of 2 checks with is_power_of_2()
Signed-off-by: vignesh babu
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
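As an illustration of the two is_power_of_2() conversions above, here is a minimal userspace sketch of the helper next to the open-coded idiom it replaces. It is modelled on the kernel helper rather than copied from the patches, and the name open_coded_check() is made up for the comparison.

```c
#include <stdbool.h>

/*
 * Userspace sketch modelled on the kernel's is_power_of_2() helper.
 * Zero is rejected explicitly; the bare (n & (n - 1)) == 0 test is
 * also true for n == 0, which is rarely what the caller wants.
 */
bool is_power_of_2(unsigned long n)
{
	return n != 0 && (n & (n - 1)) == 0;
}

/* Hypothetical name: the open-coded idiom, shown only for comparison. */
bool open_coded_check(unsigned long n)
{
	return (n & (n - 1)) == 0;
}
```

The two functions agree for every non-zero input and differ only for 0, so a conversion is behaviour-preserving only where 0 cannot occur or was already handled separately.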
sys_ioctl() was only exported for our first version of compat ioctl
handling. Now that the whole compat ioctl handling mess is more or less
sorted out there are no more modular users left and we can kill it.

There's one exception and that's sparc64's solaris compat module, but
sparc64 has its own export predating the generic one by years for that
which this patch leaves untouched.

Signed-off-by: Christoph Hellwig
Acked-by: David S. Miller
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Add info that the Code: bytes line contains <xy> or (wxyz) in some
architecture oops reports and what that means.

Add a script by Andi Kleen that reads the Code: line from an Oops report
file and generates assembly code from the hex bytes.

Signed-off-by: Randy Dunlap
Cc: Andi Kleen
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
While working on unshare support for the network namespace I noticed we
were putting clone flags in an int. Which is weird because the syscall
uses unsigned long and we at least need an unsigned to properly hold all of
the unshare flags.

So to make the code consistent, this patch updates the code to use
unsigned long instead of int for the clone flags in those places
where we get it wrong today.

Signed-off-by: Eric W. Biederman
Acked-by: Cedric Le Goater
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
ext3_change_inode_journal_flag() is only called from one location:
ext3_ioctl(EXT3_IOC_SETFLAGS). That ioctl case already has an IS_RDONLY()
call in it so this one is superfluous.

Signed-off-by: Dave Hansen
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
The procfs-guide claims that 'the parameter start doesn't seem to be used
anywhere in the kernel'. This is out of date. In linux/fs/proc/generic.c
we find a very nice description of the parameters to read_func. The
appended patch replaces the bogus description with this (as far as I know)
accurate one.

Cc: "Randy.Dunlap"
Cc: "Eric W. Biederman"
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Signed-off-by: Robert P. J. Day
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
The OpenVZ Linux kernel team has discovered a problem with 32-bit quota tools
working on 64-bit architectures. In the 2.6.10 kernel the sys32_quotactl()
function was replaced by sys_quotactl() with the comment "sys_quotactl seems
to be 32/64bit clean, enable it for 32bit". However, this isn't right. Look at
the if_dqblk structure:

struct if_dqblk {
	__u64 dqb_bhardlimit;
	__u64 dqb_bsoftlimit;
	__u64 dqb_curspace;
	__u64 dqb_ihardlimit;
	__u64 dqb_isoftlimit;
	__u64 dqb_curinodes;
	__u64 dqb_btime;
	__u64 dqb_itime;
	__u32 dqb_valid;
};

For 32-bit quota tools sizeof(if_dqblk) == 0x44.
But for a 64-bit kernel its size is 0x48, because of alignment!
Thus we have a problem. The attached patch reintroduces the sys32_quotactl()
function, which handles this and related situations.

[michal.k.k.piotrowski@gmail.com: build fix]
[akpm@linux-foundation.org: Make it link with CONFIG_QUOTA=n]
Signed-off-by: Vasily Tarasov
Cc: Andi Kleen
Cc: "Luck, Tony"
Cc: Jan Kara
Cc:
Signed-off-by: Michal Piotrowski
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
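The 0x44 versus 0x48 size difference described in the quota commit above can be reproduced in userspace. The sketch below assumes a 64-bit host where uint64_t is 8-byte aligned and a GCC or Clang compiler; the typedef u64_align4 and both struct names are illustrative and not taken from the patch.

```c
#include <stddef.h>
#include <stdint.h>

/* Illustrative typedef: a 64-bit field with i386-style 4-byte alignment. */
typedef uint64_t __attribute__((aligned(4))) u64_align4;

/* Layout as a 64-bit kernel sees it: 8-byte aligned fields, padded tail. */
struct if_dqblk_native {
	uint64_t dqb_bhardlimit;
	uint64_t dqb_bsoftlimit;
	uint64_t dqb_curspace;
	uint64_t dqb_ihardlimit;
	uint64_t dqb_isoftlimit;
	uint64_t dqb_curinodes;
	uint64_t dqb_btime;
	uint64_t dqb_itime;
	uint32_t dqb_valid;
};

/* Layout as a 32-bit i386 binary sees it: no padding after dqb_valid. */
struct if_dqblk_i386 {
	u64_align4 dqb_bhardlimit;
	u64_align4 dqb_bsoftlimit;
	u64_align4 dqb_curspace;
	u64_align4 dqb_ihardlimit;
	u64_align4 dqb_isoftlimit;
	u64_align4 dqb_curinodes;
	u64_align4 dqb_btime;
	u64_align4 dqb_itime;
	uint32_t dqb_valid;
};

/* 8 * 8 + 4 = 68 bytes, rounded up to the struct's 8-byte alignment. */
size_t native_size(void)
{
	return sizeof(struct if_dqblk_native);
}

/* 8 * 8 + 4 = 68 bytes, already a multiple of the 4-byte alignment. */
size_t i386_size(void)
{
	return sizeof(struct if_dqblk_i386);
}
```

The field offsets are identical in both layouts; only the trailing padding differs, which is enough to make a copy sized for one layout wrong for the other.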
One common problem with 32 bit system call and ioctl emulation is the
different alignment rules between i386 and 64 bit machines. A number of
drivers work around this by marking the compat structures as
'__attribute__((packed))', which is not the right solution because it breaks
all the non-x86 architectures that want to use the same compat code.

Hopefully, this patch improves the situation: it introduces two new types,
compat_u64 and compat_s64. These are defined on all architectures to have
the same size and alignment as the 32 bit version of u64 and s64.

Signed-off-by: Arnd Bergmann
Acked-by: David S. Miller
Cc: David Woodhouse
Cc: Andi Kleen
Cc: Benjamin Herrenschmidt
Cc: Vasily Tarasov
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Fix parameter name in audit_core_dumps for kerneldoc.
Signed-off-by: Henrik Kretzschmar
Cc: Al Viro
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
aa95387774039096c11803c04011f1aa42d85758 removed the implementation of
lock_cpu_hotplug_interruptible and all users of it. This stub definition
for !CONFIG_HOTPLUG_CPU was left over -- kill it now.

Signed-off-by: Nathan Lynch
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
ext4_orphan_add() and ext4_orphan_del() functions lock sb->s_lock with a
transaction started, while ext4_mark_recovery_complete() waits for a
transaction while holding sb->s_lock, thus leading to a possible deadlock.
At the moment we call ext4_mark_recovery_complete() from ext4_remount() we
have done all the work needed for remounting and thus we are safe to drop
sb->s_lock before we wait for transactions to commit. Note that at this
moment we are still guarded by s_umount lock against other remounts/umounts.

Signed-off-by: Jan Kara
Cc: Eric Sandeen
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
ext3_orphan_add() and ext3_orphan_del() functions lock sb->s_lock with a
transaction started, while ext3_mark_recovery_complete() waits for a
transaction while holding sb->s_lock, thus leading to a possible deadlock.
At the moment we call ext3_mark_recovery_complete() from ext3_remount() we
have done all the work needed for remounting and thus we are safe to drop
sb->s_lock before we wait for transactions to commit. Note that at this
moment we are still guarded by s_umount lock against other remounts/umounts.

Signed-off-by: Jan Kara
Cc: Eric Sandeen
Cc:
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Especially when !CONFIG_HOTPLUG_CPU, avoid needlessly allocating resources for
CPUs that can never become available.

Signed-off-by: Jan Beulich
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
It should improve performance in some scenarios where a lot of
these nsproxy objects are created by unsharing namespaces. This is
a typical use of virtual servers that are being created or entered.

This is also a good tool to find leaks and gather statistics on
namespace usage.

Signed-off-by: Cedric Le Goater
Cc: Herbert Poetzl
Cc: Pavel Emelianov
Cc: "Eric W. Biederman"
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
dup_mnt_ns() and clone_uts_ns() return NULL on failure. This is wrong:
create_new_namespaces() uses ERR_PTR() to catch an error. This means that the
subsequent create_new_namespaces() will hit BUG_ON() in copy_mnt_ns() or
copy_utsname().

Modify create_new_namespaces() to also use the errors returned by the
copy_*_ns routines and not to systematically return ENOMEM.

[oleg@tv-sign.ru: better changelog]
Signed-off-by: Cedric Le Goater
Cc: Serge E. Hallyn
Cc: Badari Pulavarty
Cc: Pavel Emelianov
Cc: Herbert Poetzl
Cc: Eric W. Biederman
Cc: Oleg Nesterov
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
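The NULL versus ERR_PTR() mismatch in the namespace commit above is easier to see with the helpers written out. This is a simplified userspace re-implementation of the kernel's ERR_PTR(), PTR_ERR() and IS_ERR() idiom, assuming an LP64 host; struct ns_sketch, copy_ns_sketch() and create_sketch() are hypothetical stand-ins, not the functions touched by the patch.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_ERRNO 4095

/* Encode a negative errno value in the top page of the address space. */
static inline void *ERR_PTR(long error)
{
	return (void *)error;
}

static inline long PTR_ERR(const void *ptr)
{
	return (long)ptr;
}

/* True only for encoded errors; NULL is NOT an error pointer. */
static inline bool IS_ERR(const void *ptr)
{
	return (unsigned long)ptr >= (unsigned long)-MAX_ERRNO;
}

struct ns_sketch { int id; };
static struct ns_sketch the_ns;

/* Hypothetical copy routine: fails with the given errno, or succeeds. */
struct ns_sketch *copy_ns_sketch(int fail_with)
{
	if (fail_with)
		return ERR_PTR(-fail_with);
	return &the_ns;
}

/* The caller propagates the callee's error instead of a fixed -ENOMEM. */
long create_sketch(int fail_with)
{
	struct ns_sketch *ns = copy_ns_sketch(fail_with);

	if (IS_ERR(ns))
		return PTR_ERR(ns);
	return 0;
}
```

Because IS_ERR(NULL) is false, a callee that returns NULL on failure slips past a caller that only checks IS_ERR(), which is the kind of mismatch the changelog describes.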
Repair indenting bustage.
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
It's useful sometimes to disable the softlockup checker at boottime.
Especially if it triggers during a distro install.

Signed-off-by: Dave Jones
Cc: Ingo Molnar
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Remove not only the references to Cobalt NVRAM, but the header file as
well.

Signed-off-by: Robert P. J. Day
Acked-by: Tim Hockin
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Add an item to the RCU documentation checklist noting that RCU callbacks
can run in parallel.

Signed-off-by: Paul E. McKenney
Cc: Steven Rostedt
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
fs/binfmt_elf.c: In function 'load_elf_binary':
fs/binfmt_elf.c:1002: warning: 'interp_map_addr' may be used uninitialized in this function

The compiler (gcc-4.1.0) is correct, but it failed to notice that we didn't
use the resulting value.

Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Revert my do_ioctl() debugging patch: Paul fixed the bug.
Cc: Paul Fulghum
Cc: Alan Cox
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
This patch enables the unshare of user namespaces.
It adds a new clone flag CLONE_NEWUSER and implements copy_user_ns() which
resets the current user_struct and adds a new root user (uid == 0).

For now, unsharing the user namespace allows a process to reset its
user_struct accounting and uid 0 in the new user namespace should be contained
using appropriate means, for instance selinux.

The plan, when the full support is complete (all uid checks covered), is to
keep the original user's rights in the original namespace, and let a process
become uid 0 in the new namespace, with full capabilities to the new
namespace.

Signed-off-by: Serge E. Hallyn
Signed-off-by: Cedric Le Goater
Acked-by: Pavel Emelianov
Cc: Herbert Poetzl
Cc: Kirill Korotaev
Cc: Eric W. Biederman
Cc: Chris Wright
Cc: Stephen Smalley
Cc: James Morris
Cc: Andrew Morgan
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Basically, it will allow a process to unshare its user_struct table,
resetting at the same time its own user_struct and all the associated
accounting.

A new root user (uid == 0) is added to the user namespace upon creation.
Such root users have full privileges and it seems that these privileges
should be controlled through some means (process capabilities?)

The unshare is not included in this patch.

Changes since [try #4]:
- Updated get_user_ns and put_user_ns to accept NULL, and
  get_user_ns to return the namespace.

Changes since [try #3]:
- moved struct user_namespace to files user_namespace.{c,h}

Changes since [try #2]:
- removed struct user_namespace* argument from find_user()

Changes since [try #1]:
- removed struct user_namespace* argument from find_user()
- added a root_user per user namespace

Signed-off-by: Cedric Le Goater
Signed-off-by: Serge E. Hallyn
Acked-by: Pavel Emelianov
Cc: Herbert Poetzl
Cc: Kirill Korotaev
Cc: Eric W. Biederman
Cc: Chris Wright
Cc: Stephen Smalley
Cc: James Morris
Cc: Andrew Morgan
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
CONFIG_UTS_NS and CONFIG_IPC_NS have very little value as they only
deactivate the unshare of the uts and ipc namespaces and do not improve
performance.

Signed-off-by: Cedric Le Goater
Acked-by: "Serge E. Hallyn"
Cc: Eric W. Biederman
Cc: Herbert Poetzl
Cc: Pavel Emelianov
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Add TTY input auditing, used to audit system administrator's actions. This is
required by various security standards such as DCID 6/3 and PCI to provide
non-repudiation of administrator's actions and to allow a review of past
actions if the administrator seems to overstep their duties or if the system
becomes misconfigured for unknown reasons. These requirements do not make it
necessary to audit TTY output as well.

Compared to a user-space keylogger, this approach records TTY input using the
audit subsystem, correlated with other audit events, and it is completely
transparent to the user-space application (e.g. the console ioctls still
work).

TTY input auditing works on a higher level than auditing all system calls
within the session, which would produce an overwhelming amount of mostly
useless audit events.

Add an "audit_tty" attribute, inherited across fork(). Data read from TTYs
by a process with the attribute is sent to the audit subsystem by the kernel.
The audit netlink interface is extended to allow modifying the audit_tty
attribute, and to allow sending explanatory audit events from user-space (for
example, a shell might send an event containing the final command, after the
interactive command-line editing and history expansion is performed, which
might be difficult to decipher from the TTY input alone).

Because the "audit_tty" attribute is inherited across fork(), it would be set
e.g. for sshd restarted within an audited session. To prevent this, the
audit_tty attribute is cleared when a process with no open TTY file
descriptors (e.g. after daemon startup) opens a TTY.

See https://www.redhat.com/archives/linux-audit/2007-June/msg00000.html for a
more detailed rationale document for an older version of this patch.

[akpm@linux-foundation.org: build fix]
Signed-off-by: Miloslav Trmac
Cc: Al Viro
Cc: Alan Cox
Cc: Paul Fulghum
Cc: Casey Schaufler
Cc: Steve Grubb
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Currently we handle spurious IRQ activity based upon seeing a lot of
invalid interrupts, and we clear things back on the basis of lots of valid
interrupts.

Unfortunately in some cases you get legitimate invalid interrupts caused by
timing asynchronicity between the PCI bus and the APIC bus when disabling
interrupts and pulling other tricks. In this case although the spurious
IRQs are not a problem our unhandled counters didn't clear and they act as
a slow running timebomb. (This is effectively what the serial port/tty
problem that was fixed by clearing counters when registering a handler
showed up)

It's easy enough to add a second parameter - time. This means that if we
see a regular stream of harmless spurious interrupts which are not harming
processing we don't go off and do something stupid like disable the IRQ
after a month of running. OTOH lockups and performance killers show up a
lot more than 10/second

[akpm@linux-foundation.org: cleanup]
Signed-off-by: Alan Cox
Cc: Ingo Molnar
Cc: Thomas Gleixner
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
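The idea of adding time as a second parameter to the spurious IRQ accounting above can be sketched as follows. This is only an illustration under stated assumptions: the struct, the function name and the window of 10 ticks (a tenth of a second at an assumed 100 ticks per second) are hypothetical, and the real kernel code is not reproduced here.

```c
/* Hypothetical per-IRQ bookkeeping for the sketch. */
struct irq_stats_sketch {
	unsigned long last_unhandled;	/* tick of the previous unhandled IRQ */
	unsigned int irqs_unhandled;	/* unhandled IRQs in the current burst */
};

/* Assumed window: 1/10 second at an assumed rate of 100 ticks per second. */
#define UNHANDLED_WINDOW_TICKS 10UL

/*
 * Record one unhandled interrupt at time 'now' (in ticks). If the previous
 * unhandled interrupt is older than the window, the burst counter restarts,
 * so a slow trickle of harmless spurious interrupts never accumulates.
 */
void note_unhandled_sketch(struct irq_stats_sketch *s, unsigned long now)
{
	if (now - s->last_unhandled > UNHANDLED_WINDOW_TICKS)
		s->irqs_unhandled = 1;
	else
		s->irqs_unhandled++;
	s->last_unhandled = now;
}
```

With this shape, only interrupts arriving faster than roughly ten per second keep incrementing the counter, which matches the changelog's observation that real lockups show up at a much higher rate.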
The intel-rng printed a nice well formatted message when the port was
disabled. Someone then came along and blindly trashed it by screwing up a
trim down to 80 columns.

Put it back into the right format and keep the overlong lines as the result
is also MUCH easier to read in this specific case.

Signed-off-by: Alan Cox
Cc: Michael Buesch
Acked-by: Jeff Garzik
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
I was seeing a null pointer deref in fs/super.c:vfs_kern_mount().
Some file system get_sb() handler was returning NULL mnt_sb with
a non-negative return value. I also noticed a "hugetlbfs: Bad
mount option:" message in the log.

Turns out that hugetlbfs_parse_options() was not checking for an
empty option string after call to strsep(). On failure,
hugetlbfs_parse_options() returns 1. hugetlbfs_fill_super() just
passed this return code back up the call stack where
vfs_kern_mount() missed the error and proceeded with a NULL mnt_sb.

Apparently introduced by patch:
hugetlbfs-use-lib-parser-fix-docs.patch

The problem was exposed by this line in my fstab:
none /huge hugetlbfs defaults 0 0
It can also be demonstrated by invoking mount of hugetlbfs
directly with no options or a bogus option.

This patch:
1) adds the check for empty option to hugetlbfs_parse_options(),
2) enhances the error message to bracket any unrecognized
   option with quotes,
3) modifies hugetlbfs_parse_options() to return -EINVAL on any
   unrecognized option,
4) adds a BUG_ON() to vfs_kern_mount() to catch any get_sb()
   handler that returns a NULL mnt->mnt_sb with a return value
   >= 0.

Signed-off-by: Lee Schermerhorn
Acked-by: Randy Dunlap
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
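The empty-option case fixed above comes from strsep() returning an empty token for an empty string or for consecutive separators. The userspace sketch below shows that behaviour with a check that skips empty tokens and returns -EINVAL for anything unrecognized. The function names, the 256-byte buffer and the prefix-only matching are assumptions made for the illustration, not the kernel implementation.

```c
#define _DEFAULT_SOURCE	/* for strsep() on glibc */
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Illustrative option list; only the key prefix is checked. */
static int known_option_sketch(const char *opt)
{
	static const char *const keys[] = {
		"uid=", "gid=", "mode=", "size=", "nr_inodes="
	};
	size_t i;

	for (i = 0; i < sizeof(keys) / sizeof(keys[0]); i++)
		if (strncmp(opt, keys[i], strlen(keys[i])) == 0)
			return 1;
	return 0;
}

/* Returns 0 on success, -EINVAL on an unrecognized or oversized string. */
int parse_options_sketch(const char *options)
{
	char buf[256];
	char *rest = buf;
	char *tok;

	if (!options)
		return 0;
	if (strlen(options) >= sizeof(buf))
		return -EINVAL;
	strcpy(buf, options);	/* strsep() modifies its input */

	while ((tok = strsep(&rest, ",")) != NULL) {
		if (*tok == '\0')	/* "" or ",," yields an empty token */
			continue;
		if (!known_option_sketch(tok)) {
			fprintf(stderr, "Bad mount option: \"%s\"\n", tok);
			return -EINVAL;
		}
	}
	return 0;
}
```

An empty option string, such as the one that results from a "defaults" entry in fstab, then parses cleanly, while a bogus option is reported in quotes and rejected with a negative errno.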
Use lib/parser.c to parse hugetlbfs mount options. Correct docs in
hugetlbpage.txt.

old size of hugetlbfs_fill_super: 675 bytes
new size of hugetlbfs_fill_super: 686 bytes
(hugetlbfs_parse_options() is inlined)

Signed-off-by: Randy Dunlap
Cc: Hugh Dickins
Cc: David Gibson
Cc: Adam Litke
Acked-by: William Lee Irwin III
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Signed-off-by: Dave Jones
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Signed-off-by: Jesper Juhl
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Removed kmalloc and memset in favor of kzalloc.
To explain the HFSPLUS_SB() macro in the removed memset call:
hfsplus_fs.h:#define HFSPLUS_SB(super) (*(struct hfsplus_sb_info *)(super)->s_fs_info)
Signed-off-by: Wyatt Banks
Cc: Roman Zippel
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
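The kmalloc plus memset to kzalloc conversion above has a direct userspace analogue, sketched here with malloc(), memset() and calloc() standing in for the kernel allocators. The struct and function names are hypothetical and unrelated to the hfsplus code.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a per-superblock info structure. */
struct sb_info_sketch {
	int flags;
	void *nls;
};

/* Before: allocate, then clear by hand (kmalloc + memset pattern). */
struct sb_info_sketch *alloc_then_clear(void)
{
	struct sb_info_sketch *p = malloc(sizeof(*p));

	if (!p)
		return NULL;
	memset(p, 0, sizeof(*p));
	return p;
}

/* After: one zeroing allocation (calloc stands in for kzalloc here). */
struct sb_info_sketch *alloc_zeroed(void)
{
	return calloc(1, sizeof(struct sb_info_sketch));
}
```

Both functions hand back zero-filled memory; the second form is shorter and leaves no window in which the memset could be forgotten or applied with the wrong size.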
Despite repeated attempts over the last two and a half years, this driver
seems somewhat persistent. Remove its deprecated status as it has existing
users who may not be in a position to migrate their apps to O_DIRECT.

Signed-off-by: Dave Jones
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds -
Use NULL instead of 0 for pointer:
drivers/misc/sony-laptop.c:1920:6: warning: Using plain integer as NULL pointer

Signed-off-by: Randy Dunlap
Acked-by: Mattia Dongili
Signed-off-by: Andrew Morton
Signed-off-by: Linus Torvalds