17 Jul, 2007

40 commits

  • The recent PRIVATE and REQUEUE_PI changes to the futex code made it hard to
    read. Tidy it up.

    Signed-off-by: Thomas Gleixner
    Cc: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Thomas Gleixner
     
  • Update the description of struct file_system_type and get_sb() in
    Documentation/filesystems/vfs.txt to match the current code.

    Signed-off-by: Borislav Petkov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Borislav Petkov
     
  • Signed-off-by: Robert P. J. Day
    Cc: Jens Axboe
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Robert P. J. Day
     
  • Improve performance of sys_time(). sys_time() returns time in seconds, but
    it does so by calling do_gettimeofday() and then returning the tv_sec
    portion of the GTOD time. But the data structure "xtime", which is updated
    by every timer/scheduler tick, already offers HZ granularity time.

    The patch improves the sysbench OLTP macrobenchmark significantly:

    2.6.22-rc6:

    #threads
    1: transactions: 3733 (373.21 per sec.)
    2: transactions: 6676 (667.46 per sec.)
    3: transactions: 6957 (695.50 per sec.)
    4: transactions: 7055 (705.48 per sec.)
    5: transactions: 6596 (659.33 per sec.)

    2.6.22-rc6 + sys_time.patch:

    1: transactions: 4005 (400.47 per sec.)
    2: transactions: 7379 (737.77 per sec.)
    3: transactions: 7347 (734.49 per sec.)
    4: transactions: 7468 (746.65 per sec.)
    5: transactions: 7428 (742.47 per sec.)

    Mixed API uses of gettimeofday() and time() are guaranteed to be coherent
    via the use of an at-most-once-per-second slowpath that updates xtime.
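
    The idea of answering time() from a tick-updated value can be sketched in
    user space as follows. This is only an illustration: struct coarse_clock,
    clock_tick() and coarse_time() are hypothetical names standing in for the
    kernel's xtime and its update path, and the coherency slowpath mentioned
    above is not modelled.

```c
#include <time.h>

/* Hypothetical stand-in for the kernel's tick-updated "xtime". */
struct coarse_clock {
    time_t sec;   /* whole seconds */
    long   nsec;  /* nanoseconds within the current second */
};

/* Called once per simulated timer tick: advance the clock by 1/hz second. */
void clock_tick(struct coarse_clock *c, long hz)
{
    c->nsec += 1000000000L / hz;
    if (c->nsec >= 1000000000L) {
        c->nsec -= 1000000000L;
        c->sec++;
    }
}

/* Fast path: return the cached seconds without reading a hardware clock. */
time_t coarse_time(const struct coarse_clock *c)
{
    return c->sec;
}
```

    A caller that only needs second resolution reads coarse_time() and never
    pays for a full time-of-day computation.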

    [akpm@linux-foundation.org: build fixes]
    Signed-off-by: Ingo Molnar
    Cc: John Stultz
    Cc: Thomas Gleixner
    Cc: Roman Zippel
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ingo Molnar
     
  • Replace (n & (n-1)) in the context of power of 2 checks with
    is_power_of_2().
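
    For readers unfamiliar with the idiom, a user-space sketch of such a
    helper is shown here. It is modelled on the kernel helper of the same
    name; open_coded_check() is a hypothetical name for the expression being
    replaced, included to show that the bare test also accepts 0.

```c
#include <stdbool.h>

/* User-space sketch modelled on the kernel helper of the same name. */
static inline bool is_power_of_2(unsigned long n)
{
    /* n & (n - 1) clears the lowest set bit; a power of two has only one. */
    return n != 0 && (n & (n - 1)) == 0;
}

/* The open-coded form: note that it is also true for n == 0. */
static inline bool open_coded_check(unsigned long n)
{
    return (n & (n - 1)) == 0;
}
```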

    Signed-off-by: vignesh babu
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    vignesh babu
     
  • Replace (n & (n-1)) in the context of power of 2 checks with is_power_of_2()

    Signed-off-by: vignesh babu
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    vignesh babu
     
  • sys_ioctl() was only exported for our first version of compat ioctl
    handling. Now that the whole compat ioctl handling mess is more or less
    sorted out there are no more modular users left and we can kill it.

    There's one exception and that's sparc64's solaris compat module, but
    sparc64 has its own export predating the generic one by years for that,
    which this patch leaves untouched.

    Signed-off-by: Christoph Hellwig
    Acked-by: David S. Miller
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Hellwig
     
  • Add info that the Code: bytes line contains <xy> or (wxyz) in some
    architecture oops reports and what that means.

    Add a script by Andi Kleen that reads the Code: line from an Oops report
    file and generates assembly code from the hex bytes.

    Signed-off-by: Randy Dunlap
    Cc: Andi Kleen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Randy Dunlap
     
  • While working on unshare support for the network namespace I noticed we
    were putting clone flags in an int, which is weird because the syscall
    uses unsigned long and we need at least an unsigned to properly hold all
    of the unshare flags.

    So to make the code consistent, this patch updates the code to use
    unsigned long instead of int for the clone flags in those places
    where we get it wrong today.
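
    A small user-space sketch of why the narrower signed type is a poor fit.
    EXAMPLE_FLAG_BIT31 is a hypothetical flag value chosen for illustration,
    and the int conversion shown is implementation-defined in C, so the exact
    result depends on the compiler and ABI.

```c
/* Hypothetical flag value with the top bit of a 32-bit word set. */
#define EXAMPLE_FLAG_BIT31 0x80000000UL

/* Old style: the flags pass through an int on the way. */
unsigned long flags_via_int(unsigned long flags)
{
    int narrowed = (int)flags;       /* implementation-defined for large values */
    return (unsigned long)narrowed;  /* sign-extends if narrowed is negative */
}

/* New style: the type matches the syscall argument end to end. */
unsigned long flags_via_ulong(unsigned long flags)
{
    return flags;
}
```

    On a typical LP64 system the first function turns the single high bit
    into a run of set upper bits, while the second returns the value intact.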

    Signed-off-by: Eric W. Biederman
    Acked-by: Cedric Le Goater
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Eric W. Biederman
     
  • ext3_change_inode_journal_flag() is only called from one location:
    ext3_ioctl(EXT3_IOC_SETFLAGS). That ioctl case already has an IS_RDONLY()
    call in it, so this one is superfluous.

    Signed-off-by: Dave Hansen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dave Hansen
     
  • The procfs-guide claims that 'the parameter start doesn't seem to be used
    anywhere in the kernel'. This is out of date. In linux/fs/proc/generic.c
    we find a very nice description of the parameters to read_func. The
    appended patch replaces the bogus description with this (as far as I know)
    accurate one.

    Cc: "Randy.Dunlap"
    Cc: "Eric W. Biederman"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    C. Scott Ananian
     
  • Signed-off-by: Robert P. J. Day
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Robert P. J. Day
     
  • The OpenVZ Linux kernel team has discovered a problem with 32-bit quota
    tools working on 64-bit architectures. In the 2.6.10 kernel, the
    sys32_quotactl() function was replaced by sys_quotactl() with the comment
    "sys_quotactl seems to be 32/64bit clean, enable it for 32bit". However,
    this isn't right. Look at the if_dqblk structure:

    struct if_dqblk {
            __u64 dqb_bhardlimit;
            __u64 dqb_bsoftlimit;
            __u64 dqb_curspace;
            __u64 dqb_ihardlimit;
            __u64 dqb_isoftlimit;
            __u64 dqb_curinodes;
            __u64 dqb_btime;
            __u64 dqb_itime;
            __u32 dqb_valid;
    };

    For 32-bit quota tools, sizeof(if_dqblk) == 0x44, but for a 64-bit kernel
    its size is 0x48 because of alignment. Thus we have a problem. The attached
    patch reintroduces the sys32_quotactl() function, which handles this and
    related situations.
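
    The size difference can be checked with a short user-space sketch. The
    struct below copies the layout quoted above using fixed-width types;
    if_dqblk_native, DQBLK_PAYLOAD and dqblk_tail_padding() are hypothetical
    names introduced for illustration only.

```c
#include <stddef.h>
#include <stdint.h>

/* Same member layout as the structure quoted above. */
struct if_dqblk_native {
    uint64_t dqb_bhardlimit;
    uint64_t dqb_bsoftlimit;
    uint64_t dqb_curspace;
    uint64_t dqb_ihardlimit;
    uint64_t dqb_isoftlimit;
    uint64_t dqb_curinodes;
    uint64_t dqb_btime;
    uint64_t dqb_itime;
    uint32_t dqb_valid;
};

/* Bytes occupied by the members: eight 64-bit fields plus one 32-bit field. */
enum {
    DQBLK_PAYLOAD = offsetof(struct if_dqblk_native, dqb_valid) + sizeof(uint32_t)
};

/* Tail padding added so that the struct size is a multiple of its alignment. */
static inline size_t dqblk_tail_padding(void)
{
    return sizeof(struct if_dqblk_native) - DQBLK_PAYLOAD;
}
```

    Where 64-bit integers are 8-byte aligned, the padding is 4 bytes and the
    total is 0x48; with i386-style 4-byte alignment there is no padding and
    the total stays at 0x44.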

    [michal.k.k.piotrowski@gmail.com: build fix]
    [akpm@linux-foundation.org: Make it link with CONFIG_QUOTA=n]
    Signed-off-by: Vasily Tarasov
    Cc: Andi Kleen
    Cc: "Luck, Tony"
    Cc: Jan Kara
    Signed-off-by: Michal Piotrowski
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vasily Tarasov
     
  • One common problem with 32 bit system call and ioctl emulation is the
    different alignment rules between i386 and 64 bit machines. A number of
    drivers work around this by marking the compat structures as
    '__attribute__((packed))', which is not the right solution because it
    breaks all the non-x86 architectures that want to use the same compat code.

    Hopefully, this patch improves the situation. It introduces two new types,
    compat_u64 and compat_s64. These are defined on all architectures to have
    the same size and alignment as the 32 bit version of u64 and s64.
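
    A user-space approximation of such types, assuming a GCC-compatible
    compiler and the i386 rule that 64-bit integers are 4-byte aligned. The
    typedefs below are a sketch of how a type like this could be written, not
    a copy of the patch; struct compat_example is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* 64-bit integers that keep i386-style 4-byte alignment (GCC/Clang). */
typedef uint64_t __attribute__((aligned(4))) compat_u64;
typedef int64_t  __attribute__((aligned(4))) compat_s64;

/* Hypothetical structure as a 32-bit i386 program would lay it out. */
struct compat_example {
    uint32_t   id;
    compat_u64 value;  /* follows id directly, with no padding */
};

/* Offset of the 64-bit member under the reduced alignment. */
enum { COMPAT_VALUE_OFFSET = offsetof(struct compat_example, value) };
```

    Unlike a packed structure, only the 64-bit members get the reduced
    alignment; every other member keeps its natural layout.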

    Signed-off-by: Arnd Bergmann
    Acked-by: David S. Miller
    Cc: David Woodhouse
    Cc: Andi Kleen
    Cc: Benjamin Herrenschmidt
    Cc: Vasily Tarasov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Arnd Bergmann
     
  • Fix parameter name in audit_core_dumps for kerneldoc.

    Signed-off-by: Henrik Kretzschmar
    Cc: Al Viro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Henrik Kretzschmar
     
  • aa95387774039096c11803c04011f1aa42d85758 removed the implementation of
    lock_cpu_hotplug_interruptible and all users of it. This stub definition
    for !CONFIG_HOTPLUG_CPU was left over -- kill it now.

    Signed-off-by: Nathan Lynch
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nathan Lynch
     
  • The ext4_orphan_add() and ext4_orphan_del() functions lock sb->s_lock with a
    transaction started, while ext4_mark_recovery_complete() waits for a
    transaction while holding sb->s_lock, thus leading to a possible deadlock.
    By the time we call ext4_mark_recovery_complete() from ext4_remount(), we
    have done all the work needed for remounting, so it is safe to drop
    sb->s_lock before we wait for transactions to commit. Note that at this
    point we are still guarded by the s_umount lock against other
    remounts/umounts.

    Signed-off-by: Jan Kara
    Cc: Eric Sandeen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jan Kara
     
  • The ext3_orphan_add() and ext3_orphan_del() functions lock sb->s_lock with a
    transaction started, while ext3_mark_recovery_complete() waits for a
    transaction while holding sb->s_lock, thus leading to a possible deadlock.
    By the time we call ext3_mark_recovery_complete() from ext3_remount(), we
    have done all the work needed for remounting, so it is safe to drop
    sb->s_lock before we wait for transactions to commit. Note that at this
    point we are still guarded by the s_umount lock against other
    remounts/umounts.

    Signed-off-by: Jan Kara
    Cc: Eric Sandeen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jan Kara
     
  • Especially when !CONFIG_HOTPLUG_CPU, avoid needlessly allocating resources
    for CPUs that can never become available.

    Signed-off-by: Jan Beulich
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jan Beulich
     
  • It should improve performance in some scenarios where a lot of
    these nsproxy objects are created by unsharing namespaces. This is
    typical of virtual servers that are being created or entered.

    This is also a good tool to find leaks and gather statistics on
    namespace usage.

    Signed-off-by: Cedric Le Goater
    Cc: Herbert Poetzl
    Cc: Pavel Emelianov
    Cc: "Eric W. Biederman"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Cedric Le Goater
     
  • dup_mnt_ns() and clone_uts_ns() return NULL on failure. This is wrong:
    create_new_namespaces() uses ERR_PTR() to catch an error. This means that
    the subsequent create_new_namespaces() will hit BUG_ON() in copy_mnt_ns()
    or copy_utsname().

    Modify create_new_namespaces() to also use the errors returned by the
    copy_*_ns routines and not to systematically return ENOMEM.
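
    The mismatch is easier to see with a user-space sketch of the error
    pointer idiom. err_ptr(), ptr_err() and is_err() are simplified
    re-implementations written for this note, SKETCH_MAX_ERRNO is an assumed
    bound, and copy_ns_sketch() is a hypothetical stand-in for a copy_*_ns
    routine.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MAX_ERRNO 4095  /* assumed upper bound on errno values */

/* Encode a negative errno value in a pointer. */
static inline void *err_ptr(long error)
{
    return (void *)(intptr_t)error;
}

/* Recover the errno value from an error pointer. */
static inline long ptr_err(const void *ptr)
{
    return (long)(intptr_t)ptr;
}

/* True only for pointers in the top SKETCH_MAX_ERRNO addresses. */
static inline bool is_err(const void *ptr)
{
    return (uintptr_t)ptr >= (uintptr_t)-SKETCH_MAX_ERRNO;
}

/* Hypothetical copy routine that reports failure the way the caller expects. */
static inline void *copy_ns_sketch(bool fail)
{
    static int ns;  /* placeholder for a real namespace object */
    return fail ? err_ptr(-ENOMEM) : (void *)&ns;
}
```

    Because is_err(NULL) is false, a routine that signals failure with NULL
    slips past a caller that only checks for error pointers.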

    [oleg@tv-sign.ru: better changelog]
    Signed-off-by: Cedric Le Goater
    Cc: Serge E. Hallyn
    Cc: Badari Pulavarty
    Cc: Pavel Emelianov
    Cc: Herbert Poetzl
    Cc: Eric W. Biederman
    Cc: Oleg Nesterov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Cedric Le Goater
     
  • Repair indenting bustage.

    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
     
  • It's useful sometimes to disable the softlockup checker at boottime.
    Especially if it triggers during a distro install.

    Signed-off-by: Dave Jones
    Cc: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dave Jones
     
  • Remove not only the references to Cobalt NVRAM, but the header file as
    well.

    Signed-off-by: Robert P. J. Day
    Acked-by: Tim Hockin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Robert P. J. Day
     
  • Add an item to the RCU documentation checklist noting that RCU callbacks
    can run in parallel.

    Signed-off-by: Paul E. McKenney
    Cc: Steven Rostedt
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Paul E. McKenney
     
  • fs/binfmt_elf.c: In function 'load_elf_binary':
    fs/binfmt_elf.c:1002: warning: 'interp_map_addr' may be used uninitialized in this function

    The compiler (gcc-4.1.0) is correct, but it failed to notice that we didn't
    use the resulting value.

    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
     
  • Revert my do_ioctl() debugging patch: Paul fixed the bug.

    Cc: Paul Fulghum
    Cc: Alan Cox
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
     
  • This patch enables the unshare of user namespaces.

    It adds a new clone flag CLONE_NEWUSER and implements copy_user_ns() which
    resets the current user_struct and adds a new root user (uid == 0).

    For now, unsharing the user namespace allows a process to reset its
    user_struct accounting, and uid 0 in the new user namespace should be
    contained using appropriate means, for instance SELinux.

    The plan, when the full support is complete (all uid checks covered), is to
    keep the original user's rights in the original namespace, and let a process
    become uid 0 in the new namespace, with full capabilities to the new
    namespace.

    Signed-off-by: Serge E. Hallyn
    Signed-off-by: Cedric Le Goater
    Acked-by: Pavel Emelianov
    Cc: Herbert Poetzl
    Cc: Kirill Korotaev
    Cc: Eric W. Biederman
    Cc: Chris Wright
    Cc: Stephen Smalley
    Cc: James Morris
    Cc: Andrew Morgan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Serge E. Hallyn
     
  • Basically, it will allow a process to unshare its user_struct table,
    resetting at the same time its own user_struct and all the associated
    accounting.

    A new root user (uid == 0) is added to the user namespace upon creation.
    Such root users have full privileges and it seems that these privileges
    should be controlled through some means (process capabilities?).

    The unshare is not included in this patch.

    Changes since [try #4]:
    - Updated get_user_ns and put_user_ns to accept NULL, and
    get_user_ns to return the namespace.

    Changes since [try #3]:
    - moved struct user_namespace to files user_namespace.{c,h}

    Changes since [try #2]:
    - removed struct user_namespace* argument from find_user()

    Changes since [try #1]:
    - removed struct user_namespace* argument from find_user()
    - added a root_user per user namespace

    Signed-off-by: Cedric Le Goater
    Signed-off-by: Serge E. Hallyn
    Acked-by: Pavel Emelianov
    Cc: Herbert Poetzl
    Cc: Kirill Korotaev
    Cc: Eric W. Biederman
    Cc: Chris Wright
    Cc: Stephen Smalley
    Cc: James Morris
    Cc: Andrew Morgan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Cedric Le Goater
     
  • CONFIG_UTS_NS and CONFIG_IPC_NS have very little value as they only
    deactivate the unshare of the uts and ipc namespaces and do not improve
    performance.

    Signed-off-by: Cedric Le Goater
    Acked-by: "Serge E. Hallyn"
    Cc: Eric W. Biederman
    Cc: Herbert Poetzl
    Cc: Pavel Emelianov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Cedric Le Goater
     
  • Add TTY input auditing, used to audit system administrator's actions. This is
    required by various security standards such as DCID 6/3 and PCI to provide
    non-repudiation of administrator's actions and to allow a review of past
    actions if the administrator seems to overstep their duties or if the system
    becomes misconfigured for unknown reasons. These requirements do not make it
    necessary to audit TTY output as well.

    Compared to a user-space keylogger, this approach records TTY input using the
    audit subsystem, correlated with other audit events, and it is completely
    transparent to the user-space application (e.g. the console ioctls still
    work).

    TTY input auditing works on a higher level than auditing all system calls
    within the session, which would produce an overwhelming amount of mostly
    useless audit events.

    Add an "audit_tty" attribute, inherited across fork(). Data read from TTYs
    by a process with the attribute is sent to the audit subsystem by the kernel.
    The audit netlink interface is extended to allow modifying the audit_tty
    attribute, and to allow sending explanatory audit events from user-space (for
    example, a shell might send an event containing the final command, after the
    interactive command-line editing and history expansion is performed, which
    might be difficult to decipher from the TTY input alone).

    Because the "audit_tty" attribute is inherited across fork(), it would be set
    e.g. for sshd restarted within an audited session. To prevent this, the
    audit_tty attribute is cleared when a process with no open TTY file
    descriptors (e.g. after daemon startup) opens a TTY.

    See https://www.redhat.com/archives/linux-audit/2007-June/msg00000.html for a
    more detailed rationale document for an older version of this patch.

    [akpm@linux-foundation.org: build fix]
    Signed-off-by: Miloslav Trmac
    Cc: Al Viro
    Cc: Alan Cox
    Cc: Paul Fulghum
    Cc: Casey Schaufler
    Cc: Steve Grubb
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Miloslav Trmac
     
  • Currently we handle spurious IRQ activity based upon seeing a lot of
    invalid interrupts, and we clear things back on the basis of lots of valid
    interrupts.

    Unfortunately in some cases you get legitimate invalid interrupts caused by
    timing asynchronicity between the PCI bus and the APIC bus when disabling
    interrupts and pulling other tricks. In this case, although the spurious
    IRQs are not a problem, our unhandled counters didn't clear and they act as
    a slow running timebomb. (This is effectively how the serial port/tty
    problem, which was fixed by clearing counters when registering a handler,
    showed up.)

    It's easy enough to add a second parameter: time. This means that if we
    see a regular stream of harmless spurious interrupts which are not harming
    processing, we don't go off and do something stupid like disable the IRQ
    after a month of running. OTOH lockups and performance killers show up a
    lot more than 10/second.
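
    A minimal sketch of a counter that takes time into account, under stated
    assumptions: the 100 ms window is derived from the 10/second figure
    above, the limit of 1000 is an arbitrary placeholder, and all names are
    hypothetical rather than taken from the patch.

```c
#include <stdbool.h>

#define SPURIOUS_WINDOW_MS 100   /* assumed: slower than ~10/second is harmless */
#define SPURIOUS_LIMIT     1000  /* assumed threshold for giving up on the IRQ */

struct irq_stats_sketch {
    unsigned long last_unhandled_ms;  /* timestamp of the previous bad event */
    unsigned int  unhandled;          /* bad events in the current burst */
};

/* Record one unhandled interrupt; return true if the IRQ should be disabled. */
bool note_unhandled(struct irq_stats_sketch *s, unsigned long now_ms)
{
    if (now_ms - s->last_unhandled_ms > SPURIOUS_WINDOW_MS)
        s->unhandled = 1;  /* isolated event: restart the count */
    else
        s->unhandled++;    /* part of a rapid burst: keep counting */

    s->last_unhandled_ms = now_ms;
    return s->unhandled >= SPURIOUS_LIMIT;
}
```

    With this shape, one stray interrupt per second never accumulates, while
    a storm of them reaches the limit quickly.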

    [akpm@linux-foundation.org: cleanup]
    Signed-off-by: Alan Cox
    Cc: Ingo Molnar
    Cc: Thomas Gleixner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alan Cox
     
  • The intel-rng printed a nice well formatted message when the port was
    disabled. Someone then came along and blindly trashed it by screwing up a
    trim down to 80 columns.

    Put it back into the right format and keep the overlong lines as the result
    is also MUCH easier to read in this specific case.

    Signed-off-by: Alan Cox
    Cc: Michael Buesch
    Acked-by: Jeff Garzik
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alan Cox
     
  • I was seeing a null pointer deref in fs/super.c:vfs_kern_mount().
    Some file system get_sb() handler was returning NULL mnt_sb with
    a non-negative return value. I also noticed a "hugetlbfs: Bad
    mount option:" message in the log.

    Turns out that hugetlbfs_parse_options() was not checking for an
    empty option string after the call to strsep(). On failure,
    hugetlbfs_parse_options() returns 1. hugetlbfs_fill_super() just
    passed this return code back up the call stack where
    vfs_kern_mount() missed the error and proceeded with a NULL mnt_sb.

    Apparently introduced by patch:
    hugetlbfs-use-lib-parser-fix-docs.patch

    The problem was exposed by this line in my fstab:

    none /huge hugetlbfs defaults 0 0

    It can also be demonstrated by invoking mount of hugetlbfs
    directly with no options or a bogus option.

    This patch:

    1) adds the check for empty option to hugetlbfs_parse_options(),
    2) enhances the error message to bracket any unrecognized
    option with quotes,
    3) modifies hugetlbfs_parse_options() to return -EINVAL on any
    unrecognized option,
    4) adds a BUG_ON() to vfs_kern_mount() to catch any get_sb()
    handler that returns a NULL mnt->mnt_sb with a return value
    >= 0.
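
    The empty-token case can be reproduced in user space with a short sketch.
    count_options() is a hypothetical helper written for this note; it only
    shows how strsep() hands back empty strings and how a parser can skip
    them, and it is not the patched kernel function.

```c
#define _DEFAULT_SOURCE  /* for strsep() on glibc */
#include <string.h>

/* Count the non-empty comma-separated options in opts (modified in place). */
int count_options(char *opts)
{
    int n = 0;
    char *p;

    while ((p = strsep(&opts, ",")) != NULL) {
        if (!*p)  /* empty token: "", "a,,b" or a trailing comma */
            continue;
        n++;
    }
    return n;
}
```

    An empty option string yields exactly one empty token, which is the case
    the missing check let through to the option matcher.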

    Signed-off-by: Lee Schermerhorn
    Acked-by: Randy Dunlap
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Lee Schermerhorn
     
  • Use lib/parser.c to parse hugetlbfs mount options. Correct docs in
    hugetlbpage.txt.

    old size of hugetlbfs_fill_super: 675 bytes
    new size of hugetlbfs_fill_super: 686 bytes
    (hugetlbfs_parse_options() is inlined)

    Signed-off-by: Randy Dunlap
    Cc: Hugh Dickins
    Cc: David Gibson
    Cc: Adam Litke
    Acked-by: William Lee Irwin III
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Randy Dunlap
     
  • Signed-off-by: Dave Jones
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dave Jones
     
  • Signed-off-by: Jesper Juhl
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jesper Juhl
     
  • Removed kmalloc and memset in favor of kzalloc.

    To explain the HFSPLUS_SB() macro in the removed memset call:

    hfsplus_fs.h:#define HFSPLUS_SB(super) (*(struct hfsplus_sb_info *)(super)->s_fs_info)
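
    The same cleanup pattern has a direct user-space analogue, sketched here
    with malloc()/memset() against calloc(). struct sb_info_sketch and both
    function names are hypothetical, and calloc() stands in for kzalloc()
    only as an illustration.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical per-superblock private data. */
struct sb_info_sketch {
    int   flags;
    void *private_data;
};

/* Old style: allocate, then clear in a separate step. */
struct sb_info_sketch *alloc_old_style(void)
{
    struct sb_info_sketch *p = malloc(sizeof(*p));

    if (p)
        memset(p, 0, sizeof(*p));
    return p;
}

/* New style: one call that returns zeroed memory. */
struct sb_info_sketch *alloc_new_style(void)
{
    return calloc(1, sizeof(struct sb_info_sketch));
}
```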

    Signed-off-by: Wyatt Banks
    Cc: Roman Zippel
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Wyatt Banks
     
  • Despite repeated attempts over the last two and a half years, this driver
    seems somewhat persistent. Remove its deprecated status as it has existing
    users who may not be in a position to migrate their apps to O_DIRECT.

    Signed-off-by: Dave Jones
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dave Jones
     
  • Use NULL instead of 0 for pointer:
    drivers/misc/sony-laptop.c:1920:6: warning: Using plain integer as NULL pointer

    Signed-off-by: Randy Dunlap
    Acked-by: Mattia Dongili
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Randy Dunlap