16 May, 2006

3 commits

  • Signed-off-by: David Woodhouse

    David Woodhouse
     
  • Ever since a previous patch:

    Fix race between CONFIG_DEBUG_SLABALLOC and modules
    Sun, 27 Jun 2004 17:55:19 +0000 (17:55 +0000)
    http://www.kernel.org/git/?p=linux/kernel/git/torvalds/old-2.6-bkcvs.git;a=commit;h=92b3db26d31cf21b70e3c1eadc56c179506d8fbe

    The function symbol_put_addr() will deadlock the kernel.

    symbol_put_addr() would acquire modlist_lock, then while holding the lock call
    two functions kernel_text_address() and module_text_address() which also try
    to acquire the same lock. This deadlocks the kernel of course.

    This patch changes symbol_put_addr() to not acquire the modlist_lock; it
    doesn't need it since it never looks at the module list directly. Also, it
    now uses core_kernel_text() instead of kernel_text_address(). The latter has
    an additional check for addr inside a module, but we don't need to do that
    since we call module_text_address() (the same function kernel_text_address
    uses) ourselves.

    Signed-off-by: Trent Piepho
    Cc: Zwane Mwaikambo
    Acked-by: Rusty Russell
    Cc: Johannes Stezenbach
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Trent Piepho
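
The deadlock shape described above can be sketched in user space. This is
only a model, not the kernel code: toy_lock, helper_lookup(), put_addr_old()
and put_addr_new() are hypothetical names, and the toy lock reports an error
at the point where a real non-recursive spinlock would spin forever.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy non-recursive lock (hypothetical): it only records whether it is held. */
struct toy_lock { bool held; };

/* Returns 0 on success, -1 where a real spinlock would spin forever. */
static int toy_acquire(struct toy_lock *l)
{
    if (l->held)
        return -1;              /* owner re-acquiring: self-deadlock */
    l->held = true;
    return 0;
}

static void toy_release(struct toy_lock *l)
{
    assert(l->held);
    l->held = false;
}

static struct toy_lock list_lock;   /* stands in for modlist_lock */

/* Stands in for module_text_address(): takes the list lock itself. */
static int helper_lookup(void)
{
    if (toy_acquire(&list_lock) != 0)
        return -1;
    /* ... the module list would be walked here ... */
    toy_release(&list_lock);
    return 0;
}

/* Old shape: hold the lock across a helper that also wants it. */
static int put_addr_old(void)
{
    int ret;

    if (toy_acquire(&list_lock) != 0)
        return -1;
    ret = helper_lookup();      /* fails: the lock is already held */
    toy_release(&list_lock);
    return ret;
}

/* New shape: no outer lock, the helper does its own locking. */
static int put_addr_new(void)
{
    return helper_lookup();
}
```

In this model put_addr_old() always fails inside the helper, while
put_addr_new() succeeds because the lock is only ever taken once.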
     
  • With "Paul E. McKenney"

    Introduce rcu_needs_cpu() interface. This can be used to tell if there
    will be a new rcu batch on a cpu soon by looking at the curlist pointer.
    This can be used to avoid entering a tickless idle state where the cpu
    would miss that a new batch is ready when rcu_start_batch would be called
    on a different cpu.

    Signed-off-by: Heiko Carstens
    Cc: "Paul E. McKenney"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Heiko Carstens
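
As a rough illustration of the check described above, assuming a per-cpu
structure whose curlist pointer is non-NULL when callbacks are queued for the
next batch. The struct and function names here are hypothetical and do not
reproduce the kernel's definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for a queued callback and the per-cpu RCU data. */
struct callback_model { struct callback_model *next; };
struct cpu_rcu_model { struct callback_model *curlist; };

/* Model of the interface: work is pending if curlist is non-NULL. */
static bool needs_cpu_model(const struct cpu_rcu_model *rdp)
{
    assert(rdp != NULL);
    return rdp->curlist != NULL;
}

/* An idle loop would only stop the tick when no batch is pending. */
static bool may_stop_tick_model(const struct cpu_rcu_model *rdp)
{
    return !needs_cpu_model(rdp);
}
```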
     

12 May, 2006

1 commit

  • Eric Biederman points out that we can't take the task_lock while holding
    tasklist_lock for writing, because another CPU that holds the task lock
    might take an interrupt that then tries to take tasklist_lock for writing.

    Which would be a nasty deadlock, with one CPU spinning forever in an
    interrupt handler (although admittedly you need to really work at
    triggering it ;)

    Since the ptrace_attach() code is special and very unusual, just make it
    be extra careful, and use trylock+repeat to avoid the possible deadlock.

    Cc: Oleg Nesterov
    Cc: Eric W. Biederman
    Cc: Roland McGrath
    Signed-off-by: Linus Torvalds

    Linus Torvalds
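
The trylock+repeat idea can be sketched with C11 atomics. This is a
user-space model under stated assumptions, not the ptrace_attach() code:
toy_spinlock and lock_pair_with_retry() are hypothetical, and the bounded
attempt count exists only so the sketch terminates when run single-threaded.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical spinlock built on a C11 atomic flag. */
struct toy_spinlock { atomic_flag locked; };

static bool toy_trylock(struct toy_spinlock *l)
{
    /* test_and_set returns the previous value: false means we got it. */
    return !atomic_flag_test_and_set_explicit(&l->locked, memory_order_acquire);
}

static void toy_lock(struct toy_spinlock *l)
{
    while (!toy_trylock(l))
        ;                       /* spin until the flag is free */
}

static void toy_unlock(struct toy_spinlock *l)
{
    atomic_flag_clear_explicit(&l->locked, memory_order_release);
}

/*
 * Take `first`, then only *try* to take `second`. On failure, drop `first`
 * and start over, so we never spin on `second` while holding `first`.
 * Returns the attempt number on success (both locks held), or -1.
 */
static int lock_pair_with_retry(struct toy_spinlock *first,
                                struct toy_spinlock *second,
                                int max_attempts)
{
    assert(max_attempts > 0);
    for (int attempt = 1; attempt <= max_attempts; attempt++) {
        toy_lock(first);
        if (toy_trylock(second))
            return attempt;
        toy_unlock(first);      /* back off and repeat */
    }
    return -1;
}
```

The point of the pattern is that the second lock is never waited for while
the first one is held, which removes the lock-ordering cycle.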
     

09 May, 2006

1 commit


08 May, 2006

1 commit

  • This holds the task lock (and, for ptrace_attach, the tasklist_lock)
    over the actual attach event, which closes a race between attaching to a
    thread that is either doing a PTRACE_TRACEME or getting de-threaded.

    Thanks to Oleg Nesterov for reminding me about this, and Chris Wright
    for noticing a lost return value in my first version.

    Signed-off-by: Linus Torvalds

    Linus Torvalds
     

01 May, 2006

12 commits

  • While testing the watch performance, I noticed that selinux_task_ctxid()
    was creeping into the results more than it should. Investigation showed
    that the function was being called whether it was needed or not. The
    below patch fixes this.

    Signed-off-by: Steve Grubb
    Signed-off-by: Al Viro

    Steve Grubb
     
  • 1) The audit_ipc_perms() function has been split into two different
    functions:
    - audit_ipc_obj()
    - audit_ipc_set_perm()

    There's a key shift here... The audit_ipc_obj() collects the uid, gid,
    mode, and SELinux context label of the current ipc object. This
    audit_ipc_obj() hook is now found in several places. Most notably, it
    is hooked in ipcperms(), which is called in various places around the
    ipc code performing a MAC check. Additionally there are several places
    where *checkid() is used to validate that an operation is being
    performed on a valid object while not necessarily having a nearby
    ipcperms() call. In these locations, audit_ipc_obj() is called to
    ensure that the information is captured by the audit system.

    The audit_ipc_set_perm() function is called any time the permissions on
    the ipc object change. In this case, the NEW permissions are recorded
    (and note that an audit_ipc_obj() call exists just a few lines before
    each instance).

    2) Support for an AUDIT_IPC_SET_PERM audit message type. This allows
    for separate auxiliary audit records for normal operations on an IPC
    object and permissions changes. Note that the same struct
    audit_aux_data_ipcctl is used and populated, however there are separate
    audit_log_format statements based on the type of the message. Finally,
    the AUDIT_IPC block of code in audit_free_aux() was extended to handle
    aux messages of this new type. No more mem leaks I hope ;-)

    Signed-off-by: Al Viro

    Steve Grubb
     
  • Hi,

    The patch below builds upon the patch sent earlier and adds subject label to
    all audit events generated via the netlink interface. It also cleans up a few
    other minor things.

    Signed-off-by: Steve Grubb

    Signed-off-by: Al Viro

    Steve Grubb
     
  • The below patch should be applied after the inode and ipc sid patches.
    This patch is a reworking of Tim's patch that has been updated to match
    the inode and ipc patches since it's similar.

    [updated:
    > Stephen Smalley also wanted to change a variable from isec to tsec in the
    > user sid patch. ]

    Signed-off-by: Steve Grubb
    Signed-off-by: Al Viro

    Steve Grubb
     
  • Hi,

    The patch below converts IPC auditing to collect sids and to convert them to
    a context string only if it needs to output an audit record. This patch
    depends on the inode audit change patch already being applied.

    Signed-off-by: Steve Grubb

    Signed-off-by: Al Viro

    Steve Grubb
     
  • Previously, we were gathering the context instead of the sid. Now in this patch,
    we gather just the sid and convert to context only if an audit event is being
    output.

    This patch brings the performance hit from 146% down to 23%

    Signed-off-by: Al Viro

    Steve Grubb
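
A minimal sketch of the "collect the sid, convert late" idea, assuming a
numeric sid and a comparatively expensive sid-to-string conversion. All names
here (sid_to_context_model, audit_record_model, the "context-%u" format) are
hypothetical and only illustrate where the cost moves.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Counts how often the (assumed expensive) conversion actually runs. */
static int conversions;

/* Hypothetical stand-in for turning a sid into a context string. */
static void sid_to_context_model(unsigned int sid, char *buf, size_t len)
{
    assert(len > 0);
    conversions++;
    snprintf(buf, len, "context-%u", sid);
}

struct audit_record_model { unsigned int sid; };

/* Hot path: just remember the cheap numeric id. */
static void record_event_model(struct audit_record_model *r, unsigned int sid)
{
    r->sid = sid;
}

/* Slow path: convert only when a record is really written out. */
static int emit_event_model(const struct audit_record_model *r, int should_log,
                            char *out, size_t len)
{
    if (!should_log)
        return 0;
    sid_to_context_model(r->sid, out, len);
    return 1;
}
```

Events that are never output pay nothing for the string conversion in this
model.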
     
  • This patch provides the ability to filter audit messages based on the
    elements of the process' SELinux context (user, role, type, mls sensitivity,
    and mls clearance). It uses the new interfaces from selinux to opaquely
    store information related to the selinux context and to filter based on that
    information. It also uses the callback mechanism provided by selinux to
    refresh the information when a new policy is loaded.

    Signed-off-by: Al Viro

    Darrel Goeddel
     
  • Signed-off-by: Al Viro

    Al Viro
     
  • ... it's always current, and that's a good thing - allows simpler locking.

    Signed-off-by: Al Viro

    Al Viro
     
  • now we can do that - all callers are process-synchronous and do not hold
    any locks.

    Signed-off-by: Al Viro

    Al Viro
     
  • Signed-off-by: Al Viro

    Al Viro
     
  • Don't assume that audit_log_exit() et al. are called for the context of
    current; pass task explicitly.

    Signed-off-by: Al Viro

    Al Viro
     

28 Apr, 2006

2 commits

  • - Add new SA_PROBEIRQ which suppresses the new sharing-mismatch warning.
    Some drivers like to use request_irq() to find an unused interrupt slot.

    - Use it in i82365.c

    - Kill unused SA_PROBE.

    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
     
  • There's an off-by-1 in kernel/power/main.c:state_store() ... if your
    kernel just happens to have some non-zero data at pm_states[PM_SUSPEND_MAX]
    (i.e. one past the end of the array) then it'll let you write anything you
    want to /sys/power/state and in response the box will enter S5.

    Signed-off-by: dean gaudet
    Acked-by: Pavel Machek
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    dean gaudet
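
The class of bug described above can be illustrated with a bounded table
lookup. This is a sketch under assumptions, not the code from
kernel/power/main.c: the enum, the name table and lookup_state_model() are
hypothetical. The key point is that a search that runs off the end must be
rejected instead of inspecting the slot one past the array.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical state numbering; STATE_MAX_MODEL is one past the last state. */
enum { STATE_ON_MODEL, STATE_STANDBY_MODEL, STATE_MEM_MODEL,
       STATE_DISK_MODEL, STATE_MAX_MODEL };

static const char *const state_names[STATE_MAX_MODEL] = {
    [STATE_STANDBY_MODEL] = "standby",
    [STATE_MEM_MODEL]     = "mem",
    [STATE_DISK_MODEL]    = "disk",
};

/* Returns the matching state, or -1 if the input names no known state. */
static int lookup_state_model(const char *buf, size_t len)
{
    assert(buf != NULL);
    for (int state = STATE_STANDBY_MODEL; state < STATE_MAX_MODEL; state++) {
        const char *name = state_names[state];

        if (name && strlen(name) == len && strncmp(buf, name, len) == 0)
            return state;
    }
    /* Ran off the end: never look at state_names[STATE_MAX_MODEL]. */
    return -1;
}
```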
     

26 Apr, 2006

2 commits

  • A few of the notifier_chain_register() callers use __init in the definition
    of notifier_call. This is incorrect, as the function definition should be
    available after the initializations (they do not unregister them during
    initializations).

    This patch fixes all such usages to _not_ have the notifier_call __init
    section.

    Signed-off-by: Chandra Seetharaman
    Signed-off-by: Linus Torvalds

    Chandra Seetharaman
     
  • A few of the notifier_chain_register() callers use __devinitdata in the
    definition of the notifier_block data structure. This is incorrect, as the
    data structure should be available after the initializations (they do
    not unregister them during initializations).

    This was leading to an oops when notifier_chain_register() call is
    invoked for those callback chains after initialization.

    This patch fixes all such usages to _not_ have the notifier_block data
    structure in the init data section.

    Signed-off-by: Chandra Seetharaman
    Signed-off-by: Linus Torvalds

    Chandra Seetharaman
     

20 Apr, 2006

6 commits

  • * 'for-linus' of git://brick.kernel.dk/data/git/linux-2.6-block:
    [PATCH] block/elevator.c: remove unused exports
    [PATCH] splice: fix smaller sized splice reads
    [PATCH] Don't inherit ->splice_pipe across forks
    [patch] cleanup: use blk_queue_stopped
    [PATCH] Document online io scheduler switching

    Linus Torvalds
     
  • In cases where a struct kretprobe's *_handler fields are non-NULL, it is
    possible to cause a system crash, due to the possibility of calls ending up
    in zombie functions. Documentation clearly states that unused *_handlers
    should be set to NULL, but kprobe users sometimes fail to do so.

    Fix it by setting the non-relevant fields of the struct kretprobe to NULL.

    Signed-off-by: Ananth N Mavinakayanahalli
    Acked-by: Jim Keniston
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ananth N Mavinakayanahalli
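
A sketch of the fix's idea: the registration path clears the handler slots
it does not use, so stale pointers supplied by the caller can never be
invoked. The structures and names below are hypothetical simplifications,
not the kernel's struct kprobe or struct kretprobe.

```c
#include <assert.h>
#include <stddef.h>

typedef int (*handler_fn)(void *ctx);

/* Hypothetical, simplified probe structures. */
struct probe_model {
    handler_fn pre_handler;
    handler_fn post_handler;
    handler_fn fault_handler;
};

struct retprobe_model {
    struct probe_model kp;
    handler_fn handler;         /* the return handler the user cares about */
};

static int trampoline_pre(void *ctx) { (void)ctx; return 0; }

/* Stands in for a stale pointer left behind by a careless caller. */
static int stale_handler(void *ctx) { (void)ctx; return -1; }

static void register_retprobe_model(struct retprobe_model *rp)
{
    assert(rp != NULL);
    rp->kp.pre_handler = trampoline_pre;
    /* Fields a return probe does not use are forced to NULL. */
    rp->kp.post_handler = NULL;
    rp->kp.fault_handler = NULL;
}

/* Dispatch only calls a handler that is actually set. */
static int fire_post_model(const struct probe_model *p, void *ctx)
{
    return p->post_handler ? p->post_handler(ctx) : 0;
}
```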
     
  • It's really task private, so clear that field on fork after copying
    task structure.

    Signed-off-by: Jens Axboe

    Jens Axboe
     
  • Those also break userland regs like the following.

    00000000 :
       0:   0f b7 44 24 0c          movzwl 0xc(%esp),%eax
       5:   83 ca ff                or     $0xffffffff,%edx
       8:   0f b7 4c 24 08          movzwl 0x8(%esp),%ecx
       d:   66 83 f8 ff             cmp    $0xffffffff,%ax
      11:   0f 44 c2                cmove  %edx,%eax
      14:   66 83 f9 ff             cmp    $0xffffffff,%cx
      18:   0f 45 d1                cmovne %ecx,%edx
      1b:   89 44 24 0c             mov    %eax,0xc(%esp)
      1f:   89 54 24 08             mov    %edx,0x8(%esp)
      23:   e9 fc ff ff ff          jmp    24

    where the tailcall at the end overwrites the incoming stack-frame.

    Signed-off-by: OGAWA Hirofumi
    [ I would _really_ like to have a way to tell gcc about calling
    conventions. The "prevent_tail_call()" macro is pretty ugly ]
    Signed-off-by: Linus Torvalds

    OGAWA Hirofumi
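
A user-space sketch of the technique the message refers to, assuming GCC or
Clang extended asm. The macro below follows the commonly cited form of
prevent_tail_call() but is given a different name here to mark it as a
model. The 16-bit wrapper and its callee are hypothetical. The empty asm
claims to read and write ret, so the compiler cannot turn the final call
into a jump that reuses the caller's argument slots.

```c
#include <assert.h>

/* Empty asm that "touches" ret after the call (GCC/Clang extension). */
#define prevent_tail_call_model(ret) __asm__ ("" : "=r" (ret) : "0" (ret))

/* Widen a 16-bit id, mapping 0xffff to -1 as in the disassembly above. */
static long widen_id_model(unsigned short id)
{
    return id == 0xffff ? -1L : (long)id;
}

/* Hypothetical callee standing in for the full-width syscall. */
static long chown_model(long user, long group)
{
    return user + group;
}

static long chown16_model(unsigned short user, unsigned short group)
{
    long ret = chown_model(widen_id_model(user), widen_id_model(group));

    prevent_tail_call_model(ret);   /* force a real call plus return */
    return ret;
}

/* Usage: the wrapper still returns the callee's result unchanged. */
static void check_chown16_model(void)
{
    assert(chown16_model(1, 2) == 3);
}
```

Whether a given compiler would have emitted a tail call here depends on
optimisation settings; the sketch only shows where the barrier sits.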
     
  • The function free_pagedir() used by swsusp for freeing its internal data
    structures clears the PG_nosave and PG_nosave_free flags for each page
    being freed.

    However, during resume PG_nosave_free set means that the page in
    question is "unsafe" (ie. it will be overwritten in the process of
    restoring the saved system state from the image), so it should not be
    used for the image data.

    Therefore free_pagedir() should not clear PG_nosave_free if it's called
    during resume (otherwise "unsafe" pages freed by it may be used for
    storing the image data and the data may get corrupted later on).

    Signed-off-by: Rafael J. Wysocki
    Acked-by: Pavel Machek
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Rafael J. Wysocki
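
The conditional flag handling can be sketched as follows. The flag values,
struct page_model and free_pages_model() are hypothetical. The only point
illustrated is that the caller decides, via a parameter, whether the
"nosave free" mark is cleared.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flag bits modelling PG_nosave and PG_nosave_free. */
enum { PG_NOSAVE_MODEL = 1 << 0, PG_NOSAVE_FREE_MODEL = 1 << 1 };

struct page_model { unsigned int flags; };

/*
 * Always clears the "nosave" bit. Clears the "nosave free" bit only when
 * asked to, so a resume path can keep unsafe pages marked.
 */
static void free_pages_model(struct page_model *pages, size_t n,
                             int clear_nosave_free)
{
    assert(pages != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        pages[i].flags &= ~(unsigned int)PG_NOSAVE_MODEL;
        if (clear_nosave_free)
            pages[i].flags &= ~(unsigned int)PG_NOSAVE_FREE_MODEL;
    }
}
```

A resume path would pass 0 for clear_nosave_free in this model, a suspend
path would pass 1.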
     
  • While we can currently walk through thread groups, process groups, and
    sessions with just the rcu_read_lock, this opens the door to walking the
    entire task list.

    We already have all of the other RCU guarantees, so there is no cost in
    doing this. This should be enough for proc to stop taking the
    tasklist lock during readdir.

    prev_task was killed because it has no users, and using it will miss new
    tasks when doing an rcu traversal.

    Signed-off-by: Eric W. Biederman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Eric W. Biederman
     

15 Apr, 2006

2 commits

  • Somehow in the midst of dotting i's and crossing t's during
    the merge up to rc1 we wound up keeping __put_task_struct_cb
    when it should have been killed as it no longer has any users.
    Sorry I probably should have caught this while it was
    still in the -mm tree.

    Having the old code there gets confusing when reading
    through the code and trying to understand what is
    happening.

    Signed-off-by: Eric W. Biederman
    Signed-off-by: Linus Torvalds

    Eric W. Biederman
     
  • Since the last user is removed in -mm, we can now remove this long deprecated
    function.

    Signed-off-by: Adrian Bunk
    Cc: Pavel Machek
    Signed-off-by: Andrew Morton
    Signed-off-by: Greg Kroah-Hartman

    Adrian Bunk
     

14 Apr, 2006

1 commit

  • This reverts most of commit 30e0fca6c1d7d26f3f2daa4dd2b12c51dadc778a.
    It broke the case of non-leader MT exec when ptraced.
    I think the bug it was intended to fix was already addressed by commit
    788e05a67c343fa22f2ae1d3ca264e7f15c25eaf.

    Signed-off-by: Roland McGrath
    Acked-by: Oleg Nesterov
    Signed-off-by: Linus Torvalds

    Roland McGrath
     

11 Apr, 2006

9 commits

  • Commit e56d090310d7625ecb43a1eeebd479f04affb48b

    [PATCH] RCU signal handling

    made this BUG_ON() unsafe. This code runs under ->siglock,
    while switch_exec_pids() takes tasklist_lock.

    Signed-off-by: Oleg Nesterov
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • * 'splice' of git://brick.kernel.dk/data/git/linux-2.6-block:
    [PATCH] vfs: add splice_write and splice_read to documentation
    [PATCH] Remove sys_ prefix of new syscalls from __NR_sys_*
    [PATCH] splice: warning fix
    [PATCH] another round of fs/pipe.c cleanups
    [PATCH] splice: comment styles
    [PATCH] splice: add Ingo as addition copyright holder
    [PATCH] splice: unlikely() optimizations
    [PATCH] splice: speedups and optimizations
    [PATCH] pipe.c/fifo.c code cleanups
    [PATCH] get rid of the PIPE_*() macros
    [PATCH] splice: speedup __generic_file_splice_read
    [PATCH] splice: add direct fd fd splicing support
    [PATCH] splice: add optional input and output offsets
    [PATCH] introduce a "kernel-internal pipe object" abstraction
    [PATCH] splice: be smarter about calling do_page_cache_readahead()
    [PATCH] splice: optimize the splice buffer mapping
    [PATCH] splice: cleanup __generic_file_splice_read()
    [PATCH] splice: only call wake_up_interruptible() when we really have to
    [PATCH] splice: potential !page dereference
    [PATCH] splice: mark the io page as accessed

    Linus Torvalds
     
  • Add a cpu_relax() to the hand-coded spinwait in hrtimer_cancel().

    Signed-off-by: Joe Korty
    Acked-by: Thomas Gleixner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Joe Korty
     
  • Signed-off-by: Christoph Hellwig
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Christoph Hellwig
     
  • Implement the scheduled unexport of panic_timeout.

    Signed-off-by: Adrian Bunk
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Adrian Bunk
     
  • We need the boot CPU's tvec_bases[] entry to be initialised super-early in
    boot, for early_serial_setup(). That runs within setup_arch(), before even
    per-cpu areas are initialised.

    The patch changes tvec_bases to use compile-time initialisation, and adds a
    separate array `tvec_base_done' to keep track of which CPU has had its
    tvec_bases[] entry initialised (because we can no longer use the zeroness of
    that tvec_bases[] entry to determine whether it has been initialised).

    Thanks to Eugene Surovegin for diagnosing this.

    Cc: Eugene Surovegin
    Cc: Jan Beulich
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
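
One way to picture the change, as a sketch under assumptions rather than the
actual kernel/timer.c code: the pointer table is filled in at compile time so
the boot CPU's entry is usable immediately, and a separate array records
which entries have been set up, since a NULL check no longer works. All names
and the table size are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS_MODEL 4

struct base_model { int cpu; bool ready; };

/* Usable before any init code runs. */
static struct base_model boot_base = { .cpu = 0, .ready = true };
static struct base_model other_bases[NR_CPUS_MODEL];

/* Compile-time initialised: every slot starts at the boot base. */
static struct base_model *bases[NR_CPUS_MODEL] = {
    &boot_base, &boot_base, &boot_base, &boot_base,
};

/* Separate bookkeeping, because bases[cpu] is never NULL any more. */
static bool base_done[NR_CPUS_MODEL];

static void init_base_model(int cpu)
{
    assert(cpu >= 0 && cpu < NR_CPUS_MODEL);
    if (base_done[cpu])
        return;                 /* already initialised, leave it alone */
    if (cpu != 0) {
        other_bases[cpu].cpu = cpu;
        other_bases[cpu].ready = true;
        bases[cpu] = &other_bases[cpu];
    }
    base_done[cpu] = true;
}
```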
     
  • For some architectures, a few syscalls are not linked in noMMU mode. In
    that case, the MMU-dependent syscalls need to be defined as
    'cond_syscall'. For example, the ARM architecture links sys_mlock
    selectively, depending on the configured mode.

    On FRV, this has been handled with #ifdef CONFIG_MMU in
    arch/frv/kernel/entry.S. However, those conditionals become redundant
    once the syscalls are defined as cond_syscall. Compilation was tested
    with the FRV toolchains in both MMU and noMMU modes.

    Signed-off-by: Hyok S. Choi
    Cc: David Howells
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hyok S. Choi
     
  • RT tasks are being awakened on the expired array when expired_starving() is
    true, whereas they really should be excluded. Fix.

    Signed-off-by: Mike Galbraith
    Acked-by: Ingo Molnar
    Cc: Con Kolivas
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mike Galbraith
     
  • Fix a starvation problem that occurs when a stream of highly interactive
    tasks delays an array switch for extended periods despite
    EXPIRED_STARVING(rq) being true. AFAICT, the only choice is to enqueue
    awakening tasks on the expired array in this case.

    Without this patch, it can be nearly impossible to remotely login to a busy
    server, and interactive shell commands can starve for minutes.

    Also, convert the EXPIRED_STARVING macro into an inline function which humans
    can understand.

    Signed-off-by: Mike Galbraith
    Acked-by: Ingo Molnar
    Cc: Nick Piggin
    Acked-by: Con Kolivas
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Mike Galbraith
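
Taken together, the two scheduler changes above amount to a small decision
rule at wakeup time. The sketch below models only that rule. The enum and
function names are hypothetical, and the starvation test itself is passed in
as a flag rather than computed.

```c
#include <assert.h>
#include <stdbool.h>

enum array_model { ARRAY_ACTIVE_MODEL, ARRAY_EXPIRED_MODEL };

/*
 * Awakening tasks go to the expired array only while the expired array is
 * starving, and real-time tasks are exempt from that redirection.
 */
static enum array_model wakeup_array_model(bool is_rt, bool expired_starving)
{
    if (expired_starving && !is_rt)
        return ARRAY_EXPIRED_MODEL;
    return ARRAY_ACTIVE_MODEL;
}

/* Usage: the three interesting cases. */
static void check_wakeup_array_model(void)
{
    assert(wakeup_array_model(false, false) == ARRAY_ACTIVE_MODEL);
    assert(wakeup_array_model(false, true) == ARRAY_EXPIRED_MODEL);
    assert(wakeup_array_model(true, true) == ARRAY_ACTIVE_MODEL);
}
```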