09 Dec, 2006

23 commits

  • Currently there is a regression and the ipc sysctls don't show up in the
    binary sysctl namespace.

    This patch adds sysctl_ipc_data to read data/write from the appropriate
    namespace and deliver it in the expected manner.

    [akpm@osdl.org: warning fix]
    Signed-off-by: Eric W. Biederman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Eric W. Biederman
     
  • Refactor the ipc sysctl support so that it is simpler, more readable, and
    prepares for fixing the bug with the wrong values being returned in the
    sys_sysctl interface.

    The function proc_do_ipc_string() was misnamed as it never handled strings.
    It's magic of when to work with strings and when to work with longs belonged
    in the sysctl table. I couldn't tell if the code would work if you disabled
    the ipc namespace but it certainly looked like it would have problems.

    Signed-off-by: Eric W. Biederman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Eric W. Biederman
     
  • The problem: When using sys_sysctl we don't read the proper values for the
    variables exported from the uts namespace, nor do we do the proper locking.

    This patch introduces sysctl_uts_string which properly fetches the values and
    does the proper locking.

    Signed-off-by: Eric W. Biederman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Eric W. Biederman
     
  • The binary interface to the namespace sysctls was never implemented resulting
    in some really weird things if you attempted to use sys_sysctl to read your
    hostname for example.

    This patch series simples the code a little and implements the binary sysctl
    interface.

    In testing this patch series I discovered that our 32bit compatibility for the
    binary sysctl interface is imperfect. In particular KERN_SHMMAX and
    KERN_SMMALL are size_t sized quantities and are returned as 8 bytes on to
    32bit binaries using a x86_64 kernel. However this has existing for a long
    time so it is not a new regression with the namespace work.

    Gads the whole sysctl thing needs work before it stops being easy to shoot
    yourself in the foot.

    Looking forward a little bit we need a better way to handle sysctls and
    namespaces as our current technique will not work for the network namespace.
    I think something based on the current overlapping sysctl trees will work but
    the proc side needs to be redone before we can use it.

    This patch:

    Introduce get_uts() and put_uts() (used later) and remove most of the special
    cases for when UTS namespace is compiled in.

    Signed-off-by: Eric W. Biederman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Eric W. Biederman
     
  • All members of the process group have the same sid and it can't be == 0.

    NOTE: this code (and a similar one in sys_setpgid) was needed because it
    was possibe to have ->session == 0. It's not possible any longer since

    [PATCH] pidhash: don't use zero pids
    Commit: c7c6464117a02b0d54feb4ebeca4db70fa493678

    Signed-off-by: Oleg Nesterov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • All tasks in the process group have the same sid, we don't need to iterate
    them all to check that the caller of sys_setpgid() doesn't change its
    session.

    Signed-off-by: Oleg Nesterov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • Add a per pid_namespace child-reaper. This is needed so processes are reaped
    within the same pid space and do not spill over to the parent pid space. Its
    also needed so containers preserve existing semantic that pid == 1 would reap
    orphaned children.

    This is based on Eric Biederman's patch: http://lkml.org/lkml/2006/2/6/285

    Signed-off-by: Sukadev Bhattiprolu
    Signed-off-by: Cedric Le Goater
    Cc: Kirill Korotaev
    Cc: Eric W. Biederman
    Cc: Herbert Poetzl
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Sukadev Bhattiprolu
     
  • Signed-off-by: Cedric Le Goater
    Cc: Kirill Korotaev
    Cc: Eric W. Biederman
    Cc: Herbert Poetzl
    Cc: Sukadev Bhattiprolu
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Cedric Le Goater
     
  • Add the pid namespace framework to the nsproxy object. The copy of the pid
    namespace only increases the refcount on the global pid namespace,
    init_pid_ns, and unshare is not implemented.

    There is no configuration option to activate or deactivate this feature
    because this not relevant for the moment.

    Signed-off-by: Cedric Le Goater
    Cc: Kirill Korotaev
    Cc: Eric W. Biederman
    Cc: Herbert Poetzl
    Cc: Sukadev Bhattiprolu
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Cedric Le Goater
     
  • Rename struct pspace to struct pid_namespace for consistency with other
    namespaces (uts_namespace and ipc_namespace). Also rename
    include/linux/pspace.h to include/linux/pid_namespace.h and variables from
    pspace to pid_ns.

    Signed-off-by: Sukadev Bhattiprolu
    Signed-off-by: Cedric Le Goater
    Cc: Kirill Korotaev
    Cc: Eric W. Biederman
    Cc: Herbert Poetzl
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Sukadev Bhattiprolu
     
  • Add an identifier to nsproxy. The default init_ns_proxy has identifier 0 and
    allocated nsproxies are given -1.

    This identifier will be used by a new syscall sys_bind_ns.

    Signed-off-by: Cedric Le Goater
    Cc: Kirill Korotaev
    Cc: Eric W. Biederman
    Cc: Herbert Poetzl
    Cc: Sukadev Bhattiprolu
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Cedric Le Goater
     
  • Rename 'struct namespace' to 'struct mnt_namespace' to avoid confusion with
    other namespaces being developped for the containers : pid, uts, ipc, etc.
    'namespace' variables and attributes are also renamed to 'mnt_ns'

    Signed-off-by: Kirill Korotaev
    Signed-off-by: Cedric Le Goater
    Cc: Eric W. Biederman
    Cc: Herbert Poetzl
    Cc: Sukadev Bhattiprolu
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kirill Korotaev
     
  • Add an anonymous union and ((deprecated)) to catch direct usage of the
    session field.

    [akpm@osdl.org: fix various missed conversions]
    [jdike@addtoit.com: fix UML bug]
    Signed-off-by: Jeff Dike
    Cc: Cedric Le Goater
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Cedric Le Goater
     
  • Replace occurences of task->signal->session by a new process_session() helper
    routine.

    It will be useful for pid namespaces to abstract the session pid number.

    Signed-off-by: Cedric Le Goater
    Cc: Kirill Korotaev
    Cc: Eric W. Biederman
    Cc: Herbert Poetzl
    Cc: Sukadev Bhattiprolu
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Cedric Le Goater
     
  • Signed-off-by: Josef Sipek
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Josef Sipek
     
  • Change all the uses of f_{dentry,vfsmnt} to f_path.{dentry,mnt} in
    linux/kernel/.

    Signed-off-by: Josef "Jeff" Sipek
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Josef "Jeff" Sipek
     
  • md_open takes ->reconfig_mutex which causes lockdep to complain. This
    (normally) doesn't have deadlock potential as the possible conflict is with a
    reconfig_mutex in a different device.

    I say "normally" because if a loop were created in the array->member hierarchy
    a deadlock could happen. However that causes bigger problems than a deadlock
    and should be fixed independently.

    So we flag the lock in md_open as a nested lock. This requires defining
    mutex_lock_interruptible_nested.

    Cc: Ingo Molnar
    Acked-by: Peter Zijlstra
    Acked-by: Ingo Molnar
    Signed-off-by: Neil Brown
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    NeilBrown
     
  • sys_unshare(CLONE_SIGHAND) is broken, the code under 'if (new_sigh)' is
    never executed but very wrong. Just remove it to avoid a confusion,
    task_lock() has nothing to do with ->sighand changing.

    Also, change the comment in unshare_sighand(). Yes, CLONE_THREAD implies
    CLONE_SIGHAND, but still it looks confusing. Also, we don't need to check
    current->sighand != NULL.

    Signed-off-by: Oleg Nesterov
    Acked-by: Peter Zijlstra
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • Make set_special_pids() static, the only caller is daemonize().

    Signed-off-by: Oleg Nesterov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • No need to take the global tty_mutex, signal->tty->driver can't go away while
    we are holding ->siglock.

    Signed-off-by: Oleg Nesterov
    Acked-by: Peter Zijlstra
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
     
  • Fix the locking of signal->tty.

    Use ->sighand->siglock to protect ->signal->tty; this lock is already used
    by most other members of ->signal/->sighand. And unless we are 'current'
    or the tasklist_lock is held we need ->siglock to access ->signal anyway.

    (NOTE: sys_unshare() is broken wrt ->sighand locking rules)

    Note that tty_mutex is held over tty destruction, so while holding
    tty_mutex any tty pointer remains valid. Otherwise the lifetime of ttys
    are governed by their open file handles. This leaves some holes for tty
    access from signal->tty (or any other non file related tty access).

    It solves the tty SLAB scribbles we were seeing.

    (NOTE: the change from group_send_sig_info to __group_send_sig_info needs to
    be examined by someone familiar with the security framework, I think
    it is safe given the SEND_SIG_PRIV from other __group_send_sig_info
    invocations)

    [schwidefsky@de.ibm.com: 3270 fix]
    [akpm@osdl.org: various post-viro fixes]
    Signed-off-by: Peter Zijlstra
    Acked-by: Alan Cox
    Cc: Oleg Nesterov
    Cc: Prarit Bhargava
    Cc: Chris Wright
    Cc: Roland McGrath
    Cc: Stephen Smalley
    Cc: James Morris
    Cc: "David S. Miller"
    Cc: Jeff Dike
    Cc: Martin Schwidefsky
    Cc: Jan Kara
    Signed-off-by: Martin Schwidefsky
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Peter Zijlstra
     
  • While running my MCA test (hardware error injection) on 2.6.19,
    I got some warning like following:

    > BUG: warning at kernel/irq/migration.c:27/move_masked_irq()
    >
    > Call Trace:
    > [] show_stack+0x40/0xa0
    > sp=e00000006b2578d0 bsp=e00000006b2510b0
    > [] dump_stack+0x30/0x60
    > sp=e00000006b257aa0 bsp=e00000006b251098
    > [] move_masked_irq+0xb0/0x240
    > sp=e00000006b257aa0 bsp=e00000006b251070
    > [] move_native_irq+0xe0/0x180
    > sp=e00000006b257aa0 bsp=e00000006b251040
    > [] iosapic_end_level_irq+0x30/0xe0
    > sp=e00000006b257aa0 bsp=e00000006b251020
    > [] __do_IRQ+0x170/0x400
    > sp=e00000006b257aa0 bsp=e00000006b250fd8
    > [] ia64_handle_irq+0x1b0/0x260
    > sp=e00000006b257aa0 bsp=e00000006b250fa8
    > [] ia64_leave_kernel+0x0/0x280
    > sp=e00000006b257aa0 bsp=e00000006b250fa8
    > [] _spin_unlock_irqrestore+0x30/0x60
    > sp=e00000006b257c70 bsp=e00000006b250f90

    It comes from:

    [kernel/irq/migration.c]
    26 if (CHECK_IRQ_PER_CPU(desc->status)) {
    27 WARN_ON(1);
    28 return;
    29 }

    By putting some printk in kernel, I found that irqbalance is trying to
    move CPEI which is handled as PER_CPU irq. That's why.

    CPEI(Corrected Platform Error Interrupt) is ia64 specific irq, is
    allowed to pin to particular processor which selected by the platform, and
    even it is PER_CPU but it has set_affinity handler (=iosapic_set_affinity)
    as same as other IO-SAPIC-level interrupts. (I don't know why, but
    I guess that there would be typical situation where the handler for
    migration is needed, such as hotplug - the processor going to be
    offline/hot-removed.)

    To shut up this warning, there are 2 way at least:
    a) fix CPEI stuff
    b) prohibit setting affinity to PER_CPU irq

    I'm not sure what stuff of CPEI need to be fixed, but I think that
    returning error to attempting move PER_CPU irq is useful for all
    applications since it will never work.

    Following small patch takes b) style.
    It works, the warning disappeared and irqbalance still runs well.

    Signed-off-by: Hidetoshi Seto
    Cc: Arjan van de Ven
    Acked-by: Ingo Molnar
    Acked-by: Thomas Gleixner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Hidetoshi Seto
     
  • Kallsyms data is never written to, so it can as well benefit from
    CONFIG_DEBUG_RODATA.

    Signed-off-by: Jan Beulich
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jan Beulich
     

08 Dec, 2006

17 commits

  • * 'release' of master.kernel.org:/pub/scm/linux/kernel/git/aegl/linux-2.6:
    [IA64] replace kmalloc+memset with kzalloc
    [IA64] resolve name clash by renaming is_available_memory()
    [IA64] Need export for csum_ipv6_magic
    [IA64] Fix DISCONTIGMEM without VIRTUAL_MEM_MAP
    [PATCH] Add support for type argument in PAL_GET_PSTATE
    [IA64] tidy up return value of ip_fast_csum
    [IA64] implement csum_ipv6_magic for ia64.
    [IA64] More Itanium PAL spec updates
    [IA64] Update processor_info features
    [IA64] Add se bit to Processor State Parameter structure
    [IA64] Add dp bit to cache and bus check structs
    [IA64] SN: Correctly update smp_affinty mask
    [IA64] sparse cleanups
    [IA64] IA64 Kexec/kdump

    Linus Torvalds
     
  • Changes and updates.

    1. Remove fake rendz path and related code according to discuss with Khalid Aziz.
    2. fc.i offset fix in relocate_kernel.S.
    3. iospic shutdown code eoi and mask race fix from Fujitsu.
    4. Warm boot hook in machine_kexec to SN SAL code from Jack Steiner.
    5. Send slave to SAL slave loop patch from Jay Lan.
    6. Kdump on non-recoverable MCA event patch from Jay Lan
    7. Use CTL_UNNUMBERED in kdump_on_init sysctl.

    Signed-off-by: Zou Nan hai
    Signed-off-by: Tony Luck

    Zou Nan hai
     
  • This allows workqueue users to run just their own pending work, rather
    than wait for the whole workqueue to finish running. This solves the
    deadlock with networking libphy that was due to other workqueue entries
    possibly needing a lock that was held by the routine that wanted to
    flush its own work.

    It's not wonderful: if you absolutely need to synchronize with the work
    function having been executed, any user strictly speaking should have
    its own completion tracking logic, since when we run things explicitly
    by hand, the generic workqueue layer can no longer help us synchronize.

    Also, this is strictly only usable for work that has been scheduled
    without any delayed timers. You can not mix the new interface with
    schedule_delayed_work().

    But it's better than what we had currently.

    Acked-by: Maciej W. Rozycki
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     
  • * master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6: (48 commits)
    [NETFILTER]: Fix non-ANSI func. decl.
    [TG3]: Identify Serdes devices more clearly.
    [TG3]: Use msleep.
    [TG3]: Use netif_msg_*.
    [TG3]: Allow partial speed advertisement.
    [TG3]: Add TG3_FLG2_IS_NIC flag.
    [TG3]: Add 5787F device ID.
    [TG3]: Fix Phy loopback.
    [WANROUTER]: Kill kmalloc debugging code.
    [TCP] inet_twdr_hangman: Delete unnecessary memory barrier().
    [NET]: Memory barrier cleanups
    [IPSEC]: Fix inetpeer leak in ipv4 xfrm dst entries.
    audit: disable ipsec auditing when CONFIG_AUDITSYSCALL=n
    audit: Add auditing to ipsec
    [IRDA] irlan: Fix compile warning when CONFIG_PROC_FS=n
    [IrDA]: Incorrect TTP header reservation
    [IrDA]: PXA FIR code device model conversion
    [GENETLINK]: Fix misplaced command flags.
    [NETLIK]: Add a pointer to the Generic Netlink wiki page.
    [IPV6] RAW: Don't release unlocked sock.
    ...

    Linus Torvalds
     
  • * 'for-linus' of git://one.firstfloor.org/home/andi/git/linux-2.6: (156 commits)
    [PATCH] x86-64: Export smp_call_function_single
    [PATCH] i386: Clean up smp_tune_scheduling()
    [PATCH] unwinder: move .eh_frame to RODATA
    [PATCH] unwinder: fully support linker generated .eh_frame_hdr section
    [PATCH] x86-64: don't use set_irq_regs()
    [PATCH] x86-64: check vector in setup_ioapic_dest to verify if need setup_IO_APIC_irq
    [PATCH] x86-64: Make ix86 default to HIGHMEM4G instead of NOHIGHMEM
    [PATCH] i386: replace kmalloc+memset with kzalloc
    [PATCH] x86-64: remove remaining pc98 code
    [PATCH] x86-64: remove unused variable
    [PATCH] x86-64: Fix constraints in atomic_add_return()
    [PATCH] x86-64: fix asm constraints in i386 atomic_add_return
    [PATCH] x86-64: Correct documentation for bzImage protocol v2.05
    [PATCH] x86-64: replace kmalloc+memset with kzalloc in MTRR code
    [PATCH] x86-64: Fix numaq build error
    [PATCH] x86-64: include/asm-x86_64/cpufeature.h isn't a userspace header
    [PATCH] unwinder: Add debugging output to the Dwarf2 unwinder
    [PATCH] x86-64: Clarify error message in GART code
    [PATCH] x86-64: Fix interrupt race in idle callback (3rd try)
    [PATCH] x86-64: Remove unwind stack pointer alignment forcing again
    ...

    Fixed conflict in include/linux/uaccess.h manually

    Signed-off-by: Linus Torvalds

    Linus Torvalds
     
  • When using fake NUMA setup, the number of memory nodes can greatly exceed
    the number of CPUs. So the current limit in cpuset_common_file_write() is
    insufficient.

    Signed-off-by: Paul Menage
    Acked-by: Paul Jackson
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Paul Menage
     
  • Sometimes the kernel prints something interesting while userspace bootup
    keeps messages turned off via loglevel. Enable the printing of /all/
    kernel messages via the "ignore_loglevel" boot option. Off by default.

    Signed-off-by: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ingo Molnar
     
  • The hash_lock must only ever be taken with irqs disabled. This happens in
    all the important places, except one codepath: register_lock_class(). The
    race should trigger rarely because register_lock_class() is quite rare and
    single-threaded (happens during init most of the time).

    The fix is to disable irqs.

    ( bug found live in -rt: there preemption is alot more agressive and
    preempting with the hash-lock held caused a lockup.)

    Signed-off-by: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ingo Molnar
     
  • The elf note saving code is currently duplicated over several
    architectures. This cleanup patch simply adds code to a common file and
    then replaces the arch-specific code with calls to the newly added code.

    The only drawback with this approach is that s390 doesn't fully support
    kexec-on-panic which for that arch leads to introduction of unused code.

    Signed-off-by: Magnus Damm
    Cc: Vivek Goyal
    Cc: Andi Kleen
    Cc: Paul Mackerras
    Cc: Benjamin Herrenschmidt
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Magnus Damm
     
  • - move some file_operations structs into the .rodata section

    - move static strings from policy_types[] array into the .rodata section

    - fix generic seq_operations usages, so that those structs may be defined
    as "const" as well

    [akpm@osdl.org: couple of fixes]
    Signed-off-by: Helge Deller
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Helge Deller
     
  • When disassembling a kernel I found around over 90 sync Instructions from
    mb, rmb and wmb calls in the kernel and only few of those make any sense to
    me. So here's the first one - I think the wmb() in kernel/futex.c is not
    needed on uniprocessors so should become an smb_wmb().

    Signed-off-by: Ralf Baechle
    Acked-by: Ingo Molnar
    Cc: Thomas Gleixner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ralf Baechle
     
  • The SRCU wrapper functions srcu_torture_read_lock and
    srcu_torture_read_unlock in rcutorture intentionally change the SRCU
    context; annotate them accordingly, to avoid a warning.

    Signed-off-by: Josh Triplett
    Acked-by: Paul E. McKenney
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Josh Triplett
     
  • Remove unused 'new_ruid' variable.

    Reported by David Binderman .

    Signed-off-by: David Rientjes
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Rientjes
     
  • In time for 2.6.20, we can get rid of this junk.

    Signed-off-by: Adrian Bunk
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Adrian Bunk
     
  • It is possible to have tasklets get scheduled before softirqd has had a chance
    to spawn on all CPUs. This is totally harmless; after success during action
    CPU_UP_PREPARE, action CPU_ONLINE will be called, which immediately wakes
    softirqd on the appropriate CPU to process the already pending tasklets. So
    there is no danger of having a missed wakeup for any tasklets that were
    already pending.

    In particular, i386 is affected by this during startup, and is visible when
    using a very large initrd; during the time it takes for the initrd to be
    decompressed, a timer IRQ can come in and schedule RCU callbacks. It is also
    possible that resending of a hardware IRQ via a softirq triggers the same bug.

    Because of different timing conditions, this shows up in all emulators and
    virtual machines tested, including Xen, VMware, Virtual PC, and Qemu. It is
    also possible to trigger on native hardware with a large enough initrd,
    although I don't have a reliable case demonstrating that.

    Signed-off-by: Zachary Amsden
    Cc:
    Cc: Ingo Molnar
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Zachary Amsden
     
  • Make the locking self-test failures (of 'FAILURE' type) easier to debug by
    printing more information.

    Signed-off-by: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ingo Molnar
     
  • Some have reported a chain-table overflow - double its size.

    Signed-off-by: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ingo Molnar