14 Aug, 2008

7 commits

  • Fix the setting of PF_SUPERPRIV by __capable() as it could corrupt the flags
    of the target process if that is not the current process and it is trying to
    change its own flags in a different way at the same time.

    __capable() is using neither atomic ops nor locking to protect t->flags. This
    patch removes __capable() and introduces has_capability() that doesn't set
    PF_SUPERPRIV on the process being queried.
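
    The distinction can be sketched in user-space C. This is a simplified
    model, not kernel code: struct sim_task, its fields, the flag value and
    the sim_ prefixed helpers are all assumptions made for illustration. It
    only shows that the query helper never writes to the task it inspects,
    while the capable()-style helper writes only to the caller's own task.

    ```c
    #include <stdbool.h>

    /* Hypothetical stand-ins for kernel types and constants (illustration only). */
    #define SIM_PF_SUPERPRIV 0x100u

    struct sim_task {
        unsigned int flags;     /* per-task flags, meant to be written by the task itself */
        unsigned int cap_mask;  /* one bit per capability held */
    };

    /* Models has_capability(): a pure query, the target's flags are never touched. */
    static bool sim_has_capability(const struct sim_task *t, unsigned int cap)
    {
        return (t->cap_mask & (1u << cap)) != 0;
    }

    /* Models capable(): only applied to the calling task, which may safely
     * record on itself that it used a privilege. */
    static bool sim_capable(struct sim_task *current_task, unsigned int cap)
    {
        if (!sim_has_capability(current_task, cap))
            return false;
        current_task->flags |= SIM_PF_SUPERPRIV;
        return true;
    }
    ```

    Because sim_has_capability() takes a const pointer, the compiler rejects
    any accidental write to the queried task, which is the property the
    patch relies on.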

    This patch further splits security_ptrace() in two:

    (1) security_ptrace_may_access(). This passes judgement on whether one
    process may access another only (PTRACE_MODE_ATTACH for ptrace() and
    PTRACE_MODE_READ for /proc), and takes a pointer to the child process.
    current is the parent.

    (2) security_ptrace_traceme(). This passes judgement on PTRACE_TRACEME only,
    and takes only a pointer to the parent process. current is the child.

    In Smack and commoncap, this uses has_capability() to determine whether
    the parent will be permitted to use PTRACE_ATTACH if normal checks fail.
    This does not set PF_SUPERPRIV.

    Two of the instances of __capable() actually only act on current, and so have
    been changed to calls to capable().

    Of the places that were using __capable():

    (1) The OOM killer calls __capable() thrice when weighing the killability of a
    process. All of these now use has_capability().

    (2) cap_ptrace() and smack_ptrace() were using __capable() to check to see
    whether the parent was allowed to trace any process. As mentioned above,
    these have been split. For PTRACE_ATTACH and /proc, capable() is now
    used, and for PTRACE_TRACEME, has_capability() is used.

    (3) cap_safe_nice() only ever saw current, so now uses capable().

    (4) smack_setprocattr() rejected accesses to tasks other than current just
    after calling __capable(), so the order of these two tests has been
    switched and capable() is used instead.

    (5) In smack_file_send_sigiotask(), we need to allow privileged processes to
    receive SIGIO on files they're manipulating.

    (6) In smack_task_wait(), we let a process wait for a privileged process,
    whether or not the process doing the waiting is privileged.

    I've tested this with the LTP SELinux and syscalls testscripts.

    Signed-off-by: David Howells
    Acked-by: Serge Hallyn
    Acked-by: Casey Schaufler
    Acked-by: Andrew G. Morgan
    Acked-by: Al Viro
    Signed-off-by: James Morris

    David Howells
     
  • * git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
    crypto: padlock - fix VIA PadLock instruction usage with irq_ts_save/restore()
    crypto: hash - Add missing top-level functions
    crypto: hash - Fix digest size check for digest type
    crypto: tcrypt - Fix AEAD chunk testing
    crypto: talitos - Add handling for SEC 3.x treatment of link table

    Linus Torvalds
     
  • * git://oss.sgi.com:8090/xfs/linux-2.6: (45 commits)
    [XFS] Fix use after free in xfs_log_done().
    [XFS] Make xfs_bmap_*_count_leaves void.
    [XFS] Use KM_NOFS for debug trace buffers
    [XFS] use KM_MAYFAIL in xfs_mountfs
    [XFS] refactor xfs_mount_free
    [XFS] don't call xfs_freesb from xfs_unmountfs
    [XFS] xfs_unmountfs should return void
    [XFS] cleanup xfs_mountfs
    [XFS] move root inode IRELE into xfs_unmountfs
    [XFS] stop using file_update_time
    [XFS] optimize xfs_ichgtime
    [XFS] update timestamp in xfs_ialloc manually
    [XFS] remove the sema_t from XFS.
    [XFS] replace dquot flush semaphore with a completion
    [XFS] replace inode flush semaphore with a completion
    [XFS] extend completions to provide XFS object flush requirements
    [XFS] replace the XFS buf iodone semaphore with a completion
    [XFS] clean up stale references to semaphores
    [XFS] use get_unaligned_* helpers
    [XFS] Fix compile failure in xfs_buf_trace()
    ...

    Linus Torvalds
     
  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/dlm:
    dlm: rename structs
    dlm: add missing kfrees

    Linus Torvalds
     
  • Done as a script (well, a single "git mv" actually) on request from
    Yoshinori Sato as a way to avoid a huge diff.

    Requested-by: Yoshinori Sato
    Cc: Sam Ravnborg
    Cc: Andrew Morton
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     
  • Add a dlm_ prefix to the struct names in config.c. This resolves a
    conflict with struct node in particular, when include/linux/node.h
    happens to be included.

    Reported-by: Andrew Morton
    Signed-off-by: David Teigland

    David Teigland
     
  • A couple of unlikely error conditions were missing a kfree on the error
    exit path.

    Reported-by: Juha Leppanen
    Signed-off-by: David Teigland

    David Teigland
     

13 Aug, 2008

33 commits

  • Wolfgang Walter reported this oops on his via C3 using padlock for
    AES-encryption:

    ##################################################################

    BUG: unable to handle kernel NULL pointer dereference at 000001f0
    IP: [] __switch_to+0x30/0x117
    *pde = 00000000
    Oops: 0002 [#1] PREEMPT
    Modules linked in:

    Pid: 2071, comm: sleep Not tainted (2.6.26 #11)
    EIP: 0060:[] EFLAGS: 00010002 CPU: 0
    EIP is at __switch_to+0x30/0x117
    EAX: 00000000 EBX: c0493300 ECX: dc48dd00 EDX: c0493300
    ESI: dc48dd00 EDI: c0493530 EBP: c04cff8c ESP: c04cff7c
    DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 0068
    Process sleep (pid: 2071, ti=c04ce000 task=dc48dd00 task.ti=d2fe6000)
    Stack: dc48df30 c0493300 00000000 00000000 d2fe7f44 c03b5b43 c04cffc8 00000046
    c0131856 0000005a dc472d3c c0493300 c0493470 d983ae00 00002696 00000000
    c0239f54 00000000 c04c4000 c04cffd8 c01025fe c04f3740 00049800 c04cffe0
    Call Trace:
    [] ? schedule+0x285/0x2ff
    [] ? pm_qos_requirement+0x3c/0x53
    [] ? acpi_processor_idle+0x0/0x434
    [] ? cpu_idle+0x73/0x7f
    [] ? rest_init+0x61/0x63
    =======================

    Wolfgang also found out that adding kernel_fpu_begin() and kernel_fpu_end()
    around the padlock instructions fixes the oops.

    Suresh wrote:

    Although these padlock instructions don't use/touch SSE registers, they
    behave similarly to other SSE instructions. For example, they might cause
    DNA faults when cr0.ts is set. While this is a spurious DNA trap, it might
    cause an oops with the recent fpu code changes.

    This is the code sequence that is probably causing this problem:

    a) new app is getting exec'd and it is somewhere in between
    start_thread() and flush_old_exec() in the load_xyz_binary()

    b) At point "a", task's fpu state (like TS_USEDFPU, used_math() etc) is
    cleared.

    c) Now we get an interrupt/softirq which starts using these encrypt/decrypt
    routines in the network stack. This generates a math fault (as
    cr0.ts is '1') which sets TS_USEDFPU and restores the math that is
    in the task's xstate.

    d) Return to exec code path, which does start_thread() which does
    free_thread_xstate() and sets xstate pointer to NULL while
    the TS_USEDFPU is still set.

    e) At the next context switch from the new exec'd task to another task,
    we have a scenario where TS_USEDFPU is set but xstate pointer is null.
    This can cause an oops during unlazy_fpu() in __switch_to()

    Now:

    1) This should happen with or without pre-emption. Viro also encountered
    a similar problem without CONFIG_PREEMPT.

    2) kernel_fpu_begin() and kernel_fpu_end() will fix this problem, because
    kernel_fpu_begin() will manually do a clts() and won't run in to the
    situation of setting TS_USEDFPU in step "c" above.

    3) This was working before the fpu changes, because it's a spurious
    math fault which doesn't corrupt any fpu/sse registers and the task's
    math state was always in an allocated state.

    Without the recent lazy fpu allocation changes, while we don't see the oops,
    there is a possible race still present in older kernels (for example,
    while the kernel is using kernel_fpu_begin() in some optimized clear/copy
    page and an interrupt/softirq happens which uses these padlock
    instructions, generating a DNA fault).

    This is the failing scenario that existed even before the lazy fpu allocation
    changes:

    0. CPU's TS flag is set

    1. kernel using FPU in some optimized copy routine and while doing
    kernel_fpu_begin() takes an interrupt just before doing clts()

    2. Takes an interrupt and ipsec uses padlock instruction. And we
    take a DNA fault as TS flag is still set.

    3. We handle the DNA fault and set TS_USEDFPU and clear cr0.ts

    4. We complete the padlock routine

    5. Go back to step-1, which resumes clts() in kernel_fpu_begin(), finishes
    the optimized copy routine and does kernel_fpu_end(). At this point,
    we have cr0.ts again set to '1' but the task's TS_USEDFPU is still
    set and not cleared.

    6. Now kernel resumes its user operation. And at the next context
    switch, kernel sees it has to do a FP save as TS_USEDFPU is still set
    and then will do a unlazy_fpu() in __switch_to(). unlazy_fpu()
    will take a DNA fault, as cr0.ts is '1' and now, because we are
    in __switch_to(), math_state_restore() will get confused and will
    restore the next task's FP state and will save it in prev task's FP state.
    Remember, in __switch_to() we are already on the stack of the next task
    but take a DNA fault for the prev task.

    This causes the fpu leakage.

    Fix the padlock instruction usage by calling them inside the
    context of new routines irq_ts_save/restore(), which clear/restore cr0.ts
    manually in the interrupt context. This will not generate spurious DNA
    in the context of the interrupt which will fix the oops encountered and
    the possible FPU leakage issue.
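
    The save/clear/restore bracket described above can be sketched as a
    user-space simulation. The commit names irq_ts_save/restore() but does
    not show their bodies, so everything here is an assumption: the TS bit
    is modelled as a plain variable, the sim_ names are hypothetical, and
    any check of whether the caller really is in interrupt context is left
    out.

    ```c
    #include <stdbool.h>

    /* Simulated CR0.TS bit; on real hardware this lives in the cr0 register. */
    static bool sim_cr0_ts;

    /* Models irq_ts_save(): if TS is set, clear it and report that it was set. */
    static int sim_irq_ts_save(void)
    {
        if (sim_cr0_ts) {
            sim_cr0_ts = false;   /* stands in for clts() */
            return 1;
        }
        return 0;
    }

    /* Models irq_ts_restore(): set TS again only if the save step cleared it. */
    static void sim_irq_ts_restore(int ts_state)
    {
        if (ts_state)
            sim_cr0_ts = true;    /* stands in for stts() */
    }

    /* Hypothetical stand-in for a PadLock instruction: it would take a
     * spurious DNA fault if TS were set when it runs. */
    static bool sim_padlock_op(void)
    {
        return !sim_cr0_ts;       /* true means "no spurious fault" */
    }

    /* Bracket the instruction so it never runs with TS set. */
    static bool sim_padlock_encrypt(void)
    {
        int ts_state = sim_irq_ts_save();
        bool ok = sim_padlock_op();

        sim_irq_ts_restore(ts_state);
        return ok;
    }
    ```

    Since the fault never fires, nothing sets TS_USEDFPU behind the
    interrupted task's back, and the TS bit is left exactly as the
    interrupted code expects to find it.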

    Reported-and-bisected-by: Wolfgang Walter
    Signed-off-by: Suresh Siddha
    Signed-off-by: Herbert Xu

    Suresh Siddha
     
  • The top-level functions init/update/final were missing for ahash.

    Signed-off-by: Herbert Xu

    Herbert Xu
     
  • The changeset ca786dc738f4f583b57b1bba7a335b5e8233f4b0

    crypto: hash - Fixed digest size check

    missed one spot for the digest type. This patch corrects that
    error.

    Signed-off-by: Herbert Xu

    Herbert Xu
     
  • My changeset 4b22f0ddb6564210c9ded7ba25b2a1007733e784

    crypto: tcrpyt - Remove unnecessary kmap/kunmap calls

    introduced a typo that broke AEAD chunk testing. In particular,
    axbuf should really be xbuf.

    There is also an issue with testing the last segment when encrypting.
    The additional part produced by AEAD wasn't tested. Similarly, on
    decryption the additional part of the AEAD input is mistaken for
    corruption.

    Signed-off-by: Herbert Xu

    Herbert Xu
     
  • Later SEC revisions require the link table (used for scatter/gather)
    to have an extra entry to account for the total length in descriptor [4],
    which contains cipher Input and ICV.
    This only applies to decrypt, not encrypt.
    Without this change, on 837x, a gather return/length error results
    when a decryption uses a link table to gather the fragments.
    This is observed by doing a ping with size of 1447 or larger with AES,
    or a ping with size 1455 or larger with 3des.

    So, add check for SEC compatible "fsl,3.0" for using extra link table entry.

    Signed-off-by: Lee Nipper
    Signed-off-by: Kim Phillips
    Signed-off-by: Herbert Xu

    Lee Nipper
     
  • The ticket allocation code got reworked in 2.6.26 and we now free tickets
    whereas before we used to cache them so the use-after-free went
    undetected.

    SGI-PV: 985525

    SGI-Modid: xfs-linux-melb:xfs-kern:31877a

    Signed-off-by: Lachlan McIlroy
    Signed-off-by: David Chinner

    Lachlan McIlroy
     
  • xfs_bmap_count_leaves and xfs_bmap_disk_count_leaves always return 0,
    so make them void.

    SGI-PV: 981498

    SGI-Modid: xfs-linux-melb:xfs-kern:31844a

    Signed-off-by: Ruben Porras
    Signed-off-by: Donald Douwsma
    Signed-off-by: Lachlan McIlroy

    Ruben Porras
     
  • Use KM_NOFS to prevent recursion back into the filesystem which can cause
    deadlocks.

    In the case of xfs_iread() we hold the lock on the inode cluster buffer
    while allocating memory for the trace buffers. If we recurse back into XFS
    to flush data, that may require a transaction to allocate extents, which
    needs log space. This can deadlock with the xfsaild thread which can't
    push the tail of the log because it is trying to get the inode cluster
    buffer lock.

    SGI-PV: 981498

    SGI-Modid: xfs-linux-melb:xfs-kern:31838a

    Signed-off-by: Lachlan McIlroy
    Signed-off-by: David Chinner

    Lachlan McIlroy
     
  • Use KM_MAYFAIL for the m_perag allocation; we can deal with the error
    easily and blocking forever during mount is not a good idea either.

    SGI-PV: 981498

    SGI-Modid: xfs-linux-melb:xfs-kern:31837a

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Lachlan McIlroy

    Christoph Hellwig
     
  • xfs_mount_free mostly frees the perag data, which is something that is
    duplicated in the mount error path.

    Move the XFS_QM_DONE call to the caller and remove the useless
    mutex_destroy/spinlock_destroy calls so that we can re-use it for the
    mount error path. Also rename it to xfs_free_perag to reflect what it
    does.

    SGI-PV: 981498

    SGI-Modid: xfs-linux-melb:xfs-kern:31836a

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Lachlan McIlroy

    Christoph Hellwig
     
  • xfs_readsb is called before xfs_mount so xfs_freesb should be called after
    xfs_unmountfs, too. This means it now happens after a few things during
    xfs_unmount which all have nothing to do with the superblock.

    SGI-PV: 981498

    SGI-Modid: xfs-linux-melb:xfs-kern:31835a

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Lachlan McIlroy

    Christoph Hellwig
     
  • xfs_unmountfs can't and shouldn't return errors so declare it as returning
    void.

    SGI-PV: 981498

    SGI-Modid: xfs-linux-melb:xfs-kern:31833a

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Lachlan McIlroy

    Christoph Hellwig
     
  • Remove all the useless flags and the code keyed off them in xfs_mountfs.

    SGI-PV: 981498

    SGI-Modid: xfs-linux-melb:xfs-kern:31831a

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Lachlan McIlroy

    Christoph Hellwig
     
  • The root inode is allocated in xfs_mountfs so it should be released in
    xfs_unmountfs. For the unmount case that means we do it after the
    xfs_sync(mp, SYNC_WAIT | SYNC_CLOSE) in the forced shutdown case and the
    dmapi unmount event. Note that both reference the rip variable which might
    be freed by that time in case inode flushing has kicked in, so strictly
    speaking this might count as a bug fix.

    SGI-PV: 981498

    SGI-Modid: xfs-linux-melb:xfs-kern:31830a

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Lachlan McIlroy

    Christoph Hellwig
     
  • xfs_ichgtime updates the xfs_inode and Linux inode timestamps just fine, no
    need to call file_update_time and then copy the values over to the XFS
    inode. The only additional things in file_update_time are checks not
    applicable to the write path.

    SGI-PV: 981498

    SGI-Modid: xfs-linux-melb:xfs-kern:31829a

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Lachlan McIlroy
    Signed-off-by: David Chinner

    Christoph Hellwig
     
  • Port a little optimization from file_update_time to xfs_ichgtime, and only
    update the timestamp and mark the inode dirty if the timestamp actually
    changes in the timer tick resolution supported by the running kernel.
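
    The check can be sketched as below. This is a minimal user-space model
    with hypothetical names (struct sim_time, struct sim_tinode,
    sim_update_mtime); it does not reproduce the real xfs_ichgtime
    signature or its flag handling, only the "skip if unchanged" idea.

    ```c
    #include <stdbool.h>

    /* Hypothetical timestamp and inode types (illustration only). */
    struct sim_time {
        long sec;
        long nsec;
    };

    struct sim_tinode {
        struct sim_time mtime;
        bool dirty;
    };

    /* Update the timestamp only if it differs from the stored value.
     * Returns true when the inode was modified and marked dirty. */
    static bool sim_update_mtime(struct sim_tinode *ip, struct sim_time now)
    {
        if (ip->mtime.sec == now.sec && ip->mtime.nsec == now.nsec)
            return false;   /* same tick: skip the write and the dirtying */

        ip->mtime = now;
        ip->dirty = true;
        return true;
    }
    ```

    Repeated writes within one timer tick then leave the inode clean
    instead of dirtying it again for an identical timestamp.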

    SGI-PV: 981498

    SGI-Modid: xfs-linux-melb:xfs-kern:31827a

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Lachlan McIlroy

    Christoph Hellwig
     
  • In xfs_ialloc we just want to set all timestamps to the current time. We
    don't need to mark the inode dirty like xfs_ichgtime does, and we neither
    need nor want the optimizations in xfs_ichgtime that I will introduce in
    the next patch.

    So just opencode the timestamp update in xfs_ialloc, and remove the now
    unused XFS_ICHGTIME_ACC case in xfs_ichgtime.

    SGI-PV: 981498

    SGI-Modid: xfs-linux-melb:xfs-kern:31825a

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Lachlan McIlroy

    Christoph Hellwig
     
  • Now that all users of the sema_t are gone from XFS we can finally kill it.

    SGI-PV: 981498

    SGI-Modid: xfs-linux-melb:xfs-kern:31823a

    Signed-off-by: David Chinner
    Signed-off-by: Lachlan McIlroy

    David Chinner
     
  • Use the new completion flush code to implement the dquot flush lock.
    Removes one of the final users of semaphores in the XFS code base.

    SGI-PV: 981498

    SGI-Modid: xfs-linux-melb:xfs-kern:31822a

    Signed-off-by: David Chinner
    Signed-off-by: Lachlan McIlroy

    David Chinner
     
  • Use the new completion flush code to implement the inode flush lock.
    Removes one of the final users of semaphores in the XFS code base.

    SGI-PV: 981498

    SGI-Modid: xfs-linux-melb:xfs-kern:31817a

    Signed-off-by: David Chinner
    Signed-off-by: Lachlan McIlroy

    David Chinner
     
  • XFS object flushing doesn't quite match existing completion semantics. It
    mixes exclusive access with completion. That is, we need to mark an object as
    being flushed before flushing it to disk, and then block any other attempt to
    flush it until the completion occurs. We do this by adding an extra count to
    the completion before we start using them. However, we still need to
    determine if there is a completion in progress, and allow non-blocking
    attempts for completions to decrement the count.

    To do this we introduce:

    int try_wait_for_completion(struct completion *x)
        returns a failure status if done == 0, otherwise decrements done
        to zero and returns a "started" status. This is provided
        to allow counted completions to begin safely while holding
        object locks in inverted order.

    int completion_done(struct completion *x)
        returns 1 if there is no waiter, 0 if there is a waiter
        (i.e. a completion in progress).

    This replaces the use of semaphores for providing this exclusion
    and completion mechanism.
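
    How the two helpers combine into a flush lock can be sketched in
    user-space C. This is a model under stated assumptions, not the kernel
    implementation: there is no wait queue or sleeping, the counter uses
    C11 atomics instead of the kernel's locking, and all sim_ names are
    hypothetical.

    ```c
    #include <stdatomic.h>
    #include <stdbool.h>

    /* Simplified model of a counted completion (no wait queue, no sleeping). */
    struct sim_completion {
        atomic_uint done;
    };

    /* Initialise with an explicit count; a flush lock starts at 1 (the
     * "extra count"), meaning no flush is in progress. */
    static void sim_init_completion(struct sim_completion *x, unsigned int count)
    {
        atomic_init(&x->done, count);
    }

    /* Models complete(): the flush has finished, release the flush lock. */
    static void sim_complete(struct sim_completion *x)
    {
        atomic_fetch_add(&x->done, 1);
    }

    /* Models try_wait_for_completion(): never blocks. Returns false if
     * done == 0, otherwise consumes one count and returns true. */
    static bool sim_try_wait_for_completion(struct sim_completion *x)
    {
        unsigned int old = atomic_load(&x->done);

        while (old != 0) {
            /* On failure, old is reloaded with the current value. */
            if (atomic_compare_exchange_weak(&x->done, &old, old - 1))
                return true;
        }
        return false;
    }

    /* Models completion_done(): false while a flush is in progress. */
    static bool sim_completion_done(struct sim_completion *x)
    {
        return atomic_load(&x->done) != 0;
    }
    ```

    With the count initialised to 1, a successful
    sim_try_wait_for_completion() acts as a trylock on the flush, and
    sim_complete() at I/O completion acts as the unlock.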

    SGI-PV: 981498

    SGI-Modid: xfs-linux-melb:xfs-kern:31816a

    Signed-off-by: David Chinner
    Signed-off-by: Lachlan McIlroy

    David Chinner
     
  • The xfs_buf_t b_iodonesema is really just a semaphore that wants to be a
    completion. Change it to a completion and remove the last user of the
    sema_t from XFS.

    SGI-PV: 981498

    SGI-Modid: xfs-linux-melb:xfs-kern:31815a

    Signed-off-by: David Chinner
    Signed-off-by: Lachlan McIlroy

    David Chinner
     
  • A lot of code has been converted away from semaphores, but there are still
    comments that reference semaphore behaviour. The log code is the worst
    offender. Update the comments to reflect what the code really does now.

    SGI-PV: 981498

    SGI-Modid: xfs-linux-melb:xfs-kern:31814a

    Signed-off-by: David Chinner
    Signed-off-by: Lachlan McIlroy

    David Chinner
     
  • SGI-PV: 981498

    SGI-Modid: xfs-linux-melb:xfs-kern:31813a

    Signed-off-by: Harvey Harrison
    Signed-off-by: Lachlan McIlroy

    Harvey Harrison
     
  • SGI-PV: 957103

    SGI-Modid: xfs-linux-melb:xfs-kern:31804a

    Signed-off-by: Lachlan McIlroy

    Lachlan McIlroy
     
  • The alloc and inobt btree use the same agbp/agno pair in the btree_cur
    union. Make them use the same bc_private.a union member so that code for
    these two short form btree implementations can be shared.

    SGI-PV: 981498

    SGI-Modid: xfs-linux-melb:xfs-kern:31788a

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Tim Shimmin
    Signed-off-by: David Chinner
    Signed-off-by: Lachlan McIlroy

    Christoph Hellwig
     
  • Remove unneeded xfs_btree_get_block forward declaration. Move
    xfs_btree_firstrec next to xfs_btree_lastrec.

    SGI-PV: 981498

    SGI-Modid: xfs-linux-melb:xfs-kern:31787a

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Tim Shimmin
    Signed-off-by: David Chinner
    Signed-off-by: Lachlan McIlroy

    Christoph Hellwig
     
  • Sanitize setting up the Linux inode.

    Setting up the xfs_inode inode link is opencoded in xfs_iget_core now
    because that's the only place it needs to be done, xfs_initialize_vnode is
    renamed to xfs_setup_inode and loses all superfluous parameters. The check
    for I_NEW is removed because it always is true and the di_mode check moves
    into xfs_iget_core because it's only needed there.

    xfs_set_inodeops and xfs_revalidate_inode are merged into xfs_setup_inode
    and the whole thing is moved into xfs_iops.c where it belongs.

    SGI-PV: 981498

    SGI-Modid: xfs-linux-melb:xfs-kern:31782a

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Niv Sardi
    Signed-off-by: Lachlan McIlroy

    Christoph Hellwig
     
  • All remaining bhv_vnode_t instances are in code that's more or less Linux
    specific. (Well, for xfs_acl.c that could be argued, but that code is on
    the removal list, too). So just do an s/bhv_vnode_t/struct inode/ over the
    whole tree. We can clean up variable naming and some useless helpers
    later.

    SGI-PV: 981498

    SGI-Modid: xfs-linux-melb:xfs-kern:31781a

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Lachlan McIlroy

    Christoph Hellwig
     
  • In various places we can just move a VFS_I call into the argument list of
    called functions/macros instead of having a local bhv_vnode_t.

    SGI-PV: 981498

    SGI-Modid: xfs-linux-melb:xfs-kern:31776a

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Lachlan McIlroy

    Christoph Hellwig
     
  • When multiple inodes are locked in XFS it happens in order of the inode
    number, with everything but the first inode trylocked if any of the
    previous inodes is in the AIL.

    Except for the sorting of the inodes this logic is implemented in
    xfs_lock_inodes, but it is also partially duplicated in
    xfs_lock_dir_and_entry in a particularly stupid way that adds a lock
    roundtrip if the inode ordering is not optimal.

    This patch adds a new helper xfs_lock_two_inodes that takes two inodes and
    locks them in the most optimal way according to the above locking protocol
    and uses it for all places that want to lock two inodes.

    The only caller of xfs_lock_inodes is xfs_rename which might lock up to
    four inodes.
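
    The ordering half of that protocol can be sketched as follows. This is
    a deliberately reduced user-space model: the inode type, the fake lock
    and the lock-order log are hypothetical, and the trylock-and-back-off
    step for inodes in the AIL is not modelled at all.

    ```c
    #include <stdbool.h>
    #include <stdint.h>

    /* Hypothetical inode with a fake lock (illustration only). */
    struct sim_xinode {
        uint64_t ino;
        bool locked;
    };

    /* Log of the order in which the last pair of locks was taken. */
    static uint64_t sim_lock_order[2];
    static int sim_lock_count;

    static void sim_lock(struct sim_xinode *ip)
    {
        ip->locked = true;
        sim_lock_order[sim_lock_count++] = ip->ino;
    }

    /* Lock two distinct inodes, lower inode number first, so that every
     * caller acquires the pair in the same global order. */
    static void sim_lock_two_inodes(struct sim_xinode *a, struct sim_xinode *b)
    {
        sim_lock_count = 0;

        if (a->ino > b->ino) {
            struct sim_xinode *tmp = a;

            a = b;
            b = tmp;
        }
        sim_lock(a);
        sim_lock(b);
    }
    ```

    Sorting inside the helper means callers can pass the inodes in any
    order without risking an ABBA deadlock against another caller locking
    the same pair.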

    SGI-PV: 981498

    SGI-Modid: xfs-linux-melb:xfs-kern:31772a

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Donald Douwsma
    Signed-off-by: Lachlan McIlroy

    Christoph Hellwig
     
  • All the error injection is already enabled through ifdef DEBUG, so kill
    the never set second cpp symbol to activate it without the rest of the
    debugging infrastructure.

    SGI-PV: 981498

    SGI-Modid: xfs-linux-melb:xfs-kern:31771a

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Niv Sardi
    Signed-off-by: Lachlan McIlroy

    Christoph Hellwig
     
  • Now that all direct calls to VN_HOLD/VN_RELE are gone we can implement
    IHOLD/IRELE directly.

    For the IHOLD case also replace igrab with a direct increment of i_count
    because we are guaranteed to already have a live and referenced inode by
    the VFS. Also remove the vn_hold statistic because it's been rather
    meaningless for some time with most references done by other callers.

    SGI-PV: 981498

    SGI-Modid: xfs-linux-melb:xfs-kern:31764a

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Lachlan McIlroy

    Christoph Hellwig