12 Sep, 2013

15 commits

  • No remaining users; we now use ipc_obtain_object_check().

    Signed-off-by: Davidlohr Bueso
    Cc: Sedat Dilek
    Cc: Rik van Riel
    Cc: Manfred Spraul
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Davidlohr Bueso
     
  • This function was replaced by the lockless shm_obtain_object_check(),
    and no longer has any users.

    Signed-off-by: Davidlohr Bueso
    Cc: Sedat Dilek
    Cc: Rik van Riel
    Cc: Manfred Spraul
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Davidlohr Bueso
     
  • After previous cleanups and optimizations, this function is no longer
    heavily used and we don't have a good reason to keep it. Update the few
    remaining callers and get rid of it.

    Signed-off-by: Davidlohr Bueso
    Cc: Sedat Dilek
    Cc: Rik van Riel
    Cc: Manfred Spraul
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Davidlohr Bueso
     
  • When !CONFIG_MMU there's a chance we can dereference a NULL pointer when
    the VM area isn't found - check the return value of find_vma().

    Also, remove the redundant -EINVAL return: retval is set to the proper
    return code and *only* changed to 0 when we actually unmap the segments.

    Signed-off-by: Davidlohr Bueso
    Cc: Sedat Dilek
    Cc: Rik van Riel
    Cc: Manfred Spraul
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Davidlohr Bueso
     
  • As suggested by Andrew, add a generic initial locking scheme to be used
    throughout all sysv ipc mechanisms, documenting the ids rwsem, how rcu
    can be enough to do the initial checks, and when to actually acquire the
    kern_ipc_perm.lock spinlock (the resulting ordering is sketched after
    this entry).

    I found that adding it to util.c was generic enough.

    Signed-off-by: Davidlohr Bueso
    Tested-by: Sedat Dilek
    Cc: Rik van Riel
    Cc: Manfred Spraul
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Davidlohr Bueso
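
    A paraphrased sketch of the ordering documented in util.c (not the
    verbatim comment):

        /*
         * General sysv ipc locking scheme (paraphrased):
         *   rcu_read_lock()
         *       obtain the ipc object (kern_ipc_perm) via idr lookup
         *       perform security, capability, auditing and permission checks
         *       acquire the ipc lock (kern_ipc_perm.lock) via ipc_lock_object()
         *           perform the data updates (e.g. SET, RMID, LOCK/UNLOCK)
         * The ids rwsem is taken on top of this when ids are added or removed.
         */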
     
  • There is only one user left; drop this function and just call
    ipc_unlock_object() and rcu_read_unlock().

    Signed-off-by: Davidlohr Bueso
    Tested-by: Sedat Dilek
    Cc: Rik van Riel
    Cc: Manfred Spraul
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Davidlohr Bueso
     
  • Since in some situations the lock can be shared for readers, we shouldn't
    be calling it a mutex; rename it to rwsem.

    Signed-off-by: Davidlohr Bueso
    Tested-by: Sedat Dilek
    Cc: Rik van Riel
    Cc: Manfred Spraul
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Davidlohr Bueso
     
  • Similar to other system calls, acquire the kern_ipc_perm lock after doing
    the initial permission and security checks (the resulting pattern is
    sketched after this entry).

    [sasha.levin@oracle.com: don't leave do_shmat with rcu lock held]
    Signed-off-by: Davidlohr Bueso
    Tested-by: Sedat Dilek
    Cc: Rik van Riel
    Cc: Manfred Spraul
    Signed-off-by: Sasha Levin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Davidlohr Bueso
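
    The resulting do_shmat() shape looks roughly like this (a simplified
    sketch with most error handling omitted; shm_obtain_object_check() is
    the lockless lookup introduced earlier in the series):

        rcu_read_lock();
        shp = shm_obtain_object_check(ns, shmid);   /* lockless lookup */
        if (IS_ERR(shp)) {
                err = PTR_ERR(shp);
                goto out_unlock;
        }
        /* permission and security checks under rcu only */
        if (ipcperms(ns, &shp->shm_perm, acc_mode))
                goto out_unlock;
        err = security_shm_shmat(shp, shmaddr, shmflg);
        if (err)
                goto out_unlock;
        ipc_lock_object(&shp->shm_perm);            /* spinlock only to update */
        /* ... bump shm_nattch and grab the file under the lock ... */
        ipc_unlock_object(&shp->shm_perm);
        rcu_read_unlock();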
     
  • Clean up some of the messy do_shmat() spaghetti code, getting rid of
    out_free and out_put_dentry labels. This makes shortening the critical
    region of this function in the next patch a little easier to do and read.

    Signed-off-by: Davidlohr Bueso
    Tested-by: Sedat Dilek
    Cc: Rik van Riel
    Cc: Manfred Spraul
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Davidlohr Bueso
     
  • With the *_INFO, *_STAT, IPC_RMID and IPC_SET commands already optimized,
    deal with the remaining SHM_LOCK and SHM_UNLOCK commands. Take the
    shm_perm lock after doing the initial auditing and security checks. The
    rest of the logic remains unchanged.

    Signed-off-by: Davidlohr Bueso
    Tested-by: Sedat Dilek
    Cc: Rik van Riel
    Cc: Manfred Spraul
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Davidlohr Bueso
     
  • While the INFO cmd doesn't take the ipc lock, the STAT commands do acquire
    it unnecessarily. We can do the permissions and security checks holding
    only the rcu lock (see the sketch after this entry).

    [akpm@linux-foundation.org: coding-style fixes]
    Signed-off-by: Davidlohr Bueso
    Tested-by: Sedat Dilek
    Cc: Rik van Riel
    Cc: Manfred Spraul
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Davidlohr Bueso
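
    A rough sketch of the lockless *_STAT path (names as in ipc/shm.c of
    this era; error paths trimmed):

        rcu_read_lock();
        shp = shm_obtain_object_check(ns, shmid);
        if (IS_ERR(shp)) {
                err = PTR_ERR(shp);
                goto out_unlock;
        }
        if (ipcperms(ns, &shp->shm_perm, S_IRUGO))
                goto out_unlock;
        err = security_shm_shmctl(shp, cmd);
        if (err)
                goto out_unlock;
        /* copy the stat data out; ipc_lock_object() is never taken */
        rcu_read_unlock();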
     
  • Similar to semctl and msgctl, when calling shmctl, the *_INFO and *_STAT
    commands can be performed without acquiring the ipc object.

    Add a shmctl_nolock() function and move the logic of *_INFO and *_STAT out
    of shmctl(). Since we are just moving functionality, this change still
    takes the lock; it will be made properly lockless in the next patch.

    Signed-off-by: Davidlohr Bueso
    Tested-by: Sedat Dilek
    Cc: Rik van Riel
    Cc: Manfred Spraul
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Davidlohr Bueso
     
  • Now that sem, msgque and shm, through *_down(), all use the lockless
    variant of ipcctl_pre_down(), go ahead and delete it.

    [akpm@linux-foundation.org: fix function name in kerneldoc, cleanups]
    Signed-off-by: Davidlohr Bueso
    Tested-by: Sedat Dilek
    Cc: Rik van Riel
    Cc: Manfred Spraul
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Davidlohr Bueso
     
  • Instead of holding the ipc lock for the entire function, use the
    ipcctl_pre_down_nolock() variant and only acquire the lock for specific
    commands: RMID and SET.

    Signed-off-by: Davidlohr Bueso
    Tested-by: Sedat Dilek
    Cc: Rik van Riel
    Cc: Manfred Spraul
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Davidlohr Bueso
     
  • This is the third and final patchset that deals with reducing the amount
    of contention we impose on the ipc lock (kern_ipc_perm.lock). These
    changes mostly deal with shared memory, previous work has already been
    done for semaphores and message queues:

    http://lkml.org/lkml/2013/3/20/546 (sems)
    http://lkml.org/lkml/2013/5/15/584 (mqueues)

    With these patches applied, a custom shm microbenchmark stressing shmctl
    doing IPC_STAT with 4 threads a million times, reduces the execution
    time by 50%. A similar run, this time with IPC_SET, reduces the
    execution time from 3 mins and 35 secs to 27 seconds.

    Patches 1-8: replace blindly taking the ipc lock with a smarter
    combination of rcu and ipc_obtain_object, only acquiring the spinlock
    when updating.

    Patch 9: renames the ids rw_mutex to rwsem, which is what it already was.

    Patch 10: a trivial mqueue leftover cleanup.

    Patch 11: adds a brief lock scheme description, requested by Andrew.

    This patch:

    Add shm_obtain_object() and shm_obtain_object_check(), which will allow us
    to get the ipc object without acquiring the lock. Just as with other
    forms of ipc, these functions are basically wrappers around
    ipc_obtain_object*() (sketched after this entry).

    Signed-off-by: Davidlohr Bueso
    Tested-by: Sedat Dilek
    Cc: Rik van Riel
    Cc: Manfred Spraul
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Davidlohr Bueso
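
    The added wrappers are essentially (close to the actual patch):

        static inline struct shmid_kernel *shm_obtain_object(struct ipc_namespace *ns, int id)
        {
                struct kern_ipc_perm *ipcp = ipc_obtain_object(&shm_ids(ns), id);

                if (IS_ERR(ipcp))
                        return ERR_CAST(ipcp);

                return container_of(ipcp, struct shmid_kernel, shm_perm);
        }

        static inline struct shmid_kernel *shm_obtain_object_check(struct ipc_namespace *ns, int id)
        {
                struct kern_ipc_perm *ipcp = ipc_obtain_object_check(&shm_ids(ns), id);

                if (IS_ERR(ipcp))
                        return ERR_CAST(ipcp);

                return container_of(ipcp, struct shmid_kernel, shm_perm);
        }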
     

08 Sep, 2013

1 commit

  • Pull namespace changes from Eric Biederman:
    "This is an assorted mishmash of small cleanups, enhancements and bug
    fixes.

    The major theme is user namespace mount restrictions. nsown_capable
    is killed as it encourages not thinking about details that need to be
    considered. A very hard to hit pid namespace exiting bug was finally
    tracked and fixed. A couple of cleanups to the basic namespace
    infrastructure.

    Finally there is an enhancement that makes per user namespace
    capabilities usable as capabilities, and an enhancement that allows
    the per userns root to nice other processes in the user namespace"

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
    userns: Kill nsown_capable it makes the wrong thing easy
    capabilities: allow nice if we are privileged
    pidns: Don't have unshare(CLONE_NEWPID) imply CLONE_THREAD
    userns: Allow PR_CAPBSET_DROP in a user namespace.
    namespaces: Simplify copy_namespaces so it is clear what is going on.
    pidns: Fix hang in zap_pid_ns_processes by sending a potentially extra wakeup
    sysfs: Restrict mounting sysfs
    userns: Better restrictions on when proc and sysfs can be mounted
    vfs: Don't copy mount bind mounts of /proc/<pid>/ns/mnt between namespaces
    kernel/nsproxy.c: Improving a snippet of code.
    proc: Restrict mounting the proc filesystem
    vfs: Lock in place mounts from more privileged users

    Linus Torvalds
     

04 Sep, 2013

1 commit

  • The check if the queue is full and adding current to the wait queue of
    pending msgsnd() operations (ss_add()) must be atomic.

    Otherwise:
    - the thread that performs msgsnd() finds a full queue and decides to
    sleep.
    - the thread that performs msgrcv() first reads all messages from the
    queue and then sleeps, because the queue is empty.
    - the msgrcv() calls do not perform any wakeups, because the msgsnd()
    task has not yet called ss_add().
    - then the msgsnd()-thread first calls ss_add() and then sleeps.

    Net result: msgsnd() and msgrcv() both sleep forever.

    Observed with msgctl08 from ltp with a preemptible kernel.

    Fix: Call ipc_lock_object() before performing the check (see the sketch
    after this entry).

    The patch also moves security_msg_queue_msgsnd() under ipc_lock_object:
    - msgctl(IPC_SET) explicitly mentions that it tries to expunge any
    pending operations that are not allowed anymore with the new
    permissions. If security_msg_queue_msgsnd() is called without locks,
    then there might be races.
    - it makes the patch much simpler.

    Reported-and-tested-by: Vineet Gupta
    Acked-by: Rik van Riel
    Cc: stable@vger.kernel.org # for 3.11
    Signed-off-by: Manfred Spraul
    Signed-off-by: Linus Torvalds

    Manfred Spraul
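
    Simplified shape of the fixed do_msgsnd() loop (a sketch; the actual
    code also handles signals, reference counting and queue removal):

        ipc_lock_object(&msq->q_perm);
        for (;;) {
                err = security_msg_queue_msgsnd(msq, msg, msgflg);
                if (err)
                        goto out_unlock;

                /* is there room for the message? */
                if (msgsz + msq->q_cbytes <= msq->q_qbytes &&
                    1 + msq->q_qnum <= msq->q_qbytes)
                        break;                  /* yes: go enqueue it */

                if (msgflg & IPC_NOWAIT) {
                        err = -EAGAIN;
                        goto out_unlock;
                }
                /* the full-queue check and ss_add() now happen under one
                 * lock hold, so a concurrent msgrcv() cannot miss us */
                ss_add(msq, &s);
                ipc_unlock_object(&msq->q_perm);
                schedule();                     /* wait for a wakeup */
                ipc_lock_object(&msq->q_perm);
                ss_del(&s);                     /* retry from the top */
        }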
     

29 Aug, 2013

1 commit

  • According to 'man msgrcv': "If msgtyp is less than 0, the first message of
    the lowest type that is less than or equal to the absolute value of msgtyp
    shall be received."

    Bug: The kernel only returns a message if its type is 1; other messages
    with type < abs(msgtype) will never get returned.

    Fix: After having traversed the list to find the first message with the
    lowest type, we need to actually return that message (illustrated after
    this entry).

    This regression was introduced by commit daaf74cf0867 ("ipc: refactor
    msg list search into separate function")

    Signed-off-by: Svenning Soerensen
    Reviewed-by: Peter Hurley
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Svenning Sørensen
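
    In userspace terms, with a queue assumed to hold messages of types 2, 5
    and 9 (msqid being a valid queue id; both are hypothetical values for
    illustration):

        struct { long mtype; char mtext[64]; } buf;

        /* SUSv3: msgtyp = -4 must return the first message of the lowest
         * type <= 4, i.e. the type-2 message here.  With the regression,
         * a message was only ever returned if its type was 1. */
        ssize_t n = msgrcv(msqid, &buf, sizeof(buf.mtext), -4, 0);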
     

10 Jul, 2013

19 commits

  • Cleanup: some minor points that I noticed while writing the previous
    patches.

    1) The name try_atomic_semop() is misleading: The function performs the
    operation (if it is possible).

    2) Some documentation updates.

    No real code change, a rename and documentation changes.

    Signed-off-by: Manfred Spraul
    Cc: Rik van Riel
    Cc: Davidlohr Bueso
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Manfred Spraul
     
  • sem_otime contains the time of the last semaphore operation that
    completed successfully. Every operation updates this value, thus access
    from multiple cpus can cause thrashing.

    Therefore the patch replaces the variable with a per-semaphore variable.
    The per-array sem_otime is only calculated when required (see the helper
    sketched after this entry).

    No performance improvement on a single-socket i3 - only important for
    larger systems.

    Signed-off-by: Manfred Spraul
    Cc: Rik van Riel
    Cc: Davidlohr Bueso
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Manfred Spraul
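
    The on-demand aggregation helper is roughly:

        static time_t get_semotime(struct sem_array *sma)
        {
                time_t res = sma->sem_base[0].sem_otime;
                int i;

                for (i = 1; i < sma->sem_nsems; i++) {
                        time_t to = sma->sem_base[i].sem_otime;

                        if (to > res)
                                res = to;
                }
                return res;     /* newest per-semaphore otime */
        }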
     
  • There are two places that can contain alter operations:
    - the global queue: sma->pending_alter
    - the per-semaphore queues: sma->sem_base[].pending_alter.

    Since one of the queues must be processed first, this causes an odd
    prioritization of the wakeups: complex operations have priority over
    simple ops.

    The patch restores the older behavior by using only one queue for alter
    operations:
    - if there are complex operations, then sma->pending_alter is used;
    - otherwise, the per-semaphore queues are used.

    As a side effect, do_smart_update_queue() becomes much simpler: no more
    goto logic.

    Signed-off-by: Manfred Spraul
    Cc: Rik van Riel
    Cc: Davidlohr Bueso
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Manfred Spraul
     
  • Introduce separate queues for operations that do not modify the
    semaphore values. Advantages:

    - Simpler logic in check_restart().
    - Faster update_queue(): Right now, all wait-for-zero operations are
    always tested, even if the semaphore value is not 0.
    - wait-for-zero gets again priority, as in earlier Linux versions.

    Signed-off-by: Manfred Spraul
    Cc: Rik van Riel
    Cc: Davidlohr Bueso
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Manfred Spraul
     
  • As now each semaphore has its own spinlock and parallel operations are
    possible, give each semaphore its own cacheline (see the struct sketch
    after this entry).

    On an i3 laptop, this gives up to 28% better performance:

    #semscale 10 | grep "interleave 2"
    - before:
    Cpus 1, interleave 2 delay 0: 36109234 in 10 secs
    Cpus 2, interleave 2 delay 0: 55276317 in 10 secs
    Cpus 3, interleave 2 delay 0: 62411025 in 10 secs
    Cpus 4, interleave 2 delay 0: 81963928 in 10 secs

    - after:
    Cpus 1, interleave 2 delay 0: 35527306 in 10 secs
    Cpus 2, interleave 2 delay 0: 70922909 in 10 secs <<< + 28%
    Cpus 3, interleave 2 delay 0: 80518538 in 10 secs
    Cpus 4, interleave 2 delay 0: 89115148 in 10 secs <<< + 8.7%

    The i3 has 2 cores, with hyperthreading enabled. Interleave 2 was used in
    order to fill the full cores first. HT partially hides the delay from
    cacheline thrashing, thus the improvement is "only" 8.7% when 4 threads
    are running.

    Signed-off-by: Manfred Spraul
    Cc: Rik van Riel
    Cc: Davidlohr Bueso
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Manfred Spraul
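
    The per-semaphore struct ends up roughly as follows (field set as in
    ipc/sem.c of this era):

        struct sem {
                int              semval;         /* current value */
                int              sempid;         /* pid of last operation */
                spinlock_t       lock;           /* fine-grained semtimedop lock */
                struct list_head pending_alter;  /* pending single-sop ops altering semval */
                struct list_head pending_const;  /* pending single-sop wait-for-zero ops */
                time_t           sem_otime;      /* per-semaphore otime */
        } ____cacheline_aligned_in_smp;          /* one cacheline per semaphore */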
     
  • Enforce that ipc_rcu_alloc returns a cacheline aligned pointer on SMP.

    Rationale:

    The SysV sem code tries to move the main spinlock into a separate
    cacheline (____cacheline_aligned_in_smp). This works only if
    ipc_rcu_alloc returns cacheline aligned pointers. vmalloc and kmalloc do
    return cacheline aligned pointers, but the implementation of
    ipc_rcu_alloc breaks that (see the sketch after this entry).

    [akpm@linux-foundation.org: coding-style fixes]
    Signed-off-by: Manfred Spraul
    Cc: Rik van Riel
    Cc: Davidlohr Bueso
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Manfred Spraul
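
    The fix boils down to padding the rcu header that precedes the object,
    so that the object itself stays aligned (a sketch of the idea, not the
    full patch):

        /* if this header is cacheline aligned, sizeof() is rounded up to
         * the cacheline size, so "out + 1" below is aligned as well */
        struct ipc_rcu {
                struct rcu_head rcu;
                atomic_t refcount;
        } ____cacheline_aligned_in_smp;

        void *ipc_rcu_alloc(int size)
        {
                struct ipc_rcu *out = ipc_alloc(sizeof(*out) + size);

                if (unlikely(!out))
                        return NULL;
                atomic_set(&out->refcount, 1);
                return out + 1;         /* caller's object, cacheline aligned */
        }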
     
  • We can now drop the msg_lock and msg_lock_check functions along with a
    bogus comment introduced previously in semctl_down.

    Signed-off-by: Davidlohr Bueso
    Cc: Andi Kleen
    Cc: Rik van Riel
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Davidlohr Bueso
     
  • do_msgrcv() is the last msg queue function that abuses the ipc lock. Take
    it only when needed, when actually updating msq.

    Signed-off-by: Davidlohr Bueso
    Cc: Andi Kleen
    Cc: Rik van Riel
    Tested-by: Sedat Dilek
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Davidlohr Bueso
     
  • do_msgsnd() is another function that does too many things with the ipc
    object lock acquired. Take it only when needed, when actually updating
    msq.

    Signed-off-by: Davidlohr Bueso
    Cc: Andi Kleen
    Cc: Rik van Riel
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Davidlohr Bueso
     
  • While the INFO cmd doesn't take the ipc lock, the STAT commands do
    acquire it unnecessarily. We can do the permissions and security checks
    holding only the rcu lock.

    This function now mimics semctl_nolock().

    Signed-off-by: Davidlohr Bueso
    Cc: Andi Kleen
    Cc: Rik van Riel
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Davidlohr Bueso
     
  • Add msq_obtain_object() and msq_obtain_object_check(), which will allow
    us to get the ipc object without acquiring the lock. Just as with
    semaphores, these functions are basically wrappers around
    ipc_obtain_object*().

    Signed-off-by: Davidlohr Bueso
    Cc: Andi Kleen
    Cc: Rik van Riel
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Davidlohr Bueso
     
  • Similar to semctl, when calling msgctl, the *_INFO and *_STAT commands
    can be performed without acquiring the ipc object.

    Add a msgctl_nolock() function and move the logic of *_INFO and *_STAT
    out of msgctl(). This change still takes the lock; it will be made
    properly lockless in the next patch.

    Signed-off-by: Davidlohr Bueso
    Cc: Andi Kleen
    Cc: Rik van Riel
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Davidlohr Bueso
     
  • Instead of holding the ipc lock for the entire function, use the
    ipcctl_pre_down_nolock() variant and only acquire the lock for specific
    commands: RMID and SET.

    Signed-off-by: Davidlohr Bueso
    Cc: Andi Kleen
    Cc: Rik van Riel
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Davidlohr Bueso
     
  • This function currently acquires both the rw_mutex and the rcu lock on
    successful lookups, leaving the callers to explicitly unlock them,
    creating another two level locking situation.

    Make the callers (including those that still use ipcctl_pre_down())
    explicitly lock and unlock the rwsem and rcu lock.

    Signed-off-by: Davidlohr Bueso
    Cc: Andi Kleen
    Cc: Rik van Riel
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Davidlohr Bueso
     
  • Signed-off-by: Davidlohr Bueso
    Cc: Andi Kleen
    Cc: Rik van Riel
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Davidlohr Bueso
     
  • Simple helpers around the (kern_ipc_perm *)->lock spinlock (sketched
    after this entry).

    Signed-off-by: Davidlohr Bueso
    Cc: Andi Kleen
    Cc: Rik van Riel
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Davidlohr Bueso
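
    The helpers are essentially:

        static inline void ipc_lock_object(struct kern_ipc_perm *perm)
        {
                spin_lock(&perm->lock);
        }

        static inline void ipc_unlock_object(struct kern_ipc_perm *perm)
        {
                spin_unlock(&perm->lock);
        }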
     
  • This patchset continues the work that began in the sysv ipc semaphore
    scaling series, see

    https://lkml.org/lkml/2013/3/20/546

    Just like semaphores used to be, sysv shared memory and msg queues also
    abuse the ipc lock, unnecessarily holding it for operations such as
    permission and security checks.

    This patchset mostly deals with mqueues, and while shared mem can be
    done in a very similar way, I want to get these patches out in the open
    first. It also does some pending cleanups, mostly focused on the two
    level locking we have in ipc code, taking care of ipc_addid() and
    ipcctl_pre_down_nolock() - yes there are still functions that need to be
    updated as well.

    This patch:

    Make all callers explicitly take and release the RCU read lock.

    This addresses the two level locking seen in newary(), newseg() and
    newqueue(). For the last two, explicitly unlock the ipc object and the
    rcu lock, instead of calling the custom shm_unlock and msg_unlock
    functions. The next patch will deal with the open-coded locking for
    ->perm.lock.

    Signed-off-by: Davidlohr Bueso
    Cc: Andi Kleen
    Cc: Rik van Riel
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Davidlohr Bueso
     
  • Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
     
  • The old audit PATH records for mq_open looked like this:

    type=PATH msg=audit(1366282323.982:869): item=1 name=(null) inode=6777
    dev=00:0c mode=041777 ouid=0 ogid=0 rdev=00:00
    obj=system_u:object_r:tmpfs_t:s15:c0.c1023
    type=PATH msg=audit(1366282323.982:869): item=0 name="test_mq" inode=26732
    dev=00:0c mode=0100700 ouid=0 ogid=0 rdev=00:00
    obj=staff_u:object_r:user_tmpfs_t:s15:c0.c1023

    ...with the audit related changes that went into 3.7, they now look like this:

    type=PATH msg=audit(1366282236.776:3606): item=2 name=(null) inode=66655
    dev=00:0c mode=0100700 ouid=0 ogid=0 rdev=00:00
    obj=staff_u:object_r:user_tmpfs_t:s15:c0.c1023
    type=PATH msg=audit(1366282236.776:3606): item=1 name=(null) inode=6926
    dev=00:0c mode=041777 ouid=0 ogid=0 rdev=00:00
    obj=system_u:object_r:tmpfs_t:s15:c0.c1023
    type=PATH msg=audit(1366282236.776:3606): item=0 name="test_mq"

    Both of these look wrong to me. As Steve Grubb pointed out:

    "What we need is 1 PATH record that identifies the MQ. The other PATH
    records probably should not be there."

    Fix it to record the mq root as a parent, and flag it such that it
    should be hidden from view when the names are logged, since the root of
    the mq filesystem isn't terribly interesting. With this change, we get
    a single PATH record that looks more like this:

    type=PATH msg=audit(1368021604.836:484): item=0 name="test_mq" inode=16914
    dev=00:0c mode=0100644 ouid=0 ogid=0 rdev=00:00
    obj=unconfined_u:object_r:user_tmpfs_t:s0

    In order to do this, a new audit_inode_parent_hidden() function is
    added. If we do it this way, then we avoid having the existing callers
    of audit_inode needing to do any sort of flag conversion if auditing is
    inactive.

    Signed-off-by: Jeff Layton
    Reported-by: Jiri Jaburek
    Cc: Steve Grubb
    Cc: Eric Paris
    Cc: Al Viro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jeff Layton
     

27 May, 2013

1 commit

  • do_smart_update_queue() is called when an operation (semop,
    semctl(SETVAL), semctl(SETALL), ...) modified the array. It must check
    which of the sleeping tasks can proceed.

    do_smart_update_queue() missed a few wakeups:
    - if a sleeping complex op was completed, then all per-semaphore queues
    must be scanned - not only those that were modified by *sops
    - if a sleeping simple op proceeded, then the global queue must be
    scanned again

    And:
    - the test for "sops == NULL" before scanning the global queue is not
    required: if the global queue is empty, then it doesn't need to be
    scanned - regardless of the reason for calling do_smart_update_queue()

    The patch is not optimized, i.e. even completing a wait-for-zero
    operation causes a rescan. This is done to keep the patch as simple as
    possible.

    Signed-off-by: Manfred Spraul
    Acked-by: Davidlohr Bueso
    Cc: Rik van Riel
    Cc: Andrew Morton
    Signed-off-by: Linus Torvalds

    Manfred Spraul
     

10 May, 2013

1 commit

  • Dave reported an oops triggered by trinity:

    BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
    IP: newseg+0x10d/0x390
    PGD cf8c1067 PUD cf8c2067 PMD 0
    Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
    CPU: 2 PID: 7636 Comm: trinity-child2 Not tainted 3.9.0+ #67
    ...
    Call Trace:
    ipcget+0x182/0x380
    SyS_shmget+0x5a/0x60
    tracesys+0xdd/0xe2

    This bug was introduced by commit af73e4d9506d ("hugetlbfs: fix mmap
    failure in unaligned size request").

    Reported-by: Dave Jones
    Cc:
    Signed-off-by: Li Zefan
    Reviewed-by: Naoya Horiguchi
    Acked-by: Rik van Riel
    Signed-off-by: Linus Torvalds

    Li Zefan