06 Nov, 2014

1 commit

  • seq_printf functions shouldn't really check the return value.
    Checking seq_has_overflowed() occasionally is used instead.

    Update vfs documentation.

    Link: http://lkml.kernel.org/p/e37e6e7b76acbdcc3bb4ab2a57c8f8ca1ae11b9a.1412031505.git.joe@perches.com

    Cc: David S. Miller
    Cc: Al Viro
    Signed-off-by: Joe Perches
    [ did a few clean ups ]
    Signed-off-by: Steven Rostedt

    Joe Perches
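
    A minimal user-space sketch of that pattern. It assumes a hypothetical `struct seq_model` in place of the kernel's struct seq_file; seq_model_printf() and seq_model_overflowed() are illustrative names, not kernel API. As in seq_file, an overflow is recorded by setting count equal to size. The show routine can therefore print without checking each call and test for overflow once at the end:

```c
#include <stdarg.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for struct seq_file: a fixed buffer plus a fill count. */
struct seq_model {
	char   *buf;
	size_t  size;
	size_t  count;   /* set equal to size once an overflow has happened */
};

/* Append formatted text; callers are not expected to check anything. */
static void seq_model_printf(struct seq_model *m, const char *fmt, ...)
{
	va_list args;
	int len;

	if (m->count >= m->size)
		return;                       /* already overflowed: drop output */

	va_start(args, fmt);
	len = vsnprintf(m->buf + m->count, m->size - m->count, fmt, args);
	va_end(args);

	if (len < 0 || (size_t)len >= m->size - m->count)
		m->count = m->size;           /* truncated: remember the overflow */
	else
		m->count += (size_t)len;
}

/* Analogue of seq_has_overflowed(): one check instead of one per call. */
static bool seq_model_overflowed(const struct seq_model *m)
{
	return m->count == m->size;
}

/* A "show" routine: print freely, then report overflow once at the end. */
static int show_items(struct seq_model *m, int n)
{
	for (int i = 0; i < n; i++)
		seq_model_printf(m, "item %d\n", i);
	return seq_model_overflowed(m) ? -1 : 0;
}
```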
     

11 Sep, 2014

1 commit

  • When calling epoll_ctl with operation EPOLL_CTL_DEL, structure epds is
    not initialized but ep_take_care_of_epollwakeup reads its event field.
    When this uninitialized field has the EPOLLWAKEUP bit set, a capability check
    is done for CAP_BLOCK_SUSPEND in ep_take_care_of_epollwakeup. This
    produces unexpected messages in the audit log, such as (on a system
    running SELinux):

    type=AVC msg=audit(1408212798.866:410): avc: denied
    { block_suspend } for pid=7754 comm="dbus-daemon" capability=36
    scontext=unconfined_u:unconfined_r:unconfined_t
    tcontext=unconfined_u:unconfined_r:unconfined_t
    tclass=capability2 permissive=1

    type=SYSCALL msg=audit(1408212798.866:410): arch=c000003e syscall=233
    success=yes exit=0 a0=3 a1=2 a2=9 a3=7fffd4d66ec0 items=0 ppid=1
    pid=7754 auid=1000 uid=0 gid=0 euid=0 suid=0 fsuid=0 egid=0 sgid=0
    fsgid=0 tty=(none) ses=3 comm="dbus-daemon"
    exe="/usr/bin/dbus-daemon"
    subj=unconfined_u:unconfined_r:unconfined_t key=(null)

    ("arch=c000003e syscall=233 a1=2" means "epoll_ctl(op=EPOLL_CTL_DEL)")

    Remove use of epds in epoll_ctl when op == EPOLL_CTL_DEL.

    Fixes: 4d7e30d98939 ("epoll: Add a flag, EPOLLWAKEUP, to prevent suspend while epoll events are ready")
    Signed-off-by: Nicolas Iooss
    Cc: Alexander Viro
    Cc: Arve Hjønnevåg
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Nicolas Iooss
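
    From user space, EPOLL_CTL_DEL does not use the event argument at all. Since Linux 2.6.9 the pointer may even be NULL. A minimal sketch, assuming a Linux host, that registers the read end of a pipe and then removes it without supplying an event structure:

```c
#include <stddef.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Register the read end of a pipe, then delete it with a NULL event pointer.
 * Returns 0 on success, -1 if any step fails. */
static int add_then_delete(void)
{
	int pipefd[2], epfd, rc = -1;
	struct epoll_event ev = { .events = EPOLLIN };

	if (pipe(pipefd) != 0)
		return -1;
	epfd = epoll_create1(0);
	if (epfd < 0)
		goto out_pipe;

	ev.data.fd = pipefd[0];
	if (epoll_ctl(epfd, EPOLL_CTL_ADD, pipefd[0], &ev) != 0)
		goto out_ep;

	/* The kernel ignores 'event' for EPOLL_CTL_DEL, so NULL is accepted. */
	if (epoll_ctl(epfd, EPOLL_CTL_DEL, pipefd[0], NULL) != 0)
		goto out_ep;

	rc = 0;
out_ep:
	close(epfd);
out_pipe:
	close(pipefd[0]);
	close(pipefd[1]);
	return rc;
}
```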
     

17 Jun, 2014

1 commit

    This fixes a use-after-free of epi->fllink.next inside the list loop macro.
    This loop actually releases elements in its body. The list is
    RCU-protected, but here we cannot hold rcu_read_lock because we need to
    lock a mutex inside.

    The obvious solution is to use list_for_each_entry_safe(). RCU-ness
    isn't essential because nobody can change this list under us; this is the
    final fput for this file.

    The bug was introduced by ae10b2b4eb01 ("epoll: optimize EPOLL_CTL_DEL
    using rcu")

    Signed-off-by: Konstantin Khlebnikov
    Reported-by: Cyrill Gorcunov
    Cc: Stable # 3.13+
    Cc: Sasha Levin
    Cc: Jason Baron
    Signed-off-by: Linus Torvalds

    Konstantin Khlebnikov
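
    The difference between the two loop forms can be sketched in plain C. The sketch assumes a hypothetical singly linked struct node in place of the kernel's struct list_head. The safe form reads the successor before the body frees the current node, which is what list_for_each_entry_safe() does. The unsafe form would read pos->next from freed memory:

```c
#include <stdlib.h>

/* Hypothetical node standing in for an epitem linked on f_ep_links. */
struct node {
	struct node *next;
	int value;
};

/* The "safe" pattern: fetch ->next before the body frees the current node,
 * just as list_for_each_entry_safe() caches the next entry in 'tmp'. */
static int release_all(struct node *head)
{
	struct node *pos, *tmp;
	int freed = 0;

	for (pos = head; pos != NULL; pos = tmp) {
		tmp = pos->next;   /* read before free(): no use-after-free */
		free(pos);
		freed++;
	}
	return freed;
}

/* Build a list of n nodes; returns NULL on allocation failure. */
static struct node *build_list(int n)
{
	struct node *head = NULL;

	for (int i = 0; i < n; i++) {
		struct node *nd = malloc(sizeof(*nd));

		if (nd == NULL) {
			release_all(head);
			return NULL;
		}
		nd->value = i;
		nd->next = head;
		head = nd;
	}
	return head;
}
```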
     

07 Jun, 2014

1 commit


03 Jan, 2014

1 commit

  • The EPOLL_CTL_DEL path of epoll contains a classic, ab-ba deadlock.
    That is, epoll_ctl(a, EPOLL_CTL_DEL, b, x), will deadlock with
    epoll_ctl(b, EPOLL_CTL_DEL, a, x). The deadlock was introduced with
    commit 67347fe4e632 ("epoll: do not take global 'epmutex' for simple
    topologies").

    The acquisition of the ep->mtx for the destination 'ep' was added such
    that a concurrent EPOLL_CTL_ADD operation would see the correct state of
    the ep (specifically, the check for '!list_empty(&f.file->f_ep_links)').

    However, by simply not acquiring the lock, we do not serialize behind
    the ep->mtx from the add path, and thus may perform a full path check
    when, had we waited a little longer, it might not have been necessary.
    However, this is a transient state, and performing the full loop
    checking in this case is not harmful.

    The important point is that we wouldn't miss doing the full loop
    checking when required, since EPOLL_CTL_ADD always locks any 'ep's that
    it's operating upon. The reason we don't need to do lock ordering in the
    add path is that we are already holding the global 'epmutex'
    whenever we do the double lock. Further, the original posting of this
    patch, which was tested for the intended performance gains, did not
    perform this additional locking.

    Signed-off-by: Jason Baron
    Cc: Nathan Zimmer
    Cc: Eric Wong
    Cc: Nelson Elhage
    Cc: Al Viro
    Cc: Davide Libenzi
    Cc: "Paul E. McKenney"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jason Baron
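
    The AB-BA pattern can be made concrete with a tiny lock-order recorder in the spirit of lockdep. This is a simplified, hypothetical model, not kernel code: the lock numbers stand for the ep->mtx of two epoll instances. With nesting (the pre-fix DEL path), epoll_ctl(a, DEL, b) records a before b, and epoll_ctl(b, DEL, a) records b before a. That is an inversion. Taking only the source ep's mutex records no ordering at all:

```c
#include <stdbool.h>
#include <string.h>

#define MAX_LOCKS 8

/* order[a][b] is true once some path has held lock a while taking lock b. */
static bool order[MAX_LOCKS][MAX_LOCKS];

static void reset_order(void)
{
	memset(order, 0, sizeof(order));
}

/* Record that 'held' was held when 'taken' was acquired. Returns true if the
 * opposite order was already seen, i.e. a potential AB-BA deadlock. */
static bool record_nested(int held, int taken)
{
	order[held][taken] = true;
	return order[taken][held];
}

/* Model of one EPOLL_CTL_DEL call. 'nested' selects the pre-fix behaviour
 * (take the target ep's mutex while holding the source ep's mutex). */
static bool ctl_del(int ep_src, int ep_dst, bool nested)
{
	if (!nested)
		return false;        /* fixed path: only ep_src's mutex is taken */
	return record_nested(ep_src, ep_dst);
}
```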
     

03 Dec, 2013

1 commit


13 Nov, 2013

4 commits

  • Merge first patch-bomb from Andrew Morton:
    "Quite a lot of other stuff is banked up awaiting further
    next->mainline merging, but this batch contains:

    - Lots of random misc patches
    - OCFS2
    - Most of MM
    - backlight updates
    - lib/ updates
    - printk updates
    - checkpatch updates
    - epoll tweaking
    - rtc updates
    - hfs
    - hfsplus
    - documentation
    - procfs
    - update gcov to gcc-4.7 format
    - IPC"

    * emailed patches from Andrew Morton : (269 commits)
    ipc, msg: fix message length check for negative values
    ipc/util.c: remove unnecessary work pending test
    devpts: plug the memory leak in kill_sb
    ./Makefile: export initial ramdisk compression config option
    init/Kconfig: add option to disable kernel compression
    drivers: w1: make w1_slave::flags long to avoid memory corruption
    drivers/w1/masters/ds1wm.c: use dev_get_platdata()
    drivers/memstick/core/ms_block.c: fix unreachable state in h_msb_read_page()
    drivers/memstick/core/mspro_block.c: fix attributes array allocation
    drivers/pps/clients/pps-gpio.c: remove redundant of_match_ptr
    kernel/panic.c: reduce 1 byte usage for print tainted buffer
    gcov: reuse kbasename helper
    kernel/gcov/fs.c: use pr_warn()
    kernel/module.c: use pr_foo()
    gcov: compile specific gcov implementation based on gcc version
    gcov: add support for gcc 4.7 gcov format
    gcov: move gcov structs definitions to a gcc version specific file
    kernel/taskstats.c: return -ENOMEM when alloc memory fails in add_del_listener()
    kernel/taskstats.c: add nla_nest_cancel() for failure processing between nla_nest_start() and nla_nest_end()
    kernel/sysctl_binary.c: use scnprintf() instead of snprintf()
    ...

    Linus Torvalds
     
  • Pull vfs updates from Al Viro:
    "All kinds of stuff this time around; some more notable parts:

    - RCU'd vfsmounts handling
    - new primitives for coredump handling
    - files_lock is gone
    - Bruce's delegations handling series
    - exportfs fixes

    plus misc stuff all over the place"

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (101 commits)
    ecryptfs: ->f_op is never NULL
    locks: break delegations on any attribute modification
    locks: break delegations on link
    locks: break delegations on rename
    locks: helper functions for delegation breaking
    locks: break delegations on unlink
    namei: minor vfs_unlink cleanup
    locks: implement delegations
    locks: introduce new FL_DELEG lock flag
    vfs: take i_mutex on renamed file
    vfs: rename I_MUTEX_QUOTA now that it's not used for quotas
    vfs: don't use PARENT/CHILD lock classes for non-directories
    vfs: pull ext4's double-i_mutex-locking into common code
    exportfs: fix quadratic behavior in filehandle lookup
    exportfs: better variable name
    exportfs: move most of reconnect_path to helper function
    exportfs: eliminate unused "noprogress" counter
    exportfs: stop retrying once we race with rename/remove
    exportfs: clear DISCONNECTED on all parents sooner
    exportfs: more detailed comment for path_reconnect
    ...

    Linus Torvalds
     
  • When calling EPOLL_CTL_ADD for an epoll file descriptor that is attached
    directly to a wakeup source, we do not need to take the global 'epmutex',
    unless the epoll file descriptor is nested. The purpose of taking the
    'epmutex' on add is to prevent complex topologies such as loops and deep
    wakeup paths from forming in parallel through multiple EPOLL_CTL_ADD
    operations. However, for the simple case of an epoll file descriptor
    attached directly to a wakeup source (with no nesting), we do not need to
    hold the 'epmutex'.

    This patch along with 'epoll: optimize EPOLL_CTL_DEL using rcu' improves
    scalability on larger systems. Quoting Nathan Zimmer's mail on SPECjbb
    performance:

    "On the 16 socket run the performance went from 35k jOPS to 125k jOPS. In
    addition the benchmark went from scaling well on 10 sockets to scaling
    well on just over 40 sockets.

    ...

    Currently the benchmark stops scaling at around 40-44 sockets but it seems like
    I found a second unrelated bottleneck."

    [akpm@linux-foundation.org: use `bool' for boolean variables, remove unneeded/undesirable cast of void*, add missed ep_scan_ready_list() kerneldoc]
    Signed-off-by: Jason Baron
    Tested-by: Nathan Zimmer
    Cc: Eric Wong
    Cc: Nelson Elhage
    Cc: Al Viro
    Cc: Davide Libenzi
    Cc: "Paul E. McKenney"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jason Baron
     
  • Nathan Zimmer found that once we get over 10+ cpus, the scalability of
    SPECjbb falls over due to the contention on the global 'epmutex', which is
    taken on EPOLL_CTL_ADD and EPOLL_CTL_DEL operations.

    Patch #1 removes the 'epmutex' lock completely from the EPOLL_CTL_DEL path
    by using rcu to guard against any concurrent traversals.

    Patch #2 removes the 'epmutex' lock from EPOLL_CTL_ADD operations for
    simple topologies, i.e. when adding a link from an epoll file descriptor to
    a wakeup source, where the epoll file descriptor is not nested.

    This patch (of 2):

    Optimize EPOLL_CTL_DEL such that it does not require the 'epmutex' by
    converting the file->f_ep_links list into an rcu one. In this way, we can
    traverse the epoll network on the add path in parallel with deletes.
    Since deletes can't create loops or worse wakeup paths, this is safe.

    This patch in combination with the patch "epoll: Do not take global 'epmutex'
    for simple topologies", shows a dramatic performance improvement in
    scalability for SPECjbb.

    Signed-off-by: Jason Baron
    Tested-by: Nathan Zimmer
    Cc: Eric Wong
    Cc: Nelson Elhage
    Cc: Al Viro
    Cc: Davide Libenzi
    Cc: "Paul E. McKenney"
    CC: Wu Fengguang
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jason Baron
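
    The EPOLL_CTL_ADD change in this batch can be modelled as a cheap test made before deciding whether to take 'epmutex'. A minimal sketch, assuming the decision depends only on two facts: whether the epoll file is itself watched by another epoll instance, and whether the target is an epoll file. struct add_request and needs_global_mutex() are hypothetical names:

```c
#include <stdbool.h>

/* Hypothetical summary of the two facts the ADD path inspects. */
struct add_request {
	bool source_is_nested;  /* the epoll file is itself watched by another epoll */
	bool target_is_epoll;   /* the file being added is an epoll file */
};

/* Returns true when the global mutex (and full loop/path checking) is
 * required; an un-nested epoll fd watching a plain wakeup source skips it. */
static bool needs_global_mutex(const struct add_request *req)
{
	return req->source_is_nested || req->target_is_epoll;
}
```

    Only the simple case returns false, which is the path the quoted SPECjbb numbers exercise heavily.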
     

30 Oct, 2013

1 commit

  • This reverts commit 1c441e921201 (epoll: use freezable blocking call)
    which is reported to cause user space memory corruption to happen
    after suspend to RAM.

    Since it appears to be extremely difficult to root cause this
    problem, it is best to revert the offending commit and try to address
    the original issue in a better way later.

    References: https://bugzilla.kernel.org/show_bug.cgi?id=61781
    Reported-by: Natrio
    Reported-by: Jeff Pohlmeyer
    Bisected-by: Leo Wolf
    Fixes: 1c441e921201 (epoll: use freezable blocking call)
    Signed-off-by: Rafael J. Wysocki
    Cc: 3.11+ # 3.11+

    Rafael J. Wysocki
     

25 Oct, 2013

1 commit


12 Sep, 2013

1 commit

    ep_free() might iterate over a huge set of epitems and hold the CPU for too
    long. Add two cond_resched() calls in order to yield the CPU to other tasks.
    This is safe as we only hold mutexes in this function.

    Signed-off-by: Eric Dumazet
    Cc: Al Viro
    Cc: Theodore Ts'o
    Acked-by: Eric Wong
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Eric Dumazet
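
    A loose user-space analogue of the change, assuming sched_yield() as a stand-in for cond_resched(). The two differ: cond_resched() is cheap and only reschedules when needed, whereas sched_yield() always enters the kernel. For that reason this sketch, built around a hypothetical teardown() loop, yields only every YIELD_EVERY items, an arbitrary batch size:

```c
#include <sched.h>
#include <stddef.h>

/* How many items to process between voluntary yields (arbitrary choice). */
#define YIELD_EVERY 1024

/* Hypothetical teardown loop: processes 'n' items and yields the CPU
 * periodically, loosely analogous to calling cond_resched() inside
 * ep_free(). Returns the number of yields performed. */
static size_t teardown(size_t n)
{
	size_t yields = 0;

	for (size_t i = 1; i <= n; i++) {
		/* ... release item i here ... */
		if (i % YIELD_EVERY == 0) {
			sched_yield();
			yields++;
		}
	}
	return yields;
}
```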
     

04 Sep, 2013

1 commit


04 Jul, 2013

2 commits

  • Merge first patch-bomb from Andrew Morton:
    - various misc bits
    - I've been patchmonkeying ocfs2 for a while, as Joel and Mark have been
    distracted. There has been quite a bit of activity.
    - About half the MM queue
    - Some backlight bits
    - Various lib/ updates
    - checkpatch updates
    - zillions more little rtc patches
    - ptrace
    - signals
    - exec
    - procfs
    - rapidio
    - nbd
    - aoe
    - pps
    - memstick
    - tools/testing/selftests updates

    * emailed patches from Andrew Morton : (445 commits)
    tools/testing/selftests: don't assume the x bit is set on scripts
    selftests: add .gitignore for kcmp
    selftests: fix clean target in kcmp Makefile
    selftests: add .gitignore for vm
    selftests: add hugetlbfstest
    self-test: fix make clean
    selftests: exit 1 on failure
    kernel/resource.c: remove the unneeded assignment in function __find_resource
    aio: fix wrong comment in aio_complete()
    drivers/w1/slaves/w1_ds2408.c: add magic sequence to disable P0 test mode
    drivers/memstick/host/r592.c: convert to module_pci_driver
    drivers/memstick/host/jmb38x_ms: convert to module_pci_driver
    pps-gpio: add device-tree binding and support
    drivers/pps/clients/pps-gpio.c: convert to module_platform_driver
    drivers/pps/clients/pps-gpio.c: convert to devm_* helpers
    drivers/parport/share.c: use kzalloc
    Documentation/accounting/getdelays.c: avoid strncpy in accounting tool
    aoe: update internal version number to v83
    aoe: update copyright date
    aoe: perform I/O completions in parallel
    ...

    Linus Torvalds
     
  • sigprocmask() should die. None of the current callers actually
    need this strange interface.

    Change fs/eventpoll.c to use set_current_blocked(). This also
    means we should not worry about SIGKILL/SIGSTOP.

    Signed-off-by: Oleg Nesterov
    Cc: Al Viro
    Cc: Denys Vlasenko
    Cc: Eric Wong
    Cc: Jason Baron
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Oleg Nesterov
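
    The user-visible contract here belongs to epoll_pwait(), which installs a caller-supplied signal mask for the duration of the wait. A minimal sketch, assuming Linux and glibc, that passes a completely filled mask. SIGKILL and SIGSTOP cannot be blocked; the kernel silently drops them from the mask rather than failing the call:

```c
#include <signal.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Poll an empty epoll instance once with every signal blocked during the
 * call. Returns the number of ready events (0 here, since nothing is
 * registered), or -1 on error. */
static int wait_with_all_signals_blocked(void)
{
	struct epoll_event events[4];
	sigset_t mask;
	int epfd, n;

	epfd = epoll_create1(0);
	if (epfd < 0)
		return -1;

	sigfillset(&mask);   /* includes SIGKILL/SIGSTOP; the kernel removes them */
	n = epoll_pwait(epfd, events, 4, 0, &mask);   /* timeout 0: poll once */

	close(epfd);
	return n;
}
```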
     

12 May, 2013

1 commit

  • Avoid waking up every thread sleeping in an epoll_wait call during
    suspend and resume by calling a freezable blocking call. Previous
    patches modified the freezer to avoid sending wakeups to threads
    that are blocked in freezable blocking calls.

    This call was selected to be converted to a freezable call because
    it doesn't hold any locks or release any resources when interrupted
    that might be needed by another freezing task or a kernel driver
    during suspend, and is a common site where idle userspace tasks are
    blocked.

    Acked-by: Tejun Heo
    Signed-off-by: Colin Cross
    Signed-off-by: Rafael J. Wysocki

    Colin Cross
     

01 May, 2013

6 commits

  • Pull compat cleanup from Al Viro:
    "Mostly about syscall wrappers this time; there will be another pile
    with patches in the same general area from various people, but I'd
    rather push those after both that and vfs.git pile are in."

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal:
    syscalls.h: slightly reduce the jungles of macros
    get rid of union semop in sys_semctl(2) arguments
    make do_mremap() static
    sparc: no need to sign-extend in sync_file_range() wrapper
    ppc compat wrappers for add_key(2) and request_key(2) are pointless
    x86: trim sys_ia32.h
    x86: sys32_kill and sys32_mprotect are pointless
    get rid of compat_sys_semctl() and friends in case of ARCH_WANT_OLD_COMPAT_IPC
    merge compat sys_ipc instances
    consolidate compat lookup_dcookie()
    convert vmsplice to COMPAT_SYSCALL_DEFINE
    switch getrusage() to COMPAT_SYSCALL_DEFINE
    switch epoll_pwait to COMPAT_SYSCALL_DEFINE
    convert sendfile{,64} to COMPAT_SYSCALL_DEFINE
    switch signalfd{,4}() to COMPAT_SYSCALL_DEFINE
    make SYSCALL_DEFINE-generated wrappers do asmlinkage_protect
    make HAVE_SYSCALL_WRAPPERS unconditional
    consolidate cond_syscall and SYSCALL_ALIAS declarations
    teach SYSCALL_DEFINE how to deal with long long/unsigned long long
    get rid of duplicate logics in __SC_....[1-6] definitions

    Linus Torvalds
     
  • It is always safe to use RCU_INIT_POINTER to NULL a pointer. This results
    in slightly smaller/faster code.

    Signed-off-by: Eric Wong
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Eric Wong
     
  • This reduces the amount of code inside the ready list iteration loops for
    better readability IMHO.

    Signed-off-by: Eric Wong
    Cc: Davide Libenzi
    Cc: Al Viro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Eric Wong
     
  • Technically we do not need to hold ep->mtx during ep_free since we are
    certain there are no other users of ep at that point. However, lockdep
    complains with a "suspicious rcu_dereference_check() usage!" message; so
    lock the mutex before ep_remove to silence the warning.

    Signed-off-by: Eric Wong
    Cc: Al Viro
    Cc: Arve Hjønnevåg
    Cc: Davide Libenzi
    Cc: Eric Dumazet
    Cc: NeilBrown
    Cc: Rafael J. Wysocki
    Cc: Oleg Nesterov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Eric Wong
     
  • This prevents wakeup_source destruction when a user hits the item with
    EPOLL_CTL_MOD while ep_poll_callback is running.

    Tested with CONFIG_SPARSE_RCU_POINTER=y and "make fs/eventpoll.o C=2"

    Signed-off-by: Eric Wong
    Cc: Alexander Viro
    Cc: Arve Hjønnevåg
    Cc: Davide Libenzi
    Cc: Eric Dumazet
    Cc: NeilBrown
    Cc: "Rafael J. Wysocki"
    Cc: "Paul E. McKenney"
    Cc: Oleg Nesterov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Eric Wong
     
  • It is common for epoll users to have thousands of epitems, so saving a
    cache line on every allocation leads to large memory savings.

    Since epitem allocations are cache-aligned, reducing sizeof(struct
    epitem) from 136 bytes to 128 bytes will allow it to squeeze under a
    cache line boundary on x86_64.

    Via /sys/kernel/slab/eventpoll_epi, I see the following changes on my
    x86_64 Core2 Duo (which has 64-byte cache alignment):

    object_size : 192 => 128
    objs_per_slab: 21 => 32

    Also, add a BUILD_BUG_ON() to check for future accidental breakage.

    [akpm@linux-foundation.org: use __packed, for all architectures]
    Signed-off-by: Eric Wong
    Cc: Davide Libenzi
    Cc: Al Viro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Eric Wong
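
    The effect of __packed and a BUILD_BUG_ON()-style guard can be shown with a pair of illustrative structures. The field layout below is hypothetical, not the real struct epitem. _Static_assert (C11) plays the role of BUILD_BUG_ON(), and __attribute__((packed)) is what the kernel's __packed expands to on GCC and Clang:

```c
#include <stdint.h>

/* Illustrative layouts only: a 4-byte field followed by an 8-byte field
 * forces padding in the natural layout on typical 64-bit ABIs. */
struct natural_layout {
	uint32_t fd;
	uint64_t key;
	uint32_t events;
};

struct packed_layout {
	uint32_t fd;
	uint64_t key;
	uint32_t events;
} __attribute__((packed));   /* the kernel spells this __packed */

/* Compile-time guard in the spirit of BUILD_BUG_ON(): the build fails if a
 * later edit grows the packed structure past 16 bytes. */
_Static_assert(sizeof(struct packed_layout) == 16,
	       "packed_layout must stay at 16 bytes");
```

    On x86_64 the natural layout is typically 24 bytes, so packing saves 8 bytes per object, the same order of saving the 136 to 128 byte change above relies on.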
     

04 Mar, 2013

1 commit


03 Jan, 2013

1 commit

  • EPOLL_CTL_MOD sets the interest mask before calling f_op->poll() to
    ensure events are not missed. Since the modifications to the interest
    mask are not protected by the same lock as ep_poll_callback, we need to
    ensure the change is visible to other CPUs calling ep_poll_callback.

    We also need to ensure f_op->poll() has an up-to-date view of past
    events which occurred before we modified the interest mask. So this
    barrier also pairs with the barrier in wq_has_sleeper().

    This should guarantee either ep_poll_callback or f_op->poll() (or both)
    will notice the readiness of a recently-ready/modified item.

    This issue was encountered by Andreas Voellmy and Junchang(Jason) Wang in:
    http://thread.gmane.org/gmane.linux.kernel/1408782/

    Signed-off-by: Eric Wong
    Cc: Hans Verkuil
    Cc: Jiri Olsa
    Cc: Jonathan Corbet
    Cc: Al Viro
    Cc: Davide Libenzi
    Cc: Hans de Goede
    Cc: Mauro Carvalho Chehab
    Cc: David Miller
    Cc: Eric Dumazet
    Cc: Andrew Morton
    Cc: Andreas Voellmy
    Tested-by: "Junchang(Jason) Wang"
    Cc: netdev@vger.kernel.org
    Cc: linux-fsdevel@vger.kernel.org
    Signed-off-by: Linus Torvalds

    Eric Wong
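
    The barrier pairing follows the classic store-buffering pattern. Each side stores its own variable, issues a full barrier, then loads the other side's variable, so at least one side must observe the other's store. The following is a simplified C11 model. It assumes seq_cst fences in place of smp_mb(), plus two hypothetical atomics standing in for the interest mask and the readiness state. The real code pairs with the barrier in wq_has_sleeper() rather than using a variable like ready_events:

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical shared state: the item's interest mask and its readiness. */
static atomic_uint interest_mask;
static atomic_uint ready_events;

/* EPOLL_CTL_MOD side: publish the new mask, full fence, then check readiness
 * (the f_op->poll() step). */
static bool mod_then_poll(unsigned int new_mask)
{
	atomic_store_explicit(&interest_mask, new_mask, memory_order_relaxed);
	atomic_thread_fence(memory_order_seq_cst);   /* plays the role of smp_mb() */
	return (atomic_load_explicit(&ready_events, memory_order_relaxed) & new_mask) != 0;
}

/* Wakeup side (the ep_poll_callback() step): record readiness, full fence,
 * then check the interest mask. */
static bool callback_sees(unsigned int event)
{
	atomic_fetch_or_explicit(&ready_events, event, memory_order_relaxed);
	atomic_thread_fence(memory_order_seq_cst);   /* pairs with the fence above */
	return (atomic_load_explicit(&interest_mask, memory_order_relaxed) & event) != 0;
}
```

    Called one after the other, whichever runs second sees the other's store. When they race on two CPUs, the two fences rule out both functions returning false.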
     

18 Dec, 2012

1 commit

  • This allows us to print out eventpoll target file descriptor, events and
    data, the /proc/pid/fdinfo/fd consists of

    | pos: 0
    | flags: 02
    | tfd: 5 events: 1d data: ffffffffffffffff enabled: 1

    [avagin@: fix for uninitialized ret variable]

    Signed-off-by: Cyrill Gorcunov
    Acked-by: Pavel Emelyanov
    Cc: Oleg Nesterov
    Cc: Andrey Vagin
    Cc: Al Viro
    Cc: Alexey Dobriyan
    Cc: James Bottomley
    Cc: "Aneesh Kumar K.V"
    Cc: Alexey Dobriyan
    Cc: Matthew Helsley
    Cc: "J. Bruce Fields"
    Cc: "Aneesh Kumar K.V"
    Cc: Tvrtko Ursulin
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Cyrill Gorcunov
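
    A minimal sketch of reading this from user space, assuming Linux with /proc mounted. It registers one pipe end with a fresh epoll instance and counts the lines starting with "tfd:". Later kernels append further fields (for example pos:, ino: and sdev:) to each line, so only the prefix is matched:

```c
#include <stdio.h>
#include <string.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Count the "tfd:" lines in /proc/self/fdinfo/<epfd>; each one describes one
 * registered target. Returns -1 if the file cannot be opened. */
static int count_fdinfo_targets(int epfd)
{
	char path[64], line[256];
	int count = 0;
	FILE *f;

	snprintf(path, sizeof(path), "/proc/self/fdinfo/%d", epfd);
	f = fopen(path, "r");
	if (f == NULL)
		return -1;
	while (fgets(line, sizeof(line), f) != NULL)
		if (strncmp(line, "tfd:", 4) == 0)
			count++;
	fclose(f);
	return count;
}

/* Register the read end of a pipe and report how many targets fdinfo lists. */
static int demo_fdinfo(void)
{
	int pipefd[2], epfd, n = -1;
	struct epoll_event ev = { .events = EPOLLIN };

	if (pipe(pipefd) != 0)
		return -1;
	epfd = epoll_create1(0);
	if (epfd >= 0) {
		ev.data.fd = pipefd[0];
		if (epoll_ctl(epfd, EPOLL_CTL_ADD, pipefd[0], &ev) == 0)
			n = count_fdinfo_targets(epfd);
		close(epfd);
	}
	close(pipefd[0]);
	close(pipefd[1]);
	return n;
}
```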
     

09 Nov, 2012

1 commit

  • Revert commit 03a7beb55b9f ("epoll: support for disabling items, and a
    self-test app") pending resolution of the issues identified by Michael
    Kerrisk, copied below.

    We'll revisit this for 3.8.

    : I've taken a look at this patch as it currently stands in 3.7-rc1, and
    : done a bit of testing. (By the way, the test program
    : tools/testing/selftests/epoll/test_epoll.c does not compile...)
    :
    : There are one or two places where the behavior seems a little strange,
    : so I have a question or two at the end of this mail. But other than
    : that, I want to check my understanding so that the interface can be
    : correctly documented.
    :
    : Just to go through my understanding, the problem is the following
    : scenario in a multithreaded application:
    :
    : 1. Multiple threads are performing epoll_wait() operations,
    : and maintaining a user-space cache that contains information
    : corresponding to each file descriptor being monitored by
    : epoll_wait().
    :
    : 2. At some point, a thread wants to delete (EPOLL_CTL_DEL)
    : a file descriptor from the epoll interest list, and
    : delete the corresponding record from the user-space cache.
    :
    : 3. The problem with (2) is that some other thread may have
    : previously done an epoll_wait() that retrieved information
    : about the fd in question, and may be in the middle of using
    : information in the cache that relates to that fd. Thus,
    : there is a potential race.
    :
    : 4. The race can't be solved purely in user space, because doing
    : so would require applying a mutex across the epoll_wait()
    : call, which would of course blow thread concurrency.
    :
    : Right?
    :
    : Your solution is the EPOLL_CTL_DISABLE operation. I want to
    : confirm my understanding about how to use this flag, since
    : the description that has accompanied the patches so far
    : has been a bit sparse.
    :
    : 0. In the scenario you're concerned about, deleting a file
    : descriptor means (safely) doing the following:
    : (a) Deleting the file descriptor from the epoll interest list
    : using EPOLL_CTL_DEL
    : (b) Deleting the corresponding record in the user-space cache
    :
    : 1. It's only meaningful to use this EPOLL_CTL_DISABLE in
    : conjunction with EPOLLONESHOT.
    :
    : 2. Using EPOLL_CTL_DISABLE without using EPOLLONESHOT in
    : conjunction is a logical error.
    :
    : 3. The correct way to code multithreaded applications using
    : EPOLL_CTL_DISABLE and EPOLLONESHOT is as follows:
    :
    : a. All EPOLL_CTL_ADD and EPOLL_CTL_MOD operations should
    : use EPOLLONESHOT.
    :
    : b. When a thread wants to delete a file descriptor, it
    : should do the following:
    :
    : [1] Call epoll_ctl(EPOLL_CTL_DISABLE)
    : [2] If the return status from epoll_ctl(EPOLL_CTL_DISABLE)
    : was zero, then the file descriptor can be safely
    : deleted by the thread that made this call.
    : [3] If the epoll_ctl(EPOLL_CTL_DISABLE) fails with EBUSY,
    : then the descriptor is in use. In this case, the calling
    : thread should set a flag in the user-space cache to
    : indicate that the thread that is using the descriptor
    : should perform the deletion operation.
    :
    : Is all of the above correct?
    :
    : The implementation depends on checking on whether
    : (events & ~EP_PRIVATE_BITS) == 0
    : This relies on the fact that EPOLL_CTL_ADD and EPOLL_CTL_MOD always
    : set EPOLLHUP and EPOLLERR in the 'events' mask, and EPOLLONESHOT
    : causes those flags (as well as all others in ~EP_PRIVATE_BITS) to be
    : cleared.
    :
    : A corollary to the previous paragraph is that using EPOLL_CTL_DISABLE
    : is only useful in conjunction with EPOLLONESHOT. However, as things
    : stand, one can use EPOLL_CTL_DISABLE on a file descriptor that does
    : not have EPOLLONESHOT set in 'events'. This results in the following
    : (slightly surprising) behavior:
    :
    : (a) The first call to epoll_ctl(EPOLL_CTL_DISABLE) returns 0
    : (the indicator that the file descriptor can be safely deleted).
    : (b) The next call to epoll_ctl(EPOLL_CTL_DISABLE) fails with EBUSY.
    :
    : This doesn't seem particularly useful, and in fact is probably an
    : indication that the user made a logic error: they should only be using
    : epoll_ctl(EPOLL_CTL_DISABLE) on a file descriptor for which
    : EPOLLONESHOT was set in 'events'. If that is correct, then would it
    : not make sense to return an error to user space for this case?

    Cc: Michael Kerrisk
    Cc: "Paton J. Lewis"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
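
    EPOLL_CTL_DISABLE was reverted here, but the EPOLLONESHOT behaviour that Michael's analysis builds on is standard epoll. A minimal sketch, assuming Linux, showing that a one-shot item is reported once. It then stays disarmed, even with unread data, until it is re-armed with EPOLL_CTL_MOD:

```c
#include <sys/epoll.h>
#include <unistd.h>

/* Returns a bitmask of observed results: bit 0 = first wait saw the event,
 * bit 1 = second wait saw nothing (item disarmed), bit 2 = wait after the
 * EPOLL_CTL_MOD re-arm saw the event again. 7 means all three held. */
static int oneshot_demo(void)
{
	struct epoll_event ev = { .events = EPOLLIN | EPOLLONESHOT }, out;
	int pipefd[2], epfd, result = 0;

	if (pipe(pipefd) != 0)
		return -1;
	epfd = epoll_create1(0);
	if (epfd < 0 || write(pipefd[1], "x", 1) != 1)
		goto out;

	ev.data.fd = pipefd[0];
	if (epoll_ctl(epfd, EPOLL_CTL_ADD, pipefd[0], &ev) != 0)
		goto out;

	if (epoll_wait(epfd, &out, 1, 0) == 1)
		result |= 1;          /* reported once ... */
	if (epoll_wait(epfd, &out, 1, 0) == 0)
		result |= 2;          /* ... then disarmed, though data is unread */

	/* Re-arm; until this call no epoll_wait() caller is handed the fd. */
	if (epoll_ctl(epfd, EPOLL_CTL_MOD, pipefd[0], &ev) == 0 &&
	    epoll_wait(epfd, &out, 1, 0) == 1)
		result |= 4;
out:
	if (epfd >= 0)
		close(epfd);
	close(pipefd[0]);
	close(pipefd[1]);
	return result;
}
```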
     

06 Oct, 2012

1 commit

  • Enhanced epoll_ctl to support EPOLL_CTL_DISABLE, which disables an epoll
    item. If epoll_ctl doesn't return -EBUSY in this case, it is then safe to
    delete the epoll item in a multi-threaded environment. Also added a new
    test_epoll self-test app to both demonstrate the need for this feature
    and test it.

    Signed-off-by: Paton J. Lewis
    Cc: Alexander Viro
    Cc: Jason Baron
    Cc: Paul Holland
    Cc: Davide Libenzi
    Cc: Michael Kerrisk
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Paton J. Lewis
     

27 Sep, 2012

2 commits


22 Aug, 2012

1 commit


18 Jul, 2012

1 commit

  • As discussed in
    http://thread.gmane.org/gmane.linux.kernel/1249726/focus=1288990,
    the capability introduced in 4d7e30d98939a0340022ccd49325a3d70f7e0238
    to govern EPOLLWAKEUP seems misnamed: this capability is about governing
    the ability to suspend the system, not using a particular API flag
    (EPOLLWAKEUP). We should make the name of the capability more general
    to encourage reuse in related cases. (Whether or not this capability
    should also be used to govern the use of /sys/power/wake_lock is a
    question that needs to be separately resolved.)

    This patch renames the capability to CAP_BLOCK_SUSPEND. In order to ensure
    that the old capability name doesn't make it out into the wild, could you
    please apply and push up the tree to ensure that it is incorporated
    for the 3.5 release.

    Signed-off-by: Michael Kerrisk
    Acked-by: Serge Hallyn
    Signed-off-by: Rafael J. Wysocki

    Michael Kerrisk
     

02 Jun, 2012

1 commit


23 May, 2012

1 commit

  • Commit 4d7e30d (epoll: Add a flag, EPOLLWAKEUP, to prevent
    suspend while epoll events are ready) caused some applications to
    malfunction, because they set the bit corresponding to the new
    EPOLLWAKEUP flag in their eventpoll flags and they don't have the
    new CAP_EPOLLWAKEUP capability.

    To prevent that from happening, change epoll_ctl() to clear
    EPOLLWAKEUP in epds.events if the caller doesn't have the
    CAP_EPOLLWAKEUP capability instead of failing and returning an
    error code, which allows the affected applications to function
    normally.

    Reported-and-tested-by: Jiri Slaby
    Signed-off-by: Rafael J. Wysocki

    Rafael J. Wysocki
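
    A minimal sketch of the resulting behaviour, assuming Linux 3.5 or later and a glibc whose <sys/epoll.h> defines EPOLLWAKEUP. An EPOLL_CTL_ADD that requests EPOLLWAKEUP succeeds even without the capability (subsequently renamed CAP_BLOCK_SUSPEND), because the flag is dropped instead of the call failing:

```c
#include <sys/epoll.h>
#include <unistd.h>

/* Add a pipe end with EPOLLWAKEUP requested. Without the capability the
 * kernel clears the flag rather than returning an error, so the call should
 * succeed whether or not the caller is privileged. Returns epoll_ctl()'s
 * result (0 on success) or -1 if setup fails. */
static int add_with_wakeup(void)
{
	struct epoll_event ev = { .events = EPOLLIN | EPOLLWAKEUP };
	int pipefd[2], epfd, rc = -1;

	if (pipe(pipefd) != 0)
		return -1;
	epfd = epoll_create1(0);
	if (epfd >= 0) {
		ev.data.fd = pipefd[0];
		rc = epoll_ctl(epfd, EPOLL_CTL_ADD, pipefd[0], &ev);
		close(epfd);
	}
	close(pipefd[0]);
	close(pipefd[1]);
	return rc;
}
```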
     

06 May, 2012

1 commit


26 Apr, 2012

1 commit

  • An epoll_ctl(,EPOLL_CTL_ADD,,) operation can return '-ELOOP' to prevent
    circular epoll dependencies from being created. However, in that case we
    do not properly clear the 'tfile_check_list'. Thus, add a call to
    clear_tfile_check_list() for the -ELOOP case.

    Signed-off-by: Jason Baron
    Reported-by: Yurij M. Plotnikov
    Cc: Nelson Elhage
    Cc: Davide Libenzi
    Tested-by: Alexandra N. Kossovsky
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jason Baron
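
    The -ELOOP path is easy to reach from user space. A minimal sketch, assuming Linux, that builds the smallest cycle: epoll instance b is added to a, then a is added to b. The second call is expected to fail with ELOOP:

```c
#include <errno.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Put epoll instance b inside a, then try to put a inside b. The second
 * EPOLL_CTL_ADD would close a loop, so it should fail with ELOOP.
 * Returns the errno from the second call (0 if it unexpectedly succeeded,
 * -1 if setup failed). */
static int try_epoll_cycle(void)
{
	struct epoll_event ev = { .events = EPOLLIN };
	int a = epoll_create1(0), b = epoll_create1(0), err = -1;

	if (a < 0 || b < 0)
		goto out;
	if (epoll_ctl(a, EPOLL_CTL_ADD, b, &ev) != 0)
		goto out;
	err = epoll_ctl(b, EPOLL_CTL_ADD, a, &ev) == 0 ? 0 : errno;
out:
	if (a >= 0)
		close(a);
	if (b >= 0)
		close(b);
	return err;
}
```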
     

29 Mar, 2012

2 commits

  • …m/linux/kernel/git/dhowells/linux-asm_system

    Pull "Disintegrate and delete asm/system.h" from David Howells:
    "Here are a bunch of patches to disintegrate asm/system.h into a set of
    separate bits to relieve the problem of circular inclusion
    dependencies.

    I've built all the working defconfigs from all the arches that I can
    and made sure that they don't break.

    The reason for these patches is that I recently encountered a circular
    dependency problem that came about when I produced some patches to
    optimise get_order() by rewriting it to use ilog2().

    This uses bitops - and on the SH arch asm/bitops.h drags in
    asm-generic/get_order.h by a circuitous route involving asm/system.h.

    The main difficulty seems to be asm/system.h. It holds a number of
    low level bits with no/few dependencies that are commonly used (eg.
    memory barriers) and a number of bits with more dependencies that
    aren't used in many places (eg. switch_to()).

    These patches break asm/system.h up into the following core pieces:

    (1) asm/barrier.h

    Move memory barriers here. This is already done for MIPS and Alpha.

    (2) asm/switch_to.h

    Move switch_to() and related stuff here.

    (3) asm/exec.h

    Move arch_align_stack() here. Other process execution related bits
    could perhaps go here from asm/processor.h.

    (4) asm/cmpxchg.h

    Move xchg() and cmpxchg() here as they're full word atomic ops and
    frequently used by atomic_xchg() and atomic_cmpxchg().

    (5) asm/bug.h

    Move die() and related bits.

    (6) asm/auxvec.h

    Move AT_VECTOR_SIZE_ARCH here.

    Other arch headers are created as needed on a per-arch basis."

    Fixed up some conflicts from other header file cleanups and moving code
    around that has happened in the meantime, so David's testing is somewhat
    weakened by that. We'll find out anything that got broken and fix it.

    * tag 'split-asm_system_h-for-linus-20120328' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-asm_system: (38 commits)
    Delete all instances of asm/system.h
    Remove all #inclusions of asm/system.h
    Add #includes needed to permit the removal of asm/system.h
    Move all declarations of free_initmem() to linux/mm.h
    Disintegrate asm/system.h for OpenRISC
    Split arch_align_stack() out from asm-generic/system.h
    Split the switch_to() wrapper out of asm-generic/system.h
    Move the asm-generic/system.h xchg() implementation to asm-generic/cmpxchg.h
    Create asm-generic/barrier.h
    Make asm-generic/cmpxchg.h #include asm-generic/cmpxchg-local.h
    Disintegrate asm/system.h for Xtensa
    Disintegrate asm/system.h for Unicore32 [based on ver #3, changed by gxt]
    Disintegrate asm/system.h for Tile
    Disintegrate asm/system.h for Sparc
    Disintegrate asm/system.h for SH
    Disintegrate asm/system.h for Score
    Disintegrate asm/system.h for S390
    Disintegrate asm/system.h for PowerPC
    Disintegrate asm/system.h for PA-RISC
    Disintegrate asm/system.h for MN10300
    ...

    Linus Torvalds
     
  • Remove all #inclusions of asm/system.h preparatory to splitting and killing
    it. Performed with the following command:

    perl -p -i -e 's!^#\s*include\s*<asm/system[.]h>.*\n!!' `grep -Irl '^#\s*include\s*<asm/system[.]h>' *`

    Signed-off-by: David Howells

    David Howells
     

24 Mar, 2012

2 commits

  • We never use the length variable.

    Signed-off-by: Dan Carpenter
    Acked-by: Jason Baron
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dan Carpenter
     
  • Looking for a bug in -rt, I stumbled across this code here from: commit
    2dfa4eeab0fc ("epoll keyed wakeups: teach epoll about hints coming with
    the wakeup key"), specifically:

    #ifdef CONFIG_DEBUG_LOCK_ALLOC
    static inline void ep_wake_up_nested(wait_queue_head_t *wqueue,
                                         unsigned long events, int subclass)
    {
            unsigned long flags;

            spin_lock_irqsave_nested(&wqueue->lock, flags, subclass);
            wake_up_locked_poll(wqueue, events);
            spin_unlock_irqrestore(&wqueue->lock, flags);
    }
    #else
    static inline void ep_wake_up_nested(wait_queue_head_t *wqueue,
                                         unsigned long events, int subclass)
    {
            wake_up_poll(wqueue, events);
    }
    #endif

    You change the function of ep_wake_up_nested() depending on whether
    CONFIG_DEBUG_LOCK_ALLOC is set or not. This looks awfully suspicious,
    and there's no comment to explain why. I initially thought that this
    was trying to fool lockdep, and hiding a real bug.

    Investigating it, I found the creation of wake_up_nested() (which no
    longer exists), which was created for the sole purpose of epoll and its
    strange wake ups, as explained in commit 0ccf831cbee9 ("lockdep:
    annotate epoll").

    Although the commit's subject is just "annotate epoll", its change log is
    much better at explaining what is happening than the actual code is.
    Thus a comment is really necessary here, to save other developers
    from having to go trudging through the git logs trying to figure out
    why this code exists.

    I took parts of the change log and placed it into a comment above the
    affected code. This will make the description of what is happening more
    visible to new developers that have to look at this code for the first
    time.

    Signed-off-by: Steven Rostedt
    Cc: Davide Libenzi
    Cc: Peter Zijlstra
    Cc: Alan Cox
    Cc: Ingo Molnar
    Cc: David Miller
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Steven Rostedt