10 Sep, 2005

3 commits

  • With the use of RCU in files structure, the look-up of files using fds can now
    be lock-free. The lookup is protected by rcu_read_lock()/rcu_read_unlock().
    This patch changes the readers to use lock-free lookup.

    Signed-off-by: Maneesh Soni
    Signed-off-by: Ravikiran Thirumalai
    Signed-off-by: Dipankar Sarma
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dipankar Sarma
     
  • Patch to eliminate struct files_struct.file_lock spinlock on the reader side
    and use the RCU refcounting rcuref_xxx API for the f_count refcounter. The
    updates to the fdtable are done by allocating a new fdtable structure and
    setting files->fdt to point to the new structure. The fdtable structure is
    protected by RCU thereby allowing lock-free lookup. For fd arrays/sets that
    are vmalloced, we use keventd to free them since RCU callbacks can't sleep. A
    global list of fdtables to be freed is not scalable, so we use a per-cpu list.
    If keventd is already handling the current cpu's work, we use a timer to defer
    queueing of that work.

    Since the last publication, this patch has been rewritten to avoid using
    explicit memory barriers and to use the rcu_assign_pointer() and
    rcu_dereference() primitives instead. This required that the fd information
    be kept in a separate structure (fdtable) and updated atomically.

    Signed-off-by: Dipankar Sarma
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dipankar Sarma
     
  • In order for the RCU to work, the file table array, sets and their sizes must
    be updated atomically. Instead of ensuring this through too many memory
    barriers, we put the arrays and their sizes in a separate structure. This
    patch takes the first step of putting the file table elements in a separate
    structure fdtable that is embedded within files_struct. It also changes all
    the users to refer to the file table using the files_fdtable() macro.
    Subsequent application of RCU becomes easier after this.

    Signed-off-by: Dipankar Sarma
    Signed-Off-By: David Howells
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dipankar Sarma
     

28 Jul, 2005

1 commit

  • I believe that there is a problem with the handling of POSIX locks, which
    the attached patch should address.

    The problem appears to be a race between fcntl(2) and close(2). A
    multithreaded application could close a file descriptor at the same time as
    it is trying to acquire a lock using the same file descriptor. I would
    suggest that that multithreaded application is not providing the proper
    synchronization for itself, but the OS should still behave correctly.

    SUS3 (Single UNIX Specification Version 3, read: POSIX) indicates that when
    a file descriptor is closed, all POSIX locks on the file owned by the
    process which closed the file descriptor should be released.

    The trick here is when those locks are released. The current code releases
    all locks which exist when close is processing, but any locks in progress
    are handled when the last reference to the open file is released.

    There are three cases to consider.

    One is the simple case, a multithreaded (mt) process has a file open and
    races to close it and acquire a lock on it. In this case, the close will
    release one reference to the open file and when the fcntl is done, it will
    release the other reference. For this situation, no locks should exist on
    the file when both the close and fcntl operations are done. The current
    system will handle this case because the last reference to the open file is
    being released.

    The second case is when the mt process has dup(2)'d the file descriptor.
    The close will release one reference to the file and the fcntl, when done,
    will release another, but there will still be at least one more reference
    to the open file. One could argue that the existence of a lock on the file
    after the close has completed is okay, because it was acquired after the
    close operation and there is still a way for the application to release the
    lock on the file, using an existing file descriptor.

    The third case is when the mt process has forked, after opening the file
    and either before or after becoming an mt process. In this case, each
    process would hold a reference to the open file. For each process, this
    degenerates to the first case above. However, the lock continues to exist
    until both processes have released their references to the open file. This
    lock could block other lock requests.

    The changes to release the lock when the last reference to the open file
    is released aren't quite right, because they would allow the lock to exist
    as long as there was a reference to the open file. This is too long.

    The new proposed solution is to add support in the fcntl code path to
    detect a race with close and then to release the lock which was just
    acquired when such a race is detected. This causes locks to be released
    in a timely fashion and for the system to conform to the POSIX semantic
    specification.

    This was tested by instrumenting a kernel to detect the handling of locks and
    then running a program which generates case #3 above. A dangling lock
    could be reliably generated. When the changes to detect the close/fcntl
    race were added, a dangling lock could no longer be generated.

    Cc: Matthew Wilcox
    Cc: Trond Myklebust
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Peter Staubach
     

01 May, 2005

1 commit


17 Apr, 2005

2 commits

  • A question on the sigwaitinfo-based IO mechanism in multithreaded applications.

    I am trying to use RT signals instead of SIGIO to notify me of IO events
    in a multithreaded application. I noticed that there was some discussion
    on lkml in November 1999 under the subject "Signal driven IO". In that
    thread, RT signals were being delivered to the worker thread. I am running
    a 2.6.10 kernel and trying to use the very same mechanism, but I find that
    only SIGIO is propagated to the worker threads, while RT signals are
    propagated only to the main thread and not to the worker threads, where I
    actually want them. On further inspection I found that the attached patch
    solves the problem.

    I am not sure if this is a bug or feature in the kernel.

    Roland McGrath said:

    This relates only to fcntl F_SETSIG, which is a Linux extension. So there is
    no POSIX issue. When changing various things like the normal SIGIO signalling
    to do group signals, I was concerned strictly with the POSIX semantics and
    generally avoided touching things in the domain of Linux inventions. That's
    why I didn't change this when I changed the call right next to it. There is
    no reason I can see that F_SETSIG-requested signals shouldn't use a group
    signal like normal SIGIO does. I'm happy to ACK this patch, there is nothing
    wrong with its change to the semantics in my book. But neither POSIX nor I
    care a whit what F_SETSIG does.

    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Bharath Ramesh
     
  • Initial git repository build. I'm not bothering with the full history,
    even though we have it. We can create a separate "historical" git
    archive of that later if we want to, and in the meantime it's about
    3.2GB when imported into git - space that would just make the early
    git days unnecessarily complicated, when we don't have a lot of good
    infrastructure for it.

    Let it rip!

    Linus Torvalds