03 Jan, 2009

17 commits

  • Just nail the oddments now while this code is being touched

    Signed-off-by: Alan Cox
    Signed-off-by: Linus Torvalds

    Alan Cox
     
  • Changelog [v2]:
    - Add note indicating strict isolation is not possible unless all
    mounts of devpts use the 'newinstance' mount option.

    Signed-off-by: Sukadev Bhattiprolu
    Signed-off-by: Alan Cox
    Signed-off-by: Linus Torvalds

    Sukadev Bhattiprolu
     
  • To support containers, allow multiple instances of devpts filesystem, such
    that indices of ptys allocated in one instance are independent of ptys
    allocated in other instances of devpts.

    But to preserve backward compatibility, enable this support for multiple
    instances only if:

    - CONFIG_DEVPTS_MULTIPLE_INSTANCES is set to Y, and
    - '-o newinstance' mount option is specified while mounting devpts

    To use multi-instance mount, a container startup script could:

    $ ns_exec -cm /bin/bash
    $ umount /dev/pts
    $ mount -t devpts -o newinstance lxcpts /dev/pts
    $ mount -o bind /dev/pts/ptmx /dev/ptmx
    $ /usr/sbin/sshd -p 1234

    where 'ns_exec -cm /bin/bash' calls clone() with the CLONE_NEWNS flag and
    execs /bin/bash in the child process. A pty created by the sshd is not
    visible in the original mount of /dev/pts.

    USER-SPACE-IMPACT:
    - See Documentation/filesystems/devpts.txt (included in next patch) for
    userspace impact in multi-instance and mixed-mode operation.
    TODO:
    - Update mount(8), pts(4) man pages. Highlight impact of not
    redirecting /dev/ptmx to /dev/pts/ptmx after a multi-instance mount.

    Changelog[v6]:
    - [Dave Hansen] Use new get_init_pts_sb() interface
    - [Serge Hallyn] Don't bother displaying 'newinstance' in show_options
    - [Serge Hallyn] Use macros (PARSE_REMOUNT/PARSE_MOUNT) instead of 0/1.
    - [Serge Hallyn] Check error return from get_sb_single() (now
    get_init_pts_sb())
    - devpts_pty_kill(): don't dput error dentries

    Changelog[v5]:
    - Move get_sb_ref() definition to earlier patch
    - Move usage info to Documentation/filesystems/devpts.txt (next patch)
    - Make ptmx node even in init_pts_ns, now that default mode is 0000
    (defined in earlier patch, enabled here).
    - Cache ptmx dentry and use to update mode during remount
    (defined in earlier patch, enabled here).
    - Bugfix: explicitly ignore newinstance on remount (if newinstance was
    specified on remount of initial mount, it would be ignored but
    /proc/mounts would imply that the option was set)

    Changelog[v4]:

    - Update patch description to address H. Peter Anvin's comments
    - Consolidate multi-instance mode code under new config token,
    CONFIG_DEVPTS_MULTIPLE_INSTANCES.
    - Move usage-details from patch description to
    Documentation/filesystems/devpts.txt

    Changelog[v3]:
    - Rename new mount option to 'newinstance'
    - Create ptmx nodes only in 'newinstance' mounts
    - Bugfix: parse_mount_options() modifies @data but since we need to
    parse the @data twice (once in devpts_get_sb() and once during
    do_remount_sb()), parse a local copy of @data in devpts_get_sb().
    (restructured code in devpts_get_sb() to fix this)

    Changelog[v2]:
    - Support both single-mount and multiple-mount semantics and
    provide '-onewmnt' option to select the semantics.

    Signed-off-by: Sukadev Bhattiprolu
    Signed-off-by: Alan Cox
    Signed-off-by: Linus Torvalds

    Sukadev Bhattiprolu
     
  • See comments in the function header for details. The new interface will
    be used in a follow-on patch.

    Changelog [v2]:
    [Dave Hansen] Replace get_sb_ref() in fs/super.c with get_init_pts_sb()
    and make the new interface private to devpts

    Signed-off-by: Sukadev Bhattiprolu
    Signed-off-by: Alan Cox
    Signed-off-by: Linus Torvalds

    Sukadev Bhattiprolu
     
  • /dev/ptmx is closely tied to the devpts filesystem. An open of /dev/ptmx
    allocates the next pty index, and the associated device shows up in the
    devpts fs as /dev/pts/n.

    With multiple instances of devpts filesystem, during an open of /dev/ptmx
    we would be unable to determine which instance of the devpts is being
    accessed.

    So we move the 'ptmx' node into /dev/pts and use the inode of the 'ptmx'
    node to identify the superblock and hence the devpts instance. This patch
    adds the ability for the kernel to internally create the [ptmx, c, 5:2]
    device when mounting devpts filesystem. Since the ptmx node in devpts is
    new and may surprise some userspace scripts, the default permissions for
    the new node are 0000. These permissions can be changed either using chmod
    or by remounting with the new '-o ptmxmode=0666' mount option.
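
    The index allocation described above can be observed from user space.
    The following is a minimal sketch, assuming a Linux system where the
    TIOCGPTN ioctl is available; the helper name open_ptmx_sketch() is
    hypothetical and not part of this patch. Passing /dev/pts/ptmx instead
    of /dev/ptmx would select the devpts instance that owns that node.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/ioctl.h>
#include <unistd.h>

/*
 * Hypothetical helper: open a ptmx node and ask the kernel which pty
 * index it allocated. Returns the master fd, or -1 on failure.
 * The matching slave would appear as <devpts mount>/<index>.
 */
int open_ptmx_sketch(const char *ptmx_path, unsigned int *index)
{
    assert(ptmx_path != NULL && index != NULL);

    int fd = open(ptmx_path, O_RDWR | O_NOCTTY);
    if (fd < 0)
        return -1;

    /* TIOCGPTN reports the index of the pty behind this master. */
    if (ioctl(fd, TIOCGPTN, index) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

    A caller would close() the returned descriptor when done; the index
    is only meaningful within the devpts instance the ptmx node belongs to.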

    Changelog[v5]:
    - [Serge Hallyn bugfix]: Letting new_inode() assign inode number to
    ptmx can collide with hand-assigning inode numbers to ptys. So,
    hand-assign specific inode number to ptmx node also.
    - [Serge Hallyn]: Maybe safer to grab root dentry mutex while creating
    ptmx node
    - [Bugfix with Serge Hallyn] Replace lookup_one_len() in mknod_ptmx()
    with d_alloc_name() (lookup during ->get_sb() locks up system). To
    simplify patchset, fold the ptmx_dentry patch into this.

    Changelog[v4]:
    - Change default permissions of pts/ptmx node to 0000.
    - Move code for ptmxmode under #ifdef CONFIG_DEVPTS_MULTIPLE_INSTANCES.

    Changelog[v3]:
    - Rename ptmx_mode to ptmxmode (for consistency with 'newinstance')

    Changelog[v2]:
    - [H. Peter Anvin] Remove mknod() system call support and create the
    ptmx node internally.

    Changelog[v1]:
    - Earlier version of this patch enabled creating /dev/pts/tty as
    well. As pointed out by Al Viro and H. Peter Anvin, that is not
    really necessary.

    Signed-off-by: Sukadev Bhattiprolu
    Signed-off-by: Alan Cox
    Signed-off-by: Linus Torvalds

    Sukadev Bhattiprolu
     
  • Signed-off-by: Sukadev Bhattiprolu
    Signed-off-by: Alan Cox
    Signed-off-by: Linus Torvalds

    Sukadev Bhattiprolu
     
  • Move code to parse mount options into a separate function so it can
    (later) be shared between mount and remount operations.
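
    As an illustration only, a user-space sketch of such a shared parser
    might look like the following. The struct, the function name and the
    default values are assumptions made for this example, not the kernel
    code. It parses a private copy of the option string, so the same
    string can safely be parsed again later (for instance on remount).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical per-mount options; defaults below are assumptions. */
struct pts_opts_sketch {
    bool newinstance;
    unsigned int mode;     /* permissions for pty nodes */
    unsigned int ptmxmode; /* permissions for the ptmx node */
};

/*
 * Parse a comma-separated option string into *opts.
 * Returns 0 on success, -1 on an unknown option or allocation failure.
 */
int parse_opts_sketch(const char *data, struct pts_opts_sketch *opts)
{
    assert(opts != NULL);
    opts->newinstance = false;
    opts->mode = 0600;
    opts->ptmxmode = 0000;
    if (data == NULL)
        return 0;

    /* Tokenizing overwrites separators, so work on a private copy. */
    size_t len = strlen(data);
    char *copy = malloc(len + 1);
    if (copy == NULL)
        return -1;
    memcpy(copy, data, len + 1);

    int rc = 0;
    char *tok = copy;
    while (tok != NULL && rc == 0) {
        char *comma = strchr(tok, ',');
        if (comma != NULL)
            *comma = '\0';

        if (*tok == '\0')
            ; /* ignore empty tokens such as a trailing comma */
        else if (strcmp(tok, "newinstance") == 0)
            opts->newinstance = true;
        else if (strncmp(tok, "ptmxmode=", 9) == 0)
            opts->ptmxmode = (unsigned int)strtoul(tok + 9, NULL, 8) & 0777;
        else if (strncmp(tok, "mode=", 5) == 0)
            opts->mode = (unsigned int)strtoul(tok + 5, NULL, 8) & 0777;
        else
            rc = -1; /* unknown option */

        tok = (comma != NULL) ? comma + 1 : NULL;
    }
    free(copy);
    return rc;
}
```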

    Signed-off-by: Sukadev Bhattiprolu
    Signed-off-by: Alan Cox
    Signed-off-by: Linus Torvalds

    Sukadev Bhattiprolu
     
  • With support for multiple mounts of devpts, the 'config' structure really
    represents per-mount options rather than config parameters. Rename 'config'
    structure to 'pts_mount_opts' and store it in the super-block.

    Signed-off-by: Sukadev Bhattiprolu
    Signed-off-by: Alan Cox
    Signed-off-by: Linus Torvalds

    Sukadev Bhattiprolu
     
  • To enable multiple mounts of devpts, 'allocated_ptys' must be a per-mount
    variable rather than a global variable. Move 'allocated_ptys' into the
    super_block's s_fs_info.

    Changelog[v2]:
    Define and use DEVPTS_SB() wrapper.

    Signed-off-by: Sukadev Bhattiprolu
    Signed-off-by: Alan Cox
    Signed-off-by: Linus Torvalds

    Sukadev Bhattiprolu
     
  • Remove the 'devpts_root' global variable and find the root dentry using
    the super_block. The super-block can be found from the device inode, using
    the new wrapper, pts_sb_from_inode().

    Changelog: This patch is based on an earlier patchset from Serge Hallyn
    and Matt Helsley.

    Signed-off-by: Sukadev Bhattiprolu
    Signed-off-by: Alan Cox
    Signed-off-by: Linus Torvalds

    Sukadev Bhattiprolu
     
  • Now the main work is done, it's polishing time

    Signed-off-by: Alan Cox
    Signed-off-by: Linus Torvalds

    Alan Cox
     
  • Fixes the loss of echoed (and other ldisc-generated) characters when
    the tty is stopped or when the driver output buffer is full (happens
    frequently for input during continuous program output, such as ^C)
    and removes the Big Kernel Lock from the N_TTY line discipline.

    Adds an "echo buffer" to the N_TTY line discipline that handles all
    ldisc-generated output (including echoed characters). Along with the
    loss of characters, this also fixes the associated loss of sync between
    tty output and the ldisc state when characters cannot be immediately
    written to the tty driver.

    The echo buffer stores (in addition to characters) state operations that need
    to be done at the time of character output (like management of the column
    position). This allows echo to cooperate correctly with program output,
    since the ldisc state remains consistent with actual characters written.

    Since the echo buffer code now isolates the tty column state code
    to the process_out* and process_echoes functions, we can remove the
    Big Kernel Lock (BKL) and replace it with mutex locks.

    Highlights are:

    * Handles echo (and other ldisc output) when tty driver buffer is full
    - continuous program output can block echo
    * Saves echo when tty is in stopped state (e.g. ^S)
    - (e.g.: ^Q will correctly cause held characters to be released for output)
    * Control character pairs (e.g. "^C") are treated atomically and not
    split up by interleaved program output
    * Line discipline state is kept consistent with characters sent to
    the tty driver
    * Remove the big kernel lock (BKL) from N_TTY line discipline
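
    The idea behind the highlights above can be sketched in user space as
    follows. This is a simplified, hypothetical model and not the N_TTY
    code: it treats every byte below 0x20 as a control character echoed
    as a two-character pair, holds everything while output is stopped,
    and updates the column only for characters that were really written.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define ECHO_SKETCH_SIZE 64

/* Hypothetical echo buffer: pending bytes plus the output column. */
struct echo_sketch {
    unsigned char buf[ECHO_SKETCH_SIZE];
    size_t len;          /* number of pending entries */
    unsigned int column; /* advanced only on real output */
};

/* Queue one character for echo. Returns -1 if the buffer is full. */
int echo_sketch_put(struct echo_sketch *e, unsigned char c)
{
    assert(e != NULL);
    if (e->len == ECHO_SKETCH_SIZE)
        return -1;
    e->buf[e->len++] = c;
    return 0;
}

/*
 * Write as many pending entries as fit into 'room' bytes of 'out'.
 * A control pair such as "^C" is written whole or not at all.
 * Returns the number of bytes written; unwritten entries stay queued.
 */
size_t echo_sketch_flush(struct echo_sketch *e, char *out, size_t room,
                         int stopped)
{
    assert(e != NULL && out != NULL);
    size_t done = 0, written = 0;

    if (stopped)
        return 0; /* keep everything until output resumes */

    while (done < e->len) {
        unsigned char c = e->buf[done];
        if (c < 0x20) {
            if (room - written < 2)
                break; /* never split the pair */
            out[written++] = '^';
            out[written++] = (char)(c ^ 0x40);
            e->column += 2;
        } else {
            if (room - written < 1)
                break;
            out[written++] = (char)c;
            e->column += 1;
        }
        done++;
    }
    memmove(e->buf, e->buf + done, e->len - done);
    e->len -= done;
    return written;
}
```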

    Signed-off-by: Joe Peterson
    Signed-off-by: Alan Cox
    Signed-off-by: Linus Torvalds

    Joe Peterson
     
  • Signed-off-by: Sonic Zhang
    Signed-off-by: Bryan Wu
    Signed-off-by: Alan Cox
    Signed-off-by: Linus Torvalds

    Sonic Zhang
     
  • Add spin_lock_irqsave() when receiving and transmitting data.

    Signed-off-by: Sonic Zhang
    Signed-off-by: Bryan Wu
    Signed-off-by: Alan Cox
    Signed-off-by: Linus Torvalds

    Sonic Zhang
     
  • Signed-off-by: Sonic Zhang
    Signed-off-by: Bryan Wu
    Signed-off-by: Alan Cox
    Signed-off-by: Linus Torvalds

    Sonic Zhang
     
  • Bug description:
    The IRDA receiver may stop receiving after it has processed some signals.

    To reproduce the issue, put three IRDA devices together, one Blackfin and
    two non-Blackfin, so that they detect each other. Run irdaping from one of
    the non-Blackfin devices against the Blackfin device. When it stops
    printing ping information, the Blackfin has stopped receiving; the time at
    which this happens is random.

    The related register bits are OK and the other devices are sending data
    continuously, but no interrupt arrives.

    Fix:
    I tried Michael's suggestion of requesting the UARTx error interrupt and
    resetting the IRDA when an FE error is found. This helps a lot, but it
    cannot completely prevent the stall.

    Resetting the IRDA every time before sending data is safer.

    Signed-off-by: Graf Yang
    Signed-off-by: Bryan Wu
    Signed-off-by: Alan Cox
    Signed-off-by: Linus Torvalds

    Graf Yang
     
  • Signed-off-by: Sonic Zhang
    Signed-off-by: Bryan Wu
    Signed-off-by: Alan Cox
    Signed-off-by: Linus Torvalds

    Sonic Zhang
     

01 Jan, 2009

23 commits

  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: (34 commits)
    nfsd race fixes: jfs
    nfsd race fixes: reiserfs
    nfsd race fixes: ext4
    nfsd race fixes: ext3
    nfsd race fixes: ext2
    nfsd/create race fixes, infrastructure
    filesystem notification: create fs/notify to contain all fs notification
    fs/block_dev.c: __read_mostly improvement and sb_is_blkdev_sb utilization
    kill ->dir_notify()
    filp_cachep can be static in fs/file_table.c
    fix f_count description in Documentation/filesystems/files.txt
    make INIT_FS use the __RW_LOCK_UNLOCKED initialization
    take init_fs to saner place
    kill vfs_permission
    pass a struct path * to may_open
    kill walk_init_root
    remove incorrect comment in inode_permission
    expand some comments (d_path / seq_path)
    correct wrong function name of d_put in kernel document and source comment
    fix switch_names() breakage in short-to-short case
    ...

    Linus Torvalds
     
  • jfs version of Al Viro's nfsd race patches

    Signed-off-by: Dave Kleikamp
    Signed-off-by: Al Viro

    Dave Kleikamp
     
  • ... and the same for reiserfs. The difference here is that we need
    insert_inode_locked4() to match iget5_locked().

    Signed-off-by: Al Viro

    Al Viro
     
  • Signed-off-by: Al Viro

    Al Viro
     
  • ext3 analog of the previous patch

    Signed-off-by: Al Viro

    Al Viro
     
  • * make ext2_new_inode() put the inode into icache in locked state
    * do not unlock until the inode is fully set up; otherwise nfsd
    might pick it in half-baked state.
    * make sure that ext2_new_inode() does *not* lead to two inodes with the
    same inumber hashed at the same time; otherwise a bogus fhandle coming
    from nfsd might race with inode creation:

    nfsd: iget_locked() creates inode
    nfsd: try to read from disk, block on that.
    ext2_new_inode(): allocate inode with that inumber
    ext2_new_inode(): insert it into icache, set it up and dirty
    ext2_write_inode(): get the relevant part of inode table in cache,
    set the entry for our inode (and start writing to disk)
    nfsd: get CPU again, look into inode table, see nice and sane on-disk
    inode, set the in-core inode from it

    oops - we have two in-core inodes with the same inumber live in icache,
    both used for IO. Welcome to fs corruption...

    Signed-off-by: Al Viro

    Al Viro
     
  • new helpers - insert_inode_locked() and insert_inode_locked4().
    Hash new inode, making sure that there's no such inode in icache
    already. If there is and it does not end up unhashed (as would
    happen if we have nfsd trying to resolve a bogus fhandle), fail.
    Otherwise insert our inode into hash and succeed.

    In either case have i_state set to new+locked; cleanup ends up
    being simpler with such calling conventions.
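
    The calling convention can be illustrated with a toy user-space inode
    cache. All names here are hypothetical, and the sketch simply fails on
    any collision instead of waiting to see whether the existing inode
    ends up unhashed, as the description above requires of the helpers.

```c
#include <assert.h>
#include <stddef.h>

#define ICACHE_SKETCH_BUCKETS 16
#define I_SKETCH_NEW_LOCKED 1u /* stands in for "new+locked" */

struct inode_sketch {
    unsigned long ino;
    unsigned int state;
    struct inode_sketch *next;
};

struct icache_sketch {
    struct inode_sketch *bucket[ICACHE_SKETCH_BUCKETS];
};

/*
 * Hash a new inode unless one with the same number is already present.
 * Returns 0 on success and -1 on collision. In either case the inode
 * is left marked new+locked, so the caller's cleanup path is uniform.
 */
int insert_locked_sketch(struct icache_sketch *cache,
                         struct inode_sketch *inode)
{
    assert(cache != NULL && inode != NULL);
    inode->state |= I_SKETCH_NEW_LOCKED;

    struct inode_sketch **head =
        &cache->bucket[inode->ino % ICACHE_SKETCH_BUCKETS];
    for (struct inode_sketch *p = *head; p != NULL; p = p->next) {
        if (p->ino == inode->ino)
            return -1; /* same inumber already hashed */
    }
    inode->next = *head;
    *head = inode;
    return 0;
}
```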

    Signed-off-by: Al Viro

    Al Viro
     
  • Creating a generic filesystem notification interface, fsnotify, which will be
    used by inotify, dnotify, and eventually fanotify is really starting to
    clutter the fs directory. This patch simply moves inotify and dnotify into
    fs/notify/inotify and fs/notify/dnotify respectively to make both current fs/
    and future notification tidier.

    Signed-off-by: Eric Paris
    Signed-off-by: Al Viro

    Eric Paris
     
  • - iget5_locked in bdget really needs blockdev_superblock, instead of
    bd_mnt, so bd_mnt could be just a local variable;

    - blockdev_superblock really needs __read_mostly, while the local variable
    bd_mnt does not;

    - make use of sb_is_blkdev_sb in bd_forget, instead of direct reference
    to blockdev_superblock.

    Signed-off-by: Denis ChengRq
    Cc: Al Viro
    Signed-off-by: Andrew Morton
    Signed-off-by: Al Viro

    Denis ChengRq
     
  • Remove the hopelessly misguided ->dir_notify(). The only instance (cifs)
    has been broken by design from the very beginning; the objects it creates
    are never destroyed, keep references to struct file they can outlive, nothing
    that could possibly evict them exists on close(2) path *and* no locking
    whatsoever is done to prevent races with close(), should the previous, er,
    deficiencies someday be dealt with.

    Signed-off-by: Al Viro

    Al Viro
     
  • Instead of creating the "filp" kmem_cache in vfs_caches_init(),
    we can do it a little bit later in files_init(), so that filp_cachep
    is static to fs/file_table.c

    Acked-by: Paul E. McKenney

    Signed-off-by: Eric Dumazet
    Signed-off-by: Al Viro

    Eric Dumazet
     
  • Documentation/filesystems/files.txt was not updated when
    f_count became an atomic_long_t.
    atomic_long_inc_not_zero() is now used instead of atomic_inc_not_zero()

    Signed-off-by: Al Viro

    Eric Dumazet
     
  • [AV: rediffed on top of unification of init_fs]
    Initialization of init_fs still uses the deprecated RW_LOCK_UNLOCKED macro.
    This patch updates it to use the __RW_LOCK_UNLOCKED(lock) macro.

    Signed-off-by: Steven Rostedt
    Signed-off-by: Al Viro

    Steven Rostedt
     
  • Signed-off-by: Al Viro

    Al Viro
     
  • With all the nameidata removal there's no point anymore for this helper.
    Of the three callers left two will go away with the next lookup series
    anyway.

    Also add proper kerneldoc to inode_permission as this is the main
    permission check routine now.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Al Viro

    Christoph Hellwig
     
  • No need for the nameidata in may_open - a struct path is enough.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Al Viro

    Christoph Hellwig
     
  • walk_init_root is a tiny helper that is marked __always_inline, has just
    one caller and an unused argument. Just merge it into the caller.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Al Viro

    Christoph Hellwig
     
  • We now pass on all MAY_ flags to the filesystems permission routines,
    so remove the comment stating the contrary.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Al Viro

    Christoph Hellwig
     
  • Explain that you really need to use the return value of d_path rather than
    the buffer you passed into it.

    Also fix the comment for seq_path(), the function arguments changed
    recently but the comment hadn't been updated in sync.
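
    The reason the return value matters can be shown with a small
    user-space sketch that, like the behaviour described above, builds
    the path from the end of the buffer towards the front. The function
    name and its argument layout are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Build "/a/b/c" inside buf from components given leaf first.
 * The string is assembled right to left, so the result usually starts
 * somewhere in the middle of buf. Returns a pointer to the start of
 * the path, or NULL if buf is too small.
 */
char *path_sketch(char *buf, size_t buflen,
                  const char *const *leaf_to_root, size_t n)
{
    assert(buf != NULL && buflen > 0);
    char *end = buf + buflen;
    *--end = '\0';

    for (size_t i = 0; i < n; i++) {
        size_t len = strlen(leaf_to_root[i]);
        if ((size_t)(end - buf) < len + 1)
            return NULL; /* no room for "/" plus the component */
        end -= len;
        memcpy(end, leaf_to_root[i], len);
        *--end = '/';
    }
    if (n == 0) {
        if (end == buf)
            return NULL;
        *--end = '/'; /* the root itself */
    }
    return end;
}
```

    A caller that printed buf instead of the returned pointer would see
    whatever happened to be at the start of the buffer, not the path.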

    Signed-off-by: Arjan van de Ven
    Cc: Al Viro
    Cc: Christoph Hellwig
    Signed-off-by: Andrew Morton
    Signed-off-by: Al Viro

    Arjan van de Ven
     
  • There is no function named d_put(); it should be dput().

    Impact: fix document and comment, no functionality changed

    Signed-off-by: Zhao Lei
    Signed-off-by: Randy Dunlap
    Signed-off-by: Andrew Morton
    Signed-off-by: Al Viro

    Zhaolei
     
  • We want ->name.len to match the resulting name on *both*
    source and target

    Signed-off-by: Al Viro

    Al Viro
     
  • Ensure fast symlink targets are NUL-terminated, even if corrupted
    on-disk.
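
    A minimal sketch of the kind of clamp this implies is shown below,
    assuming the target is held in a fixed-size in-memory buffer. The
    helper name is hypothetical and the code is not taken from the patch.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Force a terminator into a buffer of 'bufsize' bytes. 'len' is the
 * length claimed by (possibly corrupted) on-disk data, so it is
 * clamped to the last valid position in the buffer.
 */
void terminate_link_sketch(char *name, size_t len, size_t bufsize)
{
    assert(name != NULL && bufsize > 0);
    size_t last = bufsize - 1;
    name[len < last ? len : last] = '\0';
}
```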

    Cc: Sergey S. Kostyliov
    Signed-off-by: Duane Griffin
    Signed-off-by: Al Viro

    Duane Griffin
     
  • Ensure fast symlink targets are NUL-terminated, even if corrupted
    on-disk.

    Cc: Christoph Hellwig
    Signed-off-by: Duane Griffin
    Signed-off-by: Al Viro

    Duane Griffin