05 Jan, 2009

4 commits

  • * 'audit.b61' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/audit-current:
    audit: validate comparison operations, store them in sane form
    clean up audit_rule_{add,del} a bit
    make sure that filterkey of task,always rules is reported
    audit rules ordering, part 2
    fixing audit rule ordering mess, part 1
    audit_update_lsm_rules() misses the audit_inode_hash[] ones
    sanitize audit_log_capset()
    sanitize audit_fd_pair()
    sanitize audit_mq_open()
    sanitize AUDIT_MQ_SENDRECV
    sanitize audit_mq_notify()
    sanitize audit_mq_getsetattr()
    sanitize audit_ipc_set_perm()
    sanitize audit_ipc_obj()
    sanitize audit_socketcall
    don't reallocate buffer in every audit_sockaddr()

    Linus Torvalds
     
  • With the write_begin/write_end aops, page_symlink was broken because it
    could no longer pass a GFP_NOFS type mask into the point where the
    allocations happened. They are done in write_begin, which would always
    assume that the filesystem can be entered from reclaim. This bug could
    cause filesystem deadlocks.

    The funny thing with having a gfp_t mask there is that it doesn't really
    allow the caller to arbitrarily tinker with the context in which it can be
    called. It couldn't ever be GFP_ATOMIC, for example, because it needs to
    take the page lock. The only thing any callers care about is __GFP_FS
    anyway, so turn that into a single flag.

    Add a new flag for write_begin, AOP_FLAG_NOFS. Filesystems can now act on
    this flag in their write_begin function. Change __grab_cache_page to
    accept a nofs argument as well, to honour that flag (while we're there,
    change the name to grab_cache_page_write_begin which is more instructive
    and does away with random leading underscores).

    This is really a more flexible way to go in the end anyway -- if a
    filesystem happens to want any extra allocations aside from the pagecache
    ones in its write_begin function, it may now use GFP_KERNEL (rather than
    GFP_NOFS) for common case allocations (eg. ocfs2_alloc_write_ctxt, for a
    random example).

    [kosaki.motohiro@jp.fujitsu.com: fix ubifs]
    [kosaki.motohiro@jp.fujitsu.com: fix fuse]
    Signed-off-by: Nick Piggin
    Reviewed-by: KOSAKI Motohiro
    Cc: [2.6.28.x]
    Signed-off-by: KOSAKI Motohiro
    Signed-off-by: Andrew Morton
    [ Cleaned up the calling convention: just pass in the AOP flags
    untouched to the grab_cache_page_write_begin() function. That
    just simplifies everybody, and may even allow future expansion of the
    logic. - Linus ]
    Signed-off-by: Linus Torvalds

    Nick Piggin
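
The flag-to-mask translation described in this commit can be sketched in user space as follows. This is a minimal illustration, not the kernel's code: the `SKETCH_*` constants and `sketch_write_begin_gfp()` are hypothetical stand-ins, and the bit values are made up.

```c
#include <assert.h>

/* Hypothetical stand-in bit values; the kernel's real constants differ. */
#define SKETCH_GFP_IO        0x1u  /* allocation may start disk I/O */
#define SKETCH_GFP_FS        0x2u  /* allocation may re-enter the filesystem */
#define SKETCH_GFP_KERNEL    (SKETCH_GFP_IO | SKETCH_GFP_FS)
#define SKETCH_AOP_FLAG_NOFS 0x4u  /* caller cannot tolerate FS re-entry */

/*
 * Compute the allocation mask a write_begin-style helper would use:
 * start from the mapping's mask and drop the FS bit when the caller
 * passed the NOFS flag.
 */
unsigned int sketch_write_begin_gfp(unsigned int mapping_gfp,
                                    unsigned int aop_flags)
{
    unsigned int gfp_notmask = 0;
    unsigned int result;

    if (aop_flags & SKETCH_AOP_FLAG_NOFS)
        gfp_notmask = SKETCH_GFP_FS;

    result = mapping_gfp & ~gfp_notmask;

    /* Postcondition: a NOFS caller never gets a mask with the FS bit set. */
    assert(!(aop_flags & SKETCH_AOP_FLAG_NOFS) || !(result & SKETCH_GFP_FS));
    return result;
}
```

Because the helper receives the AOP flags untouched, a caller such as page_symlink only has to set one flag, and each write_begin implementation decides what that means for its own allocations.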
     
  • As suggested by Andreas Dilger, introduce a bgl_lock_ptr() helper in
    and add separate sb_bgl_lock() helpers to
    filesystem specific header files to break the hidden dependency to
    struct ext[234]_sb_info.

    Also, while at it, convert the macros to static inlines to try to make up
    for all the times I broke Andrew Morton's tree.

    Acked-by: Andreas Dilger
    Signed-off-by: Pekka Enberg
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pekka Enberg
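
A rough user-space sketch of the split this commit describes: a generic helper that picks a lock from a fixed array, and a filesystem-specific wrapper that knows where its own superblock info keeps that array. All `sketch_*` names, the array size, and the placeholder lock type are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_NR_BG_LOCKS 8  /* assumed size; must be a power of two */

struct sketch_lock { int held; };  /* placeholder for a real spinlock */

struct sketch_bgl {
    struct sketch_lock locks[SKETCH_NR_BG_LOCKS];
};

/* Generic helper: knows nothing about any particular filesystem. */
static inline struct sketch_lock *
sketch_bgl_lock_ptr(struct sketch_bgl *bgl, unsigned int block_group)
{
    assert(bgl != NULL);
    /* Mask the group number down to an index into the lock array. */
    return &bgl->locks[block_group & (SKETCH_NR_BG_LOCKS - 1)];
}

/* Filesystem-specific side: each filesystem's header supplies its own. */
struct sketch_fs_sb_info {
    struct sketch_bgl *blockgroup_lock;
};

static inline struct sketch_lock *
sketch_sb_bgl_lock(struct sketch_fs_sb_info *sbi, unsigned int block_group)
{
    assert(sbi != NULL);
    return sketch_bgl_lock_ptr(sbi->blockgroup_lock, block_group);
}
```

Unlike a macro, the static inline versions type-check their arguments, and the generic helper no longer needs to see the layout of any filesystem's superblock info.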
     
  • * no allocations
    * return void

    Signed-off-by: Al Viro

    Al Viro
     

04 Jan, 2009

4 commits

  • …/git/tip/linux-2.6-tip

    * 'cpus4096-for-linus-3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (77 commits)
    x86: setup_per_cpu_areas() cleanup
    cpumask: fix compile error when CONFIG_NR_CPUS is not defined
    cpumask: use alloc_cpumask_var_node where appropriate
    cpumask: convert shared_cpu_map in acpi_processor* structs to cpumask_var_t
    x86: use cpumask_var_t in acpi/boot.c
    x86: cleanup some remaining usages of NR_CPUS where s/b nr_cpu_ids
    sched: put back some stack hog changes that were undone in kernel/sched.c
    x86: enable cpus display of kernel_max and offlined cpus
    ia64: cpumask fix for is_affinity_mask_valid()
    cpumask: convert RCU implementations, fix
    xtensa: define __fls
    mn10300: define __fls
    m32r: define __fls
    h8300: define __fls
    frv: define __fls
    cris: define __fls
    cpumask: CONFIG_DISABLE_OBSOLETE_CPUMASK_FUNCTIONS
    cpumask: zero extra bits in alloc_cpumask_var_node
    cpumask: replace for_each_cpu_mask_nr with for_each_cpu in kernel/time/
    cpumask: convert mm/
    ...

    Linus Torvalds
     
  • ... just make it a binfmt handler like #! one.

    Signed-off-by: Al Viro
    Signed-off-by: Linus Torvalds

    Al Viro
     
  • They are actually alpha vs. i386/arm/m68k i.e. ecoff vs. aout.

    In the only place where we actually tried to handle arm and i386/m68k in
    different ways (START_DATA() in coredump handling), the arm variant
    works for all of them (i386 and m68k have u.start_code set to 0).

    Signed-off-by: Al Viro
    Signed-off-by: Linus Torvalds

    Al Viro
     
  • it's been used only in sunos compat

    Signed-off-by: Al Viro
    Signed-off-by: Linus Torvalds

    Al Viro
     

03 Jan, 2009

12 commits

  • * 'linux-next' of git://git.infradead.org/ubifs-2.6: (33 commits)
    UBIFS: add more useful debugging prints
    UBIFS: print debugging messages properly
    UBIFS: fix numerous spelling mistakes
    UBIFS: allow mounting when short of space
    UBIFS: fix writing uncompressed files
    UBIFS: fix checkpatch.pl warnings
    UBIFS: fix sparse warnings
    UBIFS: simplify make_free_space
    UBIFS: do not lie about used blocks
    UBIFS: restore budg_uncommitted_idx
    UBIFS: always commit on unmount
    UBIFS: use ubi_sync
    UBIFS: always commit in sync_fs
    UBIFS: fix file-system synchronization
    UBIFS: fix constants initialization
    UBIFS: avoid unnecessary calculations
    UBIFS: re-calculate min_idx_size after the commit
    UBIFS: use nicer 64-bit math
    UBIFS: fix available blocks count
    UBIFS: various comment improvements and fixes
    ...

    Linus Torvalds
     
  • * 'kvm-updates/2.6.29' of git://git.kernel.org/pub/scm/linux/kernel/git/avi/kvm: (140 commits)
    KVM: MMU: handle large host sptes on invlpg/resync
    KVM: Add locking to virtual i8259 interrupt controller
    KVM: MMU: Don't treat a global pte as such if cr4.pge is cleared
    MAINTAINERS: Maintainership changes for kvm/ia64
    KVM: ia64: Fix kvm_arch_vcpu_ioctl_[gs]et_regs()
    KVM: x86: Rework user space NMI injection as KVM_CAP_USER_NMI
    KVM: VMX: Fix pending NMI-vs.-IRQ race for user space irqchip
    KVM: fix handling of ACK from shared guest IRQ
    KVM: MMU: check for present pdptr shadow page in walk_shadow
    KVM: Consolidate userspace memory capability reporting into common code
    KVM: Advertise the bug in memory region destruction as fixed
    KVM: use cpumask_var_t for cpus_hardware_enabled
    KVM: use modern cpumask primitives, no cpumask_t on stack
    KVM: Extract core of kvm_flush_remote_tlbs/kvm_reload_remote_mmus
    KVM: set owner of cpu and vm file operations
    anon_inodes: use fops->owner for module refcount
    x86: KVM guest: kvm_get_tsc_khz: return khz, not lpj
    KVM: MMU: prepopulate the shadow on invlpg
    KVM: MMU: skip global pgtables on sync due to cr3 switch
    KVM: MMU: collapse remote TLB flushes on root sync
    ...

    Linus Torvalds
     
  • Wrap access to task credentials so that they can be separated more easily from
    the task_struct during the introduction of COW creds.

    Change most current->(|e|s|fs)[ug]id to current_(|e|s|fs)[ug]id().

    Change some task->e?[ug]id to task_e?[ug]id(). In some places it makes more
    sense to use RCU directly rather than a convenient wrapper; these will be
    addressed by later patches.

    Signed-off-by: David Howells
    Signed-off-by: Alan Cox
    Signed-off-by: Linus Torvalds

    David Howells
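
To illustrate why accessor wrappers help, here is a small user-space sketch in which the credentials already live in a separate structure. Callers that go through the accessors do not care where the fields are stored. The `sketch_*` types, the global `sketch_current`, and the layout are all assumptions for illustration and are not the kernel's definitions.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned int sketch_uid_t;

/* Assumed layout after the credentials are split out of the task. */
struct sketch_cred {
    sketch_uid_t uid;
    sketch_uid_t euid;
};

struct sketch_task {
    const struct sketch_cred *cred;
};

/* Stand-in for the kernel's notion of the currently running task. */
struct sketch_task *sketch_current;

/* Accessor for an arbitrary task, in the spirit of task_euid(). */
static inline sketch_uid_t sketch_task_euid(const struct sketch_task *task)
{
    assert(task != NULL && task->cred != NULL);
    return task->cred->euid;
}

/* Accessors for the running task, in the spirit of current_uid(). */
static inline sketch_uid_t sketch_current_uid(void)
{
    assert(sketch_current != NULL && sketch_current->cred != NULL);
    return sketch_current->cred->uid;
}

static inline sketch_uid_t sketch_current_euid(void)
{
    return sketch_task_euid(sketch_current);
}
```

If the storage moves again, only the accessor bodies change; code written as `sketch_current_euid()` rather than a direct field access keeps compiling.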
     
  • fs/devpts/inode.c:324: warning: 'compare_init_pts_sb' defined but not used

    Signed-off-by: Andrew Morton
    Signed-off-by: Alan Cox
    Signed-off-by: Linus Torvalds

    Andrew Morton
     
  • Just nail the oddments now while this code is being touched

    Signed-off-by: Alan Cox
    Signed-off-by: Linus Torvalds

    Alan Cox
     
  • To support containers, allow multiple instances of devpts filesystem, such
    that indices of ptys allocated in one instance are independent of ptys
    allocated in other instances of devpts.

    But to preserve backward compatibility, enable this support for multiple
    instances only if:

    - CONFIG_DEVPTS_MULTIPLE_INSTANCES is set to Y, and
    - '-o newinstance' mount option is specified while mounting devpts

    To use multi-instance mount, a container startup script could:

    $ ns_exec -cm /bin/bash
    $ umount /dev/pts
    $ mount -t devpts -o newinstance lxcpts /dev/pts
    $ mount -o bind /dev/pts/ptmx /dev/ptmx
    $ /usr/sbin/sshd -p 1234

    where 'ns_exec -cm /bin/bash' calls clone() with the CLONE_NEWNS flag and execs
    /bin/bash in the child process. A pty created by the sshd is not visible in
    the original mount of /dev/pts.

    USER-SPACE-IMPACT:
    - See Documentation/fs/devpts.txt (included in next patch) for user-
    space impact in multi-instance and mixed-mode operation.
    TODO:
    - Update mount(8), pts(4) man pages. Highlight impact of not
    redirecting /dev/ptmx to /dev/pts/ptmx after a multi-instance mount.

    Changelog[v6]:
    - [Dave Hansen] Use new get_init_pts_sb() interface
    - [Serge Hallyn] Don't bother displaying 'newinstance' in show_options
    - [Serge Hallyn] Use macros (PARSE_REMOUNT/PARSE_MOUNT) instead of 0/1.
    - [Serge Hallyn] Check error return from get_sb_single() (now
    get_init_pts_sb())
    - devpts_pty_kill(): don't dput error dentries

    Changelog[v5]:
    - Move get_sb_ref() definition to earlier patch
    - Move usage info to Documentation/filesystems/devpts.txt (next patch)
    - Make ptmx node even in init_pts_ns, now that default mode is 0000
    (defined in earlier patch, enabled here).
    - Cache ptmx dentry and use to update mode during remount
    (defined in earlier patch, enabled here).
    - Bugfix: explicitly ignore newinstance on remount (if newinstance was
    specified on remount of initial mount, it would be ignored but
    /proc/mounts would imply that the option was set)

    Changelog[v4]:

    - Update patch description to address H. Peter Anvin's comments
    - Consolidate multi-instance mode code under new config token,
    CONFIG_DEVPTS_MULTIPLE_INSTANCES.
    - Move usage-details from patch description to
    Documentation/fs/devpts.txt

    Changelog[v3]:
    - Rename new mount option to 'newinstance'
    - Create ptmx nodes only in 'newinstance' mounts
    - Bugfix: parse_mount_options() modifies @data but since we need to
    parse the @data twice (once in devpts_get_sb() and once during
    do_remount_sb()), parse a local copy of @data in devpts_get_sb().
    (restructured code in devpts_get_sb() to fix this)

    Changelog[v2]:
    - Support both single-mount and multiple-mount semantics and
    provide '-onewmnt' option to select the semantics.

    Signed-off-by: Sukadev Bhattiprolu
    Signed-off-by: Alan Cox
    Signed-off-by: Linus Torvalds

    Sukadev Bhattiprolu
     
  • See comments in the function header for details. The new interface will
    be used in a follow-on patch.

    Changelog [v2]:
    [Dave Hansen] Replace get_sb_ref() in fs/super.c with get_init_pts_sb()
    and make the new interface private to devpts

    Signed-off-by: Sukadev Bhattiprolu
    Signed-off-by: Alan Cox
    Signed-off-by: Linus Torvalds

    Sukadev Bhattiprolu
     
  • /dev/ptmx is closely tied to the devpts filesystem. An open of /dev/ptmx
    allocates the next pty index and the associated device shows up in the
    devpts fs as /dev/pts/n.

    With multiple instances of the devpts filesystem, during an open of /dev/ptmx
    we would be unable to determine which instance of the devpts is being
    accessed.

    So we move the 'ptmx' node into /dev/pts and use the inode of the 'ptmx'
    node to identify the superblock and hence the devpts instance. This patch
    adds the ability for the kernel to internally create the [ptmx, c, 5:2] device
    when mounting the devpts filesystem. Since the ptmx node in devpts is new and
    may surprise some userspace scripts, the default permissions for the new
    node are 0000. These permissions can be changed either using chmod or by
    remounting with the new '-o ptmxmode=0666' mount option.

    Changelog[v5]:
    - [Serge Hallyn bugfix]: Letting new_inode() assign inode number to
    ptmx can collide with hand-assigning inode numbers to ptys. So,
    hand-assign specific inode number to ptmx node also.
    - [Serge Hallyn]: Maybe safer to grab root dentry mutex while creating
    ptmx node
    - [Bugfix with Serge Hallyn] Replace lookup_one_len() in mknod_ptmx()
    with d_alloc_name() (lookup during ->get_sb() locks up system). To
    simplify patchset, fold the ptmx_dentry patch into this.

    Changelog[v4]:
    - Change default permissions of pts/ptmx node to 0000.
    - Move code for ptmxmode under #ifdef CONFIG_DEVPTS_MULTIPLE_INSTANCES.

    Changelog[v3]:
    - Rename ptmx_mode to ptmxmode (for consistency with 'newinstance')

    Changelog[v2]:
    - [H. Peter Anvin] Remove mknod() system call support and create the
    ptmx node internally.

    Changelog[v1]:
    - Earlier version of this patch enabled creating /dev/pts/tty as
    well. As pointed out by Al Viro and H. Peter Anvin, that is not
    really necessary.

    Signed-off-by: Sukadev Bhattiprolu
    Signed-off-by: Alan Cox
    Signed-off-by: Linus Torvalds

    Sukadev Bhattiprolu
     
  • Move code to parse mount options into a separate function so it can
    (later) be shared between mount and remount operations.

    Signed-off-by: Sukadev Bhattiprolu
    Signed-off-by: Alan Cox
    Signed-off-by: Linus Torvalds

    Sukadev Bhattiprolu
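
A single option parser shared by mount and remount might look roughly like the sketch below. It borrows three ideas mentioned elsewhere in this log: a mount/remount selector in the style of PARSE_MOUNT/PARSE_REMOUNT, ignoring 'newinstance' on remount, and parsing a private copy because tokenising modifies the string. The function name, struct, return codes, and the choice to reset ptmxmode on every parse are assumptions for illustration.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

enum sketch_parse_op { SKETCH_PARSE_MOUNT, SKETCH_PARSE_REMOUNT };

struct sketch_pts_opts {
    int newinstance;        /* only honoured on the initial mount */
    unsigned int ptmxmode;  /* permissions of the ptmx node */
};

/* Returns 0 on success, -1 on an unknown option or allocation failure. */
int sketch_parse_mount_options(const char *data, enum sketch_parse_op op,
                               struct sketch_pts_opts *opts)
{
    char *copy, *tok, *next;
    int ret = 0;

    assert(opts != NULL);
    opts->ptmxmode = 0;              /* default 0000, as the log describes */
    if (op == SKETCH_PARSE_MOUNT)
        opts->newinstance = 0;
    if (data == NULL)
        return 0;

    /* Tokenising writes NUL bytes into the string, so use a private copy. */
    copy = malloc(strlen(data) + 1);
    if (copy == NULL)
        return -1;
    strcpy(copy, data);

    for (tok = copy; tok != NULL; tok = next) {
        next = strchr(tok, ',');
        if (next != NULL)
            *next++ = '\0';
        if (*tok == '\0')
            continue;                /* tolerate empty entries */

        if (strcmp(tok, "newinstance") == 0) {
            if (op == SKETCH_PARSE_MOUNT)
                opts->newinstance = 1;   /* silently ignored on remount */
        } else if (strncmp(tok, "ptmxmode=", 9) == 0) {
            opts->ptmxmode = (unsigned int)strtoul(tok + 9, NULL, 8) & 0777;
        } else {
            ret = -1;
            break;
        }
    }

    free(copy);
    return ret;
}
```

Since the caller's string is never modified, the same option data can safely be parsed a second time on a later remount.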
     
  • With support for multiple mounts of devpts, the 'config' structure really
    represents per-mount options rather than config parameters. Rename 'config'
    structure to 'pts_mount_opts' and store it in the super-block.

    Signed-off-by: Sukadev Bhattiprolu
    Signed-off-by: Alan Cox
    Signed-off-by: Linus Torvalds

    Sukadev Bhattiprolu
     
  • To enable multiple mounts of devpts, 'allocated_ptys' must be a per-mount
    variable rather than a global variable. Move 'allocated_ptys' into the
    super_block's s_fs_info.

    Changelog[v2]:
    Define and use DEVPTS_SB() wrapper.

    Signed-off-by: Sukadev Bhattiprolu
    Signed-off-by: Alan Cox
    Signed-off-by: Linus Torvalds

    Sukadev Bhattiprolu
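
The effect of moving the allocation state into per-superblock data can be sketched in user space as below. Each mounted instance carries its own set of allocated indices, so indices handed out by one instance say nothing about another. The bitmap allocator, the limit of 32, and every `sketch_*` / `SKETCH_*` name are assumptions chosen for brevity; the kernel's data structure for this is not shown in the log.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_MAX_PTYS 32  /* assumed limit, small enough for one bitmap */

/* Per-mount state, reachable through the superblock. */
struct sketch_pts_fs_info {
    unsigned long allocated_ptys;  /* bit i set => index i in use */
};

struct sketch_super_block {
    void *s_fs_info;
};

/* Wrapper in the spirit of DEVPTS_SB(): superblock -> private info. */
static inline struct sketch_pts_fs_info *
SKETCH_DEVPTS_SB(struct sketch_super_block *sb)
{
    assert(sb != NULL && sb->s_fs_info != NULL);
    return sb->s_fs_info;
}

/* Allocate the lowest free index in this instance; -1 if none is left. */
int sketch_devpts_new_index(struct sketch_super_block *sb)
{
    struct sketch_pts_fs_info *fsi = SKETCH_DEVPTS_SB(sb);
    int i;

    for (i = 0; i < SKETCH_MAX_PTYS; i++) {
        if (!(fsi->allocated_ptys & (1UL << i))) {
            fsi->allocated_ptys |= 1UL << i;
            return i;
        }
    }
    return -1;
}
```

With a global variable, the second instance in the usage below would have received index 2; with per-mount state it starts again at 0.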
     
  • Remove the 'devpts_root' global variable and find the root dentry using
    the super_block. The super-block can be found from the device inode, using
    the new wrapper, pts_sb_from_inode().

    Changelog: This patch is based on an earlier patchset from Serge Hallyn
    and Matt Helsley.

    Signed-off-by: Sukadev Bhattiprolu
    Signed-off-by: Alan Cox
    Signed-off-by: Linus Torvalds

    Sukadev Bhattiprolu
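
One way such a wrapper could work is sketched below in user space. It assumes the wrapper checks whether the inode already lives on a devpts superblock and otherwise falls back to the initial mount; that fallback, the magic-number check, and all `sketch_*` names are assumptions for illustration, not details taken from the patch.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_DEVPTS_MAGIC 0x1cd1UL  /* stand-in filesystem magic */

struct sketch_dentry { const char *name; };

struct sketch_super_block {
    unsigned long s_magic;
    struct sketch_dentry *s_root;  /* root dentry, no global needed */
};

struct sketch_inode {
    struct sketch_super_block *i_sb;
};

/* Superblock of the initial devpts mount (assumed fallback). */
struct sketch_super_block *sketch_init_pts_sb;

/*
 * Map a device inode to the devpts instance it belongs to. An inode
 * that is itself on a devpts superblock identifies its instance;
 * anything else is assumed to refer to the initial mount.
 */
struct sketch_super_block *
sketch_pts_sb_from_inode(const struct sketch_inode *inode)
{
    assert(inode != NULL && inode->i_sb != NULL);

    if (inode->i_sb->s_magic == SKETCH_DEVPTS_MAGIC)
        return inode->i_sb;
    return sketch_init_pts_sb;
}
```

Once the superblock is known, the root dentry is simply its `s_root` member, which is what makes a global root pointer unnecessary.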
     

01 Jan, 2009

20 commits

  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: (34 commits)
    nfsd race fixes: jfs
    nfsd race fixes: reiserfs
    nfsd race fixes: ext4
    nfsd race fixes: ext3
    nfsd race fixes: ext2
    nfsd/create race fixes, infrastructure
    filesystem notification: create fs/notify to contain all fs notification
    fs/block_dev.c: __read_mostly improvement and sb_is_blkdev_sb utilization
    kill ->dir_notify()
    filp_cachep can be static in fs/file_table.c
    fix f_count description in Documentation/filesystems/files.txt
    make INIT_FS use the __RW_LOCK_UNLOCKED initialization
    take init_fs to saner place
    kill vfs_permission
    pass a struct path * to may_open
    kill walk_init_root
    remove incorrect comment in inode_permission
    expand some comments (d_path / seq_path)
    correct wrong function name of d_put in kernel document and source comment
    fix switch_names() breakage in short-to-short case
    ...

    Linus Torvalds
     
  • jfs version of Al Viro's nfsd race patches

    Signed-off-by: Dave Kleikamp
    Signed-off-by: Al Viro

    Dave Kleikamp
     
  • ... and the same for reiserfs. The difference here is that we need
    insert_inode_locked4() to match iget5_locked().

    Signed-off-by: Al Viro

    Al Viro
     
  • Signed-off-by: Al Viro

    Al Viro
     
  • ext3 analog of the previous patch

    Signed-off-by: Al Viro

    Al Viro
     
  • * make ext2_new_inode() put the inode into icache in locked state
    * do not unlock until the inode is fully set up; otherwise nfsd
    might pick it in half-baked state.
    * make sure that ext2_new_inode() does *not* lead to two inodes with the
    same inumber hashed at the same time; otherwise a bogus fhandle coming
    from nfsd might race with inode creation:

    nfsd: iget_locked() creates inode
    nfsd: try to read from disk, block on that.
    ext2_new_inode(): allocate inode with that inumber
    ext2_new_inode(): insert it into icache, set it up and dirty
    ext2_write_inode(): get the relevant part of inode table in cache,
    set the entry for our inode (and start writing to disk)
    nfsd: get CPU again, look into inode table, see nice and sane on-disk
    inode, set the in-core inode from it

    oops - we have two in-core inodes with the same inumber live in icache,
    both used for IO. Welcome to fs corruption...

    Signed-off-by: Al Viro

    Al Viro
     
  • new helpers - insert_inode_locked() and insert_inode_locked4().
    Hash new inode, making sure that there's no such inode in icache
    already. If there is and it does not end up unhashed (as would
    happen if we have nfsd trying to resolve a bogus fhandle), fail.
    Otherwise insert our inode into hash and succeed.

    In either case have i_state set to new+locked; cleanup ends up
    being simpler with such calling conventions.

    Signed-off-by: Al Viro

    Al Viro
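
The calling convention described here can be sketched with a toy user-space inode cache. The sketch fails when an inode with the same number is already hashed and otherwise inserts the new one; in both cases the new inode ends up marked new and locked. It deliberately leaves out the step where the real helper checks whether the existing inode ends up unhashed, and it has no locking at all. All `sketch_*` / `SKETCH_*` names and the return codes are assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_HASH_SIZE 16
#define SKETCH_I_NEW     0x1u
#define SKETCH_I_LOCK    0x2u

struct sketch_inode {
    unsigned long i_ino;
    unsigned int i_state;
    struct sketch_inode *next;  /* hash chain */
};

struct sketch_icache {
    struct sketch_inode *buckets[SKETCH_HASH_SIZE];
};

/*
 * Hash a freshly allocated inode. Returns 0 on success, or -1 if an
 * inode with the same number is already in the cache. Either way the
 * new inode is left in the new+locked state, so cleanup is uniform.
 */
int sketch_insert_inode_locked(struct sketch_icache *cache,
                               struct sketch_inode *inode)
{
    struct sketch_inode **head, *old;

    assert(cache != NULL && inode != NULL);
    head = &cache->buckets[inode->i_ino % SKETCH_HASH_SIZE];

    inode->i_state |= SKETCH_I_NEW | SKETCH_I_LOCK;

    for (old = *head; old != NULL; old = old->next) {
        if (old->i_ino == inode->i_ino)
            return -1;  /* duplicate inode number: refuse to hash */
    }

    inode->next = *head;
    *head = inode;
    return 0;
}
```

Refusing the duplicate is what prevents the situation from the ext2 description, where two in-core inodes with the same number are live in the cache at once.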
     
  • Creating a generic filesystem notification interface, fsnotify, which will be
    used by inotify, dnotify, and eventually fanotify is really starting to
    clutter the fs directory. This patch simply moves inotify and dnotify into
    fs/notify/inotify and fs/notify/dnotify respectively to make both current fs/
    and future notification tidier.

    Signed-off-by: Eric Paris
    Signed-off-by: Al Viro

    Eric Paris
     
  • - iget5_locked in bdget really needs blockdev_superblock, instead of
    bd_mnt, so bd_mnt could be just a local variable;

    - blockdev_superblock really needs __read_mostly, while the local variable
    bd_mnt does not;

    - make use of sb_is_blkdev_sb in bd_forget, instead of direct reference
    to blockdev_superblock.

    Signed-off-by: Denis ChengRq
    Cc: Al Viro
    Signed-off-by: Andrew Morton
    Signed-off-by: Al Viro

    Denis ChengRq
     
  • Remove the hopelessly misguided ->dir_notify(). The only instance (cifs)
    has been broken by design from the very beginning; the objects it creates
    are never destroyed, keep references to struct file they can outlive, nothing
    that could possibly evict them exists on close(2) path *and* no locking
    whatsoever is done to prevent races with close(), should the previous, er,
    deficiencies someday be dealt with.

    Signed-off-by: Al Viro

    Al Viro
     
  • Instead of creating the "filp" kmem_cache in vfs_caches_init(),
    we can do it a little bit later in files_init(), so that filp_cachep
    is static to fs/file_table.c

    Acked-by: Paul E. McKenney

    Signed-off-by: Eric Dumazet
    Signed-off-by: Al Viro

    Eric Dumazet
     
  • [AV: rediffed on top of unification of init_fs]
    Initialization of init_fs still uses the deprecated RW_LOCK_UNLOCKED macro.
    This patch updates it to use the __RW_LOCK_UNLOCKED(lock) macro.

    Signed-off-by: Steven Rostedt
    Signed-off-by: Al Viro

    Steven Rostedt
     
  • Signed-off-by: Al Viro

    Al Viro
     
  • With all the nameidata removal there's no point anymore for this helper.
    Of the three callers left two will go away with the next lookup series
    anyway.

    Also add proper kerneldoc to inode_permission as this is the main
    permission check routine now.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Al Viro

    Christoph Hellwig
     
  • No need for the nameidata in may_open - a struct path is enough.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Al Viro

    Christoph Hellwig
     
  • walk_init_root is a tiny helper that is marked __always_inline, has just
    one caller and an unused argument. Just merge it into the caller.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Al Viro

    Christoph Hellwig
     
  • We now pass on all MAY_ flags to the filesystems permission routines,
    so remove the comment stating the contrary.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Al Viro

    Christoph Hellwig
     
  • Explain that you really need to use the return value of d_path rather than
    the buffer you passed into it.

    Also fix the comment for seq_path(), the function arguments changed
    recently but the comment hadn't been updated in sync.

    Signed-off-by: Arjan van de Ven
    Cc: Al Viro
    Cc: Christoph Hellwig
    Signed-off-by: Andrew Morton
    Signed-off-by: Al Viro

    Arjan van de Ven
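
The reason the return value matters is easiest to see with a path builder that fills its buffer from the end, which is how a prepend-style routine naturally works. The user-space sketch below is a hypothetical helper, not d_path() itself: `sketch_build_path()` and its argument list are invented for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Build "/a/b/c" from components given root-first, writing from the
 * END of buf towards the front. Returns a pointer to the first
 * character of the path inside buf, or NULL if buf is too small.
 * The bytes before the returned pointer are left untouched.
 */
char *sketch_build_path(const char *const *components, size_t count,
                        char *buf, size_t buflen)
{
    char *end;
    size_t i;

    assert(buf != NULL);
    if (buflen == 0)
        return NULL;

    end = buf + buflen;
    *--end = '\0';

    /* Walk from the leaf back to the root, prepending each name. */
    for (i = count; i > 0; i--) {
        size_t len = strlen(components[i - 1]);

        if ((size_t)(end - buf) < len + 1)
            return NULL;  /* no room for "/name" */
        end -= len;
        memcpy(end, components[i - 1], len);
        *--end = '/';
    }
    return end;
}
```

A caller that prints `buf` instead of the returned pointer would read whatever happened to be at the start of the buffer, not the path.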
     
  • There is no function named d_put(); it should be dput().

    Impact: fix document and comment, no functionality changed

    Signed-off-by: Zhao Lei
    Signed-off-by: Randy Dunlap
    Signed-off-by: Andrew Morton
    Signed-off-by: Al Viro

    Zhaolei
     
  • We want ->name.len to match the resulting name on *both*
    source and target

    Signed-off-by: Al Viro

    Al Viro