11 Aug, 2010

11 commits

  • Conflicts:
    fs/exofs/inode.c

    Jiri Kosina
     
  • It's currently possible to bypass xattr namespace access rules by
    prefixing valid xattr names with "os2.", since the os2 namespace stores
    extended attributes in a legacy format with no prefix.

    This patch adds checking to deny access to any valid namespace prefix
    following "os2.".

    Signed-off-by: Dave Kleikamp
    Reported-by: Sergey Vlasov
    Signed-off-by: Linus Torvalds

    Dave Kleikamp
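
The check described above can be sketched in userspace C. This is a minimal illustration, not the actual patch: the helper name `os2_name_allowed` and the exact list of prefixes are assumptions made for the example.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Assumed list of namespace prefixes that carry their own access rules. */
static const char *const known_prefixes[] = {
    "system.", "trusted.", "security.", "user.",
};

/*
 * Hypothetical helper: returns false when `name` uses "os2." to smuggle
 * in another namespace prefix (for example "os2.trusted.foo").
 */
bool os2_name_allowed(const char *name)
{
    static const char os2[] = "os2.";
    const size_t os2_len = sizeof(os2) - 1;
    const size_t n = sizeof(known_prefixes) / sizeof(known_prefixes[0]);

    /* Names outside the os2 namespace are handled by the normal rules. */
    if (strncmp(name, os2, os2_len) != 0)
        return true;

    const char *rest = name + os2_len;
    for (size_t i = 0; i < n; i++) {
        if (strncmp(rest, known_prefixes[i], strlen(known_prefixes[i])) == 0)
            return false;
    }
    return true;
}
```

With this sketch, `os2_name_allowed("os2.myattr")` is accepted while `os2_name_allowed("os2.trusted.foo")` is rejected.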
     
  • * 'for-2.6.36' of git://git.kernel.dk/linux-2.6-block: (149 commits)
    block: make sure that REQ_* types are seen even with CONFIG_BLOCK=n
    xen-blkfront: fix missing out label
    blkdev: fix blkdev_issue_zeroout return value
    block: update request stacking methods to support discards
    block: fix missing export of blk_types.h
    writeback: fix bad _bh spinlock nesting
    drbd: revert "delay probes", feature is being re-implemented differently
    drbd: Initialize all members of sync_conf to their defaults [Bugz 315]
    drbd: Disable delay probes for the upcomming release
    writeback: cleanup bdi_register
    writeback: add new tracepoints
    writeback: remove unnecessary init_timer call
    writeback: optimize periodic bdi thread wakeups
    writeback: prevent unnecessary bdi threads wakeups
    writeback: move bdi threads exiting logic to the forker thread
    writeback: restructure bdi forker loop a little
    writeback: move last_active to bdi
    writeback: do not remove bdi from bdi_list
    writeback: simplify bdi code a little
    writeback: do not lose wake-ups in bdi threads
    ...

    Fixed up pretty trivial conflicts in drivers/block/virtio_blk.c and
    drivers/scsi/scsi_error.c as per Jens.

    Linus Torvalds
     
  • * git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty-2.6: (68 commits)
    U6715 16550A serial driver support
    Char: nozomi, set tty->driver_data appropriately
    Char: nozomi, fix tty->count counting
    serial: max3107: Fix gpiolib support
    hsu: call PCI pm hooks in suspend/resume function
    hsu: some code cleanup
    hsu: add a periodic timer to check dma rx channel
    hsu: driver for Medfield High Speed UART device
    mxser: remove unnesesary NULL check
    serial: add support for OX16PCI958 card
    serial: 68328serial.c: remove dead (ALMA_ANS | DRAGONIXVZ | M68EZ328ADS)
    timbuart: use __devinit and __devexit macros for probe and remove
    serial: MMIO32 support for 8250_early.c
    serial: mcf: don't take spinlocks in already protected functions
    serial: general fixes in the serial_rs485 structure
    serial: fix missing bit coverage of ASYNC_FLAGS
    serial: "altera_uart: simplify altera_uart_console_putc()" checkpatch fixes
    serial: crisv10: formatting of pointers in printk()
    vt: Fix warning: statement with no effect due to vt_kern.h
    tty_io: remove casts from void*
    ...

    Linus Torvalds
     
  • * 'bkl/ioctl' of git://git.kernel.org/pub/scm/linux/kernel/git/frederic/random-tracing:
    staging: Pushdown bkl to easycap ioctl handlers
    autofs/autofs4: Move compat_ioctl handling into fs
    v4l: Convert v4l2-dev to unlocked_ioctl
    ia64/perfmon: Convert to unlocked_ioctl
    sunrpc: Remove duplicated #include
    ncpfs: Remove duplicated #include

    Linus Torvalds
     
  • This patch is against the 2.6.34 source.

    Paraphrased from the 1989 BSD patch by David Borman @ cray.com:

    These are the changes needed for the kernel to support
    LINEMODE in the server.

    There is a new bit in the termios local flag word, EXTPROC.
    When this bit is set, several aspects of the terminal driver
    are disabled. Input line editing, character echo, and mapping
    of signals are all disabled. This allows the telnetd to turn
    off these functions when in linemode, but still keep track of
    what state the user wants the terminal to be in.

    New ioctl:
    TIOCSIG Generate a signal to processes in the
    current process group of the pty.

    There is a new mode for packet driver, the TIOCPKT_IOCTL bit.
    When packet mode is turned on in the pty, and the EXTPROC bit
    is set, then whenever the state of the pty is changed, the
    next read on the master side of the pty will have the TIOCPKT_IOCTL
    bit set. This allows the process on the server side of the pty
    to know when the state of the terminal has changed; it can then
    issue the appropriate ioctl to retrieve the new state.

    Since the original BSD patches accompanied the source code for telnet
    I've left that reference here, but obviously the feature is useful for
    any remote terminal protocol, including ssh.

    The corresponding feature has existed in the BSD tty driver since 1989.
    For historical reference, a good copy of the relevant files can be found
    here:

    http://anonsvn.mit.edu/viewvc/krb5/trunk/src/appl/telnet/?pathrev=17741

    Signed-off-by: Howard Chu
    Cc: Alan Cox
    Signed-off-by: Greg Kroah-Hartman

    hyc@symas.com
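
From userspace, toggling the new local flag might look like the sketch below. It assumes a Linux C library whose `<termios.h>` exposes `EXTPROC`; the `demo_*` helper names are made up for this example and are not part of the patch.

```c
#define _DEFAULT_SOURCE
#include <termios.h>

/* Set or clear EXTPROC in an in-memory termios structure. */
void demo_set_extproc(struct termios *t, int enable)
{
    if (enable)
        t->c_lflag |= EXTPROC;
    else
        t->c_lflag &= ~(tcflag_t)EXTPROC;
}

/*
 * Apply the change to a terminal file descriptor.
 * Returns 0 on success and -1 on failure, like tcsetattr().
 */
int demo_apply_extproc(int fd, int enable)
{
    struct termios t;

    if (tcgetattr(fd, &t) != 0)
        return -1;
    demo_set_extproc(&t, enable);
    return tcsetattr(fd, TCSANOW, &t);
}
```

A server such as telnetd or sshd would presumably call something like `demo_apply_extproc(fd, 1)` on the pty when the remote side negotiates line mode, then watch for the TIOCPKT_IOCTL bit on packet-mode reads of the master.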
     
  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ecryptfs/ecryptfs-2.6:
    ecryptfs: dont call lookup_one_len to avoid NULL nameidata
    fs/ecryptfs/file.c: introduce missing free
    ecryptfs: release reference to lower mount if interpose fails
    eCryptfs: Handle ioctl calls with unlocked and compat functions
    ecryptfs: Fix warning in ecryptfs_process_response()

    Linus Torvalds
     
  • * git://git.infradead.org/mtd-2.6: (79 commits)
    mtd: Remove obsolete include
    mtd: Update copyright notices
    jffs2: Update copyright notices
    mtd-physmap: add support users can assign the probe type in board files
    mtd: remove redwood map driver
    mxc_nand: Add v3 (i.MX51) Support
    mxc_nand: support 8bit ecc
    mxc_nand: fix correct_data function
    mxc_nand: add V1_V2 namespace to registers
    mxc_nand: factor out a check_int function
    mxc_nand: make some internally used functions overwriteable
    mxc_nand: rework get_dev_status
    mxc_nand: remove 0xe00 offset from registers
    mtd: denali: Add multi connected NAND support
    mtd: denali: Remove set_ecc_config function
    mtd: denali: Remove unuseful code in get_xx_nand_para functions
    mtd: denali: Remove device_info_tag structure
    mtd: m25p80: add support for the Winbond W25Q32 SPI flash chip
    mtd: m25p80: add support for the Intel/Numonyx {16,32,64}0S33B SPI flash chips
    mtd: m25p80: add support for the EON EN25P{32, 64} SPI flash chips
    ...

    Fix up trivial conflicts in drivers/mtd/maps/{Kconfig,redwood.c} due to
    redwood driver removal.

    Linus Torvalds
     
  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/bcopeland/omfs:
    omfs: fix uninitialized variable warning
    omfs: sanity check cluster size
    omfs: refuse to mount if bitmap pointer is obviously wrong
    omfs: check bounds on block numbers before passing to sb_bread
    omfs: fix memory leak

    Linus Torvalds
     
  • * 'for-linus' of git://git.infradead.org/users/eparis/notify: (132 commits)
    fanotify: use both marks when possible
    fsnotify: pass both the vfsmount mark and inode mark
    fsnotify: walk the inode and vfsmount lists simultaneously
    fsnotify: rework ignored mark flushing
    fsnotify: remove global fsnotify groups lists
    fsnotify: remove group->mask
    fsnotify: remove the global masks
    fsnotify: cleanup should_send_event
    fanotify: use the mark in handler functions
    audit: use the mark in handler functions
    dnotify: use the mark in handler functions
    inotify: use the mark in handler functions
    fsnotify: send fsnotify_mark to groups in event handling functions
    fsnotify: Exchange list heads instead of moving elements
    fsnotify: srcu to protect read side of inode and vfsmount locks
    fsnotify: use an explicit flag to indicate fsnotify_destroy_mark has been called
    fsnotify: use _rcu functions for mark list traversal
    fsnotify: place marks on object in order of group memory address
    vfs/fsnotify: fsnotify_close can delay the final work in fput
    fsnotify: store struct file not struct path
    ...

    Fix up trivial delete/modify conflict in fs/notify/inotify/inotify.c.

    Linus Torvalds
     
  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: (96 commits)
    no need for list_for_each_entry_safe()/resetting with superblock list
    Fix sget() race with failing mount
    vfs: don't hold s_umount over close_bdev_exclusive() call
    sysv: do not mark superblock dirty on remount
    sysv: do not mark superblock dirty on mount
    btrfs: remove junk sb_dirt change
    BFS: clean up the superblock usage
    AFFS: wait for sb synchronization when needed
    AFFS: clean up dirty flag usage
    cifs: truncate fallout
    mbcache: fix shrinker function return value
    mbcache: Remove unused features
    add f_flags to struct statfs(64)
    pass a struct path to vfs_statfs
    update VFS documentation for method changes.
    All filesystems that need invalidate_inode_buffers() are doing that explicitly
    convert remaining ->clear_inode() to ->evict_inode()
    Make ->drop_inode() just return whether inode needs to be dropped
    fs/inode.c:clear_inode() is gone
    fs/inode.c:evict() doesn't care about delete vs. non-delete paths now
    ...

    Fix up trivial conflicts in fs/nilfs2/super.c

    Linus Torvalds
     

10 Aug, 2010

29 commits

  • Conflicts:
    arch/arm/mach-omap1/board-nokia770.c

    Jiri Kosina
     
  • Using:

    gcc (GCC) 4.5.0 20100610 (prerelease)

    The following warnings appear:

    fs/readdir.c: In function `filldir64':
    fs/readdir.c:240:15: warning: `dirent' is used uninitialized in this function
    fs/readdir.c: In function `filldir':
    fs/readdir.c:155:15: warning: `dirent' is used uninitialized in this function
    fs/compat.c: In function `compat_filldir64':
    fs/compat.c:1071:11: warning: `dirent' is used uninitialized in this function
    fs/compat.c: In function `compat_filldir':
    fs/compat.c:984:15: warning: `dirent' is used uninitialized in this function

    The warnings are related to the use of the NAME_OFFSET() macro. Luckily,
    it appears as though the standard offsetof() macro is what is being
    implemented by NAME_OFFSET(), thus we can fix the warning and use a more
    standard code construct at the same time.

    Signed-off-by: Kevin Winchester
    Cc: Al Viro
    Cc: Christoph Hellwig
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Kevin Winchester
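
The equivalence the message relies on can be shown with a small standalone sketch. The structure and macro below are stand-ins named for this example (`demo_dirent`, `DEMO_NAME_OFFSET`); they only approximate the kernel definitions.

```c
#include <stddef.h>

/* Stand-in for a directory entry record with a trailing name field. */
struct demo_dirent {
    unsigned long  d_ino;
    unsigned long  d_off;
    unsigned short d_reclen;
    char           d_name[1];
};

/* Pointer-arithmetic style, similar in spirit to NAME_OFFSET(). */
#define DEMO_NAME_OFFSET(de) ((int)((de)->d_name - (char *)(de)))

/* Offset computed from a real object through the macro. */
size_t name_offset_via_pointer(void)
{
    struct demo_dirent d = {0};
    struct demo_dirent *de = &d;

    return (size_t)DEMO_NAME_OFFSET(de);
}

/* Same offset from the standard macro; no object or pointer is needed. */
size_t name_offset_via_offsetof(void)
{
    return offsetof(struct demo_dirent, d_name);
}
```

Both functions return the same value, and the `offsetof()` form never reads a pointer variable, which is presumably why the compiler warning goes away.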
     
  • WB_SYNC_NONE writeback is done in rounds of 1024 pages so that we don't
    write out some huge inode for too long while starving writeout of other
    inodes. To avoid livelocks, we record the time we started writeback in
    wbc->wb_start and do not write out inodes which were dirtied after this
    time. But currently, writeback_inodes_wb() resets wb_start each time it
    is called thus effectively invalidating this logic and making any
    WB_SYNC_NONE writeback prone to livelocks.

    This patch makes sure wb_start is set only once when we start writeback.

    Signed-off-by: Jan Kara
    Reviewed-by: Wu Fengguang
    Cc: Christoph Hellwig
    Acked-by: Jens Axboe
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jan Kara
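
The idea of stamping the start time only once can be sketched as follows. This is a simplified model with hypothetical names (`demo_wbc`, `demo_begin_round`, `demo_should_skip`), not the kernel code; it assumes a start value of 0 means "not yet set".

```c
/* Simplified stand-in for the writeback control structure. */
struct demo_wbc {
    unsigned long wb_start;   /* 0 means writeback has not started yet */
};

/*
 * Called at the start of every round. Only the first call records the
 * timestamp, so later rounds keep the original cut-off.
 */
void demo_begin_round(struct demo_wbc *wbc, unsigned long now)
{
    if (!wbc->wb_start)
        wbc->wb_start = now;
}

/* Inodes dirtied after writeback started are left for a later pass. */
int demo_should_skip(const struct demo_wbc *wbc, unsigned long dirtied_when)
{
    return dirtied_when > wbc->wb_start;
}
```

If every round overwrote `wb_start`, an inode that keeps getting redirtied would always look old enough to write, which is the livelock the message describes.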
     
  • /proc/pid/oom_adj is now deprecated so that it may eventually be
    removed. The target date for removal is August 2012.

    A warning will be printed to the kernel log if a task attempts to use this
    interface. Further warnings will be suppressed until the kernel is rebooted
    to prevent spamming the kernel log.

    Signed-off-by: David Rientjes
    Cc: Nick Piggin
    Cc: KAMEZAWA Hiroyuki
    Cc: KOSAKI Motohiro
    Cc: Oleg Nesterov
    Cc: Balbir Singh
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Rientjes
     
  • This is a complete rewrite of the oom killer's badness() heuristic which is
    used to determine which task to kill in oom conditions. The goal is to
    make it as simple and predictable as possible so the results are better
    understood and we end up killing the task which will lead to the most
    memory freeing while still respecting the fine-tuning from userspace.

    Instead of basing the heuristic on mm->total_vm for each task, the task's
    rss and swap space is used instead. This is a better indication of the
    amount of memory that will be freeable if the oom killed task is chosen
    and subsequently exits. This helps specifically in cases where KDE or
    GNOME is chosen for oom kill on desktop systems instead of a memory
    hogging task.

    The baseline for the heuristic is a proportion of memory that each task is
    currently using in memory plus swap compared to the amount of "allowable"
    memory. "Allowable," in this sense, means the system-wide resources for
    unconstrained oom conditions, the set of mempolicy nodes, the mems
    attached to current's cpuset, or a memory controller's limit. The
    proportion is given on a scale of 0 (never kill) to 1000 (always kill),
    roughly meaning that if a task has a badness() score of 500 that the task
    consumes approximately 50% of allowable memory resident in RAM or in swap
    space.

    The proportion is always relative to the amount of "allowable" memory and
    not the total amount of RAM systemwide so that mempolicies and cpusets may
    operate in isolation; they shall not need to know the true size of the
    machine on which they are running if they are bound to a specific set of
    nodes or mems, respectively.

    Root tasks are given 3% extra memory just like __vm_enough_memory()
    provides in LSMs. In the event of two tasks consuming similar amounts of
    memory, it is generally better to save root's task.

    Because of the change in the badness() heuristic's baseline, it is also
    necessary to introduce a new user interface to tune it. It's not possible
    to redefine the meaning of /proc/pid/oom_adj with a new scale since the
    ABI cannot be changed for backward compatibility. Instead, a new tunable,
    /proc/pid/oom_score_adj, is added that ranges from -1000 to +1000. It may
    be used to polarize the heuristic such that certain tasks are never
    considered for oom kill while others may always be considered. The value
    is added directly into the badness() score so a value of -500, for
    example, means to discount 50% of its memory consumption in comparison to
    other tasks either on the system, bound to the mempolicy, in the cpuset,
    or sharing the same memory controller.

    /proc/pid/oom_adj is changed so that its meaning is rescaled into the
    units used by /proc/pid/oom_score_adj, and vice versa. Changing one of
    these per-task tunables will rescale the value of the other to an
    equivalent meaning. Although /proc/pid/oom_adj was originally defined as
    a bitshift on the badness score, it now shares the same linear growth as
    /proc/pid/oom_score_adj but with different granularity. This is required
    so the ABI is not broken with userspace applications and allows oom_adj to
    be deprecated for future removal.

    Signed-off-by: David Rientjes
    Cc: Nick Piggin
    Cc: KAMEZAWA Hiroyuki
    Cc: KOSAKI Motohiro
    Cc: Oleg Nesterov
    Cc: Balbir Singh
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Rientjes
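
The scoring described above can be approximated with a short sketch. It follows only what the message states (proportion of allowable memory on a 0 to 1000 scale, a 3% discount for root, oom_score_adj added directly); the function name `demo_badness`, its parameters, and the clamping are assumptions for illustration, and the real kernel function handles more cases.

```c
/*
 * Hypothetical model of the heuristic.
 *   rss_pages, swap_pages: memory the task currently uses
 *   allowable_pages:       memory the task is allowed to use
 *   is_root:               non-zero for root-owned tasks
 *   oom_score_adj:         userspace tunable, -1000..+1000
 * Returns a score from 0 (never kill) to 1000 (always kill).
 */
long demo_badness(unsigned long rss_pages, unsigned long swap_pages,
                  unsigned long allowable_pages, int is_root,
                  int oom_score_adj)
{
    unsigned long long used = (unsigned long long)rss_pages + swap_pages;
    long points;

    if (allowable_pages == 0)
        allowable_pages = 1;              /* avoid division by zero */

    /* Proportion of allowable memory, scaled to 0..1000. */
    points = (long)(used * 1000ULL / allowable_pages);

    /* Root tasks get 3% extra memory, i.e. 30 points off. */
    if (is_root)
        points -= 30;

    /* The tunable is added directly to the score. */
    points += oom_score_adj;

    if (points < 0)
        points = 0;
    if (points > 1000)
        points = 1000;
    return points;
}
```

For example, a task using half of the allowable memory scores 500 in this model, and the same task with an oom_score_adj of -500 scores 0.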
     
  • Cc: Minchan Kim
    Cc: David Rientjes
    Cc: KAMEZAWA Hiroyuki
    Cc: Oleg Nesterov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
     
  • If a kernel thread is using use_mm(), badness() returns a positive value.
    This is not a big issue because callers take care of it correctly. But
    there is one exception: /proc/<pid>/oom_score calls badness() directly and
    doesn't check whether the task is a regular process.

    Another example: /proc/1/oom_score returns a non-zero value, but init is
    unkillable. This incorrectness makes administration a little confusing.

    This patch fixes it.

    Signed-off-by: KOSAKI Motohiro
    Cc: Minchan Kim
    Cc: David Rientjes
    Cc: KAMEZAWA Hiroyuki
    Cc: Oleg Nesterov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    KOSAKI Motohiro
     
  • just delay __put_super() a bit

    Signed-off-by: Al Viro

    Al Viro
     
  • If sget() finds a matching superblock being set up, it'll
    grab an active reference to it and grab s_umount. That's
    fine - we'll wait for completion of foofs_get_sb() that way.
    However, if said foofs_get_sb() fails we'll end up holding
    the halfway-created superblock. deactivate_locked_super()
    called by foofs_get_sb() will just unlock the sucker since
    we are holding another active reference to it.

    What we need is a way to tell if superblock has been successfully
    set up. Unfortunately, neither ->s_root nor the check for
    MS_ACTIVE quite fit. Cheap and easy way, suitable for backport:
    new flag set by the (only) caller of ->get_sb(). If that flag
    isn't present by the time sget() grabbed s_umount on preexisting
    superblock it has found, it's seeing a stillborn and should
    just bury it with deactivate_locked_super() (and repeat the search).

    Longer term we want to set that flag in ->get_sb() instances (and
    check for it to distinguish between "sget() found us a live sb"
    and "sget() has allocated an sb, we need to set it up" in there,
    instead of checking ->s_root as we do now).

    Signed-off-by: Al Viro
    Cc: stable@kernel.org

    Al Viro
     
  • Fix an obscure AB-BA deadlock in get_sb_bdev().

    When a superblock is mounted more than once get_sb_bdev() calls
    close_bdev_exclusive() to drop the extra bdev reference while holding
    s_umount. However, sb->s_umount nests inside bd_mutex during
    __invalidate_device() and close_bdev_exclusive() acquires bd_mutex during
    blkdev_put(); thus creating an AB-BA deadlock.

    This condition doesn't trigger frequently. For this condition to be
    visible to lockdep, the filesystem must occupy the whole device (as
    __invalidate_device() only grabs bd_mutex for the whole device), the FS
    must be mounted more than once and partition rescan should be issued while
    the FS is still mounted.

    Fix it by dropping s_umount over close_bdev_exclusive().

    Signed-off-by: Tejun Heo
    Reported-by: Ciprian Docan
    Cc: Al Viro
    Acked-by: Jens Axboe
    Signed-off-by: Andrew Morton
    Signed-off-by: Al Viro

    Tejun Heo
     
  • No need to mark the superblock as dirty in sysv_remount, synchronize
    it instead (only if mounting R/O).

    I did not find any docs about this file-system, and I have no way
    to test my changes. Thus, this is untested. I see other issues in sysv,
    e.g., why does sysv_sync_fs write only in the FSTYPE_SYSV4 case? However,
    it marks its SB bh's dirty for all types, and does not wait for them
    ever. With zero docs I'm unable to fix this.

    Signed-off-by: Artem Bityutskiy
    Signed-off-by: Al Viro

    Artem Bityutskiy
     
  • I did not find any docs about this file-system, and I have no way
    to test my changes. Thus, this is untested.

    Signed-off-by: Artem Bityutskiy
    Signed-off-by: Al Viro

    Artem Bityutskiy
     
  • BTRFS does not define a '->write_super()' method, so it should
    not mark its superblock as dirty. This looks like some left-over.

    Signed-off-by: Artem Bityutskiy
    Acked-by: Chris Mason
    Signed-off-by: Al Viro

    Artem Bityutskiy
     
  • BFS is a very simple FS and its superblock contains only static
    information and is never changed. However, the BFS code for some
    mysterious reason marked its buffer head as dirty from time to
    time, but nothing in that buffer was ever changed.

    This patch removes all the BFS superblock manipulation, simply
    because it is not needed. It removes:

    1. The si_sbh field from 'struct bfs_sb_info' because it is not
    needed. We only need to read the SB once on mount to get the
    start of data blocks and the FS size. After this, we can forget
    about the SB.
    2. All instances of 'mark_buffer_dirty(sbh)' for BFS SB because
    it is never changed.
    3. The '->sync_fs()' method because there is nothing to sync
    (inodes are synched by VFS).
    4. The '->write_super()' method, again, because the SB is never
    changed.

    Tested-by: Artem Bityutskiy
    Signed-off-by: Artem Bityutskiy
    Signed-off-by: Al Viro

    Artem Bityutskiy
     
  • AFFS does not ever wait for superblock synchronization in
    ->put_super(), ->write_super, and ->sync_fs().

    However, it should wait for synchronization in ->put_super() because
    it is about to be unmounted, in ->write_super() because this is
    periodic SB synchronization performed from a separate kernel thread,
    and in ->sync_fs() it should respect the 'wait' flag. This patch fixes
    the situation.

    Also, in ->put_super(), do not write the SB if it is not dirty.

    Tested-by: Artem Bityutskiy
    Signed-off-by: Artem Bityutskiy
    Signed-off-by: Al Viro

    Artem Bityutskiy
     
  • In 'affs_write_super()': remove ancient and wrong commented code,
    remove unneeded 'clean' variable, so the function becomes a bit
    cleaner and simpler.

    In 'affs_remount()': remove unnecessary SB dirty flag changes.

    Tested-by: Artem Bityutskiy
    Signed-off-by: Artem Bityutskiy
    Signed-off-by: Al Viro

    Artem Bityutskiy
     
  • Remove the calls to inode_newsize_ok given that we already did it as
    part of inode_change_ok in the beginning of cifs_setattr_(no)unix.

    No need to call ->truncate if cifs doesn't have one, so remove the
    explicit call in cifs_vmtruncate, and replace the calls to vmtruncate
    with truncate_setsize which is vmtruncate minus inode_newsize_ok
    and the call to ->truncate.

    Rename cifs_vmtruncate to cifs_setsize to match the new calling conventions.

    Question 1: why does cifs do the pagecache munging and i_size update twice
    for each setattr call, once opencoded in cifs_vmtruncate, and once
    using the VFS helpers?
    Question 2: what is supposed to be protected by i_lock in cifs_vmtruncate?
    Do we need it around the call to inode_change_ok?

    [AV: fixed build breakage]

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Al Viro

    Christoph Hellwig
     
  • The shrinker function is supposed to return the number of cache
    entries after shrinking, not before shrinking. Fix that.

    Based on a patch from Wang Sheng-Hui.

    Signed-off-by: Andreas Gruenbacher
    Signed-off-by: Al Viro

    Andreas Gruenbacher
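
The return-value convention can be illustrated with a toy cache. The names `demo_cache` and `demo_shrink` are invented for this sketch and do not mirror the mbcache code.

```c
/* Toy cache that only tracks how many entries it holds. */
struct demo_cache {
    unsigned int entries;
};

/*
 * Free up to nr_to_scan entries and report how many remain.
 * Returning the count taken before freeing would overstate the cache
 * size, which is the kind of bug the message describes.
 */
unsigned int demo_shrink(struct demo_cache *c, unsigned int nr_to_scan)
{
    unsigned int freed = nr_to_scan < c->entries ? nr_to_scan : c->entries;

    c->entries -= freed;
    return c->entries;        /* count after shrinking */
}
```

Calling it with `nr_to_scan` set to 0 frees nothing and simply reports the current size.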
     
  • The mbcache code was written to support a variable number of indexes,
    but all the existing users use exactly one index. Simplify the code to
    support only that case.

    There are also no users of the cache entry free operation, and none of
    the users keep extra data in cache entries. Remove those features as
    well.

    Signed-off-by: Andreas Gruenbacher
    Signed-off-by: Al Viro

    Andreas Gruenbacher
     
  • Add a flags field to help glibc implement statvfs(3) efficiently.

    We copy the flag values from glibc, and add a new ST_VALID flag to
    denote that f_flags is implemented.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Al Viro

    Christoph Hellwig
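
For context, this is roughly how a program consumes the mount flags through statvfs(3). The sketch uses the standard `f_flag` field and `ST_RDONLY` constant; the helper name `demo_is_readonly` is made up here, and the assumption is that the C library takes the flags from the kernel's f_flags when ST_VALID is set.

```c
#include <sys/statvfs.h>

/*
 * Returns 1 if the filesystem containing `path` is mounted read-only,
 * 0 if it is writable, and -1 if statvfs() fails.
 */
int demo_is_readonly(const char *path)
{
    struct statvfs sv;

    if (statvfs(path, &sv) != 0)
        return -1;
    return (sv.f_flag & ST_RDONLY) ? 1 : 0;
}
```

Before a flags field existed in the kernel's statfs result, the C library had to find the mount flags by other means, which is the inefficiency the message refers to.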
     
  • We'll need the path to implement the flags field for statvfs support.
    We do have it available in all callers except:

    - ecryptfs_statfs. This one doesn't actually need vfs_statfs but just
    needs to call the lower filesystem's statfs method.
    - sys_ustat. Add a non-exported statfs_by_dentry helper for it, which
    won't be able to fill out the flags field later on.

    In addition rename the helpers for statfs vs fstatfs to do_*statfs instead
    of the misleading vfs prefix.

    Signed-off-by: Christoph Hellwig
    Signed-off-by: Al Viro

    Christoph Hellwig
     
  • Signed-off-by: Al Viro

    Al Viro
     
  • Signed-off-by: Al Viro

    Al Viro
     
  • ... and let iput_final() do the actual eviction or retention

    Signed-off-by: Al Viro

    Al Viro
     
  • Signed-off-by: Al Viro

    Al Viro
     
  • Signed-off-by: Al Viro

    Al Viro
     
  • Signed-off-by: Al Viro

    Al Viro
     
  • pretty much brute-force...

    Signed-off-by: Al Viro

    Al Viro
     
  • Signed-off-by: Al Viro

    Al Viro