29 Apr, 2008

40 commits

  • Implement a cgroup to track and enforce open and mknod restrictions on device
    files. A device cgroup associates a device access whitelist with each cgroup.
    A whitelist entry has 4 fields. 'type' is a (all), c (char), or b (block).
    'all' means it applies to all types and all major and minor numbers. Major
    and minor are either an integer or * for all. Access is a composition of r
    (read), w (write), and m (mknod).

    The root device cgroup starts with rwm to 'all'. A child devcg gets a copy of
    the parent. Admins can then remove devices from the whitelist or add new
    entries. A child cgroup can never receive a device access which is denied to
    its parent. However, when a device access is removed from a parent it will
    not also be removed from the child(ren).

    An entry is added using devices.allow, and removed using
    devices.deny. For instance

    echo 'c 1:3 mr' > /cgroups/1/devices.allow

    allows cgroup 1 to read and mknod the device usually known as
    /dev/null. Doing

    echo a > /cgroups/1/devices.deny

    will remove the default 'a *:* mrw' entry.

    CAP_SYS_ADMIN is needed to change permissions or move another task to a new
    cgroup. A cgroup may not be granted more permissions than the cgroup's parent
    has. Any task can move itself between cgroups. This won't be sufficient, but
    we can decide the best way to adequately restrict movement later.

    [akpm@linux-foundation.org: coding-style fixes]
    [akpm@linux-foundation.org: fix may-be-used-uninitialized warning]
    Signed-off-by: Serge E. Hallyn
    Acked-by: James Morris
    Looks-good-to: Pavel Emelyanov
    Cc: Daniel Hokka Zakrisson
    Cc: Li Zefan
    Cc: Paul Menage
    Cc: Balbir Singh
    Cc: KAMEZAWA Hiroyuki
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Serge E. Hallyn
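
A minimal userspace sketch of how a whitelist lookup like the one described above could work. This is not the kernel implementation: the `Entry` class and the `parse_entry()` and `allows()` helpers are hypothetical names introduced only for illustration, and the sketch assumes that one entry must cover every requested access bit.

```python
from dataclasses import dataclass

ALL = "*"  # wildcard for major/minor, mirroring '*' in the entry syntax


@dataclass(frozen=True)
class Entry:
    dev_type: str   # 'a' (all), 'c' (char) or 'b' (block)
    major: object   # int or ALL
    minor: object   # int or ALL
    access: str     # any combination of 'r', 'w', 'm'


def parse_entry(text):
    """Parse strings such as 'c 1:3 mr', or the bare 'a' shorthand."""
    parts = text.split()
    if parts[0] == "a":
        # 'a' applies to all types and all major and minor numbers.
        return Entry("a", ALL, ALL, "rwm")
    dev_type, numbers, access = parts
    major, minor = (ALL if n == "*" else int(n) for n in numbers.split(":"))
    return Entry(dev_type, major, minor, access)


def allows(whitelist, dev_type, major, minor, wanted):
    """Return True if a single entry covers every requested access bit."""
    for entry in whitelist:
        if entry.dev_type != "a":
            if entry.dev_type != dev_type:
                continue
            if entry.major not in (ALL, major):
                continue
            if entry.minor not in (ALL, minor):
                continue
        if set(wanted) <= set(entry.access):
            return True
    return False


# A cgroup whose only entry is the /dev/null example from the text.
cgroup1 = [parse_entry("c 1:3 mr")]
print(allows(cgroup1, "c", 1, 3, "r"))
print(allows(cgroup1, "c", 1, 3, "w"))
```

The sketch leaves out the parent/child rule (a child never receiving more than its parent), which would need a comparison between two whitelists rather than a single lookup.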
     
  • Trigger callback can be used to receive a kick-up from the user space. The
    string written is ignored.

    The cftype->private is used for multiplexing events.

    Signed-off-by: Pavel Emelyanov
    Acked-by: Paul Menage
    Acked-by: KAMEZAWA Hiroyuki
    Cc: Balbir Singh
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pavel Emelyanov
     
  • There is a race between create_proc_entry() and the assignment of file ops.
    proc_create() is invented to fix it.

    Signed-off-by: Li Zefan
    Acked-by: Paul Menage
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Li Zefan
     
  • It is called by cgroup_init() and cgroup_init_early() only, which are
    annotated with __init.

    Signed-off-by: Li Zefan
    Cc: Paul Menage
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Li Zefan
     
  • This removes some filesystem boilerplate from the CFS cgroup subsystem.

    Signed-off-by: Paul Menage
    Acked-by: Peter Zijlstra
    Cc: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Paul Menage
     
  • These patches add cgroups read_s64 and write_s64 control file methods (the
    signed equivalent of read_u64/write_u64) and use them to implement the
    cpu.rt_runtime_us control file in the CFS cgroup subsystem.

    This patch:

    These are the signed equivalents of the read_u64/write_u64 methods

    Signed-off-by: Paul Menage
    Acked-by: Peter Zijlstra
    Cc: Ingo Molnar
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Paul Menage
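
To illustrate why a signed variant matters, the sketch below shows the same 64-bit pattern interpreted as signed and as unsigned. It is a userspace illustration only; `as_u64()` and `parse_s64()` are hypothetical helper names, and -1 is used purely as an example of a negative value a control file might need to carry.

```python
import struct


def as_u64(value):
    """Reinterpret a signed 64-bit value as the unsigned value with the same bits."""
    return struct.unpack("<Q", struct.pack("<q", value))[0]


def parse_s64(text):
    """Parse a decimal string into a signed 64-bit value, rejecting overflow."""
    value = int(text.strip(), 10)
    if not -(2**63) <= value < 2**63:
        raise ValueError("out of range for s64")
    return value


value = parse_s64("-1\n")
print(value, as_u64(value))
```

Reading such a value through an unsigned method would surface the large unsigned number instead of the negative one, which is the situation the signed methods avoid.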
     
  • The cgroup debug subsystem isn't generally useful for users. It should
    default to "n".

    Signed-off-by: Paul Menage
    Cc: "Li Zefan"
    Cc: Balbir Singh
    Cc: Paul Jackson
    Cc: Pavel Emelyanov
    Cc: KAMEZAWA Hiroyuki
    Cc: "YAMAMOTO Takashi"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Paul Menage
     
  • The "releasable" control file provided by the cgroup framework exports the
    state of a per-cgroup flag that's related to the notify-on-release feature.
    This isn't really generally useful, unless you're trying to debug this
    particular feature of cgroups.

    This patch moves the "releasable" file to the cgroup_debug subsystem.

    Signed-off-by: Paul Menage
    Cc: "Li Zefan"
    Cc: Balbir Singh
    Cc: Paul Jackson
    Cc: Pavel Emelyanov
    Cc: KAMEZAWA Hiroyuki
    Cc: "YAMAMOTO Takashi"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Paul Menage
     
  • This function isn't needed - a NULL pointer in the cftype read function will
    result in the same EINVAL response to userspace.

    Signed-off-by: Paul Menage
    Cc: "Li Zefan"
    Cc: Balbir Singh
    Cc: Paul Jackson
    Cc: Pavel Emelyanov
    Cc: KAMEZAWA Hiroyuki
    Cc: "YAMAMOTO Takashi"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Paul Menage
     
  • Remove the seq_file boilerplate used to construct the memcontrol stats map,
    and instead use the new map representation for cgroup control files

    Signed-off-by: Paul Menage
    Cc: "Li Zefan"
    Cc: Balbir Singh
    Cc: Paul Jackson
    Cc: Pavel Emelyanov
    Cc: KAMEZAWA Hiroyuki
    Cc: "YAMAMOTO Takashi"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Paul Menage
     
  • Adds a new type of supported control file representation, a map from strings
    to u64 values.

    Each map entry is printed as a line in a similar format to /proc/vmstat, i.e.
    "$key $value\n"

    Signed-off-by: Paul Menage
    Cc: "Li Zefan"
    Cc: Balbir Singh
    Cc: Paul Jackson
    Cc: Pavel Emelyanov
    Cc: KAMEZAWA Hiroyuki
    Cc: "YAMAMOTO Takashi"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Paul Menage
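
As a rough illustration of the "$key $value\n" line format, the following sketch formats and parses such a map in userspace. The function names and the sample keys are assumptions made for the example, not names taken from the patch.

```python
def format_map(entries):
    """Render (key, value) pairs one per line as "key value\\n"."""
    return "".join(f"{key} {value}\n" for key, value in entries)


def parse_map(text):
    """Parse the line format back into a dict of unsigned integers."""
    result = {}
    for line in text.splitlines():
        key, value = line.split(" ", 1)
        result[key] = int(value)
    return result


# Sample keys are illustrative only.
text = format_map([("cache", 4096), ("rss", 8192)])
print(text, end="")
print(parse_map(text))
```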
     
  • Many of the cpusets control files are simple integer values, which don't
    require the overhead of memory allocations for reads and writes.

    Move the handlers for these control files into cpuset_read_u64() and
    cpuset_write_u64().

    [akpm@linux-foundation.org: add missing `break']
    Signed-off-by: Paul Menage
    Cc: "Li Zefan"
    Cc: Balbir Singh
    Cc: Paul Jackson
    Cc: Pavel Emelyanov
    Cc: KAMEZAWA Hiroyuki
    Cc: "YAMAMOTO Takashi"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Paul Menage
     
  • This removes the need for people to remember to pass the -n flag to echo when
    writing values to cgroup control files.

    Signed-off-by: Paul Menage
    Cc: "Li Zefan"
    Cc: Balbir Singh
    Cc: Paul Jackson
    Cc: Pavel Emelyanov
    Cc: KAMEZAWA Hiroyuki
    Cc: "YAMAMOTO Takashi"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Paul Menage
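
The effect can be pictured with a small userspace sketch: a write handler that strips surrounding whitespace before parsing accepts the same value with or without the newline that a plain `echo` appends. `write_u64()` here is a hypothetical stand-in, not the kernel function, and it assumes decimal input.

```python
def write_u64(buffer: bytes) -> int:
    """Parse an unsigned 64-bit value, tolerating a trailing newline."""
    text = buffer.decode("ascii").strip()  # drops the "\n" added by plain echo
    value = int(text, 10)
    if not 0 <= value < 2**64:
        raise ValueError("out of range for u64")
    return value


# `echo 42` sends b"42\n"; `echo -n 42` sends b"42". Both parse alike.
print(write_u64(b"42\n") == write_u64(b"42"))
```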
     
  • Update the memory controller to use read_u64 for its limit/usage/failcnt
    control files, calling the new res_counter_read_u64() function.

    Signed-off-by: Paul Menage
    Cc: "Li Zefan"
    Cc: Balbir Singh
    Cc: Paul Jackson
    Cc: Pavel Emelyanov
    Cc: KAMEZAWA Hiroyuki
    Cc: "YAMAMOTO Takashi"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Paul Menage
     
  • Adds a function for returning the value of a resource counter member, in a
    form suitable for use in a cgroup read_u64 control file method.

    Signed-off-by: Paul Menage
    Cc: "Li Zefan"
    Cc: Balbir Singh
    Cc: Paul Jackson
    Cc: Pavel Emelyanov
    Cc: KAMEZAWA Hiroyuki
    Cc: "YAMAMOTO Takashi"
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Paul Menage
     
  • Several people have justifiably complained that the "_uint" suffix is
    inappropriate for functions that handle u64 values, so this patch just renames
    all these functions and their users to have the suffix _u64.

    [peterz@infradead.org: build fix]
    Signed-off-by: Paul Menage
    Cc: "Li Zefan"
    Cc: Balbir Singh
    Cc: Paul Jackson
    Cc: Pavel Emelyanov
    Cc: KAMEZAWA Hiroyuki
    Cc: "YAMAMOTO Takashi"
    Signed-off-by: Peter Zijlstra
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Paul Menage
     
  • Every file should include the headers containing the externs for its global
    functions (in this case for ns_cgroup_clone()).

    Signed-off-by: Adrian Bunk
    Acked-by: Serge Hallyn
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Adrian Bunk
     
  • Fix a code warning: symbol 'p' shadows an earlier one

    This is a reincarnation of Harvey Harrison's patch:
    cpuset: sparse warnings in cpuset.c

    Independently, Cliff Wickman moved the affected code,
    from kernel/cpuset.c to kernel/cgroup.c, in his patch:
    cpusets: update_cpumask revision

    Signed-off-by: Paul Jackson
    Cc: Harvey Harrison
    Cc: Cliff Wickman
    Acked-by: Paul Menage
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Paul Jackson
     
  • Make the needlessly global cgroup_enable_task_cg_lists() static.

    Signed-off-by: Adrian Bunk
    Acked-by: David Rientjes
    Cc: Paul Menage
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Adrian Bunk
     
  • This adds support for OLPC XO hardware. Open Firmware on XOs doesn't contain
    the VSA, so it is necessary to emulate the PCI BARs in the kernel. This also
    adds functionality for running EC commands, and a CONFIG_OLPC.

    A number of OLPC drivers depend upon CONFIG_OLPC.

    olpc_ec_timeout is a hack to work around Embedded Controller bugs.

    [akpm@linux-foundation.org: build fix]
    [akpm@linux-foundation.org: geode_has_vsa build fix]
    [akpm@linux-foundation.org: olpc_register_battery_callback doesn't exist]
    Signed-off-by: Andres Salomon
    Acked-by: Ingo Molnar
    Cc: Thomas Gleixner
    Cc: Andi Kleen
    Cc: Jordan Crouse
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andres Salomon
     
  • Make sure crypt_stat->flags is protected with a lock in ecryptfs_open().

    Signed-off-by: Michael Halcrow
    Cc: Al Viro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michael Halcrow
     
  • Make eCryptfs key module subsystem respect namespaces.

    Since I will be removing the netlink interface in a future patch, I just made
    changes to the netlink.c code so that it will not break the build. With my
    recent patches, the kernel module currently defaults to the device handle
    interface rather than the netlink interface.

    [akpm@linux-foundation.org: export free_user_ns()]
    Signed-off-by: Michael Halcrow
    Acked-by: Serge Hallyn
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michael Halcrow
     
  • Update the versioning information. Make the message types generic. Add an
    outgoing message queue to the daemon struct. Make the functions to parse
    and write the packet lengths available to the rest of the module. Add
    functions to create and destroy the daemon structs. Clean up some of the
    comments and make the code a little more consistent with itself.

    [akpm@linux-foundation.org: printk fixes]
    Signed-off-by: Michael Halcrow
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michael Halcrow
     
  • A regular device file was my real preference from the get-go, but I went with
    netlink at the time because I thought it would be less complex for managing
    send queues (i.e., just do a unicast and move on). It turns out that we do
    not really get that much complexity reduction with netlink, and netlink is
    more heavyweight than a device handle.

    In addition, the netlink interface to eCryptfs has been broken since 2.6.24.
    I am assuming this is a bug in how eCryptfs uses netlink, since the other
    in-kernel users of netlink do not seem to be having any problems. I have had
    one report of a user successfully using eCryptfs with netlink on 2.6.24, but
    for my own systems, when starting the userspace daemon, the initial helo
    message sent to the eCryptfs kernel module results in an oops right off the
    bat. I spent some time looking at it, but I have not yet found the cause.
    The netlink interface breaking gave me the motivation to just finish my patch
    to migrate to a regular device handle. If I cannot find out soon why the
    netlink interface in eCryptfs broke, I am likely to just send a patch to
    disable it in 2.6.24 and 2.6.25. I would like the device handle to be the
    preferred means of communicating with the userspace daemon from 2.6.26
    onward.

    This patch:

    Functions to facilitate reading and writing to the eCryptfs miscellaneous
    device handle. This will replace the netlink interface as the preferred
    mechanism for communicating with the userspace eCryptfs daemon.

    Each user has his own daemon, which registers itself by opening the eCryptfs
    device handle. Only one daemon per euid may be registered at any given time.
    The eCryptfs module sends a message to a daemon by adding its message to the
    daemon's outgoing message queue. The daemon reads the device handle to get
    the oldest message off the queue.

    Incoming messages from the userspace daemon are immediately handled. If the
    message is a response, then the corresponding process that is blocked waiting
    for the response is awakened.

    Signed-off-by: Michael Halcrow
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michael Halcrow
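
The registration and queueing behaviour described above can be modelled in a few lines. This is a userspace sketch under stated assumptions: `DaemonRegistry` and its method names are hypothetical, and blocking, wakeups, and message formats are left out entirely.

```python
from collections import deque


class DaemonRegistry:
    """Toy model: one daemon per euid, each with a FIFO of outgoing messages."""

    def __init__(self):
        self._queues = {}  # euid -> deque of messages waiting to be read

    def open(self, euid):
        """Register a daemon; a second registration for the same euid fails."""
        if euid in self._queues:
            raise RuntimeError("a daemon is already registered for this euid")
        self._queues[euid] = deque()

    def release(self, euid):
        """Unregister the daemon, discarding anything still queued."""
        self._queues.pop(euid, None)

    def send(self, euid, message):
        """Module side: append a message to the daemon's outgoing queue."""
        self._queues[euid].append(message)

    def read(self, euid):
        """Daemon side: take the oldest message, or None if the queue is empty."""
        queue = self._queues[euid]
        return queue.popleft() if queue else None


registry = DaemonRegistry()
registry.open(1000)
registry.send(1000, "first")
registry.send(1000, "second")
print(registry.read(1000))
```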
     
  • Callers of notify_change() need to hold i_mutex.

    Signed-off-by: Miklos Szeredi
    Cc: Michael Halcrow
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Miklos Szeredi
     
  • __FUNCTION__ is gcc-specific, use __func__

    Signed-off-by: Harvey Harrison
    Cc: Michael Halcrow
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Harvey Harrison
     
  • Remove the no longer used ecryptfs_header_cache_0.

    Signed-off-by: Adrian Bunk
    Cc: Michael Halcrow
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Adrian Bunk
     
  • Introduced between 2.6.25-rc2 and -rc3
    drivers/block/xen-blkfront.c:139:5: warning: symbol 'blkif_getgeo' was not declared. Should it be static?

    Signed-off-by: Harvey Harrison
    Cc: Jeremy Fitzhardinge
    Cc: Jens Axboe
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Harvey Harrison
     
  • A command that causes a line feed while a background color is active,
    such as

    perl -e 'print "x" x 60, "\e[44m", "x" x 40, "\e[0m\n"'
    and
    perl -e 'print "x" x 40, "\e[44m\n", "x" x 40, "\e[0m\n"'

    causes the line that was started as a result of the line feed to be completely
    filled with the currently active background color instead of the default
    color.

    When scrolling, part of the current screen is memcpy'd/memmove'd to the new
    region, and the new line(s) that will appear as a result are cleared using
    memset. However, the lines are cleared with vc->vc_video_erase_char, causing
    them to be colored with the currently active background color. This is
    different from X11 terminal emulators which always paint the new lines with
    the default background color (e.g. `xterm -bg black`).

    The clear operations (\e[1J and \e[2J) also use vc_video_erase_char, so a new
    vc->vc_scrl_erase_char is introduced which contains the erase character used
    for scrolling, and which is built from vc->vc_def_color instead of vc->vc_color.

    Signed-off-by: Jan Engelhardt
    Cc: "Antonino A. Daplas"
    Cc: "H. Peter Anvin"
    Cc: Alan Cox
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jan Engelhardt
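
The distinction between the two erase characters can be shown with a toy screen model. The `Console` class below is purely illustrative and assumes a cell is a (character, background) pair; it is not how the vt code stores cells.

```python
DEFAULT_BG = "black"


class Console:
    """Toy screen: each cell is a (character, background color) pair."""

    def __init__(self, rows, cols):
        self.cols = cols
        self.bg = DEFAULT_BG  # currently active background color
        self.lines = [[(" ", DEFAULT_BG)] * cols for _ in range(rows)]

    def set_bg(self, color):
        self.bg = color

    def scroll_up(self):
        # New line uses the default color (the role of the scroll erase char).
        blank = (" ", DEFAULT_BG)
        self.lines = self.lines[1:] + [[blank] * self.cols]

    def clear(self):
        # Clearing still uses the active color (the role of the video erase char).
        blank = (" ", self.bg)
        self.lines = [[blank] * self.cols for _ in self.lines]


console = Console(rows=2, cols=4)
console.set_bg("blue")
console.scroll_up()
print(console.lines[-1])
```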
     
  • Instead of using the malloc() and free() wrappers needed by the
    lib/inflate.c code for allocations, simply use kmalloc() and kfree() in the
    initramfs code. This is needed for a further lib/inflate.c-related cleanup
    patch that will remove the malloc() and free() functions.

    Take that opportunity to remove the useless kmalloc() return value
    cast.

    Based on work done by Matt Mackall.

    Signed-off-by: Thomas Petazzoni
    Signed-off-by: Matt Mackall
    Cc: Jan Engelhardt
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Thomas Petazzoni
     
  • Because the WARN_ON in rcupreempt.h triggered, I got a 2G syslog file. For
    some serious kernel complaints we need to repeat the warnings, so here I
    isolate the ratelimit part of printk.c into a standalone file.

    Signed-off-by: Dave Young
    Acked-by: Paul E. McKenney
    Tested-by: Paul E. McKenney
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dave Young
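
For readers unfamiliar with rate limiting, here is a generic token-bucket sketch of the idea: allow a burst of messages, then at most one per interval. It is an assumption-laden illustration, not the code that was moved out of printk.c; the `RateLimit` class and its fields are hypothetical.

```python
class RateLimit:
    """Token bucket: up to `burst` events at once, then one per `interval`."""

    def __init__(self, interval, burst):
        self.interval = interval
        self.burst = burst
        self.tokens = interval * burst  # start with a full bucket
        self.last = 0                   # time of the previous call
        self.missed = 0                 # number of suppressed events

    def allow(self, now):
        """Return True if an event at time `now` may be emitted."""
        capacity = self.interval * self.burst
        self.tokens = min(self.tokens + (now - self.last), capacity)
        self.last = now
        if self.tokens >= self.interval:
            self.tokens -= self.interval
            return True
        self.missed += 1
        return False


limiter = RateLimit(interval=5, burst=2)
print([limiter.allow(0) for _ in range(3)], limiter.allow(5))
```

A warning path would call `allow()` with the current time and skip the message when it returns False, which bounds the log volume even if the condition fires continuously.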
     
  • We need to check ->s_dirt before calling write_super(). The missing check
    was causing unneeded writes.

    This bug was noticed by Sudhanshu Saxena.

    Signed-off-by: OGAWA Hirofumi
    Cc: Al Viro
    Cc: Christoph Hellwig
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    OGAWA Hirofumi
     
  • Add missing consts to xattr function arguments.

    Signed-off-by: David Howells
    Cc: Andreas Gruenbacher
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Howells
     
  • Remove lives_below_in_same_fs() since is_subdir() from fs/dcache.c is
    providing the same functionality.

    Signed-off-by: Jan Blunck
    Acked-by: Miklos Szeredi
    Cc: Al Viro
    Cc: Christoph Hellwig
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jan Blunck
     
  • fs/inode.c: use hlist_for_each_entry() in find_inode() and find_inode_fast()

    [akpm@linux-foundation.org: coding-style fixes]
    Signed-off-by: Matthias Kaehlcke
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Matthias Kaehlcke
     
  • Many inodes have no pagecache, so we can avoid lots of lock-takings.

    Signed-off-by: Jan Kara
    Cc: Fengguang Wu
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jan Kara
     
  • Fix longstanding lock inversion in drop_pagecache_sb by dropping inode_lock
    before calling __invalidate_mapping_pages(). We just have to make sure the
    inode won't go away from under us by keeping a reference to it and putting
    the reference only after we have safely resumed the scan of the inode list.
    A bit tricky but not too bad...

    Signed-off-by: Jan Kara
    Cc: Fengguang Wu
    Cc: David Chinner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jan Kara
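
The pattern described above (pin the current item, drop the lock, do the work, and release the previous item's reference only once the scan has moved on) can be sketched as follows. This is a single-threaded userspace model with hypothetical names; the plain `refcount` field stands in for real inode reference counting and is not thread-safe.

```python
import threading


class Inode:
    def __init__(self, name, pages):
        self.name = name
        self.pages = pages
        self.refcount = 0


inode_lock = threading.Lock()


def drop_pagecache(inodes):
    """Invalidate pages of every inode without holding inode_lock meanwhile."""
    to_put = None  # previously visited inode whose reference is still held
    inode_lock.acquire()
    for inode in inodes:
        inode.refcount += 1       # pin the inode while the lock is still held
        inode_lock.release()      # drop the lock before the heavy work
        inode.pages = 0           # stands in for invalidating the mapping
        if to_put is not None:
            to_put.refcount -= 1  # safe now: the scan has moved past it
        to_put = inode
        inode_lock.acquire()      # retake the lock to continue the scan
    inode_lock.release()
    if to_put is not None:
        to_put.refcount -= 1      # drop the reference on the last inode


inodes = [Inode("a", 3), Inode("b", 0), Inode("c", 7)]
drop_pagecache(inodes)
print([(i.name, i.pages, i.refcount) for i in inodes])
```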
     
  • iommu_is_span_boundary in lib/iommu-helper.c was exported for PARISC IOMMUs
    (commit 3715863aa142c4f4c5208f5f3e5e9bac06006d2f). SWIOTLB can use it instead
    of the homegrown function.

    Signed-off-by: FUJITA Tomonori
    Cc: Thomas Gleixner
    Cc: Ingo Molnar
    Cc: H. Peter Anvin
    Cc: Tony Luck
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    FUJITA Tomonori
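
A span-boundary check of this kind can be sketched as below: given a starting index, a length, and a power-of-two boundary size, it reports whether the range would straddle a boundary. The function name `spans_boundary()` and its parameter names are assumptions for the example and should not be read as the helper's exact signature.

```python
def spans_boundary(index, nr, shift, boundary_size):
    """Return True if `nr` slots starting at `shift + index` cross a boundary."""
    if boundary_size <= 0 or boundary_size & (boundary_size - 1):
        raise ValueError("boundary_size must be a power of two")
    offset = (shift + index) & (boundary_size - 1)  # position within a segment
    return offset + nr > boundary_size


# With a boundary every 8 slots: 4 slots from slot 6 cross it, from slot 4 they fit.
print(spans_boundary(6, 4, 0, 8), spans_boundary(4, 4, 0, 8))
```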
     
  • There's a pointlessly braced block of code in there. Remove the braces and
    save a tabstop.

    Cc: Andi Kleen
    Cc: FUJITA Tomonori
    Cc: Jan Beulich
    Cc: Tony Luck
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
     
  • Before requesting firmware, printk a message saying what we're requesting. This
    makes it easier to see what's going on, and provides an explanation for the
    huge silent delay that one would otherwise get after accidentally building
    ipw2200 as a non-module.

    Cc: Greg KH
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Ciaran McCreesh