29 Apr, 2008

40 commits

  • Note: THIS_MODULE and header addition aren't technically needed because
    this code is not modular, but let's keep it anyway because people
    can copy this code into modular code.

    Signed-off-by: Alexey Dobriyan
    Cc: Alan Cox
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
     
  • Now that the last dozen or so users of ->get_info have been removed, ditch it too.
    Everyone sane should have switched to the seq_file interface long ago.

    P.S.: Co-existing 3 interfaces (->get_info/->read_proc/->proc_fops) for proc
    is long-standing crap, BTW, thus
    a) put ->read_proc/->write_proc/read_proc_entry() users on death row,
    b) new such users should be rejected,
    c) everyone is encouraged to convert his favourite ->read_proc user or
    I'll do it, lazy bastards.

    Signed-off-by: Alexey Dobriyan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
     
  • Remove proc_root export. Creation and removal work well if the parent PDE is
    supplied as NULL -- it has always worked that way.

    So, one useless export removed and consistency added: some drivers created
    PDEs with &proc_root as parent but removed them with NULL, and so on.

    Signed-off-by: Alexey Dobriyan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
     
  • Use creation by full path: "driver/foo".

    Signed-off-by: Alexey Dobriyan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
     
  • Use creation by full path instead: "fs/foo".

    Signed-off-by: Alexey Dobriyan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
     
  • Remove proc_bus export and variable itself. Using pathnames works fine
    and is slightly more understandable and greppable.

    Signed-off-by: Alexey Dobriyan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
     
  • proc-misc code is noticeably full of "if (de)" checks when PDE passed is
    always valid. Remove them.

    Addition of such check in proc_lookup_de() is for failed lookup case.

    Signed-off-by: Alexey Dobriyan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
     
  • If a valid "parent" is passed to proc_create/remove_proc_entry(), then the name
    of the PDE should consist of only one path component, otherwise creation or
    removal will fail. However, if NULL is passed as parent then create/remove
    accept a full path as an argument. This is an arbitrary restriction -- all
    infrastructure is in place.

    So, this patch allows the following to succeed:

    create_proc_entry("foo/bar", 0, pde_baz);
    remove_proc_entry("baz/foo/bar", &proc_root);

    It also makes the following behave identically:

    create_proc_entry("foo/bar", 0, NULL);
    create_proc_entry("foo/bar", 0, &proc_root);

    Discrepancy noticed by Den Lunev (IIRC).

    Signed-off-by: Alexey Dobriyan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
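
The path handling described in the entry above can be pictured with a small userspace sketch. Everything here (`struct toy_pde`, `toy_root`, `toy_resolve`) is a hypothetical stand-in, not the kernel's data structures or its actual lookup code; it only illustrates walking every component but the last, starting either from a given parent or from the root when the parent is NULL.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a proc directory entry (not the kernel struct). */
struct toy_pde {
    const char *name;
    struct toy_pde *first_child;
    struct toy_pde *next_sibling;
};

/* Stand-in for the /proc root, used when the caller passes NULL. */
static struct toy_pde toy_root = { "", NULL, NULL };

static void toy_add_child(struct toy_pde *dir, struct toy_pde *child)
{
    child->next_sibling = dir->first_child;
    dir->first_child = child;
}

/* Look up a direct child by the first len bytes of name. */
static struct toy_pde *toy_find_child(struct toy_pde *dir,
                                      const char *name, size_t len)
{
    struct toy_pde *c;

    for (c = dir->first_child; c; c = c->next_sibling)
        if (strlen(c->name) == len && memcmp(c->name, name, len) == 0)
            return c;
    return NULL;
}

/*
 * Walk all but the last component of path, starting at parent (or at the
 * root when parent is NULL). On success *dir is the directory that holds
 * the final component and *leaf points at that component's name.
 */
static int toy_resolve(struct toy_pde *parent, const char *path,
                       struct toy_pde **dir, const char **leaf)
{
    struct toy_pde *d = parent ? parent : &toy_root;
    const char *slash;

    while ((slash = strchr(path, '/')) != NULL) {
        d = toy_find_child(d, path, (size_t)(slash - path));
        if (!d)
            return -1;  /* an intermediate directory is missing */
        path = slash + 1;
    }
    *dir = d;
    *leaf = path;
    return 0;
}
```

In this toy, resolving "foo/bar" against a `baz` entry and resolving "baz/foo/bar" against either NULL or the root all end at the same directory, which is the consistency the commit message asks for.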
     
  • proc_subdir_lock protects only modifying and walking through PDE lists, so
    after we've found PDE to remove and actually removed it from lists, there is
    no need to hold proc_subdir_lock for the rest of operation.

    Signed-off-by: Alexey Dobriyan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
     
  • This cleans up the permission checks done for /proc/PID/mem i/o calls. It
    puts all the logic in a new function, check_mem_permission().

    The old code repeated the (!MAY_PTRACE(task) || !ptrace_may_attach(task))
    magical expression multiple times. The new function does all that work in one
    place, with clear comments.

    The old code called security_ptrace() twice on successful checks, once in
    MAY_PTRACE() and once in __ptrace_may_attach(). Now it's only called once,
    and only if all other checks have succeeded.

    Signed-off-by: Roland McGrath
    Cc: Alexey Dobriyan
    Cc: Oleg Nesterov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Roland McGrath
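
A rough userspace sketch of the "one function, one security call" idea from the entry above. The struct fields, the self-access shortcut, and the exact order of checks are assumptions made for illustration; the real check_mem_permission() works on kernel task structures and is not reproduced here.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical task model; field names are illustrative only. */
struct toy_task {
    bool stopped_or_traced;        /* target is stopped for its tracer */
    const struct toy_task *tracer; /* who is ptrace-attached, if anyone */
    bool lsm_allows;               /* verdict the security hook would give */
    int lsm_calls;                 /* how often the hook was consulted */
};

/* Stand-in for the security hook; counts calls so they can be audited. */
static bool toy_security_ptrace(struct toy_task *task)
{
    task->lsm_calls++;
    return task->lsm_allows;
}

/* All permission logic in one place; the hook runs last and at most once. */
static int toy_check_mem_permission(const struct toy_task *current,
                                    struct toy_task *task)
{
    /* Assumption: a task may always access its own memory. */
    if (task == current)
        return 0;

    /* Cheap checks first; && short-circuits before the security hook. */
    if (task->stopped_or_traced && task->tracer == current &&
        toy_security_ptrace(task))
        return 0;

    return -EPERM;
}
```

Because the hook sits at the end of the `&&` chain, a request that fails an earlier check never reaches it, and a successful request consults it exactly once.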
     
  • Signed-off-by: Alexey Dobriyan
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
     
  • The kernel implements readlink of /proc/pid/exe by getting the file from
    the first executable VMA. Then the path to the file is reconstructed and
    reported as the result.

    Because of the VMA walk the code is slightly different on nommu systems.
    This patch avoids separate /proc/pid/exe code on nommu systems. Instead of
    walking the VMAs to find the first executable file-backed VMA we store a
    reference to the exec'd file in the mm_struct.

    That reference would prevent the filesystem holding the executable file
    from being unmounted even after unmapping the VMAs. So we track the number
    of VM_EXECUTABLE VMAs and drop the new reference when the last one is
    unmapped. This avoids pinning the mounted filesystem.

    [akpm@linux-foundation.org: improve comments]
    [yamamoto@valinux.co.jp: fix dup_mmap]
    Signed-off-by: Matt Helsley
    Cc: Oleg Nesterov
    Cc: David Howells
    Cc: "Eric W. Biederman"
    Cc: Christoph Hellwig
    Cc: Al Viro
    Cc: Hugh Dickins
    Signed-off-by: YAMAMOTO Takashi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Matt Helsley
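
The counting scheme in the entry above can be sketched in userspace as follows. The names (`toy_mm`, `toy_file`, the helper functions) are hypothetical and the reference count is a plain integer; the sketch only shows the rule "drop the stored file reference when the last executable mapping goes away".

```c
#include <stddef.h>

/* Hypothetical file object with a bare reference count. */
struct toy_file {
    int refcount;
};

/* Hypothetical mm: remembers the exec'd file and counts executable VMAs. */
struct toy_mm {
    struct toy_file *exe_file;
    unsigned long num_exe_file_vmas;
};

/* Called at exec time: the mm takes its own reference on the file. */
static void toy_set_exe_file(struct toy_mm *mm, struct toy_file *f)
{
    f->refcount++;
    mm->exe_file = f;
    mm->num_exe_file_vmas = 0;
}

/* Called whenever an executable mapping of the file is created. */
static void toy_added_exe_file_vma(struct toy_mm *mm)
{
    mm->num_exe_file_vmas++;
}

/* Called on unmap: the last mapping going away releases the reference. */
static void toy_removed_exe_file_vma(struct toy_mm *mm)
{
    if (mm->num_exe_file_vmas == 0)
        return;                     /* nothing tracked, nothing to drop */
    if (--mm->num_exe_file_vmas == 0 && mm->exe_file) {
        mm->exe_file->refcount--;   /* stop pinning the filesystem */
        mm->exe_file = NULL;
    }
}
```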
     
  • This usually saves one recompile spent inserting a printk like the one below. :)

    Sample nastygram:

    remove_proc_entry: removing non-empty directory '/proc/foo', leaking at least 'bar'
    ------------[ cut here ]------------
    WARNING: at fs/proc/generic.c:776 remove_proc_entry+0x18a/0x200()
    Modules linked in: foo(-) container fan battery dock sbs ac sbshc backlight ipv6 loop af_packet amd_rng sr_mod i2c_amd8111 i2c_amd756 cdrom i2c_core button thermal processor
    Pid: 3034, comm: rmmod Tainted: G M 2.6.25-rc1 #5

    Call Trace:
    [] warn_on_slowpath+0x64/0x90
    [] printk+0x4e/0x60
    [] remove_proc_entry+0x18a/0x200
    [] mutex_lock_nested+0x1c8/0x2d0
    [] __try_stop_module+0x0/0x40
    [] sys_delete_module+0x14d/0x200
    [] lockdep_sys_exit_thunk+0x35/0x67
    [] __up_read+0x27/0xa0
    [] trace_hardirqs_on_thunk+0x35/0x3a
    [] system_call_after_swapgs+0x7b/0x80

    ---[ end trace 10ef850597e89c54 ]---

    Signed-off-by: Alexey Dobriyan
    Cc: Arjan van de Ven
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Alexey Dobriyan
     
  • Fix these sparse warnings:
    fs/binfmt_elf.c:1749:29: warning: symbol 'tmp' shadows an earlier one
    fs/binfmt_elf.c:1734:28: originally declared here
    fs/binfmt_elf.c:2009:26: warning: symbol 'vma' shadows an earlier one
    fs/binfmt_elf.c:1892:24: originally declared here

    [akpm@linux-foundation.org: chose better variable name]
    Signed-off-by: WANG Cong
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    WANG Cong
     
  • This patch simplifies the fill_elf_header function by zeroing
    the whole ELF header first, so we fill in only the fields
    we really need.

    before:
       text    data     bss     dec     hex filename
      11735      80       0   11815    2e27 fs/binfmt_elf.o

    after:
       text    data     bss     dec     hex filename
      11710      80       0   11790    2e0e fs/binfmt_elf.o

    voila, 25 bytes of text are freed

    Signed-off-by: Cyrill Gorcunov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Cyrill Gorcunov
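
A minimal sketch of the "zero first, then fill only what matters" pattern from the entry above. `struct toy_hdr` is a made-up, simplified header rather than the real ELF layout, and the constant values are illustrative.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified header; NOT the real ELF header layout. */
struct toy_hdr {
    unsigned char ident[16];
    uint16_t type;
    uint16_t machine;
    uint32_t version;
    uint64_t entry;
    uint32_t flags;
    uint16_t phnum;
};

static void toy_fill_hdr(struct toy_hdr *h, uint16_t machine, uint16_t phnum)
{
    /* One memset replaces a series of explicit "field = 0" stores. */
    memset(h, 0, sizeof(*h));

    memcpy(h->ident, "\177ELF", 4);  /* magic bytes */
    h->type = 4;                     /* illustrative "core file" type */
    h->machine = machine;
    h->version = 1;
    h->phnum = phnum;
    /* entry, flags and the rest of ident intentionally stay zero. */
}
```

The design choice is a trade: one bulk clear costs a little at run time but removes a store per unused field from the code.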
     
  • Remove the mem_cgroup member from mm_struct and instead add an owner.

    This approach was suggested by Paul Menage. The advantage of this approach
    is that, once the mm->owner is known, using the subsystem id, the cgroup
    can be determined. It also allows several control groups that are
    virtually grouped by mm_struct, to exist independent of the memory
    controller i.e., without adding mem_cgroup's for each controller, to
    mm_struct.

    A new config option CONFIG_MM_OWNER is added and the memory resource
    controller selects this config option.

    This patch also adds cgroup callbacks to notify subsystems when mm->owner
    changes. The mm_cgroup_changed callback is called with the task_lock() of
    the new task held and is called just prior to changing the mm->owner.

    I am indebted to Paul Menage for the several reviews of this patchset and
    helping me make it lighter and simpler.

    This patch was tested on a powerpc box, it was compiled with both the
    MM_OWNER config turned on and off.

    After the thread group leader exits, it's moved to init_css_set by
    cgroup_exit(), thus all future charges from running threads would be
    redirected to the init_css_set's subsystem.

    Signed-off-by: Balbir Singh
    Cc: Pavel Emelianov
    Cc: Hugh Dickins
    Cc: Sudhir Kumar
    Cc: YAMAMOTO Takashi
    Cc: Hirokazu Takahashi
    Cc: David Rientjes
    Cc: Balbir Singh
    Acked-by: KAMEZAWA Hiroyuki
    Acked-by: Pekka Enberg
    Reviewed-by: Paul Menage
    Cc: Oleg Nesterov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Balbir Singh
     
  • Implement a cgroup to track and enforce open and mknod restrictions on device
    files. A device cgroup associates a device access whitelist with each cgroup.
    A whitelist entry has 4 fields. 'type' is a (all), c (char), or b (block).
    'all' means it applies to all types and all major and minor numbers. Major
    and minor are either an integer or * for all. Access is a composition of r
    (read), w (write), and m (mknod).

    The root device cgroup starts with rwm to 'all'. A child devcg gets a copy of
    the parent. Admins can then remove devices from the whitelist or add new
    entries. A child cgroup can never receive a device access which is denied its
    parent. However when a device access is removed from a parent it will not
    also be removed from the child(ren).

    An entry is added using devices.allow, and removed using
    devices.deny. For instance

    echo 'c 1:3 mr' > /cgroups/1/devices.allow

    allows cgroup 1 to read and mknod the device usually known as
    /dev/null. Doing

    echo a > /cgroups/1/devices.deny

    will remove the default 'a *:* mrw' entry.

    CAP_SYS_ADMIN is needed to change permissions or move another task to a new
    cgroup. A cgroup may not be granted more permissions than the cgroup's parent
    has. Any task can move itself between cgroups. This won't be sufficient, but
    we can decide the best way to adequately restrict movement later.

    [akpm@linux-foundation.org: coding-style fixes]
    [akpm@linux-foundation.org: fix may-be-used-uninitialized warning]
    Signed-off-by: Serge E. Hallyn
    Acked-by: James Morris
    Looks-good-to: Pavel Emelyanov
    Cc: Daniel Hokka Zakrisson
    Cc: Li Zefan
    Cc: Paul Menage
    Cc: Balbir Singh
    Cc: KAMEZAWA Hiroyuki
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Serge E. Hallyn
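
The whitelist semantics in the entry above (type, major, minor, access; 'a' matching everything; * as a wildcard) can be sketched as a simple lookup. The struct, the `TOY_` constants, and the function name are assumptions for illustration and do not mirror the kernel implementation.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical access bits for r, w and m. */
enum {
    TOY_ACC_MKNOD = 1,
    TOY_ACC_READ  = 2,
    TOY_ACC_WRITE = 4
};

#define TOY_ANY (-1)    /* stands for '*' in major or minor */

/* One whitelist entry: type is 'a', 'c' or 'b'. */
struct toy_wl_entry {
    char type;
    int major;
    int minor;
    unsigned access;
};

/* True if some entry grants every bit in want for the given device. */
static bool toy_may_access(const struct toy_wl_entry *wl, size_t n,
                           char type, int major, int minor, unsigned want)
{
    size_t i;

    for (i = 0; i < n; i++) {
        const struct toy_wl_entry *e = &wl[i];

        if (e->type != 'a') {
            /* 'a' covers all types and numbers; others must match. */
            if (e->type != type)
                continue;
            if (e->major != TOY_ANY && e->major != major)
                continue;
            if (e->minor != TOY_ANY && e->minor != minor)
                continue;
        }
        if ((e->access & want) == want)
            return true;
    }
    return false;
}
```

With a single entry equivalent to 'c 1:3 mr', a read of character device 1:3 is allowed while a write to it, or any access to another device, is refused.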
     
  • Make sure crypt_stat->flags is protected with a lock in ecryptfs_open().

    Signed-off-by: Michael Halcrow
    Cc: Al Viro
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michael Halcrow
     
  • Make eCryptfs key module subsystem respect namespaces.

    Since I will be removing the netlink interface in a future patch, I just made
    changes to the netlink.c code so that it will not break the build. With my
    recent patches, the kernel module currently defaults to the device handle
    interface rather than the netlink interface.

    [akpm@linux-foundation.org: export free_user_ns()]
    Signed-off-by: Michael Halcrow
    Acked-by: Serge Hallyn
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michael Halcrow
     
  • Update the versioning information. Make the message types generic. Add an
    outgoing message queue to the daemon struct. Make the functions to parse
    and write the packet lengths available to the rest of the module. Add
    functions to create and destroy the daemon structs. Clean up some of the
    comments and make the code a little more consistent with itself.

    [akpm@linux-foundation.org: printk fixes]
    Signed-off-by: Michael Halcrow
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michael Halcrow
     
  • A regular device file was my real preference from the get-go, but I went with
    netlink at the time because I thought it would be less complex for managing
    send queues (i.e., just do a unicast and move on). It turns out that we do
    not really get that much complexity reduction with netlink, and netlink is
    more heavyweight than a device handle.

    In addition, the netlink interface to eCryptfs has been broken since 2.6.24.
    I am assuming this is a bug in how eCryptfs uses netlink, since the other
    in-kernel users of netlink do not seem to be having any problems. I have had
    one report of a user successfully using eCryptfs with netlink on 2.6.24, but
    for my own systems, when starting the userspace daemon, the initial helo
    message sent to the eCryptfs kernel module results in an oops right off the
    bat. I spent some time looking at it, but I have not yet found the cause.
    The netlink interface breaking gave me the motivation to just finish my patch
    to migrate to a regular device handle. If I cannot find out soon why the
    netlink interface in eCryptfs broke, I am likely to just send a patch to
    disable it in 2.6.24 and 2.6.25. I would like the device handle to be the
    preferred means of communicating with the userspace daemon from 2.6.26 on
    forward.

    This patch:

    Functions to facilitate reading and writing to the eCryptfs miscellaneous
    device handle. This will replace the netlink interface as the preferred
    mechanism for communicating with the userspace eCryptfs daemon.

    Each user has his own daemon, which registers itself by opening the eCryptfs
    device handle. Only one daemon per euid may be registered at any given time.
    The eCryptfs module sends a message to a daemon by adding its message to the
    daemon's outgoing message queue. The daemon reads the device handle to get
    the oldest message off the queue.

    Incoming messages from the userspace daemon are immediately handled. If the
    message is a response, then the corresponding process that is blocked waiting
    for the response is awakened.

    Signed-off-by: Michael Halcrow
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michael Halcrow
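
The per-daemon outgoing queue described in the entry above behaves like a FIFO: the module appends, the daemon's read takes the oldest message. A minimal single-threaded sketch, with hypothetical names and no locking or blocking, might look like this:

```c
#include <stddef.h>

/* Hypothetical message; seq only identifies it in this sketch. */
struct toy_msg {
    int seq;
    struct toy_msg *next;
};

/* Hypothetical daemon state: a singly linked FIFO of pending messages. */
struct toy_daemon {
    struct toy_msg *head;   /* oldest message, returned by the next read */
    struct toy_msg *tail;   /* newest message, where sends are appended */
};

/* Module side: queue a message for the daemon. */
static void toy_send(struct toy_daemon *d, struct toy_msg *m)
{
    m->next = NULL;
    if (d->tail)
        d->tail->next = m;
    else
        d->head = m;
    d->tail = m;
}

/* Daemon side: take the oldest message, or NULL if the queue is empty. */
static struct toy_msg *toy_read(struct toy_daemon *d)
{
    struct toy_msg *m = d->head;

    if (!m)
        return NULL;
    d->head = m->next;
    if (!d->head)
        d->tail = NULL;
    m->next = NULL;
    return m;
}
```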
     
  • Callers of notify_change() need to hold i_mutex.

    Signed-off-by: Miklos Szeredi
    Cc: Michael Halcrow
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Miklos Szeredi
     
  • __FUNCTION__ is gcc-specific, use __func__

    Signed-off-by: Harvey Harrison
    Cc: Michael Halcrow
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Harvey Harrison
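
For reference, `__func__` is the C99 predefined identifier, whereas `__FUNCTION__` is a compiler extension. A tiny userspace example (the `toy_trace` macro and `toy_open` function are made up for illustration):

```c
#include <stdio.h>

/*
 * Expands inside the calling function, so __func__ names the caller.
 * toy_trace is a hypothetical helper, not a kernel macro.
 */
#define toy_trace(buf, len, msg) \
    snprintf((buf), (len), "%s: %s", __func__, (msg))

static void toy_open(char *buf, size_t len)
{
    toy_trace(buf, len, "entered");
}
```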
     
  • Remove the no longer used ecryptfs_header_cache_0.

    Signed-off-by: Adrian Bunk
    Cc: Michael Halcrow
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Adrian Bunk
     
  • We need to check ->s_dirt before calling write_super(). The missing check
    was the cause of an unneeded write.

    This bug was noticed by Sudhanshu Saxena.

    Signed-off-by: OGAWA Hirofumi
    Cc: Al Viro
    Cc: Christoph Hellwig
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    OGAWA Hirofumi
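
The guard described in the entry above amounts to "write only when dirty". A toy version, with a hypothetical struct and a counter standing in for real I/O:

```c
/* Hypothetical superblock: a dirty flag plus a counter instead of real I/O. */
struct toy_sb {
    int s_dirt;
    int writes;
};

/* Stand-in for the write-out; clears the dirty flag when done. */
static void toy_write_super(struct toy_sb *sb)
{
    sb->writes++;
    sb->s_dirt = 0;
}

/* Caller side: skip the write entirely when nothing changed. */
static void toy_sync_super(struct toy_sb *sb)
{
    if (sb->s_dirt)
        toy_write_super(sb);
}
```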
     
  • Add missing consts to xattr function arguments.

    Signed-off-by: David Howells
    Cc: Andreas Gruenbacher
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Howells
     
  • Remove lives_below_in_same_fs() since is_subdir() from fs/dcache.c is
    providing the same functionality.

    Signed-off-by: Jan Blunck
    Acked-by: Miklos Szeredi
    Cc: Al Viro
    Cc: Christoph Hellwig
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jan Blunck
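
The kind of ancestry test that both helpers in the entry above provide can be sketched as a walk up parent pointers. This toy assumes the root entry is its own parent and that an entry counts as a subdirectory of itself; it omits all locking and is not the kernel's is_subdir().

```c
#include <stdbool.h>

/* Hypothetical directory entry; the root is assumed to be its own parent. */
struct toy_dentry {
    struct toy_dentry *parent;
};

/* True if d is ancestor itself or lies anywhere below it. */
static bool toy_is_subdir(const struct toy_dentry *d,
                          const struct toy_dentry *ancestor)
{
    for (;;) {
        if (d == ancestor)
            return true;
        if (d->parent == d)
            return false;   /* reached the root without a match */
        d = d->parent;
    }
}
```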
     
  • fs/inode.c: use hlist_for_each_entry() in find_inode() and find_inode_fast()

    [akpm@linux-foundation.org: coding-style fixes]
    Signed-off-by: Matthias Kaehlcke
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Matthias Kaehlcke
     
  • Many inodes have no pagecache, so we can avoid lots of lock-takings.

    Signed-off-by: Jan Kara
    Cc: Fengguang Wu
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jan Kara
     
  • Fix longstanding lock inversion in drop_pagecache_sb by dropping inode_lock
    before calling __invalidate_mapping_pages(). We just have to make sure inode
    won't go away from under us by keeping reference to it and putting the
    reference only after we have safely resumed the scan of the inode list. A bit
    tricky but not too bad...

    Signed-off-by: Jan Kara
    Cc: Fengguang Wu
    Cc: David Chinner
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jan Kara
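
The "pin, unlock, work, relock, release the previous one" dance from the entry above is easier to follow in code. This is a single-threaded toy: the lock is just a flag with assertions, the list is a plain linked list, and all names are hypothetical. It also skips entries with no cached pages, as an assumption borrowed from the neighbouring entry.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical inode: a reference count, cached page count, list link. */
struct toy_inode {
    int refs;
    unsigned long nrpages;
    struct toy_inode *next;
};

static int toy_list_locked;     /* stand-in for the inode list lock */

static void toy_lock(void)   { assert(!toy_list_locked); toy_list_locked = 1; }
static void toy_unlock(void) { assert(toy_list_locked);  toy_list_locked = 0; }

/* Must never run under the list lock; that was the inversion. */
static void toy_invalidate(struct toy_inode *inode)
{
    assert(!toy_list_locked);
    inode->nrpages = 0;
}

/* Dropping a reference is also done outside the list lock. */
static void toy_iput(struct toy_inode *inode)
{
    if (!inode)
        return;
    assert(!toy_list_locked);
    inode->refs--;
}

static void toy_drop_pagecache(struct toy_inode *head)
{
    struct toy_inode *inode, *toput = NULL;

    toy_lock();
    for (inode = head; inode; inode = inode->next) {
        if (inode->nrpages == 0)
            continue;           /* nothing cached, no need to unlock */
        inode->refs++;          /* pin it so it survives the unlocked part */
        toy_unlock();
        toy_invalidate(inode);
        toy_iput(toput);        /* the scan has already moved past this one */
        toput = inode;          /* keep current pinned until ->next is read */
        toy_lock();
    }
    toy_unlock();
    toy_iput(toput);
}
```

The current entry stays pinned until the loop has re-taken the lock and stepped to `inode->next`; only then is the reference handed to `toy_iput()` on the following iteration.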
     
  • Check that the size of the read returned by kernel_read() is what we asked
    for. If it isn't, then reject the binary as being badly formatted.

    Signed-off-by: David Howells
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    David Howells
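
A userspace sketch of the short-read check from the entry above. `toy_read()` is a hypothetical in-memory stand-in for a read call that may return fewer bytes than requested, and returning -ENOEXEC for a truncated image is an assumption, not taken from the patch.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical in-memory "file". */
struct toy_image {
    const unsigned char *data;
    size_t size;
};

/* Copies up to len bytes from offset off; returns how many were copied. */
static long toy_read(const struct toy_image *img, size_t off,
                     void *buf, size_t len)
{
    size_t avail = off < img->size ? img->size - off : 0;
    size_t n = len < avail ? len : avail;

    if (n == 0)
        return 0;
    memcpy(buf, img->data + off, n);
    return (long)n;
}

/* Reject the image unless the whole header could be read. */
static int toy_load_header(const struct toy_image *img,
                           void *hdr, size_t hdr_len)
{
    long got = toy_read(img, 0, hdr, hdr_len);

    if (got < 0)
        return (int)got;        /* propagate a read error unchanged */
    if ((size_t)got != hdr_len)
        return -ENOEXEC;        /* truncated: do not parse partial data */
    return 0;
}
```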
     
  • Use printk_ratelimit() instead of jiffies based arithmetic, suggested by Geert
    Uytterhoeven

    Signed-off-by: S.Caglar Onur
    Cc: Geert Uytterhoeven
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    S.Caglar Onur
     
  • This can be triggered with root help only, but...

    Register the ':text:E::txt::/root/cat.txt:' rule in binfmt_misc (by root) and
    try launching the cat.txt file (by anyone) :) The result is - the endless
    recursion in the load_misc_binary -> open_exec -> load_misc_binary chain and
    stack overflow.

    There's a similar problem with binfmt_script, and there's a sh_bang member on
    linux_binprm structure to handle this, but simply raising this in binfmt_misc
    may break some setups when the interpreter of some misc binaries is a script.

    So the proposal is to turn sh_bang into a bit, add a new one (the misc_bang)
    and raise it in load_misc_binary. After this, even if we set up the misc ->
    script -> misc loop for binfmts one of them will step on its own bang and
    exit.

    Signed-off-by: Pavel Emelyanov
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Pavel Emelyanov
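
The two-bit scheme proposed in the entry above can be sketched as follows. The struct and loader functions are hypothetical stand-ins, and the -ENOEXEC return value is an assumption; the point is only that each handler trips over its own bit on a second visit, while still being allowed to hand over to the other handler once.

```c
#include <errno.h>

/* Hypothetical binprm: one "already been here" bit per handler. */
struct toy_binprm {
    unsigned int sh_bang:1;
    unsigned int misc_bang:1;
};

/* Script handler: refuses to run twice for the same exec. */
static int toy_load_script(struct toy_binprm *bprm)
{
    if (bprm->sh_bang)
        return -ENOEXEC;
    bprm->sh_bang = 1;
    return 0;
}

/* binfmt_misc handler: same rule, but with its own bit. */
static int toy_load_misc(struct toy_binprm *bprm)
{
    if (bprm->misc_bang)
        return -ENOEXEC;
    bprm->misc_bang = 1;
    return 0;
}
```

A misc -> script chain still works because each handler only checks its own bit; a misc -> script -> misc loop, or a misc binary whose interpreter is itself, stops at the second misc visit.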
     
  • powerpc:

    fs/coda/coda_linux.c: In function 'coda_iattr_to_vattr':
    fs/coda/coda_linux.c:137: warning: large integer implicitly truncated to unsigned type

    Cc: Jan Harkes
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Andrew Morton
     
  • I noticed that 2.6.24.2 calculates bprm->argv_len at do_execve(). But it
    doesn't update bprm->argv_len after "remove_arg_zero() +
    copy_strings_kernel()" at load_script() etc.

    audit_bprm() is called from search_binary_handler() and
    search_binary_handler() is called from load_script() etc. Thus, I think the
    condition check

    if (bprm->argv_len > (audit_argv_kb << 10))
        return -E2BIG;

    in audit_bprm() might return wrong result when strlen(removed_arg) !=
    strlen(spliced_args). Why not update bprm->argv_len at load_script() etc. ?

    By the way, 2.6.25-rc3 seems not to be doing the condition check. Is the field
    bprm->argv_len no longer needed?

    Signed-off-by: Tetsuo Handa
    Cc: Ollie Wild
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Tetsuo Handa
     
  • if (...) BUG(); should be replaced with BUG_ON(...) when the test has no
    side-effects to allow a definition of BUG_ON that drops the code completely.

    The semantic patch that makes this change is as follows:
    (http://www.emn.fr/x-info/coccinelle/)

    // <smpl>
    @ disable unlikely @ expression E,f; @@

    (
      if (<... f(...) ...>) { BUG(); }
    |
    - if (unlikely(E)) { BUG(); }
    + BUG_ON(E);
    )

    @@ expression E,f; @@

    (
      if (<... f(...) ...>) { BUG(); }
    |
    - if (E) { BUG(); }
    + BUG_ON(E);
    )
    // </smpl>

    Signed-off-by: Julia Lawall
    Cc: Roman Zippel
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Julia Lawall
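
To see why the test must be free of side effects, here is a userspace imitation of the two macros. These definitions are illustrative stand-ins, not the kernel's; `TOY_NO_BUG` is a made-up switch that shows a build where BUG_ON() discards its argument entirely.

```c
#include <stdio.h>
#include <stdlib.h>

/* Userspace stand-ins for the kernel macros (illustrative only). */
#define BUG() \
    do { \
        fprintf(stderr, "BUG at %s:%d\n", __FILE__, __LINE__); \
        abort(); \
    } while (0)

#ifdef TOY_NO_BUG
/* The condition is never evaluated, so it must not do real work. */
#define BUG_ON(cond) do { } while (0)
#else
#define BUG_ON(cond) do { if (cond) BUG(); } while (0)
#endif

static int toy_div(int a, int b)
{
    /* Before: if (b == 0) BUG();  After: */
    BUG_ON(b == 0);
    return a / b;
}
```

Something like `BUG_ON(do_work() != 0)` would silently stop calling `do_work()` in the `TOY_NO_BUG` build, which is why a conversion of this kind should leave tests that contain function calls alone.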
     
  • fs/hfsplus/options.c (hfsplus_parse_options): Handle match_strdup failure.

    Signed-off-by: Jim Meyering
    Cc: Roman Zippel
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jim Meyering
     
  • fs/hfs/super.c (parse_options): Handle match_strdup failure, twice.

    Signed-off-by: Jim Meyering
    Cc: Roman Zippel
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jim Meyering
     
  • fs/affs/super.c (parse_options): Remove useless initialization. Handle
    match_strdup failure.

    Signed-off-by: Jim Meyering
    Cc: Roman Zippel
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jim Meyering
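
The three match_strdup fixes above share one pattern: a duplicating helper can return NULL, and the option parser has to notice. A userspace sketch with hypothetical names follows; the allocator is passed in so the failure path can be exercised, and -ENOMEM as the error value is an assumption.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Plain duplicating helper standing in for the kernel's match_strdup(). */
static char *toy_dup(const char *s)
{
    size_t n = strlen(s) + 1;
    char *p = malloc(n);

    if (p)
        memcpy(p, s, n);
    return p;
}

/* Always fails; lets a caller simulate allocation failure. */
static char *toy_failing_dup(const char *s)
{
    (void)s;
    return NULL;
}

/* Store a copy of arg in *slot, leaving *slot untouched on failure. */
static int toy_set_option(char **slot, const char *arg,
                          char *(*dup)(const char *))
{
    char *p = dup(arg);

    if (!p)
        return -ENOMEM;     /* the check the original parsers lacked */
    free(*slot);
    *slot = p;
    return 0;
}
```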
     
  • struct char_device_struct::fops is no longer used: remove it.

    Signed-off-by: Jiri Olsa
    Cc: Greg KH
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jiri Olsa