15 Nov, 2007

12 commits

  • With 64KB blocksize, a directory entry can have size 64KB, which does not
    fit into the 16 bits we have for the entry length. So we store 0xffff instead
    and convert the value when it is read from / written to disk. The patch also
    converts some places to use ext3_next_entry() when we are changing them anyway.

    [akpm@linux-foundation.org: coding-style cleanups]
    Signed-off-by: Jan Kara
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jan Kara
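
A minimal sketch of the encoding described in the entry above, assuming a plain userspace model with no endianness handling. The names rec_len_from_disk(), rec_len_to_disk(), MAX_REC_LEN_ON_DISK and BLOCK_64K are illustrative and are not taken from the patch; only the 0xffff value and the 16-bit field come from the commit text.

```c
#include <assert.h>
#include <stdint.h>

/* Sentinel stored on disk for a 64KB entry (value from the commit text). */
#define MAX_REC_LEN_ON_DISK 0xffffu
/* 65536 bytes: one more than a 16-bit field can represent. */
#define BLOCK_64K (1u << 16)

/* Decode the 16-bit on-disk field into an in-memory length. */
static unsigned rec_len_from_disk(uint16_t dlen)
{
    if (dlen == MAX_REC_LEN_ON_DISK)
        return BLOCK_64K;
    return dlen;
}

/* Encode an in-memory length; exactly 65536 becomes the sentinel. */
static uint16_t rec_len_to_disk(unsigned len)
{
    assert(len <= BLOCK_64K);   /* larger entries cannot exist */
    if (len == BLOCK_64K)
        return MAX_REC_LEN_ON_DISK;
    return (uint16_t)len;
}
```

The sentinel is only unambiguous if 0xffff can never occur as a genuine length; the sketch assumes that holds and does not check it.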
     
  • Fix some warnings with SMBFS_DEBUG_* builds. This patch makes it so that
    builds with -Werror don't fail.

    Signed-off-by: Jeff Layton
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jeff Layton
     
  • sys_open / sys_read were used in the early 1.2 days to load firmware from
    disk inside drivers. Since 2.0 or so this has been deprecated behavior, but
    several drivers were still using it. For a few years now we have had a
    request_firmware() API that implements this in a nice, consistent way.
    Only some old ISA sound drivers (pre-ALSA) still straggled along for some
    time... however, with commit c2b1239a9f22f19c53543b460b24507d0e21ea0c the
    last user is now gone.

    This is a good thing, since using sys_open / sys_read etc. for firmware is
    anywhere from very buggy to dangerous; these operations put an fd in the
    process file descriptor table... which can then be tampered with from
    other threads, for example. For those who don't want the firmware loader,
    filp_open()/vfs_read() are the better APIs to use, without this security
    issue.

    The patch below marks sys_open and sys_read unused now that they're
    really not used anymore, and for deletion in the 2.6.25 timeframe.

    Signed-off-by: Arjan van de Ven
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Arjan van de Ven
     
  • Currently we special case when we have only the initial pid namespace.
    Unfortunately in doing so the copied case for the other namespaces was
    broken so we don't properly flush the thread directories :(

    So this patch removes the unnecessary special case (removing a usage of
    proc_mnt) and corrects the flushing of the thread directories.

    Signed-off-by: Eric W. Biederman
    Cc: Al Viro
    Cc: Pavel Emelyanov
    Cc: Sukadev Bhattiprolu
    Cc: Kirill Korotaev
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Eric W. Biederman
     
  • Fix obvious NULL dereferences spotted by the Coverity checker.

    Signed-off-by: Adrian Bunk
    Acked-by: Miklos Szeredi
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Adrian Bunk
     
  • This is not a new problem in 2.6.23-git17. 2.6.22/2.6.23 is buggy in the
    same way.

    Reiserfs could accumulate dirty sub-page-size files until umount time.
    They cannot be synced to disk by pdflush routines or explicit `sync'
    commands. Only `umount' can do the trick.

    The direct cause is: the dirty page's PG_dirty is wrongly _cleared_.
    Call trace:
    [] cancel_dirty_page+0xd0/0xf0
    [] :reiserfs:reiserfs_cut_from_item+0x660/0x710
    [] :reiserfs:reiserfs_do_truncate+0x271/0x530
    [] :reiserfs:reiserfs_truncate_file+0xfd/0x3b0
    [] :reiserfs:reiserfs_file_release+0x1e0/0x340
    [] __fput+0xcc/0x1b0
    [] fput+0x16/0x20
    [] filp_close+0x56/0x90
    [] sys_close+0xad/0x110
    [] system_call+0x7e/0x83

    Fix the bug by removing the cancel_dirty_page() call. Tests show that
    it causes no bad behaviors on various write sizes.

    === for the patient ===
    Here are more detailed demonstrations of the problem.

    1) the page has both PG_dirty(D)/PAGECACHE_TAG_DIRTY(d) after being written to;
    and then only PAGECACHE_TAG_DIRTY(d) remains after the file is closed.

    ------------------------------ screen 0 ------------------------------
    [T0] root /home/wfg# cat > /test/tiny
    [T1] hi
    [T2] root /home/wfg#

    ------------------------------ screen 1 ------------------------------
    [T1] root /home/wfg# echo /test/tiny > /proc/filecache
    [T1] root /home/wfg# cat /proc/filecache
    # file /test/tiny
    # flags R:referenced A:active M:mmap U:uptodate D:dirty W:writeback O:owner B:buffer d:dirty w:writeback
    # idx len state refcnt
    0 1 ___UD__Bd_ 2
    [T2] root /home/wfg# cat /proc/filecache
    # file /test/tiny
    # flags R:referenced A:active M:mmap U:uptodate D:dirty W:writeback O:owner B:buffer d:dirty w:writeback
    # idx len state refcnt
    0 1 ___U___Bd_ 2

    2) note the non-zero 'cancelled_write_bytes' after /tmp/hi is copied.

    ------------------------------ screen 0 ------------------------------
    [T0] root /home/wfg# echo hi > /tmp/hi
    [T1] root /home/wfg# cp /tmp/hi /dev/stdin /test
    [T2] hi
    [T3] root /home/wfg#

    ------------------------------ screen 1 ------------------------------
    [T1] root /proc/4397# cd /proc/`pidof cp`
    [T1] root /proc/4713# cat io
    rchar: 8396
    wchar: 3
    syscr: 20
    syscw: 1
    read_bytes: 0
    write_bytes: 20480
    cancelled_write_bytes: 4096
    [T2] root /proc/4713# cat io
    rchar: 8399
    wchar: 6
    syscr: 21
    syscw: 2
    read_bytes: 0
    write_bytes: 24576
    cancelled_write_bytes: 4096

    //Question: the 'write_bytes' is a bit more than expected ;-)

    Tested-by: Maxim Levitsky
    Cc: Peter Zijlstra
    Cc: Jeff Mahoney
    Signed-off-by: Fengguang Wu
    Reviewed-by: Chris Mason
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Fengguang Wu
     
  • I found a few bugs in the BFS driver. Detailed description of the bugs as
    well as the steps to reproduce the errors are given in the kernel bugzilla.
    Please follow these links for more information:

    http://bugzilla.kernel.org/show_bug.cgi?id=9363
    http://bugzilla.kernel.org/show_bug.cgi?id=9364
    http://bugzilla.kernel.org/show_bug.cgi?id=9365
    http://bugzilla.kernel.org/show_bug.cgi?id=9366

    This patch fixes the bugs described above. Besides, the patch introduces
    coding style changes to make the BFS driver conform to the requirements
    specified for Linux kernel code. Finally, I made a few cosmetic changes
    such as removal of trivial debug output.

    Also, the patch removes the fields `si_lf_ioff' and `si_lf_sblk' of the
    in-core superblock structure. These fields are initialized but never
    actually used.

    If you are wondering why I need BFS, here is the answer: I am using this
    driver in the context of Linux kernel classes I am teaching in the Moscow
    State University and in the International Institute of Information
    Technology in Pune, India.

    Signed-off-by: Dmitri Vorobiev
    Cc: Tigran Aivazian
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dmitri Vorobiev
     
  • Add a second parameter 'delta' to hugetlb_get_quota and hugetlb_put_quota to
    allow bulk updating of the sbinfo->free_blocks counter. This will be used by
    the next patch in the series.

    Signed-off-by: Adam Litke
    Cc: Ken Chen
    Cc: Andy Whitcroft
    Cc: Dave Hansen
    Cc: David Gibson
    Cc: William Lee Irwin III
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Adam Litke
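
A rough userspace sketch of what a 'delta' parameter on a quota counter could look like. The struct demo_sbinfo and the demo_get_quota()/demo_put_quota() helpers are hypothetical stand-ins, not the kernel functions; the convention that a negative free_blocks means "no limit" and the -ENOMEM return are assumptions made for the illustration, and locking is omitted.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-in for the per-mount accounting structure. */
struct demo_sbinfo {
    long free_blocks;   /* assumption: negative means "no quota limit" */
};

/* Debit 'delta' blocks at once; fail without side effects if short. */
static int demo_get_quota(struct demo_sbinfo *sb, long delta)
{
    assert(delta >= 0);
    if (sb->free_blocks < 0)
        return 0;                   /* unlimited: nothing to account */
    if (sb->free_blocks - delta < 0)
        return -ENOMEM;             /* not enough quota left */
    sb->free_blocks -= delta;
    return 0;
}

/* Credit 'delta' blocks back in a single update. */
static void demo_put_quota(struct demo_sbinfo *sb, long delta)
{
    assert(delta >= 0);
    if (sb->free_blocks >= 0)
        sb->free_blocks += delta;
}
```

With a delta argument a caller can reserve or release several pages in one call instead of looping one page at a time.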
     
  • The hugetlbfs quota management system was never taught to handle MAP_PRIVATE
    mappings when that support was added. Currently, quota is debited at page
    instantiation and credited at file truncation. This approach works correctly
    for shared pages but is incomplete for private pages. In addition to
    hugetlb_no_page(), private pages can be instantiated by hugetlb_cow(); but
    this function does not respect quotas.

    Private huge pages are treated very much like normal, anonymous pages. They
    are not "backed" by the hugetlbfs file and are not stored in the mapping's
    radix tree. This means that private pages are invisible to
    truncate_hugepages() so that function will not credit the quota.

    This patch (based on a prototype provided by Ken Chen) moves quota crediting
    for all pages into free_huge_page(). page->private is used to store a pointer
    to the mapping to which this page belongs. This is used to credit quota on
    the appropriate hugetlbfs instance.

    Signed-off-by: Adam Litke
    Cc: Ken Chen
    Cc: Andy Whitcroft
    Cc: Dave Hansen
    Cc: David Gibson
    Cc: William Lee Irwin III
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Adam Litke
     
  • It appears we overlooked support for removing generic proc files
    when we added support for multiple proc super blocks. Handle
    that now.

    [akpm@linux-foundation.org: coding-style cleanups]
    Signed-off-by: Eric W. Biederman
    Acked-by: Pavel Emelyanov
    Cc: Alexey Dobriyan
    Acked-by: Sukadev Bhattiprolu
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Eric W. Biederman
     
  • Forbid users from changing file flags on quota files. A user has no business
    playing with these flags when quota is on. Furthermore, there is a
    remote possibility of deadlock due to a lock inversion between the quota file's
    i_mutex and the transaction's start (i_mutex for the quota file is locked only
    when a transaction is started in quota operations) in ext3 and ext4.

    Signed-off-by: Jan Kara
    Cc: LIOU Payphone
    Acked-by: Dave Kleikamp
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Jan Kara
     
  • page->index should be cast to loff_t instead of off_t.

    Signed-off-by: Michael Halcrow
    Reported-by: Eric Sandeen
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Michael Halcrow
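
A small illustration of why the width of the cast matters when a page index is turned into a byte offset. This is a sketch under stated assumptions: a 32-bit offset type is modelled with uint32_t (real off_t is signed, so the truncation is only approximated by unsigned wraparound), loff_t is modelled with uint64_t, and DEMO_PAGE_SHIFT of 12 assumes 4KB pages. The function names are made up for the example.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_PAGE_SHIFT 12   /* assumption: 4KB pages */

/* Widen first, then shift: models casting the index to a 64-bit type. */
static uint64_t offset_wide(uint32_t index)
{
    return (uint64_t)index << DEMO_PAGE_SHIFT;
}

/* Shift in 32 bits: models a cast to a 32-bit offset type. */
static uint32_t offset_narrow(uint32_t index)
{
    return index << DEMO_PAGE_SHIFT;   /* high bits are lost */
}
```

For page index 0x100000 the true byte offset is 4GB; the 64-bit computation keeps it, while the 32-bit one wraps to 0.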
     

14 Nov, 2007

2 commits

  • * 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6: (45 commits)
    [NETFILTER]: xt_time should not assume CONFIG_KTIME_SCALAR
    [NET]: Move unneeded data to initdata section.
    [NET]: Cleanup pernet operation without CONFIG_NET_NS
    [TEHUTI]: Fix incorrect usage of strncat in bdx_get_drvinfo()
    [MYRI_SBUS]: Prevent that myri_do_handshake lies about ticks.
    [NETFILTER]: bridge: fix double POSTROUTING hook invocation
    [NETFILTER]: Consolidate nf_sockopt and compat_nf_sockopt
    [NETFILTER]: nf_nat: fix memset error
    [INET]: Use list_head-s in inetpeer.c
    [IPVS]: Remove unused exports.
    [NET]: Unexport sysctl_{r,w}mem_max.
    [TG3]: Update version to 3.86
    [TG3]: MII => TP
    [TG3]: Add A1 revs
    [TG3]: Increase the PCI MRRS
    [TG3]: Prescaler fix
    [TG3]: Limit 5784 / 5764 to MAC LED mode
    [TG3]: Disable GPHY autopowerdown
    [TG3]: CPMU adjustments for loopback tests
    [TG3]: Fix nvram selftest failures
    ...

    Linus Torvalds
     
  • This reverts commit 7c9e69faa28027913ee059c285a5ea8382e24b5d, fixing up
    conflicts in fs/ext4/balloc.c manually.

    The cost of doing the bitmap validation on each lookup - even when the
    bitmap is cached - is absolutely prohibitive. We could, and probably
    should, do it only when adding the bitmap to the buffer cache. However,
    right now we are better off just reverting it.

    Peter Zijlstra measured the cost of this extra validation as an 85%
    decrease in cached iozone, and while I had a patch that took it down to
    just 17% by not being _quite_ so stupid in the validation, it was still
    a big slowdown that could have been avoided by just doing it right.

    Cc: Peter Zijlstra
    Cc: Andrew Morton
    Cc: Aneesh Kumar
    Cc: Andreas Dilger
    Cc: Mingming Cao
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     

13 Nov, 2007

6 commits

  • This patch reverts Eric's commit 2b008b0a8e96b726c603c5e1a5a7a509b5f61e35

    It slims down the .text and .data sections of the kernel if CONFIG_NET_NS is not set.
    This is safe after list operations cleanup.

    Signed-off-by: Denis V. Lunev
    Signed-off-by: David S. Miller

    Denis V. Lunev
     
  • ...and fix a couple of bugs in the NBD, CIFS and OCFS2 socket handlers.

    Looking at the sock->op->shutdown() handlers, it looks as if all of them
    take a SHUT_RD/SHUT_WR/SHUT_RDWR argument instead of the
    RCV_SHUTDOWN/SEND_SHUTDOWN arguments.
    Add a helper, and then define the SHUT_* enum to ensure that kernel users
    of shutdown() don't get confused.

    Signed-off-by: Trond Myklebust
    Acked-by: Mark Fasheh
    Acked-by: David Howells
    Signed-off-by: David S. Miller

    Trond Myklebust
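
The confusion described in the entry above can be sketched as follows. The DEMO_ prefixed names are placeholders chosen to avoid clashing with system headers; the numeric values mirror the usual SHUT_RD/SHUT_WR/SHUT_RDWR (0, 1, 2) and RCV_SHUTDOWN/SEND_SHUTDOWN (1, 2) definitions. The demo_cmd_to_mask() helper is hypothetical and is not the helper added by the patch.

```c
#include <assert.h>

/* Command values as passed to shutdown(): 0, 1, 2. */
enum demo_shutdown_cmd {
    DEMO_SHUT_RD   = 0,
    DEMO_SHUT_WR   = 1,
    DEMO_SHUT_RDWR = 2
};

/* Internal state bits: a mask, not a command. */
#define DEMO_RCV_SHUTDOWN  1
#define DEMO_SEND_SHUTDOWN 2

/* Translate a command into the bit mask it stands for. */
static int demo_cmd_to_mask(enum demo_shutdown_cmd how)
{
    int h = (int)how;

    assert(h >= 0 && h <= 2);
    return h + 1;   /* 0 -> RCV, 1 -> SEND, 2 -> RCV | SEND */
}
```

Because SEND_SHUTDOWN and SHUT_RDWR share the value 2, a caller that passes the mask where the command is expected shuts down both directions instead of only the sending side. A dedicated enum type makes that kind of mix-up easier to spot.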
     
  • As with commit 7fc90ec93a5eb71f4b08403baf5ba7176b3ec6b1 ("knfsd: nfsd:
    call nfsd_setuser() on fh_compose(), fix nfsd4 permissions problem")
    this is a case where we need to redo a security check in fh_verify()
    even though the filehandle already has an associated dentry--if the
    filehandle was created by fh_compose() in an earlier operation of the
    nfsv4 compound, then we may not have done these checks yet.

    Without this fix it is possible, for example, to traverse from an export
    without the secure ports requirement to one with it in a single
    compound, and bypass the secure port check on the new export.

    While we're here, fix up some minor style problems and change a printk()
    to a dprintk(), to make it harder for random unprivileged users to spam
    the logs.

    Signed-off-by: J. Bruce Fields
    Reviewed-By: NeilBrown
    Signed-off-by: Linus Torvalds

    J. Bruce Fields
     
  • The v2/v3 acl code in nfsd is translating any return from fh_verify() to
    nfserr_inval. This is particularly unfortunate in the case of an
    nfserr_dropit return, which is an internal error meant to indicate to
    callers that this request has been deferred and should just be dropped
    pending the results of an upcall to mountd.

    Thanks to Roland for bug report and data collection.

    Cc: Roland
    Acked-by: Andreas Gruenbacher
    Signed-off-by: J. Bruce Fields
    Reviewed-By: NeilBrown
    Signed-off-by: Linus Torvalds

    J. Bruce Fields
     
  • * git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6: (21 commits)
    [CIFS] fix oops on second mount to same server when null auth is used
    [CIFS] Fix stale mode after readdir when cifsacl specified
    [CIFS] add mode to acl conversion helper function
    [CIFS] Fix incorrect mode when ACL had deny access control entries
    [CIFS] Add uid to key description so krb can handle user mounts
    [CIFS] Fix walking out end of cifs dacl
    [CIFS] Add upcall files for cifs to use spnego/kerberos
    [CIFS] add OIDs for KRB5 and MSKRB5 to ASN1 parsing routines
    [CIFS] Register and unregister cifs_spnego_key_type on module init/exit
    [CIFS] implement upcalls for SPNEGO blob via keyctl API
    [CIFS] allow cifs_calc_signature2 to deal with a zero length iovec
    [CIFS] If no Access Control Entries, set mode perm bits to zero
    [CIFS] when mount helper missing fix slash wrong direction in share
    [CIFS] Don't request too much permission when reading an ACL
    [CIFS] enable get mode from ACL when cifsacl mount option specified
    [CIFS] ACL support part 8
    [CIFS] acl support part 7
    [CIFS] acl support part 6
    [CIFS] acl support part 6
    [CIFS] remove unused funtion compile warning when experimental off
    ...

    Linus Torvalds
     
  • The coredump code always calls set_dumpable(0) when it starts (even
    if RLIMIT_CORE prevents any core from being dumped). The effect of
    this (via task_dumpable) is to make /proc/pid/* files owned by root
    instead of the user, so the user can no longer examine his own
    process--in a case where there was never any privileged data to
    protect. This affects e.g. auxv, environ, fd; in Fedora (execshield)
    kernels, also maps. In practice, you can only notice this when a
    debugger has requested PTRACE_EVENT_EXIT tracing.

    set_dumpable was only used in do_coredump for synchronization and not
    intended for any security purpose. (It doesn't secure anything that wasn't
    already unsecured when a process dies by SIGTERM instead of SIGQUIT.)

    This changes do_coredump to check the core_waiters count as the means of
    synchronization, which is sufficient. Now we leave the "dumpable" bits alone.

    Signed-off-by: Roland McGrath
    Signed-off-by: Linus Torvalds

    Roland McGrath
     

10 Nov, 2007

3 commits

  • When a share is mounted using no username, cifs_mount sets
    volume_info.username as a NULL pointer, and the sesInfo userName as an
    empty string. The volume_info.username is passed to a couple of other
    functions to see if there is an existing unc or tcp connection that can
    be used. These functions assume that the username will be a valid
    string that can be passed to strncmp. If the pointer is NULL, then the
    kernel will oops if there's an existing session to which the string
    can be compared.

    This patch changes cifs_mount to set volume_info.username to an empty
    string in this situation, which prevents the oops and should make it
    so that the comparison to other null auth sessions match.

    Signed-off-by: Jeff Layton
    Signed-off-by: Steve French

    Jeff Layton
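
A minimal userspace sketch of the idea in the entry above: replace a NULL username with an empty string before any strncmp() comparison. The helper names and DEMO_MAX_USERNAME are invented for the illustration and do not correspond to the CIFS code.

```c
#include <assert.h>
#include <string.h>

#define DEMO_MAX_USERNAME 32   /* hypothetical length limit */

/* Map "no username given" (NULL) to an empty string. */
static const char *demo_normalise_username(const char *name)
{
    return name ? name : "";
}

/* Compare a session's user name with the requested one, NULL-safe. */
static int demo_same_user(const char *session_user, const char *requested)
{
    assert(session_user != NULL);   /* sessions always store a string */
    return strncmp(session_user,
                   demo_normalise_username(requested),
                   DEMO_MAX_USERNAME) == 0;
}
```

With the NULL mapped to "", a null-auth request matches an existing null-auth session (whose stored name is the empty string) and never reaches strncmp() with a NULL pointer.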
     
  • * 'for-linus' of git://git.kernel.dk/linux-2.6-block:
    Add UNPLUG traces to all appropriate places
    block: fix requeue handling in blk_queue_invalidate_tags()
    mmc: Fix sg helper copy-and-paste error
    pktcdvd: fix BUG caused by sysfs module reference semantics change
    ioprio: allow sys_ioprio_set() value of 0 to reset ioprio setting
    cfq_idle_class_timer: add paranoid checks for jiffies overflow
    cfq: fix IOPRIO_CLASS_IDLE delays
    cfq: fix IOPRIO_CLASS_IDLE accounting

    Linus Torvalds
     
  • * 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mfasheh/ocfs2:
    ocfs2: fix rename vs unlink race
    [PATCH] Fix possibly too long write in o2hb_setup_one_bio()
    ocfs2: fix write() performance regression
    ocfs2: Commit journal on sync writes
    ocfs2: Re-order iput in ocfs2_drop_dentry_lock
    ocfs2: Create locks at initially requested level
    [PATCH] Fix priority mistakes in fs/ocfs2/{alloc.c, dlmglue.c}
    [2.6 patch] make ocfs2_find_entry_el() static

    Linus Torvalds
     

09 Nov, 2007

4 commits

  • When mounted with cifsacl mount option, readdir can not
    instantiate the inode with the estimated mode based on the ACL
    for each file since we have not queried for the ACL for
    each of these files yet. So set the refresh time to zero
    for these inodes so that the next stat will cause the client
    to go to the server for the ACL info so we can build the estimated
    mode (this means we also will issue an extra QueryPathInfo if
    the stat happens within 1 second, but this is trivial compared to
    the time required to open/getacl/close for each).

    ls -l is slower when cifsacl mount option is specified, but
    displays correct mode information.

    Signed-off-by: Shirish Pargaonkar
    Signed-off-by: Steve French

    Steve French
     
  • Acked-by: Shirish Pargaonkar
    Signed-off-by: Steve French

    Steve French
     
  • When mounted with the cifsacl mount option, we were
    treating any deny ACEs found like allow ACEs, and it turns out that
    for SFU and SUA, Windows sets these types of access control entries often.
    The order of ACEs is important too. The canonical order in which most
    ACL tools and Windows Explorer construct ACLs is to begin with
    DENY entries and then follow with ALLOW; otherwise an allow entry
    could be encountered first, making the subsequent deny entry "dead
    code", which would be superfluous since Windows stops when a match is
    made for the operation you are trying to perform for your user.

    We start with no permissions in the mode and build up as we find
    permissions (i.e. allow ACEs). This fixes deny ACEs so they affect
    the mask used to set the subsequent allow ACEs.

    Acked-by: Shirish Pargaonkar
    CC: Alexander Bokovoy
    Signed-off-by: Steve French

    Steve French
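
The mask-based approach described in the entry above can be sketched roughly as below. This is a simplified model for a single principal with three rwx bits; struct demo_ace, the enum, and demo_mode_from_aces() are hypothetical and much simpler than real ACE parsing.

```c
#include <assert.h>
#include <stddef.h>

enum demo_ace_type { DEMO_ACE_ALLOW, DEMO_ACE_DENY };

/* Hypothetical, simplified ACE: a type plus rwx bits (octal 0-7). */
struct demo_ace {
    enum demo_ace_type type;
    unsigned perms;
};

/* Build mode bits from ACEs in order; deny entries shrink the mask. */
static unsigned demo_mode_from_aces(const struct demo_ace *aces, size_t n)
{
    unsigned mode = 0;    /* start with no permissions */
    unsigned mask = 07;   /* bits that later allow ACEs may still set */

    for (size_t i = 0; i < n; i++) {
        if (aces[i].type == DEMO_ACE_DENY)
            mask &= ~aces[i].perms;          /* block these bits */
        else
            mode |= aces[i].perms & mask;    /* grant what is unblocked */
    }
    return mode;
}
```

A deny-write ACE followed by an allow-rwx ACE yields r-x (05). With the order reversed, the allow entry is seen first and grants rwx (07), which mirrors the "first match wins" behaviour the commit message describes.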
     
  • Adds uid to key description for supporting user mounts
    and minor formatting changes

    Acked-by: Jeff Layton
    Signed-off-by: Igor Mammedov
    Signed-off-by: Steve French

    Igor Mammedov
     

07 Nov, 2007

11 commits


06 Nov, 2007

2 commits