14 Mar, 2011

2 commits

  • all remaining callers pass LOOKUP_PARENT to it, so
    flags argument can die; renamed to kern_path_parent()

    Signed-off-by: Al Viro

    Al Viro
     
  • Fix for a dumb preadv()/pwritev() compat bug - unlike the native
    variants, compat_... ones forget to check FMODE_P{READ,WRITE}, so e.g.
    on a pipe the native preadv() will fail with -ESPIPE and the compat one
    will act as readv() and succeed. Not critical, but it's a clear bug with
    a trivial fix.

    Signed-off-by: Al Viro

    Al Viro
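
The native behaviour described above can be observed from user space. The following is a minimal sketch, assuming a Linux system with glibc; the helper names `preadv_on_pipe_errno()` and `readv_on_pipe()` are made up for illustration and are not part of the patch. It only exercises the native syscalls, not the compat path.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* exposes preadv() on glibc */
#endif
#include <errno.h>
#include <sys/uio.h>
#include <unistd.h>

/*
 * Hypothetical helper: put a few bytes into a fresh pipe, then attempt a
 * positional vectored read on it. Returns the errno of the failed preadv(),
 * 0 if preadv() unexpectedly succeeded, or -1 if the setup failed.
 */
int preadv_on_pipe_errno(void)
{
    int fds[2];
    char buf[8];
    struct iovec iov = { .iov_base = buf, .iov_len = sizeof(buf) };
    int err = 0;

    if (pipe(fds) != 0)
        return -1;
    if (write(fds[1], "data", 4) != 4) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    errno = 0;
    if (preadv(fds[0], &iov, 1, 0) < 0)
        err = errno; /* a pipe has no file position to seek to */
    close(fds[0]);
    close(fds[1]);
    return err;
}

/*
 * Hypothetical helper: the same setup, but with plain readv(), which does
 * not need a file position. Returns the byte count read, or -1 on failure.
 */
long readv_on_pipe(void)
{
    int fds[2];
    char buf[8];
    struct iovec iov = { .iov_base = buf, .iov_len = sizeof(buf) };
    long n;

    if (pipe(fds) != 0)
        return -1;
    if (write(fds[1], "data", 4) != 4) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    n = (long)readv(fds[0], &iov, 1);
    close(fds[0]);
    close(fds[1]);
    return n;
}
```

On a kernel with the native check in place, the first helper is expected to report ESPIPE while the second reads the four bytes; the bug was that the compat entry points behaved like the second helper in both cases.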
     

10 Mar, 2011

12 commits


09 Mar, 2011

2 commits


08 Mar, 2011

2 commits

  • a) struct inode is not going to be freed under ->d_compare();
    however, the thing PROC_I(inode)->sysctl points to just might.
    Fortunately, it's enough to make freeing that sucker delayed,
    provided that we don't step on its ->unregistering, clear
    the pointer to it in PROC_I(inode) before dropping the reference
    and check if it's NULL in ->d_compare().

    b) I'm not sure that we *can* walk into NULL inode here (we recheck
    dentry->seq between verifying that it's still hashed / fetching
    dentry->d_inode and passing it to ->d_compare() and there are no
    negative hashed dentries in /proc/sys/*), but if we can walk into
    that, we really should not have ->d_compare() return 0 on it!
    That said, I really suspect that this check can be simply killed.
    Nick?

    Signed-off-by: Al Viro

    Al Viro
     
  • In case of a nonempty list, the return on error here is obviously bogus;
    it ends up being a pointer to the list head instead of to any valid
    delegation on the list.

    In particular, if nfsd4_delegreturn() hits this case, and you're quite unlucky,
    then renew_client may oops, and it may take an embarrassingly long time to
    figure out why. Facepalm.

    BUG: unable to handle kernel NULL pointer dereference at 0000000000000090
    IP: [] nfsd4_delegreturn+0x125/0x200
    ...

    Cc: stable@kernel.org
    Signed-off-by: J. Bruce Fields

    J. Bruce Fields
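
The failure mode described above is generic to circular lists with an embedded head: when the loop runs off the end without a match, the cursor sits on the head, and converting it back to an entry yields a pointer that is not a valid element. A user-space sketch, assuming a simplified list and a made-up `struct deleg` (none of these names are the nfsd ones), with the lookup returning NULL explicitly on a miss:

```c
#include <stddef.h>

/* Minimal circular doubly linked list, in the style of kernel lists. */
struct list_node {
    struct list_node *next, *prev;
};

/* Illustrative stand-in for a delegation; not the nfsd structure. */
struct deleg {
    int id;
    struct list_node link;
};

/* Convert an embedded list node back to its containing struct deleg. */
#define node_to_deleg(p) \
    ((struct deleg *)((char *)(p) - offsetof(struct deleg, link)))

void list_init(struct list_node *head)
{
    head->next = head->prev = head;
}

void list_add_tail(struct list_node *n, struct list_node *head)
{
    n->prev = head->prev;
    n->next = head;
    head->prev->next = n;
    head->prev = n;
}

/* Return the entry with the given id, or NULL if none matches. */
struct deleg *find_deleg(struct list_node *head, int id)
{
    struct list_node *pos;

    for (pos = head->next; pos != head; pos = pos->next) {
        struct deleg *dp = node_to_deleg(pos);

        if (dp->id == id)
            return dp;
    }
    /*
     * pos == head here. node_to_deleg(pos) would point into whatever
     * surrounds the list head, so it must not be returned.
     */
    return NULL;
}
```

Returning the loop cursor unconditionally only looks correct on an empty list or when a match is found, which is why a bug of this kind can go unnoticed for a long time.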
     

06 Mar, 2011

1 commit

  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client:
    ceph: no .snap inside of snapped namespace
    libceph: fix msgr standby handling
    libceph: fix msgr keepalive flag
    libceph: fix msgr backoff
    libceph: retry after authorization failure
    libceph: fix handling of short returns from get_user_pages
    ceph: do not clear I_COMPLETE from d_release
    ceph: do not set I_COMPLETE
    Revert "ceph: keep reference to parent inode on ceph_dentry"

    Linus Torvalds
     

05 Mar, 2011

3 commits

  • The "bad_page()" page allocator sanity check was reported recently (call
    chain as follows):

    bad_page+0x69/0x91
    free_hot_cold_page+0x81/0x144
    skb_release_data+0x5f/0x98
    __kfree_skb+0x11/0x1a
    tcp_ack+0x6a3/0x1868
    tcp_rcv_established+0x7a6/0x8b9
    tcp_v4_do_rcv+0x2a/0x2fa
    tcp_v4_rcv+0x9a2/0x9f6
    do_timer+0x2df/0x52c
    ip_local_deliver+0x19d/0x263
    ip_rcv+0x539/0x57c
    netif_receive_skb+0x470/0x49f
    :virtio_net:virtnet_poll+0x46b/0x5c5
    net_rx_action+0xac/0x1b3
    __do_softirq+0x89/0x133
    call_softirq+0x1c/0x28
    do_softirq+0x2c/0x7d
    do_IRQ+0xec/0xf5
    default_idle+0x0/0x50
    ret_from_intr+0x0/0xa
    default_idle+0x29/0x50
    cpu_idle+0x95/0xb8
    start_kernel+0x220/0x225
    _sinittext+0x22f/0x236

    It occurs because an skb with a fraglist was freed from the tcp
    retransmit queue when it was acked, but a page on that fraglist had
    PG_slab set (indicating it was allocated from the slab allocator),
    which means the free path above can't safely free it via put_page.

    We tracked this back to an nfsv4 setacl operation, in which the nfs code
    attempted to convert the passed-in buffer to an array of pages in
    __nfs4_proc_set_acl, which gets used by the skb->frags list in
    xs_sendpages. __nfs4_proc_set_acl just converts each page in the buffer
    to a page struct via virt_to_page, but the vfs allocates the buffer via
    kmalloc, meaning the PG_slab bit is set. We can't create a buffer with
    kmalloc and free it later in the tcp ack path with put_page, so we need
    to either:

    1) ensure that when we create the list of pages, no page struct has
    PG_slab set

    or

    2) not use a page list to send this data

    Given that these buffers can be multiple pages and arbitrarily sized, I
    think (1) is the right way to go. I've written the below patch to
    allocate a page from the buddy allocator directly and copy the data over
    to it. This ensures that we have a put_page free-able page for every
    entry that winds up on an skb frag list, so it can be safely freed when
    the frame is acked. We do a put_page on each entry after the
    rpc_call_sync call so as to drop our own reference count to the page,
    leaving only the ref count taken by tcp_sendpages. This way the data
    will be properly freed when the ack comes in.

    Successfully tested by myself to solve the above oops.

    Note, as this is the result of a setacl operation that exceeded a page
    of data, I think this amounts to a local DoS triggerable by an
    unprivileged user, so I'm CCing security on this as well.

    Signed-off-by: Neil Horman
    CC: Trond Myklebust
    CC: security@kernel.org
    CC: Jeff Layton
    Signed-off-by: Linus Torvalds

    Neil Horman
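
Option (1) above amounts to copying an arbitrarily sized buffer into separately allocated, page-sized chunks so that every chunk can be released on its own. The following is only a user-space analogy, assuming a fixed 4096-byte page and using `calloc()`/`free()` in place of the buddy allocator and `put_page`; the names `buf_to_pages()` and `put_pages_sketch()` are hypothetical and do not appear in the patch.

```c
#include <stdlib.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 4096u /* assumed page size for this sketch */

/*
 * Copy len bytes from buf into freshly allocated page-sized chunks,
 * storing the chunk pointers in pages[]. Returns the number of chunks
 * used, or -1 if max_pages is too small or an allocation fails (in
 * which case everything allocated so far is freed again).
 */
int buf_to_pages(const void *buf, size_t len, void **pages, size_t max_pages)
{
    const char *src = buf;
    size_t n = 0;

    while (len > 0) {
        size_t chunk = len < SKETCH_PAGE_SIZE ? len : SKETCH_PAGE_SIZE;

        if (n == max_pages)
            goto fail;
        pages[n] = calloc(1, SKETCH_PAGE_SIZE);
        if (pages[n] == NULL)
            goto fail;
        memcpy(pages[n], src, chunk);
        src += chunk;
        len -= chunk;
        n++;
    }
    return (int)n;

fail:
    while (n > 0)
        free(pages[--n]);
    return -1;
}

/* Release n chunks individually, standing in for per-page put_page. */
void put_pages_sketch(void **pages, size_t n)
{
    while (n > 0)
        free(pages[--n]);
}
```

The point of the copy is ownership: each chunk comes from an allocator whose free routine matches the one the later consumer will call, which is what the original `virt_to_page` conversion of a kmalloc buffer could not guarantee.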
     
  • Otherwise you can do things like

    # mkdir .snap/foo
    # cd .snap/foo/.snap
    # ls

    Signed-off-by: Sage Weil

    Sage Weil
     
  • failure exits on the no-O_CREAT side of do_filp_open() merge with
    those of O_CREAT one; unfortunately, if do_path_lookup() returns
    -ESTALE, we'll get to out_filp:, notice that we are about to return
    -ESTALE without having tried the sucker with LOOKUP_REVAL
    and jump right into the O_CREAT side of code. And proceed to try
    and create a file. Usually that'll fail with -ESTALE again, but
    we can race and get that attempt of pathname resolution to succeed.

    open() without O_CREAT really shouldn't end up creating files, races
    or not. The real fix is to rearchitect the whole do_filp_open(),
    but for now splitting the failure exits will do.

    Signed-off-by: Al Viro

    Al Viro
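
The shape of the fix, separate failure exits so that an -ESTALE retry on the non-creating path can never reach the creating code, can be modelled in user space. This is a deliberately simplified sketch and not the do_filp_open() code: `sk_open()`, `struct sk_ops`, `SK_REVAL` and the fake callbacks are all invented for illustration, and the creating path here omits its own revalidation retry.

```c
#include <errno.h>

#define SK_REVAL 0x1 /* hypothetical stand-in for LOOKUP_REVAL */

/* Hypothetical callbacks modelling pathname lookup and file creation. */
struct sk_ops {
    int (*lookup)(int flags); /* returns 0 or a negative errno */
    int (*create)(void);      /* returns 0 or a negative errno */
};

int sk_creates; /* counts how often the create step ran */

/* Fake lookup that always reports a stale handle. */
int sk_stale_lookup(int flags)
{
    (void)flags;
    return -ESTALE;
}

/* Fake lookup that always reports a missing file. */
int sk_missing_lookup(int flags)
{
    (void)flags;
    return -ENOENT;
}

/* Fake create that records the call and succeeds. */
int sk_counting_create(void)
{
    sk_creates++;
    return 0;
}

/*
 * Model of an open with split failure exits: the non-creating path
 * retries the lookup with SK_REVAL on -ESTALE and then returns on its
 * own, without ever falling through to the creating path.
 */
int sk_open(const struct sk_ops *ops, int want_create)
{
    int err;

    if (want_create) {
        err = ops->lookup(0);
        if (err == -ENOENT)
            err = ops->create();
        return err;
    }

    err = ops->lookup(0);
    if (err == -ESTALE)
        err = ops->lookup(SK_REVAL); /* retry, but never create */
    return err;
}
```

With the exits merged, the retry would have been a jump into the `want_create` branch; keeping them apart makes "open without O_CREAT never creates" hold by construction rather than by luck.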
     

04 Mar, 2011

6 commits


03 Mar, 2011

10 commits


02 Mar, 2011

2 commits

  • vfs_rename_other() does not lock renamed inode with i_mutex. Thus changing
    i_nlink in a non-atomic manner (which happens in ext2_rename()) can corrupt
    it as reported and analyzed by Josh.

    In fact, there is no good reason to mess with i_nlink of the moved file.
    We did it presumably to simulate linking into the new directory and unlinking
    from the old one. But the practical benefit is disputable: fsck may treat the
    file as properly linked into both directories without reporting any error,
    which is confusing. So we just stop the increment-decrement games with
    i_nlink, which also fixes the corruption.

    CC: stable@kernel.org
    CC: Al Viro
    Signed-off-by: Josh Hunt
    Signed-off-by: Jan Kara

    Josh Hunt
     
  • Commit 493f3358cb289ccf716c5a14fa5bb52ab75943e5 added this call to
    xfs_fs_geometry() in order to avoid passing kernel stack data back
    to user space:

    + memset(geo, 0, sizeof(*geo));

    Unfortunately, one of the callers of that function passes the
    address of a smaller data type, cast to fit the type that
    xfs_fs_geometry() requires. As a result, this can happen:

    Kernel panic - not syncing: stack-protector: Kernel stack is corrupted
    in: f87aca93

    Pid: 262, comm: xfs_fsr Not tainted 2.6.38-rc6-493f3358cb2+ #1
    Call Trace:

    [] ? panic+0x50/0x150
    [] ? __stack_chk_fail+0x10/0x18
    [] ? xfs_ioc_fsgeometry_v1+0x56/0x5d [xfs]

    Fix this by fixing that one caller to pass the right type and then
    copy out the subset it is interested in.

    Note: This patch is an alternative to one originally proposed by
    Eric Sandeen.

    Reported-by: Jeffrey Hundstad
    Signed-off-by: Alex Elder
    Reviewed-by: Eric Sandeen
    Tested-by: Jeffrey Hundstad

    Alex Elder
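
The pattern behind this fix, filling a full-sized local object and copying out only the prefix the caller wants, can be sketched in user space. The structures and function names below are invented for illustration and assume that the smaller layout is a prefix of the larger one; they are not the XFS types.

```c
#include <string.h>

/* Hypothetical older, smaller layout. */
struct geom_v1 {
    unsigned blocksize;
    unsigned agcount;
};

/* Hypothetical current layout; its first fields match struct geom_v1. */
struct geom {
    unsigned blocksize;
    unsigned agcount;
    unsigned logsunit;
    unsigned version;
};

/*
 * Fill in the full structure. The memset covers sizeof(*geo), so the
 * caller must really pass an object of that size.
 */
void fill_geom(struct geom *geo)
{
    memset(geo, 0, sizeof(*geo));
    geo->blocksize = 4096;
    geo->agcount = 4;
    geo->logsunit = 1;
    geo->version = 3;
}

/*
 * Safe v1 accessor: let fill_geom() write into a correctly sized local,
 * then copy out only the v1-sized prefix. Casting `out` to struct geom *
 * and passing it directly would let the memset run past the end of *out.
 */
void get_geom_v1(struct geom_v1 *out)
{
    struct geom full;

    fill_geom(&full);
    memcpy(out, &full, sizeof(*out));
}
```

The extra copy is cheap, and it keeps the size assumption in one place: only `get_geom_v1()` needs to know that the old layout is a prefix of the new one.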