16 Mar, 2011

5 commits

  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: (57 commits)
    tidy the trailing symlinks traversal up
    Turn resolution of trailing symlinks iterative everywhere
    simplify link_path_walk() tail
    Make trailing symlink resolution in path_lookupat() iterative
    update nd->inode in __do_follow_link() instead of after do_follow_link()
    pull handling of one pathname component into a helper
    fs: allow AT_EMPTY_PATH in linkat(), limit that to CAP_DAC_READ_SEARCH
    Allow passing O_PATH descriptors via SCM_RIGHTS datagrams
    readlinkat(), fchownat() and fstatat() with empty relative pathnames
    Allow O_PATH for symlinks
    New kind of open files - "location only".
    ext4: Copy fs UUID to superblock
    ext3: Copy fs UUID to superblock.
    vfs: Export file system uuid via /proc/<pid>/mountinfo
    unistd.h: Add new syscalls numbers to asm-generic
    x86: Add new syscalls for x86_64
    x86: Add new syscalls for x86_32
    fs: Remove i_nlink check from file system link callback
    fs: Don't allow to create hardlink for deleted file
    vfs: Add open by file handle support
    ...

    Linus Torvalds
     
  • * 'for-linus' of git://xenbits.xen.org/people/sstabellini/linux-pvhvm:
    xen: suspend: remove xen_hvm_suspend
    xen: suspend: pull pre/post suspend hooks out into suspend_info
    xen: suspend: move arch specific pre/post suspend hooks into generic hooks
    xen: suspend: refactor non-arch specific pre/post suspend hooks
    xen: suspend: add "arch" to pre/post suspend hooks
    xen: suspend: pass extra hypercall argument via suspend_info struct
    xen: suspend: refactor cancellation flag into a structure
    xen: suspend: use HYPERVISOR_suspend for PVHVM case instead of open coding
    xen: switch to new schedop hypercall by default.
    xen: use new schedop interface for suspend
    xen: do not respond to unknown xenstore control requests
    xen: fix compile issue if XEN is enabled but XEN_PVHVM is disabled
    xen: PV on HVM: support PV spinlocks and IPIs
    xen: make the ballon driver work for hvm domains
    xen-blkfront: handle Xen major numbers other than XENVBD
    xen: do not use xen_info on HVM, set pv_info name to "Xen HVM"
    xen: no need to delay xen_setup_shutdown_event for hvm guests anymore

    Linus Torvalds
     
  • …git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen

    * 'stable/ia64' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen:
    xen: ia64 build broken due to "xen: switch to new schedop hypercall by default."

    * 'stable/blkfront-cleanup' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen:
    xen: Union the blkif_request request specific fields

    * 'stable/cleanup' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen:
    xen: annotate functions which only call into __init at start of day
    xen p2m: annotate variable which appears unused
    xen: events: mark cpu_evtchn_mask_p as __refdata

    Linus Torvalds
     
  • * 'stable/irq.cleanup' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen:
    xen: events: remove dom0 specific xen_create_msi_irq
    xen: events: use xen_bind_pirq_msi_to_irq from xen_create_msi_irq
    xen: events: push set_irq_msi down into xen_create_msi_irq
    xen: events: update pirq_to_irq in xen_create_msi_irq
    xen: events: refactor xen_create_msi_irq slightly
    xen: events: separate MSI PIRQ allocation from PIRQ binding to IRQ
    xen: events: assume PHYSDEVOP_get_free_pirq exists
    xen: pci: collapse apic_register_gsi_xen_hvm and xen_hvm_register_pirq
    xen: events: return irq from xen_allocate_pirq_msi
    xen: events: drop XEN_ALLOC_IRQ flag to xen_allocate_pirq_msi
    xen: events: do not leak IRQ from xen_allocate_pirq_msi when no pirq available.
    xen: pci: only define xen_initdom_setup_msi_irqs if CONFIG_XEN_DOM0

    Linus Torvalds
     
  • …el.org/pub/scm/linux/kernel/git/konrad/xen

    * 'stable/irq.rework' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen:
    xen/irq: Cleanup up the pirq_to_irq for DomU PV PCI passthrough guests as well.
    xen: Use IRQF_FORCE_RESUME
    xen/timer: Missing IRQF_NO_SUSPEND in timer code broke suspend.
    xen: Fix compile error introduced by "switch to new irq_chip functions"
    xen: Switch to new irq_chip functions
    xen: Remove stale irq_chip.end
    xen: events: do not free legacy IRQs
    xen: events: allocate GSIs and dynamic IRQs from separate IRQ ranges.
    xen: events: add xen_allocate_irq_{dynamic, gsi} and xen_free_irq
    xen:events: move find_unbound_irq inside CONFIG_PCI_MSI
    xen: handled remapped IRQs when enabling a pcifront PCI device.
    genirq: Add IRQF_FORCE_RESUME

    * 'stable/pcifront-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen:
    pci/xen: When free-ing MSI-X/MSI irq->desc also use generic code.
    pci/xen: Cleanup: convert int** to int[]
    pci/xen: Use xen_allocate_pirq_msi instead of xen_allocate_pirq
    xen-pcifront: Sanity check the MSI/MSI-X values
    xen-pcifront: don't use flush_scheduled_work()

    Linus Torvalds
     

15 Mar, 2011

7 commits

  • New flag for open(2) - O_PATH. Semantics:
    * pathname is resolved, but the file itself is _NOT_ opened
    as far as filesystem is concerned.
    * almost all operations on the resulting descriptors shall
    fail with -EBADF. Exceptions are:
    1) operations on descriptors themselves (i.e.
    close(), dup(), dup2(), dup3(), fcntl(fd, F_DUPFD),
    fcntl(fd, F_DUPFD_CLOEXEC, ...), fcntl(fd, F_GETFD),
    fcntl(fd, F_SETFD, ...))
    2) fcntl(fd, F_GETFL), for a common non-destructive way to
    check if descriptor is open
    3) "dfd" arguments of ...at(2) syscalls, i.e. the starting
    points of pathname resolution
    * closing such descriptor does *NOT* affect dnotify or
    posix locks.
    * permissions are checked as usual along the way to file;
    no permission checks are applied to the file itself. Of course,
    giving such thing to syscall will result in permission checks (at
    the moment it means checking that starting point of ....at() is
    a directory and caller has exec permissions on it).

    fget() and fget_light() return NULL on such descriptors; use of
    fget_raw() and fget_raw_light() is needed to get them. That protects
    existing code from dealing with those things.

    There are two things still missing (they come in the next commits):
    one is handling of symlinks (right now we refuse to open them that
    way; see the next commit for semantics related to those) and another
    is descriptor passing via SCM_RIGHTS datagrams.

    Signed-off-by: Al Viro

    Al Viro
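The O_PATH semantics listed above can be probed from userspace. The following is a minimal sketch, assuming a Linux kernel and C library that expose O_PATH under _GNU_SOURCE; the helper names are hypothetical and are not part of the commit.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Returns 1 if read(2) on an O_PATH descriptor fails with EBADF,
 * 0 if it behaves otherwise, -1 if the descriptor could not be opened. */
int opath_read_is_ebadf(const char *path)
{
    char byte;
    int fd, result;

    assert(path != NULL);
    fd = open(path, O_PATH);
    if (fd < 0)
        return -1;
    result = (read(fd, &byte, 1) == -1 && errno == EBADF);
    close(fd);
    return result;
}

/* Returns 1 if fcntl(fd, F_GETFL) still succeeds on an O_PATH
 * descriptor (exception 2 in the list above), -1 if open failed. */
int opath_getfl_works(const char *path)
{
    int fd, result;

    assert(path != NULL);
    fd = open(path, O_PATH);
    if (fd < 0)
        return -1;
    result = (fcntl(fd, F_GETFL) != -1);
    close(fd);
    return result;
}

/* Uses an O_PATH descriptor for `dir` as the "dfd" starting point of
 * fstatat(2) (exception 3 in the list above). Returns fstatat's result. */
int opath_stat_relative(const char *dir, const char *name, struct stat *st)
{
    int dfd, rc;

    assert(dir != NULL && name != NULL && st != NULL);
    dfd = open(dir, O_PATH | O_DIRECTORY);
    if (dfd < 0)
        return -1;
    rc = fstatat(dfd, name, st, 0);
    close(dfd);
    return rc;
}
```

For example, opath_read_is_ebadf("/") is expected to report EBADF on a kernel with this commit, while opath_stat_relative("/", ".", &st) should still resolve relative to the descriptor.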
     
  • Add a per-superblock uuid field. File systems should
    update the uuid in the fill_super callback.

    Signed-off-by: Aneesh Kumar K.V
    Signed-off-by: Al Viro

    Aneesh Kumar K.V
     
  • Signed-off-by: Aneesh Kumar K.V
    Signed-off-by: Al Viro

    Aneesh Kumar K.V
     
  • [AV: duplicate of open() guts removed; file_open_root() used instead]

    Signed-off-by: Aneesh Kumar K.V
    Signed-off-by: Al Viro

    Aneesh Kumar K.V
     
  • The syscall also returns the mount id, which can be used
    to look up file system specific information such as the uuid
    in /proc/<pid>/mountinfo

    Signed-off-by: Aneesh Kumar K.V
    Signed-off-by: Al Viro

    Aneesh Kumar K.V
     
  • For name_to_handle_at(2) we'll want an ...at()-style syscall that
    is also usable for non-directory descriptors (with an empty relative
    pathname). Introduce a new flag (AT_EMPTY_PATH) to deal with that, and
    the corresponding LOOKUP_EMPTY; teach user_path_at() and path_init() to
    deal with the latter.

    Signed-off-by: Al Viro

    Al Viro
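As a userspace illustration of the empty relative pathname, here is a minimal sketch, assuming a C library that exposes AT_EMPTY_PATH under _GNU_SOURCE and a kernel where fstatat() honours it. The helper name is hypothetical.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Stats the object that `fd` itself refers to: the relative pathname is
 * empty, and AT_EMPTY_PATH tells the kernel to operate on the descriptor.
 * `fd` does not have to refer to a directory. */
int stat_by_descriptor(int fd, struct stat *st)
{
    assert(st != NULL);
    return fstatat(fd, "", st, AT_EMPTY_PATH);
}
```

A caller would open any file, regular or directory, and pass the descriptor in; no separate pathname is needed for the second step.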
     
  • * 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6:
    NFS: NFSROOT should default to "proto=udp"
    nfs4: remove duplicated #include
    NFSv4: nfs4_state_mark_reclaim_nograce() should be static
    NFSv4: Fix the setlk error handler
    NFSv4.1: Fix the handling of the SEQUENCE status bits
    NFSv4/4.1: Fix nfs4_schedule_state_recovery abuses
    NFSv4.1 reclaim complete must wait for completion
    NFSv4: remove duplicate clientid in struct nfs_client
    NFSv4.1: Retry CREATE_SESSION on NFS4ERR_DELAY
    sunrpc: Propagate errors from xs_bind() through xs_create_sock()
    (try3-resend) Fix nfs_compat_user_ino64 so it doesn't cause problems if bit 31 or 63 are set in fileid
    nfs: fix compilation warning
    nfs: add kmalloc return value check in decode_and_add_ds
    SUNRPC: Remove resource leak in svc_rdma_send_error()
    nfs: close NFSv4 COMMIT vs. CLOSE race
    SUNRPC: Close a race in __rpc_wait_for_completion_task()

    Linus Torvalds
     

14 Mar, 2011

9 commits

  • The exportfs encode handle function should return the minimum required
    handle size. This lets the user find out the handle size by passing a
    handle size of 0 in the first call and then repeating the call with
    the returned handle size value.

    Acked-by: Serge Hallyn
    Signed-off-by: Aneesh Kumar K.V
    Signed-off-by: Al Viro

    Aneesh Kumar K.V
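The two-step size query described above can be sketched from userspace as follows. This assumes the glibc wrapper for name_to_handle_at(2) and its documented EOVERFLOW behaviour for an undersized buffer; the helper name get_file_handle is hypothetical, and filesystems that cannot export handles will simply make it return NULL.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>

/* Step 1: call with handle_bytes == 0 so the kernel reports the size it
 * needs. Step 2: allocate that much and repeat the call.
 * Returns a malloc()ed handle (caller frees) or NULL with errno set. */
struct file_handle *get_file_handle(const char *path, int *mount_id)
{
    struct file_handle probe;
    struct file_handle *fh;

    assert(path != NULL && mount_id != NULL);

    probe.handle_bytes = 0;
    if (name_to_handle_at(AT_FDCWD, path, &probe, mount_id, 0) != -1 ||
        errno != EOVERFLOW)
        return NULL;            /* unsupported, or an unexpected result */

    fh = malloc(sizeof(*fh) + probe.handle_bytes);
    if (fh == NULL)
        return NULL;
    fh->handle_bytes = probe.handle_bytes;

    if (name_to_handle_at(AT_FDCWD, path, fh, mount_id, 0) == -1) {
        int saved = errno;      /* keep the syscall's errno across free() */
        free(fh);
        errno = saved;
        return NULL;
    }
    return fh;
}
```

The mount id written through mount_id is the value that can be matched against the mountinfo file to find filesystem-specific data such as the uuid.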
     
  • New helpers: user_statfs() and fd_statfs(), taking userland pathname and
    descriptor resp. and filling struct kstatfs. Syscalls of statfs family
    (native, compat and foreign - osf and hpux on alpha and parisc resp.)
    switched to those. Removes some boilerplate code, simplifies cleanup
    on errors...

    Signed-off-by: Al Viro

    Al Viro
     
  • New function: file_open_root(dentry, mnt, name, flags) opens the file
    vfs_path_lookup would arrive at.

    Note that name can be empty; in that case the usual requirement that
    dentry should be a directory is lifted.

    Open-coded equivalents are switched to it; may_open() got down to exactly
    one caller and became static.

    Signed-off-by: Al Viro

    Al Viro
     
  • New lookup flag: LOOKUP_ROOT. nd->root is set (and held) by caller,
    path_init() starts walking from that place and all pathname resolution
    machinery never drops nd->root if that flag is set. That turns
    vfs_path_lookup() into a special case of do_path_lookup() *and*
    gets us down to 3 callers of link_path_walk(), making it finally
    feasible to rip the handling of trailing symlink out of link_path_walk().
    That will not only simplify the living hell out of it, but make life
    much simpler for unionfs merge. Trailing symlink handling will
    become iterative, which is a good thing for stack footprint in
    a lot of situations as well.

    Signed-off-by: Al Viro

    Al Viro
     
  • Don't stash the struct file * used as starting point of walk in nameidata;
    pass file ** to path_init() instead.

    Signed-off-by: Al Viro

    Al Viro
     
  • take calculation of open_flags by open(2) arguments into new helper
    in fs/open.c, move filp_open() over there, have it and do_sys_open()
    use that helper, switch exec.c callers of do_filp_open() to explicit
    (and constant) struct open_flags.

    Signed-off-by: Al Viro

    Al Viro
     
  • instead of ad-hackery around need_reval_dot(), do the following:
    set a flag (LOOKUP_JUMPED) in the beginning of path, on absolute
    symlink traversal, on ".." and on procfs-style symlinks. Clear on
    normal components, leave unchanged on ".". Non-nested callers of
    link_path_walk() call handle_reval_path(), which checks that flag
    is set and that fs does want the final revalidate thing, then does
    ->d_revalidate(). In link_path_walk() all the return_reval stuff
    is gone.

    Signed-off-by: Al Viro

    Al Viro
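The flag rules in this message can be modelled in a few lines. This is a toy userspace model under stated assumptions, not the kernel code: the enum, the function names and the idea of feeding in a pre-classified list of steps are all hypothetical, and only the set/clear/leave rules come from the text above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical classification of each step of a path walk. */
enum step_kind {
    STEP_NORMAL,        /* ordinary component: clears the flag */
    STEP_DOT,           /* ".": leaves the flag unchanged */
    STEP_DOTDOT,        /* "..": sets the flag */
    STEP_ABS_SYMLINK,   /* absolute symlink traversal: sets the flag */
    STEP_PROC_SYMLINK   /* procfs-style symlink: sets the flag */
};

/* Returns the state of the modelled LOOKUP_JUMPED flag after the walk. */
bool jumped_after_walk(const enum step_kind *steps, size_t n)
{
    bool jumped = true;         /* set at the beginning of the path */
    size_t i;

    assert(steps != NULL || n == 0);
    for (i = 0; i < n; i++) {
        switch (steps[i]) {
        case STEP_NORMAL:
            jumped = false;
            break;
        case STEP_DOT:
            break;
        case STEP_DOTDOT:
        case STEP_ABS_SYMLINK:
        case STEP_PROC_SYMLINK:
            jumped = true;
            break;
        }
    }
    return jumped;
}

/* Models the check described for handle_reval_path(): revalidate only if
 * the flag is set and the filesystem wants the final revalidate. */
bool needs_final_revalidate(bool jumped, bool fs_wants_reval)
{
    return jumped && fs_wants_reval;
}
```

In this model a walk ending in a normal component never triggers the final revalidate, while one ending in ".." or in a jump through a symlink does, provided the filesystem asks for it.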
     
  • all remaining callers pass LOOKUP_PARENT to it, so
    flags argument can die; renamed to kern_path_parent()

    Signed-off-by: Al Viro

    Al Viro
     
  • * git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-rc-fixes-2.6:
    [SCSI] target: Fix t_transport_aborted handling in LUN_RESET + active I/O shutdown

    Linus Torvalds
     

12 Mar, 2011

1 commit

  • nfs4_schedule_state_recovery() should only be used when we need to force
    the state manager to check the lease. If we just want to start the
    state manager in order to handle a state recovery situation, we should be
    using nfs4_schedule_state_manager().

    This patch fixes the abuses of nfs4_schedule_state_recovery() by replacing
    its use with a set of helper functions that do the right thing.

    Signed-off-by: Trond Myklebust

    Trond Myklebust
     

11 Mar, 2011

8 commits

  • Signed-off-by: Andy Adamson
    Signed-off-by: Trond Myklebust

    Andy Adamson
     
  • Although they run as rpciod background tasks, under normal operation
    (i.e. no SIGKILL), functions like nfs_sillyrename(), nfs4_proc_unlck()
    and nfs4_do_close() want to be fully synchronous. This means that when we
    exit, we want all references to the rpc_task to be gone, and we want
    any dentry references etc. held by that task to be released.

    For this reason these functions call __rpc_wait_for_completion_task(),
    followed by rpc_put_task() in the expectation that the latter will be
    releasing the last reference to the rpc_task, and thus ensuring that the
    callback_ops->rpc_release() has been called synchronously.

    This patch fixes a race which exists due to the fact that
    rpciod calls rpc_complete_task() (in order to wake up the callers of
    __rpc_wait_for_completion_task()) and then subsequently calls
    rpc_put_task() without ensuring that these two steps are done atomically.

    In order to avoid adding new spin locks, the patch uses the existing
    waitqueue spin lock to order the rpc_task reference count releases between
    the waiting process and rpciod.
    The common case where nobody is waiting for completion is optimised for by
    checking if the RPC_TASK_ASYNC flag is cleared and/or if the rpc_task
    reference count is 1: in those cases we skip grabbing the spin lock
    and immediately free up the rpc_task.

    Those few processes that need to put the rpc_task from inside an
    asynchronous context and that do not care about ordering are given a new
    helper: rpc_put_task_async().

    Signed-off-by: Trond Myklebust

    Trond Myklebust
     
  • The function name does not distinguish it from xen_allocate_pirq_msi
    (which operates on domU and pvhvm domains rather than dom0).

    Hoist domain 0 specific functionality up into the only caller leaving
    functionality common to all guest types in xen_bind_pirq_msi_to_irq.

    Signed-off-by: Ian Campbell
    Signed-off-by: Konrad Rzeszutek Wilk

    Ian Campbell
     
  • Signed-off-by: Ian Campbell
    Signed-off-by: Konrad Rzeszutek Wilk

    Ian Campbell
     
  • Split the binding aspect of xen_allocate_pirq_msi out into a new
    xen_bind_pirq_to_irq function.

    In xen_hvm_setup_msi_irq when allocating a pirq write the MSI message
    to signal the PIRQ as soon as the pirq is obtained. There is no way to
    free the pirq back so if the subsequent binding to an IRQ fails we
    want to ensure that we will reuse the PIRQ next time rather than leak
    it.

    Signed-off-by: Ian Campbell
    Signed-off-by: Konrad Rzeszutek Wilk

    Ian Campbell
     
  • consistent with other similar functions.

    Signed-off-by: Ian Campbell
    Signed-off-by: Konrad Rzeszutek Wilk

    Ian Campbell
     
  • All callers pass this flag so it is pointless.

    Signed-off-by: Ian Campbell
    Signed-off-by: Konrad Rzeszutek Wilk

    Ian Campbell
     
  • * stable/irq.rework:
    xen/irq: Cleanup up the pirq_to_irq for DomU PV PCI passthrough guests as well.
    xen: Use IRQF_FORCE_RESUME
    xen/timer: Missing IRQF_NO_SUSPEND in timer code broke suspend.
    xen: Fix compile error introduced by "switch to new irq_chip functions"
    xen: Switch to new irq_chip functions
    xen: Remove stale irq_chip.end
    xen: events: do not free legacy IRQs
    xen: events: allocate GSIs and dynamic IRQs from separate IRQ ranges.
    xen: events: add xen_allocate_irq_{dynamic, gsi} and xen_free_irq
    xen:events: move find_unbound_irq inside CONFIG_PCI_MSI
    xen: handled remapped IRQs when enabling a pcifront PCI device.
    genirq: Add IRQF_FORCE_RESUME

    Konrad Rzeszutek Wilk
     

10 Mar, 2011

4 commits

  • …s/security-testing-2.6

    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/security-testing-2.6:
    net: don't allow CAP_NET_ADMIN to load non-netdev kernel modules

    Linus Torvalds
     
  • Fixes this build-check error:

    include/linux/sysctl.h:28: included file 'linux/rcupdate.h' is not exported

    Signed-off-by: Stephen Rothwell
    Signed-off-by: Linus Torvalds

    Stephen Rothwell
     
  • Since a8f80e8ff94ecba629542d9b4b5f5a8ee3eb565c any process with
    CAP_NET_ADMIN may load any module from /lib/modules/. This doesn't mean
    that CAP_NET_ADMIN is a superset of CAP_SYS_MODULE as modules are
    limited to /lib/modules/**. However, the CAP_NET_ADMIN capability shouldn't
    allow anybody to load a module not related to networking.

    This patch restricts module autoloading to netdev modules
    with explicit aliases. This fixes CVE-2011-1019.

    Arnd Bergmann suggested leaving untouched the old pre-v2.6.32 behavior
    of loading netdev modules by name (without any prefix) for processes
    with CAP_SYS_MODULE, to maintain compatibility with network scripts
    that autoload netdev modules via aliases like "eth0", "wlan0".

    Currently there are only three users of the feature in the upstream
    kernel: ipip, ip_gre and sit.

    root@albatros:~# capsh --drop=$(seq -s, 0 11),$(seq -s, 13 34) --
    root@albatros:~# grep Cap /proc/$$/status
    CapInh: 0000000000000000
    CapPrm: fffffff800001000
    CapEff: fffffff800001000
    CapBnd: fffffff800001000
    root@albatros:~# modprobe xfs
    FATAL: Error inserting xfs
    (/lib/modules/2.6.38-rc6-00001-g2bf4ca3/kernel/fs/xfs/xfs.ko): Operation not permitted
    root@albatros:~# lsmod | grep xfs
    root@albatros:~# ifconfig xfs
    xfs: error fetching interface information: Device not found
    root@albatros:~# lsmod | grep xfs
    root@albatros:~# lsmod | grep sit
    root@albatros:~# ifconfig sit
    sit: error fetching interface information: Device not found
    root@albatros:~# lsmod | grep sit
    root@albatros:~# ifconfig sit0
    sit0 Link encap:IPv6-in-IPv4
    NOARP MTU:1480 Metric:1

    root@albatros:~# lsmod | grep sit
    sit 10457 0
    tunnel4 2957 1 sit

    For CAP_SYS_MODULE module loading is still relaxed:

    root@albatros:~# grep Cap /proc/$$/status
    CapInh: 0000000000000000
    CapPrm: ffffffffffffffff
    CapEff: ffffffffffffffff
    CapBnd: ffffffffffffffff
    root@albatros:~# ifconfig xfs
    xfs: error fetching interface information: Device not found
    root@albatros:~# lsmod | grep xfs
    xfs 745319 0

    Reference: https://lkml.org/lkml/2011/2/24/203

    Signed-off-by: Vasiliy Kulikov
    Signed-off-by: Michael Tokarev
    Acked-by: David S. Miller
    Acked-by: Kees Cook
    Signed-off-by: James Morris

    Vasiliy Kulikov
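The policy and the capability masks in the transcript above can be checked with a small userspace model. It is a sketch under assumptions: the function names are hypothetical, the "netdev-" alias prefix is recalled from the patch rather than stated in the message, and the real kernel only falls back to the bare name when the first request fails. The bit numbers 12 (CAP_NET_ADMIN) and 16 (CAP_SYS_MODULE) are the standard Linux capability numbers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define CAP_NET_ADMIN_BIT  12
#define CAP_SYS_MODULE_BIT 16

/* Tests one bit of a CapEff-style mask as shown in /proc/<pid>/status. */
bool has_cap(uint64_t mask, int cap)
{
    return (mask >> cap) & 1;
}

/* Fills `out` with the module names this model would try to autoload for
 * interface `ifname`, in order, and returns how many there are. */
size_t autoload_candidates(uint64_t cap_eff, const char *ifname,
                           char out[2][64])
{
    size_t n = 0;

    assert(ifname != NULL && strlen(ifname) < 56);

    /* CAP_NET_ADMIN: only the explicitly aliased netdev module. */
    if (has_cap(cap_eff, CAP_NET_ADMIN_BIT))
        snprintf(out[n++], 64, "netdev-%s", ifname);

    /* CAP_SYS_MODULE: legacy behaviour, load by bare name. */
    if (has_cap(cap_eff, CAP_SYS_MODULE_BIT))
        snprintf(out[n++], 64, "%s", ifname);

    return n;
}
```

With the restricted mask from the transcript (fffffff800001000) only the prefixed alias is a candidate, which is why "ifconfig xfs" cannot pull in xfs.ko there; with the full mask the bare name is also tried, matching the second transcript.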
     
  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6:
    nd->inode is not set on the second attempt in path_walk()
    unfuck proc_sysctl ->d_compare()
    minimal fix for do_filp_open() race

    Linus Torvalds
     

09 Mar, 2011

1 commit


08 Mar, 2011

1 commit

  • a) struct inode is not going to be freed under ->d_compare();
    however, the thing PROC_I(inode)->sysctl points to just might.
    Fortunately, it's enough to make freeing that sucker delayed,
    provided that we don't step on its ->unregistering, clear
    the pointer to it in PROC_I(inode) before dropping the reference
    and check if it's NULL in ->d_compare().

    b) I'm not sure that we *can* walk into NULL inode here (we recheck
    dentry->seq between verifying that it's still hashed / fetching
    dentry->d_inode and passing it to ->d_compare() and there are no
    negative hashed dentries in /proc/sys/*), but if we can walk into
    that, we really should not have ->d_compare() return 0 on it!
    That said, I really suspect that this check can be simply killed.
    Nick?

    Signed-off-by: Al Viro

    Al Viro
     

06 Mar, 2011

1 commit

  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client:
    ceph: no .snap inside of snapped namespace
    libceph: fix msgr standby handling
    libceph: fix msgr keepalive flag
    libceph: fix msgr backoff
    libceph: retry after authorization failure
    libceph: fix handling of short returns from get_user_pages
    ceph: do not clear I_COMPLETE from d_release
    ceph: do not set I_COMPLETE
    Revert "ceph: keep reference to parent inode on ceph_dentry"

    Linus Torvalds
     

05 Mar, 2011

3 commits