22 May, 2010

35 commits

  • When netlink sockets are used to convey data that is in a namespace
    we need a way to select a subset of the listening sockets to deliver
    the packet to. For the network namespace we have been doing this
    by only transmitting packets in the correct network namespace.

    For data belonging to other namespaces netlink_broadcast_filtered
    provides a mechanism that allows us to examine the destination
    socket and to decide if we should transmit the specified packet
    to it.

    Signed-off-by: Eric W. Biederman
    Acked-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Eric W. Biederman
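
    The filtering idea described in this commit can be sketched in plain
    user-space C. This is only an illustration under stated assumptions:
    struct listener, broadcast_filtered and skip_other_ns are hypothetical
    stand-ins, not the kernel's netlink types or its actual signature. The
    sketch assumes a nonzero return from the filter means "do not deliver".

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a listening socket (not a kernel type). */
struct listener {
    const void *ns;   /* namespace this listener belongs to */
    int received;     /* packets delivered so far */
};

/* Filter callback: nonzero means "skip this destination" (assumed convention). */
typedef int (*bcast_filter_fn)(const struct listener *dst, const void *data);

/* Deliver one packet to every listener the filter does not reject. */
static int broadcast_filtered(struct listener *ls, size_t n,
                              bcast_filter_fn filter, const void *data)
{
    int delivered = 0;

    assert(ls != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (filter && filter(&ls[i], data))
            continue;               /* destination rejected by the filter */
        ls[i].received++;
        delivered++;
    }
    return delivered;
}

/* Example filter: reject listeners whose namespace differs from `data`. */
static int skip_other_ns(const struct listener *dst, const void *data)
{
    return dst->ns != data;
}
```

    Passing a NULL filter delivers to every listener, which corresponds to
    an unfiltered broadcast.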
     
  • I had a couple of stupid bugs in:
    netns: Teach network device kobjects which namespace they are in.

    - I duplicated the Kconfig for the NET_NS
    - The build was broken when sysfs was not compiled in

    The sysfs breakage is because after I moved the operations
    for the sysfs to the kobject layer, to make things cleaner
    I forgot to move the ifdefs. Oops.

    I'm not quite certain how I introduced a second NET_NS Kconfig,
    but it was probably a 3-way merge somewhere along the way that
    did not notice that the NET_NS Kconfig option had moved and thought
    that was a bug. It probably slipped in because the sysfs patches
    used to be the first patches in my network namespace patches.
    Some things just don't go like you would expect.

    Neither of these bugs actually affects anything in the common case
    but they should be fixed.

    Thanks to Serge for noticing they were present.

    Reported-by: Serge E. Hallyn
    Signed-off-by: Eric W. Biederman
    Acked-by: David S. Miller

    Eric W. Biederman
     
  • The problem. Network devices show up in sysfs and with the network
    namespace active multiple devices with the same name can show up in
    the same directory, ouch!

    To avoid that problem and allow existing applications in network namespaces
    to see the same interface that is currently presented in sysfs, this
    patch enables the tagging directory support in sysfs.

    By using the network namespace pointers as tags to separate out
    the sysfs directory entries we ensure that we don't have conflicts
    in the directories and applications only see a limited set of
    the network devices.

    Signed-off-by: Eric W. Biederman
    Acked-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Eric W. Biederman
     
  • Open a copy of the uevent kernel socket in each network
    namespace so we can send uevents in all network namespaces.

    Signed-off-by: Eric W. Biederman
    Acked-by: David S. Miller
    Signed-off-by: Greg Kroah-Hartman

    Eric W. Biederman
     
  • In this code section the final S of CONFIG_MODULES was missed, making
    the whole check useless.

    Signed-off-by: Christoph Egger
    Cc: Mark McLoughlin
    Signed-off-by: Greg Kroah-Hartman

    Christoph Egger
     
  • The PCI config space bin_attr read handler has a hardcoded CAP_SYS_ADMIN
    check to verify privileges before allowing a user to read device
    dependent config space. This is meant to protect from an unprivileged
    user potentially locking up the box.

    When assigning a PCI device directly to a guest with libvirt and KVM,
    the sysfs config space file is chown'd to the unprivileged user that
    the KVM guest will run as. The guest needs to have full access to the
    device's config space since it's responsible for driving the device.
    However, despite being the owner of the sysfs file, the CAP_SYS_ADMIN
    check will not allow read access beyond the config header.

    With this patch we check privileges against the capabilities used when
    opening the sysfs file. This allows a privileged process to open the
    file and hand it to an unprivileged process, and the unprivileged process
    can still read all of the config space.

    Signed-off-by: Chris Wright
    Acked-by: Jesse Barnes
    Cc: Alan Cox
    Signed-off-by: Greg Kroah-Hartman

    Chris Wright
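
    The difference between checking privileges at read time and at open
    time can be sketched in user-space C. Everything here is a hypothetical
    model: struct file_sketch, open_sketch and readable_bytes are invented
    for illustration, and the 64-byte header size is an assumption based on
    the standard PCI configuration header, not something stated above.

```c
#include <assert.h>
#include <stddef.h>

#define CFG_HEADER_BYTES 64  /* assumption: standard PCI config header size */

/* Hypothetical open file: remembers the opener's privilege, like a saved credential. */
struct file_sketch {
    int opened_with_admin;
};

/* Record the privilege of whoever opens the file. */
static struct file_sketch open_sketch(int opener_has_admin)
{
    struct file_sketch f = { opener_has_admin };
    return f;
}

/*
 * How many bytes of config space a read may return. The decision depends
 * only on the credential captured at open time, not on who calls read.
 */
static size_t readable_bytes(const struct file_sketch *f, size_t config_size)
{
    assert(f != NULL);
    if (f->opened_with_admin)
        return config_size;
    return config_size < CFG_HEADER_BYTES ? config_size : CFG_HEADER_BYTES;
}
```

    In this model a privileged process can open the file and pass it on, and
    the receiver still sees the full config space because the check follows
    the file rather than the reader.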
     
  • This allows bin_attr->read,write,mmap callbacks to check file specific data
    (such as inode owner) as part of any privilege validation.

    Signed-off-by: Chris Wright
    Signed-off-by: Greg Kroah-Hartman

    Chris Wright
     
  • In Al's latest vfs tree the code is reworked and S_BIAS has been removed.

    It turns out that checking to see if a super block is in the
    middle of an unmount in sysfs_exit_ns is unnecessary because we
    remove the super_block from the s_supers/s_instances list before
    struct sysfs_super_info pointed to by sb->s_fs_info is freed.

    For now just delete the unnecessary check to see if a superblock is in the
    middle of an unmount, it isn't necessary with or without Al's changes
    and it just causes a needless conflict.

    Reported-by: Stephen Rothwell
    Cc: Al Viro
    Signed-off-by: Eric W. Biederman
    Signed-off-by: Greg Kroah-Hartman

    Eric W. Biederman
     
  • It appears gcc can't cope with using an enum that is only declared in
    an inline function declaration, that doesn't even use the variable
    that is so declared.

    Avoid the silliness and replace the enum with an int, and make gcc
    happy.

    Signed-off-by: Eric W. Biederman
    Acked-by: Randy Dunlap
    Signed-off-by: Greg Kroah-Hartman

    Eric W. Biederman
     
  • The first three paragraphs are almost verbatim taken from Eric's
    commit message on the patch introducing network ns tags. The next
    two paragraphs I wrote to be a brief high level overview. The last
    section is taken from the commit message on "Implement sysfs tagged
    directory support", but updated. Hopefully correctly.

    Signed-off-by: Serge E. Hallyn
    Cc: Eric W. Biederman
    Cc: Tejun Heo
    Signed-off-by: Greg Kroah-Hartman

    Serge E. Hallyn
     
  • Add some in-line comments to explain the new infrastructure, which
    was introduced to support sysfs directory tagging with namespaces.
    I think an overall description someplace might be good too, but it
    didn't really seem to fit into Documentation/filesystems/sysfs.txt,
    which appears more geared toward users, rather than maintainers, of
    sysfs.

    (Tejun, please let me know if I can make anything clearer or failed
    altogether to comment something that should be commented.)

    Signed-off-by: Serge E. Hallyn
    Cc: Eric W. Biederman
    Cc: Tejun Heo
    Signed-off-by: Greg Kroah-Hartman

    Serge E. Hallyn
     
  • device_del and device_rename were modified to use
    sysfs_delete_link and sysfs_rename_link respectively to ensure
    when these operations happen on devices whose classes
    are in namespace directories they work properly.

    Signed-off-by: Eric W. Biederman
    Signed-off-by: Benjamin Thery
    Signed-off-by: Greg Kroah-Hartman

    Eric W. Biederman
     
  • When removing a symlink sysfs_remove_link does not provide
    enough information to figure out which tagged directory the symlink
    falls in. So I need sysfs_delete_link which is passed the target
    of the symlink to delete.

    sysfs_rename_link is updated to call sysfs_delete_link instead
    of sysfs_remove_link as we have all of the information necessary
    and the callers are interesting.

    Both of these functions now have enough information to find a symlink
    in a tagged directory. The only restriction is that they must be called
    before the target kobject is renamed or deleted. If they are called
    later I lose track of which tag the target kobject was marked with
    and can no longer find the old symlink to remove it.

    This patch was split from an earlier patch.

    Signed-off-by: Eric W. Biederman
    Signed-off-by: Benjamin Thery
    Signed-off-by: Daniel Lezcano
    Acked-by: Tejun Heo
    Signed-off-by: Greg Kroah-Hartman

    Eric W. Biederman
     
  • I had hoped to avoid this but the bonding driver adds a file
    to /sys/class/net/ and the easiest way to handle that file is
    to make it untagged and to register it only once.

    So relax the rules on tagged directories, and make bonding work.

    Signed-off-by: Eric W. Biederman
    Signed-off-by: Greg Kroah-Hartman

    Eric W. Biederman
     
  • The problem. When implementing a network namespace I need to be able
    to have multiple network devices with the same name. Currently this
    is a problem for /sys/class/net/*, /sys/devices/virtual/net/*, and
    potentially a few other directories of the form /sys/ ... /net/*.

    What this patch does is to add an additional tag field to the
    sysfs dirent structure. For directories that should show different
    contents depending on the context such as /sys/class/net/, and
    /sys/devices/virtual/net/ this tag field is used to specify the
    context in which those directories should be visible. Effectively
    this is the same as creating multiple distinct directories with
    the same name but internally to sysfs the result is nicer.

    I am calling the concept of a single directory that looks like multiple
    directories all at the same path in the filesystem tagged directories.

    For the networking namespace the set of directories whose contents I need
    to filter with tags can depend on the presence or absence of hotplug
    hardware or which modules are currently loaded. Which means I need
    a simple race free way to set up those directories as tagged.

    To achieve a race free design all tagged directories are created
    and managed by sysfs itself.

    Users of this interface:
    - define a type in the sysfs_tag_type enumeration.
    - call sysfs_register_ns_types with the type and its operations
    - sysfs_exit_ns when an individual tag is no longer valid

    - Implement mount_ns() which returns the ns of the calling process
    so we can attach it to a sysfs superblock.
    - Implement ktype.namespace() which returns the ns of a sysfs kobject.

    Everything else is left up to sysfs and the driver layer.

    For the network namespace mount_ns and namespace() are essentially
    one line functions, and look to remain that.

    Tags are currently represented as const void * pointers as that is
    both generic, provides enough information for equality comparisons,
    and is trivial to create for current users, as it is just the
    existing namespace pointer.

    The work needed in sysfs is more extensive. At each directory
    or symlink creation I need to check if the directory it is being
    created in is a tagged directory and if so generate the appropriate
    tag to place on the sysfs_dirent. Likewise at each symlink or
    directory removal I need to check if the sysfs directory it is
    being removed from is a tagged directory and if so figure out
    which tag goes along with the name I am deleting.

    Currently only directories which hold kobjects, and
    symlinks are supported. There is not enough information
    in the current file attribute interfaces to give us anything
    to discriminate on which makes it useless, and there are
    no potential users which makes it an uninteresting problem
    to solve.

    Signed-off-by: Eric W. Biederman
    Signed-off-by: Benjamin Thery
    Signed-off-by: Greg Kroah-Hartman

    Eric W. Biederman
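
    The tagged-directory idea above, several entries with the same name in
    one directory told apart by an opaque pointer tag, can be sketched in
    user-space C. This is a minimal illustration, not the sysfs
    implementation: struct dirent_sketch and find_dirent are hypothetical
    names, and the sketch assumes a lookup matches only on exact tag
    equality.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical directory entry: a name plus an opaque tag (e.g. a namespace pointer). */
struct dirent_sketch {
    const char *name;
    const void *tag;   /* compared by pointer equality only */
};

/* Find the entry called `name` that carries exactly `tag`, or NULL. */
static const struct dirent_sketch *
find_dirent(const struct dirent_sketch *ents, size_t n,
            const char *name, const void *tag)
{
    assert(name != NULL);
    for (size_t i = 0; i < n; i++) {
        if (ents[i].tag != tag)
            continue;                     /* belongs to another context */
        if (strcmp(ents[i].name, name) == 0)
            return &ents[i];
    }
    return NULL;
}
```

    Two namespaces can each own an entry named "eth0" in the same array,
    and each lookup only ever sees the entry carrying its own tag.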
     
  • Move complete knowledge of namespaces into the kobject layer
    so we can use that information when reporting kobjects to
    userspace.

    Signed-off-by: Eric W. Biederman
    Signed-off-by: Greg Kroah-Hartman

    Eric W. Biederman
     
  • Signed-off-by: Eric W. Biederman
    Signed-off-by: Greg Kroah-Hartman

    Eric W. Biederman
     
  • Add all of the necessary boilerplate to support
    multiple superblocks in sysfs.

    Signed-off-by: Eric W. Biederman
    Acked-by: Serge Hallyn
    Signed-off-by: Greg Kroah-Hartman

    Eric W. Biederman
     
  • Recent udev versions probe loop devices for filesystems meaning that
    the /dev/disk hierarchy may contain useful entries such as

    $ ls -l /dev/disk/by-label/Fedora-12-x86_64-Live
    lrwxrwxrwx 1 root root 11 Mar 11 13:41 /dev/disk/by-label/Fedora-12-x86_64-Live -> ../../loop0

    Unfortunately, no "change" uevent is generated when the loop device is
    detached so the symlink persists. Additionally, no "change" uevent is
    guaranteed to be generated when attaching an fd or changing capacity.
    For example, user space could open the loop device O_RDONLY (in fact,
    recent util-linux-ng does this) so udev's OPTIONS+="watch" machinery may
    not trigger the "change" uevent.

    This patch ensures that the "change" uevent is generated in all of
    these cases. As a result, the /dev/disk hierarchy works as expected
    for loop devices.

    Signed-off-by: David Zeuthen
    Acked-by: Kay Sievers
    Signed-off-by: Greg Kroah-Hartman

    David Zeuthen
     
  • While device_shutdown() walks through devices_kset to shutdown all
    devices, device unplug events may race to shutdown individual devices.
    Specifically, sd_shutdown(), on behalf of fc_starget_delete(), has
    been observed deleting devices during device_shutdown()'s list
    traversal. So we factor out list_for_each_entry_safe_reverse(...) in
    favor of while (!list_empty(...)).

    Signed-off-by: Hugh Daschbach
    Signed-off-by: Greg Kroah-Hartman

    Hugh Daschbach
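
    Why re-reading the list tail on every pass tolerates concurrent
    removals can be sketched with a small user-space list. The helpers
    below (struct node, shutdown_all, shutdown_and_unplug) are hypothetical
    and single-threaded; the locking and reference counting a real
    implementation needs are deliberately left out.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal circular doubly linked list node (stand-in for a device on a kset list). */
struct node {
    struct node *prev, *next;
    int shut_down;
};

static void list_init(struct node *head) { head->prev = head->next = head; }

static int list_is_empty(const struct node *head) { return head->next == head; }

static void list_add_tail_node(struct node *n, struct node *head)
{
    n->prev = head->prev;
    n->next = head;
    head->prev->next = n;
    head->prev = n;
}

/* Unlink a node and point it at itself so "still linked?" is easy to test. */
static void list_del_node(struct node *n)
{
    n->prev->next = n->next;
    n->next->prev = n->prev;
    n->prev = n->next = n;
}

/*
 * Shut down entries from the tail. No "next" pointer is cached across the
 * callback, so the callback may unlink other entries without leaving the
 * loop holding a stale pointer.
 */
static int shutdown_all(struct node *head,
                        void (*shutdown)(struct node *dev, void *ctx), void *ctx)
{
    int count = 0;

    while (!list_is_empty(head)) {
        struct node *dev = head->prev;

        list_del_node(dev);
        if (shutdown)
            shutdown(dev, ctx);
        dev->shut_down = 1;
        count++;
    }
    return count;
}

/* Callback simulating an unplug: removes another entry while the walk is running. */
static void shutdown_and_unplug(struct node *dev, void *ctx)
{
    struct node *victim = ctx;

    if (victim && victim != dev && victim->next != victim)
        list_del_node(victim);
}
```

    A safe-iterator loop would have cached the neighbouring entry before
    the callback ran, which is exactly the pointer the simulated unplug
    invalidates.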
     
  • fw_id has the same life time as firmware_priv so it makes sense to move
    it into firmware_priv structure instead of allocating separately.

    Signed-off-by: Dmitry Torokhov
    Signed-off-by: Greg Kroah-Hartman

    Dmitry Torokhov
     
  • Split builtin firmware handling into separate functions to clean up the
    main body of code.

    Signed-off-by: Dmitry Torokhov
    Signed-off-by: Greg Kroah-Hartman

    Dmitry Torokhov
     
  • Do not create 'timeout' attribute manually, let driver core do it for us.
    This also ensures that attribute is cleaned up properly.

    Signed-off-by: Dmitry Torokhov
    Signed-off-by: Greg Kroah-Hartman

    Dmitry Torokhov
     
  • When we use request_firmware_nowait(), userspace may
    not want to answer negatively right away when for
    example it is answering from an initrd only, but
    with request_firmware() it has to in order to not
    delay the kernel boot until the request times out.

    This allows userspace to differentiate between the
    two in order to be able to reply negatively to async
    requests only when all filesystems have been mounted
    and have been checked for the requested firmware file.

    Signed-off-by: Johannes Berg
    Cc: Kay Sievers
    Signed-off-by: Greg Kroah-Hartman

    Johannes Berg
     
  • The conversion of device->sem to device->mutex resulted in lockdep
    warnings. Create a novalidate class for now until the driver folks
    come up with separate classes. That way we have at least the basic
    mutex debugging coverage.

    Add a checkpatch error so the usage is reserved for device->mutex.

    [ tglx: checkpatch and compile fix for LOCKDEP=n ]

    Signed-off-by: Peter Zijlstra
    Signed-off-by: Thomas Gleixner
    Signed-off-by: Greg Kroah-Hartman

    Peter Zijlstra
     
  • The semaphore is semantically a mutex. Convert it to a real mutex and
    fix up a few places where code was relying on semaphore.h to be included
    by device.h, as well as the users of the trylock function, as that value
    is now reversed.

    Signed-off-by: Thomas Gleixner
    Signed-off-by: Greg Kroah-Hartman

    Thomas Gleixner
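
    The reversed trylock return value mentioned above is easy to get wrong,
    so here is a small user-space model of the two conventions. The
    functions are hypothetical stand-ins: sem_style_trylock mimics the
    semaphore convention (0 when the lock was acquired) and
    mutex_style_trylock mimics the mutex convention (1 when it was
    acquired). Nothing here is thread safe.

```c
#include <assert.h>

/* Toy lock used only to illustrate return conventions; not thread safe. */
struct lock_sketch {
    int held;
};

/* Semaphore convention: returns 0 if acquired, 1 if the lock was busy. */
static int sem_style_trylock(struct lock_sketch *l)
{
    assert(l != 0);
    if (l->held)
        return 1;
    l->held = 1;
    return 0;
}

/* Mutex convention: returns 1 if acquired, 0 if the lock was busy. */
static int mutex_style_trylock(struct lock_sketch *l)
{
    return !sem_style_trylock(l);
}

static void unlock_sketch(struct lock_sketch *l)
{
    assert(l != 0 && l->held);
    l->held = 0;
}
```

    A caller written as "if (trylock(...)) give up" against the semaphore
    convention has to become "if (!trylock(...)) give up" after the
    conversion.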
     
  • When runtime PM for platform_bus was added, it allowed for platforms
    to customize the runtime PM methods since they are defined as weak
    symbols.

    This patch allows platforms to also extend the system PM methods with
    custom hooks so runtime PM and system PM extensions can be managed
    together by custom platform-specific code.

    Signed-off-by: Kevin Hilman
    Cc: Magnus Damm
    Cc: Rafael Wysocki
    Cc: Dmitry Torokhov
    Cc: Eric Miao
    Signed-off-by: Greg Kroah-Hartman

    Kevin Hilman
     
  • Make devtmpfs available on (embedded) configurations without SHMEM/TMPFS,
    using ramfs instead.

    Saves ~15KB.

    Signed-off-by: Peter Korsgaard
    Acked-by: Kay Sievers
    Signed-off-by: Greg Kroah-Hartman

    Peter Korsgaard
     
  • kasprintf combines kmalloc and sprintf, and takes care of the size
    calculation itself.

    The semantic patch that makes this change is as follows:
    (http://coccinelle.lip6.fr/)

    //
    @@
    expression a,flag;
    expression list args;
    statement S;
    @@

    a =
    - \(kmalloc\|kzalloc\)(...,flag)
    + kasprintf(flag,args)

    - sprintf(a,args);
    //

    Signed-off-by: Julia Lawall
    Signed-off-by: Greg Kroah-Hartman

    Julia Lawall
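
    For readers without the kernel API at hand, the allocate-then-format
    pattern that kasprintf replaces can be sketched in user-space C.
    asprintf_sketch is a hypothetical name; it uses malloc and vsnprintf
    from the C standard library instead of the kernel allocator and takes
    no allocation flags.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Format into a freshly allocated buffer of exactly the right size.
 * Returns NULL on formatting or allocation failure; the caller frees it.
 */
static char *asprintf_sketch(const char *fmt, ...)
{
    va_list ap, ap2;
    char *buf;
    int len;

    assert(fmt != NULL);
    va_start(ap, fmt);
    va_copy(ap2, ap);                       /* second pass needs its own list */
    len = vsnprintf(NULL, 0, fmt, ap);      /* first pass: measure only */
    va_end(ap);
    if (len < 0) {
        va_end(ap2);
        return NULL;
    }
    buf = malloc((size_t)len + 1);
    if (buf)
        vsnprintf(buf, (size_t)len + 1, fmt, ap2);  /* second pass: write */
    va_end(ap2);
    return buf;
}
```

    The caller no longer computes a buffer size by hand, which is the
    class of error the semantic patch above removes.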
     
  • This patch (as1351) removes an unnecessary and unwanted assignment
    from device_initialize(). The wakeup flags are set to 0 along with
    everything else when the device structure is allocated, so we don't
    need to do it again. Furthermore, the subsystem might already have
    set these flags to their correct values; we don't want to override it.

    Signed-off-by: Alan Stern
    Signed-off-by: Greg Kroah-Hartman

    Alan Stern
     
  • This patch fixes a potential race condition in the driver_bound() function
    in the file drivers/base/dd.c.

    The broadcast of the BUS_NOTIFY_BOUND_DRIVER notifier should be done
    after adding the new device to the driver list. Otherwise notifier
    listeners will fail if they use functions like usb_find_interface().

    The patch is against kernel 2.6.33. Please merge it.

    Signed-off-by: Stefani Seibold
    Signed-off-by: Greg Kroah-Hartman

    Stefani Seibold
     
  • The messages from _request_firmware() informing that firmware is
    being requested or built-in firmware is going to be used are printed
    at KERN_INFO, which produces lots of noise on systems with huge
    numbers of AMD CPUs. Reduce the level of these messages to
    KERN_DEBUG to get rid of that noise.

    Signed-off-by: Rafael J. Wysocki
    Signed-off-by: Greg Kroah-Hartman

    Rafael J. Wysocki
     
  • Of the three uses of kref_set in the kernel:

    One really should be kref_put as the code is letting go of a
    reference,
    Two really should be kref_init because the kref is being
    initialised.

    This suggests that making kref_set available encourages bad code.
    So fix the three uses and remove kref_set completely.

    Signed-off-by: NeilBrown
    Acked-by: Mimi Zohar
    Acked-by: Serge Hallyn
    Signed-off-by: Greg Kroah-Hartman

    NeilBrown
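
    The argument above, that a reference count only needs init, get and
    put, can be illustrated with a tiny user-space counter. The names
    ref_sketch, ref_init, ref_get, ref_put and count_release are
    hypothetical, and the counter is a plain int rather than an atomic, so
    this is a single-threaded sketch only.

```c
#include <assert.h>
#include <stddef.h>

/* Toy reference count; a real one would use an atomic counter. */
struct ref_sketch {
    int count;
};

static int released_count;  /* how many times the release callback ran */

/* A new object starts with exactly one reference, held by its creator. */
static void ref_init(struct ref_sketch *r)
{
    assert(r != NULL);
    r->count = 1;
}

static void ref_get(struct ref_sketch *r)
{
    assert(r != NULL && r->count > 0);
    r->count++;
}

/* Drop one reference; returns 1 if this was the last one and release ran. */
static int ref_put(struct ref_sketch *r, void (*release)(struct ref_sketch *))
{
    assert(r != NULL && r->count > 0);
    if (--r->count == 0) {
        if (release)
            release(r);
        return 1;
    }
    return 0;
}

/* Example release callback that only counts invocations. */
static void count_release(struct ref_sketch *r)
{
    (void)r;
    released_count++;
}
```

    With no way to set the count to an arbitrary value, every change is
    either "I now hold a reference" or "I am letting one go".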
     
  • fix memory leak introduced by the patch 6e03a201bbe:
    firmware: speed up request_firmware()

    1. vfree won't release pages that were allocated explicitly and mapped
    using vmap. The memory has to be vunmap-ed and the pages need
    to be freed explicitly

    2. page array is moved into the 'struct
    firmware' so that we can free it from release_firmware()
    and not only in fw_dev_release()

    The fix doesn't break the firmware load speed.

    Cc: Johannes Berg
    Cc: Ming Lei
    Cc: Catalin Marinas
    Signed-off-by: Kay Sievers
    Signed-off-by: David Woodhouse
    Signed-off-by: Tomas Winkler
    Signed-off-by: Greg Kroah-Hartman

    David Woodhouse
     
  • Without CONFIG_CPUMASK_OFFSTACK, simply inverting cpu_online_mask leads
    to CPUs beyond nr_cpu_ids being displayed twice and to CPUs that are not
    even possible being displayed as offline.

    Signed-off-by: Jan Beulich
    Cc: Andi Kleen
    Cc: Stephen Rothwell
    Signed-off-by: Andrew Morton
    Cc: stable
    Signed-off-by: Greg Kroah-Hartman

    Jan Beulich
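
    The pitfall described above can be reproduced with ordinary bit masks.
    In this user-space sketch, naive_offline_mask and offline_mask are
    hypothetical helpers operating on a 64-bit word; restricting the result
    to the possible CPUs is one way to avoid the problem and is an
    assumption here, not a description of the actual patch.

```c
#include <assert.h>
#include <stdint.h>

/* Mask with the lowest `n` bits set, for 1 <= n <= 64. */
static uint64_t low_bits(unsigned n)
{
    assert(n >= 1 && n <= 64);
    return n == 64 ? UINT64_MAX : (UINT64_C(1) << n) - 1;
}

/* Naive version: invert the online mask across the whole storage width. */
static uint64_t naive_offline_mask(uint64_t online, unsigned storage_bits)
{
    return ~online & low_bits(storage_bits);
}

/* Restricted version: offline means possible but not online, within nr_ids. */
static uint64_t offline_mask(uint64_t online, uint64_t possible, unsigned nr_ids)
{
    return ~online & possible & low_bits(nr_ids);
}
```

    With four possible CPUs stored in an eight-bit-wide mask and CPUs 0 and
    2 online, the naive inversion also reports bits 4 to 7, which do not
    correspond to any possible CPU.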
     

21 May, 2010

5 commits

  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ryusuke/nilfs2: (23 commits)
    nilfs2: disallow remount of snapshot from/to a regular mount
    nilfs2: use huge_encode_dev/huge_decode_dev
    nilfs2: update comment on deactivate_super at nilfs_get_sb
    nilfs2: replace MS_VERBOSE with MS_SILENT
    nilfs2: add missing initialization of s_mode
    nilfs2: fix misuse of open_bdev_exclusive/close_bdev_exclusive
    nilfs2: enlarge s_volume_name member in nilfs_super_block
    nilfs2: use checkpoint number instead of timestamp to select super block
    nilfs2: add missing endian conversion on super block magic number
    nilfs2: make nilfs_sc_*_ops static
    nilfs2: add kernel doc comments to persistent object allocator functions
    nilfs2: change sc_timer from a pointer to an embedded one in struct nilfs_sc_info
    nilfs2: remove nilfs_segctor_init() in segment.c
    nilfs2: insert checkpoint number in segment summary header
    nilfs2: add a print message after loading nilfs2
    nilfs2: cleanup multi kmem_cache_{create,destroy} code
    nilfs2: move out checksum routines to segment buffer code
    nilfs2: move pointer to super root block into logs
    nilfs2: change default of 'errors' mount option to 'remount-ro' mode
    nilfs2: Combine nilfs_btree_release_path() and nilfs_btree_free_path()
    ...

    Linus Torvalds
     
  • * git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-2.6-nmw:
    GFS2: Fix typo
    GFS2: stuck in inode wait, no glocks stuck
    GFS2: Eliminate useless err variable
    GFS2: Fix writing to non-page aligned gfs2_quota structures
    GFS2: Add some useful messages
    GFS2: fix quota state reporting
    GFS2: Various gfs2_logd improvements
    GFS2: glock livelock
    GFS2: Clean up stuffed file copying
    GFS2: docs update
    GFS2: Remove space from slab cache name

    Linus Torvalds
     
  • * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/dlm:
    dlm: fix ast ordering for user locks
    dlm: cleanup remove unused code

    Linus Torvalds
     
  • * git://git.infradead.org/mtd-2.6: (154 commits)
    mtd: cfi_cmdset_0002: use AMD standard command-set with Winbond flash chips
    mtd: cfi_cmdset_0002: Fix MODULE_ALIAS and linkage for new 0701 commandset ID
    mtd: mxc_nand: Remove duplicate NAND_CMD_RESET case value
    mtd: update gfp/slab.h includes
    jffs2: Stop triggering block erases from jffs2_write_super()
    jffs2: Rename jffs2_erase_pending_trigger() to jffs2_dirty_trigger()
    jffs2: Use jffs2_garbage_collect_trigger() to trigger pending erases
    jffs2: Require jffs2_garbage_collect_trigger() to be called with lock held
    jffs2: Wake GC thread when there are blocks to be erased
    jffs2: Erase pending blocks in GC pass, avoid invalid -EIO return
    jffs2: Add 'work_done' return value from jffs2_erase_pending_blocks()
    mtd: mtdchar: Do not corrupt backing device of device node inode
    mtd/maps/pcmciamtd: Fix printk format for ssize_t in debug messages
    drivers/mtd: Use kmemdup
    mtd: cfi_cmdset_0002: Fix argument order in bootloc warning
    mtd: nand: add Toshiba TC58NVG0 device ID
    pcmciamtd: add another ID
    pcmciamtd: coding style cleanups
    pcmciamtd: fixing obvious errors
    mtd: chips: add SST39WF160x NOR-flashes
    ...

    Trivial conflicts due to dev_node removal in drivers/mtd/maps/pcmciamtd.c

    Linus Torvalds
     
  • * 'linux-next' of git://git.infradead.org/ubi-2.6:
    UBI: misc comment fixes
    UBI: fix s/then/than/ typos
    UBI: init even if MTD device cannot be attached, if built into kernel
    UBI: remove reboot notifier

    Linus Torvalds