04 Sep, 2013

3 commits

  • Pull pstore changes from Tony Luck:
    "A big part of this is the addition of compression to the generic
    pstore layer so that all backends can use the pitiful amounts of
    storage they control more effectively. Three other small
    fixes/cleanups too.

    * tag 'please-pull-pstore' of git://git.kernel.org/pub/scm/linux/kernel/git/aegl/linux:
    pstore/ram: (really) fix undefined usage of rounddown_pow_of_two
    pstore/ram: Read and write to the 'compressed' flag of pstore
    efi-pstore: Read and write to the 'compressed' flag of pstore
    erst: Read and write to the 'compressed' flag of pstore
    powerpc/pseries: Read and write to the 'compressed' flag of pstore
    pstore: Add file extension to pstore file if compressed
    pstore: Add decompression support to pstore
    pstore: Introduce new argument 'compressed' in the read callback
    pstore: Add compression support to pstore
    pstore/Kconfig: Select ZLIB_DEFLATE and ZLIB_INFLATE when PSTORE is selected
    pstore: Add new argument 'compressed' in pstore write callback
    powerpc/pseries: Remove (de)compression in nvram with pstore enabled
    pstore: d_alloc_name() doesn't return an ERR_PTR
    acpi/apei/erst: Add missing iounmap() on error in erst_exec_move_data()

    Linus Torvalds
     
  • Pull cgroup updates from Tejun Heo:
    "A lot of activity on the cgroup front. Most changes aren't visible
    to userland at all at this point and are laying the foundation for
    the planned unified hierarchy.

    - The biggest change is decoupling the lifetime management of css
    (cgroup_subsys_state) from that of cgroup's. Because controllers
    (cpu, memory, block and so on) will need to be dynamically enabled
    and disabled, css which is the association point between a cgroup
    and a controller may come and go dynamically across the lifetime of
    a cgroup. Till now, css's were created when the associated cgroup
    was created and stayed till the cgroup got destroyed.

    Assumptions around this tight coupling permeated the cgroup
    core and controllers. These assumptions are gradually removed,
    which makes up the bulk of the patches, and the css destruction path
    is now completely decoupled from the cgroup destruction path. Note
    that decoupling the creation path is relatively easy on top of these
    changes, and that patchset is pending for the next window.

    - cgroup has its own event mechanism, cgroup.event_control, which is
    only used by memcg. It is overly complex, trying to achieve a high
    degree of flexibility whose benefits seem dubious at best. Going
    forward, new events will simply generate a file-modified event, and
    the existing mechanism is being made specific to memcg. This pull
    request contains preparatory patches for that change.

    - Various fixes and cleanups"

    Fixed up conflict in kernel/cgroup.c as per Tejun.

    * 'for-3.12' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: (69 commits)
    cgroup: fix cgroup_css() invocation in css_from_id()
    cgroup: make cgroup_write_event_control() use css_from_dir() instead of __d_cgrp()
    cgroup: make cgroup_event hold onto cgroup_subsys_state instead of cgroup
    cgroup: implement CFTYPE_NO_PREFIX
    cgroup: make cgroup_css() take cgroup_subsys * instead and allow NULL subsys
    cgroup: rename cgroup_css_from_dir() to css_from_dir() and update its syntax
    cgroup: fix cgroup_write_event_control()
    cgroup: fix subsystem file accesses on the root cgroup
    cgroup: change cgroup_from_id() to css_from_id()
    cgroup: use css_get() in cgroup_create() to check CSS_ROOT
    cpuset: remove an unncessary forward declaration
    cgroup: RCU protect each cgroup_subsys_state release
    cgroup: move subsys file removal to kill_css()
    cgroup: factor out kill_css()
    cgroup: decouple cgroup_subsys_state destruction from cgroup destruction
    cgroup: replace cgroup->css_kill_cnt with ->nr_css
    cgroup: bounce cgroup_subsys_state ref kill confirmation to a work item
    cgroup: move cgroup->subsys[] assignment to online_css()
    cgroup: reorganize css init / exit paths
    cgroup: add __rcu modifier to cgroup->subsys[]
    ...

    Linus Torvalds
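The following is a minimal userspace sketch of the decoupled css lifetime described above. Everything in it is a hypothetical simplification. `struct css`, `struct cgroup`, `css_create()`, `css_get()` and `css_put()` are stand-ins. `kill_css()` only borrows its name from the shortlog; the kernel version also deals with percpu refcounts, RCU and file removal, and none of that is modeled here. The point is that a css is released when its own last reference goes away, while the cgroup it was attached to keeps living.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical, simplified model; not the kernel's cgroup structures. */
struct css {
    int refcnt;          /* references held on this css */
    bool *freed_flag;    /* set when the css is released (for observation) */
};

struct cgroup {
    struct css *subsys;  /* association with one controller; may go away */
};

static struct css *css_create(bool *freed_flag)
{
    struct css *css = malloc(sizeof(*css));
    assert(css != NULL);
    css->refcnt = 1;     /* base reference, owned by the cgroup association */
    css->freed_flag = freed_flag;
    *freed_flag = false;
    return css;
}

static void css_get(struct css *css) { css->refcnt++; }

static void css_put(struct css *css)
{
    assert(css->refcnt > 0);
    if (--css->refcnt == 0) {
        *css->freed_flag = true;
        free(css);
    }
}

/* Disable the controller on a live cgroup: unlink the css and drop the
 * base reference. Other holders keep the css alive until they put it. */
static void kill_css(struct cgroup *cgrp)
{
    struct css *css = cgrp->subsys;
    cgrp->subsys = NULL;
    css_put(css);
}
```

The cgroup owns only the base reference. Disabling a controller therefore reduces to dropping that reference, and in-flight users keep the css alive for as long as they need it.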
     
  • Pull driver core patches from Greg KH:
    "Here's the big driver core pull request for 3.12-rc1.

    Lots of tiny changes here fixing up the way sysfs attributes are
    created, to try to make drivers simpler, and to fix a whole class of
    race conditions with the creation of device attributes after the
    device was announced to userspace.

    All the various pieces are acked by the different subsystem
    maintainers"

    * tag 'driver-core-3.12-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (119 commits)
    firmware loader: fix pending_fw_head list corruption
    drivers/base/memory.c: introduce help macro to_memory_block
    dynamic debug: line queries failing due to uninitialized local variable
    sysfs: sysfs_create_groups returns a value.
    debugfs: provide debugfs_create_x64() when disabled
    rbd: convert bus code to use bus_groups
    firmware: dcdbas: use binary attribute groups
    sysfs: add sysfs_create/remove_groups for when SYSFS is not enabled
    driver core: add #include to core files.
    HID: convert bus code to use dev_groups
    Input: serio: convert bus code to use drv_groups
    Input: gameport: convert bus code to use drv_groups
    driver core: firmware: use __ATTR_RW()
    driver core: core: use DEVICE_ATTR_RO
    driver core: bus: use DRIVER_ATTR_WO()
    driver core: create write-only attribute macros for devices and drivers
    sysfs: create __ATTR_WO()
    driver-core: platform: convert bus code to use dev_groups
    workqueue: convert bus code to use dev_groups
    MEI: convert bus code to use dev_groups
    ...

    Linus Torvalds
     

03 Sep, 2013

3 commits

  • Merge lockref infrastructure code by me and Waiman Long.

    I already merged some of the preparatory patches that didn't actually do
    any semantic changes earlier, but this merges the actual _reason_ for
    those preparatory patches.

    The "lockref" structure is a combination "spinlock and reference count"
    that allows optimized reference count accesses. In particular, it
    guarantees that the reference count will be updated AS IF the spinlock
    was held, but by using atomic accesses that cover both the reference
    count and the spinlock words, we can often do the update without
    actually having to take the lock.

    This allows us to avoid the nastiest cases of spinlock contention on
    large machines under heavy pathname lookup loads. When updating the
    dentry reference counts on a large system, we'll still end up with the
    cache line bouncing around, but that's much less noticeable than
    actually having to spin waiting for the lock.

    * lockref:
    lockref: implement lockless reference count updates using cmpxchg()
    lockref: uninline lockref helper functions
    vfs: reimplement d_rcu_to_refcount() using lockref_get_or_lock()
    vfs: use lockref_get_not_zero() for optimistic lockless dget_parent()
    lockref: add 'lockref_get_or_lock() helper

    Linus Torvalds
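The fast path described above can be sketched in userspace C11 as follows. This illustrates the idea under stated assumptions and is not the kernel's lockref code. The struct layout is hypothetical: the lock is bit 0 and the count is the high 32 bits of one 64-bit word. The helper names are also hypothetical. A single compare-and-swap on the whole word only succeeds if the lock bit is clear and nothing else changed, which is what makes the update equivalent to one done under the lock. When the lock is held, the sketch falls back to taking it.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical userspace model of a lockref: a lock bit and a reference
 * count packed into one 64-bit word, so a single compare-and-swap can
 * update the count while also proving the lock is free.
 * Assumed layout: bit 0 = lock, high 32 bits = count. */
struct lockref_model {
    _Atomic uint64_t lock_count;
};

#define LR_LOCKED    UINT64_C(1)
#define LR_COUNT_ONE (UINT64_C(1) << 32)

static uint32_t lockref_count(struct lockref_model *lr)
{
    return (uint32_t)(atomic_load(&lr->lock_count) >> 32);
}

static void lockref_lock(struct lockref_model *lr)
{
    uint64_t old = atomic_load(&lr->lock_count);
    for (;;) {
        if (old & LR_LOCKED) {           /* held: spin until it looks free */
            old = atomic_load(&lr->lock_count);
            continue;
        }
        if (atomic_compare_exchange_weak(&lr->lock_count, &old,
                                         old | LR_LOCKED))
            return;
    }
}

static void lockref_unlock(struct lockref_model *lr)
{
    assert(atomic_load(&lr->lock_count) & LR_LOCKED);
    atomic_fetch_and(&lr->lock_count, ~LR_LOCKED);
}

/* Increment the count with the same effect as doing it under the lock. */
static void lockref_get(struct lockref_model *lr)
{
    uint64_t old = atomic_load(&lr->lock_count);

    /* Fast path: while the lock bit is clear, try one CAS on the whole
     * word. It fails (reloading old) if the lock or the count changed. */
    while (!(old & LR_LOCKED)) {
        if (atomic_compare_exchange_weak(&lr->lock_count, &old,
                                         old + LR_COUNT_ONE))
            return;
    }

    /* Slow path: someone holds the lock, so take it and update normally. */
    lockref_lock(lr);
    atomic_fetch_add(&lr->lock_count, LR_COUNT_ONE);
    lockref_unlock(lr);
}
```

As the merge message notes, the cache line holding the word still bounces between CPUs under load. The gain is that nobody spins waiting for the lock just to bump a count.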
     
  • This moves __d_rcu_to_refcount() into fs/namei.c
    and re-implements it using the lockref infrastructure instead. It also
    adds a lot of comments about what is actually going on, because turning
    a dentry that was looked up using RCU into a long-lived reference
    counted entry is one of the more subtle parts of the rcu walk.

    We also used to be _particularly_ subtle in unlazy_walk() where we
    re-validate both the dentry and its parent using the same sequence
    count. We used to do it by nesting the locks and then verifying the
    sequence count just once.

    That was silly, because nested locking is expensive, but the sequence
    count check is not. So this just re-validates the dentry and the parent
    separately, avoiding the nested locking, and making the lockref lookup
    possible.

    Acked-by: Waiman Long
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     
  • A valid parent pointer is always going to have a non-zero reference
    count, but if we look up the parent optimistically without locking, we
    have to protect against the (very unlikely) race against renaming
    changing the parent from under us.

    We do that by using lockref_get_not_zero(), and then re-checking the
    parent pointer after getting a valid reference.

    [ This is a re-implementation of a chunk from the original patch by
    Waiman Long: "dcache: Enable lockless update of dentry's refcount".
    I've completely rewritten the patch-series and split it up, but I'm
    attributing this part to Waiman as it's close enough to his earlier
    patch - Linus ]

    Signed-off-by: Waiman Long
    Signed-off-by: Linus Torvalds

    Waiman Long
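Here is a userspace sketch of the optimistic `dget_parent()` pattern. It takes a reference only if the count is non-zero, then re-reads the parent pointer and backs out if a rename moved it. The types and function names are hypothetical. Only the count half of the lockref is modeled, as a plain atomic int. A real implementation also has to honor the spinlock and falls back to a locked lookup when this returns NULL.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model: only the reference count is modeled; the kernel's
 * lockref also carries the spinlock. */
struct dentry_model {
    _Atomic int count;
    _Atomic(struct dentry_model *) parent;   /* root points to itself */
};

/* Take a reference only if the object is still live (count > 0). */
static bool get_not_zero(_Atomic int *count)
{
    int old = atomic_load(count);
    while (old > 0) {
        if (atomic_compare_exchange_weak(count, &old, old + 1))
            return true;
    }
    return false;
}

/* Optimistic, lock-free part of dget_parent(): returns a referenced
 * parent, or NULL if the caller must fall back to the locked path. */
static struct dentry_model *dget_parent_fast(struct dentry_model *d)
{
    struct dentry_model *parent = atomic_load(&d->parent);
    assert(parent != NULL);
    if (!get_not_zero(&parent->count))
        return NULL;
    /* A concurrent rename may have changed d->parent after we read it.
     * Holding a reference on the old parent is harmless, but we must not
     * return it: re-check and back out if the parent moved. */
    if (atomic_load(&d->parent) != parent) {
        atomic_fetch_sub(&parent->count, 1);
        return NULL;
    }
    return parent;
}
```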
     

31 Aug, 2013

1 commit


29 Aug, 2013

4 commits

  • Merge fixes from Andrew Morton:
    "Five fixes.

    err, make that six. let me try again"

    * emailed patches from Andrew Morton:
    fs/ocfs2/super.c: Use bigger nodestr to accomodate 32-bit node numbers
    memcg: check that kmem_cache has memcg_params before accessing it
    drivers/base/memory.c: fix show_mem_removable() to handle missing sections
    IPC: bugfix for msgrcv with msgtyp < 0
    Omnikey Cardman 4000: pull in ioctl.h in user header
    timer_list: correct the iterator for timer_list

    Linus Torvalds
     
  • When using pacemaker/corosync, node numbers are generated from the IP
    address rather than assigned serially. Such a number may not fit
    in an 8-byte string. Use a bigger string to print the complete node
    number.

    Signed-off-by: Goldwyn Rodrigues
    Cc: Mark Fasheh
    Cc: Joel Becker
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Goldwyn Rodrigues
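As a quick size check, the largest 32-bit unsigned value, 4294967295, has 10 digits. Printing it needs 11 bytes including the terminating NUL, so an 8-byte buffer keeps only the first 7 digits. The sketch below assumes the node number is printed with `%u`. `format_node()` and `node_fits()` are hypothetical helpers, and the buffer sizes are illustrative rather than the ones in fs/ocfs2/super.c.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: print a node number with "%u". Returns what
 * snprintf() returns: the length the full string needs (excluding the
 * NUL). A result >= size means the output was truncated. */
static int format_node(char *buf, size_t size, unsigned int node_num)
{
    assert(size > 0);
    return snprintf(buf, size, "%u", node_num);
}

/* Non-zero if node_num fits in a buffer of `size` bytes without truncation. */
static int node_fits(size_t size, unsigned int node_num)
{
    char tmp[16];
    int needed = format_node(tmp, sizeof(tmp), node_num);
    return needed >= 0 && (size_t)needed < size;
}
```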
     
  • This just replaces the dentry count/lock combination with the lockref
    structure that contains both a count and a spinlock, and does the
    mechanical conversion to use the lockref infrastructure.

    There are no semantic changes here, it's purely syntactic. The
    reference lockref implementation uses the spinlock exactly the same way
    that the old dcache code did, and the bulk of this patch is just
    expanding the internal "d_count" use in the dcache code to use
    "d_lockref.count" instead.

    This is purely preparation for the real change to make the reference
    count updates be lockless during the 3.12 merge window.

    [ As with the previous commit, this is a rewritten version of a concept
    originally from Waiman, so credit goes to him, blame for any errors
    goes to me.

    Waiman's patch had some semantic differences for taking advantage of
    the lockless update in dget_parent(), while this patch is
    intentionally a pure search-and-replace change with no semantic
    changes. - Linus ]

    Signed-off-by: Waiman Long
    Signed-off-by: Linus Torvalds

    Waiman Long
     
  • This reverts commit bb2314b47996491bbc5add73633905c3120b6268.

    It wasn't necessarily wrong per se, but we're still busily discussing
    the exact details of this all, so I'm going to revert it for now.

    It's true that you can already do flink() through /proc and that flink()
    isn't new. But as Brad Spengler points out, some secure environments do
    not mount proc, and flink adds a new interface that can avoid path
    lookup of the source for those kinds of environments.

    We may re-do this (and even mark it for stable backporting back to 3.11
    and possibly earlier) once the whole discussion about the interface is done.

    Cc: Andy Lutomirski
    Cc: Al Viro
    Cc: Oleg Nesterov
    Cc: Brad Spengler
    Signed-off-by: Linus Torvalds

    Linus Torvalds
     

27 Aug, 2013

1 commit


26 Aug, 2013

1 commit


25 Aug, 2013

6 commits


24 Aug, 2013

2 commits

  • Fix improper counting of in-flight bio requests in the
    BIO_EOPNOTSUPP error detection case.

    The sb_nbio must be incremented exactly the same number of times as
    the complete() function was called (or will be called), because
    nilfs_segbuf_wait() will call wait_for_completion() the number of
    times set in sb_nbio:

        do {
                wait_for_completion(&segbuf->sb_bio_event);
        } while (--segbuf->sb_nbio > 0);

    Two functions complete() and wait_for_completion() must be called the
    same number of times for the same sb_bio_event. Otherwise,
    wait_for_completion() will hang or leak.

    Signed-off-by: Vyacheslav Dubeyko
    Cc: Dan Carpenter
    Acked-by: Ryusuke Konishi
    Tested-by: Ryusuke Konishi
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vyacheslav Dubeyko
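The invariant above, one complete() for every count recorded in sb_nbio, can be modeled in a single thread. In this hypothetical sketch, `struct completion_model`, `complete_model()` and `try_wait_model()` stand in for the kernel's completion API. A failed wait marks the point where wait_for_completion() would block forever. The waiting loop mirrors the one quoted in the commit message.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical single-threaded model of a kernel completion: each
 * complete() records one event, and each wait consumes one. */
struct completion_model {
    unsigned int done;
};

static void complete_model(struct completion_model *c) { c->done++; }

/* Returns false where wait_for_completion() would block forever. */
static bool try_wait_model(struct completion_model *c)
{
    if (c->done == 0)
        return false;
    c->done--;
    return true;
}

struct segbuf_model {
    struct completion_model sb_bio_event;
    int sb_nbio;                 /* number of bios we expect to complete */
};

/* Wait once per counted bio, as in the quoted loop.
 * Returns false if one of the waits would hang. */
static bool segbuf_wait_model(struct segbuf_model *s)
{
    if (s->sb_nbio <= 0)
        return true;
    do {
        if (!try_wait_model(&s->sb_bio_event))
            return false;
    } while (--s->sb_nbio > 0);
    assert(s->sb_nbio == 0);
    return true;
}
```

Each counted bio must eventually produce exactly one complete(). A surplus count makes the waiter hang. A surplus completion leaves a stale event behind in the completion.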
     
  • Remove double call of bio_put() in nilfs_end_bio_write() for the case of
    BIO_EOPNOTSUPP error detection. The issue was found by Dan Carpenter,
    who also suggested the first version of the fix.

    Signed-off-by: Vyacheslav Dubeyko
    Reported-by: Dan Carpenter
    Acked-by: Ryusuke Konishi
    Tested-by: Ryusuke Konishi
    Cc:
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Vyacheslav Dubeyko
     

23 Aug, 2013

1 commit


22 Aug, 2013

16 commits

  • This fixes up the remaining coding style issues in sysfs.h

    Signed-off-by: Greg Kroah-Hartman

    Greg Kroah-Hartman
     
  • This fixes the coding style warnings in fs/sysfs/file.c for broken
    strings across lines.

    Signed-off-by: Greg Kroah-Hartman

    Greg Kroah-Hartman
     
  • This fixes up the odd do/while after an if statement warning in dir.c

    Signed-off-by: Greg Kroah-Hartman

    Greg Kroah-Hartman
     
  • This fixes the uaccess.h warnings in the sysfs.c files.

    Signed-off-by: Greg Kroah-Hartman

    Greg Kroah-Hartman
     
  • This fixes up the 80 column coding style issues in the sysfs .c files.

    Signed-off-by: Greg Kroah-Hartman

    Greg Kroah-Hartman
     
  • This fixes up all of the space-related coding style issues for the sysfs
    code.

    Signed-off-by: Greg Kroah-Hartman

    Greg Kroah-Hartman
     
  • This removes all trailing whitespace errors in the sysfs code.

    Signed-off-by: Greg Kroah-Hartman

    Greg Kroah-Hartman
     
  • The export should happen after the function, not at the bottom of the
    file, so fix that up.

    Signed-off-by: Greg Kroah-Hartman

    Greg Kroah-Hartman
     
  • Signed-off-by: Greg Kroah-Hartman

    Greg Kroah-Hartman
     
  • sysfs_remove_group() never had kerneldoc, so add it, and fix up the
    kerneldoc for sysfs_remove_groups() which didn't specify the parameters
    properly.

    Signed-off-by: Greg Kroah-Hartman

    Greg Kroah-Hartman
     
  • checkpatch complains about the broken string in the file, and it's
    correct, so fix it up.

    Signed-off-by: Greg Kroah-Hartman

    Greg Kroah-Hartman
     
  • This fixes up the * coding style warnings for the group.c sysfs file.

    Signed-off-by: Greg Kroah-Hartman

    Greg Kroah-Hartman
     
  • There were some trailing spaces in the file; fix that up.

    Signed-off-by: Greg Kroah-Hartman

    Greg Kroah-Hartman
     
  • This fixes up the coding style issue of incorrectly placing the
    EXPORT_SYMBOL_GPL() macro, it should be right after the function itself,
    not at the end of the file.

    Signed-off-by: Greg Kroah-Hartman

    Greg Kroah-Hartman
     
  • These functions are being open-coded in 3 different places in the driver
    core, and other driver subsystems will want to start doing this as well,
    so move it to the sysfs core to keep it all in one place, where we know
    it is written properly.

    Signed-off-by: Greg Kroah-Hartman

    Greg Kroah-Hartman
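The shared helper is presumably of the sysfs_create_groups() kind listed in the driver-core shortlog. It walks a NULL-terminated array of groups, creates each one, and on failure removes the groups already created. The sketch below models that pattern in userspace. `struct group_model`, `struct kobj_model`, `create_group()`, `remove_group()` and `create_groups()` are hypothetical stand-ins for the sysfs objects and calls.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for an attribute group and a kobject. */
struct group_model {
    const char *name;
    int fail;                    /* force creation to fail (for testing) */
};

struct kobj_model {
    const struct group_model *created[8];
    int ncreated;
};

static int create_group(struct kobj_model *k, const struct group_model *g)
{
    if (g->fail)
        return -1;               /* stand-in for a negative errno */
    assert(k->ncreated < 8);
    k->created[k->ncreated++] = g;
    return 0;
}

static void remove_group(struct kobj_model *k, const struct group_model *g)
{
    for (int i = 0; i < k->ncreated; i++) {
        if (k->created[i] == g) {
            /* close the gap left by the removed entry */
            memmove(&k->created[i], &k->created[i + 1],
                    (size_t)(k->ncreated - i - 1) * sizeof(k->created[0]));
            k->ncreated--;
            return;
        }
    }
}

/* Create every group in a NULL-terminated array; on failure, remove the
 * groups already created (newest first) so no half-built set remains. */
static int create_groups(struct kobj_model *k,
                         const struct group_model *const *groups)
{
    int error = 0;

    if (!groups)
        return 0;
    for (int i = 0; groups[i]; i++) {
        error = create_group(k, groups[i]);
        if (error) {
            while (--i >= 0)
                remove_group(k, groups[i]);
            break;
        }
    }
    return error;
}
```

Keeping one copy of this loop means the reverse-order cleanup only has to be right in one place.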
     
  • There is a nasty bug in the SCSI SG_IO ioctl that in some circumstances
    leads to one process writing data into the address space of some other
    random unrelated process if the ioctl is interrupted by a signal.
    What happens is the following:

    - A process issues an SG_IO ioctl with direction DXFER_FROM_DEV (i.e. the
    underlying SCSI command will transfer data from the SCSI device to
    the buffer provided in the ioctl)

    - Before the command finishes, a signal is sent to the process waiting
    in the ioctl. This will end up waking up the sg_ioctl() code:

        result = wait_event_interruptible(sfp->read_wait,
                (srp_done(sfp, srp) || sdp->detached));

    but neither srp_done() nor sdp->detached is true, so we end up just
    setting srp->orphan and returning to userspace:

        srp->orphan = 1;
        write_unlock_irq(&sfp->rq_list_lock);
        return result; /* -ERESTARTSYS because signal hit process */

    At this point the original process is done with the ioctl and
    blithely goes ahead handling the signal, reissuing the ioctl, etc.

    - Eventually, the SCSI command issued by the first ioctl finishes and
    ends up in sg_rq_end_io(). At the end of that function, we run through:

        write_lock_irqsave(&sfp->rq_list_lock, iflags);
        if (unlikely(srp->orphan)) {
                if (sfp->keep_orphan)
                        srp->sg_io_owned = 0;
                else
                        done = 0;
        }
        srp->done = done;
        write_unlock_irqrestore(&sfp->rq_list_lock, iflags);

        if (likely(done)) {
                /* Now wake up any sg_read() that is waiting for this
                 * packet.
                 */
                wake_up_interruptible(&sfp->read_wait);
                kill_fasync(&sfp->async_qp, SIGPOLL, POLL_IN);
                kref_put(&sfp->f_ref, sg_remove_sfp);
        } else {
                INIT_WORK(&srp->ew.work, sg_rq_end_io_usercontext);
                schedule_work(&srp->ew.work);
        }

    Since srp->orphan *is* set, we set done to 0 (assuming the
    userspace app has not set keep_orphan via an SG_SET_KEEP_ORPHAN
    ioctl), and therefore we end up scheduling sg_rq_end_io_usercontext()
    to run in a workqueue.

    - In workqueue context we go through sg_rq_end_io_usercontext() ->
    sg_finish_rem_req() -> blk_rq_unmap_user() -> ... ->
    bio_uncopy_user() -> __bio_copy_iov() -> copy_to_user().

    The key point here is that we are doing copy_to_user() on a
    workqueue -- that is, we're on a kernel thread with current->mm
    equal to whatever random previous user process was scheduled before
    this kernel thread. So we end up copying whatever data the SCSI
    command returned to the virtual address of the buffer passed into
    the original ioctl, but it's quite likely we do this copying into a
    different address space!

    As suggested by James Bottomley, add a check for current->mm (which
    is NULL if we're on a kernel thread without a real userspace address
    space) in bio_uncopy_user(), and skip the copy if we're on a kernel
    thread.

    There's no reason that I can think of for any caller of bio_uncopy_user()
    to want to do copying on a kernel thread with a random active userspace
    address space.

    Huge thanks to Costa Sapuntzakis for the
    original pointer to this bug in the sg code.

    Signed-off-by: Roland Dreier
    Tested-by: David Milburn
    Cc: Jens Axboe
    Cc:
    Signed-off-by: James Bottomley

    Roland Dreier
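The fix can be sketched as a userspace model. A task either owns a user address space (`mm != NULL`) or is a kernel thread (`mm == NULL`), and the copy-back is skipped in the latter case. `struct task_model`, `struct mm_model` and `uncopy_user_model()` are hypothetical. The real change lives in bio_uncopy_user() and operates on bios and iovecs, which this sketch ignores.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model: a task either has a user address space (mm != NULL)
 * or is a kernel thread such as a workqueue worker (mm == NULL). */
struct mm_model { char user_buf[16]; };
struct task_model { struct mm_model *mm; };

/* Copy completed read data back to the user's buffer only when running in
 * a task that owns a user address space. On a kernel thread, the user
 * virtual address would resolve in whatever mm happens to be active, so
 * the data is dropped instead. Returns the number of bytes copied. */
static int uncopy_user_model(const struct task_model *current,
                             const char *data, size_t len)
{
    if (!current->mm)
        return 0;                       /* kernel thread: skip the copy */
    assert(len <= sizeof(current->mm->user_buf));
    memcpy(current->mm->user_buf, data, len);
    return (int)len;
}
```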
     

20 Aug, 2013

2 commits

  • In the previous commit, Richard Genoud fixed proc_root_readdir(), which
    had lost the check for whether all of the non-process /proc entries had
    been returned or not.

    But that in turn exposed _another_ bug, namely that the original readdir
    conversion patch had yet another problem: it had lost the return value
    of proc_readdir_de(), so now checking whether it had completed
    successfully or not didn't actually work right anyway.

    This reinstates the non-zero return for the "end of base entries" that
    had also gotten lost in commit f0c3b5093add ("[readdir] convert
    procfs"). So now you get all the base entries *and* you get all the
    process entries, regardless of getdents buffer size.

    (Side note: the Linux "getdents" manual page actually has a nice example
    application for testing getdents, which can be easily modified to use
    different buffer sizes. Who knew? Man pages can be useful.)

    Reported-by: Emmanuel Benisty
    Reported-by: Marc Dionne
    Cc: Richard Genoud
    Cc: Al Viro
    Signed-off-by: Linus Torvalds

    Linus Torvalds
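Below is a toy model of the readdir logic. Each getdents() call has room for a fixed number of entries. The base-entry walker reports whether it finished, and the root walker only moves on to process entries once it has. All names here are hypothetical: `struct dir_ctx`, `emit()`, `readdir_base()`, `readdir_processes()`, `root_readdir()` and `list_all()`. They loosely mirror proc_readdir_de() and proc_root_readdir(), and entries are counted rather than named.

```c
#include <assert.h>
#include <stddef.h>

#define NBASE 3                  /* hypothetical number of base entries */
#define NPROC 4                  /* hypothetical number of process entries */

/* One directory stream; `room` is the space left in the current call. */
struct dir_ctx {
    int pos;       /* 0..NBASE-1 are base entries, then process entries */
    int room;
    int emitted;   /* total entries returned across all calls */
};

static int emit(struct dir_ctx *ctx)
{
    assert(ctx->pos < NBASE + NPROC);
    if (ctx->room == 0)
        return 0;                /* buffer full */
    ctx->room--;
    ctx->emitted++;
    ctx->pos++;
    return 1;
}

/* Returns 1 once every base entry is out, 0 if the buffer filled first. */
static int readdir_base(struct dir_ctx *ctx)
{
    while (ctx->pos < NBASE)
        if (!emit(ctx))
            return 0;
    return 1;
}

static void readdir_processes(struct dir_ctx *ctx)
{
    while (ctx->pos < NBASE + NPROC && emit(ctx))
        continue;
}

/* Only move on to process entries when the base entries are finished. */
static void root_readdir(struct dir_ctx *ctx)
{
    if (ctx->pos < NBASE && !readdir_base(ctx))
        return;
    readdir_processes(ctx);
}

/* Repeat getdents()-style calls with `room` slots until nothing new appears. */
static int list_all(int room)
{
    struct dir_ctx ctx = { 0, 0, 0 };
    int before;

    do {
        before = ctx.emitted;
        ctx.room = room;
        root_readdir(&ctx);
    } while (ctx.emitted != before);
    return ctx.emitted;
}
```

Because the return value is honored, the total comes out the same for every buffer size. That is the property the fix restores.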
     
  • In pstore write, add the character 'C' (compressed) or 'D' (decompressed)
    to the header when writing to the RAM persistent buffer. In pstore read,
    read the header and update the 'compressed' flag accordingly.

    Signed-off-by: Aruna Balakrishnaiah
    Reviewed-by: Kees Cook
    Signed-off-by: Tony Luck

    Aruna Balakrishnaiah
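This sketch shows the write and read sides of the flag. The header layout `====<sec>.<usec>-<C|D>\n` is an assumption made for this illustration. The helper names `write_header()` and `read_header()` are also hypothetical. What the sketch shows is the flag character being emitted on write and parsed back into a boolean on read.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical header layout: "====<sec>.<usec>-<C|D>\n", where 'C' marks
 * a compressed record and 'D' a decompressed (plain) one. */
static int write_header(char *buf, size_t size, long sec, long usec,
                        bool compressed)
{
    assert(size > 0);
    return snprintf(buf, size, "====%ld.%ld-%c\n", sec, usec,
                    compressed ? 'C' : 'D');
}

/* Parse the header back. Returns the header length, or -1 if it does not
 * match, and sets *compressed from the flag character. */
static int read_header(const char *buf, long *sec, long *usec,
                       bool *compressed)
{
    char flag;
    int len = 0;

    if (sscanf(buf, "====%ld.%ld-%c\n%n", sec, usec, &flag, &len) != 3
        || len == 0)
        return -1;
    if (flag != 'C' && flag != 'D')
        return -1;
    *compressed = (flag == 'C');
    return len;
}
```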