03 Aug, 2013

1 commit

  • Pull ACPI and power management fixes from Rafael Wysocki:

    - Revert two cpuidle commits added during the 3.8 development cycle
    that turned out to have introduced a significant performance regression,
    as requested by Jeremy Eder.

    - The recent patches that made the freezer less heavy-weight introduced
    a regression causing user-space-driven hibernation using the ioctl()
    interface to block indefinitely when the hibernate process executes
    try_to_freeze(). Fix from Colin Cross addresses this by adding a
    process flag to mark the hibernate/suspend process to inform the
    freezer that that process should be ignored.

    - One of the recent cpufreq reverts uncovered a problem in the core
    causing the cpufreq driver module refcount to become negative after a
    system suspend-resume cycle. Fix from Rafael J Wysocki.

    - The evaluation of the ACPI battery _BIX method has never worked
    correctly, because the commit that added support for it forgot to
    take the "Revision" field in the return package into account. As a
    result, the reading of battery info doesn't work at all on some
    systems, which is addressed by a fix from Lan Tianyu.
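The _BIX problem is easiest to picture as table-driven parsing. The following is a minimal userspace sketch, not the driver's code: struct sketch_battery, sketch_bix_offsets and sketch_extract_package() are hypothetical, and only the first four _BIX elements are modelled (assuming, per the ACPI specification, that Revision is element 0 and Power Unit is element 1). Each package element is copied into the member named by an offsetof() table, so leaving the Revision entry out of the table shifts every later field by one slot.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, reduced battery record: only the first four _BIX fields. */
struct sketch_battery {
	int revision;
	int power_unit;
	int design_capacity;
	int full_charge_capacity;
};

/* Package order for the start of _BIX; Revision is element 0. */
static const size_t sketch_bix_offsets[] = {
	offsetof(struct sketch_battery, revision),
	offsetof(struct sketch_battery, power_unit),
	offsetof(struct sketch_battery, design_capacity),
	offsetof(struct sketch_battery, full_charge_capacity),
};

/* Copy pkg[i] into the member at offsets[i]; fail if the package is short. */
static int sketch_extract_package(struct sketch_battery *b, const int *pkg,
				  size_t pkg_len, const size_t *offsets,
				  size_t n_offsets)
{
	size_t i;

	if (pkg_len < n_offsets)
		return -1;
	for (i = 0; i < n_offsets; i++)
		memcpy((char *)b + offsets[i], &pkg[i], sizeof(int));
	return 0;
}
```

Dropping the first table entry would make design_capacity receive the power-unit value, which is the kind of misreading the fix addresses.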

    * tag 'pm+acpi-3.11-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
    freezer: set PF_SUSPEND_TASK flag on tasks that call freeze_processes
    ACPI / battery: Fix parsing _BIX return value
    cpufreq: Fix cpufreq driver module refcount balance after suspend/resume
    Revert "cpuidle: Quickly notice prediction failure for repeat mode"
    Revert "cpuidle: Quickly notice prediction failure in general case"

    Linus Torvalds
     

01 Aug, 2013

8 commits

  • Merge more patches from Andrew Morton:
    "A bunch of fixes.

    Plus Joe's printk move and rework. It's not a -rc3 thing but now
    would be a nice time to offload it, while things are quiet. I've been
    sitting on it all for a couple of weeks, no issues"

    * emailed patches from Andrew Morton:
    vmpressure: make sure there are no events queued after memcg is offlined
    vmpressure: do not check for pending work to prevent from new work
    vmpressure: change vmpressure::sr_lock to spinlock
    printk: rename struct log to struct printk_log
    printk: use pointer for console_cmdline indexing
    printk: move braille console support into separate braille.[ch] files
    printk: add console_cmdline.h
    printk: move to separate directory for easier modification
    drivers/rtc/rtc-twl.c: fix: rtcX/wakealarm attribute isn't created
    mm: zbud: fix condition check on allocation size
    thp, mm: avoid PageUnevictable on active/inactive lru lists
    mm/swap.c: clear PageActive before adding pages onto unevictable list
    arch/x86/platform/ce4100/ce4100.c: include reboot.h
    mm: sched: numa: fix NUMA balancing when !SCHED_DEBUG
    rapidio: fix use after free in rio_unregister_scan()
    .gitignore: ignore *.lz4 files
    MAINTAINERS: dynamic debug: Jason's not there...
    dmi_scan: add comments on dmi_present() and the loop in dmi_scan_machine()
    ocfs2/refcounttree: add the missing NULL check of the return value of find_or_create_page()
    mm: mempolicy: fix mbind_range() && vma_adjust() interaction

    Linus Torvalds
     
  • Rename the struct to enable moving portions of
    printk.c to separate files.

    The rename changes output of /proc/vmcoreinfo.

    Signed-off-by: Joe Perches
    Cc: Samuel Thibault
    Cc: Ming Lei
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Joe Perches
     
  • Make the code a bit more compact by always using a pointer for the active
    console_cmdline.

    Move overly indented code to correct indent level.

    Signed-off-by: Joe Perches
    Cc: Samuel Thibault
    Cc: Ming Lei
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Joe Perches
     
  • Create files with prototypes and static inlines for braille support. Make
    braille_console functions return 1 on success.

    Corrected CONFIG_A11Y_BRAILLE_CONSOLE=n _braille_console_setup
    return value to NULL.

    Signed-off-by: Joe Perches
    Reviewed-by: Samuel Thibault
    Cc: Ming Lei
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Joe Perches
     
  • Add an include file for the console_cmdline struct so that the braille
    console driver can be separated.

    Signed-off-by: Joe Perches
    Cc: Samuel Thibault
    Cc: Ming Lei
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Joe Perches
     
  • Make it easier to break up printk into bite-sized chunks.

    Remove printk path/filename from comment.

    Signed-off-by: Joe Perches
    Cc: Samuel Thibault
    Cc: Ming Lei
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Joe Perches
     
  • Commit 3105b86a9fee ("mm: sched: numa: Control enabling and disabling of
    NUMA balancing if !SCHED_DEBUG") defined numabalancing_enabled to
    control the enabling and disabling of automatic NUMA balancing, but it
    is never used.

    I believe the intention was to use this in place of sched_feat_numa(NUMA).

    Currently, if SCHED_DEBUG is not defined, sched_feat_numa(NUMA) will
    never be changed from the initial "false".
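To make the mismatch concrete, here is a minimal model with hypothetical sketch_* names, assuming the enable/disable hook simply writes a global in non-debug builds. The point is that the per-tick check must read whatever the hook writes; testing a separate, never-updated feature bit (as sched_feat_numa(NUMA) is without SCHED_DEBUG, per the text) leaves balancing permanently off.

```c
#include <stdbool.h>

/* Define to model a SCHED_DEBUG=y build. */
/* #define SKETCH_SCHED_DEBUG */

#ifdef SKETCH_SCHED_DEBUG
/* Debug builds: the NUMA feature bit is the single source of truth. */
static bool sketch_sched_feat_numa;
#define sketch_numabalancing_enabled sketch_sched_feat_numa
#else
/* Non-debug builds: a plain global that the toggle writes. */
static bool sketch_numabalancing_enabled;
#endif

/* Hypothetical enable/disable hook. */
static void sketch_set_numabalancing_state(bool enabled)
{
	sketch_numabalancing_enabled = enabled;
}

/* Scheduler tick path: must read the same flag the hook writes. */
static bool sketch_should_run_numa_balancing(void)
{
	return sketch_numabalancing_enabled;
}
```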

    Signed-off-by: Dave Kleikamp
    Acked-by: Mel Gorman
    Signed-off-by: Andrew Morton
    Signed-off-by: Linus Torvalds

    Dave Kleikamp
     
  • Pull networking fixes from David Miller:

    1) Fix association failures not triggering a connect-failure event in
    cfg80211, from Johannes Berg.

    2) Eliminate a potential NULL deref with older iptables tools when
    configuring xt_socket rules, from Eric Dumazet.

    3) Missing RTNL locking in wireless regulatory code, from Johannes
    Berg.

    4) Fix OOPS caused by firmware loading races in ath9k_htc, from Alexey
    Khoroshilov.

    5) Fix USB URB leak in usb_8dev CAN driver, also from Alexey
    Khoroshilov.

    6) VXLAN namespace teardown fails to unregister devices, from Stephen
    Hemminger.

    7) Fix multicast settings getting dropped by firmware in qlcnic driver,
    from Sucheta Chakraborty.

    8) Add sysctl range enforcement for tcp_syn_retries, from Michal Tesar.

    9) Fix a nasty bug in bridging where an active timer would get
    reinitialized with a setup_timer() call. From Eric Dumazet.

    10) Fix use after free in new mlx5 driver, from Dan Carpenter.

    11) Fix freed pointer reference in ipv6 multicast routing on namespace
    cleanup, from Hannes Frederic Sowa.

    12) Some usbnet drivers report TSO and SG in their feature set, but the
    usbnet layer doesn't really support them. From Eric Dumazet.

    13) Fix crash on EEH errors in tg3 driver, from Gavin Shan.

    14) Drop cb_lock when requesting modules in genetlink, from Stanislaw
    Gruszka.

    15) Kernel stack leaks in cbq scheduler and af_key pfkey messages, from
    Dan Carpenter.

    16) FEC driver erroneously signals NETDEV_TX_BUSY on transmit leading to
    endless loops, from Uwe Kleine-König.

    17) Fix hangs from loading mvneta driver, from Arnaud Patard.

    * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (84 commits)
    mlx5: fix error return code in mlx5_alloc_uuars()
    mvneta: Try to fix mvneta when compiled as module
    mvneta: Fix hang when loading the mvneta driver
    atl1c: Fix misuse of netdev_alloc_skb in refilling rx ring
    genetlink: fix usage of NLM_F_EXCL or NLM_F_REPLACE
    af_key: more info leaks in pfkey messages
    net/fec: Don't let ndo_start_xmit return NETDEV_TX_BUSY without link
    net_sched: Fix stack info leak in cbq_dump_wrr().
    igb: fix vlan filtering in promisc mode when not in VT mode
    ixgbe: Fix Tx Hang issue with lldpad on 82598EB
    genetlink: release cb_lock before requesting additional module
    net: fec: workaround stop tx during errata ERR006358
    qlcnic: Fix diagnostic interrupt test for 83xx adapters.
    qlcnic: Fix setting Guest VLAN
    qlcnic: Fix operation type and command type.
    qlcnic: Fix initialization of work function.
    Revert "atl1c: Fix misuse of netdev_alloc_skb in refilling rx ring"
    atl1c: Fix misuse of netdev_alloc_skb in refilling rx ring
    net/tg3: Fix warning from pci_disable_device()
    net/tg3: Fix kernel crash
    ...

    Linus Torvalds
     

30 Jul, 2013

1 commit

  • Calling freeze_processes sets a global flag that will cause any
    process that calls try_to_freeze to enter the refrigerator. It
    skips sending a signal to the current task, but if the current
    task ever hits try_to_freeze, all threads will be frozen and the
    system will deadlock.

    Set a new flag, PF_SUSPEND_TASK, on the task that calls
    freeze_processes. The flag notifies the freezer that the thread
    is involved in suspend and should not be frozen. Also add a
    WARN_ON in thaw_processes if the caller does not have the
    PF_SUSPEND_TASK flag set, to catch the case where thaw_processes is
    called by a different task than the one that called
    freeze_processes, which would leave a task with PF_SUSPEND_TASK
    permanently set on it.

    Threads spawned by a task with PF_SUSPEND_TASK set (as swsusp
    does) will also have PF_SUSPEND_TASK set, preventing them from
    freezing while they are helping with suspend, but they need
    to be dead by the time suspend is triggered, otherwise they may
    run when userspace is expected to be frozen. Add a WARN_ON in
    thaw_processes if more than one thread has the PF_SUSPEND_TASK
    flag set.
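A single-threaded userspace model of the flag logic follows. All names are hypothetical and the flag values are illustrative, not the kernel's PF_* bits; thawing of the other tasks and the second WARN_ON (more than one flagged thread) are left out. It shows the two behaviours described above: the caller of freeze_processes skips the refrigerator, and a thaw from a different task is reported where the kernel would WARN_ON.

```c
#include <stdbool.h>

/* Illustrative flag bits; the kernel's PF_* values differ. */
#define SKETCH_PF_NOFREEZE     0x1u
#define SKETCH_PF_SUSPEND_TASK 0x2u

struct sketch_task {
	unsigned int flags;
	bool frozen;
};

static bool sketch_pm_freezing;   /* set while processes are being frozen */

/* Would this task enter the refrigerator from try_to_freeze()? */
static bool sketch_freezing(const struct sketch_task *p)
{
	if (!sketch_pm_freezing)
		return false;
	return !(p->flags & (SKETCH_PF_NOFREEZE | SKETCH_PF_SUSPEND_TASK));
}

static void sketch_try_to_freeze(struct sketch_task *p)
{
	if (sketch_freezing(p))
		p->frozen = true;
}

/* The caller marks itself before freezing everything else. */
static void sketch_freeze_processes(struct sketch_task *caller)
{
	caller->flags |= SKETCH_PF_SUSPEND_TASK;
	sketch_pm_freezing = true;
}

/* Returns -1 where the kernel would WARN_ON: the caller did not freeze. */
static int sketch_thaw_processes(struct sketch_task *caller)
{
	int ret = (caller->flags & SKETCH_PF_SUSPEND_TASK) ? 0 : -1;

	sketch_pm_freezing = false;
	caller->flags &= ~SKETCH_PF_SUSPEND_TASK;
	return ret;
}
```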

    Reported-and-tested-by: Michael Leun
    Signed-off-by: Colin Cross
    Signed-off-by: Rafael J. Wysocki

    Colin Cross
     

29 Jul, 2013

2 commits

  • Revert commit 69a37bea (cpuidle: Quickly notice prediction failure for
    repeat mode), because it has been identified as the source of a
    significant performance regression in v3.8 and later as explained by
    Jeremy Eder:

    We believe we've identified a particular commit to the cpuidle code
    that seems to be impacting performance of a variety of workloads.
    The simplest way to reproduce is using netperf TCP_RR test, so
    we're using that, on a pair of Sandy Bridge based servers. We also
    have data from a large database setup where performance is also
    measurably/positively impacted, though that test data isn't easily
    share-able.

    Included below are test results from 3 test kernels:

    kernel reverts
    -----------------------------------------------------------
    1) vanilla upstream (no reverts)

    2) perfteam2 reverts e11538d1f03914eb92af5a1a378375c05ae8520c

    3) test reverts 69a37beabf1f0a6705c08e879bdd5d82ff6486c4
    e11538d1f03914eb92af5a1a378375c05ae8520c

    In summary, netperf TCP_RR numbers improve by approximately 4%
    after reverting 69a37beabf1f0a6705c08e879bdd5d82ff6486c4. When
    69a37beabf1f0a6705c08e879bdd5d82ff6486c4 is included, C0 residency
    never seems to get above 40%. Taking that patch out gets C0 near
    100% quite often, and performance increases.

    The below data are histograms representing the %c0 residency @
    1-second sample rates (using turbostat), while under netperf test.

    - If you look at the first 4 histograms, you can see %c0 residency
    almost entirely in the 30,40% bin.
    - The last pair, which reverts 69a37beabf1f0a6705c08e879bdd5d82ff6486c4,
    shows %c0 in the 80,90,100% bins.

    Below each kernel name are netperf TCP_RR trans/s numbers for the
    particular kernel that can be disclosed publicly, comparing the 3
    test kernels. We ran a 4th test with the vanilla kernel where
    we've also set /dev/cpu_dma_latency=0 to show overall impact
    boosting single-threaded TCP_RR performance over 11% above
    baseline.

    3.10-rc2 vanilla RX + c0 lock (/dev/cpu_dma_latency=0):
    TCP_RR trans/s 54323.78

    -----------------------------------------------------------
    3.10-rc2 vanilla RX (no reverts)
    TCP_RR trans/s 48192.47

    Receiver %c0
    0.0000 - 10.0000 [ 1]: *
    10.0000 - 20.0000 [ 0]:
    20.0000 - 30.0000 [ 0]:
    30.0000 - 40.0000 [ 59]: ***********************************************************
    40.0000 - 50.0000 [ 1]: *
    50.0000 - 60.0000 [ 0]:
    60.0000 - 70.0000 [ 0]:
    70.0000 - 80.0000 [ 0]:
    80.0000 - 90.0000 [ 0]:
    90.0000 - 100.0000 [ 0]:

    Sender %c0
    0.0000 - 10.0000 [ 1]: *
    10.0000 - 20.0000 [ 0]:
    20.0000 - 30.0000 [ 0]:
    30.0000 - 40.0000 [ 11]: ***********
    40.0000 - 50.0000 [ 49]: *************************************************
    50.0000 - 60.0000 [ 0]:
    60.0000 - 70.0000 [ 0]:
    70.0000 - 80.0000 [ 0]:
    80.0000 - 90.0000 [ 0]:
    90.0000 - 100.0000 [ 0]:

    -----------------------------------------------------------
    3.10-rc2 perfteam2 RX (reverts commit
    e11538d1f03914eb92af5a1a378375c05ae8520c)
    TCP_RR trans/s 49698.69

    Receiver %c0
    0.0000 - 10.0000 [ 1]: *
    10.0000 - 20.0000 [ 1]: *
    20.0000 - 30.0000 [ 0]:
    30.0000 - 40.0000 [ 59]: ***********************************************************
    40.0000 - 50.0000 [ 0]:
    50.0000 - 60.0000 [ 0]:
    60.0000 - 70.0000 [ 0]:
    70.0000 - 80.0000 [ 0]:
    80.0000 - 90.0000 [ 0]:
    90.0000 - 100.0000 [ 0]:

    Sender %c0
    0.0000 - 10.0000 [ 1]: *
    10.0000 - 20.0000 [ 0]:
    20.0000 - 30.0000 [ 0]:
    30.0000 - 40.0000 [ 2]: **
    40.0000 - 50.0000 [ 58]: **********************************************************
    50.0000 - 60.0000 [ 0]:
    60.0000 - 70.0000 [ 0]:
    70.0000 - 80.0000 [ 0]:
    80.0000 - 90.0000 [ 0]:
    90.0000 - 100.0000 [ 0]:

    -----------------------------------------------------------
    3.10-rc2 test RX (reverts 69a37beabf1f0a6705c08e879bdd5d82ff6486c4
    and e11538d1f03914eb92af5a1a378375c05ae8520c)
    TCP_RR trans/s 47766.95

    Receiver %c0
    0.0000 - 10.0000 [ 1]: *
    10.0000 - 20.0000 [ 1]: *
    20.0000 - 30.0000 [ 0]:
    30.0000 - 40.0000 [ 27]: ***************************
    40.0000 - 50.0000 [ 2]: **
    50.0000 - 60.0000 [ 0]:
    60.0000 - 70.0000 [ 2]: **
    70.0000 - 80.0000 [ 0]:
    80.0000 - 90.0000 [ 0]:
    90.0000 - 100.0000 [ 28]: ****************************

    Sender %c0
    0.0000 - 10.0000 [ 1]: *
    10.0000 - 20.0000 [ 0]:
    20.0000 - 30.0000 [ 0]:
    30.0000 - 40.0000 [ 11]: ***********
    40.0000 - 50.0000 [ 0]:
    50.0000 - 60.0000 [ 1]: *
    60.0000 - 70.0000 [ 0]:
    70.0000 - 80.0000 [ 3]: ***
    80.0000 - 90.0000 [ 7]: *******
    90.0000 - 100.0000 [ 38]: **************************************

    These results demonstrate gaining back the tendency of the CPU to
    stay in more responsive, performant C-states (and thus yield
    measurably better performance), by reverting commit
    69a37beabf1f0a6705c08e879bdd5d82ff6486c4.

    Requested-by: Jeremy Eder
    Tested-by: Len Brown
    Cc: 3.8+
    Signed-off-by: Rafael J. Wysocki

    Rafael J. Wysocki
     
  • …t/rostedt/linux-trace

    Pull tracing fixes from Steven Rostedt:
    "Oleg is working on fixing a very tight race between opening an event
    file and deleting that event at the same time (both must be done as
    root).

    I also found a bug while testing Oleg's patches which has to do with a
    race with kprobes using the function tracer.

    There's also a fix for a deadlock that was introduced by the previous
    fixes"

    * tag 'trace-fixes-3.11-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
    tracing: Remove locking trace_types_lock from tracing_reset_all_online_cpus()
    ftrace: Add check for NULL regs if ops has SAVE_REGS set
    tracing: Kill trace_cpu struct/members
    tracing: Change tracing_fops/snapshot_fops to rely on tracing_get_cpu()
    tracing: Change tracing_entries_fops to rely on tracing_get_cpu()
    tracing: Change tracing_stats_fops to rely on tracing_get_cpu()
    tracing: Change tracing_buffers_fops to rely on tracing_get_cpu()
    tracing: Change tracing_pipe_fops() to rely on tracing_get_cpu()
    tracing: Introduce trace_create_cpu_file() and tracing_get_cpu()

    Linus Torvalds
     

27 Jul, 2013

1 commit

    Some (integer) sysctl values are expressed in ms and have to be
    represented internally as jiffies. The msecs_to_jiffies function
    returns an unsigned long, which gets assigned to the integer.
    This patch prevents the value from being assigned if it is bigger
    than INT_MAX, in a similar way to cba9f3 ("Range checking in
    do_proc_dointvec_(userhz_)jiffies_conv").
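A sketch of the range check, assuming HZ == 1000 so that one jiffy equals one millisecond (sketch_msecs_to_jiffies() is a simplified, hypothetical stand-in, not the kernel's msecs_to_jiffies()). The converted unsigned long is compared with INT_MAX before it is stored, and the target int is left untouched on failure.

```c
#include <errno.h>
#include <limits.h>

/* Assumed tick rate: with HZ == 1000 one jiffy is one millisecond. */
#define SKETCH_HZ 1000UL

/* Simplified conversion for HZ == 1000; returns unsigned long like the original. */
static unsigned long sketch_msecs_to_jiffies(unsigned int ms)
{
	return (unsigned long)ms * (SKETCH_HZ / 1000UL);
}

/* Store ms as jiffies in an int, rejecting values that do not fit. */
static int sketch_ms_jiffies_conv(int *valp, unsigned int ms)
{
	unsigned long j = sketch_msecs_to_jiffies(ms);

	if (j > INT_MAX)
		return -EINVAL;   /* leave *valp untouched */
	*valp = (int)j;
	return 0;
}
```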

    Signed-off-by: Francesco Fusco
    CC: Andrew Morton
    CC: linux-kernel@vger.kernel.org
    Signed-off-by: David S. Miller

    Francesco Fusco
     

26 Jul, 2013

1 commit

  • Commit a82274151af "tracing: Protect ftrace_trace_arrays list in trace_events.c"
    added taking the trace_types_lock mutex in trace_events.c as there were
    several locations that needed it for protection. Unfortunately, it also
    encapsulated a call to tracing_reset_all_online_cpus() which also takes
    the trace_types_lock, causing a deadlock.

    This happens when a module has tracepoints and has been traced. When the
    module is removed, the trace events module notifier will grab the
    trace_types_lock, do a bunch of clean ups, and also clear the buffer
    by calling tracing_reset_all_online_cpus. This doesn't happen often,
    which explains why it wasn't caught right away.

    Commit a82274151af was marked for stable, which means this must be
    sent to stable too.

    Link: http://lkml.kernel.org/r/51EEC646.7070306@broadcom.com

    Reported-by: Arend van Spriel
    Tested-by: Arend van Spriel
    Cc: Alexander Z Lam
    Cc: Vaibhav Nagarnaik
    Cc: David Sharp
    Cc: stable@vger.kernel.org # 3.10
    Signed-off-by: Steven Rostedt

    Steven Rostedt (Red Hat)
     

24 Jul, 2013

10 commits

    If an ftrace ops is registered with the SAVE_REGS flag set, and there's
    already an ops registered to one of its functions but without the
    SAVE_REGS flag, there's a small race window where the SAVE_REGS ops gets
    added to the list of callbacks to call for that function before the
    callback trampoline gets set to save the regs.

    The problem is that the function is not yet saving regs, which opens
    a small race window in which the ops that expects regs to be passed
    to it will not get them. This can cause a crash if the callback
    references the regs, as SAVE_REGS guarantees that regs will be set.

    To fix this, add a check in the list loop: if the ops has the
    SAVE_REGS flag set and regs is not set, that ops is skipped.
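A minimal model of the list walk with the added check. sketch_ops, sketch_regs and the flag value are hypothetical stand-ins for the ftrace structures, and the callback is assumed to read regs->ip unconditionally when it asked for SAVE_REGS, which is exactly what makes a NULL regs fatal.

```c
#include <stddef.h>

/* Stand-in for the SAVE_REGS ops flag; the value is illustrative. */
#define SKETCH_OPS_FL_SAVE_REGS 0x1u

struct sketch_regs {
	unsigned long ip;
};

struct sketch_ops {
	unsigned int flags;
	int calls;              /* how many times the callback ran */
	unsigned long last_ip;
};

/* Callback: a SAVE_REGS user dereferences regs unconditionally. */
static void sketch_callback(struct sketch_ops *op, struct sketch_regs *regs)
{
	op->calls++;
	if (op->flags & SKETCH_OPS_FL_SAVE_REGS)
		op->last_ip = regs->ip;
}

/* List walker modelled on the fix: skip SAVE_REGS ops when regs is NULL. */
static void sketch_ops_list_func(struct sketch_ops **ops, size_t n,
				 struct sketch_regs *regs)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if ((ops[i]->flags & SKETCH_OPS_FL_SAVE_REGS) && !regs)
			continue;       /* trampoline not saving regs yet */
		sketch_callback(ops[i], regs);
	}
}
```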

    Signed-off-by: Steven Rostedt

    Steven Rostedt (Red Hat)
     
    After the previous changes, trace_array_cpu->trace_cpu and
    trace_array->trace_cpu become write-only. Remove these members
    and kill "struct trace_cpu" as well.

    As a side effect this also removes memset(per_cpu_memory, 0).
    It was not needed, since alloc_percpu() returns zero-filled memory.

    Link: http://lkml.kernel.org/r/20130723152613.GA23741@redhat.com

    Signed-off-by: Oleg Nesterov
    Signed-off-by: Steven Rostedt

    Oleg Nesterov
     
    tracing_open() and tracing_snapshot_open() are racy: the memory
    inode->i_private points to may already have been freed.

    Convert these last users of "inode->i_private == trace_cpu" to
    use "i_private = trace_array" and rely on tracing_get_cpu().

    v2: incorporate the fix from Steven: tracing_release() must not
    blindly dereference file->private_data unless we know that
    the file was opened for reading.

    Link: http://lkml.kernel.org/r/20130723152610.GA23737@redhat.com

    Signed-off-by: Oleg Nesterov
    Signed-off-by: Steven Rostedt

    Oleg Nesterov
     
    tracing_open_generic_tc() is racy: the memory inode->i_private
    points to may already have been freed.

    1. Change its last user, tracing_entries_fops, to use
    tracing_*_generic_tr() instead.

    2. Change debugfs_create_file("buffer_size_kb", data) callers
    to pass "data = tr".

    3. Change tracing_entries_read() and tracing_entries_write() to
    use tracing_get_cpu().

    4. Kill the no longer used tracing_open_generic_tc() and
    tracing_release_generic_tc().

    Link: http://lkml.kernel.org/r/20130723152606.GA23730@redhat.com

    Signed-off-by: Oleg Nesterov
    Signed-off-by: Steven Rostedt

    Oleg Nesterov
     
    tracing_open_generic_tc() is racy: the memory inode->i_private
    points to may already have been freed.

    1. Change one of its users, tracing_stats_fops, to use
    tracing_*_generic_tr() instead.

    2. Change trace_create_cpu_file("stats", data) to pass "data = tr".

    3. Change tracing_stats_read() to use tracing_get_cpu().

    Link: http://lkml.kernel.org/r/20130723152603.GA23727@redhat.com

    Signed-off-by: Oleg Nesterov
    Signed-off-by: Steven Rostedt

    Oleg Nesterov
     
    tracing_buffers_open() is racy: the memory inode->i_private points
    to may already have been freed.

    Change the debugfs_create_file("trace_pipe_raw", data) caller to pass
    "data = tr", so that tracing_buffers_open() can use tracing_get_cpu().

    Change debugfs_create_file("snapshot_raw_fops", data) caller too,
    this file uses tracing_buffers_open/release.

    Link: http://lkml.kernel.org/r/20130723152600.GA23720@redhat.com

    Signed-off-by: Oleg Nesterov
    Signed-off-by: Steven Rostedt

    Oleg Nesterov
     
    tracing_open_pipe() is racy: the memory inode->i_private points to
    may already have been freed.

    Change debugfs_create_file("trace_pipe", data) callers to pass
    "data = tr", so that tracing_open_pipe() can use tracing_get_cpu().

    Link: http://lkml.kernel.org/r/20130723152557.GA23717@redhat.com

    Signed-off-by: Oleg Nesterov
    Signed-off-by: Steven Rostedt

    Oleg Nesterov
     
  • Every "file_operations" used by tracing_init_debugfs_percpu is buggy.
    f_op->open/etc does:

    1. struct trace_cpu *tc = inode->i_private;
    struct trace_array *tr = tc->tr;

    2. trace_array_get(tr) or fail;

    3. do_something(tc);

    But tc (and tr) can be already freed before trace_array_get() is called.
    And it doesn't matter whether this file is per-cpu or it was created by
    init_tracer_debugfs(), free_percpu() or kfree() are equally bad.

    Note that even 1. is not safe, since the freed memory can be unmapped.
    But even if it were safe, trace_array_get() can wrongly succeed if we
    also race with the next new_instance_create(), which can re-allocate the
    same tr, or if tc was overwritten and ->tr points to a valid tr. In this
    case 3. uses the freed/reused memory.

    Add the new trivial helper, trace_create_cpu_file() which simply calls
    trace_create_file() and encodes "cpu" in "struct inode". Another helper,
    tracing_get_cpu() will be used to read cpu_nr-or-RING_BUFFER_ALL_CPUS.

    The patch abuses ->i_cdev to encode the number; it is never used unless
    the file is S_ISCHR(). But we could use something else, say, i_bytes or
    even ->d_fsdata. In any case this hack is hidden inside these 2 helpers,
    it would be trivial to change them if needed.

    This patch only changes tracing_init_debugfs_percpu() to use the new
    trace_create_cpu_file(), the next patches will change file_operations.

    Note: tracing_get_cpu(inode) is always safe, but you can't trust the
    result unless trace_array_get() was called; without trace_types_lock,
    which acts as a barrier, it can wrongly return RING_BUFFER_ALL_CPUS.
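The encoding trick can be sketched in userspace. sketch_inode, sketch_encode_cpu() and sketch_tracing_get_cpu() are hypothetical stand-ins for the two helpers, and the +1 offset is an assumption about how cpu 0 stays distinguishable from an unset (NULL) i_cdev.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in for RING_BUFFER_ALL_CPUS ("no specific cpu"). */
#define SKETCH_RING_BUFFER_ALL_CPUS (-1)

/* Minimal stand-in for struct inode: only the field the hack reuses. */
struct sketch_inode {
	void *i_cdev;
};

/* Store cpu + 1 so that NULL still means "top-level, all-cpus file". */
static void sketch_encode_cpu(struct sketch_inode *inode, long cpu)
{
	inode->i_cdev = (void *)(intptr_t)(cpu + 1);
}

/* Decode the cpu number, or report "all cpus" if nothing was stored. */
static int sketch_tracing_get_cpu(const struct sketch_inode *inode)
{
	if (inode->i_cdev)
		return (int)((intptr_t)inode->i_cdev - 1);
	return SKETCH_RING_BUFFER_ALL_CPUS;
}
```

Keeping the encoding inside two helpers is what makes it cheap to switch to another field later, as the commit notes.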

    Link: http://lkml.kernel.org/r/20130723152554.GA23710@redhat.com

    Cc: Al Viro
    Signed-off-by: Oleg Nesterov
    Signed-off-by: Steven Rostedt

    Oleg Nesterov
     
  • Pull cgroup changes from Tejun Heo:
    "This contains two patches, both of which aren't fixes per-se but I
    think it'd be better to fast-track them.

    One removes bcache_subsys_id which was added without proper review
    through the block tree. Fortunately, bcache cgroup code is
    unconditionally disabled, so this was never exposed to userland. The
    cgroup subsys_id is removed. Kent will remove the affected (disabled)
    code through bcache branch.

    The other simplifies task_group_path_from_hierarchy(). The function
    doesn't currently have in-kernel users, but there is external code and
    development going on that depends on the function, and making the function
    available for 3.11 would make things go smoother"

    * 'for-3.11-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
    cgroup: replace task_cgroup_path_from_hierarchy() with task_cgroup_path()
    cgroup: remove bcache_subsys_id which got added stealthily

    Linus Torvalds
     
  • Fix __wait_on_atomic_t() so that it calls the action func if the counter != 0
    rather than if the counter is 0 so as to be analogous to __wait_on_bit().

    Thanks to Yacine who found this by visual inspection.

    This will affect FS-Cache in that it could fail to sleep correctly when
    trying to clean up after a netfs cookie is withdrawn.
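A userspace sketch of the corrected loop using C11 atomics. sketch_wait_on_atomic_t() and sketch_fake_sleep() are hypothetical, wait-queue preparation is omitted, and the action decrements the counter to stand in for another thread releasing a reference while the waiter sleeps.

```c
#include <stdatomic.h>

/*
 * Corrected shape: the action (which sleeps) runs only while the
 * counter is non-zero, analogous to __wait_on_bit() running it while
 * the bit is set. Wait-queue handling is omitted.
 */
static int sketch_wait_on_atomic_t(atomic_int *val, int (*action)(atomic_int *))
{
	int ret = 0;

	do {
		if (atomic_load(val) == 0)
			break;          /* nothing left to wait for */
		ret = action(val);      /* counter != 0: go to sleep */
	} while (!ret && atomic_load(val) != 0);
	return ret;
}

static int sketch_action_calls;

/* Fake "sleep": pretends another thread dropped one reference meanwhile. */
static int sketch_fake_sleep(atomic_int *val)
{
	sketch_action_calls++;
	atomic_fetch_sub(val, 1);
	return 0;
}
```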

    Reported-by: Yacine Belkadi
    Signed-off-by: David Howells
    Reviewed-by: Jeff Layton
    cc: Milosz Tanski
    Signed-off-by: Linus Torvalds

    David Howells
     

23 Jul, 2013

1 commit

  • Pull tracing fixes and cleanups from Steven Rostedt:
    "This contains fixes, optimizations and some clean ups

    Some of the fixes need to go back to 3.10. They are minor, and deal
    mostly with incorrect ref counting in accessing event files.

    There are a couple of optimizations that should make perf perform a
    bit better when accessing trace events.

    And various clean ups. Some of the clean ups are necessary to
    help in a fix to a theoretical race between opening an event file and
    deleting that event"

    * tag 'trace-3.11-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
    tracing: Kill the unbalanced tr->ref++ in tracing_buffers_open()
    tracing: Kill trace_array->waiter
    tracing: Do not (ab)use trace_seq in event_id_read()
    tracing: Simplify the iteration logic in f_start/f_next
    tracing: Add ref_data to function and fgraph tracer structs
    tracing: Miscellaneous fixes for trace_array ref counting
    tracing: Fix error handling to ensure instances can always be removed
    tracing/kprobe: Wait for disabling all running kprobe handlers
    tracing/perf: Move the PERF_MAX_TRACE_SIZE check into perf_trace_buf_prepare()
    tracing/syscall: Avoid perf_trace_buf_*() if sys_data->perf_events is empty
    tracing/function: Avoid perf_trace_buf_*() if event_function.perf_events is empty
    tracing: Typo fix on ring buffer comments
    tracing: Use trace_seq_puts()/trace_seq_putc() where possible
    tracing: Use correct config guard CONFIG_STACK_TRACER

    Linus Torvalds
     

20 Jul, 2013

2 commits

  • tracing_buffers_open() does trace_array_get() and then it wrongly
    increments tr->ref again under trace_types_lock. This means that
    every caller leaks trace_array:

    # cd /sys/kernel/debug/tracing/
    # mkdir instances/X
    # true < instances/X/per_cpu/cpu0/trace_pipe_raw
    # rmdir instances/X
    rmdir: failed to remove `instances/X': Device or resource busy
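The leak reduces to simple reference arithmetic. The sketch below uses hypothetical sketch_* names and a plain int counter (the real trace_array is protected by its own locking) to show why the extra increment keeps the instance busy after the reader has gone away.

```c
#include <errno.h>

/* Hypothetical stand-in for the trace_array reference count. */
struct sketch_trace_array {
	int ref;
};

static int sketch_trace_array_get(struct sketch_trace_array *tr)
{
	tr->ref++;
	return 0;
}

static void sketch_trace_array_put(struct sketch_trace_array *tr)
{
	tr->ref--;
}

/* Buggy open: takes the reference, then bumps it a second time. */
static int sketch_buffers_open_buggy(struct sketch_trace_array *tr)
{
	if (sketch_trace_array_get(tr))
		return -ENODEV;
	tr->ref++;              /* the unbalanced increment */
	return 0;
}

/* Fixed open: exactly one reference, dropped by release. */
static int sketch_buffers_open(struct sketch_trace_array *tr)
{
	return sketch_trace_array_get(tr) ? -ENODEV : 0;
}

static void sketch_buffers_release(struct sketch_trace_array *tr)
{
	sketch_trace_array_put(tr);
}

/* rmdir of an instance fails while anyone still holds a reference. */
static int sketch_instance_rmdir(const struct sketch_trace_array *tr)
{
	return tr->ref ? -EBUSY : 0;
}
```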

    Link: http://lkml.kernel.org/r/20130719153644.GA18899@redhat.com

    Cc: Ingo Molnar
    Cc: Frederic Weisbecker
    Cc: Masami Hiramatsu
    Cc: stable@vger.kernel.org # 3.10
    Signed-off-by: Oleg Nesterov
    Signed-off-by: Steven Rostedt

    Oleg Nesterov
     
    Pull power management and ACPI fixes from Rafael Wysocki:
    "These are fixes collected over the last week, most importantly two
    cpufreq reverts fixing regressions introduced in 3.10, an autosleep
    fix preventing systems using it from crashing during shutdown and two
    ACPI scan fixes related to hotplug.

    Specifics:

    - Two cpufreq commits from the 3.10 cycle introduced regressions.
    The first of them was buggy (it did much more than it needed to)
    and the second one attempted to fix an issue introduced by the
    first one. Fixes from Srivatsa S Bhat revert both.

    - If autosleep triggers during system shutdown and the shutdown
    callbacks of some device drivers have been called already, it may
    crash the system. Fix from Liu Shuo prevents that from happening
    by making try_to_suspend() check system_state.

    - The ACPI memory hotplug driver doesn't clear its driver_data on
    errors, which may cause a NULL pointer dereference to happen later.
    Fix from Toshi Kani.

    - The ACPI namespace scanning code should not try to attach scan
    handlers to device objects that have them already, which may
    confuse things quite a bit, and it should rescan the whole
    namespace branch starting at the given node after receiving a bus
    check notify event even if the device at that particular node has
    been discovered already. Fixes from Rafael J Wysocki.

    - New ACPI video blacklist entry for a system whose initial backlight
    setting from the BIOS doesn't make sense. From Lan Tianyu.

    - Garbage string output avoidance for ACPI PNP from Liu Shuo.

    - Two Kconfig fixes for issues introduced recently in the s3c24xx
    cpufreq driver (when moving the driver to drivers/cpufreq) from
    Paul Bolle.

    - Trivial comment fix in pm_wakeup.h from Chanwoo Choi"

    * tag 'pm+acpi-3.11-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
    ACPI / video: ignore BIOS initial backlight value for Fujitsu E753
    PNP / ACPI: avoid garbage in resource name
    cpufreq: Revert commit 2f7021a8 to fix CPU hotplug regression
    cpufreq: s3c24xx: fix "depends on ARM_S3C24XX" in Kconfig
    cpufreq: s3c24xx: rename CONFIG_CPU_FREQ_S3C24XX_DEBUGFS
    PM / Sleep: Fix comment typo in pm_wakeup.h
    PM / Sleep: avoid 'autosleep' in shutdown progress
    cpufreq: Revert commit a66b2e to fix suspend/resume regression
    ACPI / memhotplug: Fix a stale pointer in error path
    ACPI / scan: Always call acpi_bus_scan() for bus check notifications
    ACPI / scan: Do not try to attach scan handlers to devices having them

    Linus Torvalds
     

19 Jul, 2013

13 commits

  • Trivial. trace_array->waiter has no users since 6eaaa5d5
    "tracing/core: use appropriate waiting on trace_pipe".

    Link: http://lkml.kernel.org/r/20130719142036.GA1594@redhat.com

    Signed-off-by: Oleg Nesterov
    Signed-off-by: Steven Rostedt

    Oleg Nesterov
     
  • event_id_read() has no reason to kmalloc "struct trace_seq"
    (more than PAGE_SIZE!), it can use a small buffer instead.

    Note: "if (*ppos) return 0" looks strange and even wrong;
    simple_read_from_buffer() handles the ppos != 0 case correctly.

    And it seems that almost every user of trace_seq in this file
    should be converted too. Unless you use seq_open(), trace_seq
    buys nothing compared to the raw buffer, but it needs a bit
    more memory and code.
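A userspace model of the small-buffer approach, with hypothetical sketch_* names. sketch_simple_read_from_buffer() is written to follow the documented behaviour of simple_read_from_buffer() (copy from *ppos, advance it, return 0 at end of data), and the early "if (*ppos) return 0" is deliberately left out to show that partial reads still work.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>

/* Userspace model of simple_read_from_buffer(): copy from *ppos onward. */
static ssize_t sketch_simple_read_from_buffer(char *to, size_t count,
					      long *ppos, const char *from,
					      size_t available)
{
	long pos = *ppos;

	if (pos < 0)
		return -EINVAL;
	if ((size_t)pos >= available || count == 0)
		return 0;               /* EOF: nothing left to copy */
	if (count > available - (size_t)pos)
		count = available - (size_t)pos;
	memcpy(to, from + pos, count);
	*ppos = pos + (long)count;
	return (ssize_t)count;
}

/* event_id_read() with a small stack buffer instead of a trace_seq. */
static ssize_t sketch_event_id_read(int id, char *ubuf, size_t cnt, long *ppos)
{
	char buf[32];
	int len = snprintf(buf, sizeof(buf), "%d\n", id);

	return sketch_simple_read_from_buffer(ubuf, cnt, ppos, buf, (size_t)len);
}
```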

    Link: http://lkml.kernel.org/r/20130718184712.GA4786@redhat.com

    Signed-off-by: Oleg Nesterov
    Signed-off-by: Steven Rostedt

    Oleg Nesterov
     
  • f_next() looks overcomplicated, and it is not strictly correct
    even if this doesn't matter.

    Say, FORMAT_FIELD_SEPERATOR should not return NULL (means EOF)
    if trace_get_fields() returns an empty list, we should simply
    advance to FORMAT_PRINTFMT as we do when we find the end of list.

    1. Change f_next() to return "struct list_head *" rather than
    "ftrace_event_field *", and change f_show() to do list_entry().

    This simplifies the code a bit, only f_show() needs to know
    about ftrace_event_field, and f_next() can play with ->prev
    directly.

    2. Change f_next() to not play with ->prev / return inside the
    switch() statement. It can simply set node = head/common_head,
    the prev-or-advance-to-the-next magic below does all the work.

    While at it, f_start() looks overcomplicated too. I don't think
    *pos == 0 makes sense as a separate case, just change this code
    to do "while" instead of "do/while".

    The patch also moves f_start() down, close to f_stop(). This is
    purely cosmetic, just to make the locking added by the next patch
    more clear/visible.
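
    A userspace sketch of the iteration scheme described above, under
    these assumptions:
    - a minimal list_head/list_entry modelled on the kernel's;
    - a hypothetical struct field standing in for struct
      ftrace_event_field;
    - f_start(), f_next() and f_show() that take the two list heads
      directly instead of a struct seq_file.
    It shows the control flow only and does not reproduce the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Minimal circular doubly linked list, modelled on the kernel's list_head. */
struct list_head { struct list_head *next, *prev; };

#define LIST_HEAD_INIT(name) { &(name), &(name) }
#define list_entry(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

static void list_add(struct list_head *item, struct list_head *head)
{
    item->next = head->next;
    item->prev = head;
    head->next->prev = item;
    head->next = item;
}

/* Hypothetical stand-in for struct ftrace_event_field. */
struct field { const char *name; struct list_head link; };

/* Magic positions; small integers are never valid list_head pointers. */
enum { FORMAT_HEADER = 1, FORMAT_FIELD_SEPERATOR = 2, FORMAT_PRINTFMT = 3 };
#define POS(x) ((void *)(uintptr_t)(x))

/* Returns a list node or a magic position; never a struct field. */
static void *f_next(struct list_head *common_head, struct list_head *head,
                    void *v, long long *pos)
{
    struct list_head *node = v;

    (*pos)++;
    switch ((uintptr_t)v) {
    case FORMAT_HEADER:          node = common_head; break;
    case FORMAT_FIELD_SEPERATOR: node = head;        break;
    case FORMAT_PRINTFMT:        return NULL;        /* all done */
    }

    /* Step back; reaching a list head means "advance to the next state". */
    node = node->prev;
    if (node == common_head)
        return POS(FORMAT_FIELD_SEPERATOR);
    if (node == head)
        return POS(FORMAT_PRINTFMT);
    return node;
}

/* "while", not "do/while": *pos == 0 needs no special case. */
static void *f_start(struct list_head *common_head, struct list_head *head,
                     long long *pos)
{
    void *p = POS(FORMAT_HEADER);
    long long l = 0;

    while (p && l < *pos)
        p = f_next(common_head, head, p, &l);
    return p;
}

/* Only f_show() converts a node back into the containing struct. */
static const char *f_show(void *v)
{
    switch ((uintptr_t)v) {
    case FORMAT_HEADER:          return "header";
    case FORMAT_FIELD_SEPERATOR: return "sep";
    case FORMAT_PRINTFMT:        return "fmt";
    }
    return list_entry((struct list_head *)v, struct field, link)->name;
}

/* Drive the iterator the way seq_file would; join items with spaces. */
static void walk(struct list_head *common_head, struct list_head *head,
                 char *out, size_t size)
{
    long long pos = 0;

    assert(size > 0);
    out[0] = '\0';
    for (void *v = f_start(common_head, head, &pos); v;
         v = f_next(common_head, head, v, &pos)) {
        if (out[0] != '\0')
            strncat(out, " ", size - strlen(out) - 1);
        strncat(out, f_show(v), size - strlen(out) - 1);
    }
}
```

    Walking an empty event list yields "header sep fmt". At
    FORMAT_FIELD_SEPERATOR, stepping back from the empty head lands on
    the head itself, so the iterator advances to FORMAT_PRINTFMT instead
    of returning NULL. Fields are added with list_add() (newest first),
    so walking ->prev visits them in insertion order.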

    Link: http://lkml.kernel.org/r/20130718184710.GA4783@redhat.com

    Signed-off-by: Oleg Nesterov
    Signed-off-by: Steven Rostedt

    Oleg Nesterov
     
    The selftests for the function and function graph tracers are defined
    as __init, as they are only executed at boot up. The "tracer" structs
    associated with those tracers are not set up as __init, as they are
    used after boot. To stop section mismatch warnings, those structures
    need to be annotated with __refdata.

    Currently, the tracer structures are defined as __read_mostly, as they
    do not really change. In the future they should be converted to
    consts, but that will take a little work because they have a "next"
    pointer that gets updated when they are registered. That will have to
    wait until the next major release.
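
    A small userspace sketch of why the "next" pointer stands in the way
    of const: registration writes into the structure being registered.
    struct tracer_sketch, trace_types and register_tracer_sketch() are
    hypothetical reductions, not the kernel's struct tracer or
    register_tracer().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, heavily reduced stand-in for the kernel's struct tracer. */
struct tracer_sketch {
    const char *name;
    struct tracer_sketch *next;    /* written at registration time */
};

static struct tracer_sketch *trace_types;   /* registered tracers */

/*
 * Registration links 't' into a singly linked list by writing t->next,
 * so 't' must be writable. A 'static const' tracer could only be passed
 * here by casting away const, and modifying an object defined const is
 * undefined behaviour.
 */
static int register_tracer_sketch(struct tracer_sketch *t)
{
    assert(t != NULL && t->name != NULL);
    for (struct tracer_sketch *p = trace_types; p; p = p->next)
        if (strcmp(p->name, t->name) == 0)
            return -1;             /* name already registered */
    t->next = trace_types;
    trace_types = t;
    return 0;
}
```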

    Link: http://lkml.kernel.org/r/1373596735.17876.84.camel@gandalf.local.home

    Reported-by: kbuild test robot
    Reported-by: Chen Gang
    Signed-off-by: Steven Rostedt

    Steven Rostedt (Red Hat)
     
  • Some error paths did not handle ref counting properly, and some trace files need
    ref counting.
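
    A minimal userspace sketch of the general pattern, assuming a plain
    integer refcount. A reference on the trace instance is taken when a
    file is opened and dropped both on release and on every error path
    after the get. The following are hypothetical:
    - struct trace_array_sketch;
    - ta_get()/ta_put();
    - open_sketch().
    The kernel's trace_array_get()/trace_array_put() are only a loose
    model here. The fail_alloc flag exists just to exercise the error
    path.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-in for a refcounted trace instance. */
struct trace_array_sketch {
    int ref;
};

static void ta_get(struct trace_array_sketch *tr) { tr->ref++; }

static void ta_put(struct trace_array_sketch *tr)
{
    assert(tr->ref > 0);           /* catches an unbalanced put */
    tr->ref--;
}

/* Hypothetical per-open state that needs an allocation. */
struct iter_sketch { struct trace_array_sketch *tr; };

/*
 * open(): pin the instance first, then undo the pin on every error
 * path so that a failed open leaves the refcount unchanged.
 */
static int open_sketch(struct trace_array_sketch *tr, int fail_alloc,
                       struct iter_sketch **out)
{
    struct iter_sketch *it;

    ta_get(tr);
    it = fail_alloc ? NULL : malloc(sizeof(*it));
    if (!it) {
        ta_put(tr);                /* the easy-to-miss error-path put */
        return -ENOMEM;
    }
    it->tr = tr;
    *out = it;
    return 0;
}

/* release(): drop the reference taken in open_sketch(). */
static void release_sketch(struct iter_sketch *it)
{
    ta_put(it->tr);
    free(it);
}
```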

    Link: http://lkml.kernel.org/r/1374171524-11948-1-git-send-email-azl@google.com

    Cc: stable@vger.kernel.org # 3.10
    Cc: Vaibhav Nagarnaik
    Cc: David Sharp
    Cc: Alexander Z Lam
    Signed-off-by: Alexander Z Lam
    Signed-off-by: Steven Rostedt

    Alexander Z Lam
     
  • Remove debugfs directories for tracing instances during creation if an error
    occurs causing the trace_array for that instance to not be added to
    ftrace_trace_arrays. If the directory continues to exist after the error, it
    cannot be removed because the respective trace_array is not in
    ftrace_trace_arrays.
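
    A userspace sketch of the create-then-roll-back pattern. Hypothetical
    in-memory tables stand in for debugfs and for ftrace_trace_arrays.
    create_dir(), remove_dir() and new_instance_sketch() are illustrative
    names, and fail_setup simulates a later setup step failing.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MAX_ITEMS 8

/* Hypothetical stand-ins for debugfs directories and ftrace_trace_arrays. */
static const char *dirs[MAX_ITEMS];
static const char *instances[MAX_ITEMS];
static int ndirs, ninstances;

static bool create_dir(const char *name)
{
    assert(name != NULL);
    if (ndirs == MAX_ITEMS)
        return false;
    dirs[ndirs++] = name;
    return true;
}

static void remove_dir(const char *name)
{
    for (int i = 0; i < ndirs; i++) {
        if (strcmp(dirs[i], name) == 0) {
            dirs[i] = dirs[--ndirs];   /* swap-remove */
            return;
        }
    }
}

/*
 * The instance-removal path only finds instances that are on the list,
 * so if setup fails after the directory exists, remove the directory
 * here; otherwise it would be left behind with no way to delete it.
 */
static int new_instance_sketch(const char *name, bool fail_setup)
{
    if (!create_dir(name))
        return -1;
    if (fail_setup || ninstances == MAX_ITEMS) {
        remove_dir(name);          /* roll back before reporting failure */
        return -1;
    }
    instances[ninstances++] = name;
    return 0;
}
```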

    Link: http://lkml.kernel.org/r/1373502874-1706-2-git-send-email-azl@google.com

    Cc: stable@vger.kernel.org # 3.10
    Cc: Vaibhav Nagarnaik
    Cc: David Sharp
    Cc: Alexander Z Lam
    Signed-off-by: Alexander Z Lam
    Signed-off-by: Steven Rostedt

    Alexander Z Lam
     
    Wait for all running kprobe handlers to finish when a kprobe
    event is disabled, since the caller, trace_remove_event_call(),
    assumes that an event being removed is completely disabled once
    it has been disabled.
    With this change, ftrace can ensure that there are no running
    event handlers after disabling it.
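
    A userspace sketch of "disable, then wait for handlers already
    running", assuming C11 atomics. The kernel change likely relies on
    the kernel's own synchronization primitives (such as RCU-style grace
    periods). The explicit in-flight counter below is a swapped-in
    technique, chosen because it is easy to show in a few lines.
    struct probe_sketch, probe_handler() and disable_probe() are
    hypothetical.

```c
#include <assert.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical probe state; an in-flight counter stands in for RCU. */
struct probe_sketch {
    atomic_bool enabled;
    atomic_int running;            /* handlers currently executing */
    atomic_int hits;               /* events actually recorded */
};

static void probe_init(struct probe_sketch *p)
{
    atomic_init(&p->enabled, true);
    atomic_init(&p->running, 0);
    atomic_init(&p->hits, 0);
}

/* Probe hit path: announce ourselves before checking 'enabled'. */
static void probe_handler(struct probe_sketch *p)
{
    int prev;

    atomic_fetch_add(&p->running, 1);
    if (atomic_load(&p->enabled))
        atomic_fetch_add(&p->hits, 1);   /* "record" the event */
    prev = atomic_fetch_sub(&p->running, 1);
    assert(prev > 0);
}

/*
 * Disable, then wait until no handler is mid-flight. All operations
 * are seq_cst, so a handler that still saw enabled == true had already
 * incremented 'running' before this loop reads it; once the loop ends,
 * such a handler has finished and the event can be torn down.
 */
static void disable_probe(struct probe_sketch *p)
{
    atomic_store(&p->enabled, false);
    while (atomic_load(&p->running) != 0)
        sched_yield();
}
```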

    Link: http://lkml.kernel.org/r/20130709093526.20138.93100.stgit@mhiramat-M0-7522

    Signed-off-by: Masami Hiramatsu
    Signed-off-by: Steven Rostedt

    Masami Hiramatsu
     
  • Every perf_trace_buf_prepare() caller does
    WARN_ONCE(size > PERF_MAX_TRACE_SIZE, message) and "message" is
    almost the same.

    Shift this WARN_ONCE() into perf_trace_buf_prepare(). This changes
    the meaning of _ONCE (the warning now fires once in total rather
    than once per call site), but I think this is fine.

         text    data      bss      dec     hex filename
    - 4947014 2932448 10104832 17984294 1126b26 vmlinux
    + 4948422 2932448 10104832 17985702 11270a6 vmlinux

    on my build.
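
    A userspace sketch of what "changes the meaning of _ONCE" amounts to.
    The kernel's WARN_ONCE() keeps a hidden static flag per expansion
    site; warn_once() below makes that flag an explicit parameter.
    MAX_TRACE_SIZE, the caller functions and buf_prepare_sketch() are
    hypothetical stand-ins for PERF_MAX_TRACE_SIZE, the real callers and
    perf_trace_buf_prepare().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

#define MAX_TRACE_SIZE 2048        /* assumption: stand-in for PERF_MAX_TRACE_SIZE */

static int warnings;               /* number of warnings printed */

/*
 * WARN_ONCE()-like helper. The kernel macro keeps one hidden static
 * flag per expansion; here each call site passes its own flag.
 */
static bool warn_once(bool *warned, bool cond, const char *msg)
{
    assert(warned != NULL);
    if (cond && !*warned) {
        *warned = true;
        warnings++;
        fprintf(stderr, "WARNING: %s\n", msg);
    }
    return cond;
}

/* Before: every caller carries its own check, hence its own flag. */
static int caller_a_before(int size)
{
    static bool warned;
    return warn_once(&warned, size > MAX_TRACE_SIZE, "a: too big") ? -1 : 0;
}

static int caller_b_before(int size)
{
    static bool warned;
    return warn_once(&warned, size > MAX_TRACE_SIZE, "b: too big") ? -1 : 0;
}

/* After: the single check lives in the shared helper. */
static int buf_prepare_sketch(int size)
{
    static bool warned;            /* one flag shared by all callers */
    if (warn_once(&warned, size > MAX_TRACE_SIZE, "buffer too big"))
        return -1;
    return 0;                      /* a real helper would allocate here */
}

static int caller_a_after(int size) { return buf_prepare_sketch(size); }
static int caller_b_after(int size) { return buf_prepare_sketch(size); }
```

    With two oversized callers, the "before" version prints two warnings
    (one per call site). The "after" version prints one in total.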

    Link: http://lkml.kernel.org/r/20130617170211.GA19813@redhat.com

    Acked-by: Peter Zijlstra
    Signed-off-by: Oleg Nesterov
    Signed-off-by: Steven Rostedt

    Oleg Nesterov
     
  • perf_trace_buf_prepare() + perf_trace_buf_submit(head, task => NULL)
    make no sense if hlist_empty(head). Change perf_syscall_enter/exit()
    to check sys_data->{enter,exit}_event->perf_events beforehand.
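
    A minimal sketch of the early-exit check, assuming a reduced hlist
    that has only a "first" pointer. syscall_enter_sketch(),
    buf_prepare() and buf_submit() are hypothetical stand-ins for
    perf_syscall_enter(), perf_trace_buf_prepare() and
    perf_trace_buf_submit(). The point is only that nothing is prepared
    when no event is attached. The same shape applies to the
    perf_ftrace_function_call() change listed below.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, reduced stand-ins for the kernel's hlist types. */
struct hlist_node_sketch { struct hlist_node_sketch *next; };
struct hlist_head_sketch { struct hlist_node_sketch *first; };

static int hlist_empty_sketch(const struct hlist_head_sketch *h)
{
    return h->first == NULL;
}

static int prepared, submitted;    /* counters for illustration only */

static void buf_prepare(void) { prepared++; }
static void buf_submit(void)  { submitted++; }

/*
 * Hypothetical syscall-entry hook: with no task to deliver to, an
 * empty list means the record would go nowhere, so bail out before
 * doing any work.
 */
static void syscall_enter_sketch(const struct hlist_head_sketch *perf_events)
{
    assert(perf_events != NULL);
    if (hlist_empty_sketch(perf_events))
        return;                    /* nothing listens: skip the work */
    buf_prepare();
    buf_submit();
}
```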

    Link: http://lkml.kernel.org/r/20130617170207.GA19806@redhat.com

    Acked-by: Peter Zijlstra
    Signed-off-by: Oleg Nesterov
    Signed-off-by: Steven Rostedt

    Oleg Nesterov
     
  • perf_trace_buf_prepare() + perf_trace_buf_submit(head, task => NULL)
    make no sense if hlist_empty(head). Change perf_ftrace_function_call()
    to check event_function.perf_events beforehand.

    Link: http://lkml.kernel.org/r/20130617170204.GA19803@redhat.com

    Acked-by: Peter Zijlstra
    Signed-off-by: Oleg Nesterov
    Signed-off-by: Steven Rostedt

    Oleg Nesterov
     
    There were some mismatches between the comments and the real
    function names; update them.

    This patch also adds descriptions for some missing function
    arguments.

    Link: http://lkml.kernel.org/r/51E3B3B2.4080307@huawei.com

    Signed-off-by: zhangwei(Jovi)
    Signed-off-by: Steven Rostedt

    zhangwei(Jovi)
     
    For strings without format specifiers, use trace_seq_puts()
    or trace_seq_putc().
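
    A reduced userspace sketch of the three output styles, assuming a
    fixed 64-byte buffer that stays NUL-terminated. struct seq_sketch and
    the seq_*_sketch() functions are hypothetical. They mirror only the
    roles of trace_seq_printf(), trace_seq_puts() and trace_seq_putc(),
    not their kernel implementations.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical, reduced stand-in for struct trace_seq. */
struct seq_sketch {
    char buffer[64];
    size_t len;                    /* buffer[len] is always '\0' */
};

/* Formatted output: only worth it when there is something to format. */
static int seq_printf_sketch(struct seq_sketch *s, const char *fmt, ...)
{
    size_t room = sizeof(s->buffer) - s->len;
    va_list ap;
    int n;

    va_start(ap, fmt);
    n = vsnprintf(s->buffer + s->len, room, fmt, ap);
    va_end(ap);
    if (n < 0 || (size_t)n >= room) {
        s->buffer[s->len] = '\0';  /* discard the partial write */
        return 0;
    }
    s->len += (size_t)n;
    return 1;
}

/* Plain string: copied verbatim, so a '%' in it is harmless. */
static int seq_puts_sketch(struct seq_sketch *s, const char *str)
{
    size_t n = strlen(str);

    if (n >= sizeof(s->buffer) - s->len)
        return 0;
    memcpy(s->buffer + s->len, str, n + 1);   /* include the '\0' */
    s->len += n;
    return 1;
}

/* One character, passed as a char such as ' ', not a string like " ". */
static int seq_putc_sketch(struct seq_sketch *s, unsigned char c)
{
    assert(s->len < sizeof(s->buffer));
    if (s->len + 1 >= sizeof(s->buffer))
        return 0;
    s->buffer[s->len++] = (char)c;
    s->buffer[s->len] = '\0';
    return 1;
}
```

    A string with no conversions goes through puts, which also makes a
    literal '%' harmless. putc takes a character constant, which matches
    the fixup note changing " " to ' '.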

    Link: http://lkml.kernel.org/r/51E3B3AC.1000605@huawei.com

    Signed-off-by: zhangwei(Jovi)
    [ fixed a trace_seq_putc(s, " ") to trace_seq_putc(s, ' ') ]
    Signed-off-by: Steven Rostedt

    zhangwei(Jovi)
     
  • Pull driver core patches from Greg KH:
    "Here are some driver core patches for 3.11-rc2. They aren't really
    bugfixes, but a bunch of new helper macros for drivers to properly
    create attribute groups, which drivers and subsystems need to fix up a
    ton of race issues with incorrectly creating sysfs files (binary and
    normal) after userspace has been told that the device is present.

    Also here is the ability to create binary files as attribute groups,
    to solve that race condition, which was impossible to do before this,
    so that's my fault the drivers were broken.

    The majority of the .c changes are indenting and moving code around a
    bit. It affects no existing code, but allows the large backlog of 70+
    patches that I already have created to start flowing into the
    different subtrees, instead of having to live in my driver-core tree,
    causing merge nightmares in linux-next for the next few months.

    These were finalized too late for the -rc1 merge window, which is why
    they didn't make that pull request; testing and review from
    others didn't happen until a few weeks ago, and then there's the whole
    distraction of the past few days, which prevented these from getting
    to you sooner, sorry about that.

    Oh, and there's a bugfix for the documentation build warning in here
    as well. All of these have been in linux-next this week, with no
    reported problems"
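
    A simplified userspace sketch of the attribute-group idea. Every
    attribute is described up front in a NULL-terminated array and
    wrapped in a group with an ATTRIBUTE_GROUPS()-style macro. The
    registration path then creates all files before the device is
    announced. Assumptions:
    - the structs are reduced stand-ins (real drivers usually wrap
      struct attribute inside struct device_attribute);
    - the macro mirrors the general shape of the kernel's
      ATTRIBUTE_GROUPS() rather than its exact definition;
    - device_add_sketch() is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Reduced stand-ins for the kernel's sysfs structures. */
struct attribute { const char *name; unsigned short mode; };
struct attribute_group { struct attribute **attrs; };

/* Same general shape as the kernel macro: foo_attrs -> foo_group, foo_groups. */
#define ATTRIBUTE_GROUPS(name)                                  \
    static const struct attribute_group name##_group = {        \
        .attrs = name##_attrs,                                  \
    };                                                          \
    static const struct attribute_group *name##_groups[] = {    \
        &name##_group,                                          \
        NULL,                                                   \
    }

/* A driver's attributes, declared statically. */
static struct attribute dev_attr_serial = { "serial", 0444 };
static struct attribute dev_attr_enable = { "enable", 0644 };
static struct attribute *foo_attrs[] = {
    &dev_attr_serial,
    &dev_attr_enable,
    NULL,
};
ATTRIBUTE_GROUPS(foo);

static int files_created;          /* illustration only */
static int announced_with;         /* files present when "announced" */

/*
 * Hypothetical registration path: create every file from every group
 * first, and only then announce the device, so userspace never sees
 * the device before its attributes exist.
 */
static void device_add_sketch(const struct attribute_group **groups)
{
    assert(groups != NULL);
    for (; *groups; groups++)
        for (struct attribute **a = (*groups)->attrs; *a; a++) {
            printf("create %s (mode %o)\n", (*a)->name, (unsigned)(*a)->mode);
            files_created++;
        }
    announced_with = files_created;  /* stand-in for the "add" uevent */
}
```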

    * tag 'driver-core-3.11-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
    driver-core: fix new kernel-doc warning in base/platform.c
    sysfs: use file mode defines from stat.h
    sysfs: add more helper macro's for (bin_)attribute(_groups)
    driver core: add default groups to struct class
    driver core: Introduce device_create_groups
    sysfs: prevent warning when only using binary attributes
    sysfs: add support for binary attributes in groups
    driver core: device.h: add RW and RO attribute macros
    sysfs.h: add BIN_ATTR macro
    sysfs.h: add ATTRIBUTE_GROUPS() macro
    sysfs.h: add __ATTR_RW() macro

    Linus Torvalds