1. 15 Jun, 2009 8 commits
    • Paul Mackerras's avatar
      perf_counter: powerpc: Fix two compile warnings · 90c8f954
      Paul Mackerras authored
      
      This fixes a couple of compile warnings that crept into the powerpc
      perf_counter code recently:
      
         CC      arch/powerpc/kernel/perf_counter.o
       arch/powerpc/kernel/perf_counter.c: In function 'record_and_restart':
       arch/powerpc/kernel/perf_counter.c:1016: warning: unused variable 'addr'
       arch/powerpc/kernel/perf_counter.c: In function 'hw_perf_counter_init':
       arch/powerpc/kernel/perf_counter.c:891: warning: 'ev' may be used uninitialized in this function
      
      Stephen Rothwell reported this against linux-next as well.
      Reported-by: default avatarStephen Rothwell <sfr@canb.auug.org.au>
      Signed-off-by: default avatarPaul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <18998.12884.787039.22202@cargo.ozlabs.ibm.com>
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
      90c8f954
    • Ingo Molnar's avatar
      perf report: Add per system call overhead histogram · 3dfabc74
      Ingo Molnar authored
      
      Take advantage of call-graph percounter sampling/recording to
      display a non-trivial histogram: the true, collapsed/summarized
      cost measurement, on a per system call total overhead basis:
      
       aldebaran:~/linux/linux/tools/perf> ./perf record -g -a -f ~/hackbench 10
       aldebaran:~/linux/linux/tools/perf> ./perf report -s symbol --syscalls | head -10
       #
       # (3536 samples)
       #
       # Overhead  Symbol
       # ........  ......
       #
           40.75%  [k] sys_write
           40.21%  [k] sys_read
            4.44%  [k] do_nmi
       ...
      
      This is done by accounting each (reliable) call-chain that chains back
      to a given system call to that system call function.
      
      [ So in the above example we can see that hackbench spends about 40% of
        its total time somewhere in sys_write() and 40% somewhere in
        sys_read(), the rest of the time is spent in user-space. The time
        is not spent in sys_write() _itself_ but in one of its many child
        functions. ]
      
      Or, a recording of a (source files are already in the page-cache) kernel build:
      
       $ perf record -g -m 512 -f -- make -j32 kernel
       $ perf report -s s --syscalls | grep '\[k\]' | grep -v nmi
      
           4.14%  [k] do_page_fault
           1.20%  [k] sys_write
           1.10%  [k] sys_open
           0.63%  [k] sys_exit_group
           0.48%  [k] smp_apic_timer_interrupt
           0.37%  [k] sys_read
           0.37%  [k] sys_execve
           0.20%  [k] sys_mmap
           0.18%  [k] sys_close
           0.14%  [k] sys_munmap
           0.13%  [k] sys_poll
           0.09%  [k] sys_newstat
           0.07%  [k] sys_clone
           0.06%  [k] sys_newfstat
           0.05%  [k] sys_access
           0.05%  [k] schedule
      
      Shows the true total cost of each syscall variant that gets used
      during a kernel build. This profile reveals it that pagefaults are
      the costliest, followed by read()/write().
      
      An interesting detail: timer interrupts cost 0.5% - or 0.5 seconds
      per 100 seconds of kernel build-time. (this was done with HZ=1000)
      
      The summary is done in 'perf report', i.e. in the post-processing
      stage - so once we have a good call-graph recording, this type of
      non-trivial high-level analysis becomes possible.
      
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Pekka Enberg <penberg@cs.helsinki.fi>
      LKML-Reference: <new-submission>
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
      3dfabc74
    • Peter Zijlstra's avatar
      perf_counter: x86: Fix call-chain support to use NMI-safe methods · 74193ef0
      Peter Zijlstra authored
      
      __copy_from_user_inatomic() isn't NMI safe in that it can trigger
      the page fault handler which is another trap and its return path
      invokes IRET which will also close the NMI context.
      
      Therefore use a GUP based approach to copy the stack frames over.
      
      We tried an alternative solution as well: we used a forward ported
      version of Mathieu Desnoyers's "NMI safe INT3 and Page Fault" patch
      that modifies the exception return path to use an open-coded IRET with
      explicit stack unrolling and TF checking.
      
      This didnt work as it interacted with faulting user-space instructions,
      causing them not to restart properly, which corrupts user-space
      registers.
      
      Solving that would probably involve disassembling those instructions
      and backtracing the RIP. But even without that, the code was deemed
      rather complex to the already non-trivial x86 entry assembly code,
      so instead we went for this GUP based method that does a
      software-walk of the pagetables.
      Signed-off-by: default avatarPeter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Nick Piggin <npiggin@suse.de>
      Cc: Pekka Enberg <penberg@cs.helsinki.fi>
      Cc: Vegard Nossum <vegard.nossum@gmail.com>
      Cc: Jeremy Fitzhardinge <jeremy@goop.org>
      Cc: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      LKML-Reference: <new-submission>
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
      74193ef0
    • Peter Zijlstra's avatar
      x86: Add NMI types for kmap_atomic · 3ff0141a
      Peter Zijlstra authored
      
      Two new kmap_atomic slots for NMI context. And teach pte_offset_map()
      about NMI context.
      Signed-off-by: default avatarPeter Zijlstra <a.p.zijlstra@chello.nl>
      CC: Nick Piggin <npiggin@suse.de>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
      3ff0141a
    • Peter Zijlstra's avatar
      x86, mm: Add __get_user_pages_fast() · 465a454f
      Peter Zijlstra authored
      
      Introduce a gup_fast() variant which is usable from IRQ/NMI context.
      Signed-off-by: default avatarPeter Zijlstra <a.p.zijlstra@chello.nl>
      CC: Nick Piggin <npiggin@suse.de>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      LKML-Reference: <new-submission>
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
      465a454f
    • Peter Zijlstra's avatar
      perf_counter: Fix ctx->mutex vs counter->mutex inversion · 75f937f2
      Peter Zijlstra authored
      
      Simon triggered a lockdep inversion report about us taking ctx->mutex
      vs counter->mutex in inverse orders. Fix that up.
      Reported-by: default avatarSimon Holm Thøgersen <odie@cs.aau.dk>
      Tested-by: default avatarSimon Holm Thøgersen <odie@cs.aau.dk>
      Signed-off-by: default avatarPeter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      LKML-Reference: <new-submission>
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
      75f937f2
    • Ingo Molnar's avatar
      perf record: Fix fast task-exit race · 613d8602
      Ingo Molnar authored
      
      Recording with -a (or with -p) can race with tasks going away:
      
         couldn't open /proc/8440/maps
      
      Causing an early exit() and no recording done.
      
      Do not abort the recording session - instead just skip that task.
      
      Also, only print the warnings under -v.
      
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      LKML-Reference: <new-submission>
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
      613d8602
    • Ingo Molnar's avatar
      perf_counter, x86: Fix kernel-space call-chains · 038e836e
      Ingo Molnar authored
      
      Kernel-space call-chains were trimmed at the first entry because
      we never processed anything beyond the first stack context.
      
      Allow the backtrace to jump from NMI to IRQ stack then to task stack
      and finally user-space stack.
      
      Also calculate the stack and bp variables correctly so that the
      stack walker does not exit early.
      
      We can get deep traces as a result, visible in perf report -D output:
      
      0x32af0 [0xe0]: PERF_EVENT (IP, 5): 15134: 0xffffffff815225fd period: 1
      ... chain: u:2, k:22, nr:24
      .....  0: 0xffffffff815225fd
      .....  1: 0xffffffff810ac51c
      .....  2: 0xffffffff81018e29
      .....  3: 0xffffffff81523939
      .....  4: 0xffffffff81524b8f
      .....  5: 0xffffffff81524bd9
      .....  6: 0xffffffff8105e498
      .....  7: 0xffffffff8152315a
      .....  8: 0xffffffff81522c3a
      .....  9: 0xffffffff810d9b74
      ..... 10: 0xffffffff810dbeec
      ..... 11: 0xffffffff810dc3fb
      
      This is a 22-entries kernel-space chain.
      
      (We still only record reliable stack entries.)
      
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      LKML-Reference: <new-submission>
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
      038e836e
  2. 14 Jun, 2009 3 commits
    • Ingo Molnar's avatar
      perf_counter, x86: Fix call-chain walking · 5a6cec3a
      Ingo Molnar authored
      
      Fix the ptregs variant when we hit user-mode tasks.
      
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Pekka Enberg <penberg@cs.helsinki.fi>
      Cc: Arjan van de Ven <arjan@infradead.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      LKML-Reference: <new-submission>
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
      5a6cec3a
    • Ingo Molnar's avatar
      perf record/report: Add call graph / call chain profiling · 3efa1cc9
      Ingo Molnar authored
      
      Add the first steps of call-graph profiling:
      
       - add the -c (--call-graph) option to perf record
       - parse the call-graph record and printout out under -D (--dump-trace)
      
      The call-graph data is not put into the histogram yet, but it
      can be seen that it's being processed correctly:
      
      0x3ce0 [0x38]: event: 35
      .
      . ... raw event: size 56 bytes
      .  0000:  23 00 00 00 05 00 38 00 d4 df 0e 81 ff ff ff ff  #.....8........
      .  0010:  60 0b 00 00 60 0b 00 00 03 00 00 00 01 00 02 00  `...`..........
      .  0020:  d4 df 0e 81 ff ff ff ff a0 61 ed 41 36 00 00 00  .........a.A6..
      .  0030:  04 92 e6 41 36 00 00 00                          .a.A6..
      .
      0x3ce0 [0x38]: PERF_EVENT (IP, 5): 2912: 0xffffffff810edfd4 period: 1
      ... chain: u:2, k:1, nr:3
      .....  0: 0xffffffff810edfd4
      .....  1: 0x3641ed61a0
      .....  2: 0x3641e69204
       ... thread: perf:2912
       ...... dso: [kernel]
      
      This shows a 3-entry call-graph: with 1 kernel-space and two user-space
      entries
      
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Pekka Enberg <penberg@cs.helsinki.fi>
      Cc: Arjan van de Ven <arjan@infradead.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      LKML-Reference: <new-submission>
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
      3efa1cc9
    • Ingo Molnar's avatar
      perf report: Print out raw events in hexa · 8465b050
      Ingo Molnar authored
      
      Print out events in hexa dump format, when -D is specified:
      
      0x4868 [0x48]: event: 1
      .
      . ... raw event: size 72 bytes
      .  0000:  01 00 00 00 00 00 48 00 d4 72 00 00 d4 72 00 00  ......H..r...r.
      .  0010:  00 00 40 f2 3e 00 00 00 00 30 01 00 00 00 00 00  ..@.>....0.....
      .  0020:  00 00 00 00 00 00 00 00 2f 75 73 72 2f 6c 69 62  ......../usr/li
      .  0030:  36 34 2f 6c 69 62 65 6c 66 2d 30 2e 31 34 31 2e  64/libelf-0.141
      .  0040:  73 6f 00 00 00 00 00 00                          f-0.141
      .
      0x4868 [0x48]: PERF_EVENT_MMAP 29396: [0x3ef2400000(0x13000) @ (nil)]: /usr/lib64/libelf-0.141.so
      
      This helps the debugging of mis-parsing of data files, and helps
      the addition of new sample/trace formats.
      
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      LKML-Reference: <new-submission>
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
      8465b050
  3. 13 Jun, 2009 10 commits
    • Frederic Weisbecker's avatar
      perf annotate: Fixes for filename:line displays · c17c2db1
      Frederic Weisbecker authored
      
      - fix addr2line on userspace binary: don't only check kernel image.
      - fix string allocation size for path: missing ending null char room
      - fix overflow in symbol extra info
      Reported-by: default avatarIngo Molnar <mingo@elte.hu>
      Signed-off-by: default avatarFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      LKML-Reference: <1244907563-7820-1-git-send-email-fweisbec@gmail.com>
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
      c17c2db1
    • Ingo Molnar's avatar
      perf stat: Enable raw data to be printed · ef281a19
      Ingo Molnar authored
      
      If -vv (very verbose) is specified, print out raw data
      in the following format:
      
      $ perf stat -vv -r 3 ./loop_1b_instructions
      
      [ perf stat: executing run #1 ... ]
      [ perf stat: executing run #2 ... ]
      [ perf stat: executing run #3 ... ]
      
      debug:              runtime[0]: 235871872
      debug:             walltime[0]: 236646752
      debug:       runtime_cycles[0]: 755150182
      debug:            counter/0[0]: 235871872
      debug:            counter/1[0]: 235871872
      debug:            counter/2[0]: 235871872
      debug:               scaled[0]: 0
      debug:            counter/0[1]: 2
      debug:            counter/1[1]: 235870662
      debug:            counter/2[1]: 235870662
      debug:               scaled[1]: 0
      debug:            counter/0[2]: 1
      debug:            counter/1[2]: 235870437
      debug:            counter/2[2]: 235870437
      debug:               scaled[2]: 0
      debug:            counter/0[3]: 140
      debug:            counter/1[3]: 235870298
      debug:            counter/2[3]: 235870298
      debug:               scaled[3]: 0
      debug:            counter/0[4]: 755150182
      debug:            counter/1[4]: 235870145
      debug:            counter/2[4]: 235870145
      debug:               scaled[4]: 0
      debug:            counter/0[5]: 1001411258
      debug:            counter/1[5]: 235868838
      debug:            counter/2[5]: 235868838
      debug:               scaled[5]: 0
      debug:            counter/0[6]: 27897
      debug:            counter/1[6]: 235868560
      debug:            counter/2[6]: 235868560
      debug:               scaled[6]: 0
      debug:            counter/0[7]: 2910
      debug:            counter/1[7]: 235868151
      debug:            counter/2[7]: 235868151
      debug:               scaled[7]: 0
      debug:              runtime[0]: 235980257
      debug:             walltime[0]: 236770942
      debug:       runtime_cycles[0]: 755114546
      debug:            counter/0[0]: 235980257
      debug:            counter/1[0]: 235980257
      debug:            counter/2[0]: 235980257
      debug:               scaled[0]: 0
      debug:            counter/0[1]: 3
      debug:            counter/1[1]: 235980049
      debug:            counter/2[1]: 235980049
      debug:               scaled[1]: 0
      debug:            counter/0[2]: 1
      debug:            counter/1[2]: 235979907
      debug:            counter/2[2]: 235979907
      debug:               scaled[2]: 0
      debug:            counter/0[3]: 135
      debug:            counter/1[3]: 235979780
      debug:            counter/2[3]: 235979780
      debug:               scaled[3]: 0
      debug:            counter/0[4]: 755114546
      debug:            counter/1[4]: 235979652
      debug:            counter/2[4]: 235979652
      debug:               scaled[4]: 0
      debug:            counter/0[5]: 1001439771
      debug:            counter/1[5]: 235979304
      debug:            counter/2[5]: 235979304
      debug:               scaled[5]: 0
      debug:            counter/0[6]: 23723
      debug:            counter/1[6]: 235979050
      debug:            counter/2[6]: 235979050
      debug:               scaled[6]: 0
      debug:            counter/0[7]: 2213
      debug:            counter/1[7]: 235978820
      debug:            counter/2[7]: 235978820
      debug:               scaled[7]: 0
      debug:              runtime[0]: 235888002
      debug:             walltime[0]: 236700533
      debug:       runtime_cycles[0]: 754881504
      debug:            counter/0[0]: 235888002
      debug:            counter/1[0]: 235888002
      debug:            counter/2[0]: 235888002
      debug:               scaled[0]: 0
      debug:            counter/0[1]: 2
      debug:            counter/1[1]: 235887793
      debug:            counter/2[1]: 235887793
      debug:               scaled[1]: 0
      debug:            counter/0[2]: 1
      debug:            counter/1[2]: 235887645
      debug:            counter/2[2]: 235887645
      debug:               scaled[2]: 0
      debug:            counter/0[3]: 135
      debug:            counter/1[3]: 235887499
      debug:            counter/2[3]: 235887499
      debug:               scaled[3]: 0
      debug:            counter/0[4]: 754881504
      debug:            counter/1[4]: 235887368
      debug:            counter/2[4]: 235887368
      debug:               scaled[4]: 0
      debug:            counter/0[5]: 1001401731
      debug:            counter/1[5]: 235887024
      debug:            counter/2[5]: 235887024
      debug:               scaled[5]: 0
      debug:            counter/0[6]: 24212
      debug:            counter/1[6]: 235886786
      debug:            counter/2[6]: 235886786
      debug:               scaled[6]: 0
      debug:            counter/0[7]: 1824
      debug:            counter/1[7]: 235886560
      debug:            counter/2[7]: 235886560
      debug:               scaled[7]: 0
      
       Performance counter stats for '/home/mingo/loop_1b_instructions' (3 runs):
      
           235.913377  task-clock-msecs     #      0.997 CPUs    ( +-   0.011% )
                    2  context-switches     #      0.000 M/sec   ( +-   0.000% )
                    1  CPU-migrations       #      0.000 M/sec   ( +-   0.000% )
                  136  page-faults          #      0.001 M/sec   ( +-   0.730% )
            755048744  cycles               #   3200.534 M/sec   ( +-   0.009% )
           1001417586  instructions         #      1.326 IPC     ( +-   0.001% )
                25277  cache-references     #      0.107 M/sec   ( +-   3.988% )
                 2315  cache-misses         #      0.010 M/sec   ( +-   9.845% )
      
          0.236706075  seconds time elapsed.
      
      This allows the summary stats to be validated.
      
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      LKML-Reference: <new-submission>
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
      ef281a19
    • Ingo Molnar's avatar
      perf stat: Add feature to run and measure a command multiple times · 42202dd5
      Ingo Molnar authored
      
      Add the --repeat <n> feature to perf stat, which repeats a given
      command up to a 100 times, collects the stats and calculates an
      average and a stddev.
      
      For example, the following oneliner 'perf stat' command runs hackbench
      5 times and prints a tabulated result of all metrics, with averages
      and noise levels (in percentage) printed:
      
       aldebaran:~/linux/linux/tools/perf> ./perf stat --repeat 5 ~/hackbench 10
       Time: 0.117
       Time: 0.108
       Time: 0.089
       Time: 0.088
       Time: 0.100
      
       Performance counter stats for '/home/mingo/hackbench 10' (5 runs):
      
          1243.989586  task-clock-msecs     #     10.460 CPUs    ( +-   4.720% )
                47706  context-switches     #      0.038 M/sec   ( +-  19.706% )
                  387  CPU-migrations       #      0.000 M/sec   ( +-   3.608% )
                17793  page-faults          #      0.014 M/sec   ( +-   0.354% )
           3770941606  cycles               #   3031.329 M/sec   ( +-   4.621% )
           1566372416  instructions         #      0.415 IPC     ( +-   2.703% )
             16783421  cache-references     #     13.492 M/sec   ( +-   5.202% )
              7128590  cache-misses         #      5.730 M/sec   ( +-   7.420% )
      
          0.118924455  seconds time elapsed.
      
      The goal of this feature is to allow the reliance on these accurate
      statistics and to know how many times a command has to be repeated
      for the noise to go down to an acceptable level.
      
      (The -v option can be used to see a line printed out as each run progresses.)
      
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      LKML-Reference: <new-submission>
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
      42202dd5
    • Ingo Molnar's avatar
      perf stat: Reorganize output · 44175b6f
      Ingo Molnar authored
      
       - use IPC for the instruction normalization output
       - CPUs for the CPU utilization factor value.
       - print out time elapsed like the other rows
       - tidy up the task-clocks/cpu-clocks printout
      
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      LKML-Reference: <new-submission>
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
      44175b6f
    • Jaswinder Singh Rajput's avatar
      perf_counter, x86: Update AMD hw caching related event table · f4db43a3
      Jaswinder Singh Rajput authored
      
      All AMD models share the same hw caching related event table.
      
      Also complete the table with more events.
      Signed-off-by: default avatarJaswinder Singh Rajput <jaswinderrajput@gmail.com>
      Cc: Robert Richter <robert.richter@amd.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      LKML-Reference: <1244835381.2802.2.camel@ht.satnam>
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
      f4db43a3
    • Jaswinder Singh Rajput's avatar
      perf_counter, x86: Check old-AMD performance monitoring support · 4d2be126
      Jaswinder Singh Rajput authored
      
      AMD supports performance monitoring start from K7 (i.e. family 6),
      so disable it for earlier AMD CPUs.
      Signed-off-by: default avatarJaswinder Singh Rajput <jaswinderrajput@gmail.com>
      Cc: Robert Richter <robert.richter@amd.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      LKML-Reference: <1244714289.6923.0.camel@ht.satnam>
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
      4d2be126
    • Marti Raudsepp's avatar
      perf_counter: Fix stack corruption in perf_read_hw · d5e8da64
      Marti Raudsepp authored
      
      With PERF_FORMAT_ID, perf_read_hw now needs space for up to 4 values.
      
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      LKML-Reference: <new-submission>
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
      d5e8da64
    • Paul Mackerras's avatar
      perf_counter: Fix atomic_set vs. atomic64_t type mismatch · 87847b8f
      Paul Mackerras authored
      
      Using atomic_set on an atomic64_t variable gives a compiler
      warning on powerpc, and won't give the desired result at runtime.
      This fixes an instance of this error in the perf_counter code.
      Signed-off-by: default avatarPaul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <18995.20490.979429.244883@cargo.ozlabs.ibm.com>
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
      87847b8f
    • Frederic Weisbecker's avatar
      perf annotate: Print a sorted summary of annotated overhead lines · 971738f3
      Frederic Weisbecker authored
      
      It's can be very annoying to scroll down perf annotated output
      until we find relevant overhead.
      
      Using the -l option, you can now have a small summary sorted per
      overhead in the beginning of the output.
      
      Example:
      
      ./perf annotate -l -k ../../vmlinux -s __lock_acquire
      
      Sorted summary for file ../../vmlinux
      ----------------------------------------------
      
         12.04 /home/fweisbec/linux/linux-2.6-tip/kernel/lockdep.c:1653
          4.61 /home/fweisbec/linux/linux-2.6-tip/kernel/lockdep.c:1740
          3.77 /home/fweisbec/linux/linux-2.6-tip/kernel/lockdep.c:1775
          3.56 /home/fweisbec/linux/linux-2.6-tip/kernel/lockdep.c:1653
          2.93 /home/fweisbec/linux/linux-2.6-tip/arch/x86/include/asm/irqflags.h:15
          2.83 /home/fweisbec/linux/linux-2.6-tip/kernel/lockdep.c:2545
          2.30 /home/fweisbec/linux/linux-2.6-tip/kernel/lockdep.c:2594
          2.20 /home/fweisbec/linux/linux-2.6-tip/kernel/lockdep.c:2388
          2.20 /home/fweisbec/linux/linux-2.6-tip/kernel/lockdep.c:730
          2.09 /home/fweisbec/linux/linux-2.6-tip/kernel/lockdep.c:730
          2.09 /home/fweisbec/linux/linux-2.6-tip/kernel/lockdep.c:138
          1.88 /home/fweisbec/linux/linux-2.6-tip/kernel/lockdep.c:2548
          1.47 /home/fweisbec/linux/linux-2.6-tip/arch/x86/include/asm/irqflags.h:15
          1.36 /home/fweisbec/linux/linux-2.6-tip/kernel/lockdep.c:2594
          1.36 /home/fweisbec/linux/linux-2.6-tip/kernel/lockdep.c:730
          1.26 /home/fweisbec/linux/linux-2.6-tip/kernel/lockdep.c:1654
          1.26 /home/fweisbec/linux/linux-2.6-tip/kernel/lockdep.c:1653
          1.15 /home/fweisbec/linux/linux-2.6-tip/kernel/lockdep.c:2592
          1.15 /home/fweisbec/linux/linux-2.6-tip/kernel/lockdep.c:1740
          1.15 /home/fweisbec/linux/linux-2.6-tip/kernel/lockdep.c:1740
      
      [...]
      
      Only overhead over 0.5% are summarized.
      Signed-off-by: default avatarFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      LKML-Reference: <1244844682-12928-2-git-send-email-fweisbec@gmail.com>
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
      971738f3
    • Frederic Weisbecker's avatar
      perf annotate: Print the filename:line for annotated colored lines · 301406b9
      Frederic Weisbecker authored
      
      When we have a colored line in perf annotate, ie a middle/high
      overhead one, it's sometimes useful to get the matching line
      and filename from the source file, especially this path prepares
      to another subsequent one which will print a sorted summary of
      midle/high overhead lines in the beginning of the output.
      
      Filename:Lines have the same color than the concerned ip lines.
      
      It can be slow because it relies on addr2line. We could also
      use objdump with -l but that implies we would have to bufferize
      objdump output and parse it to filter the relevant lines since
      we want to print a sorted summary in the beginning.
      Signed-off-by: default avatarFrederic Weisbecker <fweisbec@gmail.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Paul Mackerras <paulus@samba.org>
      LKML-Reference: <1244844682-12928-1-git-send-email-fweisbec@gmail.com>
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
      301406b9
  4. 12 Jun, 2009 19 commits