Linux introspection and SystemTap

December 3, 2009

Modern operating system kernels provide the means for introspection, the ability to dynamically inspect the kernel to understand its behavior. This behavior can reveal kernel problems as well as performance bottlenecks. With this information, you can tune or modify the kernel to avoid failure conditions. This article explores SystemTap, an open source infrastructure that provides this kind of dynamic introspection for the Linux® kernel.

SystemTap is a dynamic method of monitoring and tracing the operation of a running Linux kernel. The key word is dynamic: rather than building a special kernel with instrumentation, SystemTap lets you install that instrumentation dynamically at run time. It does this through an application programming interface (API) called Kprobes, which this article explores. We begin with a look at some earlier kernel tracing approaches, then dig into the SystemTap architecture and its use.

Kernel tracing

SystemTap is similar to an older technology called DTrace, which originated in the Sun Solaris operating system. In DTrace, developers write scripts in the D programming language (a subset of C, modified to support tracing behavior). A DTrace script contains a number of probes and associated actions, which run when a probe "fires". A probe can represent something as simple as a system call or something more complex, such as the execution of a specific line of code. Listing 1 shows a simple DTrace script that counts the number of system calls made by each process (note the use of an associative array to tie counts to processes). The script consists of the probe (which fires when a system call is made) and the action (the script body that runs in response).

Listing 1. A simple DTrace script to count system calls per process


syscall:::entry { @num[pid,execname] = count(); }


DTrace was one of the most compelling parts of Solaris, so it is not surprising that it has been developed for other operating systems. DTrace was released under the Common Development and Distribution License (CDDL) and has been ported to the FreeBSD operating system.

Another useful kernel tracing tool is ProbeVue, which IBM developed for the IBM® AIX® operating system, version 6.1. You can use ProbeVue to examine system behavior and performance and to obtain detailed information about a specific process. The tool does this dynamically, using a standard kernel. Listing 2 shows a sample ProbeVue script that reports which process invokes the sync system call.

Listing 2. A simple ProbeVue script to indicate which process invokes sync

@@syscall:*:sync:entry
{ printf( "sync() syscall invoked by process ID %d\n", __pid ); }

Given how useful DTrace and ProbeVue are in their respective operating systems, an open source project to bring this capability to Linux was inevitable. SystemTap has been in development since 2005 and provides functionality similar to DTrace and ProbeVue. It has been developed by a community that includes Red Hat, Intel, Hitachi, and IBM.

These solutions are functionally similar: each uses probes and associated action scripts that run when a probe fires. Now let's look at installing SystemTap, and then explore its architecture and use.


Installing SystemTap

Depending on your distribution and kernel, you may need to install nothing more than SystemTap itself. In other cases, a debug kernel image is required. This section walks through installing SystemTap on Ubuntu version 8.10 (Intrepid Ibex), although these steps are not representative of every SystemTap installation. The reference section points to more information on installing SystemTap on other distributions and versions.

For most users, installing SystemTap is simple. On Ubuntu, use apt-get:

$ sudo apt-get install systemtap

After the installation is complete, you can test the kernel to see if it supports SystemTap. To do this, use the following simple command line script:

$ sudo stap -ve 'probe begin { log("hello world") exit() }'

If the script works, you will see "hello world" on standard output (stdout). If you do not see it, more work is needed. Ubuntu 8.10 requires a debug kernel image. Normally you could obtain it with apt-get through the package linux-image-debug-generic, but here it cannot be fetched directly with apt-get, so download the package and install it with dpkg instead. You can download the generic debug image package and install it as follows:

$ wget

$ sudo dpkg -i linux-image-debug-2.6.27-14-generic_2.6.27-14.39_i386.ddeb

The generic debug image is now installed. Ubuntu 8.10 requires one more step: the SystemTap distribution has a problem that you can fix easily by modifying the SystemTap source. See the reference section for how to update the runtime time.c file.

If you use a custom kernel, make sure the kernel options CONFIG_RELAY, CONFIG_DEBUG_FS, CONFIG_DEBUG_INFO, and CONFIG_KPROBES are enabled.
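The check for these four options can be sketched in a few lines of Python. This is only an illustration: the helper name missing_options is hypothetical, and it assumes you already have the text of your kernel's build configuration (many distributions ship it as /boot/config-<release>, but that location is an assumption, not something SystemTap requires).

```python
# Kernel options the article lists as required for SystemTap.
REQUIRED_OPTIONS = (
    "CONFIG_RELAY",
    "CONFIG_DEBUG_FS",
    "CONFIG_DEBUG_INFO",
    "CONFIG_KPROBES",
)


def missing_options(config_text, required=REQUIRED_OPTIONS):
    """Return the required options that are not set to y or m (hypothetical helper)."""
    enabled = set()
    for line in config_text.splitlines():
        line = line.strip()
        # Disabled options appear as comments: "# CONFIG_FOO is not set".
        if not line or line.startswith("#"):
            continue
        name, _, value = line.partition("=")
        if value in ("y", "m"):
            enabled.add(name)
    return [opt for opt in required if opt not in enabled]


# Made-up configuration fragment, for illustration only.
sample = """\
CONFIG_RELAY=y
CONFIG_DEBUG_FS=y
# CONFIG_DEBUG_INFO is not set
CONFIG_KPROBES=y
"""
print(missing_options(sample))  # ['CONFIG_DEBUG_INFO']
```

An empty list means all four options are present in the configuration text you supplied.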


SystemTap architecture

Let's dig into some of the details of SystemTap to understand how it provides dynamic probes in a running kernel. You'll also see how SystemTap works, from the process of building a script to activating that script in the running kernel.

Dynamically instrumenting the kernel

Two of the methods SystemTap uses to instrument a running kernel are Kprobes and return probes. But the most critical element for understanding any kernel is the kernel map, which provides symbol information (such as functions and variables and their addresses). With the map, you can resolve the address of any symbol and modify it to support probe behavior.

Kprobes has been in the mainline Linux kernel since version 2.6.9 and provides a general service for probing the kernel. It offers several services, but the two most important are the Kprobe and the Kretprobe. A Kprobe is architecture specific: it inserts a breakpoint instruction at the first byte of the instruction to be probed. When that instruction is reached, the handler for the probe executes. When the handler completes, execution continues with the original instruction (from the breakpoint).
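The breakpoint mechanism can be modeled in user space. The following Python sketch is a toy, not kernel code: ToyKernel and the BRK marker are hypothetical, "instructions" are plain Python callables, and the method names merely echo the kernel's register and unregister operations. It shows the order of events described above: the original instruction is saved and replaced, the handler runs when the breakpoint is hit, and the displaced instruction executes afterwards.

```python
BREAKPOINT = "BRK"  # stand-in for an architecture-specific breakpoint instruction


class ToyKernel:
    """A list of 'instructions' (Python callables) that can be probed."""

    def __init__(self, instructions):
        self.text = list(instructions)  # the "kernel text"
        self.probes = {}                # address -> (saved instruction, handler)
        self.log = []

    def register_kprobe(self, addr, handler):
        # Save the original instruction and overwrite it with a breakpoint.
        self.probes[addr] = (self.text[addr], handler)
        self.text[addr] = BREAKPOINT

    def unregister_kprobe(self, addr):
        # Put the saved instruction back where the breakpoint was.
        original, _ = self.probes.pop(addr)
        self.text[addr] = original

    def run(self):
        for addr, insn in enumerate(self.text):
            if insn == BREAKPOINT:
                original, handler = self.probes[addr]
                handler(addr)    # the probe handler runs first
                insn = original  # then the displaced instruction executes
            insn(self.log)


kernel = ToyKernel([
    lambda log: log.append("insn0"),
    lambda log: log.append("insn1"),
])
kernel.register_kprobe(1, lambda addr: kernel.log.append("probe@%d" % addr))
kernel.run()
print(kernel.log)  # ['insn0', 'probe@1', 'insn1']
```

Removing the probe restores the saved instruction, so later runs behave as if the probe had never been installed.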

Kretprobes are different in that they operate on the return from a called function. Because a function can have many return points, this sounds complicated. In practice, it relies on a simple technique called a trampoline. Rather than instrumenting every return point in the function, a small piece of code is added at the function entry. This code replaces the return address on the stack with the address of the trampoline, which is the Kretprobe's address. When the function exits, instead of returning to the caller, it calls the Kretprobe (which performs its work), and the Kretprobe then returns to the actual caller.
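The trampoline idea can also be modeled in a few lines of Python. This is a simplified sketch under stated assumptions: the "stack" is a Python list, the addresses are plain strings, and run_with_kretprobe is a hypothetical name. The point it illustrates is that one piece of entry code catches every return path, however many return statements the probed function has.

```python
def run_with_kretprobe(func, ret_handler, caller_log):
    """Model a call to func with a return probe installed at its entry."""
    stack = ["return-to-caller"]  # return address pushed by the call

    # Entry stub: save the real return address, then point the stack
    # slot at the trampoline instead.
    saved_return = stack[-1]
    stack[-1] = "trampoline"

    result = func()  # the function may return from any of its return points

    # The function "returns" to whatever address is on the stack.
    target = stack.pop()
    if target == "trampoline":
        ret_handler(result)    # the return-probe handler does its work
        target = saved_return  # then control goes back to the real caller
    caller_log.append(target)
    return result


def sample(x):
    # Two return points, neither of which is instrumented directly.
    if x < 0:
        return "negative"
    return "non-negative"


seen, log = [], []
run_with_kretprobe(lambda: sample(-1), seen.append, log)
run_with_kretprobe(lambda: sample(5), seen.append, log)
print(seen)  # ['negative', 'non-negative']
print(log)   # ['return-to-caller', 'return-to-caller']
```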

The SystemTap process

Figure 1 shows the basic SystemTap process flow, which involves three interacting utilities and five phases. The process begins with a SystemTap script. You use the stap utility to translate the script into a kernel module that provides the probe behavior. The stap process starts by translating the script into a parse tree (pass 1). The elaboration step (pass 2) then resolves symbols using symbol information about the currently running kernel. Next, the translation step converts the parse tree into C source code (pass 3), using the resolved information along with tapset scripts (libraries of useful functionality defined by SystemTap). The final step for stap is to build the kernel module using the local kernel module build process (pass 4).

Figure 1. SystemTap process

With the kernel module available, stap has finished its work and hands control to the other two SystemTap utilities: staprun and stapio. These two utilities work together to install the module into the kernel and route its output to stdout (pass 5). When you press Ctrl-C in the shell or the script exits, cleanup is performed, which unloads the module and causes all of the utilities to exit.

An interesting feature of SystemTap is its ability to cache translated scripts. If a script has not changed since it was installed, the existing module can be reused rather than rebuilt. Figure 2 shows the user-space and kernel-space elements along with the stap-based translation process.

Figure 2. SystemTap from the kernel/user-space perspective
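The script caching described above can be sketched as a lookup keyed by a digest of the script text. This is a minimal Python model, not a description of how stap implements its cache: ModuleCache and fake_build are hypothetical names, and a real cache key would likely have to cover more than the script text (for example, the kernel it was built against).

```python
import hashlib


class ModuleCache:
    """Reuse a built module when the script text is unchanged."""

    def __init__(self, build):
        self._build = build  # callable: script text -> built module
        self._modules = {}   # script digest -> module
        self.builds = 0      # how many real builds were needed

    def get(self, script_text):
        key = hashlib.sha256(script_text.encode("utf-8")).hexdigest()
        if key not in self._modules:
            # Cache miss: run the expensive build and remember the result.
            self._modules[key] = self._build(script_text)
            self.builds += 1
        return self._modules[key]


def fake_build(script_text):
    # Stand-in for the translate-and-compile passes.
    return "module for %d bytes of script" % len(script_text)


cache = ModuleCache(fake_build)
cache.get("probe begin { exit() }")
cache.get("probe begin { exit() }")  # unchanged script: no rebuild
print(cache.builds)  # 1
```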


SystemTap scripting

Writing scripts in SystemTap is simple yet flexible, with many options at your disposal. The reference section links to manuals that detail the language and its capabilities; this section explores only a few examples to give you a first taste of SystemTap scripts.


Probes

SystemTap scripts consist of probes and the blocks of code to execute when a probe fires. Probes come in a number of predefined patterns, some of which are listed in Table 1. The table covers several probe types, including calls to kernel functions and returns from kernel functions.

Table 1. Example probe patterns

Probe type                                    Description
begin                                         Fires when the script starts
end                                           Fires when the script ends
kernel.function("sys_sync")                   Fires when sys_sync is called
kernel.function("sys_sync").call              Same as above
kernel.function("sys_sync").return            Fires when sys_sync returns
syscall.*                                     Fires when any system call is made
kernel.function("*@kernel/fork.c:934")        Fires when line 934 of fork.c is reached
module("ext3").function("ext3_file_write")    Fires when the ext3 write function is called
timer.jiffies(1000)                           Fires every 1000 kernel jiffies
timer.ms(200).randomize(50)                   Fires every 200 milliseconds, plus a linearly distributed random offset (-50 to +50)

A simple example shows how to construct a probe and associate code with it. Listing 3 shows a sample probe that fires when the kernel system call sys_sync is called. When the probe fires, you want to count the invocations and emit that count along with the process ID (PID) of the caller. First, declare a global variable that any probe can use (the global namespace is common to all probes) and initialize it to 0. Next, define your probe, which is an entry probe on the kernel function sys_sync. The script associated with the probe increments the count variable and then emits a message giving the number of calls so far and the PID of the current caller. Note that, apart from the probe definition syntax, the example looks very much like C, so a background in C is helpful.

Listing 3. A simple probe and script

global count=0

probe kernel.function("sys_sync") {
  count++
  printf( "sys_sync called %d times, currently by pid %d\n", count, pid() );
}

You can also declare functions that probes can call, which is ideal for common functionality needed by more than one probe. The language also supports recursion up to a given depth.

Variables and types

SystemTap lets you define variables of several types, but the type is inferred from context, so no type declarations are needed. In SystemTap, you'll find numbers (64-bit signed integers), strings, and literals (string or integer). You can also use associative arrays and statistics (discussed later).


Expressions

SystemTap provides all the operators you would expect from C, and they follow the same usage. You'll find arithmetic operators, binary operators, assignment operators, and pointer dereferencing. You'll also find some simplifications over C, including string concatenation, associative array elements, and aggregation operators.

Language elements

Within a probe, SystemTap provides a convenient set of statements that resemble C. Note that although the language allows you to develop complex scripts, each probe can execute only 1000 statements (this number is configurable). Table 2 lists a small sample of the statements. Many of the elements are the same as in C, although a few are specific to SystemTap.

Table 2. SystemTap language elements

Statement                        Description
if (exp) {} else {}              Standard if-then-else statement
for (exp1; exp2; exp3) {}        A for loop
while (exp) {}                   Standard while loop
do {} while (exp)                A do-while loop
break                            Exit the iteration
continue                         Continue the iteration
next                             Return from the probe
return                           Return an expression from a function
foreach (VAR in ARRAY) {}        Iterate over an array, assigning the current key to VAR

The sample scripts in this article explore statistics and aggregation functions, because these have no counterpart in C.

Finally, SystemTap provides many built-in functions that give additional information about the current context. For example, you can use caller() to identify the calling function, cpu() to identify the current processor number, and pid() to return the PID. SystemTap offers many other functions as well, including ones that provide access to the call stack and the current registers.


SystemTap examples

With this brief introduction to SystemTap behind us, let's work through some simple examples to see how it works. These examples also demonstrate some of the interesting aspects of the scripting language, such as aggregation.

System call monitoring

The previous section explored a simple script that monitors the sync system call. Now let's look at a more representative script that monitors all system calls and collects additional information about them.

Listing 4 shows a simple script containing a global variable definition and three separate probes. The first probe (the begin probe) is called when the script is first loaded. In this probe, you emit a text message indicating that the script is running in the kernel. Next is the syscall probe. Note the wildcard (*), which tells SystemTap to monitor all matching system calls. When this probe fires, the associative array element for the given PID and process name is incremented. The last probe is a timer probe, which fires after 10,000 milliseconds (10 seconds). The script associated with this probe emits the collected data by iterating over each member of the associative array. After all members have been visited, exit is called, which unloads the module and causes all associated SystemTap processes to exit.

Listing 4. Monitor all system calls (profile.stp)

global syscalllist

probe begin {
  printf("System Call Monitoring Started (10 seconds)...\n")
}

probe syscall.*
{
  syscalllist[pid(), execname()]++
}

probe timer.ms(10000) {
  foreach ( [pid, procname] in syscalllist ) {
    printf("%s[%d] = %d\n", procname, pid, syscalllist[pid, procname] )
  }
  exit()
}

Listing 5 shows the output of the script in Listing 4. From it you can see each process running in user space and the number of system calls it made within the 10-second window.

Listing 5. Output from the profile.stp script

$ sudo stap profile.stp

System Call Monitoring Started (10 seconds)...
stapio[16208] = 104
gnome-terminal[6416] = 196
Xorg[5525] = 90
vmware-guestd[5307] = 764
hald-addon-stor[4969] = 30
hald-addon-stor[4988] = 15
update-notifier[6204] = 10
munin-node[5925] = 5
gnome-panel[6190] = 33
ntpd[5830] = 20
pulseaudio[6152] = 25[5859] = 10
syslogd[4513] = 5
gnome-power-man[6215] = 4
gconfd-2[6157] = 5
hald[4877] = 3
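The counting logic behind this output can be illustrated outside the kernel. The Python sketch below is a model only: count_syscalls and report are hypothetical helpers, and the event list is made-up data standing in for what the syscall probe would observe. The tuple key plays the role of the two-part associative array index, and the format string matches the printf in the script.

```python
from collections import Counter


def count_syscalls(events):
    """events: iterable of (pid, execname) pairs, one per observed system call."""
    syscalllist = Counter()
    for pid, execname in events:
        # Same idea as: syscalllist[pid(), execname()]++
        syscalllist[pid, execname] += 1
    return syscalllist


def report(syscalllist):
    # One line per (pid, process name) key, formatted like the script's printf.
    return ["%s[%d] = %d" % (name, pid, n) for (pid, name), n in syscalllist.items()]


# Made-up events, for illustration only.
events = [(4513, "syslogd"), (5525, "Xorg"), (4513, "syslogd")]
for line in report(count_syscalls(events)):
    print(line)
# syslogd[4513] = 2
# Xorg[5525] = 1
```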

Monitoring system calls for a specific process

In this example, you modify the previous script slightly so that it collects system call data for a single process. Rather than just counting calls, it also captures which specific system calls the target process makes. Listing 6 shows the script.

This example tests for the particular process of interest (in this case, the syslog daemon) and changes the associative array so that it maps system call names to counts.

Listing 6. The new system call monitoring script (syslog_profile.stp)

global syscalllist

probe begin {
  printf("Syslog Monitoring Started (10 seconds)...\n")
}

probe syscall.*
{
  if (execname() == "syslogd") {
    syscalllist[name]++
  }
}

probe timer.ms(10000) {
  foreach ( name in syscalllist ) {
    printf("%s = %d\n", name, syscalllist[name] )
  }
  exit()
}

Listing 7 provides the output of the script.

Listing 7. Output from the new SystemTap script (syslog_profile.stp)

$ sudo stap syslog_profile.stp

Syslog Monitoring Started (10 seconds)...
writev = 3
rt_sigprocmask = 1
select = 1

Using aggregates to capture numeric data

Aggregates are a great way to capture statistics on numeric values. They are helpful and efficient when you capture large amounts of data. In this example, you collect data on network packet transmission and reception. Listing 8 defines two new probes to capture network I/O. Each probe captures the packet length for a particular network device name, PID, and process name. The end probe, which is called when the user presses Ctrl-C, emits the captured data. In this case, you iterate over the contents of the recv aggregate, use each tuple (device name, PID, and process name) to look up the recorded packet lengths, and then emit the data. Note the extractor applied to each tuple: the @count extractor returns the number of lengths captured (that is, the packet count). You can also use the @sum extractor to total the values, @min or @max to obtain the shortest or longest length, and @avg to compute the average.

Listing 8. Collecting network packet length data (net.stp)

global recv, xmit

probe begin {
  printf("Starting network capture (Ctl-C to end)\n")
}

probe netdev.receive {
  recv[dev_name, pid(), execname()] <<< length
}

probe netdev.transmit {
  xmit[dev_name, pid(), execname()] <<< length
}

probe end {
  printf("\nEnd Capture\n\n")

  printf("Iface Process........ PID.. RcvPktCnt XmtPktCnt\n")

  foreach ([dev, pid, name] in recv) {
    recvcount = @count(recv[dev, pid, name])
    xmitcount = @count(xmit[dev, pid, name])
    printf( "%5s %-15s %-5d %9d %9d\n", dev, name, pid, recvcount, xmitcount )
  }

  delete recv
  delete xmit
}

Listing 9 provides the output of the script in Listing 8. Note that the script exits when the user presses Ctrl-C and then emits the captured data.

Listing 9. Output from net.stp

$ sudo stap net.stp

Starting network capture (Ctl-C to end)
End Capture

Iface Process........ PID.. RcvPktCnt XmtPktCnt
 eth0 swapper         0           122        85
 eth0 metacity        6171          4         2
 eth0 gconfd-2        6157          5         1
 eth0 firefox         21424        48        98
 eth0 Xorg            5525         36        21
 eth0 bash            22860         1         0
 eth0 vmware-guestd   5307          1         1
 eth0 gnome-screensav 6244          6         3
Pass 5: run completed in 0usr/50sys/37694real ms.
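One way to see why an aggregate can be efficient is to model it as a running summary instead of a list of samples. The Python sketch below is an assumption-laden illustration, not SystemTap's implementation: the Stat class is hypothetical, add() stands in for the <<< operator, and the attributes stand in for the @count, @sum, @min, @max, and @avg extractors. The average uses integer division on the assumption that values are integers, as the article describes SystemTap numbers.

```python
from collections import defaultdict


class Stat:
    """Running statistics over the values pushed with add()."""

    def __init__(self):
        self.count = 0
        self.sum = 0
        self.min = None
        self.max = None

    def add(self, value):
        # Models: aggregate <<< value. Only the summary is updated;
        # the individual sample is not stored.
        self.count += 1
        self.sum += value
        self.min = value if self.min is None else min(self.min, value)
        self.max = value if self.max is None else max(self.max, value)

    @property
    def avg(self):
        # Integer average; raises ZeroDivisionError if nothing was added.
        return self.sum // self.count


# Keyed like recv[dev_name, pid(), execname()] in the script.
recv = defaultdict(Stat)
samples = [  # made-up packet lengths, for illustration only
    ("eth0", 21424, "firefox", 60),
    ("eth0", 21424, "firefox", 1500),
    ("eth0", 5525, "Xorg", 100),
]
for dev, pid, name, length in samples:
    recv[dev, pid, name].add(length)

s = recv["eth0", 21424, "firefox"]
print(s.count, s.sum, s.min, s.max, s.avg)  # 2 1560 60 1500 780
```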

Capturing histogram data

The final example shows how easily SystemTap can present data in other forms, in this case as a histogram. Returning to the previous example, the data is captured into an aggregate called histogram (see Listing 10). The netdev receive and transmit probes capture the packet length data. When the script ends, the @hist_log extractor presents the data as a histogram.

Listing 10. Capturing and presenting histogram data (nethist.stp)

global histogram

probe begin {
}

probe netdev.receive {
  histogram <<< length
}

probe netdev.transmit {
  histogram <<< length
}

probe end {
  printf( "\n" )
  print( @hist_log(histogram) )
}

Listing 11 shows the output of the script in Listing 10. In this example, a browser session, an FTP session, and ping were used to generate network traffic. The @hist_log extractor produces a base-2 logarithmic histogram (as shown below). Other histogram forms are available that let you define the bucket size.

Listing 11. Histogram output from nethist.stp

$ sudo stap nethist.stp
value |-------------------------------------------------- count
    8 |                                                      0
   16 |                                                      0
   32 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@            1601
   64 |@                                                    52
  128 |@                                                    46
  256 |@@@@                                                164
  512 |@@@                                                 140
 1024 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@  2033
 2048 |                                                      0
 4096 |                                                      0
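The bucketing behind a base-2 logarithmic histogram can be sketched in Python. This is a model under stated assumptions, not the @hist_log implementation: log2_bucket and hist_log are hypothetical helpers, only positive values are handled, and each row is assumed to be labeled with the lower bound of its power-of-two range. The sample values are made up.

```python
from collections import Counter


def log2_bucket(value):
    """Lower bound of the power-of-two bucket holding a positive value."""
    if value <= 0:
        raise ValueError("this sketch handles positive values only")
    return 1 << (value.bit_length() - 1)


def hist_log(values, width=50):
    """Return text rows 'bucket |bar count' for each power-of-two bucket."""
    counts = Counter(log2_bucket(v) for v in values)
    peak = max(counts.values())
    rows = []
    bucket = min(counts)
    while bucket <= max(counts):
        n = counts[bucket]
        bar = "@" * (n * width // peak)  # scale bars to the fullest bucket
        rows.append("%5d |%-*s %5d" % (bucket, width, bar, n))
        bucket *= 2
    return rows


# Made-up packet lengths: two small packets and one near-full frame.
for row in hist_log([60, 60, 1500]):
    print(row)
```

Here a 60-byte packet lands in the 32 bucket and a 1500-byte packet in the 1024 bucket, and empty buckets in between are still printed so the scale stays continuous.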



This article has only scratched the surface of what SystemTap can do. The reference section points to a number of tutorials, examples, and language references that provide the detail you need to understand SystemTap. SystemTap builds on several existing approaches and learns from earlier kernel tracing implementations. Although the tool is still under active development, it is already quite usable today. Look for new features to come.

Reference material

