The print statement is meant for quick and easy output. To format the output
exactly the way you want it, you may have to use the printf statement. As the
AWK language reference manual shows, printf can produce almost any kind of
output, but in this section we'll only show a few of its capabilities.
Lining Up Fields
The printf statement has the form
printf(format, value1, ..., valuen)
where format is a string that contains text to be printed verbatim, interspersed
with specifications of how each of the values is to be printed. A specification is
a % followed by a few characters that control the format of a value. The first
specification tells how value1 is to be printed, the second how value2 is to be
printed, and so on. Thus, there must be as many % specifications in format as
values to be printed.
Here's a program that uses printf to print the total pay for every employee:
{ printf("total pay for %s is $%.2f\n", $1, $2 * $3) }
The specification string in the printf statement contains two % specifications.
The first, %s, says to print the first value, $1, as a string of characters; the
second, %.2f, says to print the second value, $2*$3, as a number with 2 digits
after the decimal point. Everything else in the specification string, including
the dollar sign, is printed verbatim; the \n at the end of the string stands for
a newline, which causes subsequent output to begin on the next line.
With emp.data as input, this program yields:
total pay for Beth is $0.00
total pay for Dan is $0.00
total pay for Kathy is $40.00
total pay for Mark is $100.00
total pay for Mary is $121.00
total pay for Susie is $76.50
With printf, no blanks or newlines are produced automatically; you must create
them yourself. Don't forget the \n.
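The characters between the % and the letter can also set a field width, which is
what lines fields up in columns. The following is a minimal sketch, assuming a
POSIX awk and shell; emp_data is a hypothetical helper, not part of the chapter,
that recreates emp.data so the example stands alone.

```shell
# Hypothetical helper: recreates this chapter's emp.data (name, rate, hours).
emp_data() {
  printf '%s\n' 'Beth 4.00 0' 'Dan 3.75 0' 'Kathy 4.00 10' \
                'Mark 5.00 20' 'Mary 5.50 22' 'Susie 4.25 18'
}

# %-8s left-justifies the name in 8 columns;
# %6.2f right-justifies the pay in 6 columns with 2 digits after the point.
out=$(emp_data | awk '{ printf("%-8s $%6.2f\n", $1, $2 * $3) }')
printf '%s\n' "$out"
```

With this data the dollar signs and decimal points fall in the same columns on
every line.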
Sorting the Output
Suppose you want to print all the data for each employee, along with his or
her pay, sorted in order of increasing pay. The easiest way is to use awk to
prefix the total pay to each employee record, and run that output through
a sorting program. On Unix, the command line
awk '{ printf("%6.2f %s\n", $2 * $3, $0) }' emp.data | sort
pipes the output of awk into the sort command, and produces:
  0.00 Beth 4.00 0
  0.00 Dan 3.75 0
 40.00 Kathy 4.00 10
 76.50 Susie 4.25 18
100.00 Mark 5.00 20
121.00 Mary 5.50 22
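Plain sort compares lines as text, so the command above depends on the padding
that %6.2f supplies; a pay of 1000.00 or more would no longer fit in six columns
and would sort out of order, and how leading blanks are compared may depend on
the locale. A hedged alternative is to ask sort for a numeric comparison. This
sketch assumes a POSIX sort with the -n option; emp_data is again a hypothetical
stand-in for emp.data.

```shell
# Hypothetical helper: recreates this chapter's emp.data (name, rate, hours).
emp_data() {
  printf '%s\n' 'Beth 4.00 0' 'Dan 3.75 0' 'Kathy 4.00 10' \
                'Mark 5.00 20' 'Mary 5.50 22' 'Susie 4.25 18'
}

# No field width is needed: sort -n compares the leading number numerically.
# LC_ALL=C pins the locale so ties are broken by plain byte order.
out=$(emp_data | awk '{ printf("%.2f %s\n", $2 * $3, $0) }' | LC_ALL=C sort -n)
printf '%s\n' "$out"
```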
1.4 Selection
Awk patterns are good for selecting interesting lines from the input for further
processing. Since a pattern without an action prints all lines matching the pattern,
many awk programs consist of nothing more than a single pattern. This section
gives some examples of useful patterns.
Selection by Comparison
This program uses a comparison pattern to select the records of employees
who earn $5.00 or more per hour, that is, lines in which the second field is greater
than or equal to 5:
$2 >= 5
It selects these lines from emp.data:
Mark 5.00 20
Mary 5.50 22
Selection by Computation
The program
$2 * $3 > 50 { printf("$%.2f for %s\n", $2 * $3, $1) }
prints the pay of those employees whose total pay exceeds $50:
$100.00 for Mark
$121.00 for Mary
$76.50 for Susie
Selection by Text Content
Besides numeric tests, you can select input lines that contain specific words
or phrases. This program prints all lines in which the first field is Susie:
$1 == "Susie"
The operator == tests for equality. You can also look for text containing any of
a set of letters, words, and phrases by using patterns called regular
expressions. This program prints all lines that contain Susie anywhere:
/Susie/
The output is this line:
Susie 4.25 18
Regular expressions can be used to specify much more elaborate patterns.
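As a small taste of such patterns, here is a sketch that anchors a match to the
start or end of the text. It assumes a POSIX awk; the ~ match operator and the
^ and $ anchors are standard awk but are not otherwise covered in this section,
and emp_data is a hypothetical helper that recreates emp.data.

```shell
# Hypothetical helper: recreates this chapter's emp.data (name, rate, hours).
emp_data() {
  printf '%s\n' 'Beth 4.00 0' 'Dan 3.75 0' 'Kathy 4.00 10' \
                'Mark 5.00 20' 'Mary 5.50 22' 'Susie 4.25 18'
}

# ^ anchors the match at the start of the line: lines beginning with M.
out_m=$(emp_data | awk '/^M/')

# ~ tests one field against a regular expression; $ anchors at the end,
# so this selects names ending in y.
out_y=$(emp_data | awk '$1 ~ /y$/')

printf '%s\n' "$out_m" "$out_y"
```

The first program selects the Mark and Mary lines, the second the Kathy and Mary
lines.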
Combinations of Patterns
Patterns can be combined with parentheses and the logical operators &&, ||, and
!, which stand for AND, OR, and NOT. The program
$2 >= 4 || $3 >= 20
prints those lines where $2 is at least 4 or $3 is at least 20:
Beth 4.00 0
Kathy 4.00 10
Mark 5.00 20
Mary 5.50 22
Susie 4.25 18
Lines that satisfy both conditions are printed only once.
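The other two operators work the same way. The sketch below, which assumes a
POSIX awk and a hypothetical emp_data helper standing in for emp.data, shows an
AND pattern and a NOT pattern; the second is just the OR program above rewritten
with ! and parentheses.

```shell
# Hypothetical helper: recreates this chapter's emp.data (name, rate, hours).
emp_data() {
  printf '%s\n' 'Beth 4.00 0' 'Dan 3.75 0' 'Kathy 4.00 10' \
                'Mark 5.00 20' 'Mary 5.50 22' 'Susie 4.25 18'
}

# AND: both conditions must hold.
out_and=$(emp_data | awk '$2 >= 4 && $3 >= 20')

# NOT: a line is rejected only if both tests inside the parentheses hold,
# which selects the same lines as $2 >= 4 || $3 >= 20.
out_not=$(emp_data | awk '!($2 < 4 && $3 < 20)')

printf '%s\n' "$out_and" "$out_not"
```

The AND program selects only Mark and Mary; the NOT program selects every line
except Dan's.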
BEGIN and END
The special pattern BEGIN matches before the first line of the first input file
is read, and END matches after the last line of the last file has been
processed. This program uses BEGIN to print a heading:
BEGIN { print "NAME RATE HOURS"; print "" }
{ print }
The output is:
NAME RATE HOURS

Beth 4.00 0
Dan 3.75 0
Kathy 4.00 10
Mark 5.00 20
Mary 5.50 22
Susie 4.25 18
You can put several statements on a single line if you separate them by semicolons.
Note that print "" prints a blank line, quite different from just plain print,
which prints the current line.
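END can be used in the same way to add a trailer after the last line. This is a
minimal sketch, assuming a POSIX awk; the trailer text and the emp_data helper
are our own illustration, not part of the chapter.

```shell
# Hypothetical helper: recreates this chapter's emp.data (name, rate, hours).
emp_data() {
  printf '%s\n' 'Beth 4.00 0' 'Dan 3.75 0' 'Kathy 4.00 10' \
                'Mark 5.00 20' 'Mary 5.50 22' 'Susie 4.25 18'
}

# BEGIN runs before any input, the middle action once per line,
# and END after all input, when NR holds the number of lines read.
out=$(emp_data | awk '
  BEGIN { print "NAME RATE HOURS"; print "" }
        { print }
  END   { print ""; print NR, "records" }')
printf '%s\n' "$out"
```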
1.5 Computing with AWK
An action is a sequence of statements separated by newlines or semicolons.
This section provides examples of statements for performing simple numeric
and string computations. In these statements you can use not only built-in
variables like NF but also variables of your own for performing calculations,
storing data, and the like. In awk, user-created variables are not declared.
Counting
This program uses a variable emp to count employees who have worked more than
15 hours:
$3 > 15 { emp = emp + 1 }
END { print emp, "employees worked more than 15 hours" }
For every line in which the third field exceeds 15, the previous value of emp
is incremented by 1. With emp.data as input, this program yields:
3 employees worked more than 15 hours
Awk variables used as numbers begin life with the value 0, so we didn't need to
initialize emp.
To count the number of employees, we can use the built-in variable NR, which
holds the number of lines read so far; its value at the end of all input is the
total number of lines read.
END { print NR, "employees" }
The output is:
6 employees
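Counting combines naturally with accumulation. As a sketch under the same
assumptions (a POSIX awk, and a hypothetical emp_data helper in place of
emp.data), this program keeps a running total of pay in a variable and uses NR
in the END action to compute the average.

```shell
# Hypothetical helper: recreates this chapter's emp.data (name, rate, hours).
emp_data() {
  printf '%s\n' 'Beth 4.00 0' 'Dan 3.75 0' 'Kathy 4.00 10' \
                'Mark 5.00 20' 'Mary 5.50 22' 'Susie 4.25 18'
}

# pay starts at 0 without initialization and grows by rate * hours per line.
out=$(emp_data | awk '
      { pay = pay + $2 * $3 }
  END { print NR, "employees"
        print "total pay is", pay
        print "average pay is", pay / NR
      }')
printf '%s\n' "$out"
```

For this data the total is 337.5 and the average is 56.25.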
String Concatenation
New strings may be created by combining old ones; this operation is called
concatenation. The program
{ names = names $1 " " }
END { print names }
collects all the employee names into a single string, by appending each name
and a blank to the previous value in the variable names. The value of names is
printed by the END action:
Beth Dan Kathy Mark Mary Susie
The concatenation operation is represented in an awk program by writing string
values one after the other. At every input line, the first statement in the
program concatenates three strings: the previous value of names, the first
field, and a blank; it then assigns the resulting string to names. Variables
used to store strings begin life holding the null string, so in this program
names did not have to be explicitly initialized.
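The same property of uninitialized variables gives one way to build a list with
separators only between items, avoiding the trailing blank. This is a sketch,
not the chapter's program: the variable sep and the emp_data helper are
hypothetical names introduced for illustration.

```shell
# Hypothetical helper: recreates this chapter's emp.data (name, rate, hours).
emp_data() {
  printf '%s\n' 'Beth 4.00 0' 'Dan 3.75 0' 'Kathy 4.00 10' \
                'Mark 5.00 20' 'Mary 5.50 22' 'Susie 4.25 18'
}

# sep is the null string for the first name and ", " for every later one,
# so no separator is printed before the first name or after the last.
out=$(emp_data | awk '{ names = names sep $1; sep = ", " }
                      END { print names }')
printf '%s\n' "$out"
```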
Built-in Functions
We have already seen that awk provides built-in variables that maintain
frequently used quantities like the number of fields and the input line number.
Similarly, there are built-in functions for computing other useful values.
Besides arithmetic functions for square roots, logarithms, random numbers,
and the like, there are also functions that manipulate text. One of these is
length, which counts the number of characters in a string. For example, this
program computes the length of each person's name:
{ print $1, length($1) }
The result:
Beth 4
Dan 3
Kathy 5
Mark 4
Mary 4
Susie 5
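Text functions can be nested like any other expressions. The sketch below
assumes a POSIX awk, where substr(s, m, n) returns the n-character substring of
s starting at position m and toupper(s) returns s in upper case; neither
function is described in this section, and emp_data is a hypothetical helper.

```shell
# Hypothetical helper: recreates this chapter's emp.data (name, rate, hours).
emp_data() {
  printf '%s\n' 'Beth 4.00 0' 'Dan 3.75 0' 'Kathy 4.00 10' \
                'Mark 5.00 20' 'Mary 5.50 22' 'Susie 4.25 18'
}

# Print the first three letters of each name in upper case,
# followed by the length of the whole input line.
out=$(emp_data | awk '{ print toupper(substr($1, 1, 3)), length($0) }')
printf '%s\n' "$out"
```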
Counting Lines, Words, and Characters
This program uses length, NF, and NR to count the number of lines, words, and
characters in the input. For convenience, we'll treat each field as a word.
    { nc = nc + length($0) + 1
      nw = nw + NF
    }
END { print NR, "lines,", nw, "words,", nc, "characters" }
The file emp.data has
6 lines, 18 words, 77 characters
We have added one for the newline character at the end of each input line,
since $0 doesn't include it.
1.6 Control-Flow Statements
Awk provides statements for making decisions and writing
loops, mostly modeled on those found in the C programming language. They can only
be used in actions.
If-Else Statement
The following program computes the total and average pay of employees making
more than $6.00 an hour. It uses an if to defend against division by zero in
computing the average pay.
$2 > 6 { n = n + 1; pay = pay + $2 * $3 }
END    { if (n > 0)
             print n, "employees, total pay is", pay,
                      "average pay is", pay/n
         else
             print "no employees are paid more than $6/hour"
       }
In the if-else statement, the condition following the if is evaluated. If it is
true, the first print statement is performed. Otherwise, the second print
statement is performed. Note that we can continue a long statement over several
lines by breaking it after a comma.
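Since nobody in emp.data earns more than $6.00 an hour, the program above always
takes the else branch on that file. To see both branches, the sketch below makes
the threshold a parameter. It assumes a POSIX awk with the -v option for setting
a variable from the command line; the names report, min, and emp_data are
hypothetical.

```shell
# Hypothetical helper: recreates this chapter's emp.data (name, rate, hours).
emp_data() {
  printf '%s\n' 'Beth 4.00 0' 'Dan 3.75 0' 'Kathy 4.00 10' \
                'Mark 5.00 20' 'Mary 5.50 22' 'Susie 4.25 18'
}

# report RATE: summarize employees paid more than RATE dollars per hour.
# min + 0 makes sure the comparison is numeric.
report() {
  emp_data | awk -v min="$1" '
    $2 > min + 0 { n = n + 1; pay = pay + $2 * $3 }
    END { if (n > 0)
              print n, "employees, total pay is", pay,
                       "average pay is", pay / n
          else
              print "no employees are paid more than $" min "/hour"
        }'
}

out4=$(report 4)   # three employees qualify, so the if branch runs
out6=$(report 6)   # nobody qualifies, so the else branch runs
printf '%s\n' "$out4" "$out6"
```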
1.7 Arrays
Awk provides arrays for storing groups of related values. Unlike arrays in many
earlier languages, awk arrays are subscripted by strings of characters.
This gives awk a capability like the associative memory of SNOBOL4 tables,
and for this reason, arrays in awk are called associative arrays.
Suppose some employees get a raise and work at two different pay rates:
Beth 4.00 0
Dan 3.75 0
Kathy 4.00 10
Mark 5.00 20
Mary 5.50 22
Susie 4.25 18
Kathy 5.25 25
Susie 5.75 12
You now want to compute and print, for each employee who worked, the employee's
total number of hours and total pay. The following awk program does the job:
$3 > 0 { hours[$1] += $3; pay[$1] += $2 * $3 }
END    { for (emp in hours)
             printf("%s worked %d hours and received $%.2f\n",\
                    emp, hours[emp], pay[emp])
       }
The first action uses two arrays, hours and pay, to accumulate for each employee
his or her total hours and pay. The name in the first field ($1) is used to
index each array. As in C, an assignment statement of the form x += y is a
shorthand for x = x + y. When an array entry is first created, it is
automatically initialized to zero if it is used to hold numeric data.
The END action uses a form of the for statement that loops over all subscripts
that were used to index the array. The loop executes the printf statement with
the variable emp set in turn to each different subscript in the array. The order
in which the subscripts are considered is implementation dependent.
Note that the long printf statement has been broken by a backslash at the end of
the line. One possible output from executing this program on the new employee
data is
Mary worked 22 hours and received $121.00
Kathy worked 35 hours and received $171.25
Susie worked 30 hours and received $145.50
Mark worked 20 hours and received $100.00
We could pipe the output into sort as we did above to sort the output by
employee name.
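Here is a sketch of that pipeline, which also makes the output order
reproducible across awk implementations. It assumes a POSIX awk and sort; the
new employee data is recreated inline with printf so the example is
self-contained, and LC_ALL=C is our own addition to pin the sort order.

```shell
# The array program from the text, with its output piped into sort.
# Each line begins with the employee name, so sorting the lines
# sorts the report by name.
out=$(printf '%s\n' 'Beth 4.00 0' 'Dan 3.75 0' 'Kathy 4.00 10' \
                    'Mark 5.00 20' 'Mary 5.50 22' 'Susie 4.25 18' \
                    'Kathy 5.25 25' 'Susie 5.75 12' |
  awk '$3 > 0 { hours[$1] += $3; pay[$1] += $2 * $3 }
       END    { for (emp in hours)
                    printf("%s worked %d hours and received $%.2f\n",
                           emp, hours[emp], pay[emp])
              }' |
  LC_ALL=C sort)
printf '%s\n' "$out"
```

The four lines now always appear in the order Kathy, Mark, Mary, Susie.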
1.8 A Handful of Useful "One-liners"
Although awk can be used to write programs of some complexity, many useful
programs are not much more complicated than what we've seen so far. Here is
a collection of short programs that you might find handy and/or instructive.
Most are variations on material already covered.
- Print the total number of input lines:
END { print NR }
- Print the tenth input line:
NR == 10
- Print the last field of every input line:
{ print $NF }
- Print the last field of the last input line:
{ field = $NF }
END { print field }
- Print each distinct input line once, at its first occurrence:
!a[$0]++
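The last one-liner packs several ideas from this chapter into a single pattern,
so a small demonstration may help. In this sketch the array is named seen
rather than a, purely for readability, and the five input lines are made up for
illustration.

```shell
# seen[$0]++ yields the number of times this line has occurred before
# (0 on first sight) and then increments the count. The ! turns that 0
# into true, so the pattern matches, and the line is printed, only at
# its first occurrence.
out=$(printf '%s\n' a b a c b | awk '!seen[$0]++')
printf '%s\n' "$out"
```

Of the five input lines, only a, b, and c are printed, in the order of their
first appearance.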
1.9 Summary
You have now seen the essentials of awk. Each program in this chapter
has been a sequence of pattern-action statements. Awk tests every input
line against the patterns, and when a pattern matches, performs the
corresponding action. Patterns can involve numeric and string comparisons,
and actions can include computation and formatted printing. Besides reading
through your input files automatically, awk splits each input line into fields.
It also provides a number of built-in variables and functions, and lets you
define your own as well. With this combination of features, quite a few
useful computations can be expressed by short programs -- many of the details
that would be needed in another language are handled implicitly in an awk
program.
For more information, consult the man pages for awk and the book
"The AWK Programming Language" by Aho, Kernighan, and Weinberger.