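The line "this is line 1" is matched against Pattern1. If it matches, the ACTIONS are executed. Then "this is line 1" is matched against Pattern2. If that match fails, awk moves on to Pattern3, and so on.
Once all the patterns have been tried, "this is line 2" goes through the same steps. The same goes for every other line until the whole file has been read.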
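Here is a minimal sketch of that order. It assumes a POSIX-compatible awk (gawk, mawk or BSD awk) called from a POSIX shell. The three regular expressions are hypothetical stand-ins for Pattern1 to Pattern3. Every pattern is tried against a line before awk reads the next one.

```shell
#!/bin/sh
# Hypothetical stand-ins for Pattern1..Pattern3, tried in order on every line.
execution_order() {
  printf 'this is line 1\nthis is line 2\n' | awk '
    /line/ { print "Pattern1 matched: " $0 }   # matches both lines
    /1$/   { print "Pattern2 matched: " $0 }   # only the line ending in 1
    /2$/   { print "Pattern3 matched: " $0 }   # only the line ending in 2
  '
}

execution_order
```

Both lines match Pattern1, but each line runs through all three patterns before the next line is read. The output therefore finishes with line 1 before starting on line 2, rather than listing every Pattern1 match first.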
In short, that is how Awk runs.

Data Types
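Awk has only two main data types: strings and numbers. Even so, Awk readily converts one into the other. A string used where a number is expected is interpreted as a number, and its leading numeric part becomes its value. If the string does not begin with a number, its value is 0.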
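This small sketch shows these conversions. It assumes a POSIX-compatible awk, and the shell function name is hypothetical. Adding 0 to a value is a common way to force numeric context.

```shell
#!/bin/sh
# Force strings into numeric context and print the resulting values.
string_to_number() {
  awk 'BEGIN {
    print "3abc" + 0   # the leading number is used: 3
    print "abc3" + 0   # no leading number: 0
    print " 42 " + 1   # surrounding blanks are skipped: 43
    s = "1" "2"        # concatenation builds the string "12"
    print s + 1        # which converts back to the number 12: 13
  }'
}

string_to_number
```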
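Strings and numbers alike are assigned to variables with the = operator in the ACTIONS part of your code. Variables can be declared and used anywhere, at any time. They can even be used without being initialized: their default value is the empty string "", which acts as 0 in a numeric context.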
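This short sketch shows uninitialized variables in both contexts. It assumes a POSIX-compatible awk, and the function names are hypothetical. The counter n in the second function is never assigned before n++, which is the usual idiom for counting.

```shell
#!/bin/sh
uninitialized() {
  awk 'BEGIN {
    print "[" x "]"   # string context: x is the empty string, so this prints []
    print y + 0       # numeric context: y is 0
  }'
}

count_lines() {
  # n starts out uninitialized, so the first n++ treats it as 0.
  printf 'a\nb\nc\n' | awk '{ n++ } END { print n }'
}

uninitialized
count_lines
```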
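Finally, Awk has arrays, which are dynamic, one-dimensional associative arrays. Their syntax is var[key] = value. Awk can simulate multidimensional arrays, but that is a big hack either way.

Patterns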
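The patterns you can use fall into three broad categories: regular expressions, boolean expressions, and special patterns.

Regular Expressions and Boolean Expressions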
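The regular expressions Awk uses are fairly lightweight. They are not PCRE. Some implementations, such as gawk, add extensions of their own, so the exact feature set depends on which awk you have (check with awk --version). Still, they are good enough for most needs: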
/admin/ { ... } # any line that contains 'admin'
/^admin/ { ... } # lines that begin with 'admin'
/admin$/ { ... } # lines that end with 'admin'
/^[0-9.]+ / { ... } # lines beginning with a series of digits and periods, then a space
/(POST|PUT|DELETE)/ # lines that contain specific HTTP verbs
# According to the following line
#
#    $1      $2          $3
# 00:34:23   GET   /foo/bar.html
# \_____________  ______________/
#               $0
# Hack attempt?
/admin\.html$/ && $2 == "DELETE" {
print "Hacker Alert!";
}
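The patterns above can be tried on a few sample log lines. This minimal sketch assumes a POSIX-compatible awk. The log lines and the shell function name are invented, and each action prints a label in place of the { ... } placeholders. In this sketch the dot in admin\.html is escaped so that it matches only a literal period.

```shell
#!/bin/sh
log_patterns() {
  printf '00:34:23 GET /foo/bar.html\n00:35:01 DELETE /admin.html\nadmin 10.0.0.1 login\n' |
    awk '
      /^admin/                         { print "starts with admin: " $0 }
      /(POST|PUT|DELETE)/              { print "HTTP verb: " $2 }
      /admin\.html$/ && $2 == "DELETE" { print "Hacker Alert!" }
    '
}

log_patterns
```

The second line triggers two rules in order (the HTTP verb rule, then the boolean "Hacker Alert!" rule) before awk moves on to the third line.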
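Fields are separated by whitespace by default. $0 holds the entire line as a string, $1 is the first chunk of text (before any whitespace), $2 is the next chunk, and so on.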
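This quick sketch shows field splitting. It assumes a POSIX-compatible awk, and the input is made up. With the default separator, leading blanks are ignored and a run of blanks counts as a single separator. The -F option sets a different field separator.

```shell
#!/bin/sh
fields() {
  printf '  00:34:23   GET /foo/bar.html\n' | awk '{ print NF; print $1; print $3 }'
}

comma_fields() {
  printf 'a,b,c\n' | awk -F, '{ print $2 }'   # split on commas instead of blanks
}

fields
comma_fields
```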
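Here is an interesting fact, and in most cases something to avoid: you can modify the line by assigning to its fields. For example, if you execute $0 = "HAHA THE LINE IS GONE" in one block, the patterns that follow will operate on the modified line instead of the original one. The same goes for the other field variables.

Actions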
There is a whole bunch of possible actions, but the most common and useful ones (in my experience) are:
{ print $0; } # prints $0. In this case, equivalent to 'print' alone
{ exit; } # ends the program
{ next; } # skips to the next line of input
{ a=$1; b=$0 } # variable assignment
{ c[$1] = $2 } # variable assignment (array)
{ if (BOOLEAN) { ACTION }
else if (BOOLEAN) { ACTION }
else { ACTION }
}
{ for (i=1; i<x; i++) { ACTION } }
{ for (item in c) { ACTION } }
# scalar arguments are passed by value; arrays are passed by reference
function name(parameter-list) {
ACTIONS; # same actions as usual
}
# return is a valid keyword
function add1(val) {
return val+1;
}
BEGIN { # Can be modified by the user
FS = ","; # Field Separator
RS = "\n"; # Record Separator (lines)
OFS = " "; # Output Field Separator
ORS = "\n"; # Output Record Separator (lines)
}
{ # Maintained by awk itself; normally read rather than assigned
NF # Number of Fields in the current Record (line)
NR # Number of Records seen so far
ARGC # Number of script arguments, stored in ARGV[0] .. ARGV[ARGC-1]
}
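This runnable sketch puts several of these actions together. It assumes a POSIX-compatible awk and a POSIX shell. The shell function names, the awk function bump, and all input lines are hypothetical. The sketch also illustrates three points from earlier sections:

- A scalar argument is copied, while an array argument is shared with the caller.
- Assigning to $0 or to a field rewrites the record.
- A "multidimensional" key such as grid[1, 2] is really a single string joined with SUBSEP.

```shell
#!/bin/sh
# Scalars are copied into a function; arrays are shared with the caller.
by_value_vs_reference() {
  awk '
    function bump(x, arr) { x++; arr["k"]++ }
    BEGIN { v = 1; a["k"] = 1; bump(v, a); print v, a["k"] }
  '
}

# next, array assignment, and a user-defined function working together.
sum_per_name() {
  printf 'alice 3\nbob 5\n# comment\nalice 4\n' | awk '
    function add1(val) { return val + 1 }
    /^#/ { next }                              # skip comment lines
    { total[$1] += $2 }                        # array assignment keyed on $1
    END { print add1(total["alice"]), add1(total["bob"]) }
  '
}

# FS splits the input; assigning $1 = $1 rebuilds $0 with OFS.
separators() {
  printf 'a,b,c\nd,e\n' | awk '
    BEGIN { FS = ","; OFS = " | " }
    { $1 = $1; print NR, NF, $0 }
  '
}

# Assigning to $0 re-splits the record, so later patterns see the new line.
replace_line() {
  printf 'a b c\n' | awk '
    { $0 = "HAHA THE LINE IS GONE" }
    /GONE/ { print NF ": " $0 }
  '
}

# The multidimensional hack: grid[1, 2] is one key, "1" SUBSEP "2".
fake_2d() {
  awk 'BEGIN {
    grid[1, 2] = "x"
    if ((1, 2) in grid) print "found"
    for (k in grid) {
      split(k, parts, SUBSEP)                  # recover the two "indices"
      print parts[1], parts[2], grid[k]
    }
  }'
}

by_value_vs_reference
sum_per_name
separators
replace_line
fake_2d
```

The order of for (key in array) is unspecified in awk. For that reason sum_per_name looks up its two keys directly instead of looping over total.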
# Parse Erlang Crash Dumps and correlate mailbox size to the currently running
# function.
#
# Once in the procs section of the dump, all processes are displayed with
# =proc:<0.M.N> followed by a list of their attributes, which include the
# message queue length and the program counter (what code is currently
# executing).
#
# Run as:
#
# $ awk -v threshold=$THRESHOLD -f queue_fun.awk $CRASHDUMP
#
# Where $THRESHOLD is the smallest mailbox size you want inspected. The default value
# is 1000.
BEGIN {
if (threshold == "") {
threshold = 1000 # default mailbox size
}
procs = 0 # are we in the =procs entries?
print "MESSAGE QUEUE LENGTH: CURRENT FUNCTION"
print "======================================"
}
# Only bother with the =proc: entries. Anything else is useless.
procs == 0 && /^=proc/ { procs = 1 } # entering the =procs entries
procs == 1 && /^=/ && !/^=proc/ { exit 0 } # we're done
# Message queue length: 1210
# 1       2     3       4
/^Message queue length: / && $4 >= threshold { flag=1; ct=$4 }
/^Message queue length: / && $4 < threshold { flag=0 }
# Program counter: 0x00007f5fb8cb2238 (io:wait_io_mon_reply/2 + 56)
# 1       2        3                  4                       5 6
flag == 1 && /^Program counter: / { print ct ":", substr($4,2) }
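To check the field numbers in the comments above, you can feed the two relevant lines to awk on their own. This minimal sketch assumes a POSIX-compatible awk. The input lines are the samples quoted in the script's comments, not output from a real crash dump, and the function names are hypothetical.

```shell
#!/bin/sh
queue_length() {
  printf 'Message queue length: 1210\n' | awk '{ print $4 }'
}

current_function() {
  printf 'Program counter: 0x00007f5fb8cb2238 (io:wait_io_mon_reply/2 + 56)\n' |
    awk '{ print substr($4, 2) }'   # $4 is "(io:wait_io_mon_reply/2"; drop the "("
}

queue_length
current_function
```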