Introduction to the Linux Three Musketeers (grep, sed, awk) and Their Usage

The “Linux Three Musketeers” is a nickname for three tools: grep, sed, and awk. Proficient use of them can greatly improve the efficiency of operations and maintenance work. All three are built on regular expressions, and Linux supports two flavors: “standard regular expressions” and “extended regular expressions”. This article first covers regular expressions and then explains the usage of each of the three tools in detail.

One, regular expression

Regular expression: REGular EXPression, abbreviated REGEXP.

Metacharacters (character matching):
.: matches any single character
[ ]: matches any single character in the specified range
[^]: matches any single character outside the specified range
Character classes: [:digit:], [:lower:], [:upper:], [:punct:], [:space:], [:alpha:], [:alnum:]
Note: a character class must itself be placed inside [], for example [[:digit:]].

Number of matches (greedy mode):
*: matches the preceding character any number of times (including 0)
.*: matches any characters of any length
\?: matches the preceding character 0 or 1 times
\+: matches the preceding character at least once
\{m,n\}: matches the preceding character at least m times and at most n times, for example \{1,\} or \{0,3\}
Note: a lower bound of 0 must be written out explicitly.
Example: try the patterns a*b, a\?b, and a.*b against the strings a, b, ab, aab, acb, adb, amnb.

Position anchors:
^: anchors the beginning of the line; whatever follows it must appear at the beginning of the line
$: anchors the end of the line; whatever precedes it must appear at the end of the line
^$: matches a blank line
\< or \b: anchors the beginning of a word; whatever follows it must appear at the beginning of a word
\> or \b: anchors the end of a word; whatever precedes it must appear at the end of a word

Grouping:
\(\): groups a sub-expression, for example \(ab\)*
Back-references:
\1: refers to the text matched between the first left parenthesis and its corresponding right parenthesis
\2, \3, ...: refer to the second, third, ... groups in the same way
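The quantifiers, anchors, and back-references listed above can be tried out with grep, which uses standard regular expressions by default. The following is a minimal sketch that assumes GNU grep; the variable names are only illustrative.

```shell
# Sample strings from the text, one per line.
words=$(printf '%s\n' a b ab aab acb adb amnb)

# a*b: zero or more "a" followed by "b" (anchored to the whole line with ^ and $).
star=$(printf '%s\n' "$words" | grep '^a*b$')

# a.*b: "a", then any characters of any length, then "b".
dotstar=$(printf '%s\n' "$words" | grep '^a.*b$')

# \{2,\}: the preceding character at least twice (braces are escaped here).
twice=$(printf '%s\n' "$words" | grep 'a\{2,\}')

# \(ab\)\1: group "ab" and require the same text again via back-reference \1.
backref=$(printf '%s\n' abab abba | grep '^\(ab\)\1$')

printf '%s\n' "$star" "$dotstar" "$twice" "$backref"
```

Anchoring with ^ and $ makes the difference between the patterns visible; without the anchors, a*b would match any line that merely contains a b.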

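Two, extended regular expression

As shown above, standard regular expressions require many symbols to be escaped, which is inconvenient in practice. Extended regular expressions were introduced to address this.

1. Character matching:
[^abc]: matches any single character other than a, b, or c
2. Number of matches (no need to escape):
*: any number of times
?: 0 or 1 times
+: matches the preceding character at least once
{m,n}: at least m times and at most n times
3. Position anchors:
^, $, \<, \>
4. Grouping (no need to escape):
(): grouping
\1, \2, \3, …: back-references
5. Or:
|: or. C|cat matches C or cat (the alternation applies to the whole expression on each side, not just to the adjacent character).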

As you can see, extended regular expressions let you omit many escape characters, which greatly improves readability, especially when writing sed statements. Prefer extended regular expressions where possible.
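To make the difference concrete, here is a small sketch that writes the same match in both flavors and shows how | applies to whole alternatives. It assumes GNU grep; the -x option (match the whole line) is not mentioned in the text and is used here only to keep the patterns short.

```shell
# The same match written twice: escaped in standard syntax, unescaped with -E.
bre=$(printf '%s\n' ab abab xyz | grep '^\(ab\)\+$')
ere=$(printf '%s\n' ab abab xyz | grep -E '^(ab)+$')

# C|cat alternates between the whole "C" and the whole "cat" ...
alt_whole=$(printf '%s\n' C cat Cat | grep -Ex 'C|cat')
# ... while (C|c)at limits the alternation to the first letter.
alt_group=$(printf '%s\n' C cat Cat | grep -Ex '(C|c)at')

printf '%s\n' "$bre" "$ere" "$alt_whole" "$alt_group"
```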

Three, grep command family

The grep command family consists of three commands, grep, egrep, and fgrep, each suited to a different scenario:
Command    Description
grep       The native grep command; uses “standard regular expressions” as the matching criterion.
egrep      The extended grep command, equivalent to grep -E; uses “extended regular expressions” as the matching criterion.
fgrep      A simplified grep, equivalent to grep -F; it treats the pattern as a fixed string with no regular expression support, but it searches quickly and uses few system resources.
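The practical difference between regular-expression matching and fixed-string matching shows up as soon as the pattern contains a metacharacter. A minimal sketch, using grep -F in place of fgrep and made-up input lines:

```shell
lines=$(printf '%s\n' 'a.c' 'abc' 'a-c')

# As a regular expression, "." matches any single character.
as_regex=$(printf '%s\n' "$lines" | grep 'a.c')
# With -F (the fgrep behavior), "a.c" is a literal string.
as_fixed=$(printf '%s\n' "$lines" | grep -F 'a.c')

printf '%s\n' "$as_regex" "---" "$as_fixed"
```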

3.2. How to use

Syntax
grep [options] PATTERN [FILE…]
options part
-i: ignore case
--color: highlight the matched string
-v: display the lines that are not matched by the pattern
-o: display only the string matched by the pattern
-E: use extended regular expressions
PATTERN part
The matching template, given as a string; it can be an ordinary string or a regular expression (standard or extended).
FILE part
The file(s) whose content is to be searched.
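The options above can be combined freely. The sketch below runs them against a few invented log lines read from stdin instead of a FILE, so it does not depend on any file on the system.

```shell
text=$(printf '%s\n' 'Error: disk full' 'warning: low memory' 'error: timeout after 30s')

# -i: ignore case, so both "Error" and "error" match.
ci=$(printf '%s\n' "$text" | grep -i 'error')
# -v: keep only the lines that do NOT match.
inv=$(printf '%s\n' "$text" | grep -iv 'error')
# -o with -E: print only the matched part, here runs of digits.
only=$(printf '%s\n' "$text" | grep -Eo '[0-9]+')

printf '%s\n' "$ci" "$inv" "$only"
```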

Four, sed command

4.1. Overview

sed stands for Stream EDitor.
sed is a stream editor that processes its input line by line.

4.2 Basic syntax

sed [option] 'script' [input file]…
option part
-n: do not automatically output the content of the pattern space to stdout
-e: specify a script; it can be given multiple times to apply several edits in one sed command
-f: read the sed script from a file that contains the editing commands
-r: use extended regular expressions (-E is equivalent)
-i: edit the source file directly (in place)
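The effect of -n, -e, and -r is easiest to see side by side. This is a small sketch assuming GNU sed, with three invented input lines fed through stdin:

```shell
input=$(printf '%s\n' one two three)

# Without -n every line is printed; p then prints line 2 a second time.
dup=$(printf '%s\n' "$input" | sed '2p')
# With -n only what p prints explicitly reaches stdout.
only2=$(printf '%s\n' "$input" | sed -n '2p')
# -e chains several editing commands; they are applied in order to each line.
multi=$(printf '%s\n' "$input" | sed -e 's/one/1/' -e 's/two/2/')
# -r enables extended regular expressions: + and () need no escaping.
ext=$(printf '%s\n' "$input" | sed -r 's/(e+)$/<\1>/')

printf '%s\n' "$dup" "$only2" "$multi" "$ext"
```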

script part
An address delimitation followed by an editing command (similar to vim commands)
1) Empty address: the whole text is edited
2) Single address:
• #: a specific line number
• /pattern/: every line matched by the pattern
3) Address range:
• #,#
• #,+#
• #,/pattern/
• /pattern1/,/pattern2/
4) Step address:
• 1~2: start at line 1 and then match every 2nd line (all odd-numbered lines)
• 2~2: all even-numbered lines
5) Editing commands:
• d: delete the matched line; d is placed after the address
• p: print the content of the pattern space; p is placed after the address
• a: append text after the matched line; \n can be used to append multiple lines; a is placed after the address
• i: insert text before the matched line. Example: sed '3i hello' xxx
• c: replace the matched line with the specified text. Example: sed '3c text' xxx replaces the third line with text. sed -i '/xyz/c helloworld' num.txt
• w: save the matched content of the pattern space to the specified file. Example: sed -n '/^[^#]/w /tmp/demo' /etc/fstab saves the lines of /etc/fstab that do not start with # to /tmp/demo.
• r: read the content of the specified file and add it after the matched line of the current file, which merges the files.
• !: negate the condition. Usage: address!command.
• s///: search and replace.
Replacement flags: g (replace every occurrence on the line), p (print the lines where a replacement succeeded)
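The address forms and editing commands above can be checked on a numbered input. The following sketch assumes GNU sed (the #,+# and first~step addresses and the one-line c form are GNU features); paste is used only to join the output onto one line for display.

```shell
nums=$(seq 1 6)

# Single address: delete line 3.
del3=$(printf '%s\n' "$nums" | sed '3d' | paste -sd' ' -)
# Address range #,+#: print line 2 and the 2 lines after it.
range=$(printf '%s\n' "$nums" | sed -n '2,+2p' | paste -sd' ' -)
# Step address: every 2nd line starting at line 2.
even=$(printf '%s\n' "$nums" | sed -n '2~2p' | paste -sd' ' -)
# Negated address: delete every line that is NOT line 1.
first=$(printf '%s\n' "$nums" | sed '1!d')
# /pattern/ address with c: replace the whole matching line.
changed=$(printf '%s\n' "$nums" | sed '/^4$/c four' | paste -sd' ' -)
# s///g: replace every occurrence on the line, not just the first.
subst=$(echo 'a-b-c' | sed 's/-/+/g')

printf '%s\n' "$del3" "$range" "$even" "$first" "$changed" "$subst"
```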

Replacement example: get the directory part of an input path
echo "/var/log/messages" | sed -r 's@[^/]+$@@'
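When the text being replaced contains slashes, any other character can serve as the s command delimiter. The sketch below uses @ to split a path into directory and file name; the two patterns are one possible way to do this, not necessarily the author's exact command.

```shell
path="/var/log/messages"

# Using @ as the delimiter avoids escaping every / in the pattern.
# Remove the last "/" and everything after it: the directory remains.
dir=$(echo "$path" | sed -E 's@/[^/]+$@@')
# Remove everything up to and including the last "/": the file name remains.
base=$(echo "$path" | sed -E 's@.*/@@')

printf '%s\n' "$dir" "$base"
```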

4.3. sed advanced usage

  1. Pattern space and hold space

Matching is performed in the pattern space. When a line is not matched, its content is output to stdout unchanged by default; when a line is matched, the editing command is executed and its result is output to stdout.
The hold space can be understood as a temporary storage area, used only to carry out additional operations.

  2. Commands
    h: overwrite the hold space with the content of the pattern space;
    H: append the content of the pattern space to the hold space;
    g: overwrite the pattern space with the content of the hold space;
    G: append the content of the hold space to the pattern space;
    x: exchange the content of the pattern space and the hold space;
    n: read the next input line into the pattern space, overwriting its content;
    N: read the next input line and append it to the pattern space;
    d: delete the content of the pattern space;
    D: delete the content of a multi-line pattern space up to the first newline;
  3. Examples
    sed -n 'n;p' FILE: display the even-numbered lines;
    sed '1!G;h;$!d' FILE: display the contents of the file in reverse order;
    sed '$!d' FILE: take out the last line;
    sed '$!N;$!D' FILE: take out the last two lines of the file;
    sed '/^$/d;G' FILE: delete all the original blank lines, then add a blank line after every non-blank line;
    sed 'n;d' FILE: display the odd-numbered lines;
    sed 'G' FILE: add a blank line after each original line;
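Several of these one-liners can be traced on the numbers 1 to 5. This is a sketch assuming GNU sed; one_line is a hypothetical helper that only joins the output for display.

```shell
nums=$(seq 1 5)
one_line() { paste -sd' ' -; }

# 1!G appends the hold space (earlier lines, newest first) to each new line,
# h saves the result back, and $!d suppresses output until the last line.
reversed=$(printf '%s\n' "$nums" | sed '1!G;h;$!d' | one_line)
# n loads the next line; with -n only the explicit p prints, so lines 2, 4 appear.
even=$(printf '%s\n' "$nums" | sed -n 'n;p' | one_line)
# Without -n, n prints the current line first; d then drops the line it loaded.
odd=$(printf '%s\n' "$nums" | sed 'n;d' | one_line)
# $!N;$!D keeps a sliding two-line window, so the last two lines are left at the end.
last_two=$(printf '%s\n' "$nums" | sed '$!N;$!D' | one_line)

printf '%s\n' "$reversed" "$even" "$odd" "$last_two"
```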
  • Example: Extract string
    #!/bin/bash
    info="hellozimskyshenzhen"
    echo "$info" | sed 's/hello\(\w\+\)shenzhen/\1/g'

Remarks:

  • \d is not supported in sed. To match digits, use [0-9]; \w, however, is supported.
  • In sed's default (standard) regular expressions, ( ) and + must be escaped, and the word anchors < > must be escaped as well: \( \), \+, \< \>.
    • Example: Determine whether a string has the specified format
    #!/bin/bash
    # Determine whether the input is an integer
    if [ -n "$(echo "$1" | sed -n '/^[0-9]\+$/p')" ]; then
        echo 'yes'
    else
        echo 'no'
    fi
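The same check can be wrapped in a function so that it is easy to try with several inputs. The sketch below assumes GNU sed; is_integer is a hypothetical name, and -E is used so that + needs no escaping.

```shell
# Hypothetical helper: prints "yes" when its argument consists only of digits.
is_integer() {
    # [0-9] is used because sed has no \d; -n plus p prints only matching lines.
    if [ -n "$(printf '%s\n' "$1" | sed -n -E '/^[0-9]+$/p')" ]; then
        echo yes
    else
        echo no
    fi
}

r1=$(is_integer 12345)
r2=$(is_integer 12a45)
r3=$(is_integer '')
printf '%s\n' "$r1" "$r2" "$r3"
```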

    Five, awk command

    5.1. awk overview

    The name awk is formed from the initials of the surnames of its three authors (Aho, Weinberger, and Kernighan). awk is a report generator, mainly used to produce formatted text output.

    5.2. Basic Usage

    1. Syntax
    gawk [option] 'program' FILE
    where program: PATTERN{ACTION STATEMENTS}
    {ACTION STATEMENTS} can be understood as commands; the most commonly used are print and printf

    2. How awk reads a document
    awk reads the document line by line, splits each line into fields according to the input separator (available through the built-in variables $1, $2, ...), and processes them with the ACTION STATEMENTS. $0 represents the entire line.
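The field splitting described above can be observed on a single line. A minimal sketch, using one invented line in /etc/passwd format so the result does not depend on the host:

```shell
line='root:x:0:0:root:/root:/bin/bash'

# -F: splits the line on ":"; $0 is the whole line, $1..$NF are the fields.
whole=$(echo "$line" | awk -F: '{print $0}')
first=$(echo "$line" | awk -F: '{print $1}')
count=$(echo "$line" | awk -F: '{print NF}')
last=$(echo "$line" | awk -F: '{print $NF}')

printf '%s\n' "$whole" "$first" "$count" "$last"
```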

    3. option
    -F: specifies the input field separator;
    -v var=value: defines a custom variable;

    4. PATTERN (for delimitation)
    • Empty: every line of the file is processed
    • /pattern/: only lines matched by the regular expression are processed
    • !/pattern/: the negation of the above
    • Relational expression: a line is processed when the expression is true and skipped when it is false. Non-zero numbers and non-empty strings are true; everything else is false.
    • Line delimitation: giving line numbers directly (1,2{...}) is not supported; compare NR instead. See the example.
    • BEGIN/END mode: BEGIN{} is executed only once, before any text in the file is processed, for example to print a header. END{} is executed once after all text has been processed, for example to print summary data.

    For example:
    awk -F: '$NF=="/bin/bash" {print $1, $NF}' /etc/passwd
    awk -F: '$NF!~/bash$/ {print $1, $NF}' passwd
    awk -F: '$3<1000 {print $1, $3}' /etc/passwd
    awk -F: '(NR>=2&&NR<=10){print $1}' /etc/passwd    # line delimitation
    awk -F: '{printf "%-15s %10s\n", $1, $2}' /etc/passwd
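Because the content of /etc/passwd differs between machines, the pattern types can also be tried on a small made-up sample in the same format. The user names and IDs below are invented for illustration.

```shell
sample=$(printf '%s\n' \
  'root:x:0:0:root:/root:/bin/bash' \
  'daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin' \
  'alice:x:1000:1000:Alice:/home/alice:/bin/bash' \
  'bob:x:1001:1001:Bob:/home/bob:/bin/zsh')

# Relational pattern on a field.
bash_users=$(printf '%s\n' "$sample" | awk -F: '$NF=="/bin/bash" {print $1}')
# Negated regular-expression match on a field.
non_bash=$(printf '%s\n' "$sample" | awk -F: '$NF!~/bash$/ {print $1}')
# Line range expressed through NR.
rows_2_3=$(printf '%s\n' "$sample" | awk -F: 'NR>=2 && NR<=3 {print $1}')
# BEGIN runs once before the input, END once after it.
report=$(printf '%s\n' "$sample" | awk -F: 'BEGIN{print "users:"} $3>=1000{n++} END{print n}')

printf '%s\n' "$bash_users" "$non_bash" "$rows_2_3" "$report"
```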

    5. Variables

    • Built-in variables (no $ is needed when referencing them)
      FS: input field separator; whitespace by default. It can be set with -v.
      OFS: output field separator. It can be set with -v.
      RS: input record separator (the line break on input)
      ORS: output record separator (the line break on output)
      NF: number of fields in the current line. $NF refers to the last column.
      NR: number of records read so far; printing it gives the line number
      FNR: like NR, but counted separately for each file when several files are given
      FILENAME: the name of the current file
      ARGC: the number of command-line arguments
      ARGV: an array holding each command-line argument
      Example: awk 'BEGIN {print ARGV[0]}' /etc/fstab /etc/issue
      Here ARGV[0] is awk, which is always the 0th argument. ARGV[1] is /etc/fstab and ARGV[2] is /etc/issue.
      Example: awk -v FS=':' -v OFS=':' '{print $1}' /etc/passwd uses the colon as the input separator, the same as awk -F: ...
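A few of the built-in variables are easier to understand by watching them change. The sketch below uses short invented inputs on stdin; it should behave the same in gawk and other common awk implementations.

```shell
# NR numbers the records, NF counts the fields of each one.
counts=$(printf '%s\n' 'a b c' 'd e' | awk '{print NR, NF}')
# OFS is inserted between items that print receives separated by commas.
joined=$(echo 'a:b:c' | awk -v FS=':' -v OFS='-' '{print $1, $3}')
# Assigning to a field makes awk rebuild $0 using OFS.
rebuilt=$(echo 'a:b:c' | awk -F: -v OFS='-' '{$1=$1; print}')
# RS changes what counts as a record; here records are separated by ";".
records=$(printf 'x;y;z' | awk -v RS=';' '{print NR ": " $0}')

printf '%s\n' "$counts" "$joined" "$rebuilt" "$records"
```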

    • Custom variables
      Method 1: -v var=value (variable names are case sensitive)
      Method 2: define the variable inside the program

      Example: awk -v test='hello' 'BEGIN {print test}'
      awk 'BEGIN {test="hello"; print test}'
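Both ways of defining a variable, and the case sensitivity of names, can be confirmed with BEGIN-only programs that need no input file. In this sketch, greeting is an illustrative shell variable.

```shell
greeting='hello'

# Method 1: -v passes a shell value into awk before the program starts.
from_v=$(awk -v test="$greeting" 'BEGIN {print test}')
# Method 2: assign inside the program; awk strings use double quotes.
in_prog=$(awk 'BEGIN {test="hello"; print test}')
# Names are case sensitive: Test is a different variable and is still empty.
empty=$(awk -v test="$greeting" 'BEGIN {print "[" Test "]"}')

printf '%s\n' "$from_v" "$in_prog" "$empty"
```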

    6. Commonly used ACTION commands

    • print
      Output format: print item1, item2, ...
      Note: items are separated by commas; an item can be a string, a built-in variable, or an awk expression; if no item is given, the entire line ($0) is printed;

    • printf
      Formatted output: printf FORMAT, item1, item2, ...; the items fill the format specifiers in order.
      Note: FORMAT is required; printf does not add a line break, so \n must be written out explicitly; FORMAT must contain a format specifier for each item that follows;
    • Expressions
    • Control statements: if, while, and so on
      if(condition){statements}
      if(condition){statements} else {statements}
      while(condition) {statements}
      do {statements} while(condition)
      for(expr1;expr2;expr3) {statements}
      break
      continue
      delete array[index]: delete one element of an array
      delete array: delete the entire array
      exit: exit statement
    • Compound statements
    • Input statements
    • Output statements
      Format specifiers:
      • %c: display a character (a numeric item is shown as the character with that code)
      • %d: display a decimal integer
      • %e: display a numerical value in scientific notation
      • %f: display a floating-point number
      • %g: display a number in scientific or floating-point notation, whichever is shorter
      • %s: display a string
      • %u: display an unsigned integer
      • %%: display % itself
      Modifiers:
      • #[.#]: the first number controls the display width and the second is the precision (for floating-point numbers); output is right-aligned by default (%15s); -: left-aligned (%-15s); +: show the sign of a number;
      Operators:
      • Arithmetic operators: +, -, *, /; +x converts a string into a number; -x negates the value;
      • String operator: string concatenation (no operator symbol)
      • Assignment operators: =, +=, -=, *=, /=, ++, --
      • Comparison operators: >, >=, <, <=, !=, ==
      Pattern matching operators:
      • ~: true if the string on the left is matched by the pattern on the right
      • !~: true if the string on the left is not matched by the pattern on the right
      Logical operators:
      • &&: and
      • ||: or
      • !: not
      Function call:
      • function_name(arg1, arg2, ...)
      Conditional expression:
      • selector ? true_exp : false_exp, the same as the ternary operator in other languages
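The format modifiers, loops, and operators listed above can be exercised in BEGIN blocks without any input file. This is a small sketch; the values are arbitrary.

```shell
# printf: %-8s left-aligns in 8 columns, %5.1f is width 5 with 1 decimal,
# and %+d prints the sign of the number.
row=$(awk 'BEGIN {printf "%-8s|%5.1f|%+d\n", "cpu", 3.14159, 7}')
# for loop with string concatenation (no operator, just juxtaposition).
seqstr=$(awk 'BEGIN {s=""; for (i=1; i<=3; i++) s = s i; print s}')
# while loop left through break once i*i exceeds 50.
stopped=$(awk 'BEGIN {i=0; while (1) {i++; if (i*i > 50) break}; print i}')
# Conditional expression combined with the ~ match operator.
kind=$(echo 'abc123' | awk '{print (($0 ~ /^[0-9]+$/) ? "number" : "text")}')

printf '%s\n' "$row" "$seqstr" "$stopped" "$kind"
```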

    • Operation example
      # Generally, BEGIN and END blocks print content that does not depend on the input lines
      awk -v begin="hello" -v end="ok" -F: 'BEGIN{print begin} ; {print $1, $NF}; END{print end}' /etc/passwd
      5.3. awk advanced usage and examples

      Commonly used awk built-in variables

      $1: the first column
      $NF: the last column
      NR: the row number (written without $)

      Common condition expressions

      1) /Specify content/

      This form matches every line that contains the "specified content". No $# column is given in the condition, so the pattern is tested against the whole line and may match in a column you did not intend; for that reason it is better not to use this form when only one column should be matched.

      awk -F: '/nologin/{print $0}' /etc/passwd    # match the lines containing the keyword nologin
      seq 100 | awk '/1/{print $1}'

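      2) $#=="Specify content"

      This form requires the #th column to be exactly equal to the specified content. Note that the comparison uses == and a quoted string; a single = would be an assignment.

      awk -F: '$1=="bin"{print $0}' /etc/passwd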

      3) $#~/Specify content/

      This form performs a fuzzy (regular expression) match of the specified content against the given column and selects the matching lines.

      awk -F: '$1~/dae/{print $1}' /etc/passwd     # forward selection
      awk -F: '$1!~/dae/{print $1}' /etc/passwd    # reverse selection
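The three condition forms select different lines from the same input, which the following sketch makes visible. The three input lines are invented, and paste is used only to join the results for display.

```shell
sample=$(printf '%s\n' 'bin:x:2' 'sbin:x:3' 'daemon:x:1:/bin')

# /bin/ is tested against the whole line, so it matches in any column.
anywhere=$(printf '%s\n' "$sample" | awk -F: '/bin/{print $1}' | paste -sd' ' -)
# $1~/bin/ limits the regular-expression match to column 1.
col_regex=$(printf '%s\n' "$sample" | awk -F: '$1~/bin/{print $1}' | paste -sd' ' -)
# $1=="bin" is an exact string comparison on column 1.
col_exact=$(printf '%s\n' "$sample" | awk -F: '$1=="bin"{print $1}' | paste -sd' ' -)
# $1!~/bin/ selects the lines whose column 1 does not match.
col_not=$(printf '%s\n' "$sample" | awk -F: '$1!~/bin/{print $1}' | paste -sd' ' -)

printf '%s\n' "$anywhere" "$col_regex" "$col_exact" "$col_not"
```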

      4) Value judgment

      Use >,<,>=,<=,==,!= to judge the value of the specified column.

      awk -F: '$3>=10{print $1}' /etc/passwd

      5) Logical judgment

      Use && and || to make logical judgments.

      awk -F: '$3>=5 && $3<=10{print $1}' /etc/passwd

      6) if condition judgment

      awk -F: '{if ($NF~/nologin$/){i++}else{j++}}; END{print i, j}' /etc/passwd    # Note: the if-else statement must be placed inside {}
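Value comparison, logical operators, and if-else can be combined on a tiny invented sample. Adding 0 to the counters in the END block is a small precaution, not part of the original command: it prints 0 instead of an empty string when a counter was never incremented.

```shell
sample=$(printf '%s\n' 'a:3:/sbin/nologin' 'b:7:/bin/bash' 'c:12:/sbin/nologin')

# Numeric comparison combined with && on column 2.
mid=$(printf '%s\n' "$sample" | awk -F: '$2>=5 && $2<=10 {print $1}')
# if-else inside an action block, counting the two kinds of lines.
counts=$(printf '%s\n' "$sample" |
  awk -F: '{if ($NF~/nologin$/) i++; else j++} END{print i+0, j+0}')

printf '%s\n' "$mid" "$counts"
```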

      7) Dictionary use

      awk arrays are associative (they work like dictionaries), so they can be used for statistics.

      awk '{ip[$1]++}; END{for (i in ip) {print i, ip[i]}}' access.log
      # Analysis: the first column (the IP address) is used as the dictionary key, and its value is incremented by 1 each time the same IP appears, which counts the occurrences of every IP.
      # The for loop in the END block walks through the dictionary keys, and print outputs each key with its count. Pay attention to how the curly braces separate the blocks.

      # Columns: QQ number, level, duration
      # Sum the duration per account for levels in the range 30<=x<=90
      # 1234 12 23
      # 1234 10 122
      # 1233 92 4212
      # 1233 42 4252
      # 1239 87 2313
      # 1233 56 1121
      # 1231 19 45
      # 1235 45 679
      cat data | awk '$2>=30&&$2<=90{dic[$1]+=$3}; END{for (i in dic) {print i, dic[i]}}'
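The per-account sum can be reproduced without a data file by feeding the sample rows from the text through stdin. Since for (i in dic) visits the keys in no guaranteed order, this sketch pipes the result through sort to get stable output; that step is an addition, not part of the original command.

```shell
# The sample rows from the text: account, level, duration.
data=$(printf '%s\n' \
  '1234 12 23' '1234 10 122' '1233 92 4212' '1233 42 4252' \
  '1239 87 2313' '1233 56 1121' '1231 19 45' '1235 45 679')

# Sum the duration per account, counting only rows whose level is 30..90.
totals=$(printf '%s\n' "$data" |
  awk '$2>=30 && $2<=90 {dic[$1]+=$3} END {for (i in dic) print i, dic[i]}' |
  sort)

printf '%s\n' "$totals"
```

Account 1233 appears three times, but only the two rows with levels 42 and 56 fall inside the range, so its total is 4252 + 1121.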
