Hinson's blog

Excellence in cloud architecture and security domains is the eternal pursuit of my professional growth

A deep dive into the charm of the Linux awk command

awk options 'selection_criteria {action}' input-file > output-file
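To make the general form concrete, here is a small sketch. The sample data and the variable names (input, passed, names, top) are made up for illustration. It shows that the selection criteria and the action are each optional: a pattern alone prints matching lines, and an action alone runs on every line.

```shell
# Hypothetical sample input: a name and a score, separated by a space.
input='alice 82
bob 47
carol 91'

# Pattern only: print every line whose second field is greater than 50.
passed=$(printf '%s\n' "$input" | awk '$2 > 50')

# Action only: print the first field of every line.
names=$(printf '%s\n' "$input" | awk '{print $1}')

# Pattern and action together: names of those who scored above 50.
top=$(printf '%s\n' "$input" | awk '$2 > 50 {print $1}')

printf '%s\n' "$passed" "$names" "$top"
```

Here the input arrives on standard input instead of an input-file, which awk accepts whenever no file name is given.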


  • Awk’s built-in variables include the field variables $1, $2, $3, and so on; $0 is the entire line.
  • NR: holds the running count of input records read so far. For example, '{print NR, $0}' prints every line along with its line number.
  • NF: holds the number of fields in the current input record, so $NF refers to the last field.
  • FS: holds the field separator used to split each input line into fields. The default is white space, meaning runs of space and tab characters. FS can be reassigned to another character (typically in BEGIN) to change the field separator.
  • RS: holds the record separator. The default is a newline, so by default each input line is one record.
  • OFS: holds the output field separator, which separates the fields when awk prints them. The default is a single space. Whenever print is given several arguments separated by commas, it prints the value of OFS between them.
  • ORS: holds the output record separator, which terminates each output line when awk prints it. The default is a newline character. print automatically appends the contents of ORS to whatever it is given to print.
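The variables above can be seen working together in a short sketch. The colon-separated records and the shell variable names (data, report, joined, pieces) are assumptions made up for this example, not part of any real file.

```shell
# Hypothetical records in the form name:role:city.
data='ann:admin:Oslo
raj:dev:Pune'

# FS splits each record on ":", OFS is printed between the comma-separated
# arguments of print. NR is the record number, NF the field count, and
# $NF the last field of the current record.
report=$(printf '%s\n' "$data" |
  awk 'BEGIN { FS = ":"; OFS = " | " } { print NR, NF, $1, $NF }')
printf '%s\n' "$report"

# ORS replaces the newline that print normally appends to each record.
joined=$(printf '%s\n' "$data" | awk -F: 'BEGIN { ORS = ";" } { print $1 }')
printf '%s\n' "$joined"

# RS changes what counts as a record: here each comma ends one.
pieces=$(printf 'a,b,c' | awk 'BEGIN { RS = "," } { print NR ": " $0 }')
printf '%s\n' "$pieces"
```

Setting FS in BEGIN and passing -F: on the command line are two ways of doing the same thing; BEGIN is handy when several variables need to be set at once.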

  • Print the line numbers of empty lines: awk 'NF == 0 {print NR}' geeksforgeeks.txt (NF == 0 also matches lines that contain only white space)



192.168.1.1 - - [10/Oct/2023:13:55:36 -0400] "GET /index.html HTTP/1.1" 200 2326
192.168.1.2 - - [10/Oct/2023:14:05:21 -0400] "POST /login HTTP/1.1" 302 512
192.168.1.3 - - [10/Oct/2023:14:15:55 -0400] "GET /products/1 HTTP/1.1" 200 1548
192.168.1.4 - - [10/Oct/2023:14:20:33 -0400] "GET /about-us HTTP/1.1" 200 984
192.168.1.5 - - [10/Oct/2023:14:35:47 -0400] "GET /contact HTTP/1.1" 200 1105
192.168.1.6 - - [10/Oct/2023:14:45:22 -0400] "POST /api/data HTTP/1.1" 200 2048
192.168.1.1 - - [10/Oct/2023:15:00:18 -0400] "GET /news HTTP/1.1" 200 3072
192.168.1.2 - - [10/Oct/2023:15:05:29 -0400] "DELETE /api/session HTTP/1.1" 204 0
192.168.1.3 - - [10/Oct/2023:15:15:45 -0400] "GET /images/logo.png HTTP/1.1" 200 1024
192.168.1.4 - - [10/Oct/2023:15:25:53 -0400] "PUT /profile/update HTTP/1.1" 200 768
192.168.1.5 - - [10/Oct/2023:15:40:19 -0400] "GET /dashboard HTTP/1.1" 200 2150
192.168.1.6 - - [10/Oct/2023:15:50:05 -0400] "GET /settings HTTP/1.1" 200 1234
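  • awk '{print $1}' access.log extracts the first field (the IP address) from each line.
  • sort sorts the IP addresses so that identical addresses end up on adjacent lines, which uniq relies on.
  • uniq -c collapses each run of identical addresses into one line and prefixes it with the number of occurrences.
  • sort -nr sorts those lines numerically by count, in descending order.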
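Chained together, the four stages described above presumably form the pipeline sketched below. This is a reconstruction from the bullets, run against three lines of the sample log copied into a scratch directory; the workdir, counts and counts_awk names are assumptions for the example. The second command is an awk-only alternative that does the counting in an associative array.

```shell
# Copy a few lines of the sample log into a scratch directory.
workdir=$(mktemp -d)
cat > "$workdir/access.log" <<'EOF'
192.168.1.1 - - [10/Oct/2023:13:55:36 -0400] "GET /index.html HTTP/1.1" 200 2326
192.168.1.2 - - [10/Oct/2023:14:05:21 -0400] "POST /login HTTP/1.1" 302 512
192.168.1.1 - - [10/Oct/2023:15:00:18 -0400] "GET /news HTTP/1.1" 200 3072
EOF

# Requests per IP address, busiest client first.
counts=$(awk '{print $1}' "$workdir/access.log" | sort | uniq -c | sort -nr)
printf '%s\n' "$counts"

# Alternative: let awk count in an array keyed by IP, then sort the totals.
counts_awk=$(awk '{ hits[$1]++ } END { for (ip in hits) print hits[ip], ip }' \
  "$workdir/access.log" | sort -nr)
printf '%s\n' "$counts_awk"

rm -rf "$workdir"
```

The width of the padding that uniq -c puts in front of each count differs between implementations, so scripts that parse its output should split on white space and not rely on fixed columns.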

ItemID,ItemName,Quantity,Price
1001,Apple,30,0.50
1002,Banana,20,0.20
1003,Orange,25,0.35
1001,Apple,15,0.50
1003,Orange,10,0.35
1002,Banana,30,0.20
  • -F,: Tells awk to use the comma (,) as the field separator, as the data is in CSV format.
  • NR > 1: NR stands for “Number of Records” which in awk is synonymous with the current line number being processed. NR > 1 skips the first line (the header).
  • qty[$2] += $3; sales[$2] += $3 * $4: For each line, these expressions accumulate the total quantity and total sales per item. $2, $3, and $4 refer to the second (ItemName), third (Quantity), and fourth (Price) fields of the current line, respectively. Arrays qty and sales are indexed by the item name.
  • END {for (item in qty) print item, qty[item], sales[item]}: After processing all lines, this block iterates over the qty array and prints the item name, total quantity, and total sales for each item.
  • BEGIN {print "ItemName | Total Quantity | Total Sales ($)"}: Before processing any data, this prints a header row for the report.
  • printf "%-10s | %14d | %15.2f\n", item, qty[item], sales[item]: Uses printf for formatted output, ensuring columns are aligned and sales are shown with two decimal places. The %-10s format specifier left-aligns item names in a field 10 characters wide, while %14d and %15.2f right-align the numbers.
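Putting the pieces above together gives two commands, sketched here under a few assumptions: the file name sales.csv and the shell variable names are made up for the example, and the awk programs are assembled from the fragments in the bullets. The first prints plain totals, the second adds the header row and the printf formatting.

```shell
# Write the sample CSV to a scratch directory (sales.csv is a made-up name).
workdir=$(mktemp -d)
cat > "$workdir/sales.csv" <<'EOF'
ItemID,ItemName,Quantity,Price
1001,Apple,30,0.50
1002,Banana,20,0.20
1003,Orange,25,0.35
1001,Apple,15,0.50
1003,Orange,10,0.35
1002,Banana,30,0.20
EOF

# Plain totals: skip the header, accumulate per item name, print at the end.
totals=$(awk -F, '
  NR > 1 { qty[$2] += $3; sales[$2] += $3 * $4 }
  END    { for (item in qty) print item, qty[item], sales[item] }
' "$workdir/sales.csv")
printf '%s\n' "$totals"

# Formatted report: same accumulation, with a header row and aligned columns.
report=$(awk -F, '
  BEGIN  { print "ItemName | Total Quantity | Total Sales ($)" }
  NR > 1 { qty[$2] += $3; sales[$2] += $3 * $4 }
  END    { for (item in qty)
             printf "%-10s | %14d | %15.2f\n", item, qty[item], sales[item] }
' "$workdir/sales.csv")
printf '%s\n' "$report"

rm -rf "$workdir"
```

awk does not define the order in which for (item in qty) visits the array, so the rows can come out in any order. If a stable order matters, pipe the data rows through sort.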

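Output of the formatted report (awk does not guarantee the iteration order of for (item in qty), so the rows may appear in a different order):

ItemName | Total Quantity | Total Sales ($)
Apple      |             45 |           22.50
Banana     |             50 |           10.00
Orange     |             35 |           12.25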
