Linux Text Processing: grep, sed, awk, cut, sort, uniq, and Regular Expressions
Master Linux text processing commands, including grep for searching, sed for stream editing, awk for structured data, and related tools such as cut, sort, and uniq, with practical examples.
Text processing is one of the most powerful aspects of Linux. Mastering these tools enables efficient data analysis, log parsing, and system administration tasks.
grep - Search Text
Find lines matching patterns.
# Simple search
grep "error" logfile.txt
# Case-insensitive search
grep -i "error" logfile.txt
# Show matching lines with numbers
grep -n "error" logfile.txt
# 45: This is an error message
# 67: Another error occurred
# Invert match (show non-matching)
grep -v "debug" logfile.txt
# Count matching lines
grep -c "error" logfile.txt
# 15
# Show only matching part
grep -o "pattern" file.txt
# Show context lines
grep -B 2 "error" file.txt # Before
grep -A 2 "error" file.txt # After
grep -C 2 "error" file.txt # Context (both)
# Search multiple files
grep "error" *.log
grep -r "error" /var/log/
# Show file names only
grep -l "error" *.txt
# Recursive with file types
grep -r --include="*.log" "error" /var/log/
# Word boundary
grep -w "error" file.txt
# Matches "error" but not "errors"
# Extended regex
grep -E "error|warning" file.txt
# Fixed string (no regex)
grep -F "10.0.0.1" logfile.txt
# Performance optimization
grep -m 1 "pattern" largefile.txt
# -m: stop after first match
# Highlight matches
grep --color=always "error" file.txt | less -R
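grep's exit status (0 when a match is found) also makes it useful in scripts. A minimal sketch, assuming a hypothetical logfile.txt:
# -q suppresses output; only the exit status matters
if grep -q "error" logfile.txt; then
    echo "Errors found - check the log"
fi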
sed - Stream Editor
Edit text with commands.
# Print specific lines
sed -n '5p' file.txt # Line 5
sed -n '5,10p' file.txt # Lines 5-10
sed -n '1p;$p' file.txt # First and last
# Delete lines
sed '5d' file.txt # Delete line 5
sed '5,10d' file.txt # Delete lines 5-10
sed '/pattern/d' file.txt # Delete matching lines
# Substitute text
sed 's/old/new/' file.txt # Replace first occurrence
sed 's/old/new/g' file.txt # Replace all occurrences
sed 's/old/new/2' file.txt # Replace 2nd occurrence
sed 's/old/new/2g' file.txt # Replace from 2nd onward
# Case-insensitive substitution
sed 's/old/new/i' file.txt
# Substitute in specific line
sed '5s/old/new/' file.txt # Line 5 only
sed '5,10s/old/new/g' file.txt # Lines 5-10
# Use different delimiter
sed 's|/path/old|/path/new|' file.txt
# Escape special characters
sed 's/\*/asterisk/' file.txt
# Backreferences
sed 's/\([0-9]\+\) \(.*\)/\2: \1/' file.txt
# Save changes to file
sed -i 's/old/new/g' file.txt
sed -i.bak 's/old/new/g' file.txt # Backup original
# Multiple operations
sed -e 's/old/new/g' -e 's/foo/bar/g' file.txt
# Transform characters
sed 'y/abc/xyz/' file.txt
# a->x, b->y, c->z
# Insert/append lines
sed '5a\New line text' file.txt # Append after line 5
sed '5i\New line text' file.txt # Insert before line 5
sed '$ a\End of file' file.txt # Append at end
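A short sketch combining several sed operations: strip trailing whitespace and delete blank lines in place, keeping a backup (notes.txt is a hypothetical file):
sed -i.bak -e 's/[[:space:]]*$//' -e '/^$/d' notes.txt
# Original is preserved as notes.txt.bak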
awk - Text Processing Language
Process structured data.
# Print specific columns
awk '{print $1}' file.txt # First column
awk '{print $1, $3}' file.txt # Columns 1 and 3
awk '{print NF}' file.txt # Number of fields
# Field separator
awk -F: '{print $1}' /etc/passwd # Parse password file
awk -F',' '{print $2}' data.csv # CSV file
# Pattern matching
awk '/error/ {print}' file.txt
awk '/error/ {print $0}' file.txt
# Conditional statements
awk '$3 > 100 {print $1, $3}' data.txt
# Built-in variables
awk '{print NR, $0}' file.txt # Line number and content
awk 'END {print NR}' file.txt # Total lines
awk 'BEGIN {FS=":"} {print $1}' /etc/passwd
# Sum values
awk '{sum += $1} END {print sum}' numbers.txt
# Count occurrences
awk '/pattern/ {count++} END {print count}' file.txt
# Calculate average
awk '{sum += $1; count++} END {print sum/count}' numbers.txt
# Multiple conditions
awk '$2 > 50 && $3 < 100 {print $1}' data.txt
# Pattern ranges
awk '/start/,/end/ {print}' file.txt
# String functions
awk '{print toupper($1)}' file.txt
awk '{print substr($0, 1, 5)}' file.txt
awk '{print length($0)}' file.txt
# Complex awk script
awk 'BEGIN {
    print "Processing file..."
}
/error/ {
    error_count++
}
END {
    print "Total errors:", error_count + 0
}' logfile.txt
# Process CSV with headers
awk -F',' 'NR==1 {for(i=1;i<=NF;i++) h[$i]=i; next} {print $(h["name"])}' data.csv
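Associative arrays make awk handy for quick reports. A sketch that totals %MEM per user, assuming the usual ps aux layout (user in field 1, %MEM in field 4):
ps aux | awk 'NR > 1 {mem[$1] += $4} END {for (u in mem) print u, mem[u] "%"}'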
cut - Extract Columns
Select columns from text.
# Cut specific columns
cut -f 1 file.txt # Column 1 (tab-delimited)
cut -f 1,3 file.txt # Columns 1 and 3
cut -f 2-4 file.txt # Columns 2-4
# Specify delimiter
cut -d: -f1 /etc/passwd # First field of password file
cut -d',' -f2 data.csv # Second field of CSV
# Character positions
cut -c 1-5 file.txt # Characters 1-5
cut -c 1,3,5 file.txt # Characters 1, 3, 5
cut -c 5- file.txt # From character 5 onward
# Complement (exclude columns)
cut -f2 --complement file.txt
# Output delimiter
cut -d: -f1,3 --output-delimiter=, /etc/passwd
# Extract field range
cut -d: -f1-3 /etc/passwd
# Only lines with delimiter
cut -d: -f1 --only-delimited /etc/passwd
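cut works on command output as well as files. A quick sketch listing mount points, assuming mount's usual "device on mountpoint type fstype" output:
mount | cut -d' ' -f3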
sort - Sort Lines
Sort text data.
# Simple sort
sort file.txt
# Reverse sort
sort -r file.txt
# Sort numerically
sort -n numbers.txt
sort -nr numbers.txt # Numeric reverse
# Sort by field
sort -k2 file.txt # Sort by field 2
sort -k2n file.txt # Numeric field sort
sort -k1,1 -k2n file.txt # Multiple keys
# Field delimiter
sort -t: -k3n /etc/passwd # Sort by UID
# Case-insensitive
sort -f file.txt
# Unique (same as sort | uniq)
sort -u file.txt
# Check if sorted
sort -c file.txt
# Returns 0 if sorted, 1 if not
# Temporary file location
sort -T /tmp file.txt
# Parallel sort
sort --parallel=4 file.txt
# Sort by month
sort -M file.txt
# Version sort
sort -V file.txt
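GNU sort also understands human-readable sizes with -h, which pairs well with du. A small sketch listing the largest subdirectories:
du -sh */ | sort -h              # Smallest to largest
du -sh */ | sort -hr | head -5   # Five largest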
uniq - Remove Duplicates
Remove or count duplicate lines (uniq only compares adjacent lines, so sort the input first).
# Remove duplicates
sort file.txt | uniq
# Count duplicates
sort file.txt | uniq -c
# 1 line one
# 3 line two
# 2 line three
# Show only duplicates
sort file.txt | uniq -d
# Show only unique
sort file.txt | uniq -u
# Case-insensitive
sort file.txt | uniq -i
# Skip fields
sort file.txt | uniq -f1
# Ignore first field when comparing
# Skip characters
sort file.txt | uniq -s5
# Skip first 5 characters
# Check characters
sort file.txt | uniq -w10
# Only check first 10 characters
# Combine skip and width
sort file.txt | uniq -s0 -w5
# Skip 0 characters, compare only the first 5
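A classic frequency-count pipeline using the tools covered so far. A sketch reporting the busiest client IPs, assuming a hypothetical access.log with the IP as the first space-separated field:
cut -d' ' -f1 access.log | sort | uniq -c | sort -rn | head -10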
Additional Text Processing Tools
wc - Word Count
# Count lines, words, characters
wc file.txt
# 45 156 1230 file.txt
# Count lines
wc -l file.txt
# Count words
wc -w file.txt
# Count characters
wc -c file.txt
# Count files
wc -l *.txt
# Total lines
wc -l *.txt | tail -1
# Largest file
wc -l *.txt | grep -v ' total$' | sort -rn | head -1
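wc also reads standard input, so it can count whatever a pipeline produces. A quick sketch counting accounts that use bash, shown in two equivalent forms:
grep '/bin/bash$' /etc/passwd | wc -l
grep -c '/bin/bash$' /etc/passwd   # grep can count matching lines itself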
diff - Compare Files
# Show differences
diff file1.txt file2.txt
# Unified format
diff -u file1.txt file2.txt
# Context format
diff -c file1.txt file2.txt
# Ignore whitespace
diff -w file1.txt file2.txt
# Compare directories
diff -r dir1/ dir2/
# Ignore case
diff -i file1.txt file2.txt
# Side-by-side
diff -y file1.txt file2.txt
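With bash process substitution, diff can compare command output directly. A sketch comparing two files while ignoring line order:
diff <(sort file1.txt) <(sort file2.txt)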
tr - Translate Characters
# Replace characters
tr 'a-z' 'A-Z' < file.txt # Lowercase to uppercase
# Delete characters
tr -d '0-9' < file.txt
# Squeeze repeated
tr -s ' ' < file.txt
# Translate
tr ':' ',' < file.txt
# Delete digits and spaces
tr -d '0-9 ' < file.txt
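Combining tr with sort and uniq gives a rough word-frequency count. A minimal sketch for a hypothetical file.txt:
tr -cs '[:alpha:]' '\n' < file.txt | tr '[:upper:]' '[:lower:]' | sort | uniq -c | sort -rn | head -10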
Regular Expressions
Basic Regex
# Anchor to start
grep "^error" file.txt
# Anchor to end
grep "error$" file.txt
# Any character
grep "c.t" file.txt # cat, cot, cut
# Character class
grep "[aeiou]" file.txt # Contains vowel
grep "[0-9]" file.txt # Contains digit
grep "[^0-9]" file.txt # Non-digit
# Quantifiers (+, ?, and {} need -E; in basic regex escape them as \+ \? \{ \})
grep "a*" file.txt # Zero or more a
grep -E "a+" file.txt # One or more a
grep -E "a?" file.txt # Zero or one a
grep -E "a{3}" file.txt # Exactly 3 a's
grep -E "a{2,4}" file.txt # 2 to 4 a's
# OR operator
grep -E "cat|dog" file.txt
# Grouping
grep -E "(ab)+" file.txtExtended Regex (grep -E or egrep)
# Email validation
egrep '^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$' file.txt
# IP address
egrep '^([0-9]{1,3}\.){3}[0-9]{1,3}$' file.txt
# Phone number
egrep '^\+?[0-9]{1,3}-?[0-9]{3,4}-?[0-9]{4}$' file.txt
# Date (YYYY-MM-DD)
egrep '^[0-9]{4}-[0-9]{2}-[0-9]{2}$' file.txt
# URL
egrep 'https?://[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}' file.txt
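Patterns are easiest to verify against sample strings before running them on real data. Note that the IP pattern above checks only the shape, not the 0-255 range:
printf '%s\n' 192.168.1.1 999.999.999.999 not-an-ip | grep -E '^([0-9]{1,3}\.){3}[0-9]{1,3}$'
# 192.168.1.1
# 999.999.999.999   (format matches even though it is not a valid address)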
Practical Examples
Parse Log Files
# Count error types
grep -o 'error: [^;]*' app.log | cut -d: -f2 | sort | uniq -c
# Extract timestamps
grep "2026-01-23" access.log | awk '{print $4}' | cut -d: -f1-2
# Analyze by status code
cut -d' ' -f9 access.log | sort | uniq -c | sort -rn
Data Processing
# CSV analysis
awk -F',' '{print $2}' data.csv | sort | uniq -c | sort -rn
# Remove duplicates maintaining order
awk '!seen[$0]++' file.txt
# Filter and transform
grep -E '^[0-9]+' data.txt | awk '{print $1 * 2}' | sort -n
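Putting it together, a sketch that reports the five most-requested paths from a hypothetical access.log in common log format (request path in field 7):
awk '{hits[$7]++} END {for (p in hits) print hits[p], p}' access.log | sort -rn | head -5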
Best Practices
- Test First - Use non-destructive commands first
- Use Pipes - Chain commands for efficiency
- Understand Regex - Learn regex patterns
- Know Your Tools - Choose right tool for task
- Use Delimiters - Specify delimiters explicitly
- Backup Files - Before using sed -i
- Escape Special Chars - In regex and sed
- Performance - Consider file size and complexity
- Document Scripts - Comment complex one-liners
- Test Regex - Verify patterns work correctly
Summary
Text processing is essential for Linux mastery:
- grep finds patterns in text
- sed edits text streams
- awk processes structured data
- cut extracts columns
- sort/uniq organize data
- Regular expressions enable pattern matching
- These tools combined enable powerful data processing
Master text processing for professional Linux administration and data analysis.