Linux Text Processing: grep, sed, awk, cut, sort, uniq, and Regular Expressions
Master Linux text processing commands, including grep for searching, sed for stream editing, awk for structured data, and related tools such as cut, sort, and uniq, with practical examples.
Text processing is one of the most powerful aspects of Linux. Mastering these tools enables efficient data analysis, log parsing, and system administration tasks.
grep - Search Text
Find lines matching patterns.
# Simple search
grep "error" logfile.txt
# Case-insensitive search
grep -i "error" logfile.txt
# Show matching lines with numbers
grep -n "error" logfile.txt
# 45: This is an error message
# 67: Another error occurred
# Invert match (show non-matching)
grep -v "debug" logfile.txt
# Count matching lines
grep -c "error" logfile.txt
# 15
# Show only matching part
grep -o "pattern" file.txt
# Show context lines
grep -B 2 "error" file.txt # Before
grep -A 2 "error" file.txt # After
grep -C 2 "error" file.txt # Context (both)
# Search multiple files
grep "error" *.log
grep -r "error" /var/log/
# Show file names only
grep -l "error" *.txt
# Recursive with file types
grep -r --include="*.log" "error" /var/log/
# Word boundary
grep -w "error" file.txt
# Matches "error" but not "errors"
# Extended regex
grep -E "error|warning" file.txt
# Fixed string (no regex)
grep -F "10.0.0.1" logfile.txt
# Performance optimization
grep -m 1 "pattern" largefile.txt
# -m: stop after first match
# Highlight matches
grep --color=always "error" file.txt | less -R
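grep's exit status (0 when a match is found) also makes it useful in scripts. A minimal sketch, assuming a hypothetical logfile.txt:
# -q suppresses output; only the exit status matters
if grep -q "error" logfile.txt; then
    echo "Errors found - check the log"
fi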
sed - Stream Editor
Edit text with commands.
# Print specific lines
sed -n '5p' file.txt # Line 5
sed -n '5,10p' file.txt # Lines 5-10
sed -n '1p;$p' file.txt # First and last
# Delete lines
sed '5d' file.txt # Delete line 5
sed '5,10d' file.txt # Delete lines 5-10
sed '/pattern/d' file.txt # Delete matching lines
# Substitute text
sed 's/old/new/' file.txt # Replace first occurrence
sed 's/old/new/g' file.txt # Replace all occurrences
sed 's/old/new/2' file.txt # Replace 2nd occurrence
sed 's/old/new/2g' file.txt # Replace from 2nd onward
# Case-insensitive substitution
sed 's/old/new/i' file.txt
# Substitute in specific line
sed '5s/old/new/' file.txt # Line 5 only
sed '5,10s/old/new/g' file.txt # Lines 5-10
# Use different delimiter
sed 's|/path/old|/path/new|' file.txt
# Escape special characters
sed 's/\*/asterisk/' file.txt
# Backreferences
sed 's/\([0-9]\+\) \(.*\)/\2: \1/' file.txt
# Save changes to file
sed -i 's/old/new/g' file.txt
sed -i.bak 's/old/new/g' file.txt # Backup original
# Multiple operations
sed -e 's/old/new/g' -e 's/foo/bar/g' file.txt
# Transform characters
sed 'y/abc/xyz/' file.txt
# a->x, b->y, c->z
# Insert/append lines
sed '5a\New line text' file.txt # Append after line 5
sed '5i\New line text' file.txt # Insert before line 5
sed '$ a\End of file' file.txt # Append at end
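A short sketch combining several sed operations: strip trailing whitespace and delete blank lines in place, keeping a backup (notes.txt is a hypothetical file):
sed -i.bak -e 's/[[:space:]]*$//' -e '/^$/d' notes.txt
# Original is preserved as notes.txt.bak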
awk - Text Processing Language
Process structured data.
# Print specific columns
awk '{print $1}' file.txt # First column
awk '{print $1, $3}' file.txt # Columns 1 and 3
awk '{print NF}' file.txt # Number of fields
# Field separator
awk -F: '{print $1}' /etc/passwd # Parse password file
awk -F',' '{print $2}' data.csv # CSV file
# Pattern matching
awk '/error/ {print}' file.txt
awk '/error/ {print $0}' file.txt
# Conditional statements
awk '$3 > 100 {print $1, $3}' data.txt
# Built-in variables
awk '{print NR, $0}' file.txt # Line number and content
awk 'END {print NR}' file.txt # Total lines
awk 'BEGIN {FS=":"} {print $1}' /etc/passwd
# Sum values
awk '{sum += $1} END {print sum}' numbers.txt
# Count occurrences
awk '/pattern/ {count++} END {print count}' file.txt
# Calculate average
awk '{sum += $1; count++} END {print sum/count}' numbers.txt
# Multiple conditions
awk '$2 > 50 && $3 < 100 {print $1}' data.txt
# Pattern ranges
awk '/start/,/end/ {print}' file.txt
# String functions
awk '{print toupper($1)}' file.txt
awk '{print substr($0, 1, 5)}' file.txt
awk '{print length($0)}' file.txt
# Complex awk script
awk 'BEGIN {
    print "Processing file..."
}
/error/ {
    error_count++
}
END {
    print "Total errors:", error_count + 0
}' logfile.txt
# Process CSV with headers
awk -F',' 'NR==1 {for(i=1;i<=NF;i++) h[$i]=i; next} {print $(h["name"])}' data.csv
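Associative arrays make awk handy for quick reports. A sketch that totals %MEM per user, assuming the usual ps aux layout (user in field 1, %MEM in field 4):
ps aux | awk 'NR > 1 {mem[$1] += $4} END {for (u in mem) print u, mem[u] "%"}'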
cut - Extract Columns
Select columns from text.
# Cut specific columns
cut -f 1 file.txt # Column 1 (tab-delimited)
cut -f 1,3 file.txt # Columns 1 and 3
cut -f 2-4 file.txt # Columns 2-4
# Specify delimiter
cut -d: -f1 /etc/passwd # First field of password file
cut -d',' -f2 data.csv # Second field of CSV
# Character positions
cut -c 1-5 file.txt # Characters 1-5
cut -c 1,3,5 file.txt # Characters 1, 3, 5
cut -c 5- file.txt # From character 5 onward
# Complement (exclude columns)
cut -f2 --complement file.txt
# Output delimiter
cut -d: -f1,3 --output-delimiter=, /etc/passwd
# Extract field range
cut -d: -f1-3 /etc/passwd
# Only lines with delimiter
cut -d: -f1 --only-delimited /etc/passwd
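cut works on command output as well as files. A quick sketch listing mount points, assuming mount's usual "device on mountpoint type fstype" output:
mount | cut -d' ' -f3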
sort - Sort Lines
Sort text data.
# Simple sort
sort file.txt
# Reverse sort
sort -r file.txt
# Sort numerically
sort -n numbers.txt
sort -nr numbers.txt # Numeric reverse
# Sort by field
sort -k2 file.txt # Sort by field 2
sort -k2n file.txt # Numeric field sort
sort -k1,1 -k2n file.txt # Multiple keys
# Field delimiter
sort -t: -k3n /etc/passwd # Sort by UID
# Case-insensitive
sort -f file.txt
# Unique (same as sort | uniq)
sort -u file.txt
# Check if sorted
sort -c file.txt
# Returns 0 if sorted, 1 if not
# Temporary file location
sort -T /tmp file.txt
# Parallel sort
sort --parallel=4 file.txt
# Sort by month
sort -M file.txt
# Version sort
sort -V file.txt
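GNU sort also understands human-readable sizes with -h, which pairs well with du. A small sketch listing the largest subdirectories:
du -sh */ | sort -h              # Smallest to largest
du -sh */ | sort -hr | head -5   # Five largest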
uniq - Remove Duplicates
Remove or count duplicate lines (uniq only compares adjacent lines, so sort the input first).
# Remove duplicates
sort file.txt | uniq
# Count duplicates
sort file.txt | uniq -c
# 1 line one
# 3 line two
# 2 line three
# Show only duplicates
sort file.txt | uniq -d
# Show only unique
sort file.txt | uniq -u
# Case-insensitive
sort file.txt | uniq -i
# Skip fields
sort file.txt | uniq -f1
# Ignore first field when comparing
# Skip characters
sort file.txt | uniq -s5
# Skip first 5 characters
# Check characters
sort file.txt | uniq -w10
# Only check first 10 characters
# Combine skip and width
sort file.txt | uniq -s0 -w5
# Skip 0 characters, compare only the first 5
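A classic frequency-count pipeline using the tools covered so far. A sketch reporting the busiest client IPs, assuming a hypothetical access.log with the IP as the first space-separated field:
cut -d' ' -f1 access.log | sort | uniq -c | sort -rn | head -10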
Additional Text Processing Tools
wc - Word Count
# Count lines, words, characters
wc file.txt
# 45 156 1230 file.txt
# Count lines
wc -l file.txt
# Count words
wc -w file.txt
# Count characters
wc -c file.txt
# Count files
wc -l *.txt
# Total lines
wc -l *.txt | tail -1
# Largest file
wc -l *.txt | grep -v ' total$' | sort -rn | head -1
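wc also reads standard input, so it can count whatever a pipeline produces. A quick sketch counting accounts that use bash, shown in two equivalent forms:
grep '/bin/bash$' /etc/passwd | wc -l
grep -c '/bin/bash$' /etc/passwd   # grep can count matching lines itself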
diff - Compare Files
# Show differences
diff file1.txt file2.txt
# Unified format
diff -u file1.txt file2.txt
# Context format
diff -c file1.txt file2.txt
# Ignore whitespace
diff -w file1.txt file2.txt
# Compare directories
diff -r dir1/ dir2/
# Ignore case
diff -i file1.txt file2.txt
# Side-by-side
diff -y file1.txt file2.txt
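With bash process substitution, diff can compare command output directly. A sketch comparing two files while ignoring line order:
diff <(sort file1.txt) <(sort file2.txt)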
tr - Translate Characters
# Replace characters
tr 'a-z' 'A-Z' < file.txt # Lowercase to uppercase
# Delete characters
tr -d '0-9' < file.txt
# Squeeze repeated
tr -s ' ' < file.txt
# Translate
tr ':' ',' < file.txt
# Delete digits and spaces
tr -d '0-9 ' < file.txt
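Combining tr with sort and uniq gives a rough word-frequency count. A minimal sketch for a hypothetical file.txt:
tr -cs '[:alpha:]' '\n' < file.txt | tr '[:upper:]' '[:lower:]' | sort | uniq -c | sort -rn | head -10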
Regular Expressions
Basic Regex
# Anchor to start
grep "^error" file.txt
# Anchor to end
grep "error$" file.txt
# Any character
grep "c.t" file.txt # cat, cot, cut
# Character class
grep "[aeiou]" file.txt # Contains vowel
grep "[0-9]" file.txt # Contains digit
grep "[^0-9]" file.txt # Non-digit
# Quantifiers (+, ?, and {} need -E; in basic regex escape them as \+ \? \{ \})
grep "a*" file.txt # Zero or more a
grep -E "a+" file.txt # One or more a
grep -E "a?" file.txt # Zero or one a
grep -E "a{3}" file.txt # Exactly 3 a's
grep -E "a{2,4}" file.txt # 2 to 4 a's
# OR operator
grep -E "cat|dog" file.txt
# Grouping
grep -E "(ab)+" file.txtExtended Regex (grep -E or egrep)
# Email validation
egrep '^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$' file.txt
# IP address
egrep '^([0-9]{1,3}\.){3}[0-9]{1,3}$' file.txt
# Phone number
egrep '^\+?[0-9]{1,3}-?[0-9]{3,4}-?[0-9]{4}$' file.txt
# Date (YYYY-MM-DD)
egrep '^[0-9]{4}-[0-9]{2}-[0-9]{2}$' file.txt
# URL
egrep 'https?://[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}' file.txt
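Patterns are easiest to verify against sample strings before running them on real data. Note that the IP pattern above checks only the shape, not the 0-255 range:
printf '%s\n' 192.168.1.1 999.999.999.999 not-an-ip | grep -E '^([0-9]{1,3}\.){3}[0-9]{1,3}$'
# 192.168.1.1
# 999.999.999.999   (format matches even though it is not a valid address)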
Practical Examples
Parse Log Files
# Count error types
grep -o 'error: [^;]*' app.log | cut -d: -f2 | sort | uniq -c
# Extract timestamps
grep "2026-01-23" access.log | awk '{print $4}' | cut -d: -f1-2
# Analyze by status code
cut -d' ' -f9 access.log | sort | uniq -c | sort -rn
Data Processing
# CSV analysis
awk -F',' '{print $2}' data.csv | sort | uniq -c | sort -rn
# Remove duplicates maintaining order
awk '!seen[$0]++' file.txt
# Filter and transform
grep -E '^[0-9]+' data.txt | awk '{print $1 * 2}' | sort -n
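Putting it together, a sketch that reports the five most-requested paths from a hypothetical access.log in common log format (request path in field 7):
awk '{hits[$7]++} END {for (p in hits) print hits[p], p}' access.log | sort -rn | head -5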
Best Practices
- Test First - Use non-destructive commands first
- Use Pipes - Chain commands for efficiency
- Understand Regex - Learn regex patterns
- Know Your Tools - Choose right tool for task
- Use Delimiters - Specify delimiters explicitly
- Backup Files - Before using sed -i
- Escape Special Chars - In regex and sed
- Performance - Consider file size and complexity
- Document Scripts - Comment complex one-liners
- Test Regex - Verify patterns work correctly
Summary
Text processing is essential for Linux mastery:
- grep finds patterns in text
- sed edits text streams
- awk processes structured data
- cut extracts columns
- sort/uniq organize data
- Regular expressions enable pattern matching
- These tools combined enable powerful data processing
Master text processing for professional Linux administration and data analysis.