How to Remove Duplicate Lines from Text — Methods and Free Tool

Learn multiple methods to remove duplicate lines from text files and strings, including command line tools, programming approaches, and our free online deduplication tool.

Why Remove Duplicate Lines?

Duplicate lines in text files cause problems across many scenarios:

  1. Data cleaning: Removing duplicate entries from CSV exports or log files
  2. List management: Deduplicating email lists, IP addresses, or domain lists
  3. Code cleanup: Removing duplicate import statements or configuration entries
  4. Log analysis: Filtering repeated log entries for cleaner analysis
  5. SEO and content: Ensuring unique meta tags, keywords, or URLs

Method 1: Using FreeToolJet's Remove Duplicate Lines Tool

Our Remove Duplicate Lines tool is the easiest way to deduplicate text:

Step-by-Step Guide

  1. Open the Remove Duplicate Lines tool
  2. Paste your text into the input area (or upload a file)
  3. Choose your options (case sensitivity, whitespace trimming, output order)
  4. Click "Remove Duplicates"
  5. Copy the cleaned text or download as a file

Features

  • Instant results: No page refresh, real-time processing
  • Case sensitivity options: Control how matching works
  • Whitespace handling: Optionally trim spaces before comparing
  • Preserve order: Keep first occurrence order (or sort alphabetically)
  • Statistics: See how many duplicates were removed
  • Client-side only: Your text never leaves your browser
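
To make the interaction between these options concrete, here is a minimal Python sketch of the same idea. It is not the tool's actual implementation; the function name `dedupe_lines` and its parameters are hypothetical and only mirror the options listed above.

```python
def dedupe_lines(text, case_sensitive=True, trim=False, sort_output=False):
    """Return (unique_lines, removed_count) for the given text.

    Hypothetical helper: the parameters mirror the options described above.
    """
    seen = set()
    unique = []
    lines = text.splitlines()
    for line in lines:
        # Build the comparison key; the original line is what gets kept.
        key = line.strip() if trim else line
        if not case_sensitive:
            key = key.casefold()
        if key not in seen:
            seen.add(key)
            unique.append(line)
    if sort_output:
        unique.sort()  # alphabetical instead of first-occurrence order
    return unique, len(lines) - len(unique)


result, removed = dedupe_lines(
    "Apple\napple \nBanana\nApple", case_sensitive=False, trim=True
)
print(result, removed)  # ['Apple', 'Banana'] 2
```

Trimming and case folding are applied only to the comparison key, so the first occurrence is written out exactly as it appeared in the input.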

Method 2: Command Line Tools

Using sort and uniq (Linux/macOS)

The classic Unix approach:

    # Remove duplicates, keep sorted output
    sort input.txt | uniq > output.txt

    # Remove duplicates, keep original order (preserve first occurrence)
    awk '!seen[$0]++' input.txt > output.txt

    # Case-insensitive deduplication
    sort -f input.txt | uniq -i > output.txt

    # Count occurrences before removing
    sort input.txt | uniq -c > with_counts.txt

Using PowerShell (Windows)

    # Remove duplicates, preserve original order
    $lines = Get-Content input.txt
    $lines | Select-Object -Unique | Out-File output.txt

    # Case-insensitive (note: the output lines are lowercased)
    (Get-Content input.txt).ToLower() | Select-Object -Unique

Method 3: Text Editors

VS Code

  1. Open your file
  2. Press Ctrl+Shift+P (or Cmd+Shift+P on Mac)
  3. Type "Sort Lines Ascending" and run it
  4. Press Ctrl+H to open Find/Replace
  5. Enable regex mode (.* button)
  6. Find: ^(.*)(\n\1)+$
  7. Replace: $1
  8. Click "Replace All"
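
The sort in step 3 matters because the regex only collapses runs of identical lines that sit next to each other. The short Python sketch below applies the same pattern with the standard `re` module to show the difference; the helper name `collapse_adjacent` is made up for this illustration.

```python
import re

# Same pattern as in the VS Code steps: a line followed by one or more exact copies.
ADJACENT_DUPLICATES = re.compile(r"^(.*)(\n\1)+$", re.MULTILINE)


def collapse_adjacent(text):
    """Collapse each run of identical neighbouring lines into one line."""
    return ADJACENT_DUPLICATES.sub(r"\1", text)


text = "b\na\nb\na"

# Without sorting, no duplicates are adjacent, so nothing changes.
print(repr(collapse_adjacent(text)))  # 'b\na\nb\na'

# After sorting, duplicates are neighbours and the regex removes them.
sorted_text = "\n".join(sorted(text.split("\n")))
print(repr(collapse_adjacent(sorted_text)))  # 'a\nb'
```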

Sublime Text

  1. Open file
  2. Edit → Sort Lines
  3. Edit → Permute Lines → Unique

Vim

    # Sort and remove duplicates
    :sort u

    # Remove consecutive duplicate lines without sorting (non-adjacent duplicates are kept)
    :g/^\(.*\)$\n\1$/d

Method 4: Programming Languages

Python

    # Method 1: Using dict.fromkeys() (preserves order, Python 3.7+)
    with open('input.txt', 'r') as f:
        lines = f.readlines()

    unique_lines = list(dict.fromkeys(lines))

    with open('output.txt', 'w') as f:
        f.writelines(unique_lines)

    # Method 2: Using set (doesn't preserve order)
    with open('input.txt', 'r') as f:
        unique_lines = set(f.readlines())

    with open('output.txt', 'w') as f:
        f.writelines(unique_lines)

    # Method 3: Case-insensitive, preserving order of first occurrence
    def remove_duplicates_preserve_order(lines, case_sensitive=False):
        seen = set()
        result = []
        for line in lines:
            compare_line = line if case_sensitive else line.lower()
            if compare_line not in seen:
                seen.add(compare_line)
                result.append(line)
        return result

JavaScript/Node.js

    // Method 1: Using Set (a Set keeps insertion order, so first occurrences stay in place)
    const fs = require('fs');
    const lines = fs.readFileSync('input.txt', 'utf8').split('\n');
    const unique = [...new Set(lines)];
    fs.writeFileSync('output.txt', unique.join('\n'));

    // Method 2: Preserve order, with optional case-insensitive matching
    function removeDuplicates(lines, caseSensitive = true) {
      const seen = new Set();
      return lines.filter(line => {
        const key = caseSensitive ? line : line.toLowerCase();
        if (seen.has(key)) return false;
        seen.add(key);
        return true;
      });
    }

    // Reuses `lines` from Method 1
    const uniqueIgnoreCase = removeDuplicates(lines, false); // case-insensitive
    fs.writeFileSync('output.txt', uniqueIgnoreCase.join('\n'));

Go

    package main

    import (
        "bufio"
        "fmt"
        "log"
        "os"
        "strings"
    )

    // removeDuplicates keeps the first occurrence of each line, in input order.
    func removeDuplicates(lines []string, caseSensitive bool) []string {
        seen := make(map[string]bool)
        var result []string
        for _, line := range lines {
            key := line
            if !caseSensitive {
                key = strings.ToLower(line)
            }
            if !seen[key] {
                seen[key] = true
                result = append(result, line)
            }
        }
        return result
    }

    func main() {
        file, err := os.Open("input.txt")
        if err != nil {
            log.Fatal(err)
        }
        defer file.Close()

        // Read all lines into memory
        var lines []string
        scanner := bufio.NewScanner(file)
        for scanner.Scan() {
            lines = append(lines, scanner.Text())
        }
        if err := scanner.Err(); err != nil {
            log.Fatal(err)
        }

        unique := removeDuplicates(lines, true)

        output, err := os.Create("output.txt")
        if err != nil {
            log.Fatal(err)
        }
        defer output.Close()

        // Write the unique lines, one per line
        writer := bufio.NewWriter(output)
        for _, line := range unique {
            fmt.Fprintln(writer, line)
        }
        if err := writer.Flush(); err != nil {
            log.Fatal(err)
        }
    }

Advanced Deduplication Scenarios

Remove Duplicate Lines Based on a Column

For CSV or tabular data, you might want to deduplicate based on a specific column:

    import csv

    def remove_duplicates_by_column(input_file, output_file, column_index):
        seen = set()
        # newline='' lets the csv module handle line endings itself
        with open(input_file, 'r', newline='') as infile, \
             open(output_file, 'w', newline='') as outfile:
            reader = csv.reader(infile)
            writer = csv.writer(outfile)
            for row in reader:
                key = row[column_index]
                if key not in seen:
                    seen.add(key)
                    writer.writerow(row)

    # Remove duplicates based on first column (index 0)
    remove_duplicates_by_column('data.csv', 'cleaned.csv', 0)

Remove Near-Duplicates (Fuzzy Matching)

For lines that are similar but not identical:

    from difflib import SequenceMatcher

    def is_similar(line1, line2, threshold=0.9):
        return SequenceMatcher(None, line1, line2).ratio() > threshold

    def remove_near_duplicates(lines, threshold=0.9):
        # Each line is compared with every line kept so far,
        # so this is best suited to small inputs.
        result = []
        for line in lines:
            if not any(is_similar(line, existing, threshold) for existing in result):
                result.append(line)
        return result

Remove Duplicate Lines with Count

Sometimes you want to know how many times each line appeared:

    from collections import Counter

    with open('input.txt', 'r') as f:
        lines = f.readlines()

    counts = Counter(lines)

    for line, count in counts.items():
        print(f"{count}: {line.strip()}")

Performance Considerations

When processing large files:

Method                    Memory Usage      Speed       Preserves Order
sort | uniq               Low (streaming)   Fast        No
awk '!seen[$0]++'         Medium            Fast        Yes
Python set()              High              Very Fast   No
Python dict.fromkeys()    High              Very Fast   Yes

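For very large files (GBs): Prefer approaches that read one line at a time, such as awk, or process the file in chunks. Keep in mind that awk '!seen[$0]++' still holds every unique line in memory. If the unique lines will not fit in RAM and order does not matter, sort -u is the safer choice, since GNU sort spills to temporary files on disk.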
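
As one possible shape for a streaming Python approach, the sketch below reads one line at a time and remembers only a fixed-size hash of each line instead of the line itself. The function name `dedupe_stream` is hypothetical, and the approach assumes that a 16-byte BLAKE2b digest is an acceptable stand-in for the full line. Collisions are extremely unlikely but not impossible.

```python
import hashlib
import io


def dedupe_stream(infile, outfile):
    """Copy unique lines from infile to outfile; return the number removed."""
    seen = set()
    removed = 0
    for line in infile:
        # Assumption: a 16-byte digest identifies the line well enough.
        # Line endings are stripped so a final line without "\n" still matches.
        content = line.rstrip("\r\n").encode("utf-8")
        digest = hashlib.blake2b(content, digest_size=16).digest()
        if digest in seen:
            removed += 1
            continue
        seen.add(digest)
        outfile.write(line)
    return removed


# In-memory demo; with real files, pass objects returned by open().
source = io.StringIO("a\nb\na\nc\nb\n")
target = io.StringIO()
removed_count = dedupe_stream(source, target)
print(removed_count, repr(target.getvalue()))  # 2 'a\nb\nc\n'
```

Memory still grows with the number of unique lines, but each one costs a small fixed amount rather than its full length, which helps most when lines are long.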

Common Pitfalls

  1. Whitespace differences: "hello" and "hello " are different lines
  2. Line ending differences: \n vs \r\n
  3. Case sensitivity: "Hello" and "hello" are different
  4. Empty lines: Multiple blank lines may be considered duplicates
  5. Unicode normalization: Accented characters can have multiple representations
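
Most of these pitfalls can be handled by comparing a normalized key while keeping the original line in the output. The following Python sketch shows one way to do that. The names `normalize_key` and `dedupe_normalized` are made up for this example, and whether NFC normalization and case folding suit your data is an assumption you should check.

```python
import unicodedata


def normalize_key(line, ignore_case=True):
    """Build a comparison key that smooths over the pitfalls listed above."""
    key = line.strip()  # drops \n or \r\n and surrounding whitespace
    key = unicodedata.normalize("NFC", key)  # one form for accented characters
    if ignore_case:
        key = key.casefold()
    return key


def dedupe_normalized(lines, keep_blank=False):
    """Keep the first line for each normalized key, in original order."""
    seen = set()
    result = []
    for line in lines:
        key = normalize_key(line)
        if keep_blank and key == "":
            result.append(line)  # blank lines are never treated as duplicates
            continue
        if key not in seen:
            seen.add(key)
            result.append(line)
    return result


lines = ["hello\r\n", "hello \n", "Hello\n", "caf\u00e9\n", "cafe\u0301\n", "\n", "\n"]
print(dedupe_normalized(lines))
# ['hello\r\n', 'café\n', '\n']
```

Passing `keep_blank=True` keeps every blank line, which is useful when blank lines separate paragraphs or records.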

When to Use Each Method

Scenario                   Recommended Method
Quick one-time cleanup     FreeToolJet Remove Duplicate Lines tool
Large files (GBs)          awk '!seen[$0]++' or streaming Python
Part of a data pipeline    Python script with proper error handling
In a text editor           VS Code / Sublime Text / Vim commands
Windows without WSL        PowerShell
Preserve order             FreeToolJet tool or awk method
Case-insensitive           FreeToolJet tool or sort -f | uniq -i
