String Traversals: Declarative Text Processing

Type-Safe Text Manipulation Without Regex Complexity

What You'll Learn

  • Breaking strings into traversable units (characters, words, lines)
  • Declarative text normalisation and validation
  • Composing string traversals with filtered optics for pattern matching
  • Real-world text processing: logs, CSV, configuration files
  • When to use string traversals vs Stream API vs regex
  • Performance characteristics and best practices

Working with text in Java often feels like choosing between extremes: verbose manual string manipulation with substring() and indexOf(), or cryptic regular expressions that become unmaintainable. String traversals offer a middle path: declarative, composable, and type-safe.

Consider these common scenarios from enterprise Java applications:

  • Configuration Management: Normalising property values across .properties files
  • Log Analysis: Filtering and transforming log entries line-by-line
  • Data Import: Processing CSV files with per-field transformations
  • API Integration: Standardising email addresses from external systems
  • Validation: Checking character-level constraints (length, allowed characters)

The traditional approach mixes parsing logic with transformation logic, making code difficult to test and reuse:

// Traditional: Mixed concerns, hard to compose
String normaliseEmail(String email) {
    String[] parts = email.toLowerCase().split("@");
    if (parts.length != 2) throw new IllegalArgumentException();
    String domain = parts[1].trim();
    return parts[0] + "@" + domain;
}

// What if we need to normalise just the domain? Or multiple emails in a document?
// We'd need separate methods or complex parameters.

String traversals let you separate the "what" (the structure) from the "how" (the transformation), making your text processing logic reusable and composable.


Think of String Traversals Like...

  • String.split() with Stream.map(): Like text.lines().map(...), but integrated into optic composition
  • IntelliJ's "Replace in Selection": Focus on text units, transform them, reassemble automatically
  • Unix text tools: Similar to awk and sed pipelines, but type-safe and composable
  • SQL's string functions: Like UPPER(), TRIM(), SPLIT_PART(), but for immutable Java strings

The key insight: text structure (characters, words, lines) becomes part of your optic's identity, not preprocessing before the real work.


Three Ways to Decompose Text

The StringTraversals utility class provides three fundamental decompositions:

Method   | Unit                             | Example Input    | Focused Elements
---------|----------------------------------|------------------|---------------------------
chars()  | Characters                       | "hello"          | ['h', 'e', 'l', 'l', 'o']
worded() | Words (split on \s+)             | "hello world"    | ["hello", "world"]
lined()  | Lines (split on \n, \r\n, or \r) | "line1\nline2"   | ["line1", "line2"]

Each returns a Traversal<String, ?> that can be composed with other optics and applied via Traversals.modify() or Traversals.getAll().
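
As a quick taste of that composition, here is a minimal sketch (using only the modify behaviour shown in this chapter): lined() and worded() chain with andThen(), so a single optic focuses every word of every line and modify() handles the splitting and reassembly.

// One optic that focuses each word on each line
Traversal<String, String> wordsOfEachLine =
    StringTraversals.lined().andThen(StringTraversals.worded());

String shouted = Traversals.modify(wordsOfEachLine, String::toUpperCase,
    "first line\nsecond line");
// Result: "FIRST LINE\nSECOND LINE"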


A Step-by-Step Walkthrough

Step 1: Character-Level Processing with chars()

The chars() traversal breaks a string into individual characters, allowing transformations at the finest granularity.

import org.higherkindedj.optics.util.StringTraversals;
import org.higherkindedj.optics.util.Traversals;

// Create a character traversal
Traversal<String, Character> charTraversal = StringTraversals.chars();

// Transform all characters to uppercase
String uppercased = Traversals.modify(charTraversal, Character::toUpperCase, "hello world");
// Result: "HELLO WORLD"

// Extract all characters as a list
List<Character> chars = Traversals.getAll(charTraversal, "abc");
// Result: ['a', 'b', 'c']

// Compose with filtered for selective transformation
Traversal<String, Character> vowels = charTraversal.filtered(c ->
    "aeiouAEIOU".indexOf(c) >= 0
);
String result = Traversals.modify(vowels, Character::toUpperCase, "hello world");
// Result: "hEllO wOrld"  (only vowels uppercased)

Use Cases:

  • Character-level validation (alphanumeric checks)
  • ROT13 or Caesar cipher transformations (see the sketch after this list)
  • Character frequency analysis
  • Removing or replacing specific characters
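
For instance, a ROT13 transformation (a minimal sketch; the rotation arithmetic is ordinary Java, while decomposition and reassembly come from the chars() traversal shown above):

// ROT13 via the character traversal: rotate letters, pass everything else through
Traversal<String, Character> chars = StringTraversals.chars();

String rot13 = Traversals.modify(chars, c -> {
    if (c >= 'a' && c <= 'z') return (char) ('a' + (c - 'a' + 13) % 26);
    if (c >= 'A' && c <= 'Z') return (char) ('A' + (c - 'A' + 13) % 26);
    return c;  // punctuation and whitespace pass through unchanged
}, "Hello, World!");
// Result: "Uryyb, Jbeyq!"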

Step 2: Word-Level Processing with worded()

The worded() traversal splits by whitespace (\s+), focusing on each word independently.

Key Semantics:

  • Multiple consecutive spaces are normalised to single spaces
  • Leading and trailing whitespace is removed
  • Empty strings or whitespace-only strings produce no words

Traversal<String, String> wordTraversal = StringTraversals.worded();

// Capitalise each word
String capitalised = Traversals.modify(
    wordTraversal,
    word -> word.substring(0, 1).toUpperCase() + word.substring(1).toLowerCase(),
    "hello WORLD from JAVA"
);
// Result: "Hello World From Java"

// Extract all words (whitespace normalisation automatic)
List<String> words = Traversals.getAll(wordTraversal, "foo  bar\t\tbaz");
// Result: ["foo", "bar", "baz"]

// Compose with filtered for conditional transformation
Traversal<String, String> longWords = wordTraversal.filtered(w -> w.length() > 5);
String emphasised = Traversals.modify(longWords, w -> w.toUpperCase(), "make software better");
// Result: "make SOFTWARE BETTER"

Use Cases:

  • Title case formatting
  • Stop word filtering (see the sketch after this list)
  • Word-based text normalisation
  • Search query processing
  • Email domain extraction
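
As an example of the stop-word case, here is a minimal sketch (the stop-word set is hypothetical). Because filtered traversals preserve structure, removal is done by extraction with getAll() rather than with modify():

// Keep only content-bearing words by filtering the word traversal
Set<String> stopWords = Set.of("the", "a", "an", "of", "and");

Traversal<String, String> contentWords =
    StringTraversals.worded().filtered(w -> !stopWords.contains(w.toLowerCase()));

List<String> keywords = Traversals.getAll(contentWords, "the composition of optics");
// Result: ["composition", "optics"]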

Step 3: Line-Level Processing with lined()

The lined() traversal splits by line separators (\n, \r\n, or \r), treating each line as a focus target.

Key Semantics:

  • All line endings are normalised to \n in output (see the sketch after the example below)
  • Empty strings produce no lines
  • Trailing newlines are preserved in individual line processing

Traversal<String, String> lineTraversal = StringTraversals.lined();

// Prefix each line with a marker
String prefixed = Traversals.modify(
    lineTraversal,
    line -> "> " + line,
    "line1\nline2\nline3"
);
// Result: "> line1\n> line2\n> line3"

// Extract all lines (empty lines are preserved)
List<String> lines = Traversals.getAll(lineTraversal, "first\n\nthird");
// Result: ["first", "", "third"]  (empty line preserved)

// Filter lines by content
Traversal<String, String> errorLines = lineTraversal.filtered(line ->
    line.contains("ERROR")
);
String errors = String.join("\n", Traversals.getAll(errorLines, logContent));
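
The line-ending normalisation noted in the key semantics can be seen even with an identity modification; a minimal sketch, assuming the documented behaviour:

// Mixed \r\n and \r endings are rewritten as \n on reassembly
String mixed = "alpha\r\nbeta\rgamma";
String unified = Traversals.modify(StringTraversals.lined(), line -> line, mixed);
// Result: "alpha\nbeta\ngamma"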

Use Cases:

  • Log file filtering and transformation
  • CSV row processing
  • Configuration file parsing
  • Code formatting (indentation, comments)
  • Multi-line text validation

Real-World Example: Email Normalisation Service

A common requirement in enterprise systems: normalise email addresses from various sources before storage.

import org.higherkindedj.optics.util.StringTraversals;
import org.higherkindedj.optics.util.Traversals;

public class EmailNormaliser {

    // Lowercase the address and trim surrounding whitespace.
    // worded() focuses each whitespace-separated token, so for a single
    // email address the whole string is lowercased in one pass.
    // In production, you'd want more robust parsing of the local and domain parts.
    public static String normalise(String email) {
        Traversal<String, String> words = StringTraversals.worded();

        String lowercased = Traversals.modify(words, String::toLowerCase, email);

        return lowercased.trim();
    }

    // More sophisticated: normalise domain parts separately
    public static String normaliseDomain(String email) {
        int atIndex = email.indexOf('@');
        if (atIndex == -1) return email;

        String local = email.substring(0, atIndex);
        String domain = email.substring(atIndex + 1);

        // Normalise domain components
        Traversal<String, String> domainParts = StringTraversals.worded();
        String normalisedDomain = Traversals.modify(
            domainParts,
            String::toLowerCase,
            domain.replace(".", " ")  // Split domain by dots
        ).replace(" ", ".");  // Rejoin

        return local + "@" + normalisedDomain;
    }
}

Composing String Traversals

The power emerges when combining string traversals with other optics:

With Filtered Traversals – Pattern Matching

// Find and transform lines starting with a prefix
Traversal<String, String> commentLines =
    StringTraversals.lined().filtered(line -> line.trim().startsWith("#"));

String commentsBlanked = Traversals.modify(
    commentLines,
    line -> "",  // Blank out the comment text; the (now empty) lines themselves remain
    sourceCode
);

With Nested Structures – Bulk Text Processing

@GenerateLenses
public record Document(String title, List<String> paragraphs) {}

// Capitalise first letter of each word in all paragraphs
Traversal<Document, String> allWords =
    DocumentLenses.paragraphs().asTraversal()
        .andThen(Traversals.forList())
        .andThen(StringTraversals.worded());

Document formatted = Traversals.modify(
    allWords,
    word -> word.substring(0, 1).toUpperCase() + word.substring(1),
    document
);

With Effectful Operations – Validation

import org.higherkindedj.hkt.optional.OptionalMonad;

// Validate that all words are alphanumeric
Traversal<String, String> words = StringTraversals.worded();

Function<String, Kind<OptionalKind.Witness, String>> validateWord = word -> {
    boolean alphanumeric = word.chars().allMatch(Character::isLetterOrDigit);
    return alphanumeric
        ? OptionalKindHelper.OPTIONAL.widen(Optional.of(word))
        : OptionalKindHelper.OPTIONAL.widen(Optional.empty());
};

Optional<String> validated = OptionalKindHelper.OPTIONAL.narrow(
    words.modifyF(validateWord, input, OptionalMonad.INSTANCE)
);
// Returns Optional.empty() if any word contains non-alphanumeric characters

Common Patterns

Log File Processing

// Extract ERROR lines from application logs
Traversal<String, String> errorLines =
    StringTraversals.lined().filtered(line -> line.contains("ERROR"));

List<String> errors = Traversals.getAll(errorLines, logContent);

// Add timestamps to each line
Traversal<String, String> allLines = StringTraversals.lined();
String timestamped = Traversals.modify(
    allLines,
    line -> LocalDateTime.now() + " " + line,
    originalLog
);

CSV Processing

// Process CSV by splitting into lines, then cells
Traversal<String, String> rows = StringTraversals.lined();
Traversal<String, String> cells = StringTraversals.worded();  // Illustrative only (unused below); real CSV cells need split(",")

// Transform specific column (e.g., third column to uppercase)
String processedCsv = Traversals.modify(
    rows,
    row -> {
        List<String> parts = List.of(row.split(","));
        if (parts.size() > 2) {
            List<String> modified = new ArrayList<>(parts);
            modified.set(2, parts.get(2).toUpperCase());
            return String.join(",", modified);
        }
        return row;
    },
    csvContent
);

Configuration File Normalisation

// Trim all property values in .properties format
Traversal<String, String> propertyLines = StringTraversals.lined();

String normalised = Traversals.modify(
    propertyLines,
    line -> {
        if (line.contains("=")) {
            String[] parts = line.split("=", 2);
            return parts[0].trim() + "=" + parts[1].trim();
        }
        return line;
    },
    propertiesContent
);

When to Use String Traversals vs Other Approaches

Use String Traversals When:

  • Reusable text transformations - Define once, apply across multiple strings
  • Composable pipelines - Building complex optic chains with lenses and prisms
  • Type-safe operations - Character/word/line transformations with compile-time safety
  • Immutable updates - Transforming text whilst keeping data immutable
  • Declarative intent - Express "what" without "how" (no manual indexing)

// Perfect: Reusable, composable, declarative
Traversal<Config, String> allPropertyLines =
    ConfigLenses.properties().asTraversal()
        .andThen(StringTraversals.lined());

Config trimmed = Traversals.modify(allPropertyLines, String::trim, config);

Use Stream API When:

  • Complex filtering - Multiple conditions with short-circuiting
  • Aggregations - Counting, collecting to new structures
  • No structural preservation needed - Extracting data, not updating in place
  • One-time operations - Not reused across different contexts

// Better with streams: Complex aggregation
long wordCount = text.lines()
    .flatMap(line -> Arrays.stream(line.split("\\s+")))
    .filter(word -> word.length() > 5)
    .count();

Use Regular Expressions When:

  • Complex pattern matching - Extracting structured data (emails, URLs, dates)
  • Search and replace - Simple find-and-replace operations
  • Validation - Checking format compliance (phone numbers, postal codes)

// Sometimes regex is clearest
Pattern emailPattern = Pattern.compile("[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\\.[a-zA-Z]{2,}");
Matcher matcher = emailPattern.matcher(text);
while (matcher.find()) {
    processEmail(matcher.group());
}

Common Pitfalls

Don't Do This:

// Inefficient: Creating traversals in loops
for (String paragraph : document.paragraphs()) {
    Traversal<String, String> words = StringTraversals.worded();
    Traversals.modify(words, String::toUpperCase, paragraph);
}

// Over-engineering: Using traversals for simple operations
Traversal<String, Character> chars = StringTraversals.chars();
String upper = Traversals.modify(chars, Character::toUpperCase, "hello");
// Just use: "hello".toUpperCase()

// Wrong expectation: Thinking it changes string length
Traversal<String, Character> allChars = StringTraversals.chars();
String result = Traversals.modify(allChars.filtered(c -> c != 'a'), c -> c, "banana");
// Result: "banana" (still 6 characters; the 'a's are skipped, not removed)
// Filtered traversals preserve structure!

Do This Instead:

// Efficient: Create traversal once, reuse
Traversal<String, String> words = StringTraversals.worded();
List<String> processed = document.paragraphs().stream()
    .map(p -> Traversals.modify(words, String::toUpperCase, p))
    .collect(toList());

// Right tool: Use built-in methods for simple cases
String upper = text.toUpperCase();  // Simple and clear

// Correct expectation: Use getAll for extraction
Traversal<String, Character> vowels = StringTraversals.chars()
    .filtered(c -> "aeiou".indexOf(c) >= 0);
List<Character> extracted = Traversals.getAll(vowels, "banana");
// Result: ['a', 'a', 'a'] - extracts vowels without changing structure

Performance Notes

String traversals are optimised for immutability:

  • Single pass: Text is decomposed and reconstructed in one traversal
  • No intermediate strings: Operates on character/word lists internally
  • Structural sharing: For filtered operations, unchanged portions reference original
  • Lazy bounds checking: Minimal overhead for validation

Best Practice: For frequently used string transformations, create traversals as constants:

public class TextProcessing {
    // Reusable string traversals
    public static final Traversal<String, String> WORDS =
        StringTraversals.worded();

    public static final Traversal<String, String> LINES =
        StringTraversals.lined();

    public static final Traversal<String, Character> VOWELS =
        StringTraversals.chars().filtered(c -> "aeiouAEIOU".indexOf(c) >= 0);

    // Domain-specific compositions
    public static final Traversal<String, String> ERROR_LOG_LINES =
        LINES.filtered(line -> line.contains("ERROR"));
}
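
A brief usage sketch of those constants (logContent is assumed to hold raw log text): each traversal is built once and shared across call sites.

List<String> errors = Traversals.getAll(TextProcessing.ERROR_LOG_LINES, logContent);

String shouted = Traversals.modify(TextProcessing.WORDS, String::toUpperCase, "quiet words");
// Result: "QUIET WORDS"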

Integration with Functional Java Ecosystem

String traversals complement existing functional libraries:

Cyclops Integration

import cyclops.control.Validated;

// Validate each word using Cyclops Validated
Traversal<String, String> words = StringTraversals.worded();

Function<String, Kind<ValidatedKind.Witness<List<String>>, String>> validateLength =
    word -> word.length() <= 10
        ? VALIDATED.widen(Validated.valid(word))
        : VALIDATED.widen(Validated.invalid(List.of("Word too long: " + word)));

Validated<List<String>, String> result = VALIDATED.narrow(
    words.modifyF(validateLength, input, validatedApplicative)
);

Functional Java Libraries:

  • Cyclops - Functional control structures and higher-kinded types
  • jOOλ - Functional utilities complementing Java Streams

Further Reading:

  • Functional Programming in Java by Venkat Subramaniam - Practical FP patterns
  • Modern Java in Action by Raoul-Gabriel Urma - Streams, lambdas, and functional style
  • Optics By Example by Chris Penner - Haskell optics comprehensive guide

Comparison with Other Languages:

  • Haskell's lens library - worded and lined traversals over String, with Data.Text.Lens for Text
  • Scala's Monocle - String traversals via Traversal[String, Char]
