String Traversals: Declarative Text Processing
Type-Safe Text Manipulation Without Regex Complexity
- Breaking strings into traversable units (characters, words, lines)
- Declarative text normalisation and validation
- Composing string traversals with filtered optics for pattern matching
- Real-world text processing: logs, CSV, configuration files
- When to use string traversals vs Stream API vs regex
- Performance characteristics and best practices
Working with text in Java often feels like choosing between extremes: verbose manual string manipulation with substring() and indexOf(), or cryptic regular expressions that become unmaintainable. String traversals offer a middle path: declarative, composable, and type-safe.
Consider these common scenarios from enterprise Java applications:
- Configuration Management: Normalising property values across .properties files
- Log Analysis: Filtering and transforming log entries line-by-line
- Data Import: Processing CSV files with per-field transformations
- API Integration: Standardising email addresses from external systems
- Validation: Checking character-level constraints (length, allowed characters)
The traditional approach mixes parsing logic with transformation logic, making code difficult to test and reuse:
// Traditional: Mixed concerns, hard to compose
String normaliseEmail(String email) {
String[] parts = email.toLowerCase().split("@");
if (parts.length != 2) throw new IllegalArgumentException();
String domain = parts[1].trim();
return parts[0] + "@" + domain;
}
// What if we need to normalise just the domain? Or multiple emails in a document?
// We'd need separate methods or complex parameters.
String traversals let you separate the "what" (the structure) from the "how" (the transformation), making your text processing logic reusable and composable.
Think of String Traversals Like...
- Java Stream's split() + map(): Like text.lines().map(...) but integrated into optic composition
- IntelliJ's "Replace in Selection": Focus on text units, transform them, reassemble automatically
- Unix text tools: Similar to awk and sed pipelines, but type-safe and composable
- SQL's string functions: Like UPPER(), TRIM(), SPLIT_PART(), but for immutable Java strings
The key insight: text structure (characters, words, lines) becomes part of your optic's identity, not preprocessing before the real work.
Three Ways to Decompose Text
The StringTraversals utility class provides three fundamental decompositions:
| Method | Unit | Example Input | Focused Elements |
|---|---|---|---|
| chars() | Characters | "hello" | ['h', 'e', 'l', 'l', 'o'] |
| worded() | Words (split on \s+) | "hello world" | ["hello", "world"] |
| lined() | Lines (split on \n, \r\n, \r) | "line1\nline2" | ["line1", "line2"] |
Each returns a Traversal<String, ?> that can be composed with other optics and applied via Traversals.modify() or Traversals.getAll().
A Step-by-Step Walkthrough
Step 1: Character-Level Processing with chars()
The chars() traversal breaks a string into individual characters, allowing transformations at the finest granularity.
import org.higherkindedj.optics.util.StringTraversals;
import org.higherkindedj.optics.util.Traversals;
// Create a character traversal
Traversal<String, Character> charTraversal = StringTraversals.chars();
// Transform all characters to uppercase
String uppercased = Traversals.modify(charTraversal, Character::toUpperCase, "hello world");
// Result: "HELLO WORLD"
// Extract all characters as a list
List<Character> chars = Traversals.getAll(charTraversal, "abc");
// Result: ['a', 'b', 'c']
// Compose with filtered for selective transformation
Traversal<String, Character> vowels = charTraversal.filtered(c ->
"aeiouAEIOU".indexOf(c) >= 0
);
String result = Traversals.modify(vowels, Character::toUpperCase, "hello world");
// Result: "hEllO wOrld" (only vowels uppercased)
Use Cases:
- Character-level validation (alphanumeric checks)
- ROT13 or Caesar cipher transformations
- Character frequency analysis
- Removing or replacing specific characters
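The cipher use case above can be sketched in plain Java to make the chars() semantics concrete: decompose into characters, transform each, reassemble. Here modifyChars is a hypothetical stand-in for Traversals.modify over StringTraversals.chars(), not part of the library API.

```java
import java.util.function.UnaryOperator;

public class Rot13Demo {
    // ROT13: rotate letters 13 places; non-letters pass through unchanged
    static char rot13(char c) {
        if (c >= 'a' && c <= 'z') return (char) ('a' + (c - 'a' + 13) % 26);
        if (c >= 'A' && c <= 'Z') return (char) ('A' + (c - 'A' + 13) % 26);
        return c;
    }

    // Mirrors Traversals.modify(StringTraversals.chars(), f, s):
    // split into characters, apply f to each, rebuild the string
    static String modifyChars(String s, UnaryOperator<Character> f) {
        StringBuilder sb = new StringBuilder(s.length());
        for (char c : s.toCharArray()) sb.append(f.apply(c));
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(modifyChars("Hello, World!", Rot13Demo::rot13));
        // Applying ROT13 twice recovers the original text
        System.out.println(modifyChars(modifyChars("Hello", Rot13Demo::rot13), Rot13Demo::rot13));
    }
}
```

Because the structure (character count and positions) is preserved, the same shape of code handles any per-character transformation.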
Step 2: Word-Level Processing with worded()
The worded() traversal splits by whitespace (\s+), focusing on each word independently.
Key Semantics:
- Multiple consecutive spaces are normalised to single spaces
- Leading and trailing whitespace is removed
- Empty strings or whitespace-only strings produce no words
Traversal<String, String> wordTraversal = StringTraversals.worded();
// Capitalise each word
String capitalised = Traversals.modify(
wordTraversal,
word -> word.substring(0, 1).toUpperCase() + word.substring(1).toLowerCase(),
"hello WORLD from JAVA"
);
// Result: "Hello World From Java"
// Extract all words (whitespace normalisation automatic)
List<String> words = Traversals.getAll(wordTraversal, "foo bar\t\tbaz");
// Result: ["foo", "bar", "baz"]
// Compose with filtered for conditional transformation
Traversal<String, String> longWords = wordTraversal.filtered(w -> w.length() > 5);
String emphasised = Traversals.modify(longWords, w -> w.toUpperCase(), "make software better");
// Result: "make SOFTWARE BETTER"
Use Cases:
- Title case formatting
- Stop word filtering
- Word-based text normalisation
- Search query processing
- Email domain extraction
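The stop-word use case can likewise be sketched in plain Java, showing what getAll over worded().filtered(...) yields: split on whitespace, keep only the words the predicate accepts. STOP_WORDS and contentWords are illustrative names for this sketch, not library API.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;

public class StopWordDemo {
    // A small illustrative stop-word set
    static final Set<String> STOP_WORDS = Set.of("the", "a", "an", "of", "and");

    // Mirrors Traversals.getAll(StringTraversals.worded().filtered(...), text):
    // split on runs of whitespace, keep only words outside the stop set
    static List<String> contentWords(String text) {
        return Arrays.stream(text.trim().split("\\s+"))
                .filter(w -> !w.isEmpty())
                .filter(w -> !STOP_WORDS.contains(w.toLowerCase()))
                .toList();
    }
}
```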
Step 3: Line-Level Processing with lined()
The lined() traversal splits by line separators (\n, \r\n, or \r), treating each line as a focus target.
Key Semantics:
- All line endings are normalised to \n in output
- Empty strings produce no lines
- Trailing newlines are preserved in individual line processing
Traversal<String, String> lineTraversal = StringTraversals.lined();
// Prefix each line with a marker
String prefixed = Traversals.modify(
lineTraversal,
line -> "> " + line,
"line1\nline2\nline3"
);
// Result: "> line1\n> line2\n> line3"
// Extract all lines (empty lines are preserved)
List<String> lines = Traversals.getAll(lineTraversal, "first\n\nthird");
// Result: ["first", "", "third"]
// Filter lines by content
Traversal<String, String> errorLines = lineTraversal.filtered(line ->
line.contains("ERROR")
);
String errors = Traversals.getAll(errorLines, logContent).stream()
.collect(Collectors.joining("\n"));
Use Cases:
- Log file filtering and transformation
- CSV row processing
- Configuration file parsing
- Code formatting (indentation, comments)
- Multi-line text validation
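The formatting use case can be sketched in plain Java to show the lined() + modify shape: split on line terminators, transform each line, rejoin. Here modifyLines is a hypothetical stand-in that simplifies the library's semantics (in this sketch every input, including the empty string, yields at least one line).

```java
import java.util.Arrays;
import java.util.function.UnaryOperator;
import java.util.stream.Collectors;

public class IndentDemo {
    // Mirrors Traversals.modify(StringTraversals.lined(), f, text):
    // split on any line terminator (\R matches \n, \r\n, \r),
    // transform each line, rejoin normalised to \n
    static String modifyLines(String text, UnaryOperator<String> f) {
        return Arrays.stream(text.split("\\R", -1))
                .map(f)
                .collect(Collectors.joining("\n"));
    }

    public static void main(String[] args) {
        // Indent every line by four spaces, normalising \r\n to \n
        System.out.println(modifyLines("if (x) {\r\n  y();\r\n}", line -> "    " + line));
    }
}
```

Note the -1 limit on split: it keeps trailing empty lines, so a file ending in a newline does not silently lose its final (empty) line.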
Real-World Example: Email Normalisation Service
A common requirement in enterprise systems: normalise email addresses from various sources before storage.
import org.higherkindedj.optics.util.StringTraversals;
import org.higherkindedj.optics.util.Traversals;
public class EmailNormaliser {
// Lowercase and trim the whole address; an email contains no internal
// whitespace, so worded() focuses it as a single word
public static String normalise(String email) {
Traversal<String, String> words = StringTraversals.worded();
// In production you'd parse the local part and domain separately,
// as normaliseDomain below demonstrates
String lowercased = Traversals.modify(words, String::toLowerCase, email);
return lowercased.trim();
}
// More sophisticated: normalise domain parts separately
public static String normaliseDomain(String email) {
int atIndex = email.indexOf('@');
if (atIndex == -1) return email;
String local = email.substring(0, atIndex);
String domain = email.substring(atIndex + 1);
// Normalise domain components
Traversal<String, String> domainParts = StringTraversals.worded();
String normalisedDomain = Traversals.modify(
domainParts,
String::toLowerCase,
domain.replace(".", " ") // Split domain by dots
).replace(" ", "."); // Rejoin
return local + "@" + normalisedDomain;
}
}
Composing String Traversals
The power emerges when combining string traversals with other optics:
With Filtered Traversals – Pattern Matching
// Find and transform lines starting with a prefix
Traversal<String, String> commentLines =
StringTraversals.lined().filtered(line -> line.trim().startsWith("#"));
String withoutComments = Traversals.modify(
commentLines,
line -> "", // Remove comment lines by replacing with empty
sourceCode
);
With Nested Structures – Bulk Text Processing
@GenerateLenses
public record Document(String title, List<String> paragraphs) {}
// Capitalise first letter of each word in all paragraphs
Traversal<Document, String> allWords =
DocumentLenses.paragraphs().asTraversal()
.andThen(Traversals.forList())
.andThen(StringTraversals.worded());
Document formatted = Traversals.modify(
allWords,
word -> word.substring(0, 1).toUpperCase() + word.substring(1),
document
);
With Effectful Operations – Validation
import org.higherkindedj.hkt.optional.OptionalMonad;
// Validate that all words are alphanumeric
Traversal<String, String> words = StringTraversals.worded();
Function<String, Kind<OptionalKind.Witness, String>> validateWord = word -> {
boolean alphanumeric = word.chars().allMatch(Character::isLetterOrDigit);
return alphanumeric
? OptionalKindHelper.OPTIONAL.widen(Optional.of(word))
: OptionalKindHelper.OPTIONAL.widen(Optional.empty());
};
Optional<String> validated = OptionalKindHelper.OPTIONAL.narrow(
words.modifyF(validateWord, input, OptionalMonad.INSTANCE)
);
// Returns Optional.empty() if any word contains non-alphanumeric characters
Common Patterns
Log File Processing
// Extract ERROR lines from application logs
Traversal<String, String> errorLines =
StringTraversals.lined().filtered(line -> line.contains("ERROR"));
List<String> errors = Traversals.getAll(errorLines, logContent);
// Add timestamps to each line
Traversal<String, String> allLines = StringTraversals.lined();
String timestamped = Traversals.modify(
allLines,
line -> LocalDateTime.now() + " " + line,
originalLog
);
CSV Processing
// Process CSV by splitting into lines, then cells
Traversal<String, String> rows = StringTraversals.lined();
Traversal<String, String> cells = StringTraversals.worded(); // Simplified; use split(",") in production
// Transform specific column (e.g., third column to uppercase)
String processedCsv = Traversals.modify(
rows,
row -> {
List<String> parts = List.of(row.split(","));
if (parts.size() > 2) {
List<String> modified = new ArrayList<>(parts);
modified.set(2, parts.get(2).toUpperCase());
return String.join(",", modified);
}
return row;
},
csvContent
);
Configuration File Normalisation
// Trim all property values in .properties format
Traversal<String, String> propertyLines = StringTraversals.lined();
String normalised = Traversals.modify(
propertyLines,
line -> {
if (line.contains("=")) {
String[] parts = line.split("=", 2);
return parts[0].trim() + "=" + parts[1].trim();
}
return line;
},
propertiesContent
);
When to Use String Traversals vs Other Approaches
Use String Traversals When:
- Reusable text transformations - Define once, apply across multiple strings
- Composable pipelines - Building complex optic chains with lenses and prisms
- Type-safe operations - Character/word/line transformations with compile-time safety
- Immutable updates - Transforming text whilst keeping data immutable
- Declarative intent - Express "what" without "how" (no manual indexing)
// Perfect: Reusable, composable, declarative
Traversal<Config, String> allPropertyValues =
ConfigLenses.properties().asTraversal()
.andThen(StringTraversals.lined())
.andThen(StringTraversals.worded());
Config trimmed = Traversals.modify(allPropertyValues, String::trim, config);
Use Stream API When:
- Complex filtering - Multiple conditions with short-circuiting
- Aggregations - Counting, collecting to new structures
- No structural preservation needed - Extracting data, not updating in place
- One-time operations - Not reused across different contexts
// Better with streams: Complex aggregation
long wordCount = text.lines()
.flatMap(line -> Arrays.stream(line.split("\\s+")))
.filter(word -> word.length() > 5)
.count();
Use Regular Expressions When:
- Complex pattern matching - Extracting structured data (emails, URLs, dates)
- Search and replace - Simple find-and-replace operations
- Validation - Checking format compliance (phone numbers, postal codes)
// Sometimes regex is clearest
Pattern emailPattern = Pattern.compile("[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\\.[a-zA-Z]{2,}");
Matcher matcher = emailPattern.matcher(text);
while (matcher.find()) {
processEmail(matcher.group());
}
Common Pitfalls
Don't Do This:
// Inefficient: Creating traversals in loops
for (String paragraph : document.paragraphs()) {
Traversal<String, String> words = StringTraversals.worded();
Traversals.modify(words, String::toUpperCase, paragraph);
}
// Over-engineering: Using traversals for simple operations
Traversal<String, Character> chars = StringTraversals.chars();
String upper = Traversals.modify(chars, Character::toUpperCase, "hello");
// Just use: "hello".toUpperCase()
// Wrong expectation: Thinking it changes string length
Traversal<String, Character> chars = StringTraversals.chars();
String result = Traversals.modify(chars.filtered(c -> c != 'a'), c -> c, "banana");
// Result: "banana" (still 6 chars, 'a' positions unchanged)
// Filtered traversals preserve structure!
Do This Instead:
// Efficient: Create traversal once, reuse
Traversal<String, String> words = StringTraversals.worded();
List<String> processed = document.paragraphs().stream()
.map(p -> Traversals.modify(words, String::toUpperCase, p))
.toList();
// Right tool: Use built-in methods for simple cases
String upper = text.toUpperCase(); // Simple and clear
// Correct expectation: Use getAll for extraction
Traversal<String, Character> vowels = StringTraversals.chars()
.filtered(c -> "aeiou".indexOf(c) >= 0);
List<Character> extracted = Traversals.getAll(vowels, "banana");
// Result: ['a', 'a', 'a'] - extracts vowels without changing structure
Performance Notes
String traversals are optimised for immutability:
- Single pass: Text is decomposed and reconstructed in one traversal
- No intermediate strings: Operates on character/word lists internally
- Structural sharing: For filtered operations, unchanged portions reference original
- Lazy bounds checking: Minimal overhead for validation
Best Practice: For frequently used string transformations, create traversals as constants:
public class TextProcessing {
// Reusable string traversals
public static final Traversal<String, String> WORDS =
StringTraversals.worded();
public static final Traversal<String, String> LINES =
StringTraversals.lined();
public static final Traversal<String, Character> VOWELS =
StringTraversals.chars().filtered(c -> "aeiouAEIOU".indexOf(c) >= 0);
// Domain-specific compositions
public static final Traversal<String, String> ERROR_LOG_LINES =
LINES.filtered(line -> line.contains("ERROR"));
}
Integration with Functional Java Ecosystem
String traversals complement existing functional libraries:
Cyclops Integration
import cyclops.control.Validated;
// Validate each word using Cyclops Validated
Traversal<String, String> words = StringTraversals.worded();
Function<String, Kind<ValidatedKind.Witness<List<String>>, String>> validateLength =
word -> word.length() <= 10
? VALIDATED.widen(Validated.valid(word))
: VALIDATED.widen(Validated.invalid(List.of("Word too long: " + word)));
Validated<List<String>, String> result = VALIDATED.narrow(
words.modifyF(validateLength, input, validatedApplicative)
);
Related Resources
Functional Java Libraries:
- Cyclops - Functional control structures and higher-kinded types
- jOOλ - Functional utilities complementing Java Streams
Further Reading:
- Functional Programming in Java by Venkat Subramaniam - Practical FP patterns
- Modern Java in Action by Raoul-Gabriel Urma - Streams, lambdas, and functional style
- Optics By Example by Chris Penner - Haskell optics comprehensive guide
Comparison with Other Languages:
- Haskell's Data.Text - Similar text processing with optics
- Scala's Monocle - String traversals via Traversal[String, Char]
Previous: Each Typeclass Next: Indexed Access