Benchmarks & Performance

Higher-Kinded-J ships with a comprehensive JMH benchmark suite in the hkj-benchmarks module. These benchmarks measure the real cost of the library's abstractions so you can make informed decisions about where and when to use them.

What You'll Learn

What the benchmark suite covers and how it is organised
How to run benchmarks: all, per-type, with GC profiling
How to interpret results and spot regressions
What performance characteristics to expect from each type

"Measure. Don't guess." — Kirk Pepperdine, Java performance expert

Why Benchmarks Matter

Functional abstractions wrap values. Wrapping has a cost. The question is never "is there overhead?" — there always is — but "does the overhead matter for my workload?" The benchmark suite answers that question with data rather than intuition.

The suite is designed around three principles:

Honesty — measure real abstraction costs, not contrived best cases
Comparability — include raw Java baselines alongside library operations
Actionability — organise results so regressions are immediately visible

What Is Measured

The hkj-benchmarks module contains 18 benchmark classes covering every major type in the library:

Core Types

Benchmark	Type	What It Tells You
`EitherBenchmark`	`Either<L,R>`	Instance reuse on the Left track, short-circuit efficiency
`MaybeBenchmark`	`Maybe<A>`	Instance reuse on Nothing, nullable interop cost
`TrampolineBenchmark`	`Trampoline<A>`	Stack-safe recursion overhead vs naive recursion
`FreeBenchmark`	`Free<F,A>`	Free monad interpretation cost

Effect Types

Benchmark	Type	What It Tells You
`IOBenchmark`	`IO<A>`	Lazy construction and platform thread execution
`VTaskBenchmark`	`VTask<A>`	Virtual thread execution, map/flatMap chains
`VStreamBenchmark`	`VStream<A>`	Pull-based stream construction, combinator pipelines, Java Stream comparison
`VTaskParBenchmark`	`Par` combinators	Parallel zip, all, race, traverse via StructuredTaskScope

Effect Path Wrappers

Benchmark	Type	What It Tells You
`VTaskPathBenchmark`	`VTaskPath<A>`	Wrapper overhead on top of VTask
`IOPathBenchmark`	`IOPath<A>`	Wrapper overhead on top of IO
`ForPathVTaskBenchmark`	ForPath with VTask	For-comprehension tuple allocation cost

Comparisons

Benchmark	What It Compares
`VTaskVsIOBenchmark`	Virtual threads vs platform threads
`VTaskVsPlatformThreadsBenchmark`	VTask vs ExecutorService at scale
`VTaskPathVsIOPathBenchmark`	Path wrapper costs across effect types
`AbstractionOverheadBenchmark`	HKJ abstractions vs raw Java
`ConcurrencyScalingBenchmark`	Thread scaling under concurrent load
`MemoryFootprintBenchmark`	Allocation rates for VTask, IO, CompletableFuture

Running Benchmarks

All Benchmarks

./gradlew :hkj-benchmarks:jmh

A Single Benchmark Class

./gradlew :hkj-benchmarks:jmh --includes=".*VStreamBenchmark.*"
./gradlew :hkj-benchmarks:jmh --includes=".*VTaskBenchmark.*"
./gradlew :hkj-benchmarks:jmh --includes=".*EitherBenchmark.*"

A Single Benchmark Method

./gradlew :hkj-benchmarks:jmh --includes=".*VTaskBenchmark.runSucceed.*"
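
The `--includes` values above are regular expressions over benchmark names. The following is a minimal sketch of which names each pattern would select, assuming the pattern may match anywhere in the fully qualified name. The package `org.example.bench` and the method name `mapChain` are hypothetical and used only for illustration.

```java
import java.util.List;
import java.util.regex.Pattern;

final class IncludeFilterSketch {

  // Returns the names an include pattern would select, assuming the
  // pattern is allowed to match anywhere in the name (an assumption).
  static List<String> select(String includePattern, List<String> names) {
    Pattern pattern = Pattern.compile(includePattern);
    return names.stream().filter(n -> pattern.matcher(n).find()).toList();
  }

  public static void main(String[] args) {
    // Hypothetical fully qualified benchmark names.
    List<String> names = List.of(
        "org.example.bench.VTaskBenchmark.runSucceed",
        "org.example.bench.VTaskBenchmark.mapChain",
        "org.example.bench.VTaskPathBenchmark.runSucceed",
        "org.example.bench.EitherBenchmark.rightMap");

    // Class-level filter: both VTaskBenchmark methods, but not VTaskPathBenchmark.
    System.out.println(select(".*VTaskBenchmark.*", names));
    // Method-level filter: only VTaskBenchmark.runSucceed.
    System.out.println(select(".*VTaskBenchmark.runSucceed.*", names));
  }
}
```

Note that `VTaskPathBenchmark` is not selected by `.*VTaskBenchmark.*`, because the literal text `VTaskBenchmark` does not occur in its name.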

With GC Profiling

The GC profiler reports allocation rates and GC pressure, which are essential for understanding memory behaviour:

./gradlew :hkj-benchmarks:jmh -Pjmh.profilers=gc

Long / Stress Mode

Long mode runs with chainDepth=10000 and recursionDepth=10000 for thorough stack-safety validation:

./gradlew :hkj-benchmarks:longBenchmark

Formatted Report

./gradlew :hkj-benchmarks:benchmarkReport

Reading the Output

The suite reports throughput (`thrpt` mode) in operations per microsecond (`ops/us`). Higher is better.

Benchmark                                    Mode  Cnt   Score   Error   Units
EitherBenchmark.rightMap                   thrpt   20  15.234 ± 0.512  ops/us
EitherBenchmark.leftMap                    thrpt   20  89.123 ± 1.234  ops/us

Score is the measured throughput. Error is the half-width of the 99.9% confidence interval, so the interval is Score ± Error. If the error exceeds roughly 30% of the score, treat the result as noisy and increase the warmup or measurement iterations.
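
As a worked example, the sketch below applies the roughly 30% rule of thumb to the two rows of the sample output. The class and method names are illustrative only and are not part of the benchmark suite.

```java
final class NoiseCheck {

  // Reported error as a fraction of the score.
  static double relativeError(double score, double error) {
    return error / score;
  }

  // Applies the ~30% rule of thumb described in the text.
  static boolean isNoisy(double score, double error) {
    return relativeError(score, error) > 0.30;
  }

  public static void main(String[] args) {
    // Figures taken from the sample output.
    double rightScore = 15.234, rightError = 0.512;
    double leftScore = 89.123, leftError = 1.234;

    System.out.printf("rightMap relative error: %.1f%%%n", 100 * relativeError(rightScore, rightError));
    System.out.printf("leftMap relative error:  %.1f%%%n", 100 * relativeError(leftScore, leftError));
    System.out.println("rightMap noisy? " + isNoisy(rightScore, rightError));
    System.out.printf("leftMap / rightMap:      %.1fx%n", leftScore / rightScore);
  }
}
```

Both sample rows have a relative error of a few percent (about 3.4% and 1.4%), well under the threshold, and the Left path comes out roughly 5.9x faster than the Right path.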

What to Look For

Signal	Meaning
Left/Nothing operations 5-10x faster than Right/Just	Instance reuse is working
VTask ~10-30% slower than IO for simple ops	Expected virtual thread overhead
Deep chain (50+ steps) completes without error	Stack safety is intact
VStream slower than Java Stream	Expected; virtual thread + pull overhead
Wrapper overhead < 15%	Acceptable Path wrapper cost

Warning Signs

Signal	Possible Cause
Left/Nothing same speed as Right/Just	Instance reuse broken
Error margin > 50% of score	Noisy environment, insufficient warmup
Deep chain throws StackOverflowError	Stack safety regression
VStream > 100x slower than Java Stream	Excessive allocation in pull loop
Wrapper overhead > 30%	Unnecessary allocation in Path wrapper

Expected Performance by Type

Either and Maybe

These types use instance reuse: Left and Nothing operations return the same object without allocating, making short-circuit paths essentially free.

Comparison	Expected Ratio
`leftMap` vs `rightMap`	Left 5-10x faster
`nothingMap` vs `justMap`	Nothing 5-10x faster
`leftLongChain` vs `rightLongChain`	Left 10-50x faster
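
To make the idea of instance reuse concrete, here is a minimal sketch of a two-case type whose `map` returns the same object on the short-circuit track. `MiniEither` is a hypothetical type written for this illustration and is not the library's `Either` implementation.

```java
import java.util.function.Function;

// Hypothetical two-case type for illustration; not the library's Either.
sealed interface MiniEither<L, R> {

  <T> MiniEither<L, T> map(Function<? super R, ? extends T> f);

  record Left<L, R>(L value) implements MiniEither<L, R> {
    @Override
    @SuppressWarnings("unchecked")
    public <T> MiniEither<L, T> map(Function<? super R, ? extends T> f) {
      // A Left holds no R value, so the function is never applied and the
      // same instance is returned under a new type parameter: no allocation.
      return (MiniEither<L, T>) this;
    }
  }

  record Right<L, R>(R value) implements MiniEither<L, R> {
    @Override
    public <T> MiniEither<L, T> map(Function<? super R, ? extends T> f) {
      // The success track applies the function and allocates a new wrapper.
      return new Right<>(f.apply(value));
    }
  }
}

final class InstanceReuseDemo {
  public static void main(String[] args) {
    MiniEither<String, Integer> left = new MiniEither.Left<>("boom");
    MiniEither<String, Integer> mapped = left.map(x -> x + 1);
    // Reference equality: the Left was returned as-is, not copied.
    System.out.println(mapped == left);
  }
}
```

Under this design a chain of `map` calls on a `Left` does no work per step, which is consistent with the gap widening for the long-chain comparison.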

VTask

Virtual thread overhead is the dominant cost for simple operations. For real workloads involving I/O, this overhead is negligible.

Comparison	Expected
Construction (succeed, delay)	Very fast (~100+ ops/us)
VTask vs IO (simple execution)	VTask ~10-30% slower
Deep chains (50+)	Completes without error
High concurrency (1000+ tasks)	VTask scales better than platform threads

VStream

VStream's pull-based model adds overhead per element compared to Java Stream's push model, but provides laziness, virtual thread execution, and error recovery that Java Stream does not.

Comparison	Expected
Construction (empty, of, range)	Very fast (~100+ ops/us)
VStream map vs Java Stream map	VStream slower
Deep map chain (50)	Completes without error
Deep flatMap chain (50)	Completes without error
`existsEarlyMatch` vs `existsNoMatch`	Early match much faster (short-circuit)
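
The short-circuit effect in the last row can be illustrated by counting how many elements are pulled before a match is found. This sketch uses `java.util.stream` and `anyMatch` as a stand-in, on the assumption that VStream's existence check short-circuits in a comparable way. It does not use the VStream API.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.IntPredicate;
import java.util.stream.IntStream;

final class ShortCircuitSketch {

  // Counts how many elements of 0..size-1 are visited before anyMatch returns.
  static int elementsVisited(int size, IntPredicate predicate) {
    AtomicInteger visited = new AtomicInteger();
    IntStream.range(0, size)
        .peek(i -> visited.incrementAndGet())
        .anyMatch(predicate);
    return visited.get();
  }

  public static void main(String[] args) {
    int size = 1_000_000;
    // Early match: stops as soon as element 10 is seen.
    System.out.println("early match visited: " + elementsVisited(size, i -> i == 10));
    // No match: every element must be inspected.
    System.out.println("no match visited:    " + elementsVisited(size, i -> i < 0));
  }
}
```

The early-match case inspects 11 elements while the no-match case inspects all 1,000,000, which is why the two benchmarks are expected to differ so sharply.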

Effect Path Wrappers

Comparison	Expected Overhead
VTaskPath vs raw VTask	5-15%
IOPath vs raw IO	5-15%
ForPath vs direct chaining	10-25%

When Overhead Matters (and When It Doesn't)

The benchmarks consistently show that abstraction overhead is measured in nanoseconds. Real-world operations such as database queries, HTTP calls, and file reads are measured in milliseconds. The overhead is therefore several orders of magnitude smaller than a typical I/O operation.

Abstraction overhead matters in exactly two scenarios:

Tight computational loops processing millions of items per second with no I/O — use primitives directly
Very long chains (hundreds of steps) creating GC pressure — break into named submethods
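
A back-of-the-envelope calculation shows why the first scenario differs from I/O-bound code. The figures below (50 ns of overhead per item, a 2 ms I/O call, 10 million items per second) are assumptions chosen for illustration, not measured results from the suite.

```java
final class OverheadBudget {

  // Fraction of one operation's total time spent in abstraction overhead.
  static double overheadShare(double overheadNanos, double workNanos) {
    return overheadNanos / (overheadNanos + workNanos);
  }

  // Fraction of one CPU core consumed by overhead alone at a given item rate.
  static double coreShare(double overheadNanos, double itemsPerSecond) {
    return overheadNanos * itemsPerSecond / 1_000_000_000.0;
  }

  public static void main(String[] args) {
    double overheadNanos = 50.0;          // assumed per-item overhead
    double ioNanos = 2_000_000.0;         // assumed 2 ms database call
    double itemsPerSecond = 10_000_000.0; // assumed tight-loop throughput

    // I/O-bound: overhead is a tiny fraction of each operation.
    System.out.printf("share of an I/O call: %.4f%%%n", 100 * overheadShare(overheadNanos, ioNanos));
    // CPU-bound tight loop: the same overhead consumes half a core.
    System.out.printf("share of one core:    %.0f%%%n", 100 * coreShare(overheadNanos, itemsPerSecond));
  }
}
```

With these assumed numbers the overhead is about 0.0025% of an I/O call, but half of a core's time in the tight loop.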

For everything else, the type safety, composability, and testability benefits far outweigh the cost.
