Benchmarks & Performance

Higher-Kinded-J ships with a comprehensive JMH benchmark suite in the hkj-benchmarks module. These benchmarks measure the real cost of the library's abstractions so you can make informed decisions about where and when to use them.

What You'll Learn

  • What the benchmark suite covers and how it is organised
  • How to run benchmarks: all, per-type, with GC profiling
  • How to interpret results and spot regressions
  • What performance characteristics to expect from each type

"Measure. Don't guess." — Kirk Pepperdine, Java performance expert


Why Benchmarks Matter

Functional abstractions wrap values. Wrapping has a cost. The question is never "is there overhead?" — there always is — but "does the overhead matter for my workload?" The benchmark suite answers that question with data rather than intuition.

The suite is designed around three principles:

  1. Honesty — measure real abstraction costs, not contrived best cases
  2. Comparability — include raw Java baselines alongside library operations
  3. Actionability — organise results so regressions are immediately visible

What Is Measured

The hkj-benchmarks module contains 19 benchmark classes covering every major type in the library:

Core Types

Benchmark           | Type          | What It Tells You
--------------------|---------------|-----------------------------------------------------------
EitherBenchmark     | Either<L,R>   | Instance reuse on the Left track, short-circuit efficiency
MaybeBenchmark      | Maybe<A>      | Instance reuse on Nothing, nullable interop cost
TrampolineBenchmark | Trampoline<A> | Stack-safe recursion overhead vs naive recursion
FreeBenchmark       | Free<F,A>     | Free monad interpretation cost
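The stack-safety technique that TrampolineBenchmark exercises can be sketched with a minimal trampoline. This is an illustrative toy, not the library's Trampoline API; the names Tramp, Done, More, and countDown are invented for the sketch:

```java
import java.util.function.Supplier;

// Minimal trampoline sketch: recursion is expressed as data (More) and run in
// a flat loop, trading heap allocation for stack frames. Illustrative only;
// not Higher-Kinded-J's actual Trampoline implementation.
sealed interface Tramp<A> {
    record Done<A>(A value) implements Tramp<A> {}
    record More<A>(Supplier<Tramp<A>> next) implements Tramp<A> {}

    default A run() {
        Tramp<A> cur = this;
        // Each "recursive call" is one loop iteration, so stack depth stays constant.
        while (cur instanceof More<A> m) cur = m.next().get();
        return ((Done<A>) cur).value();
    }

    // Counts up while counting n down; naive recursion at this depth would
    // throw StackOverflowError, the trampoline never grows the stack.
    static Tramp<Long> countDown(long n, long acc) {
        return n == 0 ? new Done<>(acc) : new More<>(() -> countDown(n - 1, acc + 1));
    }
}
```

The benchmark's question is what this indirection (one allocation and one virtual call per step) costs relative to plain recursion at depths where plain recursion still works.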

Effect Types

Benchmark         | Type            | What It Tells You
------------------|-----------------|---------------------------------------------------------------
IOBenchmark       | IO<A>           | Lazy construction and platform thread execution
VTaskBenchmark    | VTask<A>        | Virtual thread execution, map/flatMap chains
VStreamBenchmark  | VStream<A>      | Pull-based stream construction, combinator pipelines, parallel ops, chunking, Java Stream comparison
VTaskParBenchmark | Par combinators | Parallel zip, all, race, traverse via StructuredTaskScope
ScopeBenchmark    | Scope, Resource | Scope joiner strategies (allSucceed, anySucceed, accumulating), Resource bracket overhead

Effect Path Wrappers

Benchmark             | Type               | What It Tells You
----------------------|--------------------|----------------------------------------
VTaskPathBenchmark    | VTaskPath<A>       | Wrapper overhead on top of VTask
IOPathBenchmark       | IOPath<A>          | Wrapper overhead on top of IO
ForPathVTaskBenchmark | ForPath with VTask | For-comprehension tuple allocation cost

Comparisons

Benchmark                       | What It Compares
--------------------------------|---------------------------------------------------
VTaskVsIOBenchmark              | Virtual threads vs platform threads
VTaskVsPlatformThreadsBenchmark | VTask vs ExecutorService at scale
VTaskPathVsIOPathBenchmark      | Path wrapper costs across effect types
AbstractionOverheadBenchmark    | HKJ abstractions vs raw Java
ConcurrencyScalingBenchmark     | Thread scaling under concurrent load
MemoryFootprintBenchmark        | Allocation rates for VTask, IO, CompletableFuture

Running Benchmarks

All Benchmarks

./gradlew :hkj-benchmarks:jmh

A Single Benchmark Class

./gradlew :hkj-benchmarks:jmh --includes=".*VStreamBenchmark.*"
./gradlew :hkj-benchmarks:jmh --includes=".*VTaskBenchmark.*"
./gradlew :hkj-benchmarks:jmh --includes=".*EitherBenchmark.*"

A Single Benchmark Method

./gradlew :hkj-benchmarks:jmh --includes=".*VTaskBenchmark.runSucceed.*"

With GC Profiling

This reveals allocation rates and GC pressure — essential for understanding memory behaviour:

./gradlew :hkj-benchmarks:jmh -Pjmh.profilers=gc

Long / Stress Mode

Runs with chainDepth=10000 and recursionDepth=10000 for thorough stack-safety validation:

./gradlew :hkj-benchmarks:longBenchmark

Formatted Report

./gradlew :hkj-benchmarks:benchmarkReport

Reading the Output

JMH reports throughput in operations per microsecond. Higher is better.

Benchmark                                    Mode  Cnt   Score   Error   Units
EitherBenchmark.rightMap                   thrpt   20  15.234 ± 0.512  ops/us
EitherBenchmark.leftMap                    thrpt   20  89.123 ± 1.234  ops/us

Score is the measured throughput. Error is the 99.9% confidence interval. If the error is larger than ~30% of the score, the result is noisy — increase warmup or measurement iterations.
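The ~30% rule of thumb is easy to automate when post-processing results. A minimal sketch; the class and method names are illustrative, not part of JMH or the library:

```java
// Sketch: flag a noisy JMH result using the rule of thumb above.
// Names are illustrative; this is not part of any tool's API.
public final class NoiseCheck {
    /** True when the 99.9% error bound exceeds 30% of the score. */
    public static boolean isNoisy(double score, double error) {
        return error > 0.30 * score;
    }
}
```

For the sample output above, isNoisy(15.234, 0.512) is false (roughly 3.4% noise), so the rightMap figure is trustworthy.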

What to Look For

Signal                                               | Meaning
-----------------------------------------------------|------------------------------------------
Left/Nothing operations 5-10x faster than Right/Just | Instance reuse is working
VTask ~10-30% slower than IO for simple ops          | Expected virtual thread overhead
Deep chain (50+ steps) completes without error       | Stack safety is intact
VStream slower than Java Stream                      | Expected; virtual thread + pull overhead
parEvalMap scales with concurrency for I/O           | Parallel pipeline working correctly
Scope joiners similar speed to Par.all               | Minimal Scope abstraction cost
Wrapper overhead < 15%                               | Acceptable Path wrapper cost

Warning Signs

Signal                                 | Possible Cause
---------------------------------------|----------------------------------------
Left/Nothing same speed as Right/Just  | Instance reuse broken
Error margin > 50% of score            | Noisy environment, insufficient warmup
Deep chain throws StackOverflowError   | Stack safety regression
VStream > 100x slower than Java Stream | Excessive allocation in pull loop
Wrapper overhead > 30%                 | Unnecessary allocation in Path wrapper

Expected Performance by Type

Either and Maybe

These types use instance reuse: Left and Nothing operations return the same object without allocating, making short-circuit paths essentially free.
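The pattern behind those numbers can be sketched in a few lines. This is an illustrative mini-type, not Higher-Kinded-J's actual implementation:

```java
import java.util.function.Function;

// Illustrative sketch of the instance-reuse pattern: mapping over a Left
// returns the same object, so the short-circuit track never allocates.
// Not the library's actual Either implementation.
sealed interface MiniEither<L, R> {
    record Left<L, R>(L error) implements MiniEither<L, R> {}
    record Right<L, R>(R value) implements MiniEither<L, R> {}

    @SuppressWarnings("unchecked")
    default <R2> MiniEither<L, R2> map(Function<? super R, ? extends R2> f) {
        return switch (this) {
            case Right<L, R> r -> new Right<>(f.apply(r.value()));
            // Reuse: a Left carries no R value, so the same instance is
            // returned unchanged instead of allocating a new wrapper.
            case Left<L, R> l -> (MiniEither<L, R2>) (MiniEither<?, ?>) l;
        };
    }
}
```

Because the Left branch is a cast, not a constructor call, a long chain of map steps over a Left costs only the method dispatches, which is why the short-circuit track benchmarks so much faster.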

Comparison                      | Expected Ratio
--------------------------------|----------------------
leftMap vs rightMap             | Left 5-10x faster
nothingMap vs justMap           | Nothing 5-10x faster
leftLongChain vs rightLongChain | Left 10-50x faster

VTask

Virtual thread overhead is the dominant cost for simple operations. For real workloads involving I/O, this overhead is negligible.
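Why blocking workloads hide that overhead can be shown with plain JDK virtual threads, no library code involved. The class and helper below are illustrative:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

// Library-free sketch: many concurrent blocking tasks on virtual threads.
// Per-task scheduling overhead is nanoseconds; the sleep (a stand-in for
// real I/O) is milliseconds, so the overhead vanishes into the noise.
public final class BlockingWorkload {
    public static int run(int tasks) {
        AtomicInteger completed = new AtomicInteger();
        try (ExecutorService exec = Executors.newVirtualThreadPerTaskExecutor()) {
            for (int i = 0; i < tasks; i++) {
                exec.submit(() -> {
                    try {
                        Thread.sleep(1); // stand-in for a network or DB call
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                    completed.incrementAndGet();
                });
            }
        } // close() waits for all submitted tasks to finish
        return completed.get();
    }
}
```

With one virtual thread per task, thousands of blocked tasks cost little memory, which is the regime where VTask's per-task overhead stops mattering.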

Comparison                     | Expected
-------------------------------|------------------------------------------
Construction (succeed, delay)  | Very fast (~100+ ops/us)
VTask vs IO (simple execution) | VTask ~10-30% slower
Deep chains (50+)              | Completes without error
High concurrency (1000+ tasks) | VTask scales better than platform threads

VStream

VStream's pull-based model adds overhead per element compared to Java Stream's push model, but provides laziness, virtual thread execution, and error recovery that Java Stream cannot.
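What "pull-based" means for per-element cost can be sketched with a toy pull stream. This is illustrative only; Pull, range, and toList here are invented names, not VStream's API:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.function.Function;

// Toy pull stream: the consumer asks for each element one at a time, and
// every element pays a virtual call plus an Optional box. That per-element
// cost is the overhead the benchmarks compare against Java Stream's push
// model. Illustrative only; not VStream's actual API.
@FunctionalInterface
interface Pull<A> {
    Optional<A> next();

    static Pull<Integer> range(int from, int to) {
        int[] i = {from};
        return () -> i[0] < to ? Optional.of(i[0]++) : Optional.empty();
    }

    // Laziness falls out naturally: nothing runs until someone pulls.
    default <B> Pull<B> map(Function<? super A, ? extends B> f) {
        return () -> next().map(f);
    }

    default List<A> toList() {
        List<A> out = new ArrayList<>();
        for (Optional<A> o = next(); o.isPresent(); o = next()) out.add(o.get());
        return out;
    }
}
```

The trade the table below quantifies is exactly this: each pulled element costs more than a pushed one, but the consumer controls demand, which is what makes laziness and early termination possible.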

Comparison                        | Expected
----------------------------------|------------------------------------------
Construction (empty, of, range)   | Very fast (~100+ ops/us)
VStream map vs Java Stream map    | VStream slower
Deep map chain (50)               | Completes without error
Deep flatMap chain (50)           | Completes without error
existsEarlyMatch vs existsNoMatch | Early match much faster (short-circuit)

Effect Path Wrappers

Comparison                 | Expected Overhead
---------------------------|------------------
VTaskPath vs raw VTask     | 5-15%
IOPath vs raw IO           | 5-15%
ForPath vs direct chaining | 10-25%

Benchmark Assertion Tests

The benchmark suite includes automated assertion tests that validate performance characteristics after each benchmark run. These are not just "did it finish?" checks — they verify relative performance, overhead ratios, and sanity bounds.

Tests Fail If Benchmarks Haven't Run

The assertion tests fail (not skip) if benchmark results are missing. This is intentional. Run ./gradlew :hkj-benchmarks:jmh before running ./gradlew :hkj-benchmarks:test. Silent skips hide missing quality gates.
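The fail-not-skip idea, as a plain-JDK sketch (the class name, path, and message are illustrative, not the suite's actual code):

```java
import java.nio.file.Files;
import java.nio.file.Path;

// Illustrative fail-not-skip guard: if benchmark results are missing, the
// test run fails loudly instead of silently skipping the quality gate.
public final class ResultsGate {
    public static void requireResults(Path results) {
        if (!Files.isRegularFile(results)) {
            throw new AssertionError(
                "No benchmark results at " + results
                    + "; run ./gradlew :hkj-benchmarks:jmh first");
        }
    }
}
```

A skipped test and a passed test look identical in a green build; a thrown AssertionError does not, which is the point.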

What the Tests Validate

Test Group               | What It Checks
-------------------------|--------------------------------------------------------------------------
SanityChecks             | Every benchmark has positive throughput and bounded error margins
VTaskRelativePerformance | VTask construction costs (succeed, of, map) are positive
ParCombinatorPerformance | Par.zip and Par.map2 have positive throughput
VTaskVsIOOverhead        | Both VTask and IO construction perform within expected bounds
CoreTypePerformance      | Maybe, Either, and Trampoline operations have positive throughput
FoldPlusPerformance      | Fold combination overhead is bounded; sum vs plus parity
AbstractionOverhead      | Raw Java > IO > VTask ordering; VTaskPath wrapper overhead bounded
ConcurrencyScaling       | Single and multi-threaded VTask/IO performance is positive
IOPerformance            | IO construction vs execution ratios; deep recursion completes (stack safety)
IOPathPerformance        | IOPath construction, map pipelines, and error handling overhead
VTaskPathPerformance     | VTaskPath construction, map pipelines, and timeout overhead
VTaskPathVsIOPath        | Cross-type comparison: construction ratios and conversion costs
ForPathVTaskPerformance  | For-comprehension overhead vs direct chaining; parallel step overhead
ScopePerformance         | Scope.allSucceed, Resource bracket, and Par.all throughput
MemoryFootprint          | Bulk construction rates for VTask, IO, and CompletableFuture
VStreamPerformance       | VStream map execution, construction vs execution, Java Stream baseline
VTaskVsPlatformThreads   | VTask Par.all vs platform thread pool at scale
FreeMonadPerformance     | Free monad construction, stack safety, and interpretation overhead

Running the Tests

# Step 1: Run benchmarks (generates results.json)
./gradlew :hkj-benchmarks:jmh

# Step 2: Run assertion tests against the results
./gradlew :hkj-benchmarks:test

# Or run both together via the benchmarkValidation task
./gradlew benchmarkValidation

Release Quality Gate

The releaseReadiness task is a single-command quality gate that runs every verification step, ordered from fastest to slowest so failures surface early:

./gradlew releaseReadiness

Step | Task                         | What It Checks                               | Speed
-----|------------------------------|----------------------------------------------|--------
1    | spotlessCheck                | Code formatting (Google Java Format)         | Seconds
2    | build                        | Compilation, all unit tests, JaCoCo coverage | Minutes
3    | :hkj-benchmarks:jmh          | JMH benchmarks execute successfully          | Minutes
4    | :hkj-benchmarks:test         | Benchmark assertion tests pass               | Seconds
5    | :hkj-processor:pitest (full) | Mutation testing with STRONGER mutators      | Slowest

If any step fails, the build stops immediately. All five must pass before a release.

Pitest Full Profile

The release gate runs pitest with -Ppitest.profile=full, which uses STRONGER mutators and all available CPU cores. This is more thorough than the default conservative profile used during local development.

Reports Generated

After a successful run, reports are available at:

Tool        | Location
------------|----------------------------------------------------
JaCoCo      | hkj-core/build/reports/jacoco/test/html/index.html
JMH (JSON)  | hkj-benchmarks/build/reports/jmh/results.json
JMH (human) | hkj-benchmarks/build/reports/jmh/human.txt
Pitest      | hkj-processor/build/reports/pitest/index.html

When Overhead Matters (and When It Doesn't)

The benchmarks consistently show that abstraction overhead is measured in nanoseconds. Real-world operations — database queries, HTTP calls, file reads — are measured in milliseconds. The overhead is three to four orders of magnitude smaller than any I/O operation.

Abstraction overhead matters in exactly two scenarios:

  1. Tight computational loops processing millions of items per second with no I/O — use primitives directly
  2. Very long chains (hundreds of steps) creating GC pressure — break into named submethods

For everything else, the type safety, composability, and testability benefits far outweigh the cost.
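The orders-of-magnitude claim is simple arithmetic; a sketch with illustrative numbers (the class and method are invented for this example):

```java
// Sketch of the arithmetic: a fixed per-operation abstraction overhead in
// nanoseconds against an I/O operation measured in milliseconds. The 50 ns
// and 5 ms figures below are illustrative, not measured values.
public final class OverheadMath {
    /** Fraction of an I/O call's duration spent on abstraction overhead. */
    public static double fraction(double overheadNanos, double ioMillis) {
        return overheadNanos / (ioMillis * 1_000_000.0);
    }
}
```

For example, a 50 ns wrapper on a 5 ms database call is a fraction of 0.00001, i.e. 0.001% of the request; no profiler will ever surface it.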
