When it comes to stream processing, the Open Source community provides an entire ecosystem to tackle a set of generic problems. Among the emergent Apache projects, Beam is providing a clean programming model intended to be run on top of a runtime like Flink, Spark, Google Cloud DataFlow, etc.
A really convenient declarative framework which allows to specify complex processing pipeline in very few lines of code; the typical and inflated word count example looks like:
public static void main(String[] args) { WordCountOptions options = PipelineOptionsFactory.fromArgs(args).withValidation() .as(WordCountOptions.class); Pipeline p = Pipeline.create(options); p.apply("ReadLines", TextIO.read().from(options.getInputFile())) .apply(new CountWords()) .apply(MapElements.via(new FormatAsTextFn())) .apply("WriteCounts", TextIO.write().to(options.getOutput())); p.run().waitUntilFinish(); }
For a detailed explanation, there’s the quickstart page.
A solid alternative to create dataflow-oriented processing pipelines which can be executed on distributed schedulers/runtimes.