Cloud Computing · Data Structures and Algorithms · Distributed Algorithms and Communication Protocols

Apache Beam + Apache Flink/Spark for Batch & Stream Processing

When it comes to stream processing, the Open Source community provides an entire ecosystem to tackle a set of generic problems. Among the emergent Apache projects, Beam provides a clean programming model intended to run on top of a runtime such as Flink, Spark, Google Cloud Dataflow, etc.

 

Apache Beam

 

A really convenient declarative framework that lets you specify complex processing pipelines in very few lines of code; the typical (and somewhat overused) word count example looks like this:

 

import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.TextIO;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.MapElements;

// WordCountOptions, CountWords, and FormatAsTextFn are defined in the
// surrounding WordCount example shipped with the Beam SDK.
public static void main(String[] args) {
  WordCountOptions options = PipelineOptionsFactory.fromArgs(args).withValidation()
      .as(WordCountOptions.class);
  Pipeline p = Pipeline.create(options);

  p.apply("ReadLines", TextIO.read().from(options.getInputFile()))
      .apply(new CountWords())
      .apply(MapElements.via(new FormatAsTextFn()))
      .apply("WriteCounts", TextIO.write().to(options.getOutput()));

  p.run().waitUntilFinish();
}

 

For a detailed explanation, see the Beam quickstart page.
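One of Beam's selling points is that the same pipeline can be dispatched to different runtimes just by changing the runner option. As a rough sketch (the exact Maven profile and paths depend on your project setup), running the word count example on the Flink runner rather than the default DirectRunner looks something like:

```shell
# Run the WordCount pipeline on Apache Flink instead of the local DirectRunner.
# The -Pflink-runner profile and input/output paths are assumptions based on
# the Beam quickstart layout; adjust them to your own project.
mvn compile exec:java \
  -Dexec.mainClass=org.apache.beam.examples.WordCount \
  -Dexec.args="--runner=FlinkRunner --inputFile=pom.xml --output=counts" \
  -Pflink-runner
```

Swapping `FlinkRunner` for `SparkRunner` or `DataflowRunner` (with the matching profile and dependencies) targets the other runtimes without touching the pipeline code itself.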

 

A solid option for creating dataflow-oriented processing pipelines that can be executed on distributed schedulers/runtimes.
