It's been a hot summer at Apache Hop. After 4.5 months of work on well over 200 tickets, Apache Hop 2.1.0 is available.
Don't let the minor version number fool you: this is not a small release. While the main focus of Apache Hop 2.0 was the upgrade to Java 11, 2.1.0 is packed with new functionality, improvements and bug fixes.
Let's walk through what Apache Hop 2.1.0 has to offer.
A number of performance improvements have been implemented in the Apache Beam run configurations, specifically in locking, serialization and the removal of redundant checks. Combined, these changes should give you a major improvement in the processing throughput of your pipelines in all of the supported Apache Beam run configurations.
AWS Kinesis is now supported in the Beam run configurations with two new transforms: Beam Kinesis Consume and Beam Kinesis Produce.
A new execution information framework allows Hop users to configure where and how execution information is stored. This information can currently be stored on the local file system, on a remote Hop Server or in a Neo4j graph database.
Alongside execution information, a data profiling framework can be configured. This lets users tell Hop to profile the data that flows through a pipeline, or to sample the first, last or a random set of rows.
A new Execution Information perspective provides an overview of ongoing and previous executions of your workflows and pipelines, with the ability to drill up to a parent or down to a child workflow or pipeline.
All workflow and pipeline engines can be configured to gather and store execution information. Data profiling is only available for pipelines.
A new section was also added to the Beam documentation that explains how to run Apache Hop pipelines with the Apache Flink Kubernetes operator.
Documentation is an ongoing effort with every release.
All transform documentation pages now indicate which engines the transform supports. Many transforms are still being tested on the various engines, and these indications will be updated accordingly.
New how-to guides explain how to work with joins and lookups, and how to run workflows and pipelines from Apache Airflow with the Docker operator.
The installation and configuration instructions have been extended with an upgrade section. Installing and configuring Apache Hop has always been a breeze, but this section explains a couple of tweaks to apply when upgrading.
In addition to the major changes in Apache Hop 2.1.0 we just walked through, a number of new plugins have joined the Apache Hop plugin family:
The Apache Hop community continues to grow, both in numbers and geographically.
As the community grows, so does the number of active contributors. Active community involvement is what makes an open source project thrive and grow. A huge thank you and shoutout to everyone who participated in the Apache Hop 2.1.0 release.
Even more importantly, we noticed a shift in the way organizations work with Apache Hop. Whereas organizations were mostly exploring Apache Hop in its early days as an incubating project and before the 2.0 release, we now see many organizations using Apache Hop on a daily basis and running their projects in production.
Lean With Data actively develops and supports Apache Hop. We're ready to help you at every step of your Apache Hop journey with production support, coaching, architecture and training.
Contact us to find out more.