Apache Hop 2.1.0 is available

Written by Bart Maertens | Oct 18, 2022 9:09:46 AM

It's been a hot summer at Apache Hop. After 4.5 months of work on well over 200 tickets, Apache Hop 2.1.0 is available.

Don't let the minor version number fool you: this is not a small release. Where the main focus for Apache Hop 2.0 was the upgrade to Java 11, Apache Hop 2.1.0 comes packed with new functionality, improvements and bug fixes.

Let's walk through what Apache Hop 2.1.0 has to offer.

MongoDB

Apache Hop now has a new delete transform for MongoDB. The MongoDB drivers and documentation have been updated.

Apache Beam

Apache Beam has been updated to 2.41.0, with Apache Spark 3.3.0 and Apache Flink 1.15.2. The Apache Spark run configuration now supports local execution again, making it easier to test your Apache Hop pipelines on Apache Spark.

A number of performance improvements have been implemented in the Apache Beam run configurations, more specifically with regard to locking, serialization and removal of redundant checks. The combination of these changes means you should see a major improvement in the processing throughput of your pipelines in any of the supported Apache Beam run configurations.

AWS Kinesis is now supported in the Beam run configurations with two new transforms: Beam Kinesis Consume and Beam Kinesis Produce.

Execution Information Framework

The most significant area of new functionality by far in Apache Hop 2.1.0 is a new execution information and data profiling framework.

This framework allows Hop users to configure where and how execution information is stored. The available framework options currently include the local file system, a remote Hop Server and a Neo4j graph database.

Alongside the pure execution information, a data profiling framework can be configured. This allows users to tell Hop to profile the data that flows through a pipeline, or to sample the first, last or a random set of rows.
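
To make the three sampling modes concrete, here is a minimal Python sketch of the idea. It is an illustration only, not Hop's implementation: the function names are hypothetical, rows are modelled as plain tuples, and the random mode uses reservoir sampling as an assumed technique for picking rows from a stream of unknown length.

```python
import random
from collections import deque
from itertools import islice
from typing import Iterable, List, Optional, Tuple

Row = Tuple  # hypothetical stand-in for a row flowing through a pipeline


def sample_first(rows: Iterable[Row], n: int) -> List[Row]:
    """Keep the first n rows of the stream."""
    return list(islice(rows, n))


def sample_last(rows: Iterable[Row], n: int) -> List[Row]:
    """Keep the last n rows; the bounded deque discards older rows."""
    return list(deque(rows, maxlen=n))


def sample_random(rows: Iterable[Row], n: int, seed: Optional[int] = None) -> List[Row]:
    """Keep a uniform random sample of up to n rows (reservoir sampling)."""
    rng = random.Random(seed)
    reservoir: List[Row] = []
    for index, row in enumerate(rows):
        if index < n:
            # Fill the reservoir with the first n rows.
            reservoir.append(row)
        else:
            # Replace an existing entry with probability n / (index + 1).
            slot = rng.randint(0, index)
            if slot < n:
                reservoir[slot] = row
    return reservoir


if __name__ == "__main__":
    stream = [(i, f"customer-{i}") for i in range(1000)]
    print(sample_first(stream, 3))
    print(sample_last(stream, 3))
    print(sample_random(stream, 3))
```

All three functions read the stream once and hold at most n rows in memory, which is the property that makes this kind of sampling practical on large pipelines.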

A new Execution Information perspective provides an overview of the ongoing and previous executions of your workflows and pipelines, with the ability to drill up to the parent or down to the child workflow or pipeline.

All workflow and pipeline engines can be configured to gather and store execution information. Data profiling is only available to pipelines.

Kubernetes

Apache Hop 2.1.0 comes with Helm charts for Hop Server and Hop Web.

Unrelated to these Helm charts, a new section was added to the Beam docs to explain how you can run Apache Hop pipelines using the Apache Flink Kubernetes operator.

Documentation

Documentation is an ongoing effort with every release.

All transform documentation pages now contain an indication of the supported engines. A lot of these are going through testing and will be updated accordingly.

New how-to guides cover working with joins and lookups, and running workflows and pipelines from Apache Airflow with the Docker operator.

The installation and configuration instructions have been extended with an upgrade section. Installing and configuring Apache Hop has always been a breeze, but this section explains a couple of tweaks.

Various

Beyond the major changes we just walked through, Apache Hop 2.1.0 brings a number of other additions to the Apache Hop plugin family:

Transforms:
- The Microsoft Access Output transform lets you write data to Microsoft Access databases. Even though MS Access is not the most advanced data platform, it's still an indispensable data format in a lot of organizations.
- The Snowflake Bulk Loader transform lets you bulk upload data to your Snowflake analytical cloud databases.

Databases:
- Apache Hive is now a fully supported database type, closing a functionality gap in previous Apache Hop releases.

Community

The Apache Hop community continues to grow, both in numbers and geographically.

As the community grows, so does the number of active contributors. Active community involvement is what makes an open source project thrive and grow. A huge thank you and shoutout to everyone who participated in the Apache Hop 2.1.0 release.

Even more importantly, we noticed a shift in the way organizations work with Apache Hop. Where organizations were mostly exploring Apache Hop while it was an incubating project and in the pre-2.0 days, we're now seeing a lot of them using Apache Hop on a daily basis and running their projects in production.

Apache Hop and Lean With Data

Lean With Data actively develops and supports Apache Hop. We're ready to help you in every step of your Apache Hop journey with production support, coaching, architecture and training.
