Apache Hop 2.2.0 is available!
The Apache Hop community just released Apache Hop 2.2.0, the fifth (!!) and final release of 2022.
With another two months of work and over 160 tickets processed, 2.2.0 is not just a minor release. A lot was done, mainly on two fronts: Hop GUI and Hop Web on one hand, Apache Beam and Google Dataflow on the other.
Apache Beam and Google Dataflow
A lot has happened to improve the experience of developing and working with Apache Hop pipelines in the Apache Beam context, especially Google Dataflow. Here at Lean With Data, we've been working with the Apache Hop community and our friends in the Apache Beam team at Google to make building and running pipelines even easier and more enjoyable.
Apache Beam Upgrade: as with every release, Apache Hop 2.2.0 ships with the latest Apache Beam release (2.43.0) with support for Apache Spark 3.3.0 and Apache Flink 1.15.2.
Google Dataflow specific pipeline options: the Google Dataflow pipeline run configuration in Apache Hop now supports passing specific options to Google Dataflow jobs. This was implemented after a lengthy discussion on the Apache Hop mailing lists. Check the mailing list archives to catch up on the entire thread.
Jump to GCP Dataflow console: when developing or debugging Apache Beam pipelines in Hop GUI, you'll often want to follow up on progress in the environment your pipeline was deployed to. For Google Cloud Dataflow pipelines, Hop GUI now has a button that lets you jump directly to the GCP Dataflow console for your running Dataflow job.
Simple mapping support: the Simple Mapping transform allows Hop data developers to re-use a series of transforms in their pipelines. Because this reduces or even prevents code duplication in your projects, it improves overall project and code quality. The Simple Mapping transform is now supported in Apache Beam pipelines.
Dataflow templates are a way to package your Dataflow pipelines for deployment. Since Dataflow templates can be parameterized, scheduled to be deployed later, version controlled and more, they offer a great way to manage your GCP Dataflow pipelines and jobs. Apache Hop 2.2.0 now supports Google Dataflow Flex templates. Check the docs for more information and to start building your own Flex templates.
Also noteworthy:
- the Apache Beam API in Apache Hop is now a first-class citizen. Plugin developers can use and depend on the Beam API in their own plugins.
- as the number of integration tests continues to grow, the Hop development team is gathering more, and more reliable, information about which plugins work with the various Beam run configurations. This is reflected in the "supported engines" section of the transform docs.
Hop GUI and Hop Web
A new welcome dialog gives new Apache Hop users a quick introduction to Hop GUI and Apache Hop in general. The welcome dialog points to a number of documentation pages and sample pipelines or workflows in the samples project.
In the good old Apache Hop tradition, the welcome dialog is pluggable: Hop plugin developers can add their own information to the dialog. The dialog also introduces a new "link" widget type that will start to appear in other areas of Hop GUI.
Navigation viewport: even though overly large workflows and pipelines are hard to build and maintain and are often a cause of performance issues, sometimes they are unavoidable. Vertical and (especially) horizontal scrollbars are not only painful to use as a Hop GUI user, they are also particularly hard to get right on the large variety of platform combinations Hop GUI and Hop Web support.
To improve the user experience, the Hop development team got rid of scrollbars altogether. Instead of scrollbars, Apache Hop 2.2.0 introduces a new viewport that allows you to quickly navigate large workflows and pipelines by just dragging your mouse around in the viewport, or by using the arrow keys on your keyboard.
Zooming in and out has also been streamlined. Use the key combinations `CTRL-+/=` and `CTRL--` to zoom in and out, `CTRL-0` to return to 100% zoom.
Data grid toolbars with Excel export: Hop users very often come across data grids when working with pipelines and workflows: field lists, preview dialogs, input/output fields and many more.
A lot of the common operations in these grids, like row manipulation, cut/copy/paste etc., required opening and navigating a right-click menu. This right-click menu is still available, but Apache Hop 2.2.0 introduces a new data grid toolbar that makes a lot of these operations available at the click of a button.
One new option in this toolbar is the ability to export the current grid to an Excel (or similar, e.g. gsheet) spreadsheet. As advanced or long-term Apache Hop users may already know, all data in Apache Hop data grids can be copied to your clipboard as a tab-separated matrix that can be copy/pasted back and forth between your favorite spreadsheet editor and Hop GUI. This can tremendously speed up the editing of large or advanced data grids.
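To make the tab-separated matrix idea a bit more concrete, here is a minimal Python sketch of what such a round trip could look like outside of Hop GUI. It is an illustration only: the field list is made up, the helper names `grid_to_tsv` and `tsv_to_grid` are hypothetical, and the exact quoting and line-ending rules Hop GUI applies to clipboard data are an assumption.

```python
import csv
import io


def grid_to_tsv(rows):
    """Serialize a list of rows (lists of cell strings) to tab-separated text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerows(rows)
    return buffer.getvalue()


def tsv_to_grid(text):
    """Parse tab-separated text back into a list of rows."""
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    return [row for row in reader]


# Hypothetical field list (name, type, length), similar in shape to
# what a field grid might contain. Not taken from a real pipeline.
fields = [
    ["customer_id", "Integer", "9"],
    ["customer_name", "String", "100"],
    ["signup_date", "Date", ""],
]

# Text in this shape is what a spreadsheet editor typically accepts on paste.
clipboard_text = grid_to_tsv(fields)
print(clipboard_text)

# Parsing the text again yields the original rows.
assert tsv_to_grid(clipboard_text) == fields
```

Generating a large field list this way and pasting the result into a grid is one possible use; whether a given grid accepts the pasted columns as-is depends on the grid's column layout.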
Configuration perspective: as the amount of functionality in Hop GUI grows, so does the number of available configuration options. To avoid having too many cluttered configuration options all over the place, Hop 2.2.0 comes with a new configuration perspective.
The most obvious configuration options have already been moved to this perspective. New plugins can add their own tabs to the perspective, so the number of available tabs is expected to grow in future releases.
Hop Web has been a first-class citizen since the early Hop releases. A lot has happened in the latest releases to make Hop Web a fully functional and usable alternative to the desktop Hop GUI.
With Apache Hop 2.2.0, the number of full or partial refreshes in the Hop Web UI has decreased significantly, and all images are now SVGs. This results in a much more stable user experience that feels a lot snappier.
The file menu (which doesn't feel very web-native) has been replaced with a hamburger-style menu with the Hop logo as the menu's main button.
The Apache Cassandra support in Apache Hop 2.2.0 was upgraded to version 4. Cassandra 4 brings Java 11 support, virtual tables, audit and full query logging, messaging, streaming and transient replication.
Support for Neo4j was upgraded to version 5, which comes with increased performance, sharding, autonomous clustering and agile operations.
Another Neo4j related change in 2.2.0 is the addition of the execution lineage and Cypher tabs for Neo4j. These tabs were already available in the Neo4j logging tab, and now also appear in the Execution Information perspective for configurations that write to Neo4j.
The Apache Hop community continues to grow, both in numbers and geographically.
As the community grows, so does the number of active contributors. Active community involvement is what makes an open source project thrive and grow. A huge thank you and shoutout to everyone who participated in the Apache Hop 2.2.0 release.
Even more importantly, we noticed a shift in the way organizations work with Apache Hop. Where organizations were mostly exploring Apache Hop during its incubation and in the pre-2.0 days, we're now seeing a lot of organizations using Apache Hop on a daily basis and running their projects in production.
Apache Hop and Lean With Data
Lean With Data actively develops and supports Apache Hop. We're ready to help you in every step of your Apache Hop journey with production support, coaching, architecture and training.
Contact us to find out more.