Apache Hop 1.0 is available!!

The Apache Hop (Incubating) team and community have published the first major release of the Apache Hop platform.

Hop 1.0 is the first major release on a roadmap that intends to put Hop in the spotlight of the data orchestration, data integration and data engineering landscape.

Apache Hop (Incubating) - Release 1.0 available!

This first release contains all the tools and integrations data developers need to be efficient in building robust, scalable and reliable data projects.

Hop 1.0 also marks the end of the beginning. A lot of time was spent refactoring and cleaning the original code base and building the foundations for the Hop platform. With the major cleanup and refactoring behind us and the core building blocks in place, the Hop team is eager to extend the existing functionality, add new functionality and build new integrations.

Hop 1.0 Highlights

Architecture

Hop’s architecture has been designed to be as flexible and extensible as possible. All non-essential code was stripped out of the engine and core and moved to plugins.

Hop now consists of a small but flexible and powerful kernel with over 400 plugins that add over 20 different types of functionality, from databases and runtime engines to testing and lots more.

Code cleanup and refactoring

Apache Hop (Incubating) started as Project Hop in the summer of 2019 as a fork of Pentaho Data Integration (Kettle). After almost twenty years of development, the code base was in worse shape than expected. It took the better part of two years to throw out the furniture, tear down walls and tear out the plumbing and wiring. There will always be some cleanup to do, but Hop 1.0 now works with updated dependencies and a clean, easy-to-use code base.

A uniform set of tools

Hop 1.0 comes with an extensive number of tools to design and run workflows and pipelines, to search through available metadata, configure projects and environments and more.

All the functionality to develop workflows and pipelines and to manage metadata like database connections or unit tests is available from Hop Gui. This new Gui is a visual IDE that was written from scratch and is available on all desktop platforms (Windows, macOS, Linux) and from the browser as Hop Web. All non-design tasks can be performed not only from Hop Gui but also from a variety of command line tools.

Apache Hop (Incubating) - Hop Gui

Take Hop where your data is

Hop workflows and pipelines are designed in Hop Gui, but are not constrained to the local Hop environment. Workflows can be executed in the native Hop engine, both locally and on remote servers through a uniform and transparent user interface. Pipelines can be executed in native and remote pipeline run configurations as well, but have the additional possibility to run on an Apache Spark or Apache Flink cluster, or on Google Cloud’s Dataflow through Apache Beam.

This unparalleled flexibility lets Hop developers and projects take their Hop implementation where the data takes them: start small while the project matures and move to a (cloud) cluster when the project and data volumes grow.

Project and Life Cycle Management

Data projects fail all the time. Too much focus on technology and not enough focus on project, process and life cycle management are only some of the many reasons.

Hop wants to enable non-technical users to be successful in their data projects. To do this, Hop organizes work in projects and environments, where projects (code) are strictly separated from configuration (environments).
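
The separation between project (code) and environment (configuration) can be illustrated with a small, generic sketch. This is not Hop's API or file format: the names PROJECT_CONNECTION, ENVIRONMENTS and resolve, as well as the variable names and values, are hypothetical and only show the idea of one project definition being resolved against different environments.

```python
from string import Template

# Hypothetical project-level definition: it only references variables,
# it never contains environment-specific values.
PROJECT_CONNECTION = {
    "host": "${DB_HOST}",
    "database": "${DB_NAME}",
}

# Hypothetical environments: each one supplies values for the same variables.
ENVIRONMENTS = {
    "dev": {"DB_HOST": "localhost", "DB_NAME": "sales_dev"},
    "prod": {"DB_HOST": "db.example.com", "DB_NAME": "sales"},
}


def resolve(definition, variables):
    """Fill in every ${VAR} placeholder; raises KeyError if a variable is missing."""
    return {key: Template(value).substitute(variables) for key, value in definition.items()}


if __name__ == "__main__":
    for name, variables in ENVIRONMENTS.items():
        print(name, resolve(PROJECT_CONNECTION, variables))
```

The project definition stays identical across environments; only the set of variables changes, which is what makes it possible to promote the same code from development to production.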

Projects can be managed in version control through the git integration in Hop Gui’s file explorer perspective. Differences between versions of a file can be visually inspected through colored indications for new, modified or deleted parts of a workflow or pipeline.

Pipelines and workflows not only need to run without (uncaught) errors, they also need to process the data exactly the way they’re supposed to. To ensure the data actually is processed correctly, Hop data developers have the ability to add unit tests to pipelines. If a pipeline’s produced result matches an expected (golden) data set, the test passes. If there are differences between what was produced and what was expected, the test fails. When added to a library of unit, integration and regression tests, data developers and project owners can rest assured, knowing their data is processed correctly.
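
The golden data set idea described above can be sketched in a few lines. This is a generic illustration, not how Hop implements its unit tests: run_golden_test and uppercase_names are hypothetical names, rows are assumed to be flat dictionaries with hashable values, and row order is assumed not to matter.

```python
from collections import Counter


def run_golden_test(pipeline, input_rows, golden_rows):
    """Run `pipeline` on `input_rows` and compare the result with `golden_rows`.

    Rows are compared as multisets, so duplicates count but order does not.
    """
    # Turn each row into a hashable, order-independent representation.
    produced = Counter(tuple(sorted(row.items())) for row in pipeline(input_rows))
    expected = Counter(tuple(sorted(row.items())) for row in golden_rows)

    missing = [dict(row) for row in (expected - produced).elements()]
    unexpected = [dict(row) for row in (produced - expected).elements()]
    return {
        "passed": not missing and not unexpected,
        "missing": missing,        # expected rows the pipeline did not produce
        "unexpected": unexpected,  # produced rows that are not in the golden set
    }


def uppercase_names(rows):
    """Hypothetical pipeline under test: upper-cases the `name` field."""
    return [{**row, "name": row["name"].upper()} for row in rows]


if __name__ == "__main__":
    result = run_golden_test(
        uppercase_names,
        input_rows=[{"id": 1, "name": "ada"}, {"id": 2, "name": "grace"}],
        golden_rows=[{"id": 1, "name": "ADA"}, {"id": 2, "name": "GRACE"}],
    )
    print("test passed" if result["passed"] else f"test failed: {result}")
```

Reporting the missing and unexpected rows separately, rather than a single pass/fail flag, makes it easier to see what changed when a regression test starts failing.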

Apache Hop (Incubating) - visual git diff

Community

Although not necessarily a feature in Hop 1.0, community is crucial to Hop. Hop wouldn’t be possible without the support and contributions of a growing community.

The community has grown significantly since Hop entered the Apache Incubator in September 2020. Each of the social media channels has hundreds of followers, and local user groups have started to appear all over the world (e.g. Brazil, Spain, Japan).

This community growth is expected to accelerate even further now that the worst of the global pandemic seems to be behind us.

Lean With Data and Hop 1.0

The general availability of Hop 1.0 is one of two major milestones. The second of the two is the graduation as an Apache Software Foundation Top Level Project (TLP).

The graduation as an Apache TLP means the Apache Software Foundation formally takes ownership of the Hop software, documentation and everything that is part of the project.

Both having a 1.0 and becoming an Apache TLP are expected to introduce Hop to a much larger audience, significantly increasing its adoption.

Lean With Data is ready to support customers who want to take Hop into production. The freedom and flexibility of a data orchestration platform that is guaranteed to be open source is a reassurance for the future of a project. However, systems break all the time, and configuring and integrating an already large platform like Hop into an even more complex existing architecture is not a trivial task. Lean Orchestration is Lean With Data’s Hop support package for Hop users, by Hop developers.

Lean With Data guides customers through the initial phases of building Hop solutions and running those solutions in production. We have your back with upgrades from PDI/Kettle to Hop, and with training and coaching in running Hop according to best practices.

In order to serve our customers even better, we are building a global network of partners that will be trained and certified to help you in your local time zone and language.
