Apache Hop 0.99 is ready to download, run and TEST!!


After another three months of work and more than 300 JIRA tickets since the 0.70 release, the Apache Hop team released Hop 0.99 earlier this month. As the version number implies, 0.99 is intended to find and squash the last bugs and clean up the last messy parts of the code before Hop 1.0 is released to the world.

Hop 1.0 is getting really close now, and it will be quite something! We’re not there yet, so let’s take a look at what the Hop community worked on for 0.99. Even though we intended to focus on making Hop as stable and robust as possible rather than adding new features, 0.99 comes with an impressive amount of new functionality.

Hop Web

When Hop started in the summer of 2019, one of the project’s primary goals was to support Hop Gui on four major platforms: Windows, Mac OS, Linux, and the web.

Hop Web builds on the WebSpoon project, created by Hiromu Hota. Hop Web provides the full Hop Gui experience in your browser, and now even has its own dark mode.

Hop Web is a part of the default Hop build and a Hop Web Docker image is pushed to Docker Hub multiple times per day.

Trying out Hop Web for yourself is as easy as pulling the Docker image: docker pull apache/incubator-hop-web

[Image: Hop Web dark mode]

Avro and Parquet

Hop 0.99 comes with four new transforms for the Avro and Parquet data serialization formats: both now have an input and an output transform.

There is also a new Avro Record data type.

[Image: New Avro and Parquet transforms]

VFS everywhere

VFS (Virtual File System) is a project under the Apache Commons umbrella that provides access to a large variety of data platforms through URLs.

In addition to the large collection of file and storage systems that VFS supports out of the box, Hop had already added support for various cloud storage platforms such as AWS S3, Azure Blob Storage, Google Drive, and Google Cloud Storage.

In 0.99, Hop supports VFS in almost all file interactions. For example, Hop projects can now live in an AWS S3 bucket or Google Drive folder. In combination with Hop Web, this enhanced VFS support opens up a world of possibilities to deploy Hop in cloud environments.
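As a rough illustration of the URL-based addressing described above, the sketch below splits two VFS-style locations into a scheme, a container, and a path using only Python’s standard library. It does not call Hop or Apache Commons VFS. The bucket and folder names are invented, and the exact URL scheme Hop expects for each platform is an assumption here, so check the Hop VFS documentation for the real ones.

```python
from urllib.parse import urlsplit

# Hypothetical project locations; the bucket and folder names are invented,
# and the "s3" scheme is an assumption about how such a location is written.
locations = [
    "s3://example-bucket/hop/projects/sales",
    "file:///home/hop/projects/sales",
]


def describe(location: str) -> dict:
    """Split a VFS-style URL into the parts that identify the backend and the path."""
    parts = urlsplit(location)
    return {"scheme": parts.scheme, "container": parts.netloc, "path": parts.path}


for location in locations:
    print(describe(location))
```

The point of the sketch is that the scheme alone tells a VFS layer which storage backend to use, which is why a project folder can move from local disk to a bucket by changing only its location string.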

Pentaho Importer Improvements

Hop started as a fork of Pentaho Data Integration (Kettle) in 2019, but was intended to be a platform on its own from day one.

Hop now has an installation footprint and startup time that are only a fraction of PDI’s, with a lot more functionality. However, the shared history allows the Hop team to convert PDI jobs and transformations into Hop workflows and pipelines and, together with all other metadata, import them into Hop projects.

This import functionality appeared earlier this year but has now received a major update. Variables are imported into an environment file instead of at the project level, and there are options to skip existing files and/or folders, among other improvements.

Even though Pentaho Data Integration still exists, a lot of organizations with active projects feel they can’t be innovative anymore. Hop and the Pentaho importer provide a way forward for these projects. Upgrading to Hop is straightforward and allows you to reconnect with innovation.

[Image: Improved Pentaho Importer]

Metadata Injection improvements

Metadata injection allows data developers to specify a pipeline template and insert the actual metadata for that pipeline at runtime. This is useful in scenarios where you need to perform the same tasks repeatedly with different metadata, for example, loading a list of CSV files with different layouts into database tables.

The PDI implementation of metadata injection used two different APIs. Instead of upgrading the existing functionality when a new API was added, the old API was left in place, and new transformation steps were developed with the new API. As part of one of the major refactorings in Hop 0.99, the old metadata API was removed. This broke metadata injection support in many transforms, which then had to be re-implemented.
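To make the idea concrete, here is a minimal conceptual sketch in Python. It is not Hop’s API: in Hop, the template is a pipeline and the metadata is supplied through the Metadata Injection transform. The dictionary layout, the `inject` helper, and the file and table names are all hypothetical and only illustrate the template-plus-metadata pattern.

```python
from copy import deepcopy

# A pipeline "template": the structure is fixed, the metadata is left empty.
# This dictionary layout is hypothetical and is not a Hop file format.
TEMPLATE = {
    "input": {"type": "csv", "filename": None, "fields": []},
    "output": {"type": "table", "table": None},
}


def inject(template: dict, metadata: dict) -> dict:
    """Return a copy of the template with the run-specific metadata filled in."""
    pipeline = deepcopy(template)  # never modify the shared template
    pipeline["input"]["filename"] = metadata["filename"]
    pipeline["input"]["fields"] = list(metadata["fields"])
    pipeline["output"]["table"] = metadata["table"]
    return pipeline


# Hypothetical CSV files with different layouts, each loaded into its own table.
files = [
    {"filename": "customers.csv", "fields": ["id", "name"], "table": "customers"},
    {"filename": "orders.csv", "fields": ["id", "customer_id", "total"], "table": "orders"},
]

pipelines = [inject(TEMPLATE, metadata) for metadata in files]
for pipeline in pipelines:
    print(pipeline["input"]["filename"], "->", pipeline["output"]["table"])
```

One template serves every file; only the injected metadata differs per run, and the template itself stays untouched.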

In the process of adding new and improved metadata support, the metadata injection transform received a facelift. A number of existing UI issues were fixed, and the template transform icons are now shown, which makes it easier for developers to see or remember what type a transform is and what its purpose is. Hop now supports metadata injection for the majority of transforms and intends to have metadata injection support enabled by default for all transforms in the near (post 1.0) future.

Check the Hop docs to find out what the metadata injection status is for the various transforms.

[Image: Improved Metadata Injection]

Internationalization (Translations)

A platform that is available in your own native language is crucial for a lot of people. Internationalization and translations have been high priority functionality for Hop since the earliest days of the platform.

The Italian community worked hard to create a complete and top-notch Italian version of the entire Hop platform, fixing a number of internationalization issues in the process.

Hop is now available in 10 languages. As Hop development has been moving incredibly fast over the last two years, not all languages are supported completely.

Hop makes it as easy as possible to translate Hop into your own language or to improve or extend an existing translation. Translations are a relatively easy contribution with a major impact, and a perfect way to start contributing to Hop. Check Hop’s translation contribution guide if you’d like to add or improve support for your own language.

Documentation and samples

Documentation has been a first-class citizen for Hop since day 1. The speed of development until 0.70 made it impossible to keep the docs up to date: features were added and the UI changed at a pace that was just too fast to document.

After the 0.70 release, the amount of new functionality that was added decreased in favor of stability and bug fixes. When the development dust started to settle, the Hop community worked hard on documenting all of the functionality in the platform.

There’s always room for improvement, but all major functionality in the platform is now documented. As the documentation has been merged into the main Hop software repository, documentation is now ready to be versioned, and you’ll be able to switch to the documentation for your Hop version.

Community

One of the challenges all incubating Apache projects face is community growth. In short, the Apache Software Foundation (ASF) doesn’t really care about the quality of the software a project delivers; that is the project team’s responsibility. What the ASF cares about are the legal aspects of the software and, maybe even more important, community building.

Apache Hop built a large following on social media. Hop has gained many hundreds of new followers on the various social media platforms since the 0.70 release, 2 new committers were added to the project, and the overall attention and activity around everything Hop increased significantly. User groups started to appear around the globe. There are now user groups in (at least) Brazil, Spain, Italy, and Japan.

Beyond Hop 1.0

Hop 0.99 is a release candidate for Apache Hop 1.0. After a couple of weeks of bug hunting, Hop 1.0 should see the light of day.

While preparing for the 0.99 release, the Hop team removed the last couple of hurdles in the licensing, copyright, and ownership of the source code and dependencies. There’s no urgency, but expect Hop to leave the incubator and become a Top-Level Project (TLP) in the not too distant future.

Hop development doesn’t stop with 1.0. The Hop team released an updated roadmap with a couple of items the team will start working on once 1.0 is out the door. These include a marketplace to allow third-party developers to publish their Hop plugins and pluggable serialization to allow workflows and pipelines to be saved in more modern file formats like JSON and YAML instead of the outdated XML. Also on the roadmap are new services for logging and monitoring and a more generic logging perspective instead of the current Neo4j-centric version.

Apache Hop is ready to become your data engineering and data orchestration platform of choice.

Apache Hop and Lean With Data

Hop is getting really close to both 1.0 and graduation as an Apache Top-Level Project. As the project matures, we see major growth in the community and in the number of active deployments.

At Lean With Data, we are 100% committed to building the best open source data orchestration platform possible. In addition to that, we’re also convinced that serious Hop implementations should not go into production without professional support.

We’re here to help you build Hop projects that are robust, performant and scalable!
