Introducing Lean With Data

Lean With Data launched in late January 2021. While we’ve been busy setting up shop, we wanted to take a quick moment to explain what Lean With Data will be as a platform and as a business, and what our goals are in building it.

TL;DR

Lean With Data believes the data world has come to a standstill on data engineering, is in a nuclear winter on data visualization and is in dire need of advanced analytics tools with a low barrier to entry. Sure, there is an endless number of frameworks and platforms that do all of these and more, but we are convinced that there’s a need for a unified platform that combines these end-to-end efforts in an easy-to-use and affordable way.

Lean With Data aims to build a next-generation data engineering, data visualization and advanced analytics platform. The platform we’re building will eventually consist of three interconnected components. Each will be a strong, independent platform on its own, but they are designed and will be built to be stronger together.

Lean Orchestration, Lean Presentation and Lean Knowledge will be our set of tools to orchestrate, visualize and analyze all your data-related business problems, both reactively and proactively.

The entire platform will be built as open source, and we’ll donate as much of our code as possible to the Apache Software Foundation.

We’ll build our main revenue streams from professional support, training and on-demand product development. The revenue we get from these activities will be used to fund further development of the platform.

Let’s dive into some more detail!

Nuclear Winter

Data problems are hard:

  • Orchestration: Data is scattered all over the place. Overall data quality is poor. Cleaning, combining, transforming and loading data is a complex and time-consuming task. There’s an overabundance of data orchestration platforms out there, but almost all of them are niche products that solve only part of the problem, require a lot of tedious work, are hard to learn and manage, require coding, or suffer from a combination of all of these.

  • Visualization has come to a standstill. The major players in the traditional data visualization market seem to have divided the market share between them. There are a number of open source initiatives, but these are limited in scope and functionality, hard to use or inaccessible to business users (Python or JavaScript code!), or both.

  • Advanced Analytics is not available as an integrated, affordable and easy-to-use solution. There are more Python libraries than one can count and there is an abundance of data science platforms, but there are no affordable, easy-to-use tools that integrate with orchestration and visualization.

The Lean With Data Platform

We believe that building a unified platform that brings orchestration, visualization and advanced analytics together is not impossible. Heck, we’ve already designed it and are building it!

Let’s explore our core components.

Designed for robustness and flexibility

All of our components will use the same design: a kernel holds the core functionality to process, visualize, analyze and track data, and detailed functionality is added through plugins.

This design holds a number of important advantages:

  • robust: a single plugin may fail, but that shouldn’t bring the entire system to its knees.

  • accessible: plugins are designed to be easy to build, allowing organizations and developers to add plugins to the Lean ecosystem.

  • flexible: a plugin architecture makes it easy to tailor one component or the entire platform to your needs. Keep only the plugins you need or add everything and the kitchen sink, and run on a tiny device or a huge cluster.

Policies and configurations will work with the different components in the system to tailor the platform for a specific project or environment, prevent unauthorized access or use and make sure no unnecessary functionality runs in any given environment.
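To make the kernel-and-plugins idea a bit more concrete, here is a minimal sketch in Python. It is not Lean With Data or Apache Hop code: the `Kernel` class, the plugin names and the `enabled` allowlist are all hypothetical, made up for illustration. It only shows the two properties described above: a failing plugin is isolated instead of stopping the run, and a simple policy decides which plugins run in a given environment.

```python
from typing import Callable, Dict, Iterable, List

Record = dict
Plugin = Callable[[Record], Record]


class Kernel:
    """Hypothetical kernel: keeps a plugin registry and runs the enabled ones."""

    def __init__(self, enabled: Iterable[str]) -> None:
        # Policy for this environment: only these plugin names may run.
        self._enabled = set(enabled)
        self._plugins: Dict[str, Plugin] = {}
        self.errors: List[str] = []

    def register(self, name: str, plugin: Plugin) -> None:
        self._plugins[name] = plugin

    def process(self, record: Record) -> Record:
        for name, plugin in self._plugins.items():
            if name not in self._enabled:
                continue  # not allowed here, so it never runs
            try:
                # Each plugin works on a copy, so a failure cannot
                # leave the record half-modified.
                record = plugin(dict(record))
            except Exception as exc:
                # Isolate the failure and carry on with the other plugins.
                self.errors.append(f"{name}: {exc}")
        return record


def uppercase_name(record: Record) -> Record:
    record["name"] = record["name"].upper()
    return record


def broken(record: Record) -> Record:
    raise ValueError("boom")


def add_flag(record: Record) -> Record:
    record["checked"] = True
    return record


kernel = Kernel(enabled=["uppercase_name", "broken"])
kernel.register("uppercase_name", uppercase_name)
kernel.register("broken", broken)
kernel.register("add_flag", add_flag)  # registered, but not enabled

result = kernel.process({"name": "hop"})
```

In this sketch the `broken` plugin fails, its error is recorded in `kernel.errors`, and the record still comes out processed by `uppercase_name`. The `add_flag` plugin never runs because the policy does not enable it.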

Lean Orchestration

Lean Orchestration is our data engineering component. Lean Orchestration will have Apache Hop (Incubating) at its core. Hop and Lean Orchestration are designed and built to be easy to use, easy to develop and easy to manage, while being extremely flexible and powerful.

With a kernel-based architecture for the core system and plugins for extended functionality, Hop has already proven to be extremely powerful and flexible. We support data architectures that run on-premises and in the cloud, work in batch and streaming and can operate in IoT as well as Big Data environments.

Lean Presentation

Data only serves its purpose when it is easy to interpret and easy to understand.

Lean Presentation will be able to visualize data at any given point as an independent visualization platform, but will be tightly integrated into the entire platform as well.

Visualizations will not only be available for the actual data at hand, but will also provide insights into auditing (who touched my data?), lineage (what is the life cycle of a data point?) and impact analysis (how does a change in my data impact other reports or dashboards?). Just like Lean Orchestration, Lean Presentation will be integrated in every atom of the Lean With Data platform.
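Lineage and impact analysis are, at heart, graph questions. As a rough illustration only (this is not how Lean Presentation is implemented, and the dataset names below are invented), lineage can be modeled as a directed graph where each source feeds one or more targets. Impact analysis then walks the graph downstream, and lineage walks it upstream.

```python
from collections import deque
from typing import Dict, List, Set

# Hypothetical lineage graph: each key feeds the items in its list.
LINEAGE: Dict[str, List[str]] = {
    "crm.customers": ["staging.customers"],
    "staging.customers": ["warehouse.dim_customer"],
    "warehouse.dim_customer": ["report.churn", "dashboard.sales"],
    "erp.orders": ["warehouse.fact_orders"],
    "warehouse.fact_orders": ["dashboard.sales"],
}


def downstream(graph: Dict[str, List[str]], start: str) -> Set[str]:
    """Impact analysis: everything reachable from `start` (breadth-first)."""
    seen: Set[str] = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for child in graph.get(node, []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


def upstream(graph: Dict[str, List[str]], target: str) -> Set[str]:
    """Lineage: everything that feeds `target`, found by reversing the edges."""
    reverse: Dict[str, List[str]] = {}
    for source, targets in graph.items():
        for t in targets:
            reverse.setdefault(t, []).append(source)
    return downstream(reverse, target)


impacted = downstream(LINEAGE, "erp.orders")
origins = upstream(LINEAGE, "report.churn")
```

Here `impacted` answers "what breaks if `erp.orders` changes?" and `origins` answers "where does `report.churn` come from?".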

Lean Knowledge

As data flows through Lean Orchestration, Lean Presentation and all other components in a modern data architecture, Lean Knowledge will have a ton of information at its disposal to build insights.

With graph technology at its core, Lean Knowledge will investigate known relations in your data, and will discover unknown relationships in your data and metadata. Metadata will play an important role in Lean Orchestration and Lean Presentation, but Lean Knowledge is where it all comes together. Not only will Lean Knowledge be able to apply graph and traditional algorithms (forecasting, clustering, recommendations, anomaly detection and so much more) to your data itself, it will also be able to track and analyze any possible use of your combined data and metadata.

With Lean Knowledge’s findings and results fed back into Lean Orchestration and Lean Presentation, we’ll close the loop and take your data processing, visualization and understanding to another level.

Business Model

The Lean With Data platform will be built on and as open source as much as possible. We intend to build all three components in the platform as separate Apache projects (with Apache Hop (Incubating) as the first one).

Because these are open source projects, building a large community will be key to developing, extending, testing and using the platform. Our definition of community is very inclusive: we invite anyone who is interested in the platform to join, whether they are a user, developer, partner or customer.

Building this ambitious Lean With Data platform will take time and will require resources. To fund the development of the platform, we already offer support, services and custom development on Lean Orchestration, and will continue to do so as Lean Presentation and Lean Knowledge start to take shape.

Above anything else, we want to stay true to our goals and build a platform that makes a difference. While Lean With Data needs to be a healthy business in order to achieve our long-term goals, we’re currently not taking any investment that forces us to think short term and prevents us from working on our long-term strategy.

In Closing

The Lean With Data platform will be an adventure that will take a while to build.

There will be challenges and roadblocks to overcome, but we do believe our platform design is solid and the opportunities are real. Let’s join forces in building a platform that matters!
