What Is Data Mesh: A Simple Guide to Implementing the Latest Industry
Jenna Bunnell
Published 03/22/2022
Running a business can sometimes seem overwhelming. You have overall responsibility for ensuring your organization runs efficiently and has suitable strategies in place in areas such as engagement. On top of managing your current processes and systems, you also have to keep one eye on current and emerging trends in your industry.
Trends come and go, especially in the fields of SaaS, PaaS, and IaaS. It can be difficult to make a choice in the early stages of a trend, especially if that trend requires significant investment. But you also want to identify the ‘good’ trends, the ones that will offer real and tangible benefits to your business.
One recent trend is data mesh. But what is data mesh? This guide looks at what data mesh is and how you can implement it within your existing structure.
What is data mesh?
Think about data mesh as being similar to microservices but on a data platform. The term was coined by Zhamak Dehghani in 2019 and represents a major paradigm shift in how you store analytical data. Previously, this data was stored in oversized data warehouses or lakes and could pose issues when it came to access and use.
With data mesh, that data is moved into a more distributed architecture, meaning you can access it in real time for analytical purposes whenever you need to. You now store and use data in more ways than you did 10 years ago, and the way data is stored needed to change drastically to reflect that evolution.
By using this method your data delivers better value and more in-depth insights, something that is crucial in today’s digital world. Meshing your data removes friction and lets you access it according to your operational needs.
Data mesh connects your data in a more peer-to-peer way and frees it up for various needs, including data and conversation analytics, ML (machine learning), and any data-intensive apps or systems you use. It moves away from the previously centralized data architecture, devolving access and storage in a more efficient way.
The data mesh paradigm
There are four main principles to the idea of data mesh:
1. Ownership being domain-oriented.
Data meshes shift data ownership to a more federated system in which domain owners are responsible for providing their data as products and for enabling communication between distributed data sets. Each domain gets the solutions it needs to process its data, but it is also responsible for overseeing the ingestion, aggregation, and cleaning of that data so it can be used by BI (business intelligence) apps.
2. Your data as a product.
While traditional models and architectures see data as a service, data meshing treats it as a product. You use that product in whatever ways you need, from making business and operational decisions to detecting and preventing fraud or creating personalized experiences.
3. Your data is available anywhere via a self-service infrastructure.
By shifting to a self-service infrastructure, organizations can avoid any complex technical issues and can instead focus on their own data use cases and needs. Data meshing extracts the data infrastructure capabilities and places them on a central platform that handles all storage, streaming infrastructure, and data pipeline engines. Each and every domain then has responsibility for using those centralized components to run their own customized ETL (extract, transform, and load) pipelines.
4. Governance in standardizing your data.
If you have many domains, you need to ensure that different ones can collaborate when needed. Data meshing achieves this by having a universal set of standards for the data. Data will be of value to more than one domain, so standardizing factors such as governance, formatting, metadata, and discoverability means that collaboration becomes easier. Added to this is the requirement that each domain must agree on quality measures and SLAs (service-level agreements) that they guarantee for consumers.
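To make the "data as a product" and governance principles more concrete, here is a minimal Python sketch of a data product descriptor and a governance check. The class name, the field names, and the list of required metadata keys are all hypothetical; a real organization would agree on its own standards across domains.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical governance rule: metadata keys every data product must declare.
REQUIRED_METADATA = {"description", "schema_version", "format"}


@dataclass
class DataProduct:
    """A domain-owned data product plus the promises it makes to consumers."""
    domain: str                                  # owning business domain
    name: str                                    # product name within the domain
    owner: str                                   # team accountable for the product
    metadata: dict = field(default_factory=dict)
    freshness_sla_hours: Optional[int] = None    # maximum data age the owner guarantees


def governance_violations(product: DataProduct) -> list:
    """Return human-readable problems; an empty list means the product complies."""
    problems = []
    missing = REQUIRED_METADATA - set(product.metadata)
    if missing:
        problems.append("missing metadata: " + ", ".join(sorted(missing)))
    if product.freshness_sla_hours is None:
        problems.append("no freshness SLA declared")
    return problems


if __name__ == "__main__":
    orders = DataProduct(
        domain="sales",
        name="orders",
        owner="sales-data-team",
        metadata={"description": "Confirmed customer orders", "format": "parquet"},
    )
    for problem in governance_violations(orders):
        print(problem)
```

A check like this could run whenever a domain publishes or updates a product, so the shared standards are enforced automatically rather than by a central review team.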
Are you ready to implement a data mesh?
You can probably now see the advantages that data meshing offers and that it is not a temporary fad. You may even be thinking it is something your organization would benefit from adopting, but how do you start? The first thing to note is that data mesh is still a relatively new concept, and you need to be sure your organization is ready to adopt it and adapt to it.
The second thing to note is that there is no single type of data mesh. In fact, there are many different types. Some of those are primarily centralized, while others are decentralized. Migrating to the latter type is not only difficult but can also be very time-consuming. Many of the challenges you face will be technical, but the main one is shifting your organizational mindset.
Many companies were built on a hierarchical, centralized model, and their organizational mindset reflects it. That mindset is a major barrier to moving away from those models, and it is why so few organizations have moved to decentralized microservices. So, how do you know your organization is ready for data mesh?
1. Structure
Your business’s structure needs to be centered around business domains or composed of teams that work cross-functionally. If teams such as product, engineering, and QA are all working in a vacuum, then you need to get them working together before implementing data mesh.
2. Focus
You need a strong focus on DevOps where all your team members are looking at building automated services, ideally with a GitOps angle.
3. Platform
You need to be utilizing a modern platform that enables high levels of productivity while hiding tech details from users. You also need a certain level of abstraction in order to have a self-service infrastructure. Using a service mesh is also useful to bring together the different networking aspects of your architecture.
4. Streaming
In order to move forward, you need a streaming platform such as Kafka. Without such a platform, you cannot migrate to microservices or unify your batch and streaming workloads. It also helps you close the gap between OLTP (online transaction processing) and OLAP (online analytical processing) workloads. You can use these platforms either to develop event-driven microservices or to move data.
5. Migration
Migrate to microservices. This is a major step and requires that you learn how DDD (domain-driven design) works. Once you understand DDD and know how to split your microservices around different business domains, doing the same with your data products becomes easier.
6. Big data
If you do not familiarize yourself with big data, how it works, and what its implications are, then you will find implementing data mesh difficult or unnecessary.
7. Metadata
Learn how to manage metadata and how to address, discover, and catalog your data.
How to build a data mesh
To comprehensively describe how to build and implement a data mesh would be both lengthy and complicated. However, there are three main phases to the process that you can consider.
1. Addressable
Make all your relevant data addressable so that it can be discovered easily. Use a REST-style approach and standardize all your bucket names. This helps you standardize all your data and make it easily consumable. Then look at adding SLAs to all endpoints, and monitor those endpoints to ensure data is always available.
Once that is done, you can then re-route any query engines and BI tools you use so that they can access and use these new data products. If you are using data warehouses or lakes, use the same approach for them and create standardized schemas and views. This phase will use a centralized approach and should be carried out by your data platform teams.
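As a rough illustration of what "addressable" can mean in practice, the sketch below builds REST-style paths and bucket names from a single naming convention. The path layout, the lowercase kebab-case rule, and the function names are assumptions made for this example, not a standard the article prescribes.

```python
import re

# Assumed convention: lowercase letters and digits, with words separated by hyphens.
_SEGMENT = re.compile(r"[a-z0-9]+(-[a-z0-9]+)*")


def _check(label: str, value: str) -> None:
    """Reject any name that does not follow the assumed convention."""
    if not _SEGMENT.fullmatch(value):
        raise ValueError(f"{label} {value!r} must be lowercase kebab-case")


def product_address(domain: str, product: str, version: int) -> str:
    """Build a REST-style path for a data product, e.g. /domains/sales/products/orders/v1."""
    _check("domain", domain)
    _check("product", product)
    if version < 1:
        raise ValueError("version must be 1 or greater")
    return f"/domains/{domain}/products/{product}/v{version}"


def bucket_name(org: str, domain: str, product: str) -> str:
    """Build a storage bucket name that follows the same convention."""
    for label, value in (("org", org), ("domain", domain), ("product", product)):
        _check(label, value)
    return f"{org}-{domain}-{product}"


if __name__ == "__main__":
    print(product_address("sales", "orders", 1))
    print(bucket_name("acme", "sales", "orders"))
```

Generating every address through one helper like this means a consumer who knows the domain and product name can predict where the data lives, which is the point of standardizing names in the first place.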
2. Catalogs
Improve your metadata and data catalogs. By doing this, you allow for greater discovery of your data so that anyone can find and use data products within your organization. You need a ‘place’ where users can look for, find, and then access the data they need. You also need to establish a way where both data owners and consumers can request and gain access to your data products without the involvement of a central team.
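The "place" where users look for data can be pictured as a searchable registry. The following is a deliberately small, in-memory Python sketch of that idea; the `CatalogEntry` and `DataCatalog` names and their fields are hypothetical, and a production catalog would normally be a dedicated tool rather than hand-written code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CatalogEntry:
    """What a consumer needs to know to find and request a data product."""
    address: str          # where the product can be reached
    owner: str            # domain team to contact for access
    description: str
    tags: tuple = ()


class DataCatalog:
    """In-memory registry that domain teams publish to and consumers search."""

    def __init__(self) -> None:
        self._entries = {}

    def register(self, entry: CatalogEntry) -> None:
        """Add a product; duplicate addresses are rejected to keep lookups unambiguous."""
        if entry.address in self._entries:
            raise ValueError(f"{entry.address} is already registered")
        self._entries[entry.address] = entry

    def search(self, term: str) -> list:
        """Return entries whose description contains the term or whose tags match it."""
        needle = term.lower()
        return [
            entry
            for entry in self._entries.values()
            if needle in entry.description.lower()
            or any(needle == tag.lower() for tag in entry.tags)
        ]


if __name__ == "__main__":
    catalog = DataCatalog()
    catalog.register(CatalogEntry("/domains/sales/products/orders/v1",
                                  "sales-data-team", "Confirmed customer orders", ("sales",)))
    for hit in catalog.search("orders"):
        print(hit.address, "owned by", hit.owner)
```

Because each entry names its owner, a consumer can request access from the owning domain directly, without routing the request through a central team.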
At this stage, you should also look at adding tests that measure the quality of the data, along with lineage tracking and monitoring. These tests should apply to both data in motion and data at rest. You may even want to consider where a CI (continuous integration) testing process fits in at this stage.
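A data quality test can be as simple as counting the rows that break a rule. The Python sketch below checks two common rules, required fields and unique keys, on a batch of records. The function name, the rules, and the sample records are illustrative assumptions; real checks depend on what each domain guarantees in its SLAs.

```python
def quality_report(rows: list, required_fields: list, key_field: str) -> dict:
    """Count rows with missing required values and rows that repeat a key."""
    missing = 0
    duplicates = 0
    seen_keys = set()
    for row in rows:
        # A required value counts as missing if it is absent, None, or an empty string.
        if any(row.get(name) in (None, "") for name in required_fields):
            missing += 1
        key = row.get(key_field)
        if key in seen_keys:
            duplicates += 1
        seen_keys.add(key)
    return {
        "rows": len(rows),
        "missing_required": missing,
        "duplicate_keys": duplicates,
    }


if __name__ == "__main__":
    sample = [
        {"id": 1, "amount": 10},
        {"id": 2, "amount": None},   # missing required value
        {"id": 2, "amount": 5},      # repeats the key of the previous row
    ]
    print(quality_report(sample, ["id", "amount"], "id"))
```

The same function can be applied to a batch read from storage or to a window of records taken from a stream, and a CI job could fail the build whenever either count is above zero.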
3. DDD
Implement DDD and break or move away from previous monolithic data structures. This is a major step toward establishing a decentralized architecture. One part of achieving this is to try and put ownership onto the domain team that is creating any data.
Each of your teams must own its particular data assets, ETL pipelines, quality testing, monitoring, and other related factors. It is also crucial to remember the need for federated governance in order to guarantee security, interoperability, and data standardization.
If all this is in place, it minimizes the need for changes, particularly major ones. You just need to ensure that these capabilities are available as services so that you can create your self-service platform. This is also the stage where you would introduce your DataOps practices and improve your self-service and observability capabilities.
The takeaway
Data mesh is no ordinary trend. It represents a major paradigm change in how you can organize, store, and access your data. However, the main point to note is that not all organizations need to make the change now, and not all have the ability to do so. Look at your existing systems and structure to see whether you are ready, as that will make the transition smoother.
About the Writer
Jenna Bunnell is the Senior Manager for Content Marketing at Dialpad, an AI-incorporated cloud-hosted unified communications system that provides valuable call details and free conference calling for business owners and sales representatives. She is driven and passionate about communicating a brand’s design sensibility and visualizing how content can be presented in creative and comprehensive ways. She has written for sites such as Cybint Solutions and Crocoblock.