Data architecture’s just like regular architecture. In both spheres, principles underlying good architecture should be observed. Sure, there’ll be certain designs that work well for a broad swathe of applications and other designs that are a little more niche, but no matter the exact nature of the structure, you can bet that if it’s a successful one, the architect bore in mind the essentials.
What is Data Architecture?
Data architecture can get complicated.
But there’s no need to tackle that complexity straightaway. Most approaches to architecture start with a foundation, and that’s what we’re about to lay down here.
Data architecture can be described as how an entity organizes its data.
There are three aspects to this:
- How is the data stored?
- How is the data processed?
- How is the data used?
We will see these questions crop up throughout data architecture, sometimes two or all three at once.
To take each in turn: storage covers factors such as accuracy, access, control, and scalability. This is the ‘data lake’ of raw data.
Processing covers security, data transmission to and from peripheral sources, and flexibility. The processed data forms the ‘data warehouse.’
Usage covers interfaces, data sharing, and application.
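To make the three aspects concrete, here is a deliberately tiny Python sketch under simplifying assumptions: a plain list stands in for the data lake, a dictionary of cleaned records stands in for the data warehouse, and a query function represents usage. The record fields (`customer_id`, `email`, `spend`) are hypothetical.

```python
from typing import Any

# Storage: the "data lake" keeps raw records exactly as they arrived.
data_lake: list[dict[str, Any]] = [
    {"customer_id": "1", "email": " ANA@EXAMPLE.COM ", "spend": "120.50"},
    {"customer_id": "2", "email": "bo@example.com", "spend": "80"},
    {"customer_id": "2", "email": "bo@example.com", "spend": "80"},  # duplicate
    {"customer_id": "3", "email": None, "spend": "not a number"},    # bad row
]

def process(raw: list[dict[str, Any]]) -> dict[int, dict[str, Any]]:
    """Processing: validate, normalize, and deduplicate into a 'warehouse'."""
    warehouse: dict[int, dict[str, Any]] = {}
    for row in raw:
        try:
            spend = float(row["spend"])
        except (TypeError, ValueError):
            continue  # skip rows that fail validation
        if not row.get("email"):
            continue  # skip rows missing a required field
        cid = int(row["customer_id"])
        # Keying by customer ID means repeated rows collapse into one record.
        warehouse[cid] = {"email": row["email"].strip().lower(), "spend": spend}
    return warehouse

def total_spend(warehouse: dict[int, dict[str, Any]]) -> float:
    """Usage: business users query the processed data, not the raw lake."""
    return sum(rec["spend"] for rec in warehouse.values())

print(total_spend(process(data_lake)))
```

The raw lake is left untouched, so it can be reprocessed later if the validation rules change.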
Some companies have very formal approaches to these three aspects of data architecture, some less so. But all companies should cover them in some manner. This way, they can ensure that data management is given the priority it deserves.
Such are the penalties for being careless with data (the global average cost of a data breach reached $4.24 million in 2021) that organizations owe it to themselves, their clients, and their contacts to treat data conscientiously. Data is precious, so businesses should regard it as highly as capital, if not more so.
It is to this necessary veneration of data that we’ll turn first.
1. Data Culture
With any paradigm shift, it’s no good just attending to one aspect of a company in isolation if you want major change. For instance, sexism in the workplace is being challenged (albeit slowly) but not with an exclusive concentration on recruitment or any other single area. To ensure the root and branch change required, it has been necessary to tackle the entire environment and psychology of the workplace. In other words, its culture.
It’s exactly the same with data. Data concerns must be prioritized, and that priority is imparted by getting everyone to adhere to the data creed. Data is no longer just the preserve of data scientists.
One of the biggest mistakes companies make is to recruit a team of data staff, give them a fancy office with all the latest gear, and then sit back, thinking the data job’s done. The trouble is that the data your new department looks after will be accessed by lots of others, both internal teams and people outside the company. If those others aren’t so mindful of data matters, you may have trouble.
These others might end up spreading data to people without the right to access it, undermining both data security and access governance. Almost as bad, they might fail to provide it to those who need it, and workflows may suffer.
All staff have a responsibility to ensure data reaches absolutely everyone who needs it, and absolutely nobody else. Your job is to inculcate this into them so that they begin to see data for the valuable commodity it is, and not just something that might or might not be up for grabs to who-knows-who.
The need to share leads us to our next principle.
2. Dish the Data
So, staff should supply data to one another where tasks require it. But it goes further than this. Attention should be given to making the data work for everyone in the same way. A very salient aspect of this is metrics. A particular metric should mean the same to the marketing team as it does to the sales team. There has to be a common vocabulary, with no obscure within-office dialects.
Let’s say two parts of the business are working with similar figures, but one works exclusively with monthly data, while the other works only with weekly data. If at all possible, an effort should be made to unify their data so that meaningful comparisons and relational appraisals can be made more easily and quickly.
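Weekly totals cannot be converted cleanly into monthly ones, because weeks straddle month boundaries. One way to unify the two views, sketched below under the assumption that daily figures are available, is to keep data at a common daily grain and derive both the monthly and the ISO-week totals from that single source. The sales figures are hypothetical.

```python
from collections import defaultdict
from datetime import date, timedelta

def daily_sales(start: date, days: int, amount: float) -> list[tuple[date, float]]:
    """Hypothetical daily sales: a constant amount per day, for illustration."""
    return [(start + timedelta(days=i), amount) for i in range(days)]

def by_month(rows: list[tuple[date, float]]) -> dict[tuple[int, int], float]:
    """Aggregate daily rows into (year, month) totals."""
    totals: dict[tuple[int, int], float] = defaultdict(float)
    for day, amount in rows:
        totals[(day.year, day.month)] += amount
    return dict(totals)

def by_iso_week(rows: list[tuple[date, float]]) -> dict[tuple[int, int], float]:
    """Aggregate the same daily rows into (ISO year, ISO week) totals."""
    totals: dict[tuple[int, int], float] = defaultdict(float)
    for day, amount in rows:
        iso_year, iso_week, _ = day.isocalendar()
        totals[(iso_year, iso_week)] += amount
    return dict(totals)

rows = daily_sales(date(2021, 1, 1), 59, 10.0)  # January and February 2021
print(by_month(rows))
print(by_iso_week(rows))
```

Because both views come from the same rows, their grand totals always reconcile, which is exactly what lets the two teams compare figures with confidence.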
The more cross-office consensus on what specific data represents and where it directs the organization, the more your business will benefit from joined-up thinking from joined-up departments.
Your excellent data professionals might need a bit of encouragement when it comes to sharing in the first place. Data staff often think of themselves as guardians when they should really think of themselves as facilitators. Part of this facilitation boils down to cutting the jargon: the aim is to get everyone speaking a shared language.
One final point: make sure that your company’s data is organized in such a way that its accessibility is safeguarded. For example, try to make it resilient to power outages so that uptime is maximized and customers can keep using your services.
3. Avoid Vendor Lock-In
Vendor lock-in happens when you acquire a piece of technology and end up stuck with it because it isn’t easy to swap out of your architecture. For example, a company choosing among hosted PBX providers should look for an easy exit route as much as an enticing entrance. Otherwise, its communications could end up being run by a service that proves unsuitable as the future unfolds.
So, any technology procurement needs to be conducted with an eye on the future. Consider not just what the technology can contribute while it’s part of your business, but also how easily it could be removed and replaced.
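In code, one common way to keep an exit route open is to have application code depend on a small interface you own rather than on a vendor’s SDK directly. The sketch below uses Python’s `typing.Protocol`; `BlobStore`, `InMemoryStore`, `LocalDiskStore`, and `archive_report` are hypothetical names. A real vendor adapter would wrap that vendor’s SDK behind the same two methods.

```python
import tempfile
from pathlib import Path
from typing import Protocol

class BlobStore(Protocol):
    """Our own minimal storage interface; application code depends only on this."""
    def put(self, key: str, data: bytes) -> None: ...
    def get(self, key: str) -> bytes: ...

class InMemoryStore:
    """One backend: keeps blobs in a dictionary."""
    def __init__(self) -> None:
        self._items: dict[str, bytes] = {}
    def put(self, key: str, data: bytes) -> None:
        self._items[key] = data
    def get(self, key: str) -> bytes:
        return self._items[key]

class LocalDiskStore:
    """Another backend: keeps blobs as files under a root directory."""
    def __init__(self, root: Path) -> None:
        self._root = root
    def put(self, key: str, data: bytes) -> None:
        (self._root / key).write_bytes(data)
    def get(self, key: str) -> bytes:
        return (self._root / key).read_bytes()

def archive_report(store: BlobStore, name: str, body: str) -> str:
    """Business logic never names a vendor, so backends can be swapped freely."""
    store.put(name, body.encode("utf-8"))
    return store.get(name).decode("utf-8")

with tempfile.TemporaryDirectory() as tmp:
    for store in (InMemoryStore(), LocalDiskStore(Path(tmp))):
        print(type(store).__name__, archive_report(store, "q1.txt", "revenue up"))
```

Replacing a provider then means writing one new adapter class, not rewriting every caller.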
4. Be Secure
How can you marry the need to ensure legitimate access with the requirement to stop unauthorized access? Data architecture does this by classifying items of data according to their sensitivity and who can access them. To take hosted contact center software as an example, there will be provisions in place to ensure that client details are only ever accessible to those with an express and permitted purpose for that information.
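A minimal sketch of that idea, assuming a simple four-level sensitivity scale: each data item carries a classification and a set of permitted purposes, and access requires both sufficient clearance and a permitted purpose. The level names, the `client_phone` item, and the `support_call` purpose are all hypothetical.

```python
from dataclasses import dataclass
from enum import IntEnum

class Sensitivity(IntEnum):
    PUBLIC = 0
    INTERNAL = 1
    CONFIDENTIAL = 2
    RESTRICTED = 3

@dataclass(frozen=True)
class DataItem:
    name: str
    sensitivity: Sensitivity
    allowed_purposes: frozenset[str]

@dataclass(frozen=True)
class User:
    name: str
    clearance: Sensitivity

def can_access(user: User, item: DataItem, purpose: str) -> bool:
    """Grant access only with sufficient clearance AND a permitted purpose."""
    if item.sensitivity == Sensitivity.PUBLIC:
        return True
    return user.clearance >= item.sensitivity and purpose in item.allowed_purposes

# Hypothetical contact center example.
client_phone = DataItem("client_phone", Sensitivity.CONFIDENTIAL, frozenset({"support_call"}))
agent = User("agent", Sensitivity.CONFIDENTIAL)
intern = User("intern", Sensitivity.INTERNAL)

print(can_access(agent, client_phone, "support_call"))  # permitted purpose
print(can_access(agent, client_phone, "marketing"))     # purpose not permitted
```

Checking purpose as well as clearance is what stops a cleared employee from using client details for something they were never collected for.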
Similarly, a healthcare data architecture will make sure that any data accessed only for aggregate (macro) analytics is anonymized.
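Here is a simplified sketch of preparing patient records for aggregate analytics, under assumptions worth stating loudly: the field names are hypothetical, and the steps shown (dropping direct identifiers, coarsening age and postcode, suppressing groups smaller than `k`) illustrate the idea rather than guarantee anonymity. Real healthcare anonymization needs a formal re-identification risk assessment.

```python
from collections import Counter

# Hypothetical fields that identify a patient directly.
DIRECT_IDENTIFIERS = {"name", "patient_id", "phone"}

def deidentify(record: dict) -> dict:
    """Drop direct identifiers and coarsen quasi-identifiers (age, postcode)."""
    out = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    decade = (out.pop("age") // 10) * 10
    out["age_band"] = f"{decade}-{decade + 9}"
    out["region"] = out.pop("postcode")[:2]  # keep only a coarse area prefix
    return out

def aggregate_counts(records: list[dict], k: int = 2) -> dict[tuple, int]:
    """Count diagnoses per (age_band, region), suppressing groups smaller than k."""
    counts = Counter(
        (r["age_band"], r["region"], r["diagnosis"]) for r in map(deidentify, records)
    )
    return {key: n for key, n in counts.items() if n >= k}

patients = [
    {"patient_id": 1, "name": "A", "phone": "555", "age": 34, "postcode": "AB12", "diagnosis": "asthma"},
    {"patient_id": 2, "name": "B", "phone": "556", "age": 37, "postcode": "AB99", "diagnosis": "asthma"},
    {"patient_id": 3, "name": "C", "phone": "557", "age": 62, "postcode": "ZZ01", "diagnosis": "diabetes"},
]
print(aggregate_counts(patients))
```

The third patient’s group has only one member, so it is suppressed rather than released.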
Data architecture sets out the privacy controls that protect confidentiality. Multiple layers of security can be built into the data architecture so that the data stays protected at every stage, whether in storage, processing, or use.
Here’s an interesting stat from the world of conventional architecture: in one survey, half of the architects questioned said that concerns over data security discouraged them from using their BIM (Building Information Modeling) team software.
So, a lot of valuable collaboration wasn’t happening because the people involved didn’t feel their environment was secure enough. You need to provide that security.
5. Be a Great Data Curator
There’s more data around nowadays than ever. At times we’re almost submerged in it. In its raw or disorganized state, data’s usefulness can be limited. It takes a certain amount of sorting out before it reaches its full potential.
For instance, we have more TV than we know what to do with. At times, just deciding what to watch that night can be baffling. This is why TV services often have a curator mode, highlighting movies or series that are likely to interest the viewer based on their viewing history and other data.
The viewer may or may not take the service up on its suggestions. If they don’t, they will almost certainly look for other material by browsing groupings of programs such as drama, thriller, and sci-fi. This is another layer of curation, known as taxonomy.
When it comes to data in the workplace, the same principles apply. To ensure that your staff are given the material most appropriate to their tasks, the data architecture has to present information in an easily understood and readily accessible way.
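The same two layers, suggestion and taxonomy, can be sketched for a workplace data catalog. In this hypothetical example, each dataset is filed under a hierarchical taxonomy path for browsing and tagged with the roles it is most relevant to for suggestions. The dataset names, paths, and roles are invented for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Dataset:
    name: str
    taxonomy: tuple[str, ...]  # category path, e.g. ("Sales", "Pipeline")
    roles: frozenset[str]      # roles this dataset is most relevant to

CATALOG = [
    Dataset("weekly_pipeline", ("Sales", "Pipeline"), frozenset({"sales"})),
    Dataset("closed_deals", ("Sales", "Revenue"), frozenset({"sales", "finance"})),
    Dataset("campaign_clicks", ("Marketing", "Campaigns"), frozenset({"marketing"})),
]

def browse(prefix: tuple[str, ...]) -> list[str]:
    """Taxonomy layer: list datasets filed under a category path."""
    return sorted(d.name for d in CATALOG if d.taxonomy[: len(prefix)] == prefix)

def suggest(role: str) -> list[str]:
    """Curation layer: surface datasets tagged as relevant to the user's role."""
    return sorted(d.name for d in CATALOG if role in d.roles)

print(suggest("finance"))
print(browse(("Sales",)))
```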
The curated data must be helpful to the business user, so it should be subject to regular quality checks. For this reason, data architecture should incorporate best practices in test automation.
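As a minimal sketch of what such automation might look like, the function below applies a few hypothetical quality rules (a required, unique `order_id` and a non-negative numeric `amount`) and returns a list of problems. A scheduled job or test suite could then fail whenever that list is non-empty.

```python
def check_quality(rows: list[dict]) -> list[str]:
    """Return human-readable problems; an empty list means the batch passed."""
    problems: list[str] = []
    seen_ids: set = set()
    for i, row in enumerate(rows):
        # Rule 1: every row needs a unique order_id.
        if row.get("order_id") is None:
            problems.append(f"row {i}: missing order_id")
        elif row["order_id"] in seen_ids:
            problems.append(f"row {i}: duplicate order_id {row['order_id']}")
        else:
            seen_ids.add(row["order_id"])
        # Rule 2: amount must be a non-negative number.
        amount = row.get("amount")
        if not isinstance(amount, (int, float)) or amount < 0:
            problems.append(f"row {i}: invalid amount {amount!r}")
    return problems

batch = [{"order_id": 1, "amount": -5}, {"order_id": 1, "amount": 3}, {"amount": "x"}]
for problem in check_quality(batch):
    print(problem)
```

In a pytest suite, for example, a test could simply `assert check_quality(todays_batch) == []`, where `todays_batch` is whatever curated data is about to be published.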
6. Be Flexible
There’s one constant in business: change. The more you expect it and even embrace it, the better your business will perform. With that in mind, any data architecture you implement should have within it the potential to easily evolve. Modularity, for example, is to be highly prized, giving an organization the chance to update a system without having to replace it wholesale.
Another area of flexibility lies in the means by which staff can access data. It makes sense to design your data architecture to accept access requests in multiple formats. This way, your system will be able to cope with, for instance, unstructured emails every bit as well as structured CSV files. This ability to handle input from non-technical staff can reduce the need for time-consuming and expensive training.
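Here is a sketch of normalizing access requests that arrive either as a structured CSV or as a free-text email into one internal format. The CSV columns and the convention that an email mentions “access to <dataset_name>” are hypothetical; real free-text handling would need more robust parsing or a human review step.

```python
import csv
import io
import re
from dataclasses import dataclass

@dataclass(frozen=True)
class AccessRequest:
    requester: str
    dataset: str

def from_csv(text: str) -> list[AccessRequest]:
    """Structured input: a CSV with 'requester' and 'dataset' columns."""
    reader = csv.DictReader(io.StringIO(text))
    return [AccessRequest(row["requester"].strip(), row["dataset"].strip()) for row in reader]

# Hypothetical convention: the email says "access to (the) <dataset_name>".
DATASET_PATTERN = re.compile(r"access to (?:the )?([A-Za-z0-9_]+)", re.IGNORECASE)

def from_email(sender: str, body: str) -> list[AccessRequest]:
    """Unstructured input: pull dataset names out of plain-language text."""
    return [AccessRequest(sender, name) for name in DATASET_PATTERN.findall(body)]

print(from_csv("requester,dataset\nana,closed_deals\n"))
print(from_email("cy", "Hi, could I get access to the closed_deals table? Thanks"))
```

Both paths produce the same `AccessRequest` objects, so everything downstream (approval, auditing) only has to deal with one format.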
7. Reduce Data Copies
Your data architecture should be arranged so as to reduce the need to copy data constantly. Producing endless copies of data wastes storage and processing capacity and, eventually, money. It’s also a security risk in itself.
Data virtualization can remove the need to transfer and copy data. It’s possible to run queries across all of your data without moving it, using tools such as Azure Synapse Analytics.
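The underlying idea can be illustrated on a small scale with SQLite, which is not how Azure Synapse works internally but shows the same principle: two separate databases stand in for two source systems, and a view queries both in place, so no rows are copied into a new store. The `crm` and `billing` schemas and their contents are hypothetical.

```python
import sqlite3

def build_virtual_view() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    # Two separate "source systems", each its own in-memory database.
    conn.execute("ATTACH DATABASE ':memory:' AS crm")
    conn.execute("ATTACH DATABASE ':memory:' AS billing")
    conn.execute("CREATE TABLE crm.customers (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute("CREATE TABLE billing.invoices (customer_id INTEGER, amount REAL)")
    conn.executemany("INSERT INTO crm.customers VALUES (?, ?)", [(1, "Acme"), (2, "Globex")])
    conn.executemany(
        "INSERT INTO billing.invoices VALUES (?, ?)", [(1, 100.0), (1, 50.0), (2, 75.0)]
    )
    # A TEMP view may reference attached databases; it stores a query, not data.
    conn.execute(
        """
        CREATE TEMP VIEW customer_revenue AS
        SELECT c.name, SUM(i.amount) AS revenue
        FROM crm.customers AS c
        JOIN billing.invoices AS i ON i.customer_id = c.id
        GROUP BY c.name
        """
    )
    return conn

conn = build_virtual_view()
print(conn.execute("SELECT name, revenue FROM customer_revenue ORDER BY name").fetchall())
```

Each query against the view reads the current source tables, so there is no second copy to keep in sync or to secure.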
8. Reverse ETL
You probably already know what ETL is. Just in case you don’t, ETL (Extract, Transform, Load) is the process by which enterprise data warehouses are often built. It is a way of combining data from multiple sources into a coherent whole.
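A minimal ETL sketch, assuming two hypothetical sources with different field names (a CSV export and a list of records as an API might return them): the extract step reads both, the transform step maps them onto one schema and normalizes country codes, and the load step writes the result into a SQLite table standing in for the warehouse.

```python
import csv
import io
import sqlite3

# Hypothetical source data with inconsistent field names and casing.
CSV_EXPORT = "id,full_name,country\n1,Ana Silva,pt\n2,Bo Chen,CN\n"
API_RECORDS = [{"customerId": 3, "name": "Cy Dube", "countryCode": "za"}]

def extract() -> list[dict]:
    """Extract: pull raw records from every source."""
    return list(csv.DictReader(io.StringIO(CSV_EXPORT))) + API_RECORDS

def transform(raw: list[dict]) -> list[tuple[int, str, str]]:
    """Transform: map each source's fields onto one schema and normalize values."""
    out = []
    for r in raw:
        if "customerId" in r:  # API shape
            out.append((int(r["customerId"]), r["name"], r["countryCode"].upper()))
        else:  # CSV shape
            out.append((int(r["id"]), r["full_name"], r["country"].upper()))
    return out

def load(rows: list[tuple[int, str, str]], conn: sqlite3.Connection) -> None:
    """Load: write the unified rows into the warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS customers (id INTEGER PRIMARY KEY, name TEXT, country TEXT)"
    )
    conn.executemany("INSERT INTO customers VALUES (?, ?, ?)", rows)
    conn.commit()

warehouse = sqlite3.connect(":memory:")
load(transform(extract()), warehouse)
print(warehouse.execute("SELECT * FROM customers ORDER BY id").fetchall())
```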
Reverse ETL goes the other way: it takes data out of the data warehouse and changes its format. For the data to be compatible with third-party apps like Salesforce, HubSpot, or Marketo, it needs to be brought out from where it’s stored and transformed into a more suitable shape.
So, your data architecture has to allow for this. There are reverse ETL tools with prebuilt API integrations, which simplify usage and maintenance. But even if you don’t use a dedicated reverse ETL tool, you need some process by which data can be accessed for use with a variety of apps.
Standardized interfaces such as SQL, RESTful APIs, or OLAP should be implemented, depending on the nature of the business and the data being stored.
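A reverse ETL step might look roughly like the sketch below: modeled data is read back out of the warehouse (SQLite as a stand-in) and reshaped into one JSON payload per contact. The payload fields (`email`, `properties`, `lifetime_value`, `churn_risk`) are hypothetical and do not reflect the actual Salesforce, HubSpot, or Marketo schemas; a real sync would send the payloads through the vendor’s API or a reverse ETL tool.

```python
import json
import sqlite3

def build_warehouse() -> sqlite3.Connection:
    """A stand-in warehouse holding hypothetical, already-modeled metrics."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customer_metrics (email TEXT, lifetime_value REAL, churn_risk TEXT)")
    conn.executemany(
        "INSERT INTO customer_metrics VALUES (?, ?, ?)",
        [("ana@example.com", 1200.0, "low"), ("bo@example.com", 90.5, "high")],
    )
    return conn

def to_crm_payloads(conn: sqlite3.Connection) -> list[str]:
    """Pull rows out of the warehouse and reshape each into a JSON payload."""
    rows = conn.execute(
        "SELECT email, lifetime_value, churn_risk FROM customer_metrics ORDER BY email"
    ).fetchall()
    return [
        json.dumps({"email": email, "properties": {"lifetime_value": ltv, "churn_risk": risk}})
        for email, ltv, risk in rows
    ]

for payload in to_crm_payloads(build_warehouse()):
    print(payload)
```

Emitting every record in the same fixed shape is what gives downstream apps the predictable format they need.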
This standardization will ensure that retrieved data arrives in a predictable and therefore immediately usable format.
9. The Ingestion Question
Your ingestion tools are the means by which data is loaded from its sources into the data warehouse. This data will come in a host of forms from a wealth of sources, so your data architecture needs ingestion tooling that can deal with as many of them as possible.
Better to have a few versatile ingestion tools than many single-source ones. Having to swap between tools is a drain on time and can hurt your data performance.
So, pinpoint which ingestion forms you will need to support (for instance, FTP, batch, CDC, or API) and make sure your data architecture is built around an ingestion tool that can cope with them.
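One way to get a single versatile ingestion entry point is a handler registry: each supported form gets one handler, registered in one place, so supporting a new form means adding a handler rather than another tool. The sketch below shows only two forms, batch CSV and CDC (change data capture); the form names and the CDC event shape (`op`, `key`, `row`) are hypothetical.

```python
import csv
import io
from typing import Any, Callable

Table = dict[str, dict]  # primary key -> row
HANDLERS: dict[str, Callable[[Table, Any], None]] = {}

def handler(form: str):
    """Decorator that registers an ingestion handler for one source form."""
    def register(fn: Callable[[Table, Any], None]) -> Callable[[Table, Any], None]:
        HANDLERS[form] = fn
        return fn
    return register

@handler("batch_csv")
def ingest_batch(table: Table, payload: Any) -> None:
    """Batch: load every row of a CSV file, keyed by its 'id' column."""
    for row in csv.DictReader(io.StringIO(payload)):
        table[row["id"]] = row

@handler("cdc")
def ingest_cdc(table: Table, payload: Any) -> None:
    """CDC: apply a stream of change events to the table in order."""
    for event in payload:
        if event["op"] == "delete":
            table.pop(event["key"], None)
        else:  # treat anything else as an upsert
            table[event["key"]] = event["row"]

def ingest(table: Table, form: str, payload: Any) -> None:
    """Single entry point: dispatch to the handler for this form."""
    if form not in HANDLERS:
        raise ValueError(f"unsupported ingestion form: {form}")
    HANDLERS[form](table, payload)

customers: Table = {}
ingest(customers, "batch_csv", "id,name\n1,Ana\n2,Bo\n")
ingest(customers, "cdc", [{"op": "upsert", "key": "2", "row": {"id": "2", "name": "Bob"}},
                          {"op": "delete", "key": "1"}])
print(customers)
```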
10. Data Discovery
Your data architecture should include a provision for automated data discovery. Discovery can reveal interesting and valuable data patterns, as well as highlight where applications could do with being updated.
A cloud telephone system, for instance, should perform regular data discovery sweeps to check for obsolete or conflicting personal information.
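A discovery sweep of that kind might be sketched as follows, assuming hypothetical contact records gathered from several systems. The sweep flags contacts that have not been updated within a cutoff (365 days here, an arbitrary choice) as possibly obsolete, and contacts whose phone number differs across systems as conflicting.

```python
from collections import defaultdict
from datetime import date, timedelta

def discovery_sweep(records: list[dict], today: date, max_age_days: int = 365) -> dict[str, list[str]]:
    """Flag contacts whose data looks obsolete or conflicts across source systems."""
    cutoff = today - timedelta(days=max_age_days)
    phones: dict[str, set[str]] = defaultdict(set)
    latest: dict[str, date] = {}
    for r in records:
        cid = r["contact_id"]
        phones[cid].add(r["phone"])
        # Track the most recent update seen for this contact in any system.
        latest[cid] = max(latest.get(cid, r["updated"]), r["updated"])
    return {
        "obsolete": sorted(cid for cid, seen in latest.items() if seen < cutoff),
        "conflicting": sorted(cid for cid, nums in phones.items() if len(nums) > 1),
    }

records = [
    {"contact_id": "c1", "phone": "555-0100", "updated": date(2020, 1, 1), "source": "crm"},
    {"contact_id": "c2", "phone": "555-0101", "updated": date(2023, 5, 1), "source": "crm"},
    {"contact_id": "c2", "phone": "555-0199", "updated": date(2023, 6, 1), "source": "billing"},
]
print(discovery_sweep(records, today=date(2023, 7, 1)))
```

Running a sweep like this on a schedule turns discovery into a routine check rather than a one-off audit.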
So, data architecture is primarily about making sure you have thought through the structure of your information holdings. Are its input means up to scratch? Do the output formats chime with what your business needs? Any system planning approach has to include answers to these questions.
To return to our original schema of storing, processing, and using, it’s clear that most parts of your data architecture impact more than one of these areas. In this regard, good data architecture has a lot in common with good operational systems design in general.
Although it’s often good to analyze by breaking things down, sometimes one has to have a holistic view to see how a structure works. Such a view will pay dividends with data architecture.
About the Writer
Tanhaz Kamaly is a Partnership Executive at Dialpad, a modern cloud-hosted business communications platform that turns conversations into the best opportunities, both for businesses and clients. He is well-versed in, and passionate about, helping companies work in constantly evolving contexts, anywhere, anytime.