What is data architecture?
Data architecture defines how data flows through an organization. It consists of the rules, standards, policies and models that govern how data is acquired, processed, stored and managed.
Oh dear, sounds like a lot 🤯! Think of it this way. Data architecture is like a blueprint for your organization’s data that answers three questions:
- What data should you store? 💾
- Where and how should you store it? 🛢
- Who can access your data? How? 🔐
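To make the three questions concrete, here is a minimal Python sketch of a single blueprint entry for one dataset. Every name in it (`DatasetBlueprint`, `can_access`, the `orders` dataset, the store and role names) is hypothetical and chosen only for illustration; it is not taken from any standard or tool.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class DatasetBlueprint:
    """One hypothetical blueprint entry; the fields map to the three questions."""

    name: str          # What data should you store?
    store: str         # Where should you store it?
    file_format: str   # How should you store it?
    readers: set[str] = field(default_factory=set)  # Who can access it?

    def can_access(self, role: str) -> bool:
        """Return True if the given role is allowed to read this dataset."""
        return role in self.readers


# Illustrative values only (assumptions, not recommendations).
orders = DatasetBlueprint(
    name="orders",
    store="data_lake",
    file_format="parquet",
    readers={"data_scientist", "business_analyst"},
)

print(orders.can_access("data_scientist"))  # True
print(orders.can_access("intern"))          # False
```

A real blueprint would cover many datasets and far richer access rules, but the shape is the same: each entry records what is stored, where and how, and who may read it.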
Who is a data architect?
Data architects create the blueprints for data management systems. They visualize, design and prepare data in a framework that can be used by members of a data team such as data scientists, engineers and business analysts (aka the humans of data).
They define data flows, create inventories of data, set data management standards and map the systems and interfaces required to manage data.
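As a rough illustration of what a data inventory and a set of data flows might look like once written down, the sketch below uses Python's standard `graphlib` module (Python 3.9+). The dataset and system names are made up for the example, and `build_order` is a hypothetical helper, not part of any data management tool.

```python
from graphlib import TopologicalSorter

# Hypothetical inventory: dataset name -> system that holds it.
inventory = {
    "web_traffic": "event_stream",
    "customer_history": "crm_database",
    "customer_360": "data_warehouse",
    "churn_report": "bi_dashboard",
}

# Hypothetical data flows: each dataset maps to the datasets it is built from.
flows = {
    "customer_360": {"web_traffic", "customer_history"},
    "churn_report": {"customer_360"},
}


def build_order(flows):
    """Return datasets ordered so that every source comes before what it feeds."""
    return list(TopologicalSorter(flows).static_order())


order = build_order(flows)
for dataset in order:
    print(f"{dataset} lives in {inventory[dataset]}")
```

Writing flows down as a dependency graph like this makes it easy to see which systems must be in place before a downstream dataset can be produced.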
According to the Data Management Body of Knowledge (DAMA-DMBOK), the data architect:
- Provides a standard common business vocabulary
- Expresses strategic data requirements
- Outlines high-level integrated designs to meet these requirements
- Aligns with enterprise strategy and related business architecture
As big data becomes more common, data architects have to build systems capable of processing, storing and handling massive, complex data such as web traffic, financial data and customer history.
As a result, data architects must be well-versed in big data tools and technologies such as Hadoop, Spark, Hive and MapReduce; repositories such as data lakes; and NoSQL database management systems such as MongoDB, Cassandra and HBase.
What does a data architect do?
Before creating an organization's data architecture, data architects first work out how people use data to make decisions. They then work with solution architects and engineers to develop a data architecture that defines data standards, principles, systems and frameworks.
They start from the organization's business goals and existing data infrastructure, and use that information to define the technical requirements. To do this, they also need to understand which data is valuable to the organization and design an architecture that can handle data from several sources, in a wide variety of formats.
In short, they create a blueprint for what your data’s home will look like! 🏡