What is a data dictionary?

A modern data dictionary is the go-to tool for the humans of data (i.e. you) to understand everything about their data sets and verify data credibility at a glance.

A good example of a data dictionary would be Atlan’s auto-generated data dictionary, which provides you with information such as variable name, description, type and frequency, among others.
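To make those fields concrete, here is a minimal sketch of how such a summary could be generated from a small in-memory data set. It is not Atlan's implementation: the function name `build_data_dictionary`, the sample rows and the choice of "top three most frequent values" are assumptions made purely for illustration.

```python
from collections import Counter

def build_data_dictionary(rows, descriptions=None):
    """Summarize each variable in `rows` (a list of dicts with hashable values)."""
    descriptions = descriptions or {}

    # Collect variable names in first-seen order.
    names = []
    for row in rows:
        for name in row:
            if name not in names:
                names.append(name)

    dictionary = []
    for name in names:
        values = [row.get(name) for row in rows]
        present = [v for v in values if v is not None]
        type_counts = Counter(type(v).__name__ for v in present)
        dictionary.append({
            "variable": name,
            "description": descriptions.get(name, ""),
            # Most common Python type among the non-missing values.
            "type": type_counts.most_common(1)[0][0] if type_counts else "unknown",
            "missing": len(values) - len(present),
            # Up to three most frequent values with their counts.
            "frequency": Counter(present).most_common(3),
        })
    return dictionary

# Hypothetical sample data.
rows = [
    {"customer_id": 1, "country": "IN", "signup_date": "2021-01-04"},
    {"customer_id": 2, "country": "US", "signup_date": None},
    {"customer_id": 3, "country": "IN", "signup_date": "2021-02-11"},
]
data_dictionary = build_data_dictionary(
    rows, {"country": "Two-letter country code"}
)
for entry in data_dictionary:
    print(entry)
```

Each printed entry holds one variable's name, description, type, missing count and most frequent values, which is roughly the information a reader would scan to judge a data set at a glance.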

How is it different from a business glossary or a data glossary?

Traditionally, a data dictionary was referred to as a database dictionary. The database dictionary covered variable names, types, descriptions, frequencies and other such information on data sets. However, it mainly made sense to engineering, operations and IT teams, not to business users.

Enter the business glossary (or enterprise business glossary), which defines the business terms used within an organization.

For example, a variable such as date, together with its specifications, is an entry in the database dictionary.

By contrast, a term such as Customer (how the organization defines a customer and how the term relates to other terms) is an entry in the business glossary or data glossary (or enterprise data glossary).
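The difference between the two kinds of entry can be sketched as two small record types. The class names, fields and sample definitions below are hypothetical and only illustrate the distinction; real tools model these entries in their own ways.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class DictionaryEntry:
    """A technical entry: describes one variable in a data set."""
    variable: str
    data_type: str
    description: str
    glossary_term: Optional[str] = None  # optional link to a business term

@dataclass
class GlossaryTerm:
    """A business entry: describes what a term means to the organization."""
    name: str
    definition: str
    related_terms: List[str] = field(default_factory=list)

# Illustrative values only; every organization writes its own definitions.
customer = GlossaryTerm(
    name="Customer",
    definition="A person or company that has placed at least one order.",
    related_terms=["Order"],
)
entries = [
    DictionaryEntry("date", "date", "Calendar date the order was placed", "Order"),
    DictionaryEntry("cust_id", "integer", "Internal customer identifier", "Customer"),
]

def entries_for_term(entries, term_name):
    """Return the variables that have been linked to a given business term."""
    return [e.variable for e in entries if e.glossary_term == term_name]

print(entries_for_term(entries, customer.name))
```

Linking a variable to a glossary term, as the optional `glossary_term` field does here, is one way to let business users navigate from a term they know to the columns that implement it.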

What is the importance of a data dictionary?

Collecting data isn't enough if you cannot understand or analyze it. When you deal with terabytes of data, it's easy to drown in a sea of misunderstood variable names. It's also not uncommon for large enterprises to place their trust in inaccurate or bad data and realize only months later that something is wrong.

A data dictionary can help in all of these situations, and they aren't the only reasons to use one.

Three reasons why you need a data dictionary

1. Detect anomalies and outliers in data

A data dictionary helps you spot missing data, outliers, duplicates and errors in data with simple data quality checks at a glance.
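As a rough illustration of what such checks might compute behind the scenes, the sketch below counts missing values and duplicate rows and flags outliers with the common 1.5 × IQR rule. The function name `quality_report`, the sample orders and the choice of the IQR rule are assumptions for this example, not a description of any particular product.

```python
import statistics
from collections import Counter

def quality_report(rows, numeric_column):
    """Count missing values and duplicate rows, and flag outliers in one column."""
    # Missing values per variable.
    missing = Counter()
    for row in rows:
        for name, value in row.items():
            if value is None:
                missing[name] += 1

    # A row is a duplicate if an identical row appeared earlier.
    row_counts = Counter(tuple(sorted(row.items())) for row in rows)
    duplicates = sum(count - 1 for count in row_counts.values())

    # Outliers: values outside 1.5 * IQR from the first and third quartiles.
    values = [row[numeric_column] for row in rows
              if row.get(numeric_column) is not None]
    q1, _, q3 = statistics.quantiles(values, n=4)
    spread = q3 - q1
    low, high = q1 - 1.5 * spread, q3 + 1.5 * spread
    outliers = [v for v in values if v < low or v > high]

    return {"missing": dict(missing), "duplicates": duplicates, "outliers": outliers}

# Hypothetical sample data.
orders = [
    {"order_id": 1, "amount": 12},
    {"order_id": 2, "amount": 15},
    {"order_id": 3, "amount": 14},
    {"order_id": 3, "amount": 14},    # exact duplicate of the previous row
    {"order_id": 4, "amount": None},  # missing amount
    {"order_id": 5, "amount": 13},
    {"order_id": 6, "amount": 15},
    {"order_id": 7, "amount": 400},   # suspiciously large
]
report = quality_report(orders, "amount")
print(report)
```

The IQR rule is only one of several reasonable outlier tests; a fixed valid range or a z-score threshold would fit the same structure.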

2. Evaluate data quality

A data dictionary gives every variable in a data set a standard name and description, making the data easier to understand, interpret and use for decision-making.

3. Work with more trustworthy data

A data dictionary acts as a single source of truth for all data, including details on data set source, purpose, description, owners and more, making your data more trustworthy.
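One simple way to picture these data-set-level details is as a record that can be checked for completeness. The field list and the helper `missing_metadata` below are hypothetical and follow only the details named above (source, purpose, description, owners).

```python
# Details every data set record is expected to carry (an assumption for this sketch).
REQUIRED_FIELDS = ("source", "purpose", "description", "owners")

def missing_metadata(record):
    """Return the required details that are absent or empty in a data set record."""
    return [name for name in REQUIRED_FIELDS if not record.get(name)]

# Hypothetical records.
complete_record = {
    "name": "orders",
    "source": "Billing system export",
    "purpose": "Monthly revenue reporting",
    "description": "One row per customer order",
    "owners": ["finance-data-team"],
}
incomplete_record = {
    "name": "web_events",
    "source": "Website tracking",
    "description": "",
    "owners": [],
}

print(missing_metadata(complete_record))
print(missing_metadata(incomplete_record))
```

A record with nothing missing is easier to trust, because anyone can see where the data came from, what it is for and whom to ask about it.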
