What is data engineering?

Data engineering involves cleaning and transforming raw data into a consistent format and storing it in data repositories for further analysis.

It draws on both data science and computer engineering, and is considered a discipline adjacent to data science.

If you're wondering how data engineering differs from data science, here's a quote that sums it up:

A scientist can discover a new star, but he cannot make one. He would have to ask an engineer to do it for him.

Gordon Lindsay Glegg

Who is a data engineer?

Data engineers set up, manage, optimize and maintain the required data infrastructure, so that data is ready for further analysis and use within the organization.

You might be wondering what skills data engineers have. Here's a list (by no means an exhaustive one):

  1. Background in software engineering
  2. Familiarity with scripting and programming languages (SQL, Perl, Java, Ruby, C++)
  3. Skilled in data modeling and architecture
  4. Knowledge of database solutions (SQL and NoSQL), ETL tools and operating systems (Linux distributions such as Ubuntu)

With the rise of big data, data engineers must be able to work with big data technologies such as Hadoop and Spark, cloud tools and distributed computing.

What does a data engineer do?

Data engineers clean, transform and model data sets using programming languages such as Python and R to build data pipelines and repositories.

You cannot analyze data before organizing it in a usable format in one place. That's where data engineers come in.

Figure: What does a data engineer do? (Image courtesy: Monica Rogati)

Data engineers primarily focus on:

  1. Building, testing and maintaining the organization's data pipelines and architectures (data warehouses, databases)
  2. Managing data security, backup and recovery, access and data integrity
  3. Cleaning and wrangling data from various sources and formats into a usable state for further analysis
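
To make the cleaning and pipeline-building work more concrete, here is a minimal sketch of an extract-transform-load (ETL) step in Python, using only the standard library. The sample data, the users table and the function names are all hypothetical, chosen purely for illustration. A real pipeline would typically read from external sources and load into a proper data warehouse.

```python
import csv
import io
import sqlite3

# Hypothetical raw export: inconsistent casing, stray whitespace, a missing date.
RAW_CSV = """name,signup_date,country
 Alice ,2023-01-05,us
BOB,2023-02-05,US
carol,,de
"""


def extract(text):
    """Read raw CSV text into a list of dicts, one per record."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Normalize names and country codes; drop rows with no signup date."""
    cleaned = []
    for row in rows:
        if not row["signup_date"]:
            continue  # incomplete record, skipped in this sketch
        cleaned.append((
            row["name"].strip().title(),
            row["signup_date"],
            row["country"].strip().upper(),
        ))
    return cleaned


def load(rows, conn):
    """Store the cleaned rows in a SQLite table (illustrative schema)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS users "
        "(name TEXT, signup_date TEXT, country TEXT)"
    )
    conn.executemany("INSERT INTO users VALUES (?, ?, ?)", rows)
    conn.commit()


def run_pipeline(text, conn):
    """Chain the three stages: extract, transform, load."""
    load(transform(extract(text)), conn)


# An in-memory database stands in for a real data repository.
conn = sqlite3.connect(":memory:")
run_pipeline(RAW_CSV, conn)
print(conn.execute("SELECT name, country FROM users ORDER BY name").fetchall())
# prints [('Alice', 'US'), ('Bob', 'US')]
```

Keeping the three stages as separate functions is one common design choice, since each stage can then be tested and replaced on its own.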
