Data Quality Engineers – professionals in charge of data reliability
10 Mar

According to the Harvard Business Review, completing a task can cost a business 100 times more when the data is incorrect than when it is accurate. The research refers to the 1-10-100 rule devised by George Labovitz and Yu Sang Chang: preventing a data error at the point of entry costs $1, correcting it later costs $10, and leaving it uncorrected costs $100. The rule underscores how important it is to maintain a high level of data quality.

It's no wonder, then, that at a time when data is crucial to the success of any business, data quality engineering is quickly becoming a popular profession. Viktoriia Vakhrina, Senior Data Quality Engineer at EPAM, explains what these professionals do and what beginners should know.

Computer science, working with data, and aviation

I began my career at EPAM around three years ago, when I joined its Data Quality program. Before that, I worked in aviation: I was a senior flight attendant and, at the same time, was pursuing a second degree in Software Engineering at Kyiv National Polytechnic University.

My interest in information technology goes back a long way, so when the pandemic sharply reduced the number of flights and my workload, I devoted my free time to researching IT trends and learning Python, a language widely used for data processing. Later, I came across EPAM's data quality course and realized that this specialization perfectly combines working with data and technology, and that it has a lower entry threshold than, for example, the work of a developer or DevOps engineer. After completing the program and passing several interviews, I began working at EPAM, and in less than two and a half years I worked my way up from Junior Specialist to Senior Data Quality Engineer.

Data quality from the practitioner’s perspective

Data quality (DQ) engineering, as the name implies, is about verifying the quality of data. DQ engineers handle data at every stage: obtaining raw data in various formats, transforming, storing, and processing it, and visualizing it with tools such as Power BI or Tableau. We know how to process data and which checks to run to ensure its quality.

How does this work in practice? DQ engineers don't call people to ask whether their email address in the database is correct. Instead, we check whether the data meets the business requirements and is fit for producing the result the customer needs.

For example, suppose a company stores sales information and wants a weekly BI report showing its sales dynamics. Data quality engineers make sure that the data coming from the chain of stores matches the specified parameters (correct product names, sale dates, purchase prices, etc.). Then they check whether the data is loaded into the database correctly and consistently, without duplicates; whether all the aggregations the customer wants to see in the reports are mathematically correct; whether the data is displayed correctly in the reports; whether the reports are refreshed with new data; and so on. DQ engineers work closely with business analysts, who provide the list of customer requirements, and, depending on the project, with data analysts, data engineers, or developers.
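The sales example above can be made concrete with a small Python sketch of the kinds of checks involved. It assumes each sale arrives as a dict with sale_id, product, sale_date, and price fields, and that the BI report exposes weekly totals keyed by (ISO year, ISO week). These field names, the KNOWN_PRODUCTS reference set, the sample rows, and both function names are hypothetical and chosen only for illustration; real projects usually express such rules in SQL or a dedicated validation framework, so treat this as an outline of the logic rather than a production tool.

```python
import math
from collections import Counter, defaultdict
from datetime import date

# Hypothetical reference list of valid product names; on a real project it
# would come from the business requirements or a master-data table.
KNOWN_PRODUCTS = {"Coffee", "Tea", "Milk"}


def validate_sales(rows):
    """Return human-readable issues found in raw sales rows.

    Each row is assumed (hypothetically) to be a dict with the keys
    'sale_id', 'product', 'sale_date' (ISO 'YYYY-MM-DD') and 'price'.
    """
    issues = []
    # Duplicates: the same sale_id should be loaded only once.
    counts = Counter(row["sale_id"] for row in rows)
    for sale_id, n in counts.items():
        if n > 1:
            issues.append(f"duplicate sale_id {sale_id} ({n} rows)")
    for row in rows:
        sid = row["sale_id"]
        # Product names must match the agreed reference list.
        if row["product"] not in KNOWN_PRODUCTS:
            issues.append(f"{sid}: unknown product {row['product']!r}")
        # Sale dates must be real calendar dates.
        try:
            date.fromisoformat(row["sale_date"])
        except (TypeError, ValueError):
            issues.append(f"{sid}: invalid sale_date {row['sale_date']!r}")
        # Prices must be present and positive.
        price = row["price"]
        if not isinstance(price, (int, float)) or price <= 0:
            issues.append(f"{sid}: missing or non-positive price {price!r}")
    return issues


def report_mismatches(rows, report, tol=1e-9):
    """Compare report totals {(iso_year, iso_week): total} with totals
    recomputed from rows that have already passed validate_sales."""
    expected = defaultdict(float)
    for row in rows:
        iso = date.fromisoformat(row["sale_date"]).isocalendar()
        expected[(iso[0], iso[1])] += row["price"]
    mismatches = {}
    # Check weeks present on either side, so missing weeks are caught too.
    for week in set(expected) | set(report):
        exp, got = expected.get(week, 0.0), report.get(week)
        if got is None or not math.isclose(exp, got, abs_tol=tol):
            mismatches[week] = (exp, got)
    return mismatches


rows = [
    {"sale_id": 1, "product": "Coffee", "sale_date": "2024-03-04", "price": 3.5},
    {"sale_id": 2, "product": "Tea", "sale_date": "2024-03-05", "price": 2.0},
    {"sale_id": 2, "product": "Tea", "sale_date": "2024-03-05", "price": 2.0},
    {"sale_id": 3, "product": "Cofee", "sale_date": "2024-02-30", "price": -1},
]
for issue in validate_sales(rows):
    print(issue)
# The hypothetical report claims 7.5 for ISO week 10 of 2024,
# but the two clean rows add up to 5.5, so the week is flagged.
print(report_mismatches(rows[:2], {(2024, 10): 7.5}))
```

Returning lists of issues instead of raising on the first problem mirrors how such checks are typically reported: every discrepancy found in one run can go into a single bug report.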

By the way, for security reasons, on most projects, we work with specially generated test data instead of confidential business information.

DQ engineers' tools of the trade

The DQ engineer's toolkit is extremely diverse, and its exact composition depends on the project. However, in about 90% of cases we use SQL or SQL-like tools, which are available on most cloud platforms.
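Since so much of the work is done in SQL, here is a hedged sketch of three common checks: duplicates, completeness (rows that never reached the target), and reconciliation of an aggregate between source and target. To keep it self-contained, it runs against an in-memory SQLite database through Python's built-in sqlite3 module. The table names staging_sales and sales and the sample rows are hypothetical; on a real project the same queries would run against the project's own warehouse, and SQL dialects differ in details.

```python
import sqlite3

# In-memory database standing in for a real warehouse; the table and column
# names (staging_sales, sales) are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE staging_sales (sale_id INTEGER, product TEXT, price REAL);
CREATE TABLE sales (sale_id INTEGER, product TEXT, price REAL);
INSERT INTO staging_sales VALUES (1, 'Coffee', 3.5), (2, 'Tea', 2.0), (3, 'Milk', 1.2);
INSERT INTO sales VALUES (1, 'Coffee', 3.5), (2, 'Tea', 2.0), (2, 'Tea', 2.0);
""")

# Check 1: duplicates, i.e. the same business key loaded more than once.
DUPLICATES_SQL = """
SELECT sale_id, COUNT(*) AS n
FROM sales
GROUP BY sale_id
HAVING COUNT(*) > 1
"""

# Check 2: completeness, i.e. source rows missing from the target table.
MISSING_SQL = """
SELECT s.sale_id
FROM staging_sales AS s
LEFT JOIN sales AS t ON t.sale_id = s.sale_id
WHERE t.sale_id IS NULL
"""

# Check 3: reconciliation of a total between source and target.
TOTALS_SQL = """
SELECT (SELECT ROUND(SUM(price), 2) FROM staging_sales),
       (SELECT ROUND(SUM(price), 2) FROM sales)
"""

duplicates = conn.execute(DUPLICATES_SQL).fetchall()
missing = conn.execute(MISSING_SQL).fetchall()
source_total, target_total = conn.execute(TOTALS_SQL).fetchone()

print("duplicates:", duplicates)
print("missing in target:", missing)
print("totals (source, target):", source_total, target_total)
```

In this toy data all three checks fire: sale 2 was loaded twice, sale 3 never arrived, and the totals therefore disagree.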

Python and its libraries for working with data are indispensable, especially for automated testing.
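As an illustration of automated testing in Python, the sketch below expresses column-level expectations (uniqueness, presence, value ranges) as ordinary unit tests. It uses only the standard-library unittest module so it runs without extra installs; pytest, which is also popular, can run unittest test cases as well. The ORDERS fixture, its columns, and the quantity limit of 100 are hypothetical assumptions; in practice the rows would be read from the output of the pipeline under test.

```python
import unittest

# Hypothetical fixture: on a real project these rows would be loaded from
# the database table or file produced by the pipeline under test.
ORDERS = [
    {"order_id": 101, "customer_email": "a@example.com", "quantity": 2},
    {"order_id": 102, "customer_email": "b@example.com", "quantity": 1},
    {"order_id": 103, "customer_email": "c@example.com", "quantity": 5},
]


class OrdersDataTests(unittest.TestCase):
    """Column-level data expectations written as ordinary unit tests."""

    def test_order_id_is_unique(self):
        ids = [row["order_id"] for row in ORDERS]
        self.assertEqual(len(ids), len(set(ids)), "duplicate order_id values")

    def test_email_is_present_and_plausible(self):
        for row in ORDERS:
            # subTest reports each failing row separately instead of stopping.
            with self.subTest(order_id=row["order_id"]):
                email = row["customer_email"]
                self.assertTrue(email, "missing customer_email")
                self.assertIn("@", email)

    def test_quantity_in_expected_range(self):
        for row in ORDERS:
            with self.subTest(order_id=row["order_id"]):
                self.assertGreaterEqual(row["quantity"], 1)
                self.assertLessEqual(row["quantity"], 100)  # assumed business limit


def run_checks():
    """Run the suite programmatically and return the unittest result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(OrdersDataTests)
    return unittest.TextTestRunner(verbosity=2).run(suite)


if __name__ == "__main__":
    run_checks()
```

Because the checks are plain tests, they can run in the same CI pipeline as the rest of the project and fail the build when the data stops meeting the agreed requirements.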

Big data frameworks such as Spark and Hadoop come in handy when dealing with massive amounts of data.

Communication skills are essential, as DQ engineers frequently interact with business analysts and developers. Misunderstandings within the team can cause many complications, so you should clarify all the subtleties and ask questions until you are 100% sure all parties involved understand each other correctly.

And of course, DQ engineers must know how to work with project documentation.

DQ engineers' pet projects

Unlike developers, data quality engineers have a rather limited list of pet project ideas. However, there are many open datasets on the Internet for you to practice on, for example, on Kaggle.com, a well-known resource in the DQ community. You can use them to build dashboards, run classic and simple data checks, and consider what you could test at each stage. You can also access training datasets on the AWS and Google Cloud platforms. This kind of practical experience may come in handy during interviews.
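A typical first exercise on an open dataset is simple profiling: counting rows, empty values, distinct values per column, and fully duplicated rows. Below is a minimal stdlib-only sketch of that idea, assuming a CSV file with a header row. The profile_csv function, the sample data, and the file name in the comment are hypothetical; the sample is read from an in-memory string so the example runs as is.

```python
import csv
import io
from collections import Counter


def profile_csv(file_obj):
    """Compute simple per-column profile stats for a CSV with a header row."""
    reader = csv.DictReader(file_obj)
    columns = reader.fieldnames or []
    empty = Counter()
    distinct = {col: set() for col in columns}
    row_counts = Counter()
    total = 0
    for row in reader:
        total += 1
        # Count whole rows so exact duplicates can be detected afterwards.
        row_counts[tuple(row[col] for col in columns)] += 1
        for col in columns:
            # Short rows yield None for missing fields; treat them as empty.
            value = (row[col] or "").strip()
            if not value:
                empty[col] += 1
            else:
                distinct[col].add(value)
    return {
        "rows": total,
        "duplicate_rows": sum(n - 1 for n in row_counts.values() if n > 1),
        "empty": {col: empty[col] for col in columns},
        "distinct": {col: len(distinct[col]) for col in columns},
    }


# Hypothetical sample; for a downloaded dataset you could instead use:
#   with open("dataset.csv", newline="", encoding="utf-8") as f:
#       print(profile_csv(f))
SAMPLE = (
    "id,name,score\n"
    "1,Alice,90\n"
    "2,Bob,\n"
    "1,Alice,90\n"
    "3,,75\n"
)
print(profile_csv(io.StringIO(SAMPLE)))
```

From a profile like this, it is a short step to asking which of the findings (the duplicate row, the missing name, the missing score) a real customer would treat as a defect.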

Personal qualities that will help you achieve success in this profession:

  • Meticulousness and attentiveness;
  • Curiosity and communication skills - sometimes you have to find out the undocumented subtleties of a project;
  • Tolerance for occasional monotonous work;
  • Ability to think outside the box and identify weaknesses and discrepancies not visible at first glance.

The importance of English

For anyone who wants to grow in IT, learning English is not a whim but a necessity, and data quality engineering is no exception. Most new publications are available only in English, and communication with customers, and sometimes with the team, is also in English. My English is at a high level, and I work hard to keep it there.

Starter kit for beginners

The EPAM Data Quality Engineering program equips beginners with all the skills they need to enter the profession. Even complete novices can give it a shot, provided they are willing to invest considerable time and effort in training. The program will be easier for candidates who have basic knowledge of SQL, relational databases, and Git, and who understand CI/CD processes and basic testing concepts such as test cases and bug reports. Familiarity with these topics speeds up the path to becoming a data quality engineer.

And what about AI?

Even if artificial intelligence eventually replaces DQ engineers, I doubt it will happen anytime soon.

A tester's job is to question things. AI can perform basic checks on systems that work as intended, but it is unlikely to be able to model scenarios that test how a system behaves when it receives incorrect or entirely unexpected data. Today, artificial intelligence does not think critically, and it cannot notice a discrepancy by chance while checking something else. So the professional future of DQ engineers looks secure, at least for the time being.

Sound interesting? Learn more about open enrollment for the Data Quality Engineering program and try your hand at data quality assurance!