Big Data for everyone. An insight into technologies of tomorrow
11 Feb 2020

Still wondering how to explain Big Data to your granny and touch on the GMO topic along the way? Check out this story about developers from Lviv and their fight against world hunger. It’s a great starting point for discussing genetics and technology together.

Meet our expert, Volodymyr Fedorchuk, who has been an EPAM project manager for seven years. He started his IT career sixteen years ago as a system engineer and developer. In 2012, he became a Big Data delivery manager. Over the past few years, Volodymyr has been working on genetics projects.

Big Data is a stack of technologies that enable the processing of enormous volumes of data. They’re applied to a wide range of tasks, such as public opinion analysis, urban design, and emergency detection. They’re also used to fight hunger.


Feeding everyone

We need food to survive, but there’s not enough food for everyone. The problem has been addressed in different ways. From the 1940s to the 1970s, developing nations adopted advanced farming techniques, introduced high-yielding crop varieties, and expanded the use of fertilizers, pesticides, and modern machinery. This set of developments in farming became known as “the Green Revolution”. It affected countries such as India, Mexico, the Philippines, Pakistan, Peru, Colombia, and Nigeria.

But what came next? That’s when selective breeding entered the stage. Building on Charles Darwin’s studies, selective breeding made it possible to raise many varieties of agricultural plants and breeds of domestic animals with desired qualities. Back then, it seemed like the end of the problem. But even today, the World Health Organization reports 820 million starving people around the globe.


No GMO (is a bad idea)

While your granny is actively searching for non-GMO labels in the supermarket, scientists have high hopes for genetic modification.

This technology modifies plants by inserting specific genes into their DNA. These genes help plants withstand adverse climate conditions, increase yield, and extend shelf life.

One of the main benefits of GMOs is the reduced need for pesticides. You don’t have to pour insecticides on your potatoes if you can grow the same potato in a variety that resists insects. Besides, you can choose a modification that is harmful only to insects and not to people.

“Developing countries face a serious problem in the form of vitamin A deficiency. Millions of children under five die. And even those who survive can be left blind. To solve this problem, scientists created golden rice”.

Golden rice is a genetically modified variety of rice whose grains contain lots of beta-carotene. Once the rice is eaten, the body converts the beta-carotene into vitamin A.

EPAM also works on such projects. So, let’s dive into genetics to understand the type of data that developers work with.


I’m a program, and you’re a program

Each cell of a living organism has chromosomes that contain deoxyribonucleic acid, or DNA. It carries the genetic code that determines the development and functioning of our bodies. It’s also passed from one generation to the next. DNA takes the form of a double helix composed of nucleotides. A gene is a sequence of nucleotides that codes for a certain feature, such as eye color. Each nucleotide contains one of four nitrogenous bases: guanine, adenine, cytosine, or thymine (G, A, C, T). This is the code used to write all life on Earth.

“DNA is the software of all living things. Computer programs are made of binary code: zeros and ones. Life is made of four elements: G, A, C, T”.
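
To make the comparison with binary code concrete: four letters fit exactly into two bits, so any DNA string can be written as zeros and ones. The snippet below is only a minimal sketch of that idea. The bit assigned to each base and the function names `pack` and `unpack` are illustrative assumptions, not a format used on the project.

```python
# Assumed, arbitrary mapping: four bases fit exactly into two bits each.
BASE_TO_BITS = {"A": 0b00, "C": 0b01, "G": 0b10, "T": 0b11}
BITS_TO_BASE = {bits: base for base, bits in BASE_TO_BITS.items()}


def pack(sequence: str) -> int:
    """Pack a DNA string into one integer, two bits per base."""
    value = 0
    for base in sequence:
        value = (value << 2) | BASE_TO_BITS[base]
    return value


def unpack(value: int, length: int) -> str:
    """Recover a DNA string of the given length from its packed integer."""
    bases = []
    for _ in range(length):
        bases.append(BITS_TO_BASE[value & 0b11])  # read the lowest two bits
        value >>= 2
    return "".join(reversed(bases))  # bases were read last-to-first


packed = pack("GATTACA")
print(bin(packed))        # the same sequence written as zeros and ones
print(unpack(packed, 7))  # and converted back to letters
```

The length is passed to `unpack` separately because leading A bases are encoded as zeros and would otherwise be lost.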

Program code is stored in archives, while genetic code is contained in chromosomes. They are basically hard drives with info. In nature, there’s a phenomenon called horizontal gene transfer, which is common among bacteria. It can be viewed as a kind of open source. Both environments host viruses. As soon as a virus penetrates a cell, it introduces its own code, which makes the cell do whatever the virus wants instead of what the cell did before.

DNA can be fully digitized with an accuracy as high as 99.9%. However, storing and processing such massive volumes of data can be a problem. Indeed, working with dozens of chromosomes, thousands of genes, and billions of nucleotides requires special tools.


DNA on my flash card

Within the scope of its projects, EPAM handles plant genome sequencing and processes the resulting information using Big Data technologies.

DNA sequencing determines the order of the nucleotide bases (G, A, C, T) in DNA fragments. The nucleotides are marked with different colors. A sequencer copies the fragments, and scientists can then analyze them using special software.

“We receive data from sequencing machines and upload them to a database. There are many different formats, which can be converted. We don’t need all the data for our work. That’s why we cleanse them first. Our main task is to upload these data and hand them over to the scientists in the desired format”.
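
To give a feel for what “cleansing” can mean, here is a minimal sketch. It assumes the data arrive in FASTQ, a widely used text format for sequencer output with four lines per read. The article doesn’t name the formats the team works with, so FASTQ, the quality threshold, and all the names below are illustrative assumptions.

```python
from typing import Iterable, Iterator, NamedTuple


class Read(NamedTuple):
    name: str
    bases: str
    quality: str


def parse_fastq(lines: Iterable[str]) -> Iterator[Read]:
    """Yield reads from FASTQ text: header, bases, '+', quality per record."""
    stripped = [line.rstrip("\n") for line in lines if line.strip()]
    for i in range(0, len(stripped) - 3, 4):
        header, bases, _plus, quality = stripped[i:i + 4]
        yield Read(header.lstrip("@"), bases, quality)


def mean_quality(read: Read) -> float:
    """Average Phred score, assuming the common Phred+33 ASCII encoding."""
    if not read.quality:
        return 0.0
    return sum(ord(ch) - 33 for ch in read.quality) / len(read.quality)


def cleanse(reads: Iterable[Read], min_quality: float = 20.0) -> Iterator[Read]:
    """Keep reads made only of G, A, C, T that meet the quality threshold."""
    for read in reads:
        if set(read.bases) <= set("GACT") and mean_quality(read) >= min_quality:
            yield read


# Made-up sample: read2 has an unknown base (N), read3 has low quality.
SAMPLE = """@read1
GATTACA
+
IIIIIII
@read2
GANTACA
+
IIIIIII
@read3
CCGGTTA
+
#######
"""

kept = [read.name for read in cleanse(parse_fastq(SAMPLE.splitlines()))]
print(kept)
```

Only the first read passes both checks here. A real pipeline would apply many more rules, but the shape stays the same: parse, filter, pass on.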

Smaller tasks include data parallelism, that is, developing algorithms that ensure uninterrupted data uploads. The data also have to be verified. After that, it’s the users who work with the data. They extract certain data segments and identify the genes.
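
The general idea of data parallelism is to split the data into independent chunks, run the same operation on each chunk at the same time, and merge the results. The sketch below counts bases this way. It’s a toy illustration with hypothetical function names, not the project’s upload algorithm, and it uses threads only to stay short. For CPU-heavy work in Python, a process pool or a cluster framework would be the more realistic choice.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(sequence: str, chunk_size: int) -> list[str]:
    """Cut the sequence into consecutive pieces of at most chunk_size bases."""
    return [sequence[i:i + chunk_size] for i in range(0, len(sequence), chunk_size)]


def count_bases(chunk: str) -> Counter:
    """The per-chunk operation: count how often each base occurs."""
    return Counter(chunk)


def parallel_base_counts(sequence: str, chunk_size: int = 4, workers: int = 4) -> Counter:
    """Apply count_bases to every chunk concurrently and merge the partial counts."""
    chunks = split_into_chunks(sequence, chunk_size)
    total = Counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for partial in pool.map(count_bases, chunks):
            total += partial
    return total


counts = parallel_base_counts("GATTACAGATTACA")
print(counts)
```

Because each chunk is counted independently, the merged result matches what a single pass over the whole sequence would give.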

The information is stored in hubs, or data lakes. It’s the best solution for this type of data.

“Harvesting happens twice a year, and we need high computing capacity to process all these data. What is great about cloud technologies in this case? We launch two hundred computers that process the data and upload them to the cloud environment. It’s a much more affordable solution for episodic computations since it doesn’t require keeping servers running all the time”.
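
A quick back-of-the-envelope calculation shows why episodic cloud computing is cheaper. The two hundred machines and the two runs a year come from the quote above. The hourly rate and the length of each run are made-up numbers, used only for illustration.

```python
HOURS_PER_YEAR = 24 * 365


def always_on_cost(machines: int, hourly_rate: float) -> float:
    """Yearly cost of keeping every machine running around the clock."""
    return machines * hourly_rate * HOURS_PER_YEAR


def episodic_cost(machines: int, hourly_rate: float,
                  runs_per_year: int, hours_per_run: float) -> float:
    """Yearly cost of launching the machines only for the processing runs."""
    return machines * hourly_rate * runs_per_year * hours_per_run


MACHINES = 200       # from the article
RUNS_PER_YEAR = 2    # from the article: harvesting happens twice a year
HOURLY_RATE = 0.50   # hypothetical price per machine-hour
HOURS_PER_RUN = 72   # hypothetical duration of one processing run

constant = always_on_cost(MACHINES, HOURLY_RATE)
episodic = episodic_cost(MACHINES, HOURLY_RATE, RUNS_PER_YEAR, HOURS_PER_RUN)
print(f"always on: {constant:,.0f}")
print(f"episodic:  {episodic:,.0f}")
print(f"ratio:     {constant / episodic:.1f}x")
```

The exact ratio depends entirely on the assumed numbers, but the pattern holds: the shorter and rarer the runs, the more you save by not paying for idle servers.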

Each day, a data lake loads data from various systems. Loading can last three or four hours, with five more hours spent on data processing. A client can employ machine learning to work with such massive volumes of data. This way, we can predict which plants are best to crossbreed so that they grow well in given conditions, mainly temperature, humidity, and soil.

This solution won’t put an end to hunger immediately. But for now, it seems like the most relevant and suitable one. Without Big Data technologies, working with this type of data would take too much time and require too many resources.

And the amount of data is just getting bigger and bigger…

So, the next time your granny comes back from the supermarket, tell her how much data was processed to get these new varieties of plants. Just for these ruddy apples to appear on the shelves.


The source: Lviv.com