Making existing COVID-19 multiOMICs data quickly available for research

Limited use of existing data

Did you know that an enormous amount of data is generated every day? It is estimated that about 2.5 quintillion bytes of data (a 25 followed by 17 zeros) are created daily, and this number is growing exponentially! Making use of all this data is one of the great challenges of our time.

The trend in academia is also towards open science, which has led to a huge increase in the amount of scientific data that is accessible to everyone. In addition, researchers work in different communities and consortia around the world. Consequently, data is stored in many different places distributed across the globe, so-called “data silos” (e.g. publications, websites, clouds and servers). It is therefore often very difficult for individual researchers to collect all of the existing and relevant data, and much of it goes unused.

Hu et al., a collaboration of knowing01 with Marsico Lab and Knauer-Arloth Lab at its pre-incubation academic host institution Helmholtz Munich, Germany, were interested in the question of why symptoms vary so much among individuals with COVID-19. To answer it, they investigated the influence of genetics and pre-existing conditions on the severity of COVID-19 symptoms.

One of the challenges was finding existing data and using it in the study to evaluate the authors' novel multi-modal network embeddings.

With the help of knowing01 software and our curated COVID-19 data library, the data space was expanded tremendously. By overlapping COVID-19 GWAS variants with known human genes, we were able to identify COVID-19-affected genes, which could then be linked to various pre-pandemic datasets. This allowed the authors to associate these affected genes with brain and gut tissues, as well as with ischaemic heart disease, cerebrovascular disease, and hypertension, and to establish a link between the PTPN6 gene and a genetic predisposition to COVID-19.
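To make the idea of overlapping variants with genes more concrete, here is a minimal Python sketch. It is only an illustration and not the knowing01 software or the method used by Hu et al.: the function name, the gene names and all coordinates are made up, and we assume that a variant "affects" a gene simply when its position falls inside the gene's start and end coordinates on the same chromosome.

```python
from collections import defaultdict


def genes_overlapping_variants(variants, genes):
    """Return the names of genes that contain at least one variant.

    variants: iterable of (chromosome, position) tuples
    genes:    iterable of (name, chromosome, start, end) tuples,
              with start and end treated as inclusive coordinates
    (Hypothetical helper for illustration only.)
    """
    # Group gene intervals by chromosome so each variant is only
    # compared against genes on its own chromosome.
    genes_by_chrom = defaultdict(list)
    for name, chrom, start, end in genes:
        genes_by_chrom[chrom].append((start, end, name))

    hits = set()
    for chrom, pos in variants:
        for start, end, name in genes_by_chrom.get(chrom, []):
            if start <= pos <= end:
                hits.add(name)
    return hits


# Toy data with made-up names and coordinates (not real annotations).
genes = [
    ("GENE_A", "chr1", 100, 200),
    ("GENE_B", "chr1", 500, 900),
    ("GENE_C", "chr2", 100, 200),
]
variants = [("chr1", 150), ("chr1", 950), ("chr2", 100)]

affected = sorted(genes_overlapping_variants(variants, genes))
print(affected)  # ['GENE_A', 'GENE_C']
```

A simple loop like this is fine for a toy example. For genome-wide data, an interval index or a dedicated genomics library would be the more practical choice.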

knowing01 software quickly leverages public data

The biggest challenge is finding important public data and incorporating it into one’s research. This task takes a lot of time and requires knowing where the existing data is stored. The software at knowing01 is optimized to quickly link many different datasets that come from a variety of resources, creating a large multiOMICs data library. This has allowed us to contribute to the COVID-19 analyses of Hu et al. in the following way:

  • Data search space expansion: Identification of COVID-19 genes by finding, retrieving and linking patient genetic data with genes. This allowed us to quickly increase the data search space for our collaboration partners.

