Exploring bioinformatics

(From left to right:) KAUST Biological and Environmental Science and Engineering researchers Dr. Octavio R. Salazar, Dr. Arun Nagarajan, Dr. Robert Lehmann, Manjula P. Thimma and Dr. Alaguraj Veluchamy developed and organized a recent University workshop covering the fundamentals of bioinformatics analysis of genomics and transcriptomics data. File photo.

By Robert Lehmann, KAUST News

DNA and RNA are fundamental molecules encoding the building plan of every living organism. Sequencing methods for DNA and RNA have become ubiquitous tools for the life sciences and healthcare. For example, sequencing has opened up entirely new avenues of research in diverse fields, such as the study of evolution or the understanding of the molecular mechanisms of complex diseases like cancer.

Because of the sheer volume of data these methods generate, practitioners need extensive experience in handling, processing and analyzing large data sets. To meet the ever-increasing use of sequencing and the accompanying need for computational analyses at KAUST, a week-long workshop covering the fundamentals of bioinformatics analysis of genomics and transcriptomics data took place in August on the University's campus. Targeted at students, postdoctoral fellows and senior researchers not necessarily trained in advanced data analytics, the workshop focused on providing hands-on training in coding and analysis.

Training researchers in data analysis

Bioinformatics tools and pipelines typically require knowledge of programming languages such as R and Python, as well as shell scripting, much like other data analysis-intensive areas such as computational biology, computer vision, scientific computing and machine learning. For this reason, data analysis-oriented training opportunities targeting specific application areas are valuable additions to the educational landscape at KAUST.

There are two typical sequencing-related tasks that researchers face regularly: The first is how to examine and assemble an enormous set of sequenced DNA fragments into a genome sequence resembling the original full-length DNA sequence; and the second is how to use a similar set of sequenced RNA pieces to infer the activity of all genes in the organism of interest at the time of sampling. Both problems can be broken down into a multitude of smaller problems that can be solved in various ways.
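To give a flavor of the first task, the following is a minimal sketch of a greedy overlap assembly on a toy data set. It is an illustration only and not the method taught at the workshop; the function names and the `min_len` parameter are assumptions made for this example. Real assemblers must also cope with sequencing errors, repeats and both DNA strands.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Return the length of the longest suffix of `a` that is a prefix of `b`.

    Overlaps shorter than `min_len` are ignored and reported as 0.
    """
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the two fragments with the longest overlap.

    Toy assumption: reads are error-free and come from one strand.
    Returns the remaining contigs once no overlap of at least
    `min_len` bases is left.
    """
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        best_k, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            return contigs
        # Join the best pair, writing the shared bases only once.
        merged = contigs[best_i] + contigs[best_j][best_k:]
        contigs = [c for idx, c in enumerate(contigs)
                   if idx not in (best_i, best_j)]
        contigs.append(merged)


if __name__ == "__main__":
    fragments = ["TACCTGA", "ATGGCGT", "GCGTACC"]
    print(greedy_assemble(fragments))
```

The three overlapping fragments in the usage example are merged back into a single sequence. The greedy strategy is easy to follow but can make wrong joins when a genome contains repeated regions, which is one reason practical assembly is broken into many smaller sub-problems.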

Methods for sequencing DNA and RNA are extremely important tools in the life and medical sciences. However, these methods generate huge amounts of data that must be handled, processed and analyzed. Image courtesy of Shutterstock.

The workshop had to carefully balance an understanding of the concepts and algorithms against practical data handling, so that attendees could master the practical examples using the Ibex high-performance computing environment at KAUST.

Workshop organizers handed out a questionnaire that indicated most attendees had little background in computational biology and bioinformatics, but they had the courage to move out of their comfort zones and attend the sessions. This meant the event required an introduction to some of the fundamental concepts and tools in data analysis.

Accordingly, the workshop was split into three blocks: First, organizers introduced data handling techniques and the HPC environment of Ibex; they then introduced methods, tools and algorithms used to analyze next-generation sequencing data and assemble a genome; and finally, the theoretical concepts were brought to life using an example project. The workshop's last block demonstrated how to compare transcription data between different samples.
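As a rough illustration of what comparing transcription data between samples can involve, the sketch below normalizes read counts per gene to counts per million (CPM) and computes a log2 fold change between two samples. The gene names, counts, function names and pseudocount are hypothetical and chosen only for this example; it is not the workshop's pipeline, and real analyses rely on replicates and dedicated statistical tools.

```python
import math


def counts_per_million(counts: dict[str, int]) -> dict[str, float]:
    """Scale raw read counts so that each sample sums to one million."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sample contains no reads")
    return {gene: 1e6 * n / total for gene, n in counts.items()}


def log2_fold_change(sample_a: dict[str, int],
                     sample_b: dict[str, int],
                     pseudocount: float = 1.0) -> dict[str, float]:
    """Return log2(CPM_b / CPM_a) per gene.

    A pseudocount (an assumption of this sketch) is added to both
    values to avoid division by zero for unexpressed genes.
    """
    cpm_a = counts_per_million(sample_a)
    cpm_b = counts_per_million(sample_b)
    genes = sorted(set(cpm_a) | set(cpm_b))
    return {
        g: math.log2((cpm_b.get(g, 0.0) + pseudocount)
                     / (cpm_a.get(g, 0.0) + pseudocount))
        for g in genes
    }


if __name__ == "__main__":
    # Hypothetical read counts for three genes in two samples.
    control = {"geneA": 100, "geneB": 300, "geneC": 600}
    treated = {"geneA": 400, "geneB": 300, "geneC": 300}
    for gene, lfc in log2_fold_change(control, treated).items():
        print(f"{gene}: {lfc:+.2f}")
```

Positive values indicate higher relative expression in the second sample and negative values lower expression. Normalizing first matters because samples are rarely sequenced to exactly the same depth.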

Creating a unique event

The workshop was developed and organized by five KAUST postdoctoral fellows and research scientists from the University's Biological and Environmental Science and Engineering division.

Postdoctoral fellow Arun Nagarajan's research interest lies at the saddle-point of experimental and computational approaches to understanding biological questions.

Research scientist Dr. Robert Lehmann is a bioinformatician by training and has a background in the systems biology of the circadian clock. His work on marine non-model organisms focuses on generating reference genomes together with transcriptomic data sets to understand responses to climate change, while his work on model organisms is focused on the transcriptional regulation underlying complex neurodegenerative diseases.

Postdoctoral fellow Octavio R. Salazar is a hybrid between a bioinformatician and an experimental biologist. He sees bioinformatics as a tool to identify novel interactions and generate new hypotheses that may later be tested in laboratory conditions.

Research scientist Manjula P. Thimma is a bioinformatician working on understanding the role of LINE1 elements during differentiation, and she also studies cellular reprogramming. She is currently working on single-cell transcriptomics to elucidate the role of cell-type-specific drug response in melanoma and multiple sclerosis patients.

Research scientist Dr. Alaguraj Veluchamy is a bioinformatics scientist in the KAUST Laboratory of Chromatin Biochemistry. His research interests include genome organization and the epigenetics of metazoans.

Participants take part in the recent bioinformatics and advanced data analytics workshop on the University's campus. Feedback for the workshop was 'overwhelmingly positive,' noted the organizers. File photo.

Gaining new knowledge 

Workshop participants gave overwhelmingly positive feedback. A main insight from this first installment of the event was that more time should be dedicated to data handling and wrangling. The organizers also encourage interested participants to take advantage of other data carpentry-oriented workshops offered at KAUST in preparation for similar workshops in the future.

"The workshop content was fair and straightforward," stated participant and KAUST Ph.D. student Rayyan Alamoudi. "I built a basic understanding of how bioinformatics-related tasks work. [The event] helped in building bioinformatics skills. Everything in the course was new to me, and I never knew that everyone at KAUST has access to Ibex, for example...I love how all the instructors handled their part of the material."

Participant Sumy V. Baby, a KAUST bioinformatics programmer, said she learned the "basics of RNASeq and grabbed knowledge about Ibex, slurm script, job execution and the visualization of genes using genome browsers."

The workshop successfully filled an important gap in the landscape of available extracurricular training events at KAUST, and the organizers plan to run it again in August 2020.
