Protein molecules carry out important tasks in the cells of all organisms, from providing building material for the cellular structure to catalyzing cascades of complex biochemical reactions, as well as acting as signaling molecules to establish communication between cells. Thus, it is the composition of a cell’s proteins, rather than the sequence of its genome, that determines cellular properties and, in turn, the overall physiological state of the organism.
Proteins are synthesized by ribosomes, macromolecular nanomachines that convert genetic information from RNA molecules, transcribed from genomic segments, into proteins. They do so with the help of a myriad of other RNA and protein molecules. Protein synthesis is the most energy-consuming process in the cell, and a typical human cell has millions of active ribosomes.
Ten years ago, Nicholas Ingolia and Jonathan Weissman at UCSF developed a technique called “ribosome profiling” that provides a snapshot of the activity of all ribosomes in the cell by capturing the fragments of RNA that are being decoded, which researchers often call ribosome “footprints”. Footprints are short RNA sequences of ≈20 to ≈40 nucleotides; their exact length depends on the organism and the functional state of the ribosomes producing them.
A typical ribosome profiling experiment produces hundreds of millions of such sequences that then need to be computationally processed. The processing involves accurate mapping to the sequences of genomes or RNA molecules and separating the genuine ribosome footprints from other RNA fragments such as the debris of the destroyed ribosomes. This processing is followed by elaborate computational analysis of the mapped footprints.
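As a rough illustration of the separation step, the sketch below keeps only reads whose length falls in the ≈20 to ≈40 nucleotide range and discards reads that occur within a known contaminant sequence, such as ribosomal RNA. The function name `filter_footprints`, the length bounds, and the exact substring matching are assumptions made for this example; real pipelines rely on dedicated read aligners and tolerate mismatches.

```python
from typing import Iterable, List

# Assumed footprint length window, based on the ~20-40 nt range quoted in the text.
MIN_LEN = 20
MAX_LEN = 40


def filter_footprints(reads: Iterable[str], contaminants: Iterable[str]) -> List[str]:
    """Keep reads of footprint-like length that do not occur in any contaminant.

    Exact substring matching is a simplification for illustration only.
    """
    contaminant_list = list(contaminants)
    kept = []
    for read in reads:
        # Discard reads that are too short or too long to be footprints.
        if not (MIN_LEN <= len(read) <= MAX_LEN):
            continue
        # Discard reads found verbatim inside a contaminant (e.g. rRNA debris).
        if any(read in seq for seq in contaminant_list):
            continue
        kept.append(read)
    return kept


if __name__ == "__main__":
    rrna = "ACGT" * 20  # toy stand-in for a ribosomal RNA sequence
    reads = ["A" * 25, "A" * 10, "A" * 45, rrna[:28]]
    print(filter_footprints(reads, [rrna]))
```

In this toy run only the 25-nucleotide read survives: two reads fall outside the length window and one matches the contaminant.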
Downstream analysis can include the detection of novel proteins and variants of already known proteins, called proteoforms, and comparison of individual protein synthesis rates. The analysis could also provide important information regarding the dynamics of the process, e.g., by identifying the locations where ribosome movement along RNA is frequently halted.
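One simple way to think about locating halted ribosomes is to look for positions along an RNA where footprint counts are far above the average for that RNA. The sketch below flags such positions. The function name `find_pause_sites` and the default five-fold threshold are hypothetical choices for illustration, not a method taken from the review; published tools use more careful statistics.

```python
from typing import List, Sequence


def find_pause_sites(counts: Sequence[int], fold: float = 5.0) -> List[int]:
    """Return 0-based positions whose footprint count is at least
    `fold` times the mean count across the whole transcript."""
    if not counts:
        return []
    mean = sum(counts) / len(counts)
    if mean == 0:
        # No footprints at all, so nothing can be called a pause.
        return []
    return [i for i, c in enumerate(counts) if c >= fold * mean]


if __name__ == "__main__":
    # Toy per-position footprint counts with one strong peak at index 3.
    counts = [2, 3, 2, 40, 3, 2, 2, 3, 2, 1]
    print(find_pause_sites(counts))
```

Normalizing to the transcript mean makes the threshold independent of how highly a given RNA is translated, although a single very strong peak also raises the mean it is compared against.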
In a recent study published in WIREs RNA, researchers from the LAPTI laboratory at University College Cork compiled the computational approaches, software tools, and data resources that have been developed over the last ten years for ribosome profiling data processing and analysis.
The study outlines all the necessary steps involved in the initial processing of the data, highlights potential sources of artifacts, and explains how to avoid them. A special section of the review is dedicated to the quality assessment of experimental data.
Prof. Pasha Baranov, a senior author of the review, states: “The quality of ribosome profiling data varies dramatically between datasets, and it is not just good or bad data, some datasets are very well suited for one purpose, but useless for another. Hence it is very important to characterize different quality parameters of the data.”
The article also provides readers with references to numerous software packages developed for specific types of data analysis, e.g., detection of ORF translation, gene expression analysis, and ribosome pause detection. Researchers can use this comprehensive guide to choose the most suitable tool depending on their specific goals and preferred computational platform.
Some resources provide access to publicly available data and can be used online. One example is the RiboSeq.Org portal, which includes the genome browser GWIPS-viz, the graphical computational environment Trips-Viz, and RiboGalaxy, an instance of Galaxy tailored for ribosome profiling data analysis.
The remaining challenges in ribosome profiling data analysis may inspire computational biologists to search for novel, potentially superior solutions that improve and expand the bioinformatician’s toolbox.