Abstract: The human genome is encoded in DNA using just four letters. Despite this limited alphabet, cells regulate gene expression with remarkable precision, a fact that has long intrigued scientists. Recently, sequence-based deep learning models have excelled at several computational genomics tasks, suggesting that they learn complex representations of the data that resemble true biology. However, deciphering the “code of life” within these models in a form humans can comprehend remains extremely challenging.
In this talk, I will build on prior work that uses a high-performing deep learning model, SpliceAI, to study the mechanisms behind a fundamental biological process: RNA splicing. In particular, I will showcase a prototype of an evolution-inspired algorithm that generates genomic sequences with specific target properties. We demonstrate that a grammar-based approach effectively restricts the search space through biologically plausible constraints. Furthermore, our analysis yields results that align with expected biological outcomes. These findings lay the foundation for our ultimate goal: decoding the latent knowledge acquired by SpliceAI into biologically interpretable expressions that humans can understand.