Purpose
Attention mechanisms are increasingly applied to genotype–phenotype mapping problems, particularly for capturing epistatic interactions. Rijal et al. (2025) recently demonstrated an attention-based model for this task, but their architecture omitted standard transformer components such as skip connections, layer normalization, and feed-forward sub-layers.
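For readers unfamiliar with those components, here's a minimal sketch, in PyTorch, of what a single "canonical" encoder block looks like: self-attention wrapped with the skip connections, layer normalization, and feed-forward sub-layer named above. It's purely illustrative; the class name, dimensions, and post-norm layout are our assumptions, not the architecture from Rijal et al. or from our pub.

```python
import torch
import torch.nn as nn

class EncoderBlock(nn.Module):
    """Illustrative post-norm transformer encoder block: self-attention plus the
    components the original model omitted (skip connections, layer normalization,
    and a feed-forward sub-layer)."""

    def __init__(self, d_model: int = 64, n_heads: int = 4, d_ff: int = 256):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm1 = nn.LayerNorm(d_model)
        self.ff = nn.Sequential(
            nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model)
        )
        self.norm2 = nn.LayerNorm(d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, n_loci, d_model), one embedding per genotyped locus
        attn_out, _ = self.attn(x, x, x)
        x = self.norm1(x + attn_out)     # skip connection + layer norm
        x = self.norm2(x + self.ff(x))   # feed-forward sub-layer, same residual pattern
        return x
```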
Here, we test whether incorporating these canonical elements improves predictive performance. Using the same yeast dataset (~100,000 segregants, 18 growth phenotypes), we show that adding the standard transformer components moderately improves predictive accuracy. We also find that predicting all 18 phenotypes jointly provides additional gains by leveraging cross-phenotype genetic correlations, an advantage the original single-output approach couldn't exploit.
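To make the joint-prediction idea concrete, the hedged sketch below wires a standard PyTorch encoder layer to a single linear head that outputs all 18 phenotypes at once, so one loss over the full phenotype matrix lets correlated traits share gradient signal. All names, dimensions, and the biallelic 0/1 genotype coding are illustrative assumptions, not the models or checkpoints we release.

```python
import torch
import torch.nn as nn

class JointPhenotypePredictor(nn.Module):
    """Illustrative multi-task model: a shared encoder feeds one head that
    predicts all phenotypes jointly, so correlated traits can share signal."""

    def __init__(self, n_loci: int, n_phenotypes: int = 18, d_model: int = 64):
        super().__init__()
        self.geno_embed = nn.Embedding(2, d_model)        # assumed biallelic 0/1 allele codes
        self.locus_embed = nn.Embedding(n_loci, d_model)  # which locus each token represents
        self.encoder = nn.TransformerEncoderLayer(
            d_model, nhead=4, dim_feedforward=256, batch_first=True
        )
        # A single-output variant would set out_features=1 and train one model per phenotype.
        self.head = nn.Linear(d_model, n_phenotypes)

    def forward(self, genotypes: torch.Tensor) -> torch.Tensor:
        # genotypes: (batch, n_loci) integer allele codes
        positions = torch.arange(genotypes.shape[1], device=genotypes.device)
        h = self.encoder(self.geno_embed(genotypes) + self.locus_embed(positions))
        return self.head(h.mean(dim=1))                   # (batch, n_phenotypes)

# Joint training uses one loss over the full phenotype matrix, e.g.:
# preds = model(genotypes)                                # (batch, 18)
# loss = nn.functional.mse_loss(preds, phenotype_matrix)  # gradients shared across traits
```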
This work should interest researchers applying deep learning to genotype–phenotype problems. Our results suggest that well-established architectural choices from the broader ML literature transfer well to genetics applications, and that multi-task learning offers a straightforward path to improved predictions when correlated phenotypes are available. We share all code and model checkpoints to enable rapid iteration by others.
View the notebook
The full pub is available here.
The source code to generate it is available in this GitHub repo (DOI: 10.5281/zenodo.15320438).
In the future, we hope to host notebook pubs directly on PubPub. Until that’s possible, we’ll create stubs like this with key metadata like the DOI, author roles, citation information, and an external link to the pub itself.