Matching receptor to odorant with protein language and graph neural networks
The mammalian sense of smell can distinguish a vast number of odors using a combinatorial coding scheme, in which different odors are represented by the activity patterns of hundreds of proteins called olfactory receptors (ORs). Each odorant molecule activates a set of these ORs, creating a representation that the brain eventually interprets as the perception we call smell. However, revealing this combinatorial code is a long-standing challenge, and determining the code even for a single molecule is costly and time-consuming: for humans, nearly 400 laboratory experiments are required for each molecule. In this work, we combine protein language models [1] with graph neural networks (GNNs) to predict OR activation, and propose a tailored architecture incorporating inductive biases from the protein-molecule interaction [2]. On a novel dataset of 46,700 OR-molecule pairs [3], this model outperforms state-of-the-art drug-target interaction prediction models as well as standard GNN baselines. Notably, our predictions are in agreement with combinatorial coding theory in olfaction. Our results reveal consistent coding for a large number of odor families, and the model suggests new insights, such as previously unknown pairs of enantiomers with distinct combinatorial codes.
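To make the notion of a combinatorial code concrete, the following is a minimal sketch, not the authors' analysis pipeline. It assumes that a model returns one activation probability per receptor for a given molecule, that a receptor counts as active above a threshold of 0.5, and that two codes can be compared with the Jaccard index. The receptor names, probabilities, threshold, and choice of similarity measure are all illustrative assumptions.

```python
def combinatorial_code(activation_probs, threshold=0.5):
    """Return the set of receptors predicted active for one molecule.

    activation_probs maps a receptor name to a predicted activation
    probability. The 0.5 threshold is an assumption for illustration.
    """
    return {name for name, p in activation_probs.items() if p >= threshold}


def jaccard(code_a, code_b):
    """Jaccard similarity between two receptor sets (1.0 if both are empty)."""
    union = code_a | code_b
    return len(code_a & code_b) / len(union) if union else 1.0


# Hypothetical predictions for two molecules (e.g. a pair of enantiomers).
# Receptor names and probabilities are made up for this example.
molecule_a = {"OR_A": 0.9, "OR_B": 0.2, "OR_C": 0.7}
molecule_b = {"OR_A": 0.8, "OR_B": 0.6, "OR_C": 0.1}

code_a = combinatorial_code(molecule_a)
code_b = combinatorial_code(molecule_b)
similarity = jaccard(code_a, code_b)
```

Under these assumptions, two molecules share a combinatorial code when their receptor sets coincide, and a low similarity indicates distinct codes.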

Figure 1: (a) Model overview. The input is a pair consisting of a protein sequence and a molecular graph. The sequence is embedded using the [CLS] token from protBERT, and the resulting representation is concatenated to each node of the molecular graph. (b) Graph processing block.
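The concatenation step in panel (a) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `attach_protein_embedding` is hypothetical, node features are assumed to be stored as a NumPy array with one row per atom, and a toy 4-dimensional vector stands in for the real [CLS] embedding.

```python
import numpy as np


def attach_protein_embedding(node_features, protein_embedding):
    """Concatenate one protein embedding to every node of a molecular graph.

    node_features:     array of shape (num_nodes, node_dim)
    protein_embedding: array of shape (protein_dim,)
    returns:           array of shape (num_nodes, node_dim + protein_dim)
    """
    nodes = np.asarray(node_features, dtype=float)
    protein = np.asarray(protein_embedding, dtype=float)
    if nodes.ndim != 2 or protein.ndim != 1:
        raise ValueError("expected 2-D node features and a 1-D protein embedding")
    # Repeat the same protein vector once per node, then join feature-wise.
    tiled = np.tile(protein, (nodes.shape[0], 1))
    return np.concatenate([nodes, tiled], axis=1)


# Toy inputs: 3 atoms with 2 features each, and a stand-in protein embedding.
nodes = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
protein = np.array([0.5, -0.5, 0.25, 0.0])
augmented = attach_protein_embedding(nodes, protein)
```

Because every node carries the same protein vector, each message-passing step in a downstream graph block can condition on the receptor without a separate fusion layer; whether the original model relies on this property in exactly this way is an assumption here.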
[1] Elnaggar et al., ProtTrans: Towards cracking the language of life's code through self-supervised deep learning and high performance computing.
[2] Hladiš et al., Matching receptor to odorant with protein language and graph neural networks.
[3] Lalis et al., M2OR: A Database of Olfactory Receptor-Odorant Pairs for Understanding the Molecular Mechanisms of Olfaction, https://m2or.chemsensim.fr