Recently, foundation models for MS2 spectra have been developed [1]. These models are trained on large collections of MS2 spectra from unidentified compounds. They map each MS2 spectrum to a position in the model's embedding space, in which chemically similar compounds end up close together. On top of such a model, a toxicity prediction model can be trained using known compounds and their toxicity scores. The hypothesis is that prediction models built on the foundation model will predict toxicity better than models trained only on the spectra of the known compounds.
In this project you will develop the prediction model on top of the foundation model and compare it with other machine learning models that use only the MS2 spectra of known compounds.
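The comparison described above could be set up roughly as follows. This is a minimal sketch, not a prescribed method: it assumes the embeddings have already been extracted from the foundation model (that step is not shown, since it depends on the model's own API), and it uses a simple k-nearest-neighbour regressor with leave-one-out evaluation purely for illustration. All function names are hypothetical, and the data are random placeholders, so the printed errors carry no meaning.

```python
import math
import random


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def knn_predict(train_X, train_y, query, k=3):
    """Predict a score as the mean score of the k most similar training items."""
    ranked = sorted(
        range(len(train_X)),
        key=lambda i: cosine_similarity(train_X[i], query),
        reverse=True,
    )
    top = ranked[:k]
    return sum(train_y[i] for i in top) / len(top)


def leave_one_out_mae(X, y, k=3):
    """Mean absolute error when each compound is predicted from all the others."""
    errors = []
    for i in range(len(X)):
        train_X = X[:i] + X[i + 1:]
        train_y = y[:i] + y[i + 1:]
        errors.append(abs(knn_predict(train_X, train_y, X[i], k) - y[i]))
    return sum(errors) / len(errors)


# Placeholder data (assumption): random numbers standing in for real inputs.
rng = random.Random(0)
n_compounds = 30
toxicity = [rng.random() for _ in range(n_compounds)]
# Stand-in for foundation-model embeddings of the known compounds.
embeddings = [[rng.gauss(0.0, 1.0) for _ in range(16)] for _ in range(n_compounds)]
# Stand-in for a baseline representation, e.g. intensities in fixed m/z bins.
binned_spectra = [[rng.random() for _ in range(50)] for _ in range(n_compounds)]

mae_embedding = leave_one_out_mae(embeddings, toxicity)
mae_spectrum = leave_one_out_mae(binned_spectra, toxicity)
print(f"MAE on embeddings:     {mae_embedding:.3f}")
print(f"MAE on binned spectra: {mae_spectrum:.3f}")
```

Evaluating both feature sets with the same predictor and the same splits keeps the comparison about the representation rather than the learner. In the actual project, the placeholder arrays would be replaced by real embeddings and spectra, and stronger models could be compared in the same way.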
[1] https://dreams-docs.readthedocs.io/en/latest/index.html
[2] https://apple.github.io/embedding-atlas/
Study program(s)
MSc Bioinformatics and Systems Biology
MSc Computational Science
