Machine learning to predict retention time of small molecules in nano-HPLC

Sergey Osipenko, Inga Bashkirova, Sergey Sosnin, Oxana Kovaleva, Maxim Fedorov, Eugene Nikolaev, Yury Kostyukevich

Research output: Contribution to journal › Article › peer-review

12 Citations (Scopus)


Retention time is an important parameter for identification in untargeted LC-MS screening. Precise retention time prediction facilitates the annotation process and is well established in proteomics. However, a lack of available experimental data has long limited the prediction accuracy for small molecules. Recently introduced large databases of small-molecule retention times make reliable machine learning–based predictions possible across the whole diversity of compounds. Applying simple projections may extend these predictions to various LC systems and conditions. In our work, we describe a combined approach to predicting retention times for nano-HPLC that includes the consecutive deployment of binary and regression gradient boosting models trained on the METLIN small-molecule dataset and a simple projection of the results onto nano-HPLC separations using a small number of easily available compounds. The proposed model outperforms previous attempts to use machine learning for predictions with a 46-s mean absolute error. After transfer to nano-LC conditions, the median absolute error is below 155 s and the median relative error is below 10.8%. To illustrate the applicability of the described approach, we eliminated, on average, 25 to 42% of false positives using a filter threshold derived from ROC curves. Thus, the proposed approach should be used in addition to other well-established in silico methods, and their integration may broaden the range of correctly identified molecules.
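The projection and filtering steps mentioned in the abstract can be illustrated with a minimal sketch. The abstract only says "simple projection", so the straight-line least-squares fit below is an assumption, as are all function names, the example retention times, and the 60-s threshold (the paper derives its threshold from ROC curves). The gradient boosting models themselves are not reproduced here; the sketch starts from retention times they would have predicted.

```python
from statistics import mean, median


def fit_projection(predicted_rt, measured_rt):
    """Fit measured = slope * predicted + intercept by ordinary least squares.

    predicted_rt: model predictions for the calibration compounds (s)
    measured_rt:  retention times of the same compounds on the target system (s)
    A linear form is an assumption; the abstract does not specify it.
    """
    mx, my = mean(predicted_rt), mean(measured_rt)
    sxx = sum((x - mx) ** 2 for x in predicted_rt)
    sxy = sum((x - mx) * (y - my) for x, y in zip(predicted_rt, measured_rt))
    slope = sxy / sxx
    return slope, my - slope * mx


def project(predicted_rt, slope, intercept):
    """Map predicted retention times onto the target LC system."""
    return [slope * x + intercept for x in predicted_rt]


def median_errors(projected_rt, measured_rt):
    """Return (median absolute error in s, median relative error as a fraction)."""
    abs_err = [abs(p - m) for p, m in zip(projected_rt, measured_rt)]
    rel_err = [e / m for e, m in zip(abs_err, measured_rt)]
    return median(abs_err), median(rel_err)


def keep_candidates(projected_rt, observed_rt, threshold_s):
    """Flag candidates whose projected RT lies within threshold_s of the observed peak."""
    return [abs(p - observed_rt) <= threshold_s for p in projected_rt]


# Hypothetical calibration compounds (illustrative numbers, not from the paper).
slope, intercept = fit_projection([100, 200, 300, 400], [250, 450, 650, 850])

# Two hypothetical candidate structures for one peak observed at 360 s.
candidates = project([150, 500], slope, intercept)
print(keep_candidates(candidates, observed_rt=360.0, threshold_s=60.0))
```

In this sketch a candidate is rejected when its projected retention time falls outside the tolerance window around the observed peak, which is one plausible way a retention time filter could remove false-positive annotations.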

Original language: English
Pages (from-to): 7767-7776
Number of pages: 10
Journal: Analytical and Bioanalytical Chemistry
Issue number: 28
Publication status: Published - 1 Nov 2020


  • Machine learning
  • Nano-HPLC
  • Retention time prediction

