Carlota Cardoso, Rita Sousa and Cátia Pesquita, LASIGE researchers, in collaboration with Sebastian Köhler (Monarch Initiative), present a collection of 21 benchmark data sets in a paper published in Database: The Journal of Biological Databases and Curation. The collection aims to circumvent the difficulties in building benchmarks for large biomedical knowledge graphs by exploiting proxies for biomedical entity similarity.
The ability to compare entities within a knowledge graph is a cornerstone technique for several applications, ranging from the integration of heterogeneous data to machine learning. It is of particular importance in the biomedical domain, where semantic similarity can be applied to the prediction of protein-protein interactions, associations between diseases and genes, and cellular localization of proteins, among others.
Database is a top-ranked journal (top 10% in Scimago and top 10 in Google Scholar’s Bioinformatics & Computational Biology category).
The paper is available here.