Mendeley’s open data for science and learning: a reply to the DataTEL challenge
Abstract
This paper describes in detail two open access data sources provided by Mendeley Ltd. The first is a snapshot of 50,000 user libraries, which enables researchers to test collaborative filtering algorithms for scientific paper recommendation. The second is Mendeley’s API, which provides access to 50 million research articles crowdsourced from over 1.5 million users. This API allows researchers to build third-party applications on top of real-time, large-scale data. By providing data sources for the study of collaborative filtering, metadata extraction, deduplication and related research, Mendeley hopes to help researchers in these domains collaborate and share knowledge more effectively, ultimately encouraging good research practices and advancing the state of scientific knowledge.


