Predicting scientific impact at an early stage has the potential to accelerate scientific discovery by indicating which concepts, papers, or authors researchers should pay attention to. In this project, we are developing an integrated system, DETAiLS (Discovering and Explaining Technical Emergence through Analysis of the Language and Structure of Scientific Publications), that is based on indicator theory; it identifies indicators in full-text scientific documents and their associated meta-data and uses them to predict scientific impact. Our research features a suite of conceptual and NLP tools specifically tailored to extracting indicators that can be used for determining impact. We are exploring features that represent domain-specific concept detection, the structure of journal articles, relations drawn from articles, and time-series analysis over these features. We are also exploring meta-data features based on citation networks and author networks.
The system we have developed can predict the future scientific impact of terms, authors, or papers. We have compared the relative usefulness of the textual content of scientific articles with that of the meta-data associated with them. Our experiments are carried out on 3.8 million full-text scientific articles published by Elsevier and their meta-data. Our results show that full-text features are better predictors than meta-data alone, but that the combination outperforms both.