Author: Ana Trisovic
This study provides an extensive analysis of artificial intelligence (AI) integration within the academic sciences, specifically examining the adoption, use, and customization of open-source AI foundation models. We manually collected and analyzed data on more than 1,000 AI foundation models, documenting characteristics such as model size, institution of origin, level of openness (ranging from fully open-source to restricted access), training data, and software availability. The resulting dataset reflects the landscape of AI resources available to researchers.
The model data are complemented by a dataset of almost 100,000 open-access academic papers that cite these models, retrieved from Semantic Scholar, which we use to investigate scholarly engagement with AI in research publications. Using large language models, namely Llama 3.1 and GPT-4o, we categorize the use of foundation models in academia into three main applications: the development of novel AI technologies, the customization of existing models, and the employment of AI as a routine tool in scientific methodology.