The U.S. Department of Commerce’s National Institute of Standards and Technology (NIST) and the White House Office of Science and Technology Policy (OSTP) have just launched a joint effort to support the development of search engines for research that will help in the fight against COVID-19. The project was developed in response to the March 16 White House Call to Action to the Tech Community on New Machine Readable COVID-19 Dataset.
In this effort, NIST will work initially with the Allen Institute for Artificial Intelligence, the National Library of Medicine, Oregon Health & Science University (OHSU), and the University of Texas Health Science Center at Houston (UT Health). The team will apply the successful, long-running program of expert engagement and technology assessment called the Text Retrieval Conference, or TREC, to the COVID-19 Open Research Dataset (CORD-19), a resource of more than 44,000 research articles and related data about COVID-19 and the coronavirus family of viruses. The goals of the TREC-COVID program include creating datasets and using an independent assessment process that will help search engine developers evaluate and optimize their systems to meet the needs of the research and health-care communities.
The team will first release a series of sample queries for the biomedical research community, developed by team members at the National Library of Medicine, OHSU, and UT Health. Registered participants in TREC-COVID will use their information retrieval and search systems to run the queries against the CORD-19 document set and return their results to NIST. Biomedical experts will then review the test results, including document relevance rankings, to assess the overall performance of the retrieval systems.
Using proven TREC protocols, NIST will score the submissions and post the scores, the retrieval results themselves, and the lists of key reference documents to the TREC-COVID website. These “test collections” can then be used by information retrieval researchers to evaluate and enhance the performance of their own search engines. This effort is intended to help researchers understand how search systems could best support medical researchers when available information is developing quickly, as in the current pandemic.
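To make the scoring step concrete, the following is a minimal sketch of how a ranked list of search results might be compared against expert relevance judgments. It is an illustration only, not NIST's scoring procedure: the announcement does not say which measures TREC-COVID will report, so the two measures shown here (precision at k and average precision) are assumptions, and every function name, query ID, and document ID is hypothetical.

```python
from typing import Dict, List, Set


def precision_at_k(ranked: List[str], relevant: Set[str], k: int) -> float:
    """Fraction of the top-k returned documents that were judged relevant."""
    if k <= 0:
        raise ValueError("k must be positive")
    return sum(1 for doc_id in ranked[:k] if doc_id in relevant) / k


def average_precision(ranked: List[str], relevant: Set[str]) -> float:
    """Mean of the precision values at each rank where a relevant document appears."""
    if not relevant:
        return 0.0
    hits = 0
    total = 0.0
    for rank, doc_id in enumerate(ranked, start=1):
        if doc_id in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant)


def score_run(
    run: Dict[str, List[str]],
    judgments: Dict[str, Set[str]],
    k: int = 5,
) -> Dict[str, Dict[str, float]]:
    """Score every query in a submitted run against the relevance judgments."""
    scores = {}
    for query_id, ranked in run.items():
        relevant = judgments.get(query_id, set())
        scores[query_id] = {
            "precision_at_k": precision_at_k(ranked, relevant, k),
            "average_precision": average_precision(ranked, relevant),
        }
    return scores


# Hypothetical relevance judgments: query ID -> documents judged relevant.
judgments = {"q1": {"doc2", "doc5"}, "q2": {"doc9"}}

# Hypothetical submission: query ID -> documents in ranked order, best first.
run = {"q1": ["doc2", "doc7", "doc5", "doc1"], "q2": ["doc3", "doc9"]}

scores = score_run(run, judgments, k=2)
for query_id, result in scores.items():
    print(query_id, result)
```

Because the judgments are kept separate from any one system's output, the same judgments can score any number of submissions, which is what makes a test collection reusable by other researchers.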
The Allen Institute for Artificial Intelligence has been releasing an expanded CORD-19 document set each Friday to capture the most recent articles on COVID-19 and related coronaviruses. Later rounds of TREC-COVID will use the larger releases of CORD-19 and expanded query sets. Participants will have one week to submit their search results, and NIST will post results about a week later, for an expected spacing of about two weeks between rounds. The team initially anticipates conducting five consecutive rounds of search system assessments. Interested organizations are invited to register to participate in the TREC-COVID program on the NIST website.