How the Virtual Journal Works
The Signal Transduction Virtual Journal contains articles on signal transduction selected from a number of other journals available online through HighWire Press. These articles are not hand-picked by human experts; instead, they are selected by an automatic indexing process supplemented by editorial oversight.
Prior to June 2002, the indexing software chose articles by comparing words in their titles to terms learned from the titles and abstracts of thousands of articles previously classified by human indexers at the National Library of Medicine as pertaining to signal transduction. The Science Signaling editors checked the Virtual Journal for articles that had been included but were clearly irrelevant to signal transduction, and for signal transduction articles that had been omitted. This allowed us to amass a large database of Virtual Journal articles clearly identified as belonging to the discipline of signal transduction and to further refine the selection process.
The indexing software currently performs a linguistic analysis of the titles and abstracts of newly published articles and compares the phrases it finds with those in articles already included in the Virtual Journal. The program adds a new article to the Virtual Journal based on its perceived similarity to articles already established as belonging to the discipline of signal transduction, and it excludes articles that lack such similarity.
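The similarity test described above can be illustrated with a minimal sketch. It rests on assumptions that do not come from the Virtual Journal software: adjacent word pairs stand in for real linguistic phrase extraction, cosine similarity is the comparison, and the inclusion threshold of 0.2 is a hypothetical value. The names `extract_phrases`, `cosine`, and `should_include` are likewise illustrative.

```python
import re
from collections import Counter
from math import sqrt


def extract_phrases(text: str) -> Counter:
    """Crude stand-in for linguistic analysis: count adjacent word pairs."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return Counter(zip(tokens, tokens[1:]))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two phrase-count vectors (0.0 if either is empty)."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def should_include(candidate: str, members: list[str], threshold: float = 0.2) -> bool:
    """Include the candidate if it is similar enough to at least one existing member.

    The 0.2 threshold is a hypothetical value chosen for illustration.
    """
    cand = extract_phrases(candidate)
    best = max((cosine(cand, extract_phrases(m)) for m in members), default=0.0)
    return best >= threshold


# Hypothetical titles standing in for articles already in the Virtual Journal.
members = [
    "MAP kinase signaling pathway activation in T cells",
    "Receptor tyrosine kinase signaling pathway regulates growth",
]
print(should_include("Activation of the MAP kinase signaling pathway by growth factors", members))  # True
print(should_include("Crystal structure of a bacterial cellulose synthase", members))  # False
```

The first candidate shares three word pairs ("map kinase", "kinase signaling", "signaling pathway") with the first member, for a cosine similarity of about 0.38, so it clears the threshold; the off-topic title shares none and is excluded.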
Like the rest of us, the new indexing program is designed to "learn" from its mistakes. Manual correction of errors is used to "reeducate" the software and reduce the likelihood that it will make similar errors in the future. In effect, the Virtual Journal's software learns to recognize signal transduction articles by observing how humans do it.
How well does it work? You tell us. The Virtual Journal's semantic indexing software is under continuous improvement at AAAS. Because the system is still in development, it occasionally yields unexpected results (don't be too surprised if it includes a few articles that appear to have little to do with signal transduction). So please let us know if you see an irrelevant article, and also if you notice that a signal transduction-related paper is missing. By alerting us to the errors that occasionally slip past the editors, you will help us train the system to perform even more accurately in the future.
Dexter, the first version of the Virtual Journal indexing software (used from 1997 through May 2002), was based on an algorithm developed by Plaunt and Norgard. The algorithm operates in two modes: a learning mode, in which already-indexed content is analyzed, and an index mode, in which co-occurrence metrics computed in learning mode are used to predict index terms for new content. The original implementation of Dexter analyzed a set of 20,000 Medline records with associated MeSH terms. Single words in the title and abstract were used in learning mode; single words in the title were analyzed in index mode to predict MeSH terms and, consequently, whether the candidate article belonged in the Virtual Journal.
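To make the two modes concrete, here is a simplified sketch of a co-occurrence-based indexer in the spirit of Dexter. The association score it uses (the fraction of training records containing a word that were also assigned a given MeSH term) is an illustrative assumption, not the metric published by Plaunt and Norgard. The class name `AssociationIndexer`, the `top_k` cutoff, and the three training records are likewise hypothetical.

```python
import re
from collections import Counter, defaultdict


def tokenize(text: str) -> set[str]:
    """Single words, lowercased; each word counted once per record."""
    return set(re.findall(r"[a-z]+", text.lower()))


class AssociationIndexer:
    """Simplified two-mode indexer based on word/term co-occurrence.

    The score P(term | word) used below is an illustrative assumption,
    not the association metric published by Plaunt and Norgard.
    """

    def __init__(self) -> None:
        self.word_counts: Counter = Counter()  # number of records containing each word
        self.pair_counts: defaultdict = defaultdict(Counter)  # word -> term -> co-occurrences

    def learn(self, title: str, abstract: str, mesh_terms: list[str]) -> None:
        """Learning mode: use single words from both title and abstract."""
        for w in tokenize(title) | tokenize(abstract):
            self.word_counts[w] += 1
            for term in mesh_terms:
                self.pair_counts[w][term] += 1

    def predict_terms(self, title: str, top_k: int = 2) -> list[str]:
        """Index mode: use single words from the title only."""
        scores: Counter = Counter()
        for w in tokenize(title):
            n = self.word_counts[w]
            # Unseen words have no co-occurrences, so the inner loop never divides by zero.
            for term, c in self.pair_counts.get(w, {}).items():
                scores[term] += c / n
        return [term for term, _ in scores.most_common(top_k)]


# Hypothetical training records: (title, abstract, MeSH terms).
indexer = AssociationIndexer()
indexer.learn("Kinase cascade in yeast", "A kinase cascade transmits the pheromone signal",
              ["Signal Transduction", "Saccharomyces cerevisiae"])
indexer.learn("Receptor signaling in neurons", "Receptor activation triggers a kinase signal",
              ["Signal Transduction", "Neurons"])
indexer.learn("Bone density in older adults", "Bone density declines with age",
              ["Bone Density", "Aged"])

# In this sketch, an article qualifies if a signal transduction term is among its predicted terms.
predicted = indexer.predict_terms("Kinase signal in neurons")
print(predicted)  # ['Signal Transduction', 'Neurons']
print("Signal Transduction" in predicted)  # True
```

Note that, as in Dexter, the abstract contributes only during learning; at prediction time the title alone drives the scores.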
Dexter Jr., the current version of the indexing software (implemented June 2002), uses the current contents of the Virtual Journal as its database. Software from Semio, Inc. performs linguistic analysis of titles and abstracts based on noun phrases rather than individual words, and both titles and abstracts are used in index mode as well as in learning mode. The algorithm predicts set membership in the Virtual Journal directly, rather than predicting MeSH terms. Articles are indexed as "in the Virtual Journal", "out of the Virtual Journal", or "deleted from the Virtual Journal"; the editorial act of deleting an article decreases the likelihood that similar articles will be included in the future. The implementation is written in Java and uses a Sybase RDBMS for persistent data.
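The effect of the three labels can be sketched as a similarity-weighted vote. This is only an illustration under stated assumptions: noun phrases are taken as already extracted (Semio's extractor is not reproduced), similarity is Jaccard overlap of phrase sets, and the weights in the hypothetical `LABEL_WEIGHTS` table are invented, with deletions weighted most heavily so that one editorial deletion can outvote an equally similar included article.

```python
from dataclasses import dataclass

# Hypothetical weights: positive labels pull a candidate toward inclusion,
# negative labels push it away. Deletions count most heavily.
LABEL_WEIGHTS = {"in": 1.0, "out": -0.5, "deleted": -2.0}


@dataclass(frozen=True)
class LabeledArticle:
    phrases: frozenset[str]  # noun phrases from title and abstract, already extracted
    label: str  # "in", "out", or "deleted"


def jaccard(a: frozenset[str], b: frozenset[str]) -> float:
    """Overlap between two phrase sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def membership_score(candidate: frozenset[str], corpus: list[LabeledArticle]) -> float:
    """Similarity-weighted vote of every labeled article in the database."""
    return sum(LABEL_WEIGHTS[art.label] * jaccard(candidate, art.phrases) for art in corpus)


def predict_in_journal(candidate: frozenset[str], corpus: list[LabeledArticle]) -> bool:
    """Predict set membership directly, with no intermediate index terms."""
    return membership_score(candidate, corpus) > 0.0


candidate = frozenset({"map kinase", "phosphorylation", "growth factor"})
corpus = [LabeledArticle(frozenset({"map kinase", "phosphorylation", "t cell receptor"}), "in")]
print(predict_in_journal(candidate, corpus))  # True (score 0.5)

# An editor deletes a similar article judged irrelevant; similar candidates now score lower.
corpus.append(LabeledArticle(frozenset({"growth factor", "phosphorylation", "crop yield"}), "deleted"))
print(predict_in_journal(candidate, corpus))  # False (score 0.5 - 1.0 = -0.5)
```

This is one simple way to make a deletion lower the odds for similar articles; Dexter Jr.'s own weighting may differ.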
C. Plaunt, B. A. Norgard, An Association-Based Method for Automatic Indexing with a Controlled Vocabulary, Journal of the American Society for Information Science 49, 888-898 (1998).
Science Signaling. ISSN 1937-9145 (online), 1945-0877 (print). Pre-2008: Science's STKE. ISSN 1525-8882