NEWREF(1)                   OGMP SEQUENCE UTILITIES                  NEWREF(1)

NAME
       newref - find new medline references through entrez

SYNOPSIS
       newref [-u] [-n number] [-c config] [-i | -p range | -f range | -l]
              [-C cache_file]

DESCRIPTION
       Newref is a script used to keep up with the constant flow of
       publications. It accesses the entrez subset of the medline database
       and reports all new articles that match the criteria specified in a
       configuration file (see below). After each invocation, newref saves
       the list of documents it has displayed in a cache file so that future
       invocations can determine what is new.

   Config_file:
       Newref takes as input a configuration file that restricts which
       articles to search for. A simple configuration file might look like
       this:

              # A comment
              SPECIFICATION = New articles on horses
              TITLE = horses
              END

       Every line in the configuration file that starts with "#" is a
       comment and is ignored by newref. Blank lines are also ignored.

       The SPECIFICATION line summarizes what kind of articles newref will
       fetch from the database. It has the syntax:

              SPECIFICATION = < Free text describing the query >

       Articles to be fetched are specified by defining (restricting) at
       least one of the fields:

              "AUTHOR" "DATE" "ECNUMBER" "GENE" "JOURNAL" "MESH"
              "SUBSTANCE" "TEXT" "TITLE"

       In the example above, the articles are restricted to those that have
       the word "horses" in their title.

       Every query ends with a line containing the single word "END".

   More complex queries:
       It is possible to define more than one field in the same query, as
       in:

              # A comment
              SPECIFICATION = New "horses" articles
              AUTHOR = rioux
              TITLE = horses
              END

       which further restricts the previous search for articles on horses
       to those that were written by any author whose last name is rioux.

       More elaborate restrictions on a field can be formulated by using
       the operators ",", "|", "!" and "*". The comma stands for "AND", the
       "|" stands for "OR", the "!"
       stands for "NOT"; when a term of the restriction ends with "*",
       nclever truncation mode is applied to perform the query (see the
       nclever manpage).

       Examples:

              TITLE = do*
                     searches for articles whose title contains a word
                     that starts with "do" (as in "dog", "donkey", ...)

              TITLE = dog, cat
                     means articles whose title contains "dog" AND "cat"

              TITLE = dog | cat
                     means articles whose title contains "dog" OR "cat"

              TITLE = dog ! cat
                     means articles whose title contains "dog" but NOT
                     "cat"

       There are a few things to consider when building these more complex
       queries:

       1. Spaces in the vicinity of the operators ",", "!" and "|" are
          ignored.

       2. "|" has higher precedence than ",", so a query like:

              TITLE = dog, cat | giraf

          means articles whose title contains "dog" AND either "cat" OR
          "giraf". It does not mean articles whose title contains "dog" AND
          "cat", or articles whose title contains "giraf".

       3. In a given restriction, neither "," nor "|" can be used after a
          "!" operator (but more than one "!" can be used). So, for
          example, this restriction is not valid:

              TITLE = dog ! cat | giraf

          since there is a "|" operator used after the "!". This is legal
          though:

              TITLE = dog, cat | giraf ! lions ! mo* ! tiger

       4. Restrictions can span more than one line, in which case the
          operator "\" is used to indicate a line continuation. The
          previous TITLE restriction could thus be written as:

              TITLE = dog, cat | \
                      giraf ! lions \
                      ! \
                      mo* ! \
                      tiger

       As a final example, here is a more elaborate configuration file:

              # Sample newref configuration file.
              #
              # First article specification. It searches all articles that
              # have "horse" or "horses" in the title.
              SPECIFICATION=Articles that talk about horses
              TITLE=horse|horses
              END

              # Second article specification. Articles on gene cat (not on felines!),
              # by a particular author (Tim Littlejohn)
              SPECIFICATION=Papers by Tim about the cat gene
              AUTHOR=Littlejohn TG
              GENE=cat
              END

              # Third article specification.
              # Articles written by rioux but not gras
              # that were published from 1993 to 1996
              SPECIFICATION = rioux' papers w/o gras
              AUTHOR = rioux ! gras
              DATE = 1993 | 1994 | \
                     1995 | 1996
              END

       -----

       The default configuration file name is ~/.newrefrc, but any other
       name can be specified via the -c option. Newref aborts its execution
       if it finds any error in the configuration file. Otherwise, it tries
       to access the entrez medline database to seek new references.

       All the references that newref has already displayed are kept in the
       file ~/.newref_cache, indexed by medline uid. Only the medline uids
       are kept in the cache file, whereas the abstracts are printed to the
       screen. Newref's output can also be redirected with the shell
       operators ">" or ">>" in order to save the list of new abstracts in
       a file.

       By default, newref outputs the abstracts of all the articles it has
       found that were not in the newref cache file. If the -u option is
       used, newref outputs only the new medline uids instead of the
       complete abstracts. It then updates the cache file with the new
       items displayed.

       By default, 100 items (uids or abstracts, depending on the -u
       switch) are printed on the screen. The -n switch changes how many
       items are printed.

OPTIONS
       -c config_file
              Use config_file instead of the default config file
              ~/.newrefrc.

       -C cache_file
              Use cache_file instead of the standard cache file
              ~/.newref_cache.

       -u     Print only the new medline uids (instead of the complete
              abstracts) on output.

       -n number
              Print number items instead of the default 100.

       -l     List all the specs, with their associated ids (numbers).

       -f range
              Flush the cache file for the specs in the range specified
              (the seen references for these specs are erased from the
              cache).

       -p range
              Seek new references, but only for the specs whose ids are
              specified in range (this also updates the cache
              accordingly).

       -i     Run newref in interactive mode.

       Note that a valid range is a set of comma- or dash-separated
       numbers, as in:

              1,2,3,4
              1-4
              1-1,2,3,7-9
              9

       No blanks are allowed in a valid range specification.

FILES
       /share/supported/apps/ogmp/bin/nclever
              The tool used to access the entrez database.

BUGS
       nclever cannot reload very large lists of MEDLINE UIDs (the exact
       size at which the problem starts to occur is not known, but it has
       been seen to fail at 22000 UIDs). The problem seems to lie in the
       NCBI toolkit's Entrez libraries used in compiling nclever.

SEE ALSO
       nclever(1), entrez(5).

AUTHOR
       Gertraud Burger (conception, testing)
       Pierre Rioux (project management)
       Nicolas Brossard (prog. design, manpage)

MODIFICATIONS LOG
       * Revision at 2004/01/20 by Liusong Yang
              nclever is no longer used to perform the query. Instead,
              newref builds a query URL for the Entrez database from the
              base URLs:
              http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=search
              and
              http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=retrieve

       * Revision at 2004/03/26 by Liusong Yang
              The URL
              www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=retrieve
              is replaced by
              www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=text
              to retrieve results in text format directly.
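The range syntax accepted by -f and -p (comma- or dash-separated numbers, no blanks) can be illustrated with a small Python sketch. This is not newref's own code: the function name parse_range is hypothetical, and rejecting descending ranges such as "9-7" is an assumption, since the manpage does not say how newref treats them.

```python
import re
from typing import List


def parse_range(spec: str) -> List[int]:
    """Expand a range such as "1-1,2,3,7-9" into a sorted list of spec ids.

    Hypothetical helper for illustration; not part of newref itself.
    """
    if not spec or any(ch.isspace() for ch in spec):
        raise ValueError("empty range or blanks in range: %r" % spec)
    ids = set()
    for part in spec.split(","):
        # Each element is either a single number or "low-high".
        if not re.fullmatch(r"[0-9]+(?:-[0-9]+)?", part):
            raise ValueError("bad range element: %r" % part)
        low, dash, high = part.partition("-")
        start = int(low)
        end = int(high) if dash else start
        if end < start:
            # Assumption: descending ranges are treated as errors.
            raise ValueError("descending range: %r" % part)
        ids.update(range(start, end + 1))
    return sorted(ids)


if __name__ == "__main__":
    for example in ("1,2,3,4", "1-4", "1-1,2,3,7-9", "9"):
        print(example, "->", parse_range(example))
```

Duplicates are collapsed, so "1-1,2,3,7-9" names the specs 1, 2, 3, 7, 8 and 9 once each.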
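The operator rules for field restrictions given under DESCRIPTION ("|" binds tighter than ",", and only further "!" terms may follow a "!") can be sketched in Python as follows. The names parse_restriction, term_hits and matches are hypothetical, and the word-by-word title matching is only an illustration of the semantics: the real matching is performed by the Entrez query, not by newref. Requiring at least one term before the first "!" is also an assumption.

```python
from typing import List, Tuple


def parse_restriction(text: str) -> Tuple[List[List[str]], List[str]]:
    """Split a restriction into AND-ed groups of OR-ed terms, plus NOT terms.

    Hypothetical helper for illustration; not part of newref itself.
    """
    positive, *excluded = [piece.strip() for piece in text.split("!")]
    for term in excluded:
        # Rule 3: no "," or "|" may appear after a "!".
        if "," in term or "|" in term:
            raise ValueError('"," or "|" used after "!": %r' % term)
    # Rule 2: "|" binds tighter than ",", so split on "," first.
    and_groups = [[t.strip() for t in group.split("|")]
                  for group in positive.split(",")]
    if "" in excluded or any(not t for g in and_groups for t in g):
        raise ValueError("empty term in restriction: %r" % text)
    return and_groups, excluded


def term_hits(term: str, words: List[str]) -> bool:
    """A term ending in "*" matches any word starting with its prefix."""
    term = term.lower()
    if term.endswith("*"):
        return any(word.startswith(term[:-1]) for word in words)
    return term in words


def matches(title: str, and_groups: List[List[str]],
            excluded: List[str]) -> bool:
    """Apply the parsed restriction to the whitespace-separated title words."""
    words = title.lower().split()
    return (all(any(term_hits(t, words) for t in group)
                for group in and_groups)
            and not any(term_hits(t, words) for t in excluded))


if __name__ == "__main__":
    groups, nots = parse_restriction(
        "dog, cat | giraf ! lions ! mo* ! tiger")
    print(groups, nots)
    print(matches("A dog and a giraf", groups, nots))
    print(matches("A dog a cat and a mouse", groups, nots))
```

Splitting on "!" first, then on ",", then on "|" mirrors the precedence stated in the numbered rules, so no general expression parser is needed for this grammar.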