SEA Currents September 26th, 2017

MEDLINE Character Set Expansion

Posted on August 27th, 2010 in: All Posts

From the NLM Technical Bulletin:

By J. Shore, Index Section

Shore J. MEDLINE Character Set Expansion. NLM Tech Bull. 2010 Jul-Aug;(375):e13.

Since the inception of MEDLINE, NLM has limited the characters used to those typed from a standard US keyboard and a small set of frequently used diacritics (see this character set at Limited MEDLINE/PubMed Character Set).

Starting in early September 2010, NLM will accept for newly created MEDLINE records any UTF-8 character in the Latin (Roman) and Greek scripts as well as mathematical and other symbols commonly found in biomedical literature. Other scripts such as Chinese, Japanese, or Korean are not supported (see MEDLINE/PubMed Character Set for the expanded character set).

The most notable difference is the addition of Greek characters to the database. Previously, NLM spelled out Greek letters, for example, replacing β (Unicode 03B2) with beta. PubMed users are now able to search for these characters either by copying and pasting the text from an online source or by spelling out the letter as they always have done. Both approaches retrieve the same set of citations.
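As a rough illustration of why both search approaches can retrieve the same citations, the sketch below folds Greek letters to their spelled-out names before comparison. This is not NLM's implementation; the function name `spell_out_greek` is hypothetical, and the approach of deriving the name from Python's `unicodedata` character names is an assumption made for the example.

```python
import unicodedata


def spell_out_greek(text: str) -> str:
    """Replace Greek letters with their spelled-out names (hypothetical helper).

    Illustrative only: it relies on Unicode character names such as
    'GREEK SMALL LETTER BETA' and lowercases the result, so the case of
    the original letter is not preserved.
    """
    pieces = []
    for ch in text:
        name = unicodedata.name(ch, "")
        if name.startswith(("GREEK SMALL LETTER ", "GREEK CAPITAL LETTER ")):
            # Keep the part after 'LETTER', drop accent suffixes such as
            # 'WITH TONOS', and take the last word ('FINAL SIGMA' -> 'sigma').
            base = name.split(" LETTER ", 1)[1].split(" WITH ")[0]
            pieces.append(base.split()[-1].lower())
        else:
            pieces.append(ch)
    return "".join(pieces)


def same_search_key(a: str, b: str) -> bool:
    """Treat two query strings as equivalent if they match after folding."""
    return spell_out_greek(a).lower() == spell_out_greek(b).lower()
```

Under these assumptions, `same_search_key("TGF-β1", "TGF-beta1")` is true, which mirrors the behavior described above: the pasted character and the spelled-out letter lead to the same search.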

NLM will continue to standardize some characters:

All instances that represent a Double Quote will be translated to the straight double quote " (Unicode 0022).

All instances that represent a Single Quote (this includes prime and apostrophe) will be translated to the straight single quote ' (Unicode 0027).

Em Dash, En Dash, Hyphen, or Minus will be translated to the single dash - (Unicode 002D).
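The three standardization rules above can be sketched as a simple character translation table. This is a minimal illustration, not NLM's actual code: the function name `standardize_punctuation` is hypothetical, and the specific source code points listed for each rule are an assumed, non-exhaustive selection.

```python
# Assumed (non-exhaustive) sets of characters covered by each rule.
DOUBLE_QUOTES = "\u201C\u201D\u201E\u201F"              # curly and low double quotes
SINGLE_QUOTES = "\u2018\u2019\u201A\u201B\u2032\u02BC"  # curly quotes, prime, modifier apostrophe
DASHES = "\u2014\u2013\u2010\u2212"                     # em dash, en dash, hyphen, minus sign

# Map every variant to the target character named in the rules above.
_TABLE = str.maketrans({
    **{ch: "\u0022" for ch in DOUBLE_QUOTES},  # straight double quote
    **{ch: "\u0027" for ch in SINGLE_QUOTES},  # straight single quote
    **{ch: "\u002D" for ch in DASHES},         # single dash (hyphen-minus)
})


def standardize_punctuation(text: str) -> str:
    """Apply the quote and dash standardization to a string (hypothetical helper)."""
    return text.translate(_TABLE)
```

For example, `standardize_punctuation("5\u2032\u20133\u2032")` turns a typographic 5′–3′ into the plain string `5'-3'`, while Greek letters and other characters outside the table pass through unchanged.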

See Diacritics in PubMed Displays and Searching for additional information.


Funded under cooperative agreement number UG4LM012340 with the University of Maryland, Health Sciences and Human Services Library, and awarded by the DHHS, NIH, National Library of Medicine.

NNLM and NATIONAL NETWORK OF LIBRARIES OF MEDICINE are service marks of the US Department of Health and Human Services.