Workshop on the Analysis of Geographic References

May 31 2003 NAACL-HLT 2003 Edmonton, Alberta

Organizing Committee

Chairs: Andras Kornai (Metacarta Inc.)
Beth Sundheim (US Navy SPAWAR Systems Center)

Members: Doug Appelt (SRI)
Merrick Lex Berman (Harvard Yenching Institute)
Sean Boisen (BBN Verizon)
Quintin Congdon (Army NGIC)
James Cowie (NMSU/CRL)
Linda Hill (Univ. California, Santa Barbara)
Doug Jones (MIT Lincoln Labs)
George Wilson (Mitre)


DARPA Translingual Information Detection, Extraction and Summarization (TIDES)
ARDA Advanced Question Answering for Intelligence (AQUAINT)


The primary focus of the workshop is to discuss how existing NLP techniques can be adapted and new ones developed that will advance core technology in geographic reference analysis. Also of interest are studies of geographic reference issues that arise from work on applications such as question-answering, multidocument summarization, cross-language information retrieval, multidocument information extraction, and "first-story detection" in streams of broadcast news.

Important dates

• Two-page extended abstracts (750 words max) in plain ASCII (PDF only if there is compelling graphical content) were due March 15 2003, electronic submissions only. CLOSED, NO FURTHER SUBMISSIONS PLEASE. Acceptance notifications went out March 21 2003.
• Full papers were due April 10. CLOSED. For detailed instructions please check here.
• HLT-NAACL early registration ends April 26
• Workshop May 31


Effective analysis of textual references to places is a critical core technology for a wide variety of NLP applications. This workshop focuses on topics concerning the recognition, disambiguation, normalization, storage, and display of geographic references, e.g., "New York", "Nueva York", "LaGuardia Airport", "LaGuardia", "[the] Brooklyn Bridge", "a mile from downtown Manhattan", "the southern tip of Manhattan Island", "the Amazon delta", "the San Diego-Tijuana border".

Many place names are lexically ambiguous with respect to their physical location -- "Orange" as a county in either California or Florida, for example -- and sometimes also with respect to type -- "New York" as city or state in the U.S. Such names are relatively straightforward to normalize, though there is room for discussion about establishing some common normalization practices. Other forms of expression are more difficult to normalize, e.g., references to vague areas such as "the Amazon delta" and relative locations such as "a mile south of the village".

To respond to the challenges of providing accurate analyses in broad domains and across languages, and useful information on subjects for which there is sparse training data, advances in core technology are needed that can bring to bear both lexical and spatial background knowledge about places worldwide. For most applications, it is not enough for the system to bracket a text string and tag it as a "LOCATION"; it is necessary to normalize the information in a way that specifically describes or even uniquely identifies the place in question.

Since texts may contain place references without providing all the extra information needed to disambiguate them, the system needs background knowledge in some form or another that it can draw on to tell it about known names and their types and locations. One form of knowledge resource is a placename gazetteer, and there are large electronic gazetteers in existence that are publicly available. How can they be tailored and exploited to meet the needs of NLP?
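As an illustration of how a gazetteer might supply that background knowledge, the following is a minimal sketch, not a description of any existing system. The `Entry` record, the four-row `GAZETTEER`, and the `resolve` scoring rule are all hypothetical, and the coordinates are rough approximations. Candidates are looked up by name and ranked by whether their containing region is mentioned in the surrounding text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    """One hypothetical gazetteer record."""
    name: str
    feature_type: str  # e.g. "city", "county"
    container: str     # enclosing region
    lat: float         # approximate centerpoint latitude
    lon: float         # approximate centerpoint longitude


# Toy gazetteer; coordinates are rough and for illustration only.
GAZETTEER = [
    Entry("London", "city", "England", 51.51, -0.13),
    Entry("London", "city", "Ontario", 42.98, -81.25),
    Entry("Orange", "county", "California", 33.7, -117.8),
    Entry("Orange", "county", "Florida", 28.5, -81.3),
]


def candidates(name):
    """Return every gazetteer entry whose name matches, ignoring case."""
    return [e for e in GAZETTEER if e.name.lower() == name.lower()]


def resolve(name, context):
    """Rank candidates: entries whose container appears in the context come first.

    Ties keep gazetteer order because Python's sort is stable.
    """
    ctx = context.lower()
    scored = [(1 if e.container.lower() in ctx else 0, e) for e in candidates(name)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [entry for _, entry in scored]


best = resolve("London", "The flight from Toronto, Ontario landed in London.")[0]
print(best.container)
```

A real system would need far richer evidence than a single substring match (document source, nearby place names, population priors), but the sketch shows the basic shape: the text supplies the name, the gazetteer supplies the candidates, and context selects among them.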

Such resources may also contain foreign names in native script or in transliteration. These name forms are critically needed to support analysis in multilingual and cross-lingual settings. Recognition of the various ways that a given place may be referenced in one language is a challenging problem in and of itself, and issues of name translation and transliteration and special character sets multiply that problem.

Map-based visualization of the results of geographic reference analysis introduces the challenge of associating places with specific locations. Coordinates that identify a centerpoint of a place may be found in some gazetteers; in some cases, more extensive coordinates may be available that approximately define the boundaries of the place. Although it's obvious that such gazetteer information enables data visualization, it's also clear that it can sometimes be useful in the process of doing geographic reference analysis, e.g., to identify the set of states that are intended by a phrase such as "the states that neighbor California" or "the states on the coast of the Gulf of Mexico". Research that explores this topic further is of interest to the workshop.
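To make the "states that neighbor California" example concrete, here is a small sketch under stated assumptions: the `Box` type, the `BOXES` table, and both functions are hypothetical, and the bounding-box values are rough figures chosen for illustration rather than taken from any gazetteer. Two places are treated as candidate neighbors when their bounding boxes overlap or touch.

```python
from typing import NamedTuple


class Box(NamedTuple):
    """Axis-aligned bounding box in decimal degrees."""
    south: float
    west: float
    north: float
    east: float


# Rough, illustrative bounding boxes (not authoritative data).
BOXES = {
    "California": Box(32.5, -124.4, 42.0, -114.1),
    "Oregon": Box(42.0, -124.6, 46.3, -116.5),
    "Nevada": Box(35.0, -120.0, 42.0, -114.0),
    "Arizona": Box(31.3, -114.8, 37.0, -109.0),
    "Texas": Box(25.8, -106.6, 36.5, -93.5),
}


def boxes_touch(a, b):
    """True if the two boxes overlap or share an edge."""
    return (a.south <= b.north and b.south <= a.north
            and a.west <= b.east and b.west <= a.east)


def neighbors(name):
    """Names of all other places whose box overlaps or touches the given one."""
    target = BOXES[name]
    return sorted(other for other, box in BOXES.items()
                  if other != name and boxes_touch(target, box))


print(neighbors("California"))
```

Bounding boxes over-approximate real boundaries, so this test can report places that do not actually share a border. Boundary polygons, where a gazetteer provides them, would give a more reliable answer.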

Relevant topics

• Disambiguation of recognized terms and geographic references based on text evidence ("London" in Ontario v. "London" in England; "New York" as city v. "New York" as state): Special aspects of recognizing and characterizing place entities (i.e., methods, etc. that do not apply equally to processing other types of entities, e.g., persons and organizations)

• Usage of background knowledge (external gazetteers or other knowledge sources) to assist in analysis of textual references to places: Partial term matching; cross-gazetteer term matching; methods for weighing name, type and location evidence to identify best match

• Usage of results of text analysis to improve knowledge resources: detecting and filling gaps in coverage of names, types and containment/coordinate data; detecting need to update existing entries due to text-based evidence of changes in place names and/or characteristics (e.g., a commercial building that is converted to a church, a city that becomes the capital of a newly defined republic)

• Gazetteer localization and multilingual fill; spelling normalization and cross-language term matching

• Automatic categorization and description of place names: Definition of standard categorization schemes and mapping among schemes; automatic processes for coarse- and fine-grained category assignment, e.g., to distinguish at some level among functional (bridge, building, ...), geographical (cave, bay, ...) and administrative-political (county, province, ...) types of places

• Interpretation and representation of complex references, such as relative locations ("10 miles from Ankara"), vague areas ("the Amazon delta"), boundaries ("the San Diego-Tijuana border"), metonymies ("Green Bay" used in reference to a football team)

• Language analysis uses for data on absolute coordinates (bounding boxes, polygons, polylines, etc.) in gazetteers, e.g., understanding containment, border and neighbor relationships via knowledge of areas and boundaries

• Standards for annotation and data interchange
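One possible representation of a relative location such as "10 miles from Ankara" is sketched below. It is only one option among many and rests on assumptions: the `RelativeLocation` class and its tolerance parameter are hypothetical, the Ankara coordinates are approximate, and the Earth is treated as a sphere. The reference is stored as an anchor point plus a distance, and a candidate coordinate is checked against it with the haversine great-circle formula.

```python
import math
from dataclasses import dataclass

EARTH_RADIUS_KM = 6371.0  # mean radius, spherical approximation
KM_PER_MILE = 1.609344


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


@dataclass(frozen=True)
class RelativeLocation:
    """Hypothetical record for 'DISTANCE from ANCHOR' with no stated direction."""
    anchor_lat: float
    anchor_lon: float
    distance_km: float

    def is_consistent(self, lat, lon, tolerance_km=1.0):
        """True if the point lies within tolerance of the stated distance from the anchor."""
        d = haversine_km(self.anchor_lat, self.anchor_lon, lat, lon)
        return abs(d - self.distance_km) <= tolerance_km


# "10 miles from Ankara"; anchor coordinates are approximate.
ref = RelativeLocation(39.93, 32.86, 10 * KM_PER_MILE)
print(ref.is_consistent(39.93 + 0.145, 32.86))  # a point roughly 16 km due north
```

Because the phrase gives no direction, the representation describes a ring around the anchor rather than a single point. A phrase such as "a mile south of the village" would add a bearing constraint to the same structure.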


The workshop will be conducted in a roundtable format, and the program will include a mixture of paper presentations, invited talks, demos, and discussion periods.

Confirmed papers


Workshop schedule

Local information

There will be an overhead projector, a data projector and a screen. There will not be computers in the sessions. The presenters are expected to bring their own notebooks. If they won't have a notebook with them, but need one, they should contact other people in the same session and try to sort this out. The presenters should also have transparencies ready as backup. If anybody needs an easel or a posterboard, it can be arranged. The main conference will have an e-mail room with 5 PCs, 10 LAN wires and a wireless router. These facilities will be available from noon of May 27 to the afternoon of May 31.

Edmonton, Travel, Hotel, Registration

Hard copy

Soft copy will be made available on this website as soon as it is accepted. Hard copy proceedings will be available in Edmonton.
Page frozen June 6 2003