Extracting and Mapping Location Mentions From Texts To The Ground

7:38:00 PM
by Unknown

The meaning of a social media post can depend on location. For example, "The Main Street bridge is closed" is ambiguous until we establish which bridge is meant (the one in Danville, VA, or the one in Columbus, OH). At the same time, location metadata is sparse, so we must analyze the content and context of social media posts to disambiguate alternative mappings. This article describes some of the persistent challenges in recovering location information from content and context (text normalization, ambiguous location information, and geoparsing) and outlines the next steps of our research.

Consider the tweet in Figure 1. It contains valuable but implicit information for disaster response and flood modeling. Here the user reports the water level during a storm surge. Knowledge-based inference can enrich this claim to determine that the water level is around 3 meters (the height of a first floor)1. If we knew the location of Ganapathi colony, this quantitative data could inform a storm surge model to predict the direction of the surge and the danger it might pose.

Fig. 1: An example tweet from the Twitris campaign on the 2015 Chennai floods.

At the Kno.e.sis Center, one goal of our NSF-funded project (Social and Physical Sensing Enabled Decision Support) is to make information from social media accessible to first responders so that they can prioritize relief efforts. Mapping events to locations, so that ground-based information can be attached to points on the map, lets us achieve this goal. In contrast to physical sensors, the resolution of social media data is a function of population rather than specialized infrastructure. Where sensor/IoT coverage is lacking or sensors malfunction, citizen sensing can provide information about ground status to compensate for the missing data.

Using Twitris, Kno.e.sis’ robust semantic social web platform, which has been used in a number of real-world scenarios, we collect real-time, event-centric Twitter data to understand social perceptions. During the 2015 Chennai floods, we created a Twitris campaign that collected around 508K relevant tweets. Determining location is an important part of making these tweets informative. Location extraction and mapping is performed in two steps: toponym extraction and geoparsing.

Toponym Extraction

Toponym extraction is the process of extracting place names, such as street names, points of interest (POI), cities, and countries, from text. There are two traditional ways to extract toponyms: a supervised approach and a gazetteer-matching approach.

Supervised approaches. In the supervised approach, we train a model on manually annotated location mentions [1]. Supervised approaches tend to suffer from the underestimation of missing data. They also require a sufficient amount of annotated data from the same kind of source (e.g., microblog text) to detect locations in similar data. The gazetteer approach discussed next avoids the need for training data, but it has its own difficulties that must be solved to extract locations from texts efficiently.

Fig. 2: Syntactic Parse tree built using NLTK's cascaded chunk parser.

Gazetteer approaches. In the gazetteer approach, we extract location mentions on the fly without any training dataset. Gazetteer approaches often use syntactic parse trees (for noun phrase extraction), direction and distance markers, gazetteers, dictionaries, and other knowledge bases to extract locations from texts [2-4]. Figure 2 shows part of the parse tree that NLTK builds for the tweet text in Figure 3. Parsing the tweet text allows us to find noun phrases using NLTK’s cascaded chunk parser, which matches a set of predefined rules against the text. For example, the rule ( VP: {<VB.*><NP|PP|CLAUSE>+$} ) allows us to detect and extract the noun phrase (NP) “SRM university” inside the prepositional phrase (PP) introduced by “Near”.

Fig. 3: Tweet mentioning the toponyms “SRM university” and “kattankulathur”.
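The idea behind chunk rules can be illustrated without NLTK. The following is a minimal sketch, not the parser used in our pipeline: it assumes the tokens are already part-of-speech tagged (the tags below are assigned by hand), and it collects every run of determiner, adjective, or noun tags that directly follows a preposition (tag IN). The function name and the token sequence are invented for illustration; the tokens are not the actual text of the tweet in Figure 3.

```python
import re

# Tags that may appear inside a simple noun phrase (determiner, adjective, any noun).
NP_TAG = re.compile(r"DT|JJ|NN.*")


def noun_phrases_after_prepositions(tagged_tokens):
    """Return each noun phrase that directly follows a preposition (tag IN)."""
    phrases = []
    i = 0
    while i < len(tagged_tokens):
        if tagged_tokens[i][1] != "IN":
            i += 1
            continue
        # Collect the run of noun-phrase tags that follows the preposition.
        j = i + 1
        while j < len(tagged_tokens) and NP_TAG.fullmatch(tagged_tokens[j][1]):
            j += 1
        if j > i + 1:
            phrases.append(" ".join(token for token, _ in tagged_tokens[i + 1:j]))
        i = j
    return phrases


# Invented, hand-tagged token sequence (an assumption, not the real tweet).
tagged = [("Stranded", "VBN"), ("near", "IN"), ("SRM", "NNP"),
          ("university", "NN"), ("in", "IN"), ("kattankulathur", "NNP")]
print(noun_phrases_after_prepositions(tagged))
```

The extracted phrases are only candidates; they still have to be matched against a gazetteer before they can be treated as toponyms.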

Similarly, direction and distance markers allow us to retrieve the toponyms they point at. For example, in Figure 4, the direction marker “south of” points at the toponym “101 Fwy”, which is then added to our list of potential geo-parsable toponyms.

Fig. 4: Tweet mentioning a toponym (“101 Fwy”) pointed at by a direction marker
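Direction markers lend themselves to a simple pattern-based sketch. The example below is an illustration under stated assumptions rather than our actual extractor: it handles only the four cardinal markers followed by "of", and it treats the following run of capitalized or numeric tokens as the candidate toponym. The function name and the sample sentence are hypothetical; the sentence only reuses the phrases quoted from Figure 4.

```python
import re

# Direction marker (case-insensitive) followed by a run of capitalized or
# numeric tokens, which we take as the candidate toponym.
DIRECTION_PATTERN = re.compile(
    r"\b(?i:(?:north|south|east|west)\s+of)\s+"
    r"((?:[A-Z0-9][\w.]*)(?:\s+[A-Z0-9][\w.]*)*)"
)


def toponyms_after_direction_markers(text):
    """Return candidate toponyms that a direction marker points at."""
    return [match.rstrip(".") for match in DIRECTION_PATTERN.findall(text)]


# Hypothetical sentence built around the phrases quoted from Figure 4.
sample = "Brush fire south of 101 Fwy between Woodman Ave and Coldwater Canyon"
print(toponyms_after_direction_markers(sample))
```

Relying on capitalization is a deliberate simplification; tweets written entirely in lowercase would need the parse-based noun phrase extraction instead.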

Two challenges arise in toponym extraction: text normalization and ambiguous location information.

Text normalization. Text normalization involves subtasks such as expanding abbreviations and acronyms and correcting misspellings. Figure 5 shows a tweet with such difficulties. The author used “Rd” as an abbreviation of “road”. Moreover, the text “Kilpauk Garden” is incomplete relative to the gazetteer name “Kilpauk, Aspiran Garden Colony”2.

Fig. 5: An example tweet with abbreviations (Rd) and incomplete information (Kilpauk Garden).
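Abbreviation expansion can be sketched as a dictionary lookup over word tokens. The table below is a small, hand-written assumption (a real system would need a much larger, locale-aware list), and the function name and input string are hypothetical.

```python
import re

# Hand-written abbreviation table (an assumption for illustration only).
ABBREVIATIONS = {"rd": "Road", "ave": "Avenue", "fwy": "Freeway", "blvd": "Boulevard"}


def expand_abbreviations(text):
    """Replace known street-type abbreviations with their full forms."""
    def replace(match):
        word = match.group(0)
        # Look the word up without its trailing period and ignoring case.
        return ABBREVIATIONS.get(word.rstrip(".").lower(), word)

    return re.sub(r"\b[A-Za-z]+\.?", replace, text)


print(expand_abbreviations("New Avadi Rd, Kilpauk Garden"))
```

Some abbreviations are ambiguous ("St" can mean "Street" or "Saint"), which is why this sketch leaves them out; resolving them requires context rather than a lookup table.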

Locations can also be embedded in hashtags or usernames. For example, both @yankeestadium and #YankeeStadium refer to “Yankee Stadium”. Such location mentions can be extracted using a word segmentation (tokenization) method that uses a classifier over unigram and bigram language models of word frequencies to find word boundaries.
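To make the segmentation step concrete, here is a deliberately simplified sketch. It swaps the classifier over unigram and bigram models for a plain unigram model with dynamic programming, and the word counts are toy values invented for the example; a real system would estimate them from a large corpus. All names are hypothetical.

```python
import math
from functools import lru_cache

# Toy unigram counts (invented for illustration).
UNIGRAM_COUNTS = {"yankee": 50, "stadium": 80, "yank": 5,
                  "main": 120, "street": 200, "bridge": 90}
TOTAL = sum(UNIGRAM_COUNTS.values())


def word_logprob(word):
    """Log-probability of a word; unknown words are penalized by length."""
    count = UNIGRAM_COUNTS.get(word)
    if count is None:
        return math.log(1.0 / (TOTAL * 10 ** len(word)))
    return math.log(count / TOTAL)


def segment(tag):
    """Split a hashtag or username into its most probable word sequence."""
    text = tag.lstrip("#@").lower()

    @lru_cache(maxsize=None)
    def best(start):
        # Best (score, words) segmentation of text[start:].
        if start == len(text):
            return 0.0, ()
        candidates = []
        for end in range(start + 1, len(text) + 1):
            tail_score, tail_words = best(end)
            word = text[start:end]
            candidates.append((word_logprob(word) + tail_score, (word,) + tail_words))
        return max(candidates)

    return list(best(0)[1])


print(segment("#YankeeStadium"))
print(segment("@yankeestadium"))
```

The segmented words can then be rejoined and passed through the same gazetteer matching as any other candidate toponym.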

Ambiguous location information. Location information is not always explicit. The relative direction and distance content noted above hints at this problem. Consider the tweet in Figure 6 as a more challenging example:

Fig. 6: A tweet showing an ambiguous mention of a location.

In this example, a renowned author (Indian racing driver Karun Chandhok) is referring to his parents’ house. Ideally, the location of the house could be extracted from a knowledge base, and the extracted toponym “our house” would then be mapped to an absolute location name. This piece of information tells us that people are evacuating from his parents’ area. Another example of using a knowledge base (or location database) to identify a building name (Nariman House) is shown in Figure 2 of this article on Citizen Sensing.


Geoparsing

Geoparsing contrasts with geocoding, and both follow toponym extraction. Geocoding works with unambiguous location references, such as postal addresses, and specifies a location on Earth using coordinates (latitude and longitude). Geoparsing is similar but works with ambiguous location references in unstructured text (such as tweets).

Geoparsing can be performed through a gazetteer matching process that retrieves all the metadata of the matched location. OpenStreetMap, for example, provides the bounding box, the latitude and longitude, the full address, the class of the location (Map Features), and the full display name of the matched toponym. After a successful gazetteer match, this information pinpoints the toponym on the map and attaches the extracted metadata to it. The map in Figure 7 shows the mapped toponym from the tweet in Figure 5.

Fig. 7: Pinpointing the full location name of the extracted toponym from the tweet in Figure 5: “No. 17/7, New Avadi Road, Kilpauk, Aspiran Garden Colony, Kilpauk, Chennai, Tamil Nadu 600010, India”

A typical gazetteer matching task requires complex text normalization and missing data restoration. To overcome some of the difficulties posed by Twitter data, we typically use fuzzy text matching during toponym extraction, in addition to the text normalization process discussed earlier. To address the incompleteness of gazetteers, one or more additional knowledge bases can be combined with them. An example is a list of points of interest3 retrieved from an external data source.
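Fuzzy matching against gazetteer names can be sketched with Python's standard `difflib` module. This is one possible similarity measure, not necessarily the one used in our system; the three-entry gazetteer, the 0.6 cutoff, and the function names are assumptions chosen for illustration.

```python
import re
from difflib import SequenceMatcher

# Tiny stand-in gazetteer built from place names mentioned in this article.
GAZETTEER_NAMES = [
    "Kilpauk, Aspiran Garden Colony",
    "Ganapathi Colony",
    "Kattankulathur",
]


def normalize(name):
    """Lowercase a name and drop punctuation before comparison."""
    return re.sub(r"[^a-z0-9 ]+", "", name.lower()).strip()


def best_gazetteer_match(mention, names, cutoff=0.6):
    """Return (name, score) for the closest gazetteer name, or None."""
    query = normalize(mention)
    scored = [(SequenceMatcher(None, query, normalize(name)).ratio(), name)
              for name in names]
    if not scored:
        return None
    score, name = max(scored)
    return (name, score) if score >= cutoff else None


print(best_gazetteer_match("Kilpauk Garden", GAZETTEER_NAMES))
print(best_gazetteer_match("Yankee Stadium", GAZETTEER_NAMES))
```

The cutoff trades recall for precision: a lower value recovers more incomplete mentions but also admits more false matches, so it should be tuned on annotated data.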

Our research is also addressing the problem of disambiguation during geoparsing. If a toponym has many records in the gazetteer, the method should reasonably determine which location the tweet refers to. This problem includes the whole-part relationship (e.g., which section of a road or which campus of a university). Using context from the text, as in Figure 4, where the toponym “101 Fwy” is narrowed to “between Woodman Ave and Coldwater Canyon”, can help tremendously in solving such problems. We are currently investigating these problems and possible solutions.
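One simple way to use context is to score each candidate record by how many of its associated place names also appear in the tweet. The sketch below is a hypothetical baseline, not a result of our research: the record structure, the `context_terms` field, and the example records (based on the Main Street bridge example from the introduction) are all assumptions.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class GazetteerRecord:
    name: str
    context_terms: frozenset  # hypothetical: names of nearby or containing places


# Two invented records for the same toponym.
RECORDS = [
    GazetteerRecord("Main Street Bridge, Danville, VA", frozenset({"danville", "virginia"})),
    GazetteerRecord("Main Street Bridge, Columbus, OH", frozenset({"columbus", "ohio"})),
]


def disambiguate(tweet, candidates):
    """Pick the candidate whose context terms overlap the tweet the most.

    Returns None when no candidate has any overlap or when there is a tie.
    """
    if not candidates:
        return None
    tokens = set(re.findall(r"[a-z0-9]+", tweet.lower()))
    scored = [(len(tokens & record.context_terms), record) for record in candidates]
    best_score = max(score for score, _ in scored)
    winners = [record for score, record in scored if score == best_score]
    if best_score == 0 or len(winners) > 1:
        return None
    return winners[0]


match = disambiguate("The Main Street bridge is closed near downtown Columbus", RECORDS)
print(match.name if match else "ambiguous")
```

Returning None for ties keeps the sketch honest about what the text alone can resolve; other signals, such as user profile location or the cross streets mentioned in the tweet, would be needed in those cases.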


Toponym extraction and geoparsing require more than text normalization and the retrieval of unambiguous location names from text. The disaster relief scenario helps identify several important research challenges that have yet to be solved well, such as ambiguous location information and more advanced geoparsing disambiguation. The Kno.e.sis Center's mission, “Computing for Human Experience”, drives the recognition of these challenges while providing on-the-ground impact beyond lab implementations.

1 The issues of reliability and trustworthiness of the extracted information are relevant to our project but are not discussed here.
2 The correct location found using Google Maps goo.gl/AN9Gxr
3 Area-specific points of interest (for example, in Chennai) are typically businesses, hospitals, shopping malls, etc.


[1] Lingad, John, Sarvnaz Karimi, and Jie Yin. "Location extraction from disaster-related microblogs." In Proceedings of the 22nd international conference on World Wide Web companion, pp. 1017-1020. International World Wide Web Conferences Steering Committee, 2013.
[2] Gelernter, Judith, and Shilpa Balaji. "An algorithm for local geoparsing of microtext." GeoInformatica 17, no. 4 (2013): 635-667.
[3] Malmasi, Shervin, and Mark Dras. "Location mention detection in tweets and microblogs." In Computational Linguistics, Communications in Computer and Information Science, vol. 593, pp. 123-134. Springer, 2016.
[4] Middleton, Stuart E., Lee Middleton, and Stefano Modafferi. "Real-time crisis mapping of natural disasters using social media." Intelligent Systems, IEEE 29, no. 2 (2014): 9-17.
