Geospatial Knowledge in Real Estate Listings: Extracting and Localizing Uncertain Spatial Information from Text.
Spatial information appears in many unstructured textual documents, such as travel blogs, social media posts during emergencies, and Real Estate advertisements, and it can be very difficult to extract and localize. Digital gazetteers are usually used to match geospatial objects to their boundaries, but they may be incomplete. Indeed, people often use spatial expressions with toponyms (e.g., "West of Nice, France", "Nearby the Promenade des Anglais"), place types instead of toponyms (e.g., "Next to the beach"), or local and unofficial place names (e.g., "La Banane in Cannes") that are not found in official gazetteers. For example, Real Estate professionals often exaggerate the boundaries of a popular, well-reputed place, since location is one of the most important factors in a purchase. Thus, a number of studies have proposed enriching gazetteers by estimating and representing vernacular places. However, only a few approaches have taken into account vague spatial expressions such as "nearby" and places without toponyms (e.g., "the university"). In our work, we propose an automatic workflow that extracts spatial information from Real Estate advertisements and retrieves an approximate location for uncertain places in order to enrich geographic gazetteers.
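To make the kinds of expressions listed above concrete, the following is a minimal illustrative sketch, not the workflow proposed in this work. It assumes a small hand-written vocabulary of spatial relations and place types (`SPATIAL_RELATIONS`, `PLACE_TYPES`, and `extract_spatial_expressions` are hypothetical names introduced here) and uses a simple pattern match to separate expressions anchored on a toponym candidate from those anchored on a place type.

```python
import re
from typing import List, NamedTuple

# Hypothetical, deliberately tiny vocabularies chosen for illustration only.
SPATIAL_RELATIONS = ["west of", "east of", "north of", "south of",
                     "nearby", "next to", "close to"]
PLACE_TYPES = {"beach", "university", "station", "harbour"}


class SpatialExpression(NamedTuple):
    relation: str  # normalized (lowercased) spatial relation
    target: str    # word(s) the relation refers to
    kind: str      # "toponym", "place_type", or "unknown"


_RELATIONS = "|".join(re.escape(r) for r in SPATIAL_RELATIONS)

# A relation (case-insensitive), an optional article, then either a run of
# capitalized words (toponym candidate) or a single lowercase word.
_PATTERN = re.compile(
    rf"\b(?P<relation>(?i:{_RELATIONS}))\s+"
    r"(?:(?i:the)\s+)?"
    r"(?P<target>[A-Z][\w-]*(?:\s+[A-Z][\w-]*)*|[a-z]+)"
)


def extract_spatial_expressions(text: str) -> List[SpatialExpression]:
    """Return the relation/target pairs found in `text`, in reading order."""
    found = []
    for match in _PATTERN.finditer(text):
        relation = match.group("relation").lower()
        target = match.group("target")
        if target.lower() in PLACE_TYPES:
            kind = "place_type"
        elif target[0].isupper():
            kind = "toponym"
        else:
            kind = "unknown"
        found.append(SpatialExpression(relation, target, kind))
    return found


if __name__ == "__main__":
    ad = "Bright flat West of Nice, next to the beach and nearby the university."
    for expression in extract_spatial_expressions(ad):
        print(expression)
```

A toponym candidate found this way could then be looked up in a gazetteer, whereas a place type such as "the beach" has no gazetteer entry and would need to be resolved from context. This sketch does not attempt that localization step.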