A graph-based community detection approach for identifying the semantic neighbourhoods within London’s Airbnb properties
Keywords: Neighbourhoods, Natural Language Processing, Graphs, Large Language Models
Abstract. Neighbourhoods are a fundamental unit for organising, analysing, and understanding urban systems. While they can be described in administrative terms, subjective boundaries often better capture lived urban experience. The natural language people use to describe where they live provides one signal of these boundaries. We present a spatio-semantic approach to neighbourhood identification, recovering neighbourhood partitions from geo-tagged natural language descriptions of 40,346 London Airbnb listings. We embed descriptions using a Large Language Model (LLM)-based embedding model and construct a weighted k-nearest-neighbour (kNN) graph that integrates geographic proximity and semantic similarity between properties. Leiden community detection on this graph yields spatially contiguous neighbourhood partitions, which we validate against three indicators of urban structure: functional and commercial concentration via amenity distribution, accessibility and urban connectivity via transit structure, and social composition via socio-economic patterning. While these indicators are not an exhaustive representation of urban characteristics, they provide an interpretable basis against which our spatio-semantic partitions can be assessed. Communities align strongly with amenity structure, with point-of-interest (POI) density higher in community cores than peripheries in 91.9% of cases, and align moderately with socio-economic structure (global normalised mutual information, NMI = 0.193). We also demonstrate qualitative alignment between transit structure and identified partitions. An ablation study shows that semantic information improves amenity alignment substantially more than socio-economic alignment, consistent with the leisure- and tourism-oriented content of listing descriptions.
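To make the graph construction step concrete, the following is a minimal sketch of one way a weighted kNN graph could blend geographic proximity and semantic similarity. It is not the authors' implementation. The function names (`haversine_km`, `cosine`, `build_knn_graph`), the mixing parameter `alpha`, the distance scale `scale_km`, the choice to select neighbours by geographic distance, the exponential decay used for proximity, and the toy coordinates and two-dimensional "embeddings" are all assumptions made for illustration.

```python
import math


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def build_knn_graph(coords, embeddings, k=2, alpha=0.5, scale_km=1.0):
    """Return {(i, j): weight} with i < j for an undirected weighted kNN graph.

    Assumptions (hypothetical, for illustration only):
    - each listing is linked to its k geographically nearest listings;
    - geographic proximity is exp(-distance / scale_km), which lies in (0, 1];
    - semantic similarity is cosine similarity clipped at 0;
    - the edge weight is alpha * semantic + (1 - alpha) * geographic.
    """
    n = len(coords)
    edges = {}
    for i in range(n):
        # Brute-force neighbour search; fine for a toy example, not for 40,346 listings.
        nearest = sorted(
            (haversine_km(coords[i], coords[j]), j) for j in range(n) if j != i
        )[:k]
        for dist, j in nearest:
            geo = math.exp(-dist / scale_km)
            sem = max(0.0, cosine(embeddings[i], embeddings[j]))
            edges[(min(i, j), max(i, j))] = alpha * sem + (1 - alpha) * geo
    return edges


# Toy data: two pairs of nearby listings in different parts of London,
# with made-up 2-D vectors standing in for description embeddings.
coords = [(51.5265, -0.0780), (51.5270, -0.0775),
          (51.4990, -0.1930), (51.5000, -0.1920)]
embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]

edges = build_knn_graph(coords, embeddings, k=1)
for (i, j), w in sorted(edges.items()):
    print(f"{i} -- {j}: weight {w:.3f}")
```

The resulting weighted edge list could then be passed to a community detection routine; in practice the Leiden step would come from a dedicated library rather than be reimplemented, and the brute-force neighbour search would be replaced by a spatial index.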