<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Publishing DTD v3.0 20080202//EN" "https://jats.nlm.nih.gov/nlm-dtd/publishing/3.0/journalpublishing3.dtd">
<article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" article-type="research-article" dtd-version="3.0" xml:lang="en">
<front>
<journal-meta>
<journal-id journal-id-type="publisher">AGILE-GISS</journal-id>
<journal-title-group>
<journal-title>AGILE: GIScience Series</journal-title>
<abbrev-journal-title abbrev-type="publisher">AGILE-GISS</abbrev-journal-title>
<abbrev-journal-title abbrev-type="nlm-ta">AGILE GIScience Ser.</abbrev-journal-title>
</journal-title-group>
<issn pub-type="epub">2700-8150</issn>
<publisher><publisher-name>Copernicus Publications</publisher-name>
<publisher-loc>Göttingen, Germany</publisher-loc>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="doi">10.5194/agile-giss-7-29-2026</article-id>
<title-group>
<article-title>Enhancing OpenStreetMap Building Footprints through nDSM-Based Geometric Segmentation for AI Training Data</article-title>
</title-group>
<contrib-group><contrib contrib-type="author" xlink:type="simple"><name name-style="western"><surname>Kuper</surname>
<given-names>Paul</given-names>
<ext-link>https://orcid.org/0000-0002-9912-1958</ext-link>
</name>
<xref ref-type="aff" rid="aff1">
<sup>1</sup>
</xref>
</contrib>
<contrib contrib-type="author" xlink:type="simple"><name name-style="western"><surname>Liu</surname>
<given-names>Ruiqi</given-names>
</name>
<xref ref-type="aff" rid="aff1">
<sup>1</sup>
</xref>
</contrib>
<contrib contrib-type="author" xlink:type="simple"><name name-style="western"><surname>Deng</surname>
<given-names>Hanwen</given-names>
</name>
<xref ref-type="aff" rid="aff1">
<sup>1</sup>
</xref>
</contrib>
<contrib contrib-type="author" xlink:type="simple"><name name-style="western"><surname>Breunig</surname>
<given-names>Martin</given-names>
</name>
<xref ref-type="aff" rid="aff1">
<sup>1</sup>
</xref>
</contrib>
</contrib-group><aff id="aff1">
<label>1</label>
<addr-line>Geodetic Institute, Karlsruhe Institute of Technology, Karlsruhe, Germany</addr-line>
</aff>
<pub-date pub-type="epub">
<day>10</day>
<month>06</month>
<year>2026</year>
</pub-date>
<volume>7</volume>
<elocation-id>29</elocation-id>
<permissions>
<copyright-statement>Copyright: &#x000a9; 2026 Paul Kuper et al.</copyright-statement>
<copyright-year>2026</copyright-year>
<license license-type="open-access">
<license-p>This work is licensed under the Creative Commons Attribution 4.0 International License. To view a copy of this license, visit <ext-link ext-link-type="uri" xlink:href="https://creativecommons.org/licenses/by/4.0/">https://creativecommons.org/licenses/by/4.0/</ext-link></license-p>
</license>
</permissions>
<self-uri xlink:href="https://agile-giss.copernicus.org/articles/7/29/2026/agile-giss-7-29-2026.html">This article is available from https://agile-giss.copernicus.org/articles/7/29/2026/agile-giss-7-29-2026.html</self-uri>
<self-uri xlink:href="https://agile-giss.copernicus.org/articles/7/29/2026/agile-giss-7-29-2026.pdf">The full text article is available as a PDF file from https://agile-giss.copernicus.org/articles/7/29/2026/agile-giss-7-29-2026.pdf</self-uri>
<abstract>
<p>High-quality building footprint labels are a critical prerequisite for training AI-based segmentation models, yet reliable ground truth data is rarely available at scale. Vegetation often prevents buildings from being delineated reliably from imagery alone, while community-driven open data sources such as OpenStreetMap (OSM) frequently exhibit spatial inconsistencies and incompleteness. This study combines both data sources: it investigates the potential of airborne LiDAR-derived Normalized Digital Surface Models (nDSM) to improve building extraction and refine OSM labels. Two automated strategies are implemented and compared: 1) a rule-based region growing algorithm and 2) a Density-Based Spatial Clustering of Applications with Noise (DBSCAN) pipeline that operates on a multi-dimensional feature space incorporating nDSM heights and local roughness. The resulting building footprint labels are more reliable and are intended as training data for AI-based building segmentation. Both methods are evaluated against orthophoto-based ground truth data in Karlsruhe, Germany. Quantitative results demonstrate that the nDSM-based DBSCAN approach yields the most robust performance, achieving an F1-score of 0.94 and an Intersection-over-Union (IoU) of 0.89. This method systematically improves upon the raw OSM baseline by filtering vegetation and correcting geometric misalignments through multi-source constraints, specifically the Normalized Difference Vegetation Index (NDVI) and the overlap with OSM map data. Finally, conclusions are drawn, and the outlook points toward AI-based building segmentation models trained on such labels for use in scenarios where high-quality ground truth is unavailable.</p>
</abstract>
<counts><page-count count="7"/></counts>
</article-meta>
</front>
<body/>
<back>
</back>
</article>