Occupation Prediction with Multimodal Learning from Tweet Messages and Google Street View Images
Keywords: GeoAI, multimodal learning, deep learning, social media user profiling, demographic prediction, transformer
Abstract. Despite the development of various heuristic and machine learning models, predicting the occupation of social media users remains challenging for two reasons. First, high-quality ground truth data are scarce. Second, it is difficult to effectively integrate data sources of different modalities, even though such sources can be complementary and jointly inform an individual's profession or job role. In response, this study introduces a novel semi-supervised multimodal learning method for predicting Twitter users' occupations from a limited number of training samples. Specifically, an unsupervised learning model is first designed to extract textual embeddings from individual tweet messages and visual embeddings from Google Street View images. The latter capture the geographical and environmental context of the areas where individuals live and work. Next, these high-dimensional multimodal features are fed into a multilayer transfer learning model for individual occupation classification. The proposed method achieves high evaluation scores for identifying Office workers, Students, and Others or Jobless people, and its F1 score for identifying Office workers surpasses the best previously reported score for occupation classification from social media data.
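The abstract does not specify how the textual and visual embeddings are combined or what the classification layers look like. As a minimal sketch, assuming simple feature-level fusion (concatenating the two embedding vectors) followed by a one-hidden-layer network with a softmax output over the three occupation groups, the classification step could look like the following. The function names, the toy embedding sizes, and the weights are hypothetical placeholders for illustration, not the trained parameters or architecture of the proposed model.

```python
import math
from typing import List, Sequence, Tuple

# The three occupation groups named in the abstract.
CLASSES = ["Office worker", "Student", "Others/Jobless"]


def fuse(text_emb: Sequence[float], image_emb: Sequence[float]) -> List[float]:
    """Assumed feature-level fusion: concatenate text and image embeddings."""
    return list(text_emb) + list(image_emb)


def dense(x: Sequence[float], weights: Sequence[Sequence[float]],
          bias: Sequence[float]) -> List[float]:
    """Fully connected layer; `weights` holds one row per output unit."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def relu(v: Sequence[float]) -> List[float]:
    return [max(0.0, x) for x in v]


def softmax(z: Sequence[float]) -> List[float]:
    m = max(z)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def predict(text_emb, image_emb, w1, b1, w2, b2) -> Tuple[str, List[float]]:
    """Hypothetical forward pass: fuse -> hidden ReLU layer -> softmax."""
    x = fuse(text_emb, image_emb)
    hidden = relu(dense(x, w1, b1))
    probs = softmax(dense(hidden, w2, b2))
    best = max(range(len(probs)), key=probs.__getitem__)
    return CLASSES[best], probs
```

For example, with 2-dimensional toy embeddings, `predict([1.0, 0.0], [0.0, 1.0], w1, b1, w2, b2)` returns a label together with a probability for each of the three groups. In practice the embeddings would come from the unsupervised text and image encoders and the weights from supervised training on the labelled users; concatenation is only one possible fusion choice.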