Development Of A Machine Learning Model For Brand And Audience Segmentation Using Demographic Data

Date
2025
Publisher
Corpus Intellectual
Abstract
The expansion of the global business landscape, a high-impact factor in eCommerce, has created a need to identify potential customers and their positive reactions to products or services offered by companies that use the internet to promote their electronic business. With the sharp rise in social media audiences, brands need audience segmentation and targeting to remain profitable; this study therefore developed a machine learning model for brand and audience segmentation using the Social Media Advertising Dataset. The dataset contains comprehensive data on social media advertising campaigns across Facebook, Instagram, Pinterest, and Twitter, including ad impressions, clicks, spending, demographic targeting, and conversion rates. With 16 columns and 300,000 rows, it offered substantial data for analysis. The study compared its results with a Naive Bayes model and a Random Forest model reported in two existing studies: the Naive Bayes model achieved an accuracy of 35% and the Random Forest model an accuracy of 89.6%, while the Random Forest model developed in the current study reached 97% accuracy. The Random Forest model's superior performance in both studies demonstrates its effectiveness in consumer group segmentation, indicating its practical utility in optimizing marketing strategies and improving customer targeting. The developed model was implemented in Python and deployed on a website using the Flask framework, providing an accessible tool for practical applications.
Description
The model employs a Random Forest classifier, a powerful ensemble learning method that builds multiple decision trees and merges their predictions to improve accuracy and control overfitting. The dataset is divided into training and testing sets, with 80% used for training the Random Forest classifier and 20% reserved for testing. Training is carried out in Jupyter, which allows for rapid iterations and model improvements. The flowchart presented in the study illustrates the logical flow and relationships between the system's components, which helps readers understand the system design, spot possible bottlenecks, and confirm that all required processes are accounted for.
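The split-and-ensemble idea described above can be illustrated with a minimal, standard-library-only sketch. It is not the study's implementation: the record does not give the code, features, or hyperparameters, so everything here is an assumption. The data are synthetic, the feature names (age, daily minutes online) and segment labels are hypothetical, and each "tree" is reduced to a depth-one stump trained on a bootstrap sample with one randomly chosen feature, which is a simplified stand-in for a full Random Forest.

```python
import random
from collections import Counter


def train_test_split(rows, test_fraction=0.2, seed=0):
    """Shuffle a copy of rows and return (train, test) lists."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def fit_stump(rows, feature):
    """Pick the threshold on one feature that misclassifies the fewest rows."""
    best = None
    for threshold in sorted({x[feature] for x, _ in rows}):
        left = [y for x, y in rows if x[feature] <= threshold]
        right = [y for x, y in rows if x[feature] > threshold]
        left_label = Counter(left).most_common(1)[0][0]
        right_label = Counter(right).most_common(1)[0][0] if right else left_label
        errors = sum(y != left_label for y in left)
        errors += sum(y != right_label for y in right)
        if best is None or errors < best[0]:
            best = (errors, feature, threshold, left_label, right_label)
    return best[1:]  # (feature, threshold, left_label, right_label)


def fit_forest(rows, n_trees=25, seed=0):
    """Train n_trees stumps, each on a bootstrap sample and a random feature."""
    rng = random.Random(seed)
    n_features = len(rows[0][0])
    forest = []
    for _ in range(n_trees):
        sample = [rng.choice(rows) for _ in rows]  # sample with replacement
        forest.append(fit_stump(sample, rng.randrange(n_features)))
    return forest


def predict(forest, x):
    """Merge the stumps' predictions by majority vote."""
    votes = [left if x[f] <= t else right for f, t, left, right in forest]
    return Counter(votes).most_common(1)[0][0]


def accuracy(forest, rows):
    """Fraction of rows whose predicted label matches the true label."""
    return sum(predict(forest, x) == y for x, y in rows) / len(rows)


# Synthetic, hypothetical data: (age, daily_minutes_online) -> segment label.
data_rng = random.Random(42)
rows = [((data_rng.randint(18, 30), data_rng.randint(60, 120)), "segment_a")
        for _ in range(100)]
rows += [((data_rng.randint(40, 60), data_rng.randint(5, 50)), "segment_b")
         for _ in range(100)]

train_rows, test_rows = train_test_split(rows, test_fraction=0.2, seed=1)  # 80/20
forest = fit_forest(train_rows, n_trees=25, seed=1)
test_accuracy = accuracy(forest, test_rows)
print(f"train={len(train_rows)} test={len(test_rows)} accuracy={test_accuracy:.2f}")
```

Holding out the 20% test set before any training keeps the reported accuracy an estimate of performance on unseen records. In practice a library implementation with full-depth trees would likely replace the hand-written stumps.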