See discussions, stats, and author profiles for this publication at: https://www.researchgate.net/publication/388549013 Development Of A Machine Learning Model For Brand And Audience Segmentation Using Demographic Data Conference Paper · January 2025 CITATIONS 0 READS 53 3 authors, including: Gbeminiyi Falowo Redeemer's University 3 PUBLICATIONS   1 CITATION    SEE PROFILE Blessing Oluwatobi Olorunfemi Redeemer's University 5 PUBLICATIONS   7 CITATIONS    SEE PROFILE All content following this page was uploaded by Gbeminiyi Falowo on 31 January 2025. The user has requested enhancement of the downloaded file. https://www.researchgate.net/publication/388549013_Development_Of_A_Machine_Learning_Model_For_Brand_And_Audience_Segmentation_Using_Demographic_Data?enrichId=rgreq-52ffdbe9ec88b79e3d5735806efa20c2-XXX&enrichSource=Y292ZXJQYWdlOzM4ODU0OTAxMztBUzoxMTQzMTI4MTMwNjQ5OTcxNEAxNzM4MzE1MjExMDI3&el=1_x_2&_esc=publicationCoverPdf https://www.researchgate.net/publication/388549013_Development_Of_A_Machine_Learning_Model_For_Brand_And_Audience_Segmentation_Using_Demographic_Data?enrichId=rgreq-52ffdbe9ec88b79e3d5735806efa20c2-XXX&enrichSource=Y292ZXJQYWdlOzM4ODU0OTAxMztBUzoxMTQzMTI4MTMwNjQ5OTcxNEAxNzM4MzE1MjExMDI3&el=1_x_3&_esc=publicationCoverPdf https://www.researchgate.net/?enrichId=rgreq-52ffdbe9ec88b79e3d5735806efa20c2-XXX&enrichSource=Y292ZXJQYWdlOzM4ODU0OTAxMztBUzoxMTQzMTI4MTMwNjQ5OTcxNEAxNzM4MzE1MjExMDI3&el=1_x_1&_esc=publicationCoverPdf https://www.researchgate.net/profile/Gbeminiyi-Falowo?enrichId=rgreq-52ffdbe9ec88b79e3d5735806efa20c2-XXX&enrichSource=Y292ZXJQYWdlOzM4ODU0OTAxMztBUzoxMTQzMTI4MTMwNjQ5OTcxNEAxNzM4MzE1MjExMDI3&el=1_x_4&_esc=publicationCoverPdf https://www.researchgate.net/profile/Gbeminiyi-Falowo?enrichId=rgreq-52ffdbe9ec88b79e3d5735806efa20c2-XXX&enrichSource=Y292ZXJQYWdlOzM4ODU0OTAxMztBUzoxMTQzMTI4MTMwNjQ5OTcxNEAxNzM4MzE1MjExMDI3&el=1_x_5&_esc=publicationCoverPdf https://www.researchgate.net/institution/Redeemers_University?enrichId=rgreq-52ffdbe9ec88b79e3d5735806efa20c2-XXX&enrichSource=Y292ZXJQYWdlOzM4ODU0OTAxMztBUzoxMTQzMTI4MTMwNjQ5OTcxNEAxNzM4MzE1MjExMDI3&el=1_x_6&_esc=publicationCoverPdf https://www.researchgate.net/profile/Gbeminiyi-Falowo?enrichId=rgreq-52ffdbe9ec88b79e3d5735806efa20c2-XXX&enrichSource=Y292ZXJQYWdlOzM4ODU0OTAxMztBUzoxMTQzMTI4MTMwNjQ5OTcxNEAxNzM4MzE1MjExMDI3&el=1_x_7&_esc=publicationCoverPdf https://www.researchgate.net/profile/Blessing-Olorunfemi-2?enrichId=rgreq-52ffdbe9ec88b79e3d5735806efa20c2-XXX&enrichSource=Y292ZXJQYWdlOzM4ODU0OTAxMztBUzoxMTQzMTI4MTMwNjQ5OTcxNEAxNzM4MzE1MjExMDI3&el=1_x_4&_esc=publicationCoverPdf https://www.researchgate.net/profile/Blessing-Olorunfemi-2?enrichId=rgreq-52ffdbe9ec88b79e3d5735806efa20c2-XXX&enrichSource=Y292ZXJQYWdlOzM4ODU0OTAxMztBUzoxMTQzMTI4MTMwNjQ5OTcxNEAxNzM4MzE1MjExMDI3&el=1_x_5&_esc=publicationCoverPdf https://www.researchgate.net/institution/Redeemers_University?enrichId=rgreq-52ffdbe9ec88b79e3d5735806efa20c2-XXX&enrichSource=Y292ZXJQYWdlOzM4ODU0OTAxMztBUzoxMTQzMTI4MTMwNjQ5OTcxNEAxNzM4MzE1MjExMDI3&el=1_x_6&_esc=publicationCoverPdf https://www.researchgate.net/profile/Blessing-Olorunfemi-2?enrichId=rgreq-52ffdbe9ec88b79e3d5735806efa20c2-XXX&enrichSource=Y292ZXJQYWdlOzM4ODU0OTAxMztBUzoxMTQzMTI4MTMwNjQ5OTcxNEAxNzM4MzE1MjExMDI3&el=1_x_7&_esc=publicationCoverPdf https://www.researchgate.net/profile/Gbeminiyi-Falowo?enrichId=rgreq-52ffdbe9ec88b79e3d5735806efa20c2-XXX&enrichSource=Y292ZXJQYWdlOzM4ODU0OTAxMztBUzoxMTQzMTI4MTMwNjQ5OTcxNEAxNzM4MzE1MjExMDI3&el=1_x_10&_esc=publicationCoverPdf 1 Corpus Intellectual ISSN PRINT 2811-3187 ONLINE 2811-3209 Volume 3 NO 3 2024 Conf. Edition Development Of A Machine Learning Model For Brand And Audience Segmentation Using Demographic Data Abstract The expansion of the global business landscape, a high- impact factor in eCommerce, has resulted in identifying potential customers and their positive reactions to products or services offered by companies that use the internet to promote their electronic business. With a high increase in audience using social media, there is a need for brand and audience segmentation and targeting for profit-making; thus, this study developed a machine learning model for brand and audience segmentation using the Social Media Advertising Dataset. The dataset includes comprehensive data on social media advertising campaigns across Facebook, Instagram, Pinterest, and Twitter, featuring ad impressions, clicks, spending, demographic targeting, and conversion rates. With 16 columns and 300,000 rows, the dataset offered substantial data for analysis. The study compared the performance of a Naive Bayes model with a Random Forest algorithm in two existing literature; the Naive Bayes model achieved an accuracy of 35%, the Random Forest model achieved an accuracy of 89.6%, and the Random Forest model in the current study's model reached 97% accuracy. The Random Forest model's superior performance in both studies demonstrates its effectiveness in consumer group segmentation, indicating its practical utility in optimizing marketing strategies and improving customer targeting. An implementation of the developed model of the study was in Python and deployed on a website using the Flask framework, providing an accessible tool for practical applications. Lead Author Gbeminiyi Falowo Affiliation: Department of Mass communicati on Redeemer’s University Ede, Osun State 2 Corpus Intellectual ISSN PRINT 2811-3187 ONLINE 2811-3209 Volume 3 NO 3 2024 Conf. Edition Co-Authors: Blessing Oluwatobi Olorunfemi; Inwang Emmanuel Inwang Computer Science Department, Redeemer’s University Ede, Osun State. Keywords: Brand Segmentation, Audience Segmentation, Machine Learning, Demographic Data, Social Media Advertising, Naive Bayes, Random Forest, Marketing Strategies, Customer Targeting 1. Introduction Customer data forms the foundation for successful business strategies. Exploring data to uncover customer insights and support decision- making enhances business interest. Rather than applying marketing strategies uniformly to all customers, clustering customers allows businesses to identify target segments, enabling a deeper understanding of each segment's characteristics and the development of tailored business strategies (Dawane et al., 2021). Consequently, applying clustering methods to identify potential customers is a leading trend in today's tech space. Combining machine learning (ML) algorithms with user data exemplifies customer segmentation and supports businesses in identifying segments of customers that are difficult to detect through intuition and manual information inspection (Kumar, 2023). The combination of these ML models further results in market segmentation, which is the dividing of a market into distinct sub-groups of customers with different needs, characteristics, or behaviours who may require separate products or respond differently to various marketing efforts (Durojaye & Obunadike, 2022). In today's business landscape, companies face the challenge of identifying potential customers most likely to respond positively to a product or offer. Here, data mining techniques become crucial. With the growing amount of available data, data mining has become essential for direct marketing efforts, enabling companies to create prediction response models based on past client purchase data (Kasem et al., 2023). Companies must understand client demands and provide tailored products and services to secure ample profits. This understanding can be achieved through segmentation via machine learning. Applying the right marketing tactics to the correct customer segments increases the probability of profit maximization and enhances cost efficiency by avoiding the expenditure of resources on unlikely customer bases (Yadegaridehkordi et al., 2021). Demographic data such as Gender, age, familial and marital status, income, education, occupation, and geographical information are crucial for segmentation. Depending on the company's scope, this geographical information could range from specific towns or counties 3 Corpus Intellectual ISSN PRINT 2811-3187 ONLINE 2811-3209 Volume 3 NO 3 2024 Conf. Edition to broader regions such as cities, states, or countries (Thalkar, 2021). In this context, machine learning, a subfield of artificial intelligence, is focused on developing algorithms and techniques that enable computers to learn from data (Kasem et al., 2023) and make essential predictions, especially for marketing products. Despite several studies on the role of intelligence in marketing (Boisena et al., 2018), AI-driven segmentation to predict customer behaviours remains underexplored. The segmentation in this research specifically focuses on social media platforms, enabling brands and audiences to be categorized into their desired categories for more targeted marketing efforts. 2. Review Of Literature The dynamic interactions between brands and audiences within social media are critical in forming digital marketing strategies. As Zhang and Daugherty (2018) identified, electronic businesses (eBusinesses) utilize social media platforms to communicate with their target audience, advertise products or services, and establish brand awareness. However, they only focus on Pinterest, which may not generalize to other social media platforms. Consequently, brands (eBusinesses) that want to customize their marketing strategies and cultivate brand loyalty must comprehend their target audience's varied demographics, preferences, and behaviours (Suryakanthan et al., 2024). The segments in Suryakanthan et al. (2024) provided valuable insights for tailored marketing campaigns, product suggestions, and enhanced customer experiences but were limited to K-means clustering. On the other hand, social media platform audiences are made up of a wide range of people with different interests, demography, and levels of involvement (Nguyen, 2021). Furthermore, Amutha and Khan (2023) stated that through shares, clicks, comments, and other types of engagement, audiences actively consume information, engage with companies, and add to the online conversation. By implication, from the statement above, audiences look for real connections, pertinent material, and tailored experiences from brands on social media. In Zote (2024), an audience can be divided into groups of various interests, such as pastimes and activities. This enables the dissemination of messages primarily to relevant audiences. Thus, Kubade et al. (2023) combined and compared the analysis of the Support Vector Machine, Random Forest algorithm, and KNN model for audience segmentation, with Random Forest outperforming the others in accuracy, yet the proposed model in this study had a better performance as projected by Sruthi (2024). 4 Corpus Intellectual ISSN PRINT 2811-3187 ONLINE 2811-3209 Volume 3 NO 3 2024 Conf. Edition 3. Methodology This section details the methodology for developing the Machine Learning Model for Brand and Audience segmentation using demographic data. It comprises data collection, pre-processing, feature engineering, model selection, system design, and evaluation metrics. 3.1 Method of Data Collection This study uses the Social Media Advertising Dataset as obtained from the Kaggle Repository. The file in the dataset is named Social_Media_Advertising.csv. The dataset's access link https://www.kaggle.com/datasets/jsonk11/social-media-advertising- dataset 3.1.1 Data Description The Social Media Advertising dataset is a comprehensive collection of data related to various social media advertising campaigns. It includes ad impressions, clicks, spending, demographic targeting, and conversion rates. The dataset encompasses multiple social media platforms such as Facebook, Instagram, Pinterest, and Twitter, providing diverse advertising campaign data. This dataset contains 16 columns and 300,000 rows, offering substantial data for analysis. Table 3.1 explains the data attributes. Table 3.1: Description of the Dataset S/ N Attribute Name Attribute Description 1 Campaign_ I.D. A unique identifier for each advertising campaign. 2 Target Audience The specific demographic or audience segment targeted by the ad campaign. 3 Campaign Goal The main objective of the campaign (e.g., brand awareness, lead generation, sales conversion). 4 Duration The length of time the ad campaign ran was typically measured in days. https://www.kaggle.com/datasets/jsonk11/social-media-advertising-dataset https://www.kaggle.com/datasets/jsonk11/social-media-advertising-dataset 5 Corpus Intellectual ISSN PRINT 2811-3187 ONLINE 2811-3209 Volume 3 NO 3 2024 Conf. Edition 5 Channel Used The social media platform where the ad was displayed (e.g., Facebook, Instagram, Pinterest, Twitter). 6 Conversion Rate The percentage of persons who finished a desired action after engaging with the ad. 7 Acquisition Cost The cost incurred to acquire a customer through the ad campaign. 8 ROI Return on Investment: a way to measure the ad campaign for profit making. 9 Location The geographical region targeted by the ad campaign. 10 Language The language used in the ad campaign. 11 Clicks The actual number of times users press the mouse button on the ad. 12 Impressions The number of times the ad was shown to users. 13 Engageme nt Score A metric indicating user engagement with the ad (e.g., likes, shares, comments). 14 Customer Segment The segment of customers targeted by the ad campaign; this serves as the target or label for analysis. 15 Date The date when the ad campaign was run. 16 Company The company or brand running the ad campaign. 6 Corpus Intellectual ISSN PRINT 2811-3187 ONLINE 2811-3209 Volume 3 NO 3 2024 Conf. Edition 3.2 System Architecture The model employs a Random Forest classifier, a powerful ensemble learning method that builds multiple decision trees and merges their predictions to improve accuracy and control overfitting. The dataset is divided into training and testing sets, with 80% used for training the Random Forest classifier and 20% reserved for testing. The training process is conducted using Jupyter, which allows for rapid iterations and model improvements. Figure 3.1. System Architecture 3.2 System Design The flowchart, as depicted in Figure 3.3 below, illustrates the logical flow and relationships between various components, which helps to comprehend the design of the System, spot possible bottlenecks, and make sure all required processes are taken into account. 7 Corpus Intellectual ISSN PRINT 2811-3187 ONLINE 2811-3209 Volume 3 NO 3 2024 Conf. Edition Figure 3.2. System Flowchart Also, as seen in Figure 3.3, the use case diagram for "Customer Segment Prediction" depicts the interactions between various actors: User, System, and Admin and the System to forecast customer segments. The process begins with the User inputting customer details into the System. Once the input is provided, the User submits the information, initiating the prediction process. The System then takes over, processing the submitted details to predict the customer segment. After the prediction is made, the System displays the prediction to the User, allowing them to view the results. Figure 3.4. System Use Case Design 8 Corpus Intellectual ISSN PRINT 2811-3187 ONLINE 2811-3209 Volume 3 NO 3 2024 Conf. Edition System Implementation Several tools and platforms were used to design the model. Python was employed for data collection and pre-processing, with libraries like Pandas and NumPy utilized for numerical operations and data manipulation. Matplotlib and Seaborn were used for exploratory data analysis and visualization. Scikit-learn, a powerful package that efficiently implements numerous techniques, is used to build the machine learning model. Scikit-learn's pre-processing module will handle missing values and feature scaling, ensuring that the data is clean and prepared for analysis. All missing data were filled in using imputation techniques, and feature engineering was performed to enhance the dataset's informative value. The model training and evaluation were conducted using Jupyter Notebook, which provides an interactive environment ideal for data exploration, visualization, and iterative model development. Additionally, evaluation metrics such as accuracy, precision, and recall were calculated using Scikit- learn's metrics module to comprehensively assess the Random Forest model's performance. 4.1 Dataset Pre-processing and Analysis In the initial phase of the study, as illustrated in Figure 4.1, the necessary library, pandas, was imported for data manipulation and analysis in Python. The path to the CSV file containing the dataset was specified, and the dataset was stored on the System. The dataset, named "Social_Media_Advertising.csv," was read into a panda Data Frame using the pd.read_csv function, which allowed the data to be loaded into a structured format suitable for analysis. To verify that the data was loaded correctly and to gain an initial understanding of its structure, the first five rows of the Data Frame were displayed using the data.head() function. 9 Corpus Intellectual ISSN PRINT 2811-3187 ONLINE 2811-3209 Volume 3 NO 3 2024 Conf. Edition Figure 4. 1: Overview of the Dataset First Rows. As the next step in data pre-processing and analysis, the target audience information was split into two columns using the first space in the string. Specifically, the Target Audience column was divided at the first space, and the portion following the first space was extracted into a new column named age. This was achieved by using the str.split(' ') method and selecting the elements after the first split. To ensure that the age data was recorded correctly, the extracted list of strings for the age column was then transformed back into a single string using the apply method using a lambda function. Concurrently, the initial Target Audience column was modified to preserve solely the initial segment of the string, so the target audience data is divided into two distinct and significant columns for additional examination, as illustrated in Figure 4.2. 10 Corpus Intellectual ISSN PRINT 2811-3187 ONLINE 2811-3209 Volume 3 NO 3 2024 Conf. Edition Figure 4.2: Overview of the Data after Splitting In this step of data pre-processing, the column name Target Audience was changed to Gender to reflect better the data it represents. The original column name and the new column name were specified in a dictionary format utilizing the rename method from pandas to achieve this. The inplace=True option made sure that the modifications were applied to the DataFrame without requiring its creation. Print(data.head()) was used to show the first five rows of the revised DataFrame following the column renaming in order to verify the modifications. In addition, print(data.shape) was used to display the DataFrame's shape, which contains the number of rows and columns, to summarise the dataset's dimensions, as seen in Figure 4.3. Figure 4.3: Renaming the Data Target Audience to Gender 11 Corpus Intellectual ISSN PRINT 2811-3187 ONLINE 2811-3209 Volume 3 NO 3 2024 Conf. Edition In this data pre-processing step, duplicated rows were removed from the DataFrame to ensure data integrity and accuracy. Using the inplace=True option, the drop duplicates method from pandas was used to apply the modifications directly to the DataFrame. Print(data.head()) was used to show the top five rows of the revised DataFrame to confirm the alterations after the duplicates were eliminated. In addition, when the duplicates were eliminated, print(data.shape) was used to print the DataFrame's shape, which shows the number of rows and columns, to summarise the dataset's dimensions. Following this, specific columns deemed unnecessary for the analysis were dropped from the DataFrame. Using the drop technique and the columns parameter, the columns Campaign_ID, Acquisition_Cost, ROI, Duration, Date, and Campaign_Goal was eliminated. Once more, these modifications were applied straight to the DataFrame using inplace=True. Print(data.head()) was used to show the first five rows of the revised DataFrame to verify that the designated columns had been removed. As seen in Figure 4.4, the shape of the DataFrame was printed again using print(data.shape) to display the updated dataset dimensions following the removal of redundant columns. Figure 4.4: Dataset after Dropping some Columns Finding the target variable and the feature set was the first step. Customer_Segment was declared as the goal variable, or the variable that has to be forecasted. The dataset's remaining columns were all regarded as features. A list of feature columns that did not include the target variable was produced to do this. Since machine learning models usually require numerical input, label encoding was used to translate categorical variables into numerical values. The LabelEncoder from the sklearn.pre-processing module was used for this procedure. Gender, Channel_Used, Location, Language, Company, and Age were the categorical columns that were 12 Corpus Intellectual ISSN PRINT 2811-3187 ONLINE 2811-3209 Volume 3 NO 3 2024 Conf. Edition recommended for encoding. Every categorical column was encoded iteratively after a LabelEncoder object was initialized. This process assigned a unique numerical value to each category within a column, transforming the categorical data into a format suitable for the Random Forest model. To make sure the target variable Customer_Segment was in the appropriate numerical format for model training, it was also encoded using the LabelEncoder if it was categorical. Next, the dataset was divided into testing and training sets. After separating the features (X) and target (y), the data was split into 80% training and 20% testing sets using the train_test_split method from sklearn.model_selection, with a random state of 42 for repeatability. Using the trained Random Forest model, feature importance was computed to determine each feature's importance in the model's predictions. After being extracted, the feature importances were saved in a DataFrame and sorted by importance. The "Company" variable, which has a significantly higher importance score than other features, is the most important component impacting the model's predictions, according to the feature importance chart in Figure 4.5. After that, "Clicks" and "Impressions" both provide a significant but far less contribution than "Company." While features like "age," "language," "channel_used," and "gender" have little bearing on the model's predictions, features like "conversion_rate," "engagement_score," and "location" demonstrate considerable value. This implies that the business linked to the data points significantly influences the result, outweighing other factors in terms of predictive ability. When evaluating the dataset's applicability for machine learning tasks and making decisions on how best to handle class distributions during model development, this graphical representation in Figure 4.6 was essential. 13 Corpus Intellectual ISSN PRINT 2811-3187 ONLINE 2811-3209 Volume 3 NO 3 2024 Conf. Edition Figure 4.5: Feature Importance of the Dataset Class Figure 4.6: Description of the Balanced Training Dataset Class 4.2 Implementation The described implementation presents a streamlined approach to predicting customer segments using a trained Random Forest Classifier through an interactive interface built with widgets. Users can input key features influencing predictions, such as categorical features (Gender, Channel_Used, Location, Language, Company, and age) through dropdown menus and numerical features (Conversion_Rate, 14 Corpus Intellectual ISSN PRINT 2811-3187 ONLINE 2811-3209 Volume 3 NO 3 2024 Conf. Edition Clicks, Impressions, and Engagement_Score) via sliders. This setup allows for dynamic adjustment of input values within specified ranges. The process begins with data pre-processing, where categorical inputs are transformed into numerical representations using pre-fitted LabelEncoder instances to ensure compatibility with the machine learning model. These inputs are then structured into a data frame and fed into the trained classifier, which predicts the customer segment based on learned patterns. The predicted segment is displayed immediately, providing users with quick insights. This interactive approach is valuable for businesses needing rapid experimentation and scenario analysis to tailor marketing strategies, optimize resources, and enhance customer engagement, thereby improving decision-making processes through actionable insights derived from predictive analytics, as shown in Figure 4.7. and 4.8, respectively. Figure 4.7: The Model Implementation on Python. 15 Corpus Intellectual ISSN PRINT 2811-3187 ONLINE 2811-3209 Volume 3 NO 3 2024 Conf. Edition Figure 4. 8: Description of the Home Page 4.3 Testing Testing the customer segmentation prediction system involves using test data to evaluate its performance. After the User inputs the test data, they can click on the submit button to initiate the prediction process. The description of some sample test data is presented in Figure 4.9. Figure 4.9: Testing the system Figure 4.10: Testing Result In the above figure, the result page displays the predicted outcome of the customer segmentation process. After the User inputs the test data and clicks the predict button, the System processes the information and presents the results. Results The achieved metrics accuracy, precision, recall, and F1 Score all stand at 0.97, highlighting the classifier's consistent capacity to generate correct predictions across a range of assessment criteria, as shown in Figure 5.1. Accuracy served as a foundational metric, indicating that 97% of the classifier's predictions align correctly with the actual customer segments in the test data. Precision, which measured the proportion of correctly predicted positive instances 16 Corpus Intellectual ISSN PRINT 2811-3187 ONLINE 2811-3209 Volume 3 NO 3 2024 Conf. Edition (customer segments) out of all the cases predicted as positive, stands at 0.97. This signifies that the classifier maintains a high precision rate, minimizing false positives and ensuring the identified customer segments are reliably accurate. Recall that quantifying the proportion of correctly predicted positive instances out of all actual positive instances also stands at 0.97. This indicates the classifier's ability to effectively capture most of the true positive customer segments in the dataset. F1 Score, a combined metric of precision and recall, further reinforces the classifier's strong performance with a score of 0.97. This harmonic mean reflects a balanced assessment of the model's predictive power, highlighting its ability to identify positive instances and avoid misclassifications accurately. Figure 5.1: Random Forest Result. According to the confusion matrix in Figure 5.2, the model demonstrated strong performance across all classes, with the highest number of correct predictions occurring along the diagonal. Specifically, the model correctly classified 11,769 instances of class 0, 11,591 instances of class 1, 11,834 instances of class 2, 11,772 instances of class 3, and 11,424 instances of class 4. There are relatively few misclassifications. For instance, class 0 had minor misclassifications into other classes, with the highest being 180 instances misclassified as class 1. Similarly, for class 1, the most notable misclassification was 270 instances classified as class 3. Overall, the matrix indicates that the model is effective at correctly predicting the majority of instances, with only a small number of instances being incorrectly classified. The confusion matrix of the Random Forest model is presented in Figure 5.2. 17 Corpus Intellectual ISSN PRINT 2811-3187 ONLINE 2811-3209 Volume 3 NO 3 2024 Conf. Edition Figure 5.2: Random Forest Confusion Matrix. The results for the Naive Bayes model were reported as follows: the accuracy was 0.35, indicating that the model correctly classified 35% of the instances. The precision was 0.41, meaning that 41% of the cases predicted as positive were positive. The recall was 0.35, showing the model identified 35% of the actual positive instances. Finally, the F1 score was 0.36, which is the harmonic mean of precision and recall; summarizing the balance between these two metrics and the confusion matrix is presented in Figure 5.3. Figure 5.3: Naive Bayes Confusion Matrix. 18 Corpus Intellectual ISSN PRINT 2811-3187 ONLINE 2811-3209 Volume 3 NO 3 2024 Conf. Edition Discussion Comparing the Naive Bayes model results with the results achieved by the Random Forest algorithm in the study by Kubade et al. (2023) and the model from the current study highlights a significant performance gap, as shown in Table 4.1. The Naive Bayes model achieved an accuracy of 0.35, precision of 0.41, recall of 0.35, and an F1 score of 0.36. In contrast, the Random Forest algorithm in the study by Kubade et al. (2023) achieved an accuracy of 89.6% for customer segmentation, while the current study's model achieved an accuracy of 97%. The Naive Bayes model's accuracy of 35% was substantially lower than the Random Forest model's 89.6% and the current study's model's 97%. This suggests that the Naive Bayes model did not capture the underlying patterns and relationships within the data as effectively as the other two models. The significantly higher accuracy rates of the Random Forest model and the current study's model indicate its superior ability to forecast consumer groups, making them more favourable for practical applications in optimizing marketing strategies and enhancing customer targeting with greater confidence and precision. Table 6.1 Comparison of the models' accuracy S/N Model Accuracy 1 Naive Bayes 35% 3 Random Forest 97% Table 6.2: Results Comparison with existing studies S/ N Author( s) Methods Accura cy Precisio n Dataset Overall Best Method 1 Kubad e et al. (2023) Support Vector Machine Random Forest Classifier K- Nearest Neighbo ur 75.3% 89.6% 83.2% 66.7% 67.5% 75.3% E- commer ce Customer Data RF achieve d the best result with the dataset 19 Corpus Intellectual ISSN PRINT 2811-3187 ONLINE 2811-3209 Volume 3 NO 3 2024 Conf. Edition 2 Current Study Naive Bayes Random Forest 35% 97% 41% 97% Social Media Advertisin g Dataset RF achieve d the best result with the dataset Conclusion In conclusion, developing and evaluating a Random Forest Classifier for customer segmentation demonstrated remarkable performance, achieving a 97% accuracy across various metrics, including Precision, Recall, and F1 Score. This performance was far better than Kubade et al. (2023) found in comparison research, where a model identical to this one attained an accuracy of 89.6%. Thus, the model's resilience and efficacy in forecasting client categories were emphasized. It was also observed that adding an interactive prediction interface improved the model's usefulness. This interface allows stakeholders to input and analyze critical criteria impacting consumer segmentation in real-time. As a result, companies can quickly make well-informed decisions, optimize marketing plans, and enhance consumer interaction techniques through trustworthy predictive insights. References Amutha, R., & Khan, A. A. (2023). Customer segmentation using machine learning techniques. Tuijin Jishu/Journal of Propulsion Technology, 44(3), 2051. Boisena, M, K Terlouwb, P Grootea and O Couwenberga (2018). Reframing place promotion, place marketing, and place branding — moving beyond conceptual confusion. Cities, 80, 4–11 Dawane, V., Waghodekar, P., & Pagare, J. (2021). RFM Analysis Using K-Means Clustering to Improve Revenue and Customer Retention. In Proceedings of the International Conference on Smart Data Intelligence (ICSMDI 2021). 1982-1989. Durojaye, D. I., & Obunadike, G. N. (2022). Analysis and visualization of market segmentation in the banking sector using KMeans machine learning algorithm. FUDMA Journal of Sciences (FJS), 6(1), 387-393. https://doi.org/10.33003/fjs-2022-0601-910 https://doi.org/10.33003/fjs-2022-0601-910 20 Corpus Intellectual ISSN PRINT 2811-3187 ONLINE 2811-3209 Volume 3 NO 3 2024 Conf. Edition Kasem, M. S., Hamada, M., & Taj-Eddin, I. (2023). Customer profiling, segmentation, and sales prediction using AI in direct marketing. arXiv. https://arxiv.org/abs/2302.01786 Kubade, H., Gharde, P. J., Fulbandhe, T. D., Pandey, A. A., Rehpade, K. S., & Hedaoo, S. R. (2023). Customer segmentation: Types of models and clustering techniques. International Journal of Advanced Research and Innovative Ideas in Education (IJARIIE), 9(2), 2706. https://ijariie.com Kumar, A. (2023). Customer Segmentation of Shopping Mall Users Using K-Means Clustering. In Advancing SMEs Toward E-Commerce Policies for Sustainability (pp. 248-270). IGI Global Nguyen, S. P. (2021). Deep customer segmentation with applications to a Vietnamese supermarket's data. Soft Computing, 25, 7785-7793. https://doi.org/10.1007/s00500-021-05796-0 Sruthi, E. R. (2024). Understand random forest algorithms with examples (updated 2024). Retrieved from https://www.analyticsvidhya.com/blog/2021/06/understanding- random-forest/ Suryakanthan, M., Vimal, K., Sanjay Raj, R., & Thiruselvan, P. M. E. (2024). Customer segmentation for enhancing business strategy. International Journal of Research Publication and Reviews, 5(1), 4735- 4740. https://www.ijrpr.com Thalkar, V. R. (2021). Customer segmentation using machine learning. International Journal of Scientific Research in Computer Science, Engineering and Information Technology, 7(6), 28-37. https://doi.org/10.32628/CSEIT217654 Yadegaridehkordi, E., Nilashi, M., Nasir, M. H. N. B. M., Momtazi, S., Samad, S., Supriyanto, E., & Ghabban, F. (2021). Customer segmentation in eco-friendly hotels using multi-criteria and machine learning techniques. Technology in Society, 65, 101528. https://doi.org/10.1016/j.techsoc.2021.101528 Zhang, Y., & Daugherty, T. (2018). Data-driven visual content marketing: Understanding consumer engagement through Pinterest. Journal of Retailing and Consumer Services, 43, 205-216. https://doi.org/10.1016/j.jretconser.2018.02.006 https://arxiv.org/abs/2302.01786 https://ijariie.com/ https://doi.org/10.1007/s00500-021-05796-0 https://www.analyticsvidhya.com/blog/2021/06/understanding-random-forest/ https://www.analyticsvidhya.com/blog/2021/06/understanding-random-forest/ https://www.ijrpr.com/ https://doi.org/10.32628/CSEIT217654 https://doi.org/10.1016/j.techsoc.2021.101528 https://doi.org/10.1016/j.jretconser.2018.02.006 21 Corpus Intellectual ISSN PRINT 2811-3187 ONLINE 2811-3209 Volume 3 NO 3 2024 Conf. Edition Zote, J. (2024). Social media target audience: How to find and engage yours. Retrieved from https://sproutsocial.com/insights/social- media-target-audience/ View publication stats https://sproutsocial.com/insights/social-media-target-audience/ https://sproutsocial.com/insights/social-media-target-audience/ https://www.researchgate.net/publication/388549013