Authorship Verification of Yorùbá Blog Posts using Character N-grams

dc.contributor.author: Omolayo Abegunde
dc.date.accessioned: 2025-05-27T16:30:29Z
dc.date.available: 2025-05-27T16:30:29Z
dc.date.issued: 2020
dc.description: The article presents an authorship verification study of Yorùbá online posts. The experimental results show a strong relationship among the posts in the dataset. From this, the study concludes that the posts were written by the same author and not by different authors, as previously thought. Further techniques could be applied for additional evaluation. The study also found that using unprocessed data will most often give low or misleading results. Future work includes adding more data, generating additional features, and using other ML techniques to confirm the result.
dc.description.abstract: Authorship verification is the task of determining whether two (or more) documents were written by the same author. N-grams are sequences of elements that appear one after another in a text; the elements can be words, POS tags, characters, or other units. Authorship verification is challenging because it focuses on whether the target author and the text in question have closely related styles. This paper presents an authorship verification task on Yorùbá blog posts. N-gram features were extracted from the corpus, and inductive learning techniques were applied to build feature-based models for automatic author identification. The K-means clustering algorithm was used because supervised algorithms cannot be applied to the one-class setting of the dataset. Evaluation was done with the Silhouette Coefficient, a metric for evaluating clusterings of unlabeled data. The score obtained is positive, which indicates that the data points are strongly related within the dataset. This is interpreted as a "yes" verification outcome: the posts were from the same author.
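The feature extraction and evaluation steps named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' code: the function names, the choice of n = 3, and the use of cosine distance between n-gram count profiles are all assumptions made for illustration, and the cluster labels are assumed to come from a separate clustering step such as K-means.

```python
from collections import Counter
from math import sqrt


def char_ngrams(text, n=3):
    """Count overlapping character n-grams in text (n=3 is an assumed default)."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def cosine_distance(a, b):
    """Return 1 - cosine similarity between two n-gram count profiles."""
    dot = sum(a[g] * b[g] for g in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return 1.0 - dot / norm if norm else 1.0


def silhouette(profiles, labels):
    """Mean silhouette coefficient over all points (ranges from -1 to 1)."""
    scores = []
    for i, p in enumerate(profiles):
        # Group distances from point i to every other point by cluster label.
        by_cluster = {}
        for j, q in enumerate(profiles):
            if i != j:
                by_cluster.setdefault(labels[j], []).append(cosine_distance(p, q))
        own = by_cluster.pop(labels[i], None)
        if not own or not by_cluster:
            # Singleton cluster, or only one cluster overall: score 0 by convention.
            scores.append(0.0)
            continue
        a = sum(own) / len(own)                                # mean intra-cluster distance
        b = min(sum(d) / len(d) for d in by_cluster.values())  # nearest other cluster
        scores.append((b - a) / max(a, b) if max(a, b) else 0.0)
    return sum(scores) / len(scores)


# Placeholder strings stand in for blog posts; they are not data from the study.
posts = ["aaaa", "aaab", "zzzz", "zzzy"]
profiles = [char_ngrams(p) for p in posts]
score = silhouette(profiles, [0, 0, 1, 1])
```

A score near 1 means each item sits much closer to its own cluster than to any other, while a negative score suggests items were assigned to the wrong cluster. For Yorùbá text, which carries tone marks and underdots, it may also be worth applying a consistent Unicode normalization before counting n-grams, so that visually identical characters are not counted as different sequences.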
dc.description.sponsorship: Self
dc.identifier.citation: 7
dc.identifier.uri: https://repository.run.edu.ng/handle/123456789/4826
dc.language.iso: en
dc.publisher: ICMCECS (IEEE)
dc.title: Authorship Verification of Yorùbá Blog Posts using Character N-grams
dc.type: Article
Files
Original bundle
Name: agbeyangi2020.pdf
Size: 147.2 KB
Format: Adobe Portable Document Format
License bundle
Name: license.txt
Size: 1.71 KB
Format: Item-specific license agreed upon to submission