Negation Detection In Arabic Opinion Reviews: A Comprehensive Annotated Dataset For Sentiment Analysis
Keywords:
Annotated dataset; Arabic opinion reviews; Dialectal Arabic; Modern standard Arabic; Natural Language Processing; Negation detection; Sentiment analysis.
Abstract
Negation detection plays a vital role in Natural Language Processing (NLP), especially in sentiment analysis. In this paper, we introduce a comprehensive dataset of Arabic opinion reviews, specifically annotated for negation detection. The dataset consists of 84,000 reviews collected from TripAdvisor, Booking.com, and Agoda, spanning the period from June 2013 to June 2023. It is evenly divided into 42,000 'negated positive' reviews and 42,000 positive reviews. The reviews focus on hotels and travel accommodations across the Middle East and North Africa and are written in various Arabic dialects. The data collection process involved web scraping, language filtering, and both automatic and manual annotation of negation cues, such as ‘لا’ (no) and ‘ليس’ (not). The quality of the annotations was verified through expert review and inter-annotator agreement, ensuring high consistency. This dataset offers valuable insights into negation structures in both Modern Standard Arabic (MSA) and Dialectal Arabic (DA), providing a foundation for developing and evaluating negation detection methods. It will be made available to the Arabic research community to help address these key linguistic challenges.
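To make the automatic annotation step concrete, the following is a minimal sketch of lexicon-based negation cue tagging, not the authors' actual pipeline. Only the cues ‘لا’ and ‘ليس’ come from the abstract; the other lexicon entries, the function names, and the tokenization rule are assumptions made for illustration. The sketch also assumes undiacritized text and exact token matches, so it would miss cues attached as clitics in dialectal forms.

```python
import re

# Illustrative cue lexicon. Only "لا" and "ليس" are named in the abstract;
# "لم", "لن" and "مش" are assumptions added for this sketch.
NEGATION_CUES = {"لا", "ليس", "لم", "لن", "مش"}

# Runs of Arabic letters (U+0621 to U+064A) are treated as tokens.
# Diacritics fall outside this range, so undiacritized input is assumed.
TOKEN_RE = re.compile(r"[\u0621-\u064A]+")


def find_negation_cues(review):
    """Return (token_index, cue) pairs for every cue token in the review."""
    tokens = TOKEN_RE.findall(review)
    return [(i, tok) for i, tok in enumerate(tokens) if tok in NEGATION_CUES]


def has_negation(review):
    """True if at least one cue from the lexicon occurs as a whole token."""
    return bool(find_negation_cues(review))
```

A tagger like this could only pre-label candidate reviews; the abstract states that manual annotation and expert review were also part of the process.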
License
Copyright (c) 2024 Journal of Information Systems Research and Practice
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.