A Sentiment Analysis:
Predicting Communicative Intention in Code Review Questions
Pooya Nikbakht - Winter 2021

Abstract: In the previous work "Communicative Intention in Code Review Questions", Ebert et al. studied the communicative intention of comments made during code review, where "developers request clarifications, suggest improvements, or ask for explanations about the rationale behind the implementation choices". They "conducted an exploratory case study by manually classifying 499 questions extracted from 399 Android code reviews to understand the real communicative intentions they convey". Building on this work, we aim to build classification models that predict the communicative intention of new comments/questions in a code review. To do so, in addition to the words of each question, we consider three new features extracted from the raw questions and examine their impact on the accuracy of the prediction models. The three features are:

  • Key-Questions: the key question of each raw comment, from which the question's intention is concluded.

  • Question-Sentiment: the sentiment of each question.

  • Question-Length: the length of each question.
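As a rough illustration of how the last two features could be computed, here is a minimal sketch. It assumes that Question-Length is a whitespace-delimited word count and that Question-Sentiment is the compound score of NLTK's VADER analyzer; neither choice is confirmed by the project description, and the function names are hypothetical. Key-Questions are left out because the description does not say how they are extracted.

```python
from typing import Callable, Dict


def question_length(question: str) -> int:
    """Length of a question as a word count (an assumption; characters would also work)."""
    return len(question.split())


def vader_sentiment(question: str) -> float:
    """Compound sentiment score in [-1, 1] from NLTK's VADER analyzer (assumed scorer)."""
    # Imported lazily so the rest of the sketch runs without NLTK installed.
    import nltk
    from nltk.sentiment.vader import SentimentIntensityAnalyzer

    nltk.download("vader_lexicon", quiet=True)  # fetches the lexicon if it is missing
    return SentimentIntensityAnalyzer().polarity_scores(question)["compound"]


def extract_features(
    question: str,
    sentiment_fn: Callable[[str], float] = vader_sentiment,
) -> Dict[str, float]:
    """Return the two numeric features for one raw question."""
    return {
        "question_length": question_length(question),
        "question_sentiment": sentiment_fn(question),
    }
```

Calling `extract_features("Why not use a list here?")` would return a dictionary with both values; passing a different `sentiment_fn` swaps the scorer without touching the rest of the code.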

Tools. We used Python together with several related packages and libraries. More precisely:

  • We used the Scikit-learn package and the XGBoost library to build different classification models and to make predictions on the test data.

  • Scikit-learn was also used to evaluate the models by accuracy, precision and recall.

  • The NumPy and Pandas libraries were used for data manipulation and analysis.

  • The NLTK library was used for sentiment analysis.

  • Jupyter Notebook was used to provide a publishable document containing live code and results.
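The workflow described by these tools can be sketched as follows. This is only an illustration under stated assumptions: the questions and intention labels are made-up toy data, and TF-IDF with logistic regression stands in for whichever models the project actually trained. Only Scikit-learn calls are used.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, precision_score, recall_score
from sklearn.pipeline import make_pipeline

# Toy questions and hypothetical intention labels, for illustration only.
train_questions = [
    "Why not use a constant here?",
    "Could you rename this variable?",
    "What does this flag control?",
    "Why is this check needed?",
    "Can you add a unit test for this?",
    "What is the purpose of this method?",
]
train_labels = [
    "suggestion", "suggestion", "information",
    "information", "suggestion", "information",
]
test_questions = ["Could you extract this into a helper?", "What does this value mean?"]
test_labels = ["suggestion", "information"]

# Turn question words into TF-IDF vectors, then fit a linear classifier.
model = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
model.fit(train_questions, train_labels)
predicted = model.predict(test_questions)

# Evaluate with the three metrics named above (macro-averaged over classes).
accuracy = accuracy_score(test_labels, predicted)
precision = precision_score(test_labels, predicted, average="macro", zero_division=0)
recall = recall_score(test_labels, predicted, average="macro", zero_division=0)
print(f"accuracy={accuracy:.2f} precision={precision:.2f} recall={recall:.2f}")
```

Extra numeric features such as question length or sentiment could be appended to the TF-IDF vectors before fitting, which is one way to measure their impact on the metrics.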

Project's Full Paper: Click Here

Project's Source Code (in the form of a Jupyter Notebook): Click Here