
Data-Analytics-Thesis

My thesis on ranking algorithms, submitted in partial fulfilment of the requirements for the degree of Master of Science in Computer Science (Data Analytics).

This thesis achieved first-class honours.

Abstract

Citation analysis is an important tool for evaluating researchers and their scientific work. The most common evaluation metrics used today are the impact factor for journals and the h-index for authors. In recent years, these metrics have increasingly been used to decide whether a researcher is considered for a job, receives a promotion, or is even considered for a government grant. The problem is that these metrics are easily manipulated through self-citations and, more seriously, through the recent emergence of citation cartels. Self-citations are easy to spot; citation cartels are not. This research project introduces alternative approaches, based on Google’s PageRank algorithm, for evaluating researchers and journals. The ArnetCite citation dataset, compiled by Václav Belák, was used. First, the rankings these algorithms produced for papers were compared with raw citation counts. Next, the robustness of the algorithms against author self-citations was determined. Finally, four of the lowest-ranking papers under both algorithms were chosen, and a citation cartel was formed by modifying existing entries to create synthetic citation data with cartel features. The algorithms' performance was measured by how robust their rankings remained when the scores were recalculated after the cartel was created. The methodologies and results of the algorithms are discussed, and limitations and directions for future work are provided.
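The core experiment can be sketched in a few lines. This is a minimal sketch, assuming plain PageRank (damping factor 0.85, power iteration) on a small hypothetical citation graph rather than the ArnetCite data; the thesis's exact algorithm variants are not reproduced here. The helper names `pagerank`, `citation_counts` and `add_cartel` and the toy graph are illustrative only. The sketch ranks papers, lists raw citation counts alongside, then turns the four lowest-ranked papers into a cartel (assumed here to mean every member cites every other member, added as new citations) and recomputes the scores.

```python
from collections import defaultdict

DAMPING = 0.85  # conventional PageRank damping factor (assumed, not taken from the thesis)


def pagerank(cites, damping=DAMPING, tol=1e-10, max_iter=200):
    """Plain PageRank by power iteration.

    `cites` maps each paper to the papers it cites; a citation is a
    directed edge citing -> cited, so score flows to the cited paper.
    """
    nodes = set(cites)
    for refs in cites.values():
        nodes.update(refs)
    nodes = sorted(nodes)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(max_iter):
        # Papers that cite nothing spread their score uniformly over all papers.
        dangling = sum(rank[v] for v in nodes if not cites.get(v))
        new = {v: (1 - damping) / n + damping * dangling / n for v in nodes}
        for v in nodes:
            refs = cites.get(v, [])
            if refs:
                share = damping * rank[v] / len(refs)
                for u in refs:
                    new[u] += share
        delta = sum(abs(new[v] - rank[v]) for v in nodes)
        rank = new
        if delta < tol:
            break
    return rank


def citation_counts(cites):
    """Raw citation count (in-degree) for every cited paper."""
    counts = defaultdict(int)
    for refs in cites.values():
        for u in refs:
            counts[u] += 1
    return counts


def add_cartel(cites, members):
    """Return a copy of the graph in which every member cites every other member."""
    out = {v: list(refs) for v, refs in cites.items()}
    for a in members:
        for b in members:
            if a != b and b not in out.setdefault(a, []):
                out[a].append(b)
    return out


# Hypothetical toy citation graph (not ArnetCite data).
graph = {
    "A": ["B", "C"],
    "B": ["C"],
    "C": ["A"],
    "D": ["C"],
    "E": ["C", "A"],
    "F": ["E"],
    "G": [],
    "H": ["G"],
}

before = pagerank(graph)
counts = citation_counts(graph)
for p in sorted(before, key=before.get, reverse=True):
    print(f"{p}: pagerank={before[p]:.4f} citations={counts[p]}")

# Form a cartel from the four lowest-ranked papers and recompute.
cartel = sorted(before, key=before.get)[:4]
after = pagerank(add_cartel(graph, cartel))
for p in cartel:
    print(f"{p}: {before[p]:.4f} -> {after[p]:.4f}")
```

The first loop shows where PageRank and raw citation counts disagree; the second shows how far the cartel members' scores move once they start citing each other, which is the kind of shift the robustness comparison looks at.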