Google Summer of Code 2019: Improve Article Recommendation Pipeline

GSoC Project Proposal: Improve Article Recommendation Pipeline

Mentor: Bahodir Mansurov

Synopsis

The project improved the article recommendation pipeline by solving various issues in the article-recommender projects. The issues solved as part of the project are:

Merged:

Outcome

Each task in the project had a noticeable outcome.

The first task ensured that the data returned by the morelike endpoint contains no duplicates. As a result, a single call to the API now returns more distinct Wikidata items than it did previously.
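As a rough illustration of why removing duplicates yields more distinct items per call, here is a minimal sketch. It assumes each result is a dict carrying a Wikidata ID under a hypothetical `wikidata_id` key and that the result count is capped only after duplicates are dropped. It is not the project's actual code.

```python
from typing import Dict, Iterable, List, Optional


def dedupe_by_key(
    items: Iterable[Dict], key: str = "wikidata_id", limit: Optional[int] = None
) -> List[Dict]:
    """Drop repeated items (keeping the first occurrence), then apply the limit.

    The "wikidata_id" field name is hypothetical, used only for illustration.
    """
    seen = set()
    unique: List[Dict] = []
    for item in items:
        value = item.get(key)
        if value is not None:
            if value in seen:
                continue  # this Wikidata item was already emitted
            seen.add(value)
        unique.append(item)  # items without an ID are kept as-is
        if limit is not None and len(unique) >= limit:
            break  # the limit counts distinct items, not raw rows
    return unique


results = [
    {"title": "Earth", "wikidata_id": "Q2"},
    {"title": "Earth", "wikidata_id": "Q2"},
    {"title": "Moon", "wikidata_id": "Q405"},
]
print([r["title"] for r in dedupe_by_key(results)])  # prints ['Earth', 'Moon']
```

Because the limit is applied after deduplication, a request for N results no longer spends slots on repeats.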

The second task ensured that the translation endpoint no longer fails intermittently without returning a proper error code, which makes debugging easier when an error occurs. It also ensured that the API does not fail because of an error in the Wikidata Query Service.
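The two behaviours described above can be sketched framework-free as follows: return an explicit error status and body when the core lookup fails, and degrade gracefully when an optional enrichment step backed by WDQS fails. The callables `fetch_candidates` and `enrich`, and the `UpstreamError` exception, are hypothetical stand-ins. The real endpoint's structure and status codes may differ.

```python
import json
from typing import Callable, Dict, List, Tuple

Candidates = List[Dict]


class UpstreamError(Exception):
    """Hypothetical error raised when a backing service such as WDQS fails."""


def translation_response(
    fetch_candidates: Callable[[], Candidates],
    enrich: Callable[[Candidates], Candidates],
) -> Tuple[int, str]:
    """Return an (HTTP status, JSON body) pair for a translation request."""
    try:
        candidates = fetch_candidates()
    except Exception as exc:
        # Report the failure with an explicit status and message instead of
        # failing without a meaningful error code.
        return 500, json.dumps({"error": "failed to fetch candidates", "detail": str(exc)})
    try:
        candidates = enrich(candidates)
    except UpstreamError:
        # An upstream (e.g. WDQS) problem should not take the whole API down:
        # serve the unenriched candidates instead.
        pass
    return 200, json.dumps(candidates)
```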

The third task replaced the internal call to the Wikidata Query Service (WDQS) with a call to the MediaWiki API (MWAPI) when the translation endpoint is called. This reduced the number of internal requests, thereby decreasing the response time of the translation endpoint. Replacing the slower WDQS with the faster MWAPI reduced the response time further.
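For context, the Wikibase module of the MediaWiki API exposes `action=wbgetentities`, which can resolve many article titles on a given wiki to their Wikidata entities and sitelinks in one request. The minimal sketch below only builds the request URLs. It assumes the lookup is keyed by article titles on `enwiki` and that batching is one way the request count drops. The exact parameters used in the project are not stated here, and the 50-value limit is the documented default for non-bot clients.

```python
from typing import Iterable, List
from urllib.parse import urlencode

WIKIDATA_API = "https://www.wikidata.org/w/api.php"
MAX_TITLES_PER_REQUEST = 50  # default per-request limit for multi-value parameters


def build_wbgetentities_urls(titles: Iterable[str], site: str = "enwiki") -> List[str]:
    """Build one wbgetentities URL per batch of at most 50 titles."""
    titles = list(titles)
    urls = []
    for start in range(0, len(titles), MAX_TITLES_PER_REQUEST):
        batch = titles[start:start + MAX_TITLES_PER_REQUEST]
        params = {
            "action": "wbgetentities",
            "sites": site,
            "titles": "|".join(batch),  # MediaWiki separates multiple values with "|"
            "props": "sitelinks",  # sitelinks show which wikis already have the article
            "format": "json",
        }
        urls.append(f"{WIKIDATA_API}?{urlencode(params)}")
    return urls


urls = build_wbgetentities_urls([f"Article {i}" for i in range(120)])
print(len(urls))  # prints 3
```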

The fourth task ensured that the script used to import the data generated by Hadoop into the database does not block the CPU. This improves CPU efficiency on the shared machines and allows other CPU-bound tasks to run without getting blocked.
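How the import script was changed is not described here. The sketch below shows one common approach, assumed purely for illustration. It lowers the process priority with `os.nice` (Unix only) and writes rows in fixed-size batches with a short pause between them, so other jobs on a shared machine get CPU time. The `insert_batch` callable is hypothetical and stands in for the actual database write.

```python
import os
import time
from itertools import islice
from typing import Callable, Iterable, Iterator, List, Sequence


def batched(rows: Iterable[Sequence], size: int) -> Iterator[List[Sequence]]:
    """Yield lists of at most `size` rows from any iterable."""
    iterator = iter(rows)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


def import_rows(
    rows: Iterable[Sequence],
    insert_batch: Callable[[List[Sequence]], None],  # hypothetical DB writer
    batch_size: int = 1000,
    pause_seconds: float = 0.1,
    lower_priority: bool = True,
) -> int:
    """Import rows batch by batch, yielding the CPU between batches."""
    if lower_priority and hasattr(os, "nice"):
        os.nice(10)  # raise niceness so other processes are scheduled first
    total = 0
    for batch in batched(rows, batch_size):
        insert_batch(batch)
        total += len(batch)
        time.sleep(pause_seconds)  # give other processes a turn
    return total
```

Streaming the rows through `batched` also keeps memory bounded, since the full Hadoop output never has to be loaded at once.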

Acknowledgement

I would like to thank my mentor, Bahodir Mansurov, for helping me throughout the program, encouraging me with great feedback and guiding me towards project completion.

Thoughts about the project and more...

This was the first time I formally worked on a “real” project. I have been contributing to open source for more than two years, but mostly through one-off contributions. This was also the first time I was writing code for an organization as big as Wikimedia, and I was excited to work with the other developers. Everyone I interacted with from the Wikimedia team was helpful and more than willing to go out of their way to help a beginner like me. It felt wonderful working alongside my mentor and everyone else on the team. A big thank you to all the fellow developers who helped me with the project, and to the Wikimedia Foundation and Google for providing me with this wonderful opportunity.