Interoperable USP Bioinformatics Theses Catalog with Wikidata

Quick report using Wikidata to shed light on USP’s bioinformatics program.

The report is organized as a list of topics. It is not very pretty, but I think it gets straight to the point.


Have an open semantic representation of the theses of the USP bioinformatics program.
Answer questions such as:
– What topics are most studied?
– What is the average length of the theses submitted to the program?
– Which thesis has the most pages?
– Who sat on the most examination boards in the program?
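Once the theses are on Wikidata, a question like "what topics are most studied?" can be asked of the public SPARQL endpoint. The sketch below is only one way to do it, not necessarily how this report was produced. It assumes the theses are linked to the program or institution with P4101 ("dissertation submitted to") and to their topics with P921 ("main subject"). The function names and the QID passed in are placeholders.

```python
import json
import urllib.parse
import urllib.request

# Public Wikidata Query Service endpoint.
WDQS_ENDPOINT = "https://query.wikidata.org/sparql"


def build_topic_query(institution_qid, limit=10):
    """Build a SPARQL query that counts the main subjects (P921) of theses
    submitted to a given institution or program (P4101).

    Assumption: the theses were modelled with these two properties.
    """
    return f"""
SELECT ?topic ?topicLabel (COUNT(?thesis) AS ?theses) WHERE {{
  ?thesis wdt:P4101 wd:{institution_qid} ;
          wdt:P921 ?topic .
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en". }}
}}
GROUP BY ?topic ?topicLabel
ORDER BY DESC(?theses)
LIMIT {limit}
""".strip()


def run_query(query):
    """Send the query to the endpoint and return the result rows (needs network)."""
    url = WDQS_ENDPOINT + "?" + urllib.parse.urlencode(
        {"query": query, "format": "json"}
    )
    request = urllib.request.Request(
        url, headers={"User-Agent": "thesis-catalog-sketch/0.1 (example)"}
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)["results"]["bindings"]
```

Calling `run_query(build_topic_query("Q..."))` with the real QID of the program or institution would return one row per topic, most frequent first.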


– Scraping information from the Theses USP platform using Python with the Selenium and BeautifulSoup modules
– Reconciling the data in Google Spreadsheets with the Wikipedia and Wikidata Tools add-on, along with manual curation and creation of Wikidata entries when necessary
– Updating Wikidata using the QuickStatements tool
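To give an idea of the last step, here is a minimal sketch of turning one curated thesis record into QuickStatements commands (the tab-separated V1 format). It is an illustration, not the script used for this report: the function name and the record fields are hypothetical, and the choice of properties (P31 "instance of", P1104 "number of pages", P921 "main subject") is an assumption about how a thesis could be modelled.

```python
def thesis_to_quickstatements(label, language, class_qid, pages, subject_qids):
    """Return QuickStatements V1 commands that create one thesis item.

    label        -- thesis title (must not contain double quotes)
    language     -- language code for the label, e.g. "en" or "pt"
    class_qid    -- QID of the class the item is an instance of (placeholder)
    pages        -- number of pages, as an integer
    subject_qids -- QIDs of the main subjects, already reconciled
    """
    lines = [
        "CREATE",                            # start a new item
        f'LAST\tL{language}\t"{label}"',     # label in the given language
        f"LAST\tP31\t{class_qid}",           # instance of
        f"LAST\tP1104\t{pages}",             # number of pages
    ]
    # One "main subject" statement per reconciled topic.
    lines += [f"LAST\tP921\t{qid}" for qid in subject_qids]
    return "\n".join(lines)


# Usage with made-up values; Q4115189 is the Wikidata Sandbox item,
# used here only as a stand-in for real QIDs.
print(thesis_to_quickstatements("Example thesis", "en", "Q4115189", 120, ["Q4115189"]))
```

The output can be pasted into the QuickStatements batch window. A real batch would also need statements for the author, advisor, and board members, which are left out here.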

GitHub repository:


More than 75 theses reconciled to Wikidata:

List of the most common topics in program theses:

Biggest theses of the USP program:
Average length of master’s theses: 103 pages
Average length of doctoral theses: 143 pages
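For anyone who wants to reproduce numbers like these, the calculation is simple once the page counts are in hand. This is a small sketch assuming each thesis is a dictionary with hypothetical `title`, `degree` and `pages` fields. The sample records are invented for illustration and are not the program's data.

```python
from statistics import mean


def page_stats(theses):
    """Return (average pages per degree, record of the longest thesis)."""
    pages_by_degree = {}
    for thesis in theses:
        pages_by_degree.setdefault(thesis["degree"], []).append(thesis["pages"])
    # Round to whole pages, as in the report.
    averages = {
        degree: round(mean(pages)) for degree, pages in pages_by_degree.items()
    }
    longest = max(theses, key=lambda thesis: thesis["pages"])
    return averages, longest


# Invented sample records, only to show the expected input shape.
sample = [
    {"title": "Thesis A", "degree": "master", "pages": 100},
    {"title": "Thesis B", "degree": "master", "pages": 110},
    {"title": "Thesis C", "degree": "doctoral", "pages": 150},
]
averages, longest = page_stats(sample)
print(averages, longest["title"])
```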


With the data on theses and examination boards organized this way, it is possible to answer many interesting questions!

Want to make one of these for your postgraduate program? Say hello! I’d be happy to teach you how or to do it together.
