Interoperable USP Bioinformatics Theses Catalog with Wikidata

Quick report using Wikidata to shed light on USP’s bioinformatics program.

The report is organized as a list of topics. It isn't very pretty, but I think it gets straight to the point.

Objective:

Create an open semantic representation of the theses of the USP bioinformatics program.
Answer questions such as:
– What topics are most studied?
– What is the average length of the theses submitted to the program?
– Which thesis has the most pages?
– Who served on the most thesis committees in the program?

Method:

– Scraping information from the Theses USP platform with Python, using the Selenium and BeautifulSoup modules
– Reconciling the data in Google Sheets with the Wikipedia and Wikidata Tools add-on, plus manual curation and the creation of new Wikidata items where needed
– Updating Wikidata with the QuickStatements tool
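The last step turns each reconciled record into QuickStatements commands. Below is a minimal sketch of that idea, not the script from the repository: the `Thesis` record, `to_quickstatements` and `THESIS_CLASS` are hypothetical names, and the class items (Q1907875 for master's thesis, Q187685 for doctoral thesis) are assumptions to double-check on Wikidata before running a batch. The properties used are P31 (instance of), P1476 (title), P577 (publication date, year precision) and P1104 (number of pages).

```python
from dataclasses import dataclass

# Assumed Wikidata classes; verify these item IDs on wikidata.org before use
THESIS_CLASS = {
    "masters": "Q1907875",   # master's thesis (assumption)
    "doctoral": "Q187685",   # doctoral thesis (assumption)
}


@dataclass
class Thesis:
    """Hypothetical record produced by the scraping/reconciliation steps."""
    title: str   # title as scraped, assumed to be in Portuguese
    degree: str  # "masters" or "doctoral"
    year: int
    pages: int


def to_quickstatements(thesis: Thesis) -> str:
    """Return tab-separated QuickStatements V1 commands creating one thesis item."""
    lines = [
        "CREATE",
        f'LAST\tLpt\t"{thesis.title}"',                      # Portuguese label
        f"LAST\tP31\t{THESIS_CLASS[thesis.degree]}",         # instance of
        f'LAST\tP1476\tpt:"{thesis.title}"',                 # title (monolingual text)
        f"LAST\tP577\t+{thesis.year}-00-00T00:00:00Z/9",     # publication date, year precision
        f"LAST\tP1104\t{thesis.pages}",                      # number of pages
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    print(to_quickstatements(Thesis("Exemplo de título", "masters", 2019, 120)))
```

The output can be pasted into QuickStatements as a V1 batch; author, advisor and committee statements would be added the same way once their items are reconciled.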

GitHub repository: https://github.com/lubianat/thesis2wikidata

Results:

More than 75 theses reconciled to Wikidata: https://w.wiki/5b83

Most common topics in the program's theses: https://w.wiki/5b8

Largest theses in the program: https://w.wiki/5b93
Average length of master's dissertations: 103 pages https://w.wiki/5b94
Average length of doctoral theses: 143 pages https://w.wiki/5b94
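The numbers above come from Wikidata queries. As a rough offline illustration of the same arithmetic, the sketch below computes the average page count per degree and the largest thesis from a small list of records. The records are toy placeholders, not the real USP data, and `average_pages` and `largest` are hypothetical helper names.

```python
from statistics import mean

# Toy (title, degree, pages) records standing in for reconciled theses
theses = [
    ("Tese A", "doctoral", 150),
    ("Tese B", "doctoral", 130),
    ("Dissertação C", "masters", 90),
    ("Dissertação D", "masters", 110),
]


def average_pages(records, degree):
    """Mean page count over the records of one degree level."""
    return mean(pages for _, d, pages in records if d == degree)


def largest(records, n=1):
    """The n records with the most pages, largest first."""
    return sorted(records, key=lambda r: r[2], reverse=True)[:n]


if __name__ == "__main__":
    print(average_pages(theses, "masters"), average_pages(theses, "doctoral"))
    print(largest(theses))
```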

Conclusion:

Once thesis and committee data are organized this way, many interesting questions can be answered!
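For example, "who served on the most thesis committees?" reduces to counting committee memberships. On Wikidata this would be a SPARQL query; the sketch below shows the same counting in plain Python over a toy `committees` dictionary (hypothetical placeholder names, not real committee members).

```python
from collections import Counter

# Toy mapping from thesis title to its committee members (placeholders)
committees = {
    "Tese A": ["Pessoa 1", "Pessoa 2", "Pessoa 3"],
    "Tese B": ["Pessoa 2", "Pessoa 4", "Pessoa 5"],
    "Dissertação C": ["Pessoa 2", "Pessoa 3", "Pessoa 6"],
}


def most_frequent_members(committees, n=3):
    """Count how many committees each person sat on and return the top n.

    set() ensures a person listed twice on the same committee is counted once.
    """
    counts = Counter(
        member for members in committees.values() for member in set(members)
    )
    return counts.most_common(n)


if __name__ == "__main__":
    print(most_frequent_members(committees, 2))
```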

Want to build one of these for your graduate program? Say hello! I'd be happy to teach you or to do it together.
