Quick report using Wikidata to shed light on USP’s bioinformatics program.
The report is organized as a list of topics. It doesn't look so pretty, but I think it gets straight to the point.
Objective:
Build an open semantic representation of the theses of the USP bioinformatics program and use it to answer questions such as:
– What topics are most studied?
– What is the average length, in pages, of the theses submitted to the program?
– Which thesis has the most pages?
– Who served on the most thesis committees in the program?
Method:
– Scraping information from the Theses USP platform using Python with the Selenium and BeautifulSoup modules
– Reconciling the data in Google Spreadsheets with the Wikipedia and Wikidata Tools add-on, along with manual curation and the creation of Wikidata items when necessary
– Updating Wikidata using the QuickStatements tool
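The last step, updating Wikidata with QuickStatements, consumes plain tab-separated commands, so each reconciled thesis can be turned into a small batch. The sketch below is a minimal illustration, not the code from the repository: the `Thesis` record and its field names are hypothetical, and the IDs used (P31 instance of, P1476 title, P1104 number of pages, P50 author, Q1907875 master's thesis, Q187685 doctoral thesis) are assumptions to double-check on Wikidata before uploading anything.

```python
from dataclasses import dataclass

# Item IDs assumed for illustration; verify on Wikidata before uploading anything.
THESIS_TYPE = {
    "masters": "Q1907875",   # master's thesis
    "doctoral": "Q187685",   # doctoral thesis
}


@dataclass
class Thesis:
    # Hypothetical record shape for one scraped and reconciled thesis.
    title: str
    degree: str       # "masters" or "doctoral"
    pages: int
    author_qid: str   # author's Wikidata item, found during reconciliation


def to_quickstatements(thesis: Thesis, lang: str = "pt") -> str:
    """Return tab-separated QuickStatements V1 commands that create one thesis item."""
    title = f'"{thesis.title}"'  # assumes the title contains no double quotes
    rows = [
        ["CREATE"],
        ["LAST", f"L{lang}", title],                  # label
        ["LAST", "P31", THESIS_TYPE[thesis.degree]],  # instance of
        ["LAST", "P1476", f"{lang}:{title}"],         # title (monolingual text)
        ["LAST", "P1104", str(thesis.pages)],         # number of pages
        ["LAST", "P50", thesis.author_qid],           # author
    ]
    return "\n".join("\t".join(row) for row in rows)


if __name__ == "__main__":
    # Toy record; Q4115189 is the Wikidata sandbox item, used only as a placeholder author.
    print(to_quickstatements(Thesis("Um título de exemplo", "masters", 120, "Q4115189")))
```

Further statements, such as main subject (P921) or publication date (P577), would be extra `LAST` lines following the same pattern.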
GitHub repository: https://github.com/lubianat/thesis2wikidata
Results:
More than 75 theses reconciled to Wikidata: https://w.wiki/5b83
List of the most common topics in the program's theses: https://w.wiki/5b8
![](https://pointstodots.wordpress.com/wp-content/uploads/2022/08/image-1.png?w=874)
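Ranking topics boils down to counting how often each topic label appears across theses. A minimal sketch, assuming the query returns one row per thesis-topic pair (for example via main subject, P921) in the standard SPARQL 1.1 JSON results format, with the label in a variable named `topicLabel` (a hypothetical name; the linked query may use another):

```python
import json
from collections import Counter


def count_topics(sparql_json: str, var: str = "topicLabel") -> Counter:
    """Count how often each topic appears in SPARQL JSON results (one row per thesis-topic pair)."""
    data = json.loads(sparql_json)
    bindings = data["results"]["bindings"]
    # Rows where the variable is unbound simply lack the key, so they are skipped.
    return Counter(row[var]["value"] for row in bindings if var in row)


if __name__ == "__main__":
    # Toy response in the SPARQL JSON results format (not real program data).
    labels = ["gene expression", "proteomics", "gene expression"]
    sample = json.dumps({
        "head": {"vars": ["topicLabel"]},
        "results": {"bindings": [
            {"topicLabel": {"type": "literal", "value": v}} for v in labels
        ]},
    })
    for topic, n in count_topics(sample).most_common(3):
        print(topic, n)
```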
Largest theses in the program, by number of pages: https://w.wiki/5b93
Average number of pages, master's dissertations: 103 https://w.wiki/5b94
Average number of pages, doctoral theses: 143 https://w.wiki/5b94
![](https://pointstodots.wordpress.com/wp-content/uploads/2022/08/image-2.png?w=1024)
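Once page counts (P1104) and thesis types (P31) are on Wikidata, both figures are simple aggregations. A minimal sketch, assuming the theses have already been fetched as hypothetical `(title, degree, pages)` tuples; it groups page counts by degree, averages them rounded to whole pages, and picks the thesis with the most pages:

```python
from collections import defaultdict
from statistics import mean


def page_stats(theses):
    """Given (title, degree, pages) tuples, return mean pages per degree and the largest thesis."""
    by_degree = defaultdict(list)
    for _title, degree, pages in theses:
        by_degree[degree].append(pages)
    # Average per degree, rounded to whole pages.
    averages = {degree: round(mean(pages)) for degree, pages in by_degree.items()}
    # Thesis with the highest page count.
    largest = max(theses, key=lambda t: t[2])
    return averages, largest


if __name__ == "__main__":
    # Toy records, not real program data.
    toy = [("A", "masters", 98), ("B", "masters", 110), ("C", "doctoral", 210)]
    print(page_stats(toy))
```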
- People who served on the most committees: https://w.wiki/5b96
![](https://pointstodots.wordpress.com/wp-content/uploads/2022/08/image-3.png?w=446)
- People who served together on committees: https://w.wiki/5b98
![](https://pointstodots.wordpress.com/wp-content/uploads/2022/08/image-4.png?w=753)
- Committee co-participation network: https://w.wiki/5b9S
![](https://pointstodots.wordpress.com/wp-content/uploads/2022/08/image.png?w=1024)
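The committee rankings and the co-participation network come from the same data: for each defense, count every member once, and count every unordered pair of members who sat on it together. Those pair counts are the edge weights of the network. A minimal sketch, assuming the committees are available as hypothetical lists of member names (one list per defense), with an export to a tab-separated edge list that network tools can import:

```python
from collections import Counter
from itertools import combinations


def committee_stats(committees):
    """committees: one list of member names per thesis defense (hypothetical input shape).

    Returns (appearances, pair_counts): how many committees each person sat on,
    and how many committees each unordered pair of people shared.
    """
    appearances = Counter()
    pair_counts = Counter()
    for members in committees:
        unique = sorted(set(members))  # dedupe and fix order so (a, b) and (b, a) match
        appearances.update(unique)
        pair_counts.update(combinations(unique, 2))
    return appearances, pair_counts


def to_edge_list(pair_counts):
    """Tab-separated 'source target weight' lines, heaviest edges first."""
    rows = sorted(pair_counts.items(), key=lambda item: (-item[1], item[0]))
    return "\n".join(f"{a}\t{b}\t{w}" for (a, b), w in rows)


if __name__ == "__main__":
    # Toy committees with made-up names, not real program data.
    toy = [["Ana", "Bruno", "Carla"], ["Ana", "Bruno"], ["Carla", "Ana"]]
    appearances, pairs = committee_stats(toy)
    print(appearances.most_common(3))
    print(to_edge_list(pairs))
```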
Conclusion:
By organizing the data on theses and their committees, it is possible to answer many interesting questions!
Want to make one of these for your postgraduate program? Say hello! I'd be happy to teach you or do it together.