A demo is worth a million words. – Harold Pimentel
First, the demo [1]:
https://hpimentel.shinyapps.io/leaderboard_demo/
The Pritchard lab has been doing a “paper reading club” every summer for the last few years. The general idea is to share the papers you read, whether you recommend others read them, and some ideas you liked or were skeptical of in each paper.
This year I built a leaderboard application that keeps track of who read what and provides some summaries of the data. The goal of the leaderboard is to provide a place to share and to facilitate conversation about papers amongst people who share common interests.
From an end-user perspective, one simply logs the paper, a recommendation, and comments to a Slack channel (see the engineering section), and the reviews/summaries are automatically populated.
Given some reviews from users, leaderboard provides a few key summaries:
- a searchable list of all the papers that have been read.
- a count of how many people read each paper and what their recommendations are.
- network analysis of the papers read.
- and perhaps the most important metric: how many in the group have read more than Dear Leader.
In total, 10 of us logged papers on a regular basis and over the summer we logged a total of 249 papers (approximately 250)!
So what is the prize? Well, Jonathan got us all a cake (yum!). Despite my pleading, we allowed members of the lab who did not participate to also eat the cake. Of course, there is also the metaphorical cake: reading more papers.
Below is an artist’s interpretation of the top 3 readers with their “Congratulation!” cake:
From left to right: Jeff Spence (39), Roshni Patel (39), and Alyssa Lyn Fortier (46).
An overview of the app follows.
leaderboard
This is the main page showing high-level summary statistics:
- a histogram of the ranking.
- a more detailed table showing the exact count of each person.
- a time series plot showing how many papers have been read by each person.
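The post doesn’t include the plotting code, but a minimal dplyr/ggplot2 sketch of the counts table and the time series might look like the following. The `reading_log` tibble mimics the sheet format described in the engineering section below; all data here are made up:

```r
library(dplyr)
library(ggplot2)

# Toy reading log in the same shape as the Google Sheet (made-up data).
reading_log <- tibble(
  doi       = c("10.1/a", "10.1/b", "10.1/a", "10.1/c"),
  handle    = c("alice", "alice", "bob", "bob"),
  date      = as.Date(c("2017-06-01", "2017-06-08", "2017-06-03", "2017-06-10")),
  recommend = c("YES", "NO", "YES", "YES"),
  comments  = c("great method", "not convinced", "solid replication", "must read")
)

# Exact count of papers per person (the detailed table).
counts <- reading_log %>%
  count(handle, name = "n_papers") %>%
  arrange(desc(n_papers))

# Cumulative papers read over time per person (the time series plot).
ts <- reading_log %>%
  arrange(date) %>%
  group_by(handle) %>%
  mutate(cumulative = row_number())

ggplot(ts, aes(date, cumulative, color = handle)) +
  geom_step() +
  labs(x = "date", y = "papers read")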
DOI info
This is a quick way to get an idea of what people think of papers and the most read papers. Each paper is listed with the users that read it and their recommendations.
Additionally, basic information is listed (e.g. authors, link to paper, date of publication, etc.).
network analysis
Each node in the network is a person, and node size is scaled by the number of papers read.
Each edge indicates papers read in common, and edge width is proportional to the number of shared papers.
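Here is one way such a graph could be built with igraph, reusing the toy `reading_log` from the sketch above; the scaling constant is arbitrary, and whether the app builds the graph exactly this way is my assumption:

```r
library(dplyr)
library(igraph)

# For each pair of readers, count how many papers both have read.
pairs <- reading_log %>%
  inner_join(reading_log, by = "doi", suffix = c(".a", ".b")) %>%
  filter(handle.a < handle.b) %>%
  count(handle.a, handle.b, name = "shared")

# Include all readers as vertices, even those sharing no papers.
papers_read <- count(reading_log, handle)
g <- graph_from_data_frame(pairs, directed = FALSE, vertices = papers_read)

V(g)$size  <- 10 * V(g)$n   # node size scaled by papers read
E(g)$width <- E(g)$shared   # edge width scaled by shared papers

plot(g)
```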
word clouds
Word clouds of recommendations and comments.
Pretty silly, but fun nonetheless.
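For what it’s worth, the wordcloud package makes this a few lines; the crude whitespace tokenization below is an assumption on my part, not necessarily what the app does:

```r
library(wordcloud)

# Very rough tokenization of all comments (fitting, given the silliness).
words <- unlist(strsplit(tolower(paste(reading_log$comments, collapse = " ")), "\\s+"))
freq  <- table(words)
wordcloud(names(freq), as.numeric(freq), min.freq = 1)
```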
raw data
The full data set which is also searchable.
This is useful in case you are interested in detailed comments about a specific paper.
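In a Shiny app, a searchable table like this is essentially one call to the DT package (whether leaderboard uses DT specifically is my assumption):

```r
library(DT)

# Searchable, filterable view of the full reading log.
datatable(reading_log, filter = "top", options = list(pageLength = 25))
```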
personal experiences
The format is pretty fun and honestly pretty useful. It creates a healthy, friendly competition among the lab to read consistently and write reviews in a meaningful way. I spent a lot of time reflecting on what I have to say about papers because I know others in the lab will be reading it.
Perhaps unsurprisingly, recommendations tend to be pretty helpful. People tend to be fairly honest and put a fair amount of thought into their reviews. It is also nice to get an idea of what people are thinking about and what they found interesting or not interesting about seminal papers.
I don’t think others did this, but I refereed a number of papers this summer and posted my full reviews. I figure this sort of post might be useful to young trainees who may not have much experience reviewing papers.
Finally, the suspense adds another dimension of fun. Each week I tried to provide a summary of the rankings (this should be automated in the future). There were numerous dark horses, and the competition was quite fierce at times. You can see from the time series plot that Roshni (roshni_patel) was the leader for quite some time. Then Jeff (jspence) silently posted a bunch of papers directly on the spreadsheet without anyone noticing and became the leader. Shortly thereafter, Alyssa (alyssa_lyn_fortier) was like “I’m having none of that!” [2] and logged tons of papers in just two weeks.
a paper worth posting?
A common question: is this paper and my review worth posting? There is no mandate, but my general rule has been: if I have read a paper closely enough to have an informed opinion about it, my review counts.
engineering
The general workflow looks like this:
The app itself is a Shiny R app hosted on http://shinyapps.io for free. A Zapier daemon (set up by Nasa in the lab) listens for new posts to the #paper_reading_hook Slack channel. A post to #paper_reading_hook looks like:
DOI_OR_OTHER_ID RECOMMENDATION [COMMENTS]
An example:
10.1038/nmeth.4324 YES one of the best papers I have ever read. you MUST read it!
Zapier then parses the post with a regex pattern (`([^ ]+) ([^ ]+) (.*)`) and then outputs a row to a private Google Sheet in the format
doi handle date recommend comments
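To make the parsing concrete, here is a small R stand-in for the Zapier step; `parse_post` and its `handle` argument are hypothetical (in the real pipeline, Zapier itself extracts the fields and writes the row):

```r
# Parse a #paper_reading_hook post with the regex from above and
# build a row in the sheet's format (hypothetical stand-in for Zapier).
parse_post <- function(msg, handle) {
  m <- regmatches(msg, regexec("([^ ]+) ([^ ]+) (.*)", msg))[[1]]
  data.frame(
    doi       = m[2],
    handle    = handle,
    date      = Sys.Date(),
    recommend = m[3],
    comments  = m[4],
    stringsAsFactors = FALSE
  )
}

parse_post(
  "10.1038/nmeth.4324 YES one of the best papers I have ever read. you MUST read it!",
  handle = "harold"
)
```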
The actual UI looks like this:
When the app is opened:
- It checks for new entries.
- If there is a new DOI, it uses rcrossref to pull all relevant information about the paper (e.g. authors, date, etc.).
- It then aggregates the data, joins some tables, and writes to a different private table that contains some summaries (see the sketch after this list).
- The app then displays the current state.
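A rough sketch of the rcrossref lookup and aggregation steps, reusing the toy `reading_log` from earlier; the `cached` table and the exact column names in the `cr_works()` output are assumptions on my part:

```r
library(rcrossref)
library(dplyr)

# Look up metadata only for DOIs we haven't seen before
# (`cached` stands in for the previously enriched private table).
new_dois <- setdiff(reading_log$doi, cached$doi)
meta <- cr_works(dois = new_dois)$data %>%
  select(doi, title, issued, url)

# Join paper metadata onto the reading log and summarize per paper.
paper_summary <- reading_log %>%
  left_join(meta, by = "doi") %>%
  group_by(doi, title) %>%
  summarize(n_readers = n_distinct(handle), .groups = "drop")
```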
discussion and future work
Right now the application is only scaled for the lab. An immediate improvement would be putting a proper database in place of the Google Sheet. In the long-term, I think it could be quite interesting to use publicly with some modifications:
- Allow anonymous posts but have some form of community regulation.
- I like the arXiv model of having someone else vouch for you before you post.
- a voting system could help regulate comments and weed out the noise.
- Collaborative filtering, of course, for recommendations.
- Modeling of the content of the papers themselves.
These ideas are of course nothing new, but I will say the format is quite fun.
Additionally, the UI could use a fair amount of work. It is very much a project born out of my “spare time” in between running kallisto. If kallisto were slower, then maybe leaderboard would be better. You can blame Páll Melsted for that.
If you are interested in contributing or using leaderboard in your own lab, the code is available free as in beer:
https://github.com/pimentel/leaderboard
acknowledgements
First, thanks to the Pritchard lab for participating, providing feedback, and making it fun! Thanks to Jonathan for giving us the freedom and for supporting the project. Nasa was instrumental in getting the hook working, and I think we would’ve gotten much less participation without it.
Thanks, all!
[1] The demo and screenshots, except the time series/front page, are obviously bogus. They are simulated completely at random from our paper and comment distributions. This was done to protect the private comments of those in the lab.
[2] At least she said that in my head.