Last spring, I spent three months at the Recurse Center, which is like a writers’ retreat for programmers. People from all over the world and with very different backgrounds go there to become better programmers. It’s a great environment for self-learning and collaborating with others. Here I’ll briefly outline what I worked on during that time.
For the past ten years, my number one tool at work has been R. Because I’ve focused on cancer research and chromosomal aberrations, the packages I’ve used have also come from this area (especially Bioconductor). For the three months at RC, I decided to work on broadening my skillset to more general-purpose data science tools.
I still worked mostly in R, and learned to use data manipulation packages like data.table (for more efficiency and modifying data in place) and dplyr (more expressive/logical/readable code, at least for someone like me with a background in SQL). For visualizations, I learned how to use ggplot2, how to make interactive apps with shiny, and how to draw maps with ggmap and leaflet. As for machine learning, I learned how to use the caret package’s unified interface to train and tune various statistical models. I also took Stanford’s online course on Statistical Learning.
To get to know these packages, I worked on three little projects: exploring how different neighborhoods in New York City vary in their Citi Bike usage patterns, looking at what data from the Moves app tells about how I move and where I’ve been, and predicting who would win the Stanley Cup.
In addition to working with R, I brushed up on my Python skills by completing the exercises available at Dataquest. I got to know the very basics of libraries like NumPy, pandas, and matplotlib, as my previous Python experience was only from writing basic utility scripts and not really any type of data analysis.
I also listened to many excellent talks on topics such as public speaking, network protocols, the UNIX process model and shell programming, Docker/containerization (and updated the server this website is hosted on to use systemd containers), immutability, hashes, and whether artificial intelligence is a threat or not. Books I read included An Introduction to Statistical Learning, ggplot2: Elegant Graphics for Data Analysis, and The Second Machine Age.
In general, my experience at RC was very positive. I learned a lot and was surrounded by very smart people whom I could always ask for advice and guidance. To anyone contemplating a batch, I would say: go.