As a relative newcomer to the field, I’ve been learning and doing data science largely on my own. This is okay, I guess, given access to Stack Overflow, MOOCs, and a handful of O’Reilly’s textbooks, but not ideal. Fortunately, the data science community here in New York seems to be big and active, so opportunities to connect are plentiful.

A particularly easy way to join (or at least follow) the conversation is via Twitter. A few weeks ago I got a Twitter account, @bjdewilde, and started following some of the big names in data science: @hmason, @drewconway, @jakeporway, and @fivethirtyeight, among others. My Twitter feed now provides a steady stream of interesting and useful information (along with the absurd, the irrelevant, and the over-my-head) that I might not have found otherwise. Like this. And this. And this. I’m not a prolific tweeter by any stretch, but as I get more comfortable in this forum, I’d like to start contributing to the conversation rather than just reading it silently. Then again, I’m a long-time photographer, so voyeurism comes easily to me. :)

I’ve also become a regular at an informal, weekly data/journalism working group called CSV Soundsystem, which convenes at a Think Coffee in The Village for its decent beer and free wi-fi. I was introduced to the group by Harmony Institute’s previous data scientist, Brian Abelson, who now works at The New York Times. The benefits of working in the company of other data folks need no explanation. During my first evening with CSV, I acquired a new text editor (Sublime Text) and a new terminal (iTerm), thereby fundamentally improving my workflow; this sort of thing happens surprisingly often. Another great experience has been working together on group projects, which brings me to the next paragraph…

Photo: two members of Team CSV, Brian Abelson (NYT) and Michael Keller (The Daily Beast).

Two weekends ago, I participated in my first hackathon: a “Bicoastal Datafest analyzing money’s influence in politics” held simultaneously at Columbia University and Stanford University. Team CSV attended and ended up winning Best in Innovation and Best in Show — a promising start to my hackathon career. ;) I wrote a post about this for HI’s blog, The Ripple Effect, which you can read here, so I won’t go into details. I will emphasize that my main task of the weekend was hardcore data munging in Python, which wasn’t glamorous but was very necessary in order for any interesting analysis to be performed. I’ve mentioned this before: the bulk of a data scientist’s work tends to be in data fetching and cleaning rather than analysis. But that’s life.

So, I’m slowly but surely working my way into the data science community, and that’s a Very Good Thing. Working in isolation may well get the job done, but working within a network of like-minded people gets the job done faster and better and has much more potential for surprising but fruitful detours. I hope to continue collaborating with other data people (e.g., DataKind) as well as attend a couple of conferences and generally show my face at data-related events in the coming months. But for now, I think I’m ready for my Twitter fix: Modern Seinfeld has been on a roll lately!