Monday, September 21, 2015

Artificial Intelligence Synonyms

Synonyms are words that mean largely the same thing, although there may be contextual differences. In my view, the following terms are synonyms. 
Artificial Intelligence - often focused on robotics. The AI field used to approach intelligence by building models of the world. There may be pockets of people who still do that, but if there are, they publish papers and attend conferences that I haven’t come across yet. My understanding of the AI field is that it has largely adopted the machine learning approach.
Machine Learning - largely similar to Artificial Intelligence in its application, although it’s usually defined as simply finding patterns/correlations in past data. Machine learning usually focuses on what’s going to happen and doesn’t care as much about “why”.
Data Science/Data Mining - largely generic terms. I went to KDD2014, a conference whose name stands for “Knowledge Discovery and Data Mining”. There was an entire “visualization” track of presentations that to me felt strikingly similar to Business Intelligence.
Informatics/Bioinformatics - focused on various aspects of the medical profession/business. Applications here are wide and deep, and include everything from real-time disease prediction to analyzing who’s (not) going to pay their bills.
Natural Language Processing - focused on gleaning insights from a large collection of unstructured text. 
Computer Vision - often similar in practice to Natural Language Processing, although it analyzes pixels of pictures/videos rather than collections of text.
Business Intelligence - Anecdotally, my experience is that many BI people focus on questions that they can answer with sums and averages. Data Scientists may look at the practice of BI and assert that BI practitioners don’t ask hard enough questions. Sales professionals from BI vendors have been known in the past to push back on this assertion from Data Scientists. 
If you disagree, I love comments! Please be nice.

Monday, September 14, 2015

Data Science and Dysentery

If you ask 100 experts what Data Science is, you’re likely to get 100 different answers. Oftentimes the answer you get will reflect the background and skill set of the person speaking. (Full disclosure: I have a Computer Science degree, and I learned statistics over time.)
I define Data Science as applying the old-school scientific method (observation, question, hypothesis, experiment, analysis, etc.) using 21st century tools. I’m not sure that I agree 100% with Wired magazine that Data Science makes science “Obsolete”, but I do agree that the technology is a game changer.
Data Science is kinda like the invention of air travel. Air travel didn’t make walking completely obsolete, just like traditional science isn’t likely to go anywhere either. However, now that airports are commonplace it is considerably less common for people to walk the Oregon Trail and die of dysentery. 
Image courtesy of walknboston, “Hard Drive” September 14, 2015 via Flickr; Creative Commons 2.0 Generic

Why I'm not a fan of Big Data

I’ve never been a fan of the term “Big Data” - for several reasons: 
  1. It’s not totally clear how to measure “big” in this context. Is "big" a certain number of rows or columns, or is it a specific size on disk? All three quantities can be manipulated.
  2. “Big” is a subjective term, with a meaning that changes over time. Data contained on a full hard drive from 20 years ago fits on a keychain today - with room to spare. 
  3. “Big Data” sounds suspiciously like “Big Oil”, “Big Tobacco”, and “Big Pharma”, three other groups of entities that the media likes to demonize.
  4. More often than not, when somebody uses the term “Big Data”, it’s a marker indicating that said person doesn’t have a good understanding of what they’re talking about.
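To make the first point concrete, here is a minimal sketch of how the same information can look "bigger" or "smaller" depending on how you store it. Everything in it is an assumption for illustration: the toy sensor dataset is made up, and the helper names (`to_long`, `shape`, `size_bytes`) are hypothetical. It only uses the Python standard library.

```python
import gzip
import json

# Hypothetical toy dataset (made up for illustration):
# 100 sensors, 50 readings each, stored "wide" (one row per sensor).
wide = [
    {"sensor": f"s{i}", **{f"t{j}": (i * j) % 7 for j in range(50)}}
    for i in range(100)
]

def to_long(rows):
    """Reshape wide rows into one (sensor, time, value) row per reading."""
    return [
        {"sensor": r["sensor"], "time": key, "value": value}
        for r in rows
        for key, value in r.items()
        if key != "sensor"
    ]

def shape(rows):
    """Return (number of rows, number of columns)."""
    return len(rows), len(rows[0])

def size_bytes(rows, compress=False):
    """Size of the rows serialized as JSON, optionally gzip-compressed."""
    raw = json.dumps(rows).encode("utf-8")
    return len(gzip.compress(raw)) if compress else len(raw)

long_rows = to_long(wide)

# Same 5,000 readings, three different answers to "how big is it?"
print("wide shape:", shape(wide))
print("long shape:", shape(long_rows))
print("wide bytes:", size_bytes(wide))
print("long bytes:", size_bytes(long_rows))
print("wide bytes, gzipped:", size_bytes(wide, compress=True))
```

The wide layout has few rows and many columns, the long layout has many rows and few columns, and the byte count moves with both the layout and the compression, even though the readings never change.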
I personally prefer the terms “Data Science” or “Analytics” unless I’m working in a niche that generally uses another synonym. I’m convinced that “Big Data” has mainly survived because a sizable set of journalists who don’t understand the field keep using the term.
The downside is that (for better or for worse) the term has stuck. Using it has become something of a necessary evil for anybody (like me) who wants to communicate with people interested in the topic.

Image courtesy of Scott Kveton, “You Have Died of Dysentery” August 31, 2015 via Flickr; Creative Commons 2.0 Generic

Tuesday, September 1, 2015

Data: Where Do You Start?

In conversations about how to use data to create value, one common question is “Where do you start?” 
Oftentimes, people think that it’s necessary to start on day 1 by deploying some complex algorithm with a name that sounds really technical. Not only do I not suggest that approach, I happen to think that it’s a horrible idea.
I start all of my data efforts with a series of questions about the field I’m working in at the time. What’s happened to date? What efforts will move the needle? What are the main things the organization can change to improve the operation? In my experience, data work initiated without a specific question in mind leads to technology without an immediate real-world practical use.
To quote Albert Einstein, “Things should be as simple as possible, but no simpler.”
Photograph by Oren Jack Turner, Princeton, N.J. Modified with Photoshop by PM_Poon and later by Dantadd. [Public domain], via Wikimedia Commons