By @SimonCocking

Interview with Dr. Brian Mac Namee @BrianMacNamee. Brian is a lecturer at the School of Computer Science in University College Dublin (UCD), and a principal investigator at CeADAR, the Centre for Applied Data Analytics Research. Brian has over 10 years' experience working in applications of data analytics to real-world problems, with a particular emphasis on machine learning, novelty detection and data visualisation. See Brian speaking at the Predict Conference, September 15 – 17th.

How did you first become interested in data / predictive analytics?

In 4th year of a computer science degree I took a machine learning module, and was hooked!

How do you think data will change business (the world or your sector)?

We can measure more things now than we ever could before because of the proliferation of small, cheap sensors and the transfer of so many activities online. So I think that the way we analyse data will stay much the same; we will just be able to do it in more and more areas.

What is your best data (data modelling/predictive analysis) tip?

When building predictive models ALWAYS make sure to FULLY understand how the model will be used when deployed VERY EARLY in a project. Otherwise we can end up spending a load of time working on models that don’t match how they will be used, and so are useless.

 

What advice would you give to someone just starting out on their data journey?

Just start, but start easy! Don’t try to jump straight into predictive modelling; try more straightforward work first.

Take some data about something you are interested in – say some of the public datasets covered by the Guardian Data Blog (http://www.theguardian.com/data), the Irish Times data team (http://www.irishtimes.com/news/irish-times-data), the Central Statistics Office (http://www.cso.ie/en/index.html), or sports data from open sources (http://datahub.io/dataset/uk-premier-league-match-by-match-2011-2012, http://www.football-data.co.uk/englandm.php, http://gaelicstats.com/) – and try to use it to answer some questions. Data about sports and societal issues is great for this. So, can you find out how unemployment has changed over the past 10 years in Ireland? Has this been different in different parts of the country? Is it different for men and women, young people and old people, … ? Or, can you find out which Premiership team has the most expensive players? How have the league positions of Premiership teams changed over the last 20 years? Does the money spent on players actually lead to better league positions? Or in GAA, are more points scored from frees than from open play? Has this changed over the last 10 years? Make sure that you are always trying to answer specific questions rather than just “looking around” in a dataset, as this is much more productive.

You can do this sort of analysis with easy-to-access free tools like Excel, Tableau Public, SAS University Edition, R, Python, … Getting used to working with data to answer questions is the most important way to get started.
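As a rough illustration of the question-first approach described above, here is a minimal Python sketch for the question "how has unemployment changed over 10 years, and does it differ by region?". It assumes a CSV with columns named year, region and unemployment_rate; those column names, the region labels and every figure in the sample are made-up placeholders, not real statistics, and the function name is hypothetical. Real data from a source such as the Central Statistics Office would need its own loading and cleaning.

```python
import csv
import io
from collections import defaultdict

# Placeholder figures for illustration only; these are NOT real statistics.
# Swap in a CSV downloaded from a real source and adjust the column names.
SAMPLE_CSV = """year,region,unemployment_rate
2005,East,4.0
2005,West,5.0
2015,East,9.0
2015,West,11.5
"""


def rate_change_by_region(csv_text, start_year, end_year):
    """Return {region: rate in end_year minus rate in start_year}."""
    rates = defaultdict(dict)  # region -> {year: rate}
    for row in csv.DictReader(io.StringIO(csv_text)):
        rates[row["region"]][int(row["year"])] = float(row["unemployment_rate"])
    # Only keep regions that have a figure for both years.
    return {
        region: by_year[end_year] - by_year[start_year]
        for region, by_year in rates.items()
        if start_year in by_year and end_year in by_year
    }


changes = rate_change_by_region(SAMPLE_CSV, 2005, 2015)
for region, change in sorted(changes.items()):
    print(f"{region}: {change:+.1f} percentage points")
```

The point of the sketch is that the code is shaped by one specific question (change between two years, split by region) rather than by open-ended browsing of the dataset.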

What skills do you think a good data scientist or analyst should have?

Coding (Python, R, SAS,…), statistics, machine learning, SQL, data visualisation, problem solving, communication, …

What resources would you recommend (e.g. books, websites, blogs, data, technology, etc.)?

Fundamentals of Machine Learning for Predictive Data Analytics: Algorithms, Worked Examples, and Case Studies, by John D. Kelleher, Brian Mac Namee and Aoife D’Arcy, MIT Press, 2015 (https://mitpress.mit.edu/books/fundamentals-machine-learning-predictive-data-analytics)

KDnuggets (http://www.kdnuggets.com/)

Quick-R (http://www.statmethods.net/index.html)

Which trends in this area will make the biggest change in people’s lives in the next 2 – 5 years?

Giving people more access to their own data has the potential to be a massive impetus for new uses of data. There is huge uncertainty about this though, as it could be that all personal data access enables is widespread Feltron-like navel gazing (http://feltron.com/) – Feltron is great, though!
