Why do you need a data science team instead of a single data scientist?
This article is also posted on ElasticMining.
When we talk about data scientists, maybe a superhero-like figure appears in your mind, because you have read a lot of magazine articles and columns about the magical models they design and how much value they can bring to a business. I must admit that data science is really attractive to business. Once data becomes a kind of resource, the data scientist is seen as a gold miner who is expected to extract something valuable from it. However, anyone who thinks a data scientist can do this ALONE is wrong.
Have you ever seen The Flash or The Avengers? Even though any one of those heroes could destroy the Earth, they still fight their enemies as a TEAM. Likewise, even the best data scientist in the world cannot handle every detail on his or her own. Data science is as complicated as any other science. A mid-sized internet company generates more than 1 TB of data per day, roughly the equivalent of 500 high-quality movies. Someone has to know how to store such large amounts of data, choose the most appropriate algorithm to analyze it, and turn it into as much value as possible; there are simply too many different jobs, each requiring people with different data science skills.
Let’s look at the general life cycle of data analytics. Each stage calls for different domains of knowledge, techniques, and skills:
- Data sources
We live in a complex data environment where data is everywhere, large in volume and varied in form. Data scientists must know how to handle and combine data coming from different sources such as relational and NoSQL databases, social media, sensors, server logs, and so on.
- Data exploration
Sampling, statistics, and visualization help data scientists understand the data first from several angles, including its distribution, its summary statistics, and the correlation between pairs of features.
- Data transformation and cleansing
In most situations, data is useless, or at least cannot be mined, without this stage. Real-world data is dirty and messy: it may contain missing values, undefined values, outliers, or noise, and such dirty data can point the analysis in the wrong direction or even produce a wrong result.
- Data mining and analytics
This stage involves a great deal of algorithms and mathematics. Data scientists choose the appropriate algorithm and model from the many available to predict quantities such as churn rate, click rate, and so on.
- Data visualization and storytelling
Last, and most important, data scientists need to tell a good story about their exciting, brilliant findings and models to their bosses, stockholders, or customers, so good visualization and storytelling skills are very helpful.
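To make the data sources stage a little more concrete, here is a minimal sketch of combining a relational source with server logs. It assumes a hypothetical `users` table (held in an in-memory SQLite database purely for illustration) and hypothetical log lines stored as JSON; the function name `combine_sources` and all field names are invented for this example, not taken from any real system.

```python
import json
import sqlite3

# Hypothetical relational source: a small in-memory SQLite table of users.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (user_id INTEGER PRIMARY KEY, plan TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?)", [(1, "free"), (2, "premium"), (3, "free")])

# Hypothetical semi-structured source: server log lines stored as JSON.
log_lines = [
    '{"user_id": 1, "event": "click"}',
    '{"user_id": 2, "event": "click"}',
    '{"user_id": 1, "event": "purchase"}',
    '{"user_id": 4, "event": "click"}',
]

def combine_sources(conn, log_lines):
    """Attach each log event to its user's plan; unknown users get plan None."""
    # Each row is a (user_id, plan) pair, so the cursor can feed dict() directly.
    plans = dict(conn.execute("SELECT user_id, plan FROM users"))
    combined = []
    for line in log_lines:
        record = json.loads(line)
        record["plan"] = plans.get(record["user_id"])
        combined.append(record)
    return combined

combined = combine_sources(conn, log_lines)
for row in combined:
    print(row)
```

Keeping unmatched events (user 4 here) with a `None` plan, rather than silently dropping them, leaves the decision about those rows to the cleansing stage.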
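For the data exploration stage, a small sketch using only Python's standard library shows sampling, summary statistics, and the correlation between two features. The two columns (`session_minutes`, `pages_viewed`) and their values are hypothetical, and Pearson correlation is only one of several possible correlation measures.

```python
import math
import random
import statistics

# Hypothetical data: minutes per session and pages viewed for ten users.
session_minutes = [12, 5, 30, 22, 8, 17, 25, 3, 14, 19]
pages_viewed = [6, 2, 15, 11, 4, 9, 12, 1, 7, 10]

def summarize(values):
    """Basic summary statistics describing a feature's distribution."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),
        "min": min(values),
        "max": max(values),
    }

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Draw a reproducible random sample of row indices, as one might on a large dataset.
random.seed(42)
sample_rows = random.sample(range(len(session_minutes)), k=5)

summary = summarize(session_minutes)
r = pearson(session_minutes, pages_viewed)
print(summary)
print(f"sampled rows: {sorted(sample_rows)}")
print(f"correlation(session_minutes, pages_viewed) = {r:.3f}")
```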
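The cleansing stage can be sketched in the same way. This example assumes a hypothetical raw column of customer ages containing missing values (`None`, empty strings), an undefined marker (`"N/A"`), and an obvious outlier. It flags outliers with Tukey's IQR fences and fills both gaps and outliers with the median of the remaining values; that is just one common choice among many (dropping rows or model-based imputation are alternatives).

```python
import statistics

# Hypothetical raw column (e.g. customer age) with missing, undefined and outlier values.
raw_ages = ["23", "25", None, "N/A", "27", "24", "250", "26", "", "22"]

def parse(value):
    """Convert a raw field to float; None, empty strings and 'N/A' count as missing."""
    if value is None or value.strip() in ("", "N/A"):
        return None
    try:
        return float(value)
    except ValueError:
        return None  # any other unparseable text is treated as missing too

def iqr_bounds(values, k=1.5):
    """Tukey fences: values outside [Q1 - k*IQR, Q3 + k*IQR] are flagged as outliers."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def clean(raw_values):
    """Replace missing values and outliers with the median of the remaining inliers."""
    parsed = [parse(v) for v in raw_values]
    present = [v for v in parsed if v is not None]
    low, high = iqr_bounds(present)
    inliers = [v for v in present if low <= v <= high]
    fill = statistics.median(inliers)
    return [v if v is not None and low <= v <= high else fill for v in parsed]

cleaned = clean(raw_ages)
print(cleaned)
```

Using the median rather than the mean as the fill value keeps the single extreme value (250) from dragging the imputed ages upward.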
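As one illustration of the mining and analytics stage, the sketch below predicts churn with logistic regression trained by plain batch gradient descent. The features (`days_inactive / 30`, `support_tickets`), the eight labelled customers, and the learning rate and epoch count are all hypothetical choices for the example; in practice a team would more likely use a tested library and compare several models.

```python
import math

# Hypothetical training data for a churn model. Each row holds two features:
# [days_inactive / 30, support_tickets]; label 1 = churned, 0 = stayed.
X = [[0.1, 0], [0.2, 1], [0.1, 0], [0.3, 0],
     [0.9, 3], [1.0, 2], [0.8, 4], [0.7, 3]]
y = [0, 0, 0, 0, 1, 1, 1, 1]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(w, b, x):
    """Estimated probability that the customer described by x churns."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

def train_logistic(X, y, lr=0.5, epochs=2000):
    """Fit logistic regression weights with batch gradient descent on log-loss."""
    n_features = len(X[0])
    w, b = [0.0] * n_features, 0.0
    m = len(X)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for xi, yi in zip(X, y):
            err = predict_proba(w, b, xi) - yi  # derivative of log-loss w.r.t. the logit
            for j in range(n_features):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / m for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / m
    return w, b

w, b = train_logistic(X, y)
accuracy = sum((predict_proba(w, b, xi) >= 0.5) == bool(yi) for xi, yi in zip(X, y)) / len(y)
print(f"weights={w}, bias={b:.3f}, training accuracy={accuracy:.2f}")
print(f"P(churn | 28 idle days, 3 tickets) = {predict_proba(w, b, [28 / 30, 3]):.2f}")
```

Training accuracy on eight points says little about real performance; a held-out test set or cross-validation would be needed before trusting such a model.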
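Finally, for visualization and storytelling, even a very simple chart can carry the headline of a finding. The sketch below assumes hypothetical per-customer records of `(plan, churned)` and renders churn rate per plan as a plain-text bar chart; a real team would typically use a plotting library or dashboard tool, which this stdlib-only example deliberately avoids.

```python
# Hypothetical per-customer records: (plan, churned?).
records = ([("free", True)] * 4 + [("free", False)] * 6
           + [("premium", True)] * 1 + [("premium", False)] * 9)

def churn_rate_by_segment(records):
    """Fraction of churned customers within each segment."""
    totals, churned = {}, {}
    for segment, did_churn in records:
        totals[segment] = totals.get(segment, 0) + 1
        churned[segment] = churned.get(segment, 0) + int(did_churn)
    return {s: churned[s] / totals[s] for s in totals}

def text_bar_chart(rates, width=40):
    """One line per segment, highest rate first, bar length proportional to the rate."""
    lines = []
    for segment, rate in sorted(rates.items(), key=lambda kv: kv[1], reverse=True):
        bar = "#" * round(rate * width)
        lines.append(f"{segment:<10} {bar} {rate:.0%}")
    return lines

rates = churn_rate_by_segment(records)
chart = text_bar_chart(rates)
print("\n".join(chart))
```

Sorting the bars from highest to lowest rate puts the most important segment first, which is usually where the story for a boss or customer should start.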