After covering what data science is, its key pillars, and the roles and responsibilities of a data scientist, one major question remains: why do we need data science? Before answering it, let’s briefly discuss why you might want to do data science in the first place, since that motivation will help you keep learning.
Why do data science?
Demand for people with data science skills is immense. According to the LinkedIn 2020 U.S. Emerging Jobs Report, Data Scientist ranked #3 with 37% annual growth, and the field has topped the Emerging Jobs list for three years running. Glassdoor, in its ranking of the top 50 jobs in America, placed Data Scientist as the #3 job in the US in 2020, based on job satisfaction (4.0/5), salary ($107,801), and demand. According to the developer roles section of the Stack Overflow Developer Survey 2020, about 8.1% of respondents identify as data scientists or machine learning specialists.
So this is a great time to get into data science. Not only do we have more data and more tools for gathering, warehousing, and interpreting it, but the need for data scientists is also growing steadily and is seen as essential in many diverse sectors, not just business and academia. With that, let’s come to the main topic.
Why do we need data science?
You may notice the term “data” in “data science”. So what exactly is data? Let’s discuss the term briefly.
What is Data?
Since we have spent some time discussing what data science is, it’s worth spending some time on what exactly data is. Wikipedia defines data as:
A set of values of qualitative or quantitative variables.
This definition focuses on what data entails. Although it is a reasonably short definition, let’s take a second to parse it and look at each component individually.
- A set of values: The first term to concentrate on is “a set of values”. To have data, we need a set of items to measure. In statistics, this set is known as the population. For example, the set needed to answer your question might be all websites or applications, or it might be all people taking a particular drug or all people visiting a particular website. In general, it’s the set of things that you’re going to make measurements on.
- Variables: The next thing to focus on is “variables”. Variables are measurements or characteristics of an item. For example, you could be measuring the weight of a person, or the amount of time a person spends on a website or app. Or it may be a more qualitative characteristic, like what a person clicks on a website, or whether you think the person visiting is male or female.
- Qualitative and quantitative variables: Finally, we have both “qualitative and quantitative variables”. Qualitative variables are information about qualities, such as country of origin, gender, or religion. They’re usually represented by words, not numbers, and they are not measured or ordered. Quantitative variables, on the other hand, are information about quantities. They are normally represented by numbers and are measured on a continuous ordered scale; examples include weight, height, age, and blood pressure.
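To make the distinction concrete, here is a minimal Python sketch. The `visitors` records, their field names, and the helper functions `variable_kind` and `describe` are all hypothetical and invented for illustration. The rule used (numbers are quantitative, everything else is qualitative) is a deliberate simplification: numeric codes such as postal codes are stored as numbers but are really qualitative.

```python
from numbers import Number

# Hypothetical records: each dict is one item from the set of values,
# and each key is a variable measured on that item.
visitors = [
    {"country": "India", "weight_kg": 68.5, "minutes_on_site": 12},
    {"country": "Brazil", "weight_kg": 74.0, "minutes_on_site": 3},
    {"country": "Japan", "weight_kg": 59.2, "minutes_on_site": 27},
]


def variable_kind(values):
    """Label a variable quantitative if every value is a number, else qualitative.

    This is a naive rule of thumb, not a statistical test.
    """
    is_numeric = all(
        isinstance(v, Number) and not isinstance(v, bool) for v in values
    )
    return "quantitative" if is_numeric else "qualitative"


def describe(records):
    """Map each variable name to its kind, based on the values in the records."""
    names = records[0].keys()
    return {name: variable_kind([r[name] for r in records]) for name in names}


kinds = describe(visitors)
for name, kind in kinds.items():
    print(f"{name}: {kind}")
```

Here the three records are the set of values, the dictionary keys are the variables, and `describe` sorts those variables into the two kinds discussed above.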
Now that we have a basic understanding of data, there is another term we hear frequently in the data science world: big data. It deserves an introduction here, since it has been so integral to the rise of data science.
What is Big Data?
Big data literally means large amounts of data. It underpins the idea that a large body of data allows useful inferences that were not possible before with smaller datasets. Extremely large datasets can be analyzed computationally to reveal patterns, trends, and associations that are otherwise not obvious or easy to identify.
Why is everyone interested in Big Data?
Big data is everywhere!
Every time you go on the web and do something, data is collected. Every time you buy something from an e-commerce site, your data is collected. Whenever you go to a store, data is collected at the point of sale. When you make bank transactions, that data is recorded, and when you use social networks like Facebook or Twitter, that data is collected too. These are mostly social data, but the same thing is starting to happen with real engineering plants: real-time data is collected from plants all over the world. Beyond that, more sophisticated work such as molecular simulations generates tons of data, which is also collected and stored.
How much data is Big Data?
- Google processes 20 petabytes (PB) per day (2008)
- Facebook has 2.5 PB of user data + 15 TB per day (2009)
- eBay has 6.5 PB of user data + 50 TB per day (2009)
- CERN’s Large Hadron Collider (LHC) generates 15 PB a year
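Because the figures above mix units (PB vs. TB) and time spans (per day vs. per year), a rough conversion to one common unit makes them easier to compare. The following Python sketch does that arithmetic under stated assumptions: decimal units (1 PB = 1000 TB), a 365-day year, and the reading that the "+ TB per day" figures for Facebook and eBay mean newly added data. The names `TB_PER_PB`, `pb_to_tb`, and `daily_tb` are made up for this illustration.

```python
TB_PER_PB = 1000      # assumption: decimal units, 1 PB = 1000 TB
DAYS_PER_YEAR = 365   # assumption: non-leap year


def pb_to_tb(petabytes):
    """Convert petabytes to terabytes using decimal units."""
    return petabytes * TB_PER_PB


# Daily volumes in TB, taken from the list above.
daily_tb = {
    "Google (processed, 2008)": pb_to_tb(20),
    "Facebook (added, 2009)": 15,
    "eBay (added, 2009)": 50,
    "CERN LHC (generated)": pb_to_tb(15) / DAYS_PER_YEAR,
}

# Print the sources from largest to smallest daily volume.
for source, tb in sorted(daily_tb.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{source}: {tb:,.1f} TB/day")
```

Note that these numbers describe different things (data processed, data added, data generated) and different years, so the comparison only gives a sense of scale.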
So one of the reasons for the acceleration of data science in recent years is the enormous volume of data (that is, big data) currently available and being generated. Not only are huge amounts of data being collected about many aspects of the world and our lives, but we have also seen the rise of inexpensive computing. Together these have formed a perfect storm in which we have rich data and the tools to analyze it: growing computer memory capacities, better software, more capable processors, and now more data scientists with the skills to put all this to use and answer questions using the data. And that’s the big reason why we need data science, now and in the future.