Overview
- Amazon Web Services (AWS) is the leading cloud platform for deploying machine learning solutions
- Every data science professional should learn how AWS works
Introduction
“Your machine ran out of memory.”
Sound familiar? It certainly is for me, especially anytime I try to run a complex machine learning algorithm on my personal machine. It's a frustrating experience that a lot of data science professionals share. We don't have the unlimited computing power of the tech behemoths, so what should we do?
This is where the power of the cloud has transformed data science. And Amazon, with its AWS offering, has conquered the data science market like nothing before.
Cloud computing has seen tremendous growth in the past few years. Almost every organization nowadays uses cloud computing for its wide range of services. 70% of all the money spent on tech is expected to go into cloud services by the end of 2020.
Did you know that AWS’s revenue in the first quarter of 2020 was $10 billion? That’s almost twice as much as its next closest competitor! Every data science professional, from a data scientist to a data analyst, needs to learn AWS and how it works.
So in this article, let’s dive into what AWS is and find out why it has come to the forefront of cloud computing services.
Table of Contents
- What is Amazon Web Services (AWS)?
- History of Amazon Web Services
- Services provided by Amazon Web Services
- Here’s why you can’t use your local system for all of your data tasks
- How can Amazon Web Services help you?
- Why do companies emphasize AWS knowledge for their data scientists?
What is Amazon Web Services (AWS)?
AWS is a cloud computing platform by Amazon that provides services such as Infrastructure as a Service (IaaS), Platform as a Service (PaaS), and packaged Software as a Service (SaaS) on a pay-as-you-go basis. It was launched in 2006 and grew out of the infrastructure Amazon had built to handle its own online retail operations.
AWS has 3 main products:
- EC2 (Amazon Elastic Compute Cloud): EC2 lets users rent virtual machines/servers on which they run their own applications. These servers come with different operating systems, and Amazon charges you based on the computing power and capacity of the server (i.e. hard drive capacity, CPU, memory, etc.) and the duration the server has been up
- Glacier: Glacier is a low-cost online file storage web service. Amazon Glacier is designed for the long-term storage of inactive data that does not need to be retrieved quickly
- S3 (Amazon Simple Storage Service): S3 provides object storage through a web service interface, with scalability and high speed as its main strengths (see the sketch below)
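Most AWS services, including S3, can also be used programmatically through the AWS SDK for Python (boto3). Here is a minimal sketch, assuming AWS credentials are already configured on your machine; the bucket name and object key are hypothetical:

```python
import boto3

# Create an S3 client (assumes AWS credentials are configured,
# e.g. via `aws configure` or environment variables)
s3 = boto3.client("s3")

bucket = "my-ml-datasets"  # hypothetical bucket name

# Upload a local training file to S3 for durable, scalable storage
s3.upload_file("train.csv", bucket, "datasets/train.csv")

# Download it later, e.g. onto an EC2 instance with more memory than your laptop
s3.download_file(bucket, "datasets/train.csv", "/tmp/train.csv")
```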
AWS provides its consumers with many advantages:
- Security: AWS provides comprehensive security capabilities to satisfy even the most demanding requirements
- Compliance: AWS offers rich controls, auditing, and broad security accreditations
- Hybrid capability: It allows you to build hybrid architectures that extend your on-premises infrastructure to the cloud
- Scalability: It allows scaling up and scaling down with ease
- Pay-as-you-go: You pay only for the services you use. Use less, pay less. Use more, pay more, but the per-unit price goes down as you scale up
History of Amazon Web Services (AWS)
AWS was initially launched in 2002, but it provided only a few services. In 2006, AWS launched its core cloud products, which included Amazon S3 cloud storage, SQS (Simple Queue Service), and EC2, and in doing so marked its entry into the online core services market.
In 2009, AWS expanded internationally to Europe, where S3 and EC2 were launched. Elastic Block Store (EBS), which provides block-level storage, and Amazon CloudFront, a content delivery network, were also released and incorporated into AWS.
EBS provides block-level storage for use with Amazon EC2 instances; EBS volumes are network-attached and persist independently of the life of an instance.
Over the years, a lot of services were added to the AWS platform which has made it a cost-effective and highly scalable platform. Now, AWS has its data centers all over the world including the United States, Japan, Europe, Australia, and Brazil.
AWS Global Infrastructure map
Services provided by Amazon Web Services
The following services are provided by AWS in the respective domains:
- Compute Services:
- EC2 (Elastic Compute Cloud)
- EKS (Elastic Kubernetes Service)
- Lambda
- Amazon Lightsail
- Elastic Beanstalk
- Database Services:
- Neptune
- RDS
- Aurora
- Redshift
- DynamoDB
- ElastiCache
- Security Services:
- KMS (Key Management Service)
- AWS IAM (Identity and Access Management)
- Inspector
- WAF (Web Application Firewall)
- Cloud Directory
- Certificate Manager
- Organizations
- Shield
- Macie
- GuardDuty
- Storage Services:
- Amazon Glacier
- S3 (Simple Storage Service)
- AWS Snowball
- Elastic Block Store
- Migration Services:
- Snowball
- DMS (Database Migration Service)
- SMS (Server Migration Service)
- Analytical Services:
- Kinesis
- QuickSight
- EMR (Elastic MapReduce)
- Data Pipeline
- CloudSearch
- Athena
- Elasticsearch Service
- Management Tools:
- CloudWatch
- CloudFormation
- CloudTrail
- OpsWorks
- Config
- AWS Auto Scaling
- Messaging Services:
- Pinpoint
- SQS
- SES
- SNS
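Each of these services exposes an API that you can call through the AWS SDKs. As an illustration, here is a minimal Python sketch (using boto3) that touches SQS from the messaging services listed above; the queue name and region are hypothetical and AWS credentials are assumed to be configured:

```python
import boto3

# Create an SQS client (assumes AWS credentials are configured;
# the region and queue name below are hypothetical)
sqs = boto3.client("sqs", region_name="us-east-1")

# Create a queue and send a message, e.g. to decouple steps in a data pipeline
queue_url = sqs.create_queue(QueueName="demo-queue")["QueueUrl"]
sqs.send_message(QueueUrl=queue_url, MessageBody="hello from the data pipeline")

# Receive and process the message, then delete it from the queue
response = sqs.receive_message(QueueUrl=queue_url, MaxNumberOfMessages=1)
for message in response.get("Messages", []):
    print(message["Body"])
    sqs.delete_message(QueueUrl=queue_url, ReceiptHandle=message["ReceiptHandle"])
```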
For more information on the services provided by AWS, refer to the official AWS documentation.
By now, you should have a broad understanding of what AWS is. So now, let’s shed some light on why companies require their data scientists to know AWS.
Here’s why you can’t use your local system for all of your data tasks
Remember sitting idle, waiting for your system to respond? Here is a list of problems that your local system struggles to overcome:
- The machine you run your tasks on has limited processing power, which drags down your turnaround time. You must have noticed this while processing huge volumes of data, and I am pretty sure the thought of an external, centrally managed system has crossed your mind
- Large datasets don’t fit into your system’s memory, which is required for analytics or model training. Remember when your Jupyter Notebook got stuck? (see the sketch after this list)
- It costs a lot, both in terms of time and money, to install and maintain your own hardware
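Once a dataset lives in cloud storage like S3 instead of on your laptop, you can also process it in manageable chunks rather than loading it all into memory at once. A minimal sketch, assuming pandas and the s3fs package are installed, credentials are configured, and the bucket and file path are hypothetical:

```python
import pandas as pd

# Stream a large CSV from S3 in 1-million-row chunks instead of reading it all at once
# (the bucket and key below are hypothetical; requires the s3fs package and AWS credentials)
total_rows = 0
for chunk in pd.read_csv("s3://my-ml-datasets/datasets/train.csv", chunksize=1_000_000):
    total_rows += len(chunk)  # replace with your own per-chunk aggregation or feature engineering

print(f"Processed {total_rows} rows without holding the full file in memory")
```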
How can Amazon Web Services help you?
I am sure many of you are still wondering why you should use AWS. Why not go for something else (like Google’s GCP)? Let me answer this by listing the following benefits of AWS:
- User Friendly: AWS has a well-documented user interface and removes the need for on-site servers to meet your IT demands, which makes it easy to deploy programs and software whenever you need to
- Diverse Tools: Earlier in this article, we saw the diverse range of services AWS has to offer. It is an all-in-one solution for your IT and cloud requirements
- Computing Capacity: You no longer need to worry about whether a large dataset will fit into your machine’s memory; you can rent an instance with as much memory and compute power as the task requires
- Infrastructure: The AWS Global Cloud Infrastructure is the most extensive and reliable cloud platform, offering over 175 fully featured services from data centers around the world. Whether you need to deploy your application workloads across the globe in a single click or build and deploy specific applications closer to your end-users with single-digit millisecond latency, AWS provides the cloud infrastructure where and when you need it
- Pricing: I sense this will be the most convincing point! AWS is one of the cheapest platforms for cloud services, which is really useful for small businesses that want to function and grow without allocating much working capital to servers
2020 Gartner Magic Quadrant for Cloud Infrastructure and Platform Services
Why do companies emphasize AWS knowledge for their data scientists?
Whichever firm you work for, cloud infrastructure will become an important part of your daily data science routine, because companies have become more inclined towards cloud computing for their solutions.
According to a report from Indeed.com, AWS rose from a 2.7% share in tech skills in 2014 to 14.2% in 2019. That’s a 418% change!
This is because of the pricing model on which AWS works. AWS works on a pay-as-you-go model and charges on either a per-hour or a per-second basis. It also provides an option to reserve a specific amount of computing capacity at discounted rates.
Additionally, AWS keeps in mind prospective consumers who can’t yet afford its services. For them, it provides the AWS Free Tier, which allows them to gain hands-on experience with AWS services absolutely free.
All businesses, whether big or small, want to save costs. Small companies save the cost of buying servers, while conglomerates gain reliability and productivity. AWS services are also very powerful: where it can take days to set up a Hadoop cluster with Spark on your own hardware, AWS does it within a few minutes.
End Notes
In today’s competitive world, having hands-on experience with cloud services like AWS gives a great lead in the data science race. AWS is now very popular among businesses and your experience with such cloud computing platforms highlights your skills during the recruitment process.
Here are some additional resources that you should look into:
- The guide to quickly learn Cloud Computing in R Programming
- Essential Functionalities to Guide you While using AWS Glue and PySpark!
I hope this article serves as a solid argument for why cloud computing is necessary for data scientists. Please use the comment section below if you have any thoughts to share or general queries.