Introduction
As we bid adieu to 2021, we should acknowledge that it was another crazy year in the history of humanity. We all aged together through a seemingly never-ending pandemic that kept turning up new variants every now and then.
Coming to data science, 2021 was an incremental year for the industry in terms of breakthroughs, but one in which we saw a steep rise in the demand for data professionals, the rise of data engineering, and further developments in MLOps.
As is our annual tradition, Analytics Vidhya is back with its review of the best developments and breakthroughs in data science in 2021, and we also look ahead to what you can expect in 2022. There’s a lot to unpack here, so let’s get going.
Developments in Computer Vision (CV) in 2021
As we mentioned earlier, this was a year of incremental growth rather than major breakthroughs for computer vision. Instead, we saw computer vision applications being adopted across various industries.
With the workforce shrinking in traditional sectors, many companies started applying already-developed CV applications to keep their work going.
Let’s have a look at one of the major developments that happened in this sector.
OpenCV AI Kit
According to the hugely-successful Kickstarter Campaign:
OAK is a modular, open-source ecosystem composed of MIT-licensed hardware, software, and AI training – that allows you to embed the super-power of spatial AI plus accelerated computer vision functions into your product. OAK provides in a single, cohesive solution what would otherwise require cobbling together disparate hardware and software components.
In simple terms, it packs spatial AI and accelerated computer vision into a single device that you can build your own applications on. The best part is that it fits in your pocket, so you can carry it with you at all times. It is easy to set up and can be connected with your OS in just three steps, and because it is open-source, you can dig in and change anything you want.
Image 1
Some major applications:
– 3D Object/Vehicle Detection: This is what humans do naturally. We perceive the different objects around us in physical space; that is how we can pick up a glass of water or catch a thrown ball. In the same way, while driving you need to know the speed, trajectory, type, color, and license plate of the vehicles around you.
– 3D Semantic Segmentation: 3D depth information is semantically labelled per pixel. It allows your robot to stay on the sidewalk or know when there is an object in its path.
– Human Pose Estimation: The product can track your hand’s position and even your full-body pose in full 3D coordinates, even while you are on the move. It can also estimate your 3D hand pose, which opens up all sorts of interactive control.
– 3D Manipulation and Control: You could pair this with a holographic display from the Looking Glass Factory and make the future now. Your robot can sense how someone is looking at you, or how you are looking at someone or something. Interesting? Well, the best part is that your robot won’t get as tempted as you do while you gaze at your favourite ice cream.
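To make "full 3D coordinates" concrete, here is a minimal sketch of how a pixel with a depth reading can be turned into a 3D point using the standard pinhole camera model. It is an illustration only and does not use the OAK or DepthAI API; the `Intrinsics` class, the `pixel_to_3d` function, and the camera values are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Intrinsics:
    """Pinhole camera parameters (hypothetical values, not OAK calibration data)."""
    fx: float  # focal length along x, in pixels
    fy: float  # focal length along y, in pixels
    cx: float  # principal point x, in pixels
    cy: float  # principal point y, in pixels


def pixel_to_3d(u, v, depth_m, k):
    """Back-project pixel (u, v) with a depth in metres to camera-frame (x, y, z)."""
    x = (u - k.cx) * depth_m / k.fx
    y = (v - k.cy) * depth_m / k.fy
    return (x, y, depth_m)


# Assumed intrinsics for a 640x480 image, chosen only for illustration.
k = Intrinsics(fx=500.0, fy=500.0, cx=320.0, cy=240.0)

# A pixel 100 columns right of the image centre, seen 2 metres away.
point = pixel_to_3d(420, 240, 2.0, k)
print(point)
```

A depth camera applies the same idea to every pixel, which is what lets a detector report where an object is in space rather than only where it is in the image.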
OAK-D Lite and the open-source DepthAI OAK ecosystem give you all the pieces, thoughtfully and performantly combined. The product saves you a lot of time and is cheaper than a custom build. You no longer have to kludge together disparate firmware, software, and hardware to harness the power of spatial AI, because it is tightly integrated into OAK-D Lite. And since it is open-source, it can be extended and its functionality changed as and when required.
The product has created a lot of excitement in the market. With an affordable price (early bid price of $74), many customers have already claimed one. The expected delivery date for backers is December 2021 or early 2022.
Want to know more? Read here and learn how to create real-time reaction videos.
If you would like to read about OpenCV, then head to our blog.
NLP Updates 2021
Unlike last year, with its tremendous breakthroughs, this was a year of adoption for NLP. With widespread industry use, companies have been able to cut their costs considerably.
These are two of the biggest developments we found in this sector:
Google LaMDA
Image 2
LaMDA is Google’s latest research breakthrough, announced at Google I/O in May 2021. LaMDA stands for Language Model for Dialogue Applications, and it can engage in a free-flowing conversation irrespective of the topic. This is a breakthrough in unlocking more natural ways of interacting with technology and new categories of supportive applications.
LaMDA is not restricted to pre-fed answers, which should lead to a more user-friendly experience. Conversations with this new model feel more meaningful and alive.
Google has been fine-tuning the model to improve the sensibleness and specificity of its responses. Trials are still going on, with a focus on making sure LaMDA upholds high standards of fairness, accuracy, and privacy. For now it works on text only; it has yet to be extended to other formats such as audio, video, and images.
This is going to be very interesting. Are we going to replace our friends with such technologies, share everything with them, and have no emotional drama involved? Only time will tell.
Hugging Face releases Optimum and Infinity
Image 3
Hardly any NLP enthusiast is unfamiliar with Hugging Face. They came up with two developments this year that excited the community.
Check out their website
- Optimum – Transformers have been a game-changer when it comes to improving the accuracy of Machine Learning and NLP models. But putting these huge models into production and using them at scale has always been a challenge.
Through Optimum, a new open-source library, Hugging Face aims to build the definitive toolkit for Transformers production performance and enable maximum efficiency when training and running models on specific hardware. It aims to provide performance optimization tools targeting efficient AI hardware, built in collaboration with their Hardware Partners, and to turn Machine Learning Engineers into ML Optimization wizards.
The Transformers library made using state-of-the-art models easy by hiding the complexity of frameworks, architectures, and pipelines. With the Optimum library, Hugging Face aims to help engineers use all the available hardware features at their disposal, reducing the complexity of model acceleration on hardware platforms.
- Infinity – This on-prem containerized solution delivers Transformers accuracy at 1ms latency. It helps you speed up inference on your own infrastructure. It can be deployed in any production environment and can easily be scaled to thousands of requests every second. It is designed to achieve 1ms latency for BERT-like models on GPU and 4ms on CPU.
Infinity meets the highest security requirements and can be integrated into your system, including air-gapped environments where you can control your model, data, and traffic.
Data Science Contribution towards COVID-19 in India
The pandemic ravaged the entire globe in 2020 and continued to do so in 2021, with many countries being hit by a 2nd wave. But nothing could dim the light of hope put out by the data science community. Here are a few instances where data science came to the rescue and helped minimize the damage.
Integrating FTIR Microscopy with Artificial Intelligence
India’s 2nd wave provided an ideal environment for black fungus (mucormycosis) to propagate, with thousands of cases being detected across the country.
To tackle this, data science experts built systems that combine FTIR microscopy and artificial intelligence to analyze infected patients’ samples.
Fourier-transform infrared (FTIR) microscopy is an efficient, rapid, and repeatable process for obtaining spectral fingerprints of biomolecules. The researchers trained a computer-based model to recognize the distinct signals of black fungus molecules; the model then matches the data from each patient sample against a reference spectrum.
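The idea of matching a sample against a reference spectrum can be illustrated with a small sketch. The article does not describe the researchers' actual model, so the example below substitutes plain cosine similarity as a stand-in; the function names, labels, and absorbance values are all made up for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length spectra."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def best_match(sample, references):
    """Return (label, score) for the reference spectrum most similar to the sample."""
    scored = ((label, cosine_similarity(sample, ref)) for label, ref in references.items())
    return max(scored, key=lambda pair: pair[1])


# Hypothetical absorbance readings at five wavenumbers (not real FTIR data).
references = {
    "fungal_reference": [0.10, 0.80, 0.30, 0.05, 0.60],
    "healthy_reference": [0.70, 0.10, 0.20, 0.65, 0.10],
}
sample = [0.12, 0.75, 0.35, 0.08, 0.55]

label, score = best_match(sample, references)
print(label, round(score, 3))
```

A trained model would learn which parts of the spectrum matter most instead of weighting every wavenumber equally, but the matching step follows the same pattern: compare the sample's fingerprint with known ones and report the closest.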
DarwinAI
At a global level, we saw unique diagnostics projects, such as DarwinAI (Canada). With this computer vision tool in place, it is possible to diagnose COVID-19 from chest radiography scans alone. Before, the only medical-imaging method for diagnosing COVID-19 was computed tomography (CT).
Real-Time Data Analytics
The South Korean government took major preventive measures using real-time analytics for strategy design and patient surveillance.
The system uses data from the IoT and AI solutions underlying live smart city networks, along with personal information provided by confirmed patients. With the help of big data analytics, this allows researchers to track patients’ movements, identify their contacts, and predict the potential scale of an outbreak in a given region. The data is also used for drafting preventive measures and instructions.
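As a rough illustration of the contact identification step, the sketch below groups movement records by place and hour and returns everyone who overlapped with a confirmed patient. It is a toy example under simplifying assumptions and not the South Korean system; the `find_contacts` function, the record format, and the sample data are hypothetical.

```python
from collections import defaultdict


def find_contacts(visits, patient_id):
    """Return the people who shared a place and hour with the given patient.

    `visits` is an iterable of (person, place, hour) records.
    """
    # Group everyone by the (place, hour) slot they were seen in.
    by_slot = defaultdict(set)
    for person, place, hour in visits:
        by_slot[(place, hour)].add(person)

    # Collect everyone who shared at least one slot with the patient.
    contacts = set()
    for people in by_slot.values():
        if patient_id in people:
            contacts |= people - {patient_id}
    return contacts


# Hypothetical movement records, for illustration only.
visits = [
    ("patient_1", "cafe", 9),
    ("person_a", "cafe", 9),
    ("person_b", "cafe", 11),
    ("patient_1", "gym", 18),
    ("person_c", "gym", 18),
]

contacts = find_contacts(visits, "patient_1")
print(sorted(contacts))
```

Real systems work with far noisier location data and finer time windows, but the core join on shared place and time is the same.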
MLOps in 2021
The goal of MLOps is to manage and accelerate the lifecycle of analytics and ML models from development into production. With billions of dollars being invested in MLOps infrastructure, 2021 saw a rise in companies taking MLOps seriously.
In 2021, start-ups took the lead in helping companies get more machine learning models into production, which makes it worthwhile to define and monitor this market. Most MLOps start-ups focus first on tabular data and then expand into other data types.
MLOps start-ups show a standard progression: they first master tabular data with their own data governance, data monitoring, ML monitoring, ML platform, and serving offerings. MLOps is a market ripe for private equity investors looking for M&A opportunities and for investors looking to get into AI. Mid-sized companies are expected to start investing in and buying this technology to move a level up in innovation.
In 2022, MLOps will continue to help manage and accelerate the lifecycle of analytics and ML models from development into production. Start-ups need insight, innovation, and urgency to solve these problems. Their solutions can then deliver value to enterprise customers who need to get more out of their ML models.
Data Engineering Trends in 2021
According to Research and Markets, the data engineering market is expected to grow to USD 77.37 billion by 2023. With the pandemic, people have adopted a “living on the internet” lifestyle. This has led to more streaming and live data to be analyzed, which in turn raised the demand for data engineering jobs in 2021.
We saw the following trends in 2021:
– Transition to the Cloud: Work-from-home and hybrid working models have encouraged many companies to move their data to the cloud. The transition to the cloud has multiple advantages, including cost and time savings, reliability, and mobility. Millennials and other employees are drawn to organizations that use the latest tools and technologies, which in turn helps advance the business.
– Snowflake and Kubernetes: Snowflake has become a popular technology and grew steadily in 2021. The tool is flexible, user-friendly, and supports cloud platforms. Kubernetes usage has also been increasing, as data engineering roles now come with more DevOps responsibilities.
– Healthcare Hiring: Looking at employment trends for data engineers, it is evident that there is a strong push in the healthcare industry. The share of analytics professionals employed in healthcare has nearly tripled, reaching 18% in a survey done on a small sample.
– Popularity of Notebooks Continues to Rise: Notebook interfaces have been popular in the data science industry for many years now. And, yes, notebooks will continue to gain traction among data engineers in 2022 as well. Notebooks allow data engineers to mix and match languages as each task requires.
Analytics Vidhya’s Take on Data Industry Trends in 2022
1. The number of jobs in the data science domain will continue to rise in 2022 – The trend continues as we predicted last year. With data engineering and MLOps taking precedence, there will be no turning back.
2. More focus on Ethical AI – Data is available everywhere. The amount of data generated and collected has far exceeded our expectations. With so much information at hand, it is only logical to focus on collecting just the data that is required and using it only for the stated purposes, rather than selling it to third parties without any concern.
There will be many new explorations in the field of data science as we step into 2022. Driven by this demand, data science enthusiasts keep exploring and trying their hand at different projects to contribute to the community.
What are your thoughts and predictions for 2022? Please share with us in the comments below.
References
– Image 1: https://www.indiegogo.com/projects/opencv-ai-kit-lite#/
– Image 2: https://blog.google/technology/ai/lamda/
– Image 3: https://huggingface.co