The State of Enterprise NLP in 2020
This has been a unique year for public health, professional life, the economy, and just about every other aspect of daily life. With some companies closing their doors and others pivoting their business models, businesses that haven’t taken a hit are a rare breed. Even so, some sectors are thriving, and it’s not just virtual conferencing or healthcare.
Natural Language Processing (NLP) is one of those areas. In fact, the NLP market is expected to grow from $10.2 billion in 2019 to $26.4 billion by 2024, according to research from MarketsandMarkets™. Several use cases are driving this uptick, including assisting patients and practitioners in healthcare settings, easing customer service queries, and powering virtual assistants that help shoppers. NLP can help users work faster, smarter, and more accurately, whether they are novices or experienced data scientists.
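To put the market forecast above in annual terms, here is a minimal sketch that computes the compound annual growth rate (CAGR) implied by the two quoted figures. The function name `implied_cagr` is ours, and treating 2019 to 2024 as five compounding years is an assumption about how the forecast is framed.

```python
def implied_cagr(start_value, end_value, years):
    """Compound annual growth rate implied by two endpoint values."""
    return (end_value / start_value) ** (1 / years) - 1


# Market sizes quoted in the article, in billions of US dollars.
market_2019 = 10.2
market_2024 = 26.4

# Assumption: 2019 -> 2024 is treated as five compounding years.
cagr = implied_cagr(market_2019, market_2024, years=2024 - 2019)
print(f"Implied CAGR: {cagr:.1%}")
```

Under that assumption, the forecast works out to growth of roughly 21% per year.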
To get a sense of how NLP is poised to grow over the next few years, we first need to understand the state of NLP now, including its challenges, successes, and most prevalent use cases. To that end, John Snow Labs, in partnership with Gradient Flow, recently issued new research exploring the use of NLP across industries, geographies, and levels of adoption. Knowledge is power, and the goal of this survey is to help IT leaders realize NLP’s full potential by learning how organizations are using the technology.
The global survey – which queried nearly 600 respondents from more than 50 countries – gives a comprehensive view into the 2020 state of NLP adoption and implementation. The key findings below will help set a benchmark for the industry and track where we’re going as NLP advances over the next year.
NLP Spending is on the Rise: Despite the downturn in IT spending this year, respondents across the board reported NLP technology budgets that were 10-30% higher than last year. This is especially significant given that the survey was run during the peak of the global COVID-19 pandemic, when worldwide IT spending was down (Gartner). A majority (53%) of respondents who are technical leaders indicated their NLP budget was at least 10% higher than in 2019, with 31% stating their budget was at least 30% higher than the previous year. The same trend applies to large companies (those with more than 5,000 employees), where 61% of respondents cited budget increases in 2020.
Cloud Use Presents Challenges: 77% of all survey respondents indicated that they use at least one of the four NLP cloud services listed: Google, AWS, Azure, or IBM. Despite the popularity of cloud-based services, respondents cited cost as the key challenge they face when using them. There are also concerns about extensibility, since so many NLP applications depend on domain-specific language use, and cloud providers have been slow to serve these market needs. Given these concerns, it’s not surprising that 53% of respondents reported using at least one of the top two NLP libraries, Spark NLP and spaCy, which offer a more accurate and cost-effective option.
Accuracy is Important and Challenging: More than 40% of all respondents named accuracy as the most important criterion they use to evaluate NLP libraries. This is especially important given NLP’s use in critical applications, such as processing electronic health records or detecting adverse drug events in a healthcare setting. On the flip side, accuracy was also the challenge most frequently cited by all respondents. The picture changed slightly among the subset of respondents who identified as technical leaders: for them, integration issues, language support, and scalability ranked right alongside accuracy as pressing challenges. Fortunately, some of these areas, such as language support, are improving rapidly. Companies such as Google and Facebook are publishing pre-trained embeddings for 150+ languages, and NLP libraries are following suit.
Classification and NER are the Main Use Cases: The four most popular applications of NLP are Document Classification, Named Entity Recognition (NER), Sentiment Analysis, and Knowledge Graphs. Respondents from healthcare cited de-identification as another common NLP use case. De-identification was once an extremely manual and labor-intensive process, and automating it with NLP has made it far less of a burden. NER and Classification are two other NLP use cases in which healthcare organizations are seeing great value. For example, these applications can help medical professionals identify adverse drug events (ADE) in patients quickly and accurately, improving care and lessening the burden and cost on healthcare systems.
The Data Sources: Data from files (e.g., pdf, txt, docx) and databases top the list of data sources used in language processing projects (61%). From legal contracts and news articles to medical records and SEC filings, input documents are often stored in PDF format. While deep learning models have improved over the past few years, many difficulties and data quality issues still come into play when extracting text from PDFs. Interestingly, there were some differences between the data sources used by companies still exploring NLP and those used by companies further along the adoption curve. Respondents in the exploration phase reported using audio data at a higher rate (29%) than respondents further along (22%).
Given NLP’s growth trajectory over the past year, it’s clear that its momentum will persist through 2021. It will be interesting to see how adoption and use cases evolve with time and imminent technology enhancements. NLP has the power to change how we work, give and receive medical care, shop, and interface with customer service. While some of these cases may be more impactful than others, they will all shape how we work and live for the better.
Interested in learning more about NLP? We have a number of courses on our Ai+ Training Platform that cover important NLP topics:
An Introduction to Transfer Learning in NLP and HuggingFace Tools: Thomas Wolf, PhD | Chief Science Officer | Hugging Face
Natural Language Processing Case-studies for Healthcare Models: Veysel Kocaman, PhD | Senior Data Scientist | John Snow Labs
More sessions on NLP will be added every week, so stay tuned!