Thursday, October 9, 2025
HomeData Modelling & AIWatch: State of the Art Natural Language Understanding at Scale

Watch: State of the Art Natural Language Understanding at Scale

Natural language understanding is a key component in many data science systems that must understand or reason about text. Common use cases include question answering, paraphrasing or summarization, sentiment analysis, natural language BI, language modeling, and disambiguation. Building such systems usually requires combining three types of software libraries: NLP annotation frameworks, machine learning frameworks, and deep learning frameworks.

[Related Article: The State of Automatic Text Summarization with NLP]

This talk by David Talby at ODSC West 2018 introduces the NLP library for Apache Spark. It natively extends the Spark ML pipeline API’s which enabling zero-copy, distributed, combined NLP & ML pipelines, which leverage all of Spark’s built-in optimizations. Benchmarks and design best practices for building NLP, ML and DL pipelines on Spark will be shared. The library implements core NLP algorithms including lemmatization, part of speech tagging, dependency parsing, named entity recognition, spell checking and sentiment detection. The talk demonstrates using these algorithms to build commonly used pipelines, using PySpark on notebooks that will be made publicly available after the talk.

Dominic
Dominichttp://wardslaus.com
infosec,malicious & dos attacks generator, boot rom exploit philanthropist , wild hacker , game developer,
RELATED ARTICLES

Most Popular

Dominic
32342 POSTS0 COMMENTS
Milvus
87 POSTS0 COMMENTS
Nango Kala
6713 POSTS0 COMMENTS
Nicole Veronica
11876 POSTS0 COMMENTS
Nokonwaba Nkukhwana
11937 POSTS0 COMMENTS
Shaida Kate Naidoo
6833 POSTS0 COMMENTS
Ted Musemwa
7092 POSTS0 COMMENTS
Thapelo Manthata
6786 POSTS0 COMMENTS
Umr Jansen
6789 POSTS0 COMMENTS