Watch: Understanding Unstructured Data with Language Models

6 September 2024


As data scientists, we’ve seen rapid improvement over the last few decades in the tools available for working with structured data (tabular data, graph data, sensor data, etc.). Yet the vast majority of our data (Merrill Lynch puts the figure at roughly 90%) is *unstructured*, living in documents, emails, reviews, reports, chat logs, and the like. Many of us are far less familiar with how to analyze and understand this trove of unstructured data.

This talk by Alex Peattie focuses on language models, one of the most fundamental tools for working with unstructured data. Language models are all around us (although we’re probably unaware of them), underpinning everything from Word’s spellchecker to home assistants like Alexa. While plenty of “out of the box” language modeling libraries exist, the first part of the talk focuses on building a thorough understanding of what a language model is and how it works. We touch on key ideas from statistics and information theory, and see how Alan Turing, while developing techniques to break Nazi codes at Bletchley Park, created smoothing techniques that remain widely used in language models today. We then move to the present day, looking at how techniques like word vectors and transfer learning have yielded an improved generation of tools. In the second half of the talk, we look at how we can use language models in practice to understand unstructured data.

Specifically, this video explores:

– Classification: the canonical application of language models, which can help us identify spam, analyze sentiment, or perform unsupervised clustering. We look at a famous case where language models successfully identified a Shakespeare forgery.

– Predictive modeling: if I were to look at your tweets (and nothing else), could I guess your gender? It turns out that state-of-the-art techniques can predict it correctly more than 80% of the time. We look at how language models can enrich your datasets with additional demographic or contextual data.

– Information retrieval: finally, we see how language models have been used extensively (for example, in the legal sector) to extract targeted insights from enormous data sets.
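
To make the classification idea concrete, here is a minimal sketch, not taken from the talk itself. It trains one unigram language model per class and assigns a text to whichever model gives it the highest likelihood. It assumes add-one (Laplace) smoothing, which is a simpler technique than the Turing-derived smoothing mentioned above, and it assumes equal class priors. The `UnigramModel` class, the `classify` helper, and the toy training sentences are all hypothetical names and data invented for illustration.

```python
import math
from collections import Counter


class UnigramModel:
    """Unigram language model with add-one (Laplace) smoothing."""

    def __init__(self, documents, vocabulary):
        self.vocabulary = vocabulary
        # Count how often each word appears in this class's documents.
        self.counts = Counter(
            word for doc in documents for word in doc.lower().split()
        )
        self.total = sum(self.counts.values())

    def log_prob(self, text):
        # Add-one smoothing: every word, seen or unseen, gets a non-zero
        # probability. The extra +1 reserves one slot for unknown words.
        v = len(self.vocabulary) + 1
        return sum(
            math.log((self.counts[word] + 1) / (self.total + v))
            for word in text.lower().split()
        )


def classify(text, models):
    """Return the label whose model assigns the text the highest likelihood."""
    return max(models, key=lambda label: models[label].log_prob(text))


# Toy training data, invented purely for illustration.
training = {
    "spam": ["win a free prize now", "free money click now"],
    "ham": ["meeting moved to monday", "see you at lunch on monday"],
}
vocabulary = {
    word
    for docs in training.values()
    for doc in docs
    for word in doc.lower().split()
}
models = {
    label: UnigramModel(docs, vocabulary) for label, docs in training.items()
}

print(classify("claim your free prize", models))    # spam
print(classify("lunch meeting on monday", models))  # ham
```

Without smoothing, the unseen words "claim" and "your" would have probability zero under both models and the comparison would break down, which is why some form of smoothing is needed even in a toy example like this one.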
