
Architecting the Edge for AI and ML

The edge ML wave

To fully understand what’s happening at the intersection of machine learning and edge computing, we have to take a step back and examine the paradigm shifts that have occurred in compute power and data collection over the last decade. For starters, computer chips are now everywhere. Chips can be found in many of the devices we use and rely on every day — from cars to refrigerators, drones, doorbells, and more. It’s now possible to access a range of chipsets, form factors, and architectures, such as those provided by Arm, that enable high-speed, low-power computing capabilities.

A location-centric framework for running ML models at the edge

Current approaches to running ML models center around the type of compute you’ll be using to run your models — in the cloud, on-prem, hybrid, air-gapped, or at the edge. At Modzy we saw all of this happening and flipped the problem on its head. We’ve started to approach running and managing models from a location-centric mindset, which reduces some of the complexity that arises from worrying only about the compute. In this framework, each new environment becomes an “edge location,” and your ML models can run wherever you want. Whether on-prem, private cloud, or public cloud, these are just “edge” locations with significantly greater compute capacity than the ones we usually think of when talking about the edge, like a Jetson Nano, Raspberry Pi, or Intel Up board. Although there’s a lot of variety across these environments, the main factors impacting how your ML models will run are power consumption and network connectivity.
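To make this concrete, here’s a minimal sketch of what a location-centric registry might look like in Python. All of the names here (`EdgeLocation`, the power and connectivity fields) are hypothetical illustrations of the idea, not part of any particular product’s API:

```python
from dataclasses import dataclass
from enum import Enum

class Connectivity(Enum):
    ALWAYS_ON = "always_on"        # cloud, on-prem data centers
    INTERMITTENT = "intermittent"  # vehicles, field devices
    AIR_GAPPED = "air_gapped"      # fully disconnected sites

@dataclass
class EdgeLocation:
    """Every environment -- cloud, on-prem, or device -- is just a location."""
    name: str
    power_budget_watts: float      # rough power envelope of the hardware
    connectivity: Connectivity
    has_gpu: bool = False

# The same model can be scheduled against very different locations.
locations = [
    EdgeLocation("public-cloud-us-east", 10_000, Connectivity.ALWAYS_ON, has_gpu=True),
    EdgeLocation("factory-jetson-nano", 10, Connectivity.INTERMITTENT),
    EdgeLocation("field-raspberry-pi", 5, Connectivity.AIR_GAPPED),
]
```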

Edge device dependencies: Power consumption and network connectivity

To understand the impact of these dependencies, it helps to segment device types by power consumption and network access. For example, let’s consider a 5G tower as the near edge. The base of a 5G tower houses racks of servers that perform many functions, including data transmission. 5G towers also have access to high-power compute like GPUs, which is useful for applications such as GPS navigation that require models to run close to cell phones. It’s challenging to run a large neural network on someone’s cell phone, and the latency of sending the data back to the cloud for inference would be too high. So here, it makes the most sense to run the model on GPUs co-located with the cell tower. A near edge scenario is a good fit when you want your compute resources as close to the end application as possible for fast data processing.
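As a rough illustration of that placement decision, the sketch below picks the site with the lowest round-trip latency that fits a budget. The site names and latency numbers are made up for illustration:

```python
def pick_inference_site(latency_budget_ms: float, sites: dict) -> str:
    """Choose the site with the lowest round-trip latency that fits the budget.

    `sites` maps a location name to its estimated round-trip latency in ms.
    """
    viable = {name: rtt for name, rtt in sites.items() if rtt <= latency_budget_ms}
    if not viable:
        raise RuntimeError("no site fits the latency budget; consider on-device inference")
    return min(viable, key=viable.get)

# Hypothetical latencies: the GPU at the cell tower wins for a 50 ms budget.
print(pick_inference_site(50, {"cell-tower-gpu": 15, "regional-cloud": 120}))
```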

Types of edge devices vary based on their computing power and network access

Challenges with running ML models at the edge

Top 5 issues with setting up and running ML models on edge hardware

Elements of an ideal edge ML architecture

With all these challenges in mind, there are three key components that can help bring order to the chaos and allow you to build an efficient, scalable edge ML architecture:

Key elements of an ideal edge ML architecture

  • Device agnostic: ensuring your architecture is device agnostic can save you a lot of time. While many IoT devices come built with Arm chips, you’ll also want to make sure your architecture works with AMD chips. The same goes for data transfer protocols. For example, while MQTT might be the standard in manufacturing, you’ll also want to make sure your architecture works with gRPC, REST, etc.
  • Low latency inferences: if fast response time is important for your use case, low latency inference will be a non-negotiable.
  • Ability to operate in a disconnected environment: if you’re running ML models at the edge, chances are there will be situations where the devices go offline. It’s better to account for these scenarios from the start (a minimal sketch of offline buffering follows this list).
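Since disconnected operation is so often the hard part, here’s a minimal sketch of offline-tolerant result reporting: inference results queue locally and are flushed whenever the network comes back. The endpoint URL and payload shape are hypothetical; `requests` is the only real dependency:

```python
from collections import deque

import requests  # any HTTP client would work here

RESULTS_ENDPOINT = "https://example.com/api/results"  # hypothetical endpoint
_pending = deque()

def report_result(result: dict) -> None:
    """Queue a result locally, then attempt to flush everything upstream."""
    _pending.append(result)
    flush_pending()

def flush_pending() -> None:
    """Send queued results in order; on any network error, keep them for later."""
    while _pending:
        try:
            resp = requests.post(RESULTS_ENDPOINT, json=_pending[0], timeout=5)
            resp.raise_for_status()
            _pending.popleft()
        except requests.RequestException:
            break  # still offline -- results stay queued for the next attempt

report_result({"device": "jetson-01", "label": "defect", "score": 0.97})
```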

Edge-centric AI system

By adopting a first-principles approach to building out your edge ML architecture, you first consider your device locations, and then create a mechanism to configure and interact with them accordingly. Taking things one level down, the key components of your edge-centric AI system include:

  • Centralized model store: hosts your containers and allows your edge devices to grab container images and pull models in from the outside.
  • More than one “device”: run and manage models on multiple devices. This doesn’t just mean SBCs — it can include cloud or on-prem computers, and is a great way to address challenges associated with running models on multi-cloud compute.
  • Container runtime: tools like Docker provide a container runtime, which is helpful for remotely processing data in these locations using the same models (see the sketch after this list).
  • REST or gRPC APIs: connect high-speed, low-latency inferencing to the rest of your app. gRPC isn’t quite as user-friendly as REST, which can be a great choice when network speed or latency doesn’t matter.
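To tie the model store, container runtime, and API pieces together, here’s a sketch using the Docker SDK for Python to pull a model container from a registry and call it over REST. The registry path, port, and `/predict` route are hypothetical stand-ins for whatever your model server actually exposes:

```python
import docker    # Docker SDK for Python (pip install docker)
import requests

IMAGE = "registry.example.com/models/classifier:1.0"  # hypothetical model image

client = docker.from_env()
client.images.pull(IMAGE)  # fetch the model container from the central store

# Run the model server and map its (assumed) serving port to the host.
container = client.containers.run(IMAGE, detach=True, ports={"8080/tcp": 8080})

# Call the (assumed) REST inference route exposed by the container.
response = requests.post("http://localhost:8080/predict", json={"inputs": [1, 2, 3]})
print(response.json())

container.stop()
```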

Four Design Patterns for Edge ML Architectures

Now that we’ve covered the components that will set you up for success in building an edge ML system, let’s dig into design patterns. We’ll cover four different options you might choose, depending on your use case and workload.

Native edge architecture diagram: run many models on many devices

Network local architecture diagram: run many models on multiple edge servers

Edge cloud architecture diagram: run AI models with no cloud management whatsoever

Remote batch architecture diagram: run many models on Spark clusters all over the world

Final Thoughts

An edge-first architecture isn’t just for running ML models at the edge — it will be useful any time you need high-performance, low-latency offline and online execution, or if you want to run your ML models on more than one computer. The edge-centric approach laid out here can help you scale, adding more locations, models, devices, systems, and applications in a seamless, streamlined way. At least for the next decade!

Article originally posted here. Reposted with permission.
