So you’re thinking of starting a computer vision program, but you’ve realized that the logistics are overwhelming. What framework do you use? Do you go with an out-of-the-box solution or take the time to build your own? Cloud GPU or on-premises? What infrastructure do you need?
Each time you inch towards a decision, some other factor pops up. A new framework makes a splash on the scene. An out-of-the-box solution gets an upgrade. Each second that passes without adoption is more time spent missing out on the real benefits of a computer vision program.
Dr. Ali Vanderveld, lead data scientist at Shoprunner, is here to help you get through these decisions with a series of real-world examples and some targeted questions that narrow down those confusing choices. Let’s get your computer vision program up and running.
What Is Computer Vision and What Can We Do With It?
Computer vision is the field concerned with getting machines to process and “understand” the contents of an image. Previously, machines had trouble making sense of what they were seeing: we simply reduced the image to a series of textural markers or mathematical gradients to make it more digestible for our algorithms.
Advances in computer vision allow machines to process what they’re seeing more easily. The technology is far from perfect, but websites like Shoprunner use it to classify garment types for easier, more effective customer searches and to provide something called visually similar search. Think of the suggestions you receive from Google or Pinterest based on the previous image you interacted with. For Shoprunner and other fashion sites, it allows customers to search not only by key terms but also by similarity to an already vetted image.
Getting Started Part 1: Examine Your Data
The first question is all about your data, and not just the kind of data you have. You’ll also need a clear understanding of what you want to accomplish with it. This understanding helps narrow down the infrastructure you’ll need to launch your own computer vision program.
What Is Your Data Like?
If you’re using a pre-established API, you’ll need to know if:
- your data (both images and labels) is similar enough to the data the API was trained on.
- your images contain multiple objects that need to be detected.
If your data isn’t similar or your images contain multiple objects, you may have better luck with a customized solution than with an out-of-the-box option. If your labels are simple and there is only a single object per image, an out-of-the-box solution could help get things rolling more quickly.
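The rule of thumb above can be written down as a small checklist. This is only a sketch: the `DataProfile` class, its field names, and the `suggest_approach` function are hypothetical names invented for illustration, and treating "labels are not simple" as a reason to go custom is an assumption, since the guidance above does not cover that case explicitly.

```python
from dataclasses import dataclass


@dataclass
class DataProfile:
    """Hypothetical summary of a dataset; field names are illustrative only."""
    similar_to_api_training_data: bool  # images and labels resemble the API's training data
    multiple_objects_per_image: bool    # more than one object must be detected per image
    simple_labels: bool                 # labels are simple, common categories


def suggest_approach(profile: DataProfile) -> str:
    """Return 'out-of-the-box' or 'custom' based on the questions above."""
    if not profile.similar_to_api_training_data:
        return "custom"
    if profile.multiple_objects_per_image:
        return "custom"
    if not profile.simple_labels:
        # Assumption: complex labels also push toward a custom solution.
        return "custom"
    return "out-of-the-box"


catalog_photos = DataProfile(True, False, True)
street_photos = DataProfile(False, True, False)
print(suggest_approach(catalog_photos))  # out-of-the-box
print(suggest_approach(street_photos))   # custom
```

A real decision will weigh more than three yes/no answers, but writing the questions down this way makes it harder to skip one.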
How Much Labeled Data Do You Have?
If you’re training a new model, you’ll need between 100 and 1,000 labeled examples. You’ll need to find a dataset that works for your particular field. In some cases, you can make use of open source data, but you could also crowdsource your data if you need something more specific. If you decide to crowdsource, it’s always preferable to use the people within your own company, but if you’re patient, you could potentially use outside sources (although the resulting data won’t be as clean as data labeled within your own organization).
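Before training, it can help to check how your labeled examples are distributed. The sketch below counts examples per label and flags any label that falls short of a threshold. Applying the lower end of the range above (100) to each label separately is an assumption made for illustration, as are the function name `underrepresented_labels` and the made-up garment labels.

```python
from collections import Counter

MIN_EXAMPLES = 100  # lower end of the range above; applying it per label is an assumption


def underrepresented_labels(labels, minimum=MIN_EXAMPLES):
    """Return {label: count} for every label with fewer than `minimum` examples."""
    counts = Counter(labels)
    return {label: n for label, n in counts.items() if n < minimum}


# Made-up labels for illustration only.
labels = ["dress"] * 250 + ["jacket"] * 120 + ["scarf"] * 40
print(underrepresented_labels(labels))  # {'scarf': 40}
```

Any label that shows up in the result is a candidate for more labeling, whether through open source data or crowdsourcing.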
What Outputs Do You Need?
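You’ll have to decide whether you’re after labels or vectors. Labels are enough for classification, while something like visually similar search needs a vector for each image so that images can be compared with one another. The complexity of your visual search helps determine which tool can provide the right kind of results. Packages for searching over vectors are available, and many are open source. Shoprunner uses Spotify’s Annoy, but others are also available.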
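To make the vector option concrete, here is a minimal sketch of similarity search over image vectors. It is not Annoy and not Shoprunner's implementation: it is an exact, brute-force comparison in plain Python, whereas Annoy builds an index for approximate nearest-neighbor lookups so that a query does not have to be compared against every item. The item names and the three-dimensional vectors are made up; real image vectors would come from a model and typically have many more dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(query_id, vectors, k=2):
    """Return the ids of the k items whose vectors are closest to the query item."""
    query = vectors[query_id]
    scored = [
        (cosine_similarity(query, vec), item_id)
        for item_id, vec in vectors.items()
        if item_id != query_id  # never return the query image itself
    ]
    scored.sort(reverse=True)  # highest similarity first
    return [item_id for _, item_id in scored[:k]]


# Toy vectors for illustration only.
vectors = {
    "red_dress": [0.9, 0.1, 0.0],
    "pink_dress": [0.8, 0.2, 0.1],
    "blue_jeans": [0.0, 0.9, 0.4],
    "black_boots": [0.1, 0.2, 0.9],
}
print(most_similar("red_dress", vectors, k=1))  # ['pink_dress']
```

Brute force like this is fine for a few thousand images. Once the catalog grows, an approximate index such as Annoy is the more practical choice.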
Getting Started Part 2: Infrastructure
If you can’t find something that suits your use case or you’ve reached the point where you need more customization, you’ll need to figure out your infrastructure.
What Are Your Training Considerations?
Your hardware will need to be robust. CPUs aren’t going to cut it. Switching to GPUs provides not only processing power but also storage capabilities. If you’re fine-tuning a pre-trained model, you may not need as much labeled data. If you’re starting from scratch with a new architecture, your computational requirements will be significantly higher.
How Are You Deploying?
If this is an in-house model, you may not need the same computational power. In-house deployments won’t see nearly as many requests as production systems, so think about your final deployment purposes.
Together, these questions will point you toward either cloud computing or an on-premises solution. If you’re building highly scalable solutions for production, on-premises hardware will likely hamper your growth. A cloud computing solution might serve you a lot better.
On the other hand, if you’re keeping your computer vision program within your organization, it may not be worth the hassle of setting up in the cloud. Your security could be better served by keeping that information close, especially if it’s sensitive.
Getting Started Part 3: People
Who you have on your team will narrow your decisions even further. The people available to you, and their skills, will place some constraints on your solution.
- What are your team’s coding levels? Do you have experts or junior developers? Does anyone have relevant experience at all, and if not, can you bring experts in?
- Is your team ready to handle the rigors of complete customization, or is it more comfortable with easy-to-use, out-of-the-box systems?
It can be challenging to find a computer vision expert with the engineering expertise to build these custom solutions. Vanderveld estimates it could take six months or more to find the right person. That may not be possible for you. If that’s the case, take the time to train your existing team for this type of deep learning. You could even do a Kaggle hackathon like the team at Shoprunner.
Be sure to catch Dr. Vanderveld’s talk on ODSC’s YouTube channel for specific recommendations for infrastructure, frameworks, and languages!