In this interview from O’Reilly Foo Camp 2019, Hands-On Unsupervised Learning Using Python author Ankur Patel discusses the challenges and opportunities in making machine learning and AI accessible and financially viable for enterprise applications.
Highlights from the interview include:
The biggest hurdle businesses face when implementing machine learning or AI solutions is cleaning and preparing unstructured data that exists across silos. Patel says commoditized infrastructure from companies like Amazon and Google is one of the most significant advancements toward a solution in this area: “A lot of the work that data scientists would have to do in a custom way is now being done, basically, out of the box by API calls on one of these platforms.” (00:57)
Open source is going to provide a “massive benefit” for businesses, Patel says. “In computer vision, for example, starting in 2012, those models were essentially open sourced, so a lot of businesses then got into the business of applying those computer vision models for specific use cases, like autonomous tracking vehicles. So, it’s going to be less about the models, per se—it’s going to be more about the use cases and applications of those models.” (01:57)
Open-source data and transfer learning are also enabling businesses to move models into production more easily and to achieve an ROI. Patel notes that when data sets are open sourced, “that means any firm that wants to work on the data set, instead of training their own models, is able to do that. Then you have pre-trained models you can do transfer learning with. If you take a language model, for example, one that’s provided by Google’s BERT, and apply it to a corpus of documents in your vertical—let’s say legal documents at a law firm, where you want to make it easier to process law documents than to use paralegals—you can take the massively pre-trained language model, fine-tune it on your legal corpus, and then deploy that as a solution. So, you’re able to see the ROI a lot faster—say, in six to 12 months versus what previously would’ve taken three to five years, because you would’ve had to train your own model from scratch. This idea of transfer learning—using large pre-trained models, fine-tuning on your own corpus of text—that is where we’re going in the near future. I think that’s something most businesses should be very optimistic about.” (06:27)
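The workflow Patel describes—freeze a large pre-trained model, train only a small task-specific layer on your own data—can be sketched in miniature. The snippet below is a toy illustration, not BERT: the “pretrained encoder” is just a fixed random projection standing in for frozen pretrained weights, the data is synthetic, and all names are hypothetical. Only the small classification head is updated, which is the essence of the fine-tuning approach.

```python
import numpy as np

# Toy sketch of transfer learning (assumed setup, not a real BERT pipeline):
# a frozen "pretrained" encoder plus a small task head trained on new labels.

rng = np.random.default_rng(0)

# Frozen encoder: a fixed linear map + tanh, standing in for pretrained layers.
W_enc = rng.normal(size=(20, 8))

def encode(x):
    # Frozen features; these weights are never updated during fine-tuning.
    return np.tanh(x @ W_enc)

# Synthetic "domain corpus": inputs plus labels that are recoverable
# from the encoder's features (a separable teacher direction).
X = rng.normal(size=(200, 20))
y = (encode(X)[:, 0] > 0).astype(float)

# Task-specific head: logistic regression trained by gradient descent.
w = np.zeros(8)
b = 0.0
feats = encode(X)  # computed once; the encoder stays frozen
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-(feats @ w + b)))
    w -= 0.5 * (feats.T @ (p - y)) / len(y)
    b -= 0.5 * np.mean(p - y)

pred = (1.0 / (1.0 + np.exp(-(feats @ w + b))) > 0.5).astype(float)
accuracy = np.mean(pred == y)
print(f"head-only training accuracy: {accuracy:.2f}")
```

Because only the eight-parameter head is trained, convergence takes seconds; this is why, at realistic scale, fine-tuning a pretrained model on a legal corpus can reach production far faster than training a language model from scratch.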