Explore the future of AI with Dr. Vikas Agrawal, Senior Principal Data Scientist at Oracle Analytics Cloud. In this Leading with Data session, he shares insights on problem-solving in data science, MLOps, and the impact of generative AI on enterprise solutions. The discussion ranges from practical approaches to common pitfalls in data science projects, and closes with essential advice for aspiring data scientists.
Key Insights from our Conversation with Vikas Agrawal
- In data science, understanding the problem is crucial and should take up the majority of the effort.
- A successful Proof of Concept (POC) in data science should consider not only technical aspects but also the practicality and scalability of the solution.
- Clear communication and setting realistic expectations with customers are vital to avoid costly misunderstandings driven by AI hype.
- Generative AI holds the potential to revolutionize enterprise solutions, especially in areas related to text and user interfaces.
- Building a career in data science requires a solid foundation in mathematics and a deep understanding of algorithms.
- In enterprise settings, ensuring the trustworthiness and reliability of AI outputs calls for new validation techniques.
- As AI tools evolve, data scientists need skills to enhance and improve these tools, not just operate them.
How do you balance technical depth with a macro view in data science?
In my day-to-day work, I owe a lot to my mentors from various esteemed institutions and companies who instilled in me the philosophy that technology is a means to an end, not the end itself. The key is to spend a significant amount of time understanding the problem – about 90% of the effort goes there. The rest is about finding solutions, which often involves looking at how others have approached similar issues and what the customer ultimately needs. This approach has been fundamental in connecting technology with business impact.
What is your approach to solving a customer’s problem?
Once we’ve identified a problem worth solving, we first ensure we have the data needed to address it. Then we assess whether the technology exists to solve the problem within a reasonable timeframe. If we see a path, even if it’s a couple of years out, we’ll proceed with a proof of concept (POC). This POC is comprehensive, covering everything from data pipelines to end-to-end functionality, although scalability at this stage is not the primary concern. The goal is to have a clear path to the algorithms, data sources, and nature of the output we’re aiming for.
How do you handle the optimization phase and MLOps?
After a successful POC, we enter the optimization phase, which is where the bulk of the work lies. This involves ensuring the model adapts to different business processes and geographies, and can correct itself when it goes out of distribution. It’s also about ensuring the model can be retrained efficiently and scales appropriately. This phase is critical because it’s where the model transitions from a concept to a practical, deployable solution.
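To make the out-of-distribution and retraining point concrete, here is a minimal sketch of how a monitoring job might flag drifting features and queue a model for retraining. The Kolmogorov-Smirnov heuristic, the thresholds, and the `trigger_retraining` hook are illustrative assumptions, not a description of Oracle's actual pipeline.

```python
# Minimal drift-monitoring sketch: compare each feature's live distribution
# against its training distribution and flag significant shifts.
import numpy as np
from scipy.stats import ks_2samp

def drift_report(train_df, live_df, features, alpha=0.01):
    """Return features whose live distribution differs from the training one."""
    drifted = {}
    for col in features:
        stat, p_value = ks_2samp(train_df[col].dropna(), live_df[col].dropna())
        if p_value < alpha:
            drifted[col] = {"ks_stat": round(float(stat), 3), "p_value": float(p_value)}
    return drifted

# Example policy: if several features drift, queue the model for retraining.
# drifted = drift_report(train_df, live_df, ["order_value", "lead_time_days"])
# if len(drifted) >= 2:
#     trigger_retraining()  # hypothetical hook into the MLOps pipeline
```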
What are the most common pitfalls in data science projects?
The most costly mistakes usually revolve around AI hype and miscommunication. It’s crucial to set clear and mutual expectations with the customer. Often, customers have high expectations due to the industry buzz around AI, not realizing that the state of the art may not always provide the correct answers they seek. Another pitfall is defining the problem incorrectly, either by not addressing the customer’s issue directly or by attempting to ‘boil the ocean.’
How do you interact with generative AI in your workflows?
Generative AI is not widely used in most enterprises due to concerns about copyright and IP contamination. However, we do leverage commercially available open-source material. Generative AI has advanced significantly in areas like text summarization, expanding text, and providing explanations. Trustworthiness remains a challenge, and we’re exploring techniques to filter outputs from large language models (LLMs) to ensure they’re reliable for enterprise use.
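As one illustration of what output filtering could look like, the sketch below flags summary sentences that are poorly grounded in the source document. The lexical-overlap heuristic and threshold are assumptions chosen for demonstration, not the validation techniques Dr. Agrawal's team actually uses.

```python
# Minimal grounding check: flag LLM summary sentences whose content words
# rarely appear in the source text, so they can be reviewed or dropped.
import re

def ungrounded_sentences(source_text, summary, min_overlap=0.5):
    """Return summary sentences with too little word overlap with the source."""
    source_words = set(re.findall(r"[a-z']+", source_text.lower()))
    flagged = []
    for sentence in re.split(r"(?<=[.!?])\s+", summary.strip()):
        words = [w for w in re.findall(r"[a-z']+", sentence.lower()) if len(w) > 3]
        if not words:
            continue
        overlap = sum(w in source_words for w in words) / len(words)
        if overlap < min_overlap:
            flagged.append(sentence)
    return flagged

# Usage: route flagged sentences for removal or human review before the
# summary reaches an enterprise user.
```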
What impact do you foresee generative AI having on enterprise solutions?
Generative AI will likely have the most significant impact on workflows involving running text, such as information retrieval and user interfaces. For example, it can dramatically improve enterprise search by retrieving semantically similar pieces of text. It can also revolutionize natural language interfaces for databases, allowing users to ask questions in natural language and receive accurate SQL responses.
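To illustrate the semantic-retrieval idea, here is a minimal sketch that ranks documents by embedding similarity to a natural-language query. The `sentence-transformers` model name and the toy corpus are assumptions for demonstration, not a reference to any Oracle product.

```python
# Minimal semantic-search sketch: embed a corpus and a query, then rank
# documents by cosine similarity rather than keyword overlap.
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")

corpus = [
    "Q3 revenue grew 12% driven by cloud subscriptions.",
    "Employee onboarding checklist for new analysts.",
    "Invoice payment terms are net 30 days for all suppliers.",
]
corpus_emb = model.encode(corpus, normalize_embeddings=True)

def search(query, top_k=2):
    """Return the top_k corpus entries most semantically similar to the query."""
    query_emb = model.encode([query], normalize_embeddings=True)[0]
    scores = corpus_emb @ query_emb  # cosine similarity on normalized vectors
    best = np.argsort(scores)[::-1][:top_k]
    return [(corpus[i], float(scores[i])) for i in best]

# search("how long do suppliers have to pay invoices?") should surface the
# payment-terms document even though it shares few exact keywords.
```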
What advice would you give to those entering the data science field today?
It’s an exciting time to be in data science, but it’s crucial to have a strong foundation in mathematics and understand the algorithms you’re working with. As AI tools become more sophisticated, the ability to augment and improve them will be a valuable skill. Those who can create new algorithms or understand the intricacies of existing ones will be in high demand.
Summing-up the Conversation with Vikas Agrawal
In this insightful session, Dr. Vikas Agrawal shared key lessons for building a successful career in data science. From emphasizing problem comprehension to navigating common pitfalls and embracing generative AI, the interview provides a practical roadmap. Aspiring data scientists are advised to build a robust foundation in mathematics and algorithms for a field in constant evolution. This interview heralds a new era of innovation in AI.
Stay tuned with us on Leading with Data to catch up with the journeys of more such pioneering AI and Data Science leaders in the industry. You can check out our upcoming Leading with Data sessions here!