India is using artificial intelligence (AI) to bridge linguistic gaps and include more of its diverse population. Villagers in Karnataka, a southwestern state, have played a pivotal role in this effort by contributing to the creation of the nation’s first AI-driven chatbot for tuberculosis. The project aims to address linguistic diversity in a country where more than 121 languages each have over 10,000 speakers.
Linguistic Diversity and AI Challenges
Kannada alone has over 40 million native speakers in India, yet it is among the many languages that natural language processing (NLP) covers poorly compared with major languages. As a result, India faces a significant challenge in providing AI solutions that serve its full linguistic diversity. Language barriers exclude hundreds of millions of Indians from valuable information and economic opportunities, and that exclusion has prompted innovative solutions.
Building Datasets for AI Models
Tech firm Karya is at the forefront of this effort, engaging thousands of speakers of various Indian languages, including Kannada, to generate speech data. Tech giants such as Microsoft and Google then use these datasets to improve AI models for sectors like education and healthcare. The government’s initiative, Bhashini, is also making strides by creating open-source datasets for AI tools through a crowdsourcing platform.
Overcoming Challenges in Data Collection
Despite the enthusiasm for creating datasets in Indian languages, there are formidable challenges. Many Indian languages maintain an oral tradition, with limited electronic records and prevalent code-mixing. Collecting data in less common languages requires special efforts. Experts like Kalika Bali from Microsoft Research India emphasize the importance of ethical crowdsourcing, taking into account linguistic, cultural, and socio-economic nuances.
Economic Value and Community Empowerment
Karya highlights the economic potential of speech data. It collaborates with non-profit organizations to reach workers living below the poverty line, pays them above the minimum wage, and lets them own part of the data they produce. Karya sees this ownership as a source of economic value for communities and as a basis for AI products built for them, especially in healthcare and farming.
AI Applications for Multilingual Inclusion
Less than 11% of India’s population speaks English, which underscores the need for AI models focused on speech and speech recognition. Projects like the Google-funded Project Vaani and AI4Bharat’s Jugalbandi chatbot showcase how AI can break language barriers. Social enterprises like Gram Vaani are using AI-based chatbots to answer questions about welfare benefits, empowering communities at the grassroots level.
Our Say
India’s push for AI-driven multilingual inclusion shows how technology can transform access. By drawing on the voices of its diverse population, India is breaking language barriers, creating economic opportunities, and empowering communities. As AI’s demand for data in diverse languages rises, ethical data collection and model development become crucial. India’s pioneering efforts offer a model for other nations facing linguistic diversity and highlight AI’s potential for global inclusivity.