We are officially in the era of artificial intelligence (AI). AI is set to enter our lives in a big way, and ChatGPT from OpenAI is one of the prime examples of AI going mainstream. Large language models (LLMs) are at the heart of this shift. However, most LLMs from the West offer limited support for Indic languages. That is set to change, with significant development now focused on regional LLMs and Indic languages.
Bhashini
Bhashini, a Government of India AI-based language translation initiative, aims to break language barriers across India. It supports 22 languages, offers over 300 AI models and has clocked 500K+ mobile app downloads.
AI4Bharat, a research lab at IIT Madras, is dedicated to advancing Indian language technology by developing open-source datasets, tools, models and applications. Its work in this field has been recognised at leading international conferences. Among its key contributions are projects such as IndicCorp, BPCC, Shrutilipi, Kathbath, IndicBERT, IndicTrans, IndicXlit, IndicWav2Vec, Indic Whisper and TTS.
Sarvam AI
Sarvam AI, a generative AI startup founded by Vivek Raghavan and Pratyush Kumar and backed by Lightspeed, Peak XV Partners and Khosla Ventures, is developing generative AI models focused on Indic languages. The company aims to improve the accuracy of generative AI apps in India at lower cost. It recently introduced a 2-billion parameter model, Sarvam 2B, which it has open-sourced and made available on Hugging Face. Sarvam AI claims that the model is significantly more efficient for Indian languages than Meta’s Llama 3.1, Google’s Gemma 2 and GPT-4o.
Tech Mahindra
Tech Mahindra recently announced Project Indus, which focuses on developing the largest Indian LLM from scratch. Kunal Purohit, President – Next Gen Services, Tech Mahindra, said, “India has traditionally been a consumer of technology as a nation; however, we are now taking proactive steps to transition into a producer of technology. This shift has generated positive momentum, and we have made considerable advancements with Project Indus and Indic LLM. From the outset, our objective has been to construct a foundational model from scratch. With Project Indus, we reached our initial milestone by creating an open-source foundational model. Our aim was to cater to the various dialects spoken across India. We have successfully launched Indus, a 1.2-billion parameter model trained in Hindi and its 37 plus dialects, allowing users to pose questions in their native dialects and receive precise responses. This model ensures seamless engagement between brands and individuals across these dialects.”
Gnani.ai
Another company taking an interesting approach is Gnani.ai, which has been developing small language models (SLMs) for industry-specific use cases. The company has been investing in AI since long before it became mainstream. It has patented several innovations and, on the strength of the expertise in multiple Indian languages it has developed in-house, counts Samsung Ventures and Infoedge Ventures as investors. Ganesh Gopalan, Co-Founder and CEO of Gnani.ai, believes that AI can solve several fundamental problems in India in areas such as primary education and maternal healthcare. He believes we have barely scratched the surface when it comes to utilising the power of AI. He adds that the noises you hear in India are very different from those anywhere else in the world, be it people speaking in an auto rickshaw or on a train.
Project Vaani
Project Vaani, a collaborative initiative by IISc Bangalore, ARTPARK, and Google, aims to offer developers access to over 14,000 hours of speech data in 59 languages, gathered from 80 districts across India. Google is taking this initiative further by investing in a new project known as Morni and developing AI models to support close to 125 Indic languages.
Although local development and training of AI models are feasible, there is still a heavy reliance on NVIDIA GPUs and a shortage of capable hardware. The Government of Telangana recently partnered with Yotta Data Services to launch India’s largest AI supercomputer, designed to scale to 25,000 high-performance GPUs. The AI Cloud Data Center campus will feature dedicated GPU cloud infrastructure offering access to high-performance computing resources, initially powered by approximately 4,000 NVIDIA H100/H200 GPUs, with the ability to scale up to more than 25,000 GPUs in the future. These GPUs will be interconnected through high-speed networking. The infrastructure will be made available to startups, educational institutions, research labs, businesses and government organisations.
Voice bots have emerged as a prominent AI application in India, largely fuelled by the rapid growth of the fintech sector. AI is clearly set to become widespread across the country, with many implementations acting as co-pilots that enhance existing processes. It is worth pointing out that developing Indic language models demands significantly more resources than developing English ones. Despite these challenges, India is set to become one of the largest markets for widespread AI adoption.