PESHAWAR: Across the globe, Silicon Valley tech companies are making headlines by investing billions to build the future of artificial intelligence. But in Peshawar, two university students are challenging the status quo with grit, determination, and a bit of prize money from national hackathons.
Muhammad Uzair and Junaid Ahmed, cousins and fourth-semester computer science students at Peshawar University, have built AI models designed specifically for the Pashto language.
With over 60 million speakers worldwide, concentrated in Pakistan, Afghanistan and the Middle East, Pashto is poorly supported by major AI platforms like ChatGPT and Gemini. Because little Pashto data is available online for training AI models, the more established AI giants often confuse it with Arabic or Persian, which use similar scripts.
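The script confusion can be made concrete. Pashto shares most of its alphabet with Arabic and Persian but adds several letters of its own, so a rough heuristic is to look for those letters. The Python sketch below is illustrative only and is not how ChatGPT, Gemini, or the students' models identify languages; the function names and the 2% threshold are assumptions chosen for the example.

```python
# Letters found in the Pashto alphabet but not in standard Arabic or Persian:
# U+067C (tte), U+0689 (ddal), U+0693 (rre), U+06BC (nun with ring),
# U+069A (xin), U+0696 (ge), U+0685 (tse), U+0681 (dze),
# U+06D0 (e) and U+06CD (yeh with tail).
PASHTO_ONLY = set("\u067c\u0689\u0693\u06bc\u069a\u0696\u0685\u0681\u06d0\u06cd")


def pashto_letter_ratio(text: str) -> float:
    """Return the share of alphabetic characters that are Pashto-specific."""
    letters = [ch for ch in text if ch.isalpha()]
    if not letters:
        return 0.0
    return sum(ch in PASHTO_ONLY for ch in letters) / len(letters)


def looks_like_pashto(text: str, threshold: float = 0.02) -> bool:
    """Heuristic guess: True if enough Pashto-specific letters appear.

    The threshold is an arbitrary illustrative value, not a tuned one.
    """
    return pashto_letter_ratio(text) >= threshold
```

A check like this can fail on short passages that happen to contain none of these letters, and some of the letters also appear in other languages' alphabets, which is part of why reliable Pashto support needs real training data rather than simple rules.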
Recognizing this digital gap, Uzair and Junaid decided to take matters into their own hands. Uzair's solution is "Katib-ASR," an Automatic Speech Recognition (ASR) model. Katib is the traditional word for scribe, and the name fits: the model writes down what it hears, converting spoken Pashto audio directly into Pashto script.
Building it meant clearing a huge linguistic hurdle. Pashto features 45 letters, including several variants of a single letter such as "Ye". Because relevant datasets didn't exist, Uzair had to build his own from scratch. He pulled audio from regional podcasts, generated more himself with the help of friends and family, and found the rest online, then clipped it all into tiny segments to teach his model how specific sounds correspond to text.
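The article does not describe the tools Uzair used for clipping. As a minimal sketch of the general idea, the Python example below cuts a WAV file into consecutive fixed-length clips using only the standard library. The function name, the five-second default, and the output naming scheme are assumptions for illustration; real speech datasets are often split at pauses rather than at fixed intervals, and each clip still needs a matching transcript.

```python
import wave


def split_wav(path: str, out_prefix: str, segment_seconds: float = 5.0) -> list[str]:
    """Split a WAV file into consecutive fixed-length clips.

    Returns the paths of the clips written. The last clip may be shorter.
    """
    out_paths = []
    with wave.open(path, "rb") as src:
        params = src.getparams()
        frames_per_clip = int(src.getframerate() * segment_seconds)
        index = 0
        while True:
            frames = src.readframes(frames_per_clip)
            if not frames:
                break  # reached the end of the recording
            out_path = f"{out_prefix}_{index:04d}.wav"
            with wave.open(out_path, "wb") as dst:
                # Copy channels, sample width and rate from the source;
                # the frame count in the header is corrected on close.
                dst.setparams(params)
                dst.writeframes(frames)
            out_paths.append(out_path)
            index += 1
    return out_paths
```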
The model currently sits at an impressive 75%-80% accuracy. Uzair's goal is to refine it further and eventually introduce text-to-speech and other capabilities for Pashto speakers.
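The article does not say how that accuracy figure is measured. One common yardstick for speech recognition is word error rate (WER): the number of word substitutions, insertions and deletions needed to turn the model's output into the correct transcript, divided by the number of words in the correct transcript. The Python sketch below shows that calculation; treating "accuracy" as one minus WER is an assumption made here, not something the developers have stated.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)
```

Under that assumption, a transcript with one wrong word out of four would score a WER of 0.25, or 75% accuracy.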
Meanwhile, his cousin Junaid faced a similar problem with text generation. His focus is "Qehwa AI," a text-generation model named after the region's famous traditional tea.
Junaid scraped every corner of the internet for Pashto text, translated entire books from English and Urdu and compiled an astonishing 1.5 billion words. According to Junaid, this makes it the largest Pashto dataset.
By feeding this massive library into an open-source base model, he says he has created an AI that captures the deep context, literature, cultural nuances and history of the language.
Whether you prompt it in English, Urdu, or Pashto, Qehwa AI processes the information and responds in Pashto. Since it is specifically trained on Pashto, Junaid claims that Qehwa AI outperforms global models in Pashto translation and summarization tasks.
According to the two developers, training AI requires immense computational power, usually found in massive data centers. When the duo approached local institutions for help, they found neither supercomputers nor advanced graphics cards (GPUs). A lack of funds was another major obstacle.
Their solution was to compete in major national hackathons, win, and use the prize money. And win they did. They pooled their winnings, nearly Rs200,000, and used the money to rent cloud-based GPUs, making this a grassroots, self-funded operation.
Going forward, both students see this as just the beginning. While Uzair aims to perfect the audio mechanics, Junaid plans to expand his model to include other regional languages like Punjabi, Sindhi, and Balochi, ultimately creating a unified AI for all Pakistani languages.
In a world where AI is dominated by corporate conglomerates, Uzair and Junaid are a reminder that actual innovation can often start with limited funds and resources, and a drive to empower and represent one's own culture, land and language on a global level.
