Shaligram Dewangan

Connect with me:

LinkedIn

GitHub

About

Hi, I'm Shaligram. I am entering the third year of my undergraduate degree in Artificial Intelligence at IIT Guwahati. I was previously a Research Intern at IISc, Bangalore, where I worked on building various AI models for legged robots. I have also trained a GPT-2 sized model on 20 billion tokens for Indic languages and outperformed the original model.

I am deeply interested in building large deep neural nets that can supercharge the potential of humanity. If you're interested in something similar, maybe we could be good friends!

Projects

GPT-2 for Indic Languages

Mar 2025 - May 2025  |  GitHub Repo

Pre-trained a 124M-parameter, decoder-only dense transformer model on 20 billion English and Hindi tokens from the FineWeb-Edu and FineWeb-2 datasets. My model outperformed OpenAI's GPT-2 (124M) on Hindi-HellaSwag by ~10% and achieved similar performance on the HellaSwag (English) benchmark. My BPE tokenizer achieved 3x better average tokenization efficiency on English and Hindi text.

Quadruped Robot

Jun 2023 - Aug 2023

Built a four-legged, 12-DoF robot based on the Stanford Pupper v1 project that can trot, walk, and jump.

Education

Bachelor of Science (Hons.)  -  IIT Guwahati

Oct 2023 - Aug 2027 (ongoing)  |  4 Years  |  CPI: 8.6/10

Major in Data Science and Artificial Intelligence

Experience

Research Intern  -  IISc, Bangalore

Dec 2023 - Feb 2025  |  1.2 Years

Advised by Prof. Shishir NY, I worked on vision-based deep RL models for quadruped robots, deep-learning-based models for actuators, and various other projects.

Awards

First Prize, Poster Presentation

Dec 2024  |  Artificial Intelligence Confluence, IIT Guwahati

Vision-based Deep RL Models for Agile Industrial Robots, Shaligram Dewangan, et al.

Skills

Languages and Libraries

Very familiar with Python and its ecosystem

Python
PyTorch
NumPy
Pandas
Matplotlib
TensorBoard
Datasets

Others

Including but not limited to the following:

Deep Learning
Reinforcement Learning
Natural Language Processing
Large Language Models
Pre-training
Evaluation
Tokenization