Education

Manipal Institute of Technology

B.E. - Electronics and Communication (2009 - 2013)

Founding Member, RoboManipal
Director, TEDxManipalU
Associate Editor, Editorial Board - Manipal University

Areas of Interest

Natural Language Generation (NLG)
Question Answering (QA) Systems
Knowledge Graphs (KGs)
Graph Neural Networks (GNNs)
Meta Learning
Never-Ending Language Learning (NELL)
Reinforcement Learning

Skills

» ML Frameworks

PyTorch, TensorFlow, Keras

» Cloud Infra

GCP, AWS

» Data Processing

NumPy, Pandas, Apache Beam + Google Cloud Dataflow

» ML Orchestration and Deployment

Kubernetes, Kubeflow Pipelines

» Build Tools

Bazel (including migration from Maven to Bazel)

» Tools

VS Code, Spyder, Git, Docker, Vim, Slack

» API Building

Flask, Gunicorn, FastAPI

REST, gRPC

» Databases

MongoDB

Professional Experience

Implemented CNNs for text classification with explainability using LIME (Local Interpretable Model-agnostic Explanations) on legal contract data for multiple contract types
Implemented Induction Networks with episodic learning for few-shot text classification on legal contract data for multiple contract types
Implemented Transformer-based architectures (BERT, DistilBERT, RoBERTa, XLNet) for the text classification pipeline on legal contract data for multiple contract types
Implemented Unsupervised Data Augmentation (UDA) for semi-supervised training in data-scarce scenarios, with the flexibility to plug any model into the UDA methodology
Created training data using multiple methods, such as back-translation and text generation using GPT-2 conditioned on a classifier (PPLM)
Attacked models using TextFooler, which applies tiny perturbations to textual data to fool the model, and used the resulting adversarial examples to retrain the models
Used fuzzy matching and grouping of textual data to validate training data labels
Used Snorkel AI’s methodology for programmatically building and managing training datasets using rules, heuristics, and other sources of signal
Built a nuanced custom text segmenter for legal contract data (which required handling complex, multi-level nesting to capture the entire context of a sentence) using a feature-based heuristic approach designed to generalize across legal contract types, achieving a false negative rate of 0 and accuracy above 95%
Built end-to-end gRPC services for parsing PDF documents using Adobe’s Java SDK, including migration from Maven to Bazel
Built a Central Data Store for all legal contracts, with services around it for parsing source data, segmenting it with the nuanced custom text segmenter, and storing the segmented data along with its embeddings; this maintained data integrity and enabled similar clause/sub-clause search within a user-defined search space

Awards

Best Enterprise + Industrial AI, Amazon AI Conclave 2019