Education
Manipal Institute of Technology
B.E. - Electronics and Communication (2009 - 2013)
- Founding Member, RoboManipal
- Director, TEDxManipalU
- Associate Editor, Editorial Board - Manipal University
Areas of Interest
- Natural Language Generation (NLG)
- Question Answering (QA) Systems
- Knowledge Graphs (KGs)
- Graph Neural Networks (GNNs)
- Meta Learning
- Never-Ending Language Learning (NELL)
- Reinforcement Learning
Skills
» ML Frameworks
PyTorch, TensorFlow, Keras
» Cloud Infra
GCP, AWS
» Data Processing
NumPy, Pandas, Apache Beam + Google Cloud Dataflow
» ML Orchestration and Deployment
Kubernetes, Kubeflow Pipelines
Bazel (including migration from Maven to Bazel)
VS Code, Spyder, Git, Docker, Vim, Slack
» API Building
Flask, Gunicorn, FastAPI
REST, gRPC
» Databases
MongoDB
Professional Experience
- Implemented CNNs for text classification with explainability using LIME (Local Interpretable Model-agnostic Explanations) on legal contract data for multiple contract types
- Implemented Induction Networks with episodic learning for few-shot text classification on legal contract data for multiple contract types
- Implemented transformer-based architectures (BERT, DistilBERT, RoBERTa, XLNet) for the text classification pipeline on legal contract data for multiple contract types
- Implemented Unsupervised Data Augmentation (UDA) for semi-supervised training in data-scarce scenarios, designed so that any model can plug into the UDA methodology
- Generated additional training data using methods such as back-translation and text generation with GPT-2 steered by a classifier (PPLM, Plug and Play Language Models)
- Attacked models with TextFooler, which applies small perturbations to text to fool the model, and used the resulting adversarial examples to re-train the models
- Used fuzzy matching and grouping of textual data to validate training-data labels
- Used Snorkel AI’s methodology for programmatically building and managing training datasets using rules, heuristics, and other sources of signal
- Built a nuanced custom text segmenter for legal contract data that handles complex, multi-level nesting to capture the full context of each sentence; designed as a feature-based heuristic approach, it generalizes across legal contract types, achieving a false negative rate of 0 and accuracy above 95%
- Built end-to-end gRPC services for parsing PDF documents using Adobe’s Java SDK, including a migration from Maven to Bazel
- Built a Central Data Store for all legal contracts, with services around it that parse source data, segment it using the custom text segmenter, and store the segmented data along with its embeddings; this maintained data integrity and enabled similar clause/sub-clause search over a user-defined search space
Awards
Best Enterprise + Industrial AI, Amazon AI Conclave 2019