www.kanhaiya.site
Data Science | GenAI | ML | DevOps
Kanhaiya Kumar

Overview

Engineering the AI Future
Data Scientist · AI Engineer · GenAI · ML · NLP · Azure · DevOps · QA
Data Science · Machine Learning · NLP · Generative AI · Python
Building practical AI systems - I design and ship intelligent workflows that transform unstructured documents and data into automated, production-ready pipelines using GenAI, ML, and Azure.
Pune, Maharashtra, India · From Begusarai, Bihar · 4+ years in software
GenAI, ML, NLP
Azure OpenAI & Data Pipelines
Quality-first, production focus

About

Profile

I’m a Data Scientist / AI Engineer at Xoriant, building GenAI and ML workflows that turn RFQs, spec logs, and technical docs into structured, usable data for engineering and quoting teams.

Hands-on with Python, Azure OpenAI, and Microsoft Azure, I ship document AI pipelines end to end: ingest PDFs/Word/Excel/email, extract key fields, summarise, validate, and deliver production-ready outputs.

Grounded in DevOps and QA/UAT, I prioritise reliability, repeatability, and measurable impact in every release.

Current focus: Strengthen data engineering foundations, refine user-centric prompt strategies, and scale high-performing GenAI systems across multiple production environments.

AI & GenAI Work

Applied Intelligence

A snapshot of how I use Machine Learning, NLP, and Generative AI in production-like workflows.

  • Build document AI pipelines that process RFQs, spec logs, and drawings (PDF, Word, Excel, emails) into structured data using LLMs plus rules.
  • Design prompt strategies and keyword maps so outputs match what Application Engineers need.
  • Use Azure OpenAI / OpenAI for summaries, entity extraction, and domain Q&A on complex technical docs.
  • Validate AI systems with a QA/UAT mindset to ensure reliability and consistency before rollout.
  • Integrate AI workflows into DevOps pipelines for continuous improvement of prompts, models, and rules.
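
A minimal sketch of the "LLMs plus rules" idea from the list above, not the production code. The field names, the RFQ number pattern, and the `fake_llm` stand-in are all hypothetical; a real pipeline would replace `fake_llm` with a wrapper around an Azure OpenAI chat-completion call.

```python
import json
import re

# Hypothetical rule: capture an RFQ number such as "RFQ-12345".
RFQ_PATTERN = re.compile(r"\bRFQ[-\s]?(\d+)\b", re.IGNORECASE)


def extract_with_rules(text):
    """Deterministic extraction for fields with a predictable format."""
    match = RFQ_PATTERN.search(text)
    return {"rfq_number": match.group(1)} if match else {}


def extract_fields(text, llm_extract):
    """Merge LLM output with rule output; rule results win on conflicts.

    `llm_extract` is any callable that takes text and returns a JSON string.
    Unparseable or non-object output is treated as "no fields found".
    """
    try:
        llm_fields = json.loads(llm_extract(text))
    except (json.JSONDecodeError, TypeError):
        llm_fields = {}
    if not isinstance(llm_fields, dict):
        llm_fields = {}
    return {**llm_fields, **extract_with_rules(text)}


def fake_llm(text):
    """Stand-in for a real model call so the sketch runs offline."""
    return json.dumps({"rfq_number": "unknown", "customer": "Acme"})


fields = extract_fields("Please quote RFQ-20417 for Acme.", fake_llm)
print(fields)
```

Letting the rules override the model is one possible design choice: fields with a strict format stay deterministic, while the model handles free-text fields the rules cannot reach.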

Experience

Career Journey
Software Engineer / Data Scientist - Xoriant
Sep 2022 - Present · Pune, Maharashtra, India

Data Scientist and AI Engineer building GenAI/ML and AI-driven solutions with DevOps and QA discipline. Hands-on with Python, OpenAI/GenAI, Azure, QA, and automation.

  • Designed and implemented AI-driven workflows for document processing, including data extraction, summarisation, and validation.
  • Built Python-based pipelines integrating Azure Blob, file conversion, and Azure OpenAI for entity extraction and summarisation.
  • Collaborated with Application Engineers to ensure extracted data and summaries match real quoting and engineering requirements.
  • Contributed to DevOps and deployment tasks, including CI/CD pipelines and environment configuration on Azure.
  • Performed UAT and QA on AI tools, identifying edge cases and improving model behaviour and prompts over time.

Associate Software Engineer - Xoriant
Aug 2021 - Sep 2022 · Pune, Maharashtra, India

Early-career role focused on build optimisation, cloud, and DevOps, with exposure to secure and scalable systems.

  • Worked with build tools like BuildGrid, RECC, and Bazel to improve build performance and scalability.
  • Streamlined build pipelines, reducing development time and increasing productivity for engineering teams.
  • Supported Azure and cloud-related work, contributing to stable and efficient deployments.
  • Focused on continuous improvement and automation across the build and release process.

Skills

Stack & Strengths
Core
Data Science · Python · Machine Learning · GenAI / LLMs · NLP · Prompt Engineering · DevOps · Microsoft Azure · QA & UAT
Tools & Platforms
Azure OpenAI · Azure Blob · Azure DevOps · Git · Red Hat Enterprise Linux (RHEL) · BuildGrid · RECC · Bazel
Soft Skills
Problem-Solving · Workflow Automation · Collaboration with AEs · Continuous Improvement · Attention to Detail

Projects

Selected Work
Automated Data Extraction for Seal Quote Generation
Python | OpenAI/GenAI | Azure | Automation

End-to-end pipeline to extract data from RFQs and related documents (PDF, Word, Excel, emails) and generate structured Excel-based Seal Quotes.

  • Organised input files by opportunity, extracted key technical and commercial fields, and validated outputs.
  • Reduced manual work for Application Engineers and improved consistency of quotations.
RFQ Automation · Document AI · Azure Blob

Automated Data Extraction from Specification Logs
Python | Data Validation | Workflow Automation

Automated extraction and validation of fields from specification log files, exporting clean data for downstream use.

  • Built rules and checks to maintain high data quality.
  • Supported accurate and efficient quote preparation and analysis.
Spec Logs · Data Quality
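
As a rough illustration of the "rules and checks" mentioned above, the sketch below validates one extracted record. The field names (`part_number`, `pressure_psi`, `material`) and the rules themselves are assumptions made for the example, not the project's actual schema.

```python
# Hypothetical required fields for one specification-log record.
REQUIRED_FIELDS = ("part_number", "pressure_psi", "material")


def validate_record(record):
    """Return a list of human-readable problems; an empty list means clean."""
    errors = []

    # Rule 1: every required field must be present and non-blank.
    for field in REQUIRED_FIELDS:
        value = record.get(field)
        if value is None or str(value).strip() == "":
            errors.append(f"missing {field}")

    # Rule 2: pressure, when supplied, must be a positive number.
    pressure = record.get("pressure_psi")
    if pressure not in (None, ""):
        try:
            if float(pressure) <= 0:
                errors.append("pressure_psi must be positive")
        except (TypeError, ValueError):
            errors.append("pressure_psi is not numeric")

    return errors


good = {"part_number": "A1", "pressure_psi": "150", "material": "PTFE"}
bad = {"part_number": "", "pressure_psi": "abc", "material": "PTFE"}
print(validate_record(good))
print(validate_record(bad))
```

Returning messages instead of raising lets a batch run collect every problem in a file before anything is exported downstream.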

Spec-Extract - Content Extraction & Summarisation
GenAI | NLP | Prompt Engineering

System to extract and summarise relevant sections from specification logs using keyword mapping plus Generative AI.

  • Designed keyword maps and prompts to focus on voltage, pressures, materials, and other key attributes.
  • Enhanced summarisation quality using GenAI / OpenAI to align with how Application Engineers think and search.
Summarisation · Keyword Mapping · Prompt Engineering
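
The keyword-mapping step described above could look roughly like the sketch below: paragraphs are grouped by the attribute whose keywords they mention, and only those paragraphs go into the summarisation prompt. The keyword lists, the sample text, and the prompt wording are illustrative assumptions, and the model call itself is left out.

```python
import re

# Hypothetical keyword map: attribute name -> trigger words.
KEYWORD_MAP = {
    "voltage": ["voltage", "volt", "kv"],
    "pressure": ["pressure", "psi", "bar"],
    "materials": ["material", "stainless", "alloy"],
}


def select_sections(text, keyword_map):
    """Group paragraphs by the attributes whose keywords they mention."""
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    matches = {attribute: [] for attribute in keyword_map}
    for attribute, keywords in keyword_map.items():
        # Whole-word match with an optional plural "s", case-insensitive.
        pattern = re.compile(
            r"\b(?:" + "|".join(map(re.escape, keywords)) + r")s?\b",
            re.IGNORECASE,
        )
        for paragraph in paragraphs:
            if pattern.search(paragraph):
                matches[attribute].append(paragraph)
    return matches


def build_prompt(attribute, sections):
    """Assemble a summarisation prompt from the selected paragraphs."""
    body = "\n\n".join(sections)
    return f"Summarise the {attribute} requirements in this excerpt:\n\n{body}"


SAMPLE = """Design pressure: 150 psi at 200 C.

Shaft material: stainless steel 316.

Delivery within 6 weeks."""

selected = select_sections(SAMPLE, KEYWORD_MAP)
print(build_prompt("pressure", selected["pressure"]))
```

Filtering before prompting keeps the model focused on the attributes that matter for a quote and keeps the prompt short on long specification logs.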

Resources

Downloads
Resume
Snapshot of my experience, projects, and skills.
Portfolio / Case Study
Highlights of AI/GenAI + automation work.
Sample Data / Demo
Example outputs from document AI pipelines.

Education

Academic

Orissa Engineering College

B.Tech - Computer Science · Aug 2017 - Jun 2021
CGPA: 8.65
  • Built strong foundations in computer science, programming, and software engineering.
  • Active in college activities and societies, fostering a sense of community and collaboration.
  • Explored photography through the media club and stayed updated with technology trends.
  • Developed critical thinking, problem-solving, and time management skills with a global perspective.

Certifications & Goals

Growth
  • Microsoft Certified: Azure Administrator Associate.
  • Multiple Udemy certifications across Python, data, and cloud / DevOps.
I am looking for roles where I can combine Data Engineering / Data Science, GenAI, and DevOps to build end-to-end AI systems that have a clear real-world impact.

Contact

Let Us Connect