Shivam is available for hire

Shivam Garg

Verified Expert in Engineering

Computer Vision Engineer and Developer

Location

Delhi, India

Toptal Member Since

August 1, 2023

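Shivam is a senior AI engineer with 4+ years of hands-on experience in deep learning and artificial intelligence. Proficient in deep learning frameworks such as TensorFlow, PyTorch, and Keras, he excels in generative AI, Stable Diffusion, and large language models (LLMs). Furthermore, Shivam stands out for his extensive expertise in classical computer vision and machine learning.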

Portfolio

Self-employed

Python, Generative Artificial Intelligence (GenAI), Stable Diffusion...

Avatarin Inc

3D Reconstruction, Python, Computer Vision, OCR...

AlphaICs

Python, Deep Learning, Quantization, Computer Vision, NVIDIA TensorRT...

Experience

Python - 5 years
Natural Language Processing (NLP) - 4 years
Computer Vision - 4 years
PyTorch - 4 years
Generative Artificial Intelligence (GenAI) - 4 years
Stable Diffusion - 2 years
Large Language Models (LLMs) - 1 year
LangChain - 1 year

Availability

Part-time

Preferred Environment

Python, PyTorch, TensorFlow, Deep Learning, Generative Artificial Intelligence (GenAI), Stable Diffusion, Computer Vision, Natural Language Processing (NLP), Docker, LangChain, Large Language Models (LLMs), Machine Learning, Data Science, Image Generation, Chatbot, Chatbots, Generative Pre-trained Transformers (GPT), OpenAI GPT-3 API, Generative Pre-trained Transformer 3 (GPT-3), Notion, APIs, Software Architecture, Events, LSTM, BERT, Reinforcement Learning, Falcon, 2D, JavaScript, Text to Speech (TTS), Generative AI

The most amazing...

...project I've delivered is a generative AI model that uses Stable Diffusion and LLMs to animate stories from news articles and helped secure Y Combinator funding.

Work Experience

Senior AI Consultant

2023 - PRESENT

Self-employed

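Developed a Stable Diffusion model using ControlNet to transform sketches into realistic images with pose input conditions. Adjusted the cross-attention layers via LoRA to reduce the storage requirements of the trained model.
Delivered generative AI models using Stable Diffusion and LLMs, capable of generating animated stories from news articles, which secured Y Combinator fundraising for the client.
Developed a unique method for converting animal images into animations through GAN training on unpaired animal images, leveraging the StyleGAN architecture and enhancing the output with CLIP and feature extractors.
Built a system that converts 2D images of non-fungible tokens (NFTs) into 3D models using selective 3D inpainting through Stable Diffusion and depth estimation.
Developed text-to-art systems using techniques such as fine-tuning, autoencoders, and prompt engineering, successfully generating visually appealing art from text descriptions.
Created a system to detect and classify fake news in India using ML and natural language processing (NLP). Preprocessed text data, employed SetFit and long short-term memory (LSTM) models, and created an ensemble for precise identification.
Built a tool using OpenAI ada model embeddings with LangChain and FAISS to search for similar patents in the United States Patent and Trademark Office (USPTO) database, improving the indexing and search of patent embeddings.
Created an eCommerce product matching system by comparing visual embeddings from the CLIP model with OCR-derived text embeddings obtained via an LLM (ada model), enhancing accuracy and efficiency.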

Technologies: Python, Generative Artificial Intelligence (GenAI), Stable Diffusion, Deep Learning, Computer Vision, Natural Language Processing (NLP), PyTorch, TensorFlow, Docker, LangChain, Generative Pre-trained Transformers (GPT), AWS IoT, Git, Generative Adversarial Networks (GANs), Artificial Intelligence (AI), OCR, Google Cloud Platform (GCP), Convolutional Neural Networks (CNN), ChatGPT, OpenAI GPT-4 API, OpenAI GPT-3 API, Search Engine Optimization (SEO), OpenCV, Machine Learning Operations (MLOps), Amazon Web Services (AWS), Product Matching, LoRA, Large Language Models (LLMs), Diffusion Models, NLU, Deep Neural Networks, Language Models, MySQL, Machine Learning, Statistical Analysis, Data Analysis, Image Analysis, Data Science, MongoDB, Image Generation, Chatbot, Chatbots, LlamaIndex, Django, Pandas, Generative Pre-trained Transformer 3 (GPT-3), Llama 2, Text Analytics, Video & Audio Processing, OpenAI, Notion, APIs, Haystack, Supervised Learning, Unsupervised Learning, Leadership, Software Architecture, LSTM, BERT, Reinforcement Learning, Falcon, PEFT, 2D, JavaScript, Google Speech-to-Text API, Speech to Text, Point Clouds, Point Cloud Data, Azure Machine Learning, Azure DevOps, Text to Speech (TTS), Whisper, Generative AI

AI Engineer 3

2022 - 2023

Avatarin Inc

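Created a system to predict robotic arm poses using imitation learning and OpenCV-assisted human kanji writing, generating kanji images from kanji videos.
Automated health records and invoices for Yale University, using OCR and OpenCV to extract text from various health documents and convert it into digital formats.
Implemented a model using VideoMAE to detect suspicious activity at airports, prioritizing high accuracy, low latency, and efficient deployment on the client's Linux servers.
Implemented shot detection for the World Table Tennis organization, using YOLOv5 and OpenCV for object detection and VideoMAE for stroke recognition in table tennis matches.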

Technologies: 3D Reconstruction, Python, Computer Vision, OCR, Natural Language Processing (NLP), Object Detection, Image Processing, Benchmarking, OpenCV, Amazon Web Services (AWS), Text to Image, Large Language Models (LLMs), Diffusion Models, Deep Neural Networks, ChatGPT, OpenAI GPT-4 API, Language Models, MySQL, Machine Learning, Statistical Analysis, Data Analysis, Image Analysis, Data Science, MongoDB, Image Generation, Chatbot, Chatbots, LangChain, LlamaIndex, Django, Pandas, Generative Pre-trained Transformers (GPT), OpenAI GPT-3 API, Generative Pre-trained Transformer 3 (GPT-3), Text Analytics, Video & Audio Processing, OpenAI, HubSpot, Notion, APIs, HubSpot CRM, Haystack, C++, Supervised Learning, Unsupervised Learning, Leadership, Software Architecture, Events, LSTM, BERT, Reinforcement Learning, Falcon, PEFT, 2D, Google Speech-to-Text API, Speech to Text, Point Clouds, Point Cloud Data, Azure Machine Learning, Azure DevOps, Text to Speech (TTS), Generative AI

Senior AI Engineer

2020 - 2022

AlphaICs

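Implemented a motion transfer system using a first-order motion model, enabling high-quality motion transfer between faces while preserving the target face's identity and facial expressions.
Built 4-bit and 8-bit quantization software development kits (SDKs) that enable efficient implementation and optimization of deep learning models on edge (CPU-based) hardware, which enhanced performance and capabilities.
Benchmarked different computer vision and generative models using custom quantization and optimization SDKs targeting IoT and custom edge devices.
Worked on brain image segmentation using deep learning, which involved training neural networks with segmentation and computer vision techniques to accurately identify and classify structures in brain images related to Alzheimer's disease.
Launched a 3D object detection and tracking system for autonomous vehicles using LiDAR data and the VoxelNet algorithm, enhancing the vehicle's perception and tracking capabilities in 3D environments.
Developed an infrared object detection system using the You Only Look Once (YOLO) architecture, achieving high-precision detection of objects in infrared images and providing reliable recognition and tracking capabilities.
Created a satellite image segmentation system for detecting farmland using a cascade of U-Net and Mask R-CNN models, improving agricultural analysis and decision-making processes.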

Technologies: Python, Deep Learning, Quantization, Computer Vision, NVIDIA TensorRT, Continuous Development (CD), Continuous Integration (CI), Models, PyTorch, TensorFlow, Keras, FastAPI, Fast.ai, GPT, You Only Look Once (YOLO), Artificial Intelligence (AI), Google Cloud Platform (GCP), Convolutional Neural Networks (CNN), Image Processing, Benchmarking, Amazon Web Services (AWS), Large Language Models (LLMs), Text to Image, Diffusion Models, Deep Neural Networks, Language Models, MySQL, Machine Learning, ETL, Statistical Analysis, Data Analysis, Image Analysis, Data Science, OpenCV, iOS, Image Generation, Chatbot, Chatbots, Pandas, Generative Pre-trained Transformers (GPT), Text Analytics, Video & Audio Processing, OpenAI, HubSpot, Notion, APIs, Haystack, C++, Supervised Learning, Unsupervised Learning, Leadership, Software Architecture, Events, LSTM, BERT, Reinforcement Learning, Falcon, PEFT, 2D, JavaScript, Google Speech-to-Text API, Speech to Text, Point Clouds, Point Cloud Data, Azure DevOps, Text to Speech (TTS), Generative AI

Machine Learning Engineer

2019 - 2020

UnrealAI

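Developed and deployed real-time yoga pose estimation on the Android platform using OpenPifPaf, achieving accurate results for Indian yoga poses. Optimized inference speed and converted the model to TensorFlow Lite format for seamless integration.
Created a topic modeling model, utilizing LDA and NMF algorithms to extract latent topics from a text corpus, and applied clustering algorithms to group similar topics, providing better understanding and organization of text documents.
Built a computer vision system that detects kitchen items with high accuracy and low latency. The system was optimized for real-time performance on mobile devices.
Detected income tax fraud using an ensemble of supervised anomaly detection, unsupervised clustering, and rule-based backtracking.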

Technologies: Computer Vision, PyTorch, TensorFlow, TensorFlow Lite, Continuous Integration (CI), Continuous Development (CD), Flask, Deep Learning, Pose Estimation, Open Neural Network Exchange (ONNX), Natural Language Processing (NLP), Machine Learning, Artificial Intelligence (AI), Convolutional Neural Networks (CNN), Deep Neural Networks, Language Models, MySQL, ETL, Statistical Analysis, Data Analysis, Image Analysis, Python, Large Language Models (LLMs), Data Science, MongoDB, OpenCV, iOS, Image Generation, Django, Pandas, Text Analytics, Video & Audio Processing, Notion, APIs, HubSpot CRM, Haystack, C++, Supervised Learning, Unsupervised Learning, Leadership, Software Architecture, Events, LSTM, Reinforcement Learning, Falcon, 2D, JavaScript, Google Speech-to-Text API, Point Clouds, Point Cloud Data, Text to Speech (TTS), Generative AI

Experience

Legal Chatbot with RAG, Pinecone Integration, Streamlit UI, and GPT-4

In this project, we developed a legal chatbot leveraging OpenAI's GPT-4, LangChain, and retrieval-augmented generation (RAG), integrated with a Pinecone database and with a user interface built in Streamlit, all on a scalable Azure architecture. The chatbot was designed to provide precise and context-sensitive legal advice, utilizing the Azure OpenAI GPT-4 series, GPT-35-Turbo series, and Embeddings series models for natural language understanding, and LangChain for seamless conversational AI. We fine-tuned the models on Azure AI Studio and enhanced their capabilities by connecting the LLMs with other Azure services, like Azure AI Search.
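The retrieval-augmented flow described above can be illustrated with a minimal, self-contained sketch. It is not the project's implementation: the bag-of-words `embed` function stands in for an embedding model, the in-memory list stands in for the Pinecone index, and every name here (`embed`, `cosine`, `retrieve`, `build_prompt`, the sample passages) is hypothetical.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy bag-of-words embedding; a real system would call an embedding model."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question, passages, k=2):
    """Return the k passages most similar to the question."""
    q = embed(question)
    ranked = sorted(passages, key=lambda p: cosine(q, embed(p)), reverse=True)
    return ranked[:k]


def build_prompt(question, passages, k=2):
    """Assemble the grounded prompt that would be sent to the chat model."""
    context = "\n".join(f"- {p}" for p in retrieve(question, passages, k))
    return (
        "Answer using only the context below.\n"
        f"Context:\n{context}\n"
        f"Question: {question}"
    )


# Hypothetical sample passages, for illustration only.
PASSAGES = [
    "A lease agreement must state the rent amount and the notice period.",
    "A trademark registration protects a brand name or logo.",
    "The notice period for ending a lease is defined in the lease agreement.",
]

if __name__ == "__main__":
    print(build_prompt("What is the notice period for a lease?", PASSAGES))
```

The point of the sketch is the ordering of steps: retrieve the most relevant passages first, then constrain the model to answer from them, which is what keeps answers tied to the source documents.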

Personalized Art Generation Bot

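Developed a bot that helps users generate custom art based on their interactions with the bot and user-provided images. For this, a large language model (LLM), specifically GPT-3.5, was employed as the basis for the bot. Further, a soft prompt pipeline was implemented that considers the user's previous interactions to accurately capture the user's tone. Notably, the system demonstrated the ability to handle user-specific data, including NSFW and adult content, while maintaining strict user privacy. For image generation, Stable Diffusion 2.1 was fine-tuned using LoRA, incorporating the themes and prompts recommended by the LLM.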

NFT Image to Immersive 3D

Developed a system that converts 2D images of NFTs into immersive 3D models through Stable Diffusion and depth estimation techniques.

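Selective 3D inpainting is an advanced process that fills in missing or damaged regions of the 2D image, resulting in a complete and visually appealing 3D representation. This technique helps improve the overall quality and realism of the generated 3D models.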

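Depth estimation is another key component of the system, as it determines spatial depth information from 2D images. This depth information is essential for creating a sense of depth and perspective in the generated 3D models.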
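How a depth map turns a flat image into 3D geometry can be shown with a short sketch. It assumes a pinhole camera model with hypothetical intrinsics (`fx`, `fy`, `cx`, `cy`) and a depth map given as nested lists; the project's actual depth model and camera parameters are not stated here, and the function name `backproject` is illustrative.

```python
def backproject(depth, fx, fy, cx, cy):
    """Convert a depth map (rows of depth values) into a list of (x, y, z) points.

    Uses the pinhole camera model: a pixel (u, v) with depth z maps to
    x = (u - cx) * z / fx and y = (v - cy) * z / fy.
    Pixels with non-positive depth are treated as missing and skipped.
    """
    points = []
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            if z <= 0:
                continue
            points.append(((u - cx) * z / fx, (v - cy) * z / fy, z))
    return points


if __name__ == "__main__":
    # A 2x2 toy depth map; 0.0 marks a pixel with no depth estimate.
    toy_depth = [[2.0, 2.0], [0.0, 4.0]]
    for point in backproject(toy_depth, fx=1.0, fy=1.0, cx=0.5, cy=0.5):
        print(point)
```

Pixels skipped for lack of depth are the kind of gaps that an inpainting step would then have to fill.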

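By leveraging Stable Diffusion, the system ensures a stable and consistent generation process, delivering high-quality and accurate 3D representations from the NFTs' 2D counterparts. The resulting 3D models can significantly enrich users' viewing and interactive experiences in various applications, from virtual galleries to augmented reality environments.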

News to Infographics

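Successfully delivered generative AI models utilizing Stable Diffusion and LLMs. The models generate animated stories from news articles and helped the client secure Y Combinator funding.

The process starts with a news article, which is first summarized using GPT-3.5 Turbo and Davinci, facilitated by LangChain. Video generation then uses a fine-tuned Stable Diffusion 2.1 model, resulting in an engaging and dynamic visual presentation of the news story.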

Yoga Pose Correction

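Developed and deployed a real-time yoga pose estimation and correction system on the Android platform, utilizing the OpenPifPaf model. The primary goal was to achieve precise and reliable recognition of various Indian yoga poses. A major focus was optimizing the system's inference speed to ensure seamless, real-time performance during live yoga sessions.

The carefully trained model was quantized and converted to TensorFlow Lite format to improve usability and integration. This conversion simplified incorporating the model into the Android application, giving yoga enthusiasts a user-friendly tool to refine their practice and gain insight into different poses.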

System and Method for Full-integer Quantization-aware Training on the Edge

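Developed a full-integer quantization-aware training system. The system improves the speed and performance of deep learning networks on low-precision devices.
I developed a pseudo cross-entropy loss function and designed quantization schemes for integer-only quantization-aware training. Additionally, I developed an SDK that enables the system to be used on low-power edge computing devices. The SDK has been successfully used to quantize models on Jetson and vendor-customized hardware.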
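For readers unfamiliar with integer quantization, the following sketch shows generic 8-bit affine quantization of a list of floating-point weights. It illustrates the general idea only; it is not the quantization scheme or loss function developed in this project, and the function names `quantize` and `dequantize` are hypothetical.

```python
def quantize(values, num_bits=8):
    """Affine-quantize floats to unsigned integers.

    Returns (quantized_ints, scale, zero_point), where a real value v is
    approximated by (q - zero_point) * scale.
    """
    qmin, qmax = 0, 2 ** num_bits - 1
    # The representable range must include 0 so that 0.0 maps to an exact integer.
    lo = min(min(values), 0.0)
    hi = max(max(values), 0.0)
    scale = (hi - lo) / (qmax - qmin) or 1.0  # fall back to 1.0 for all-zero input
    zero_point = round(qmin - lo / scale)
    quantized = [
        max(qmin, min(qmax, round(v / scale) + zero_point)) for v in values
    ]
    return quantized, scale, zero_point


def dequantize(quantized, scale, zero_point):
    """Map quantized integers back to approximate float values."""
    return [(q - zero_point) * scale for q in quantized]


if __name__ == "__main__":
    weights = [-1.0, 0.0, 0.5, 1.0]
    q, scale, zp = quantize(weights)
    print(q, dequantize(q, scale, zp))
```

The round trip loses at most about one quantization step (`scale`) per value, which is the error that quantization-aware training lets a network adapt to during training.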

Fake News Classification

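Developed a system in India to detect and classify fake news articles using machine learning and natural language processing techniques.

The project involved preprocessing text data, employing the SetFit model and an LSTM, and developing an ensemble of SetFit and LSTM to accurately identify fake news.

In addition, k-means clustering was used to group the types of fake news. The ultimate goal was to create a reliable tool to combat the spread of misinformation. The environment used for this project included Linux, TensorFlow, k-means clustering, scikit-learn, Python, and SetFit.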
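The SetFit and LSTM ensemble can be sketched as a weighted soft vote over the fake-news probabilities produced by the two models. This is an assumption about how such an ensemble might combine predictions; the combination method is not specified here, and the function name, weight, and threshold below are hypothetical.

```python
def soft_vote(prob_setfit, prob_lstm, weight_setfit=0.5, threshold=0.5):
    """Combine two models' fake-news probabilities into one label per article.

    prob_setfit and prob_lstm are equal-length lists of probabilities that
    each article is fake, as scored by the two (hypothetical) models.
    """
    if len(prob_setfit) != len(prob_lstm):
        raise ValueError("both models must score the same articles")
    labels = []
    for p_setfit, p_lstm in zip(prob_setfit, prob_lstm):
        combined = weight_setfit * p_setfit + (1.0 - weight_setfit) * p_lstm
        labels.append("fake" if combined >= threshold else "real")
    return labels


if __name__ == "__main__":
    print(soft_vote([0.9, 0.2, 0.6], [0.7, 0.1, 0.2]))
```

Averaging probabilities rather than hard labels lets a confident model outweigh an uncertain one on a given article.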

Text-to-video Generation for Mathematical Equations

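Developed a robust diffusion model capable of interpreting English text descriptions of mathematical equations and generating accurate, coherent video representations. I built a tool that can assist in educational settings, providing students and educators with visual aids to better understand and communicate complex mathematical concepts. My work also included implementing advanced optimization techniques to improve the model's latency and memory footprint, making real-time applications more efficient and accessible.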

Skills

Languages

Python, C++, Falcon, JavaScript, Bash Script

Frameworks

Flask, LlamaIndex, Django, Streamlit

Libraries/APIs

PyTorch, TensorFlow, Scikit-learn, SpaCy, OpenCV, Pandas, LSTM, Google Speech-to-Text API, Keras, Fast.ai

Tools

You Only Look Once (YOLO), Git, Notion, Haystack, Azure Machine Learning, Whisper, Amazon SageMaker, Google Bard

Paradigms

Data Science, ETL, Azure DevOps, Continuous Development (CD), Continuous Integration (CI), Search Engine Optimization (SEO)

Platforms

Docker, AWS IoT, Google Cloud Platform (GCP), AWS Lambda, Amazon EC2, iOS, Linux, Amazon Web Services (AWS), Azure

Storage

MySQL, MongoDB, Databases

Other

Deep Learning, Generative Artificial Intelligence (GenAI), Stable Diffusion, Computer Vision, Natural Language Processing (NLP), Quantization, Models, TensorFlow Lite, Machine Learning, LangChain, Statistics, Depth Estimation, Time Series, Hugging Face, Detectron, Generative Pre-trained Transformers (GPT), GPT, Large Language Models (LLMs), Artificial Intelligence (AI), OCR, Convolutional Neural Networks (CNN), Image Processing, ChatGPT, OpenAI GPT-4 API, OpenAI GPT-3 API, Text to Image, Diffusion Models, NLU, Deep Neural Networks, Language Models, Statistical Analysis, Data Analysis, Image Analysis, Image Generation, Chatbot, Chatbots, Generative Pre-trained Transformer 3 (GPT-3), Llama 2, Text Analytics, Model Development, Video & Audio Processing, OpenAI, HubSpot, APIs, HubSpot CRM, Retrieval-augmented Generation (RAG), Supervised Learning, Unsupervised Learning, Leadership, Software Architecture, Events, BERT, Reinforcement Learning, PEFT, 2D, Speech to Text, Point Clouds, Point Cloud Data, Text to Speech (TTS), Generative AI, NVIDIA TensorRT, FastAPI, Pose Estimation, 3D Reconstruction, DreamBooth, LoRA, Generative Adversarial Networks (GANs), K-means Clustering, Edge AI, Quantisation, Open Neural Network Exchange (ONNX), Pruning, Benchmarking, Object Detection, Machine Learning Operations (MLOps), Product Matching, Prompt Engineering, ControlNet, Gradio, Civitai, Videos

Education

2016 - 2020

Bachelor of Technology Degree in Computer Science

School of Information, Communication and Technology - Dwarka, Delhi, India
