Scalable MLOps for AI-Powered Video Analytics

Industry

Automotive & Industrial Operations

Location

United States

Company Size

Enterprise

Project Duration

6 Months

Services Provided

We built a flexible, microservices-based system on Amazon EKS.
We split compute into two node groups: one for the user-facing parts (ReactJS frontend and NestJS backend) and another for AI/ML tasks.
We used an Amazon S3 bucket to store uploaded videos and Amazon SQS to queue processing jobs, which keeps the system from crashing under heavy load.

Challenges

Overcoming Monolithic Bottlenecks in Video Analytics

Our client, a leading video analytics provider, was struggling with a monolithic setup. The major issue with this legacy setup was that it tried to process high-resolution videos all at once. When users uploaded multiple files together, the system often timed out or ran out of resources, which made the entire experience frustrating and time-consuming. The client wanted to move from this rigid system to a cloud-native platform that leverages artificial intelligence.

PSSPL delivered a cloud-native, scalable platform on Amazon EKS to transform a top video analytics provider’s monolithic system. Our AI-powered solution handles high-resolution video uploads efficiently, preventing timeouts and resource exhaustion during busy hours. We overcame the core challenge of our client’s legacy monolithic setup with a flexible, AI-driven cloud platform that scales seamlessly.

How PSSPL Helped

Our team of MLOps developers leverages Amazon EKS for isolated, scalable microservices across the ML lifecycle.

Asynchronous Orchestration with SQS and S3

The raw video files get uploaded straight to Amazon S3, and the backend queues jobs in SQS.
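The hand-off between the backend and the ML workers can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the message fields (`bucket`, `key`) and function names are assumptions, and `sqs_client` is expected to be a `boto3.client("sqs")` object supplied by the caller.

```python
import json
from typing import Any


def build_job_message(bucket: str, key: str) -> str:
    """Serialize a pointer to the uploaded video; the file itself stays in S3."""
    return json.dumps({"bucket": bucket, "key": key})


def enqueue_job(sqs_client: Any, queue_url: str, bucket: str, key: str) -> None:
    """Queue one processing job (hypothetical helper).

    send_message(QueueUrl=..., MessageBody=...) is the standard boto3 SQS call.
    """
    sqs_client.send_message(
        QueueUrl=queue_url,
        MessageBody=build_job_message(bucket, key),
    )


def parse_job_message(body: str) -> tuple[str, str]:
    """Worker side: recover the S3 location from a received message body."""
    job = json.loads(body)
    return job["bucket"], job["key"]
```

Because the message carries only the S3 location, the queue stays lightweight regardless of video size, and a worker can download the file when it is ready to process it.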

Video Processing Workflow: A DAG-Powered Pipeline

For robust feature extraction, the processing steps are organized as a directed acyclic graph (DAG), so each step runs only after the steps it depends on have finished.
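As an illustration of the DAG idea, the sketch below orders a handful of steps with Python's standard `graphlib` module. The step names and their dependencies are assumptions chosen to mirror the features listed on this page; the real pipeline's graph is not described in detail here.

```python
from graphlib import TopologicalSorter

# Hypothetical dependency map: each step maps to the set of steps it depends on.
PIPELINE = {
    "extract_audio": set(),
    "extract_frames": set(),
    "transcribe": {"extract_audio"},
    "word_count": {"transcribe"},
    "profanity_check": {"transcribe"},
    "blur_detection": {"extract_frames"},
}


def execution_order(dag: dict[str, set[str]]) -> list[str]:
    """Return one valid order in which every step follows its dependencies."""
    return list(TopologicalSorter(dag).static_order())


if __name__ == "__main__":
    for step in execution_order(PIPELINE):
        print(step)
```

`TopologicalSorter` raises `graphlib.CycleError` if the graph contains a cycle, which catches a misconfigured pipeline before any video is processed.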

MLOps Best Practices in Action

We leverage the best practices of MLOps, including auto-scaling, hybrid inference, and state management.

Disaggregated Compute via Node Groups

Frontend/Backend Nodes: ReactJS (UI) and NestJS (API orchestration) run on cost-effective CPU instances.
Python Listener Nodes (ML Workers): GPU-powered machines for faster video analysis and extraction of key details.
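One common way to size a worker group like this is to derive the worker count from the queue depth. The sketch below is an assumption for illustration only: the function name, the per-worker capacity, and the min/max bounds are hypothetical and are not taken from the project.

```python
import math


def desired_workers(
    queued_videos: int,
    videos_per_worker: int = 2,   # assumed capacity of one ML worker
    min_workers: int = 1,         # assumed floor so one listener is always up
    max_workers: int = 25,        # assumed ceiling to cap GPU cost
) -> int:
    """Scale workers with queue depth, clamped to the configured bounds."""
    needed = math.ceil(queued_videos / videos_per_worker)
    return max(min_workers, min(max_workers, needed))
```

With these assumed defaults, 5 queued videos call for 3 workers and 50 queued videos call for 25.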

Ultimate Results: Scalability, Reliability, and Efficiency Gains

The Python node groups scale automatically, so the solution handles anywhere from 1 to 50+ videos.

SQS decoupling prevents failures from propagating to the UI.

Automating these checks reduces manual review time by 80% while delivering VQS insights in a fraction of a second.

Features We Added

Frame Extraction

We used FFmpeg to extract frames from videos for further processing.

Audio Extraction

We used FFmpeg to extract audio tracks from video files for audio processing.
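Both extraction steps come down to an FFmpeg command line. The sketch below builds two such commands; the sampling choices (one frame per second, 16 kHz mono WAV) and the helper names are assumptions, not the project's actual settings.

```python
def frame_extraction_cmd(video_path: str, out_dir: str, fps: int = 1) -> list[str]:
    """Build an FFmpeg command that saves `fps` frames per second as PNG files."""
    return [
        "ffmpeg", "-i", video_path,
        "-vf", f"fps={fps}",            # video filter: sample frames at this rate
        f"{out_dir}/frame_%04d.png",    # numbered output files
    ]


def audio_extraction_cmd(video_path: str, wav_path: str, sample_rate: int = 16000) -> list[str]:
    """Build an FFmpeg command that writes the audio track as mono 16-bit PCM WAV."""
    return [
        "ffmpeg", "-i", video_path,
        "-vn",                          # drop the video stream
        "-acodec", "pcm_s16le",         # 16-bit little-endian PCM
        "-ar", str(sample_rate),        # audio sample rate in Hz
        "-ac", "1",                     # one channel (mono)
        wav_path,
    ]
```

Either list can be passed to `subprocess.run(cmd, check=True)` on a machine where FFmpeg is installed.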

Longest Silence Duration

Detects the longest period of silence in an audio file.
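A simple way to picture this check is a scan for the longest run of near-zero samples. The sketch below assumes the audio has already been decoded into floating-point samples in the range -1.0 to 1.0; the amplitude threshold is an assumed value, and the production method may differ.

```python
def longest_silence(samples: list[float], sample_rate: int, threshold: float = 0.01) -> float:
    """Return the longest run, in seconds, of samples with |amplitude| below threshold."""
    longest = current = 0
    for sample in samples:
        if abs(sample) < threshold:
            current += 1
            longest = max(longest, current)
        else:
            current = 0  # a loud sample ends the current silent run
    return longest / sample_rate
```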

Audio Volume Analysis

Reports the average loudness of the video and the point at which the volume peaks.
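As a rough sketch of how these two numbers can be computed, the function below takes decoded samples in the range -1.0 to 1.0 and returns the average level in dBFS (from the RMS amplitude) plus the time of the loudest sample. The choice of RMS and dBFS is an assumption; the page does not state which loudness measure is used.

```python
import math


def volume_stats(samples: list[float], sample_rate: int) -> tuple[float, float]:
    """Return (average level in dBFS, time in seconds of the peak sample)."""
    if not samples:
        raise ValueError("no audio samples")
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    avg_db = 20 * math.log10(rms) if rms > 0 else float("-inf")
    # Index of the sample with the largest absolute amplitude (first one on ties).
    peak_index = max(range(len(samples)), key=lambda i: abs(samples[i]))
    return avg_db, peak_index / sample_rate
```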

Video and Audio Metadata Extraction

Gathers key information about the video and audio streams, such as the video duration, the frame rate (frames per second), and the total number of frames.
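FFmpeg ships with `ffprobe`, which can emit this information as JSON (for example via `ffprobe -v error -print_format json -show_format -show_streams input.mp4`). The sketch below parses such output; whether the project uses `ffprobe` specifically is an assumption, and the function name and returned keys are hypothetical.

```python
import json
from fractions import Fraction


def summarize_probe(probe_json: str) -> dict:
    """Extract duration, frame rate, and frame count from ffprobe JSON output."""
    probe = json.loads(probe_json)
    video = next(s for s in probe["streams"] if s["codec_type"] == "video")
    fps = Fraction(video["r_frame_rate"])          # reported as a string like "30000/1001"
    duration = float(probe["format"]["duration"])  # seconds, reported as a string
    if "nb_frames" in video:
        total_frames = int(video["nb_frames"])
    else:
        # Some containers omit nb_frames; estimate from duration and frame rate.
        total_frames = round(duration * fps)
    return {"duration_s": duration, "fps": float(fps), "total_frames": total_frames}
```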

Thumbnail Generation

It automatically captures clear images from the video, reduces their size, and turns them into attractive thumbnail images.

Blur Detection

Blur detection measures the sharpness of candidate thumbnails and filters out blurred frames, so that only the best ones are kept.
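The page does not say which blur metric is used. A widely used one is the variance of the Laplacian: sharp images have strong edges, so the Laplacian response varies a lot, while blurred images give a low variance. The pure-Python sketch below illustrates that technique on a grayscale image stored as a 2D list; the threshold is an assumed value that would need tuning on real footage.

```python
def laplacian_variance(gray: list[list[float]]) -> float:
    """Variance of the 4-neighbour Laplacian over the interior pixels."""
    height, width = len(gray), len(gray[0])
    responses = []
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            lap = (gray[y - 1][x] + gray[y + 1][x]
                   + gray[y][x - 1] + gray[y][x + 1]
                   - 4 * gray[y][x])
            responses.append(lap)
    if not responses:
        raise ValueError("image must be at least 3x3 pixels")
    mean = sum(responses) / len(responses)
    return sum((r - mean) ** 2 for r in responses) / len(responses)


def is_blurred(gray: list[list[float]], threshold: float = 100.0) -> bool:
    """Flag a frame as blurred when its Laplacian variance falls below the threshold."""
    return laplacian_variance(gray) < threshold
```

In practice the same measure is usually computed with an image library rather than nested loops; the loops here only keep the example dependency-free.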

Speech-to-Text Transcription

Using OpenAI Whisper, spoken words in the video are converted into text (with high accuracy and some built-in error correction).

Profanity Check

Identifies and lists any profane words in the transcription text.
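In its simplest form, a check like this matches transcript words against a blocklist. The sketch below shows that idea only; the blocklist contents are placeholders, and the production check may use a different method or a maintained word list.

```python
import re

# Placeholder blocklist for illustration; a real deployment would load a maintained list.
BLOCKLIST = {"darn", "heck"}


def find_profanity(transcript: str, blocklist: set[str] = BLOCKLIST) -> list[str]:
    """Return each blocklisted word found in the transcript, once, in order of appearance."""
    words = re.findall(r"[a-z']+", transcript.lower())
    found: list[str] = []
    for word in words:
        if word in blocklist and word not in found:
            found.append(word)
    return found
```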

Word Count Calculation

Counts and reports the number of words in the transcription text.

Text Summarisation

The detailed transcription text is converted into a summary.

License Plate Detection

Car license plates are detected in the video frames and read using OCR.

Words Per Minute Calculation

Based on the transcription text and the duration of the audio, it calculates the average number of words spoken per minute.
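The arithmetic behind this metric is the word count divided by the audio length in minutes. A minimal sketch, assuming words are separated by whitespace (the function names are illustrative):

```python
def word_count(transcript: str) -> int:
    """Count whitespace-separated words in the transcript."""
    return len(transcript.split())


def words_per_minute(transcript: str, audio_seconds: float) -> float:
    """Average speaking rate: words divided by audio duration in minutes."""
    if audio_seconds <= 0:
        raise ValueError("audio duration must be positive")
    return word_count(transcript) / (audio_seconds / 60)
```

For example, a 6-word transcript over 30 seconds of audio gives 6 / 0.5 = 12 words per minute.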

Keyword Detection

Identifies and lists keywords relevant to automobile services in the transcription text.

Subtitle Stream Check

Identifies and provides information about subtitle streams in a video file.

Car Dealer and Model Extraction

Uses GPT-4o to extract car dealer names and model information from transcriptions.

Muffled Audio Detection

Analysis of audio tracks for muffled quality issues.

Object Detection

Various components of a car are detected in the video, including tyres, brakes, and wiper blades.

Camera Stability

The motion speed and stability of the camera are checked for the recorded video.

Educate Customer

Assesses whether the technician clearly explains the issues and procedures to the customer.

Ready to Leverage Scalable AI for your Video Analytics?

PSSPL handles 1-50+ videos effortlessly with auto-scaling Python nodes, powered by our MLOps experts.

We architect future-proof solutions using proven MLOps best practices, slashing manual review time by 80%. Whether you are in automotive, security, media, or beyond, we'll rebuild your platform from the ground up.

Get Started Now!
