Scalable MLOps for AI-Powered Video Analytics

An AI-Driven Video Quality Scoring (VQS) Platform

Industry

Automotive & Industrial Operations

Location

United States

Company Size

Enterprise

Project Duration

6 Months

Services Provided

Technologies used

ReactJS

TypeScript

NestJS

Python

FFmpeg

OpenAI

AWS RDS

WebSocket

AWS SQS

AWS S3

AWS EC2

Challenges

Overcoming Monolithic Bottlenecks in Video Analytics

Our client, a leading video analytics provider, was struggling with a monolithic setup. Its main weakness was that it processed high-resolution videos simultaneously within a single application, so when users uploaded multiple files at once, the system often timed out or ran out of resources, making the experience frustrating and time-consuming. The client wanted to move from this rigid legacy system to a cloud-native platform that leverages artificial intelligence.

PSSPL delivered a cloud-native, scalable platform on Amazon EKS to transform a top video analytics provider’s monolithic system. Our AI-powered solution handles high-resolution video uploads efficiently, preventing timeouts and resource exhaustion during busy hours. We overcame the core challenge of our client’s legacy monolithic setup with a flexible, AI-driven cloud platform that scales seamlessly.

Manish Langa

AI Practice Head, PSSPL

How PSSPL Helped

Our team of MLOps developers used Amazon EKS to run isolated, scalable microservices across the ML lifecycle.

Raw video files are uploaded directly to Amazon S3, and the backend queues analysis jobs in Amazon SQS.
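
To illustrate the upload-then-queue flow, here is a minimal Python sketch of how a job message might be queued. The function names, the message fields (`bucket`, `key`, `checks`), and the queue URL are hypothetical; only `send_message(QueueUrl=..., MessageBody=...)` mirrors the boto3 SQS client call. The message carries a pointer to the S3 object rather than the video itself.

```python
import json


def build_job_message(bucket: str, key: str, checks: list[str]) -> str:
    """Serialize one analysis job as a JSON message body (field names are assumed)."""
    return json.dumps({"bucket": bucket, "key": key, "checks": checks}, sort_keys=True)


def enqueue_job(sqs_client, queue_url: str, bucket: str, key: str, checks: list[str]) -> dict:
    """Queue a job that references the uploaded S3 object.

    `sqs_client` is expected to behave like boto3.client("sqs").
    """
    body = build_job_message(bucket, key, checks)
    return sqs_client.send_message(QueueUrl=queue_url, MessageBody=body)
```

Because the client is passed in, the same function can be exercised with a stub in tests and with a real boto3 client in production.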

Feature extraction follows a directed acyclic graph (DAG), so each step runs only after the steps it depends on have finished.
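
As a rough sketch of what a DAG-ordered pipeline can look like, the example below uses Python's standard `graphlib` module (Python 3.9+). The step names and their dependencies are assumptions made for illustration; the actual graph used in the project is not described here.

```python
from graphlib import TopologicalSorter

# Hypothetical dependency map: each step lists the steps that must finish first.
PIPELINE = {
    "frame_extraction": set(),
    "audio_extraction": set(),
    "thumbnail_generation": {"frame_extraction"},
    "blur_detection": {"thumbnail_generation"},
    "transcription": {"audio_extraction"},
    "word_count": {"transcription"},
    "profanity_check": {"transcription"},
    "summarisation": {"transcription"},
}


def execution_batches(graph):
    """Return batches of steps; steps within one batch have no mutual dependencies."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()  # raises graphlib.CycleError if the graph contains a cycle
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        batches.append(ready)
        sorter.done(*ready)
    return batches
```

Steps in the same batch could run in parallel on separate workers, while later batches wait for their inputs.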

We leverage the best practices of MLOps, including auto-scaling, hybrid inference, and state management.

Frontend/Backend Nodes: ReactJS (UI) and NestJS (API orchestration) run on cost-effective CPU instances.

Python Listener Nodes (ML Workers): GPU-powered machines for faster video analysis and extraction of key details.

We scaled the Python node groups so that the solution can handle anywhere from 1 to 50+ videos.
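
One simple way to picture queue-driven scaling is a function that maps the number of queued videos to a worker count. The sketch below is purely illustrative: the per-worker capacity and the minimum and maximum node counts are assumed values, not the project's real scaling policy.

```python
import math


def desired_workers(queued_videos: int, videos_per_worker: int = 5,
                    min_workers: int = 1, max_workers: int = 10) -> int:
    """Return how many ML worker nodes to run for the current queue depth."""
    needed = math.ceil(queued_videos / videos_per_worker)
    # Clamp so the group never scales to zero or beyond its budgeted maximum.
    return max(min_workers, min(max_workers, needed))
```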

SQS decoupling prevents failures from propagating to the UI.

Automating these checks reduces manual review time by 80% while delivering VQS insights in a fraction of a second.

Features We Added

Frame Extraction

We used FFmpeg to extract frames from videos for further processing.

Audio Extraction

We use FFmpeg to extract audio tracks from video files for audio processing.
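
The two FFmpeg steps above could be driven from a Python worker roughly as follows. This is a minimal sketch: the function names, the one-frame-per-second sampling rate, and the 16 kHz mono WAV output are illustrative assumptions, not the project's actual settings.

```python
def frame_extraction_command(video_path: str, out_dir: str, fps: int = 1) -> list[str]:
    """Build an FFmpeg command that samples `fps` frames per second as PNG files."""
    return ["ffmpeg", "-i", video_path, "-vf", f"fps={fps}", f"{out_dir}/frame_%05d.png"]


def audio_extraction_command(video_path: str, wav_path: str,
                             sample_rate: int = 16000) -> list[str]:
    """Build an FFmpeg command that drops the video stream and writes mono PCM audio."""
    return ["ffmpeg", "-i", video_path, "-vn", "-ac", "1",
            "-ar", str(sample_rate), "-c:a", "pcm_s16le", wav_path]
```

Each list can be passed to `subprocess.run(cmd, check=True)` on a machine where FFmpeg is installed.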

Longest Silence Duration

Detects the longest period of silence in the audio track.
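
The source does not say how silence is measured. One plausible approach, assumed here, is FFmpeg's `silencedetect` audio filter (for example `ffmpeg -i audio.wav -af silencedetect=noise=-30dB:d=0.5 -f null -`), which logs a `silence_end` and `silence_duration` pair for each silent stretch. The sketch below parses such a log; the function name and the thresholds in the example command are illustrative.

```python
import re

# Matches lines such as: "[silencedetect @ 0x...] silence_end: 14.25 | silence_duration: 4.25"
_SILENCE_RE = re.compile(r"silence_end:\s*([\d.]+)\s*\|\s*silence_duration:\s*([\d.]+)")


def longest_silence(ffmpeg_log: str):
    """Return (start_seconds, duration_seconds) of the longest silence, or None."""
    best = None
    for match in _SILENCE_RE.finditer(ffmpeg_log):
        end, duration = float(match.group(1)), float(match.group(2))
        if best is None or duration > best[1]:
            best = (round(end - duration, 3), duration)
    return best
```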

Audio Volume Analysis

Reports the average loudness of the video and the point at which the volume peaks.
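
A minimal sketch of this kind of measurement, assuming the audio has already been decoded to samples normalized to the range -1.0 to 1.0. The function names, the RMS-in-dBFS loudness measure, and the one-second window are assumptions; the platform's actual metric is not specified.

```python
import math


def rms_dbfs(samples) -> float:
    """Root-mean-square level in dB relative to full scale (-inf for silence)."""
    if not samples:
        return float("-inf")
    mean_square = sum(s * s for s in samples) / len(samples)
    return 20 * math.log10(math.sqrt(mean_square)) if mean_square > 0 else float("-inf")


def volume_profile(samples, sample_rate: int, window_seconds: float = 1.0):
    """Return (average_dbfs, start_second_of_loudest_window)."""
    window = max(1, int(sample_rate * window_seconds))
    loudest_start, loudest_db = 0.0, float("-inf")
    for offset in range(0, len(samples), window):
        level = rms_dbfs(samples[offset:offset + window])
        if level > loudest_db:
            loudest_db, loudest_start = level, offset / sample_rate
    return rms_dbfs(samples), loudest_start
```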

Video and Audio Metadata Extraction

Gathers key information about the video and audio streams, such as the video duration, the frame rate (frames per second), and the total frame count.
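
As an illustration, metadata like this can be read from the JSON that `ffprobe -v error -print_format json -show_format -show_streams input.mp4` prints. The sketch below only parses that JSON; the function names and the shape of the returned dictionary are assumptions, and whether the project uses ffprobe is not stated.

```python
import json
from fractions import Fraction


def parse_rate(text: str) -> float:
    """Convert an ffprobe rate such as "30000/1001" to frames per second."""
    num, _, den = text.partition("/")
    denominator = int(den) if den else 1
    return float(Fraction(int(num), denominator)) if denominator else 0.0


def summarize_probe(probe_json: str) -> dict:
    """Pick duration, frame rate, and frame count out of ffprobe's JSON output."""
    data = json.loads(probe_json)
    video = next(
        (s for s in data.get("streams", []) if s.get("codec_type") == "video"), {}
    )
    frames = video.get("nb_frames", "")
    return {
        "duration_seconds": float(data.get("format", {}).get("duration", 0.0)),
        "fps": round(parse_rate(video.get("avg_frame_rate", "0/1")), 3),
        # nb_frames is absent for some containers, so fall back to None.
        "frame_count": int(frames) if frames.isdigit() else None,
    }
```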

Thumbnail Generation

It automatically captures clear images from the video, reduces their size, and turns them into attractive thumbnail images.

Blur Detection

Blur detection scores the quality of each thumbnail and flags blurred frames, so that you only get the sharpest images.
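
The source does not name the blur metric. A common technique, assumed here, is the variance of the Laplacian: sharp images have strong edges and therefore a high variance, while blurred ones score low. The pure-Python sketch below works on a grayscale image given as a 2D list; the function names and the threshold of 100 are illustrative, and a production system would more likely use an image library.

```python
def laplacian_variance(gray) -> float:
    """Variance of the 4-neighbour Laplacian over the interior pixels of a 2D list."""
    height, width = len(gray), len(gray[0])
    responses = [
        gray[y - 1][x] + gray[y + 1][x] + gray[y][x - 1] + gray[y][x + 1] - 4 * gray[y][x]
        for y in range(1, height - 1)
        for x in range(1, width - 1)
    ]
    if not responses:
        return 0.0
    mean = sum(responses) / len(responses)
    return sum((r - mean) ** 2 for r in responses) / len(responses)


def is_blurry(gray, threshold: float = 100.0) -> bool:
    """Treat a frame as blurry when its edge response varies less than the threshold."""
    return laplacian_variance(gray) < threshold
```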

Speech-to-Text Transcription

Using OpenAI Whisper, spoken words in the video are converted into text (with high accuracy and some built-in error correction).

Profanity Check

Identifies and lists any word that's profane in the transcription text.
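
A simple way to picture this check is a word-list lookup over the transcript. The sketch below is an assumption about the approach, not the platform's implementation; the function name is hypothetical and the blocklist is supplied by the caller.

```python
import re


def find_profanity(transcript: str, blocklist) -> list[str]:
    """Return each blocked word found in the transcript, once, in order of appearance."""
    blocked = {word.lower() for word in blocklist}
    found = []
    for word in re.findall(r"[a-z']+", transcript.lower()):
        if word in blocked and word not in found:
            found.append(word)
    return found
```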

Word Count Calculation

Counts and reports the number of words in the transcription text.

Text Summarisation

The detailed transcription text is converted into a summary.

Detection of License Plate

License plates are detected in the video frames and read using OCR.

Words Per Minute Calculation

Calculates the average number of words spoken per minute from the transcription text and the duration of the audio.
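
The arithmetic is straightforward: words per minute equals the word count multiplied by 60 and divided by the audio duration in seconds. A minimal sketch, with hypothetical function names and whitespace-based word splitting as an assumption:

```python
def word_count(transcript: str) -> int:
    """Count whitespace-separated words in the transcript."""
    return len(transcript.split())


def words_per_minute(transcript: str, audio_seconds: float) -> float:
    """Average speaking rate over the whole audio track."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return word_count(transcript) * 60.0 / audio_seconds
```

For example, a six-word transcript spoken over 30 seconds works out to 12 words per minute.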

Keyword Detection

Identifies and lists keywords relevant to automobile services in the transcription text.

Subtitle Stream Check

Identifies and provides information about subtitle streams in a video file.

Car Dealer and Model Extraction

Uses GPT-4o to extract car dealer names and model information from transcriptions.

Muffled Audio Detection

Analysis of audio tracks for muffled quality issues.

Object Detection

Various components of a car are detected in the video, including tyres, brakes, and wiper blades.

Camera Stability

The motion speed and stability of the camera are checked for the recorded video.

Educate Customer

Assesses whether the technician clearly explains the issues and procedures to the customer.

Ready to Leverage Scalable AI for your Video Analytics?

PSSPL handles 1-50+ videos effortlessly with auto-scaling Python nodes, powered by our MLOps experts.

We architect future-proof solutions using proven MLOps best practices, slashing manual review time by 80%. Whether you are in automotive, security, media, or beyond, we'll rebuild your platform from the ground up.
