Open source software is software whose source code anyone can inspect, modify, and enhance. Many institutions and individuals write open-source software, mainly for research or for free deployment. Most of these projects have only a few maintainers, so having more people writing and debugging the code helps a lot. This is where Google Summer of Code (GSoC) comes into the picture. It is a global, online program focused on bringing new contributors into open source software development. Organisations float projects for developers to take on over the summer, and Google mediates the process while also paying the contributors for their work.
The primary motivation behind this project is to enhance the media consumption experience by providing users with control over the content they see during commercial breaks. Advertisements can often be disruptive, irrelevant, or inappropriate for the viewing context.
By developing a system that detects and replaces commercials in real time, this project aims to ensure that viewers can enjoy a seamless, pleasant viewing experience tailored to their preferences. It will also serve as an educational resource on integrating AI and machine learning capabilities into embedded systems.
The project has three main components:
1. Creating a deep learning model
2. Implementing a GStreamer plugin
3. Optimizing for the BeagleBoard
BeagleBone® AI-64 provides a complete system for developing artificial intelligence (AI) and machine learning solutions, with the convenience and expandability of the BeagleBone® platform and on-board peripherals that let you start learning and building applications right away. The hardware supports real-time H.264 and H.265 (HEVC) video encoding and decoding; typical performance figures indicate it can handle a 1080p stream at 60 frames per second (fps), or multiple 720p streams simultaneously.
For the project, I’ll take media input from an HDMI source and, after processing it, display the output on a monitor using a miniDP-to-HDMI cable.
Using BeagleBone AI-64 hardware is a key aspect of this project, offering several benefits:
We can see from the above comparison that the TDA4VM delivers up to 60% better FPS/TOPS efficiency. In other words, up to 60% fewer TOPS are needed to run equivalent deep learning functions.
Deep learning is a subset of machine learning that involves neural networks with many layers, known as deep neural networks. Each layer of the network extracts increasingly abstract features from the input data, enabling the system to understand and generate intricate representations.
For the project, I’ll develop an audio-visual CNN model that takes Mel-spectrograms computed from the audio together with frames extracted from the video as its input. To extract features, I’ll utilize well-known CNN architectures like MobileNetV2, InceptionV3, and DenseNet169. Finally, I’ll merge the features obtained from the audio and visual branches, perform classification on the merged features, and train the model accordingly.
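To make the fusion idea concrete, here is a minimal PyTorch sketch of such a two-branch model. It is only an illustration of the architecture shape: the tiny convolutional stacks stand in for a real pretrained backbone like MobileNetV2, and all input sizes, layer widths, and the class name are my own assumptions.

```python
import torch
import torch.nn as nn

class AVCommercialNet(nn.Module):
    """Hypothetical simplified audio-visual fusion model: in the real
    project the visual branch would be a pretrained backbone such as
    MobileNetV2; a tiny CNN stands in for it here."""
    def __init__(self):
        super().__init__()
        # Visual branch over RGB video frames.
        self.visual = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        # Audio branch over 1-channel Mel-spectrograms.
        self.audio = nn.Sequential(
            nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        # Classifier over the concatenated (fused) features.
        self.head = nn.Sequential(
            nn.Linear(32 + 16, 64), nn.ReLU(),
            nn.Linear(64, 1), nn.Sigmoid(),  # probability of "commercial"
        )

    def forward(self, frame, mel):
        # Fuse the two feature vectors, then classify.
        feats = torch.cat([self.visual(frame), self.audio(mel)], dim=1)
        return self.head(feats)

model = AVCommercialNet()
frames = torch.randn(2, 3, 224, 224)  # batch of video frames
mels = torch.randn(2, 1, 128, 128)    # matching Mel-spectrograms
probs = model(frames, mels)           # shape (2, 1), values in [0, 1]
```

In the actual model, swapping the visual branch for a MobileNetV2/InceptionV3/DenseNet169 feature extractor only changes the size of the fused vector, not the overall structure.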
GStreamer is a powerful open-source multimedia framework that provides a pipeline-based system for constructing media applications. It allows developers to create, edit, and play various types of multimedia content, including audio, video, and streaming media. One of the key features of GStreamer is its plugin-based architecture, which allows developers to extend its functionality by adding new plugins for different media formats, codecs, and processing elements.
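As a loose illustration of the pipeline model, the sketch below mimics it in plain Python: each "element" is just a function, and data flows through them in order, much like buffers flow source ! transform ! sink in a GStreamer pipeline. The element names are invented for the analogy and are not GStreamer API.

```python
# Toy analogy of GStreamer's pipeline-of-elements model.
def source():
    # Stand-in for a source element such as v4l2src.
    return [0, 1, 2, 3]

def scale(buf):
    # Stand-in for a transform element such as videoscale.
    return [x * 2 for x in buf]

def sink(buf):
    # Stand-in for a sink element such as autovideosink.
    return f"displayed {len(buf)} samples"

def run_pipeline(*elements):
    # Push the buffer through each linked element, left to right.
    buf = elements[0]()
    for element in elements[1:]:
        buf = element(buf)
    return buf

result = run_pipeline(source, scale, sink)  # → "displayed 4 samples"
```

A custom plugin slots into this chain as one more transform element, seeing every buffer that passes through the pipeline.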
In the project, I will create a GStreamer plugin that receives input from an HDMI source and uses the trained model to infer whether the current frame belongs to a commercial. When a commercial is detected, the plugin will blur the video frames and replace the audio.
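A rough NumPy sketch of what the plugin's per-buffer processing could look like is shown below. The block-averaging "pixelation" blur, the threshold value, and replacing the audio with silence are all assumptions for illustration, not the settled design.

```python
import numpy as np

COMMERCIAL_THRESHOLD = 0.5  # assumed probability cutoff

def pixelate(frame, block=16):
    # Cheap blur: average each block x block tile, then upsample back.
    h, w = frame.shape[:2]
    h, w = h - h % block, w - w % block  # crop to block multiples
    small = frame[:h, :w].reshape(
        h // block, block, w // block, block, -1).mean(axis=(1, 3))
    return np.repeat(np.repeat(small, block, axis=0),
                     block, axis=1).astype(frame.dtype)

def process_buffer(frame, audio, p_commercial):
    # If the model flags the frame as part of a commercial, blur the
    # video and silence the audio; otherwise pass both through.
    if p_commercial >= COMMERCIAL_THRESHOLD:
        return pixelate(frame), np.zeros_like(audio)
    return frame, audio

frame = np.random.randint(0, 256, (224, 224, 3), dtype=np.uint8)
audio = np.random.randn(1024).astype(np.float32)

blurred, silent = process_buffer(frame, audio, p_commercial=0.9)
unchanged_frame, unchanged_audio = process_buffer(frame, audio, 0.1)
```

In the real plugin this logic would live in the element's chain function, operating on mapped GstBuffer data instead of standalone arrays.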
Thank you for reading the complete blog!!