How AI can reduce costs and increase efficiency in content creation | Industry trends


“Video content is a very powerful source of information, essential for analysis,” said TVCONAL founder Masoumeh Izadi.

Her company has developed a platform powered by artificial intelligence and machine learning that quickly analyzes sports footage, currently focusing on cricket.

“Data plays a central role in sports,” she said. “Coming together to create game semantics and metadata tags, it would make your content searchable, customizable, and you can extract value from it and monetize it.”

Read more Artificial Intelligence in Broadcasting

TVCONAL analyzed 168 cricket matches over an eight-month period. The end result in each case is a searchable content platform for game events such as specific batting or bowling techniques.

Every match “has to be cut into what we call units of analysis, which are different in every sport,” Izadi said. “For example, it could be shots or throws, or a delivery or a stroke – depending on the sport… At the heart of this learning model is learning to slice.”

The platform uses “machine learning and computer vision” algorithms to recognize these game events based solely on the content of the video itself.
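The "units of analysis" Izadi describes can be pictured as contiguous runs of per-frame model predictions grouped into events. The sketch below is purely illustrative, assuming a hypothetical frame classifier whose labels we group; it is not TVCONAL's actual pipeline or API.

```python
# Hypothetical sketch: slicing a match into "units of analysis" by grouping
# contiguous frame-level predictions into events. The label stream stands in
# for the output of a real computer-vision classifier.

def slice_into_units(frame_labels):
    """Group consecutive identical non-idle labels into (label, start, end) units."""
    units = []
    start = None
    current = None
    for i, label in enumerate(frame_labels):
        if label != current:
            if current not in (None, "idle"):
                units.append((current, start, i - 1))
            current, start = label, i
    if current not in (None, "idle"):
        units.append((current, start, len(frame_labels) - 1))
    return units

# Example: per-frame classifier output for a short clip
stream = ["idle", "idle", "delivery", "delivery", "stroke", "stroke", "idle"]
print(slice_into_units(stream))
# [('delivery', 2, 3), ('stroke', 4, 5)]
```

Once each unit is bounded in time, it can be tagged with metadata (delivery type, batter, outcome) and indexed for search.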

“The solution we’re proposing is to use video analytics, which at the moment is very, very advanced, to the point where you can understand and find out what’s in the content. In sports content, that would mean identifying and locating different types of objects, being able to track those objects, detecting players and types of player, tracking their movements, etc. – be it their position or key points on their body – just from the content of the video.”

The AI recognizes batting techniques based on player movement, or a six by analyzing when the ball crosses the boundary, with over 95% accuracy. TVCONAL has developed the system for premier multi-camera cricket productions, basic three-camera shoots and single-camera recordings, where accuracy with a more limited analysis model can reach over 99%.
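The boundary-detection idea can be sketched with simple geometry: given ball positions from a tracker, a six is a ball that clears the boundary radius without bouncing inside it. Everything here – the field radius, the tracker output format, the function name – is an assumption for illustration, not TVCONAL's model.

```python
import math

# Illustrative sketch (not TVCONAL's actual logic): flag a six when the tracked
# ball crosses the boundary radius without a detected bounce inside the rope.
BOUNDARY_RADIUS = 70.0  # metres, assumed field size

def is_six(ball_track):
    """ball_track: list of (x, y, bounced) samples from a hypothetical tracker,
    with (0, 0) at the centre of the pitch."""
    for x, y, bounced in ball_track:
        dist = math.hypot(x, y)
        if bounced and dist <= BOUNDARY_RADIUS:
            return False  # touched the ground inside the rope: not a six
        if dist > BOUNDARY_RADIUS:
            return True   # cleared the boundary on the full
    return False

print(is_six([(0, 0, False), (40, 10, False), (72, 12, False)]))  # True
print(is_six([(0, 0, False), (50, 5, True), (71, 6, False)]))     # False
```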

Technology that saves time and money

This form of AI-powered video analysis eliminates much of the cost and effort associated with tagging and categorizing content.

“Even in high-level productions, it is time-consuming, labor-intensive, and prone to human error. Also, for archive content accumulated over the years, it’s a nightmare to go through,” Izadi said.

There are many uses for this form of AI content analysis, and TVCONAL is currently focusing on applications within sport itself. It is in discussions with six professional cricket teams in Southeast Asia and 10 cricket centers of excellence in Malaysia about using its platform as a training tool.

Izadi calls it “a movement to democratize sport in the digital age”.

“[AI technology can] give any sports team the privilege of major sports teams. Saving costs and time on productions, empowering the production team in their operation and unleashing their ability to produce more engaging and interesting sports content.”

TVCONAL has used over “20,000 samples” to train its cricket machine learning algorithms and plans to branch out into other sports in the future. “We are looking at racquet sports, tennis and table tennis,” Izadi said.

She also demonstrated experimental auto-generated commentary at IBC2022, relying on analysis of game events with sportscaster-style speech automatically generated to match the on-screen action.

However, the true cutting edge of speech synthesis was demonstrated by Kiyoshi Kurihara of NHK, the Japan Broadcasting Corporation.

NHK currently uses text-to-speech to accompany news reports on the NHK 1 television channel, live online sports commentary, and weather forecasts on local radio stations. It provides professional line reading, via an “AI anchor”, but the actual input is typed or auto-generated rather than spoken by a real person.

Kiyoshi Kurihara explained that the process breaks written words down into graphemes, recognizes their respective sounds or phonemes, and then converts those phonemes into a waveform. The result is an audio clip of generated speech that can be streamed or synced with video.
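The grapheme-to-phoneme-to-waveform flow Kurihara describes can be sketched as a toy pipeline. The lookup table and sine-tone "synthesis" below are crude stand-ins for NHK's learned models – real systems predict phonemes and waveform samples with neural networks – but the data flow is the same.

```python
import math

# Toy illustration of the pipeline: graphemes -> phonemes -> waveform.
# The G2P table and sine tones are assumptions for demonstration only.
G2P = {"a": "AA", "i": "IY", "n": "N", "o": "OW"}  # toy grapheme-to-phoneme map

def graphemes_to_phonemes(text):
    """Map each known character to a phoneme symbol."""
    return [G2P[ch] for ch in text.lower() if ch in G2P]

def phonemes_to_waveform(phonemes, sample_rate=16000, dur=0.1):
    """Render each phoneme as a short sine tone (a real model predicts samples)."""
    pitch = {"AA": 220.0, "IY": 440.0, "N": 110.0, "OW": 330.0}
    samples = []
    for p in phonemes:
        f = pitch[p]
        n = int(sample_rate * dur)
        samples.extend(math.sin(2 * math.pi * f * t / sample_rate) for t in range(n))
    return samples

phonemes = graphemes_to_phonemes("nano")
audio = phonemes_to_waveform(phonemes)
print(phonemes, len(audio))  # ['N', 'AA', 'N', 'OW'] 6400
```

The resulting sample buffer is what would be encoded and streamed or synced with video.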

The AI model that can produce realistic line readings “requires 20 hours of voice data per person,” according to Kurihara. “It’s very hard work,” he added.

This is especially true with more traditional methods. “Training is difficult for two reasons. First, text-to-speech requires high-quality speech [recordings], and it requires four people: a presenter, an engineer, a director and a data annotator. Secondly, in terms of quality, it is important to produce high-quality speech synthesis, because the noise will also be regenerated,” he explained.

The “forward-thinking” component of NHK’s process removes much of this heavy workload. “This manual process can be eliminated,” Kurihara said, using a supervised learning approach.

Read more Broadcast Trends: From AI and D2C to NFT and 8K

The birth of an AI anchor

NHK’s AI Anchor model is created using actual radio speech that has already been broadcast. A difficulty here is that a radio program can feature a mixture of music and speech and, naturally, only speech can be used to build the text-to-speech profile.

“We developed a method to automatically retrieve clips from a sentence using state-of-the-art speech processing,” Kurihara said. The show is broken down into small segments, removing music and other superfluous sections, which become the units used to train the AI voice synthesizer.
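The retrieval step described above amounts to filtering a programme down to usable speech segments. The sketch below assumes a hypothetical speech/music classifier has already labelled the timeline; the segment format and threshold are illustrative, not NHK's real tooling.

```python
# Hedged sketch of the retrieval step: keep only speech segments from a
# broadcast and discard music and jingles. Labels stand in for the output
# of a real speech/music classifier.

def extract_speech_units(segments, min_len=1.0):
    """segments: list of (start_s, end_s, kind). Return speech clips long
    enough to serve as training units for the voice synthesizer."""
    return [
        (start, end) for start, end, kind in segments
        if kind == "speech" and (end - start) >= min_len
    ]

programme = [
    (0.0, 12.0, "music"),
    (12.0, 45.5, "speech"),
    (45.5, 46.0, "speech"),   # too short to be a useful training unit
    (46.0, 90.0, "music"),
    (90.0, 140.0, "speech"),
]
print(extract_speech_units(programme))  # [(12.0, 45.5), (90.0, 140.0)]
```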

“Waveform synthesis uses deep learning and converts the phoneme into a waveform,” Kurihara explained. And by automating some of the most difficult parts of the process, NHK is able to affordably and efficiently develop virtual radio and television presenters.

Local radio provides an excellent example of where this can not only reduce costs, but increase the usefulness of broadcasting. “There are 54 [radio] local stations in the area and it was expensive for them to provide local weather,” Kurihara said. NHK automatically generates scripts using weather report information for each local station, then uses its TTS (text-to-speech) system to create a bespoke weather report ready for broadcast.
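The script-generation step can be pictured as filling a per-station template from structured forecast data before handing the text to the TTS engine. The template wording, station name and data fields below are assumptions for illustration, not NHK's actual format.

```python
# Illustrative sketch of NHK-style automation: build a broadcast-ready script
# from structured forecast data, then pass the string to a TTS system.
# All field names and wording are hypothetical.

TEMPLATE = ("Good morning from {station}. Today in {city}: {condition}, "
            "with a high of {high} degrees and a {rain}% chance of rain.")

def build_weather_script(station, forecast):
    """Fill the station template from a forecast record."""
    return TEMPLATE.format(station=station, **forecast)

forecast = {"city": "Sendai", "condition": "partly cloudy", "high": 18, "rain": 30}
script = build_weather_script("NHK Sendai", forecast)
print(script)
# In production, this string would be fed to the text-to-speech system.
```

The same pattern – structured data feed in, generated script out, TTS last – is what makes the Olympic commentary described below possible at scale.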

NHK also used similar techniques for mass sporting events. “In 2018 and 2021, we provided live sports commentary using TTS during the Olympics and Paralympics and distributed them over the internet,” Kurihara said. The team used metadata from the official Olympic data feed to automatically generate a script for each feed.

This echoes the metadata that TVCONAL generates in its analysis of cricket footage, demonstrating how AI technologies can often work hand in hand in this area.

NHK’s Kiyoshi Kurihara and TVCONAL founder Masoumeh Izadi spoke at an IBC2022 session titled Tech Papers: How AI Advances Media Production, hosted by Nick Lodge, director of Logical Media. For more content from the IBC2022 show, check out the latest IBC2022 video and full coverage on 365.


Jenny T. Curlee