Generative Artificial Intelligence (Gen AI) is being hailed as a breakthrough with the potential to transform the way we work in a rapidly changing technological landscape. As the term implies, Gen AI spans a whole range of modalities, from text and images to video, speech and robotics. In particular, technologies such as ChatGPT and DALL-E have democratised Gen AI, allowing any user to generate unique text and image content based on their own inputs.
Specifically, Gen AI can produce original content that mimics human creativity, including literature, images, music and videos, by utilising sophisticated algorithms and neural networks. Understanding and harnessing the potential of Gen AI can open new possibilities for productivity, innovation and strategic growth as companies work to remain competitive and inventive. This article examines the revolutionary possibilities and risks presented by Gen AI, the latest developments in the field, and how KLASS is well positioned to leverage Gen AI to address the growing risks and opportunities associated with the evolving digital, socio-economic and environmental landscapes.
Revolutionary possibilities and risks of Gen AI
Multi-modal Gen AI integrates different types of data to create more comprehensive and versatile models. Training models to understand and generate content across multiple modalities enhances their ability to perform complex tasks; for example, aligning text descriptions with corresponding images or videos improves a model's ability to generate coherent and contextually accurate content.
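The alignment idea can be illustrated with a minimal sketch: given text and image embeddings produced by jointly trained encoders (as in CLIP-style models), cosine similarity in the shared space tells us which caption best matches which image. The toy embeddings below are illustrative only, standing in for real encoder outputs.

```python
import numpy as np

def cosine_similarity_matrix(text_emb, image_emb):
    """Pairwise cosine similarity between text and image embeddings."""
    t = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    i = image_emb / np.linalg.norm(image_emb, axis=1, keepdims=True)
    return t @ i.T

def best_caption_for_each_image(text_emb, image_emb):
    """Index of the caption whose embedding best matches each image."""
    sim = cosine_similarity_matrix(text_emb, image_emb)
    return sim.argmax(axis=0)

# Toy embeddings: in practice these come from a jointly trained
# text encoder and image encoder operating in a shared space.
texts = np.array([[1.0, 0.0, 0.1],
                  [0.0, 1.0, 0.1]])
images = np.array([[0.9, 0.1, 0.0],
                   [0.1, 0.9, 0.0]])
print(best_caption_for_each_image(texts, images))  # → [0 1]
```

Contrastive training pushes matched text-image pairs together and mismatched pairs apart in this shared space, which is what makes the simple similarity lookup above meaningful.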
With these possibilities come risks that affect public safety. Malicious actors have used Gen AI to produce audio and visual deepfakes that spread false information about political figures. For example, Senior Minister Lee Hsien Loong was recently the subject of a series of deepfake videos that appear to show him discussing US-China relations in the context of tensions in the South China Sea, as well as the two superpowers' relationship with the Philippines (Channel NewsAsia, 2024). By manipulating audio and visual inputs, these deepfakes enable misinformation campaigns with grave consequences for public safety.
Latest research offerings
Fuelling multi-modal Gen AI use cases is continued innovation in state-of-the-art Large Language Models (LLMs). In the past year, research breakthroughs have applied LLMs to enhance natural language understanding and generation for agents within multi-agent systems, and frameworks have since been created that enable multiple autonomous agents to interact and collaborate within a shared environment. This has been extended beyond text: LLMs are now being used to assist in robotics task planning, converting high-level instructions into structured input that guides robot actions (Gao et al., 2024; Kannan, Venkatesh, & Min, 2023). This emergent area of research has also driven new discussions around embodied AI robotics systems, an area which KLASS is well positioned to explore and productionise.
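One common pattern in LLM-based task planning is to prompt the model to emit a plan in a constrained format, then validate each step against the robot's actual skill set before execution. The sketch below is illustrative only: the skill names, the one-action-per-line format and the canned LLM output are assumptions for demonstration, not part of the cited systems.

```python
import re

# Hypothetical skill set of the robot; names are illustrative only.
KNOWN_SKILLS = {"navigate_to", "pick_up", "place_on"}

def parse_llm_plan(plan_text):
    """Convert an LLM plan written one step per line, e.g. 'pick_up(cup)',
    into (skill, argument) tuples, dropping steps the robot cannot execute."""
    actions = []
    for line in plan_text.strip().splitlines():
        m = re.match(r"\s*(\w+)\((\w+)\)\s*$", line)
        if m and m.group(1) in KNOWN_SKILLS:
            actions.append((m.group(1), m.group(2)))
    return actions

# A canned response standing in for a real LLM call.
llm_output = """
navigate_to(kitchen)
pick_up(cup)
fly_to(roof)
place_on(table)
"""
print(parse_llm_plan(llm_output))
# → [('navigate_to', 'kitchen'), ('pick_up', 'cup'), ('place_on', 'table')]
```

Validating against a known skill set is the key safety step: the hallucinated `fly_to` action is silently discarded rather than passed to the robot.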
KLASS’ unique offerings in applied R&D
At KLASS, our teams across Speech Analytics, Linguistics, Video Analytics, Internet of Things and Cloud have the technical and research expertise to leverage current trends in Gen AI, particularly multi-modality. Refer to our other article about the different innovations that KLASS has championed earlier this year (Klasses.com.sg, 2024).
With the breadth of technical expertise across our teams, we are able to work with public safety professionals to deploy multi-agent large language model (LLM) systems that interact across platforms, such as centralised sensemaking solutions involving video and speech analytics across multiple data sources. KLASS is also spearheading applied research into spoofing detection solutions that discriminate between bona fide and deepfake speech utterances, a line of work that counters misinformation campaigns (Delgado et al., 2024). Many of these solutions can be deployed on-premises to meet data governance requirements. We are also able to conduct research into explainable AI methods and to explore guardrails against LLM hallucination. By closely monitoring technology advancements in key areas, such as efficiency gains from small language models, quantisation and non-GPU computation (see ThirdAI labs), we constantly seek to identify gaps and tailor solutions for our customer groups.
Allen Hum, Head of Engineering for Cloud and AI solutions, shared his insights on the subject: “We have already observed a shift from traditional modalities classification and prediction tasks towards the use of natural language. With the emergence of large language models (LLMs), we can expect more multi-modalities event prediction tasks leveraging the strengths of LLMs, such as zero-shot capabilities and scalability. LLMs have demonstrated remarkable abilities in retrieving out-of-domain information and have recently made progress in reasoning tasks across various domains. However, LLMs are not without their flaws, such as hallucination, susceptibility to adversarial attacks on prompts, and inconsistent reasoning capabilities, which can result in undesirable outputs, harmful content, and irrational responses. These challenges open up new research areas to improve LLMs. In addition, research on multiple AI agents and embodied AI within the LLM context is also gaining traction.”
The potential of Gen AI to revolutionise industries and improve public safety is becoming increasingly apparent. At KLASS, we are dedicated to harnessing this technology to develop cutting-edge solutions to public safety problems. We invite you to join us on this journey of innovation and exploration. KLASS is ready to work with you to create meaningful change, whether you are a researcher excited about expanding AI capabilities, a corporate leader trying to stay ahead of the curve, or a public safety expert looking for practical solutions. Together, let us maximise the promise of generative AI and build a safer, more effective future. To find out how we can help you harness the potential of Gen AI, get in touch with us today.
References:
Channel NewsAsia. (2024). Senior Minister Lee Hsien Loong warns of malicious deepfake videos targeting foreign relations leaders. Channel NewsAsia. Retrieved August 6, 2024, from https://www.channelnewsasia.com/singapore/senior-minister-lee-hsien-loong-warns-malicious-deepfake-videos-foreign-relations-leaders-4440386
Delgado, H., Evans, N., Jung, J. W., Kinnunen, T., Kukanov, I., Lee, K. A., & Yamagishi, J. (2024). ASVspoof 5 Evaluation Plan.
Gao, J., Sarkar, B., Xia, F., Xiao, T., Wu, J., Ichter, B., Majumdar, A., & Sadigh, D. (2024). Physically grounded vision-language models for robotic manipulation. In IEEE International Conference on Robotics and Automation (ICRA). IEEE.
Kannan, S. S., Venkatesh, V. L., & Min, B. C. (2023). Smart-llm: Smart multi-agent robot task planning using large language models. arXiv preprint arXiv:2309.10062.
Klasses.com.sg (2024). Championing Innovation – KLASS. Retrieved August 6, 2024, from https://klasses.com.sg/events/championing-innovation-klass/