
Artificial intelligence has spread across the world and accelerated the advancement of many industries, including finance, marketing and agriculture. Its applications in the public domain have also been widely studied, contributing to every nation’s efforts to build a safer and better living environment for everyone.

Especially in this challenging time, when strict measures have to be put in place to curb the spread of COVID-19, adherence to the rules that safeguard public health is critical to a stable recovery for every country. Monitoring and ensuring a safe distance between persons reduces the risk of community cases. However, it is inefficient to station ambassadors at every location to enforce safe distancing all the time. A social distancing monitoring system can instead issue timely warnings where necessary. A key challenge in building such a system is accurately measuring the interpersonal distance between persons at arbitrary locations in an uncalibrated camera view.

Fig. 1: Depth Estimation Applied on Social Distancing

One method is to first employ depth estimation to calculate each person’s distance from the camera. The pinhole camera model is then used to convert each person’s pixel coordinates in the camera frame to real-world coordinates, and these coordinates are used to calculate the interpersonal distance between them (Fig. 1). However, this method requires a significant amount of time to train the deep neural network, and the trained model is situation-specific.
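The back-projection step described above can be sketched as follows. This is a minimal illustration rather than the system's actual implementation: the camera intrinsics (`FX`, `FY`, `CX`, `CY`), the pixel positions, the depth values and the 1.0 m threshold are all hypothetical, and in practice the depths would come from the depth estimation network.

```python
import math

def pixel_to_camera(u, v, depth, fx, fy, cx, cy):
    """Back-project pixel (u, v) at a given depth (metres) into 3D camera
    coordinates using the pinhole camera model."""
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return (x, y, depth)

def interpersonal_distance(p1, p2):
    """Euclidean distance between two 3D points, in metres."""
    return math.dist(p1, p2)

# Hypothetical intrinsics for a 1920x1080 camera (assumed, not from the article).
FX = FY = 1000.0        # focal lengths in pixels
CX, CY = 960.0, 540.0   # principal point in pixels

# Hypothetical detections: pixel position of each person plus estimated depth.
person_a = pixel_to_camera(800, 600, 5.0, FX, FY, CX, CY)
person_b = pixel_to_camera(1100, 620, 5.5, FX, FY, CX, CY)

distance_m = interpersonal_distance(person_a, person_b)
too_close = distance_m < 1.0  # assumed safe-distancing threshold
```

Because the result depends directly on the intrinsics and on the estimated depth, errors in either one carry straight through to the measured distance, which is one reason an uncalibrated camera view makes the problem hard.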

While patrolling and current surveillance technology are useful for observing unexpected behaviours, it is not feasible to have staff make rounds and watch every camera around the clock. In such cases, surveillance analytics can be employed to monitor every corner for unusual activities and articles, improving security personnel’s response time.


Fig. 2.1: Original Video of Person Snatching Bag

Fig. 2.2: Reconstructed Video of Person Snatching Bag

Fig. 2.3: Normalcy Graph

One approach is to implement an autoencoder, which comprises an encoder that reduces the input to a lower-dimensional space and a decoder that reconstructs the frames from the encoded representations (Fig. 2.1, Fig. 2.2 and Fig. 2.3). However, a key challenge facing its nationwide deployment is defining what constitutes normal and abnormal conduct. There is often a fine line between normal and anomalous behaviour, depending on the expected situation at the given location and time. Furthermore, current studies have only been able to detect abnormalities relative to normalcy as defined by past data, resulting in a high number of false positives when a situation is different but still within expectation (e.g., a mall on Black Friday as opposed to a mall on a normal day).
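One common way to turn reconstruction quality into a per-frame normalcy score is to min-max normalise the reconstruction error over a clip, so that poorly reconstructed frames score close to 0. The sketch below assumes this scoring scheme; the article does not state how its normalcy graph is computed, and the error values and the `THRESHOLD` of 0.5 are made up for illustration.

```python
def reconstruction_error(frame, reconstruction):
    """Mean squared error between a flattened frame and its reconstruction."""
    return sum((a - b) ** 2 for a, b in zip(frame, reconstruction)) / len(frame)

def normalcy_scores(errors):
    """Map per-frame errors to scores in [0, 1]; 1 is the most 'normal' frame."""
    lo, hi = min(errors), max(errors)
    if hi == lo:
        return [1.0] * len(errors)  # all frames reconstructed equally well
    return [1.0 - (e - lo) / (hi - lo) for e in errors]

# Tiny worked example of the error for one two-pixel frame.
example_error = reconstruction_error([0.0, 1.0], [0.0, 0.5])

# Made-up per-frame errors for a short clip: frames 3 and 4 reconstruct poorly.
errors = [0.02, 0.03, 0.02, 0.40, 0.35, 0.03]
scores = normalcy_scores(errors)

THRESHOLD = 0.5  # assumed cut-off; would need tuning per location and time
anomalous_frames = [i for i, s in enumerate(scores) if s < THRESHOLD]
```

Since the score is relative to what the autoencoder has learned to reconstruct, an unusual but expected scene would also score low under this scheme, which is the false-positive problem described above.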

Aside from imaging-related studies, speech analytics is also widely researched and contributes significantly to the nation’s well-being. In emergencies, deciphering the key content of what is said can shed light on the situation and help mitigate conflicts arising from the emergency. However, during such situations, parties tend to be agitated or in a state of panic, resulting in overlapping speech streams. Speech separation systems are used to separate the speech of multiple speakers who are talking simultaneously.

Fig. 3.1: Original Audio with Overlapping Speakers

Fig. 3.2: Speaker 1 Content

Fig. 3.3: Speaker 2 Content

One of the state-of-the-art methods is the Dual-Path RNN, an extension of the Convolutional Time-domain Audio Separation Network (Conv-TasNet) that employs additional bidirectional long short-term memory networks (LSTMs) to better capture the temporal context of speech data (Fig. 3.1, Fig. 3.2 and Fig. 3.3). A long speech mixture is first truncated into short segments for separation. The x-vectors, which represent the speakers’ characteristics, are extracted from the separated short segments. These x-vectors are then clustered, and the separated short segments are grouped to form each speaker’s content. A significant obstacle to the accuracy of the network is the presence of noise and inconsistent speech patterns. During emergencies, the environment is generally chaotic, with several speakers in both the foreground and the background. On top of achieving a low word error rate when discerning foreground speech from background noise, the model should ideally also detect the number of speakers in a clip automatically. At present, speech separation models are limited to a known number of speakers, which limits their effectiveness in practice.
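The grouping step can be illustrated with a simplified stand-in for clustering: the embeddings of the first segment act as speaker anchors, and each later segment's separated streams are matched to the anchors by cosine similarity. This is a sketch under stated assumptions, not the method's actual clustering; the three-dimensional vectors are toy values standing in for real x-vectors, and `group_segments` is a hypothetical helper.

```python
import math
from itertools import permutations

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def group_segments(segment_embeddings):
    """segment_embeddings[t][k] is the embedding of separated stream k in
    segment t. Returns one tuple per segment, where entry s is the index of
    the stream assigned to speaker s."""
    anchors = segment_embeddings[0]  # first segment defines the speakers
    n = len(anchors)
    assignment = []
    for streams in segment_embeddings:
        # Try every stream-to-speaker ordering and keep the most similar one.
        best = max(
            permutations(range(n)),
            key=lambda perm: sum(cosine(anchors[s], streams[perm[s]]) for s in range(n)),
        )
        assignment.append(best)
    return assignment

# Toy embeddings for three segments; the streams are swapped in segment 1.
toy = [
    [(1.0, 0.1, 0.0), (0.0, 0.9, 0.2)],
    [(0.1, 1.0, 0.1), (0.9, 0.0, 0.1)],
    [(0.8, 0.2, 0.0), (0.1, 0.8, 0.3)],
]
order = group_segments(toy)
```

Note that the sketch fixes the number of speakers from the first segment, which mirrors the known-speaker-count limitation described above.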

A large part of our national well-being includes quick and accurate emergency support, and mobility of surveillance technology is essential in these circumstances. When fitted with an array of sensors, unmanned ground vehicles (UGVs) can survey complex terrains that are dangerous for human disaster relief personnel, such as areas struck by natural disasters. A key feature of such autonomous systems is an effective navigation stack, as a false move can have detrimental effects, particularly when lives are at stake. Developing a robust navigation stack for UGVs is challenging because it requires localization and navigation in a three-dimensional space without compromising the system’s operating efficiency. A navigation stack that combines three-dimensional localization with two-and-a-half-dimensional route planning is one strategy for reducing computational demands while retaining a three-dimensional perspective that the UGV can use to navigate uneven terrain. The two-and-a-half-dimensional path planning is achieved by embedding the three-dimensional terrain in a two-dimensional space and then planning the path for the UGV to take. This allows the UGV to plan its route on a two-dimensional plane while accounting for topographical features. While a two-and-a-half-dimensional navigation stack does reduce the computational demands, it does not weigh the elevation of the ground when planning the route, because each mapped cell is taken to be either passable or obstructed. As a result, it may miss the quickest path.
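A minimal sketch of this idea, under assumptions the article does not specify: the terrain is given as a small elevation grid, a cell is marked obstructed when it rises more than `max_step` above its lowest neighbour, and the route is found with a breadth-first search over the resulting passable/obstructed grid. The grid values, the step rule and the function names are all hypothetical.

```python
from collections import deque

NEIGHBOURS = ((1, 0), (-1, 0), (0, 1), (0, -1))

def to_occupancy(elevation, max_step):
    """Collapse an elevation grid (metres) into passable (True) / obstructed
    (False) cells. Assumed rule: a cell is obstructed if it rises more than
    max_step above its lowest 4-neighbour."""
    rows, cols = len(elevation), len(elevation[0])
    passable = []
    for r in range(rows):
        row = []
        for c in range(cols):
            lowest = min(
                elevation[r + dr][c + dc]
                for dr, dc in NEIGHBOURS
                if 0 <= r + dr < rows and 0 <= c + dc < cols
            )
            row.append(elevation[r][c] - lowest <= max_step)
        passable.append(row)
    return passable

def plan_path(passable, start, goal):
    """Breadth-first search on the 2D grid; returns a list of cells or None."""
    rows, cols = len(passable), len(passable[0])
    parent = {start: None}
    queue = deque([start])
    while queue:
        cell = queue.popleft()
        if cell == goal:
            path = []
            while cell is not None:
                path.append(cell)
                cell = parent[cell]
            return path[::-1]
        r, c = cell
        for dr, dc in NEIGHBOURS:
            nxt = (r + dr, c + dc)
            if (0 <= nxt[0] < rows and 0 <= nxt[1] < cols
                    and passable[nxt[0]][nxt[1]] and nxt not in parent):
                parent[nxt] = cell
                queue.append(nxt)
    return None

# Made-up terrain: a 1 m high block in the middle with a gentle 0.1 m bump.
elevation = [
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 1.0, 0.0],
    [0.0, 0.1, 1.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
]
passable = to_occupancy(elevation, max_step=0.3)
path = plan_path(passable, (0, 0), (3, 3))
```

Once the grid is binarised, the planner treats the 0.1 m bump exactly like flat ground. This illustrates the trade-off noted above: elevation no longer influences the cost of a route.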

There are many other ways to apply machine learning for public wellness; the methods and applications listed above are just the tip of the iceberg. Research engineers are constantly innovating and exploring new possibilities to better safeguard the public.