Making our roads safer with AI

By applying deep learning and computer vision methods, we can estimate traffic safety indicators in interactions of autonomous vehicles with different classes of road users. Aligning video analytics and AI to infer traffic safety can help make our roads safer in the future.

Traffic safety assessment with video analytics

Although there is a lot of buzz surrounding autonomous vehicles, we are still a few years away from seeing them on our streets as part of our day-to-day routine. It’s not only a race for technology but also a race for the most trustworthy and safe car.


Companies and countries are investing in research and development for the autonomous vehicle industry. The European Union has several ongoing projects to identify the technologies that will best support this new wave of mobility.


The biggest driver for our initiative is a co-funded European project called 5G-Mobix. It aims to showcase the added value of 5G technology for advanced connected autonomous mobility.


Its goal is to take automated driving to a higher level of automation, mainly in cross-border environments. This is relevant because a vehicle crossing a border needs a network handover between countries.


The main purpose of this project is to investigate the impact of network handover and potential service disruption on traffic safety parameters, for example to check whether the likelihood of a collision increases.


To perform this traffic safety assessment, we will acquire video footage of the autonomous vehicles and other road users with a drone, which gives us a top-down view of each use case.

Taking it to the field with real traffic scenarios

The first use case to which we intend to apply our methodology is a Lane Merge Scenario, where an autonomous vehicle performs a lane merge maneuver into a lane with both connected and unconnected vehicles.


The second one is an Automated Overtaking Scenario, where an autonomous vehicle will perform an overtaking maneuver, once again on a highway and in the same conditions as the previous scenario.


In the third use case we have an Autonomous Shuttle and Vulnerable Road User Interaction Scenario, evaluated in an urban environment with pedestrians crossing the trajectory of an autonomous shuttle.

Are the autonomous vehicles driving safely?

We are interested in measuring three time-based traffic safety indicators.


Time-To-Collision is one of the most frequently used. Many driving assistants and collision avoidance systems rely on it. It estimates the time remaining before two vehicles would collide if they kept their current speed and trajectory.
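As a minimal sketch of the idea, assuming the simplest car-following case (two vehicles in the same lane, each holding a constant speed), Time-To-Collision is the gap between the vehicles divided by the speed at which that gap is closing. The function and parameter names below are illustrative and are not taken from the project's code.

```python
from typing import Optional


def time_to_collision(gap_m: float,
                      follower_speed_mps: float,
                      leader_speed_mps: float) -> Optional[float]:
    """Seconds until the follower reaches the leader at constant speeds.

    Returns None when the follower is not closing the gap, because no
    collision is predicted in that case.
    """
    closing_speed = follower_speed_mps - leader_speed_mps
    if closing_speed <= 0:
        return None
    return gap_m / closing_speed


# Hypothetical values: 30 m gap, follower at 25 m/s, leader at 20 m/s.
print(time_to_collision(30.0, 25.0, 20.0))  # 6.0
```

A lower value means less time to react, so in practice the result is compared against a chosen threshold.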


We will also be measuring the Post-Encroachment-Time. This indicator measures the time difference between a vehicle or pedestrian leaving an area of potential collision, and a conflicting vehicle entering it.
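Because the footage is a video, the two moments can be expressed as frame numbers. The sketch below assumes we already know the frame in which the first road user leaves the conflict area and the frame in which the second one enters it; how those frames are found is not shown, and all names are hypothetical.

```python
def post_encroachment_time(exit_frame: int, entry_frame: int, fps: float) -> float:
    """Seconds between the first road user leaving the conflict area
    (exit_frame) and the conflicting vehicle entering it (entry_frame)."""
    return (entry_frame - exit_frame) / fps


# Hypothetical values: a pedestrian leaves the area at frame 300 and the
# shuttle enters it at frame 345, in a 30 frames-per-second video.
print(post_encroachment_time(300, 345, 30.0))  # 1.5
```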


The last indicator is called Time Headway, which measures the time gap between two consecutive vehicles passing the same region of the road.
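Given the times at which successive vehicles pass the same reference point, the headways are simply the differences between neighbouring times. This is a small sketch with made-up timestamps; the function name is an assumption.

```python
from typing import List, Sequence


def time_headways(crossing_times_s: Sequence[float]) -> List[float]:
    """Time gaps in seconds between each pair of consecutive vehicles."""
    times = sorted(crossing_times_s)
    return [later - earlier for earlier, later in zip(times, times[1:])]


# Hypothetical crossing times for three vehicles, in seconds.
print(time_headways([0.0, 2.5, 4.0]))  # [2.5, 1.5]
```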

Our first challenge – using AI for object detection

A video is basically a sequence of static images. In video analytics, we first want to split the video into its individual frames and perform additional computations on each one.

Since we want to evaluate different interactions between vehicles and/or pedestrians, our first challenge is to detect them in the video. And this is where AI comes into play.


The kind of AI we use here, a neural network, is loosely inspired by the way humans learn. If we want it to learn how to detect our target classes (vehicles and people), we first need to teach it with examples.


To do this we give it thousands of images similar to the ones we’ll be acquiring with the drone (of roads from a top-down perspective, with and without vehicles and people) where every single vehicle and pedestrian is already labeled.


This way, it learns a set of rules, based on the features of the images, that help it identify our target classes.


After this training, when we give it a new image that it has never seen before, it applies those learned rules to identify and localize our target classes in the image.


To solve this challenge, we are using YOLOv4, a state-of-the-art convolutional neural network designed for object detection.

Piecing the information together

After detecting the vehicles and pedestrians, we need to know the trajectory of these road users in the video (to know, for example, if they are at risk of collision).


To do this, we track them throughout the consecutive frames of the video, giving each road user a unique ID, so that we can get the direction of their movement.
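The article does not say which tracking algorithm is used. Purely as an illustration of the idea, the sketch below assumes a very simple approach: each detection in the new frame is matched to the nearest centre point from the previous frame, and any detection without a close match gets a new ID. The class name and the distance threshold are hypothetical.

```python
import math
from itertools import count
from typing import Dict, Iterable, Tuple

Point = Tuple[float, float]


class NearestCentroidTracker:
    """Greedy nearest-neighbour matching of bounding-box centres.

    Simplification: a track that is not matched in a frame is dropped
    immediately instead of being kept alive through missed detections.
    """

    def __init__(self, max_distance_px: float = 50.0) -> None:
        self.max_distance_px = max_distance_px
        self.tracks: Dict[int, Point] = {}
        self._next_id = count(1)

    def update(self, centroids: Iterable[Point]) -> Dict[int, Point]:
        unmatched = dict(self.tracks)  # tracks still available for matching
        assigned: Dict[int, Point] = {}
        for c in centroids:
            best_id, best_dist = None, self.max_distance_px
            for track_id, prev in unmatched.items():
                dist = math.hypot(c[0] - prev[0], c[1] - prev[1])
                if dist <= best_dist:
                    best_id, best_dist = track_id, dist
            if best_id is None:
                best_id = next(self._next_id)  # new road user
            else:
                del unmatched[best_id]
            assigned[best_id] = c
        self.tracks = assigned
        return assigned


tracker = NearestCentroidTracker()
frame_1 = tracker.update([(100, 100), (300, 100)])
frame_2 = tracker.update([(305, 100), (110, 100)])  # detections in a different order
# Movement of road user 1 between the two frames, in pixels.
dx = frame_2[1][0] - frame_1[1][0]
dy = frame_2[1][1] - frame_1[1][1]
print(frame_2, (dx, dy))
```

Each road user keeps its ID even though the detections arrive in a different order, and the difference between its positions in consecutive frames gives the direction of movement.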


We also need to know the distance between them. However, we are dealing with a digital video, which is made of pixels.


To obtain a real-life distance in meters from pixels in a digital video, we need a conversion factor, calculated based on details from the camera and video, as well as the altitude of the drone.
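One common way to obtain such a factor is the ground sampling distance, which assumes the camera points straight down over flat ground. The article does not state the exact formula used in the project, so the sketch below is an assumption, and the camera values are hypothetical.

```python
def meters_per_pixel(sensor_width_mm: float,
                     focal_length_mm: float,
                     altitude_m: float,
                     image_width_px: int) -> float:
    """Ground sampling distance: metres on the ground covered by one pixel.

    Assumes a camera pointing straight down over flat ground.
    """
    return (sensor_width_mm * altitude_m) / (focal_length_mm * image_width_px)


def pixels_to_meters(distance_px: float, factor_m_per_px: float) -> float:
    """Convert a distance measured in the image to metres on the ground."""
    return distance_px * factor_m_per_px


# Hypothetical camera: 12.8 mm wide sensor, 8 mm lens, 4000 px wide frame,
# flying at an altitude of 100 m.
factor = meters_per_pixel(12.8, 8.0, 100.0, 4000)
print(factor)                         # about 0.04 m per pixel
print(pixels_to_meters(250, factor))  # about 10 m
```

Flying higher or using a lower-resolution video makes each pixel cover more ground, which reduces the precision of every distance derived from it.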


By knowing the frame rate of the video, we can easily calculate how much time passes between two consecutive frames. And with distance over time we can determine the speed of our target road users. We now have all the necessary data to calculate the safety metrics.
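The calculation can be sketched as follows, assuming we already have a road user's position in two frames and a pixel-to-meter conversion factor. All names and values are illustrative.

```python
import math
from typing import Tuple


def speed_mps(prev_px: Tuple[float, float],
              curr_px: Tuple[float, float],
              meters_per_pixel: float,
              fps: float,
              frames_apart: int = 1) -> float:
    """Speed in metres per second from two positions given in pixels."""
    distance_px = math.hypot(curr_px[0] - prev_px[0], curr_px[1] - prev_px[1])
    distance_m = distance_px * meters_per_pixel
    elapsed_s = frames_apart / fps  # time between the two frames
    return distance_m / elapsed_s


# Hypothetical values: the vehicle moved 25 pixels between two consecutive
# frames of a 30 frames-per-second video, at 0.04 m per pixel.
v = speed_mps((100.0, 200.0), (115.0, 220.0), 0.04, 30.0)
print(round(v, 2), "m/s")
```

Detections shift slightly from frame to frame, so measuring over several frames (a larger `frames_apart`) is one way to get a steadier estimate.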

Seeing the traffic safety metrics in action

Below you can see a sample of our results. Every vehicle is detected and given a unique ID for tracking, and its instantaneous direction of travel is drawn as a small arrow in the middle of its bounding box. Its speed is also estimated.

The Time-To-Collision is calculated between cars 41 and 42, since they share the same trajectory and direction, and the following car is moving faster than the leading one.

For the Time Headway calculation, we marked the region of interest on the road with a red line, and we measure the time between two consecutive vehicles crossing it. In this example we can see an unsafe situation between vehicles 34 and 33, since their headway is below the safety threshold of 2 seconds.
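To illustrate how a crossing time could be read from a tracked vehicle, the sketch below assumes a horizontal reference line, one position per frame, and that both tracks start at the same frame. It interpolates between the two frames on either side of the line. The track values are invented; only the 2-second threshold comes from the text above.

```python
from typing import Optional, Sequence

SAFE_HEADWAY_S = 2.0  # safety threshold mentioned in the article


def crossing_time_s(y_px: Sequence[float],
                    line_y_px: float,
                    fps: float) -> Optional[float]:
    """Time in seconds at which a track first crosses a horizontal line.

    y_px holds the vehicle's vertical position in each frame, starting at
    frame 0. Returns None if the track never crosses the line.
    """
    for i in range(1, len(y_px)):
        y0, y1 = y_px[i - 1], y_px[i]
        if y0 != y1 and min(y0, y1) <= line_y_px <= max(y0, y1):
            fraction = (line_y_px - y0) / (y1 - y0)  # position between frames
            return (i - 1 + fraction) / fps
    return None


fps = 10.0
line_y = 100.0
leader = [95.0 + 10.0 * k for k in range(5)]      # crosses between frames 0 and 1
follower = [-55.0 + 10.0 * k for k in range(20)]  # crosses between frames 15 and 16

headway = crossing_time_s(follower, line_y, fps) - crossing_time_s(leader, line_y, fps)
print(round(headway, 2), "unsafe" if headway < SAFE_HEADWAY_S else "ok")
```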