News and insights on the impact of data on smart cities, businesses and technologies. Curated by RList

Autonomous driving researchers can now tap into more than 140 million frames of data from open datasets to accelerate their work


An updated 2022 version of this article Autonomous Driving Open Datasets Released To Date (2022) can be viewed at
The recent release of the Waymo Open Dataset in August 2019 adds to the open pool of more than 140 million frames of autonomous driving data from various autonomous vehicle firms, car manufacturers and research labs.

self-driving car image from Pixabay

Recent releases

Waymo is the latest self-driving car company to release an open version of its dataset in 2019. Over the past two to three years, however, several notable autonomous driving firms and research labs have released their datasets publicly for researchers worldwide to work with. Earlier in 2019, Aptiv (previously nuTonomy) released the nuScenes dataset, followed by Argo with the Argoverse dataset and Lyft with its Level 5 Dataset. In 2018, Berkeley A.I. Research (B.A.I.R.) released the BDD100K dataset, which is the largest to date in terms of monocular video frames (120 million frames). Baidu's Apollo program released the ApolloScape dataset, which featured 146,997 frames. Hesai & Scale are expected to release their full dataset in the coming months.

Types of data released

The earlier datasets, such as BDD100K and ApolloScape, contained primarily annotated frames from monocular camera video. The datasets released in 2019 are more varied and include data from LiDAR, radar and stereo cameras. Most of these datasets cover different city scenarios, weather conditions, times of day and scene types to help researchers make their autonomous driving models and algorithms work well in different situations. The bulk of the data was collected in U.S. cities such as San Francisco, Phoenix, Pittsburgh and New York, as well as in overseas locations including Singapore, Germany (Karlsruhe) and China.

Edit: For more information on the released datasets, refer to the article 2019 Autonomous Driving Open Datasets Released To Date.

Why companies are beginning to release their datasets into the open

The European Union commissioner for transport, Violeta Bulc, said during the City as a Lab conference that she expects full self-driving capability by 2030. Although 2030 is a decade away, many research challenges remain in realizing fully autonomous driving (Level 5 driving automation). The release of more open datasets to the research community should help to accelerate the pace and depth of research towards that goal.

How ready is autonomous driving today?

SAE J3016 Six Levels of Driving Automation

There are six levels of driving automation in SAE International's (formerly the Society of Automotive Engineers) J3016™ driving automation standard, from Level 0 (No driving automation) to Level 5 (Full driving automation). Most vehicles on our roads today are at Level 0, that is, manually driven. Tesla Autopilot and Cadillac Super Cruise qualify as Level 2. The 2019 Audi A8L with Traffic Jam Pilot will be classified at Level 3 when rolled out. Level 4 autonomous vehicles are currently being test-bedded in geo-fenced areas, and that is where the frontline of autonomous driving research lies today.
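For readers who want to work with these levels programmatically, here is a minimal Python sketch of the six levels as an enumeration. The level names for Levels 1 to 4 follow the SAE J3016 naming and are not spelled out in this article; the example classifications are the ones given above. The names `SAELevel`, `EXAMPLES`, `systems_at_or_above` and `levels_remaining` are hypothetical and chosen only for illustration.

```python
from enum import IntEnum


class SAELevel(IntEnum):
    """SAE J3016 levels of driving automation (illustrative names)."""
    NO_AUTOMATION = 0           # manually driven
    DRIVER_ASSISTANCE = 1
    PARTIAL_AUTOMATION = 2
    CONDITIONAL_AUTOMATION = 3
    HIGH_AUTOMATION = 4         # currently geo-fenced test-bedding
    FULL_AUTOMATION = 5


# Classifications exactly as stated in the article text above.
EXAMPLES = {
    "Tesla Autopilot": SAELevel.PARTIAL_AUTOMATION,
    "Cadillac Super Cruise": SAELevel.PARTIAL_AUTOMATION,
    "2019 Audi A8L with Traffic Jam Pilot": SAELevel.CONDITIONAL_AUTOMATION,
}


def systems_at_or_above(level):
    """Return the example systems classified at or above the given level."""
    return sorted(name for name, lvl in EXAMPLES.items() if lvl >= level)


def levels_remaining(level):
    """Number of levels between the given level and Level 5."""
    return int(SAELevel.FULL_AUTOMATION - level)


if __name__ == "__main__":
    for name, lvl in EXAMPLES.items():
        print(f"{name}: Level {int(lvl)}, {levels_remaining(lvl)} level(s) below Level 5")
```

Using an integer-backed enumeration keeps the levels ordered, so comparisons such as "Level 3 or higher" read naturally in code.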

A future of Level 5 fully autonomous driving could greatly reduce fatalities on the road, ease traffic issues such as congestion and parking, and improve the environment by reducing the number of personal cars on the road and maximizing shared transport. The datasets these firms have released may be only a small fraction of the proprietary data they hold, but they have already given many researchers a better starting point than before.