Building a Dataset

For educational purposes, I’m trying to train a YOLOv8 model to detect US street signs. Since the downloadable street sign datasets include signs from all over the world, I wanted to use the Mapillary API/Mapillary Python SDK to do the following:

  • Extract X images that show street signs (within bounding boxes around US cities)
  • Extract the label annotations of each street sign from each image
    OR
  • Find all street sign labels for the US
  • Extract all images inside a bounding box that contain the filtered street sign labels
  • Extract the label annotations of each street sign from each of these images

In the end I want to have a folder full of images and a folder full of labels. Is that doable with Mapillary?
If anyone here has some experience using the Mapillary Python SDK, I would be grateful for some tips tailored to my goal ^^

Hey Tobias, yeah that should be doable with the Mapillary API. I’ve done something similar for traffic sign detection. You’ll want to filter by region (US), use the object detection layer to pull only images with sign labels, and grab the bounding box data from the detections endpoint. The SDK helps a lot, but expect some cleanup on the annotations. Good luck with the dataset!
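
To give an idea of the cleanup part, here is a rough sketch of turning detections into YOLO label lines. It assumes you have already decoded each detection's geometry into a list of (x, y) points normalized to 0..1 with the origin at the top-left of the image (check the axis orientation when you decode, since some decoders flip y). The `CLASS_IDS` mapping and the `polygon` key are hypothetical names of mine, not something the API returns, so adjust them to your own data.

```python
# Hypothetical mapping from sign value to YOLO class index.
# Extend it with the sign values you actually want to train on.
CLASS_IDS = {
    "regulatory--stop--g1": 0,
    "regulatory--yield--g1": 1,
}


def polygon_to_yolo(points, class_id):
    """Convert a normalized polygon into one YOLO label line.

    points: iterable of (x, y) pairs in 0..1, origin at the top-left (assumed).
    Returns "class cx cy w h" using the polygon's axis-aligned bounding box.
    """
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    cx = (x_min + x_max) / 2
    cy = (y_min + y_max) / 2
    w = x_max - x_min
    h = y_max - y_min
    return f"{class_id} {cx:.6f} {cy:.6f} {w:.6f} {h:.6f}"


def detections_to_labels(detections):
    """Keep only detections whose value is in CLASS_IDS and format them.

    detections: list of dicts with a "value" string and a "polygon" list
    (the "polygon" key is an assumption about your own decoded data).
    """
    lines = []
    for det in detections:
        class_id = CLASS_IDS.get(det["value"])
        if class_id is None:
            continue  # not a sign class we care about
        lines.append(polygon_to_yolo(det["polygon"], class_id))
    return lines
```

You would then write the returned lines to a .txt file named after the image, one file per image, which is the layout YOLOv8 expects for labels.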

You can also look into using the map_features and images endpoints together to narrow down by location and object type. Once you have the image keys, you can pull the detections and filter for street signs, then export both images and annotation files in your preferred format. It’s a bit of scripting, but the API gives you all the pieces.
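
As a starting point for the scripting, here is a minimal sketch that only builds the request URLs, so you can fetch them with whatever HTTP client you like. The endpoint paths, the `bbox` and `object_values` parameters, and the field names are my reading of the Mapillary API v4 docs, so treat them as assumptions and check them against the current documentation. The function names are made up for this example.

```python
from urllib.parse import urlencode

GRAPH_URL = "https://graph.mapillary.com"


def map_features_url(token, bbox, object_values,
                     fields=("id", "object_value", "geometry", "images")):
    """URL for map features inside bbox = (west, south, east, north).

    Parameter and field names are assumed from the v4 docs; verify them.
    """
    params = {
        "access_token": token,
        "fields": ",".join(fields),
        "bbox": ",".join(f"{v:.6f}" for v in bbox),
        "object_values": ",".join(object_values),
    }
    # safe="," keeps the comma-separated lists readable in the URL
    return f"{GRAPH_URL}/map_features?{urlencode(params, safe=',')}"


def image_url(token, image_id, fields=("id", "thumb_2048_url")):
    """URL for one image's metadata, including a thumbnail link (assumed field)."""
    params = {"access_token": token, "fields": ",".join(fields)}
    return f"{GRAPH_URL}/{image_id}?{urlencode(params, safe=',')}"


def detections_url(token, image_id, fields=("value", "geometry")):
    """URL for the detections of a single image."""
    params = {"access_token": token, "fields": ",".join(fields)}
    return f"{GRAPH_URL}/{image_id}/detections?{urlencode(params, safe=',')}"
```

The idea is to query map features for a small bounding box around a US city, collect the image IDs from the results, then request the image metadata and detections per image. Large areas probably need to be split into smaller boxes.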
