Detection API in v4

Hi, I want to get the bounding boxes for specific categories (e.g. car, human, …) in an image.

1. Does the v4 API provide an endpoint to get objects' bounding boxes in an image?

2. If not, I will try to convert polygons to bounding boxes, since polygons were provided by the v3 object detection API. But I work with the v4 API and use https://graph.mapillary.com/:image_id/detections to get polygons, with the code below:

# Python 3.8.3
from urllib.parse import urljoin, urlparse, urlencode, urlunparse
import requests
import base64

# build default headers and attach the client token
default_header = requests.utils.default_headers()
client_token = 'YOUR_CLIENT_TOKEN'  # replace with your own token
default_header['Authorization'] = 'OAuth ' + client_token

if __name__ == '__main__':
    img_id = '162491019065656'
    # assemble https://graph.mapillary.com/<image_id>/detections?fields=...
    url = urlparse(urljoin('https://graph.mapillary.com', '{}/detections'.format(img_id)))
    query = urlencode({'fields': 'geometry,value,image'})
    url = urlunparse(url._replace(query=query))
    response = requests.get(url, headers=default_header)
    js = response.json()

    mytest = js['data'][0]['geometry']
    print(mytest)
    # this fails: the decoded bytes are not UTF-8 text (see the error below)
    print(base64.b64decode(mytest).decode('utf-8'))

And I get the JSON below:

{
  "data": [
    {
      "geometry": "GjgKBm1weS1vchIYEgIAABgDIhAJzBvgGhrMAgAAqgPLAgAPGgR0eXBlIgkKB3BvbHlnb24ogCB4AQ==",
      "value": "warning--traffic-merges-left--g1",
      "image": {
        "geometry": {
          "type": "Point",
          "coordinates": [
            139.7798438,
            35.6831047
          ]
        },
        "id": "162491019065656"
      },
      "id": "164110925570332"
    },
    {
      "geometry": "GjUKBm1weS1vchIVEgIAABgDIg0J9CLIHxpsAABCawAPGgR0eXBlIgkKB3BvbHlnb24ogCB4AQ==",
      "value": "information--general-directions--g1",
      "image": {
        "geometry": {
          "type": "Point",
          "coordinates": [
            139.7798438,
            35.6831047
          ]
        },
        "id": "162491019065656"
      },
      "id": "164275338887224"
    },
    {
      "geometry": "GlV4AgoGbXB5LW9yKIAgEkYIARgDIkAJxi+MGdoBPQAFDQDjAgQNQgEADAQIBAAABQgHCAICB1oBCAIYBQgOA4QCBGIHFEUAEQUbAAMFAAcHAAsHBwgP",
      "value": "object--sign--advertisement",
      "image": {
        "geometry": {
          "type": "Point",
          "coordinates": [
            139.7798438,
            35.6831047
          ]
        },
        "id": "162491019065656"
      },
      "id": "165684308746327"
    },
    {
      "geometry": "GjZ4AgoGbXB5LW9yKIAgEicIARgDIiEJ9ifKHmoXAwsHAXcIBSoAGgQGAgYKAxYCQAYaCQwnAQ8=",
      "value": "object--sign--advertisement",
      "image": {
        "geometry": {
          "type": "Point",
          "coordinates": [
            139.7798438,
            35.6831047
          ]
        },
        "id": "162491019065656"
      },
      "id": "166312585350166"
    }
  ]
}

3. What does each of the two kinds of id in the returned JSON mean? It seems some of them are not related to the image id I set.

I also get a UnicodeDecodeError:

GjgKBm1weS1vchIYEgIAABgDIhAJzBvgGhrMAgAAqgPLAgAPGgR0eXBlIgkKB3BvbHlnb24ogCB4AQ==
Traceback (most recent call last):
  File "download_img_from_key.py", line 19, in <module>
    print(base64.b64decode(mytest).decode('utf-8'))
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xcc in position 21: invalid continuation byte

4. If it were decoded successfully, what would the content of geometry be?

5. Are the segmentation labels (e.g. "value": "information--general-directions--g1", …) the same as in the v3 API?

6. By the way, does the v4 API provide instance segmentation?

I would appreciate your reply, because I have collected a lot of image keys provided by the v3 API and was preparing to download images, but those keys are not valid now, and some of my work using the v3 object detection API no longer works. By the way, the v3 object detection API is a nice piece of work; I hope the v4 API will also have this feature.

Thanks

2 Likes

I checked some geometry values; it looks like they are corrupted in some way, or there is some obscure encoding in use.
For example, there are keywords like "mpy-or", "type" and "polygon" inside, but the rest is gibberish.
I assume it should be something similar to "shape" in the old API:

"shape": {"type": "Polygon", "coordinates": [[[0.6904296875, 0.626708984375], [0.7216796875, 0.626708984375], [0.7216796875, 0.669189453125], [0.6904296875, 0.669189453125], [0.6904296875, 0.626708984375]]]}
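If the goal is only bounding boxes (question 2 above), a polygon like that old-API "shape" can be collapsed to a box by taking the min/max over the ring. A minimal sketch (the function name is mine, not part of any API):

```python
def polygon_to_bbox(coordinates):
    """Collapse a GeoJSON-style polygon into [left, top, right, bottom]."""
    ring = coordinates[0]  # exterior ring
    xs = [x for x, y in ring]
    ys = [y for x, y in ring]
    return [min(xs), min(ys), max(xs), max(ys)]

shape = {"type": "Polygon",
         "coordinates": [[[0.6904296875, 0.626708984375],
                          [0.7216796875, 0.626708984375],
                          [0.7216796875, 0.669189453125],
                          [0.6904296875, 0.669189453125],
                          [0.6904296875, 0.626708984375]]]}
print(polygon_to_bbox(shape["coordinates"]))
# [0.6904296875, 0.626708984375, 0.7216796875, 0.669189453125]
```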

1 Like

The first, outer id is the ID of the detection, and the one inside the image object is the image ID (the one you used to look up the image and all its detections).

We do not currently have the segmentations in the v4 API, but stand by.

1 Like

Bumping this - how are the base64 bbox coordinates encoded?

When I simply decode with base64.standard_b64decode() on the polygon string, I get a bytes output full of \x escape sequences, like this:

b'\x1a6\n\x06mpy-or\x12\x16\x12\x02\x00\x00\x18\x03"\x0e\t\xee&\xf4\x1d\x1a8\x00\x00\xb6\x017\x00\x0f\x1a\x04type"\t\n\x07polygon(\x80 x\x01'

Any suggestions for how this might be properly decoded?

@jakebelman try this in Python

import base64
import mapbox_vector_tile

base64_string = "Gjh4AgoGbXB5LW9yKIAgEikIARgDIiMJxCXQFHIMAAocACAHFAkIUQMLGQQvCAkcBQwTDAIGCggADw=="

# the geometry field is a base64-encoded Mapbox vector tile
data = base64.decodebytes(base64_string.encode('utf-8'))
detg = mapbox_vector_tile.decode(data)
print(detg)
# {'mpy-or': {'extent': 4096, 'version': 2, 'features': [{'geometry': {'type': 'Polygon', 'coordinates': [[[2402, 2776], [2408, 2776], [2413, 2762], [2413, 2746], [2409, 2736], [2404, 2732], [2363, 2734], [2357, 2747], [2359, 2771], [2363, 2776], [2377, 2779], [2383, 2789], [2389, 2788], [2392, 2783], [2396, 2783], [2402, 2776]]]}, 'properties': {}, 'id': 1, 'type': 3}]}}
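From that decoded structure, the coordinate ring can be pulled out and reduced to a bounding box in tile units, which connects back to the bounding-box question at the top of the thread. A sketch assuming the 'mpy-or' layout printed above (ring shortened for brevity):

```python
# decoded output from mapbox_vector_tile.decode, shortened for brevity
decoded = {'mpy-or': {'extent': 4096, 'version': 2, 'features': [
    {'geometry': {'type': 'Polygon',
                  'coordinates': [[[2402, 2776], [2408, 2776], [2413, 2762],
                                   [2413, 2746], [2402, 2776]]]},
     'properties': {}, 'id': 1, 'type': 3}]}}

ring = decoded['mpy-or']['features'][0]['geometry']['coordinates'][0]
xs, ys = zip(*ring)
bbox = [min(xs), min(ys), max(xs), max(ys)]  # [left, top, right, bottom] in tile units
print(bbox)
# [2402, 2746, 2413, 2776]
```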

@stevage this may interest you too ^

1 Like

Thanks @chrisbeddow ! What does the geometry represent in this case? Is that a bounding polygon for a single detection, in [x,y] pixels?

Yes, this is a polygon in x,y pixels, and not just a bounding box but the segmentation outline. Notice it also carries the extent, so you may need to normalize by dividing x/extent and y/extent to project it into MapillaryJS, for example, so that it becomes the fraction of the distance from the origin at which the pixel coordinate lies.

1 Like

@jakebelman I think I am actually incorrect about the extent field being useful. You need to get the original width and height fields from the photo containing the detection, so ask for fields=width,height in the API request to the image key, then divide the x by width and y by height to get a normalized value.

This then can be rendered in the Mapillary viewer. I need to confirm and test, however.

2 Likes

Hi @chrisbeddow - I’ve tested this out and I’m getting values for the detection coordinates that lie outside of the width/height returned from the API.

As an example, for the image I’m looking at now, the detection API returned coordinates bounded by:
[left, top, right, bottom] = [3470, 2374, 3517, 2557]

But the dimensions of the image returned by the image API are:
[width, height] = [1920, 1080]

Do you know what’s going on here?

Hi Jake, can you share an image key and detection key?

I think you have a bug because I discovered a similar one we are fixing. But it might be something else.

Generally I think a bug fix will start returning these correctly, but I can use this image and detection to help test the fix.

Sure!

Image Key: 120777826726018
Detection Key: 121967983273669

Thanks :slight_smile:

@jakebelman update on this:

The coordinates must be divided by the extent (4096) to get normalized [0, 1] coordinates, and then multiplied by the current width/height to get exact pixel coordinates.
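In code, that conversion looks roughly like this (a sketch; the function name is mine, and width/height come from the image's own fields in the API):

```python
EXTENT = 4096  # tile extent used by the detection vector tiles

def tile_to_pixels(ring, width, height, extent=EXTENT):
    """Divide by the extent to normalize, then scale by the image size."""
    return [[x / extent * width, y / extent * height] for x, y in ring]

# e.g. one point of a detection in a 1920x1080 image
print(tile_to_pixels([[2048, 1024]], 1920, 1080))
# [[960.0, 270.0]]
```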

@chrisbeddow that worked! Thanks so much!

Old thread, but I think I need to do a similar analysis.

I want to download all advertisements (only the ads, not the whole street picture) in image format (preferably JPG) in a 500 m radius around a certain location.
When using the web app I can only download the locations in SHP, GeoJSON, etc. format.
I have very limited programming knowledge and the code in this thread is too complex for me. Is there a comprehensive tutorial where this process is explained step by step?

Thanks


Hi, I'm trying to crop images around the polygon coordinates. I followed the instructions given by the API documentation but it doesn't work. Am I doing something wrong with the normalization?
This is my code:

import base64

import cv2
import numpy as np
import requests
import mapbox_vector_tile

import codes  # local module holding API_KEY

mapfid = '1380366595696443'
detections_url = f'https://graph.mapillary.com/{mapfid}/detections?access_token={codes.API_KEY}&fields=geometry'

# request the detections
response = requests.get(detections_url)
payload = response.json()
detection = payload['data'][0]
base64_string = detection['geometry']

# decode from base64
vector_data = base64.decodebytes(base64_string.encode('utf-8'))

# decode the vector tile into detection geometry
decoded_geometry = mapbox_vector_tile.decode(vector_data)

# select just the coordinate x,y pairs from this detection
detection_coordinates = decoded_geometry['mpy-or']['features'][0]['geometry']['coordinates']
print(detection_coordinates)

# detection_coordinates = [[[1759, 2163], [1759, 1897], [1940, 1897], [1940, 2163], [1759, 2163]]]
img = cv2.imread('D:/im/1380366595696443.jpg')
height = img.shape[0]
width = img.shape[1]

# normalize by the 4096 extent, then multiply by image height and width to get true coordinate location
pg = [[[int(x / 4096 * width), int(y / 4096 * height)] for x, y in coord_pair] for coord_pair in detection_coordinates]
print(pg)

pts = np.array(pg)

## (1) crop the bounding rect
rect = cv2.boundingRect(pts)
x, y, w, h = rect
cropped = img[y:y + h, x:x + w].copy()

## (2) make mask
pts = pts - pts.min(axis=0)

mask = np.zeros(cropped.shape[:2], np.uint8)
cv2.drawContours(mask, [pts], -1, (255, 255, 255), -1, cv2.LINE_AA)

## (3) do bit-op
dst = cv2.bitwise_and(cropped, cropped, mask=mask)

## (4) add the white background
bg = np.ones_like(cropped, np.uint8) * 255
cv2.bitwise_not(bg, bg, mask=mask)
dst2 = bg + dst

cv2.imwrite("D:/cropped_" + mapfid + ".jpg", cropped)
cv2.imwrite("D:/Dataset Custom Mapillary/test_crop/mask_" + mapfid + ".jpg", mask)
cv2.imwrite("D:/Dataset Custom Mapillary/test_crop/dst_" + mapfid + ".jpg", dst)
cv2.imwrite("D:/Dataset Custom Mapillary/test_crop/dst2_" + mapfid + ".jpg", dst2)