Detection api in v4

klay6096tw · June 15, 2021, 3:00pm

Hi, I want to get the bounding boxes of specific categories (e.g. car, human …) in an image.

1 Does the v4 API provide an entry point for getting an object's bounding boxes in an image?

2 If not, I am trying to convert polygons to bounding boxes, since the v3 object detection API provided polygons. I work with the v4 API and use https://graph.mapillary.com/:image_id/detections to get the polygons, with the code below:

# Python 3.8.3
from urllib.parse import urljoin, urlparse, urlencode, urlunparse
import requests
import base64

default_header:requests.structures.CaseInsensitiveDict = requests.utils.default_headers()
client_token = 'YOUR_CLIENT_TOKEN'  # placeholder: replace with your own client token
default_header['Authorization'] = 'OAuth ' + client_token

if __name__ == '__main__':
    img_id = '162491019065656'
    url = urlparse(urljoin('https://graph.mapillary.com','{}/detections'.format(img_id)))
    query = urlencode({'fields':'geometry,value,image'})
    url = urlunparse(url._replace(query=query))
    response = requests.get(url, headers=default_header)
    js = response.json()

    mytest = js['data'][0]['geometry']
    print(mytest)
    print(base64.b64decode(mytest).decode('utf-8'))

And I get the JSON below:

{
  "data": [
    {
      "geometry": "GjgKBm1weS1vchIYEgIAABgDIhAJzBvgGhrMAgAAqgPLAgAPGgR0eXBlIgkKB3BvbHlnb24ogCB4AQ==",
      "value": "warning--traffic-merges-left--g1",
      "image": {
        "geometry": {
          "type": "Point",
          "coordinates": [
            139.7798438,
            35.6831047
          ]
        },
        "id": "162491019065656"
      },
      "id": "164110925570332"
    },
    {
      "geometry": "GjUKBm1weS1vchIVEgIAABgDIg0J9CLIHxpsAABCawAPGgR0eXBlIgkKB3BvbHlnb24ogCB4AQ==",
      "value": "information--general-directions--g1",
      "image": {
        "geometry": {
          "type": "Point",
          "coordinates": [
            139.7798438,
            35.6831047
          ]
        },
        "id": "162491019065656"
      },
      "id": "164275338887224"
    },
    {
      "geometry": "GlV4AgoGbXB5LW9yKIAgEkYIARgDIkAJxi+MGdoBPQAFDQDjAgQNQgEADAQIBAAABQgHCAICB1oBCAIYBQgOA4QCBGIHFEUAEQUbAAMFAAcHAAsHBwgP",
      "value": "object--sign--advertisement",
      "image": {
        "geometry": {
          "type": "Point",
          "coordinates": [
            139.7798438,
            35.6831047
          ]
        },
        "id": "162491019065656"
      },
      "id": "165684308746327"
    },
    {
      "geometry": "GjZ4AgoGbXB5LW9yKIAgEicIARgDIiEJ9ifKHmoXAwsHAXcIBSoAGgQGAgYKAxYCQAYaCQwnAQ8=",
      "value": "object--sign--advertisement",
      "image": {
        "geometry": {
          "type": "Point",
          "coordinates": [
            139.7798438,
            35.6831047
          ]
        },
        "id": "162491019065656"
      },
      "id": "166312585350166"
    }
  ]
}

3 What does each of the two ids in the returned JSON mean? Some of them seem unrelated to the image id I set.

I also get a UnicodeDecodeError:

GjgKBm1weS1vchIYEgIAABgDIhAJzBvgGhrMAgAAqgPLAgAPGgR0eXBlIgkKB3BvbHlnb24ogCB4AQ==
Traceback (most recent call last):
  File "download_img_from_key.py", line 19, in <module>
    print(base64.b64decode(mytest).decode('utf-8'))
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xcc in position 21: invalid continuation byte

4 If it is decoded successfully, what is the content of geometry?

5 Are the segmentation labels (e.g. "value": "information--general-directions--g1" ...) the same as in the v3 API?

6 By the way, does the v4 API provide instance segmentation?

I would appreciate your reply. I collected a lot of image keys from the v3 API and was preparing to download the images, but the keys are no longer valid, and some of my work that relies on the v3 object detection API no longer runs. The v3 object detection API was nice work, and I hope the v4 API will offer this feature too.

Thanks

jorrarro · June 17, 2021, 8:51am

I checked some geometry values. It looks like they are corrupted in some way, or some obscure encoding is in use.
For example, there are keywords like "mpy-or", "type" and "polygon" inside, but the rest is gibberish.
I assume it should be something similar to "shape" in the old API:

"shape":{"type":"Polygon","coordinates":[[[0.6904296875,0.626708984375],[0.7216796875,0.626708984375],[0.7216796875,0.669189453125],[0.6904296875,0.669189453125],[0.6904296875,0.626708984375]]]}

chrisbeddow · June 17, 2021, 9:06pm

The first outer ID is the ID of the detection, and the one in the object for the image is the image ID (the one you used to look up the image and all its detections).

We do not currently have the segmentations in v4 API, but stand by.

jakebelman · June 24, 2021, 4:03pm

Bumping this - how are the base64 bbox coordinates encoded?

When I decode simply using base64.standard_b64decode(base64-polygon-string), I get a bytes output with a lot of x's and slashes, like this:

b'\x1a6\n\x06mpy-or\x12\x16\x12\x02\x00\x00\x18\x03"\x0e\t\xee&\xf4\x1d\x1a8\x00\x00\xb6\x017\x00\x0f\x1a\x04type"\t\n\x07polygon(\x80 x\x01'

Any suggestions for how this might be properly decoded?
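
The bytes shown above are binary rather than UTF-8 text, which would explain the UnicodeDecodeError reported earlier in the thread. A minimal sketch, using the first geometry string from the JSON in the original post, that checks this without any third-party library (the helper name `is_utf8_text` is made up for illustration):

```python
import base64

# First "geometry" value from the JSON response earlier in the thread.
geometry = "GjgKBm1weS1vchIYEgIAABgDIhAJzBvgGhrMAgAAqgPLAgAPGgR0eXBlIgkKB3BvbHlnb24ogCB4AQ=="

raw = base64.b64decode(geometry)


def is_utf8_text(data: bytes) -> bool:
    """Return True only if the bytes form valid UTF-8 text."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


print(is_utf8_text(raw))                     # whether the payload is plain text
print(b"mpy-or" in raw, b"polygon" in raw)  # readable keywords embedded in the binary
print(raw.hex())                             # a safe way to inspect the payload
```

Printing the hex form or the bytes repr avoids the exception; turning the payload into coordinates still needs a decoder that understands the binary format.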

chrisbeddow · June 29, 2021, 6:32pm

@jakebelman try this in Python

import base64

import mapbox_vector_tile

base64_string = "Gjh4AgoGbXB5LW9yKIAgEikIARgDIiMJxCXQFHIMAAocACAHFAkIUQMLGQQvCAkcBQwTDAIGCggADw=="

data = base64.decodebytes(base64_string.encode('utf-8'))

detg = mapbox_vector_tile.decode(data)

print(detg)

# {'mpy-or': {'extent': 4096, 'version': 2, 'features': [{'geometry': {'type': 'Polygon', 'coordinates': [[[2402, 2776], [2408, 2776], [2413, 2762], [2413, 2746], [2409, 2736], [2404, 2732], [2363, 2734], [2357, 2747], [2359, 2771], [2363, 2776], [2377, 2779], [2383, 2789], [2389, 2788], [2392, 2783], [2396, 2783], [2402, 2776]]]}, 'properties': {}, 'id': 1, 'type': 3}]}}

@stevage this may interest you too ^
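
For the bounding-box question at the top of the thread, one possible approach is to take the minimum and maximum of the decoded polygon coordinates. The sketch below reuses the decoded dictionary printed above as literal data, so it runs without `mapbox_vector_tile`; the function name `polygon_bbox` is hypothetical, and the result is still in tile units (0 to 4096), not image pixels.

```python
# Decoded geometry copied from the printed output above.
decoded = {
    "mpy-or": {
        "extent": 4096,
        "version": 2,
        "features": [{
            "geometry": {
                "type": "Polygon",
                "coordinates": [[
                    [2402, 2776], [2408, 2776], [2413, 2762], [2413, 2746],
                    [2409, 2736], [2404, 2732], [2363, 2734], [2357, 2747],
                    [2359, 2771], [2363, 2776], [2377, 2779], [2383, 2789],
                    [2389, 2788], [2392, 2783], [2396, 2783], [2402, 2776],
                ]],
            },
            "properties": {},
            "id": 1,
            "type": 3,
        }],
    }
}


def polygon_bbox(rings):
    """Return (min_x, min_y, max_x, max_y) over every ring of a polygon."""
    xs = [x for ring in rings for x, _ in ring]
    ys = [y for ring in rings for _, y in ring]
    return min(xs), min(ys), max(xs), max(ys)


feature = decoded["mpy-or"]["features"][0]
bbox = polygon_bbox(feature["geometry"]["coordinates"])
print(bbox)  # (2357, 2732, 2413, 2789)
```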

jakebelman · June 29, 2021, 6:34pm

Thanks @chrisbeddow ! What does the geometry represent in this case? Is that a bounding polygon for a single detection, in [x,y] pixels?

chrisbeddow · June 29, 2021, 7:01pm

Yes, this is a polygon in x,y pixels. It is not just the bounding box but the segmentation outline. Notice that it also carries the extent of the image, so you may need to normalize by dividing x and y by the extent before projecting it into MapillaryJS, for example. Each value is then the fraction of the distance from the origin at which the pixel coordinate lies.

chrisbeddow · June 29, 2021, 7:36pm

@jakebelman I think I am actually incorrect about the extent field being useful. You need to get the original width and height fields from the photo containing the detection, so ask for fields=width,height in the API request to the image key, then divide the x by width and y by height to get a normalized value.

This then can be rendered in the Mapillary viewer. I need to confirm and test, however.
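
A small sketch of how the width and height request might be built with the same `urllib.parse` helpers used in the first post. It assumes the image is addressed as `https://graph.mapillary.com/<image_id>`, mirroring the detections URL in this thread; the function name `image_fields_url` is made up, and the Authorization header from the first post would still need to be sent with the request.

```python
from urllib.parse import urlencode, urljoin

GRAPH_ROOT = "https://graph.mapillary.com"  # base URL used elsewhere in this thread


def image_fields_url(image_id: str, fields=("width", "height")) -> str:
    """Build a URL asking for the given fields of one image (assumed endpoint shape)."""
    # safe="," keeps the comma between field names unescaped.
    query = urlencode({"fields": ",".join(fields)}, safe=",")
    return urljoin(GRAPH_ROOT, str(image_id)) + "?" + query


print(image_fields_url("120777826726018"))
# https://graph.mapillary.com/120777826726018?fields=width,height
```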

jakebelman · July 16, 2021, 4:44pm

Hi @chrisbeddow - I’ve tested this out and I’m getting values for the detection coordinates that lie outside of the width/height returned from the API.

As an example, for the image I’m looking at now, the detection API returned coordinates bounded by:
[left, top, right, bottom] = [3470, 2374, 3517, 2557]

But the dimensions of the image returned by the image API are:
[width, height] = [1920, 1080]

Do you know what’s going on here?

chrisbeddow · July 16, 2021, 10:11pm

Hi Jake, can you share an image key and detection key?

I think you have a bug because I discovered a similar one we are fixing. But it might be something else.

Generally I think a bug fix will start returning these correctly, but I can use this image and detection to help test the fix.

jakebelman · July 19, 2021, 3:33pm

Sure!

Image Key: 120777826726018
Detection Key: 121967983273669

Thanks

chrisbeddow · July 19, 2021, 10:30pm

@jakebelman update on this:

The coordinates must be divided by the extent (4096) to get normalized, [0, 1]-based coordinates, and then multiplied by the current width/height to get exact pixel coordinates.
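
As a rough illustration of that rule, the sketch below scales the example corners reported earlier in the thread (image 1920 x 1080). The function names are hypothetical, and it assumes both axes share the image's origin, which this thread does not confirm.

```python
EXTENT = 4096  # extent reported in the decoded geometry


def tile_to_pixel(x, y, width, height, extent=EXTENT):
    """Normalize a tile coordinate by the extent, then scale to image pixels."""
    return x / extent * width, y / extent * height


def ring_to_pixels(ring, width, height, extent=EXTENT):
    """Apply tile_to_pixel to every [x, y] pair in a ring."""
    return [tile_to_pixel(x, y, width, height, extent) for x, y in ring]


# Corners from the example above: [3470, 2374] and [3517, 2557].
corners = ring_to_pixels([[3470, 2374], [3517, 2557]], width=1920, height=1080)
for px, py in corners:
    print(round(px, 2), round(py, 2))
```

After scaling, both corners fall inside the 1920 x 1080 frame, which is consistent with the fix being confirmed in the next reply.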

jakebelman · July 21, 2021, 6:25pm

@chrisbeddow that worked! Thanks so much!

Vin · October 14, 2021, 8:44am

Old thread, but I think I need to do a similar analysis.

I want to download all advertisements (only the ads, not the whole street picture) in an image format (preferably jpg) within a 500 m radius of a certain location.
When using the web app I can only download the location in shp, GeoJSON, etc. format.
I have very limited programming knowledge and the code in this thread is too complex for me. Is there a comprehensive tutorial where this process is explained step by step?

Thanks

GroveTitus · March 17, 2022, 3:01pm

The information you shared above is great. I have been reading everything you shared here, and you explained it all very well. If I need any further guidance, I will contact you here https://forum.mapillary.com/t/detection-api-in-v4/4999/14 -results.

Gianluca · February 16, 2024, 9:45am

Hi, I'm trying to crop images around the polygon coordinates. I followed the instructions given in the API documentation, but it doesn't work. Am I doing something wrong with the normalization?
This is my code:

import base64

import cv2
import mapbox_vector_tile
import numpy as np
import requests

import codes  # the poster's own module holding API_KEY (not shown in the thread)

mapfid = '1380366595696443'

detections_url = f'https://graph.mapillary.com/{mapfid}/detections?access_token={codes.API_KEY}&fields=geometry'

# request the detections
response = requests.get(detections_url)
payload = response.json()
detection = payload['data'][0]
base64_string = detection['geometry']

# decode from base64
vector_data = base64.decodebytes(base64_string.encode('utf-8'))

# decode the vector tile into detection geometry
decoded_geometry = mapbox_vector_tile.decode(vector_data)

# select just the coordinate xy pairs from this detection
detection_coordinates = decoded_geometry['mpy-or']['features'][0]['geometry']['coordinates']
print(detection_coordinates)

#detection_coordinates = [[[1759, 2163], [1759, 1897], [1940, 1897], [1940, 2163], [1759, 2163]]]
img = cv2.imread('D:/im/1380366595696443.jpg')
height = img.shape[0]
width = img.shape[1]
#print(width, height)

# normalize by the 4096 extent, then multiply by image height and width to get true coordinate location
pg = [[[int(x / 4096 * width), int(y / 4096 * height)] for x, y in tuple(coord_pair)] for coord_pair in detection_coordinates]

print(pg)

# use the outer ring as an (N, 2) int32 array; with the extra ring axis kept,
# pts.min(axis=0) below would return the ring itself and zero out every point
pts = np.array(pg[0], dtype=np.int32)

## (1) Crop the bounding rect
rect = cv2.boundingRect(pts)
x, y, w, h = rect
cropped = img[y:y + h, x:x + w].copy()

## (2) make mask, with the points shifted into the cropped frame
pts = pts - pts.min(axis=0)

mask = np.zeros(cropped.shape[:2], np.uint8)
cv2.drawContours(mask, [pts], -1, (255, 255, 255), -1, cv2.LINE_AA)

## (3) do bit-op
dst = cv2.bitwise_and(cropped, cropped, mask=mask)

## (4) add the white background
bg = np.ones_like(cropped, np.uint8) * 255
cv2.bitwise_not(bg, bg, mask=mask)
dst2 = bg + dst

cv2.imwrite("D:/cropped_" + mapfid + ".jpg", cropped)
cv2.imwrite("D:/Dataset Custom Mapillary/test_crop/mask_" + mapfid + ".jpg", mask)
cv2.imwrite("D:/Dataset Custom Mapillary/test_crop/dst_" + mapfid + ".jpg", dst)
cv2.imwrite("D:/Dataset Custom Mapillary/test_crop/dst2_" + mapfid + ".jpg", dst2)
