Mapillary Tools 0.14 is released

To upgrade:

pip install -U mapillary_tools

# or simply run
uv tool run mapillary_tools --version
mapillary_tools version 0.14.0

:glowing_star: Highlights

  • :rocket: Performance Boosts:
    Enjoy massive speed improvements across the board, including faster native EXIF processing and faster GoPro, CAMM, and BlackVue video processing. Uploads and processing are now significantly more responsive, making it smoother than ever to handle large datasets.

  • :hammer_and_wrench: Built-in ExifTool Fallback:
    Mapillary Tools now uses its optimized native parsers for EXIF, CAMM, GoPro, and related metadata by default. If native parsing fails, it seamlessly falls back to using the ExifTool installed on your system, ensuring robust and reliable metadata extraction for all your files.

  • :outbox_tray: File-by-File Upload:
    Images are now uploaded individually, eliminating the need to create zip archives before uploading. This approach greatly reduces intermediate disk usage, speeds up the upload process, and enhances reliability—especially with large or interrupted uploads.

See the full release note:


These are massive improvements, great work @tao!

Folks - we’d love your feedback on how this new version is working for you.

Note that we will also incorporate this new version and its capabilities into an upcoming release of Desktop Uploader (stay tuned!) cc: @nikola


Decided to download mapillary_tools on my Win10 machine.

My PC pulled an all-nighter,

uploading all night. When I woke up, IT WAS UPLOADED! (Unlike the app, which takes 5 whooole days.)


Here are my initial findings on 0.14.

  • Orthogonality remains a key design principle for usability and clarity
    Expose all sequence limits and cutoffs to the user as --cutoff_* options as well. There is no need to hide them, because they all have default values.
    • MAPILLARY_TOOLS_MAX_CAPTURE_SPEED_KMH → --cutoff_speed with unit suffixes such as mps, m/s, kph, km/h, or mph
    • MAPILLARY_TOOLS_MAX_SEQUENCE_LENGTH → --cutoff_count
    • MAPILLARY_TOOLS_MAX_SEQUENCE_FILESIZE → --cutoff_filesize
    • MAPILLARY_TOOLS_MAX_SEQUENCE_PIXELS → --cutoff_pixels
  • Naming is crucial in every design
    • MAPILLARY_TOOLS_MAX_SEQUENCE_LENGTH can be misleading and is in fact ambiguous, because LENGTH is a dimension of distance, not a count. Please rename it to MAPILLARY_TOOLS_MAX_SEQUENCE_COUNT.
  • mapillary_tools’ implementation should remain opaque to the user, and so should Python and its intricacies. The (±)inf and (±)infinity literals are a Python-specific artifact. For options that accept inf, please describe this in plain words on the --help page, e.g. “inf” or “infinity” turns off this option.
  • Unexpectedly, using the --verbose option on the process command basically silences output when correlating GPX tracks to images in the final write step. That is exactly when output is needed most. Users should see a printout of what is written or modified.
  • Reading the mapillary_image_description.json file is a severe bottleneck, because the JSON validator implementation is poor: it relies on Python’s slow string handling and processes entries sequentially on a single core. Moreover, the user sees no message about this step or its progress, so mapillary_tools may appear to have hung. When I want to upload a few thousand files, it takes up to an hour before the file has been validated and anything starts going over the wire. This is unacceptable. Besides, does the description file really need to carry so much bloat? If I understand correctly, it is only supposed to group image files into sequences.
  • Having to pass an import path with the upload command makes no sense when you also pass a description file via the --desc_path option, because the description file contains absolute image file paths by design. Requiring an import path would only make sense here if the description file contained relative image file paths.
  • The upload command should recursively search every passed import path for description files.
  • Passing multiple import paths to the upload command should not also require passing a description file via the --desc_path option. Again, upload should simply search the import paths for description files and upload them in the order given. --desc_path should be treated as an override for any description files found in the import paths or any of their sub‑folders.
  • To make things even simpler, you do not actually need the --desc_path option with the upload command. Just let users pass either description file paths or import paths: a description file describes what is ready to upload, and an import path may contain ready-to-upload description files. If a passed path is a file, treat it as a description file; otherwise, treat it as an import path.
  • Enable the --skip_subfolders option with the upload command. This ties in with making the upload command search import path sub‑folders for description files.
  • Do not open and establish a separate TCP/HTTPS connection per image file. Reuse existing (keep-alive) connections. Keep the number of concurrent connections as low as possible, and only open additional connections as long as they add throughput without oversaturating the link. While multiple TCP/HTTPS connections are a smart way to increase throughput on broadband links, no static number will work for all links. For example, although I am on a 1 Gbps fiber link, I had to set MAPILLARY_TOOLS_MAX_IMAGE_UPLOAD_WORKERS to 8 for my machine’s network stack not to get overburdened. With mapillary_tools’ default of 64 upload workers, my network stack basically stopped opening new connections, and mapillary_tools spewed errors about being unable to find the upload host, open new connections, or upload files. Granted, this usually only happened when uploading sequences of a couple of thousand files, right before the sequence finished. Still, this should not happen. Unfortunately, it also had an ugly side effect: mapillary_tools more or less blocked all other apps from opening network connections for a while, because the network stack was busy closing connections and freeing resources.
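The first and third suggestions above (a --cutoff_speed option with unit suffixes, and plain-words help for options that accept inf) could be handled by a small parser like the one below. This is a sketch under assumptions: parse_cutoff_speed and CUTOFF_SPEED_HELP are hypothetical names, --cutoff_speed is the option proposed in this thread rather than an existing flag, and km/h is assumed as the default unit because the current envvar is expressed in km/h.

```python
import math
import re

# Conversion factors to km/h (exact: 1 m/s = 3.6 km/h, 1 mile = 1.609344 km).
_TO_KMH = {
    "km/h": 1.0,
    "kph": 1.0,
    "m/s": 3.6,
    "mps": 3.6,
    "mph": 1.609344,
}

# A non-negative number, optionally followed by a unit suffix.
_SPEED_RE = re.compile(r"^\s*(\d+(?:\.\d+)?)\s*([a-z/]+)?\s*$", re.IGNORECASE)


def parse_cutoff_speed(text: str, default_unit: str = "km/h") -> float:
    """Convert '120', '33 m/s', '75mph', 'inf', ... into km/h (hypothetical helper).

    "inf" or "infinity" turns the cutoff off; no other Python float syntax is exposed.
    """
    if text.strip().lower() in ("inf", "infinity"):
        return math.inf
    match = _SPEED_RE.match(text)
    if match is None:
        raise ValueError(f"invalid speed: {text!r}")
    value = float(match.group(1))
    unit = (match.group(2) or default_unit).lower()
    if unit not in _TO_KMH:
        raise ValueError(f"unknown unit {unit!r}, expected one of {sorted(_TO_KMH)}")
    return value * _TO_KMH[unit]


# Help text that states the meaning of "inf" in plain words (hypothetical wording).
CUTOFF_SPEED_HELP = (
    "Capture speed cutoff, e.g. 120, 120km/h, 33m/s or 75mph (default unit: km/h). "
    '"inf" or "infinity" turns this cutoff off.'
)
```

With argparse, the function could be passed as type=parse_cutoff_speed. Note that argparse reports a ValueError from a type function as a generic "invalid value" error; raising argparse.ArgumentTypeError instead would show the more specific message to the user.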
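The missing progress feedback while the description file is read could be addressed by parsing the file once and logging periodically while entries are checked. The sketch below is illustrative only: load_descriptions is a hypothetical function, and REQUIRED_KEYS is an assumed minimal field set, not the schema mapillary_tools actually validates against.

```python
import json
import logging
from pathlib import Path

log = logging.getLogger("desc_check")  # hypothetical logger name

# Assumed minimal set of fields per image entry; the real schema may differ.
REQUIRED_KEYS = ("filename", "MAPLatitude", "MAPLongitude", "MAPCaptureTime")


def load_descriptions(path: Path, report_every: int = 1000) -> list:
    """Parse the description file once, check each entry, and report progress."""
    log.info("Reading %s", path)
    with path.open("r", encoding="utf-8") as f:
        entries = json.load(f)  # a single parse of the whole file
    total = len(entries)
    log.info("Checking %d entries", total)
    valid = []
    for i, entry in enumerate(entries, start=1):
        missing = [key for key in REQUIRED_KEYS if key not in entry]
        if missing:
            log.warning("Entry %d (%s) lacks %s", i, entry.get("filename"), missing)
        else:
            valid.append(entry)
        # Periodic progress messages so a long check does not look like a hang.
        if i % report_every == 0 or i == total:
            log.info("Checked %d/%d entries", i, total)
    return valid
```

A per-entry key check like this is cheap; if heavier validation is required, the same loop is a natural place to split the entries across worker processes.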

Occasionally, I get this error when uploading:

2025-08-01 14:46:52,338 - WARNING - Error uploading at offset=None since begin_offset=None: HTTPError: 412 Client Error: Precondition Failed for url: https://rupload.facebook.com/mapillary_public_uploads/mly_tools_3e70a8cff2214740b9f8aab0b38c2791.jpg
2025-08-01 14:46:52,339 - INFO    - Retrying in 0 seconds (1/200)

However, overall the affected sequences continue to complete.

@GITNE Thank you for the excellent feedback and suggestions; they are very helpful as we keep improving our tools. All your suggestions make sense to me, and some are already on our roadmap.

Do not open and establish a separate TCP/HTTPS connection per image file.

Yes, this is going to be my next optimization. It will take some work, though, as we need to find a way to share HTTP sessions among multiple uploads (while making sure resumes, retries, and so on keep working). I think it is very doable, and it looks promising. Once it’s there, I’d expect uploading to be even more reliable with fewer connections (or upload workers). cc @boris @nikola
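One common way to share connections among many uploads is to give each worker thread its own keep-alive connection and reuse it for every file that thread sends, so the number of open connections never exceeds the number of workers. The sketch below uses only the standard library's http.client; PerThreadConnections and send_file are hypothetical names, and the POST method and request path are assumptions, not the actual upload protocol.

```python
import http.client
import threading


class PerThreadConnections:
    """One reusable (keep-alive) HTTPS connection per worker thread (hypothetical)."""

    def __init__(self, host: str, timeout: float = 60.0) -> None:
        self._host = host
        self._timeout = timeout
        self._local = threading.local()
        self._lock = threading.Lock()
        self.created = 0  # number of connection objects created so far

    def get(self) -> http.client.HTTPSConnection:
        conn = getattr(self._local, "conn", None)
        if conn is None:
            # No socket is opened here; TCP/TLS setup happens on the first request.
            conn = http.client.HTTPSConnection(self._host, timeout=self._timeout)
            self._local.conn = conn
            with self._lock:
                self.created += 1
        return conn

    def discard(self) -> None:
        """Drop this thread's connection after an error so the next call reconnects."""
        conn = getattr(self._local, "conn", None)
        if conn is not None:
            conn.close()
            self._local.conn = None


def send_file(pool: PerThreadConnections, url_path: str, body: bytes, headers: dict) -> int:
    """Send one file over the calling thread's connection and return the HTTP status."""
    conn = pool.get()
    try:
        conn.request("POST", url_path, body=body, headers=headers)  # method assumed
        response = conn.getresponse()
        response.read()  # drain the body; otherwise the connection cannot be reused
        return response.status
    except (http.client.HTTPException, OSError):
        pool.discard()  # a stale keep-alive socket is replaced on the next call
        raise
```

Per-thread connections avoid sharing one connection object between threads, which http.client does not support; retries and resumable offsets would still need to be layered on top.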

For example, although I am on a 1 Gbps fiber link, I had to set MAPILLARY_TOOLS_MAX_IMAGE_UPLOAD_WORKERS to 8 for my machine’s network stack not to get overburdened.

The default of 64 upload workers is an empirical number from my test environments, where it starts to saturate the bandwidth. As your comment shows, it is not easy to find a constant that works for all network environments. Let’s see if the keep-alive optimization lets us reduce the default number of workers to 32 or 16. If not, we will look for a dynamic way to increase or decrease the number of workers. It is also a good point to expose the envvar as a CLI parameter so users can adjust their bandwidth usage.
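A dynamic worker count could follow the additive-increase/multiplicative-decrease (AIMD) pattern that TCP congestion control uses: slowly raise the number of concurrent uploads while they succeed, and halve it when connection-level errors appear. This is a simpler proxy than measuring throughput directly, as suggested earlier in the thread. AdaptiveLimiter and upload_all are hypothetical names, and the initial value, bounds, and step size are illustrative assumptions.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class AdaptiveLimiter:
    """AIMD limit on concurrent uploads (hypothetical)."""

    def __init__(self, initial: int = 8, minimum: int = 1, maximum: int = 64,
                 successes_per_step: int = 16) -> None:
        self.limit = initial
        self.minimum = minimum
        self.maximum = maximum
        self.successes_per_step = successes_per_step
        self._in_flight = 0
        self._streak = 0
        self._cond = threading.Condition()

    def acquire(self) -> None:
        with self._cond:
            while self._in_flight >= self.limit:
                self._cond.wait()
            self._in_flight += 1

    def release(self, ok: bool) -> None:
        with self._cond:
            self._in_flight -= 1
            if ok:
                self._streak += 1
                if self._streak >= self.successes_per_step and self.limit < self.maximum:
                    self.limit += 1  # additive increase after a run of successes
                    self._streak = 0
            else:
                self.limit = max(self.minimum, self.limit // 2)  # multiplicative decrease
                self._streak = 0
            self._cond.notify_all()


def upload_all(items, upload_one, limiter: AdaptiveLimiter, max_workers: int = 64) -> list:
    """Run uploads on a fixed thread pool, but only `limiter.limit` at a time."""
    def task(item) -> bool:
        limiter.acquire()
        ok = False
        try:
            upload_one(item)
            ok = True
        except OSError:
            pass  # connection-level failure: reported as False, limit is halved
        finally:
            limiter.release(ok)  # other exceptions also halve the limit, then propagate
        return ok

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(task, items))
```

Halving on every error backs off quickly when the local network stack is overwhelmed, at the cost of ramping up again slowly after a transient error.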

Orthogonality remains a key design principle for usability and clarity

Yes, we are aware of the challenges. In H2 we plan to improve the CLI’s UI/UX, including redesigning these envvars and CLI params. Stay tuned.

Enable the --skip_subfolders option with the upload command.

That’s a good point. Let’s leave it for v0.15 as it’s a breaking change.

Unexpectedly, using the --verbose option on the process command basically silences output when correlating GPX tracks to images in the final write step.

Good callout. Fixing in the next version.

To make things even simpler, you do not actually need the --desc_path option with the upload command.

In some cases, users need to upload only a subset of the files listed in the description JSON file; then mapillary_tools upload import_path --desc_path=desc.json can be useful.
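The path handling proposed earlier in the thread (a file argument is a description file, a directory is an import path searched recursively unless --skip_subfolders is set) would still cover the subset case described here, because a hand-made subset file can be passed directly. The sketch below is hypothetical: find_description_files is not part of mapillary_tools, and it only illustrates the proposal, not current behavior.

```python
from pathlib import Path

DESC_FILENAME = "mapillary_image_description.json"


def find_description_files(paths: list, skip_subfolders: bool = False) -> list:
    """Resolve upload arguments into description files, in the order given (hypothetical)."""
    found = []
    seen = set()
    for raw in paths:
        path = Path(raw)
        if path.is_file():
            candidates = [path]  # explicit description file, e.g. a hand-made subset
        elif path.is_dir():
            if skip_subfolders:
                direct = path / DESC_FILENAME
                candidates = [direct] if direct.is_file() else []
            else:
                candidates = sorted(path.rglob(DESC_FILENAME))  # recursive search
        else:
            raise FileNotFoundError(raw)
        for candidate in candidates:
            resolved = candidate.resolve()
            if resolved not in seen:  # the same file reached twice is uploaded once
                seen.add(resolved)
                found.append(resolved)
    return found
```

Sorting the recursive matches keeps the upload order deterministic within each import path, while the order of the arguments themselves is preserved.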


I see. Then I guess I had not figured out the actual purpose of the --desc_path option. However, in this case I would expect users to create subset description files. The case you point out also implies a caveat: besides the process command, the upload command can create sequences too, or at least control sequence creation, which kind of muddles the whole process → upload two-step concept. But maybe I am missing something?