Mapillary Tools 0.14 is released

To upgrade:

pip install -U mapillary_tools

# or simply run
uv tool run mapillary_tools --version
mapillary_tools version 0.14.0

:glowing_star: Highlights

  • :rocket: Performance Boosts:
    Enjoy massive speed improvements across the board, including faster native EXIF, GoPro, CAMM, and Blackvue video processing. Uploads and processing are now significantly more responsive, making large dataset handling smoother than ever.

  • :hammer_and_wrench: Built-in ExifTool Fallback:
    Mapillary Tools now uses its optimized native parsers for EXIF, CAMM, GoPro, and related metadata by default. If native parsing fails, it seamlessly falls back to the ExifTool installed on your system, ensuring robust and reliable metadata extraction for all your files (see the illustrative sketch after these highlights).

  • :outbox_tray: File-by-File Upload:
    Images are now uploaded individually, eliminating the need to create zip archives before uploading. This approach greatly reduces intermediate disk usage, speeds up the upload process, and enhances reliability—especially with large or interrupted uploads.
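
To illustrate the fallback pattern described above, here is a rough sketch only; parse_native and ParseError are hypothetical stand-ins, not mapillary_tools' actual API:

import json
import subprocess

class ParseError(Exception):
    """Raised by the (hypothetical) native parser on unsupported input."""

def parse_native(path: str) -> dict:
    # Stand-in for the optimized built-in EXIF/CAMM/GoPro parsers.
    raise ParseError(path)

def extract_metadata(path: str) -> dict:
    try:
        return parse_native(path)
    except ParseError:
        # Fall back to the system ExifTool; -j emits JSON (exiftool must be on PATH).
        out = subprocess.run(["exiftool", "-j", path],
                             capture_output=True, text=True, check=True).stdout
        return json.loads(out)[0]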

See the full release note:

There are massive improvements here, great work @tao!

Folks - we’d love your feedback on how this new version is working for you.

Note that we will also incorporate this new version and its capabilities into an upcoming release of Desktop Uploader (stay tuned!). cc: @nikola

Decided to download mapillary_tools on my Win10 machine.

My PC pulled an all-nighter,

uploading all night. I woke up and IT WAS UPLOADED! (Not like the app, which takes 5 whooole days.)

Here are my initial findings on 0.14.

  • Orthogonality remains a key design principle for usability and clarity
    Expose all sequence limits and cutoffs to the user as --cutoff_* options too. There is really no need to hide them because they all have default values:
    • MAPILLARY_TOOLS_MAX_CAPTURE_SPEED_KMH → --cutoff_speed, with unit suffixes like mps, m/s, kph, km/h, or mph
    • MAPILLARY_TOOLS_MAX_SEQUENCE_LENGTH → --cutoff_count
    • MAPILLARY_TOOLS_MAX_SEQUENCE_FILESIZE → --cutoff_filesize
    • MAPILLARY_TOOLS_MAX_SEQUENCE_PIXELS → --cutoff_pixels
  • Naming is crucial in every design
    • MAPILLARY_TOOLS_MAX_SEQUENCE_LENGTH can be misleading and is actually ambiguous because LENGTH is a dimension of distance, not an amount. Please rename it to MAPILLARY_TOOLS_MAX_SEQUENCE_COUNT.
    • Rename ‑‑interpolate_directions to or supplement with ‑‑normalize_directions because it does not actually interpolate anything.
  • mapillary_tools’ implementation should remain opaque to the user, and so should Python and its intricacies. The (±)inf and (±)infinity literals are a special Python artifact. Please describe inf‑accepting options on the --help page with something like “inf” or “infinity” turns this limit off.
  • Unexpectedly, using the --verbose option on the process command basically silences output during the final write step, when GPX tracks are correlated to images; that is, exactly when it is most needed. Users should see a printout of what is written or modified.
  • Reading the mapillary_image_description.json file is a tight bottleneck because the JSON validator implementation is poor: it relies on Python’s slow string handling and processes sequentially on a single core only. Plus, there is no message about it or its progress for the user, so it may look like mapillary_tools has hung. When I want to upload a few thousand files, it takes up to an hour before this file has been validated and anything starts going over the wire. This is unacceptable. Besides, does this description file really have to carry bloat data? If I understand correctly, this file is supposed to just group image files into sequences.
  • It makes no sense to have to pass an import path to the upload command when you also pass a description file via the --desc_path option, because the description file contains absolute image file paths by design. Passing an import path would only make sense in this case if the description file contained relative image file paths.
  • The upload command should search description files recursively for every passed import path.
  • Passing multiple import paths to the upload command should not require also passing a description file via the --desc_path option. Again, upload should just search the import paths for description files and upload them in the order passed. --desc_path should be treated as an override for any description files found in the import paths or any of their sub‑folders.
  • To make things even simpler, you do not actually need the --desc_path option with the upload command at all. Just let users pass either paths to description files or import paths: description files describe things that are ready to upload, and import paths may contain ready‑to‑upload description files. If the passed path is a file, treat it as a description file; otherwise, treat it as an import path.
  • Enable the --skip_subfolders option with the upload command. This ties in with making the upload command search import path sub‑folders for description files.
  • Do not open and establish a separate TCP/HTTPS connection per image file. Reuse existing (keep-alive) connections. Keep the number of concurrent connections as low as possible, and only create additional connections for as long as they bring additional throughput without oversaturating the link’s overall throughput (see the sketch after this list). While it is a smart strategy to use multiple TCP/HTTPS connections on broadband links to increase throughput, you cannot give a static number that is going to work for all links. For example, although I am on a 1 Gbps fiber link, I had to set MAPILLARY_TOOLS_MAX_IMAGE_UPLOAD_WORKERS to 8 so that my machine’s network stack would not get overburdened. mapillary_tools’ default of 64 upload workers basically caused my network stack to stop opening new connections, and mapillary_tools spewed errors about being unable to find the upload host, open new connections, or upload files. Granted, this usually only happened when uploading sequences with a couple of thousand files, right before finishing such a sequence. But anyhow, this should not happen. Unfortunately, this also had another ugly side effect in that mapillary_tools sort of blocked all other apps from opening network connections for some time because the network stack was busy closing connections and freeing resources.
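
A minimal sketch of the kind of connection reuse I mean, assuming requests and a bounded thread pool (illustrative only; the endpoint is the one visible in the logs in this thread, and a real uploader would additionally need auth headers, retries, and resume logic):

import concurrent.futures
from pathlib import Path

import requests  # the tracebacks later in this thread show mapillary_tools uses requests

UPLOAD_URL = "https://rupload.facebook.com/mapillary_public_uploads/"  # endpoint seen in the logs
MAX_WORKERS = 8  # tune per link; the default of 64 overburdened my network stack

# One shared session means one pool of keep-alive connections,
# instead of a fresh TCP/TLS handshake per image file.
session = requests.Session()
adapter = requests.adapters.HTTPAdapter(pool_connections=MAX_WORKERS, pool_maxsize=MAX_WORKERS)
session.mount("https://", adapter)

def upload_one(path: Path) -> int:
    with path.open("rb") as fp:
        # Reuses an idle pooled connection whenever one is available
        resp = session.post(UPLOAD_URL + path.name, data=fp, timeout=60)
    resp.raise_for_status()
    return resp.status_code

def upload_all(paths: list[Path]) -> None:
    with concurrent.futures.ThreadPoolExecutor(max_workers=MAX_WORKERS) as pool:
        list(pool.map(upload_one, paths))  # real code would track progress and retry failures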

Occasionally, I get this error when uploading:

2025-08-01 14:46:52,338 - WARNING - Error uploading at offset=None since begin_offset=None: HTTPError: 412 Client Error: Precondition Failed for url: https://rupload.facebook.com/mapillary_public_uploads/mly_tools_3e70a8cff2214740b9f8aab0b38c2791.jpg
2025-08-01 14:46:52,339 - INFO    - Retrying in 0 seconds (1/200)

However, overall the affected sequences continue to complete.

@GITNE Thank you for the excellent feedback and suggestions — they are very helpful for us in improving our tools. All your suggestions make sense to me, and some are already on our roadmap.

Do not open and establish a separate TCP/HTTPS connection per image file.

Yes, this is going to be my next optimization. It’s going to be some work though, as we need to find a way to share HTTP sessions among multiple uploads (and make sure resumes, retries, and so on continue to work). I think it is very doable and looks promising. Once it’s there, I’d expect uploading to be even more reliable with fewer connections (or upload workers). cc @boris @nikola.

For example, although I am on a 1 Gbps fiber link, I had to set MAPILLARY_TOOLS_MAX_IMAGE_UPLOAD_WORKERS to 8 for my machine’s network stack not to get overburdened.

The default of 64 upload workers is an empirical number from my test environments, where it starts to saturate the bandwidth. Per your comment, it is not easy to find a constant that works for all network environments. Let’s see if the keep-alive optimization can help us reduce the default number of workers to 32 or 16. If not, we will see if we can find a dynamic way to increase/decrease workers. Another good point: exposing the envvar as a CLI param so users can adjust bandwidth usage.

Orthogonality remains a key design principle for usability and clarity

Yes, we are aware of the challenges. In H2 we are planning to improve the CLI’s UI/UX, including redesigning these envvars and CLI params. Stay tuned.

Enable the --skip_subfolders option with the upload command.

That’s a good point. Let’s leave it for v0.15 as it’s a breaking change.

Unexpectedly, using the --verbose option on the process command basically silences output when correlating GPX tracks to images in the final write step.

Good callout. Fixing in the next version.

To make things even simpler, you do not actually need the --desc_path option with the upload command.

In some cases, users need to upload just a subset of the files extracted into the description JSON file; then upload import_path --desc_path=desc.json can be useful.
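
For example (paths are illustrative):

mapillary_tools upload MyImages/ride_home --desc_path mapillary_image_description.json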

I see. Then I guess I have not figured out the proper reason for the --desc_path option. However, I would expect users to create subset description files in this case, because the case you point out also suggests an additional caveat: the upload command can create sequences too, or enables sequence creation control besides the process command, which kind of muddles the whole process → upload command-steps concept. But maybe I am missing something?

Hello,

thanks @GITNE for telling me about the env var MAPILLARY_TOOLS_MAX_IMAGE_UPLOAD_WORKERS.

I was having a lot of problems uploading with 0.14 before. Now it works OK: I set the var to 3 and my problems are gone.
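
For reference, a one-off override in a POSIX shell looks like this (the import path is illustrative):

MAPILLARY_TOOLS_MAX_IMAGE_UPLOAD_WORKERS=3 mapillary_tools upload MyImages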

Problems with the default number of workers: a high number of errors and warnings like:

WARNING - Error uploading at offset=0 since begin_offset=0: SSLError: HTTPSConnectionPool(host='rupload.facebook.com', port=443): Max retries exceeded with url: /mapillary_public_uploads/mly_tools_91b1ebecc021e2ab2b3ea8a857fd4efc.jpg (Caused by SSLError(SSLError(5, '[SSL: UNEXPECTED_EOF_WHILE_READING] unexpected eof while reading (_ssl.c:2649)')))

ERROR - HTTPError: POST https://rupload.facebook.com/mapillary_public_uploads/mly_tools_7e71423ece9379045559204ee011815b.jpg => 400 Bad Request: {"debug_info": {"retriable": false, "type": "StorageWriteFailedError", "message": "Failed to write to storage"}}

Error uploading at offset=0 since begin_offset=0: ConnectionError: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))

But the real problems came with no error message. The upload was just cancelled, with no message, and in a change from 0.14b1 the whole command ended. So when I have 15 sequences in the folder to upload and sequence 8 fails without an error, it just stops and does not try to upload the next sequence. One example:

Uploading IMAGE (38/46): 52%|████████████████████████████████████████████████████████████████████████████████████████▊ | 311M/598M [04:33<00:22, 13.2MB/s]

2025-08-12 19:50:21,158 - INFO - ==> Upload summary
2025-08-12 19:50:21,158 - INFO - Skipped 36 already uploaded sequences (use --reupload to force re-upload)
2025-08-12 19:50:21,158 - INFO - 1 sequences uploaded
2025-08-12 19:50:21,158 - INFO - 367.9 MB read in total
2025-08-12 19:50:21,158 - INFO - 367.9 MB uploaded
2025-08-12 19:50:21,158 - INFO - 15.7976 upload time

As you can see, 46 sequences in the folder; sequence 38 failed with no reason given (none of the frequent error messages was written), and the upload command ended even though it could have tried to go on with sequence 39 like the beta did.

Another annoying thing: the same sequence shows up in my profile up to 6 times! So I guess the acknowledgement of a finished sequence could not be retrieved properly with too many workers. This bug did not happen anymore when using only 3 workers.

The sequence split bug is still the same as I reported for the last beta; no change here.

Another thing that has been bugging me ever since is the naming of the ‑‑interpolate_directions option. It does not actually interpolate anything; it normalizes directions. Hence, I would highly appreciate it if it were renamed to, or at least supplemented with, a synonymous ‑‑normalize_directions option. Thank you.

@GITNE Uploading with keep-alive and logging for “process” were added in Release v0.14.1 · mapillary/mapillary_tools · GitHub

Let me know if you still see issues with uploading.

@Teddy73 It looks strange; I can’t reproduce the issues you reported. MT does guarantee outputs = inputs unless fatal errors occur, and those are usually printed out, at least.

In your second case, it showed 46 sequences to upload; it then showed 36 sequences were already uploaded and hence skipped, and finally 1 sequence was uploaded. That leaves 9 sequences that didn’t show any messages.

Would be great if you could help us debug with the following:

Thanks!

Have some strangeness with a 14.1 upload. Done as separate process then upload steps. No tools environment settings made.

Abended; last few lines below. It doesn’t like uploading zipped logs. The big worry for me is wondering whether I just paid my provider to upload 680 MBytes that just got dumped.

Restarted just the upload and it seemed to recover okay. Might be good to handle the output of this better.

05:37:29.686 - DEBUG - UPLOAD_PROGRESS: {"sequence_idx": 0, "total_sequence_count": 23, "sequence_image_count": 340, "sequence_uuid": "0", "file_type": "image", "sequence_md5sum": "7a8114281963098466fbc8ac74bc4efd", "entity_size": 1582390, "upload_start_time": 1755891120.0699842, "upload_total_time": 0, "upload_last_restart_time": 1755891120.0699842, "upload_first_offset": 0, "import_path": "/mybook/mupload/20250822WS/process/20250821_224337_192.jpg", "chunk_size": 1582061, "retries": 0, "begin_offset": 0, "offset": 1582390}
05:37:29.689 - DEBUG - UPLOAD_FAILED: {"sequence_idx": 0, "total_sequence_count": 23, "sequence_image_count": 340, "sequence_uuid": "0", "file_type": "image", "sequence_md5sum": "7a8114281963098466fbc8ac74bc4efd", "entity_size": 686702963, "upload_start_time": 1755891120.0699842, "upload_total_time": 0, "upload_last_restart_time": 1755891120.0699842, "upload_first_offset": 0}
05:37:30.393 - INFO - ==> Upload summary
05:37:30.393 - INFO - Nothing uploaded. Bye.
Traceback (most recent call last):
File "main.py", line 8, in <module>
File "mapillary_tools/commands/main.py", line 156, in main
File "mapillary_tools/commands/upload.py", line 82, in run
File "mapillary_tools/upload.py", line 122, in upload
File "mapillary_tools/upload.py", line 113, in upload
File "mapillary_tools/upload.py", line 632, in _continue_or_fail
File "mapillary_tools/uploader.py", line 559, in upload_images
File "mapillary_tools/uploader.py", line 585, in _upload_sequence_and_finish
File "mapillary_tools/uploader.py", line 580, in _upload_sequence_and_finish
File "mapillary_tools/uploader.py", line 656, in _upload_images_parallel
File "concurrent/futures/_base.py", line 456, in result
File "concurrent/futures/_base.py", line 401, in __get_result
File "concurrent/futures/thread.py", line 59, in run
File "mapillary_tools/uploader.py", line 690, in _upload_images_from_queue
File "mapillary_tools/uploader.py", line 741, in __init__
File "mapillary_tools/uploader.py", line 822, in _maybe_create_persistent_cache_instance
File "mapillary_tools/history.py", line 146, in clear_expired
File "dbm/__init__.py", line 89, in open
dbm.error: db type could not be determined
[PYI-2473:ERROR] Failed to execute script 'main' due to unhandled exception!

Thanks for the feedback @bob3bob3

Looks like there is a chance the intermediate database can get corrupted if you:

  1. Run uploads partially
  2. Update mapillary_tools or Python to another version
  3. Resume the uploads

Can you share more details, like which Python version you ran?

Thanks @tao

Using Python 3.11.2 (Bookworm) and the 14.1 AppImage.

After that job’s re-upload, I ran two batches of BlackVue videos. The first (53) worked fine, but the second (120) only got as far as 20. Restarts continued to fail even with --skip_process_errors, the completed vids removed from the tree, and a modem restart (new IP address).

06:42:14.114 - DEBUG - HTTP POST https://rupload.facebook.com/mapillary_public_uploads/mly_tools_469ccc37ef3947d8c6ad73741d6762d7.mp4 HEADERS={'Offset': '0', 'X-Entity-Name': 'mly_tools_469ccc37ef3947d8c6ad73741d6762d7.mp4'} TIMEOUT=(60, 924.9480078125)
06:42:15.308 - DEBUG - UPLOAD_PROGRESS: {"total_sequence_count": 120, "sequence_idx": 20, "file_type": "blackvue", "import_path": "/mybook/mupload/20250822BV/20250822_102404_NF.mp4", "sequence_md5sum": "856f804beef0a7ed0971c5967450e5fd", "entity_size": 47357338, "chunk_size": 2097152, "retries": 0, "begin_offset": 0, "upload_start_time": 1755895333.2785213, "upload_total_time": 0, "upload_last_restart_time": 1755895334.1145275, "upload_first_offset": 0, "offset": 2097152}
06:42:26.972 - DEBUG - HTTP 400 Bad Request (12857 ms): {"debug_info": {"retriable": false, "type": "ProcessingFailedError", "message": "Request processing failed"}}
06:42:26.973 - DEBUG - UPLOAD_FAILED: {"total_sequence_count": 120, "sequence_idx": 20, "file_type": "blackvue", "import_path": "/mybook/mupload/20250822BV/20250822_102404_NF.mp4", "sequence_md5sum": "856f804beef0a7ed0971c5967450e5fd", "entity_size": 47357338, "chunk_size": 1219994, "retries": 0, "begin_offset": 0, "upload_start_time": 1755895333.2785213, "upload_total_time": 0, "upload_last_restart_time": 1755895334.1145275, "upload_first_offset": 0, "offset": 47357338, "_last_upload_progress_debug_at": 1755895335.3077962}
06:42:27.530 - INFO - ==> Upload summary
06:42:27.530 - INFO - 20 blackvue uploaded
06:42:27.530 - INFO - 939.1 MB read in total
06:42:27.531 - INFO - 939.1 MB uploaded
06:42:27.531 - INFO - 262.763 seconds upload time
06:42:27.531 - ERROR - HTTPError: POST https://rupload.facebook.com/mapillary_public_uploads/mly_tools_469ccc37ef3947d8c6ad73741d6762d7.mp4 => 400 Bad Request: {"debug_info": {"retriable": false, "type": "ProcessingFailedError", "message": "Request processing failed"}}
Traceback (most recent call last):
File "mapillary_tools/commands/main.py", line 156, in main
File "mapillary_tools/commands/process_and_upload.py", line 33, in run
File "mapillary_tools/commands/upload.py", line 82, in run
File "mapillary_tools/upload.py", line 122, in upload
File "mapillary_tools/upload.py", line 113, in upload
File "mapillary_tools/upload.py", line 630, in _continue_or_fail
File "mapillary_tools/uploader.py", line 250, in upload_videos
File "mapillary_tools/uploader.py", line 902, in upload_stream
File "mapillary_tools/uploader.py", line 994, in _handle_upload_exception
File "mapillary_tools/uploader.py", line 895, in upload_stream
File "mapillary_tools/uploader.py", line 1068, in _upload_stream_retryable
File "mapillary_tools/upload_api_v4.py", line 163, in upload_shifted_chunks
File "requests/models.py", line 1026, in raise_for_status
requests.exceptions.HTTPError: 400 Client Error: Bad Request for url: https://rupload.facebook.com/mapillary_public_uploads/mly_tools_469ccc37ef3947d8c6ad73741d6762d7.mp4

This is a server-side issue. Try reuploading in a few hours or the next day and it should work.

This issue `db type could not be determined` should be fixed in Release v0.14.2 · mapillary/mapillary_tools · GitHub

FYI @GITNE JSON validation is also slightly improved in v0.14.2 above.

What’s specified in --desc_path can be considered a metadata database. Users can upload just part of the files extracted into the database via upload subset_of_import_paths instead of changing the database.

Right now it’s stored as JSON. In the next versions we might consider other, more compact formats, e.g. sqlite3 or protobuf.

You can think of process as exiftool (but specialized for geotagging): it extracts metadata and stores it in desc_path. upload simply uploads selected entries from the database.
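
A typical flow might look like this (paths and the description file name are illustrative):

# extract and geotag metadata into the description file
mapillary_tools process MyImages --desc_path mapillary_image_description.json
# upload what the description file lists
mapillary_tools upload MyImages --desc_path mapillary_image_description.json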

Thank you for iterating and implementing improvements quickly. :+1:

I do understand the reasoning behind the upload command’s ‑‑desc_path option (with an import path) and it is fair. What bugs me is that you cannot pass the upload command multiple import paths without also passing the ‑‑desc_path option. Intuitively, one would expect upload to recursively search every import path for description files and act on them.

Great, this should be a smart move! :smile: Personally, I would favor an SQLite database over Protobuf, mainly because SQLite has a simple, widely available tool for examining and modifying SQLite databases during debugging and development. Imho it is much easier to figure out what has gone wrong with SQLite than with Protobuf. Plus, SQLite is lightweight, fast, has a low memory footprint, produces a compact single file, and enables fast search and structured modification. Validation is also extremely fast. Protobuf is neat too, but it has been designed more with serialization (over the wire) in mind than with searching and modifying content quickly and easily. Content transformation with Protobuf can become a real pain in the rear end. SQLite basically encapsulates structured content data and metadata (the data structure description) together in one artifact, whereas Protobuf relies on sidecar .proto files to describe the data structure (metadata) and thus also how to read and write the data (hence the trouble when transforming data). Please do not get me wrong; JSON is also great (mainly because it is simple, self‑describing text and thus human readable), but implementing performant JSON validation is really non‑trivial.
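
As a purely hypothetical sketch (not an actual mapillary_tools schema), a description database in Python’s built-in sqlite3 could look like this:

import sqlite3

con = sqlite3.connect("mapillary_image_description.sqlite")
con.execute("""
    CREATE TABLE IF NOT EXISTS image_descriptions (
        filename    TEXT PRIMARY KEY,  -- absolute image path
        sequence_id TEXT NOT NULL,     -- groups images into sequences
        captured_at TEXT NOT NULL,     -- capture timestamp
        latitude    REAL NOT NULL,
        longitude   REAL NOT NULL
    )
""")
con.execute("CREATE INDEX IF NOT EXISTS idx_seq ON image_descriptions(sequence_id)")
con.commit()

# Structured lookup replaces re-validating one huge JSON document:
subset = con.execute(
    "SELECT filename FROM image_descriptions WHERE sequence_id = ?", ("0",)
).fetchall()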

Anyway, once you decide on a structurally searchable and transformable data format, you may then also make it live in a presumably always-writable location, like $XDG_CACHE_HOME/mapillary_tools, rather than in an arbitrary import path. Upload history could also become part of this data structure. :wink:
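
Resolving that location per the XDG Base Directory spec is tiny in Python (sketch):

import os
from pathlib import Path

# Use $XDG_CACHE_HOME if set, else fall back to ~/.cache as the spec requires
cache_home = Path(os.environ.get("XDG_CACHE_HOME") or Path.home() / ".cache")
mt_cache = cache_home / "mapillary_tools"
mt_cache.mkdir(parents=True, exist_ok=True)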

Thanks @GITNE

Yes, sqlite3 sounds very promising given all the advantages you mentioned, plus its built-in support in Python. Actually, MT is already using it behind the scenes to cache some upload results, so I think it is a natural move to eventually use sqlite3 for most local storage/caching. cc @nikola

Edit - Ignore this. A WiFi router power cycle solved it. Tnx

Just some feedback

I upload over variable-speed (often slow) 4G networks and have generally found that the default MAPILLARY_TOOLS_MAX_IMAGE_UPLOAD_WORKERS in 14.1 and 14.2 chokes the network connection between the laptop(s) and the WiFi/4G router. AFAIK 4G is usually asymmetric as well, if that is significant. I can’t, for example, do concurrent OSM editing like I used to (i.e. mapillary_tools uploading on one laptop makes the other laptop mostly unusable).

I am assuming that setting the environment variable MAPILLARY_TOOLS_MAX_IMAGE_UPLOAD_WORKERS=1 up front will solve this, but I haven’t tried it yet.