I am about to start using a “new” camera in the vehicle that requires some time to process each day’s take. After some local processing I run the job in separate “process” and “upload” steps.
For a 200-300 km travel day I might have 40,000 images, and the process step can take 4-5 hours on an i5 laptop. This is in a vehicle, so there is an energy budget as well. The images have fully populated EXIF, so there is no geotagging or rate/distance calculation to worry about.
I know most of the Linux OS tricks to optimise this, but would appreciate any general ideas. I note too that --num_processes would be set to the number of CPUs and wonder if an increase would help; the job currently isn’t very SSD-I/O- or CPU-intensive. I should point out that 40 GBytes/day over a USB2 plug-in drive is probably a bottleneck, so I’ll likely increase the read-ahead buffer and write delay. From memory there are also hard references to the physical file locations, so running “process” on the SSD and “upload” from a USB2-attached drive (after moving the files) can be problematic.
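Aside from system-wide read-ahead tuning, one per-process option on Linux is to hint sequential access when reading the big files, which lets the kernel ramp up read-ahead for that descriptor. A minimal sketch (the helper name and chunk size are my own choices, not anything from mapillary_tools):

```python
import os

def read_sequential(path, chunk=4 * 1024 * 1024, consume=None):
    """Read a file start-to-finish after hinting sequential access
    to the kernel (Linux posix_fadvise); returns total bytes read."""
    fd = os.open(path, os.O_RDONLY)
    try:
        # Tell the kernel this descriptor will be read sequentially,
        # so it can increase read-ahead for this file.
        os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_SEQUENTIAL)
        total = 0
        while True:
            buf = os.read(fd, chunk)
            if not buf:
                break
            total += len(buf)
            if consume:
                consume(buf)
        return total
    finally:
        os.close(fd)
```

This only helps code you control (your own ffmpeg/exiftool wrappers), not mapillary_tools internally, but it is a cheap experiment on a USB2 drive.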
When planning longer recordings, you should be aware that, in my experience, Mapillary and GSV can only process file uploads with a maximum size of 40 GB each.
This means you might have to edit longer recordings. Another method is to reduce the number of GPX points accordingly. In the Insta 360 universe, I edit sequences with a maximum length of 5 minutes each and reduce the GPX track output by a factor of 10. A DJI Osmo produces sequences of 15 minutes each, which cannot be edited. I reduce the file size during export from a nominal 50 GB to 40 GB and reduce the GPX track by a factor of 28.
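Thinning a GPX track by a fixed factor is easy to script. A hedged sketch in Python that keeps every Nth trackpoint (plus the last) in each segment, assuming a standard GPX 1.1 file; the function name is my own:

```python
import xml.etree.ElementTree as ET

GPX_NS = "http://www.topografix.com/GPX/1/1"  # GPX 1.1 namespace

def thin_gpx(path_in, path_out, factor):
    """Keep every `factor`-th <trkpt> (plus the last one) in each <trkseg>."""
    ET.register_namespace("", GPX_NS)
    tree = ET.parse(path_in)
    for seg in tree.getroot().iter(f"{{{GPX_NS}}}trkseg"):
        pts = [p for p in seg if p.tag == f"{{{GPX_NS}}}trkpt"]
        # Always retain the final point so the track still ends where it ended.
        keep = set(range(0, len(pts), factor)) | {len(pts) - 1}
        for i, pt in enumerate(pts):
            if i not in keep:
                seg.remove(pt)
    tree.write(path_out, xml_declaration=True, encoding="UTF-8")
```

For example, `thin_gpx("day.gpx", "day_thin.gpx", 10)` implements the factor-of-10 reduction mentioned above.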
This means that for each new camera, the optimal balance between recording duration, GPX density, and file size must be determined iteratively.
You should also keep in mind that longer journeys will very likely involve a mix of different speeds. Therefore, I recommend using video mode for such recordings and letting Mapillary decide how many frames are actually extracted from the video. The number of GPX nodes isn’t necessarily crucial here; Mapillary interpolates additional GPX points as needed. My experience has shown that a high GPX node density during upload is more of a hindrance than a help for this process.
Those using a GoPro MAX2 are in luck, as Mapillary takes care of everything for them.
Okay, thanks for that, but I don’t know how significant that will be in my case, as the tools are already looking at populated JPG EXIFs (I use exiftool to do the geotagging and interpolation). They don’t look at a GPX at all, and so far the 3 FPS frames have only had a few duplicate removals, which signify deliberately dropped frames. My process command line though does have:
Most of the take is at highway speeds of perhaps 8 m/frame. I’ll probably increase --duplicate_distance to about 2 m (and --duplicate_angle to maybe 20), but have to keep in mind that the images are only 2K and a bit gluggy, so object recognition may benefit from the extra frames.
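The per-frame spacing quoted above is just speed divided by capture rate; a quick sanity check of the ~8 m/frame figure, assuming the 3 FPS capture rate mentioned elsewhere in the thread:

```python
def frame_spacing_m(speed_kmh: float, fps: float) -> float:
    """Metres travelled between consecutive frames at a constant speed."""
    return (speed_kmh / 3.6) / fps

# At roughly 86 km/h and 3 FPS the spacing is about 8 m, matching the
# figure above, so a 2 m --duplicate_distance would mainly trim
# near-stationary frames rather than highway footage.
```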
For reference, my source GPS data is NMEA at 5/sec from a ublox, as a single daily file. The source videos (which I process with ffmpeg) are 1 GByte, 5-minute 2K MOVs at 30 FPS, from which I pull only the (1-in-10) index/key frames. The largest Mapillary sequences are then around 800-900 images.
Uploading to Mapillary and efficiency… I guess we can only dream. As you have already mentioned, a USB 2.0-connected drive is always going to be the main bottleneck in this scenario, and it will not help that the drive is an SSD. I too do my image post-processing on a USB 2.0-connected hard drive (for multiple reasons there is no need to go into here). The HDD is more than fast enough for USB 2.0, so I accept this bottleneck.
However, for me the tightest bottleneck is mapillary_tools itself. It is incredibly slow and inefficient. Most things happen on one CPU core or thread only, because of Python’s global interpreter lock (GIL). In other words, Python does not support true multi-threading to this very day, despite having a threading API module. So do not be fooled by the --num_processes and --max_upload_workers options; they do not do exactly what you think they should. Due to the GIL, Python supports asynchronicity at best. Effectively, this is comparable to Hyper-Threading at the CPU level, or to the cooperative task switching of Windows 3.0-3.11. Only because of I/O’s inherently asynchronous nature on modern operating systems do you get the illusion of multi-threading in Python, although it is not true multi-threading.
But this is not the worst part. mapillary_tools has many horrendous internal bottlenecks. For instance, it validates its own JSON image description files (yes, files it has just output itself!) with an awfully slow single-threaded validator. Then it MD5-hashes each file fully, also on a single thread only. When uploading, it does not even hash ahead all sequences in the upload queue; it only hashes the next sequence right before it commences uploading. All of the above may work okay for a handful of sequences with a couple of hundred images each, but it does not at all scale to hundreds of sequences with thousands of images per sequence. You, as a contributor, cannot fix this easily; this is rather something for Mapillary to do on their end. So, I am afraid I cannot give you any really helpful ideas due to the “elephant in the room”.
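To illustrate what “hashing ahead” could look like, here is a sketch that pre-computes MD5s for an entire queue with a process pool instead of hashing each sequence just before upload. The function names and integration are my own assumptions, not mapillary_tools’ actual API:

```python
import hashlib
from multiprocessing import Pool

def md5_file(path: str):
    """MD5 one file, streaming in 1 MiB chunks to keep memory flat."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1024 * 1024), b""):
            h.update(chunk)
    return path, h.hexdigest()

def prehash(paths, workers=4):
    """Hash every file in the upload queue ahead of time, in parallel,
    rather than serially right before each sequence uploads."""
    with Pool(workers) as pool:
        return dict(pool.imap_unordered(md5_file, paths))
```

On a USB2 drive the bus, not the CPU, may still dominate, but pre-hashing at least overlaps the work with earlier uploads instead of stalling each sequence.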
I have said it before and will say it again: Python is a rapid prototyping tool, nothing more, nothing less. It should not be used in any serious production code scenario, including running on thousands of end-user machines. The best solution to this problem would have been for Mapillary to publish an upload protocol specification years ago, as I have requested multiple times. That way, people would have innovated, competed, and implemented the best solution(s) for their needs. Instead, we are stuck with mapillary_tools as the GIL-hampered gate of pain and sorrow.
Actually, there is a piece of advice I can share with you, especially for your USB 2.0-connected drive scenario! I consistently get higher throughput on my drive when I set --num_processes 0, which effectively means single-threaded I/O.
I guess some questions then. Unfortunately I am stuck with what is available! I personally would have liked big uploaders to have a shell account on the Mapillary server and just use rsync to transfer batches of videos or images.
About multiple instances generally: I seem to remember you commented some time ago that you regularly ran the tools this way, and that some recent patch/upgrade (SQLite?) helped with the collisions that occasionally occurred. I also quite often run the tools on 4 camera-view jobs concurrently, and these collisions/abends haven’t happened for a while. Looking at the process table during uploading, I more or less figured that the --max_upload_workers functionality was handled that way.
In your opinion, then, can the tools be run concurrently reliably? That may even sit well alongside your --num_processes 0 setting. I have the vague impression that processing time is not linear in the number of images, i.e. 5,000 may take 20x longer than 1,000. Perhaps split the whole job into as many parts as there are CPUs?
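If one did split the job per CPU, the batching itself is trivial; a sketch that deals a list of sequence directories into one chunk per CPU. The `mapillary_tools process` invocation is left commented out and merely illustrative, since whether concurrent processing is worthwhile is exactly the open question here:

```python
import os

def chunk_round_robin(items, n_chunks):
    """Deal items round-robin into up to n_chunks near-equal lists."""
    chunks = [[] for _ in range(n_chunks)]
    for i, item in enumerate(items):
        chunks[i % n_chunks].append(item)
    return [c for c in chunks if c]  # drop empty chunks

# Example: one batch per CPU; each batch could then be handed to a
# separate `mapillary_tools process` instance, e.g. via subprocess.run().
# batches = chunk_round_robin(sequence_dirs, os.cpu_count() or 1)
```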
I am hoping to improve the process step, not the upload step. My last test run was about 5 hours of processing and 2 hours of uploading. The upload would not be hindered by a USB drive, but I wonder if running “process” on an i7/SSD and then transferring to the USB drive for upload on the i5 might help. I guess I’d also have to modify the JSON file for any file-location change, or do some creative symlinking. Can you see any other problem (apart from the log in .cache/mapillary_tools) with doing such a transfer?
For some of my processing I also have to throttle the laptop CPU, but I’m pretty sure running the tools never causes that. This all suggests an I/O bottleneck to me.
Well, this would be one idea, but I am not sure it would be a good solution even for power users. mapillary_tools is like driving with the parking (or hand) brake applied. I have my own set of ideas for how efficiency could best be improved, like publishing an upload protocol spec, among others. But this thread is not about sharing what could be done.
No, I do not run multiple instances, neither for processing nor uploading, and I have always advised against it for uploading due to the way mapillary_tools is implemented, especially the way it creates and handles the upload history and runs sequence upload sessions. However, AFAIK @TheWizard and @osmplus_org did, or continue to, run multiple instances. You can safely run the process sub-command concurrently, so there you can do as you like. However, you should not run the upload sub-command concurrently. The collisions you are referring to were race conditions when multiple upload workers (--num_upload_workers) had to write the upload history. This was an internal mapillary_tools bug, which was fixed quite some time ago. Generally speaking, for the sake of comparability, elimination of bugs, and thus an easier conversation, try to always run the latest mapillary_tools version, even though you usually cannot expect much of a performance improvement from each new version.
This is true, mostly because of the sleepy JSON validator.
Like you say, this should work, as long as you update the absolute file paths in the JSON image description file. These absolute image file paths are another unfortunate design choice, in my opinion. There may have been a rational cause, working around another issue that eludes me right now, but it is nevertheless unfortunate.
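Updating those paths is easy to script. A minimal sketch, assuming the description file is a JSON array of objects whose `filename` field holds the absolute image path; check the key name in your actual file before relying on this:

```python
import json

def rewrite_paths(desc_in, desc_out, old_prefix, new_prefix, key="filename"):
    """Re-point absolute image paths in a JSON image description file,
    e.g. after moving processed images from the SSD to a USB drive."""
    with open(desc_in) as f:
        entries = json.load(f)
    for entry in entries:
        # Only rewrite entries that actually start with the old prefix.
        if key in entry and entry[key].startswith(old_prefix):
            entry[key] = new_prefix + entry[key][len(old_prefix):]
    with open(desc_out, "w") as f:
        json.dump(entries, f, indent=2)
```

For example, `rewrite_paths("desc.json", "desc.json", "/home/me/ssd/", "/media/usb/")` would re-point a day’s take after the move; the prefixes shown are placeholders.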