How Many Images per Sequence?

In other words: is it better to upload a single long sequence or multiple shorter sequences for reconstruction? Which improves existing reconstruction results most?

Mapillary help says that sequences should be limited to 1,000 images. However, imho this limit is rather low and quite arbitrary, especially if you want to group your sequences into areas, like streets or neighborhoods. Things get even more limiting if you capture in video mode and want to make use of every frame: at 25 fps, 1,000 images makes for sequences only 40 s long, which is oddly short. I think I know the technical reason for this 1k-image soft limit (it has to do with OpenSfM’s RAM requirements), but imho it should not actually exist. In fact, if you think about it for a bit longer, there should be no image-count or storage-size limits on sequences at all. OpenSfM should probably process images in a stream with multiple passes rather than as one huge in-RAM dataset, which with the advent of SSDs is totally acceptable (though a hardware JPEG decoding step would surely help massively in this scenario).
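To make the streaming idea concrete, here is a toy sketch (this is an illustration only, not OpenSfM’s actual pipeline; `process` stands in for whatever per-chunk work a pass would do, e.g. feature extraction on one pass and matching on the next):

```python
from typing import Callable, Iterator, List

def chunks(paths: List[str], size: int) -> Iterator[List[str]]:
    """Yield image paths in fixed-size chunks, so only one chunk's
    worth of decoded data needs to be resident in RAM at a time."""
    for start in range(0, len(paths), size):
        yield paths[start:start + size]

def multipass(paths: List[str], size: int, passes: int,
              process: Callable[[List[str], int], None]) -> None:
    """Stream the whole dataset several times, rereading from disk
    on each pass instead of keeping everything in memory."""
    for pass_no in range(passes):
        for chunk in chunks(paths, size):
            process(chunk, pass_no)
```

With SSDs, rereading the data per pass trades cheap I/O for a RAM footprint bounded by the chunk size rather than the sequence length.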

Furthermore, it looks to me like OpenSfM chops up a sequence into bundles of 30 or so chronologically ordered images, reconstructs each bundle on its own, and then tries to align the bundles to each other. It works, but it seems to me like this may be the RAM-limiting factor. Why not reconstruct following a binary tree: search for the best-overlapping (reconstructed) image pairs (tuple bundles), then reconstruct pairs into quadruples (pairs of pairs), and so on?
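The binary-tree idea amounts to a bottom-up pairwise merge. A minimal sketch, assuming a hypothetical `merge` function that would align two partial reconstructions (here it is just a callback):

```python
from typing import Callable, List, TypeVar

T = TypeVar("T")

def hierarchical_merge(bundles: List[T], merge: Callable[[T, T], T]) -> T:
    """Merge reconstructions pairwise, bottom-up: pairs, then pairs
    of pairs, until a single reconstruction remains. With n leaf
    bundles this performs exactly n - 1 merges over log2(n) levels."""
    level = list(bundles)
    while len(level) > 1:
        next_level = []
        for i in range(0, len(level) - 1, 2):
            next_level.append(merge(level[i], level[i + 1]))
        if len(level) % 2:          # odd bundle out carries up a level
            next_level.append(level[-1])
        level = next_level
    return level[0]
```

Each level halves the number of partial reconstructions, so only the current level needs to be held at once.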

I am asking because something tells me that as things are right now it would be best to upload 1 (or maybe 2) image sequences for best results.

Anyhow, maybe I am completely confused and do not know what I am talking about. :face_with_hand_over_mouth: So, it would be nice if you could share some insight into all of this. Thanks!

I don’t give any thought to the end use when creating sequence splits.

I upload 2 fps BlackVue vids and a more or less continuous stream of (left-pointing) images from two phones, only at 3–38 km/h. They work out at around 0.5 to 2 fps and are not synchronous. (One uses Wi-Fi and is a lot slower than the other’s USB connection.)

The BlackVue vids are always 1 sequence per minute/file, so they end up with about 120 images each. (They used to miss 5-odd images due to the NMEA data in the file being offset by 2–3 seconds. Most of my older uploads are 117/118 frames.)

The phone-based images can breach the 500-frame sequence limit and split, but not often. I manually remove a stack of bad frames (e.g. sun in lens or blurred) or private frames (e.g. fronts of houses/homes where no other useful mapping features are in view) before running the tools.

For me the biggest factors are the time and cost of upload. Australia has a low population density compared to the rest of the world, making for weaker cell signals and higher cost. The average cell phone system design really wants to see a base site within maybe 3–5 km, yet I regularly upload at 10–12 km, consuming a lot of battery/solar energy to do so.

This is all not very relevant to your post, of course, but my view is that once the server could adjust lat/lon/direction/FOV for errors and break for date/time shifts, it would be possible to do anything in reconstruction. I.e. a sequence is only handy as an error-reduction method and a way to reduce processing/CPU time. Multiple frames (sequence-agnostic) will eventually be used. Might take a year to get there, though!

I often deliberately capture the same view twice (two sequences) on the side cameras, so as to fit between pedestrians and parked cars. I am, of course, looking for business names/types/contacts for OSM entry. If traffic permits I will also vary my road speed. Splitting captures over a few hours is also helpful for better lighting angles and for avoiding view obstructions.

My view… Cheers.

In theory, a sequence is just a data structure for grouping images, nothing more and nothing less. Reconstruction complexity should scale linearly, i.e. O(n). Reconstructing even hundreds of millions of images (with a couple of thousand features per image) per sector or batch should not be limited much by RAM these days. Sure, you can run out of RAM quickly if you massively crank up the number of features per image, but that is a function of diminishing returns, so you usually want to tune the process to as few features per image as possible while still getting acceptable reconstruction results in some finite time. :thinking: Maybe I am just missing something.

I suspect it is not linear but more a square or cubed thing (think processing areas and possibly volumes). I am not at all across the method, but it will be resource intensive: not just RAM but time and grunt. Complex things just take longer. If the maths resource used is large compared to the I/O resource, then the process may benefit from a widely distributed processing approach.

And one asks: are we only defining/recognizing objects, or building a 3D view (with other uses) first? Kind of like ISAR (inverse synthetic aperture radar), which uses a moving sensor platform not unlike the Mapillary sequence approach. I.e. “using a reasonably straight sensor line”, lock down as many variables as possible and process until a defined quality is achieved. If the quality isn’t achieved with one or a few sequences, then bring in more nearby ones until it is, or wait/prompt for more image input. Meaning: drive the need for more imagery not so much on what doesn’t exist, but on whether the quality of an area of interest is too low.

This is all just waffling and talking through my hat stuff though. I am sure the Mapillary people have sat down and figured the best course already.

You are right, I was overly optimistic. The naive implementation demands quadratic complexity. However, a binary-tree optimization would push it below quadratic, yet won’t make it linear. Nevertheless, I don’t think that RAM could be a limiting factor.
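To put rough numbers on the quadratic-vs-chunked difference, here is a back-of-the-envelope pair count (a toy model only; OpenSfM’s real matcher selects candidate pairs differently):

```python
def naive_pairs(n: int) -> int:
    """All-pairs matching: every image against every other."""
    return n * (n - 1) // 2

def chunked_pairs(n: int, chunk: int = 30) -> int:
    """Rough count when matching only within fixed-size chunks,
    plus one alignment link between neighbouring chunks."""
    full, rest = divmod(n, chunk)
    within = full * naive_pairs(chunk) + naive_pairs(rest)
    between = max(full + (1 if rest else 0) - 1, 0)
    return within + between
```

Under this model a 1,000-image sequence needs 499,500 naive pairs but only about 14,000 chunked ones, which hints at why chunking keeps memory and matching cost manageable.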

I am sure they have too, but this 1k figure makes no sense to me. :wink:

Quick update on that point: I think the 1,000 number was historical and no longer accurate. We have removed it from the help center. Generally, you can capture and upload as many images/videos as you’d like; the more the merrier!


From a reconstruction perspective, there is no big difference between uploading a long sequence as a single one or as two. Internally, we align different sequences to each other anyway.

The only thing that is not advisable is very short sequences. We use the fact that images are in a sequence to group their processing. Having very short sequences can limit the quality of the reconstructions.

That said, after say 50 images, it does not make a big difference. As @GITNE guesses, we currently process long sequences in chunks and therefore the total length of the sequence has little impact on the result.

Keep in mind though, that the way images are processed could change in the future. So the choice of the sequence length should be independent of it. If you are capturing images as a sequence, upload them as a sequence and don’t worry about how long or short it is.


@boris @paulinus Thank you for the clarification! It was really bugging me because some of my 1k+ sequences get stuck while others are processed rapidly. Hence, I was really confused, thinking maybe I was doing something wrong. :smile:
