Why is Mapillary breaking everything?

Thanks to Mapillary’s move away from GitHub to an email system, my issues are no longer public. However, I would like to share my experiences and see if other people are noticing things as well, or if I’m seriously unlucky.

Issue 1:
In mid-August, uploaded sequences were breaking out of order, causing “spiderwebs” across the map as the connecting lines between unordered points streaked everywhere. This issue was brought up in this thread (long read). The problem was fixed, and sequences were uploading fine.

However, a lot of my sequences remained broken, and Mapillary either overlooked them or some automatic process broke down. See this part of the map for an example. Unfortunately, sequence_bot got ahold of these broken sequences and chopped them up into little bits. So now my single sequences were made up of some longish ones, and MANY 1–5 photo sequences. I tried getting it resolved, but eventually tried deleting them myself. That leads to…

Issue 2:
I deleted a whole bunch of sequences manually, and they were either auto-approved under @peter’s name, or he actually manually approved the deletion. Great! But the photos weren’t completely gone. The points remain on the map, show a thumbnail on hover, but then pop up a “Sorry, we can’t find the image you’re looking for” message (click most of the points in the map view above).

I emailed Mapillary support on September 28th about this and issue 3 (we’ll get there). They eventually got back to me about it a full month later, saying they fixed issue 3, but mentioned nothing about issue 2. @Brenna said a couple days ago that they’re looking into that…

However, I recently tried deleting some more sequences, fixed from issue 3, and most of them left the map, but not all. See this orphaned bit? That’s what’s left. Hovering over the points shows no thumbnail, at least.

Issue 3:
So I was waiting for my deleted sequences to just disappear already. Meanwhile, something happened in early September where one sequence was split into more than one (same thread, post 59ish). This wasn’t causing the same spiderwebs as before, because sequences were in order, just not all the photos were in one sequence over one span. This is what support “resolved” out of my two emailed issues. They basically stitched the various sequences back together by time stamp and user. (I’m still waiting for my sequence from Sept. 13th to be fixed, but WHATEVER.)

However, this hacky fix meant that my two cameras taking concurrent sequences at different angles were stitched into one! That’s what I tried deleting that has now been orphaned. I still haven’t touched a few others. Check out how fun it looks!

Issue 4:
Finally, a lot of the above problems were exacerbated by sequence_bot doing automated fixes. Some of it made sense, because it was just doing its job with the broken information it had…

But at some point in the last few weeks, a change was enacted by a human that made sequence_bot cut anything over 1000 photos into neat chunks of 1000. I say neat, but it doesn’t do this right either.

Here’s a nice year+ old sequence of mine, next to a similar one in the opposite direction by @danbjoseph. Both seem to have been cut, and not cleanly. See the nice long line? There should be a whole bunch of photos in there. And that sequence was a whole lot more filled in. Perhaps this is part of what @Brenna mentioned in a support email: “You had 21k images deleted by the mapillary Recache system after there were some very long sequences broken up into shorter ones”. These were only 7k, though.

Summary:
Manual uploads and sequence_bot have caused some major problems in the last few months. Mapillary Support seems to be slowly trying to help, but it hasn’t actually fixed anything for me yet. I have lately been manually uploading in 1000-photo bits, and so far no problems. But I’m rather demotivated to continue for now. And when you click on a sequence_bot post in your feed, at least it mentions which photos it messed with, but saying “Our bot discovered that some of your images were too far apart to make sense in one sequence, so they were split into several shorter ones” is a blatant lie when they’re just cutting them into shorter bits for some reason they have not told us, the contributors.

EDIT: 21k photos on my pre-September cameras would have taken at least 11.67 hours. I have never uploaded one sequence of that many…

Mapillary has always been a “chemin des combattants”.

Peter and all,

TLDR: Many of these issues are surfacing now as we are working on fixing old inconsistencies in Mapillary from before last summer, and we will continue to crank until we fix this.

Let me just answer inline, thanks to your issue - grouping (copying in your original post below)

Thanks to Mapillary’s move away from GitHub to an email system, my issues are no longer public. However, I would like to share my experiences and see if other people are noticing things as well, or if I’m seriously unlucky.

Going away from GitHub to a model of Zenhub/Forums is a move we made to make the process of support smoother and more transparent. As a developer, I’m - like you - not 100% sure that this is the case, however, we have been noticing both customers who want us to handle their issues privately, and community members who are not comfortable with using GitHub and issue trackers. Keeping all 3 channels is too much of an overhead for the team, so we decided to go down to 2 channels. We can always revisit that decision.

Issue 1:
In mid-August, uploaded sequences were breaking out of order, causing “spiderwebs” across the map as the connecting lines between unordered points streaked everywhere. This issue was brought up in this thread (long read). The problem was fixed, and sequences were uploading fine.

Let me tell you a story …

The background to all this sequence madness is that we had a serious emergency back before summer with our PostgreSQL database backend. Due to the increased traffic and some remnants of non-performing Ruby on Rails code that did a lot of small transactions to the backend, we realized that we were slowly running out of transaction IDs. You can have 2 billion tx ids (increasing for every write TX you do) and the PSQL vacuum process is freeing up old TX ids so they can be reused. When we realized this, a vacuum process cycle was freeing 400M ids which took 3 weeks, and growing. We were using 600M TX ids during these 3 weeks, and we were at 1.6 billion as the highest ID. We were preparing to move and clean out the database system over time, but hoped we would get a few weeks. However, while the next cycle was running, our cloud provider KILLED the vacuum process without alerting us first after 2.5 weeks. Basically we had just a few days to move the database and the data to a totally new instance before running out of IDs, at which point PSQL will stop accepting all transactions until it has available IDs, rendering the database dead for a number of weeks until a vacuum finishes (at that point about 5 weeks).
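To make the arithmetic in that story concrete, here is a rough back-of-the-envelope sketch. It treats the numbers quoted above as a simple counter with steady linear rates, which is a simplification (a real vacuum frees IDs in bulk when a cycle finishes, not continuously), and the function name is made up for illustration.

```python
def weeks_until_exhaustion(current_max_id, id_limit, used_per_week, freed_per_week=0.0):
    """Weeks until the ID ceiling is hit, assuming steady linear rates."""
    net_per_week = used_per_week - freed_per_week
    if net_per_week <= 0:
        return float("inf")  # vacuum keeps up; no exhaustion under this model
    return (id_limit - current_max_id) / net_per_week


# Figures taken from the post above
ID_LIMIT = 2_000_000_000          # "2 billion tx ids"
CURRENT_MAX = 1_600_000_000       # "1.6 billion as the highest ID"
USED_PER_WEEK = 600_000_000 / 3   # 600M used over a 3-week vacuum cycle
FREED_PER_WEEK = 400_000_000 / 3  # 400M freed over the same cycle

# With the vacuum running, consumption still outpaces freeing
with_vacuum = weeks_until_exhaustion(CURRENT_MAX, ID_LIMIT, USED_PER_WEEK, FREED_PER_WEEK)

# With the vacuum killed, nothing is freed at all
without_vacuum = weeks_until_exhaustion(CURRENT_MAX, ID_LIMIT, USED_PER_WEEK)

print(f"with vacuum: ~{with_vacuum:.1f} weeks, without: ~{without_vacuum:.1f} weeks")
```

Under these assumptions the headroom shrinks from roughly six weeks to roughly two once the vacuum stops, and less than that if part of a cycle has already elapsed, which fits the "just a few days" in the story.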

In the old domain model in PSQL, every image could potentially be part of multiple sequences. Also, there were numerous data inconsistencies that we wanted to clean up before moving to a more strict and compact model, where every image is part of exactly one sequence. With this panic move, we had no time to run the cleanups and decided to move the images over to the new model in a way that gave them the last sequence key if there were several (and there were many of these, due to errors in code). This was assuming that the other of our backend systems, Elasticsearch, was holding the last sequence key (since on sequence moves, the last one would be written into the image document).

The move went well, and in August I started working on making the backend more consistent, adding many more checks to our Recaching system (the one that takes a sequence, checks that all images are in a consistent state between the different databases and have all the information, like blurs etc., that we need to be able to publish the sequence, and posts the updated vector tile regions to a tiler that then generates the updated tiles, i.e. the green lines, for the old and the new positions).

However, a lot of my sequences remained broken, and Mapillary either overlooked them or some automatic process broke down. See this part of the map for an example. Unfortunately, sequence_bot got ahold of these broken sequences and chopped them up into little bits. So now my single sequences were made up of some longish ones, and MANY 1–5 photo sequences. I tried getting it resolved, but eventually tried deleting them myself. That leads to…

It turned out that the assumption that the last sequence key is consistent has not always been true. I first scheduled the sequences over 10km long for recaching and a consistency check. If images in PSQL (Postgres) and ELS (Elasticsearch) had different sequence IDs, that process attaches the images to the ELS sequence. The tiles are generated from the PSQL sequence data.
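The reconciliation rule described here can be sketched in a few lines. This is only an illustration of the rule as stated (the Elasticsearch key wins on disagreement); the function name and the plain-dict inputs are assumptions, not the actual recaching code.

```python
def reconcile_sequence_keys(psql_keys, els_keys):
    """Return {image_id: els_key} for every image where the two stores disagree.

    psql_keys and els_keys are hypothetical {image_id: sequence_key} mappings
    standing in for the Postgres and Elasticsearch records.
    """
    moves = {}
    for image_id, els_key in els_keys.items():
        if psql_keys.get(image_id) != els_key:
            # Rule from the post: attach the image to the ELS sequence
            moves[image_id] = els_key
    return moves


psql = {"img1": "seqA", "img2": "seqA", "img3": "seqB"}
els = {"img1": "seqA", "img2": "seqC", "img3": "seqB"}
print(reconcile_sequence_keys(psql, els))
```

Note that this only works if the Elasticsearch side is itself trustworthy, which is exactly the assumption that turned out not to hold.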

It turns out that even the ELS data is not fully consistent, which is evident in the overlapping sequence data you are seeing. I have stopped the recaching process now, and am thinking of doing a brute-force data consolidation - every time we see an inconsistency, we could re-calculate the correct sequences not only by looking at existing sequence keys and incrementally attaching images to existing sequences, but by actually re-reading all uploads from the user for that day, and totally recalculating new sequences based on the time and location of all images relative to each other. That would work, but is a costly process since it will move ALL images concerned and need a full re-render of all tiles. But it might be necessary.
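A minimal sketch of what such a brute-force recalculation could look like for one user and one day: sort everything by capture time and start a new sequence whenever the gap in time or distance gets too large. The `Image` fields and both thresholds are invented for illustration and are not Mapillary's real values.

```python
import math
from dataclasses import dataclass


@dataclass
class Image:
    key: str
    timestamp: float  # capture time, seconds since epoch
    lat: float
    lon: float


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two WGS84 points."""
    r = 6_371_000.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def rebuild_sequences(images, max_gap_s=60.0, max_dist_m=100.0):
    """Regroup one user's images for one day, ignoring existing sequence keys."""
    sequences = []
    for im in sorted(images, key=lambda im: im.timestamp):
        if sequences:
            prev = sequences[-1][-1]
            close_in_time = im.timestamp - prev.timestamp <= max_gap_s
            close_in_space = haversine_m(prev.lat, prev.lon, im.lat, im.lon) <= max_dist_m
            if close_in_time and close_in_space:
                sequences[-1].append(im)
                continue
        sequences.append([im])  # gap too large: start a new sequence
    return sequences


a = Image("a", 0.0, 0.0, 0.0)
b = Image("b", 2.0, 0.0, 0.0001)    # about 11 m further east, 2 s later
c = Image("c", 500.0, 0.0, 0.0002)  # long pause, so a new sequence
result = [[im.key for im in seq] for seq in rebuild_sequences([c, a, b])]
print(result)
```

A grouping like this uses only time and position, so two cameras recording at the same moment would still be merged into one sequence.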

After these sequences (longer than 10km for their geometry) had been rendered at least data-wise consistent, we started to cut them (around 133000 sequences longer than 10000m). We have had a lot of these that the cut-bot had missed earlier and that have been creating spiderwebs. These webs have been there for a long time (since these sequences have not been correctly cut), but we have not rendered long lines between images in the tiler, making the spiderwebs disappear from the map (but not fixing the problem). Now that we have a lot more people using the APIs, the spiderwebs turn up again, since the data is not fixed, and we have to actually make things right. If sequences are overlapping now (the same original sequence now being two interleaving ones), you will see it now on the map since we will render it as such, which makes the errors visible.

As stated above, given the frequency of the inconsistencies, we might have to look at a full sequence-recalculation on these days. Sorry for that, totally our fault from old errors (but fixable).

Issue 2:
I deleted a whole bunch of sequences manually, and they were either auto-approved under @peter’s name, or he actually manually approved the deletion. Great! But the photos weren’t completely gone. The points remain on the map, show a thumbnail on hover, but then pop up a “Sorry, we can’t find the image you’re looking for” message (click most of the points in the map view above).

I emailed Mapillary support on September 28th about this and issue 3 (we’ll get there). They eventually got back to me about it a full month later, saying they fixed issue 3, but mentioned nothing about issue 2. @Brenna said a couple days ago that they’re looking into that…

However, I recently tried deleting some more sequences, fixed from issue 3, and most of them left the map, but not all. See this orphaned bit? That’s what’s left. Hovering over the points shows no thumbnail, at least.

Similar thing here. This is due to old sequences not being consistent. When you delete or change an image (position, user, organization, sequence), we submit the old and the new geometry for retiling (for a deletion, just the old one). For these inconsistencies, the old position is not what is in the existing vector tiles, making us retile the wrong tile.
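To illustrate why a wrong "old position" leaves points behind, here is a small sketch using the standard Web Mercator (slippy map) tile formula. The zoom level, the coordinates, and the function names are assumptions for the example and not Mapillary's actual tiler.

```python
import math


def tile_for(lat, lon, zoom):
    """Standard slippy-map tile index (x, y) containing a WGS84 point."""
    n = 2 ** zoom
    x = int((lon + 180.0) / 360.0 * n)
    y = int((1.0 - math.asinh(math.tan(math.radians(lat))) / math.pi) / 2.0 * n)
    return (x, y)


def tiles_to_retile(old_pos, new_pos, zoom=14):
    """Tiles touched by a change; new_pos is None for a deletion."""
    return {tile_for(*pos, zoom) for pos in (old_pos, new_pos) if pos is not None}


# Hypothetical case: the database records one old position, but the stale
# tile was rendered from a different, inconsistent one.
recorded_old = (38.4192, -82.4452)
actually_rendered = (38.4500, -82.4000)

scheduled = tiles_to_retile(recorded_old, None)  # deletion: only the old tile
stale_tile = tile_for(*actually_rendered, 14)
missed = stale_tile not in scheduled
print(scheduled, stale_tile, missed)
```

Here the tile holding the stale point is never scheduled, so the point stays on the map until that tile is regenerated some other way, such as a bounding-box or full retile.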

When we are through all of this, we will trigger a full retile of the world, and remove these hanging geometries from the tiles. Until then, we can retile smaller areas, e.g. bounding boxes, so that’s a good idea in your case here.

Issue 3:
So I was waiting for my deleted sequences to just disappear already. Meanwhile, something happened in early September where one sequence was split into more than one (same thread, post 59ish). This wasn’t causing the same spiderwebs as before, because sequences were in order, just not all the photos were in one sequence over one span. This is what support “resolved” out of my two emailed issues. They basically stitched the various sequences back together by time stamp and user. (I’m still waiting for my sequence from Sept. 13th to be fixed, but WHATEVER.)

Yes, that is the process I mentioned earlier - but apparently, this needs to take not only the times, but also the compass angle into account. Will improve that script so we can make the case below work.

However, this hacky fix meant that my two cameras taking concurrent sequences at different angles were stitched into one! That’s what I tried deleting that has now been orphaned. I still haven’t touched a few others. Check out how fun it looks!

See above - this is fixable by improving these re-calculating algos, just didn’t know.

Issue 4:
Finally, a lot of the above problems were exacerbated by sequence_bot doing automated fixes. Some of it made sense, because it was just doing its job with the broken information it had…

Exactly, see above for improving this.

But at some point in the last few weeks, a change was enacted by a human that made sequence_bot cut anything over 1000 photos into neat chunks of 1000. I say neat, but it doesn’t do this right either.

Here’s a nice year+ old sequence of mine, next to a similar one in the opposite direction by @danbjoseph. Both seem to have been cut, and not cleanly. See the nice long line? There should be a whole bunch of photos in there. And that sequence was a whole lot more filled in. Perhaps this is part of what @Brenna mentioned in a support email: “You had 21k images deleted by the mapillary Recache system after there were some very long sequences broken up into shorter ones”. These were only 7k, though.

This is also part of the consistency checking. I did see that 21k of your images are deleted (not necessarily from the same sequence - it actually was around 30 sequence IDs, a number of them being 1001-image sequences), but it seems a lot of these were already gone from the system from much older deletions and are now cleaned up properly and recorded. We have added more metadata to the deletions now concerning their original metadata, and the reason for deletion (manual deletion via changesets, corrupt EXIF, 0 bytes data, non-image format etc.), and a retention of 5 weeks before the originals actually are deleted from AWS S3 so we can resurrect them in case of error. The 7k images that are wrongly gone are a one-time case as far as I can see. They were deleted by looking at the state of the metadata and assuming that image-processing had failed on these. I have talked to the vision team, and we have double checked and adjusted the code not to do this going forward. I’m really sorry about that - one of a few cases where we actually hard deleted wrong data.

Summary:
Manual uploads and sequence_bot have caused some major problems in the last few months. Mapillary Support seems to be slowly trying to help, but it hasn’t actually fixed anything for me yet. I have lately been manually uploading in 1000-photo bits, and so far no problems. But I’m rather demotivated to continue for now. And when you click on a sequence_bot post in your feed, at least it mentions which photos it messed with, but saying “Our bot discovered that some of your images were too far apart to make sense in one sequence, so they were split into several shorter ones” is a blatant lie when they’re just cutting them into shorter bits for some reason they have not told us, the contributors.

Sorry for being perceived as slow - we are working as fast as we can. The process of fixing all these inconsistencies in around 300M images (the newest 100M after the PSQL move are much better and much easier to deal with, mostly because they have no sequence inconsistencies) is long, and it’s hard to know all the edge cases of both attaching images back to the right sequences, cutting things that are wrongly in the same sequence, and applying changes which take days and often weeks to process, if we don’t want to overload our systems. I’m on it @pkoby and all others, and will be until we get this in order.

EDIT: 21k photos on my pre-September cameras would have taken at least 11.67 hours. I have never uploaded one sequence of that many…

See above. Not saying that this was one sequence, but saying that some of the sequences stemmed from cutting (1001 images) longer sequences.

Btw, we cut the sequences into smaller bits because there are a lot of really long sequences that give us problems in the backend when they result in long linestring geometries being sent around, and big database result sets being returned in chunks of 100,000 and higher in some cases.
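For readers wondering what cutting "into chunks of 1000" amounts to, a minimal sketch follows. It simply slices an ordered list of image keys; the function name is hypothetical, and the 1001-image sequences mentioned above suggest the real cutter places its boundaries slightly differently, which this sketch does not try to reproduce.

```python
def cut_sequence(image_keys, max_len=1000):
    """Split an ordered list of image keys into consecutive chunks of at most max_len."""
    return [image_keys[i:i + max_len] for i in range(0, len(image_keys), max_len)]


keys = [f"img{i}" for i in range(2500)]
chunks = cut_sequence(keys)
print([len(c) for c in chunks])
```

Slicing in capture order keeps every image in exactly one chunk, so a clean cut should never leave gaps along the line.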

/peter

Hi @peter,

This is a better answer than I could have hoped for! Thank you! I figured there was a bunch of stuff going on in the backend, but I don’t think there have been any sort of updates to the public about that (unless I missed a post). So apologies if I come across as rude with the above post, but it stems from frustration and lack of knowledge.

A lot of my confusion about things seems to come from the multiple parties of communication, so I had been getting things at least third-hand.

To respond to your major points:
It sounds like most of my sequences are fine except that one 7k one in Italy? That’s fine, I have the photos and can reupload (that goes for pretty much all my data). (One point I missed before: there are some images near the end of the sequence as shown on the map right now that exist, and that you can get to via the arrows in-image, but that don’t show up on the map. I guess this is another vector tile issue?)

The half-missing spiderweb sequences and the mostly-missing orphans should disappear from the map at some point when you do a retile? Or could these tiles be bbox-refreshed?

There are those sequences that were restitched without taking compass into account. Would that be easier for me to delete and reupload, or should I wait to have that reworked by your team?

Thanks again for the response, and I hope we will see some updates to the map soon. If it would be helpful for me to get you sequence IDs to reprocess, I can do that too.

@peter: Thank you for your clear and open response. I had the impression that Mapillary communication tended to be more and more closed/secretive, this is a nice counter-example.

Sequence cut is still at it for me, last on Nov 12th it split a totally fine seq

Hi @peter and/or other staff,

It’s been 5 and a half months since I posted. My issues 1, 3, and 4 are still in the same state they were. I am aware that issues 3 and 4 are probably up to me to delete/reupload, but issue 1 remains an open Mapillary Support issue (thanks no Github issue tracker…).

Can I get an update on if this issue will be fixed? (E.g. Mapillary covered in unclickable spiderwebs of points.)

@pkoby - I’m working with a colleague on these issues, just not at the speed I would like to. The bottom line is that we will have to recalculate the sequences in the areas where these rogue cuts have occurred, but there is a remaining issue where sequences (even new ones) are cut wrongly. We want to solve that first before continuing with fixing the older sequences so we don’t end up in that cutting mess like the last time we did this.

This is a chunk of work that keeps getting pushed down from the top of the stack, but it’s not forgotten; sorry that this is taking so long. I can fix individual areas right away if you have some you explicitly care about (that is - we know how to fix these issues, we just don’t want to run the process globally right now)?

Thanks for the update! If you wouldn’t mind fixing this area, that would be great.

Will try to do it tomorrow @pkoby

I have now recalculated all your sequences in Huntington center @pkoby (177k images). Let’s wait a while so the tiles can be recached, and check out the result.

Well, so far I see a couple things: the bounding box edges are clear, because it split sequences arbitrarily and processed only what was inside. I don’t see any change to the spiderwebs, but I’ll wait on that to see if they are a tile cache issue. But most annoyingly, all my straight and sideways concurrent sequences are now batched together, so it seems like all the ones in the box now alternate between a straight photo and a sideways photo, back and forth, completely disrupting the flow of the sequence. I guess for a map verification purpose, it’s fine, but it’s not pretty.

This issue was already mentioned, but it seems like it wasn’t fixed to take compass angle into account. Can these sequences be re-processed to re-separate them? Or do I give up at some point?

Yes, this test displayed a bug in retiling that we are looking at. The spiderwebs are deleted sequences that should disappear.

For the direction offsets, I will try to improve the separation as mentioned before. Nowadays we are keeping track of the unique camera uuid for every image, which would keep these apart even with the same timestamp range, but I think this came after last summer - will check. This is a good test to fix the issue, since we will need it when reprocessing. There is another property in the form of a client sequence uuid in case the images were captured with the apps, but not otherwise.
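As a rough sketch of how a per-image camera uuid would keep concurrent captures apart, the images can be grouped by that uuid before any time-based sequencing happens. The dictionary field names (`camera_uuid`, `captured_at`, `key`) are placeholders chosen for this example.

```python
from collections import defaultdict


def split_by_camera(images):
    """Group image keys by camera uuid, each group ordered by capture time."""
    groups = defaultdict(list)
    for im in sorted(images, key=lambda im: im["captured_at"]):
        groups[im["camera_uuid"]].append(im["key"])
    return dict(groups)


images = [
    {"key": "f1", "camera_uuid": "cam-front", "captured_at": 0.0},
    {"key": "s1", "camera_uuid": "cam-side", "captured_at": 0.3},
    {"key": "f2", "camera_uuid": "cam-front", "captured_at": 2.0},
    {"key": "s2", "camera_uuid": "cam-side", "captured_at": 2.3},
]
print(split_by_camera(images))
```

With an identifier like this, overlapping timestamps stop being a problem; the difficulty is only with older images that lack it.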

Will work more on this next week. As mentioned, this needs to be fixed anyway in order to reprocess our old inconsistencies. Also, there is an element of resilience to it - we want to be able to reasonably reprocess images in case of any disruption of service or corruption of databases. This is a small sample that we can test.

@pkoby can you just point me to a sequence that has been conflated from different viewing directions so I can test on it locally?

/peter

Okay, glad to hear that you’re working to fix these issues. I think pretty much everything from August is now conflated (which is most of the core of Huntington), but here’s a randomly selected sequence: Mapillary

Perfect, thanks!
As mentioned, these are one bug (deleted sequences not being cleared when recreating the vector tiles) and one missing feature when recalculating sequences (taking compass angle offset into account). Going to work on these now.

/peter

@pkoby I have made progress with the scripts to recalculate sequences, but they are not perfect yet. Don’t be alarmed over the area being reprocessed, I’m tuning things …

Okay, good to hear! It was a bit of a shock to see the map in its current state, but I trust it will return.

Yes,
I was testing on a very small sample and realized that some of your sequences were not giving the right results. Doing a new run right now, will check the results tomorrow (should look a whole lot better).

So, things are getting better, but there is a problem in making the angle offsets into static buckets that is showing here: Mapillary - some images get out of one bucket (offset -45 to 45 degrees) and into another one, so the sequence gets a false split. That is the main reason for the crossing lines you can see now. This is especially prominent with bike sequences as the offset tends to vary a bit.
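The false-split effect is easy to reproduce with a toy version of static bucketing. This sketch assumes four fixed 90-degree buckets with bucket 0 covering offsets from -45 up to 45 degrees; the bucket layout beyond that one range and the function names are guesses made for illustration.

```python
def signed_offset(camera_angle, travel_bearing):
    """Camera angle relative to the direction of travel, in [-180, 180)."""
    return (camera_angle - travel_bearing + 180.0) % 360.0 - 180.0


def static_bucket(offset, width=90.0):
    """Index of the fixed-width bucket; bucket 0 covers [-45, 45)."""
    return int(((offset + width / 2) % 360.0) // width)


# A left-facing camera mounted at roughly -60 degrees, wobbling on a bike
offsets = [-60.0, -55.0, -40.0, -58.0]
buckets = [static_bucket(o) for o in offsets]
print(buckets)
```

The third image drifts across the -45 degree boundary and lands in the "forward" bucket, so a bucket-based splitter would cut the side-camera sequence there and splice that image into the forward one.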

Will think about how to avoid that tomorrow.

Yes, I was thinking about this problem and how you might get around it.

Would it be possible to calculate some expected angle buckets? In the editor, you can normalize a sequence, which aligns all images with the direction of the associated line. If you created a bucket for images that match that forward angle, and a bucket for everything else, that might fix the issue.
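A minimal sketch of that two-bucket idea, assuming each image already carries the bearing of the line it sits on (what normalizing would align it to). The field names and the 30-degree tolerance are made up for the example.

```python
def angular_distance(a, b):
    """Smallest absolute difference between two compass angles, in degrees."""
    return abs((a - b + 180.0) % 360.0 - 180.0)


def split_forward_and_other(images, tolerance_deg=30.0):
    """Put images that roughly face along the line in one bucket, the rest in another."""
    forward, other = [], []
    for im in sorted(images, key=lambda im: im["t"]):
        if angular_distance(im["angle"], im["line_bearing"]) <= tolerance_deg:
            forward.append(im["key"])
        else:
            other.append(im["key"])
    return forward, other


images = [
    {"key": "f1", "t": 0.0, "angle": 2.0, "line_bearing": 0.0},
    {"key": "s1", "t": 0.3, "angle": 300.0, "line_bearing": 0.0},
    {"key": "f2", "t": 2.0, "angle": 50.0, "line_bearing": 45.0},
    {"key": "s2", "t": 2.3, "angle": 345.0, "line_bearing": 45.0},
]
forward, other = split_forward_and_other(images)
print(forward, other)
```

Because the comparison is relative to the line at each point rather than to fixed compass ranges, a sequence that turns a corner stays in one bucket.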

I guess I’m not sure if you want to create scripts that would solve this issue for all cases, or if you’re looking to just run a refresh on my region. If the latter, I could give you more information on my techniques, and you could tailor the scripts to match that? I’m the only contributor in this area except for one sequence by JB.

For each day of the problem sequences (which would have occurred between late July and early September of 2018), I ran two cameras concurrently, one facing forward (0/360 degrees) and one to the left (at 300 degrees, I think). I interpolated location for each camera using the same GPX, so that’s why they coincide. I tended to add a slight offset of a fraction of a second so the images weren’t on top of each other, but this wasn’t always the case. Angles were calculated by averaging the look-behind angle (to the previous image), and the look-ahead angle (to the next one) for each image, except for the first and last images (first just looks ahead, last repeats second-to-last angle).
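The angle calculation described above can be sketched as follows. One interpretation is assumed here: the "look-behind angle" is taken as the bearing of the incoming segment (previous image to current image) so that both values point in the direction of travel, and the two are combined with a circular mean so that angles near 0/360 average correctly. Function names are illustrative.

```python
import math


def bearing_deg(lat1, lon1, lat2, lon2):
    """Initial great-circle bearing from point 1 to point 2, in [0, 360)."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dl = math.radians(lon2 - lon1)
    y = math.sin(dl) * math.cos(p2)
    x = math.cos(p1) * math.sin(p2) - math.sin(p1) * math.cos(p2) * math.cos(dl)
    return math.degrees(math.atan2(y, x)) % 360.0


def circular_mean_deg(a, b):
    """Mean of two compass angles that handles the 0/360 wrap-around."""
    x = math.cos(math.radians(a)) + math.cos(math.radians(b))
    y = math.sin(math.radians(a)) + math.sin(math.radians(b))
    return math.degrees(math.atan2(y, x)) % 360.0


def interpolate_angles(points):
    """Per-image angles for a list of (lat, lon) points in capture order."""
    n = len(points)
    if n < 2:
        return [0.0] * n
    segs = [bearing_deg(*points[i], *points[i + 1]) for i in range(n - 1)]
    angles = [segs[0]]  # first image only looks ahead
    for i in range(1, n - 1):
        angles.append(circular_mean_deg(segs[i - 1], segs[i]))
    angles.append(angles[-1])  # last image repeats the second-to-last angle
    return angles


# Head north, then turn east
angles = interpolate_angles([(0.0, 0.0), (1.0, 0.0), (1.0, 1.0)])
print([round(a, 1) for a in angles])
```

For a side-facing camera the fixed mounting offset would then be added to each of these angles.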

So to fix my area, you might be able to look for the images that occur within two seconds of each other (my cameras had a two-second delay), with one sequence matching the expected angles, and one with an offset. So if you start with an image of angle 0, the next one should be to the north. That image might have an angle of 45, which would put the next image 90 degrees away (to the east). And so on.

EDIT TO ADD:
I looked at some single sequences too, not concurrent with other angles. These suffer some of the same issue, though it’s not as clear why. Perhaps limiting the next image in a sequence to a distance threshold might help? Time threshold would be possible too, but there would be some pauses at lights and whatnot that would throw that off. Though a split to a new sequence in these cases wouldn’t be the worst. Again, for my area, the real issues occurred really only in August (during the q3 challenge).

Hope this information helps.

Last ditch option, you could also possibly batch delete all the offending sequences from August 2018, but that would kind of be a pain to reupload. I’d get to it eventually, though.