Peter and all,
TLDR: Many of these issues are surfacing now as we are working on fixing old inconsistencies in Mapillary from before last summer, and we will continue to crank until we fix this.
Let me answer inline - thanks for grouping your issues (I'm copying your original post below).
Thanks to Mapillary’s move away from GitHub to an email system, my issues are no longer public. However, I would like to share my experiences and see if other people are noticing things as well, or if I’m seriously unlucky.
Going away from GitHub to a model of Zenhub plus forums is a move we made to make the support process smoother and more transparent. As a developer, I'm - like you - not 100% sure that this is the case. However, we have been noticing both customers who want us to handle their issues privately, and community members who are not comfortable with using GitHub and issue trackers. Keeping all three channels is too much overhead for the team, so we decided to go down to two. We can always revisit that decision.
In mid-August, uploaded sequences were breaking out of order, causing “spiderwebs” across the map as the connecting lines between unordered points streaked everywhere. This issue was brought up in this thread (long read). The problem was fixed, and sequences were uploading fine.
Let me tell you a story …
The background to all this sequence madness is that we had a serious emergency with our PostgreSQL database backend back before the summer. Due to increased traffic and some remnants of non-performing Ruby on Rails code that did a lot of small transactions against the backend, we realized that we were slowly running out of transaction IDs. PostgreSQL has about 2 billion transaction IDs (the counter increases for every write transaction you do), and the vacuum process frees up old IDs so they can be reused. When we realized this, one vacuum cycle was freeing 400M IDs and taking 3 weeks, and growing - while we were using 600M IDs during those same 3 weeks, and the highest ID was already at 1.6 billion. We were preparing to move and clean out the database system over time, and hoped we would get a few more weeks. However, while the next cycle was running, our cloud provider KILLED the vacuum process after 2.5 weeks, without alerting us first. That left us just a few days to move the database and the data to a totally new instance before running out of IDs - at which point PostgreSQL stops accepting all transactions until it has available IDs again, rendering the database dead until a vacuum finishes (at that point, about 5 weeks).
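For scale, here is the back-of-the-envelope arithmetic behind that deadline, as a rough Python sketch. The figures are the approximate numbers from the incident above, not exact measurements:

```python
# Approximate figures from the incident, for illustration only.
TXID_LIMIT = 2_000_000_000         # PostgreSQL's ~2 billion usable transaction IDs
highest_txid = 1_600_000_000       # roughly where the counter stood when we noticed
ids_used_per_cycle = 600_000_000   # IDs consumed during one vacuum cycle
ids_freed_per_cycle = 400_000_000  # IDs one full vacuum cycle reclaimed
cycle_weeks = 3                    # duration of one vacuum cycle at the time

headroom = TXID_LIMIT - highest_txid
burn_rate_per_week = ids_used_per_cycle / cycle_weeks
weeks_left = headroom / burn_rate_per_week

# Each cycle we consumed more IDs than vacuum freed, so the gap only grew.
net_loss_per_cycle = ids_used_per_cycle - ids_freed_per_cycle

print(f"~{weeks_left:.0f} weeks until wraparound")        # ~2 weeks
print(f"net deficit per cycle: {net_loss_per_cycle:,}")   # 200,000,000 IDs
```

So even with the vacuum running, the numbers were converging on a hard stop within weeks; with the vacuum killed, it became days.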
In the old domain model in PostgreSQL, every image could potentially be part of multiple sequences. There were also numerous data inconsistencies that we wanted to clean up before moving to a stricter and more compact model, where every image is part of exactly one sequence. With this panic move, we had no time to run the cleanups, so we decided to migrate the images to the new model by giving each one the last sequence key if it had several (and there were many of these, due to errors in code). This assumed that the other part of our backend, Elasticsearch, was holding the last sequence key (since on sequence moves, the last one would be written into the image document).
The move went well, and in August I started working on making the backend more consistent, adding many more checks to our recaching system - the one that takes a sequence, checks that all images are in a consistent state between the different databases and have all the information (like blurs etc.) we need to be able to publish the sequence, and then posts the updated vector tile regions to a tiler that generates the updated tiles (the green lines) for both the old and the new positions.
However, a lot of my sequences remained broken, and Mapillary either overlooked them or some automatic process broke down. See this part of the map for an example. Unfortunately, sequence_bot got ahold of these broken sequences and chopped them up into little bits. So now my single sequences were made up of some longish ones, and MANY 1–5 photo sequences. I tried getting it resolved, but eventually tried deleting them myself. That leads to…
It turned out that the assumption that the last sequence key is consistent has not always been true. I first scheduled the sequences longer than 10 km for recaching and consistency checking. If images in PSQL (PostgreSQL) and ELS (Elasticsearch) had different sequence IDs, that process attaches the images to the ELS sequence. The tiles are generated from the PSQL sequence data.
It turns out that even the ELS data is not fully consistent, which is evident in the overlapping sequence data you are seeing. I have stopped the recaching process for now and am considering a brute-force data consolidation: every time we see an inconsistency, we would re-calculate the correct sequences not just by looking at existing sequence keys and incrementally attaching images to existing sequences, but by re-reading all uploads from the user for that day and completely recalculating new sequences based on the time and location of all images relative to each other. That would work, but it is a costly process, since it will move ALL images concerned and require a full re-render of all affected tiles. But it might be necessary.
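A minimal sketch of what such a time-and-distance recalculation could look like. The function names, thresholds, and image-dict shape here are my own illustration, not the actual pipeline code:

```python
from math import radians, sin, cos, asin, sqrt

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two lat/lon points."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 6_371_000 * 2 * asin(sqrt(a))

def recalculate_sequences(images, max_gap_s=120, max_jump_m=500):
    """Rebuild sequences from scratch: sort one user's uploads for one day
    by capture time, then start a new sequence whenever the time gap or the
    distance jump between consecutive images is implausibly large.
    (Thresholds here are made up for illustration.)"""
    images = sorted(images, key=lambda im: im["t"])
    sequences, current = [], []
    for im in images:
        if current:
            prev = current[-1]
            too_late = im["t"] - prev["t"] > max_gap_s
            too_far = haversine_m(prev["lat"], prev["lon"], im["lat"], im["lon"]) > max_jump_m
            if too_late or too_far:
                sequences.append(current)
                current = []
        current.append(im)
    if current:
        sequences.append(current)
    return sequences
```

The expensive part is not this grouping itself, but that every regrouped image has to be moved in both databases and every touched tile re-rendered.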
After these sequences (longer than 10 km in their geometry) had been made at least data-wise consistent, we started to cut them (around 133,000 sequences longer than 10,000 m). There were a lot of these that the cut-bot had missed earlier and that had been creating spiderwebs. These webs have been there for a long time (since these sequences had never been cut correctly), but we did not render long lines between images in the tiler, which made the spiderwebs disappear from the map (without fixing the underlying problem). Now that we have a lot more people using the APIs, the spiderwebs turn up again, since the data is not fixed, and we have to actually make things right. If sequences are overlapping now (the same original sequence having become two interleaving ones), you will see it on the map, since we render it as such - which makes the errors visible.
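The 10 km cut itself can be sketched roughly like this - again an illustration of the idea, not the real cut-bot:

```python
from math import radians, sin, cos, asin, sqrt

def haversine_m(p, q):
    """Great-circle distance in meters between two (lat, lon) pairs."""
    lat1, lon1, lat2, lon2 = map(radians, (*p, *q))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 6_371_000 * 2 * asin(sqrt(a))

def cut_sequence(points, max_len_m=10_000):
    """Cut one ordered sequence of (lat, lon) points into pieces whose
    geometry is at most max_len_m long, starting a new piece whenever
    the running length would exceed the limit."""
    pieces, current, length = [], [points[0]], 0.0
    for prev, p in zip(points, points[1:]):
        step = haversine_m(prev, p)
        if length + step > max_len_m:
            pieces.append(current)
            current, length = [p], 0.0
        else:
            current.append(p)
            length += step
    pieces.append(current)
    return pieces
```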
As stated above, given the frequency of the inconsistencies, we might have to do a full sequence recalculation for those days. Sorry for that - totally our fault from old errors (but fixable).
I deleted a whole bunch of sequences manually, and they were either auto-approved under @peter’s name, or he actually manually approved the deletion. Great! But the photos weren’t completely gone. The points remain on the map, show a thumbnail on hover, but then pop up a “Sorry, we can’t find the image you’re looking for” message (click most of the points in the map view above).
I emailed Mapillary support on September 28th about this and issue 3 (we’ll get there). They eventually got back to me about it a full month later, saying they fixed issue 3, but mentioned nothing about issue 2. @Brenna said a couple days ago that they’re looking into that…
However, I recently tried deleting some more sequences, fixed from issue 3, and most of them left the map, but not all. See this orphaned bit? That’s what’s left. Hovering over the points shows no thumbnail, at least.
Similar thing here. This is due to old sequences not being consistent: when you delete or change an image (position, user, organization, sequence), we submit the old and the new geometry for retiling (for a deletion, just the old one). For these inconsistent images, the stored old position is not what is actually in the existing vector tiles, so we retile the wrong tile.
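To illustrate why a stale old position leaves ghosts behind: retiling works from tile coordinates derived from the image position, so a wrong position means a wrong tile gets refreshed. This sketch uses the standard Web-Mercator "slippy map" tile scheme; the internals of our tiler differ, but the idea is the same:

```python
from math import floor, log, tan, cos, pi, radians

def lonlat_to_tile(lon, lat, zoom):
    """Standard Web-Mercator 'slippy map' tile coordinates for a point."""
    n = 2 ** zoom
    x = floor((lon + 180.0) / 360.0 * n)
    y = floor((1.0 - log(tan(radians(lat)) + 1.0 / cos(radians(lat))) / pi) / 2.0 * n)
    return x, y

def tiles_to_invalidate(old_pos, new_pos, zoom=14):
    """On an image move, both the old and the new position's tiles must be
    re-rendered; on a deletion, just the old one. If the stored 'old'
    position is stale, the wrong tile is refreshed and the ghost geometry
    survives in the tile that actually contains it."""
    tiles = {lonlat_to_tile(*old_pos, zoom)}
    if new_pos is not None:  # None for a plain deletion
        tiles.add(lonlat_to_tile(*new_pos, zoom))
    return tiles
```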
When we are through all of this, we will trigger a full retile of the world and remove these hanging geometries from the tiles. Until then, we can retile specific areas, e.g. bounding boxes - so that's a good idea in your case here.
So I was waiting for my deleted sequences to just disappear already. Meanwhile, something happened in early September where one sequence was split into more than one (same thread, post 59ish). This wasn’t causing the same spiderwebs as before, because sequences were in order, just not all the photos were in one sequence over one span. This is what support “resolved” out of my two emailed issues. They basically stitched the various sequences back together by time stamp and user. (I’m still waiting for my sequence from Sept. 13th to be fixed, but WHATEVER.)
Yes, that is the process I mentioned earlier - but apparently, it needs to take not only the timestamps but also the compass angle into account. I will improve that script so we can make the case below work.
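A sketch of what adding the compass-angle check could look like, to keep two cameras shooting interleaved at different angles in separate sequences. The threshold and data shape are illustrative, not the real script:

```python
def angle_diff(a, b):
    """Smallest absolute difference between two compass bearings (degrees)."""
    d = abs(a - b) % 360
    return min(d, 360 - d)

def split_by_heading(images, max_angle=60):
    """Partition a merged, time-ordered sequence into per-camera groups by
    compass angle ('ca'): each image joins the first group whose last
    heading is within max_angle degrees, otherwise it starts a new group.
    Time-only stitching would merge both cameras; the angle check keeps
    the interleaved captures apart."""
    groups = []
    for im in sorted(images, key=lambda i: i["t"]):
        for g in groups:
            if angle_diff(g[-1]["ca"], im["ca"]) <= max_angle:
                g.append(im)
                break
        else:
            groups.append([im])
    return groups
```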
However, this hacky fix meant that my two cameras taking concurrent sequences at different angles were stitched into one! That’s what I tried deleting that has now been orphaned. I still haven’t touched a few others. Check out how fun it looks!
See above - this is fixable by improving these re-calculation algorithms; we just didn't know about this case.
Finally, a lot of the above problems were exacerbated by sequence_bot doing automated fixes. Some of it made sense, because it was just doing its job with the broken information it had…
Exactly, see above for improving this.
But at some point in the last few weeks, a change was enacted by a human that made sequence_bot cut anything over 1000 photos into neat chunks of 1000. I say neat, but it doesn’t do this right either.
Here’s a nice year+ old sequence of mine, next to a similar one in the opposite direction by @danbjoseph. Both seem to have been cut, and not cleanly. See the nice long line? There should be a whole bunch of photos in there. And that sequence was a whole lot more filled in. Perhaps this is part of what @Brenna mentioned in a support email: “You had 21k images deleted by the mapillary Recache system after there were some very long sequences broken up into shorter ones”. These were only 7k, though.
This is also part of the consistency checking. I did see that 21k of your images were deleted (not necessarily from the same sequence - it was actually around 30 sequence IDs, a number of them being 1,001-image sequences), but it seems a lot of these were already gone from the system from much older deletions and are only now being cleaned up properly and recorded. We have now added more metadata to the deletions, covering their original metadata and the reason for deletion (manual deletion via changesets, corrupt EXIF, 0-byte data, non-image format, etc.), plus a retention period of 5 weeks before the originals are actually deleted from AWS S3, so we can resurrect them in case of error. The 7k images that are wrongly gone are a one-time case as far as I can see. They were deleted by looking at the state of the metadata and assuming that image processing had failed on them. I have talked to the vision team, and we have double-checked and adjusted the code not to do this going forward. I'm really sorry about that - one of a few cases where we actually hard-deleted the wrong data.
Manual uploads and sequence_bot have caused some major problems in the last few months. Mapillary Support seems to be slowly trying to help, but it hasn't actually fixed anything for me yet. I have lately been manually uploading in 1000-photo bits, and so far no problems. But I'm rather demotivated to continue for now. And when you click on a sequence_bot post in your feed, at least it mentions which photos it messed with, but saying "Our bot discovered that some of your images were too far apart to make sense in one sequence, so they were split into several shorter ones" is a blatant lie when they're just cutting them into shorter bits for some reason they have not told us contributors.
Sorry for being perceived as slow - we are working as fast as we can. The process of fixing all these inconsistencies across around 300M images (the newest 100M, from after the PSQL move, are much better and easier to deal with, mostly because they have no sequence inconsistencies) is long, and it's hard to know all the edge cases of attaching images back to the right sequences, cutting things that are wrongly in the same sequence, and applying changes that take days and often weeks to process if we don't want to overload our systems. I'm on it, @pkoby and all others, and will be until we get this in order.
EDIT: 21k photos on my pre-September cameras would have taken at least 11.67 hours. I have never uploaded one sequence of that many…
See above. I'm not saying that this was one sequence, but that some of the sequences stemmed from cutting longer sequences (hence the 1,001-image ones).
Btw, we cut the sequences into smaller bits because there are a lot of really long sequences that give us problems in the backend, when they result in long linestring geometries being sent around and big database result sets being returned, in chunks of 100,000 and more in some cases.
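The cut itself is essentially plain order-preserving chunking, something like this illustrative sketch:

```python
def chunk_sequence(image_ids, chunk_size=1000):
    """Cut an ordered sequence into consecutive chunks of at most
    chunk_size images, preserving capture order."""
    return [image_ids[i:i + chunk_size]
            for i in range(0, len(image_ids), chunk_size)]
```

As an aside, an inclusive end index in a cut like this would yield 1,001-image chunks like the ones mentioned above - one plausible explanation, though I'd have to check the actual bot code.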