I used older mapillary_tools versions a lot. After a break, I updated to 0.8.2.
In older versions, if a processing and upload run was interrupted or failed, it could be re-run on the same directory, and it resumed more or less from where it had left off.
With 0.8, this seems to be broken - running mapillary_tools against the same directory goes through all of the “Processing”, “Test EXIF writing”, and “Uploading” phases again.
Is this correct - do the tools no longer keep status info, and do they redo everything on every run?
My understanding is that the “status” is now a server-side database, e.g. you shouldn't be able to upload the same sequence twice. I am not sure what happens with the local JSON file now, as I only upload BlackVue MP4s.
However, I don't trust the tools to restart a directory with 6 gazillion MP4s in it, so my command line generates a verbose log, which is then parsed for successful uploads so those files can be renamed. This is handy because right now I see the tools reliably crash every 20-30 MP4s. (My uploads are done by a 3rd-party tool.)
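For what it's worth, that parse-and-rename step can be scripted. Here is a minimal Python sketch; the log line format (`Uploaded <name>.mp4`) is a made-up assumption, so adjust the regex to whatever your verbose log actually prints:

```python
import re
from pathlib import Path

# Hypothetical success line; adapt the pattern to your actual verbose log.
UPLOADED_RE = re.compile(r"[Uu]ploaded\s+(?P<name>\S+\.mp4)")

def rename_uploaded(log_text: str, video_dir: Path) -> list[str]:
    """Rename every video the log reports as uploaded to <name>.mp4.done,
    so a restarted run will no longer pick it up."""
    renamed = []
    for match in UPLOADED_RE.finditer(log_text):
        src = video_dir / match.group("name")
        if src.exists():
            src.rename(src.parent / (src.name + ".done"))
            renamed.append(src.name)
    return renamed
```

Renaming to a suffix the tools ignore (rather than deleting) keeps the originals around in case an upload later turns out to be incomplete.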
Thanks, I do recall the duplicate check during upload being introduced - but that serves a different purpose: it helps Mapillary avoid ending up with duplicate images.
Local status in mapillary_tools is for contributors, so that they do not waste processing power and bandwidth, and do not wear out their disks as quickly.
Uploading many images, sometimes over a slow internet connection, I highly value the ability to perform the absolute minimum of duplicate operations when a step fails or is interrupted.
It also helps to avoid data loss in case some directory/images have a full/partial failure to process or upload.
For example, if a contributor launches mapillary_tools in a loop against 20 directories with images, some directory in between might fail to upload. The contributor should be able to re-run mapillary_tools against all directories and have them pick up only anything not processed/uploaded, without re-doing everything.
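To illustrate the workaround such a loop currently needs, per-directory status can be tracked externally by a small wrapper. A minimal Python sketch - the `mapillary_tools process_and_upload` invocation is an assumption, so adjust it to your installed version's CLI:

```python
import json
import subprocess
from pathlib import Path

# External status file so a re-run skips directories that already finished.
STATUS_FILE = Path("upload_status.json")

def run_all(dirs, cmd=("mapillary_tools", "process_and_upload")):
    """Run `cmd <dir>` for each directory, remembering which ones succeeded."""
    status = json.loads(STATUS_FILE.read_text()) if STATUS_FILE.exists() else {}
    for d in dirs:
        if status.get(str(d)) == "done":
            continue  # finished in an earlier run - skip it
        result = subprocess.run([*cmd, str(d)])
        status[str(d)] = "done" if result.returncode == 0 else "failed"
        STATUS_FILE.write_text(json.dumps(status, indent=2))  # persist per dir
    return status
```

Persisting the status file after every directory (not only at the end) is what makes an interrupted loop resumable.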
For me, the lack of this functionality is a major regression and a dealbreaker. @asturksever, sorry to bother you directly, but could this please be escalated, if possible?
Thank you for the quick reply, greatly appreciated.
That issue only covers repeated uploads - what about the other processing steps? Should I create a new GitHub issue?
Regarding the impact of this: I have a few dozen directories, ~1000 images each. There was a mapillary_tools process and upload run on them all in a loop. Without status being kept, it seems that I have no way to figure out whether anything there failed, right?
Also, would downgrading to mapillary_tools 0.7 bring back local status tracking?
That is, is tools version 0.7 still compatible with the upstream services?
For example, if a contributor launches mapillary_tools in a loop against 20 directories with images, some directory in between might fail to upload. The contributor should be able to re-run mapillary_tools against all directories and have them pick up only anything not processed/uploaded, without re-doing everything.
@Richlv The duplication check enables the use case above.
Hmm, it might not have worked well before (0.8.3 was suggested, but I only got 0.8.2 after updating); now 0.9.0 is available.
I haven't done extensive tests (they are time- and resource-consuming); I only noticed that fully uploaded sequences have the upload step skipped.
Could you please confirm whether all of these cases should have their status kept and properly resumed?
a) “Processing” and “Test EXIF writing” steps - if interrupted in the middle, resume from the image where they were interrupted. If already completed, skip them entirely.
b) Temporary directory during upload - any processed data there is kept (in case of an interruption or failure). Upload within any particular sequence is resumed exactly where it left off.
Such behaviour was present before (with functional differences due to processing and upload being per-image, but that doesn't change the need), and it is extremely useful when travelling, when one may only get short time windows for image processing and upload.
We only keep track of whether a sequence is uploaded or not. There is no status for processing. This is because processing is fast compared to uploading, and processing won't be interrupted as often (by fatal errors) as uploading (by network errors, for example).
For example, if you are uploading a folder that contains two sequences, A and B, the procedure will be:
1. process A
2. upload A
3. process B
4. upload B
So:
if you interrupt the program at step 1, the next run will execute steps 1, 2, 3, 4
if you interrupt the program at step 4 when 80% of B is already uploaded, the next run will skip steps 1 and 2 because A is uploaded, then run step 3, and then upload the remaining 20% of B's bytes at step 4
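The behaviour described above can be sketched as follows (illustrative only, not mapillary_tools' actual code): only "uploaded" is tracked per sequence, so an interrupted sequence is re-processed in full, while its upload resumes from the stored byte offset.

```python
def sync(sequences, uploaded, upload_offsets):
    """sequences: sequence names in order; uploaded: set of finished ones;
    upload_offsets: bytes already transferred per unfinished sequence."""
    actions = []
    for seq in sequences:
        if seq in uploaded:
            continue                      # fully uploaded: skip both steps
        actions.append(("process", seq))  # no processing status kept: always redo
        offset = upload_offsets.get(seq, 0)
        actions.append(("upload", seq, offset))  # resume from stored offset
    return actions
```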
To answer your questions:
a) If it gets interrupted, in the next run it will be processed again.
b) No temporary directories are created when you process images. For video processing, samples are created under a folder called “mapillary_sampled_video_frames”, and they are kept until you remove them manually.
c) Uploading will be resumed from where it left off.
Thank you for the explanation, greatly appreciated.
Can we please consider adding tracking for all processing as well?
Sometimes, when travelling, I have to restart the upload of a directory many, many times, and each restart ends up re-doing the processing.
This adds extra load on my storage devices - I already lost one external hard drive I used for Mapillary processing. Reducing the wear and tear on contributor equipment would be extremely welcome.
Very glad to hear about uploads being resumed. Is this happening by keeping an aggregate file in TMPDIR, or is it some other approach?
Here's a use case I had just now, without even travelling that much.
I was moving between a home office, a library, a pub, and a friend's place. Each time I had to cancel the upload, and resuming it in the next location went through all the processing again.
When I did not have time to manually remove the fully uploaded directories first, it re-processed many, many gigabytes of images.
Pretty please, Mapillary, add full and detailed status tracking - or send me a few large removable drives.
Sure, most of it is GoPro images, 999 images and usually ~2.2 GB per directory.
A run usually covers 1-30 such directories (some might have fewer images).
While the time taken is a bit of a concern for me, the previous external disk I used for Mapillary processing died - so reducing useless disk activity seems very attractive.
I now do processing and uploading separately. I am still getting issues with duplicate uploads. Worse, I accidentally triggered the same folder twice, and now the same images are visible under both key
179744871254126 and key 690470781981911.
I did notice a recurring warning during upload that might be related: it seems the tool doesn't have the proper rights to write a local record of what has already been uploaded?