mapillary_tools version 0.13.2
Uploading ZIP mly_tools_737db28b51bb9e407ddf8a31abb1771e.zip (4/5): 100%|█████████████████████████████████████████████████████████████████████████████| 61.6G/61.6G [14:36:08<00:00, 1.26MB/s]
Traceback (most recent call last):
File "/usr/local/lib/python3.12/dist-packages/mapillary_tools/upload.py", line 654, in upload
_upload_zipfiles(mly_uploader, zip_paths)
File "/usr/local/lib/python3.12/dist-packages/mapillary_tools/upload.py", line 709, in _upload_zipfiles
raise UploadError(ex) from ex
mapillary_tools.upload.UploadError: Upload server error: failed to create the cluster {}
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/bin/mapillary_tools", line 8, in <module>
sys.exit(main())
^^^^^^
File "/usr/local/lib/python3.12/dist-packages/mapillary_tools/commands/__main__.py", line 162, in main
args.func(argvars)
File "/usr/local/lib/python3.12/dist-packages/mapillary_tools/commands/upload.py", line 50, in run
upload(**args)
File "/usr/local/lib/python3.12/dist-packages/mapillary_tools/upload.py", line 683, in upload
raise inner_ex
File "/usr/local/lib/python3.12/dist-packages/mapillary_tools/upload.py", line 705, in _upload_zipfiles
cluster_id = mly_uploader.upload_zipfile(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/mapillary_tools/uploader.py", line 132, in upload_zipfile
return self.upload_stream(
^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/mapillary_tools/uploader.py", line 221, in upload_stream
return _upload_stream(
^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/mapillary_tools/uploader.py", line 435, in _upload_stream
cluster_id = upload_service.finish(file_handle)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/mapillary_tools/upload_api_v4.py", line 196, in finish
raise RuntimeError(
RuntimeError: Upload server error: failed to create the cluster {}
After 14 hours of uploading the counter hits 100% and then everything is reset to 0. Is this supposed to be funny or a bad joke? Please guys, get a grip on yourselves. I am speechless; the level of poor quality of your upload infrastructure is beyond my comprehension. These are files, just files…
We don’t have widespread issues with uploads although I do see some requests failing with invalid file handle, which leads me to believe that your upload may be too big.
To investigate this further, can you please:
Provide the command that you used for uploading.
Run the same command with --verbose flag and append the output.
Provide the file size of folders/zips that you are trying to upload.
It all worked fine. I suspect these errors might be associated with trying to run multiple concurrent uploads, and that I am still using mapillary_tools version 0.12.1 (hihi)
No complaints here. I would rather have used rsync and a remote/shell server process, though!
@GITNE @bob3bob3 Sorry for the inconvenience. There seem to be multiple issues:
1. RuntimeError: failed to create the cluster {}: this is likely because the server wasn’t able to handle large files. It is fixed now. Please try again.
2. Caused by SSLError(SSLEOFError(8, 'EOF occurred in violation of protocol (_ssl.c:2427)')): Are you both using a proxy here? It may cause the SSL handshake to fail. Can you verify this by turning the proxy off? If so, can you share more details about your proxy setup so that we can reproduce it?
3. MapillaryUploadConnectionError: ('Connection aborted.', ConnectionResetError(104, 'Connection reset by peer')): this is likely caused by the network. There is not much we can do other than trying again on another network.
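For transient errors such as the connection reset above, "trying again" can be sketched as a retry loop with exponential backoff. This is only an illustration and not what mapillary_tools does internally: the function name and parameters are made up here, and it catches Python's built-in ConnectionError and TimeoutError, so a wrapper around the requests library would presumably need to catch that library's own exception types instead.

```python
import time


def retry_with_backoff(operation, attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call operation() until it succeeds or the attempts run out.

    Only connection-level errors are retried; the delay doubles after
    every failed attempt (base_delay, 2 * base_delay, 4 * base_delay, ...).
    """
    for attempt in range(attempts):
        try:
            return operation()
        except (ConnectionError, TimeoutError):
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** attempt)
```

The sleep parameter is injectable only so that the loop can be exercised without actually waiting.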
No you don’t need to reset the upload. Let’s wait and see.
Regarding 2, do you think it is possible that the proxy servers you configured do not support HTTPS (so they do not respond to the SSL handshake initialization, hence the violation of protocol)?
They do support HTTPS. They are protocol transparent SOCKS proxies. Besides, the same SSLError(SSLEOFError(8, 'EOF occurred in violation of protocol (_ssl.c:2427)')) error happens via direct connection as well as via a proxy.
Thanks for the clarification. Are you using WiFi, cable, or cellular data?
This is an old(er) issue.
I guess this error is transient, as I saw some of the uploads make progress. How frequently does it happen? Is it more frequent when using a proxy?
If the odds are low, I wonder whether reducing the number of HTTP requests would help. One possibility is to use chunked transfer encoding so that we can upload a whole sequence/video with just one HTTP request. I need to confirm whether the server supports that well.
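As a rough illustration of what chunked transfer encoding puts on the wire, here is a minimal sketch of the HTTP/1.1 framing: each chunk is prefixed with its length in hexadecimal, and a zero-length chunk ends the body. The helper names are hypothetical, and whether the upload server accepts such requests is exactly the open question above.

```python
def encode_chunked(chunks):
    """Frame an iterable of byte strings as an HTTP/1.1 chunked body."""
    for chunk in chunks:
        if not chunk:
            continue  # a zero-length chunk would terminate the body early
        # "<size in hex>\r\n<data>\r\n"
        yield b"%x\r\n" % len(chunk) + chunk + b"\r\n"
    yield b"0\r\n\r\n"  # terminating chunk, no trailers


def read_in_chunks(file_obj, chunk_size=16 * 1024 * 1024):
    """Yield successive blocks from a binary file object until EOF."""
    while True:
        data = file_obj.read(chunk_size)
        if not data:
            break
        yield data
```

In practice an HTTP client library does this framing itself; for example, Python's http.client accepts an iterable body together with encode_chunked=True. The point of the sketch is only that the total size does not have to be known up front, so a single request can stream a whole file.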
Please note also that the error message says SSLError: HTTPSConnectionPool(host='rupload.facebook.com', port=443): Max retries exceeded with url: /mapillary_public_uploads/mly_tools_737db28b51bb9e407ddf8a31abb1771e.zip
I believe these are default internal retries by the requests library that MT is using.
@tao I think I solved the cause for resets. Simply put, rupload.facebook.com’s DNS TTL is too short.
$ dig +ttlunits 'rupload.facebook.com'
rupload.facebook.com. 1h IN CNAME star.c10r.facebook.com.
When uploading, mapillary_tools makes a DNS resolution request with every HTTPS request and every chunk, usually against the local DNS cache. However, when the cache entry expires (which is dictated by the DNS TTL), the DNS resolver makes a new request to resolve rupload.facebook.com. It so happens that in most cases rupload.facebook.com then resolves to a different IP address. The existing TLS session then sends data to the new IP address and hence causes an EOF TLS error, because naturally the new IP address does not have a TLS session running. In other words, the TLS session (or upload session) does not migrate to the newly resolved server IP address.
What can you do?
Make rupload.facebook.com’s DNS TTL indefinite (you will still get load balancing!)
Or, make the upload session migrate to the new IP address
Or, make rupload.facebook.com resolve always to only one IP address
Or, hard code an IP address in mapillary_tools
What can users do?
Resolve the rupload.facebook.com alias or star.c10r.facebook.com host name once (usually to an IP address for their location) and map star.c10r.facebook.com to a static IP address by putting it into the hosts file
Set the environment variables MAPILLARY_UPLOAD_ENDPOINT and MAPILLARY_GRAPH_API_ENDPOINT to URLs with the host names replaced by the same resolved IP address:
@tao I am no DNS expert but it looks like the rupload.facebook.com alias and star.c10r.facebook.com have very volatile and diverging TTLs. Maybe I am doing something wrong.
$ dig +ttlunits rupload.facebook.com
;; ANSWER SECTION:
rupload.facebook.com. 11m55s IN CNAME star.c10r.facebook.com.
star.c10r.facebook.com. 40s IN A 157.240.223.17
;; Query time: 32 msec
;; SERVER: 127.0.0.53#53(127.0.0.53) (UDP)
;; WHEN: Fri Mar 07 14:16:37 UTC 2025
;; MSG SIZE rcvd: 89
$ dig +ttlunits rupload.facebook.com
;; ANSWER SECTION:
rupload.facebook.com. 10m50s IN CNAME star.c10r.facebook.com.
star.c10r.facebook.com. 13s IN A 157.240.27.18
;; Query time: 35 msec
;; SERVER: 127.0.0.53#53(127.0.0.53) (UDP)
;; WHEN: Fri Mar 07 14:17:43 UTC 2025
;; MSG SIZE rcvd: 89
$ dig +ttlunits rupload.facebook.com
;; ANSWER SECTION:
rupload.facebook.com. 58m49s IN CNAME star.c10r.facebook.com.
star.c10r.facebook.com. 45s IN A 157.240.27.18
;; Query time: 73 msec
;; SERVER: 127.0.0.53#53(127.0.0.53) (UDP)
;; WHEN: Fri Mar 07 14:28:50 UTC 2025
;; MSG SIZE rcvd: 89
When I set my primary DNS server to Google’s DNS (8.8.8.8), the TTLs are also volatile. So, it looks like the issue is at the source since, if I understand things correctly, the TTL should be propagated.
$ dig +ttlunits rupload.facebook.com
;; ANSWER SECTION:
rupload.facebook.com. 51m8s IN CNAME star.c10r.facebook.com.
star.c10r.facebook.com. 50s IN A 157.240.252.10
;; Query time: 21 msec
;; SERVER: 8.8.8.8#53(8.8.8.8) (UDP)
;; WHEN: Fri Mar 07 14:34:08 UTC 2025
;; MSG SIZE rcvd: 89
$ dig +ttlunits rupload.facebook.com
;; ANSWER SECTION:
rupload.facebook.com. 45s IN CNAME star.c10r.facebook.com.
star.c10r.facebook.com. 45s IN A 157.240.252.10
;; Query time: 1 msec
;; SERVER: 8.8.8.8#53(8.8.8.8) (UDP)
;; WHEN: Fri Mar 07 14:34:12 UTC 2025
;; MSG SIZE rcvd: 89
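To compare TTLs across several runs like the ones above, the answer lines can be parsed into seconds. This is a small sketch that assumes the `dig +ttlunits` layout shown here (name, TTL, class, type, value); the function names are made up for illustration.

```python
import re

_UNIT_SECONDS = {"w": 604800, "d": 86400, "h": 3600, "m": 60, "s": 1}


def ttl_to_seconds(ttl):
    """Convert a dig +ttlunits value such as '11m55s' or '1h' to seconds."""
    parts = re.findall(r"(\d+)([wdhms]?)", ttl)
    # A bare number without a unit is treated as seconds.
    return sum(int(n) * _UNIT_SECONDS[unit or "s"] for n, unit in parts)


def parse_answer(dig_output):
    """Return (name, ttl_seconds, record_type, value) for each answer line."""
    records = []
    for line in dig_output.splitlines():
        fields = line.split()
        # Answer lines have five fields with the class "IN" in third place;
        # comment lines starting with ";;" never match this shape.
        if len(fields) == 5 and fields[2] == "IN":
            name, ttl, _, rtype, value = fields
            records.append((name, ttl_to_seconds(ttl), rtype, value))
    return records
```

One caveat when reading the numbers: a caching resolver reports the remaining lifetime of its cached entry, so a value that decreases between two queries does not by itself show that the authoritative TTL changed.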
AFAIK once DNS resolves an IP for you, the subsequent HTTP/TLS/TCP session will keep using this IP address despite DNS changes (i.e. once the connection is established, it won’t suddenly switch to a new IP), so DNS is unlikely to be the cause IMO. Also, busy sites usually use 1h or even shorter TTLs for better load balancing.
I’m trying to reproduce the offset reset issue with different network settings (VPN, proxies). I think that’s the key. Let’s see!
Generally, that’s true. However, when the TTL expires the resolver is forced to make a new DNS query, AFAIK. Otherwise, the TTL would be pointless.
Right, but most of them only push data and do not expect long-running uploads. For an upload server you would want the TTL to be indefinite and only have the host name resolve to different IPs depending on load. You can have dynamic TTLs on an upload server, but then you also have to make sure that upload sessions migrate to different IPs, which makes the whole concept of upload load balancing more complex than actually needed. As a compromise, you can also use a very long TTL, like a week (but conceptually it will not make much of a difference).
@tao graph.mapillary.com also maps to star.c10r.facebook.com, which has a dynamic IP address and the same volatile TTL behavior as rupload.facebook.com.
Hence, uploads can hit 100% on one IP address but ultimately fail on another IP address because the upload finished request can go to a different IP address than the upload IP address. And, upload sessions do not migrate. This is really messy and confusing.
Hmm, if everything maps to star.c10r.facebook.com why the different aliases?
Uploading ZIP mly_tools_90f4e91938f32803fe70e8c82c5b8669.zip (1/1): 100%|█████████████████████████████████████████████████████████████████████████████| 85.9G/85.9G [20:57:55<00:00, 1.22MB/s]
2025-03-09 01:05:53,363 - INFO - 1 ZIP files uploaded
2025-03-09 01:05:53,386 - INFO - 88010.9M data in total
2025-03-09 01:05:53,420 - INFO - 88010.9M data uploaded
2025-03-09 01:05:53,421 - INFO - 79334.1s upload time
@tao Since I mapped rupload.facebook.com, graph.mapillary.com, and star.c10r.facebook.com to the same static IP address, everything works flawlessly free of SSLErrors! Plus, I can pause the upload session at any time and as often as I need or like, and the upload resumes reliably. Finally!
I am not sure but I think that mapping aliases only may not be enough. Again, I am not a DNS expert but something tells me that resolution may go A→CNAME→IP address.
Next, I am going to comment out the static IP address mapping from the hosts file and try to substitute the host name of both endpoint URLs with star.c10r.facebook.com, to see whether I get SSLErrors and whether resuming uploads works properly. My expectation is that I should get SSLErrors again and that resuming uploads should break as well.
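For anyone who wants to try the same static mapping, the hosts-file line can be generated with a few lines of Python. This is a sketch under assumptions: the helper names are hypothetical, the lookup returns whichever IPv4 address the resolver hands out at that moment, and the address in the usage note is just one of the values from the dig output earlier in this thread.

```python
import socket

# The three names that were mapped to one address in the post above.
UPLOAD_HOSTS = (
    "rupload.facebook.com",
    "graph.mapillary.com",
    "star.c10r.facebook.com",
)


def resolve_once(host="star.c10r.facebook.com"):
    """Resolve host a single time and return one IPv4 address (needs network)."""
    return socket.gethostbyname(host)


def hosts_line(ip, names=UPLOAD_HOSTS):
    """Format one hosts-file line that maps every name to the same ip."""
    return ip + " " + " ".join(names)
```

For example, hosts_line("157.240.27.18") gives "157.240.27.18 rupload.facebook.com graph.mapillary.com star.c10r.facebook.com", which would then be appended to the hosts file with administrator rights. The entry has to be removed or refreshed by hand if that address ever stops serving uploads.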
Hey @GITNE very happy to see you get a workaround here.
I can only reproduce the offset reset issue by switching VPNs, and I can confirm that resumable uploads do not work across data centers (i.e. dc in the response): if you connect to a new data center, it’s likely that the offset will be reset. What affects the data center selection is likely your IP, which I assume is always routed to the nearest data center. By using a static IP for rupload.facebook.com, I guess you also fix which dc you connect to, so you don’t see any offset issues. I can’t reproduce any SSLError, so I can’t find more information here.
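The offset behavior described above can be illustrated with a toy model. Everything in this sketch is hypothetical (the class, the method names, the in-memory storage) and is not the real upload API; it only shows why a client that resumes from the offset reported by the server starts from 0 again when it lands on a data center that has never seen the session.

```python
class FakeDataCenter:
    """Hypothetical stand-in for one data center; keeps its own session state."""

    def __init__(self):
        self.received = {}  # session key -> bytes stored so far

    def fetch_offset(self, session):
        """Report how many bytes of this session are already stored here."""
        return len(self.received.get(session, b""))

    def append(self, session, offset, data):
        """Store data at offset; reject writes that do not continue the session."""
        stored = self.received.setdefault(session, b"")
        if offset != len(stored):
            raise ValueError("offset mismatch")
        self.received[session] = stored + data


def upload_resumable(dc, session, payload, chunk_size=4, max_chunks=None):
    """Send payload from the offset dc reports; stop after max_chunks chunks."""
    offset = dc.fetch_offset(session)
    sent = 0
    while offset < len(payload) and (max_chunks is None or sent < max_chunks):
        chunk = payload[offset:offset + chunk_size]
        dc.append(session, offset, chunk)
        offset += len(chunk)
        sent += 1
    return offset
```

With two FakeDataCenter instances, an upload interrupted on the first one resumes where it stopped as long as it returns to that same instance, while the second instance reports offset 0 for the same session key.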
BTW the mapillary_tools repo provides a neat test CLI to test upload without affecting your uploads:
Did you throttle your upload speed? How large were your files? Try uploading a large file, like a few GBs, and throttle down to 64 kbps to make things perhaps a bit more extreme (but maybe not that unrealistic in some scenarios) to provoke an SSLError. Additionally, since graph.mapillary.com is shared between uploading and the web app, try surfing the web app at the same time over the same VPN connection. Maybe this has some impact too? Oh, and please flush your DNS cache first and then make sure that host names are resolved over the VPN connection, not by your local DNS server. Try using a non‑facebook.com DNS server.
I have also had an SSL error a couple of times now; it just re-uploads the file, so no problem there. Just to let you know that this happens more widely. Upload seems to be capped at 100 Mbit (I have a 1 Gbit fiber connection). Is it an option to implement a multi upload?
… it is actually a real problem. The upload infrastructure design is broken. Throwing brute force or throughput at this problem is no solution at all. No amount of throughput is going to solve it.
And that broken design affects all upload routes, including the Mapillary mobile apps, which is especially annoying since many contributors continue to pay for metered mobile connections. Imho it is a disgrace for a multi‑billion dollar tech company to expect contributors not only to capture imagery for free but also to bear additional cost (contributors being their second most valuable asset) because the company is unable to do its simplest homework, like uploading files.