Upload server error: failed to create the cluster {}

mapillary_tools version 0.13.2
Uploading ZIP mly_tools_737db28b51bb9e407ddf8a31abb1771e.zip (4/5): 100%|█████████████████████████████████████████████████████████████████████████████| 61.6G/61.6G [14:36:08<00:00, 1.26MB/s]
Traceback (most recent call last):
  File "/usr/local/lib/python3.12/dist-packages/mapillary_tools/upload.py", line 654, in upload
    _upload_zipfiles(mly_uploader, zip_paths)
  File "/usr/local/lib/python3.12/dist-packages/mapillary_tools/upload.py", line 709, in _upload_zipfiles
    raise UploadError(ex) from ex
mapillary_tools.upload.UploadError: Upload server error: failed to create the cluster {}

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/bin/mapillary_tools", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/mapillary_tools/commands/__main__.py", line 162, in main
    args.func(argvars)
  File "/usr/local/lib/python3.12/dist-packages/mapillary_tools/commands/upload.py", line 50, in run
    upload(**args)
  File "/usr/local/lib/python3.12/dist-packages/mapillary_tools/upload.py", line 683, in upload
    raise inner_ex
  File "/usr/local/lib/python3.12/dist-packages/mapillary_tools/upload.py", line 705, in _upload_zipfiles
    cluster_id = mly_uploader.upload_zipfile(
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/mapillary_tools/uploader.py", line 132, in upload_zipfile
    return self.upload_stream(
           ^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/mapillary_tools/uploader.py", line 221, in upload_stream
    return _upload_stream(
           ^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/mapillary_tools/uploader.py", line 435, in _upload_stream
    cluster_id = upload_service.finish(file_handle)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/mapillary_tools/upload_api_v4.py", line 196, in finish
    raise RuntimeError(
RuntimeError: Upload server error: failed to create the cluster {}

After 14 hours of uploading, the counter hits 100% and then everything is reset to 0. Is this supposed to be funny or a bad joke? :pray: Please guys, get a grip on yourselves. I am speechless; the level of poor quality of your upload infrastructure is beyond my comprehension. These are files, just files…


@tao, @abalys - are you folks able to take a look?

Hi,

Thanks for reaching out.

We don’t have widespread issues with uploads, although I do see some requests failing with invalid file handle, which leads me to believe that your upload may be too big.

To investigate this further, can you please:

  • Provide the command that you used for uploading.
  • Run the same command with the --verbose flag and append the output.
  • Provide the file sizes of the folders/zips that you are trying to upload.

cc’ing @tao and @nikola who are experts in this.

Kind regards,
Balys

2025-03-03 17:56:37,771 - INFO    - Retrying in 4 seconds (2/200)
2025-03-03 17:56:41,771 - DEBUG   - GET https://rupload.facebook.com/mapillary_public_uploads/mly_tools_737db28b51bb9e407ddf8a31abb1771e.zip
2025-03-03 17:56:42,638 - DEBUG   - HTTP response 200: b'{"dc":"ftw3c14","offset":3288334336}'
2025-03-03 17:56:42,639 - DEBUG   - Sending upload_fetch_offset via IPC: {'total_sequence_count': 5, 'sequence_idx': 3, 'file_type': 'zip', 'import_path': '4/mly_tools_737db28b51bb9e407ddf8a31abb1771e.zip', 'sequence_image_count': 38227, 'entity_size': 66178742427, 'md5sum': '737db28b51bb9e407ddf8a31abb1771e', 'upload_start_time': 1741009179.7226415, 'upload_total_time': 11812.591799259186, 'offset': 3288334336, 'retries': 2, 'upload_first_offset': 3288334336, 'chunk_size': 67108864, 'upload_last_restart_time': 1741021002.6394222}
2025-03-03 17:56:42,802 - DEBUG   - POST https://rupload.facebook.com/mapillary_public_uploads/mly_tools_737db28b51bb9e407ddf8a31abb1771e.zip HEADERS {"Offset": "3288334336", "X-Entity-Length": "66178742427", "X-Entity-Name": "mly_tools_737db28b51bb9e407ddf8a31abb1771e.zip", "X-Entity-Type": "application/zip"}
2025-03-03 17:56:43,072 - WARNING - Error uploading chunk_size 67108864 at begin_offset 3288334336: SSLError: HTTPSConnectionPool(host='rupload.facebook.com', port=443): Max retries exceeded with url: /mapillary_public_uploads/mly_tools_737db28b51bb9e407ddf8a31abb1771e.zip (Caused by SSLError(SSLEOFError(8, 'EOF occurred in violation of protocol (_ssl.c:2427)')))
2025-03-03 17:56:43,072 - INFO    - Retrying in 8 seconds (3/200)
2025-03-03 17:56:51,073 - DEBUG   - GET https://rupload.facebook.com/mapillary_public_uploads/mly_tools_737db28b51bb9e407ddf8a31abb1771e.zip
2025-03-03 17:56:51,487 - DEBUG   - HTTP response 200: b'{"dc":"atn5c15","offset":0}'
2025-03-03 17:56:51,488 - DEBUG   - Sending upload_fetch_offset via IPC: {'total_sequence_count': 5, 'sequence_idx': 3, 'file_type': 'zip', 'import_path': '4/mly_tools_737db28b51bb9e407ddf8a31abb1771e.zip', 'sequence_image_count': 38227, 'entity_size': 66178742427, 'md5sum': '737db28b51bb9e407ddf8a31abb1771e', 'upload_start_time': 1741009179.7226415, 'upload_total_time': 11813.02470445633, 'offset': 0, 'retries': 3, 'upload_first_offset': 0, 'chunk_size': 67108864, 'upload_last_restart_time': 1741021011.4887266}
2025-03-03 17:56:51,636 - DEBUG   - POST https://rupload.facebook.com/mapillary_public_uploads/mly_tools_737db28b51bb9e407ddf8a31abb1771e.zip HEADERS {"Offset": "0", "X-Entity-Length": "66178742427", "X-Entity-Name": "mly_tools_737db28b51bb9e407ddf8a31abb1771e.zip", "X-Entity-Type": "application/zip"}
2025-03-03 17:57:44,979 - DEBUG   - HTTP response 206: b'{"debug_info":{"retriable":true,"type":"PartialRequestError","message":"Partial request (did not match length of file)"}}'
2025-03-03 17:57:44,979 - DEBUG   - The next offset will be: 67108864

:person_facepalming: This is pathetic!


On the plus side, I just uploaded 150 GB odd of a BlackVue and EXIF image mix using the tools over 12 hours. Apart from frequent errors like:

2025-02-28 11:07:05,767 - WARNING - Error uploading chunk_size 16777216 at begin_offset 0: SSLError: HTTPSConnectionPool(host='rupload.facebook.com', port=443): Max retries exceeded with url: /mapillary_public_uploads/mly_tools_0da5fc607189b6e299c6e634da6c6afd.zip (Caused by SSLError(SSLEOFError(8, 'EOF occurred in violation of protocol (_ssl.c:2393)')))

and

2025-03-01 04:46:18,685 - ERROR - MapillaryUploadConnectionError: ('Connection aborted.', ConnectionResetError(104, 'Connection reset by peer'))

it all worked fine. I suspect these errors might be associated with trying to run multiple concurrent uploads, and with the fact that I am still using mapillary_tools version 0.12.1 (hihi).

No complaints here. I would rather have used rsync and a remote shell server process, though!


Thank you @bob3bob3 for confirming that I am not suffering alone.

No concurrent uploads on my side.

I am on

2025-03-03 18:41:08,678 - DEBUG   - mapillary_tools version 0.13.2
2025-03-03 17:56:42,638 - DEBUG   - HTTP response 200: b'{"dc":"ftw3c14","offset":3288334336}'
2025-03-03 17:56:51,487 - DEBUG   - HTTP response 200: b'{"dc":"atn5c15","offset":0}'

The issue is clearly on Mapillary’s end.

@GITNE @bob3bob3 Sorry for the inconvenience. There seem to be multiple issues:

  1. RuntimeError: failed to create the cluster {}: this is likely because the server wasn’t able to handle large files. It is fixed now. Please try again.
  2. Caused by SSLError(SSLEOFError(8, 'EOF occurred in violation of protocol (_ssl.c:2427)')): are you both using a proxy here? A proxy may cause the SSL handshake to fail. Can you verify this by turning the proxy off? If that fixes it, can you share more details about your proxy setup so we can reproduce it?
  3. MapillaryUploadConnectionError: ('Connection aborted.', ConnectionResetError(104, 'Connection reset by peer')): this is likely caused by the network. There is not much we can do other than trying again on another network.

Thank you for investigating.

Great! :+1::slightly_smiling_face: I have already resumed uploading the failed zip file. We will know more in a few hours. Of course, I do not know whether your changes affect the already running upload. Would I need to reset the upload?

I tried uploading directly and via a proxy with a stable public IP address. Both ways exhibit the same Caused by SSLError(SSLEOFError(8, 'EOF occurred in violation of protocol (_ssl.c:2427)')) errors, though fewer via the proxy. I cannot think of any reason why this should happen on my side. I do not see such or similar behavior in the browser either. I run the latest OpenSSL and GNU TLS versions. :person_shrugging: Though, I have not explicitly tested OpenSSL as the TLS backend since the latest mapillary_tools update (GNU TLS is the default). I will try doing so with the next uploads and let you know. But I do not think these errors are caused by any TLS backend.

I have also played a bit with the chunk size. It looks like the larger the chunk size is, the fewer TLS errors I get. This may be related to HTTP keep-alive timeouts (just a thought). Can we somehow configure the HTTP keep-alive timeout in mapillary_tools? The HTTP keep-alive timeout may also be connected to the 104, ‘Connection reset by peer’ errors. And yes, the clocks on my machines are all stable, so I also rule out any early HTTP keep-alive disconnects due to a broken clock.
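
As far as I can tell, requests (which mapillary_tools uses under the hood, per the tracebacks) does not expose an HTTP keep-alive timeout directly; the closest client-side knob I know of is enabling TCP keep-alive probes through urllib3 socket options. A hypothetical sketch, not mapillary_tools' actual code:

```python
# Hypothetical sketch: enable TCP keep-alive probes on a requests Session
# via urllib3 socket options. This is NOT mapillary_tools code, just an
# illustration of where such a knob could live in a requests-based client.
import socket

import requests
from requests.adapters import HTTPAdapter
from urllib3.connection import HTTPConnection


class KeepAliveAdapter(HTTPAdapter):
    """HTTPAdapter that turns on SO_KEEPALIVE for every pooled connection."""

    def init_poolmanager(self, *args, **kwargs):
        kwargs["socket_options"] = HTTPConnection.default_socket_options + [
            (socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1),
        ]
        super().init_poolmanager(*args, **kwargs)


session = requests.Session()
session.mount("https://", KeepAliveAdapter())
```

Note that TCP keep-alive only keeps an idle connection from being silently dropped; it is not the same thing as the server's HTTP keep-alive timeout, which the client cannot control.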

Would I need to reset the upload?

No, you don’t need to reset the upload. Let’s wait and see.

Regarding 2, do you think it is possible that the proxy servers you configured do not support HTTPS (so they do not respond to the SSL handshake initialization, hence the protocol violation)?

Also, is this a new issue in v0.13, or did it also occur in v0.12? If it is new since v0.13, then the only related change is fix: fallback to system SSL certs when certifi fails by ptpt · Pull Request #698 · mapillary/mapillary_tools · GitHub, which falls back to using system CA certs when the bundled CA certs fail. Do you see any warnings like SSL error occurred, falling back to system SSL certificates?

I will read more about HTTP keep alive timeouts and get back here.


They do support HTTPS. They are protocol-transparent SOCKS proxies. Besides, the same SSLError(SSLEOFError(8, 'EOF occurred in violation of protocol (_ssl.c:2427)')) error happens via a direct connection as well as via a proxy.

This is an old(er) issue.

No, I do not have these errors. I think these may happen if the proxy itself runs TLS connections and authenticates with a self‑signed certificate or a certificate signed by a CA not in the (user) certificate store. Furthermore, the SSLError(SSLEOFError(8, 'EOF occurred in violation of protocol (_ssl.c:2427)')) error happens running the mapillary_tools AppImage packages, pip3 wheels, and the GitHub repo.

Please note also that the error message says

SSLError: HTTPSConnectionPool(host='rupload.facebook.com', port=443): Max retries exceeded with url: /mapillary_public_uploads/mly_tools_737db28b51bb9e407ddf8a31abb1771e.zip

This may hint that the server has hit some kind of Max retries limit. Could this be the number of chunks? In other words, could the server be misinterpreting chunks as retries?

They do support HTTPS. They are protocol-transparent SOCKS proxies. Besides, the same SSLError(SSLEOFError(8, 'EOF occurred in violation of protocol (_ssl.c:2427)')) error happens via a direct connection as well as via a proxy.

Thanks for the clarification. Are you using WiFi, cable, or cellular data?

This is an old(er) issue.

I guess this error is transient, as I saw some of your uploads made progress. How frequently does it happen? Is it more frequent when using the proxy?

If it’s low odds, I wonder if reducing the number of HTTP requests helps. One possibility is to use Chunked transfer encoding - Wikipedia so we can upload a whole sequence/video with just one HTTP request. I need to confirm whether the server supports that well.
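
If it helps to picture the idea: in a requests-based client, streaming the body from a generator is what triggers chunked transfer encoding, since no Content-Length is known up front. A rough sketch (illustrative only, not the actual implementation):

```python
# Illustrative sketch of chunked transfer encoding with requests:
# passing a generator as `data` makes requests send the body with
# Transfer-Encoding: chunked, so one HTTP request can carry a whole file.
def read_in_chunks(path: str, chunk_size: int = 16 * 1024 * 1024):
    """Yield the file at `path` in fixed-size chunks."""
    with open(path, "rb") as fp:
        while True:
            chunk = fp.read(chunk_size)
            if not chunk:
                return
            yield chunk

# Usage (hypothetical endpoint URL):
# import requests
# requests.post(upload_url, data=read_in_chunks("sequence.zip"))
```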

Please note also that the error message says SSLError: HTTPSConnectionPool(host='rupload.facebook.com', port=443): Max retries exceeded with url: /mapillary_public_uploads/mly_tools_737db28b51bb9e407ddf8a31abb1771e.zip

I believe this comes from the default internal retries of the requests library that mapillary_tools uses.
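
To make that concrete: the "Max retries exceeded with url: …" wording is generated client-side by urllib3's Retry machinery inside requests, and the retry budget is configurable per adapter. A small sketch with a plain requests Session (illustrative values, not mapillary_tools' actual setup):

```python
# Sketch: where the client-side retry limit lives in a requests-based
# client. The "Max retries exceeded" message comes from urllib3's Retry,
# not from the upload server.
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

session = requests.Session()
retries = Retry(total=3, backoff_factor=1.0)  # illustrative values
session.mount("https://", HTTPAdapter(max_retries=retries))
```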


This error happens on everything, on all kinds of devices, and with different ISPs.

There is no indicator. It really is random, and there is nothing I can do to stop it from happening.

My personal impression is that it happens less frequently via a proxy, but I have no hard numbers to back up my impression (which is not really that important either).

Does mapillary_tools make concurrent HTTPS requests by default when uploading? If so, then it may fairly easily oversaturate the connection and therefore make some HTTPS requests fail. Can we configure the number of concurrent HTTPS requests in mapillary_tools?


Thanks @GITNE! I wonder if capturing the network activity with tcpdump/Wireshark could provide some insights.

Uploading is always single-threaded (sequential).

@GITNE FYI Chunked transfer encoding is implemented in the latest pre-release. See the instructions to update your mapillary_tools in [BUG] Interrupted uploads do not resume · Issue #569 · mapillary/mapillary_tools · GitHub


Thank you @tao. The explanation in your PR improve: use chunked transfer encoding to stream large files by ptpt · Pull Request #714 · mapillary/mapillary_tools · GitHub makes sense. In any event, this should also reduce some load on your servers, which is always great. :+1:

I am going to test v0.13.3a1 as soon as my currently running upload completes and will let you know about my experience.

:crying_face: The last upload on v0.13.2 did hit 100% but ultimately failed :frowning: due to a server timeout error and then reset to 0.

Uploading ZIP mly_tools_90f4e91938f32803fe70e8c82c5b8669.zip (1/1): 100%|█████████████████████████████████████████████████████████████████████████████▉| 85.9G/85.9G [8:19:29<00:00, 1.14MB/s]
2025-03-06 16:59:30,051 - WARNING - Error uploading chunk_size 1048576 at begin_offset 58193870848: HTTPError: 504 Server Error: Server timeout for url: https://rupload.facebook.com/mapillary_public_uploads/mly_tools_90f4e91938f32803fe70e8c82c5b8669.zip
2025-03-06 16:59:30,051 - INFO    - Retrying in 2 seconds (1/200)
Uploading ZIP mly_tools_90f4e91938f32803fe70e8c82c5b8669.zip (1/1): 100%|█████████████████████████████████████████████████████████████████████████████▉| 85.9G/85.9G [8:28:14<00:00, 1.12MB/s]
Uploading ZIP mly_tools_90f4e91938f32803fe70e8c82c5b8669.zip (1/1):   0%|                                                                             | 19.0M/85.9G [00:17<22:50:29, 1.12MB/s]

I am going to upload again with v0.13.3a1. :person_shrugging:

2025-03-06 16:59:30,051 - WARNING - Error uploading chunk_size 1048576 at begin_offset 58193870848: HTTPError: 504 Server Error: Server timeout for url: https://rupload.facebook.com/mapillary_public_uploads/mly_tools_90f4e91938f32803fe70e8c82c5b8669.zip

My guess is that the server may be busy md5sum-checking the uploaded zip and fails to respond in time. IMHO, md5sum checking should not be part of the upload process but of the backend pipeline. Please keep in mind that the load on the server may change at any time, so please do not simply increase the timeout, because that will not solve anything.

By the way, please do not use MD5. It is broken beyond repair. You should not use it, not even for pseudo-random number generation. Please transition to SHA1. Although SHA1 is in the process of being phased out from standards, it continues to be collision resistant for up to 16 GiB of input. Thus, it is good enough, and even better than MD5, for what you need.
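
For scale, switching the digest would be a small change on the client side; hashing a large archive chunk-by-chunk with Python's hashlib keeps memory flat regardless of file size. A generic sketch (not mapillary_tools code; `file_digest` is a name I made up):

```python
# Generic sketch: compute a digest of a large file in fixed-size chunks
# so the whole archive never has to fit in memory. `algorithm` can be
# "md5", "sha1", "sha256", etc. -- anything hashlib supports.
import hashlib


def file_digest(path: str, algorithm: str = "sha1",
                chunk_size: int = 8 * 1024 * 1024) -> str:
    h = hashlib.new(algorithm)
    with open(path, "rb") as fp:
        for chunk in iter(lambda: fp.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()
```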

$ mapillary_tools --version
mapillary_tools version 0.13.3a1

2025-03-06 20:54:54,346 - DEBUG   - Sending upload_progress via IPC: {'total_sequence_count': 1, 'sequence_idx': 0, 'file_type': 'zip', 'import_path': 'mly_tools_90f4e91938f32803fe70e8c82c5b8669.zip', 'sequence_image_count': 58620, 'entity_size': 92286085618, 'md5sum': '90f4e91938f32803fe70e8c82c5b8669', 'upload_start_time': 1741277326.36584, 'upload_total_time': 0, 'offset': 17129537536, 'retries': 0, 'upload_last_restart_time': 1741277326.5264294, 'upload_first_offset': 20971520, 'chunk_size': 16777216}
2025-03-06 20:55:05,023 - WARNING - Error uploading chunk_size 16777216 at begin_offset 20971520: SSLError: HTTPSConnectionPool(host='rupload.facebook.com', port=443): Max retries exceeded with url: /mapillary_public_uploads/mly_tools_90f4e91938f32803fe70e8c82c5b8669.zip (Caused by SSLError(SSLEOFError(8, 'EOF occurred in violation of protocol (_ssl.c:2427)')))
2025-03-06 20:55:05,023 - INFO    - Retrying in 2 seconds (1/200)
2025-03-06 20:55:07,024 - DEBUG   - GET https://rupload.facebook.com/mapillary_public_uploads/mly_tools_90f4e91938f32803fe70e8c82c5b8669.zip
2025-03-06 20:55:07,235 - DEBUG   - HTTP response 200: b'{"dc":"odn2c17","offset":0}'
2025-03-06 20:55:07,236 - DEBUG   - Sending upload_fetch_offset via IPC: {'total_sequence_count': 1, 'sequence_idx': 0, 'file_type': 'zip', 'import_path': 'mly_tools_90f4e91938f32803fe70e8c82c5b8669.zip', 'sequence_image_count': 58620, 'entity_size': 92286085618, 'md5sum': '90f4e91938f32803fe70e8c82c5b8669', 'upload_start_time': 1741277326.36584, 'upload_total_time': 13578.496676683426, 'offset': 0, 'retries': 1, 'upload_first_offset': 0, 'chunk_size': 16777216, 'upload_last_restart_time': 1741290907.2362647}
2025-03-06 20:55:07,236 - DEBUG   - POST https://rupload.facebook.com/mapillary_public_uploads/mly_tools_90f4e91938f32803fe70e8c82c5b8669.zip HEADERS {"Offset": "0", "X-Entity-Name": "mly_tools_90f4e91938f32803fe70e8c82c5b8669.zip", "X-Entity-Type": "application/zip"}
2025-03-06 20:55:20,324 - DEBUG   - Sending upload_progress via IPC: {'total_sequence_count': 1, 'sequence_idx': 0, 'file_type': 'zip', 'import_path': 'mly_tools_90f4e91938f32803fe70e8c82c5b8669.zip', 'sequence_image_count': 58620, 'entity_size': 92286085618, 'md5sum': '90f4e91938f32803fe70e8c82c5b8669', 'upload_start_time': 1741277326.36584, 'upload_total_time': 13578.496676683426, 'offset': 16777216, 'retries': 1, 'upload_first_offset': 0, 'chunk_size': 16777216, 'upload_last_restart_time': 1741290907.2362647}

Unfortunately, 0.13.3a1 brought no improvement. It continues to reset to 0 randomly. :frowning: I could live with the SSL errors, but I cannot live with the resets to 0. Things become unuploadable.

@tao I think I have found the cause of the resets. Simply put, rupload.facebook.com’s DNS TTL is too short.

$ dig +ttlunits 'rupload.facebook.com'
rupload.facebook.com.	1h	IN	CNAME	star.c10r.facebook.com.

When uploading, mapillary_tools makes a DNS resolution request with every HTTPS request and every chunk, usually against the local DNS cache. However, when the cache entry expires (which is dictated by the DNS TTL), the DNS resolver makes a new request to resolve rupload.facebook.com. In most cases, rupload.facebook.com then resolves to a different IP address. The existing TLS session then sends data to the new IP address, which causes an EOF TLS error, because naturally the new IP address does not have a TLS session running. In other words, the TLS session (or upload session) does not migrate to the newly resolved server IP address.

What can you do?

  • Make rupload.facebook.com’s DNS TTL indefinite (you will still get load balancing!)
  • Or, make the upload session migrate to the new IP address
  • Or, make rupload.facebook.com resolve always to only one IP address
  • Or, hard code an IP address in mapillary_tools
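
To illustrate the last two options: a client could resolve the upload host once at startup and keep using that single address for the whole session. A hypothetical sketch (`resolve_once` is my own name, not part of mapillary_tools):

```python
# Hypothetical sketch: resolve the upload host once and pin that address
# for the rest of the upload session, instead of re-resolving (and
# possibly hopping to a different server) whenever the DNS cache expires.
import socket


def resolve_once(hostname: str, port: int = 443) -> str:
    """Return the first address the resolver gives for hostname:port."""
    infos = socket.getaddrinfo(hostname, port, proto=socket.IPPROTO_TCP)
    return infos[0][4][0]

# pinned = resolve_once("rupload.facebook.com")
# Connections would then target `pinned`, while still presenting the
# original hostname via SNI/Host so TLS certificate validation works.
```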

What can users do?

  • Resolve the rupload.facebook.com alias or star.c10r.facebook.com host name once (usually to an IP address for their location) and map star.c10r.facebook.com to a static IP address by putting it into the hosts file
  • Set the environment variables MAPILLARY_UPLOAD_ENDPOINT and MAPILLARY_GRAPH_API_ENDPOINT to URLs with host names replaced to identical resolved IP addresses:
    MAPILLARY_UPLOAD_ENDPOINT=https://157.240.X.X/mapillary_public_uploads
    MAPILLARY_GRAPH_API_ENDPOINT=https://157.240.X.X
    

However, these should only be considered a workaround until Mapillary fixes this issue permanently.