Upload server error: File handle not found in the upload response {"debug_info":{"retriable":true,"type":"ShutdownError","message":"Server shutting down. Please try again."}}

GITNE · March 2, 2025, 7:49pm

mapillary_tools version 0.13.2
Uploading ZIP mly_tools_737db28b51bb9e407ddf8a31abb1771e.zip (4/5): 100%|█████████████████████████████████████████████████████████████████████████████| 61.6G/61.6G [14:36:08<00:00, 1.26MB/s]
Traceback (most recent call last):
  File "/usr/local/lib/python3.12/dist-packages/mapillary_tools/upload.py", line 654, in upload
    _upload_zipfiles(mly_uploader, zip_paths)
  File "/usr/local/lib/python3.12/dist-packages/mapillary_tools/upload.py", line 709, in _upload_zipfiles
    raise UploadError(ex) from ex
mapillary_tools.upload.UploadError: Upload server error: failed to create the cluster {}

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/bin/mapillary_tools", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/mapillary_tools/commands/__main__.py", line 162, in main
    args.func(argvars)
  File "/usr/local/lib/python3.12/dist-packages/mapillary_tools/commands/upload.py", line 50, in run
    upload(**args)
  File "/usr/local/lib/python3.12/dist-packages/mapillary_tools/upload.py", line 683, in upload
    raise inner_ex
  File "/usr/local/lib/python3.12/dist-packages/mapillary_tools/upload.py", line 705, in _upload_zipfiles
    cluster_id = mly_uploader.upload_zipfile(
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/mapillary_tools/uploader.py", line 132, in upload_zipfile
    return self.upload_stream(
           ^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/mapillary_tools/uploader.py", line 221, in upload_stream
    return _upload_stream(
           ^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/mapillary_tools/uploader.py", line 435, in _upload_stream
    cluster_id = upload_service.finish(file_handle)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/mapillary_tools/upload_api_v4.py", line 196, in finish
    raise RuntimeError(
RuntimeError: Upload server error: failed to create the cluster {}

After 14 hours of uploading the counter hits 100% and then everything is reset to 0. Is this supposed to be funny or a bad joke? Please guys, get a grip on yourselves. I am speechless; the level of poor quality of your upload infrastructure is beyond my comprehension. These are files, just files…

boris · March 3, 2025, 11:59am

@tao, @balys - are you folks able to take a look?

balys · March 3, 2025, 1:17pm

Hi,

Thanks for reaching out.

We don’t have widespread issues with uploads although I do see some requests failing with invalid file handle, which leads me to believe that your upload may be too big.

To investigate this further, can you please:

Provide the command that you used for uploading.
Run the same command with --verbose flag and append the output.
Provide the file size of folders/zips that you are trying to upload.

cc’ing @tao and @nikola who are experts in this.

Kind regards,
Balys

bob3bob3 · March 3, 2025, 6:32pm

On the plus side, I just uploaded 150-odd GB of a BlackVue and EXIF image mix using the tools over 12 hours. Apart from frequent errors like:

2025-02-28 11:07:05,767 - WARNING - Error uploading chunk_size 16777216 at begin_offset 0: SSLError: HTTPSConnectionPool(host='rupload.facebook.com', port=443): Max retries exceeded with url: /mapillary_public_uploads/mly_tools_0da5fc607189b6e299c6e634da6c6afd.zip (Caused by SSLError(SSLEOFError(8, 'EOF occurred in violation of protocol (_ssl.c:2393)')))

and

2025-03-01 04:46:18,685 - ERROR - MapillaryUploadConnectionError: ('Connection aborted.', ConnectionResetError(104, 'Connection reset by peer'))

It all worked fine. I suspect these errors might be associated with trying to run multiple concurrent uploads, and that I am still using mapillary_tools version 0.12.1 (hihi)

No complaints here. I would rather have used rsync and a remote shell server process, though!

tao · March 4, 2025, 9:03pm

@GITNE @bob3bob3 Sorry for the inconvenience. There seem to be multiple issues:

1. RuntimeError: failed to create the cluster {}: this is likely because the server wasn’t able to handle large files. It is fixed now. Please try again.
2. Caused by SSLError(SSLEOFError(8, 'EOF occurred in violation of protocol (_ssl.c:2427)')): Are either of you using a proxy here? A proxy may cause the SSL handshake to fail. Can you verify this by turning the proxy off? If that helps, can you share more details about your proxy setup so that we can reproduce it?
3. MapillaryUploadConnectionError: ('Connection aborted.', ConnectionResetError(104, 'Connection reset by peer')): this is likely caused by the network. There is not much we can do other than trying again on another network.
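
For transient errors such as the connection reset above, retrying with a growing delay is the usual approach. The following is only a minimal sketch of that idea; `send_chunk` is a hypothetical callable used for illustration and is not part of mapillary_tools:

```python
import time


def retry_with_backoff(send_chunk, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call send_chunk() until it succeeds or max_attempts is reached.

    send_chunk is a hypothetical zero-argument callable (an assumption for
    this sketch); it is expected to raise ConnectionError on network failure.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return send_chunk()
        except ConnectionError:
            # ConnectionResetError is a subclass of ConnectionError.
            if attempt == max_attempts:
                raise  # give up and surface the last error
            # Wait base_delay, then 2x, 4x, ... before the next attempt.
            sleep(base_delay * 2 ** (attempt - 1))
```

The `sleep` parameter is injectable so the delays can be checked or skipped in tests.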

tao · March 5, 2025, 12:30am

Would I need to reset the upload?

No you don’t need to reset the upload. Let’s wait and see.

Regarding 2, do you think it is possible that the proxy servers you configured do not support HTTPS (so they do not respond to the SSL handshake initialization, hence the violation of protocol)?

Also, is this a new issue in v0.13, or did it also occur in v0.12? If it’s new since v0.13, then the only related change is pull request #698 in mapillary/mapillary_tools (“fix: fallback to system SSL certs when certifi fails” by ptpt), which falls back to the system CA certs when the bundled CA certs fail. Do you see any warnings like “SSL error occurred, falling back to system SSL certificates”?

I will read more about HTTP keep alive timeouts and get back here.

tao · March 5, 2025, 7:10pm

They do support HTTPS. They are protocol transparent SOCKS proxies. Besides, the same SSLError(SSLEOFError(8, 'EOF occurred in violation of protocol (_ssl.c:2427)')) error happens via direct connection as well as via a proxy.

Thanks for the clarification. Are you using WiFi, cable, or cellular data?

This is an old(er) issue.

I guess this error is transient, as I saw some of the uploads make progress. How frequently does it happen? Is it more frequent when using a proxy?

If the odds are low, I wonder whether reducing the number of HTTP requests would help. One possibility is to use chunked transfer encoding, so that we can upload a whole sequence/video with just one HTTP request. I need to confirm whether the server supports that well.
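
For reference, chunked transfer encoding frames a body as a series of size-prefixed chunks followed by a zero-length terminator. The sketch below only illustrates that HTTP/1.1 wire format; it says nothing about how mapillary_tools or the upload server implement it:

```python
def encode_chunked(chunks):
    """Yield the HTTP/1.1 chunked-transfer framing for an iterable of bytes."""
    for chunk in chunks:
        if not chunk:
            continue  # an empty chunk would be read as the terminator
        # Each chunk: payload size in hex, CRLF, payload, CRLF.
        yield f"{len(chunk):x}".encode("ascii") + b"\r\n" + chunk + b"\r\n"
    # A zero-length chunk marks the end of the body.
    yield b"0\r\n\r\n"


body = b"".join(encode_chunked([b"hello", b"mapillary"]))
print(body)  # b'5\r\nhello\r\n9\r\nmapillary\r\n0\r\n\r\n'
```

Because the total length does not have to be known up front, the whole stream can go out in a single request.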

Please note also that the error message says SSLError: HTTPSConnectionPool(host='rupload.facebook.com', port=443): Max retries exceeded with url: /mapillary_public_uploads/mly_tools_737db28b51bb9e407ddf8a31abb1771e.zip

I believe these are default internal retries made by the requests library that mapillary_tools uses.

tao · March 5, 2025, 7:37pm

Thanks @GITNE ! Wonder if capturing the network activities with tcpdump/wireshark can provide some insights.

Uploading is always single-threaded (sequential).

tao · March 6, 2025, 6:38am

@GITNE FYI chunked transfer encoding is implemented in the latest pre-release. See the instructions for updating your mapillary_tools in “[BUG] Interrupted uploads do not resume” (issue #569 in mapillary/mapillary_tools on GitHub).

GITNE · March 6, 2025, 10:00pm

@tao I think I found the cause of the resets. Simply put, rupload.facebook.com’s DNS TTL is too short.

$ dig +ttlunits 'rupload.facebook.com'
rupload.facebook.com.	1h	IN	CNAME	star.c10r.facebook.com.

When uploading, mapillary_tools makes DNS resolution requests with every HTTPS request and every chunk, usually against the local DNS cache. However, when the cache entry expires (which is dictated by the DNS TTL), the DNS resolver makes a new request to resolve rupload.facebook.com. It then so happens that in most cases rupload.facebook.com is resolved to a different IP address. The existing TLS session then sends data to the new IP address and hence causes a TLS EOF error, because naturally the new IP address does not have a TLS session running. In other words, the TLS session (or upload session) does not migrate to the newly resolved server IP address.
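
To observe whether the name really resolves to a different address over time, something like the following sketch could be run alongside an upload. The helper names `resolve_ipv4` and `address_changed` are made up for this illustration; only `socket.getaddrinfo` is a real standard-library call:

```python
import socket


def resolve_ipv4(host, resolver=socket.getaddrinfo):
    """Return the sorted IPv4 addresses the resolver currently gives for host."""
    infos = resolver(host, 443, socket.AF_INET, socket.SOCK_STREAM)
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr is (ip, port).
    return sorted({info[4][0] for info in infos})


def address_changed(host, previous, resolver=socket.getaddrinfo):
    """Compare a fresh lookup with an earlier one; return (changed, current)."""
    current = resolve_ipv4(host, resolver)
    return current != previous, current
```

Calling `resolve_ipv4("rupload.facebook.com")` once, and `address_changed(...)` with that result a minute or so later, shows whether the answer moved. This only observes DNS answers; it does not by itself show what an already open connection does.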

What can you do?

Make rupload.facebook.com’s DNS TTL indefinite (you will still get load balancing!)
Or, make the upload session migrate to the new IP address
Or, make rupload.facebook.com resolve always to only one IP address
Or, hard code an IP address in mapillary_tools

What can users do?

Resolve the rupload.facebook.com alias or the star.c10r.facebook.com host name once (usually to an IP address for their location) and map star.c10r.facebook.com to that static IP address by putting it into the hosts file
Or, set the environment variables MAPILLARY_UPLOAD_ENDPOINT and MAPILLARY_GRAPH_API_ENDPOINT to URLs in which the host names are replaced with the same resolved IP address:
```
MAPILLARY_UPLOAD_ENDPOINT=https://157.240.X.X/mapillary_public_uploads
MAPILLARY_GRAPH_API_ENDPOINT=https://157.240.X.X
```

However, these should only be considered a workaround until Mapillary fixes this issue permanently.
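
As a rough sketch of the hosts file route, the lines to add could be generated like this. The helper `hosts_entries` is hypothetical, and 203.0.113.10 is a documentation placeholder address, not a real upload server; substitute whatever address the name resolves to for you:

```python
def hosts_entries(ip, names):
    """Format hosts-file lines that pin every given name to the same IP address."""
    return [f"{ip}\t{name}" for name in names]


# Placeholder IP (assumption for illustration); the host names are the ones named above.
lines = hosts_entries("203.0.113.10", ["rupload.facebook.com", "star.c10r.facebook.com"])
print("\n".join(lines))
```

The printed lines would then be appended to the hosts file by hand (for example /etc/hosts on Linux).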

GITNE · March 7, 2025, 2:39pm

@tao I am no DNS expert but it looks like the rupload.facebook.com alias and star.c10r.facebook.com have very volatile and diverging TTLs. Maybe I am doing something wrong.

$ dig +ttlunits rupload.facebook.com
;; ANSWER SECTION:
rupload.facebook.com.	11m55s	IN	CNAME	star.c10r.facebook.com.
star.c10r.facebook.com.	40s	IN	A	157.240.223.17

;; Query time: 32 msec
;; SERVER: 127.0.0.53#53(127.0.0.53) (UDP)
;; WHEN: Fri Mar 07 14:16:37 UTC 2025
;; MSG SIZE  rcvd: 89
$ dig +ttlunits rupload.facebook.com
;; ANSWER SECTION:
rupload.facebook.com.	10m50s	IN	CNAME	star.c10r.facebook.com.
star.c10r.facebook.com.	13s	IN	A	157.240.27.18

;; Query time: 35 msec
;; SERVER: 127.0.0.53#53(127.0.0.53) (UDP)
;; WHEN: Fri Mar 07 14:17:43 UTC 2025
;; MSG SIZE  rcvd: 89
$ dig +ttlunits rupload.facebook.com
;; ANSWER SECTION:
rupload.facebook.com.	58m49s	IN	CNAME	star.c10r.facebook.com.
star.c10r.facebook.com.	45s	IN	A	157.240.27.18

;; Query time: 73 msec
;; SERVER: 127.0.0.53#53(127.0.0.53) (UDP)
;; WHEN: Fri Mar 07 14:28:50 UTC 2025
;; MSG SIZE  rcvd: 89

When I set my primary DNS server to Google’s DNS (8.8.8.8), the TTLs are also volatile. So it looks like the issue is at the source since, if I understand things correctly, the TTL should be propagated.

$ dig +ttlunits rupload.facebook.com
;; ANSWER SECTION:
rupload.facebook.com.	51m8s	IN	CNAME	star.c10r.facebook.com.
star.c10r.facebook.com.	50s	IN	A	157.240.252.10

;; Query time: 21 msec
;; SERVER: 8.8.8.8#53(8.8.8.8) (UDP)
;; WHEN: Fri Mar 07 14:34:08 UTC 2025
;; MSG SIZE  rcvd: 89
$ dig +ttlunits rupload.facebook.com
;; ANSWER SECTION:
rupload.facebook.com.	45s	IN	CNAME	star.c10r.facebook.com.
star.c10r.facebook.com.	45s	IN	A	157.240.252.10

;; Query time: 1 msec
;; SERVER: 8.8.8.8#53(8.8.8.8) (UDP)
;; WHEN: Fri Mar 07 14:34:12 UTC 2025
;; MSG SIZE  rcvd: 89

tao · March 7, 2025, 6:59pm

Thanks for the investigation @GITNE !

AFAIK once DNS resolves an IP for you, the subsequent HTTP/TLS/TCP session keeps using this IP address despite DNS changes (i.e. once the connection is established, it won’t suddenly switch to a new IP), so DNS is unlikely to be the cause IMO. Also, busy sites usually use TTLs of 1h or even shorter for better load balancing.

I’m trying to reproduce the offset reset issue with different network settings (VPN, proxies). I think that’s the key. Let’s see!

GITNE · March 7, 2025, 8:35pm

Generally, that’s true. However, when the TTL expires the resolver is forced to make a new DNS query, AFAIK. Otherwise, the TTL would be pointless.

Right, but most of them only push data and do not expect long-running uploads. For an upload server you would want the TTL to be indefinite and only have the host name resolve to different IPs depending on load. You can have dynamic TTLs on an upload server, but then you also have to make sure that upload sessions migrate to different IPs, which makes the whole concept of upload load balancing more complex than actually needed. As a compromise, you can also use a very long TTL, like a week (but conceptually it will not make much of a difference).

Test, test, test,…

GITNE · March 7, 2025, 10:46pm

@tao graph.mapillary.com also maps to star.c10r.facebook.com, which has a dynamic IP address and the same volatile TTL behavior as rupload.facebook.com.

mapillary_tools/api_v4.py (github.com/mapillary/mapillary_tools, main):

          MAPILLARY_GRAPH_API_ENDPOINT = os.getenv(
              "MAPILLARY_GRAPH_API_ENDPOINT", "https://graph.mapillary.com"

mapillary_tools/upload_api_v4.py (github.com/mapillary/mapillary_tools, main):

          url = f"{MAPILLARY_GRAPH_API_ENDPOINT}/finish_upload"

Hence, uploads can hit 100% on one IP address but ultimately fail on another, because the finish_upload request can go to a different IP address than the one used for uploading. And upload sessions do not migrate. This is really messy and confusing.

Hmm, if everything maps to star.c10r.facebook.com why the different aliases?

bob3bob3 · March 8, 2025, 12:35am

briefly @tao

I wasn’t complaining or overly concerned, but will do a log check after I update the tools.

GITNE · March 9, 2025, 1:39am

Uploading ZIP mly_tools_90f4e91938f32803fe70e8c82c5b8669.zip (1/1): 100%|█████████████████████████████████████████████████████████████████████████████| 85.9G/85.9G [20:57:55<00:00, 1.22MB/s]
2025-03-09 01:05:53,363 - INFO    -        1  ZIP files uploaded
2025-03-09 01:05:53,386 - INFO    -  88010.9M data in total
2025-03-09 01:05:53,420 - INFO    -  88010.9M data uploaded
2025-03-09 01:05:53,421 - INFO    -  79334.1s upload time

@tao Since I mapped rupload.facebook.com, graph.mapillary.com, and star.c10r.facebook.com to the same static IP address, everything works flawlessly free of SSLErrors! Plus, I can pause the upload session at any time and as often as I need or like, and the upload resumes reliably. Finally!

I am not sure but I think that mapping aliases only may not be enough. Again, I am not a DNS expert but something tells me that resolution may go A→CNAME→IP address.

Next, I am going to comment out the static IP address mapping from the hosts file and try substituting the host name in both endpoint URLs with star.c10r.facebook.com, to see whether I get SSLErrors and whether resuming uploads works properly. My expectation is that I should get SSLErrors again and that resuming uploads should break as well.

tao · March 11, 2025, 5:26am

Hey @GITNE very happy to see you get a workaround here.

I can only reproduce the offset reset issue by switching VPNs, and I can confirm that resumable uploads do not work across data centers (i.e. dc in the response): if you connect to a new data center, it’s likely that the offset will be reset. What affects the data center selection is likely your IP, which I assume is always routed to the nearest data center. By using a static IP for rupload.facebook.com, I guess you also fix which dc you connect to, so you don’t see any offset issues. I can’t reproduce any SSLError, so I can’t find more information here.
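
A toy model may make the offset reset easier to picture. Everything in it (`FakeDataCenter`, `upload`, `pick_dc`) is invented for illustration and is not the real upload API; it only assumes that each data center tracks upload progress on its own:

```python
class FakeDataCenter:
    """In-memory stand-in for one upload data center (hypothetical)."""

    def __init__(self):
        self.received = {}  # session key -> bytes stored so far

    def offset(self, session_key):
        return len(self.received.get(session_key, b""))

    def append(self, session_key, data):
        self.received[session_key] = self.received.get(session_key, b"") + data


def upload(data, session_key, pick_dc, chunk_size=4):
    """Send data chunk by chunk; return how many chunks had to be sent."""
    sent_chunks = 0
    while True:
        dc = pick_dc(sent_chunks)        # data center this request lands on
        begin = dc.offset(session_key)   # a data center only knows its own progress
        if begin >= len(data):
            return sent_chunks
        dc.append(session_key, data[begin:begin + chunk_size])
        sent_chunks += 1


data = bytes(range(16))
dc_a, dc_b = FakeDataCenter(), FakeDataCenter()
# Switching to another data center after 3 chunks restarts from offset 0.
print(upload(data, "SESSION_KEY", lambda n: dc_a if n < 3 else dc_b))  # 7
```

With a single data center the same 16 bytes need only 4 chunks; the extra 3 in the example are the work lost when the offset resets.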

BTW the mapillary_tools repo provides a neat test CLI to test upload without affecting your uploads:

python3 -m tests.cli.upload_api_v4 ~/Downloads/GS010002.360 SESSION_KEY --chunk_size=1 --user_name=YOUR_MLY_USERNAME

So you can experiment with different network configurations or parameters (e.g. chunk sizes).

GITNE · March 11, 2025, 11:52am

Did you throttle your upload speed? How large were your files? Try uploading a large file, like a few GBs, and throttle down to 64 kbps to make things perhaps a bit more extreme (but maybe not that unrealistic in some scenarios) to provoke an SSLError. Additionally, since graph.mapillary.com is shared between uploading and the web app, try surfing the web app at the same time over the same VPN connection. Maybe this has some impact too? Oh, and please flush your DNS cache first, then make sure that host names are resolved over the VPN connection, not by your local DNS server. Try using a non‑facebook.com DNS server.

TheWizard · March 11, 2025, 5:09pm

I have also had an SSL error a couple of times now; it just re-uploads the file, so no problem there. Just to let you know that it happens more often. Upload seems to be capped at 100 Mbit/s (I have a 1 Gbit/s fiber connection). Would it be an option to implement multi-upload (parallel uploads)?

GITNE · March 11, 2025, 5:21pm

Thank you for confirming the issue. However,…

… it is actually a real problem. The upload infrastructure design is broken. Throwing brute force or throughput at this problem is no solution at all. No amount of throughput is going to solve it.

And that broken design affects all upload routes, including the Mapillary mobile apps, which is especially annoying since many contributors continue to pay for metered mobile connections. Imho it is a disgrace for a multi‑billion dollar tech company not only to expect contributors to capture imagery for free but also to cause additional cost to those contributors (its second most valuable asset) because it is unable to do its simplest homework, like uploading files.
