Upload server error: failed to create the cluster {}

@tao I am no DNS expert but it looks like the rupload.facebook.com alias and star.c10r.facebook.com have very volatile and diverging TTLs. Maybe I am doing something wrong.

$ dig +ttlunits rupload.facebook.com
;; ANSWER SECTION:
rupload.facebook.com.	11m55s	IN	CNAME	star.c10r.facebook.com.
star.c10r.facebook.com.	40s	IN	A	157.240.223.17

;; Query time: 32 msec
;; SERVER: 127.0.0.53#53(127.0.0.53) (UDP)
;; WHEN: Fri Mar 07 14:16:37 UTC 2025
;; MSG SIZE  rcvd: 89
$ dig +ttlunits rupload.facebook.com
;; ANSWER SECTION:
rupload.facebook.com.	10m50s	IN	CNAME	star.c10r.facebook.com.
star.c10r.facebook.com.	13s	IN	A	157.240.27.18

;; Query time: 35 msec
;; SERVER: 127.0.0.53#53(127.0.0.53) (UDP)
;; WHEN: Fri Mar 07 14:17:43 UTC 2025
;; MSG SIZE  rcvd: 89
$ dig +ttlunits rupload.facebook.com
;; ANSWER SECTION:
rupload.facebook.com.	58m49s	IN	CNAME	star.c10r.facebook.com.
star.c10r.facebook.com.	45s	IN	A	157.240.27.18

;; Query time: 73 msec
;; SERVER: 127.0.0.53#53(127.0.0.53) (UDP)
;; WHEN: Fri Mar 07 14:28:50 UTC 2025
;; MSG SIZE  rcvd: 89

When I set my primary DNS server to Google’s DNS (8.8.8.8), the TTLs are also volatile. So it looks like the issue is at the source since, if I understand things correctly, the TTL should be propagated.

$ dig +ttlunits rupload.facebook.com
;; ANSWER SECTION:
rupload.facebook.com.	51m8s	IN	CNAME	star.c10r.facebook.com.
star.c10r.facebook.com.	50s	IN	A	157.240.252.10

;; Query time: 21 msec
;; SERVER: 8.8.8.8#53(8.8.8.8) (UDP)
;; WHEN: Fri Mar 07 14:34:08 UTC 2025
;; MSG SIZE  rcvd: 89
$ dig +ttlunits rupload.facebook.com
;; ANSWER SECTION:
rupload.facebook.com.	45s	IN	CNAME	star.c10r.facebook.com.
star.c10r.facebook.com.	45s	IN	A	157.240.252.10

;; Query time: 1 msec
;; SERVER: 8.8.8.8#53(8.8.8.8) (UDP)
;; WHEN: Fri Mar 07 14:34:12 UTC 2025
;; MSG SIZE  rcvd: 89
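
For anyone who wants to compare these readings numerically, here is a minimal Python sketch that converts the `+ttlunits` values above into seconds. The helper names (`ttl_to_seconds`, `parse_answers`) are made up for illustration and are not part of dig or mapillary_tools. Keep in mind that a caching resolver normally reports the remaining TTL, so a value that counts down between two queries is expected by itself.

```python
import re

# dig +ttlunits prints TTLs with the unit suffixes s, m, h, d and w.
_TTL_PART = re.compile(r"(\d+)([wdhms])")
_UNIT_SECONDS = {"w": 604800, "d": 86400, "h": 3600, "m": 60, "s": 1}


def ttl_to_seconds(ttl):
    """Convert a TTL such as '11m55s' (or a plain '715') to seconds."""
    if ttl.isdigit():
        return int(ttl)
    parts = _TTL_PART.findall(ttl)
    # Reject anything that is not made up purely of <number><unit> pairs.
    if not parts or "".join(n + u for n, u in parts) != ttl:
        raise ValueError(f"unrecognised TTL: {ttl!r}")
    return sum(int(n) * _UNIT_SECONDS[u] for n, u in parts)


def parse_answers(dig_output):
    """Extract (name, ttl_seconds, type, value) from dig answer lines."""
    records = []
    for line in dig_output.splitlines():
        fields = line.split()
        # Answer lines look like: name TTL IN TYPE value
        if len(fields) == 5 and fields[2] == "IN" and not line.startswith(";"):
            name, ttl, _cls, rtype, value = fields
            records.append((name, ttl_to_seconds(ttl), rtype, value))
    return records
```

Feeding it the saved output of several dig runs makes it easy to see which record's TTL was refreshed (jumped up) and which one merely counted down.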

Thanks for the investigation @GITNE !

AFAIK, once DNS resolves an IP for you, the subsequent HTTP/TLS/TCP session keeps using this IP address despite DNS changes (i.e. once the connection is established, it won’t suddenly switch to a new IP), so DNS is unlikely to be the cause IMO. Also, busy sites usually use TTLs of 1h or even shorter for better load balancing.

I’m trying to reproduce the offset reset issue with different network settings (VPN, proxies). I think that’s the key. Let’s see!

Generally, that’s true. However, when the TTL expires the resolver is forced to make a new DNS query, AFAIK. Otherwise, the TTL would be pointless.

Right, but most of them only push data and do not expect long-running uploads. For an upload server you would want the TTL to be indefinite and have only the host name resolve to different IPs depending on load. You can have dynamic TTLs on an upload server, but then you also have to make sure that upload sessions migrate to different IPs, which makes the whole concept of upload load balancing more complex than actually needed. As a compromise, you can also use a very long TTL, like a week (but conceptually it will not make much of a difference).

:+1: Test, test, test,…

@tao graph.mapillary.com also maps to star.c10r.facebook.com, which has a dynamic IP address and the same volatile TTL behavior as rupload.facebook.com.

Hence, uploads can hit 100% on one IP address but ultimately fail on another IP address because the upload finished request can go to a different IP address than the upload IP address. And, upload sessions do not migrate. This is really messy and confusing.
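
One way to check whether the aliases currently agree is to resolve each of them and compare the answers. The sketch below is only an illustration: `resolve_ipv4` and `hosts_disagree` are hypothetical helpers, the resolver is injectable so it can be tried without network access, and a single lookup only reflects the moment it was made.

```python
import socket


def resolve_ipv4(host, port=443, getaddrinfo=socket.getaddrinfo):
    """Return the sorted IPv4 addresses the resolver gives for host."""
    infos = getaddrinfo(host, port, socket.AF_INET, socket.SOCK_STREAM)
    # Each entry is (family, type, proto, canonname, sockaddr);
    # for AF_INET the sockaddr is (address, port).
    return sorted({info[4][0] for info in infos})


def hosts_disagree(hosts, getaddrinfo=socket.getaddrinfo):
    """Resolve every host and report whether the answers differ."""
    resolved = {h: resolve_ipv4(h, getaddrinfo=getaddrinfo) for h in hosts}
    distinct = {tuple(ips) for ips in resolved.values()}
    return len(distinct) > 1, resolved


# Example call (performs real DNS lookups, so it is left commented out):
# hosts_disagree(["rupload.facebook.com", "graph.mapillary.com"])
```

If the first element of the result is True, the upload host and the API host resolved to different addresses at that moment.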

Hmm, :thinking: if everything maps to star.c10r.facebook.com why the different aliases?

briefly @tao

I wasn’t complaining or overly concerned, but will do a log check after I update the tools.

Uploading ZIP mly_tools_90f4e91938f32803fe70e8c82c5b8669.zip (1/1): 100%|█████████████████████████████████████████████████████████████████████████████| 85.9G/85.9G [20:57:55<00:00, 1.22MB/s]
2025-03-09 01:05:53,363 - INFO    -        1  ZIP files uploaded
2025-03-09 01:05:53,386 - INFO    -  88010.9M data in total
2025-03-09 01:05:53,420 - INFO    -  88010.9M data uploaded
2025-03-09 01:05:53,421 - INFO    -  79334.1s upload time

:partying_face: @tao Since I mapped rupload.facebook.com, graph.mapillary.com, and star.c10r.facebook.com to the same static IP address, everything works flawlessly free of SSLErrors! Plus, I can pause the upload session at any time and as often as I need or like, and the upload resumes reliably. Finally! :relieved_face:
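
For reference, the kind of hosts file entries described above could be generated like this. This is a sketch under assumptions: the IP address is just one of the addresses from the dig output earlier in the thread, not necessarily the one used here, and `hosts_entries` is a made-up helper. On Linux the resulting lines would go into /etc/hosts.

```python
import ipaddress


def hosts_entries(ip, names):
    """Build hosts-file lines that pin every name to one static IP."""
    ipaddress.ip_address(ip)  # raises ValueError if ip is not a valid address
    return [f"{ip}\t{name}" for name in names]


# Placeholder address taken from the dig output above (an assumption).
EXAMPLE_IP = "157.240.27.18"
NAMES = ["rupload.facebook.com", "graph.mapillary.com", "star.c10r.facebook.com"]

for entry in hosts_entries(EXAMPLE_IP, NAMES):
    print(entry)
```

Pinning an address this way bypasses DNS entirely, so the entries have to be updated by hand if that address ever stops serving uploads.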

I am not sure, but I think that mapping only the aliases may not be enough. Again, I am not a DNS expert, but something tells me that resolution may go A → CNAME → IP address.

Next, I am going to comment out the static IP address mapping from the hosts file and try to substitute the host name of both endpoint URLs with star.c10r.facebook.com to see whether I get SSLErrors and whether resuming uploads works properly. My expectation is that I should get SSLErrors again and that resuming uploads should break as well.

Hey @GITNE very happy to see you get a workaround here.

I can only reproduce the offset reset issue by switching VPNs, and I can confirm that resumable uploads do not work across data centers (see dc in the response): if you connect to a new data center, the offset will likely be reset. What determines the data center selection is likely your IP, which I assume is always routed to the nearest data center. By using a static IP for rupload.facebook.com, I guess you also pin which dc you connect to, so you don’t see any offset issues. I can’t reproduce any SSLError, so I can’t find more information here.
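
The behaviour described above can be sketched as a small decision helper: resume from whatever offset the server reports, and flag when the data center changed between requests. The function `plan_resume` and its parameters are assumptions for illustration only; apart from the `dc` value mentioned above, nothing here is taken from the actual upload API or from mapillary_tools.

```python
def plan_resume(local_offset, server_offset, last_dc, current_dc):
    """Decide where to continue an upload.

    Returns (resume_from, bytes_to_resend, dc_changed).
    All names are hypothetical; offsets are byte counts.
    """
    # A change is only meaningful once we have seen a previous data center.
    dc_changed = last_dc is not None and current_dc != last_dc
    # The server is the source of truth for what it has actually stored.
    resume_from = server_offset
    # Bytes the client believed were uploaded but must be sent again.
    bytes_to_resend = max(0, local_offset - server_offset)
    return resume_from, bytes_to_resend, dc_changed


# Same data center, offsets agree: simply continue.
print(plan_resume(1000, 1000, "dc-a", "dc-a"))
# Different data center and the server reports 0: start over.
print(plan_resume(1000, 0, "dc-a", "dc-b"))
```

Logging the third value next to each offset would show directly whether an offset reset coincides with a data center switch.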

BTW the mapillary_tools repo provides a neat test CLI to test upload without affecting your uploads:

python3 -m tests.cli.upload_api_v4 ~/Downloads/GS010002.360 SESSION_KEY --chunk_size=1 --user_name=YOUR_MLY_USERNAME

So you can experiment with different network configurations or parameters (e.g. chunk sizes).

Did you throttle your upload speed? How large were your files? Try uploading a large file, a few GB in size, and throttle down to 64 kbps to make things a bit more extreme (though perhaps not that unrealistic in some scenarios) and provoke an SSLError. Additionally, since graph.mapillary.com is shared between uploading and the web app, try browsing the web app at the same time over the same VPN connection. Maybe this has some impact too? Oh, and please flush your DNS cache first, then make sure that host names are resolved over the VPN connection, not by your local DNS server. Try using a non‑facebook.com DNS server.


I have also hit an SSL error a couple of times now; it just re-uploads the file, so no problem there. Just letting you know that it happens to more people. Upload seems to be capped at 100 Mbit/s (I have a 1 Gbit/s fiber connection). Would it be an option to implement a multi upload?

Thank you for confirming the issue. However,…

… it is actually a real problem. The upload infrastructure design is broken. Throwing brute force or throughput at this problem is no solution at all. No amount of throughput is going to solve it.

And that broken design affects all upload routes, including the Mapillary mobile apps, which is especially annoying since many contributors continue to pay for metered mobile connections. IMHO it is a disgrace for a multi‑billion dollar tech company to expect contributors not only to capture imagery for free but also to bear additional costs (contributors being its second most valuable asset) because it is unable to do its simplest homework, like uploading files.

@TheWizard were you running the latest version, Release v0.13.3 · mapillary/mapillary_tools · GitHub? If not, could you try it out and let me know if the speed improves?

With v0.13.3 I think the egress can be saturated with a single upload process: I also tried running multiple MT processes to upload and didn’t notice any speed improvement compared to a single upload process.

Hi tao,

I was indeed on the previous version. I have now tested with version 0.13.3 and the upload speeds are greatly improved! There is now more fluctuation between the different uploads, but it’s not related to my computer: CPU time is at 20%. I presume it’s related to the server or interconnects. But I love those speeds way more than before :slight_smile:
