@tao I am no DNS expert but it looks like the rupload.facebook.com alias and star.c10r.facebook.com have very volatile and diverging TTLs. Maybe I am doing something wrong.
$ dig +ttlunits rupload.facebook.com
;; ANSWER SECTION:
rupload.facebook.com. 11m55s IN CNAME star.c10r.facebook.com.
star.c10r.facebook.com. 40s IN A 157.240.223.17
;; Query time: 32 msec
;; SERVER: 127.0.0.53#53(127.0.0.53) (UDP)
;; WHEN: Fri Mar 07 14:16:37 UTC 2025
;; MSG SIZE rcvd: 89
$ dig +ttlunits rupload.facebook.com
;; ANSWER SECTION:
rupload.facebook.com. 10m50s IN CNAME star.c10r.facebook.com.
star.c10r.facebook.com. 13s IN A 157.240.27.18
;; Query time: 35 msec
;; SERVER: 127.0.0.53#53(127.0.0.53) (UDP)
;; WHEN: Fri Mar 07 14:17:43 UTC 2025
;; MSG SIZE rcvd: 89
$ dig +ttlunits rupload.facebook.com
;; ANSWER SECTION:
rupload.facebook.com. 58m49s IN CNAME star.c10r.facebook.com.
star.c10r.facebook.com. 45s IN A 157.240.27.18
;; Query time: 73 msec
;; SERVER: 127.0.0.53#53(127.0.0.53) (UDP)
;; WHEN: Fri Mar 07 14:28:50 UTC 2025
;; MSG SIZE rcvd: 89
When I set my primary DNS server to Google’s DNS (8.8.8.8), the TTLs are also volatile. So it looks like the issue is at the source, since, if I understand things correctly, the TTL should be propagated.
$ dig +ttlunits rupload.facebook.com
;; ANSWER SECTION:
rupload.facebook.com. 51m8s IN CNAME star.c10r.facebook.com.
star.c10r.facebook.com. 50s IN A 157.240.252.10
;; Query time: 21 msec
;; SERVER: 8.8.8.8#53(8.8.8.8) (UDP)
;; WHEN: Fri Mar 07 14:34:08 UTC 2025
;; MSG SIZE rcvd: 89
$ dig +ttlunits rupload.facebook.com
;; ANSWER SECTION:
rupload.facebook.com. 45s IN CNAME star.c10r.facebook.com.
star.c10r.facebook.com. 45s IN A 157.240.252.10
;; Query time: 1 msec
;; SERVER: 8.8.8.8#53(8.8.8.8) (UDP)
;; WHEN: Fri Mar 07 14:34:12 UTC 2025
;; MSG SIZE rcvd: 89
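For anyone who wants to watch this over time, here is a minimal sketch using dnspython (my choice of library, nothing the tools themselves use) that polls the same records and prints each TTL:

```python
# Minimal sketch, assuming the dnspython package (pip install dnspython).
# Polls rupload.facebook.com against 8.8.8.8 and prints each record's TTL,
# so the volatility shown in the dig runs above can be logged over time.
import time

import dns.resolver

resolver = dns.resolver.Resolver()
resolver.nameservers = ["8.8.8.8"]  # same public resolver as above

for _ in range(5):
    answer = resolver.resolve("rupload.facebook.com", "A")
    for rrset in answer.response.answer:  # the CNAME and A RRsets
        print(rrset.name, rrset.ttl, rrset[0])
    print("---")
    time.sleep(60)
```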
AFAIK once DNS resolves you an IP, the subsequent HTTP/TLS/TCP session keeps using this IP address despite DNS changes (i.e. once the connection is established, it won’t suddenly switch to a new IP), so DNS is unlikely the cause IMO. Also, busy sites usually use 1h or even shorter TTLs for better load balancing.
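A quick sketch of what I mean, using only the standard library:

```python
# Once the TCP/TLS connection is up, the peer IP is fixed for that
# connection's lifetime, even if a fresh DNS lookup returns another A record.
import socket
import ssl

ctx = ssl.create_default_context()
raw = socket.create_connection(("rupload.facebook.com", 443))
tls = ctx.wrap_socket(raw, server_hostname="rupload.facebook.com")
print("session pinned to:", tls.getpeername())  # constant for this session
print("fresh resolution:", socket.gethostbyname("rupload.facebook.com"))
tls.close()
```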
I’m trying to reproduce the offset reset issue with different network settings (VPN, proxies). I think that’s the key. Let’s see!
Generally, that’s true. However, when the TTL expires the resolver is forced to make a new DNS query, AFAIK. Otherwise, the TTL would be pointless.
Right, but most of them only push data and do not expect long-running uploads. For an upload server you would want the TTL to be indefinite and only have the host name resolve to different IPs depending on load. You can have dynamic TTLs on an upload server, but then you also have to make sure that upload sessions migrate between IPs, which makes the whole concept of upload load balancing more complex than actually needed. As a compromise, you could also use a very long TTL, like a week (though conceptually it would not make much of a difference).
@tao graph.mapillary.com also maps to star.c10r.facebook.com, which has a dynamic IP address and the same volatile TTL behavior as rupload.facebook.com.
Hence, an upload can hit 100% on one IP address but ultimately fail because the upload-finished request can go to a different IP address than the one the data was uploaded to. And upload sessions do not migrate. This is really messy and confusing.
Hmm, if everything maps to star.c10r.facebook.com, why the different aliases?
Uploading ZIP mly_tools_90f4e91938f32803fe70e8c82c5b8669.zip (1/1): 100%|█████████████████████████████████████████████████████████████████████████████| 85.9G/85.9G [20:57:55<00:00, 1.22MB/s]
2025-03-09 01:05:53,363 - INFO - 1 ZIP files uploaded
2025-03-09 01:05:53,386 - INFO - 88010.9M data in total
2025-03-09 01:05:53,420 - INFO - 88010.9M data uploaded
2025-03-09 01:05:53,421 - INFO - 79334.1s upload time
@tao Since I mapped rupload.facebook.com, graph.mapillary.com, and star.c10r.facebook.com to the same static IP address, everything works flawlessly, free of SSLErrors! Plus, I can pause the upload session at any time and as often as I need or like, and the upload resumes reliably. Finally!
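For the record, the mapping is just static entries in the hosts file; the address below is one of those dig returned earlier and is only an example, any currently valid one should do:

```
# /etc/hosts (or %SystemRoot%\System32\drivers\etc\hosts on Windows)
157.240.27.18	rupload.facebook.com
157.240.27.18	graph.mapillary.com
157.240.27.18	star.c10r.facebook.com
```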
I am not sure, but I think that mapping the aliases alone may not be enough. Again, I am not a DNS expert, but something tells me that resolution may go host name→CNAME→A record→IP address.
Next, I am going to comment out the static IP address mapping from the hosts file and substitute the host name of both endpoint URLs with star.c10r.facebook.com to see whether I get SSLErrors and whether resuming uploads works properly. My expectation is that I should get SSLErrors again and that resuming uploads should break too.
Hey @GITNE, very happy to see you got a workaround here.
I can only reproduce the offset reset issue by switching VPNs, and I can confirm that resumable uploads do not work across data centers (i.e. dc in the response): if you connect to a new data center, it’s likely the offset will be reset. What affects the data center selection is likely your IP, which I assume is always routed to the nearest data center. By using a static IP for rupload.facebook.com I guess it also fixes which dc to connect to, so you don’t see any offset issues. I can’t reproduce any SSLError, so I can’t find more information here.
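For reference, a rough sketch of how I check the dc: query the session’s offset endpoint and look at the response. The URL shape, handle, and token below are placeholders; only the offset and dc fields are the ones discussed here:

```python
# Rough sketch with placeholder URL/handle/token; only "offset" and "dc"
# in the JSON response are the fields discussed above.
import requests

session_url = "https://rupload.facebook.com/mapillary_public_uploads/<handle>"
headers = {"Authorization": "OAuth <user-access-token>"}

info = requests.get(session_url, headers=headers).json()
print("offset:", info.get("offset"), "dc:", info.get("dc"))
# If "dc" changes between calls (e.g. after switching VPNs), the offset is
# likely to come back as 0, matching the resets described above.
```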
BTW the mapillary_tools repo provides a neat test CLI to test upload without affecting your uploads:
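For example, from a checkout of the repo (exact flags may vary by version):

```
python3 -m tests.cli.upload_api_v4 --help
```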
Did you throttle your upload speed? How large were your files? Try uploading a large file, like a few GBs, and throttle down to 64 kbps to make things perhaps a bit more extreme (but maybe not that unrealistic in some scenarios) to provoke an SSLError. Additionally, since graph.mapillary.com is shared between uploading and the web app, try surfing the web app at the same time over the same VPN connection. Maybe this has some impact too? Oh, and please flush your DNS cache first, then make sure that host names are resolved over the VPN connection, not by your local DNS server. Try using a non-facebook.com DNS server.
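Something along these lines, as a sketch (the names and numbers are mine, just to illustrate the 64 kbps pacing):

```python
# Sketch of a crude ~64 kbit/s throttle: feed requests a paced generator
# instead of the raw file object. Chunk size and rate are illustrative.
import time

def throttled_reader(path, rate_bytes_per_s=8_000, chunk_size=1_024):
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            yield chunk
            time.sleep(len(chunk) / rate_bytes_per_s)  # pace to ~64 kbps

# e.g. requests.post(upload_url, data=throttled_reader("big.zip"), headers=...)
```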
I have also seen an SSL error a couple of times now; it just re-uploads the file, no problem there. Just to note that it happens more often. Upload seems to be capped at 100 Mbit/s (I have a 1 Gbit/s fiber connection). Would it be an option to implement multiple parallel uploads?
… it is actually a real problem. The upload infrastructure design is broken. Throwing brute force or throughput at this problem is no solution at all. No amount of throughput is going to solve it.
And that broken design affects all upload routes, including the Mapillary mobile apps, which is especially annoying since many contributors still pay for metered mobile connections. Imho it is a disgrace for a multi-billion dollar tech company to expect contributors not only to capture imagery for free but also to incur additional costs (contributors being their second most valuable asset) because the company is unable to do the simplest homework, like uploading files.
With v0.13.3 I think the egress can be saturated with a single upload process; I also tried running multiple MT processes to upload and didn’t notice any speed improvement compared to a single upload process.
I was indeed on the previous version. I tested it now with v0.13.3 and the upload speeds are greatly improved! There is now more fluctuation between the different uploads; it’s not related to my computer, as CPU time is at 20%. I presume it’s related to the server or interconnects. But I love those speeds way more than before!
@tao When building the latest mapillary_tools==0.13.3 from source with pip, the pynmea2==1.19 dependency fails to build. All other dependencies build flawlessly.
❯ python3 --version
Python 3.12.9
❯ pip --version
pip 25.0.1 from /usr/lib/python3.12/site-packages/pip (python 3.12)
❯ pip install --no-binary :all: mapillary_tools
Collecting pynmea2<2.0.0,>=1.12.0 (from mapillary_tools)
Downloading pynmea2-1.19.0.tar.gz (36 kB)
Preparing metadata (setup.py) ... error
error: subprocess-exited-with-error
× python setup.py egg_info did not run successfully.
│ exit code: 1
╰─> [6 lines of output]
Traceback (most recent call last):
File "<string>", line 2, in <module>
File "<pip-setuptools-caller>", line 34, in <module>
File "/tmp/pip-install-wk7g44vb/pynmea2_da86abe101874af696200f318c659b8c/setup.py", line 3, in <module>
import imp
ModuleNotFoundError: No module named 'imp'
[end of output]
note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed
× Encountered error while generating package metadata.
╰─> See above for output.
Looks like this has already been fixed upstream (the imp module was removed in Python 3.12), but the source dist has not been updated on PyPI yet.
I have made a handful of larger uploads with the tests.cli.upload_api_v4 module and a legit dummy upload with v0.13.3 into the actual upload feed, both over a direct connection without static IP address mapping (a sort of default net config). I have also played with different chunk sizes. None of the uploads caused an SSLError, even when surfing the Mapillary web app at the same time. All uploads also resumed correctly. I am not sure what you have changed, but things look stable for now. The DNS TTLs continue to be volatile and generally quite short, usually around one minute. I will continue to monitor the situation on upcoming uploads.
@tao I still occasionally get SSLErrors and progress resets, even with static IP address mapping:
Uploading ZIP mly_tools_687a1686021c47aa8fb831fba230de15.zip (1/1): 34%|█████████████████████████▉ | 12.5G/36.5G [2:56:50<5:38:48, 1.27MB/s]
2025-03-25 02:49:25,859 - WARNING - Error uploading chunk_size 5242880 at begin_offset 0: SSLError: HTTPSConnectionPool(host='rupload.facebook.com', port=443): Max retries exceeded with url: /mapillary_public_uploads/mly_tools_687a1686021c47aa8fb831fba230de15.zip (Caused by SSLError(SSLEOFError(8, 'EOF occurred in violation of protocol (_ssl.c:2427)')))
2025-03-25 02:49:25,860 - INFO - Retrying in 2 seconds (1/200)
Uploading ZIP mly_tools_687a1686021c47aa8fb831fba230de15.zip (1/1): 34%|█████████████████████████▉ | 12.5G/36.5G [3:13:11<6:12:30, 1.15MB/s]
Given its sporadic nature, maybe this also happens when the server is briefly overloaded? The server would then just close the oldest connection(s) to free resources for new connections?
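The retry pattern in the log above amounts to something like this sketch; the endpoint and headers are placeholders, and only the offset re-query, 5 MiB chunk size, and 2 s backoff come from the log itself:

```python
# Sketch of resume-after-SSLError: re-fetch the server-side offset, seek,
# and continue. URL/headers are placeholders, not the verified API surface.
import time

import requests

def upload_with_resume(session_url, path, headers,
                       chunk_size=5 * 1024 * 1024, max_retries=200):
    for _ in range(max_retries):
        try:
            offset = requests.get(session_url, headers=headers).json()["offset"]
            with open(path, "rb") as f:
                f.seek(offset)
                while chunk := f.read(chunk_size):
                    requests.post(session_url, data=chunk,
                                  headers=headers).raise_for_status()
            return  # reached EOF, upload complete
        except requests.exceptions.SSLError:
            time.sleep(2)  # matches the "Retrying in 2 seconds" above
    raise RuntimeError("gave up after max retries")
```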
Using PUT seems to work too. But you cannot resume from a PUT upload, i.e. the offset is always returned as 0 if you use PUT, which makes sense from its semantics perspective (replace a resource; see PUT - HTTP | MDN).
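Roughly like this, with placeholder URL and auth; the only observed behavior is the offset reading 0 after a PUT:

```python
# Sketch contrasting the two verbs against the same session URL; names are
# placeholders. POST appends at the stored offset, PUT replaces the resource,
# after which the queried offset reads 0 again (observed behavior).
import requests

url = "https://rupload.facebook.com/mapillary_public_uploads/<handle>"
headers = {"Authorization": "OAuth <token>"}

requests.post(url, data=b"chunk", headers=headers)       # resumable append
requests.put(url, data=b"whole-file", headers=headers)   # full replace
print(requests.get(url, headers=headers).json()["offset"])  # -> 0 after PUT
```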