We're launching photorealistic Neural Radiance Fields (NeRF) at Mapillary!

Neural Radiance Fields, or NeRFs, are a relatively new type of 3D model that can be generated from a set of 2D images of a scene. The technology uses machine learning to analyze the images and build a 3D representation of the scene. This allows for incredibly detailed and realistic reconstructions of real-world places, capturing everything from the intricate details of buildings to the lush foliage of trees, including view-dependent effects. But that’s not all: once trained, a NeRF can be used for novel view synthesis, letting us render a video from any angle along any camera trajectory. This means we can navigate around the 3D model as if we were actually there, giving us a completely new way to experience and explore the world. Whether you’re interested in exploring new places or just want to see your local neighborhood in a whole new light, our NeRF reconstructions are a powerful tool for mapping enthusiasts everywhere.
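For readers curious about the rendering step mentioned above: at its core, a NeRF predicts a density and a color for sample points along each camera ray, and the final pixel color is an alpha-composite of those samples. Below is a minimal, illustrative Python sketch of that compositing step, not Mapillary's actual pipeline; the function and the sample values are made up for demonstration:

```python
import math

def composite_ray(densities, colors, deltas):
    """Alpha-composite color samples along a single camera ray.

    NeRF-style volume rendering: each sample i contributes its color
    weighted by its opacity (1 - exp(-sigma_i * delta_i)) and by the
    transmittance T_i, i.e. how much light survives the samples in front.
    """
    transmittance = 1.0
    pixel = [0.0, 0.0, 0.0]
    for sigma, color, delta in zip(densities, colors, deltas):
        alpha = 1.0 - math.exp(-sigma * delta)   # opacity of this sample
        weight = transmittance * alpha
        for k in range(3):
            pixel[k] += weight * color[k]
        transmittance *= 1.0 - alpha             # light left after this sample
    return pixel

# A single very dense red sample: the ray composites to (almost) pure red.
print(composite_ray([50.0], [(1.0, 0.0, 0.0)], [1.0]))
```

A trained NeRF runs this compositing for every pixel of every requested viewpoint, which is what makes arbitrary camera trajectories possible.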

You can read the full announcement on the Mapillary blog and visit the map on Mapillary to see them all!

You can also get involved and capture your own places to populate our maps with even more amazing NeRF reconstructions. We have instructions available on how to capture the data in the best possible way. Let us know what you think; we look forward to seeing NeRF captures from the community!

13 Likes

Wohoo! By the way, when you play the YouTube video above, try clicking and dragging to look around in 360 degrees. :smiley:

2 Likes

Will small portions of 360 videos that I’ve uploaded work with this?

2 Likes

Broadly, things have to be captured following the instructions from the help center. If you happened to do that style of capture, you can suggest those sequences for NeRF; otherwise, we’d love for you to give it a try following those instructions and we’ll NeRF ’em!

3 Likes

What an amazing thing, @nikola :slight_smile:

BR, Yaro

1 Like

Will it be possible to download the NeRF data? Or is it already available via the API (the two applicable documented fields are mesh and sfm_cluster)?

I’d really like to investigate pulling it into JOSM/Rapid as a background layer for mapping. I think this would be extraordinarily useful in the following situations:

  • New construction
  • Overhead obstructions (trees, tunnels, etc.)
  • Height information (bridges, passageways, etc.)
  • High resolution imagery of a control point (such as a survey marker) for aligning aerial imagery
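For anyone who wants to poke at this, here is a rough sketch of how one might request those two documented fields from the Mapillary Graph API. The image ID and access token below are placeholders, and whether NeRF outputs will ever be exposed through these fields is exactly the open question above:

```python
from urllib.parse import urlencode

GRAPH_URL = "https://graph.mapillary.com"  # Mapillary Graph API endpoint

def image_asset_url(image_id: str, access_token: str) -> str:
    """Build a Graph API request for the documented mesh/sfm_cluster fields.

    The JSON response contains download URLs for each asset; actually
    fetching and parsing it is omitted here since it needs a valid token.
    """
    query = urlencode({
        "fields": "mesh,sfm_cluster",
        "access_token": access_token,
    })
    return f"{GRAPH_URL}/{image_id}?{query}"

# Placeholder ID and token for illustration only.
print(image_asset_url("123456789", "YOUR_ACCESS_TOKEN"))
```

A JOSM/Rapid background layer would presumably need the rendered imagery rather than the raw mesh/SfM data, so this is just a starting point for exploration.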

@nikola Cool :sunglasses: stuff indeed!
Do fixed white balance, exposure time, and sensitivity help, or are the images color-normalized over the entire sequence before processing?

I am assuming you won’t be doing any preprocessing to identify areas potentially suitable for this, and will instead rely on suggestions only, given the resource constraints?

@nikola Yeah, and my next questions would be: in what time frame might we expect an answer (positive or negative) after posting a suggestion? And what about adding a contributor’s NeRFs to their feed?

@nikola The “Suggest for NeRF” button in the “Advanced” sub‑menu should only be visible to logged‑in users. Clicking it while not logged in just gives an obscure error message. And finally, once a sequence has been suggested, the button should either vanish or be disabled with its label changed to something like “Suggested for NeRF”.

Thanks for the feedback! I’ll also look into the issue for logged-out users.
We’ve added some more info to the guide: https://help.mapillary.com/hc/en-us/articles/12769328936476-NeRF-capture-instructions We plan on reviewing and adding new NeRFs on a weekly basis.

1 Like

称名寺(Shomyoji Temple) is a Buddhist temple located in Yokohama, Japan. The Kanazawa Bunko, which is attached to the temple, was built in the 13th century and had a large collection of books. It is sometimes said to be the first library built by samurai.

  • LhXImQ6qRp3vMKF7VUzNcd
  • bi0PGhOFKWJwvEaCeVr3ym
  • OTCcDt1gn4xU7WeHfuVq3Q

I have recommended these sequences as NeRF candidates because they meet three criteria:

  1. It is an important building in the history of Japan and of interest to many people.
  2. Few tourists will be in the picture.
  3. It is easily reachable by public transportation.

In taking the NeRF photos, I have three questions:

  • A Mapillary sequence may be split into multiple sequences based on the distance and angle between photos. How do I submit multiple sequences of a single object as one NeRF?
  • I shot a single object with multiple cameras, for example an Android smartphone and a Micro Four Thirds camera. Can I submit these multiple sequences as a NeRF?
  • Digital cameras often do not have an electronic compass; is it possible to submit a sequence without an EXIF compass direction tag as a NeRF?

Hey Gitne!
Yes, generally it helps if these camera parameters are fixed, as we then have better correspondence of the RGB values across different images.

1 Like

Thanks for capturing the Shomyoji Temple and the Kanazawa Bunko, this looks like a great scene to create a NeRF of!
To answer your questions:

  1. Please tag all sequences as NeRF candidates; we will merge sequences that belong to a single capture for you. In the future, we may also expose a UI to let you choose which sequences belong together.
  2. Note that NeRFs work best if the whole capture was done with a single camera. However, we are happy if you tag all of your sequences as NeRF candidates even when they were taken with different camera models. We will run our pipeline on both captures and publish the NeRF from whichever data yielded the better result on the Mapillary website.
  3. As far as I know, we need at minimum a GPS location, so your digital camera should geotag all captured images and save the geotag in the EXIF data. A view direction from a digital compass in the EXIF is optional.
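On the missing-compass point: since only the GPS geotag is required, a rough view direction can also be derived afterwards from consecutive GPS fixes, assuming the camera points roughly along the direction of travel. A small illustrative helper (not part of any Mapillary tooling; the coordinates are made up):

```python
import math

def bearing_deg(lat1, lon1, lat2, lon2):
    """Initial great-circle bearing from point 1 to point 2, in degrees.

    If a camera lacks a compass, a view direction can be approximated as
    the bearing from one GPS fix to the next along the capture track.
    """
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dlon = math.radians(lon2 - lon1)
    x = math.sin(dlon) * math.cos(phi2)
    y = (math.cos(phi1) * math.sin(phi2)
         - math.sin(phi1) * math.cos(phi2) * math.cos(dlon))
    return math.degrees(math.atan2(x, y)) % 360.0

# Moving due north gives ~0 degrees; due east gives ~90 degrees.
print(bearing_deg(35.0, 139.0, 35.001, 139.0))
print(bearing_deg(35.0, 139.0, 35.0, 139.001))
```

This only approximates the walking direction, not where the lens actually pointed, so it is a fallback rather than a replacement for a real compass reading.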

2 Likes

Thank you for answering my questions in detail. I have also submitted as NeRF a sequence taken of Shomyoji Temple using a Panasonic GM1S. These photographs contain more detailed surface of the wooden architecture.

The next time I take a photo for NeRF, I will fix the exposure and white balance.
Thanks for your advice, it is much appreciated.

1 Like

Those videos are an amazing tech demo! I hope we will be able to freely move in those environments soon!

4 Likes

The NeRF videos are popping up nicely. :smiley:
Here are some suggestions on how to perhaps make the videos even better, if you can:

  • It would be nice to have a standard resolution and aspect ratio, like 3840×2160 (4K) and 16:9
  • Make the videos fast start capable by adding -movflags faststart. This is especially helpful when streaming videos over the internet.
  • You can drop the useless handler name by adding -empty_hdlr_name 1
  • Lowering the GOP to 30 or even 15 frames (the -g option) should get rid of perhaps even more artifacts
  • -psy 1 is basically useless at these resolutions but costs encoding time and visual quality
  • Consider using -bitexact to get rid of even more useless ballast for streaming over the internet
  • 60 fps is nice! :+1:
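For anyone wanting to try these flags themselves, here is a hypothetical ffmpeg invocation pulling the suggestions above together; the file names are placeholders and the exact option set depends on the ffmpeg build:

```shell
# Hypothetical re-encode combining the suggestions above.
#   -g 30                  shorter GOP (a keyframe every half second at 60 fps)
#   -psy 0                 disable x264 psychovisual optimizations
#   -movflags +faststart   move the moov atom up front for streaming
#   -empty_hdlr_name 1     drop the handler-name string from the mp4
#   -fflags +bitexact      strip encoder metadata for bit-exact output
ffmpeg -i input.mp4 \
  -c:v libx264 -crf 23 -g 30 -psy 0 \
  -movflags +faststart -empty_hdlr_name 1 \
  -fflags +bitexact \
  output.mp4
```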

Glad that you like it! :slight_smile:

I will change it such that all future videos are 1920×1080 pixels. For higher resolutions like 4K, I do not think we would see a big quality gain, as the limiting factor becomes our NeRF technology.

We don’t use ffmpeg for creating the videos as of now, but I will have a look into adding the video settings you proposed.

1 Like

We don’t use ffmpeg for creating the videos as of now, but I will have a look into adding the video settings you proposed.

x264 - core 157 - H.264/MPEG-4 AVC codec - Copyleft 2003-2018 - http://www.videolan.org/x264.html - options: cabac=1 ref=3 deblock=1:0:0 analyse=0x3:0x113 me=hex subme=7 psy=1 psy_rd=1.00:0.00 mixed_ref=1 me_range=16 chroma_me=1 trellis=1 8x8dct=1 cqm=0 deadzone=21,11 fast_pskip=1 chroma_qp_offset=-2 threads=48 lookahead_threads=8 sliced_threads=0 nr=0 decimate=1 interlaced=0 bluray_compat=0 constrained_intra=0 bframes=3 b_pyramid=2 b_adapt=1 b_bias=0 direct=1 weightb=1 open_gop=0 weightp=2 keyint=250 keyint_min=25 scenecut=40 intra_refresh=0 rc_lookahead=40 rc=crf mbtree=1 crf=23.0 qcomp=0.60 qpmin=0 qpmax=69 qpstep=4 ip_ratio=1.40 aq=1:1.0

Oh right, I must have overlooked this. :face_with_hand_over_mouth: It’s VideoLAN indeed. Anyhow, VideoLAN as a frontend uses basically the same libavcodec and libavformat backends as FFmpeg, so you should be able to pass FFmpeg options down the stack. :wink:

Unfortunately, I have not had the time to capture a sequence specifically for NeRF so far but I am already looking out for some potential places and planning capture routes.

2 Likes

Hey, really neat idea!

What I don’t get, though, is why some sequences have the “Suggest for NeRF” button disabled even though they have 300 images? Plus, considering the 200-image limit, it makes it impossible to use “breakout” sequences…

For examples:
1/3 KrFdxqWmjkUDOepL9wnhgV
2/3 YV8a0cf6y3HukSotRgve5r
3/3 369VimT4zqAcLosklejayQ (tail sequence, 136 images)

These are also part of the same walk:
znUB8MYTsm59NtrugAxGfq - 112
U13r9Q5G0nBPJgNxVTZzKX - 126
INbOA9sy8CUziEac5HD3Lu - 160
pfor5cX8wFjk7yBR4IHmGD - 47