Anonymize (blurring faces and license plates) before upload

With the recent acquisition of Mapillary, many users are concerned about Facebook’s access to unblurred images.

One solution could be adding one more step to the workflow: blurring faces and license plates BEFORE uploading to Mapillary (or OpenStreetCam, OpenTrailView, etc.).

Well, I tried this project https://github.com/understand-ai/anonymizer and it works very well. It's open source and cross-platform. You need Python 3.6 on your system (I tried with Python 3.8 and it failed, but I forked the repo and I'll try to add support for recent versions).

Installation
$ python --version
Python 3.6.10

$ python -m venv ~/.virtualenvs/anonymizer
$ source ~/.virtualenvs/anonymizer/bin/activate

$ git clone https://github.com/understand-ai/anonymizer
$ cd anonymizer

$ pip install --upgrade pip
$ pip install -r requirements.txt

Usage
In my test, processing a 10 MB 360° pano JPG consumed about 3 GB of RAM, so it's recommended to close heavy applications on your system before running it.
PYTHONPATH=$PYTHONPATH:. python anonymizer/bin/anonymize.py --input /path/to/input_folder --image-output /path/to/output_folder --weights weights

Replace the input and output folder paths with your own.
If the weights folder does not exist, it is created and the files weights_face_v1.0.0.pb and weights_plate_v1.0.0.pb are downloaded automatically.

I’ll try to build a docker image for easy installation and usage.


Very nice workflow, thanks.

But it would be nice if Mapillary just gave us a checkbox option to indicate that my original raw upload(s) should be removed from their servers after the blur algorithm has been applied, so they keep only the blurred versions of uploaded images.

Well done @juanman. This looks promising and could work for my workflow, because I use action cams and a 360° camera. Any idea how much time this extra step consumes? I sometimes shoot > 10,000 images a day. I agree with @micmin1972, but we'll have to wait until Mapillary reacts, which could take some time I guess.

Hi
Looks pretty good.
Can many images be anonymized at the same time or does this have to be done for each individual image?
Regards
Dominik

Do drop a line when you have added 3.8 support !

I don't know if there is a GUI. The script writes a .json file with the coordinates of the detected plates and faces, so that could be the input data for a GUI.
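To sketch that idea: a GUI would mainly need to turn the detection JSON into editable rectangles. Note that the field names below (`kind`, `score`, `x_min`, `y_min`, `x_max`, `y_max`) are my assumption about the anonymizer's output format, so check them against a real file first:

```python
import json

# Sample detection data; the field names and layout are an assumption
# about the anonymizer's JSON output, not verified against a real file.
SAMPLE = """[
  {"kind": "plate", "score": 0.87,
   "x_min": 1180.0, "y_min": 640.0, "x_max": 1290.0, "y_max": 672.0},
  {"kind": "face", "score": 0.12,
   "x_min": 300.5, "y_min": 210.0, "x_max": 332.0, "y_max": 250.5}
]"""

def boxes_for_gui(detections, min_score=0.3):
    """Convert raw detections into (kind, left, top, width, height)
    tuples that a GUI could render as editable rectangles."""
    boxes = []
    for det in detections:
        if det["score"] < min_score:
            continue  # skip low-confidence detections
        boxes.append((det["kind"],
                      int(det["x_min"]), int(det["y_min"]),
                      int(det["x_max"] - det["x_min"]),
                      int(det["y_max"] - det["y_min"])))
    return boxes

print(boxes_for_gui(json.loads(SAMPLE)))
# Only the plate survives the default 0.3 score filter.
```

Lowering `min_score` (as discussed further down in this thread) would surface the second, low-confidence detection too.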

In my tests, I didn't see any false positives.

Another project for the same purpose is https://github.com/everestpipkin/image-scrubber. It's a web GUI, but there you do the blurring manually.

Yes, you pass the input directory path and the script blurs all the images in that directory.

I succeeded in making this tool work under Debian and did some tests with captured sequences. My experience, mainly with plate detection, as this is easier to verify than face detection:

  • Face and plate detection seem to work very well. However, with the default parameters some plates are not detected. Lowering the detection thresholds from 0.3 to 0.1 results in close to a 100% detection rate: in several hundred photos, I did not find one where a plate or a face had not been detected.
  • In some rare cases a plate is detected at a bad position, and the bounding rectangle does not cover the plate, or does not cover it fully. I suppose that this is a bug in the tool.
  • With the default blurring parameters (--obfuscation-kernel 21,2,9), large plates in the foreground are not sufficiently blurred. The size of the Gaussian kernel must be increased significantly to get good results. I got good results with --obfuscation-kernel 47,1,9.
  • Surprisingly (for me), the processing performance of the tool depends mainly on the blurring parameters, especially on the Gaussian kernel size value. On my system, processing a picture with the default values takes about one minute. With --obfuscation-kernel 47,1,9 it takes up to 3 minutes; with --obfuscation-kernel 1,1,1 it takes less than 10 seconds (without any blurring, of course).

To get a decent processing time I set up this workflow:

  • Process the images with anonymizer --obfuscation-kernel 1,0,1
  • Throw away the output images and keep the JSON files
  • Blur the original images with ImageMagick's -gaussian-blur, using the rectangle coordinates from the JSON files

This is sufficiently fast for my purpose. The blurred areas have sharp borders, which does not bother me. There is probably a way to get smooth transitions with ImageMagick.
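The same detect-then-blur-yourself workflow can be sketched in Python with Pillow instead of ImageMagick (another poster below uses Pillow for the same reason). The JSON field names are my assumption about the anonymizer's sidecar files, and the function name is mine:

```python
import json
from pathlib import Path

from PIL import Image, ImageFilter  # pip install Pillow

def blur_from_json(image_path, json_path, out_path, radius=12, margin=4):
    """Blur each detected rectangle of the original image, using the
    coordinates from the anonymizer's JSON file. The field names
    (x_min, y_min, x_max, y_max) are assumed, not verified."""
    img = Image.open(image_path)
    for det in json.loads(Path(json_path).read_text()):
        # Grow the box slightly so the blur fully covers the plate/face.
        left = max(0, int(det["x_min"]) - margin)
        top = max(0, int(det["y_min"]) - margin)
        right = min(img.width, int(det["x_max"]) + margin)
        bottom = min(img.height, int(det["y_max"]) + margin)
        region = img.crop((left, top, right, bottom))
        img.paste(region.filter(ImageFilter.GaussianBlur(radius)),
                  (left, top))
    img.save(out_path)
```

Pasting the blurred crop back gives the same sharp borders as the ImageMagick approach; passing a feathered mask as the third argument of `paste` would give smooth transitions.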


It’s a good suggestion and one we’re looking into. In the meantime, we’re blurring images as soon as they hit our servers and removing the originals.

More updates to come on privacy and blurring.


Contrary to most, I would like to keep access to my unblurred pictures.
It is sentimentally important to me. And do not forget about incorrect blurring.


Good to know. I have many more images ready for upload as soon as privacy is dealt with in a way acceptable to me. Removing originals is fine for me; I use (360°) action cams most of the time, so I have the originals anyway. I store my most interesting sequences on an external hard drive.

Keep posting on privacy and blurring because I hope for a suitable solution in the Mapillary workflow.

Me too. However, if you think about it long enough, certain issues come up with the desire for this convenience. First, no organization, no matter how much manpower and how many resources it spends on security, is going to be impenetrable. Eventually, any security system can be breached. So, storing raw imagery essentially connected to the internet will always pose a desirable target for malicious actors, especially once they are able to analyze this data with AI, which will surely happen sooner rather than later.
Second, using another cloud storage service is no option either, because reason number one still applies. So, if you really want to store your raw imagery securely, you are left with nothing else than storing it on safely locked-up media disconnected from the internet. And even then it probably will not be perfectly secure, but it may be sufficiently secure. Anyhow, I think this is the least every contributor owes, or should owe, to uninvolved people. Again, that is only if you are interested in archiving raw imagery.

Yes, I am aware that in most jurisdictions capturing images of uninvolved people in public space is absolutely legal. However, you are probably going to infringe these people's personal rights as soon as you publish these images, or if you completely delegate security concerns to a third party. But this is a topic of its own.


Statistically one of my pictures must be an award winning piece of art.
So I prefer not to blur.

Note: it is possible to blur images without affecting the rest of the file, due to the structure of JPEG:

Additionally, the “diff” file could be a normal JPEG, with everything white or black except the blurred parts.
And if one encrypts it with the uploader’s public key, it could even be stored safely on Facebook’s servers. But I digress…


Hi, just following up on this. I know it’s been a while, but I’ve experimented a bit with the understand.ai anonymizer myself.

Again, pretty good results for faces that are clearly visible. Don’t have too many panos with license plates so untested as yet.

To deal with the blurring part being slow, I just used the detection part and then used Pillow’s built-in blur function. Performance was fine.

However, I have a question about our legal obligations. This is for a separate project, not Mapillary, so apologies if it’s inappropriate (my server is in Germany, I am in the UK, so presumably German law applies, as does the GDPR): to what extent do we need to blur faces and license plates?

With this tool I can blur faces that are clearly visible. Faces further away from the camera are not reliably detected and blurred - but these faces are not clearly visible anyway. Do we thus have to blur ALL people showing on the panorama irrespective of whether their face is clearly visible, or just faces that are clearly visible? From a privacy POV I’d have thought just clearly visible faces, but IANAL.

Thanks.