Anonymize (blurring faces and license plates) before upload

With the recent acquisition of Mapillary, many users are concerned about Facebook’s access to unblurred images.

One solution could be adding one more step to the workflow: blurring faces and license plates BEFORE uploading to Mapillary (or OpenStreetCam, OpenTrailView, etc.).

Well, I tried the understand-ai/anonymizer project on GitHub (now archived), an anonymizer to obfuscate faces and license plates, and it works very well. It’s open source and multiplatform. You need Python 3.6 on your system (I tried with Python 3.8 and it failed, but I forked the repo and I’ll try to add support for recent versions).

Installation
$ python --version
Python 3.6.10

python -m venv ~/.virtualenvs/anonymizer
source ~/.virtualenvs/anonymizer/bin/activate

git clone https://github.com/understand-ai/anonymizer.git
cd anonymizer

pip install --upgrade pip
pip install -r requirements.txt

Usage
In my test, processing a 10 MB 360° pano JPG consumed 3 GB of RAM, so it’s recommended to close heavy applications on your system before running it.
PYTHONPATH=$PYTHONPATH:. python anonymizer/bin/anonymize.py --input /path/to/input_folder --image-output /path/to/output_folder --weights weights

Replace the input and output folder paths with your own.
If the weights folder does not exist, it is created and the files weights_face_v1.0.0.pb and weights_plate_v1.0.0.pb are downloaded automatically.

I’ll try to build a Docker image for easy installation and usage.

very nice workflow, thanks.

But it would be nice if Mapillary just gave us a checkbox option to indicate that our original raw uploads should be removed from their servers after the blur algorithm has been applied, so they keep only the blurred versions of uploaded images.

Well done @juanman. This looks promising and could work for my workflow, because I use action cams and a 360° camera. Any idea how much time this extra step consumes? I sometimes shoot more than 10,000 images a day. I agree with @micmin1972, but we’ll have to wait until Mapillary reacts, which could take some time I guess.

Hi
Looks pretty good.
Can many images be anonymized at the same time or does this have to be done for each individual image?
Regards
Dominik

Do drop a line when you have added 3.8 support!

I don’t know if there is a GUI. The script writes a .json file with the coordinates of the plates and faces detected, so that could be the input data for a GUI.

In my tests, I didn’t see any false positives.

Another project for the same function is everestpipkin/image-scrubber on GitHub, a friendly browser-based tool for anonymizing photographs taken at protests. It’s a web GUI, but the blurring is done manually.

Yes, you give it the input directory path, and the script does the blur for all the images in that directory.

I succeeded in making this tool work under Debian and did some tests with captured sequences. My experience is mainly with plate detection, as this is easier to verify than face detection:

  • Face and plate detection seems to work very well. However, with the default parameters, some plates are not detected. Lowering the detection thresholds from 0.3 to 0.1 results in a close to 100% detection rate. In several hundred photos, I did not find one where a plate or a face had not been detected.
  • In some rare cases, a plate is detected at a wrong position and the bounding rectangle does not cover the plate, or covers it only partially. I suppose that this is a bug in the tool.
  • With the default blurring parameters (--obfuscation-kernel 21,2,9), large plates in the foreground are not sufficiently blurred. The size of the Gaussian kernel must be increased significantly to get good results. I got good results with --obfuscation-kernel 47,1,9.
  • Surprisingly (for me), the processing performance of the tool depends mainly on the blurring parameters, especially on the Gaussian kernel size. On my system, processing a picture with the default values takes about one minute. With --obfuscation-kernel 47,1,9 it takes up to 3 minutes; with --obfuscation-kernel 1,1,1 it takes less than 10 seconds (without any blurring, of course).

To get a decent processing time I set up this workflow:

  • Process the images with anonymizer --obfuscation-kernel 1,0,1
  • Throw away the output images and keep the JSON files
  • Blur the original images with ImageMagick’s -gaussian-blur, using the rectangle coordinates from the JSON files

This is sufficiently fast for my purpose. The blurred areas have sharp borders, which doesn’t bother me. There is probably a way to get smooth transitions with ImageMagick.
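The JSON-driven step can also be done in Python with Pillow instead of ImageMagick. A minimal sketch, assuming the detection JSON is a list of boxes with x_min/y_min/x_max/y_max/score fields (check against the files your version of the tool writes); the score threshold and blur radius are illustrative:

```python
import json

def boxes_from_json(detections, min_score=0.1):
    """Turn the anonymizer's detection list into integer pixel boxes.

    Field names are assumed from the JSON the tool writes next to each
    output image; adjust them if your version differs.
    """
    boxes = []
    for det in detections:
        if det.get("score", 1.0) < min_score:
            continue
        boxes.append((int(det["x_min"]), int(det["y_min"]),
                      int(det["x_max"]), int(det["y_max"])))
    return boxes

def blur_regions(image_path, json_path, out_path, radius=25):
    """Blur each detected region of the original, full-quality image."""
    from PIL import Image, ImageFilter  # third-party: pip install Pillow
    img = Image.open(image_path)
    with open(json_path) as f:
        detections = json.load(f)
    for box in boxes_from_json(detections):
        region = img.crop(box)  # (left, upper, right, lower)
        img.paste(region.filter(ImageFilter.GaussianBlur(radius)), box)
    img.save(out_path)
```

Like the ImageMagick variant, this produces sharp borders around the blurred rectangles; padding each box by a few pixels before blurring softens the effect somewhat.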

It’s a good suggestion and one we’re looking into. In the meantime, we’re blurring images as soon as they hit our servers and removing the originals.

More updates to come on privacy and blurring.

Contrary to most, I would like to keep access to my unblurred pictures.
It is sentimentally important to me. And do not forget the wrong blurring.

Good to know. I have many more images ready for upload as soon as privacy is dealt with in an acceptable way for me. Removing originals is fine for me… I use (360°) action cams most of the time, so I have the originals anyway. I store my most interesting sequences on an external hard drive.

Keep posting on privacy and blurring because I hope for a suitable solution in the Mapillary workflow.

Statistically, one of my pictures must be an award-winning piece of art.
So I prefer not to blur.

Note: it is possible to blur images without affecting the rest of the file, due to the structure of JPEG:

http://mapillary.trydiscourse.com/t/mapillary-joins-facebook/4163/98?u=enteq

Additionally, the “diff” file could be a normal jpeg, with just everything white or black except the blurred parts.
And if one encrypts it with the uploader’s public key, it could even be stored safely on Facebook’s servers. But I digress…

Hi, just following up this - I know it’s been a while but I’ve experimented a bit with the understand.ai anonymizer myself.

Again, pretty good results for faces that are clearly visible. Don’t have too many panos with license plates so untested as yet.

To deal with the blurring part being slow, I just used the detection part and then used Pillow’s inbuilt blur function. Performance was fine.

However I have a question on our legal obligations - this is for a separate project - not Mapillary - so apologies if it’s inappropriate (my server is in Germany, I am in UK, presumably German law applies as does the GDPR): to what extent do we need to blur faces and license plates?

With this tool I can blur faces that are clearly visible. Faces further away from the camera are not reliably detected and blurred - but these faces are not clearly visible anyway. Do we thus have to blur ALL people showing on the panorama irrespective of whether their face is clearly visible, or just faces that are clearly visible? From a privacy POV I’d have thought just clearly visible faces, but IANAL.

Thanks.

Serious problem

I successfully installed the package (with “tensorflow-gpu” replaced by “tensorflow” in the requirements.txt file) per your instructions, on Ubuntu 18.04 LTS under Windows Subsystem for Linux 2 on Windows 10. The Python version is 3.6.9.

I ran with the command:
PYTHONPATH=$PYTHONPATH:. python anonymizer/bin/anonymize.py --input /mnt/d/dl/virbphoto/1024 --image-output /mnt/d/dl/virbphoto/1024x --weights anonymizer/weights

The run was successful and the car plates and human faces in the photos were properly blurred.

However, all the EXIF tags in all the output JPG files are removed. This is a serious problem when uploading to Mapillary, because Mapillary requires the date and time the image was taken and the GPS coordinates.

Does anybody have the same problem? Did I miss something? Or is there a parameter that tells Anonymizer not to remove EXIF tags?

Technically, Anonymizer does not remove EXIF tags. It creates a new image which never had the tags. You can modify the code to copy the tags or just use exiftool afterwards:

exiftool -TagsFromFile oldImage.jpg newImage.jpg
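For a whole folder, that exiftool call can be wrapped in a small script. A sketch in Python, assuming the anonymized files keep the same names as the originals (the directory layout is an assumption, and exiftool must be on your PATH):

```python
import subprocess
from pathlib import Path

def exiftool_cmd(original, anonymized):
    """Build the exiftool call that copies all tags from the original
    into the anonymized image, overwriting it in place."""
    return ["exiftool", "-overwrite_original",
            "-TagsFromFile", str(original), str(anonymized)]

def restore_exif(orig_dir, anon_dir):
    """Copy EXIF into every anonymized JPEG that has a matching original."""
    for orig in sorted(Path(orig_dir).glob("*.jpg")):
        anon = Path(anon_dir) / orig.name
        if anon.exists():
            subprocess.run(exiftool_cmd(orig, anon), check=True)
```

Without -overwrite_original, exiftool keeps a backup copy of each file it modifies, which doubles the disk usage for large sequences.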

On the GDPR part of your post:

Disclaimer: The following contribution is added by someone who hasn’t completed formal legal training, but who did, over the decades, pick up elements of the legal systems in both the UK and the nearby Continent (more specifically Belgium and the Netherlands), amongst others from CPD seminars.

You may well find that you have complied with privacy laws if the face-like shape can’t be authenticated as being one particular person? And similarly for license plates?

Imagine, on a horizontal row, four dark pixels, two bright ones and again four dark ones, and below that two 2×2-pixel dark squares midway below the four dark pixels: I’ve just described what a distant face looks like on a pixel-by-pixel basis;

if you showed a photo with this level of detail to a person and told them ‘it’s you’, they might believe that to be the case, especially if the clothing matches what they’re wearing. Now take that photo to their next-door neighbour, to relatives or work colleagues: if no one recognises the person, then they’re obviously not authenticatable, and thus one would be unlikely to have violated their privacy? If someone offers ‘so-and-so has a sweater like that’, they’re not certain, and your attempt at authenticating that person’s face has failed.

But please bear in mind that although German law may state one thing, British law incorporates a concept (other jurisdictions have similar ones) whereby the photo, or whatever it is, is published wherever a member of the public views it: thus a photo hosted on a server located in Germany may be published in the US. And under a different doctrine in English law, where the laws of any country offer inadequate protection, the English courts may decide to hear a case and pass judgement.

In concluding, I’d offer that blurring is an unavoidable requirement, but in defense of not blurring small features one could argue that the face or license plate can’t be reliably connected to one unique person or vehicle.

Please remember that this is a personal opinion, building on bits gleaned over the years.