Behind the Scenes: Building an AI Identification Check
I run a site called SEASLUG.WORLD, where divers have contributed over 45,000 underwater photographs. Contributors pick the species themselves when they post, but sea slugs have so many look-alikes that misidentifications are inevitable.
Checking all of them by hand is impossible, so I decided to get some help from AI.
How it works
Running a photo through an AI model yields a "feature vector" for that image — a sequence of 768 numbers, effectively the fingerprint of the photo.
Photos that look similar produce similar fingerprints. Using this, for any given photo you can ask "which photo in the database is the most similar?"
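To make the "most similar fingerprint" idea concrete, here is a minimal sketch. It assumes cosine similarity as the measure of closeness, which is a common choice but not necessarily what the site uses, and it uses made-up 3-number vectors in place of the real 768-number ones. The names cosine_similarity, most_similar and the photo IDs are hypothetical.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat an all-zero vector as similar to nothing
    return dot / (norm_a * norm_b)

def most_similar(query, database, k=3):
    """Return the k (photo_id, score) pairs closest to the query vector."""
    scored = [(photo_id, cosine_similarity(query, vec))
              for photo_id, vec in database.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

# Toy 3-number "fingerprints" standing in for the real 768-number vectors.
database = {
    "photo_a": [0.9, 0.1, 0.0],
    "photo_b": [0.8, 0.2, 0.1],
    "photo_c": [0.0, 0.1, 0.9],
}
query = [1.0, 0.1, 0.0]

top = most_similar(query, database, k=2)
print([photo_id for photo_id, _ in top])  # ['photo_a', 'photo_b']
```

Comparing against every stored vector like this is fine for a toy example. With tens of thousands of photos, a vector index would probably be the more practical route.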
For example, running a similarity search for this photo brings up a lineup of visually similar photos. Each of those has a species name attached by its contributor, so by tallying "which species appear at the top of the results", the AI can estimate what species this photo is.
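The tallying step can be sketched as a simple majority vote over the species names attached to the top results. This is an illustration under assumptions, not the site's actual code: the function name estimate_species and the placeholder labels are hypothetical, and every neighbor counts equally regardless of how similar it is.

```python
from collections import Counter

def estimate_species(neighbor_species):
    """Return (species, vote_share) for the most common label among neighbors."""
    if not neighbor_species:
        return None, 0.0
    counts = Counter(neighbor_species)
    species, votes = counts.most_common(1)[0]
    return species, votes / len(neighbor_species)

# Species names attached by contributors to the five most similar photos.
neighbors = ["Species A", "Species B", "Species B", "Species B", "Species A"]

species, share = estimate_species(neighbors)
print(species, share)  # Species B 0.6
```

The vote share doubles as a rough confidence value: a 3-of-5 result is much weaker evidence than 5-of-5.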
Applying it to identification checks
The "identification check" feature I built runs this process in bulk, one species at a time.
Accuracy varies by species.
For each species, we run every contributed photo through similarity search and tally the species of the top results. If "this photo is tagged as species A, but the similar photos are all species B", the identification may be wrong.
With one button in the admin panel, all photos attached to a species are checked at once. Suspicious photos are listed with a thumbnail and the estimated species, and clicking jumps straight to the edit screen.
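The check described above could be sketched roughly as follows. Everything here is an assumption for illustration: find_suspicious, the neighbors_of callback, the min_share threshold and the toy data are hypothetical, and the real feature may decide "suspicious" differently.

```python
from collections import Counter

def find_suspicious(photos, neighbors_of, min_share=0.6):
    """Return (photo_id, tagged, estimated) for photos whose neighbors disagree.

    photos:       dict mapping photo_id -> species tagged by the contributor
    neighbors_of: function returning the IDs of the most similar photos
    min_share:    assumed threshold; the share of neighbors that must agree
    """
    suspicious = []
    for photo_id, tagged in photos.items():
        labels = [photos[n] for n in neighbors_of(photo_id) if n != photo_id]
        if not labels:
            continue
        estimated, votes = Counter(labels).most_common(1)[0]
        if estimated != tagged and votes / len(labels) >= min_share:
            suspicious.append((photo_id, tagged, estimated))
    return suspicious

# Toy data: p6 is tagged "A" but all of its nearest neighbors are tagged "B".
photos = {"p1": "A", "p2": "A", "p3": "B", "p4": "B", "p5": "B", "p6": "A"}
toy_neighbors = {
    "p1": ["p2", "p6", "p3"],
    "p2": ["p1", "p6", "p4"],
    "p3": ["p4", "p5", "p6"],
    "p4": ["p3", "p5", "p6"],
    "p5": ["p3", "p4", "p6"],
    "p6": ["p3", "p4", "p5"],
}

flagged = find_suspicious(photos, lambda pid: toy_neighbors[pid])
print(flagged)  # [('p6', 'A', 'B')]
```

Raising min_share flags fewer photos but with more confidence. Lowering it catches more candidates at the cost of more false alarms for a human to review.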
Still early days
Honestly, accuracy is still a work in progress. Photos of the same species are sometimes judged "not similar" because the shooting conditions differ, and conversely, photos of different species sometimes come out "similar" just because the backgrounds match.
Right now I'm experimenting with model selection and with using type-specimen photos from scientific monographs as ground truth. I'll share another update once it's working well.