Species detection¶
The classification algorithms in zamba are designed to identify species of animals that appear in camera trap videos. The pretrained models that ship with the zamba package are: blank_nonblank, time_distributed, slowfast, and european. For more details of each, read on!
Model summary¶
Model | Geography | Relative strengths | Architecture | Number of training videos |
---|---|---|---|---|
blank_nonblank | Central Africa, West Africa, and Western Europe | Just blank detection, without species classification | Image-based TimeDistributedEfficientNet | ~263,000 |
time_distributed | Central and West Africa | Recommended species classification model for jungle ecologies | Image-based TimeDistributedEfficientNet | ~250,000 |
slowfast | Central and West Africa | Potentially better than time_distributed at small species detection | Video-native SlowFast | ~15,000 |
european | Western Europe | Trained on non-jungle ecologies | Finetuned time_distributed model | ~13,000 |
The models trained on the largest datasets took a couple of weeks to train on a single-GPU machine. Some models will be updated in the future; check the changelog to see whether there have been updates.
All models support training, fine-tuning, and inference. For fine-tuning, we recommend using the time_distributed model as the starting point.
What species can zamba detect?¶
The blank_nonblank model is trained to do blank detection only, without species classification. Its only output is the probability that the video is blank, meaning that it does not contain an animal.
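As a minimal sketch of how a blank probability might be used downstream, the snippet below separates videos whose blank probability is at or above a cutoff from the rest. The file names, the probabilities, and the 0.5 threshold are all made-up assumptions for illustration; they are not zamba output or a zamba default.

```python
def split_blank(blank_probs, threshold=0.5):
    """Partition videos into (blank, nonblank) lists by blank probability.

    `blank_probs` maps a video name to the predicted probability that the
    video is blank. The threshold is an illustrative choice, not a zamba
    default.
    """
    blank, nonblank = [], []
    for name, prob in blank_probs.items():
        (blank if prob >= threshold else nonblank).append(name)
    return blank, nonblank


# Hypothetical predictions for three videos.
example = {"a.mp4": 0.92, "b.mp4": 0.10, "c.mp4": 0.55}
blank, nonblank = split_blank(example)
```

In practice the cutoff would be tuned to how costly it is to discard a video that actually contains an animal.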
The time_distributed and slowfast models are both trained to identify 32 classes covering common species from Central and West Africa, including a blank class for videos with no animal. The output labels in these models are:
aardvark
antelope_duiker
badger
bat
bird
blank
cattle
cheetah
chimpanzee_bonobo
civet_genet
elephant
equid
forest_buffalo
fox
giraffe
gorilla
hare_rabbit
hippopotamus
hog
human
hyena
large_flightless_bird
leopard
lion
mongoose
monkey_prosimian
pangolin
porcupine
reptile
rodent
small_cat
wild_dog_jackal
The european model is trained to identify 11 classes covering common species in Western Europe, including a blank class. The possible class labels are:
bird
blank
domestic_cat
european_badger
european_beaver
european_hare
european_roe_deer
north_american_raccoon
red_fox
weasel
wild_boar
blank_nonblank model¶
Architecture¶
The blank_nonblank model uses the same architecture as the time_distributed model, but it has only one output class because this is a binary classification problem.
Default configuration¶
The full default configuration is available on GitHub.
The blank_nonblank model uses the same default configuration as the time_distributed model. For frame selection, an efficient object detection model called MegadetectorLite is run on all frames to determine which are most likely to contain an animal. The classification model is then run on only the 16 frames with the highest predicted probability of detection.
Training data¶
The blank_nonblank model was trained on all the data used for the time_distributed and european models.
time_distributed model¶
Architecture¶
The time_distributed model was built by re-training a well-known image classification architecture called EfficientNetV2 (Tan, M., & Le, Q., 2021) to identify the species in our camera trap videos. EfficientNetV2 models are convolutional neural networks designed to jointly optimize model size and training speed. EfficientNetV2 is image native, meaning it classifies each frame separately when generating predictions. The model is wrapped in a TimeDistributed layer, which enables a single prediction per video.
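The idea of a time-distributed wrapper can be sketched in plain Python: apply the same per-frame function to every frame, then combine the per-frame outputs into one result for the video. Everything below is a toy assumption, including the `frame_scores` stand-in for an image classifier and the use of a simple mean to combine frames; it is not zamba's implementation, and the way the real model combines frame features may differ.

```python
def time_distributed(per_frame_fn, frames):
    """Apply the same function independently to every frame."""
    return [per_frame_fn(frame) for frame in frames]


def mean_over_frames(per_frame_outputs):
    """Average per-frame score vectors into one vector per video."""
    n = len(per_frame_outputs)
    return [sum(column) / n for column in zip(*per_frame_outputs)]


def frame_scores(frame):
    """Toy stand-in for an image classifier: returns two class scores."""
    brightness = sum(frame) / len(frame)
    return [brightness, 1.0 - brightness]


# A "video" of three tiny frames, each a flat list of pixel values in [0, 1].
video = [[0.0, 0.5], [0.5, 0.5], [1.0, 0.5]]
video_scores = mean_over_frames(time_distributed(frame_scores, video))
```

The point of the wrapper is that the per-frame model never needs to know it is looking at a video; only the aggregation step does.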
Training data¶
The time_distributed model was trained using data collected and annotated by trained ecologists from Cameroon, Central African Republic, Democratic Republic of the Congo, Gabon, Guinea, Liberia, Mozambique, Nigeria, Republic of the Congo, Senegal, Tanzania, and Uganda, as well as citizen scientists on the Chimp&See platform.
The data included camera trap videos from:
Country | Location |
---|---|
Cameroon | Campo Ma'an National Park |
Cameroon | Korup National Park |
Central African Republic | Dzanga-Sangha Protected Area |
Côte d'Ivoire | Comoé National Park |
Côte d'Ivoire | Guiroutou |
Côte d'Ivoire | Taï National Park |
Democratic Republic of the Congo | Bili-Uele Protected Area |
Democratic Republic of the Congo | Salonga National Park |
Gabon | Loango National Park |
Gabon | Lopé National Park |
Guinea | Bakoun Classified Forest |
Guinea | Moyen-Bafing National Park |
Liberia | East Nimba Nature Reserve |
Liberia | Grebo-Krahn National Park |
Liberia | Sapo National Park |
Mozambique | Gorongosa National Park |
Nigeria | Gashaka-Gumti National Park |
Republic of the Congo | Conkouati-Douli National Park |
Republic of the Congo | Nouabale-Ndoki National Park |
Senegal | Kayan |
Tanzania | Grumeti Game Reserve |
Tanzania | Ugalla River National Park |
Uganda | Budongo Forest Reserve |
Uganda | Bwindi Forest National Park |
Uganda | Ngogo and Kibale National Park |
Default configuration¶
The full default configuration is available on GitHub.
By default, an efficient object detection model called MegadetectorLite is run on all frames to determine which are most likely to contain an animal. Then time_distributed is run on only the 16 frames with the highest predicted probability of detection. By default, videos are resized to 240x426 pixels following frame selection.
The default video loading configuration for time_distributed is:
video_loader_config:
  model_input_height: 240
  model_input_width: 426
  crop_bottom_pixels: 50
  fps: 4
  total_frames: 16
  ensure_total_frames: true
  megadetector_lite_config:
    confidence: 0.25
    fill_mode: score_sorted
    n_frames: 16
    frame_batch_size: 24
    image_height: 640
    image_width: 640
You can choose different frame selection methods and vary the size of the images that are used by passing in a custom YAML configuration file. The only requirement for the time_distributed model is that the video loader must return 16 frames.
slowfast model¶
Architecture¶
The slowfast model was built by re-training a video classification backbone called SlowFast (Feichtenhofer, C., Fan, H., Malik, J., & He, K., 2019). SlowFast refers to the two model pathways involved: one that operates at a low frame rate to capture spatial semantics, and one that operates at a high frame rate to capture motion over time.
Source: Feichtenhofer, C., Fan, H., Malik, J., & He, K. (2019). Slowfast networks for video recognition. In Proceedings of the IEEE/CVF international conference on computer vision (pp. 6202-6211).
Unlike time_distributed, slowfast is video native. This means it takes into account the relationship between frames in a video, rather than running independently on each frame.
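To make the two-pathway idea concrete, here is a small sketch that splits one clip into a densely sampled "fast" input and a sparsely sampled "slow" input. The stride of 4 and the use of simple strided slicing are assumptions for illustration; the exact sampling used inside zamba's slowfast model is not specified on this page.

```python
def split_pathways(frames, alpha=4):
    """Return (slow, fast) frame lists for a two-pathway model.

    The fast pathway sees every frame; the slow pathway sees every
    `alpha`-th frame. `alpha=4` is an illustrative value.
    """
    fast = list(frames)
    slow = fast[::alpha]
    return slow, fast


clip = list(range(32))  # frame indices standing in for 32 decoded frames
slow, fast = split_pathways(clip)
```

With a 32-frame clip and a stride of 4, the slow pathway receives 8 frames while the fast pathway receives all 32.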
Training data¶
The slowfast model was trained on a subset of the data used for the time_distributed model.
Default configuration¶
The full default configuration is available on GitHub.
By default, an efficient object detection model called MegadetectorLite is run on all frames to determine which are most likely to contain an animal. Then slowfast is run on only the 32 frames with the highest predicted probability of detection. By default, videos are resized to 240x426 pixels.
The full default video loading configuration is:
video_loader_config:
  model_input_height: 240
  model_input_width: 426
  crop_bottom_pixels: 50
  fps: 8
  total_frames: 32
  ensure_total_frames: true
  megadetector_lite_config:
    confidence: 0.25
    fill_mode: score_sorted
    n_frames: 32
    image_height: 416
    image_width: 416
You can choose different frame selection methods and vary the size of the images that are used by passing in a custom YAML configuration file. The two requirements for the slowfast model are that:
- the video loader must return 32 frames
- videos passed to the model must be at least 200 x 200 pixels
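The frame-count and size requirements stated on this page can be checked before a model is run. The helper below is a hypothetical sketch and not part of zamba: it takes a `video_loader_config` that has already been parsed into a dictionary and reports any violations. It assumes that `total_frames`, `model_input_height`, and `model_input_width` are the keys that determine what the model receives.

```python
# Requirements as stated in this document.
REQUIRED_FRAMES = {"time_distributed": 16, "european": 16, "slowfast": 32}
SLOWFAST_MIN_SIDE = 200


def check_video_loader_config(model_name, cfg):
    """Return a list of human-readable problems (empty if none are found)."""
    problems = []
    expected = REQUIRED_FRAMES.get(model_name)
    if expected is not None and cfg.get("total_frames") != expected:
        problems.append(f"{model_name} needs total_frames={expected}")
    if model_name == "slowfast":
        height = cfg.get("model_input_height", 0)
        width = cfg.get("model_input_width", 0)
        if min(height, width) < SLOWFAST_MIN_SIDE:
            problems.append("slowfast needs frames of at least 200 x 200 pixels")
    return problems


ok = check_video_loader_config(
    "slowfast",
    {"total_frames": 32, "model_input_height": 240, "model_input_width": 426},
)
bad = check_video_loader_config(
    "slowfast",
    {"total_frames": 16, "model_input_height": 128, "model_input_width": 128},
)
```

Here `ok` is an empty list because the default slowfast values satisfy both requirements, while `bad` collects one message per violated requirement.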
european model¶
Architecture¶
The european model starts from a previous version of the time_distributed model, and then replaces and trains the final output layer to predict European species.
Training data¶
The european model is fine-tuned with data collected and annotated by partners at the German Centre for Integrative Biodiversity Research (iDiv) Halle-Jena-Leipzig and the Max Planck Institute for Evolutionary Anthropology. The fine-tuning data included camera trap videos from Hintenteiche bei Biesenbrow, Germany.
Default configuration¶
The full default configuration is available on GitHub.
The european model uses the same default configuration as the time_distributed model.
As with all models, you can choose different frame selection methods and vary the size of the images that are used by passing in a custom YAML configuration file. The only requirement for the european model is that the video loader must return 16 frames.
MegadetectorLite¶
Frame selection for video models is critical because it would be infeasible to train neural networks on all the frames in a video. For all the species detection models that ship with zamba, the default frame selection method is an efficient object detection model called MegadetectorLite that determines the likelihood that each frame contains an animal. Then, only the frames with the highest probability of detection are passed to the model.
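The selection step amounts to a top-k over per-frame detection scores. Below is a minimal sketch with made-up scores; returning the chosen frames in their original temporal order is an assumption made here for illustration, and the function is not zamba's implementation.

```python
def select_top_frames(scores, n_frames):
    """Return indices of the `n_frames` highest-scoring frames.

    Indices are returned in temporal order (an illustrative choice).
    Ties are broken in favor of earlier frames.
    """
    ranked = sorted(range(len(scores)), key=lambda i: (-scores[i], i))
    return sorted(ranked[:n_frames])


# Hypothetical per-frame detection probabilities for an 8-frame clip.
scores = [0.05, 0.80, 0.10, 0.95, 0.60, 0.02, 0.70, 0.30]
chosen = select_top_frames(scores, n_frames=4)
```

Only the frames at the chosen indices would then be resized and passed to the classification model.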
MegadetectorLite combines two open-source models:
- Megadetector is a pretrained image model designed to detect animals, people, and vehicles in camera trap videos.
- YOLOX is a high-performance, lightweight object detection model that is much less computationally intensive than Megadetector.
While highly accurate, Megadetector is too computationally intensive to run on every frame. MegadetectorLite was created by training a YOLOX model using the predictions of Megadetector as ground truth. This method is called student-teacher training.
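Student-teacher training can be illustrated with a toy example in which a small model is fit to labels produced by another model rather than by human annotators. The one-dimensional data, the threshold "teacher", and the logistic-regression "student" below are all stand-ins chosen for brevity; they bear no relation to the actual Megadetector or YOLOX models or to how MegadetectorLite was trained in practice.

```python
import math


def teacher(x):
    """Stand-in for an accurate but expensive model."""
    return 1.0 if x > 0.3 else 0.0


def train_student(xs, labels, lr=0.5, epochs=2000):
    """Fit a logistic-regression student with full-batch gradient descent."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in zip(xs, labels):
            p = 1.0 / (1.0 + math.exp(-(w * x + b)))
            grad_w += (p - y) * x
            grad_b += p - y
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b


xs = [i / 10 for i in range(-20, 21)]
pseudo_labels = [teacher(x) for x in xs]  # teacher predictions as ground truth
w, b = train_student(xs, pseudo_labels)

# Fraction of inputs on which the student reproduces the teacher's decision.
student = [1.0 if w * x + b > 0 else 0.0 for x in xs]
agreement = sum(s == t for s, t in zip(student, pseudo_labels)) / len(xs)
```

The student never sees human labels; its quality is bounded by how well it can imitate the teacher, which is the trade-off accepted in exchange for much cheaper inference.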
MegadetectorLite can be imported into Python code and used directly; it provides convenient detect_image and detect_video methods. See the API documentation for more details.
User contributed models¶
We encourage people to share their custom models trained with Zamba. If you train a model and want to make it available, please add it to the Model Zoo Wiki for others to be able to use!
To use one of these models, download the weights file and the configuration file from the Model Zoo Wiki. You'll need to create a configuration YAML file that contains at least the same video_loader_config as the configuration YAML you downloaded. Then you can run the model with:
$ zamba predict --checkpoint downloaded_weights.ckpt --config predict_config.yaml