MEDIC Dataset

MEDIC is the largest multi-task learning dataset for disaster-related images and is an extended version of the crisis image benchmark dataset. It consists of data from several sources, such as CrisisMMD, data from AIDR, and the Damage Multimodal Dataset (DMD). The dataset contains 71,198 images.

Data format and directories

Directories

data: The main directory, which contains the following directories:

aidr_disaster_types/: Contains images collected using AIDR system for disaster types task.
aidr_info/: Contains images collected using AIDR system for informativeness task.
ASONAM17_Damage_Image_Dataset/: Damage Assessment Dataset [4].
crisismmd/: CrisisMMD dataset [2].
multimodal-deep-learning-disaster-response-mouzannar/: Damage Multimodal Dataset (DMD) [3].

MEDIC_train.tsv, MEDIC_dev.tsv and MEDIC_test.tsv: The train, dev and test files, in the TSV format described below.
LICENSE_CC_BY_NC_SA_4.0.txt: License information.
terms-of-use.txt: Terms and conditions.
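
After extracting the archive, it can help to confirm that the entries listed above are present. The following is a minimal sketch, not part of the official MEDIC code: the function name missing_entries is hypothetical, and whether the TSV files sit next to the image directories or one level above them is an assumption, so the TSV location is a separate optional argument.

```python
from pathlib import Path

# Entry names copied from the listing above.
EXPECTED_DIRS = [
    "aidr_disaster_types",
    "aidr_info",
    "ASONAM17_Damage_Image_Dataset",
    "crisismmd",
    "multimodal-deep-learning-disaster-response-mouzannar",
]
EXPECTED_FILES = ["MEDIC_train.tsv", "MEDIC_dev.tsv", "MEDIC_test.tsv"]


def missing_entries(data_dir, tsv_dir=None):
    """Return the names of expected directories and TSV files that are absent.

    Assumption: if tsv_dir is not given, the TSV files are looked up in
    data_dir as well. Adjust this to match the extracted archive.
    """
    data_dir = Path(data_dir)
    tsv_dir = Path(tsv_dir) if tsv_dir is not None else data_dir
    missing = [d for d in EXPECTED_DIRS if not (data_dir / d).is_dir()]
    missing += [f for f in EXPECTED_FILES if not (tsv_dir / f).is_file()]
    return missing
```

An empty list means every expected entry was found; otherwise the returned names show what to look for.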

Format of the TSV file

image_id: Corresponds to either the tweet ID from Twitter or the ID from the respective source.
event_name: Name of the event or data source.
image_path: Relative path of the image.
damage_severity: Corresponds to the damage severity class label.
informative: Corresponds to the informativeness class label.
humanitarian: Corresponds to the humanitarian category class label.
disaster_types: Corresponds to the disaster types class label.
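
The columns above can be read with Python's standard csv module. This is a minimal sketch under two assumptions: that each TSV file starts with a header row using these column names, and that fields are tab-separated without special quoting. The helper name read_medic_tsv and the sample row are hypothetical and are not taken from the dataset.

```python
import csv
import io

# Column names as listed above; the presence of a header row is an assumption.
MEDIC_COLUMNS = [
    "image_id",
    "event_name",
    "image_path",
    "damage_severity",
    "informative",
    "humanitarian",
    "disaster_types",
]


def read_medic_tsv(handle):
    """Parse a MEDIC-style TSV from an open text handle into a list of dicts."""
    reader = csv.DictReader(handle, delimiter="\t")
    found = reader.fieldnames or []
    missing = [c for c in MEDIC_COLUMNS if c not in found]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    return list(reader)


# Hypothetical row for illustration only; the values are made up.
sample = (
    "\t".join(MEDIC_COLUMNS) + "\n"
    + "\t".join([
        "123", "example_event", "data/example/123.jpg",
        "severe", "informative", "infrastructure", "flood",
    ]) + "\n"
)
rows = read_medic_tsv(io.StringIO(sample))
```

For a real split, the same function can be given a file opened with open("MEDIC_train.tsv", newline="", encoding="utf-8"). Since image_path is a relative path, it has to be joined with the directory the archive was extracted into.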

Disaster response tasks

Disaster types
- Earthquake
- Fire
- Flood
- Hurricane
- Landslide
- Not disaster
- Other disaster
Informativeness
- Informative
- Not informative
Humanitarian categories
- Affected, injured, or dead people
- Infrastructure and utility damage
- Not humanitarian
- Rescue volunteering or donation effort
Damage severity assessment
- Little or no damage
- Mild damage
- Severe damage
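
For multi-task training, each of the four label columns is typically mapped to an integer class index. The sketch below builds such a mapping from the class names listed above. How the labels are spelled inside the TSV files (case, underscores, commas) is not stated here, so the normalize helper and the resulting index order are assumptions and should be checked against the actual files.

```python
# Class names copied from the task lists above, keyed by TSV column name.
TASK_CLASSES = {
    "disaster_types": [
        "Earthquake", "Fire", "Flood", "Hurricane",
        "Landslide", "Not disaster", "Other disaster",
    ],
    "informative": ["Informative", "Not informative"],
    "humanitarian": [
        "Affected, injured, or dead people",
        "Infrastructure and utility damage",
        "Not humanitarian",
        "Rescue volunteering or donation effort",
    ],
    "damage_severity": ["Little or no damage", "Mild damage", "Severe damage"],
}


def normalize(label):
    """Lowercase a label and drop underscores and commas (assumed spelling variants)."""
    return label.strip().lower().replace("_", " ").replace(",", "")


# One label-to-index table per task, in the order the classes are listed above.
LABEL_TO_INDEX = {
    task: {normalize(name): i for i, name in enumerate(names)}
    for task, names in TASK_CLASSES.items()
}


def encode_row(row):
    """Map the four label columns of one TSV row to integer class indices."""
    return {
        task: LABEL_TO_INDEX[task][normalize(row[task])]
        for task in TASK_CLASSES
    }
```

A label that does not match any listed class raises a KeyError, which makes spelling mismatches visible early rather than silently mislabeling images.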

Downloads:

MEDIC Dataset, version v1.0: download (11G)
Code: https://github.com/firojalam/medic/

Please cite the following papers if you use this dataset in your research.

Firoj Alam, Tanvirul Alam, Md. Arid Hasan, Abul Hasnat, Muhammad Imran, Ferda Ofli, MEDIC: A Multi-Task Learning Dataset for Disaster Image Classification. Neural Computing and Applications, 35(3):2609–2632, 2023.
Firoj Alam, Ferda Ofli, Muhammad Imran, Tanvirul Alam, Umair Qazi, Deep Learning Benchmarks and Datasets for Social Media Image Classification for Disaster Response, In 2020 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM), 2020.
Firoj Alam, Ferda Ofli, and Muhammad Imran, CrisisMMD: Multimodal Twitter Datasets from Natural Disasters. In Proceedings of the 12th International AAAI Conference on Web and Social Media (ICWSM), 2018, Stanford, California, USA.
Hussein Mouzannar, Yara Rizk, and Mariette Awad, Damage Identification in Social Media Posts using Multimodal Deep Learning, In Proc. of ISCRAM, May 2018, pp. 529–543.
Dat Tien Nguyen, Ferda Ofli, Muhammad Imran, and Prasenjit Mitra, Damage assessment from social media imagery data during disasters. In Proc. of ASONAM, pages 1–8, Aug 2017.

License

The MEDIC dataset is published under the CC BY-NC-SA 4.0 license, which means anyone can use this dataset for non-commercial research purposes: https://creativecommons.org/licenses/by-nc-sa/4.0/.

Terms of Use

Please see the Terms of Use (terms-of-use.txt).