MEDIC Dataset
Data
MEDIC is the largest multi-task learning dataset for disaster-related images and is an extended version of the crisis image benchmark dataset. It combines data from several sources, including CrisisMMD, AIDR, and the Damage Multimodal Dataset (DMD). The dataset contains 71,198 images.
Data format and directories
Directories
- data: The main directory, which contains the following:
- aidr_disaster_types/: Contains images collected using AIDR system for disaster types task.
- aidr_info/: Contains images collected using AIDR system for informativeness task.
- ASONAM17_Damage_Image_Dataset/: Damage Assessment Dataset [4]
- crisismmd/: CrisisMMD dataset [2].
- multimodal-deep-learning-disaster-response-mouzannar/: Damage Multimodal Dataset (DMD) [3]
- MEDIC_train.tsv, MEDIC_dev.tsv and MEDIC_test.tsv: The train, dev and test split files, in the file format described below.
- LICENSE_CC_BY_NC_SA_4.0.txt: License information.
- terms-of-use.txt: Terms and conditions.
Format of the TSV file
- image_id: Either the tweet ID from Twitter or the ID from the respective source.
- event_name: Name of the event or data source.
- image_path: Relative path of the image.
- damage_severity: Corresponds to the damage severity class label.
- informative: Corresponds to the informativeness class label.
- humanitarian: Corresponds to the humanitarian categories class label.
- disaster_types: Corresponds to the disaster types class label.
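The column layout above can be read with Python's standard `csv` module. The following is a minimal sketch, not the authors' loader. It assumes that each split file starts with a header row using exactly the column names listed above, and that `image_path` is relative to the `data` directory. Neither assumption is confirmed by this README, so check the files first. The function name `load_medic_split` and the added `resolved_image_path` key are hypothetical names introduced for illustration.

```python
import csv
from pathlib import Path

# Column names as listed in this README; assumed to appear as the header row.
EXPECTED_COLUMNS = [
    "image_id", "event_name", "image_path",
    "damage_severity", "informative", "humanitarian", "disaster_types",
]


def load_medic_split(tsv_path, data_root="data"):
    """Read one split file and return its rows as a list of dicts.

    Each row gets an extra, hypothetical "resolved_image_path" key that
    joins data_root with image_path (assumes the path is relative to it).
    """
    rows = []
    with open(tsv_path, newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        header = reader.fieldnames or []
        missing = [name for name in EXPECTED_COLUMNS if name not in header]
        if missing:
            # Fail early if the file does not match the assumed layout.
            raise ValueError(f"{tsv_path}: missing columns {missing}")
        for row in reader:
            row["resolved_image_path"] = str(Path(data_root) / row["image_path"])
            rows.append(row)
    return rows
```

Under these assumptions, `load_medic_split("data/MEDIC_train.tsv")` would return one dict per image, keyed by the column names above.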
Disaster response tasks
- Disaster types
- Earthquake
- Fire
- Flood
- Hurricane
- Landslide
- Not disaster
- Other disaster
- Informativeness
- Informative
- Not informative
- Humanitarian categories
- Affected, injured, or dead people
- Infrastructure and utility damage
- Not humanitarian
- Rescue volunteering or donation effort
- Damage severity assessment
- Little or no damage
- Mild damage
- Severe damage
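Because every image carries one label for each of the four tasks, a multi-task setup typically needs one integer target per task. The sketch below shows one way to derive those targets from rows already loaded as dicts keyed by the TSV column names. The exact label strings stored in the files are not stated here and may be spelled differently from the class names listed above, so the mapping is built from whatever strings appear in the data rather than hard-coded. The helper names `build_label_index`, `encode_row` and `label_counts` are hypothetical.

```python
from collections import Counter

# Task columns as named in the TSV format section of this README.
TASK_COLUMNS = ["disaster_types", "informative", "humanitarian", "damage_severity"]


def build_label_index(rows, task):
    """Map each label string seen in column `task` to an integer id.

    Ids follow the sorted order of the label strings, so the mapping is
    deterministic for a given set of rows.
    """
    labels = sorted({row[task] for row in rows})
    return {label: idx for idx, label in enumerate(labels)}


def encode_row(row, label_indexes):
    """Return one integer target per task for a single row."""
    return {task: label_indexes[task][row[task]] for task in TASK_COLUMNS}


def label_counts(rows, task):
    """Count how often each label occurs, e.g. to inspect class imbalance."""
    return Counter(row[task] for row in rows)
```

Building the index from the training split only and reusing it for the dev and test splits keeps the label ids consistent across splits. A label that appears only outside the training split would raise a `KeyError` in `encode_row`, which makes such mismatches visible early.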
Downloads:
- MEDIC Dataset, version v1.0: download (11G)
- Code: https://github.com/firojalam/medic/
Please cite the following papers, if you use this dataset in your research.
- Firoj Alam, Tanvirul Alam, Md. Arid Hasan, Abul Hasnat, Muhammad Imran, Ferda Ofli, MEDIC: A Multi-Task Learning Dataset for Disaster Image Classification. Neural Computing and Applications, 35(3):2609–2632, 2023.
- Firoj Alam, Ferda Ofli, Muhammad Imran, Tanvirul Alam, Umair Qazi, Deep Learning Benchmarks and Datasets for Social Media Image Classification for Disaster Response. In 2020 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM), 2020.
Firoj Alam, Ferda Ofli, and Muhammad Imran, CrisisMMD: Multimodal Twitter Datasets from Natural Disasters. In Proceedings of the 12th International AAAI Conference on Web and Social Media (ICWSM), 2018, Stanford, California, USA.
Hussein Mozannar, Yara Rizk, and Mariette Awad, Damage Identification in Social Media Posts using Multimodal Deep Learning, In Proc. of ISCRAM, May 2018, pp. 529–543.
Dat Tien Nguyen, Ferda Ofli, Muhammad Imran, and Prasenjit Mitra, Damage assessment from social media imagery data during disasters. In Proc. of ASONAM, pages 1–8, Aug 2017.
License
The MEDIC dataset is published under the CC BY-NC-SA 4.0 license, which means everyone can use this dataset for non-commercial research purposes: https://creativecommons.org/licenses/by-nc/4.0/.
Terms of Use
Please see Terms of Use