Resources for Research on Crisis Informatics

The following resources are made available to help researchers and technologists advance humanitarian and crisis computing by developing new computational models, innovative techniques, and systems useful for humanitarian aid.

RESOURCE # 1
This resource consists of Twitter data collected during 19 natural and man-made disasters. Each dataset contains crisis-related tweet ids and human-labeled tweets. It also includes a dictionary of out-of-vocabulary (OOV) words, a word2vec model, and a tweet downloader tool. Please cite the following paper if you use any of these resources in your research.

Muhammad Imran, Prasenjit Mitra, and Carlos Castillo: Twitter as a Lifeline: Human-annotated Twitter Corpora for NLP of Crisis-related Messages. In Proceedings of the 10th Language Resources and Evaluation Conference (LREC), pp. 1638-1643. May 2016, Portorož, Slovenia. [Bibtex]

Resource details and downloading »

RESOURCE # 2
This resource consists of human-labeled tweets collected during the 2012 Hurricane Sandy and the 2011 Joplin tornado. Please cite the following paper if you use this resource in your research.

Muhammad Imran, Shady Elbassuoni, Carlos Castillo, Fernando Diaz, and Patrick Meier. Practical Extraction of Disaster-Relevant Information from Social Media. In Proceedings of the 22nd International Conference on World Wide Web Companion, May 2013, Rio de Janeiro, Brazil. [Bibtex]

Download

RESOURCE # 3
This resource consists of human-labeled tweets collected during the 2011 Joplin tornado and labeled into humanitarian categories. Please cite the following paper if you use this resource in your research.

Muhammad Imran, Shady Elbassuoni, Carlos Castillo, Fernando Diaz, and Patrick Meier. Extracting Information Nuggets from Disaster-Related Messages in Social Media. In Proceedings of the 10th International Conference on Information Systems for Crisis Response and Management (ISCRAM), May 2013, Baden-Baden, Germany. [Bibtex]

Download

RESOURCE # 4 NEW
This resource provides ready-to-use Python implementations of several neural network and non-neural network based classifiers for crisis-related Twitter data. Please cite the following paper if you use this resource in your research.

Dat Tien Nguyen, Kamela Ali Al-Mannai, Shafiq Joty, Hassan Sajjad, Muhammad Imran, Prasenjit Mitra. Robust Classification of Crisis-Related Data on Social Networks using Convolutional Neural Networks. In Proceedings of the 11th International AAAI Conference on Web and Social Media (ICWSM), 2017, Montreal, Canada.

Resource details and downloading »

RESOURCE # 5 NEW
This resource provides human-labeled multimodal datasets comprising tweets and images collected during seven major natural disasters. Please cite the following paper if you use this resource in your research.

Firoj Alam, Ferda Ofli, Muhammad Imran. CrisisMMD: Multimodal Twitter Datasets from Natural Disasters. To appear at the 12th International AAAI Conference on Web and Social Media (ICWSM), 2018, Stanford, California, USA. [Bibtex]

Download (~1.8GB)

RESOURCE # 6 NEW
This resource comprises tweet ids and a sample of raw tweets (50k) collected during three devastating hurricanes in 2017: Hurricane Harvey, Hurricane Irma, and Hurricane Maria.

Firoj Alam, Ferda Ofli, Muhammad Imran, Michael Aupetit. A Twitter Tale of Three Hurricanes: Harvey, Irma, and Maria. In Proceedings of the 15th International Conference on Information Systems for Crisis Response and Management (ISCRAM), May 2018, Rochester NY, USA. [Bibtex]

Download (~64MB)

RESOURCE # 7 NEW
This resource comprises human-labeled tweets collected from the 2015 Nepal earthquake and the 2013 Queensland floods.

Firoj Alam, Shafiq Joty, Muhammad Imran. Domain Adaptation with Adversarial Training and Graph Embeddings. Accepted for publication at the 56th Annual Meeting of the Association for Computational Linguistics (ACL), 2018, Melbourne, Australia. [Bibtex]

Download (~7MB)

RESOURCE # 8 NEW
This resource is a Java-based tool for downloading full tweet content from tweet ids. The tool can make 180 API calls per 15 minutes, and each API call downloads up to 100 tweets, i.e., it can download up to 72,000 tweets per hour.
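
The throughput figure above follows from simple arithmetic, which the Python sketch below reproduces. It is not the Java tool itself and makes no API calls: it only shows how a list of tweet ids could be split into batches of up to 100 and how many 15-minute windows a download would need. The function names (`batches`, `max_tweets_per_hour`, `windows_needed`) are hypothetical and chosen for illustration; only the three constants come from the description.

```python
from typing import Iterator, List, Sequence

CALLS_PER_WINDOW = 180    # API calls allowed per window (from the description)
WINDOW_SECONDS = 15 * 60  # length of one rate-limit window (from the description)
IDS_PER_CALL = 100        # maximum tweets fetched per call (from the description)


def batches(tweet_ids: Sequence[str], size: int = IDS_PER_CALL) -> Iterator[List[str]]:
    """Yield consecutive slices of at most `size` tweet ids (hypothetical helper)."""
    for start in range(0, len(tweet_ids), size):
        yield list(tweet_ids[start:start + size])


def max_tweets_per_hour() -> int:
    """Upper bound on tweets per hour implied by the stated limits."""
    windows_per_hour = 3600 // WINDOW_SECONDS  # four 15-minute windows
    return CALLS_PER_WINDOW * IDS_PER_CALL * windows_per_hour


def windows_needed(num_ids: int) -> int:
    """Number of 15-minute windows required to request `num_ids` tweets."""
    calls = -(-num_ids // IDS_PER_CALL)      # ceiling division
    return -(-calls // CALLS_PER_WINDOW)     # ceiling division
```

For example, 250 ids split into batches of 100, 100, and 50, and 72,000 ids need four windows, i.e. one hour, which matches the stated limit.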

Download

RESOURCE # 9 NEW
This corpus comprises images collected from Twitter during four natural disasters, namely Typhoon Ruby (2014), Nepal Earthquake (2015), Ecuador Earthquake (2016), and Hurricane Matthew (2016). In addition to Twitter images, it contains images collected from Google using queries such as "damage building", "damage bridge", and "damage road" to address the scarcity of labeled data.

Dat Tien Nguyen, Ferda Ofli, Muhammad Imran, Prasenjit Mitra. Damage Assessment from Social Media Imagery Data During Disasters. In Proceedings of the IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM), 2017, Sydney, Australia. [Bibtex]

Download (~5GB)

RESOURCE # 10 NEW
This resource comprises human-labeled tweets collected from the 2015 Nepal earthquake and the 2013 Queensland floods.

Firoj Alam, Shafiq Joty, Muhammad Imran. Graph Based Semi-supervised Learning with Convolutional Neural Networks to Classify Crisis Related Tweets. Accepted for publication at the International AAAI Conference on Web and Social Media (ICWSM), 2018, Stanford, California, USA.

Download (~7MB)

Please read our Terms of use carefully before using the resources available on this site.

Subscribe to CrisisNLP to receive announcements about these and new resources.
Follow us on Twitter: @NLP4Crisis
For inquiries, issues, feedback, or collaborations, contact: Admins