Twitter Datasets from Crises

The following resources contain crisis-related posts collected from Twitter, human-labeled tweets, dictionaries of out-of-vocabulary (OOV) words, word2vec embeddings, and other related tools. The paper cited below describes these resources in more detail.

If you use any of these resources in your research, please cite the following paper:

Muhammad Imran, Prasenjit Mitra, Carlos Castillo: Twitter as a Lifeline: Human-annotated Twitter Corpora for NLP of Crisis-related Messages. In Proceedings of the 10th Language Resources and Evaluation Conference (LREC), pp. 1638-1643. May 2016, Portorož, Slovenia. [Bibtex]

Dataset details:

| Crisis Type | Crisis Name | Country | Language | Number of Tweets | Year |
|---|---|---|---|---|---|
| Earthquake | Nepal Earthquake | Nepal | English | 4,223,937 | 2015 |
| Earthquake | Terremoto Chile | Chile | Spanish | 842,209 | 2014 |
| Earthquake | Chile Earthquake | Chile | English | 368,630 | 2014 |
| Earthquake | California Earthquake | USA | English | 254,525 | 2014 |
| Earthquake | Pakistan Earthquake | Pakistan | English | 156,905 | 2013 |
| Typhoon | Cyclone PAM | Vanuatu | English | 490,402 | 2015 |
| Typhoon | Typhoon Hagupit | Philippines | English | 625,976 | 2014 |
| Typhoon | Hurricane Odile | Mexico | English | 62,058 | 2014 |
| Volcano | Iceland Volcano | Iceland | English | 83,470 | 2014 |
| Floods | Pakistan Floods | Pakistan | English | 1,236,610 | 2014 |
| Floods | India Floods | India | English | 5,259,681 | 2014 |
| War & Conflicts | Palestine Conflict | Palestine & Israel | English | 27,770,276 | 2014 |
| War & Conflicts | Peshawar School Attack | Pakistan | English | 1,135,655 | 2014 |
| Biological | Middle East Respiratory Syndrome (MERS) | Worldwide | English | 215,370 | 2014 |
| Biological | Ebola Virus Outbreak | Worldwide | English | 5,107,139 | 2014 |
| Landslide | Landslides worldwide | Worldwide | English | 382,626 | 2014 |
| Landslide | Landslides worldwide | Worldwide | French | 17,329 | 2015 |
| Landslide | Landslides worldwide | Worldwide | Spanish | 75,244 | 2015 |
| Airline Accident | Flight MH370 | Malaysia | English | 4,507,157 | 2014 |
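
To get a sense of how the collections compare in size, the tweet counts in the table above can be summed per crisis type. The snippet below is only a small illustrative sketch: the counts are copied from the table, while the names `ROWS` and `totals_by_type` are made up for this example and are not part of the released resources.

```python
from collections import defaultdict

# (crisis type, number of tweets) pairs copied from the table above.
ROWS = [
    ("Earthquake", 4_223_937), ("Earthquake", 842_209),
    ("Earthquake", 368_630), ("Earthquake", 254_525),
    ("Earthquake", 156_905),
    ("Typhoon", 490_402), ("Typhoon", 625_976), ("Typhoon", 62_058),
    ("Volcano", 83_470),
    ("Floods", 1_236_610), ("Floods", 5_259_681),
    ("War & Conflicts", 27_770_276), ("War & Conflicts", 1_135_655),
    ("Biological", 215_370), ("Biological", 5_107_139),
    ("Landslide", 382_626), ("Landslide", 17_329), ("Landslide", 75_244),
    ("Airline Accident", 4_507_157),
]


def totals_by_type(rows):
    """Sum tweet counts for each crisis type."""
    totals = defaultdict(int)
    for crisis_type, count in rows:
        totals[crisis_type] += count
    return dict(totals)


totals = totals_by_type(ROWS)
grand_total = sum(totals.values())

# Print the crisis types from largest to smallest, then the overall total.
for crisis_type, count in sorted(totals.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{crisis_type:<17}{count:>12,}")
print(f"{'All collections':<17}{grand_total:>12,}")
```

The 19 collections add up to 52,815,199 tweets, more than half of which come from the War & Conflicts collections.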

Labeled data and other resources

1. Labeled data for all events, annotated by paid workers: » Download
2. Labeled data for all events, annotated by volunteers: » Download
3. Word2vec embeddings trained on crisis-related tweets (size ~2.5 GB): » Download word2vec model
4. Out-of-vocabulary (OOV) words and their meanings: » Download OOV dictionary
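
One way a dictionary of OOV words and their meanings might be used is to normalize tweets before further processing. The following is a minimal sketch under stated assumptions: the file format of the downloadable dictionary is not described here, so the example uses a small in-memory mapping with hypothetical entries, and `OOV_MEANINGS` and `normalize` are illustrative names rather than part of the released tools.

```python
import re

# Hypothetical entries for illustration only; these are NOT taken from the
# released OOV dictionary, whose contents and format are not shown here.
OOV_MEANINGS = {
    "pls": "please",
    "ppl": "people",
    "govt": "government",
}

# A word is a run of letters, digits, or apostrophes.
WORD = re.compile(r"[A-Za-z0-9']+")


def normalize(text, meanings):
    """Replace each whole-word OOV token with its meaning.

    Lookup is case-insensitive; words without an entry are left unchanged,
    as are punctuation and whitespace.
    """
    def swap(match):
        word = match.group(0)
        return meanings.get(word.lower(), word)

    return WORD.sub(swap, text)


print(normalize("Pls help, ppl trapped near govt school", OOV_MEANINGS))
# prints: please help, people trapped near government school
```

Matching whole words rather than substrings avoids rewriting parts of longer words that happen to contain an OOV entry.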