Twitter Datasets from Crises

The following resources contain crisis-related posts collected from Twitter, human-labeled tweets, dictionaries of out-of-vocabulary (OOV) words, word2vec embeddings, and other related tools. For more information about these resources, see the following paper.

Please cite the following paper if you use any of these resources in your research.


Muhammad Imran, Prasenjit Mitra, Carlos Castillo: Twitter as a Lifeline: Human-annotated Twitter Corpora for NLP of Crisis-related Messages. In Proceedings of the 10th Language Resources and Evaluation Conference (LREC), pp. 1638-1643. May 2016, Portorož, Slovenia. [Bibtex]

Dataset details:

| Crisis Type | Crisis Name | Country | Language | Number of Tweets | Year |
|---|---|---|---|---|---|
| Earthquake | Nepal Earthquake | Nepal | English | 4,223,937 | 2015 |
| Earthquake | Terremoto Chile | Chile | Spanish | 842,209 | 2014 |
| Earthquake | Chile Earthquake | Chile | English | 368,630 | 2014 |
| Earthquake | California Earthquake | USA | English | 254,525 | 2014 |
| Earthquake | Pakistan Earthquake | Pakistan | English | 156,905 | 2013 |
| Typhoon | Cyclone Pam | Vanuatu | English | 490,402 | 2015 |
| Typhoon | Typhoon Hagupit | Philippines | English | 625,976 | 2014 |
| Typhoon | Hurricane Odile | Mexico | English | 62,058 | 2014 |
| Volcano | Iceland Volcano | Iceland | English | 83,470 | 2014 |
| Floods | Pakistan Floods | Pakistan | English | 1,236,610 | 2014 |
| Floods | India Floods | India | English | 5,259,681 | 2014 |
| War & Conflicts | Palestine Conflict | Palestine & Israel | English | 27,770,276 | 2014 |
| War & Conflicts | Peshawar School Attack | Pakistan | English | 1,135,655 | 2014 |
| Biological | Middle East Respiratory Syndrome (MERS) | Worldwide | English | 215,370 | 2014 |
| Biological | Ebola Virus Outbreak | Worldwide | English | 5,107,139 | 2014 |
| Landslide | Landslides worldwide | Worldwide | English | 382,626 | 2014 |
| Landslide | Landslides worldwide | Worldwide | French | 17,329 | 2015 |
| Landslide | Landslides worldwide | Worldwide | Spanish | 75,244 | 2015 |
| Airline Accident | Flight MH370 | Malaysia | English | 4,507,157 | 2014 |

Human-Labeled data

Labeled data (tweet IDs, labels) annotated by paid workers: Labeled data (tweet-ids, labels) v1.1.zip

Labeled data (tweet IDs, labels) annotated by volunteers: Labeled data (tweet-ids, labels) v1.0.zip

Word2vec embeddings trained on crisis-related tweets: Word2vec Crisis Embeddings v1.2.zip (2.5GB)

Out-of-vocabulary (OOV) words and their meanings: OOV Doctionary v1.0.zip
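
As a rough illustration of how an OOV dictionary of this kind might be used, the sketch below replaces out-of-vocabulary tokens in a tweet with their meanings. The file format of the released dictionary is not described here, so the sketch assumes the entries have already been loaded into a plain Python dict. The function name `normalize` and the sample entries are hypothetical and are not taken from the resource.

```python
from typing import Dict


def normalize(text: str, oov: Dict[str, str]) -> str:
    """Replace each whitespace-separated token found in `oov` with its meaning.

    Lookup is case-insensitive; tokens that are not in the dictionary
    are kept unchanged.
    """
    tokens = text.split()
    return " ".join(oov.get(tok.lower(), tok) for tok in tokens)


# Hypothetical entries for illustration only (not from the released dictionary).
sample_oov = {"plz": "please", "ppl": "people"}

print(normalize("plz help ppl in Kathmandu", sample_oov))
```

Splitting on whitespace is a simplification: tokens with attached punctuation (for example "plz!") would not match, so a tweet-aware tokenizer would likely be needed in practice.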

Tweets downloader tool

To download the full content of the tweets from Twitter, you can use our Tweets downloader tool (Tweets Downloader v1.2.zip), written in Java. The tool can make 180 API calls per 15 minutes, and each API call returns up to 100 tweets, so it can download up to 72,000 tweets per hour.
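
The throughput figure above follows directly from the stated limits, and the same arithmetic gives a rough estimate of how long a collection takes to download. The sketch below is not the Java downloader itself. It only reproduces the calculation and shows how a list of tweet IDs could be split into batches of 100. The names `tweets_per_hour`, `batch_ids`, and `estimated_hours` are hypothetical.

```python
import math
from typing import Iterator, List, Sequence

CALLS_PER_WINDOW = 180   # API calls allowed per window (from the text)
WINDOW_MINUTES = 15      # length of one rate-limit window (from the text)
TWEETS_PER_CALL = 100    # maximum tweets returned per call (from the text)


def tweets_per_hour() -> int:
    """Maximum tweets retrievable per hour under the stated limits."""
    windows_per_hour = 60 // WINDOW_MINUTES
    return CALLS_PER_WINDOW * windows_per_hour * TWEETS_PER_CALL


def batch_ids(tweet_ids: Sequence[str],
              size: int = TWEETS_PER_CALL) -> Iterator[List[str]]:
    """Yield consecutive batches of at most `size` tweet IDs."""
    for start in range(0, len(tweet_ids), size):
        yield list(tweet_ids[start:start + size])


def estimated_hours(num_tweets: int) -> float:
    """Conservative estimate: counts every started 15-minute window in full."""
    calls = math.ceil(num_tweets / TWEETS_PER_CALL)
    windows = math.ceil(calls / CALLS_PER_WINDOW)
    return windows * WINDOW_MINUTES / 60


print(tweets_per_hour())           # 72000
print(estimated_hours(4_223_937))  # 58.75 (Nepal Earthquake collection)
```

Under these assumptions, the 4,223,937 tweets of the Nepal Earthquake collection need 42,240 calls, or 235 windows, which is roughly 59 hours. Deleted or protected tweets are not returned, so the number of tweets actually retrieved may be lower than the number of IDs requested.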

Available resources