Last September we completed a Tech Challenge for Atrocity Prevention with USAID and Humanity United on the topcoder.com platform, but the value of what was developed is still being understood.
The project started from a generic and broad request: “Can we use the power of the crowd to develop a method that predicts the risk of mass-atrocity cases for every region of every country in the world using open source data?”
When we started the project, we had only a general knowledge of what atrocities were happening and where, and we knew that we were asking a tall order of the crowd. This would be the first time that a serious effort was made to predict atrocities before they happened.
This was a difficult task for many reasons. Chief among them, it is hard to find enough behavioral datasets that cover the world evenly. Some even believe there is little hope of achieving meaningful prediction, because atrocities represent extreme and often irrational cases of human behavior that can hardly be traced from day-to-day life patterns.
For these reasons, the project was launched with very limited expectations. However, crowdsourcing this generic, complex problem turned into a great success story. It was the crowd that framed the problem, found the datasets, tested, verified, and analyzed the data, and prepared it for the main machine-learning prediction stage, the Algorithmic Marathon (see also my post on Algorithmic Marathons).
Long story short, we have developed a set of promising algorithms that use sociopolitical factors from around the world (GDELT) and data on past atrocities (PITF) to predict the monthly risk of future atrocity events for any region of any country, for the time period covered by both datasets. The algorithms performed much better than expected and are currently available as open source for anyone to use.
This Tech Challenge was just the first round of a bigger project. The resulting algorithms use adapted and pre-processed versions of the GDELT and PITF datasets, so their predictions are limited to the period between January 2009 and July 2012, the data that was available for download at the start of the project. Subsequent rounds of this project will require improved algorithms that remove this time limitation.
Recently, I was shocked by the cruel kidnapping of over 200 Nigerian school girls. As an atrocity, this was relevant to the Atrocity Prevention challenge. We can ask, “Can we apply the atrocity prediction algorithms to this specific case?” To answer that, I looked at the results of the Tech Challenge. Here is what the top algorithm showed for the regions of Nigeria for the period from January 2009 to July 2012:
The plot shows the predicted monthly atrocity risk for the Borno State region of Nigeria, where the girls were abducted (purple). The blue line shows the monthly risk averaged over all 38 Nigerian regions (all regions are weighted equally). Vertical lines mark atrocities in Borno State registered in the PITF dataset.
The main takeaways here are:
1. The regional risk in Nigeria is not uniform: some regions are much riskier than others.
2. The Borno State region is one of the four “high-risk” regions in Nigeria, and it is a few times riskier than average.
3. The history of that region shows that the risk started to rise quickly in the spring and summer of 2011 and (I speculate) has never gone down since.
4. The data shows growing turbulence in the country at the beginning of 2010, which is noticeable in the other three high-risk regions, but it did not affect Borno State. By contrast, the turbulence of summer 2011 had a very strong impact specifically on this region of Nigeria.
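The comparison behind the plot and the takeaways above can be sketched in a few lines of Python. This is only a minimal illustration, not the challenge code: the function names, the region labels, and the risk values are all made up for the example, and the actual algorithms and their outputs are not reproduced here. It shows how an equally weighted average over regions is formed for each month, and how a single region's risk can be expressed as a multiple of that average.

```python
from statistics import mean


def equal_weight_average(risk_by_region):
    """Average the monthly risk over all regions, each region counting once.

    risk_by_region maps a region name to a list of monthly risk values;
    all lists are assumed to cover the same months in the same order.
    """
    n_months = len(next(iter(risk_by_region.values())))
    return [
        mean(series[m] for series in risk_by_region.values())
        for m in range(n_months)
    ]


def relative_risk(region_series, average_series):
    """Express a region's monthly risk as a multiple of the average."""
    return [
        r / a if a > 0 else float("nan")
        for r, a in zip(region_series, average_series)
    ]


# Placeholder values invented for illustration only; they are NOT
# predictions produced by the challenge algorithms.
risk_by_region = {
    "Region A": [0.10, 0.20, 0.40],
    "Region B": [0.10, 0.10, 0.10],
    "Region C": [0.10, 0.00, 0.10],
}

average = equal_weight_average(risk_by_region)
ratio = relative_risk(risk_by_region["Region A"], average)

for month, (avg, rel) in enumerate(zip(average, ratio), start=1):
    print(f"month {month}: average risk {avg:.2f}, Region A at {rel:.1f}x average")
```

A region whose ratio stays well above 1 for many months in a row is what the second takeaway describes as being "a few times riskier than average".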
The important question here is: if we could extend the prediction time range of the algorithm, broaden its database, and make it run in real time, could we have predicted the kidnapping of these Nigerian girls?
The answer is a complex one. We certainly cannot predict the mass kidnapping itself, because this specific kind of atrocity, fortunately, does not happen often enough to train an algorithm on it. But we can clearly see the atrocity risk growing in those specific regions of Nigeria. We can also catch important negative signals from that region before the world's mass media reports on them. Even the two-year-old data from Nigeria shows that the kidnapping was not just a random case, but was part of a larger turbulent process that we can distinguish and visualize.
So while we cannot predict this specific occurrence, there were certainly indicators of increasing turbulence in Nigeria's Borno State and signs of an impending atrocity. If sufficient data were available in real time, similar improved algorithms could provide this type of information to policy-makers and civil activists all over the world. It could allow them to monitor these turbulent processes and help them prepare for extreme events, perhaps even preventing them before they happen.
What do we need to make this future happen? To start, we need improved, smarter algorithms, which is exactly what we will try to accomplish in the next round of this project: Atrocity Prevention Part II. Stay tuned.
For more information, please visit: http://www.topcoder.com/techchallenge/
This effort was made possible through the Center of Excellence for Collaborative Innovation.