Why explore the Global Terrorism Database?
👉 TLDR: Jump to the maps
My initial plan was to explore Wikidata and compile a CSV of terrorist attacks. However, I quickly realized this would require extensive work on definitions, specifically on how to decide whether an event counts as terrorism.
Fortunately, the University of Maryland had already done the heavy lifting and released the Global Terrorism Database (GTD), which includes:
- Over 200,000 recorded attacks
- Coverage from 1970 through 2020
- Based on more than 4 million source articles
I requested access and soon received a large XLS file containing the full dataset.
🗺️ Explore the maps
The goal: visualize attack locations over time, and highlight areas of activity for each terrorist group.
Code documentation
Splitting the dataset
The full dataset, once converted to CSV, weighed about 186 MB, too large to use directly in a Next.js project. To manage this, I split the file by year (1970–2020), kept only the fields the maps need, and converted each yearly CSV to JSON:
# Read and split the 2000 to 2020 file
library(readr)     # read_csv2(), read_csv(), write_csv()
library(jsonlite)  # toJSON()

# read_csv2() expects semicolon-delimited input with comma decimals
df <- read_csv2("data/global-terrorism-00-20.csv")
# Retain only 6 specified fields
df <- df[, c("eventid", "nkill", "iyear", "latitude", "longitude", "gname")]
# Split into one data frame per year
split_dfs <- split(df, df$iyear)
# Write one CSV file per year
for (iyear in names(split_dfs)) {
  write_csv(split_dfs[[iyear]], paste0("global-terrorism-", iyear, ".csv"))
}
# Convert each yearly CSV file to JSON
for (iyear in names(split_dfs)) {
  csv_path <- paste0("global-terrorism-", iyear, ".csv")
  json_path <- paste0("global-terrorism-", iyear, ".json")
  df_year <- read_csv(csv_path)
  json_data <- toJSON(df_year, pretty = TRUE)
  write(json_data, file = json_path)
}
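For readers who prefer Python, the same split can be sketched with the standard library alone. This is an illustration, not the script used for the site: the function name `split_by_year`, its parameters, and the output directory are hypothetical, and it assumes a semicolon-delimited CSV, which is what `read_csv2` expects in the R code above.

```python
import csv
import json
from collections import defaultdict
from pathlib import Path

# The six fields kept by the R script
FIELDS = ["eventid", "nkill", "iyear", "latitude", "longitude", "gname"]


def split_by_year(csv_path, out_dir, delimiter=";"):
    """Write one JSON file per year and return {year: number of rows}.

    Hypothetical helper: the name, signature and delimiter default are
    assumptions made for illustration.
    """
    rows_by_year = defaultdict(list)
    with open(csv_path, newline="", encoding="utf-8") as handle:
        for row in csv.DictReader(handle, delimiter=delimiter):
            # Keep only the fields the maps need
            rows_by_year[row["iyear"]].append({name: row[name] for name in FIELDS})

    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    for year, rows in rows_by_year.items():
        out_file = out_dir / f"global-terrorism-{year}.json"
        out_file.write_text(json.dumps(rows, indent=2), encoding="utf-8")
    return {year: len(rows) for year, rows in rows_by_year.items()}
```

Unlike `read_csv`, the `csv` module does not infer types, so every value in the resulting JSON is a string. Casting `latitude`, `longitude` and `nkill` to numbers would be a natural next step before plotting.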
Extracting terrorist group names
The field “gname” contains the name of the group that carried out the attack. These categories do not represent discrete entities: they are neither exhaustive nor mutually exclusive.
Here is a Python script for the year 1971:
import json
from collections import Counter

# Load the data from the provided JSON file for 1971
with open('/mnt/data/global-terrorism-1971.json', 'r') as file:
    data_1971 = json.load(file)
# Extract the "gname" entries from the data for 1971
gname_list_1971 = [entry.get('gname', '') for entry in data_1971]
# Calculate the frequency of each "gname" entry for 1971
gname_counts_1971 = Counter(gname_list_1971)
# Get the 10 most frequent "gname" entries for 1971
top_10_gnames_1971 = gname_counts_1971.most_common(10)
# Display the result (notebook-style bare expression)
top_10_gnames_1971
Here is the result:
[('Unknown', 98),
 ('Irish Republican Army (IRA)', 60),
 ('Left-Wing Militants', 48),
 ("Turkish People's Liberation Army", 23),
 ('Chicano Liberation Front', 22),
 ('Black Liberation Army', 16),
 ('Armed Revolutionary Independence Movement (MIRA)', 16),
 ('Student Radicals', 13),
 ('Black Nationalists', 13),
 ('Weather Underground, Weathermen', 12)]
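The most frequent entry above is 'Unknown', and labels such as 'Left-Wing Militants' or 'Student Radicals' describe loose categories rather than organizations. As a rough sketch, the 1971 script could be generalized into a function that works on any year's records and can skip chosen labels. The function name `top_groups` and its `exclude` parameter are hypothetical, and which labels to exclude is a judgment call. Only 'Unknown' is excluded by default here.

```python
from collections import Counter


def top_groups(records, n=10, exclude=("Unknown",)):
    """Return the n most frequent "gname" values, skipping excluded labels.

    Hypothetical helper for illustration; `records` is a list of dicts
    such as the one loaded from a yearly JSON file.
    """
    counts = Counter(
        entry.get("gname", "")
        for entry in records
        if entry.get("gname", "") not in exclude
    )
    return counts.most_common(n)


# Tiny made-up sample, not GTD data
sample = [
    {"gname": "Unknown"},
    {"gname": "Group A"},
    {"gname": "Group A"},
    {"gname": "Group B"},
]
print(top_groups(sample, n=2))  # [('Group A', 2), ('Group B', 1)]
```

Passing `exclude=()` reproduces the behavior of the original script, with 'Unknown' counted like any other label.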