Exploring the Global Terrorism Database

Table of contents

Why explore the Global Terrorism Database?
🗺️ Explore the maps
Code documentation
- Splitting the dataset
- Extracting terrorist group names

Why explore the Global Terrorism Database?

My initial plan was to explore Wikidata and compile a CSV of terrorist attacks. However, I quickly realized this would require extensive work on definitions, specifically on how to decide whether an event should be tagged as terrorism.

Fortunately, the University of Maryland had already done the heavy lifting and released the Global Terrorism Database (GTD), which includes:

- Over 200,000 recorded attacks
- Coverage from 1970 through 2020
- Based on more than 4 million source articles

I requested access and soon received a large XLS file containing the full dataset.

🗺️ Explore the maps

The goal: visualize attack locations over time and highlight the areas of activity of each terrorist group.

Code documentation

Splitting the dataset

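The full dataset, once converted to CSV, weighs about 186 MB: too large to use directly in a Next.js project. To manage this, I split it by year (1970–2020), kept only six fields, and converted each yearly file to JSON. Here is the script for the 2000–2020 data: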

# Load the packages used below
library(readr)     # read_csv2(), write_csv()
library(jsonlite)  # toJSON()

# Reading and splitting the 2000-2020 file
# (read_csv2 expects ";" as field separator and "," as decimal mark)
df <- read_csv2("data/global-terrorism-00-20.csv")

## Retain only 6 specified fields
df <- df[, c("eventid", "nkill", "iyear", "latitude", "longitude", "gname")]

# One data frame per year, named by year ("2000", "2001", ...)
split_dfs <- split(df, df$iyear)

# Write one CSV and one JSON file per year
for (iyear in names(split_dfs)) {
  df_year <- split_dfs[[iyear]]
  write_csv(df_year, paste0("global-terrorism-", iyear, ".csv"))

  # Row-oriented JSON: an array of {"eventid": ..., "gname": ...} objects
  json_data <- toJSON(df_year, pretty = TRUE)
  write(json_data, file = paste0("global-terrorism-", iyear, ".json"))
}
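For readers who prefer Python, the same split-by-year step can be sketched with the standard library alone. This is a minimal sketch, not the script used for the project: it assumes a semicolon-delimited CSV whose header contains the six retained fields, keeps every value as a string (no number parsing or decimal-comma conversion), and the helpers `split_by_year` and `write_json_per_year` are hypothetical names introduced for illustration. The sample rows are made up.

```python
import csv
import io
import json
from collections import defaultdict

# The six fields kept for the maps, mirroring the R script above
FIELDS = ["eventid", "nkill", "iyear", "latitude", "longitude", "gname"]


def split_by_year(csv_text, delimiter=";"):
    """Group the rows of a GTD-style CSV by their "iyear" value (hypothetical helper).

    Assumes a header row containing every name in FIELDS. Values stay
    strings: no number parsing or decimal-comma conversion is done.
    """
    reader = csv.DictReader(io.StringIO(csv_text), delimiter=delimiter)
    by_year = defaultdict(list)
    for row in reader:
        # Keep only the retained fields, dropping every other column
        by_year[row["iyear"]].append({key: row[key] for key in FIELDS})
    return dict(by_year)


def write_json_per_year(by_year, prefix="global-terrorism-"):
    """Write one JSON array per year, e.g. global-terrorism-2001.json."""
    for year, rows in by_year.items():
        with open(f"{prefix}{year}.json", "w", encoding="utf-8") as f:
            json.dump(rows, f, indent=2)


# Made-up sample rows, only to show the shape of the result
sample = (
    "eventid;iyear;nkill;latitude;longitude;gname;country\n"
    "1;2001;0;10.5;20.5;Unknown;A\n"
    "2;2001;2;11.0;21.0;Group X;B\n"
    "3;2002;1;12.0;22.0;Unknown;C\n"
)
years = split_by_year(sample)
print(sorted(years))       # ['2001', '2002']
print(len(years["2001"]))  # 2
```

On the real file, the text could be read with `Path("data/global-terrorism-00-20.csv").read_text(encoding="utf-8")` before calling `split_by_year`; this loads the whole CSV into memory, which is acceptable at this size but worth keeping in mind.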

Extracting terrorist group names

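The field “gname” contains the name of the group that carried out the attack. These categories do not represent discrete entities: they are neither exhaustive nor mutually exclusive, and some values, such as “Unknown” or “Left-Wing Militants”, are generic labels rather than named organizations.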
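Because generic labels would otherwise be mapped as if they were single groups, one option is to count them separately. The sketch below assumes records shaped like the yearly JSON files (a list of objects with a "gname" key). The set `GENERIC_LABELS` is a hypothetical, hand-picked list drawn from a few names in the 1971 counts, not an official GTD classification, and `split_named_and_generic` is an illustrative helper name.

```python
from collections import Counter

# Hypothetical list of catch-all labels, taken from a few names in the 1971
# counts; it is not an official GTD list and would need reviewing per year.
GENERIC_LABELS = {"Unknown", "Left-Wing Militants", "Student Radicals"}


def split_named_and_generic(records):
    """Count "gname" values in two buckets: named groups vs generic labels."""
    named, generic = Counter(), Counter()
    for entry in records:
        name = entry.get("gname") or ""  # missing or null names become ""
        if not name or name in GENERIC_LABELS:
            generic[name] += 1  # generic labels and empty names land here
        else:
            named[name] += 1
    return named, generic


records = [
    {"gname": "Unknown"},
    {"gname": "Irish Republican Army (IRA)"},
    {"gname": "Unknown"},
]
named, generic = split_named_and_generic(records)
print(named)    # Counter({'Irish Republican Army (IRA)': 1})
print(generic)  # Counter({'Unknown': 2})
```

Only the `named` counter would then drive the per-group maps, while the `generic` counter can still be shown as a single "unattributed" layer.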

Here is a Python script that lists the 10 most frequent groups for the year 1971:

import json
from collections import Counter
from pprint import pprint

# Load the data from the provided JSON file for 1971
with open('/mnt/data/global-terrorism-1971.json', 'r') as file:
    data_1971 = json.load(file)

# Extract the "gname" entries from the data for 1971
gname_list_1971 = [entry.get('gname', '') for entry in data_1971]

# Calculate the frequency of each "gname" entry for 1971
gname_counts_1971 = Counter(gname_list_1971)

# Get the 10 most frequent "gname" entries for 1971 and display them
top_10_gnames_1971 = gname_counts_1971.most_common(10)
pprint(top_10_gnames_1971)

Here is the result:


[('Unknown', 98),
 ('Irish Republican Army (IRA)', 60),
 ('Left-Wing Militants', 48),
 ("Turkish People's Liberation Army", 23),
 ('Chicano Liberation Front', 22),
 ('Black Liberation Army', 16),
 ('Armed Revolutionary Independence Movement (MIRA)', 16),
 ('Student Radicals', 13),
 ('Black Nationalists', 13),
 ('Weather Underground, Weathermen', 12)]
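The same count can be extended from one year to a whole period by summing counters over several yearly files. This is a minimal sketch assuming the yearly JSON files from the split step sit in the current directory; `load_year` and `top_groups` are hypothetical helper names, and excluding "Unknown" by default is a design choice made here, not something the original analysis does.

```python
import json
from collections import Counter
from pathlib import Path


def load_year(path):
    """Read one yearly file: a JSON array of attack records."""
    with open(path, "r", encoding="utf-8") as f:
        return json.load(f)


def top_groups(yearly_records, n=10, exclude=("Unknown",)):
    """Sum "gname" counts over several years and return the n most common.

    Records with a missing or empty "gname", or with a label listed in
    `exclude`, are skipped.
    """
    totals = Counter()
    for records in yearly_records:
        for entry in records:
            name = entry.get("gname", "")
            if name and name not in exclude:
                totals[name] += 1
    return totals.most_common(n)


if __name__ == "__main__":
    # Top groups for the 1970s, assuming the yearly JSON files sit in the
    # current directory (an empty list is printed if none are found)
    paths = sorted(Path(".").glob("global-terrorism-197*.json"))
    print(top_groups(load_year(p) for p in paths))
```

Passing a generator of yearly record lists keeps only one year's file in memory at a time, and `exclude=()` restores the behavior of the 1971 script, where "Unknown" is counted like any other label.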