Statistical Analysis of Export Post CTF

Hello! While the couple of graphs CTFd gives you can be useful, I am looking to gather more granular statistics on how the challenge ran after one has been completed. Things like user by user analysis of what challenges they got wrong, as well as which questions were hardest, etc. I know that can be done by loading the JSON from the challenge export into a script to chew the data. But before i went and tried banging my head against it I thought I would reach out to the community and see if anyone has done any analysis of the export before.

1 Like

We did some basis manual crunching of weblog data. Would be interested in hearing more, as we plan to do similar things for learning analytics.

I spent a day just taking a look at JSON output and came up with this python script that is pretty unique to my use case for CTFd. I create scoreboards using CTFd that we have customers of my product use as a learning exercise disguised as a competition, and I wanted a sort of report on each employees performance i could give to managers after the event is completed. This is what I came up with using pandas, not sure if anyone else would be able to make use of it and I will continue to enhance this code with things like adding it into django as a web app but wanted to share regardless

import pandas as pd
import zipfile
import json
import os
import plotly.express as px
import plotly.graph_objects as go
from tabulate import tabulate

# Open the zip file
zip_path = './testexport.zip'
with zipfile.ZipFile(zip_path, 'r') as zip_file:
    # Extract the JSON files to a directory
    zip_file.extractall('extracted_files')

# Read and parse the JSON files
extracted_files_dir = 'extracted_files'
for root, dirs, files in os.walk(extracted_files_dir):
    for filename in files:
        if filename.endswith('submissions.json'):
            file_path = os.path.join(root, filename)
            with open(file_path) as file:
                submissions_data = json.load(file)
        if filename.endswith('challenges.json'):
            file_path = os.path.join(root, filename)
            with open(file_path) as file:
                challenges_data = json.load(file)
        if filename.endswith('users.json'):
            file_path = os.path.join(root, filename)
            with open(file_path) as file:
                users_data = json.load(file)

# Pull out the results JSON and add them as pandas dataframes
submissions_data = submissions_data['results']
submissions_df = pd.DataFrame(submissions_data)
challenges_data = challenges_data['results']
challenges_df = pd.DataFrame(challenges_data)
users_data = users_data['results']
users_df = pd.DataFrame(users_data)

# Merge submissions DataFrame with users DataFrame to include user information
merged_df = pd.merge(submissions_df, users_df, left_on='user_id', right_on='id', how='left',
                     suffixes=('_submissions', '_users'))
# Merge merged DataFrame with challenges DataFrame to include challenge information
merged_df = pd.merge(merged_df, challenges_df, left_on='challenge_id', right_on='id', how='left',
                     suffixes=('_merged', '_challenges'))

####
# Category Analysis
####
# Calculate the percentage of correct answers per category
category_stats = merged_df.groupby('category')['type_submissions'].apply(lambda x: (x == 'correct').mean() * 100)
category_stats = category_stats.sort_values(ascending=True)
category_stats_df = pd.DataFrame({'percentage_correct': category_stats})
# Find the most incorrect question per category
most_incorrect_questions = merged_df.loc[merged_df['type_submissions'] == 'incorrect'].groupby('category')[
    'name_challenges'].value_counts().groupby(level=0).idxmax()
most_incorrect_questions_df = pd.DataFrame(most_incorrect_questions.reset_index())
most_incorrect_questions_df.columns = ['category', 'most_incorrect_challenge']

# Create a DataFrame to store the results
category_results = pd.merge(category_stats_df, most_incorrect_questions_df, left_index=True, right_on='category')
category_results = category_results.sort_values(by='percentage_correct', ascending=True)

####
# User Analysis
####
# Calculate the score for each user based on the correctness of the submission
merged_df['score'] = merged_df['value'].where(merged_df['type_submissions'] == 'correct', 0)
# Group by user and calculate total score, correct answers, and incorrect answers
user_stats = merged_df.groupby('user_id').agg(
    total_score=('score', 'sum'),
    correct_answers=('type_submissions', lambda x: (x == 'correct').sum()),
    incorrect_answers=('type_submissions', lambda x: (x == 'incorrect').sum())
)
# Calculate percentage of correct answers for each user
user_stats['percentage_correct'] = (user_stats['correct_answers'] / (
            user_stats['correct_answers'] + user_stats['incorrect_answers'])) * 100

# Group by user and category, count incorrect answers, and find worst category for each user
worst_category = merged_df[merged_df['type_submissions'] == 'incorrect'].groupby(
    ['user_id', 'category']).size().reset_index().rename(columns={0: 'incorrect_count'})
worst_category = worst_category.sort_values('incorrect_count', ascending=False).drop_duplicates('user_id').drop(
    'incorrect_count', axis=1)

# Merge user_stats DataFrame with worst_category DataFrame to include worst category
user_stats = pd.merge(user_stats, worst_category, on='user_id', how='left')

# Merge user_stats DataFrame with users DataFrame to include user names
user_stats = pd.merge(user_stats, users_df[['id', 'name']], left_on='user_id', right_on='id', how='left')
user_stats = user_stats.drop(['user_id', 'id'], axis=1)
user_stats = user_stats.sort_values(by='total_score', ascending=False)

####
# Cleanup of Data Frames
####
# reorder columns
new_order = ['name', 'total_score', 'percentage_correct', 'correct_answers', 'incorrect_answers', 'category']
new_names = {'category': 'worst_category'}
user_stats = user_stats.reindex(columns=new_order)  # Reorder the columns
user_stats = user_stats.rename(columns=new_names)  # Rename the columns
new_order = ['category', 'percentage_correct', 'most_incorrect_challenge']
# new_names = {'category': 'Category', 'percentage_correct': 'Percentage Correct Answers', 'most_incorrect_challenge': 'Most Incorrect Challenge'}
category_results = category_results.reindex(columns=new_order)  # Reorder the columns
# category_results = category_results.rename(columns=new_names)  # Rename the columns
# Print the user statistics DataFrame
category_results.to_csv('category_stats.csv')
user_stats.to_csv('user_stats.csv')
print('###################### USER BREAKDOWN ######################')
print(tabulate(user_stats, headers='keys', tablefmt='pretty'))
print('###################### CATEGORY BREAKDOWN ######################')
print(tabulate(category_results, headers='keys', tablefmt='pretty'))

report_elements = []

pie = px.pie(user_stats, values='total_score', names='name', title='Score Breakdown')
pie.update_traces(textposition='inside', textinfo='percent+label')
pie.update_layout(uniformtext_minsize=18, uniformtext_mode='hide')
pie.show()

bar = px.bar(user_stats, x='worst_category', y='percentage_correct', color='name', title="Category Breakdown",
             text='name', )
bar.show()

table = go.Figure(data=[go.Table(
    header=dict(values=list(user_stats.columns),
                fill_color='#bc3937',
                font=dict(color='#ffffff', size=18),
                height=60,
                align='center'),
    cells=dict(
        values=[user_stats.name, user_stats.total_score, user_stats.percentage_correct, user_stats.correct_answers,
                user_stats.incorrect_answers,
                user_stats.worst_category],
        fill_color='#a4a5a6',
        font=dict(color='#232222', size=14),
        height=40,
        align='center'))
])
table.show()

table = go.Figure(data=[go.Table(
    header=dict(values=list(category_results.columns),
                fill_color='#bc3937',
                font=dict(color='#ffffff', size=18),
                height=60,
                align='center'),
    cells=dict(values=[category_results.category, category_results.percentage_correct,
                       category_results.most_incorrect_challenge],
               fill_color='#a4a5a6',
               font=dict(color='#232222', size=14),
               height=40,
               align='center'))
])
table.show()

The end result is 2 csv files and 4 graphs. 1 csv is a user breakdown with info on each users score, what percentage of their submissions were correct answers, and what their worst category was. The other csv is a similar breakdown but by category instead of users, it shows what percentage of submissions in that category were correct answers, as well as what individual challenge was the most difficult in that category. The 4 graphs are various visualizations of that data that currently using plotly that just pop up in your web browser.