Data Science & Machine Learning

Kaggle competitor. Graph neural network researcher. Production ML engineer. I don't just build models—I build winning systems.

kaggle-profile.md

Kaggle Competitor

I compete where the best data scientists in the world prove their worth. Kaggle isn't academic—it's survival of the fittest algorithms. My profile: kaggle.com/brianedwards

Why Kaggle Matters:

Anyone can claim ML expertise. Kaggle rankings are proof. You're competing against thousands of PhD researchers, industry veterans, and AI labs. Every competition teaches techniques that work in the real world—not just in papers.

mania.py

March Machine Learning Mania

The annual NCAA tournament prediction competition. 68 teams, millions of possible brackets, and the chaos of March Madness. My system combines historical performance data, advanced basketball metrics, and ensemble learning to predict upset probabilities with precision.

Statistical Foundation

Ken Pomeroy metrics, strength of schedule adjustments, tempo-free efficiency ratings

Ensemble Models

XGBoost, LightGBM, neural networks—stacked for optimal bracket predictions

Upset Detection

Specialized models for identifying bracket-busting upsets before they happen

Reproducible Pipeline

Full end-to-end automation from data ingestion to submission generation
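The ensemble step above can be sketched in a few lines. This is an illustrative stand-in, not the actual competition code: the probabilities below are made up, and a full stack would fit a meta-model on out-of-fold predictions rather than simply averaging. The Brier score shown is a common bracket-forecast metric (mean squared error of win probabilities).

```python
import numpy as np

def brier(p, y):
    """Brier score: mean squared error of win probabilities (lower is better)."""
    return float(np.mean((np.asarray(p) - np.asarray(y)) ** 2))

# Hypothetical out-of-fold win probabilities from three base models
# (e.g. XGBoost, LightGBM, a neural net) for four tournament games.
p_xgb = np.array([0.90, 0.70, 0.40, 0.20])
p_lgb = np.array([0.80, 0.60, 0.50, 0.30])
p_nn  = np.array([0.85, 0.75, 0.35, 0.25])
y     = np.array([1, 1, 0, 0])  # 1 = first-listed team won

# Simplest possible blend: average the probabilities. Even this naive
# ensemble tends to beat the worst individual model on the same games.
p_blend = (p_xgb + p_lgb + p_nn) / 3
```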

conduit-pipeline.py

Conduit - Kaggle Competition Pipeline

A battle-tested pipeline for Kaggle competitions. From raw data to leaderboard submission, Conduit handles the tedious infrastructure so you can focus on feature engineering and model experimentation. Built from lessons learned across dozens of competitions.

```shell
# Typical Conduit workflow
conduit init my-competition
conduit feature add momentum_indicators
conduit train --model xgboost --cv 5
conduit submit --ensemble best_3
```

Key Features:

- Automated cross-validation with stratification
- Feature importance tracking and selection
- Experiment logging with MLflow integration
- Ensemble creation and blending utilities

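To show what stratified cross-validation does, here is a from-scratch sketch of stratified k-fold index generation. It preserves each class's proportion in every fold; a real pipeline would more likely call scikit-learn's `StratifiedKFold` directly.

```python
import numpy as np

def stratified_kfold(y, n_splits=5, seed=0):
    """Yield (train_idx, val_idx) pairs that preserve class proportions.

    Illustrative stand-in for sklearn.model_selection.StratifiedKFold:
    shuffle each class's indices, split them into n_splits chunks, and
    assemble folds so every fold mirrors the overall class balance.
    """
    rng = np.random.default_rng(seed)
    folds = [[] for _ in range(n_splits)]
    for cls in np.unique(y):
        idx = np.flatnonzero(y == cls)
        rng.shuffle(idx)
        for i, chunk in enumerate(np.array_split(idx, n_splits)):
            folds[i].extend(chunk.tolist())
    all_idx = set(range(len(y)))
    for i in range(n_splits):
        val = np.array(sorted(folds[i]))
        train = np.array(sorted(all_idx - set(folds[i])))
        yield train, val
```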
wdi-py.py

World Bank Development Indicators

Global development data analysis using Polars—the blazing-fast DataFrame library that makes pandas feel like a horse-drawn carriage. This project processes decades of economic indicators across 200+ countries with the speed and efficiency that modern data science demands.

- 1400+ development indicators
- 217 countries & regions
- 60+ years of data

Why Polars?

10-100x faster than pandas. Lazy evaluation. Multi-threaded by default. When you're processing billions of data points, speed isn't a luxury—it's survival.

graphyard.md

Graphyard - Graph Analysis Tools & Blog

The world is graphs. Social networks, knowledge bases, molecular structures, supply chains—everything interesting is connected. Graphyard is my laboratory for graph algorithms, network analysis, and the deep mathematical structures that underpin complex systems.


Graph Algorithms

PageRank, community detection, centrality measures, shortest paths at scale

Network Visualization

Interactive force-directed layouts, hierarchical structures, temporal evolution

Technical Blog

Deep dives into graph theory, algorithm implementations, real-world applications

Open Source Tools

Reusable libraries for graph processing and analysis
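As a taste of the algorithmic side: PageRank is just a damped power iteration on the transition matrix. A dense-matrix sketch (fine for small graphs; anything at scale would use sparse matrices or a library like NetworkX):

```python
import numpy as np

def pagerank(adj, d=0.85, tol=1e-10):
    """Power-iteration PageRank on a dense adjacency matrix (sketch).

    adj[i, j] > 0 means an edge i -> j. Dangling nodes (no out-links)
    are treated as jumping uniformly to every node.
    """
    n = adj.shape[0]
    out = adj.sum(axis=1, keepdims=True)
    # Column-stochastic transition matrix; rows with no out-links
    # become a uniform 1/n distribution before transposing.
    trans = np.where(out > 0, adj / np.where(out == 0, 1, out), 1.0 / n).T
    r = np.full(n, 1.0 / n)
    while True:
        r_next = (1 - d) / n + d * trans @ r
        if np.abs(r_next - r).sum() < tol:
            return r_next
        r = r_next
```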

gnn.py

Graph Neural Networks (GNN)

When traditional neural networks meet graph structures, magic happens. GNNs learn representations that capture both node features and structural relationships. My implementations push the boundaries of message passing, attention mechanisms, and scalable training on massive graphs.

Applications:

- Drug discovery and molecular property prediction
- Social network analysis and influence propagation
- Recommendation systems with relational data
- Fraud detection in financial transaction networks

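The core message-passing idea fits in a few lines of NumPy. This is a schematic GCN-style layer (mean aggregation over neighbors, then a learned linear transform and ReLU), not production code:

```python
import numpy as np

def message_passing_layer(adj, x, w):
    """One round of neighborhood message passing (mean aggregation).

    adj : (n, n) binary adjacency matrix
    x   : (n, d_in) node feature matrix
    w   : (d_in, d_out) learnable weight matrix
    """
    # Add self-loops so each node keeps its own features.
    a = adj + np.eye(adj.shape[0])
    # Mean-aggregate messages from neighbors (row-normalize).
    a = a / a.sum(axis=1, keepdims=True)
    # Linear transform + ReLU, as in a basic GCN-style layer.
    return np.maximum(a @ x @ w, 0.0)
```

Stacking k such layers lets each node's representation depend on its k-hop neighborhood, which is how GNNs capture both features and structure.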
gyat-architecture.py

GYAT - Graph Predictive Attention Network

My custom architecture combining graph attention mechanisms with predictive self-supervised learning. GYAT learns rich node embeddings by predicting masked graph structures—like BERT, but for networks. The attention mechanism dynamically weights neighbor contributions based on learned relevance.

Multi-Head Attention

Parallel attention heads capture diverse relationship types

Predictive Pretraining

Self-supervised learning on graph structure for robust representations

Scalable Training

Mini-batch sampling for graphs with millions of nodes
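The masked-structure objective can be illustrated with a toy loss function. This is a schematic of the general masked-edge pretraining idea (hide edges, score them against sampled non-edges with embedding dot products and binary cross-entropy), not GYAT's actual formulation:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def masked_edge_loss(emb, pos_edges, neg_edges):
    """BERT-style pretraining signal for graphs (schematic).

    Hide some edges, then train node embeddings so dot products score
    the held-out (positive) edges near 1 and sampled non-edges near 0,
    via binary cross-entropy.
    """
    def scores(edges):
        i, j = np.asarray(edges).T
        return sigmoid(np.sum(emb[i] * emb[j], axis=1))

    eps = 1e-9  # numerical floor inside the logs
    pos, neg = scores(pos_edges), scores(neg_edges)
    return float(-np.mean(np.log(pos + eps)) - np.mean(np.log(1 - neg + eps)))
```

Minimizing this loss pulls embeddings of connected nodes together and pushes non-neighbors apart, yielding representations useful for downstream tasks without any labels.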

Research Frontier:

GYAT represents the cutting edge of graph representation learning. This isn't off-the-shelf ML—it's original research applied to real problems.

why-hire-me.md

Why My Data Science is Different

Battle-Tested

Kaggle competitions are the proving ground. I've competed against the best and learned what actually works—not what looks good in a paper.

Production-Ready

Notebooks are prototypes. I build pipelines that run in production, scale with your data, and don't break at 3 AM.

Full Stack ML

Data engineering, feature stores, model training, deployment, monitoring—I own the entire lifecycle, not just the modeling phase.

Research-Grade

Custom architectures like GYAT aren't just academic exercises. They're competitive advantages when off-the-shelf solutions plateau.

Need ML That Actually Works?

Whether you need a Kaggle-winning prediction system, a production ML pipeline, or custom research—I deliver results, not just models.