★ NLP · CNN · Responsible AI

Subsurface ML

Drilling-loss prediction & Acoustic Impedance · Shell · Feb 2021 — Dec 2022

Project Overview

Two parallel projects under one team: (1) text-based classification of historical drilling-loss events from 27 years of well reports, and (2) a CNN autoencoder for near-lossless compression of Acoustic Impedance graphs. A third workstream, a Responsible-AI POC, layered LIME and SHAP over both models so geoscientists could trust the output.

NLP / Text Classification · CNN Autoencoders · PySpark / Spark-SQL · Databricks · Azure · LIME · SHAP · REST APIs · Pi Data (1995–2022) · F1 +0.2 vs baseline

Problem Statement

  1. Unstructured history. 27 years of well logs existed only as free-form text, and the unsupervised prediction model topped out at ~F1 0.62, not accurate enough to act on.
  2. Huge graph storage. Acoustic Impedance graphs were stored as full-resolution PNGs in Azure Blob: terabytes of largely redundant data.
  3. Black-box ML. Geoscientists wouldn't accept a model's output without understanding why it was produced.
+0.2 · F1 lift on the drilling-loss classifier after adding labeled supervised data and a wildcard-regex preprocessing pass.
−1 hr · Training time saved.
+25% · Pipeline performance.
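The wildcard-regex preprocessing pass can be sketched as a small set of normalization rules that collapse measurements and IDs into shared tokens, so the classifier sees patterns rather than raw numbers. The patterns and token names below are hypothetical, not the project's actual rules:

```python
import re

# Hypothetical normalization rules for free-form drilling reports;
# the real pass and its vocabulary were project-specific.
RULES = [
    (re.compile(r"\b\d+(\.\d+)?\s*(bbl|bbls)\b", re.I), " VOLUME_BBL "),
    (re.compile(r"\b\d+(\.\d+)?\s*ppg\b", re.I), " MUDWEIGHT_PPG "),
    (re.compile(r"\bw[- ]?\d{3}\b", re.I), " WELL_ID "),
    (re.compile(r"\s+"), " "),  # collapse whitespace left by substitutions
]

def normalize(report: str) -> str:
    """Collapse numeric measurements and well IDs into wildcard tokens."""
    text = report.lower()
    for pattern, token in RULES:
        text = pattern.sub(token, text)
    return text.strip()

print(normalize("Lost 120 bbls while drilling W-217 at 14.2 ppg"))
# → lost VOLUME_BBL while drilling WELL_ID at MUDWEIGHT_PPG
```

Collapsing "120 bbls" and "135 bbls" into one token shrinks the vocabulary and lets sparse loss events share features across decades of reports.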

My Role

Python developer and data scientist. Built the supervised text classifier, the CNN autoencoder, and the Responsible-AI framework. Worked directly with geoscientist labelers and onboarded two interns.

ML Models · PySpark ETL · REST APIs · Responsible-AI POC · Intern Mentorship · Stakeholder Reporting

The approach.

// STEP 01

Label, then learn.

Worked with subject-matter experts to weight-label ambiguous loss events, then trained a supervised classifier on the labeled set, lifting F1 from 0.62 to 0.84.
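A minimal sketch of this setup, assuming a scikit-learn TF-IDF pipeline; the reports, labels, and SME confidence weights below are toy stand-ins for the real labeled corpus:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Toy stand-in for the expert-labeled report set; the real data was
# 27 years of well reports with SME-assigned labels and weights.
reports = [
    "total losses while drilling, pumped lcm pill",
    "partial mud losses observed at casing shoe",
    "no losses, normal drilling parameters",
    "static well, no gain or loss recorded",
]
labels = [1, 1, 0, 0]            # 1 = loss event
weights = [1.0, 0.6, 1.0, 0.9]   # SME confidence on ambiguous reports

vec = TfidfVectorizer(ngram_range=(1, 2))
X = vec.fit_transform(reports)

# Lightly regularized for this tiny illustrative set.
clf = LogisticRegression(C=10.0)
clf.fit(X, labels, sample_weight=weights)

print(clf.predict(vec.transform(["heavy mud losses while drilling"])))
```

Passing SME confidence through `sample_weight` lets ambiguous events contribute to training without dominating it, which is one simple way to realize the weight-labeling idea.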

// STEP 02

CNN autoencoder for impedance graphs.

Trained a CNN autoencoder on years of Acoustic Impedance traces: ~85% storage reduction with reconstructions geoscientists couldn't distinguish from the originals, i.e. visually lossless.
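A sketch of the idea in PyTorch: a convolutional encoder squeezes an impedance image down to a small latent tensor (which is what gets stored), and a mirrored decoder reconstructs it on demand. The layer sizes and input shape here are illustrative assumptions, not the production architecture:

```python
import torch
import torch.nn as nn

class ImpedanceAutoencoder(nn.Module):
    """Illustrative conv autoencoder for 2-D impedance images."""

    def __init__(self):
        super().__init__()
        # Encoder: 1x128x128 input -> 16x8x8 latent, a small fraction
        # of the input size, in line with the storage-reduction goal.
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 8, 3, stride=2, padding=1),    # -> 8x64x64
            nn.ReLU(),
            nn.Conv2d(8, 16, 3, stride=2, padding=1),   # -> 16x32x32
            nn.ReLU(),
            nn.Conv2d(16, 16, 3, stride=4, padding=1),  # -> 16x8x8
        )
        # Decoder mirrors the encoder back to full resolution.
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(16, 16, 4, stride=4),            # -> 16x32x32
            nn.ReLU(),
            nn.ConvTranspose2d(16, 8, 4, stride=2, padding=1),  # -> 8x64x64
            nn.ReLU(),
            nn.ConvTranspose2d(8, 1, 4, stride=2, padding=1),   # -> 1x128x128
            nn.Sigmoid(),  # pixel intensities in [0, 1]
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))

model = ImpedanceAutoencoder()
batch = torch.rand(2, 1, 128, 128)  # stand-in for impedance PNGs
recon = model(batch)
print(recon.shape)  # torch.Size([2, 1, 128, 128])
```

Storing only the latent tensors (plus the decoder weights once) is what turns terabytes of redundant PNGs into a compact archive; reconstruction happens at read time.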

// STEP 03

LIME + SHAP for trust.

Built a Responsible-AI POC that surfaces feature attribution for every prediction. Identified two latent biases in the training data — both corrected in v2.
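LIME's core loop is small enough to show directly: perturb the input, query the black box, and fit a locally weighted linear model whose coefficients become per-word attributions. This self-contained sketch substitutes a toy keyword scorer for the real classifier; in the POC the explained models were the production ones:

```python
import numpy as np
from sklearn.linear_model import Ridge

# Toy black box: scores a report by loss-related keywords.
KEYWORDS = {"losses": 0.8, "lcm": 0.5, "gain": 0.4}

def predict_proba(texts):
    scores = [sum(KEYWORDS.get(w, 0.0) for w in t.split()) for t in texts]
    return np.array([min(s, 1.0) for s in scores])

def lime_explain(text, n_samples=500, seed=0):
    """Minimal LIME for text: mask random word subsets, query the
    black box, fit a proximity-weighted linear surrogate model."""
    rng = np.random.default_rng(seed)
    words = text.split()
    masks = rng.integers(0, 2, size=(n_samples, len(words)))
    masks[0] = 1  # keep the unperturbed instance in the sample
    perturbed = [" ".join(w for w, m in zip(words, row) if m)
                 for row in masks]
    preds = predict_proba(perturbed)
    # Proximity kernel: perturbations keeping more words count more.
    weights = np.exp(-(1 - masks.mean(axis=1)) ** 2 / 0.25)
    local = Ridge(alpha=1.0).fit(masks, preds, sample_weight=weights)
    return sorted(zip(words, local.coef_), key=lambda p: -abs(p[1]))

for word, attribution in lime_explain("mud losses while pumping lcm pill"):
    print(f"{word:8s} {attribution:+.3f}")
```

The surrogate's coefficients rank which words drove this one prediction, which is exactly the per-prediction attribution the geoscientists needed before trusting the model; SHAP plays a similar role with game-theoretic weighting.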

The outcomes.

F1 0.84
Loss classifier

Up from 0.62 unsupervised; now the production model on three active rigs.

~85%
Storage saved

CNN-autoencoded impedance graphs replaced raw PNG storage with no visual loss.

2 biases
Found & fixed

LIME / SHAP audit surfaced two systematic biases in the training data — both corrected before deployment.

Next case study →
ERP Analytics & Fraud Detection · Oracle