The Illusion of Just Knowing
Data Science, Data Governance Paul Karsten

Introducing AI/ML into your business underscores how important data is for understanding business operations and driving growth. Relying on assumptions hinders progress and creates an illusion of competence. This post advocates metrics, experimentation, and the scientific method as the basis for a data-driven approach to doing business.

Anti-Patterns in Data Mesh
Data Science, Data Ops Paul Karsten

This article explores common anti-patterns in implementing Data Mesh, a decentralized data architecture that emphasizes domain-oriented data ownership. Data Mesh aims to enhance data accessibility and usability across organizations, but its success depends on understanding three core principles: domain-driven data ownership, data products, and federated governance.

Model Development
Data Science, Data Management Paul Karsten

This blog post outlines the second phase of our Data Science Process: Model Development, which involves building, training, and evaluating models based on data gathered during Question Formation. The process is iterative: we experiment with different algorithms, features, and parameters in a sandbox environment before scaling to larger datasets. Model performance is evaluated using metrics, validation for overfitting and underfitting, and checks for robustness and interpretability. Finally, models must be versioned, monitored for data drift, and continuously updated so that they remain effective and relevant over time.

Question Formation and Data Analysis in Data Science
Data Science, Data Management Paul Karsten

This blog post focuses on the first phase of our Data Science Process: Question Formation and Data Analysis. In this phase, we iterate multiple times through question formation, data collection, and exploration. Initial questions are likely to be of low fidelity. Through data exploration, the questions gain fidelity and move toward business value.
