Charlie 🕵️

Charlie is a tool for analyzing and visualizing your git history, based on ideas from "Your Code as a Crime Scene" by Adam Tornhill. Highly recommended reading.

Motivation

In "Your Code as a Crime Scene", Adam Tornhill presents powerful techniques for mining insights from version control systems to identify problematic code patterns, architectural issues, and team dynamics. The book demonstrates these concepts using Code Maat, a command-line tool that extracts and analyzes VCS data. While Code Maat is excellent for research and deep analysis, it requires exporting git logs to files and often involves additional Python scripts to generate visualizations from CSV outputs.

CodeScene, the commercial evolution of these ideas, provides beautiful visualizations and automated analysis through GitHub integration. While it's a powerful tool, some developers prefer a completely local solution without any external service dependencies.

Charlie bridges this gap by providing a tool similar to bundle-analyzer or dependency-cruiser - you can run it locally with a single command and immediately see visual results in your browser. No file exports, no online services, just instant insights into your codebase's behavioral patterns.

Building this tool (with a little help from my friends) has been the best way to truly understand the concepts from the book. As they say, you don't really know something until you can build it yourself.

Installation and usage

Install from npm

This will make the charlie command available globally. You may also omit -g if you want to install it locally.

npm install -g charlie-git

Install from source

After cloning the repository:

npm install
npm run build
npm pack
npm i -g

Usage

After installing, you can run the tool with:

charlie

Don't forget to cd into your project directory before running the tool.

Alternatively, you can run the tool with:

charlie /path/to/your/project

After running the tool, in the root of your project you should see a file called charlie-report.html. Open it in your browser to see the report.

Core Concepts

Hotspots

A hotspot is a file or module that is both frequently modified AND has high complexity. These represent the most problematic areas of your codebase: they change often (indicating active development or bug fixes) and are complex (making them risky to modify). Hotspots should be your top priority for refactoring. In the report, each file is drawn as a circle. The circle size represents the complexity of the file, and the color represents how frequently the file changes (from gray for low frequency, through blue-ish, to red-ish for high frequency).
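As a rough illustration of the two inputs behind the hotspot view, the sketch below counts revisions per file and pairs them with a complexity score. The `Commit` shape, the `hotspotData` name, and the normalized `frequency` field (revisions divided by the highest revision count, the kind of value a gray-to-red color scale could be driven by) are assumptions made for illustration, not Charlie's actual code.

```typescript
// Hypothetical input: each commit is the list of file paths it touched.
type Commit = readonly string[];

interface HotspotDatum {
  path: string;
  complexity: number; // would drive circle size
  revisions: number; // number of commits that touched the file
  frequency: number; // revisions scaled to 0..1, would drive color (assumption)
}

function hotspotData(
  commits: readonly Commit[],
  complexityByPath: ReadonlyMap<string, number>,
): HotspotDatum[] {
  const revisions = new Map<string, number>();
  for (const files of commits) {
    // Count a file once per commit, even if it is listed twice.
    new Set(files).forEach((file) => {
      revisions.set(file, (revisions.get(file) ?? 0) + 1);
    });
  }

  let maxRevisions = 0;
  revisions.forEach((count) => {
    maxRevisions = Math.max(maxRevisions, count);
  });

  const data: HotspotDatum[] = [];
  complexityByPath.forEach((complexity, path) => {
    const count = revisions.get(path) ?? 0;
    data.push({
      path,
      complexity,
      revisions: count,
      frequency: maxRevisions === 0 ? 0 : count / maxRevisions,
    });
  });
  return data;
}
```

A file only stands out as a hotspot when both `complexity` and `frequency` are high. Either value on its own says little.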

Coupling Analysis

The Coupling view combines two powerful metrics to help you identify architectural problems and find clusters of tightly related files:

Sum of Coupling (SOC)

SOC is a metric calculated per file that counts how many times the file appears in commits with other files (i.e., it's not alone in the commit). Every time a file is committed alongside other files, we assume it might be coupled with them. A high SOC score indicates a file that's frequently involved in multi-file changes, which could signal architectural problems.
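A minimal sketch of this count, read literally from the description above: a file earns one point for every commit it shares with at least one other file. The `Commit` shape and the `sumOfCoupling` name are hypothetical. A common variant instead credits each file with the number of other files in the commit, and this README does not say which one Charlie uses.

```typescript
// Hypothetical input: each commit is the list of file paths it touched.
type Commit = readonly string[];

function sumOfCoupling(commits: readonly Commit[]): Map<string, number> {
  const soc = new Map<string, number>();
  for (const files of commits) {
    const unique = new Set(files);
    // A file committed alone is not coupled to anything in that commit.
    if (unique.size < 2) continue;
    unique.forEach((file) => {
      soc.set(file, (soc.get(file) ?? 0) + 1);
    });
  }
  return soc;
}
```

For example, given the commits `["a.ts", "b.ts"]`, `["a.ts"]`, and `["a.ts", "b.ts", "c.ts"]`, both `a.ts` and `b.ts` score 2 and `c.ts` scores 1. The single-file commit contributes nothing.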

Coupled Pairs Integration

Each file in the coupling view can be expanded to reveal its coupled pairs - files that frequently appear together in the same commits. When two files are consistently modified together, it suggests they're more tightly coupled than your architecture might indicate. High coupling can lead to ripple effects where changes in one file require changes in another.
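The pair data can be sketched as a count of shared commits for every unordered pair of files. The function names and the JSON-encoded pair key are assumptions for illustration. Charlie may weight or filter pairs differently, for example through the thresholds exposed in the frontend.

```typescript
// Hypothetical input: each commit is the list of file paths it touched.
type Commit = readonly string[];

// Build a stable key for an unordered pair by sorting the two paths.
function pairKey(a: string, b: string): string {
  return JSON.stringify(a < b ? [a, b] : [b, a]);
}

// Count, for every pair of files, how many commits contain both.
function coupledPairs(commits: readonly Commit[]): Map<string, number> {
  const pairs = new Map<string, number>();
  for (const files of commits) {
    const unique = Array.from(new Set(files));
    for (let i = 0; i < unique.length; i++) {
      for (let j = i + 1; j < unique.length; j++) {
        const key = pairKey(unique[i] as string, unique[j] as string);
        pairs.set(key, (pairs.get(key) ?? 0) + 1);
      }
    }
  }
  return pairs;
}
```

Note that the inner loops are quadratic in the number of files per commit, so very large commits (mass renames, formatting sweeps) dominate both the cost and the noise. Excluding them is a common precaution, though whether Charlie does so is not stated here.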

Finding Clusters

By expanding high-SOC files, you can identify clusters of tightly coupled files that might benefit from:

Being moved into the same module or package
Being refactored to reduce dependencies
Being split if they're doing too many things

When a file is both a hotspot AND has high SOC with many coupled pairs, it becomes a critical refactoring priority. The expandable view helps you understand not just that coupling exists, but exactly which files are involved in the coupling relationships.

The Power of Data Over Time

These metrics might sound overly simplistic at first glance, but when you collect data over months or a full year, powerful patterns emerge. Individual commits might seem random, but aggregate behavior reveals the true structure and pain points of your codebase. Data is king - it shows you what's actually happening, not what you think is happening.

Complexity Calculation

Charlie calculates complexity using a simple but effective approach: it adds 1 to the complexity score for each line of code in the file, and adds another point whenever a line has more leading whitespace than the previous line (indicating nested code blocks). This method is language-agnostic and works well for identifying complex areas across different codebases. As long as the formatting is consistent, this approach will work.
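The rule above fits in a few lines. This is a sketch rather than Charlie's source: the function name is hypothetical, and skipping blank lines and measuring indentation as the count of leading whitespace characters are assumptions about details the description leaves open.

```typescript
function indentationComplexity(source: string): number {
  let score = 0;
  let previousIndent = 0;
  for (const line of source.split(/\r?\n/)) {
    // Index of the first non-whitespace character, or -1 for a blank line.
    const indent = line.search(/\S/);
    if (indent === -1) continue; // assumption: blank lines are not lines of code
    score += 1; // one point per line of code
    if (indent > previousIndent) score += 1; // extra point when nesting deepens
    previousIndent = indent;
  }
  return score;
}
```

For a five-line function whose indentation goes 0, 2, 4, 2, 0, the score is 7: five points for the lines plus two for the two steps deeper into nesting.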

While cyclomatic complexity might be more academically accurate, this nesting-based approach is sufficient for the behavioral analysis goals of this tool. For individual file analysis, I still recommend measuring cyclomatic complexity, but for understanding large-scale patterns and trends, this simpler metric serves us well.

.charlie.config.json

The .charlie.config.json file allows you to customize Charlie's analysis behavior. This file should be placed in the root of your repository (the same directory where you run the charlie command). Additional analysis options like coupling thresholds and percentile filters are available through the interactive frontend.

Configuration Fields

`include` (optional)

Type: string[] (array of regex patterns)
Default: [] (includes all files)

An array of regular expression patterns to specify which files should be included in the analysis. If this field is empty or not provided, all files are included by default.

{
  "include": ["^src/", "^lib/", "\\.ts$", "\\.js$"]
}

`exclude` (optional)

Type: string[] (array of regex patterns)
Default: [] (excludes no files)

An array of regular expression patterns to specify which files should be excluded from the analysis. These patterns are applied after the include patterns.

{
  "exclude": ["node_modules/", "\\.test\\.", "\\.spec\\.", "dist/", "build/"]
}
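To make the include-then-exclude order concrete, here is a small sketch of how such a filter could behave. The `FilterConfig` and `isAnalyzed` names are hypothetical. It also assumes that a file passes `include` when it matches any one of the patterns, and that patterns are tested against repository-relative paths with forward slashes.

```typescript
interface FilterConfig {
  include?: string[];
  exclude?: string[];
}

function isAnalyzed(path: string, config: FilterConfig): boolean {
  const include = (config.include ?? []).map((pattern) => new RegExp(pattern));
  const exclude = (config.exclude ?? []).map((pattern) => new RegExp(pattern));

  // An empty include list means "include everything".
  const included = include.length === 0 || include.some((re) => re.test(path));

  // Exclude patterns are applied after include patterns and always win.
  return included && !exclude.some((re) => re.test(path));
}
```

With `include: ["^src/"]` and `exclude: ["\\.test\\."]`, the path `src/app.ts` is analyzed, while `src/app.test.ts` and `lib/app.ts` are not.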

`after` (optional)

Type: string (ISO date format)
Default: One year ago from the current date

Specifies the earliest date for git commits to include in the analysis. Only commits made after this date will be considered.

{
  "after": "2023-01-01T00:00:00.000Z"
}
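The default can be pictured with the following sketch. The function names are hypothetical, and doing the year arithmetic in UTC and throwing on an unparseable date are assumptions. Charlie's real handling may differ.

```typescript
// Resolve the effective cut-off: the configured `after` value,
// or one year before `now` when the field is absent.
function resolveAfter(after: string | undefined, now: Date = new Date()): Date {
  if (after !== undefined) {
    const parsed = new Date(after);
    if (Number.isNaN(parsed.getTime())) {
      throw new Error(`Invalid "after" date: ${after}`);
    }
    return parsed;
  }
  const oneYearAgo = new Date(now.getTime());
  oneYearAgo.setUTCFullYear(oneYearAgo.getUTCFullYear() - 1);
  return oneYearAgo;
}

// Keep only commits made strictly after the cut-off.
function commitsAfter<T extends { date: Date }>(commits: readonly T[], cutoff: Date): T[] {
  return commits.filter((commit) => commit.date.getTime() > cutoff.getTime());
}
```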

`architecturalGroups` (optional)

Type: Record<string, string> (regex pattern → group name mapping)
Default: undefined (no grouping)

Allows you to group files into architectural components for analysis. The key is a regex pattern that matches file paths, and the value is the name of the architectural group. Files matching the same group will be consolidated into single entries. Only the first group that matches a file is used.

When architecturalGroups is specified, Charlie generates both file-level and grouped visualizations in the report:

File-level Hotspots - Shows individual files as separate hotspots
Grouped Hotspots - Shows architectural groups as consolidated hotspots
Coupling Analysis - Shows both file-level and group-level coupling relationships with expandable details

This allows you to see both the detailed file-level view and the higher-level architectural view simultaneously.

{
  "architecturalGroups": {
    "^src/components/": "UI Components",
    "^src/services/": "Business Logic",
    "^src/utils/": "Utilities",
    "^src/hooks/": "React Hooks"
  }
}
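The first-match rule and the consolidation step might look roughly like this. All names here are hypothetical. Summing complexity and revisions per group, and keeping unmatched files under their own path, are assumptions: the README only says the metrics are "combined".

```typescript
interface FileMetrics {
  path: string;
  complexity: number;
  revisions: number;
}

interface GroupedMetrics {
  complexity: number;
  revisions: number;
}

// Return the name of the first group whose pattern matches, in declaration order.
function resolveGroup(path: string, groups: Record<string, string>): string | undefined {
  for (const [pattern, name] of Object.entries(groups)) {
    if (new RegExp(pattern).test(path)) return name;
  }
  return undefined;
}

function groupFileMetrics(
  files: readonly FileMetrics[],
  groups: Record<string, string>,
): Map<string, GroupedMetrics> {
  const result = new Map<string, GroupedMetrics>();
  for (const file of files) {
    // Assumption: files that match no group keep their own path as the key.
    const key = resolveGroup(file.path, groups) ?? file.path;
    const current = result.get(key) ?? { complexity: 0, revisions: 0 };
    result.set(key, {
      complexity: current.complexity + file.complexity,
      revisions: current.revisions + file.revisions,
    });
  }
  return result;
}
```

Because only the first match counts, the order of keys matters: a broad pattern such as `^src/` placed before `^src/components/` would absorb every component file. Summing per-file revisions also counts a commit once for each file it touches in the group, which is a simplification.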

Complete Example

Here's a comprehensive example of a .charlie.config.json file:

{
  "include": ["^src/", "^lib/"],
  "exclude": [
    "node_modules/",
    "\\.test\\.",
    "\\.spec\\.",
    "dist/",
    "build/",
    "__tests__/",
    "\\.d\\.ts$"
  ],
  "after": "2023-06-01T00:00:00.000Z",
  "architecturalGroups": {
    "^src/components/": "UI Layer",
    "^src/services/": "Service Layer",
    "^src/store/": "State Management",
    "^src/utils/": "Utilities",
    "^src/types/": "Type Definitions"
  }
}

How It Works

File Filtering: Charlie first applies the include patterns (if any), then applies the exclude patterns to filter which files are analyzed.
Date Filtering: Git commits are filtered to only include those made after the specified after date.
Architectural Grouping: If architecturalGroups is specified, files matching the regex patterns are grouped together and their complexity/revision metrics are combined. Both the original file-level hotspots and the grouped architectural hotspots are displayed in separate visualizations.
Interactive Analysis: Additional filtering options for SOC analysis, coupled pairs, and other metrics are available through the interactive frontend, allowing you to adjust thresholds and percentiles dynamically without regenerating the analysis.

This configuration system allows you to focus your analysis on specific parts of your codebase and organize the results in a way that makes sense for your project's architecture.

Thoughts

On Architectural Grouping

I personally haven't yet found an easy and useful case for architectural grouping. Usually when the codebase is messy, it's very hard to group things properly, but these types of codebases are the ones you usually need to analyze with tools like Charlie. And the codebases where things are easy to group, well... things are usually obvious enough without needing to group them.

This creates an interesting paradox: the feature works best on codebases that need it least, and struggles most on codebases that would benefit from it the most. That said, your mileage may vary - if you have a reasonably well-organized codebase with clear architectural boundaries that just needs some fine-tuning, architectural grouping might provide valuable insights. Or, perhaps, you have a good codebase, but the number of files is so large that it's hard to see the forest for the trees.

Credits

Special thanks to Aleksandra Kozlova and Darya Losich for their contributions and support in making this project possible. I'm also thankful to Adam Tornhill for his book and for the inspiration. And special thanks to my wife, Olga, for making impressed faces when I show her the visualizations.
