30 commits
35a1f83
upgrade to python 3.11 and with uv support
Chenglong-MS Jan 31, 2026
a65798d
make import more clear
Chenglong-MS Jan 31, 2026
fe60de0
clean up data loader to prep for new design
Chenglong-MS Feb 1, 2026
fd5a233
udpate auth workflow to prep for new workspace manage
Chenglong-MS Feb 1, 2026
dd9ebd3
wip on new workspace design
Chenglong-MS Feb 4, 2026
f671cc1
updates to unify data execution method
Chenglong-MS Feb 4, 2026
bf5a609
unify computation approach for both in memory and datalake tables
Chenglong-MS Feb 6, 2026
1a3ec07
after unification, remove separate agents
Chenglong-MS Feb 6, 2026
8e6ea7c
temp
Chenglong-MS Feb 6, 2026
d0a1371
semantic enhanced chart assemble
Chenglong-MS Feb 6, 2026
9d6a34e
some cleaning
Chenglong-MS Feb 6, 2026
4c83824
an algorithm to calculate layout dynamically, might be a good intervi…
Chenglong-MS Feb 7, 2026
b14e242
useless? ui design
Chenglong-MS Feb 7, 2026
08a5840
refactor design into design tokens
Chenglong-MS Feb 8, 2026
10526c0
optimize data loader
Chenglong-MS Feb 8, 2026
463d541
fix
Chenglong-MS Feb 8, 2026
9d522fa
Add projection types and projection centers for the map
BAIGUANGMEI Feb 8, 2026
48f0c33
Add map support prompts
BAIGUANGMEI Feb 8, 2026
6086035
Clean import
BAIGUANGMEI Feb 8, 2026
b4e9ed4
Merge pull request #1 from microsoft/main
BAIGUANGMEI Feb 8, 2026
f98309f
Update src/app/utils.tsx
Chenglong-MS Feb 8, 2026
824a404
Update py-src/data_formulator/agents/agent_data_rec.py
Chenglong-MS Feb 8, 2026
86b2dc5
Update src/components/ChartTemplates.tsx
Chenglong-MS Feb 8, 2026
fd7aa53
Apply suggestions from code review
Chenglong-MS Feb 8, 2026
4ad7219
Merge pull request #232 from BAIGUANGMEI/feature/map-projection-support
Chenglong-MS Feb 8, 2026
c8307ff
minor
Chenglong-MS Feb 8, 2026
b0da0d2
Merge remote-tracking branch 'refs/remotes/origin/dev' into dev
Chenglong-MS Feb 8, 2026
61d826c
update config
Chenglong-MS Feb 8, 2026
7f172ed
udpate rendering workflow
Chenglong-MS Feb 8, 2026
1d2623a
fix scale bug
Chenglong-MS Feb 8, 2026
21 changes: 11 additions & 10 deletions .devcontainer/devcontainer.json
@@ -1,23 +1,24 @@
// For format details, see https://aka.ms/devcontainer.json. For config options, see the
// README at: https://github.com/devcontainers/templates/tree/main/src/python
{
"name": "Python 3",
"name": "Data Formulator Dev",
// Or use a Dockerfile or Docker Compose file. More info: https://containers.dev/guide/dockerfile
"image": "mcr.microsoft.com/devcontainers/python:1-3.12-bullseye",
"image": "mcr.microsoft.com/devcontainers/python:1-3.11-bullseye",

// Features to add to the dev container. More info: https://containers.dev/features.
"features": {
"ghcr.io/devcontainers/features/node:1": {
"version": "18"
},
"ghcr.io/devcontainers/features/azure-cli:1": {}
},
"features": {
"ghcr.io/devcontainers/features/node:1": {
"version": "18"
},
"ghcr.io/devcontainers/features/azure-cli:1": {},
"ghcr.io/astral-sh/uv:1": {}
},

// Use 'forwardPorts' to make a list of ports inside the container available locally.
// "forwardPorts": [],
"forwardPorts": [5000, 5173],

// Use 'postCreateCommand' to run commands after the container is created.
"postCreateCommand": "cd /workspaces/data-formulator && npm install && npm run build && python3 -m venv /workspaces/data-formulator/venv && . /workspaces/data-formulator/venv/bin/activate && pip install -e /workspaces/data-formulator --verbose && data_formulator"
"postCreateCommand": "cd /workspaces/data-formulator && npm install && npm run build && uv sync && uv run data_formulator"

// Configure tool-specific properties.
// "customizations": {},
4 changes: 1 addition & 3 deletions .env.template
@@ -3,6 +3,4 @@
# python -m data_formulator -p 5000 --exec-python-in-subprocess true --disable-display-keys true

DISABLE_DISPLAY_KEYS=false # if true, the display keys will not be shown in the frontend
EXEC_PYTHON_IN_SUBPROCESS=false # if true, the python code will be executed in a subprocess to avoid crashing the main app, but it will increase the time of response

LOCAL_DB_DIR= # the directory to store the local database, if not provided, the app will use the temp directory
EXEC_PYTHON_IN_SUBPROCESS=false # if true, the python code will be executed in a subprocess to avoid crashing the main app, but it will increase the time of response
1 change: 1 addition & 0 deletions .gitignore
@@ -1,5 +1,6 @@

*env
.venv/
*api-keys.env
**/*.ipynb_checkpoints/
.DS_Store
1 change: 1 addition & 0 deletions .python-version
@@ -0,0 +1 @@
3.11
149 changes: 131 additions & 18 deletions DEVELOPMENT.md
@@ -2,16 +2,34 @@
How to set up your local machine.

## Prerequisites
* Python > 3.11
* Python >= 3.11
* Node.js
* Yarn
* [uv](https://docs.astral.sh/uv/) (recommended) or pip

## Backend (Python)

### Option 1: With uv (recommended)

uv is faster than pip and provides reproducible builds via its lockfile (`uv.lock`).

```bash
uv sync # Creates .venv and installs all dependencies
uv run data_formulator # Run app (opens browser automatically)
uv run data_formulator --dev # Run backend only (for frontend development)
```

**Which command to use:**
- **End users / testing the full app**: `uv run data_formulator` - starts server and opens browser to http://localhost:5000
- **Frontend development**: `uv run data_formulator --dev` - starts backend server only, then run `yarn start` separately for the Vite dev server on http://localhost:5173

### Option 2: With pip (fallback)

- **Create a Virtual Environment**
```bash
python -m venv venv
.\venv\Scripts\activate
source venv/bin/activate # Unix
# or .\venv\Scripts\activate # Windows
```

- **Install Dependencies**
@@ -29,7 +47,6 @@ How to set up your local machine.
- configure settings as needed:
- DISABLE_DISPLAY_KEYS: if true, API keys will not be shown in the frontend
- EXEC_PYTHON_IN_SUBPROCESS: if true, Python code runs in a subprocess (safer but slower), you may consider setting it true when you are hosting Data Formulator for others
- LOCAL_DB_DIR: directory to store the local database (uses temp directory if not set)
- External database settings (when USE_EXTERNAL_DB=true):
- DB_NAME: name to refer to this database connection
- DB_TYPE: mysql or postgresql (currently only these two are supported)
@@ -41,14 +58,16 @@ How to set up your local machine.


- **Run the app**
- **Windows**
```bash
.\local_server.bat
```

- **Unix-based**
```bash
# Unix
./local_server.sh

# Windows
.\local_server.bat

# Or directly
data_formulator # Opens browser automatically
data_formulator --dev # Backend only (for frontend development)
```

## Frontend (TypeScript)
@@ -61,7 +80,12 @@ How to set up your local machine.

- **Development mode**

Run the front-end in development mode using, allowing real-time edits and previews:
First, start the backend server (in a separate terminal):
```bash
uv run data_formulator --dev # or ./local_server.sh
```

Then, run the frontend in development mode with hot reloading:
```bash
yarn start
```
@@ -81,6 +105,10 @@ How to set up your local machine.
Then, build python package:

```bash
# With uv
uv build

# Or with pip
pip install build
python -m build
```
@@ -112,23 +140,23 @@ How to set up your local machine.

When deploying Data Formulator to production, please be aware of the following security considerations:

### Database Storage Security
### Database and Data Storage Security

1. **Local DuckDB Files**: When database functionality is enabled (default), Data Formulator stores DuckDB database files locally on the server. These files contain user data and are stored in the system's temporary directory or a configured `LOCAL_DB_DIR`.
1. **Workspace and table data**: Table data is stored in per-identity workspaces (e.g. as Parquet files). DuckDB is used only in memory, per request, when needed (e.g. for SQL mode); the app creates no persistent DuckDB database files.

2. **Session Management**:
- When database is **enabled**: Session IDs are stored in Flask sessions (cookies) and linked to local DuckDB files
- When database is **disabled**: No persistent storage is used, and no cookies are set. Session IDs are generated per request for API consistency
2. **Identity Management**:
- Each user's data is isolated by a namespaced identity key (e.g., `user:alice@example.com` or `browser:550e8400-...`)
- Anonymous users get a browser-based UUID stored in localStorage
- Authenticated users get their verified user ID from the auth provider

3. **Data Persistence**: User data processed through Data Formulator may be temporarily stored in these local DuckDB files, which could be a security risk in multi-tenant environments.
3. **Data persistence**: User data may be written to workspace storage (e.g. parquet) on the server. In multi-tenant deployments, ensure workspace directories are isolated and access-controlled.
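
As a rough illustration of the isolation described in the list above, the sketch below maps a namespaced identity key to a workspace directory. It is not the app's actual storage code: the function name `workspace_dir`, the `root` argument, and the choice to hash the identifier are all assumptions made for this example. Hashing is used here so that a client-chosen identifier can never introduce path separators or `..` segments.

```python
import hashlib
from pathlib import Path

# Hypothetical: the two namespaces described in this document.
ALLOWED_NAMESPACES = {"user", "browser"}


def workspace_dir(root: Path, identity_key: str) -> Path:
    """Return a per-identity directory under `root` (illustrative only).

    `identity_key` is expected to look like "user:<id>" or "browser:<id>".
    """
    namespace, sep, ident = identity_key.partition(":")
    if not sep or namespace not in ALLOWED_NAMESPACES or not ident:
        raise ValueError(f"malformed identity key: {identity_key!r}")
    # Hash the identifier so the directory name contains only hex digits,
    # regardless of what characters the identifier itself contains.
    digest = hashlib.sha256(ident.encode("utf-8")).hexdigest()
    return root / namespace / digest
```

With this layout, `user:alice@example.com` and `browser:alice@example.com` resolve to different directories, so a spoofed browser identity cannot land in an authenticated user's workspace. File-system permissions on `root` still need to be configured separately.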

### Recommended Security Measures

For production deployment, consider:

1. **Use `--disable-database` flag** for stateless deployments where no data persistence is needed
1. **Use `--disable-database` flag** to disable table-connector routes when you do not need external or uploaded table support
2. **Implement proper authentication, authorization, and other security measures** as needed for your specific use case, for example:
- Store DuckDB file in a database
- User authentication (OAuth, JWT tokens, etc.)
- Role-based access control
- API rate limiting
@@ -142,5 +170,90 @@ For production deployment, consider:
python -m data_formulator.app --disable-database
```

## Authentication Architecture

Data Formulator uses a **hybrid identity system** that supports both anonymous and authenticated users.

### Identity Flow Overview

```
┌─────────────────────────────────────────────────────────────────────┐
│                          Frontend Request                           │
├─────────────────────────────────────────────────────────────────────┤
│ Headers:                                                            │
│   X-Identity-Id: "browser:550e8400-..." (namespace sent by client)  │
│   Authorization: Bearer <jwt>  (if custom auth implemented)         │
│   (Azure also adds X-MS-CLIENT-PRINCIPAL-ID automatically)          │
└─────────────────────────────────────────────────────────────────────┘
                                   ▼
┌─────────────────────────────────────────────────────────────────────┐
│                     Backend Identity Resolution                     │
│                     (auth.py: get_identity_id)                      │
├─────────────────────────────────────────────────────────────────────┤
│ Priority 1: Azure X-MS-CLIENT-PRINCIPAL-ID → "user:<azure_id>"      │
│ Priority 2: JWT Bearer token (if implemented) → "user:<jwt_sub>"    │
│ Priority 3: X-Identity-Id header → ALWAYS "browser:<id>"            │
│             (client-provided namespace is IGNORED for security)     │
└─────────────────────────────────────────────────────────────────────┘
                                   ▼
┌─────────────────────────────────────────────────────────────────────┐
│                          Storage Isolation                          │
├─────────────────────────────────────────────────────────────────────┤
│ "user:alice@example.com" → alice's workspace (ONLY via auth)        │
│ "browser:550e8400-..."   → anonymous user's workspace               │
└─────────────────────────────────────────────────────────────────────┘
```

### Security Model

**Critical Security Rule:** The backend NEVER trusts the namespace prefix from the client-provided `X-Identity-Id` header. Even if a client sends `X-Identity-Id: "user:alice@..."`, the backend strips the prefix and forces `browser:alice@...`. Only verified authentication (Azure headers or JWT) can result in a `user:` prefixed identity.

The key security principle is **namespaced isolation with forced prefixing**:

| Scenario | X-Identity-Id Sent | Backend Resolution | Storage Key |
|----------|-------------------|-------------------|-------------|
| Anonymous user | `browser:550e8400-...` | Strips prefix, forces `browser:` | `browser:550e8400-...` |
| Azure logged-in user | `browser:550e8400-...` | Uses Azure header (priority 1) | `user:alice@...` |
| Attacker spoofing | `user:alice@...` (forged) | No valid auth, strips & forces `browser:` | `browser:alice@...` |

**Why this is secure:** An attacker sending `X-Identity-Id: user:alice@...` gets `browser:alice@...` as their storage key, which is completely separate from the real `user:alice@...` that only authenticated Alice can access.
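
The resolution order and forced prefixing described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the code in `auth.py`: the function name `resolve_identity`, the plain-mapping `headers` argument, and the `verify_jwt` callback are hypothetical and exist only for this example.

```python
from typing import Callable, Mapping, Optional


def resolve_identity(
    headers: Mapping[str, str],
    verify_jwt: Optional[Callable[[str], Optional[str]]] = None,
) -> str:
    """Return a namespaced identity key (illustrative sketch only).

    `verify_jwt` is a hypothetical callback that returns the verified
    subject for a token, or None if the token is invalid.
    Note: a plain dict is case-sensitive; a real request object may
    offer case-insensitive header lookup.
    """
    # Priority 1: header injected by Azure App Service authentication.
    azure_id = headers.get("X-MS-CLIENT-PRINCIPAL-ID", "").strip()
    if azure_id:
        return f"user:{azure_id}"

    # Priority 2: verified JWT bearer token, if verification is configured.
    auth = headers.get("Authorization", "")
    if verify_jwt is not None and auth.startswith("Bearer "):
        subject = verify_jwt(auth[len("Bearer "):])
        if subject:
            return f"user:{subject}"

    # Priority 3: client-provided ID. Any namespace the client sent is
    # discarded and "browser:" is forced, so "user:" cannot be spoofed.
    raw = headers.get("X-Identity-Id", "").strip()
    _, sep, rest = raw.partition(":")
    bare = rest if sep else raw
    if not bare:
        raise ValueError("no identity provided")
    return f"browser:{bare}"
```

For example, a forged `X-Identity-Id: user:alice@example.com` with no valid authentication resolves to `browser:alice@example.com`, matching the "Attacker spoofing" row of the table.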

### Implementing Custom Authentication

To add JWT-based authentication:

1. **Backend** (`auth.py`): Uncomment and configure the JWT verification code in `get_identity_id()`
2. **Frontend** (`utils.tsx`): Implement `getAuthToken()` to retrieve the JWT from your auth context
3. **Add JWT secret** to Flask config: `current_app.config['JWT_SECRET']`

### Azure App Service Authentication

When deployed to Azure with EasyAuth enabled:
- Azure automatically adds `X-MS-CLIENT-PRINCIPAL-ID` header to authenticated requests
- The backend reads this header first (highest priority)
- No frontend changes needed - Azure handles the auth flow

### Frontend Identity Management

The frontend (`src/app/identity.ts`) manages identity as follows:

```typescript
// Identity is always initialized with browser ID
identity: { type: 'browser', id: getBrowserId() }

// If user logs in (e.g., via Azure), it's updated to:
identity: { type: 'user', id: userInfo.userId }

// All API requests send namespaced identity:
// X-Identity-Id: "browser:550e8400-..." or "user:alice@..."
```

This ensures:
1. **Anonymous users**: Work immediately with localStorage-based browser ID
2. **Logged-in users**: Get their verified user ID from the auth provider
3. **Cross-tab consistency**: Browser ID is shared via localStorage across all tabs

## Usage
See the [Usage section on the README.md page](README.md#usage).
34 changes: 29 additions & 5 deletions README.md
@@ -11,7 +11,7 @@
<p align="center">
<a href="https://data-formulator.ai"><img src="https://img.shields.io/badge/🚀_Try_Online_Demo-data--formulator.ai-F59E0B?style=for-the-badge" alt="Try Online Demo"></a>
&nbsp;
<a href="#get-started"><img src="https://img.shields.io/badge/💻_Install_Locally-pip_install-3776AB?style=for-the-badge" alt="Install Locally"></a>
<a href="#get-started"><img src="https://img.shields.io/badge/💻_Install_Locally-uvx_|_pip-3776AB?style=for-the-badge" alt="Install Locally"></a>
</p>

<p align="center">
@@ -32,6 +32,9 @@ https://github.com/user-attachments/assets/8ca57b68-4d7a-42cb-bcce-43f8b1681ce2


## News 🔥🔥🔥
[01-31-2025] **uv support** — Faster installation with uv
- 🚀 **Install with uv**: Data Formulator now supports installation via [uv](https://docs.astral.sh/uv/), the ultra-fast Python package manager. Get started in seconds with `uvx data_formulator` or `uv pip install data_formulator`.

[01-25-2025] **Data Formulator 0.6** — Real-time insights from live data
- ⚡ **Connect to live data**: Connect to URLs and databases with automatic refresh intervals. Visualizations update automatically as your data changes to provide you live insights. [Demo: track international space station position speed live](https://github.com/microsoft/data-formulator/releases/tag/0.6)
- 🎨 **UI Updates**: Unified UI for data loading; direct drag-and-drop fields from the data table to update visualization designs.
@@ -127,9 +130,30 @@ Data Formulator enables analysts to iteratively explore and visualize data. Star

Play with Data Formulator with one of the following options:

- **Option 1: Install via Python PIP**
- **Option 1: Install via uv (recommended)**

[uv](https://docs.astral.sh/uv/) is an extremely fast Python package manager. If you have uv installed, you can run Data Formulator directly without any setup:

```bash
# Run data formulator directly (no install needed)
uvx data_formulator
```

Or install it in a project/virtual environment:

```bash
# Install data_formulator
uv pip install data_formulator

# Run data formulator
python -m data_formulator
```

Data Formulator opens automatically in the browser at [http://localhost:5000](http://localhost:5000).

- **Option 2: Install via pip**

Use Python PIP for an easy setup experience, running locally (recommend: install it in a virtual environment).
Install with pip (we recommend installing in a virtual environment).

```bash
# install data_formulator
@@ -143,13 +167,13 @@ Play with Data Formulator with one of the following options:

*you can specify the port number (e.g., 8080) by `python -m data_formulator --port 8080` if the default port is occupied.*

- **Option 2: Codespaces (5 minutes)**
- **Option 3: Codespaces (5 minutes)**

You can also run Data Formulator in Codespaces; we have everything pre-configured. For more details, see [CODESPACES.md](CODESPACES.md).

[![Open in GitHub Codespaces](https://github.com/codespaces/badge.svg)](https://codespaces.new/microsoft/data-formulator?quickstart=1)

- **Option 3: Working in the developer mode**
- **Option 4: Working in the developer mode**

You can build Data Formulator locally if you prefer full control over your development environment and develop your own version on top. For detailed instructions, refer to [DEVELOPMENT.md](DEVELOPMENT.md).

9 changes: 8 additions & 1 deletion local_server.bat
@@ -7,4 +7,11 @@
:: set https_proxy=http://127.0.0.1:7890

set FLASK_RUN_PORT=5000
python -m py-src.data_formulator.app --port %FLASK_RUN_PORT% --dev

:: Use uv if available, otherwise fall back to python
where uv >nul 2>nul
if %ERRORLEVEL% EQU 0 (
    uv run data_formulator --port %FLASK_RUN_PORT% --dev
) else (
    python -m data_formulator.app --port %FLASK_RUN_PORT% --dev
)
9 changes: 7 additions & 2 deletions local_server.sh
@@ -5,6 +5,11 @@
# export http_proxy=http://127.0.0.1:7890
# export https_proxy=http://127.0.0.1:7890

#env FLASK_APP=py-src/data_formulator/app.py FLASK_RUN_PORT=5000 FLASK_RUN_HOST=0.0.0.0 flask run
export FLASK_RUN_PORT=5000
python -m py-src.data_formulator.app --port ${FLASK_RUN_PORT} --dev

# Use uv if available, otherwise fall back to python
if command -v uv >/dev/null 2>&1; then
    uv run data_formulator --port "${FLASK_RUN_PORT}" --dev
else
    python -m data_formulator.app --port "${FLASK_RUN_PORT}" --dev
fi
Binary file added public/screenshot-stock-price-live.webp
2 changes: 1 addition & 1 deletion py-src/data_formulator/__init__.py
@@ -3,7 +3,7 @@

def run_app():
"""Launch the Data Formulator Flask application."""
# Import app only when actually running to avoid side effects
# Import app only when actually running to avoid heavy imports at package load
from data_formulator.app import run_app as _run_app
return _run_app()
