Normalize Persian text in Excel (.xlsx) files for reliable search/filter behavior, without breaking workbook structure.
FarsiFix is a client-side web app: files are processed locally in your browser using a Web Worker. No server upload is required.
Mixed Arabic/Persian code points in spreadsheet text can make filters and lookups fail even when words look identical. FarsiFix standardizes those text variants while keeping formulas, styles, and workbook structure intact.
- Normalizes Persian text in `xl/sharedStrings.xml`.
- Normalizes inline string cells inside `xl/worksheets/sheet*.xml`.
- Preserves XML entities (`&amp;`, `&lt;`, `&gt;`, `&quot;`, `&apos;`) exactly as encoded.
- Leaves non-text XML untouched (including formulas and formatting tags).
- Produces a download named `<original>_FarsiFix.xlsx`.
- Input: `.xlsx`
- Output: `.xlsx` (same workbook structure, normalized text nodes)
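
The output naming rule can be sketched as a small helper. This is an illustration only: `outputFileName` is a hypothetical name, and treating the extension case-insensitively is an assumption, not something the README states.

```typescript
// Hypothetical helper: derive "<original>_FarsiFix.xlsx" from an input file name.
// Assumption: the extension is matched case-insensitively and always re-emitted as ".xlsx".
function outputFileName(inputName: string): string {
  const stem = inputName.replace(/\.xlsx$/i, "");
  return `${stem}_FarsiFix.xlsx`;
}
```

For example, `outputFileName("report.xlsx")` yields `report_FarsiFix.xlsx`.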
- Arabic/Persian letter variants are canonicalized (for example `ك` -> `ک`, `ي`/`ى` -> `ی`).
- Persian and Arabic-Indic digits are normalized to ASCII (`۱۲۳٤٥` -> `12345`).
- ZWNJ (`\u200c`) is mapped to a space in default mode.
- Horizontal whitespace is collapsed per line, while newline structure is preserved.
- Urdu full stop `۔` is normalized to `.`.
- Latin text casing is preserved (no case folding).
Example:
| Input | Output |
|---|---|
| `كريم` | `کریم` |
| `سلام & دنيا` | `سلام & دنیا` |
| `می‌روم` | `می روم` |
| `۱۲۳٤٥٦` | `123456` |
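
The rules and examples above can be approximated with a few chained replacements. This is a minimal sketch, not the project's `normalizeText()`: the name `normalizeTextSketch`, the exact character map, and the decision not to trim leading or trailing whitespace are assumptions.

```typescript
// Illustrative map only; the real rule set may cover more variants.
const CHAR_MAP: Record<string, string> = {
  "\u0643": "\u06A9", // Arabic kaf (ك) -> Persian keheh (ک)
  "\u064A": "\u06CC", // Arabic yeh (ي) -> Persian yeh (ی)
  "\u0649": "\u06CC", // alef maksura (ى) -> Persian yeh (ی)
  "\u06D4": ".",      // Urdu full stop (۔) -> ASCII period
  "\u200C": " ",      // ZWNJ -> space (default mode)
};

function normalizeTextSketch(input: string): string {
  const mapped = input
    // Letter variants, Urdu full stop, and ZWNJ via the lookup table.
    .replace(/[\u0643\u064A\u0649\u06D4\u200C]/g, (ch) => CHAR_MAP[ch] ?? ch)
    // Persian digits U+06F0..U+06F9 -> ASCII 0..9.
    .replace(/[\u06F0-\u06F9]/g, (d) => String(d.charCodeAt(0) - 0x06f0))
    // Arabic-Indic digits U+0660..U+0669 -> ASCII 0..9.
    .replace(/[\u0660-\u0669]/g, (d) => String(d.charCodeAt(0) - 0x0660));

  // Collapse runs of spaces/tabs within each line; newlines are kept as-is.
  return mapped
    .split("\n")
    .map((line) => line.replace(/[ \t]+/g, " "))
    .join("\n");
}
```

Latin letters never match any of the patterns, so their casing passes through unchanged.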
FarsiFix follows strict XML invariants:
- Never decode/re-encode XML entities.
- Use regex-based string surgery only (no DOM parsing/reserialization).
- Normalize text only inside `<t>` tags.
- Preserve tag attributes such as `xml:space="preserve"`.
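
One way to honor these invariants with regex-only edits is sketched below. It is not the project's `normalizeXmlText()`: the function name, both patterns, and the choice to normalize the text between entities segment by segment are assumptions. It also assumes attribute values never contain a raw `>`.

```typescript
type TextNormalizer = (text: string) => string;

// Matches <t> or <t ...attrs> (not <tab>, not self-closing <t .../>),
// then lazily captures the body up to the closing </t>.
const T_NODE = /(<t(?:\s(?:[^>\/]|\/(?!>))*)?>)([\s\S]*?)(<\/t>)/g;

// Named, decimal, and hex entity references. The capture group makes split() keep them.
const ENTITY = /(&(?:#\d+|#x[0-9a-fA-F]+|[A-Za-z][A-Za-z0-9]*);)/;

function normalizeXmlTextSketch(xml: string, normalize: TextNormalizer): string {
  return xml.replace(T_NODE, (_match: string, open: string, body: string, close: string) => {
    // split() with a capture group yields [text, entity, text, entity, ...].
    const parts = body.split(ENTITY);
    const rewritten = parts
      // Odd indices are entities: copied through untouched, never decoded.
      .map((part, i) => (i % 2 === 1 ? part : normalize(part)))
      .join("");
    // The opening tag (with its attributes) and closing tag are re-emitted verbatim.
    return open + rewritten + close;
  });
}
```

Because only the captured body is rewritten, everything outside `<t>...</t>` (formulas in `<f>`, style references, attributes) is returned unchanged.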
Guardrails:
- UI rejects files larger than `VITE_MAX_FILE_SIZE_MB` (default `100` MB).
- Worker aborts when `xl/sharedStrings.xml` exceeds `200` MB (unzipped).
- Worker aborts when any `xl/worksheets/sheet*.xml` exceeds `50` MB (unzipped).
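
The guardrails amount to a few size comparisons. The sketch below shows one possible shape; the function names are hypothetical, and counting a megabyte as 1024 * 1024 bytes is an assumption (the project may use 10^6).

```typescript
// Assumption: 1 MB = 1024 * 1024 bytes.
const MB = 1024 * 1024;

// Limits taken from the guardrail list above.
const SHARED_STRINGS_LIMIT = 200 * MB;
const SHEET_LIMIT = 50 * MB;

// Matches xl/worksheets/sheet*.xml (no nested directories).
const SHEET_PATH = /^xl\/worksheets\/sheet[^/]*\.xml$/;

// UI-side check: true when the picked file is over the configured limit.
function exceedsUploadLimit(fileBytes: number, maxMb: number = 100): boolean {
  return fileBytes > maxMb * MB;
}

// Worker-side check: throws when a guarded part is too large once unzipped.
function checkPartSize(path: string, unzippedBytes: number): void {
  const limit =
    path === "xl/sharedStrings.xml" ? SHARED_STRINGS_LIMIT
    : SHEET_PATH.test(path) ? SHEET_LIMIT
    : undefined; // other parts are not size-limited in this sketch
  if (limit !== undefined && unzippedBytes > limit) {
    throw new Error(`${path} is ${unzippedBytes} bytes unzipped; limit is ${limit} bytes`);
  }
}
```

In a Vite app the UI limit would typically be read from `import.meta.env.VITE_MAX_FILE_SIZE_MB` and passed in as `maxMb`.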
flowchart LR
A["Browser UI (React)"] --> B["Main Thread Hooks"]
B --> C["Web Worker (Comlink)"]
C --> D["JSZip: read/write XLSX parts"]
D --> E["normalizeXmlText() on <t> nodes"]
E --> F["normalizeText() Persian rules"]
C --> G["Repack XLSX (DEFLATE)"]
G --> H["Download: *_FarsiFix.xlsx"]
- React 19 + TypeScript + Vite
- Tailwind CSS v4
- Web Workers + Comlink
- JSZip for `.xlsx` package manipulation
- Vitest (unit) + Playwright (E2E)
- Biome + Oxlint

Install dependencies and start the dev server:

    npm install
    npm run dev

Open http://localhost:5173.

Build for production:

    npm run build

Available scripts:

- `npm run dev` - Start Vite dev server.
- `npm run build` - Typecheck and build.
- `npm run preview:local` - Preview production build locally.
- `npm run typecheck` - TypeScript checks only.
- `npm run test` - Run Vitest unit tests.
- `npm run test:watch` - Run unit tests in watch mode.
- `npm run e2e` - Run Playwright end-to-end tests.
- `npm run perf:metrics` - Build + run Lighthouse (mobile/desktop) and save normalized metrics JSON/Markdown.
- `npm run perf:compare` - Compare two metrics JSON files and produce a before/after Markdown report.
- `npm run lint` - Run Biome checks.
- `npm run lint:fix` - Apply Biome fixes.
- `npm run lint:ox` - Run Oxlint (type-aware).
- `npm run lint:all` - Run Biome + Oxlint.
- `npm run check:theme` - Verify class-based dark mode in built CSS.
- `npm run view` - Open a headed Playwright session against dev server.
- `npm run deploy:pages` - Build and deploy to Cloudflare Pages.

Collect a normalized performance snapshot (Lighthouse mobile + desktop):

    npm run perf:metrics -- --out output/perf/before.json --raw-dir output/perf/raw-before

Run again after your changes:

    npm run perf:metrics -- --out output/perf/after.json --raw-dir output/perf/raw-after

Generate the before/after report:

    npm run perf:compare -- --before output/perf/before.json --after output/perf/after.json --out output/perf/report.md

Fail on regressions (useful in CI):

    npm run perf:compare -- --before output/perf/before.json --after output/perf/after.json --fail-on-regression

GitHub Actions workflow is included at `.github/workflows/perf-regression.yml` and runs on pull requests, pushes to `main`, and manual dispatch. It also runs the full quality gate (`lint:all`, `build`, `test`, `e2e`, `check:theme`) before generating performance reports.

`.env`:

    VITE_MAX_FILE_SIZE_MB=100

Run the full quality gate locally:

    npm run lint:all
    npm run build
    npm run test
    npm run e2e
    npm run check:theme

Project layout:

    src/
      components/   UI
      hooks/        App/worker orchestration
      lib/          Pure normalization and utilities
      workers/      Worker entrypoint + Excel core logic
    e2e/            Playwright specs
    fixtures/       Test workbook fixtures
    scripts/        Local tooling and checks

Cloudflare Pages config is included via `wrangler.toml`.

    npm run deploy:pages

The repository also contains a Python normalizer and keyboard mapping XML files used as historical/reference material during rule design:

- `persian_normalizer.py`
- `tests/test_normalizer.py`
- `persian-legacy.xml`
- `persian-standard.xml`

The production web app is implemented in TypeScript under src/.