SPECODEX SPEC · ODEX ISSUE 1 — 2026
▮▮▮ FIELD MANUAL ▮▮▮

A product selection frontend that only an engineer could love.

Industrial spec data — drives, motors, gearheads, contactors, actuators — indexed, filtered, and exportable. No marketing copy on the rows. No "request a quote" gates. The number you need, with the datasheet that produced it.

▮▮▮ WHAT IT DOES ▮▮▮
  1. TM-01

    Filter chips, not facets

    Every spec on every record is a chip. Click to constrain, click again to drop. No nested accordions, no "show more". The filter set is the data.

  2. TM-02

    Metric ↔ imperial, header toggle

    Display-layer conversion across the whole catalog — torque, force, length, temperature. The underlying value never moves; the unit you read does.

  3. TM-03

    Datasheet links on every row

    The PDF that produced the row is one click away. Verify a number, check a derate curve, copy a part code straight from the source.

  4. TM-04

    Rows export like a BOM

    Filter to a shortlist, export to CSV. Tabular numerics, canonical units, manufacturer + part number. Drop it into a spec sheet with no cleanup.
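The display-layer conversion in TM-02 can be sketched as follows. This is a minimal illustration, not the project's frontend code (which is TypeScript): the `Quantity` type, the `display` function, and the unit labels are all hypothetical names chosen for the example. The point it shows is that the stored value is immutable and only the formatted string changes.

```python
from dataclasses import dataclass

# Canonical metric unit -> (imperial label, conversion function).
# The unit labels are illustrative assumptions, not the catalog's real vocabulary.
_TO_IMPERIAL = {
    "Nm": ("lbf·ft", lambda v: v * 0.7375621493),
    "N": ("lbf", lambda v: v * 0.2248089431),
    "mm": ("in", lambda v: v / 25.4),
    "°C": ("°F", lambda v: v * 9 / 5 + 32),
}


@dataclass(frozen=True)
class Quantity:
    value: float  # stored canonical value; frozen, so it never moves
    unit: str     # canonical (metric) unit


def display(q: Quantity, imperial: bool) -> str:
    """Format a stored quantity for the selected unit system."""
    if imperial and q.unit in _TO_IMPERIAL:
        label, convert = _TO_IMPERIAL[q.unit]
        return f"{convert(q.value):.2f} {label}"
    # Metric view, or a unit with no imperial counterpart: show as stored.
    return f"{q.value:.2f} {q.unit}"
```

Toggling the header switch would then amount to re-rendering every cell with a different `imperial` flag; nothing is written back to the record.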

▮▮▮ DATA SOURCES ▮▮▮
  1. DS-01

    PDF catalogs

    specodex CLI: page-finder identifies spec tables (free, no LLM call), Gemini extracts structured rows, Pydantic validates, DynamoDB stores. Never feed it a 600-page raw catalog — page filtering is mandatory.

  2. DS-02

    Product webpages

    web-scraper CLI: Playwright renders JS-heavy product pages, pulls JSON-LD + HTML, and runs the same extraction pipeline. From the database's perspective, the resulting rows are indistinguishable from PDF-sourced ones.

  3. DS-03

    Manual entry

    Admin-mode UI: presigned-URL upload, full CRUD, per-row edits. For when a vendor ships a one-off spec note that no PDF will ever show.
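Why page filtering matters for DS-01 is easier to see with a toy version. The sketch below is an assumption about how a text-only page-finder could work, not the heuristic specodex actually ships: it counts number-plus-unit tokens per page and keeps pages above a threshold. `UNIT_PATTERN`, `find_spec_pages`, and `min_hits` are hypothetical names.

```python
import re

# Hypothetical signal: a number followed by a common engineering unit.
# Spec tables are dense with these; marketing pages are not.
UNIT_PATTERN = re.compile(r"\b\d+(?:\.\d+)?\s?(?:Nm|rpm|kW|V|A|mm|kg)\b")


def find_spec_pages(pages, min_hits=5):
    """Return 0-based indexes of pages with at least min_hits number+unit tokens.

    `pages` is a list of already-extracted page texts; no LLM call is involved.
    """
    return [
        index
        for index, text in enumerate(pages)
        if len(UNIT_PATTERN.findall(text)) >= min_hits
    ]
```

Only the pages returned here would be passed on to the extraction step, which is what keeps a 600-page catalog from ever reaching the model whole.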

▮▮▮ PRODUCT TYPES ▮▮▮
  1. PT-01

    Motors

    Brushless DC, AC servo, AC induction. Voltage, current, power, torque, speed, encoder type, rotor inertia, IP rating.

  2. PT-02

    Drives

    Servo and VFD drives. Input/output voltage, power, switching frequency, I/O counts, fieldbus protocols, safety ratings.

  3. PT-03

    Gearheads

    Planetary, harmonic, cycloidal. Ratio, backlash, continuous and peak torque, input speed, torsional rigidity, service life.

  4. PT-04

    Electric cylinders

    Stroke, push/pull force, continuous force, linear speed, positioning repeatability, lead-screw pitch.

  5. PT-05

    Linear actuators

    Stroke, force class, lead, screw type, duty cycle, IP rating — the screw-driven side, kept separate from servo electric cylinders.

  6. PT-06

    Robot arms

    Payload, reach, pose repeatability, max TCP speed, axis count, per-axis torque and speed.

  7. PT-07

    Contactors

    AC-1/AC-3 ratings, coil voltages, auxiliary contact counts, short-circuit ratings — switching gear that lives one rack over from the drives.

  8. PT-08

    Extensible

    Run ./Quickstart schemagen <pdf>... --type <name> with 3–5 vendor catalogs to scaffold a new Pydantic model. The new type is then auto-discovered by every CLI; the TS allow-lists are documented in CLAUDE.md.

▮▮▮ QUICK START ▮▮▮

Everything goes through ./Quickstart <command> — a single bash entry point that delegates to cli/quickstart.py. Run it with no command for local dev servers.

# Clone and install
git clone https://github.com/JimothyJohn/specodex.git
cd specodex
uv sync

# Local dev (backend :3001, frontend Vite :5173)
./Quickstart dev

# Pre-push gate — mirrors CI exactly: lint + tests + build
./Quickstart verify

# Extract specs from a PDF (page-finder + Gemini + Pydantic)
uv run specodex \
  --url "https://example.com/motor-catalog.pdf" \
  --type motor --manufacturer "Acme" --product-name "X100"

# Scrape a product webpage (JS-rendered, same pipeline)
uv run web-scraper \
  --url "https://shop.example.com/products/X100" \
  --type motor --manufacturer "Acme" --product-name "X100"

# Query the database
uv run dsm find --type motor \
  --where "rated_power>=1000" --sort "rated_torque:desc"

# Propose a new Pydantic product model from 3-5 vendors
./Quickstart schemagen abb.pdf siemens.pdf schneider.pdf --type valve

# Benchmark the ingress pipeline against control datasheets
./Quickstart bench
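The --where and --sort arguments in the dsm example above suggest a small expression grammar. The sketch below shows one way such strings could be parsed and applied to in-memory rows. The grammar (one numeric comparison, `field:asc|desc` sorting) and the names `parse_where`, `parse_sort`, and `find` are assumptions for illustration; the real CLI may accept more or behave differently.

```python
import operator
import re

# Two-character operators are listed first so ">=" is not read as ">".
_OPS = {
    ">=": operator.ge, "<=": operator.le, "!=": operator.ne,
    ">": operator.gt, "<": operator.lt, "=": operator.eq,
}
_WHERE = re.compile(r"^\s*(\w+)\s*(>=|<=|!=|>|<|=)\s*(-?\d+(?:\.\d+)?)\s*$")


def parse_where(expr):
    """Turn 'rated_power>=1000' into a row predicate."""
    match = _WHERE.match(expr)
    if match is None:
        raise ValueError(f"unsupported where clause: {expr!r}")
    field, op, raw = match.groups()
    compare, threshold = _OPS[op], float(raw)
    # Rows missing the field never match.
    return lambda row: field in row and compare(row[field], threshold)


def parse_sort(expr):
    """Turn 'rated_torque:desc' into (field, reverse)."""
    field, _, direction = expr.partition(":")
    if direction not in ("", "asc", "desc"):
        raise ValueError(f"unsupported sort direction: {direction!r}")
    return field, direction == "desc"


def find(rows, where, sort):
    """Filter rows by the where clause, then order them by the sort clause."""
    keep = parse_where(where)
    field, reverse = parse_sort(sort)
    matched = [row for row in rows if keep(row)]
    # Rows without the sort field are treated as the smallest value.
    return sorted(matched, key=lambda r: r.get(field, float("-inf")), reverse=reverse)
```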

▮▮▮ API ▮▮▮

Endpoint                          Description
GET  /health                      Status, mode, environment, timestamp
GET  /api/v1/search               Full-text + filtered + sorted search
GET  /api/products                List products (filter ?type=motor)
GET  /api/products/summary        Counts by product type
GET  /api/products/categories     Available product types with counts
GET  /api/products/manufacturers  Unique manufacturer list
GET  /api/datasheets              Datasheet source entries
POST /api/upload                  Queue a datasheet for processing (admin)
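A minimal client for the product listing endpoint might look like this. The base URL assumes the local dev backend on port 3001 from Quick Start, and `products_url` and `list_products` are hypothetical helper names. The response schema is not documented in this section, so the sketch returns the decoded JSON as-is rather than guessing at its shape.

```python
import json
from typing import Optional
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: the local dev backend from Quick Start; swap in a deployed URL as needed.
BASE_URL = "http://localhost:3001"


def products_url(product_type: Optional[str] = None, base_url: str = BASE_URL) -> str:
    """Build the GET /api/products URL, adding ?type=... when a type is given."""
    query = f"?{urlencode({'type': product_type})}" if product_type else ""
    return f"{base_url}/api/products{query}"


def list_products(product_type: Optional[str] = None, base_url: str = BASE_URL):
    """Call the endpoint and return the decoded JSON body unchanged."""
    with urlopen(products_url(product_type, base_url), timeout=10) as response:
        return json.load(response)
```

With `./Quickstart dev` running, `list_products("motor")` would request `/api/products?type=motor`.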

▮▮▮ ARCHITECTURE ▮▮▮
  1. ARCH-01

    Extraction (Python)

    Page-finder text heuristic strips a 600-page catalog to ~20 spec pages before any LLM call. Gemini emits structured JSON; specodex validators map it to canonical compact strings of the form value;unit. Each row is quality-scored, then written.

  2. ARCH-02

    Frontend

    React + TypeScript + Vite. Deployed to S3 behind CloudFront. Two modes: admin (full CRUD) and public (read-only search and filter).

  3. ARCH-03

    Backend

    Express on AWS Lambda via API Gateway. REST: search, product CRUD, datasheet management, upload pipeline.

  4. ARCH-04

    Data

    DynamoDB single-table: PK=PRODUCT#TYPE, SK=PRODUCT#UUID. S3 for PDFs. Deterministic UUIDs deduplicate across sources.
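The key layout and deduplication in ARCH-04 can be illustrated with name-based UUIDs. This is a sketch under stated assumptions: the namespace, the choice of manufacturer plus part number as the identity, the normalization, and the upper-casing of the type in the partition key are all hypothetical. Only the `PRODUCT#TYPE` / `PRODUCT#UUID` shape comes from the text above.

```python
import uuid

# Hypothetical namespace: any fixed UUID works, provided it never changes.
NAMESPACE = uuid.uuid5(uuid.NAMESPACE_DNS, "specodex.example")


def product_id(manufacturer: str, part_number: str) -> str:
    """Derive a stable ID from normalized identity fields (uuid5 is deterministic)."""
    identity = f"{manufacturer.strip().lower()}|{part_number.strip().lower()}"
    return str(uuid.uuid5(NAMESPACE, identity))


def product_key(product_type: str, manufacturer: str, part_number: str) -> dict:
    """Build the single-table key pair for one product row."""
    return {
        "PK": f"PRODUCT#{product_type.upper()}",  # assumption: type is upper-cased
        "SK": f"PRODUCT#{product_id(manufacturer, part_number)}",
    }
```

Because the same identity always hashes to the same UUID, a row extracted from a PDF and the same part scraped from a webpage would resolve to one item rather than two.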