Methodology

How CranioSwift is built, evaluated, and disclosed.

This page documents the methodology at the level a scientific reviewer needs to evaluate the work, without crossing the patent-disclosure line. For confidential method-level detail (architectures, loss functions, calibration internals), the Monash Innovation IDF channel carries the full disclosure.

Disclosure posture: Public — outcome / module-name level only
IP channel: Monash Innovation IDF (in flight)
Confidential channel: Patent counsel via Monash Innovation
Detail under NDA: Investor materials packet on request

1. Scope of disclosure

This page is a public methodology statement. It is deliberately bounded above (no architectural detail beyond module-name granularity) and bounded below (no marketing-only claims that wouldn't survive scientific review). The boundary is set by the active patent posture (see §9) and by the commercialization invariants documented in the project's CLAUDE.md.

What is on this page: protocol, splits, metrics, dataset licenses, calibration approach, commercial-clean vs research-line bookkeeping, evaluation rigor.
What is not on this page: model architectures, loss functions, decoder heads, threshold-calibration internals, training hyperparameter tables. Those live in the Monash Innovation IDF.

2. System overview: four reviewable modules

CranioSwift's backend is structured as four modules with clean interfaces. The module-level split is the public abstraction; downstream method-level detail is confidential during the patent window.

Defect recognition

Input: defective skull mask (CT-derived). Output: defect mask + defect type + region-of-interest for downstream modules.

Boundary constraint

Input: skull mask + coarse seed surface. Output: portable boundary-constraint artifact (rim points + skull surface summary).

Implant shape generation

Input: boundary-constraint artifact. Output: patient-specific implant mask + smooth surface mesh + reviewable preview metadata.

Engineering feedback

Input: implant candidate. Output: structured engineering feedback report. Current scope: structured proxy analysis suitable for triage and structural review; patient-specific contact analysis is on the post-funding roadmap.

The split is not academic: each module's output is the input to the next, and each module can advance, be audited, and be replaced independently. The reviewability claim ("a clinician can inspect each step") depends on this split, not on any single architectural choice inside a module.
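The chaining described above can be sketched as a staged pipeline that keeps every intermediate output for inspection. This is an illustration only, not the project's code: `Stage`, `AuditedPipeline`, and `make_stub` are hypothetical names, the stage bodies are stubs, and the strictly linear chain is a simplification of the module inputs listed above.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class Stage:
    name: str                   # public module name
    run: Callable[[Any], Any]   # stub here; real internals are confidential


@dataclass
class AuditedPipeline:
    stages: List[Stage]
    trace: Dict[str, Any] = field(default_factory=dict)

    def __call__(self, case: Any) -> Any:
        """Run the stages in order, keeping every intermediate output."""
        self.trace = {}
        out = case
        for stage in self.stages:
            out = stage.run(out)
            self.trace[stage.name] = out  # per-step artifact a reviewer can inspect
        return out

    def replace(self, name: str, run: Callable[[Any], Any]) -> None:
        """Swap one module's implementation without touching the others."""
        for stage in self.stages:
            if stage.name == name:
                stage.run = run
                return
        raise KeyError(name)


def make_stub(label: str) -> Callable[[Any], Any]:
    # Stub stage: appends its label so the data flow is visible.
    return lambda items: items + [label]


pipeline = AuditedPipeline([
    Stage("defect_recognition", make_stub("defect")),
    Stage("boundary_constraint", make_stub("boundary")),
    Stage("implant_shape_generation", make_stub("implant")),
    Stage("engineering_feedback", make_stub("report")),
])
result = pipeline(["skull_mask"])
```

After a run, `pipeline.trace` holds one entry per module, which is the property the reviewability claim relies on; `replace` swaps a single stage while the interfaces stay fixed.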

3. Datasets and licensing posture

CranioSwift uses three families of cranial data; the licensing posture governs what can and cannot ship in any commercial path.

Dataset family	License posture	Use in CranioSwift
Public NC-licensed cranial datasets	CC BY-NC-SA 4.0 (research / non-commercial)	Research-line training and evaluation; the live demo bundle is research-only and cannot ship in any commercial path.
Public commercially-clean cranial datasets	CC BY 4.0 (commercial-clean)	The commercial-clean training corpus available today; small relative to clinical-distribution needs — the named bottleneck for commercial-grade performance is private clinical data scale.
Private clinical data (planned)	Project-specific institutional framework	The next-phase commercial-clean corpus; the pre-seed round funds the acquisition framework.

Every internal model bundle carries a model card recording (a) training-data sources, (b) their licenses, and (c) whether the bundle is research-only or commercial-grade. This is a project invariant, not optional. Specific dataset names, sources, and counts are listed in the investor materials packet under NDA.
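A model card with the three fields above could be recorded as follows. This is a minimal sketch under stated assumptions: the class names, the `grade` strings, the source names in any usage, and the set of licenses treated as commercially usable are all hypothetical, not the project's actual schema.

```python
from dataclasses import dataclass
from typing import List

# Assumption: licenses this sketch treats as usable in a commercial path.
COMMERCIAL_CLEAN_LICENSES = {"CC BY 4.0", "private-institutional"}


@dataclass(frozen=True)
class DataSource:
    name: str      # placeholder label, not a real dataset name
    license: str


@dataclass(frozen=True)
class ModelCard:
    bundle_id: str
    sources: List[DataSource]   # (a) training-data sources with (b) their licenses
    grade: str                  # (c) "research-only" or "commercial-grade"

    def validate(self) -> None:
        """Raise ValueError if the card is incomplete or license-inconsistent."""
        if self.grade not in ("research-only", "commercial-grade"):
            raise ValueError(f"unknown grade: {self.grade}")
        if not self.sources:
            raise ValueError("model card must list training-data sources")
        if self.grade == "commercial-grade":
            tainted = [s.name for s in self.sources
                       if s.license not in COMMERCIAL_CLEAN_LICENSES]
            if tainted:
                raise ValueError(
                    f"non-commercial sources in commercial bundle: {tainted}")
```

Calling `validate()` before a bundle is registered would reject a commercial-grade card that lists any CC BY-NC-SA 4.0 source, which is one way to make the invariant mechanical rather than procedural.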

4. Evaluation set, metrics, and promote rule

Evaluation set

A small fixed-tail set of public-cranial-dataset cases, held constant across all internal phases so per-case comparisons are stable across experiments. The choice is deliberate: a fixed tail surfaces regressions on the hardest cases that mean-only averaging would hide. The shipping bundle's per-case numbers are visible on the live demo.

Primary metric

Dice between the predicted implant mask and the ground-truth implant mask, averaged over the fixed-tail set. Per-case Dice and skull-overlap voxel counts are exposed on the live demo for each shipping bundle.
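For reference, the standard Dice coefficient can be sketched as below. Masks are represented as sets of voxel coordinates for simplicity; that representation, the function names, and the convention that two empty masks score 1.0 are assumptions of this sketch.

```python
from typing import Iterable, Set, Tuple

Voxel = Tuple[int, int, int]


def dice(pred: Set[Voxel], truth: Set[Voxel]) -> float:
    """Dice = 2 * |pred & truth| / (|pred| + |truth|)."""
    denom = len(pred) + len(truth)
    if denom == 0:
        return 1.0  # assumed convention when both masks are empty
    return 2.0 * len(pred & truth) / denom


def mean_dice(cases: Iterable[Tuple[Set[Voxel], Set[Voxel]]]) -> float:
    """Average per-case Dice over (pred, truth) pairs, e.g. the fixed-tail set."""
    scores = [dice(pred, truth) for pred, truth in cases]
    if not scores:
        raise ValueError("no cases to average")
    return sum(scores) / len(scores)


pred = {(0, 0, 0), (0, 0, 1), (0, 1, 0)}
truth = {(0, 0, 0), (0, 0, 1), (1, 0, 0)}
score = dice(pred, truth)  # 2 shared voxels, 3 + 3 total: 4/6
```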

Secondary metrics

Skull-overlap voxel count (target = 0 across the set).
Watertight surface boolean (target = true across the set).
Per-case Dice (the tail set surfaces per-case regressions that the mean alone hides).
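The first two secondary metrics can be sketched as below. The page does not state how watertightness is tested; the edge-count test used here (every undirected edge shared by exactly two triangles) is a common necessary check and is an assumption, as are the function names and the set-of-voxels mask representation. It does not detect self-intersections or inconsistent face orientation.

```python
from collections import Counter
from typing import List, Set, Tuple

Voxel = Tuple[int, int, int]
Triangle = Tuple[int, int, int]  # vertex indices into a mesh vertex list


def skull_overlap_voxels(implant: Set[Voxel], skull: Set[Voxel]) -> int:
    """Number of voxels the implant mask shares with the skull mask (target 0)."""
    return len(implant & skull)


def is_watertight(faces: List[Triangle]) -> bool:
    """True if every undirected edge belongs to exactly two triangles."""
    if not faces:
        return False
    edge_counts = Counter()
    for a, b, c in faces:
        for u, v in ((a, b), (b, c), (c, a)):
            edge_counts[(min(u, v), max(u, v))] += 1
    return all(count == 2 for count in edge_counts.values())


# A tetrahedron is the smallest closed triangle mesh.
tetrahedron = [(0, 1, 2), (0, 3, 1), (1, 3, 2), (2, 3, 0)]
closed = is_watertight(tetrahedron)        # True
opened = is_watertight(tetrahedron[:-1])   # False: one face removed
```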

Promote rule

A new model bundle is promoted to demo / commercial use only if both conditions hold:

Mean fixed-tail Dice exceeds the prior baseline by a defined absolute margin.
No per-case Dice regresses below the baseline's per-case value (within tolerance).
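The two conditions above can be sketched as a single check. The `margin` and `tolerance` parameters are hypothetical placeholders (the actual thresholds are under NDA), and reading "exceeds by a margin" as `gain >= margin` is an assumption of this sketch.

```python
from typing import Dict


def should_promote(candidate: Dict[str, float],
                   baseline: Dict[str, float],
                   margin: float,
                   tolerance: float) -> bool:
    """Apply both promote conditions to per-case Dice keyed by case ID."""
    if not baseline:
        raise ValueError("baseline has no cases")
    if candidate.keys() != baseline.keys():
        raise ValueError("candidate and baseline must cover the same cases")

    n = len(baseline)
    mean_gain = sum(candidate.values()) / n - sum(baseline.values()) / n

    # Condition 1: mean Dice improves by at least the absolute margin.
    mean_ok = mean_gain >= margin
    # Condition 2: no case falls below its baseline value, within tolerance.
    per_case_ok = all(candidate[c] >= baseline[c] - tolerance for c in baseline)
    return mean_ok and per_case_ok


baseline = {"case_a": 0.80, "case_b": 0.60}
improved = {"case_a": 0.85, "case_b": 0.66}   # both cases improve
lopsided = {"case_a": 0.95, "case_b": 0.55}   # mean improves, case_b regresses
```

With illustrative values `margin=0.02` and `tolerance=0.01`, `improved` passes and `lopsided` fails on the second condition even though its mean gain is about the same. That is the situation the per-case condition exists to catch.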

Specific numeric thresholds, prior-baseline values, and per-phase outcomes against the promote rule are in the investor materials packet under NDA. The arc-level finding visible publicly: the binding constraint for the next promotion is private clinical data scale, not architecture or loss design.

5. Training and evaluation protocol

Train / evaluation split

Two training lines are maintained in parallel (see §8): a research line and a commercial-clean line, each with its own fixed train / evaluation split. There is no data leakage between the train and evaluation sets in any phase. Specific case counts and held-out IDs are in the investor materials packet under NDA.
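One simple guard against leakage is an ID-level disjointness check on each line's split, sketched below. The function names and case IDs are hypothetical. The check only catches the same case ID appearing on both sides, not the same patient under two different IDs.

```python
from typing import Dict, Iterable, Set, Tuple


def leaked_cases(train_ids: Iterable[str], eval_ids: Iterable[str]) -> Set[str]:
    """Case IDs that appear in both the train and evaluation sets."""
    return set(train_ids) & set(eval_ids)


def check_splits(lines: Dict[str, Tuple[Iterable[str], Iterable[str]]]) -> None:
    """`lines` maps a training-line name to (train_ids, eval_ids)."""
    for name, (train_ids, eval_ids) in lines.items():
        overlap = leaked_cases(train_ids, eval_ids)
        if overlap:
            raise ValueError(f"{name}: leaked cases {sorted(overlap)}")


# Hypothetical IDs for the two lines; passes silently when splits are disjoint.
check_splits({
    "research": (["r001", "r002"], ["r101"]),
    "commercial_clean": (["c001"], ["c101", "c102"]),
})
```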

Source of ground truth

Implant masks are derived from the (intact, defective) skull pairs distributed with each public dataset; the implant is the residual region. For the planned private clinical data, ground-truth implants will come from post-cranioplasty CT scans where the actual implanted geometry can be segmented.
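Reading "residual region" as the voxels present in the intact skull but absent from the defective one (an interpretation, not a statement of the project's pipeline), the derivation for the public pairs can be sketched as a set difference. The function name and the requirement that the defective mask be a subset of the intact mask are assumptions of this sketch.

```python
from typing import Set, Tuple

Voxel = Tuple[int, int, int]


def residual_implant(intact: Set[Voxel], defective: Set[Voxel]) -> Set[Voxel]:
    """Ground-truth implant = intact skull minus defective skull."""
    stray = defective - intact
    if stray:
        # Assumed sanity check: the defective skull should not add bone.
        raise ValueError(f"{len(stray)} defective voxels lie outside the intact skull")
    return intact - defective


intact = {(0, 0, 0), (0, 0, 1), (0, 0, 2)}
defective = {(0, 0, 0)}
implant = residual_implant(intact, defective)
```

By construction the implant and the defective skull are disjoint and together rebuild the intact skull, so a ground-truth implant has a skull-overlap voxel count of zero.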

Negative-result reporting

Every internal phase report records experiments that did not improve the metric, with the diagnostic that ruled them out. The convention is binding: anything that doesn't promote is logged with a diagnostic, not silently dropped. The detailed negative-result inventory is shared under NDA via the investor materials packet.

6. Calibration approach

Inference-time thresholds are calibrated on the training set only, never on the evaluation tail set. The calibration procedure is bounded; method-level details are confidential and live in the IDF.

The public surface guarantee: thresholds are fit on training data only, and none was ever optimized against tail-set Dice. The fixed-tail set is preserved across phases for the same reason: any threshold or model selection touching the tail-set Dice would invalidate the cross-phase comparison.
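The train-only guarantee can be illustrated with a generic threshold sweep that refuses to run if any tail-set case is present. This is not the project's calibration procedure, which is confidential; the grid search, the function names, and the probability-map representation (a dict from voxel to score) are all assumptions made for illustration.

```python
from typing import Dict, Iterable, Sequence, Set, Tuple

Voxel = Tuple[int, int, int]
Case = Tuple[Dict[Voxel, float], Set[Voxel]]  # (per-voxel scores, truth mask)


def dice(pred: Set[Voxel], truth: Set[Voxel]) -> float:
    denom = len(pred) + len(truth)
    return 1.0 if denom == 0 else 2.0 * len(pred & truth) / denom


def binarize(scores: Dict[Voxel, float], threshold: float) -> Set[Voxel]:
    return {voxel for voxel, s in scores.items() if s >= threshold}


def calibrate_threshold(train_cases: Dict[str, Case],
                        tail_ids: Iterable[str],
                        candidates: Sequence[float]) -> float:
    """Pick the candidate with the best mean Dice on training cases only."""
    forbidden = set(train_cases) & set(tail_ids)
    if forbidden:
        raise ValueError(f"tail-set cases in calibration data: {sorted(forbidden)}")
    if not train_cases or not candidates:
        raise ValueError("need at least one training case and one candidate")

    def mean_dice_at(t: float) -> float:
        scores = [dice(binarize(p, t), truth) for p, truth in train_cases.values()]
        return sum(scores) / len(scores)

    # max() keeps the first candidate on ties.
    return max(candidates, key=mean_dice_at)


train = {"t1": ({(0, 0, 0): 0.9, (0, 0, 1): 0.6, (0, 1, 0): 0.3},
                {(0, 0, 0), (0, 0, 1)})}
best = calibrate_threshold(train, tail_ids=["tail1"], candidates=[0.2, 0.5, 0.8])
```

Placing the guard inside the calibration entry point means the evaluation tail cannot leak into threshold selection by accident; the same idea would apply to any model-selection step.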

7. Engineering feedback (module D)

Module D (engineering feedback, the fourth module in §2) currently produces a structured engineering deck per case: geometry, material assumptions, boundary conditions, and load cases at a level suitable for structural-engineering review. The deck is a structured proxy, useful for triage and review, not for clinical loading analysis.

The post-funding roadmap upgrades module D toward patient-specific contact analysis. Until then, the demo surface communicates module D status truthfully on every case as “deck written, simulation pending”.

8. Commercial-clean line vs research line

CranioSwift maintains two parallel training/evaluation lines so that commercial product paths and research outputs never mix in a way that violates dataset licenses or commercial framing.

Research line

Trained on public NC-licensed cranial datasets; evaluated on a fixed-tail subset. Public artifacts: the live demo bundle, internal evaluation reports, and the planned research publication.

Cannot ship in any commercial path. Research-only by license.

Commercial-clean line

Trained on public commercially-clean cranial datasets today; the future training corpus is private clinical data (post-funding). Evaluated on a held-out commercially-clean subset (clinical-distribution yardstick). No artifacts yet ship at demo grade on this line; the binding constraint is data scale.

Can ship in commercial paths once a private-data-trained bundle clears the promote rule.

9. Patent-disclosure posture

CranioSwift pursues institutional patent disclosure via the Monash Innovation IDF channel rather than self-filing a provisional. Filing of the institutional provisional is contingent on an internal validation gate; until it is filed, public-facing materials (this site, public blog posts, talks, manuscript abstracts) stay at outcome / module-name granularity.

Method-level architectural detail (decoder structure, loss decomposition, residual labelling schemes, threshold-calibration internals) is disclosed only through the confidential channel: Monash Innovation IDF → patent counsel via Monash Innovation. Investor diligence on the technical line happens under NDA via the investor materials packet.

This posture is a project invariant and applies to every page on this site, every commit message in the public repository, and every public conversation about the technical line.

10. Reproducibility envelope

The shipping bundle's per-case outputs are visible on the live demo for inspection. Method-level reproducibility (training scripts, hyperparameters, calibration internals) is shared under NDA only, consistent with the patent posture in §9.

Researchers who want to validate the method-level claims should request a confidential briefing through the IDF channel; public materials do not carry that detail.

For the public summary of the model-development arc, see /phase-summary/. For the live model output on the fixed-tail cases, see /demo/ (invite-only). For diligence-grade method detail under NDA, contact [email protected].