Live demo

Retrieval · Ablation · Hybrid

Twenty-nine thousand toys, one query away.

A vision-language model tags every product. Those tags are clustered into 4,609 canonical labels. A 64-dimensional autoencoder re-ranks what the text filter catches. The result — a 4.45× lift over the naïve baseline — is shown live for eight held-out queries below.

Precision @ 10
0.154
hybrid mean · 132 queries
Lift over baseline
4.45×
vs. autoencoder only
Catalogue
29,637
SKUs · 99.98% scraped
Build cost
< HK$500
one-off API spend

§ I

Pick a query

§ II

Dispatch

§ III

Top-ten findings

Hybrid retrieval · text-tag filter → 64-d embedding re-rank. Ground-truth hits are marked.