Query console
§ II
Tag · Search · Recommend
A vision-language model reads each product's image and title and emits hashtags. The 29,466 raw tags collapse into 4,609 canonical labels that anyone can search. A 64-dimensional autoencoder embedding then re-ranks the matches to produce the top-10 recommendations, a 4.45× lift over the text-only baseline.
§ I
One product, one pass. The VLM reads the image + title and emits hashtags; clustering collapses them into canonical labels.
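As a rough illustration of the collapse step, the sketch below groups raw hashtags that differ only in case or punctuation and maps each group to its most frequent spelling. This is a deliberately simple stand-in: the page does not say which clustering method is used, so the `normalize` key and the `canonicalize` helper are hypothetical names chosen for this example.

```python
import re
from collections import Counter, defaultdict


def normalize(tag: str) -> str:
    """Hypothetical grouping key: lowercase, keep only letters and digits."""
    return re.sub(r"[^a-z0-9]", "", tag.lower())


def canonicalize(raw_tags):
    """Map every raw tag to the most frequent spelling within its group.

    Assumption: string normalization stands in for the real clustering
    step, which the source does not describe.
    """
    groups = defaultdict(Counter)
    for tag in raw_tags:
        key = normalize(tag)
        if key:  # skip tags that normalize to an empty string
            groups[key][tag] += 1

    canonical = {}
    for counts in groups.values():
        label = counts.most_common(1)[0][0]
        for tag in counts:
            canonical[tag] = label
    return canonical


# Toy input; real tags would come from the VLM pass.
raw = [
    "#RedDress", "#red-dress", "#reddress", "#RedDress",
    "#sneakers", "#Sneakers", "#sneakers",
]
mapping = canonicalize(raw)
labels = sorted(set(mapping.values()))
```

Here six distinct raw spellings reduce to two labels. A production pipeline would more plausibly cluster on tag embeddings so that synonyms, not just spelling variants, end up under one label.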
§ IV
Hybrid retrieval: the canonical-tag filter narrows 29,637 products to a shortlist, and the 64-d embedding re-ranks that shortlist to produce the top ten.
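The two-stage flow can be sketched in a few lines: filter by tag overlap, then sort the survivors by embedding similarity. Everything specific here is an assumption made for illustration: the `recommend` function, the catalog layout, the "any shared tag" filter rule, and the use of cosine similarity are not stated on the page, and the toy vectors are 3-dimensional rather than 64-dimensional.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def recommend(query_tags, query_vec, catalog, k=10):
    """Stage 1: keep products sharing at least one canonical tag.
    Stage 2: re-rank the shortlist by embedding similarity, return top k ids.
    """
    wanted = set(query_tags)
    shortlist = [p for p in catalog if wanted & p["tags"]]
    shortlist.sort(key=lambda p: cosine(query_vec, p["vec"]), reverse=True)
    return [p["id"] for p in shortlist[:k]]


# Toy catalog with made-up ids, tags, and vectors.
catalog = [
    {"id": "A", "tags": {"dress", "red"}, "vec": [1.0, 0.0, 0.0]},
    {"id": "B", "tags": {"dress"}, "vec": [0.6, 0.8, 0.0]},
    {"id": "C", "tags": {"sneakers"}, "vec": [1.0, 0.0, 0.0]},
    {"id": "D", "tags": {"red"}, "vec": [0.0, 1.0, 0.0]},
]
top_two = recommend(["dress", "red"], [1.0, 0.0, 0.0], catalog, k=2)
everything = recommend(["dress", "red"], [1.0, 0.0, 0.0], catalog)
```

Product C has an embedding identical to the query but never reaches the re-ranker because it shares no tag. That is the trade-off of filtering first: the expensive similarity pass runs on a small shortlist, at the cost of missing items whose tags do not match.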