The article discusses an improvement in Manticore Search, an open-source full-text search engine, that addresses a common problem with product model names. Users often type a model name with no space between the letters and the digits (e.g., "xt850"), while the index holds the name as separate tokens ("xt" and "850") after tokenization. The search then fails because the joined term the user typed is not present in the index.
Key Points:
- Problem Description:
  - Users frequently type model names as single words without spaces (e.g., "iphone5se").
  - Traditional tokenization splits these into separate tokens ("iphone", "5", and "se"), leading to non-matches.
- Solution in Manticore Search:
  - Starting from version 23.0.0, Manticore introduced the `bigram_delimiter` setting along with new `bigram_index` modes (`second_numeric` and `second_has_digit`) to address this issue.
  - These settings allow for more flexible tokenization that can handle model names without spaces by treating them as single tokens or specific patterns.
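The mismatch and the pair-joining idea can be sketched in a few lines of Python. This is not Manticore's implementation or its configuration syntax. It assumes, reading the mode names literally, that `second_numeric` pairs two adjacent tokens when the second one is all digits, that `second_has_digit` pairs them when the second one contains any digit, and that the pair is stored joined with no delimiter. The function names and sample strings are hypothetical.

```python
import re


def tokenize(text):
    """Lowercase the text and split on anything that is not a letter or digit."""
    return re.findall(r"[a-z0-9]+", text.lower())


def second_numeric(token):
    """Assumed rule: the second token of a pair consists only of digits."""
    return token.isdigit()


def second_has_digit(token):
    """Assumed rule: the second token of a pair contains at least one digit."""
    return any(ch.isdigit() for ch in token)


def index_terms(text, pair_rule=None):
    """Return the set of searchable terms for one document.

    Without pair_rule, only the individual tokens are stored.
    With pair_rule, each adjacent pair whose second token satisfies
    the rule is also stored, joined with no delimiter.
    """
    tokens = tokenize(text)
    terms = set(tokens)
    if pair_rule is not None:
        for first, second in zip(tokens, tokens[1:]):
            if pair_rule(second):
                terms.add(first + second)
    return terms


doc = "Motorola XT 850 phone"
plain = index_terms(doc)
joined = index_terms(doc, pair_rule=second_numeric)

print("xt850" in plain)   # False: only "xt" and "850" are stored
print("xt850" in joined)  # True: the joined pair is stored as well
```

Under these assumptions the stricter rule skips a pair such as "eos" followed by "5d", because "5d" is not purely numeric, while the digit-containing rule stores "eos5d".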
**Usage of `bigram_delimiter` …**
Read the full article at DEV Community