![]() For Vietnamese, I want to look at vnTokenzizer.For Japanese, I want to look at MeCab, tinysegmenter, and possibly CaboCha.As long as we can run the analysis and map analyzed tokens back to their original text, we can do a most of the language analysis analysis to determine whether the analyzer is worth pursuing for integration. Fortunately, we don’t need to commit to full Elasticsearch integration to perform our standard testing. Other important criteria for actual development and deployment (which would be assessed in a follow-up task) include:īased on my review of these seven languages, I suggest testing some of the software packages. Code that looks to be reasonably mature. Code that isn’t in a huge library and doesn’t have massive dependencies.Code that’s in a reasonable programming language. ![]()
0 Comments
Leave a Reply. |
AuthorWrite something about yourself. No need to be fancy, just an overview. ArchivesCategories |