An analyzer of type snowball
that uses the standard tokenizer, with standard filter, lowercase filter, stop filter, and snowball filter.
The Snowball Analyzer is a stemming analyzer from Lucene that is originally based on the snowball project from snowball.tartarus.org.
Sample usage:
{ "index" : { "analysis" : { "analyzer" : { "my_analyzer" : { "type" : "snowball", "language" : "English" } } } } }
The language
parameter can have the same values as the snowball filter and defaults to English
. Note that not all the language analyzers have a default set of stopwords provided.
The stopwords
parameter can be used to provide stopwords for the languages that has no defaults, or to simply replace the default set with your custom list. A default set of stopwords for many of these languages is available from for instance here and here.
A sample configuration (in YAML format) specifying Swedish with stopwords:
index : analysis : analyzer : my_analyzer: type: snowball language: Swedish stopwords: "och,det,att,i,en,jag,hon,som,han,på,den,med,var,sig,för,så,till,är,men,ett,om,hade,de,av,icke,mig,du,henne,då,sin,nu,har,inte,hans,honom,skulle,hennes,där,min,man,ej,vid,kunde,något,från,ut,när,efter,upp,vi,dem,vara,vad,över,än,dig,kan,sina,här,ha,mot,alla,under,någon,allt,mycket,sedan,ju,denna,själv,detta,åt,utan,varit,hur,ingen,mitt,ni,bli,blev,oss,din,dessa,några,deras,blir,mina,samma,vilken,er,sådan,vår,blivit,dess,inom,mellan,sådant,varför,varje,vilka,ditt,vem,vilket,sitta,sådana,vart,dina,vars,vårt,våra,ert,era,vilkas"