Python - Stemming Algorithms





In Natural Language Processing we often come across situations where two or more words share a common root. For example, the three words agreed, agreeing and agreeable all have the root word agree. A search involving any of these words should treat them as the same word, namely the root word. It therefore becomes essential to link all such words to their root. The NLTK library provides methods that perform this linking and output the root word.
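
To make the idea of linking words to a shared root concrete, here is a minimal sketch of a toy suffix stripper in plain Python. It is not one of the NLTK algorithms: the suffix list and the names toy_stem and group_by_stem are assumptions made up for this illustration. It also shows that a stem only has to be shared between related words, not be a dictionary word.

```python
from collections import defaultdict

# Hypothetical, hand-picked suffix list for illustration only.
# Real stemmers such as Porter use far more elaborate rules.
SUFFIXES = ("able", "ing", "ed", "s")


def toy_stem(word):
    """Lowercase the word, strip one known suffix, then drop trailing 'e's."""
    word = word.lower()
    for suffix in SUFFIXES:
        # Keep at least three characters so very short words stay intact.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            word = word[: -len(suffix)]
            break
    return word.rstrip("e")


def group_by_stem(words):
    """Map each stem to the list of original words that reduce to it."""
    groups = defaultdict(list)
    for w in words:
        groups[toy_stem(w)].append(w)
    return dict(groups)


groups = group_by_stem(["agreed", "agreeing", "agreeable", "agree", "head"])
for stem, members in groups.items():
    print("Stem: %s  || Words: %s" % (stem, ", ".join(members)))
```

A search index built on such groups would return all four forms of agree for a query on any one of them, because they share the same key.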

NLTK provides three commonly used stemming algorithms: Porter, Lancaster and Snowball. They give slightly different results. The example below uses all three stemmers and shows their output.


import nltk
from nltk.stem.porter import PorterStemmer
from nltk.stem.lancaster import LancasterStemmer
from nltk.stem import SnowballStemmer

# Note: word_tokenize needs NLTK's tokenizer data to be installed,
# e.g. via nltk.download('punkt') (the resource name depends on the NLTK version).

porter_stemmer = PorterStemmer()
lanca_stemmer = LancasterStemmer()
sb_stemmer = SnowballStemmer("english")

word_data = "Aging head of famous crime family decides to transfer his position to one of his subalterns"
# First, tokenize the sentence into words
nltk_tokens = nltk.word_tokenize(word_data)

# Next, find the stem of each word with each stemmer
print('***PorterStemmer****\n')
for w_port in nltk_tokens:
    print("Actual: %s  || Stem: %s" % (w_port, porter_stemmer.stem(w_port)))

print('\n***LancasterStemmer****\n')
for w_lanca in nltk_tokens:
    print("Actual: %s  || Stem: %s" % (w_lanca, lanca_stemmer.stem(w_lanca)))

print('\n***SnowballStemmer****\n')
for w_snow in nltk_tokens:
    print("Actual: %s  || Stem: %s" % (w_snow, sb_stemmer.stem(w_snow)))

When we run the above program, we get the following output:

***PorterStemmer****

Actual: Aging  || Stem: age
Actual: head  || Stem: head
Actual: of  || Stem: of
Actual: famous  || Stem: famou
Actual: crime  || Stem: crime
Actual: family  || Stem: famili
Actual: decides  || Stem: decid
Actual: to  || Stem: to
Actual: transfer  || Stem: transfer
Actual: his  || Stem: hi
Actual: position  || Stem: posit
Actual: to  || Stem: to
Actual: one  || Stem: one
Actual: of  || Stem: of
Actual: his  || Stem: hi
Actual: subalterns  || Stem: subaltern

***LancasterStemmer****

Actual: Aging  || Stem: ag
Actual: head  || Stem: head
Actual: of  || Stem: of
Actual: famous  || Stem: fam
Actual: crime  || Stem: crim
Actual: family  || Stem: famy
Actual: decides  || Stem: decid
Actual: to  || Stem: to
Actual: transfer  || Stem: transf
Actual: his  || Stem: his
Actual: position  || Stem: posit
Actual: to  || Stem: to
Actual: one  || Stem: on
Actual: of  || Stem: of
Actual: his  || Stem: his
Actual: subalterns  || Stem: subaltern

***SnowballStemmer****

Actual: Aging  || Stem: age
Actual: head  || Stem: head
Actual: of  || Stem: of
Actual: famous  || Stem: famous
Actual: crime  || Stem: crime
Actual: family  || Stem: famili
Actual: decides  || Stem: decid
Actual: to  || Stem: to
Actual: transfer  || Stem: transfer
Actual: his  || Stem: his
Actual: position  || Stem: posit
Actual: to  || Stem: to
Actual: one  || Stem: one
Actual: of  || Stem: of
Actual: his  || Stem: his
Actual: subalterns  || Stem: subaltern
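
To see at a glance where the three stemmers disagree, the output listed above can be compared programmatically. The sketch below copies the stems from that output into plain dictionaries instead of calling NLTK again; the helper name disagreements is an assumption introduced for this illustration.

```python
# Stems copied from the output listed above, one entry per distinct token.
porter = {
    "Aging": "age", "head": "head", "of": "of", "famous": "famou",
    "crime": "crime", "family": "famili", "decides": "decid", "to": "to",
    "transfer": "transfer", "his": "hi", "position": "posit", "one": "one",
    "subalterns": "subaltern",
}
lancaster = {
    "Aging": "ag", "head": "head", "of": "of", "famous": "fam",
    "crime": "crim", "family": "famy", "decides": "decid", "to": "to",
    "transfer": "transf", "his": "his", "position": "posit", "one": "on",
    "subalterns": "subaltern",
}
snowball = {
    "Aging": "age", "head": "head", "of": "of", "famous": "famous",
    "crime": "crime", "family": "famili", "decides": "decid", "to": "to",
    "transfer": "transfer", "his": "his", "position": "posit", "one": "one",
    "subalterns": "subaltern",
}


def disagreements(*tables):
    """Return {word: (stem, stem, ...)} for words whose stems are not all equal."""
    result = {}
    for word in tables[0]:
        stems = tuple(table[word] for table in tables)
        if len(set(stems)) > 1:
            result[word] = stems
    return result


diff = disagreements(porter, lancaster, snowball)
for word, (p, l, s) in diff.items():
    print("Word: %s  || Porter: %s  || Lancaster: %s  || Snowball: %s" % (word, p, l, s))
```

For this sentence, Lancaster produces the shortest stems, while Porter and Snowball differ only on famous and his.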

