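In this tutorial, you will learn how to install NLTK in Windows, in Mac/Linux, and through Anaconda, how to download the NLTK datasets, and how to run an NLTK script.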

Installing NLTK in Windows

In this part, we will learn how to set up NLTK via the terminal (Command Prompt in Windows).

The instructions below assume that you don't have Python installed, so the first step is to install Python.

Installing Python in Windows:

Step 1) Go to https://www.python.org/downloads/ and select the latest version for Windows.

Note: If you don't want the latest version, you can open the Downloads tab to see all releases.

Step 2) Click on the downloaded file.

Step 3) Select Customize Installation.

Step 4) Click Next.

Step 5) On the next screen:

  1. Select the advanced options
  2. Give a custom install location. In this example, a folder on the C drive is chosen for ease of use
  3. Click Install

Step 6) Click the Close button once the installation is done.

Step 7) Copy the path of the Scripts folder inside your Python installation folder. This folder contains pip.

Step 8) In the Windows command prompt:

  • Navigate to the Scripts folder (the path you copied), which contains pip
  • Enter the command to install NLTK
    pip3 install nltk
  • The installation should complete successfully

NOTE: For Python 2, use the command pip2 install nltk

Step 9) In the Windows Start Menu, search for and open the Python shell.

Step 10) Verify the installation by entering the command below:

import nltk

If no error appears, the installation is complete.
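If you prefer a script to the interactive shell, the same check can report the installed version instead of raising an error. This is a minimal sketch that uses only the standard library ("importlib.util.find_spec" and "importlib.metadata.version", available from Python 3.8); the function name nltk_status is hypothetical.

```python
import importlib.util
from importlib.metadata import PackageNotFoundError, version


def nltk_status():
    """Return a one-line message saying whether NLTK can be imported."""
    # find_spec returns None when no importable "nltk" package is on sys.path
    if importlib.util.find_spec("nltk") is None:
        return "NLTK is not installed; run: pip3 install nltk"
    try:
        # Read the version from the installed package metadata without importing nltk
        return "NLTK " + version("nltk") + " is installed"
    except PackageNotFoundError:
        # Importable (e.g. from a source folder on sys.path) but no package metadata
        return "NLTK is importable, but its version metadata was not found"


if __name__ == "__main__":
    print(nltk_status())
```

Save it as any .py file and run it with the same Python you installed NLTK into.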

Installing NLTK in Mac/Linux

Installing NLTK in Mac/Linux requires the Python package manager pip. If pip is not installed, follow the instructions below. The apt commands shown apply to Debian- and Ubuntu-based Linux distributions.

Step 1) Update the package index with the command below:

sudo apt update

Step 2) Install pip for Python 3:

sudo apt install python3-pip

Alternatively, you can install pip using easy_install. This method is deprecated and may not work on recent systems.

sudo apt-get install python-setuptools python-dev build-essential

Once easy_install is installed, run the command below to install pip:

sudo easy_install pip

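Step 3) Use the following commands to install NLTK: the first uses pip, the second uses pip3 for Python 3. The -U option upgrades NLTK if an older version is already installed.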

sudo pip install -U nltk
sudo pip3 install -U nltk
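Because "pip" and "pip3" can belong to different Python installations, a common alternative is to run pip as a module of the exact interpreter you will use, as in "python3 -m pip install -U nltk". The sketch below builds that command from "sys.executable"; the script name install_nltk.py and the --run flag are hypothetical, and the install only runs when the flag is given.

```python
import subprocess
import sys
from typing import List


def pip_install_command(package: str = "nltk") -> List[str]:
    """Build a pip command bound to the interpreter running this script."""
    # sys.executable is the current Python, so "-m pip" installs into it,
    # not into whichever pip or pip3 happens to be first on PATH
    return [sys.executable, "-m", "pip", "install", "-U", package]


if __name__ == "__main__":
    cmd = pip_install_command()
    print(" ".join(cmd))
    # Only install when asked explicitly, e.g.: python3 install_nltk.py --run
    if "--run" in sys.argv[1:]:
        subprocess.run(cmd, check=True)
```

Running it without --run just prints the command, so you can check which interpreter it targets first.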

Installing NLTK through Anaconda

Step 1) Install Anaconda (which can also be used to install other packages) from https://www.anaconda.com/download/, and select the Python version you need.


Step 2) In the Anaconda Prompt:

  1. Enter the command
    conda install -c anaconda nltk
  2. Review the packages that will be installed, upgraded, or downgraded, and enter y to proceed
  3. NLTK is downloaded and installed
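NLTK installed with conda is only visible to the Python interpreter of that conda environment. As a rough check, the sketch below looks for the "conda-meta" folder that conda creates at the root of every environment it manages, and prints the CONDA_PREFIX variable that "conda activate" sets. The function name in_conda_env is hypothetical, and this is a heuristic, not an official conda API.

```python
import os
import sys


def in_conda_env() -> bool:
    """Heuristic: True if this interpreter lives inside a conda environment."""
    # conda creates a "conda-meta" folder at the root of every environment it manages
    return os.path.isdir(os.path.join(sys.prefix, "conda-meta"))


if __name__ == "__main__":
    print("Interpreter:", sys.executable)
    print("Inside a conda environment:", in_conda_env())
    # CONDA_PREFIX is set by "conda activate"; it should match sys.prefix
    print("Active conda environment:", os.environ.get("CONDA_PREFIX", "(none)"))
```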

NLTK Dataset

The NLTK module has many datasets that you need to download before you can use them. Many of them are text collections, technically called corpora (singular: corpus). Some examples are stopwords, gutenberg, framenet_v15, large_grammars, and so on.

How to Download All NLTK Packages

Step 1) Run the Python interpreter in Windows or Linux

Step 2)

  1. Enter the commands
    import nltk
    nltk.download()
  2. The NLTK Downloader window opens. Click the Download button to download the dataset. This process can take a while, depending on your internet connection

NOTE: You can change the download location by clicking File > Change Download Directory.
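If you change the download directory, NLTK also needs to know where to look. NLTK reads the NLTK_DATA environment variable (folders separated by os.pathsep) when it is imported, so one option is to set it from Python before "import nltk". A minimal sketch; the helper use_data_dir and the folder ~/my_nltk_data are hypothetical.

```python
import os


def use_data_dir(data_dir):
    """Put data_dir first in the NLTK_DATA search path and return its absolute path.

    NLTK reads NLTK_DATA when it is imported, so call this before "import nltk".
    """
    data_dir = os.path.abspath(os.path.expanduser(data_dir))
    current = os.environ.get("NLTK_DATA", "")
    # Keep folders already listed, dropping empty entries and duplicates of data_dir
    others = [p for p in current.split(os.pathsep) if p and p != data_dir]
    os.environ["NLTK_DATA"] = os.pathsep.join([data_dir] + others)
    return data_dir


if __name__ == "__main__":
    # Hypothetical folder: use the one chosen under File > Change Download Directory
    folder = use_data_dir("~/my_nltk_data")
    print("NLTK_DATA =", os.environ["NLTK_DATA"])
    try:
        import nltk
    except ImportError:
        print("NLTK is not installed")
    else:
        print("Folder searched by NLTK:", folder in nltk.data.path)
```

To download into the same folder without the window, nltk.download accepts a download_dir argument, for example nltk.download("brown", download_dir=folder).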

Step 3) To test the installed data, use the following code:

>>> from nltk.corpus import brown
>>> brown.words()

['The', 'Fulton', 'County', 'Grand', 'Jury', 'said', ...]
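Instead of downloading everything, a script can check whether a single resource is already present before downloading it. The sketch below relies on nltk.data.find, which raises LookupError when a resource cannot be found in NLTK's data folders. The helper name has_resource is hypothetical, and the NLTK import sits inside it so the script also runs where NLTK is absent.

```python
def has_resource(resource_path):
    """Return True if NLTK can locate resource_path, e.g. "corpora/brown"."""
    try:
        import nltk
    except ImportError:
        return False  # NLTK itself is not installed
    try:
        nltk.data.find(resource_path)  # raises LookupError when the resource is missing
        return True
    except LookupError:
        return False


if __name__ == "__main__":
    if has_resource("corpora/brown"):
        print("Brown corpus found")
    else:
        # Downloading just this corpus is much faster than the whole collection
        print('Brown corpus missing; run: nltk.download("brown")')
```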

Running the NLP Script

This section discusses how to run an NLP script on your local PC. Many libraries are available for natural language processing, so choose one that fits your requirements. This tutorial uses NLTK.

How to Run NLTK Script

Step 1) In your favorite code editor, copy the code below and save the file as "NLTKsample.py"

from nltk.tokenize import RegexpTokenizer

# \w+ matches runs of letters, digits, and underscores, so punctuation is dropped
tokenizer = RegexpTokenizer(r'\w+')
filterdText = tokenizer.tokenize('Hello Guru99, You have build a very good site and I love visiting your site.')
print(filterdText)

Code Explanation:

  1. In this program, the objective is to remove all punctuation from the given text. We imported "RegexpTokenizer", a class from NLTK's tokenize module. It splits text into tokens according to a regular expression, so you can keep or discard symbols, characters, numbers, or anything else you want.
  2. We passed the regular expression \w+ to "RegexpTokenizer". It matches runs of word characters (letters, digits, and underscores), so punctuation and spaces are dropped.
  3. Next, we tokenized the text using the "tokenize" method. The output is stored in the "filterdText" variable.
  4. Finally, we printed the tokens using "print()".
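With its default settings, RegexpTokenizer returns every substring that matches the pattern, which is what Python's built-in re.findall does. The standard-library sketch below reproduces the output of NLTKsample.py without NLTK, to show what the \w+ pattern keeps and drops; it illustrates the matching behaviour and is not a replacement for NLTK's tokenizers.

```python
import re

text = "Hello Guru99, You have build a very good site and I love visiting your site."

# \w+ matches runs of letters, digits and underscores; commas, spaces and the
# final period never match, so they are simply left out of the result
tokens = re.findall(r"\w+", text)
print(tokens)

# A different pattern keeps punctuation as separate tokens:
# \w+ for words, [^\w\s]+ for runs of symbols
tokens_with_punct = re.findall(r"\w+|[^\w\s]+", text)
print(tokens_with_punct)
```

The first list matches the NLTK output shown below; the second additionally contains ',' after 'Guru99' and '.' at the end.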

Step 2) In the command prompt:

  • Navigate to the location where you saved the file
  • Run the command python NLTKsample.py

This shows the following output:

['Hello', 'Guru99', 'You', 'have', 'build', 'a', 'very', 'good', 'site', 'and', 'I', 'love', 'visiting', 'your', 'site']

 
