
congdaoduy298/Crawl-Data


I. CRAWL DATA

Crawl data from Goodreads with Selenium and Python.

Installation and Run

  1. Install Python 3.

  2. Clone this repository.

$ git clone https://github.com/congdaoduy298/Crawl-Data.git

  3. Install the dependencies.

$ cd Crawl-Data/

$ pip3 install -r requirements.txt

  4. Run the script from a terminal.

$ python crawl_books.py
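The crawl can be sketched roughly as below. This is a hedged illustration, not the actual contents of crawl_books.py: the list URL, the `a.bookTitle` CSS selector, and all function names are assumptions.

```python
# Illustrative sketch only -- the real logic lives in crawl_books.py.
# BASE_URL, the CSS selector, and the function names are assumptions.

BASE_URL = "https://www.goodreads.com/list/show/1.Best_Books_Ever"

def list_page_url(page):
    """Build the URL for one page of a paginated Goodreads list."""
    return "%s?page=%d" % (BASE_URL, page)

def crawl_titles(pages=1):
    """Visit each list page with Selenium and collect the book titles."""
    # Imported lazily so the pure helper above works without Selenium.
    from selenium import webdriver            # pip3 install selenium
    from selenium.webdriver.common.by import By

    driver = webdriver.Chrome()  # needs chromedriver on your PATH
    titles = []
    try:
        for page in range(1, pages + 1):
            driver.get(list_page_url(page))
            for link in driver.find_elements(By.CSS_SELECTOR, "a.bookTitle"):
                titles.append(link.text.strip())
    finally:
        driver.quit()
    return titles
```

A real crawl over many pages should also add waits and retries; at this scale (the run below took over 6000 s) politeness delays dominate the runtime.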

Result

Total running time: 6181 s (about 1 hour 43 minutes)

II. NAMED ENTITY RECOGNITION

Extract Vietnamese named entities with VnCoreNLP and English named entities with NLTK + spaCy.

Installation

  1. Python 3.4+ (< 3.8).

  2. Install all required libraries.

$ pip3 install -r requirements.txt

  3. Clone the VnCoreNLP repository and install the vncorenlp wrapper.

$ git clone https://github.com/vncorenlp/VnCoreNLP

  4. Java 1.8+.

  5. Place the file VnCoreNLP-1.1.1.jar (27 MB) and the models folder (115 MB) in the same working folder.

  6. NLTK library (not needed if you use the BERT-base model).

  7. spaCy library (not needed if you use the BERT-base model), plus its English model:

$ python3 -m spacy download en_core_web_sm

Run

I. Use NLTK and spaCy

  1. Run the VnCoreNLP server.

$ vncorenlp -Xmx2g <FULL-PATH-to-VnCoreNLP-jar-file> -p 9000 -a "wseg,pos,ner"

  2. Open a new terminal and run:

$ python3 get_ner.py
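The English side of this step can be sketched as follows. This is illustrative only, not the actual contents of get_ner.py; the function names are assumptions, and both NLTK and spaCy are imported lazily or passed in so the pure helper works on its own.

```python
# Illustrative sketch of English NER with NLTK and spaCy -- the
# function names are assumptions, not taken from get_ner.py.

def nltk_entities(text):
    """Chunk named entities with NLTK (requires the punkt, tagger and
    maxent_ne_chunker data packages, fetched once via nltk.download)."""
    import nltk  # pip3 install nltk
    tree = nltk.ne_chunk(nltk.pos_tag(nltk.word_tokenize(text)))
    return [(" ".join(tok for tok, _ in sub.leaves()), sub.label())
            for sub in tree if hasattr(sub, "label")]

def spacy_entities(nlp, text):
    """Return (entity_text, label) pairs from a loaded spaCy pipeline,
    e.g. nlp = spacy.load("en_core_web_sm")."""
    return [(ent.text, ent.label_) for ent in nlp(text).ents]

def merge_unique(*entity_lists):
    """Combine entity lists from both tools, keeping first occurrences."""
    seen, merged = set(), []
    for ents in entity_lists:
        for pair in ents:
            if pair not in seen:
                seen.add(pair)
                merged.append(pair)
    return merged
```

Running both tools and merging the results is one simple way to compare their outputs; NLTK and spaCy use different label sets (e.g. `GPE` vs `LOC`), so exact agreement is not expected.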

II. Use BERT-base

  1. Get NER for Vietnamese sentences with VnCoreNLP.
$ vncorenlp -Xmx2g <FULL-PATH-to-VnCoreNLP-jar-file> -p 9000 -a "wseg,pos,ner"

$ python3 get_vn_ner.py
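The Vietnamese step can be sketched as below, assuming the VnCoreNLP server started above is listening on port 9000. The `vncorenlp` wrapper and its `annotate()` call are its documented interface, but the helper names here are illustrative, not taken from get_vn_ner.py.

```python
# Hedged sketch of talking to a running VnCoreNLP server; helper
# names are illustrative, not the actual contents of get_vn_ner.py.

def annotate(text):
    """Send text to the server on port 9000 and return its sentences."""
    from vncorenlp import VnCoreNLP  # pip3 install vncorenlp
    annotator = VnCoreNLP(address="http://127.0.0.1", port=9000)
    try:
        return annotator.annotate(text)["sentences"]
    finally:
        annotator.close()

def extract_entities(sentences):
    """Collapse per-token BIO nerLabel tags into (text, type) spans."""
    entities, current, label = [], [], None
    for sentence in sentences:
        for token in sentence:
            tag = token.get("nerLabel", "O")
            if tag.startswith("B-"):
                if current:
                    entities.append((" ".join(current), label))
                current, label = [token["form"]], tag[2:]
            elif tag.startswith("I-") and current:
                current.append(token["form"])
            elif current:
                entities.append((" ".join(current), label))
                current, label = [], None
        if current:
            entities.append((" ".join(current), label))
            current, label = [], None
    return entities
```

Note that VnCoreNLP's word segmenter joins syllables with underscores (e.g. `Hà_Nội`), so the extracted spans keep that convention.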
  2. Use a GPU on Google Colab and run all cells in Bert_NER.ipynb.
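For orientation, BERT-based NER inference can be sketched as below. This is illustrative only: the real code is in Bert_NER.ipynb, the Hugging Face `transformers` pipeline is one common way to run a BERT token classifier (not necessarily the notebook's approach), and the model name is just an example to swap for your own.

```python
# Illustrative only -- the real training/inference code is in
# Bert_NER.ipynb; the pipeline usage and model name are assumptions.

def bert_entities(text, model_name="dslim/bert-base-NER"):
    """Run a BERT token-classification pipeline over text."""
    from transformers import pipeline  # heavy optional dependency
    ner = pipeline("ner", model=model_name, aggregation_strategy="simple")
    return [(ent["word"], ent["entity_group"]) for ent in ner(text)]

def merge_subwords(pieces):
    """Join WordPiece tokens: a '##'-prefixed piece attaches leftward.
    With aggregation_strategy="simple" the pipeline does this for you;
    shown here to make the subword tokenization explicit."""
    words = []
    for piece in pieces:
        if piece.startswith("##") and words:
            words[-1] += piece[2:]
        else:
            words.append(piece)
    return words
```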

REFERENCES

VnCoreNLP: A Vietnamese Natural Language Processing Toolkit

Named Entity Recognition with NLTK and SpaCy
