
Search keyword in traditional/simplified Chinese text documents


JohnYeung-dojjy/chinese_doc_search


Chinese doc search

This is a search engine for Chinese articles, developed using FastHTML and ElasticSearch.
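
A minimal sketch of the kind of request body such a search could send to ElasticSearch, assuming documents store their text in a `full_text` field (the field named in the Previews section) and are indexed with the default standard analyzer. The helper name `build_search_body` and the highlight settings are hypothetical, not taken from this repository. With the standard analyzer each Chinese character becomes its own token, so a plain `match` query would return documents containing any one of the keyword's characters; `match_phrase` requires them to appear contiguously and in order.

```python
from typing import Any


def build_search_body(keyword: str, size: int = 10) -> dict[str, Any]:
    """Build an ElasticSearch query body that matches `keyword` as a phrase.

    Assumes a hypothetical `full_text` field analyzed with the standard analyzer.
    """
    keyword = keyword.strip()
    if not keyword:
        raise ValueError("keyword must not be empty")
    return {
        "size": size,
        # match_phrase keeps the characters of a Chinese keyword adjacent and in order
        "query": {"match_phrase": {"full_text": keyword}},
        # ask ElasticSearch to return highlighted snippets of the matching text
        "highlight": {"fields": {"full_text": {}}},
    }


if __name__ == "__main__":
    print(build_search_body("中文搜索"))
```

With the official `elasticsearch` Python client (8.x), the parts of this dict can be passed as keyword arguments, e.g. `client.search(index=index_name, **body)`.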

Loading data to ElasticSearch from an Excel file

If you have enough memory to load the entire Excel file at once, it is straightforward:

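   import pandas as pd
   df = pd.read_excel("your_file.xlsx")  # hypothetical file name
   for i, row in df.iterrows():
       ...  # create and index one document per row; see create_fake_data.py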
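
Indexing one document per request can be slow for large files. A minimal sketch, assuming each row has already been turned into a plain dict (for example with `row.to_dict()` inside the loop above): the hypothetical helper `rows_to_actions` wraps each row in the action format accepted by `elasticsearch.helpers.bulk`, using the `_index` and `_source` keys. Empty cells (which pandas reads as float NaN) are simply omitted from the document.

```python
import math
from typing import Any, Iterable, Iterator


def _is_missing(value: Any) -> bool:
    # pandas represents empty Excel cells as float NaN
    return value is None or (isinstance(value, float) and math.isnan(value))


def rows_to_actions(rows: Iterable[dict[str, Any]], index: str) -> Iterator[dict[str, Any]]:
    """Yield one bulk-indexing action per row, skipping missing values."""
    for row in rows:
        source = {k: v for k, v in row.items() if not _is_missing(v)}
        yield {"_index": index, "_source": source}


if __name__ == "__main__":
    rows = [
        {"title": "標題", "full_text": "內容"},
        {"title": "标题", "full_text": float("nan")},
    ]
    for action in rows_to_actions(rows, "docs"):
        print(action)
```

Because `rows_to_actions` is a generator, it can be passed straight to `helpers.bulk(client, rows_to_actions(rows, index_name))`, which sends the actions in batches rather than one request per row.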

Otherwise, please either:

  1. Save the file as CSV and refer to "How do I read a large csv file with pandas?"
  2. If you know the number of rows in your file in advance, refer to "read a full excel file chunk by chunk using pandas".
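
For the second option, a common approach is to call `pandas.read_excel` repeatedly with its `skiprows` and `nrows` arguments. The hypothetical helper `excel_chunks` below only computes those arguments for each chunk, assuming the first sheet row is a header and `total_rows` counts data rows only. Passing `skiprows=range(1, start + 1)` skips the data rows already read while keeping row 0 as the header. (For the CSV route, `pandas.read_csv` accepts a `chunksize` argument and then returns an iterator of DataFrames.)

```python
from typing import Iterator


def excel_chunks(total_rows: int, chunk_size: int) -> Iterator[tuple[range, int]]:
    """Yield (skiprows, nrows) pairs covering `total_rows` data rows.

    Row 0 of the sheet is assumed to be the header, so it is never skipped.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for start in range(0, total_rows, chunk_size):
        nrows = min(chunk_size, total_rows - start)
        # skip sheet rows 1..start (data already read), keep the header at row 0
        yield range(1, start + 1), nrows


if __name__ == "__main__":
    for skiprows, nrows in excel_chunks(total_rows=25, chunk_size=10):
        print(f"skip {len(skiprows)} data rows, read {nrows}")
        # e.g. df = pd.read_excel("your_file.xlsx", skiprows=skiprows, nrows=nrows)
```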

How to run

  1. Set up ElasticSearch in Docker and start the container (my version is 8.15.0).
  2. Create a .env file that defines the following variables:
    • DEBUG
    • ELASTICSEARCH_PORT
    • ELASTICSEARCH_INDEX
  3. pip install -r requirements.txt
  4. python src/main.py
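
A minimal standard-library sketch of reading the variables from step 2, assuming simple `KEY=VALUE` lines. The example values and the `load_env`/`load_settings` helpers are hypothetical; a library such as python-dotenv is a common alternative.

```python
from dataclasses import dataclass

# Example .env contents (values are hypothetical)
EXAMPLE_ENV = """
# comments and blank lines are ignored
DEBUG=true
ELASTICSEARCH_PORT=9200
ELASTICSEARCH_INDEX=chinese_docs
"""


def load_env(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines; quoting and variable expansion are not supported."""
    env: dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)  # split on the first "=" only
        env[key.strip()] = value.strip()
    return env


@dataclass
class Settings:
    debug: bool
    port: int
    index: str


def load_settings(env: dict[str, str]) -> Settings:
    return Settings(
        # treat common truthy strings as True; DEBUG defaults to off
        debug=env.get("DEBUG", "false").lower() in {"1", "true", "yes"},
        port=int(env["ELASTICSEARCH_PORT"]),
        index=env["ELASTICSEARCH_INDEX"],
    )


if __name__ == "__main__":
    print(load_settings(load_env(EXAMPLE_ENV)))
```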

Previews

The tests were run on 10,000 entries, each with a full_text field of 10,000 characters.
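
For scale, 10,000 entries × 10,000 characters is 100 million characters; since characters in the CJK Unified Ideographs block (U+4E00 to U+9FFF) take 3 bytes each in UTF-8, the full_text fields alone come to roughly 300 MB. The sketch below generates data of that shape from random characters in that block. It is a hypothetical illustration, not the repository's create_fake_data.py, and random characters will not reproduce the word statistics of real articles.

```python
import random

CJK_START, CJK_END = 0x4E00, 0x9FFF  # CJK Unified Ideographs block


def fake_entry(doc_id: int, length: int, rng: random.Random) -> dict:
    """Return one hypothetical document with `length` random CJK characters."""
    text = "".join(chr(rng.randint(CJK_START, CJK_END)) for _ in range(length))
    return {"id": doc_id, "full_text": text}


def fake_entries(count: int, length: int, seed: int = 0) -> list[dict]:
    rng = random.Random(seed)  # seeded so runs are reproducible
    return [fake_entry(i, length, rng) for i in range(count)]


if __name__ == "__main__":
    for doc in fake_entries(count=3, length=20):
        print(doc["id"], doc["full_text"])
    # The preview setup would correspond to fake_entries(10_000, 10_000).
    # Size estimate: 10_000 * 10_000 chars * 3 bytes (UTF-8) = 300_000_000 bytes
```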

Please click on the links to watch the preview videos
