Arabic Named Entity Recognition with HuggingFace and AraBERT

This project demonstrates how to implement Named Entity Recognition (NER) for Arabic text using the HuggingFace Transformers library and the BERT architecture. The implementation involves loading a dataset, tokenizing the text and aligning labels, training an AraBERT model for token classification, evaluating the model, and deploying it using Gradio.

Overview

Dataset: Utilizes the conllpp-ner-ar dataset from HuggingFace datasets.
Model: Employs aubmindlab/bert-base-arabertv02 for token classification.
Training: Includes tokenization, label alignment, and model training using the Trainer API.
Evaluation: Evaluates the model on a test dataset to measure performance metrics.
Deployment: Uses Gradio to create an interactive web interface for the NER model, allowing users to input Arabic text and receive NER predictions.
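The label-alignment step mentioned above is needed because the tokenizer splits words into sub-word tokens, while the dataset provides one label per word. The following is a minimal sketch of one common convention, not necessarily the exact code used in this repository: only the first sub-token of each word keeps the word's label, and every other position gets -100, the value that PyTorch's cross-entropy loss ignores by default. The function name align_labels_with_tokens and the sample inputs are illustrative assumptions; word_ids is the list a fast HuggingFace tokenizer returns from word_ids().

```python
from typing import List, Optional

# Positions labeled -100 are skipped by PyTorch's cross-entropy loss by default.
IGNORE_INDEX = -100


def align_labels_with_tokens(
    word_ids: List[Optional[int]], word_labels: List[int]
) -> List[int]:
    """Expand word-level labels to token-level labels.

    word_ids maps each token to the index of the word it came from,
    with None for special tokens such as [CLS] and [SEP].
    (Hypothetical helper, written for illustration.)
    """
    aligned = []
    previous_word = None
    for word_id in word_ids:
        if word_id is None:
            # Special token: excluded from the loss
            aligned.append(IGNORE_INDEX)
        elif word_id != previous_word:
            # First sub-token of a word: keep the word's label
            aligned.append(word_labels[word_id])
        else:
            # Later sub-tokens of the same word: excluded from the loss
            aligned.append(IGNORE_INDEX)
        previous_word = word_id
    return aligned


# Three words, where word 0 is split into 2 tokens and word 2 into 3 tokens
example_word_ids = [None, 0, 0, 1, 2, 2, 2, None]
example_labels = [1, 0, 3]
print(align_labels_with_tokens(example_word_ids, example_labels))
# → [-100, 1, -100, 0, 3, -100, -100, -100]
```

An alternative convention propagates the word's label to all of its sub-tokens; either choice works as long as training and evaluation treat the ignored positions consistently.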

Installation

Clone the repository:

git clone https://github.com/MostafaAhmed98/arabic_named_entity
cd arabic_named_entity

Install the required libraries:

pip install transformers datasets evaluate gradio

Usage

To use the Gradio app, run the following command:

python app.py

This will launch a Gradio interface in your web browser where you can input Arabic text and get Named Entity Recognition predictions.

Alternatively, you can try the model on HuggingFace Spaces: https://huggingface.co/spaces/MostafaAhmed98/arabic_named_entity

Acknowledgements

HuggingFace for the transformers and datasets libraries.
e-hossam96 for the conllpp-ner-ar dataset.
aubmindlab for the bert-base-arabertv02 model.

Gradio App Usage

Here's how you can use the app.py file to run the Gradio interface:

import gradio as gr
from pathlib import Path
from transformers import pipeline

# Resolve the fine-tuned checkpoint relative to this file
base_path = Path(__file__).parent
model_checkpoint = str(base_path / "checkpoint-3846")
default_text = "اجتياح رفح الفلسطينية أكبر جريمة إبادة فى التاريخ المعاصر"

# Load the pipeline once at startup instead of on every request
token_classifier = pipeline(
    "token-classification",
    model=model_checkpoint,
    aggregation_strategy="simple",  # merge sub-word tokens into whole entities
)

def run_ner(ner_text):
    # Run the model and describe each detected entity on its own line
    predictions = token_classifier(ner_text)
    return [f"the word {p['word']} is labeled as {p['entity_group']}" for p in predictions]

def predict(user_text):
    model_preds = run_ner(user_text)
    if not model_preds:
        return "No Named Entity Found"
    return "\n".join(model_preds)

# Create Gradio interface
demo = gr.Interface(
    fn=predict,
    inputs=gr.Text(value=default_text, placeholder="Arabic Text", label="Arabic Text"),
    outputs=gr.Text(label="Named Entity Predictions"),
    title="Arabic Named Entity",
    # Gradio expects a string here ("never", "auto", or "manual"), not a boolean;
    # newer Gradio releases name this parameter flagging_mode
    allow_flagging="never",
)

# Launch the interface (share=True also creates a temporary public link)
demo.launch(share=True)

Save this code as app.py and run it using:

python app.py

This will start a Gradio web interface where you can input Arabic text and receive NER predictions.
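The app relies on aggregation_strategy="simple" so that each prediction covers a whole entity rather than a single token. The sketch below illustrates the general idea of grouping BIO-tagged words into entities. It is a simplified, word-level illustration and not the Transformers library's actual implementation, which works on sub-word tokens and also reports scores and character offsets. The function name group_entities, the sample sentence, and the PER/LOC tag names are assumptions made for this example.

```python
from typing import Dict, List


def group_entities(words: List[str], tags: List[str]) -> List[Dict[str, str]]:
    """Merge consecutive B-/I- tagged words into entity groups.

    (Hypothetical helper, written for illustration.)
    """
    entities: List[Dict[str, str]] = []
    current = None
    for word, tag in zip(words, tags):
        if tag == "O":
            # Outside any entity: close the current group, if any
            current = None
            continue
        prefix, _, entity_type = tag.partition("-")
        if prefix == "I" and current is not None and current["entity_group"] == entity_type:
            # Continuation of the current entity
            current["word"] += " " + word
        else:
            # A B- tag, or an I- tag with nothing to continue, starts a new entity
            current = {"entity_group": entity_type, "word": word}
            entities.append(current)
    return entities


words = ["زار", "محمد", "صلاح", "القاهرة"]
tags = ["O", "B-PER", "I-PER", "B-LOC"]
for entity in group_entities(words, tags):
    print(f"the word {entity['word']} is labeled as {entity['entity_group']}")
```

The output dictionaries use the same word and entity_group keys that app.py reads from the pipeline's predictions, so the final formatting line matches the one in the app.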
