Welcome to EduNLP’s Tutorials and Documentation


EduNLP is a Python library for advanced natural language processing and is one of the projects in the EduX plan of BDAA. It is built on recent research and was designed from day one for use in real educational products.

EduNLP now comes with pretrained pipelines and currently supports segmentation, tokenization and vectorization. It supports a variety of preprocessing steps for NLP in educational scenarios, such as formula parsing and multi-modal segmentation.

EduNLP is commercial open-source software, released under the Apache-2.0 license.

Install

EduNLP requires Python 3.6, 3.7, 3.8 or 3.9. EduNLP uses PyTorch as its backend tensor library.
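
Before installing, it can help to confirm that the active interpreter falls inside the version range stated above. The following is a minimal sketch using only the standard library; `is_supported_python`, `MIN_SUPPORTED` and `MAX_SUPPORTED` are hypothetical names introduced here for illustration and are not part of EduNLP.

```python
import sys

# Inclusive (major, minor) range taken from the install notes above.
MIN_SUPPORTED = (3, 6)
MAX_SUPPORTED = (3, 9)


def is_supported_python(version_info=None):
    """Return True if the (major, minor) version lies in the documented range.

    Hypothetical helper for illustration; it is not part of EduNLP.
    """
    if version_info is None:
        version_info = sys.version_info
    # Compare only major and minor; the patch level does not matter here.
    major_minor = tuple(version_info[:2])
    return MIN_SUPPORTED <= major_minor <= MAX_SUPPORTED


if __name__ == "__main__":
    current = "{}.{}".format(*sys.version_info[:2])
    if is_supported_python():
        print("Python " + current + " is within the documented range.")
    else:
        print("Python " + current + " is outside the documented range (3.6 to 3.9).")
```

If the check fails, creating a virtual environment with one of the listed Python versions is usually the simplest way forward.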

We recommend installing EduNLP with pip:

# basic installation
pip install EduNLP

# full installation
pip install EduNLP[full]

You can also install from source:

git clone https://github.com/bigdata-ustc/EduNLP.git
cd EduNLP

# basic installation
pip install .

# full installation
pip install .[full]

Getting Started

A basic use of EduNLP is converting an item into a vector, for example:

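from EduNLP import get_pretrained_i2v
# Load the pretrained "d2v_all_256" model, using ./model as the model directory
i2v = get_pretrained_i2v("d2v_all_256", "./model")
# Pass a list of items; the call returns item-level and token-level vectors
item_vector, token_vector = i2v(["the item content"])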

If you are an absolute beginner, start with the Tutorial to EduNLP (Chinese version). It covers the basic concepts of EduNLP and gives a step-by-step guide to training, loading and using the language models.

Resource

We will continue to publish new datasets in the Standard Item Format (SIF) to encourage related research. The data resources can be accessed via another EduX project, EduData.

Contribution

EduNLP is free software; you can redistribute it and/or modify it under the terms of the Apache License 2.0. We welcome contributions. Join us on GitHub and check out our contribution guidelines (Chinese version).

Citation

If this repository is helpful to you, please cite our work:

@misc{bigdata2021edunlp,
  title = {EduNLP},
  author = {bigdata-ustc},
  publisher = {GitHub},
  journal = {GitHub repository},
  year = {2021},
  howpublished = {\url{https://github.com/bigdata-ustc/EduNLP}},
}