Commit 6ed8c8d0 authored by Maximilian Legnar

added some instructions to README.md

parent 895d0dec
......@@ -3,7 +3,7 @@
This python project was created as part of the article "Natural Language Processing in diagnostic texts from
nephropathology".
The paper can be found (soon) [here](LINK).
The paper can be found [here](LINK).
The scripts ```database_preparation/data_preparation_pipeline.py```, ```TextClustering/clustering_pipeline.py```
and ```TextClassification/classification_pipeline.py``` give an idea of how this project can be used with other datasets.
......@@ -18,7 +18,11 @@ Feel free to use and adapt the scripts to your own needs.
## Requirements
```database_preparation/preprocess.py``` requires some nltk corpora:
Create a new environment, then install the required Python packages with
```pip install -r requirements.txt```.
The script ```database_preparation/preprocess.py``` requires some nltk corpora:
```
import nltk
nltk.download('stopwords')
......
# -*- coding: iso-8859-1 -*-
import os
# params:
##### Script parameters: please adapt to your own data! #####
path_to_reports = '../DataNephroTexts/reports'
author_names = "Name1 Name2 Name3 Name4" ## <- Type in the names of the pathologists of your institute!
author_names = "Name1 Name2 Name3 Name4" ## names of the pathologists who wrote the reports
splitted_reports_folder_path = '../DataNephroTexts'
path2corpus_bow_preprocessed_diagnosis = 'database/bow_prepro_diag.pkl'
path2corpus_embedding_preprocessed_diagnosis = 'database/embedding_prepro_diag.pkl'
......