Commit 6ed8c8d0 authored by Maximilian Legnar

added some instructions to README.md

parent 895d0dec
@@ -3,7 +3,7 @@
 This python project was created as part of the article "Natural Language Processing in diagnostic texts from
 nephropathology".
-The paper can be found (soon) [here](LINK).
+The paper can be found [here](LINK).
 The scripts ```database_preparation/data_preparation_pipeline.py```, ```TextClustering/clustering_pipeline.py```
 and ```TextClassification/classification_pipeline.py``` gives an idea of how this project can be used with other datasets.
@@ -18,7 +18,11 @@ Feel free to use and adapt the scripts to your own needs.
 ## Requirements
-```database_preparation/preprocess.py``` requires some nltk corporas:
+Create a new environment, then install the required python packages with
+```pip install -r requirements.txt```.
+The script ```database_preparation/preprocess.py``` requires some nltk corporas:
 ```
 import nltk
 nltk.download('stopwords')
......
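A note on the Requirements hunk above: the README snippet calls `nltk.download` directly. As a minimal sketch (not part of this commit), the download could be skipped when a resource is already available locally. The helper name `ensure_resources` and its `find`/`download` arguments are assumptions made for illustration. Only `stopwords` is visible in the diff; the remaining resources are elided and are not guessed here.

```python
def ensure_resources(resources, find, download):
    """Fetch each resource only when it cannot be found locally.

    resources: dict mapping a lookup path (e.g. "corpora/stopwords")
               to its download identifier (e.g. "stopwords").
    find:      callable that raises LookupError for a missing resource
               (nltk.data.find behaves this way).
    download:  callable that fetches a resource by identifier
               (nltk.download would be passed here).
    Returns the list of identifiers that were downloaded.
    """
    fetched = []
    for lookup_path, identifier in resources.items():
        try:
            find(lookup_path)
        except LookupError:
            # Resource is missing locally, so fetch it now.
            download(identifier)
            fetched.append(identifier)
    return fetched
```

With NLTK installed, this would presumably be called as `ensure_resources({"corpora/stopwords": "stopwords"}, nltk.data.find, nltk.download)`. Passing the two callables in keeps the helper importable without NLTK.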
 # -*- coding: iso-8859-1 -*-
 import os
-# params:
+##### Script paramseters: Please adapt to own data!!! #####
 path_to_reports = '../DataNephroTexts/reports'
-author_names = "Name1 Name2 Name3 Name4" ## <- Type in the names of the pathologists of your institut!
+author_names = "Name1 Name2 Name3 Name4" ## names of the pathologists who wrote the reports
 splitted_reports_folder_path = '../DataNephroTexts'
 path2corpus_bow_preprocessed_diagnosis = 'database/bow_prepro_diag.pkl'
 path2corpus_embedding_preprocessed_diagnosis = 'database/embedding_prepro_diag.pkl'
......
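Since the new comment asks users to adapt the parameters to their own data, a small pre-flight check might catch values that were left at their defaults. The sketch below is hypothetical and not part of this commit. The function name `check_params` is invented here, and it assumes that `author_names` is a whitespace-separated list, which the diff suggests but does not confirm.

```python
import re
from pathlib import Path


def check_params(path_to_reports, author_names):
    """Return a list of problems found; an empty list means the check passed."""
    problems = []

    # The reports folder has to exist before any report can be read.
    if not Path(path_to_reports).is_dir():
        problems.append(f"path_to_reports is not a directory: {path_to_reports}")

    # Assumption: names are separated by whitespace, as in the default value.
    names = author_names.split()
    if not names:
        problems.append("author_names is empty")

    # "Name1", "Name2", ... are the placeholders shipped with the script.
    leftovers = [name for name in names if re.fullmatch(r"Name\d+", name)]
    if leftovers:
        problems.append(
            "author_names still contains placeholders: " + ", ".join(leftovers)
        )
    return problems
```

Called as `check_params(path_to_reports, author_names)` right after the parameter block, it would report the default `"Name1 Name2 Name3 Name4"` as unadapted.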