added some instructions to README.md

Commit 6ed8c8d0 authored Jul 18, 2022 by Maximilian Legnar

Showing with 8 additions and 4 deletions

README.md +6 -2

database_preparation/data_preparation_pipeline.py +2 -2

--- a/README.md
+++ b/README.md
@@ -3,7 +3,7 @@
 This python project was created as part of the article "Natural Language Processing in diagnostic texts from
 nephropathology".

-The paper can be found (soon) [here](LINK).
+The paper can be found [here](LINK).

 The scripts ```database_preparation/data_preparation_pipeline.py```, ```TextClustering/clustering_pipeline.py```
 and ```TextClassification/classification_pipeline.py``` gives an idea of how this project can be used with other datasets.
@@ -18,7 +18,11 @@ Feel free to use and adapt the scripts to your own needs.

 ## Requirements

-```database_preparation/preprocess.py``` requires some nltk corporas:
+Create a new environment, then install the required python packages with
+
+```pip install -r requirements.txt```.  
+
+The script ```database_preparation/preprocess.py``` requires some nltk corporas:
 ```
 import nltk
 nltk.download('stopwords')

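The new README step says to create an environment but leaves the tool open. As one option, assuming Python's standard-library `venv` module rather than conda, creating the environment and running `pip install -r requirements.txt` inside it could be scripted roughly as follows (`create_env` is a hypothetical helper, not part of the repository):

```python
import subprocess
import sys
import venv
from pathlib import Path
from typing import Optional


def create_env(env_dir: Path, requirements: Optional[Path] = None) -> Path:
    """Create a virtual environment and optionally install a requirements file into it.

    Returns the path to the environment's Python interpreter.
    """
    # pip is only bootstrapped when there is something to install with it.
    venv.create(env_dir, with_pip=requirements is not None)

    # venv places the interpreter in Scripts\ on Windows and in bin/ elsewhere.
    if sys.platform == "win32":
        python = env_dir / "Scripts" / "python.exe"
    else:
        python = env_dir / "bin" / "python"

    if requirements is not None:
        # Same effect as `pip install -r requirements.txt` in the activated environment.
        subprocess.run(
            [str(python), "-m", "pip", "install", "-r", str(requirements)],
            check=True,
        )
    return python
```

For example, `create_env(Path(".venv"), Path("requirements.txt"))` from the repository root. The NLTK corpora listed in the README would still have to be downloaded afterwards using the returned interpreter.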
--- a/database_preparation/data_preparation_pipeline.py
+++ b/database_preparation/data_preparation_pipeline.py
 # -*- coding: iso-8859-1 -*-
 import os

-# params:
+##### Script paramseters: Please adapt to own data!!! #####
 path_to_reports = '../DataNephroTexts/reports'
-author_names = "Name1 Name2 Name3 Name4"   ## <- Type in the names of the pathologists of your institut!
+author_names = "Name1 Name2 Name3 Name4"   ## names of the pathologists who wrote the reports
 splitted_reports_folder_path = '../DataNephroTexts'
 path2corpus_bow_preprocessed_diagnosis = 'database/bow_prepro_diag.pkl'
 path2corpus_embedding_preprocessed_diagnosis = 'database/embedding_prepro_diag.pkl'
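The new header asks users to adapt these parameters to their own data. A minimal, hypothetical sanity check (`check_params` is not part of the repository) that could run before the pipeline starts is sketched below; it assumes `author_names` is a whitespace-separated list of names, as the placeholder value suggests, and only flags the placeholder names and a missing report folder:

```python
import os
import re
from typing import List


def check_params(path_to_reports: str, author_names: str) -> List[str]:
    """Return a list of problems with the script parameters; empty if none were found."""
    problems = []
    if not os.path.isdir(path_to_reports):
        problems.append(f"report folder not found: {path_to_reports}")

    # Assumption: names are separated by whitespace, like "Name1 Name2 Name3 Name4".
    names = author_names.split()
    if not names:
        problems.append("author_names is empty")
    elif all(re.fullmatch(r"Name\d+", name) for name in names):
        # Every entry still matches the NameN placeholder pattern.
        problems.append("author_names still contains the placeholder names")
    return problems


if __name__ == "__main__":
    # Values copied from the top of data_preparation_pipeline.py.
    for problem in check_params("../DataNephroTexts/reports", "Name1 Name2 Name3 Name4"):
        print("WARNING:", problem)
```

Returning a list instead of raising on the first problem lets all misconfigured parameters be reported in one run.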