Managing Ingests and Corpora

Creating Ingests

Creating a new Ingest re-ingests all data from ANNIS, including Corpus Meta Items, Texts, Text Meta Items, HTML Visualizations, Search Fields, and Search Field Values. Because the ingest process must run a headless browser and execute JavaScript to ingest HTML Visualizations, it is time-consuming. An administrator should allow at least 30 seconds per text for an ingest.
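
As a rough planning aid, the 30-seconds-per-text figure above can be turned into a minimum time estimate. The sketch below is only illustrative: the function name estimate_ingest_minutes is hypothetical and not part of the application, and an actual ingest may take longer than this lower bound.

```shell
# Hypothetical helper, not part of the application.
# Prints a lower bound, in whole minutes, for ingesting the given
# number of texts at 30 seconds per text.
estimate_ingest_minutes() {
    texts=$1
    seconds=$(( texts * 30 ))
    # Round up to the next whole minute.
    echo $(( (seconds + 59) / 60 ))
}

estimate_ingest_minutes 250   # 250 texts -> prints 125
```

For example, a collection of 250 texts should be given at least 125 minutes, or a little over two hours.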

To create a new Ingest, navigate to the administrative homepage and select “+ Add” to the right of the “Ingests” content type title. On the next screen, select “Save” in the bottom right corner. This starts the ingest process, which runs in the background on the server.

The Ingest logs which texts it is ingesting and where it is in the ingest process. To view this information, SSH to the server where the application is hosted and open the following file in a text editor such as nano, emacs, or vi: /home/ubuntu/celery/coptic_django/logs/celery-worker.log. The log also records any errors that occurred during an ingest.

To watch the progress of an ingest as it occurs, connect to the server via SSH and issue the following command:

  tail -f /home/ubuntu/celery/coptic_django/logs/celery-worker.log

This displays new lines of the log file as the ingest progresses. Press Ctrl+C to stop watching.
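
To pick out problems from a long log, the file can also be filtered with grep. This is a sketch under assumptions: the function name show_ingest_errors is hypothetical, and the search pattern assumes that failures are logged with the word "error" or a Python "Traceback", which this page does not confirm. Adjust the pattern to match what the log actually contains.

```shell
# Hypothetical helper, not part of the application.
# Prints log lines that look like errors, each prefixed with its line number.
# The pattern is an assumption about the log format, not a documented one.
show_ingest_errors() {
    log_file="${1:-/home/ubuntu/celery/coptic_django/logs/celery-worker.log}"
    grep -n -i -E 'error|traceback' "$log_file"
}

# Usage (on the server): show_ingest_errors
# Usage (other log file): show_ingest_errors /path/to/other.log
```

The line numbers printed by grep -n make it easier to jump to the surrounding context when the file is opened in a text editor.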

Adding and Removing Corpora

Adding a Corpus

To add a corpus to be included in an ingest, first navigate to the administrative homepage. Then, to the right of the “Corpora” title for the Corpora content type, click “+ Add”.

Fill out the fields for the new corpus that will be added to the URN Resolver. See the documentation on the Corpora content type in Section 1 for more information. An administrator does not need to associate Corpus Meta Items manually; those will be created during the next ingest.

Texts for the corpus will appear after the next ingest runs.

Removing a Corpus

To remove a corpus, first navigate to the administrative homepage and click on the “Corpora” title for the Corpora content type. Then click on the title of the corpus to be deleted. On the corpus edit screen (titled “Change corpus”), click “Delete” at the bottom left of the screen. On the following page, titled “Are you sure?”, click “Yes, I’m sure” at the bottom right of the screen.

All of the Texts, Text Meta Items, and Corpus Meta Items belonging to the deleted corpus will be removed on the next ingest.

Expiring Ingests

Expiring an ingest marks all Texts as expired. Whenever a user visits a Text marked as expired, the HTML Visualizations for that Text are re-ingested from ANNIS and the Text is no longer marked as expired. To expire an ingest, first navigate to the administrative homepage. Then, to the right of the “Expire Ingest” title on the Expire Ingest content type, select “+ Add”. On the following page, click “Save” in the bottom right corner of the screen.

managing_ingests_and_corpora.txt · Last modified: 2015/09/09 08:29 by ctschroeder