Version 40, last updated by alc28 at 07 May 12:32 UTC
processor documentation
SFTP raw vendor file to openurlquality.niso.org
- Each known logfile source has its own directory within /var/www/vhosts/openurlquality.org/httpdocs/oqdata/providers
- If it is a new provider, cd to the main providers directory and run this script:
- . ../scripts/makeproviderdirs.sh newprovidername
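- Load the file into the provider's ./preprocess dir
- Files should be named yyyyqqprovider_openurls.txt, where yyyy is the year, qq is the quarter (for example q1), and provider is the provider name. Ex: 2010q1ebsco_openurls.txt.gz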
- Files may be gzipped/compressed
- If the provider file is too large to transfer by email attachment, give the provider use of our restricted ftp account. Files ftp-ed into this account appear in this directory: '/web_users/logprovider'. From there, the file can be copied into the normal /providers/preprocess directory. After the file is copied, log back into the logprovider account and delete the file.
- Prepare vendor/library file: To be processed, each line in the file must be a separate URL. No additional fields are allowed. In some cases the input files need preprocessing, which should be done in the specific provider's ./preprocess dir. If preprocessing is needed:
- cd into the ./preprocess dir. Read the README file for instructions.
- Place the file to be processed there and run the scripts according to the README.
- Unzip the file first if needed. If the file has the extension .zip, type: unzip filename
- Rename the file according to our naming convention.
- Copy the file into the provider's ./incoming dir so that it is available for processing. After the load is complete and tested, come back here and delete the copy that is still in the preprocess directory.
- cp filename ../incoming/
- Providers requiring preprocessing: cornell, ebsco, git, kstate, oclc
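The naming and one-URL-per-line requirements above can be checked before a file is copied into ./incoming. The following is a minimal sketch, not part of the existing scripts: both function names are hypothetical, provider names are assumed to be lowercase letters and digits, and a line containing whitespace is assumed to indicate extra fields.

```shell
#!/bin/sh
# Pre-flight checks for a provider file before copying it into ./incoming.
# Both function names are hypothetical; they are not part of the existing scripts.

# Succeeds if the name looks like 2010q1ebsco_openurls.txt, optionally with .gz.
# Assumption: provider names contain only lowercase letters and digits.
valid_provider_filename() {
    printf '%s\n' "$1" |
        grep -Eq '^[0-9]{4}q[1-4][a-z0-9]+_openurls\.txt(\.gz)?$'
}

# Prints the number of lines that are empty or contain a space or tab.
# Assumption: such lines carry extra fields (or no URL) and need preprocessing.
count_suspect_lines() {
    grep -Ec '^$|[[:space:]]' "$1"
}
```

For example, `valid_provider_filename 2010q1ebsco_openurls.txt.gz && echo ok` prints ok, and a non-zero result from `count_suspect_lines` on the unzipped file suggests that the file still needs preprocessing. Note that lines with Windows line endings also count as suspect, because a carriage return is whitespace.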
- Parse and Load:
Move to the oqdata directory (cd ../.../.../, which is /var/www/vhosts/openurlquality.org/httpdocs/oqdata) and run pwd to confirm which directory you are in.
Run the command: ./scripts/oq_processor.pl -p provider_name >> ./logs/oq_processor_20100916.log 2>&1 &
The “-p provider_name” option is optional. If it is omitted, the script looks for incoming files across all known providers. The metrics file generated by the parser is loaded into the table. Once the load is done, the script archives both the incoming file and the metrics file. oq_processor.pl includes an array of known providers; if the provider is new, this array needs to be edited. New provider names also need to be added to share.php in the UI.
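The parse-and-load step can be wrapped so the log name is built automatically. This is a rough sketch under stated assumptions: the helper names are hypothetical, and it assumes the 20100916 in the example log name is the run date (YYYYMMDD). Only oq_processor.pl and its -p option come from the text above.

```shell
#!/bin/sh
# Hypothetical helpers; only oq_processor.pl and -p come from this document.

# Print the log path for a date stamp (YYYYMMDD), defaulting to today.
# Assumption: the "20100916" in the example log name is the run date.
oq_log_path() {
    printf './logs/oq_processor_%s.log\n' "${1:-$(date +%Y%m%d)}"
}

# Start the processor in the background; run this from the oqdata directory.
# With no argument, all known providers are scanned for incoming files.
run_oq_processor() {
    log=$(oq_log_path)
    if [ -n "${1:-}" ]; then
        ./scripts/oq_processor.pl -p "$1" >> "$log" 2>&1 &
    else
        ./scripts/oq_processor.pl >> "$log" 2>&1 &
    fi
    echo "started oq_processor.pl (pid $!), logging to $log"
}
```

Usage would be `run_oq_processor ebsco` for one provider or `run_oq_processor` for all of them. Output is appended to the log, so several runs on the same day share one file.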
Reprocess
If you need to reprocess a provider, you can copy all archived incoming files from the archive into the “live” incoming dir, then run the script with the “-r” option. This will cause all records to be dropped prior to the loads.
Reprocess one provider
Ex: ./scripts/oq_processor.pl -r -p provider_name >> ./logs/oq_processor_20100916.log 2>&1 &
This will drop the records from the table for the specific provider, then process and load all files found in that provider's ./incoming dir.
Reprocess all providers
1) Copy all archived incoming files into the ./incoming dirs. From the providers dir, run the following command.
Ex: for d in */; do cp "${d}"archive/incoming/* "${d}incoming/"; done
2) Process all provider files.
Ex: ./scripts/oq_processor.pl -r >> ./logs/oq_processor_20100916.log 2>&1 &
This will drop all records from the table (all providers), then process and load all files found in all ./incoming dirs.
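The copy step that precedes a reprocess can be made more defensive. This is a sketch only: restore_archived is a hypothetical helper name, and the layout it assumes (provider/archive/incoming and provider/incoming under the providers directory) is taken from the steps above. It skips directories that lack that layout and can be limited to a single provider.

```shell
#!/bin/sh
# restore_archived is a hypothetical helper, not one of the existing scripts.

# Copy archived incoming files back into the live incoming dir.
# $1 = providers directory; $2 = optional single provider name.
restore_archived() {
    providers_dir=$1
    only=${2:-}
    for dir in "$providers_dir"/*/; do
        name=$(basename "$dir")
        # When a provider is named, skip all the others.
        if [ -n "$only" ] && [ "$name" != "$only" ]; then
            continue
        fi
        # Skip directories without the expected layout.
        [ -d "${dir}archive/incoming" ] && [ -d "${dir}incoming" ] || continue
        for f in "${dir}archive/incoming"/*; do
            # An empty archive leaves the glob unexpanded; ignore it.
            [ -f "$f" ] || continue
            cp "$f" "${dir}incoming/" && echo "restored $name/$(basename "$f")"
        done
    done
}
```

From the oqdata directory, `restore_archived ./providers ebsco` would prepare a single-provider reprocess and `restore_archived ./providers` would prepare all providers. The oq_processor.pl run with -r still has to be started separately.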
Note: After the load, log in with the mysql command line client: mysql -u oq2user -p -D alc28_oq2
To review all the logsources: select logsource, year, quarter, sum(count) from oq where metric = 'total' group by logsource, year, quarter;
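The review query can also be run without an interactive session by passing it to the mysql client with -e. The sketch below builds the query text; oq_totals_query is a hypothetical helper, and the optional filter assumes that logsource values are the same as the provider names.

```shell
#!/bin/sh
# oq_totals_query is a hypothetical helper; the table and column names
# come from the review query in this document.

# Print the totals query, optionally limited to one logsource.
# Assumption: logsource values match the provider names.
oq_totals_query() {
    filter=""
    if [ -n "${1:-}" ]; then
        # Accept only simple names so the value is safe to embed in SQL.
        case $1 in
            *[!a-z0-9_]*) echo "invalid logsource: $1" >&2; return 1 ;;
        esac
        filter=" and logsource = '$1'"
    fi
    printf "select logsource, year, quarter, sum(count) from oq where metric = 'total'%s group by logsource, year, quarter;\n" "$filter"
}

# Usage (prompts for the password, as in the login command above):
#   mysql -u oq2user -p -D alc28_oq2 -e "$(oq_totals_query ebsco)"
```

Comparing the totals per year and quarter for one logsource against the line counts of the files just loaded is a quick sanity check after a load.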