by Arnauld Souplet, IFREMER, 1, rue Jean Vilar, F-34200 Sète. March 1998
1. Introduction
The CHECKMED program (A. Souplet, 1995) was written in the expectation that most errors in the MEDITS data files would be detected. Nevertheless, it appeared after the 1996 and 1997 surveys that some errors could remain, for example: files not sorted as described below, several lines (one per sex) for the same species in the same haul in the TB file, and no cross-checking between the TB and TC files.
To avoid this kind of error, a second version has been written, named CHKMED2. It does basically the same things as CHECKMED, with some minor amendments, and now also checks for the errors mentioned above.
2. Warning
To operate the programs correctly, it is essential that the TA, TB and TC files are sorted as indicated below. The files can be sorted using any commercial software (Excel®, Dbase®, etc.).
2.1. "TA" file
Sorted by :
HAUL NUMBER (position 28-30 in the file)
2.2. "TB" file
Sorted by :
HAUL NUMBER (position 11-13)
GENUS-SPECIES (17-23)
2.3. "TC" file
Sorted by :
HAUL NUMBER (position 11-13)
GENUS-SPECIES (16-22)
WEIGHT OF THE FRACTIONS (PFRAC ; 24-29)
SEX (36-36)
LENGTH CLASS (43-46)
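The ordering above can be sketched in Python as follows. This is only an illustration and not part of the package: it assumes that each record is one fixed-width text line, that the positions quoted above are 1-based and inclusive, and that comparing the key fields as text gives the intended order (which holds for zero-padded or right-justified values). The names SORT_FIELDS, sort_key and sort_records are hypothetical.

```python
# Sort-key positions from section 2 (assumed 1-based and inclusive).
SORT_FIELDS = {
    "TA": [(28, 30)],                    # haul number
    "TB": [(11, 13), (17, 23)],          # haul number, genus-species
    "TC": [(11, 13), (16, 22), (24, 29), (36, 36), (43, 46)],
}


def sort_key(line, file_type):
    """Return the tuple of key fields cut out of one fixed-width record."""
    return tuple(line[start - 1:end] for start, end in SORT_FIELDS[file_type])


def sort_records(lines, file_type):
    """Return the records of one file in the order described in section 2."""
    return sorted(lines, key=lambda line: sort_key(line, file_type))
```

For a "TC" file the records are thus ordered by haul number first, then by species, fraction weight, sex and length class.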
3. Structure of the software
Compared with the previous version (dated October 1995), the package has been
extended to handle 12 countries/areas, 200 hauls in each, 800 species in the
FM list and 31 reference species. It is divided into 1 batch file, 6 programs
and 4 reference files. These are:
CHKMED2.BAT | Starts the checking procedure
CHKMED2D.EXE | Inputs various information used by the other programs
CHKMED2A.EXE | Checks TA
CHKMED2B.EXE | Checks TB
CHKMED2C.EXE | Checks TC
CHKMED2F.EXE | Results of the individual file checking
CHKMED2X.EXE | Cross-checking of TB and TC
BATyy.CHK | List of vessels (yy = year)
ESPECEyy.CHK | List of species with faunistic category and length code
LATLON.CHK | Limits of latitudes and longitudes for each country/area
MINMAX.CHK | Minimum and maximum length ever seen for each of the 31 species of the reference list
In addition the package uses the reference file ESPECE.REF used in the INDMED program.
All these files must be in the same directory, together with the 3 sorted data files to be checked: TAccyyaa.TXT, TBccyyaa.TXT and TCccyyaa.TXT, in which cc = two-character country code, yy = year and aa = two-character area code.
As output, the programs create 3 files which have the same names as the data files, except that the file name extension .TXT is replaced by .VER (for the French "vérification", i.e. "checking"). Clear error or warning messages are written in these files. If there are several records for a given species in a given haul, the program creates a file named TBccyyaa.MUL which gives the number of records found for the same species in the same haul.
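As an illustration of this naming convention, the following Python sketch builds the names of the data files and of the corresponding output files. It is not taken from the package: the function names are hypothetical, and it assumes that yy is the last two digits of the year.

```python
def data_file_names(country, year, area):
    """Build the three data file names from country code, year and area code."""
    stem = f"{country.upper()}{year % 100:02d}{area.upper()}"
    return {kind: f"{kind}{stem}.TXT" for kind in ("TA", "TB", "TC")}


def report_name(data_file):
    """Replace the .TXT extension of a data file name by .VER."""
    base, _extension = data_file.rsplit(".", 1)
    return base + ".VER"
```

With country code IT, year 1997 and area code M1, this gives TAIT97M1.TXT, TBIT97M1.TXT and TCIT97M1.TXT, and the report for the "TB" file is TBIT97M1.VER.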
4. The reference files
4.1. Vessels reference file
This file contains, for each country/area, the two-character country code, the two-character area code and the three-character vessel code. It may be changed each year if necessary. Its structure for 1997 is shown below:
ES__COR
FR__LEU
ITM1FRP
ITM2NUS
ITM3SAN
ITM4BIM
ITM5PRI
SL__PRI
HR__PRI
AL__BIM
GRG1PAR
GRG2IRO
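A minimal sketch of how such a file could be read and used is given below in Python. It assumes the layout visible in the listing above (country code in columns 1-2, area code in columns 3-4, vessel code in columns 5-7); the function names are hypothetical and the actual program may work differently.

```python
def parse_vessels(lines):
    """Map each (country, area) pair to the vessel code expected for it."""
    vessels = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # ignore empty lines
        country, area, vessel = line[0:2], line[2:4], line[4:7]
        vessels[(country, area)] = vessel
    return vessels


def vessel_is_expected(vessels, country, area, vessel):
    """True if the vessel reported in a record matches the reference file."""
    return vessels.get((country, area)) == vessel
```

A record reporting vessel FRP for country IT and area M1 passes this check, whereas any other vessel code for that area fails it.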
4.2. Species reference file
It contains the Rubin code, the faunistic category (CATFAU) and the length code (CODLON) for all species referred to in the LISTFM file used for the relevant survey. An example of this file is given below:
ABRAVER C0
ACANEXI Bm
ACANPEL Bm
ACATPAL A0
AEQUOPE D0
ALEPROS A0
ALLOMED C0
4.3. Latitudes and longitudes reference file
It contains the minimum and maximum latitudes and longitudes for each country/area. This file may also be changed if necessary. Its main use is to catch gross typing errors which could, for example, lead us to have trawled under the Eiffel Tower! The file is shown below (a negative longitude indicates a west longitude).
ES__3555.16-516.154223.75 339.29
FR__4130.20 304.944328.12 945.35
ITM14052.88 746.954419.861327.98
ITM23830.56 745.574119.241001.00
ITM33522.591112.974102.211610.67
ITM43644.141511.254206.391846.59
ITM54159.391218.794538.981734.61
SL__4533.731331.024535.391336.74
HR__4216.241310.024531.531751.70
AL__4025.131846.914141.201924.63
GRM13810.162237.994051.942609.35
GRM2 35.27 25.583900.102747.99
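The following Python sketch shows one way of reading these limits and testing a position against them. It rests on assumptions that are not stated in this document: that the four values on each line are, in order, minimum latitude, minimum longitude, maximum latitude and maximum longitude, and that the position to be tested is written in the same notation as the file. The function names are hypothetical.

```python
import re

# Every value in the file is written with exactly two decimals, so the
# four numbers can be picked out even where they touch each other.
NUMBER = re.compile(r"-?\d+\.\d{2}")


def parse_limits(line):
    """Return (country/area key, (lat_min, lon_min, lat_max, lon_max))."""
    key = line[:4]
    values = tuple(float(text) for text in NUMBER.findall(line[4:]))
    if len(values) != 4:
        raise ValueError(f"expected 4 values in {line!r}")
    return key, values


def within_limits(lat, lon, limits):
    """True if the position lies inside the box (assumed order, see above)."""
    lat_min, lon_min, lat_max, lon_max = limits
    return lat_min <= lat <= lat_max and lon_min <= lon <= lon_max
```

Under these assumptions, a latitude typed as 4851.50 instead of 4251.50 falls outside the French limits and would be reported.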
4.4. Reference file of minimum and maximum length by species
This file contains the minimum and maximum length ever seen in MEDITS for the 31 reference species. It can of course be modified if an individual outside these limits is reported, but it is useful for detecting enormous typing errors such as a hake of 3 m instead of 30 cm. The lengths in the file are given in millimeters:
CITHMAC 45 330
EUTRGUR 40 370
HELIDAC 40 310
LEPMBOS 40 385
LOPHBUD 40 870
LOPHPIS 30 1120
MERLMER 10 840
MICMPOU 20 410
MULLBAR 20 370
MULLSUR 60 370
PAGEACA 25 330
PAGEBOG 35 535
PAGEERY 35 440
SPARPAG 30 510
PHYIBLE 30 565
RAJACLA 100 970
SOLEVUL 80 435
SPICFLE 40 240
TRACMED 20 435
TRACTRA 30 460
TRISCAP 20 285
ZEUSFAB 45 620
ARITANT 10 64
ARISFOL 10 40
NEPRNOR 10 41
PAPELON 10 47
ELEDCIR 10 295
ILLECOI 13 250
LOLIVUL 10 440
OCTOVUL 30 250
SEPIOFF 15 185
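The use of these limits can be sketched as follows in Python. The sketch assumes that each line holds a species code, a minimum and a maximum separated by blanks, as in the listing above; the function names are hypothetical, and species absent from the file are simply not checked.

```python
def parse_length_limits(lines):
    """Map each reference species to its (minimum, maximum) length in mm."""
    limits = {}
    for line in lines:
        fields = line.split()
        if len(fields) != 3:
            continue  # skip empty or malformed lines
        species, shortest, longest = fields
        limits[species] = (int(shortest), int(longest))
    return limits


def length_is_plausible(limits, species, length_mm):
    """True if the length is within the limits, or the species has no limits."""
    if species not in limits:
        return True
    shortest, longest = limits[species]
    return shortest <= length_mm <= longest
```

With the limits above, a hake (MERLMER) of 300 mm is accepted, whereas one of 3000 mm is rejected.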
5. Running the program
5.1. Getting started
Typing CHKMED2 calls the batch file shown below:
ECHO OFF
IF EXIST CHKMED2.DAT DEL CHKMED2.DAT
IF NOT EXIST CHKMED.DAT GOTO POINT1
DEL CHKMED.DAT
:POINT1
CHKMED2D
CHKMED2A
CHKMED2B
IF EXIST CHKMED2.DAT GOTO FIN
CHKMED2C
CHKMED2F
:FIN
This batch file calls the first 5 programs in the right order. The first one (CHKMED2D) asks the user for the country code, the area code and the year. The names of the files to be checked are built from these data and stored in the CHKMED.DAT file, together with the number of errors encountered in each file (0 at the beginning, of course).
5.2. Checking of "TA" file (program CHKMED2A)
The program first checks that the hauls are in numerical order, which means
that at least this file has been sorted in the required way (this does not
relieve the user of checking that the other files have been sorted). If not,
the following message is displayed and the program stops.
***************************************************************************
* *
* THE "TA" FILE IS NOT SORTED BY HAUL NUMBER. YOU HAVE TO SORT IT *
* TO DO THAT REFER TO THE "CHECKME2.DOC" DOCUMENT *
* *
* AT THE SAME TIME CHECK WHETHER OR NOT THE "TB" AND "TC" FILES ARE SORTED*
* AS DECRIBED IN THE SAME DOCUMENT *
* *
***************************************************************************
If the "TA" file appears to have been correctly sorted, the program checks that
each haul, from the first to the last, appears once and only once in the file.
Thereafter it checks the file line by line. The following checks are
performed:
TYPENR | TA
PAYS | Correctly typed
BATEAU | The same as in the reference file
ENGIN | GOC73
GREEMENT | GC73
PANNEAUX | WHS8
AN | The correct year as given by the user in the first program
MOIS | April to August
JOUR | 1 to 30 for April and June; 1 to 31 for May, July and August
FERCHA | S or C
QUADEB, QUAFIN | 1 for France, Italy, Slovenia, Croatia, Albania and Greece; 1 or 7 for Spain
LATDEB, LGNDEB, LATFIN, LGNFIN | They must be within the limits given in the reference file
PRODEB, PROFIN | The difference between these two depths is calculated. If it is larger than 20%, a warning message is written in the output file together with the two depths.
HDEB, HFIN | HDEB must be less than HFIN
DUREE | It must be equal to the duration calculated from the reported times
VALID | V or I
PARCOU | R or N
ESPENR | 0 to 4
DIST | It must be equal (within a 10% limit) to the distance calculated from the reported positions
OUVER | 1 to 4 m
ECAIL | 5 to 25 m
PRGEO | M or E
LONBRA | 100 m at depths down to 200 m, 150 m at greater depths
LONFUN | 100 to 2200 m
DIAFUN | 14 to 30 mm
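Two of the "TA" checks described in this section, the haul sequence check and the MOIS/JOUR check, can be sketched in Python as shown below. This is only an illustration: the function names are hypothetical, and the sketch assumes that "from the first to the last" means every haul number between the first and the last one found in the file.

```python
# Number of days in each month allowed for MOIS (April to August).
DAYS_IN_MONTH = {4: 30, 5: 31, 6: 30, 7: 31, 8: 31}


def haul_sequence_errors(hauls):
    """Check haul numbers given in file order; return a list of messages."""
    if any(later < earlier for earlier, later in zip(hauls, hauls[1:])):
        return ["file not sorted by haul number"]
    errors = []
    if not hauls:
        return errors
    for haul in range(hauls[0], hauls[-1] + 1):
        count = hauls.count(haul)
        if count == 0:
            errors.append(f"haul {haul} is missing")
        elif count > 1:
            errors.append(f"haul {haul} appears {count} times")
    return errors


def date_is_valid(month, day):
    """True if MOIS is April to August and JOUR exists in that month."""
    return month in DAYS_IN_MONTH and 1 <= day <= DAYS_IN_MONTH[month]
```

As in the program, an unsorted file is reported at once and no further sequence check is attempted on it.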
5.3. Checking of "TB" file (program CHKMED2B)
The program first reads the valid hauls in the TA file and checks that all
these hauls are present in the TB file. It asks the user whether or not the
reported species should be checked against the FM list. If not, there will
be no error message if a species does not belong to the list. Thereafter it
checks that there is only one line per species per haul. If not, the following
message is displayed and the program stops.
***************************************************************************
* *
* THERE IS SOME MULTIPLE RECORDS FOR THE SAME SPECIES IN THE SAME HAUL *
* YOU CAN CHECK THEM IN THE FILE "Tbccttaa.MUL" *
* *
* WHEN ALL THESE ERRORS WILL BE CORRECTED, YOU MUST RUN THE WHOLE *
* PROGRAM AGAIN TO DETECT OTHER POSSIBLE ERRORS IN THE "TB" FILE *
* *
* FURTHERMORE YOU SHOULD NOTE THAT IN THE CASE OF THESE ERRORS *
* THE "TC" FILE IS NOT YET CHECKED *
* *
***************************************************************************
If these errors do not occur, the file is checked line by line. The following
checks are performed:
TYPENR | TB
PAYS | Correctly typed
BATEAU | The same as in the reference file
AN | The correct year as given by the user in the first program
FERCHA | S or C
PARTIT | S if FERCHA = S; A, M or P if FERCHA = C
SPECIES | Checks whether or not the reported species is in the FM list
CATFAU | Must be the same as in the MEDIESP.DAT file for the given species
PTOT | Must not be 0
NBTOT | Must not be 0 and must be equal to NBFEM+NBMAL+NBIND
In addition, the program checks that there are only allowed characters in the
file. In case of a forbidden character, the program stops and the number of the
line containing a forbidden character is displayed on the screen.
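Three of the "TB" checks described in this section can be sketched in Python as follows: the search for multiple records of the same species in the same haul, the PARTIT rule and the NBTOT rule. The function names are hypothetical, and the sketch assumes that the fields have already been extracted from the records.

```python
from collections import Counter


def multiple_records(records):
    """records: (haul, species) pairs in file order.

    Return the pairs that occur more than once, with their number of records.
    """
    counts = Counter(records)
    return {key: number for key, number in counts.items() if number > 1}


def partit_is_valid(fercha, partit):
    """PARTIT must be S if FERCHA = S, and A, M or P if FERCHA = C."""
    allowed = {"S": {"S"}, "C": {"A", "M", "P"}}
    return partit in allowed.get(fercha, set())


def nbtot_is_valid(nbtot, nbfem, nbmal, nbind):
    """NBTOT must not be 0 and must equal NBFEM + NBMAL + NBIND."""
    return nbtot != 0 and nbtot == nbfem + nbmal + nbind
```

The result of multiple_records corresponds to the kind of information written in the .MUL file: the species and hauls concerned and the number of records found for each.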
5.4. Checking of "TC" file (program CHKMED2C)
The program first reads the valid hauls in the TA file and checks that all
these hauls are present in the TC file. If not, it is not necessarily an error
because it could happen that there is no reference species in a given haul.
Thereafter it checks the file line by line. The following checks are
performed:
TYPENR | TC
PAYS | Correctly typed
BATEAU | The same as in the reference file
AN | The correct year as given by the user in the first program
FERCHA | S or C
PARTIT | S if FERCHA = S; A, M or P if FERCHA = C
SPECIES | Checks whether or not the reported species is in the FM list
CODLON | Must be the same as in the MEDIESP.DAT file for the given species
PECHAN | Must be less than or equal to PFRAC
SEXE | F, M, I or N
MATUR | 0 to 4
CLALON | Must be within the limits defined in the MINMAX.CHK file
NBLON | Should preferably be less than 500
Once again the program checks that there are only allowed characters in the
file. In case of a forbidden character, the program stops and the number of the
line containing a forbidden character is displayed on the screen.
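The simplest of the "TC" checks described in this section (PECHAN, SEXE, MATUR and NBLON) can be sketched in Python as follows. The function name and the wording of the messages are hypothetical, the fields are assumed to have been extracted and converted to numbers beforehand, and the NBLON rule is treated here as an ordinary message although it reads more like a warning.

```python
def tc_record_errors(pfrac, pechan, sexe, matur, nblon):
    """Return the list of messages raised by one "TC" record."""
    errors = []
    if pechan > pfrac:
        errors.append("PECHAN is greater than PFRAC")
    if sexe not in ("F", "M", "I", "N"):
        errors.append("SEXE must be F, M, I or N")
    if matur not in range(0, 5):
        errors.append("MATUR must be 0 to 4")
    if nblon >= 500:
        errors.append("NBLON is 500 or more")
    return errors
```

An empty list means that the record passes these four checks; the other checks of the table (vessel, year, species, CODLON, CLALON) need the reference files and are not shown here.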
5.5. Results of the checking of the individual files (program CHKMED2F)
This program displays whether or not each file is correct and, if not, the
number of errors remaining in each and the name(s) of the file(s) in which
these errors are listed. Thereafter, it displays the following important
message (the first lines are only an example, for the case of three correct
files):
***************************************************************************
"TA" FILE IS CORRECT
"TB" FILE IS CORRECT
"TC" FILE IS CORRECT
EVEN IF ALL YOUR FILES ARE INTRINSICALLY CORRECT, DONT FORGET TO RUN
THE "CHKMED2X" PROGRAM TO MAKE THE CROSS-CHECKING OF THE "TB" AND "TC" FILES
THANK YOU FOR YOU COOPERATION
ARNAULD
***************************************************************************
5.6. Cross checking of "TB" and "TC" files (program CHKMED2X)
This program must be run only after the 3 data files have been checked
and corrected individually. It is called by typing CHKMED2X. The check is
performed only on the reference species, because performing it for all species
in the FM list would need a very large amount of memory, which is not currently
available on PCs. For the other species, users should be aware that some
mistakes can remain in the data files, since nothing is cross-checked for
those species. For each haul and reference species the program checks that :
The output file has the following name: TXccyyaa.VER, with cc = country code, yy = year, aa = area code. The program tells you how many errors have been detected, or that the two files are consistent. In the latter case, it congratulates you on the author's behalf.
6. General comments and Conclusions
Each of these programs can be run on its own, without running the whole package.
This is useful for making sure that no error remains in a file, and it
can be done very easily by typing the program's name, without modifying the
CHKMED.DAT file.
This checking procedure is NOT a correction procedure. The possible errors are
so numerous and so varied (as has been seen!) that it seems practically
impossible to write such a procedure.
It is everybody's duty to correct, if necessary, his own files.
The author thinks (once more, maybe the last !) that the majority of the
possible errors has been taken into account. Nevertheless, should the users
note some omission(s) and/or bug(s) in this package, they are invited to
contact him as soon as possible.
Thank you again for your co-operation.