
 

CHECKMED2:
A SECOND VERSION OF THE SOFTWARE TO CHECK THE MEDITS DATA FILES

by Arnauld Souplet IFREMER 1, rue Jean Vilar F-34200 Sète March 1998

 

1. Introduction

When the CHECKMED program was written (A. Souplet, 1995), it was expected to detect most errors in the MEDITS data files. Nevertheless, the 1996 and 1997 surveys showed that some errors could remain, for example: files not sorted as described below, several lines (one per sex) for the same species in the same haul in the TB file, and inconsistencies between the TB and TC files, which were not cross-checked.

To avoid this kind of error, a second version, named CHKMED2, has been written. It performs essentially the same checks as CHECKMED, with some minor amendments, and now also checks for the errors mentioned above.

2. Warning

For the programs to operate correctly, the TA, TB and TC files must be sorted as indicated below. The files can be sorted with any commercial software (Excel®, dBase®, etc.).

2.1. "TA" file

Sorted by :

HAUL NUMBER (position 28-30 in the file)

2.2. "TB" file

Sorted by :

HAUL NUMBER (position 11-13)

GENUS-SPECIES (17-23)

2.3. "TC" file

Sorted by :

HAUL NUMBER (position 11-13)

GENUS-SPECIES (16-22)

WEIGHT OF THE FRACTIONS (PFRAC ; 24-29)

SEX (36-36)

LENGTH CLASS (43-46)
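For users without a suitable commercial package, the sort orders above can also be reproduced with a short script. The following is a minimal sketch, not part of CHKMED2: it assumes that the column positions are 1-based and inclusive, that numeric keys (haul number, PFRAC, length class) are compared as numbers and text keys (species, sex) as strings, and that every key is sorted in ascending order. The function names are hypothetical.

```python
# Sketch: sort a MEDITS TA, TB or TC file on the key columns listed above.
# Assumptions (not taken from CHKMED2): columns are 1-based and inclusive,
# numeric keys are compared as numbers, text keys as strings, all ascending.

def field(line, start, end):
    """Return columns start..end (1-based, inclusive) of a fixed-width record."""
    return line[start - 1:end]


def num(text):
    """Convert a right-justified numeric field to a float (blank -> 0)."""
    text = text.strip()
    return float(text) if text else 0.0


# (first column, last column, converter) for each key, in priority order
TA_KEYS = [(28, 30, num)]                                   # haul number
TB_KEYS = [(11, 13, num), (17, 23, str)]                    # haul, species
TC_KEYS = [(11, 13, num), (16, 22, str), (24, 29, num),     # haul, species, PFRAC
           (36, 36, str), (43, 46, num)]                    # sex, length class


def sort_records(lines, keys):
    """Return the records ordered by the given key columns."""
    return sorted(lines, key=lambda l: tuple(conv(field(l, s, e)) for s, e, conv in keys))


def sort_file(in_path, out_path, keys):
    """Read a data file, sort its non-blank records and write them to out_path."""
    with open(in_path) as src:
        lines = [l.rstrip("\n") for l in src if l.strip()]
    with open(out_path, "w") as dst:
        dst.writelines(line + "\n" for line in sort_records(lines, keys))
```

For example, sort_file("TCIT97M1.TXT", "TCIT97M1.SRT", TC_KEYS) would write a sorted copy of a (hypothetical) Italian area M1 file for 1997 and leave the original untouched; the sorted copy would then have to be given the .TXT name before the checks are run.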

3. Structure of the software

Compared with the previous version (October 1995), the package has been extended to handle 12 countries/areas, 200 hauls in each, 800 species in the FM list and 31 reference species. It consists of 1 batch file, 6 programs and 4 reference files:

CHKMED2.BAT : starts the checking procedure

CHKMED2D.EXE : input of various information used by the other programs

CHKMED2A.EXE : checks the TA file

CHKMED2B.EXE : checks the TB file

CHKMED2C.EXE : checks the TC file

CHKMED2F.EXE : displays the results of the individual file checks

CHKMED2X.EXE : cross-checks the TB and TC files

BATyy.CHK : list of vessels (yy = year)

ESPECEyy.CHK : list of species with their faunistic category and length code

LATLON.CHK : latitude and longitude limits for each country/area

MINMAX.CHK : minimum and maximum lengths ever recorded for each of the 31 species in the reference list

 

In addition, the package uses the reference file ESPECE.REF, which is also used by the INDMED program.

All these files must be in the same directory as the 3 sorted data files to be checked: TAccyyaa.TXT, TBccyyaa.TXT and TCccyyaa.TXT, where cc = 2-character country code, yy = year and aa = 2-character area code.

As output, the programs create 3 files with the same names as the data files, the extension .TXT being replaced by .VER (from the French "vérification", i.e. "checking"). Explicit error and warning messages are written to these files. When a species has several records in the same haul, the program also creates a file named TBccyyaa.MUL giving the number of records for each such species and haul.
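The naming convention can be summarised in a few lines. This is a minimal sketch under the conventions stated in this document (2-character country, year and area codes; .TXT for data, .VER for reports, .MUL for TB multiple records, and TXccyyaa.VER for the cross-check report of section 5.6); the function data_file_names is hypothetical and is not how CHKMED2D is actually written.

```python
# Sketch of the file-naming convention. The function and its arguments are
# hypothetical; CHKMED2D builds these names internally from the user's input.

def data_file_names(country, year, area):
    """Return input and output file names for one country/area and year.

    country and area are 2-character codes and year is the 2-digit year,
    e.g. ("IT", "97", "M1").
    """
    if len(country) != 2 or len(year) != 2 or len(area) != 2:
        raise ValueError("country, year and area codes must have 2 characters each")
    stem = country + year + area
    names = {}
    for kind in ("TA", "TB", "TC"):
        names[kind] = kind + stem + ".TXT"            # data file to be checked
        names[kind + "_VER"] = kind + stem + ".VER"   # error/warning report
    names["TB_MUL"] = "TB" + stem + ".MUL"            # multiple records in TB
    names["TX_VER"] = "TX" + stem + ".VER"            # TB/TC cross-check report
    return names
```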

4. The reference files

4.1. Vessels reference file

For each country/area, this file contains the 2-character country code, the 2-character area code and the 3-character vessel code. It may be changed each year if necessary. The 1997 version is shown below:

ES__COR

FR__LEU

ITM1FRP

ITM2NUS

ITM3SAN

ITM4BIM

ITM5PRI

SL__PRI

HR__PRI

AL__BIM

GRG1PAR

GRG2IRO
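Checking the BATEAU field then amounts to a table lookup. The sketch below assumes the layout shown in the listing (characters 1-2 country, 3-4 area, 5-7 vessel) and treats the underscores literally; in the real file they may stand for blanks. The function names are hypothetical.

```python
# Sketch: read the vessels reference file (BATyy.CHK) and check BATEAU.
# Assumed layout per line: 2-character country code, 2-character area code,
# 3-character vessel code, as in the 1997 listing.

def read_vessels(lines):
    """Map (country, area) -> vessel code."""
    vessels = {}
    for line in lines:
        line = line.rstrip("\n")
        if len(line) < 7:
            continue                      # skip blank or short lines
        vessels[(line[0:2], line[2:4])] = line[4:7]
    return vessels


def vessel_ok(vessels, country, area, vessel):
    """True if the vessel code matches the reference file for this country/area."""
    return vessels.get((country, area)) == vessel
```

In practice the lines would come from something like open("BAT97.CHK").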

 

4.2. Species reference file

It contains the Rubin code, the faunistic category (CATFAU) and the length code (CODLON) of every species referred to in the LISTFM file used for the relevant survey. An extract of this file is given below:

ABRAVER C0

ACANEXI Bm

ACANPEL Bm

ACATPAL A0

AEQUOPE D0

ALEPROS A0

ALLOMED C0
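The CATFAU and CODLON checks of sections 5.3 and 5.4 compare each record with this file. The sketch below assumes that each line holds the 7-character species code, a blank, then one character for CATFAU followed by one character for CODLON (so "ACANEXI Bm" is read as CATFAU = B, CODLON = m); this split is an interpretation of the extract, not documented here. The function names are hypothetical.

```python
# Sketch: read the species reference file (ESPECEyy.CHK) and compare a record.
# Assumption: "CODE XY" where X is CATFAU and Y is CODLON (e.g. "ACANEXI Bm").

def read_species(lines):
    """Map species code -> (CATFAU, CODLON)."""
    species = {}
    for line in lines:
        parts = line.split()
        if len(parts) != 2 or len(parts[1]) != 2:
            continue                      # skip lines not matching the assumed layout
        code, cat_len = parts
        species[code] = (cat_len[0], cat_len[1])
    return species


def species_errors(species, code, catfau=None, codlon=None):
    """Return error messages for one TB/TC record (empty list if none)."""
    if code not in species:
        return [code + ": not in the species reference file"]
    ref_cat, ref_len = species[code]
    errors = []
    if catfau is not None and catfau != ref_cat:
        errors.append(code + ": CATFAU " + catfau + " should be " + ref_cat)
    if codlon is not None and codlon != ref_len:
        errors.append(code + ": CODLON " + codlon + " should be " + ref_len)
    return errors
```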

4.3. Latitudes and longitudes reference file

It contains the minimum and maximum latitudes and longitudes for each country/area. This file may also be changed if necessary. Its main purpose is to catch gross typing errors which could, for example, place a haul under the Eiffel Tower! The file is shown below (a negative longitude indicates a west longitude).

ES__3555.16-516.154223.75 339.29

FR__4130.20 304.944328.12 945.35

ITM14052.88 746.954419.861327.98

ITM23830.56 745.574119.241001.00

ITM33522.591112.974102.211610.67

ITM43644.141511.254206.391846.59

ITM54159.391218.794538.981734.61

SL__4533.731331.024535.391336.74

HR__4216.241310.024531.531751.70

AL__4025.131846.914141.201924.63

GRM13810.162237.994051.942609.35

GRM2 35.27 25.583900.102747.99
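The listing appears to be fixed-width: a 4-character country/area code followed by four 7-character fields (minimum latitude, minimum longitude, maximum latitude, maximum longitude) written as degrees and minutes, so that 3555.16 reads as 35°55.16'. That layout is inferred from the values, not documented, and a line that does not follow it will be misread. Under that assumption, the LATDEB/LGNDEB/LATFIN/LGNFIN check of section 5.2 could be sketched as follows (function names hypothetical):

```python
# Sketch: read LATLON.CHK and test a position against its limits.
# Assumed layout: 4-character code + four 7-character DDMM.mm fields,
# in the order LATMIN, LONMIN, LATMAX, LONMAX ("-516.15" = 5 deg 16.15 min W).

def dm_to_deg(text):
    """Convert a DDMM.mm field to signed decimal degrees."""
    value = float(text)
    sign = -1.0 if value < 0 else 1.0
    value = abs(value)
    degrees = int(value // 100)
    minutes = value - 100 * degrees
    return sign * (degrees + minutes / 60.0)


def read_limits(line):
    """Return (code, (lat_min, lon_min, lat_max, lon_max)) for one line."""
    code = line[0:4]
    fields = [line[4 + 7 * i: 11 + 7 * i] for i in range(4)]
    return code, tuple(dm_to_deg(f) for f in fields)


def position_ok(limits, lat, lon):
    """True if a position in decimal degrees lies inside the rectangle."""
    lat_min, lon_min, lat_max, lon_max = limits
    return lat_min <= lat <= lat_max and lon_min <= lon <= lon_max
```

With the French line, a haul reported at the Eiffel Tower (about 48.86° N, 2.29° E) falls outside the rectangle and would be flagged.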

4.4. Reference file of minimum and maximum length by species

This file contains the minimum and maximum lengths ever recorded in MEDITS for the 31 reference species. It will of course have to be modified if an individual outside these limits is reported, but it is useful for detecting gross typing errors such as a hake of 3 m instead of 30 cm. Lengths in the file are given in millimetres:

CITHMAC 45 330

EUTRGUR 40 370

HELIDAC 40 310

LEPMBOS 40 385

LOPHBUD 40 870

LOPHPIS 30 1120

MERLMER 10 840

MICMPOU 20 410

MULLBAR 20 370

MULLSUR 60 370

PAGEACA 25 330

PAGEBOG 35 535

PAGEERY 35 440

SPARPAG 30 510

PHYIBLE 30 565

RAJACLA 100 970

SOLEVUL 80 435

SPICFLE 40 240

TRACMED 20 435

TRACTRA 30 460

TRISCAP 20 285

ZEUSFAB 45 620

ARITANT 10 64

ARISFOL 10 40

NEPRNOR 10 41

PAPELON 10 47

ELEDCIR 10 295

ILLECOI 13 250

LOLIVUL 10 440

OCTOVUL 30 250

SEPIOFF 15 185
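The CLALON check of section 5.4 compares each length with these limits. A minimal sketch, assuming one "CODE MIN MAX" line per species and a length that has already been converted to millimetres (the conversion from the length class depends on CODLON and is not shown); the function names are hypothetical:

```python
# Sketch: read MINMAX.CHK and flag lengths outside the recorded range.
# Assumptions: "CODE MIN MAX" per line, lengths in millimetres, and the
# length to check already converted to millimetres.

def read_minmax(lines):
    """Map species code -> (min_mm, max_mm)."""
    limits = {}
    for line in lines:
        parts = line.split()
        if len(parts) == 3:
            limits[parts[0]] = (int(parts[1]), int(parts[2]))
    return limits


def length_warning(limits, code, length_mm):
    """Return a message if length_mm is outside the limits, else None."""
    if code not in limits:
        return None                       # not a reference species: no check
    low, high = limits[code]
    if not low <= length_mm <= high:
        return "%s: length %d mm outside %d-%d mm" % (code, length_mm, low, high)
    return None
```

The 3 m hake mentioned above (3000 mm) would be reported, while a 30 cm hake passes.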

5. Running the program

5.1. Getting started

Typing CHKMED2 runs the batch file shown below:

ECHO OFF

IF EXIST CHKMED2.DAT DEL CHKMED2.DAT

IF NOT EXIST CHKMED.DAT GOTO POINT1

DEL CHKMED.DAT

:POINT1

CHKMED2D

CHKMED2A

CHKMED2B

IF EXIST CHKMED2.DAT GOTO FIN

CHKMED2C

CHKMED2F

:FIN

This batch file calls the first 5 programs in the correct order. The first one (CHKMED2D) asks the user for the country code, the area code and the year. The names of the files to be checked are built from these data and stored in the CHKMED.DAT file, together with the number of errors found in each file (initially 0). If a CHKMED2.DAT file exists after CHKMED2B has run, the batch file stops without running CHKMED2C and CHKMED2F.

5.2. Checking of "TA" file (program CHKMED2A)

The program first checks that the hauls are in numerical order, which shows that at least this file has been sorted in the required way (this does not relieve the user of checking that the other files have been sorted too). If not, the following message is displayed and the program stops:

***************************************************************************

* *

* THE "TA" FILE IS NOT SORTED BY HAUL NUMBER. YOU HAVE TO SORT IT *

* TO DO THAT REFER TO THE "CHECKME2.DOC" DOCUMENT *

* *

* AT THE SAME TIME CHECK WHETHER OR NOT THE "TB" AND "TC" FILES ARE SORTED*

* AS DECRIBED IN THE SAME DOCUMENT *

* *

***************************************************************************
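The sort-order test just described, together with the test that follows it (every haul from the first to the last present exactly once), can be sketched as below. This is an illustration only; it assumes the HAUL NUMBER values (columns 28-30) have already been read into a list in file order, and the function names are hypothetical.

```python
# Sketch of the preliminary TA checks: hauls in numerical order, and every
# haul from the first to the last present exactly once.
# haul_numbers: HAUL NUMBER values (columns 28-30) in file order.

def hauls_sorted(haul_numbers):
    """True if the haul numbers never decrease from one line to the next."""
    return all(a <= b for a, b in zip(haul_numbers, haul_numbers[1:]))


def haul_problems(haul_numbers):
    """Return (missing, duplicated) hauls between the first and the last."""
    if not haul_numbers:
        return [], []
    seen = {}
    for haul in haul_numbers:
        seen[haul] = seen.get(haul, 0) + 1
    first, last = min(haul_numbers), max(haul_numbers)
    missing = [h for h in range(first, last + 1) if h not in seen]
    duplicated = sorted(h for h, n in seen.items() if n > 1)
    return missing, duplicated
```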

If the "TA" file appears to have been correctly sorted, the program checks that each haul, from the first to the last, appears once and only once in the file. It then checks the file line by line. The following checks are performed:

TYPENR : TA

PAYS : correctly typed

BATEAU : the same as in the reference file

ENGIN : GOC73

GREEMENT : GC73

PANNEAUX : WHS8

AN : the correct year, as given by the user in the first program

MOIS : April to August

JOUR : 1 to 30 for April and June; 1 to 31 for May, July and August

FERCHA : S or C

QUADEB, QUAFIN : 1 for France, Italy, Slovenia, Croatia, Albania and Greece; 1 or 7 for Spain

LATDEB, LGNDEB, LATFIN, LGNFIN : must be within the limits given in the reference file

PRODEB, PROFIN : the difference between the two depths is calculated; if it exceeds 20%, a warning message is written to the output file together with the two depths

HDEB, HFIN : HDEB must be earlier than HFIN

DUREE : must equal the duration calculated from the reported times

VALID : V or I

PARCOU : R or N

ESPENR : 0 to 4

DIST : must equal (within 10%) the distance calculated from the reported positions

OUVER : 1 to 4 m

ECAIL : 5 to 25 m

PRGEO : M or E

LONBRA : 100 m for depths down to 200 m; 150 m for greater depths

LONFUN : 100 to 2200 m

DIAFUN : 14 to 30 mm
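Some of these rules involve a small calculation. The sketch below illustrates the date, depth, duration and distance checks under assumptions that are not stated in this document and may differ from CHKMED2A: times are HHMM integers, DUREE is in minutes, the 20% depth tolerance is measured against the shallower depth, and DIST is in metres, compared with a flat-earth approximation (1 minute of latitude = 1852 m) that is adequate for hauls of a few kilometres. All function names are hypothetical.

```python
# Sketch of some TA line checks. Assumptions (not from CHKMED2A): HHMM times,
# DUREE in minutes, 20 % depth tolerance relative to the shallower depth,
# DIST in metres with a flat-earth approximation (1 minute of latitude = 1852 m).
import math

DAYS_IN_MONTH = {4: 30, 5: 31, 6: 30, 7: 31, 8: 31}   # April to August


def date_ok(month, day):
    """MOIS must be April to August and JOUR valid for that month."""
    return month in DAYS_IN_MONTH and 1 <= day <= DAYS_IN_MONTH[month]


def depth_warning(prodeb, profin):
    """True if the start and end depths differ by more than 20 %."""
    return abs(prodeb - profin) > 0.20 * min(prodeb, profin)


def minutes(hhmm):
    """Convert an HHMM time to minutes since midnight."""
    return (hhmm // 100) * 60 + hhmm % 100


def duration_ok(hdeb, hfin, duree):
    """HDEB must be before HFIN and DUREE must match the reported times."""
    return hdeb < hfin and minutes(hfin) - minutes(hdeb) == duree


def distance_m(lat1, lon1, lat2, lon2):
    """Approximate distance in metres between two positions in decimal degrees."""
    mean_lat = math.radians((lat1 + lat2) / 2)
    dy = (lat2 - lat1) * 60 * 1852
    dx = (lon2 - lon1) * 60 * 1852 * math.cos(mean_lat)
    return math.hypot(dx, dy)


def distance_ok(dist, lat1, lon1, lat2, lon2):
    """DIST must agree within 10 % with the distance from the positions."""
    computed = distance_m(lat1, lon1, lat2, lon2)
    return abs(dist - computed) <= 0.10 * computed
```

Hauls crossing midnight are not handled, consistent with the rule that HDEB must be earlier than HFIN.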

 

5.3. Checking of "TB" file (program CHKMED2B)

The program first reads the valid hauls in the TA file and checks that all of them are present in the TB file. It then asks the user whether the reported species should be checked against the FM list; if the answer is no, no error message is issued for species that do not belong to the list. Next, it checks that there is only one line per species and per haul. If not, the following message is displayed and the program stops:

***************************************************************************

* *

* THERE IS SOME MULTIPLE RECORDS FOR THE SAME SPECIES IN THE SAME HAUL *

* YOU CAN CHECK THEM IN THE FILE "Tbccttaa.MUL" *

* *

* WHEN ALL THESE ERRORS WILL BE CORRECTED, YOU MUST RUN THE WHOLE *

* PROGRAM AGAIN TO DETECT OTHER POSSIBLE ERRORS IN THE "TB" FILE *

* *

* FURTHERMORE YOU SHOULD NOTE THAT IN THE CASE OF THESE ERRORS *

* THE "TC" FILE IS NOT YET CHECKED *

* *

***************************************************************************
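The two preliminary TB checks can be expressed with a set difference and a counter. In this sketch the records are assumed to have been reduced to (haul, species) pairs; the layout of the .MUL lines (haul, species, number of records) is hypothetical, as the actual format is not described here.

```python
# Sketch of the preliminary TB checks: every valid TA haul appears in TB, and
# no species occurs on more than one line in the same haul.
# records: list of (haul, species) pairs read from the TB file.
from collections import Counter


def missing_hauls(valid_hauls, records):
    """Valid TA hauls with no line in the TB file."""
    present = {haul for haul, _ in records}
    return sorted(set(valid_hauls) - present)


def multiple_records(records):
    """(haul, species) -> number of lines, for pairs occurring more than once."""
    counts = Counter(records)
    return {key: n for key, n in sorted(counts.items()) if n > 1}


def mul_lines(multiples):
    """Hypothetical .MUL layout: haul, species, number of records."""
    return ["%3d %s %d" % (haul, species, n) for (haul, species), n in multiples.items()]
```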

 

If no such errors occur, the file is checked line by line. The following checks are performed:

TYPENR : TB

PAYS : correctly typed

BATEAU : the same as in the reference file

AN : the correct year, as given by the user in the first program

FERCHA : S or C

PARTIT : S if FERCHA = S; A, M or P if FERCHA = C

SPECIES : checked against the FM list

CATFAU : must be the same as in the MEDIESP.DAT file for the given species

PTOT : must not be 0

NBTOT : must not be 0 and must equal NBFEM + NBMAL + NBIND

 

In addition, the program checks that the file contains only allowed characters. If a forbidden character is found, the program stops and displays the number of the offending line on the screen.
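A few of the TB line rules, and the forbidden-character scan, are sketched below. The set of allowed characters (letters, digits, blank, point and minus sign) is an assumption for illustration only, since the list used by CHKMED2B is not given; the function names are hypothetical.

```python
# Sketch of some TB line checks. ALLOWED is an assumed character set for
# illustration; CHKMED2B's actual list is not given in this document.
import string

ALLOWED = set(string.ascii_letters + string.digits + " .-")


def partit_ok(fercha, partit):
    """PARTIT must be S when FERCHA is S, and A, M or P when FERCHA is C."""
    if fercha == "S":
        return partit == "S"
    if fercha == "C":
        return partit in ("A", "M", "P")
    return False                          # FERCHA itself is invalid


def totals_errors(ptot, nbtot, nbfem, nbmal, nbind):
    """PTOT and NBTOT must not be 0, and NBTOT = NBFEM + NBMAL + NBIND."""
    errors = []
    if ptot == 0:
        errors.append("PTOT is 0")
    if nbtot == 0:
        errors.append("NBTOT is 0")
    elif nbtot != nbfem + nbmal + nbind:
        errors.append("NBTOT differs from NBFEM+NBMAL+NBIND")
    return errors


def first_bad_line(lines):
    """1-based number of the first line with a forbidden character, or None."""
    for number, line in enumerate(lines, start=1):
        if any(ch not in ALLOWED for ch in line.rstrip("\n")):
            return number
    return None
```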

5.4. Checking of "TC" file (program CHKMED2C)

The program first reads the valid hauls in the TA file and checks that all of them are present in the TC file. A missing haul is not necessarily an error, since a haul may contain no reference species. The program then checks the file line by line. The following checks are performed:

TYPENR : TC

PAYS : correctly typed

BATEAU : the same as in the reference file

AN : the correct year, as given by the user in the first program

FERCHA : S or C

PARTIT : S if FERCHA = S; A, M or P if FERCHA = C

SPECIES : checked against the FM list

CODLON : must be the same as in the MEDIESP.DAT file for the given species

PECHAN : must be less than or equal to PFRAC

SEXE : F, M, I or N

MATUR : 0 to 4

CLALON : must be within the limits defined in the MINMAX.CHK file

NBLON : should normally be less than 500

 

Here too, the program checks that the file contains only allowed characters. If a forbidden character is found, the program stops and displays the number of the offending line on the screen.
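The TC line rules above can be gathered into one function returning messages. This sketch assumes that the length class has already been converted to millimetres for comparison with MINMAX.CHK, that MATUR is read as an integer, and that NBLON of 500 or more is reported as a warning rather than an error; the message wording and the function name are hypothetical.

```python
# Sketch of some TC line checks. Assumptions: length already in millimetres,
# MATUR read as an integer, NBLON >= 500 treated as a warning only.

def tc_messages(pechan, pfrac, sexe, matur, length_mm, nblon, min_mm, max_mm):
    """Return the error/warning messages for one TC record."""
    messages = []
    if pechan > pfrac:
        messages.append("ERROR: PECHAN greater than PFRAC")
    if sexe not in ("F", "M", "I", "N"):
        messages.append("ERROR: SEXE must be F, M, I or N")
    if not 0 <= matur <= 4:
        messages.append("ERROR: MATUR must be 0-4")
    if not min_mm <= length_mm <= max_mm:
        messages.append("ERROR: CLALON outside MINMAX.CHK limits")
    if nblon >= 500:
        messages.append("WARNING: NBLON is 500 or more")
    return messages
```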

5.5. Results of the checking of the individual files (program CHKMED2F)

This program displays whether or not each file is correct and, if not, the number of errors remaining in it and the name of the file in which these errors are listed. It then displays the following important message (the first lines are an example for the case where all three files are correct):

***************************************************************************

 

"TA" FILE IS CORRECT

 

"TB" FILE IS CORRECT

 

"TC" FILE IS CORRECT

 

 

EVEN IF ALL YOUR FILES ARE INTRINSICALLY CORRECT, DONT FORGET TO RUN

THE "CHKMED2X" PROGRAM TO MAKE THE CROSS-CHECKING OF THE "TB" AND "TC" FILES

 

THANK YOU FOR YOU COOPERATION

 

ARNAULD

 

***************************************************************************

5.6. Cross-checking of "TB" and "TC" files (program CHKMED2X)

This program must be run only after the 3 data files have been checked and corrected individually. It is started by typing CHKMED2X. The check is restricted to the reference species, because checking all the species in the FM list would require more memory than is currently available on PCs. For the other species, users should be aware that some mistakes may remain in the data files, since they are not cross-checked. For each haul and reference species the program checks that:

The output file is named TXccyyaa.VER, where cc = country code, yy = year and aa = area code. The program tells you how many errors have been detected or whether the two files are consistent; in the latter case, it congratulates you on my behalf.
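The restriction to reference species can be illustrated with a simple presence test between the two files. This is a hypothetical example of one plausible consistency check, not a reproduction of the tests made by CHKMED2X; it assumes that each reference species reported in TB for a haul should have length records in TC for the same haul, and that every TC record should match a TB record. The function name is hypothetical.

```python
# Hypothetical illustration of a TB/TC consistency test restricted to the
# reference species; it is NOT the list of tests made by CHKMED2X.
# tb and tc are sets of (haul, species) pairs read from the two files.

def cross_check(tb, tc, reference_species):
    """Return messages for reference species present in only one of the files."""
    messages = []
    tb_ref = {(h, s) for h, s in tb if s in reference_species}
    for haul, species in sorted(tb_ref - tc):
        messages.append("haul %d: %s in TB but not in TC" % (haul, species))
    for haul, species in sorted(tc - tb):
        messages.append("haul %d: %s in TC but not in TB" % (haul, species))
    return messages
```

Working on (haul, species) sets for 31 species keeps the memory needed small, which is the reason given above for not extending the check to the whole FM list.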

6. General comments and Conclusions

Each of these programs can also be run on its own, without running the whole package. This is useful to make sure that no error remains in a file after correction, and it is done simply by typing the program's name, without modifying the CHKMED.DAT file.

This checking procedure is NOT a correction procedure. Errors are so numerous and so varied (as experience has shown!) that writing such a procedure seems practically impossible. It is everybody's duty to correct their own files where necessary.

The author believes (once more, and perhaps for the last time!) that most of the possible errors have been taken into account. Nevertheless, users who notice any omission or bug in this package are invited to contact him as soon as possible.

Thank you again for your co-operation.