Report of the Working Group on the MEDITS Data Base

Athens, 19-20 April 1996

The Working Group met in Athens, at the NCMR Centre, on 19 and 20 April 1996.

1. Participants

J. Dokos Greece

E. Ferrandis Spain

S. Kavadas Greece

C. Papaconstantinou Greece

G. Petrakis Greece

C.Y. Politou Greece

A. Souplet (Chairman) France

A. Tursi Italy

2. Terms of Reference

The only reference to this Working Group is in the current MEDITS contract, and its wording is rather general. In advance of the meeting, the Working Group’s chairman therefore proposed to the members a list of questions to be addressed, and this list was used as terms of reference. These are as follows:

During the meeting these "terms of reference" were addressed in that order, and the resulting discussions are reported below. The recommendations or suggestions from the Working Group to the Steering Committee are written in bold characters.

3. Need for a Data Base

3.1. Data input

The chairman’s proposal mentioned the use of an integrated system to input the data at sea or in the laboratories. A review of the existing data input systems showed that such systems have been developed in many participating Institutes: under Paradox in Greece, Dbase 4 in France (also used in Spain and by some Italian teams), Extrabase in Bari, etc.

In such a case it is recommended that:

3.2. Data checking and correction

It was agreed that, as far as possible, data validation should be done during data input. This means that all input programs, whatever language they are written in, should include all necessary checking procedures. This does not preclude running the CHECKMED program on the ASCII data before sending them to the General Co-ordinator. In any case, the IFREMER Laboratory in Sète will check the data a third time with the same program before including them in the data base.
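As an illustration of checking at input time, the following is a minimal sketch only. The record layout, field names, limits and species codes are hypothetical assumptions made for the example; they are not the MEDITS exchange format, nor the checks performed by CHECKMED.

```python
from dataclasses import dataclass

# Hypothetical haul record: field names are illustrative assumptions,
# not the actual MEDITS file layout.
@dataclass
class HaulRecord:
    country: str
    year: int
    haul_number: int
    depth_m: float
    species_code: str
    total_weight_g: float
    total_number: int

# Placeholder code list (assumption); a real program would load the agreed list.
KNOWN_SPECIES = {"MERLMER", "MULLBAR"}

def validate(record: HaulRecord) -> list:
    """Return a list of error messages; an empty list means the record passed."""
    errors = []
    if not (1990 <= record.year <= 2100):          # assumed plausible range
        errors.append("year out of range")
    if record.haul_number <= 0:
        errors.append("haul number must be positive")
    if not (0 < record.depth_m <= 1000):           # assumed plausible range
        errors.append("depth out of range")
    if record.species_code not in KNOWN_SPECIES:
        errors.append("unknown species code")
    if record.total_weight_g < 0 or record.total_number < 0:
        errors.append("negative weight or number")
    # A catch with fish but no weight (or weight but no fish) is inconsistent.
    if (record.total_number == 0) != (record.total_weight_g == 0):
        errors.append("weight and number are inconsistent")
    return errors
```

An input program could refuse to store a record until validate() returns an empty list, so that errors are corrected by the person who still has the original sheet at hand.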

The problem of the species coding system was raised. At present we use the so-called Rubbin’s code. It is convenient to use but incomplete: it was designed for North Atlantic species, and it is not at all certain that it includes all Mediterranean species of interest. Another problem is that of synonyms and of species whose names have changed. The current procedure is that any new species should be notified to IFREMER, which assigns a new code (and sometimes invents it). The FM list therefore runs the risk of becoming a "regional" system very different from an official one. It is believed that the NODC coding system covers all living species worldwide and, in that respect, could be considered better than the currently used "Rubbin" code. However, although it is used by ICES in the IBTS program, it is far from user-friendly (a 10-digit number per species).

This question is addressed to the Steering Committee.
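One possible way to keep a convenient local code while staying tied to an official list is a cross-reference table that also records synonyms. The sketch below is only an illustration of that idea: the local codes and the 10-digit reference codes are placeholders invented for the example, not real Rubbin or NODC codes.

```python
# Hypothetical cross-reference table. All codes are placeholders (assumptions).
SPECIES = {
    "SOLESOL": {
        "accepted_name": "Solea solea",
        "synonyms": ["Solea vulgaris"],
        "reference_code": "0000000001",
    },
    "MERLMER": {
        "accepted_name": "Merluccius merluccius",
        "synonyms": [],
        "reference_code": "0000000002",
    },
}

def build_name_index(species):
    """Map every accepted name and synonym (lower-cased) to its local code."""
    index = {}
    for local_code, entry in species.items():
        for name in [entry["accepted_name"], *entry["synonyms"]]:
            key = name.strip().lower()
            if key in index and index[key] != local_code:
                raise ValueError(f"name {name!r} maps to two codes")
            index[key] = local_code
    return index

NAME_INDEX = build_name_index(SPECIES)

def resolve(name):
    """Return the local code for a name or synonym, or None if unknown."""
    return NAME_INDEX.get(name.strip().lower())
```

With such a table, resolve("Solea vulgaris") and resolve("Solea solea") give the same local code, and a name that returns None is one that still has to be notified and coded.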

3.3. Data storage and retrieval, standard output and special queries

Obviously, data storage and retrieval is one of the main uses of a data base, and this point raised no difficulty. In addition, the system should be designed to produce standard output in an easy and user-friendly way.

In some cases, people may need to retrieve special data. The data base should allow this without too much difficulty.

4. The Standard Output

By "standard output" is meant the results that are routinely needed to produce the annual reports. At present the INDMED program gives abundance indices in terms of density and length frequencies by stratum. It is expected that abundance indices could be calculated in terms of CPUE as well. Furthermore, because growth differs between males and females, it was agreed that it would be advisable to be able to calculate length frequencies by sex where necessary. In the same way, the system should routinely provide data on maturity and reproduction.
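The kind of calculation involved can be sketched as follows. This is a minimal illustration under stated assumptions: the data layout and function names are hypothetical, and the density index shown is a common area-weighted mean of stratum means, which is not necessarily the formula implemented in INDMED.

```python
from collections import defaultdict

def length_frequencies(rows, by_sex=False):
    """Sum numbers of fish per (stratum, length) or (stratum, sex, length).

    rows: iterable of (stratum, sex, length_cm, count) tuples (assumed layout).
    """
    freq = defaultdict(int)
    for stratum, sex, length_cm, count in rows:
        key = (stratum, sex, length_cm) if by_sex else (stratum, length_cm)
        freq[key] += count
    return dict(freq)

def stratified_mean_density(hauls, stratum_area_km2):
    """Area-weighted mean of the per-stratum mean densities.

    hauls: iterable of (stratum, density) pairs, one per haul.
    stratum_area_km2: mapping from stratum to its surface area.
    """
    by_stratum = defaultdict(list)
    for stratum, density in hauls:
        by_stratum[stratum].append(density)
    total_area = sum(stratum_area_km2[s] for s in by_stratum)
    weighted = sum(
        stratum_area_km2[s] * sum(d) / len(d) for s, d in by_stratum.items()
    )
    return weighted / total_area

# Hypothetical measurements: (stratum, sex, length class in cm, number of fish).
measurements = [
    ("A", "F", 20, 3),
    ("A", "M", 20, 5),
    ("A", "F", 21, 2),
    ("B", "F", 20, 1),
]
```

Calling length_frequencies(measurements, by_sex=True) keeps males and females apart, while the default call pools them, so the same stored data serve both forms of output.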

The system should provide the data needed to draw maps of density, index, length frequency, kriging, pie-charts, etc. This raises the problem of the interface between the data base and the mapping program. Some progress in this direction has already been made in Spain. The French mapping software KARTO does not seem appropriate, and it was suggested to use a small GIS named IDRISI with already digitised maps from GEBCO.

The Steering Committee is asked to give advice on this last point and on the relationship between MEDITS and FIGIS.

5. Hardware

Considering the volume of data (approximately 5 Mb per year in ASCII format), the current capacities of PCs (even portable ones) and their probable increase in the near future, it seems that a Pentium will be sufficient to host the data base, because its capacities are very similar to those of a UNIX workstation.

The question of network access to the data base was raised. If PCs are used, each user needs a similar computer to run the programs. With a UNIX server, on the other hand, the programs run on the server and the users need only small terminals, but in that case both hardware and software are very expensive. Furthermore, a Pentium Pro 200 can be used as a Web server, and it is easy to transfer the data to this machine from the current PCs with SCSI-2.

6. Software

The first idea was to use widely distributed software, well known to many users in the scientific community and made by an experienced company. Access from Microsoft was therefore suggested. Although this software is widely used in European universities and research institutes, it was noted that it is not "professional" in the sense that it is not object oriented. In addition, it seems rather slow and its programming is quite difficult. It could be better to use software such as Paradox, Dbase or Oracle. It was pointed out that a large amount of work has already been done with Paradox in Greece. On the other hand, it is easy to transfer data from Access to mapping software such as Mapinfo.

The Group agreed to postpone the decision, which is not urgent.

Nevertheless, it should be noted that the chosen software need not be very sophisticated in terms of mathematics and statistics. The calculations necessary to produce the standard output are very simple. If more is needed, it would be easy to extract the data from the data base and use another tool.

In any case, we should avoid a tool that is supposed to do everything, because either it does not do everything or it does so badly.
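The division of labour described in this section, with a simple data base for storage and a separate tool for heavier analysis, relies on being able to export a query result in a neutral format. The sketch below illustrates this with SQLite and CSV from the Python standard library purely as stand-ins; the Group has not chosen a data base product, and the table and column names are hypothetical.

```python
import csv
import io
import sqlite3

def export_query_to_csv(conn, query, params=()):
    """Run a query and return its result as CSV text with a header row."""
    cursor = conn.execute(query, params)
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow([col[0] for col in cursor.description])  # column names
    writer.writerows(cursor)                                  # data rows
    return buffer.getvalue()

# Small in-memory example with a hypothetical table layout.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE catches (year INTEGER, stratum TEXT, species TEXT, number INTEGER)"
)
conn.executemany(
    "INSERT INTO catches VALUES (?, ?, ?, ?)",
    [(1995, "A", "MERLMER", 41), (1995, "B", "MERLMER", 7), (1994, "A", "MERLMER", 30)],
)
csv_text = export_query_to_csv(
    conn,
    "SELECT stratum, number FROM catches WHERE year = ? ORDER BY stratum",
    (1995,),
)
```

The resulting text can be written to a file and read by a statistics or mapping package, which keeps the data base itself free of specialised calculations.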

7. Data Base Location

The Group agreed that it could be dangerous to make the data base accessible via the Internet. Even with a password system, some crafty people are able to break into the most protected systems.

For safety and maintenance reasons, it seemed more advisable to have a central data base with an administrator and to distribute the data base to each participant on floppy disks or over a network, even if this could mean a heavy workload for the administrator.

The Steering Committee is invited to consider where the data base will be located and, eventually, who will be the administrator.

8. Who will do the Work?

The two options are to do the work ourselves or to use a service company.

With a private company there is a risk of obtaining an unsatisfactory product that would be impossible to modify.

As the competencies exist in the various Institutes involved in the program, it could be preferable to develop the data base software ourselves. Communication between computer engineers and biologists would be easier, the software would be easier to evolve, and this solution would very likely be less expensive.

Accordingly, a response to the latest EU call for proposals will be prepared under French coordination.