
CREATION OF A NATIONAL MINERAL DATABASE - AN EXPERIMENT

PART I

Describes the different aspects of a national mineral database. The first part deals with the theoretical aspects of the database and studies various methods employed for codifying and formatting the data. The second part deals with the actual implementation and its use to answer various queries. For demonstration, DMS-II, a Burroughs package on database management, was used.

INTRODUCTION

During the last decade computer technology has assumed an important role in earth sciences.

There has been increasing recognition of the fact that electronic data processing enables earth scientists to utilize, to the fullest extent, the large resources of data currently and potentially available to them.

Once data have been acquired in computer-processable form, the geologist can take full advantage of the benefits offered by the computer.

Data can be retrieved more efficiently and in a more meaningful form for the solution of a specific problem. Further advantages of electronic data processing can be realised in the exchange and publication of mineral data. The exchange of data in a standardized computer-processable form is an important factor in the improvement of communication amongst earth scientists. When data are readily available in this form, the need to publish large quantities of data in tabular form is eliminated, and this results in cost reduction.

Part I of this study examines the standards which must be considered in developing formats for recording, storing, and retrieving data in computer-processable files and proposes a framework for linking all such data. This part does not consider input formats or other detailed specifications required by specific computers and programs [6].

P K ROY
INSDOC
New Delhi 110067

DATA FILES ON MINERALS

Before discussing the concept of a national system, which would consist of a great number of data files, it is necessary to consider the requirements for an individual computer-processable file.

Assembling a data file on minerals is not without difficulties. If, however, keywords, the system of classification and the classification terminology are standardized for use within an individual file, they may be treated as data and the difficulties are thus eliminated.

The requirements of an individual computer processable file are as follows:

1. Data within the file must be recorded in a consistent manner.

2. The file must use a reference numbering system that identifies uniquely each item or block of data, and provides an internal link between items of data.

If an individual file is to be used for more than one purpose, or if a number of individual files are to be utilised for one or more applications, the standards used to define and record data must be recognised by all potential users.

Extension of these requirements to all mineral data files in India forms the basis of the concept of the national system.

NATIONAL SYSTEM OF MINERAL DATA

Concept and requirements: In order to achieve the desired flexibility and compatibility with other environmental sciences, the system must consist of a framework of principles rather than detailed specifications and formats, which may not stand the test of time and in any case are not fundamental to the system. Therefore, the system is defined according to the following principles:

1) The system will consist of files held by individual organisations dealing with mineral data.

2) Files within the system will be computer-based and will be designed to cater to the needs of the geologist who collects and uses the data.

3) Files within the system will be linked by entries for reference numbering and geographic location, and by common methods of coding.

4) The national index will serve as the key to the contents and location of data files.

5) Standards will be established for defining and recording data. However, the specific document for recording the data will be based on the user's requirements [8].

National Index

One of the primary functions of the national system is the selective retrieval and interchange of data. This benefit cannot be achieved without a national index to mineral data available in India.

The following are the main points of the national index:

1) The national index is a computer-assisted index to the existing mineral data contained in unpublished or published documents.

2) The national index is a necessary component of the national system; however, it can exist independently of the system.

3) A central agency is essential to coordinate indexing.

Vol 34 No 3 September 1987

4) Any organization wishing to participate in the national system will be responsible for the indexing of its own data, according to standard indexing procedures.

5) All organisations and individuals will have access to the index regardless of whether or not they choose to index their own documents.

National Indexing Procedures

The essential steps in compiling a national index to mineral deposit data are:

i) Indexing: The indexing of any document involves the description of a document and its owner, and the recognition of its significant data content.

a) The title and other bibliographic data which are sufficient to identify a document, show where it is stored and by whom.

b) Type of output control required for the preparation of indexes. 'Output control' is a device in the system that enables the total 'data store' to be split into any number of discrete indexes.

c) Concepts describing geographic location and kind of data. The development of skill and consistency in indexing will depend on the establishment and maintenance of an indexing manual.

ii) Vocabulary Control: For effective indexing, it is mandatory that a system of vocabulary control be established through a thesaurus.

When a group of more specific concepts is approved, as well as the generic term with a broader meaning, reference is made to the broader and narrower terms as well as related terms to indicate the choice available. The choice made by the indexer will be a matter of his judgment.
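The role of the thesaurus in vocabulary control can be illustrated with a minimal sketch. The terms, the links between them and the function names below are hypothetical and are not taken from any actual thesaurus; the sketch only shows how an approved concept might point the indexer to its broader, narrower and related terms, leaving the final choice to the indexer.

```python
# Hypothetical thesaurus entries; terms and links are illustrative only.
THESAURUS = {
    "ore deposit": {
        "broader": [],
        "narrower": ["placer deposit", "vein deposit"],
        "related": ["mineral occurrence"],
    },
    "placer deposit": {"broader": ["ore deposit"], "narrower": [], "related": []},
    "vein deposit": {
        "broader": ["ore deposit"],
        "narrower": [],
        "related": ["placer deposit"],
    },
}


def normalise(term):
    """Lower-case the term and collapse runs of spaces so lookups are consistent."""
    return " ".join(term.lower().split())


def indexing_choices(term):
    """Return the broader, narrower and related terms offered to the indexer.

    Raises KeyError when the term is not an approved concept, which is the
    point at which the indexer would consult the thesaurus for a substitute.
    """
    key = normalise(term)
    if key not in THESAURUS:
        raise KeyError(f"{term!r} is not an approved concept")
    entry = THESAURUS[key]
    return {kind: list(entry[kind]) for kind in ("broader", "narrower", "related")}
```

For example, `indexing_choices("Vein Deposit")` offers "ore deposit" as the broader term, and an unapproved term is rejected rather than silently entered into the index.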

iii) Preparation and distribution of the index:

Documents being indexed will provide the input to a computer-assisted information retrieval system that will store each entry on storage media. Printed editions of the national index, or part of it, would be derived from these media as required, by exercising the output control.

The national index can appear in several volumes. Each printed index produced will provide an alphabetic listing of each concept that has been entered into the system up to that time.

The concepts recognized would appear as headings in alphabetic order, followed by numeric concepts.
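The preparation of a printed index described above can be sketched as follows. This is an illustration under stated assumptions: the entry layout (a dictionary with `document`, `output_control` and `concepts` keys) and the function name are hypothetical, and the output control is modelled simply as a label used to select a subset of the data store.

```python
from collections import defaultdict


def build_index(entries, output_control=None):
    """Build a printed-index listing from stored index entries.

    entries: iterable of dicts with hypothetical keys 'document',
             'output_control' and 'concepts'.
    output_control: when given, only entries carrying this label are used,
             which splits the total data store into a discrete index.
    Returns a list of (concept, [documents]) pairs, alphabetic headings
    first and numeric concepts after them.
    """
    index = defaultdict(list)
    for entry in entries:
        if output_control is not None and entry["output_control"] != output_control:
            continue
        for concept in entry["concepts"]:
            index[concept].append(entry["document"])

    def sort_key(concept):
        # False sorts before True, so alphabetic headings precede numeric ones.
        return (concept[:1].isdigit(), concept.lower())

    return [(concept, sorted(index[concept])) for concept in sorted(index, key=sort_key)]
```

Calling `build_index(entries)` yields the full index, while `build_index(entries, output_control="A")` yields only the discrete index selected by that label.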

A search for data would begin with a definition of concepts that refer to the kind of data sought. Because the searcher would likely be unfamiliar with the policy adopted in defining concepts in the index, he would be wise to consult the thesaurus to check usage and reduce the chance of missing an index concept that may be useful in his task.

iv) Index revision: It is inevitable that users of the index will discover inconsistencies or find data that, in their opinion, have not been properly indexed or have been missed entirely. It will be essential to provide a formal means for drawing this to the attention of the index policy group. Users should always be encouraged to suggest these improvements, including deletion of documents, by means of a special form designed for this purpose.

REFERENCE NUMBER

A reference number is used in documents for the identification and organisation of various items of data in a file into compartments suitable for retrieval.

There is a strong compulsion to make further use of the reference number by including data for such purposes as identifying the organization, the decade and year of assignment, the project for which it was assigned, the class of data to which it refers and even the geographic location of observed data. In minerals, a sample number is often analogous to a reference number, and to many, a reference number has much in common with a library call number. Both the library and sample numbers serve to classify units as well as identify them.

For computer-processed data, the reference number may be best described as a tag which accompanies the item of data and serves to identify it in the computer's memory. It may also serve to link items of fact relating to a single observation, point or sample.

Recommended Reference Numbering Format

1) The 'file code' would identify the organization and enable the interchange of data within the context of a national system. To avoid duplication, the 'file code' would be assigned by a central organization.

2) The 'project number' would be designated by the organization in accordance with internal policy or operational requirement.

3) The 'year' would be recorded for example, as '80' for 1980.

4) 'Accession number' would be recorded as defined.
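A minimal sketch of how such a reference number might be assembled and split again is given below. The field order (file code, project number, year, accession number) and the two-digit year follow the text, and the total of fourteen character positions matches the column count in the paper's format diagram. The individual field widths of 3, 4, 2 and 5 characters are an assumption made only for illustration, as are the function names.

```python
from typing import NamedTuple

# Assumed field widths (3 + 4 + 2 + 5 = 14); the source does not state them.
FILE_CODE_WIDTH = 3
PROJECT_WIDTH = 4
YEAR_WIDTH = 2
ACCESSION_WIDTH = 5
TOTAL_WIDTH = FILE_CODE_WIDTH + PROJECT_WIDTH + YEAR_WIDTH + ACCESSION_WIDTH


class ReferenceNumber(NamedTuple):
    file_code: str
    project_no: str
    year: str
    accession_no: str


def _padded(value, width, name):
    """Zero-pad a non-negative integer to a fixed width, rejecting overflow."""
    if value < 0 or len(str(value)) > width:
        raise ValueError(f"{name} {value} does not fit in {width} digits")
    return str(value).zfill(width)


def format_reference(file_code, project_no, year, accession_no):
    """Compose the fixed-width reference number from its four components."""
    code = file_code.strip().upper()
    if len(code) != FILE_CODE_WIDTH or not code.isalnum():
        raise ValueError("file code must be exactly 3 letters or digits")
    return (
        code
        + _padded(project_no, PROJECT_WIDTH, "project number")
        + f"{year % 100:02d}"  # '80' for 1980, as in the text
        + _padded(accession_no, ACCESSION_WIDTH, "accession number")
    )


def parse_reference(ref):
    """Split a fixed-width reference number back into its components."""
    if len(ref) != TOTAL_WIDTH:
        raise ValueError(f"reference number must have {TOTAL_WIDTH} characters")
    a = FILE_CODE_WIDTH
    b = a + PROJECT_WIDTH
    c = b + YEAR_WIDTH
    return ReferenceNumber(ref[:a], ref[a:b], ref[b:c], ref[c:])
```

With these assumed widths, `format_reference("GSI", 12, 1980, 345)` gives `"GSI00128000345"`. The file code "GSI" is used here purely as a placeholder, since actual file codes would be assigned by the central organization.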

In general, a reference number should satisfy the following conditions:

i) Uniqueness

ii) Consistency

iii) Meaningfulness to the participant, but without usefulness as data in a system. Where a component of the reference number is also useful as data, it must be recorded elsewhere in the file as well. It is unlikely that reference numbers in the suggested context will be used for sorting or retrieval, as this is most commonly done on geographic location, class, or discipline, all of which are considered data and hence are independent of the reference number.

iv) A prefix or 'key' should precede the reference number where an interchange of data is to occur. This prefix, assigned by a central organization, would designate the name of the participating organization and would facilitate the interchange of data.

Recommended Reference Numbering Format (diagram): the fields appear in the order 1 File code, 2 Project No., 3 Year, 4 Accession No., and together occupy columns 1 to 14.

GEOGRAPHIC COORDINATES

Location is one major factor that is common to virtually all geographical data and observations. It may also be the most frequently used reference in retrieval of data from files, and as such, must be common to files in widely divergent disciplines.

Other factors that are important in the selection of a system for defining location in data processing are that it should:

i) be national and preferably international in scope;

ii) be capable of defining both a point and an area;

iii) contain a minimum of discontinuities between individual coordinate grids and thus be useful for areal and regional computations;

iv) be amenable to use on automatic plot- ters;

v) be readily usable for measurement of coordinates from points plotted on a map both by the use of a simple romer and a digitizer;

vi) be available on current maps of India;

vii) be widely known and understood;

viii) facilitate reoccupation of a point where a sample was taken or an observation made.

GEOGRAPHIC COORDINATES (LATITUDE AND LONGITUDE)

It is assumed that these geographic or angular coordinates are so well known that a description is unnecessary. They are rarely printed as a grid on maps but appear in a graticule around the edge of maps, with intersections plotted as crosses in the body of the maps.

Neatlines (inner margins) of 1:250,000 maps are divided into degrees and units of 15 minutes, whereas neatlines of 1:25,000 maps are in degrees and units of 5 minutes.

THE UNIVERSAL TRANSVERSE MERCATOR GRID

This is a world-wide system of zones, each zone being 6 degrees of longitude wide and each extending from the equator to the 80th parallel of latitude, north and south. Within each of the 60 zones there is a rectangular grid in metres.

The ordinates of this grid are parallel to the central meridian of each zone and the abscissae are normal to it. Thus all squares (1,000 metres on 1:250,000 maps) are the same size throughout each zone [8].
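Since each of the 60 zones is 6 degrees of longitude wide, the zone containing a given point follows directly from its longitude. The sketch below assumes the standard convention that zone 1 begins at 180 degrees west and that zone numbers increase eastward; it ignores the special zone boundaries used around Norway and Svalbard, which do not affect India. The function names are illustrative.

```python
import math


def utm_zone(longitude_deg):
    """Return the UTM zone number (1 to 60) for a longitude in degrees east.

    Zone 1 is taken to start at 180 degrees west. Longitude 180 east is
    folded into zone 60 so that the result always lies in 1..60.
    """
    if not -180.0 <= longitude_deg <= 180.0:
        raise ValueError("longitude must lie between -180 and 180 degrees")
    zone = int(math.floor((longitude_deg + 180.0) / 6.0)) + 1
    return min(zone, 60)


def central_meridian(zone):
    """Return the central meridian of a zone in degrees east."""
    if not 1 <= zone <= 60:
        raise ValueError("zone must lie between 1 and 60")
    return zone * 6 - 183
```

For a point near New Delhi at about 77.2 degrees east, `utm_zone(77.2)` returns 43, whose central meridian is 75 degrees east.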

CODING OF MINERAL NAMES AND TERMS

Coding is the arbitrary assignment of symbols, numbers or letters to ordinary written language for some particular purpose, such as shortening word lengths or facilitating computer processing. During the early days of computer technology, coding was an effective means for coping with problems of limited storage capacity, slow processing speed, and the constraints imposed by an 80-column card format. Although such coding was generally justified from the computer point of view, it tended to discourage the user from both entering and retrieving information. Fortunately, coding is no longer a major hindrance. Recent developments in computer technology, including greatly increased storage capacity and processing speed, have effectively eliminated the need, so far as the computer is concerned, for coding at the input and output stages. The differences in processing numbers as against letters, or a long word as against a short word, have become vanishingly small. The advantages obtained by saving computer time and storage space through entering data in unfamiliar codes are mostly offset by factors affecting the cost and accuracy of recording the data in the first place. Moreover, the prime objective of any system should be to maximize effectiveness for the user, and the use of familiar uncoded language does this in most cases.


Factors in Coding

The essential consideration in the selection of a set of either coded or uncoded names and terms that are to be distinguished by a computer is symbolic uniqueness for each name and term. Computers operate at the symbolic (syntactical) level and cannot make distinctions between symbolically identical terms (e.g. formation and formation), even though the geologist recognizes a profound semantic distinction (a body of rocks vs. a process). Conversely, a computer will distinguish between two semantically identical terms if their syntax differs. Within a given context, any coding system must provide for a set of unique symbols, letters or numbers. A desirable characteristic of a coding system for general use is that it should be mnemonic (helping, or meant to help, identification of the word being coded). This trait is common in ordinary abbreviations (codes), for example, 'ft' (feet) or 'Dev' (Devonian). Codes completely lacking mnemonic qualities generally require constant reliance on a dictionary and are, therefore, awkward and undesirable for the geologist [5].

Coding in the National System

A national system would consist of files held in different organizations and in various fields of minerals. This implies a minimum of centralized control and presents the undesirable possibility of different coding systems being applied to similar files across the country.

Two methods of approaching this problem are possible.

1) Assigned codes for certain large, specialized classes of terms:

Certain fields of minerals use large numbers of terms involving many subtle but important semantic distinctions. Coding of these terms may be common practice because they are used repeatedly; unfortunately, the coding is often inconsistent. In such situations, the best approach usually is to have a committee of experts propose a set of assigned codes.

Users of these codes would require a dictionary authorized by the committee for encoding and decoding. A good example is the 'Well Data Glossary' prepared by the Subcommittee of Well Data Retrieval Systems of the American Petroleum Institute (1966). This Subcommittee classified and coded approximately 1,300 terms. The codes recommended are more or less mnemonic, but because of the large number of symbolically and semantically closely related terms, it was impossible to use any particular systematic coding method and still maintain uniqueness for each of the terms.

2) Derived codes for smaller classes of terms common to many files:

A derived coding method generates a code directly from the full term according to some set of rules and, therefore, has the advantage of allowing codes to be generated without central control (i.e. without an authorised dictionary). If the population of words being coded is reasonably small, the method will produce only a few duplicates, although some cannot be avoided. This approach would be ideal for many purposes in the national system, as any term can be independently encoded.

However, it may break down at some point when the number of duplicates becomes so large that it must evolve into an assigned system in order to maintain uniqueness. Nevertheless, it appears that the use of a derived coding system would have many useful applications in the national system, for example, in the coding of rock names or geographic locations.

Another useful application of the derived coding method is in abbreviating words in tables or other output displays where there are space limitations. The use of a standard method of abbreviation in these situations would of course improve communication.
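The behaviour of a derived code, including the unavoidable duplicates, can be illustrated with a small sketch. The rule used here (keep the first letter, drop later vowels, truncate to four characters) is a hypothetical rule chosen only for illustration; the paper does not prescribe a particular rule set.

```python
from collections import defaultdict

VOWELS = set("AEIOU")


def derive_code(term, length=4):
    """Derive a mnemonic code from a term using a fixed, illustrative rule:
    keep the first letter, drop every later vowel, truncate to `length`."""
    letters = [ch for ch in term.upper() if ch.isalpha()]
    if not letters:
        raise ValueError("term contains no letters")
    code = letters[0] + "".join(ch for ch in letters[1:] if ch not in VOWELS)
    return code[:length]


def find_duplicates(terms):
    """Group terms by derived code and return only the codes shared by
    more than one term, i.e. the collisions a derived system must resolve."""
    by_code = defaultdict(list)
    for term in terms:
        by_code[derive_code(term)].append(term)
    return {code: names for code, names in by_code.items() if len(names) > 1}
```

Any participant applying the same rule obtains the same code without consulting a dictionary, yet "granite" and "garnet" both reduce to "GRNT" under this rule. Collisions of this kind are what eventually force a growing vocabulary towards assigned codes.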

Numeric Code for Standard Geological Time Terms

In view of the common use of geological time terms in many types of data files, there may be some advantage in having available standard numeric codes for these terms. Various numeric codes have been published (e.g. Buller, 1964, p. 882-885; Ontario Dept. E.R.M. 1965, p. 14-15) and still others probably exist in private files. This coding system has the following advantages:

i) The coding system follows the hierarchical structure of the eons, eras, and periods.

ii) The system is 'open-ended' at the lower (older) end of the time scale, which will allow for easy modification and addition to the older time terms, where future changes are most likely to occur.

iii) The three-digit codes can be expanded to four digits or more to accommodate finer subdivisions for individual purposes without prejudice to the standard aspect of the first three digits.

iv) The relative numeric values of the codes are parallel to the interval units of the time scale. Thus, the higher the code number, the older the time unit. This will allow a simple numeric sort to effect a geological time unit sort [9].
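The properties listed above can be demonstrated with a short sketch. The code values below are hypothetical and are not the published codes; they are chosen only so that the first digit identifies the era, the second the period, and larger numbers denote older units.

```python
# Hypothetical three-digit codes: first digit = era, second digit = period.
# The values are illustrative only and are not taken from the cited sources.
TIME_CODES = {
    "Cenozoic": 100, "Quaternary": 110, "Tertiary": 120,
    "Mesozoic": 200, "Cretaceous": 210, "Jurassic": 220, "Triassic": 230,
    "Paleozoic": 300, "Permian": 310, "Carboniferous": 320, "Devonian": 330,
    "Silurian": 340, "Ordovician": 350, "Cambrian": 360,
    "Precambrian": 400,  # open-ended: older units can be added above this
}


def sort_youngest_first(units):
    """A plain numeric sort on the codes orders the units by geological age."""
    return sorted(units, key=lambda unit: TIME_CODES[unit])


def parent_era(unit):
    """Recover the era from the hierarchical first digit of the code."""
    era_code = TIME_CODES[unit] // 100 * 100
    return next(name for name, code in TIME_CODES.items() if code == era_code)


def extend_code(unit, subdivision):
    """Append a fourth digit (1 to 9) for a finer local subdivision,
    leaving the standard first three digits unchanged."""
    if not 1 <= subdivision <= 9:
        raise ValueError("subdivision must be a single digit from 1 to 9")
    return TIME_CODES[unit] * 10 + subdivision
```

Sorting "Devonian", "Quaternary" and "Jurassic" with `sort_youngest_first` returns them in the order Quaternary, Jurassic, Devonian, and an extended code such as `extend_code("Cretaceous", 2)` still begins with the standard digits 210.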

SUGGESTIONS

1) Codes for geological names and terms should be used only when it is in the best interests of the geologists. Considerations involving subsequent computer operations should not influence the choice. In general, names and terms should be used in the form and context which is most familiar and convenient to the geologist.

2) Each name and term entered as data in a given context, whether coded or uncoded, must be symbolically unique, if that item is to be recognized and retrieved from a computer file.

3) Two approaches to coding for a given file are possible:

(a) Assigned codes, prepared by a committee of experts, and controlled by an authorized dictionary.

(b) Derived codes, generated by a standard set of rules, requiring essentially no central control.

4) All codes, whether assigned or derived, should have high mnemonic qualities.

5) A standard numeric code for the major geological time units is suggested for use where a numeric code is desired.


CONCLUSION

This study was conducted to take the necessary steps towards developing a national system for the recording, storage and retrieval of mineral data in computer-processable form. The immediate need for such a system results from the current expansion in the volume of mineral data and the increased availability and use of computers for data storage and treatment.

Standards for computer-processable files are an urgent requirement, if Indian geologists are to take full advantage of the large resources of data available to them.

The system is defined by the following principles:

1) The system will consist of data files held and controlled by individual organizations.

2) Files within the system will be computer- based but oriented to the requirements of the users; computer requirements will be of secondary importance.

3) Files within the system will be linked by the use of standard methods of recording reference numbers, geographic location and coding.

4) The index to the contents and location of data files in India, within and outside of the system, will be a computer-assisted National Index.

5) Data in files within the system will be recorded according to certain minimum standards; however, the standards adopted by individuals may exceed these minimums depending on the user's needs.

Several steps have been taken towards developing the system. Procedures and standards with respect to reference number, geographic location and coding are suggested.

ACKNOWLEDGEMENT

I would like to record my gratitude to the authorities of the Geological Survey of India and INSDOC for providing me with the necessary help in completing this work. I am indebted to Shri R Satyanarayana, INSDOC, for his guidance and keen interest in the work.


REFERENCES

1. Bayer R & McCreight E: Organisation and maintenance of large ordered indexes. Acta Informatica 1971, 1, 173-189.

2. Buller J V: A computer oriented system for the storage and retrieval of well information. Bull Canadian Petroleum Geology 1964, 12(4), 847-91.

3. Hubaux A, ed: Geological data files: survey of international activity. CODATA Bull 1972, 8.

4. Hubaux A, ed: A new geological tool - the data. Earth Science Review 1973, 9(2), 159-196.

5. Hermer M, Lenci M & Lesage M T: SIGMI: a user oriented file-processing system. Geoscience 1976, 1, 187-193.

6. Murthy M V N: National earth sciences data centre in G.S.I. Paper presented at the U.S. - India Seminar on Information Resources in Energy, Environment and National Resources, Washington D.C., 1976, Oct, 13-16.

7. Murthy M V N: Earth Sciences in India, problems and opportunities with special reference to development programmes. Paper presented at the U.S. - India Seminar, 1976, Oct, 13-16.

8. Robinson S C: Interim report of the committee on storage and retrieval of geological data in Canada.

9. Talpatra A K: Geological documentation for computer based studies. GSI News 1975, 6(5/6).
