ZMARCO is an Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) 2.0 compliant data provider. The 'Z' in ZMARCO stands for Z39.50; 'MARC' stands for MAchine-Readable Cataloging; and the 'O' stands for OAI, as in the Open Archives Initiative. Essentially, ZMARCO allows MARC records that are available through a Z39.50 server to be made available via the OAI-PMH with relatively little effort.
The rationale for ZMARCO is that Z39.50 and MARC are fairly ubiquitous in the traditional library world, while the OAI-PMH is quickly being adopted as a lightweight protocol for sharing metadata within the digital library community. It therefore seems useful to develop a tool that allows the ubiquitous (but complex) Z39.50 and MARC standards to be used to create OAI data providers, thereby making the huge amount of data already available via these older standards also available via the newer OAI-PMH. ZMARCO is an attempt toward that end.
    'define the max records returned in one response
    application("MAX_ListIdentifiers") = 1000
    application("MAX_ListRecords") = 50
    application("MAX_ListSets") = 1000

    'define the various components used to make an OAI identifier
    '(i.e. oai:oai.library.uiuc.edu:illinet_online/AAA-1234)
    application("NamespaceIdentifier") = "oai.library.uiuc.edu"
    application("LocalIdentifierPath") = "illinet_online"

    'define the various settings needed for the Z39.50 server
    application("Z3950Host") = "dra.ilcso.uiuc.edu"
    application("Z3950Port") = 210
    application("Z3950Database") = "illinet_online"
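The identifier comment in the settings above implies the pattern `oai:<NamespaceIdentifier>:<LocalIdentifierPath>/<control number>`. The following is a minimal sketch of that composition, written in Python purely for illustration (ZMARCO itself is implemented in ASP and Visual Basic). The function names are hypothetical and are not part of ZMARCO.

```python
# Illustrative sketch only; names below are hypothetical, not ZMARCO's own.
NAMESPACE_IDENTIFIER = "oai.library.uiuc.edu"   # application("NamespaceIdentifier")
LOCAL_IDENTIFIER_PATH = "illinet_online"        # application("LocalIdentifierPath")


def make_oai_identifier(control_number,
                        namespace=NAMESPACE_IDENTIFIER,
                        path=LOCAL_IDENTIFIER_PATH):
    """Build an OAI identifier from a MARC 001 control number."""
    return f"oai:{namespace}:{path}/{control_number}"


def parse_oai_identifier(identifier,
                         namespace=NAMESPACE_IDENTIFIER,
                         path=LOCAL_IDENTIFIER_PATH):
    """Return the control number, or raise ValueError for a foreign identifier."""
    prefix = f"oai:{namespace}:{path}/"
    if not identifier.startswith(prefix) or len(identifier) == len(prefix):
        raise ValueError(f"not an identifier from this repository: {identifier!r}")
    return identifier[len(prefix):]
```

With the sample settings, `make_oai_identifier("AAA-1234")` yields the identifier shown in the configuration comment, and `parse_oai_identifier` reverses it.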
The source code for the ZMARCOPopulator is included in a separate zip file which was installed with this package, ZMARCOPop_0.2_src.zip. The source code for VBZOOM.dll can be obtained from the VBZOOM site on SourceForge. The source code for YAZ.dll can be obtained from Index Data. And, of course, the source code for the ASP scripts is included with this package.
ZMARCO consists of two components. The first is a simple ZMARCO database which must be pre-populated with some minimal data that are periodically harvested from the Z39.50 server. This is required because most typical Z39.50/MARC catalogs do not allow queries based on the last-modified datestamp of the metadata itself (BIB-1 USE attributes 1011 or 1012). However, access to these datestamps is essential for the OAI protocol. Fortunately, even if they are not directly queryable via Z39.50, most (but not all) MARC records carry these datestamps, either in the 005 field or in the first six characters of the 008 field. If neither date is available, the current date is arbitrarily used for the record.
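The fallback order described above (005 first, then 008/00-05, then the current date) can be sketched as follows. This is an illustrative Python sketch, not the ZMARCOPopulator code. It assumes the standard MARC 21 layouts (005 as `yyyymmddhhmmss.f`, 008/00-05 as `yymmdd`), and it relies on Python's `%y` rule for two-digit years (69-99 map to 1969-1999, 00-68 to 2000-2068), which is an assumption of this sketch. The returned flag is assumed to correspond to the GeneratedDate column.

```python
from datetime import datetime


def record_datestamp(field_005, field_008, now=None):
    """Return (datestamp, generated) for one MARC record.

    generated is True when neither field held a usable date and the
    current date was substituted.
    """
    now = now or datetime.now()

    # Prefer 005: date and time of latest transaction, yyyymmddhhmmss.f
    if field_005 and len(field_005) >= 14 and field_005[:14].isdigit():
        try:
            return datetime.strptime(field_005[:14], "%Y%m%d%H%M%S"), False
        except ValueError:
            pass  # digits present but not a real date, e.g. month 13

    # Fall back to 008/00-05: date entered on file, yymmdd
    if field_008 and len(field_008) >= 6 and field_008[:6].isdigit():
        try:
            return datetime.strptime(field_008[:6], "%y%m%d"), False
        except ValueError:
            pass

    # No usable date in the record: use the current date
    return now, True
```

Passing `now` explicitly keeps the function deterministic, which is convenient when testing.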
In order to populate the ZMARCO database, some assumptions (which may not be universally true) had to be made. First, the Z39.50/MARC catalog must support queries for the year of publication (BIB-1 USE attribute 31), and all records of interest must have a valid year of publication which can be queried. The Z39.50/MARC catalog must also support queries for the local number (BIB-1 USE attribute 12, also found in the MARC 001, Control Number field). Finally, the Z39.50 server must not arbitrarily limit the number of hits allowed for a single query; if a query returns 125,362 hits, the server must make all 125,362 of those records available for presentation. A simple Z39.50 client program, ZMARCOPopulator.exe, uses these assumptions to sequentially harvest all records within a range of publication years. It then pulls out the MARC 001 field and the 005 or 008/00-05 field and adds them to the database. It also stores the publication year used to find the records, so that it can be used as a set parameter for the OAI requests. Once ZMARCOPopulator.exe has populated the ZMARCO database, the actual OAI provider can use the database to make all the indexed records in the Z39.50/MARC catalog available for harvesting. To keep the ZMARCO database current, ZMARCOPopulator.exe will need to be completely re-run periodically.
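To make the two query assumptions concrete, the sketch below builds the corresponding searches in Prefix Query Format (PQF), the query notation used by the YAZ toolkit. It is an illustrative Python sketch. Whether ZMARCOPopulator.exe expresses its queries exactly this way is an assumption, and the function names are hypothetical.

```python
def year_query(year):
    """PQF search on BIB-1 USE attribute 31 (date of publication)."""
    return f"@attr 1=31 {year:04d}"


def control_number_query(control_number):
    """PQF search on BIB-1 USE attribute 12 (local number)."""
    # Keep the sketch simple: refuse terms that would need escaping.
    if '"' in control_number or "\\" in control_number:
        raise ValueError("control number contains characters that need escaping")
    return f'@attr 1=12 "{control_number}"'


def year_queries(first_year, last_year):
    """One (year, query) pair per publication year, in ascending order."""
    if first_year > last_year:
        raise ValueError("first_year must not be greater than last_year")
    return [(year, year_query(year)) for year in range(first_year, last_year + 1)]
```

A harvester would issue each year query in turn, read the date fields from every hit, and store the year alongside the record so it can later serve as the OAI set.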
Currently, all of the above assumptions are hard-coded into ZMARCOPopulator.exe. However, since the source code is available, it should be relatively easy for a programmer to modify the code to support the nuances of any particular Z39.50/MARC catalog. Basic changes to the programs should not require any detailed knowledge of either Z39.50 or MARC. All of the code for handling these is encapsulated in an easy-to-use ActiveX DLL called VBZOOM, which is used by both ZMARCOPopulator.exe and the actual ZMARCO OAI provider ASP scripts. We hope eventually to make the program more flexible by expressing the various assumptions in configuration files, so that non-programmers can also modify the behavior of the program.
Using ActiveX Data Objects (ADO and ADOX), ZMARCOPopulator.exe will automatically create an Access database the first time it is run. The database consists of a single table, which is described below. For testing purposes, we have used the Access database to harvest and provide about 4 million MARC records from the University of Illinois' online catalog, and the performance was acceptable. However, for high-usage scenarios with frequent simultaneous harvesting, a more robust database such as Oracle or SQL Server would be preferred. This can be accomplished with some fairly simple modifications to the database connection strings used in the programs. However, if this is done, the database will need to be created and set up manually before ZMARCOPopulator.exe is run for the first time.
| | ControlNumber | LastTransactionDateStamp | PublicationYear | Deleted | GeneratedDate |
|---|---|---|---|---|---|
| Datatype: | Text (20) | Date/Time | Number (Integer) | Yes/No | Yes/No |
| Indexed: | Unique Primary Key | Non-Unique Index | Non-Unique Index | Not Indexed | Not Indexed |
| BIB-1 USE Attribute: | 12 | 1011 or 1012 | 31 | N/A | N/A |
| MARC Field: | 001 | 005 or 008/00-05 | 008/07-10 or 008/11-14 | N/A | N/A |
| OAI Use: | Unique Identifier | Selective Harvesting Datestamp | Selective Harvesting SetSpec | Deleted Status | N/A |
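As a rough illustration of how this single table supports OAI selective harvesting, the sketch below recreates it in SQLite and filters it by datestamp range and set. It is written in Python for illustration only. ZMARCO itself uses Access via ADO, and the table name `records`, the index names, the function names, and the choice to store datestamps as ISO 8601 text are all assumptions of this sketch.

```python
import sqlite3


def create_database(conn):
    """Create a table mirroring the columns described above."""
    conn.execute("""
        CREATE TABLE records (
            ControlNumber            TEXT PRIMARY KEY,
            LastTransactionDateStamp TEXT NOT NULL,      -- ISO 8601, UTC
            PublicationYear          INTEGER,
            Deleted                  INTEGER NOT NULL DEFAULT 0,
            GeneratedDate            INTEGER NOT NULL DEFAULT 0
        )""")
    conn.execute("CREATE INDEX idx_datestamp ON records (LastTransactionDateStamp)")
    conn.execute("CREATE INDEX idx_year ON records (PublicationYear)")


def list_identifiers(conn, from_=None, until=None, set_year=None, limit=1000):
    """Return (ControlNumber, datestamp, Deleted) rows matching the filters."""
    sql = ("SELECT ControlNumber, LastTransactionDateStamp, Deleted "
           "FROM records WHERE 1=1")
    params = []
    if from_ is not None:
        sql += " AND LastTransactionDateStamp >= ?"
        params.append(from_)
    if until is not None:
        sql += " AND LastTransactionDateStamp <= ?"
        params.append(until)
    if set_year is not None:
        sql += " AND PublicationYear = ?"
        params.append(set_year)
    sql += " ORDER BY ControlNumber LIMIT ?"
    params.append(limit)
    return conn.execute(sql, params).fetchall()
```

Because ISO 8601 timestamps in a single time zone sort lexically in chronological order, plain string comparison is sufficient for the `from_` and `until` bounds here. The default `limit` of 1000 echoes the sample `MAX_ListIdentifiers` setting.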
Following is an image of the ZMARCOPopulator user interface:
The second component of ZMARCO is the actual OAI data provider. It consists of several Active Server Pages (ASP) scripts written in VBScript and JScript. They parse and handle the various OAI requests, interfacing with the ZMARCO database or the Z39.50 server as required.
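For readers unfamiliar with what "parsing the OAI requests" involves, the sketch below checks a request's verb and arguments against the rules of OAI-PMH 2.0 and returns the protocol's error code when they are violated. It is an illustrative Python sketch and does not reflect how the ZMARCO ASP scripts are actually organized. The function name is hypothetical, and repeated arguments (also an error under the protocol) are not handled.

```python
# verb -> (required arguments, optional arguments), per OAI-PMH 2.0
VERBS = {
    "Identify": (set(), set()),
    "ListMetadataFormats": (set(), {"identifier"}),
    "ListSets": (set(), set()),
    "ListIdentifiers": ({"metadataPrefix"}, {"from", "until", "set"}),
    "ListRecords": ({"metadataPrefix"}, {"from", "until", "set"}),
    "GetRecord": ({"identifier", "metadataPrefix"}, set()),
}

# Verbs whose responses may be continued with a resumptionToken
RESUMABLE = {"ListSets", "ListIdentifiers", "ListRecords"}


def check_request(params):
    """Return None if the request is well formed, else an OAI error code."""
    verb = params.get("verb")
    if verb not in VERBS:
        return "badVerb"

    args = set(params) - {"verb"}

    # resumptionToken is an exclusive argument: nothing else may accompany it
    if "resumptionToken" in args:
        if verb in RESUMABLE and args == {"resumptionToken"}:
            return None
        return "badArgument"

    required, optional = VERBS[verb]
    if not (required <= args) or not (args <= (required | optional)):
        return "badArgument"
    return None
```

For example, `check_request({"verb": "GetRecord", "identifier": "x"})` returns `"badArgument"` because `metadataPrefix` is missing.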