D2.2 Final report on the provision of usage data and
manuscript deposit procedures
ECP-2007-DILI-537003
PEER
D2.2 Final report on the provision of usage data
and manuscript deposit procedures
for publishers and repository managers
Deliverable number
D-2.2
Dissemination level
Public
Delivery date
31 October 2009
Status
Final, 29 Oct 2009, v5
Author(s)
Barbara Bayer-Schur; Foudil Brétel;
Natasa Bulatovic; Gabriella Harangi;
Wolfram Horstmann; Friederike
Kleinfercher; Rianne Koning; Vilius
Kučiukas; Marianna Mühlhölzer; Dale
Peters; Laurent Romary; Jochen
Schirrwagen; Maurice Vanderfeesten
Internal reviewer
Christoph Bruch; Jacques Millet
External reviewer
CIBER group, UCL
This project is funded under the eContentplus programme¹,
a multiannual Community programme to make digital content in Europe more accessible,
usable and exploitable.
¹ OJ L 79, 24.3.2005, p. 1.
Table of Contents
Tables, Figures & Appendices
Introduction
1 Content deposits from publishers to repositories
1.1 Convention
1.2 Established workflows
1.3 Deposit procedures from publishers to the PEER Depot
1.3.1 Full-text format
1.3.2 Metadata
1.3.3 Embargo period
1.3.4 Filtering
1.4 Deposit procedures from the PEER Depot to repositories
1.4.1 Transfer procedures overview
1.4.1.1 Normalisation and Packaging
1.4.1.2 SWORD: Transporting embargo released stage-2 material
1.4.1.3 SWORD: Notifying successful transfer with file location
1.4.1.4 SWORD: Notifying unsuccessful transfer of the file
1.4.2 Metadata
1.4.3 Embargo period
1.5 Deposit procedures from the PEER Depot to LTP Depot
1.5.1 Introduction
1.5.2 Content
1.5.3 Workflow for Transfer to LTP Depot
1.5.4 Metadata
1.5.5 Digital Preservation
2 Content deposits from authors to repositories
2.1 Options for authors
2.2 Communication with authors
2.3 Author deposit workflow
2.3.1 Remote author authentication
2.3.2 Embargo management by repositories
2.3.3 Automated metadata matching process (duplicate author deposits)
2.3.4 Author deposit to a participating PEER Repository
2.3.5 Author deposit to a non-PEER repository
2.3.6 Monitoring author response
3 Provision of usage data
3.1 Introduction
3.1.1 Work package interdependency
3.1.2 Usage research team
3.1.3 Motivation
3.2 Transmission of Log files
3.2.1 Structure of Log files
3.3 Identification of documents
3.4 Expected Result
4 Ongoing support for publishers and repository managers
4.1 Introduction
4.2 Establishment of a Helpdesk
4.2.1 Helpdesk functions
4.2.2 Helpdesk Workflow
4.2.3 Helpdesk for Publishers and Repository Managers
4.2.4 Helpdesk for Authors
4.2.4.3 Guidance for authors on deposit procedures
5 Conclusions
Tables, Figures & Appendices
Tables
Table 1: Minimum metadata requirements
Table 2: SWORD error feedback
Table 3: Metadata categories specified under OAIS model
Table 4: Log file format
Table 5: PEER information model
Figures
Figure 1: PEER workflow
Figure 2: Deposit procedure from the PEER Depot to the repositories
Figure 3: Transmission action via the HTTP-protocol
Figure 4: Notification of successful transfer from PEER Depot to repository
Figure 5: PEER author deposit workflow
Figure 6: UML activity diagram of Helpdesk ticketing system workflow
Figure 7: PEER Helpdesk: Input flow
Figure 8: Content Package or Container
Figure 9: HTTP request and response structure in the SWORD context
Figure 10: PEER Workflow
Figure 11: Deposit situation
Figure 12: OAI-PMH data harvest
Figure 13: SWORD data deposit
Figure 14: SWORD versus FTP
Figure 15: SWORD use in PEER for PEER Depot
Figure 16: Submission Information Package structure
Figure 17: PEER deposit workflow
Figure 18: PEER Object model ERD
Figure 19: OAIS Information Package ERD
Figure 20: OAIS Content Information Object ERD
Figure 21: OAIS Package Description Information ERD
Figure 22: OAIS Reference Model-PEER Information Mapping
Figure 23: Technical Mapping of the PEER model
Figure 24: HTTP Mapping of the Technical Model
Figure 25: Scenario 1
Figure 26: Scenario 2
Figure 27: Scenario 3
Figure 28: Scenario 4
Appendices
Appendix A. Participating journals
Appendix B. Technical specifications for CSV metadata provision
Appendix C. The SWORD protocol
Appendix D. Peer Author Deposit interface specification
Appendix E. Alternate author deposit workflow scenarios
Appendix F. Current and planned practice in the provision of usage data in a participating repository
Introduction
The Draft report on the provision of usage data¹ and manuscript deposit procedures for
publishers and repository managers, deliverable 2.1, set out to establish a workflow for
depositing stage-2 outputs in, and harvesting log files from, repositories to enable the
research envisaged in the PEER project. As that report preceded the tendering process
whereby the respective research teams were selected, a number of issues were flagged for
attention, particularly that of the Usage research team in WP5, and have since been
referred for consultation.
A significant outcome of the previous draft report was the recommendation to establish the
PEER Depot as a closed intermediary repository, to receive publisher deposits comprising
50% of the full-text outputs and 100% of the metadata outputs, and to serve as
a baseline control for the research process. The PEER Depot has since been established,
and has come to play a significant role in the workflow developed. While the draft report set
out a preliminary deposit workflow from publishers to repositories, the central role of the
PEER Depot has since influenced further developments in the provision of usage data and
manuscript deposit procedures for both publishers and authors.
This report is the result of an ongoing negotiation between stakeholder groups comprising
publishers and the library/repository community to establish best practice in deposit
procedures that are least disruptive of existing publication workflows, while minimizing additional
effort in repository ingest activities.
1 Methodology
Interaction between stakeholder groups has been conducted in a series of face-to-face
meetings, in which a progressively increasing number of participants from both publisher
and repository communities chose to participate by teleconference. Not only does this
signify more efficient communication, it also indicates a growing sense of trust amongst and
between stakeholder groups, born of a common understanding of project objectives,
and a pragmatic understanding of the complexity of everyday work processes encountered
by both parties.
The draft recommendations of D2.1 were tested in the course of these discussions. Queries
that have arisen in areas of concern are indicated and some alterations to the workflow are
formulated in this final report.
Following the establishment of the PEER Depot, a pilot phase for publisher deposit to the
PEER Depot was conducted in M10+11, with satisfactory results. A pilot phase for deposit
from the PEER Depot to the repositories and the upload of log files was conducted in M12.
Theoretically, the trial workflow has now moved into production, pending the validation of
individual publisher deposits following the resolution of specific problems encountered
during the pilot phase.
Two major accomplishments of the combined effort of WP2/3 have been the establishment
of a responsible embargo management procedure, now conducted centrally at the PEER
Depot for both publisher and author deposit; and an author deposit workflow, developed in
a progressive scenario-testing process.
The standardised workflow set out in this final report enables a core group of interoperable
European repositories, capable in theory of accepting material deposited from third party
publishers and authors, beyond the project duration.
¹ The DoW originally names this task “Harvesting of log files”. Since the recommended
practice was altered, it is preferred in this document to call it “provision of usage data”.
A further significant achievement of the joint effort of WP2/3 has been the formalisation of
the transfer from the PEER Depot to all partner repositories in a single simultaneous process,
using the SWORD protocol. Not only is this a new application in the transfer of both
metadata and full-text articles, it also results in only a limited percentage of unknown errors
in the transfer process. The intention is to have all PEER content mirrored in all participating
repositories, to achieve a critical mass, except where precluded for technical reasons. The
application of the SWORD Protocol represents best effort at achieving maximum content.
2 Repository Task Force
The Repository Task Force has been successfully established with the following six
participating repositories:
• PubMan, Max-Planck-Gesellschaft zur Förderung der Wissenschaften e.V. (MPG)
  http://dev-pubman.mpdl.mpg.de/pubman/
• HAL, Institut National de Recherche en Informatique et en Automatique (INRIA)
  Centre pour la Communication Scientifique Directe (CCSD/CNRS)
  http://hal.archives-ouvertes.fr/
• Göttingen State and University Library (UGOE)
  http://repository.peerproject.eu:8080/jspui/
• BIPrints, Uni Bielefeld
  http://129.70.12.25/opus4/public/home
• Kaunas University of Technology, Lithuania
  http://peer.elaba.lt/fedora/search
• University Library of Debrecen, Hungary
  http://ganymedes.lib.unideb.hu:8080/udpeer/
In addition, a UK-based repository has been invited to join the task force to better reflect
the expected usage of predominantly English-language content. Preliminary enquiries, however,
indicate a reluctance to participate in the project, ostensibly on the basis of the heavy
workloads of repository managers, who furthermore do not benefit financially from the project.
3 Interaction between stakeholder groups
Partners and stakeholders across Europe hosted meetings of work package 2/3: STM,
London (M2) & (M4); Elsevier, Amsterdam (M6); INRIA, Paris (M8); the SURF Foundation,
Utrecht (M10) and the Max Planck Digital Library, Munich (M13).
This interaction has been supported by the constructive mediation of the Project Manager,
who participates in WP2/3 listserv discussions as a representative of the publisher
stakeholder group. Similarly, the interaction with the research teams is mediated by WP1, and
the research manager is also included in WP2/3 listserv discussions. A recent further
development of this interaction has been the establishment of a repository managers’ listserv,
to include the research manager and a representative of the Usage research team.
4 Relationship between work packages and dependencies
Concern was expressed in the draft report at the disjuncture of work schedules in related
work packages, so that decisions taken on a technical level in WP2 regarding the
specification of log files, for example, might later impact on the suitability of data provided
to the Usage research team in WP 5. With the subsequent appointment of the CIBER group from
University College London (UCL), selected by tender to conduct the usage research, it has
become possible to communicate relevant issues via WP 1, Manage Research Process.
The benefit of the dependency acknowledged between WP 2/3 and WP 4/5 has been
demonstrated in the recommendation of WP 5 to include a UK repository. Since much of the
content is in the English language, usage rates will be much improved by extending the
geographic coverage accordingly.
An attempt has been made to improve the mediated communication between related work
packages firstly by means of a designated repository representative, and subsequently by
means of a shared listserv of all relevant parties.
The relationship between work packages remains a high priority to ensure that identified
dependencies are addressed and miscommunication remains limited. For example, it has
emerged after much uncertainty that no common mechanism devised for repositories in the
preparation of usage log files can be applied to all publishers. Publishers are individually
negotiating with CIBER regarding their log file provision, since they do not have a uniform
set-up internally. Therefore, this report treats only publisher deposit to repositories, and the
usage data subsequently gathered in repositories.
1 Content deposits from publishers to repositories
1.1 Convention
In the context of the PEER project, content refers to stage-2 manuscripts and is understood
as peer-reviewed article manuscripts with corrections as accepted for publication, but prior
to editing and formatting for publication.
A trial/pilot phase for publisher deposit to the PEER Depot was carried out in M10+11, the
results of which were satisfactory. Participating repositories are now ready to receive stage-2
research outputs from publishers via the PEER Depot, following the expiration of an agreed
embargo period.
1.2 Established workflows
In an ideal world, publishers could deposit their content directly to repositories. But
considering the different technologies provided by repositories and the disparity of
technologies implemented by publishers, it appeared that a centralised point of collection,
known as the PEER Depot, would be best suited to gather content from publishers, before
processing and final deposit to repositories and to KB’s long-term preservation (LTP) depot on
behalf of the publishers. The e-Depot at the Koninklijke Bibliotheek in The Netherlands was
invited to act as a long-term preservation archive, without participation in the usage
measurement. The e-Depot plays a similar role for the publishing industry, and is therefore
well positioned to enable the development of workflow, guidelines and standards that will
secure the long-term preservation of the project’s content. The PEER Depot is hosted at INRIA
with the responsibility for facilitating publisher deposit and dissemination to repositories
and to the LTP Depot. The content is also retained in the PEER Depot, in case of processing or
delivery errors. The PEER Depot receives 100% metadata and 50% full-texts of the
publishers’ content. Some publishers only participate in the author deposit aspect, thus
providing only metadata. The metadata is retained to provide a control mechanism for
the comparative research processes of measuring the balance of the 50% deposit by
means of author deposit. This depot shall not be another repository, but a dark archive
(neither accessible nor searchable).
The PEER workflow (Figure 1) shows the expected parallel paths of publisher deposit and
author deposit.
Figure 1: PEER workflow
1.3 Deposit procedures from publishers to the PEER Depot
Publishers deliver content (data + metadata) to the PEER Depot:
• On a daily basis or continuously
• Through FTPS or FTP into a dedicated directory
• As ZIP files, one per article
• File naming convention as [PublisherArticleId]_[yymmddhhmmss].zip¹
• Preferably with an md5 checksum²
• The metadata file contained in the ZIP file should include the name of the full-text
  file, or the ZIP package must contain only one obvious full-text file.
Publishers provide in advance indication of:
• How to extract PEER-related metadata from the metadata file
• How/where to find the full-text in the ZIP file
• The deposit option chosen regarding metadata (see 1.3.2)
Publishers also provide in advance a list of journals contributed to PEER, with their assigned
destination, i.e. publisher or author pathway (see Appendix A: Participating journals).
¹ The PublisherArticleId need not be the same article-id as in the metadata, but it must be
some kind of unique alphanumerical identifier. 'yymmddhhmmss' is the date in the form year in
two digits, month, day, hour, minutes, seconds.
² Each ZIP file should be delivered along with its checksum file.
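The packaging convention above can be illustrated with a minimal Python sketch. It is not part of the PEER specification: the member names metadata.xml and fulltext.pdf, the ".md5" suffix and the layout of the checksum file are assumptions made for illustration only, since the report fixes just the ZIP file name pattern and recommends an md5 checksum.

```python
import hashlib
import re
import zipfile
from datetime import datetime
from pathlib import Path
from typing import Tuple


def package_name(publisher_article_id: str, when: datetime) -> str:
    """Build '[PublisherArticleId]_[yymmddhhmmss].zip' from an id and a timestamp."""
    if not re.fullmatch(r"[A-Za-z0-9]+", publisher_article_id):
        raise ValueError("PublisherArticleId must be a unique alphanumerical identifier")
    return f"{publisher_article_id}_{when.strftime('%y%m%d%H%M%S')}.zip"


def build_package(out_dir: Path, publisher_article_id: str, when: datetime,
                  metadata_xml: bytes, fulltext_pdf: bytes) -> Tuple[Path, Path]:
    """Write one ZIP per article plus a companion md5 checksum file."""
    zip_path = out_dir / package_name(publisher_article_id, when)
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("metadata.xml", metadata_xml)  # assumed member name
        zf.writestr("fulltext.pdf", fulltext_pdf)  # the single obvious full-text file
    digest = hashlib.md5(zip_path.read_bytes()).hexdigest()
    md5_path = zip_path.with_name(zip_path.name + ".md5")  # assumed checksum file name
    md5_path.write_text(f"{digest}  {zip_path.name}\n")
    return zip_path, md5_path
```

The resulting pair of files would then be uploaded to the publisher's dedicated FTPS or FTP directory; the upload itself is left out of the sketch.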
[Figure 1 diagram labels: Publishers; PEER Depot (eligible journals / articles, 100% metadata,
50% manuscripts); PEER Repositories (HAL, KTU, UGOE, ULD, MPG, UNIBI); LTP Depot; Central
Deposit Interface; Authors (50% manuscripts); External Repositories (institutional or
subject-based); arrows labelled deposit, select and inform]
1.3.1 Full-text format
For the sake of long-term preservation, the preferred file format of full-texts is PDF/A-1 [1].
Almost all publishers agreed to provide PDF (not PDF/A), which is also acceptable for the
purposes of the PEER project. Publishers participating in the author deposit pathway do not
provide the full-text in any format. Conversion of source files to PDF is not yet supported by
the PEER Depot. The PDF file must include all figures. Provision of supplementary data is
not needed since the PEER Depot does not forward them to repositories. Files indicating
failed PDF conversion prior to transfer are excluded. The first version provided is
authoritative over any subsequent versions.
In order to identify articles, the full-text files received by the PEER Depot are renamed as
follows: "PEER_stage2_[urlencoded-DOI].pdf" before submission by the PEER Depot to
repositories and the LTP Depot.
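As an illustration of the renaming rule, the following Python sketch derives the file name from a DOI and recovers the DOI again. It assumes that "urlencoded" means percent-encoding of every reserved character, including the slash; the report does not state the exact encoding used by the PEER Depot, and the DOI shown in the usage note is invented.

```python
from urllib.parse import quote, unquote

PREFIX = "PEER_stage2_"
SUFFIX = ".pdf"


def stage2_filename(doi: str) -> str:
    """Return 'PEER_stage2_[urlencoded-DOI].pdf' for the given DOI."""
    # safe="" also encodes "/", so the DOI cannot be mistaken for a path.
    return f"{PREFIX}{quote(doi, safe='')}{SUFFIX}"


def doi_from_filename(name: str) -> str:
    """Recover the DOI from a file name produced by stage2_filename."""
    if not (name.startswith(PREFIX) and name.endswith(SUFFIX)):
        raise ValueError(f"not a PEER stage-2 file name: {name!r}")
    return unquote(name[len(PREFIX):-len(SUFFIX)])
```

For the invented DOI 10.1234/example.5678, stage2_filename returns PEER_stage2_10.1234%2Fexample.5678.pdf.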
1.3.2 Metadata
All publishers agreed to provide metadata in an XML format. Because every publisher uses
a different DTD, it was decided that the PEER Depot would convert all publishers’
XML into the TEI DTD standard. The TEI is a widely used standard for encoding text
materials in XML (including metadata). INRIA is in a position to provide a transformation
mechanism achieving 99.9% conversion from any DTD to TEI.
Since exports to the PEER Depot might occur in different systems at different stages in the
publication workflow, publishers indicated difficulties providing coherent stage-2 metadata.
In some cases, critical metadata elements, such as embargo dates and persistent
identifiers, are either added or first allocated at stage-3 in the publication workflow. To limit
disruption of production workflows, it was agreed that the PEER Depot would support three
options for gathering metadata. A submission is considered complete when all required
metadata are provided.
• Option 1: All required metadata are submitted at stage-2 deposit.
• Option 2: Only a subset of metadata is provided during the first deposit including a
  publisher-article-id; the rest is provided in a second deposit during the embargo
  period including a publisher-article-id¹.
• Option 3: All the metadata updated by the publisher at stage-3 is submitted again, in
  replacement of the stage-2 deposit (except the document, which remains stage-2).
In option 2, for the second pass only, publishers can also provide the complementary
metadata in the following forms:
a. a single XML file, not zipped
b. a CSV file (see Appendix B: Technical specifications for CSV metadata provision)
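The two-pass behaviour of option 2 can be sketched in Python as follows. This is an illustration rather than the PEER Depot implementation: the dictionary representation and the key name PublisherArticleId are assumptions, and a field counts as "provided" in the first pass simply when its key is present. The rule that a second pass completes the first without updating it is taken from the footnote to option 2.

```python
from typing import Dict


def merge_second_pass(first: Dict[str, str], second: Dict[str, str]) -> Dict[str, str]:
    """Complete a first-pass record with second-pass metadata (option 2).

    Fields already present in the first pass are kept unchanged; only
    fields missing from the first pass are taken from the second one.
    """
    key = "PublisherArticleId"  # assumed key name for the publisher-article-id
    if first.get(key) != second.get(key):
        raise ValueError("both passes must carry the same publisher-article-id")
    merged = dict(first)
    for name, value in second.items():
        merged.setdefault(name, value)  # never overwrite a first-pass value
    return merged
```

Under option 3, by contrast, the stage-3 metadata would replace the stage-2 record as a whole, so no field-by-field merge is needed.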
Derived from the DRIVER Guidelines [3], the minimum required set of metadata
includes the mandatory fields recommended in DRIVER, viz. Title, Creator, Date, Type and
Identifier.
While the PEER project recommends the submission of as much metadata as possible, the
minimum requirements are marked (*) as set out below.
¹ Because the second pass completes the first one, metadata provided twice are not updated.
DublinCore-like name     Comment
Title*                   Article Title
Creator*                 Corresponding Author’s name: Last Name, First Name
AuthorEmail              Corresponding Author’s e-mail address
Description              Abstract
Date*                    Date of Publication
Identifier*              DOI or PublisherArticleId
Coverage                 Geographic location of the Contributing Author: ISO 3166-1-A2
Journal                  Journal Title
Affiliation              multi-tier organisation list: Country, Organisation, Laboratory
ISSN (e-ISSN, p-ISSN),   These elements are not mandatory to electronic publication and can
Volume, Issue,           be derived from CrossRef after DOI is provided. They may therefore
First Page, Last Page    not be provided by publishers.
Type*                    Default value = article. Mapped to info:eu-repo/semantics/article,
                         info:eu-repo/semantics/acceptedVersion
Subject Headings         Subject headings; Scientific classification (defaults to what is
                         provided in the PEER Journal tables)
Language                 Language of the article, ISO 639-3 (defaults to 'eng')
Embargo                  Embargo period for PEER Depot (defaults to what is provided in the
                         PEER Journal tables)
Publisher name           Name of publisher (can be derived from the PEER Journal tables or
                         FTPS homedir and is provided in the metadata file as an element)
Access                   Open Access or Restricted
Table 1: Minimum metadata requirements
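A completeness check against Table 1 might look like the following Python sketch. The field names follow the table, but the flat dictionary representation is an assumption (the PEER Depot works on TEI XML), and only the two defaults that the table states as literal values are applied; defaults drawn from the PEER Journal tables are not modelled.

```python
from typing import Dict, List

# Fields marked (*) in Table 1.
REQUIRED_FIELDS = ("Title", "Creator", "Date", "Identifier", "Type")

# Literal defaults stated in Table 1.
DEFAULTS = {"Type": "article", "Language": "eng"}


def with_defaults(record: Dict[str, str]) -> Dict[str, str]:
    """Return a copy of the record with the Table 1 literal defaults filled in."""
    completed = dict(record)
    for name, value in DEFAULTS.items():
        if not completed.get(name):
            completed[name] = value
    return completed


def missing_required(record: Dict[str, str]) -> List[str]:
    """List the mandatory fields still absent or empty after applying defaults."""
    completed = with_defaults(record)
    return [name for name in REQUIRED_FIELDS if not completed.get(name)]
```

A submission would be treated as complete when missing_required returns an empty list, which matches the rule that a submission is complete once all required metadata are provided.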
Since some articles may appear online only, or are published online before distribution of
the paper edition, it was decided that the PEER Depot would not wait for missing metadata
that should be provided by CrossRef (mainly volume, issue, pages), and would transmit articles
as soon as possible. In this respect, the volume, issue, pages metadata can be considered as
recommended, but not mandatory.
Finally, in the case of backfiles comprising previous articles, already set aside by publishers
for the PEER project, and which might be delivered with only a DOI, but no further metadata,
further investigation is required to source metadata from known public sources e.g.
Public Library of Science (PLoS) or PubMedCentral. Each publisher will be approached
individually to check whether backfiles can be provided in a format similar to current
articles. In this case, ingestion to the PEER Depot and transfer to repositories can occur
immediately, to facilitate the research process.
A database is used to store the metadata in the PEER Depot and to track events related to
submission procedures (e.g. incoming and outgoing timestamps). This information can be
made available to the PEER research teams, either through replication, or frequent exports.
A complete list of articles processed in PEER is thus provided for comparative research
between publisher deposit and author deposit procedures. The database also enables
monitoring of the activity of the PEER Depot.
1.3.3 Embargo period
The period of embargo determines the date of distribution from the PEER Depot to
participating repositories and to the LTP Depot. The duration of the embargo period differs
from publisher to publisher and from journal to journal and also applies to author
submission. These dates result in an agreed generic formula:
PublicationDate + EmbargoPeriod = Distribution Date
The publication date is provided in the minimum metadata set, defined either at stage-2 or
stage-3 deposit. The embargo period, if not otherwise defined, defaults to that provided in
the PEER Journal tables¹.
The embargo period on publisher contributed content is handled by the PEER Depot. For
authors’ content provided via the central deposit interface, the embargo period will also be
handled by the PEER Depot (see Ch. 2.3.5). For publisher as well as author deposit the
embargo period is applied according to the metadata provided in “date of publication” (see
Table 1 above). As soon as the embargo period expires and the metadata file is complete,
the content is ready to be transferred to and processed by the repositories and the LTP
Depot.
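The embargo formula and the release condition can be expressed as a short Python sketch. It assumes that embargo periods are expressed in whole months and that a date falling on a non-existent day (for example 31 February) is clamped to the last day of the month; the report specifies neither point, so both are illustrative choices.

```python
import calendar
from datetime import date


def add_months(start: date, months: int) -> date:
    """Add whole months to a date, clamping to the last day of the target month."""
    index = start.month - 1 + months
    year = start.year + index // 12
    month = index % 12 + 1
    day = min(start.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def distribution_date(publication_date: date, embargo_months: int) -> date:
    """PublicationDate + EmbargoPeriod = Distribution Date."""
    return add_months(publication_date, embargo_months)


def ready_for_transfer(publication_date: date, embargo_months: int,
                       metadata_complete: bool, today: date) -> bool:
    """Content is released once the embargo has expired and the metadata is complete."""
    return metadata_complete and today >= distribution_date(publication_date, embargo_months)
```

Because the same formula applies to publisher and author deposits, a single function of this kind could serve both pathways at the PEER Depot.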
1.3.4 Filtering
Two levels of filtering are envisaged as functions of the PEER Depot: firstly, filtering of
journal titles selected by publishers for distribution to repositories and the LTP Depot, and
secondly, filtering of articles submitted by European authors. The PEER Depot receives 100%
of the metadata and 50% of the publisher-provided full-texts. All selected content (that is,
50% of the metadata and the corresponding full-texts) is disseminated to participating
repositories and the LTP Depot.
The selection of publisher-deposited full-text is conducted at the journal title level, not
manuscript level. The choice of eligible journal titles is defined by the publisher community,
with due cognisance of research requirements, viz. behavioural response of specific subject
disciplines.
See Appendix A: Participating journals
In addition, non-research papers (e.g. letters to the editor) may be filtered out by type at
the PEER Depot, if the Type metadata is provided.
¹ See Appendix A & the project website http://www.peerproject.eu/about/participating-journals/
The project design further requires that only articles of European authors should be
included in the study. Since publishers do not generally filter content in this manner, it was
decided that the location of the corresponding author would be used to identify European
content. The automated selection takes place at the PEER Depot, filtered against the
coverage metadata element containing the geographical location of the corresponding
author (by country). The contribution of additional European authors is regrettably lost to
the research process.
An inevitable outcome of the project design, resulting from the filtering process, is a limited research sample. While 50% of the full-texts of the publishers’ content are selected for dissemination to repositories and the LTP Depot, only the portion within that 50% that has a European corresponding author is effectively disseminated. The effective percentage of disseminated content will therefore be lower than 50%. This issue is noted for further consideration, and for possible adjustment of content quotas to ensure a valid research procedure.
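The two filtering levels can be sketched as follows. The record structure, the field names and the country set are hypothetical: the country codes shown are an illustrative subset only, and the actual encoding of the coverage metadata element is not specified here.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set

# Illustrative subset only (assumption); not the project's list of countries.
EUROPEAN_COUNTRIES: Set[str] = {"DE", "FR", "NL", "HU", "LT"}


@dataclass
class Record:
    doi: str
    journal: str
    corresponding_author_country: str  # hypothetical view of the coverage element


def select_for_dissemination(
    records: Iterable[Record],
    selected_journals: Set[str],
    countries: Set[str] = EUROPEAN_COUNTRIES,
) -> List[Record]:
    """Keep records from selected journals whose corresponding author is European."""
    return [
        record
        for record in records
        if record.journal in selected_journals
        and record.corresponding_author_country in countries
    ]
```

Because the second filter is applied after the journal-level selection, the resulting sample can only be smaller than the journal-level selection, which is the effect described above.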
1.4
Deposit procedures from the PEER Depot to repositories
A wide range of content formats submitted by publishers is normalised by the PEER Depot for transfer to participating repositories. Minimal metadata requirements for participating repositories are set out in the DRIVER Guidelines.1
• Participating repositories may opt either to set up a dedicated repository exclusively for receipt of PEER content, or to add content to an existing repository.
• Additional effort in the ingest of PEER content is limited to the implementation of the SWORD interface using the SWORD protocol (see Appendix C: The SWORD protocol).
The LTP Depot is not SWORD compliant, so content is transferred from the PEER Depot to the LTP Depot using FTP or FTP/S.
The deposit procedure uses a unified ingestion service, based on accepted international
standards. These standards include PDF/A (ISO 19005-1:2005); TEI metadata format for
descriptive metadata; ZIP for creating a package containing the PEER content; and the
Atom Publishing Protocol (RFC 5023) using the SWORD specification as a transport
protocol transferring the package to the repository. The benefit achieved is a core group of
interoperable European repositories, capable in theory of accepting material deposited
directly by third party publishers and authors beyond the project duration.
The deposit procedure is an automated process whereby the publications released from
embargo are transferred from the PEER Depot to all partner repositories in a single
simultaneous SWORD transfer. The intention is to have all PEER content mirrored in all
participating repositories, to achieve a critical mass, except where precluded for technical
reasons. When the publications in the repository are accepted and stored, an automated
confirmation message is sent back to the PEER Depot with the online link to the
publication. These locations can be used to notify the author about the links where he or
she can find the stage-2 material.2
The deposit procedure from the PEER Depot to the repositories is illustrated in Figure 2
below.
1
DRIVER Guidelines v.2.0: http://www.driver-repository.eu/DRIVER-Guidelines.html
2
See minutes of PEER WP 2/3 meeting, 3rd September 2009, MPDL, Munich.
Figure 2: Deposit procedure from the PEER Depot to the repositories
1.4.1
Transfer procedures overview
The transfer of the 50% full-text content and the author-submitted files from the PEER Depot is conducted as follows:
• On a daily basis, as articles are normalised continuously
• Submission by FTP/S1 transmission2 or SWORD protocol
• As ZIP files, one per article3
• The ZIP package contains exactly one PDF data file and one metadata file
• File naming convention:
o PEER_stage2_[urlencoded-DOI].pdf
o PEER_stage2_[urlencoded-DOI].xml
o PEER_stage2_[urlencoded-DOI].zip
In order to identify PEER articles in repository log files, slashes in the DOI are encoded as “_slsh_”.
• Submission accompanied by an MD5 checksum4
• In the case of FTP/S, an acknowledgement file named ack_PEER_stage2_[urlencoded-DOI].txt is returned on successful ingestion. It comprises only the repository-internal identifier, which is the URL pointing to the created resource (the file is empty if ingestion was unsuccessful).
1.4.1.1 Normalisation and Packaging
The metadata and full-text files submitted by publisher deposit and by author deposit are normalised, since repositories expect material in a unified standard. The different variations of the delivered metadata formats (mostly NLM format in different versions) are converted to the TEI metadata format. The full-text files are delivered by the publishers in PDF or PDF/A format. Both files are packaged in a ZIP-compliant file.
The files are renamed to contain the DOI, following the syntax “PEER_stage2_[urlencoded-DOI].[ext]”. For all files, [ext] is replaced with “xml” (TEI metadata), “pdf” (full-text) or “zip” (package) respectively, and [urlencoded-DOI] is replaced with the DOI string that accompanies the publication. In some cases, although this is not encouraged, all slashes in the DOI string may be replaced with the string “_slsh_” (for security reasons concerning the web server,
1
FTP/SSL is a secure way to transfer files. The open-source command-line tool cURL can be used as an FTPS client.
2
FTP pull has two advantages: repositories do not have to install an FTP server; and they have confirmation of successful ingest.
3
A single zip-file is essential to enable the PEER Depot to identify clearly each article, i.e. the
material is not spread into many files that need to be gathered together.
4
Each ZIP file is delivered along with its checksum file.
when changing the web server configuration is not allowed). The filename is then URL-encoded (RFC 3986) to avoid unexpected behaviour caused by unrecognised characters.
For example, the DOI “10.2345/38884.299_299” yields the following filenames:
• PEER_stage2_10.2345_slsh_38884.299_299.xml
• PEER_stage2_10.2345_slsh_38884.299_299.pdf
• PEER_stage2_10.2345_slsh_38884.299_299.zip
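The naming convention can be sketched in a few lines. The function name and the replace_slashes switch are illustrative assumptions; only the “PEER_stage2_” prefix, the “_slsh_” replacement and the URL-encoding step are taken from the text. The prefix and extension contain only unreserved characters, so encoding the DOI part alone gives the same result as encoding the whole filename.

```python
from urllib.parse import quote


def peer_filename(doi: str, ext: str, replace_slashes: bool = True) -> str:
    """Build a PEER stage-2 filename from a DOI and a file extension."""
    if ext not in ("xml", "pdf", "zip"):
        raise ValueError("ext must be one of 'xml', 'pdf' or 'zip'")
    if replace_slashes:
        # Optional step from the text: slashes become the literal "_slsh_".
        doi = doi.replace("/", "_slsh_")
    # safe="" percent-encodes every reserved character, including "/".
    return f"PEER_stage2_{quote(doi, safe='')}.{ext}"
```

With replace_slashes=False the slash is percent-encoded instead, which corresponds to the case where the web server is configured to accept encoded slashes.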
The application profile of the TEI metadata tells the repository manager how to interpret the metadata fields in the PEER context. Together, the TEI metadata DTD (see Table 1) and the packaging convention define a standard that tells repositories what to expect when they receive a PEER package. This standard is identified by a unique namespace that can be used when sending the package using SWORD-APP. The namespace URI is:
http://purl.org/net/sword-types/tei/peer/ .
Agreements and conventions
• Package contains one metadata file and one PDF file
• Package format is ZIP
• Metadata format is TEI (Text Encoding Initiative) according to the PEER-TEI Application Profile (see Ch. 1.3.2)
• PDF format is PDF/A (ISO 19005-1:2005)
• Files are renamed using the following syntax: PEER_stage2_[DOI].[ext]
• [DOI] is the Digital Object Identifier of the publication
• [ext] is the extension of the file, either PDF, XML or ZIP
• All slashes in the filename may be replaced with “_slsh_”. This is not encouraged; the default action is to change the web server configuration to allow slashes.
• All filenames are completely URL-encoded
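As a minimal sketch of the packaging convention, the following builds a ZIP package in memory and checks that it holds exactly one PDF and one XML file. The function names are hypothetical, and the check looks only at file extensions; it does not validate PDF/A conformance or the TEI content.

```python
import io
import zipfile


def build_peer_package(base_name: str, pdf_bytes: bytes, tei_xml: bytes) -> bytes:
    """Return a ZIP archive containing <base_name>.pdf and <base_name>.xml."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as package:
        package.writestr(f"{base_name}.pdf", pdf_bytes)
        package.writestr(f"{base_name}.xml", tei_xml)
    return buffer.getvalue()


def is_valid_peer_package(zip_bytes: bytes) -> bool:
    """Check that the archive holds exactly one PDF file and one XML file."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as package:
        names = package.namelist()
    extensions = sorted(name.rsplit(".", 1)[-1].lower() for name in names)
    return extensions == ["pdf", "xml"]
```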
1.4.1.2 SWORD: Transporting embargo released stage-2 material
The complete ZIP file is then ready for transfer to the repositories on expiration of the embargo period. The embargo period differs per journal and is listed accordingly in Appendix A: Participating journals. The algorithm setting the release date is described in Ch. 1.4.3 below. The transfer is authenticated via the SWORD-APP protocol, posts being authorised only by the PEER Depot.
Figure 3 below depicts the transmission action via the HTTP-protocol.
Figure 3: Transmission action via the HTTP-protocol
Agreements and conventions
• Submission accompanied by an MD5 checksum1
• Basic authentication is used
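The conventions for the SWORD transfer (ZIP package, MD5 checksum, Basic authentication and the PEER packaging namespace) can be sketched as the construction of an HTTP POST request. This is a sketch under assumptions: the function name is hypothetical, the request is only built and not sent, and the MD5 value is given as a hex digest, which is an assumption about what the receiving SWORD interface expects (RFC 1864 defines Content-MD5 as a base64-encoded digest).

```python
import base64
import hashlib
import urllib.request

PEER_PACKAGING = "http://purl.org/net/sword-types/tei/peer"


def build_deposit_request(
    collection_url: str, zip_bytes: bytes, filename: str, user: str, password: str
) -> urllib.request.Request:
    """Build (but do not send) a SWORD deposit request for one PEER package."""
    credentials = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    headers = {
        "Content-Type": "application/zip",
        "Authorization": f"Basic {credentials}",
        # Hex digest is an assumption; some servers may expect base64.
        "Content-MD5": hashlib.md5(zip_bytes).hexdigest(),
        "Content-Disposition": f"filename={filename}",
        "X-Packaging": PEER_PACKAGING,
    }
    return urllib.request.Request(
        collection_url, data=zip_bytes, headers=headers, method="POST"
    )
```

The request could then be passed to urllib.request.urlopen; a 201 Created response would indicate a successful deposit.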
1.4.1.3 SWORD: Notifying successful transfer with file location
When the SWORD interface has received the package, it unpacks the ZIP file and stores the PDF and metadata in the repository. Once this is done, the SWORD interface immediately notifies the PEER Depot of the successful operation, providing the URL of the PDF located at the repository.
Figure 4: Notification of successful transfer from PEER Depot to repository
1
Each ZIP file is delivered along with its checksum file.
HTTP header of the POST action (cf. Figure 3):
POST /geo HTTP/1.1
Host: www.myrepository.org
Content-Type: application/zip
Authorization: Basic ZGFmZnk6c2VjZXJldA==
Content-Length: nnn
Content-MD5: [md5-digest]
Content-Disposition: filename=PEER_stage2_[urlencoded-DOI].zip
X-Packaging: http://purl.org/net/sword-types/tei/peer
User-Agent: MyJavaClient/0.1 Restlet/2.0
Note: X-Packaging must be this namespace; the receiving party then knows how to handle the ZIP file.
HTTP header of the success response (cf. Figure 4):
HTTP/1.1 201 Created
Date: Mon, 18 Aug 2008 14:27:11 GMT
Content-Length: nnn
Content-Type: application/atom+xml; charset="utf-8"
Location: http://www.myrepository.org/geo/atom/my_deposit.atom
ATOM entry of the response:
<entry ...>
<title>My Deposit</title>
<id>http://hdl.handle.net/2437.2/20</id>
<updated>2008-08-18T14:27:08Z</updated>
<author><name>jbloggs</name></author>
<summary type="text">A summary</summary>
...
<content type="application/zip" src="http://www.myrepository.org/geo/deposit1.zip"/>
<sword:packaging>http://purl.org/net/sword-types/tei/peer</sword:packaging>
<link rel="edit"
href="http://www.myrepository.org/geo/atom/my_deposit.atom" />
<link rel="part"
href="http://www.myrepository.org/geo/pubs/PEER_stage2_[DOI].pdf"
type="application/pdf" />
</entry>
Notes: The id MUST be an IRI (allows Unicode characters) or a URI. The URI in the Location header and the href of the link with rel="edit" MUST be the same.
Agreements and conventions
• HTTP-header response element “Location” MUST contain the URI of the Media Link Entry, as defined in ATOMPUB.
• The Media Link Entry URI MUST dereference.
• The Media Link Entry URI MUST contain an <atom:content> element with a “src” attribute containing a URI.
• The Media Link Entry URI MUST contain the location of at least the PDF file in the repository.
• The Media Link Entry MAY occur more than once containing other relevant locations to the publication at the repository.
• The Media Link Entry URI MUST NOT contain internal server paths.
• <atom:id> MUST contain an IRI (Internationalised Resource Identifier, RFC 3987), allowing Unicode, or a URI (which is a subset of an IRI)
• <atom:author> MUST contain the user sending the package; it MUST NOT contain the author of the publication.
• Additional mandatory fields are <atom:title> and <atom:summary>
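On the PEER Depot side, the location of the PDF can be read from the returned Atom entry. The following is a minimal sketch that assumes the entry declares the standard Atom namespace and that the PDF is exposed as a link with rel="part" and type="application/pdf", as in the example response; the function name is hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Optional

ATOM_NS = {"atom": "http://www.w3.org/2005/Atom"}


def pdf_location(entry_xml: str) -> Optional[str]:
    """Return the href of the first PDF 'part' link in an Atom entry, if any."""
    entry = ET.fromstring(entry_xml)
    for link in entry.findall("atom:link", ATOM_NS):
        if link.get("rel") == "part" and link.get("type") == "application/pdf":
            return link.get("href")
    return None
```

The returned URL is the value that would later be collected and sent to the author.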
1.4.1.4 SWORD: Notifying unsuccessful transfer of the file
During ingestion, errors may occur at three levels:
1) At the HTTP protocol level
2) At the SWORD interface level
3) At the repository upon ingestion
Levels 1 and 2 describe errors that occur on the surface, at the “communication” level. Level 3 describes an error that occurs below the surface, inside the repository.
Error handling at the HTTP level (1) is considered standardised in all repositories; it MUST be used and is not discussed further here. Error handling at the SWORD interface level (2), as described in the SWORD protocol v1.3, SHOULD be used; it is not explained here, and the reader is referred to the SWORD v1.3 specification. Error handling at the repository level (3) SHOULD also be used and is explained below.
Ingestion Error feedback
When a file has been successfully transferred to the repository, it may still be the case that the repository cannot ingest the received file. The response should then indicate that the problem lies with the ingestion and not with the transmission.
To inform the PEER Depot that the file has not been processed in the repository, the following error handling information SHOULD be used.
Error URI | Usage notes
http://peerproject.eu/sword/error/ErrorOnIngest | The server MUST also return an HTTP status code that best describes the situation.
This introduces a new namespace "http://peerproject.eu/sword/error" and an error type "ErrorOnIngest", which means that the document could not be stored in the repository because of an error such as:
• the repository is down (status code 503)
• the repository takes too long to answer the request
• the repository requires authentication that the SWORD interface cannot provide
SWORD Error feedback
The following error handling procedures are defined in the SWORD protocol v1.3, and SHOULD be implemented to provide useful error handling information.
Error URI | Usage notes
http://purl.org/net/sword/error/ErrorContent | The supplied format is not the same as that identified in the X-Packaging header and/or that supported by the server.
http://purl.org/net/sword/error/ErrorChecksumMismatch | Checksum sent does not match the calculated checksum. The server MUST also return a status code of 412 Precondition Failed.
http://purl.org/net/sword/error/ErrorBadRequest | Some parameters sent with the POST were not understood. The server MUST also return a status code of 400 Bad Request.
http://purl.org/net/sword/error/TargetOwnerUnknown | Used in mediated deposit (see Part A Section 2) when the server does not know the identity of the X-On-Behalf-Of user.
http://purl.org/net/sword/error/MediationNotAllowed | Used where a client has attempted a mediated deposit, but this is not supported by the server. The server MUST also return a status code of 412 Precondition Failed.
Table 2: SWORD error feedback
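The three error levels can be told apart on the receiving side of the response with a small classifier. This is a sketch only: the function name and the returned labels are hypothetical, and it assumes that the error URI, if any, has already been extracted from the response.

```python
from typing import Optional

SWORD_ERROR_PREFIX = "http://purl.org/net/sword/error/"
PEER_INGEST_ERROR = "http://peerproject.eu/sword/error/ErrorOnIngest"


def classify_failure(status_code: int, error_uri: Optional[str] = None) -> str:
    """Map a response to the level at which the deposit failed."""
    if error_uri == PEER_INGEST_ERROR:
        return "repository"  # level 3: failed inside the repository
    if error_uri is not None and error_uri.startswith(SWORD_ERROR_PREFIX):
        return "sword"  # level 2: SWORD interface error
    if status_code >= 400:
        return "http"  # level 1: plain HTTP error without an error URI
    return "none"  # no failure detected
```

When only standard HTTP error messages are available, every failure falls into the "http" category, which is the coarser feedback accepted for the initial implementation.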
Agreements and conventions
Due to rapid implementation, it has been decided1 not to invest effort in the error handling procedures. The first priority is to get SWORD working and to rely on the standard HTTP error messages to identify problems.
However, it is recommended that the SWORD and ingestion error handling be enabled to provide finer-grained feedback. This information supports better analysis when a problem occurs, which may lead to a quicker solution.
1.4.2
Metadata
Publisher profiles indicate that a wide range of metadata schemas is deployed. The minimum required set of metadata elements common to all publisher submissions, derived from the DRIVER Guidelines2, will be transferred to repositories:
• Mandatory elements: Title, Creator, Date, Type and Identifier
• Additional recommended elements as available
• PEER Depot transforms received metadata to TEI
• PEER Depot exports only TEI metadata files, as per repository preference
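A completeness check on the mandatory elements can be sketched as follows. The flat dictionary with lower-case keys is a hypothetical intermediate representation; in practice the values would be extracted from the TEI metadata file.

```python
from typing import Dict, List

MANDATORY_ELEMENTS = ("title", "creator", "date", "type", "identifier")


def missing_mandatory(metadata: Dict[str, str]) -> List[str]:
    """Return the mandatory elements that are absent or blank."""
    return [
        name
        for name in MANDATORY_ELEMENTS
        if not metadata.get(name, "").strip()
    ]
```

An empty result would indicate that the minimum metadata set is complete.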
1.4.3
Embargo period
The embargo period differs according to each journal.
• Publication date plus embargo period determines the date of distribution from the PEER Depot to participating repositories:
PublicationDate + EmbargoPeriod = Distribution Date
o “Publication date” is the date of publication of the stage-3 publication; it can be found in the metadata provided by the publisher.
o “Embargo period” is the period of time during which an article may not be released; it is determined by the journal title and can be found in the table of journals participating in the PEER project (see Appendix A: Participating journals).
o “Distribution date” is the date from which the PEER Depot is allowed to transfer the article to the repositories (after the expiration of the embargo period).
• After author deposit, and if an e-mail address is provided, authors will receive a confirmation message stating that they will be notified when the article becomes available in the participating repositories following expiration of the embargo period. The confirmation message relies on the previous transfer of relevant metadata from publishers.
• Where possible, authors are then notified accordingly, with the links to the repository pages where they can find their deposited material.
1.5
Deposit procedures from the PEER Depot to LTP Depot
1.5.1
Introduction
The e-Depot of the National Library of The Netherlands (KB) aims to ensure perpetual access
to the published records of the arts, humanities and social sciences, science, technology and
medicine, and the digital cultural heritage. The KB assures publishers, libraries and end users
1
See minutes of PEER WP 2/3 meeting, 3rd September 2009, MPDL, Munich.
2
DRIVER Guidelines: http://www.driver-repository.eu/DRIVER-Guidelines.html
that the information preserved in the archive will outlast the transience of digital information
carriers and formats. The role of the KB in the PEER project is to act as the long-term
preservation (LTP) archive. The e-Depot is not an additional PEER repository, but in fulfilment of the curatorial responsibility of the library and repository community, will serve as a long-term preservation depot in which the data objects and the accompanying metadata are kept safe beyond the duration of the project. The KB provides access to the content, based on the
available access information in the metadata of the stage-2 manuscripts.
1.5.2
Content
The LTP Depot only receives and archives the final version of the content (PDF plus accompanying XML in one ZIP file) as delivered to the PEER Depot. The KB does not receive authors’ content directly from the author, but (after a possible embargo period) via the PEER Depot, with the authors’ PDF including the accompanying complete (stage-3) XML metadata from the publishers. As the LTP Depot preserves content objects, only those records from the PEER Depot that contain an object and accompanying metadata will be transferred to and archived in the LTP Depot. This means that metadata-only records are not transferred from the PEER Depot to the LTP Depot.
1.5.3
Workflow for Transfer to LTP Depot
The function of the LTP Depot is to preserve stage-2 manuscripts as deposited in the PEER
Depot. Consequently, the LTP Depot has a different role and place within the PEER
workflow functioning as an archive in which data objects and the accompanying metadata
are kept safe. Figure 1 shows the place and role of the LTP Depot in relation to the PEER
Depot and the PEER repositories.
As described in the workflow, stage-2 manuscripts are fetched from the PEER Depot by
FTP/S. Before transfer takes place, the content of each zip file is converted to its final stage
for archiving into the LTP Depot. Each zip file contains:
? A main file in full-text PDF format
? The complete accompanying metadata file
In processing the content, bibliographic metadata described according to the PEER/TEI
DTD is converted to KB’s DTD, whereas the original metadata delivered by the PEER
Depot is stored with the content and converted metadata. Processing of the packages is
based on the OAIS reference model1.
The LTP Depot is investigating the possibility of returning an acknowledgement file named “ack_PEER_stage2_[urlencoded-DOI].txt” to the PEER Depot upon successful pre-processing on its side. The acknowledgement file should comprise only the repository-internal identifier (which may be a URL). Upon unsuccessful pre-processing, the acknowledgement file should be empty, and the rejection would be handled by the teams involved.
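The acknowledgement convention can be sketched as two small helpers: one deriving the acknowledgement filename from the package name, and one interpreting the file content. Both function names are hypothetical, and treating whitespace-only content as empty is an assumption.

```python
from typing import Optional


def ack_filename(package_name: str) -> str:
    """Derive the acknowledgement filename from a PEER ZIP package name."""
    if package_name.lower().endswith(".zip"):
        stem = package_name[:-4]
    else:
        stem = package_name
    return f"ack_{stem}.txt"


def read_ack(content: str) -> Optional[str]:
    """Return the repository-internal identifier, or None if the file is empty."""
    identifier = content.strip()
    return identifier or None
```

A return value of None would signal that pre-processing failed and that the rejection needs to be followed up manually.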
1.5.4
Metadata
Within the PEER project standardised metadata is applied according to the metadata
requirements set out in Table 1. Publishers deliver the stage-2 manuscripts including
metadata to the PEER Depot, and where possible, further recommended and optional
elements are included. The PEER Depot converts the bibliographic metadata into the PEER/TEI DTD, which, once transferred – as stage-3, i.e. most complete, metadata – to the repositories and the LTP Depot, is mapped according to local usage. The Dublin Core Metadata Element Set is commonly applied. Both the PEER Depot and the KB process
1
http://public.ccsds.org/publications/archive/650x0b1.pdf
bibliographic metadata according to the PEER/TEI DTD – stored in XML format – into their systems, together with the stage-2 manuscript. This bibliographic metadata may be converted to suit local workflows, for example to provide access through the local catalogue.
Further investigation is required of the possible future inclusion of an International Standard Name Identifier (ISNI) [4] and, ultimately, a Digital Author Identification (DAI) [5], as these standards become more widely accepted.
1.5.5
Digital Preservation
PDF Guidelines
The article itself is required in PDF format. The KB maintains PDF guidelines, which mainly cover the following subjects: accessibility and structure, fonts, compression, images, executable actions and colour. The KB is able to archive all PDF versions. For
preservation purposes, PDF/A is the most suitable version. The main reason for this
preference is that PDF files are portable across systems and platforms without changing
the content or authenticity of the document now and in the future.
File information
For every file, main file and supplements, the KB requires the file name of a zip file, the file name of a main file, the file format and the file version. Regarding the file format, the KB advises using the PUID (Persistent Unique Identifier) as a standard. File information can be added in the metadata and is important for migrating files in the future, which might otherwise become unreadable. To ensure long-term preservation, specific files may need to be migrated to another (readable) file format.
The PDI (Preservation Description Information) is required for adequate preservation of the
Content Information. Besides bibliographic metadata, the KB needs to identify metadata
categories as specified under the OAIS model, listed below. For each category, the KB
prefers a separate format. Currently the following categories and the related metadata format
are preferred:
Category | Format
Bibliographic/Descriptive metadata | DCX
Structural metadata | MPEG21-DIDL
Preservation metadata | PREMIS
Provenance metadata |
Technical metadata | For still images: MIX; for text documents: TextMD
Rights metadata | For still images: MIX; for text documents: TextMD
Table 3: Metadata categories specified under OAIS model
There are no strict boundaries between the different categories of metadata: some elements can also be sub-types of other elements, and the categories overlap considerably.
References
[1] PDF/A-1, ISO 19005-1: Document Management – Electronic document file format for long-term preservation – Part 1: Use of PDF 1.4 (PDF/A-1). http://www.pdfa.org
[2] NLM Journal Publishing Tag Set: http://dtd.nlm.nih.gov/publishing/
[3] DRIVER Guidelines: http://www.driver-support.eu/documents/DRIVER_Guidelines_v2_Final_2008-11-13.pdf
[4] International Standard Name Identifier: http://www.isni.org/
[5] Digital Author Identification: http://www.surffoundation.nl/smartsite.dws?ch=eng&id=13480
2 Content deposits from authors to repositories
The development of an appropriate workflow for author deposits has proved to be most
challenging, as the author response is unpredictable. This chapter sets out a process of
author deposit that, as far as possible, does not interfere in established practice. Authors
are therefore encouraged to follow their established practice of deposit in an institutional or
subject-specific repository. Failing such practice, central deposit in the PEER Depot for
distribution to designated repositories is recommended. It is highly unlikely that authors
would be willing to deposit twice, nor does the project wish to impose additional work on
those authors willing to participate. On the other hand, a direct author deposit procedure
parallel to that of publisher deposit is not possible, without undue intervention in scholarly
practice. Instead, authors eligible for participating in the PEER project are notified via the
publisher and invited to respond.
There is currently no effective mechanism in place to ensure significant author participation,
and without it, the value of the research might be questionable. The author deposit workflow
is acknowledged as no more than an effort to keep track of authors self-archiving to PEER
repositories and other repositories. However, it is precisely this lack of a controlled response to the author deposit procedures that will inform the behaviour and usage research investigations in Work Packages 4 & 5 respectively.
2.1
Options for authors
The author deposit procedure is envisaged in alignment with the normal points of contact
between publishers and authors, as follows:
• Authors submitting manuscripts to eligible journals will be informed by the publisher about PEER and its objectives.
• At the point of acceptance, the author will be invited to participate, and to visit the PEER Helpdesk for further details of the project. The request for deposit will include a request to inform the project, should the author intend to deposit the manuscript in a repository of choice other than in PEER (see Ch. 4.2.4.3).
2.2
Communication with authors
For reasons of data privacy, the participating publishers are not able to make the contact
details of eligible authors available, and no direct communication is envisaged. Publishers
are therefore provided with generic texts to communicate sufficient and consistent
information to authors. At the point of acceptance of their manuscripts by their publishers, the authors will receive an invitation to deposit their manuscript within the framework of the PEER project:
This journal is participating in the PEER project <http://www.peerproject.eu/>, which
aims to monitor the effects of systematic self-archiving (author deposit in repositories)
over time. PEER is supported by the EC eContentplus programme
<http://ec.europa.eu/information_society/activities/econtentplus/index_en.htm>.
As your manuscript has been accepted for publication by [Journal name], you may be
eligible to participate in the PEER project. If you are based in the European Union, you
are hereby invited to deposit your accepted manuscript in one of the participating PEER
repositories. You may also choose to deposit in a non-PEER, institutional or subject
repository in addition to, or as an alternative to deposit in a PEER designated
repository. If depositing your accepted manuscript in a non-PEER repository, please set
an embargo period of X months from the date of publication of the journal article for the
public release of your accepted manuscript. For further information on PEER deposit, non-PEER deposit and embargo periods please visit the PEER Helpdesk: http://peer.mpdl.mpg.de/helpdesk.
However, since it is expected that authors may choose to respond immediately upon receipt
of invitation to deposit, the invitation will be linked to the PEER Helpdesk website where
authors are informed on their deposit options:
• For deposit to the PEER Depot, an online interface is established to guide authors through a simple deposit procedure (see Appendix D: Peer Author Deposit interface specification). In this case authors may provide their e-mail address for further contact by the PEER Depot upon successful deposit to participating repositories (see Figure 4).
• When depositing to other repositories (not participating in PEER), authors are invited to provide the URL of the item location in the repository, their name and, optionally, an e-mail address for later contact by the PEER research team. Although we cannot determine how many authors deposit outside the PEER Depot, because authors may or may not declare their intent, any information gathered on alternative deposit may prove useful to the behavioural research.
The PEER Helpdesk additionally provides further information to authors on the PEER project itself, on participating repositories and on the handling of the embargo period. Authors may also post their questions via the Trac1 ticketing system to the PEER project support team.
See Chapter 4: Ongoing support for publishers and repository managers
When depositing to the PEER Depot, the author receives two feedback messages:
1. Upon successful submission:
A message is shown on the screen to notify the author that his/her submitted
publication will be deposited in all PEER participating repositories2 after the
expiration of the embargo period.
2. Upon deposit to the participating repositories i.e. after the embargo period expires:
The PEER Depot will transfer the author submitted manuscript to all participating
repositories. As the SWORD3 protocol is used for this purpose, each repository
confirms the accepted deposit by a message containing the URL which indicates the
location of the article in the repository. The URLs from all repositories are collected
and e-mailed to the author.
2.3
Author deposit workflow
Several scenarios for the author deposit workflow were considered (see Appendix E:
Alternate author deposit workflow scenarios) before the definition of the final workflow for
the author deposit. As outlined in the [DoW], [D2.1] and [D3.1], it was assumed that authors
would deposit their stage-2 manuscripts directly to the repositories participating in PEER.
A problem was foreseen however in the fact that this assumption does not take into account
the authentication of the author with the repositories during the deposit workflow (see Ch.
2.3.1). As most participating repositories do not allow for anonymous deposits, some
authors might not be able to deposit in the repository of their choice, even if they wished to
do so. Some repositories allow for registration directly via their repository interfaces, but
those repositories based at a university, are restricted by the authentication based on the
network and the IP address of the client. Therefore, separate authentication of authors not
affiliated to the repository host organisation would have been necessary for all repositories.
1
http://trac.edgewall.org/
2
Information on participating repositories is available both at the PEER project website
(http://www.peerproject.eu/about/) and the PEER Helpdesk (http://peer.mpdl.mpg.de/helpdesk/wiki/
repositorymanagers#PEERaffiliatedRepositories).
3
http://www.ukoln.ac.uk/repositories/digirep/index/SWORD
Due to data protection issues, the project is not allowed to use author e-mail addresses for
authentication.
Possible alternatives to improve the author deposit – given that the anticipated low deposit rate threatens the validity of the project – are the registration at a central point and perhaps even a centralised deposit. The advantage of the latter is seen additionally in the possibility
to enable any PEER author to deposit to each designated PEER repository, indirectly,
through the PEER Depot. For this purpose, the SWORD protocol, originally applied to the
transfer of publisher data from the PEER Depot to the participating repositories would also
serve as the mechanism to facilitate author deposits to repositories.
2.3.1
Remote author authentication
The following alternative strategies for author authentication have been devised:
• Authentication at the PEER repository of choice, possibly by a single (PEER-guest) account. This account could then be used to distinguish between the standard and the PEER-related repository content.
• Centralised authentication conducted at the PEER Depot or the PEER Helpdesk, either by the author requesting an account (self-registration), by providing his/her e-mail address or as anonymous deposit (no authentication at all) – but with a spam-preventing functionality such as reCAPTCHA1.
The recommendation for centralised authentication for remote author deposits found project
support, though members of this work package are aware that this procedure requires extra
effort to develop and integrate such functionality as a new application in the workflow.
2.3.2
Embargo management by repositories
The embargo period differs for each journal. A list of journal titles and the corresponding
embargo period was provided by the participating publishers and is publicly available at the
PEER website2. It has been acknowledged that repository management of the embargo
period requires considerable and repeated effort. Therefore it was decided to manage the
embargo of author deposits centrally at the PEER Depot prior to repository transfer in a
manner similar to that for publisher deposit:
• The publication date, extended by the duration of the embargo period, determines the
date of distribution of an article from the PEER Depot to participating repositories.
• The PEER Depot holds any content previously received via author deposit until matching
metadata are received from the publishers.
• Matching of publisher deposited metadata with the metadata received from author
deposits determines the release of the deposits to participating repositories after the
expiration of the embargo period.
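The release rule described in the list above can be illustrated with a minimal sketch. It assumes, purely for illustration, that the embargo period is expressed in whole calendar months and that a missing publication date means no matching publisher metadata have arrived yet; the function names are hypothetical and do not describe the actual PEER Depot implementation.

```python
import calendar
from datetime import date


def add_months(start: date, months: int) -> date:
    """Return the date `months` calendar months after `start`, clamping the day."""
    index = start.month - 1 + months
    year = start.year + index // 12
    month = index % 12 + 1
    day = min(start.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def release_date(publication_date: date, embargo_months: int) -> date:
    """Publication date extended by the embargo period (assumed to be in months)."""
    return add_months(publication_date, embargo_months)


def may_be_released(publication_date, embargo_months: int, today: date) -> bool:
    """Hold a deposit until publisher metadata are matched and the embargo has expired."""
    if publication_date is None:  # assumption: None = no matching publisher metadata yet
        return False
    return today >= release_date(publication_date, embargo_months)
```

An article published on 29 October 2009 with a twelve-month embargo would thus be distributed from 29 October 2010 onwards, while an author deposit without matching publisher metadata stays on hold.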
2.3.3
Automated metadata matching process (duplicate author deposits)
To ensure the correct handling of the respective embargo period of an article, it was
regarded as necessary to run an automated process at the repositories to match the author
deposit with the corresponding metadata provided by the publisher.
In the workflow originally envisioned, the PEER Depot would have transferred the metadata
corresponding to an author submitted article only after the expiration of the embargo period
signalling to the repositories the release of the article. The metadata provided by the author
1
http://recaptcha.net/
2
http://www.peerproject.eu/about/participating-journals/
would then have been overwritten with the publisher’s version, since this is expected to be
of a higher standard.
Repositories would have most likely received the full-texts from author self-archiving first,
and thereafter the corresponding metadata from the PEER Depot (after expiration of
embargo period). The identification would have taken place by matching author name and
article title. Solely for the purpose of matching metadata and as an exception, taking the
data protection issues into account, the use of author e-mail addresses was recommended
as a means to match metadata and article. The DOI would not have been suitable as an
identification element, since the authors do not know the DOI at the time of deposit.
This procedure might still have required some manual checking, e.g. for author
names written with special characters or variations of names (abbreviations, academic
titles…).1
However, as both publisher deposits and author deposits are conducted via the PEER
Depot, the process of matching metadata has been moved to the PEER Depot itself.
Authors provide the stage-2 manuscript metadata, the full-text and, optionally, the
corresponding author’s e-mail address when making their deposit. This is the basis for
matching the author provided metadata with the metadata received from the publishers.
Once metadata are matched and the embargo period has expired, the PEER Depot proceeds
with the deposit of the stage-2 manuscript to the participating repositories. Thus the
repositories are spared the additional effort of matching or overwriting author deposited
metadata with the publisher deposits, and the process is simplified.
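A minimal sketch of such a matching step is given below. It is not the PEER Depot implementation: the record layout (dictionaries with "title", "author" and optional "email" keys), the "Family, Given" author format and the function names are assumptions made for illustration. Accents and punctuation are normalised before comparison, which addresses the special-character problem mentioned above; name variants such as academic titles would still need manual checking.

```python
import unicodedata


def normalise(text: str) -> str:
    """Lower-case, strip accents and punctuation, collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", text)
    no_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    kept = "".join(c if c.isalnum() else " " for c in no_accents.lower())
    return " ".join(kept.split())


def match_key(record: dict) -> tuple:
    """Key on the normalised title plus the family name (assumes 'Family, Given')."""
    family_name = record["author"].split(",")[0]
    return normalise(record["title"]), normalise(family_name)


def find_publisher_record(author_deposit: dict, publisher_records: list):
    """Return the publisher record matching an author deposit, or None.

    An e-mail address supplied by the author is tried first; otherwise the
    normalised title and family name are compared.
    """
    email = (author_deposit.get("email") or "").strip().lower()
    if email:
        for record in publisher_records:
            if email == (record.get("email") or "").strip().lower():
                return record
    key = match_key(author_deposit)
    for record in publisher_records:
        if match_key(record) == key:
            return record
    return None
```

Deposits for which no record is found would remain on hold at the PEER Depot until the publisher metadata arrive.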
2.3.4
Author deposit to a participating PEER Repository
In a process of consultation between members of the work packages 2 and 3, a series of
author deposit workflow scenarios were developed.
See Appendix E: Alternate author deposit workflow scenarios
The guiding principle throughout remains the freedom of authors to choose to deposit their
data to an alternative repository of their choice, in accordance with already established
practice, and to inform PEER of this deposit. This functionality is enabled by the PEER Helpdesk.
Eligible (EU) authors who receive an invitation from the publisher to deposit their accepted
manuscript to PEER are directed to the PEER Helpdesk, where they are offered two options:
1. Authors have the option to deposit their accepted manuscripts directly. Here they
have the opportunity to enter their metadata and upload their manuscripts.
2. Authors may choose to deposit in their institutional repository, a subject-based
repository or on their personal website. Authors who choose to do so are kindly
requested to notify the project by inserting the URL of the article in the repository of choice
(see Ch. 2.3.5 and 4.2.4.3).
There is no authentication mechanism in place; instead, a reCAPTCHA2 is used to prevent
automated deposits and spamming.
See Appendix D: Peer Author Deposit interface specification
The article and metadata submitted by the author are transferred to the PEER Depot where
a. the metadata is matched against those received by the respective publisher
b. embargo management takes place
c. the author is informed about transfer of data to repositories
1
See minutes of PEER WP 2/3 meeting, 25th June 2009, SURF, Utrecht.
2
http://recaptcha.net/
The chosen method of author deposit is regarded as a satisfactory solution for both the
project and the authors, since it limits the author’s effort: by making one deposit the manuscript
will be available in all participating repositories. The PEER author deposit workflow is
described in Figure 5 below1.
Figure 5: PEER author deposit workflow
2.3.5
Author deposit to a non-PEER repository
When invited to deposit data to PEER, authors are given an additional option to inform
PEER of an alternative deposit, in case they have deposited their stage-2 manuscripts in a
non-PEER repository.
The PEER Helpdesk directs authors for this purpose to a form where they can provide the
URL of the item/repository where they have deposited their data, their name
and, optionally, an e-mail address for further contact by the PEER behavioural research team.
1
Further information is available at the Max Planck Digital Library Wiki (see
http://colab.mpdl.mpg.de/mediawiki/Peer:_Author_Deposit).
2.3.6
Monitoring author response
Deposit will be monitored by the behavioural research undertaken in WP4, with the appointed
team being the Department of Information Science and LISU at Loughborough University,
UK, and measured against the 100% metadata control managed by the PEER Depot.
The project is aware of the fact that it is not possible to predict the behaviour of authors
invited to deposit. It is noted that limited contact with authors, and hence minimal support
for author deposit could affect the size of the research sample available in WP4. An option
of supplementary harvesting by the PEER Depot, as a means of redress, requires further
investigation.
3 Provision1 of usage data
3.1
Introduction
This chapter defines how repositories should make available usage data to enable research
on usage statistics, i.e. usage levels and patterns.2 According to decisions taken in the
PEER project, a very basic solution is presented: PEER repositories participating as usage
data providers should upload usage log files on a regular basis. The usage log file is a
text (ASCII) file containing, at a minimum, a record of the time and origin of requests for the
PDFs provided by PEER.
The party selected to perform usage research in WP5 (CIBER group, see Ch. 3.1.2) is
required to approach publishers individually for access to their log files. As a consequence,
this interaction with publishers will not be described further in this report.
See Appendix F: Current and planned practice in the provision of usage data in a
participating repository
3.1.1
Work package interdependency
The PEER project [1] will investigate the effects of the large-scale deposit of publications in
repositories on user access, author visibility, and journal viability. Three tenders have been
launched for behavioural and usage (December 2008) as well as for economic research
(September 2009), respectively3. In order to enable this research organised in work
package 1 and 5 of PEER, WP2 and WP3 are required to prepare the technical ground.
This chapter describes basic assumptions and decisions relevant for specifying what WP2
and WP3 can provide for the usage research. The objectives of the usage research will be:
a. to determine usage trends at publishers and repositories
b. understand source and nature of use of deposited manuscripts in repositories
c. track trends, develop indicators, and explain patterns of usage for repositories and
journals [2]
Thus, usage research requires:
• Complete information on the publications to be observed (see Ch. 2)
• Recorded usage events for these publications from all participating repositories as
data-providers
The remainder of this chapter describes how these requirements can be met by the PEER
project, specifically WP2 and WP3.
3.1.2
Usage research team
The CIBER group from University College London (UCL) has been selected to perform
usage research. Together with the behavioural research team it will provide final reports
1
The DoW originally names this task “Harvesting of log files”. Since the recommended
practice was altered, it is preferred in this document to call it “provision of usage data”.
2
The publishers are individually reaching agreement with CIBER regarding their log file
provision, since they do not have a uniform set-up internally.
3
http://www.peerproject.eu/press-releases-announcements/
in mid-2011 and will feed into model development to determine whether (and how) traditional
publishing systems can co-exist with self-archiving.1
CIBER is concerned that the rate of usage of the material may be limited if no repositories from
English speaking countries are included, since the vast majority of content is English
language. They expressed the need to expand the repository task force to achieve better
geographic representation. Thus, CIBER recommended the addition of a repository in the
UK to better reflect the usage of predominantly English language content.
Furthermore, log files from the repositories for at least six months are required before
PEER content becomes available in order to indicate if this additional content makes any
difference to usage levels. For participating repositories that are dedicated PEER
repositories this requirement cannot be met, since they contain no legacy content.
3.1.3
Motivation
Usage research in the domain of digital scholarly publications has recently been discussed
intensively in the context of developing expressive indicators and metrics for the impact of
scholarly publications (see [3] for a recent summary). Unlike the conventional
approaches, which are based on citations and often relate to complete journals rather than to the article
level, usage events are thought to have the potential of providing higher temporal and
thematic resolution (“quicker and more precise”). Methodologies have been developed [4],
also in large scale projects (e.g. MESUR [5]) and standards are about to be expressed (e.g.
PIRUS [3]). Within PEER, it was assumed that these developments are premature – thus
implying unforeseen work for the project – and it was decided [6] not to prepare the
infrastructure for the use of such methodologies or standards but rather to provide 'raw'
web-server log files to the party acquiring the usage research tender. Thus, the specific
questions to be answered by this document are limited as follows:
• How can raw web-server log files be transmitted from local data providers to the
Usage research team?
• What is the structure of the log files?
• Which data shall be provided as a minimum with the web-server log files?
• How can PEER articles be identified in log files?
3.2
Transmission of Log files
Local data providers upload their local log files to a secure server located at UCL. UCL has
set up accounts for the data providers so that they can upload using the SSH-based protocols
rsync, SCP or SFTP. An automated upload by rsync over SSH on a daily basis with one (compressed)
file per day is preferred. Alternatively the dropbox at <http://www.ucl.ac.uk/dropbox/> may
be used. In this case the files should be sent weekly or monthly with all the daily files in a
compressed archive format. The reader may picture this package as a tar.gz or zip-file with
the naming convention:
PEER_usage_[data_provider_name]_[yyyymmddhhmmss].log.[tgz | zip]
The chosen file naming convention is specifically designed to avoid mistaken file overwriting.
The KB, though, will not deliver log files to the PEER project.
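As a rough illustration of the naming convention above, the following sketch builds a package name and bundles a set of daily log files into a gzip-compressed tar archive. The function names and the provider name used in the example are hypothetical; the upload itself (rsync, SCP or SFTP) is left to the data provider's own tooling.

```python
import tarfile
from datetime import datetime
from pathlib import Path


def package_name(provider: str, stamp: datetime, extension: str = "tgz") -> str:
    """Build a name of the form PEER_usage_[provider]_[yyyymmddhhmmss].log.[tgz|zip]."""
    if extension not in ("tgz", "zip"):
        raise ValueError("extension must be 'tgz' or 'zip'")
    return f"PEER_usage_{provider}_{stamp:%Y%m%d%H%M%S}.log.{extension}"


def package_logs(log_files, provider: str, stamp: datetime, target_dir: Path) -> Path:
    """Bundle the given daily log files into one gzip-compressed tar archive."""
    archive = Path(target_dir) / package_name(provider, stamp)
    with tarfile.open(archive, "w:gz") as tar:
        for log_file in log_files:
            # Store each file under its bare name, without local directory paths.
            tar.add(log_file, arcname=Path(log_file).name)
    return archive
```

For a hypothetical provider "examplerepo" packaging at 14:30:05 on 29 October 2009, the resulting name would be PEER_usage_examplerepo_20091029143005.log.tgz. Because the timestamp goes down to the second, two packages from the same provider are unlikely to overwrite each other.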
1
Any data supplied to CIBER will be stored on a secure server located at UCL in London and
held in accordance with UCL data protection policies,
http://www.ucl.ac.uk/efd/recordsoffice/data-protection/
3.2.1
Structure of Log files
The log analysis team requests full and raw logs for two reasons [7].
1. Additional information over and above the minimum enables better validation of the
data.
2. Additional information provides information on patterns of use, and thus the
development of a richer model of user behaviour.
This implies refraining from the application of cleaning routines, typical of analytic tools
such as AWStats [8]. Also, it is assumed (according to [6, 7]) that log files may contain
non-PEER documents and that filtering out the PEER documents is an obligation of the
research team (see also “Identification of Documents”) [11].
A generic and basic specification of log file formats is provided by the W3C [9], commonly
used as “Common Log file Format [10]” and elaborated as “NCSA combined” or “NCSA
extended”.
attribute    mandatory/optional   example                                                      comment
host         m                    125.125.125.125                                              may be anonymised, see below1
rfc931       o                    -
username2    o                    jdoe
date:time    m                    10/Oct/1999:21:15:05 +0500                                   Local time
request      m                    "GET /PEER_stage2_10.1017_S1751731109003917.pdf HTTP/1.0"    PEER filename a must
statuscode   m                    200
bytes        o                    1043
referer      m                    http://www.google.com/                                       Highly recommended
user_agent   m                    "Mozilla/5.0"
Table 4: Log file format
Optional fields that are missing must be represented as “-”. Log files are ASCII text files.
Fields are blank-separated and events are paragraph-separated. Please refer to the
Website [11] for details.
An example is:
66.249.66.5 - - [12/Jan/2009:20:31:53 +0100] "GET /pdf_frontpage.php?source_opus=87&startfile=Egelhaaf_et_al_UniForsch2002.pdf HTTP/1.1" 302 414 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
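To show how a data provider or the research team might check that entries follow the format of Table 4, here is a minimal parsing sketch. The group names follow the attribute names of Table 4; the regular expression itself is an assumption of this example and is deliberately simple (it does not handle escaped quotes inside the request, referer or user agent fields).

```python
import re

# One entry per line: host rfc931 username [date:time] "request" status bytes "referer" "user_agent"
COMBINED = re.compile(
    r'(?P<host>\S+) (?P<rfc931>\S+) (?P<username>\S+) '
    r'\[(?P<datetime>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<statuscode>\d{3}) (?P<bytes>\d+|-) '
    r'"(?P<referer>[^"]*)" "(?P<user_agent>[^"]*)"'
)


def parse_line(line: str):
    """Return the fields of one log entry as a dict, or None if the line does not fit."""
    match = COMBINED.match(line)
    return match.groupdict() if match else None
```

Applied to the example entry above, parse_line yields the host 66.249.66.5, the status code 302 and a referer of "-", i.e. a missing optional value represented as required.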
1
It should be noted that, at least according to German law, IP-addresses are not allowed to
be recorded and handed over to a third party. IP-logging can either be suppressed in the
configuration of the applied logging-routine or the log file has to be made anonymous before it is submitted.
2
PubMan repository reports it cannot provide a username in the logs. This is also
anonymised, and only logged-in users vs. non-logged in users are tracked.
As an Apache configuration (LogFormat string):
"%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-agent}i\""
Where, for reasons of confidentiality, data have to be suppressed or anonymised, the
redacted fields should be replaced with a hash value. In the particular case of IP addresses
it is essential to provide the first three octets of the IP, e.g. ‘128.40.47.21’ may be rewritten
as ‘128.40.47.xxx’. In addition, a hash of the full value is highly desirable.
The suggested procedure for redaction is thus:
Original log entry:
128.40.47.21 - - [31/Jul/2009:19:01:13 +0200] "GET /docs/00/27/02/65/PDF/maladiesdesfemmes.pdf HTTP/1.1" 200 13032 "http://www.google.com/m?q=que%20se%20passe%20t%27il%20lorsqu%27une%20personne%20porte%20un%20gono%20qui%20n%27est%20pas%20bien%20soign%c3%a9&client=ms-opera-mini&channel=new" "Opera/9.60 (J2ME/MIDP; Opera Mini/4.2.14881/504; U; fr) Presto/2.2.0"
Applying a hash function to the IP address (the type of hash, e.g. MD5 or SHA, is not
important):
hash_function(‘128.40.47.21’) -> '1b5e84c1f858d5b9b6b06e47b6ca35ec'
Log entry provided for UCL:
128.40.47.xxx - - [31/Jul/2009:19:01:13 +0200] "GET /docs/00/27/02/65/PDF/maladiesdesfemmes.pdf HTTP/1.1" 200 13032 "http://www.google.com/m?q=que%20se%20passe%20t%27il%20lorsqu%27une%20personne%20porte%20un%20gono%20qui%20n%27est%20pas%20bien%20soign%c3%a9&client=ms-opera-mini&channel=new" "Opera/9.60 (J2ME/MIDP; Opera Mini/4.2.14881/504; U; fr) Presto/2.2.0" IPHASH=1b5e84c1f858d5b9b6b06e47b6ca35ec
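The redaction procedure can be sketched as a small filter applied to each log entry before upload. This is an illustration only: MD5 is chosen here because the text names it as one acceptable option, the function name is hypothetical, and the sketch assumes that the entry starts with an IPv4 address, as in the host field of Table 4.

```python
import hashlib
import re

# An IPv4 address at the very start of the entry, followed by a blank.
IPV4_AT_START = re.compile(r"^(\d{1,3}\.\d{1,3}\.\d{1,3})\.\d{1,3}(?= )")


def redact_entry(line: str) -> str:
    """Mask the last octet of the leading IP address and append a hash of the full address."""
    match = IPV4_AT_START.match(line)
    if match is None:
        return line  # no leading IPv4 address: leave the entry untouched
    full_ip = match.group(0)
    digest = hashlib.md5(full_ip.encode("ascii")).hexdigest()
    masked = f"{match.group(1)}.xxx"
    return f"{masked}{line[match.end():].rstrip()} IPHASH={digest}"
```

The first three octets stay readable, as requested above, while the appended hash still allows entries from the same address to be grouped.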
It is expected that different software environments (e.g. simple apache server logs as in the
case of standard repository systems or complex service oriented architectures as in the
case of the MPDL) will cause different local policies for providing log files and some pitfalls
are manifest:
• The filename or identifier appearing as http-request in the log file may only be
known to the application (repository) but has no reference to PEER documents.
• In other cases raw log files may contain only cryptic calls of services (e.g. a PHP
script1, Web-services, Session Management, Cookie etc.) that do not contain any
identifier and render a later identification of documents difficult or impossible.
• When http-'post' is used instead of http-'get' the identifier may be used as
‘TYPE=HIDDEN’ and does not appear in the log file.
These cases – as well as the many others that can occur – would render it impossible for
the research team to infer which usage event belongs to a specific PEER-document. Thus,
specific elaborations of the log files are to be prepared by an individual data provider.
These elaborations might have different formats and encodings (e.g. TXT, CSV, XML, XLS)
but the use of simple ASCII-textfiles is highly recommended, to avoid errors in the
post-processing by the research team and to limit their workload.
3.3
Identification of documents
Raw log files contain much data of no relevance to the PEER project. Although WP1 has
decided to leave the task of filtering out the data that are relevant for PEER to the Usage
research team, it is the assumed responsibility of WP2 to indicate the identification of the usage
1
192.168.47.11 - - [15/Jan/2009:07:35:06 +0100] "GET /sendfile.php?type=0&file_id=
8c49d37b913076c63054db5414d545c0 HTTP/1.1" 200 61846.
events for publications relevant for PEER. This is conceived here essentially as any kind of
object identifier that can be used to match strings in the usage log files.
As agreed [7], the research design (WP1) foresees that 100% metadata for publications
eligible in PEER are provided by the publishers (via a continuous FTP upload to the PEER
Depot). These metadata will be provided by PEER to the Usage research team (see 3.1.2),
in order to obtain the current list at any given point in time, enabling the matching between
usage events in log files and eligible articles.
It has also been agreed [12], that an identifier will be created at the PEER Depot (see Ch.
1.3.1) that should be used by repositories as a filename after the document has been
received from the PEER Depot.
This filename of the full-text provided by PEER should, in an optimal situation, allow easy
tracking of usage events in the log files. It is therefore mandatory for participating
repositories to represent this PEER-filename, either in the URL of the document or in any form that
allows a later mapping between an internal identifier occurring in the usage event and the
PEER-filename1. In the latter case, the deposit procedures of a participating repository
must thus ensure storage of the PEER-filename as an additional identifier for each
document. Furthermore, the participating repository must provide the research team with a
list of pairs of local identifiers and PEER-filenames, or a pattern to match, so that the team
can track which usage event belongs to which document.
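The two identification routes described above (PEER-filename visible in the URL, or a local identifier mapped to a PEER-filename) could be combined along the following lines. This is a sketch under stated assumptions: the pattern relies on the filename prefixes PEER_stage2_ and PEER_author_, and the mapping from local identifiers to PEER-filenames stands in for the list of pairs that a repository would supply; the function name is hypothetical.

```python
import re

# A PEER-filename as it may appear in a request URL (prefix, then anything up to a delimiter).
PEER_NAME = re.compile(r'PEER_(?:stage2|author)_[^/?&\s"]+')


def peer_filename(request: str, local_to_peer: dict):
    """Return the PEER-filename a request refers to, or None.

    The PEER-filename is first looked for in the request itself; failing that,
    the request is searched for a local identifier from the repository's list.
    """
    direct = PEER_NAME.search(request)
    if direct:
        return direct.group(0)
    for local_id, name in local_to_peer.items():
        if local_id in request:
            return name
    return None
```

Requests for which neither route succeeds are treated as non-PEER traffic, which the research team filters out.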
It was also decided [6, 7] that only 50% of the articles eligible in PEER are deposited on
behalf of the publishers while the other 50% are subject to spontaneous author submission.
The latter 50% will be deposited via a central author deposit interface to the PEER Depot
and transferred to all participating repositories. The PEER Depot therefore performs the
metadata matching before depositing to repositories. Thus usage events can be readily
identified in the raw log files for spontaneously deposited articles as well.
3.4
Expected Result
The expected result of this procedure is a service provided by UCL, by which each
participating repository uploads raw server log files that contain usage events of PEER
articles. The additional requirement of a list of articles eligible in PEER is subject to the
specification of the deposit process.
References
[1] PEER – Description of Work.
[2] Calls for Research Tender, 22 December 2008
http://www.peerproject.eu/press-releases-announcements/
[3] http://ie-repository.jisc.ac.uk/250/1/Usage_Statistics_Review_Final_report.pdf
[4] http://arxiv.org/PS_cache/cs/pdf/0605/0605113v1.pdf
[5] http://www.mesur.org
[6] PEER Steering Committee Meeting, Frankfurt 28-Nov-2008.
[7] PEER Kick-Off Meeting, Sophia-Antipolis 12-Sep-2008.
1
The filename will be “PEER_stage2_[url-encoded-DOI].zip” for publisher content and
“PEER_author_[url-encoded-DOI].zip” for author content. There may be an additional mapping
between the original PEER filename and the filename used in the repository.
[8] http://www.awstats.org
[9] http://www.w3.org/TR/WD-logfile
[10] http://www.w3.org/Daemon/User/Config/Logging.html
[11] http://publib.boulder.ibm.com/tividd/td/ITWSA/ITWSA_info45/en_US/HTML/guide/c-logs.html
[12] PEER Technical Meeting, London 7-Nov-2008.
4 Ongoing support for publishers and repository managers
4.1
Introduction
Communication between the publisher community, the PEER Depot and the repository
community has been ongoing during the course of the project and is documented in this
chapter to bring together the recent developments and resolution of outstanding issues
described in D2.1 Draft report on the provision of usage data and manuscript deposit
procedures for publishers and repository managers.
Due to the overlapping nature of the work, the main point of contact between the members
of work package 2/3 is the list service <peer-wp2-3@inria.fr>. The Project Manager serves
to represent the publisher community and is included in the listserv communication
mechanism. Face-to-face meetings are held regularly between participating publishers, the
repository task force and above all, the members of the respective work packages. This
provides the opportunity to discuss issues in detail which would exceed the limits of e-mail
contact. Several meetings dedicated to discussing technical issues in work package 2/3
were held at various locations. Partners and stakeholders across Europe hosted these
meetings: STM, London (M2) & (M4); Elsevier, Amsterdam (M6); INRIA, Paris (M8); SURF
Foundation, Utrecht (M10) and Max Planck Digital Library, Munich (M13).
The draft recommendations of D2.1 were tested in the course of these discussions. Queries
that have arisen in areas of concern are indicated and some alterations to the workflow are
formulated in this final report.
4.2
Establishment of a Helpdesk
4.2.1
Helpdesk functions
Actors involved in the ongoing support facility envisaged include authors, publishers,
repository representatives and PEER researchers. Support for stakeholders on deposit is
available from two sources. Firstly, the PEER website offers general information on the
project and detailed information tailored to the needs of the various stakeholder groups.
Secondly, a PEER Helpdesk1 online interface has been established. The Helpdesk is
envisaged as a key point of contact for all the stakeholder communities participating in PEER
and has been established primarily as a central point of author support, available at
<http://peer.mpdl.mpg.de/helpdesk>.
This online interface, linked from the PEER website, is an authoritative source of
information. Publishers refer authors to the Helpdesk that, in turn, will direct the author to the
deposit interface (see Ch. 2.3.5 & Appendix D: Peer Author Deposit interface specification).
As an online interface the Helpdesk will facilitate outreach and information provision
activities and will moreover provide advice and support on the implementation of the D3.1
Guidelines2, and questions of deposit and transfer, as described in this report.
The Helpdesk will offer direct support by means of an online query and mediated response
service throughout the project duration. Automated systems have been investigated based
on the following project criteria:
• Meeting the diverse needs of three identified stakeholder communities
• Efficient query handling and response mechanisms
• Handling of specific query behaviour on predetermined information-seeking tasks
1
http://peer.mpdl.mpg.de/helpdesk
2
http://www.peerproject.eu/reports/
The technical representatives of work package 2/3 decided to implement the
support facility in the form of a ticket system (the Trac software, which offers a Wiki and an
issue tracker in one). A ticketing system is highly effective since the questions and answers
are well documented. Each query result will be published, and the participants are able to
review arising issues. This system also provides a mechanism of passive interaction
for those who seek assistance but are unwilling to ask – a notable online query behaviour
pattern. Where the ticketing system is made public, the “wisdom of crowds” principle can be
applied to obtain more efficient responses to complex problems. Furthermore, frequently
asked questions (FAQs) have been developed on the basis of the query results for future
reference and published on the Helpdesk site, following the model established in DRIVER1.
Figure 6 below shows the generic ticketing system workflow:
Figure 6: UML activity diagram of Helpdesk ticketing system workflow
4.2.2
Helpdesk Workflow
The Helpdesk workflow has been modelled on the DRIVER Helpdesk system, supported by
the collaborative Wiki concept: Everybody receives all the queries and answers. But to
ensure that each question gets answered, every query is passed on to a designated
member of work package 2/3 who will be responsible for allocated areas of support. The
moderator at SUB Göttingen is to monitor the Helpdesk and refer queries to representatives
of WP2/3 as per designated responsibilities. Figure 7 below shows the Helpdesk input flow.
1
DRIVER Helpdesk: http://helpdesk.driver.research-infrastructures.eu/
[Figure 6 diagram. Roles: Inquirer (author, publisher, etc.), Moderator, Expert. Activities shown: Place a ticket (e.g. question); Review ticket; Read ticket answer; Publish ticket; Solve ticket; Publish ticket answer; Send ticket to expert; Send activity notification.]
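Figure 7: PEER Helpdesk: Input flow
[Figure 7 diagram. Elements shown: PEER website; PEER Helpdesk (Trac); FAQ (ticket results); e-mail (ticket results); author is directed to deposit interface; PEER Deposit Interface / PEER Depot; SWORD + TEI + PDF; Participating Repositories; feedback to authors with link to publications in repositories; author indicates URL of alternative non-PEER repository.]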
4.2.3
Helpdesk for Publishers and Repository Managers
Publishers will deposit 50% of the full-text outputs, as well as 100% of the metadata
outputs, from eligible journals at the PEER Depot. The 50% full-text outputs will be
transferred from the PEER Depot to the repositories participating in PEER.
Although implementing the D3.1 Guidelines is expected to be straightforward, the PEER
Helpdesk will provide consistent explanation and information to guide
publishers and repository managers through the deposit process.
The support for publishers is provided by expert representatives of INRIA and is
expected to cover queries regarding the expected metadata schema, transfer procedures,
deviations from profile submitted, etc.
The support for repository managers is provided by expert representatives from SURF and
MPDL. It is expected to cover queries regarding how to obtain the “NCSA combined” log file
format, if not directly available. This might entail the provision of scripts for mapping from
other formats, help for using the PEER-filename in the repository, advice on the
corresponding interface to implement the SWORD protocol, etc.
4.2.4
Helpdesk for Authors
4.2.4.3 Guidance for authors on deposit procedures
For reasons of data privacy, no direct communication is envisaged between the project
members and eligible authors. The procedures of author communication in the framework
of the PEER project are described in detail in the D3.1 Guidelines. Therefore, the D3.1
Guidelines are not directed at the author community directly, but rather they reflect the
considered opinion of the work package in consultation with the publisher community on
recommended practice in offering assistance to authors.
The PEER Helpdesk will not only offer guidance to publishers and repository managers,
often already involved in large-scale archiving, but mainly to authors, who may need
guidance in self-archiving for the first time. A recent study by Swan indicates that a
substantial proportion of the author population (36%) are unaware of the possibility of providing
Open Access to their work by self-archiving, and that only 49% of the author population
have self-archived in some way. Of relevance to the PEER Helpdesk is the observation that
authors have frequently expressed reluctance to self-archive because of the perceived time
required and possible technical difficulties in carrying out this activity. However, similar
findings suggest that only 20% of authors found some degree of difficulty with the first act of
depositing an article in a repository, and that this dropped to 9% for subsequent deposits.1
Therefore, information on the Helpdesk is intended to be given in a plain and easy way.
The authors will be guided to the PEER Helpdesk by their publishers. The PEER Helpdesk
offers:
? Generic information on the PEER project
? Option of deposit
? The possibility to pose questions by creating a ticket
? FAQ
? Information on embargo periods
? Information on and for publishers
? Information on and for repositories
To enable the author to deposit either within the scope of the project or to
an alternative repository of their choice, in accordance with already established practice, the
PEER Helpdesk offers the author two options of deposit:
a. PEER Author deposit
In line with the author deposit procedure agreed upon by the project (Ch. 2.3.4), the author will
be guided from the Helpdesk to the central deposit interface for deposit of accepted
manuscripts from participating PEER journals in PEER repositories. At the same time
he/she will be alerted that by depositing once, his/her manuscript will be available in all
the participating repositories after the expiration of the embargo period.
b. non-PEER Author deposit
As agreed in D3.1 Guidelines2, the project will address those authors who do not wish
to deposit within the framework of the PEER project. Accordingly, the author will be
guided to a dedicated page on the Helpdesk, where he/she is requested to insert the
URL of the article submitted in his/her repository of choice. The data gathered
will be sent to the behavioural research team.
No direct communication between the project and the author is envisaged for reasons
of data privacy3. Nevertheless, for the project, in particular for the behavioural research
team, it would be worth collecting details of non-PEER deposits. Therefore, the author
is additionally requested to provide his/her name and e-mail address to enable the
behavioural research team to possibly contact him/her with a questionnaire.
1
SWAN, A. & BROWN, S. (2005)
Open access self-archiving: An author study.
http://cogprints.org/4385/
2
D3.1 Guidelines, Appendix C, Ch. 2.1, http://www.peerproject.eu/reports/
3
D3.1 Guidelines, Ch. 4.3.1, http://www.peerproject.eu/reports/
The PEER Privacy Policy states:
If author information is to be used […], permission will be asked of the authors at the
point where the author provides this information.
Individuals may be contacted for the purposes of PEER research either by publishers,
repositories, or directly by the research teams. Where such contact takes place, it will
be undertaken in accordance with the data protection and privacy policies of the
relevant organisation. [1]
Hence, providing these last two details is optional for the author.
The author will be alerted that by giving his/her e-mail address he/she agrees to be
contacted by the behavioural research team. Although we cannot determine how many
author deposits will be made outside the scope of the project, because authors may or
may not declare their intent, the data gathered on alternative deposits may prove
useful to the behavioural research.
[1] PEER Privacy Policy: http://www.peerproject.eu/privacy-policy/
5 Conclusions
This report concludes the development of an overall framework for depositing stage-two
outputs in repositories and for harvesting log files from them. An innovative workflow has
been devised to describe and standardise the deposit from publishers to repositories. It
demonstrates, in a core group of interoperable European repositories, the capability of
accepting material deposited by third party publishers and authors beyond the project duration.
The development of an appropriate workflow for author deposits has proved challenging, as
the author response is unpredictable, and cannot readily be standardised. The guiding
principle adopted is that authors are encouraged to follow their established practice of
deposit in an institutional or subject-specific repository. Failing such practice, a central
deposit in the PEER Depot for distribution to designated PEER repositories is recommended.
A number of concerns remain, and may yet impact on the project outcomes. The author
deposit workflow described in this report is acknowledged as no more than an effort to keep
track of authors self-archiving to PEER repositories and other repositories. While it is
precisely this lack of a controlled response to the invitation to participate that will inform the
behaviour and usage research investigations in Work Packages 4 & 5 respectively, without
significant author participation, the value of the research may ultimately be compromised.
Another concern arising from the project design is the limitation of the research sample,
resulting from the filtering process. While 50% of the full texts of the publishers' content
is disseminated to repositories and the LTP Depot, only the portion of that 50% with a
European corresponding author is effectively disseminated. The effective percentage of
disseminated content will therefore be lower than 50%. This potential deficiency
is noted for ongoing monitoring in Work Package 3, and adjustment of content quotas is
recommended during the course of the project to ensure a valid research procedure.
Also evident is the delay in research implementation caused by the 6-month embargo
period. Ongoing attempts to secure back files accumulated by publishers may serve to
alleviate this concern. Furthermore, log files from the repositories covering at least the
six months before PEER content becomes available are required in order to indicate whether
additional PEER content makes any difference to usage levels. For participating
repositories that are dedicated PEER repositories this requirement does not apply, since
they contain no legacy content.
Despite the accepted recommendation by CIBER to support an increased rate of usage of
predominantly English language content material by the inclusion of repositories from
English-speaking countries, this has yet to be achieved. Although this recommendation will
be pursued, preliminary enquiries indicate a reluctance to participate in the project,
ostensibly on the basis of heavy workloads of repository managers, who furthermore do not
benefit financially from the project.
The final report on the provision of usage data and manuscript deposit procedures for
publishers and repository managers reflects a collaborative effort between publishers and
the library and repository stakeholder communities to achieve a feasible workflow for
depositing stage-2 outputs and for harvesting log files from repositories. The limitations
of the project design have been identified and made known to the behavioural and usage
research teams in WP 4 and 5 respectively, to monitor their anticipated impact on the
project outcomes.
Appendix A.
Participating journals
PEER: Author submission Journal list by publisher
Publisher / Journal | ISSN | Broad Classification | Embargo* (months) | Language (if not Eng)

BMJ Publishing Group
Journal of Neurology, Neurosurgery and Psychiatry (including Practical Neurology) | 0022-3050 | Medicine | 6
Journal of Medical Genetics | 0022-2593 | Medicine | 5
Sexually Transmitted Infections | 1368-4973 | Medicine | 5

Cambridge University Press
The Journal of Agricultural Science | 0021-8596 | Life Sciences | 12
Bilingualism: Language and Cognition | 1366-7289 | Social Sciences & Humanities | 12
Journal of Biosocial Science | 0021-9320 | Life Sciences | 12
Journal of Helminthology | 0022-149X | Life Sciences | 12
Science in Context | 0269-8897 | Social Sciences & Humanities | 12
Urban History | 0963-9268 | Social Sciences & Humanities | 12

Elsevier
Annales d’Endocrinologie | 0003-4266 | Life Sciences | 18 | French
Annales de Dermatologie et de Venereologie | 0151-9638 | Medicine | 18 | French
Annals of Pure and Applied Logic | 0168-0072 | Physical Sciences | 18
Applied Acoustics | 0003-682X | Physical Sciences | 24
Biomass and Bioenergy | 0961-9534 | Physical Sciences | 24
Blood Cells Molecules and Diseases | 1079-9796 | Medicine | 18
Brain and Language | 0093-934X | Life Sciences | 18
Cell Calcium | 0143-4160 | Life Sciences | 12
Computers and Geotechnics | 0266-352X | Physical Sciences | 24
Energy | 0360-5442 | Physical Sciences | 18
Enfermedades infecciosas y Microbiologia Clinica | 0213-005X | Medicine | 18 | Spanish
European Journal of Radiology | 0720-048X | Medicine | 18
European Journal of Soil Biology | 1164-5563 | Life Sciences | 18
European Journal of Surgical Oncology (EJSO) | 0748-7983 | Medicine | 12
Fire Safety Journal | 0379-7112 | Physical Sciences | 24
Immunology Letters | 0165-2478 | Life Sciences | 12
International Journal of Antimicrobial Agents | 0924-8579 | Medicine | 18
Journal of Pragmatics | 0378-2166 | Social Sciences & Humanities | 24
Journal of Theoretical Biology | 0022-5193 | Life Sciences | 18
Materials Science in Semiconductor Processing | 1369-8001 | Physical Sciences | 24
Nuclear Engineering and Design | 0029-5493 | Physical Sciences | 24
Radiotherapy and Oncology | 0167-8140 | Medicine | 18
Sociologie du Travail | 0038-0296 | Social Sciences & Humanities | 24 | French
Solar Energy | 0038-092X | Physical Sciences | 24
Telecommunications Policy | 0308-5961 | Physical Sciences | 18

IOP Publishing
Classical and Quantum Gravity | 0264-9381 | Physical Sciences | 12
Journal of Physics A: Mathematical and Theoretical | 1751-8113 | Physical Sciences | 24
Journal of Physics: Condensed Matter | 0953-8984 | Physical Sciences | 12

Nature Publishing Group
Bone Marrow Transplantation | 0268-3369 | Medicine | 6
Embo Journal, The | 0261-4189 | Life Sciences | 6
Gene Therapy | 0969-7128 | Life Sciences | 6
Genes & Immunity | 1466-4879 | Life Sciences | 6
Leukemia | 0887-6924 | Medicine | 6
Nature Genetics | 1061-4036 | Life Sciences | 6
Nature Structural & Molecular Biology | 1545-9993 | Life Sciences | 6
Oncogene | 0950-9232 | Life Sciences | 6

Oxford University Press
Family Practice | 0263-2136 | Medicine | 12
Molecular Biology and Evolution | 0737-4038 | Life Sciences | 12
Systematic Biology | 1063-5157 | Life Sciences | 12
Annals of Occupational Hygiene | 0003-4878 | Medicine | 12

Sage Publications
Active Learning in Higher Education | 1469-7874 | Social Sciences & Humanities | 6
Concurrent Engineering | 1063-293X | Physical Sciences | 12
Cultural Geographies | 1474-4740 | Social Sciences & Humanities | 12
Ethnicities | 1468-7968 | Social Sciences & Humanities | 24
European Journal of Cultural Studies | 1367-5494 | Social Sciences & Humanities | 18
European Journal of Industrial Relations | 0959-6801 | Social Sciences & Humanities | 12
European Journal of Women's Studies | 1350-5068 | Social Sciences & Humanities | 18
European Union Politics | 1465-1165 | Social Sciences & Humanities | 24
Global Social Policy | 1468-0181 | Social Sciences & Humanities | 6
Group Processes and Intergroup Relations | 1368-4302 | Social Sciences & Humanities | 18
Health | 1363-4593 | Social Sciences & Humanities | 12
History of Psychiatry | 0957-154X | Social Sciences & Humanities | 12
International Journal of Damage Mechanics | 1056-7895 | Physical Sciences | 12
Journal of Biomaterials Applications | 0885-3282 | Physical Sciences | 12
Journal of Plastic Film and Sheeting | 8756-0879 | Physical Sciences | 12
Journal of Thermoplastic Composite Materials | 0892-7057 | Physical Sciences | 12
Public Understanding of Science | 0963-6625 | Social Sciences & Humanities | 18
Second Language Research | 0267-6583 | Social Sciences & Humanities | 24
Time & Society | 0961-463X | Social Sciences & Humanities | 12
Vascular Medicine | 1358-863X | Medicine | 12

Springer
Biotechnology Letters | 0141-5492 | Life Sciences | 12
Cancer Chemotherapy and Pharmacology | 0344-5704 | Medicine | 12
Celestial Mechanics and Dynamical Astronomy | 0923-2958 | Physical Sciences | 6
European Journal of Clinical Microbiology & Infectious Diseases | 0934-9723 | Life Sciences | 12
European Journal of Epidemiology | 0393-2990 | Medicine | 12
Holz Als Roh und Werkstoff | 0018-3768 | Physical Sciences | 12 | German
Journal of Ornithology | 0021-8375 | Life Sciences | 12
Journal of Molecular Modeling | 1610-2940 | Physical Sciences | 12
Neophilologus | 0028-2677 | Social Sciences & Humanities | 6
Nonlinear Dynamics | 0924-090X | Physical Sciences | 12
Queueing Systems | 0257-0130 | Social Sciences & Humanities | 24
Rheumatology International | 0172-8172 | Medicine | 12

Taylor & Francis Group
Applied Economics Letters | 1350-4851 | Social Sciences & Humanities | 18
British Journal of Guidance and Counselling | 0306-9885 | Social Sciences & Humanities | 12
Civil Engineering and Environmental Systems | 1028-6608 | Physical Sciences | 12
Communications in Statistics – Theory and Methods | 0361-0926 | Physical Sciences | 12
Ergonomics | 0014-0139 | Physical Sciences | 12
International Journal of Environmental Analytical Chemistry | 0306-7319 | Physical Sciences | 12
International Journal of Psychology | 0020-7594 | Life Sciences | 12
International Journal of Remote Sensing | 0143-1161 | Physical Sciences | 12
International Journal of Systems Science | 0020-7721 | Physical Sciences | 12
Journal of Engineering Design | 0954-4828 | Physical Sciences | 12
Journal of Modern Optics | 0950-0340 | Physical Sciences | 12
Journal of Natural History | 0022-2933 | Life Sciences | 12
Journal of Sports Sciences | 0264-0414 | Social Sciences & Humanities | 18
Optimization Methods and Software | 1055-6788 | Physical Sciences | 12
Phase Transitions | 0141-1594 | Physical Sciences | 12
Philosophical Magazine Letters | 0950-0839 | Physical Sciences | 12
Psychotherapy Research | 1050-3307 | Social Sciences & Humanities | 12

Wiley-Blackwell
Applied Cognitive Psychology | 0888-4080 | Social Sciences & Humanities | 24
Applied Organometallic Chemistry | 0268-2605 | Physical Sciences | 24
Biomedical Chromatography | 0269-3879 | Physical Sciences | 24
Biopharmaceutics and Drug Disposition | 0142-2782 | Life Sciences | 12
Computer Animation and Virtual Worlds | 1546-4261 | Physical Sciences | 24
Concurrency and Computation: Practice & Experience | 1532-0626 | Physical Sciences | 24
Contrast Media and Molecular Imaging | 1555-4309 | Physical Sciences | 24
European Law Journal | 1351-5993 | Social Sciences & Humanities | 24
European Transactions on Electrical Power | 1430-144X | Physical Sciences | 24
Forest Pathology | 1437-4781 | Life Sciences | 12
Higher Education Quarterly | 0951-5224 | Social Sciences & Humanities | 24
Hippocampus | 1050-9631 | Life Sciences | 12
Infant and Child Development | 1522-7227 | Social Sciences & Humanities | 24
International Journal for Numerical Methods in Engineering | 0029-5981 | Physical Sciences | 24
International Journal of Adaptive Control and Signal Processing | 0890-6327 | Physical Sciences | 24
International Journal of Applied Linguistics | 0802-6106 | Social Sciences & Humanities | 24
International Journal of Osteoarchaeology | 1047-482X | Life Sciences | 12
International Journal of Systematic Theology | 1463-1652 | Social Sciences & Humanities | 24
Journal of Advanced Nursing | 0309-2402 | Medicine | 12
Journal of Clinical Periodontology | 0303-6979 | Medicine | 12
Journal of Molecular Recognition | 0952-3499 | Physical Sciences | 24
Journal of Sociolinguistics | 1360-6441 | Social Sciences & Humanities | 24
Luminescence | 1522-7235 | Physical Sciences | 24
Marine Ecology | 0173-9565 | Life Sciences | 24
Modern Theology | 0266-7177 | Social Sciences & Humanities | 24
Particle and Particle Systems Characterization | 0934-0866 | Physical Sciences | 24
Polymers for Advanced Technologies | 1042-7147 | Physical Sciences | 24
River Research and Applications | 1535-1459 | Life Sciences | 12
Social Policy & Administration | 0144-5596 | Social Sciences & Humanities | 24
Zoo Biology | 0733-3188 | Life Sciences | 12
* Authors are recommended to refer to the PEER Helpdesk
(http://peer.mpdl.mpg.de/helpdesk/wiki/embargoperiods) for an explanation of the embargo period
and how this relates to their submissions to participating PEER repositories.
PEER: Publisher submission Journal list by publisher
Publisher / Journal | ISSN | Broad Classification | Embargo (months) | Language (if not Eng)

BMJ Publishing Group
British Journal of Ophthalmology | 0007-1161 | Medicine | 6
Journal of Epidemiology and Community Health | 0143-005X | Medicine | 6
Tobacco Control | 0964-4563 | Medicine | 5

EDP Sciences
ESAIM: Probability and Statistics | 1292-8100 | Physical Sciences | 12 | French/Eng
The European Physical Journal – Applied Physics | 1286-0042 | Physical Sciences | 12

Elsevier
Annales Medico-Psychologiques | 0003-4487 | Medicine | 18 | French
Applied Thermal Engineering | 1359-4311 | Physical Sciences | 24
Astroparticle Physics | 0927-6505 | Physical Sciences | 18
Biochemical Pharmacology | 0006-2952 | Life Sciences | 12
Biochimica et Biophysica Acta (BBA) – Molecular Basis of Disease | 0925-4439 | Life Sciences | 12
Biophysical Chemistry | 0301-4622 | Physical Sciences | 18
Composites Science and Technology | 0266-3538 | Physical Sciences | 18
Computer Speech & Language | 0885-2308 | Physical Sciences | 18
European Journal of Mechanics – A/Solids | 0997-7538 | Physical Sciences | 24
Experimental and Toxicologic Pathology | 0940-2993 | Life Sciences | 18
Experimental Gerontology | 0531-5565 | Medicine | 18
Human Movement Science | 0167-9457 | Life Sciences | 18
Icarus | 0019-1035 | Physical Sciences | 18
International Journal of Impact Engineering | 0734-743X | Physical Sciences | 24
International Journal of Non-Linear Mechanics | 0020-7462 | Physical Sciences | 18
Journal of Econometrics | 0304-4076 | Social Sciences & Humanities | 36
Journal of Economic Behavior & Organization | 0167-2681 | Social Sciences & Humanities | 36
Journal of Economic Dynamics & Control | 0165-1889 | Social Sciences & Humanities | 36
Journal of Experimental Social Psychology | 0022-1031 | Social Sciences & Humanities | 36
Journal of Geodynamics | 0264-3707 | Physical Sciences | 18
Journal of Physics and Chemistry of Solids | 0022-3697 | Physical Sciences | 18
Marine Environmental Research | 0141-1136 | Life Sciences | 12
Molecular and Cellular Endocrinology | 0303-7207 | Life Sciences | 12
Physics of the Earth and Planetary Interiors | 0031-9201 | Physical Sciences | 24
Pulmonary Pharmacology & Therapeutics | 1094-5539 | Medicine | 18
Speech Communication | 0167-6393 | Physical Sciences | 18
Statistics & Probability Letters | 0167-7152 | Physical Sciences | 24
Veterinary Microbiology | 0378-1135 | Medicine | 18

IOP Publishing
Journal of Physics B: Atomic, Molecular and Optical Physics | 0953-4075 | Physical Sciences | 12
Journal of Physics D: Applied Physics | 0022-3727 | Physical Sciences | 12
Journal of Physics G: Nuclear and Particle Physics | 0954-3899 | Physical Sciences | 12

Nature Publishing Group
Cell Death and Differentiation | 1350-9047 | Life Sciences | 6
European Journal of Clinical Nutrition | 0954-3007 | Medicine | 6
European Journal of Human Genetics | 1018-4813 | Life Sciences | 6
Molecular Psychiatry | 1359-4184 | Medicine | 6
Nature Immunology | 1529-2908 | Life Sciences | 6
Nature Neuroscience | 1097-6256 | Life Sciences | 6
Neuropsychopharmacology | 0893-133X | Life Sciences | 6
Prostate Cancer and Prostatic Diseases | 1365-7852 | Medicine | 6

Oxford University Press
International Journal of Epidemiology | 0300-5771 | Medicine | 12
Journal of Plankton Research | 0142-7873 | Life Sciences | 12

Portland Press
Clinical Science | 0143-5221 | Medicine | 12

Springer
Agriculture and Human Values | 0889-048X | Social Sciences & Humanities | 24
Annals of Hematology | 0939-5555 | Medicine | 12
Breast Cancer Research and Treatment | 0167-6806 | Medicine | 12
Crime Law and Social Change | 0925-4994 | Social Sciences & Humanities | 24
European Child & Adolescent Psychiatry | 1018-8827 | Social Sciences & Humanities | 24
European Journal of Clinical Pharmacology | 0031-6970 | Life Sciences | 12
European Journal of Population | 0168-6577 | Social Sciences & Humanities | 6
European Journal of Wildlife Research | 1612-4642 | Life Sciences | 12
Formal Aspects of Computing | 0934-5043 | Physical Sciences | 12
Helgoland Marine Research | 1438-387X | Physical Sciences | 12
Journal of Public Health | 0943-1853 | Social Sciences & Humanities | 6
Journal of Seismology | 1383-4649 | Physical Sciences | 12
Linguistics and Philosophy | 0165-0157 | Social Sciences & Humanities | 24
Review of World Economics | 1610-2878 | Social Sciences & Humanities | 24
Revue de Synthese | 0035-1776 | Social Sciences & Humanities | 24 | French

Taylor & Francis Group
Aids Care | 0954-0121 | Life Sciences | 12
Applied Economics | 0003-6846 | Social Sciences & Humanities | 18
Avian Pathology | 0307-9457 | Life Sciences | 12
British Poultry Science | 0007-1668 | Life Sciences | 12
Communications in Statistics – Simulation and Computation | 0361-0918 | Physical Sciences | 12
Engineering Optimization | 0305-215X | Physical Sciences | 12
Ethnic and Racial Studies | 0141-9870 | Social Sciences & Humanities | 18
Europe-Asia Studies | 0966-8136 | Social Sciences & Humanities | 18
Food Additives & Contaminants (Part A) | 0265-203X | Life Sciences | 12
International Journal of Computer Integrated Manufacturing | 0951-192X | Physical Sciences | 12
International Journal of Computer Mathematics | 0020-7160 | Physical Sciences | 12
International Journal of Production Research | 0020-7543 | Physical Sciences | 12
International Journal of Science Education | 0950-0693 | Social Sciences & Humanities | 18
Journal of Development Studies | 0022-0388 | Social Sciences & Humanities | 18
Molecular Physics | 0026-8976 | Physical Sciences | 12
Molecular Simulation | 0892-7022 | Physical Sciences | 12
Philosophical Magazine | 1478-6435 | Physical Sciences | 12
Psychology and Health | 0887-0446 | Social Sciences & Humanities | 12
Quantitative Finance | 1469-7688 | Social Sciences & Humanities | 18
Regional Studies | 0034-3404 | Social Sciences & Humanities | 18
Supramolecular Chemistry | 1061-0278 | Physical Sciences | 12
Technology Analysis & Strategic Management | 0953-7325 | Social Sciences & Humanities | 18

Wiley-Blackwell
Alimentary Pharmacology & Therapeutics | 0269-2813 | Medicine | 12
Allergy | 0105-4538 | Medicine | 12
American Journal of Hematology | 0361-8609 | Medicine | 12
Bioethics | 0269-9702 | Social Sciences & Humanities | 24
Biotechnology Journal | 1860-6768 | Life Sciences | 12
British Journal of Haematology | 0007-1048 | Medicine | 12
Cell Biochemistry and Function | 0263-6484 | Life Sciences | 12
Clinical Endocrinology | 0300-0664 | Medicine | 12
Corporate Governance | 0964-8410 | Social Sciences & Humanities | 24
Developing World Bioethics | 1471-8731 | Social Sciences & Humanities | 24
Developmental Science | 1363-755X | Social Sciences & Humanities | 24
Electrophoresis | 0173-0835 | Life Sciences | 12
Fuel Cells | 1615-6846 | Physical Sciences | 24
Global Change Biology | 1354-1013 | Life Sciences | 24
Haemophilia | 1351-8216 | Medicine | 12
Histopathology | 0309-0167 | Medicine | 12
Human Brain Mapping | 1065-9471 | Life Sciences | 12
Human Mutation | 1059-7794 | Life Sciences | 12
International Journal of Clinical Practice | 1368-5031 | Medicine | 12
Journal of Clinical Ultrasound | 0091-2751 | Medicine | 12
Journal of Community and Applied Social Psychology | 1052-9284 | Social Sciences & Humanities | 24
Journal of Medical Virology | 0146-6615 | Medicine | 12
Journal of Physical Organic Chemistry | 0894-3230 | Physical Sciences | 24
Molecular Microbiology | 0950-382X | Life Sciences | 12
Oral Diseases | 1354-523X | Medicine | 12
Pediatric Anesthesia | 1155-5645 | Medicine | 12
Pediatric Pulmonology | 8755-6863 | Medicine | 12
Phytotherapy Research | 0951-418X | Life Sciences | 12
Social Development | 0961-205X | Social Sciences & Humanities | 24
ZAAC – Zeitschrift für anorganische und allgemeine Chemie / Journal of Inorganic and General Chemistry | 0044-2313 | Physical Sciences | 24 | German / English
Appendix B.
Technical specifications for CSV metadata provision
The CSV file must conform with the following specifications:
• Filename not important, but extension must be '.csv'
• UTF-8 encoding
• Quote character: "
• Separation character: ,
• End-of-line: \n
• Field names included in the first line
• Field names must be among:

Textual column title | Comment
author_country | country code ISO 3166-1-A2
author_firstname
author_middle
author_lastname
author_email
affiliation_institution
affiliation_department
pubdate | ISO 8601
article_title
journal_title
publisher_article_id
doi
abstract
issn
volume
issue
fpage
lpage
subject
Lang | ISO 639-3
embargo | In months
We insist that, except for 'publisher_article_id' and 'doi', which are used for linking the
two passes, the metadata sets of the two passes must not overlap. Metadata
submitted twice will not be updated.
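The specification above can be illustrated with a minimal Python sketch that writes a conforming file using the standard csv module. The function name write_peer_csv and the sample record are hypothetical and only serve as illustration; the field names are copied from the list above (including 'Lang' as printed there), and the sketch is not the project's own tooling.

```python
import csv
import io

# Field names copied from the list above ("Lang" is kept as printed there).
ALLOWED_FIELDS = {
    "author_country", "author_firstname", "author_middle", "author_lastname",
    "author_email", "affiliation_institution", "affiliation_department",
    "pubdate", "article_title", "journal_title", "publisher_article_id",
    "doi", "abstract", "issn", "volume", "issue", "fpage", "lpage",
    "subject", "Lang", "embargo",
}


def write_peer_csv(rows, fieldnames):
    """Serialise rows to UTF-8 CSV bytes: header line first, ',' separator,
    '"' quote character and '\\n' line endings (hypothetical helper)."""
    unknown = [name for name in fieldnames if name not in ALLOWED_FIELDS]
    if unknown:
        raise ValueError(f"field names not in the specification: {unknown}")
    buffer = io.StringIO(newline="")
    writer = csv.DictWriter(
        buffer,
        fieldnames=fieldnames,
        delimiter=",",
        quotechar='"',
        lineterminator="\n",
        quoting=csv.QUOTE_MINIMAL,
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue().encode("utf-8")


if __name__ == "__main__":
    # Illustrative record only; the DOI and title are made up.
    sample = [{"doi": "10.1000/example", "article_title": "An example, with a comma", "embargo": "12"}]
    print(write_peer_csv(sample, ["doi", "article_title", "embargo"]).decode("utf-8"))
```

Values containing the separator or the quote character are quoted automatically, with embedded quotes doubled, which is the default behaviour of the csv module.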
Appendix C.
The SWORD protocol
1
Introduction
In the PEER project, selected stage-2 material from publishers is transferred to, or
deposited into, the PEER Depot, after which the content is transferred from the depot
to multiple, publicly available repositories.
The stage-2 material will be transferred in a Submission Information Package (SIP)
containing the full-text publication, metadata and the complementary stage-2 source files.
The SWORD AtomPub profile contains specific features that allow for an application-level
deposit of material into repositories.
The PEER information model can be mapped onto the OAIS Reference Model and the
DRIVER object model for Enhanced Publications.
Implementers may set up their own server conforming to these guidelines using one of the
repository-specific implementations available from SourceForge, or write their own custom
implementation, either using the generic Java library, also available from SourceForge,
or beginning their implementation from scratch.
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD",
"SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be
interpreted as described in RFC 2119.
It is assumed that the reader of this document has knowledge of the PEER D2.1 report [1],
SWORD profile v1.3 [2], the OAIS [3] Reference Model [4] and the DRIVER [5] II Enhanced
Publication object model and Functionalities [6].
1.1
SWORD overview
The SWORD AtomPub Profile is an application profile of the Atom Publishing Protocol
(APP) (RFC 5023) [7] that contains specific features that allow for an application-level
deposit of material into repositories.
The APP is based on the HTTP transfer of Atom-formatted representations. It is tempting to
think of APP as merely a way of publishing Atom Syndication Format feeds. While it is true
that APP provides the means to publish Atom Syndication Format Entries to collections (such
as blogs), it also provides a mechanism for publishing binary formatted data, called
Media Resources in the APP context (Internet Engineering Task Force 2007). While in the blog
scenario this mechanism may be used to add attachments to a blog post (e.g. images, audio,
video, documents), SWORD exploits it for the publishing (or deposit) of material into
repositories, usually in some form of content packaging in which data and descriptive
metadata are held together in one container (see Figure 8).
[1] PEER D2.1 Draft report on log file harvesting systems and manuscript deposit procedures for publishers and repository managers, http://www.peerproject.eu/reports/
[2] Allinson, J et al 2008, SWORD AtomPub Profile version 1.3, viewed 25 March 2009, http://www.swordapp.org/docs/sword-profile-1.3.html
[3] Open Archival Information System.
[4] Consultative Committee for Space Data Systems 2002, OAIS Reference Model, http://public.ccsds.org/publications/archive/650x0b1.pdf
[5] Digital Repository Infrastructure Vision for the European Region.
[6] Verhaar, P & Place, T 2008, Report on Object Models and Functionalities, DRIVER II D4.2.
[7] Internet Engineering Task Force 2007, The Atom Publishing Protocol, RFC 5023, Internet Engineering Task Force, http://tools.ietf.org/html/rfc5023
Figure 8: Content Package or Container
An example of an implementation of such a container would be a ZIP file containing a
full-text manuscript in the PDF/A-1 format and descriptive metadata in the TEI-XML format.
The container is submitted by a client to a SWORD interface service (server) as a bit
stream using an HTTP POST request. The request consists of a header, which carries
information about authorisation and about the bit stream (type and format of the container)
so that the server can interpret the bit stream properly, and a body part containing the
bit stream itself (see Figure 9). Upon reception, the server sends an HTTP response back to
the client, again consisting of a header and a body part. The header contains an HTTP status
code indicating success or failure of the attempted deposit according to regular HTTP
semantics, and the body contains a response document with additional APP/SWORD-specific
information about the deposit.
Figure 9: HTTP request and response structure in the SWORD context
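The request described above can be sketched in Python with the standard library only. This is a minimal illustration under stated assumptions, not the PEER implementation: the file names inside the ZIP, the collection URL, the packaging identifier and the credentials are all hypothetical, and the request object is only constructed, not sent. The X-Packaging header is the one the SWORD profile v1.3 uses to declare the package format; HTTP Basic authentication is assumed here for simplicity.

```python
import base64
import io
import urllib.request
import zipfile


def build_container(pdf_bytes, tei_xml):
    """Pack a full-text PDF and TEI metadata into one ZIP container.
    The member names are illustrative, not prescribed by PEER."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("manuscript.pdf", pdf_bytes)
        archive.writestr("metadata.xml", tei_xml)
    return buffer.getvalue()


def build_deposit_request(collection_url, container, packaging, user, password):
    """Build (but do not send) the HTTP POST request for a SWORD deposit."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    headers = {
        "Authorization": f"Basic {token}",                     # who is depositing
        "Content-Type": "application/zip",                     # type of the bit stream
        "Content-Disposition": "filename=peer-deposit.zip",    # hypothetical file name
        "X-Packaging": packaging,                              # format of the container
    }
    return urllib.request.Request(collection_url, data=container, headers=headers, method="POST")


if __name__ == "__main__":
    container = build_container(b"%PDF-1.4 placeholder", "<TEI/>")
    request = build_deposit_request(
        "http://repository.example.org/sword/collection",  # hypothetical endpoint
        container,
        "http://example.org/packaging/peer-sip",            # hypothetical identifier
        "peer",
        "secret",
    )
    print(request.get_method(), request.full_url, len(request.data), "bytes")
```

Passing the request to urllib.request.urlopen would perform the deposit; under AtomPub a successful creation is answered with status 201 (Created), while a failure surfaces as an HTTP error status.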
1.2
Use of SWORD in PEER
The PEER workflow specifies two scenarios for deposits into the PEER repositories:
deposit made by PEER and deposit made by authors (see Figure 10).
Figure 10: PEER Workflow
Figure 11: Deposit situation
This results in an n:n relation between repositories and deposit sources, the latter being
either the PEER Depot or third party services operated by an author (see Figure 11). To
prevent multiple tailored solutions and implementations it is important to define a
standard process for the deposit of material into repositories.
The processes may be categorised into two types of mechanisms: push and pull. An
example of the pull mechanism is the KB's e-depot, which harvests repositories through
OAI-PMH and pulls content using a web client (see Figure 12) that downloads the objects
specified in the location entries in the metadata.
Figure 12: OAI-PMH data harvest
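As a small illustration of the pull side, the sketch below builds the URL of an OAI-PMH ListRecords request in Python. The base URL is hypothetical and the helper name list_records_url is introduced here for illustration only; the verb and the metadataPrefix, set and from arguments are standard OAI-PMH request parameters. The sketch only constructs the URL and says nothing about how the KB e-depot itself is implemented.

```python
from urllib.parse import urlencode


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None, from_date=None):
    """Build the URL of an OAI-PMH ListRecords request (hypothetical helper)."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec      # restrict the harvest to one set
    if from_date:
        params["from"] = from_date    # selective harvesting by datestamp
    return f"{base_url}?{urlencode(params)}"


if __name__ == "__main__":
    # Hypothetical repository endpoint.
    print(list_records_url("http://repository.example.org/oai", from_date="2009-10-01"))
```

A harvester would fetch this URL, read the location entries from the returned metadata records and then download the referenced objects in a second step.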
An example of the push mechanism is the SWORD deposit mechanism, where the data is
pushed by an agent (i.e. a webservice or desktop application representing a user) to
the SWORD interface of a repository, which then accepts or rejects the deposit (see
Figure 13).
Figure 13: SWORD data deposit
[Figure 10 diagram labels: Publishers; eligible journals / articles; PEER Depot (100% metadata, 50% manuscripts); LTP Depot; PEER Repositories (HAL, KTU, UGOE, ULD, MPG, UNIBI); Authors; Central Deposit Interface; External Repositories (institutional or subject-based)]
Finally, a third, hybrid mechanism can be created by setting up an FTP server to which
deposits can be uploaded (pushed) by an agent. A repository may then pull the content
from the FTP server into the repository (see Figure 14).
Figure 14: SWORD versus FTP
A disadvantage of this mechanism is that it only provides direct feedback to the agent
about the status of the upload, not about the status of the actual deposit into the
repository. An agent may successfully upload data to the FTP server, only for the data to
be rejected by the repository afterwards because it does not adhere to the rules the
repository enforces on its contents, without the agent being informed about this rejection.
This cannot happen when using SWORD.
Figure 15 provides a schematic overview of the use of SWORD in the PEER deposit
scenario. Here a publisher transfers manuscripts and metadata into the PEER Depot, where
they are converted and crosswalked to the formats specified for the PEER deposit process.
The converted and crosswalked manuscripts and metadata are then packaged into a container
and sent to the SWORD interface service of a repository, where the contents are unpacked
from the container. Upon reception these MAY be converted and crosswalked into an internal
storage format before they are archived in the repository.
Figure 15: SWORD use in PEER for PEER Depot
2
Use of SWORD features
2.1
About this section
This section describes the use of the SWORD profile in the context of the PEER project.
The contents follow the organisation of, and are supplementary to, the document SWORD
AtomPub Profile version 1.3 part A. If a SWORD profile section or feature is omitted,
implementations MUST behave as defined in the SWORD profile.
2.2
Package Support
The PEER Submission Information Package (SIP) MAY be expressed using (a combination
of) different formats (e.g. XML containers or RFC 1951 compliant ZIP archives) and/or
serialised using different structural models (e.g. DIDL, METS, ORE, TEI, NLM, MODS, DC).
The mappings between the SIP, its components and the formats and structures will be
defined and expressed using specialised application profiles developed in the PEER context.
Figure 16: Submission Information Package structure
The SWORD profile offers the possibility to enumerate multiple packaging formats in the
Service Document and supply a Quality Value attribute indicating a preference and level of
support for a designated package format.
2.2.1
Package support in Service Description
The server MAY support multiple packaging formats with varying quality values according
to the support of the PEER Submission Information Package (SIP).
The server MUST support at least one package format with Quality Value “1.0”, indicating
full support where all components supplied within the SIP will be processed and understood
when using the designated package format.
All supported formats MUST be listed in the Service Document.
All formats listed in the Service Document MUST have a Quality Value attribute assigned.
The value used in the <sword:acceptPackaging> element MUST NOT overload any
values enumerated in the SWORD Content Package Types.
The server MAY use the <sword:service> element in the Service Document to indicate the
existence of other service interfaces supporting additional package formats.
The server SHOULD NOT accept a specific package format across multiple interfaces with
different levels of support as indicated by the Quality Value attribute in the Service
Document.
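The rules above can be checked on the client side with a short Python sketch. It assumes that packaging formats appear in the Service Document as sword:acceptPackaging elements carrying a q attribute, in the namespace http://purl.org/net/sword/ (both taken from the SWORD profile v1.3 and to be verified against the server in use). The function names and the sample Service Document in the usage part are hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace as used by the SWORD profile v1.3 (assumption, verify for your server).
SWORD_NS = "http://purl.org/net/sword/"


def accepted_packaging(service_document_xml):
    """Return {package format URI: quality value} from a Service Document.
    Raises ValueError if a format lacks the Quality Value attribute."""
    root = ET.fromstring(service_document_xml)
    formats = {}
    for element in root.iter(f"{{{SWORD_NS}}}acceptPackaging"):
        quality = element.get("q")
        if quality is None:
            raise ValueError("package format listed without a Quality Value")
        formats[(element.text or "").strip()] = float(quality)
    return formats


def fully_supported(formats):
    """List the formats with Quality Value 1.0; PEER requires at least one."""
    full = [uri for uri, quality in formats.items() if quality == 1.0]
    if not full:
        raise ValueError("no package format with Quality Value 1.0")
    return full


if __name__ == "__main__":
    # Hypothetical Service Document; URIs are placeholders.
    sample = (
        '<service xmlns="http://www.w3.org/2007/app" xmlns:sword="http://purl.org/net/sword/">'
        '<workspace><collection href="http://repository.example.org/sword/peer">'
        '<sword:acceptPackaging q="1.0">http://example.org/packaging/peer-tei-zip</sword:acceptPackaging>'
        '<sword:acceptPackaging q="0.5">http://example.org/packaging/other</sword:acceptPackaging>'
        "</collection></workspace></service>"
    )
    print(fully_supported(accepted_packaging(sample)))
```

A depositing client such as the PEER Depot could use the result to pick a format the server fully supports before building the container.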
2.2.2
Package Support during Resource Creation
If a server receives a POST request with a format that is not listed as an accepted format in
the Service Document, the server MUST reject the package by returning an HTTP status
code of 415 (unsupported media type).
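On the server side this rule reduces to a membership test. The following Python fragment is a minimal sketch with a hypothetical function name; it returns the status code only and leaves the rest of the deposit handling aside.

```python
from http import HTTPStatus


def status_for_packaging(requested_packaging, accepted_formats):
    """Return 415 if the package format of a POST request is not listed in the
    Service Document, otherwise 201 (hypothetical helper)."""
    if requested_packaging not in accepted_formats:
        return HTTPStatus.UNSUPPORTED_MEDIA_TYPE
    return HTTPStatus.CREATED


if __name__ == "__main__":
    accepted = {"http://example.org/packaging/peer-tei-zip"}  # placeholder URI
    print(int(status_for_packaging("http://example.org/packaging/unknown", accepted)))
```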
2.2.3
Package description in entry documents
When describing packaged resources in Media Entry documents, the server SHOULD add
sword:packaging elements to the entry.
2.3
Mediated Deposit
The following paragraph is considered informative, but is included for clarity in the use of
the SWORD profile outside the PEER project.
The PEER workflow offers two ways a manuscript can be deposited into one of the publicly
available PEER repositories: either by publisher deposit (through the PEER Depot) or by
author deposit (where the publisher informs the author, who deposits his/her article(s)
via the deposit interface of the PEER Depot for transfer to the publicly available
repositories).
For the author deposit, the author MAY make the deposit by proxy through a web service
(e.g. by filling in a form to provide the metadata and uploading a file containing the
full-text material), after which the web service makes the actual deposit. The web service
might not be used exclusively for the PEER project, in which case the web service MAY use
its own credentials to authenticate at the server (at the repository side).
Figure 17 depicts an example of the use of this mechanism in the PEER context. Note that
the greyed out parts of the figure are considered outside the scope of the PEER project.
Figure 17: PEER deposit workflow
It is recognised that the repository MAY want to keep track of data that is deposited
within the PEER context by creating a single user account for the PEER Depot. This
covers the publisher deposit workflow, but does not provide a solution for the case of
author deposit through another web service, which MAY use different credentials.
A possible solution MAY be the use of mediated deposit where a client authenticates using
its assigned credentials on behalf of another known user (e.g. a web service authenticates
using its own credentials and makes the deposit on behalf of the PEER user which is used
by the PEER Depot).
This method MAY also be used to authenticate on behalf of other users (i.e. authors,
librarians, data stewards, research assistants, etc.) that already have a valid user account
at the repository.
The use of mediated deposit is considered OPTIONAL and is currently not implemented in
the application of the SWORD profile within the PEER project.
2.3.1
Mediation in Service Description
Servers supporting mediated deposit MUST indicate this by including a sword:mediation
element with a value of “true” in the Service Document as defined in the SWORD profile
version 1.3 section 2.1.
For servers that do not include a sword:mediation element in the Service Document, a
default value of “false” SHOULD be assumed by clients.
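As an illustration, a client could read the mediation flag from a Service Document and prepare the header for a mediated deposit roughly as sketched below. The function names are hypothetical. The SWORD namespace URI and the X-On-Behalf-Of header name are given here as understood from the SWORD profile version 1.3 and should be checked against that document.

```python
import xml.etree.ElementTree as ET
from typing import Dict

# Namespace assumed for SWORD 1.3 extension elements.
SWORD_NS = "http://purl.org/net/sword/"


def supports_mediation(service_document: str) -> bool:
    """Return True only if a sword:mediation element with value 'true' is found."""
    root = ET.fromstring(service_document)
    element = root.find(f".//{{{SWORD_NS}}}mediation")
    if element is None or element.text is None:
        return False  # element absent: assume mediation is not supported
    return element.text.strip().lower() == "true"


def mediated_headers(on_behalf_of: str) -> Dict[str, str]:
    """Build the extra request header naming the user the deposit is made for."""
    return {"X-On-Behalf-Of": on_behalf_of}
```

A web service would authenticate with its own credentials and add the returned header to the HTTP POST request.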
2.4
Auto-discovery
AtomPub makes no recommendations on the discovery of Service Documents.
The SWORD profile states that it is RECOMMENDED that server implementations use an
<html:link rel="sword" href="[Service Document URL]"/> element in the head of a relevant
HTML document to assist with service discovery.
In addition, it is RECOMMENDED to also include an <atom:link rel="sword"
type="application/atomsvc+xml" href="[Service Document URL]"/> element in relevant
response documents such as Error Documents.
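A client-side sketch of this discovery mechanism is given below. It scans an HTML document for a link element with rel="sword" and returns its href. The class and function names are hypothetical, and the sketch uses only the Python standard library.

```python
from html.parser import HTMLParser
from typing import List, Optional


class SwordLinkFinder(HTMLParser):
    """Collects the href of every <link rel="sword"> element."""

    def __init__(self) -> None:
        super().__init__()
        self.service_documents: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attributes = dict(attrs)
        rel_values = (attributes.get("rel") or "").lower().split()
        if "sword" in rel_values and attributes.get("href"):
            self.service_documents.append(attributes["href"])


def discover_service_document(html: str) -> Optional[str]:
    """Return the first advertised Service Document URL, or None."""
    finder = SwordLinkFinder()
    finder.feed(html)
    return finder.service_documents[0] if finder.service_documents else None
```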
2.5
Nested Service Descriptions
Nested Service Descriptions MAY be used to specify alternative collections for both
organisational (i.e. generic collection with a nested PEER specific collection) and technical
purposes (i.e. a specific interface or service instance to cater for specific types of content
packaging).
3
Use of APP features
The contents of the following section are organised according to, and supplement, the
document SWORD Atom Pub Profile version 1.3 part B. If a SWORD profile section or
feature is omitted, implementations MUST behave as defined in the SWORD profile.
3.1
Securing the Atom Publishing Protocol
The SWORD profile states that servers SHOULD support the use of HTTP Basic Authentication
over TLS. Because, from a trust perspective, it is important to confirm the identity of the
PEER Depot during the deposit process, this statement is considered insufficient for the
purposes of the PEER project. Therefore, this requirement has been restated as follows:
Servers implementing SWORD MUST support HTTP Basic Authentication (RFC 2617) over
TLS (RFC 2818).
3.2
Creating and Editing Resources
When depositing resources using SWORD, resources are created by a server when a client
makes an HTTP POST request with the resource in the HTTP request body. If the deposit
is made successfully, the server returns an HTTP response with the HTTP 201 status
code in the header of the response, indicating that the resource has been successfully
created on the repository side.
Servers returning a HTTP 201 status code after a deposit MUST preserve the resource
deposited.
Clients receiving a HTTP 201 status code MUST consider the resource deposited as being
accepted for storage by the repository.
3.2.1
Asynchronous treatment of resources
It MAY, however, be the case that the repository implements an additional asynchronous
validation process, after which a resource may or may not be accepted. This is, for
instance, the case when a repository uses an intermediate repository where resources
deposited through the SWORD interface are temporarily stored, and from which they are
moved to a final location within the repository once they have been validated by a
repository manager. If a resource is rejected by the repository during the validation
process after the server has sent an HTTP 201 response to the client, the situation MAY
arise where the client considers the resource to have been successfully deposited into the
repository, while in fact the resource is NOT stored in the repository. This situation
is viewed as undesirable.
Servers implementing an asynchronous validation process MUST return an HTTP 202
Accepted status code, indicating that the request has been accepted for processing but that
the processing has not been completed.
Clients receiving a HTTP 202 status code upon deposit of a resource MUST consider the
resource deposited as NOT being stored into the repository.
RFC2616 states that there is no facility for the re-sending of status codes. Therefore, a
client will not receive a notification of the outcome of the processing carried out by the
server. In order to allow clients to retrieve the outcome of the deposit, the sword:treatment
element MAY contain the status of the processing of the deposited resource.
Servers implementing HTTP 202 status codes MUST supply a permanent link to the Atom
Entry document of the response.
Servers implementing HTTP 202 status codes MUST update the sword:treatment element
of the Atom Entry document of the resource with the status of the processing of the
deposited resource.
Clients SHOULD implement a mechanism to confirm the successful deposit when an HTTP 202
status code has been received upon deposit: the client periodically sends an HTTP GET
request to the permanent link supplied by the server and checks the contents of the
sword:treatment element of the Atom Entry describing the deposited resource.
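The checking mechanism could be sketched as follows. The fetch function is passed in by the caller (for example a function that issues the HTTP GET request to the permanent link), so the sketch itself makes no network calls. The function names, the retry parameters and the test for a final treatment value are assumptions, since this document does not fix the wording of the sword:treatment contents.

```python
import time
import xml.etree.ElementTree as ET
from typing import Callable, Optional

# Namespace assumed for SWORD 1.3 extension elements.
SWORD_NS = "http://purl.org/net/sword/"


def read_treatment(entry_xml: str) -> Optional[str]:
    """Return the text of sword:treatment in an Atom Entry, or None if absent."""
    root = ET.fromstring(entry_xml)
    element = root.find(f"{{{SWORD_NS}}}treatment")
    if element is None or element.text is None:
        return None
    return element.text.strip()


def poll_treatment(fetch_entry: Callable[[], str],
                   is_final: Callable[[str], bool],
                   attempts: int = 5,
                   delay_seconds: float = 60.0) -> Optional[str]:
    """Fetch the entry repeatedly until is_final accepts the treatment text."""
    for attempt in range(attempts):
        treatment = read_treatment(fetch_entry())
        if treatment is not None and is_final(treatment):
            return treatment
        if attempt < attempts - 1:
            time.sleep(delay_seconds)  # wait before checking back
    return None  # outcome still unknown after all attempts
```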
4
PEER Object Model
In order to future-proof agreements and guidelines for a technical model, it is important to
detach the technical implementation from the abstract object and information model.
Furthermore, it is important to keep this abstract model aligned with other developments in
the area in which the model will be used. For PEER, there are two such developments:
• OAIS Reference Model, for its use by the KB
• DRIVER object model for Enhanced Publications, for its use in the DRIVER context
In PEER, manuscripts and metadata will be transferred between authors, publishers, the
PEER Depot, Open Access repositories and an LTP Depot operated by the KB.
This results in a PEER object consisting of a manuscript object which is described by
one or more metadata objects (see Figure 18).
The D2.1 report provides an exhaustive metadata field set to be used in the PEER project
(see Table 5, below).
Figure 18: PEER Object model ERD
Field Name    Semantics                                     Syntax
Title         Article Title
Creator       Corresponding author's name                   Last name, first name
AuthorEmail   Corresponding author's e-mail address
Description   Abstract
Date          Date of publication                           ISO 8601:2004; yyyy-MM-dd
Identifier    DOI of published article
Coverage      Geographic location of the contributing       ISO 3166-1-A2
              Author
Journal       Journal title
Affiliation   Multi-tier organisation list                  Country, Organisation, Laboratory
ISSN
Volume
Issue
Page
Type          Semantic type of the publication              info:eu-repo/semantics/article
                                                            info:eu-repo/semantics/acceptedVersion
                                                            (defaults to article)
Subject       Subject headings; Scientific classification
              (defaults to what is provided in the PEER
              Journal tables¹)
Language      Language of the publication                   ISO 639-3 (defaults to 'eng')
Embargo       Embargo Period (defaults to what is provided
              in the PEER Journal tables)
Table 5: PEER information model
¹ See Appendix A.
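As an illustration of how the defaults in Table 5 might be carried into software, the following sketch models a subset of the field set as a Python data class. The class name and attribute names are hypothetical. Only the two fixed defaults stated in the table (Type and Language) are encoded; the defaults taken from the PEER Journal tables are left out.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class PeerMetadata:
    """Subset of the Table 5 field set; attribute names are illustrative."""
    title: str
    creator: str                       # "Last name, first name"
    author_email: Optional[str] = None
    description: Optional[str] = None  # abstract
    date: Optional[str] = None         # date of publication, yyyy-MM-dd
    identifier: Optional[str] = None   # DOI of the published article
    journal: Optional[str] = None
    type: str = "info:eu-repo/semantics/article"  # default from Table 5
    language: str = "eng"                         # default from Table 5

    def __post_init__(self) -> None:
        # Raise ValueError if the date does not parse as year-month-day.
        if self.date is not None:
            datetime.strptime(self.date, "%Y-%m-%d")
```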
For deposit the PEER object will be packaged into a container. The OAIS reference model
specifies the Submission Information Package (SIP) as a specialised Information Package
(IP) – which is used by the KB in the e-depot – for submission purposes (see Figure 19).
Figure 19: OAIS Information Package ERD
Figure 20: OAIS Content Information Object ERD
An IP consists of content information that is being described by Package Description
Information (PDI).
The content information object is defined as a data object (e.g. a PDF file) interpreted using
representation information (e.g. MIME type, encoding version, etc.) (see Figure 20). Note that
the structure information is part of the representation information.
The PDI (see Figure 21) contains Reference Information (e.g. bibliographic descriptions and
persistent identifiers), Provenance Information (e.g. information about the conversion
process), Context Information (e.g. a reference to the research project a publication is based
on) and Fixity Information (e.g. a checksum).
Figure 21: OAIS Package Description Information ERD
The PEER information model can be mapped onto the OAIS Reference Model as depicted
in Figure 22. Here the structure object containing the structural information is added
to the PEER object model.
Figure 22: OAIS Reference Model-PEER Information Mapping
Figure 23 depicts a technical mapping of the PEER object model. The structure is
expressed using the ORE abstract data model, which is serialised as an Atom feed. This
Atom feed is contained in an XML file. The metadata is serialised in TEI, again
contained in an XML file. The manuscript is encoded in PDF/A, contained in a PDF file.
The XML file containing the ORE Atom feed, the XML file containing the TEI document and
the PDF file containing the manuscript are then packaged in a compliant ZIP file.
Upon deposit using SWORD, the ZIP file is placed into the body of the HTTP POST
request. The header contains an MD5 checksum and MAY contain authorisation
information (see Figure 24).
The HTTP POST request is then sent to the SWORD Interface Service as described
in paragraph 1.2.
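The packaging step described above can be sketched as follows. The file names inside the archive are hypothetical, as this section does not prescribe them. The header names Content-MD5 and X-Packaging, and the hexadecimal encoding of the checksum, are given as understood from the SWORD profile version 1.3 and should be verified against it. The packaging URI is the one used in the example response later in this document.

```python
import hashlib
import io
import zipfile
from typing import Dict, Tuple

PEER_PACKAGING = "http://purl.org/net/sword-types/tei/peer"


def build_deposit(ore_atom: bytes, tei: bytes, pdf: bytes) -> Tuple[bytes, Dict[str, str]]:
    """Pack the three files into a ZIP archive and derive the request headers."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("structure.xml", ore_atom)   # ORE Atom feed (name assumed)
        archive.writestr("metadata.xml", tei)         # TEI metadata (name assumed)
        archive.writestr("manuscript.pdf", pdf)       # PDF/A manuscript (name assumed)
    body = buffer.getvalue()
    headers = {
        "Content-Type": "application/zip",
        "Content-MD5": hashlib.md5(body).hexdigest(),  # hex encoding is an assumption
        "X-Packaging": PEER_PACKAGING,
    }
    return body, headers
```

The returned body and headers would then be sent in an HTTP POST request to the collection URL, with authorisation information added where required.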
Figure 23: Technical Mapping of the PEER model
Figure 24: HTTP Mapping of the Technical Model
5
Implications on Repository level
5.1
Overview of the technical process
Generally speaking, the technical process of deposit can be broken down, in sequential order,
into the serialisation and deposit request sub-processes on the client side (i.e. the PEER
Depot) and the de-serialisation, response and store sub-processes on the repository side
(i.e. the publicly available repository).
5.1.1
Serialisation – Client side
The serialisation process involves two steps. First, the metadata is serialised from an
internal storage to a specific, agreed-upon standard metadata field set and structure (e.g.
DC). Second, the metadata and object file(s) are packaged into a content package (e.g.
MPEG-21 DIDL XML containers or compliant ZIP archives) and written to a bit stream. The
packaging MAY include adding a manifest describing the contents and their correlation (e.g.
the relation between an XML file containing the descriptive metadata of a full-text
publication and a PDF file containing the actual full-text publication).
5.1.2
Deposit Request – Client side
The deposit request process includes a client posting the data to a service (i.e. the HTTP
POST request in SWORD) and the server receiving the data and placing it into a temporary
storage (either in memory or on disk).
5.1.3
De-serialisation – Repository side
In the de-serialisation process the receiving server tries to interpret (decode) the bit stream
again, in essence validating the contents. This MAY include the unpacking (when using ZIP
archives) or decoding (when using XML containers) of the bit stream to be able to interpret
the individual contents. It MAY also include the mapping or crosswalking of the metadata
structure to an internal (proprietary) metadata field set and/or structure. This process does
not necessarily take place in the actual interface service; it MAY include the sending of
the bit stream to an internal storage service which then indicates a success or failure to the
deposit service.
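A minimal sketch of the unpacking and validation part of this process, for the ZIP archive case, is given below. The function name and the specific checks (one PDF file and at least one XML file, an optional hexadecimal MD5 comparison) are assumptions made for illustration; the crosswalk to an internal metadata structure is not shown.

```python
import hashlib
import io
import zipfile
from typing import Dict, Optional


def unpack_deposit(body: bytes, expected_md5: Optional[str] = None) -> Dict[str, bytes]:
    """Validate and unpack a ZIP deposit; raise ValueError if it is unusable."""
    if expected_md5 is not None and hashlib.md5(body).hexdigest() != expected_md5.lower():
        raise ValueError("checksum mismatch")
    try:
        with zipfile.ZipFile(io.BytesIO(body)) as archive:
            if archive.testzip() is not None:  # name of first corrupt member, if any
                raise ValueError("corrupt member in archive")
            contents = {name: archive.read(name) for name in archive.namelist()}
    except zipfile.BadZipFile as error:
        raise ValueError("not a valid ZIP archive") from error
    if not any(name.lower().endswith(".pdf") for name in contents):
        raise ValueError("no PDF manuscript in package")
    if not any(name.lower().endswith(".xml") for name in contents):
        raise ValueError("no XML metadata in package")
    return contents
```

A deposit service could map a raised ValueError to a rejection of the request and a successful return to the creation of the resource.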
5.1.4
Response – Repository side
After the contents have been de-serialised successfully and the server has confirmed the
contents of the received bit stream, the server MUST report its status to the client (when
using SWORD this means returning the appropriate HTTP status code and the correct
Atom/SWORD response document). If the de-serialisation process has failed for whatever
reason, the server SHOULD reject the deposit request to indicate an unsuccessful deposit to
the client, or accept the deposit with an HTTP 4xx status code and an appropriate exception
message indicating a partially successful deposit.
5.1.5
Store – Repository side
The final step of the deposit process includes the storage of the received (meta)data into
the internal (meta)data store. This part of the process is implementation specific and
considered outside the scope of this document.
5.2
Functional Requirements
A repository implementing a SWORD interface service in the PEER context MUST be able to:
• Authenticate a user
• Receive, process and respond to an HTTP POST request as specified in this
  document
• Interpret and store a PEER Submission Information Package as specified by the
  PEER project
5.3
Implementation Steps
The implementation steps can be broken down into the implementation and exposure of the
web service to the outside world and the interface with the repository on the inside.
Depending on specific needs, an implementer of the SWORD profile may either choose to
use one of the repository-specific implementations available for DSpace, ePrints and
Fedora on SourceForge¹ or write a custom SWORD server implementation (optionally
using the generic Java library also available from SourceForge).
For the repository-specific option please refer to the documentation provided with the
designated packages.
The second option involves either writing a service from scratch or using the source code
available from the SWORD Java library. This library contains ready-to-use code for
writing servers and clients.
In addition to creating the web service which behaves according to the guidelines specified
in this document, special attention should be paid to the creation of crosswalk rules to map
the expression(s) of the PEER SIP to the internal repository data structure and semantics.
5.4
Common implementation faults
In the initial testing phase, a number of common implementation faults have been identified.
This paragraph will provide a brief overview of these findings and provide guidelines in
order to avoid these mishaps.
¹ SWORD Project, SourceForge.net: SWORD – Project Web Hosting – Open Source
Software, viewed on 25 March 2009, http://sword-app.sourceforge.net/
5.4.1
Mandatory fields for Atom Entry
A number of issues have been identified with regard to the contents of the Atom Entry
within the service response documents.
RFC 5023 (Atom Publishing Protocol) states:
Implementers are asked to note that [RFC4287] specifies that Atom Entries MUST
contain an atom:summary element. Thus, upon successful creation of a Media Link
Entry, a server MAY choose to populate the atom:summary element (as well as any
other mandatory elements such as atom:id, atom:author, and atom:title) with content
derived from the POSTed entity or from any other source. A server might not allow a
client to modify the server-selected values for these elements.
5.4.2
Field Semantics
All contents of the Atom Entry document are to be interpreted as within the SWORD
context. For example, atom:author states the author of the transaction (i.e. authenticated
user), not that of the contents being transported (i.e. author of the publication).
5.4.3
Atom:Id
The contents of the atom:id field MUST be encoded as an IRI (Internationalized Resource
Identifier) as defined in RFC3987.
Valid example:
<atom:id>http://hdl.handle.net/2437.2/20</atom:id>
Invalid example:
<atom:id>1234</atom:id>
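The two faults described so far (missing mandatory elements and an atom:id that is not an IRI) could be detected with a check along the following lines. The function name is hypothetical. The list of mandatory elements is limited to those named in the RFC 5023 quotation above, and the IRI test only checks for the presence of a scheme, which is a rough approximation of RFC3987 rather than a full validation.

```python
import xml.etree.ElementTree as ET
from typing import List
from urllib.parse import urlsplit

ATOM_NS = "http://www.w3.org/2005/Atom"
# Elements named as mandatory in the RFC 5023 quotation above.
MANDATORY = ("id", "title", "author", "summary")


def entry_problems(entry_xml: str) -> List[str]:
    """Return a list of problems found in an Atom Entry (empty if none)."""
    root = ET.fromstring(entry_xml)
    problems = []
    for name in MANDATORY:
        if root.find(f"{{{ATOM_NS}}}{name}") is None:
            problems.append(f"missing atom:{name}")
    identifier = root.findtext(f"{{{ATOM_NS}}}id")
    # Approximation: an absolute IRI must at least start with a scheme.
    if identifier is not None and not urlsplit(identifier.strip()).scheme:
        problems.append("atom:id is not an absolute IRI")
    return problems
```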
Locations
When the server has processed the submission information package containing the fulltext
and metadata files, a number of locations may be identified:
1. Location of the Media Link Entry
2. Location of the original submission information package
3. Location of the fulltext
4. Location of the metadata
…
SWORD APP Profile v1.3 states:
The Location element of the HTTP header response MUST contain the URI of the Media Link
Entry, as defined in ATOMPUB. The Media Link Entry URI MUST dereference, and MUST
contain an atom:content element with a src attribute containing a URI.
Example:
HTTP/1.1 201 Created
Date: Mon, 18 August 2008 14:27:11 GMT
Content-Length: nnn
Content-Type: application/atom+xml; charset="utf-8"
Location: http://www.myrepository.org/geo/atom/my_deposit.atom

<entry ...>
  <title>My Deposit</title>
  <id>info:something:1</id>
  <updated>2008-08-18T14:27:08Z</updated>
  <author><name>jbloggs</name></author>
  <summary type="text">A summary</summary>
  ...
  <content type="application/zip"
           src="http://www.myrepository.ac.uk/geo/deposit1.zip"/>
  <sword:packaging>http://purl.org/net/sword-types/tei/peer</sword:packaging>
  <link rel="edit"
        href="http://www.myrepository.org/geo/atom/my_deposit.atom" />
</entry>
The Server MUST use the correct MIME-type reference in the type attribute of the
atom:content element in the Media Link Entry.
The Media Link Entry MUST contain an atom:element with a src attribute containing the
location of the fulltext. This is used in the feedback e-mail to the author.
The Media Link Entry MAY contain an atom:element with a src attribute containing the
location of the original submission information package, the raw metadata, ORE-ReM,
jump-off page, etc.
Appendix D.
PEER Author Deposit interface specification
Interface: http://peer.mpdl.mpg.de/deposit
Summary
Authors are invited to self-deposit publications to the PEER repositories.
Actors
Corresponding author
Flow of Events
1. A user chooses to deposit his/her publication to the PEER Depot.
2. The user can enter basic metadata by using a webform (Note: journal name is provided
from a list)
3. The user can upload a PDF file.
3.1. The system checks the file mimetype and gives an error message, if the file is not
recognised as application/pdf.
4. The user must type the text shown in an image to prevent spam (ReCAPTCHA¹
mechanism).
5. The user finalises the submission by submitting the form.
6. The webform performs a simple validation.
6.1. The webform content is validated successfully:
6.1.1. The system shows a confirmation message.
6.2. The webform content is validated unsuccessfully:
6.2.1. The system informs the user of missing or unpopulated mandatory fields, or of
incorrect image recognition, and asks the user to correct the entries and
re-submit the form. The use case ends unsuccessfully.
7. The system packs the metadata and the PDF file into an archive and saves the content
to a dedicated directory on the server (see Processing and Deposit of publications).
8. The use case ends successfully.
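Steps 3.1 and 6 of the flow above could be implemented along the following lines. This is a sketch only: the set of mandatory field names is hypothetical (the appendix does not list them), and the PDF check combines a file name based MIME type guess with a test for the "%PDF-" signature at the start of the file.

```python
import mimetypes
from typing import Dict, List

# Hypothetical mandatory fields; the actual webform may require others.
MANDATORY_FIELDS = ("title", "creator", "journal")


def validate_submission(form: Dict[str, str], filename: str, data: bytes) -> List[str]:
    """Return the validation errors for a submission (empty list if valid)."""
    errors = [f"missing mandatory field: {name}"
              for name in MANDATORY_FIELDS if not form.get(name, "").strip()]
    guessed_type, _ = mimetypes.guess_type(filename)
    if guessed_type != "application/pdf" or not data.startswith(b"%PDF-"):
        errors.append("file is not recognised as application/pdf")
    return errors
```

An empty result corresponds to step 6.1 (confirmation message); a non-empty result corresponds to step 6.2, where the user is asked to correct the entries.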
Additional information on the deposit interface²
• All data will be deposited to all participating repositories.
• The user must select the journal name from a predefined list of journals.
• Metadata will be provided in TEI-XML format.
• The deposit interface will pack the metadata and the provided PDF file into an archive.
• The archived data will be posted to the PEER Depot via the FTPS³ protocol.
¹ http://en.wikipedia.org/wiki/ReCAPTCHA
² More details can be found at http://colab.mpdl.mpg.de/mediawiki/Peer:_Author_Deposit
³ http://en.wikipedia.org/wiki/FTPS
Appendix E.
Alternate author deposit workflow scenarios
Scenario 1: Author deposit in repositories, after registration at repositories
• Author receives notification of acceptance, including invitation from publisher to
  self-archive stage-2 article. At this stage publication date is undetermined, and embargo
  period is unknown.
• Author follows invitation to access further details via PEER Helpdesk.
• Author selects a participating repository for deposit or enters a URL of an
  additional/alternative repository of choice.
• Author accesses the participating repository’s website.
• Author registers¹ at the repository in order to be able to make deposit. Authors,
  including those not affiliated to the repository host institution, are authorised for
  deposit.
• Author deposits his/her data in the repository.
• Author receives a notification of successful deposit.
• Embargo management is handled by the repository.
• Repositories transfer their usage log files to the Usage Research Team.
Problems:
• Authors can be authenticated and authorised only if they access from the
  repository’s host institution network (e.g. university network).
• Embargo management requires automated matching process at repositories: match
  author submitted article with metadata from publishers (transferred by PEER Depot)
  and overwrite author’s metadata with publisher’s metadata. Elements of manual
  checking may occur (e.g. special characters).
• Random deposit by non-PEER authors difficult to monitor, as there is no possibility
  to identify if potential deposits come by following a PEER invitation or not.
¹ The author registration may vary from one repository to another, e.g. sending an e-mail to the
repository managers or filling in a web form to request the privileges. To simplify
the workflow analysis, these details are not depicted further.
Figure 25: Scenario 1
Scenario 2: Central author registration at PEER Helpdesk. Author gets redirected to
repositories for authentication under generic PEER account and deposit
• Author receives notification of acceptance, including invitation from publisher to
  self-archive stage-2 article. At this stage publication date is undetermined, and embargo
  period is unknown.
• Author follows invitation to access further details via PEER Helpdesk.
• Author registers at the PEER Helpdesk.
• Author selects a participating repository for deposit.
• Author enters URL of additional/alternative repository of choice.
• Author is redirected to the chosen repository to make deposit.
• Author is authenticated under generic PEER guest account.
• Author deposits his/her data in the repository.
• Author receives a notification of successful deposit.
• Embargo management is handled by the repository.
• Repositories transfer their usage log files to the Usage Research Team.
Problems:
• Setting up a central author registration application means an additional effort of
  funding and staff resources.
• Central registration is dependent upon authentication of generic PEER account at
  repositories.
• Embargo management requires automated matching process at repositories: match
  author submitted article with metadata from publishers (transferred by PEER Depot)
  and overwrite author’s metadata with publisher’s metadata. Elements of manual
  checking may occur (e.g. special characters).
Figure 26: Scenario 2
Central registration can be combined with central deposit, with the authors uploading their
articles directly into the PEER Depot. In this scenario, author deposits are handled like
publisher deposits: they are distributed to all participating repositories, and duplicate
filtering and embargo management also take place at the PEER Depot.
The workflow for central author deposits is described in scenarios 3 & 4 below.
Scenario 3: Central registration at PEER Helpdesk, central deposit at PEER Depot via
Helpdesk
• Author receives notification of acceptance, including invitation from publisher to
  self-archive stage-2 article. At this stage publication date is undetermined, and embargo
  period is unknown.
• Author follows invitation to access further details via PEER Helpdesk.
• Author registers at the PEER Helpdesk.
• Author is redirected to a deposit interface on the PEER Depot.
• Author deposits his/her data in the PEER Depot.
• Author receives notification of successful deposit.
• Embargo management at the PEER Depot, as per publisher deposits.
• Distribution of author deposits to all participating repositories, as per publisher
  deposits.
• Repositories transfer their usage log files to the Usage Research Team.
Problems:
• Central deposit seen as interference with the ‘natural way’ of author deposit –
  referred to and approved by Executive.
• No communication between authors and repositories.
Figure 27: Scenario 3
Scenario 4: Central registration & central deposit at PEER Depot
• Author receives notification of acceptance, including invitation from publisher to
  self-archive stage-2 article. At this stage publication date is undetermined, and embargo
  period is unknown.
• Author follows invitation to access further details via PEER Helpdesk.
• Author is redirected to the PEER Depot for registration and deposit.
• Author registers at the PEER Depot.
• Author deposits his/her data in the PEER Depot.
• Author receives notification of successful deposit.
• Embargo management at the PEER Depot, as per publisher deposits.
• Distribution of author deposits to all participating repositories, as per publisher
  deposits.
• Repositories transfer their usage log files to the Usage Research Team.
Problems:
• Setting up a central author registration application means an additional effort of
  funding and staff resources.
• Central deposit seen as interference with the ‘natural way’ of author deposit –
  referred to and approved by Executive.
• No communication between authors and repositories.
Figure 28: Scenario 4
Appendix F.
Current and planned practice in the provision of usage data
in a participating repository
PubMan@MPDL
PubMan@MPDL (http://pubman.mpdl.mpg.de/pubman/) is a participating repository in the
PEER repository task force. As it is of primary importance to limit the additional effort
imposed upon participating repositories, PubMan@MPDL is well positioned to provide a
sample of current practices against which the PEER specification for the provision of usage
data can be measured.
PubMan@MPDL supports scientists and institutes in the management and the digital
curation of their publications. This solution addresses all disciplines and focuses on the
target groups of scientists, local librarians and local IT. It is built as an eSciDoc solution (see
http://escidoc.org) that focuses on publication management. PubMan is used at present by
several early-adopter Max-Planck Institutes. It is anticipated to be used by all institutes
within the Max-Planck Society (MPG) and to replace the current eDoc publication repository
of MPG.
PubMan@MPDL provides the log files in the format NCSA combined.
Example:
134.76.162.XXX - - [01/Sep/2009:10:07:44 +0200] "GET /pubman/item/escidoc:265993:2/component/escidoc:265992/PEER_stage2_10.1080_slash_00268970903078542.pdf HTTP/1.1" 200 9193 "-" "Mozilla/5.0 (Windows; U; Windows NT 5.1; de; rv:1.9.0.13) Gecko/2009073022 Firefox/3.0.13 (.NET CLR 3.5.30729)" "layout=PubManTheme; JSESSIONID=FC7627E3DED9E5F00E57C5861AC53A57"
The request contains the following information:
• Item identifier (escidoc:265993)
• File identifier (escidoc:265992)
• File name (PEER_stage2_10.1080_slash_00268970903078542.pdf)
• Session identifier (FC7627E3DED9E5F00E57C5861AC53A57)
Notes: Due to technical constraints, all slash symbols in the file name (coming from the
DOI) are replaced with the string “_slash_”. Due to privacy concerns, the last three digits of
the IP address of the request are replaced with “XXX”.
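The four pieces of information listed above can be extracted from such a log line with regular expressions, as in the following sketch. The function name and the patterns are illustrative. They assume the NCSA combined format followed by a quoted cookie field, and the PubMan path layout seen in the example line; other log layouts would need different patterns.

```python
import re
from typing import Dict, Optional

# NCSA combined format, optionally followed by a quoted cookie field.
COMBINED = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"(?: "(?P<cookie>[^"]*)")?'
)
# Path layout taken from the example: item id (with optional version), file id, file name.
ITEM_PATH = re.compile(
    r'/item/(?P<item>escidoc:\d+)(?::\d+)?/component/(?P<file>escidoc:\d+)/(?P<name>\S+)'
)
SESSION = re.compile(r'JSESSIONID=(?P<session>[0-9A-Fa-f]+)')


def parse_pubman_line(line: str) -> Optional[Dict[str, str]]:
    """Extract item, file, file name and session from one log line, if present."""
    entry = COMBINED.match(line)
    if entry is None:
        return None
    path = ITEM_PATH.search(entry.group("request"))
    if path is None:
        return None
    session = SESSION.search(entry.group("cookie") or "")
    return {
        "item": path.group("item"),
        "file": path.group("file"),
        "file_name": path.group("name"),
        "session": session.group("session") if session else "",
        "status": entry.group("status"),
    }
```

Replacing "_slash_" with "/" in the returned file name recovers the slash of the original DOI.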
HAL@INRIA
HAL uses Apache and produces log files in the format NCSA combined.
BiPrints@UniBi
http://repositories.ub.uni-bielefeld.de/biprints/
BiPrints uses Apache and produces log files in the format NCSA combined.
Example:
66.249.66.5 - - [12/Jan/2009:20:31:53 +0100] "GET /pdf_frontpage.php?source_opus=87&startfile=Egelhaaf_et_al_UniForsch2002.pdf HTTP/1.1" 302 414 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"