Document Details
Project | HELM
Document Approval
Approver | Pistoia Alliance Executive Director | John Wise | 24th July 2015
Approver | HELM Standard Management Group Lead | Matthias Nolte | 24th July 2015
Approver | HELM Standard Management Group Team Members | Jan Holst-Jensen, Roland Knispel, Stefan Klostermann, Sven Neumeyer, Yohann Potier | 24th July 2015
1.0 | Issued | 24th July 2015 | Claire Bellamy, Jan Holst-Jensen, Roland Knispel, Stefan Klostermann, Sven Neumeyer, Matthias Nolte, Yohann Potier, Tianhong Zhang | First version released to support RFP.
Disclaimer
If you are reading this document on paper, it is an uncontrolled copy and you may not have the latest
version. If you are updating this document, you may not be using the current version. Please refer to
the electronic copy available within the HELM Google Drive for the authoritative version.
Ambiguous HELM Requirements Specification - Phase 1
Version: 1.0
Type: Specification
1. INTRODUCTION
1.1 Definitions
Term Description
1.2 Purpose
The purpose of this User Requirements Specification (URS) is to outline the requirements for an
extension to the HELM standard and its associated tools.
The first purpose of this extension is to enable the representation of ambiguous macromolecules,
meaning macromolecules where not all characteristics of the structure are, or can be, fully specified.
The intention is to enable HELM to capture the structural information that can be specified and to
report what is not known.
The second purpose of this work is to implement a set of APIs (Application Programming Interfaces)
including web-services for the toolkit. These are intended to:
1. Provide a single mechanism via which the toolkit can access the developer's chemical engine
of choice.
2. Abstract the toolkit functionality such that in the future the functionality can be accessed by a
thin client by using the web-services.
The intended audience of this document is the HELM project team members and groups interested
in implementing the requirements.
1.3 Background
The Hierarchical Editing Language for Macromolecules (HELM), an emerging notation standard,
enables the representation of a wide range of biomolecules (e.g. proteins, nucleotides, antibody drug
conjugates) through a hierarchical notation that represents complex macromolecules as polymeric
structures with support for unnatural components (e.g. unnatural amino acids) and chemical
modifications. HELM was created by Pfizer scientists. The Pistoia Alliance formalized the HELM
notation as an open standard in early 2013 and publicly released a modified version of the previously
proprietary software tools to the Open Source community; these tools now serve as the reference
implementation of the HELM standard.
Two major extensions to the original code have been created and published (figure 1):
- Exchangeable HELM, which enables the user to include the monomer definition with the
  HELM string, thus creating a format that can be used to exchange information between
  organizations.
- HAbE (HELM Antibody Editor). Created by Roche, HAbE enables the recognition and display
  of antibody domains and provides related functionality and tools. One example is the
  automatic creation of Cys-Cys bonds.
The current HELM editor depends on two licensed third-party tools:
- yFiles, a graphing package that supports the graphical representation of HELM structures. A
  developer licence is required to work with the code, but no licence is needed to distribute
  the final system.
- MarvinBeans, a chemical engine and sketcher that is used to perform calculations, interpret
  extended SMILES strings and sketch monomers. A licence is required for each HELM editor
  user.
Prospective HELM adopters have expressed concerns that these dependencies restrict their ability to
use HELM within their organization. To address this in the shorter term, work is being done by third
parties to create new versions of the HELM editor that can access different chemical tools for drawing
and calculation. The Roche HAbE team are also working to include ambiguity in the antibody editor.
Therefore, by the time the requirements in this RFP are implemented, there may be minor changes to
the landscape. However, the project does not control these timescales and cannot guarantee that
this work will be complete.
- Greater flexibility in accessing the functionality and in the choice of drawing tool/chemical
  engine, so they do not have to purchase a Marvin licence for every HELM editor user.
- The ability to represent ambiguity in macromolecules.
The project has considered these requirements and produced a road-map that divides the work into
two tranches.
In Phase 2, the project intends to create a new thin-client editor that includes the HAbE functionality.
This editor will provide user access to the representation of ambiguity in HELM structures. The
existing editor and HAbE will be retained on GitHub, but will not be developed further. The new editor
will not be dependent on yFiles.
Phase 2 does not form part of these requirements and is included for background information only.
1.4 Scope
The work defined in this specification consists of the following changes to the HELM toolkit:
1. To implement an extension to the HELM notation which allows the definition of structurally
ambiguous molecules. The line notation design specified in Ambiguous HELM Line Notation
Design.doc should be used as the definition for this work.
2. To create a collection of Chemical Toolkit APIs that allow abstracted integration of chemical
engines and the HELM toolkit.
3. To create a collection of web-services that allow access to the HELM toolkit functionality.
4. To document the notation and code changes.
- The HELM notation management group, who have the responsibility for managing the HELM
  notation.
- The HELM open source dictator, who approves all code merges to the main trunk.
- The wider Pistoia Alliance HELM project team and steering committee.
- HELM adopters, particularly in-house IS teams responsible for implementing and managing
  HELM-based systems and vendors supplying HELM-compliant systems.
- The end users of HELM-compliant systems (typically scientists).
1.5 References
1.5.1 Background Resources
[1] Zhang, T., et al. (2012), HELM: A Hierarchical Notation Language for Complex Biomolecule
Structure Representation, J. Chem. Inf. Model., vol. 52, pp. 2796-2806
http://pubs.acs.org/doi/full/10.1021/ci3001925
[2] www.OpenHELM.org
[3] HELM resource centre (contains documentation and links to the code)
https://pistoiaalliance.atlassian.net/wiki/display/PUB/HELM+Resources
https://github.com/PistoiaHELM.
1.5.2 Documentation
https://drive.google.com/file/d/0BybDwk56P1wFZnprdVlDWjI4QzQ/view?usp=sharing
2. GENERAL DESCRIPTION
2.1 Ambiguity - structure types to be represented
In order to understand the requirements of ambiguous HELM, we need to take into account the
different types of ambiguity that could be present. A structure may be fully specified except for
one aspect, in which case there is much useful structural information that can be recorded.
Equally, very little of the structure may be specified, yet there is still some information that is
worth capturing.
To this end the project team has defined three types of ambiguity:
Component
Composition
Connection
The following examples illustrate the different ambiguities. These are chosen to represent each type,
but it is possible that a particular structure will include a combination of these types.
Examples
- PEG or bead coupled in a specified and defined manner to a specified and defined position
  on a monomer in a simple polymer.
- Parts of the variable regions of heavy chain and/or light chain for antibodies where the
  monomers in that region are not specified.
- The glycosylation moiety of proteins (if not fully specified as G0, G1, or G2).
- mRNA synthesized using a mixture of 50% Uridine and 50% Pseudouridine as the reagent.
  This will result in each Uridine position having the possibility of being either Uridine
  or Pseudouridine.
- Undetermined monomer due to the inability of analytical methods to identify that particular
  monomer. In this case only one distinct monomer exists (i.e. it is not a mixture) but the
  monomer cannot be unequivocally identified. A probability is given for each candidate
  monomer; these probabilities should not be confused with the ratios detailed above.
  This applies as well to Met oxidation, Trp oxidation, His oxidation, Lys glycation, Asn
  deamidation, Pyr Glu.
- Unknown attachment points between 2 defined simple polymers, e.g. ADC in a specified ratio
  (1:1; 1:1.5; 1:2 etc.)
- Unknown attachment points between 2 defined simple polymers, e.g. ADC in a non-specific
  ratio of one of the types below:
  o no ratio defined
  o ratio given as decimal number (e.g. 1:2.1)
  o ratio given as interval (e.g. 1:2.1-2.3)
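The three non-specific ratio cases (no ratio, decimal, interval) suggest a small value type inside the toolkit. The following is a minimal Java sketch under stated assumptions: the class name `AdcRatio`, its factory methods and its `1:?` text form are hypothetical and are not the HELM line notation, which is defined in the Ambiguous HELM Line Notation Design document.

```java
/** Hypothetical model of a "1:x" conjugation ratio where x may be unknown, exact, or a range. */
final class AdcRatio {
    private final Double low;   // null when no ratio is defined
    private final Double high;  // equals low for an exact (decimal) ratio

    private AdcRatio(Double low, Double high) {
        this.low = low;
        this.high = high;
    }

    /** Case "no ratio defined". */
    static AdcRatio undefined() {
        return new AdcRatio(null, null);
    }

    /** Case "ratio given as decimal number", e.g. 1:2.1. */
    static AdcRatio exact(double value) {
        return new AdcRatio(value, value);
    }

    /** Case "ratio given as interval", e.g. 1:2.1-2.3. */
    static AdcRatio interval(double low, double high) {
        if (low > high) {
            throw new IllegalArgumentException("low must not exceed high");
        }
        return new AdcRatio(low, high);
    }

    boolean isDefined() {
        return low != null;
    }

    boolean isInterval() {
        return isDefined() && !low.equals(high);
    }

    /** Returns true when a measured ratio is consistent with this specification. */
    boolean accepts(double measured) {
        return !isDefined() || (measured >= low && measured <= high);
    }

    @Override
    public String toString() {
        if (!isDefined()) {
            return "1:?"; // illustrative placeholder only, not HELM syntax
        }
        return isInterval() ? "1:" + low + "-" + high : "1:" + low;
    }

    public static void main(String[] args) {
        System.out.println(AdcRatio.undefined());
        System.out.println(AdcRatio.exact(2.1));
        System.out.println(AdcRatio.interval(2.1, 2.3));
    }
}
```

Holding all three cases in one type means code that consumes a connection does not need separate branches for "unknown" and "known" ratios; it can simply call `accepts`.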
There is no user interface component to this work, so no user-oriented cases can be created.
Examples of the types of molecule to be represented are given in the general description section.
2.3 Overview
HELM is a notation: the definition and specification of the standard are documented in the HELM
notation specification V1.1.
Supporting the notation is the HELM software suite, which consists of the following components:
- The HELM Toolkit: The HELM Toolkit contains the functionality needed to implement a
  HELM-based system, enabling reading, analysis, and manipulation of HELM objects, as well as
  some monomer management. It is written in Java and delivered as a .jar file.
- The HELM Editor: The HELM Editor is a tool that enables the user to visualize and edit HELM
  molecules. It is dependent on the HELM toolkit. The latest version of the HELM editor can be
  found on http://pistoiahelm.github.io/. It is written in Java.
- HAbE (HELM Antibody Editor): A tool that analyses antibody structures, displays them at a
  domain level and allows specific actions to be taken such as connecting free Cys residues.
Currently the software is available as open source code on GitHub and in two compiled forms: Java
web start and an applet. Both are available from https://github.com/PistoiaHELM.
2.3.1 Deliverables
The output of this work shall consist of the following:
1. Code updates to the HELM toolkit to include ambiguity, add the additional functions and create
the HELM notation toolkit API.
3. New code to create the chemical toolkit API and two implementations: one using MarvinBeans
and the other using a free chemistry engine of the developer's choice.
4. Updates to the HELM specification document to include the ambiguity extension definition.
5. Release notes (it is acceptable for these to be generated from the code).
As this work package only concerns the HELM toolkit, users do not interact directly with the
functionality. Therefore no users are identified in this section.
- Maven Archetype
- Tomcat
- Spring 4 (application context, dependency injection)
- Spring Data, JPA 2.1/Hibernate 4.3 (persistence)
- JAX-RS 2.0/Apache CXF 3.0 (stateless REST services)
- Spring Security
- TestNG + HSQLDB for in-memory tests
- Jenkins + CID custom tooling (CI build/deploy to Tomcat)
The final solution shall contain no dependencies on third party tools that require paid licences.
It is assumed that respondents will modify the existing HELM code and not start from scratch.
The HELM editor is dependent on the toolkit. This RFP details changes to the toolkit, but it must be
possible to use the new toolkit code with the existing Java editor without further changes to the editor.
The functionality to handle ambiguity will not be accessible from the editor since the UI has no way of
entering it or representing it.
HELM is a live standard and, as such, any changes will affect its current users and should be made
backward compatible as far as possible. HELM is an open standard, and the project cannot be
aware of the details of every group's use of HELM, so there will be unknown dependencies.
All development should aim to minimize disruption to existing users.
3. OPERATIONAL REQUIREMENTS
Key to priorities
E = Essential. The system must fulfil this requirement or it fails in a fundamental way and
cannot be deployed.
H = High. A requirement of high importance.
M = Medium. A requirement of medium importance.
L = Low. A requirement of low importance.
Req # | Requirement | Priority (E,H,M,L)
FRS100 | The notation changes described in Ambiguous HELM Line Notation Design.doc must be implemented in the HELM notation toolkit. This includes all functions required to write and interpret ambiguity as defined. | E
FRS110 | The implementation shall be backward compatible with the current version of the toolkit. | E
FRS120 | There shall be a single mechanism to pass in original HELM and ambiguous HELM notation. | E
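FRS120 asks for one mechanism that accepts both notations. One possible shape for this is a single entry point that inspects the string and dispatches internally, sketched below in Java. Everything here is illustrative: the class `HelmInput`, its methods and the returned messages are hypothetical, and the sketch assumes that ambiguous HELM strings carry a trailing version section such as `V2.0`. The authoritative syntax is the one in the Ambiguous HELM Line Notation Design document.

```java
/** Hypothetical single entry point that accepts both original and ambiguous HELM strings. */
final class HelmInput {
    enum Version { HELM1, HELM2 }

    /** Assumption for this sketch: a trailing "V2.0" section marks the ambiguous notation. */
    private static final String V2_SUFFIX = "V2.0";

    /** Decides which notation a string uses without asking the caller. */
    static Version detect(String helm) {
        if (helm == null || helm.trim().isEmpty()) {
            throw new IllegalArgumentException("HELM string must not be empty");
        }
        return helm.trim().endsWith(V2_SUFFIX) ? Version.HELM2 : Version.HELM1;
    }

    /** One call for both notations; a real toolkit would return a parsed object model here. */
    static String read(String helm) {
        switch (detect(helm)) {
            case HELM2:
                return "parsed with ambiguity-aware reader";
            default:
                return "parsed with original reader";
        }
    }

    public static void main(String[] args) {
        System.out.println(read("PEPTIDE1{A.G.C}$$$$"));
        System.out.println(read("PEPTIDE1{A.G.C}$$$$V2.0"));
    }
}
```

Keeping the version decision inside the entry point lets existing callers pass original HELM strings unchanged, which is in the spirit of the backward-compatibility requirement FRS110.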
Req # | Requirement | Priority (E,H,M,L)
Note that this service will initially link to the local HELM
monomer XML file, not a centralised store.
The requirements must be implemented for both the current HELM specification (HELM 1.1) and
ambiguous HELM (HELM 2.0) unless otherwise specified.
Req # | Requirement | Priority (E,H,M,L)
FRS300 | There shall be a service that checks the input HELM string for: conformance to the specification; availability of monomers in the current monomer database. | E
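The two checks in FRS300 (conformance and monomer availability) can be illustrated with a deliberately simplified Java sketch. The class `HelmCheck`, the regular expression and the error messages are hypothetical. The pattern only recognises simple polymers of the form `PEPTIDE1{A.G.C}` and ignores most of the real grammar (branches, RNA groupings, inline SMILES). The monomer set stands in for the monomer database.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical, heavily simplified validation of a HELM string. */
final class HelmCheck {
    // Matches e.g. PEPTIDE1{A.G.C}; the real HELM grammar is considerably richer.
    private static final Pattern POLYMER = Pattern.compile("([A-Z]+\\d+)\\{([^}]*)\\}");

    /** Returns a list of problems; an empty list means both checks passed. */
    static List<String> check(String helm, Set<String> knownMonomers) {
        List<String> problems = new ArrayList<>();

        // Check 1 (conformance, simplified): sections must be separated by '$'.
        int firstDollar = helm.indexOf('$');
        if (firstDollar < 0) {
            problems.add("missing '$' section separators");
            return problems;
        }

        // Check 2 (availability): every monomer ID must exist in the monomer set.
        Matcher m = POLYMER.matcher(helm.substring(0, firstDollar));
        boolean found = false;
        while (m.find()) {
            found = true;
            for (String id : m.group(2).split("\\.")) {
                String monomer = id.replace("[", "").replace("]", "");
                if (!knownMonomers.contains(monomer)) {
                    problems.add(m.group(1) + ": unknown monomer " + monomer);
                }
            }
        }
        if (!found) {
            problems.add("no simple polymer found");
        }
        return problems;
    }

    public static void main(String[] args) {
        Set<String> monomers = new HashSet<>(Arrays.asList("A", "G", "C"));
        System.out.println(check("PEPTIDE1{A.G.C}$$$$", monomers));
        System.out.println(check("PEPTIDE1{A.G.X}$$$$", monomers));
    }
}
```

Returning a list of problems instead of a boolean lets a web-service wrapper report every issue in one response.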
Req # | Requirement | Priority (E,H,M,L)
FRS200 | There shall be an API through which the HELM notation toolkit can access the chemical toolkit of the user's choice. The following calls shall be supported: SMILES validation; SMILES/MolFile import/conversion; HELM to chemical structure conversion; molecular weight; molecular formula; canonicalization of ad-hoc chemical modifications; specification of chemical attachment points on monomer structures; molecule manipulation (bond breaking and forming). | E
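One way to picture the abstraction in FRS200 is a Java interface that the notation toolkit codes against, with engines registered by name. The sketch below is illustrative only: `ChemicalToolkit`, `ChemicalToolkits` and `StubToolkit` are hypothetical names, only three of the listed calls are shown, and the stub is a lookup table standing in for a real engine. It makes no calls to MarvinBeans or any other real chemistry library.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

/** Hypothetical abstraction over a chemistry engine; method names are illustrative. */
interface ChemicalToolkit {
    boolean isValidSmiles(String smiles);
    double molecularWeight(String smiles);
    String molecularFormula(String smiles);
}

/** Placeholder engine backed by a lookup table; it performs no real chemistry. */
final class StubToolkit implements ChemicalToolkit {
    private static final Map<String, String> FORMULAS = new HashMap<>();
    private static final Map<String, Double> WEIGHTS = new HashMap<>();
    static {
        FORMULAS.put("O", "H2O");   // water
        WEIGHTS.put("O", 18.015);
        FORMULAS.put("C", "CH4");   // methane
        WEIGHTS.put("C", 16.043);
    }

    @Override
    public boolean isValidSmiles(String smiles) {
        return FORMULAS.containsKey(smiles);
    }

    @Override
    public double molecularWeight(String smiles) {
        Double weight = WEIGHTS.get(smiles);
        if (weight == null) {
            throw new IllegalArgumentException("unsupported by stub: " + smiles);
        }
        return weight;
    }

    @Override
    public String molecularFormula(String smiles) {
        String formula = FORMULAS.get(smiles);
        if (formula == null) {
            throw new IllegalArgumentException("unsupported by stub: " + smiles);
        }
        return formula;
    }
}

/** Hypothetical registry: the single place where an engine is chosen. */
final class ChemicalToolkits {
    private static final Map<String, Supplier<ChemicalToolkit>> ENGINES = new HashMap<>();

    static void register(String name, Supplier<ChemicalToolkit> factory) {
        ENGINES.put(name, factory);
    }

    static ChemicalToolkit get(String name) {
        Supplier<ChemicalToolkit> factory = ENGINES.get(name);
        if (factory == null) {
            throw new IllegalArgumentException("no chemical engine registered as " + name);
        }
        return factory.get();
    }

    public static void main(String[] args) {
        register("stub", StubToolkit::new);
        ChemicalToolkit toolkit = get("stub");
        System.out.println(toolkit.molecularFormula("O") + " " + toolkit.molecularWeight("O"));
    }
}
```

With this shape, the two implementations called for in the deliverables would each be one class implementing the interface, and the notation toolkit itself would never import an engine-specific type.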
The exact tool and version to be agreed with the project team
prior to starting development work.
3.1.5 Documentation
The following documents shall be created:
Req # | Requirement | Priority (E,H,M,L)
NFR100 | Java; Maven Archetype; Tomcat; Spring 4 (application context, dependency injection); Spring Data, JPA 2.1/Hibernate 4.3 (persistence); JAX-RS 2.0/Apache CXF 3.0 (stateless REST services); Spring Security; TestNG + HSQLDB for in-memory tests; Jenkins + CID custom tooling (CI build/deploy to Tomcat) | E
NFR220 | All code must pass regular code reviews with the HELM project team for good software engineering practice. A minimum of two reviews will be held; the first will be conducted no later than one third of the way through the planned development period. | E