Imagining a CIF-based XAFS data exchange standard James Hester, Bragg Institute

Outline • More detail on a CIF solution • What needs to be decided

Some points from Madrid talk • Once the community have adopted a standard, it will be very hard to move away from it – so take the time to get it right • Flexibility is important – things change • Complexity can only be moved around, not eliminated: CIF maintains a simple syntax by moving most complexity into textual “Dictionaries”. • Framework, not format • CIF syntax is adequate for expressing scientific data

Fun facts on the Crystallographic Information Framework (CIF) • Until 2005, CIF stood for “Crystallographic Information File” • Canonical information on CIF contained in International Tables Vol G (594 pages, also accessible at http://it.iucr.org/g) • This year is the 20th anniversary of CIF, with the original publication in 1991 (Hall, Allen and Brown (1991) Acta Cryst. A47, 655-685) • Invented before HTML or XML or the World Wide Web

Constructing meaning

• “Interpret” – the hard part

How CIF constructs meaning • Diagram 4

CIF syntax • A CIF file is pure ASCII text • All items are whitespace-separated (i.e. free format) • A CIF file is divided into data blocks, each of which starts with a “data_” token carrying the block name (e.g. data_v2o5_nanotube) • A CIF dataname is followed by the value taken by that dataname (tag-value format) • Tabular data are represented by “loops”, which are laid out like tables in publications: first a header, then columns of data. Columns and rows can appear in any order • Loops and tag-value pairs can appear in any order within data blocks, which can themselves appear in any order • Standard uncertainties (su's) are expressed according to IUCr recommendations
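The syntax rules above are simple enough that a reader for a useful subset fits in a few dozen lines. The following Python sketch is illustrative only and is not an IUCr tool: the names `tokenize` and `parse_simple_cif` are invented here, and semicolon-delimited text fields, save frames and the finer points of CIF quoting are deliberately left out. The sample values are borrowed from the example datafile in this talk.

```python
import re

# One match per token: a comment, a quoted string, or a bare word.
TOKEN = re.compile(r"""#[^\n]*|'([^'\n]*)'|"([^"\n]*)"|(\S+)""")


def tokenize(text):
    """Yield (is_quoted, value) pairs, skipping '#' comments."""
    for match in TOKEN.finditer(text):
        if match.group(0).startswith("#"):
            continue
        single, double, bare = match.groups()
        if bare is not None:
            yield False, bare
        else:
            yield True, single if single is not None else double


def is_tag(token):
    quoted, value = token
    return not quoted and value.startswith("_")


def is_keyword(token):
    quoted, value = token
    low = value.lower()
    return not quoted and (low == "loop_" or low.startswith("data_"))


def parse_simple_cif(text):
    """Return {block_name: {"items": {...}, "loops": [[row_dict, ...], ...]}}."""
    tokens = list(tokenize(text))
    blocks, block, i = {}, None, 0
    while i < len(tokens):
        value = tokens[i][1]
        if is_keyword(tokens[i]) and value.lower() != "loop_":
            # A data_ header opens (or reopens) a data block.
            block = blocks.setdefault(value[5:], {"items": {}, "loops": []})
            i += 1
            continue
        if block is None:
            raise ValueError("content found before any data_ block header")
        if is_keyword(tokens[i]):  # loop_: header datanames, then values
            i += 1
            header = []
            while i < len(tokens) and is_tag(tokens[i]):
                header.append(tokens[i][1])
                i += 1
            cells = []
            while i < len(tokens) and not (is_tag(tokens[i]) or is_keyword(tokens[i])):
                cells.append(tokens[i][1])
                i += 1
            if not header or len(cells) % len(header):
                raise ValueError("loop values do not fill whole rows")
            block["loops"].append(
                [dict(zip(header, cells[j:j + len(header)]))
                 for j in range(0, len(cells), len(header))])
        elif is_tag(tokens[i]) and i + 1 < len(tokens):
            block["items"][value] = tokens[i + 1][1]  # tag-value pair
            i += 2
        else:
            raise ValueError(f"unexpected token {value!r}")
    return blocks


SAMPLE = """
# A shortened copy of the example datafile
data_v2o5_nanotube
_xafs_absorber.atom V
_xafs_source.location 'Tsukuba, Japan'
loop_
_xafs_reduced.energy
_xafs_reduced.absorbance
5248.52108 0.813(2)
5258.29435 0.798(2)
"""

block = parse_simple_cif(SAMPLE)["v2o5_nanotube"]
print(block["items"]["_xafs_source.location"])
print(block["loops"][0][1])
```

All values are kept as strings; deciding that an item is a number, and what its units are, is the job of the dictionary rather than the syntax.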

# An Example CIF-format file for XAFS
#
data_v2o5_nanotube
_xafs_absorber.atom                     V
_xafs_absorber.edge                     K
_xafs_source.identification             'KEK-PF BL20B'
_xafs_source.location                   'Tsukuba, Japan'
_xafs_collection.date_time_initiated    '2008-05-26T15:35:33'
loop_
_xafs_detectors.label
_xafs_detectors.position
_xafs_detectors.type
_xafs_detectors.special_details
monitor      monitor   ionisation    .
fl-detector  detector  fluorescence  '36-element PAD detector'
io-detector  detector  ionisation    .
foil         foil      ionisation    .
loop_
_xafs_ionisation_detector.label
_xafs_ionisation_detector.gas_pressure
_xafs_ionisation_detector.length
_xafs_ionisation_detector.amplifier
_xafs_ionisation_detector.amplifier_gain
monitor      1  10  'Keithley'  10
io-detector  1  20  'Keithley'  10
foil         1   5  'Keithley'  11
loop_
_xafs_reduced.energy
_xafs_reduced.absorbance
5248.52108  0.813(2)
5258.29435  0.798(2)
5268.26606  0.781(2)
5278.27878  0.764(2)
5288.28697  0.748(2)
5298.19834  0.731(19)
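The bracketed digits in the absorbance column are standard uncertainties in the IUCr notation: they apply to the last quoted digits of the value, so 0.813(2) means 0.813 with an su of 0.002. A minimal Python sketch of that conversion follows; the function name `split_su` is invented for illustration, and exponent notation is not handled.

```python
import re
from decimal import Decimal

# A plain number, optionally followed by su digits in parentheses.
NUMBER_WITH_SU = re.compile(r"^([+-]?\d*\.?\d+)(?:\((\d+)\))?$")


def split_su(text):
    """Split e.g. '0.813(2)' into (value, su); su is None when absent."""
    match = NUMBER_WITH_SU.match(text)
    if match is None:
        raise ValueError(f"not a plain CIF number: {text!r}")
    value = Decimal(match.group(1))
    if match.group(2) is None:
        return float(value), None
    # The su digits line up with the last digits of the value, so they
    # share its decimal exponent (-3 for 0.813).
    su = Decimal(match.group(2)).scaleb(value.as_tuple().exponent)
    return float(value), float(su)


for cell in ("0.813(2)", "0.731(19)", "5248.52108"):
    print(cell, "->", split_su(cell))
```

Using `Decimal` keeps track of how many digits were quoted, which a plain `float` conversion would lose.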

CIF dictionaries • A collection of definitions also in CIF-like format • Tags drawn from restricted vocabulary of around 50 possible tags • About half of these tags are for human consumption only (e.g. item_description) • Almost all the rest are for validation • Three variants: DDL1/2/m

Comparing the DDLs • DDL1: the first DDL – Basic relational database descriptors – Simplest, 27 tags in total • DDL2: development driven by macromolecular database (PDB) – Detailed data types – Excellent relational database match – 60 tags • DDLm (draft): brings DDL1 and 2 together, addresses deficiencies – Excellent relational database descriptors – Vectors and matrices – Algorithms for describing relationships between datanames – Dictionaries can be assembled out of reusable chunks

The DDL Dictionary “datamodel” A dictionary language will refer to a datamodel, which must be compatible with the syntactical datamodel. For all CIF DDLs, this datamodel is isomorphic to a relational database. In particular, note the following equivalences: • Category = table description • Category key = table key

Not implied by the grammar!

• Datafile loop = a filled-in table in the database • A loop row = a table record • Loop headers = table columns • Unlooped datanames = values taken by columns in a single-row table
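These equivalences can be made concrete by loading a loop into an actual relational table. The sketch below uses Python's built-in `sqlite3` module with the detector loop from the example datafile; the table and column names are simply the dataname parts after the category prefix, which is an illustrative choice rather than anything CIF prescribes. The CIF value `.` (inapplicable) is stored as NULL.

```python
import sqlite3

# Rows of the _xafs_detectors loop; '.' in the datafile becomes None (NULL).
DETECTOR_ROWS = [
    ("monitor", "monitor", "ionisation", None),
    ("fl-detector", "detector", "fluorescence", "36-element PAD detector"),
    ("io-detector", "detector", "ionisation", None),
    ("foil", "foil", "ionisation", None),
]

conn = sqlite3.connect(":memory:")
# Category -> table description; category key -> primary key.
conn.execute(
    """
    CREATE TABLE xafs_detectors (
        label           TEXT PRIMARY KEY,
        position        TEXT,
        type            TEXT,
        special_details TEXT
    )
    """
)
# Loop rows -> table records.
conn.executemany("INSERT INTO xafs_detectors VALUES (?, ?, ?, ?)", DETECTOR_ROWS)

ionisation_labels = [
    row[0]
    for row in conn.execute(
        "SELECT label FROM xafs_detectors WHERE type = 'ionisation' ORDER BY label"
    )
]
print(ionisation_labels)

# A second row with the same key value is rejected by the database,
# which is the behaviour a category key is meant to guarantee.
try:
    conn.execute(
        "INSERT INTO xafs_detectors VALUES ('monitor', 'monitor', 'ionisation', NULL)"
    )
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
print(duplicate_rejected)
```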

Definitions • Human-readable material, including examples and descriptions • Relational database ready: keys are definable • Tags for validation, e.g. mandatory or not?

save_XAFS_DETECTORS
    _category.description
;
  Data items in the XAFS_DETECTORS category record details
  about the layout and type of detectors used in an XAFS
  experiment.  Further details about particular aspects of the
  detectors used are recorded in the separate categories
  XAFS_DETECTORS_IONISATION and XAFS_DETECTORS_FLUORESCENCE.
;
    _category.id              xafs_detectors
    _category.mandatory_code  no
    _category_key.name        '_xafs_detectors.label'
    loop_
        _category_examples.detail
        _category_examples.case
;
  EXAMPLE 1: A simple three-ionisation counter setup for
  absorption measurements
;
;
  loop_
    _xafs_detectors.label
    _xafs_detectors.position
    _xafs_detectors.type
   monitor        monitor        ionisation
   detector       detector       ionisation
   foil           foil           ionisation
;
;
  EXAMPLE 2: A fluorescence detector as well as 3 ionisation
  chambers
;
;
  loop_
    _xafs_detectors.label
    _xafs_detectors.position
    _xafs_detectors.type
    _xafs_detectors.special_details
   monitor        monitor        ionisation      .
   fl-detector    detector       fluorescence    '36-element Ge PAD detector'
   io-detector    detector       ionisation      .

An item definition • Enumerated values allow validation • The relevant category is identified • Correct data value construction is specified

save__xafs_detectors.type
    _item_description.description
;
    The type of detector used for detecting photons
;
    _item.name            '_xafs_detectors.type'
    _item.category_id     'xafs_detectors'
    _item.mandatory_code  yes
    _item_type.code       string
    loop_
    _item_enumeration.value
    _item_enumeration.detail
        ionisation    'An ionisation chamber'
        fluorescence  'A pixelated fluorescence detector'
        Lytle         'A Lytle detector'
save_
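To show what validation against such a definition might look like, here is a small Python sketch. The `DEFINITIONS` table is transcribed by hand from the item definition above (a real tool would read it from the dictionary file), and `validate_rows` is a name invented for this illustration. The value `scintillator` is a made-up entry chosen because it is not in the enumeration.

```python
# Hand-transcribed from the _xafs_detectors.type definition: the item is
# mandatory and may only take one of three enumerated values.
DEFINITIONS = {
    "_xafs_detectors.type": {
        "mandatory": True,
        "allowed": {"ionisation", "fluorescence", "Lytle"},
    },
}


def validate_rows(rows, definitions):
    """Return a list of human-readable problems found in loop rows."""
    problems = []
    for number, row in enumerate(rows, start=1):
        for name, definition in definitions.items():
            if name not in row:
                if definition["mandatory"]:
                    problems.append(f"row {number}: {name} is mandatory but missing")
                continue
            allowed = definition.get("allowed")
            if allowed is not None and row[name] not in allowed:
                problems.append(
                    f"row {number}: {name} has unlisted value {row[name]!r}")
    return problems


rows = [
    {"_xafs_detectors.label": "monitor", "_xafs_detectors.type": "ionisation"},
    {"_xafs_detectors.label": "fl-detector", "_xafs_detectors.type": "scintillator"},
    {"_xafs_detectors.label": "foil"},
]
problems = validate_rows(rows, DEFINITIONS)
for problem in problems:
    print(problem)
```

The point of the design is that the checking code contains no XAFS knowledge at all; everything specific to the field lives in the dictionary.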

Another item definition • Units can be specified

save__xafs_reduced.energy
    _item_description.description
;
  The energy at which a single measurement of
  absorbance was taken, after all beamline-dependent
  corrections have been applied.
;
    _item.name            '_xafs_reduced.energy'
    _item.category_id     'xafs_reduced'
    _item.mandatory_code  yes
    _item_type.code       float
    _item_units.code      electron_volts
save_

Issues • Each data tag can only appear once in a data block, so if 'energy' appears in multiple tables, it must have multiple names • Data items in the same category must always be tabulated together. For example, k and χ(k) • Only 2D tables are possible • No complex data structures (vectors, matrices)
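The first issue is easy to check mechanically. In the Python sketch below, the function name `repeated_datanames` and the `_xafs_raw.*` datanames are hypothetical and exist only for illustration; the comparison is case-insensitive because CIF datanames are.

```python
from collections import Counter


def repeated_datanames(datanames):
    """Return the datanames (lower-cased) that occur more than once."""
    counts = Counter(name.lower() for name in datanames)
    return sorted(name for name, count in counts.items() if count > 1)


# Two tables that both need an energy column: each gets its own dataname,
# distinguished by category (the _xafs_raw names are hypothetical).
distinct = [
    "_xafs_raw.energy",
    "_xafs_raw.counts",
    "_xafs_reduced.energy",
    "_xafs_reduced.absorbance",
]
print(repeated_datanames(distinct))

# Reusing one dataname in a second loop of the same block is not allowed,
# even if the capitalisation differs.
clashing = ["_xafs_reduced.energy", "_XAFS_reduced.energy"]
print(repeated_datanames(clashing))
```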

Management • As the field develops, dictionaries will need updating. How is this going to be managed? • Where are the canonical copies kept? • IUCr: – permanent committee (COMCIFS) – Dictionary management groups report to COMCIFS – COMCIFS monitor developments in relevant IUCr commissions (computing, nomenclature) – IUCr maintains web-accessible register of dictionaries, the dictionaries themselves and CIF documentation

A protected standard • CIF is trademarked for protective purposes (cf 'Linux') and the standards themselves are copyrighted by the IUCr • Statement of policy http://ww1.iucr.org/ipr.html • All software claiming to read or write a CIF-format file must actually be able to do so • IUCr is interested in promoting the standard for use in structural science

CIF services from COMCIFS • Verification of syntax (datafile and dictionary) • Check that XAFS ontology matches IUCr-maintained ontologies (if required) • Advice on dictionary construction

Decisions, decisions • Agreement on definitions • Agreement on purpose(s) – For databases – Data transfer – Publication supplementary material • Agreement on minimum information for each purpose • Datafile format – Text/binary, simple/complex, established/new • Dictionary format, if any • Dictionary language – Which set of tags? DDL1/2/m/custom

• Management – Custodian – Update mechanism

