Submitted by toniher on
Molecular Biology data (from sequence strings to protein structural coordinates) have long been released openly to the public, and the Web has long been an interface for exploring and visualizing those data. Indeed, my first approaches to Bioinformatics and to more or less serious programming consisted of preparing CGI entry points to command-line applications or to the results of analysed data.
I think most Bioinformaticians of my age learned or assumed that serious business had to be done by bulk downloading raw files and then working on them (normally, at that time, on not very popular operating systems). At the same time, I have the impression that web interfaces were usually regarded as a kind of accessory convenience for less tech-savvy scientists (for instance, our wet lab colleagues).
As more and more data were produced, and at an ever faster pace, keeping up to date or handling different releases (sometimes in a variety of formats) became a real issue.
In this scenario, as a compromise that enabled automated access without having to deal with all the infrastructure and management difficulties of handling large data, some providers started to offer APIs for their resources.
In some cases, such as the ENSEMBL API, this consisted of a powerful abstract interface to a public-access MySQL database, allowing scientists to design their queries in a more biologically oriented fashion (thinking about concepts such as sequences, annotation or chromosome locations) instead of having to pay attention to whatever database structure lies behind.
In parallel, other solutions coming from enterprise environments, such as SOAP, started to gain a foothold. This web services protocol, despite its power and its possibilities for two-way communication, still represented a barrier to wider adoption among third-party clients (which, most of the time, might simply want easy-to-use, read-only access).
Finally, we are seeing an increasing number of knowledge bases and services adopt RESTful APIs. That is the case of NCBI E-Utilities, released some years ago, and more recently a platform such as ENSEMBL is taking the same step.
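To give an idea of why this style lowers the barrier, here is a minimal Python sketch of a read-only request to a RESTful service: the whole query is just a URL. The base address is the public NCBI E-Utilities endpoint as I know it today (an assumption, since it is not given in this post), and `efetch_url` is a hypothetical helper name. Nothing is sent over the network; the function only builds the address.

```python
from urllib.parse import urlencode

# Assumption: public base URL of NCBI E-Utilities (not stated in the post).
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def efetch_url(db: str, record_id: str,
               rettype: str = "fasta", retmode: str = "text") -> str:
    """Build an efetch request URL; no network call is made here."""
    # urlencode escapes the values and joins them as key=value pairs,
    # keeping the order in which the dictionary was written.
    query = urlencode({
        "db": db,
        "id": record_id,
        "rettype": rettype,
        "retmode": retmode,
    })
    return f"{EUTILS_BASE}/efetch.fcgi?{query}"


if __name__ == "__main__":
    # NM_000546 is used only as an example accession.
    print(efetch_url("nucleotide", "NM_000546"))
```

Any client able to open a URL (a browser, a script, a web app) could use the resulting address, which is a large part of the appeal compared with a SOAP envelope.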
RESTful access to resources eases the creation of more diverse client solutions, such as web apps, which are essentially based on JavaScript and can process XML or JSON responses very naturally. Depending on the case, although an intermediary layer between the primary source and the final client app may still be necessary, the workflow is simplified by adopting a more web-native style of communication.
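As a rough illustration of how little work a JSON response requires on the client side, here is a small Python sketch. The payload is hand-written for this example and its field names are assumptions; it is only loosely shaped like what a gene lookup service might return, and `summarise_gene` is a hypothetical helper.

```python
import json

# Hypothetical payload written for illustration only; the identifiers and
# field names are assumptions, not the output of any real service.
SAMPLE_RESPONSE = """
{"id": "GENE0001", "display_name": "EXAMPLE1", "seq_region_name": "7",
 "start": 1000, "end": 2500, "strand": -1}
"""


def summarise_gene(raw: str) -> str:
    """Turn a JSON gene record into a one-line, human-readable summary."""
    record = json.loads(raw)  # JSON text becomes a plain dictionary
    # Coordinates are treated as 1-based and inclusive (an assumption).
    length = record["end"] - record["start"] + 1
    strand = "+" if record["strand"] > 0 else "-"
    return (
        f'{record["display_name"]} ({record["id"]}): '
        f'chr{record["seq_region_name"]}:{record["start"]}-{record["end"]} '
        f"[{strand}], {length} bp"
    )


if __name__ == "__main__":
    print(summarise_gene(SAMPLE_RESPONSE))
```

The same few lines translate almost directly to JavaScript, where the parsed response is already a native object that a web app can hand to its visualisation code.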
In my opinion, this approach can enable more parties (who can afford to be less familiar with all the historical intricacies of Bioinformatics conventions) to explore biological data, coming from other backgrounds where the Web is already succeeding, and to convey this knowledge to audiences who might also benefit from it nowadays. These audiences should range from scientists to ordinary citizens interested in themselves or in how we understand Life.
This last paragraph summarises the statement of principles behind the BioData Design Jam that Alina Mierluș and I prepared for Campus Party 2012 in Berlin. This Saturday, November 10, as part of the enormous Mozilla Festival 2012 in London, we will iterate on it again with all those who want to join (more details).
Some extra links
- Personal experience blogpost about REST and ENSEMBL
- Webservices in EMBL (both REST and SOAP)
- Bio2RDF (entry point for exploring many biodata resources the Linked Open Data way)