Background

Modern biological science generates a vast amount of data, the analysis of which presents a major challenge to researchers. Data are commonly represented in tables stored as plain text files and require line-by-line parsing for analysis, which is time consuming and error prone. Furthermore, there is no simple means of indexing these files so that rows containing particular values can be quickly found.

Results

We introduce a new data format and software library called wormtable, which provides efficient access to tabular data in Python. Wormtable stores data in a compact binary format, provides random access to rows, and enables sophisticated indexing on columns within these tables. Files written in existing formats can be easily converted to wormtable format, and we provide conversion utilities for the VCF and GTF formats.

Conclusions

Wormtable's simple API allows users to process large tables orders of magnitude more quickly than is possible when parsing text. Furthermore, the indexing facilities provide efficient access to subsets of the data along with providing useful methods of summarising columns. Since third-party libraries or custom code are no longer needed to parse complex plain text formats, analysis code can also be substantially simpler as well as being uniform across different data formats. These benefits of reduced code complexity and greatly increased performance allow users much greater freedom to explore their data.
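The abstract's three central ideas (compact binary storage, random access to rows, and an index on a column) can be illustrated with a minimal Python sketch using only the standard library. This is not wormtable's API or file format: the fixed-width row layout and every name here (`ROW`, `write_table`, `read_row`, `build_index`, `rows_in_range`) are hypothetical and chosen purely for illustration.

```python
import bisect
import io
import struct

# Hypothetical fixed-width row layout (an assumption, not wormtable's format):
# chromosome id (uint8), position (uint32), quality (float32), little-endian.
ROW = struct.Struct("<BIf")


def write_table(rows, stream):
    """Pack each (chrom, pos, qual) row into one fixed-width binary record."""
    for row in rows:
        stream.write(ROW.pack(*row))


def read_row(stream, row_id):
    """Random access: seek directly to row_id without scanning earlier rows."""
    stream.seek(row_id * ROW.size)
    return ROW.unpack(stream.read(ROW.size))


def build_index(stream, num_rows, column):
    """Return a sorted list of (value, row_id) pairs for one column."""
    return sorted((read_row(stream, j)[column], j) for j in range(num_rows))


def rows_in_range(index, low, high):
    """Return row ids whose indexed value v satisfies low <= v < high."""
    start = bisect.bisect_left(index, (low, -1))
    stop = bisect.bisect_left(index, (high, -1))
    return [row_id for _, row_id in index[start:stop]]


# Made-up example rows: (chromosome id, position, quality).
rows = [(1, 500, 30.0), (1, 120, 12.5), (2, 75, 99.0), (1, 980, 45.0)]
table = io.BytesIO()  # stands in for a file on disk
write_table(rows, table)

pos_index = build_index(table, len(rows), column=1)
hits = rows_in_range(pos_index, 100, 600)  # ids of rows with 100 <= pos < 600
```

Because every record has the same size, a row is located by arithmetic on its id rather than by parsing the preceding text, and the sorted index turns a range query on a column into two binary searches. A real implementation would also need to handle variable-length values and tables too large to hold an index in memory, which this sketch deliberately ignores.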

More information

DOI

10.1186/1471-2105-14-356

Type

Journal article

Publication Date

2013-12-01

Volume

14

Addresses

University of Edinburgh, King's Buildings, West Mains Road, Edinburgh, EH9 3JT, UK. jerome.kelleher@ed.ac.uk.

Keywords

Animals; Humans; Drosophila Proteins; Random Allocation; Computational Biology; Genomics; Genome, Human; Computer Simulation; Software; Libraries, Digital; Databases, Factual; Genome, Insect; Search Engine; Electronic Data Processing