Longitudinal datasets

What is longitudinal data?

Longitudinal data is the same type of information on the same types of subjects, tracked over time.

The Gazette’s longitudinal data contains specific information covering insolvencies, wills and probate, civil promotions, church appointments, military promotions and honours. These datasets go back as far as 1900. The limiting factors are the importance and level of interest in the information, and how easy it is to extract.

What are the benefits of using longitudinal data?

Longitudinal data helps you find exactly what you’re looking for, with greater accuracy and more relevant search results. It can be used in several ways, and the main benefits are set out below.

The ability to find information in The Gazette going back to 1900 has been greatly improved. We have processed the information contained in The Gazette’s indexes and extracted key information, such as the types of notice, people’s names and links to the pages where these occur. This means you can run specific queries, for example to find all the references to people with the surname ‘Bircham’ who have received a military promotion.

The longitudinal data extracted from notices after 1997 allows more sophisticated queries, enabling users to:

distinguish patterns, such as looking at the number of insolvency notices in a particular geographic area over time
measure and evaluate the effectiveness of a specific policy

How have these datasets been created?

Data prior to 1997 was extracted from historic Gazette documents in the form of issues, supplements and indexes. We have processed archive indexes from 1900 onwards, and notices from 1998 onwards.

Datasets after 1997 were created from extensible markup language (XML), which was used to display the original notices. For datasets before 1997, and for data that doesn’t exist in XML format, we have used existing scanned images of documents and our own information extraction tools. These tools exploit specific document structure conventions to recognise and extract meaningful data.

What datasets are available?

A number of longitudinal datasets are available for download. The following are available free of charge:

Wills and probate (from 1998): missing wills, missing beneficiaries, missing creditors
Appointments (from 1900): civil promotions, church appointments and military promotions
Honours, awards and charters (from 1900)
Wills and probate (from 1900)

The following are available for a charge. Please contact data@thegazette.co.uk or telephone 01603 985949 to discuss your requirements:

Companies (from 1998): insolvency, filings at Companies House, striking off, dissolutions, reinstatements, takeovers and transfers, changes in capital structure, property disclaimers, claims against pension schemes, societies notices and cancellations
Individuals and partnerships (from 1998): insolvency, partnership changes and dissolution

Data held within The Gazette publications, unless stated otherwise, is Crown Copyright and is therefore free for you to use under the Open Government Licence. However, please note that this licence does not cover the re-use of personal data.

In what form are the datasets accessible?

The most user-friendly interface for longitudinal data is The Gazette browser. You can access this using the following format for URLs:

https://www.thegazette.co.uk/{edition}/index/year/{year}/volume/{volume}/page/{index page}?view=linked-data

Where:

{edition}: the edition of The Gazette index, eg London, Edinburgh or Belfast

{year}: the year of the index, eg 1909

{volume}: the volume of the index, eg 4

{index page}: the page of the index volume, eg 012 (must be 3 digits)
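The URL format above can be assembled programmatically. The following is a minimal sketch in Python, not an official client: the function name `index_page_url` and its parameter names are hypothetical, and it assumes that the edition is passed exactly as it should appear in the URL (eg London) and that the page number is padded with leading zeros to reach 3 digits.

```python
from urllib.parse import quote

BASE_URL = "https://www.thegazette.co.uk"


def index_page_url(edition: str, year: int, volume: int, index_page: int) -> str:
    """Build the linked-data URL for one page of a Gazette index volume.

    Hypothetical helper: it only fills in the URL template, it does not
    check that the requested edition, year, volume or page exists.
    """
    if not 0 <= index_page <= 999:
        raise ValueError("index page must be between 0 and 999")
    # Percent-encode the edition in case it contains unexpected characters.
    edition_segment = quote(edition, safe="")
    # The page segment must be exactly 3 digits, so pad with leading zeros.
    return (
        f"{BASE_URL}/{edition_segment}/index/year/{year}"
        f"/volume/{volume}/page/{index_page:03d}?view=linked-data"
    )


print(index_page_url("London", 1909, 4, 12))
```

Using the example values given above (London, 1909, volume 4, page 012), this prints https://www.thegazette.co.uk/London/index/year/1909/volume/4/page/012?view=linked-data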

For more technical users wishing to make use of the data, the following resources are available:

SPARQL endpoints are available for query at:
- https://www.thegazette.co.uk/sparql (post 1997)
- https://www.thegazette.co.uk/longitudinal-dataset/sparql (pre 1997)
Both endpoints can also be queried using the SPARQL editor at https://www.thegazette.co.uk/flint
Data dumps in Resource Description Framework (RDF) format, which can be used for offline analysis, are available at:
- ftp://ftp.thegazette.co.uk
Data dumps in XML format, to be used for correction, are available at the address below (submission of corrections will be coming soon, see below for provenance tracking):
- https://github.com/TheGazette/Dataset-Index-{decade} (where {decade} is a year range in the format 1990-1999)
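As a starting point for using the resources listed above from a script, here is a minimal Python sketch. It is an illustration under stated assumptions rather than a documented client: the helper names are hypothetical, the sample query is deliberately generic because the vocabulary used in the datasets is not described here, and it assumes the endpoints accept queries sent by HTTP GET with a `query` parameter and can return JSON results, as the SPARQL 1.1 Protocol allows. It also does not assume that a repository exists for every decade; it only fills in the naming pattern.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

POST_1997_ENDPOINT = "https://www.thegazette.co.uk/sparql"
PRE_1997_ENDPOINT = "https://www.thegazette.co.uk/longitudinal-dataset/sparql"

# Schema-agnostic query: returns any five triples, so it makes no
# assumption about the classes or properties used in the datasets.
SAMPLE_QUERY = "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 5"


def sparql_get_url(endpoint: str, query: str) -> str:
    """Return a GET URL carrying the query in a 'query' parameter."""
    return f"{endpoint}?{urlencode({'query': query})}"


def run_query(endpoint: str, query: str, timeout: float = 30.0) -> dict:
    """Send the query and parse a JSON result set.

    Assumption: the endpoint honours the Accept header below. If it does
    not, the response will need to be parsed in whatever format it returns.
    """
    request = Request(
        sparql_get_url(endpoint, query),
        headers={"Accept": "application/sparql-results+json"},
    )
    with urlopen(request, timeout=timeout) as response:
        return json.load(response)


def dataset_repo_url(year: int) -> str:
    """Fill in the {decade} placeholder for the decade containing 'year'."""
    start = year // 10 * 10
    return f"https://github.com/TheGazette/Dataset-Index-{start}-{start + 9}"


if __name__ == "__main__":
    # Building URLs needs no network access; run_query() does.
    print(sparql_get_url(PRE_1997_ENDPOINT, SAMPLE_QUERY))
    print(dataset_repo_url(1915))
```

Once the vocabulary of a dataset is known, for example by browsing results in the SPARQL editor, `SAMPLE_QUERY` can be replaced with a more specific query and passed to `run_query`.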

In the coming months we’ll be developing user-friendly search capabilities using this valuable resource, helping to give a detailed insight into those honoured during World War 1.

How are the datasets maintained?

For incoming notices, the relevant data is collected in a machine-readable form, and the dataset can be extended by regular update. For archive datasets going back to 1900, the quantity of archive data in scanned format is fixed.

Though the initial mechanical extraction of the data will inevitably contain errors, the quality will improve over time through community-driven corrections, supported by provenance tracking.

Provenance tracking is the ability for users to manually correct any errors in the extracted data. Should an error be spotted, we will provide the capability for users to manually correct it and submit the correction for approval. All corrections included in the longitudinal data will be recorded in the provenance trail, which helps to increase transparency.

Find out more about:

our policy for re-using our data
our data service

We can also offer consultancy services to support users of this resource. For more information, please contact technical.support@thegazette.co.uk.