Data in general

Term Definition
Administrative Data

See: Administrative data.

Big Data

Large amounts of information that, because of their scale, may need novel or non-standard methods to process. In the original coining, "big" referred to one or more of volume (the raw size of the data), velocity (the rate at which new data were generated) or variety (the complexity or richness of the data).

Census

A survey of a national population which asks questions about age, gender, background and so on. In the UK, censuses are carried out every 10 years or so. Census information helps with things like local service planning and making important decisions. Census data can be used in academic research. If so, it is anonymised before being used.

See also: Anonymisation.

Characteristic

A piece of information about an individual, place or thing that is potentially useful in data analysis. For example, characteristics of a person might be age, gender, ethnicity, socioeconomic status and education level. If data about individuals were recorded in a table, the columns of the table might be characteristics.

See also: Socio-demographic Factors.

Code Lists

A collection of specific, standard codes (labels) that are used in healthcare to represent different things, such as medical diagnoses, treatments, or procedures.

Contrast with: Code Control.

Data

See: Data.

Data Curation

See: Data Curation.

Data Literacy

The ability to understand, analyse, interpret, and critically evaluate data and data-related studies.

Data Mining

See: Data Mining.

Data Science

A field of analysis focused on extracting knowledge and insights from data. It combines techniques from data management, computer science, and statistics to store, organize, and analyze data. Data science also involves applying this knowledge to specific problems, making it highly interdisciplinary, with experts from various backgrounds (such as clinicians and computer scientists) collaborating. Its goal is to uncover useful patterns and make data-driven decisions or predictions.

Data Users

See: Data User.

Database

See also: Relational Database.

See: Database.

FAIR Data

FAIR is a set of principles for ensuring that data are:

Findable: Easy to locate through clear identification and metadata.

Accessible: Retrievable through standard methods, even if authentication is needed.

Interoperable: Can work across different systems and with other datasets.

Reusable: Well-documented and properly licensed so others can use it.

See also: FAIR Data.

See also: FAIR.

Longitudinal Dataset

A collection of data gathered from the same group of people over a long period of time, to see how things change. This may involve asking the same questions at different ages.
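As a rough illustration of the idea, the Python sketch below follows two people who are asked the same question at ages 10, 20 and 30, and works out how each person's answer changed. The records, the field names (person_id, age, exercise_hours) and the values are all made up for the example.

```python
from collections import defaultdict

# Made-up longitudinal records: the same two people answer the same
# question ("hours of exercise per week") at three different ages.
records = [
    {"person_id": 1, "age": 10, "exercise_hours": 6},
    {"person_id": 2, "age": 10, "exercise_hours": 2},
    {"person_id": 1, "age": 20, "exercise_hours": 4},
    {"person_id": 2, "age": 20, "exercise_hours": 5},
    {"person_id": 1, "age": 30, "exercise_hours": 3},
    {"person_id": 2, "age": 30, "exercise_hours": 5},
]

# Collect each person's answers in age order.
answers_by_person = defaultdict(list)
for record in sorted(records, key=lambda r: r["age"]):
    answers_by_person[record["person_id"]].append(record["exercise_hours"])

# Change between the first and the last answer for each person.
change = {pid: answers[-1] - answers[0] for pid, answers in answers_by_person.items()}
print(change)
```

Because every record carries the same person identifier, answers from different years can be lined up for the same individual, which is what distinguishes a longitudinal dataset from a series of unrelated surveys.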

Metadata

Data that describes or provides information about other data. It is used to provide context, meaning, and structure to data, and helps to make it easier to understand and use. Metadata can describe various aspects of data, such as its content, format, structure, origin, quality, and usage.
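As a rough illustration, metadata can be pictured as a small record kept alongside a dataset. The Python sketch below uses a made-up dataset; the metadata field names and values are assumptions chosen for the example and do not follow any particular metadata standard.

```python
# Hypothetical dataset: three rows of made-up survey responses.
dataset = [
    {"participant_id": 1, "age": 34, "region": "North"},
    {"participant_id": 2, "age": 51, "region": "South"},
    {"participant_id": 3, "age": 27, "region": "North"},
]

# Metadata: data that describes the dataset rather than the participants.
# All field names and values here are illustrative assumptions.
metadata = {
    "title": "Example survey extract",
    "format": "list of records",
    "origin": "made-up data for illustration",
    "variables": sorted(dataset[0].keys()),
    "row_count": len(dataset),
}

for field, value in metadata.items():
    print(f"{field}: {value}")
```

Someone who has never opened the dataset can read the metadata to learn what it contains, where it came from and how large it is.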

Personal Data

UK data protection regulation defines personal data as any piece of information that someone can use to identify, with some degree of accuracy, a living person. It also includes information that can confirm a person's physical presence somewhere.

Examples of personal data would be: a name and surname; a home address; an email address; an identification card number; location data; an Internet Protocol (IP) address; the advertising identifier of your phone.

Personal data can also be sensitive (or "Special Category Data"): see Sensitive Data.

Relational Database

An organised collection of data, where data are related to each other in a systematic manner so that they can be reorganised and accessed in a number of different ways. A relational database may house one or many datasets.
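A minimal sketch of this idea, using Python's built-in sqlite3 module and an in-memory database. The two tables (people and visits), their columns and the rows are hypothetical and exist only for the example.

```python
import sqlite3

# In-memory database; table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (person_id INTEGER PRIMARY KEY, name TEXT)")
conn.execute(
    "CREATE TABLE visits ("
    "visit_id INTEGER PRIMARY KEY, "
    "person_id INTEGER REFERENCES people(person_id), "
    "visit_date TEXT)"
)
conn.executemany(
    "INSERT INTO people VALUES (?, ?)",
    [(1, "Person A"), (2, "Person B")],
)
conn.executemany(
    "INSERT INTO visits VALUES (?, ?, ?)",
    [(10, 1, "2024-01-05"), (11, 1, "2024-03-12"), (12, 2, "2024-02-20")],
)

# The shared person_id column relates the two tables, so the same data
# can be reorganised in different ways, e.g. counting visits per person.
visits_per_person = conn.execute(
    "SELECT p.name, COUNT(v.visit_id) "
    "FROM people AS p JOIN visits AS v ON v.person_id = p.person_id "
    "GROUP BY p.person_id ORDER BY p.person_id"
).fetchall()
print(visits_per_person)
conn.close()
```

Here each table could be thought of as one dataset, and the shared identifier is what lets them be combined in a single query.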

See also: Database.

Sensitive Data

The UK General Data Protection Regulation (UK GDPR) refers to sensitive data as Special Category Data, which is subject to specific processing conditions. It covers personal data revealing racial or ethnic origin, political opinions, religious or philosophical beliefs, or trade-union membership; genetic data; biometric data processed solely to identify a human being; health-related data; and data concerning a person's sex life or sexual orientation.

Commercial data such as retail information, business details, intellectual property (IP) and copyright information, or confidential product details can also be considered sensitive data.

Data sensitivity can be classified at an institutional level within policy documents (e.g. highly confidential, confidential, not classified), with handling requirements placed on the different levels of confidentiality. See: Personal Data.

Socio-demographic Factors

Characteristics of individuals or populations related to social and demographic aspects such as age, gender, ethnicity, socioeconomic status, and education level.

See also: Characteristic.

Structured Data

Data which are organised and formatted using pre-defined rules, so that computational analysis is easier. For example, structured data is often stored as tables in a database where each column represents a different type of information (like numbers or words), and each cell in the table holds a single piece of data. This organisation helps with sorting, searching, and understanding the data more easily.
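The short Python sketch below shows why this organisation makes sorting and searching straightforward. The table, its column names and its values are made up for the example.

```python
# Made-up structured data: each row is one person, each key is a column.
rows = [
    {"name": "Person A", "age": 42, "town": "Leeds"},
    {"name": "Person B", "age": 29, "town": "Cardiff"},
    {"name": "Person C", "age": 35, "town": "Leeds"},
]

# Sorting: order the rows by the "age" column.
by_age = sorted(rows, key=lambda row: row["age"])

# Searching: keep only the rows whose "town" column equals "Leeds".
in_leeds = [row["name"] for row in rows if row["town"] == "Leeds"]

print([row["name"] for row in by_age])
print(in_leeds)
```

Because every row has the same columns and every cell holds a single value, each operation is one line; doing the same with free text would first require working out where the ages and towns are.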

See also: Unstructured Data.

Unconsented Data

Personal data used for secondary purposes (such as research) where a specific public benefit has been demonstrated, usually relying on Article 6 and Article 9 of the UK General Data Protection Regulation (UK GDPR), rather than individual consent, as the legal basis for that secondary use.

See also: Consent; UK General Data Protection Regulation (UK GDPR).

Unstructured Data

Data that has limited structure, or structure that is very difficult to process computationally. Examples of unstructured data include free text (such as paragraphs of written information), images (such as X-rays or scans) and scanned letters.

See also: Structured Data.

Variable

Any characteristic, number, or quantity that is recorded in a dataset for each observation. In data analysis, a variable is a symbolic name that represents a particular type of information in a dataset. For example, date of birth is a variable representing when a person was born.

See also: Characteristic.