Data in general

Term	Definition
Administrative Data	See: Administrative data 🔗.
Big Data	Large amounts of information that, because of their scale, may need novel or non-standard methods to process. In the original coining, "big" referred to one or more of volume (the raw size of the data), velocity (the rate at which new data are generated) or variety (the complexity or richness of the data).
Census	A survey of a national population which asks questions about age, gender, background and so on. In the UK, censuses are carried out every 10 years or so. Census information helps with things like local service planning and making important decisions. Census data can be used in academic research. If so, it is anonymised before being used. See also: Anonymisation.
Characteristic	A piece of information about an individual, place or thing that is potentially useful in data analysis. For example, characteristics of a person might be age, gender, ethnicity, socioeconomic status and education level. If data about individuals were recorded in a table, the columns of the table might be characteristics. See also: Socio-demographic Factors.
Code Lists	A collection of specific, standard codes (labels) that are used in healthcare to represent different things, such as medical diagnoses, treatments, or procedures. Contrast with: Code Control.
Data	See: Data 🔗.
Data Curation	See: Data Curation 🔗.
Data Literacy	The ability to understand, analyse, interpret, and critically evaluate data and data related studies.
Data Mining	See: Data Mining 🔗.
Data Science	A field of analysis focused on extracting knowledge and insights from data. It combines techniques from data management, computer science, and statistics to store, organise, and analyse data. Data science also involves applying this knowledge to specific problems, which makes it highly interdisciplinary: experts from various backgrounds (such as clinicians and computer scientists) collaborate. Its goal is to uncover useful patterns and make data-driven decisions or predictions.
Data Users	See: Data User 🔗.
Database	See also: Relational Database. See: Database 🔗.
FAIR Data	Data that follow the FAIR principles, a set of principles ensuring data are: Findable (easy to locate through clear identification and metadata); Accessible (retrievable through standard methods, even if authentication is needed); Interoperable (able to work across different systems and with other datasets); Reusable (well-documented and properly licensed so others can use them). See also: FAIR Data 🔗. See also: FAIR 🔗.
Longitudinal Dataset	A collection of data gathered from the same group of people over a long period to see how things change. This may involve asking the same questions at different ages.
Metadata	Data that describes or provides information about other data. It is used to provide context, meaning, and structure to data, and helps to make it easier to understand and use. Metadata can describe various aspects of data, such as its content, format, structure, origin, quality, and usage.
Personal Data	UK data protection regulation defines personal data as any piece of information that someone can use to identify, with some degree of accuracy, a living person. It is also something which can confirm your physical presence somewhere. Examples of personal data would be: a name and surname; a home address; an email address; an identification card number; location data; an Internet Protocol (IP) address; the advertising identifier of your phone. Personal data can also be sensitive (or "Special Category Data"): see Sensitive Data.
Relational Database	An organised collection of data, where data are related to each other in a systematic manner so that they can be reorganised and accessed in a number of different ways. A relational database may house one or many datasets. See also: Database. See also: Database 🔗.
Sensitive Data	The UK General Data Protection Regulation (UK GDPR) defines sensitive data as Special Category Data, which is subject to specific processing conditions: personal data revealing racial or ethnic origin, political opinions, religious or philosophical beliefs, or trade-union membership; genetic data; biometric data processed solely to identify a human being; health-related data; and data concerning a person’s sex life or sexual orientation. Commercial data such as retail information, business details, IP (intellectual property) and copyright information or confidential product details can also be considered sensitive data. Data sensitivity can be classified at an institutional level within policy documents (e.g. highly confidential, confidential, not classified), with handling requirements based on the level of confidentiality required. See also: Personal Data.
Socio-demographic Factors	Characteristics of individuals or populations related to social and demographic aspects such as age, gender, ethnicity, socioeconomic status, and education level. See also: Characteristic.
Structured Data	Data which are organised and formatted using pre-defined rules, so that computational analysis is easier. For example, structured data is often stored as tables in a database where each column represents a different type of information (like numbers or words), and each cell in the table holds a single piece of data. This organisation helps with sorting, searching, and understanding the data more easily. See also: Unstructured Data.
Unconsented Data	Personal data used for secondary purposes (such as research) where a specific public benefit has been demonstrated, usually with Article 6 and Article 9 of the General Data Protection Regulation as the legal basis for that secondary use (as opposed to individual consent). See also: Consent; UK General Data Protection Regulation (UK GDPR).
Unstructured Data	Data that has limited structure, or structure that is very difficult to process computationally. Examples of such unstructured data include free text, like paragraphs of written information, or images such as X-ray or scan pictures, or scanned letters. See also: Structured Data.
Variable	Any characteristic, number, or quantity that is represented in a dataset for each observation. In data analysis, a variable is a symbolic name used to represent different types of information in datasets. For example, date of birth is a variable representing when a person was born. See also: Characteristic.
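
To make a few of the terms above concrete (Structured Data, Variable, Metadata, Longitudinal Dataset), here is a minimal Python sketch. Every name and value in it is hypothetical, invented purely for illustration, and not drawn from any real dataset or standard.

```python
# Hypothetical structured, longitudinal data: each row is one observation of
# one participant at one age, and each key is a variable.
records = [
    {"participant_id": 1, "age_at_survey": 7, "height_cm": 122.0},
    {"participant_id": 1, "age_at_survey": 11, "height_cm": 144.5},
    {"participant_id": 2, "age_at_survey": 7, "height_cm": 119.5},
    {"participant_id": 2, "age_at_survey": 11, "height_cm": 140.0},
]

# Metadata: data that describes the variables above (illustrative only).
metadata = {
    "participant_id": {"description": "Pseudonymous participant key", "type": "int"},
    "age_at_survey": {"description": "Age when surveyed", "unit": "years", "type": "int"},
    "height_cm": {"description": "Standing height", "unit": "centimetres", "type": "float"},
}


def change_over_time(rows, variable):
    """Return, per participant, the latest value minus the earliest value."""
    by_person = {}
    # Sort by age so each participant's values are in time order.
    for row in sorted(rows, key=lambda r: r["age_at_survey"]):
        by_person.setdefault(row["participant_id"], []).append(row[variable])
    return {pid: values[-1] - values[0] for pid, values in by_person.items()}


growth = change_over_time(records, "height_cm")
print(growth)  # {1: 22.5, 2: 20.5}
```

Because the same participants are measured at more than one age, the table is longitudinal, and the change in a variable can be computed per person rather than only across the group.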