4.11.1 - Big Data [AQA]

In this section, you will learn that big data is a catch-all term for data that is too large or complex to fit the usual containers, such as a single server or a relational database.

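We describe big data by using the 3 V's:

Volume: there is too much data to fit on a single server.
Velocity: the data is generated and captured at breakneck speed, often as a continuous stream.
Variety: the data comes in many forms, much of it unstructured.

Big data is therefore an enormous quantity of data, generated and captured at breakneck speed. Systems that handle it must store and process massive amounts of data at extremely high speeds.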

The volume of data in the world is now measured in zettabytes (ZB). One zettabyte is 10^21 bytes, or 1 trillion GB.
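As a quick check of that conversion, here is a small Python sketch. It assumes decimal (SI) prefixes, so 1 GB is taken as 10^9 bytes; the constant names are only for illustration.

```python
# Decimal (SI) prefixes: each step up (kilo, mega, giga, ...) is a factor of 1000.
GIGABYTE = 10**9    # bytes in 1 GB
ZETTABYTE = 10**21  # bytes in 1 ZB

gigabytes_per_zettabyte = ZETTABYTE // GIGABYTE

print(gigabytes_per_zettabyte)            # 1000000000000
print(gigabytes_per_zettabyte == 10**12)  # True, i.e. 1 trillion GB
```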


The Internet of Things

We call any device with sensors, processing ability and other technologies that let it exchange data with other devices over a network part of the Internet of Things (IoT). The IoT is enabled by automation and embedded systems, and it powers ubiquitous computing (where computing systems appear anywhere and everywhere, not just on your computer) and machine learning.

Structured and unstructured data

We traditionally organise data in relational databases, which accept data based on a predetermined model with tables, fields and records. Because this data follows a predefined format, we consider it structured data.
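To make the idea of a predetermined model concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table name, field names and values are made up for illustration; the point is only that a record must fit the fields defined in advance.

```python
import sqlite3

# An in-memory database, so nothing is written to disk.
conn = sqlite3.connect(":memory:")

# The table definition is the predetermined model:
# every record must supply a name and a year.
conn.execute(
    "CREATE TABLE student ("
    "id INTEGER PRIMARY KEY, name TEXT NOT NULL, year INTEGER NOT NULL)"
)
conn.execute(
    "INSERT INTO student (id, name, year) VALUES (?, ?, ?)", (1, "Asha", 12)
)

rows = conn.execute("SELECT id, name, year FROM student").fetchall()
print(rows)  # [(1, 'Asha', 12)]

# A record that does not fit the model (no name) is rejected.
try:
    conn.execute("INSERT INTO student (id, year) VALUES (?, ?)", (2, 13))
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
print(rejected)  # True

conn.close()
```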

Semi-structured data is not as predictable as structured data, but contains elements that can be used to find its underlying structure. For example, a .csv file is a plain text file that uses a delimiter (a character marking the boundary between data values; in CSV this is usually a comma) to separate the different data values.
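As an illustration, the sketch below reads a small piece of CSV text with Python's built-in csv module. The field names and values are invented; notice that the structure is recovered from the delimiters and the header line rather than from a model defined in advance.

```python
import csv
import io

# Hypothetical CSV text: the first line names the fields,
# and commas mark the boundaries between values.
text = "name,city,age\nAsha,Leeds,17\nBen,York,18\n"

reader = csv.DictReader(io.StringIO(text))
records = list(reader)

print(reader.fieldnames)   # ['name', 'city', 'age']
print(records[0]["city"])  # Leeds
print(records[1]["age"])   # 18 (read as text, since CSV carries no data types)
```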

Unstructured data has no predefined format and cannot be turned into distinct components before it is received. It is more difficult to collect, process and analyse than structured/semi-structured data.

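Big data is mostly unstructured, so it cannot comply with the row-column structure of a relational database. It is also too large for one server, so its storage and processing must be distributed across more than one machine. Functional programming is a solution here, because it makes it easier to write correct and efficient distributed code.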
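One way to picture processing spread across machines is a map and reduce pattern written in a functional style. The sketch below is only an illustration that runs on a single computer: each inner list stands in for the data held on one machine, and the function names and data are made up.

```python
from functools import reduce

# Each chunk stands in for the share of data held on one machine.
chunks = [
    ["sensor", "video", "sensor"],
    ["video", "text"],
    ["sensor", "text", "text"],
]

def count_words(chunk):
    """Map step: count the words in one chunk, using no shared state."""
    counts = {}
    for word in chunk:
        counts[word] = counts.get(word, 0) + 1
    return counts

def merge_counts(a, b):
    """Reduce step: combine two partial results into a new dictionary."""
    merged = dict(a)  # copy, so neither input is changed
    for word, n in b.items():
        merged[word] = merged.get(word, 0) + n
    return merged

partial_counts = list(map(count_words, chunks))
totals = reduce(merge_counts, partial_counts, {})

print(totals)  # {'sensor': 3, 'video': 2, 'text': 3}
```

Because count_words depends only on the chunk it is given, the chunks could in principle be counted on different machines in any order, with the partial results merged afterwards.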



© 2023 A Level Studies Neocities