

Definition of Big Data

A simple definition of big data would be the storage and analysis of large or complex data sets. While this is a simplistic and narrow definition, two important elements can be identified from it: size and complexity. If you change size into volume, complexity into variety, and add a third element, velocity, you get what are called the three Vs. The three Vs are a good way to define the characteristics of big data.

Data volume is the most obvious attribute of big data. Big data is often seen as large datasets comprising multiple terabytes, or even petabytes, of data. Data volume can also be quantified by the number of records or files used, or even by the amount of time taken to collect the data.
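As a rough illustration of the three ways of quantifying volume mentioned above (size, number of records, and collection time), here is a minimal Python sketch. The record layout, the `timestamp` field, and the names `VolumeSummary` and `summarize_volume` are assumptions made for this example only. The byte count is an approximation based on each record's JSON encoding.

```python
import json
from dataclasses import dataclass
from datetime import datetime


@dataclass
class VolumeSummary:
    total_bytes: int           # approximate size of the serialized records
    record_count: int          # number of records in the data set
    collection_seconds: float  # time between the first and last record


def summarize_volume(records):
    """Summarize the volume of a list of dict records.

    Assumes (hypothetically) that every record carries an ISO 8601
    'timestamp' string; real data sets will differ.
    """
    if not records:
        return VolumeSummary(0, 0, 0.0)
    # Size: bytes of each record when encoded as UTF-8 JSON.
    total_bytes = sum(len(json.dumps(r).encode("utf-8")) for r in records)
    # Collection time: span between the earliest and latest timestamp.
    times = [datetime.fromisoformat(r["timestamp"]) for r in records]
    span = (max(times) - min(times)).total_seconds()
    return VolumeSummary(total_bytes, len(records), span)


# Made-up sample data, for illustration only.
sample = [
    {"timestamp": "2024-01-01T00:00:00", "user": "a", "action": "click"},
    {"timestamp": "2024-01-01T00:00:30", "user": "b", "action": "view"},
    {"timestamp": "2024-01-01T00:02:00", "user": "a", "action": "buy"},
]
summary = summarize_volume(sample)
print(summary.record_count, summary.collection_seconds)
```

The same data set can look small by one measure and large by another, which is why volume is usually described with more than one number.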

The second attribute that defines big data is variety. Part of what makes big data so big is that it comes from many different sources. Information from different sources usually arrives in different forms, especially when it comes from the Internet, and this can make the data very difficult to analyze using traditional methods.

The third attribute is velocity, which describes the speed at which data is generated and delivered. For example, data never stops coming from the web, and making sense of it all can prove to be a daunting task.

Using these attributes, big data can be defined as data that is too big, too fast, and too hard to analyze using traditional tools and processes.

Ethics of big data

There are many ethical questions that can be raised concerning the use of big data. For example, the risk of losing our privacy is greater than ever before now that so much of our information is collected on the Internet. Everything from our social relationships to our shopping habits is collected by websites such as Facebook and eBay. In many cases it is hard to follow how our information is being used and whether we have any control over it. Even if we give consent for our information to be used in one way, we can never be sure that it will not be used in another way in the future. It is also nearly impossible to know where exactly our information ends up once we give it away online. The data might be sold, or even stolen, and there is always a risk that it ends up in dubious hands. We must also ask who is accountable when information falls into the wrong hands and gets people into trouble.

Another important question involves the use of big data to make predictions. With the amount of data increasing at an alarming rate, more and more accurate predictions can be made by analyzing it. But is it just for a person to be penalized based on these predictions? For example, if we suspect that a person might commit a crime, can we punish them based on the possibility alone? The question goes further than law enforcement as well, for example to insurance companies determining who can be insured and to what extent. Predictions could also be used in health care to determine which patients receive treatment.

We must also be able to question the reliability of the data and the judgement of the people making predictions based on it. Data can always be biased or false, and analysts can always make mistakes or simply lie. We must not rely too blindly on raw data, or we will fall victim to its shortcomings.

Exam questions

1. How is big data changing the way organizations are operated?

Why this question: Big data is clearly changing how successful organizations are operated. This question makes you think about how big data works, how it is used in general terms, and how it has changed the ways an organization can achieve success.

2. Are there differences in the ways big and small organizations utilize big data?

Why this question: There are countless ways to use big data and profit from it, but not all organizations can use it in the same ways. In most cases, big and powerful organizations can gather more data and have more efficient methods of analyzing it. This question makes you think about the different ways big data is used depending on the size of the organization and the resources available to it.

3. Are the risks involved with big data unavoidable or are there ways to minimize them?

Why this question: The risks involved with big data are an important element of the course materials, and everyone should be aware of the ethical issues big data introduces. This question makes you think about the risks from a different perspective and consider just how strong their impact is really going to be.

4. Why is it important to embrace your doomsday scenario?

Why this question: Embracing the doomsday scenario is an important step in the book The New Killer Apps. This question tests your knowledge of the book, especially the chapter about doomsday scenarios.

Presentation slides