Definition of Big data

Big data is data that exceeds the processing capacity of traditional databases: it is too large to be processed by a single machine, so new methods are required to store and process it. The term can also refer to methods that yield new kinds of insight that traditional small-data methods could not provide. At the heart of these methods is correlation-based thinking.


Gupta, Richa, Sunny Gupta, and Anuradha Singhal. “Big Data: Overview.” arXiv preprint arXiv:1404.4136 (2014).

Jacobs, Adam. “The pathologies of big data.” Communications of the ACM 52.8 (2009): 36-44.

Cuzzocrea, Alfredo, Il-Yeol Song, and Karen C. Davis. “Analytics over large-scale multidimensional data: the big data revolution!.” Proceedings of the ACM 14th international workshop on Data Warehousing and OLAP. ACM, 2011.

Ethics of big data

Big data raises numerous ethical problems:

Data collection. Service providers collect large amounts of information about customers, who have no control over it. Collection is done in secrecy, so no one knows what is actually collected.

Privacy. Big data methods make it possible to extract sensitive information from bulk data, which could lead to serious breaches of privacy. Perfect anonymization is not possible either: even after personal data is removed, the remaining attributes can be enough to single out an individual.

Predictions based on big data. For example, big data could make it possible to detect potential criminals even before they commit crimes. However, it is worth noting that big data predictions are based on correlation and indirect evidence, and no prediction is ever completely accurate. Basic human rights also protect individuals from prosecution before they have done anything wrong. These kinds of predictive methods should be used with extreme care, and actions based on their predictions should be advisory only.
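The anonymization point under Privacy can be illustrated with a small linkage sketch. The example below is only an illustration under stated assumptions: all records, names, and field names (zip, birth_year, sex) are invented, and the choice of quasi-identifiers is hypothetical. It shows how a dataset with names removed might still be matched against a second, public dataset.

```python
# Minimal sketch of a linkage (re-identification) attack.
# All records, names, and field names are invented for illustration.

# "Anonymized" dataset: names removed, other attributes kept.
anonymized_health = [
    {"zip": "00100", "birth_year": 1980, "sex": "F", "diagnosis": "asthma"},
    {"zip": "00100", "birth_year": 1975, "sex": "M", "diagnosis": "diabetes"},
    {"zip": "00200", "birth_year": 1980, "sex": "F", "diagnosis": "migraine"},
]

# Hypothetical public dataset that contains names.
public_register = [
    {"name": "Alice Example", "zip": "00100", "birth_year": 1980, "sex": "F"},
    {"name": "Bob Example", "zip": "00100", "birth_year": 1975, "sex": "M"},
    {"name": "Carol Example", "zip": "00200", "birth_year": 1980, "sex": "F"},
    {"name": "Dana Example", "zip": "00200", "birth_year": 1980, "sex": "F"},
]

# Attributes assumed to be shared by both datasets.
QUASI_IDENTIFIERS = ("zip", "birth_year", "sex")


def key(record):
    """Return the quasi-identifier values of a record as a tuple."""
    return tuple(record[field] for field in QUASI_IDENTIFIERS)


def reidentify(anonymized, register):
    """Map a name to a diagnosis wherever the quasi-identifiers
    match exactly one person in the register."""
    names_by_key = {}
    for person in register:
        names_by_key.setdefault(key(person), []).append(person["name"])

    matches = {}
    for row in anonymized:
        candidates = names_by_key.get(key(row), [])
        if len(candidates) == 1:  # unique match: the row is re-identified
            matches[candidates[0]] = row["diagnosis"]
    return matches


reidentified = reidentify(anonymized_health, public_register)
print(reidentified)
```

In this made-up data the first two rows each match exactly one person in the register, while the third row matches two people and stays ambiguous. Coarsening the shared attributes makes unique matches less likely but does not rule them out.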

Four exam questions

1. Critical thinking: What are the key aspects to take into account when assessing the credibility of an argument?

This question goes to the heart of critical thinking. It summarizes the most important aspects of the online course, and it is something every student should be able to answer.

2. In your opinion, why are big data and open data such significant concepts?

Makes the student review the essential aspects of both concepts.

3. What kinds of privacy solutions are there for big data and open data?

A really important question without a perfect answer. The books provide solutions, but the answers are spread throughout their pages, so the question checks whether the student has read them. It also requires creative thinking. The two concepts are combined in one question because they overlap.

4. Give an example of a scenario where open data could be used for profit. Also describe how profit would be generated.

The open data book was largely centered on real-world business examples. The practical side is at least as important as the more theoretical parts.