


Definition of Big data

Big data is data that exceeds the processing capacity of traditional databases: it is too large to be processed by a single machine, so new methods are required to store and process it. The term can also refer to methods that yield kinds of insight that traditional small-data methods cannot provide. At the heart of these methods is correlation-based thinking.

References:

Gupta, Richa, Sunny Gupta, and Anuradha Singhal. “Big Data: Overview.” arXiv preprint arXiv:1404.4136 (2014).

Jacobs, Adam. “The pathologies of big data.” Communications of the ACM 52.8 (2009): 36-44.

Cuzzocrea, Alfredo, Il-Yeol Song, and Karen C. Davis. “Analytics over large-scale multidimensional data: the big data revolution!.” Proceedings of the ACM 14th international workshop on Data Warehousing and OLAP. ACM, 2011.

Ethics of big data:

Big data raises numerous ethical problems:

Data collection. Service providers collect large amounts of information about their customers, who have no control over it. Collection often happens in secret, so no one outside the provider knows what is actually collected.

Privacy. Big data methods make it possible to extract sensitive information from bulk data, which could lead to serious breaches of privacy. Note also that perfect anonymization is not possible, even when directly identifying personal data is removed.
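The anonymization point can be illustrated with a minimal sketch of a linkage attack. Everything in it is hypothetical: the records, the field names (zip, birth_year, gender, diagnosis) and the helper reidentify are invented for illustration and do not come from any real dataset or library. It assumes an attacker holds a public register that shares a few attributes with the "anonymized" data.

```python
# Hypothetical toy data: names were removed, but ZIP code,
# birth year and gender were left in the released records.
anonymized = [
    {"zip": "00100", "birth_year": 1980, "gender": "F", "diagnosis": "diabetes"},
    {"zip": "00100", "birth_year": 1975, "gender": "M", "diagnosis": "asthma"},
    {"zip": "00200", "birth_year": 1990, "gender": "F", "diagnosis": "depression"},
]

# Hypothetical public register that contains names and the same attributes.
public_register = [
    {"name": "Alice", "zip": "00100", "birth_year": 1980, "gender": "F"},
    {"name": "Bob", "zip": "00100", "birth_year": 1975, "gender": "M"},
    {"name": "Carol", "zip": "00200", "birth_year": 1990, "gender": "F"},
    {"name": "Dana", "zip": "00200", "birth_year": 1990, "gender": "F"},
]

# Attributes that are not identifying alone but can be in combination.
QUASI_IDENTIFIERS = ("zip", "birth_year", "gender")


def key(record):
    """Return the combination of quasi-identifier values for a record."""
    return tuple(record[field] for field in QUASI_IDENTIFIERS)


def reidentify(anonymized_records, register):
    """Link records to names when the quasi-identifiers match exactly one person."""
    names_by_key = {}
    for person in register:
        names_by_key.setdefault(key(person), []).append(person["name"])

    linked = {}
    for record in anonymized_records:
        candidates = names_by_key.get(key(record), [])
        if len(candidates) == 1:  # a unique match means re-identification
            linked[candidates[0]] = record["diagnosis"]
    return linked


matches = reidentify(anonymized, public_register)
print(matches)
```

In this toy data two of the three records link to exactly one name, while the third stays ambiguous because two people in the register share the same attribute combination. Real datasets differ, but the sketch shows why removing names alone does not guarantee anonymity.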

Predictions based on big data. For example, big data could make it possible to detect potential criminals even before they commit crimes. However, it is worth noting that big data predictions are based on correlation and indirect evidence, and no prediction is ever completely accurate. Basic human rights also protect individuals from prosecution before they have done anything wrong. Predictive methods of this kind should be used with extreme care, and actions based on their predictions should be advisory only.
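A short calculation shows why such predictions call for caution. The numbers below are assumptions chosen purely for illustration, not measurements: 1 person in 1000 is a true positive case, the predictor flags 99% of those, and it wrongly flags 1% of everyone else. The function positive_predictive_value is a hypothetical helper that applies Bayes' rule.

```python
def positive_predictive_value(base_rate, sensitivity, false_positive_rate):
    """Probability that a flagged person is a true case (Bayes' rule)."""
    true_positives = base_rate * sensitivity
    false_positives = (1 - base_rate) * false_positive_rate
    return true_positives / (true_positives + false_positives)


# Assumed illustrative values, not real statistics.
ppv = positive_predictive_value(
    base_rate=0.001,           # 1 in 1000 people is a true case
    sensitivity=0.99,          # 99% of true cases are flagged
    false_positive_rate=0.01,  # 1% of everyone else is flagged too
)
print(f"Share of flagged people who are true cases: {ppv:.1%}")
```

Under these assumptions only about 9% of the flagged people are true cases, so roughly ten out of eleven flags would point at someone who has done nothing. This is why a prediction that looks highly accurate can still be a poor basis for action against an individual.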