My Big data definition

“Big data is an amount of data whose size exceeds the ability of commonly used databases to capture, display, store, maintain and process it in a reasonable time.”

The main reason I chose this definition is that if regular databases were sufficient, there would have been no need to develop technologies and principles such as MapReduce and Hadoop. These technologies are clearly regarded as big data technologies.

I would also add that big data often involves unstructured or semi-structured data, or data that is difficult to structure.

With proper analysis, big data also lets us find patterns, anomalies, or new structures we were not aware of. We can derive new information from them, and they can help us understand how more complex things work.


Zheng, Z., Zhu, J., Lyu, M., Service-generated Big Data and Big Data-as-a-Service: An Overview, IEEE congress 2013

Ethics in big data

If we consider only data that refers to people and their activities, there are some ethical issues that need to be addressed:

  • Who is collecting the data, and for what reasons?
  • Does this party have permission to store this data about specific people?
  • Is the data stored securely, so that no other party can obtain it without authorization?
  • Will the data be sold or given to other institutions?
  • Who has access to the data within the company that collects it?
  • Could the data be misused?
  • Does legislation state what is allowed and prohibited when collecting and processing this data?

Personal data collection should be done transparently and, if possible, the data subject should have some control over it. In many cases data should not be collected at all if doing so does not benefit the subject. There are many ethical issues concerning big data, but because of its great benefits they are often ignored.

Nicole Laskowski, 2013

Ethics of Big Data by Kord Davis with Doug Patterson 2012, O'Reilly

My presentation


Exam questions

1) Where do you see big data in the future? ⇒ The aim is to find out whether the student appreciates the importance of this topic and of the outputs of big data analysis, and how the student perceives the position of big data in the future.

2) Explain the difference between correlation and causation, and relate it to big data.

⇒ The student should understand the difference between these two terms and be aware that a discovered correlation does not necessarily mean that one thing causes the other.
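A possible illustration for this question is the small Python sketch below. It uses made-up, simulated data (the variable names, coefficients and noise levels are all assumptions chosen for illustration, not real measurements): temperature drives both ice cream sales and sunburn cases, so the two are strongly correlated although neither causes the other. Once the known temperature effect is subtracted, the correlation largely disappears.

```python
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical confounder: daily temperature in degrees Celsius.
temperature = [rng.uniform(0, 35) for _ in range(1000)]

# Both quantities depend on temperature plus independent noise;
# neither one influences the other.
ice_cream_sales = [2.0 * t + rng.gauss(0, 5) for t in temperature]
sunburn_cases = [0.5 * t + rng.gauss(0, 2) for t in temperature]

# Strong correlation in the raw data.
r_raw = pearson(ice_cream_sales, sunburn_cases)

# Subtract the (here known) temperature effect and correlate what is left.
ice_residual = [s - 2.0 * t for s, t in zip(ice_cream_sales, temperature)]
burn_residual = [c - 0.5 * t for c, t in zip(sunburn_cases, temperature)]
r_controlled = pearson(ice_residual, burn_residual)

print(f"raw correlation:        {r_raw:.2f}")
print(f"after removing weather: {r_controlled:.2f}")
```

In a real big data analysis the confounder is usually not known in advance, which is exactly why a correlation found in a large data set should be treated as a hint to investigate rather than as proof of a causal link.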

3) Where would you draw the line when publishing data, considering which data contains private information and should not be published, and which can be published without privacy issues? ⇒ The student should think about which data is acceptable to publish and which already violates privacy.

4) Explain the advantages and disadvantages of providing open data at the scale of a country's national economy.

⇒ The student should show awareness of the economic benefits that come from providing such data. In particular, new companies can be built on this accessible data, processing it and selling it with added value. The processed data can also give us useful information that makes our behaviour more efficient (avoiding traffic jams, using public transport more efficiently, planning weather-dependent activities better thanks to more accurate forecasts, etc.).

Extra homework for missed lecture no.3

Grading of other students' presentations

Grading on a 5-point scale (1 = minimum, 5 = maximum)

Big Data

  • Ch1 - Vero - 4
  • Ch2 - Ibrahim - 3
  • Ch3 - Mansooreh - 3
  • Ch4 - Marc - 5
  • Ch5 - Masood - 4
  • Ch6 - Elen - 4
  • Ch7 - Vitor - 3
  • Ch8 - Me
  • Ch9 - Manuel - 4
  • Ch10 - Michal - 4


  • Ch1 - Manuel - 4
  • Ch2 - Vero - 4