Definition of Big Data

Richard Cumbley, Peter Church - Is “Big Data” creepy?

Definition under “What do we mean by Big Data?”:

  • Big Data refers to large amounts of unstructured data, with sizes ranging from terabytes to exabytes.
  • Usually unstructured or semi-structured
  • Not just large strings; Big Data can also come in the form of video or audio.
    • “To give an example, a YouTube video of the Harlem Shake may be many megabytes in size but much of it is just noise to a machine.”
  • Usefulness of the Big Data for companies (or individuals) is debatable
  • Still a lot of limitations

The article itself discusses the security and privacy implications of Big Data. The whole lifecycle of Big Data (collection, combination, analysis and use) presents risks to individual privacy. However, I think this is outside the scope of this assignment for now.

Jules J. Berman, Ph.D., M.D. - Principles of Big Data: Preparing, Sharing, and Analyzing Complex Information

Quotation from the book's Introduction chapter:

Big Data is defined by the three V’s:

  1. Volume: large amounts of data
  2. Variety: the data comes in different forms, including traditional databases, images, documents, and complex records
  3. Velocity: the content of the data is constantly changing, through the absorption of complementary data collections, through the introduction of previously archived data or legacy collections, and from streamed data arriving from multiple sources

“It is important to distinguish Big Data from “lotsa data” or “massive data.” In a Big Data Resource, all three V’s must apply. It is the size, complexity, and restlessness of Big Data resources that account for the methods by which these resources are designed, operated, and analyzed.”
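
Berman's rule that all three V's must apply can be read as a simple conjunction. The following Python sketch is only an illustration: the `DataResource` class, its field names and the boolean flags are hypothetical, and since the quoted passage gives no numeric thresholds, none are assumed here.

```python
from dataclasses import dataclass


@dataclass
class DataResource:
    """Hypothetical description of a data resource (not from the book)."""
    name: str
    has_volume: bool    # large amounts of data
    has_variety: bool   # many forms: databases, images, documents, ...
    has_velocity: bool  # content is constantly changing


def is_big_data(resource: DataResource) -> bool:
    """Return True only if all three V's apply to the resource."""
    return resource.has_volume and resource.has_variety and resource.has_velocity


# "Lotsa data": large, but uniform and static, so not Big Data.
archive = DataResource("static log archive", True, False, False)
# Large, varied and constantly changing.
feed = DataResource("multi-source media feed", True, True, True)

print(is_big_data(archive))  # False
print(is_big_data(feed))     # True
```

The point of the sketch is the `and`: a resource that is merely large fails the test, which matches the distinction between Big Data and "lotsa data".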

Ibrahim Abaker Targio Hashem, Ibrar Yaqoob, Nor Badrul Anuar, Salimah Mokhtar, Abdullah Gani, Samee Ullah Khan - The rise of “big data” on cloud computing: Review and open research issues

Quotation from the article:

“Big data is a term utilized to refer to the increase in the volume of data that are difficult to store, process, and analyze through traditional database technologies. The nature of big data is indistinct and involves considerable processes to identify and translate the data into new insights.”

Ethics in big data

Big data poses many ethical issues and threatens people's privacy. Using big data analysis, companies can gather, buy and sell data that was never meant to be discovered by an individual person. It is an open question which results of big data analysis may be shared, sold or bought, and how relevant and truthful those results are.

IBM Distinguished Engineer Mandy Chessell has proposed a framework that companies can use to determine whether their analyses are ethical:

  • Context
    • For what purpose was the data originally surrendered?
    • For what purpose is the data now being used?
    • How far removed from the original context is its new use?
    • Is this appropriate?
  • Consent & Choice
    • What are the choices given to an affected party?
    • Do they know they are making a choice?
    • Do they really understand what they are agreeing to?
    • Do they really have an opportunity to decline?
    • What alternatives are offered?
  • Reasonable
    • Is the depth and breadth of the data used and the relationships derived reasonable for the application it is used for?
  • Substantiated
    • Are the sources of data used appropriate, authoritative, complete and timely for the application?
  • Owned
    • Who owns the resulting insight?
    • What are their responsibilities towards it in terms of its protection and the obligation to act?
  • Fair
    • How equitable are the results of the application to all parties?
    • Is everyone properly compensated?
  • Considered
    • What are the consequences of the data collection and analysis?
  • Access
    • What access to data is given to the data subject?
  • Accountable
    • How are mistakes and unintended consequences detected and repaired?
    • Can the interested parties check the results that affect them?
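
One way a company might work through these questions is as a simple checklist. The Python sketch below is an illustration only and not part of Chessell's framework: the `FRAMEWORK` dictionary holds a few of the questions listed above, while the `open_questions` helper and the shape of the `answers` mapping are hypothetical.

```python
# A subset of the framework's questions, keyed by principle.
FRAMEWORK = {
    "Context": [
        "For what purpose was the data originally surrendered?",
        "For what purpose is the data now being used?",
    ],
    "Access": [
        "What access to data is given to the data subject?",
    ],
    "Accountable": [
        "How are mistakes and unintended consequences detected and repaired?",
        "Can the interested parties check the results that affect them?",
    ],
}


def open_questions(answers: dict[str, str]) -> list[tuple[str, str]]:
    """Return (principle, question) pairs that have no non-empty answer yet."""
    return [
        (principle, question)
        for principle, questions in FRAMEWORK.items()
        for question in questions
        if not answers.get(question, "").strip()
    ]


# Hypothetical example answers for one analysis project.
answers = {
    "For what purpose was the data originally surrendered?": "Order delivery",
    "What access to data is given to the data subject?": "Self-service export",
}

for principle, question in open_questions(answers):
    print(f"[{principle}] {question}")
```

The helper only tracks which questions are still unanswered; judging whether an answer is ethically acceptable remains a human decision.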

Whether something is ethical always depends on how one defines what is ethical. One person might find something unethical that another does not. This is why companies should ask themselves these questions and determine what is ethical for them, their customers and their stakeholders.

Ethics for big data can be tricky to pin down, but there is at least one easy way to determine what must not be done: read the applicable local and global laws. Even more than ethical guidelines, laws must be obeyed by every company. Laws should be written so that people's privacy is protected from ever-developing data analysis techniques, while still allowing the power of big data to be harnessed.


Exam questions

Big data

Question 1: What are the characteristics of big data?

  • Would show whether the person has an idea of what big data is

Question 2: What are the privacy and security challenges in Big Data?

  • An important question about deploying big data applications, and an interesting topic in my opinion

New killer apps

Question 3: Describe the main steps for creating killer apps.

  • Because this is the basis of the book

Question 4: Explain the differences between start-up and large-company innovation processes (including risks and challenges).

  • These differ quite a bit, and it's good to know both processes (the risks and challenges at least)