Administrator, October 20, 2020 at 10:12 pm
Nowadays everyone is talking about big data. But what does it really mean? What makes it different from traditional data?
Here is the widely used definition of big data:
“Big data is a term that describes the large volume of data – both structured and unstructured – that inundates a business on a day-to-day basis.”
Please note the two key words here: large and unstructured.
Large means big volume: thousands of variables or billions of records. A single input dataset can exceed 100 GB. The biggest I have processed at work was 80 GB in a single input file.
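To give a feel for how a file that size can be handled, here is a minimal Python sketch that reads a CSV one row at a time instead of loading it all into memory. It is only an illustration, not the tool I used at work: the function name `streaming_mean`, the column name `amount` and the tiny in-memory sample are all made up for the example.

```python
import csv
import io

def streaming_mean(lines, column):
    """Average one numeric column while holding only one row in memory."""
    reader = csv.DictReader(lines)
    total = 0.0
    count = 0
    for row in reader:  # rows are read lazily, one at a time
        total += float(row[column])
        count += 1
    return total / count if count else float("nan")

# Tiny stand-in for a huge file (hypothetical data).
# With a real file, pass open(path, newline="") instead of the StringIO.
sample = io.StringIO("id,amount\n1,10.0\n2,20.0\n3,30.0\n")
print(streaming_mean(sample, "amount"))  # prints 20.0
```

Because only the running total and count are kept, memory use stays flat no matter how many rows the file has.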
More importantly, big data is not only about volume; it also refers to complexity, because much of the data is unstructured. This is a big difference from traditional data.
Structured data is like an Excel table: all the data is arranged in rows and columns, so it is always two-dimensional.
Unstructured data has no such structure. For example, it could be Twitter or Facebook discussions about an event such as COVID-19 or the US election. The data is text, and analyzing it is called text mining.
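As a rough idea of what the simplest kind of text mining looks like, the Python sketch below counts the most frequent terms across a few posts. The posts are invented for illustration, and the function name `top_terms` and its small stopword list are my own assumptions; real projects usually rely on dedicated text-processing libraries.

```python
import re
from collections import Counter

def top_terms(posts, n=3,
              stopwords=frozenset({"the", "a", "is", "of", "and", "to", "in"})):
    """Return the n most frequent non-stopword terms across all posts."""
    counts = Counter()
    for post in posts:
        # Lowercase, then keep runs of letters, digits, apostrophes and hyphens
        words = re.findall(r"[a-z0-9'-]+", post.lower())
        counts.update(w for w in words if w not in stopwords)
    return counts.most_common(n)

# Invented example posts (not real data)
posts = [
    "COVID-19 cases rise again",
    "New COVID-19 vaccine trial starts",
    "Election results and COVID-19 news",
]
print(top_terms(posts, n=1))  # prints [('covid-19', 3)]
```

The free text has no rows or columns to start with; the counting step is what turns it into something structured that can be analyzed further.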
Unstructured data could also be a picture, an audio file or a video file. Analyzing such data supports applications such as image recognition, text generation, speech recognition, self-driving and language translation. Analysis of unstructured data is therefore a distinct difference between big data analytics and traditional analytics, and it is becoming more and more important and prevalent.
Generally speaking, big data has the following characteristics:
Volume: large amounts of data, collected from various sources.
Velocity: high-speed data streams that must be handled in a timely manner, often in near real time.
Variety: all types of formats – from structured, numeric data in traditional databases to unstructured text documents, email, video, audio, stock ticker data and financial transactions.
Veracity: quality and value of data.
These 4 Vs are the distinctive features of current big data analytics. Keep them in mind; some interviewers may ask about them.