Data mining is a term we have all heard and tried to understand, but we rarely get a clear explanation of it, right? Read along to get to grips with data mining; you may be a little scared once you know what it is. What you'll know by the end of this blog:
· Definition
· How Google and Facebook fetch your data
· What are Cookies
· What are Tracking Cookies and DeepFace
· 4 steps of Data mining
· Data Mining Techniques
· Software and Tools
· Pros and Cons
Data is “raw facts and figures”, while mining is “extraction”, as in gold mining. So we can say that data mining is the process of finding anomalies, patterns and correlations within large data sets to predict outcomes, using a broad range of techniques.
Let's take a look at an example to understand it better (Note: you can try it as well). First, open Facebook and scroll through without opening anything. Now open something; let's say you opened an e-store and looked through its mobile phones. Don't tap on any phone, just close Facebook and type the company name of the phone you just saw there. You'll be surprised to see the name of the very phone you were scrolling past appear, as illustrated in the video below.
How does that happen?
Google uses IP addresses and cookies to fetch your data. Wondering what an IP address is and what cookies are?
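IP address: The Internet Protocol (IP) is the protocol your device uses to communicate over the internet, and an IP address is the number that identifies your device or network on it. An IPv4 address is usually written in the form 192.168.10.0. (That particular range is reserved for private home and office networks; the address websites see is your public one.) The address itself does not store anything about you, but geolocation databases can usually map a public IP address to your:
· City
· Zip code/area code
· ISP (Internet Service Provider) name
So it reveals your location, not exactly, but approximately for sure. Want to know your IP address? Any "what is my IP" website will show it.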
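To make the format a little more concrete, here is a minimal sketch using Python's standard `ipaddress` module. It only inspects the address itself and does no location lookup. The helper name `describe` is made up for this example, and 8.8.8.8 is used purely as a well-known public address.

```python
import ipaddress

def describe(address: str) -> dict:
    """Summarise what the address itself reveals, without any network lookup."""
    ip = ipaddress.ip_address(address)
    return {
        "version": ip.version,        # 4 for IPv4, 6 for IPv6
        "is_private": ip.is_private,  # True for home/office ranges such as 192.168.x.x
        "as_integer": int(ip),        # the four dotted numbers are one 32-bit number
    }

home = describe("192.168.10.0")
public = describe("8.8.8.8")
print(home)
print(public)
```

A private address like the one above never reaches a website directly, which is why any city or ISP lookup works on the public address instead.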
Cookies: Cookies are the thing you need to be most aware of. A cookie is a small file that a website stores in your browser to remember things about you, such as the fact that you are logged in or what you did on the site. When you log in to a site, the next thing you'll often see is a form/panel saying “Save Password”, somewhat like this:
That prompt comes from your browser's own password manager, which keeps your username and password on your device; strictly speaking, they are not saved inside the cookie. The cookie usually holds only an identifier that lets the website recognise your browser on the next visit, while your account itself stays in the company's database.
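As a rough illustration of what a cookie looks like on the wire, the sketch below uses Python's standard `http.cookies` module. The cookie name `session_id` and its value are hypothetical; real sites choose their own names and use long random values.

```python
from http.cookies import SimpleCookie

# What a website might send to the browser after a successful login.
# "session_id" and "abc123" are made-up values for illustration only.
outgoing = SimpleCookie()
outgoing["session_id"] = "abc123"
outgoing["session_id"]["path"] = "/"
outgoing["session_id"]["max-age"] = 3600  # ask the browser to keep it for one hour
set_cookie_header = outgoing.output()
print(set_cookie_header)

# What the website reads back when the browser sends the cookie on the next request.
incoming = SimpleCookie()
incoming.load("session_id=abc123")
print(incoming["session_id"].value)
```

Note that the cookie in this sketch carries only an identifier; it is the website that decides what that identifier is linked to on its side.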
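Now we come to Facebook's tracking cookies. You may be wondering what the difference between normal cookies and tracking cookies is. A normal (first-party) cookie is only read by the website that set it, so it can't follow you anywhere else. A tracking cookie is read by the same company across many different websites, so that company can tell which pages you visited and which products you looked at. (Where your mouse hovers on a page is recorded by scripts running on that page rather than by the cookie itself.)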
Recently, we have started receiving Facebook notifications somewhat like this:
One of your friends has uploaded a photo of you. How does Facebook know?
For this purpose, Facebook uses DeepFace (accuracy: 97.35%), which can recognise you even if your face is turned slightly in another direction or the picture is upside down, like this:
Picture (a) is the original, in which the face is turned slightly to the right, but after passing it through DeepFace you can see in picture (g) that it shows the front of his face.
Now we have the 4 steps of Data Mining, which are:
1. Data Gathering
2. Data Preparation
3. Mining the data
4. Data Analysis and interpretation
To understand it better, let's look at the picture below:
The Data Source in the image is the first stage, where the data is gathered. The data is then sent to ETL (Extraction, Transformation, and Loading); this phase also includes error removal and quite a lot more than ETL alone. Next, the data goes to the data warehouse. Inside the warehouse there are smaller Data Marts, which store the most-used data, much like the cache memory in a computer that stores recently used programs/applications. Finally, this data is fetched through an OLAP Server (Online Analytical Processing Server) and used by data mining, reporting, and analysis tools.
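The flow described above can be sketched in a few lines of Python. This is only a toy under stated assumptions: the CSV text, the column names and the `sales` table are all made up, and an in-memory SQLite database stands in for a real data warehouse.

```python
import csv
import io
import sqlite3

# Hypothetical raw export from a "data source"; the columns are made up.
RAW_CSV = """customer,product,price
alice,Phone,300
bob,phone ,250
,Laptop,900
carol,Laptop,not-a-number
"""

def extract(text):
    """Extraction: read rows out of the raw source."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transformation: drop broken rows and normalise the rest."""
    clean = []
    for row in rows:
        customer = row["customer"].strip()
        product = row["product"].strip().lower()
        try:
            price = float(row["price"])
        except ValueError:
            continue  # error removal: skip rows with an unreadable price
        if not customer:
            continue  # error removal: skip rows with no customer
        clean.append((customer, product, price))
    return clean

def load(rows, connection):
    """Loading: store the cleaned rows in the 'warehouse' table."""
    connection.execute(
        "CREATE TABLE IF NOT EXISTS sales (customer TEXT, product TEXT, price REAL)"
    )
    connection.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    connection.commit()

warehouse = sqlite3.connect(":memory:")
load(transform(extract(RAW_CSV)), warehouse)

# An analysis-style query, similar in spirit to what a reporting tool would run.
totals = warehouse.execute(
    "SELECT product, SUM(price) FROM sales GROUP BY product ORDER BY product"
).fetchall()
print(totals)
```

Two of the four raw rows are rejected during transformation, which is the "error removing" part of the phase; only the cleaned rows ever reach the warehouse.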
After the 4 steps of Data Mining, we have the different Data Mining Techniques. Don't worry! We won't go into detail (just the names; a separate blog on them will follow). These are:
· Classification
· Clustering
· Regression
· Neural networks
· Association
· Sequence
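For the curious, here is a tiny taste of just one technique from the list, Association, which looks for items that tend to appear together. It is a simplified sketch, not a full algorithm: the shopping baskets are made up, and the `confidence` helper is a name chosen for this example.

```python
from collections import Counter
from itertools import combinations

# Hypothetical shopping baskets; the items are made up.
baskets = [
    {"phone", "case", "charger"},
    {"phone", "case"},
    {"laptop", "mouse"},
    {"phone", "charger"},
    {"laptop", "mouse", "case"},
]

item_counts = Counter()
pair_counts = Counter()
for basket in baskets:
    item_counts.update(basket)
    # Sort so that each pair is always counted under the same key.
    pair_counts.update(combinations(sorted(basket), 2))

def confidence(if_item, then_item):
    """Share of baskets containing if_item that also contain then_item."""
    pair = tuple(sorted((if_item, then_item)))
    return pair_counts[pair] / item_counts[if_item]

print(pair_counts.most_common(3))
print(confidence("laptop", "mouse"))
print(confidence("case", "phone"))
```

A store could use a number like this to decide which product to suggest next to the one you are viewing.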
Now, there are several software tools with which we do Data Mining (of course, we need something to mine data with, we can't do it in thin air). These tools are:
1. Alteryx
2. Amazon Web Services
3. Databricks
4. DataRobot
5. and there are many others
Last but not least are its Pros and Cons:
Pros:
1. Business Purpose
2. Better Customer Service
Cons:
1. Security
2. Information Misuse
Business Purpose: Companies use data mining to learn their customers' preferences, so that they can work on them and show the user relevant products or even ads, as Google AdSense does. Open any website in your browser and it will show you ads that are relevant to you and appropriate for your country, like here:
Google is showing the PSL ad because of my location.
Better Customer Service: When companies see a problem their customers are facing, like finding a product on a website, they look into it and do what they can to solve it. For example, they made a “Recommended for you” button:
Now, we get to the Cons as well.
This is a YouTube channel's Analytics page. You can see it shows when your viewers are on YouTube, other channels your audience watches, and other videos your audience watched. So how does it know? Sometimes we don't want others to know what we watched or which YouTube channels we watched. So first comes the Security risk.
Information Misuse: 2 years ago, the data of 115 million people in Pakistan was up for sale on the Dark Web.
Now that we have gone through the essentials of Data Mining, let's look at its history.
History: Data mining emerged in the late 1980s and early 1990s as a way to analyze the vast amounts of data that companies all over the world were gathering and producing. The term Data Mining was in use by 1995, when the first international conference was held in Montreal. The event was sponsored by AAAI (Association for the Advancement of Artificial Intelligence). A journal called Data Mining and Knowledge Discovery published its first issue in 1997, and the advancement has continued ever since.