Big Data Essentials
What is Big Data?
Big data is a buzzword, or catch-phrase, used to describe a volume of structured and unstructured data so large that it’s difficult to process using traditional database and software techniques. In most enterprise scenarios the data is too big, moves too fast, or exceeds current processing capacity. Although the term seems to refer to the volume of data, that isn’t always the case. Especially when used by vendors, big data may refer to the technology (tools and processes) that an organisation requires to handle large amounts of data and the storage facilities for it.
The term big data is believed to have originated with Web search companies that had to query very large distributed aggregations of loosely structured data.
An Example of Big Data
An example of big data might be petabytes (1 petabyte = 1,024 terabytes) or exabytes (1 exabyte = 1,024 petabytes) of data consisting of billions to trillions of records about millions of people, all from different sources (e.g. Web, sales, customer contact centres, social media, mobile data and so on). The data is typically loosely structured and is often incomplete and inaccessible.
When dealing with datasets this large, organisations have difficulty creating, manipulating, and managing the data. Big data is a particular problem in business analytics because standard tools and procedures are not designed to search and analyse massive datasets. Big data may also be called enterprise big data.
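The unit arithmetic above can be checked with a short sketch. It assumes the 1,024-based convention used in the text; the names `UNITS`, `to_bytes` and `convert` are illustrative only and are not part of the course material.

```python
# Storage units in ascending order, each 1,024 times the previous one
# (the 1,024-based convention used in the text above).
UNITS = ["B", "KB", "MB", "GB", "TB", "PB", "EB"]


def to_bytes(value, unit):
    """Return the number of bytes in `value` of the given unit."""
    return value * 1024 ** UNITS.index(unit)


def convert(value, from_unit, to_unit):
    """Convert `value` from one unit to another."""
    return to_bytes(value, from_unit) / to_bytes(1, to_unit)


if __name__ == "__main__":
    print(convert(1, "PB", "TB"))  # terabytes in one petabyte
    print(convert(1, "EB", "PB"))  # petabytes in one exabyte
    print(to_bytes(1, "PB"))       # bytes in one petabyte
```

Under this convention one petabyte is 1,024^5 bytes, a little over 10^15.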
What is this course about?
This course is an overview of Big Data foundation knowledge, tools and technologies. It establishes a strong working knowledge of the concepts, techniques and products associated with Big Data. Attendees learn the different storage models, processing approaches and reporting tools available to work with Big Data. They will learn the core functionality of each major Big Data component and how they integrate to form a coherent solution with business benefit. Hands-on exercises aim to provide insight into what the tools do so that their role in a Big Data system can be understood. Exercises are intended to build practical skills in the toolsets at beginner level only.
Why is it becoming important now?
- Rise of smartphones with GPS and internet connectivity
There are 4.6 billion mobile-phone subscriptions worldwide and there are between 1 and 2 billion people accessing the internet.
- Aerial sensors and sensor networks
The NASA Center for Climate Simulation stores 32 petabytes of climate observations and simulations on the Discover supercomputing cluster.
- Social network adoption
Facebook has 1.06 billion monthly active users with 30 billion pieces of content shared on Facebook every month. There are roughly 175 million tweets every day, from more than 465 million accounts.
How BIG is Big Data?
- 2.7 zettabytes of data (in bytes, that’s 27 followed by 20 zeros) exist in the digital universe today.
- By 2025, analysts predict the amount of data will be 150 times what it is today.
- In 2015, 90% of all the data in existence had been created in the previous 2 years.
- Every 2 days we create as much information as we did from the beginning of time up to 2003.
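As a rough check on these magnitudes, the sketch below writes out 2.7 zettabytes (the figure quoted elsewhere on this page) in bytes and applies the 150x growth factor at face value. It assumes the decimal definition of 1 zettabyte = 10^21 bytes; the function names are hypothetical.

```python
from decimal import Decimal

# Decimal (SI) definition of a zettabyte; an assumption for this sketch.
BYTES_PER_ZETTABYTE = 10 ** 21


def zettabytes_to_bytes(zettabytes):
    """Return the exact byte count for a decimal string such as "2.7"."""
    return int(Decimal(zettabytes) * BYTES_PER_ZETTABYTE)


def project(zettabytes, factor):
    """Scale a zettabyte figure by a growth factor."""
    return Decimal(zettabytes) * factor


if __name__ == "__main__":
    total = zettabytes_to_bytes("2.7")
    print(total)                 # the byte count written out in full
    print(len(str(total)) - 2)   # number of digits after the leading "27"
    print(project("2.7", 150))   # projected size in zettabytes
```

On these assumptions, 2.7 zettabytes is 27 followed by 20 zeros in bytes, and a 150-fold increase would give 405 zettabytes.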
How are people using it?
- US and European countries are using Big Data to predict crime before it even happens.
- Google Flu Trends uses search terms to predict the spread of the flu virus.
- Statistician Nate Silver correctly predicted the outcome of the 2012 US presidential election in every individual state.
- Singtel and StarHub are using mobile phone data to establish how people’s locations and traffic patterns can be used for urban planning.
How can we use Big Data to improve people’s lives and day-to-day experiences?
- With 2.7 zettabytes of data in the digital universe today, the richest experiences are created from the broadest data sources.
- Over a million merchants worldwide use Foursquare as a voucher distribution channel. Brands can reward people based on influence rather than just their behaviour.
- By 2035 traffic will be travelling 8% slower due to congestion. Big Data can control traffic flow for “Smarter Cities”.
- At the current rate, one third of UK retail will be done online by 2022. Big Data can bring the intelligence of online shopping into the retail environment.
Course Outline
- Big Data – An Introduction, Key Characteristics
- Traditional Data vs. Big Data
- Business Impact of Big Data, Data Monetisation
- Organisational Impact of Big Data
- Business Intelligence vs. Data Science
- Data Analytics Lifecycle
- Data Scientist – An exploration
- IT Stacks for Big Data – SMAQ: Concepts, Case Study
- Hadoop, MapReduce, NoSQL fundamentals
- Hadoop Conceptual Framework, Technical Components
- Real-world Data, Sentiment Analysis and Visualisation
- An introduction to Hadoop with Hive and Pig
- Process Data with Apache Pig, Apache Hive
- Use HCatalog, Pig & Hive Commands
- Use Basic Pig Commands
- Use MS-Excel to Access/Analyse Hadoop Data
- Visualise Website Clickstream Data
- Refine and Visualise Server Log Data, Sentiment Data, Machine & Sensor Data
- Privacy and Ethics of Big Data
This course is intended for executives, managers, consultants, business analysts, operations personnel, programmers, architects, administrators and data analysts who want a foundational overview of the key components required to understand and analyse Big Data effectively. Familiarity with computers and business applications is assumed. Programming experience is beneficial but not required.
This program does not lead to a certification. Its aim is to provide the information necessary to understand Big Data and how it impacts people and businesses.
A basic working knowledge of MS Excel is expected. A basic knowledge of programming, databases and SQL is helpful but not mandatory.