What is Big Data?

Why Big Data?

Today, an application without data analytics is like a car without a steering wheel. It will go, but there’s no controlling its direction.

Data management has become extremely complex in recent years because data is growing exponentially, and storing and analyzing it is the biggest challenge facing the IT industry. One-third of organizations are experiencing data growth of 25 percent or more annually. Healthcare is one of the fastest-growing segments, growing at 48 percent per year compared with 40 percent for the overall digital universe.
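To make those growth rates concrete, here is a minimal sketch that turns a constant annual growth rate into a doubling time and a projected size. The function names are made up for illustration, and it assumes growth compounds at the same rate every year, which real data growth may not do.

```python
import math


def doubling_time_years(annual_growth_rate: float) -> float:
    """Years needed for a quantity to double at a constant compound annual rate."""
    if annual_growth_rate <= 0:
        raise ValueError("annual_growth_rate must be positive")
    return math.log(2) / math.log(1 + annual_growth_rate)


def projected_size(initial_size: float, annual_growth_rate: float, years: int) -> float:
    """Size after `years` of compound growth, in the same unit as initial_size."""
    return initial_size * (1 + annual_growth_rate) ** years


if __name__ == "__main__":
    # Rates taken from the paragraph above; labels are only for display.
    rates = [
        ("organizations (25%)", 0.25),
        ("overall digital universe (40%)", 0.40),
        ("healthcare (48%)", 0.48),
    ]
    for label, rate in rates:
        years = doubling_time_years(rate)
        print(f"{label}: doubles in about {years:.1f} years")
```

Under this assumption, data growing at 48 percent a year doubles in under two years, while data growing at 25 percent a year takes a little over three.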

Data that meets three criteria is called “Big Data”:
Volume: How much data
Variety: The various types of data
Velocity: How fast data is processed

Volume: Big data implies enormous volumes of data (petabytes), and volume is its most immediate challenge. Data used to be created by employees; now it is generated by humans, networks, and machines interacting on systems such as social media. Handling it requires scalable storage and support for complex, distributed queries across multiple data sources. The big challenge is to identify, locate, analyze, and aggregate particular pieces of data.

Variety: This refers to the many types of data, both structured and unstructured. We used to store data from sources such as databases, text files, and spreadsheets, but data now also comes in the form of photos, videos, PDFs, audio, emails, social media, blogs, web server logs, GPS, RFID, and web content. This variety of unstructured data creates problems for storing, mining, and analyzing data. The standard techniques and technologies we normally use were designed for large volumes of structured data. It is a very big challenge for the industry to process and analyze large volumes of both structured and unstructured data and convert that data into meaningful information.

Velocity: Analytics for a traditional data warehouse tends to be based on periodic (daily, weekly, or monthly) updates of data. In the healthcare industry, in areas such as patient analysis and clinical decision support, it is very important to have real-time or near real-time data in order to eliminate errors and make timely decisions. Automated decision making requires real-time data. You can’t use five-minute-old data for it, because manually reviewing such decisions is expensive and time-consuming.
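The freshness requirement can be sketched as a simple check that runs before any automated decision. This is only an illustration: the five-minute threshold comes from the example above, while the function names, the `recorded_at` field, and the routing labels are hypothetical and not taken from any particular product.

```python
from datetime import datetime, timedelta, timezone

# Assumed threshold: data that is five minutes old or older is considered stale.
MAX_AGE = timedelta(minutes=5)


def is_fresh(recorded_at: datetime, now: datetime, max_age: timedelta = MAX_AGE) -> bool:
    """Return True if the reading is younger than max_age and not from the future."""
    age = now - recorded_at
    return timedelta(0) <= age < max_age


def route_reading(reading: dict, now: datetime) -> str:
    """Send fresh readings to automated decision making, stale ones to manual review."""
    if is_fresh(reading["recorded_at"], now):
        return "automated"
    return "manual_review"


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    recent = {"patient_id": "demo-1", "recorded_at": now - timedelta(seconds=30)}
    stale = {"patient_id": "demo-2", "recorded_at": now - timedelta(minutes=7)}
    print(route_reading(recent, now))
    print(route_reading(stale, now))
```

Passing `now` in explicitly, rather than reading the clock inside the function, keeps the check easy to test and makes the staleness rule visible at the call site.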


How Does Big Data Help?

Many companies have built tools that help the industry deal with high volumes and a wide variety of data. Using these tools, organizations can easily analyze customer data. For example, in the healthcare industry these tools help to incentivize and compensate providers for keeping patients healthy. On the other side, patients are also demanding information about their healthcare options so that they understand their choices and can participate in decisions about their care.


Big Data Tools

IBM Text Analytics: This tool extracts structured information from unstructured and semi-structured documents. Annotation Query Language (AQL) is used to express what to extract from the unstructured documents.

Big R: It helps to analyze, manipulate, and visualize big data. It uses the open source R language to enable rich statistical analysis and predictive modeling.

IBM InfoSphere Streams: An advanced analytics platform that allows users to develop applications quickly and to analyze and correlate information arriving from thousands of real-time sources. It handles very high data throughput rates, up to millions of events or messages per second.

IBM Accelerator: It is used for machine data analytics to import, extract, index, search for patterns in, and analyze significant machine data files.


Apache Spark: An open source big data processing framework. It enables applications in Hadoop clusters to run up to 100 times faster in memory, and up to 10 times faster even when running on disk, than they would with Hadoop MapReduce.
