Ph.D. Projects in Big Data

Big Data refers to the technologies and initiatives that handle data too diverse, fast-changing, or massive for traditional technologies, skills, and infrastructure to address efficiently. In other words, the volume, velocity, and variety of the data are very high. Big Data is not a single technology or initiative; it spans several domains of business and technology. Recently developed technologies make it possible to extract value from Big Data. For instance, governments, and even Google, can track the emergence of disease outbreaks through social media signals. Coal mines can use the output of sensors in their mining equipment to make more efficient and safer mining decisions.

The term also describes the data itself: large, complex data sets that are impractical to manage with traditional software tools. Their size may be measured in petabytes (1024 terabytes) or exabytes (1024 petabytes), comprising trillions of records about millions of people collected from sources such as the web, social media, mobile devices, and customer contact centers. The data is often loosely structured, that is, incomplete and difficult to access.
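To make the scale concrete, the unit arithmetic can be sketched in a few lines of Python. This is only an illustration: it follows the 1024-based convention quoted in the text, and the 1 KiB record size and the helper name records_that_fit are assumptions made for the example, not figures from any real data set.

```python
# Binary unit sizes, following the 1024-based convention used in the text.
TERABYTE = 1024 ** 4          # bytes in one terabyte
PETABYTE = 1024 * TERABYTE    # 1024 terabytes
EXABYTE = 1024 * PETABYTE     # 1024 petabytes

# Hypothetical average record size (1 KiB), chosen only for illustration.
RECORD_BYTES = 1024


def records_that_fit(capacity_bytes: int, record_bytes: int) -> int:
    """Return how many fixed-size records fit in the given capacity."""
    return capacity_bytes // record_bytes


if __name__ == "__main__":
    print(PETABYTE // TERABYTE)                      # 1024
    print(EXABYTE // PETABYTE)                       # 1024
    print(records_that_fit(PETABYTE, RECORD_BYTES))  # 1099511627776
```

Under these assumptions a single petabyte holds roughly 1.1 trillion records, which is consistent with the "trillions of records" order of magnitude mentioned above.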

Two classes of technology dominate the Big Data domain: operational and analytical. Operational technology provides capabilities for real-time, interactive workloads, where data is primarily captured and stored. Analytical technology provides capabilities for complex analysis that may touch all of the data. The two complement each other and are frequently deployed together, but their requirements pull in opposite directions. Operational systems, such as NoSQL databases, serve many concurrent requests while keeping response latency low. Analytical systems focus on high throughput, even when queries are very complex and need to read all the data in the system. Hadoop, which implements the MapReduce programming model, is an example of an analytical system.
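The MapReduce model mentioned above can be illustrated with a minimal, single-process sketch in plain Python. It does not use Hadoop or any Hadoop API; the function names (map_phase, shuffle, reduce_phase, word_count) are hypothetical and chosen for this example. It only shows the three conceptual steps an analytical job goes through when it reads every record: map each record to key-value pairs, group the pairs by key, and reduce each group to a result.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Map step: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle step: group all emitted values by their key."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce step: collapse the values for one key into a single count."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle, and reduce over every line (a full scan of the input)."""
    pairs = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    sample = ["big data needs big tools", "data tools scale"]
    print(word_count(sample))
```

In a real cluster the map and reduce steps would run in parallel on many machines and the shuffle would move data across the network; the sketch keeps everything in one process so the data flow stays visible.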

Big Data has attracted considerable attention for analyzing market trends, equipment performance, and other industry factors. Big Data analytical tools and technologies greatly assist IT decision making. Even large organizations find it difficult to manipulate and manage very large datasets. Big Data is particularly troublesome in business analytics because traditional tools and procedures were not designed to search and analyze massive datasets. Big Data deals with two classes of data sets: structured and unstructured. Records from inventories, orders, and customer information contribute to the structured data sets. Unstructured data sets come from the web, social media, and intelligent devices.
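The difference between the two classes can be sketched with Python's standard library. The sample order rows, the social media posts, and the helper names total_quantity and extract_hashtags are all made up for illustration. The point is only that structured records can be queried by field name, while unstructured text has to be mined for whatever pattern is of interest.

```python
import csv
import io
import re
from typing import List

# Structured data: every row follows a known schema (hypothetical order records).
ORDERS_CSV = "order_id,customer,quantity\n1001,alice,3\n1002,bob,5\n"

# Unstructured data: free text with no schema (hypothetical social media posts).
POSTS = ["Loving my new phone! #gadgets", "Rainy day again... #weather #mood"]


def total_quantity(csv_text: str) -> int:
    """Sum the 'quantity' column; fields are addressable by name."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return sum(int(row["quantity"]) for row in reader)


def extract_hashtags(posts: List[str]) -> List[str]:
    """Pull hashtags out of free text with a pattern, since no schema exists."""
    return [tag for post in posts for tag in re.findall(r"#(\w+)", post)]


if __name__ == "__main__":
    print(total_quantity(ORDERS_CSV))   # 8
    print(extract_hashtags(POSTS))      # ['gadgets', 'weather', 'mood']
```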

Tools and Technologies

  • Apache Hadoop
  • Cassandra
  • MongoDB
  • R
  • Ambari
  • HBase 0.94.16
  • Pig
  • Spark
  • Mahout
  • Pentaho
  • IntelliJ IDEA
  • J2SE
  • Eclipse – Indigo SR2
  • ArgoUML 0.34
  • Java Database Connectivity (JDBC)
  • JavaServer Pages (JSP)
  • Servlets