Big Data Analytics (Batch-03)
August 5, 2018 to August 10, 2018
Programme Directors : Profs. Dinesh Kumar, Shankar Venkatagiri & Pulak Ghosh
Programme Dates : 05 – 10 August 2018
Programme Venue : M-12, IIMB Campus
Programme Overview : A triad of terms captures the essence of “big data”: volume, velocity and variety. The volume and pace at which data is created can challenge existing computing infrastructure. For example, every flight of a Boeing 777 can generate up to 1 terabyte (~1000 gigabytes) of data. Making sense of this data is imperative for decision-making and troubleshooting.
Organizations large and small are forced to grapple with big data problems that challenge the existing tenets of data science and computing technology. Even straightforward tasks such as interpreting descriptive statistics have their share of issues: with billions of records, a single mean or histogram can mask important subpopulations, and summary measures must often be computed in one pass because the data will not fit in memory. We begin to question the utility of conventional summary measures and diagrams.
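As a minimal sketch of the one-pass idea (Python is assumed here purely for illustration; the programme does not prescribe a language or library), the function below maintains a running mean and variance over a stream using Welford's algorithm, without ever materializing the data:

    # One-pass (streaming) mean and variance via Welford's algorithm.
    # The simulated Gaussian stream below is a stand-in for records
    # arriving from a log file or message queue.
    import random

    def streaming_mean_variance(stream):
        """Return (count, mean, sample variance) after a single pass."""
        count, mean, m2 = 0, 0.0, 0.0
        for x in stream:
            count += 1
            delta = x - mean
            mean += delta / count        # update the running mean
            m2 += delta * (x - mean)     # accumulate squared deviations
        variance = m2 / (count - 1) if count > 1 else 0.0
        return count, mean, variance

    if __name__ == "__main__":
        random.seed(42)
        # One million readings, generated lazily and never stored as a list.
        stream = (random.gauss(100.0, 15.0) for _ in range(1_000_000))
        n, mu, var = streaming_mean_variance(stream)
        print(f"n={n}, mean={mu:.2f}, variance={var:.2f}")

The same single-pass pattern underlies how distributed platforms compute aggregates: each partition produces partial results that are then combined.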
Algorithms that work well on “small” datasets crumble when the data runs into the terabytes. Time series techniques must be revamped to handle data that streams in continuously. Social media messages are unstructured, arriving in formats that traditional relational databases were not designed to represent. While these may appear to be difficult problems, there has been tremendous progress in analyzing such data. Columnar databases, which store each column contiguously rather than row by row, have significantly boosted query speeds, as the sketch below illustrates. Distributed file systems can seamlessly spread datasets across multiple hard drives and support analytics on them in near real time. Finally, the free and open-source nature of big data platforms promotes their rapid adoption.
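To make the columnar claim concrete, here is a toy sketch (plain Python, not any particular database product): the same table is stored row-wise and column-wise, and an aggregate that touches only one field needs to read just that one contiguous column in the columnar layout:

    # Row-oriented vs column-oriented storage: why columnar layouts
    # speed up analytical queries that touch only a few columns.

    # The same table of (user_id, country, amount), stored two ways.
    rows = [
        (1, "IN", 250.0),
        (2, "US", 120.0),
        (3, "IN", 75.5),
        (4, "SG", 310.0),
    ]
    columns = {
        "user_id": [1, 2, 3, 4],
        "country": ["IN", "US", "IN", "SG"],
        "amount":  [250.0, 120.0, 75.5, 310.0],
    }

    # Row layout: every full row is visited, even though the query
    # needs only the 'amount' field.
    total_from_rows = sum(r[2] for r in rows)

    # Columnar layout: the scan reads one contiguous column, which on
    # disk also compresses well and stays cache-friendly.
    total_from_columns = sum(columns["amount"])

    assert total_from_rows == total_from_columns
    print(f"total amount = {total_from_columns}")

At real scale the advantage compounds, since a columnar store skips the untouched columns entirely on disk rather than merely in memory.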