Beautiful Data: Introduction to Practical Data Science

      Fall 2016

      Alex Szalay


Class times: MW 15:00-16:15

Class location: Bloomberg 278



Syllabus
  • Data-Intensive Computing
    • The Fourth Paradigm
    • History of e-Science
    • Big Data in Science
  • Introduction to Databases
    • Relational databases, ACID
    • Indexing
    • Introduction to SQL
    • User defined functions
  • Hardware architectures
    • Storage hierarchy
    • Nature of low level I/O
    • Redundant storage, RAID, erasure codes
    • Networking issues
    • Balanced systems, Amdahl's Laws
    • Cloud computing vs Beowulf
  • Elementary Statistics
    • Distributions
    • Expectation values, moments
    • Central limit theorem
    • Linear regression
    • Principal component analysis
    • Random forests
  • Data transformations
    • Fourier transforms
    • Wavelets
    • Random projections
  • Data structures
    • Trees
    • K-d trees
    • Quad- and octrees, space filling curves
  • Hashing
    • Hash functions
    • Locality sensitive hashing
    • Bloom filters
  • Graphs
    • Representation of graphs
    • Properties of graphs
    • Laplacian, eigenvalues
    • Graphs as spring networks
  • Sorting and Searching
    • Quicksort
    • Queues
    • Merge-sort
  • Data streams, streaming algorithms
    • Mean, median
    • Sketching
  • Data visualization