# Big Data Programming in Matlab

- For each programming question, submit both the Matlab script and the output.

Answer the following questions.

(1) Implement MDS in Matlab. Although Matlab already provides MDS in a function called `cmdscale.m`, I would still like you to write it yourself in order to fully understand the steps of MDS. It is preferable to write a function for MDS whose inputs are the distances and the dimension of the target Euclidean space, and whose outputs are the low-dimensional coordinates and the Kruskal stress score: `function [Y, stress] = mds(X, k)`. Afterwards, do the following:

- Apply your function to a data set (called `ChineseCityData.mat`) that contains the mutual distances of 12 Chinese cities to produce a two-dimensional map (for clarity, let's place the city of Urumqi in the top left corner). How good is your map?
- Suppose you had airline distances for 50 cities around the world. Could you use these distances to construct a world map?

(2) Download the ISOmap code from the course website (note that the code provided on the ISOmap website has an error; it has been fixed in the version I provide). Also, find a real data set on the Internet (anything but those already listed on the ISOmap website) to which it makes sense to apply ISOmap (be sure to describe your data clearly but briefly). Perform ISOmap on your data set and interpret the low-dimensional representation you get.

(3) Use Dijkstra's algorithm to calculate, by hand, the shortest distance from node 0 to every other node in the following graph:

(4) This question concerns Kernel PCA with the Gaussian kernel, also called the Radial Basis Function (RBF) kernel,

$$\kappa(x_i, x_j) = e^{-\frac{\|x_i - x_j\|^2}{2\sigma^2}},$$
and aims to help you understand what the combined algorithm does:

- The value $\kappa(x_i, x_j)$ represents the dot product between the images of $x_i, x_j$ in some infinite-dimensional feature space $\mathcal{F}$;
- Each data point $x_i$ is mapped to a unit vector in $\mathcal{F}$ (as $\kappa(x_i, x_i) = 1$);
- If two points $x_i, x_j \in \mathbb{R}^d$ are spatially close under the Euclidean distance, then their feature vectors $\phi_i, \phi_j \in \mathcal{F}$ will have a small angle;
- If two points $x_i, x_j \in \mathbb{R}^d$ are spatially far from each other, then their feature vectors $\phi_i, \phi_j \in \mathcal{F}$ will have an angle close to 90 degrees;
- The scale parameter $\sigma > 0$ defines how far is "far" and how close is "close". It is normally chosen to be the average distance between each point and its $k$th nearest neighbor in the data set (say $k = 8$).

Overall, Kernel PCA maps nearby points to unit feature vectors with small angles and faraway points into orthogonal directions, and applies PCA in the feature space $\mathcal{F}$, thus preserving all (and only) local geometry. It does not depend on the shape of the data set and thus is a general-purpose kernel. For more reference, you should read the first two papers listed under Kernel PCA on the course website.

Now, implement Kernel PCA in Matlab as a function and apply it to the data in `kernelpca_data.mat`. Display the two-dimensional representation of the data obtained by Kernel PCA. What do you find?

(5) The Iris data set in the University of California, Irvine (UCI) Machine Learning Repository (http://archive.ics.uci.edu/ml/datasets/Iris) contains 3 classes of 50 instances each, where each class refers to a type of iris plant. First, download this data set to your computer and use the file `script_read_irisdata.m` to read it into Matlab. Afterwards, perform the following tasks:

- Apply kmeans with 10 restarts to the iris data set to divide it into three groups. What is the error percentage of your clustering?
Is it good and why?
- Now, suppose we do not know how many classes there are and would like to estimate the number by kmeans with $k = 1, \ldots, 6$, each with 10 restarts. Plot the scatter versus $k$. How many clusters does the plot indicate?
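
As a warm-up for question (4), the properties of the Gaussian kernel listed there can be checked numerically. The following is a minimal sketch, not a Kernel PCA implementation: it only builds the kernel matrix on a small made-up data set and inspects the resulting angles. The toy points, the choice k = 2, and all variable names are assumptions made for illustration. Only base Matlab functions are used, and the expression `sq + sq'` assumes implicit expansion (R2016b or later).

```matlab
% Minimal sketch: Gaussian (RBF) kernel matrix on made-up data.
% The points, k = 2, and every variable name are illustrative assumptions.
X = [0 0; 0.1 0; 0 0.1; 10 10; 10.1 10; 10 10.1];  % two tight groups, far apart
n = size(X, 1);

% Pairwise squared Euclidean distances.
% sq + sq' relies on implicit expansion (R2016b or later).
sq = sum(X.^2, 2);
D2 = max(sq + sq' - 2*(X*X'), 0);  % clamp small negative round-off to zero
D2(1:n+1:end) = 0;                 % a point is at distance exactly 0 from itself
D = sqrt(D2);

% sigma: average distance from each point to its k-th nearest neighbor.
k = 2;                             % small because the toy set has only 6 points
Ds = sort(D, 2);                   % column 1 holds the zero self-distance
sigma = mean(Ds(:, k+1));

% Kernel matrix: K(i,j) = exp(-||x_i - x_j||^2 / (2*sigma^2)).
K = exp(-D2 / (2*sigma^2));

% K(i,i) = 1, so feature vectors have unit length and the angle between
% two of them satisfies cos(theta) = K(i,j).
theta = acosd(min(K, 1));          % angles in degrees

fprintf('diag(K) all equal to 1: %d\n', all(diag(K) == 1));
fprintf('angle, nearby pair (1,2):  %.1f degrees\n', theta(1, 2));
fprintf('angle, distant pair (1,4): %.1f degrees\n', theta(1, 4));
```

In this toy set the distant pair should come out at essentially 90 degrees, and the nearby pair at a clearly smaller angle. Because the nearby points are about one sigma apart, their angle is moderate rather than tiny; the angle shrinks toward zero only for points that are much closer than sigma, which is one way to see how the scale parameter decides what counts as "close".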