HW3 - 3101 Matlab -- (INSERT NAME and UNI HERE)

Due 11:59pm on Tuesday, April 1st, 2008

Goals

Working with new data structures (strings and cells), avoiding loops, and saving and loading variables to files.

Directions

Please fill in your name at the top of this page and write your answers under their corresponding questions. When you are done, publish this file to HTML using File > Publish To HTML (or the command publish('hw1.m', 'html')), then create a new zip file labeled with your UNI and homework number, in this format: bs2018_hw1.zip. The zip file should contain this original m file and the html directory created by publishing. Make sure the HTML file adequately shows your work (images, etc.), then email the zip file to cs3101@gmail.com, including your name and UNI in the subject.

Extra credit

Extra credit points will be awarded on problems marked with (EC for no loop) if you are able to answer the question without using a loop.

Problem 1 - Working with Strings

teststring1 = 'The MATLAB high-performance language for technical computing integrates computation, visualization, and programming in an easy-to-use environment where problems and solutions are expressed in familiar mathematical notation.';

1.1

Convert teststring1 to lowercase and store that new string in a variable called s1 and display s1.

s1 = lower(teststring1)
s1 =

the matlab high-performance language for technical computing integrates computation, visualization, and programming in an easy-to-use environment where problems and solutions are expressed in familiar mathematical notation.

1.2

Using one command (no loops!), find the indices of all the characters in s1 that are letters (not punctuation or spaces), then display only the first 5 of those indices.

i = find(isletter(s1));
i(1:5)
ans =

     1     2     3     5     6
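
As a side note on 1.2, the index list returned by find and the logical mask returned by isletter select the same characters, so the mask can often be used directly. A minimal sketch on a made-up string (txt, mask, idx, viaMask and viaFind are illustrative names, not part of the assignment):

```matlab
txt = 'a-b c!';              % made-up sample string
mask = isletter(txt);        % logical row vector, true where txt holds a letter
idx = find(mask);            % positions of the true entries
viaMask = txt(mask);         % logical indexing
viaFind = txt(idx);          % numeric indexing with the positions from find
disp(idx)
disp(isequal(viaMask, viaFind))
```

Both forms pick out the same letters; find is only needed when the positions themselves are the answer, as in this question.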

1.3

Convert s1 into an array of number values (stored as doubles) that correspond to the ASCII values for each character. Call this array x1 and display only the first 5 elements of it.

x1 = double(s1);
x1(1:5)
ans =

   116   104   101    32   109

1.4

Compute and plot a histogram of the letter distribution in x1. Make sure your histogram has 26 bins, and that you are only counting lowercase characters in x1, not any punctuation. (EC for no loop)

% Use explicit bin centers 'a'..'z' so that each bin is exactly one letter
% and xout holds integer character codes
[n, xout] = hist(x1(i), double('a'):double('z'));
bar(xout, n);

1.4.1

Compute and display a new string called s1count which contains the 26 lowercase characters a,b,c...z arranged in order by their number of occurrences in s1. The first character of s1count should be the character most used in teststring1. (EC for no loop)

[B, idx] = sort(n, 'descend');
s1count = char(xout(idx))
s1count =

aenitormslgchpudfbvwxyzjkq
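
Another way to get per-letter counts without a loop is accumarray, which adds 1 to a bin for every letter code it sees. This is only a sketch on a short sample string, assuming the text contains nothing but ASCII letters, spaces and punctuation; sample, letters, counts and byFrequency are illustrative names.

```matlab
sample = lower('The MATLAB high-performance language');   % made-up sample text
letters = sample(sample >= 'a' & sample <= 'z');          % keep only a..z
% Map 'a'..'z' to bins 1..26 and count one hit per letter
counts = accumarray(letters(:) - 'a' + 1, 1, [26 1])';
[sortedCounts, order] = sort(counts, 'descend');
byFrequency = char('a' + order - 1);                      % letters, most frequent first
disp(byFrequency)
```

Because the bin number is computed directly from the character code, the letter for each bin is always 'a' + bin - 1, with no dependence on how a histogram function places its bin centers.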

1.5

At what index in teststring1 does the word 'environment' appear?

strfind(teststring1, 'environment')
ans =

   135

Problem 2 - Working with Cells

2.1

Make a new cell called e1. Fill e1 with the individual words from teststring1, so that each element of e1 is a word from teststring1. The words should be lowercase and stripped of any punctuation (except that you should make sure to leave hyphens).

e1 = cell(1, 1);
rem = teststring1;
i = 1;
while length(rem) > 0
   [token, rem] = strtok(rem);
   e1{i} = lower(token);
   i = i+1;
end
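
strtok with its default delimiter splits on whitespace only, so a comma or period next to a word stays attached to the token. One possible way to drop such characters while keeping hyphens is to filter each token with isletter, sketched here on a made-up sentence (sentence, words, rest and tok are illustrative names, not part of the assignment):

```matlab
sentence = 'An easy-to-use tool, really.';   % made-up input
words = {};
rest = sentence;
while ~isempty(rest)
    [tok, rest] = strtok(rest);              % split on whitespace
    tok = lower(tok);
    tok = tok(isletter(tok) | tok == '-');   % keep letters and hyphens only
    if ~isempty(tok)                         % skip tokens that were pure punctuation
        words{end+1} = tok;
    end
end
disp(words)
```

The isempty check guards against a token that consisted only of punctuation, which would otherwise be stored as an empty word.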

2.2

Show the two different ways, {} and (), to index the 6th element of e1. Explain with a brief comment what each technique returns and why they are different.

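% curly braces return the contents of the 6th cell: a 1x9 char array
e1{6}

% parentheses return a 1x1 cell array that still wraps the 6th element,
% because () indexing always yields a subset of the same container type
e1(6)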
ans =

technical


ans = 

    'technical'
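
The difference is easiest to see by checking the class of each result. A small sketch with a made-up cell array (c, inner, outer and pair are illustrative names):

```matlab
c = {'alpha', 42, [1 2 3]};   % made-up cell array
inner = c{2};                 % contents of the 2nd cell: the double 42
outer = c(2);                 % 1x1 cell array holding 42
pair = c(1:2);                % () with a range gives a 1x2 cell array
disp(class(inner))
disp(class(outer))
disp(size(pair))
```

Arithmetic such as inner + 1 works, while outer + 1 raises an error because outer is still a cell.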

2.3

Set N to the number of words in teststring1 and display N.

N = length(e1)
N =

    26

2.4

Create a new cell called c1 that is NxN. Each element of c1 should correspond to comparing two words that are in teststring1. In each element of c1, store 3 sub-elements: the first word, the second word, and the number of characters that the two words have in common.

c1 = cell(N, N);

for i=1:N
    for j=1:N
        a = e1{i};
        b = e1{j};
        c1{i, j} = {a, b, length(intersect(a, b))};
    end
end
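
Note that intersect treats each word as a set of characters, so a letter that repeats within a word is counted once. A short sketch with two words from the test string (first, second, shared and numShared are illustrative names):

```matlab
first = 'matlab';
second = 'language';
shared = intersect(first, second);   % sorted, unique characters found in both
numShared = length(shared);
disp(shared)
disp(numShared)
```

Here 'a' appears twice in each word but contributes only one character to the result.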

2.4.1

Display the 1st element of c1.

c1{1, 1}
ans = 

    'the'    'the'    [3]

2.5

Using the c1 that you computed in the previous problem, how many letters do the 9th and 10th words have in common? (Make sure your answer is a number and not a cell. Furthermore, the command to get this value should be only one line, with no semicolons.)

x = c1{9, 10}{3}
x =

     7

Problem 3 - Saving and loading variables

3.1

Print out the names of the variables you have declared so far in this homework and make sure the output shows the sizes and classes of those variables.

whos
  Name                   Size                Bytes  Class     Attributes

  A                   1000x1000            8000000  double              
  B                      1x26                  208  double              
  C                      1x1                     8  double              
  D                   1000x1000            8000000  double              
  I                      1x1                     8  double              
  N                      1x1                     8  double              
  Ntest                  1x1                     8  double              
  V                   1000x1000            8000000  double              
  X                    256x1000            2048000  double              
  Xtest                256x20                40960  double              
  Y                      1x1000               8000  double              
  Ytest                  1x20                  160  double              
  a                      1x9                    18  char                
  ans                    1x3                   200  cell                
  b                      1x9                    18  char                
  c                      1x1                     2  char                
  c1                    26x26               188240  cell                
  d                   1000x1                  8000  double              
  dists                  1x1000               8000  double              
  e1                     1x26                 1956  cell                
  encryptedtext          1x53                  106  char                
  i                      1x1                     8  double              
  idx                    1x26                  208  double              
  j                      1x1                     8  double              
  n                      1x26                  208  double              
  ntest                  1x26                  208  double              
  numClusters            1x1                     8  double              
  p                    256x1                  2048  double              
  predictions           20x1                   160  double              
  rem                    1x0                     0  char                
  s                      1x53                  106  char                
  s1                     1x223                 446  char                
  s13                    1x53                  106  char                
  s1count                1x26                   52  char                
  scores                 1x26                  208  double              
  se                     1x53                  106  char                
  shift                  1x1                     8  double              
  shiftScore             1x1                     8  double              
  st                     1x3832               7664  char                
  teststring1            1x223                 446  char                
  token                  1x9                    18  char                
  traintext              1x3832               7664  char                
  translatedtext         1x53                  106  char                
  v                    256x1                  2048  double              
  v1                   256x1                  2048  double              
  v2                   256x1                  2048  double              
  v3                   256x1                  2048  double              
  x                      1x1                     8  double              
  x1                     1x223                1784  double              
  xout                   1x26                  208  double              
  xouttest               1x26                  208  double              
  y                   1000x1                  8000  double              

3.2

How many variables have you declared? (Answer this question with a command, not by counting yourself.)

length(whos)
ans =

    52

3.3

Save all of your current variables to a file called hw3temp.mat.

save hw3temp.mat

3.4

Clear your workspace and also clear your command window.

clear;
clc;

3.5

How many variables do you have in your workspace now?

length(whos)
ans =

     0

3.6

Load all of the variables in hw3temp.mat back into the workspace.

load hw3temp.mat

3.7

How many variables do you have in your workspace now?

length(whos)
ans =

    52
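
The save and load commands also have a function form, which makes it possible to store only selected variables and to load them into a struct instead of the workspace. A minimal sketch using a temporary file; matFile, vec, label, info and restored are illustrative names, not part of the assignment:

```matlab
matFile = [tempname '.mat'];          % temporary file name, for illustration only
vec = 1:5;
label = 'demo';
save(matFile, 'vec', 'label');        % save just these two variables
clear vec label                       % remove them from the workspace
info = whos('-file', matFile);        % struct array describing the file contents
disp(numel(info))
restored = load(matFile);             % load into a struct, not the workspace
disp(restored.label)
delete(matFile);                      % clean up the temporary file
```

Loading into a struct avoids silently overwriting workspace variables that happen to share a name with something in the file.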

Problem 4 - ROT13 encryption

Implementing ROT13 encryption. For more information, see the Wikipedia article on ROT13.

4.1

Write a short code snippet that takes as input a string called s and creates a new string called s13 that is rot13 encoded. Initially set s to the contents of teststring1, and display s13 when the loop is done. Make sure that you properly preserve uppercase and lowercase letters as well as punctuation. (EC for no loop)

s = teststring1;
s13 = [];

for i=1:length(s)
    c = s(i);
    if (c >= 'a' && c <= 'z')
        s13(i) = mod(c - 'a' + 13, 26) + 'a';
    elseif (c >= 'A' && c <= 'Z')
        s13(i) = mod(c - 'A' + 13, 26) + 'A';
    else
        s13(i) = c;
    end
end

s13 = char(s13)


% Extra credit solution (no loop), left commented out:
% s = teststring1;
% s13 = s;
% i = s13 >= 'a' & s13 <= 'z';
% s13(i) = char(mod(s13(i) - 'a' + 13, 26) + 'a');
% i = s13 >= 'A' & s13 <= 'Z';
% s13(i) = char(mod(s13(i) - 'A' + 13, 26) + 'A');
% s13
s13 =

Gur ZNGYNO uvtu-cresbeznapr ynathntr sbe grpuavpny pbzchgvat vagrtengrf pbzchgngvba, ivfhnyvmngvba, naq cebtenzzvat va na rnfl-gb-hfr raivebazrag jurer ceboyrzf naq fbyhgvbaf ner rkcerffrq va snzvyvne zngurzngvpny abgngvba.

4.2

Show that applying the rot13 algorithm twice gives back the original string. Copy and paste your rot13 code again below, but this time set s equal to the s13 that you just computed. Display the new s13 (which should be identical to the original teststring1).

s = s13;
s13 = [];

for i=1:length(s)
    c = s(i);
    if (c >= 'a' && c <= 'z')
        s13(i) = mod(c - 'a' + 13, 26) + 'a';
    elseif (c >= 'A' && c <= 'Z')
        s13(i) = mod(c - 'A' + 13, 26) + 'A';
    else
        s13(i) = c;
    end
end


s13 = char(s13)
s13 =

The MATLAB high-performance language for technical computing integrates computation, visualization, and programming in an easy-to-use environment where problems and solutions are expressed in familiar mathematical notation.
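
The copy-and-paste step can be avoided by building a lookup table once and indexing into it. This is a sketch of a different technique from the loop above, assuming every character code in the message is at most 255; lut, lower26, upper26, msg, enc and dec are illustrative names.

```matlab
lut = char(0:255);                          % identity table: lut(code + 1) == char(code)
lower26 = 'a':'z';
upper26 = 'A':'Z';
lut(lower26 + 1) = lower26([14:26 1:13]);   % a..z -> n..z a..m
lut(upper26 + 1) = upper26([14:26 1:13]);   % A..Z -> N..Z A..M
msg = 'Hello, World!';                      % made-up message
enc = lut(msg + 1);                         % encode with one indexing operation
dec = lut(enc + 1);                         % applying the same table again decodes
disp(enc)
disp(dec)
```

Since 13 is half of 26, the same table both encodes and decodes, which is the property this question asks you to demonstrate.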

4.3

Write a short snippet of code that determines if s13 and teststring1 are identical, and if they are then display the words, 'The two strings are the same, and both have x characters', where x is properly filled in with the length of the string. If the strings are not the same, display the words 'The two strings are different'.

if isequal(s13, teststring1)
    disp(sprintf('The two strings are the same, and both have %d characters', length(s13)));
else
    disp('The two strings are different');
end
The two strings are the same, and both have 223 characters

Problem 5 - Breaking the Caesar Cipher

http://en.wikipedia.org/wiki/Caesar_cipher

5.0

Print out the current working directory.

pwd
ans =

/Users/blake/Work/Machine Learning/Teach Matlab 3101/Class 3

5.1

Load in the file hw3caesar.mat, which can be downloaded from the class website.

load hw3caesar.mat;

5.2

Display only the variables that are contained within hw3caesar.mat

whos -file hw3caesar
  Name               Size              Bytes  Class    Attributes

  encryptedtext      1x53                106  char               
  traintext          1x3832             7664  char               

5.2.1

Display the variable encryptedtext

encryptedtext
encryptedtext =

Uibtij qa ug nidwzqbm tivociom. Appp! Qb qa i amkzmb.

5.3

Write a short program to decode the string stored in the encryptedtext variable in hw3caesar.mat. The encrypted text has been encoded with a Caesar cipher with an unknown shift value. Your task is to find that shift value without explicitly shifting the encrypted text 26 times. First compute the histogram of the encrypted text and then simply compare shifted versions of that histogram with the histogram of traintext. The correct shift value is the one for which the two histograms have the lowest chi-square statistic. It is helpful to first convert the traintext and encrypted text to lowercase before you create and compare the histograms. When you are done, print out the hidden shift value you found.

se = lower(encryptedtext);
i = find(isletter(se));
[ntest, xouttest] = hist(double(se(i)), 26);

st = lower(traintext);
i = find(isletter(st));
[n, xout] = hist(double(st(i)), 26);


scores = [];

for shift=1:26
    scores(shift) = sum(((circshift(ntest, [0, shift]) - n).^2) ./ n);
end

[shiftScore, shift] = min(scores)
plot(scores)
shiftScore =

   3.0807e+03


shift =

    18
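
The chi-square statistic used here is the sum over bins of (observed - expected)^2 / expected, and it is zero only when the two histograms agree exactly. The toy sketch below shows the shift search on a made-up 5-bin histogram rather than on the homework data; chi2, reference, observed, scores and bestShift are illustrative names.

```matlab
chi2 = @(observed, expected) sum((observed - expected).^2 ./ expected);

reference = [12 2 7 3 9];                   % made-up "training" counts, all nonzero
observed = circshift(reference, [0 -2]);    % same counts rotated left by 2 bins

scores = zeros(1, numel(reference));
for d = 0:numel(reference) - 1
    % rotate the observed histogram right by d bins and compare
    scores(d + 1) = chi2(circshift(observed, [0 d]), reference);
end

[bestScore, bestIdx] = min(scores);
bestShift = bestIdx - 1;                    % scores(1) corresponds to shift 0
disp(bestShift)
```

With real texts of different lengths it may help to rescale the expected counts to the observed total before comparing, and any bin with an expected count of zero needs special handling because of the division.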

5.4

Now that you have found the shift value from the previous problem, what does the encrypted text say?

s = encryptedtext;
s13 = s;

i = ((s13 >= 'a') & (s13 <= 'z'));
s13(i) = char(mod(s13(i) - 'a' + shift, 26) + 'a');
i = ((s13>='A') & (s13<='Z'));
s13(i) = char(mod(s13(i) - 'A' + shift, 26) + 'A');
s13
s13 =

Matlab is my favorite language. Shhh! It is a secret.

5.5

Declare a new variable called translatedtext and set it equal to what you have decoded from the encrypted text. Save ONLY the translatedtext variable to a file called hw3tt.mat, then print out the variables that are contained within the file hw3tt.mat.

translatedtext = s13;

save('hw3tt.mat', 'translatedtext');

whos -file hw3tt.mat
  Name                Size            Bytes  Class    Attributes

  translatedtext      1x53              106  char