HW3 - 3101 Matlab -- (INSERT NAME and UNI HERE)
Due 11:59pm on Tuesday, April 1st, 2008
Contents
- Goals
- Directions
- Extra credit
- Problem 1 - Working with Strings
- 1.1
- 1.2
- 1.3
- 1.4
- 1.4.1
- 1.5
- Problem 2 - Working with Cells
- 2.1
- 2.2
- 2.3
- 2.4
- 2.4.1
- 2.5
- Problem 3 - Saving and loading variables
- 3.1
- 3.2
- 3.3
- 3.4
- 3.5
- 3.6
- 3.7
- Problem 4 - ROT13 encryption
- 4.1
- 4.2
- 4.3
- Problem 5 - Breaking the Caesar Cipher
- 5.0
- 5.1
- 5.2
- 5.2.1
- 5.3
- 5.4
- 5.5
Goals
Working with new data structures (strings and cells), avoiding loops, and saving and loading variables to files.
Directions
Please fill in your name at the top of this page and write your answers under their corresponding questions. When you are done, publish this file to HTML using File > Publish To HTML (or the command publish('hw1.m', 'html')), and create a new zip file labeled with your UNI and homework number in this format: bs2018_hw1.zip. The zip file should contain this original m-file and the html directory created by publishing it. Make sure the HTML file adequately shows your work (has images, etc.), and then email the zip file to cs3101@gmail.com, making sure to include your name and UNI in the subject.
Extra credit
Extra credit points will be awarded on problems marked (EC for no loop) if you are able to answer the question without using a loop.
Problem 1 - Working with Strings
teststring1 = 'The MATLAB high-performance language for technical computing integrates computation, visualization, and programming in an easy-to-use environment where problems and solutions are expressed in familiar mathematical notation.';
1.1
Convert teststring1 to lowercase and store that new string in a variable called s1 and display s1.
s1 = lower(teststring1)
s1 = the matlab high-performance language for technical computing integrates computation, visualization, and programming in an easy-to-use environment where problems and solutions are expressed in familiar mathematical notation.
1.2
Using one command (no loops!), find the indices of all the characters in s1 that are letters (not punctuation or spaces), then display only the first 5 of those indices.
i = find(isletter(s1)); i(1:5)
ans = 1 2 3 5 6
1.3
Convert s1 into an array of number values (stored as doubles) that correspond to the ASCII values for each character. Call this array x1 and display only the first 5 elements of it.
x1 = double(s1); x1(1:5)
ans = 116 104 101 32 109
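The ROT13 and Caesar cipher problems later on depend on doing arithmetic with characters. The following small sketch shows the round trip between characters and their numeric codes. It assumes plain ASCII text, and the sample string is only illustrative.

```matlab
% Sketch: characters <-> ASCII codes (plain ASCII text assumed).
s = 'the matlab';
codes = double(s);                          % numeric code of each character
back = char(codes);                         % whole-number codes map back to characters
shifted = char(s(1) + 1);                   % 't' + 1 is the double 117, i.e. 'u'
isLetterCode = codes >= 97 & codes <= 122;  % test for lowercase a-z on the codes
```

Arithmetic on a char promotes it to double, so results such as `s(1) + 1` must be wrapped in `char` to display as text again.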
1.4
Compute and plot a histogram of the letter distribution in x1. Make sure your histogram has 26 bins, and that you are only counting lowercase characters in x1, not any punctuation. (EC for no loop)
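% Use the bin centers 97:122 ('a' to 'z') so that each letter gets its own bin
% and each xout value is exactly the ASCII code of its letter
[n, xout] = hist(x1(i), 97:122);
bar(xout, n);

1.4.1
Compute and display a new string called s1count which contains the 26 lowercase characters a, b, c, ..., z arranged in order by their number of occurrences in s1. The first character of s1count should be the character used most in teststring1. (EC for no loop)
[B, idx] = sort(n, 'descend');   % idx lists the bins (letters) from most to least frequent
s1count = char(xout(idx))
s1count = aenitormslgchpudfbvwxyzjkq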
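For 1.4 and 1.4.1, one loop-free alternative to hist is to count the letters directly with accumarray. Each letter's code minus 96 serves as its bin number. This sketch assumes ASCII input and counts only a-z. The names `letters`, `counts`, `order` and `ranking` are hypothetical.

```matlab
% Sketch: rank the letters a-z of a string by frequency without hist or loops.
% Assumes ASCII text; anything outside a-z (after lowercasing) is ignored.
txt = lower('The MATLAB high-performance language');   % sample input
letters = txt(txt >= 'a' & txt <= 'z');                 % keep only a-z
% Bin k counts the letter 'a' + k - 1, so there are exactly 26 integer bins
counts = accumarray(double(letters(:)) - 96, 1, [26 1])';
[sortedCounts, order] = sort(counts, 'descend');        % ties keep their original (alphabetical) order
ranking = char('a' + order - 1)
```

The bins are indexed by letter rather than by floating-point bin centers. Because of that, the ranking turns back into characters with `'a' + order - 1`.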
1.5
At what index in teststring1 does the word 'environment' appear?
strfind(teststring1, 'environment')
ans = 135
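strfind (like the older findstr) matches substrings. Searching for a short word such as 'in' therefore also hits the 'in' inside 'programming'. The sketch below filters the raw matches by looking at the character on each side of every match. It assumes that a whole word is one whose neighbours are non-letters, so spaces, punctuation and hyphens all count as boundaries. The variable names are hypothetical.

```matlab
% Sketch: keep only whole-word matches of WORD in TXT.
txt = 'programming in an easy-to-use environment';
word = 'in';
idx = strfind(txt, word);                % every occurrence, including inside words
padded = [' ', txt, ' '];                % padding gives each match a left and right neighbour
before = padded(idx);                    % equals txt(idx - 1), or ' ' at the start
after = padded(idx + numel(word) + 1);   % equals txt(idx + numel(word)), or ' ' at the end
wholeWord = idx(~isletter(before) & ~isletter(after))
```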
Problem 2 - Working with Cells
2.1
Make a new cell called e1. Fill e1 with the individual words from teststring1, so that each element of e1 is a word from teststring1. The words should be lowercase and stripped of any punctuation (except hyphens, which should be kept).
e1 = cell(1, 1);
rem = teststring1;
i = 1;
while ~isempty(rem)
    [token, rem] = strtok(rem);                      % next whitespace-delimited token
    token = token(isletter(token) | token == '-');   % strip punctuation, keep hyphens
    e1{i} = lower(token);
    i = i + 1;
end
2.2
Show the two different ways ({} and ()) to index the 6th element of e1, explain with a brief comment what is returned by each technique, and why they are different.
% Curly braces return the contents of the 6th cell, i.e. the char array itself
e1{6}
% Parentheses return a 1x1 cell array holding the word, so it displays in quotes
e1(6)
ans = technical
ans = 'technical'
2.3
Set N to the number of words in teststring1 and display N.
N = length(e1)
N = 26
2.4
Create a new cell called c1 that is NxN, where each element of c1 corresponds to comparing two words from teststring1. In each element of c1, store 3 sub-elements: the first word, the second word, and the number of characters that the two words have in common.
c1 = cell(N, N);
for i = 1:N
    for j = 1:N
        a = e1{i};
        b = e1{j};
        % intersect returns the distinct characters found in both words
        c1{i, j} = {a, b, length(intersect(a, b))};
    end
end
2.4.1
Display the 1st element of c1.
c1{1, 1}
ans = 'the' 'the' [3]
2.5
Using the c1 computed in the previous problem, how many letters do the 9th and 10th words have in common? (Make sure your answer is a number and not a cell; furthermore, the command that gets this value should be only one line, with no semicolons.)
x = c1{9, 10}{3}
x = 6
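In the extra-credit spirit of Problem 2, both the word list and the table of shared letters can be built without explicit loops. This sketch assumes a word is a run of the letters a-z and hyphens. It counts only shared letters, so unlike intersect it does not count a shared hyphen. The names `words`, `P` and `common` are hypothetical.

```matlab
% Sketch: split text into words and count the letters shared by every pair.
txt = 'The high-performance language for technical computing.';
words = regexp(lower(txt), '[a-z-]+', 'match');   % 1xN cell, punctuation dropped
N = numel(words);
% Row k of P marks which of the letters a-z occur in word k
rows = cellfun(@(w) ismember('a':'z', w), words, 'UniformOutput', false);
P = double(vertcat(rows{:}));                     % N x 26 matrix of 0/1
% Entry (k, m) of P*P' counts the distinct letters words k and m have in common
common = P * P'
```

The matrix product replaces the nested loop. Each entry is a dot product of two 0/1 letter-presence rows, so it adds up one for every letter both words contain.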
Problem 3 - Saving and loading variables
3.1
Print out the names of the variables you have declared so far in this homework, and make sure the output shows the sizes and classes of those variables.
whos
  Name            Size           Bytes  Class    Attributes

  A               1000x1000    8000000  double
  B               1x26             208  double
  C               1x1                8  double
  D               1000x1000    8000000  double
  I               1x1                8  double
  N               1x1                8  double
  Ntest           1x1                8  double
  V               1000x1000    8000000  double
  X               256x1000     2048000  double
  Xtest           256x20         40960  double
  Y               1x1000          8000  double
  Ytest           1x20             160  double
  a               1x9               18  char
  ans             1x3              200  cell
  b               1x9               18  char
  c               1x1                2  char
  c1              26x26         188240  cell
  d               1000x1          8000  double
  dists           1x1000          8000  double
  e1              1x26            1956  cell
  encryptedtext   1x53             106  char
  i               1x1                8  double
  idx             1x26             208  double
  j               1x1                8  double
  n               1x26             208  double
  ntest           1x26             208  double
  numClusters     1x1                8  double
  p               256x1           2048  double
  predictions     20x1             160  double
  rem             1x0                0  char
  s               1x53             106  char
  s1              1x223            446  char
  s13             1x53             106  char
  s1count         1x26              52  char
  scores          1x26             208  double
  se              1x53             106  char
  shift           1x1                8  double
  shiftScore      1x1                8  double
  st              1x3832          7664  char
  teststring1     1x223            446  char
  token           1x9               18  char
  traintext       1x3832          7664  char
  translatedtext  1x53             106  char
  v               256x1           2048  double
  v1              256x1           2048  double
  v2              256x1           2048  double
  v3              256x1           2048  double
  x               1x1                8  double
  x1              1x223           1784  double
  xout            1x26             208  double
  xouttest        1x26             208  double
  y               1000x1          8000  double
3.2
How many variables have you declared? (Answer this question with a command, not by counting yourself.)
length(whos)
ans = 52
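When called with an output argument, whos returns its listing as a struct array, which makes counting or filtering variables programmatic. The sketch below uses three throwaway variables whose names are illustrative. Restricting whos to named variables keeps the result independent of whatever else is in the workspace.

```matlab
% Sketch: inspect selected variables through the struct array returned by whos.
count = 1;
label = 'text';
parts = {1, 2};
info = whos('count', 'label', 'parts');   % one struct element per variable
names = {info.name};                       % cell array of variable names
classes = {info.class};                    % e.g. 'double', 'char', 'cell'
numVars = numel(info)
```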
3.3
Save all of your current variables to a file called hw3temp.mat.
save hw3temp.mat
3.4
Clear your workspace and also clear your command window.
clear; clc;
3.5
How many variables do you have in your workspace now?
length(whos)
ans = 0
3.6
Load all of the variables in hw3temp.mat back into the workspace.
load hw3temp.mat
3.7
How many variables do you have in your workspace now?
length(whos)
ans = 52
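A useful variant of 3.6 is to load into a struct instead of the workspace. S = load(file) returns the saved variables as fields of S, so nothing already in the workspace is overwritten. This sketch assumes a writable temporary directory. The scratch file name comes from tempname and is not part of the assignment.

```matlab
% Sketch: save chosen variables, then load them into a struct.
s1 = 'the matlab language';
N = 26;
fname = [tempname, '.mat'];   % scratch file name (assumes the temp dir is writable)
save(fname, 's1', 'N');       % write only s1 and N
S = load(fname);              % variables come back as fields of S
savedNames = fieldnames(S)
delete(fname);                % remove the scratch file
```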
Problem 4 - ROT13 encryption
Implementing ROT13 encryption. For more information, see Wikipedia: ROT13.
4.1
Write a short code snippet that takes as input a string called s and creates a new string called s13 that is ROT13 encoded. Initially set s to the contents of teststring1, and display s13 when the loop is done. Make sure that you properly preserve uppercase and lowercase letters as well as punctuation. (EC for no loop)
s = teststring1;
s13 = [];
for i = 1:length(s)
    c = s(i);
    if (c >= 'a' && c <= 'z')
        s13(i) = mod(c - 'a' + 13, 26) + 'a';   % rotate lowercase letters
    elseif (c >= 'A' && c <= 'Z')
        s13(i) = mod(c - 'A' + 13, 26) + 'A';   % rotate uppercase letters
    else
        s13(i) = c;                              % keep spaces and punctuation
    end
end
s13 = char(s13)

% Extra credit solution (no loop):
% s = teststring1;
% s13 = s;
% i = s13 >= 'a' & s13 <= 'z';
% s13(i) = char(mod(s13(i) - 'a' + 13, 26) + 'a');
% i = s13 >= 'A' & s13 <= 'Z';
% s13(i) = char(mod(s13(i) - 'A' + 13, 26) + 'A');
% s13
s13 = Gur ZNGYNO uvtu-cresbeznapr ynathntr sbe grpuavpny pbzchgvat vagrtengrf pbzchgngvba, ivfhnyvmngvba, naq cebtenzzvat va na rnfl-gb-hfr raivebazrag jurer ceboyrzf naq fbyhgvbaf ner rkcerffrq va snzvyvne zngurzngvpny abgngvba.
4.2
Show that applying the ROT13 algorithm twice gives back the original string. Copy and paste your ROT13 code again below, but this time set s equal to the s13 that you just computed. Display the new s13 (which should be identical to the original teststring1).
s = s13;
s13 = [];
for i = 1:length(s)
    c = s(i);
    if (c >= 'a' && c <= 'z')
        s13(i) = mod(c - 'a' + 13, 26) + 'a';
    elseif (c >= 'A' && c <= 'Z')
        s13(i) = mod(c - 'A' + 13, 26) + 'A';
    else
        s13(i) = c;
    end
end
s13 = char(s13)
s13 = The MATLAB high-performance language for technical computing integrates computation, visualization, and programming in an easy-to-use environment where problems and solutions are expressed in familiar mathematical notation.
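Another loop-free way to write ROT13 is a lookup table. The output character for every ASCII code is built once, and the table is then indexed with the codes of the input. This is a different technique from the masked mod arithmetic above. It is sketched under the assumption that the input is plain ASCII (codes 0-127), and `lut` and `rot13` are hypothetical names. Applying the mapping twice returns the input, which is the property checked in 4.2.

```matlab
% Sketch: ROT13 through a 128-entry lookup table (plain ASCII input assumed).
k = 13;
lut = char(0:127);                                            % start from the identity map
lut(double('a':'z') + 1) = char(mod((0:25) + k, 26) + 'a');   % rotate lowercase
lut(double('A':'Z') + 1) = char(mod((0:25) + k, 26) + 'A');   % rotate uppercase
rot13 = @(s) lut(double(s) + 1);                              % +1: MATLAB indices start at 1
s = 'Hello, World!';
once = rot13(s)
twice = rot13(once)
```

Changing `k` gives a general Caesar shift with the same table construction. In that case decoding uses `26 - k` instead of reapplying the same table.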
4.3
Write a short snippet of code that determines if s13 and teststring1 are identical, and if they are then display the words, 'The two strings are the same, and both have x characters', where x is properly filled in with the length of the string. If the strings are not the same, display the words 'The two strings are different'.
if isequal(s13, teststring1)
    disp(sprintf('The two strings are the same, and both have %d characters', length(s13)));
else
    disp('The two strings are different');
end
The two strings are the same, and both have 223 characters
Problem 5 - Breaking the Caesar Cipher
http://en.wikipedia.org/wiki/Caesar_cipher
5.0
Print out the current working directory.
pwd
ans = /Users/blake/Work/Machine Learning/Teach Matlab 3101/Class 3
5.1
Load in the file hw3caesar.mat, which can be downloaded from the class website.
load hw3caesar.mat;
5.2
Display only the variables that are contained within hw3caesar.mat
whos -file hw3caesar
  Name            Size           Bytes  Class    Attributes

  encryptedtext   1x53             106  char
  traintext       1x3832          7664  char
5.2.1
Display the variable encryptedtext
encryptedtext
encryptedtext = Uibtij qa ug nidwzqbm tivociom. Appp! Qb qa i amkzmb.
5.3
Write a short program to decode the string stored in the encryptedtext variable in hw3caesar.mat. The encrypted text has been encoded with a Caesar cipher with an unknown shift value. Your task is to find that shift value without explicitly shifting the encrypted text 26 times. First compute the histogram of the encrypted text, and then compare shifted versions of that histogram with the histogram of traintext. The correct shift value is the one for which the two histograms have the lowest chi-square statistic. It is helpful to first convert traintext and encryptedtext to lowercase before you create and compare the histograms. When you are done, print out the hidden shift value you found.
se = lower(encryptedtext);
i = find(isletter(se));
[ntest, xouttest] = hist(double(se(i)), 26);   % letter histogram of the encrypted text
st = lower(traintext);
i = find(isletter(st));
[n, xout] = hist(double(st(i)), 26);           % letter histogram of the training text
scores = [];
for shift = 1:26
    % score the circularly shifted histogram against the (unscaled) training counts
    scores(shift) = sum(((circshift(ntest, [0, shift]) - n).^2) ./ n);
end
[shiftScore, shift] = min(scores)
plot(scores)
shiftScore = 3.0807e+03
shift = 18
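The score in 5.3 compares raw counts from a 53-character message with raw counts from a 3832-character training text. Because the expected counts are not scaled to the message length, it is not a standard chi-square statistic. The sketch below shows a normalized version. It rescales the reference histogram to the message's total before comparing. It also adds a pseudo-count of 0.5 per letter (an assumption) so that no expected bin is zero. To stay self-contained, it does not use the contents of hw3caesar.mat. Instead it uses teststring1 both as the reference and, after a shift of 8, as the message, and it tries shifts 0 to 25.

```matlab
% Sketch: recover a Caesar shift by comparing letter histograms with a
% normalized chi-square score (assumptions: ASCII text, 0.5 pseudo-count).
ref = lower(['The MATLAB high-performance language for technical computing ', ...
    'integrates computation, visualization, and programming in an easy-to-use ', ...
    'environment where problems and solutions are expressed in familiar ', ...
    'mathematical notation.']);
plain = ref;                                   % demo message (hypothetical)
isLow = plain >= 'a' & plain <= 'z';           % letter positions (unchanged by shifting)
enc = plain;
enc(isLow) = char(mod(plain(isLow) - 'a' + 8, 26) + 'a');   % encrypt with shift 8

% Count a-z: bin k holds the number of occurrences of 'a' + k - 1
countLetters = @(t) accumarray(double(t(t >= 'a' & t <= 'z'))' - 96, 1, [26 1])';
refCounts = countLetters(ref) + 0.5;           % pseudo-count keeps every bin positive
encCounts = countLetters(enc);
expected = refCounts / sum(refCounts) * sum(encCounts);   % rescale to the message total

scores = zeros(1, 26);
for k = 0:25
    shifted = circshift(encCounts, [0, k]);    % histogram after decoding with shift k
    scores(k + 1) = sum((shifted - expected).^2 ./ expected);
end
[bestScore, best] = min(scores);
shift = best - 1                               % decoding shift (undoes the +8)

decoded = enc;
decoded(isLow) = char(mod(enc(isLow) - 'a' + shift, 26) + 'a');
```

The loop runs over 26 histogram shifts rather than 26 text shifts, which is the same idea as 5.3. Only the final decode touches the text itself.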

5.4
Now that you have found the shift value from the previous problem, what does the encrypted text say?
s = encryptedtext;
s13 = s;
i = ((s13 >= 'a') & (s13 <= 'z'));
s13(i) = char(mod(s13(i) - 'a' + shift, 26) + 'a');
i = ((s13 >= 'A') & (s13 <= 'Z'));
s13(i) = char(mod(s13(i) - 'A' + shift, 26) + 'A');
s13
s13 = Matlab is my favorite language. Shhh! It is a secret.
5.5
Declare a new variable called translatedtext and set it equal to what you have decoded from the encrypted text. Save ONLY the translatedtext variable to a file called hw3tt.mat, and then print out the variables contained in the file hw3tt.mat.
translatedtext = s13;
save('hw3tt.mat', 'translatedtext');
whos -file hw3tt.mat
  Name            Size           Bytes  Class    Attributes

  translatedtext  1x53             106  char